Skip to main content
Saved
Pattern
Difficulty Intermediate

Response Regeneration

Let users re-run a prompt to get a different answer, keeping the previous attempts as switchable variants rather than throwing them away.

Den Odell
By Den Odell Added

Response Regeneration

Problem

A user gets an answer that’s almost right: good structure, wrong tone. On a normal web app there’d be nothing to do; the response is the response. But this is a model, and asking again will produce something different. So they want a do-over.

Most implementations handle this badly in one of two ways. Either there’s no regenerate at all, and the user has to retype or rephrase the prompt to try again. Or there’s a regenerate button that overwrites the previous answer, so if the new one is worse, the good-enough version they had is gone. Now they’re regenerating repeatedly, gambling that the next roll is better, with no way back to the one they liked.

Regeneration isn’t error recovery; the first answer was fine. It’s a way to get a second answer and pick the better one. Treating it as a destructive retry throws away what makes it useful: having both answers to compare.

Solution

Make regeneration re-run the same prompt and store the result alongside the previous one, not on top of it. Model an assistant turn as a set of variants with a pointer to the active one. Regenerating appends a new variant and moves the pointer; a small ”‹ 2 / 3 ›” control lets the user page back and forth between attempts. The previous answer is never lost; it’s just one of several.

Under the hood, regeneration re-enters the same generation cycle as the original: it streams like any other response (Streaming Response), can be stopped (Stop Generation), and can itself fail and retry (Retry). The only difference is where the result lands. Because a Message Thread already keys turns by id, adding a variants array to the assistant message is a small change.

There are two related actions worth distinguishing. Plain regeneration reuses the exact prompt. Edit-and-resubmit changes the user’s prompt, which invalidates everything after it, so truncate the thread at that point and generate fresh, keeping the old branch only if you support branching. And if your API exposes it, nudge sampling (a higher temperature, or an explicit “try a different approach” instruction) on regeneration so the new variant is actually different rather than a near-duplicate.

Example

The variant model and reducer, the regenerate-and-switch UI across frameworks, and edit-and-resubmit truncation.

Modeling Variants

An assistant message holds every attempt and a pointer to the one on screen.

// An assistant turn with switchable attempts
// { id, role: 'assistant', variants: [{ id, content, status }], activeVariant: 0 }

function messageReducer(message, action) {
  switch (action.type) {
    case 'regenerate': // append a fresh, streaming variant and switch to it
      return {
        ...message,
        variants: [...message.variants, { id: crypto.randomUUID(), content: '', status: 'streaming' }],
        activeVariant: message.variants.length,
      };

    case 'token':
      return {
        ...message,
        variants: message.variants.map((v, i) =>
          i === message.activeVariant ? { ...v, content: v.content + action.chunk } : v
        ),
      };

    case 'switch': // page between attempts without losing any
      return { ...message, activeVariant: action.index };

    default:
      return message;
  }
}

Regenerate and Switch Between Attempts

function AssistantTurn({ message, onRegenerate, onSwitch }) {
  const { variants, activeVariant } = message;
  const active = variants[activeVariant];
  const many = variants.length > 1;

  return (
    <article className="assistant-turn">
      <div className="content">{active.content}</div>
      <footer className="actions">
        {many && (
          <div className="variant-pager" role="group" aria-label="Response versions">
            <button onClick={() => onSwitch(activeVariant - 1)} disabled={activeVariant === 0}>‹</button>
            <span>{activeVariant + 1} / {variants.length}</span>
            <button onClick={() => onSwitch(activeVariant + 1)} disabled={activeVariant === variants.length - 1}>›</button>
          </div>
        )}
        <button onClick={onRegenerate} disabled={active.status === 'streaming'}>
          Regenerate
        </button>
      </footer>
    </article>
  );
}

Nudging the Model to Differ

If every regeneration returns near-identical text, raise the sampling temperature or add an explicit instruction so the new variant explores a different answer.

function regenerate(messages, previousAttempts) {
  return model.chat({
    messages,
    // Push variety up as attempts accumulate so users don't get the same answer twice
    temperature: Math.min(0.7 + previousAttempts * 0.15, 1.2),
  });
}

Edit-and-Resubmit Truncates the Thread

Changing an earlier prompt invalidates everything generated after it. Cut the thread at that turn and generate fresh from there.

function editAndResubmit(messages, editedId, newText) {
  const index = messages.findIndex(m => m.id === editedId);
  // Everything after the edited prompt is now stale
  const kept = messages.slice(0, index);
  return [...kept, { ...messages[index], content: newText }];
  // …then start a new generation from this truncated history
}

Benefits

  • Users can keep sampling for a better answer without retyping the prompt.
  • Storing attempts as variants makes regeneration non-destructive: the previous answer is a click away, not gone.
  • The pager lets users compare versions and settle on the one they prefer, which is the whole point of asking a model the same thing twice.
  • It reuses the streaming, stop, and retry machinery already in place, so it’s mostly a state-shape change.
  • Edit-and-resubmit gives users a clean way to correct the input rather than arguing with the output.

Tradeoffs

  • Every retained variant is a full response held in memory and, if persisted, in storage, so costs add up on long conversations.
  • Variant paging adds UI and state that has to stay coherent while a new attempt is still streaming.
  • Without a sampling nudge, regeneration can return near-identical answers, making the feature feel broken.
  • Edit-and-resubmit has to truncate downstream turns, which is destructive unless you build full conversation branching.
  • Deciding what “the answer” is for persistence, export, or downstream use gets ambiguous once a turn has several variants.

Summary

A model answers differently every time, so regeneration is a core interaction in its own right. Re-run the prompt into a new variant, keep the previous attempts, and give users a pager to compare and choose. Reuse your streaming and stop machinery for the generation itself, nudge sampling so the variants actually differ, and when someone edits an earlier prompt, cut the thread there on purpose. The result lets users try for a better answer without losing the one they already had.

Newsletter

A Monthly Email
from Den Odell

Behind-the-scenes thinking on frontend patterns, site updates, and more

No spam. Unsubscribe anytime.