Response Regeneration
Problem
A user gets an answer that’s almost right: good structure, wrong tone. In a conventional web app there’d be nothing to do; the response is the response. But this is a model, and asking again will likely produce something different. So they want a do-over.
Most implementations handle this badly in one of two ways. Either there’s no regenerate at all, and the user has to retype or rephrase the prompt to try again. Or there’s a regenerate button that overwrites the previous answer, so if the new one is worse, the good-enough version they had is gone. Now they’re regenerating repeatedly, gambling that the next roll is better, with no way back to the one they liked.
Regeneration isn’t error recovery; the first answer was fine. It’s a way to get a second answer and pick the better one. Treating it as a destructive retry throws away what makes it useful: having both answers to compare.
Solution
Make regeneration re-run the same prompt and store the result alongside the previous one, not on top of it. Model an assistant turn as a set of variants with a pointer to the active one. Regenerating appends a new variant and moves the pointer; a small “‹ 2 / 3 ›” control lets the user page back and forth between attempts. The previous answer is never lost; it’s just one of several.
Under the hood, regeneration re-enters the same generation cycle as the original: it streams like any other response (Streaming Response), can be stopped (Stop Generation), and can itself fail and retry (Retry). The only difference is where the result lands. Because a Message Thread already keys turns by id, adding a variants array to the assistant message is a small change.
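To make that cycle concrete, here is a minimal sketch of a driver that runs one regeneration and reports it as reducer-style actions. The function name `runRegeneration`, the `streamChunks` callback (a stand-in for whatever streaming call your model client provides), the `variantId` field, the `done` action, and the status values `complete`, `stopped`, and `error` are all assumptions made for illustration.

```javascript
// Hypothetical driver: streams one new attempt and reports progress through `dispatch`.
// `streamChunks(signal)` must return an async iterable of text chunks.
async function runRegeneration({ variantId, streamChunks, dispatch, signal }) {
  dispatch({ type: 'regenerate', variantId });
  let status = 'complete';
  try {
    for await (const chunk of streamChunks(signal)) {
      if (signal?.aborted) break; // Stop Generation: keep whatever text already arrived
      dispatch({ type: 'token', variantId, chunk });
    }
    if (signal?.aborted) status = 'stopped';
  } catch (error) {
    // An abort may surface as a thrown error; anything else is a real failure
    status = signal?.aborted ? 'stopped' : 'error';
  }
  // The variant stays in the list whatever the outcome, so Retry can add another
  dispatch({ type: 'done', variantId, status });
  return status;
}
```

Passing an `AbortSignal` from an `AbortController` is one way to wire the stop button; a failed attempt still lands as its own variant instead of replacing an earlier good one.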
There are two related actions worth distinguishing. Plain regeneration reuses the exact prompt. Edit-and-resubmit changes the user’s prompt, which invalidates everything after it, so truncate the thread at that point and generate fresh, keeping the old branch only if you support branching. Separately, make sure a regeneration can actually differ from the previous attempt: if your API exposes sampling controls, raise the temperature, or add an explicit “try a different approach” instruction, so the new variant isn’t a near-duplicate.
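For plain regeneration, "the exact prompt" means the conversation up to, but not including, the assistant turn being regenerated. The sketch below shows one way to build that history. The helper names `toWire` and `historyForRegeneration` are hypothetical, and sending only each earlier turn's active variant is an assumption, not something the pattern requires.

```javascript
// Flatten a stored message to the { role, content } shape many chat APIs accept.
// Assumption: an assistant turn contributes only the variant currently on screen.
function toWire(message) {
  if (message.role !== 'assistant') {
    return { role: message.role, content: message.content };
  }
  return { role: 'assistant', content: message.variants[message.activeVariant].content };
}

// Plain regeneration: same prompt, so send everything before the assistant turn.
function historyForRegeneration(messages, assistantId) {
  const index = messages.findIndex((m) => m.id === assistantId);
  if (index === -1) throw new Error(`Unknown message: ${assistantId}`);
  return messages.slice(0, index).map(toWire);
}
```

Because the slice stops before the regenerated turn, none of that turn's existing variants leak into the prompt for the new attempt.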
Example
The examples cover the variant model and reducer, the regenerate-and-switch UI in React, Vue, Svelte, and a web component, the sampling nudge, and edit-and-resubmit truncation.
Modeling Variants
An assistant message holds every attempt and a pointer to the one on screen.
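// An assistant turn with switchable attempts:
// { id, role: 'assistant', variants: [{ id, content, status }], activeVariant: 0 }
function messageReducer(message, action) {
  // Stream updates go to the variant named by action.variantId when present,
  // otherwise to the variant still streaming, never simply the one on screen.
  const isTarget = (v) =>
    action.variantId ? v.id === action.variantId : v.status === 'streaming';
  switch (action.type) {
    case 'regenerate': // append a fresh, streaming variant and switch to it
      return {
        ...message,
        variants: [
          ...message.variants,
          { id: action.variantId ?? crypto.randomUUID(), content: '', status: 'streaming' },
        ],
        activeVariant: message.variants.length,
      };
    case 'token': // append the chunk, even if the user has paged to another attempt
      return {
        ...message,
        variants: message.variants.map((v) =>
          isTarget(v) ? { ...v, content: v.content + action.chunk } : v
        ),
      };
    case 'done': // leave 'streaming' so the Regenerate button re-enables
      return {
        ...message,
        variants: message.variants.map((v) =>
          isTarget(v) ? { ...v, status: action.status ?? 'complete' } : v
        ),
      };
    case 'switch': { // page between attempts without losing any; clamp out-of-range indexes
      const last = message.variants.length - 1;
      return { ...message, activeVariant: Math.min(Math.max(action.index, 0), last) };
    }
    default:
      return message;
  }
}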
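Everything a pager needs can be derived from the message, which keeps the per-framework components thin. A minimal sketch follows. The helper name `pagerState` and the shape of its return value are assumptions for illustration.

```javascript
// Hypothetical selector: derive what the pager and Regenerate button need to render.
function pagerState(message) {
  const { variants, activeVariant } = message;
  const active = variants[activeVariant];
  return {
    label: `${activeVariant + 1} / ${variants.length}`, // 1-based for display
    showPager: variants.length > 1,
    canPrev: activeVariant > 0,
    canNext: activeVariant < variants.length - 1,
    canRegenerate: active.status !== 'streaming',
  };
}
```

Keeping these rules in one function means every framework version of the component disables the same buttons under the same conditions.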
Regenerate and Switch Between Attempts
// React
function AssistantTurn({ message, onRegenerate, onSwitch }) {
  const { variants, activeVariant } = message;
  const active = variants[activeVariant];
  const many = variants.length > 1;
  return (
    <article className="assistant-turn">
      <div className="content">{active.content}</div>
      <footer className="actions">
        {many && (
          <div className="variant-pager" role="group" aria-label="Response versions">
            <button onClick={() => onSwitch(activeVariant - 1)} disabled={activeVariant === 0}>‹</button>
            <span>{activeVariant + 1} / {variants.length}</span>
            <button onClick={() => onSwitch(activeVariant + 1)} disabled={activeVariant === variants.length - 1}>›</button>
          </div>
        )}
        <button onClick={onRegenerate} disabled={active.status === 'streaming'}>
          Regenerate
        </button>
      </footer>
    </article>
  );
}

<!-- Vue -->
<template>
  <article class="assistant-turn">
    <div class="content">{{ active.content }}</div>
    <footer class="actions">
      <div v-if="variants.length > 1" class="variant-pager" role="group" aria-label="Response versions">
        <button @click="emit('switch', message.activeVariant - 1)" :disabled="message.activeVariant === 0">‹</button>
        <span>{{ message.activeVariant + 1 }} / {{ variants.length }}</span>
        <button @click="emit('switch', message.activeVariant + 1)" :disabled="message.activeVariant === variants.length - 1">›</button>
      </div>
      <button @click="emit('regenerate')" :disabled="active.status === 'streaming'">Regenerate</button>
    </footer>
  </article>
</template>
<script setup>
import { computed } from 'vue';
const props = defineProps({ message: Object });
// Keep the emit function so the template can raise the declared events
const emit = defineEmits(['regenerate', 'switch']);
const variants = computed(() => props.message.variants);
const active = computed(() => variants.value[props.message.activeVariant]);
</script>

<!-- Svelte -->
<script>
  export let message;
  export let onRegenerate;
  export let onSwitch;
  $: variants = message.variants;
  $: active = variants[message.activeVariant];
</script>
<article class="assistant-turn">
  <div class="content">{active.content}</div>
  <footer class="actions">
    {#if variants.length > 1}
      <div class="variant-pager" role="group" aria-label="Response versions">
        <button on:click={() => onSwitch(message.activeVariant - 1)} disabled={message.activeVariant === 0}>‹</button>
        <span>{message.activeVariant + 1} / {variants.length}</span>
        <button on:click={() => onSwitch(message.activeVariant + 1)} disabled={message.activeVariant === variants.length - 1}>›</button>
      </div>
    {/if}
    <button on:click={onRegenerate} disabled={active.status === 'streaming'}>Regenerate</button>
  </footer>
</article>

// Web Component
class AssistantTurn extends HTMLElement {
  set message(m) {
    const active = m.variants[m.activeVariant];
    const many = m.variants.length > 1;
    this.innerHTML = `
      <div class="content"></div>
      <footer class="actions">
        ${many ? `<span>${m.activeVariant + 1} / ${m.variants.length}</span>` : ''}
        <button class="regen">Regenerate</button>
      </footer>`;
    // Insert model output as text so markup in a response is not parsed as HTML
    this.querySelector('.content').textContent = active.content;
    const regen = this.querySelector('.regen');
    regen.disabled = active.status === 'streaming';
    regen.onclick = () => this.dispatchEvent(new CustomEvent('regenerate'));
  }
}
customElements.define('assistant-turn', AssistantTurn);

Nudging the Model to Differ
If every regeneration returns near-identical text, raise the sampling temperature or add an explicit instruction so the new variant explores a different answer.
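// `model` stands for your own chat client; `chat` and its options are placeholders
function regenerate(messages, previousAttempts) {
  return model.chat({
    messages,
    // Push variety up as attempts accumulate so users don't get the same answer twice.
    // Adjust the 1.2 cap to the temperature range your provider accepts.
    temperature: Math.min(0.7 + previousAttempts * 0.15, 1.2),
  });
}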
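The other nudge mentioned above, an explicit instruction, can be sketched as a small wrapper around the message list. The function name `withVarietyNudge`, the wording of the instruction, and its placement as a trailing `system` message are all assumptions. Some APIs take system text as a separate parameter, so adapt the placement to yours.

```javascript
// Hypothetical helper: add a "try something different" instruction on repeat attempts.
function withVarietyNudge(messages, previousAttempts) {
  // First generation: send the conversation unchanged
  if (previousAttempts === 0) return messages;
  const nudge = {
    role: 'system',
    content:
      'The user asked for another attempt. Take a noticeably different approach from earlier answers.',
  };
  // Return a new array so the stored thread is never mutated
  return [...messages, nudge];
}
```

Because the nudge is added only to the outgoing request, it never appears in the stored thread or in the UI.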
Edit-and-Resubmit Truncates the Thread
Changing an earlier prompt invalidates everything generated after it. Cut the thread at that turn and generate fresh from there.
function editAndResubmit(messages, editedId, newText) {
  const index = messages.findIndex(m => m.id === editedId);
  // Unknown id: leave the thread untouched rather than slicing from -1
  if (index === -1) return messages;
  // Everything after the edited prompt is now stale
  const kept = messages.slice(0, index);
  // The caller then starts a new generation from this truncated history
  return [...kept, { ...messages[index], content: newText }];
}
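If you do support branching, the same cut can archive the stale tail instead of discarding it. This is a minimal sketch under assumed names: the `thread` object with `messages` and `branches` fields, the `forkedAt` marker, and the function `editWithBranch` are all hypothetical.

```javascript
// Hypothetical branching form of edit-and-resubmit: the old tail is archived, not dropped.
function editWithBranch(thread, editedId, newText) {
  const index = thread.messages.findIndex((m) => m.id === editedId);
  if (index === -1) throw new Error(`Unknown message: ${editedId}`);
  // The original prompt and everything after it become a retrievable branch
  const archived = { forkedAt: editedId, messages: thread.messages.slice(index) };
  return {
    ...thread,
    messages: [
      ...thread.messages.slice(0, index),
      { ...thread.messages[index], content: newText },
    ],
    branches: [...(thread.branches ?? []), archived],
  };
}
```

The live thread ends up identical to the truncating version; the only added cost is the archived tail, which a branch switcher could later restore.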
Benefits
- Users can keep sampling for a better answer without retyping the prompt.
- Storing attempts as variants makes regeneration non-destructive: the previous answer is a click away, not gone.
- The pager lets users compare versions and settle on the one they prefer, which is the whole point of asking a model the same thing twice.
- It reuses the streaming, stop, and retry machinery already in place, so it’s mostly a state-shape change.
- Edit-and-resubmit gives users a clean way to correct the input rather than arguing with the output.
Tradeoffs
- Every retained variant is a full response held in memory and, if persisted, in storage, so costs add up on long conversations.
- Variant paging adds UI and state that has to stay coherent while a new attempt is still streaming.
- Without a sampling nudge, regeneration can return near-identical answers, making the feature feel broken.
- Edit-and-resubmit has to truncate downstream turns, which is destructive unless you build full conversation branching.
- Deciding what “the answer” is for persistence, export, or downstream use gets ambiguous once a turn has several variants.
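One way to bound the storage cost noted in the first tradeoff is to cap how many variants a turn retains. The sketch below keeps the active variant plus the most recent others. The function name `pruneVariants` and the keep-newest policy are assumptions, and a production version would likely also protect a variant that is still streaming.

```javascript
// Hypothetical cap: keep the active variant and the newest others, up to maxVariants.
function pruneVariants(message, maxVariants) {
  if (maxVariants < 1) throw new RangeError('maxVariants must be at least 1');
  const { variants, activeVariant } = message;
  if (variants.length <= maxVariants) return message;
  const activeId = variants[activeVariant].id;
  // The variant on screen is always kept; fill the rest from newest to oldest
  const keep = new Set([activeId]);
  for (let i = variants.length - 1; i >= 0 && keep.size < maxVariants; i--) {
    keep.add(variants[i].id);
  }
  const kept = variants.filter((v) => keep.has(v.id)); // original order preserved
  return {
    ...message,
    variants: kept,
    // Re-point at the same variant, whose index may have shifted
    activeVariant: kept.findIndex((v) => v.id === activeId),
  };
}
```

Pruning does trade away some of the non-destructive guarantee, so it fits best at persistence time, not while the user is actively paging.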
Summary
A model can answer the same prompt differently each time, so regeneration is a core interaction in its own right. Re-run the prompt into a new variant, keep the previous attempts, and give users a pager to compare and choose. Reuse your streaming and stop machinery for the generation itself, nudge sampling so the variants actually differ, and when someone edits an earlier prompt, cut the thread there on purpose. The result lets users try for a better answer without losing the one they already had.