Tokens stream in real time from a GPT-2 model running in a Web Worker, so the page never freezes: inference happens off the main thread. Hit Cancel at any time to stop generation mid-stream. The first run downloads roughly 160 MB of model weights.
// Load the model in a worker pool, then stream tokens from it.
const model = await pool.load('text-generation', {
  model: 'Xenova/gpt2',
});

const stream = model.stream(prompt, { max_new_tokens: 120 });
for await (const token of readableToAsyncIter(stream)) {
  output += token; // tokens arrive one by one
}
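The `readableToAsyncIter` helper is used above but not shown; its implementation is an assumption. A minimal sketch wraps a `ReadableStream` reader in an async generator so the stream can be consumed with `for await`:

```javascript
// Sketch of a readableToAsyncIter helper (name taken from the snippet above,
// implementation assumed): turns a ReadableStream into an async iterator.
async function* readableToAsyncIter(stream) {
  const reader = stream.getReader();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) return;
      yield value; // one token per chunk
    }
  } finally {
    // Runs when the stream ends or the consumer breaks out of the loop,
    // releasing the lock so the stream can be cancelled or reused.
    reader.releaseLock();
  }
}
```

Note that modern browsers and Node expose `ReadableStream` as an async iterable directly, in which case the helper is unnecessary and `for await (const token of stream)` works as-is.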
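The Cancel behavior can be sketched under the assumption that the token stream is a standard `ReadableStream`: calling `cancel()` on its reader stops token delivery mid-stream. The names `makeTokenStream` and `readUntilCancelled` below are illustrative, not part of the demo's API.

```javascript
// Hypothetical stand-in for a model's token stream: emits tokens one at a
// time with a small delay, the way a generating model would.
function makeTokenStream(tokens, delayMs = 5) {
  let i = 0;
  return new ReadableStream({
    pull(controller) {
      return new Promise((resolve) => setTimeout(() => {
        if (i < tokens.length) controller.enqueue(tokens[i++]);
        else controller.close();
        resolve();
      }, delayMs));
    },
  });
}

// Consume at most maxTokens, then cancel — roughly what a Cancel button
// wired to reader.cancel() would do mid-generation.
async function readUntilCancelled(stream, maxTokens) {
  const reader = stream.getReader();
  let out = '';
  for (let n = 0; n < maxTokens; n++) {
    const { done, value } = await reader.read();
    if (done) break;
    out += value;
  }
  await reader.cancel(); // discards any tokens still in flight
  return out;
}
```

Cancelling the reader propagates back to the stream's source, so a worker-backed stream gets a chance to stop inference rather than generate tokens nobody will read.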