Drop any image and get the top-5 predicted labels with confidence scores. Uses a Vision Transformer (ViT) trained on ImageNet-1k, which has 1,000 categories. The model runs entirely in your browser; the image never leaves your device. The first run downloads ~350 MB.
const pool = await createPool({
  adapter: transformersAdapter(),
  defaultDevice: 'webgpu', // or 'wasm'
});
const model = await pool.load('image-classification', {
  model: 'Xenova/vit-base-patch16-224',
});
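// `file` is the dropped image (a File or Blob).
const url = URL.createObjectURL(file);
const result = await model.run(url, { topk: 5 });
URL.revokeObjectURL(url); // release the blob URL once inference has finished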
// [{ label: 'tabby cat', score: 0.94 }, …]
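
To make the "top-5 labels with confidence scores" output more concrete, here is a minimal sketch of how such a list is commonly derived from a classifier's raw per-class logits: apply a softmax, sort by score, and keep the first k entries. This is an illustration of the general technique, not necessarily how the library computes its result; `topK` and the sample logits are hypothetical names and values invented for the example.

```javascript
// Sketch only: `topK` is a hypothetical helper, not part of any library API.
// It converts raw class logits into the k highest-scoring { label, score } pairs.
function topK(logits, labels, k = 5) {
  // Subtract the largest logit before exponentiating for numerical stability.
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);

  return exps
    .map((e, i) => ({ label: labels[i], score: e / sum })) // softmax probability
    .sort((a, b) => b.score - a.score) // highest confidence first
    .slice(0, k);
}

// Usage with made-up logits for three classes:
const demo = topK([2.0, 0.5, 4.0], ['tabby cat', 'tiger cat', 'Egyptian cat'], 2);
console.log(demo.map((r) => `${r.label}: ${(r.score * 100).toFixed(1)}%`));
```

Because the scores come from a softmax over all classes, they sum to 1 across the full label set, so a top-5 list will generally sum to slightly less than 1.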