Tokenize Anything via Prompting

This Space is a modified version of BAAI/tokenize-anything, used as a demo for the following custom HTML components built with Gradio 6's gr.HTML:

A promptable model capable of simultaneous segmentation, recognition, and captioning.

How to use: Upload an image (or pick an example), then draw prompts on the left panel. Left-click to add a foreground point, right-click to add a background point, or click and drag to draw a bounding box. Each prompt produces a mask, a concept label with its IoU and confidence scores, and a caption, all shown in the right panel. The label format is "concept (IoU, confidence): caption".
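The label format described above can be illustrated with a minimal Python sketch. This is not the Space's actual code: the names MaskLabel, format_label, and parse_label are hypothetical, and the two-decimal rounding of the scores is an assumption made only for this example.

```python
import re
from typing import NamedTuple


class MaskLabel(NamedTuple):
    """Hypothetical container for one prompt's result label."""
    concept: str
    iou: float
    confidence: float
    caption: str


# Assumed pattern for "concept (IoU, confidence): caption".
LABEL_RE = re.compile(
    r"^(?P<concept>.+?) \((?P<iou>[0-9.]+), (?P<conf>[0-9.]+)\): (?P<caption>.*)$"
)


def format_label(concept: str, iou: float, confidence: float, caption: str) -> str:
    # Two decimal places is an assumption, not something the demo specifies.
    return f"{concept} ({iou:.2f}, {confidence:.2f}): {caption}"


def parse_label(text: str) -> MaskLabel:
    # Split a label string back into its four fields.
    match = LABEL_RE.match(text)
    if match is None:
        raise ValueError(f"not in 'concept (IoU, confidence): caption' form: {text!r}")
    return MaskLabel(
        concept=match["concept"],
        iou=float(match["iou"]),
        confidence=float(match["conf"]),
        caption=match["caption"],
    )


if __name__ == "__main__":
    label = format_label("dog", 0.95, 0.8, "a dog sitting on grass")
    print(label)
    print(parse_label(label))
```

Reading a label this way makes the two numbers explicit: the first is the IoU score for the mask, and the second is the confidence of the recognized concept.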

Examples