Few-shot / fine-tuning corpus for teaching a vision-language model to convert messy whiteboard photos into polished tech diagrams. The OCR aspect is the focus: the model must reliably read handwritten labels, arrows, shapes, and annotations before any stylistic "cleanup" stage can succeed.
See NOTES.md for current thinking — purpose, use cases, open
questions, and next steps. See SAMPLES.md for the original
(more ambitious) coverage-matrix spec; current practice is narrower and
focused on verbatim-OCR accuracy / pseudotext reduction on a personal corpus.
- samples/flat/NN.webp: original 4K photos, flat for batch upload.
- samples/in-folders/NN/: same photo + transcription.md + description.md.
- samples/lowres/NN.webp: 1024-wide ~28 KB mirrors for cheap agent-context loading.
- hf-dataset/: staging for the Hugging Face dataset (metadata builder, dataset card, packaged data/).
Published on the Hugging Face Hub as the dataset danielrosehill/Whiteboards.
Each row exposes file_name, id, category, transcription, description.
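As a rough illustration of how a samples/in-folders/NN/ directory could map onto those five fields, here is a minimal sketch. It is not the actual build_metadata.py: the function names are hypothetical, the photo is assumed to be the only .webp file in the folder, and the `category` default is a placeholder because this README does not say where that field comes from.

```python
from pathlib import Path


def build_row(sample_dir: Path, category: str = "uncategorized") -> dict:
    """Assemble one metadata row from a samples/in-folders/NN/ directory."""
    # Assumption: each folder holds one .webp photo next to the two notes.
    photos = sorted(sample_dir.glob("*.webp"))
    if not photos:
        raise FileNotFoundError(f"no .webp photo in {sample_dir}")

    def read(name: str) -> str:
        return (sample_dir / name).read_text(encoding="utf-8").strip()

    return {
        "file_name": photos[0].name,
        "id": sample_dir.name,  # assumption: the NN folder name is the row id
        "category": category,  # placeholder: real source of this field is unknown
        "transcription": read("transcription.md"),
        "description": read("description.md"),
    }


def build_rows(in_folders: Path) -> list[dict]:
    """Collect one row per sample folder, sorted by folder name."""
    return [build_row(d) for d in sorted(in_folders.iterdir()) if d.is_dir()]
```

Calling `build_rows(Path("samples/in-folders"))` from the repository root would return one dict per sample, in folder-name order.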
See hf-dataset/README.md for the dataset card and
the envisioned evaluation tasks (zero-shot OCR, few-shot grounding,
image-to-image pseudotext preservation, VQA, etc.).
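For the zero-shot OCR task, one common way to score a model's output against the verbatim transcription is character error rate (CER): Levenshtein edit distance divided by reference length. The sketch below is only an illustration of that metric; the dataset card may define its scoring differently, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length; 0.0 is a perfect match."""
    if not reference:
        # Convention chosen for this sketch: an empty reference scores 0.0
        # only when the hypothesis is empty too.
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)
```

Here `reference` would be a row's `transcription` and `hypothesis` the model's reading of the same photo. Pseudotext, meaning lettering that looks plausible but is wrong, should show up as a high CER even when the overall layout looks right.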
To rebuild the staged dataset from samples/in-folders/:
python3 hf-dataset/build_metadata.py
To push to the Hub (requires huggingface-cli login):
huggingface-cli upload danielrosehill/Whiteboards hf-dataset/ --repo-type=dataset