Whiteboard-OCR-Few-Shot-Learning

Few-shot / fine-tuning corpus for teaching a vision-language model to convert messy whiteboard photos into polished tech diagrams. The OCR aspect is the focus: the model must reliably read handwritten labels, arrows, shapes, and annotations before any stylistic "cleanup" stage can succeed.

See NOTES.md for current thinking: purpose, use cases, open questions, and next steps. See SAMPLES.md for the original, more ambitious coverage-matrix spec. Current practice is narrower, focusing on verbatim-OCR accuracy and pseudotext reduction on a personal corpus.

Layout

samples/flat/NN.webp — original 4K photos, flat for batch upload.
samples/in-folders/NN/ — same photo + transcription.md + description.md.
samples/lowres/NN.webp — 1024-wide ~28 KB mirrors for cheap agent-context loading.
hf-dataset/ — staging for the Hugging Face dataset (metadata builder, dataset card, packaged data/).
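Because every sample lives in three places (flat/, in-folders/, lowres/), it is easy to add a photo to one mirror and forget another. The sketch below is a hypothetical consistency check, not a script in this repo. It assumes two-digit sample IDs used as file stems and folder names, exactly as in the layout above. The photo's filename inside each in-folders/NN/ directory isn't specified here, so any .webp is accepted.

```python
from pathlib import Path


def check_layout(root: Path) -> list[str]:
    """Hypothetical helper: report samples missing from any of the three mirrors."""
    samples = root / "samples"
    in_folders = samples / "in-folders"
    problems: list[str] = []

    # Path.glob on a missing directory simply yields nothing.
    flat_ids = {p.stem for p in (samples / "flat").glob("*.webp")}
    lowres_ids = {p.stem for p in (samples / "lowres").glob("*.webp")}
    folder_ids = (
        {p.name for p in in_folders.iterdir() if p.is_dir()}
        if in_folders.is_dir()
        else set()
    )

    for sample_id in sorted(flat_ids | lowres_ids | folder_ids):
        if sample_id not in flat_ids:
            problems.append(f"{sample_id}: missing samples/flat/{sample_id}.webp")
        if sample_id not in lowres_ids:
            problems.append(f"{sample_id}: missing samples/lowres/{sample_id}.webp")
        if sample_id not in folder_ids:
            problems.append(f"{sample_id}: missing samples/in-folders/{sample_id}/")
            continue
        folder = in_folders / sample_id
        for name in ("transcription.md", "description.md"):
            if not (folder / name).is_file():
                problems.append(f"{sample_id}: missing {name}")
        # Assumption: the copied photo is a .webp, but its exact name is not fixed.
        if not any(folder.glob("*.webp")):
            problems.append(f"{sample_id}: no .webp photo in folder")
    return problems


if __name__ == "__main__":
    for line in check_layout(Path(".")):
        print(line)
```

Run from the repository root, an empty output means all three mirrors agree.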

Hugging Face dataset

Published at danielrosehill/Whiteboards.

Each row exposes file_name, id, category, transcription, description. See hf-dataset/README.md for the dataset card and the envisioned evaluation tasks (zero-shot OCR, few-shot grounding, image-to-image pseudotext preservation, VQA, etc.).

To rebuild the staged dataset from samples/in-folders/:

python3 hf-dataset/build_metadata.py
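For readers who want to see what a rebuilt row could look like, here is a minimal sketch of assembling rows with the five fields listed above and writing them as metadata.jsonl, the file the Hugging Face ImageFolder loader uses to attach columns to images via file_name. This is not the contents of build_metadata.py. Several details are assumptions: the helpers `build_rows` and `write_jsonl` are hypothetical, images are assumed to be packaged into data/ as NN.webp, and the category mapping is a placeholder because how categories are assigned is not documented here.

```python
from __future__ import annotations

import json
from pathlib import Path


def build_rows(in_folders: Path, categories: dict[str, str] | None = None) -> list[dict[str, str]]:
    """Hypothetical: one metadata row per samples/in-folders/NN/ directory."""
    categories = categories or {}
    rows: list[dict[str, str]] = []
    for folder in sorted(p for p in in_folders.iterdir() if p.is_dir()):
        sample_id = folder.name
        rows.append({
            # Assumption: images are copied next to metadata.jsonl as NN.webp.
            "file_name": f"{sample_id}.webp",
            "id": sample_id,
            # Placeholder: the real category source is not documented in this README.
            "category": categories.get(sample_id, "uncategorized"),
            "transcription": (folder / "transcription.md").read_text(encoding="utf-8").strip(),
            "description": (folder / "description.md").read_text(encoding="utf-8").strip(),
        })
    return rows


def write_jsonl(rows: list[dict[str, str]], out_path: Path) -> None:
    """Write one JSON object per line, keeping non-ASCII handwriting transcriptions readable."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
```

A call would look like `write_jsonl(build_rows(Path("samples/in-folders")), Path("hf-dataset/data/metadata.jsonl"))`, where the output path is likewise an assumption about how data/ is organized.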

To push to the Hub (requires huggingface-cli login):

huggingface-cli upload danielrosehill/Whiteboards hf-dataset/ --repo-type=dataset
