
danielrosehill/Whiteboard-OCR-Few-Shot-Learning


Whiteboard-OCR-Few-Shot-Learning

Few-shot / fine-tuning corpus for teaching a vision-language model to convert messy whiteboard photos into polished tech diagrams. The OCR aspect is the focus: the model must reliably read handwritten labels, arrows, shapes, and annotations before any stylistic "cleanup" stage can succeed.

See NOTES.md for current thinking: purpose, use cases, open questions, and next steps. See SAMPLES.md for the original, more ambitious coverage-matrix spec; current practice is narrower and focuses on verbatim OCR accuracy and pseudotext reduction on a personal corpus.

Layout

  • samples/flat/NN.webp: original 4K photos, stored flat (no subfolders) for batch upload.
  • samples/in-folders/NN/: the same photo alongside transcription.md and description.md.
  • samples/lowres/NN.webp: 1024 px-wide mirrors (~28 KB) for cheap loading into an agent's context.
  • hf-dataset/: staging area for the Hugging Face dataset (metadata builder, dataset card, packaged data/).
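Because every sample lives in three places, a quick consistency check can catch a photo that was added to one layout but not the others. The sketch below is hypothetical (it is not part of the repo) and assumes that sample IDs are the NN file stems, that flat/ and lowres/ hold .webp files, and that any .webp inside in-folders/NN/ counts as the photo, since its exact filename is not specified above.

```python
from pathlib import Path


def _stems(directory: Path) -> set[str]:
    """NN stems of the .webp files directly inside `directory` (empty if absent)."""
    return {p.stem for p in directory.glob("*.webp")} if directory.is_dir() else set()


def check_layout(samples_root: Path) -> dict[str, list[str]]:
    """Map each sample ID that has a problem to a list of human-readable issues."""
    flat = _stems(samples_root / "flat")
    lowres = _stems(samples_root / "lowres")
    in_folders = samples_root / "in-folders"
    folders = {p.name for p in in_folders.iterdir() if p.is_dir()} if in_folders.is_dir() else set()

    problems: dict[str, list[str]] = {}
    # Check every ID seen in any layout, so gaps in each direction are reported.
    for sample_id in sorted(flat | lowres | folders):
        issues: list[str] = []
        if sample_id not in flat:
            issues.append(f"missing flat/{sample_id}.webp")
        if sample_id not in lowres:
            issues.append(f"missing lowres/{sample_id}.webp")
        if sample_id not in folders:
            issues.append(f"missing in-folders/{sample_id}/")
        else:
            folder = in_folders / sample_id
            # Assumption: any .webp in the folder is the sample photo.
            if not any(folder.glob("*.webp")):
                issues.append(f"no .webp photo in in-folders/{sample_id}/")
            for name in ("transcription.md", "description.md"):
                if not (folder / name).is_file():
                    issues.append(f"missing in-folders/{sample_id}/{name}")
        if issues:
            problems[sample_id] = issues
    return problems


if __name__ == "__main__":
    for sample_id, issues in check_layout(Path("samples")).items():
        print(sample_id, "; ".join(issues))
```

Collecting IDs from all three layouts (rather than treating one as the source of truth) means an orphaned low-res mirror is reported just like a missing one.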

Hugging Face dataset

Published at danielrosehill/Whiteboards.

Each row exposes five fields: file_name, id, category, transcription, and description. See hf-dataset/README.md for the dataset card and the envisioned evaluation tasks (zero-shot OCR, few-shot grounding, image-to-image pseudotext preservation, VQA, etc.).
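For the zero-shot OCR task, one common way to score a model's output against a row's transcription field is character error rate (CER): the Levenshtein edit distance divided by the reference length. This is a swapped-in, standard metric rather than one the dataset card is known to prescribe, and the whitespace normalization below is an assumption that keeps line-wrapping differences from counting as errors.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Collapse all runs of whitespace (including newlines) to single spaces."""
    return " ".join(text.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of `hypothesis` against `reference`, after normalization."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        # Convention chosen here: empty vs. empty is perfect, anything else is 1.0.
        return 0.0 if not hyp else 1.0
    return levenshtein(ref, hyp) / len(ref)
```

Per row, this would be called as cer(row["transcription"], model_output). Lower is better, and values above 1.0 are possible when the model emits much more text than the reference, which is one symptom of hallucinated pseudotext.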

To rebuild the staged dataset from samples/in-folders/:

python3 hf-dataset/build_metadata.py
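The contents of build_metadata.py are not shown here, but the row schema above matches the convention of Hugging Face's ImageFolder loader, which reads a metadata.jsonl whose file_name column points at image files relative to that metadata file. The following is a hypothetical stand-in, not the repo's script: it assumes the output directory is something like hf-dataset/data/, that any .webp in each sample folder is the photo, and that categories come from a caller-supplied mapping, since the real source of the category field is not documented above.

```python
import json
import shutil
from pathlib import Path
from typing import Dict, Optional


def build_metadata(in_folders: Path, out_dir: Path,
                   categories: Optional[Dict[str, str]] = None) -> int:
    """Copy each sample photo into out_dir and write out_dir/metadata.jsonl.

    Hypothetical sketch; returns the number of rows written. A folder with a
    photo but no transcription.md or description.md raises FileNotFoundError.
    """
    categories = categories or {}
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = 0
    with open(out_dir / "metadata.jsonl", "w", encoding="utf-8") as meta:
        for folder in sorted(p for p in in_folders.iterdir() if p.is_dir()):
            photos = sorted(folder.glob("*.webp"))
            if not photos:
                continue  # skip folders that have no photo yet
            target = f"{folder.name}{photos[0].suffix}"
            # Copy next to metadata.jsonl so file_name resolves relative to it.
            shutil.copyfile(photos[0], out_dir / target)
            row = {
                "file_name": target,
                "id": folder.name,
                "category": categories.get(folder.name, "uncategorized"),
                "transcription": (folder / "transcription.md").read_text(encoding="utf-8"),
                "description": (folder / "description.md").read_text(encoding="utf-8"),
            }
            meta.write(json.dumps(row, ensure_ascii=False) + "\n")
            rows += 1
    return rows
```

With a directory laid out this way, datasets.load_dataset("imagefolder", data_dir="hf-dataset/data") should expose an image column plus the extra metadata fields, which is a cheap local check before uploading.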

To push to the Hub (requires huggingface-cli login):

huggingface-cli upload danielrosehill/Whiteboards hf-dataset/ --repo-type=dataset

About

Exploring various vision AI workflows for whiteboard processing
