One username in, every video they've ever posted, across 50+ sites out.
Harvestr is a cross-platform video archival tool that probes dozens of video
hosting, cam-archive, and creator-economy sites for a single username, then
pulls down every video it finds, with a browser UI, an aggressive
downloader stack (aria2c + ffmpeg + curl fallback), content-based
deduplication, and an extensible custom-scraper framework for sites
yt-dlp doesn't cover.
If you follow a creator and want a local archive of their work, their content is usually scattered across 5-15 different sites: main platform, cross-posted mirrors, archive sites, fan sites, leak sites, etc. Chasing each site manually is tedious and you inevitably miss things, duplicate downloads, or fall behind.
Harvestr solves this with one command:
python universal_downloader.py alice_example

It fans out to 50+ sites in parallel, finds every profile/page for that name, and downloads every video, skipping anything it already has.
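The parallel fan-out described above can be pictured as a thread pool that runs one probe per site. The following is a minimal sketch, not Harvestr's actual implementation: `probe_all` and `fake_probe` are hypothetical names, and the 8-worker default simply mirrors the `max_parallel_probes` value shown in the config.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def probe_all(
    username: str,
    sites: Iterable[str],
    probe: Callable[[str, str], Optional[str]],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Run probe(site, username) for every site concurrently.

    Returns {site: profile_url} for sites that reported a hit. A probe that
    raises is treated as "no hit" so one broken site cannot stop the run.
    """
    hits: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(probe, site, username): site for site in sites}
        for future in as_completed(futures):
            site = futures[future]
            try:
                result = future.result()
            except Exception:
                continue  # timeout, HTTP error, parse error: skip this site
            if result is not None:
                hits[site] = result
    return hits


def fake_probe(site: str, username: str) -> Optional[str]:
    """Hypothetical stand-in for a real network probe."""
    if site == "broken":
        raise TimeoutError("simulated timeout")
    if site in {"siteA", "siteC"}:
        return f"https://{site}.example/{username}"
    return None
```

Because probes are I/O-bound, threads are enough here; the wall-clock time is roughly that of the slowest batch rather than the sum of all probes.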
| Capability | How |
|---|---|
| 1800+ sites via yt-dlp | All mainstream + adult tube sites |
| Custom scrapers for 25+ cam-archive + creator sites | KVS mirror family, Coomer, Kemono, RedGifs, X.com, Reddit, Archivebate, Recordbate, Recu.me, CamCaps… |
| Parallel probing | 8-way concurrent site probing (~15 seconds for 50 sites) |
| aria2c 16-connection downloads | Multi-segment MP4 downloads at wire speed |
| HLS / DASH / m3u8 | ffmpeg pipeline for fragmented streams |
| Cloudflare bypass | curl_cffi Chrome TLS fingerprint + cloudscraper fallback |
| DDoS-Guard bypass | Accept: text/css trick for Coomer / Kemono |
| Cross-mirror dedup | One video across 5 mirrors → downloaded once |
| Content-based dedup | Post-hoc sweep using size + head/tail SHA1 (99%+ accuracy, <50 ms per file) |
| Cookie auth | Netscape cookies.txt with per-site domain filtering |
| Premium X.com (Twitter) | GraphQL API with auth_token + ct0 cookies |
| Web UI | Flask dashboard with live log, start/stop, inline video preview |
| Atomic state | Thread-safe history.json / failed.json, Windows-safe |
| Resumable | Re-runs only download new videos; rolling window per site |
| Dry-run mode | See what would be downloaded without touching disk |
Harvestr runs in two complementary modes, switchable with a single tab click
(or keyboard 2 / 1) in the web UI:
Given a username, fan out across 50+ sites to find every video this person has ever posted and download the ones you don't already have. See the Archive section below.
🔴 Live mode (backed by vendored StreaMonitor)
Track cam models across 18 platforms and auto-record the moment they go
live. Harvestr keeps a lightweight bot per model that polls the site every
5-30s; when status flips to PUBLIC, the HLS/RTMP stream is immediately
handed to ffmpeg and written to disk. Supported sites out-of-the-box:
| Site | Site | Site |
|---|---|---|
| Chaturbate | StripChat / StripChat VR | CamSoda |
| Cam4 | BongaCams | Flirt4Free |
| Cherry.tv | Streamate | MyFreeCams |
| ManyVids | FanslyLive | AmateurTV |
| CamsCom | DreamCam / DreamCam VR | SexChatHU |
| XLoveCam | | |
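The per-model polling described above the table boils down to a small loop: check status, and hand off to the recorder the first time it reads PUBLIC. This is a simplified sketch under stated assumptions, not StreaMonitor's code: `watch_model`, `get_status` and `start_recording` are hypothetical names, and any status string other than `PUBLIC` is treated as "keep waiting".

```python
import time
from typing import Callable


def watch_model(
    get_status: Callable[[], str],
    start_recording: Callable[[], None],
    poll_interval: float = 10.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the status reads PUBLIC, then start recording once.

    Returns True if recording was started, False if max_polls ran out.
    The sleep function is injectable so the loop can be tested without waiting.
    """
    for _ in range(max_polls):
        if get_status() == "PUBLIC":
            start_recording()  # e.g. launch ffmpeg on the stream URL
            return True
        sleep(poll_interval)
    return False
```

In a real bot the interval would sit somewhere in the 5-30s range mentioned above, and the loop would resume polling after the recording ends.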
Zero-setup: StreaMonitor is vendored into live_backend/streamonitor/
and ships with Harvestr. Clone Harvestr, install requirements.txt, and the
Live tab lights up. No second repo to clone, no env var to set.
GPL-3.0 notice: the vendored StreaMonitor code retains its original GPL-3.0 license (see live_backend/LICENSE and live_backend/NOTICE.md). Combined distributions of Harvestr + live_backend/ must comply with GPL-3.0. Harvestr's own code outside live_backend/ remains MIT.
To point at a development checkout of StreaMonitor instead of the vendored
copy, set HARVESTR_STREAMONITOR=<path> before launching webui.py.
UI features:
- Per-model cards with animated state dots (green pulse = recording, blue = connecting, purple = private, yellow = offline, red = problem)
- Filter by site, by status bucket, by username substring
- Sort by status / name / site / recorded size
- Live badge on the tab when any recording is active
- Bulk start/stop all
- Command palette (Ctrl+K) for quick actions across both tabs
YouTube · Dailymotion · Vimeo · Rumble · Twitch (VODs & clips) · Kick · Odysee · BitChute · Soundcloud · Reddit · X.com / Twitter (premium) · RedGifs
PornHub · XVideos · xHamster · SpankBang · XNXX · YouPorn · RedTube · SpankWire · 4Tube · TNA Flix · EPorner · Beeg · DrTuber · HotMovs · KeezMovies · ManyVids · Motherless · SxyPrn · Tube8
camwhores.tv · camwhores.video · camwhores.co · camwhores.bz · camwhoresHD · camwhoresbay · camwhorescloud · camvideos.tv · camhub.cc · camwh.com · cambro.tv · camcaps.tv · camcaps.io · camstreams.tv · porntrex · camsrip · recordbate · archivebate · recu.me
- Leakedzone.com ✨: OnlyFans / IG / Snap archive with HLS video streams (served from the main domain, which bypasses the Coomer CDN outage)
- Fapello.com ✨: OnlyFans / IG / Snap archive with deterministic numbered posts
- Coomer.st: OnlyFans / Fansly / CandFans mirror (auto-recovers when their CDN subnet comes back; see Coomer outage below)
- Kemono.cr: Patreon / Fanbox / Gumroad / SubscribeStar / Fantia / Boosty / Discord / DLSite mirror
- RedGifs: v2 API, auto-acquired temp token
⚠️ Coomer CDN outage (April 2026): The 91.149.227.0/24 subnet that hosts Coomer's sharded video CDN (n1-n4.coomer.st) is globally null-routed. Metadata still works (profile pages, post counts, post titles) but video downloads time out. Harvestr has a fast-fail pre-flight health check that detects this and routes around Coomer for you, and the new Leakedzone + Fapello scrapers cover the same OnlyFans-archive content from unaffected infrastructure.
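A fast-fail pre-flight check of the kind mentioned above can be as simple as a short-timeout TCP connect to each CDN shard. The sketch below is an illustration only: `cdn_reachable` is a hypothetical helper, the shard hostnames are expanded from the `n1-n4.coomer.st` pattern named above, and the 3-second timeout is an assumed value.

```python
import socket
from typing import Callable, Iterable

# Expanded from the "n1-n4.coomer.st" pattern; adjust if the shards differ.
COOMER_SHARDS = [f"n{i}.coomer.st" for i in range(1, 5)]


def cdn_reachable(
    hosts: Iterable[str],
    port: int = 443,
    timeout: float = 3.0,
    connect: Callable = socket.create_connection,
) -> bool:
    """Return True as soon as any host accepts a TCP connection.

    DNS failures, refused connections and timeouts are all OSError
    subclasses, so a null-routed subnet shows up as a quick False
    instead of hundreds of slow per-video download timeouts.
    """
    for host in hosts:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            continue
        conn.close()
        return True
    return False
```

A caller could run `cdn_reachable(COOMER_SHARDS)` once per session and skip the download phase for that site when it returns False, while still using the metadata for reporting.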
python universal_downloader.py --list-sites

# Required
pip install -U "yt-dlp[default,curl-cffi]" requests cloudscraper rich flask
# Recommended (16x faster downloads)
winget install aria2.aria2
# Required for HLS / m3u8 streams
# Download from https://www.gyan.dev/ffmpeg/builds/ and add to PATH

The browser-extraction code paths (embed_extractors.py,
live_backend/streamonitor/utils/cf_broker.py) prefer
patchright, a Playwright fork with stealth patches that frequently lets
invisible-managed Cloudflare Turnstile auto-pass without a captcha service.
You don't need to install it manually. The first time the browser tier runs and finds patchright missing, it will run once per machine:
pip install patchright
patchright install chromium # ~180 MB Chromium download
…then continue normally. The outcome is cached for the rest of the
process; subsequent calls hit the in-memory cache instantly. If the
install fails (offline, pip blocked, mirror down) the code falls back
to vanilla playwright silently: nothing breaks, you just lose the
stealth advantage on Cloudflare-protected hosts.
Forcing the install up-front is also fine:
pip install patchright
patchright install chromium

Either way, the code auto-detects at import. On startup the log shows
which driver is in use (Browser driver: patchright (stealth) vs the
vanilla-playwright fallback).
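The "prefer patchright, fall back to playwright" detection can be sketched with the standard library alone. This is an assumption about the general shape, not the code in embed_extractors.py: `pick_browser_driver` is a hypothetical name, and the sketch only detects what is installed; it does not perform the one-time auto-install described above.

```python
import importlib.util
from typing import Callable, Optional


def pick_browser_driver(
    find_spec: Callable[[str], Optional[object]] = importlib.util.find_spec,
) -> Optional[str]:
    """Return the first importable driver name, preferring patchright.

    find_spec returns None for a top-level module that is not installed,
    so nothing is imported until the caller actually needs the driver.
    """
    for name in ("patchright", "playwright"):
        if find_spec(name) is not None:
            return name
    return None  # neither driver is installed
```

Checking with `find_spec` instead of a bare `import` keeps startup cheap and makes the choice easy to log before any browser is launched.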
git clone https://github.com/KevinStreetCoder/harvestr.git
cd harvestr
cp config.example.json config.json

python webui.py --port 7860

Open http://127.0.0.1:7860 and you get:
- Performer management (add/remove by name)
- Per-site checkbox filter (or presets: All / Custom only / yt-dlp only)
- Start / Stop buttons with live log tail
- History table with inline video preview
- Failed / skipped table with reason codes
- One-click Dedup (content-based dupe scan)
- Auto-refresh every 2 seconds
# Download every video for one username across all sites
python universal_downloader.py alice_example
# Restrict to specific sites
python universal_downloader.py alice_example --sites coomer,kemono,xcom
# Dry-run (probe + enumerate, no downloads)
python universal_downloader.py alice_example --dry-run
# Run for every performer configured in config.json
python universal_downloader.py --all
# Show every supported site
python universal_downloader.py --list-sites
# Verbose / debug mode
python universal_downloader.py alice_example -v

The web UI's Archive tab now has a Storage card that shows:
- Drive bar: horizontal stacked chart of Harvestr archive / other files / free space
- Free-space warning: bar turns yellow under 10 GB free, red under 2 GB
- Per-performer meter: each performer is a row with a proportional fill bar, byte total, file count, and a button to wipe them
- Cleanup tools: "Prune older than…", "Free up space…" (prune oldest until N GB free), "Dedup"
All destructive ops are 2-step confirm with dry-run preview:
Prune older than 90 days?
→ Found 23 files (would free 1.8 GB)
→ Delete them? [OK / Cancel]
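The two-step flow shown above (preview first, delete only on confirmation) maps naturally onto a function with an `apply` flag that defaults to off. The sketch below is illustrative: `prune_older_than` is a hypothetical helper, and using file mtime as the age criterion is an assumption.

```python
import time
from pathlib import Path
from typing import List, Optional, Tuple


def prune_older_than(
    root: Path,
    days: float,
    apply: bool = False,
    now: Optional[float] = None,
) -> Tuple[List[Path], int]:
    """Find files under root whose mtime is older than `days` days.

    Returns (files, total_bytes). Nothing is deleted unless apply=True,
    so the first call doubles as the dry-run preview.
    """
    cutoff = (time.time() if now is None else now) - days * 86400
    victims: List[Path] = []
    total = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        if stat.st_mtime < cutoff:
            victims.append(path)
            total += stat.st_size
    if apply:
        for path in victims:
            path.unlink()
    return victims, total
```

A UI would call it once with `apply=False` to fill in the "Found N files (would free X GB)" line, then call it again with `apply=True` after the user confirms.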
| Endpoint | Method | Purpose |
|---|---|---|
| /api/disk | GET | Snapshot (cached 3 s) |
| /api/disk/wipe | POST | Remove every video for a performer (requires confirm:true) |
| /api/disk/delete | POST | Remove specific file paths |
| /api/disk/prune_older | POST | Remove files older than N days (dry-run by default; add apply:true) |
| /api/disk/prune_to_free | POST | Remove oldest until N GB free |
| /api/disk/enforce_cap | POST | Keep a performer's archive under N GB by deleting oldest |
history.json is kept in sync automatically: if you wipe a performer, their entries disappear from history so they'll re-download cleanly on the next run.
python dedupe.py # scan & report, no changes
python dedupe.py --apply # actually delete dupes
python dedupe.py --performer alice_example     # limit to one

Dedup uses size + 64 KB head SHA1 + 64 KB tail SHA1, catching >99% of real duplicates in <50 ms per file. The keeper is the file with the longest filename (most descriptive title); ties go to the oldest mtime.
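The fingerprint and keeper rule described above can be sketched in a few lines. This is not the code in dedupe.py; `fingerprint`, `pick_keeper` and `find_duplicates` are hypothetical names chosen for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

CHUNK = 64 * 1024  # 64 KB read from each end of the file

Fingerprint = Tuple[int, str, str]


def fingerprint(path: Path) -> Fingerprint:
    """(size, SHA1 of first 64 KB, SHA1 of last 64 KB)."""
    size = path.stat().st_size
    with path.open("rb") as fh:
        head = hashlib.sha1(fh.read(CHUNK)).hexdigest()
        fh.seek(max(0, size - CHUNK))
        tail = hashlib.sha1(fh.read(CHUNK)).hexdigest()
    return size, head, tail


def pick_keeper(paths: Iterable[Path]) -> Path:
    """Longest filename wins; ties go to the oldest mtime."""
    return min(paths, key=lambda p: (-len(p.name), p.stat().st_mtime))


def find_duplicates(paths: Iterable[Path]) -> List[Tuple[Path, List[Path]]]:
    """Group files by fingerprint and return (keeper, duplicates) pairs."""
    groups: Dict[Fingerprint, List[Path]] = defaultdict(list)
    for path in paths:
        groups[fingerprint(path)].append(path)
    result = []
    for members in groups.values():
        if len(members) > 1:
            keeper = pick_keeper(members)
            result.append((keeper, [m for m in members if m != keeper]))
    return result
```

Reading only 128 KB per file is what keeps the scan fast on multi-gigabyte videos; the trade-off is that two files differing only in the middle would be treated as duplicates, which is why the accuracy is quoted as above 99% rather than exact.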
{
"output_dir": "C:\\...\\downloads",
"performers": ["alice_example", "bob_example"],
"enabled_sites": [],
"max_videos_per_site": 200,
"min_probe_entries": 1,
"max_parallel_probes": 8,
"max_parallel_downloads": 3,
"min_disk_gb": 5.0,
"use_aria2c": true,
"aria2c_connections": 16,
"rate_limit": "",
"cookies_from_browser": "",
"cookies_file": "",
"impersonate_target": "chrome",
"min_duration_seconds": 30.0,
"retries": 5,
"probe_timeout": 60,
"verbose": false
}

| Field | Purpose |
|---|---|
| performers | List used by --all and by the UI |
| enabled_sites | Empty = all sites. Otherwise a whitelist |
| max_videos_per_site | Rolling-window cap per performer per site per run |
| max_parallel_probes | How many site probes run concurrently |
| max_parallel_downloads | How many videos download concurrently |
| min_disk_gb | Pause if free space drops below this |
| use_aria2c | Toggle aria2c multi-segment downloader |
| aria2c_connections | Connections per file (16 = sweet spot) |
| rate_limit | Per-download cap, e.g. "500K" / "2M" |
| cookies_from_browser | "chrome" / "firefox"; picks up login cookies |
| cookies_file | Path to Netscape cookies.txt |
| impersonate_target | curl_cffi target, "chrome" is safe default |
| min_duration_seconds | Skip very short clips |
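Two of the fields above have semantics that are easy to get wrong: an empty `enabled_sites` means "everything", and `rate_limit` is a string with a unit suffix. The sketch below shows one way to interpret them. It is an illustration, not Harvestr's parser: `select_sites` and `parse_rate_limit` are hypothetical helpers, and treating K/M/G as powers of 1024 and matching site names case-insensitively are assumptions.

```python
import re
from typing import Iterable, List, Optional

_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_rate_limit(value: str) -> Optional[int]:
    """Convert "500K" / "2M" to bytes per second; "" means unlimited (None)."""
    value = value.strip().upper()
    if not value:
        return None
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMG]?)", value)
    if match is None:
        raise ValueError(f"unrecognised rate limit: {value!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def select_sites(all_sites: Iterable[str], enabled_sites: Iterable[str]) -> List[str]:
    """Empty whitelist = all sites; otherwise keep only whitelisted names."""
    allowed = {name.lower() for name in enabled_sites}
    if not allowed:
        return list(all_sites)
    return [site for site in all_sites if site.lower() in allowed]
```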
Some sites (recu.me, camwhores.tv private videos, camvault, X.com premium) require login cookies. See COOKIES_SETUP.md for the full cookie-export walkthrough.
Sites that do not need auth: Coomer, Kemono, RedGifs, Reddit (public), all KVS mirrors (tags/search pages).
Sites that benefit from auth: X.com (premium = 10× daily quota), Recu.me (premium = unlimited plays).
Sites that absolutely need auth: camwhores.tv "friend-locked" private videos, Recurbate premium downloads.
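Per-site cookie filtering means each scraper only sees the cookies for its own domain rather than the whole cookies.txt. The sketch below parses the Netscape format (seven tab-separated fields per line) and keeps entries for one site. `cookies_for_domain` is a hypothetical helper, and matching the domain plus its subdomains is an assumption about how the filter behaves.

```python
from typing import Dict


def cookies_for_domain(cookies_txt: str, site_domain: str) -> Dict[str, str]:
    """Return {name: value} for cookies that belong to site_domain.

    Netscape cookies.txt columns:
    domain, include-subdomains flag, path, secure, expiry, name, value.
    """
    wanted = site_domain.lower().lstrip(".")
    cookies: Dict[str, str] = {}
    for line in cookies_txt.splitlines():
        if line.startswith("#HttpOnly_"):
            line = line[len("#HttpOnly_"):]  # HttpOnly cookies carry this prefix
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # malformed row
        domain = fields[0].lower().lstrip(".")
        if domain == wanted or domain.endswith("." + wanted):
            cookies[fields[5]] = fields[6]
    return cookies
```

For X.com, for instance, the result would be expected to contain the `auth_token` and `ct0` entries mentioned in the capability table, and nothing from unrelated sites.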
┌─────────┐     ┌──────────────────┐     ┌─────────────────────┐
│  CLI    │ --> │ UniversalDown-   │ --> │ probe_all_sites     │
│  or     │     │ loader           │     │ (parallel fanout)   │
│  UI     │     │ (orchestrator)   │     └─────┬───────────────┘
└─────────┘     └──────────────────┘           │
                                               │
        ┌──────────────────────────────────────┴────────────┐
        │                                                   │
        ▼                                                   ▼
┌────────────────┐                                 ┌─────────────────┐
│ yt-dlp flat    │                                 │ custom scrapers │
│ extraction (29 │                                 │ (25+ classes)   │
│ sites cfg'd)   │                                 │ Coomer, Kemono, │
└───────┬────────┘                                 │ KVS family, ... │
        │                                          └────────┬────────┘
        └─────────────┬─────────────────────────────────────┘
                      ▼
              ┌────────────────┐
              │ filter_new     │ <-- cross-mirror dedup by video_id
              │ (history +     │ <-- URL / title filter (Macy Cartel etc.)
              │ failed.json)   │
              └───────┬────────┘
                      ▼
        ┌────────────────────────────┐
        │ download_videos            │
        │ ┌──────────────────────┐   │
        │ │ aria2c (MP4)         │   │
        │ │ ffmpeg (HLS/DASH)    │   │
        │ │ curl fallback        │   │
        │ └──────────────────────┘   │
        └─────────────┬──────────────┘
                      ▼
              ┌────────────────┐
              │ atomic history │
              │ write + lock   │
              └────────────────┘
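The filter_new stage in the diagram is where cross-mirror dedup happens: a video that appears on several mirrors under the same video_id is queued only once, and anything already recorded is skipped. The sketch below illustrates the idea. `Ref` is a hypothetical stand-in for the project's video reference type, and the assumption is that mirrors of the same video share a video_id.

```python
from typing import Iterable, List, NamedTuple, Set


class Ref(NamedTuple):
    """Hypothetical stand-in for a video reference."""
    site: str
    video_id: str
    url: str


def filter_new(refs: Iterable[Ref], seen_ids: Set[str]) -> List[Ref]:
    """Drop refs already in history or already queued from another mirror.

    The first mirror to list a video_id wins, so the order of `refs`
    doubles as the source-preference order.
    """
    seen = set(seen_ids)  # copy so the caller's set is not mutated
    fresh: List[Ref] = []
    for ref in refs:
        if ref.video_id in seen:
            continue
        seen.add(ref.video_id)
        fresh.append(ref)
    return fresh
```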
Every scraper in custom_scrapers.py implements these methods:

from typing import List, Optional

# SiteScraper, ProbeHit and VideoRef come from the Harvestr codebase.

class MyScraper(SiteScraper):
    NAME = "mysite"
    BASE_URL = "https://mysite.com"
    CATEGORY = "adult"            # or "mainstream", "archive"
    COOKIE_DOMAIN = "mysite.com"  # optional: filter cookies.txt per site

    def probe(self, username) -> Optional[ProbeHit]:
        """Cheap test: does this user exist here? Return None or a hit."""

    def enumerate(self, hit, username, limit) -> List[VideoRef]:
        """List all video refs (may or may not populate stream URL)."""

    def extract_stream(self, ref) -> bool:
        """Resolve a ref's playable URL (m3u8 / direct mp4). Returns True on success."""

Register the class in ALL_SCRAPER_CLASSES at the bottom of
custom_scrapers.py and it's auto-picked-up by both CLI and UI.
Everything under downloads/:
- history.json: successful downloads, keyed by {performer: {site|video_id: info}}
- failed.json: failures, marked permanent after 3 attempts if dead / private
- universal.log: full debug log (also tailed live in the UI)
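A common way to keep a JSON state file like history.json intact across crashes and concurrent threads is to write to a temporary file in the same directory and swap it into place under a lock. The sketch below follows the `{performer: {site|video_id: info}}` layout listed above; `record_download` is a hypothetical helper and the locking scheme is an assumption, not a copy of Harvestr's implementation.

```python
import json
import os
import tempfile
import threading
from pathlib import Path

_LOCK = threading.Lock()  # serialises read-modify-write within one process


def record_download(history_path: Path, performer: str, site: str,
                    video_id: str, info: dict) -> None:
    """Add one entry to the history file without ever leaving it half-written."""
    with _LOCK:
        try:
            history = json.loads(history_path.read_text(encoding="utf-8"))
        except FileNotFoundError:
            history = {}
        history.setdefault(performer, {})[f"{site}|{video_id}"] = info

        # Write next to the target so the final rename stays on one filesystem.
        fd, tmp = tempfile.mkstemp(dir=str(history_path.parent), suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(history, fh, indent=2)
            # os.replace overwrites an existing destination on POSIX and Windows.
            os.replace(tmp, history_path)
        except BaseException:
            os.unlink(tmp)
            raise
```

Readers either see the old file or the new one, never a truncated mix, which is what makes an interrupted run safe to resume.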
Live end-to-end smoke test for the new scrapers (actually downloads one small clip per working scraper):
python tests/smoketest_new_scrapers.py

Expected output:
[PASS] Coomer (OnlyFans/Fansly mirror) PIPELINE OK (download skipped: CDN unreachable from this network)
[PASS] Kemono (Patreon/Fanbox mirror) PIPELINE OK (download skipped: CDN unreachable from this network)
[PASS] RedGifs OK user=toasted500 3.51 MB -> ...
[PASS] Reddit user OK user=GallowBoob 9.06 MB -> ...
[FAIL] X.com (needs auth cookies) (expected: no cookies.txt)
Coomer/Kemono produce valid URLs but their CDN shards (n1-n4.coomer.st /
equivalents) are blocked by some ISPs; use a VPN if the actual download step
times out.
Both the Archive and Live tabs have a Bulk button next to the + Add
form. It opens a dialog with a textarea + JSON-upload button.
Paste one username per line (or comma-separated). Lines starting with #
are treated as comments. Uploading a JSON file merges:
- performers[] (union, case-insensitive dedup)
- enabled_sites[]
- scalar settings like max_videos_per_site, max_parallel_downloads, etc.

Same schema as config.example.json, so you can drop in another Harvestr
install's config.
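The paste rules for the Archive bulk dialog (one name per line or comma-separated, `#` comment lines, case-insensitive union with the existing list) can be sketched as follows. `parse_bulk_names` is a hypothetical helper, and keeping the first-seen spelling of each name is an assumption.

```python
from typing import Iterable, List


def parse_bulk_names(text: str, existing: Iterable[str] = ()) -> List[str]:
    """Merge pasted usernames into an existing performer list.

    Accepts one name per line or comma-separated names, ignores blank
    lines and lines starting with '#', and dedups case-insensitively.
    """
    merged = list(existing)
    seen = {name.lower() for name in merged}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        for name in line.split(","):
            name = name.strip()
            if name and name.lower() not in seen:
                seen.add(name.lower())
                merged.append(name)
    return merged
```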
Paste one model per line. Format: username Site [room_id].
alice_model Chaturbate
bob_model StripChat 987654 # with room id
charlie_m Cam4
Uploading a JSON file accepts the same schema as StreaMonitor's
config.json: an array of {"username","site","room_id?"} objects.
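The `username Site [room_id]` line format for the Live bulk dialog is simple enough to parse by whitespace. The sketch below turns pasted text into entries shaped like the JSON objects described above. `LiveEntry` and `parse_live_lines` are hypothetical names; stripping trailing `#` comments follows the example lines, and site names are assumed to contain no spaces.

```python
from typing import List, NamedTuple, Optional


class LiveEntry(NamedTuple):
    username: str
    site: str
    room_id: Optional[str]


def parse_live_lines(text: str) -> List[LiveEntry]:
    """Parse 'username Site [room_id]' lines, ignoring blanks and # comments."""
    entries: List[LiveEntry] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        parts = line.split()
        if len(parts) not in (2, 3):
            raise ValueError(f"expected 'username Site [room_id]', got: {raw!r}")
        room_id = parts[2] if len(parts) == 3 else None
        entries.append(LiveEntry(parts[0], parts[1], room_id))
    return entries
```

Each entry converts directly to the upload schema with `entry._asdict()`, dropping `room_id` when it is None.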
- POST /api/config/performer/bulk_add: {"text": "..."} or {"names": [...]}
- POST /api/config/import: merge any config JSON
- POST /api/live/bulk_add: {"text": "..."} or {"entries": [...]}
This is the most common "nothing happens" scenario and it usually means two things combined:
- Your chosen performer has a narrow web footprint: Coomer is the only place with their content. Not every archived creator exists on every mirror. Harvestr logs No hits for '<name>' on: leakedzone, fapello, kemono, … at the end of the probe phase so you can see exactly which scrapers reported zero content.
- Your network can't reach Coomer's sharded CDN (n1-n4.coomer.st, subnet 91.149.227.0/24). We documented in April 2026 that this subnet was null-routed globally for a stretch; even when it's back, many ISPs IP-block the range regardless.
Fix options (in order of least effort):
- Set a download proxy in the UI's Settings card: socks5://127.0.0.1:9055 (built-in Tor; click the "Use Tor" button, ~60s to bootstrap)
- Connect a VPN (Mullvad: Switzerland/Netherlands/Sweden exits work best; US exits often apply SNI filtering that blocks Coomer entirely)
- Try a different network: a mobile hotspot often routes differently than a fixed-line ISP
The scraper is correct; only the route to the bytes is broken.
Make sure you haven't just completed a run; the progress card is only shown while a session is active. If you're on the Archive tab and don't see progress, check the Live tab badge (top-right of the nav) to see if something's running over there.
CamSmut has two layers of anti-scraping:
- Hash obfuscation in URLs: solved (we reverse the pointerover JS transform automatically).
- Cloudflare + JS-rendered player on the embed host (playmogo.com, doodstream, etc.): NOT solved with pure HTTP. The embed returns a 403 Cloudflare challenge unless rendered in a real browser.
Workaround: use the standalone Playwright-based camsmut downloader at
C:\Users\<you>\Documents\Scripts\Downloaders\camsmut\camsmut_downloader.py
for actual downloads. Harvestr's built-in CamSmut scraper correctly probes
and enumerates but marks individual videos NEEDS-BROWSER (skipped, not
failed) so they don't pollute failed.json.
The vendored StreaMonitor under live_backend/streamonitor/ failed to
import. Most likely causes:
- You deleted live_backend/ (restore from git)
- You set HARVESTR_STREAMONITOR to an invalid path (unset it or point at a real StreaMonitor checkout)
- A Python version mismatch (StreaMonitor targets 3.10+; upgrade if on 3.8-3.9)
When you run Harvestr for an OnlyFans creator, all three scrapers probe in parallel. Order of preference at download time:
- Leakedzone: HLS streams over the leakedzone.com main domain. Reachable from networks where Coomer is blocked. A single-pass decoder pulls fresh signed URLs → ffmpeg immediately (URLs expire in ~5 min).
- Fapello: best for photo archives. Most creators on Fapello are image-only (Harvestr skips images), so it's often 0 videos in practice.
- Coomer: when its CDN is up, it wins on coverage. The CDN-health pre-check short-circuits the 200+ video attempts when Coomer is down.
- Kemono: Patreon/Fanbox content (different scope than the OF trio).
Cross-mirror dedup handles the common case of a single video appearing on multiple sources, so you'll only ever end up with one copy on disk.
This tool is for archiving content you have a right to access: creators you subscribe to, content in the public domain, content under permissive licenses, backups of your own uploads, etc.
Don't use it to:
- Redistribute copyrighted content
- Bypass paywalls for content you don't have a legitimate license to
- Scrape at a rate that abuses or disrupts a host site
- Circumvent technological protection measures that violate your local jurisdiction's anti-circumvention laws
You are responsible for complying with each site's Terms of Service and your local law. The authors disclaim any liability for misuse.
MIT; see LICENSE.
Stands on the shoulders of:
- yt-dlp: the universal extractor
- aria2: multi-segment downloads
- ffmpeg: HLS / DASH demuxing
- curl_cffi: Chrome TLS fingerprint
- cloudscraper: Cloudflare IUAM bypass
- Flask: web UI
- StreaMonitor: the entire Live-mode backend is StreaMonitor's 19-site Bot framework. We import it at runtime rather than reimplementing the per-site reverse-engineering; they've earned those lines of code the hard way.
Harvestr is a proof of concept. It exists to demonstrate what a cross-site username→video pipeline looks like in practice: probes, custom scrapers, embed extractors, live recording, drift detection, UI, all glued together. It is not a production-grade product: sites change layouts weekly, CDNs rotate, and any one of the 50+ supported endpoints can go silently broken overnight. Treat every download run as experimental, verify the output before trusting it, and expect the occasional breakage that needs a quick scraper patch. PRs welcome, but please understand this project prioritises "does it work today?" over long-term maintenance commitments.