Eidolons

A personal, portable team of AI agents. Named specialists that work alone when the task is sharp, in harmony when it's big — and travel with you from project to project, host to host.

Most AI coding tools ship a single generalist that plans, scouts, builds, and documents all at once — and hits a ceiling fast. Eidolons is a different shape: eight independently-versioned specialists across seven roles, one CLI, dropped into any project. You get sharp boundaries instead of one confused generalist — the right specialist for each phase, over a shared memory that carries context between them.

And the routing is mechanical, not hopeful. Most multi-agent setups are a paragraph in CLAUDE.md the model is free to ignore — so it does, until you name an agent yourself. Eidolons installs real per-host hooks: at session start a deterministic, non-LLM kernel computes the routing decision and injects it — plus recalled memory — into context, on its own, every time. The team travels across Claude Code, Codex, GitHub Copilot, Cursor, and OpenCode, and degrades gracefully to documentary routing wherever a host's hooks aren't sound.

Try it in 60 seconds

Evaluation, not commitment — this drops a read-only ATLAS into a throwaway folder:

curl -sSL https://raw.githubusercontent.com/Rynaro/eidolons/main/cli/install.sh | bash
cd /tmp && mkdir eidolons-demo && cd eidolons-demo
eidolons init --preset minimal --non-interactive

Explore, then rm -rf /tmp/eidolons-demo and walk away. Full flow in Install.

Does it actually work?

Two questions, both measured against a bare host running the same model — no hand-waving.

1. Does wiring in the team change what the host actually does? eidolons eval compliance runs one prompt suite through a headless host twice — once with the harness wired, once with only the prose cortex (≈ a bare host) — and scores how it routes.

| Routing (Claude Code · k=2 · 56 sessions) | Prose only | With Eidolons |
| --- | --- | --- |
| Stability: picks the right specialist on both runs | 16.7% | 58.3% (3.5×) |
| Routes "which approach?" → the reasoner | 0% | 100% |
| Correct target overall | 58.3% | 66.7% |
| False delegation on control prompts | 0% | 0% |

The signature win is consistency: the injection makes routing far more deterministic. (This is a SessionStart-only lower bound: headless hosts don't fire per-prompt hooks, so the interactive number is expected to be higher. Honest writeup: .spectra/research/compliance-eval-2026-06-12.md.)

2. Does the specialist shape beat one generalist pass? On an adversarial-hard coding suite (budget-matched, k=2), measured in Vivi's own repo, Vivi's parallel-candidate shape lands every fix where a single pass lands two-thirds:

| Hard-task fix quality (pass², resolved on both runs) | Single pass | Vivi (fanout) |
| --- | --- | --- |
| Adversarial-hard suite | 0.67 | 1.00 |

Zero reward-hacks in 63 holdout-gated runs. And Kupo, the executor the team delegates micro-tasks to, earned its roster seat on a behavioral additive-proof — 36/36 tasks, pass³ 1.00.
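
The pass^k figures above count a task as solved only if it passes on every one of its k runs. A minimal sketch of that arithmetic, assuming results arrive as (task, passed) pairs; the function name `pass_hat_k` and the sample outcomes are hypothetical and are not the eval data reported above:

```python
from collections import defaultdict


def pass_hat_k(runs, k):
    """Fraction of tasks that passed on all k of their runs.

    `runs` is an iterable of (task_id, passed) pairs, one pair per run.
    Every task must have exactly k recorded runs.
    """
    by_task = defaultdict(list)
    for task_id, passed in runs:
        by_task[task_id].append(bool(passed))
    if not by_task:
        raise ValueError("no runs recorded")
    for task_id, outcomes in by_task.items():
        if len(outcomes) != k:
            raise ValueError(f"{task_id}: expected {k} runs, got {len(outcomes)}")
    solved = sum(all(outcomes) for outcomes in by_task.values())
    return solved / len(by_task)


# Made-up outcomes for illustration only.
example = [
    ("task-a", True), ("task-a", True),
    ("task-b", True), ("task-b", False),  # one failed run sinks the task
    ("task-c", True), ("task-c", True),
    ("task-d", True), ("task-d", True),
]
print(pass_hat_k(example, k=2))  # 0.75
```

In this made-up sample seven of eight individual runs pass (0.875), yet pass² is 0.75, which is why gating on pass^k is stricter than gating on a single green run.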

Don't trust our numbers — reproduce the floor yourself. The behavioral evals above are billed and model-dependent, but the routing decision underneath them is a deterministic, non-LLM kernel — so we ship it as a benchmark anyone can run cold, with no API key, no billing, and ~0 tokens:

eidolons eval routing --suite public      # 15 labelled tasks across 12 routing categories

It grades the kernel's output against Eidolons-authored ground truth (evals/routing-suite.yaml); because the kernel is deterministic, pass^k == pass^1 and you'll get the exact same result we do. (--validate-suite self-tests the suite; --json for machine output.) This is the reproducible floor the billed evals build on — verify it, then weigh the rest.
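
Why determinism gives pass^k == pass^1 can be seen in a small sketch. It assumes the kernel behaves as a pure function of the prompt; `toy_router`, `grade`, and the three labelled tasks are hypothetical stand-ins, not the real kernel or the contents of evals/routing-suite.yaml:

```python
def grade(router, suite):
    """Per-task booleans: did the router return the labelled target?"""
    return [router(task["prompt"]) == task["expected"] for task in suite]


def toy_router(prompt):
    # Hypothetical stand-in for the kernel: a pure function of the prompt,
    # with no randomness, clock, or network access.
    return "forge" if "which approach" in prompt.lower() else "atlas"


# Hypothetical labelled tasks, for illustration only.
suite = [
    {"prompt": "Which approach fits here, queue or cron?", "expected": "forge"},
    {"prompt": "Map this repository before we touch it", "expected": "atlas"},
    {"prompt": "Plan the billing feature", "expected": "spectra"},
]

k = 3
runs = [grade(toy_router, suite) for _ in range(k)]
pass_1 = sum(runs[0]) / len(suite)
# A task counts toward pass^k only if it passed in every run.
pass_k = sum(all(column) for column in zip(*runs)) / len(suite)
print(pass_1 == pass_k)  # True
```

Every run produces the same per-task verdicts, so a task that passes once passes k times, and one that fails once fails every time.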

These are early, small-N signals, framed honestly in the research digests and CHANGELOG.md — not marketing.

Meet the team

| Eidolon | What it does | Reach for it when… |
| --- | --- | --- |
| ATLAS (scout) | Maps an unfamiliar codebase without writing a line. Evidence-anchored, read-only by construction. | Auditing a new repo, onboarding, before any change. |
| SPECTRA (planner) | Turns a rough idea or scout report into a decision-ready spec: rubrics, gates, GIVEN/WHEN/THEN. | Planning a feature before you build it. |
| Vivi (coder · default) | The default coder. Brownfield, pattern-first, test-anchored; drives a closed edit-run-test loop and gates on pass^k instead of one green run. | Shipping the change SPECTRA planned, on a loop-capable host. |
| APIVR-Δ (coder · fallback) | Vivi's conservative predecessor, for hosts without the closed loop. Same discipline, non-loop posture; add with eidolons add apivr. | A loop-incompetent host, or a cautious builder. |
| IDG (scriber) | Synthesizes docs from sessions, specs, and deltas; provenance-first, with [GAP]/[DISPUTED] markers. | Chronicling what you just built. |
| FORGE (reasoner) | Deliberates on ambiguous trade-offs. Names alternatives, surfaces assumptions, returns a verdict + confidence. | Two patterns apply and the choice isn't obvious. |
| VIGIL (debugger) | Forensic debugger for failures that resist normal repair. Reproduction-gated, counterfactual-verified. | A flaky test, heisenbug, or unexplained regression. |
| Kupo (executor) | Low-effort delegate target. Patches an ephemeral sandbox, proves it with a real verifier, and proposes the patch back; never writes the real tree. | Offloading trivial localized edits to keep a session lean. |
| CRYSTALIUM (memory) | The shared four-layer memory substrate every member writes to and recalls from: tier-gated writes, hybrid recall, Dream consolidation, principled forgetting. | Carrying context and learned patterns across sessions and members. |

Eight shipped specialists across seven capability classes — scout, planner, coder, scriber, reasoner, debugger, executor — plus CRYSTALIUM, the memory substrate underneath them all. Versions and handoff contracts live in roster/index.yaml, the machine-readable source of truth.

How they compose

The team has a default shape: ATLAS scouts, SPECTRA plans, Vivi builds, IDG chronicles. FORGE and VIGIL are lateral specialists, consultable at any stage. CRYSTALIUM sits underneath all of them, the shared memory every member writes handoffs into and recalls from. Partial teams are first-class: bring just ATLAS to an audit, or the full pipeline to a greenfield project.

Canonical pipeline
ATLAS ───▶ SPECTRA ───▶  Vivi  ───▶ IDG
  scout      plan         build       chronicle
             ▲             │ ▲
           FORGE ◀── (ambiguity, trade-offs, novel problems)
                           │ │
                         VIGIL ◀── (failure resisted repair; forensic attribution)
                           │
                         Kupo ◀── (localized verifier-backed micro-tasks, PROPOSE-only)

  ╞════════════════ CRYSTALIUM ════════════════╡
   shared memory — every member commits handoff
   artifacts and recalls them (bidirectional)

Handoffs are structured artifacts written to disk, not free-form messages. See methodology/composition.md for the contract table and partial-team matrix.
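
To make "structured artifacts written to disk" concrete, here is a minimal sketch of one member writing a handoff and the next one reading it back. The JSON layout, the field names (`from`, `to`, `kind`, `body`), and the file naming are all assumptions for illustration; the actual contract is the one in methodology/composition.md:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical required fields; the real contract may differ.
REQUIRED = {"from", "to", "kind", "body"}


def write_handoff(directory, sender, receiver, kind, body):
    """Write one handoff as a JSON file and return its path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    artifact = {"from": sender, "to": receiver, "kind": kind, "body": body}
    path = directory / f"{sender}-to-{receiver}.json"
    path.write_text(json.dumps(artifact, indent=2, sort_keys=True) + "\n",
                    encoding="utf-8")
    return path


def read_handoff(path):
    """Load a handoff and reject it if a required field is missing."""
    artifact = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = REQUIRED - set(artifact)
    if missing:
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    return artifact


with tempfile.TemporaryDirectory() as tmp:
    path = write_handoff(tmp, "atlas", "spectra", "scout-report",
                         {"summary": "example"})
    print(path.name)                 # atlas-to-spectra.json
    print(read_handoff(path)["to"])  # spectra
```

The point of the shape is that the receiver validates a file rather than parsing a chat message, so a malformed handoff fails loudly instead of being silently misread.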

Mechanical routing — the harness

A descriptor table in a prose file can only suggest delegation; the host decides whether to listen, and usually it doesn't until you name an Eidolon yourself. The decision to build a mechanical harness wasn't a hunch — it's backed by a research synthesis of 112 adversarially-verified capability rows across 18 agents (DOSSIER-HARNESS-2026-06.md). The harness closes that gap with three pieces:

  • A deterministic routing kernel. eidolons run "<prompt>" --json classifies a prompt against roster/routing.yaml — no LLM, fully reproducible — and emits which Eidolon(s) handle it, at what tier, in what chain.
  • Per-host hook adapters. eidolons harness install wires that kernel into each host's own lifecycle hooks. At session start the routing artifact and a memory digest are injected as context — the host doesn't have to remember to delegate; the routing arrives on its own.
  • Graceful degradation. Routing is injected by default (advisory-mechanical). Where a host's hooks are absent or buggy, it silently falls back to documentary routing — never worse than prose.

eidolons harness install            # wire routing + memory injection into detected hosts
eidolons harness install --strict   # opt-in: add tool-boundary delegate-or-deny where sound
eidolons harness status             # per-host effective enforcement tier
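
The idea of a non-LLM routing kernel can be sketched as a first-match lookup over a rule table. This is only an illustration: the `RULES` table, its keywords, and the output fields are hypothetical, and the schema of roster/routing.yaml and the real `--json` output are not shown in this README:

```python
import json

# Hypothetical rule table; the real one lives in roster/routing.yaml.
RULES = [
    {"keywords": ("flaky", "heisenbug", "regression"), "eidolon": "vigil"},
    {"keywords": ("which approach", "trade-off"), "eidolon": "forge"},
    {"keywords": ("plan", "spec"), "eidolon": "spectra"},
    {"keywords": ("audit", "onboard", "map"), "eidolon": "atlas"},
]


def route(prompt):
    """Classify a prompt by plain substring match; the first rule wins."""
    text = prompt.lower()
    for rule in RULES:
        hit = next((kw for kw in rule["keywords"] if kw in text), None)
        if hit is not None:
            return {"eidolon": rule["eidolon"], "matched": hit}
    # No rule matched: leave the decision to the host.
    return {"eidolon": None, "matched": None}


print(json.dumps(route("Why is this test flaky on CI?"), sort_keys=True))
# {"eidolon": "vigil", "matched": "flaky"}
```

Because the function reads only its input and a static table, the same prompt always yields the same decision, which is the property the harness relies on when it injects routing at session start.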

For a hard backstop, --strict adds a PreToolUse delegate-or-deny tier that mechanically blocks direct main-loop edits (only delegated subagents may write), with soundness graded per host. When CRYSTALIUM is installed, the session-start hook also runs eidolons memory preflight — a one-shot recall that injects prior project memory, fail-open and bounded. The full per-host capability matrix is in DOSSIER-HARNESS-2026-06.md and docs/architecture.md § "Harness Layer".
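
The tiering described above (documentary fallback, advisory-mechanical default, opt-in strict where sound) can be read as a small decision function. A sketch under stated assumptions: the function names, parameters, and tier strings are illustrative, not the harness's actual interface:

```python
def effective_tier(hooks_sound, strict_requested=False, strict_sound=False):
    """Pick the enforcement tier a host ends up with."""
    if not hooks_sound:
        # Hooks absent or buggy: fall back to prose-only routing.
        return "documentary"
    if strict_requested and strict_sound:
        # Opt-in, and only where the host's tool-boundary hook is sound.
        return "strict"
    return "advisory-mechanical"


def pre_tool_use(tier, tool_is_write, caller_is_subagent):
    """Return 'allow' or 'deny' for a single tool call."""
    if tier == "strict" and tool_is_write and not caller_is_subagent:
        # Direct main-loop edits are blocked; delegated subagents may write.
        return "deny"
    return "allow"


tier = effective_tier(hooks_sound=True, strict_requested=True, strict_sound=True)
print(tier)                                                                 # strict
print(pre_tool_use(tier, tool_is_write=True, caller_is_subagent=False))     # deny
print(pre_tool_use(tier, tool_is_write=True, caller_is_subagent=True))      # allow
```

Requesting strict on a host whose hooks are not sound simply yields a lower tier in this sketch, mirroring the "never worse than prose" fallback.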

Spec-Driven lifecycle — ESL

Routing decides who works. ESL — the Eidolons Spec Lifecycle — decides how a change moves, so non-trivial work runs through a right-sized, auditable lifecycle instead of a one-shot prompt. It isn't a second framework bolted on: the specialists you already have are the lifecycle — SPECTRA specifies, FORGE deliberates, Vivi implements, Kupo/VIGIL verify, IDG archives — and ESL is the thin grammar that sequences them, change by change, on disk under .spectra/changes/. Each Eidolon ships its own lifecycle hop; the cortex orchestrates the rest.

It's built deliberately against the documented failure modes of spec-driven development — over-specification, instruction bloat, spec-as-waterfall, "spec" as a throwaway prompt:

  • A mechanical right-sizing gate classifies every change by observable signals — trivial → Kupo direct (no ceremony), lite → one-page spec, full → the whole lifecycle. You can't over-specify a one-line fix.
  • maker ≠ checker, enforced — the implementer and the verifier are mechanically distinct identities, checked on the hand-off envelope. A change cannot self-verify.
  • Drift-check before archive re-derives the change against its living spec, catching implementation that outran the intent.
  • Opt-in, then mechanically forced — advisory by default (SHOULD open a change first); escalates to blocking (MUST) once the project crosses mechanical size thresholds (change-count / repo-LOC / full-spec ratio), recorded auditably in the lock. With tonberry installed, the harness injects an ESL reminder at every session start and on every non-trivial routed prompt — not left to memory. Trivial work is always exempt; install auto-assesses (skip with EIDOLONS_SKIP_AUTO_ASSESS=1).
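
The gates in the list above can be sketched as three small checks. Everything concrete here is an assumption for illustration: the `Change` fields, the numeric thresholds, and the function names are hypothetical, and the real signals and limits are defined by the ESL contract:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    files_touched: int
    lines_changed: int
    touches_public_api: bool


def right_size(change):
    """Classify a change as trivial, lite, or full (hypothetical thresholds)."""
    small = change.files_touched <= 1 and change.lines_changed <= 10
    if small and not change.touches_public_api:
        return "trivial"  # direct to Kupo, no ceremony
    if change.files_touched <= 5 and not change.touches_public_api:
        return "lite"     # one-page spec
    return "full"         # the whole lifecycle


def enforcement(change_count, repo_loc, full_spec_ratio,
                max_changes=50, max_loc=20_000, max_ratio=0.5):
    """Advisory (SHOULD) until any size threshold is crossed, then MUST."""
    crossed = (change_count >= max_changes or repo_loc >= max_loc
               or full_spec_ratio >= max_ratio)
    return "MUST" if crossed else "SHOULD"


def check_maker_checker(implementer, verifier):
    """Reject a hand-off whose implementer would verify its own work."""
    if implementer == verifier:
        raise ValueError("maker and checker must be distinct identities")


print(right_size(Change(1, 3, False)))    # trivial
print(right_size(Change(3, 80, False)))   # lite
print(right_size(Change(1, 3, True)))     # full
print(enforcement(5, 1_000, 0.1))         # SHOULD
check_maker_checker("vivi", "vigil")      # passes silently
```

All three checks read only observable facts about the change or the project, so the outcome does not depend on anyone remembering to apply the rule.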

The official implementation is tonberry — a thin (~13 MB, distroless) Go MCP whose verify is byte-identical to a zero-dependency bash 3.2 conformance checker, so the rich runtime and the minimal reference can never drift. Install it with eidolons mcp install tonberry; the contract it implements is Rynaro/eidolons-esl. ESL is opt-in — absent the MCP, the Eidolons route and build exactly as before. Once installed, the surfacing is mechanical — injected at session start every time, not (yet) a hard edit-time block.

Install

One-time, global:

curl -sSL https://raw.githubusercontent.com/Rynaro/eidolons/main/cli/install.sh | bash

This installs the eidolons CLI to ~/.local/bin/eidolons and caches the nexus at ~/.eidolons/nexus.

Per project — empty folders or running projects:

cd <any-project>
eidolons init                # interactive — choose members and preset (offers CRYSTALIUM memory)
eidolons add forge           # add a single member later
eidolons sync                # reconcile installed members to eidolons.yaml
eidolons harness install     # wire mechanical routing + memory injection into your hosts
eidolons verify              # re-check installed Eidolons against the roster's signed metadata

Keep the nexus current with eidolons upgrade self (atomic, integrity-verified, with --check and --rollback); upgrade installed Eidolons with eidolons upgrade. MCP servers — CRYSTALIUM memory and the tonberry ESL runtime — are a separate catalogue managed through eidolons mcp {list,install,upgrade,…}. Commit eidolons.lock alongside eidolons.yaml for reproducible, tamper-evident installs. Full walkthrough: docs/getting-started.md.

Verified releases

Every shipped Eidolon publishes attestation-backed releases through one canonical workflow (eidolon-release-template.yml) hosted here. Each release records its commit, tree, and archive SHA-256 into roster/index.yaml via Roster Intake. Under the default integrity.enforcement: strict posture, eidolons sync and eidolons verify abort with exit 1 if any installed Eidolon's checksum drifts from the signed metadata — the same gate Roster Health runs nightly. Nexus releases use the same model. Read the trust model at docs/release-integrity.md.
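
The consumer-side gate amounts to recomputing each archive's SHA-256 and comparing it with the recorded value. A minimal sketch using Python's standard hashlib; the `verify` signature, the shape of `installed`, and the behaviour outside strict mode are assumptions, not the CLI's implementation:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(installed, enforcement="strict"):
    """Return a process exit code: 1 on checksum drift under strict, else 0.

    `installed` maps a member name to (archive_path, expected_sha256).
    Assumption: outside strict mode, drift is reported but not fatal.
    """
    drifted = [
        name
        for name, (path, expected) in installed.items()
        if sha256_of(path) != expected.lower()
    ]
    for name in drifted:
        print(f"checksum drift: {name}")
    return 1 if drifted and enforcement == "strict" else 0
```

A caller would pass the result to `sys.exit`, so any drift under the strict posture aborts with exit code 1 as described above.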

What's in this repo

| Area | What it contains |
| --- | --- |
| roster/ | Machine-readable registry of every Eidolon: versions, repos, handoffs; the routing table (routing.yaml) and MCP catalogue (mcps.yaml) |
| methodology/ | Design principles, composition contracts, the routing cortex, vocabulary |
| research/ | Papers, citations, production patterns, scientific backing |
| cli/ | The eidolons command-line tool: installs, wires, and orchestrates the team |
| schemas/ | JSON Schemas for eidolons.yaml, eidolons.lock, roster entries, eval suites |
| docs/ | Getting started, architecture, CLI reference, MCP store, model management, release integrity |
| examples/ | Worked examples: greenfield, brownfield, solo-member, partial-team |

Why a nexus, when each Eidolon is independently installable?

Each Eidolon is its own first-class repo, independently versioned — that's a hard invariant. The nexus is a coordinator, not an owner. It exists for what no single Eidolon can hold:

  1. Discovery — without a roster, nobody knows which Eidolons exist or how they relate.
  2. Composition — handoff contracts, pipeline conventions, the routing kernel, and partial-team patterns are shared assets.
  3. Research — the scientific backing for the whole program lives in one place instead of drifting across repos.
  4. Wiring — one eidolons add atlas,spectra,vivi beats fifty lines of clone-and-install docs, and eidolons harness install reaches hosts no individual Eidolon could.
  5. Supply-chain integrity — one canonical signing workflow, one ingestion path, one consumer-side gate. Independent signing schemes would defeat the trust model.

The four-layer architecture (install standard → Eidolon repos → this nexus → consumer project) is documented in docs/architecture.md. The install contract every Eidolon satisfies is the Eidolons Individual Install Standard (Rynaro/eidolons-eiis) — versioned independently; the CLI refuses to install non-conformant members.

Contributing

Per-Eidolon bugs and features belong in that Eidolon's repo (an ATLAS finding → Rynaro/ATLAS). CLI bugs, roster issues, and composition-contract changes belong here. Install-standard questions belong in Rynaro/eidolons-eiis. Unsure which layer owns a concern? docs/architecture.md maps the four layers and their responsibilities.

License

Apache-2.0. See LICENSE.
