Genie is a cross-platform app (iOS, Android, Web) where users describe a goal and a containerized LLM agent ("Genie") is provisioned to autonomously pursue that goal. Genies run on schedules, monitor the world, and proactively push updates to their user. Users control what each Genie can access on the network via a real-time approval system.
┌─────────────────────────────────────────────────────────┐
│ Frontend (React Native Web) │
│ iOS / Android / Web from one codebase │
│ │
│ ┌──────────┐ ┌──────────┐ ┌────────────┐ ┌──────────┐ │
│ │ Home / │ │ Create │ │ Genie │ │ Terminal │ │
│ │ List │ │ Flow │ │ Detail │ │ (Debug) │ │
│ └──────────┘ └──────────┘ └────────────┘ └──────────┘ │
└──────────────────────┬──────────────────────────────────┘
│ REST + WebSocket
┌──────────────────────▼──────────────────────────────────┐
│ Backend API (Node.js / TS) │
│ │
│ ┌────────────┐ ┌────────────┐ ┌───────────────────────┐│
│ │ Auth │ │ Scheduler │ │ Container Orchestrator││
│ │ Service │ │ Service │ │ (Docker / Fly / ECS) ││
│ └────────────┘ └────────────┘ └───────────────────────┘│
│ ┌────────────┐ ┌────────────┐ ┌───────────────────────┐│
│ │ Chat / │ │ Network │ │ Notification ││
│ │ Message Q │ │ Approval │ │ Service (APNS/FCM) ││
│ └────────────┘ └────────────┘ └───────────────────────┘│
└──────────────────────┬──────────────────────────────────┘
│
┌──────────────────────▼──────────────────────────────────┐
│ Genie Runtime (per container) │
│ │
│ ┌─────────────────────────────────────────────────────┐│
│ │ Genie Harness (Core IP) ││
│ │ ││
│ │ ┌──────────┐ ┌──────────┐ ┌───────────────────┐ ││
│ │ │ Planning │ │Execution │ │ Memory Manager │ ││
│ │ │ LLM │ │ LLM │ │ (read/write/ │ ││
│ │ │ │ │ │ │ summarize) │ ││
│ │ └──────────┘ └──────────┘ └───────────────────┘ ││
│ │ ┌──────────┐ ┌──────────┐ ┌───────────────────┐ ││
│ │ │ Tool │ │ Network │ │ Schedule │ ││
│ │ │ Runner │ │ Proxy │ │ Self-Recommender │ ││
│ │ │ (shell, │ │ (egress │ │ │ ││
│ │ │ files) │ │ filter) │ │ │ ││
│ │ └──────────┘ └──────────┘ └───────────────────┘ ││
│ │ ┌─────────────────────────────────────────────┐ ││
│ │ │ Metrics & Self-Evaluation Engine │ ││
│ │ │ (KPIs, tracking, periodic self-review) │ ││
│ │ └─────────────────────────────────────────────┘ ││
│ └─────────────────────────────────────────────────────┘│
│ │
│ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ Persistent Volume │ │ Debug SSH / Terminal Server │ │
│ │ (memory, state) │ │ (for drop-in diagnostics) │ │
│ └──────────────────┘ └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
┌──────────────────────▼──────────────────────────────────┐
│ Data Layer │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌───────────────────┐ │
│ │ Postgres │ │ Object Store │ │ Vector DB │ │
│ │ (users, │ │ (S3 - genie │ │ (genie long-term │ │
│ │ genies, │ │ artifacts) │ │ memory) │ │
│ │ perms) │ │ │ │ │ │
│ └──────────┘ └──────────────┘ └───────────────────┘ │
└─────────────────────────────────────────────────────────┘
Tech: React Native + react-native-web + Expo
Screens:
| Screen | Purpose |
|---|---|
| Auth | Sign up / login |
| Home | List of user's genies with status (running, sleeping, error) |
| Create Genie | Multi-step: describe goal -> LLM suggests config -> user reviews/approves -> deploy |
| Genie Detail | Chat interface, latest updates/briefings, network permissions panel, metrics dashboard, settings |
| Network Approvals | Pending approval requests (also push notifications) |
| Terminal | Web terminal (xterm.js) to drop into a genie's container for debugging |
| Settings | Account, notification preferences |
Key Features:
- Push notifications via APNS (iOS) and FCM (Android)
- WebSocket connection for real-time chat and status updates
- Offline message queuing (messages sent while genie is asleep are queued)
Tech: Node.js + TypeScript + Express/Fastify
Database: PostgreSQL
Key Services:
- JWT-based auth
- User management
- Provisions containers on demand (Docker on a VM cluster, or Fly.io Machines API, or AWS ECS)
- Start/stop/destroy genie containers
- Attaches persistent volumes for memory
- Manages container lifecycle (spin up on schedule, spin down after idle)
- Stores each genie's schedule (cron expressions)
- Triggers container wake-up at scheduled times
- Genie can recommend its own schedule during planning phase; user approves
- Proxies messages between user and genie
- Queues user messages when genie container is offline
- On container wake-up, delivers queued messages to genie
- Stores full conversation history in Postgres
- Receives egress requests from genie containers (via the network proxy)
- Creates approval requests
- Sends push notification to user
- On approval: updates firewall rules for that container
- Supports "allow once" vs "allow always" (per genie, per domain)
- APNS + FCM integration
- Sends: network approval requests, genie briefings/updates, genie status changes
This is the agent runtime that runs inside each container. It is the most critical component.
Tech: Python (best LLM tooling ecosystem)
- Used during genie creation to analyze the user's goal
- Suggests: container specs, schedule, model choices, required tools
- Also used by the genie for high-level reasoning and re-planning
- Model: configurable, suggested during creation (e.g., Claude Sonnet for simple tasks, Opus for complex)
- Handles the actual task execution: web scraping, data analysis, composing briefings
- Model: configurable, can be lighter/cheaper than the planning model
- Runs within tool-use loops
The genie's persistent brain. This is the key differentiator.
Memory Architecture:
┌─────────────────────────────────────┐
│ Memory Manager │
│ │
│ ┌───────────┐ ┌────────────────┐ │
│ │ Working │ │ Long-Term │ │
│ │ Memory │ │ Memory │ │
│ │ │ │ │ │
│ │ - Current │ │ - Vector DB │ │
│ │ task │ │ (semantic │ │
│ │ - Recent │ │ search) │ │
│ │ findings │ │ - Structured │ │
│ │ - Session │ │ knowledge │ │
│ │ state │ │ (JSON/SQLite)│ │
│ └───────────┘ └────────────────┘ │
│ │
│ ┌───────────────────────────────┐ │
│ │ Memory Lifecycle │ │
│ │ │ │
│ │ 1. After each task run: │ │
│ │ - Summarize findings │ │
│ │ - Extract key facts │ │
│ │ - Store in long-term │ │
│ │ │ │
│ │ 2. Before each task run: │ │
│ │ - Load relevant memories │ │
│ │ - Reconstruct context │ │
│ │ - Resume where left off │ │
│ │ │ │
│ │ 3. Periodically: │ │
│ │ - Consolidate/compress │ │
│ │ - Prune stale info │ │
│ │ - Re-rank importance │ │
│ └───────────────────────────────┘ │
└─────────────────────────────────────┘
Storage: Persistent volume mounted at /genie/memory/ survives container restarts.
working.json— current session stateknowledge.db— SQLite for structured factsvectors/— local vector index (e.g., ChromaDB) for semantic search over accumulated knowledgehistory/— compressed logs of past runs
Executes actions on behalf of the genie:
- Shell commands (sandboxed, non-root)
- File read/write (within the container)
- Web requests (routed through the network proxy)
- Data processing (Python libraries available)
- All outbound HTTP(S) from the container routes through a local proxy
- Proxy checks domain against the genie's allowlist
- If domain not approved: blocks request, sends approval request to backend
- If approved: forwards request
- Implemented as a transparent proxy (e.g., mitmproxy or a lightweight custom proxy)
The genie must measure its own performance against the user's goal. This is critical — without metrics, there's no feedback loop and no improvement.
How it works:
-
Metric Definition (at creation time): During the planning phase, the Planning LLM analyzes the user's goal and defines measurable KPIs. The user reviews and can adjust these.
Examples by goal type:
Goal Metrics "Monitor housing prices in Austin" - # of listings surfaced per week
- % of surfaced listings user found relevant (user feedback)
- Average time from listing appearing to user notification
- Coverage: % of major listing sources monitored"Daily briefing on Iran conflict" - Briefing delivered on time (Y/N per day)
- # of unique sources consulted
- User engagement: did user read/respond?
- User rating (optional thumbs up/down on briefings)"Monitor financial markets" - Briefing timeliness
- # of actionable insights flagged
- Accuracy of flagged trends (retroactive self-check)
- Source diversity -
Metric Collection (each run): After every task execution, the genie records metrics to a structured store:
/genie/memory/metrics/ ├── definitions.json # KPI definitions, targets, thresholds ├── observations.jsonl # Append-only log of metric data points per run └── evaluations.jsonl # Periodic self-evaluation summaries -
Self-Evaluation (periodic): On a configurable cadence (e.g., weekly, or every N runs), the Planning LLM reviews accumulated metrics and produces a self-evaluation:
- What's going well vs. what's underperforming
- Root cause analysis for missed targets
- Proposed adjustments (change sources, adjust schedule, refine search criteria)
- These adjustments are sent to the user for approval before being applied
-
User Feedback Loop:
- User can rate genie outputs (thumbs up/down, or 1-5 stars on briefings)
- User can flag irrelevant results ("this listing isn't what I'm looking for")
- This feedback is stored as a metric and factored into self-evaluation
- The genie learns what the user actually values over time
-
Metric Dashboard (in app):
- Genie Detail screen shows a simple performance summary
- Trend lines for key metrics over time
- Current self-evaluation score
- History of adjustments the genie has made
Storage: Metrics persist on the same mounted volume as memory, under /genie/memory/metrics/.
- After the planning phase, the genie suggests when it should be woken
- Examples: "I should check every 4 hours", "Once daily at 5:30 AM user's timezone"
- User approves/modifies the schedule
- Genie can also request schedule changes over its lifetime
- Lightweight SSH or WebSocket terminal server
- Allows the user to "drop in" to the container from the app
- Read-only mode available for safe inspection
- Full shell mode for debugging
PostgreSQL — primary database:
- Users, auth tokens
- Genies (config, status, schedule, goal, model choices)
- Conversations (messages between user and genie)
- Network permissions (per genie, per domain)
- Approval requests
Object Storage (S3 or equivalent):
- Genie artifacts (reports, generated files)
- Exported briefings
Vector DB (per genie, local in container):
- ChromaDB or similar embedded vector DB
- Stores genie's accumulated knowledge embeddings
- Persisted on the mounted volume
1. CREATION
User describes goal
→ Planning LLM analyzes goal
→ Suggests: container spec, schedule, models, estimated cost
→ User reviews and approves
→ Container provisioned, harness installed, genie initialized
2. FIRST RUN
Genie reads its goal
→ Planning LLM creates initial plan
→ Planning LLM defines KPIs/metrics for the goal
→ User reviews and approves metrics
→ Genie recommends its schedule ("wake me every morning at 5 AM")
→ User approves schedule
→ Genie begins first task execution
→ Hits network blocks, requests approvals
→ User approves domains
→ Genie completes first run, stores memories, sends first briefing
→ Container goes to sleep
3. SCHEDULED RUNS
Scheduler triggers wake-up
→ Container starts
→ Harness loads: reads persisted memory, checks for queued user messages
→ Execution LLM runs task with context from memory
→ Metrics recorded for this run
→ Results stored, briefing pushed to user
→ If self-evaluation due: Planning LLM reviews metrics, proposes adjustments
→ Container sleeps
4. USER-INITIATED INTERACTION
User sends message in chat
→ If container sleeping: wake it up, deliver message
→ Genie responds via chat
→ Container stays alive for a cooldown period, then sleeps
5. DEBUGGING
User opens terminal in app
→ Backend starts container if needed
→ WebSocket terminal connects to container's shell
→ User inspects logs, memory, state
6. TERMINATION
User deletes genie
→ Container destroyed
→ Persistent volume archived or deleted (user choice)
→ Data cleaned up
genie/
├── apps/
│ └── mobile/ # React Native + Web app (Expo)
│ ├── src/
│ │ ├── screens/
│ │ ├── components/
│ │ ├── services/ # API client, WebSocket, notifications
│ │ ├── store/ # State management
│ │ └── navigation/
│ └── app.json
│
├── backend/
│ ├── src/
│ │ ├── api/ # REST endpoints
│ │ ├── services/
│ │ │ ├── auth/
│ │ │ ├── container/ # Orchestrator
│ │ │ ├── scheduler/
│ │ │ ├── chat/
│ │ │ ├── network/ # Approval service
│ │ │ └── notifications/
│ │ ├── models/ # DB models
│ │ └── config/
│ └── package.json
│
├── harness/ # Genie Harness (Core IP)
│ ├── genie/
│ │ ├── core/
│ │ │ ├── harness.py # Main loop / lifecycle
│ │ │ ├── planner.py # Planning LLM interface
│ │ │ ├── executor.py # Execution LLM interface
│ │ │ └── scheduler.py # Schedule self-recommender
│ │ ├── metrics/
│ │ │ ├── engine.py # Metric collection + storage
│ │ │ ├── definitions.py # KPI definition framework
│ │ │ ├── evaluator.py # Periodic self-evaluation via Planning LLM
│ │ │ └── feedback.py # User feedback ingestion
│ │ ├── memory/
│ │ │ ├── manager.py # Memory lifecycle
│ │ │ ├── working.py # Working memory
│ │ │ ├── longterm.py # Long-term storage + vector search
│ │ │ └── consolidator.py # Memory compression / pruning
│ │ ├── tools/
│ │ │ ├── shell.py # Shell command execution
│ │ │ ├── files.py # File operations
│ │ │ ├── web.py # HTTP requests (via proxy)
│ │ │ └── data.py # Data processing utilities
│ │ ├── network/
│ │ │ ├── proxy.py # Egress proxy
│ │ │ └── firewall.py # Allowlist management
│ │ ├── comms/
│ │ │ ├── chat.py # Chat endpoint (WebSocket client)
│ │ │ └── terminal.py # Debug terminal server
│ │ └── config.py
│ ├── Dockerfile
│ ├── requirements.txt
│ └── entrypoint.sh
│
├── infra/ # Infrastructure as code
│ ├── docker-compose.yml # Local dev
│ ├── terraform/ # Cloud provisioning
│ └── scripts/
│
└── docs/
└── PLAN.md # This file (symlinked or copied)
| Layer | Technology |
|---|---|
| Mobile + Web | React Native + Expo + react-native-web |
| Backend API | Node.js + TypeScript + Fastify |
| Database | PostgreSQL |
| Container Runtime | Docker (dev), Fly.io Machines or AWS ECS (prod) |
| Genie Harness | Python 3.12+ |
| LLM Integration | Anthropic API (Claude), OpenAI API (GPT), configurable |
| Vector DB | ChromaDB (embedded, per genie) |
| Network Proxy | mitmproxy or custom lightweight proxy |
| Push Notifications | APNS + FCM via firebase-admin |
| Debug Terminal | xterm.js (frontend) + WebSocket shell relay |
| IaC | Terraform + Docker Compose |
- Genie harness: core loop, planning/execution LLM integration
- Memory manager: working memory, long-term storage, consolidation
- Metrics engine: KPI definition, per-run collection, self-evaluation
- Tool runner: shell, files, web requests
- Backend: auth, genie CRUD, container orchestrator (Docker locally)
- Basic chat relay (WebSocket)
- Local dev environment (docker-compose)
- Network proxy in container (egress filtering)
- Network approval flow (backend + push notifications)
- Scheduler service (cron-based wake/sleep)
- Schedule self-recommendation by genie
- Message queuing for offline genies
Build as a web app first for fast iteration, then wrap for mobile.
- Web app (React + Vite, or Next.js) — same component library usable in React Native later
- Auth screens
- Genie creation flow (with LLM suggestion step)
- Genie list / home screen
- Chat interface
- Network approval UI
- Push notification integration
- React Native app wrapping shared components
- APNS + FCM push notifications
- App store submission
- Debug terminal (drop-in shell from app)
- Genie status monitoring
- Memory inspection UI
- Error handling and recovery
- Container health checks
- Cloud deployment (Fly.io or AWS)
- Terraform IaC
- Monitoring and logging
- Rate limiting and abuse prevention
- Billing infrastructure (Stripe)
- App store submission
Start with Phase 1 — build the harness first since it's the core IP, then the backend to support it.
Immediate first tasks:
- Scaffold the
harness/Python project - Implement the core harness loop (wake -> load memory -> plan -> execute -> store memory -> report -> sleep)
- Implement memory manager with working + long-term memory
- Build a simple CLI to test genies locally before the app exists