Genie - Project Plan

Vision

Genie is a cross-platform app (iOS, Android, Web) where users describe a goal and a containerized LLM agent ("Genie") is provisioned to autonomously pursue that goal. Genies run on schedules, monitor the world, and proactively push updates to their user. Users control what each Genie can access on the network via a real-time approval system.

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                    Frontend (React Native Web)           │
│              iOS / Android / Web from one codebase       │
│                                                          │
│  ┌──────────┐ ┌──────────┐ ┌────────────┐ ┌──────────┐ │
│  │  Home /  │ │  Create  │ │   Genie    │ │ Terminal │ │
│  │  List    │ │  Flow    │ │   Detail   │ │  (Debug) │ │
│  └──────────┘ └──────────┘ └────────────┘ └──────────┘ │
└──────────────────────┬──────────────────────────────────┘
                       │ REST + WebSocket
┌──────────────────────▼──────────────────────────────────┐
│                    Backend API (Node.js / TS)            │
│                                                          │
│  ┌────────────┐ ┌────────────┐ ┌───────────────────────┐│
│  │    Auth    │ │  Scheduler │ │  Container Orchestrator││
│  │  Service   │ │  Service   │ │  (Docker / Fly / ECS) ││
│  └────────────┘ └────────────┘ └───────────────────────┘│
│  ┌────────────┐ ┌────────────┐ ┌───────────────────────┐│
│  │   Chat /   │ │  Network   │ │   Notification        ││
│  │  Message Q │ │  Approval  │ │   Service (APNS/FCM)  ││
│  └────────────┘ └────────────┘ └───────────────────────┘│
└──────────────────────┬──────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────┐
│               Genie Runtime (per container)              │
│                                                          │
│  ┌─────────────────────────────────────────────────────┐│
│  │              Genie Harness (Core IP)                 ││
│  │                                                      ││
│  │  ┌──────────┐ ┌──────────┐ ┌───────────────────┐   ││
│  │  │ Planning │ │Execution │ │  Memory Manager   │   ││
│  │  │  LLM     │ │  LLM     │ │  (read/write/     │   ││
│  │  │          │ │          │ │   summarize)      │   ││
│  │  └──────────┘ └──────────┘ └───────────────────┘   ││
│  │  ┌──────────┐ ┌──────────┐ ┌───────────────────┐   ││
│  │  │  Tool    │ │ Network  │ │  Schedule         │   ││
│  │  │  Runner  │ │ Proxy    │ │  Self-Recommender │   ││
│  │  │ (shell,  │ │ (egress  │ │                   │   ││
│  │  │  files)  │ │  filter) │ │                   │   ││
│  │  └──────────┘ └──────────┘ └───────────────────┘   ││
│  │  ┌─────────────────────────────────────────────┐   ││
│  │  │  Metrics & Self-Evaluation Engine            │   ││
│  │  │  (KPIs, tracking, periodic self-review)      │   ││
│  │  └─────────────────────────────────────────────┘   ││
│  └─────────────────────────────────────────────────────┘│
│                                                          │
│  ┌──────────────────┐  ┌──────────────────────────────┐ │
│  │ Persistent Volume │  │  Debug SSH / Terminal Server │ │
│  │ (memory, state)   │  │  (for drop-in diagnostics)  │ │
│  └──────────────────┘  └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────┐
│                   Data Layer                              │
│                                                          │
│  ┌──────────┐ ┌──────────────┐ ┌───────────────────┐   │
│  │ Postgres │ │  Object Store │ │  Vector DB        │   │
│  │ (users,  │ │  (S3 - genie │ │  (genie long-term │   │
│  │  genies, │ │   artifacts)  │ │   memory)         │   │
│  │  perms)  │ │               │ │                   │   │
│  └──────────┘ └──────────────┘ └───────────────────┘   │
└─────────────────────────────────────────────────────────┘

Component Breakdown

1. Frontend (React Native Web)

Tech: React Native + react-native-web + Expo

Screens:

Screen	Purpose
Auth	Sign up / login
Home	List of user's genies with status (running, sleeping, error)
Create Genie	Multi-step: describe goal -> LLM suggests config -> user reviews/approves -> deploy
Genie Detail	Chat interface, latest updates/briefings, network permissions panel, metrics dashboard, settings
Network Approvals	Pending approval requests (also push notifications)
Terminal	Web terminal (xterm.js) to drop into a genie's container for debugging
Settings	Account, notification preferences

Key Features:

Push notifications via APNS (iOS) and FCM (Android)
WebSocket connection for real-time chat and status updates
Offline message queuing (messages sent while genie is asleep are queued)

2. Backend API

Tech: Node.js + TypeScript + Express/Fastify

Database: PostgreSQL

Key Services:

Auth Service

JWT-based auth
User management

Container Orchestrator

Provisions containers on demand (Docker on a VM cluster, or Fly.io Machines API, or AWS ECS)
Start/stop/destroy genie containers
Attaches persistent volumes for memory
Manages container lifecycle (spin up on schedule, spin down after idle)
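
The lifecycle rules above (start, stop, destroy, spin down after idle) can be pictured as a small state machine. The sketch below is illustrative only: the state names, the `TRANSITIONS` table, and the `idle_timeout` parameter are assumptions rather than part of the plan, and Python is used just to keep the examples in one language even though the backend is planned in TypeScript.

```python
from datetime import datetime, timedelta
from enum import Enum


class ContainerState(Enum):
    RUNNING = "running"
    SLEEPING = "sleeping"
    DESTROYED = "destroyed"


# Hypothetical transition table: which states each state may move to.
TRANSITIONS = {
    ContainerState.RUNNING: {ContainerState.SLEEPING, ContainerState.DESTROYED},
    ContainerState.SLEEPING: {ContainerState.RUNNING, ContainerState.DESTROYED},
    ContainerState.DESTROYED: set(),
}


def transition(current: ContainerState, target: ContainerState) -> ContainerState:
    """Return the target state, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


def after_idle_check(
    state: ContainerState,
    last_activity: datetime,
    now: datetime,
    idle_timeout: timedelta,
) -> ContainerState:
    """Spin a running container down once it has been idle long enough."""
    if state is ContainerState.RUNNING and now - last_activity >= idle_timeout:
        return transition(state, ContainerState.SLEEPING)
    return state
```

A destroyed container has no outgoing transitions, so a stray wake-up for a deleted genie fails loudly instead of silently re-provisioning it.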

Scheduler Service

Stores each genie's schedule (cron expressions)
Triggers container wake-up at scheduled times
Genie can recommend its own schedule during planning phase; user approves
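
To make "triggers wake-up at scheduled times" concrete, here is a minimal sketch of checking a five-field cron expression against a timestamp. It is a simplified matcher written for illustration, not a full cron implementation: it supports only `*`, `*/n`, `a-b`, and comma lists, and it requires day-of-month and day-of-week to both match (standard cron treats them as alternatives when both are restricted). The function names are hypothetical; a production scheduler would more likely rely on an established cron library.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', '*/n', 'a-b', single numbers, comma lists."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` falls on a minute selected by `expr`."""
    minute, hour, day, month, weekday = expr.split()
    # cron counts Sunday as 0; datetime.weekday() counts Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(day, when.day)
        and _field_matches(month, when.month)
        and _field_matches(weekday, cron_weekday)
    )
```

Under these assumptions, "once daily at 5:30 AM" becomes `30 5 * * *` and "every 4 hours" becomes `0 */4 * * *`. Time zone handling (the user's local time versus UTC) is left out of the sketch.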

Chat / Message Queue

Proxies messages between user and genie
Queues user messages when genie container is offline
On container wake-up, delivers queued messages to genie
Stores full conversation history in Postgres
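
The queue-then-deliver behaviour can be sketched with an in-memory mailbox. This is a toy model under stated assumptions: the `GenieMailbox` class and its fields are hypothetical, and a real implementation would persist messages in Postgres as described above rather than keep them in process memory.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GenieMailbox:
    """Holds user messages for one genie until its container is online."""

    online: bool = False
    pending: deque = field(default_factory=deque)
    delivered: list = field(default_factory=list)

    def send(self, message: str) -> None:
        # Deliver immediately if the container is up, otherwise queue.
        if self.online:
            self.delivered.append(message)
        else:
            self.pending.append(message)

    def wake(self) -> None:
        # On wake-up, flush queued messages in the order they were sent.
        self.online = True
        while self.pending:
            self.delivered.append(self.pending.popleft())

    def sleep(self) -> None:
        self.online = False
```

Using a FIFO queue keeps the genie's view of the conversation in the same order the user typed it.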

Network Approval Service

Receives egress requests from genie containers (via the network proxy)
Creates approval requests
Sends push notification to user
On approval: updates firewall rules for that container
Supports "allow once" vs "allow always" (per genie, per domain)
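
The difference between "allow once" and "allow always" is easiest to see in code. The following sketch assumes a per-genie permission record keyed by domain; the `Decision` enum and `GeniePermissions` class are hypothetical names, and the actual firewall update is out of scope here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW_ONCE = "allow_once"
    ALLOW_ALWAYS = "allow_always"
    DENY = "deny"


@dataclass
class GeniePermissions:
    """Per-genie, per-domain egress permissions."""

    always: set = field(default_factory=set)
    once: set = field(default_factory=set)

    def apply(self, domain: str, decision: Decision) -> None:
        """Record the user's answer to an approval request."""
        if decision is Decision.ALLOW_ALWAYS:
            self.always.add(domain)
        elif decision is Decision.ALLOW_ONCE:
            self.once.add(domain)
        # DENY records nothing, so the domain stays blocked.

    def consume(self, domain: str) -> bool:
        """Return True if a request may proceed; spends a one-time grant."""
        if domain in self.always:
            return True
        if domain in self.once:
            self.once.discard(domain)
            return True
        return False
```

In this sketch a one-time grant covers a single request. Whether "once" should instead mean "for the rest of the current run" is a product decision the plan leaves open.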

Notification Service

APNS + FCM integration
Sends: network approval requests, genie briefings/updates, genie status changes

3. Genie Harness (Core IP)

This is the agent runtime that runs inside each container. It is the most critical component.

Tech: Python (chosen for its mature LLM tooling ecosystem)

3a. Planning LLM

Used during genie creation to analyze the user's goal
Suggests: container specs, schedule, model choices, required tools
Also used by the genie for high-level reasoning and re-planning
Model: configurable, suggested during creation (e.g., Claude Sonnet for simple tasks, Claude Opus for complex ones)

3b. Execution LLM

Handles the actual task execution: web scraping, data analysis, composing briefings
Model: configurable, can be lighter/cheaper than the planning model
Runs within tool-use loops

3c. Memory Manager

The Memory Manager is the genie's persistent brain and the product's key differentiator.

Memory Architecture:

┌─────────────────────────────────────┐
│           Memory Manager            │
│                                     │
│  ┌───────────┐  ┌────────────────┐  │
│  │  Working   │  │   Long-Term    │  │
│  │  Memory    │  │   Memory       │  │
│  │            │  │                │  │
│  │ - Current  │  │ - Vector DB    │  │
│  │   task     │  │   (semantic    │  │
│  │ - Recent   │  │    search)     │  │
│  │   findings │  │ - Structured   │  │
│  │ - Session  │  │   knowledge    │  │
│  │   state    │  │   (JSON/SQLite)│  │
│  └───────────┘  └────────────────┘  │
│                                     │
│  ┌───────────────────────────────┐  │
│  │     Memory Lifecycle          │  │
│  │                               │  │
│  │ 1. After each task run:       │  │
│  │    - Summarize findings       │  │
│  │    - Extract key facts        │  │
│  │    - Store in long-term       │  │
│  │                               │  │
│  │ 2. Before each task run:      │  │
│  │    - Load relevant memories   │  │
│  │    - Reconstruct context      │  │
│  │    - Resume where left off    │  │
│  │                               │  │
│  │ 3. Periodically:              │  │
│  │    - Consolidate/compress     │  │
│  │    - Prune stale info         │  │
│  │    - Re-rank importance       │  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘

Storage: A persistent volume mounted at /genie/memory/ survives container restarts and holds:

working.json — current session state
knowledge.db — SQLite for structured facts
vectors/ — local vector index (e.g., ChromaDB) for semantic search over accumulated knowledge
history/ — compressed logs of past runs
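
As a rough illustration of the storage layout above, the sketch below covers only `working.json` and `knowledge.db`. The vector index and run history are omitted. The `MemoryStore` class, its method names, and the `facts` table schema are assumptions made for the example, not the planned design.

```python
import json
import sqlite3
from pathlib import Path


class MemoryStore:
    """Minimal working + structured memory on a mounted directory."""

    def __init__(self, root: Path) -> None:
        root.mkdir(parents=True, exist_ok=True)
        self.working_path = root / "working.json"
        self.db = sqlite3.connect(root / "knowledge.db")
        # Hypothetical schema: one row per extracted fact.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS facts (topic TEXT, fact TEXT, run_id TEXT)"
        )

    def load_working(self) -> dict:
        """Before a run: restore session state, or start empty."""
        if self.working_path.exists():
            return json.loads(self.working_path.read_text(encoding="utf-8"))
        return {}

    def save_working(self, state: dict) -> None:
        self.working_path.write_text(json.dumps(state), encoding="utf-8")

    def store_facts(self, run_id: str, facts: list[tuple[str, str]]) -> None:
        """After a run: persist (topic, fact) pairs in one transaction."""
        with self.db:
            self.db.executemany(
                "INSERT INTO facts VALUES (?, ?, ?)",
                [(topic, fact, run_id) for topic, fact in facts],
            )

    def recall(self, topic: str) -> list[str]:
        rows = self.db.execute(
            "SELECT fact FROM facts WHERE topic = ? ORDER BY rowid", (topic,)
        )
        return [row[0] for row in rows]

    def close(self) -> None:
        self.db.close()
```

Because both files live under the mounted directory, a new `MemoryStore` pointed at the same path after a container restart sees the state written by the previous run.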

3d. Tool Runner

Executes actions on behalf of the genie:

Shell commands (sandboxed, non-root)
File read/write (within the container)
Web requests (routed through the network proxy)
Data processing (Python libraries available)

3e. Network Proxy (Egress Filter)

All outbound HTTP(S) from the container routes through a local proxy
Proxy checks domain against the genie's allowlist
If domain not approved: blocks request, sends approval request to backend
If approved: forwards request
Implemented as a transparent proxy (e.g., mitmproxy or a lightweight custom proxy)
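
The allowlist check at the heart of the proxy might look like the sketch below. It assumes that approving a domain also approves its subdomains, which the plan does not specify, and the function names and the `pending` list are hypothetical. Intercepting traffic (for example with mitmproxy) and notifying the backend are not shown.

```python
from urllib.parse import urlsplit


def host_allowed(host: str, allowlist: set[str]) -> bool:
    """True if host equals an approved domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == domain or host.endswith("." + domain) for domain in allowlist)


def filter_request(url: str, allowlist: set[str], pending: list[str]) -> str:
    """Return 'forward' for approved hosts; otherwise block and queue approval."""
    host = urlsplit(url).hostname or ""
    if host_allowed(host, allowlist):
        return "forward"
    # Record each blocked host once so the user gets a single approval request.
    if host not in pending:
        pending.append(host)
    return "blocked"
```

Matching on a leading dot (`"." + domain`) prevents a lookalike such as `evilexample.com` from passing as a subdomain of `example.com`.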

3f. Metrics & Self-Evaluation Engine

The genie must measure its own performance against the user's goal. This is critical — without metrics, there's no feedback loop and no improvement.

How it works:

Metric Definition (at creation time): During the planning phase, the Planning LLM analyzes the user's goal and defines measurable KPIs. The user reviews and can adjust these.

Examples by goal type:

Goal: "Monitor housing prices in Austin"
Metrics:
- # of listings surfaced per week
- % of surfaced listings user found relevant (user feedback)
- Average time from listing appearing to user notification
- Coverage: % of major listing sources monitored

Goal: "Daily briefing on Iran conflict"
Metrics:
- Briefing delivered on time (Y/N per day)
- # of unique sources consulted
- User engagement: did user read/respond?
- User rating (optional thumbs up/down on briefings)

Goal: "Monitor financial markets"
Metrics:
- Briefing timeliness
- # of actionable insights flagged
- Accuracy of flagged trends (retroactive self-check)
- Source diversity

Metric Collection (each run): After every task execution, the genie records metrics to a structured store:

/genie/memory/metrics/
├── definitions.json    # KPI definitions, targets, thresholds
├── observations.jsonl  # Append-only log of metric data points per run
└── evaluations.jsonl   # Periodic self-evaluation summaries
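
A sketch of the per-run append to `observations.jsonl` follows. The record fields (`run_id`, `recorded_at`, `values`) and the function names are assumptions chosen for illustration, since the plan does not fix a schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_observation(metrics_dir: Path, run_id: str, values: dict) -> dict:
    """Append one run's metric values as a single JSON line."""
    metrics_dir.mkdir(parents=True, exist_ok=True)
    entry = {
        "run_id": run_id,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "values": values,
    }
    # Mode "a" keeps the log append-only: earlier lines are never rewritten.
    with (metrics_dir / "observations.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def load_observations(metrics_dir: Path) -> list[dict]:
    """Read all recorded observations, oldest first."""
    path = metrics_dir / "observations.jsonl"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

One JSON object per line means a run that crashes mid-write can corrupt at most the final line, and the file can be read incrementally as it grows.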

Self-Evaluation (periodic): On a configurable cadence (e.g., weekly, or every N runs), the Planning LLM reviews accumulated metrics and produces a self-evaluation:
- What's going well vs. what's underperforming
- Root cause analysis for missed targets
- Proposed adjustments (change sources, adjust schedule, refine search criteria)
- These adjustments are sent to the user for approval before being applied

User Feedback Loop:
- User can rate genie outputs (thumbs up/down, or 1-5 stars on briefings)
- User can flag irrelevant results ("this listing isn't what I'm looking for")
- This feedback is stored as a metric and factored into self-evaluation
- The genie learns what the user actually values over time

Metric Dashboard (in app):
- Genie Detail screen shows a simple performance summary
- Trend lines for key metrics over time
- Current self-evaluation score
- History of adjustments the genie has made

Storage: Metrics persist on the same mounted volume as memory, under /genie/memory/metrics/.
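
Before the Planning LLM writes its self-evaluation, a deterministic pass can flag which KPIs are missing their targets. The sketch below assumes each KPI definition carries a numeric `target` and a `direction` ("higher" or "lower" is better), and that observations carry a `values` mapping. These field names are hypothetical, not part of the plan.

```python
from statistics import mean


def underperforming(definitions: dict, observations: list[dict]) -> dict:
    """Return {kpi: (average, target)} for KPIs whose average misses target."""
    missed = {}
    for name, spec in definitions.items():
        samples = [o["values"][name] for o in observations if name in o["values"]]
        if not samples:
            continue  # no data yet, so nothing to judge
        average = mean(samples)
        if spec.get("direction", "higher") == "higher":
            on_target = average >= spec["target"]
        else:
            on_target = average <= spec["target"]
        if not on_target:
            missed[name] = (average, spec["target"])
    return missed
```

The resulting dictionary could be passed to the Planning LLM as context for root cause analysis, keeping the arithmetic out of the model's hands.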

3g. Schedule Self-Recommender

After the planning phase, the genie suggests when it should be woken
Examples: "I should check every 4 hours", "Once daily at 5:30 AM user's timezone"
User approves/modifies the schedule
Genie can also request schedule changes over its lifetime

3h. Debug Terminal Server

Lightweight SSH or WebSocket terminal server
Allows the user to "drop in" to the container from the app
Read-only mode available for safe inspection
Full shell mode for debugging

4. Data Layer

PostgreSQL — primary database:

Users, auth tokens
Genies (config, status, schedule, goal, model choices)
Conversations (messages between user and genie)
Network permissions (per genie, per domain)
Approval requests

Object Storage (S3 or equivalent):

Genie artifacts (reports, generated files)
Exported briefings

Vector DB (per genie, local in container):

ChromaDB or similar embedded vector DB
Stores genie's accumulated knowledge embeddings
Persisted on the mounted volume

Genie Lifecycle

1. CREATION
   User describes goal
   → Planning LLM analyzes goal
   → Suggests: container spec, schedule, models, estimated cost
   → User reviews and approves
   → Container provisioned, harness installed, genie initialized

2. FIRST RUN
   Genie reads its goal
   → Planning LLM creates initial plan
   → Planning LLM defines KPIs/metrics for the goal
   → User reviews and approves metrics
   → Genie recommends its schedule ("wake me every morning at 5 AM")
   → User approves schedule
   → Genie begins first task execution
   → Hits network blocks, requests approvals
   → User approves domains
   → Genie completes first run, stores memories, sends first briefing
   → Container goes to sleep

3. SCHEDULED RUNS
   Scheduler triggers wake-up
   → Container starts
   → Harness loads: reads persisted memory, checks for queued user messages
   → Execution LLM runs task with context from memory
   → Metrics recorded for this run
   → Results stored, briefing pushed to user
   → If self-evaluation due: Planning LLM reviews metrics, proposes adjustments
   → Container sleeps

4. USER-INITIATED INTERACTION
   User sends message in chat
   → If container sleeping: wake it up, deliver message
   → Genie responds via chat
   → Container stays alive for a cooldown period, then sleeps

5. DEBUGGING
   User opens terminal in app
   → Backend starts container if needed
   → WebSocket terminal connects to container's shell
   → User inspects logs, memory, state

6. TERMINATION
   User deletes genie
   → Container destroyed
   → Persistent volume archived or deleted (user choice)
   → Data cleaned up

Project Structure

genie/
├── apps/
│   └── mobile/                  # React Native + Web app (Expo)
│       ├── src/
│       │   ├── screens/
│       │   ├── components/
│       │   ├── services/        # API client, WebSocket, notifications
│       │   ├── store/           # State management
│       │   └── navigation/
│       └── app.json
│
├── backend/
│   ├── src/
│   │   ├── api/                 # REST endpoints
│   │   ├── services/
│   │   │   ├── auth/
│   │   │   ├── container/       # Orchestrator
│   │   │   ├── scheduler/
│   │   │   ├── chat/
│   │   │   ├── network/         # Approval service
│   │   │   └── notifications/
│   │   ├── models/              # DB models
│   │   └── config/
│   └── package.json
│
├── harness/                     # Genie Harness (Core IP)
│   ├── genie/
│   │   ├── core/
│   │   │   ├── harness.py       # Main loop / lifecycle
│   │   │   ├── planner.py       # Planning LLM interface
│   │   │   ├── executor.py      # Execution LLM interface
│   │   │   └── scheduler.py     # Schedule self-recommender
│   │   ├── metrics/
│   │   │   ├── engine.py        # Metric collection + storage
│   │   │   ├── definitions.py   # KPI definition framework
│   │   │   ├── evaluator.py     # Periodic self-evaluation via Planning LLM
│   │   │   └── feedback.py      # User feedback ingestion
│   │   ├── memory/
│   │   │   ├── manager.py       # Memory lifecycle
│   │   │   ├── working.py       # Working memory
│   │   │   ├── longterm.py      # Long-term storage + vector search
│   │   │   └── consolidator.py  # Memory compression / pruning
│   │   ├── tools/
│   │   │   ├── shell.py         # Shell command execution
│   │   │   ├── files.py         # File operations
│   │   │   ├── web.py           # HTTP requests (via proxy)
│   │   │   └── data.py          # Data processing utilities
│   │   ├── network/
│   │   │   ├── proxy.py         # Egress proxy
│   │   │   └── firewall.py      # Allowlist management
│   │   ├── comms/
│   │   │   ├── chat.py          # Chat endpoint (WebSocket client)
│   │   │   └── terminal.py      # Debug terminal server
│   │   └── config.py
│   ├── Dockerfile
│   ├── requirements.txt
│   └── entrypoint.sh
│
├── infra/                       # Infrastructure as code
│   ├── docker-compose.yml       # Local dev
│   ├── terraform/               # Cloud provisioning
│   └── scripts/
│
└── docs/
    └── PLAN.md                  # This file (symlinked or copied)

Tech Stack Summary

Layer	Technology
Mobile + Web	React Native + Expo + react-native-web
Backend API	Node.js + TypeScript + Fastify
Database	PostgreSQL
Container Runtime	Docker (dev), Fly.io Machines or AWS ECS (prod)
Genie Harness	Python 3.12+
LLM Integration	Anthropic API (Claude), OpenAI API (GPT), configurable
Vector DB	ChromaDB (embedded, per genie)
Network Proxy	mitmproxy or custom lightweight proxy
Push Notifications	APNS + FCM via firebase-admin
Debug Terminal	xterm.js (frontend) + WebSocket shell relay
IaC	Terraform + Docker Compose

Development Phases

Phase 1: Foundation (Harness + Backend Core)

Genie harness: core loop, planning/execution LLM integration
Memory manager: working memory, long-term storage, consolidation
Metrics engine: KPI definition, per-run collection, self-evaluation
Tool runner: shell, files, web requests
Backend: auth, genie CRUD, container orchestrator (Docker locally)
Basic chat relay (WebSocket)
Local dev environment (docker-compose)

Phase 2: Network & Scheduling

Network proxy in container (egress filtering)
Network approval flow (backend + push notifications)
Scheduler service (cron-based wake/sleep)
Schedule self-recommendation by genie
Message queuing for offline genies

Phase 3: Frontend (Web First)

Build as a web app first for fast iteration, then wrap for mobile.

Web app (React + Vite, or Next.js) — same component library usable in React Native later
Auth screens
Genie creation flow (with LLM suggestion step)
Genie list / home screen
Chat interface
Network approval UI
Push notification integration

Phase 3b: Mobile

React Native app wrapping shared components
APNS + FCM push notifications
App store submission

Phase 4: Debug & Polish

Phase 5: Production Readiness

Next Steps

Start with Phase 1 — build the harness first since it's the core IP, then the backend to support it.

Immediate first tasks:

Scaffold the harness/ Python project
Implement the core harness loop (wake -> load memory -> plan -> execute -> store memory -> report -> sleep)
Implement memory manager with working + long-term memory
Build a simple CLI to test genies locally before the app exists
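
As a starting point for the core loop (wake -> load memory -> plan -> execute -> store memory -> report -> sleep), here is a minimal sketch with the LLM calls replaced by plain callables. The `run_once` function, its parameters, and the `runs` / `last_findings` memory keys are hypothetical. Real planner and executor implementations would call an LLM API, and the caller would persist the returned memory before the container sleeps.

```python
from typing import Callable

Planner = Callable[[str, dict], list[str]]   # (goal, context) -> tasks
Executor = Callable[[str, dict], str]        # (task, context) -> finding
Reporter = Callable[[list[str]], None]       # findings -> push to user


def run_once(
    goal: str,
    memory: dict,
    plan: Planner,
    execute: Executor,
    report: Reporter,
) -> dict:
    """Run one wake cycle and return the updated memory."""
    context = dict(memory)                       # load memory
    tasks = plan(goal, context)                  # plan
    findings = [execute(task, context) for task in tasks]  # execute
    updated = {                                  # store memory
        **memory,
        "runs": memory.get("runs", 0) + 1,
        "last_findings": findings,
    }
    report(findings)                             # report
    return updated                               # caller persists, then sleeps
```

Passing the planner, executor, and reporter in as callables lets the simple CLI mentioned above exercise the loop with stubs before any LLM or backend exists.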
