automation script to pull models.yml #9635
This PR is large and would use a significant portion of your monthly review quota. Comment
Bundle Report: Changes will increase total bundle size by 14.39kB (0.06%) ⬆️. This is within the configured threshold ✅
Affected assets, files, and routes: see the changes for bundle `marimo-esm`.
Pull request overview
This PR introduces a manual sync workflow for packages/llm-info/data/models.yml from the public models.dev catalog, restructures the model catalog to be provider-keyed, and updates codegen + frontend consumption to match the new schema (capabilities, modalities, pricing, release dates).
Changes:
- Add a `pnpm sync-models` script + implementation to fetch `models.dev/api.json` and append/replace entries in `models.yml` while preserving curated entries/comments.
- Change the `llm-info` model schema and data layout to a top-level provider map, enriching entries with capabilities, modalities, release dates, and cost.
- Update codegen/tests and frontend model registry + UI “thinking” indicator to use the new structure.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `packages/llm-info/src/sync-models.ts` | CLI entrypoint + YAML-preserving append/replace writer for `models.yml`. |
| `packages/llm-info/src/cli.ts` | Parses `sync-models` CLI flags (mode, providers filter, per-provider cap). |
| `packages/llm-info/src/sources/models-dev.ts` | Fetches and parses the models.dev API response with Zod validation and warnings. |
| `packages/llm-info/src/sources/merge.ts` | Merges models.dev data into local provider buckets with trimming/sorting. |
| `packages/llm-info/src/index.ts` | Updates exported types (capabilities, modalities, provider-keyed structure). |
| `packages/llm-info/src/generate.ts` | Updates codegen validation + JSON structure to provider-keyed models. |
| `packages/llm-info/src/__tests__/sync-models.test.ts` | Adds comprehensive tests for merge + sync behavior and YAML formatting preservation. |
| `packages/llm-info/src/__tests__/schema.test.ts` | Updates schema tests for new model entry shape + provider-keyed YAML. |
| `packages/llm-info/src/__tests__/json-structure.test.ts` | Updates JSON shape expectations (`models` is a provider-keyed map). |
| `packages/llm-info/data/models.yml` | Converts catalog to provider-keyed sections and adds enriched fields. |
| `packages/llm-info/package.json` | Adds `sync-models` script. |
| `packages/llm-info/README.md` | Documents `pnpm sync-models` usage examples. |
| `packages/llm-info/skills/SKILL.md` | Adds a documented workflow for backfilling empty descriptions (Cursor skill). |
| `.cursor/skills/fill-model-descriptions/SKILL.md` | Same skill doc mirrored under `.cursor/skills`. |
| `frontend/src/core/ai/model-registry.ts` | Updates frontend registry to consume provider-keyed models + rehydrate dates + provider field. |
| `frontend/src/core/ai/__tests__/model-registry.test.ts` | Updates mocks/expectations for provider-keyed models and provider-owned entries. |
| `frontend/src/components/app-config/ai-config.tsx` | Switches “thinking” badge to `capabilities.includes("thinking")`. |
| `frontend/src/components/ai/ai-model-dropdown.tsx` | Switches “thinking” indicators to `capabilities.includes("thinking")`. |
| `frontend/src/components/ai/__tests__/ai-utils.test.ts` | Updates `models.json` mock to provider-keyed format + new fields. |
```ts
capabilities: Capability[];
input_types: DataType[];
output_types: DataType[];
release_date: Date;
```
9 issues found across 19 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/llm-info/src/cli.ts">
<violation number="1" location="packages/llm-info/src/cli.ts:35">
P2: `--mode replace` is ignored. Use `getFlag()` here so the spaced form does not fall back to append mode.</violation>
<violation number="2" location="packages/llm-info/src/cli.ts:73">
P1: Empty provider list should fail. In `--replace` mode, a typo can rewrite `models.yml` from an empty result set.</violation>
</file>
<file name="packages/llm-info/src/sync-models.ts">
<violation number="1" location="packages/llm-info/src/sync-models.ts:237">
P2: Cap is not per final provider. `google` can get trimmed entries from both `google` and `google-vertex`, so `-n 5` can still append 10 models. Trim after merging into one bucket.</violation>
</file>
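A minimal TypeScript sketch of what the two `cli.ts` fixes could look like. The review names `getFlag()`, but its real signature isn't shown, so the helper below is a hypothetical stand-in that accepts both `--mode replace` and `--mode=replace`. `parseMode`, `parseProviders`, the `known` provider list, and the `--providers` long form are assumptions; `-p`, `--replace`, and `--mode` come from the PR text.

```typescript
// Hypothetical sketch of sync-models flag parsing; not the PR's actual cli.ts.

/**
 * Return the value for any of `names`, accepting both `--flag value` and
 * `--flag=value`. Returns "" if the flag is present without a value and
 * undefined if it is absent.
 */
export function getFlag(argv: readonly string[], names: readonly string[]): string | undefined {
  for (const [i, arg] of argv.entries()) {
    for (const name of names) {
      if (arg === name) {
        // Spaced form: take the next argument unless it is another flag.
        const next = argv[i + 1];
        return next !== undefined && !next.startsWith("-") ? next : "";
      }
      if (arg.startsWith(`${name}=`)) {
        // Equals form: everything after "<name>=".
        return arg.slice(name.length + 1);
      }
    }
  }
  return undefined;
}

export type Mode = "append" | "replace";

export function parseMode(argv: readonly string[]): Mode {
  // The bare `--replace` switch and `--mode replace` / `--mode=replace` all select replace mode.
  if (argv.includes("--replace")) return "replace";
  const raw = getFlag(argv, ["--mode"]);
  if (raw === undefined || raw === "append") return "append";
  if (raw === "replace") return "replace";
  throw new Error(`Unknown --mode "${raw}" (expected "append" or "replace")`);
}

export function parseProviders(
  argv: readonly string[],
  known: readonly string[],
): string[] | undefined {
  const raw = getFlag(argv, ["-p", "--providers"]);
  if (raw === undefined) return undefined; // no filter: sync every provider
  const requested = raw
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const providers = requested.filter((p) => known.includes(p));
  if (providers.length === 0) {
    // Refuse to continue: in replace mode an empty set would rewrite models.yml from nothing.
    throw new Error(`No known providers in "${raw}" (known: ${known.join(", ")})`);
  }
  return providers;
}
```

With this shape, `parseProviders(["-p", "opnai"], ["openai", "google"])` throws instead of returning an empty filter, so a typo cannot drive `--replace` into rewriting `models.yml` from an empty result set.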
Architecture diagram
```mermaid
sequenceDiagram
participant CLI as Sync Script (cli.ts)
participant Sync as sync-models.ts
participant Merge as sources/merge.ts
participant ModelsDev as sources/models-dev.ts
participant API as models.dev API
participant YAML as data/models.yml
participant Codegen as generate.ts
participant JSON as data/generated/models.json
participant Frontend as Frontend Components
participant Registry as model-registry.ts
Note over CLI,Registry: Model Sync and Consumption Flow
CLI->>Sync: pnpm sync-models [--replace] [-n 10] [-p openai,google]
Sync->>ModelsDev: fetchModelsDev()
ModelsDev->>API: GET https://models.dev/api.json
API-->>ModelsDev: JSON response
ModelsDev->>ModelsDev: parseModelsDev() – validate with Zod schema
ModelsDev-->>Sync: ModelsDevApi object
Sync->>YAML: readFileSync(models.yml)
YAML-->>Sync: YAML text
Sync->>Sync: parseExistingModels() – extract provider-model pairs
Sync->>Merge: mergeModels(existing, modelsDev, options)
Merge->>Merge: For each provider in PROVIDER_MAP:
Merge->>Merge: - Build AiModel entries from API data
Merge->>Merge: - Derive roles, capabilities, cost, modalities
Merge->>Merge: - Sort newest-first, cap at maxPerProvider
Merge->>Merge: - Skip models that already exist locally
Merge-->>Sync: MergeSummary (newEntries, preservedCount)
alt mode === "append"
Sync->>Sync: appendIntoDocument() – append new entries to existing YAML
else mode === "replace"
Sync->>Sync: renderFresh() – generate entirely new YAML
end
Sync->>YAML: writeFileSync(models.yml)
Note over Codegen: Codegen runs separately
Codegen->>YAML: readFileSync + parse
Codegen->>Codegen: Validate with ModelsByProviderSchema
Codegen->>JSON: writeFileSync(models.json)
Note over Frontend: Runtime consumption
Frontend->>JSON: import models.json
JSON-->>Frontend: { models: { providerId: AiModel[] } }
Frontend->>Registry: getKnownModelMaps()
Registry->>Registry: Flatten per-provider arrays into single Map
Registry->>Registry: Each model gets its own provider field
Registry-->>Frontend: QualifiedModelId → AiModel
alt User opens model dropdown
Frontend->>Registry: getModelsByProvider(provider)
Registry-->>Frontend: AiModel[]
Frontend->>Frontend: Check model.capabilities.includes("thinking")
end
Note over Registry: Key schema changes
Note over Registry: providers: string[] → provider: string (single)
Note over Registry: thinking: boolean → capabilities: string[]
Note over Registry: New fields: input_types, output_types, release_date, cost
```
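To make the runtime half of the diagram concrete, here is a sketch of flattening the provider-keyed `models.json` into one map in which each entry carries its own `provider` field, with dates rehydrated and the “thinking” check done via `capabilities`. The type shapes, the `provider/id` key format, and the names `flattenModels` and `supportsThinking` are assumptions for illustration, not the registry's actual code; `cost` is omitted for brevity.

```typescript
// Hypothetical sketch of consuming the provider-keyed models.json; types are assumed.

type Capability = string; // e.g. "thinking"
type DataType = string; // e.g. "text", "image"

interface AiModel {
  id: string;
  name: string;
  capabilities: Capability[];
  input_types: DataType[];
  output_types: DataType[];
  release_date: Date;
}

// In the generated JSON, dates are serialized as strings.
type RawModel = Omit<AiModel, "release_date"> & { release_date: string };

interface ModelsJson {
  models: Record<string, RawModel[]>; // provider id -> that provider's models
}

type QualifiedModel = AiModel & { provider: string };

function flattenModels(json: ModelsJson): Map<string, QualifiedModel> {
  const byQualifiedId = new Map<string, QualifiedModel>();
  for (const [provider, models] of Object.entries(json.models)) {
    for (const raw of models) {
      // Each entry belongs to exactly one provider; rehydrate the date string.
      const model: QualifiedModel = {
        ...raw,
        provider,
        release_date: new Date(raw.release_date),
      };
      byQualifiedId.set(`${provider}/${model.id}`, model);
    }
  }
  return byQualifiedId;
}

function supportsThinking(model: AiModel): boolean {
  // Replaces the old `thinking: boolean` flag.
  return model.capabilities.includes("thinking");
}
```

Keeping `provider` as a single string per entry (instead of the old `providers: string[]`) means the same model id under two providers becomes two distinct map keys rather than one shared entry.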
Reply with feedback, questions, or to request a fix.
```ts
modelsYamlPath,
write = true,
mode = "append",
maxPerProvider,
```
P2: Cap is not per final provider. google can get trimmed entries from both google and google-vertex, so -n 5 can still append 10 models. Trim after merging into one bucket.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/llm-info/src/sync-models.ts, line 237:
<comment>Cap is not per final provider. `google` can get trimmed entries from both `google` and `google-vertex`, so `-n 5` can still append 10 models. Trim after merging into one bucket.</comment>
<file context>
@@ -0,0 +1,304 @@
+ modelsYamlPath,
+ write = true,
+ mode = "append",
+ maxPerProvider,
+ providers,
+ } = options;
</file context>
This was addressed in 2f7425a — the dedup landed in merge.ts rather than sync-models.ts, so the diff for this file looks unchanged but the underlying behaviour is fixed.
mergeModels now accumulates candidates per marimo provider into a single Map<modelId, AiModel>, then calls sortAndTrim once per marimo provider. See merge.ts ~L174-L212 (candidatesByProvider).
Regression test in packages/llm-info/src/__tests__/sync-models.test.ts:
it("enforces `maxPerProvider` after deduping across mapped providers", ...)
passes google + google-vertex with overlapping ids and maxPerProvider: 2, and asserts the merged bucket has exactly 2 entries. So -n 5 cannot produce 10 anymore.
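The fix described above (collect candidates per marimo provider, dedupe by model id, then trim once) can be sketched like this. `PROVIDER_MAP`, `candidatesByProvider`, and `sortAndTrim` are names taken from the PR, but their real signatures aren't shown; the `Candidate` shape, the mapping contents, and `mergeCandidates` are hypothetical.

```typescript
// Hypothetical sketch of the per-provider dedupe + trim; not the PR's merge.ts.

interface Candidate {
  id: string;
  release_date: string; // ISO date such as "2025-03-25"
}

// Assumed mapping from models.dev provider ids to marimo provider ids.
const PROVIDER_MAP: Record<string, string> = {
  openai: "openai",
  google: "google",
  "google-vertex": "google",
};

function sortAndTrim(models: Candidate[], max?: number): Candidate[] {
  // Newest first: ISO date strings order chronologically under plain comparison.
  const sorted = [...models].sort((a, b) =>
    a.release_date < b.release_date ? 1 : a.release_date > b.release_date ? -1 : 0,
  );
  return max === undefined ? sorted : sorted.slice(0, max);
}

function mergeCandidates(
  source: Record<string, Candidate[]>, // models.dev provider id -> models
  existing: ReadonlySet<string>, // "provider/id" pairs already in models.yml
  maxPerProvider?: number,
): Map<string, Candidate[]> {
  // 1. Accumulate every source provider into its marimo provider's bucket.
  const candidatesByProvider = new Map<string, Map<string, Candidate>>();
  for (const [sourceProvider, models] of Object.entries(source)) {
    const target = PROVIDER_MAP[sourceProvider];
    if (target === undefined) continue; // not a provider we track
    let bucket = candidatesByProvider.get(target);
    if (bucket === undefined) {
      bucket = new Map<string, Candidate>();
      candidatesByProvider.set(target, bucket);
    }
    for (const model of models) {
      // Keying by model id collapses overlaps such as google vs google-vertex;
      // the first occurrence wins.
      if (!bucket.has(model.id)) bucket.set(model.id, model);
    }
  }

  // 2. Trim once per merged bucket, then drop models that already exist locally.
  const newEntries = new Map<string, Candidate[]>();
  for (const [provider, bucket] of candidatesByProvider) {
    const trimmed = sortAndTrim([...bucket.values()], maxPerProvider);
    newEntries.set(provider, trimmed.filter((m) => !existing.has(`${provider}/${m.id}`)));
  }
  return newEntries;
}
```

Because trimming happens after the `google` and `google-vertex` lists share one bucket, `maxPerProvider: 2` yields at most two entries for `google`, mirroring the regression test described above.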
Thanks for the feedback.
1 issue found across 9 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/llm-info/src/sync-models.ts">
<violation number="1" location="packages/llm-info/src/sync-models.ts:237">
P2: Cap is not per final provider. `google` can get trimmed entries from both `google` and `google-vertex`, so `-n 5` can still append 10 models. Trim after merging into one bucket.</violation>
</file>
1 issue found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/llm-info/src/sync-models.ts">
<violation number="1" location="packages/llm-info/src/sync-models.ts:237">
P2: Cap is not per final provider. `google` can get trimmed entries from both `google` and `google-vertex`, so `-n 5` can still append 10 models. Trim after merging into one bucket.</violation>
</file>
📝 Summary
This pulls model data from https://models.dev, which has an open-source API. I also considered OpenRouter, but its API has a slightly different structure; switching over would be easy if needed.
Some models don't exist in the API, so maybe we can cross-check them. I've removed them manually for now (e.g. `gpt-5.5-codex-spark`).
We could move this into a GitHub Actions workflow in the future.
📋 Pre-Review Checklist
✅ Merge Checklist