merutable-capi: C ABI layer for the merutable engine #70
Draft
jakeswenson wants to merge 2 commits into
Summary
Introduces `merutable-capi`: a C ABI layer over the merutable Rust engine, with its header generated via cbindgen. It exposes the full database lifecycle (open, read, write, close) plus a new lightweight `meru_manifest_info` function for read-only catalog inspection, which is the primary entry point for the DuckDB extension.
Commits
- `feat: add merutable-capi`: initial C ABI crate with full CRUD lifecycle, cbindgen header, and smoke test
- `feat(capi): add meru_manifest_info`: shared runtime model, read-only manifest inspection, internal column filtering
Design principles
1. The ABI surface is minimal and stable.
Only types and functions that have a clear external consumer are exported. Internal Rust types (manifests, WAL structures, compaction state) never cross the boundary. cbindgen generates `include/merutable.h` directly from Rust `#[repr(C)]` types, so the header is always in sync with the implementation.
2. Ownership is always explicit and single-sided.
Every heap allocation made by the Rust side is documented in the header and freed by a paired `*_free` function. The caller never calls `free()` directly on any pointer returned by the API. Passing `NULL` to any `*_free` function is safe. This contract is enforced by the Rust `Drop` / `Box::from_raw` / `CString::from_raw` discipline in the free functions.
3. Async I/O is driven by an explicit, caller-owned runtime.
The crate defines a `MeruRuntime` opaque type wrapping a `tokio::runtime::Runtime` behind an `Arc`. The caller creates one runtime with `meru_runtime_new()` and passes it to every function that performs I/O. Multiple database handles opened on the same runtime share one thread pool. No hidden runtime is ever allocated per call. The `Arc` ensures the thread pool stays alive as long as any handle that references it is open, and shuts down naturally when the last reference is dropped.
4. Internal columns are filtered before crossing the ABI boundary.
merutable's Parquet files carry bookkeeping columns (`_merutable_ikey`, `_merutable_seq`, `_merutable_op`, `_merutable_value`) that are injected by the codec layer and are invisible to the public schema. The `MeruManifestInfo.columns` array is explicitly filtered against these names in Rust before being handed to C. The C++ extension does not need, and must not need, an `IsSystemColumn()` guard.
5. All file paths returned are absolute.
Manifest entries use relative paths internally. Any function that returns file paths canonicalizes the base directory and prefixes every entry, so callers receive paths that can be passed directly to a file reader.
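Principles 4 and 5 together describe what the Rust side does to manifest data before handing it to C. A minimal sketch of both steps follows, using only the standard library. The helper names `public_columns` and `absolute_paths`, and the example file names, are hypothetical and not taken from the crate; only the four internal column names come from this description.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Bookkeeping column names listed in this PR description.
const INTERNAL_COLUMNS: [&str; 4] = [
    "_merutable_ikey",
    "_merutable_seq",
    "_merutable_op",
    "_merutable_value",
];

/// Hypothetical helper: drop internal columns, preserving schema order.
fn public_columns<'a>(all: &[&'a str]) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|name| !INTERNAL_COLUMNS.contains(name))
        .collect()
}

/// Hypothetical helper: canonicalize the catalog directory once, then
/// prefix every relative manifest entry with it.
fn absolute_paths(base: &Path, relative: &[&str]) -> io::Result<Vec<PathBuf>> {
    // `canonicalize` fails if the directory does not exist, which a real
    // implementation could map to a "not found" status.
    let base = base.canonicalize()?;
    Ok(relative.iter().map(|rel| base.join(rel)).collect())
}
```

Filtering on the Rust side keeps the list of internal names in one place, so a consumer of the C header never needs its own copy of it.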
6. Dual-format manifest reading.
Functions that read the manifest (including `meru_manifest_info` and `meru_open_existing`) prefer `v{N}.metadata.pb` (the canonical protobuf format introduced in #28) and fall back to `v{N}.metadata.json` for catalogs committed before the migration. Format detection uses the `MRUB` magic prefix.
ABI surface
Runtime lifecycle
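The shared-runtime model from principle 3 can be illustrated without tokio. The sketch below is not the crate's API: `FakePool` stands in for `tokio::runtime::Runtime`, and every `demo_*` name and signature is hypothetical. It only shows how an `Arc` lets handles outlive the caller's runtime pointer while still sharing one pool.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Count of live pools, so the sketch can observe when shutdown happens.
static LIVE_POOLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for `tokio::runtime::Runtime` (assumption, keeps the sketch std-only).
struct FakePool;

impl FakePool {
    fn new() -> Self {
        LIVE_POOLS.fetch_add(1, Ordering::SeqCst);
        FakePool
    }
}

impl Drop for FakePool {
    fn drop(&mut self) {
        LIVE_POOLS.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Opaque to C: only ever seen through a raw pointer.
pub struct DemoRuntime {
    pool: Arc<FakePool>,
}

/// Each database handle keeps its own clone of the `Arc`.
pub struct DemoHandle {
    _pool: Arc<FakePool>,
}

pub extern "C" fn demo_runtime_new() -> *mut DemoRuntime {
    Box::into_raw(Box::new(DemoRuntime { pool: Arc::new(FakePool::new()) }))
}

/// # Safety
/// `rt` must be null or a live pointer returned by `demo_runtime_new`.
pub unsafe extern "C" fn demo_open(rt: *const DemoRuntime) -> *mut DemoHandle {
    match unsafe { rt.as_ref() } {
        // Cloning the Arc shares the pool instead of creating a new one.
        Some(rt) => Box::into_raw(Box::new(DemoHandle { _pool: Arc::clone(&rt.pool) })),
        None => std::ptr::null_mut(),
    }
}

/// # Safety
/// `h` must be null or an unfreed pointer returned by `demo_open`.
pub unsafe extern "C" fn demo_close(h: *mut DemoHandle) {
    if !h.is_null() {
        drop(unsafe { Box::from_raw(h) });
    }
}

/// # Safety
/// `rt` must be null or an unfreed pointer returned by `demo_runtime_new`.
pub unsafe extern "C" fn demo_runtime_free(rt: *mut DemoRuntime) {
    if !rt.is_null() {
        drop(unsafe { Box::from_raw(rt) });
    }
}
```

In this sketch, freeing the runtime pointer while handles are still open only drops one reference; the pool is torn down when the last handle closes.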
Database lifecycle
`meru_open_existing` reads the `TableSchema` from the manifest on disk, so no schema needs to be re-supplied. Returns `MeruStatus_ErrNotFound` when no catalog exists at the path.
Read / write
Maintenance
Manifest inspection (new)
`meru_manifest_info` reads the catalog manifest at `path` without acquiring a write lock or initializing a `MeruHandle`. It is the intended first call from the DuckDB extension: inspect the schema and enumerate Parquet files cheaply, then decide whether to open a full handle. Returns `MeruStatus_ErrNotFound` when no catalog exists at `path`.
Memory free helpers
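As an illustration of the paired-free discipline from principle 2, here is a minimal sketch of a Rust-allocated array of C strings and its matching free function. `DemoNames`, `demo_names_new`, and `demo_names_free` are hypothetical names for this example, not the crate's types or functions; the `Box::from_raw` and `CString::from_raw` calls are the standard-library mechanisms the description refers to.

```rust
use std::ffi::{c_char, CString};
use std::ptr;

/// Hypothetical C-visible struct: an array of NUL-terminated strings.
#[repr(C)]
pub struct DemoNames {
    pub names: *mut *mut c_char,
    pub len: usize,
}

/// Allocates everything on the Rust heap and leaks it to the caller.
/// Written as a plain Rust function here; a real export would take C inputs.
pub fn demo_names_new(src: &[&str]) -> *mut DemoNames {
    let raw: Vec<*mut c_char> = src
        .iter()
        .map(|s| CString::new(*s).expect("no interior NUL").into_raw())
        .collect();
    let len = raw.len();
    // A boxed slice has exactly `len` elements, so it can be rebuilt from (ptr, len).
    let names = Box::into_raw(raw.into_boxed_slice()) as *mut *mut c_char;
    Box::into_raw(Box::new(DemoNames { names, len }))
}

/// Paired free function. Passing NULL is a no-op.
///
/// # Safety
/// `info` must be null or an unfreed pointer returned by `demo_names_new`.
pub unsafe extern "C" fn demo_names_free(info: *mut DemoNames) {
    if info.is_null() {
        return;
    }
    unsafe {
        // Reclaim the struct, then the array, then each string, so every
        // allocation is released by the allocator that created it.
        let info = Box::from_raw(info);
        let slice = Box::from_raw(ptr::slice_from_raw_parts_mut(info.names, info.len));
        for &p in slice.iter() {
            drop(CString::from_raw(p));
        }
    }
}
```

Because the strings and the array come from Rust's allocator, calling C `free()` on them would be undefined behavior, which is why the contract routes all deallocation through the paired function.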
Key types
- `MeruColumnType`
- `MeruValue`
- `MeruRow`: `MeruValue` in schema column order
- `MeruColumnDef`
- `MeruSchema` / `MeruOpenOptions`
- `MeruScanResult`: `meru_scan`
- `MeruStats`
- `MeruManifestInfo`
- `MeruStatus`
Testing
- `crates/merutable-capi/tests/c_smoke.rs`: Rust test that compiles `examples/smoke.c` against the built dylib and runs it end-to-end (open → put → get → scan → stats → close → reopen → close)
- `examples/smoke.c`: can also be compiled and run manually against any catalog path
Also in this PR
- `rust-toolchain.toml`: pins the workspace to stable with rustfmt and clippy components
- `.gitignore`: ignores `version-hint.text` and `metadata/*.metadata.{json,pb}` at the repo root, which were being generated by test runs using relative catalog paths