Skip to content

Fix: expand dict transformation#9561

Open
Shamik-07 wants to merge 18 commits into
marimo-team:mainfrom
Shamik-07:fix/expand_dict_transformation
Open

Fix: expand dict transformation#9561
Shamik-07 wants to merge 18 commits into
marimo-team:mainfrom
Shamik-07:fix/expand_dict_transformation

Conversation

@Shamik-07
Copy link
Copy Markdown
Contributor

@Shamik-07 Shamik-07 commented May 15, 2026

📝 Summary

Using narwahls to convert all backend to polars and then using the unnest function of polars for expanding the dict and then convert it back to the original backend.
Closes #4583

Screen.Recording.2026-05-15.at.18.07.461.mov

📋 Pre-Review Checklist

  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • Video or media evidence is provided for any visual changes (optional).

✅ Merge Checklist

  • I have read the contributor guidelines.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Tests have been added for the changes made.

@vercel
Copy link
Copy Markdown

vercel Bot commented May 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment May 21, 2026 10:13pm

Request Review

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files

Architecture diagram
sequenceDiagram
    participant User as User (Marimo UI)
    participant DFPlugin as Dataframe Plugin
    participant Handler as ExpandDict Handler (handlers.py)
    participant Narwhals as Narwhals Layer
    participant Polars as Polars Engine
    participant Backend as Original Backend (pandas/Ibis/other)

    Note over User,Backend: Expand Dict Transformation Flow

    User->>DFPlugin: Trigger expand dict on column
    DFPlugin->>Handler: handle_expand_dict(df, transform)

    Handler->>Narwhals: collect_and_preserve_type(df)
    Narwhals->>Backend: Collect actual data from original backend
    Backend-->>Narwhals: Data as native type
    Narwhals-->>Handler: Collected DataFrame + undo function

    Handler->>Polars: collected_df.to_polars()
    Note over Handler,Polars: Convert to Polars for unnest support

    Polars->>Polars: polars_df.unnest(column_id)
    Note over Polars: Handles null dict values correctly

    Polars-->>Handler: Unnested Polars DataFrame

    Handler->>Narwhals: nw.from_native(unnested)
    Narwhals->>Handler: Narwhals wrapper

    Handler->>Handler: undo(narwhals_df)
    Note over Handler: Convert back to original backend type

    Handler-->>DFPlugin: Transformed DataFrame
    DFPlugin-->>User: Updated table with expanded columns
Loading

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Re-trigger cubic

Comment thread marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py
{"A": [{"foo": 1, "bar": "hello"}], "B": [1]}
{"A": [{"foo": 1, "bar": "hello"}], "B": [1]},
),
create_test_dataframes(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

polars should already be created in this create_test_dataframes. so we dont need to add the dataframe below

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Copy Markdown
Contributor

@mscolnick mscolnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to take an optional dep on polars

Shamik-07 added 3 commits May 19, 2026 17:45
… rows with the create test dataframes instead.
refactor: adding None == NaN in assert frame equal with nans method to use it in expand_dict test.
@Shamik-07
Copy link
Copy Markdown
Contributor Author

need to take an optional dep on polars

Done.
Added allow_none_equals_nan for assert_frame_equal_with_nans as None!=NaN, which was causing the use of assert_frame_equal_with_nans to fail for test_expand_dict.

@Shamik-07 Shamik-07 requested a review from mscolnick May 19, 2026 22:30
@Shamik-07
Copy link
Copy Markdown
Contributor Author

There are some pandas CI errors that i am looking into.

@Shamik-07
Copy link
Copy Markdown
Contributor Author

There are some pandas CI errors that i am looking into.

This is happening because of data conversion mismatch between pandas and narwahls with mixed data types

"Could not convert '3' with type str: tried to convert to double"

for

test_print_code_result_matches_actual_transform_pandas(
    transform=ExpandDictTransform(
        type=TransformType.EXPAND_DICT,
        column_id='strings',
    ),
)

So my only option is to fallback to pandas backend processing separately for the unnest.
This should fix it.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py
@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Bundle Report

Bundle size has no change ✅

Affected Assets, Files, and Routes:

view changes for bundle: marimo-esm

Assets Changed:

Asset Name Size Change Total Size Change (%)
assets/dist-*.js -166 bytes 169 bytes -49.55%
assets/dist-*.js 82 bytes 259 bytes 46.33% ⚠️
assets/dist-*.js 39 bytes 176 bytes 28.47% ⚠️
assets/dist-*.js 234 bytes 403 bytes 138.46% ⚠️
assets/dist-*.js -73 bytes 183 bytes -28.52%
assets/dist-*.js -33 bytes 104 bytes -24.09%
assets/dist-*.js 59 bytes 335 bytes 21.38% ⚠️
assets/dist-*.js -74 bytes 102 bytes -42.05%
assets/dist-*.js 62 bytes 164 bytes 60.78% ⚠️
assets/dist-*.js -60 bytes 104 bytes -36.59%
assets/dist-*.js -76 bytes 183 bytes -29.34%
assets/dist-*.js 283 bytes 387 bytes 272.12% ⚠️
assets/dist-*.js 33 bytes 137 bytes 31.73% ⚠️
assets/dist-*.js -14 bytes 169 bytes -7.65%
assets/dist-*.js -250 bytes 137 bytes -64.6%
assets/dist-*.js 172 bytes 276 bytes 165.38% ⚠️
assets/dist-*.js 87 bytes 256 bytes 51.48% ⚠️
assets/dist-*.js -226 bytes 177 bytes -56.08%
assets/dist-*.js -79 bytes 104 bytes -43.17%
assets/ttcn-*.js -7 bytes 57 bytes -10.94%
assets/ttcn-*.js 7 bytes 64 bytes 12.28% ⚠️

@Shamik-07
Copy link
Copy Markdown
Contributor Author

@mscolnick
Apart from the cubic ai's review comment, we started this because expand_dict was having trouble with NaN rows.
We have certainly fixed that, but the issue is regarding deeply nested dicts being expanded by pandas if we are using pd.json_normalize vs. polars unnest will only unnest a single level by default. This will yield discrepancies between both the backend outputs.

Current Behaviour
e.g. df with nested columns
image

polars unnest
image

pandas unnest
image

Proposal
Let's unnest only a single level in pandas thus matching polars unnest.
This will always give us the same backend agnostic output
image

I have added the fix to this PR, please let me know your thoughts.

@Shamik-07
Copy link
Copy Markdown
Contributor Author

Screen.Recording.2026-05-21.at.18.07.581.mov

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New dataframe transform: Polars Native Expand Dict Transformation

2 participants