[FIX] Pooled global/groupby lag transforms to use RANGE semantics by simonez-tuidi · Pull Request #641 · Nixtla/mlforecast

simonez-tuidi · 2026-05-08T13:27:11Z

PR Description

Summary

This PR reworks global_ and groupby lag transforms so pooled features are computed over the underlying observations in each time range, matching SQL-style RANGE BETWEEN ... PRECEDING semantics.

This implements Option A from the issue #640 : change the default global_ / groupby behavior to RANGE semantics instead of preserving the current sum-then-roll behavior.

Previously, global and grouped transforms were backed by separate ad hoc state paths that aggregated each timestamp using sum before applying the lag transform. That made transforms such as:

RollingMean(window_size=2, global_=True)
RollingMean(window_size=2, groupby=["brand"])

behave like transforms over per-timestamp sums rather than transforms over all rows in the relevant time window. This branch introduces a shared pooled state representation and computes pooled transforms directly from bucketed observation arrays.

Problem Addressed

When multiple series share a pooled bucket, the old implementation first collapsed the data by timestamp:

ds	a	b	summed y
1	1	10	11
2	2	20	22
3	3	30	33
4	4	40	44

Then RollingMean(window_size=2, lag=1, global_=True) operated on [11, 22, 33, 44], producing values such as mean(11, 22) = 16.5.

That has two practical problems:

RollingMean scales with the number of series in the group, because it is effectively averaging sums.
Results can jump when the membership of a group changes, even if the target distribution is stable.

With this PR, the same transform operates over the individual observations in the RANGE window. For example, at ds=3 the window contains [1, 10, 2, 20], so the mean is 8.25.

One detail worth making explicit: min_samples is still evaluated over observations. With multiple series in a bucket, a window containing one timestamp can satisfy min_samples=2 if that timestamp has two observed rows.

What Changed

Added `PooledState`

Added mlforecast/pooled.py with a PooledState object that owns the state needed by pooled transforms:

flat observation arrays: bucket id, timestamp, time index, and target value
a GroupedArray for existing transform state initialization
bucket metadata used to join computed features back to the original dataframe
group key mappings for groupby transforms
update logic for predictions, new observations, new groups, and new series

This replaces the previous separate _global_ga / _global_times and _group_states code paths with a single state model.

Compute Pooled Features Directly

Lag transforms already expose _compute_bucket_feature(...) (added in earlier commits on this branch) for rolling, seasonal, expanding, EWM, Offset, and Combine transforms. This PR routes all pooled computation through those methods via a new compute_pooled_features() function, and removes the previous silent fallback to positional GroupedArray behavior.

The old fallback was problematic: when a transform did not implement _compute_bucket_feature, the code silently fell back to GA positional semantics, which produced incorrect results under RANGE window bounds. Unsupported pooled transforms now raise a clear NotImplementedError instead.

SQL-Like Range Semantics

Pooled transforms now use a per-bucket time_index derived from the validated regular time grid. For global and groupby transforms, this gives interval-style window bounds while preserving the existing codebase assumption that non-partitioned series do not contain gaps.

This means a grouped feature behaves like:

AVG(y) OVER (
  PARTITION BY brand
  RANGE BETWEEN 2 PERIODS PRECEDING AND 1 PERIOD PRECEDING
)

rather than first aggregating y by (brand, timestamp).

The intended equivalence model is:

mlforecast configuration	SQL mental model
`RollingMean(w, lag=l)`	`AVG(y) OVER (PARTITION BY unique_id RANGE BETWEEN ...)`
`RollingMean(w, lag=l, global_=True)`	`AVG(y) OVER (RANGE BETWEEN ...)`
`RollingMean(w, lag=l, groupby=["brand"])`	`AVG(y) OVER (PARTITION BY brand RANGE BETWEEN ...)`

Update Path Fixes

The pooled state update path now keeps all related arrays and metadata in sync:

appends observations to bucket_df
recomputes/extends time indexes after update()
handles new series and new groups
updates series-to-bucket mappings after static features change
preserves prediction-time feature computation after updates

This also fixes a pre-existing bug in TimeSeries.update() where new series received wrong static features: ufp.take_rows(df, ...) was indexing into the full DataFrame instead of the new-series subset, causing incorrect bucket assignments for series introduced via update().

Categorical Group Key Support

Grouped buckets are represented internally by numeric _bucket_ids, but public group keys such as brand or subcategory may be categorical. The new helpers reconcile pandas and Polars categoricals before joins and concatenations, including when updates introduce a new group value.

Tests

Added tests/test_pooled.py covering:

global and grouped update state preservation
sequential updates
staggered series starts
new series in new groups
categorical group keys with new group values
prediction-time feature computation after updates
unsupported pooled transform errors

Updated existing core tests to assert the new RANGE-style semantics for global and grouped rolling/expanding transforms.

The previous test_group_lag_transform used one series per group, which made sum-by-timestamp and RANGE semantics indistinguishable. The updated tests include multiple series in the same group so this behavior is covered directly.

Verification

Ran:

python -m pytest tests/test_pooled.py -x -q
python -m pytest tests/test_core.py -x -q
ruff check mlforecast/core.py mlforecast/pooled.py mlforecast/lag_transforms.py tests/test_pooled.py tests/test_core.py

The full branch suite was also verified with:

python -m pytest tests/test_core.py tests/test_lag_transforms.py tests/test_forecast.py tests/test_pooled.py -x -q

Result:

279 passed
2 skipped
lint clean
mlforecast/pooled.py at 98% coverage

Compatibility / Breaking Change

The public transform API is unchanged. Existing global_ and groupby arguments continue to be used.

This is nevertheless a necessary breaking change for users who already rely on global_ or groupby transforms with more than one series in a pooled bucket. The numeric output changes from "sum by timestamp, then apply the transform" to "apply the transform over all observations in the RANGE window".

For example, RollingMean(window_size=2, lag=1, global_=True) over two aligned series changes from:

ds=3: mean(sum(ds=1), sum(ds=2)) = mean(11, 22) = 16.5

to:

ds=3: mean(all observations at ds=1 and ds=2) = mean(1, 10, 2, 20) = 8.25

This change is intentional because the previous behavior made means scale with the number of series in the group and diverged from the SQL RANGE mental model used by the rest of the pooled/partitioned transform design. Preserving the old behavior would require adding a separate aggregation mode (for example, "sum by timestamp before transforming"), which would keep the incorrect default and add API complexity. This PR chooses correctness and consistency instead.

Users who depended on the old sum-then-transform behavior will need to reproduce that aggregation explicitly before fitting or use a future explicit aggregation option if one is added.

The internal fitted TimeSeries state shape changed, so previously pickled fitted TimeSeries objects that depend on the old private pooled state are not expected to be compatible.

Linking issue

Closes #640

Introduce PooledState for global/groupby transform state, compute pooled features from raw bucketed observations instead of summed timestamps, and add update/new-group/categorical coverage.

codspeed-hq · 2026-05-08T13:34:22Z

Merging this PR will not alter performance

✅ 12 untouched benchmarks

_{Comparing simonez-tuidi:feature/groupby_with_range_semantics (634550c) with main (3d501aa)}

janrth · 2026-05-13T15:09:02Z

Will this help finding a smarter solution for the partition_by problem?

As I just mentioned on the issue itself, I still like option B (from the issue) as I see reasons why we would first do the sum over ds before rolling window functions.

For example, check out this entry in discussion:
#644

simonez-tuidi · 2026-05-13T16:00:17Z

@janrth

Thanks Jan, yes I believe this approach would help bridge the gap between groupby in the current implementation and the proposed partition_by logic - by introducing the PooledState object that is tasked with handling both - actually all three, namely global or groupby plus partition_by

I'll try and wrap up soon with the new PR which branches off of this branch and includes the partition_by implementation using PooledState

Regarding the issue just raised, I believe this would work nicely with either implementation, and in my use cases I would see it more beneficial to go with Option A - using Combine to take the ratio of two RollingMean, one across a single series and one grouped or global

janrth · 2026-05-14T06:55:24Z

@codex

chatgpt-codex-connector · 2026-05-14T07:04:01Z

Codex Review: Didn't find any major issues. More of your lovely PRs please.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

simonez-tuidi · 2026-05-14T09:20:23Z

@nasaul @janrth

One thing worth noting here which wasn't included in the PR description: the new logic treats the min_samples parameter in quite a literal way, which I thought was the better option but could be up for interpretation. In the current code, the samples represent the number of individual observations, whereas in the previous implementation of groupby we were counting the number of individual timestamps that form part of the window.

So now if we perform a RollingMean(window_size=1, min_samples=2, groupby=["brand"]) the result will be not null if there are at least two series.

Now this behavior diverges from the single series implementation, where min_samples is always capped at window_size if the user passes a higher value.

I'm going to add better docs to this branch and explain this behavior more explicitly, pending discussion.

In my opinion we could add in a separate feature branch an optional parameter min_intervals which counts the number of observed timestamps with non-null values across the aggregation, what do you think?

janrth · 2026-05-14T11:35:53Z

@nasaul @janrth

One thing worth noting here which wasn't included in the PR description: the new logic treats the min_samples parameter in quite a literal way, which I thought was the better option but could be up for interpretation. In the current code, the samples represent the number of individual observations, whereas in the previous implementation of groupby we were counting the number of individual timestamps that form part of the window.

So now if we perform a RollingMean(window_size=1, min_samples=2, groupby=["brand"]) the result will be not null if there are at least two series.

Now this behavior diverges from the single series implementation, where min_samples is always capped at window_size if the user passes a higher value.

I'm going to add better docs to this branch and explain this behavior more explicitly, pending discussion.

In my opinion we could add in a separate feature branch an optional parameter min_intervals which counts the number of observed timestamps with non-null values across the aggregation, what do you think?

I realised this and had to think about this for a minute. I think totally fine. If people do a detailed deep dive into preprocess and inspect data they might realise the difference, but it is logically consistent and in line with the sql notation.

Making a few more doc strings will do the job for me!

simonez-tuidi · 2026-05-14T12:32:56Z

Thanks for taking the time to review and for the feedback, really appreciate it! @janrth

I realised this and had to think about this for a minute. I think totally fine. If people do a detailed deep dive into preprocess and inspect data they might realise the difference, but it is logically consistent and in line with the sql notation.

Making a few more doc strings will do the job for me!

In local (per-series) mode coreforecast caps min_samples at window_size, but in pooled mode (global_/groupby) min_samples counts total non-NaN observations across all series in the bucket with no capping. This documents the divergence in the _RollingBase and _Seasonal_RollingBase docstrings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Pooled (global/groupby) lag transforms assume a continuous, gap-free time grid. Emit a UserWarning in preprocess() when the user disables data validation so gaps don't silently produce incorrect feature values. The warning is suppressed for cross_validation's internal fit calls since that path validates the full dataset upfront. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Introduce _TimestampAggregates on PooledState that pre-computes per-bucket sums/counts/n_rows by timestamp. RollingMean uses the compact T-length arrays instead of the full (n_series * T)-length row arrays, reducing the working set from O(n_series * T) to O(T). Other transforms keep the existing row-level approach and are marked with TODOs for future migration. The cache is built in from_global/ from_groupby, updated incrementally in append_predictions, and rebuilt in append_observations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

During recursive prediction, RollingMean now computes the latest timestamp's feature value directly from cached _TimestampAggregates via _compute_latest_from_aggs, avoiding the O(n_series * T) query array construction. Transforms that don't support the fast path (all others currently) fall back to the existing build_query_arrays path. Benchmark (10k series, 100 timestamps, RollingMean(28), 20 recursive steps, 3 repeats — timing isolates _update_features + _update_y): global_=True: previous: 1.705s (85.25ms/step) → current: 0.138s (6.92ms/step) — 12.3x groupby=["brand"], 100 groups × 100 series: previous: 5.672s (283.59ms/step) → current: 0.330s (16.51ms/step) — 17.2x fit_transform time unchanged (~0.52s global, ~1.10s groupby). Checksum comparison of the first 3 recursive steps confirmed identical sums and NaN counts for both global and groupby cases. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

simonez-tuidi · 2026-05-14T15:34:55Z

@janrth

I've added four commits addressing the comments you raised:

ac8e7d3 - Documents that min_samples in pooled mode counts total observations across all series (no capping), unlike coreforecast's per-series behavior.
0d6ede4 - Emits a UserWarning when validate_data=False and pooled transforms exist. Suppressed in cross_validation internal calls where validation is done upfront.
54874ba - Adds _TimestampAggregates cache on PooledState with per-bucket sums/counts/n_rows. RollingMean uses the compact T-length arrays for both _transform() and compute_pooled_features(). Other transforms marked with TODOs.
95fa716 - Adds _compute_latest_from_aggs so RollingMean during recursive prediction computes directly from cached aggregates, skipping build_query_arrays entirely. 12-17x speedup on the prediction path.

Here's the specifics of the benchmarks:

Setup: 10,000 series, 100 timestamps each, RollingMean(28), 20 recursive update steps, 3 repeats. Timing isolates ts._update_features(); ts._update_y(...), so it measures the changed prediction path rather than model overhead.

Scenario	Impl	Recursive 20 steps	Per step	Speedup
`global_=True`	`3267a2f`	1.705s	85.25ms	1.0x
`global_=True`	current	0.138s	6.92ms	12.3x
`groupby=["brand"]`, 100 groups x 100 series	`3267a2f`	5.672s	283.59ms	1.0x
`groupby=["brand"]`, 100 groups x 100 series	current	0.330s	16.51ms	17.2x

fit_transform time was basically unchanged, as expected: global around 0.51-0.53s, groupby around 1.07-1.12s.

I also ran a short checksum comparison for the first 3 recursive steps on both implementations; sums and NaN counts matched for both global and groupby cases.

…) helper that extracts the shared cumsum/searchsorted computation. Refactored _compute_from_aggregates() to call it. Added _compute_ts_level_from_aggs() on _BaseLagTransform (returns None), RollingMean (calls helper per bucket), Offset (delegates), and Combine (combines element-wise). core.py: In _transform(), the global block now tries _compute_ts_level_from_aggs first. Transforms that support it get mapped directly to df_sorted rows via np.searchsorted on unique timestamps, bypassing the pandas merge in _join_bucket_features. Unsupported transforms fall through to the existing path.

When min_samples=0, empty windows caused ZeroDivisionError in _compute_latest_from_aggs (predict path) and silently returned 0.0 instead of NaN in _rolling_mean_from_agg (preprocess path). Add win_cnt > 0 guards to all five computation paths (_rolling_mean_from_agg, _compute_latest_from_aggs, _compute_row_level, _RollingBase, and _Seasonal_RollingBase) so empty windows consistently produce NaN. Warn at init time when min_samples=0 is used with global_/groupby. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…paths Add _expanding_mean_from_agg and _ewm_from_agg helpers that compute features from cached per-timestamp aggregates instead of the O(M*N_b) per-timestamp Python loop. ExpandingMean uses cumsum of sums/counts with searchsorted boundary lookup. EWM replaces the two-pass approach (per-timestamp mean loop + sequential scan) with a single pass over pre-computed sums/counts. Both transforms now support _compute_from_aggregates (fit), _compute_latest_from_aggs (predict), and _compute_ts_level_from_aggs (preprocess) fast paths. Offset and Combine delegate _compute_latest_from_aggs to their wrapped transforms. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace Python loops over series_bucket_id with vectorized numpy lookup arrays for both the fast path (_compute_latest_from_aggs) and slow path (query_arrays) in _update_features groupby handling. Sizes lookup array to cover all bucket IDs from both the result dict and series_bucket_id to prevent out-of-bounds indexing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add per-timestamp sum-of-squares, minimum, and maximum to the cached aggregates. sum_sq enables RollingStd/ExpandingStd fast paths via Bessel-corrected variance from prefix sums. mins/maxs enable RollingMin/Max/ExpandingMin/Max via sparse table or prefix reduction. _build_ts_aggs computes mins/maxs using np.minimum.at/np.maximum.at for O(N_b) vectorized construction. append_predictions updates all three new fields incrementally for both global and groupby buckets. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

janrth

Great work!

I was wondering if we could use the same timestamp-level aggregate fast paths for min, max and std. Currently they are using the generic pooled masking path, which is slower.

I know it is annoying and honestly I did not have that in mind earlier. But maybe we can have one more check for min, max and std.

I think for quantile rolling we would need a more complex data structure as it needs the full distribution in the rolling window. So let's leave this out for now, but would be great if you could have a final look at min, max, std and see if they can use the fast timestamp path.

janrth · 2026-05-16T15:56:14Z

@codex

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2620175c7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

simonez-tuidi · 2026-05-18T07:49:23Z

Great work!

I was wondering if we could use the same timestamp-level aggregate fast paths for min, max and std. Currently they are using the generic pooled masking path, which is slower.

I know it is annoying and honestly I did not have that in mind earlier. But maybe we can have one more check for min, max and std.

I think for quantile rolling we would need a more complex data structure as it needs the full distribution in the rolling window. So let's leave this out for now, but would be great if you could have a final look at min, max, std and see if they can use the fast timestamp path.

Thanks @janrth ! I already started implementing those and yes, for quantile we need to fall back on the "slow" implementation but for everything else it should still be solid

The pooled EWM accumulator was consuming timestamps 0..k-1 before emitting at timestamp k, ignoring the lag parameter. With lag=L, the output at timestamp k should only reflect timestamps 0..k-L. Fix all three code paths (_ewm_from_agg, slow path, _compute_latest_from_aggs) to use a two-pointer approach where consume_idx only advances up to unique_times[k] - lag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ingStd, ExpandingMin/Max Extend the per-timestamp aggregate optimization to all remaining decomposable transforms: - RollingStd/ExpandingStd: cumsum of sums, counts, sum_sq with Bessel-corrected variance formula - RollingMin/Max: sparse table (O(n log n) build, O(1) query) over per-timestamp mins/maxs - ExpandingMin/Max: prefix min/max via np.fmin/fmax.accumulate Each transform gets _compute_bucket_feature (fit), _compute_ts_level_from_aggs (preprocess), and _compute_latest_from_aggs (predict) fast paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add _idsorted_to_bucket_pos permutation to PooledState, built once at fit time using ufp.sort for categorical-safe ordering. During preprocess, groupby transforms with _compute_ts_level_from_aggs now map results directly via the permutation instead of going through _join_bucket_features (which does an O(n log n) pandas/polars merge on every call). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Parameterize all existing pooled tests over lag=[1, 3] so structural and numerical assertions run at higher lags. Add targeted tests: - test_ewm_lag_semantics: hand-computed EWM values at lag=2 verifying the two-pointer consumption fix (global + groupby) - test_pooled_transforms_lag3_global: all 8 decomposable transforms with lag=3, checking preprocess + predict against expected values - test_pooled_transforms_lag2_groupby: 7 transforms in groupby mode - test_fast_vs_slow_equivalence: parameterized over all 9 transforms × lag=[1,3], exercises fit (_compute_from_aggregates), preprocess (_compute_ts_level_from_aggs), and predict (_compute_latest_from_aggs) paths by comparing aggregate fast path vs row-level slow path Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Independent verification of lag-transforms using global_ or groupby mode with RANGE BETWEEN semantics using SQLite window functions. Covers 8 transforms × 3 lags × global/groupby, multi-column groupby, custom min_samples, staggered starts, and random stress tests (61 cases total). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

simonez-tuidi · 2026-05-19T10:58:25Z

@janrth @nasaul

I've added another sanity check by comparing results of the transforms with SQL equivalent semantics, on top of the hard-coded values in the other tests

Now all transforms that support groupby/global mode have been reimplemented and where possible are optimized for efficiency by avoiding joins and broadcasting values instead.

Let me know what you think, I'd love to see this merged soon and start implementing the partition_by feature on top of it

It would also be great to have it be part of the milestones in the next release, what do you think?

janrth · 2026-05-19T11:03:21Z

@janrth @nasaul

I've added another sanity check by comparing results of the transforms with SQL equivalent semantics, on top of the hard-coded values in the other tests

Now all transforms that support groupby/global mode have been reimplemented and where possible are optimized for efficiency by avoiding joins and broadcasting values instead.

Let me know what you think, I'd love to see this merged soon and start implementing the partition_by feature on top of it

It would also be great to have it be part of the milestones in the next release, what do you think?

Give ma a few days, but will look into your latest changes once I have a bit of time. But I have the feeling you have gotten pretty far and we are hopefully close to merging :)

…slow-path implementation

nasaul · 2026-05-19T20:01:26Z

Hey @simonez-tuidi this is a great job! I was ooo but great to see your work. Shout-out to @janrth for diving in with you. In order to merge I would argue that we need to work on:

Updating the docs
Add RollingSum and ExpandingSum (Another PR)
Translate pandas/polars logic into narwhals logic. This creates an easier time for reading and maintaining code without the splits.

If you're okay I can finish this tasks for merging.

simonez-tuidi · 2026-05-20T07:40:46Z

@nasaul thanks for that, absolutely go ahead, totally agree with switching to Narwhals to make it more maintainable

On adding RollingSum and ExpandingSum perhaps that would be best suited for merging in its own separate PR?

I can see other transforms being beneficial - like Counts - and we would also need to find the right design (e.g. do we sum all observations or is it intended to sum per-timestamp and then average?)

Another issue that was raised in #644 would be nice to see merged in its own PR - allowing non-local and local transforms to be used with Combine - this would actually probably fit in well with the current PR as well

Finally, I was also wondering if these changes will eventually end up being migrated to coreforecast instead, but I would say it'd be great to start merging in this repo and port them later once stable

nasaul · 2026-05-20T22:22:33Z

Okay, I agree that it should be another PR. Going into coreforecast can be a great option, but then we need to implement this functionality in C++, but let's first see if the implementation is stable. I'll work on the docs and the refactor into Narwhals.

review-notebook-app · 2026-05-22T23:34:42Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

nasaul

Thanks for this PR, great work!

Fix pooled global/groupby lag transforms to use RANGE semantics

3267a2f

Introduce PooledState for global/groupby transform state, compute pooled features from raw bucketed observations instead of summed timestamps, and add update/new-group/categorical coverage.

simonez-tuidi mentioned this pull request May 8, 2026

[FEAT] Add partition_by support for lag transforms (with global_/groupby fixes) #636

Draft

simonez-tuidi mentioned this pull request May 13, 2026

[CORE] groupby/global_ transforms use sum-then-roll semantics instead of SQL RANGE window semantics #640

Closed

janrth reviewed May 14, 2026

View reviewed changes

Comment thread mlforecast/pooled.py

Comment thread mlforecast/pooled.py

simonez-tuidi and others added 4 commits May 14, 2026 16:55

simonez-tuidi requested a review from janrth May 14, 2026 16:30

janrth reviewed May 14, 2026

View reviewed changes

Comment thread mlforecast/core.py

simonez-tuidi and others added 6 commits May 15, 2026 11:27

Merge branch 'main' into feature/groupby_with_range_semantics

e262017

janrth reviewed May 16, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 16, 2026

View reviewed changes

Comment thread mlforecast/lag_transforms.py Outdated

simonez-tuidi and others added 4 commits May 18, 2026 12:42

simonez-tuidi requested a review from janrth May 19, 2026 10:58

Added fix to address gap in the test that failed to cover the actual …

6281149

…slow-path implementation

simonez-tuidi mentioned this pull request May 20, 2026

Adding partition_by to lag-transforms building upon PooledState #651

Open

nasaul added 2 commits May 21, 2026 20:53

Refactors to narwhals

5ccb574

Add docs

8d5210e

Merge branch 'main' into feature/groupby_with_range_semantics

634550c

nasaul changed the title ~~Fix pooled global/groupby lag transforms to use RANGE semantics~~ [FIX] Pooled global/groupby lag transforms to use RANGE semantics May 22, 2026

nasaul approved these changes May 23, 2026

View reviewed changes

nasaul merged commit 06e3d8f into Nixtla:main May 23, 2026
22 checks passed

Conversation

simonez-tuidi commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Description

Summary

Problem Addressed

What Changed

Added PooledState

Compute Pooled Features Directly

SQL-Like Range Semantics

Update Path Fixes

Categorical Group Key Support

Tests

Verification

Compatibility / Breaking Change

Linking issue

Uh oh!

codspeed-hq Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Uh oh!

janrth commented May 13, 2026

Uh oh!

simonez-tuidi commented May 13, 2026

Uh oh!

janrth commented May 14, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 14, 2026

Uh oh!

simonez-tuidi commented May 14, 2026

Uh oh!

Uh oh!

Uh oh!

janrth commented May 14, 2026

Uh oh!

simonez-tuidi commented May 14, 2026

Uh oh!

simonez-tuidi commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

janrth left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

janrth commented May 16, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

simonez-tuidi commented May 18, 2026

Uh oh!

simonez-tuidi commented May 19, 2026

Uh oh!

janrth commented May 19, 2026

Uh oh!

nasaul commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

simonez-tuidi commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nasaul commented May 20, 2026

Uh oh!

review-notebook-app Bot commented May 22, 2026

Uh oh!

nasaul left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

simonez-tuidi commented May 8, 2026 •

edited

Loading

Added `PooledState`

codspeed-hq Bot commented May 8, 2026 •

edited

Loading

simonez-tuidi commented May 14, 2026 •

edited

Loading

janrth left a comment •

edited

Loading

nasaul commented May 19, 2026 •

edited

Loading

simonez-tuidi commented May 20, 2026 •

edited

Loading