[Torchvision API] Input metadata by mdabek-nvidia · Pull Request #6364 · NVIDIA/DALI

mdabek-nvidia · 2026-05-22T11:57:37Z

Category:

New feature

Description:

Torchvision functional API operators for retrieving input metadata:

get_image_size
get_dimensions

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

Checklist

Documentation

DALI team only

Requirements

Implements new requirements
Affects existing requirements
N/A

REQ IDs: N/A

JIRA TASK: N/A

Signed-off-by: Marek Dabek <mdabek@nvidia.com>

mdabek-nvidia · 2026-05-22T17:37:30Z

@greptileai review

greptile-apps · 2026-05-22T17:43:14Z

Greptile Summary

This PR adds three new capabilities to the DALI torchvision functional API: get_image_size and get_dimensions (image metadata functions mirroring torchvision.transforms.v2.functional), crop (a deterministic functional crop), and RandomCrop (a random-location crop operator). All new code operates on PIL images and torch.Tensor inputs; the crop operators delegate to DALI's fn.slice / ndd.slice under the hood.

get_image_size and get_dimensions are pure Python wrappers with correct W/H ordering for PIL and correct last-3-dims extraction for tensors; tests compare directly against torchvision.
RandomCrop builds the random offset inside a DALI pipeline graph using fn.random.uniform; the pad_if_needed logic is correct, but when pad_if_needed=False and the image is smaller than the crop, _randint receives a negative max_value and constructs an invalid uniform range — producing undefined pipeline behavior instead of an actionable error.
The test file contains a large commented-out block with an unresolved TODO that should be removed and tracked as a separate issue before merging.
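The metadata ordering described above can be illustrated on plain shape tuples. This is a minimal sketch, not the PR's implementation: the function names `image_size_from_shape` and `dimensions_from_shape` are hypothetical, and the 2-D case (prepending a channel count of 1) is an assumption based on how torchvision is understood to treat channel-less tensors.

```python
from typing import List, Sequence


def image_size_from_shape(shape: Sequence[int]) -> List[int]:
    """Return [W, H] from a tensor-like shape whose last two dims are (H, W)."""
    if len(shape) < 2:
        raise TypeError(
            f"Expected a shape with at least 2 dimensions, got {len(shape)}."
        )
    return [shape[-1], shape[-2]]  # width first, then height


def dimensions_from_shape(shape: Sequence[int]) -> List[int]:
    """Return [C, H, W] taken from the last three dims of a tensor-like shape."""
    if len(shape) < 2:
        raise TypeError(
            f"Expected a shape with at least 2 dimensions, got {len(shape)}."
        )
    if len(shape) == 2:
        # Assumption: a 2-D input is treated as a single-channel image.
        return [1, shape[0], shape[1]]
    return list(shape[-3:])  # leading batch dims are ignored


print(image_size_from_shape((3, 480, 640)))     # CHW input
print(dimensions_from_shape((8, 3, 480, 640)))  # NCHW input
```

Note that the two functions order their results differently: the size is width-first, while the dimensions are channel, height, width.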

Confidence Score: 3/5

The metadata and functional crop additions are solid, but RandomCrop has an unguarded code path that produces undefined DALI pipeline behavior, and the test file should not land with a commented-out TODO block.

RandomCrop silently passes a negative value to _randint when the image is smaller than the crop size and pad_if_needed is not set — this corrupts the pipeline graph instead of raising a meaningful error, and there is no test covering this failure mode. Additionally, a substantial block of commented-out test code with an unresolved TODO is present in the test file. Both should be addressed before merging.

randomcrop.py (negative-max_value guard in _kernel) and test_tv_randomcrop.py (commented-out TODO block).

Important Files Changed

Filename	Overview
dali/python/nvidia/dali/experimental/torchvision/v2/functional/image_metadata.py	New file implementing get_image_size and get_dimensions; logic correctly mirrors torchvision for PIL and tensor inputs; minor: error messages missing trailing periods.
dali/python/nvidia/dali/experimental/torchvision/v2/randomcrop.py	New RandomCrop operator; _randint receives a negative max_value when image is smaller than crop size and pad_if_needed=False, producing undefined DALI pipeline behavior with no actionable error message.
dali/python/nvidia/dali/experimental/torchvision/v2/functional/crop.py	New functional crop wrapper; layout-axis mapping is correct; docstring is minimal and omits parameters/return type.
dali/test/python/torchvision/test_tv_randomcrop.py	New tests for RandomCrop; correctness verified against torchvision; assert_raises calls lack glob= patterns; large commented-out TODO block must be removed before merging.
dali/test/python/torchvision/test_tv_crop.py	New tests for functional crop; validates against torchvision across device/mode/layout; assert_raises calls lack glob= patterns.
dali/test/python/torchvision/test_tv_image_metadata.py	New tests for get_image_size and get_dimensions; good coverage of PIL modes, tensor ranks, GPU, and torchvision compatibility; error-path tests present.

Sequence Diagram

sequenceDiagram
    participant User
    participant crop_fn as functional.crop
    participant RandomCrop
    participant ndd_slice as ndd.slice / fn.slice

    User->>crop_fn: crop(inpt, top, left, height, width)
    crop_fn->>crop_fn: _verify_crop_coordinate(top, left)
    crop_fn->>RandomCrop: "verify_args(size=(height,width), ...)"
    crop_fn->>ndd_slice: "slice(inpt, (top,left), (height,width), axes=...)"
    ndd_slice-->>User: cropped tensor/batch

    User->>RandomCrop: __call__(inpt)
    RandomCrop->>RandomCrop: preprocess_data → get_HWC_from_layout_pipeline
    RandomCrop->>RandomCrop: _kernel(in_h, in_w, _, tensor)
    alt needs_padding
        RandomCrop->>fn.slice: pad with out_of_bounds_policy
        fn.slice-->>RandomCrop: padded tensor
    end
    RandomCrop->>RandomCrop: _randint(max_top), _randint(max_left)
    RandomCrop->>fn.slice: slice at random offset
    fn.slice-->>User: cropped tensor
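
The first path in the diagram, crop(inpt, top, left, height, width) mapped onto a slice with anchor (top, left) and shape (height, width), can be sketched on a nested list without any DALI dependency. The helper name `crop_rows` is hypothetical, and rejecting out-of-bounds windows is a simplification chosen for this sketch rather than a statement about what the PR or torchvision does.

```python
from typing import List, Sequence


def crop_rows(
    image: Sequence[Sequence[int]], top: int, left: int, height: int, width: int
) -> List[List[int]]:
    """Crop a 2-D image (list of rows) to the window anchored at (top, left)."""
    if top < 0 or left < 0:
        raise ValueError(f"Crop coordinates must be non-negative, got ({top}, {left}).")
    if height <= 0 or width <= 0:
        raise ValueError(f"Crop size must be positive, got ({height}, {width}).")
    in_h = len(image)
    in_w = len(image[0]) if in_h else 0
    if top + height > in_h or left + width > in_w:
        # Simplification for this sketch: out-of-bounds windows are rejected.
        raise ValueError(
            f"Crop window exceeds the image bounds ({in_h}, {in_w})."
        )
    # Slice rows first (H axis), then columns (W axis) within each row.
    return [list(row[left:left + width]) for row in image[top:top + height]]


image = [[r * 10 + c for c in range(5)] for r in range(4)]  # 4 rows x 5 columns
print(crop_rows(image, top=1, left=2, height=2, width=3))
# → [[12, 13, 14], [22, 23, 24]]
```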

Comments Outside Diff (2)

dali/test/python/torchvision/test_tv_randomcrop.py, line 1084-1103

[Bug] Commented-out test code with TODO should not be merged

A large block of commented-out test functions wrapped in a docstring is left behind with # TODO: Fill using dictionary pattern is currently not supported. Per project policy, TODOs must be resolved (or tracked with an issue reference) before merging, and dead code should not be preserved in source — git history serves that purpose. Either remove this block entirely and open a tracked issue, or implement the functionality before landing.

dali/test/python/torchvision/test_tv_randomcrop.py, line 1157-1158

[Style] assert_raises calls lack a message-pattern glob

Project convention requires assert_raises(ExcType, glob="<pattern>") so that a test can't pass if an unrelated path raises the same exception type with a meaningless message. This pattern appears throughout test_tv_randomcrop.py (e.g. test_random_crop_invalid_size, test_random_crop_invalid_padding, test_random_crop_invalid_fill) and throughout test_tv_crop.py (test_crop_invalid_input_type, test_crop_invalid_output_size, test_crop_invalid_coordinates). Adding glob= guards that the right error is raised for the right reason.
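
The value of matching on the message, not just the exception type, can be shown with a small standard-library stand-in. The context manager `raises_matching` and the function `check_crop_size` below are hypothetical names for illustration only; this is not the project's assert_raises helper, only a sketch of the same idea using fnmatch globs.

```python
import fnmatch
from contextlib import contextmanager


@contextmanager
def raises_matching(exc_type, glob):
    """Pass only if the body raises exc_type with a message matching glob."""
    try:
        yield
    except exc_type as exc:
        message = str(exc)
        if not fnmatch.fnmatchcase(message, glob):
            raise AssertionError(
                f"{exc_type.__name__} was raised, but its message {message!r} "
                f"does not match {glob!r}."
            ) from exc
    else:
        raise AssertionError(f"{exc_type.__name__} was not raised.")


def check_crop_size(size):
    """Hypothetical validator used only to trigger an error in this sketch."""
    if size <= 0:
        raise ValueError(f"Crop size must be positive, got {size}.")


# Passes: the right exception type with the expected message.
with raises_matching(ValueError, "*must be positive*"):
    check_crop_size(0)
```

With a type-only check, a ValueError raised by an unrelated code path would let the test pass; the glob ties the test to the specific failure it is meant to cover.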

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Reviews (1): Last reviewed commit: "Image information Torchvision's function..."

greptile-apps · 2026-05-22T17:43:22Z

+        max_top = fn.cast(in_h, dtype=dali.types.INT32) - crop_h
+        max_left = fn.cast(in_w, dtype=dali.types.INT32) - crop_w
+
+        top = RandomCrop._randint(max_top)
+        left = RandomCrop._randint(max_left)
+
+        return fn.slice(
+            tensor,
+            fn.stack(left, top),
+            fn.stack(crop_w, crop_h),
+            device=self.device,
+            axis_names="WH",
+        )


[Bug] Undefined behavior when crop size exceeds image size and pad_if_needed=False

When pad_if_needed=False (the default) and no explicit padding is supplied, self.needs_padding is False so the padding block is skipped. If the input image is smaller than the requested crop, max_top = in_h - crop_h becomes negative. _randint then builds range_end = max_value + 1 ≤ 0, passing fn.random.uniform(range=[0, ≤0]) — an invalid range — to DALI. Torchvision raises a clear ValueError in this case; DALI silently produces garbage or a pipeline crash with no actionable message. A guard like if max_top < 0 or max_left < 0: raise ValueError(...) should be added before _randint is called.
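
A minimal sketch of the suggested guard, assuming the image and crop sizes are available as plain Python integers. Inside a DALI pipeline graph these values are graph nodes rather than integers, so the check would have to be expressed differently there. The helper name `random_crop_offsets` and the use of Python's `random` module are assumptions for illustration, not the PR's code.

```python
import random
from typing import Optional, Tuple


def random_crop_offsets(
    in_h: int, in_w: int, crop_h: int, crop_w: int,
    rng: Optional[random.Random] = None,
) -> Tuple[int, int]:
    """Pick a random (top, left) so that the crop window fits inside the image."""
    rng = rng or random.Random()
    max_top = in_h - crop_h
    max_left = in_w - crop_w
    # Guard: without padding, a crop larger than the image has no valid offset.
    if max_top < 0 or max_left < 0:
        raise ValueError(
            f"Requested crop size ({crop_h}, {crop_w}) is larger than "
            f"the input image size ({in_h}, {in_w})."
        )
    # randint is inclusive on both ends, so offsets lie in [0, max_*].
    top = rng.randint(0, max_top)
    left = rng.randint(0, max_left)
    return top, left


top, left = random_crop_offsets(10, 12, 4, 5, rng=random.Random(0))
assert 0 <= top <= 6 and 0 <= left <= 7
```

Raising before the random range is built keeps the failure at the point where the cause is known, which is what makes the message actionable.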

greptile-apps · 2026-05-22T17:43:27Z

+        if inpt.ndim < 2:
+            raise TypeError(
+                f"get_image_size requires a tensor with at least 2 dimensions, got {inpt.ndim}"
+            )
+        return [inpt.shape[-1], inpt.shape[-2]]  # [W, H]
+    raise TypeError(f"Unsupported input type: {type(inpt)}")
+
+
+def get_dimensions(inpt: Image.Image | torch.Tensor) -> List[int]:


[Style] Error messages should end with a period

Both error messages are missing a trailing period, violating the project convention that error messages must read as complete sentences.

Suggested change

         if inpt.ndim < 2:
             raise TypeError(
-                f"get_image_size requires a tensor with at least 2 dimensions, got {inpt.ndim}"
+                f"get_image_size requires a tensor with at least 2 dimensions, got {inpt.ndim}."
             )
         return [inpt.shape[-1], inpt.shape[-2]]  # [W, H]
-    raise TypeError(f"Unsupported input type: {type(inpt)}")
+    raise TypeError(f"Unsupported input type: {type(inpt)}.")


 def get_dimensions(inpt: Image.Image | torch.Tensor) -> List[int]:

greptile-apps · 2026-05-22T17:43:28Z

+        if inpt.ndim < 2:
+            raise TypeError(
+                f"get_dimensions requires a tensor with at least 2 dimensions, got {inpt.ndim}"
+            )
+        if inpt.ndim == 2:


[Style] get_dimensions error messages also missing trailing period

Same sentence-ending convention issue in get_dimensions.

Suggested change

         if inpt.ndim < 2:
             raise TypeError(
-                f"get_dimensions requires a tensor with at least 2 dimensions, got {inpt.ndim}"
+                f"get_dimensions requires a tensor with at least 2 dimensions, got {inpt.ndim}."
             )
         if inpt.ndim == 2:

mdabek-nvidia added 10 commits May 8, 2026 11:59

Torchvision API RandomApply implementation

d3f473b

Signed-off-by: Marek Dabek <mdabek@nvidia.com>

Greptile review fixes

b0c754b

Signed-off-by: Marek Dabek <mdabek@nvidia.com>

Adding 0 < p < 1 tests

57302e8

Signed-off-by: Marek Dabek <mdabek@nvidia.com>

Review fixes

a4d6209

Signed-off-by: Marek Dabek <mdabek@nvidia.com>

Torchvision API RandomCrop and crop operators

693e9af

Signed-off-by: Marek Dabek <mdabek@nvidia.com>

Merge branch 'main' into torchvision_crop

518c4d1

Greptile review comments and "cpu"/"gpu" unit tests

2c7e9ef

Signed-off-by: Marek Dabek <mdabek@nvidia.com>

Lint fixes

08ebc42

Signed-off-by: Marek Dabek <mdabek@nvidia.com>

More tests

12dddd3

Signed-off-by: Marek Dabek <mdabek@nvidia.com>

Image information Torchvision's functional API

15c1775

Signed-off-by: Marek Dabek <mdabek@nvidia.com>

mdabek-nvidia wants to merge 10 commits into NVIDIA:main from mdabek-nvidia:torchvision_image_metadata
