[Torchvision API] Input metadata #6364

Draft
mdabek-nvidia wants to merge 10 commits into NVIDIA:main from mdabek-nvidia:torchvision_image_metadata

@mdabek-nvidia
Collaborator

Category:

New feature

Description:

Torchvision functional API operator to get metadata of input:

  • get_image_size
  • get_dimensions

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia
Collaborator Author

@greptileai review

@greptile-apps
Contributor

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR adds three new capabilities to the DALI torchvision functional API: the image metadata functions get_image_size and get_dimensions (mirroring torchvision.transforms.v2.functional), crop (a deterministic functional crop), and RandomCrop (a random-location crop operator). All new code accepts PIL images and torch.Tensor inputs; the two crop operators delegate to DALI's fn.slice / ndd.slice under the hood, while the metadata functions are pure Python.

  • get_image_size and get_dimensions are pure Python wrappers with correct W/H ordering for PIL and correct last-3-dims extraction for tensors; tests compare directly against torchvision.
  • RandomCrop builds the random offset inside a DALI pipeline graph using fn.random.uniform; the pad_if_needed logic is correct, but when pad_if_needed=False and the image is smaller than the crop, _randint receives a negative max_value and constructs an invalid uniform range — producing undefined pipeline behavior instead of an actionable error.
  • The test file contains a large commented-out block with an unresolved TODO that should be removed and tracked as a separate issue before merging.
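
The ordering conventions described in the first bullet can be illustrated with a minimal sketch that works on plain shape tuples instead of real tensors. The helper names image_size_from_shape and dimensions_from_shape are hypothetical and not part of this PR. The 2-D case returning a channel count of 1 is an assumption based on torchvision's behavior, not on the PR code shown here.

```python
from typing import List, Sequence


def image_size_from_shape(shape: Sequence[int]) -> List[int]:
    """Return [W, H] for a tensor shape laid out as (..., H, W)."""
    if len(shape) < 2:
        raise TypeError(f"Expected at least 2 dimensions, got {len(shape)}.")
    # Width is the last axis, height the second to last.
    return [shape[-1], shape[-2]]


def dimensions_from_shape(shape: Sequence[int]) -> List[int]:
    """Return [C, H, W] taken from the last three axes of a tensor shape."""
    if len(shape) < 2:
        raise TypeError(f"Expected at least 2 dimensions, got {len(shape)}.")
    if len(shape) == 2:
        # Assumption: a 2-D (H, W) input is reported with a single channel.
        return [1, shape[0], shape[1]]
    # Leading batch axes, if any, are ignored.
    return list(shape[-3:])


print(image_size_from_shape((3, 480, 640)))     # [640, 480]
print(dimensions_from_shape((8, 3, 480, 640)))  # [3, 480, 640]
print(dimensions_from_shape((480, 640)))        # [1, 480, 640]
```

Note that the two functions disagree on axis order by design: the size is reported width-first, while the dimensions keep the tensor's own channel, height, width order.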

Confidence Score: 3/5

The metadata and functional crop additions are solid, but RandomCrop has an unguarded code path that produces undefined DALI pipeline behavior, and the test file should not land with a commented-out TODO block.

RandomCrop silently passes a negative value to _randint when the image is smaller than the crop size and pad_if_needed is not set — this corrupts the pipeline graph instead of raising a meaningful error, and there is no test covering this failure mode. Additionally, a substantial block of commented-out test code with an unresolved TODO is present in the test file. Both should be addressed before merging.

Files needing attention: randomcrop.py (negative-max_value guard in _kernel) and test_tv_randomcrop.py (commented-out TODO block).

Important Files Changed

| Filename | Overview |
| --- | --- |
| dali/python/nvidia/dali/experimental/torchvision/v2/functional/image_metadata.py | New file implementing get_image_size and get_dimensions; logic correctly mirrors torchvision for PIL and tensor inputs; minor: error messages missing trailing periods. |
| dali/python/nvidia/dali/experimental/torchvision/v2/randomcrop.py | New RandomCrop operator; _randint receives a negative max_value when image is smaller than crop size and pad_if_needed=False, producing undefined DALI pipeline behavior with no actionable error message. |
| dali/python/nvidia/dali/experimental/torchvision/v2/functional/crop.py | New functional crop wrapper; layout-axis mapping is correct; docstring is minimal and omits parameters/return type. |
| dali/test/python/torchvision/test_tv_randomcrop.py | New tests for RandomCrop; correctness verified against torchvision; assert_raises calls lack glob= patterns; large commented-out TODO block must be removed before merging. |
| dali/test/python/torchvision/test_tv_crop.py | New tests for functional crop; validates against torchvision across device/mode/layout; assert_raises calls lack glob= patterns. |
| dali/test/python/torchvision/test_tv_image_metadata.py | New tests for get_image_size and get_dimensions; good coverage of PIL modes, tensor ranks, GPU, and torchvision compatibility; error-path tests present. |

Sequence Diagram

sequenceDiagram
    participant User
    participant crop_fn as functional.crop
    participant RandomCrop
    participant ndd_slice as ndd.slice / fn.slice

    User->>crop_fn: crop(inpt, top, left, height, width)
    crop_fn->>crop_fn: _verify_crop_coordinate(top, left)
    crop_fn->>RandomCrop: "verify_args(size=(height,width), ...)"
    crop_fn->>ndd_slice: "slice(inpt, (top,left), (height,width), axes=...)"
    ndd_slice-->>User: cropped tensor/batch

    User->>RandomCrop: __call__(inpt)
    RandomCrop->>RandomCrop: preprocess_data → get_HWC_from_layout_pipeline
    RandomCrop->>RandomCrop: _kernel(in_h, in_w, _, tensor)
    alt needs_padding
        RandomCrop->>fn.slice: pad with out_of_bounds_policy
        fn.slice-->>RandomCrop: padded tensor
    end
    RandomCrop->>RandomCrop: _randint(max_top), _randint(max_left)
    RandomCrop->>fn.slice: slice at random offset
    fn.slice-->>User: cropped tensor

Comments Outside Diff (2)

  1. dali/test/python/torchvision/test_tv_randomcrop.py, line 1084-1103 (link)

    P1 [Bug] Commented-out test code with TODO should not be merged

    A large block of commented-out test functions, wrapped in a docstring, is left behind with the note "# TODO: Fill using dictionary pattern is currently not supported". Per project policy, TODOs must be resolved (or tracked with an issue reference) before merging, and dead code should not be preserved in source, since git history serves that purpose. Either remove this block entirely and open a tracked issue, or implement the functionality before landing.

  2. dali/test/python/torchvision/test_tv_randomcrop.py, line 1157-1158 (link)

    P2 [Style] assert_raises calls lack a message-pattern glob

    Project convention requires assert_raises(ExcType, glob="<pattern>") so that a test cannot pass when an unrelated code path raises the same exception type with a different message. The bare form appears throughout test_tv_randomcrop.py (e.g. test_random_crop_invalid_size, test_random_crop_invalid_padding, test_random_crop_invalid_fill) and throughout test_tv_crop.py (test_crop_invalid_input_type, test_crop_invalid_output_size, test_crop_invalid_coordinates). Adding glob= ensures that the right error is raised for the right reason.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
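
To make the point of the glob= convention concrete, here is a minimal, self-contained sketch of a message-checking assertion helper built on the standard library. The names assert_raises_glob and parse_crop_size are hypothetical stand-ins written for illustration. They are not the project's test utilities, whose exact behavior may differ.

```python
import fnmatch
from contextlib import contextmanager


@contextmanager
def assert_raises_glob(exc_type, glob):
    """Fail unless the body raises exc_type with a message matching the glob."""
    try:
        yield
    except exc_type as exc:
        message = str(exc)
        if not fnmatch.fnmatchcase(message, glob):
            raise AssertionError(
                f"{exc_type.__name__} raised, but message {message!r} "
                f"does not match {glob!r}."
            ) from exc
    else:
        raise AssertionError(f"{exc_type.__name__} was not raised.")


def parse_crop_size(size):
    """Hypothetical validator used only to demonstrate the helper."""
    if size <= 0:
        raise ValueError(f"Crop size must be positive, got {size}.")
    return size


# Passes: the expected exception type and the expected message.
with assert_raises_glob(ValueError, "Crop size must be positive*"):
    parse_crop_size(-1)
```

With a bare exception-type check, the block above would also pass if parse_crop_size raised ValueError for an unrelated reason; the pattern rules that out.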

Reviews (1): Last reviewed commit: "Image information Torchvision's function..." | Re-trigger Greptile

Comment on lines +245 to +257
max_top = fn.cast(in_h, dtype=dali.types.INT32) - crop_h
max_left = fn.cast(in_w, dtype=dali.types.INT32) - crop_w

top = RandomCrop._randint(max_top)
left = RandomCrop._randint(max_left)

return fn.slice(
    tensor,
    fn.stack(left, top),
    fn.stack(crop_w, crop_h),
    device=self.device,
    axis_names="WH",
)

P1 [Bug] Undefined behavior when crop size exceeds image size and pad_if_needed=False

When pad_if_needed=False (the default) and no explicit padding is supplied, self.needs_padding is False so the padding block is skipped. If the input image is smaller than the requested crop, max_top = in_h - crop_h becomes negative. _randint then builds range_end = max_value + 1 ≤ 0, passing fn.random.uniform(range=[0, ≤0]) — an invalid range — to DALI. Torchvision raises a clear ValueError in this case; DALI silently produces garbage or a pipeline crash with no actionable message. A guard like if max_top < 0 or max_left < 0: raise ValueError(...) should be added before _randint is called.
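
The suggested guard can be sketched in plain Python as follows. This is an illustration only: it assumes the image and crop sizes are ordinary integers, whereas inside a DALI pipeline graph in_h and in_w are symbolic and a Python comparison may not be usable as written. The function name random_crop_offsets and the error wording are hypothetical.

```python
import random


def random_crop_offsets(in_h, in_w, crop_h, crop_w, rng=None):
    """Pick a random (top, left) offset, rejecting crops larger than the image."""
    max_top = in_h - crop_h
    max_left = in_w - crop_w
    # Guard before sampling: a negative maximum would give an invalid range.
    if max_top < 0 or max_left < 0:
        raise ValueError(
            f"Crop size ({crop_h}, {crop_w}) is larger than "
            f"the input image size ({in_h}, {in_w})."
        )
    rng = rng or random.Random()
    # randint is inclusive on both ends, so the valid offsets are 0..max.
    top = rng.randint(0, max_top)
    left = rng.randint(0, max_left)
    return top, left


top, left = random_crop_offsets(480, 640, 224, 224, rng=random.Random(0))
print(top, left)
```

Checking both axes before any sampling keeps the failure at the call site, where the message can name the offending sizes.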

Comment on lines +45 to +53
        if inpt.ndim < 2:
            raise TypeError(
                f"get_image_size requires a tensor with at least 2 dimensions, got {inpt.ndim}"
            )
        return [inpt.shape[-1], inpt.shape[-2]]  # [W, H]
    raise TypeError(f"Unsupported input type: {type(inpt)}")


def get_dimensions(inpt: Image.Image | torch.Tensor) -> List[int]:

P2 [Style] Error messages should end with a period

Both error messages are missing a trailing period, violating the project convention that error messages must read as complete sentences.

Suggested change
         if inpt.ndim < 2:
             raise TypeError(
-                f"get_image_size requires a tensor with at least 2 dimensions, got {inpt.ndim}"
+                f"get_image_size requires a tensor with at least 2 dimensions, got {inpt.ndim}."
             )
         return [inpt.shape[-1], inpt.shape[-2]]  # [W, H]
-    raise TypeError(f"Unsupported input type: {type(inpt)}")
+    raise TypeError(f"Unsupported input type: {type(inpt)}.")
 
 
 def get_dimensions(inpt: Image.Image | torch.Tensor) -> List[int]:

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +75 to +79
if inpt.ndim < 2:
    raise TypeError(
        f"get_dimensions requires a tensor with at least 2 dimensions, got {inpt.ndim}"
    )
if inpt.ndim == 2:

P2 [Style] get_dimensions error messages also missing trailing period

Same sentence-ending convention issue in get_dimensions.

Suggested change
if inpt.ndim < 2:
raise TypeError(
f"get_dimensions requires a tensor with at least 2 dimensions, got {inpt.ndim}"
)
if inpt.ndim == 2:
if inpt.ndim < 2:
raise TypeError(
f"get_dimensions requires a tensor with at least 2 dimensions, got {inpt.ndim}."
)
if inpt.ndim == 2:

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
