
Commit 2842a83

Improve DLPack support for external tensor consumption (#6261)
- Adds bulk DLPack TensorList constructors for lists of external tensors. The C++ path builds CPU and GPU TensorLists in one pass after preflighting devices and peeking all capsule dtypes, so dtype or device mismatches can fall back without partially consuming single-use capsules.
- Adds `ndd.Batch` fast paths for:
  - lists of already evaluated `ndd.Tensor` objects, preserving storage, layout, dtype, enum metadata, laziness behavior where applicable, and CUDA stream order;
  - lists of external GPU tensors exposing DLPack, including stream-keyword handshakes and same-device validation;
  - lists of CPU and CUDA-host/pinned CPU DLPack tensors.
- Keeps requested dtype conversion on the existing slow path, so external DLPack capsules are not consumed before the conversion fallback runs.
- Improves `ndd.Tensor` DLPack ingestion:
  - CPU read-only arrays that raise `BufferError` fall back to an explicit copy through the array interface;
  - GPU objects retry `__dlpack__()` without the `stream` keyword only when the producer rejects the keyword, and fall back to `__cuda_array_interface__` only when DLPack is unavailable.
- Fixes mixed pinned/non-pinned non-contiguous GPU TensorLists. GPU sample sharing no longer requires a uniform `is_pinned()` state, which unblocks mixed-GPU batches where cross-device copies use pinned staging memory. CPU TensorLists still enforce pinned-state compatibility.
- Makes `UserStream::GetStream(size_t dev)` public so the new bulk DLPack path can derive a concrete per-device consumer stream.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
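The preflight-then-consume ordering from the first bullet can be sketched as below. This is a toy model, not DALI's implementation: `FakeCapsule`, `BulkImport`, and `TryBulkImport` are hypothetical names, and a `consumed` flag stands in for the single-use nature of a real DLPack capsule. The point is the ordering: every sample's dtype and device are checked first, and nothing is consumed unless the whole batch agrees, so a mismatch leaves every capsule intact for the slow path.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a single-use DLPack capsule: once `consumed`
// is set, the producer's tensor cannot be imported a second time.
struct FakeCapsule {
  std::string dtype;  // e.g. "float32"
  int device_id;      // -1 for CPU
  bool consumed = false;
};

// Hypothetical summary of a successful bulk import.
struct BulkImport {
  std::string dtype;
  int device_id;
  std::size_t num_samples;
};

// Phase 1 only peeks at metadata; phase 2 consumes every capsule and runs
// only if phase 1 found no mismatch. std::nullopt means "take the slow path",
// and in that case no capsule has been touched.
std::optional<BulkImport> TryBulkImport(std::vector<FakeCapsule> &capsules) {
  if (capsules.empty())
    return std::nullopt;
  const std::string dtype = capsules.front().dtype;
  const int device = capsules.front().device_id;
  for (const auto &c : capsules) {
    if (c.consumed || c.dtype != dtype || c.device_id != device)
      return std::nullopt;  // mismatch: bail out before consuming anything
  }
  for (auto &c : capsules)
    c.consumed = true;  // commit: all samples are taken in one pass
  return BulkImport{dtype, device, capsules.size()};
}
```

Keeping the two loops separate is what makes the fallback safe: a partially consumed batch could not be handed to the slow path, because its first capsules would already be spent.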
1 parent 44419fa commit 2842a83

9 files changed

Lines changed: 757 additions & 75 deletions


dali/pipeline/data/tensor_list.cc

Lines changed: 5 additions & 3 deletions
```diff
@@ -257,9 +257,11 @@ void TensorList<Backend>::VerifySampleShareCompatibility(DALIDataType type, int
                            this->GetLayout(), ", new: ", layout, " or come with empty layout ",
                            error_suffix));
 
-  DALI_ENFORCE(this->is_pinned() == pinned,
-               make_string("Sample must have the same pinned status as target batch, current: ",
-                           this->is_pinned(), ", new: ", pinned, error_suffix));
+  if constexpr (std::is_same_v<Backend, CPUBackend>) {
+    DALI_ENFORCE(this->is_pinned() == pinned,
+                 make_string("Sample must have the same pinned status as target batch, current: ",
+                             this->is_pinned(), ", new: ", pinned, error_suffix));
+  }
 
   DALI_ENFORCE(this->device_id() == device_id,
                make_string("Sample must have the same device id as target batch, current: ",
```
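A reduced model of the guard in `VerifySampleShareCompatibility` follows. It assumes C++17 and uses hypothetical stand-ins (`ToyTensorList`, `Accepts`, and empty `CPUBackend`/`GPUBackend` tags) rather than DALI's real classes. Because the pinned check sits inside `if constexpr`, it exists only in the CPU instantiation, while the device-id check still applies to both backends.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical backend tags standing in for DALI's CPUBackend / GPUBackend.
struct CPUBackend {};
struct GPUBackend {};

// Toy model of the share-compatibility check. The pinned-state comparison
// is compiled only into the CPU instantiation; the GPU one omits it.
template <typename Backend>
struct ToyTensorList {
  bool pinned = false;
  int device_id = 0;

  void VerifyShareCompatible(bool sample_pinned, int sample_device) const {
    if constexpr (std::is_same_v<Backend, CPUBackend>) {
      if (pinned != sample_pinned)
        throw std::runtime_error("Sample must have the same pinned status as target batch");
    }
    // Still enforced for every backend.
    if (device_id != sample_device)
      throw std::runtime_error("Sample must have the same device id as target batch");
  }
};

// Returns true if sharing the sample is accepted, false if the check throws.
template <typename Backend>
bool Accepts(const ToyTensorList<Backend> &tl, bool sample_pinned, int sample_device) {
  try {
    tl.VerifyShareCompatible(sample_pinned, sample_device);
    return true;
  } catch (const std::runtime_error &) {
    return false;
  }
}
```

A non-pinned CPU batch rejects a pinned sample, while a non-pinned GPU batch accepts it as long as the device id matches.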

dali/pipeline/data/tensor_list_test.cc

Lines changed: 7 additions & 2 deletions
```diff
@@ -983,13 +983,18 @@ TYPED_TEST(TensorListSuite, SetupLikeMultiGPU) {
 template <typename Backend>
 std::vector<std::pair<std::string, std::function<void(TensorList<Backend> &)>>> SetRequiredSetters(
     int sample_dim, DALIDataType type, TensorLayout layout, bool pinned, int device_id) {
-  return {
+  std::vector<std::pair<std::string, std::function<void(TensorList<Backend> &)>>> result = {
       {"sample dim", [sample_dim](TensorList<Backend> &t) { t.set_sample_dim(sample_dim); }},
       {"type", [type](TensorList<Backend> &t) { t.set_type(type); }},
       {"layout", [layout](TensorList<Backend> &t) { t.SetLayout(layout); }},
       {"device id", [device_id](TensorList<Backend> &t) { t.set_device_id(device_id); }},
-      {"pinned", [pinned](TensorList<Backend> &t) { t.set_pinned(pinned); }},
   };
+  // GPU TensorList allows mixed pinned/non-pinned samples in non-contiguous mode
+  // (pinned tensors are valid GPU-accessible staging buffers).
+  if constexpr (std::is_same_v<Backend, CPUBackend>) {
+    result.push_back({"pinned", [pinned](TensorList<Backend> &t) { t.set_pinned(pinned); }});
+  }
+  return result;
 }
 
 TYPED_TEST(TensorListSuite, PartialSetupSetMultiGPU) {
```
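The test-helper change follows the same pattern: the `"pinned"` setter joins the required list only for CPU. The sketch below mirrors that shape with hypothetical types (`ToyList`, `Setters`, `RequiredSetters`) and is illustrative only; it drops the dtype and layout setters for brevity and assumes C++17.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical backend tags and a toy list type; only the shape of the
// setter table mirrors the helper in the diff.
struct CPUBackend {};
struct GPUBackend {};

template <typename Backend>
struct ToyList {
  int sample_dim = 0;
  bool pinned = false;
  int device_id = -1;
};

template <typename Backend>
using Setters = std::vector<std::pair<std::string, std::function<void(ToyList<Backend> &)>>>;

// Builds the table of required setters. "pinned" is appended only for CPU,
// because a GPU list no longer has to match its samples' pinned state.
template <typename Backend>
Setters<Backend> RequiredSetters(int sample_dim, [[maybe_unused]] bool pinned, int device_id) {
  Setters<Backend> result = {
      {"sample dim", [sample_dim](ToyList<Backend> &t) { t.sample_dim = sample_dim; }},
      {"device id", [device_id](ToyList<Backend> &t) { t.device_id = device_id; }},
  };
  if constexpr (std::is_same_v<Backend, CPUBackend>) {
    result.push_back({"pinned", [pinned](ToyList<Backend> &t) { t.pinned = pinned; }});
  }
  return result;
}
```

With this table a CPU list has one more required setter than a GPU list, so a test that iterates over the table would not expect a GPU list to reject a sample whose pinned state differs.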
