Skip to content

A merge of a bunch of different changes for pytorch.#14

Open
AWoloszyn wants to merge 4 commits into
mainfrom
users/awoloszyn/rollup-for-pytorch
Open

A merge of a bunch of different changes for pytorch.#14
AWoloszyn wants to merge 4 commits into
mainfrom
users/awoloszyn/rollup-for-pytorch

Conversation

@AWoloszyn
Copy link
Copy Markdown
Collaborator

@AWoloszyn AWoloszyn commented May 28, 2026

This is a roll-up of a significant number of changes that are required for getting pytorch running on top of hrx.

Copy link
Copy Markdown
Collaborator

@stellaraccident stellaraccident left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to carry the compiler support forward (likely you just got this from the merge but it was deleted in this branch).

As discussed, since you are flushing your perma-branches, I am ok with this going in as-is. But this needs unit and/or CTS tests, especially for things like the ELF/CCOB manipulation. Please file an issue so we don't lose track and can get this test coverage in. Not blocking this one so we can get out of branch debt.

Comment thread libhrx/include/hrx_compiler.h Outdated
@@ -0,0 +1,69 @@
// Copyright 2026 The HRX Authors
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we dropped this entirely (and the cc file).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

This is a roll-up of a significant number of changes that are
required for getting pytorch running on top of hrx.
@AWoloszyn AWoloszyn force-pushed the users/awoloszyn/rollup-for-pytorch branch from 03ca5f8 to 6ff95b5 Compare June 1, 2026 13:11
@AWoloszyn AWoloszyn marked this pull request as ready for review June 1, 2026 14:53
Copy link
Copy Markdown
Contributor

@zjgarvey zjgarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to read the bits which seemed important for me to understand better: e.g., kernel launching, and command buffer/stream interplay handling.

I have a few other questions, but could probably wait.

Comment thread libhrx/CMakeLists.txt
Comment thread libhrx/src/binding/common/stream.c

while (remaining > 0) {
size_t this_chunk = remaining < chunk_size ? remaining : chunk_size;
iree_status_t iree_status = iree_hal_device_transfer_h2d(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this API fully synchronous? Why is this section labelled Async Stream Transfers? I guess the same below for d2h. I wanted to understand how async transfers worked between host and device and I'm kind of confused here. Are the async transfers just not implemented yet?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is a terminology question more than anything: does "host" literally mean "host thread" so it must inherently block on the host thread? Or can some separate thread on the CPU do the transfer and signal to the host thread when it's done? Not sure how important this is unless we are paging data for a training job?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I will answer both of the questions in ordedr

  1. Are the async transfers just not implemented yet? << Yes this is the case. This was originally used by the streaming layer, but then we ended up writing it ourselves in the streaming layer. (There is more work to be done to move more of binding/common to use the libhrx functions instead of iree functions directly.
  2. "host" here means "memory residing in system RAM not GPU ram"

stream->submitted_value = signal_value;
}
} else {
status = iree_hal_command_buffer_dispatch(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This writes the kernel launch into the stream->command_buffer, correct?

I'm trying to understand what the lock stream->mutex is being used for (It's only used in stream_begin and stream_flush), and since this isn't under lock, I'm wondering if it possible that this could be interleaved with a stream_flush from a different thread to result in a UAF/nullptr deref or something?

Copy link
Copy Markdown
Collaborator Author

@AWoloszyn AWoloszyn Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it is legal to send commands to a hip stream from multiple threads at once. So, this is SOME work in the API to handle this, but it is incomplete.
I have filed #26 to add tests and fix the implementation here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants