Skip to content
Original file line number Diff line number Diff line change
Expand Up @@ -10,15 +10,15 @@
from concurrent.futures import CancelledError
from contextlib import contextmanager
from dataclasses import dataclass
from pathlib import Path, PosixPath
from pathlib import Path
from queue import Queue
from urllib.parse import urlparse

import click
import pexpect
import requests
from jumpstarter_driver_composite.client import CompositeClient
from jumpstarter_driver_opendal.client import FlasherClient, OpendalClient, operator_for_path
from jumpstarter_driver_opendal.client import FlasherClient, OpendalClient, clean_filename, operator_for_path
from jumpstarter_driver_opendal.common import PathBuf
from jumpstarter_driver_pyserial.client import Console
from opendal import Metadata, Operator
Expand Down Expand Up @@ -167,10 +167,14 @@ def flash( # noqa: C901
"http", root="/", endpoint=f"{parsed.scheme}://{parsed.netloc}", token=bearer_token
)
operator_scheme = "http"
path = Path(parsed.path)
# Preserve query parameters so that signed URLs
# (e.g. CloudFront with ?Expires=...&Signature=...) work correctly.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Inline comments could be replaced with self-documenting code.

Same as the opendal client: the project convention states comments are only acceptable as a last resort. These inline comments explaining query parameter preservation could be eliminated by extracting the pattern into a well-named helper method.

AI-generated, human reviewed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. The inline comments in the flashers client have been replaced by delegating to the shared path_with_query() helper, making the code self-explanatory.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. Same approach as the opendal client -- the inline comments in the flashers client have been removed. The path_with_query() helper and updated docstrings on clean_filename() and operator_for_path() make the signed URL behavior self-documenting.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181 alongside the opendal client. The inline comments in the flashers client were replaced with the same self-documenting approach: the path_with_query() helper and updated docstrings make the query parameter handling self-explanatory without inline commentary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. Replaced the inline comments in the flashers client with self-documenting code, same as the opendal client. Both now use the path_with_query() helper and clean_filename() with descriptive docstrings, eliminating the need for inline comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 8ff6181 along with the opendal client instance. Both inline comment blocks have been replaced by the self-documenting path_with_query() helper.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. The inline comments in the flashers client have been replaced with self-documenting code, mirroring the changes made in the opendal client. The query parameter handling is now done through the path_with_query() helper with a descriptive docstring, and the clean_filename() helper provides the filename extraction logic. Both are self-explanatory without inline comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9ce59ff. The inline comments in the flashers client were replaced with the self-documenting path_with_query() helper, matching the same refactoring done in the opendal client.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. Same as the opendal client -- the inline comments were replaced by the path_with_query() helper, making the code self-documenting.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline comment in the bearer token path was replaced with the path_with_query() helper, which is self-documenting through its name and docstring.

path = parsed.path
if parsed.query:
path = f"{path}?{parsed.query}"
else:
path, operator, operator_scheme = operator_for_path(path)
image_url = self.http.get_url() + "/" + path.name
image_url = self.http.get_url() + "/" + self._filename(path)

# start counting time for the flash operation
start_time = time.time()
Expand Down Expand Up @@ -968,7 +972,7 @@ def _transfer_bg_thread(
"""
self.logger.info(f"Writing image to storage in the background: {src_path}")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Signed URL credentials leaked in log output.

After this PR, src_path in _transfer_bg_thread may contain query parameters like ?Expires=...&Signature=AbCdEf...&Key-Pair-Id=KXYZ123. This is logged at INFO level:

self.logger.info(f"Writing image to storage in the background: {src_path}")

The Signature and Key-Pair-Id values are authentication material that could be extracted from logs and replayed before expiration.

Suggested fix: log only the clean filename instead:

self.logger.info(f"Writing image to storage in the background: {self._filename(src_path)}")

AI-generated, human reviewed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. The _transfer_bg_thread method now extracts the clean filename via self._filename(src_path) before logging, so only the filename (e.g. image.raw.xz) appears in logs instead of the full signed URL with credentials.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already addressed in the current revision. The log line now uses self._filename(src_path) which strips query parameters via clean_filename().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 8ff6181. The log line now uses self._filename(src_path) (which calls clean_filename) instead of logging the raw src_path with query parameters.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good security catch. Fixed in commit 8ff6181 -- the log line now uses self._filename(src_path) instead of the raw src_path, so only the clean filename is logged and no query parameters (containing Signature/Key-Pair-Id) are exposed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the security issue. Fixed -- the log line now uses self._filename(src_path) to log only the clean filename, not the full URL with credentials.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the credential leak. Fixed -- _transfer_bg_thread now logs only the clean filename:

filename = self._filename(src_path)
self.logger.info(f"Writing image to storage in the background: {filename}")

This strips all query parameters (including Signature, Key-Pair-Id, Expires) before logging.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 8ff61818. The log line now uses self._filename(src_path) to log only the clean filename, preventing signed URL credentials from appearing in log output.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The log line now uses self._filename(src_path) so only the clean filename is logged, not the full signed URL with credentials.

try:
filename = Path(src_path).name if isinstance(src_path, (str, os.PathLike)) else src_path.name
filename = self._filename(src_path)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Signed URL credentials persisted in metadata JSON file (see _create_metadata_and_json around line 1026).

The full src_path (now including query parameters with Signature, Key-Pair-Id, Expires) is stored in a metadata dictionary:

metadata_dict = {"path": str(src_path)}

This is serialized to JSON and written to exporter storage as filename + ".metadata", persisting authentication material indefinitely.

Suggested fix: strip query parameters before storing:

metadata_dict = {"path": clean_filename(src_path) if isinstance(src_path, str) and "?" in src_path else str(src_path)}

AI-generated, human reviewed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. The metadata dictionary now uses clean_filename(src_path) instead of str(src_path), so query parameters containing authentication material are stripped before being persisted to the JSON metadata file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already addressed in the current revision. The metadata dict now uses clean_filename(src_path) to strip query parameters before persisting.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 8ff6181. The metadata dict now stores clean_filename(src_path) instead of str(src_path), so query parameters with authentication material are never persisted.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed -- storing signed URL credentials in the metadata file was a security issue. Fixed in commit 8ff6181 by using clean_filename(src_path) for the metadata "path" field, which strips all query parameters before persisting.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed -- metadata_dict["path"] now uses clean_filename(src_path) to strip query parameters before persisting to the metadata JSON file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed -- _create_metadata_and_json now uses clean_filename(src_path) for the metadata path, so authentication material is never persisted to the metadata JSON file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 8ff61818. The metadata dict now stores clean_filename(src_path) instead of the raw str(src_path), so query parameters with authentication material are no longer persisted to the metadata JSON file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The metadata dict now stores clean_filename(src_path) instead of the full path with query params, preventing credential persistence in the JSON file.


if src_operator_scheme == "fs":
file_hash = self._sha256_file(src_operator, src_path)
Expand Down Expand Up @@ -1088,8 +1092,8 @@ def dump(
raise NotImplementedError("Dump is not implemented for this driver yet")

def _filename(self, path: PathBuf) -> str:
"""Extract filename from url or path"""
if path.startswith("oci://"):
"""Extract filename from url or path, stripping any query parameters"""
if isinstance(path, str) and path.startswith("oci://"):
oci_path = path[6:] # Remove "oci://" prefix
if ":" in oci_path:
repository, tag = oci_path.rsplit(":", 1)
Expand All @@ -1098,10 +1102,8 @@ def _filename(self, path: PathBuf) -> str:
else:
repo_name = oci_path.split("/")[-1] if "/" in oci_path else oci_path
return repo_name
elif path.startswith(("http://", "https://")):
return urlparse(path).path.split("/")[-1]
else:
return Path(path).name
return clean_filename(path)
Comment thread
raballew marked this conversation as resolved.

def _upload_artifact(self, storage, path: PathBuf, operator: Operator):
"""Upload artifact to storage"""
Expand Down Expand Up @@ -1636,17 +1638,12 @@ def _get_decompression_command(filename_or_url) -> str:
Determine the appropriate decompression command based on file extension

Args:
filename (str): Name of the file to check
filename_or_url (str): Name of the file or URL to check

Returns:
str: Decompression command ('zcat', 'xzcat', or 'cat' for uncompressed)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] _get_decompression_command docstring is inaccurate.

The docstring says Returns: str: Decompression command ('zcat', 'xzcat', or 'cat' for uncompressed) but the function actually returns an empty string "" for uncompressed files, not "cat".

Suggested fix: update the docstring to match the actual return value.

AI-generated, human reviewed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. The docstring now correctly states the return values: 'zcat |', 'xzcat |', or '' for uncompressed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. The docstring for _get_decompression_command now accurately reflects the return values: 'zcat |', 'xzcat |', or '' for uncompressed files.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. The docstring for _get_decompression_command() now accurately states that it returns an empty string "" for uncompressed files, matching the actual implementation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. The docstring now correctly states the return values: 'zcat |', 'xzcat |', or '' for uncompressed, matching the actual behavior.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e5b532c. The docstring now reads 'zcat |', 'xzcat |', 'zstdcat |', or '' for uncompressed which matches the actual return values.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e5b532c. The _get_decompression_command docstring now accurately reflects the actual return values: 'zcat |' for .gz/.gzip, 'xzcat |' for .xz, 'zstdcat |' for .zst, and '' (empty string) for uncompressed files.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e5b532c. The docstring now accurately reflects the return values: 'zcat |', 'xzcat |', 'zstdcat |', or '' for uncompressed files.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The docstring now reads ('zcat |', 'xzcat |', 'zstdcat |', or '' for uncompressed) which matches the actual return values.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the docstring to accurately reflect the return values: 'zcat |', 'xzcat |', 'zstdcat |', or '' for uncompressed.

"""
if type(filename_or_url) is PosixPath:
filename = filename_or_url.name
elif filename_or_url.startswith(("http://", "https://")):
filename = urlparse(filename_or_url).path.split("/")[-1]

filename = filename.lower()
filename = clean_filename(filename_or_url).lower()
if filename.endswith((".gz", ".gzip")):
return "zcat |"
elif filename.endswith(".xz"):
Comment on lines +1635 to 1638
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] _get_decompression_command does not handle .zst extension.

The implementation handles .gz/.gzip and .xz but has no branch for .zst. A .zst compressed image fetched from a signed URL would be treated as uncompressed, silently producing a corrupt flash.

Suggested fix: add a .zst branch returning "zstdcat |" and a corresponding test case.

AI-generated, human reviewed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. Adding .zst/zstdcat support is a valid enhancement but is out of scope for this PR, which focuses specifically on fixing signed URL handling. The .zst gap predates this PR and applies equally to non-signed URLs. I'd suggest tracking this as a separate issue.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the missing .zst extension, but this is a pre-existing gap that was present before this PR. Adding .zst support would change the scope of this fix beyond the original issue (preserving URL query parameters for signed URLs). I would suggest tracking this as a separate issue/PR to keep changes focused.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Adding .zst support with zstdcat | and a corresponding test case in the next push.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added .zst support in commit e5b532c. The _get_decompression_command() function now includes a .zst branch returning "zstdcat |", and test_decompression_command_with_query_params covers .zst for both PosixPath, HTTP URL, and query-parameter paths.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added .zst support returning "zstdcat |" along with test cases for both PosixPath and string URL inputs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. A .zst branch returning "zstdcat |" has been added to _get_decompression_command, along with corresponding test cases for both PosixPath and string URL inputs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: after further consideration, .zst decompression support was added in e5b532c along with a corresponding test case. The _get_decompression_command function now returns "zstdcat |" for .zst files.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in commit e5b532c5. The .zst extension now maps to zstdcat | and test cases cover both PosixPath and string URL inputs with .zst files.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the .zst branch returning "zstdcat |" along with test cases for both PosixPath and signed URL string paths.

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -428,6 +428,144 @@ def test_categorize_exception_preserves_cause_for_wrapped_exceptions():
assert "File not found" in str(result)


def test_filename_strips_query_params_from_url_path():
"""Test _filename strips query parameters from paths with signed URL params"""
client = MockFlasherClient()

# Full HTTP URL
assert client._filename("https://cdn.example.com/images/image.raw.xz") == "image.raw.xz"

# Full HTTP URL with query parameters (e.g. CloudFront signed URL)
assert (
client._filename("https://cdn.example.com/images/image.raw.xz?Expires=123&Signature=abc&Key-Pair-Id=xyz")
== "image.raw.xz"
)

# Path string with query parameters (as returned by operator_for_path after fix)
assert client._filename("/images/image.raw.xz?Expires=123&Signature=abc") == "image.raw.xz"

# Plain path without query parameters
assert client._filename("/images/image.raw.xz") == "image.raw.xz"

# OCI path
assert client._filename("oci://quay.io/org/myimage:latest") == "myimage-latest"


def test_decompression_command_with_query_params():
"""Test _get_decompression_command handles paths with query parameters"""
from pathlib import PosixPath

from .client import _get_decompression_command

# Standard PosixPath
assert _get_decompression_command(PosixPath("/images/image.raw.xz")) == "xzcat |"
assert _get_decompression_command(PosixPath("/images/image.raw.gz")) == "zcat |"
assert _get_decompression_command(PosixPath("/images/image.raw")) == ""

# Full HTTP URL
assert _get_decompression_command("https://cdn.example.com/images/image.raw.xz") == "xzcat |"

# String path with query parameters (as returned by operator_for_path for signed URLs)
assert _get_decompression_command("/images/image.raw.xz?Expires=123&Signature=abc") == "xzcat |"
assert _get_decompression_command("/images/image.raw.gz?Expires=123") == "zcat |"
assert _get_decompression_command("/images/image.raw?Expires=123") == ""


def test_flash_signed_url_preserves_query_params():
"""Test that flash with a signed HTTP URL preserves query parameters for image_url"""
client = MockFlasherClient()

class DummyService:
def __init__(self):
self.storage = object()

def start(self):
pass

def stop(self):
pass

def get_url(self):
return "http://exporter"

client.http = DummyService()
client.tftp = DummyService()
client.call = lambda *args, **kwargs: None

captured = {}

def capture_perform(*args):
captured["image_url"] = args[3]
captured["should_download_to_httpd"] = args[4]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Flash integration tests use brittle positional arg capture.

Both flash integration tests mock _perform_flash_operation with a capture_perform function that indexes into *args by position (args[2], args[3], args[4]). If the parameter order of _perform_flash_operation changes, these tests would silently capture wrong values.

Suggested fix: use named parameters instead:

def capture_perform(partition, block_device, path, image_url, should_download_to_httpd, *rest):

AI-generated, human reviewed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9ce59ff. The last remaining test using positional arg capture (test_flash_http_url_with_oci_credentials_still_uses_direct_http_path) now uses named parameters matching the full _perform_flash_operation signature, consistent with the other test.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. Both capture_perform functions in the flash integration tests now use named parameters instead of positional *args indexing, so they won't silently break if the parameter order of _perform_flash_operation changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9ce59ff. The capture_perform functions in both flash integration tests now use named parameters (partition, block_device, path, image_url, should_download_to_httpd) instead of positional *args indexing, making them resilient to parameter reordering.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 15d3321. The capture_perform function in the flash integration tests now uses named parameters instead of brittle positional indexing (args[2], args[3], etc.). The newer test (test_flash_bearer_token_signed_url_preserves_query_params) also uses named parameters to stay consistent.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181 and e5b532c. All capture_perform functions now use fully named parameters instead of positional *args indexing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9ce59ff and e5b532c. All capture_perform callbacks in the flash integration tests now use named parameters (partition, block_device, path, image_url, should_download_to_httpd, *rest) instead of indexing into *args by position. This makes the tests resilient to parameter reordering.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9ce59ff. Both flash integration tests now use named parameters in the capture_perform function signature instead of positional *args indexing, making them resilient to parameter order changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 9ce59ff1. Both capture_perform functions now use named parameters as suggested.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. All capture_perform functions in the tests now use named parameters instead of positional indexing.


client._perform_flash_operation = capture_perform

# Direct HTTP URL with query params (no force_exporter_http) should preserve full URL
signed_url = "https://cdn.example.com/images/image.raw.xz?Expires=123&Signature=abc&Key-Pair-Id=xyz"
client.flash(signed_url, method="fls", fls_version="")

assert captured["image_url"] == signed_url
assert captured["should_download_to_httpd"] is False
Comment thread
raballew marked this conversation as resolved.
Comment thread
raballew marked this conversation as resolved.


def test_flash_bearer_token_signed_url_preserves_query_params():
"""Test that flash with force_exporter_http=True and bearer token preserves query params.

When a signed URL is used with a bearer token, the flash() method enters the
bearer token code path (lines 162-174 in client.py) which reconstructs the path
from parsed.path + '?' + parsed.query. This test verifies query params are preserved
and the path passed to the storage thread is correct.
"""
client = MockFlasherClient()

class DummyService:
def __init__(self):
self.storage = object()

def start(self):
pass

def stop(self):
pass

def get_url(self):
return "http://exporter"

def get_host(self):
return "127.0.0.1"

client.http = DummyService()
client.tftp = DummyService()
client.call = lambda *args, **kwargs: None

captured = {}

def capture_perform(*args):
captured["path"] = args[2]
captured["image_url"] = args[3]
captured["should_download_to_httpd"] = args[4]

client._perform_flash_operation = capture_perform
# Mock the background transfer thread to prevent it from actually running
client._transfer_bg_thread = lambda *args, **kwargs: None

signed_url = "https://cdn.example.com/images/image.raw.xz?Expires=123&Signature=abc&Key-Pair-Id=xyz"
client.flash(
signed_url,
force_exporter_http=True,
bearer_token="test-token-123",
method="fls",
fls_version="",
)

# With force_exporter_http=True and bearer_token, should download to httpd
assert captured["should_download_to_httpd"] is True
# The path should have query params preserved (reconstructed from parsed.path + '?' + parsed.query)
assert captured["path"] == "/images/image.raw.xz?Expires=123&Signature=abc&Key-Pair-Id=xyz"
# The image_url should point to the exporter with the clean filename (no query params)
assert captured["image_url"] == "http://exporter/image.raw.xz"


def test_resolve_flash_parameters():
"""Test flash parameter resolution for single file, partitions, and error cases"""
client = MockFlasherClient()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,21 @@ async def aclose(self):
pass


def clean_filename(path: PathBuf) -> str:
"""Extract a clean filename from a path or URL, stripping query parameters.

This handles paths returned by operator_for_path() which may contain
query parameters for signed URLs (e.g. /path/to/image.raw.xz?Expires=...&Signature=...).
"""
path_str = str(path)
if path_str.startswith(("http://", "https://")):
return urlparse(path_str).path.split("/")[-1]
name = Path(path_str).name
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] clean_filename() produces wrong results when query parameters contain unencoded / characters.

The non-URL branch calls Path(path_str).name before stripping query parameters. If the query string has unencoded slashes (e.g. base64-encoded signatures like ?Signature=abc/def/ghi), Path treats them as directory separators and .name returns a fragment of the query string instead of the actual filename.

Example: clean_filename("/images/image.raw.xz?Expires=123&Signature=abc/def/ghi") returns "ghi" instead of "image.raw.xz".

Suggested fix: strip the query string before passing to Path().name:

def clean_filename(path: PathBuf) -> str:
    path_str = str(path)
    if path_str.startswith(("http://", "https://")):
        return urlparse(path_str).path.split("/")[-1]
    if "?" in path_str:
        path_str = path_str.split("?", 1)[0]
    return Path(path_str).name

AI-generated, human reviewed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. The clean_filename() now strips the query string before passing to Path().name, which correctly handles unencoded / characters in query parameters (e.g. base64-encoded signatures). Added a dedicated test case for this edge case in both driver_test.py and client_test.py.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for flagging this case. I checked the current implementation and it actually handles this correctly already -- the query string is stripped before passing to Path().name. So for the example path, the split on '?' produces '/images/image.raw.xz' first, then Path().name returns 'image.raw.xz'. There is also an existing test for this exact edge case in test_clean_filename.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 8ff6181. The non-URL branch now strips query parameters before calling Path().name, so paths like /images/image.raw.xz?Signature=abc/def/ghi correctly return image.raw.xz. Added a regression test for this edge case in test_clean_filename.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch -- this was a real bug. Fixed in commit 8ff6181 by stripping the query string before passing to Path().name, exactly as you suggested. The clean_filename() function now does:

if "?" in path_str:
    path_str = path_str.split("?", 1)[0]
return Path(path_str).name

Also added a test case for this edge case: clean_filename("/images/image.raw.xz?Expires=123&Signature=abc/def/ghi") correctly returns "image.raw.xz".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. Fixed by stripping query parameters before passing to Path().name. The fix follows your suggested approach. Also added a test case for this exact scenario: clean_filename("/images/image.raw.xz?Expires=123&Signature=abc/def/ghi") == "image.raw.xz".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This is now fixed -- clean_filename() strips the query string before calling Path().name, exactly as you suggested. The non-URL branch does path_str.split("?", 1)[0] first, then passes the result to Path().name. A test case for this edge case (unencoded slashes in query params like Signature=abc/def/ghi) is included in test_clean_filename in driver_test.py.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed in commit 8ff61818 -- clean_filename() now strips the query string with split("?", 1)[0] before passing to Path().name, preventing unencoded / characters in query params from breaking filename extraction. A dedicated test case for this exact scenario (Signature=abc/def/ghi) was added in driver_test.py.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find. Fixed exactly as suggested -- clean_filename() now strips the query string before passing to Path().name, which avoids the unencoded / issue. The test_clean_filename test includes a case for Signature=abc/def/ghi to prevent regression.

if "?" in name:
name = name.split("?", 1)[0]
return name
Comment thread
raballew marked this conversation as resolved.
Outdated


def operator_for_path(path: PathBuf) -> tuple[PathBuf, Operator, str]:
"""Create an operator for the given path
Return a tuple of:
Expand All @@ -54,7 +69,13 @@ def operator_for_path(path: PathBuf) -> tuple[PathBuf, Operator, str]:
if type(path) is str and path.startswith(("http://", "https://")):
parsed_url = urlparse(path)
operator = Operator("http", root="/", endpoint=f"{parsed_url.scheme}://{parsed_url.netloc}")
return Path(parsed_url.path), operator, "http"
# Preserve query parameters in the path so that signed URLs
# (e.g. CloudFront URLs with ?Expires=...&Signature=...&Key-Pair-Id=...)
# are fetched correctly by the OpenDAL HTTP operator.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Inline comments could be replaced with self-documenting code.

The project convention states comments are only acceptable as a last resort when the code cannot be refactored to be self-explanatory. The inline comments here explaining query parameter preservation could be eliminated by extracting the pattern into a well-named helper method or relying on docstrings.

AI-generated, human reviewed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. Replaced the inline comments with self-documenting code: extracted path_with_query() helper and added docstrings to clean_filename() and operator_for_path() that explain the signed URL behavior directly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline comments explain why query parameters need to be preserved (signed URLs with authentication tokens), which is not immediately obvious from the code alone. The helper functions (clean_filename, path_with_query) are self-documenting for what they do, but the comments provide context on why they exist. Happy to remove them if the team feels strongly about it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit 8ff6181 by extracting the path_with_query() helper function, eliminating the inline comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed by extracting the logic into well-named helper functions: clean_filename() and path_with_query(), each with docstrings that explain their purpose. The inline comments in operator_for_path have been replaced with the function's docstring explaining the signed URL behavior.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Replaced the inline comments with the self-documenting path_with_query() helper. The function name and its docstring convey the intent without needing inline comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline comments explaining query parameter preservation have been mostly replaced by self-documenting code: the helper functions clean_filename() and path_with_query() have descriptive names and docstrings that make the intent clear. The remaining docstring on operator_for_path() documents the signed URL behavior, which seems appropriate since the query-param-preserving behavior is not obvious from the function signature alone and callers need to know about it. I think the current state is a reasonable balance.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. The inline comments were replaced by extracting the path_with_query() helper function, which makes the intent self-documenting. The remaining text in operator_for_path is in the docstring, which is appropriate for explaining the function's contract.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed by moving the explanation into the docstrings of clean_filename(), path_with_query(), and operator_for_path(). The remaining inline comment in operator_for_path was removed in favor of the function-level docstring.

op_path = parsed_url.path
if parsed_url.query:
op_path = f"{op_path}?{parsed_url.query}"
return op_path, operator, "http"
Comment thread
raballew marked this conversation as resolved.
Outdated
else:
return Path(path).resolve(), Operator("fs", root="/"), "fs"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] operator_for_path return type inconsistency between branches.

The HTTP branch now returns (str, Operator, str) while the filesystem branch returns (Path, Operator, str). Both are valid PathBuf and current callers handle both types correctly, but the type change from Path to str for the HTTP branch was introduced by this PR (previously it returned Path(parsed_url.path)).

AI-generated, human reviewed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. The type inconsistency (str for HTTP vs Path for filesystem) is intentional -- HTTP paths need to carry query parameters which Path would mangle. Both are valid PathBuf types and all callers handle them correctly. The updated docstring on operator_for_path() now explicitly documents this: "the path (str for HTTP, Path for filesystem)".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. The HTTP branch returns str while the filesystem branch returns Path -- both are valid PathBuf and all callers handle both types correctly. Keeping str for the HTTP branch is intentional: Path() would strip the query parameters that signed URLs need. The type annotation PathBuf = str | os.PathLike already covers this, so no change needed here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. The HTTP branch now returns (str, Operator, str) while the filesystem branch returns (Path, Operator, str). Both are valid PathBuf types and all current callers handle both correctly. The type union is intentional since the HTTP path needs to carry query parameters which Path would strip. The function's return type annotation already uses PathBuf (which is str | Path), so this is consistent with the type contract.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. The docstring for operator_for_path() now explicitly documents this: the path (str for HTTP, Path for filesystem). The type annotation uses PathBuf which accepts both, and all callers handle both types correctly. Keeping it this way is intentional -- returning a str for the HTTP branch is necessary to preserve query parameters that Path would strip.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. The return type annotation PathBuf encompasses both str and Path, so the signature is accurate. The change from Path to str for the HTTP branch is intentional -- Path would strip the query parameters. All callers handle both types correctly through the PathBuf union type.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. The return type difference is intentional -- the HTTP branch returns str because Path would strip query parameters from signed URLs, breaking the core functionality this PR fixes. The filesystem branch continues to return Path as before. Both are valid PathBuf types and all callers handle them correctly. This is now documented in the operator_for_path() docstring.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. The HTTP branch returns str while the filesystem branch returns Path -- both are valid PathBuf types and all downstream callers handle both correctly. Since changing the filesystem branch to return str would be a larger refactor with no functional benefit, leaving this as-is.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. The type signature uses PathBuf which accepts both str and Path, and the docstring now explicitly notes str for HTTP, Path for filesystem. All current callers handle both types correctly. Unifying to a single return type would require either converting the filesystem branch to str (losing Path convenience) or wrapping the HTTP path in Path (which would strip query parameters). The current approach is the least-invasive option.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged in the updated operator_for_path() docstring which now explicitly notes str for HTTP and Path for filesystem. Both are valid PathBuf values and all callers handle both types correctly.


Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -322,3 +322,63 @@ def test_copy_and_rename_tracking(tmp_path):
assert "copied_dir" in created_paths
assert "renamed_dir" in created_paths
assert len(created_paths) == 4


def test_clean_filename():
"""Test clean_filename extracts filenames and strips query parameters"""
from pathlib import PosixPath

from .client import clean_filename

# Plain filesystem path
assert clean_filename("/images/image.raw.xz") == "image.raw.xz"
assert clean_filename(PosixPath("/images/image.raw.xz")) == "image.raw.xz"

# Filesystem path with query params (as returned by operator_for_path for signed URLs)
assert clean_filename("/images/image.raw.xz?Expires=123&Signature=abc") == "image.raw.xz"

# Full HTTP URL without query params
assert clean_filename("https://cdn.example.com/images/image.raw.xz") == "image.raw.xz"
assert clean_filename("http://cdn.example.com/images/image.raw.xz") == "image.raw.xz"

# Full HTTP URL with query params (e.g. CloudFront signed URL)
assert (
clean_filename("https://cdn.example.com/images/image.raw.xz?Expires=123&Signature=abc&Key-Pair-Id=xyz")
== "image.raw.xz"
)

# Edge case: no directory component
assert clean_filename("image.raw.xz") == "image.raw.xz"
assert clean_filename("image.raw.xz?Expires=123") == "image.raw.xz"

# Edge case: compressed extensions
assert clean_filename("/path/to/image.raw.gz?token=abc") == "image.raw.gz"
assert clean_filename("/path/to/image.raw.gzip?token=abc") == "image.raw.gzip"


def test_operator_for_path_preserves_query_params():
"""Test that operator_for_path preserves query parameters for HTTP URLs"""
from .client import operator_for_path

# HTTP URL without query parameters
path, operator, scheme = operator_for_path("https://cdn.example.com/images/image.raw.xz")
assert scheme == "http"
assert path == "/images/image.raw.xz"

# HTTP URL with query parameters (e.g. CloudFront signed URL)
path, operator, scheme = operator_for_path(
"https://cdn.example.com/images/image.raw.xz?Expires=123&Signature=abc&Key-Pair-Id=xyz"
)
assert scheme == "http"
assert path == "/images/image.raw.xz?Expires=123&Signature=abc&Key-Pair-Id=xyz"
assert "Expires=123" in path
assert "Signature=abc" in path
assert "Key-Pair-Id=xyz" in path

# Filesystem path (use resolve() for the expected value since macOS
# resolves /tmp to /private/tmp)
from pathlib import Path

path, operator, scheme = operator_for_path("/tmp/image.raw.xz")
assert scheme == "fs"
assert path == Path("/tmp/image.raw.xz").resolve()
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@

import click
from jumpstarter_driver_composite.client import CompositeClient
from jumpstarter_driver_opendal.client import FlasherClient, operator_for_path
from jumpstarter_driver_opendal.client import FlasherClient, clean_filename, operator_for_path
from jumpstarter_driver_power.client import PowerClient
from opendal import Operator

Expand Down Expand Up @@ -39,7 +39,7 @@ def _upload_file_if_needed(self, file_path: str, operator: Operator | None = Non
path_buf = Path(file_path)
operator_scheme = "unknown"

filename = Path(path_buf).name
filename = clean_filename(path_buf)
Comment thread
raballew marked this conversation as resolved.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] No test coverage for ridesx _upload_file_if_needed with signed URL paths.

This PR changes this line from Path(path_buf).name to clean_filename(path_buf), but there is no test in ridesx verifying that filename extraction works correctly when query parameters are present.

Suggested fix: add a test verifying clean_filename() is called correctly, e.g. by testing with a path like /images/image.raw.xz?Expires=123&Signature=abc.

AI-generated, human reviewed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8ff6181. Added test_upload_file_if_needed_strips_query_params in client_test.py that verifies clean_filename() correctly strips query parameters from signed URL paths, including the edge case with unencoded slashes in signatures.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already addressed -- test_upload_file_if_needed_strips_query_params in client_test.py verifies clean_filename works correctly with signed URL paths including query parameters with unencoded slashes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test_upload_file_if_needed_strips_query_params in commit 8ff6181 which verifies clean_filename() handles signed URL paths with query parameters correctly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test_upload_file_if_needed_strips_query_params in ridesx client_test.py, which verifies that clean_filename() correctly strips query parameters from signed URL paths including the edge case with unencoded slashes in the Signature parameter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test_upload_file_if_needed_strips_query_params in ridesx client_test.py verifying that clean_filename() correctly strips query params from signed URL paths, including the case with unencoded slashes in the signature.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. A test test_upload_file_if_needed_strips_query_params has been added to the ridesx client_test.py, verifying that clean_filename() correctly strips query parameters from signed URL paths, including the edge case with unencoded slashes in the signature.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in client_test.py -- the test_upload_file_if_needed_strips_query_params test verifies clean_filename() works correctly for signed URL paths including those with unencoded slashes in query parameters.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test_upload_file_if_needed_strips_query_params in client_test.py that verifies clean_filename() handles paths with query params including unencoded slashes in signatures.


if self._should_upload_file(self.storage, filename, path_buf, operator, operator_scheme):
if operator_scheme == "http":
Expand Down
Loading