-
Notifications
You must be signed in to change notification settings - Fork 1
Add pixel chunking to specification #24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds TCZYX-aware chunking support: new schema fields ( Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Ingest as Ingest
participant Chunker as Chunk Builder
participant Meta as OME-Arrow Struct
participant Export as Export/Transform
participant Extract as Plane Extractor
User->>Ingest: submit planes + chunk params
activate Ingest
Ingest->>Chunker: _build_chunks_from_planes(...)
activate Chunker
Chunker->>Chunker: split planes into chunks (TCZYX)
Chunker-->>Ingest: chunks list (coords, shapes, pixels)
deactivate Chunker
Ingest->>Meta: assemble StructScalar with chunk_grid & chunks
Meta-->>Ingest: StructScalar
deactivate Ingest
User->>Export: request export / slice
activate Export
Export->>Chunker: reconstruct or slice chunks (if needed)
Chunker-->>Export: updated chunks / plane data
Export-->>User: exported data or sliced record
deactivate Export
User->>Extract: request plane (t,c,z)
activate Extract
Extract->>Extract: plane_from_chunks(data, t,c,z)
Extract-->>User: 2D numpy array (Y,X)
deactivate Extract
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @.pre-commit-config.yaml:
- Around line 41-42: The pre-commit hook was updated to rev "v0.14.14" for repo:
https://github.com/astral-sh/ruff-pre-commit which changes autofix behavior
(many fixes now considered unsafe and not applied by default); verify the new
behavior by running your ruff-check hook against the repo/CI and confirm the
reduced autofixes is acceptable, or if you want the previous autofix behavior
enable unsafe fixes by adding unsafe = true to your ruff configuration
(ruff.toml) or adjust the ruff-check hook settings accordingly so CI/local
workflows apply the unsafe autofixes.
In `@src/ome_arrow/export.py`:
- Around line 82-126: The loop over chunks currently only checks lower bounds
but must also validate that each chunk's extents do not exceed the image bounds
before reshaping/assignment: for each chunk (variables
t,c,z,y,x,shape_z,shape_y,shape_x) add checks that z + shape_z <= sz, and that y
+ shape_y <= sy and x + shape_x <= sx (you can get sy,sx from out.shape[3] and
out.shape[4]) and raise a clear ValueError like f"chunks[{i}] extent out of
range: z+shape_z={z+shape_z} > sz={sz}" (and analogous messages for y/x) before
computing expected_len/reshape and assigning into
out[t,c,z:z+shape_z,y:y+shape_y,x:x+shape_x]; this prevents silent
shape-mismatch errors during arr3d assignment.
- Around line 214-254: The code currently returns a zero-filled plane even when
no chunk matched and does not validate chunk bounds; update the chunks handling
in the export routine (the loop that iterates over chunks and uses variables
chunk_grid, chunks, strict, _cast_plane, plane) to (1) track a boolean like
any_chunk_matched set to True whenever a chunk is applied to the plane and only
return plane if any_chunk_matched is True, otherwise fall through to the
existing fallback or raise; and (2) validate chunk origins and shapes (z0, y0,
x0 >= 0 and z0+szc <= depth, y0+syc <= sy, x0+sxc <= sx) before
slicing/assigning and raise ValueError (or honor strict) for out-of-bounds or
negative origins so malformed chunks cannot silently write/clip into the plane.
In `@src/ome_arrow/ingest.py`:
- Around line 329-347: The slab allocation currently hardcodes dtype=np.uint16
which can truncate or misrepresent input planes; modify the code that builds
slabs (the slab variable in the loop using plane_map and the chunks.append
block) to infer the dtype from an available plane (e.g., inspect
plane_map[(t,c,z0 + zi)] or a representative plane before the zi loop) or accept
a dtype parameter, then allocate slab = np.zeros((sz, sy, sx),
dtype=inferred_dtype) and ensure pixels = slab.reshape(-1).tolist() preserves
that dtype/precision when serializing. Adjust any callers of this ingest routine
if you add a parameter so dtype flows through correctly.
🧹 Nitpick comments (3)
tests/test_chunks.py (1)
73-83: Consider adding edge case tests.The happy-path test is solid. Consider adding tests for:
- Out-of-bounds indices (should raise appropriate errors)
- Partial/boundary chunks (chunks that don't align perfectly with image dimensions)
- Missing chunks scenario
src/ome_arrow/transform.py (1)
11-11: Importing private helper functions across modules.The
_build_chunks_from_planesand_normalize_chunk_shapefunctions are prefixed with underscore indicating they're internal/private. Consider either:
- Renaming them without underscore prefix if they're part of the intended API
- Keeping them internal and re-exporting through
__all__if needed externallyThis is a minor concern since both modules are within the same package.
src/ome_arrow/ingest.py (1)
756-775: Inconsistent chunking API acrossfrom_*functions.
from_numpyexposeschunk_shape,chunk_order, andbuild_chunksparameters, butfrom_tiff,from_ome_zarr, andfrom_stack_pattern_pathdon't expose them. Sincebuild_chunks=Trueis the default into_ome_arrow, chunks will be automatically built for all these functions.Consider either:
- Adding chunking parameters to all
from_*functions for consistency- Documenting that chunking is always enabled for these functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/ome_arrow/ingest.py`:
- Around line 453-477: When callers pass a non-None chunks list, validate that
the stored chunk metadata reflects the actual chunks instead of blindly using
chunk_shape/chunk_order defaults: inspect the provided chunks (e.g., derive
per-axis sizes from the first chunk entry or compute max per-axis extents) and
use that to compute cz,cy,cx (instead of relying only on chunk_shape) via the
same normalization helper (_normalize_chunk_shape) or explicit checks; verify
chunk_order is one of the supported orders and that derived chunk dims match the
supplied chunk_shape (raise ValueError on mismatch) or override chunk_shape to
match chunks before building chunk_grid; update the block around
chunks/chunk_grid and calls to _build_chunks_from_planes,
_normalize_chunk_shape, chunk_shape, chunk_order, and chunks to perform these
validations and fail-fast with a clear error if inconsistent.
- Around line 269-274: The code computing cz, cy, cx must validate chunk_shape
before indexing: ensure chunk_shape is a sequence of length 3 (and raise a clear
ValueError if not) before using chunk_shape[0]/[1]/[2]; also validate each
element is numeric/convertible to int (and positive) so the subsequent int(...)
and min/max calls won't produce cryptic IndexError or TypeError—update the
function containing this block to check the shape, raise a descriptive error
like "chunk_shape must be a sequence of three integers (z,y,x)" and then proceed
to compute cz, cy, cx as before.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.