
Conversation


@d33bs d33bs commented Jan 23, 2026

Summary by CodeRabbit

  • New Features
    • TCZYX-aware chunked pixel storage: configurable chunk dimensions/order, automatic chunk generation, dense reconstruction, and direct plane extraction API used by the viewer.
  • Documentation
    • Clarified default chunking behavior for OME-Zarr exports.
  • Chores
    • Bumped pre-commit hook version.
  • Tests
    • Added tests for chunk building, metadata, plane extraction, and reconstruction.



coderabbitai bot commented Jan 23, 2026

📝 Walkthrough

Adds TCZYX-aware chunking support: new schema fields (chunk_grid, chunks), helpers to build/normalize chunks, ingestion/export/transform/view integration for chunked pixels, and tests + fixture updates to validate reconstruction and plane extraction.

Changes

Cohort / File(s) — Summary

  • Configuration & Hook Version — .pre-commit-config.yaml: Bumped ruff pre-commit hook from v0.14.13 to v0.14.14.
  • Schema / Metadata — src/ome_arrow/meta.py: Added chunk_grid and chunks fields to OME_ARROW_STRUCT (chunk-grid metadata and per-chunk entries with coordinates, shapes, and pixel lists).
  • Core Input Normalization — src/ome_arrow/core.py: Dict inputs are now aligned to schema fields when constructing a StructScalar; updated the OME-Zarr chunks docstring.
  • Ingest & Chunk Builders — src/ome_arrow/ingest.py: Added _normalize_chunk_shape() and _build_chunks_from_planes(); extended the to_ome_arrow() and from_numpy() signatures with chunks, chunk_shape, chunk_order, and build_chunks; chunk metadata is auto-built and validated, and chunk_grid/chunks are included in records.
  • Export & Reconstruction — src/ome_arrow/export.py: Added public plane_from_chunks() to extract a 2D plane from chunked pixels; to_numpy() can reconstruct the full TCZYX array from chunks; added a default chunking heuristic for to_ome_zarr(); normalized non-StructScalar inputs for Parquet/Vortex exports.
  • Transformation / Slicing — src/ome_arrow/transform.py: slice_ome_arrow() preserves or rebuilds chunk_grid and chunks when slicing, using ingest helpers and normalized chunk shapes.
  • Visualization — src/ome_arrow/view.py: view_matplotlib() now delegates plane extraction to plane_from_chunks() instead of manual pixel unpacking.
  • Tests & Fixtures — tests/conftest.py, tests/test_chunks.py: The example_correct_data fixture is augmented with chunk_grid and chunks; new tests cover chunk reconstruction, chunk building, and plane_from_chunks() extraction.
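The chunking model summarized above can be sketched end to end: split a dense TCZYX array into per-chunk records (origin coordinates, shape, flattened pixel list) and densely reconstruct it. This is a minimal illustration only — the dict keys and function names below are assumptions for the sketch, not the actual OME-Arrow schema fields or the private helpers in src/ome_arrow/ingest.py.

```python
import numpy as np

def split_tczyx(arr, cz, cy, cx):
    """Split a TCZYX array into chunk records (hypothetical layout):
    origin coordinates, block shape, and a flattened pixel list.
    Edge chunks may be smaller than (cz, cy, cx)."""
    st, sc, sz, sy, sx = arr.shape
    chunks = []
    for t in range(st):
        for c in range(sc):
            for z0 in range(0, sz, cz):
                for y0 in range(0, sy, cy):
                    for x0 in range(0, sx, cx):
                        block = arr[t, c, z0:z0 + cz, y0:y0 + cy, x0:x0 + cx]
                        chunks.append({
                            "t": t, "c": c, "z": z0, "y": y0, "x": x0,
                            "shape": block.shape,
                            "pixels": block.reshape(-1).tolist(),
                        })
    return chunks

def rebuild_tczyx(chunks, shape, dtype):
    """Densely reconstruct the full TCZYX array from chunk records."""
    out = np.zeros(shape, dtype=dtype)
    for ch in chunks:
        dz, dy, dx = ch["shape"]
        block = np.asarray(ch["pixels"], dtype=dtype).reshape(dz, dy, dx)
        z, y, x = ch["z"], ch["y"], ch["x"]
        out[ch["t"], ch["c"], z:z + dz, y:y + dy, x:x + dx] = block
    return out
```

The round trip (split, then rebuild) should reproduce the original array exactly, including partial boundary chunks where the image extent is not a multiple of the chunk shape.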

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Ingest as Ingest
    participant Chunker as Chunk Builder
    participant Meta as OME-Arrow Struct
    participant Export as Export/Transform
    participant Extract as Plane Extractor

    User->>Ingest: submit planes + chunk params
    activate Ingest
    Ingest->>Chunker: _build_chunks_from_planes(...)
    activate Chunker
    Chunker->>Chunker: split planes into chunks (TCZYX)
    Chunker-->>Ingest: chunks list (coords, shapes, pixels)
    deactivate Chunker
    Ingest->>Meta: assemble StructScalar with chunk_grid & chunks
    Meta-->>Ingest: StructScalar
    deactivate Ingest

    User->>Export: request export / slice
    activate Export
    Export->>Chunker: reconstruct or slice chunks (if needed)
    Chunker-->>Export: updated chunks / plane data
    Export-->>User: exported data or sliced record
    deactivate Export

    User->>Extract: request plane (t,c,z)
    activate Extract
    Extract->>Extract: plane_from_chunks(data, t,c,z)
    Extract-->>User: 2D numpy array (Y,X)
    deactivate Extract
```

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Chunks in my burrow, neat and round,
Grids and pixels snugly found.
Planes stack tidy, pieces play—
I hop through bytes and nibble away. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 73.08%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title accurately summarizes the primary change: adding pixel chunking support to the OME-Arrow specification and implementation.





@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@.pre-commit-config.yaml`:
- Around lines 41-42: The ruff pre-commit hook (repo: https://github.com/astral-sh/ruff-pre-commit) was bumped to rev "v0.14.14", which changes autofix behavior: many fixes are now considered unsafe and are no longer applied by default. Run the ruff-check hook against the repo/CI to verify that the reduced set of autofixes is acceptable. If you want the previous behavior, enable unsafe fixes in your ruff configuration (unsafe-fixes = true in ruff.toml) or adjust the ruff-check hook settings so CI and local workflows apply the unsafe autofixes.

In `@src/ome_arrow/export.py`:
- Around lines 82-126: The loop over chunks only checks lower bounds; it must also validate that each chunk's extent fits within the image bounds before reshaping and assignment. For each chunk (variables t, c, z, y, x, shape_z, shape_y, shape_x), check that z + shape_z <= sz, y + shape_y <= sy, and x + shape_x <= sx (sy and sx are available from out.shape[3] and out.shape[4]), and raise a clear ValueError such as f"chunks[{i}] extent out of range: z+shape_z={z+shape_z} > sz={sz}" (with analogous messages for y and x) before computing expected_len, reshaping, and assigning into out[t, c, z:z+shape_z, y:y+shape_y, x:x+shape_x]. This prevents silent shape-mismatch errors during the arr3d assignment.
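The suggested upper-bound check could be factored as a small helper along these lines (the helper name and tuple layout are my own for illustration, not the PR's code):

```python
def validate_chunk_extent(i, origin, shape, image_shape):
    """Raise ValueError if a chunk's extent (origin + shape) exceeds the
    image bounds. `origin` and `shape` are (z, y, x) tuples; `image_shape`
    is (sz, sy, sx). Hypothetical helper mirroring the suggested checks."""
    for axis, (o, s, bound) in enumerate(zip(origin, shape, image_shape)):
        name = "zyx"[axis]
        if o + s > bound:
            raise ValueError(
                f"chunks[{i}] extent out of range: "
                f"{name}+shape_{name}={o + s} > s{name}={bound}"
            )
```

Calling this once per chunk before the reshape/assignment turns a cryptic broadcasting error into an actionable message that names the offending chunk and axis.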
- Around lines 214-254: The code returns a zero-filled plane even when no chunk matched, and it does not validate chunk bounds. Update the chunk handling in the export routine (the loop that iterates over chunks and uses chunk_grid, chunks, strict, _cast_plane, and plane) to: (1) track a boolean such as any_chunk_matched, set it to True whenever a chunk is applied to the plane, and only return plane when it is True, otherwise falling through to the existing fallback or raising; and (2) validate chunk origins and shapes (z0, y0, x0 >= 0 and z0 + szc <= depth, y0 + syc <= sy, x0 + sxc <= sx) before slicing/assigning, raising ValueError (or honoring strict) for out-of-bounds or negative origins, so malformed chunks cannot silently write into or clip the plane.
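Both suggested fixes — the any_chunk_matched flag and origin/shape validation — could look like this standalone sketch (the chunk dict keys and function name are assumptions for illustration, not the actual plane_from_chunks implementation):

```python
import numpy as np

def plane_from_chunks_sketch(chunks, t, c, z, sy, sx, dtype=np.uint16):
    """Extract the (Y, X) plane at (t, c, z) from chunk records.
    Raises instead of silently returning zeros when nothing matched,
    and bounds-checks each chunk before assignment."""
    plane = np.zeros((sy, sx), dtype=dtype)
    any_chunk_matched = False
    for i, ch in enumerate(chunks):
        if ch["t"] != t or ch["c"] != c:
            continue
        z0, y0, x0 = ch["z"], ch["y"], ch["x"]
        szc, syc, sxc = ch["shape_z"], ch["shape_y"], ch["shape_x"]
        if not (z0 <= z < z0 + szc):
            continue  # chunk does not cover the requested z slice
        if min(z0, y0, x0) < 0 or y0 + syc > sy or x0 + sxc > sx:
            raise ValueError(f"chunks[{i}] out of bounds for plane ({sy}, {sx})")
        block = np.asarray(ch["pixels"], dtype=dtype).reshape(szc, syc, sxc)
        plane[y0:y0 + syc, x0:x0 + sxc] = block[z - z0]
        any_chunk_matched = True
    if not any_chunk_matched:
        raise ValueError(f"no chunk covers plane (t={t}, c={c}, z={z})")
    return plane
```

The key behavioral change is the final guard: a zero-filled plane is only ever returned when at least one chunk actually contributed pixels to it.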

In `@src/ome_arrow/ingest.py`:
- Around lines 329-347: The slab allocation hardcodes dtype=np.uint16, which can truncate or misrepresent input planes. Modify the code that builds slabs (the slab variable in the loop using plane_map and the chunks.append block) to infer the dtype from an available plane (e.g., inspect plane_map[(t, c, z0 + zi)] or a representative plane before the zi loop) or to accept a dtype parameter; then allocate slab = np.zeros((sz, sy, sx), dtype=inferred_dtype) and ensure pixels = slab.reshape(-1).tolist() preserves that dtype/precision when serializing. If you add a parameter, adjust the callers of this ingest routine so the dtype flows through correctly.
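A sketch of the dtype-inference approach (build_slab and the plane_map layout are hypothetical names for illustration; the real code lives inside the ingest routine):

```python
import numpy as np

def build_slab(plane_map, t, c, z0, sz, sy, sx):
    """Build one (sz, sy, sx) slab for chunking, borrowing the dtype
    from a representative plane instead of hardcoding np.uint16.
    plane_map maps (t, c, z) -> 2D numpy arrays."""
    # Find any plane within this slab's z-range to borrow a dtype from.
    sample = next(
        (plane_map[(t, c, z0 + zi)] for zi in range(sz)
         if (t, c, z0 + zi) in plane_map),
        None,
    )
    dtype = sample.dtype if sample is not None else np.uint16  # fallback
    slab = np.zeros((sz, sy, sx), dtype=dtype)
    for zi in range(sz):
        plane = plane_map.get((t, c, z0 + zi))
        if plane is not None:
            slab[zi, :plane.shape[0], :plane.shape[1]] = plane
    return slab, slab.reshape(-1).tolist()
```

With this change a float32 input stays float32 through the slab and the serialized pixel list, rather than being truncated to 16-bit integers.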
🧹 Nitpick comments (3)
tests/test_chunks.py (1)

73-83: Consider adding edge case tests.

The happy-path test is solid. Consider adding tests for:

  • Out-of-bounds indices (should raise appropriate errors)
  • Partial/boundary chunks (chunks that don't align perfectly with image dimensions)
  • Missing chunks scenario
src/ome_arrow/transform.py (1)

11-11: Importing private helper functions across modules.

The _build_chunks_from_planes and _normalize_chunk_shape functions are prefixed with underscore indicating they're internal/private. Consider either:

  1. Renaming them without underscore prefix if they're part of the intended API
  2. Keeping them internal and re-exporting through __all__ if needed externally

This is a minor concern since both modules are within the same package.

src/ome_arrow/ingest.py (1)

756-775: Inconsistent chunking API across from_* functions.

from_numpy exposes chunk_shape, chunk_order, and build_chunks parameters, but from_tiff, from_ome_zarr, and from_stack_pattern_path don't expose them. Since build_chunks=True is the default in to_ome_arrow, chunks will be automatically built for all these functions.

Consider either:

  1. Adding chunking parameters to all from_* functions for consistency
  2. Documenting that chunking is always enabled for these functions


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/ome_arrow/ingest.py`:
- Around lines 453-477: When callers pass a non-None chunks list, validate that the stored chunk metadata reflects the actual chunks instead of blindly applying the chunk_shape/chunk_order defaults. Inspect the provided chunks (e.g., derive per-axis sizes from the first chunk entry, or compute the max per-axis extents) and use that to compute cz, cy, cx, rather than relying only on chunk_shape, via the same normalization helper (_normalize_chunk_shape) or explicit checks. Verify that chunk_order is one of the supported orders and that the derived chunk dims match the supplied chunk_shape (raising ValueError on mismatch), or override chunk_shape to match chunks before building chunk_grid. Update the block around chunks/chunk_grid and the calls to _build_chunks_from_planes, _normalize_chunk_shape, chunk_shape, chunk_order, and chunks to perform these validations and fail fast with a clear error when inconsistent.
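The consistency check described above might be sketched as follows (the entry keys, the set of supported orders, and the function names are all assumptions for illustration, not the actual ingest code):

```python
def derive_chunk_shape(chunks):
    """Derive (cz, cy, cx) from supplied chunk entries by taking the
    max per-axis extent across chunks (hypothetical entry keys)."""
    if not chunks:
        raise ValueError("chunks must be a non-empty list")
    cz = max(ch["shape_z"] for ch in chunks)
    cy = max(ch["shape_y"] for ch in chunks)
    cx = max(ch["shape_x"] for ch in chunks)
    return cz, cy, cx

def check_chunk_shape_consistency(chunks, chunk_shape, chunk_order):
    """Fail fast when caller-supplied metadata disagrees with the
    actual chunks. The supported orders here are an assumption."""
    if chunk_order not in ("C", "F"):
        raise ValueError(f"unsupported chunk_order: {chunk_order!r}")
    derived = derive_chunk_shape(chunks)
    if chunk_shape is not None and tuple(chunk_shape) != derived:
        raise ValueError(
            f"chunk_shape {tuple(chunk_shape)} does not match "
            f"chunks-derived shape {derived}"
        )
    return derived
```

Returning the derived shape lets the caller build chunk_grid from values that are guaranteed to match the chunks actually stored.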
- Around lines 269-274: The code computing cz, cy, cx must validate chunk_shape before indexing. Ensure chunk_shape is a sequence of length 3 (raising a clear ValueError if not) before using chunk_shape[0]/[1]/[2], and validate that each element is numeric, convertible to int, and positive, so the subsequent int(...) and min/max calls don't produce a cryptic IndexError or TypeError. Update the function containing this block to check the shape, raise a descriptive error such as "chunk_shape must be a sequence of three integers (z, y, x)", and then compute cz, cy, cx as before.
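A standalone sketch of the suggested validation (the function name here is hypothetical; the real helper is _normalize_chunk_shape):

```python
from collections.abc import Sequence

def normalize_chunk_shape_checked(chunk_shape):
    """Validate chunk_shape before indexing: it must be a sequence of
    three positive integers (z, y, x). Raises a descriptive ValueError
    instead of a cryptic IndexError/TypeError downstream."""
    msg = "chunk_shape must be a sequence of three integers (z, y, x)"
    if not isinstance(chunk_shape, Sequence) or len(chunk_shape) != 3:
        raise ValueError(msg)
    try:
        cz, cy, cx = (int(v) for v in chunk_shape)
    except (TypeError, ValueError) as exc:
        raise ValueError(msg) from exc
    if min(cz, cy, cx) < 1:
        raise ValueError("chunk_shape entries must be positive")
    return cz, cy, cx
```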
