Skip to content

Use np.load(..., mmap_mode="r") for time_vector in BaseRecording._extra_metadata_from_folder#4608

Merged
alejoe91 merged 5 commits into
SpikeInterface:mainfrom
grahamfindlay:bugfix/excessive_eager_time_vector
Jun 10, 2026
Merged

Use np.load(..., mmap_mode="r") for time_vector in BaseRecording._extra_metadata_from_folder#4608
alejoe91 merged 5 commits into
SpikeInterface:mainfrom
grahamfindlay:bugfix/excessive_eager_time_vector

Conversation

@grahamfindlay

Copy link
Copy Markdown
Contributor

BaseRecording._extra_metadata_from_folder was eagerly loading a BinaryFolderRecording's time vector, which can be tens of GB on long recordings.

I ran into this because run_sorter_by_property (I was sorting tetrodes) reconstructs the recording once per joblib worker, so every worker was loading the whole time vector, even when only a short frame slice was needed.

Another consequence of the eager loading was that si.load on a 48h recording with a time_vector was using 1.6GB of memory, even if the time vector was never touched.

I switched the load to mmap_mode="r", so memory use no longer scales with both recording duration and number of workers. It is bounded to what's actually touched, and the memmap can be shared by the joblib workers.

The cost is that this time vector is read-only, so in-place operations on it would raise an error.

Thankfully there was only 1: TimeSeries.shift_times! I made it shift in place only when the vector is writeable and fall back to an out-of-place op for the now read-only memmap. (For simplicity, we could scrap the conditional altogether and just keep the out-of-place path, since this is presumably a pretty rare op -- but I chose to keep the in-place, no-copy path for best performance).

grahamfindlay and others added 2 commits June 7, 2026 19:41
…ta_from_folder`

`BaseRecording._extra_metadata_from_folder` was eagerly loading a
`BinaryFolderRecording`'s time vector, which can be tens of GB on long
recordings.

I ran into this because `run_sorter_by_property` (I was sorting tetrodes)
reconstructs the recording _once per joblib worker_, so every worker was
loading the whole time vector, even when only a short frame slice was
needed.

Another consequence of the eager loading was that `si.load` on a 48h
recording with a time_vector was using 1.6GB of memory, even if the
time vector was never touched.

I switched the load to `mmap_mode="r"`, so memory use no longer scales
with both recording duration and number of workers. It is bounded to
what's actually touched, and the memmap can be shared by the joblib workers.

The cost is that this time vector is read-only, so in-place operations
on it would raise an error.

Thankfully there was only 1: `TimeSeries.shift_times`! I made it shift in
place only when the vector is writeable and fall back to an out-of-place
op for the now read-only memmap.
Comment thread src/spikeinterface/core/time_series.py Outdated
@alejoe91 alejoe91 added the core Changes to core module label Jun 8, 2026
Comment thread src/spikeinterface/core/time_series.py Outdated

if self.has_time_vector(segment_index=segment_index):
rs.time_vector += shift
if rs.time_vector.flags.writeable:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check if zarr.Array also implements this flag

@alejoe91 alejoe91 Jun 8, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, it doesn't have this flag, so we should do:

Suggested change
if rs.time_vector.flags.writeable:
if isinstance(rs.time_vector, np.ndarray) and rs.time_vector.flags.writeable:

@grahamfindlay grahamfindlay Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wow, I didn't even realize time_vector could be a zarr.Array! In that case, the out-of-place op is also going to fail. And the in-place op is probably already failing -- I guess it just never surfaced.

So I think we actually need something like:

if isinstance(rs.time_vector, np.ndarray) and rs.time_vector.flags.writeable:
      rs.time_vector += shift    # in-place, no copy
  else:
      rs.time_vector = np.asarray(rs.time_vector) + shift    # read-only memmap or zarr.Array

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect!

grahamfindlay and others added 3 commits June 9, 2026 11:34
Recordings loaded from disk may have time vectors
that are out-of-memory (OOM) arrays: read-only
mem-maps or zarr.Array. These can't be shifted in
place, and need to be materialized as as numpy
arrays before shifting.
@alejoe91 alejoe91 merged commit 1d2fab0 into SpikeInterface:main Jun 10, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Changes to core module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants