Vectorize _find_duplicated_spikes_keep_first/last_iterative kernels. by grahamfindlay · Pull Request #4614 · SpikeInterface/spikeinterface

grahamfindlay · 2026-06-11T01:31:22Z

Let N be the number of spikes in the train, and D be the number of spikes flagged as duplicates (i.e. the fall within the censored period of an earlier kept spike).

The old kernels were using if i in indices_of_duplicates for each of N spikes to check if the spike was a duplicate, which is O(N*D). But for heavily contaminated units, D can be quite large. On long (48h) recordings, this was blowing up, and taking 56min to compute sd_ratio for 232 units. One unit took nearly 5min by itself.

This approach is constant-time lookup, so O(N) total. New time on the 48h dataset is 10.6s total to compute sd_ratio for all units. Tested equivalence with the old method on both the real data and synthetic bursty trains, plus the existing test still passes (note for the future: the existing test should probably check the exact indices).

Also switched cache=True to cache=False on the keep_last variant to match its twin and conform with the rest of the codebase.

Let `N` be the number of spikes in the train, and `D` be the number of spikes flagged as duplicates (i.e. the fall within the censored period of an earlier kept spike). The old kernels were using `if i in indices_of_duplicates` for each of `N` spikes to check if the spike was a duplicate, which is `O(N*D)`. But for heavily contaminated units, `D` can be quite large. On long (48h) recordings, this was blowing up, and taking 56min to compute `sd_ratio` for 232 units. One unit took nearly 5min by itself. This approach is constant-time lookup, so `O(N)` total. New time on the 48h dataset is 10.6s total to compute `sd_ratio` for all units. Tested equivalence with the old method on both the real data and synthetic bursty trains, plus the existing test still passes (note for the future: the existing test should probably check the exact indices).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vectorize _find_duplicated_spikes_keep_first/last_iterative kernels.#4614

Vectorize _find_duplicated_spikes_keep_first/last_iterative kernels.#4614
grahamfindlay wants to merge 1 commit into
SpikeInterface:mainfrom
grahamfindlay:perf/find-duplicated-spikes-optimization

grahamfindlay commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

grahamfindlay commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant