ENH: Add description filter parameter to Raw.crop_by_annotations()#13820

Open
aman-coder03 wants to merge 3 commits into
mne-tools:mainfrom
aman-coder03:enh-crop-by-annotations-description

Conversation

@aman-coder03

Contributor

Reference issue (if any)

fixes #13743

What does this implement/fix?

Raw.crop_by_annotations() currently crops the raw data for every annotation, with no way to filter by description. This PR adds an optional description parameter that lets you crop only the annotations with matching descriptions.
Example:

# crop only stimulus annotations
raws = raw.crop_by_annotations(description="stimulus")

# or multiple types at once
raws = raw.crop_by_annotations(description=["stimulus", "response"])

When description=None (the default), the method behaves exactly as before, so there is no API breakage.
Filtering uses np.isin on the annotation descriptions, which follows the same pattern as _select_annotations_based_on_description already used internally in mne/annotations.py.
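
For readers who want to see what exact matching with np.isin looks like in isolation, here is a minimal sketch. The function name `select_by_description` is hypothetical and is not part of MNE or of this PR's diff; it only illustrates the `description=None`, single-string, and list cases described above.

```python
import numpy as np


def select_by_description(descriptions, description=None):
    """Return indices of annotations whose description matches exactly.

    Hypothetical helper for illustration only; not MNE API.
    """
    descriptions = np.asarray(descriptions, dtype=str)
    if description is None:
        # Default: keep every annotation (the pre-existing behavior).
        return np.arange(len(descriptions))
    # Accept either a single string or a list of strings.
    wanted = np.atleast_1d(description)
    return np.flatnonzero(np.isin(descriptions, wanted))


descs = ["stimulus", "response", "BAD_blink", "stimulus"]
idx_all = select_by_description(descs)
idx_stim = select_by_description(descs, "stimulus")
idx_both = select_by_description(descs, ["stimulus", "response"])
```

Because the comparison is an exact string match, "Stimulus" or "stim" would select nothing here.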

Additional information

  • added a test test_crop_by_annotations_description in mne/io/tests/test_raw.py
  • parametrized over meas_date and first_samp to match the existing test_crop_by_annotations style
  • fully backward compatible, description=None is the default

@PragnyaKhandelwal

Contributor

Hey @aman-coder03, I tested your branch locally to see how it handles some edge cases. The base implementation looks good, but I found a few things that might be worth refining to match MNE standards:

  • You mentioned _select_annotations_based_on_description in the issue but didn't use it in the PR. I ran a test (see below) and found that the manual np.isin logic doesn't support regex, which MNE users generally expect when filtering by description. Was there a specific reason you moved away from the helper function?
  • The filtering is currently strictly case-sensitive. The internal helper might provide more flexibility here, along the lines of the matching rules in events_from_annotations.
  • If a user provides a description that doesn't exist (e.g. a typo), the method silently returns an empty list. Would it be worth adding a logger.warning so the user knows why they got no results?

[Image: output of the local test referenced above; not preserved]

@aman-coder03 aman-coder03 requested a review from larsoner as a code owner April 9, 2026 05:44
@aman-coder03

Contributor Author

Thanks for the review. I investigated _select_annotations_based_on_description, but it expects an event_id dict rather than a description list, so it's not the right tool here.
I have kept np.isin for exact matching and added a RuntimeWarning when no annotations match. Regex and case-insensitivity could be a follow-up enhancement.
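
The behavior described in this comment (exact matching, plus a RuntimeWarning when nothing matches) can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: the function name `exact_match_or_warn` and the warning text are hypothetical, and a plain set-membership test stands in for np.isin so the snippet needs only the standard library.

```python
import warnings


def exact_match_or_warn(descriptions, wanted):
    """Return indices of exact matches; warn if there are none.

    Hypothetical helper; set membership stands in for np.isin.
    """
    wanted = {wanted} if isinstance(wanted, str) else set(wanted)
    idx = [ii for ii, desc in enumerate(descriptions) if desc in wanted]
    if not idx:
        # Hypothetical message; the PR's actual wording may differ.
        warnings.warn(
            f"No annotations matched description(s) {sorted(wanted)}; "
            "returning an empty list.",
            RuntimeWarning,
            stacklevel=2,
        )
    return idx


descs = ["stimulus", "response", "stimulus"]
stim_idx = exact_match_or_warn(descs, "stimulus")
```

A typo such as "stimulis" would return an empty list and emit the warning instead of failing silently.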

@PragnyaKhandelwal

Contributor

Thanks for the update, @aman-coder03! Adding that RuntimeWarning will definitely help users catch typos in their filters. I see your point about the helper function being a bit more aligned with the event_id workflow. I'll leave it to the maintainers to decide if they want to stick with exact matching for now or if they'd prefer the full Regex/Case flexibility provided by the internal helpers.

@nordme

nordme commented Jun 3, 2026

Contributor

@aman-coder03 @PragnyaKhandelwal @drammock Yeah, there's some inconsistency in whether annotation descriptions are matched exactly or not (for example, #13940). A common convention when creating annotations is using "BAD_" to denote parts of the data that ought to be excluded, while altering the rest of the string to indicate the reason for exclusion (e.g. "BAD_blink", "BAD_movement"). This naming convention means that a regex-style ability to find all the annotations starting with "BAD_" is desirable. But spotty implementation of regex vs. exact matching might cause problems. @drammock, thoughts?
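
To make the difference concrete, here is a small standard-library sketch comparing exact matching with regex prefix matching on "BAD_"-style descriptions. The description strings are made up for illustration, and nothing here calls MNE.

```python
import re

descriptions = ["BAD_blink", "stimulus", "BAD_movement", "bad_channel"]

# Exact matching: "BAD_" on its own equals none of the descriptions.
exact = [d for d in descriptions if d == "BAD_"]

# Regex prefix matching: every description that starts with "BAD_".
prefix = [d for d in descriptions if re.match("^BAD_", d)]

# The inline flag (?i) makes the same pattern case-insensitive.
any_case = [d for d in descriptions if re.match("(?i)^bad_", d)]

print(exact)     # []
print(prefix)    # ['BAD_blink', 'BAD_movement']
print(any_case)  # ['BAD_blink', 'BAD_movement', 'bad_channel']
```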

bkowshik added a commit to bkowshik/mne-python that referenced this pull request Jun 4, 2026
Adds an optional `regexp` parameter to `Raw.crop_by_annotations()` so
only annotations whose description matches the pattern are cropped
(e.g. `regexp="^BAD_"`). Default `regexp=None` crops every annotation,
preserving current behavior.

Extracts the regex-matching core of
`_select_annotations_based_on_description` into a shared
`_match_descriptions` helper so cropping reuses the same matching as
`events_from_annotations` without inheriting its `event_id` machinery.
No-match emits a `RuntimeWarning`; the matcher itself stays policy-free
so each caller chooses its own no-match behavior.

Implements the design discussed on mne-tools#13820.
@bkowshik

bkowshik commented Jun 4, 2026

Contributor

On the exact-vs-regex question at the end of the thread, I think there's a path that gives the regex everyone wants without the event_id baggage that ruled out _select_annotations_based_on_description.

The key observation is that the helper does two separable things: (1) a regex match over descriptions, and (2) resolving each match to an integer trigger via event_id. @aman-coder03 is right that (2) is the wrong fit for cropping, but (1) is two lines with no dependency on event_id, so it can be factored into a shared helper:

import re


def _match_descriptions(descriptions, regexp):
    """Return indices of descriptions matching ``regexp`` (all if None)."""
    # re.match anchors at the start of each description.
    regexp_comp = re.compile(".*" if regexp is None else regexp)
    return [
        ii for ii, desc in enumerate(descriptions)
        if regexp_comp.match(desc) is not None
    ]

_select_annotations_based_on_description then calls this for its regex pass (keeping its own event_id + ValueError logic on top), so there's a single source of truth for matching, which should keep regex-vs-exact from drifting apart, @nordme.

On the API, I'd suggest regexp: str rather than a description=[...] list, to match other pattern-based filters in MNE:

| Function | Filters | Parameter |
| --- | --- | --- |
| events_from_annotations | annotation descriptions | regexp: str |
| read_labels_from_annot | label names | regexp: str |
| pick_channels_regexp | channel names | regexp: str |

I've implemented exactly this with a parametrized test, and confirmed the refactor is behavior-identical for events_from_annotations. Full diff here: main...bkowshik:crop-by-annotations-regexp. It's your PR and the credit's yours, @aman-coder03; happy to hand this over however is easiest, or to open a PR only if the maintainers would prefer.

Testing

(mnedev) bkowshik@Coimbatore mne-python % python -c "
import numpy as np, mne
raw = mne.io.RawArray(np.zeros((1,4000)), mne.create_info(1,1000.,'eeg'))
raw.set_annotations(mne.Annotations([0,1.5,3.0],[1,.5,.5],['BAD_blink','stimulus','BAD_movement']))
print('all     :', len(raw.crop_by_annotations()))                   # 3
print('^BAD_   :', len(raw.crop_by_annotations(regexp='^BAD_')))     # 2
print('^bad_   :', len(raw.crop_by_annotations(regexp='^bad_')))     # 0 + RuntimeWarning
print('(?i)bad :', len(raw.crop_by_annotations(regexp='(?i)^bad_'))) # 2
"
Creating RawArray with float64 data, n_channels=1, n_times=4000
    Range : 0 ... 3999 =      0.000 ...     3.999 secs
Ready.
all     : 3
^BAD_   : 2
<string>:7: RuntimeWarning: No annotation descriptions matched regexp '^bad_'; returning an empty list.
^bad_   : 0
(?i)bad : 2

