AllenNeuralDynamics · arjunsridhar12345 · Jun 10, 2026 · Jun 5, 2026
diff --git a/docs/nwb_contents.md b/docs/nwb_contents.md
@@ -0,0 +1,55 @@
+# Final NWB Contents
+
+This document describes the contents of the NWB file produced by this
+repository. It is a companion to issue
+[#12](https://github.com/AllenNeuralDynamics/dynamic-foraging-processing/issues/12),
+which serves as the authoritative changelog for content decisions.
+
+When the NWB contents change, update both the [Changelog](#changelog) below
+and the relevant section in this document. Each entry should record at minimum
+the date, what changed, and why.
+
+## Acquisition
+
+The `acquisition` container holds the HARP streams from the rig (e.g.
+VR Foraging) along with four behavior-derived series carried over from the
+NWB produced by the combined dynamic foraging + FIP pipeline:
+
+- `left_lick_time`
+- `right_lick_time`
+- `left_reward_delivery_time`
+- `right_reward_delivery_time`
+
+Each series stores both timestamps and a parallel `data` array. For the
+reward delivery series, `data` annotates each reward as `earned`, `manual`,
+or `automatic`.
+
+See [`trials_table_mapping.md`](trials_table_mapping.md#acquisition-container)
+for the raw sources backing each of these four series.
+
+## Events
+
+The `events` container follows the conventions in
+[aind-physio-arch#1072](https://github.com/AllenNeuralDynamics/aind-physio-arch/issues/1072).
+
+The events sidecar will be version-controlled in this repository for now so that
+changes can be tracked alongside the code.
+
+Events are on pause pending validation by the HED team. See the
+[Changelog](#changelog) for details.
+
+## Trials
+
+The `trials` table is built from the raw acquisition streams. The full
+column-by-column mapping is documented in
+[`trials_table_mapping.md`](trials_table_mapping.md), and the source-of-truth
+discussion lives in issue
+[#5](https://github.com/AllenNeuralDynamics/dynamic-foraging-processing/issues/5).
+
+## Changelog
+
+| Date | Section | Change | Reason |
+| --- | --- | --- | --- |
+| 2026-06-03 | acquisition / trials | Initial scope confirmed: HARP streams + `{left,right}_lick_time` and `{left,right}_reward_delivery_time` in `acquisition`; trials mapping per issue #5. | Meeting with Alex. |
+| 2026-06-05 | events | Events on pause. | Pending validation by the HED team. |
+| 2026-06-08 | acquisition | Documented `data` arrays alongside timestamps; reward delivery series annotate each reward as `earned`, `manual`, or `automatic`. | Clarify acquisition contents. |
diff --git a/docs/qc_upgrade_plan.md b/docs/qc_upgrade_plan.md
@@ -0,0 +1,212 @@
+# QC Upgrade Plan
+
+Plan for upgrading
+[`aind-dynamic-foraging-qc/code/run_capsule.py`](https://github.com/AllenNeuralDynamics/aind-dynamic-foraging-qc/blob/main/code/run_capsule.py)
+to:
+
+1. Conform to the current
+   [`aind_data_schema.core.quality_control`](https://github.com/AllenNeuralDynamics/aind-data-schema/blob/dev/src/aind_data_schema/core/quality_control.py)
+   schema (v2.4.1).
+2. Operate on primitive structures (numpy arrays, pandas DataFrames). QC
+   functions are agnostic to where the data came from — the caller is free
+   to load from
+   [`RawDataLoader`](../src/dynamic_foraging_processing/raw_data_loader/loader.py),
+   an NWB file, or anything else, as long as the primitives match the
+   expected shape.
+
+This document is a design reference only. Implementation will happen on a
+separate branch.
+
+## 1. Schema changes
+
+The schema removed `QCEvaluation`. The new `QualityControl` object holds a flat
+`metrics: List[QCMetric | CurationMetric]` and groups metrics via per-metric
+`tags`. Each `QCMetric` now requires `modality` and `stage` directly (these
+moved off `QCEvaluation`), and `QualityControl` requires `default_grouping`.
+
+### Field-by-field migration
+
+| Old (capsule) | New (schema v2.4.1) |
+| --- | --- |
+| `QCEvaluation(name, modality, stage, metrics, description, allow_failed_metrics)` | Removed. Replace each evaluation with one or more `QCMetric`s sharing a tag. |
+| `QCMetric(name, value, status_history, description?, reference?)` | `QCMetric(name, modality, stage, value, status_history, description?, reference?, tags={}, evaluated_assets?)` |
+| `QualityControl(evaluations=[...])` | `QualityControl(metrics=[...], default_grouping=[...], key_experimenters?, notes?, allow_tag_failures?)` |
+| n/a | `Status.PENDING` is now a valid third state alongside `PASS` / `FAIL`. |
+| `allow_failed_metrics=True` on an evaluation | `allow_tag_failures=["<tag value>"]` on the top-level `QualityControl`. |
+
+### Tag convention
+
+Each ported behavior metric is tagged with `{"behavior": "<metric name>"}` —
+the key is the group, the value is the metric's name. Contraqctor results
+use a fixed `"test_suite"` key plus a dynamic per-suite key (see
+[Contraqctor-based QA suites](#contraqctor-based-qa-suites-per-meeting-with-alex-2026-06-03)).
+
+### Helper rewrites
+
+`Bool2Status` keeps its shape but must produce timezone-aware timestamps
+(schema uses `AwareDatetimeWithDefault`). The existing `datetime.now(seattle_tz)`
+already satisfies this.
+
+`create_evaluation(...)` is deleted. Replace with a small `make_metric(...)`
+helper that stamps `modality`, `stage`, and `tags` onto each `QCMetric`.
+
+## 2. Data inputs
+
+The old capsule consumed a single `behavior.json` (e.g. `B_Bias`,
+`B_LeftLickTime`, `B_RightLickTime`, `B_StagePositions`, `drop_frames_tag`,
+`Experimenter`, `dirty_files`, ...). The new pipeline does not produce this
+file.
+
+QC functions now take primitive structures directly. The entry point is
+responsible for producing those primitives — whether it pulls them from
+`RawDataLoader.get_all_raw_data()`, an NWB file, or any other source is
+out of scope for the QC module. This keeps the QC logic testable without
+any dataset on disk.
+
+### Primitive inputs per metric
+
+| Primitive | Type | Old `behavior.json` analogue |
+| --- | --- | --- |
+| `left_lick_times` | `np.ndarray` of seconds | `B_LeftLickTime` |
+| `right_lick_times` | `np.ndarray` of seconds | `B_RightLickTime` |
+| `animal_response` | `np.ndarray` of `{0,1,2}` per trial | `B_AnimalResponseHistory` |
+| `go_cue_times` | `np.ndarray` of seconds | `B_GoCueTimeSoundCard` |
+| `rewarded_history` | `pd.DataFrame` with `left`/`right` boolean columns | `B_RewardedHistory` |
+| `stage_positions` | `pd.DataFrame` with `x`/`y`/`z` columns per trial | `B_StagePositions` |
+
+### Out-of-scope (no equivalent in the new data, drop the check)
+
+- `drop_frames_tag`, `frame_num`, `trigger_length` — dropped-frames check.
+- `Experimenter`, `dirty_files`, `repo_dirty_flag` — basic-configuration check.
+- `B_Bias`, `B_Bias_CI` — pre-computed side bias; recompute from
+  `animal_response` rather than dropping the check (rolling fraction of
+  right vs. left choices).
+
+## 3. Metrics in the new capsule
+
+Keep only what maps cleanly. All metrics get `stage=Stage.RAW` and
+`modality=Modality.BEHAVIOR` unless noted.
+
+### Side bias (`tags={"behavior": "average side bias"}`)
+
+- Input: `animal_response: np.ndarray` (`0=left`, `1=right`, `2=ignore`).
+- Average bias = `mean(is_right) - mean(is_left)` over responded trials (or
+  the rolling form, matching the old `B_Bias`).
+- Metric: `"average side bias"`, pass when `abs(mean_bias) < 0.5`.
+- `reference="side_bias.png"`.
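A minimal sketch of the average side bias rule described above, written in plain Python so it runs without any dataset on disk. The names `average_side_bias` and `side_bias_passes` are illustrative assumptions, not names from the capsule, and the sketch covers only the session-average form, not the rolling `B_Bias` variant.

```python
from typing import Sequence

# Response codes as documented above: 0 = left, 1 = right, 2 = ignore.
LEFT, RIGHT, IGNORE = 0, 1, 2


def average_side_bias(animal_response: Sequence[int]) -> float:
    """Return mean(is_right) - mean(is_left) over responded trials.

    Hypothetical helper: accepts any sequence of response codes (a numpy
    array also works). Returns NaN when no trial was responded to.
    """
    responded = [int(r) for r in animal_response if int(r) != IGNORE]
    if not responded:
        return float("nan")
    n_trials = len(responded)
    frac_right = sum(r == RIGHT for r in responded) / n_trials
    frac_left = sum(r == LEFT for r in responded) / n_trials
    return frac_right - frac_left


def side_bias_passes(mean_bias: float) -> bool:
    """Pass rule from this plan: abs(mean_bias) < 0.5 (NaN never passes)."""
    return abs(mean_bias) < 0.5
```

Under these assumptions, a session of all-left choices gives a bias of -1.0 and fails the check, while a balanced session gives 0.0 and passes.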
+
+### Lick intervals
+
+Port `calculate_lick_intervals` verbatim. Inputs are
+`left_lick_times: np.ndarray` and `right_lick_times: np.ndarray`, extracted
+from the `Behavior.Lickometer` stream at the entry point.
+
+Emit the same four metrics, each tagged with its own name under the
+`behavior` key:
+
+| Metric | Tag | Pass rule |
+| --- | --- | --- |
+| `Left Lick Interval (%)` | `{"behavior": "Left Lick Interval (%)"}` | `< 10` |
+| `Right Lick Interval (%)` | `{"behavior": "Right Lick Interval (%)"}` | `< 10` |
+| `Cross Side Lick Interval (%)` | `{"behavior": "Cross Side Lick Interval (%)"}` | `< 10` |
+| `Artifact Percent (%)` | `{"behavior": "Artifact Percent (%)"}` | `< 1` |
+
+All carry `reference="lick_intervals.png"`.
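The pass rules in the table above can be expressed as a small lookup. This is a sketch only: it assumes the four percentages have already been computed by the ported `calculate_lick_intervals`, and the names `PASS_THRESHOLDS` and `evaluate_lick_metrics` are hypothetical.

```python
from typing import Dict

# Exclusive upper bounds taken from the pass-rule table above.
PASS_THRESHOLDS: Dict[str, float] = {
    "Left Lick Interval (%)": 10.0,
    "Right Lick Interval (%)": 10.0,
    "Cross Side Lick Interval (%)": 10.0,
    "Artifact Percent (%)": 1.0,
}


def evaluate_lick_metrics(values: Dict[str, float]) -> Dict[str, bool]:
    """Return True for each metric whose value is strictly below its bound.

    Hypothetical helper; `values` maps metric name -> percentage.
    """
    return {
        name: values[name] < bound for name, bound in PASS_THRESHOLDS.items()
    }
```

Keeping the thresholds in one mapping means the metric name, its tag value, and its pass rule can all be driven from the same key.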
+
+### Plots to keep
+
+- `lick_intervals.png` — five-panel histogram of inter-lick intervals
+  (`left licks`, `right licks`, `left to right licks`, `right to left licks`,
+  `all licks`); inputs are `left_lick_times` and `right_lick_times`.
+- `side_bias.png` — four-panel figure:
+  - Side bias trace (with confidence interval band) — rolling `B_Bias` /
+    `B_Bias_CI` recomputed from `animal_response`.
+  - Lickspout position over trials — `stage_positions` (x / y1 / y2 / z,
+    relative to session start, in mm).
+  - Behavior event raster — `animal_response` (L/R choice, ignore),
+    `rewarded_history` (L/R earned water), manual water times, and
+    `auto_water` (L/R) per trial.
+  - Reward probabilities — `reward_probabilityL` / `reward_probabilityR`
+    per trial.
+
+### Contraqctor-based QA suites (per meeting with Alex, 2026-06-03)
+
+This follows the same approach as the VR foraging QA.
+
+The runner is provided by
+[`aind_behavior_dynamic_foraging.data_qc.suite.make_qc_runner(dataset)`](https://github.com/AllenNeuralDynamics/Aind.Behavior.DynamicForaging/blob/main/src/aind_behavior_dynamic_foraging/data_qc/suite.py),
+so the new capsule just needs to call it on `loader.dataset` and convert
+the results. `make_qc_runner` already wires up:
+
+- `ContractTestSuite` (dataset loading errors, excluding Harp command streams)
+- `HarpDeviceTestSuite` for every `HarpDevice` under `Behavior`
+- `HarpHubTestSuite`
+- `HarpLicketySplitTestSuite` for the left and right lickometers
+- `HarpSniffDetectorTestSuite` / `HarpEnvironmentSensorTestSuite` (conditional on the rig)
+- `CameraTestSuite` for every camera in `BehaviorVideos` (uses `rig.triggered_camera_controller.frame_rate`)
+- `CsvTestSuite` for every CSV stream
+- `DynamicForagingQcSuite` (currently `test_end_session_exists`)
+
+#### Result → `QCMetric` conversion
+
+Map contraqctor statuses onto schema statuses:
+
+```python
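+# Assumed imports: contraqctor's `qc` module and the schema's `Status` enum.
+from aind_data_schema.core.quality_control import Status
+from contraqctor import qc
+
+# Map each contraqctor result status onto a schema QC status.
+status_converter = {
+    qc.Status.PASSED:  Status.PASS,
+    qc.Status.SKIPPED: Status.PASS,
+    qc.Status.WARNING: Status.PENDING,
+    qc.Status.FAILED:  Status.FAIL,
+    qc.Status.ERROR:   Status.FAIL,
+}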
+```
+
+For each `qc.Result`:
+
+- `name = f"{result.suite_name}::{result.test_name}"`
+- `description = f"Test: {result.description} // Message: {result.message}"`
+- `value = convert_numpy_to_python_data_type(result.result)`
+- `status_history = [QCStatus(evaluator="Automated", status=..., timestamp=now_utc)]`
+- `modality = Modality.BEHAVIOR`, `stage = Stage.RAW`
+- `tags = {"test_suite": result.suite_name, result.suite_name: group_name}`
+  — one fixed `"test_suite"` key whose value is the suite name, plus a
+  dynamic key (the suite name) whose value is the runner group (defaulting
+  to `"NoGroup"`).
+- `reference`: if `result.context["asset"]` is a `matplotlib.figure.Figure`,
+  save it under the results folder and store the relative path.
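The field list above can be sketched as a function that builds the keyword arguments for one metric. To stay runnable without contraqctor or aind-data-schema installed, this sketch uses a hypothetical `ResultSketch` dataclass in place of `qc.Result` and plain strings in place of the two status enums; `metric_kwargs` is likewise an assumed name, and figure handling for `reference` is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in for the fields of `qc.Result` used above."""

    suite_name: str
    test_name: str
    description: str
    message: str
    result: Any
    status: str  # strings stand in for qc.Status members


# Same mapping as status_converter above, with strings instead of enums.
STATUS_SKETCH = {
    "PASSED": "PASS",
    "SKIPPED": "PASS",
    "WARNING": "PENDING",
    "FAILED": "FAIL",
    "ERROR": "FAIL",
}


def metric_kwargs(
    result: ResultSketch, group_name: Optional[str] = None
) -> Dict[str, Any]:
    """Build the per-metric fields listed above as a plain dict."""
    group = group_name if group_name is not None else "NoGroup"
    return {
        "name": f"{result.suite_name}::{result.test_name}",
        "description": f"Test: {result.description} // Message: {result.message}",
        "value": result.result,
        "status": STATUS_SKETCH[result.status],
        # Timezone-aware timestamp, since the schema expects aware datetimes.
        "timestamp": datetime.now(timezone.utc),
        # Fixed "test_suite" key plus a dynamic key named after the suite.
        "tags": {"test_suite": result.suite_name, result.suite_name: group},
    }
```

In the real capsule the returned fields would feed `QCMetric` and `QCStatus` rather than a dict; the dict form is only meant to make the tag construction easy to inspect.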
+
+#### Updated tag / grouping plan
+
+| Tag key | Values |
+| --- | --- |
+| `behavior` | metric name (e.g. `average side bias`, `Left Lick Interval (%)`) |
+| `test_suite` | only on contraqctor metrics; suite name (e.g. `HarpEnvironmentSensorTestSuite`) |
+
+`default_grouping` tells the QC portal which tag *keys* to use when
+laying out the metrics hierarchically (see the schema field's
+[description](https://github.com/AllenNeuralDynamics/aind-data-schema/blob/dev/src/aind_data_schema/core/quality_control.py)).
+Each entry is a tag key (or a list of tag keys at the same level); the
+portal walks them in order and groups metrics by the values it finds for
+those keys.
+
+So `behavior` and `test_suite` are siblings at the top level; a metric
+ends up under whichever one its tags match. They don't overlap because
+the two groups of metrics carry disjoint tag keys.
+
+Sample portal layout:
+
+```
+behavior
+  Metric...
+  Metric...
+
+test_suite
+  Metric...
+  Metric...
+```
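To illustrate why the two groups do not overlap, the sketch below arranges metrics by top-level tag key and then by tag value. It is not the QC portal's implementation; `layout_by_tag_key` and the dict-based metric records are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def layout_by_tag_key(
    metrics: Sequence[dict], top_level_keys: Sequence[str]
) -> Dict[str, Dict[str, List[str]]]:
    """Group metric names under each top-level tag key, then by tag value.

    Hypothetical helper: each metric is a dict with "name" and "tags".
    """
    tree: Dict[str, Dict[str, List[str]]] = {
        key: defaultdict(list) for key in top_level_keys
    }
    for metric in metrics:
        for key in top_level_keys:
            if key in metric["tags"]:
                tree[key][metric["tags"][key]].append(metric["name"])
    # Convert the inner defaultdicts to plain dicts for a stable result.
    return {key: dict(groups) for key, groups in tree.items()}
```

Because behavior metrics carry only the `behavior` key and contraqctor metrics carry only `test_suite` (plus their dynamic suite key), each metric lands under exactly one top-level group.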
+
+## Changelog
+
+| Date | Section | Change | Reason |
+| --- | --- | --- | --- |
+| 2026-06-03 | metrics | Confirmed kept QC metrics: side bias, lick intervals, and Harp/contract QA via `make_qc_runner`. Dropped checks tied to old `behavior.json` (dropped frames, basic configuration). | Meeting with Alex. |
+| 2026-06-03 | qa | Adopt contraqctor `qc.Runner` output (`make_qc_runner(dataset)`) as the source for Harp / camera / contract / DynamicForaging QA, converted into `QCMetric`s. | Meeting with Alex. |
diff --git a/docs/trials_table_mapping.md b/docs/trials_table_mapping.md
@@ -0,0 +1,124 @@
+# Mapping of Raw Acquisition Streams to the NWB Trials Table
+
+This document describes how the NWB `acquisition` container and the `trials` table
+are constructed from the raw dynamic foraging acquisition streams.
+
+Reference asset used while mapping:
+[behavior_836626_2026-05-20_14-19-10_processed_2026-05-21_17-40-47](https://codeocean.allenneuraldynamics.org/data-assets/49d1b596-c1a0-4c52-a3dd-26181f4b2b55/behavior_836626_2026-05-20_14-19-10_processed_2026-05-21_17-40-47).
+
+Trial column descriptions are derived from
+[`nwb_trial_column_info.json`](https://github.com/AllenNeuralDynamics/aind-fip-nwb-base-capsule/blob/main/code/util/nwb_trial_column_info.json)
+in the combined pipeline.
+
+> **Note:** Any column related to `autoTrain` can be disregarded (per meeting with
+> Alex on June 3rd, 2026).
+
+## Acquisition Container
+
+The NWB `acquisition` container holds four behavior-related time series:
+
+| Acquisition series | Source stream | Notes |
+| --- | --- | --- |
+| `left_lick_time` | `Behavior/Lickometer` | |
+| `right_lick_time` | `Behavior/Lickometer` | |
+| `left_reward_delivery_time` | `Behavior/HarpBehavior` `OutputSet` (`SupplyPort0`, `WRITE` messages) | Same as left valve open. |
+| `right_reward_delivery_time` | `Behavior/HarpBehavior` `OutputSet` (`SupplyPort1`, `WRITE` messages) | Same as right valve open. |
+
+Earlier mapping used `Response.json` (`SoftwareEvents`) for lick times (where
+`Item1` is the time and `Item2` is `left`/`right`) and `TrialOutcome.json`
+(filtered on `is_rewarded`, then `left`/`right`) for reward delivery times.
+Lick times now come from the `Behavior/Lickometer` stream, and reward delivery
+times use the Harp valve open times.
+
+## Trials Table
+
+Columns are grouped by the raw source they map from.
+
+### From `task_logic_input` (under `Logs`, `trial_generator` key)
+
+| Trials column | Source field |
+| --- | --- |
+| `ITI_beta`, `ITI_min`, `ITI_max`, `ITI_duration` | `inter_trial_interval_duration` |
+| `block_beta`, `block_duration`, `block_min`, `block_max` | `block_length` |
+| `delay_beta`, `delay_duration`, `delay_min`, `delay_max` | `quiescent_duration_key` (scalar distribution, so no beta/min/max) |
+
+### From `Response.json` (`SoftwareEvents` stream)
+
+| Trials column | Mapping |
+| --- | --- |
+| `animal_response` | `0` = left choice, `1` = right choice, `2` = no response. |
+
+### From `TrialOutcome.json` (`SoftwareEvents` stream)
+
+> **Note:** For `is_auto_response_right`, `True` means right and `False`
+> means left.
+
+| Trials column | Mapping |
+| --- | --- |
+| `auto_waterL` / `auto_waterR` | From `is_auto_response_right`. `NULL` for None, `true` for right, `false` for left. Encoded `0`/`1`. |
+| `bait_left` / `bait_right` | Boolean. `bait_right` is `True` if `p_reward_right == 1` and `is_auto_response_right` is `None` or `False`. `bait_left` is `True` if `p_reward_left == 1` and `is_auto_response_right` is `None` or `True`. |
+| `response_duration` | `response_deadline_duration`. |
+| `reward_consumption_duration` | `Trial -> reward_consumption_duration`. |
+| `reward_probabilityL` / `reward_probabilityR` | Most likely the block probability: `Trial -> Metadata -> p_reward_left` / `p_reward_right`. Confirm with Alex whether the actual lickspout probability is intended. |
+| `rewarded_historyL` / `rewarded_historyR` | Filter `is_rewarded == True`, then on `is_right_choice`. |
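The bait and rewarded-history rules in the table above can be sketched per trial as follows. This is an illustration under stated assumptions: the function name `trial_outcome_columns` is hypothetical, inputs are plain Python values rather than DataFrame columns, and "`None` or `False`" is read as applying to `is_auto_response_right` only, with the probability condition required in both cases.

```python
from typing import Dict, Optional


def trial_outcome_columns(
    p_reward_left: float,
    p_reward_right: float,
    is_auto_response_right: Optional[bool],
    is_rewarded: bool,
    is_right_choice: Optional[bool],
) -> Dict[str, bool]:
    """Derive bait and rewarded-history flags for one trial.

    Hypothetical helper following the mapping table above.
    """
    auto = is_auto_response_right
    return {
        # bait_left: p_reward_left == 1 and auto response is None or True.
        "bait_left": p_reward_left == 1 and (auto is None or auto is True),
        # bait_right: p_reward_right == 1 and auto response is None or False.
        "bait_right": p_reward_right == 1 and (auto is None or auto is False),
        # Rewarded history: keep rewarded trials, then split by choice side.
        "rewarded_historyL": bool(is_rewarded) and is_right_choice is False,
        "rewarded_historyR": bool(is_rewarded) and is_right_choice is True,
    }
```

The identity checks (`is True`, `is False`) keep `None` distinct from `False`, which matters because `None` means no auto response rather than a left one.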
+
+### From `TrialGeneratorSpec.json` (`SoftwareEvents` stream)
+
+| Trials column | Mapping |
+| --- | --- |
+| `base_reward_probability_sum` | If `type == "CoupledTrialGenerator"`, look at `reward_probability_parameters`. |
+| `min_reward_each_block` | Present when `type == "CoupledTrialGenerator"`; otherwise `None`. |
+
+### From `QuiescentPeriod.json` (`SoftwareEvents` stream)
+
+| Trials column | Mapping |
+| --- | --- |
+| `delay_start_time` | `timestamp` column. |
+| `start_time` | `timestamp` column. |
+
+### From `ITI_period.json` (`SoftwareEvents` stream)
+
+| Trials column | Mapping |
+| --- | --- |
+| `stop_time` | `timestamp` column. Possible QC check: length should match `QuiescentPeriod.json`. |
+
+### From `HarpBehavior` (`OutputSet`)
+
+| Trials column | Mapping |
+| --- | --- |
+| `left_valve_open_time` | `SupplyPort0`. |
+| `right_valve_open_time` | `SupplyPort1`. |
+
+To disambiguate manual valve opens, cross-reference the valve open times with
+the software-event manual-reward times from the UI and with trial
+`start_time`/`stop_time`. This approach still needs to be double-checked.
+
+### From `SoundCard` (`WRITE` messages)
+
+| Trials column | Mapping |
+| --- | --- |
+| `goCue_start_time` | `PlaySoundOrFrequency` `WRITE` message. |
+
+### From `InitialManipulatorPosition` (software event)
+
+| Trials column | Mapping |
+| --- | --- |
+| `lickspout_positions` | `data` field. |
+
+### From `trainer_state.json` and `acquisition.json` (autoTrain — can be disregarded)
+
+These were mapped during exploration but are no longer in scope:
+
+- `auto_train_curriculum_name` / `auto_train_curriculum_schema_version` —
+  `trainer_state.json` (top level).
+- `auto_train_engaged` — Boolean flag in `acquisition.json` indicating whether
+  the curriculum is running.
+- `auto_train_stage` — `stage` in `trainer_state.json` (should always exist).
+- `auto_train_stage_overridden` — `True` when `on_curriculum` in
+  `acquisition.json` is `False`.
+
+### Not applicable to this task
+
+| Trials column | Mapping |
+| --- | --- |
+| `reward_random_L` / `reward_random_R` | None — no task component drives these. |