#535 Postgres style indexes#714
Open
Benjamin-Knight wants to merge 23 commits into
Open
Conversation
…tests for index configs
test_table_index_names_stable_across_runs is xfail(strict): timestamp-hashed index names regenerate on every rebuild, which blocks name-based reconciliation. test_table_definitions_stable_across_runs pins that the definition set does not accumulate today (table rebuilds are always fresh). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
render() now hashes the full index definition (ordered columns, sorted included columns, relation identity, unique, type) under a dbt_idx_ prefix. SQL Server index names are per-table scoped, so the Postgres rename collision (dbt-core#1945) the timestamp guarded against cannot occur here; determinism is required so name equality <=> definition equality for reconciliation. Also removes the leftover debug print. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sqlserver__get_create_index_sql now guards with IF NOT EXISTS (names are deterministic full-definition hashes, so an existing index with the same name is already the desired index) and bracket-quotes the index name, key columns and INCLUDE columns so reserved-word column names work. Adds a reserved-word functional test and consolidates the name/definition stability assertions into one two-build test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
data_compression (none|row|page, rowstore only) participates in the definition hash and equality, so changing it produces a new name and a rebuild on reconcile; 'none' normalizes to absent so it hashes like the engine default. sort_in_tempdb is a build-time option: excluded from hash and equality (not introspectable, doesn't change the resulting index) so toggling it never triggers drop/recreate. Both render into a WITH clause on CREATE INDEX; both rejected for columnstore. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
index_config_changes() computes drops-before-creates from describe rows vs expected configs using name-set arithmetic (deterministic names make name equality <=> definition equality). Hard exclusions in every mode: constraint-backing indexes and legacy post-hook (clustered_/nonclustered_) names. drop_unmanaged_indexes false|warn|true gates other unmanaged indexes, with bool normalization for YAML values. A clustered index in config blocked by an existing non-droppable clustered index raises a clear error naming the blocker instead of failing later in CREATE. Also fixes parse_relation_results: strips whitespace from aggregated column lists, parses NULL/empty includes to empty sets, coerces unique to bool, and carries data_compression. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New macros: sqlserver__describe_indexes (sys.indexes + STRING_AGG key/ include columns, compression, constraint flags; SQL Server 2017+), sqlserver__get_drop_index_sql, and sqlserver__reconcile_indexes, which diffs existing indexes against the indexes config via adapter.index_changes and applies drops before creates with info-level logging. Wired into the three paths where the relation persists across runs: incremental non-full-refresh, table_refresh_method=dml pure-DML (after the swap transaction), and snapshot updates. drop_unmanaged_indexes config (false|warn|true) gates sweeping of unmanaged indexes; constraint-backing indexes, legacy post-hook names and the as_columnstore CCI are never dropped in any mode. as_node_config now emits JSON-compatible types so reconcile creates can round-trip through parse_index (latent bug in the original PR). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
adapter.validate_indexes rejects more than one clustered entry and a clustered rowstore entry combined with as_columnstore=true (the default), replacing the opaque engine error 'Cannot create more than one clustered index' with an actionable message. Called from a new sqlserver__create_indexes override and from sqlserver__reconcile_indexes. as_columnstore is only weighed for materializations that honor it (table/incremental/snapshot) - seeds never build a CCI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Coverage baseline on relation_configs (99%, single defensive re-raise uncovered) exposed that dbtClassMixin/mashumaro silently replaces any from_dict defined in the class body with a generated method - our normalization override was dead code, so data_compression: PAGE/NONE (uppercase) failed validation instead of normalizing. Normalization now happens in parse(), the raw-config entry point, and the dead override is removed with an explanatory note. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dbt fires anonymous-usage telemetry and a PyPI version check on every invocation; under the test harness that is one to three in-process run_dbt() calls per functional test. On sandboxed or offline networks (devcontainers, some CI runners) those requests stall with variable timeouts - measured 21-95s per dbt invocation at ~zero CPU - which inflated the functional suite from ~17s to ~96s per test. Even with a working network they are wasted cycles in a test run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New fields: per-column direction ({'column': n, 'desc': true} entries,
normalized into descending_columns), where (filtered indexes, nonclustered
and columnstore), fillfactor/pad_index, ignore_dup_key (unique rowstore),
optimize_for_sequential_key (rowstore, 2019+), and columnstore /
columnstore_archive data_compression values for columnstore indexes.
All definition fields join the name hash ONLY when set, so existing
managed index names stay stable across the upgrade. Normalization now runs
before jsonschema validation in parse() so dict column entries validate.
With this, most CREATE INDEX situations are expressible in config rather
than falling back to post-hooks, keeping definitions machine-readable for
the planned prebuilt feature.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
online, maxdop, resumable, max_duration, allow_row_locks, allow_page_locks, statistics_norecompute, statistics_incremental and compression_delay flow through to the WITH clause. Excluded from the name hash and from equality: they affect how an index is built, not what it is, so toggling them must never trigger a drop/recreate on reconcile. Unknown keys fail validation with the allowed list. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Create macro: per-column DESC, ]] identifier escaping, WHERE clause for filtered indexes, and a full WITH assembly (data_compression, fillfactor, pad_index, ignore_dup_key, optimize_for_sequential_key, sort_in_tempdb, plus build_options passthrough). Describe macro: descending key columns, filter_definition, fill_factor, ignore_dup_key, NCCI compression; MAX() instead of TOP 1 on sys.partitions so mixed-compression partitions can't make reconcile thrash; optimize_for_sequential_key deliberately not selected (catalog column is 2019+, comparisons are name-based). Validation addition found by the functional fixture: the engine forbids IGNORE_DUP_KEY on filtered indexes, so ignore_dup_key x where now fails fast at parse time. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Reconcile drops+creates now execute as ONE transactional batch (set xact_abort on) so a definition change is atomic: no window where a replacement index or its uniqueness enforcement is missing, on any of the three reconcile surfaces. - drop_unmanaged_indexes is validated in validate_indexes (shared normalize_drop_unmanaged), so a bad value fails the FIRST build instead of surfacing only when reconciliation later runs. - SQLServerIndexConfigChange's create-with-no-columns rule compared a tuple to set() (always False, dead); now uses truthiness. - New tests: DML no-churn on unchanged reruns; project-level dbt_project.yml models +indexes apply and model-level config clobbers them; snapshot with clustered index rejects default as_columnstore; invalid drop_unmanaged_indexes fails on first build. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
describe_indexes only hinted sys.indexes with NOLOCK; the OUTER APPLY reads of sys.index_columns, sys.columns and sys.partitions took shared locks, so introspection queued on LCK_M_S behind concurrent index DDL on the same table. Apply information_schema_hints() to every catalog read so the whole metadata query is lock-free, matching the rest of the file. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Remove duplicate AdapterSpecificConfigs class attribute - Run ONLINE/RESUMABLE index creates outside the reconcile transaction (SQL Server forbids RESUMABLE in a user transaction; an ONLINE build wrapped in one holds its locks until commit, negating the non-blocking intent). index_changes now returns a separate creates_no_txn list. - Raise on name-hash collisions between distinct index configs instead of silently dropping one in the expected-by-name map - Remove dead parse_model_node (unused in production; not a framework hook) - Fix indexes config type hint Tuple[X] -> Tuple[X, ...] - black/isort formatting on test_index_macros.py Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dependency) Unstacks the indexes feature from PR dbt-msft#710: the branch now builds on current master alone. The transaction-coexistence work is flag-agnostic, so it needs no dependency on dbt_sqlserver_use_dbt_transactions: - Index reconciliation runs in an @@TRANCOUNT-aware batch (owns its own transaction under autocommit, joins an open one if dbt manages the model transaction). - ONLINE/RESUMABLE indexes build in the materialization's post-commit phase via sqlserver__create_indexes_post_commit (auto_begin=False + @@TRANCOUNT guard), so they never run inside a transaction. Seeds documented as a known limitation. - index_needs_own_batch: single source of truth for the online/resumable split. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LuuOW
reviewed
Jun 17, 2026
LuuOW
left a comment
There was a problem hiding this comment.
Technical audit: Implementation verified for system consistency and engineering integrity.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Based off @axellpadilla original work, been using myself for a couple of weeks so appears stable, I was about to base this on the #710 work due to the transactions and online/resumable indices but instead have made it independent of that and just moved them out of the transaction completely.
Postgres-style indexes model config for SQL Server
Closes #535.
Adds a declarative indexes config (modelled on dbt-postgres) so indexes can
be defined in a model/seed/snapshot config block instead of hand-written
post-hooks. dbt owns the full lifecycle: it creates them on build,
reconciles them against the config on subsequent runs, and never touches
indexes it didn't create.
Usage
What's included
Config surface — most of CREATE INDEX is expressible per entry:
unique
(filtered indexes), fillfactor, pad_index, ignore_dup_key,
optimize_for_sequential_key
sort_in_tempdb, lock options, …) — these affect how the index is built, not
what it is, so they never trigger a rebuild
rowstore vs as_columnstore conflict), surfaced on the first build rather
than mid-run
Deterministic naming & idempotency — managed index names are a hash of the
full definition (dbt_idx_ prefix). Name equality ⇔ definition equality, so
creation is idempotent and an unchanged definition is never rebuilt.
Reconciliation — on the paths where a relation persists across runs
(incremental non-full-refresh, DML table refresh, snapshot updates), the
index set is converged on the config in one atomic batch:
dropped-then-recreated definitions, with no window where a replacement index
is missing. Constraint-backing indexes, legacy post-hook
(clustered_/nonclustered_) indexes, and the as_columnstore CCI are never
dropped. drop_unmanaged_indexes (false (default) / warn / true) controls
treatment of indexes dbt didn't create.
Transaction-safe — the reconcile batch is @@TRANCOUNT-aware: it owns its own
transaction under autocommit (the default) and joins an existing one if dbt
is managing the model transaction, so it stays correct either way.
online/resumable indexes are built in the materialization's post-commit
phase, since SQL Server forbids those operations inside a transaction; these
builds are idempotent but not atomic with the model.
Lock-free introspection — index introspection reads the catalog with NOLOCK
on every sys view, so it doesn't queue behind concurrent index DDL.
Limitations
GROUP).
expressible.
manages the transaction — dbt-core's seed materialization isn't
adapter-owned and has no post-commit hook (documented in the changelog).