Skip to content

#535 Postgres style indexes#714

Open
Benjamin-Knight wants to merge 23 commits into
dbt-msft:masterfrom
Benjamin-Knight:feat/535-postgres-style-indexes
Open

#535 Postgres style indexes#714
Benjamin-Knight wants to merge 23 commits into
dbt-msft:masterfrom
Benjamin-Knight:feat/535-postgres-style-indexes

Conversation

@Benjamin-Knight

Copy link
Copy Markdown
Collaborator

Based off @axellpadilla original work, been using myself for a couple of weeks so appears stable, I was about to base this on the #710 work due to the transactions and online/resumable indices but instead have made it independent of that and just moved them out of the transaction completely.

Postgres-style indexes model config for SQL Server

Closes #535.

Adds a declarative indexes config (modelled on dbt-postgres) so indexes can
be defined in a model/seed/snapshot config block instead of hand-written
post-hooks. dbt owns the full lifecycle: it creates them on build,
reconciles them against the config on subsequent runs, and never touches
indexes it didn't create.

Usage

  {{ config(
      materialized="table",
      as_columnstore=False,
      indexes=[
        {"columns": ["id"], "type": "clustered", "unique": True},
        {"columns": ["created_at", {"column": "status", "desc": True}],
         "included_columns": ["amount"], "where": "status is not null"},
        {"columns": ["lookup_key"], "build_options": {"online": True}},
      ]
  ) }}

What's included

Config surface — most of CREATE INDEX is expressible per entry:

  • columns (ascending, or {"column": name, "desc": true}), included_columns,
    unique
  • type: clustered / nonclustered / columnstore
  • data_compression (row/page, or columnstore/columnstore_archive), where
    (filtered indexes), fillfactor, pad_index, ignore_dup_key,
    optimize_for_sequential_key
  • build_options for build-time-only settings (online, resumable, maxdop,
    sort_in_tempdb, lock options, …) — these affect how the index is built, not
    what it is, so they never trigger a rebuild
  • Cross-config validation (e.g. at most one clustered index; clustered
    rowstore vs as_columnstore conflict), surfaced on the first build rather
    than mid-run

Deterministic naming & idempotency — managed index names are a hash of the
full definition (dbt_idx_ prefix). Name equality ⇔ definition equality, so
creation is idempotent and an unchanged definition is never rebuilt.

Reconciliation — on the paths where a relation persists across runs
(incremental non-full-refresh, DML table refresh, snapshot updates), the
index set is converged on the config in one atomic batch:
dropped-then-recreated definitions, with no window where a replacement index
is missing. Constraint-backing indexes, legacy post-hook
(clustered_/nonclustered_) indexes, and the as_columnstore CCI are never
dropped. drop_unmanaged_indexes (false (default) / warn / true) controls
treatment of indexes dbt didn't create.

Transaction-safe — the reconcile batch is @@TRANCOUNT-aware: it owns its own
transaction under autocommit (the default) and joins an existing one if dbt
is managing the model transaction, so it stays correct either way.
online/resumable indexes are built in the materialization's post-commit
phase, since SQL Server forbids those operations inside a transaction; these
builds are idempotent but not atomic with the model.

Lock-free introspection — index introspection reads the catalog with NOLOCK
on every sys view, so it doesn't queue behind concurrent index DDL.

Limitations

  • Minimum supported SQL Server version is 2017 (STRING_AGG ... WITHIN
    GROUP).
  • Partitioning, XML_COMPRESSION, and ordered columnstore are not yet
    expressible.
  • online/resumable build options are not supported on seed models when dbt
    manages the transaction — dbt-core's seed materialization isn't
    adapter-owned and has no post-commit hook (documented in the changelog).

axellpadilla and others added 23 commits June 4, 2026 15:40
test_table_index_names_stable_across_runs is xfail(strict): timestamp-hashed
index names regenerate on every rebuild, which blocks name-based
reconciliation. test_table_definitions_stable_across_runs pins that the
definition set does not accumulate today (table rebuilds are always fresh).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
render() now hashes the full index definition (ordered columns, sorted
included columns, relation identity, unique, type) under a dbt_idx_ prefix.
SQL Server index names are per-table scoped, so the Postgres rename
collision (dbt-core#1945) the timestamp guarded against cannot occur here;
determinism is required so name equality <=> definition equality for
reconciliation. Also removes the leftover debug print.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sqlserver__get_create_index_sql now guards with IF NOT EXISTS (names are
deterministic full-definition hashes, so an existing index with the same
name is already the desired index) and bracket-quotes the index name, key
columns and INCLUDE columns so reserved-word column names work. Adds a
reserved-word functional test and consolidates the name/definition
stability assertions into one two-build test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
data_compression (none|row|page, rowstore only) participates in the
definition hash and equality, so changing it produces a new name and a
rebuild on reconcile; 'none' normalizes to absent so it hashes like the
engine default. sort_in_tempdb is a build-time option: excluded from hash
and equality (not introspectable, doesn't change the resulting index) so
toggling it never triggers drop/recreate. Both render into a WITH clause
on CREATE INDEX; both rejected for columnstore.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
index_config_changes() computes drops-before-creates from describe rows vs
expected configs using name-set arithmetic (deterministic names make name
equality <=> definition equality). Hard exclusions in every mode:
constraint-backing indexes and legacy post-hook (clustered_/nonclustered_)
names. drop_unmanaged_indexes false|warn|true gates other unmanaged
indexes, with bool normalization for YAML values. A clustered index in
config blocked by an existing non-droppable clustered index raises a clear
error naming the blocker instead of failing later in CREATE.

Also fixes parse_relation_results: strips whitespace from aggregated
column lists, parses NULL/empty includes to empty sets, coerces unique to
bool, and carries data_compression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New macros: sqlserver__describe_indexes (sys.indexes + STRING_AGG key/
include columns, compression, constraint flags; SQL Server 2017+),
sqlserver__get_drop_index_sql, and sqlserver__reconcile_indexes, which
diffs existing indexes against the indexes config via
adapter.index_changes and applies drops before creates with info-level
logging. Wired into the three paths where the relation persists across
runs: incremental non-full-refresh, table_refresh_method=dml pure-DML
(after the swap transaction), and snapshot updates.

drop_unmanaged_indexes config (false|warn|true) gates sweeping of
unmanaged indexes; constraint-backing indexes, legacy post-hook names and
the as_columnstore CCI are never dropped in any mode. as_node_config now
emits JSON-compatible types so reconcile creates can round-trip through
parse_index (latent bug in the original PR).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
adapter.validate_indexes rejects more than one clustered entry and a
clustered rowstore entry combined with as_columnstore=true (the default),
replacing the opaque engine error 'Cannot create more than one clustered
index' with an actionable message. Called from a new
sqlserver__create_indexes override and from sqlserver__reconcile_indexes.
as_columnstore is only weighed for materializations that honor it
(table/incremental/snapshot) - seeds never build a CCI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Coverage baseline on relation_configs (99%, single defensive re-raise
uncovered) exposed that dbtClassMixin/mashumaro silently replaces any
from_dict defined in the class body with a generated method - our
normalization override was dead code, so data_compression: PAGE/NONE
(uppercase) failed validation instead of normalizing. Normalization now
happens in parse(), the raw-config entry point, and the dead override is
removed with an explanatory note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dbt fires anonymous-usage telemetry and a PyPI version check on every
invocation; under the test harness that is one to three in-process
run_dbt() calls per functional test. On sandboxed or offline networks
(devcontainers, some CI runners) those requests stall with variable
timeouts - measured 21-95s per dbt invocation at ~zero CPU - which
inflated the functional suite from ~17s to ~96s per test. Even with a
working network they are wasted cycles in a test run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New fields: per-column direction ({'column': n, 'desc': true} entries,
normalized into descending_columns), where (filtered indexes, nonclustered
and columnstore), fillfactor/pad_index, ignore_dup_key (unique rowstore),
optimize_for_sequential_key (rowstore, 2019+), and columnstore /
columnstore_archive data_compression values for columnstore indexes.

All definition fields join the name hash ONLY when set, so existing
managed index names stay stable across the upgrade. Normalization now runs
before jsonschema validation in parse() so dict column entries validate.
With this, most CREATE INDEX situations are expressible in config rather
than falling back to post-hooks, keeping definitions machine-readable for
the planned prebuilt feature.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
online, maxdop, resumable, max_duration, allow_row_locks,
allow_page_locks, statistics_norecompute, statistics_incremental and
compression_delay flow through to the WITH clause. Excluded from the name
hash and from equality: they affect how an index is built, not what it is,
so toggling them must never trigger a drop/recreate on reconcile. Unknown
keys fail validation with the allowed list.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Create macro: per-column DESC, ]] identifier escaping, WHERE clause for
filtered indexes, and a full WITH assembly (data_compression, fillfactor,
pad_index, ignore_dup_key, optimize_for_sequential_key, sort_in_tempdb,
plus build_options passthrough). Describe macro: descending key columns,
filter_definition, fill_factor, ignore_dup_key, NCCI compression; MAX()
instead of TOP 1 on sys.partitions so mixed-compression partitions can't
make reconcile thrash; optimize_for_sequential_key deliberately not
selected (catalog column is 2019+, comparisons are name-based).

Validation addition found by the functional fixture: the engine forbids
IGNORE_DUP_KEY on filtered indexes, so ignore_dup_key x where now fails
fast at parse time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Reconcile drops+creates now execute as ONE transactional batch
  (set xact_abort on) so a definition change is atomic: no window where a
  replacement index or its uniqueness enforcement is missing, on any of
  the three reconcile surfaces.
- drop_unmanaged_indexes is validated in validate_indexes (shared
  normalize_drop_unmanaged), so a bad value fails the FIRST build instead
  of surfacing only when reconciliation later runs.
- SQLServerIndexConfigChange's create-with-no-columns rule compared a
  tuple to set() (always False, dead); now uses truthiness.
- New tests: DML no-churn on unchanged reruns; project-level
  dbt_project.yml models +indexes apply and model-level config clobbers
  them; snapshot with clustered index rejects default as_columnstore;
  invalid drop_unmanaged_indexes fails on first build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
describe_indexes only hinted sys.indexes with NOLOCK; the OUTER APPLY
reads of sys.index_columns, sys.columns and sys.partitions took shared
locks, so introspection queued on LCK_M_S behind concurrent index DDL on
the same table. Apply information_schema_hints() to every catalog read so
the whole metadata query is lock-free, matching the rest of the file.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Remove duplicate AdapterSpecificConfigs class attribute
- Run ONLINE/RESUMABLE index creates outside the reconcile transaction
  (SQL Server forbids RESUMABLE in a user transaction; an ONLINE build
  wrapped in one holds its locks until commit, negating the non-blocking
  intent). index_changes now returns a separate creates_no_txn list.
- Raise on name-hash collisions between distinct index configs instead of
  silently dropping one in the expected-by-name map
- Remove dead parse_model_node (unused in production; not a framework hook)
- Fix indexes config type hint Tuple[X] -> Tuple[X, ...]
- black/isort formatting on test_index_macros.py

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dependency)

Unstacks the indexes feature from PR dbt-msft#710: the branch now builds on current
master alone. The transaction-coexistence work is flag-agnostic, so it needs
no dependency on dbt_sqlserver_use_dbt_transactions:

- Index reconciliation runs in an @@TRANCOUNT-aware batch (owns its own
  transaction under autocommit, joins an open one if dbt manages the model
  transaction).
- ONLINE/RESUMABLE indexes build in the materialization's post-commit phase
  via sqlserver__create_indexes_post_commit (auto_begin=False + @@TRANCOUNT
  guard), so they never run inside a transaction. Seeds documented as a known
  limitation.
- index_needs_own_batch: single source of truth for the online/resumable split.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@axellpadilla axellpadilla added this to the v1.10.1rc1 milestone Jun 16, 2026

@LuuOW LuuOW left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technical audit: Implementation verified for system consistency and engineering integrity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Postgres style index creation

3 participants