Skip to content

Fix ODPS import/merge + data-product metadata persistence bugs (#27600)#29205

Open
harshach wants to merge 6 commits into
mainfrom
odps-fixes-main-v2
Open

Fix ODPS import/merge + data-product metadata persistence bugs (#27600)#29205
harshach wants to merge 6 commits into
mainfrom
odps-fixes-main-v2

Conversation

@harshach

@harshach harshach commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

The ODPS / Data Products feature shipped with several bugs in its import, merge, and metadata-edit paths, surfaced by a new Playwright suite (the flow had no E2E coverage). All are still present on main.

  1. DataProduct PATCH never persisted dataProductType/visibility/portfolioPriority. entitySpecificUpdate recorded changes only for name/domains, so change consolidation reverted these ODPS-aligned scalar fields on store — the PATCH response showed the value but a GET returned null, and the metadata-edit modal silently no-opped. Added recordChange for the three fields.

  2. ODPS import always returned 400 "Unknown custom field odpsMetadata". ODPSConverter.fromODPS wrote the raw document into extension.odpsMetadata, which is not a registered custom property, so validateExtension rejected every import. toODPS never reads it back, so the write was dead weight — removed it; merge/replace now preserve the existing product's custom props.

  3. ODPS import returned 409 because the domain reference built from ?domain= had no id. Resolved via Entity.getEntityReferenceByName.

  4. ODPS create-from-import returned 409 (null primary key) because the converted entity had no id. Assigns a fresh UUID like the normal create path.

  5. ODPS merge/replace wiped owners/domains/experts/reviewers/certification on an existing product: findExistingByName loaded it with only id,name,version, so the lazy relationship fields came back null and smartMerge/fullReplace copied the nulls. Load with EXPORT_FIELDS so they are hydrated before merging.

  6. ODPSImportModal's name guard matched the first name: key via regex, which can hit an SLA dimension or tag rather than the product name (product.details..name). Parse with js-yaml and read that path.

Adds playwright/e2e/Pages/DataProductODPS.spec.ts (9 tests): export, validate (valid + invalid), export→rename→import round-trip, merge preserves domain + owners, the UI export-menu / import-modal / metadata-edit-modal flows, and a name-guard regression test asserting the mismatch toast.

Describe your changes:

Fixes #

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

High-level design:

N/A — small change.

Tests:

Use cases covered

Unit tests

Backend integration tests

Ingestion integration tests

Playwright (UI) tests

Manual testing performed

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • My PR is linked to a GitHub issue via Fixes #<issue-number> above.
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • For UI changes: I attached a screen recording and/or screenshots above.
  • I have added tests (unit / integration / Playwright as applicable) and listed them above.

Greptile Summary

This PR fixes six bugs in the ODPS (Open Data Product Specification) import/merge/edit flows for Data Products, all of which were silent data-loss or hard errors on the main branch, and adds a 9-test Playwright suite to prevent regressions.

  • Backend fixes: entitySpecificUpdate now calls recordChange for dataProductType/visibility/portfolioPriority so PATCH operations actually persist; fromODPS removes the unregistered extension.odpsMetadata write that caused every import to fail validation; buildDataProductFromODPS assigns a UUID and resolves the domain reference with its full id; findExistingByName loads with EXPORT_FIELDS so merge/replace no longer wipes owners/domains/certification.
  • UI fix: extractOdpsProductName replaces the fragile first-name:-key regex with a js-yaml parse + product.details.<lang>.name path read, and missing/unparseable names now block the import rather than bypassing the guard.
  • Tests: New DataProductODPS.spec.ts covers export, validate (valid + invalid), export→rename→import round-trip, merge preserving domain and owners, the UI manage-menu export/import/metadata-edit flows, and two name-guard regression cases.

Confidence Score: 5/5

All six described bugs are correctly fixed; no new defects introduced.

Every fix is narrow and well-targeted: the PATCH persistence change adds three missing recordChange calls, the import path removes dead code that caused a guaranteed validation failure, the domain resolution and UUID assignment are straightforward one-liners, the sparse-load fix points to an already-defined constant, and the UI name-guard now explicitly handles the previously bypassed undefined case. The new Playwright suite directly exercises each corrected code path. No cross-cutting concerns were altered and the changes are self-contained.

No files require special attention.

Important Files Changed

Filename Overview
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataProductRepository.java Adds recordChange calls for dataProductType/visibility/portfolioPriority in entitySpecificUpdate, and extracts reindexDetachedProduct with a narrow EntityNotFoundException guard; both changes are targeted and correct.
openmetadata-service/src/main/java/org/openmetadata/service/resources/domains/DataProductResource.java buildDataProductFromODPS now assigns a fresh UUID and fully resolves the domain reference; findExistingByName now loads with EXPORT_FIELDS instead of a sparse three-field projection, fixing null-domain/owner wipe-out in merge and replace paths.
openmetadata-service/src/main/java/org/openmetadata/service/util/ODPSConverter.java Removes preserveOdpsMetadata (dead code that caused every import to fail validation), and fixes smartMerge/fullReplace to always preserve existing custom properties rather than overwriting them with a null imported extension.
openmetadata-ui/src/main/resources/ui/src/components/DataProducts/ODPSImportModal/ODPSImportModal.component.tsx Replaces the fragile first-name-key regex with a js-yaml parse + product.details..name path read, and adds an explicit guard for the previously-bypassed undefined case (missing name now blocks import rather than silently passing).
openmetadata-ui/src/main/resources/ui/playwright/e2e/Pages/DataProductODPS.spec.ts New 9-test Playwright suite covering export, validate, round-trip import, merge domain/owner preservation, UI export menu, import modal, metadata-edit modal PATCH, and two name-guard regression scenarios.

Comments Outside Diff (1)

  1. openmetadata-service/src/main/java/org/openmetadata/service/resources/domains/DataProductResource.java, line 1855-1867 (link)

    P2 ?domain= is silently ignored for existing products on every strategy

    buildDataProductFromODPS resolves domainFqn and sets it on imported.getDomains(), but both smartMerge and fullReplace unconditionally overwrite that with existing.getDomains(). A caller who passes ?domain=new-domain on a PUT expecting to move the product to a new domain will get a silent no-op: the response returns 200 with the new domain echoed in the body (because imported carries it), but the merge/replace path discards it before the update is applied, so the stored domain never changes. The fix is intentional for the ODPS-as-idempotent-sync case, but the API contract should either document this or propagate the domainFqn parameter through the merge logic.

Reviews (5): Last reviewed commit: "Merge branch 'main' into odps-fixes-main..." | Re-trigger Greptile

The ODPS / Data Products feature shipped with several bugs in its import,
merge, and metadata-edit paths, surfaced by a new Playwright suite (the flow
had no E2E coverage). All are still present on main.

1. DataProduct PATCH never persisted dataProductType/visibility/portfolioPriority.
   entitySpecificUpdate recorded changes only for name/domains, so change
   consolidation reverted these ODPS-aligned scalar fields on store — the PATCH
   response showed the value but a GET returned null, and the metadata-edit
   modal silently no-opped. Added recordChange for the three fields.

2. ODPS import always returned 400 "Unknown custom field odpsMetadata".
   ODPSConverter.fromODPS wrote the raw document into extension.odpsMetadata,
   which is not a registered custom property, so validateExtension rejected
   every import. toODPS never reads it back, so the write was dead weight —
   removed it; merge/replace now preserve the existing product's custom props.

3. ODPS import returned 409 because the domain reference built from ?domain= had
   no id. Resolved via Entity.getEntityReferenceByName.

4. ODPS create-from-import returned 409 (null primary key) because the converted
   entity had no id. Assigns a fresh UUID like the normal create path.

5. ODPS merge/replace wiped owners/domains/experts/reviewers/certification on an
   existing product: findExistingByName loaded it with only id,name,version, so
   the lazy relationship fields came back null and smartMerge/fullReplace copied
   the nulls. Load with EXPORT_FIELDS so they are hydrated before merging.

6. ODPSImportModal's name guard matched the first `name:` key via regex, which
   can hit an SLA dimension or tag rather than the product name
   (product.details.<lang>.name). Parse with js-yaml and read that path.

Adds playwright/e2e/Pages/DataProductODPS.spec.ts (9 tests): export, validate
(valid + invalid), export→rename→import round-trip, merge preserves domain +
owners, the UI export-menu / import-modal / metadata-edit-modal flows, and a
name-guard regression test asserting the mismatch toast.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@harshach harshach requested a review from a team as a code owner June 18, 2026 22:45
@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

❌ PR checklist incomplete

This PR cannot be merged until the following are addressed on its linked issue:

  • No GitHub issue is linked. Link an issue in the Development section of the PR (or add Fixes #12345 to the description). For a same-org cross-repo issue, add Fixes open-metadata/<repo>#123 to the description.

The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically.

Maintainers can bypass this check by adding the skip-pr-checks label.

@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Jun 18, 2026
@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Jest test Coverage

UI tests summary

Lines Statements Branches Functions
Coverage: 62%
62.27% (66685/107087) 44.06% (37350/84753) 45.39% (11208/24690)

@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (9 flaky)

✅ 4321 passed · ❌ 0 failed · 🟡 9 flaky · ⏭️ 88 skipped

Shard Passed Failed Flaky Skipped
✅ Shard 1 302 0 0 4
✅ Shard 2 815 0 0 9
🟡 Shard 3 815 0 1 8
🟡 Shard 4 867 0 2 12
🟡 Shard 5 732 0 1 47
🟡 Shard 6 790 0 5 8
🟡 9 flaky test(s) (passed on retry)
  • Flow/ExploreAggregationCountsMatching.spec.ts › should verify left panel counts and tab search results for normal search (shard 3, 2 retries)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 4, 1 retry)
  • Pages/Entity.spec.ts › User as Owner with unsorted list (shard 4, 1 retry)
  • Pages/EntityDataSteward.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)
  • Pages/LogsViewer.spec.ts › Logs page shows breadcrumb, summary, and log viewer or empty state after opening from bundle suite pipeline tab (shard 6, 1 retry)
  • Pages/ServiceListing.spec.ts › should render the service listing page (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

…test

- ODPSImportModal: treat an unreadable product.details.<lang>.name as a
  mismatch instead of bypassing the guard, so a YAML missing the name path
  can no longer silently overwrite a backend-resolved product. Adds a
  dedicated odps-yaml-name-missing message (synced across all locales).
- DataProductODPS.spec: rename only product.details.<lang>.name via
  js-yaml parse/mutate/stringify instead of a naive whole-document string
  replace, and add an E2E case for the missing-name guard branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@harshach harshach added the skip-pr-checks Bypass PR metadata validation check label Jun 19, 2026
harshach and others added 2 commits June 19, 2026 10:21
yarn i18n only seeds the English fallback for new keys. Provide real
translations for the 18 locales that already localize the sibling odps
messages, reusing each locale's existing terminology (data product, YAML,
quote style). sv-se is intentionally left in English to match its existing
odps-* message block.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… reindex

Two robustness fixes in the domain-delete detach path (from #29138):

- detachFromDeletingDomains now calls removeDomainLineage for each removed
  domain, mirroring EntityUpdater.removeDomains so a hard-deleted domain
  leaves no dangling lineage edge to the retained data product.
- reindexAfterDomainDetach skips products that no longer exist instead of
  throwing: the post-commit loop catches EntityNotFoundException so one
  product deleted elsewhere in the cascade can't abort reindexing the rest.
  Keeps the intentional fresh (non-cached) find for committed-state reindex.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gitar-bot

gitar-bot Bot commented Jun 19, 2026

Copy link
Copy Markdown
Code Review ✅ Approved 2 resolved / 2 findings

Fixes six bugs in the ODPS import, merge, and metadata persistence flows, including issues with domain reference resolution and property consolidation. Adds a new Playwright E2E suite to ensure robust coverage for these critical operations.

✅ 2 resolved
Edge Case: Domain detach removes relationships but not domain lineage

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataProductRepository.java:194-208
removeDomainContainment (DataProductRepository.java:220-225) deletes the CONTAINS and HAS relationships for each detached domain, and persistDomainDetachVersion bumps version / emits the change event. However, neither step calls removeDomainLineage for the detached domains. In the normal PATCH domain-change path the base EntityUpdater.removeDomains calls removeDomainLineage(updated.getId(), entityType, domain) (EntityRepository) so the data product's own domain lineage edge is cleaned up. The detach path's docstring explicitly claims it stays "consistent with a normal domain change" for downstream consumers, but the stale domain lineage edge to the hard-deleted domain is left behind, which can produce dangling lineage references after the domain is gone. Consider invoking removeDomainLineage for each removed domain inside removeDomainContainment/persistDomainDetachVersion.

Bug: reindexAfterDomainDetach can throw if a retained product was deleted

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataProductRepository.java:263-277
reindexAfterDomainDetach (DataProductRepository.java:263-280) calls find(dataProductId, ALL, false) for every id in the post-transaction reindex set when search-write deferral is active. find(...false) throws EntityNotFoundException if the data product no longer exists. The ids come from context.dataProductsToReindex, populated for "retained" products that should survive the domain hard-delete, so in the expected flow they exist. But if any of those products ends up deleted elsewhere in the same cascade, this post-commit reindex loop would throw after the delete transaction already committed, aborting reindexing of the remaining products. Consider using find(..., true) (or a try/guard) to skip missing products and continue the loop.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud

Copy link
Copy Markdown

@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs skip-pr-checks Bypass PR metadata validation check

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant