[SPARK-57262][SQL][WEBUI] Job description derived from a query should respect spark.sql.redaction.string.regex #56361

Closed
sarutak wants to merge 2 commits into apache:master from sarutak:fix-redact-sql-description-v2

@sarutak sarutak commented Jun 7, 2026

Member

What changes were proposed in this pull request?

This PR changes SparkSQLDriver.scala to redact a query before setJobDescription.
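
To make the ordering concrete, here is a minimal standalone sketch, written in Python rather than Spark's Scala and not taken from the patch. It only illustrates the idea of applying the redaction regex to the query text first and then using the result as the job description. The `redact` helper, the `FakeContext` class, and the `REPLACEMENT` text are all assumptions made for illustration, not Spark APIs.

```python
import re

# Assumed replacement text, for illustration only; not taken from this PR.
REPLACEMENT = "*********(redacted)"


def redact(pattern: str, text: str) -> str:
    """Replace every match of `pattern` in `text` with REPLACEMENT."""
    # A function replacement avoids any escape handling in the replacement text.
    return re.sub(pattern, lambda _m: REPLACEMENT, text)


class FakeContext:
    """Hypothetical stand-in that only records the last job description."""

    def __init__(self) -> None:
        self.job_description = None

    def set_job_description(self, description: str) -> None:
        self.job_description = description


def run_query(ctx: FakeContext, redaction_regex: str, query: str) -> None:
    # Redact first, so the raw query text never becomes the description.
    ctx.set_job_description(redact(redaction_regex, query))


ctx = FakeContext()
run_query(ctx, "secret.*=.*", "SELECT * FROM test1 WHERE secret=1")
print(ctx.job_description)
```

With the regex and query from the manual test later in this description, the recorded description keeps `SELECT * FROM test1 WHERE ` and replaces `secret=1` with the placeholder text.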

Why are the changes needed?

In the current implementation, when a query is executed through SparkSQLDriver, redaction is done in SQLExecution.scala, so the description in the table at the top of the /SQL/execution page is redacted.
[Screenshot: sql-execution-page-top-table]

However, the descriptions in the table on the /jobs page and in the table at the bottom of the /SQL/execution page are not redacted.
[Screenshots: jobs-page-before, sql-execution-page-before]

NOTE:
Even after this PR is merged, when a job description is set manually with sc.setJobDescription, the descriptions shown on the /jobs page and at the bottom of the /SQL/execution page are not redacted, although the one at the top of the /SQL/execution page is.

$ bin/spark-shell -c spark.sql.redaction.string.regex="secret.*=.*"
scala> val s = "SELECT * FROM (SELECT 'secret=1')"
scala> sc.setJobDescription(s)
scala> sql(s).show()
+--------+
|secret=1|
+--------+
|secret=1|
+--------+

description in /jobs page
[Screenshot: jobs-page-not-redacted]
description in /SQL/execution (top)
[Screenshot: sql-execution-page-redacted]
description in /SQL/execution (bottom)
[Screenshot: sql-execution-page-not-redacted]

This is consistent with the previous behavior and is not a regression. There is no simple way to redact them, and doing so is out of scope for this PR.
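
As a side note on the regex used in the NOTE, the following standalone Python sketch shows which part of the example string `secret.*=.*` covers. It uses Python's `re` module as a stand-in for the regex engine Spark uses, on the assumption that the two behave the same for this simple pattern. The variable names are illustrative only.

```python
import re

pattern = re.compile("secret.*=.*")
query = "SELECT * FROM (SELECT 'secret=1')"

match = pattern.search(query)
# Text the regex covers, and the text that stays visible in front of it.
covered = match.group(0) if match else ""
visible = query[: match.start()] if match else query

print(repr(covered))
print(repr(visible))
```

Because the trailing `.*` is greedy, the match runs to the end of the line, so the closing `')` is covered along with `secret=1`.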

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Added a new test and confirmed that the test "SQL execution description should respect spark.sql.redaction.string.regex", added in #56358, passes.
Also confirmed that the descriptions are redacted in the UI.

$ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
spark-sql (default)>  CREATE TABLE test1(secret string);
spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
[Screenshots: jobs-page-after-2, sql-execution-page-after-2]

Was this patch authored or co-authored using generative AI tooling?

Kiro CLI / Claude

@dongjoon-hyun dongjoon-hyun left a comment

Member

Thank you for the resubmission. BTW, don't we need new test coverage for this, @sarutak? I thought you would reuse your previous test case, SparkSQLDriverSuite.scala.

sarutak commented Jun 7, 2026

Member Author

Oh, I forgot to include the new test. Thanks, @dongjoon-hyun.

@dongjoon-hyun dongjoon-hyun left a comment

Member

+1, LGTM (Pending CIs). Thank you, @sarutak !

@dongjoon-hyun dongjoon-hyun left a comment

Member

Er, @sarutak. While checking once more before merging, I found part of the description a little strange. Is the PR description correct?

NOTE:
Even after this PR is merged, when a job description is set manually using sc.setJobDescription, the description displayed in the /jobs page and the one on the bottom of /SQL/execution page are not redacted though the one on the top of SQL/execution page is redacted.

sarutak commented Jun 7, 2026

Member Author

Yes, it's correct, @dongjoon-hyun. Please check the screenshot in the description.
As I mentioned in the description, it's consistent with the previous behavior and not a regression. There is no simple way to redact them, and I think users should not set text that includes confidential information as a description themselves.

@dongjoon-hyun

Member

Thank you for the confirmation. Let me check this once more today, @sarutak.

dongjoon-hyun commented Jun 7, 2026

Member

I finished the verification and fully agree with your assessment. Only the SQL tab shows the redacted result. We are good to go. Thank you for waiting for me. Feel free to merge this, @sarutak.

@sarutak sarutak closed this in 96b255f Jun 8, 2026
sarutak added a commit that referenced this pull request Jun 8, 2026
… respect `spark.sql.redaction.string.regex`

### What changes were proposed in this pull request?
This PR changes `SparkSQLDriver.scala` to redact a query before `setJobDescription`.

### Why are the changes needed?
In the current implementation, when a query is executed through `SparkSQLDriver`, redaction is done in `SQLExecution.scala` so the description in the table on the top of `/SQL/execution` is redacted.
<img width="1083" height="349" alt="sql-execution-page-top-table" src="https://github.com/user-attachments/assets/b06fb255-2b46-473d-9046-1b2d578e3bda" />

But the description in the table on the `/jobs` page and the one in the table on the bottom of `/SQL/execution` page are not redacted.
<img width="525" height="692" alt="jobs-page-before" src="https://github.com/user-attachments/assets/31c88b98-779b-4305-bf71-58f19a1d7117" />
<img width="515" height="274" alt="sql-execution-page-before" src="https://github.com/user-attachments/assets/012be251-f642-4ded-8f77-32f811b05cac" />

NOTE:
Even after this PR is merged, when a job description is set manually with `sc.setJobDescription`, the descriptions shown on the `/jobs` page and at the bottom of the `/SQL/execution` page are not redacted, although the one at the top of the `/SQL/execution` page is.

```
$ bin/spark-shell -c spark.sql.redaction.string.regex="secret.*=.*"
scala> val s = "SELECT * FROM (SELECT 'secret=1')"
scala> sc.setJobDescription(s)
scala> sql(s).show()
+--------+
|secret=1|
+--------+
|secret=1|
+--------+
```

**description in `/jobs` page**
<img width="555" height="226" alt="jobs-page-not-redacted" src="https://github.com/user-attachments/assets/b4e084ad-b648-4ba6-b049-ef42f570398d" />
**description in `/SQL/execution` (top)**
<img width="913" height="203" alt="sql-execution-page-redacted" src="https://github.com/user-attachments/assets/91e745f0-aa7f-4618-98e9-5b4b117415da" />
**description in `/SQL/execution` (bottom)**
<img width="536" height="292" alt="sql-execution-page-not-redacted" src="https://github.com/user-attachments/assets/761aad76-0d1b-49af-9e03-58510cd474d1" />

This is consistent with the previous behavior and not a regression. There is no simple way to redact them and doing it is out of scope of this PR.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Added new test and confirmed the test `SQL execution description should respect spark.sql.redaction.string.regex` added in #56358 passed.
Also confirmed descriptions are redacted in UI.
```
$ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
spark-sql (default)>  CREATE TABLE test1(secret string);
spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
```
<img width="607" height="213" alt="jobs-page-after-2" src="https://github.com/user-attachments/assets/62646cfc-67c3-46b5-a9f9-695b1f874462" />
<img width="589" height="274" alt="sql-execution-page-after-2" src="https://github.com/user-attachments/assets/597db0da-58fb-4275-b6aa-7e8b301f15d0" />

### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Claude

Closes #56361 from sarutak/fix-redact-sql-description-v2.

Authored-by: Kousuke Saruta <sarutak@amazon.co.jp>
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
(cherry picked from commit 96b255f)
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
sarutak added a commit that referenced this pull request Jun 8, 2026
… respect `spark.sql.redaction.string.regex`

Closes #56361 from sarutak/fix-redact-sql-description-v2.

Authored-by: Kousuke Saruta <sarutak@amazon.co.jp>
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
(cherry picked from commit 96b255f)
Signed-off-by: Kousuke Saruta <sarutak@apache.org>

sarutak commented Jun 8, 2026

Member Author

Merged to master/branch-4.x/branch-4.2. Thank you all for reviewing.

sarutak added a commit that referenced this pull request Jun 9, 2026
…hould respect `spark.sql.redaction.string.regex`

### What changes were proposed in this pull request?
This PR backports SPARK-57262 (#56361) to `branch-4.1`.
This PR changes `SparkSQLDriver.scala` to redact a query before `setJobDescription`.

### Why are the changes needed?
In the current implementation, when a query is executed through `SparkSQLDriver`, redaction is done in `SQLExecution.scala` so the description in the table on the `/SQL` page is redacted.
<img width="488" height="252" alt="sql-page-4 1" src="https://github.com/user-attachments/assets/bec6cd3d-c655-4bb2-9e95-5d0efb86999a" />

But the description in the table on the `/jobs` page is not redacted.
<img width="444" height="162" alt="jobs-page-before-4 1" src="https://github.com/user-attachments/assets/a6c93c26-2a64-4d41-ae1a-23ac77e38f84" />

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Added new test and confirmed the test `SQL execution description should respect spark.sql.redaction.string.regex` added in #56358 passed.
Also confirmed the description is redacted in UI.
```
$ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
spark-sql (default)>  CREATE TABLE test1(secret string);
spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
```
<img width="601" height="205" alt="jobs-page-after-4 1" src="https://github.com/user-attachments/assets/c21ae7cc-1089-4e3c-828a-ad022cd2492f" />

### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Claude

Closes #56364 from sarutak/fix-redact-sql-description-v2-4.1.

Authored-by: Kousuke Saruta <sarutak@amazon.co.jp>
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
sarutak added a commit that referenced this pull request Jun 11, 2026
…hould respect `spark.sql.redaction.string.regex`

### What changes were proposed in this pull request?
This PR backports SPARK-57262 (#56361) to `branch-4.0`.
This PR changes `SparkSQLDriver.scala` to redact a query before `setJobDescription`.

### Why are the changes needed?
In the current implementation, when a query is executed through `SparkSQLDriver`, redaction is done in `SQLExecution.scala` so the description in the table on the `/SQL` page is redacted.
<img width="436" height="260" alt="sql-page-4 0" src="https://github.com/user-attachments/assets/81bb3296-67b1-4a03-9c41-e720f767e16e" />

But the description in the table on the `/jobs` page is not redacted.
<img width="511" height="174" alt="jobs-page-before-4 0" src="https://github.com/user-attachments/assets/3107a44e-0e35-4a93-977c-2e764e1c41ae" />

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Added new test and confirmed the test `SQL execution description should respect spark.sql.redaction.string.regex` added in #56358 passed.
Also confirmed the description is redacted in UI.
```
$ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
spark-sql (default)>  CREATE TABLE test1(secret string);
spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
```
<img width="459" height="171" alt="jobs-page-after-4 0" src="https://github.com/user-attachments/assets/ba24e318-5054-415f-9f96-383f7ed3e99f" />

### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Claude

Closes #56365 from sarutak/fix-redact-sql-description-v2-4.0.

Authored-by: Kousuke Saruta <sarutak@amazon.co.jp>
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
@sarutak sarutak deleted the fix-redact-sql-description-v2 branch June 15, 2026 11:53