[SPARK-57262][SQL][WEBUI] Job description derived from a query should respect spark.sql.redaction.string.regex #56361
Thank you for the resubmission. BTW, don't we need new test coverage for this, @sarutak? I thought you would reuse your previous test case, SparkSQLDriverSuite.scala.
Oh, I forgot to include the new test. Thanks, @dongjoon-hyun.
dongjoon-hyun
left a comment
+1, LGTM (Pending CIs). Thank you, @sarutak !
Uh, @sarutak. While checking once more before merging, I found a slightly strange statement in the description. Is the PR description correct?
NOTE:
Even after this PR is merged, when a job description is set manually using sc.setJobDescription, the description displayed in the /jobs page and the one on the bottom of /SQL/execution page are not redacted though the one on the top of SQL/execution page is redacted.
Yes, it's correct, @dongjoon-hyun. Please check the screenshot in the description.
Thank you for your confirmation. Let me check this once more today, @sarutak.
I finished the verification and fully agree with your assessment. Only the SQL tab has the redacted result. We are good to go. Thank you for waiting for me. Feel free to merge this, @sarutak.
… respect `spark.sql.redaction.string.regex`

### What changes were proposed in this pull request?
This PR changes `SparkSQLDriver.scala` to redact a query before `setJobDescription`.

### Why are the changes needed?
In the current implementation, when a query is executed through `SparkSQLDriver`, redaction is done in `SQLExecution.scala` so the description in the table on the top of `/SQL/execution` is redacted.
<img width="1083" height="349" alt="sql-execution-page-top-table" src="https://github.com/user-attachments/assets/b06fb255-2b46-473d-9046-1b2d578e3bda" />

But the description in the table on the `/jobs` page and the one in the table on the bottom of `/SQL/execution` page are not redacted.
<img width="525" height="692" alt="jobs-page-before" src="https://github.com/user-attachments/assets/31c88b98-779b-4305-bf71-58f19a1d7117" />
<img width="515" height="274" alt="sql-execution-page-before" src="https://github.com/user-attachments/assets/012be251-f642-4ded-8f77-32f811b05cac" />

NOTE:
Even after this PR is merged, when a job description is set manually using `sc.setJobDescription`, the description displayed in the `/jobs` page and the one on the bottom of `/SQL/execution` page are not redacted though the one on the top of `SQL/execution` page is redacted.
```
$ bin/spark-shell -c spark.sql.redaction.string.regex="secret.*=.*"
scala> val s = "SELECT * FROM (SELECT 'secret=1')"
scala> sc.setJobDescription(s)
scala> sql(s).show()
+--------+
|secret=1|
+--------+
|secret=1|
+--------+
```
**description in `/jobs` page**
<img width="555" height="226" alt="jobs-page-not-redacted" src="https://github.com/user-attachments/assets/b4e084ad-b648-4ba6-b049-ef42f570398d" />

**description in `/SQL/execution` (top)**
<img width="913" height="203" alt="sql-execution-page-redacted" src="https://github.com/user-attachments/assets/91e745f0-aa7f-4618-98e9-5b4b117415da" />

**description in `/SQL/execution` (bottom)**
<img width="536" height="292" alt="sql-execution-page-not-redacted" src="https://github.com/user-attachments/assets/761aad76-0d1b-49af-9e03-58510cd474d1" />

This is consistent with the previous behavior and not a regression. There is no simple way to redact them and doing it is out of scope of this PR.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Added new test and confirmed the test `SQL execution description should respect spark.sql.redaction.string.regex` added in #56358 passed.
Also confirmed descriptions are redacted in UI.
```
$ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
spark-sql (default)> CREATE TABLE test1(secret string);
spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
```
<img width="607" height="213" alt="jobs-page-after-2" src="https://github.com/user-attachments/assets/62646cfc-67c3-46b5-a9f9-695b1f874462" />
<img width="589" height="274" alt="sql-execution-page-after-2" src="https://github.com/user-attachments/assets/597db0da-58fb-4275-b6aa-7e8b301f15d0" />

### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Claude

Closes #56361 from sarutak/fix-redact-sql-description-v2.

Authored-by: Kousuke Saruta <sarutak@amazon.co.jp>
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
(cherry picked from commit 96b255f)
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
Merged to |
…hould respect `spark.sql.redaction.string.regex`

### What changes were proposed in this pull request?
This PR backports SPARK-57262 (#56361) to `branch-4.1`.
This PR changes `SparkSQLDriver.scala` to redact a query before `setJobDescription`.

### Why are the changes needed?
In the current implementation, when a query is executed through `SparkSQLDriver`, redaction is done in `SQLExecution.scala` so the description in the table on the `/SQL` page is redacted.
<img width="488" height="252" alt="sql-page-4 1" src="https://github.com/user-attachments/assets/bec6cd3d-c655-4bb2-9e95-5d0efb86999a" />

But the description in the table on the `/jobs` page is not redacted.
<img width="444" height="162" alt="jobs-page-before-4 1" src="https://github.com/user-attachments/assets/a6c93c26-2a64-4d41-ae1a-23ac77e38f84" />

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Added new test and confirmed the test `SQL execution description should respect spark.sql.redaction.string.regex` added in #56358 passed.
Also confirmed the description is redacted in UI.
```
$ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
spark-sql (default)> CREATE TABLE test1(secret string);
spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
```
<img width="601" height="205" alt="jobs-page-after-4 1" src="https://github.com/user-attachments/assets/c21ae7cc-1089-4e3c-828a-ad022cd2492f" />

### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Claude

Closes #56364 from sarutak/fix-redact-sql-description-v2-4.1.

Authored-by: Kousuke Saruta <sarutak@amazon.co.jp>
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
…hould respect `spark.sql.redaction.string.regex`

### What changes were proposed in this pull request?
This PR backports SPARK-57262 (#56361) to `branch-4.0`.
This PR changes `SparkSQLDriver.scala` to redact a query before `setJobDescription`.

### Why are the changes needed?
In the current implementation, when a query is executed through `SparkSQLDriver`, redaction is done in `SQLExecution.scala` so the description in the table on the `/SQL` page is redacted.
<img width="436" height="260" alt="sql-page-4 0" src="https://github.com/user-attachments/assets/81bb3296-67b1-4a03-9c41-e720f767e16e" />

But the description in the table on the `/jobs` page is not redacted.
<img width="511" height="174" alt="jobs-page-before-4 0" src="https://github.com/user-attachments/assets/3107a44e-0e35-4a93-977c-2e764e1c41ae" />

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Added new test and confirmed the test `SQL execution description should respect spark.sql.redaction.string.regex` added in #56358 passed.
Also confirmed the description is redacted in UI.
```
$ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
spark-sql (default)> CREATE TABLE test1(secret string);
spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
```
<img width="459" height="171" alt="jobs-page-after-4 0" src="https://github.com/user-attachments/assets/ba24e318-5054-415f-9f96-383f7ed3e99f" />

### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Claude

Closes #56365 from sarutak/fix-redact-sql-description-v2-4.0.

Authored-by: Kousuke Saruta <sarutak@amazon.co.jp>
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
What changes were proposed in this pull request?
This PR changes `SparkSQLDriver.scala` to redact a query before `setJobDescription`.
Why are the changes needed?
In the current implementation, when a query is executed through `SparkSQLDriver`, redaction is done in `SQLExecution.scala` so the description in the table on the top of `/SQL/execution` is redacted.
But the description in the table on the `/jobs` page and the one in the table on the bottom of `/SQL/execution` page are not redacted.
NOTE:
Even after this PR is merged, when a job description is set manually using `sc.setJobDescription`, the description displayed in the `/jobs` page and the one on the bottom of `/SQL/execution` page are not redacted though the one on the top of `SQL/execution` page is redacted.
description in `/jobs` page
description in `/SQL/execution` (top)
description in `/SQL/execution` (bottom)
This is consistent with the previous behavior and not a regression. There is no simple way to redact them and doing it is out of scope of this PR.
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
Added new test and confirmed the test `SQL execution description should respect spark.sql.redaction.string.regex` added in #56358 passed.
Also confirmed descriptions are redacted in UI.
Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Claude
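For readers who want to see the idea of "redact a query before `setJobDescription`" in isolation, here is a minimal sketch in plain Scala with no Spark dependency. It is not the code from this PR: `RedactionSketch`, `redact`, `jobDescriptionFor`, and the replacement text are hypothetical names and values chosen for illustration. It only shows how a pattern such as `secret.*=.*` would rewrite a query string before that string is used as a description.

```scala
import scala.util.matching.Regex

// Illustrative sketch only; every name here is hypothetical, not Spark's API.
object RedactionSketch {
  // Assumed placeholder text that stands in for whatever was matched.
  val Replacement: String = "*********(redacted)"

  // Replace every match of the optional pattern; leave the text alone if
  // no pattern is configured.
  def redact(pattern: Option[Regex], text: String): String =
    pattern match {
      case Some(r) => r.replaceAllIn(text, Regex.quoteReplacement(Replacement))
      case None    => text
    }

  // Stand-in for the step that derives a job description from a query:
  // the query is redacted first, and only the redacted form is returned.
  def jobDescriptionFor(query: String, pattern: Option[Regex]): String =
    redact(pattern, query)

  def main(args: Array[String]): Unit = {
    val pattern = Some("secret.*=.*".r)
    // prints: SELECT * FROM test1 WHERE *********(redacted)
    println(jobDescriptionFor("SELECT * FROM test1 WHERE secret=1", pattern))
  }
}
```

Because `secret.*=.*` is greedy, everything from the first `secret` to the end of the line is replaced, so with the query from the NOTE above the sketch also swallows the trailing `')`.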