SNOW-2408964: support custom_schema when reading xml #4033

sfc-gh-yuwang · 2025-12-11T21:29:16Z

Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

Fixes SNOW-2408964
Fill out the following pre-review checklist:
- I am adding a new automated test(s) to verify correctness of my new code
  - If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- I am adding new logging messages
- I am adding a new telemetry message
- I am adding new credentials
- I am adding a new dependency
- If this is a new feature/behavior, I'm adding the Local Testing parity changes.
- I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
- If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
Please describe how your code solves the related issue.

This doc describes the behavior of reading XML with a user schema in Snowpark:
https://docs.google.com/document/d/1W3os2SGbO_M89TjIa7xVuNrQ0RDDuGMHEOBHFuw0OYw/edit?tab=t.0

sfc-gh-aalam · 2026-01-05T23:02:18Z

src/snowflake/snowpark/dataframe_reader.py

+            and format.lower() == "xml"
+            and XML_ROW_TAG_STRING not in self._cur_options
+        ):
+            raise ValueError("When read XML with user schema, rowtag must be set.")


Suggested change

-            raise ValueError("When read XML with user schema, rowtag must be set.")
+            raise ValueError("When reading XML with user schema, rowtag must be set.")
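The check under review can be sketched in isolation as follows. This is a minimal sketch, not the code in dataframe_reader.py: the function name `check_xml_row_tag`, the value assigned to `XML_ROW_TAG_STRING`, and the first clause of the condition (the diff excerpt above starts mid-expression) are all assumptions made for illustration.

```python
# Assumed value; the real constant is defined elsewhere in Snowpark.
XML_ROW_TAG_STRING = "ROWTAG"


def check_xml_row_tag(user_schema, file_format, cur_options):
    """Raise if a user schema is supplied for XML without a row tag.

    Hypothetical helper: the first clause (user_schema is not None) is an
    assumption, since the diff only shows the last two clauses.
    """
    if (
        user_schema is not None
        and file_format.lower() == "xml"
        and XML_ROW_TAG_STRING not in cur_options
    ):
        raise ValueError("When reading XML with user schema, rowtag must be set.")


# No error: a row tag is present in the options.
check_xml_row_tag(object(), "XML", {"ROWTAG": "book"})
```

Calling it with a schema, format "xml", and an empty options dict raises the ValueError with the corrected wording.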

sfc-gh-aalam · 2026-01-05T23:18:02Z

src/snowflake/snowpark/_internal/xml_reader.py

+    return None
+
+
+def generate_norm_column_name_to_ori_column_name_dict(result: dict):


Does this really need to be a function?

You are right; I removed this function.

sfc-gh-aalam · 2026-01-05T23:39:16Z

src/snowflake/snowpark/dataframe_reader.py

+        # cast to input custom schema type
+        if self._user_schema:
+            cols = [
+                df[single_quote(field._name)]


Why are we using single_quote here?

Currently, every column name is produced wrapped in single quotes.
I am not sure why we have this behavior, but I don't think we should change it now, since that would be a BCR.
The single quote is added here so that we can read the column correctly.

Ahh, I think I remember Jianzhun mentioning that we use PIVOT, and Snowflake's PIVOT adds single quotes for string and datetime columns. This could break, since the server side is planning a BCR to remove the quotes.
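To make the concern concrete, here is a small sketch of looking up schema fields in a row whose keys may or may not carry the single quotes that PIVOT adds. It is illustrative only and is not what the PR does: `single_quote` here is a local stand-in for the Snowpark helper of the same name, and `select_schema_columns` and the fallback to the unquoted name are hypothetical.

```python
def single_quote(name: str) -> str:
    """Local stand-in: wrap a column name in single quotes."""
    return f"'{name}'"


def select_schema_columns(row: dict, field_names: list) -> dict:
    """Pick the schema's fields out of a row keyed by column name.

    Tries the quoted name first (current PIVOT behavior) and falls back to
    the bare name (what a quote-removing BCR would produce).
    """
    selected = {}
    for name in field_names:
        quoted = single_quote(name)
        if quoted in row:
            selected[name] = row[quoted]
        else:
            # KeyError here means the field is missing under both spellings.
            selected[name] = row[name]
    return selected


# Works for both spellings of the column name.
print(select_schema_columns({"'title'": "A", "'price'": 1}, ["title", "price"]))
print(select_schema_columns({"title": "A", "price": 1}, ["title", "price"]))
```

Both calls print the same dictionary, which is the property a reader would want to hold across the server-side change.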

graphite-app · 2026-01-06T00:05:38Z

tests/integ/test_xml_reader_row_tag.py

+    with pytest.raises(
+        ValueError, match="When read XML with user schema, rowtag must be set."
+    ):
+        session.read.schema(user_schema).xml(f"@{tmp_stage_name}/{test_file_books_xml}")


Test regex pattern mismatch with actual error message. The test expects "When read XML with user schema, rowtag must be set." but the actual error message in dataframe_reader.py:1392 is "When reading XML with user schema, rowtag must be set." (note "reading" vs "read"). This test will fail.

# Fix: Update the match pattern to match the actual error message
with pytest.raises(
    ValueError, match="When reading XML with user schema, rowtag must be set."
):

Suggested change

-    with pytest.raises(
-        ValueError, match="When read XML with user schema, rowtag must be set."
-    ):
-        session.read.schema(user_schema).xml(f"@{tmp_stage_name}/{test_file_books_xml}")
+    with pytest.raises(
+        ValueError, match="When reading XML with user schema, rowtag must be set."
+    ):
+        session.read.schema(user_schema).xml(f"@{tmp_stage_name}/{test_file_books_xml}")

Spotted by Graphite Agent
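The mismatch can be reproduced without pytest, because `pytest.raises(..., match=...)` checks the pattern against the exception message with `re.search`. The sketch below uses only the standard library; the helper name `matches` is made up for illustration.

```python
import re

ACTUAL_MESSAGE = "When reading XML with user schema, rowtag must be set."
OLD_PATTERN = "When read XML with user schema, rowtag must be set."
NEW_PATTERN = "When reading XML with user schema, rowtag must be set."


def matches(pattern: str, message: str) -> bool:
    """Mimic the check pytest.raises performs for its match argument."""
    return re.search(pattern, message) is not None


# The old pattern expects " XML" right after "read", but the message has "ing".
print(matches(OLD_PATTERN, ACTUAL_MESSAGE))
# The updated pattern matches the message.
print(matches(NEW_PATTERN, ACTUAL_MESSAGE))
# Escaping makes the trailing "." literal instead of a regex wildcard.
print(matches(re.escape(NEW_PATTERN), ACTUAL_MESSAGE))
```

Since the pattern is a regular expression, wrapping a literal message in `re.escape` is a slightly stricter option than passing the raw string.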

sfc-gh-aling · 2026-01-06T00:41:14Z

tests/integ/test_xml_reader_row_tag.py

        ).option("mode", "failfast").xml(f"@{tmp_stage_name}/{test_file_books_xml}")
+
+
+def test_read_xml_with_custom_schema(session):


Can we have one simple case for nested data? books2.xml seems to be a nested one.

sfc-gh-aling

LGTM.
Can you also help address the test coverage before merging?

sfc-gh-yuwang added 5 commits December 11, 2025 13:28

xml custom schema placeholder

e30e179

make custom_schema accessible from xml_reader

c628665

Merge branch 'main' into SNOW-2408964

c715863

add logic to user custom schema(minus type mapping)

2bac9f6

add test and full logic

1c84a9f

sfc-gh-yuwang marked this pull request as ready for review December 22, 2025 23:20

sfc-gh-yuwang requested review from a team as code owners December 22, 2025 23:20

sfc-gh-yuwang requested review from sfc-gh-jdu, sfc-gh-jrose and sfc-gh-yixie December 22, 2025 23:20

sfc-gh-yuwang added 3 commits December 23, 2025 10:01

fix test

24d0ae8

fix lint

9fbb032

Merge branch 'main' into SNOW-2408964

dbfa060

sfc-gh-yuwang requested review from sfc-gh-aalam and sfc-gh-aling January 5, 2026 22:28

Merge branch 'main' into SNOW-2408964

9d65d5e

sfc-gh-aalam reviewed Jan 5, 2026

sfc-gh-aling reviewed Jan 5, 2026

sfc-gh-aling reviewed Jan 5, 2026

address comments

ec48972

graphite-app bot reviewed Jan 6, 2026

update test

b2437d3

sfc-gh-aling reviewed Jan 6, 2026

add a hint for BCR

50676d1

sfc-gh-aling reviewed Jan 6, 2026

add test for nested xml

137a2bb

sfc-gh-yuwang requested review from sfc-gh-aalam and sfc-gh-aling January 6, 2026 17:35

sfc-gh-aling approved these changes Jan 6, 2026

sfc-gh-yuwang added 2 commits January 6, 2026 11:35

add test

de63eac

fix lint

8b8573c

sfc-gh-aalam approved these changes Jan 6, 2026

sfc-gh-yuwang merged commit 1713ffe into main Jan 6, 2026
28 of 29 checks passed

sfc-gh-yuwang deleted the SNOW-2408964 branch January 6, 2026 20:50

github-actions bot locked and limited conversation to collaborators Jan 6, 2026
