Skip to content

Conversation

@sfc-gh-yuwang
Copy link
Collaborator

@sfc-gh-yuwang sfc-gh-yuwang commented Dec 11, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-2408964

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

    Here is a doc to talk about the behavior of XML read with user schema in snowpark:
    https://docs.google.com/document/d/1W3os2SGbO_M89TjIa7xVuNrQ0RDDuGMHEOBHFuw0OYw/edit?tab=t.0

@sfc-gh-yuwang sfc-gh-yuwang marked this pull request as ready for review December 22, 2025 23:20
@sfc-gh-yuwang sfc-gh-yuwang requested review from a team as code owners December 22, 2025 23:20
and format.lower() == "xml"
and XML_ROW_TAG_STRING not in self._cur_options
):
raise ValueError("When read XML with user schema, rowtag must be set.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError("When read XML with user schema, rowtag must be set.")
raise ValueError("When reading XML with user schema, rowtag must be set.")

return None


def generate_norm_column_name_to_ori_column_name_dict(result: dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this really need to be a function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, I removed this function

# cast to input custom schema type
if self._user_schema:
cols = [
df[single_quote(field._name)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we using single_quote here?

Copy link
Collaborator Author

@sfc-gh-yuwang sfc-gh-yuwang Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently all column name is produced with a single quote around,
I am not sure why we are having this behavior but I think we should not change now as it would be a BCR
single quote is added here so that we can correctly read the column

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I think I remember Jianzhun mentioning we use PIVOT and it snowflake pivot adds single quote for string, and datetime columns. This could break since server side is planning to do a BCR to remove quotes.

Comment on lines 624 to 627
with pytest.raises(
ValueError, match="When read XML with user schema, rowtag must be set."
):
session.read.schema(user_schema).xml(f"@{tmp_stage_name}/{test_file_books_xml}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test regex pattern mismatch with actual error message. The test expects "When read XML with user schema, rowtag must be set." but the actual error message in dataframe_reader.py:1392 is "When reading XML with user schema, rowtag must be set." (note "reading" vs "read"). This test will fail.

# Fix: Update the match pattern to match the actual error message
with pytest.raises(
    ValueError, match="When reading XML with user schema, rowtag must be set."
):
Suggested change
with pytest.raises(
ValueError, match="When read XML with user schema, rowtag must be set."
):
session.read.schema(user_schema).xml(f"@{tmp_stage_name}/{test_file_books_xml}")
with pytest.raises(
ValueError, match="When reading XML with user schema, rowtag must be set."
):
session.read.schema(user_schema).xml(f"@{tmp_stage_name}/{test_file_books_xml}")

Spotted by Graphite Agent

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

).option("mode", "failfast").xml(f"@{tmp_stage_name}/{test_file_books_xml}")


def test_read_xml_with_custom_schema(session):
Copy link
Contributor

@sfc-gh-aling sfc-gh-aling Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have one simple case for nested data? books2.xml seems to be a nested one

Copy link
Contributor

@sfc-gh-aling sfc-gh-aling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
can you also help address the test coverage before merging it?

@sfc-gh-yuwang sfc-gh-yuwang merged commit 1713ffe into main Jan 6, 2026
28 of 29 checks passed
@sfc-gh-yuwang sfc-gh-yuwang deleted the SNOW-2408964 branch January 6, 2026 20:50
@github-actions github-actions bot locked and limited conversation to collaborators Jan 6, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants