Please answer these questions before submitting your issue. Thanks!
- What version of Python are you using?
Python 3.11.3 (main, Mar 1 2024, 15:38:32) [GCC 11.4.0]
- What are the Snowpark Python and pandas versions in the environment?
pandas==2.3.0
snowflake-snowpark-python==1.31.1
- What did you do?
# Imports assumed by the test (not shown in the original report)
from snowflake.snowpark import Window
import snowflake.snowpark.functions as spf

def test__snowpark_bug(self) -> None:
    col_a: str = "COL_A"
    col_b: str = "COL_B"
    value_col: str = "VAL"
    df = self.snowpark_client.session.create_dataframe(
        [
            [1, 1, 1],
            [2, 2, 1],
            [2, 2, 1],
            [2, 1, 1],
        ],
        [col_a, col_b, value_col],
    )
    # One window partitioned by COL_A only, one by both COL_B and COL_A
    window_a = Window.partition_by(col_a)
    window_both = Window.partition_by(col_b, col_a)
    df = df.with_columns(
        ["over_a", "over_both"],
        [
            spf.sum(value_col).over(window_a),
            spf.sum(value_col).over(window_both),
        ],
    )
    df.show()
- What did you expect to see?
Output of the online test (expected):
------------------------------------------------------
|"COL_A" |"COL_B" |"VAL" |"OVER_A" |"OVER_BOTH" |
------------------------------------------------------
|1 |1 |1 |1 |1 |
|2 |2 |1 |3 |2 |
|2 |2 |1 |3 |2 |
|2 |1 |1 |3 |1 |
------------------------------------------------------
Output of the local test (actual):
------------------------------------------------------
|"COL_A" |"COL_B" |"VAL" |"OVER_A" |"OVER_BOTH" |
------------------------------------------------------
|1 |1 |1 |1.0 |1.0 |
|2 |2 |1 |3.0 |1.0 |
|2 |2 |1 |3.0 |2.0 |
|2 |1 |1 |3.0 |2.0 |
------------------------------------------------------
The values for 'OVER_BOTH' are wrong in the local test: rows 2 and 4 differ from the online output. For example, the last row should be 1.0: the window partitions by both COL_A and COL_B, so the last row (COL_A=2, COL_B=1) is its own partition and has sum 1.
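To double-check the expected values independently of Snowpark, here is a minimal pure-Python sketch that recomputes the per-partition sums for the same four rows. The helper `partitioned_sum` is a hypothetical name introduced only for this illustration; it is not part of Snowpark or pandas.

```python
from collections import defaultdict

# Same rows as in the test: (COL_A, COL_B, VAL)
rows = [
    (1, 1, 1),
    (2, 2, 1),
    (2, 2, 1),
    (2, 1, 1),
]


def partitioned_sum(rows, key_indices, value_index=2):
    """For each row, return the sum of the value column over that row's partition.

    Hypothetical helper: mimics SUM(VAL) OVER (PARTITION BY ...) without ordering.
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indices)
        totals[key] += row[value_index]
    # Map each row back to its partition total, preserving the input order
    return [totals[tuple(row[i] for i in key_indices)] for row in rows]


over_a = partitioned_sum(rows, key_indices=(0,))       # partition by COL_A
over_both = partitioned_sum(rows, key_indices=(1, 0))  # partition by COL_B, COL_A

print(over_a)     # [1, 3, 3, 3]
print(over_both)  # [1, 2, 2, 1]
```

These results agree with the online output, which suggests the local test returns the 'OVER_BOTH' values against the wrong rows rather than computing different partition totals.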