-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-48629: [R] Add test coverage for joins with duplicate columns with removing a todo #48630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
e086a4c to
a4eb975
Compare
jonkeane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! I say this in more detail in the comment below, but we do have tests for some aspects of joins with duplicate columns, though I will admit they aren't named very well and follow a slightly non-standard parttern. Mind using this PR to clean those up as we add slightly more coverage (e.g. duplicate columns with no (explicit) suffix additions)?
Also, in the description, you mention
added a comment about type casting because it seems it disallows the type cast. Here is example error message:
But I don't think I see that? Would you mind either leaving it or creating a test that asserts the failure with a note explaining what's going on?
Thanks!
| # Test with custom suffixes | ||
| compare_dplyr_binding( | ||
| .input |> | ||
| left_join(right_dup, by = "x", suffix = c("_left", "_right")) |> | ||
| collect(), | ||
| left_dup | ||
| ) | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test especially is effectively duplicated by
arrow/r/tests/testthat/test-dplyr-join.R
Lines 320 to 338 in 744f0ec
| test_that("suffix", { | |
| left_suf <- Table$create( | |
| key = c(1, 2), | |
| left_unique = c(2.1, 3.1), | |
| shared = c(10.1, 10.3) | |
| ) | |
| right_suf <- Table$create( | |
| key = c(1, 2, 3, 10, 20), | |
| right_unique = c(1.1, 1.2, 3.1, 4.1, 4.3), | |
| shared = c(20.1, 30, 40, 50, 60) | |
| ) | |
| join_op <- inner_join(left_suf, right_suf, by = "key", suffix = c("_left", "_right")) | |
| output <- collect(join_op) | |
| res_col_names <- names(output) | |
| expected_col_names <- c("key", "left_unique", "shared_left", "right_unique", "shared_right") | |
| expect_equal(expected_col_names, res_col_names) | |
| }) |
Though I will admit that the tests there do not follow the established pattern you follow here. Would you mind updating merging these two tests together? You can use your fixture for that purpose if you like, but if you do, would you mind adding a column with floats and also making the key column more obvious (e.g. by naming it key or to_join or something like that?
I'm happy to help describe in more detail if that's helpful!
Rationale for this change
Two TODOs in
test-dplyr-join.R(lines 191-192) requested test coverage for edge cases in join operations. These TODOs were added in commit c273ea7.What changes are included in this PR?
Added one test case to
r/tests/testthat/test-dplyr-join.Rfor duplicate columns, and added a comment about type casting because it seems it disallows the type cast. Here is example error message:Therefore this PR removes that todo.
Are these changes tested?
Yes, tests were added.
Are there any user-facing changes?
No, test-only.