[ci][data] tensorflow-datasets tests: depset for tfds tests (3/3) #59354
Conversation
Signed-off-by: elliot-barn <[email protected]>
Code Review
This pull request successfully sets up a dedicated CI job for tensorflow-datasets tests on Python 3.12. The changes include creating a new Docker image, defining dependencies with depset, and refactoring the tests into a new file. The CI configuration is updated correctly to isolate these tests. My only suggestion is to remove a duplicated test case in the new test file to improve maintainability.
def test_from_tf(ray_start_regular_shared_2_cpus):
    import tensorflow as tf
    import tensorflow_datasets as tfds

    tf_dataset = tfds.load("mnist", split=["train"], as_supervised=True)[0]
    tf_dataset = tf_dataset.take(8)  # Use subset to make test run faster.

    ray_dataset = ray.data.from_tf(tf_dataset)

    actual_data = extract_values("item", ray_dataset.take_all())
    expected_data = list(tf_dataset)
    assert len(actual_data) == len(expected_data)
    for (expected_features, expected_label), (actual_features, actual_label) in zip(
        expected_data, actual_data
    ):
        tf.debugging.assert_equal(expected_features, actual_features)
        tf.debugging.assert_equal(expected_label, actual_label)
The test test_from_tf appears to be redundant. Its logic is fully contained within test_from_tf_e2e, which performs the same data validation checks plus additional assertions on the execution plan and usage records. To reduce code duplication and improve maintainability, consider removing test_from_tf.
@pytest.mark.skipif(
    sys.version_info >= (3, 12),
    reason="Skip due to incompatibility tensorflow with Python 3.12+",
)
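The guard above works because `sys.version_info` supports lexicographic tuple comparison. A minimal runnable sketch of the same pattern (the marker alias `requires_pre_312` and the test name are illustrative, not from the PR):

```python
import sys

import pytest

# Same guard as in the PR: skip on Python 3.12+, where the pinned
# tensorflow build is incompatible. (Alias name is hypothetical.)
requires_pre_312 = pytest.mark.skipif(
    sys.version_info >= (3, 12),
    reason="Skip due to incompatibility tensorflow with Python 3.12+",
)


@requires_pre_312
def test_placeholder():
    # Collected but skipped entirely on Python 3.12+.
    assert True


if __name__ == "__main__":
    # Tuple comparison is element-wise, so (3, 11, 4) < (3, 12)
    # while (3, 12, 0) >= (3, 12).
    print((3, 11, 4) >= (3, 12), (3, 12, 0) >= (3, 12))
```

Defining the marker once as a module-level alias keeps the skip condition in a single place if several tfds tests need the same gate.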
maybe just remove this entire test. it is a duplicate.
merging this without @ray-project/ray-data approval; forgive me
using depset for data tfds ci tests