Skip to content

Conversation

@elliot-barn
Copy link
Contributor

using depset for data tfds ci tests

Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
@elliot-barn elliot-barn requested review from a team as code owners December 10, 2025 19:26
Signed-off-by: elliot-barn <[email protected]>
@elliot-barn elliot-barn added the go add ONLY when ready to merge, run all tests label Dec 10, 2025
@elliot-barn elliot-barn changed the title [ci][data] depset for tfds tests [ci][data] tensorflow-datasets tests: depset for tfds tests (3/3) Dec 10, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request successfully sets up a dedicated CI job for tensorflow-datasets tests on Python 3.12. The changes include creating a new Docker image, defining dependencies with depset, and refactoring the tests into a new file. The CI configuration is updated correctly to isolate these tests. My only suggestion is to remove a duplicated test case in the new test file to improve maintainability.

Comment on lines 12 to 29
def test_from_tf(ray_start_regular_shared_2_cpus):
import tensorflow as tf
import tensorflow_datasets as tfds

tf_dataset = tfds.load("mnist", split=["train"], as_supervised=True)[0]
tf_dataset = tf_dataset.take(8) # Use subset to make test run faster.

ray_dataset = ray.data.from_tf(tf_dataset)

actual_data = extract_values("item", ray_dataset.take_all())
expected_data = list(tf_dataset)
assert len(actual_data) == len(expected_data)
for (expected_features, expected_label), (actual_features, actual_label) in zip(
expected_data, actual_data
):
tf.debugging.assert_equal(expected_features, actual_features)
tf.debugging.assert_equal(expected_label, actual_label)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The test test_from_tf appears to be redundant. Its logic is fully contained within test_from_tf_e2e, which performs the same data validation checks plus additional assertions on the execution plan and usage records. To reduce code duplication and improve maintainability, consider removing test_from_tf.

@pytest.mark.skipif(
sys.version_info >= (3, 12),
reason="Skip due to incompatibility tensorflow with Python 3.12+",
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just remove this entire test. it is a duplicate.

Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
@ray-gardener ray-gardener bot added data Ray Data-related issues devprod labels Dec 11, 2025
@elliot-barn elliot-barn requested a review from aslonnie December 11, 2025 23:08
@elliot-barn elliot-barn requested a review from aslonnie December 12, 2025 18:20
@aslonnie
Copy link
Collaborator

merging this without @ray-project/ray-data approval; forgive me

@aslonnie aslonnie merged commit 663eff1 into master Dec 12, 2025
6 checks passed
@aslonnie aslonnie deleted the elliot-barn/depset-for-tfds-tests branch December 12, 2025 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues devprod go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants