General Multi-turn FrozenLake #429

pan-x-c · 2025-12-04T10:58:33Z

Description

As the title says

Checklist

Please check the following items before the code is ready for review.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

gemini-code-assist · 2025-12-04T10:58:52Z

Summary of Changes

Hello @pan-x-c, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive example of a multi-turn FrozenLake agent built on the AgentScope framework. The changes include a specialized agent, a customized environment, and supporting utilities, all configured through a dedicated YAML file. The goal is to show how an AI agent can navigate and solve the FrozenLake problem within a structured agent framework.

Highlights

New FrozenLake Example: Introduced a new example demonstrating a multi-turn FrozenLake agent using the Agentscope framework, complete with a dedicated README.
Custom Agent Implementation: Added a FrozenLakeAgent class that uses AgentScope's ReActAgent to interpret observations, generate prompts, and select actions within the FrozenLake environment.
Enhanced Environment Definition: Created a custom FrozenLakeEnv class, extending Gymnasium's FrozenLake, which includes improved rendering capabilities, custom action mapping, and detailed state management.
Configuration File: Included a YAML configuration file (frozenlake_agent.yaml) to define the parameters for the FrozenLake agent, covering algorithm, model, cluster, buffer, explorer, trainer, and synchronizer settings.
Utility Functions: Provided utility functions in utils.py for generating valid random FrozenLake maps, checking path validity, and defining the system prompt and structured action format for the agent.
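The Highlights list says utils.py generates valid random FrozenLake maps and checks path validity, but the PR's code is not reproduced on this page. As a rough illustration only, here is a minimal sketch of that idea, assuming Gymnasium's text-map encoding (S start, F frozen, H hole, G goal) and 4-directional movement. The function names `has_valid_path` and `generate_valid_map` are hypothetical and not taken from the PR.

```python
from __future__ import annotations

import random
from collections import deque
from typing import Optional

# Assumed tile codes (Gymnasium-style text map):
# S = start, F = frozen (safe), H = hole, G = goal.
MOVES = ((0, -1), (1, 0), (0, 1), (-1, 0))  # left, down, right, up


def has_valid_path(grid: list[str]) -> bool:
    """Return True if G is reachable from S without stepping on a hole (BFS)."""
    n_rows, n_cols = len(grid), len(grid[0])
    start = next(
        (r, c) for r in range(n_rows) for c in range(n_cols) if grid[r][c] == "S"
    )
    queue = deque([start])
    seen = {start}
    while queue:
        r, c = queue.popleft()
        if grid[r][c] == "G":
            return True
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < n_rows and 0 <= nc < n_cols
            if in_bounds and (nr, nc) not in seen and grid[nr][nc] != "H":
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def generate_valid_map(
    size: int = 4, p_frozen: float = 0.8, seed: Optional[int] = None
) -> list[str]:
    """Hypothetical helper: redraw random maps until one has a start-to-goal path."""
    if size < 2:
        raise ValueError("size must be at least 2 so S and G are distinct tiles")
    rng = random.Random(seed)
    while True:
        # Each tile is frozen with probability p_frozen, otherwise a hole.
        cells = [
            ["F" if rng.random() < p_frozen else "H" for _ in range(size)]
            for _ in range(size)
        ]
        cells[0][0] = "S"    # start in the top-left corner
        cells[-1][-1] = "G"  # goal in the bottom-right corner
        grid = ["".join(row) for row in cells]
        if has_valid_path(grid):
            return grid


if __name__ == "__main__":
    print("\n".join(generate_valid_map(4, seed=0)))
```

Rejection sampling keeps the generator simple: maps are redrawn until a breadth-first search confirms the goal is reachable. With a low `p_frozen` on a large grid this loop may need many attempts, so a production version might cap the number of retries.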


gemini-code-assist

Code Review

This pull request introduces a new example of a multi-turn FrozenLake agent using the AgentScope framework. The implementation is spread across several new files, including the agent logic, environment wrapper, workflow, and utilities. While this is a valuable addition, I've identified a critical bug in the agent's step counting logic and a high-severity issue with how it handles invalid actions, both of which would prevent the agent from functioning correctly. I've also pointed out several medium-severity issues related to rendering, documentation, and code clarity. Please address these points to ensure the example is robust and correct.
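The individual review comments on agent.py, env.py, and workflow.py are collapsed on this page, so the exact step-counting bug and the proposed fix are not visible here. The sketch below only illustrates one defensive pattern for those two concerns and is not the PR's FrozenLakeAgent. It assumes Gymnasium's FrozenLake action indices (0 = left, 1 = down, 2 = right, 3 = up); `parse_action` and `EpisodeCounter` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Gymnasium FrozenLake discrete actions: 0=left, 1=down, 2=right, 3=up.
ACTION_TO_INDEX = {"left": 0, "down": 1, "right": 2, "up": 3}


def parse_action(text: str) -> Optional[int]:
    """Map a model reply such as ' Up ' to an action index, or None if unrecognized."""
    return ACTION_TO_INDEX.get(text.strip().lower())


@dataclass
class EpisodeCounter:
    """Hypothetical per-episode bookkeeping, not the PR's implementation."""

    max_steps: int
    steps: int = 0
    invalid_actions: int = 0

    def record(self, action: Optional[int]) -> bool:
        """Count one agent turn; return True while the step budget remains."""
        self.steps += 1  # every turn consumes budget, valid or not
        if action is None:
            # Invalid replies are tallied instead of being sent to the env.
            self.invalid_actions += 1
        return self.steps < self.max_steps


if __name__ == "__main__":
    counter = EpisodeCounter(max_steps=3)
    for reply in ["left", "???", "down"]:
        keep_going = counter.record(parse_action(reply))
    print(counter)
```

Counting every turn, including unparseable replies, guarantees the episode terminates even if the model never produces a valid move. Whether invalid actions should also carry a reward penalty is a separate design choice not covered by this sketch.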

examples/agentscope_frozenlake/agent.py

examples/agentscope_frozenlake/env.py

examples/agentscope_frozenlake/workflow.py

pan-x-c · 2025-12-09T05:43:56Z

/unittest-all

github-actions · 2025-12-09T07:10:02Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
210	204	3	3	0	0	1h 24m

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode	The test failed in the call phase due to an exception
❌ tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	The test failed in the call phase
❌ tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	The test failed in the call phase

Skipped

Tests	Status
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter	skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	skipped ⏭️

Tests

Test Name	Status	Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_std_grpo	✅	40ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_step_wise_grpo_advantage	✅	2ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo	✅	5ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage	✅	3ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias	✅	2ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage	✅	2ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_with_std_threshold	✅	2ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_abs_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_fallback	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_loss	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_same_policy	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_with_old_logprob	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_dummy_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k1_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k2_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k3_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_kl_loss_aggregation_modes	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_low_var_kl_fn	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss_with_sequence_masking	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sapo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	✅	25.4s
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation	✅	16.2s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer	✅	4.1s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft	✅	5.9s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo	✅	6.8s
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	✅	157ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	✅	4.3s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter	✅	527ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter	✅	450ms
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter	✅	876ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter	✅	969ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter	✅	711ms
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter	✅	218ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	9.1s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	4.8s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control	✅	7.0s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	5.8s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	6.1s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	6.7s
tests/buffer/reader_test.py::TestBufferReader::test_buffer_reader_registration	✅	614ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage	✅	8ms
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write	✅	4.1s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_task_buffer_read_write	✅	4.4s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0	✅	92ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1	✅	71ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2	✅	113ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3	✅	111ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4	✅	112ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5	✅	116ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6	✅	131ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple	✅	58ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0_file	✅	73ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1_sql	✅	4.2s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2_file	✅	53ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3_sql	✅	4.2s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4_file	✅	52ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5_sql	✅	4.8s
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode	❌	45.3s
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command	✅	6.3s
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc	✅	1.4s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command	✅	326ms
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run	✅	1.7s
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	33.9s
tests/common/config_test.py::TestConfig::test_chat_template_path	✅	94ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	42ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	192ms
tests/common/config_test.py::TestConfig::test_default_workflow	✅	93ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	3.2s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly	✅	99ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation	✅	94ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster	✅	362ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	16ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	53.4s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	31.8s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	43.0s
tests/common/vllm_test.py::TestModelLen_0::test_model_len	✅	15.5s
tests/common/vllm_test.py::TestModelLen_1::test_model_len	✅	15.4s
tests/common/vllm_test.py::TestModelLen_2::test_model_len	✅	15.2s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len	✅	15.6s
tests/common/vllm_test.py::TestAPIServer::test_api	✅	21.1s
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api	✅	16.1s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	✅	21.0s
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	256ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	238ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	17.8s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	15.7s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate	✅	3m 7s
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	1m 15s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer	✅	1m 43s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	3m 37s
tests/explorer/explorer_test.py::ServeTest::test_serve	✅	1m 22s
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	✅	12.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	12.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout	✅	20.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	27.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0	✅	12.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1	✅	12.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0	✅	12.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1	✅	12.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	12.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	12.7s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait	✅	16.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	22.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	24.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	15.7s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	32.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	15.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	21.3s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection	✅	17.3s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0	✅	2ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1	✅	602ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0	✅	2ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1	✅	1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	35ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	25ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	691ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	4ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	14ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	8ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	108ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1	✅	101ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1	✅	202ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow	✅	14.6s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow	✅	14.9s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording	✅	4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter	⏭️	1ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner	✅	301ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state	✅	8.1s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai	✅	16.8s
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	✅	1m 6s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	✅	2m 16s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	✅	2m 4s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	✅	2m 45s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	✅	2m 58s
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	❌	38.5s
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	❌	38.1s
tests/service/data_juicer_test.py::TestDataJuicer::test_config	✅	1.8s
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start	✅	21.6s
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators	✅	32.0s
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline	✅	14.1s
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	✅	3m 31s
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer	✅	4m 35s
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	1m 31s
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	✅	1m 24s
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	✅	1m 23s
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	✅	1m 23s
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	✅	1m 33s
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	2m 30s
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	1m 2s
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	58.5s
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	✅	59.5s
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	✅	1m 53s
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	✅	1m 51s
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode	✅	2m 37s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer	✅	2m 17s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer	✅	4m 26s
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	✅	2m 36s
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	⏭️	810ms
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	⏭️	807ms
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer	✅	3m 51s
tests/trainer/trainer_test.py::TestOverRollout::test_trainer	✅	1m 18s
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer	✅	1m 11s
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_equivalent	✅	15ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_not_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_ground_truth	✅	2ms
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_solution_string	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_multiple_boxed_answers_in_solution	✅	2ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_not_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_not_boxed	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_raw_and_ground_truth_boxed_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer	✅	4ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer	✅	75ms
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv	✅	6ms
tests/utils/log_test.py::LogTest::test_actor_log	✅	5.2s
tests/utils/log_test.py::LogTest::test_group_by_node	✅	4.9s
tests/utils/log_test.py::LogTest::test_no_actor_log	✅	906ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_0__workspace_tests_utils_plugins	✅	99ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_1_tests_utils_plugins	✅	95ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_0__workspace_tests_utils_plugins	✅	22.3s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_1_tests_utils_plugins	✅	21.6s
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_0__workspace_tests_utils_plugins	✅	12.0s
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_1_tests_utils_plugins	✅	11.7s
tests/utils/registry_test.py::TestRegistry::test_dynamic_import	✅	4.1s

Github Test Reporter by CTRF 💚

pan-x-c · 2025-12-09T07:50:45Z

/unittest-module-cli

pan-x-c · 2025-12-09T07:50:54Z

/unittest-module-manager

github-actions · 2025-12-09T07:54:11Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
5	5	0	0	0	0	1m 23s

Tests

Test Name	Status	Duration
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode	✅	55.3s
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command	✅	6.8s
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc	✅	1.4s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command	✅	315ms
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run	✅	1.7s


github-actions · 2025-12-09T08:10:22Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
7	7	0	0	0	0	13m 47s

Tests

Test Name	Status	Duration
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	✅	1m 7s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	✅	1m 50s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	✅	1m 53s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	✅	2m 35s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	✅	2m 32s
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	✅	1m 44s
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	✅	1m 45s


examples/agentscope_frozenlake/README.md

pan-x-c · 2025-12-10T14:00:18Z

/unittest-all

github-actions · 2025-12-10T15:27:23Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
210	205	2	3	0	0	1h 24m

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	The test failed in the call phase
❌ tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	The test failed in the call phase

Skipped

Tests	Status
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter	skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	skipped ⏭️

Tests

Test Name	Status	Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_std_grpo	✅	41ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_step_wise_grpo_advantage	✅	2ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo	✅	5ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage	✅	3ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias	✅	2ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage	✅	2ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_with_std_threshold	✅	2ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_abs_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_fallback	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_loss	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_same_policy	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_with_old_logprob	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_dummy_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k1_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k2_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k3_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_kl_loss_aggregation_modes	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_low_var_kl_fn	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss_with_sequence_masking	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sapo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	✅	25.8s
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation	✅	16.0s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer	✅	3.9s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft	✅	6.2s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo	✅	6.7s
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	✅	157ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	✅	4.5s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter	✅	519ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter	✅	469ms
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter	✅	883ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter	✅	963ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter	✅	716ms
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter	✅	218ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	8.9s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	5.1s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control	✅	6.8s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	6.0s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	6.0s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	6.6s
tests/buffer/reader_test.py::TestBufferReader::test_buffer_reader_registration	✅	613ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage	✅	6ms
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write	✅	3.9s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_task_buffer_read_write	✅	4.4s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0	✅	95ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1	✅	71ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2	✅	110ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3	✅	111ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4	✅	111ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5	✅	116ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6	✅	131ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple	✅	59ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0_file	✅	73ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1_sql	✅	4.2s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2_file	✅	52ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3_sql	✅	4.1s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4_file	✅	52ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5_sql	✅	4.5s
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode	✅	49.2s
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command	✅	6.4s
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc	✅	1.4s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command	✅	323ms
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run	✅	1.8s
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	34.0s
tests/common/config_test.py::TestConfig::test_chat_template_path	✅	96ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	42ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	194ms
tests/common/config_test.py::TestConfig::test_default_workflow	✅	97ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	3.8s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly	✅	96ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation	✅	95ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster	✅	359ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	16ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	53.4s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	32.3s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	42.2s
tests/common/vllm_test.py::TestModelLen_0::test_model_len	✅	15.7s
tests/common/vllm_test.py::TestModelLen_1::test_model_len	✅	15.5s
tests/common/vllm_test.py::TestModelLen_2::test_model_len	✅	15.8s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len	✅	15.5s
tests/common/vllm_test.py::TestAPIServer::test_api	✅	20.4s
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api	✅	15.5s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	✅	21.1s
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	246ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	253ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	17.7s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	16.0s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate	✅	3m 7s
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	1m 37s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer	✅	1m 44s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	3m 40s
tests/explorer/explorer_test.py::ServeTest::test_serve	✅	1m 20s
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	✅	12.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	11.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout	✅	20.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	27.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0	✅	12.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1	✅	12.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0	✅	12.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1	✅	12.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	12.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	12.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait	✅	16.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	22.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	23.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	15.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	32.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	15.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	21.3s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection	✅	17.9s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0	✅	2ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1	✅	603ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1	✅	1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	15ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	24ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	265ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	6ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	17ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	10ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	110ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1	✅	101ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1	✅	201ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow	✅	14.4s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow	✅	14.6s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording	✅	4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter	⏭️	1ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner	✅	300ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state	✅	8.1s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai	✅	16.4s
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	✅	1m 5s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	✅	2m 16s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	✅	2m 22s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	✅	2m 46s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	✅	2m 48s
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	❌	38.2s
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	❌	38.7s
tests/service/data_juicer_test.py::TestDataJuicer::test_config	✅	1.7s
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start	✅	21.6s
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators	✅	31.4s
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline	✅	14.1s
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	✅	3m 30s
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer	✅	4m 39s
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	1m 37s
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	✅	1m 23s
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	✅	1m 23s
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	✅	1m 22s
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	✅	1m 31s
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	2m 31s
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	1m 1s
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	58.3s
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	✅	57.1s
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	✅	1m 51s
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	✅	1m 50s
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode	✅	2m 38s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer	✅	2m 19s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer	✅	4m 20s
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	✅	2m 34s
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	⏭️	810ms
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	⏭️	808ms
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer	✅	3m 51s
tests/trainer/trainer_test.py::TestOverRollout::test_trainer	✅	1m 19s
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer	✅	1m 11s
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_equivalent	✅	15ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_not_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_ground_truth	✅	2ms
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_solution_string	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_multiple_boxed_answers_in_solution	✅	2ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_not_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_not_boxed	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_raw_and_ground_truth_boxed_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer	✅	4ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer	✅	76ms
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv	✅	6ms
tests/utils/log_test.py::LogTest::test_actor_log	✅	5.1s
tests/utils/log_test.py::LogTest::test_group_by_node	✅	4.8s
tests/utils/log_test.py::LogTest::test_no_actor_log	✅	904ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_0__workspace_tests_utils_plugins	✅	100ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_1_tests_utils_plugins	✅	97ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_0__workspace_tests_utils_plugins	✅	21.8s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_1_tests_utils_plugins	✅	22.0s
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_0__workspace_tests_utils_plugins	✅	12.0s
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_1_tests_utils_plugins	✅	11.8s
tests/utils/registry_test.py::TestRegistry::test_dynamic_import	✅	4.3s

Github Test Reporter by CTRF 💚

pan-x-c · 2025-12-11T02:08:43Z

/unittest-module-manager

github-actions · 2025-12-11T02:24:33Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
7	7	0	0	0	0	13m 45s

Tests

Test Name	Status	Duration
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	✅	1m 3s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	✅	1m 48s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	✅	1m 55s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	✅	2m 38s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	✅	2m 31s
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	✅	1m 45s
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	✅	1m 45s

add multiturn frozenlake

2a3f719

gemini-code-assist bot reviewed Dec 4, 2025

hiyuchang reviewed Dec 4, 2025

examples/agentscope_frozenlake/workflow.py (resolved)

pan-x-c added 8 commits December 4, 2025 20:35

fix comments

eb9cb97

fix comments

a56e6b2

change prompt

62877ad

fix pre-commit

0fbfb09

Merge branch 'main' into feature/multi_turn_frozen_lake

95769c6

fix frozenlake agent

114974f

update agentscope dependencies

6ba356e

fix sql storage

3bf1977

pan-x-c added 2 commits December 9, 2025 15:49

fix prepare

d790921

fix pre-commit

f776121

update yaml

a86abd6

pan-x-c changed the title ~~[WIP] General Multi-turn FrozenLake~~ General Multi-turn FrozenLake Dec 9, 2025

pan-x-c added 2 commits December 9, 2025 20:13

update viewer

72daf66

update frozen lake readme

b902418

hiyuchang reviewed Dec 10, 2025

examples/agentscope_frozenlake/README.md (outdated, resolved)

pan-x-c added 3 commits December 10, 2025 19:29

update readme

995a3bc

fix log

dd62717

support trainer occupying partial nodes

267d43e

fix tests

38b2f10

fix config

0bd1fad

chenyushuo reviewed Dec 11, 2025

trinity/explorer/workflow_runner.py (resolved)

chenyushuo reviewed Dec 11, 2025

trinity/common/verl_config.py (resolved)

pan-x-c added 3 commits December 11, 2025 11:21

fix config manager

1a6a038

Merge branch 'main' into feature/multi_turn_frozen_lake

9f5b036

use shorter name

67d8c2a

chenyushuo approved these changes Dec 11, 2025

fix comments

d3a78fa

hiyuchang approved these changes Dec 11, 2025

hiyuchang merged commit f3f0846 into modelscope:main Dec 11, 2025
2 checks passed

3 participants