Skip to content

Conversation

@syedshazli
Copy link

In random.py, there are several functions which have a TODO to remove the functions due to being unused by megatron-lm. However, these functions still have not been removed.

In this PR, I removed the unused functions named 'init_checkpointed_activations_memory_buffer' and 'reset_checkpointed_activations_memory_buffer', as well as the global variable '_CHECKPOINTED_ACTIVATIONS_MEMORY_BUFFER'

@syedshazli
Copy link
Author

cc @crcrpar

@syedshazli
Copy link
Author

gentle ping @crcrpar

@crcrpar
Copy link
Collaborator

crcrpar commented Dec 23, 2025

I think it'd be better to remove apex.transformer rather, following the existing deprecation warning --

deprecated_warning(
"`apex.transformer` is deprecated and will be removed in September 2025. "
"We encourage you to migrate to Megatron Core. "
"It is available on PyPI at https://pypi.org/project/megatron-core/ "
"and its documentation can be found at https://docs.nvidia.com/megatron-core/developer-guide/latest/index.html."
)

@syedshazli
Copy link
Author

Ah okay that makes sense! I'll make some commits on my branch to just remove apex transformer

@crcrpar
Copy link
Collaborator

crcrpar commented Dec 23, 2025

remember to remove the relevant tests as well

@syedshazli
Copy link
Author

@crcrpar I just made the changes needed to deprecate transformer.

I removed the whole transformer folder, as well as one reference to the transformer module in the apex/apex directory. I made sure any tests referencing transformer were deleted, and also modified the test harness to not search for the transformer directory to run tests from.

Feel free to check and provide your thoughts! The branch is upstream with main as well

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request removes unused functions and test files from the apex.transformer module. The main changes involve:

  • Removing two unused functions (init_checkpointed_activations_memory_buffer and reset_checkpointed_activations_memory_buffer) and a global variable (_CHECKPOINTED_ACTIVATIONS_MEMORY_BUFFER) from random.py
  • Completely removing the entire apex.transformer module including all test files and supporting code
  • Removing the transformer test directory from the test runner configuration

Reviewed changes

Copilot reviewed 66 out of 71 changed files in this pull request and generated no comments.

Show a summary per file
File Description
apex/transformer/tensor_parallel/random.py Deleted entire file containing the unused functions mentioned in PR description
apex/transformer/tensor_parallel/init.py Deleted file that exported the removed functions
apex/transformer/tensor_parallel/* Deleted all other tensor parallel implementation files
apex/transformer/testing/* Deleted all testing utilities and test infrastructure
apex/transformer/utils.py Deleted utility functions file
tests/L0/run_transformer/* Deleted all transformer-related test files
tests/L1/transformer/* Deleted L1 transformer test file
tests/L0/run_test.py Removed "run_transformer" from default test directories

Note: The scope of this PR appears to be much larger than described. The PR description mentions removing only 2 functions and 1 global variable from random.py, but the actual changes delete the entire apex.transformer module and all associated tests. This is a breaking change that removes significant functionality from the codebase.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@syedshazli
Copy link
Author

@crcrpar gentle ping, hope you had a good new year! let me know if you have any feedback or if it's good to go

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are this and other contrib tests removed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed tests under contrib that still referenced apex.transformer, and since I just deprecated it this would cause errors. Would you like me to just keep the contrib tests and skip them?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on which submodules of apex.transformer are referenced; if it's testing utilities then the utilities should be moved somewhere and the tests depending on it should be kept

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh okay I see I'll go ahead and fix that then, thanks

@syedshazli syedshazli changed the title Remove Unused Functions in apex\apex\transformer\tensor_parallel\random.py Deprecate apex/transformer Jan 8, 2026
logging.getLogger("apex").setLevel(logging.WARNING)


class TransformerUtilsTest(NcclDistributedTestBase):
Copy link
Author

@syedshazli syedshazli Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, consider bringing this test back and the tensor paralell/paralell state utils, as the test is specifically testing transformer utils

if self.world_size % tensor_model_parallel_world_size:
continue
msg = f"tensor_model_parallel_world_size: {tensor_model_parallel_world_size}"
parallel_state.initialize_model_parallel(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guessing it's fine to remove here since we're on deprecated behavior, not utils

@syedshazli
Copy link
Author

Update: Contrib tests were referencing utils from apex/transformer, so I moved the utils to a new directory named apex/distributed_testing (since the utils referenced were related to NCCL based distributed testing)

Reviewing the L0 and L1 tests removed again to see if any should be brought back

@syedshazli
Copy link
Author

@crcrpar I was able to bring back tests under contrib because they only used utils related to distributed test base classes. I moved the utilities to a new folder, and those contrib tests work now.

However, there are some L0 and L1 tests related to transformers that use utils but also reference some of the core functionality from apex/transformer. An example is L0/run_transfomrer/test_bert_minimal.py.

The file uses the UccDistributedTestBase and NcclDistributedTestBase classes for utils, but then tests apex/transformer based functionality referring to BERT.

In the case where utils and core transformer functionality is being tested, would you like those tests removed, or modified to still test the utils?

@crcrpar
Copy link
Collaborator

crcrpar commented Jan 10, 2026

apex.transformer and tests for it should be removed.

However, there are some L0 and L1 tests related to transformers that use utils but also reference some of the core functionality from apex/transformer. An example is L0/run_transfomrer/test_bert_minimal.py.

This PR removes test_bert_minimal.py, right?

@syedshazli
Copy link
Author

@crcrpar

Yes, it removes that tests and all other tests in L0 and L1 that refer to apex.transformer, since they test core functionality.

Feel free to take a look at the PR now, I've updated it to move the utils needed by contrib tests to a new folder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants