-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Deprecate apex/transformer #1968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
cc @crcrpar |
|
gentle ping @crcrpar |
|
I think it'd be better to remove apex.transformer rather, following the existing deprecation warning -- apex/apex/transformer/parallel_state.py Lines 213 to 218 in db8e053
|
|
Ah okay that makes sense! I'll make some commits on my branch to just remove apex transformer |
|
remember to remove the relevant tests as well |
for more information, see https://pre-commit.ci
…li/apex into Remove_Unused_Functions
|
@crcrpar I just made the changes needed to deprecate transformer. I removed the whole transformer folder, as well as one reference to the transformer module in the apex/apex directory. I made sure any tests referencing transformer were deleted, and also modified the test harness to not search for the transformer directory to run tests from. Feel free to check and provide your thoughts! The branch is upstream with main as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request removes unused functions and test files from the apex.transformer module. The main changes involve:
- Removing two unused functions (
init_checkpointed_activations_memory_bufferandreset_checkpointed_activations_memory_buffer) and a global variable (_CHECKPOINTED_ACTIVATIONS_MEMORY_BUFFER) fromrandom.py - Completely removing the entire
apex.transformermodule including all test files and supporting code - Removing the transformer test directory from the test runner configuration
Reviewed changes
Copilot reviewed 66 out of 71 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| apex/transformer/tensor_parallel/random.py | Deleted entire file containing the unused functions mentioned in PR description |
| apex/transformer/tensor_parallel/init.py | Deleted file that exported the removed functions |
| apex/transformer/tensor_parallel/* | Deleted all other tensor parallel implementation files |
| apex/transformer/testing/* | Deleted all testing utilities and test infrastructure |
| apex/transformer/utils.py | Deleted utility functions file |
| tests/L0/run_transformer/* | Deleted all transformer-related test files |
| tests/L1/transformer/* | Deleted L1 transformer test file |
| tests/L0/run_test.py | Removed "run_transformer" from default test directories |
Note: The scope of this PR appears to be much larger than described. The PR description mentions removing only 2 functions and 1 global variable from random.py, but the actual changes delete the entire apex.transformer module and all associated tests. This is a breaking change that removes significant functionality from the codebase.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@crcrpar gentle ping, hope you had a good new year! let me know if you have any feedback or if it's good to go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are this and other contrib tests removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed tests under contrib that still referenced apex.transformer, and since I just deprecated it this would cause errors. Would you like me to just keep the contrib tests and skip them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on which submodules of apex.transformer are referenced; if it's testing utilities then the utilities should be moved somewhere and the tests depending on it should be kept
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh okay I see I'll go ahead and fix that then, thanks
for more information, see https://pre-commit.ci
| logging.getLogger("apex").setLevel(logging.WARNING) | ||
|
|
||
|
|
||
| class TransformerUtilsTest(NcclDistributedTestBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, consider bringing this test back and the tensor paralell/paralell state utils, as the test is specifically testing transformer utils
| if self.world_size % tensor_model_parallel_world_size: | ||
| continue | ||
| msg = f"tensor_model_parallel_world_size: {tensor_model_parallel_world_size}" | ||
| parallel_state.initialize_model_parallel( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guessing it's fine to remove here since we're on deprecated behavior, not utils
|
Update: Contrib tests were referencing utils from apex/transformer, so I moved the utils to a new directory named apex/distributed_testing (since the utils referenced were related to NCCL based distributed testing) Reviewing the L0 and L1 tests removed again to see if any should be brought back |
|
@crcrpar I was able to bring back tests under contrib because they only used utils related to distributed test base classes. I moved the utilities to a new folder, and those contrib tests work now. However, there are some L0 and L1 tests related to transformers that use utils but also reference some of the core functionality from apex/transformer. An example is L0/run_transfomrer/test_bert_minimal.py. The file uses the UccDistributedTestBase and NcclDistributedTestBase classes for utils, but then tests apex/transformer based functionality referring to BERT. In the case where utils and core transformer functionality is being tested, would you like those tests removed, or modified to still test the utils? |
|
This PR removes test_bert_minimal.py, right? |
|
Yes, it removes that tests and all other tests in L0 and L1 that refer to apex.transformer, since they test core functionality. Feel free to take a look at the PR now, I've updated it to move the utils needed by contrib tests to a new folder. |
In random.py, there are several functions which have a TODO to remove the functions due to being unused by megatron-lm. However, these functions still have not been removed.
In this PR, I removed the unused functions named 'init_checkpointed_activations_memory_buffer' and 'reset_checkpointed_activations_memory_buffer', as well as the global variable '_CHECKPOINTED_ACTIVATIONS_MEMORY_BUFFER'