Use rdkit for SSSR and RCs (bug fix + Python upgrade)#2796
Use rdkit for SSSR and RCs (bug fix + Python upgrade)#2796JacksonBurns wants to merge 87 commits intomainfrom
rdkit for SSSR and RCs (bug fix + Python upgrade)#2796Conversation
99bf00c to
8d6fe64
Compare
|
Cantera 2.6 isn't available for Python 3.12, so this PR will also need to upgrade the Cantera version to 3 as mostly completed in #2751 |
|
A note for the path forward on this PR - the |
|
I ended up moving all the functions in |
|
Addressed some of the failures, but still a few nontrivial challenges: TODO list:
The tests are just failing mainly for the aromatic compounds, indicating there's some issue with the implementation of the code. Since everything runs, the code "flow" seems clean, just need to iron out the implementation. |
|
We are now very close to passing all of the unit tests. There are just a couple major bugs to resolve, namely:
|
|
One of the last failing tests is @rwest, since you're the last remaining dev who was involved in some capacity with this part of the code, do you know why we have that assert statement? I would think it's better to just plot a "dot" than to outright crash out. While we're at it, the |
I don't. I can see that if there is no straight chain backbone that a function to
I haven't investigated the full stack trace. But some safeguard, or exception handling, sounds appropriate.
We can try it. I expect RDKit at the time wasn't being helpful. Hopefully the commit messages offer clues? (this is why we should write helpful commit messages that explain why things are being done). |
|
Is this related? |
regarding the draw unit test@rwest yes, thanks, #2744 looks relevant. The test that's failing is for this charged species: maybe failing due to the missing X-O connectivity, and so no backbone? Edit: I tried modifying the connectivity s.t. there is an apparent backbone. But, I still run into this problem even after changing the adjacency list to So it is more fundamentally a problem with the species , than a unit-test specific one. Note about sanitizationFor posterity: some molecules that fail the first kekulization step include: As implemented in this PR, the |
|
Regarding the The test case is drawn as a cyclic molecule, even though it's not truly "cyclic". So it should take the However, it thinks that So, how to proceed? a few options...
|
|
I think using RDKit for drawing as much as possible is fine, if it does a good job. Probably we ran into issues in the past that may not persist with more recent versions of RDKit, so it's fine to revisit past decisions. We should continue to put reasons for things in commit messages, to make life easier for future developers (our future selves included). |
|
OK. Maybe we can incorporate the fix into #2838 and then rebase onto |
In ReactionMechanismGenerator#2744 and ReactionMechanismGenerator#2796 it was found that charge-separated bidentate species can have issues due to ring perception conflicts. The previous implementation also by default did not use the rdkit backend for charged species, but this was decided many years ago (~10 years!) In the meantime, RDKit conformer generation has improved and likely this we can just use RDKit by default, which would avoid the pesky edge-case issues for ions/zwitterions. In case the old behavior is desired, use_rdkit can be set to False.
In ReactionMechanismGenerator#2744 and ReactionMechanismGenerator#2796 it was found that charge-separated bidentate species can have issues due to ring perception conflicts. The previous implementation also by default did not use the rdkit backend for charged species, but this was decided many years ago (~10 years!) In the meantime, RDKit conformer generation has improved and likely this we can just use RDKit by default, which would avoid the pesky edge-case issues for ions/zwitterions. In case the old behavior is desired, use_rdkit can be set to False.
7598c98 to
019362c
Compare
In ReactionMechanismGenerator#2744 and ReactionMechanismGenerator#2796 it was found that charge-separated bidentate species can have issues due to ring perception conflicts. The previous implementation also by default did not use the rdkit backend for charged species, but this was decided many years ago (~10 years!) In the meantime, RDKit conformer generation has improved and likely this we can just use RDKit by default, which would avoid the pesky edge-case issues for ions/zwitterions. In case the old behavior is desired, use_rdkit can be set to False.
6e6490a to
e92bec0
Compare
In #2744 and #2796 it was found that charge-separated bidentate species can have issues due to ring perception conflicts. The previous implementation also by default did not use the rdkit backend for charged species, but this was decided many years ago (~10 years!) In the meantime, RDKit conformer generation has improved and likely this we can just use RDKit by default, which would avoid the pesky edge-case issues for ions/zwitterions. In case the old behavior is desired, use_rdkit can be set to False.
|
With the merger of e92bec0 this PR's CI should now pass - let's see! |
|
@jonwzheng the overnight test failures look spurious - re-running them now. |
If you had remove_h = True and sanitize="partial", it would try passing that along to RDKit, which would not expect it. Now we remove H's (without sanitizing) before possibly doing the sanitizing.
So you don't need to create an instance of Fragment() just in order to call the method (seeing as that seems to be how it is mostly used.) Probably it should be a module level function, but this is a minimal change.
Now that we use RDKit to detect rings, we create a lot of rdkit molecules in which we don't care about the bond orders, sometimes we have non-integer or weird bond orders, and we also don't want to waste time sanitizing them since all we care about is connectivity of the molecular graph. So if you know you're calling to_rdkit_mol in this setting, call it with ignore_bond_orders=True and sanitize=False.
This should save complication and time. Also, don't sanitize. All we're going to ask it for is ring information. For that all bonds (that we give it) count. If we want it to ignore H-bonds, or vdW bonds, for some scenarios, then we should figure that out. But often we use this ring detection code for things like drawing, which does want to use those bonds.
This reverts commit 55f8fc4. We should, during ring detection, now be not attempting to convert bond orders into strings, so this warning should not be triggered. We can restore the debug level if this becomes too noisy.
We do a call to detect_cutting_label (which is a complicated function to find them buried inside SMILES strings) passing it the label of every atom. (Though maybe these are mostly ''?). This commit checks whether a simpler approach would work: just checking if the atom.label is either 'R' or 'L'. If that's the case, we can then make the change.
As far as I can tell, cutting labels can only be "L" or "R". I had a previous commit that checked this is equivalent, and by running the unit tests and regression tests could not find a counterexample. This check is much simpler, and about 200 times faster, than the previous call to detect_cutting_label. Also simplified the dictionary structure.
…rnings. Previously, if you call to_rdkit_mol on a Fragment, it would always return the mapping, but if you called it on a Molecule, it would by default not. Now if you call it with return_mapping=False (or not specified) then it doesn't return the mapping, and if you call it with return_mapping=True, then it does. Internally, it calls converter.to_rdkit_mol with return_mapping=True, but doesn't pass it back unless asked. Now the behavior of thing.to_rdkit_mol() is the same whether thing is a Molecule or a Fragment (which is a subclass of Molecule). Should reduce number of warnings, eg. when drawing fragments. Also cleaned up the code a bit, and added some docstrings,
…_rdkit_mol The latest change of to_rdkit_mol means that both Molecule and Fragment return the same signature, and the calling code doesn't need to cope with unexpected tuples.
The comment suggests it was going through Geometry in order to save the mapping, but this can be done with a direct call to to_rdkit_mol using return_mapping=True.
- cache the result of the call to rdkit to avoid remaking the molecule and running the decomposition - use GetSSSR only and deprecate other methods (and do so following conventions correctly) - add a test for the OPTIONAL symmetric SSSR passthrough
68fbf96 to
1aa6de5
Compare
|
RMS install fails on 3.10 and 3.11 because of something to do with PythonCall and RMS |
Based on this: The solution may be to increment Julia version to 1.10 or newer, which would allow us to use the latest versions of PythonCall (that support more versions of Python), and is the minimum supported version according to PythonCall docs. |
…nCall version setting
|
On second thought, Julia actually already is 1.10 - the problem appears to be that when we install |
|
@copilot - the changes in this pull request are mostly complete, except that these two tests are failing: FAILED test/rmgpy/data/thermoTest.py::TestThermoDatabase::test_thermo_estimation_not_affect_database - AssertionError: assert 'PolycyclicRing' in []
FAILED test/rmgpy/data/thermoTest.py::TestCyclicThermo::test_thermo_for_monocyclic_and_polycyclic_same_molecule - assert 0 == 1
+ where 0 = len([])There's something wrong with the detection of polycycles, though the detection of the smallest set of smallest rings appears correct. Can you figure out what is causing this issue, and fix things so that these last two unit tests pass? |

Currently we use
RingDecomposerLibfor finding the Smallest Set of Smallest Rings and getting the Relevant Cycles. This package does not support Python 3.10+ and is thus blocking further upgrades to RMG.@KnathanM in particular is looking to get RMG to Python 3.11 so as to add support for ChemProp v2.
I believe we can just use RDKit to do these operations instead. The original paper mentions that the functionality was being moved upstream to RDKit. With the help of AI I've taken just a first pass at reimplementing, with the special note that:
get_deterministic_sssris not really deterministic #2562This PR will be a draft for now, as it is predicated on Python 3.9 already being available (which it nearly is in #2741)
Motivation or Problem
A clear and concise description of what what you're trying to fix or improve. Please reference any issues that this addresses.
Description of Changes
A clear and concise description of what what you've changed or added.
Testing
A clear and concise description of testing that you've done or plan to do.
Reviewer Tips
Suggestions for verifying that this PR works or other notes for the reviewer.