Commit 51fdac3

Merge 'constraint_dev' into 'main'
update constraint api; update cuequivariance to 0.6.1
See merge request: !102
2 parents dcc5a0a + 2f311af commit 51fdac3

10 files changed

+164
-123
lines changed

Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -21,8 +21,8 @@ RUN pip3 install --no-cache-dir \
     torchaudio==2.7.1
 
 RUN pip3 install --no-cache-dir \
-    cuequivariance-ops-torch-cu12==0.6.0 \
-    cuequivariance-torch==0.6.0
+    cuequivariance-ops-torch-cu12==0.6.1 \
+    cuequivariance-torch==0.6.1
 
 RUN pip3 --no-cache-dir install \
     scipy==1.16.1 \

docs/docker_installation.md

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@
 
 2. Pull the Docker image, which was built based on this [Dockerfile](../Dockerfile)
 ```bash
-docker pull ai4s-cn-beijing.cr.volces.com/infra/protenix:v0.0.6
+docker pull ai4s-cn-beijing.cr.volces.com/infra/protenix:v0.0.6.1
 ```
 
 3. Clone this repository and `cd` into it
@@ -24,7 +24,7 @@
 
 4. Run Docker with an interactive shell
 ```bash
-docker run --gpus all -it -v $(pwd):/workspace -v /dev/shm:/dev/shm ai4s-cn-beijing.cr.volces.com/infra/protenix:v0.0.6 /bin/bash
+docker run --gpus all -it -v $(pwd):/workspace -v /dev/shm:/dev/shm ai4s-cn-beijing.cr.volces.com/infra/protenix:v0.0.6.1 /bin/bash
 ```
 
 After running above commands, you’ll be inside the container’s environment and can execute commands as you would on a normal Linux terminal.

docs/infer_json_format.md

Lines changed: 53 additions & 19 deletions
@@ -222,33 +222,53 @@ The `contact` constraint allows you to specify residue/atom-residue/atom level p
 
 #### contact constraint
 
-The `contact` field consists of a list of dictionaries, each describing one contact. The residues and atoms involved in the contact are now represented as compact lists, making the format more concise and flexible.
+The contact field is a list of dictionaries, each defining a distance constraint between two residues or specific atoms. The format uses explicit, named keys for clarity and flexibility.
 
 ##### Example:
 
 ```json
 "contact": [
     {
-        "residue1": ["1", 1, 169],
-        "atom2": ["2", 1, 1, "C5"],
+        "entity1": 1,
+        "copy1": 1,
+        "position1": 169,
+        "entity2": 2,
+        "copy2": 1,
+        "position2": 1,
+        "atom2": "C5",
         "max_distance": 6,
         "min_distance": 0
     }, // token-contact
     {
-        "atom1": ["1", 1, 169, "CA"],
-        "residue2": ["2", 1, 1],
+        "entity1": 1,
+        "copy1": 1,
+        "position1": 169,
+        "atom1": "CA",
+        "entity2": 2,
+        "copy2": 1,
+        "position2": 1,
         "max_distance": 6,
         "min_distance": 0
     }, // token-contact
     {
-        "residue1": ["1", 1, 169],
-        "residue2": ["2", 1, 1 ],
+        "entity1": 1,
+        "copy1": 1,
+        "position1": 169,
+        "entity2": 2,
+        "copy2": 1,
+        "position2": 1,
         "max_distance": 6,
         "min_distance": 0
     }, // token-contact
     {
-        "atom1": ["1", 1, 169, "CA"],
-        "atom2": ["2", 1, 1, "C5"],
+        "entity1": 1,
+        "copy1": 1,
+        "position1": 169,
+        "atom1": "CA",
+        "entity2": 2,
+        "copy2": 1,
+        "position2": 1,
+        "atom2": "C5",
         "max_distance": 6,
         "min_distance": 3
     }, // atom-contact
@@ -257,11 +277,17 @@ The `contact` field consists of a list of dictionaries, each describing one cont
 ```
 
 Each contact dictionary includes the following keys:
-* `residue1` or `residue2` (list):
-Specifies a **residue** in the format:`[entity_number, copy_index, position]`
+* entity1, copy1, position1 (required)
+Specifies the first residue: entity (entity number), copy (copy index), position (residue index).
 
-* `atom1` or `atom2` (list):
-Specifies an **atom** (commonly from a ligand or another residue) in the format:`[entity_number, copy_index, position, atom_name]`
+* atom1 (optional)
+Name of the specific atom in the first residue (e.g., "CA", "C5"). If omitted, the distance constraint is applied at the token granularity by default, specifically the central atom of the token.
+
+* entity2, copy2, position2 (required)
+Specifies the second residue.
+
+* atom2 (optional)
+Specific atom in the second residue.
 
 * `max_distance` (float):
 The **expected maximum distance** (in Ångströms) between the specified residues or atoms.
@@ -276,21 +302,29 @@ The `pocket` constraint is defined as a dictionary with three keys: `"binder_cha
 
 ```json
 "pocket": {
-    "binder_chain": ["2", 1],
+    "binder_chain":
+    {
+        "entity": 2,
+        "copy": 1
+    },
     "contact_residues": [
-        ["1", 1, 126],
+        {
+            "entity": 1,
+            "copy": 1,
+            "position": 126
+        },
         ...
     ],
     "max_distance": 6
 }
 ```
 
-* `binder_chain` (list):
-Specifies the **binder chain** in the format: `[entity_number, copy_index]`
+* `binder_chain` (dict):
+Specifies the **binder chain** in the format: `{ "entity": <int>, "copy": <int> }`
 
-* `contact_residues` (list of lists):
+* `contact_residues` (list of dict):
 A list of residue that are expected to be in spatial proximity (i.e., in or near the binding pocket). Each residue is specified as:
-`[entity_number, copy_index, position]`
+`{ "entity": <int>, "copy": <int>, "position": <int> }`
 
 * `max_distance` (float):
 The **maximum allowed distance** (in Ångströms) between the binder and the specified contact residues.
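
Input files written against the old list-based layout need to be converted to the named-key layout documented in this diff. The following is a minimal sketch of such a conversion, not part of the commit: the function names `migrate_contact` and `migrate_pocket` are hypothetical, and the sketch assumes the old entries look exactly like the removed lines above (entity numbers given as strings, atom names already strings).

```python
from typing import Any


def migrate_contact(old: dict[str, Any]) -> dict[str, Any]:
    """Convert one old-style contact entry (hypothetical helper).

    Old layout: "residueN": [entity, copy, position] or
                "atomN": [entity, copy, position, atom_name].
    New layout: "entityN", "copyN", "positionN" and optional "atomN".
    """
    new: dict[str, Any] = {}
    for idx in ("1", "2"):
        if f"atom{idx}" in old:
            entity, copy, position, atom = old[f"atom{idx}"]
        elif f"residue{idx}" in old:
            entity, copy, position = old[f"residue{idx}"]
            atom = None
        else:
            raise KeyError(f"entry has neither residue{idx} nor atom{idx}")
        # The old examples quote the entity number; the new ones use an int.
        new[f"entity{idx}"] = int(entity)
        new[f"copy{idx}"] = copy
        new[f"position{idx}"] = position
        if atom is not None:
            new[f"atom{idx}"] = atom
    # Distance bounds are carried over unchanged when present.
    for key in ("max_distance", "min_distance"):
        if key in old:
            new[key] = old[key]
    return new


def migrate_pocket(old: dict[str, Any]) -> dict[str, Any]:
    """Convert an old-style pocket constraint (hypothetical helper)."""
    entity, copy = old["binder_chain"]
    return {
        "binder_chain": {"entity": int(entity), "copy": copy},
        "contact_residues": [
            {"entity": int(e), "copy": c, "position": p}
            for e, c, p in old["contact_residues"]
        ],
        "max_distance": old["max_distance"],
    }
```

Calling `migrate_contact` on the first removed example entry (`"residue1": ["1", 1, 169]`, `"atom2": ["2", 1, 1, "C5"]`) yields the keys shown in the first added entry. A placeholder such as `...` inside `contact_residues` is not handled and would need to be removed first.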

examples/example_constraint_msa.json

Lines changed: 71 additions & 64 deletions
@@ -52,16 +52,16 @@
         "name": "7st3_pocket_1_8",
         "constraint": {
             "pocket": {
-                "binder_chain": [
-                    "2",
-                    1
-                ],
+                "binder_chain": {
+                    "entity": 2,
+                    "copy": 1
+                },
                 "contact_residues": [
-                    [
-                        "1",
-                        1,
-                        69
-                    ]
+                    {
+                        "entity": 1,
+                        "copy": 1,
+                        "position": 69
+                    }
                 ],
                 "max_distance": 8
             }
@@ -95,68 +95,75 @@
9595
"constraint": {
9696
"contact": [
9797
{
98-
"residue1": [
99-
"1",
100-
1,
101-
72
102-
],
103-
"residue2": [
104-
"2",
105-
1,
106-
103
107-
],
98+
"entity1": 1,
99+
"copy1": 1,
100+
"position1": 72,
101+
"entity2": 2,
102+
"copy2": 1,
103+
"position2": 103,
108104
"max_distance": 15
109105
}
110106
]
111107
}
112108
},
113109
{
114-
"sequences": [
115-
{
116-
"proteinChain": {
117-
"sequence": "MSSPLKNALVTAMLAGGALSSPTKQHVGIPVNASPEVGPGKYSFKQVRNPNYKFNGPLSVKKTYLKYGVPIPAWLEDAVQNSTSGLAERSTGSATTTPIDSLDDAYITPVQIGTPAQTLNLDFDTGSSDLWVFSSETTASEVDGQTIYTPSKSTTAKLLSGATWSISYGDGSSSSGDVYTDTVSVGGLTVTGQAVESAKKVSSSFTEDSTIDGLLGLAFSTLNTVSPTQQKTFFDNAKASLDSPVFTADLGYHAPGTYNFGFIDTTAYTGSITYTAVSTKQGFWEWTSTGYAVGSGTFKSTSIDGIADTGTTLLYLPATVVSAYWAQVSGAKSSSSVGGYVFPCSATLPSFTFGVGSARIVIPGDYIDFGPISTGSSSCFGGIQSSAGIGINIFGDVALKAAFVVFNGATTPTLGFASK",
118-
"count": 1,
119-
"msa": {
120-
"precomputed_msa_dir": "./examples/5sak/1/",
121-
"pairing_db": "uniref100"
122-
}
123-
}
124-
},
125-
{
126-
"ligand": {"ligand": "CCD_ZRY", "count": 1}
127-
}
128-
],
129-
"modelSeeds": [],
130-
"name": "5sak_base"
131-
},
110+
"sequences": [
111+
{
112+
"proteinChain": {
113+
"sequence": "MSSPLKNALVTAMLAGGALSSPTKQHVGIPVNASPEVGPGKYSFKQVRNPNYKFNGPLSVKKTYLKYGVPIPAWLEDAVQNSTSGLAERSTGSATTTPIDSLDDAYITPVQIGTPAQTLNLDFDTGSSDLWVFSSETTASEVDGQTIYTPSKSTTAKLLSGATWSISYGDGSSSSGDVYTDTVSVGGLTVTGQAVESAKKVSSSFTEDSTIDGLLGLAFSTLNTVSPTQQKTFFDNAKASLDSPVFTADLGYHAPGTYNFGFIDTTAYTGSITYTAVSTKQGFWEWTSTGYAVGSGTFKSTSIDGIADTGTTLLYLPATVVSAYWAQVSGAKSSSSVGGYVFPCSATLPSFTFGVGSARIVIPGDYIDFGPISTGSSSCFGGIQSSAGIGINIFGDVALKAAFVVFNGATTPTLGFASK",
114+
"count": 1,
115+
"msa": {
116+
"precomputed_msa_dir": "./examples/5sak/1/",
117+
"pairing_db": "uniref100"
118+
}
119+
}
120+
},
121+
{
122+
"ligand": {
123+
"ligand": "CCD_ZRY",
124+
"count": 1
125+
}
126+
}
127+
],
128+
"modelSeeds": [],
129+
"name": "5sak_base"
130+
},
132131
{
133-
"sequences": [
134-
{
135-
"proteinChain": {
136-
"sequence": "MSSPLKNALVTAMLAGGALSSPTKQHVGIPVNASPEVGPGKYSFKQVRNPNYKFNGPLSVKKTYLKYGVPIPAWLEDAVQNSTSGLAERSTGSATTTPIDSLDDAYITPVQIGTPAQTLNLDFDTGSSDLWVFSSETTASEVDGQTIYTPSKSTTAKLLSGATWSISYGDGSSSSGDVYTDTVSVGGLTVTGQAVESAKKVSSSFTEDSTIDGLLGLAFSTLNTVSPTQQKTFFDNAKASLDSPVFTADLGYHAPGTYNFGFIDTTAYTGSITYTAVSTKQGFWEWTSTGYAVGSGTFKSTSIDGIADTGTTLLYLPATVVSAYWAQVSGAKSSSSVGGYVFPCSATLPSFTFGVGSARIVIPGDYIDFGPISTGSSSCFGGIQSSAGIGINIFGDVALKAAFVVFNGATTPTLGFASK",
137-
"count": 1,
138-
"msa": {
139-
"precomputed_msa_dir": "./examples/5sak/1/",
140-
"pairing_db": "uniref100"
141-
}
142-
}
143-
},
144-
{
145-
"ligand": {"ligand": "CCD_ZRY", "count": 1}
146-
}
147-
],
148-
"modelSeeds": [],
149-
"name": "5sak_atom_1_3_5",
150-
"constraint": {
151-
"contact": [
152-
{
153-
"atom1": ["1", 1, 311, "CG2"],
154-
"atom2": ["2", 1, 1, "C10"],
155-
"max_distance": 5,
156-
"min_distance": 3
132+
"sequences": [
133+
{
134+
"proteinChain": {
135+
"sequence": "MSSPLKNALVTAMLAGGALSSPTKQHVGIPVNASPEVGPGKYSFKQVRNPNYKFNGPLSVKKTYLKYGVPIPAWLEDAVQNSTSGLAERSTGSATTTPIDSLDDAYITPVQIGTPAQTLNLDFDTGSSDLWVFSSETTASEVDGQTIYTPSKSTTAKLLSGATWSISYGDGSSSSGDVYTDTVSVGGLTVTGQAVESAKKVSSSFTEDSTIDGLLGLAFSTLNTVSPTQQKTFFDNAKASLDSPVFTADLGYHAPGTYNFGFIDTTAYTGSITYTAVSTKQGFWEWTSTGYAVGSGTFKSTSIDGIADTGTTLLYLPATVVSAYWAQVSGAKSSSSVGGYVFPCSATLPSFTFGVGSARIVIPGDYIDFGPISTGSSSCFGGIQSSAGIGINIFGDVALKAAFVVFNGATTPTLGFASK",
136+
"count": 1,
137+
"msa": {
138+
"precomputed_msa_dir": "./examples/5sak/1/",
139+
"pairing_db": "uniref100"
140+
}
141+
}
142+
},
143+
{
144+
"ligand": {
145+
"ligand": "CCD_ZRY",
146+
"count": 1
147+
}
148+
}
149+
],
150+
"modelSeeds": [],
151+
"name": "5sak_atom_1_3_5",
152+
"constraint": {
153+
"contact": [
154+
{
155+
"entity1": 1,
156+
"copy1": 1,
157+
"position1": 311,
158+
"atom1": "CG2",
159+
"entity2": 2,
160+
"copy2": 1,
161+
"position2": 1,
162+
"atom2": "C10",
163+
"max_distance": 5,
164+
"min_distance": 3
165+
}
166+
]
157167
}
158-
]
159168
}
160-
}
161-
162169
]

protenix/data/constraint_featurizer.py

Lines changed: 29 additions & 24 deletions
@@ -54,23 +54,26 @@ def _canonicalize_contact_format(
         _pair = {}
 
         for id_num in ["1", "2"]:
-            res_id = f"residue{id_num}"
-            if (res_info := pair.get(res_id, None)) is not None:
-                assert len(res_info) == 3, "residue contact should have 3 identifiers"
-                res_info.append(None)  # Add None for atom_name
-                _pair[f"id{id_num}"] = res_info
-
-            atom_id = f"atom{id_num}"
-            if (atom_info := pair.get(atom_id, None)) is not None:
-                assert len(atom_info) == 4, "atom contact must have 4 identifiers"
-
-                entity_id, atom_name = atom_info[0], atom_info[3]
-                if isinstance(atom_name, int):
-                    entity_dict = list(sequences[int(entity_id - 1)].values())[0]
-                    assert "atom_map_to_atom_name" in entity_dict
-                    atom_info[3] = entity_dict["atom_map_to_atom_name"][atom_name]
-
-                _pair[f"id{id_num}"] = atom_info
+            res_info = []
+            for key in ["entity", "copy", "position", "atom"]:
+                identifier_value = pair.get(f"{key}{id_num}", None)
+                if identifier_value is None:
+                    assert (
+                        key == "atom"
+                    ), "contact should have at least 3 identifiers('entity', 'copy', 'position')"
+                if key == "atom" and (identifier_value is not None):
+                    if isinstance(identifier_value, int):
+                        entity_dict = list(
+                            sequences[
+                                int(pair.get(f"entity{id_num}", None) - 1)
+                            ].values()
+                        )[0]
+                        assert "atom_map_to_atom_name" in entity_dict
+                        identifier_value = entity_dict["atom_map_to_atom_name"][
+                            identifier_value
+                        ]
+                res_info.append(identifier_value)
+            _pair[f"id{id_num}"] = res_info
 
         if hash(tuple(_pair["id1"][:2])) == hash(tuple(_pair["id2"][:2])):
             raise ValueError("A contact pair can not be specified on the same chain")
@@ -86,9 +89,11 @@ def _canonicalize_contact_format(
         return _pair
 
     @staticmethod
-    def _canonicalize_pocket_res_format(binder: list, pocket_pos: list) -> list:
+    def _canonicalize_pocket_res_format(binder: dict, pocket_pos: dict) -> dict:
         assert len(pocket_pos) == 3
-        if hash(tuple(binder[:2])) == hash(tuple(pocket_pos[:2])):
+        if hash(tuple([binder["entity"], binder["copy"]])) == hash(
+            tuple([pocket_pos["entity"], pocket_pos["copy"]])
+        ):
             raise ValueError("Pockets can not be the same chain with the binder")
         return pocket_pos
 
@@ -316,8 +321,8 @@ def generate_from_json(
 
             atom_mask_binder = get_atom_mask_by_name(
                 atom_array=atom_array,
-                entity_id=binder[0],
-                copy_id=binder[1],
+                entity_id=binder["entity"],
+                copy_id=binder["copy"],
             )
 
             binder_asym_id = torch.tensor(
@@ -340,9 +345,9 @@ def generate_from_json(
 
                 atom_mask_pocket = get_atom_mask_by_name(
                     atom_array=atom_array,
-                    entity_id=pocket_res[0],
-                    copy_id=pocket_res[1],
-                    position=pocket_res[2],
+                    entity_id=pocket_res["entity"],
+                    copy_id=pocket_res["copy"],
+                    position=pocket_res["position"],
                 )
                 pocket_token_list = atom_to_token_idx[atom_mask_pocket]
 
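
The new contact canonicalization gathers `entity`, `copy`, `position` and the optional `atom` for each side into a four-element list, then rejects pairs on the same chain. The standalone sketch below illustrates that flow under simplifying assumptions and is not the repository's implementation: the names `collect_identifiers` and `canonicalize_contact` are hypothetical, missing keys raise `ValueError` instead of tripping an `assert`, chains are compared directly rather than through `hash`, and the integer atom-index lookup via `atom_map_to_atom_name` is omitted.

```python
from typing import Any, Optional

REQUIRED_KEYS = ("entity", "copy", "position")


def collect_identifiers(pair: dict[str, Any], id_num: str) -> list[Optional[Any]]:
    """Return [entity, copy, position, atom] for one side of a contact.

    `atom` is None when the key is absent, which corresponds to a
    token-level (residue-level) constraint.
    """
    info: list[Optional[Any]] = []
    for key in (*REQUIRED_KEYS, "atom"):
        value = pair.get(f"{key}{id_num}")
        if value is None and key != "atom":
            raise ValueError(f"contact is missing required key '{key}{id_num}'")
        info.append(value)
    return info


def canonicalize_contact(pair: dict[str, Any]) -> dict[str, list[Optional[Any]]]:
    """Build the {"id1": [...], "id2": [...]} form for one contact entry."""
    ids = {f"id{n}": collect_identifiers(pair, n) for n in ("1", "2")}
    # The first two identifiers (entity, copy) name the chain.
    if ids["id1"][:2] == ids["id2"][:2]:
        raise ValueError("A contact pair can not be specified on the same chain")
    return ids
```

For the `5sak_atom_1_3_5` example entry, this sketch would produce `id1` as `[1, 1, 311, "CG2"]` and `id2` as `[2, 1, 1, "C10"]`; an entry whose two sides share the same entity and copy is rejected.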
