Commit 1163713

v1.2.0
- fixed contact filter for peptides, which was causing very short peptide binders to be rejected
- avoid the saving of duplicate sequences during MPNN redesign, which is very unlikely but can happen when designing very short peptides, where multiple trajectories converge to the same sequence
- added extensive checks for the installation script to make sure each step is completed fully before proceeding
- removed default anaconda channel dependency
- added libgfortran5 to installation requirements
- added live trajectory and accepted design counters to the colab notebook
- fixed hydrophobicity calculation for binder surface, there was a bug where the surface taken into account was from the whole complex instead of just the binder alone
- colab target settings are now saved in the design output folder on Google drive and can be reloaded for continuing the design campaign
- added options into settings_advanced jsons to manually set AF2 params directory, or dssp path or dalphaball path. If left empty, it will set the default installation paths
- added more relaxed filter settings for normal proteins and peptides
- added more advanced setting files allowing to redesign interface with MPNN, as well as increased flexibility of the target by masking the template sequence during design and reprediction
- fixed mpnn sequence generation where batch size did not correspond to number of generated sequences
1 parent b5e1bc1 commit 1163713

25 files changed

+5199
-36177
lines changed

README.md

Lines changed: 5 additions & 3 deletions
@@ -12,6 +12,8 @@ First you need to clone this repository. Replace **[install_folder]** with the p

 The navigate into your install folder using *cd* and run the installation code. BindCraft requires a CUDA-compatible Nvidia graphics card to run. In the *cuda* setting, please specify the CUDA version compatible with your graphics card, for example '11.8'. If unsure, leave blank but it's possible that the installation might select the wrong version, which will lead to errors. In *pkg_manager* specify whether you are using 'mamba' or 'conda', if left blank it will use 'conda' by default.

+Note: This install script will install PyRosetta, which requires a license for commercial purposes.
+
 `bash install_bindcraft.sh --cuda '12.4' --pkg_manager 'conda'`

 ## Google Colab
@@ -39,16 +41,16 @@ number_of_final_designs -> how many designs that pass all filters to aim for,
 ```
 Then run the binder design script:

-`sbatch ./bindcraft.slurm --settings './settings_target/PDL1.json' --filters './settings_filters/default_filters.json' --advanced './settings_advanced/4stage_multimer.json'`
+`sbatch ./bindcraft.slurm --settings './settings_target/PDL1.json' --filters './settings_filters/default_filters.json' --advanced './settings_advanced/default_4stage_multimer.json'`

-The *settings* flag should point to your target .json which you set above. The *filters* flag points to the json where the design filters are specified (default is ./filters/default_filters.json). The *advanced* flag points to your advanced settings (default is ./advanced_settings/4stage_multimer.json). If you leave out the filters and advanced settings flags it will automatically point to the defaults.
+The *settings* flag should point to your target .json which you set above. The *filters* flag points to the json where the design filters are specified (default is ./filters/default_filters.json). The *advanced* flag points to your advanced settings (default is ./advanced_settings/default_4stage_multimer.json). If you leave out the filters and advanced settings flags it will automatically point to the defaults.

 Alternatively, if your machine does not support SLURM, you can run the code directly by activating the environment in conda and running the python code:

 ```
 conda activate BindCraft
 cd /path/to/bindcraft/folder/
-python -u ./bindcraft.py --settings './settings_target/PDL1.json' --filters './settings_filters/default_filters.json' --advanced './settings_advanced/4stage_multimer.json'
+python -u ./bindcraft.py --settings './settings_target/PDL1.json' --filters './settings_filters/default_filters.json' --advanced './settings_advanced/default_4stage_multimer.json'
 ```

 **We recommend to generate at least a 100 final designs passing all filters, then order the top 5-20 for experimental characterisation.** If high affinity binders are required, it is better to screen more, as the ipTM metric used for ranking is not a good predictor for affinity, but has been shown to be a good binary predictor of binding.
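
The README states that omitting the *filters* and *advanced* flags falls back to the defaults. The sketch below illustrates that behaviour with `argparse`, using the default paths shown in the `bindcraft.py` diff. The `build_parser` helper and the exact definition of the `--settings` argument are assumptions made for illustration, not code taken from the repository.

```python
import argparse

def build_parser():
    # Hypothetical helper; the default paths mirror those in the bindcraft.py diff.
    parser = argparse.ArgumentParser(description='Illustrative BindCraft-style argument parser.')
    # Assumption: --settings is a required string argument.
    parser.add_argument('--settings', '-s', type=str, required=True,
                        help='Path to the basic settings.json file. Required.')
    parser.add_argument('--filters', '-f', type=str,
                        default='./settings_filters/default_filters.json',
                        help='Path to the filters.json file.')
    parser.add_argument('--advanced', '-a', type=str,
                        default='./settings_advanced/default_4stage_multimer.json',
                        help='Path to the advanced.json file.')
    return parser

# Only --settings is given, so the other two flags resolve to their defaults.
args = build_parser().parse_args(['--settings', './settings_target/PDL1.json'])
print(args.filters)
print(args.advanced)
```

Passing an explicit `--advanced` value overrides the default in the usual `argparse` way.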

bindcraft.py

Lines changed: 53 additions & 49 deletions
@@ -15,7 +15,7 @@
                     help='Path to the basic settings.json file. Required.')
 parser.add_argument('--filters', '-f', type=str, default='./settings_filters/default_filters.json',
                     help='Path to the filters.json file used to filter design. If not provided, default will be used.')
-parser.add_argument('--advanced', '-a', type=str, default='./settings_advanced/4stage_multimer.json',
+parser.add_argument('--advanced', '-a', type=str, default='./settings_advanced/default_4stage_multimer.json',
                     help='Path to the advanced.json file with additional design settings. If not provided, default will be used.')

 args = parser.parse_args()
@@ -33,11 +33,9 @@
 ### load AF2 model settings
 design_models, prediction_models, multimer_validation = load_af2_models(advanced_settings["use_multimer_design"])

-### set package settings
+### perform checks on advanced_settings
 bindcraft_folder = os.path.dirname(os.path.realpath(__file__))
-advanced_settings["af_params_dir"] = bindcraft_folder
-advanced_settings["dssp_path"] = os.path.join(bindcraft_folder, 'functions/dssp')
-advanced_settings["dalphaball_path"] = os.path.join(bindcraft_folder, 'functions/DAlphaBall.gcc')
+advanced_settings = perform_advanced_settings_check(advanced_settings, bindcraft_folder)

 ### generate directories, design path names can be found within the function
 design_paths = generate_directories(target_settings["design_path"])
@@ -66,7 +64,6 @@
 script_start_time = time.time()
 trajectory_n = 1
 accepted_designs = 0
-rejected_designs = 0

 ### start design loop
 while True:
@@ -170,44 +167,47 @@

                 ### MPNN redesign of starting binder
                 mpnn_trajectories = mpnn_gen_sequence(trajectory_pdb, binder_chain, trajectory_interface_residues, advanced_settings)
+                existing_mpnn_sequences = set(pd.read_csv(mpnn_csv, usecols=['Sequence'])['Sequence'].values)
+
+                # create set of MPNN sequences with allowed amino acid composition
+                restricted_AAs = set(aa.strip().upper() for aa in advanced_settings["omit_AAs"].split(',')) if advanced_settings["force_reject_AA"] else set()
+
+                mpnn_sequences = sorted({
+                    mpnn_trajectories['seq'][n][-length:]: {
+                        'seq': mpnn_trajectories['seq'][n][-length:],
+                        'score': mpnn_trajectories['score'][n],
+                        'seqid': mpnn_trajectories['seqid'][n]
+                    } for n in range(advanced_settings["num_seqs"])
+                    if (not restricted_AAs or not any(aa in mpnn_trajectories['seq'][n][-length:].upper() for aa in restricted_AAs))
+                    and mpnn_trajectories['seq'][n][-length:] not in existing_mpnn_sequences
+                }.values(), key=lambda x: x['score'])
+
+                del existing_mpnn_sequences
+
+                # check whether any sequences are left after amino acid rejection and duplication check, and if yes proceed with prediction
+                if mpnn_sequences:
+                    # add optimisation for increasing recycles if trajectory is beta sheeted
+                    if advanced_settings["optimise_beta"] and float(trajectory_beta) > 15:
+                        advanced_settings["num_recycles_validation"] = advanced_settings["optimise_beta_recycles_valid"]
+
+                    ### Compile prediction models once for faster prediction of MPNN sequences
+                    clear_mem()
+                    # compile complex prediction model
+                    complex_prediction_model = mk_afdesign_model(protocol="binder", num_recycles=advanced_settings["num_recycles_validation"], data_dir=advanced_settings["af_params_dir"],
+                                                                 use_multimer=multimer_validation)
+                    complex_prediction_model.prep_inputs(pdb_filename=target_settings["starting_pdb"], chain=target_settings["chains"], binder_len=length, rm_target_seq=advanced_settings["rm_template_seq_predict"],
+                                                         rm_target_sc=advanced_settings["rm_template_sc_predict"])
+
+                    # compile binder monomer prediction model
+                    binder_prediction_model = mk_afdesign_model(protocol="hallucination", use_templates=False, initial_guess=False,
+                                                                use_initial_atom_pos=False, num_recycles=advanced_settings["num_recycles_validation"],
+                                                                data_dir=advanced_settings["af_params_dir"], use_multimer=multimer_validation)
+                    binder_prediction_model.prep_inputs(length=length)
+
+                    # iterate over designed sequences
+                    for mpnn_sequence in mpnn_sequences:
+                        mpnn_time = time.time()

-                # whether to hard reject sequences with excluded amino acids
-                if advanced_settings["force_reject_AA"]:
-                    restricted_AAs = set(advanced_settings["omit_AAs"].split(','))
-                    mpnn_sequences = [{'seq': mpnn_trajectories['seq'][n][-length:], 'score': mpnn_trajectories['score'][n], 'seqid': mpnn_trajectories['seqid'][n]}
-                                      for n in range(advanced_settings["num_seqs"])
-                                      if not any(restricted_AA in mpnn_trajectories['seq'][n] for restricted_AA in restricted_AAs)]
-                else:
-                    mpnn_sequences = [{'seq': mpnn_trajectories['seq'][n][-length:], 'score': mpnn_trajectories['score'][n], 'seqid': mpnn_trajectories['seqid'][n]}
-                                      for n in range(advanced_settings["num_seqs"])]
-
-                # sort MPNN sequences by lowest MPNN score
-                mpnn_sequences.sort(key=lambda x: x['score'])
-
-                # add optimisation for increasing recycles if trajectory is beta sheeted
-                if advanced_settings["optimise_beta"] and float(trajectory_beta) > 15:
-                    advanced_settings["num_recycles_validation"] = advanced_settings["optimise_beta_recycles_valid"]
-
-                ### Compile prediction models once for faster prediction of MPNN sequences
-                clear_mem()
-                # compile complex prediction model
-                complex_prediction_model = mk_afdesign_model(protocol="binder", num_recycles=advanced_settings["num_recycles_validation"], data_dir=advanced_settings["af_params_dir"],
-                                                             use_multimer=multimer_validation)
-                complex_prediction_model.prep_inputs(pdb_filename=target_settings["starting_pdb"], chain=target_settings["chains"], binder_len=length, rm_target_seq=advanced_settings["rm_template_seq_predict"],
-                                                     rm_target_sc=advanced_settings["rm_template_sc_predict"])
-
-                # compile binder monomer prediction model
-                binder_prediction_model = mk_afdesign_model(protocol="hallucination", use_templates=False, initial_guess=False,
-                                                            use_initial_atom_pos=False, num_recycles=advanced_settings["num_recycles_validation"],
-                                                            data_dir=advanced_settings["af_params_dir"], use_multimer=multimer_validation)
-                binder_prediction_model.prep_inputs(length=length)
-
-                # iterate over designed sequences
-                for mpnn_sequence in mpnn_sequences:
-                    mpnn_time = time.time()
-
-                    # compile sequences dictionary with scores and remove duplicate sequences
-                    if mpnn_sequence['seq'] not in [v['seq'] for v in mpnn_dict.values()]:
                         # generate mpnn design name numbering
                         mpnn_design_name = design_name + "_mpnn" + str(mpnn_n)
                         mpnn_score = round(mpnn_sequence['score'],2)
@@ -230,6 +230,7 @@
                         # if AF2 filters are not passed then skip the scoring
                         if not pass_af2_filters:
                             print(f"Base AF2 filters not passed for {mpnn_design_name}, skipping interface scoring")
+                            mpnn_n += 1
                             continue

                         # calculate statistics for each model individually
@@ -415,14 +416,17 @@
                         # if enough mpnn sequences of the same trajectory pass filters then stop
                         if accepted_mpnn >= advanced_settings["max_mpnn_sequences"]:
                             break
+
+                    if accepted_mpnn >= 1:
+                        print("Found "+str(accepted_mpnn)+" MPNN designs passing filters")
+                        print("")
                     else:
-                        print("Skipping duplicate sequence")
+                        print("No accepted MPNN designs found for this trajectory.")
+                        print("")

-                if accepted_mpnn >= 1:
-                    print("Found "+str(accepted_mpnn)+" MPNN designs passing filters")
                 else:
-                    print("No accepted MPNN designs found for this trajectory.")
-                    rejected_designs += 1
+                    print('Duplicate MPNN designs sampled with different trajectory, skipping current trajectory optimisation')
+                    print("")

                 # save space by removing unrelaxed design trajectory PDB
                 if advanced_settings["remove_unrelaxed_trajectory"]:
@@ -447,4 +451,4 @@
 ### Script finished
 elapsed_time = time.time() - script_start_time
 elapsed_text = f"{'%d hours, %d minutes, %d seconds' % (int(elapsed_time // 3600), int((elapsed_time % 3600) // 60), int(elapsed_time % 60))}"
-print("Finished all designs. Script execution for "+str(trajectory_n)+" trajectories took: "+elapsed_text)
+print("Finished all designs. Script execution for "+str(trajectory_n)+" trajectories took: "+elapsed_text)
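
The MPNN hunk in `bindcraft.py` replaces the per-sequence duplicate check with a single filtering step: sequences containing restricted amino acids or already present in the MPNN CSV are dropped, duplicates within the batch are collapsed by keying on the binder sequence, and the survivors are sorted by score. The following is a minimal, dependency-free sketch of that idea. The function name `select_mpnn_sequences` and its arguments are hypothetical, and the guard against an empty omit list is an addition for the sketch rather than something the commit does.

```python
def select_mpnn_sequences(samples, binder_length, existing, omit_aas=None, force_reject=False):
    """Filter, de-duplicate and rank sampled sequences (illustrative helper, not BindCraft code)."""
    # Parse the comma-separated omit list only when hard rejection is requested.
    # Assumption: an empty or None omit list simply means "restrict nothing".
    restricted = set()
    if force_reject and omit_aas:
        restricted = {aa.strip().upper() for aa in omit_aas.split(',') if aa.strip()}

    unique = {}
    for seq, score, seqid in zip(samples['seq'], samples['score'], samples['seqid']):
        binder_seq = seq[-binder_length:]  # the binder is taken as the trailing segment
        if binder_seq in existing:
            continue  # already saved by an earlier trajectory
        if any(aa in binder_seq.upper() for aa in restricted):
            continue  # contains a restricted amino acid
        # A later duplicate overwrites an earlier one, as in a dict comprehension.
        unique[binder_seq] = {'seq': binder_seq, 'score': score, 'seqid': seqid}

    # Lowest score first.
    return sorted(unique.values(), key=lambda entry: entry['score'])


# Toy data: four sampled sequences whose last three residues form the "binder".
samples = {
    'seq': ['AAAAGGC', 'TTTTGGK', 'AAAAGGK', 'CCCCGGW'],
    'score': [1.2, 0.8, 0.5, 0.9],
    'seqid': [0.4, 0.5, 0.6, 0.7],
}
ranked = select_mpnn_sequences(samples, 3, existing={'GGW'}, omit_aas='C', force_reject=True)
print(ranked)
```

An empty result corresponds to the new `else` branch in the diff, where the trajectory is skipped because every sampled sequence was either rejected or already seen.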

bindcraft.slurm

Lines changed: 3 additions & 1 deletion
@@ -10,7 +10,9 @@
 #SBATCH --output=bindcraft_%A.log

 # Initialise environment and modules
-conda activate BindCraft
+CONDA_BASE=$(conda info --base)
+source ${CONDA_BASE}/bin/activate ${CONDA_BASE}/envs/BindCraft
+export LD_LIBRARY_PATH=${CONDA_BASE}/lib

 # alternatively you can source the environment directly
 #source /path/to/mambaforge/bin/activate /path/to/mambaforge/envs/BindCraft

functions/biopython_utils.py

Lines changed: 1 addition & 1 deletion
@@ -233,4 +233,4 @@ def calculate_percentages(total, helix, sheet):
     sheet_percentage = round((sheet / total) * 100,2) if total > 0 else 0
     loop_percentage = round(((total - helix - sheet) / total) * 100,2) if total > 0 else 0

-    return helix_percentage, sheet_percentage, loop_percentage
+    return helix_percentage, sheet_percentage, loop_percentage

functions/colabdesign_utils.py

Lines changed: 2 additions & 2 deletions
@@ -357,7 +357,7 @@ def mpnn_gen_sequence(trajectory_pdb, binder_chain, trajectory_interface_residue
     mpnn_model.prep_inputs(pdb_filename=trajectory_pdb, chain=design_chains, fix_pos=fixed_positions, rm_aa=advanced_settings["omit_AAs"])

     # sample MPNN sequences in parallel
-    mpnn_sequences = mpnn_model.sample_parallel(temperature=advanced_settings["sampling_temp"], num=advanced_settings["num_seqs"], batch=advanced_settings["sample_seq_parallel"])
+    mpnn_sequences = mpnn_model.sample(temperature=advanced_settings["sampling_temp"], num=advanced_settings["num_seqs"], batch=advanced_settings["num_seqs"])

     return mpnn_sequences

@@ -474,4 +474,4 @@ def plot_trajectory(af_model, design_name, design_paths):
             plt.savefig(os.path.join(design_paths["Trajectory/Plots"], design_name+"_"+metric+".png"), dpi=150)

             # Close the figure
-            plt.close()
+            plt.close()

functions/generic_utils.py

Lines changed: 26 additions & 1 deletion
@@ -225,10 +225,35 @@ def perform_input_check(args):

     # Set a random advanced json settings file if not provided
     if not args.advanced:
-        args.advanced = os.path.join(binder_script_path, 'settings_advanced', '4stage_multimer.json')
+        args.advanced = os.path.join(binder_script_path, 'settings_advanced', 'default_4stage_multimer.json')

     return args.settings, args.filters, args.advanced

+# check specific advanced settings
+def perform_advanced_settings_check(advanced_settings, bindcraft_folder):
+    # set paths to model weights and executables
+    if bindcraft_folder == "colab":
+        advanced_settings["af_params_dir"] = '/content/bindcraft/params/'
+        advanced_settings["dssp_path"] = '/content/bindcraft/functions/dssp'
+        advanced_settings["dalphaball_path"] = '/content/bindcraft/functions/DAlphaBall.gcc'
+    else:
+        # Set paths individually if they are not already set
+        if not advanced_settings["af_params_dir"]:
+            advanced_settings["af_params_dir"] = bindcraft_folder
+        if not advanced_settings["dssp_path"]:
+            advanced_settings["dssp_path"] = os.path.join(bindcraft_folder, 'functions', 'dssp')
+        if not advanced_settings["dalphaball_path"]:
+            advanced_settings["dalphaball_path"] = os.path.join(bindcraft_folder, 'functions', 'DAlphaBall.gcc')
+
+    # check formatting of omit_AAs setting
+    omit_aas = advanced_settings["omit_AAs"]
+    if advanced_settings["omit_AAs"] in [None, False, '']:
+        advanced_settings["omit_AAs"] = None
+    elif isinstance(advanced_settings["omit_AAs"], str):
+        advanced_settings["omit_AAs"] = advanced_settings["omit_AAs"].strip()
+
+    return advanced_settings
+
 # Load settings from JSONs
 def load_json_settings(settings_json, filters_json, advanced_json):
     # load settings from json files
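
The new `perform_advanced_settings_check` lets a user set `af_params_dir`, `dssp_path` or `dalphaball_path` in the advanced settings JSON and only fills in the installation defaults for entries left empty. The sketch below reproduces that fallback pattern in isolation. The helper name `resolve_tool_paths` is hypothetical, and unlike the function in the diff it tolerates missing keys and returns a copy instead of modifying its input; both are choices made for the example.

```python
import os

def resolve_tool_paths(advanced_settings, bindcraft_folder):
    """Fill empty path settings with installation defaults (illustrative helper)."""
    defaults = {
        "af_params_dir": bindcraft_folder,
        "dssp_path": os.path.join(bindcraft_folder, "functions", "dssp"),
        "dalphaball_path": os.path.join(bindcraft_folder, "functions", "DAlphaBall.gcc"),
    }
    resolved = dict(advanced_settings)  # shallow copy so the caller's dict is untouched
    for key, default in defaults.items():
        # Empty string, None or a missing key all count as "not set".
        if not resolved.get(key):
            resolved[key] = default
    return resolved


# Example: only the DSSP path is customised; the paths here are placeholders.
user_settings = {"af_params_dir": "", "dssp_path": "/opt/tools/dssp", "dalphaball_path": ""}
resolved = resolve_tool_paths(user_settings, "/home/user/BindCraft")
print(resolved["dssp_path"])        # the user-supplied value is kept
print(resolved["dalphaball_path"])  # falls back to the default under the install folder
```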

functions/pyrosetta_utils.py

Lines changed: 4 additions & 2 deletions
@@ -99,17 +99,19 @@ def score_interface(pdb_file, binder_chain="B"):
         interface_binder_fraction = 0

     # calculate surface hydrophobicity
+    binder_pose = {pose.pdb_info().chain(pose.conformation().chain_begin(i)): p for i, p in zip(range(1, pose.num_chains()+1), pose.split_by_chain())}[binder_chain]
+
     layer_sel = pr.rosetta.core.select.residue_selector.LayerSelector()
     layer_sel.set_layers(pick_core = False, pick_boundary = False, pick_surface = True)
-    surface_res = layer_sel.apply(pose)
+    surface_res = layer_sel.apply(binder_pose)

     exp_apol_count = 0
     total_count = 0

     # count apolar and aromatic residues at the surface
     for i in range(1, len(surface_res) + 1):
         if surface_res[i] == True:
-            res = pose.residue(i)
+            res = binder_pose.residue(i)

             # count apolar and aromatic residues as hydrophobic
             if res.is_apolar() == True or res.name() == 'PHE' or res.name() == 'TRP' or res.name() == 'TYR':
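
The change above computes the surface selection on the binder chain alone instead of the whole complex. A toy, PyRosetta-free sketch of why that matters for the resulting fraction is given below. The residue records, the `surface` flags and the `HYDROPHOBIC` set are invented for illustration. In the real code the surface layer is selected by `LayerSelector` on the extracted binder pose, which this sketch does not model, and Rosetta's `is_apolar()` decides which residues count as apolar.

```python
# Assumption: a simplified stand-in for "apolar or aromatic" residue names.
HYDROPHOBIC = {"LEU", "PHE", "TRP", "TYR"}

def surface_hydrophobicity(residues, hydrophobic_names=HYDROPHOBIC):
    """Fraction of surface residues that are hydrophobic (toy model, not BindCraft code)."""
    surface = [r for r in residues if r["surface"]]
    if not surface:
        return 0.0
    hydrophobic = sum(1 for r in surface if r["name"] in hydrophobic_names)
    return hydrophobic / len(surface)


# Toy complex: chain A is the target, chain B is the binder.
complex_residues = [
    {"chain": "A", "name": "LEU", "surface": True},
    {"chain": "A", "name": "SER", "surface": True},
    {"chain": "A", "name": "ASP", "surface": True},
    {"chain": "B", "name": "PHE", "surface": True},
    {"chain": "B", "name": "LYS", "surface": True},
]
binder_residues = [r for r in complex_residues if r["chain"] == "B"]

whole_complex_value = surface_hydrophobicity(complex_residues)  # mixes target and binder
binder_only_value = surface_hydrophobicity(binder_residues)     # binder alone
print(whole_complex_value, binder_only_value)
```

In this toy case the two values differ because target residues dilute the count, which is the kind of discrepancy the commit message describes.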
