Description
Summary
I have successfully trained a model with deepmd-kit and now want to run dpgen autotest to calculate physical properties.
Following the dpgen documentation, I prepared relaxation_T.json and machine_local.json.
First I ran
dpgen autotest make relaxation_T.json
which completed successfully.
Then I ran
dpgen autotest run relaxation_T.json machine_local.json
which failed with an error.
dpgen appears to be trying to submit jobs, but I am running it in my local shell, so I did not expect any job submission.
The error message mentions an "unexpected submission state".
My JSON files are attached here:
machine_local.json
relaxation_T.json
I would like to know whether there are any mistakes in the JSON files and how to fix the problem.
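To rule out plain syntax errors in the two JSON files before looking at their contents, a minimal sketch such as the following could be used. It relies only on the Python standard library; the helper name check_json is hypothetical, and the script checks JSON syntax only, not whether the keys are the ones dpgen expects.

```python
import json
from pathlib import Path


def check_json(path):
    """Return (True, sorted top-level keys) or (False, error position)."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except json.JSONDecodeError as exc:
        # Report where the parser gave up, e.g. a trailing comma.
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    keys = sorted(data) if isinstance(data, dict) else []
    return True, keys


if __name__ == "__main__":
    # File names taken from the commands above; run from the same directory.
    for name in ("relaxation_T.json", "machine_local.json"):
        if Path(name).exists():
            print(name, check_json(name))
        else:
            print(name, "not found in the current directory")
```

If both files parse, the problem is more likely in the values themselves or in the task that the submitted job runs.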
DP-GEN Version
0.12.1
Platform, Python Version, etc
Platform: WSL Ubuntu 22.04
Python version: 3.10.13
Details
DeepModeling
Version: 0.12.1
Path: /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen
Dependency
numpy 1.26.4 /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/numpy
dpdata 0.2.18 /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdata
pymatgen unknown version or path
monty 2024.4.17 /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/monty
ase 3.22.1 /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/ase
paramiko 3.4.0 /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/paramiko
custodian 2024.4.18 /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/custodian
Reference
Please cite:
Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E,
DP-GEN: A concurrent learning platform for the generation of reliable deep learning
based potential energy models, Computer Physics Communications, 2020, 107206.
Description
/home/lijh/HfO2/4phase-200w/autotest --> Runing...
2024-05-14 15:53:04,453 - INFO : info:check_all_finished: False
2024-05-14 15:53:04,457 - INFO : job: b910e4a6be4620f8b89f5ed1af23cab264b0e786 submit; job_id is 31369
2024-05-14 15:53:35,592 - INFO : job: b910e4a6be4620f8b89f5ed1af23cab264b0e786 31369 terminated; fail_cout is 1; resubmitting job
2024-05-14 15:53:35,642 - INFO : job:b910e4a6be4620f8b89f5ed1af23cab264b0e786 re-submit after terminated; new job_id is 31708
2024-05-14 15:53:35,851 - INFO : job:b910e4a6be4620f8b89f5ed1af23cab264b0e786 job_id:31708 after re-submitting; the state now is <JobStatus.running: 3>
2024-05-14 15:54:05,986 - INFO : job: b910e4a6be4620f8b89f5ed1af23cab264b0e786 31708 terminated; fail_cout is 2; resubmitting job
2024-05-14 15:54:06,029 - INFO : job:b910e4a6be4620f8b89f5ed1af23cab264b0e786 re-submit after terminated; new job_id is 32098
2024-05-14 15:54:06,238 - INFO : job:b910e4a6be4620f8b89f5ed1af23cab264b0e786 job_id:32098 after re-submitting; the state now is <JobStatus.running: 3>
2024-05-14 15:54:36,367 - INFO : job: b910e4a6be4620f8b89f5ed1af23cab264b0e786 32098 terminated; fail_cout is 3; resubmitting job
Traceback (most recent call last):
File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdispatcher/submission.py", line 358, in handle_unexpected_submission_state
job.handle_unexpected_job_state()
File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdispatcher/submission.py", line 862, in handle_unexpected_job_state
raise RuntimeError(err_msg)
RuntimeError: job:b910e4a6be4620f8b89f5ed1af23cab264b0e786 32098 failed 3 times.
Possible remote error message: ==> /home/lijh/HfO2/4phase-200w/autotest/work/b279de7e9ede8ee9b4d5502ff7df4cc95cbe3866/confs/T_phase/relaxation/relax_task/errlog <==
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/lijh/anaconda3/envs/deepmd/bin/dpgen", line 8, in <module>
sys.exit(main())
File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen/main.py", line 255, in main
args.func(args)
File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen/auto_test/run.py", line 58, in gen_test
run_task(args.TASK, args.PARAM, args.MACHINE)
File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen/auto_test/run.py", line 34, in run_task
run_equi(confs, inter_parameter, mdata)
File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen/auto_test/common_equi.py", line 197, in run_equi
submission.run_submission()
File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdispatcher/submission.py", line 261, in run_submission
self.handle_unexpected_submission_state()
File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdispatcher/submission.py", line 362, in handle_unexpected_submission_state
raise RuntimeError(
RuntimeError: Meet errors will handle unexpected submission state.
Debug information: remote_root==/home/lijh/HfO2/4phase-200w/autotest/work/b279de7e9ede8ee9b4d5502ff7df4cc95cbe3866.
Debug information: submission_hash==b279de7e9ede8ee9b4d5502ff7df4cc95cbe3866.
Please check error messages above and in remote_root. The submission information is saved in /home/lijh/.dpdispatcher/submission/b279de7e9ede8ee9b4d5502ff7df4cc95cbe3866.json.
For furthur actions, run the following command with proper flags: dpdisp submission b279de7e9ede8ee9b4d5502ff7df4cc95cbe3866
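The traceback points to an errlog file under remote_root, but its contents are not shown in the output above. A minimal sketch for printing the tail of every such file is given below. It assumes the files are plain text and are named errlog, as in the "Possible remote error message" line; the helper name tail_errlogs is hypothetical, and only the standard library is used.

```python
from collections import deque
from pathlib import Path


def tail_errlogs(work_dir, name="errlog", n_lines=20):
    """Map each file called `name` under work_dir to its last n_lines lines."""
    root = Path(work_dir)
    if not root.is_dir():
        return {}
    tails = {}
    for path in sorted(root.rglob(name)):
        if not path.is_file():
            continue
        with open(path, encoding="utf-8", errors="replace") as fh:
            # A bounded deque keeps only the final n_lines lines.
            tails[str(path)] = list(deque(fh, maxlen=n_lines))
    return tails


if __name__ == "__main__":
    # Directory copied from the "remote_root" debug line; adjust as needed.
    root = "/home/lijh/HfO2/4phase-200w/autotest/work/b279de7e9ede8ee9b4d5502ff7df4cc95cbe3866"
    for path, lines in tail_errlogs(root).items():
        print(f"==> {path} <==")
        print("".join(lines) or "(empty)\n", end="")
```

The last lines of errlog usually show why the task command itself exited, which is what caused the job to be marked as terminated three times.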