Description
When I run the following commands:
# run aggregator server
bash scripts/run_fedml_server.sh "$RUN_ID"
# run client(s)
bash scripts/run_fedml_client.sh 1 "$RUN_ID"
bash scripts/run_fedml_client.sh 2 "$RUN_ID"
bash scripts/run_fedml_client.sh 3 "$RUN_ID"
I get the following error:
  File "/home//anaconda3/envs/fedllm/lib/python3.10/site-packages/fedml/cross_silo/client/fedml_trainer.py", line 83, in train
    weights = self.trainer.get_model_params()
  File "/gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/run_fedllm.py", line 325, in get_model_params
    peft_state_dict = load_checkpoint(self.latest_checkpoint_dir)
  File "/gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/run_fedllm.py", line 238, in load_checkpoint
    raise FileNotFoundError(
FileNotFoundError: Could not find either PEFT checkpoint in "/gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/.logs/FedML/1111/node_2/round_0_before_agg/adapter_model.bin" nor full checkpoint in /gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/.logs/FedML/1111/node_2/round_0_before_agg/pytorch_model.bin.
[2024-07-30 15:00:46,590] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 3673085
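A quick way to narrow this down is to list which checkpoint files actually exist in the `round_0_before_agg` directory named in the error. The sketch below is only a diagnostic aid, not part of FedML: the helper `find_checkpoint_files` is a hypothetical name, the two `.bin` file names come from the error message, and the two `.safetensors` names are an assumption (some library versions may save weights in that format instead, which has not been confirmed as the cause here).

```python
from pathlib import Path

# File names from the error message, plus safetensors variants.
# Assumption: the trainer may have written .safetensors files instead of .bin.
CANDIDATE_FILES = (
    "adapter_model.bin",
    "pytorch_model.bin",
    "adapter_model.safetensors",
    "model.safetensors",
)


def find_checkpoint_files(checkpoint_dir):
    """Return the candidate checkpoint file names present in checkpoint_dir."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        # The directory itself is missing, so nothing was saved there.
        return []
    return [name for name in CANDIDATE_FILES if (root / name).is_file()]


if __name__ == "__main__":
    import sys

    # Pass the checkpoint directory from the error message as the argument.
    ckpt_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    found = find_checkpoint_files(ckpt_dir)
    print(f"{ckpt_dir}: {found if found else 'no known checkpoint files found'}")
```

If the script reports only `.safetensors` files, the checkpoint was saved but under a name `load_checkpoint` does not look for; if it reports nothing, the save step for that round probably did not run or wrote to a different directory.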
Could someone help me with the issue? Thanks!