missing "adapter_model.bin" or "pytorch_model.bin" #2215

@Jiaxu-Zhao

When I run the following commands:

# run aggregator server
bash scripts/run_fedml_server.sh "$RUN_ID"

# run client(s)
bash scripts/run_fedml_client.sh 1 "$RUN_ID"
bash scripts/run_fedml_client.sh 2 "$RUN_ID"
bash scripts/run_fedml_client.sh 3 "$RUN_ID"

I get the following error:

  File "/home//anaconda3/envs/fedllm/lib/python3.10/site-packages/fedml/cross_silo/client/fedml_trainer.py", line 83, in train
    weights = self.trainer.get_model_params()
  File "/gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/run_fedllm.py", line 325, in get_model_params
    peft_state_dict = load_checkpoint(self.latest_checkpoint_dir)
  File "/gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/run_fedllm.py", line 238, in load_checkpoint
    raise FileNotFoundError(
FileNotFoundError: Could not find either PEFT checkpoint in "/gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/.logs/FedML/1111/node_2/round_0_before_agg/adapter_model.bin" nor full checkpoint in /gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/.logs/FedML/1111/node_2/round_0_before_agg/pytorch_model.bin.
[2024-07-30 15:00:46,590] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 3673085

Could someone help me with this issue? Thanks!
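
To narrow this down, it may help to see what the trainer actually wrote into the checkpoint directory named in the error. The sketch below is a diagnostic only, not part of FedML: the helper name `inspect_checkpoint_dir` is hypothetical, the two `.bin` names come from the error message, and the `.safetensors` names are an assumption (newer PEFT and Transformers releases may save in safetensors format by default, which would be one possible cause that is not confirmed here).

```python
import sys
from pathlib import Path

# File names the traceback says load_checkpoint is looking for.
EXPECTED = ("adapter_model.bin", "pytorch_model.bin")
# Assumption: safetensors counterparts that newer library versions may write instead.
SAFETENSORS = ("adapter_model.safetensors", "model.safetensors")


def inspect_checkpoint_dir(checkpoint_dir):
    """Return (found_expected, found_safetensors, all_files) for a directory.

    All three lists are empty if the directory does not exist.
    """
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return [], [], []
    names = sorted(p.name for p in root.iterdir() if p.is_file())
    found_expected = [n for n in EXPECTED if n in names]
    found_safetensors = [n for n in SAFETENSORS if n in names]
    return found_expected, found_safetensors, names


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the round_0_before_agg directory from the error message as the argument.
    expected, safetensors, everything = inspect_checkpoint_dir(sys.argv[1])
    print("files present:       ", everything)
    print("expected .bin files: ", expected)
    print("safetensors files:   ", safetensors)
```

If the directory is empty or missing, the checkpoint was probably never saved; if it contains only `.safetensors` files, the mismatch is in the file format rather than in the training step.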
