S1.1 shows great improvement in agents serving LLMs!
I tried to fine-tune Qwen3-1.7B on the simplescaling/s1K-1.1_tokenized dataset using this repository's code, but I cannot figure out why I get worse inference results.
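For context, this is roughly how I inspect the dataset before training (a minimal sketch, assuming the Hugging Face `datasets` API and a hypothetical local copy at the same path as `train_file_path` in the script below):

```python
# Inspect the tokenized dataset before training.
# The path mirrors --train_file_path below (hypothetical local copy of the hub dataset).
from datasets import load_dataset

ds = load_dataset("./simplescaling/s1K-1.1_tokenized")
print(ds)                        # splits and row counts
print(ds["train"].column_names)  # the fields the SFT script consumes
```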
For reference, I ran `bash train/sft.sh` with the following script:

```bash
uid="$(date +%Y%m%d_%H%M%S)"
base_model="../models/Qwen-Qwen3-1.7B/"
lr=1e-5
min_lr=0
epochs=3
weight_decay=1e-4 # same training pipeline as slurm_training
micro_batch_size=1 # batch size becomes 16 with 16 GPUs
gradient_accumulation_steps=16 # requires more GPU memory
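# Effective global batch size = micro_batch_size * gradient_accumulation_steps * gpu_count;
# on this single-GPU run that is 1 * 16 * 1 = 16, matching the 16-GPU slurm setup.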
max_steps=-1
gpu_count=$(nvidia-smi -L | wc -l)
push_to_hub=false
torchrun --nproc-per-node ${gpu_count} --master_port 12345 \
    train/sft-8B.py \
    --block_size=1024 \
    --per_device_train_batch_size=${micro_batch_size} \
    --per_device_eval_batch_size=${micro_batch_size} \
    --gradient_accumulation_steps=${gradient_accumulation_steps} \
    --num_train_epochs=${epochs} \
    --train_file_path="./simplescaling/s1K-1.1_tokenized" \
    --model_name=${base_model} \
    --warmup_ratio=0.05 \
    --fsdp="full_shard auto_wrap" \
    --fsdp_config="train/fsdp_config_qwen.json" \
    --bf16=True \
    --eval_strategy="no" \
    --logging_steps=1 \
    --save_strategy="no" \
    --lr_scheduler_type="cosine" \
    --learning_rate=${lr} \
    --weight_decay=${weight_decay} \
    --adam_beta1=0.9 \
    --adam_beta2=0.95 \
    --output_dir="ckpts/s1-${uid}" \
    --push_to_hub=${push_to_hub} \
    --save_only_model=True \
    --gradient_checkpointing=True
```
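After training finishes, I sanity-check generation from the saved checkpoint like this (a minimal sketch using the standard `transformers` API; the checkpoint path and question are placeholders):

```python
# Quick generation check on the fine-tuned checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "ckpts/s1-<uid>"  # placeholder: the output_dir written by the run above
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
# Apply the model's chat template so the prompt format matches what SFT saw.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Pointing the same snippet at the base model is how I compare the two.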
Training ran on a single A100 80 GB GPU and finished with the following metrics:

```
{'train_runtime': 5268.8407, 'train_samples_per_second': 0.949, 'train_steps_per_second': 0.119, 'train_loss': 0.1172730620391667, 'epoch': 5.0}
```
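As a quick cross-check of these numbers (using only the log above plus the fact that s1K-1.1 has 1,000 examples):

```python
# Cross-check the reported throughput against the epoch count (values from the log above).
train_runtime = 5268.8407   # seconds
samples_per_second = 0.949
steps_per_second = 0.119

print(samples_per_second * train_runtime)  # ~5000 samples seen = 5 epochs * 1000 examples
print(steps_per_second * train_runtime)    # ~627 optimizer steps in total
```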
Could you please help me? Thank you!