This is the code implementation of our KDD'25 paper "LLM2Rec: Large Language Models Are Powerful Embedding Models for Sequential Recommendation".
To run the code, the following Python packages are required:

```
torch >= 2.6.0
transformers >= 4.44.2
llm2vec == 0.2.3
flash-attn >= 2.7.4
```
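For example, the dependencies can be installed with pip; the version pins below simply mirror the requirements above, and flash-attn typically needs to be built without build isolation:

```bash
pip install "torch>=2.6.0" "transformers>=4.44.2" "llm2vec==0.2.3"
pip install "flash-attn>=2.7.4" --no-build-isolation
```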
The zipped datasets used in this paper can be downloaded from this link. Please unzip the dataset files into the ./data directory.
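For instance, assuming the downloaded archive is named LLM2Rec_data.zip (the actual file name may differ):

```bash
mkdir -p ./data
unzip LLM2Rec_data.zip -d ./data
```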
LLM2Rec follows a two-stage training pipeline:
- **Collaborative Supervised Fine-Tuning (CSFT)**: fine-tunes a pre-trained LLM to capture collaborative filtering (CF) signals, using user interaction sequences as training data.
- **Item-level Embedding Modeling (IEM)**: converts the CF-aware LLM into an item embedding generator (a usage sketch follows this list).
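As a rough sketch of what the resulting embedding generator does, the snippet below encodes item texts with the llm2vec API that this repo depends on; the checkpoint path is a placeholder for a Stage 2 (IEM) checkpoint, not a released model:

```python
import torch
from llm2vec import LLM2Vec

# Placeholder path: point this at the embedding model produced by Stage 2 (IEM).
model_path = "./checkpoints/LLM2Rec-IEM"

# LLM2Vec wraps a decoder-only LLM so it can be used as a text encoder.
l2v = LLM2Vec.from_pretrained(
    model_path,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

# Encode item texts (e.g., titles) into dense vectors for the downstream recommender.
item_texts = ["Wireless noise-cancelling headphones", "Stainless steel water bottle"]
item_embeddings = l2v.encode(item_texts)  # shape: (num_items, hidden_dim)
print(item_embeddings.shape)
```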
We provide example shell scripts for training:
```bash
# Stage 1: Collaborative Supervised Fine-Tuning
bash run_LLM2Rec_CSFT.sh

# Stage 2: Item-level Embedding Modeling
bash run_LLM2Rec_IEM.sh
```

Please adjust the necessary configurations for your own device (e.g., the path of the saved pre-trained LLMs) before executing.
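The variables to adjust typically sit in the config part at the top of the shell scripts; the names below are illustrative only and may differ from the actual variables in this repo:

```bash
# Hypothetical config block -- check run_LLM2Rec_CSFT.sh / run_LLM2Rec_IEM.sh for the real variable names.
BASE_MODEL_PATH=/path/to/pretrained-llm   # local path of the saved pre-trained LLM
DATA_DIR=./data                           # directory with the unzipped datasets
OUTPUT_DIR=./checkpoints/llm2rec          # where the fine-tuned checkpoints are written
```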
We integrate the evaluation process, including embedding extraction and the training of downstream sequential recommenders, into a single script, which can be executed with:

```bash
bash script_extract_and_evaluate.sh
```

You can change the paths of the checkpoints to be evaluated in the config part of the script_extract_and_evaluate.sh script.
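As a rough illustration of what training a downstream sequential recommender on the extracted embeddings can look like, here is a minimal PyTorch sketch that loads precomputed item embeddings and uses them to initialize a frozen item embedding table; the file name, array shape, and the choice to freeze the table are assumptions for illustration and do not necessarily reflect the exact pipeline in script_extract_and_evaluate.sh:

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical file: assumes item embeddings were saved as a (num_items, dim) array.
item_emb = np.load("./checkpoints/item_embeddings.npy")

# Initialize the recommender's item embedding table from the LLM-derived vectors.
# freeze=True keeps the embeddings fixed while the sequential model is trained on top.
embedding_table = nn.Embedding.from_pretrained(
    torch.tensor(item_emb, dtype=torch.float32),
    freeze=True,
)

# Example lookup for one user's interaction sequence of item IDs.
seq = torch.tensor([[3, 17, 42, 8]])  # (batch, seq_len)
seq_repr = embedding_table(seq)       # (batch, seq_len, dim), fed into the sequential recommender
print(seq_repr.shape)
```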
If you find our repo useful, please consider citing:
```bibtex
@inproceedings{he2025llm2rec,
title={LLM2Rec: Large Language Models Are Powerful Embedding Models for Sequential Recommendation},
author={He, Yingzhi and Liu, Xiaohao and Zhang, An and Ma, Yunshan and Chua, Tat-Seng},
booktitle={Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2},
pages={896--907},
year={2025}
}
```

The code implementation is based on previous repos, including llm2vec, recbole, and DecodingMatters.