Yujie Zhao1,2*,
Hongwei Fan1,2*,
Di Chen3,
Shengcong Chen3,
Liliang Chen3,
Xiaoqi Li1,2,
Guanghui Ren3,
Hao Dong1,2
1CFCS, School of Computer Science, Peking University, 2PKU-AgiBot Lab, 3AgiBot
This repository contains the official authors' implementation of the paper "Real2Edit2Real: Generating Robotic Demonstrations via a 3D Control Interface".
- Mar 10, 2026: We released the code and the model weights.
- Feb 21, 2026: Real2Edit2Real has been accepted by CVPR 2026. 🥳🥳
- Dec 22, 2025: We released the arXiv paper and demo of Real2Edit2Real.
```bash
git clone --recurse-submodules https://github.com/Real2Edit2Real/Real2Edit2Real.git
cd Real2Edit2Real
conda create -y -n r2e2r python=3.10
conda activate r2e2r
conda install -y nvidia/label/cuda-12.1.0::cuda-toolkit -c nvidia/label/cuda-12.1.0
conda install -y -c conda-forge gxx_linux-64=11.4 gcc_linux-64=11.4 aria2
bash scripts/installation/1_install_env.sh
bash scripts/installation/2_install_curobo.sh
```
```bash
# Set this flag if you experience slow download speeds:
# export USE_HF_MIRROR=true
bash scripts/installation/3_download_ckpts.sh
```

The data generation pipeline below can run on an NVIDIA GeForce RTX 4090.
```bash
# Set this flag if you experience slow download speeds:
# export USE_HF_MIRROR=true
bash scripts/installation/3_download_data.sh
bash scripts/preprocess_demo.sh --config-path configs/mug_to_basket.yaml
bash scripts/generate_demo.sh --config-path configs/mug_to_basket.yaml
bash scripts/generate_demo_video.sh --config-path configs/mug_to_basket.yaml
```

The training scripts below can run on GPUs with 80 GB of VRAM.
- Preparing the dataset
```
Dataset
- task-id
-- episode-id
--- frame-id
---- head_color
---- hand_left_color
---- hand_right_color
---- head_depth
---- hand_left_depth
---- hand_right_depth
---- head_extrinsic
---- hand_left_extrinsic
---- hand_right_extrinsic
---- head_intrinsic
---- hand_left_intrinsic
---- hand_right_intrinsic
```
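Given this layout, iterating over the dataset is straightforward. A minimal sketch (the helper name `iter_frames` and the flat modality list are illustrative, not part of the released code):

```python
from pathlib import Path

# The twelve per-frame modalities from the layout above:
# three cameras x (color, depth, extrinsic, intrinsic).
MODALITIES = [
    f"{cam}_{kind}"
    for cam in ("head", "hand_left", "hand_right")
    for kind in ("color", "depth", "extrinsic", "intrinsic")
]

def iter_frames(root):
    """Yield (task_id, episode_id, frame_id, frame_dir) for every frame."""
    root = Path(root)
    for task in sorted(p for p in root.iterdir() if p.is_dir()):
        for episode in sorted(p for p in task.iterdir() if p.is_dir()):
            for frame in sorted(p for p in episode.iterdir() if p.is_dir()):
                yield task.name, episode.name, frame.name, frame
```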
- Downloading the pretrained VGGT

```bash
wget https://huggingface.co/facebook/VGGT-1B/resolve/main/model.pt -O checkpoints/vggt_base_model.pt
```

- Run the training script
```bash
cd vggt
bash train.sh
```

For training data, we use the metric-VGGT model to annotate the open-source AgibotWorld-Beta dataset with depth and camera-pose labels.
We provide a reference data-processing script: `vggt/preprocess_agibot_dataset.py`.
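The annotated results also include Canny-style edge maps of the depth images (the `*_depth_canny` entries in the layout below). As a rough illustration only — the actual preprocessing presumably runs a real Canny detector (e.g. OpenCV's), and `depth_edge_map` with its threshold is hypothetical — a gradient-threshold edge map can be sketched with NumPy:

```python
import numpy as np

def depth_edge_map(depth, threshold=0.05):
    """Binary edge map from a metric depth image via finite differences.

    A simplified stand-in for a Canny pass: pixels whose depth-gradient
    magnitude exceeds `threshold` (in meters per pixel) are marked as edges.
    """
    depth = np.asarray(depth, dtype=np.float32)
    gy, gx = np.gradient(depth)          # per-axis finite differences
    magnitude = np.hypot(gx, gy)         # gradient magnitude
    return (magnitude > threshold).astype(np.uint8)
```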
- Preparing the dataset
```
Dataset
- observations
-- task-id
--- episode-id
- parameters
- proprio_stats
Annotated-Result
- task-id
-- episode-id
--- head_depth_ori
--- hand_left_depth_ori
--- hand_right_depth_ori
--- head_depth_canny
--- hand_left_depth_canny
--- hand_right_depth_canny
--- head_extrinsic.npy
--- hand_left_extrinsic.npy
--- hand_right_extrinsic.npy
```
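The `*_extrinsic.npy` annotations, together with camera intrinsics, let you lift the annotated depth maps into 3D. A minimal pinhole back-projection sketch — assuming a 3x3 intrinsic matrix and a 4x4 camera-to-world extrinsic, which may differ from the conventions the repo actually stores:

```python
import numpy as np

def backproject(depth, K, T_cam2world):
    """Lift a depth map to world-frame 3D points (pinhole camera model).

    depth:        (H, W) metric depth in meters
    K:            (3, 3) camera intrinsic matrix
    T_cam2world:  (4, 4) camera-to-world extrinsic
    Returns an (H, W, 3) array of world-frame coordinates.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates (u, v, 1).
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Camera-frame rays scaled by depth.
    cam = (pix @ np.linalg.inv(K).T) * depth[..., None]
    # Rigid transform into the world frame.
    cam_h = np.concatenate([cam, np.ones((H, W, 1))], axis=-1)
    return (cam_h @ T_cam2world.T)[..., :3]
```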
- Downloading the pretrained GE-Sim

```bash
wget https://modelscope.cn/models/agibot_world/Genie-Envisioner/resolve/master/ge_sim_cosmos_v0.1.safetensors -O checkpoints/ge_sim_cosmos_v0.1.safetensors
```

- Training
```bash
cd videogen
bash train.sh scripts/train_action_depth_canny_cosmos2.py --config_file configs/action_depth_canny_cosmos2.yaml
```

Thanks to these great repositories: DUSt3R, MASt3R, VGGT, DemoGen, Cosmos, GenieEnvisioner, EnerVerse-AC, and many other inspiring works in the community.
```bibtex
@article{zhao2025real2edit2real,
  title={Real2Edit2Real: Generating Robotic Demonstrations via a 3D Control Interface},
  author={Yujie Zhao and Hongwei Fan and Di Chen and Shengcong Chen and Liliang Chen and Xiaoqi Li and Guanghui Ren and Hao Dong},
  year={2025},
  eprint={2512.19402},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2512.19402},
}
```
