Yujie Zhao1,2*,
Hongwei Fan1,2*,
Di Chen3,
Shengcong Chen3,
Liliang Chen3,
Xiaoqi Li1,2,
Guanghui Ren3,
Hao Dong1,2
1CFCS, School of Computer Science, Peking University, 2PKU-AgiBot Lab, 3AgiBot
This repository contains the official authors' implementation of the paper "Real2Edit2Real: Generating Robotic Demonstrations via a 3D Control Interface".
- Mar 10, 2026: We released the code and the model weights.
- Feb 21, 2026: Real2Edit2Real has been accepted by CVPR 2026. 🥳🥳
- Dec 22, 2025: We released the arXiv paper and demo of Real2Edit2Real.
```bash
git clone --recurse-submodules https://github.com/Real2Edit2Real/Real2Edit2Real.git
cd Real2Edit2Real
conda create -y -n r2e2r python=3.10
conda activate r2e2r
conda install -y nvidia/label/cuda-12.1.0::cuda-toolkit -c nvidia/label/cuda-12.1.0
conda install -y -c conda-forge gxx_linux-64=11.4 gcc_linux-64=11.4 aria2
bash scripts/installation/1_install_env.sh
bash scripts/installation/2_install_curobo.sh
```
```bash
# Set this flag if you experience slow download speeds:
# export USE_HF_MIRROR=true
bash scripts/installation/3_download_ckpts.sh
```

The data generation pipeline below can run on an NVIDIA GeForce RTX 4090.
```bash
# Set this flag if you experience slow download speeds:
# export USE_HF_MIRROR=true
bash scripts/installation/3_download_data.sh
bash scripts/preprocess_demo.sh --config-path configs/mug_to_basket.yaml
bash scripts/generate_demo.sh --config-path configs/mug_to_basket.yaml
bash scripts/generate_demo_video.sh --config-path configs/mug_to_basket.yaml
```

The training scripts below can run on GPUs with 80 GB of VRAM.
- Preparing the dataset
```
Dataset
- task-id
-- episode-id
--- frame-id
---- head_color
---- hand_left_color
---- hand_right_color
---- head_depth
---- hand_left_depth
---- hand_right_depth
---- head_extrinsic
---- hand_left_extrinsic
---- hand_right_extrinsic
---- head_intrinsic
---- hand_left_intrinsic
---- hand_right_intrinsic
```
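Given this layout, iterating over the dataset is straightforward. A minimal sketch (the helper name `iter_frames` and the flat modality list are illustrative, not part of the released code):

```python
from pathlib import Path

# The twelve per-frame modalities from the layout above:
# three cameras x (color, depth, extrinsic, intrinsic).
MODALITIES = [
    f"{cam}_{kind}"
    for cam in ("head", "hand_left", "hand_right")
    for kind in ("color", "depth", "extrinsic", "intrinsic")
]

def iter_frames(root):
    """Yield (task_id, episode_id, frame_id, frame_dir) for every frame."""
    root = Path(root)
    for task in sorted(p for p in root.iterdir() if p.is_dir()):
        for episode in sorted(p for p in task.iterdir() if p.is_dir()):
            for frame in sorted(p for p in episode.iterdir() if p.is_dir()):
                yield task.name, episode.name, frame.name, frame
```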
- Downloading the pretrained VGGT

```bash
wget https://huggingface.co/facebook/VGGT-1B/resolve/main/model.pt -O checkpoints/vggt_base_model.pt
```

- Run the training script
```bash
cd vggt
bash train.sh
```

For training data, we use the metric-VGGT model to annotate the open-source AgibotWorld-Beta dataset with depth and camera-pose labels.
We provide a reference data-processing script: `vggt/preprocess_agibot_dataset.py`.
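The annotated results also include Canny-style edge maps of the depth images (the `*_depth_canny` entries in the layout below). As a rough illustration only — the actual preprocessing presumably runs a real Canny detector (e.g. OpenCV's), and `depth_edge_map` with its threshold is hypothetical — a gradient-threshold edge map can be sketched with NumPy:

```python
import numpy as np

def depth_edge_map(depth, threshold=0.05):
    """Binary edge map from a metric depth image via finite differences.

    A simplified stand-in for a Canny pass: pixels whose depth-gradient
    magnitude exceeds `threshold` (in meters per pixel) are marked as edges.
    """
    depth = np.asarray(depth, dtype=np.float32)
    gy, gx = np.gradient(depth)          # per-axis finite differences
    magnitude = np.hypot(gx, gy)         # gradient magnitude
    return (magnitude > threshold).astype(np.uint8)
```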
- Preparing the dataset
```
Dataset
- observations
-- task-id
--- episode-id
- parameters
- proprio_stats
Annotated-Result
- task-id
-- episode-id
--- head_depth_ori
--- hand_left_depth_ori
--- hand_right_depth_ori
--- head_depth_canny
--- hand_left_depth_canny
--- hand_right_depth_canny
--- head_extrinsic.npy
--- hand_left_extrinsic.npy
--- hand_right_extrinsic.npy
```
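The `*_extrinsic.npy` annotations, together with camera intrinsics, let you lift the annotated depth maps into 3D. A minimal pinhole back-projection sketch — assuming a 3x3 intrinsic matrix and a 4x4 camera-to-world extrinsic, which may differ from the conventions the repo actually stores:

```python
import numpy as np

def backproject(depth, K, T_cam2world):
    """Lift a depth map to world-frame 3D points (pinhole camera model).

    depth:        (H, W) metric depth in meters
    K:            (3, 3) camera intrinsic matrix
    T_cam2world:  (4, 4) camera-to-world extrinsic
    Returns an (H, W, 3) array of world-frame coordinates.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates (u, v, 1).
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Camera-frame rays scaled by depth.
    cam = (pix @ np.linalg.inv(K).T) * depth[..., None]
    # Rigid transform into the world frame.
    cam_h = np.concatenate([cam, np.ones((H, W, 1))], axis=-1)
    return (cam_h @ T_cam2world.T)[..., :3]
```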
- Downloading the pretrained GE-Sim

```bash
wget https://modelscope.cn/models/agibot_world/Genie-Envisioner/resolve/master/ge_sim_cosmos_v0.1.safetensors -O checkpoints/ge_sim_cosmos_v0.1.safetensors
```

- Training
```bash
cd videogen
bash train.sh scripts/train_action_depth_canny_cosmos2.py --config_file configs/action_depth_canny_cosmos2.yaml
```

Thanks to these great repositories: DUSt3R, MASt3R, VGGT, DemoGen, Cosmos, GenieEnvisioner, EnerVerse-AC, and many other inspiring works in the community.
```bibtex
@article{zhao2025real2edit2real,
  title={Real2Edit2Real: Generating Robotic Demonstrations via a 3D Control Interface},
  author={Yujie Zhao and Hongwei Fan and Di Chen and Shengcong Chen and Liliang Chen and Xiaoqi Li and Guanghui Ren and Hao Dong},
  year={2025},
  eprint={2512.19402},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2512.19402},
}
```
