Train Better Models, Faster
LightlyTrain is the leading framework for transforming your data into state-of-the-art computer vision models. It covers the entire model development lifecycle, from pretraining DINOv2/v3 vision foundation models on your unlabeled data to fine-tuning transformer and YOLO models on detection and segmentation tasks for edge deployment.
Contact us to request a license for commercial use.
- [0.12.0] - 2025-11-06: 💡 New DINOv3 Object Detection: Run inference or fine-tune DINOv3 models for object detection! 💡
- [0.11.0] - 2025-08-15: 🚀 New DINOv3 Support: Pretrain your own model with distillation from DINOv3 weights. Or fine-tune our SOTA EoMT semantic segmentation model with a DINOv3 backbone! 🚀
- [0.10.0] - 2025-08-04: 🔥 Train state-of-the-art semantic segmentation models with our new DINOv2 semantic segmentation fine-tuning method! 🔥
- [0.9.0] - 2025-07-21: DINOv2 pretraining is now officially available!
Install LightlyTrain on Python 3.8+ for Windows, Linux, or macOS with:

pip install lightly-train

Object Detection
Train LTDETR detection models with DINOv2 or DINOv3 backbones.
| Implementation | Model | Val mAP50:95 | Latency (ms) | Params (M) | Input Size |
|---|---|---|---|---|---|
| LightlyTrain | dinov2/vits14-ltdetr-coco | 55.7 | 16.87 | 55.3 | 644×644 |
| LightlyTrain | dinov3/convnext-tiny-ltdetr-coco | 54.4 | 13.29 | 61.1 | 640×640 |
| LightlyTrain | dinov3/convnext-small-ltdetr-coco | 56.9 | 17.65 | 82.7 | 640×640 |
| LightlyTrain | dinov3/convnext-base-ltdetr-coco | 58.6 | 24.68 | 121.0 | 640×640 |
| LightlyTrain | dinov3/convnext-large-ltdetr-coco | 60.0 | 42.30 | 230.0 | 640×640 |
Models are trained for 60 epochs on the COCO 2017 dataset and evaluated on the validation set with single-scale testing. Latency is measured with TensorRT on an NVIDIA T4 GPU with batch size 1. All models are compiled and optimized using tensorrt==10.13.3.9.
import lightly_train

if __name__ == "__main__":
    # Train an object detection model with a DINOv3 backbone
    lightly_train.train_object_detection(
        out="out/my_experiment",
        model="dinov3/convnext-small-ltdetr-coco",
        data={
            "path": "my_data_dir",
            "train": "images/train",
            "val": "images/val",
            "names": {
                0: "person",
                1: "bicycle",
                2: "car",
            },
        },
    )

    # Load model and run inference
    model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")
    # Or use one of the models provided by LightlyTrain
    # model = lightly_train.load_model("dinov3/convnext-small-ltdetr-coco")
    results = model.predict("image.jpg")

    results["labels"]  # Class labels, tensor of shape (num_boxes,)
    results["bboxes"]  # Bounding boxes in (xmin, ymin, xmax, ymax) absolute pixel
                       # coordinates of the original image. Tensor of shape (num_boxes, 4).
    results["scores"]  # Confidence scores, tensor of shape (num_boxes,)
Instance Segmentation
Train state-of-the-art instance segmentation models with DINOv3 backbones using the EoMT method from CVPR 2025.
| Implementation | Model | Val mAP mask | Avg. FPS | Params (M) | Input Size |
|---|---|---|---|---|---|
| LightlyTrain | dinov3/vits16-eomt-inst-coco | 32.6 | 51.5 | 21.6 | 640×640 |
| LightlyTrain | dinov3/vitb16-eomt-inst-coco | 40.3 | 25.2 | 85.7 | 640×640 |
| LightlyTrain | dinov3/vitl16-eomt-inst-coco | 46.2 | 12.5 | 303.2 | 640×640 |
| EoMT (CVPR 2025 paper, current SOTA) | dinov3/vitl16-eomt-inst-coco | 45.9 | - | 303.2 | 640×640 |
Models are trained for 12 epochs on the COCO 2017 dataset and evaluated on the validation
set with single-scale testing. Avg. FPS is measured on a single NVIDIA T4 GPU with batch
size 1. All models are compiled and optimized using torch.compile.
import lightly_train

if __name__ == "__main__":
    # Train an instance segmentation model with a DINOv3 backbone
    lightly_train.train_instance_segmentation(
        out="out/my_experiment",
        model="dinov3/vits16-eomt-inst-coco",
        data={
            "path": "my_data_dir",
            "train": "images/train",
            "val": "images/val",
            "names": {
                0: "background",
                1: "vehicle",
                2: "pedestrian",
                # ...
            },
        },
    )

    # Load model and run inference
    model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")
    # Or use one of the models provided by LightlyTrain
    # model = lightly_train.load_model("dinov3/vits16-eomt-inst-coco")
    results = model.predict("image.jpg")

    results["labels"]  # Class labels, tensor of shape (num_instances,)
    results["masks"]   # Binary masks, tensor of shape (num_instances, height, width).
                       # Height and width correspond to the original image size.
    results["scores"]  # Confidence scores, tensor of shape (num_instances,)
Semantic Segmentation
Train state-of-the-art semantic segmentation models with DINOv2 or DINOv3 backbones using the EoMT method from CVPR 2025.
| Implementation | Model | Val mIoU | Avg. FPS | Params (M) | Input Size |
|---|---|---|---|---|---|
| LightlyTrain | dinov3/vits16-eomt-coco | 0.465 | 88.7 | 21.6 | 512×512 |
| LightlyTrain | dinov3/vitb16-eomt-coco | 0.520 | 43.3 | 85.7 | 512×512 |
| LightlyTrain | dinov3/vitl16-eomt-coco | 0.544 | 20.4 | 303.2 | 512×512 |
Models are trained for 12 epochs with num_queries=200 on the COCO-Stuff dataset and
evaluated on the validation set with single-scale testing. Avg. FPS is measured on a
single NVIDIA T4 GPU with batch size 1. All models are compiled and optimized using
torch.compile.
| Implementation | Model | Val mIoU | Avg. FPS | Params (M) | Input Size |
|---|---|---|---|---|---|
| LightlyTrain | dinov3/vits16-eomt-cityscapes | 0.786 | 18.6 | 21.6 | 1024×1024 |
| LightlyTrain | dinov3/vitb16-eomt-cityscapes | 0.810 | 8.7 | 85.7 | 1024×1024 |
| LightlyTrain | dinov3/vitl16-eomt-cityscapes | 0.844 | 3.9 | 303.2 | 1024×1024 |
| EoMT (CVPR 2025 paper, current SOTA) | dinov2/vitl16-eomt | 0.842 | - | 319 | 1024×1024 |
Avg. FPS is measured on a single NVIDIA T4 GPU with batch size 1. All models are compiled
and optimized using torch.compile.
import lightly_train

if __name__ == "__main__":
    # Train a semantic segmentation model with a DINOv3 backbone
    lightly_train.train_semantic_segmentation(
        out="out/my_experiment",
        model="dinov3/vits16-eomt",
        data={
            "train": {
                "images": "my_data_dir/train/images",
                "masks": "my_data_dir/train/masks",
            },
            "val": {
                "images": "my_data_dir/val/images",
                "masks": "my_data_dir/val/masks",
            },
            "classes": {
                0: "background",
                1: "road",
                2: "building",
                # ...
            },
        },
    )

    # Load model and run inference
    model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")
    # Or use one of the models provided by LightlyTrain
    # model = lightly_train.load_model("dinov3/vits16-eomt")
    masks = model.predict("image.jpg")

    # Masks is a tensor of shape (height, width) with class labels as values.
    # It has the same height and width as the input image.
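Because the prediction is a plain (height, width) tensor of class labels, it can be inspected or stored with a few lines of PyTorch and Pillow. A minimal sketch that prints the pixel fraction per predicted class and saves the labels as a grayscale PNG; the output filename is an example and the uint8 conversion assumes fewer than 256 classes.

import lightly_train
import torch
from PIL import Image

if __name__ == "__main__":
    model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")
    masks = model.predict("image.jpg")

    # Fraction of pixels assigned to each class that appears in the prediction.
    total = masks.numel()
    for class_id in masks.unique().tolist():
        fraction = int((masks == class_id).sum()) / total
        print(f"class={class_id}: {fraction:.1%} of pixels")

    # Save the class labels as an 8-bit grayscale PNG (assumes fewer than 256 classes).
    Image.fromarray(masks.to(torch.uint8).cpu().numpy()).save("predicted_mask.png")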
Distillation (DINOv2/v3)
Pretrain any model architecture with unlabeled data by distilling the knowledge from DINOv2 or DINOv3 foundation models into your model. On the COCO dataset, YOLOv8-s models pretrained with LightlyTrain achieve high performance across all tested label fractions. These improvements hold for other architectures like YOLOv11, RT-DETR, and Faster R-CNN. See our announcement post for more benchmarks and details.
import lightly_train
from ultralytics import YOLO

if __name__ == "__main__":
    # Distill the knowledge from a DINOv3 teacher into a YOLOv8 model
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="ultralytics/yolov8s",
        method="distillation",
        method_args={
            "teacher": "dinov3/vitb16",
        },
    )

    # Load the exported model for fine-tuning with Ultralytics
    model = YOLO("out/my_experiment/exported_models/exported_last.pt")
    model.train(data="coco8.yaml")
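The same distillation call works for the other student architectures listed in the model table below; only the model identifier changes. A minimal sketch with a Torchvision ResNet-50 student, where the "torchvision/resnet50" identifier is an assumed example following the library/model naming pattern used above.

import lightly_train

if __name__ == "__main__":
    # Distill the knowledge from a DINOv3 teacher into a Torchvision ResNet-50.
    # The "torchvision/resnet50" identifier is an assumed example; any supported
    # student architecture from the table below can be used instead.
    lightly_train.train(
        out="out/my_resnet_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        method="distillation",
        method_args={
            "teacher": "dinov3/vitb16",
        },
    )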
Pretraining (DINOv2 Foundation Models)
With LightlyTrain you can train your very own foundation model like DINOv2 on your data.
| Implementation | Model | Val ImageNet k-NN |
|---|---|---|
| LightlyTrain | dinov2/vitl16 | 81.9% |
| DINOv2 | dinov2/vitl16 | 81.6% |
Models are pretrained on ImageNet-1k for 100 epochs and evaluated with a k-NN classifier on the ImageNet validation set.
import lightly_train

if __name__ == "__main__":
    # Pretrain a DINOv2 vision foundation model on your data
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="dinov2/vitb14",
        method="dinov2",
    )

Autolabeling
LightlyTrain provides simple commands to autolabel your unlabeled data using DINOv2 or DINOv3 pretrained models. This allows you to efficiently boost the performance of your smaller models by leveraging all your unlabeled images.
| Implementation | Model | Autolabel | Val mIoU | Params (M) | Input Size |
|---|---|---|---|---|---|
| LightlyTrain | dinov3/vits16-eomt | ❌ | 0.466 | 21.6 | 518×518 |
| LightlyTrain | dinov3/vits16-eomt-ade20k | ✅ | 0.533 | 21.6 | 518×518 |
| LightlyTrain | dinov3/vitb16-eomt | ❌ | 0.544 | 85.7 | 518×518 |
| LightlyTrain | dinov3/vitb16-eomt-ade20k | ✅ | 0.573 | 85.7 | 518×518 |
The improved results with autolabeling were achieved by first fine-tuning a ViT-H+ model on the ADE20K dataset, reaching 0.595 validation mIoU. This model was then used to autolabel 100k images from the SUN397 dataset. The smaller models were subsequently fine-tuned on these autolabels and evaluated on the ADE20K validation set.
import lightly_train

if __name__ == "__main__":
    # Autolabel your data with a DINOv3 semantic segmentation model
    lightly_train.predict_semantic_segmentation(
        out="out/my_autolabeled_data",
        data="my_data_dir",
        model="dinov3/vitb16-eomt-coco",
        # Or use one of your own model checkpoints
        # model="out/my_experiment/exported_models/exported_best.pt",
    )

# The autolabeled masks will be saved in this format:
# out/my_autolabeled_data
# ├── <image name>.png
# ├── <image name>.png
# └── …
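These masks use the same images/masks layout that train_semantic_segmentation expects, so the autolabeled data can be fed back into fine-tuning a smaller model. A minimal sketch under the assumption that the autolabeled PNGs can be used directly as the masks directory and that the class mapping matches the model used for autolabeling; all paths and class names are placeholders.

import lightly_train

if __name__ == "__main__":
    # Fine-tune a smaller model on the autolabeled masks produced above.
    # Paths and class names are placeholders; adjust them to your setup.
    lightly_train.train_semantic_segmentation(
        out="out/my_autolabel_finetune",
        model="dinov3/vits16-eomt",
        data={
            "train": {
                "images": "my_data_dir",
                "masks": "out/my_autolabeled_data",
            },
            "val": {
                "images": "my_val_dir/images",
                "masks": "my_val_dir/masks",
            },
            "classes": {
                # Class ids and names must match those predicted by the
                # autolabeling model; these entries are placeholders.
                0: "background",
                1: "person",
                # ...
            },
        },
    )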
- Python, Command Line, and Docker support
- Built for high performance including multi-GPU and multi-node support
- Monitor training progress with MLflow, TensorBoard, Weights & Biases, and more
- Runs fully on-premises with no API authentication
- Export models in their native format for fine-tuning or inference
- Export models in ONNX or TensorRT format for edge deployment
LightlyTrain supports the following model and workflow combinations.
| Model | Object Detection | Instance Segmentation | Semantic Segmentation |
|---|---|---|---|
| DINOv3 | ✅ 🔗 | ✅ 🔗 | ✅ 🔗 |
| DINOv2 | ✅ 🔗 | | ✅ 🔗 |
| Model | Distillation | Pretraining |
|---|---|---|
| DINOv3 | ✅ 🔗 | |
| DINOv2 | ✅ 🔗 | ✅ 🔗 |
| Torchvision ResNet, ConvNext, ShuffleNetV2 | ✅ 🔗 | ✅ 🔗 |
| TIMM models | ✅ 🔗 | ✅ 🔗 |
| Ultralytics YOLOv5–YOLO12 | ✅ 🔗 | ✅ 🔗 |
| RT-DETR, RT-DETRv2 | ✅ 🔗 | ✅ 🔗 |
| RF-DETR | ✅ 🔗 | ✅ 🔗 |
| YOLOv12 | ✅ 🔗 | ✅ 🔗 |
| Custom PyTorch Model | ✅ 🔗 | ✅ 🔗 |
Contact us if you need support for additional models.
LightlyTrain collects anonymous usage events to help us improve the product. We only track the training method, model architecture, and system information (OS, GPU). To opt out, set the environment variable:
export LIGHTLY_TRAIN_EVENTS_DISABLED=1
LightlyTrain offers flexible licensing options to suit your specific needs:
- AGPL-3.0 License: Perfect for open-source projects, academic research, and community contributions. Share your innovations with the world while benefiting from community improvements.
- Commercial License: Ideal for businesses and organizations that need proprietary development freedom. Enjoy all the benefits of LightlyTrain while keeping your code and models private.
- Free Community License: Available for students, researchers, early-stage startups, or anyone exploring or experimenting with LightlyTrain. Empower the next generation of innovators with full access to the world of pretraining.
We're committed to supporting both open-source and commercial users. Contact us to discuss the best licensing option for your project!
