Description
Checklist
- I have searched for similar issues.
- I have tested with the latest development wheel.
- I have checked the release documentation and the latest documentation (for main branch).
Describe the issue
I have been trying to train Point Transformer on my custom dataset using the torch pipeline, but training crashes because too few neighbours are found for some points.
Steps to reproduce the bug
Part of the config, with a large number of points:
train_dir: train
val_dir: [val, test]
test_dir: []
dataset_info: dataset_info.json
sampler:
  name: SemSegRandomSampler
test_result_folder: ./test
min_points: 32768
padding_noise_std: 0.01
Error message
@pop-os:~/Documents/HPC pipeline$ python training_pipeline/open3dml-clone/scripts/run_pipeline.py torch
-c training_pipeline/configs/pointtransformer_helios.yml
--dataset.dataset_path data_processing/helios_open3dml_data
--pipeline SemanticSegmentation
--dataset.use_cache True
--device cuda
--pipeline.log_level DEBUG
Using external Open3D-ML in /homeHPC pipeline/training_pipeline/open3dml-clone
regular arguments
backend: gloo
batch_size: null
cfg_dataset: null
cfg_file: training_pipeline/configs/pointtransformer_helios.yml
cfg_model: null
cfg_pipeline: null
ckpt_path: null
dataset: null
dataset_path: null
device: cuda
device_ids:
- '0'
framework: torch
host: localhost
main_log_dir: null
max_epochs: null
mode: null
model: null
node_rank: 0
nodes: 1
pipeline: SemanticSegmentation
port: '12355'
seed: 0
split: train
extra arguments
dataset.dataset_path: data_processing/helios_open3dml_data
dataset.use_cache: 'True'
pipeline.log_level: DEBUG
INFO - 2025-09-26 14:34:01,560 - heliosconstruction - HeliosConstruction dataset initialized with 152 train / 48 val / 0 test files.
INFO - 2025-09-26 14:34:01,703 - semantic_segmentation - DEVICE : cuda
INFO - 2025-09-26 14:34:01,703 - semantic_segmentation - Logging in file : ./logs/PointTransformer_HeliosConstruction_torch/log_train_2025-09-26_14-34-01.txt
INFO - 2025-09-26 14:34:01,704 - heliosconstruction - Found 152 point clouds for train
INFO - 2025-09-26 14:34:01,705 - heliosconstruction - Found 48 point clouds for validation
INFO - 2025-09-26 14:34:01,956 - semantic_segmentation - Initializing from scratch.
INFO - 2025-09-26 14:34:01,957 - semantic_segmentation - Writing summary in train_log/00020_PointTransformer_HeliosConstruction_torch.
INFO - 2025-09-26 14:34:01,957 - semantic_segmentation - Started training
INFO - 2025-09-26 14:34:01,957 - semantic_segmentation - === EPOCH 0/200 ===
training: 0%| | 0/76 [00:00<?, ?it/s]/home/Documents/HPC pipeline/training_pipeline/open3dml-clone/ml3d/datasets/augment/augmentation.py:78: UserWarning: It is recommended to recenter the pointcloud before calling rotate.
warnings.warn(
training: 0%| | 0/76 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/Documents/HPC pipeline/training_pipeline/open3dml-clone/scripts/run_pipeline.py", line 261, in <module>
sys.exit(main())
^^^^^^
File "/home/Documents/HPC pipeline/training_pipeline/open3dml-clone/scripts/run_pipeline.py", line 192, in main
pipeline.run_train()
File "/home/Documents/HPC pipeline/training_pipeline/open3dml-clone/ml3d/torch/pipelines/semantic_segmentation.py", line 416, in run_train
results = model(inputs['data'])
^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/Helios/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/Helios/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home//Documents/HPC pipeline/training_pipeline/open3dml-clone/ml3d/torch/models/point_transformer.py", line 175, in forward
p, f, r = self.encoders[i]([points[i], feats[i], row_splits[i]])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/Helios/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/Helios/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/Helios/lib/python3.11/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
^^^^^^^^^^^^^
File "/home/miniconda3/envs/Helios/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/Helios/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/Documents/HPC pipeline/training_pipeline/open3dml-clone/ml3d/torch/models/point_transformer.py", line 643, in forward
feat = self.relu(self.bn2(self.transformer2([point, feat, row_splits])))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/Helios/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/Helios/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/Documents/HPC pipeline/training_pipeline/open3dml-clone/ml3d/torch/models/point_transformer.py", line 430, in forward
feat_k = queryandgroup(self.nsample,
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/Documents/HPC pipeline/training_pipeline/open3dml-clone/ml3d/torch/models/point_transformer.py", line 680, in queryandgroup
idx = knn_batch(points,
^^^^^^^^^^^^^^^^^
File "/home/Documents/HPC pipeline/training_pipeline/open3dml-clone/ml3d/torch/models/point_transformer.py", line 734, in knn_batch
return ans.neighbors_index.reshape(-1, k).long().cuda()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[-1, 16]' is invalid for input of size 345
Expected behavior
No response
Open3D, Python and System information
- Operating system: Ubuntu 22.04,
- Python version: Python 3.11
- Open3D version: 0.18.0
- System type:
- Is this remote workstation?: no
- How did you install Open3D?: pip
- Compiler version (if built from source):
Additional information
The crash comes from knn_search in the current Open3D release returning ragged neighbor lists: some queries yield fewer than 16 neighbours, so Open3D-ML's PointTransformer, which assumes a fixed k, cannot reshape the flat index tensor to [-1, 16] (345 is not a multiple of 16). That behaviour changed in Open3D ≥0.18, and upstream Open3D-ML master has not been patched for it yet.
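A possible workaround, as a minimal sketch only: pad each ragged neighbour row to exactly k entries before the reshape in knn_batch. The helper name pad_ragged_neighbors is hypothetical and not part of Open3D or Open3D-ML. It assumes the search result provides a row-splits array alongside neighbors_index (called neighbors_row_splits here), and that repeating a query's last neighbour is an acceptable padding strategy. It works on plain Python sequences to stay framework-independent, so tensors would need .tolist() first, and a real fix would likely be vectorised. The example input is made up to match the size in the traceback (21 queries with 16 neighbours plus one with 9 gives 345); the actual distribution in my run is unknown.

```python
from typing import List, Sequence


def pad_ragged_neighbors(neighbors_index: Sequence[int],
                         neighbors_row_splits: Sequence[int],
                         k: int) -> List[List[int]]:
    """Turn a ragged neighbour list into a dense table with k columns.

    Query i owns neighbors_index[row_splits[i]:row_splits[i + 1]].
    Rows with fewer than k entries are padded by repeating their last
    neighbour (an assumed strategy); rows with more than k entries are
    truncated. A query with no neighbour at all raises ValueError.
    """
    rows = []
    for start, end in zip(neighbors_row_splits[:-1], neighbors_row_splits[1:]):
        row = list(neighbors_index[start:end])[:k]
        if not row:
            raise ValueError("query has no neighbours; cannot pad")
        row.extend([row[-1]] * (k - len(row)))
        rows.append(row)
    return rows


# Hypothetical input with the same total size as in the traceback:
# 21 queries with 16 neighbours each, plus one query with only 9.
k = 16
counts = [16] * 21 + [9]
row_splits = [0]
for count in counts:
    row_splits.append(row_splits[-1] + count)
flat = list(range(row_splits[-1]))  # dummy neighbour indices

# len(flat) is not a multiple of k, which is why reshape(-1, k) fails.
remainder = len(flat) % k

# After padding, every row has exactly k entries.
dense = pad_ragged_neighbors(flat, row_splits, k)
```

Padding with a repeated neighbour keeps the tensor shape the model expects, but it also gives that neighbour extra weight in the attention step, so filtering out point clouds (or downsampled stages) with fewer than k points may be the cleaner option.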