
add Array template index and use long long for ProjDataInMemory #1676

Draft
KrisThielemans wants to merge 3 commits into UCL:master from KrisThielemans:ArrayTemplateIndex

Conversation

@KrisThielemans
Collaborator

Drastic revision that adds an indexT template parameter to VectorWithOffset, Array and IndexRange, to be able to use something other than int as the index type.
This is then used to allow for more bins in ProjDataInMemory.

fixes #1505

@KrisThielemans
Collaborator Author

C++ tests are fine. Weird things in the Python tests.

@z-k-li in principle you could check this with SIRF and "in memory" acquisition data. The STIR Python stuff will likely fail as per the tests.

Adding an indexT template parameter (defaulting to int) to various classes
related to arrays, including BasicCoordinate, IndexRange, VectorWithOffset
and Array.

This will allow using larger (or smaller) ranges for the indices.

Most code using these classes has not been changed, and therefore still uses
"int".

This was a bit more work than anticipated, as I had to move the
forward declarations to separate files, such that
- they are consistent
- there is only one place where the default is defined (as required
by C++)
This removes the limitation on the number of elements in the proj-data.

Fixes UCL#1505
- Looks like SWIG doesn't understand default template arguments in %template, unfortunately.
- cope with num_dimensions SWIG bug
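The SWIG limitation mentioned above can be illustrated with a hypothetical interface-file fragment (not STIR's actual .i files): in a %template directive, a default template argument is not filled in automatically, so all arguments have to be spelled out explicitly.

```swig
// Hypothetical SWIG interface fragment, assuming a class like:
// template <int num_dimensions, typename elemT, typename indexT = int>
// class Array { ... };

// %template(FloatArray3D) stir::Array<3, float>;
//   may fail: SWIG does not apply the default indexT here

%template(FloatArray3D) stir::Array<3, float, int>;  // workaround: spell out all arguments
```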
@KrisThielemans
Collaborator Author

The tests currently fail for unsigned indices (although I thought I had fixed that). However, unsigned indices are currently not used, except in the tests. I also still expect problems with SWIG, as above.

@ChristianHinge @z-k-li could you give this a go with parallelproj (either CPU or GPU). It doesn't have #1674 yet, but that fix doesn't affect parallelproj. (You'd have to disable building of STIR Python.)

For GPU, no doubt you will have to change

; this keyword allows increasing the number of chunks that the projector uses
; increase if you run out of GPU memory
num_gpu_chunks:=1

@ChristianHinge

@KrisThielemans I will try this out next week since we are just finalizing the MICCAI submission :-)! So just make sure I understand correctly - I will recompile the PR without python bindings and benchmark RAM + clock against the master branch. Is that correct?

@KrisThielemans
Collaborator Author

That's correct, but of course after installing CUDA drivers/toolkit, and switching the projector to parallelproj.

@ChristianHinge

@KrisThielemans I ran the recon with parallelproj using 4i5s, zoom 0.5, segment 4 (otherwise the same recon settings as the benchmark I ran for #1674, but using parallelproj)

Time: 118min (TOF CPU: 16 min)
RAM: 167GB (TOF CPU: 19GB)

I ran it on 4x A40 GPUs. When monitoring with nvidia-smi, I can see the work being offloaded to the GPUs, albeit the utilization is quite low since the GPU is idle most of the time. The resulting images look virtually identical to the master branch, but the voxel values differ a tiny bit.

@KrisThielemans
Collaborator Author

Thanks @ChristianHinge at least we know it works now (it used to crash), which is great. I do expect differences between parallelproj (Joseph projector) and the ray-tracing matrix (Siddon with a few lines), but images should look overall quite similar.

Indeed, most of the computation time sits in copying and reordering data. The GPU probably flies through it all. This will be work for our imminent hackathons.

@ChristianHinge

It is awesome to see the recon work on GPU! And quite cool that it distributes the workload evenly between the four A40s.



Successfully merging this pull request may close these issues:

- Calling parallelproj forward Segmentation fault (core dumped) for LAFOV
- ProjDataInMemory fails with more than 2^31 bins
