ENH: ImpactMetric – disable Torch JIT runtime optimizations to avoid late runtime dependencies and move normalization to models #1394
Conversation
|
Thank you for the new PR, @vboussot! I see you upgraded LibTorch from 2.6.0 to 2.8.0. Just curious, does it have any particular improvement that you found interesting? We are currently preparing a new release of elastix. Given that the elastix+IMPACT binaries don't always work "out of the box" for end-users (they may need to manually install the appropriate version of libtorch), maybe we should present two sets of binaries for each platform: one with and one without IMPACT support. What do you think? |
|
I chose to ship the IMPACT builds against LibTorch 2.8 + CUDA 12.8 because this gives the best practical coverage of modern GPUs. CUDA 12.8 supports all recent architectures (Ada, Hopper, Blackwell, …) as long as the driver is recent enough (minimum driver versions: Linux ≥ 570.26, Windows ≥ 570.65), which makes it a good default for end-users with GPUs. I wanted to target CUDA 12.8, but LibTorch 2.6 simply does not exist in a +cu128 variant (there is no libtorch-2.6.0+cu128 package on the PyTorch download servers, see https://download.pytorch.org/libtorch/cu128). I could have gone even further (e.g. LibTorch 2.9/2.10), but each new release makes LibTorch larger without bringing any concrete benefit for IMPACT. LibTorch 2.8 is therefore a good compromise: it gives access to CUDA 12.8 while keeping the footprint reasonable.

Regarding self-contained releases: for CUDA builds, bundling LibTorch becomes problematic, because the LibTorch CUDA package alone exceeds 2 GB, which already hits GitHub's release size limits. To stay consistent across platforms and across all IMPACT builds, I therefore chose not to vendor LibTorch in any IMPACT binary and instead let the user provide it. This also has the advantage that users who already have a compatible LibTorch installed do not end up with two copies on their system. The main limitation today is that the LibTorch version must be strictly identical to the one used at build time. The PyTorch team is actively working toward a more stable C++ ABI (see https://pytorch.org/blog/pytorch-2-8/ and https://pytorch.org/blog/pytorch-2-9/), but this is not fully mature yet. In the longer term, this should make it possible to reuse the same LibTorch installation across multiple projects.

In this context, providing two sets of binaries per platform, one standard elastix build and one with IMPACT (CPU or CUDA 12.8), is the right solution. For users who do not need IMPACT, there is no reason to download hundreds of megabytes in the CPU case, or several gigabytes in the CUDA case. |
717179e to abf86bb
|
@vboussot As I see you already offer elastix binaries with the IMPACT metric at https://github.com/vboussot/ImpactElastix/releases/tag/1.0.0, we are now considering releasing the elastix binaries without IMPACT at SuperElastix/elastix for the upcoming release. Users will then still have the option to enable IMPACT when building their own binaries from SuperElastix/elastix, or to download them from your site. Would that be OK with you? We still need to figure out how to adjust the GitHub Action yml files then, to build binaries again without IMPACT. I don't want to "throw the baby out with the bathwater". 🤷 |
|
Ideally, I think we can aim for a very simple model: a single binary per platform, with IMPACT available on demand.
This just requires IMPACT to be loaded lazily, i.e., only when Metric "Impact" is requested, so that no Torch dependency is pulled in at startup. The nice side effect is that there would be only one binary per platform to build, test, release, and maintain. For now, I’m fine with the plan you describe (official binaries without IMPACT, and IMPACT-enabled binaries from my side, which I have already provided for Slicer), but I think this “single release, IMPACT-on-demand” model would be ideal in the long run. |
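To illustrate the lazy-loading idea, here is a minimal C++ sketch (all names are hypothetical, not elastix code): the Torch-backed metric is constructed through a factory registry, so nothing Torch-related is touched until Metric "Impact" is actually requested.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical base class standing in for an elastix metric.
struct Metric { virtual ~Metric() = default; };

// Stand-in for the Torch-backed metric; the counter makes the
// "heavy dependency loaded only on demand" behavior observable.
struct ImpactMetric : Metric {
  ImpactMetric() { ++torchLoads; }  // stands in for pulling in libtorch
  static int torchLoads;
};
int ImpactMetric::torchLoads = 0;

// Registry of metric factories: registering is cheap, only Create()
// actually constructs the metric (and thus loads its dependencies).
class MetricRegistry {
public:
  using Factory = std::function<std::unique_ptr<Metric>()>;
  void Register(const std::string& name, Factory f) { factories_[name] = std::move(f); }
  std::unique_ptr<Metric> Create(const std::string& name) {
    auto it = factories_.find(name);
    if (it == factories_.end()) throw std::runtime_error("unknown metric: " + name);
    return it->second();  // heavy dependency is only touched here
  }
private:
  std::map<std::string, Factory> factories_;
};
```

In this model, registering "Impact" at startup costs nothing; the Torch runtime would only be needed once a parameter file actually selects that metric.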
abf86bb to c908485
|
@vboussot We try to make a new elastix release before the end of the week. Do you think we should still have this PR (#1394) in there? Or would it be OK to postpone it to the next elastix release? (Maybe in a minor release, in a few months or so.) Of course, you are free to make your own IMPACT/elastix release at github.com/vboussot/ImpactElastix/releases |
|
I understand your wish to keep this release conservative, especially with a deadline at the end of the week. PR #1394 is actually very small: it only fixes two robustness issues in ImpactMetric:
These fixes are already integrated in my IMPACT release for 3D Slicer and have been tested on Linux and Windows, so they are not experimental. PR #1396 is, in my opinion, ideal for a release. It is minimal and conservative:
The .so plugins are copied into the lib directory, so they are found automatically at runtime. elastix remains lightweight and works out of the box, while IMPACT can be enabled transparently. What do you think? |
+ double eps = 1e-6;

- torch::Tensor intersection = (fixedOutput * movingOutput).sum(1);
- torch::Tensor unionSum = (fixedOutput + movingOutput).sum(1);
- torch::Tensor diceScore = (2 * intersection + 1e-6) / (unionSum + 1e-6);
+ torch::Tensor u = (fixedOutput * movingOutput).sum(1) + eps;
+ torch::Tensor v = (fixedOutput + movingOutput).sum(1) + eps;
|
I don't think I'm sufficiently knowledgeable to review this in detail, but it looks like a bit of a hack (or at least a workaround), having to add this eps = 1e-6 to those tensors. Is it really necessary? If so, shouldn't the value of eps be relative to the values of fixedOutput and movingOutput?
A line of comment to explain the reason for this eps might be helpful here.
Also, I see you replaced the meaningful identifier names intersection, unionSum, and diceScore. No problem for me, but I just wonder, were those identifiers incorrect?
|
In short:
The eps in the denominator is only there for numerical stability: if all channels are zero (no segmented structure in the patch in both images), then v = 0 and we get a 0/0. The eps prevents NaNs.
The eps in the numerator is not required for stability. It encodes a convention: when there is no structure in either image (u = 0 and v = 0), we force the Dice to be 1, i.e. two empty outputs are considered perfectly similar.
So:
eps in the denominator -> avoids division by zero;
eps in the numerator -> forces Dice to 1 when there is no segmentation in the patch.
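On scalar stand-ins for the tensor sums, the two roles of eps can be checked directly. This is a sketch of the formula under discussion, not the actual ImpactMetric code:

```cpp
#include <cassert>

// Dice score with the eps convention: eps in the denominator avoids 0/0
// (and thus NaNs), eps in the numerator forces Dice = 1 when both outputs
// are empty (intersection == unionSum == 0).
double DiceWithEps(double intersection, double unionSum) {
  const double eps = 1e-6;
  return (2.0 * intersection + eps) / (unionSum + eps);
}
```

With intersection = unionSum = 0 the result is eps/eps = 1, which encodes the empty/empty convention; for non-degenerate inputs the eps terms are negligible.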
Regarding the variable names, the previous identifiers were not incorrect. I simply switched to u and v because it is clearer for me when writing and checking the gradient.
|
Thank you for your explanation! So then, would eps = DBL_MIN work for you as well? (Or maybe std::numeric_limits<double>::min().) I try not to introduce more "magic numbers" in elastix (like 1e-6), although I know there are a few in there already 🤷
If the outcome of those sum(1) calls may be exactly zero, it's OK to handle that separately, in my opinion.
In general we prefer meaningful variable names (although I do use i for an integer occasionally). No show-stopper though.
Disable Torch JIT runtime optimizations in BeforeRegistration() to avoid late graph rewrites, which lead to late CUDA toolkit runtime dependencies at execution time. Make feature extraction models responsible for image normalization: the metric no longer applies implicit pre-processing, ensuring that each model runs under the exact conditions it was trained for.
c908485 to 4e8811a
    args.emplace_back(m_imageDirectionTensor);
  }

  return m_Model->forward(args).toList().vec();
|
If the performance penalty from heap allocation matters, you might consider "moving" the args, instead of just passing them by value:

return m_Model->forward(std::move(args)).toList().vec();

Doing so would replace a potentially expensive copy-constructor call with a fast non-exception-throwing move-constructor call, of std::vector.
|
Good point, thank you !
|
You're welcome!
I noticed that std::move is indeed beneficial here, because of the way Module::forward is defined, in my local copy of "libtorch-win-shared-with-deps-debug-2.8.0+cu128\libtorch\include\torch\csrc\jit\api\module.h":

IValue forward(std::vector<IValue> inputs, const Kwargs& kwargs = Kwargs()) {
  return get_method("forward")(std::move(inputs), kwargs);
}

😃
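Because Module::forward takes its inputs vector by value, std::move at the call site saves one element-wise copy of the vector. A small self-contained demonstration, where Value is a hypothetical stand-in for torch::jit::IValue that counts copy constructions:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for torch::jit::IValue; counts copies so the effect of
// std::move on the argument vector is observable.
struct Value {
  static int copies;
  Value() = default;
  Value(const Value&) { ++copies; }
  Value(Value&&) noexcept {}
  Value& operator=(const Value&) { ++copies; return *this; }
  Value& operator=(Value&&) noexcept { return *this; }
};
int Value::copies = 0;

// Mimics the by-value parameter of Module::forward.
std::size_t Forward(std::vector<Value> inputs) { return inputs.size(); }
```

Passing the vector without std::move copy-constructs every element; moving it transfers the vector's buffer in O(1) with no element copies.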
Handle the degenerate case (union == 0) explicitly instead of using a magic epsilon. This avoids any 0/0 division, prevents NaNs, and makes the empty/empty -> Dice = 1 convention explicit in both value and gradient.
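A scalar sketch of this explicit handling (hypothetical names; the real code operates on tensors and also adjusts the gradient accordingly):

```cpp
#include <cassert>

// Dice with the degenerate case handled explicitly instead of via a
// magic epsilon: union == 0 means no structure in either output, which
// is defined as Dice = 1 (two empty outputs are perfectly similar).
double DiceExplicit(double intersection, double unionSum) {
  if (unionSum == 0.0) {
    return 1.0;  // empty/empty convention; no 0/0, no NaN
  }
  return 2.0 * intersection / unionSum;
}
```

Unlike the eps variant, non-degenerate values are returned exactly, and the convention is visible in the control flow rather than hidden in the constants.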
|
Thank you very much, @vboussot ! I'll merge now, so that these improvements can still be included with elastix 5.3.0.
We would very much like to have #1396 in the next elastix release, which could be a minor release (elastix 5.3.1).
For this release, I think I still need to temporarily revert some of your yml changes, in order to publish elastix binaries without IMPACT. Right?
For your information, my idea is to temporarily revert building IMPACT elastix binaries in a separate branch (not main), specific for elastix 5.3.0: https://github.com/SuperElastix/elastix/tree/release-5.3.0 I'm now preparing a PR for this release branch: https://github.com/SuperElastix/elastix/tree/Temporarily-remove-IMPACT-from-binaries-for-release
|
Yes, that should work 👍 You can simply: That’s enough to disable IMPACT and publish elastix binaries without it |
I'm almost there, but it still took me a few more adjustments... 7 commits so far: https://github.com/SuperElastix/elastix/tree/Temporarily-remove-IMPACT-from-binaries-for-release I guess you might have done it slightly differently, but it's also a good learning experience for me, to find out all places that affect IMPACT binary building at the CI! |

This PR improves the robustness of ImpactMetric in two ways:
Torch JIT runtime optimizations are disabled in BeforeRegistration() to avoid late graph rewrites during registration. When the forward pass is executed several times with the same image, JIT may try to re-optimize the graph and invoke NVRTC for on-the-fly compilation. NVRTC is not bundled with LibTorch and requires the user’s CUDA toolkit to be installed, which can lead to runtime errors on systems where the CUDA toolkit is not installed.
Implicit image normalization is removed from the metric. ImpactMetric now calls the TorchScript models with the raw image together with its global statistics (min, max, mean, std) and image direction. This allows each model to reproduce exactly the same normalization and canonicalization steps as during training, inside its own forward pass.
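This division of labor can be sketched in plain C++ (hypothetical names; the real metric hands tensors to a TorchScript model): the metric only computes global statistics, and any normalization, here a z-score as one possible example, happens on the model side.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Global statistics the metric computes and forwards to the model.
struct ImageStats { double min, max, mean, std; };

ImageStats ComputeStats(const std::vector<double>& image) {
  ImageStats s{image.front(), image.front(), 0.0, 0.0};
  for (double v : image) {
    s.min = std::min(s.min, v);
    s.max = std::max(s.max, v);
    s.mean += v;
  }
  s.mean /= static_cast<double>(image.size());
  for (double v : image) s.std += (v - s.mean) * (v - s.mean);
  s.std = std::sqrt(s.std / static_cast<double>(image.size()));
  return s;
}

// One possible model-side normalization (z-score), as it could run inside
// the model's own forward pass rather than inside the metric.
std::vector<double> ZScore(std::vector<double> image, const ImageStats& s) {
  for (double& v : image) v = (v - s.mean) / s.std;
  return image;
}
```

The key point is that the metric never bakes in a particular normalization: a model trained on min-max-scaled inputs can apply that scaling itself from the same statistics, reproducing its training conditions exactly.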