
ENH: ImpactMetric – disable Torch JIT runtime optimizations to avoid late runtime dependencies and move normalization to models #1394

Merged
N-Dekker merged 5 commits into SuperElastix:main from vboussot:feature/impact-metric
Jan 30, 2026
Conversation

@vboussot
Contributor

This PR improves the robustness of ImpactMetric in two ways:

  1. Torch JIT runtime optimizations are disabled in BeforeRegistration() to avoid late graph rewrites during registration. When the forward pass is executed several times with the same image, the JIT may try to re-optimize the graph and invoke NVRTC for on-the-fly compilation. NVRTC is not bundled with LibTorch and requires the user’s CUDA toolkit to be installed, which can lead to runtime errors on systems where the toolkit is missing.

  2. Implicit image normalization is removed from the metric. ImpactMetric now calls the TorchScript models with the raw image together with its global statistics (min, max, mean, std) and image direction. This allows each model to reproduce exactly the same normalization and canonicalization steps as during training, inside its own forward pass.
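To illustrate the statistics involved, here is a minimal plain-C++ sketch (the name ComputeGlobalStats is hypothetical, not from the PR) of computing the global min/max/mean/std that the metric now forwards to the model alongside the raw image:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not part of ImpactMetric: the global image
// statistics (min, max, mean, std) that are handed to the model.
struct ImageStats
{
  double min, max, mean, std;
};

ImageStats
ComputeGlobalStats(const std::vector<float> & image)
{
  const auto [minIt, maxIt] = std::minmax_element(image.begin(), image.end());
  double sum = 0.0;
  double sumOfSquares = 0.0;
  for (const float value : image)
  {
    sum += value;
    sumOfSquares += static_cast<double>(value) * value;
  }
  const double mean = sum / image.size();
  const double variance = sumOfSquares / image.size() - mean * mean;
  return { *minIt, *maxIt, mean, std::sqrt(std::max(variance, 0.0)) };
}
```

Each model can then reproduce, inside its own forward pass, exactly the normalization it was trained with, using these values plus the image direction.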

@N-Dekker
Member

Thank you for the new PR, @vboussot! I see you upgraded LibTorch from 2.6.0 to 2.8.0. Just curious, does it have any particular improvement that you found interesting?

We are currently preparing a new release of elastix. Given the fact that the elastix+IMPACT binaries don't always work "out of the box" for end-users (they may need to manually install the appropriate version of libtorch), maybe we should present two sets of binaries for each platform: one with and one without IMPACT support. What do you think?

@vboussot
Contributor Author

I chose to ship the IMPACT builds against LibTorch 2.8 + CUDA 12.8 because this gives the best practical coverage of modern GPUs. CUDA 12.8 supports all recent architectures (Ada, Hopper, Blackwell, …) as long as the driver is recent enough (minimum driver versions: Linux ≥ 570.26, Windows ≥ 570.65), which makes it a good default for end-users with GPUs.
CUDA

I wanted to target CUDA 12.8, but LibTorch 2.6 simply does not exist in a +cu128 variant (there is no libtorch-2.6.0+cu128 package on the PyTorch download servers, see https://download.pytorch.org/libtorch/cu128). That is the practical reason why I moved to LibTorch 2.8, and this change does not affect IMPACT itself in any way, since it does not rely on any feature introduced in 2.8.

I could have gone even further (e.g. LibTorch 2.9/2.10), but each new release makes LibTorch larger, without bringing any concrete benefit for IMPACT. LibTorch 2.8 is therefore a good compromise: it gives access to CUDA 12.8 while keeping the footprint reasonable.

Self-contained releases.
For IMPACT CPU-only builds, we could realistically make them fully self-contained by bundling LibTorch in the release, since the CPU variant remains relatively small and only adds a hundred megabytes.

For CUDA builds, however, bundling LibTorch becomes problematic: the LibTorch CUDA package alone exceeds 2 GB, which already hits GitHub’s release size limits. To stay consistent across platforms and across all IMPACT builds, I therefore chose not to vendor LibTorch in any IMPACT binary and instead let the user provide it.

This also has the advantage that users who already have a compatible LibTorch installed do not end up with two copies on their system.

The main limitation today is that the LibTorch version must be strictly identical to the one used at build time. The PyTorch team is actively working toward a more stable C++ ABI (see https://pytorch.org/blog/pytorch-2-8/ and https://pytorch.org/blog/pytorch-2-9/), but this is not fully mature yet. In the longer term, this should make it possible to reuse the same LibTorch installation across multiple projects.

In this context, providing two sets of binaries per platform, one standard elastix build and one with IMPACT (CPU or CUDA 12.8), is the right solution. For users who do not need IMPACT, there is no reason to download hundreds of megabytes in the CPU case, or several gigabytes in the CUDA case.

@vboussot vboussot force-pushed the feature/impact-metric branch from 717179e to abf86bb on January 26, 2026 15:34
@N-Dekker
Member

@vboussot As I see you already offer elastix binaries with the IMPACT metric at https://github.com/vboussot/ImpactElastix/releases/tag/1.0.0, we are now considering releasing the upcoming elastix binaries without IMPACT at SuperElastix/elastix. Users will then still have the option to enable IMPACT when building their own binaries from SuperElastix/elastix, or to download from your site. Would that be OK with you?

We still need to figure out how to adjust the GitHub Action yml files then, to build binaries again without IMPACT. I don't want to "throw away the child with the bathwater". 🤷

@vboussot
Contributor Author

Ideally, I think we can aim for a very simple model:

  • a single release per platform,
  • elastix always works “out of the box”,
  • if LibTorch is present in the runtime environment, then IMPACT works,
  • otherwise IMPACT is simply unavailable, while everything else keeps working (with a clear error if someone tries to use it).

This just requires IMPACT to be loaded lazily, i.e., only when Metric "Impact" is requested, so that no Torch dependency is pulled at startup.

The nice side effect is that there would be only one binary per platform to build, test, release, and maintain.

For now, I’m fine with the plan you describe (official binaries without IMPACT, and IMPACT-enabled binaries from my side, which I have already provided for Slicer), but I think this “single release, IMPACT-on-demand” model would be ideal in the long run.

@N-Dekker
Member

@vboussot We try to make a new elastix release before the end of the week. Do you think we should still have this PR (#1394) in there? Or would it be OK to postpone it to the next elastix release? (Maybe in a minor release, in a few months or so.)

Of course, you are free to make your own IMPACT/elastix release at github.com/vboussot/ImpactElastix/releases

@vboussot
Contributor Author

vboussot commented Jan 27, 2026

I understand your wish to keep this release conservative, especially with a deadline at the end of the week.

PR #1394 is actually very small: it only fixes two robustness issues in ImpactMetric:

  • disabling Torch JIT runtime optimizations to avoid late runtime dependencies (NVRTC crashes),
  • moving image normalization into the models to guarantee consistency with training.

These fixes are already integrated in my IMPACT release for 3D Slicer and have been tested on Linux and Windows, so they are not experimental.

PR #1396 is, in my opinion, ideal for a release. It is minimal and conservative:

  • elastix behaves exactly as before by default,
  • only when (Metric "Impact") is requested does it try to load a plugin,
  • IMPACT is built as a MODULE, so elastix has no hard dependency on LibTorch,
  • clear error messages are shown if IMPACT cannot be loaded,
  • the existing CI remains unchanged,
  • a dedicated workflow produces three clean release artifacts:
    • Elastix Ubuntu (with ImpactMetric.so and ImpactMetricCuda.so),
    • Elastix Windows (with ImpactMetric.dll and ImpactMetricCuda.dll),
    • Elastix macOS (with the ImpactMetric.so plugin).

The .so plugins are copied into the lib directory, so they are found automatically at runtime.

elastix remains lightweight and works out of the box, while IMPACT can be enabled transparently. What do you think?

Comment on lines +288 to +291
Old:
    torch::Tensor intersection = (fixedOutput * movingOutput).sum(1);
    torch::Tensor unionSum = (fixedOutput + movingOutput).sum(1);
    torch::Tensor diceScore = (2 * intersection + 1e-6) / (unionSum + 1e-6);

New:
    double eps = 1e-6;
    torch::Tensor u = (fixedOutput * movingOutput).sum(1) + eps;
    torch::Tensor v = (fixedOutput + movingOutput).sum(1) + eps;
Member


I don't think I'm sufficiently knowledgeable to review this in detail, but it looks like a bit of a hack (or at least a workaround), having to add this eps = 1e-6 to those tensors. Is it really necessary? If so, shouldn't the value of eps be relative to the values of fixedOutput and movingOutput?

A line of comment to explain the reason for this eps might be helpful here.

Also I see, you replaced the meaningful identifier names intersection, unionSum, and diceScore. No problem to me, but I just wonder, were those identifiers incorrect?

Contributor Author


In short:

The eps in the denominator is only there for numerical stability: if all channels are zero (no segmented structure in the patch in both images), then v = 0 and we get a 0/0. The eps prevents NaNs.

The eps in the numerator is not required for stability. It encodes a convention: when there is no structure in either image (u = 0 and v = 0), we force the Dice to be 1, i.e. two empty outputs are considered perfectly similar.

So:
eps in the denominator -> avoids division by zero;
eps in the numerator -> forces Dice to 1 when there is no segmentation in the patch.
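In scalar form (a plain-C++ sketch, not the actual tensor code from the PR), both effects are easy to check:

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of the smoothed Dice score from the original code:
// (2 * intersection + eps) / (unionSum + eps).
// - eps in the denominator avoids 0/0 when the patch is empty in both images;
// - eps in the numerator makes the empty/empty case evaluate to exactly 1.
double
SmoothedDice(const double intersection, const double unionSum, const double eps = 1e-6)
{
  return (2.0 * intersection + eps) / (unionSum + eps);
}
```

For two identical non-empty outputs (intersection = |A|, unionSum = 2|A|) the score is 1; for two empty outputs it is eps / eps = 1 instead of NaN.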

Regarding the variable names, the previous identifiers were not incorrect. I simply switched to u and v because it is clearer for me when writing and checking the gradient.

Member


Thank you for your explanation! So then, would eps = DBL_MIN work for you as well? (Or maybe std::numeric_limits<double>::min().) I try not to introduce more "magic numbers" in elastix (like 1e-6), although I know there are a few in there already 🤷

If the outcome of those sum(1) calls may be exactly zero, it's OK to handle that separately, in my opinion.

In general we prefer meaningful variable names (although I do use i for an integer occasionally). No show-stopper though.

Valentin Boussot and others added 3 commits January 29, 2026 16:29
Disable Torch JIT runtime optimizations in BeforeRegistration() to avoid late graph rewrites, which lead to late CUDA toolkit runtime dependencies at execution time
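One plausible way to do this in C++ (a sketch only: it requires LibTorch to build, the exact header path may vary between versions, and the call actually used in the PR may differ) is the graph-executor switch exposed by the TorchScript runtime:

```cpp
#include <torch/csrc/jit/runtime/graph_executor.h>

void
DisableJitRuntimeOptimizationsSketch()
{
  // Keep the initially compiled graph as-is: no late re-optimization,
  // hence no on-the-fly NVRTC compilation during registration.
  torch::jit::setGraphExecutorOptimize(false);
}
```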

Make feature extraction models responsible for image normalization. The metric no longer applies implicit pre-processing, ensuring that each model runs under the exact conditions it was trained for.
@vboussot vboussot force-pushed the feature/impact-metric branch from c908485 to 4e8811a on January 29, 2026 15:29
args.emplace_back(m_imageDirectionTensor);
}

return m_Model->forward(args).toList().vec();
Member

@N-Dekker N-Dekker Jan 29, 2026


If the performance penalty from heap allocation matters, you might consider "moving" the args, instead of just passing them by value:

    return m_Model->forward(std::move(args)).toList().vec();

Doing so would replace a potentially expensive copy-constructor call with a fast non-exception-throwing move-constructor call of std::vector.
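The effect can be demonstrated with a plain std::vector (this sketch is independent of torch::IValue; ConsumeByValue is a hypothetical stand-in for a by-value sink like Module::forward):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a function that, like Module::forward,
// takes its argument vector by value.
std::size_t
ConsumeByValue(std::vector<std::string> inputs)
{
  return inputs.size();
}
```

Passing std::move(args) hands the same heap buffer down the call chain; without it, the whole vector would be copied at the call site.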

Contributor Author


Good point, thank you !

Member


You're welcome!

I noticed that std::move is indeed beneficial here, because of the way Module::forward is defined in my local copy of "libtorch-win-shared-with-deps-debug-2.8.0+cu128\libtorch\include\torch\csrc\jit\api\module.h":

  IValue forward(std::vector<IValue> inputs, const Kwargs& kwargs = Kwargs()) {
    return get_method("forward")(std::move(inputs), kwargs);
  }

😃

Handle the degenerate case (union == 0) explicitly instead of using a magic epsilon.

This avoids any 0/0 division, prevents NaNs, and makes the empty/empty -> Dice = 1 convention explicit in both value and gradient.
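A scalar sketch (plain C++, not the actual tensor code) of this branch-based variant:

```cpp
#include <cassert>

// Scalar sketch: handle union == 0 explicitly instead of adding an epsilon.
// Two empty outputs are considered perfectly similar (Dice = 1).
double
ExplicitDice(const double intersection, const double unionSum)
{
  if (unionSum == 0.0)
  {
    return 1.0; // empty/empty convention: no 0/0, no NaN
  }
  return 2.0 * intersection / unionSum;
}
```

Unlike the epsilon version, this returns exact values in the non-degenerate case as well.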
Member

@N-Dekker N-Dekker left a comment


Thank you very much, @vboussot ! I'll merge now, so that these improvements can still be included with elastix 5.3.0.

We would very much like to have #1396 in the next elastix release, which could be a minor release (elastix 5.3.1).

For this release, I think I still need to temporarily revert some of your yml changes, in order to publish elastix binaries without IMPACT. Right?


For your information, my idea is to temporarily revert building IMPACT elastix binaries in a separate branch (not main), specific for elastix 5.3.0: https://github.com/SuperElastix/elastix/tree/release-5.3.0 I'm now preparing a PR for this release branch: https://github.com/SuperElastix/elastix/tree/Temporarily-remove-IMPACT-from-binaries-for-release

@N-Dekker N-Dekker merged commit 6c09b6f into SuperElastix:main Jan 30, 2026
5 checks passed
@vboussot
Contributor Author

Yes, that should work 👍

You can simply:

  • set USE_ImpactMetric=OFF,
  • stop building ElastixImpactMetricExample.cxx in the ExternalProject when this flag is OFF.

That’s enough to disable IMPACT and publish elastix binaries without it.

@N-Dekker
Member

Set USE_ImpactMetric=OFF. Stop building ElastixImpactMetricExample.cxx in the ExternalProject when this flag is OFF.

That’s enough to disable IMPACT and publish elastix binaries without it.

I'm almost there, but it still took me a few more adjustments... 7 commits so far:

https://github.com/SuperElastix/elastix/tree/Temporarily-remove-IMPACT-from-binaries-for-release

I guess you might have done it slightly differently, but it's also a good learning experience for me, to find out all places that affect IMPACT binary building at the CI!
