Data‐Consistent Inversion
Data-consistent inversion (DCI) is a measure-theoretic inversion technique that seeks to solve a specific class of stochastic inverse problems. Namely, given an observed (or target) probability measure/density on quantities of interest (QoI) and a computational model, DCI seeks a probability measure/density on model inputs such that the corresponding push-forward measure/density matches the observed/target. In other words, it seeks a pullback probability measure.
Of course, as with many inverse problems, the solution is not guaranteed to exist or to be unique. Both existence and uniqueness can be obtained if one regularizes the problem by incorporating prior knowledge or an initial guess about the model inputs. Existence of a solution can be guaranteed through what we call the predictability assumption, which requires that the push-forward of the initial guess through the computational model can predict all of the data. Mathematically, this means that the observed measure is absolutely continuous with respect to the push-forward of the initial measure. Incorporating this initial information also regularizes the problem in the sense that the solution is unique given a choice of initial measure. The DCI solution to this inverse problem is called the updated measure/density:
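$$\pi_{\text{up}}(\lambda) = \pi_{\text{init}}(\lambda)\,\frac{\pi_{\text{obs}}(Q(\lambda))}{\pi_{\text{pred}}(Q(\lambda))},$$
where $\lambda$ denotes the model inputs, $Q$ is the map from the inputs to the QoI defined by the computational model, $\pi_{\text{init}}$ is the initial density, $\pi_{\text{obs}}$ is the observed density, and $\pi_{\text{pred}}$ is the predicted density.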
We see that the updated density is given by the product of the initial density and the ratio of the observed density to the predicted density (the push-forward of the initial). We usually assume the initial and observed densities are given, but in practice we only need the ability to sample from the initial density, and the observed density can be approximated from data. The predicted density, in the denominator of the ratio, must be approximated using some form of density estimation. MrHyDE can post-process a forward UQ study that uses samples from the initial density, and can either reweight these samples or perform rejection sampling to provide information about the updated density.
The algorithm in MrHyDE follows these steps:
- Generate a set of samples from the initial density.
- Evaluate the model and the QoI for each of these samples.
- Construct an approximation of the predicted density using Gaussian kernel density estimation.
- Compute the ratio of observed and predicted densities at each sample point.
- Output the value of the ratio for each sample.
- (Optional) Perform rejection sampling using the ratio and output a set of samples from the updated density.
Note that in the above procedure, all of the density approximations take place in the QoI space, which is often much lower-dimensional than the input space.
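The steps above can be sketched in a few lines of standard-library Python. This is only an illustration of the procedure, not MrHyDE's implementation: the initial density (uniform on [-1, 1]), the toy model Q(x) = 0.1 x, the Silverman bandwidth rule, and all function names are assumptions made for the example.

```python
import math
import random


def gaussian_kde_1d(data):
    """Return a 1-D Gaussian kernel density estimate built from `data`."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    # Silverman's rule of thumb: an assumed bandwidth choice for this sketch.
    h = 1.06 * std * n ** (-0.2)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(q):
        return norm * sum(math.exp(-0.5 * ((q - x) / h) ** 2) for x in data)

    return density


def gaussian_pdf(q, mean, variance):
    """Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (q - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def dci_ratios(qoi_samples, observed_pdf):
    """Ratio of observed to predicted density at each QoI sample."""
    predicted_pdf = gaussian_kde_1d(qoi_samples)
    return [observed_pdf(q) / predicted_pdf(q) for q in qoi_samples]


def rejection_sample(samples, ratios, rng):
    """Accept sample i with probability ratios[i] / max(ratios)."""
    bound = max(ratios)
    return [s for s, r in zip(samples, ratios) if rng.random() * bound < r]


rng = random.Random(123)
# Steps 1-2: sample a hypothetical initial density and evaluate a toy QoI map.
inputs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
qoi = [0.1 * x for x in inputs]
# Steps 3-5: estimate the predicted density in QoI space and form the ratio.
ratios = dci_ratios(qoi, lambda q: gaussian_pdf(q, 0.0, 1.0e-4))
# Step 6 (optional): rejection sampling gives samples from the updated density.
updated = rejection_sample(inputs, ratios, rng)
print("mean ratio:", sum(ratios) / len(ratios))
print("accepted:", len(updated), "of", len(inputs))
```

In this toy problem the predicted density covers the support of the observed density, so the predictability assumption holds and the mean ratio should be near one.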
To perform DCI within MrHyDE, the following settings should be specified in the input file:
Analysis:
  analysis type: DCI
  UQ:
    seed: 123
    samples: 100
  DCI:
    observed type: Gaussian
    observed mean: 0.0
    observed variance: 0.0001
This particular YAML block comes from the regression test in MrHyDE/regression/DCI/Gaussian-Observed. We see that the requested analysis mode is DCI, but we also need to provide a UQ settings sub-block. The forward UQ problem is defined by this block and the initial distribution given to the stochastic parameters (not shown here). Then, DCI is performed as a postprocessing step given an observed (or target) distribution on the QoI. Here, we use a Gaussian distribution with a particular mean and variance. A uniform distribution can also be used:
Analysis:
  analysis type: DCI
  UQ:
    seed: 123
    samples: 100
  DCI:
    observed type: uniform
    observed min: -0.02
    observed max: 0.02
See MrHyDE/regression/DCI/Uniform-Observed for more details.
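For reference, the two parametric observed densities configured above can be written out directly. The sketch below uses the values from the two input blocks; the function names are hypothetical, and how MrHyDE evaluates these densities internally is not shown here. Note that the Gaussian setting is a variance, so 0.0001 corresponds to a standard deviation of 0.01.

```python
import math


def gaussian_observed(q, mean=0.0, variance=1.0e-4):
    """Gaussian observed density, parameterized by mean and variance."""
    return math.exp(-0.5 * (q - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def uniform_observed(q, lower=-0.02, upper=0.02):
    """Uniform observed density: constant on [lower, upper], zero elsewhere."""
    return 1.0 / (upper - lower) if lower <= q <= upper else 0.0


print(gaussian_observed(0.0))  # peak height 1 / sqrt(2 * pi * 1e-4), about 39.89
print(uniform_observed(0.0))   # 1 / 0.04, about 25
print(uniform_observed(0.03))  # outside [-0.02, 0.02], so 0.0
```

Because the uniform density is exactly zero outside its interval, samples whose QoI falls outside [-0.02, 0.02] receive a ratio of zero and are never accepted by rejection sampling.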
Both the Gaussian and uniform observed distributions are currently limited to 1-dimensional QoI, though this would be easy to generalize. A third option for defining the observed distribution is through data:
Analysis:
  analysis type: DCI
  UQ:
    seed: 123
    samples: 100
  DCI:
    observed type: data
    observed file: observed.dat
Here, we do not provide a functional form of the observed density. Instead, MrHyDE uses its internal Gaussian kernel density estimator to approximate the observed density, similar to how it approximates the predicted density. This capability is not limited to 1-dimensional QoI, but of course, density estimation scales poorly with dimension. See MrHyDE/regression/DCI/Data-Driven-Observed for more details.
In each of these cases, MrHyDE will print out some diagnostic information:
This output came from the regression test in MrHyDE/regression/DCI/Gaussian-Observed. We see that the mean of the ratio of the observed and predicted densities is reasonably close to 1.0. This example only uses 100 samples for the forward UQ study to allow for rapid regression testing. Increasing the number of samples gives a value much closer to 1.0. In addition, we see that the mean and variance of the push-forward of the updated density are close to the given values of 0.0 and 1.0e-4. The estimate of the information gained (measured by the KL-divergence) and the acceptance rate from rejection sampling are reported for informational purposes only and are not used for verification or validation.
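As a rough guide to what these diagnostics measure, the sketch below computes them from the ratio values using the standard sample-based formulas. It is an assumption that these match MrHyDE's estimators, which are not shown here; the function name and the small input lists are purely illustrative.

```python
import math


def dci_diagnostics(qoi, ratios):
    """Summaries of r_i = observed / predicted at samples from the initial density."""
    n = len(ratios)
    total = sum(ratios)
    mean_ratio = total / n
    # KL(updated || initial) is the initial-density expectation of r * log(r),
    # with 0 * log(0) taken as 0.
    kl = sum(r * math.log(r) for r in ratios if r > 0.0) / n
    # Rejection sampling accepts sample i with probability r_i / max(r).
    acceptance = mean_ratio / max(ratios)
    # Moments of the push-forward of the updated density, by reweighting.
    up_mean = sum(r * q for q, r in zip(qoi, ratios)) / total
    up_var = sum(r * (q - up_mean) ** 2 for q, r in zip(qoi, ratios)) / total
    return {
        "mean ratio": mean_ratio,
        "KL divergence": kl,
        "acceptance rate": acceptance,
        "updated mean": up_mean,
        "updated variance": up_var,
    }


# Illustrative values only, not MrHyDE output.
diagnostics = dci_diagnostics([-1.0, 1.0, 0.0, 0.0], [0.5, 1.5, 1.0, 1.0])
for name, value in diagnostics.items():
    print(name, value)
```

A mean ratio far from one suggests that the predictability assumption is violated or that the density estimate is too coarse for the number of samples used.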
MrHyDE will also produce a file called DCI_output.dat which contains the values of the QoI, the predicted density, the observed density, and whether the sample was accepted or rejected.