Sobol Sensitivity Analysis

Basics

Let's say that we want to analyze how the output, $y$, of the function $f$ below varies with respect to the input parameters, $\mathbf{X}$:

$$y = f(\mathbf{X})$$

$$\mathbf{X} = \left\{ x_1, x_2, …, x_k \right\}$$

How do we separate out the contribution of each of the parameters?

The One-at-a-Time Approach

The traditional sensitivity analysis approach moves one input variable $x_i$ at a time while keeping the others at their baseline values. The sensitivity of the model output to a change in the input $x_i$ is commonly represented as

$$\left\vert \frac{\partial y}{\partial x_i} \right\vert_{x_i^*} = \frac{f(x_1^*, x_2^*, …, x_i^* + \Delta x_i, …, x_k^*) - f(x_1^*, x_2^*, …, x_k^*)}{\Delta x_i}$$

where $x_i^*$ represents a baseline value. However, the OAT approach has several limitations:

  1. OAT does not fully explore the input space, and the unexplored volume grows super-exponentially with the number of dimensions
  2. OAT does not consider parameter interactions
  3. OAT does not handle nonlinearity

There are, however, modifications to the OAT approach that extend it to be global, including the Morris method, but they still face limitations that have been discussed in the literature.
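
To make the OAT procedure concrete, here is a minimal Rust sketch of a forward-difference OAT sensitivity for a generic model closure. The function name and signature are illustrative only, not tied to any particular library:

fn oat_sensitivity<F>(f: F, baseline: &[f64], i: usize, delta: f64) -> f64
where
	F: Fn(&[f64]) -> f64,
{
	// model output at the baseline point
	let y0 = f(baseline);
	// perturb only the i-th input, keeping all others at their baseline values
	let mut perturbed = baseline.to_vec();
	perturbed[i] += delta;
	let y1 = f(&perturbed);
	// forward-difference approximation of |dy/dx_i| at the baseline
	((y1 - y0) / delta).abs()
}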

Variance Based Approach (Sobol)

Sobol's sensitivity analysis is a form of variance-based sensitivity analysis that ranks parameters based on the variance they contribute to the output. Here, variance refers to the expected value of the squared deviation from the mean. For the output $y$, it's commonly written as:

$$V(y) = \mathbf{E}\left[ \left( y - \mu \right)^2\right]$$

where $\mu$ is the expected value of $y$, $E[y]$. Expanding variance results in the following equivalent expression

$$V(y) = \mathbf{E}\left[y^2\right] - \mu^2$$

which will be critical to understanding Sobol's method.
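
As a quick numerical illustration (a sketch, not from the original post), the two forms of the variance agree:

fn variance_two_ways(y: &[f64]) -> (f64, f64) {
	let n = y.len() as f64;
	let mu = y.iter().sum::<f64>() / n;
	// V(y) = E[(y - mu)^2]
	let v_deviation = y.iter().map(|v| (v - mu).powi(2)).sum::<f64>() / n;
	// V(y) = E[y^2] - mu^2
	let v_expanded = y.iter().map(|v| v * v).sum::<f64>() / n - mu * mu;
	// the two values agree up to floating-point error
	(v_deviation, v_expanded)
}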

To understand Sobol's method, we have to first start with functional decomposition or ANOVA decomposition, which states that the function $y = f(\mathbf{X})$ can be represented as:

$$f(\mathbf{X}) = f_0 + \sum_{s=1}^{k} \ \sum_{i_1 < \dots < i_s} f_{i_1, \dots, i_s}(x_{i_1}, \dots, x_{i_s})$$

or more directly

$$f(\mathbf{X}) = f_0 + \sum_i^k f_{i}(x_i) + \sum_{i<j}^k f_{i, j}(x_i, x_j) + … + f_{1,2,…,k}(x_1, x_2, …, x_k)$$
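
For example, writing the decomposition out in full for $k = 3$:

$$f(x_1, x_2, x_3) = f_0 + f_1(x_1) + f_2(x_2) + f_3(x_3) + f_{1,2}(x_1, x_2) + f_{1,3}(x_1, x_3) + f_{2,3}(x_2, x_3) + f_{1,2,3}(x_1, x_2, x_3)$$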

The number of terms in total is $2^k$, including $f_0$. It is at this point that we make a couple of assumptions:

  1. Each term is square integrable (this implies that we can calculate variance)
  2. The functions are orthogonal, which is also the condition that guarantees the decomposition is unique:
    1. $\int_0^1 f_{i..j} (x_i, …, x_j)\, dx_s = 0$ for $s = i, …, j$
    2. This means the terms have zero covariance with one another
    3. Note that $p(x)$, the probability density function of $x$, has been dropped from the equations below. This is allowed in this context because we are assuming independent inputs sampled uniformly on the unit hypercube.

With the assumptions in mind, the functions $f_0, f_i, f_{ij}$ are represented as:

$$f_0 = E\left[ Y\right] = \int f(\mathbf{X})d\mathbf{X}$$

$$f_i = E_{\mathbf{X_{-i}}}\left[ Y \vert x_i\right] - E\left[Y\right]$$

$$f_{i, j} = E_{\mathbf{X_{-i,j}}}\left[ Y \vert x_i, x_j\right] - f_i - f_j - E\left[Y\right]$$

The grok of $f_i$ here is that it is the expected value of $y$ when holding $x_i$ constant and averaging over all other parameters, minus the overall mean. Now, here comes the important part: variance can similarly be decomposed as:

$$V(y) = \sum_i^k V_i + \sum_i^k \sum_{j<i}^k V_{ij} + \dots + V_{1,2,\dots,k}$$

where $V(y)$ represents the total output variance. $V_i$ represents the output variance that is attributed to parameter $x_i$ and $V_{i,j}$ is the variance caused by the interactions of parameters $x_i$ and $x_j$:

$$V_i = V(f_i) = V_{x_i}\left(E_{\mathbf{X}_{-i}}\left[ Y \vert x_i\right]\right)$$

$$V_{i,j} = V\left(E\left[ Y \vert x_i, x_j\right]\right) -V_i - V_j$$
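
As a quick sanity check (a worked example, not in the original post), consider $f(x_1, x_2) = x_1 + x_1 x_2$ with independent $x_1, x_2 \sim U(0,1)$. Carrying out the decomposition gives

$$f_0 = \tfrac{3}{4}, \quad f_1(x_1) = \tfrac{3}{2}x_1 - \tfrac{3}{4}, \quad f_2(x_2) = \tfrac{1}{2}x_2 - \tfrac{1}{4}, \quad f_{1,2}(x_1, x_2) = \left(x_1 - \tfrac{1}{2}\right)\left(x_2 - \tfrac{1}{2}\right)$$

$$V_1 = \tfrac{3}{16}, \quad V_2 = \tfrac{1}{48}, \quad V_{1,2} = \tfrac{1}{144}, \qquad V_1 + V_2 + V_{1,2} = \tfrac{31}{144} = V(y)$$

so the partial variances sum exactly to the total output variance, as the decomposition promises.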

First-Order Effects

Re-arranging the above equations we also have:

$$V(y) = V\left(E\left[ Y \vert x_i \right]\right) + E\left[V\left( Y \vert x_i \right)\right]$$

$$V\left(E\left[ Y \vert x_i \right]\right) = V(y) - E\left[V\left( Y \vert x_i \right)\right]$$

which shows that the total variance in $y$ is equivalent to the variance in the expected value of $y$ given $x_i$ plus the expected variance in $y$ given $x_i$. Using the definition of variance as $\mathbf{E}\left[y^2\right] - \mu^2$, the equation for $V(y)$ above can be re-written in integral form as:

$$ V(y) = \int \int … \int f^{2}(x_{1}, …, x_{k})\Pi_{i=1}^{k} dx_{i} - E^{2}[y] $$

Writing out the variance of $y$ conditional on a single point $x_i = \hat{x}_i$, we have the following:

$$V(y \vert x_i = \hat{x}_i) = \int \int … \int f^2(x_1, x_2, …, \hat{x}_i, …, x_k) \Pi_{j \neq i}^k dx_j - E^2[y \vert x_i = \hat{x}_i]$$

which gives us the variance of $y$ at a single point by integrating over all values of $\mathbf{X}_{-i}$. However, for a global sensitivity analysis, we don't want the variance at one point; instead, we want to attribute the variance caused by $x_i$ over all values of $\hat{x}_i$. So, we add an additional integral, over $x_i$. This turns the above into an equation for the expected value of the variance over all $x_i$:

$$E[V(y \vert x_i)] = \int \int .. \int f^2(x_1, x_2, …, x_i, …, x_k) \Pi_{j = 1}^k dx_{j} - \int E^2[y \vert x_i = \hat{x}_i] d\hat{x}_i$$

Using the equality from above, we see that the first term in this equation is the same as the first term in the integral form of $V(y)$. Subtracting the above from $V(y)$, we are left with

$$V(y) - E[V(y \vert x_{i})] = \int E^{2}[y \vert x_i = \hat{x}_i]\, d\hat{x}_{i} - E^2[y]$$

The left side of this equation is equivalent to $V\left(E\left[ Y \vert x_i \right]\right)$ through the identity we explored above. So, we have simplified the expression to the square of the expected model output, $E^2[y]$, which is quite easy to compute, and a double integral (the squared conditional expectation integrated over $x_i$). Using the shortened notation of $V_i$, the equation becomes

$$V_i = \int E^{2}[y \vert x_i = \hat{x}_i]\, d\hat{x}_{i} - f_{0}^2$$

Remember that $E[y]$ is the same as $\int f(\mathbf{X})d\mathbf{X}$. We re-introduce the notation $\mathbf{X}_{-i}$, which means all columns of $\mathbf{X}$ except the $i$th. With this notation, the first part of the above equation becomes:

$$\int E^{2}[y \vert x_i = \hat{x}_i]\, d\hat{x}_{i} = \int \mathbf{E}_{\mathbf{X}_{-i}}\left[ y \vert x_i \right] \cdot \mathbf{E}_{\mathbf{X}_{-i}}\left[ y \vert x_i \right] dx_i = \int \int f(x_1, …, \mathbf{x_i}, …, x_k)\, f(x_1', …, \mathbf{x_i}, …, x_k')\, d\mathbf{X}\, d\mathbf{X}_{-i}'$$

The interpretation is that there are two loops: one to compute the integral over $d\mathbf{X}$ and a second to compute the integral over $d\mathbf{X}_{-i}'$. It's here that a second matrix $\mathbf{X}'$ is introduced. This matrix is an independent sample of the input space and replaces the inner for loop that would otherwise be needed for the double integral (Sobol has a proof of the validity of this approach).

If you look closely at the function, you will also notice that $x_i$ is the same in both evaluations of $f$. This is formalized using three matrices: $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{A}_{\mathbf{B}}$. The construction of the matrices is a simple process. Two independent samples of the input are made, each with dimensions $N \times k$. Then, a third matrix, $\mathbf{A}_{\mathbf{B}}^{(i)}$, is composed by taking $\mathbf{A}$ and replacing the $i$th column with the corresponding column of $\mathbf{B}$. Doing this for each $i$ results in a three-dimensional array, $N \times k \times k$.
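
A minimal sketch of the $\mathbf{A}_{\mathbf{B}}$ construction with ndarray, assuming $\mathbf{A}$ and $\mathbf{B}$ have already been sampled (how they are sampled, e.g. via Sobol sequences, is left out):

use ndarray::Array2;

// Given two independent sample matrices A and B (both N x k), build the k
// matrices A_B^(i): A with its i-th column replaced by B's i-th column.
fn build_ab(a: &Array2<f64>, b: &Array2<f64>) -> Vec<Array2<f64>> {
	(0..a.ncols())
		.map(|i| {
			let mut m = a.clone();
			m.column_mut(i).assign(&b.column(i));
			m
		})
		.collect()
}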

From this representation, it is easy to derive the number of simulations needed for the sensitivity analysis. We simply "stack" the matrices to arrive at $N (k + 2)$. Using the sample matrices, the above equation is greatly simplified in the Monte Carlo framework. Only one integral remains, with the equation being written as:

$$V_i = \frac{1}{N} \sum^{N}_{j=1} f(\mathbf{B})_{j} \left( f\left(\mathbf{A}_{\mathbf{B}}^{(i)}\right)_{j} - f\left(\mathbf{A}\right)_{j} \right)$$
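
In practice, this partial variance is reported as the first-order sensitivity index by normalizing with the total output variance (this is the standard convention, and it is what the Rust implementation later in the post divides by):

$$S_i = \frac{V_i}{V(y)}$$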

Higher-Order Effects

So far we have been focused on $V_i$ (really $V_{x_i}\left(E_{\mathbf{X}_{-i}}\left[ Y \vert x_i\right]\right)$), which is also referred to as the *first-order variance*. It is the variance that $x_i$ alone causes in the output. Yet, the power of variance-based or global sensitivity methods is that they can tell us about the interactions between parameters. Unless the model of interest is purely additive, there will be interaction effects, and it's likely that we care about them. These higher-order effects are usually summarized by two metrics in the literature: one captures the total variance that $x_i$ causes, and the other captures the variance due to a pair of parameters, $x_i$ and $x_j$.

The total variance is a rather simple modification to the first-order equations. Instead of computing only the variance induced by the parameter $x_i$ itself, we compute the first-order variance of everything except $x_i$ and subtract it from the total. It's written as

$$V_{T_{i}} = V(y) - V_{\mathbf{X}_{-i}}(\mathbf{E}[y \vert \mathbf{X}_{-i}]) = \mathbf{E}_{\mathbf{X}_{-i}}[V_{x_i}(y \vert \mathbf{X}_{-i})]$$

The integral form is

$$\mathbf{E}_{\mathbf{X}_{-i}}[V_{x_i}(y \vert \mathbf{X}_{-i})] = \frac{1}{2} \int \left( f(x_1, …, \mathbf{x_i}, …, x_k) - f(x_1, …, \mathbf{x_i'}, …, x_k) \right)^{2} d\mathbf{X}\, dx_{i}'$$

and the Monte Carlo version

$$\mathbf{E}_{\mathbf{X}_{-i}}[V_{x_i}(y \vert \mathbf{X}_{-i})] = \frac{1}{2N} \sum^{N}_{j=1} \left( f(\mathbf{A})_{j} - f\left(\mathbf{A}_{\mathbf{B}}^{(i)}\right)_{j} \right)^2$$

As compared to $V_i$, this equation uses $\mathbf{A}$ and $\mathbf{A}_{\mathbf{B}}$, which have all columns in common except for the $i$th. Remaining are the second-order sensitivity indices:

$$\mathbf{E}_{\mathbf{X}_{-i,j}}[V_{x_i,x_j}(y \vert \mathbf{X}_{-i,j})] = \frac{1}{2N} \sum^{N}_{j=1} \left( f(\mathbf{A})_{j} - f\left(\mathbf{A}_{\mathbf{B}}^{(i)}\right)_{j} \right)^2$$

Re-Implementing in Rust

The full code is on GitHub at mschrader15/sobol-sensitivity-rs, a basic implementation of Sobol's SA in Rust. The first-order calculation looks like this:

fn calc_s1(slices: &ArraySlices, y: &Array1<f64>, res: &mut SARes) {
	// calculate the first order sensitivity indices
	// inner = (AB - A)
	let mut inner = slices.ab.to_owned().sub(&slices.a);
	// scale each column of inner (2d) element-wise by B (1d), in place
	inner
		.columns_mut()
		.into_iter()
		.for_each(|mut col| col.zip_mut_with(&slices.b, |v, &b| *v *= b));
	// Vi = 1/N SUM (inner), i.e. the column-wise mean
	let vi = inner.mean_axis(Axis(0)).unwrap();
	// Si = Vi / V(y)
	res.set_s1(vi / y.var(0.0));
}
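
For completeness, here is a sketch (not part of the linked repo) of the corresponding total-order calculation written in the same style, using the estimator from the previous section. It assumes the same `ArraySlices` and `SARes` types; the `set_st` setter is hypothetical and stands in for however the results struct stores total-order indices.

fn calc_st(slices: &ArraySlices, y: &Array1<f64>, res: &mut SARes) {
	// calculate the total-order sensitivity indices
	// inner = (AB - A); the sign drops out once squared
	let inner = slices.ab.to_owned().sub(&slices.a);
	// VTi = 1/(2N) SUM (A - AB)^2, i.e. half the column-wise mean of the squared difference
	let vti = inner.mapv(|v| v * v).mean_axis(Axis(0)).unwrap() / 2.0;
	// STi = VTi / V(y)
	res.set_st(vti / y.var(0.0)); // set_st is a hypothetical setter
}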

References

  1. https://doi.org/10.1016/j.cpc.2009.09.018
  2. Global sensitivity indices for nonlinear mathematical models
  3. Variance based sensitivity analysis of model output