MeanSDMetric
MeanSDMetric is used for experimental endpoints reported as a mean together with a standard deviation (SD). It compares both the average level and the overall variability of individual-level simulation outputs to the corresponding experimental targets.
See API reference: MeanSDMetric.
Concept
Many publications report a mean and SD but do not provide individual observations. MeanSDMetric treats these summary statistics as fixed targets and evaluates how well a simulated cohort matches them.
The loss is implemented as a sum of two terms:
- a mean-matching term (same structure as
MeanMetric), - an SD-matching term that penalizes deviation of the simulated second central moment from the target variance.
Implementation note. For consistency between mismatch and add_mismatch_expression!, the simulated variance proxy is computed around the target mean rather than the simulated mean. This keeps the expression affine in binary selection variables (see below) and leads to a quadratic MIQP objective.
Example
Assume experimental data report a mean value of 2.5 and an SD of 0.6 based on a cohort of 50 patients:
using DigiPopData
ms_metric = MeanSDMetric(
50, # experimental sample size
2.5, # experimental mean
0.6 # experimental standard deviation
)Simulated individual outcomes can be provided as a numeric vector:
sim_values = randn(1000) .* 0.7 .+ 2.4
loss = mismatch(sim_values, ms_metric)Mathematics
Let simulated individual values be $y_1,\dots,y_N$ with sample mean
\[\mu_{virt} = \frac{1}{N}\sum_{i=1}^{N} y_i.\]
Let the experimental targets be $\mu_{exp}$ and $\sigma$ (SD).
Mean term
The mean mismatch term is:
\[\Lambda_\mu = N\,\frac{(\mu_{virt} - \mu_{exp})^2}{\sigma^2}.\]
SD term (variance proxy)
In the current implementation, the simulated second central moment is evaluated around the target mean:
\[s^2_{virt} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \mu_{exp})^2.\]
The SD mismatch term penalizes deviation from the target variance $\sigma^2$:
\[\Lambda_{\sigma} = \frac{N}{2}\,\frac{(s^2_{virt} - \sigma^2)^2}{\sigma^4}.\]
The total loss is:
\[\Lambda = \Lambda_\mu + \Lambda_{\sigma}.\]
This form is likelihood-inspired (Gaussian second-order / chi-square style scaling) and is designed to be additive with other metric losses.
Quadratic formulation for binary selection
For cohort selection, introduce a binary vector $X\in\{0,1\}^{N_{tot}}$ where $X_i=1$ means that individual $i$ is selected. Assume the selected cohort size is fixed: $\sum_i X_i = N_{virt}$.
The selected-cohort mean is:
\[\mu_{virt}(X) = \frac{1}{N_{virt}}\sum_{i=1}^{N_{tot}} X_i y_i.\]
The variance proxy used by the implementation is:
\[s^2_{virt}(X) = \frac{1}{N_{virt}}\sum_{i=1}^{N_{tot}} X_i (y_i - \mu_{exp})^2.\]
Both $\mu_{virt}(X)$ and $s^2_{virt}(X)$ are affine in $X$ for fixed $N_{virt}$. Therefore, each squared deviation term is a quadratic function of the binary decision variables, and the total loss $\Lambda(X)$ defines a Mixed-Integer Quadratic Programming (MIQP) objective.
Practical notes
sizeis stored for consistency and potential future extensions but is not used in the current loss.- The SD term uses a variance proxy computed around the target mean to preserve quadratic structure for MIQP.
- The loss is intended for engineering workflows (calibration / selection) and can be summed with other metric losses.
- As with all moment-based matching, rare heavy tails and strong outliers may affect SD matching and should be handled at the data preparation stage.