Metric Data Format

Experimental and clinical data are represented in DigiPopData.jl as a collection of metrics, i.e. aggregated summary statistics describing real patient populations. Each metric defines a population-level experimental target that can be compared to individual-level simulations.

Internal Julia representation

Internally, each row of experimental data is represented as a MetricBinding. Besides a unique id and an active flag, a MetricBinding links:

  • an experimental metric (e.g. mean, quantile, category, survival),
  • a scenario (e.g. treatment arm),
  • and an endpoint column in the simulation table.

Example:

using DigiPopData

mb = MetricBinding(
    "m_conc24_mean_Tx",     # metric id
    "Tx",                   # scenario (e.g. treatment arm)
    MeanMetric(
        40,                 # experimental sample size
        2.1,                # mean value
        0.2                 # standard deviation
    ),
    "conc_t24",             # endpoint column in simulation data
    true                    # active flag (included in the loss)
)
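To make the comparison behind such a binding concrete, the sketch below shows one way a mean target could be checked against simulated values of the endpoint column: average the simulated individuals and express the gap in units of the standard error of the experimental mean. This is an illustration only, assuming a simple z-score; the function name `mean_zscore` is hypothetical, and the loss that DigiPopData.jl actually computes for a MeanMetric may differ.

```julia
using Statistics  # provides mean

# Hypothetical helper, not part of DigiPopData.jl: standardized gap between the
# simulated mean of an endpoint and an experimental mean reported with standard
# deviation `sd` over `n` subjects.
function mean_zscore(simulated::AbstractVector{<:Real}, n::Integer, target_mean::Real, sd::Real)
    se = sd / sqrt(n)  # standard error of the experimental mean
    return (mean(simulated) - target_mean) / se
end

# Made-up conc_t24 values for a handful of virtual patients in scenario "Tx"
conc_t24_sim = [2.2, 2.3, 2.4]
z = mean_zscore(conc_t24_sim, 40, 2.1, 0.2)  # values near 0 mean the target is matched
```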

Tabular metric definition

For practical workflows, metrics are usually defined in a table (typically a CSV file read into a DataFrame) and loaded in bulk with parse_metric_bindings. Each row defines one metric.

Core columns

A metric table typically includes:

  • id: unique metric identifier
  • active: whether the metric is included in the loss (1) or excluded (0)
  • scenario: scenario identifier used to match simulation conditions
  • endpoint: name of the simulation output column used for comparison
  • metric.type: metric type (e.g. mean, mean_sd, category, quantile, survival)
  • metric.size: experimental sample size
  • metric.<prop>: additional metric-specific properties (see Overview for details)
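Before calling parse_metric_bindings, it can help to check a table's core columns yourself. The sketch below uses plain Julia (a vector of `Dict` rows instead of a DataFrame, to stay dependency-free) and a hypothetical `check_core_columns` helper. The required column names come from the list above; the specific rules (unique ids, active equal to 1 or 0, positive sample size) are assumptions about what a well-formed table looks like, not a description of the package's own validation.

```julia
# Hypothetical pre-flight check, not part of DigiPopData.jl.
# Column names follow the "Core columns" list; the rules below are assumptions.
const CORE_COLUMNS = ["id", "active", "scenario", "endpoint", "metric.type", "metric.size"]

# Returns a list of problems found; an empty list means the rows look well-formed.
function check_core_columns(rows::Vector{Dict{String,Any}})
    problems = String[]
    seen_ids = Set{String}()
    for (i, row) in enumerate(rows)
        absent = [c for c in CORE_COLUMNS if !haskey(row, c)]
        if !isempty(absent)
            push!(problems, "row $i: missing columns $(join(absent, ", "))")
            continue  # skip value checks when columns are missing
        end
        id = string(row["id"])
        if id in seen_ids
            push!(problems, "row $i: duplicate id $id")
        end
        push!(seen_ids, id)
        if !(row["active"] in (0, 1))
            push!(problems, "row $i: active must be 1 or 0")
        end
        n = row["metric.size"]
        if !(n isa Integer && n > 0)
            push!(problems, "row $i: metric.size must be a positive integer")
        end
    end
    return problems
end

rows = [
    Dict{String,Any}("id" => "m_conc24_mean_Tx", "active" => 1, "scenario" => "Tx",
                     "endpoint" => "conc_t24", "metric.type" => "mean", "metric.size" => 40),
    # Second row repeats the id and uses an invalid active flag on purpose
    Dict{String,Any}("id" => "m_conc24_mean_Tx", "active" => 2, "scenario" => "Tx",
                     "endpoint" => "conc_t24", "metric.type" => "mean", "metric.size" => 40),
]
problems = check_core_columns(rows)
```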

Example table

The table below defines two metrics for the same scenario Tx:

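| id               | active | scenario | metric.type | metric.size | endpoint  | metric.mean | metric.sd | metric.levels  | metric.values |
|------------------|--------|----------|-------------|-------------|-----------|-------------|-----------|----------------|---------------|
| m_conc24_mean_Tx | 1      | Tx       | mean        | 40          | conc_t24  | 2.10        | 0.2       |                |               |
| m_biomarker_q_Tx | 1      | Tx       | quantile    | 40          | biomarker |             |           | 0.25;0.50;0.75 | 0.1;1.35;10.1 |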

Interpretation:

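  • m_conc24_mean_Tx targets the mean of conc_t24 in scenario Tx: 2.10, with a standard deviation of 0.2 across 40 patients.
  • m_biomarker_q_Tx targets the 0.25, 0.50 and 0.75 quantiles of biomarker in scenario Tx, with experimental values 0.1, 1.35 and 10.1, respectively.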

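In practice, the table only needs the metric.<prop> columns required by the metric types present in your dataset; cells that do not apply to a given row (such as metric.levels for a mean metric) are left empty.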
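Multi-valued properties such as metric.levels and metric.values are stored as semicolon-separated strings in the table above. The sketch below shows how such a quantile target could be decoded and compared with empirical quantiles of a simulated endpoint using `Statistics.quantile`. The helper names `parse_semicolon_list` and `quantile_gaps` are hypothetical, and plain differences are used only for illustration; this does not reproduce how parse_metric_bindings reads these cells or how DigiPopData.jl scores a quantile metric.

```julia
using Statistics  # provides quantile

# Hypothetical helper: split a cell such as "0.25;0.50;0.75" into Float64 values.
parse_semicolon_list(cell::AbstractString) = parse.(Float64, strip.(split(cell, ';')))

# Hypothetical comparison: empirical quantiles of the simulated endpoint at the
# requested levels, minus the experimental quantile values.
function quantile_gaps(simulated::AbstractVector{<:Real}, levels_cell::AbstractString, values_cell::AbstractString)
    levels = parse_semicolon_list(levels_cell)
    targets = parse_semicolon_list(values_cell)
    length(levels) == length(targets) ||
        throw(ArgumentError("metric.levels and metric.values must have the same length"))
    issorted(targets) || throw(ArgumentError("quantile values must be non-decreasing"))
    return quantile(simulated, levels) .- targets
end

# Made-up simulated biomarker values for the "Tx" scenario
biomarker_sim = [0.05, 0.2, 1.0, 1.5, 2.0, 8.0, 12.0]
gaps = quantile_gaps(biomarker_sim, "0.25;0.50;0.75", "0.1;1.35;10.1")
```

A positive gap means the simulated population sits above the experimental quantile at that level; a negative gap means it sits below.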

Loading from CSV

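using CSV, DataFrames
using DigiPopData

# Read the metric table into a DataFrame (one row per metric)
metrics_df = CSV.File("metrics.csv") |> DataFrame
# Convert each row into a MetricBinding
metrics = parse_metric_bindings(metrics_df)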