Simulation Data Format

Simulated data are provided to DigiPopData.jl as individual-level results of QSP model simulations. They are represented as a tabular structure and are typically produced by an external simulator.

Structure

Simulation data are expected to be loaded as a DataFrame with the following columns:

  • id: Identifier of a virtual patient. Used to distinguish individuals and in selection or weighting procedures. Type: String or Int64.

  • scenario: Identifier of a simulation scenario (e.g. treatment, dosing regimen, condition). Allows multiple scenarios per virtual patient. Type: String.

  • <endpoint columns>: One or more columns containing simulated model outputs that correspond to experimental metrics (e.g. concentration, biomarker value, event time). Column names must match the identifiers referenced by experimental metrics. Type: String, Int64, or Float64.

Each row corresponds to one simulated individual under one scenario. A workflow typically expects every endpoint value it requires to be present in each row.
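The column requirements above can be checked before a table is used further. The following is a minimal sketch that assumes only DataFrames.jl. The helper check_simulation_table is hypothetical and not part of DigiPopData.jl, and treating (id, scenario) pairs as unique is an assumption read from the row description, not a documented rule.

```julia
using DataFrames

# Hypothetical helper (not part of DigiPopData.jl): checks that a table follows
# the layout described above and that the requested endpoint columns exist.
function check_simulation_table(df::DataFrame, endpoints::Vector{Symbol})
    required = vcat([:id, :scenario], endpoints)
    missing_cols = setdiff(required, propertynames(df))
    isempty(missing_cols) || error("missing columns: $(missing_cols)")

    # Assumption: one row per individual per scenario, so pairs must not repeat.
    allunique(zip(df.id, df.scenario)) || error("duplicate (id, scenario) rows")

    # Required endpoint values are expected to be present.
    for col in endpoints
        any(ismissing, df[!, col]) && error("missing values in column $(col)")
    end
    return true
end

# Small illustrative table with one endpoint column.
df = DataFrame(
    id = ["VP1", "VP2", "VP1"],
    scenario = ["placebo", "placebo", "treated"],
    conc_t24 = [1.23, 0.98, 3.42],
)

check_simulation_table(df, [:conc_t24])
```

Calling the helper with an endpoint that is absent from the table, for example [:conc_t48] here, raises an error that names the missing column.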

CSV file format

Simulation results can be stored in and loaded from a CSV file. This is particularly useful when simulation and analysis are performed in separate tools or when virtual patient selection is applied to precomputed results.

Simulation data can be loaded as follows:

using CSV, DataFrames

# Read the CSV file and materialize it as a DataFrame.
simulation_df = CSV.File("simulation_data.csv") |> DataFrame
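Column types are inferred when a CSV file is read, so a purely numeric id column is parsed as an integer unless told otherwise. If String identifiers are preferred, the types can be pinned at load time. The following is a minimal sketch that assumes the types keyword of CSV.jl. The inline CSV text stands in for a file, and its values are illustrative.

```julia
using CSV, DataFrames

# Inline CSV text stands in for "simulation_data.csv" so the sketch is self-contained.
csv_text = """
id,scenario,conc_t24,response
1,placebo,1.23,NonResponder
1,treated,3.42,Responder
"""

# Pin the identifier and categorical columns to String; conc_t24 is left to inference.
simulation_df = CSV.read(
    IOBuffer(csv_text), DataFrame;
    types = Dict(:id => String, :scenario => String, :response => String),
)

# Inspect the resulting column types.
column_types = Dict(name => eltype(simulation_df[!, name]) for name in names(simulation_df))
```

Both String and Int64 identifiers are accepted according to the column description above. Pinning the type mainly keeps identifiers consistent across files.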

Example

Example of a simulation table for a virtual population:

id   scenario  conc_t24  conc_t48  biomarker_t48  response
VP1  placebo   1.23      0.45      13             NonResponder
VP2  placebo   0.98      0.52      10             NonResponder
VP3  placebo   1.10      0.48      12             NonResponder
VP1  treated   3.42      0.30      8              Responder
VP2  treated   2.95      0.33      4              NonResponder

In this example:

  • id identifies virtual patients,
  • scenario distinguishes treatment conditions,
  • conc_t24, conc_t48, and biomarker_t48 are simulated numerical endpoints,
  • response is a categorical simulated endpoint that can be referenced by experimental metrics.

This table can be directly used together with experimental metric definitions to compute mismatch and loss values.
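To illustrate how such a table is organised, the sketch below rebuilds the example as a DataFrame and pulls out endpoint values per scenario. It uses only DataFrames.jl and Statistics operations and does not represent the DigiPopData.jl API. The summary statistics (mean and responder fraction) are illustrative choices, not the package's mismatch or loss computation.

```julia
using DataFrames, Statistics

# The example table from above, built in memory.
simulation_df = DataFrame(
    id = ["VP1", "VP2", "VP3", "VP1", "VP2"],
    scenario = ["placebo", "placebo", "placebo", "treated", "treated"],
    conc_t24 = [1.23, 0.98, 1.10, 3.42, 2.95],
    conc_t48 = [0.45, 0.52, 0.48, 0.30, 0.33],
    biomarker_t48 = [13, 10, 12, 8, 4],
    response = ["NonResponder", "NonResponder", "NonResponder", "Responder", "NonResponder"],
)

# Numerical endpoint for one scenario: one value per virtual patient.
treated = subset(simulation_df, :scenario => ByRow(==("treated")))
treated_conc_t24 = treated.conc_t24

# Per-scenario summaries of a numerical and a categorical endpoint.
summary_df = combine(
    groupby(simulation_df, :scenario),
    :conc_t24 => mean => :mean_conc_t24,
    :response => (r -> mean(r .== "Responder")) => :responder_fraction,
)
```

Because VP3 has no row in the treated scenario, the two scenarios are summarised over different sets of virtual patients.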