Inclusion of Input Data that Are Not Analyzed but that Support a Derivation in the ADaM Dataset

ADaM datasets are developed to facilitate intended analyses. The original data sources for ADaM datasets are SDTM datasets, even when ADaM datasets are derived from other ADaM datasets. ADaM has features that enable traceability from analysis results to ADaM datasets and from ADaM datasets to SDTM datasets.

The ADaM methodology to achieve the expected traceability is to describe the derivation algorithms in the metadata and, if practical and feasible, to include supportive rows as appropriate for traceability. To include the input data as rows in the ADaM dataset, columns should be added where feasible to indicate the source of the input data. Although this methodology increases both the size of the dataset and the complexity of selecting the appropriate rows for analysis, it also provides input data in an immediately accessible manner. In addition, intermediate values can be retained if appropriate flags are used to distinguish them.

In general, it is strongly recommended to include as much supporting data as is needed for traceability. However, there are situations in which this may not be practical. For example, if an analyzed parameter is a summary derived from a very large number of raw e-diary input records, it may be neither useful nor practical to include all of the raw e-diary records as rows in the ADaM dataset.

The remainder of this section addresses cases where the ADaM datasets contain not only the analysis data but also input data that are necessary to provide clearer traceability of the algorithms used to derive the analysis data. In addition to the actual values used in the analysis, the dataset may include rows not used in the analysis, rows containing input data, and rows containing intermediate values computed during the derivation of the analysis data. Flags or other columns are used to distinguish the various data types as well as to provide a traceable path from the input data to the value used in the analysis. The analysis results metadata specify how the appropriate rows are identified (by a specific selection clause). The identification of rows used in an analysis is addressed in Section 2.9.9.4, Identification of Records Used for Analysis.
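As a minimal sketch of how flags separate analyzed rows from supportive rows, the example below uses a small hypothetical dataset held as plain Python dictionaries. The subject, parameter, values, and the choice of ANL01FL as the analysis flag are illustrative assumptions, not content taken from this section.

```python
# Hypothetical BDS-style rows: two observed input rows and one derived
# average row. Only the derived row is flagged for analysis.
rows = [
    {"USUBJID": "001", "PARAMCD": "SYSBP", "AVISIT": "Week 4",
     "AVAL": 120.0, "DTYPE": "", "ANL01FL": ""},
    {"USUBJID": "001", "PARAMCD": "SYSBP", "AVISIT": "Week 4",
     "AVAL": 130.0, "DTYPE": "", "ANL01FL": ""},
    {"USUBJID": "001", "PARAMCD": "SYSBP", "AVISIT": "Week 4",
     "AVAL": 125.0, "DTYPE": "AVERAGE", "ANL01FL": "Y"},
]


def select_analysis_rows(dataset, flag="ANL01FL"):
    """Apply a selection clause equivalent to: WHERE <flag> = 'Y'."""
    return [row for row in dataset if row.get(flag) == "Y"]


analysis_rows = select_analysis_rows(rows)
supportive_rows = [row for row in rows if row not in analysis_rows]
```

The supportive rows stay in the dataset for traceability, while the selection clause recorded in the analysis results metadata picks out only the flagged row.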

Unless the input data are already present as column(s) on the row (e.g., as covariate(s) or supportive variable(s)), the input data will be retained as rows in the ADaM dataset. The analysis value column (AVAL and/or AVALC) on the retained input data row will contain a value for the analysis parameter. Not all columns from the input dataset are carried into the ADaM dataset; instead, additional variables are included to indicate the source of the input data: domain, variable name, and sequence number. This approach allows the inclusion of input data from multiple domains. If the input data are already included in columns on the analysis parameter row (e.g., as covariates or supportive information), there is no need to include additional rows for those input data. The decision to keep the input data as rows or as columns will therefore be dictated by the types of input data and whether they are used for other purposes in the ADaM dataset.
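The sketch below illustrates one way a retained input row could carry the source domain, variable name, and sequence number. It assumes the source pointers are named SRCDOM, SRCVAR, and SRCSEQ and that the input comes from a hypothetical vital signs record; the helper function and all values are invented for illustration.

```python
def to_input_row(sdtm_record, domain, value_var, seq_var, paramcd):
    """Build a retained input row holding the analysis value and source
    pointers only; other input columns are deliberately not carried over."""
    return {
        "USUBJID": sdtm_record["USUBJID"],
        "PARAMCD": paramcd,
        "AVAL": sdtm_record[value_var],
        "SRCDOM": domain,                 # source domain
        "SRCVAR": value_var,              # source variable name
        "SRCSEQ": sdtm_record[seq_var],   # source sequence number
    }


# Hypothetical SDTM vital signs record.
vs_record = {
    "USUBJID": "001",
    "VSSEQ": 17,
    "VSTESTCD": "SYSBP",
    "VSSTRESN": 120.0,
    "VSPOS": "SITTING",
}

input_row = to_input_row(vs_record, "VS", "VSSTRESN", "VSSEQ", "SYSBP")
```

Because each row names its own source domain, rows originating from different domains can sit side by side in the same dataset.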

Retaining in 1 dataset all data used in the determination of the analysis parameter value will provide the clearest traceability in the most flexible manner within the standard ADaM BDS. This large dataset also provides the most flexibility for testing the robustness of an analysis.

If it is determined that this large dataset is too cumbersome, the producer can choose to provide 2 datasets: 1 that contains all rows and another that is a subset of the first, containing only the rows used in the specified analysis. To ensure traceability, the metadata for the subset ADaM dataset will refer back to the full ADaM dataset as the immediate predecessor. This approach provides the needed traceability along with a dataset that can be used in an analysis without specifying a selection clause. The producer will need to ensure that consistency is maintained between the 2 datasets. There may also be confusion about which dataset supported an analysis if analysis results metadata are not provided for that analysis.
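One possible way to check consistency between the 2 datasets is sketched below. It assumes the subset is defined as exactly the rows of the full dataset where a flag (here ANL01FL, an illustrative choice) equals "Y", and that both datasets carry the same columns; the function names and data are hypothetical.

```python
from collections import Counter


def row_key(row):
    """Turn a row into a hashable, order-independent key."""
    return tuple(sorted(row.items()))


def subset_is_consistent(full, subset, flag="ANL01FL"):
    """Return True when the subset holds exactly the flagged rows of the
    full dataset, with identical values in every column."""
    expected = Counter(row_key(r) for r in full if r.get(flag) == "Y")
    actual = Counter(row_key(r) for r in subset)
    return expected == actual


# Hypothetical full dataset and its analysis subset.
full_dataset = [
    {"USUBJID": "001", "PARAMCD": "SYSBP", "AVAL": 120.0, "ANL01FL": ""},
    {"USUBJID": "001", "PARAMCD": "SYSBP", "AVAL": 125.0, "ANL01FL": "Y"},
]
subset_dataset = [
    {"USUBJID": "001", "PARAMCD": "SYSBP", "AVAL": 125.0, "ANL01FL": "Y"},
]

consistent = subset_is_consistent(full_dataset, subset_dataset)
```

A check of this kind could be run whenever either dataset is regenerated, so that a change in the full dataset is not silently missed in the subset.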
