An ADaM dataset is a particular type of analysis dataset that either:

  1. Is compliant with one of the ADaM defined structures and follows the ADaM fundamental principles, OR
  2. Follows the fundamental principles defined in ADaM and adheres as closely as possible to TIG analysis variable naming and other conventions.


The Analysis Dataset Structure


Fundamental Principles 

ADaM datasets must adhere to certain fundamental principles:

  • ADaM datasets and associated metadata must clearly and unambiguously communicate the content and source of the datasets supporting the statistical analyses performed in a study that follows ADaM.
  • ADaM datasets and associated metadata must provide traceability to show the source or derivation of a value or a variable (i.e., the data's lineage or relationship between a value and its predecessor(s)). The metadata must identify when and how analysis data have been derived or imputed.
  • ADaM datasets must be readily usable with commonly available software tools.
  • ADaM datasets must be associated with metadata to facilitate clear and unambiguous communication. Ideally the metadata are machine-readable.
  • ADaM datasets should have a structure and content that allow statistical analyses to be performed with minimal programming. Such datasets are described as "analysis-ready."
  • ADaM datasets contain the data needed for the review and re-creation of specific statistical analyses. It is not necessary to collate data into analysis-ready datasets solely to support data listings or other non-analytical displays.

Traceability

To assist review, ADaM datasets and metadata must clearly communicate how the datasets were created. Verifying the derivations in an ADaM dataset requires access to the input data used to create it. A CDISC-conformant submission includes both SDTM and ADaM datasets; therefore, the relationship between SDTM and ADaM must be clear. This requirement highlights the importance of traceability between the analyzed data (ADaM) and its input data (SDTM).

Traceability is built by clearly establishing the path between an element and its immediate predecessor. The full path is traced by going from one element to its predecessors, then on to their predecessors, and so on, back to the SDTM datasets, and ultimately to the data collection instrument.

Traceability establishes across-dataset relationships as well as within-dataset relationships. For example, the metadata for supportive variables within the ADaM dataset facilitate the understanding of how (and perhaps why) derived records were created.

There are 2 levels of traceability:

  1. Metadata traceability facilitates the understanding of the relationship of the analysis variable to its source dataset(s) and variable(s) and is required for ADaM compliance. This traceability is established by describing (via metadata) the algorithm used or steps taken to derive or populate an analysis variable from its immediate predecessor. Metadata traceability is also used to establish the relationship between an analysis result and ADaM dataset(s).
  2. Datapoint traceability points directly to the specific predecessor record(s) and should be implemented if practical and feasible. This level of traceability can be very helpful when trying to trace a complex data manipulation path. This traceability is established by providing clear links in the data (e.g., by use of a --SEQ variable) to the specific data values used as input for an analysis value. The BDS and OCCDS structures were designed to enable datapoint traceability back to predecessor data.
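As an illustration of datapoint traceability, the following Python sketch looks up the SDTM record behind a single analysis record by matching on subject and a --SEQ value. All records are made up for illustration. The variable names (USUBJID, LBSEQ, LBTESTCD, LBSTRESN, PARAMCD, AVAL) follow common SDTM and ADaM naming but are assumptions here, and the helper `find_predecessor` is hypothetical.

```python
# Illustrative only: made-up records, not real study data.
# SDTM LB records, each identified by USUBJID + LBSEQ.
sdtm_lb = [
    {"USUBJID": "STUDY1-001", "LBSEQ": 1, "LBTESTCD": "ALT", "LBSTRESN": 30.0},
    {"USUBJID": "STUDY1-001", "LBSEQ": 2, "LBTESTCD": "ALT", "LBSTRESN": 42.0},
]

# The analysis record carries LBSEQ so its AVAL can be traced to its input.
adlb = [
    {"USUBJID": "STUDY1-001", "PARAMCD": "ALT", "AVAL": 42.0, "LBSEQ": 2},
]


def find_predecessor(analysis_record, sdtm_records, seq_var="LBSEQ"):
    """Return the SDTM record(s) matching subject and sequence number."""
    return [
        rec
        for rec in sdtm_records
        if rec["USUBJID"] == analysis_record["USUBJID"]
        and rec[seq_var] == analysis_record[seq_var]
    ]


source = find_predecessor(adlb[0], sdtm_lb)
```

Carrying the sequence variable into the analysis dataset is what makes the lookup a direct match rather than a reconstruction of the derivation logic.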

When intermediate analysis datasets are used, traceability involves several steps. The analysis results are linked by appropriate metadata to the data that support the analytical procedure, those data are linked to the intermediate analysis data, and the intermediate data are in turn linked to the source SDTM data. When traceability is successfully implemented, it is possible to identify:

  • Information in the ADaM datasets that comes from the SDTM data
  • Information that is derived or imputed within the ADaM dataset
  • The method used to create derived or imputed data
  • Information used for analyses, in contrast to information that is not used for analyses yet is included to support traceability or future analysis
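The questions above can be answered mechanically when the metadata are machine-readable. The following Python sketch assumes a hypothetical variable-level metadata dictionary in which each entry records an origin, an immediate predecessor, and a derivation method. The origin labels, the variable names (AVAL, CHG, BASE, LBSTRESN), and the helper `trace` are illustrative assumptions, not a prescribed metadata format.

```python
# Hypothetical variable-level metadata. Each entry names its origin and,
# where applicable, its immediate predecessor and derivation method.
metadata = {
    "ADLB.AVAL": {
        "origin": "Predecessor", "predecessor": "LB.LBSTRESN", "method": None,
    },
    "ADLB.CHG": {
        "origin": "Derived", "predecessor": "ADLB.AVAL", "method": "AVAL - BASE",
    },
    "LB.LBSTRESN": {
        "origin": "Collected", "predecessor": None, "method": None,
    },
}


def trace(variable, metadata):
    """Follow immediate predecessors until none is recorded."""
    path = [variable]
    while True:
        entry = metadata.get(path[-1])
        if entry is None or entry["predecessor"] is None:
            return path
        nxt = entry["predecessor"]
        if nxt in path:  # guard against circular metadata
            raise ValueError(f"circular predecessor chain at {nxt}")
        path.append(nxt)


# Full path from a derived analysis variable back to its SDTM source.
chg_path = trace("ADLB.CHG", metadata)

# Variables derived within the analysis dataset, with their methods.
derived = {
    name: entry["method"]
    for name, entry in metadata.items()
    if entry["origin"] == "Derived"
}
```

Each entry stores only its immediate predecessor, so the full lineage is recovered by walking one step at a time, which mirrors how traceability is built.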
