How To Determine Where Data Belong

The CDISC models used to standardize data collection, data tabulation, and the creation of analysis datasets are each implemented with a specific purpose in mind, as summarized in the following table.

Category	Standards for Collection	Standards for Tabulation	Standards for Analysis
Product Description		Study Data Tabulation Model (SDTM)	Analysis Data Model (ADaM)
Nonclinical		Study Data Tabulation Model (SDTM)
Product Impact on Individual Health	Clinical Data Acquisition Standards Harmonization (CDASH)	Study Data Tabulation Model (SDTM)	Analysis Data Model (ADaM)
Product Impact on Population Health			Analysis Data Model (ADaM)

SDTM

Observations about study subjects are normally collected for all subjects in a series of domains. A domain is defined as a collection of logically related observations with a common topic. The logic of the relationship may pertain to the scientific subject matter of the data or to its role in the trial. Each domain is represented by a single dataset.

Each domain dataset is distinguished by a unique, 2-character code that should be used consistently throughout the submission. This code, which is stored in the SDTM variable named DOMAIN, is used in 4 ways: as the dataset name, as the value of the DOMAIN variable in that dataset, as a prefix for most variable names in that dataset, and as a value in the RDOMAIN variable in relationship tables (see Section 8, Representing Relationships and Data).
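
As a minimal sketch of the uses listed above, the following Python snippet checks a toy dataset against its domain code. The helper names (check_domain_code, unknown_rdomains), the sample records, and the set of unprefixed variables are illustrative assumptions and are not part of any CDISC tooling.

```python
# Hypothetical helpers for illustration only; not part of any CDISC library.

# Variables that carry no domain prefix (an illustrative subset, assumed here).
UNPREFIXED = {"STUDYID", "DOMAIN", "USUBJID", "VISIT", "VISITNUM"}


def check_domain_code(dataset_name, records):
    """Return a list of problems found when checking records against the domain code."""
    problems = []
    code = dataset_name.upper()
    if len(code) != 2:
        problems.append(f"dataset name {dataset_name!r} is not a 2-character code")
    for i, rec in enumerate(records):
        # Use 2: the DOMAIN variable must hold the same code as the dataset name.
        if rec.get("DOMAIN") != code:
            problems.append(f"row {i}: DOMAIN is {rec.get('DOMAIN')!r}, expected {code!r}")
        # Use 3: most variable names start with the domain code.
        for var in rec:
            if var not in UNPREFIXED and not var.startswith(code):
                problems.append(f"row {i}: variable {var} lacks the {code} prefix")
    return problems


def unknown_rdomains(relrec_rows, dataset_names):
    """Use 4: every RDOMAIN value should name a dataset that exists."""
    known = {name.upper() for name in dataset_names}
    return sorted({r["RDOMAIN"] for r in relrec_rows if r["RDOMAIN"] not in known})


# Made-up sample data.
ae = [
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 1, "AETERM": "HEADACHE"},
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 2, "CMTRT": "ASPIRIN"},
]
relrec = [{"RDOMAIN": "AE"}, {"RDOMAIN": "CM"}]

problems = check_domain_code("AE", ae)
missing = unknown_rdomains(relrec, ["AE"])
print(problems)
print(missing)
```

In this toy example, the second record carries a variable from another domain, and the relationship table refers to a dataset that was not supplied, so both checks report a finding.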

All datasets are structured as flat files with rows representing observations and columns representing variables. Each dataset is described by metadata definitions that provide information about the variables used in the dataset. The metadata are described in a data definition document (i.e., a Define-XML document) that is submitted with the data to regulatory authorities. The Define-XML standard (available at https://www.cdisc.org/standards/data-exchange/define-xml) specifies metadata attributes to describe SDTM data.
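
The relationship between a flat dataset and its variable-level metadata can be sketched in Python as follows. This is a simplified illustration, not the Define-XML format itself: the VariableMeta class, its attribute names, and the undocumented_columns helper are assumptions made for the example, and the sample rows are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMeta:
    """Simplified stand-in for one variable's metadata (not the Define-XML schema)."""
    name: str
    label: str
    data_type: str  # simplified to "text" or "integer" for this sketch


def undocumented_columns(rows, metadata):
    """Return column names that appear in the rows but have no metadata entry."""
    documented = {m.name for m in metadata}
    found = []
    for row in rows:
        for col in row:
            if col not in documented and col not in found:
                found.append(col)
    return found


# Each row is one observation; each key is one variable (column).
dm_rows = [
    {"STUDYID": "S1", "DOMAIN": "DM", "USUBJID": "S1-001", "SEX": "F"},
    {"STUDYID": "S1", "DOMAIN": "DM", "USUBJID": "S1-002", "SEX": "M"},
]

dm_metadata = [
    VariableMeta("STUDYID", "Study Identifier", "text"),
    VariableMeta("DOMAIN", "Domain Abbreviation", "text"),
    VariableMeta("USUBJID", "Unique Subject Identifier", "text"),
]

gaps = undocumented_columns(dm_rows, dm_metadata)
print(gaps)
```

Here the SEX column is present in the data but missing from the metadata list, which is the kind of gap a data definition document is meant to prevent.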

All models implemented as part of this guide collect and represent data by common topics with:

CDASH and SDTM grouping logically related data points in domains; and
ADaM dataset design customizable to support analysis requirements.

SEND

Aside from a limited number of special-purpose domains, all subject-level SDTM datasets are based on 1 of the 3 general observation classes. When faced with a set of data that were collected and that "go together" in some sense, the first step is to identify SDTM observations within the data and the general observation class of each observation. Once these observations are identified at a high level, 2 other tasks remain:

Determining whether the relationships between these observations need to be represented using GRPID within a dataset, as described in Section 8.1, (SENDIG v3.1.1) Relating Groups of Records Within a Domain Using the --GRPID Variable, or using RELREC between datasets, as described in Section 8.3, (SENDIG v3.1.1) Supplemental Qualifiers - SUPP-- Datasets; and
Placing all the data items in 1 of the identified general observation class records, or in a SUPP-- dataset, as described in Section 8.5, (SENDIG v3.1.1) Relating Findings To Multiple Subjects - Subject Pooling.

In practice, considering the representation of relationships and placing individual data items may lead to reconsidering the identification of observations, so the whole process may require several iterations.
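
The 2 remaining tasks described above can be sketched as simple decision rules in Python. This is only an illustration under stated assumptions: the function names, the sample list of standard variables, and the non-standard variable MIREASEX are hypothetical, and real decisions depend on the implementation guide and on judgment that this sketch does not capture.

```python
# Hypothetical decision helpers for illustration; not part of any CDISC tooling.


def relationship_mechanism(domain_a, domain_b):
    """Pick how to represent a relationship between two related observations.

    Records in the same dataset can be tied together with --GRPID;
    records in different datasets are related through RELREC.
    """
    if domain_a == domain_b:
        return f"{domain_a}GRPID"
    return "RELREC"


def place_item(variable, domain, standard_variables):
    """Pick the dataset for a data item: the parent domain or its SUPP-- dataset."""
    if variable in standard_variables:
        return domain
    return f"SUPP{domain}"


# Assumed, abbreviated list of standard variables for one domain.
mi_standard = {"MISEQ", "MITESTCD", "MIORRES", "MISTRESC", "MISPEC"}

same_domain = relationship_mechanism("MI", "MI")
cross_domain = relationship_mechanism("MA", "MI")
standard_home = place_item("MIORRES", "MI", mi_standard)
nonstandard_home = place_item("MIREASEX", "MI", mi_standard)  # hypothetical variable

print(same_domain, cross_domain, standard_home, nonstandard_home)
```

Because placing an item can change which observations are identified in the first place, rules like these would typically be reapplied after each revision rather than run once.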
