Guidance for Datasets

Datasets and Domains - SDTMIG v3.4 - Wiki (cdisc.org)

Observations about study subjects are normally collected for all subjects in a series of domains. A domain is defined as a collection of logically related observations with a common topic. The logic of the relationship may pertain to the scientific subject matter of the data or to its role in the trial. Each domain is represented by a single dataset.

Each domain dataset is distinguished by a unique, 2-character code that should be used consistently throughout the submission. This code, which is stored in the SDTM variable named DOMAIN, is used in 4 ways: as the dataset name, as the value of the DOMAIN variable in that dataset, as a prefix for most variable names in that dataset, and as a value in the RDOMAIN variable in relationship tables (see Section 8, Representing Relationships and Data).
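As an illustration of the roles of the domain code, the following minimal Python sketch checks that every row's DOMAIN value matches the dataset name and lists the variables that carry the code as a prefix. The function names and the sample rows are hypothetical and are not part of the SDTMIG; the RDOMAIN usage in relationship tables is not covered here.

```python
def check_domain_code(dataset_name, rows):
    """Return a list of problems found for one domain dataset (hypothetical helper)."""
    problems = []
    code = dataset_name.upper()
    # The domain code is expected to be exactly 2 characters long.
    if len(code) != 2:
        problems.append(f"dataset name {dataset_name!r} is not a 2-character code")
    # Every row's DOMAIN value should equal the dataset's code.
    for i, row in enumerate(rows):
        if row.get("DOMAIN") != code:
            problems.append(
                f"row {i}: DOMAIN is {row.get('DOMAIN')!r}, expected {code!r}"
            )
    return problems


def prefixed_variables(dataset_name, rows):
    """List variable names that start with the domain code (most, not all, do)."""
    code = dataset_name.upper()
    names = {name for row in rows for name in row}
    # DOMAIN itself is excluded so a code such as "DO" does not match it by accident.
    return sorted(n for n in names if n.startswith(code) and n != "DOMAIN")


# Illustrative rows for a dataset named "ae"; the values are made up.
ae_rows = [
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 1, "AETERM": "HEADACHE"},
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-002", "AESEQ": 1, "AETERM": "NAUSEA"},
]

print(check_domain_code("ae", ae_rows))
print(prefixed_variables("ae", ae_rows))
```

Identifier variables such as STUDYID and USUBJID do not carry the prefix, which is why the text says "most" variable names.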

All datasets are structured as flat files with rows representing observations and columns representing variables. Each dataset is described by metadata definitions that provide information about the variables used in the dataset. The metadata are described in a data definition document (i.e., a Define-XML document) that is submitted with the data to regulatory authorities. The Define-XML standard (available at https://www.cdisc.org/standards/data-exchange/define-xml) specifies metadata attributes to describe SDTM data.
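The relationship between a flat dataset and its variable-level metadata can be sketched as follows. This is only an in-memory illustration under stated assumptions: the `VariableMeta` class and the `compare_to_metadata` function are hypothetical, the sample values are made up, and nothing here reads or writes an actual Define-XML document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMeta:
    """Hypothetical holder for basic variable-level metadata."""
    name: str
    label: str
    type: str  # "Char" or "Num" in this sketch


def compare_to_metadata(rows, metadata):
    """Report columns that are present but undeclared, or declared but absent."""
    declared = {m.name for m in metadata}
    issues = []
    for i, row in enumerate(rows):
        extra = set(row) - declared
        missing = declared - set(row)
        if extra:
            issues.append((i, "undeclared", sorted(extra)))
        if missing:
            issues.append((i, "missing", sorted(missing)))
    return issues


# Each entry describes one column (variable) of the dataset.
ae_metadata = [
    VariableMeta("STUDYID", "Study Identifier", "Char"),
    VariableMeta("DOMAIN", "Domain Abbreviation", "Char"),
    VariableMeta("USUBJID", "Unique Subject Identifier", "Char"),
    VariableMeta("AESEQ", "Sequence Number", "Num"),
    VariableMeta("AETERM", "Reported Term for the Adverse Event", "Char"),
]

# Each dict is one row (observation); each key is one column (variable).
ae_data = [
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 1, "AETERM": "HEADACHE"},
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-002", "AESEQ": 1, "AENOTE": "free text"},
]

print(compare_to_metadata(ae_data, ae_metadata))
```

The second sample row is deliberately inconsistent with the metadata so that the check has something to report.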

Data represented in SDTM datasets include data as originally collected or received, data from the protocol, assigned data, and derived data. For each variable, the SDTM lists only the name, label, and type, along with a set of brief CDISC guidelines that provide a general description of the variable.
