Guidance for Datasets

Observations about study subjects are represented in a series of domains.

Typically, each domain is represented by a single dataset.

Datasets and Domains - SDTMIG v3.4 - Wiki (cdisc.org)

Each domain dataset is distinguished by a unique, 2-character code that should be used consistently throughout the submission. This code, which is stored in the SDTM variable named DOMAIN, is used in 4 ways: as the dataset name, as the value of the DOMAIN variable in that dataset, as a prefix for most variable names in that dataset, and as a value in the RDOMAIN variable in relationship tables (see Section 8, Representing Relationships and Data).
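The four uses of the domain code can be checked mechanically. The following Python sketch is illustrative only. `summarize_domain_code` is a hypothetical helper, not part of any CDISC tool, and it assumes a dataset that is not split, so that its name equals its 2-character code. It reports unprefixed variables rather than rejecting them, because only most variable names carry the prefix (identifier variables such as STUDYID and USUBJID do not).

```python
# Hypothetical checker for the four uses of a domain code (illustration only).
# Assumes a non-split dataset whose name equals its 2-character domain code.
def summarize_domain_code(dataset_name, columns, rows, rdomain_values):
    code = dataset_name.upper()
    return {
        # 1. Dataset name: expected to be the 2-character code itself.
        "name_is_2_chars": len(code) == 2,
        # 2. Value of the DOMAIN variable on every row of the dataset.
        "domain_values_match": all(row.get("DOMAIN") == code for row in rows),
        # 3. Prefix for *most* variable names; unprefixed ones are listed, not rejected.
        "unprefixed_vars": [c for c in columns if not c.startswith(code)],
        # 4. Value of RDOMAIN in a relationship table (e.g. RELREC).
        "referenced_as_rdomain": code in rdomain_values,
    }

# Usage with one hypothetical adverse-event record.
columns = ["STUDYID", "DOMAIN", "USUBJID", "AESEQ", "AETERM"]
rows = [{"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001",
         "AESEQ": 1, "AETERM": "HEADACHE"}]
summary = summarize_domain_code("ae", columns, rows, rdomain_values={"AE", "CM"})
print(summary)
```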

All datasets are structured as flat files with rows representing observations and columns representing variables. Each dataset is described by metadata definitions that provide information about the variables used in the dataset. The metadata are described in a data definition document (i.e., a Define-XML document) that is submitted with the data to regulatory authorities. The Define-XML standard (available at https://www.cdisc.org/standards/data-exchange/define-xml) specifies metadata attributes to describe SDTM data.
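To make the flat-file structure concrete, the following Python sketch treats a dataset as a list of rows, each row a mapping from variable name to a single value, and checks those rows against the variable names that the metadata would describe. The function `flat_table_problems` and the sample vital-signs records are hypothetical, and the sketch neither reads nor produces Define-XML or any submission file format.

```python
from typing import Any

# Values allowed in a flat table cell: one scalar per row/column (None = missing).
SCALAR_TYPES = (str, int, float, type(None))

def flat_table_problems(variables: list[str], rows: list[dict[str, Any]]) -> list[str]:
    """Hypothetical check: every row has exactly the described variables,
    and every value is a single scalar (no nested structures)."""
    problems = []
    expected = set(variables)
    for i, row in enumerate(rows):
        present = set(row)
        missing = expected - present
        extra = present - expected
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: undescribed {sorted(extra)}")
        for name, value in row.items():
            if not isinstance(value, SCALAR_TYPES):
                problems.append(f"row {i}: {name} is not a scalar value")
    return problems

variables = ["STUDYID", "DOMAIN", "USUBJID", "VSSEQ", "VSTESTCD", "VSORRES"]
rows = [
    {"STUDYID": "S1", "DOMAIN": "VS", "USUBJID": "S1-001", "VSSEQ": 1,
     "VSTESTCD": "SYSBP", "VSORRES": "120"},
    {"STUDYID": "S1", "DOMAIN": "VS", "USUBJID": "S1-001", "VSSEQ": 2,
     "VSTESTCD": "DIABP", "VSORRES": ["80", "82"]},  # nested list: not flat
]
print(flat_table_problems(variables, rows))
```

The second row fails because a flat file holds one value per row and column; two results would be two observations, that is, two rows.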

Data represented in SDTM datasets include data as originally collected or received, data from the protocol, assigned data, and derived data. For each variable, the SDTM lists only the name, label, and type, along with brief CDISC guidelines that give a general description of the variable.
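The variable-level attributes mentioned here can be modeled as a small record. In this Python sketch, `VariableSpec`, `SdtmType`, and `Origin` are hypothetical names rather than any CDISC data model. `Origin` simply mirrors the four kinds of data listed above, and the origin given to each sample variable is chosen for illustration only; in a real submission such metadata is recorded in the Define-XML document.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical model for illustration; not an official CDISC library or schema.
class SdtmType(Enum):
    CHAR = "Char"
    NUM = "Num"

class Origin(Enum):
    # Mirrors the four kinds of data named in the text (simplified labels).
    COLLECTED = "collected or received"
    PROTOCOL = "protocol"
    ASSIGNED = "assigned"
    DERIVED = "derived"

@dataclass(frozen=True)
class VariableSpec:
    name: str
    label: str
    data_type: SdtmType
    origin: Origin

    def accepts(self, value) -> bool:
        """True if value fits the declared type; None stands for a missing value."""
        if value is None:
            return True
        if self.data_type is SdtmType.NUM:
            # bool is a subclass of int in Python, so exclude it explicitly.
            return isinstance(value, (int, float)) and not isinstance(value, bool)
        return isinstance(value, str)

# Origins below are illustrative assumptions, not study-specific facts.
specs = [
    VariableSpec("VSTESTCD", "Vital Signs Test Short Name", SdtmType.CHAR, Origin.ASSIGNED),
    VariableSpec("VSORRES", "Result or Finding in Original Units", SdtmType.CHAR, Origin.COLLECTED),
    VariableSpec("VSSTRESN", "Numeric Result/Finding in Standard Units", SdtmType.NUM, Origin.DERIVED),
]

row = {"VSTESTCD": "SYSBP", "VSORRES": "120", "VSSTRESN": 120.0}
bad = [s.name for s in specs if not s.accepts(row[s.name])]
print(bad)
```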
