All datasets are structured as flat files with rows representing observations and columns representing variables; each dataset is described by metadata definitions that provide information about the variables used in the dataset. Metadata are described in the CDISC Define-XML specification, available at https://www.cdisc.org/standards/data-exchange/define-xml.
Each observation consists of a series of named variables. Each variable, which normally corresponds to a column in a dataset, can be classified according to its role. A role describes the type of information the variable conveys about each distinct observation and how that information can be used. Some variables play different roles in different datasets, most commonly variables that appear in both trial design datasets and general observation class datasets. For example, ARMCD is the topic variable in Trial Arms (TA), but a record qualifier in Demographics (DM) and Trial Visits (TV). Variables that appear in multiple general observation classes have the same role in each, although the variable qualified by a variable qualifier or synonym qualifier can differ between general observation classes. For example, --MODIFY qualifies --TRT in interventions, --TERM in events, and --ORRES in findings.
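The dataset-dependent roles described above can be sketched as a pair of lookup tables. This is an illustrative sketch only: the table and function names are hypothetical, the entries are limited to the examples given in the text, and the expansion of the "--" placeholder with a two-character domain code (for example, AE) is an assumption based on common SDTM usage rather than something stated in this section.

```python
from typing import Dict, Tuple

# Role of a variable keyed by (variable, dataset). Entries come from the
# ARMCD example in the text; the lookup structure itself is hypothetical.
ROLE_BY_DATASET: Dict[Tuple[str, str], str] = {
    ("ARMCD", "TA"): "Topic",
    ("ARMCD", "DM"): "Record Qualifier",
    ("ARMCD", "TV"): "Record Qualifier",
}

# Variable qualified by --MODIFY in each general observation class,
# as listed in the text.
MODIFY_TARGET: Dict[str, str] = {
    "Interventions": "--TRT",
    "Events": "--TERM",
    "Findings": "--ORRES",
}


def role_of(variable: str, dataset: str) -> str:
    """Return the role a variable plays in a given dataset."""
    return ROLE_BY_DATASET[(variable, dataset)]


def qualified_variable(domain_code: str, observation_class: str) -> str:
    """Name the variable that --MODIFY qualifies in one domain.

    Assumption: the '--' placeholder stands for the domain code.
    """
    return MODIFY_TARGET[observation_class].replace("--", domain_code)
```

Under these assumptions, `role_of("ARMCD", "TA")` yields the topic role while `role_of("ARMCD", "DM")` yields a record qualifier, and `qualified_variable("AE", "Events")` expands to `AETERM`.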
SDTM variables can be classified into 5 major roles:
- Identifier variables, such as those that identify the study, the subject involved in the study, the domain, and the sequence number of the record;
- Topic variables, which specify the focus of the observation (e.g., the name of a lab test);
- Timing variables, which describe the timing of an observation (e.g., start date, end date);
- Qualifier variables, which include additional illustrative text or numeric values that describe the results or additional traits of the observation (e.g., units, descriptive adjectives); and
- Rule variables, which describe the conditions for starting, ending, branching, or looping in the Trial Design model.
The set of Qualifier variables can be further categorized into 5 subclasses:
- Grouping Qualifiers, used to group together a collection of observations within the same domain (e.g., categories or subcategories);
- Result Qualifiers, which describe the specific results associated with the topic variable in a Findings dataset and that answer the question raised by the topic variable;
- Synonym Qualifiers, which specify an alternative name for a particular variable in an observation (e.g., coded version of a verbatim topic variable or the name associated with a test code);
- Record Qualifiers, which define additional attributes of the observation record as a whole, rather than describing a particular variable within a record (e.g., for a lab test, the specimen type and the name of the lab that performed the test); and
- Variable Qualifiers, which further modify or describe 1 or more of a specific set of variables within an observation and which are only meaningful in the context of the variable they qualify (e.g., the unit for a numeric test result or a medication dose, the laterality of an anatomic location).
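The 5 roles and the 5 qualifier subclasses listed above can be modeled as two enumerations. The following is a minimal sketch, not part of the SDTM itself: the class names, the `parse_role` helper, and the assumption that a role label is written as text such as "Record Qualifier" are all hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class Role(Enum):
    IDENTIFIER = "Identifier"
    TOPIC = "Topic"
    TIMING = "Timing"
    QUALIFIER = "Qualifier"
    RULE = "Rule"


class QualifierSubclass(Enum):
    GROUPING = "Grouping"
    RESULT = "Result"
    SYNONYM = "Synonym"
    RECORD = "Record"
    VARIABLE = "Variable"


def parse_role(label: str) -> Tuple[Role, Optional[QualifierSubclass]]:
    """Split a role label into (role, qualifier subclass or None).

    Assumption: labels look like 'Topic' or 'Record Qualifier',
    in any letter case and with arbitrary spacing.
    """
    text = " ".join(label.split()).title()
    if text.endswith("Qualifier") and text != "Qualifier":
        # Everything before the trailing word names the subclass.
        prefix = text[: -len("Qualifier")].strip()
        return Role.QUALIFIER, QualifierSubclass(prefix)
    return Role(text), None
```

A label that names neither a role nor a qualifier subclass raises `ValueError` from the enumeration lookup, which keeps unrecognized roles from passing silently.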
The SDTM includes variable metadata for the standard variables as described in Section 2.2, Table Structure.
All datasets for data about individuals and for data about a study include the variable DOMAIN, which is populated with a code that should be used in the dataset name. Some relationship datasets include the variable RDOMAIN, to describe a relationship to a domain for data about individuals. The Comments special-purpose domain includes the variable RDOMAIN, but other special-purpose domains do not. The Device-subject Relationships dataset includes the variable DOMAIN, but other study reference datasets do not.
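The relationship between DOMAIN and the dataset name can be checked with a short routine. This sketch rests on assumptions: records are represented as plain dictionaries, the function name `check_domain` is hypothetical, and "used in the dataset name" is interpreted loosely as a case-insensitive substring match.

```python
from typing import Dict, List


def check_domain(dataset_name: str, rows: List[Dict[str, str]]) -> List[str]:
    """Report records whose DOMAIN is missing or absent from the dataset name."""
    problems: List[str] = []
    for index, row in enumerate(rows, start=1):
        code = row.get("DOMAIN")
        if not code:
            problems.append(f"row {index}: DOMAIN is missing")
        elif code.lower() not in dataset_name.lower():
            # Assumption: the domain code appears somewhere in the name.
            problems.append(
                f"row {index}: DOMAIN {code!r} not found in "
                f"dataset name {dataset_name!r}"
            )
    return problems
```

An empty list means every record carries a DOMAIN value consistent with the dataset name. Datasets that use RDOMAIN instead would need a separate check.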
The SDTM is structured so that data can be represented in SAS v5 transport files, the file format accepted by the US Food and Drug Administration (FDA) and other regulatory authorities. This imposes certain restrictions on variables. Note that the SDTM type specified in this document is either character or numeric, as these are the only types supported by SAS v5 transport files. Define-XML provides more descriptive data types (e.g., integer, float, date, datetime); see the Define-XML specification for information about how to represent SDTM types using Define-XML data types.
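The reduction from the richer Define-XML data types to the 2 SDTM types can be sketched as a lookup. The mapping below is an assumption made for illustration, in particular the choice to treat date and datetime values as character data. The Define-XML specification remains the authority, and the names used here are hypothetical.

```python
from typing import Dict

# Assumed mapping from Define-XML data types to the 2 SDTM types.
# Only the types named in the text, plus plain text, are covered.
SDTM_TYPE_BY_DEFINE_TYPE: Dict[str, str] = {
    "integer": "numeric",
    "float": "numeric",
    "text": "character",
    "date": "character",      # assumption: dates are carried as text
    "datetime": "character",  # assumption: datetimes are carried as text
}


def sdtm_type(define_data_type: str) -> str:
    """Return 'character' or 'numeric' for a Define-XML data type."""
    try:
        return SDTM_TYPE_BY_DEFINE_TYPE[define_data_type]
    except KeyError:
        raise ValueError(
            f"no mapping assumed for Define-XML data type {define_data_type!r}"
        ) from None
```

Raising an error for unlisted types, rather than defaulting to character, makes gaps in the assumed mapping visible instead of hiding them.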