
Requirements for data submission are defined and managed by the regulatory authorities to whom data are submitted. This section describes general requirements for datasets which may be part of a submission. However, additional conventions may be defined by or negotiated with regulatory reviewers and must be followed when applicable.



Observations about study subjects are represented in a series of domain datasets. All datasets are structured as flat files with rows representing observations and columns representing variables. Data represented in datasets will include:

  • Data as originally collected or received
  • Data from the protocol
  • Assigned data
  • Derived data

Dataset naming conventions

Dataset names will reflect the following conventions:

  • Each dataset name is a unique 2- to 4-character code (RELREC? SUPPQUAL)
  • This code, which is stored in the SDTM variable named DOMAIN, is used in 4 ways: as the dataset name, as the value of the DOMAIN variable in that dataset, as a prefix for most variable names in that dataset, and as a value in the RDOMAIN variable in relationship tables (see Section 8, Representing Relationships and Data).

Splitting Domains - SDTMIG v3.4 - Wiki (cdisc.org)

Splitting Domains into Separate Datasets

When applicable, a domain of topically related information may be split into physically separate datasets.

  • A domain based on a general observation class may be split according to values in --CAT. When a domain is split on --CAT, --CAT must not be null.
  • The Findings About (FA) domain may alternatively be split based on the domain of the value in --OBJ.
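As a rough illustration of splitting on --CAT, the sketch below groups the records of one domain by category and rejects null values. It assumes records are held as plain Python dicts keyed by SDTM variable name; the function name `split_by_cat`, the sample subject identifier, and the category value "OTHER" are hypothetical and not part of the standard.

```python
from collections import defaultdict


def split_by_cat(records, domain):
    """Group the records of one domain by their --CAT value.

    `records` is a list of dicts keyed by SDTM variable name.
    Raises ValueError if any record has a null --CAT, because a
    domain split on --CAT must not contain null categories.
    """
    cat_var = f"{domain}CAT"  # e.g. "QSCAT" for the QS domain
    seq_var = f"{domain}SEQ"
    groups = defaultdict(list)
    for rec in records:
        cat = rec.get(cat_var)
        if cat is None or str(cat).strip() == "":
            raise ValueError(f"{cat_var} is null for {seq_var}={rec.get(seq_var)}")
        groups[cat].append(rec)
    return dict(groups)


# Hypothetical sample data for illustration only.
qs = [
    {"DOMAIN": "QS", "USUBJID": "ABC-001", "QSSEQ": 1, "QSCAT": "SF-36"},
    {"DOMAIN": "QS", "USUBJID": "ABC-001", "QSSEQ": 2, "QSCAT": "OTHER"},
    {"DOMAIN": "QS", "USUBJID": "ABC-001", "QSSEQ": 3, "QSCAT": "SF-36"},
]
parts = split_by_cat(qs, "QS")
```

Each key of `parts` would then correspond to one physically separate dataset; DOMAIN and --SEQ are left unchanged so the pieces can later be appended back together.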

The following rules must be adhered to when splitting a domain into separate datasets to ensure they can be appended back into one domain dataset:

  1. The value of DOMAIN must be consistent across the separate datasets as it would have been if they had not been split (e.g., LB, FA).
  2. All variables that require a domain prefix (e.g., --TESTCD, --LOC) must use the value of DOMAIN as the prefix value (e.g., LB, FA).
  3. --SEQ must be unique within USUBJID for all records across all the split datasets. If there are 1000 records for a USUBJID across the separate datasets, all 1000 records need unique values for --SEQ.
  4. When relationship datasets (e.g., SUPPxx, FAxx, CO, RELREC) relate back to split parent domains, IDVAR would generally be --SEQ. When IDVAR is a value other than --SEQ (e.g., --GRPID, --REFID, --SPID), care should be used to ensure that the parent records across the split datasets have unique values for the variable specified in IDVAR, so that related children records do not accidentally join back to incorrect parent records.
  5. Permissible variables included in one split dataset need not be included in all split datasets.
  6. For domains with 2-letter domain codes (i.e., other than SUPPxx and RELREC), split dataset names can be up to 4 characters in length. For example, if splitting by --CAT, dataset names would be the domain name plus up to 2 additional characters (e.g., QS36 for SF-36). If splitting Findings About by parent domain, then the dataset name would be the domain code, "FA", plus the 2-character domain code of the parent domain (e.g., "FACM"). The 4-character dataset-name limitation allows the use of a Supplemental Qualifier dataset associated with the split dataset.
  7. Supplemental Qualifier datasets for split domains would also be split. The nomenclature would include the additional 1 to 2 characters used to identify the split dataset (e.g., SUPPQS36, SUPPFACM). The value of RDOMAIN in the SUPP-- datasets would be the 2-character domain code (e.g., QS, FA).
  8. In RELREC, if a dataset-level relationship is defined for a split Findings About domain, then RDOMAIN may contain the 4-character dataset name, rather than the domain name "FA", as shown in the following example. 

    relrec.xpt

    Row | STUDYID | RDOMAIN | USUBJID | IDVAR  | IDVARVAL | RELTYPE | RELID
    1   | ABC     | CM      |         | CMSPID |          | ONE     | 1
    2   | ABC     | FACM    |         | FASPID |          | MANY    | 1
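Several of the splitting rules above lend themselves to an automated check before submission. The following is a minimal sketch, not a validated tool: it covers only rules 1, 3, 6, and 7, assumes a 2-letter domain code, and assumes each split dataset is a list of dicts keyed by variable name. The function names `check_split`, `supp_name`, and `supp_rdomain`, the dataset name "QSXX", and the sample subject identifier are hypothetical.

```python
def check_split(datasets, domain):
    """Return a list of problems found in the split datasets of one domain.

    `datasets` maps a split dataset name (e.g. "QS36") to a list of
    record dicts. `domain` is the 2-letter domain code (e.g. "QS").
    """
    problems = []
    seq_var = f"{domain}SEQ"
    seen = set()  # (USUBJID, --SEQ) pairs seen so far across all splits
    for name, records in datasets.items():
        # Rule 6: domain code plus up to 2 extra characters, 4 at most.
        if not name.startswith(domain) or len(name) > 4:
            problems.append(f"{name}: invalid split dataset name")
        for rec in records:
            # Rule 1: DOMAIN keeps the value it would have had unsplit.
            if rec.get("DOMAIN") != domain:
                problems.append(
                    f"{name}: DOMAIN is {rec.get('DOMAIN')!r}, expected {domain!r}"
                )
            # Rule 3: --SEQ unique within USUBJID across all split datasets.
            key = (rec.get("USUBJID"), rec.get(seq_var))
            if key in seen:
                problems.append(f"{name}: duplicate {seq_var}={key[1]} for {key[0]}")
            seen.add(key)
    return problems


def supp_name(split_name):
    """Rule 7: Supplemental Qualifier dataset name for a split dataset."""
    return "SUPP" + split_name


def supp_rdomain(split_name):
    """Rule 7: RDOMAIN in SUPP-- is the 2-character domain code."""
    return split_name[:2]


# Hypothetical sample data: the same QSSEQ appears in two split datasets.
splits = {
    "QS36": [{"DOMAIN": "QS", "USUBJID": "ABC-001", "QSSEQ": 1}],
    "QSXX": [{"DOMAIN": "QS", "USUBJID": "ABC-001", "QSSEQ": 1}],
}
issues = check_split(splits, "QS")
```

Here `issues` would flag the repeated QSSEQ value in the second dataset. Rule 4 (uniqueness of --GRPID, --REFID, or --SPID when used as IDVAR) could be checked the same way by substituting the relevant variable for --SEQ.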





