Requirements for data submission are defined and managed by the regulatory authorities to whom data are submitted. This section describes general requirements for datasets which may be part of a submission. However, additional conventions may be defined by regulatory bodies or negotiated with regulatory reviewers. In such cases, those additional requirements must also be followed.


Tabulation Datasets

Observations about tobacco products and study subjects generated to support a submission are represented in a series of datasets, each aligned with a logical grouping of data called a domain. Each domain described in this guide is generally implemented as a single dataset that represents the data in scope for that domain. All datasets are structured as flat files with rows representing observations and columns representing variables. In some cases, a dataset implemented for a domain may be split into physically separate datasets to support submission when needed and as allowable by the regulatory authority.

The following guidance will be adhered to for tabulation datasets: 


Num | Guidance For | Implementation
1 | Dataset Content

Data represented in tabulation datasets will include the following per regulatory requirements, scientific needs, and standards in this guide:

  • Data as originally collected or received to support the submission.
  • Data from external references relevant to the submission (such as a study protocol).
  • Data assigned per conventions in the TIG.
  • Data derived per regulatory and TIG conventions.
2 | Dataset Naming
  • Domain datasets based on the SDTM General Observations Classes will be named using the two-character code for the domain or using the applicable four-character code when a dataset is split.
  • Supplemental Qualifier datasets will be named using the convention SUPP concatenated with the two-character domain code, or with the four-character code when a dataset is split (e.g., SUPPDM, SUPPFA, SUPPFACM).
  • All other datasets will be named using the code for the domain or dataset (e.g., DM, RELREC).
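The naming conventions for General Observation Class domains and their Supplemental Qualifier datasets can be sketched in Python as follows. This is a minimal illustration only: the function name `dataset_name` and its parameters are hypothetical and are not defined by this guide.

```python
def dataset_name(domain: str, split_suffix: str = "", supplemental: bool = False) -> str:
    """Build a dataset name for a General Observation Class domain.

    `domain` is the two-character domain code, `split_suffix` is the optional
    one or two characters that identify a split dataset, and `supplemental`
    requests the name of the associated Supplemental Qualifier dataset.
    All three names are assumptions made for this sketch.
    """
    domain = domain.upper()
    split_suffix = split_suffix.upper()
    if len(domain) != 2:
        raise ValueError("domain code must be two characters")
    if len(split_suffix) > 2:
        raise ValueError("split suffix may add at most two characters")
    name = domain + split_suffix
    # Supplemental Qualifier datasets prefix the (possibly split) name with SUPP.
    return "SUPP" + name if supplemental else name


print(dataset_name("LB"))                          # unsplit domain dataset
print(dataset_name("LB", "HM"))                    # split dataset
print(dataset_name("FA", "CM", supplemental=True)) # SUPP dataset for a split dataset
```

Datasets that are not based on a General Observation Class (e.g., DM, RELREC) simply use their own code and are outside the scope of this helper.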
4 | Splitting Datasets

A domain dataset may be split into physically separate datasets to support submission when needed and as allowable by the regulatory authority. The following conventions must be adhered to when splitting domains into separate datasets:

  • A domain based on a General Observation Class may be split according to values in variable --CAT. When a domain is split on --CAT, --CAT must not be null.
  • The Findings About (FA) domain may be split based on the parent domain of the value in variable --OBJ.
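As an illustration of splitting on --CAT, the sketch below groups records into split datasets and rejects any record whose --CAT is null. Records are modeled as plain dictionaries; the function name `split_on_cat` and the `suffix_by_cat` mapping (including the suffix "CH") are assumptions for this example, not conventions from this guide.

```python
def split_on_cat(records, domain, suffix_by_cat):
    """Group records of one domain into split datasets keyed by dataset name.

    `suffix_by_cat` maps each --CAT value to the one or two characters that
    are appended to the domain code (a hypothetical, sponsor-defined mapping).
    """
    cat_var = f"{domain}CAT"
    split = {}
    for rec in records:
        cat = rec.get(cat_var)
        # When a domain is split on --CAT, --CAT must not be null.
        if cat is None or cat == "":
            raise ValueError(f"{cat_var} must not be null when splitting on it")
        name = domain + suffix_by_cat[cat]
        split.setdefault(name, []).append(rec)
    return split


lb = [
    {"DOMAIN": "LB", "USUBJID": "S1", "LBSEQ": 1, "LBCAT": "HEMATOLOGY"},
    {"DOMAIN": "LB", "USUBJID": "S1", "LBSEQ": 2, "LBCAT": "CHEMISTRY"},
]
parts = split_on_cat(lb, "LB", {"HEMATOLOGY": "HM", "CHEMISTRY": "CH"})
print(sorted(parts))
```

Note that DOMAIN and the variable prefixes are left unchanged in every split dataset; only the dataset name carries the additional characters.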

To ensure split datasets can be appended back into one domain dataset:

    1. The value of DOMAIN must be consistent across the separate datasets as it would have been if they had not been split (e.g., LB, FA).
    2. All variables that require a domain prefix (e.g., --TESTCD, --LOC) must use the value of DOMAIN as the prefix value (e.g., LB, FA).
    3. --SEQ must be unique within USUBJID for all records across all the split datasets. If there are 1000 records for a USUBJID across the separate datasets, all 1000 records need unique values for --SEQ.
    4. When relationship datasets (e.g., SUPPxx, FAxx, CO, RELREC) relate back to split parent domains, the value of IDVAR would generally be --SEQ. When IDVAR is a value other than --SEQ (e.g., --GRPID, --REFID, --SPID), care should be taken to ensure that the parent records across the split datasets have unique values for the variable specified in IDVAR, so that related child records do not accidentally join back to incorrect parent records.
    5. Permissible variables included in one split dataset need not be included in all split datasets.
    6. For domains with 2-letter domain codes, split dataset names can be up to 4 characters in length. For example, if splitting by --CAT, dataset names would be the domain name plus up to 2 additional characters (e.g., LBHM for LB if the value of --CAT is HEMATOLOGY). If splitting Findings About by parent domain, then the dataset name would be the domain code, "FA", plus the two-character code for the parent domain (e.g., "FACM"). The four-character dataset-name limitation allows the use of a Supplemental Qualifier dataset associated with the split dataset.
    7. Supplemental Qualifier datasets for split domains will also be split. The nomenclature will include the additional one to two characters used to identify the split dataset (e.g., SUPPLBHM, SUPPFACM). The value of RDOMAIN in the SUPP-- datasets would be the two-character domain code (e.g., LB, FA).
    8. In RELREC, if a dataset-level relationship is defined for a split Findings About domain, then RDOMAIN will contain the four-character dataset name, rather than the domain name "FA" (e.g., the value of RDOMAIN will be FACM).
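Several of the conventions above can be checked mechanically before split datasets are appended back together. The following is a minimal sketch, assuming records are held as plain dictionaries keyed by variable name; the function name `append_split_datasets` is hypothetical. It checks only the dataset-name length (item 6), the DOMAIN value (item 1), and --SEQ uniqueness within USUBJID (item 3), and does not cover the other items.

```python
from collections import Counter


def append_split_datasets(split_datasets, domain):
    """Append split datasets into one list of records after basic checks.

    `split_datasets` maps each split dataset name (e.g., "LBHM") to its list
    of records. Raises ValueError listing every problem found.
    """
    seq_var = f"{domain}SEQ"  # variables take the value of DOMAIN as prefix
    combined = []
    problems = []
    for name, records in split_datasets.items():
        if not name.startswith(domain) or len(name) > 4:
            problems.append(f"{name}: name must start with {domain} and be at most 4 characters")
        for rec in records:
            if rec.get("DOMAIN") != domain:
                problems.append(f"{name}: DOMAIN is {rec.get('DOMAIN')!r}, expected {domain!r}")
            combined.append(rec)
    # --SEQ must be unique within USUBJID across all of the split datasets.
    counts = Counter((rec.get("USUBJID"), rec.get(seq_var)) for rec in combined)
    for (usubjid, seq), n in counts.items():
        if n > 1:
            problems.append(f"{seq_var}={seq} occurs {n} times for USUBJID {usubjid}")
    if problems:
        raise ValueError("; ".join(problems))
    return combined


lbhm = [{"DOMAIN": "LB", "USUBJID": "S1", "LBSEQ": 1, "LBCAT": "HEMATOLOGY"}]
lbch = [{"DOMAIN": "LB", "USUBJID": "S1", "LBSEQ": 2, "LBCAT": "CHEMISTRY"}]
combined = append_split_datasets({"LBHM": lbhm, "LBCH": lbch}, "LB")
print(len(combined))
```

Because permissible variables need not appear in every split dataset, the records are appended as-is and may carry different sets of keys. A similar uniqueness check could be applied to whichever variable is named in IDVAR when it is not --SEQ.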

Analysis Datasets

add analysis guidance here.
