
...

Tabulation Datasets

Observations about study subjects generated over the course of a study are represented in a series of datasets aligned with logical groupings of data in domains. Domains described in this guide are generally aligned with implementation of a single dataset for each domain. In some cases, a dataset implemented in alignment with a domain may be split into physically separate datasets to support submission when needed and as allowed by the regulatory authority.

Generally, a domain is represented by a single dataset.

...

1. The value of DOMAIN must be consistent across the separate datasets as it would have been if they had not been split (e.g., LB, FA).
2. All variables that require a domain prefix (e.g., --TESTCD, --LOC) must use the value of DOMAIN as the prefix value (e.g., LB, FA).
3. --SEQ must be unique within USUBJID for all records across all the split datasets. If there are 1000 records for a USUBJID across the separate datasets, all 1000 records need unique values for --SEQ.
4. When relationship datasets (e.g., SUPPxx, FAxx, CO, RELREC) relate back to split parent domains, the value of IDVAR would generally be --SEQ. When IDVAR is a value other than --SEQ (e.g., --GRPID, --REFID, --SPID), care should be used to ensure that the parent records across the split datasets have unique values for the variable specified in IDVAR, so that related child records do not accidentally join back to incorrect parent records.
5. Permissible variables included in one split dataset need not be included in all split datasets.
6. For domains with 2-letter domain codes, split dataset names can be up to 4 characters in length. For example, if splitting by --CAT, dataset names would be the domain code plus up to 2 additional characters (e.g., QS36 for SF-36). If splitting Findings About by parent domain, then the dataset name would be the domain code, "FA", plus the 2-character domain code of the parent domain (e.g., "FACM"). The 4-character dataset-name limitation allows the use of a Supplemental Qualifier dataset associated with the split dataset.
7. Supplemental Qualifier datasets for split domains would also be split. The nomenclature would include the additional 1 to 2 characters used to identify the split dataset (e.g., SUPPQS36, SUPPFACM). The value of RDOMAIN in the SUPP-- datasets would be the 2-character domain code (e.g., QS, FA).
8. In RELREC, if a dataset-level relationship is defined for a split Findings About domain, then RDOMAIN may contain the 4-character dataset name, rather than the domain name "FA", as shown in the following example.
  relrec.xpt
  Row	STUDYID	RDOMAIN	USUBJID	IDVAR	IDVARVAL	RELTYPE	RELID
  1	ABC	CM		CMSPID		ONE	1
  2	ABC	FACM		FASPID		MANY	1
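
Several of the rules above lend themselves to an automated check. The following is a minimal sketch, not part of the standard: the function names, the in-memory record layout (a list of dicts per dataset), the subject identifier "ABC-001", and the split dataset name "QSPH" are all hypothetical and chosen only for illustration. It checks rule 1 (consistent DOMAIN), rule 3 (--SEQ unique within USUBJID across all split datasets), and the naming conventions in rules 6 and 7.

```python
from collections import Counter


def split_dataset_name_ok(name: str, domain: str) -> bool:
    """Rule 6: 2-letter domain code plus 1 to 2 additional characters."""
    suffix = name[len(domain):]
    return (
        len(domain) == 2
        and name.startswith(domain)
        and 1 <= len(suffix) <= 2
        and suffix.isalnum()
    )


def supp_name(split_name: str) -> str:
    """Rule 7: the SUPP dataset name carries the full split dataset name."""
    return "SUPP" + split_name


def check_split(domain: str, datasets: dict) -> list:
    """Return a list of problems found across the split datasets.

    `datasets` maps a split dataset name (e.g., "QS36") to a list of
    records; each record is a dict with DOMAIN, USUBJID, and <domain>SEQ.
    This layout is an assumption made for the sketch.
    """
    seq_var = domain + "SEQ"
    problems = []
    seen = Counter()
    for name, records in datasets.items():
        if not split_dataset_name_ok(name, domain):
            problems.append(f"{name}: invalid split dataset name")
        for rec in records:
            # Rule 1: DOMAIN stays the same as if the data were not split.
            if rec["DOMAIN"] != domain:
                problems.append(
                    f"{name}: DOMAIN is {rec['DOMAIN']}, expected {domain}"
                )
            seen[(rec["USUBJID"], rec[seq_var])] += 1
    # Rule 3: --SEQ must be unique within USUBJID across all split datasets.
    for (usubjid, seq), count in sorted(seen.items()):
        if count > 1:
            problems.append(f"{usubjid}: {seq_var}={seq} used {count} times")
    return problems


# Hypothetical data: QSSEQ 2 is reused across the two split datasets.
example = {
    "QS36": [
        {"DOMAIN": "QS", "USUBJID": "ABC-001", "QSSEQ": 1},
        {"DOMAIN": "QS", "USUBJID": "ABC-001", "QSSEQ": 2},
    ],
    "QSPH": [
        {"DOMAIN": "QS", "USUBJID": "ABC-001", "QSSEQ": 2},
    ],
}
print(check_split("QS", example))
```

The same counting approach could be applied to the variable named in IDVAR (rule 4) when it is something other than --SEQ, by swapping `seq_var` for that variable.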

Standards for tabulation represent data in groupings of related data called domains. Datasets are the structures associated with those groupings.

...

