Requirements for data submission are defined and managed by the regulatory authorities to whom data are submitted. This section describes general requirements for datasets that may be part of a submission. However, additional conventions may be defined by regulatory bodies or negotiated with regulatory reviewers. In such cases, those additional requirements must also be followed.
Tabulation Datasets
Observations about tobacco products and study subjects that are generated over the course of a study to support a submission are represented in a series of datasets aligned with logical groupings of data called domains.
Generally, a domain is represented by a single dataset.
A domain dataset may be split into physically separate datasets to support submission when needed and as allowable by the regulatory authority. The following conventions must be adhered to when splitting domains into separate datasets:
- A domain based on a general observation class may be split according to values in --CAT. When a domain is split on --CAT, --CAT must not be null.
- The Findings About (FA) domain may alternatively be split based on the domain of the value in --OBJ.
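The --CAT split rule can be sketched in Python. This is a minimal sketch, assuming each record is a dict keyed by SDTM variable name. The function names `split_by_category` and `append_splits`, and the split-dataset naming scheme (domain code plus a caller-supplied suffix), are hypothetical illustrations, not conventions defined by this guide. Each split keeps the parent DOMAIN value, so appending the pieces restores one domain dataset.

```python
from collections import defaultdict


def split_by_category(records, domain, suffixes):
    """Split one domain's records into separate datasets on --CAT.

    records:  list of dicts keyed by SDTM variable name
    domain:   the domain code, e.g. "QS"
    suffixes: hypothetical mapping of --CAT value to a dataset-name suffix
    """
    cat_var = domain + "CAT"  # e.g. "QS" -> "QSCAT"
    splits = defaultdict(list)
    for rec in records:
        cat = rec.get(cat_var)
        # When a domain is split on --CAT, --CAT must not be null.
        if cat is None or str(cat).strip() == "":
            raise ValueError(f"{cat_var} is null; cannot split {domain} on it")
        # Every split dataset still belongs to the parent domain.
        if rec.get("DOMAIN") != domain:
            raise ValueError(f"record belongs to domain {rec.get('DOMAIN')!r}")
        splits[domain + suffixes[cat]].append(rec)
    return dict(splits)


def append_splits(splits):
    """Append split datasets back into one domain dataset."""
    combined = []
    for name in sorted(splits):
        combined.extend(splits[name])
    return combined


if __name__ == "__main__":
    # Hypothetical questionnaire categories, for illustration only.
    qs = [
        {"DOMAIN": "QS", "USUBJID": "001", "QSSEQ": 1, "QSCAT": "QUESTIONNAIRE A"},
        {"DOMAIN": "QS", "USUBJID": "001", "QSSEQ": 2, "QSCAT": "QUESTIONNAIRE B"},
    ]
    parts = split_by_category(qs, "QS", {"QUESTIONNAIRE A": "QA", "QUESTIONNAIRE B": "QB"})
    print(sorted(parts))  # ['QSQA', 'QSQB']
```

Raising on a null --CAT, rather than routing such records to a catch-all dataset, mirrors the rule that --CAT must be populated when it is the split key.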
To ensure split datasets can be appended back into one domain dataset:
All datasets are structured as flat files with rows representing observations and columns representing variables. Data represented in tabulation datasets will include:
- Data as originally collected or received.
- Data from the protocol.
- Assigned data.
- Derived data.
Dataset names will reflect the following conventions:
- Names will be a unique 2- to 4-letter character code.
- This code, which is stored in the SDTM variable named DOMAIN, is used in 4 ways: as the dataset name, as the value of the DOMAIN variable in that dataset, as a prefix for most variable names in that dataset, and as a value in the RDOMAIN variable in relationship tables (see Section 8, Representing Relationships and Data).
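The naming rules above can be checked mechanically. This is a minimal sketch; it assumes codes are uppercase letters (the usual SDTM convention) and that dataset files use the lowercase code with an `.xpt` extension, mirroring `relrec.xpt`. The helpers `check_domain_code` and `uses_of_code`, and the `TESTCD` suffix used to illustrate a prefixed variable, are hypothetical.

```python
import re

# Assumption: a 2- to 4-letter code written in uppercase letters.
DOMAIN_CODE = re.compile(r"[A-Z]{2,4}")


def check_domain_code(code, existing):
    """Reject codes that are not 2-4 uppercase letters or are already in use."""
    if not DOMAIN_CODE.fullmatch(code):
        raise ValueError(f"{code!r} is not a 2- to 4-letter code")
    if code in existing:
        raise ValueError(f"{code!r} is not unique")
    return code


def uses_of_code(code, variable_suffix="TESTCD"):
    """Show the four places a single domain code appears."""
    return {
        "dataset_name": code,                        # the dataset name itself
        "file_name": code.lower() + ".xpt",          # assumed file naming, like relrec.xpt
        "DOMAIN": code,                              # value of DOMAIN in that dataset
        "example_variable": code + variable_suffix,  # prefix on most variable names
        "RDOMAIN": code,                             # value used in relationship tables
    }


if __name__ == "__main__":
    check_domain_code("LB", {"AE", "CM"})
    print(uses_of_code("LB")["example_variable"])  # LBTESTCD
```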
The following guidance will be adhered to for tabulation datasets:
[Metadata specification table: tabulation datasets, including relrec.xpt]
Analysis Datasets
Observations about tobacco products and study subjects that are generated to support analysis in a submission are represented in a series of datasets based on the CLASS values described in the TIG. Datasets described in this guide are generally created to support a specific type of analysis, but some analysis datasets are created as inputs to a subsequent dataset that will be used for analysis. All datasets are structured as flat files with rows representing observations and columns representing variables.
The following guidance will be adhered to for analysis datasets:
Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | RELTYPE | RELID
---|---|---|---|---|---|---|---
... | ... | ... | ... | ... | ... | ... | ...
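The column names listed above (STUDYID, RDOMAIN, USUBJID, IDVAR, IDVARVAL, RELTYPE, RELID) match the variables of the SDTM RELREC relationship dataset. The sketch below shows how record-level links might be built and queried, assuming the usual SDTM convention that records sharing a RELID within a USUBJID are related, and that RELTYPE is populated only for dataset-level relationships. The class `RelrecRow`, the helpers `link_records` and `related`, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelrecRow:
    STUDYID: str
    RDOMAIN: str              # domain code of the related record
    USUBJID: Optional[str]    # null for dataset-level relationships
    IDVAR: str                # identifying variable, e.g. "AESEQ"
    IDVARVAL: Optional[str]   # value of IDVAR for the related record
    RELTYPE: Optional[str]    # "ONE"/"MANY" for dataset-level links; null here
    RELID: str                # groups the records that are related


def link_records(studyid, usubjid, relid, *refs):
    """Build record-level RELREC rows; each ref is (RDOMAIN, IDVAR, IDVARVAL)."""
    return [
        RelrecRow(studyid, rdomain, usubjid, idvar, str(value), None, relid)
        for rdomain, idvar, value in refs
    ]


def related(rows, usubjid, relid):
    """Return the (RDOMAIN, IDVAR, IDVARVAL) triples linked by one RELID."""
    return [
        (r.RDOMAIN, r.IDVAR, r.IDVARVAL)
        for r in rows
        if r.USUBJID == usubjid and r.RELID == relid
    ]


if __name__ == "__main__":
    # Hypothetical link between an adverse event and a concomitant medication.
    rows = link_records("S1", "S1-001", "1", ("AE", "AESEQ", 3), ("CM", "CMSEQ", 5))
    print(related(rows, "S1-001", "1"))  # [('AE', 'AESEQ', '3'), ('CM', 'CMSEQ', '5')]
```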
The terms domain and dataset
The terms “domain” and “dataset” are commonly used in CDISC’s nomenclature and found frequently in the Study Data Tabulation Model (SDTM). For example, SDTM v1.8 includes 134 instances of "domain" and says "A collection of observations on a particular topic is considered a domain." The Model includes 78 instances of "dataset", and certain structures in the model are called "datasets" rather than "domains." Is there a difference between a domain and a dataset?
The CDISC Glossary defines these terms as follows:
- Domain: A collection of logically related observations with a common, specific topic that are normally collected for all subjects in a clinical investigation. NOTE: The logic of the relationship may pertain to the scientific subject matter of the data or to its role in the trial. Example domains include laboratory test results (LB), adverse events (AE), concomitant medications (CM). [After SDTM Implementation Guide version 3.2, CDISC.org] See also general observation class.
- Dataset: A collection of structured data in a single file. [CDISC, ODM, and SDS] Compare to analysis dataset, tabulation dataset.
In plainer terms, a domain is a grouping of related observations, while a dataset is the data structure associated with that grouping of observations. Domains and datasets use the same nomenclature, which is why the two are often confused.
The distinction between domain and dataset is most clearly seen in cases where a general observation class domain is split into multiple datasets in a submission. Common examples are splitting the Laboratory Test Results (LB) domain due to size, splitting the Questionnaires (QS) domain by questionnaire, and splitting the Findings About Events or Interventions (FA) domain by parent domain.
However, since in most cases there is a one-to-one relationship between a conceptual domain and a dataset based on that conceptual domain, the words are used interchangeably in the standards and, therefore, by most users. The structures called “relationship datasets” were given that name because they are mechanisms for connecting information represented in different datasets rather than observations about study subjects. Note that none of the relationship datasets includes the variable DOMAIN. However, in a submission, these datasets need dataset names, and the character strings used in those names are included in the CDISC Codelist called "SDTM Domain Abbreviations."
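The domain/dataset distinction can be illustrated by grouping dataset files on their DOMAIN values: several files (for example, a QS domain split by questionnaire) map to one domain, while relationship datasets, which have no DOMAIN variable, map to none. This is a minimal sketch; the function `datasets_by_domain` and the dataset names in the example are hypothetical.

```python
from collections import defaultdict


def datasets_by_domain(datasets):
    """Map each DOMAIN value to the dataset names that carry it.

    datasets: dict of dataset name -> list of records (dicts).
    Datasets with no DOMAIN values at all (such as relationship datasets,
    or an empty dataset) are returned separately.
    """
    domains = defaultdict(list)
    no_domain = []
    for name, records in datasets.items():
        values = {rec["DOMAIN"] for rec in records if "DOMAIN" in rec}
        if not values:
            no_domain.append(name)
        for value in sorted(values):
            domains[value].append(name)
    return dict(domains), no_domain


if __name__ == "__main__":
    files = {
        "lb": [{"DOMAIN": "LB"}],
        "qsqa": [{"DOMAIN": "QS"}],   # hypothetical split of QS by questionnaire
        "qsqb": [{"DOMAIN": "QS"}],
        "relrec": [{"RDOMAIN": "QS", "RELID": "1"}],
    }
    print(datasets_by_domain(files))
    # ({'LB': ['lb'], 'QS': ['qsqa', 'qsqb']}, ['relrec'])
```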
In conclusion, there is a clear distinction between the meaning of "domain" and "dataset", but given that the naming conventions are the same across both terms, in many cases they can be considered interchangeable.
Language taken from https://www.cdisc.org/kb/articles/domain-vs-dataset-whats-difference
...