Implementation of standards per the TIG starts with selecting the standards with which to collect, represent, and/or exchange data. Once the standards are selected, it is possible to determine how the data are collected, represented, or exchanged using them.


Determining the Standard

At the highest level, sets of standards in this guide are aligned both with tobacco study use cases and with the activities in the data life cycle that they support. In the TIG:

  • Standards for Collection implement the CDASH Model to support development and use of case report forms (CRFs).
  • Standards for Tabulation implement the SDTM to organize collected, assigned, or derived data into datasets.
  • Standards for Analysis implement the ADaM to specify the principles to follow in the creation of analysis datasets and associated metadata.
  • Standards for Data Exchange with associated resources support the sharing of structured data between parties and across different information systems.

Use cases, activities, and associated sets of standards in scope for this guide are shown in the table below.

| Use case | Standards for Collection | Standards for Tabulation | Standards for Analysis | Standards for Data Exchange |
|---|---|---|---|---|
| Product Description | | SDTM | ADaM | Define-XML |
| Nonclinical | | SDTM | | Define-XML |
| Product Impact on Individual Health | CDASH Model | SDTM | ADaM | ODM-XML, Define-XML |
| Product Impact on Population Health | | | ADaM | Define-XML |
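The use-case table can be read as a lookup from use case to the standards applied at each activity in the data life cycle. The following is a minimal Python sketch of that lookup; the dictionary layout and the `standards_for` helper are illustrative assumptions, not part of the TIG. An empty list means the table lists no standard for that activity.

```python
# Hypothetical encoding of the TIG use-case table, in life-cycle order.
TIG_STANDARDS = {
    "Product Description": {
        "Collection": [],
        "Tabulation": ["SDTM"],
        "Analysis": ["ADaM"],
        "Data Exchange": ["Define-XML"],
    },
    "Nonclinical": {
        "Collection": [],
        "Tabulation": ["SDTM"],
        "Analysis": [],
        "Data Exchange": ["Define-XML"],
    },
    "Product Impact on Individual Health": {
        "Collection": ["CDASH Model"],
        "Tabulation": ["SDTM"],
        "Analysis": ["ADaM"],
        "Data Exchange": ["ODM-XML", "Define-XML"],
    },
    "Product Impact on Population Health": {
        "Collection": [],
        "Tabulation": [],
        "Analysis": ["ADaM"],
        "Data Exchange": ["Define-XML"],
    },
}


def standards_for(use_case):
    """Return every standard listed for a use case, in life-cycle order."""
    activities = TIG_STANDARDS[use_case]
    return [std for stds in activities.values() for std in stds]


print(standards_for("Nonclinical"))  # ['SDTM', 'Define-XML']
```

Keeping empty lists for activities with no listed standard preserves the table's shape, so every use case can be queried for every activity.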

Determining Where Data Belong

Standards for collection, tabulation, and analysis are

All models implemented as part of this guide collect and represent data by common topic:

  • CDASH and SDTM group logically related data points into domains; and
  • ADaM dataset design can be customized to meet analysis requirements.


The terms "domain" and "dataset" are commonly used in CDISC nomenclature and appear frequently in the Study Data Tabulation Model (SDTM). For example, SDTM v1.8 includes 134 instances of "domain" and states, "A collection of observations on a particular topic is considered a domain." The model includes 78 instances of "dataset," and certain structures in the model are called "datasets" rather than "domains." Is there a difference between a domain and a dataset?

The CDISC Glossary defines these terms as follows:

  • Domain: A collection of logically related observations with a common, specific topic that are normally collected for all subjects in a clinical investigation. NOTE: The logic of the relationship may pertain to the scientific subject matter of the data or to its role in the trial. Example domains include laboratory test results (LB), adverse events (AE), concomitant medications (CM). [After SDTM Implementation Guide version 3.2, CDISC.org] See also general observation class.
  • Dataset: A collection of structured data in a single file. [CDISC, ODM, and SDS] Compare to analysis dataset, tabulation dataset.

In plainer terms, a domain is a grouping of related observations, while a dataset is the data structure that holds that grouping of observations. Domains and datasets share the same naming conventions, which is why the two terms are often confused.

The distinction between domain and dataset is most clearly seen in cases where a general observation class domain is split into multiple datasets in a submission. Common examples are splitting the Laboratory Test Results (LB) domain due to size, splitting the Questionnaires (QS) domain by questionnaire, and splitting the Findings About Events or Interventions (FA) domain by parent domain.
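To illustrate the distinction, the following minimal Python sketch splits the records of one Questionnaires (QS) domain into one dataset per questionnaire. It assumes the SDTMIG convention that a split dataset's name begins with the 2-character domain code while the DOMAIN variable keeps that code unchanged. The record values, questionnaire names, and 2-character suffixes (`QA`, `QB`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical QS records; QSCAT identifies the questionnaire.
qs_records = [
    {"DOMAIN": "QS", "USUBJID": "001", "QSSEQ": 1, "QSCAT": "QUESTIONNAIRE A"},
    {"DOMAIN": "QS", "USUBJID": "001", "QSSEQ": 2, "QSCAT": "QUESTIONNAIRE B"},
    {"DOMAIN": "QS", "USUBJID": "002", "QSSEQ": 1, "QSCAT": "QUESTIONNAIRE A"},
]

# Hypothetical sponsor-chosen suffixes, one per split dataset.
SUFFIXES = {"QUESTIONNAIRE A": "QA", "QUESTIONNAIRE B": "QB"}


def split_domain(records, split_var, suffixes):
    """Group one domain's records into datasets named <DOMAIN><suffix>."""
    datasets = defaultdict(list)
    for rec in records:
        name = rec["DOMAIN"] + suffixes[rec[split_var]]
        datasets[name].append(rec)  # the DOMAIN value itself is left unchanged
    return dict(datasets)


split = split_domain(qs_records, "QSCAT", SUFFIXES)
print(sorted(split))                                            # ['QSQA', 'QSQB']
print({r["DOMAIN"] for recs in split.values() for r in recs})  # {'QS'}
```

One domain (QS) ends up in two datasets (QSQA and QSQB), which is exactly the case where "domain" and "dataset" stop being interchangeable. The sketch also keeps QSSEQ unique per subject across both split datasets, so records remain identifiable after splitting.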

However, because in most cases there is a one-to-one relationship between a conceptual domain and the dataset based on it, the words are used interchangeably in the standards and, therefore, by most users. The structures called "relationship datasets" were given that name because they are mechanisms for connecting information represented in different datasets, rather than observations about study subjects. Note that none of the relationship datasets includes the variable DOMAIN. However, in a submission these datasets still need dataset names, and the character strings used in those names are included in the CDISC Codelist called "SDTM Domain Abbreviations."

In conclusion, there is a clear distinction between the meanings of "domain" and "dataset," but because both terms follow the same naming conventions, in many cases they can be considered interchangeable.


Domains


SDTM

Observations about study subjects are normally collected for all subjects in a series of domains. A domain is defined as a collection of logically related observations with a common topic. The logic of the relationship may pertain to the scientific subject matter of the data or to its role in the trial. Each domain is normally represented by a single dataset; split domains, described above, are the exception.

Each domain dataset is distinguished by a unique, 2-character code that should be used consistently throughout the submission. This code, which is stored in the SDTM variable named DOMAIN, is used in 4 ways: as the dataset name, as the value of the DOMAIN variable in that dataset, as a prefix for most variable names in that dataset, and as a value in the RDOMAIN variable in relationship datasets (see Section 8, Representing Relationships and Data).
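The 4 uses of the domain code can be checked mechanically. The following minimal Python sketch builds a toy Adverse Events (AE) dataset and one RELREC row, then verifies that the code agrees across all 4 uses. The variable list is deliberately partial, the record values are invented, and the `check_code_usage` helper is a hypothetical illustration rather than a CDISC conformance rule.

```python
DOMAIN_CODE = "AE"

# Identifier variables that do not take the domain prefix.
UNPREFIXED = ["STUDYID", "DOMAIN", "USUBJID"]
# Hypothetical, partial set of variable roots that do take the prefix (--SEQ, --TERM, --STDTC).
PREFIXED_ROOTS = ["SEQ", "TERM", "STDTC"]


def domain_variables(code):
    """Use 3: the code prefixes most variable names in the dataset."""
    return UNPREFIXED + [code + root for root in PREFIXED_ROOTS]


dataset = {
    "name": DOMAIN_CODE,  # use 1: dataset name
    "variables": domain_variables(DOMAIN_CODE),
    "rows": [{"STUDYID": "STUDY01", "DOMAIN": DOMAIN_CODE,  # use 2: DOMAIN value
              "USUBJID": "STUDY01-001", "AESEQ": 1,
              "AETERM": "HEADACHE", "AESTDTC": "2024-01-15"}],
}

# Use 4: a RELREC row points back to the dataset through RDOMAIN;
# RELREC itself has no DOMAIN variable.
relrec_row = {"STUDYID": "STUDY01", "RDOMAIN": DOMAIN_CODE,
              "USUBJID": "STUDY01-001", "IDVAR": "AESEQ", "IDVARVAL": "1",
              "RELTYPE": "", "RELID": "1"}


def check_code_usage(ds, rel):
    """Return True if the dataset name, DOMAIN values, prefixes, and RDOMAIN agree."""
    code = ds["name"]
    return (
        all(row["DOMAIN"] == code for row in ds["rows"])
        and all(v.startswith(code) for v in ds["variables"] if v not in UNPREFIXED)
        and rel["RDOMAIN"] == code
        and "DOMAIN" not in rel
    )


print(check_code_usage(dataset, relrec_row))  # True
```

Changing RDOMAIN in the RELREC row to another code (for example "CM") makes the check return False, which is the kind of inconsistency the "use consistently throughout the submission" guidance is meant to prevent.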



SEND

Aside from a limited number of special-purpose domains, all subject-level SDTM datasets are based on 1 of the 3 general observation classes. When faced with a set of collected data that "go together" in some sense, the first step is to identify the SDTM observations within the data and the general observation class of each observation. Once these observations are identified at a high level, 2 other tasks remain:

  • representing the relationships between observations; and
  • placing individual data items in the appropriate datasets and variables.

In practice, considering the representation of relationships and placing individual data items may lead to reconsidering the identification of observations, so the whole process may require several iterations.
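As a first-pass aid when identifying the general observation class of each observation, standard domain codes can be mapped to their class. The Python sketch below covers only a handful of standard SDTM domains and is an illustrative assumption, not a complete list. Codes it does not recognize, such as the hypothetical custom domain "XP", are flagged for manual review rather than guessed, because placing data in a new domain still requires the judgment described above.

```python
# Partial, illustrative mapping of standard SDTM domain codes to their class.
DOMAIN_CLASS = {
    # Interventions
    "CM": "Interventions", "EX": "Interventions", "SU": "Interventions", "PR": "Interventions",
    # Events
    "AE": "Events", "MH": "Events", "DS": "Events", "CE": "Events",
    # Findings (FA is Findings About Events or Interventions)
    "LB": "Findings", "VS": "Findings", "QS": "Findings", "EG": "Findings", "FA": "Findings",
    # Special-purpose domains are not based on a general observation class
    "DM": "Special-Purpose", "CO": "Special-Purpose", "SE": "Special-Purpose", "SV": "Special-Purpose",
}


def classify(domain_codes):
    """Map each domain code to its class; unknown codes are flagged for review."""
    return {code: DOMAIN_CLASS.get(code, "REVIEW: not a known standard domain")
            for code in domain_codes}


planned = ["DM", "SU", "AE", "VS", "XP"]  # "XP" is a hypothetical custom domain
for code, cls in classify(planned).items():
    print(code, cls)
```

Flagging rather than guessing reflects the iterative nature of the process: an unrecognized code is the point where observations, relationships, and data items may need to be reconsidered.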
