
Grouping variables are a subset of Identifier and Grouping Qualifier variables used to group records in datasets. The hierarchy of grouping variables is by study, domain, across subjects, by subject, and for a subject. When used per this guidance, grouping variables and their values adhere to the following hierarchy:

...

Hierarchy of Grouping Variables

STUDYID
	DOMAIN
		--CAT
			--SCAT
				USUBJID
					--GRPID
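One way to picture this hierarchy is as a composite grouping key, where each additional variable subsets the groups formed by the variables above it. The sketch below is illustrative only: the records, their values, and the helper name `group_by_hierarchy` are hypothetical, and the generic `--CAT`, `--SCAT`, and `--GRPID` names are assumed to resolve to `CMCAT`, `CMSCAT`, and `CMGRPID` for a CM dataset.

```python
from collections import defaultdict

# Hierarchy order as listed above, with "--" resolved to the domain code.
# Assumption for this sketch: all records belong to the CM domain.
HIERARCHY = ["STUDYID", "DOMAIN", "CMCAT", "CMSCAT", "USUBJID", "CMGRPID"]


def make_row(usubjid, grpid, trt):
    """Build one hypothetical CM record; every value is made up."""
    return {
        "STUDYID": "S1", "DOMAIN": "CM", "CMCAT": "GENERAL", "CMSCAT": "",
        "USUBJID": usubjid, "CMGRPID": grpid, "CMTRT": trt,
    }


records = [
    make_row("S1-001", "A", "ASPIRIN"),
    make_row("S1-001", "A", "IBUPROFEN"),
    make_row("S1-001", "B", "OXYGEN"),
    make_row("S1-002", "A", "ASPIRIN"),
]


def group_by_hierarchy(rows, depth):
    """Group rows by the first `depth` variables of HIERARCHY."""
    keys = HIERARCHY[:depth]
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)


by_domain = group_by_hierarchy(records, 2)   # STUDYID, DOMAIN
by_subject = group_by_hierarchy(records, 5)  # ... down to USUBJID
by_grpid = group_by_hierarchy(records, 6)    # ... down to --GRPID
```

Each deeper level can only split an existing group, never merge groups from different parents, which is the sense in which the variables form a hierarchy.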

...

STUDYID
DOMAIN
--CAT
--SCAT
USUBJID
--GRPID
--REFID

How Grouping Variables Group Data

...


The following are expectations for how values in these variables will group records given their hierarchy.

...

Variables	Record Grouping	Purpose of Grouping
STUDYID	By study	All records with the same STUDYID value are a group of records that describe that study.
DOMAIN	By domain	All records with the same DOMAIN value are a group of records that describe that domain.
--CAT, --SCAT	Across subjects	--CAT and --SCAT values subset groups of records within a domain and apply to all subjects within the domain.

...

--GRPID values further group (subset) records within USUBJID. Unlike --CAT and --SCAT, --GRPID values are not intended to have any meaning across subjects and they are usually assigned during or after data collection.
--SPID and --REFID are identifier variables and are usually not considered to be grouping variables, although they may have meaning across domains.

Differences Between Grouping Variables

The primary distinction between --CAT/--SCAT and --REFID is:

--CAT/--SCAT are usually textual descriptions of the data designed into the collection vehicle/process, whereas --REFID is usually a tracking number or other value assigned to an object being tracked (e.g., a blood sample).


For SDTM

Variables	Record Grouping	Purpose of Grouping
USUBJID	By subject	All records with the same USUBJID value are a group of records that describe that subject.
--GRPID	For subjects	All records in the same domain with the same --GRPID value are a group of records within USUBJID.

...

How Grouping Variables Group Data

--CAT (Category) and --SCAT (Sub-category) values further subset groups within the domain. Generally, --CAT/--SCAT values have meaning within a particular domain. However, it is possible to use the same values for --CAT/--SCAT in related domains (e.g., MH and AE). When values are used across domains, the meanings should be the same. Examples of where --CAT/--SCAT may have meaning across domains/datasets include:
Cases where different domains in the same general observation class contain similar conceptual information. Adverse Events (AE), Medical History (MH), and Clinical Events (CE), for example, are conceptually the same data, the only differences being when the event started relative to the study start and whether the event is considered a regulatory-reportable adverse event in the study. Neurotoxicities collected in oncology trials both as separate Medical History CRFs (MH domain) and Adverse Event CRFs (AE domain) could both identify/collect "Paresthesia of the left arm". In both domains, the --CAT variable could have the value of "NEUROTOXICITY".
Cases where multiple datasets are necessary to capture data about the same topic. Following the oncology example, the existence and start and stop date of paresthesia of the left arm may be reported as an adverse event (AE domain), whereas the severity of the event is captured at multiple visits and recorded as Findings About (FA dataset). In both cases the --CAT variable could have a value of "NEUROTOXICITY".
Cases where multiple domains are necessary to capture data that were collected together and have an implicit relationship, perhaps identified in the Related Records (RELREC) special-purpose dataset.
Stress-test data collection may capture the following:
Information about the occurrence, start, stop, and duration of the test (in the Procedures (PR) domain)
Vital Signs recorded during the stress test (VS domain)
Treatments (e.g., oxygen) administered during the stress test (in an Interventions domain)
In such cases, the data collected during the stress test and recorded in 3 separate domains may all have the same --CAT/--SCAT value (e.g., "STRESS TEST"), identifying that the data were collected during the stress test.
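The stress-test case can be sketched as a lookup across domains on a shared category value. This is a minimal illustration, not a prescribed method: the records and values are hypothetical, CM is assumed as the Interventions domain, and the helper names `cat_variable` and `records_in_category` are invented for this example. The only convention relied on is that the generic `--CAT` name takes the two-character domain code as its prefix.

```python
def cat_variable(domain):
    """Resolve the generic --CAT name for a two-character domain code."""
    return f"{domain}CAT"


# Hypothetical records from three domains; all values are illustrative.
datasets = {
    "PR": [{"USUBJID": "001", "PRTRT": "STRESS TEST", "PRCAT": "STRESS TEST"}],
    "VS": [
        {"USUBJID": "001", "VSTESTCD": "HR", "VSCAT": "STRESS TEST"},
        {"USUBJID": "001", "VSTESTCD": "HR", "VSCAT": "SCREENING"},
    ],
    "CM": [{"USUBJID": "001", "CMTRT": "OXYGEN", "CMCAT": "STRESS TEST"}],
}


def records_in_category(data, category):
    """Collect (domain, record) pairs whose --CAT value equals `category`."""
    matches = []
    for domain, rows in data.items():
        var = cat_variable(domain)
        matches.extend((domain, row) for row in rows if row.get(var) == category)
    return matches


stress_test = records_in_category(datasets, "STRESS TEST")
```

This only works because the category value carries the same meaning in all three domains, which is the condition stated above for reusing --CAT/--SCAT values across domains.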

Within subjects (records with the same USUBJID values)

...

The optional grouping identifier variable --GRPID may be used in all domains based on the general observation classes. --GRPID identifies relationships between records within a USUBJID within a single domain and has no inherent meaning across subjects or across domains. Relationships between observations are defined by assigning the same unique character value to the --GRPID variable for sets of related observations. The values used for --GRPID can be any values the applicant chooses.
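Because --GRPID has no meaning across subjects, related sets are identified by the pair of USUBJID and --GRPID rather than by --GRPID alone. The sketch below assumes a CM dataset with hypothetical records; the helper name `related_sets` and all values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical CM records. CMGRPID "1" appears for two subjects, but the
# shared value does not relate subject 001's records to subject 002's.
cm = [
    {"USUBJID": "001", "CMSEQ": 1, "CMTRT": "DRUG A", "CMGRPID": "1"},
    {"USUBJID": "001", "CMSEQ": 2, "CMTRT": "DRUG B", "CMGRPID": "1"},
    {"USUBJID": "001", "CMSEQ": 3, "CMTRT": "DRUG C", "CMGRPID": ""},
    {"USUBJID": "002", "CMSEQ": 1, "CMTRT": "DRUG D", "CMGRPID": "1"},
]


def related_sets(rows, grpid_var, seq_var):
    """Map (USUBJID, --GRPID) to the sequence numbers of the grouped records."""
    groups = defaultdict(list)
    for row in rows:
        if row[grpid_var]:  # records with no --GRPID value stay ungrouped
            groups[(row["USUBJID"], row[grpid_var])].append(row[seq_var])
    return dict(groups)


groups = related_sets(cm, "CMGRPID", "CMSEQ")
```

Keying on the subject as well as the group value keeps the two subjects' groups separate even though both use the value "1".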

--GRPID values are not intended to have any meaning across subjects and are usually assigned during or after data collection.

Although --SPID and --REFID are Identifier variables, they may sometimes be used as grouping variables and may also have meaning across domains.

--LNKID and --LNKGRP express values that are used to link records in separate domains. As such, these variables are often used in IDVAR in a RELREC relationship when there is a dataset-to-dataset relationship.

--LNKID is a grouping identifier used to identify a record in one domain that is related to records in another domain, often forming a one-to-many relationship.
--LNKGRP is a grouping identifier used to identify a group of records in one domain that is related to a record in another domain, often forming a many-to-one relationship.
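The one-to-many case for --LNKID can be sketched as a keyed lookup between two datasets. This is an illustration under stated assumptions: the TU and TR domains are used as the example pair, the records and link values are hypothetical, and the helper name `link_one_to_many` is invented here. The lookup is keyed on USUBJID plus the link value, on the assumption that link values are assigned per subject.

```python
from collections import defaultdict

# Hypothetical parent records (one per identified item).
tu = [
    {"USUBJID": "001", "TUSEQ": 1, "TULNKID": "T01"},
    {"USUBJID": "001", "TUSEQ": 2, "TULNKID": "T02"},
]
# Hypothetical child records (possibly several per parent).
tr = [
    {"USUBJID": "001", "TRSEQ": 1, "TRLNKID": "T01", "VISITNUM": 1},
    {"USUBJID": "001", "TRSEQ": 2, "TRLNKID": "T01", "VISITNUM": 2},
    {"USUBJID": "001", "TRSEQ": 3, "TRLNKID": "T02", "VISITNUM": 1},
]


def link_one_to_many(parents, children, parent_var, child_var):
    """Map each parent's (USUBJID, link value) to its matching child records."""
    index = defaultdict(list)
    for child in children:
        index[(child["USUBJID"], child[child_var])].append(child)
    return {
        (p["USUBJID"], p[parent_var]): index.get((p["USUBJID"], p[parent_var]), [])
        for p in parents
    }


links = link_one_to_many(tu, tr, "TULNKID", "TRLNKID")
```

A many-to-one relationship expressed with --LNKGRP could be resolved the same way with the roles of the two datasets reversed.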

Differences Between Grouping Variables

The primary distinctions between --CAT/--SCAT and --GRPID are:

--CAT/--SCAT are known (identified) about the data before it is collected.
--CAT/--SCAT values group data across subjects.
--CAT/--SCAT may have some controlled terminology.
--GRPID is usually assigned during or after data collection at the discretion of the sponsor.
--GRPID groups data only within a subject.
--GRPID values are sponsor-defined, and will not be subject to controlled terminology.

Therefore, data that would be the same across subjects is usually more appropriate in --CAT/--SCAT, and data that would vary across subjects is usually more appropriate in --GRPID. For example, a concomitant medication administered as part of a known combination therapy for all subjects (e.g., "Mayo Clinic Regimen") would more appropriately use --CAT/--SCAT to identify the medication as part of that regimen. Groups of medications recorded on a Serious Adverse Event (SAE) form as treatments for the SAE would more appropriately use --GRPID because groupings are likely to differ across subjects.

Using --GRPID in the general observation class domains can reduce the number of records in the RELREC, SUPP--, and CO datasets when those datasets represent relationships or associations between records or values and a "group" of general observation class records.
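The record-count effect can be sketched for RELREC. The example below relates three concomitant medication records to one adverse event, first by individual sequence number and then by a shared group identifier. The column layout reflects a common understanding of RELREC and should be treated as an assumption here; the helper name `relrec_rows` and all values are hypothetical.

```python
def relrec_rows(study, usubjid, relid, members):
    """Build RELREC-style rows; `members` holds (RDOMAIN, IDVAR, IDVARVAL)."""
    return [
        {
            "STUDYID": study, "RDOMAIN": rdomain, "USUBJID": usubjid,
            "IDVAR": idvar, "IDVARVAL": idvarval, "RELTYPE": "", "RELID": relid,
        }
        for rdomain, idvar, idvarval in members
    ]


# Without --GRPID: one row per related CM record, identified by CMSEQ.
by_seq = relrec_rows("S1", "001", "1", [
    ("AE", "AESEQ", "5"),
    ("CM", "CMSEQ", "11"),
    ("CM", "CMSEQ", "12"),
    ("CM", "CMSEQ", "13"),
])

# With --GRPID: the three CM records share one CMGRPID value, so a single
# row identifies the whole group.
by_grpid = relrec_rows("S1", "001", "1", [
    ("AE", "AESEQ", "5"),
    ("CM", "CMGRPID", "SAE1"),
])
```

In this sketch the grouped form needs two rows instead of four, and the saving grows with the number of records in the group.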


In domains based on the Findings general observation class, the --RESCAT variable can be used to categorize results after the fact. --CAT and --SCAT, by contrast, are generally defined by the sponsor or used by the investigator at the point of collection, not after assessing the value of Findings results.
