Grouping Records with Values

Grouping variables are a subset of Identifier and Grouping Qualifier variables used to group records in datasets by study, by domain, across subjects, by subject, and within a subject. When used per this guidance, grouping variables and their values adhere to the hierarchy below.

Hierarchy of Grouping Variables
STUDYID
	DOMAIN
		--CAT
			--SCAT
				USUBJID
					--GRPID

The following table describes how the values of each grouping variable are expected to group data, given the variable's place in the hierarchy.

Variables	Record Grouping	Purpose of Grouping
STUDYID	By Study	All records with the same STUDYID value are a group of records that describe that study.
DOMAIN	By Domain	All records with the same DOMAIN value are a group of records that describe that domain.
--CAT and --SCAT	Across Subjects	--CAT and --SCAT values subset groups of tests within a domain and apply to all subjects within the domain.
USUBJID	By Subject	All records with the same USUBJID value are a group of records that describe that subject.
--GRPID	Within a Subject	All records in the same domain with the same --GRPID value are a group of records within a single USUBJID.
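The rows of the table above can be read as grouping keys of increasing specificity. The following is a minimal sketch, assuming records are plain Python dicts keyed by SDTM variable name; the helper names (`qualifier`, `group_records`, `LEVELS`) and all record values are hypothetical and only illustrate which variables make up the key at each level.

```python
from collections import defaultdict


def qualifier(record, suffix):
    """Return a domain-prefixed variable, e.g. suffix "CAT" reads LBCAT in LB.

    Missing variables are treated as an empty string.
    """
    return record.get(record["DOMAIN"] + suffix, "")


def group_records(records, key_func):
    """Group records into a dict keyed by whatever key_func returns."""
    groups = defaultdict(list)
    for record in records:
        groups[key_func(record)].append(record)
    return dict(groups)


# One key function per row of the table above.
LEVELS = {
    "by study": lambda r: (r["STUDYID"],),
    "by domain": lambda r: (r["STUDYID"], r["DOMAIN"]),
    "across subjects": lambda r: (r["STUDYID"], r["DOMAIN"],
                                  qualifier(r, "CAT"), qualifier(r, "SCAT")),
    "by subject": lambda r: (r["USUBJID"],),
    # --GRPID only has meaning inside one subject and one domain.
    # Records without a --GRPID value fall into an empty-string group.
    "within a subject": lambda r: (r["DOMAIN"], r["USUBJID"], qualifier(r, "GRPID")),
}

# Hypothetical records, for illustration only.
records = [
    {"STUDYID": "S1", "DOMAIN": "LB", "USUBJID": "S1-001", "LBTEST": "SODIUM",
     "LBCAT": "CHEMISTRY", "LBSCAT": "ELECTROLYTES"},
    {"STUDYID": "S1", "DOMAIN": "LB", "USUBJID": "S1-002", "LBTEST": "SODIUM",
     "LBCAT": "CHEMISTRY", "LBSCAT": "ELECTROLYTES"},
    {"STUDYID": "S1", "DOMAIN": "CM", "USUBJID": "S1-001", "CMTRT": "ASPIRIN",
     "CMGRPID": "1"},
]

for level, key_func in LEVELS.items():
    print(level, sorted(group_records(records, key_func)))
```

Note that the "across subjects" key omits USUBJID, so the two sodium records from different subjects land in one group, while the "within a subject" key always includes USUBJID.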

How Grouping Variables Group Data

Across subjects (records with different USUBJID values)
--CAT and --SCAT values further subset groups of tests within a domain and are not redundant with the domain or dictionary classification provided by --DECOD and --BODSYS. Generally, --CAT/--SCAT values have meaning within a particular domain and apply to all subjects within that domain. For example, a lab record with LBTEST = "SODIUM" might have LBCAT = "CHEMISTRY" and LBSCAT = "ELECTROLYTES".
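To illustrate that a --CAT/--SCAT pair applies regardless of subject, here is a small sketch that selects LB records by LBCAT and, optionally, LBSCAT. The `subset` helper and the record values are hypothetical; only the variable names LBTEST, LBCAT, LBSCAT, and USUBJID come from the text.

```python
# Hypothetical LB records from two subjects; values are illustrative only.
lb = [
    {"USUBJID": "S1-001", "LBTEST": "SODIUM", "LBCAT": "CHEMISTRY", "LBSCAT": "ELECTROLYTES"},
    {"USUBJID": "S1-001", "LBTEST": "POTASSIUM", "LBCAT": "CHEMISTRY", "LBSCAT": "ELECTROLYTES"},
    {"USUBJID": "S1-002", "LBTEST": "SODIUM", "LBCAT": "CHEMISTRY", "LBSCAT": "ELECTROLYTES"},
    {"USUBJID": "S1-002", "LBTEST": "HEMOGLOBIN", "LBCAT": "HEMATOLOGY", "LBSCAT": ""},
]


def subset(records, cat, scat=None):
    """Select records by LBCAT (and optionally LBSCAT), whatever the subject."""
    return [
        r for r in records
        if r["LBCAT"] == cat and (scat is None or r["LBSCAT"] == scat)
    ]


electrolytes = subset(lb, "CHEMISTRY", "ELECTROLYTES")
# The same category/subcategory pair picks out records from every subject.
print(sorted({r["USUBJID"] for r in electrolytes}))
```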

Within subjects (records with the same USUBJID values)

--GRPID values further group (subset) records within USUBJID. Unlike --CAT and --SCAT, --GRPID values are not intended to have any meaning across subjects, and they are usually assigned during or after data collection.

Although --SPID and --REFID are Identifier variables, they are usually not considered grouping variables; they may, however, have meaning across domains.
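Because --GRPID values carry no meaning across subjects, a grouping key built from --GRPID alone could merge unrelated records. A minimal sketch, assuming hypothetical CM records in which two subjects happen to reuse CMGRPID "1"; the helper `groups_within_subject` is illustrative, not part of any standard.

```python
from collections import defaultdict

# Hypothetical CM records. Both subjects use CMGRPID "1", but the two groups
# are unrelated, so USUBJID has to be part of the grouping key.
cm = [
    {"USUBJID": "S1-001", "CMTRT": "PARACETAMOL", "CMGRPID": "1"},
    {"USUBJID": "S1-001", "CMTRT": "ONDANSETRON", "CMGRPID": "1"},
    {"USUBJID": "S1-002", "CMTRT": "IBUPROFEN", "CMGRPID": "1"},
    {"USUBJID": "S1-002", "CMTRT": "OMEPRAZOLE", "CMGRPID": ""},
]


def groups_within_subject(records, grpid_var):
    """Group records by (USUBJID, --GRPID); records with no --GRPID are skipped."""
    groups = defaultdict(list)
    for record in records:
        if record[grpid_var]:
            groups[(record["USUBJID"], record[grpid_var])].append(record)
    return dict(groups)


for key, members in groups_within_subject(cm, "CMGRPID").items():
    print(key, [m["CMTRT"] for m in members])
```

Keying on CMGRPID alone would put IBUPROFEN into the same group as the first subject's medications, which is exactly what the within-subject scope of --GRPID rules out.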

Differences Between Grouping Variables

The primary distinctions between --CAT/--SCAT and --GRPID are:

--CAT/--SCAT values are known (identified) before the data are collected.
--CAT/--SCAT values group data across subjects.
--CAT/--SCAT may have some controlled terminology.
--GRPID is usually assigned during or after data collection at the discretion of the sponsor.
--GRPID groups data only within a subject.
--GRPID values are sponsor-defined and are not subject to controlled terminology.

Therefore, data that would be the same across subjects is usually more appropriate in --CAT/--SCAT, and data that would vary across subjects is usually more appropriate in --GRPID.

The primary distinctions between --CAT/--SCAT and --REFID are:

--CAT/--SCAT are usually textual descriptions of the data designed into the collection vehicle or process, whereas --REFID is usually a tracking number or value of some type assigned to an object being tracked (e.g., a blood sample).

In domains based on the Findings general observation class, the --RESCAT variable can be used to categorize results after the fact. --CAT and --SCAT, by contrast, are generally predefined or used at the point of collection, not assigned after the Findings results have been assessed.
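The difference in timing can be sketched as a derivation step: --CAT is already on the collected record, while --RESCAT can only be computed once the result exists. This is a minimal sketch under stated assumptions: "XX" is a hypothetical domain prefix, and the TITER test, SEROLOGY category, POSITIVE/NEGATIVE categories, and cutoff are all illustrative values, not taken from any standard.

```python
# Illustrative cutoff; not taken from any standard or guidance.
CUTOFF = 10.0


def derive_rescat(record, prefix, cutoff=CUTOFF):
    """Return a copy of a Findings record with <prefix>RESCAT derived from
    <prefix>STRESN. The --CAT value is left exactly as collected."""
    category = "POSITIVE" if record[prefix + "STRESN"] >= cutoff else "NEGATIVE"
    return {**record, prefix + "RESCAT": category}


# "XX" is a hypothetical domain prefix; XXCAT was fixed on the CRF before collection.
collected = {"USUBJID": "S1-001", "XXTESTCD": "TITER", "XXCAT": "SEROLOGY",
             "XXSTRESN": 12.5}
derived = derive_rescat(collected, "XX")
print(derived["XXCAT"], derived["XXRESCAT"])
```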

For SDTM

Hierarchy of Grouping Variables

STUDYID DOMAIN
	--CAT
		--SCAT
			USUBJID
				--GRPID --LNKID --LNKGRP

How Grouping Variables Group Data

--CAT (Category) and --SCAT (Sub-category) values further subset groups within the domain. Generally, --CAT/--SCAT values have meaning within a particular domain. However, it is possible to use the same values for --CAT/--SCAT in related domains (e.g., MH and AE). When values are used across domains, the meanings should be the same. Examples of where --CAT/--SCAT may have meaning across domains/datasets include:
1. Cases where different domains in the same general observation class contain similar conceptual information. For example, Adverse Events (AE), Medical History (MH), and Clinical Events (CE) are conceptually the same type of data; the only differences are when the event started relative to the start of the study and whether the event is considered a regulatory-reportable adverse event in the study. In oncology trials, neurotoxicities may be collected on both Medical History CRFs (MH domain) and Adverse Event CRFs (AE domain), and both could capture "Paresthesia of the left arm". In both domains, the --CAT variable could have the value "NEUROTOXICITY".
2. Cases where multiple datasets are necessary to capture data about the same topic. Continuing the oncology example, the occurrence and the start and stop dates of paresthesia of the left arm may be reported as an adverse event (AE domain), whereas the severity of the event is assessed at multiple visits and recorded in the Findings About (FA) dataset. In both datasets, the --CAT variable could have the value "NEUROTOXICITY".
3. Cases where multiple domains are necessary to capture data that were collected together and have an implicit relationship, perhaps identified in the Related Records (RELREC) special-purpose dataset.
  Stress-test data collection may capture the following:
  1. Information about the occurrence, start, stop, and duration of the test (in the Procedures (PR) domain)
  2. Vital Signs recorded during the stress test (VS domain)
  3. Treatments (e.g., oxygen) administered during the stress test (in an Interventions domain)
  In such cases, the stress-test data recorded in the 3 separate domains may all have --CAT/--SCAT values (e.g., "STRESS TEST") identifying them as data collected during the stress test.
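When the same --CAT value is used consistently across domains, the related records can be pulled together by that value. A minimal sketch, assuming each domain is a list of dicts whose category variable uses the domain prefix (PRCAT, VSCAT, CMCAT); using CM to stand in for "an Interventions domain" is an assumption, and all values are hypothetical.

```python
# Hypothetical stress-test records for one subject, split across three domains.
# CM stands in for "an Interventions domain"; that choice is an assumption.
datasets = {
    "PR": [{"USUBJID": "S1-001", "PRTRT": "EXERCISE STRESS TEST", "PRCAT": "STRESS TEST"}],
    "VS": [
        {"USUBJID": "S1-001", "VSTESTCD": "HR", "VSCAT": "STRESS TEST"},
        # A routine vital sign, not collected during the stress test.
        {"USUBJID": "S1-001", "VSTESTCD": "HR", "VSCAT": ""},
    ],
    "CM": [{"USUBJID": "S1-001", "CMTRT": "OXYGEN", "CMCAT": "STRESS TEST"}],
}


def records_with_category(datasets, category):
    """Return (domain, record) pairs whose --CAT equals category, across all domains."""
    return [
        (domain, record)
        for domain, records in datasets.items()
        for record in records
        if record.get(domain + "CAT") == category
    ]


for domain, record in records_with_category(datasets, "STRESS TEST"):
    print(domain, record)
```

This only works if the value means the same thing in every domain, which is why the text asks that shared --CAT/--SCAT values keep the same meaning across domains.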

Within subjects (records with the same USUBJID values)

--GRPID values further group (subset) records within USUBJID. All records in the same domain with the same --GRPID value are a group of records within USUBJID. Unlike --CAT and --SCAT, --GRPID values are not intended to have any meaning across subjects and are usually assigned during or after data collection.

Although --SPID and --REFID are Identifier variables, they may sometimes be used as grouping variables and may also have meaning across domains.

--LNKID and --LNKGRP hold values that are used to link records in separate domains. As such, these variables are often used as the IDVAR in RELREC when there is a dataset-to-dataset relationship.

--LNKID is a grouping identifier used to identify a record in one domain that is related to records in another domain, often forming a one-to-many relationship.
--LNKGRP is a grouping identifier used to identify a group of records in one domain that is related to a record in another domain, often forming a many-to-one relationship.
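The one-to-many (--LNKID) and many-to-one (--LNKGRP) patterns can be sketched with a dataset-level RELREC table driving the joins. This sketch assumes the oncology domains TU, TR, and RS as the illustration (they are not named in the text above); the RELREC rows are simplified (STUDYID omitted, USUBJID and IDVARVAL left blank as for dataset-to-dataset relationships), and the helpers `idvars` and `link` and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical oncology records for one subject; values are illustrative only.
tu = [
    {"USUBJID": "S1-001", "TULNKID": "T01", "TULOC": "LIVER"},
    {"USUBJID": "S1-001", "TULNKID": "T02", "TULOC": "LUNG"},
]
tr = [
    {"USUBJID": "S1-001", "TRLNKID": "T01", "TRLNKGRP": "V1", "TRSTRESN": 20},
    {"USUBJID": "S1-001", "TRLNKID": "T02", "TRLNKGRP": "V1", "TRSTRESN": 10},
    {"USUBJID": "S1-001", "TRLNKID": "T01", "TRLNKGRP": "V2", "TRSTRESN": 15},
    {"USUBJID": "S1-001", "TRLNKID": "T02", "TRLNKGRP": "V2", "TRSTRESN": 8},
]
rs = [{"USUBJID": "S1-001", "RSLNKGRP": "V2", "RSSTRESC": "SD"}]

# Simplified dataset-to-dataset RELREC rows (STUDYID omitted; USUBJID and
# IDVARVAL would be blank for relationships between datasets).
relrec = [
    {"RDOMAIN": "TU", "IDVAR": "TULNKID", "RELTYPE": "ONE", "RELID": "A"},
    {"RDOMAIN": "TR", "IDVAR": "TRLNKID", "RELTYPE": "MANY", "RELID": "A"},
    {"RDOMAIN": "TR", "IDVAR": "TRLNKGRP", "RELTYPE": "MANY", "RELID": "B"},
    {"RDOMAIN": "RS", "IDVAR": "RSLNKGRP", "RELTYPE": "ONE", "RELID": "B"},
]


def idvars(relid):
    """Return {RDOMAIN: IDVAR} for one relationship in RELREC."""
    return {row["RDOMAIN"]: row["IDVAR"] for row in relrec if row["RELID"] == relid}


def link(one_side, one_var, many_side, many_var):
    """Pair each 'one' record with the 'many' records sharing its link value.

    Matching is done within USUBJID, since the related records belong to
    the same subject.
    """
    index = defaultdict(list)
    for record in many_side:
        index[(record["USUBJID"], record[many_var])].append(record)
    return [(record, index[(record["USUBJID"], record[one_var])]) for record in one_side]


a, b = idvars("A"), idvars("B")
# --LNKID: each TU record relates to many TR records (one-to-many).
tu_to_tr = link(tu, a["TU"], tr, a["TR"])
# --LNKGRP: a group of TR records relates to one RS record (many-to-one).
rs_to_tr = link(rs, b["RS"], tr, b["TR"])

for tumor, results in tu_to_tr:
    print(tumor["TULNKID"], [r["TRSTRESN"] for r in results])
for response, results in rs_to_tr:
    print(response["RSSTRESC"], [r["TRLNKID"] for r in results])
```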

Differences Between Grouping Variables

The primary distinctions between --CAT/--SCAT and --GRPID are:

--CAT/--SCAT values are known (identified) before the data are collected.
--CAT/--SCAT values group data across subjects.
--CAT/--SCAT may have some controlled terminology.
--GRPID is usually assigned during or after data collection at the discretion of the sponsor.
--GRPID groups data only within a subject.
--GRPID values are sponsor-defined and are not subject to controlled terminology.

Therefore, data that would be the same across subjects is usually more appropriate in --CAT/--SCAT, and data that would vary across subjects is usually more appropriate in --GRPID. For example, a concomitant medication administered as part of a known combination therapy for all subjects (e.g., "Mayo Clinic Regimen") would more appropriately use --CAT/--SCAT to identify the medication as part of that regimen. Groups of medications recorded on a Serious Adverse Event (SAE) form as treatments for the SAE would more appropriately use --GRPID because groupings are likely to differ across subjects.
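The rule of thumb in the paragraph above can be expressed as a rough heuristic: if a grouping has the same members for every subject, it behaves like a category; if its membership varies, it behaves like a within-subject group. This is a sketch only, not a validation rule from the guidance; the function name and the medication sets are hypothetical.

```python
def suggest_grouping_variable(members_by_subject):
    """Rough heuristic: a grouping with identical members for every subject
    fits --CAT/--SCAT; one whose members vary by subject fits --GRPID."""
    if not members_by_subject:
        raise ValueError("need at least one subject")
    distinct = {frozenset(members) for members in members_by_subject.values()}
    return "--CAT/--SCAT" if len(distinct) == 1 else "--GRPID"


# Hypothetical: a fixed combination regimen given identically to every subject.
regimen = {
    "S1-001": {"DRUG A", "DRUG B"},
    "S1-002": {"DRUG A", "DRUG B"},
}
# Hypothetical: medications recorded on SAE forms, which differ by subject.
sae_treatments = {
    "S1-001": {"PARACETAMOL", "ONDANSETRON"},
    "S1-003": {"IBUPROFEN"},
}

print(suggest_grouping_variable(regimen))
print(suggest_grouping_variable(sae_treatments))
```

With data from only one subject the heuristic trivially returns "--CAT/--SCAT", so in practice the decision rests on how the grouping is defined in the protocol and CRF, not on the observed data.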

In domains based on the Findings general observation class, the --RESCAT variable can be used to categorize results after the fact. --CAT and --SCAT, by contrast, are generally defined by the sponsor or used by the investigator at the point of collection, not after assessing the value of Findings results.
