Guidance in this section describes conventions for the population of tabulation records and variables. The section provides general conventions first, followed by conventions specific to each general observation class. When a convention is applicable to the TIG Nonclinical or Product Impact on Individual Health use case, this is denoted in the Implementation column.

The following are general conventions for variable population:

Num | Tabulation Variable Population | Implementation
1 Text Data Casing
  • Variables subject to controlled terminology will be populated with the exact value for the controlled term, including term casing.
  • Otherwise, text data will be represented in upper case (e.g., NEGATIVE).
Text strings greater than 200 characters

When text strings greater than 200 characters are collected, the following conventions for general observation class variables and SUPP-- datasets will be adhered to:

  • The first 200 characters of text should be stored in the parent domain variable and each additional 200 characters of text should be stored in a record in the SUPP-- dataset.
    • When splitting a text string into several SUPP-- records, the text should be split between words to improve readability.
    • The value of the first QNAM representing text over 200 characters will be the original domain variable name without any numeric suffix. 
    • The values for subsequent QNAMs will be sequential variable names, formed by appending a 1-digit integer, beginning with 1, to the original domain variable name. In cases where the standard domain variable name is already 8 characters in length, applicants will replace the last character with a digit when creating values for QNAM.
      • For example, for Other Action Taken in Adverse Events (AEACNOTH), the QNAM values for the SUPPAE records would be AEACNOT1, AEACNOT2, AEACNOT3, and so on.
    • The value for QLABEL should be the original domain variable label for all QNAM values.
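
The splitting and naming rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and it assumes (following the AEACNOTH example) that the first chunk stays in the parent variable while later chunks receive suffixed QNAM values starting at 1.

```python
import textwrap

MAX_LEN = 200  # character limit per value, taken from the convention above


def suffixed_qnam(variable, n):
    """Form a sequential QNAM by appending the 1-digit integer n.

    If the variable name is already 8 characters long, the last
    character is replaced by the digit instead of appending it.
    """
    if not 1 <= n <= 9:
        raise ValueError("only a 1-digit suffix is supported")
    stem = variable[:-1] if len(variable) >= 8 else variable
    return f"{stem}{n}"


def split_for_supp(variable, text, width=MAX_LEN):
    """Return (parent_value, [(qnam, qval), ...]) for one text value.

    Text is split between words. A single word longer than `width`
    is left whole, so such a chunk would still exceed the limit.
    """
    chunks = textwrap.wrap(text, width=width, break_long_words=False)
    if not chunks:
        return "", []
    parent, rest = chunks[0], chunks[1:]
    supp = [(suffixed_qnam(variable, i), chunk)
            for i, chunk in enumerate(rest, start=1)]
    return parent, supp
```

For a 499-character AEACNOTH value made of short words, this yields a parent value of at most 200 characters plus two supplemental values named AEACNOT1 and AEACNOT2. QLABEL would carry the original variable label on every supplemental record.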
2 "Yes/No" values
  • For variables where the response is "Yes" or "No", both "Y" and "N" will be populated. This eliminates confusion regarding whether a blank response indicates "N" or is a missing value.
  • Some variables are collected or derived in a manner that allows only 1 response (e.g., a single checkbox for "Yes"). In situations such as these, where it is unambiguous to populate only the response of interest, only 1 value will be populated ("Y" or "N") and the alternate value will be blank.
    • For example, it would be acceptable to use only a value of "Y" for Last Observation Before Exposure Flag (--LOBXFL) variables, where "N" is not necessary to indicate that a value is not the last observation before exposure.
3 --FOCID

  • Variable --FOCID is populated when a specific part of a subject or specimen is identified as a study-specific point of interest (e.g., injection site, biopsy site, treated site, region of the body).
  • When used, the variable serves as a cross-domain identifier for the study-specific focus of interest; any records relating to the same focus will have the same --FOCID value.
4 --SEQ, --RECID

Variables --SEQ and --RECID are populated to explicitly identify domain records in different ways. Differences in variable population are described below.

--SEQ:

  • Values uniquely identify records for subjects within a domain.
  • The relationship between records and values is not one-to-one.
    • Values may change between versions of datasets.
    • When a record is deleted, the value for the record may be reused to identify another record.
  • Variable is numeric with numeric values.

--RECID:

  • Values uniquely identify records within a domain.
  • There is a one-to-one relationship between records and values.
    • Values for records do not change between versions of datasets even when content is modified.
    • When a record is deleted, the value for the record will not be reused to identify another record.
  • Variable is character with numeric, character, or alphanumeric values.

--SEQ

Values in --SEQ will uniquely identify a record for a given USUBJID or SPTOBID within a domain. Conventions for establishing and maintaining --SEQ values are applicant-defined. Values may or may not be sequential depending on data processes and sources.
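
As an illustration of the uniqueness requirement, the following sketch lists any subject and --SEQ combinations that appear on more than one record in a domain. The function name and the dictionary-based record layout are assumptions made for this example only.

```python
from collections import Counter


def repeated_seq_values(records, seq_var, subject_var="USUBJID"):
    """Return (subject, --SEQ) pairs that occur on more than one record."""
    counts = Counter((rec[subject_var], rec[seq_var]) for rec in records)
    return sorted(pair for pair, n in counts.items() if n > 1)
```

An empty result means every --SEQ value identifies exactly one record for its subject. Gaps in the values (e.g., 1, 3, 7) are not reported, because values need not be sequential.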
5 --GRPID

The value of --GRPID is generally assigned during or after data collection at the discretion of the applicant.

6 --REFID

Values for --REFID are applicant-defined and can be any alphanumeric strings the applicant chooses, consistent with their internal practices.

7 --CAT, --SCAT
  • Values for --CAT and/or --SCAT are known (identified) before the data are collected.
  • Variable --SCAT will be populated only when there is a value in variable --CAT.
  • Values for --CAT and --SCAT will not be the domain name or dictionary classification represented in --DECOD and --BODSYS.
8 --STAT
  • In general observation class domains, --STAT will be populated with "NOT DONE" when data are not collected for the topic of the observation.

Assumptions in this section are applicable to Interventions, Events, and Findings class domains and will be used with domain-specific assumptions as appropriate.

General assumptions for the population of values in tabulation variables are provided in this section. Assumptions in this section will be followed and complement more detailed assumptions provided in Domain Specifications.


The following assumptions will be implemented for Findings class domains.

1 Result Variables (--ORRES, --STRESC, --STRESN)

  • --ORRES will be populated with the result of the measurement or finding as originally received or collected. --ORRES is an expected variable and should always be populated, except (1) when --STAT = "NOT DONE" (because there is no result for such a record) or (2) for derived records.

Note: Records with --DRVFL = "Y" may combine data collected at more than 1 visit. In such cases, sponsors must define the value for VISITNUM, addressing the correct temporal sequence. If a new record is derived for a dataset by the sponsor or their agent (e.g., a CRO), then that new record should be flagged as derived. For example, in electrocardiogram (ECG) data, if a corrected QT interval value derived in-house by the sponsor were represented in an SDTM record, then EGDRVFL would be "Y". If a corrected QT interval value was received from a vendor or was produced by the ECG machine, the derived flag would be null.

When --ORRES is populated, --STRESC must also be populated, regardless of whether the data values are character or numeric. The variable --STRESC is populated either by the conversion of values in --ORRES to values with standard units, or by the assignment of the value of --ORRES, as in the Physical Examination (PE) domain, where --STRESC could contain a dictionary-derived term. A further step is necessary when --STRESC contains numeric values: these are converted to numeric type and written to --STRESN. Because --STRESC may contain a mixture of numeric and character values, --STRESN may contain null values, as shown in the following figure.

Figure. Original to Standardized Results

When the original measurement or finding is a selection from a defined codelist, in general, the --ORRES and --STRESC variables contain results in decoded format (i.e., the textual interpretation of whichever code was selected from the codelist). In some cases where the code values in the codelist are statistically meaningful standardized values or scores, which are defined by sponsors or by valid methodologies such as SF36 questionnaires, the --ORRES variables will contain the decoded format, whereas the --STRESC variables as well as the --STRESN variables will contain the standardized values or scores.

Occasionally, data that are intended to be numeric are collected with characters attached that cause the character-to-numeric conversion to fail. For example, numeric cell counts in the source data may be specified with a greater than (>) or less than (<) sign attached (e.g., >10,000, <1). In these cases, the value with the greater than (>) or less than (<) sign attached should be moved to the --STRESC variable, and --STRESN should be null. The rules for modifying the value for analysis purposes should be defined in the analysis plan, and a numeric value should only be imputed in the ADaM datasets. If the value in --STRESC has different units, the greater than (>) or less than (<) sign should be maintained. See Example 1, Rows 11 and 12.

The following are conventions for variable population in Interventions and Events class domains.

Num | Variable Population | Implementation
1 Prespecified interventions and events (--PRESP, --OCCUR, --STAT, --REASND)

Product Impact on Individual Health only:

Interventions (e.g., concomitant medications) and events (e.g., medical history) can be collected as responses to a prespecified list of treatments or terms. In such cases:

  • --PRESP indicates when topic variable values, i.e., specific interventions (--TRT) or events (--TERM), were prespecified at the time of data collection. Values will be "Y" (for "Yes") or a null value.
  • --OCCUR represents whether prespecified interventions or events occurred or did not occur. Values will be populated for prespecified interventions and events only. Possible values are "Y" and "N" (for "Yes" and "No"). When an intervention or event is not prespecified, the value of --OCCUR will be null. 
  • --STAT and --REASND can be used to provide information about prespecified interventions and events for which there is no response (e.g., investigator forgot to ask). In such cases the value of --STAT will be "NOT DONE" and the value of --REASND will be the reason when collected.

The following table shows the population of --PRESP, --OCCUR, --STAT, and --REASND for different data collection scenarios.

| Collection Scenario | --PRESP Value | --OCCUR Value | --STAT Value | --REASND Value |
| An intervention or event was prespecified at the time of collection and occurred. | Y | Y | | |
| An intervention or event was prespecified at the time of collection and did not occur. | Y | N | | |
| An intervention or event was prespecified at the time of collection with no response and no reason collected. | Y | | NOT DONE | |
| An intervention or event was prespecified at the time of collection with no response and reason collected. | Y | | NOT DONE | Forgot to ask. |
| A spontaneously reported intervention or event was collected. | | | | |
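
The scenarios in the table can be expressed as a small decision function. The sketch below is illustrative only: the function name and arguments are hypothetical, and null values are represented as empty strings by assumption.

```python
def prespecified_flags(prespecified, response=None, reason_not_done=None):
    """Return --PRESP, --OCCUR, --STAT, and --REASND values for one record.

    prespecified: True when the intervention or event was on a
        prespecified list at the time of data collection.
    response: "Y", "N", or None when no response was obtained.
    reason_not_done: collected reason for no response, if any.
    Null values are represented as empty strings (an assumption).
    """
    flags = {"PRESP": "", "OCCUR": "", "STAT": "", "REASND": ""}
    if not prespecified:
        return flags  # spontaneously reported: all four values are null
    flags["PRESP"] = "Y"
    if response in ("Y", "N"):
        flags["OCCUR"] = response
    else:
        flags["STAT"] = "NOT DONE"
        flags["REASND"] = reason_not_done or ""
    return flags
```

Calling `prespecified_flags(True, None, "Forgot to ask.")` reproduces the fourth row of the table, and `prespecified_flags(False)` reproduces the last row.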



2 Reason for an action or activity
  • For Interventions class domains, --INDC will represent the medical condition for which the intervention was given and --ADJ will represent the reason for an adjustment to exposure, when collected.
  • For Events class domains, reasons for performing an activity will be represented using nonstandard variable(s) in the SUPP-- dataset with QNAM = --REAS. 


The following are conventions for variable population in Findings class domains. 

Num | Record and Variable Population | Implementation
1 Result precision
  • For numeric non-derived data, --ORRES will represent results to precision collected. Precision will not be artificially changed due to computer storage considerations.
  • For numeric derived data, --ORRES and --STRESC will represent the correct number of significant figures based upon the calculation used to derive the value. Trailing zeroes will be retained when significant.
2 Standardized units

Applicants may standardize units within a study for a given test per scientific and regulatory requirements. Standardization of units is recommended when data for the same test are collected via different sources using different units. In such cases, --ORRESU will represent the collected unit and --STRESU will represent the standardized unit.

3 Original and standardized results (--ORRES, --ORRESU, --STRESC, --STRESU, --STRESN)

If supplemental free text is collected for a result via CRF, then refer to Section 2.8.7.4, Free Text from Case Report Forms (CRFs). For responses collected via QRS instruments, refer to Section 2.8.7.2, Questionnaires, Ratings, and Scales.   

For all other results:

  • --ORRES will be populated with the result of the measurement or finding as originally collected or received, using controlled terminology when applicable.  
    • When applicable, the unit associated with the value of --ORRES will be populated in --ORRESU, using controlled terminology.
  • Values will be populated in --STRESC when --ORRES is populated. The value of --STRESC will be:
    • Derived by the conversion of numeric values in --ORRES to numeric values with standard units. Standard units will be represented in --STRESU using controlled terminology.
      • Numeric --ORRES values with characters attached (e.g., a greater than (>) or less than (<) sign) will be converted to standard units and the value of --STRESC will maintain the attached character (e.g., >10,000, <1). 
    • The assignment of the value of --ORRES.
      • For nonclinical studies, in the Macroscopic and Microscopic Findings (MA/MI) domains, --ORRES may contain a finding with multiple concatenated modifiers. In this case, --STRESC would represent only the finding without the modifiers. 
  • Numeric values represented in --STRESC will be assigned to --STRESN. If --STRESC is a character value, then --STRESN will be null.
    • Numeric values with attached characters (e.g., >10,000, <1) are considered to be character results and will not be populated in --STRESN.
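
The last step, assigning --STRESN from --STRESC, can be sketched as follows. The function name and the regular expression are assumptions for illustration; the pattern accepts plain decimal numbers only, so values with attached characters or thousands separators are treated as character results.

```python
import re

# Plain decimal numbers only, e.g. "12", "-0.5", "5.50". Strings such as
# ">10,000", "<1", or "NEGATIVE" do not match and are treated as character.
_NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def stresn_from_stresc(stresc):
    """Return the numeric value for --STRESN, or None for a null value."""
    if stresc is None or not _NUMERIC.fullmatch(stresc):
        return None
    return float(stresc)
```

Unit conversion from --ORRES to --STRESC is deliberately left out of this sketch, because conversion factors and the resulting precision are study-specific.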
4 Reason Test Performed (--REASPF)

--REASPF will represent the reason a test was performed, if collected.

5 Tests not done

When an entire examination (e.g., Laboratory Test Results (LB)), a group of tests (e.g., hematology or urinalysis), or an individual test (e.g., glucose) is not done for a USUBJID, POOLID, or SPTOBID and this information is explicitly captured with or without the reason for not collecting the information, record(s) can be created in the dataset to represent these data.

In such cases, applicants may include:

  • individual records for each test not done for each subject or pool; or
  • one record for each subject or pool for a group of tests that were not done. In such cases:
    • The paired values of --TESTCD and --TEST will represent a general description of testing in scope for the domain and will be used for all groupings of not done tests within the domain.  
      • --TESTCD will be the domain code concatenated with the word "ALL".
      • --TEST will be the domain description per controlled terminology.
    • --CAT will represent the group of tests not done.
    • --ORRES will be null.
    • --STAT will be "NOT DONE".
    • --REASND will be the reason the group of tests was not done, if collected.

For example, if a group of hematology or urinalysis tests represented in the LB domain are not done for a subject, then:

| USUBJID | LBTESTCD | LBTEST | LBCAT | LBORRES | LBSTAT | LBREASND |
| ABC-001 | LBALL | Laboratory Test Results | HEMATOLOGY | | NOT DONE | |
| ABC-001 | LBALL | Laboratory Test Results | URINALYSIS | | NOT DONE | No urine specimen present |
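
A record for a group of tests not done could be assembled as in the sketch below. The function name is hypothetical, the domain description is passed in by the caller rather than looked up in controlled terminology, and null values are shown as empty strings by assumption.

```python
def not_done_group_record(usubjid, domain, domain_label, category, reason=""):
    """Build one record for a group of tests that was not done.

    domain: 2-character domain code, e.g. "LB".
    domain_label: domain description per controlled terminology.
    category: the group of tests not done, stored in --CAT.
    reason: reason the group was not done, if collected.
    """
    return {
        "USUBJID": usubjid,
        f"{domain}TESTCD": f"{domain}ALL",  # domain code + "ALL"
        f"{domain}TEST": domain_label,
        f"{domain}CAT": category,
        f"{domain}ORRES": "",               # null: no result exists
        f"{domain}STAT": "NOT DONE",
        f"{domain}REASND": reason,
    }
```

Calling `not_done_group_record("ABC-001", "LB", "Laboratory Test Results", "URINALYSIS", "No urine specimen present")` produces the second row of the example table.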
6 Biological significance

Nonclinical only:

  • For assessments of biological significance when the overall interpretation is a record in the domain, use the supplemental qualifier (SUPP--) record (with QNAM = --BIOSIG) linked to the record that contains the overall interpretation or a particular result.
    • An example would be a QNAM value of LBBIOSIG in SUPPLB with a value of "Y", indicating that a lab result for albumin of 30 mg/mL was biologically significant.
  • Biological significance is not the same as the concepts of normal and abnormal, which are generally represented in --ORRES.
7 Clinical significance

Product Impact on Individual Health only:

  • For assessments of clinical significance when the overall interpretation is a record in the domain, use the --CLSIG (Clinically Significant) variable on the record that contains the overall interpretation or a particular result.
    • For example, EGCLSIG = "Y" indicates that an ECG result of "ATRIAL FIBRILLATION" was clinically significant.
  • Clinical significance is not the same as the concepts of normal and abnormal or of lab values out of normal range, which are generally submitted in --ORRES and in normal range/indicator variables, respectively.
8 Records for derived results

Nonclinical:

  • When there is a need to derive results based on collected values in --ORRES (e.g., means or ratios based on collected values), a new record for the derived result will be created in the dataset. In such cases, --DRVFL will be populated with "Y" in the derived record. --GRPID may be used to explicitly define the relationship between a derived record and the records from which it was derived. This practice would be especially important in the case of multiple derived records in a domain for the same subject (e.g., 2 baseline averages). 
    • For example, a mean systolic blood pressure derived from collected systolic blood pressure would be represented in the following way:
| CVGRPID | CVTEST | CVORRES | CVDRVFL | CVDTC |
| 1 | Systolic Blood Pressure | 154 | | 2023-04-02T09:52 |
| 1 | Systolic Blood Pressure | 149 | | 2023-04-02T09:54 |
| 1 | Systolic Blood Pressure | 153 | | 2023-04-02T09:55 |
| 1 | Systolic Blood Pressure | 152 | Y | 2023-04-02 |
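
The derived record in the example can be reproduced with a short sketch. The function name is hypothetical, and two simplifications are assumed: the mean is rounded to a whole number because the collected values are whole numbers, and the derived record keeps only the date part of the first collected timestamp.

```python
from statistics import mean


def derived_mean_record(records, prefix="CV"):
    """Build one derived record holding the mean of collected --ORRES values.

    `records` are dicts for one subject, one test, and one --GRPID value.
    """
    first = records[0]
    values = [float(rec[f"{prefix}ORRES"]) for rec in records]
    # Assumption: report the mean as a whole number; a real study would
    # apply its own significant-figure rule here.
    result = round(mean(values))
    return {
        f"{prefix}GRPID": first[f"{prefix}GRPID"],
        f"{prefix}TEST": first[f"{prefix}TEST"],
        f"{prefix}ORRES": str(result),
        f"{prefix}DRVFL": "Y",
        # Assumption: all source records share the same calendar date.
        f"{prefix}DTC": first[f"{prefix}DTC"][:10],
    }
```

Sharing the --GRPID value between the collected records and the derived record keeps the relationship explicit, which matters when a subject has more than one derived record in the domain.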

Product Impact on Individual Health:

  • Derived records will only be created for QRS domains as applicable. Refer to Section 2.8.7.2, Questionnaires, Ratings, and Scales (QRS).
  • Otherwise, records for derived results will not be created. When needed, such results will be derived as part of analysis.
9 Dates collected as results
  • When appropriate, dates that are collected as results will be represented as results in variable --ORRES.
  • Dates will be represented in --ORRES in ISO 8601 format. 
  • Prior to representing a date as a result, confirm the date is actually a finding for an observation and not the timing of an observation.
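
As a small illustration, a complete date collected in another layout can be converted to ISO 8601 before it is stored in --ORRES. The function name and the input format are assumptions; partial dates are not handled in this sketch.

```python
from datetime import datetime


def date_result_to_iso(raw, fmt):
    """Convert a collected complete date to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(raw, fmt).date().isoformat()
```

For example, `date_result_to_iso("04/02/2023", "%m/%d/%Y")` gives `2023-04-02`.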