Guidance in this section describes conventions for the population of tabulation records and variables. Conventions in this section are both general and provided by general observation class. When conventions are applicable to TIG Nonclinical and Product Impact on Individual Health use cases, this is denoted in the Implementation column.
The following are general conventions for variable population:
Metadataspec |
---|
Num | Variable Population | Implementation |
---|
1 | Text strings greater than 200 characters | When text strings greater than 200 characters are collected, the following conventions for general observation class variables and SUPP-- datasets will be adhered to: - The first 200 characters of text should be stored in the parent domain variable and each additional 200 characters of text should be stored in a record in the SUPP-- dataset.
- When splitting a text string into several SUPP-- records, the text should be split between words to improve readability.
- The value of the first QNAM representing text over 200 characters will be the original domain variable name without any numeric suffix.
- The values for subsequent QNAMs will be sequential variable names, formed by appending a 1-digit integer, beginning with 1, to the original domain variable name. In cases where the standard domain variable name is already 8 characters in length, applicants will replace the last character with a digit when creating values for QNAM.
- e.g., For Other Action Taken in Adverse Events (AEACNOTH), values for QNAM for the SUPPAE records would have the values AEACNOT1, AEACNOT2, AEACNOT3, and so on.
- The value for QLABEL should be the original domain variable label for all QNAM values.
| 2 | "Yes/No" values | - For variables where the response is "Yes" or "No", both "Y" and "N" will be populated for responses. This eliminates confusion regarding whether a blank response indicates "N" or is a missing value.
- Some variables are collected or derived in a manner that allows only 1 response (e.g., a single checkbox for "Yes"). In situations such as these, where it is unambiguous to populate only the response of interest, only 1 value will be populated ("Y" or "N") and the alternate value will be blank.
| 3 | --FOCID | - Variable --FOCID is populated when a specific part of a subject or specimen is identified as a study-specific point of interest (e.g., injection site, biopsy site, treated site, region of the body).
- When used, the variable serves as a cross-domain identifier for the study-specific focus of interest; any records relating to the same focus would have the same FOCID value.
| 4 | --SEQ, --RECID | Variables --SEQ and --RECID are populated to explicitly identify domain records in different ways. Differences in variable population are described below. --SEQ | --RECID |
---|
Values uniquelyidentify records for subjects within a domain. | Values uniquely identify records within a domain. | The relationship between records and values is not one-to-one. - Values may change between versions of datasets.
- When a record is deleted, the value for the record may be reused to identify another record.
| There is a one-to-one relationship between records and values. - Values for records do not change between versions of datasets even when content is modified.
- When a record is deleted, the value for the record will not be reused to identify another record.
| Variable is numeric with numeric values. | Variable is character with numeric, character, or alphanumeric values. | Conventions for establishing and maintaining values are applicant-defined. Values may or may not be sequential depending on data processes and sources. |
| 5 | --GRPID | The value of --GRPID is generally assigned during or after data collection at the discretion of the applicant. | 6 | --REFID | Values for --REFID are applicant-defined and can be any alphanumeric strings the applicant chooses, consistent with their internal practices. | 7 | --CAT, --SCAT | - Values for --CAT and/or--SCAT are known (identified) about the data before it is collected.
- Variable --SCAT will be populated only when there is a value in variable --CAT.
- Values for --CAT and --SCAT will not be the domain name or dictionary classification represented in --DECOD and --BODSYS.
| 8 | --STAT | - In general observation class domains, --STAT will be populated with "NOT DONE" when data are not collected for the topic of the observation.
|
|
The following are conventions for variable population in Interventions and Events class domains.
Metadataspec |
---|
Num | Variable Population | Implementation |
---|
1 | Prespecified interventions and events (--PRESP, --OCCUR, --STAT, REASND) | Product Impact on Individual Health only: Interventions (e.g., concomitant medications) and events (e.g., medical history) can be collected as responses to a prespecified list of treatments or terms. In such cases: - --PRESP represents when topic variable values, specific interventions (--TRT), or events (–TERM) were prespecified at the time of data collection. Values will be "Y" (for "Yes") or a null value.
- --OCCUR represents whether prespecified interventions or events occurred or did not occur. Values will be populated for prespecified interventions and events only. Possible values are "Y" and "N" (for "Yes" and "No"). When an intervention or event is not prespecified, the value of --OCCUR will be null.
- --STAT and --REASND can be used to provide information about prespecified interventions and events for which there is no response (e.g., investigator forgot to ask). In such cases the value of --STAT will be "NOT DONE" and the value of --REASND will be the reason when collected.
The following table shows the population of --PRESP, --OCCUR, --STAT, and --REASND for different data collection scenarios. Collection Scenario | --PRESP Value | --OCCUR Value | --STAT Value | --REASND Value |
---|
An intervention or event was prespecified at the time of collection and occurred. | Y | Y |
|
| An intervention or event was prespecified at the time of collection and did not occur. | Y | N |
|
| An intervention or event was prespecified at the time of collection with no response and no reason collected. | Y |
| NOT DONE |
| An intervention or event was prespecified at the time of collection with no response and reason collected. | Y |
| NOT DONE | Forgot to ask. | A spontaneously reported intervention or event was collected. |
|
|
|
|
| 2 | Reason for an action or activity | - For Interventions class domains, --INDC will represent the medical condition for which the intervention was given and --ADJ will represent the reason for an adjustment to exposure, when collected.
- For Events class domains, reasons for performing an activity will be represented using nonstandard variable(s) in the SUPP-- dataset with QNAM = --REAS.
|
|
The following are conventions for variable population in Findings class domains.
Metadataspec |
---|
Num | Record and Variable Population | Implementation |
---|
1 | Result precision | - For numeric non-derived data, --ORRES will represent results to precision collected. Precision will not be artificially changed due to computer storage considerations.
- For numeric derived data, --ORRES and --STRESC will represent the correct number of significant figures based upon the calculation used to derive the value. Trailing zeroes will be retained when significant.
| 2 | Standardized units | Applicants may standardize units within a study for a given test per scientific and regulatory requirements. Standardization of units is recommended when data for the same test are collected via different sources using different units. In such cases, --ORRESU will represent the collected unit and --STRESU will represent the standardized unit. | 3 | Original and standardized results (--ORRES, --ORRESU, --STRESC, --STRESU, --STRESN) | If supplemental free text is collected for a result via CRF, then refer to Section 2.8.7.4, Free Text from Case Report Forms (CRFs). For responses collected via QRS instruments, refer to Section 2.8.7.2, Questionnaires, Ratings, and Scales. For all other results: Image Added - --ORRES will be populated with the result of the measurement or finding as originally collected or received, using controlled terminology when applicable.
- When applicable, the unit associated with the value of --ORRES will be populated in --ORRESU, using controlled terminology.
- Values will be populated in --STRESC when --ORRES is populated. The value of --STRESC will be:
- Derived by the conversion of numeric values in --ORRES to numeric values with standard units. Standard units will be represented in --STRESU using controlled terminology.
- Numeric --ORRES values with characters attached (e.g., a greater than (>) or less than (<) sign) will be converted to standard units and the value of --STRESC will maintain the attached character (e.g., >10,000, <1).
- The assigned of the value of --ORRES.
- For nonclinical studies, in the Macroscopic and Microscopic Findings (MA/MI) domains, --ORRES may contain a finding with multiple concatenated modifiers. In this case, --STRESC would represent only the finding without the modifiers.
- Numeric values represented in --STRESC will be assigned to --STRESN. If --STRESC is a character value, then, --STRESN will be null.
- Numeric values with attached characters (e.g., >10,000, <1) are considered to be character results and will not be populated in --STRESN.
| 4 | Reason Test Performed (--REASPF) | --REASPF will represent the reason a test was performed, if collected. | 5 | Tests not done | When an entire examination (e.g., Laboratory Test Results (LB)), a group of tests (e.g., hematology or urinalysis), or an individual test (e.g., glucose) is not done for a USUBJID, POOLID, or SPTOBID and this information is explicitly captured with or without the reason for not collecting the information, record(s) can be created in the dataset to represent these data. In such cases, applicants may include: - individual records for each test not done for each subject or pool; or
- one record for each subject or pool for a group of tests that were not done. In such cases:
- The paired values of --TESTCD and --TEST will represent a general description of testing in scope for the domain and will be used for all groupings of not done tests within the domain.
- --TESTCD will be the domain code concatenated with the word "ALL".
- --TEST will be the domain description per controlled terminology.
- --CAT will represent the group of tests not done.
- --ORRES will be null.
- --STAT will be "NOT DONE".
- --REASND will be the reason the group of tests was not done, if collected.
For example, if a group of hematology or urinalysis tests represented in the LB domain are not done for a subject, then: USUBJID | LBTESTCD | LBTEST | LBCAT | LBORRES | LBSTAT | LBREASND |
---|
ABC-001 | LBALL | Laboratory Test Results | HEMATOLOGY |
| NOT DONE |
| ABC-001 | LBALL | Laboratory Test Results | URINALYSIS |
| NOT DONE | No urine specimen present |
| 6 | Biological significance | Nonclinical only: - For assessments of biological significance when the overall interpretation is a record in the domain, use the supplemental qualifier (SUPP--) record (with QNAM = --BIOSIG) linked to the record that contains the overall interpretation or a particular result.
- An example would be a QNAM value of LBBIOSIG in SUPPLB with a value of "Y", indicating that a lab result for albumin of 30 mg/mL was biologically significant.
- Biological significance is not the same as the concepts of normal and abnormal, which are generally represented in --ORRES.
| 7 | Clinical significance | Product Impact on Individual Health only: - For assessments of clinical significance when the overall interpretation is a record in the domain, use the --CLSIG (Clinically Significant) variable on the record that contains the overall interpretation or a particular result.
- For example, EGCLSIG = "Y" indicates that an ECG result of "ATRIAL FIBRILLATION" was clinically significant.
- Clinical significance is not the same as the concepts of normal and abnormal and lab values out of normal range, which are generally submitted in --ORRES and normal range/indicator variables respectively.
| 8 | Records for derived results | Nonclinical: - When there is a need to derive results based on collected values in --ORRES (e.g., means or ratios based on collected values), a new record for the derived result will be created in the dataset. In such cases, --DRVFL will be populated with "Y" in the derived record. --GRPID may be used to explicitly define the relationship between a derived record and the records from which it was derived. This practice would be especially important in the case of multiple derived records in a domain for the same subject (e.g., 2 baseline averages).
- For example, a mean systolic blood pressure derived from collected systolic blood pressure would be represented in the following way:
CVGRPID | CVTEST | CVORRES | CVDRFL | CVDTC |
---|
1 | Systolic Blood Pressure | 154 |
| 2023-04-02T09:52 | 1 | Systolic Blood Pressure | 149 |
| 2023-04-02T09:54 | 1 | Systolic Blood Pressure | 153 |
| 2023-04-02T09:55 | 1 | Systolic Blood Pressure | 152 | Y | 2023-04-02 |
Product Impact on Individual Health: - Derived records will only be created for QRS domains as applicable. Refer to Section 2.8.7.2, Questionnaires, Ratings, and Scales (QRS).
- Otherwise, records for derived results will not be created. When needed such results will be derived as part of analysis.
| 9 | Dates collected as results | - When appropriate, dates that are collected results will be represented as results in variable --ORRES.
- Dates will be represented in --ORRES in ISO 8601 format.
- Prior to representing a date as a result, confirm the date is actually a finding for an observation and not the timing of an observation
|
|
...
- Values subject to controlled terminology will
- In general, text data will be represented in upper case (e.g., NEGATIVE).
- Exceptions may include long text data (e.g., comment text) and values of --TEST in Findings datasets (which may be more readable in title case if used as labels in transposed views). Values from CDISC Controlled Terminology or external code systems (e.g., MedDRA, SNOMED) or response values for QRS instruments specified by the instrument documentation should be in the case specified by those sources, which may be mixed case.
...
Assumptions in this section are appliable to Interventions, Events, and Findings class domains and will be used with domain-specific assumptions as appropriate.
General assumptions for the population of values in tabulation variables are provided in this section. Assumptions in this section will be followed and complement more detailed assumptions provided in Domain Specifications.
Values for --REFID are sponsor-defined and can be any alphanumeric strings the sponsor chooses, consistent with their internal practices.
- The sequence number (--SEQ) variable uniquely identifies a record for a given USUBJID within a domain. The variable --SEQ is required in all domains except DM. For example, if a subject has 25 observations in the Vital Signs (VS) domain, then 25 unique VSSEQ values should be established for this subject. Conventions for establishing and maintaining --SEQ values are applicant-defined. Values may or may not be sequential depending on data processes and sources.
Differences Between Grouping Variables Grouping Qualifier variable
The primary distinctions between -CAT/ SCAT and --GRPID are:
- --CAT/--SCAT are known (identified) about the data before it is collected.
- --CAT/--SCAT values group data across subjects.
- --CAT/--SCAT may have some controlled terminology.
- --GRPID is usually assigned during or after data collection at the discretion of the sponsor.
- --GRPID groups data only within a subject.
- --GRPID values are sponsor-defined, and will not be subject to controlled terminology.
Therefore, data that would be the same across subjects is usually more appropriate in --CAT/--SCAT, and data that would vary across subjects is usually more appropriate in --GRPID.
The primary distinctions between -CAT/ SCAT and --REFID are:
- --CAT/-SCAT are usually textual descriptions of the data designed into the collection vehicle/process, and --REFID is usually a tracking number/value of some type assigned to an object being tracked (e.g., a blood sample).
- ASK LOU ANN ABOUT THIS SECTION
The following assumptions will be implemented for Interventions class domains.
...
- Variables with the question text "Were there any interventions?" (e.g., “Were there any concomitant medications?") support the cleaning of data and confirmation that there are no missing values.
- Values collected for these fields will not be represented in subsequent tabulation datasets.
...
- Categories and subcategories are determined per protocol design and values are generally not entered via CRF.
- Implementers may:
- Pre-populate and display category values to help individuals involved in data collection understand what data should be recorded on the CRF.
- Pre-populate hidden variables with the values assigned within their operational database.
- Populate values directly in the tabulation dataset during dataset creation.
...
- The time an intervention started will be collected if there is a scientific or regulatory reason to collect this level of detail and the time can be realistically determined.
- Collection variables for date (e.g., --DAT, --STDAT, --ENDAT) will be concatenated with collection variables for time (e.g., --TIM, --STTIM, --ENTIM) as applicable to populate tabulation variables for dates (e.g., --DTC, --STDTC, --ENDTC) using ISO 8601 format.
...
- --REASND is used with tabulation variable --STAT.
- The value "NOT DONE" in --STAT indicates that the subject was not questioned about the intervention or that data were not collected; it does not mean that the subject had no interventions.
...
- --SPID may be populated by the applicant's data collection system.
- If collected, --SPID it can be used as an identifier in a data query to communicate clearly to individuals involved in data collection the record in question.
...
- When free-text intervention treatments are recorded, the location may be included in the --TRT variable to facilitate coding (e.g., lung biopsy). Location may be collected when the applicant needs to identify the specific anatomical location of the intervention. This location information does not need to be removed from the verbatim --TRT when creating tabulation datasets.
- The nonstandard variables --ATC1 through --ATC5 and --ATC1CD through --ATC5CD are used only when the intervention is coded using the World Health Organization's Anatomical Therapeutic Chemical (ATC) classification system (https://www.who.int/medicines/regulation/): 1 = the anatomical main group, 2 = the therapeutic main group, 3 = the therapeutic/pharmacological subgroup, 4 = chemical/therapeutic/pharmacological subgroup, 5 = chemical substance.
- The implementer may also add MedDRA coding elements as nonstandard variables (NSVs) to the Interventions domain if this dictionary is used for coding.
...
- Applicants may collect location data using a subset list of controlled terminology on the CRF.
- Applicants may pre-populate hidden variables with values assigned within their operational database.
- There is currently some overlap across controlled terminology for LOC, LAT, and DIR. While the overlap exists, ensure that this overlap is not part of database design.
The following assumptions will be implemented for Events class domains.
...
- Variables with the question text "Were there any <events>?" (e.g., “Were there any adverse events?”) support the cleaning of data and confirmation that there are no missing values.
- These questions can be used on any CRF.
- Values collected for these fields will not be represented in subsequent tabulation datasets.
...
- Categories and subcategories are determined per protocol design and values are generally not entered via CRF.
- Implementers may:
- Pre-populate and display category values to help individuals involved in data collection understand what data should be recorded on the CRF.
- Pre-populate hidden variables with the values assigned within their operational database.
- Populate values directly in the tabulation dataset during dataset creation.
...
- The time of an event will be collected if there is a scientific or regulatory reason to collect this level of detail and the time can be realistically determined.
- Collection variables for date (e.g., --DAT, --STDAT, --ENDAT) will be concatenated with collection variables for time (e.g., --TIM, --STTIM, --ENTIM) as applicable to populate tabulation variables for dates (e.g., --DTC, --STDTC, --ENDTC) using ISO 8601 format.
...
- --OCCUR may be used when a specific event is solicited (preprinted) on the CRF and the CRF uses an applicant-defined codelist.
- --OCCUR may be implemented while also allowing for a "NOT DONE" response.
...
- --REASND is used with tabulation variable --STAT.
- The value "NOT DONE" in --STAT indicates that the subject was not questioned about the event or that data were not collected; it does not mean that the subject had no events.
...
- --SPID may be populated by the applicant's data collection system.
- If collected, --SPID it can be used as an identifier in a data query to communicate clearly to individuals involved in data collection the record in question.
...
- The collection variables used for coding are not data collection fields that will appear on the CRF. Applicants will populate values through the coding process.
- When free-text event terms are entered, the location may be included in --TERM to facilitate coding and further clarify the event. This location information does not need to be removed from the verbatim term when creating tabulation datasets.
- The CDASH variables --LLT, --LLTCD, --PTCD, --HLT, --HLTCD, --HLGT, --HLGTCD, --SOC, and --SOCCD are only applicable to events coded in MedDRA.
...
- Location is collected when the applicant needs to identify the specific anatomical location of the event.
- Applicants may collect location data using a subset list of controlled terminology on the CRF.
- Applicants may pre-populate hidden variables with values assigned within their operational database.
- There is currently some overlap across controlled terminology for LOC, LAT, and DIR. While the overlap exists, ensure that this overlap is not part of database design.
The following assumptions will be implemented for Findings class domains.
Metadataspec |
---|
Num | Field or Variable | Guidance |
---|
1 | --CAT, --SCAT | - Categories and subcategories are determined per protocol design and values are generally not entered via CRF.
- Implementers may:
- Pre-populate and display category values to help individuals involved in data collection understand what data should be recorded on the CRF.
- Pre-populate hidden variables with the values assigned within their operational database.
- Populate values directly in the tabulation dataset during dataset creation.
| 2 | --PERF, --STAT, --REASND | - --PERF defines - variables to record whether an assessment has been performed/collected. --REASND is used to collect a reason why an assessment was not done.
- --PERF has the Question Text "[Were any/Was the] [--TEST/ topic] [measurement(s)/test(s) /examinations (s)/specimen(s) /sample(s) ] [performed/collected]?" are intended to assist in the cleaning of data and in confirming that there are no missing values.
- --PERF may be used at the page, panel, or question level.
- --PERF may be used during the creation of tabulaton datasets to derive a value into the SDTM variable --STAT. The implementer can use a combination of --CAT, --SCAT, with the --TESTCD= "--ALL" and --TEST= "<Name of the CRF module>" to represent what tests were not performed.
- Applicants must decide how to model each test not performed (e.g., to denote that all tests were not performed using TESTCD = "–ALL").
- --STAT has the Question Text "Was the [--TEST] not [completed/answered/done/assessed/evaluated]?; Indicate if (the [--TEST] was) not [answered/assessed/done/evaluated/performed]." This is intended to be used to collect a simple "NOT DONE" check box at the page, panel, or question level.
- --REASND is used with SDTM variable --STAT only. The value NOT DONE in --STAT indicates that the findings test was not performed.
| 3 | --SPID | - --SPID may be populated by the applicant's data collection system. If collected, it can be beneficial to use an identifier in a data query to communicate clearly to the site the specific record in question.
- This field may be populated by the applicant's data collection system.
| 4 | Variables for Date and Time | - Time will be collected if there is a scientific or regulatory reason to collect this level of detail and the time can be realistically determined.
- Metadata tables generally include --DAT and --TIM will be added from the CDASH Model as appropriate.
- Collection variables for date and time (e.g., --DAT, --TIM) will be used to collect the date or date and time that the test was performed, or the specimen was collected. The start and end dates and times (e.g., for specimen collection) will be collected as appropriate.
- The date of collection of a test can be derived from the date of visit. In such cases, a separate date of observation field is not required to be present on the CRF.
- Date and time variables will not be used to collect dates that are the result of a tests. Test results will be collected using --ORRES.
| 5 | Horizontal (Denormalized) and Vertical Data Structures (Normalized) | - In metadata tables, many of the Findings class domains are presented in a normalized structure (1 record for each test) similar to a tabulation dataset, even though many data management systems hold the data in a denormalized structure (1 variable for each test).
- When implementing collection standards in a denormalized structure, create variable names for the Findings --TEST and/or --TESTCD values. To do this:
- Define the denormalized variable names using available CDISC Controlled Terminology for --TESTCD; or
- When a system allows more than 8-character variable names, the following naming convention can be used: <--TESTCD>_<-- tabulation variable name> where --TESTCD is the appropriate CT for the test code (e.g., DIABP_VSORRES, DIABP_VSLOC).
- In the horizontal (denormalized) setting, collection variables such as --PERF, --LOC , and --STAT can be collected once for the whole horizontal record and applied to all of the observations on that record, or collected per test using collection variables, such as <--TESTCD>_--PERF. When tabulation datasets are created, any variables collected for the entire horizontal record will be mapped to each vertical record per tabulation guidance.
- In the horizontal (denormalized) setting, an identifier (e.g., --GRPID) can be used to identify all --TESTCD for the same collection record. This supports mapping of data collected in a horizontal setting to tabulation datasets and creation of RELRECs.
| 6 | Tests and Original Results | - The value in --TEST will be 40 characters or less.
- The corresponding codelist value for the short test name, 8 characters or less, will be populated in the tabulation variable --TESTCD.
- Variable --TESTCD should be used to create a variable name and --TEST be used as the Prompt on the CRF.
- Both --TESTCD and --TEST are recommended for use in the operational database.
- Variable --ORRES is used to collect test results or findings in the original units per controlled terminology in character format.
- If results are modified for coding, the --MODIFY variable contains the modified text.
- Variables --ORNRLO and --ORNRHI and --NRIND are used when normal or reference ranges are collected for results.
- Standardization of the original results and/or normal/reference ranges will be performed during the creation of tabulation datasets.
| 7 | Location Variables (--LOC, --LAT, --DIR, --PORTOT) | - Location variables are used to collect the location of the test.
- Applicants may collect location data using a subset list of controlled terminology on the CRF.
- Applicants may pre-populate hidden variables with values assigned within their operational database.
- There is currently some overlap across controlled terminology for LOC, LAT, and DIR. While the overlap exists, ensure that this overlap is not part of database design.
| 8 | –ORRES, --RES, --DESC, and --RESOTH | - Variables --ORRES, --RES, --DESC, and --RESOTH are used to collect results. It is recommended that:
- --ORRES is used when the result is collected using a single question. The result will map directly to the tabulation variable --ORRES.
- --RES and --DESC are used when a pair of questions are asked to collect the result; a question to collect the result with a follow-up question for a description of the result. For example, the question “Is the <condition> [absent/present]?" with a follow-up question “What is the finding that was observed?
- --RES and --RESOTH are used when a question is asked that allows the selection of a pre-specified finding, with a follow-up question to ask about the pre-specified response "OTHER". For example, the question "What is the result?" with a set of prespecified responses, including the choice “OTHER” with the follow-up question “Specify, Other”.
| 9 | Root variables | The Findings About Events and Intervention domains use the same root variables as the Findings domain, with the addition of the --OBJ variable |
|