Identification of Records Used for Analysis

This section addresses how to identify the records of an ADaM dataset that are used for analysis. The specific issues addressed include

identification of the records used in an LOCF analysis;
identification of the record containing the baseline value;
identification of post-baseline conceptual timepoint records, such as endpoint, minimum, maximum, or average; and
identification of specific records used in an analysis.

Num	Identification	Implementation
1	Records used in a timepoint imputation analysis	This section considers the issue of how to identify records used in a timepoint-related imputation analysis as well as how to represent data imputed for missing timepoints in an ADaM dataset. LOCF (last observation carried forward) is commonly used in timepoint-related imputation analyses and is therefore specifically mentioned. However, the methodology is general and is not restricted to LOCF analysis. WOCF (worst observation carried forward) analysis is also mentioned to emphasize generalizability. When an analysis timepoint is missing, the ADaM methodology is to create a new record in the ADaM dataset to represent the missing timepoint and identify these imputed records by populating the derivation type variable DTYPE. For example, when an LOCF/WOCF analysis is being performed, create LOCF/WOCF records when the LOCF/WOCF analysis timepoints are missing, and identify these imputed records by populating the derivation type variable DTYPE with values LOCF or WOCF. All of the original records would have null values in DTYPE. It would be very simple to select the appropriate records for analysis by selecting DTYPE = null for DAO (data as observed) analysis, DTYPE = null or LOCF for LOCF analysis, and DTYPE = null or WOCF for WOCF analysis. This approach would require understanding and communicating that if the DTYPE flag were not referenced correctly, the analysis would default to using all records, including the DAO records, plus the records derived by LOCF and WOCF. To perform a correct DAO analysis, one would need to explicitly select DTYPE = null.
2	Baseline records	Many statistical analyses require the identification of a baseline value. This section describes how a record used as a baseline is identified. The ADaM methodology is to create a baseline flag column to indicate the record used as baseline (the record whose value of AVAL is used to populate the BASE variable). This method does not require duplication of records in the event that the baseline record is not derived. Although a baseline record flag variable ABLFL is created and used to identify the record that is the baseline record, this does not prohibit also providing a record with a unique value of AVISIT (e.g., "Baseline"), designating the baseline record used for analysis, even if redundant with another record. For more complicated baseline definitions (functions of multiple records), a derived baseline record would have to be created as described in Section 2.9.9.1, Creation of Derived Columns Versus Creation of Derived Rows. This methodology requires that clear metadata be provided for the baseline record variable so that the value can be reproduced accurately.
3	Post-baseline conceptual timepoint records	When analysis involves cross-timepoint derivations (e.g., endpoint, minimum, maximum, average post-baseline), questions such as "Should distinct records with unique value of AVISIT always be created even if redundant with an observed value record?" or "Should these records just be flagged?" need to be considered. The ADaM methodology is to create a new record with a unique value of AVISIT in cases where analysis is based on AVISIT. The advantage of this approach is that it is simple and analysis-friendly. It is recognized that such new records might be redundant with observed records for some kinds of conceptual timepoint definitions. Always creating a record with a unique value of AVISIT designating the record used for analysis (e.g., "Endpoint," "Post-Baseline Minimum," "Post-Baseline Maximum") has the advantage that once the AVISIT values are understood, producers, consumers, and software can rely on these values of AVISIT. This approach represents the general case because any such cross-timepoint derivation can be represented in a new record with a unique AVISIT description. The disadvantage is that the dataset would contain more records, and conventions would have to be communicated and understood. In cases where analysis is not based on AVISIT, either solution is valid. It is recognized that in cases where the AVISIT values are not defined in the analysis documentation, adding a flag may be more appropriate. Which methodology is appropriate for situations where an "analysis visit" value is not defined can be driven by how the analysis will be performed. In cases where only a subset of data is analyzed (e.g., only on product minimum values), then flagging the values that qualify for analysis might be a better choice than creating an additional record to contain the minimum value. 
However, where the subset of data is analyzed within the context of a greater pool of data, creating an additional record to contain the minimum value would help facilitate analysis-ready usage and review.
4	Records used for analysis: general case	It is important to identify the records used in or excluded from analysis. Should records used in the analysis be identified via flags or by unique values of the analysis timepoint window description AVISIT? The ADaM methodology is to use an analysis flag (ANLzzFL) to indicate the records that fulfill specific requirements for one or more analyses. For example, ANLzzFL = Y identifies records that meet the requirements for analysis; the flag is blank (null) on all other records, such as a duplicate record that was not the one selected for analysis or a prespecified post-study timepoint not included in the analysis. This allows multiple records within a parameter to have the same value of AVISIT. However, it also requires that flags be added to the dataset and used to select the appropriate records for analysis, and the flags must be understood for correct analysis results to be generated. In addition to ANLzzFL, other flags might also be required, such as record-based population flags (e.g., ITTRFL, PPROTRFL). Note that there can be multiple ANLzzFL variables. In this case, it is imperative to have clear and robust metadata that indicate the basis for the creation and population of each ANLzzFL variable.
5	Population-specific analyzed records	It is not uncommon in the statistical analysis of clinical studies to conduct analyses based on multiple populations of interest. The population of interest can be defined either at the subject level, the record (measurement) level, or both. For example, when defining an analysis population, a subject may be included in one analysis population (e.g., intent-to-treat), but excluded from another analysis population (e.g., per-protocol). Analysis populations may also be defined using characteristics of individual measurements. For example, a measurement that was assessed outside of a prespecified time window for a particular visit may not be included in a per-protocol visit-level population. In this section, it is assumed that the definition of a record-level analysis population is dependent on the definition of the subject-level population. In other words, if a subject is excluded from the subject-level per-protocol population, then none of that subject's records would be candidates for inclusion within the record-level per-protocol population. Given the variety of possible population definitions, the same record in an analysis dataset could be included in one analysis and excluded from another, depending on characteristics of the subject as a whole and the characteristics of the individual measurement. Therefore, the issue becomes how best to select records for each analysis. The ADaM methodology for this analysis issue is to create a single ADaM dataset that can be used to perform multiple analyses using population flag variables to identify records that are used for each type of analysis. An advantage of this approach is that this single ADaM dataset can be used for multiple analyses. Flag variables obviate the need to replicate records for each type of analysis. 
This approach promotes efficiency in the operational aspects of electronic submissions, clarity of analyses, and ease in comparing selected values for each population. This approach does, however, require that clear metadata be provided for the flag variables so that each specific analysis can be reproduced accurately.
6	Records which satisfy a predefined criterion for analysis purposes	For analysis purposes, criteria are often defined to group results based on the collected value's relationship to one or more algorithmic conditions (for example, subjects who had a result greater than 5 times the upper limit of the normal range or subjects who had a systolic blood pressure value >160 mmHg with at least a 25-point increase from the BASE value). In addition to creating subgroups of subjects, the categorization of the presence or absence of a criterion is often used in listings, tabular displays, or statistical modeling (as a covariate or a response variable). When the criterion has binary responses, ADaM methodology provides an analysis criterion variable, CRITy, paired with a criterion evaluation result flag, CRITyFL, to identify whether a criterion is met. These variables are defined in Section 2.9.6.4, Analysis Parameter Variables for BDS Datasets. The variables MCRITy and MCRITyML are defined in Section 2.9.6.6, Analysis Parameter Criteria Variables for BDS Datasets, for use in situations where the criterion can have multiple responses (as opposed to CRITy, which has binary responses). CRITy is populated with a text description defining the conditions necessary to satisfy the presence of the criterion. The definition of CRITy can use any variable(s) located on the row, and the definition must stay constant across all rows within the same value of PARAM. A complex criterion which draws from multiple rows (different parameters or multiple rows for a single parameter) will require a new PARAM be created. CRITyFL (Criterion Evaluation Result Flag) is the character indicator of whether the criterion described in CRITy was met. Variable CRITyFL must be present on the dataset if variable CRITy is present. CRITyFN is permitted if a numeric result flag is needed. ADaM methodology allows the option of only populating CRITy on a row if the CRITy criterion is met for that row. 
In that case, CRITyFL is set to "Y" only if CRITy is populated and is null otherwise. If this option is not used and CRITy is populated on all rows within the parameter, then CRITyFL is set to "Y" or "N" or null. The choice of populating CRITy on only the rows where the criterion is met versus on all rows is dependent on the analysis need. CRITy and CRITyFL facilitate subgroup analyses. The ADaM methodology does not preclude the addition of rows (in contrast to the addition of multiple CRITy and CRITyFL columns) to the BDS for the criterion CRITy. However, CRITy must be kept constant (if populated) across all rows within the same value of PARAM. CRITy, CRITyFL, and CRITyFN are not parameter-invariant in that CRITy can vary across parameters within a dataset, as can the controlled terminology used for the corresponding CRITyFL and CRITyFN. In other words, CRITy for one parameter can be different than CRITy for a different parameter in the same dataset. When the criterion has multiple responses, ADaM methodology provides an analysis criterion variable, MCRITy, paired with a criterion evaluation result flag, MCRITyML (Multi-Response Criterion y Evaluation), to identify which level of a multiple response criterion is met. These variables are defined in Section 2.9.6.6, Analysis Parameter Criteria Variables for BDS Datasets. MCRITy is populated with a text description identifying the criterion being evaluated. The definition of MCRITy can use any variable(s) located on the row and the definition must stay constant across all rows within the same value of PARAM. A complex criterion which draws from multiple rows (different parameters or multiple rows for a single parameter) will require a new PARAM be created. MCRITyML is the character flag variable that indicates which level of the criterion defined in MCRITy was met. Variable MCRITyML must be present on the dataset if variable MCRITy is present. MCRITyMN is permitted if a numeric result flag is needed. 
MCRITy and MCRITyML facilitate subgroup analyses. The ADaM methodology does not preclude the addition of rows (in contrast to the addition of multiple MCRITy and MCRITyML columns) to the BDS for the criterion MCRITy. However, MCRITy must be kept constant (if populated) across all rows within the same value of PARAM. MCRITy, MCRITyML, and MCRITyMN are not parameter-invariant in that MCRITy can vary across parameters within a dataset, as can the Controlled Terminology used for the corresponding MCRITyML and MCRITyMN. In other words, MCRITy for one parameter can be different than MCRITy for a different parameter in the same dataset.
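
Row 1 of the table describes creating a new record for each missing analysis timepoint and marking it with DTYPE. The following Python sketch illustrates that idea for a single subject and a single parameter. The sample values, the list `PLANNED_VISITS`, and the helper names `add_locf_records` and `select_by_dtype` are assumptions made for illustration and are not part of ADaM; `None` stands in for a null DTYPE.

```python
# Hypothetical BDS-style records for one subject and one parameter.
# DTYPE is None (null) on every observed record.
observed = [
    {"USUBJID": "S01", "PARAMCD": "SYSBP", "AVISITN": 1, "AVAL": 120.0, "DTYPE": None},
    {"USUBJID": "S01", "PARAMCD": "SYSBP", "AVISITN": 2, "AVAL": 126.0, "DTYPE": None},
    # AVISITN 3 was not observed.
    {"USUBJID": "S01", "PARAMCD": "SYSBP", "AVISITN": 4, "AVAL": 131.0, "DTYPE": None},
]
PLANNED_VISITS = [1, 2, 3, 4]  # assumed analysis timepoints


def add_locf_records(records, planned_visits):
    """Return the observed records plus one DTYPE='LOCF' record per missing timepoint."""
    by_visit = {r["AVISITN"]: r for r in records}
    result = []
    last_observed = None
    for visit in planned_visits:
        if visit in by_visit:
            last_observed = by_visit[visit]
            result.append(last_observed)
        elif last_observed is not None:
            # New record for the missing timepoint; the value is carried forward.
            result.append({**last_observed, "AVISITN": visit, "DTYPE": "LOCF"})
    return result


def select_by_dtype(records, allowed):
    """Keep records whose DTYPE is in the allowed set (None means null)."""
    return [r for r in records if r["DTYPE"] in allowed]


dataset = add_locf_records(observed, PLANNED_VISITS)
dao_records = select_by_dtype(dataset, {None})           # data as observed
locf_records = select_by_dtype(dataset, {None, "LOCF"})  # LOCF analysis
```

Filtering on DTYPE explicitly in both cases reflects the caution in the table: an analysis that ignores DTYPE would silently include the imputed records.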
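
Row 2 of the table states that the record flagged by ABLFL is the one whose AVAL populates BASE. A minimal Python sketch of that relationship follows, assuming one flagged baseline record per subject and parameter. The sample data and the helper name `add_base` are hypothetical.

```python
# Hypothetical records; exactly one record per subject/parameter has ABLFL = "Y".
rows = [
    {"USUBJID": "S01", "PARAMCD": "WEIGHT", "AVISIT": "Screening", "AVAL": 70.5, "ABLFL": None},
    {"USUBJID": "S01", "PARAMCD": "WEIGHT", "AVISIT": "Day 1", "AVAL": 71.0, "ABLFL": "Y"},
    {"USUBJID": "S01", "PARAMCD": "WEIGHT", "AVISIT": "Week 4", "AVAL": 69.2, "ABLFL": None},
]


def add_base(records):
    """Copy AVAL from the ABLFL='Y' record into BASE on every record of the same subject and parameter."""
    base_by_key = {
        (r["USUBJID"], r["PARAMCD"]): r["AVAL"]
        for r in records
        if r["ABLFL"] == "Y"
    }
    return [
        {**r, "BASE": base_by_key.get((r["USUBJID"], r["PARAMCD"]))}
        for r in records
    ]


with_base = add_base(rows)
```

No record is duplicated here: the flag alone identifies the baseline, which matches the case in which the baseline record is not derived.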
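
Rows 4 and 5 of the table describe selecting records through analysis flags and record-level population flags held in a single dataset. The sketch below shows one way such a selection might look in Python. ANL01FL is used as one instance of ANLzzFL; the sample data and the helper `analysis_records` are assumptions for illustration only.

```python
# Hypothetical records: two candidates for the same AVISIT for S01,
# and one record for S02 that is excluded from the per-protocol record-level population.
records = [
    {"USUBJID": "S01", "AVISIT": "Week 4", "AVAL": 5.1, "ANL01FL": "Y", "ITTRFL": "Y", "PPROTRFL": "Y"},
    {"USUBJID": "S01", "AVISIT": "Week 4", "AVAL": 5.4, "ANL01FL": None, "ITTRFL": "Y", "PPROTRFL": "Y"},
    {"USUBJID": "S02", "AVISIT": "Week 4", "AVAL": 6.0, "ANL01FL": "Y", "ITTRFL": "Y", "PPROTRFL": None},
]


def analysis_records(rows, **required_flags):
    """Keep rows on which every named flag has the required value."""
    return [
        r for r in rows
        if all(r.get(name) == value for name, value in required_flags.items())
    ]


# The same dataset serves both analyses; only the flag selection differs.
itt_rows = analysis_records(records, ANL01FL="Y", ITTRFL="Y")
pp_rows = analysis_records(records, ANL01FL="Y", PPROTRFL="Y")
```

Because the selection depends entirely on the flags, the metadata describing how each flag was derived is what makes either analysis reproducible.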
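
Row 6 of the table gives the example criterion of a systolic blood pressure value >160 mmHg with at least a 25-point increase from BASE. The following Python sketch derives CRIT1 and CRIT1FL for that criterion, using the option in which CRITy is populated on all rows of the parameter. The wording of `CRIT1_TEXT`, the sample data, and the choice to leave CRIT1FL null when AVAL or BASE is missing are assumptions for illustration.

```python
# Assumed text for CRIT1; it is constant across all rows of the parameter.
CRIT1_TEXT = "Systolic BP >160 mmHg and increase from baseline >=25 mmHg"

rows = [
    {"USUBJID": "S01", "PARAMCD": "SYSBP", "AVAL": 165.0, "BASE": 135.0},
    {"USUBJID": "S02", "PARAMCD": "SYSBP", "AVAL": 170.0, "BASE": 150.0},
    {"USUBJID": "S03", "PARAMCD": "SYSBP", "AVAL": 150.0, "BASE": 120.0},
    {"USUBJID": "S04", "PARAMCD": "SYSBP", "AVAL": None, "BASE": 130.0},
]


def derive_crit1(row):
    """Return a copy of the row with CRIT1 and CRIT1FL added."""
    aval, base = row.get("AVAL"), row.get("BASE")
    if aval is None or base is None:
        flag = None  # assumption: the criterion cannot be evaluated
    elif aval > 160 and (aval - base) >= 25:
        flag = "Y"
    else:
        flag = "N"
    return {**row, "CRIT1": CRIT1_TEXT, "CRIT1FL": flag}


flagged = [derive_crit1(r) for r in rows]
met_criterion = [r["USUBJID"] for r in flagged if r["CRIT1FL"] == "Y"]
```

The criterion uses only variables on the same row (AVAL and BASE), which is the condition under which the table says CRITy can be used without creating a new PARAM.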
