Stratified randomization is used to ensure balance of product assignments across 1 or more prognostic factors. A prognostic factor is an aspect of the disease or a characteristic of the subject that may influence product effect. The prognostic factors used to stratify the randomization are specified in the protocol. As a simple example, suppose age group (<50, >=50) and gender (male, female) are considered important prognostic factors. When a subject is deemed eligible for randomization, the subject's values for these factors are determined at the site and used as input to the randomization process to determine the product assignment. A value of a factor used for randomization may later be discovered to be in error. For example, suppose a subject was randomized in the stratum for age group <50 and male, and it was later discovered that the subject was actually 54 and therefore should have been randomized in the stratum for age group >=50 and male. If this situation occurs too often, the balance of product assignments across these factors comes into question, which may in turn prompt sensitivity analyses. Analysis therefore requires 2 sets of values to describe the stratification factors. In this document, these 2 sets of values are referred to as the “as-randomized” values and the “as-verified” values. As-verified values are derived using source documentation.
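The age example above can be written out as a minimal Python sketch. It is illustrative only: the `Strata` class and the `age_group` helper are hypothetical names introduced here, not part of any standard, and the cut point of 50 is taken from the example.

```python
from dataclasses import dataclass


def age_group(age: int) -> str:
    """Map an age in years to the example strata "<50" and ">=50"."""
    return "<50" if age < 50 else ">=50"


@dataclass(frozen=True)
class Strata:
    """Hypothetical container for one set of stratification values."""
    age_group: str
    gender: str


# As-randomized: the values entered at the site at the time of randomization.
as_randomized = Strata(age_group="<50", gender="male")

# As-verified: the values re-derived from source documentation (actual age 54).
as_verified = Strata(age_group=age_group(54), gender="male")

# The two sets are kept side by side; a difference marks a stratification error.
stratification_error = as_randomized != as_verified
print(as_randomized, as_verified, stratification_error)
```

Keeping both sets of values, rather than overwriting the as-randomized values, is what allows a sensitivity analysis to be run under either definition.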
At present, there is no standard method for representing the randomization strata factors and values in SDTM-based datasets. Depending on the randomization process, it might be unnecessary to represent variables and values specific to stratification in SDTM-based datasets if the information can be found within an appropriate domain. For example, if age and sex were used as stratification factors, then the Demographics (DM) variables AGE and SEX should appropriately reflect the values used for randomization. However, more sophisticated randomizations or more complicated derivations of prognostic factors, such as whether a subject had ever used a particular concomitant medication for a given length of time, may be harder to identify or document in SDTM-based datasets. If an interactive voice response system (IVRS) is used, the values used for randomization are captured by the system and correspond to the values represented on the randomization schedule. As-verified values are typically derived by comparing the values used for randomization against the data in the SDTM datasets, whether by a simple match against a single data point such as sex or by reprogramming more complex factors such as previous products.
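The comparison described above might be sketched as follows for the simple case of age group and sex. This is a rough illustration under stated assumptions, not a prescribed derivation: the dictionary layout, the `AGEGRP` key, the subject identifiers, and the function names are all hypothetical, and a real study would read the as-randomized values from the randomization system export and the source values from DM.

```python
def derive_as_verified(dm_record: dict) -> dict:
    """Re-derive the stratification values from DM-like source data."""
    age_grp = "<50" if dm_record["AGE"] < 50 else ">=50"
    return {"AGEGRP": age_grp, "SEX": dm_record["SEX"]}


def find_discrepancies(ivrs: dict, dm: dict) -> dict:
    """For each subject, list the factors whose as-randomized value
    differs from the as-verified value."""
    result = {}
    for subject, randomized in ivrs.items():
        verified = derive_as_verified(dm[subject])
        result[subject] = [
            factor for factor in verified
            if verified[factor] != randomized[factor]
        ]
    return result


# Hypothetical as-randomized values captured by the randomization system.
ivrs = {
    "S001": {"AGEGRP": "<50", "SEX": "M"},
    "S002": {"AGEGRP": ">=50", "SEX": "F"},
}
# Hypothetical DM-like source values.
dm = {
    "S001": {"AGE": 54, "SEX": "M"},
    "S002": {"AGE": 61, "SEX": "F"},
}

discrepancies = find_discrepancies(ivrs, dm)
print(discrepancies)
```

More complex factors, such as prior use of a concomitant medication for a given length of time, would need their own derivation in place of the single age comparison shown here.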
The following table provides a set of variables to allow maximum flexibility in representing the description of the prognostic factors. To illustrate the interrelationships of the variables, the examples for every variable in the CDISC Notes column use the combination of 3 stratification factors: age group (“<50” or “>=50”), prior product status (“Product naïve” or “Product experienced”), and hypertension (“Y” or “N”).