Creation of Derived Columns Versus Creation of Derived Rows

This section provides specific rules to use in building a BDS dataset. These rules are essential, because they ensure the BDS dataset is analysis-focused, with all analysis-enabling variables and supportive variables included in a predictable structure, while preventing a "horizontalization" of the dataset.

The rows (i.e., records) in the ADaM BDS represent subject data for analysis parameters and timepoints (as applicable). There may be multiple rows within a given combination of subject, parameter, and timepoint, depending on the number of observations collected or derived, baseline definition, etc.

The ADaM BDS structure contains a central set of columns (i.e., variables) that represent the data being analyzed. These variables include the value being analyzed (e.g., AVAL) and the description of the value being analyzed (e.g., PARAM). Other columns in the dataset provide more information about the value being analyzed (e.g., the subject identification), describe and trace its derivation (e.g., DTYPE), or support its analysis (e.g., treatment variables, covariates). Standard columns exist for a variety of purposes, such as SDTM record identifiers for traceability, population and other record selection flags, analysis values, and some standard functions of analysis values. Permissible columns are not limited to those whose variable names are specified in Section 3, Standard ADaM Variables, and may include study-specific analysis model covariates, subgrouping variables, variables supportive of traceability, and other variables needed for analysis or useful for review.

The BDS is flexible in that derived data can be added to the collected data as additional rows and columns that support the analyses and provide traceability. However, there are some constraints on how to incorporate derived data in the BDS dataset. Specifically, this section addresses when derived data that are functions of analysis values should be added as additional columns, and when they should instead be added as additional rows.

The precise sequence of steps involved in creating a BDS ADaM dataset varies according to operational and study-specific needs. For the purposes of this discussion, it is useful to consider two fundamental steps.

1. Create an initial dataset from the source datasets. The first step is to create a set of rows and columns more or less directly derived from or loaded from input datasets (primarily SDTM datasets and other ADaM datasets) into their appropriate places. This step includes creating and populating the columns that contain the analysis parameter (PARAM), the analysis timepoint (e.g., AVISIT), and the analysis values (e.g., AVAL, AVALC). It also includes adding columns that contain identifiers (e.g., STUDYID, USUBJID, SUBJID, SITEID) and other SDTM variables for traceability (e.g., VISIT, --SEQ).

2. Add additional derived data as needed for the analysis. The second step consists of adding derived rows and columns based on the initial set of ADaM dataset records and columns. The rules below govern this step. These rules are further described and illustrated in the remaining subsections of this section.
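
As a rough illustration of the first step only, the sketch below maps a few source records into BDS-style rows. The input records, their values, the PARAM text built from the test name and unit, and the AVISIT derivation (a title-cased copy of VISIT) are all assumptions made for this example; actual derivations are study-specific.

```python
# Hypothetical SDTM-like vital signs records (illustrative values only).
sdtm_vs = [
    {"STUDYID": "XYZ", "USUBJID": "XYZ-001", "VSSEQ": 1,
     "VSTEST": "Systolic Blood Pressure", "VSSTRESU": "mmHg",
     "VISIT": "BASELINE", "VSSTRESN": 120.0},
    {"STUDYID": "XYZ", "USUBJID": "XYZ-001", "VSSEQ": 2,
     "VSTEST": "Systolic Blood Pressure", "VSSTRESU": "mmHg",
     "VISIT": "WEEK 4", "VSSTRESN": 128.0},
]


def build_initial_rows(records):
    """Step 1: load source records into BDS-style rows.

    Each output row carries identifiers, the analysis parameter,
    the analysis timepoint, the analysis value, and SDTM variables
    kept for traceability (VISIT, VSSEQ).
    """
    rows = []
    for rec in records:
        rows.append({
            "STUDYID": rec["STUDYID"],
            "USUBJID": rec["USUBJID"],
            # Assumed convention: PARAM combines test name and unit.
            "PARAM": f'{rec["VSTEST"]} ({rec["VSSTRESU"]})',
            # Assumed derivation: AVISIT is a title-cased VISIT.
            "AVISIT": rec["VISIT"].title(),
            "AVAL": rec["VSSTRESN"],
            # Traceability back to the source record.
            "VISIT": rec["VISIT"],
            "VSSEQ": rec["VSSEQ"],
        })
    return rows


initial_rows = build_initial_rows(sdtm_vs)
```

Derived rows and columns from the second step would then be built on top of `initial_rows`.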

Rule 1: A parameter-invariant function of AVAL and BASE on the same row that does not involve a transform of BASE should be added as a new column.

The three conditions of Rule 1 for when a function of AVAL and BASE should be added as a column (i.e., a function column) are:

The function is of AVAL and, optionally, BASE, on the same row; and
The function is parameter-invariant; and
The function does not involve a transform of BASE.

The remainder of the discussion of this rule is devoted to explaining these conditions.

PARAM uniquely describes the contents of AVAL or AVALC. Often, AVAL itself is not the value that is needed for analysis. For example, in a change from baseline analysis, it is the change from baseline CHG that is analyzed. The change from baseline column CHG should be created according to Rule 1 because it satisfies the three conditions:

CHG is derived from AVAL and BASE on the same row.
The same calculation applies on all rows in the dataset on which CHG is populated (the function CHG=AVAL-BASE does not vary according to PARAM). This second condition is known as the property of parameter-invariance; unless listed in Section 3, Standard ADaM Variables, a function of AVAL (and optionally BASE) may not be derived as a column if it is parameter-variant (i.e., is calculated differently for different parameters).
In the function CHG=AVAL-BASE, BASE is not transformed.

The intent is to use the standard columns as much as possible, to keep the structure as standard as possible, and avoid undue "horizontalization," while still permitting efficient use of function columns.
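
A minimal sketch of Rule 1 follows, assuming rows are held as plain dictionaries and that BASE has already been populated. The helper name `add_chg` and the sample values are hypothetical. Note that one formula is applied to every row regardless of PARAM, that it uses only AVAL and BASE from the same row, and that BASE is used untransformed.

```python
def add_chg(rows):
    """Add the function column CHG = AVAL - BASE to each row.

    The calculation is identical for every parameter
    (parameter-invariant) and reads only values on the same row.
    CHG is left as None where AVAL or BASE is missing.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        aval = row.get("AVAL")
        base = row.get("BASE")
        if aval is not None and base is not None:
            new_row["CHG"] = aval - base
        else:
            new_row["CHG"] = None
        out.append(new_row)
    return out


# Illustrative rows for two different parameters (values are made up).
rows = [
    {"USUBJID": "XYZ-001", "PARAM": "Systolic Blood Pressure (mmHg)",
     "AVISIT": "Week 4", "AVAL": 128.0, "BASE": 120.0},
    {"USUBJID": "XYZ-001", "PARAM": "Weight (kg)",
     "AVISIT": "Week 4", "AVAL": 70.5, "BASE": 72.0},
    {"USUBJID": "XYZ-002", "PARAM": "Weight (kg)",
     "AVISIT": "Week 4", "AVAL": 81.0, "BASE": None},
]

rows_with_chg = add_chg(rows)
```

The number of rows is unchanged; only a column is added, which is what distinguishes this case from a derivation that requires a new parameter.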

Rule 2: A transformation of AVAL that does not meet the conditions of Rule 1 should be added as a new parameter, and AVAL should contain the transformed value.
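
To make Rule 2 concrete, the sketch below treats a base-10 logarithm of AVAL as one possible transformation and adds it as rows under a new parameter rather than as a new column. The helper name `add_log10_parameter`, the new PARAM text, and the sample values are assumptions for illustration; in practice other variables (e.g., a parameter code or BASE for the new parameter) would also need to be set.

```python
import math


def add_log10_parameter(rows, source_param, new_param):
    """Append rows for a new parameter whose AVAL is log10 of the source AVAL.

    Source rows are kept unchanged. Rows with a missing or
    non-positive AVAL are skipped because log10 is undefined there.
    """
    derived = []
    for row in rows:
        aval = row.get("AVAL")
        if row["PARAM"] == source_param and aval is not None and aval > 0:
            new_row = dict(row)
            new_row["PARAM"] = new_param          # new parameter description
            new_row["AVAL"] = math.log10(aval)    # AVAL holds the transformed value
            derived.append(new_row)
    return rows + derived


# Illustrative rows (values are made up).
rows = [
    {"USUBJID": "XYZ-001", "PARAM": "Weight (kg)", "AVISIT": "Baseline", "AVAL": 100.0},
    {"USUBJID": "XYZ-001", "PARAM": "Weight (kg)", "AVISIT": "Week 4", "AVAL": 10.0},
    {"USUBJID": "XYZ-001", "PARAM": "Pulse Rate (beats/min)", "AVISIT": "Week 4", "AVAL": 64.0},
]

rows_with_log = add_log10_parameter(rows, "Weight (kg)", "Log10(Weight (kg))")
```

Here the dataset grows by two rows and gains no columns: the transformed values sit in AVAL and are described by the new PARAM value.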
