The reference specifications introduce the Reference Dataset Structure. A Reference Dataset Structure dataset contains 1 record per combination of stratum values. At least 1 stratum variable is required, and up to 99 stratum variables can be present in a reference dataset. There may be several reference datasets in a study. This section of the TIG defines the standard variables used in reference datasets.
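The 1-record-per-combination rule can be checked mechanically. The following is a minimal sketch, not part of the standard: it assumes the dataset has been loaded as a list of dictionaries and that the stratum values sit in the STRVALy variables described later in this section. The function name `duplicate_strata`, the RATE column, and the sample records are hypothetical.

```python
from collections import Counter

def duplicate_strata(records, stratum_value_vars):
    """Return stratum-value combinations that occur in more than one record."""
    # The section requires at least 1 and allows at most 99 stratum variables.
    if not 1 <= len(stratum_value_vars) <= 99:
        raise ValueError("a reference dataset needs 1 to 99 stratum variables")
    counts = Counter(
        tuple(rec.get(var) for var in stratum_value_vars) for rec in records
    )
    return [combo for combo, n in counts.items() if n > 1]

# Hypothetical records: stratum 1 holds a year, stratum 2 holds a sex code.
records = [
    {"STRVAL1": "2020", "STRVAL2": "F", "RATE": 0.0081},
    {"STRVAL1": "2020", "STRVAL2": "M", "RATE": 0.0097},
    {"STRVAL1": "2020", "STRVAL2": "F", "RATE": 0.0082},  # repeats ("2020", "F")
]

print(duplicate_strata(records, ["STRVAL1", "STRVAL2"]))  # [('2020', 'F')]
```

An empty result means every combination of stratum values appears exactly once.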
Reference dataset names must have a prefix of RF, followed by up to 6 characters that should be used to make the dataset name meaningful. However, the ADaM standard currently has no predefined dataset names other than ADSL.
Proposed names for the datasets in the example section are:
- RFBR (Reference Data for Birthrate)
- RFIP (Reference Data for Initial Population)
- RFMIGRAT (Reference Data for Migration Rates)
- RFMORT (Reference Data for Mortality Rates)
- RFTRANSP (Reference Data for Transition Prob)
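The naming rule above can be expressed as a simple pattern check. This is a sketch under stated assumptions rather than an official conformance rule: it assumes "up to 6 characters" means 1 to 6 uppercase letters or digits after the RF prefix. The helper name `is_reference_dataset_name` is hypothetical.

```python
import re

# Assumption: 1 to 6 uppercase letters or digits follow the RF prefix.
REFERENCE_NAME = re.compile(r"RF[A-Z0-9]{1,6}")

def is_reference_dataset_name(name):
    """Return True if the name fits the assumed RF naming pattern."""
    return REFERENCE_NAME.fullmatch(name) is not None

# The proposed names from the list above.
proposed = ["RFBR", "RFIP", "RFMIGRAT", "RFMORT", "RFTRANSP"]

for name in proposed:
    print(name, is_reference_dataset_name(name))
```

All five proposed names fit the pattern, while a name such as ADSL (no RF prefix) or RFMIGRATION (more than 6 characters after the prefix) does not.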
The identifier variables associated with the reference values are captured in the STRTMy (Stratum y) variables. The value of each STRTMy variable is captured in the corresponding STRVALy (Stratum y Value) variable. As many identifiers as the source data require should be captured in the reference dataset. The order of the stratum variables has no inherent meaning, so ordering is not defined in this section. The suffix y is an index: an integer from 1 to 99. Stratum variables are not required to start at 1, and their indexes are not required to be consecutive. Some examples of stratum variables are Year, Sex, Race, Age, Transition type, and Product.
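To illustrate the indexing convention, the sketch below collects the stratum indexes present in a list of variable names and reports any index that has only one of the STRTMy/STRVALy pair. It rests on two assumptions that this section does not state outright: the index is written without zero padding (STRTM1, not STRTM01), and every STRTMy is expected to have a matching STRVALy. The function names are hypothetical.

```python
import re

# Assumption: y is written without leading zeros and ranges from 1 to 99.
STRATUM_VAR = re.compile(r"(STRTM|STRVAL)([1-9][0-9]?)")

def stratum_indexes(variable_names):
    """Map each stratum index y to the set of prefixes found for it."""
    found = {}
    for name in variable_names:
        match = STRATUM_VAR.fullmatch(name)
        if match:
            found.setdefault(int(match.group(2)), set()).add(match.group(1))
    return found

def unpaired_indexes(variable_names):
    """Return indexes that lack either the STRTMy or the STRVALy variable."""
    return sorted(
        y
        for y, prefixes in stratum_indexes(variable_names).items()
        if prefixes != {"STRTM", "STRVAL"}
    )

# Indexes 1, 3, and 5 are used; gaps and a start other than 1 are allowed.
variables = ["STRTM1", "STRVAL1", "STRTM3", "STRVAL3", "STRTM5", "REFSRCE"]

print(sorted(stratum_indexes(variables)))  # [1, 3, 5]
print(unpaired_indexes(variables))         # [5]
```

The check does not flag the non-consecutive indexes, because the convention permits them. It reports only index 5, which has a STRTM5 variable without a STRVAL5.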
The source file may change over time, so capturing the name of the source file in either the dataset or the define.xml file allows for traceability back to the source. If the source name is captured in the dataset, REFSRCE (Reference Data Source) is the variable to use.