Metadata QC macro

The SDTM Implementation Guide (SDTMIG) is one of the most widely used CDISC products. Among its contents is a variety of normative information, such as domain specification tables and assumptions. This normative content defines conformance requirements, which form the foundation of the rule sets applied when standardized data are submitted to global regulatory agencies. Through internal curation and community feedback, CDISC has identified several recurring categories of typographical errors in the SDTMIG:

  • Misspelled variable names, e.g., EGFXN instead of EGXFN
  • Variable labels longer than 40 characters, e.g., the Findings About variable SRSTRESC has the 41-character label "Character Results/Findings in Std. Format"
  • Variable data types not aligned with the model (SDTM), e.g., numeric variables mischaracterized as text, and vice versa
  • Variable order not aligned with the SDTM, e.g., TAETORD and EPOCH swapped in the Subject Elements dataset

To keep these typographical errors from recurring in our products, CDISC has installed a set of metadata QC macros in our collaborative authoring wiki environment. These macros detect and report the errors described above in real time. Their logic comes from requirements set by the SDS team and from the troubleshooting experience of CDISC metadata curators. Getting metadata quality right up front pays off: with much less manual curation overhead, metadata will be loaded into SHARE and made available to members with little to no wait after publication. CDISC plans to continue working with all development teams to extend this practice beyond the SDTMIG.
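The four error categories above lend themselves to simple automated checks. The following Python sketch shows one way such checks could work; it is not the code of the CDISC macros. The `Variable` class, the `check_domain` function, and the assumption that the model arrives as an ordered list of (name, type) pairs with the domain prefix already substituted for "--" are all hypothetical.

```python
import difflib
from dataclasses import dataclass

# Limit taken from the text above (it matches the SAS V5 transport label limit).
MAX_LABEL_LENGTH = 40


@dataclass
class Variable:
    """Hypothetical representation of one row of an IG domain specification."""
    name: str
    label: str
    data_type: str  # "Char" or "Num"


def check_domain(ig_vars: list[Variable], model_vars: list[tuple[str, str]]) -> list[str]:
    """Check an IG domain specification against the model's (name, type) list.

    model_vars is assumed to be in model order, with the domain prefix already
    substituted for "--" (e.g. "--STRESC" -> "SRSTRESC").
    """
    findings: list[str] = []
    model_type = dict(model_vars)
    model_pos = {name: i for i, (name, _) in enumerate(model_vars)}

    for v in ig_vars:
        # 1. Names absent from the model are likely misspellings; suggest the closest match.
        if v.name not in model_type:
            guess = difflib.get_close_matches(v.name, list(model_type), n=1)
            hint = f"; did you mean {guess[0]}?" if guess else ""
            findings.append(f"{v.name}: not defined in the model (misspelled?){hint}")
            continue
        # 2. Labels must not exceed the length limit.
        if len(v.label) > MAX_LABEL_LENGTH:
            findings.append(
                f"{v.name}: label has {len(v.label)} characters (max {MAX_LABEL_LENGTH})"
            )
        # 3. The data type must agree with the model.
        if v.data_type != model_type[v.name]:
            findings.append(f"{v.name}: type {v.data_type}, model says {model_type[v.name]}")

    # 4. Known variables must keep the same relative order as in the model.
    known = [v.name for v in ig_vars if v.name in model_pos]
    for prev, curr in zip(known, known[1:]):
        if model_pos[curr] < model_pos[prev]:
            findings.append(f"{curr}: listed after {prev}, out of model order")
    return findings


if __name__ == "__main__":
    # Illustrative Subject Elements fragment with TAETORD and EPOCH swapped.
    model_se = [("TAETORD", "Num"), ("EPOCH", "Char")]
    ig_se = [
        Variable("EPOCH", "Epoch", "Char"),
        Variable("TAETORD", "Planned Order of Element within Arm", "Num"),
    ]
    for msg in check_domain(ig_se, model_se):
        print(msg)  # TAETORD: listed after EPOCH, out of model order
```

Comparing relative order rather than exact positions lets an IG domain omit permissible model variables without being flagged, while still catching swaps such as TAETORD and EPOCH.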

QRS Maker

CDISC regularly publishes new implementation supplements for Questionnaires, Ratings, and Scales (QRS Supplements). These QRS Supplements provide implementation advice on how to tabulate data for QRS measures using the SDTMIG and Controlled Terminology. QRS measures are typically laid out as groups and sub-groups of questions and responses. This hierarchy is hard to convey on a two-dimensional page but is a natural fit for SHARE. For example, the Columbia-Suicide Severity Rating Scale Baseline (C-SSRS BASELINE) measure has 39 questions with 10 distinct response sets, spanning 3 sub-groups. The C-SSRS BASELINE QRS Supplement goes into further detail, advising users to tabulate the data in the SDTMIG QS dataset using 3 codelists (QSCAT, CSS01TC, CSS01TN) from Controlled Terminology. Because the model behind SHARE requires metadata and their relationships to be explicit, the hierarchy of question groups, questions, and responses can easily be stored in the repository (as graphs) and extracted from it (using the SHARE API).
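To illustrate why a graph suits this hierarchy better than a flat table, here is a minimal Python sketch in which groups contain questions and sub-groups, and each question links to a shared response set rather than holding its own copy. The class and function names and the sample measure are hypothetical placeholders; this does not reflect SHARE's actual model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass(frozen=True)
class ResponseSet:
    """A reusable set of allowed responses, stored once and shared by questions."""
    name: str
    values: tuple[str, ...]


@dataclass
class Question:
    testcd: str
    text: str
    responses: ResponseSet  # a link to a shared node, not a copy


@dataclass
class Group:
    name: str
    questions: list[Question] = field(default_factory=list)
    subgroups: list[Group] = field(default_factory=list)


def walk(group: Group, path: tuple[str, ...] = ()) -> Iterator[tuple[tuple[str, ...], Question]]:
    """Depth-first traversal yielding (group path, question) pairs."""
    here = path + (group.name,)
    for q in group.questions:
        yield here, q
    for sub in group.subgroups:
        yield from walk(sub, here)


def distinct_response_sets(root: Group) -> set[ResponseSet]:
    """Collect each response set once, however many questions point to it."""
    return {q.responses for _, q in walk(root)}


if __name__ == "__main__":
    # Hypothetical measure; codes and texts are placeholders, not C-SSRS content.
    yes_no = ResponseSet("Yes/No", ("No", "Yes"))
    scale = ResponseSet("0-4 scale", ("0", "1", "2", "3", "4"))
    measure = Group("XYZ", subgroups=[
        Group("Section A", [Question("XYZ0101", "Question 1", yes_no),
                            Question("XYZ0102", "Question 2", yes_no)]),
        Group("Section B", [Question("XYZ0201", "Question 3", scale)]),
    ])
    for path, q in walk(measure):
        print(" > ".join(path), q.testcd, q.responses.name)
    print(len(distinct_response_sets(measure)), "distinct response sets")
```

Because response sets are shared nodes, a measure such as C-SSRS BASELINE, with 39 questions but only 10 distinct response sets, would store each set once, and the group path of every question can be recovered by traversal.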

With this in mind, CDISC has created a tool called QRS Maker and installed it in our collaborative authoring wiki environment. It is designed to capture the detailed metadata described above during the development phase. Metadata captured there become a source for SHARE and, in turn, for publication of the QRS Supplement. This tool embraces the "machine first, human next" strategy so that metadata are much more accessible and contemporaneous. At CDISC, the QRS team is actively using QRS Maker for measures in the public domain. Examples include the following, along with the Therapeutic Area User Guide (TAUG) in which each is used:

  • Age, Treatment with Systemic Antibiotics, Leukocyte Count, Serum Albumin and Serum Creatinine as a Measure of Renal Function (ATLAS), used in conjunction with TAUG-CDAD
  • Combat Exposure Scale (CES), used in conjunction with TAUG-PTSD
  • Life Events Checklist for DSM-5 (LEC-5), used in conjunction with TAUG-PTSD
  • Controlled Oral Word Association Test (COWAT), used in conjunction with TAUG-Huntington

NSV Registry

The SDTM allows the use of non-standard variables (NSVs) in certain datasets when tabulating data. An NSV contains supplemental information that does not fit any of the standard variables defined in the SDTM. Users are free to decide whether to include NSVs, so long as important data are not lost. Allowing NSVs is a long-standing modeling decision, since science always evolves faster than standards. That said, CDISC always has multiple Therapeutic Area User Guides (TAUGs) in development concurrently, and medical concepts do overlap across TAUGs. To ensure consistency, CDISC created a tool called NSV Registry to 1) catalog NSVs used in past publications such as TAUGs and IGs; 2) serve as a governance approval tool; and 3) store approved metadata such as name and definition. To date, we have amassed over 125 NSVs in the registry. Once governance has finished adjudicating the list of NSVs, CDISC will publish it as a reference tool for CDISC users.
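The three roles of the registry (catalog, governance approval, approved metadata) can be sketched as a keyed catalog with a status per entry. The Python below is a minimal illustration under stated assumptions, not CDISC's tool: the `NSVRegistry` class, its method names, the proposed/approved/rejected statuses, case-insensitive name matching, and recording conflicting definitions for governance review are all hypothetical design choices.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class NSVEntry:
    name: str
    definition: str
    sources: set[str] = field(default_factory=set)  # publications using this NSV
    status: Status = Status.PROPOSED


class NSVRegistry:
    """Hypothetical registry: one entry per NSV name, shared across publications."""

    def __init__(self) -> None:
        self._entries: dict[str, NSVEntry] = {}
        # (name, registered definition, differing definition) for governance review
        self.conflicts: list[tuple[str, str, str]] = []

    def __len__(self) -> int:
        return len(self._entries)

    def catalog(self, name: str, definition: str, source: str) -> NSVEntry:
        """Record an NSV use, reusing the existing entry if the name is known."""
        key = name.upper()
        entry = self._entries.get(key)
        if entry is None:
            entry = NSVEntry(key, definition)
            self._entries[key] = entry
        elif entry.definition != definition:
            # Same name, different meaning: flag it instead of overwriting.
            self.conflicts.append((key, entry.definition, definition))
        entry.sources.add(source)
        return entry

    def adjudicate(self, name: str, approve: bool) -> None:
        """Record the governance decision for one NSV."""
        entry = self._entries[name.upper()]
        entry.status = Status.APPROVED if approve else Status.REJECTED

    def published(self) -> list[NSVEntry]:
        """Approved entries only, sorted by name, as a publishable reference list."""
        approved = (e for e in self._entries.values() if e.status is Status.APPROVED)
        return sorted(approved, key=lambda e: e.name)


if __name__ == "__main__":
    # Placeholder NSV names, definitions, and sources for illustration only.
    reg = NSVRegistry()
    reg.catalog("NSVA", "Definition A", "TAUG-X")
    reg.catalog("nsva", "Definition A", "TAUG-Y")
    reg.adjudicate("NSVA", approve=True)
    for e in reg.published():
        print(e.name, sorted(e.sources))
```

Keying entries by name means a second TAUG that needs the same concept finds the existing NSV instead of minting a new one, which is the consistency problem the registry is meant to solve.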
