Blog from May, 2014

As part of a tool onboarding exercise, we thought it would be of great benefit to enter some metadata interactively online (as opposed to through data import). In April, the team was split into three pairs to develop content for the three new oncology domains introduced in SDTMIG v3.1.3: RS, TR, and TU. As illustrated on the SHARE Stack diagram, the first hurdle is to correctly up-version the managed objects (called “assets” in the tool). Because these new domains belong to the Findings class, the Findings class asset needs a new version that contains both the new and the existing domains. Further, the TU domain implements the new class variables --LAT, --DIR, and --PORTOT, so the General Findings class asset also needs a new version that contains both the new and the existing class variables. Until SHARE, these relationships could only be deduced from section 6.1 of the SDTM v1.3 publication; SHARE forces us to state them explicitly and in a machine-readable way.
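To make the up-versioning step concrete, here is a minimal Python sketch. It assumes an asset can be modeled as an immutable record with a name, an integer version, and a set of members. The `Asset` class, the `up_version` function, and the version numbers are hypothetical illustrations, not the tool's actual data model, and the member lists are a small illustrative subset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical managed object ("asset"): frozen, so a published
    version is never edited in place."""
    name: str
    version: int
    members: frozenset[str]


def up_version(asset: Asset, additions: set[str]) -> Asset:
    """Return the next version: existing members plus the additions.
    The previous version object is left unchanged."""
    return Asset(asset.name, asset.version + 1, asset.members | frozenset(additions))


# Illustrative subsets only, not the complete class contents.
findings_v1 = Asset("Findings", 1, frozenset({"EG", "LB", "VS"}))
findings_v2 = up_version(findings_v1, {"RS", "TR", "TU"})

general_findings_v1 = Asset(
    "General Findings", 1, frozenset({"--TESTCD", "--TEST", "--ORRES"})
)
general_findings_v2 = up_version(general_findings_v1, {"--LAT", "--DIR", "--PORTOT"})
```

Keeping each version frozen means the older Findings version, which existing content may still reference, is never modified. The new domains and class variables appear only in the new version.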

That was the easy part: we had only been dealing with new items so far. For existing domains, the team needed to know exactly what changed between the two versions of the standard. Such a change manifest needs to be granular to be useful for SHARE. For example, the TSVAL variable in the TS domain changed in both its CDISC Notes and its Core attribute, while the only change in the CM domain is the role of CMDOSFRM. To capture changes at that level, we needed reliable machine-readable metadata as input in order to produce a reliable metadata comparison as output^[a]. At the time of this blog entry, the team is halfway through reviewing the SDTM v1.3 / SDTMIG v3.1.3 metadata spreadsheet posted on the CDISC website. With a keen eye for detail, the team has already identified several discussion-worthy discrepancies between the PDF publication and the metadata spreadsheet. We will discuss how to resolve them in upcoming team meetings^[b]. My gut feeling is that the decisions will depend on how prevalent each category of discrepancy is, i.e., one size may not fit all.

[a] Based on the existing metadata spreadsheets, there are 1,134 changes from SDTM v1.2 / SDTMIG v3.1.2 to SDTM v1.3 / SDTMIG v3.1.3. The count reflects each individual attribute change, covering variable name, label, order, data type, controlled term, role, CDISC notes, and core, and excludes the three new oncology domains.

[b] Once the issues are resolved, the metadata curator intends to generate SHARE-friendly import files from the reviewed metadata. Given the volume of changes, manual data entry would not be efficient.
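The granular comparison described above can be sketched in Python, assuming each spreadsheet has been loaded into a dictionary keyed by (domain, variable) with one entry per attribute. The `compare` function, the attribute names, and all attribute values are hypothetical placeholders. Only the TSVAL and CMDOSFRM examples and the exclusion of the new oncology domains come from the text. Each emitted record is one attribute-level change, so counting records mirrors the per-attribute counting rule in footnote [a].

```python
from typing import Optional

# Hypothetical in-memory form of one metadata spreadsheet:
# (domain, variable) -> {attribute name: attribute value}
Metadata = dict[tuple[str, str], dict[str, str]]
# One granular change record: (domain, variable, attribute, old value, new value)
Change = tuple[str, str, str, Optional[str], Optional[str]]


def compare(old: Metadata, new: Metadata,
            exclude: frozenset[str] = frozenset()) -> list[Change]:
    """List every attribute-level difference between two metadata versions.

    Domains in `exclude` are skipped entirely.
    """
    changes: list[Change] = []
    for domain, variable in sorted(old.keys() | new.keys()):
        if domain in exclude:
            continue
        before = old.get((domain, variable))
        after = new.get((domain, variable))
        if before is None or after is None:
            # Variable added to (old is None) or removed from (new is None) the domain.
            changes.append((domain, variable, "variable",
                            None if before is None else variable,
                            None if after is None else variable))
            continue
        for attribute in sorted(before.keys() | after.keys()):
            if before.get(attribute) != after.get(attribute):
                changes.append((domain, variable, attribute,
                                before.get(attribute), after.get(attribute)))
    return changes


# Placeholder attribute values; real values come from the two spreadsheets.
v312: Metadata = {
    ("TS", "TSVAL"): {"cdisc_notes": "notes (3.1.2)", "core": "core (3.1.2)"},
    ("CM", "CMDOSFRM"): {"role": "role (3.1.2)"},
}
v313: Metadata = {
    ("TS", "TSVAL"): {"cdisc_notes": "notes (3.1.3)", "core": "core (3.1.3)"},
    ("CM", "CMDOSFRM"): {"role": "role (3.1.3)"},
    ("TU", "TULAT"): {"role": "role (3.1.3)"},
}
NEW_ONCOLOGY_DOMAINS = frozenset({"RS", "TR", "TU"})
```

With this toy data, `compare(v312, v313, NEW_ONCOLOGY_DOMAINS)` yields one role change for CMDOSFRM and two changes (CDISC notes and core) for TSVAL, while TULAT is skipped. Because rows are keyed by variable name, a renamed variable shows up as a removal plus an addition. Counting it as a single variable-name change would need an extra step that pairs the two records.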

Another challenge we face is CDISC Controlled Terminology. Specifically, the ongoing evolution of codelists creates unanticipated complexity. In the sixteen months between the final publications of SDTM v1.3 (2012-07) and SDTM v1.4 (2013-11), five CDISC Controlled Terminology releases were published and new codelists were introduced, for example C78735 (EVAL: Evaluator) for --EVAL, C99079 (EPOCH: Epoch) for EPOCH, and C66728 (STENRF: Relation to Reference Period) for --STRF, --ENRF, --STRTPT, and --ENRTPT^[c]. We will have to decide how to handle this in SHARE.

[c] Credit goes to the PhUSE Semantic Technology team. By analyzing their RDF materials with SPARQL, I realized that the team had retrospectively applied these CDISC Controlled Terminology codelists to the SDTM v1.3 / SDTMIG v3.1.3 triples. It is good to see how other people interpret the standard.
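The observation in footnote [c] amounts to comparing two sets of triples. The sketch below is a pure-Python stand-in for that SPARQL analysis. It does not use the PhUSE vocabulary or query. The predicate names `hasCodelist` and `hasLabel`, the two toy graphs, and the `bindings` helper are hypothetical. The codelist codes and the variables bound to them come from the paragraph above.

```python
# (subject, predicate, object); predicate names are hypothetical.
Triple = tuple[str, str, str]
HAS_CODELIST = "hasCodelist"


def bindings(triples: set[Triple]) -> dict[str, set[str]]:
    """Group variables by bound codelist, roughly what a SPARQL query like
    SELECT ?codelist ?var WHERE { ?var :hasCodelist ?codelist }
    returns, collected per codelist."""
    grouped: dict[str, set[str]] = {}
    for subject, predicate, obj in triples:
        if predicate == HAS_CODELIST:
            grouped.setdefault(obj, set()).add(subject)
    return grouped


# Toy "as published" graph: no codelist bindings for these variables.
as_published: set[Triple] = {("EPOCH", "hasLabel", "Epoch")}
# Toy "retrospectively enriched" graph: adds the later codelists named in the text.
enriched: set[Triple] = as_published | {
    ("--EVAL", HAS_CODELIST, "C78735"),
    ("EPOCH", HAS_CODELIST, "C99079"),
    *((var, HAS_CODELIST, "C66728")
      for var in ("--STRF", "--ENRF", "--STRTPT", "--ENRTPT")),
}

# Triples present only in the enriched graph are the retrospective additions.
retrospective = bindings(enriched - as_published)
```

Working with set difference keeps the question "what was added after publication?" separate from the question "which codelist applies?". The same comparison could feed the decision on how SHARE should record codelists that postdate a standard's publication.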