Before getting to the topic, here is a little background to set the stage. During R1 development in late 2013, the SHARE dev team loaded SDTM v1.2 / SDTMIG v3.1.2 into SHARE. It serves as the baseline SDTM content. In March 2014, we held a kickoff meeting with the SDS volunteers to begin the journey of adding new content, i.e., SDTM v1.3 / SDTMIG v3.1.3 and SDTM v1.4 / SDTMIG v3.2.

 

Everything in SHARE is interconnected through relationships (see SHARE Metamodel). It was obvious we needed to divide the work into two pieces, as SDTM v1.4 is a child of SDTM v1.3, which itself is a child of SDTM v1.2. In other words, work on SDTM v1.4 can’t proceed in parallel, at least not until SDTM v1.3’s content is stable.

 

As part of a tool onboarding exercise, we thought it would be of great benefit to enter some metadata interactively online (as opposed to importing it). In April, the team was divided into three teams of two to develop content for the three new oncology domains, RS, TR, and TU, introduced in SDTMIG v3.1.3. As illustrated on the SHARE Stack diagram, the first hurdle is to correctly up-version the managed objects (or “assets,” as they are called in the tool). Since these new domains belong to the Findings class, the Findings class asset needs a new version to contain both new and existing domains. Further, the TU domain also implements the new class variables --LAT, --DIR, and --PORTOT. Therefore, the General Findings class asset also needs a new version to contain both new and existing class variables. Before SHARE, these relationships could only be deduced from section 6.1 of the SDTM v1.3 publication. SHARE forces us to be explicit and express them in a machine-readable way.
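
To make the up-versioning idea concrete, here is a minimal Python sketch. It is not how SHARE stores assets: the `Asset` class, the integer version numbers, and the abbreviated member lists are all assumptions for illustration. It only shows that a new asset version carries both the existing and the new members, while the prior version stays untouched.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for a managed object; not the SHARE tool's model."""
    name: str
    version: int              # illustrative counter, not a SHARE version id
    members: FrozenSet[str]   # domains or class variables contained in the asset

    def up_version(self, *added: str) -> "Asset":
        """Return a new version holding the existing members plus the added ones."""
        return replace(
            self,
            version=self.version + 1,
            members=self.members | frozenset(added),
        )


# Abbreviated member lists, for illustration only.
findings_domains_v1 = Asset("Findings class (domains)", 1, frozenset({"EG", "LB", "VS"}))
findings_domains_v2 = findings_domains_v1.up_version("RS", "TR", "TU")

findings_vars_v1 = Asset("Findings class (variables)", 1, frozenset({"--TESTCD", "--TEST", "--ORRES"}))
findings_vars_v2 = findings_vars_v1.up_version("--LAT", "--DIR", "--PORTOT")

print(sorted(findings_domains_v2.members))
```

Because the dataclass is frozen, the earlier version cannot be modified by accident, which mirrors the need to keep the SDTM v1.2 baseline stable while v1.3 content is added.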

 

That was the easy part, since we had only been dealing with new items thus far. For existing domains, the team needed to know exactly what changed between the two versions of the standard. Such a manifest needs to be granular to be useful for SHARE. For example, the TSVAL variable in Domain TS has a change in CDISC Notes and Core, while the only change in Domain CM is CMDOSFRM’s role. To accomplish that, we needed reliable machine-readable metadata as input in order to produce a reliable metadata comparison as output[a]. At the time of this blog entry, the team is halfway through reviewing the SDTM v1.3 / SDTMIG v3.1.3 metadata spreadsheet posted on the CDISC website. With keen eyes for detail, the team has already identified several discussion-worthy discrepancies between the PDF publication and the input metadata spreadsheet. We will discuss how to resolve them in the upcoming team meetings[b]. My gut feeling is that the decisions will depend on how prevalent each category of discrepancy is, i.e., one size may not fit all.

[a] Using the existing metadata spreadsheet, there are 1,134 changes from SDTM v1.2 / SDTMIG v3.1.2 to SDTM v1.3 / SDTMIG v3.1.3. The count reflects each attribute change, including variable name, label, order, data type, controlled term, role, CDISC notes, and core, and excludes the three new oncology domains.

[b] The intention is that, after the issues are resolved, the metadata curator will generate SHARE-friendly import files from the reviewed metadata. Given the magnitude of the changes, manual data entry would not be efficient.
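
The attribute-level comparison described above can be sketched in Python as follows. This is only an illustration, not the team's actual process. The function name `diff_metadata`, the dictionary layout, and the attribute keys are assumptions, and the sample values are invented placeholders rather than real SDTMIG content.

```python
from typing import Dict, List, Tuple

# Attributes compared per variable, following the list in footnote [a].
# The key names are assumptions; the variable name is part of the lookup key.
ATTRIBUTES = ["label", "order", "data_type", "controlled_term",
              "role", "cdisc_notes", "core"]

Key = Tuple[str, str]                    # (domain, variable name)
Metadata = Dict[Key, Dict[str, str]]     # attribute values per variable
Change = Tuple[str, str, str, str, str]  # domain, variable, attribute, old, new


def diff_metadata(old: Metadata, new: Metadata) -> List[Change]:
    """Return one row per changed attribute, plus rows for added or removed variables."""
    changes: List[Change] = []
    for domain, var in sorted(old.keys() | new.keys()):
        key = (domain, var)
        if key not in old:
            changes.append((domain, var, "variable", "", var))   # variable added
        elif key not in new:
            changes.append((domain, var, "variable", var, ""))   # variable removed
        else:
            for attr in ATTRIBUTES:
                before = old[key].get(attr, "")
                after = new[key].get(attr, "")
                if before != after:
                    changes.append((domain, var, attr, before, after))
    return changes


# Placeholder values only; they mirror the shape of the TS and CM examples.
old = {
    ("TS", "TSVAL"): {"cdisc_notes": "notes-old", "core": "core-old"},
    ("CM", "CMDOSFRM"): {"role": "role-old", "core": "core-same"},
}
new = {
    ("TS", "TSVAL"): {"cdisc_notes": "notes-new", "core": "core-new"},
    ("CM", "CMDOSFRM"): {"role": "role-new", "core": "core-same"},
}

changes = diff_metadata(old, new)
for row in changes:
    print(row)
```

Counting the rows returned gives a change total in the same spirit as the figure in footnote [a], provided both inputs are equally reliable.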

Another challenge we face is CDISC Controlled Terminology. Specifically, the evolution of codelist development creates an unanticipated complexity. In the sixteen months between the final publication of SDTM v1.3 (2012-07) and SDTM v1.4 (2013-11), five CDISC Controlled Terminology releases were published and new codelists were introduced. Examples include C78735 (EVAL: Evaluator) for --EVAL; C99079 (EPOCH: Epoch) for EPOCH; and C66728 (STENRF: Relation to Reference Period) for --STRF, --ENRF, --STRTPT, and --ENRTPT[c]. We will have to decide how to handle this in SHARE.

[c] Credits go to the PhUSE Semantic Technology team. By analyzing their RDF materials using SPARQL, I realized the team retrospectively applied these CDISC Controlled Terminology codelists to the SDTM v1.3 / SDTMIG v3.1.3 triples. It is good to see how other people may interpret the standard.

There are other hurdles too. The team will decide how to represent the value domain and conceptual domain (see ISO 11179 Part 4) for the new MedDRA variables in the Events class, as well as ISO 21090’s nullFlavor associated with the TSVALNF variable in Domain TS. Note that both involve external standards whose values are managed by a different entity. In addition, intellectual property and copyright may further complicate the matter.

 

Suffice it to say, adding existing content into SHARE is easier said than done. Challenges that would not materialize on paper now surface because the metadata repository requires us to be precise and verbose. That said, it is important to do the right thing so the community will soon benefit from truly interoperable data.

 

Lastly, I leave you with this (in reference to SHARE). Your feedback is always welcome.

 

 
