The Standards Review Council (SRC) recently reviewed the SDTM conformance rules ("Rules") produced by the SDTMV. After painstakingly combing through SDTM v1.4 and SDTMIG v3.2, the team identified 400+ rule candidates. At the time of this blog post, the SRC is working with the sub-team to address reviewer comments before making the package available for Public Review. As you can preview here, the construct is not very different from those published in the FDA SDTM Validation Rules and by the OpenCDISC Community: each Rule has an identifier, a context, a rule description written in a pre-specified lexicon, a condition, and a citation of the rule's source.
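To make that construct concrete, here is a minimal sketch of a rule as a data structure. It is only an illustration: the class name, field names, and example values are my assumptions, not part of the Rules package.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConformanceRule:
    """One rule with the five parts listed above (field names are assumptions)."""
    identifier: str   # rule identifier
    context: str      # where the rule applies, e.g. a class or domain
    description: str  # rule text, written in the pre-specified lexicon
    condition: str    # condition under which the rule is evaluated
    citation: str     # source document the rule was derived from


# Hypothetical example values, for illustration only.
example_rule = ConformanceRule(
    identifier="EX0001",
    context="Findings class",
    description="--TESTCD is no longer than 8 characters",
    condition="--TESTCD is populated",
    citation="SDTMIG v3.2",
)

print([f.name for f in fields(example_rule)])
```

Note that every field here is still free text. That is the gap between a rule that is described and a rule that is usable as metadata.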
As a Metadata Curator, I need to ask myself what the Rules mean to SHARE as metadata. The text and description are, by definition, not metadata. Extra steps are needed to tease out the metadata. I first set out to illustrate a typical rule construct, or model, shown here:
Furthermore, I formulated these objectives to help me devise solutions (my philosophy of innovation: first understand the what's before bothering with the how's):
Additionally, I self-imposed some scoping limitations, i.e., a list of "won't do's" to keep implementation simple so this can be completed within a reasonable amount of time:
After some research, along with input from volunteers and peers, I arrived at two choices. Both are open standards and fit my objectives:
At first, I found HL7 GELLO fascinating, as it supports a huge range of medical and healthcare data. After all, it is designed for clinical decision support. That said, because it requires an understanding of the HL7 RIM and a specialized toolset, it would be very difficult to find a sustainable workforce to develop and maintain rules using the GELLO framework.
A little more research revealed that GELLO is in fact based on OMG OCL. Here are a few characteristics that resonate with me and my objectives:
This diagram nicely depicts the information architecture we use and how the CDISC product family stacks up in terms of the overall model framework.
With that said and illustrated, OMG OCL is a no-brainer choice to me. UML, and hence OCL, is the next logical step to further (and complete) the architectural blueprint.
I have only recently begun studying the OCL specifications to solidify my thinking. I hope the little work I attempted helps demonstrate this proposal. Below is a subset of the SDTM Findings class drawn using Enterprise Architect:
I added a couple of OCL constraints to --TESTCD:
Their OCL expressions are as follows:
Imagine being able to run test data through the whole series of OCL constraints as an exercise to validate their correctness. This would also let us check example data for validity before including it in Implementation Guides or User Guides. As a matter of fact, this is not a far-fetched idea. This YouTube video, posted for a third-party modeling tool called MagicDraw, demonstrates the power of test automation using OCL functionality. At 6:00, the video shows how easy it is to validate an OCL constraint using some XML data: prepare an XML file guaranteed to trigger a constraint violation, then run it against the rules in compiled Java code and the auto-generated schema file. Pretty nifty.
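To give a feel for the idea without any modeling tool, here is a rough Python sketch of the same workflow: express constraints as executable checks, then feed them records that are known to violate them. This is not OCL and not the MagicDraw mechanism. The constraint names, the `LBTESTCD` variable, and the test records are my own assumptions; the three checks paraphrase the familiar SDTMIG conventions for --TESTCD (at most 8 characters, does not start with a number, only letters, numbers, and underscores).

```python
import re
from typing import Callable, Dict, List

# Hypothetical constraint set for --TESTCD; each check returns True when the value conforms.
TESTCD_CONSTRAINTS: Dict[str, Callable[[str], bool]] = {
    "max_length_8": lambda v: len(v) <= 8,
    "not_start_with_digit": lambda v: not v[:1].isdigit(),
    "allowed_characters": lambda v: re.fullmatch(r"[A-Za-z0-9_]*", v) is not None,
}


def violations(value: str) -> List[str]:
    """Return the names of all constraints that the value fails."""
    return [name for name, check in TESTCD_CONSTRAINTS.items() if not check(value)]


def validate(records: List[dict], variable: str = "LBTESTCD") -> Dict[int, List[str]]:
    """Map record index -> failed constraint names, for records with at least one failure."""
    report = {}
    for index, record in enumerate(records):
        failed = violations(record.get(variable, ""))
        if failed:
            report[index] = failed
    return report


# Test data: one conforming record and two built to trigger violations.
test_records = [
    {"LBTESTCD": "ALT"},
    {"LBTESTCD": "1TEST"},
    {"LBTESTCD": "GLUCOSE-FASTING"},
]

print(validate(test_records))
# {1: ['not_start_with_digit'], 2: ['max_length_8', 'allowed_characters']}
```

In an OCL-based setup, the constraint set would come from the model itself rather than from hand-written functions, which is exactly the appeal.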
The vision of this proposal:
In conclusion, SHARE influences a certain discipline and conduct toward the standards development process. Engineering SDTM with a UML model and refitting validation rules using OCL are not only logical, but essential to lead the industry with technical innovation. Furthermore, this will address many of the model and implementation ambiguities that currently exist. Lastly, I'd like to make a call for volunteers to further implement this proposal, perhaps through a proof-of-concept project that creates a testbed for applying model constraints and rules metadata to submission data validation and other uses.
6 Comments
Julie Evans
Really nice approach to rules.
Jozef Aerts
This is great!
As you probably know, some of us have started writing a number of these rules (FDA, SDTM, ADaM) in XQuery, which is NOT vendor/technology-neutral, as it is assumed that Dataset-XML is used for the datasets. So, if we can describe such rules using OCL, and then have them (automatically) "translated" into things that actually execute the rules on XPT, XML, JSON implementations of the standards, this will be great. "Proof of the pudding" will of course be to see whether some of the complex (e.g. ADaM) rules can be described.
This would also be a next step into machine-readable IGs.
Stetson Line
Impressive analysis. I completely agree with the vision and strategy outlined here. Let's talk about how the Rules sub-team can support these efforts.
Anthony Chow AUTHOR
UML software these days allows modelers to "annotate" an OCL expression with natural language. To me, this must be done to eliminate the "geek effect"; otherwise there will be no chance for adoption in the industry. Although there are a number of research papers examining mechanisms to translate natural language to OCL, it is a much more complicated matter to extract semantic meaning from natural language (ontology, natural language rules, etc.). For now, the first step is to get the basic framework in place so we can build some simple OCL expressions.
That's it for a Sunday morning food for thought.
Dave Iberson-Hurst
Anthony
I read this about a week ago, been meaning to comment since then, got round to it at last!
First of all, very nice post. Yes, we need this; it was always the intent to move the text-based rules into some form of metadata. As you say, it will reduce the size of the documents. Since I first read the post you have also added your "geek effect" comment. I had a similar thought/concern when I first read it. OCL is not, shall we say, user friendly, and the CDISC community might find it hard to understand.
Some specific thoughts
Again, nice post
Dave
Stephen Gelling
I would like to add my thoughts on this topic. I have already implemented a number of “workflows” that check the consistency of metadata in my organisation using the open source tool KNIME (www.knime.org).
From my experience, creating simple rules using OCL expressions will be of some benefit in the short term, but I am afraid that as you delve deeper it will become apparent that greater precision is needed and the OCL solution may show its limitations.
My belief is that we will not be able to avoid the “geek effect” as rules become more complex. It would be great to see the documentation relating to the rules to be implemented (maybe I have missed this). This should be open to public review before starting any work on a future solution.