Taking a cross-country flight from California to Dulles allows me ample time to write a blog entry. What else would I do when there is no Skymall Magazine to entertain me? Besides, Pittsburgh is getting crushed by the Jets. At least in the first quarter, anyway.
So, RCMap is the topic of this blog. The best way to give you a preview is to export the relevant components as web pages. The example is the low-density lipoprotein test, or LDL. I could not get tooltips to show in the export. Please make a mental note that the tool does show extra metadata as tooltips when I hover over the bubbles, especially those for Controlled Terminology. It looks like this:
I recently studied the SHARE metadata display templates (MDT) attached to two Therapeutic Area User Guides (TAUG), Asthma and Diabetes. My first reaction: "Oh no, more spreadsheets." I blogged here explaining how loading metadata from spreadsheets wasn't as easy as it seemed. Creating research concepts in spreadsheet format adds another dimension of complexity. But before we go deep into this blog topic, first please know I recognize the MDTs in the aforementioned TAUGs are prototypes, and I acknowledge a great deal of effort was put into creating them. Second, it helps to describe the methodology as I understood it: These templates use multiple data element concepts (DEC) to describe a group of related research concepts. Each DEC consists of a BRIDG class and attribute, an ISO 21090 data type, an SDTM variable, and CDISC Controlled Terminology. A single research concept can be created by instantiating from an MDT, subtracting irrelevant DECs, and constraining with additional controlled terms.
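To make that methodology concrete, here is a minimal Python sketch of how I picture it: a template is a list of DECs, and a research concept is what remains after dropping the irrelevant DECs and pinning some of the rest to a controlled term. Everything in it is an assumption for illustration only. The class, the `instantiate` function, the field names, the BRIDG placeholders ("ExampleClass" and friends), and the codelist and term strings are hypothetical and do not come from SHARE or the TAUG templates.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DataElementConcept:
    """One DEC of a template. Field names are illustrative, not SHARE's."""
    name: str
    bridg_class: str                 # placeholder value, not a verified BRIDG class
    bridg_attribute: str             # placeholder value, not a verified BRIDG attribute
    datatype: str                    # data type abbreviation, e.g. "CD" for coded
    sdtm_variable: str
    codelist: Optional[str] = None   # Controlled Terminology codelist name
    term: Optional[str] = None       # set once the DEC is constrained to one term


def instantiate(template, keep, constraints):
    """Derive a research concept from a template.

    keep        -- names of the DECs relevant to this concept
    constraints -- mapping of DEC name to the controlled term that pins it
    """
    concept = []
    for dec in template:
        if dec.name not in keep:
            continue  # subtract the irrelevant DEC
        if dec.name in constraints:
            # frozen dataclass: replace() returns a constrained copy
            dec = replace(dec, term=constraints[dec.name])
        concept.append(dec)
    return concept


# A hypothetical lab template with three DECs (values are illustrative).
LAB_TEMPLATE = [
    DataElementConcept("Lab Test", "ExampleClass", "exampleTest", "CD",
                       "LBTEST", codelist="Laboratory Test Name"),
    DataElementConcept("Specimen", "ExampleClass", "exampleSpecimen", "CD",
                       "LBSPEC", codelist="Specimen Type"),
    DataElementConcept("Fasting Status", "ExampleClass", "exampleFasting", "CD",
                       "LBFAST", codelist="No Yes Response"),
]

ldl = instantiate(
    LAB_TEMPLATE,
    keep={"Lab Test", "Specimen"},
    constraints={"Lab Test": "LDL Cholesterol"},
)
```

The template itself is never modified, so the same pattern can be instantiated again for the next lab test.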
The challenges, as I see them, are manifold. First, friendliness. These metadata displays were probably never meant to be user-friendly, as they are very heavy on metadata without a layman's translation such as descriptions. They are also not very machine-friendly, because worksheet tabs, merged cells, color coding, and text styling all carry hidden metadata that a program has to infer.
Second, consistency. The free-form nature of spreadsheets is not the only contributing factor. There is also the cross-referencing to Controlled Terminology. And I mean a lot of it. As I mentioned above, this research concept creation methodology hinges heavily on the use of controlled terms to constrain a template. You can imagine it requires a lot of looking things up in Controlled Terminology, followed by copying and pasting. Further, new therapeutic areas (TA) often require new codelists and terms. With many TAUGs in development concurrently, it is a definite challenge for TA teams to communicate well enough to define new items collaboratively and consistently.
Third, communication, which is tightly related to consistency. We want to reap the benefit of reusability from well-crafted MDTs and research concepts. Think about labs and how frequently they are used in safety and primary endpoints. Reuse saves time, which equates to efficiency. Moreover, a research concept can be a composite of several related research concepts. Related disease areas, such as asthma and COPD within pulmonary function disorders, can be developed in parallel. These advantages would be difficult to achieve with standalone spreadsheets.
Fourth, sustainability. Granted, this is more an issue of process and resources than of the methodology per se. It is nonetheless a growing challenge as new TAs commence at an increasing rate. The toolset for creating research concepts needs to be easy to use, quick to learn, and free of the heavy baggage of inherent spreadsheet annoyances.
All of the above reasons motivated me to explore options, thus RCMap. It is a methodology based on visualization and knowledge organization. As you may notice from the preview of the LDL test above, I used CmapTools as the conduit to demonstrate the idea. CmapTools is a visual tool. Users express ideas and concepts as bubbles (parts) and lines (relationships between any two parts). It has good search functionality. It uses a folder structure for organization. Dragging and dropping to reuse ideas and concepts is bliss. And it is familiar software to the metadata developers and is available as a free beta.
To make RCMap useful to metadata developers, I had to do some preparation work. First, I exported Controlled Terminology codelists and terms from SHARE and imported them as concepts. With the terms available as concepts, metadata developers will be able to search for the terms they need, either by name or by any of the Controlled Terminology attributes. They will be able to review the results, then drag and drop a match into the research concept they are creating. Likewise, I made SDTM variables available as reference objects. This way, metadata developers will be able to easily associate a DEC with the appropriate SDTM variables, say LBTEST and LBTESTCD for Lab Test. The result is a tight binding among controlled term, SDTM variable, and DEC. It takes away the ambiguity and the guessing game, which is what many CDISC implementers want.
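The search-then-bind workflow can be sketched in a few lines of Python. This is only my illustration of the idea, not how CmapTools implements its search. The `Term` class, the `search` and `bind` helpers, the attribute names, and the "X-0001" style codes are all hypothetical placeholders; only LBTEST and LBTESTCD come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A controlled term with a few searchable attributes (illustrative)."""
    code: str               # placeholder identifier, not a real concept code
    codelist: str
    submission_value: str
    synonyms: tuple = ()


def search(terms, text):
    """Return terms where any attribute contains the text, ignoring case."""
    needle = text.casefold()
    return [
        t for t in terms
        if any(needle in value.casefold()
               for value in (t.code, t.codelist, t.submission_value, *t.synonyms))
    ]


def bind(dec_name, sdtm_variables, term):
    """Tie a DEC to its SDTM variables and one controlled term."""
    return {
        "dec": dec_name,
        "sdtm": list(sdtm_variables),
        "codelist": term.codelist,
        "term": term.submission_value,
    }


# Hypothetical terms standing in for the codelists exported from SHARE.
TERMS = [
    Term("X-0001", "Laboratory Test Code", "LDL", ("LDL Cholesterol",)),
    Term("X-0002", "Laboratory Test Code", "HDL", ("HDL Cholesterol",)),
    Term("X-0003", "Unit", "mg/dL"),
]

matches = search(TERMS, "ldl")   # search by name or by any attribute
binding = bind("Lab Test", ["LBTEST", "LBTESTCD"], matches[0])
```

The point of the binding record is that the controlled term, the SDTM variables, and the DEC travel together, so nobody has to guess which codelist goes with which variable.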
MDTs are patterns in RCMap: patterns that contain visually appealing, predefined DECs, relationships, and controlled terms. They are readily available for reuse. RCMap does not bog down on complex research concepts that require intricate linkages to related concepts, because those linkages are simply another set of bubbles and lines, and the target concept is just a click away.
Some Excel manipulation of the XML data exported from RCMap allows me to create a metadata display that mimics the ones bundled in the two TAUGs. Here is a sample for the "LDL, Direct" lab test research concept, where you can see both the SDTM variables and the controlled terms.
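I did the flattening in Excel, but the same step could be scripted. Below is a rough Python sketch of turning bubbles and lines into table rows. Please treat the XML layout as an assumption: the `map`, `concept`, and `connection` element names, their attributes, and the relationship labels are a simplified, hypothetical stand-in, not the actual format CmapTools exports.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified export: bubbles are concepts, lines are connections.
SAMPLE_EXPORT = """
<map>
  <concept id="c1" label="LDL, Direct"/>
  <concept id="c2" label="LBTEST"/>
  <concept id="c3" label="LBTESTCD"/>
  <connection from="c1" to="c2" label="has SDTM variable"/>
  <connection from="c1" to="c3" label="has SDTM variable"/>
</map>
"""


def flatten(xml_text):
    """Turn each connection into a (source, relationship, target) row."""
    root = ET.fromstring(xml_text)
    # Look up human-readable labels by concept id.
    labels = {c.get("id"): c.get("label") for c in root.iter("concept")}
    return [
        (labels[line.get("from")], line.get("label"), labels[line.get("to")])
        for line in root.iter("connection")
    ]


rows = flatten(SAMPLE_EXPORT)
for source, relationship, target in rows:
    print(f"{source}\t{relationship}\t{target}")
```

Each row is one line of a spreadsheet-style display, so the output can be pasted into a worksheet or written out with the standard `csv` module.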
Suffice it to say, automation is not the focus here. I believe enough research concepts need to exist before automation is attainable. Only when there are sufficient patterns can meta-patterns be ascertained.
At the time of this blog entry, one of the few remaining hurdles is to encapsulate BRIDG metadata into RCMap. I hope to discuss options with the metadata developers, such as using an approach similar to the binding of SDTM variables to a DEC, i.e., binding a BRIDG class and attribute to a DEC. Another hurdle is identifying the role of SHARE in relation to RCMap. It is not a matter of technology, because CmapTools has an excellent XML technology backbone. Rather, it is a matter of the curation and governance process.
Well, my flight is on its approach to Dulles. I look forward to the 12th Annual CDISC International Interchange.