Dataset-JSON was adapted from the Dataset-XML Version 1.0 specification, but uses JSON format. Like Dataset-XML, each Dataset-JSON file is connected with a Define-XML file containing detailed information about the metadata. One aim of Dataset-JSON is to address as many of the relevant requirements in the PHUSE 2017 Transport for the Next Generation paper as possible, including the efficient use of storage space.

Dataset-JSON uses lowerCamelCase notation for attribute names, whereas Dataset-XML uses PascalCase (e.g., clinicalData vs. ClinicalData).

The JSON format itself does not define or guarantee the order of attributes within an object. However, because most JSON engines allow the order of attributes to be controlled, it is strongly recommended to follow the attribute order specified in this document. Dataset-JSON files can be large, and following the specified order enables software that reads the file with a streaming approach to work efficiently.
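As a minimal sketch of how a writer might control attribute order, the Python example below relies on two facts: Python dictionaries preserve insertion order, and json.dumps emits keys in that order unless sort_keys is set. The helper name order_top_level and the sample values are illustrative assumptions, not part of the specification; the order list is copied from the Top Level Attributes table.

```python
import json

# Recommended top-level attribute order, as listed in the Top Level Attributes table.
TOP_LEVEL_ORDER = [
    "creationDateTime",
    "datasetJSONVersion",
    "fileOID",
    "asOfDateTime",
    "originator",
    "sourceSystem",
    "sourceSystemVersion",
    "clinicalData",
    "referenceData",
]


def order_top_level(doc: dict) -> dict:
    """Return a copy of doc with known attributes in the recommended order.

    Hypothetical helper: attributes not in TOP_LEVEL_ORDER are appended at the end.
    """
    ordered = {key: doc[key] for key in TOP_LEVEL_ORDER if key in doc}
    ordered.update({key: value for key, value in doc.items() if key not in ordered})
    return ordered


# A document whose attributes were assembled in an arbitrary order.
unordered = {
    "clinicalData": {},
    "datasetJSONVersion": "1.0.0",
    "creationDateTime": "2023-03-22T11:53:27",
}

# json.dumps keeps dictionary insertion order, so the output follows the table.
text = json.dumps(order_top_level(unordered))
```

With this ordering, a streaming reader encounters the small technical attributes before the potentially very large clinicalData or referenceData object.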

Dataset-JSON must contain only one dataset per file.

Top Level Attributes

At the top level of the Dataset-JSON object, there are technical attributes and two main optional attributes, clinicalData and referenceData, corresponding to the Dataset-XML elements of the same names. At least one of the main attributes must be provided. Subject data is stored in clinicalData and non-subject data is stored in referenceData.

Attribute	Usage	Description	Attribute order
creationDateTime	Required	Time of creation of the file containing the document.	1
datasetJSONVersion	Required	Version of Dataset-JSON standard	2
fileOID	Optional	A unique identifier for this file.	3
asOfDateTime	Optional	The date/time at which the source database was queried in order to create this document.	4
originator	Optional	The organization that generated the Dataset-JSON file.	5
sourceSystem	Optional	The computer system or database management system that is the source of the information in this file.	6
sourceSystemVersion	Optional	The version of the "sourceSystem" above.	7
clinicalData	Optional	Contains datasets for clinical data across multiple subjects.	8
referenceData	Optional	Contains datasets for non-subject data domains.	9
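The rules in the table above can be checked mechanically. The following Python sketch is one possible check, assuming the file has already been parsed into a dictionary; the function name check_top_level and the message wording are hypothetical.

```python
# Attributes marked "Required" in the table above.
REQUIRED_ATTRIBUTES = ("creationDateTime", "datasetJSONVersion")
# The two main attributes, of which at least one must be provided.
MAIN_ATTRIBUTES = ("clinicalData", "referenceData")


def check_top_level(doc: dict) -> list:
    """Return a list of problems found at the top level (empty if none).

    Hypothetical helper; it only checks presence, not value formats.
    """
    problems = []
    for name in REQUIRED_ATTRIBUTES:
        if name not in doc:
            problems.append(f"missing required attribute: {name}")
    if not any(name in doc for name in MAIN_ATTRIBUTES):
        problems.append("at least one of clinicalData or referenceData must be provided")
    return problems


valid_doc = {
    "creationDateTime": "2023-03-22T11:53:27",
    "datasetJSONVersion": "1.0.0",
    "clinicalData": {},
}
problems_for_valid = check_top_level(valid_doc)
problems_for_empty = check_top_level({})
```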

Code Block

language	js

{
    "creationDateTime": "2023-03-22T11:53:27",      
    "datasetJSONVersion": "1.0.0",
    "fileOID": "www.sponsor.xyz.org.project123.final",
    "asOfDateTime": "2023-02-15T10:23:15",
    "originator": "Sponsor XYZ",
    "sourceSystem": "Software ABC",
    "sourceSystemVersion": "1.0.0",

...

Status

title	DRAFT

...

Sam Hume

...

Jozef Aerts Sam Hume

...

Introduction

JSON representations for exchange standards are widely used in today’s architectures. In RESTful web services, JSON is often the preferred format for the service response, due to its compactness and ease of use in mobile applications. Other standards used in healthcare, such as HL7 FHIR, support JSON as well as XML, together with other formats such as RDF.

JSON and XML are, however, not 1:1 interoperable, as they are based on different principles. For example, JSON does not have a native mechanism for namespaces (as it wants to remain "lightweight"). JSON also has no equivalent of XML "text content": in JSON, "text content" is treated in the same way as the "attribute pairs" of XML.

...

Starting from ODM version 2 (ODMv2) a JSON representation for ODM is available.
This document explains the principles of the JSON representation, and the conventions used. These are based on the "Flickr conventions" for JSON (https://www.flickr.com/services/api/response.json.html).

Main principles

For the JSON implementation of ODM, the following main principles apply:

Just like XML, JSON is case-sensitive.
JSON is based on sets of name-value pairs. Name and value are separated by a colon. Names are enclosed in double quotes, as are string values.
Example: "OID": "MyStudy"
XML elements are represented as JSON objects.
Arrays of values or objects are enclosed in square brackets.
For example: ["a","b","c"] represents a list of the string values "a", "b" and "c".
In JSON, an object is an unordered set of name-value pairs. An object begins with a left brace '{' and ends with a right brace '}'. When an object is the value of a name-value pair, it is preceded by the object name (in double quotes) and a colon. Within the object, each name is followed by a colon, and the name-value pairs are separated by commas.
Example:

Code Block

language	js

"Protocol": {
    "StudyEventRef": [{
        "Mandatory": "Yes",
        "OrderNumber": 1,
        "StudyEventOID": "BASELINE"
    }]
}

The example above shows the JSON serialization of the XML element "Protocol" with an array of child "StudyEventRef" elements (as can be seen from the square brackets). The single StudyEventRef shown has the attributes "Mandatory" (with value "Yes"), "OrderNumber" (with value 1) and "StudyEventOID" (with value "BASELINE"). More than one StudyEventRef may be included since it is defined as an array ("StudyEventRef": [...]). Attributes are represented as name-value pairs, such as "Mandatory": "Yes".

Note that the indentation is completely arbitrary and (just like in XML) does not imply anything. Line breaks used to format the JSON have no meaning either: even a very complex JSON or XML file of 1 GB in size can consist of one single line. Line breaks within strings (content surrounded by double quotes) are different: a literal line break is not allowed inside a string and must be replaced with \n.
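The whitespace and line-break rules can be illustrated with Python's standard json module. This is only a small sketch; the attribute name "Comment" and the sample values are made up for the illustration.

```python
import json

# The same content, once pretty-printed and once on a single line.
pretty = '{\n    "OID": "MyStudy",\n    "OrderNumber": 1\n}'
compact = '{"OID":"MyStudy","OrderNumber":1}'

# Indentation and line breaks between tokens do not change the parsed result.
same_content = json.loads(pretty) == json.loads(compact)

# A line break inside a string value is written as the escape sequence \n.
encoded = json.dumps({"Comment": "line one\nline two"})

# A literal (unescaped) line break inside a string is rejected by the parser.
try:
    json.loads('{"Comment": "line one\nline two"}')
    raw_line_break_accepted = True
except json.JSONDecodeError:
    raw_line_break_accepted = False
```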

XML text content is treated as a name-value pair with the name being "_content".
Example:

Code Block

language	js

"StudyName": {"_content": "Test Study 003"}

and combined with the parent element "GlobalVariables":

Code Block

language	js

"GlobalVariables": {
    "StudyName": {"_content": "Test Study 003"},
    "StudyDescription": {"_content": "Test Study 003 created by API"},
    "ProtocolName": {"_content": "Test Study 003 created by API"}
}
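To make the "_content" convention concrete, the following Python sketch converts a small XML fragment into the JSON structure shown above, using the standard xml.etree.ElementTree module. The function element_to_json is a hypothetical helper written for this illustration; it deliberately ignores repeated child elements, which would need to be collected into arrays.

```python
import json
import xml.etree.ElementTree as ET


def element_to_json(elem):
    """Convert an XML element into a dictionary (hypothetical helper).

    Attributes become name-value pairs, text content becomes "_content",
    and each child element becomes a nested object. Repeated children are
    not handled in this sketch.
    """
    obj = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        obj["_content"] = text
    for child in elem:
        obj[child.tag] = element_to_json(child)
    return obj


xml_text = """<GlobalVariables>
  <StudyName>Test Study 003</StudyName>
  <StudyDescription>Test Study 003 created by API</StudyDescription>
  <ProtocolName>Test Study 003 created by API</ProtocolName>
</GlobalVariables>"""

root = ET.fromstring(xml_text)
result = {root.tag: element_to_json(root)}
serialized = json.dumps(result, indent=4)
```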

Namespaces are ignored.
For ODM, this essentially means that the attribute "xml:lang" translates into "lang".
Example:

Code Block

language	js

"Description": {
    "TranslatedText": [{
        "lang": "en",
        "_content": "Unique identifier for a study."
    }]
}

This represents the ODM-XML element "Description" with the child element "TranslatedText", which has the "xml:lang" attribute with the value "en" and the text content "Unique identifier for a study."
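The namespace rule can be sketched in Python as follows. ElementTree reports a namespaced attribute such as xml:lang with the namespace URI in braces in front of the local name, so dropping the namespace amounts to keeping only the part after the closing brace. The helper local_name is a hypothetical name used for this illustration.

```python
import xml.etree.ElementTree as ET


def local_name(qualified: str) -> str:
    """Drop a leading '{namespace-uri}' part, if present (hypothetical helper)."""
    return qualified.rsplit("}", 1)[-1]


elem = ET.fromstring(
    '<TranslatedText xml:lang="en">Unique identifier for a study.</TranslatedText>'
)

# Attribute names lose their namespace; the text content goes into "_content".
translated = {local_name(name): value for name, value in elem.attrib.items()}
translated["_content"] = elem.text

# TranslatedText is repeatable, so it is wrapped in an array.
description = {"Description": {"TranslatedText": [translated]}}
```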

As usual in JSON, the root element is not explicitly named:
Example:

Code Block

language	js

{
    "CreationDateTime": "2011-10-24T10:05:00",
    "Description": "JSON test",
    "FileOID": "JSON_Test_2020",
    "FileType": "Snapshot",
    "Granularity": "Metadata",
    "ODMVersion": "2.0",
    "Originator": "MySystem",
    ...
    ...
}

This represents the ODM root element with the attributes "CreationDateTime", "Description", "FileOID", "FileType", "Granularity", "ODMVersion", and "Originator".

Dataset-JSON

Dataset-JSON is based on the Dataset-XML specification, but represents a different approach from the one described above. It utilizes JSON format specifics to efficiently store data. Each Dataset-JSON file is connected with a Define-XML file, containing detailed information about the metadata. One aim of Dataset-JSON is to address as many of the relevant requirements in the PHUSE 2017 Transport for the Next Generation paper as possible, including the efficient use of storage space.

At the top level of the Dataset-JSON object, there are two optional attributes, clinicalData and referenceData, corresponding to the Dataset-XML elements of the same names.

Code Block

language	js

{
    "clinicalData": { ... },
    "referenceData": { ... }
}

ClinicalData and ReferenceData Attributes

Both clinicalData and referenceData have the same structure. Each of these attributes contains study and metadata OIDs, an optional reference to the metadata file, and an object describing an item group (dataset). The following attributes are defined on this level.

Attribute	Requirement	Description	Attribute order
studyOID	Optional	See ODM definition for study OID (ODM/Study/@OID).	1
metaDataVersionOID	Optional	See ODM definition for metadata version OID (ODM/Study/MetaDataVersion/@OID).	2
metaDataRef	Optional	URL for a metadata file describing the data.	3
itemGroupData	Required	Object containing dataset information	4

Values of the studyOID and metaDataVersionOID must match the corresponding values in the Define-XML file.

Code Block

language	js

{
    "clinicalData": {
        "studyOID": "xxx",
        "metaDataVersionOID": "xxx",
        "metaDataRef": "https://metadata.location.org/api.link",
        "itemGroupData": { ... }
    }
}

ItemGroupData attribute

itemGroupData is an object with a single attribute corresponding to an individual dataset. There must be only one dataset per Dataset-JSON file. The attribute name is the OID of the described dataset, which must be the same as the OID of the corresponding itemGroupDef in the Define-XML file.

Code Block

language	js

"itemGroupData": { 
    "IG.DM": { ... }
}
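Because itemGroupData holds exactly one attribute, a reader can extract the dataset OID and its description in one step. The Python sketch below shows one way to do this; the function name single_dataset and the error message are assumptions made for the illustration.

```python
def single_dataset(item_group_data: dict):
    """Return (oid, dataset) for the only dataset in itemGroupData.

    Hypothetical helper: raises ValueError if the object does not contain
    exactly one attribute, since only one dataset per file is allowed.
    """
    if len(item_group_data) != 1:
        raise ValueError(
            f"expected exactly one dataset, found {len(item_group_data)}"
        )
    # Unpack the single (name, value) pair of the object.
    ((oid, dataset),) = item_group_data.items()
    return oid, dataset


oid, dataset = single_dataset({"IG.DM": {"name": "DM", "label": "Demographics"}})

# Two datasets in one file violate the rule and are rejected.
try:
    single_dataset({"IG.DM": {}, "IG.AE": {}})
    two_datasets_rejected = False
except ValueError:
    two_datasets_rejected = True
```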

The dataset description contains basic information about the dataset itself and its items.

Attribute	Requirement	Description	Attribute order
records	Required	The total number of records in a dataset	1
name	Required	Dataset name	2
label	Required	Dataset description	3
items	Required	Basic information about variables	4
itemData	Required	Dataset data	5

Code Block

language	js

"IG.DM": {
    "records": 100,
    "name": "DM",
    "label": "Demographics",
    "items": [ ... ],
    "itemData": [ ... ]
}

items is an array of basic information about dataset variables. The order of the elements in the array must be the same as the order of variables in the described dataset.

...

The first element always describes the Record Identifier (ITEMGROUPDATASEQ).

Attribute	Requirement	Description	Attribute order
OID	Required	OID of a variable (must correspond to the variable OID in the Define-XML file)	1
name	Required	Variable name	2
label	Required	Variable description	3
type	Required	Type of the variable. Allowed values: "string", "integer", "decimal", "float", "double", "boolean". See ODM types for details.	4
length	Optional	Variable length	5
displayFormat	Optional	Display format supports data visualization of numeric float and date values.	6
keySequence	Optional	Indicates that this item is a key variable in the dataset structure. It also provides an ordering for the keys.	7

Code Block

language	js

"items": [
    {
        "OID": "ITEMGROUPDATASEQ",
        "name": "ITEMGROUPDATASEQ",
        "label": "Record identifier",
        "type": "integer"
    },
    {
        "OID": "IT.DM.STUDYID",
        "name": "STUDYID",
        "label": "Study Identifier",
        "type": "string",
        "length": 12,
        "keySequence": 1
    },
    ...
]
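The rules for the items array (required attributes, allowed type values, and the Record Identifier as the first element) lend themselves to a simple check. The following Python sketch is one possible implementation; the function name check_items and the message wording are hypothetical, while the allowed types and required attributes are taken from the table above.

```python
# Allowed values of the "type" attribute, as listed in the table above.
ALLOWED_TYPES = {"string", "integer", "decimal", "float", "double", "boolean"}
# Attributes marked "Required" for each item.
REQUIRED_ITEM_ATTRIBUTES = ("OID", "name", "label", "type")


def check_items(items: list) -> list:
    """Return a list of problems found in an items array (empty if none)."""
    problems = []
    # The first element always describes the Record Identifier.
    if not items or items[0].get("name") != "ITEMGROUPDATASEQ":
        problems.append("first item must describe ITEMGROUPDATASEQ")
    for position, item in enumerate(items):
        for attribute in REQUIRED_ITEM_ATTRIBUTES:
            if attribute not in item:
                problems.append(f"item {position}: missing {attribute}")
        if "type" in item and item["type"] not in ALLOWED_TYPES:
            problems.append(f"item {position}: unsupported type {item['type']!r}")
    return problems


good_items = [
    {"OID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ",
     "label": "Record identifier", "type": "integer"},
    {"OID": "IT.DM.STUDYID", "name": "STUDYID", "label": "Study Identifier",
     "type": "string", "length": 12, "keySequence": 1},
]
bad_items = [
    {"OID": "IT.DM.AGE", "name": "AGE", "label": "Age", "type": "number"},
]
good_problems = check_items(good_items)
bad_problems = check_items(bad_items)
```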

itemData is an array of records with variable values. Each record is itself represented as an array of variable values. The first value is a unique sequence number for each record in the dataset.

Code Block

language	js

"itemData": [
   [1, "MyStudy", "001", "DM", 56],
   [2, "MyStudy", "002", "DM", 26],
   ...
]

Missing values are represented by null in the case of numeric variables, and by an empty string in the case of character variables: [1, "MyStudy", "", "DM", null]
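Since records are stored as plain arrays, a consumer needs the items array to know which value belongs to which variable. The Python sketch below pairs each value with its variable name by position and applies the missing-value rule; the function names rows_to_records and is_missing are assumptions made for this illustration. In Python, a JSON null is parsed as None.

```python
def rows_to_records(items: list, item_data: list) -> list:
    """Pair each row value with the variable name at the same position.

    Hypothetical helper: relies on items and row values sharing one order.
    """
    names = [item["name"] for item in items]
    records = []
    for row in item_data:
        if len(row) != len(names):
            raise ValueError("row length does not match the number of items")
        records.append(dict(zip(names, row)))
    return records


def is_missing(value) -> bool:
    """null (None) marks a missing numeric value, "" a missing character value."""
    return value is None or value == ""


items = [
    {"OID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ",
     "label": "Record identifier", "type": "integer"},
    {"OID": "IT.STUDYID", "name": "STUDYID", "label": "Study identifier", "type": "string"},
    {"OID": "IT.USUBJID", "name": "USUBJID", "label": "Unique Subject Identifier", "type": "string"},
    {"OID": "IT.DOMAIN", "name": "DOMAIN", "label": "Domain Identifier", "type": "string"},
    {"OID": "IT.AGE", "name": "AGE", "label": "Subject Age", "type": "integer"},
]
item_data = [
    [1, "MyStudy", "001", "DM", 56],
    [2, "MyStudy", "", "DM", None],
]
records = rows_to_records(items, item_data)
missing_in_second = [name for name, value in records[1].items() if is_missing(value)]
```

Note that a numeric value of 0 is not treated as missing by this check; only null and the empty string are.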

The following is a full example of a Dataset-JSON file:

Code Block

language	js

{
    "creationDateTime": "2023-03-22T11:53:27",      
    "datasetJSONVersion": "1.0.0",    
    "fileOID": "www.sponsor.org.project123.final",
    "asOfDateTime": "2023-02-15T10:23:15",
    "originator": "Sponsor XYZ",
    "sourceSystem": "Software ABC",
    "sourceSystemVersion": "1.2.3",
    "clinicalData": {
        "studyOID": "xxx",
        "metaDataVersionOID": "xxx",
        "metaDataRef": "https://metadata.location.org/api.link",
        "itemGroupData": {
            "IG.DM": {
                "records": 600,
                "name": "DM",
                "label": "Demographics",
                "items": [                      
                    {"OID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ", "label": "Record identifier", "type": "integer"},
                    {"OID": "IT.STUDYID", "name": "STUDYID", "label": "Study identifier", "type": "string", "length": 7, "keySequence": 1}, 
                    {"OID": "IT.USUBJID", "name": "USUBJID", "label": "Unique Subject Identifier", "type": "string", "length": 3, "keySequence": 2}, 
                    {"OID": "IT.DOMAIN", "name": "DOMAIN", "label": "Domain Identifier", "type": "string", "length": 2},
                    {"OID": "IT.AGE", "name": "AGE", "label": "Subject Age", "type": "integer", "length": 2}
                ],
                "itemData": [
                    [1, "MyStudy", "001", "DM", 56],
                    [2, "MyStudy", "002", "DM", 26],
                    ...
                ]
            }
        }
    },
    "referenceData": {
        ... Same structure as clinical data
    }
}

The TypeScript model representation and the JSON schema for Dataset-JSON version 1.0 can be found at https://github.com/cdisc-org/DataExchange-DatasetJson.
