Dataset-JSON was adapted from the Dataset-XML Version 1.0 specification, but uses the JSON format. Like Dataset-XML, each Dataset-JSON file is connected with a Define-XML file containing detailed information about the metadata. One aim of Dataset-JSON is to address as many of the relevant requirements in the PHUSE 2017 Transport for the Next Generation paper as possible, including the efficient use of storage space.

Dataset-JSON uses lowerCamelCase notation for attribute names, in contrast to the PascalCase used in Dataset-XML (e.g., clinicalData vs ClinicalData).

The JSON format does not define or guarantee the order of attributes within an object. Nevertheless, because most JSON engines allow the order of attributes to be controlled, it is strongly recommended to follow the attribute order specified for each group of attributes. Dataset-JSON files can be large, and following the specified order enables software that reads a file with a streaming approach to work quickly and efficiently.
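As an illustration of controlling attribute order on output, the following minimal Python sketch reorders the top-level attributes before serializing. The helper name order_top_level is hypothetical, and the order list is taken from the "Attribute order" column for top-level attributes. Python's json module writes keys in dictionary insertion order, so no further configuration is assumed to be necessary.

```python
import json

# Recommended order of the top-level attributes ("Attribute order" column).
TOP_LEVEL_ORDER = [
    "creationDateTime", "datasetJSONVersion", "fileOID", "asOfDateTime",
    "originator", "sourceSystem", "sourceSystemVersion",
    "clinicalData", "referenceData",
]

def order_top_level(doc: dict) -> dict:
    """Return a copy of doc with known top-level attributes in the recommended order."""
    ordered = {key: doc[key] for key in TOP_LEVEL_ORDER if key in doc}
    # Keep unrecognized attributes at the end rather than dropping them.
    ordered.update({key: value for key, value in doc.items() if key not in ordered})
    return ordered

# Hypothetical input whose attributes arrive in an arbitrary order.
unordered = {
    "clinicalData": {"itemGroupData": {}},
    "datasetJSONVersion": "1.0.0",
    "creationDateTime": "2023-03-22T11:53:27",
}
text = json.dumps(order_top_level(unordered), indent=4)
print(text)
```

The same approach could be applied to the nested objects, using the order given for each level.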

Dataset-JSON must contain only one dataset per file. 

Top Level Attributes

At the top level of the Dataset-JSON object, there are technical attributes and two main optional attributes, clinicalData and referenceData, which correspond to Dataset-XML elements. At least one of the main attributes must be provided. Subject data is stored in clinicalData and non-subject data is stored in referenceData.

Attribute | Usage | Description | Attribute order
creationDateTime | Required | Time of creation of the file containing the document. | 1
datasetJSONVersion | Required | Version of Dataset-JSON standard | 2
fileOID | Optional | A unique identifier for this file. | 3
asOfDateTime | Optional | The date/time at which the source database was queried in order to create this document. | 4
originator | Optional | The organization that generated the Dataset-JSON file. | 5
sourceSystem | Optional | The computer system or database management system that is the source of the information in this file. | 6
sourceSystemVersion | Optional | The version of the "sourceSystem" above. | 7
clinicalData | Optional | Contains datasets for clinical data across multiple subjects. | 8
referenceData | Optional | Contains datasets for non-subject data domains. | 9
Code Block
languagejs
{
    "creationDateTime": "2023-03-22T11:53:27",      
    "datasetJSONVersion": "1.0.0",
    "fileOID": "www.sponsor.xyz.org.project123.final",
    "asOfDateTime": "2023-02-15T10:23:15",
    "originator": "Sponsor XYZ",
    "sourceSystem": "Software ABC",
    "sourceSystemVersion": "1.0.0",
    "clinicalData": { ... },
    "referenceData": { ... }
}
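The top-level rules can be checked with a short Python sketch such as the one below. It covers only what is stated for this level: the two required technical attributes and the presence of at least one main attribute. The function name check_top_level is hypothetical and not part of the standard.

```python
REQUIRED_TOP_LEVEL = ("creationDateTime", "datasetJSONVersion")
MAIN_ATTRIBUTES = ("clinicalData", "referenceData")

def check_top_level(doc: dict) -> list[str]:
    """Return a list of problems found in the top-level attributes (empty if none)."""
    problems = [f"missing required attribute: {name}"
                for name in REQUIRED_TOP_LEVEL if name not in doc]
    if not any(name in doc for name in MAIN_ATTRIBUTES):
        problems.append("at least one of clinicalData or referenceData must be provided")
    return problems

valid_doc = {
    "creationDateTime": "2023-03-22T11:53:27",
    "datasetJSONVersion": "1.0.0",
    "clinicalData": {"itemGroupData": {}},
}
print(check_top_level(valid_doc))               # []
print(check_top_level({"fileOID": "example"}))  # lists three problems
```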

ClinicalData and ReferenceData Attributes

Both clinicalData and referenceData have the same structure. Each of these attributes contains study and metadata OIDs, an optional reference to the metadata file, and an object describing an item group (dataset). The following attributes are defined at this level.

Attribute | Requirement | Description | Attribute order
studyOID | Optional | See ODM definition for study OID (ODM/Study/@OID). | 1
metaDataVersionOID | Optional | See ODM definition for metadata version OID (ODM/Study/MetaDataVersion/@OID). | 2
metaDataRef | Optional | URL for a metadata file describing the data. | 3
itemGroupData | Required | Object containing dataset information | 4


Values of the studyOID and metaDataVersionOID must match corresponding values in the Define-XML file.

Code Block
languagejs
{
    "clinicalData": {
        "studyOID": "xxx",
        "metaDataVersionOID": "xxx",
        "metaDataRef": "https://metadata.location.org/api.link",
        "itemGroupData": { ... }
    }
}
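The requirement that studyOID and metaDataVersionOID match the Define-XML file can be sketched in Python as follows. This is an illustration under stated assumptions: the Define-XML document is assumed to use the ODM 1.3 namespace, the embedded XML is a made-up and heavily trimmed example, and the function name oids_match is hypothetical. The sketch treats a missing OID on either side as a mismatch.

```python
import xml.etree.ElementTree as ET

# Assumption: the Define-XML file uses the ODM 1.3 namespace.
ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

def oids_match(section: dict, define_xml: str) -> bool:
    """Compare studyOID and metaDataVersionOID with ODM/Study/@OID and
    ODM/Study/MetaDataVersion/@OID taken from a Define-XML document."""
    root = ET.fromstring(define_xml)
    study = root.find("odm:Study", ODM_NS)
    if study is None:
        return False
    mdv = study.find("odm:MetaDataVersion", ODM_NS)
    if mdv is None:
        return False
    return (section.get("studyOID") == study.get("OID")
            and section.get("metaDataVersionOID") == mdv.get("OID"))

# Made-up, trimmed Define-XML content used only for illustration.
DEFINE_XML = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="STUDY.001">
    <MetaDataVersion OID="MDV.001"/>
  </Study>
</ODM>"""

clinical_data = {"studyOID": "STUDY.001", "metaDataVersionOID": "MDV.001"}
print(oids_match(clinical_data, DEFINE_XML))
```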

ItemGroupData attribute

itemGroupData is an object with a single attribute corresponding to an individual dataset. There must be only one dataset per Dataset-JSON file. The attribute name is the OID of the described dataset, which must be the same as the OID of the corresponding ItemGroupDef in the Define-XML file.

Code Block
languagejs
"itemGroupData": {
    "IG.DM": { ... }
}
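Because itemGroupData holds exactly one dataset keyed by its OID, a reader can locate the dataset without knowing the OID in advance. The Python sketch below shows one way to do this; the function name get_dataset is hypothetical, and the sketch simply uses whichever of clinicalData or referenceData it finds first.

```python
def get_dataset(doc: dict) -> tuple[str, dict]:
    """Return (dataset OID, dataset description) for the single dataset in doc."""
    for section_name in ("clinicalData", "referenceData"):
        section = doc.get(section_name)
        if section is None:
            continue
        item_groups = section["itemGroupData"]
        if len(item_groups) != 1:
            raise ValueError(f"{section_name}.itemGroupData must contain exactly one dataset")
        oid, dataset = next(iter(item_groups.items()))
        return oid, dataset
    raise ValueError("neither clinicalData nor referenceData is present")

# Trimmed document used only for illustration.
example_doc = {"clinicalData": {"itemGroupData": {"IG.DM": {"name": "DM"}}}}
dataset_oid, dataset = get_dataset(example_doc)
print(dataset_oid, dataset["name"])   # IG.DM DM
```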



The dataset description contains basic information about the dataset itself and its items.

Attribute | Requirement | Description | Attribute order
records | Required | The total number of records in a dataset | 1
name | Required | Dataset name | 2
label | Required | Dataset description | 3
items | Required | Basic information about variables | 4
itemData | Required | Dataset data | 5
Code Block
languagejs
"IG.DM": {
    "records": 100,
    "name": "DM",
    "label": "Demographics",
    "items": [ ... ],
    "itemData": [ ... ]
}
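A dataset description can be checked against the attributes listed above with a sketch like the following. The function name check_dataset is hypothetical. The comparison of records with the number of rows in itemData is an assumption that holds only when the file carries the complete dataset.

```python
DATASET_ATTRIBUTES = ("records", "name", "label", "items", "itemData")

def check_dataset(dataset: dict) -> list[str]:
    """Return a list of problems found in a dataset description (empty if none)."""
    problems = [f"missing required attribute: {name}"
                for name in DATASET_ATTRIBUTES if name not in dataset]
    # Assumption: the file holds the complete dataset, so "records" should
    # equal the number of rows in "itemData".
    if not problems and dataset["records"] != len(dataset["itemData"]):
        problems.append("records does not match the number of rows in itemData")
    return problems

# Small made-up dataset with a single variable and two records.
dm = {
    "records": 2,
    "name": "DM",
    "label": "Demographics",
    "items": [{"OID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ",
               "label": "Record identifier", "type": "integer"}],
    "itemData": [[1], [2]],
}
print(check_dataset(dm))   # []
```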

items is an array of basic information about dataset variables. The order of the elements in the array must be the same as the order of variables in the described dataset. The first element always describes the Record Identifier (ITEMGROUPDATASEQ).

Attribute | Requirement | Description | Attribute order
OID | Required | OID of a variable (must correspond to the variable OID in the Define-XML file) | 1
name | Required | Variable name | 2
label | Required | Variable description | 3
type | Required | Type of the variable. Allowed values: "string", "integer", "decimal", "float", "double", "boolean". See ODM types for details. | 4
length | Optional | Variable length | 5
displayFormat | Optional | Display format supports data visualization of numeric float and date values. | 6
keySequence | Optional | Indicates that this item is a key variable in the dataset structure. It also provides an ordering for the keys. | 7

Code Block
languagejs
"items": [
    {
        "OID": "ITEMGROUPDATASEQ",
        "name": "ITEMGROUPDATASEQ",
        "label": "Record identifier",
        "type": "integer"
    },
    {
        "OID": "IT.DM.STUDYID",
        "name": "STUDYID",
        "label": "Study Identifier",
        "type": "string",
        "length": 12,
        "keySequence": 1
    },
    ...
]
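The rules for the items array (required attributes, allowed types, and the Record Identifier in first position) can be expressed as a small Python check. This is a sketch only; the function name check_items is hypothetical, and optional attributes are not inspected.

```python
ALLOWED_TYPES = {"string", "integer", "decimal", "float", "double", "boolean"}
REQUIRED_ITEM_ATTRIBUTES = ("OID", "name", "label", "type")

def check_items(items: list[dict]) -> list[str]:
    """Return a list of problems found in an items array (empty if none)."""
    problems = []
    # The first element always describes the Record Identifier.
    if not items or items[0].get("name") != "ITEMGROUPDATASEQ":
        problems.append("first item must describe ITEMGROUPDATASEQ")
    for position, item in enumerate(items):
        for attribute in REQUIRED_ITEM_ATTRIBUTES:
            if attribute not in item:
                problems.append(f"item {position}: missing {attribute}")
        if "type" in item and item["type"] not in ALLOWED_TYPES:
            problems.append(f"item {position}: unknown type {item['type']!r}")
    return problems

example_items = [
    {"OID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ",
     "label": "Record identifier", "type": "integer"},
    {"OID": "IT.DM.STUDYID", "name": "STUDYID", "label": "Study Identifier",
     "type": "string", "length": 12, "keySequence": 1},
]
print(check_items(example_items))   # []
```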

itemData is an array of records containing variable values. Each record is itself represented as an array of variable values. The first value is a unique sequence number for each record in the dataset.

Code Block
languagejs
"itemData": [
   [1, "MyStudy", "001", "DM", 56],
   [2, "MyStudy", "002", "DM", 26],
   ...
]

Missing values are represented by null for numeric variables and by an empty string for character variables: [1, "MyStudy", "", "DM", null]
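Since each record is a plain array, values are tied to variables only by position. The Python sketch below pairs each value with the name of the corresponding element of items and shows how the two kinds of missing values appear after parsing. The function name rows_as_dicts is hypothetical, and the dataset is a trimmed, made-up example.

```python
import json

def rows_as_dicts(dataset: dict) -> list[dict]:
    """Pair every value in each itemData record with the name of its item."""
    names = [item["name"] for item in dataset["items"]]
    rows = []
    for row in dataset["itemData"]:
        if len(row) != len(names):
            raise ValueError("record length does not match the number of items")
        rows.append(dict(zip(names, row)))
    return rows

# Trimmed dataset description; only the attributes used here are included.
dataset = json.loads("""
{
    "items": [
        {"name": "ITEMGROUPDATASEQ", "type": "integer"},
        {"name": "STUDYID", "type": "string"},
        {"name": "USUBJID", "type": "string"},
        {"name": "DOMAIN", "type": "string"},
        {"name": "AGE", "type": "integer"}
    ],
    "itemData": [
        [1, "MyStudy", "001", "DM", 56],
        [2, "MyStudy", "", "DM", null]
    ]
}
""")

records = rows_as_dicts(dataset)
print(records[1]["USUBJID"] == "")   # True: missing character value
print(records[1]["AGE"] is None)     # True: JSON null is read as None
```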

The following is a full example of a Dataset-JSON file:

Code Block
languagejs
{
    "creationDateTime": "2023-03-22T11:53:27",
    "datasetJSONVersion": "1.0.0",
    "fileOID": "www.sponsor.org.project123.final",
    "asOfDateTime": "2023-02-15T10:23:15",
    "originator": "Sponsor XYZ",
    "sourceSystem": "Software ABC",
    "sourceSystemVersion": "1.2.3",
    "clinicalData": {
        "studyOID": "xxx",
        "metaDataVersionOID": "xxx",
        "metaDataRef": "https://metadata.location.org/api.link",
        "itemGroupData": {
            "IG.DM": {
                "records": 600,
                "name": "DM",
                "label": "Demographics",
                "items": [
                    {"OID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ", "label": "Record identifier", "type": "integer"},
                    {"OID": "IT.STUDYID", "name": "STUDYID", "label": "Study identifier", "type": "string", "length": 7, "keySequence": 1},
                    {"OID": "IT.USUBJID", "name": "USUBJID", "label": "Unique Subject Identifier", "type": "string", "length": 3, "keySequence": 2},
                    {"OID": "IT.DOMAIN", "name": "DOMAIN", "label": "Domain Identifier", "type": "string", "length": 2},
                    {"OID": "IT.AGE", "name": "AGE", "label": "Subject Age", "type": "integer", "length": 5, "fractiondigits": 2}
                ],
                "itemData": [
                    [1, "MyStudy", "001", "DM", 56],
                    [2, "MyStudy", "002", "DM", 26],
                    ...
                ]
            }
        }
    },
    "referenceData": {
        ... Same structure as clinical data
    }
}

The TypeScript model representation and the JSON schema for Dataset-JSON version 1.0 can be found at https://github.com/cdisc-org/DataExchange-DatasetJson.
