Dataset-JSON was adapted from the Dataset-XML Version 1 specification, but uses the JSON format. Like Dataset-XML, each Dataset-JSON file is linked to a Define-XML file, which contains detailed information about the metadata. One aim of Dataset-JSON is to address as many of the relevant requirements in the PHUSE 2017 Transport for the Next Generation paper as possible, including the efficient use of storage space.

At the top level of a Dataset-JSON object there are two optional attributes, clinicalData and referenceData, which correspond to the Dataset-XML elements of the same names.

{
    "clinicalData": { ... },
    "referenceData": { ... }
}

Each of these attributes contains the study and metadata OIDs as well as an object describing one or more item groups (datasets). The values of studyOID and metaDataVersionOID must match the corresponding values in the Define-XML file.

{
    "clinicalData": {
        "studyOID": "xxx",
        "metaDataVersionOID": "xxx",
        "itemGroupData": { ... }
    }
}
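
The OID matching rule above can be checked mechanically. The following is a minimal sketch, assuming the Define-XML file follows the usual ODM 1.3 layout (an ODM root with a Study element containing a MetaDataVersion element). The function names and the OIDs "ST.1" and "MDV.1" are hypothetical and used for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumption: the Define-XML document uses the ODM 1.3 namespace.
ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}


def define_oids(define_xml: str) -> tuple:
    """Return (Study OID, MetaDataVersion OID) read from a Define-XML string."""
    root = ET.fromstring(define_xml)
    study = root.find("odm:Study", ODM_NS)
    metadata_version = study.find("odm:MetaDataVersion", ODM_NS)
    return study.get("OID"), metadata_version.get("OID")


def oids_match(section: dict, define_xml: str) -> bool:
    """Compare studyOID and metaDataVersionOID of a clinicalData or
    referenceData object with the values found in the Define-XML string."""
    found = (section["studyOID"], section["metaDataVersionOID"])
    return found == define_oids(define_xml)


# Hypothetical, heavily reduced Define-XML content.
DEFINE_XML = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="ST.1">
    <MetaDataVersion OID="MDV.1"/>
  </Study>
</ODM>"""

clinical_data = {
    "studyOID": "ST.1",
    "metaDataVersionOID": "MDV.1",
    "itemGroupData": {},
}
print(oids_match(clinical_data, DEFINE_XML))
```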

itemGroupData is an object whose attributes correspond to individual datasets. Each attribute name is the OID of the described dataset, which must be the same as the OID of the corresponding itemGroup in the Define-XML file.

"itemGroupData": { 
    "IG.DM": { ... }
}

The dataset description contains basic information about the dataset itself and its items.

  • records - the total number of records in a dataset
  • name - dataset name
  • label - dataset description
  • items - basic information about variables
  • itemData - dataset data

"IG.DM": {
    "records": 100,
    "name": "DM",
    "label": "Demographics",
    "items": [ ... ],
    "itemData": [ ... ]
}
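
Because records states the total number of records, a reader can compare it with the length of itemData as a basic consistency check. This is a minimal sketch; the function name check_record_count is hypothetical and the specification does not prescribe such a check.

```python
def check_record_count(dataset: dict) -> bool:
    """Return True when 'records' equals the number of rows in 'itemData'."""
    return dataset["records"] == len(dataset["itemData"])


# A reduced dataset description with two records.
dm = {
    "records": 2,
    "name": "DM",
    "label": "Demographics",
    "items": [],
    "itemData": [
        [1, "MyStudy", "001", "DM", 56],
        [2, "MyStudy", "002", "DM", 26],
    ],
}
print(check_record_count(dm))
```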

items is an array of basic information about dataset variables. The order of elements in the array must be the same as the order of variables in the described dataset.

  • OID - OID of a variable (must correspond to the variable OID in the Define-XML file)
  • name - variable name
  • label - variable description
  • type - type of the variable. One of 'string', 'integer', 'float', 'double', 'decimal', 'boolean'. See ODM types for details.
  • length - variable length
  • fractionDigits - number of digits to the right of the decimal point when the variable type is float

"items": [    
    {
        "OID": "ITEMGROUPDATASEQ",
        "name": "ITEMGROUPDATASEQ",
        "label": "Record identifier",
        "type": "integer"
    },
    {
        "OID": "IT.DM.STUDYID",
        "name": "STUDYID",
        "label": "Study Identifier",
        "type": "string",
        "length": 12
    },
    ...
]
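
The type attribute can be used to check the values of a record. The sketch below assumes a mapping from Dataset-JSON types to Python types that is not defined by this page. In particular, how a 'decimal' value is encoded in JSON is an assumption here, and null values are skipped. The names PYTHON_TYPES and type_errors are hypothetical.

```python
# Assumed mapping from Dataset-JSON item types to acceptable Python types.
PYTHON_TYPES = {
    "string": (str,),
    "integer": (int,),
    "float": (int, float),
    "double": (int, float),
    "decimal": (int, float, str),  # assumption: the encoding is not specified here
    "boolean": (bool,),
}


def type_errors(items: list, record: list) -> list:
    """Return the names of variables whose value does not fit the declared type."""
    if len(items) != len(record):
        raise ValueError("record length differs from the number of items")
    errors = []
    for item, value in zip(items, record):
        if value is None:
            continue  # null values are not type-checked in this sketch
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        if isinstance(value, bool) and item["type"] != "boolean":
            errors.append(item["name"])
        elif not isinstance(value, PYTHON_TYPES[item["type"]]):
            errors.append(item["name"])
    return errors


items = [
    {"OID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ", "type": "integer"},
    {"OID": "IT.DM.STUDYID", "name": "STUDYID", "type": "string", "length": 12},
]
print(type_errors(items, [1, "MyStudy"]))
print(type_errors(items, ["1", "MyStudy"]))
```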

itemData is an array of records. Each record is itself an array of variable values. The first value is a unique sequence number for each record in the dataset.

"itemData": [
   [1, "MyStudy", "001", "DM", 56],
   [2, "MyStudy", "002", "DM", 26],
   ...
]

Missing values are represented by null for numeric variables and by an empty string for character variables, for example: [1, "MyStudy", "", "DM", null]
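
A consumer often wants each record as a name-to-value mapping with one uniform marker for missing values. The sketch below pairs a record with its items and turns an empty string in a string variable into None, so that both kinds of missing value look the same. The function name record_to_dict is hypothetical, and treating every empty string as missing is a design choice of this sketch.

```python
def record_to_dict(items: list, record: list) -> dict:
    """Pair each value with its variable name, using None for missing values."""
    result = {}
    for item, value in zip(items, record):
        if item["type"] == "string" and value == "":
            value = None  # an empty string marks a missing character value
        result[item["name"]] = value  # JSON null is already None after parsing
    return result


items = [
    {"name": "ITEMGROUPDATASEQ", "type": "integer"},
    {"name": "STUDYID", "type": "string"},
    {"name": "USUBJID", "type": "string"},
    {"name": "DOMAIN", "type": "string"},
    {"name": "AGE", "type": "float"},
]
row = record_to_dict(items, [1, "MyStudy", "", "DM", None])
print(row)
```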

The full example of a Dataset-JSON file:

{
    "clinicalData": {
        "studyOID": "xxx",
        "metaDataVersionOID": "xxx",
        "itemGroupData": {
            "IG.DM": {
                "records": 600,
                "name": "DM",
                "label": "Demographics",
                "items": [
                    {"OID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ", "label": "Record identifier", "type": "integer"},
                    {"OID": "IT.STUDYID", "name": "STUDYID", "label": "Study identifier", "type": "string", "length": 12}, 
                    {"OID": "IT.USUBJID", "name": "USUBJID", "label": "Unique Subject Identifier", "type": "string", "length": 3}, 
                    {"OID": "IT.DOMAIN", "name": "DOMAIN", "label": "Domain Identifier", "type": "string", "length": 2},
                    {"OID": "IT.AGE", "name": "AGE", "label": "Subject Age", "type": "float", "length": 5, "fractionDigits": 2}
                ],
                "itemData": [
                    [1, "MyStudy", "001", "DM", 56],
                    [2, "MyStudy", "002", "DM", 26],
                    ...
                ]
            }
        }
    },
    "referenceData": {
        ... Same structure as clinical data
    }
}









Dataset-JSON was adapted from the Dataset-XML Version 1 specification, but uses JSON format. Like Dataset-XML, each Dataset-JSON file is connected with a Define-XML file, containing detailed information about the metadata. One aim of Dataset-JSON is to address as many of the relevant requirements in the PHUSE 2017 Transport for the Next Generation paper as possible, including the efficient use of storage space.

Dataset-JSON uses lowerCamelCase notation for attribute names, in contrast to the PascalCase used by Dataset-XML (e.g., clinicalData vs ClinicalData).

clinicalData and referenceData Attributes

At the top level of a Dataset-JSON object there are two optional attributes, clinicalData and referenceData, which correspond to Dataset-XML elements. At least one of the attributes must be provided. Subject data is stored in clinicalData and non-subject data is stored in referenceData.

{
    "clinicalData": { ... },
    "referenceData": { ... }
}
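
The rule that at least one of the two attributes must be present can be expressed as a short check. This is a sketch only; the function name top_level_sections and the error message are hypothetical.

```python
import json


def top_level_sections(document: dict) -> list:
    """Return the top-level data attributes present in a Dataset-JSON object.

    Raises ValueError when neither clinicalData nor referenceData is provided.
    """
    present = [key for key in ("clinicalData", "referenceData") if key in document]
    if not present:
        raise ValueError("at least one of clinicalData or referenceData is required")
    return present


document = json.loads('{"referenceData": {}}')
print(top_level_sections(document))
```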

Both clinicalData and referenceData have the same structure. Each of these attributes contains the study and metadata OIDs as well as an object describing one or more item groups (datasets). The values of studyOID and metaDataVersionOID must match the corresponding values in the Define-XML file.

{
    "clinicalData": {
        "studyOID": "xxx",
        "metaDataVersionOID": "xxx",
        "itemGroupData": { ... }
    }
}

itemGroupData Attribute

itemGroupData is an object whose attributes correspond to individual datasets. Each attribute name is the OID of the described dataset, which must be the same as the OID of the corresponding itemGroup in the Define-XML file.

"itemGroupData": { 
    "IG.DM": { ... },
    "IG.AE": { ... }
}
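
Since each key of itemGroupData is a dataset OID, the datasets in a file can be enumerated by walking both top-level attributes. The following is a minimal sketch; the function name list_datasets is hypothetical.

```python
def list_datasets(document: dict) -> list:
    """Return (section, dataset OID, dataset name) for every dataset in the file."""
    found = []
    for section in ("clinicalData", "referenceData"):
        # A missing section or a missing itemGroupData yields no datasets.
        groups = document.get(section, {}).get("itemGroupData", {})
        for oid, dataset in groups.items():
            found.append((section, oid, dataset.get("name")))
    return found


document = {
    "clinicalData": {
        "studyOID": "xxx",
        "metaDataVersionOID": "xxx",
        "itemGroupData": {
            "IG.DM": {"name": "DM"},
            "IG.AE": {"name": "AE"},
        },
    }
}
for section, oid, name in list_datasets(document):
    print(section, oid, name)
```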

The dataset description contains basic information about the dataset itself and its items.

  • records - the total number of records in a dataset
  • name - dataset name
  • label - dataset description
  • items - basic information about variables
  • itemData - dataset data

"IG.DM": {
    "records": 100,
    "name": "DM",
    "label": "Demographics",
    "items": [ ... ],
    "itemData": [ ... ]
}

items is an array of basic information about dataset variables. The order of elements in the array must be the same as the order of variables in the described dataset.

  • OID - OID of a variable (must correspond to the variable OID in the Define-XML file)
  • name - variable name
  • label - variable description
  • type - type of the variable. One of 'string', 'integer', 'float', 'double', 'boolean'. See ODM types for details.
  • length - variable length
  • fractionDigits - number of digits to the right of the decimal point when the variable type is float

"items": [    
    {
        "OID": "ITEMGROUPDATASEQ",
        "name": "ITEMGROUPDATASEQ",
        "label": "Record identifier",
        "type": "integer"
    },
    {
        "OID": "IT.DM.STUDYID",
        "name": "STUDYID",
        "label": "Study Identifier",
        "type": "string",
        "length": 12
    },
    ...
]

itemData is an array of records. Each record is itself an array of variable values.

"itemData": [
   [1, "MyStudy", "001", "DM", 56],
   [2, "MyStudy", "002", "DM", 26]
]

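Missing values are represented by null for numeric variables and by an empty string for character variables, because empty array elements are not valid JSON: [1, "MyStudy", "", "DM", null]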

The full example of a Dataset-JSON file:

{
    "clinicalData": {
        "studyOID": "xxx",
        "metaDataVersionOID": "xxx",
        "itemGroupData": {
            "IG.DM": {
                "records": 600,
                "name": "DM",
                "label": "Demographics",
                "items": [                      
                    {"OID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ", "label": "Record identifier", "type": "integer"},
                    {"OID": "IT.STUDYID", "name": "STUDYID", "label": "Study identifier", "type": "string", "length": 7}, 
                    {"OID": "IT.USUBJID", "name": "USUBJID", "label": "Unique Subject Identifier", "type": "string", "length": 3}, 
                    {"OID": "IT.DOMAIN", "name": "DOMAIN", "label": "Domain Identifier", "type": "string", "length": 2},
                    {"OID": "IT.AGE", "name": "AGE", "label": "Subject Age", "type": "float", "length": 5, "fractionDigits": 2}
                ],
                "itemData": [
                    [1, "MyStudy", "001", "DM", 56],
                    [2, "MyStudy", "002", "DM", 26],
                    ...
                ]
            }
        }
    },
    "referenceData": {
        ... Same structure as clinical data
    }
}








