...
The mainstream definition of unstructured data is data that do not contain descriptive information about themselves. An email body is unstructured, and so is a newspaper op-ed. The definition goes on to describe structured data as having a rigid schema requirement. Semi-structured data are the middle category, where the meaning of the data is self-explanatory, for example through markup tags.
Below are examples to illustrate these three kinds of data.
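As a rough illustration of the three categories, the sketch below holds the same observation in each form. The sample text, tag names, and column names are all invented for this example and are not taken from any standard.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Unstructured: free text that carries no descriptive information about itself.
email_body = "Hi team, the subject's systolic pressure was 120 at the second visit."

# Semi-structured: markup tags make the meaning of each value self-explanatory.
# (Tag names are hypothetical.)
xml_text = "<vitals><test>SYSBP</test><result unit='mmHg'>120</result><visit>2</visit></vitals>"
record = ET.fromstring(xml_text)
semi = {child.tag: child.text for child in record}

# Structured: every row must fit a fixed set of columns, i.e. a rigid schema.
# (Column names are hypothetical.)
csv_text = "TEST,RESULT,UNIT,VISIT\nSYSBP,120,mmHg,2\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# The tagged and tabular forms can be queried by name; the email body cannot.
print(semi["result"], record.find("result").get("unit"))
print(rows[0]["RESULT"], rows[0]["UNIT"])
```

Only the last two forms let a program ask for the result or its unit by name. Extracting the same value from the email body would require text parsing or human reading.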
...
Controlled Terminology, on the other hand, is published in PDF format rendered directly from raw data. The raw data is highly structured. NCI EVS maintains all codelists and terms in Protégé, an ontology tool. EVS extends Protégé with OWL, a semantic web layer.[1] Data serialization happens at each quarterly publication to produce the formats available for download, such as CSV, Excel spreadsheet, and PDF. This structured data approach enables many reuse scenarios: Define-XML, codelists and terms in implementation guides, and CDISC 360's biomedical concepts, to name a few. Controlled Terminology reaps the benefit of repeatability by virtue of its highly structured nature. CDISC publishes Controlled Terminology four times per year, with a maximum of six packages each time. With this level of frequency and quantity, a repeatable process is a must.
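The idea of serializing one structured source into several download formats can be sketched as follows. This is a minimal illustration and not the actual NCI EVS export process: the codelist name, codes, and field names are hypothetical, and the real source is an ontology rather than a Python list.

```python
import csv
import io
import json

# Hypothetical structured source: one record per term, fixed fields.
codelist = [
    {"codelist": "EXAMPLE_UNIT", "code": "X0001", "term": "mg"},
    {"codelist": "EXAMPLE_UNIT", "code": "X0002", "term": "mL"},
]

FIELDS = ["codelist", "code", "term"]


def to_csv(rows):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same records to JSON text."""
    return json.dumps(rows, indent=2)


# Both outputs are derived from the same source, so rerunning the
# functions at each publication repeats the process without manual edits.
csv_out = to_csv(codelist)
json_out = to_json(codelist)
print(csv_out)
print(json_out)
```

Because each format is generated from the single source rather than maintained by hand, adding another output format means adding one more function, and the existing outputs stay consistent with each other.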
Technology plays a significant role when working with data of varying structuredness. An overall data architecture is a strategy that must be very well articulated, because it impacts storage, analytics, accessibility, and discoverability. Computing evolves at a rapid pace. Cloud computing especially has created many choices in the marketplace for different kinds of data: RDBMS, graph database, document store, multi-model database. Long gone is the era when all data would go into one single enterprise data store. Database-as-a-service has become a hot commodity, offering unprecedented flexibility, scalability, and modularity. Implementers today can choose to put their data in the right container for the right purpose.
...