The S3Model Reference Model provides extended datatypes and a document-like structure capability to enhance interoperability across disparant systems. This is the foundation of the modern data-centric information management world. The business case for becoming data-centric is detailed in other sections.
The fundamental concepts, expressed in the reference model classes, are based on basic philosophical concepts of real world entities. These broad concepts can then be constrained to more specific concepts using models created by domain experts. These Data Models (DMs) created by domain experts provides the ability for their tacit knowledge to be directly encoded into a model-driven system. A trained domain expert can create computable models that are not available through any other methodology.
The attempt to design a data model for a concept is restricted to the knowledge and context of a particular domain expert. In effect, there can never be one best or a maximal data model for human concepts being communicated to machines. This means that from a global perspective there may be several DMs that purport to fill the same need. But are actualy slightly different due to the many human factors involved in their design.
There is no conflict in the S3Model world in this case as DMs are identified using a Type 4 UUID and there are no semantics in the processing and validation model. The machine only knows what you tell it the context is for this model.
The real advantage to using the S3Model approach to information modeling is that it provides for a wide variety of applications to be developed based on the broad concepts defined in the reference model. By having domain experts within the their fields, define and create the DMs, they can then be shared across multiple applications so that the structure and semantics of the data is not locked into one specific application, but can be exchanged among many different applications. This approach provides the real foundation for artificial intelligence to move forward instead of being isolated in each instance of implementation as it is today.
To demonstrate the differences between the S3Model approach and the typical data model design approach we will use two common metaphors. This is assuming the reference implementation based on XML Schema.
- The first is for the data model approach to developing software. This is where a set of database definitions are created based on a requirements statement representing an information model. An application is then developed to support all of the functionality required to input, manipulate and output this data as information, all built around the data model. This approach is akin to a jigsaw puzzle (the software application) where the shape of the pieces are the syntax and the design and colors are the semantics of the information represented in an aggregation of data components described by the model. This produces an application that, like the jigsaw puzzle, can provide for pieces (information) to be exchanged only between exact copies of the same puzzle. If you were to try to put pieces from one puzzle into a different puzzle, you might find that a piece has the same shape (syntax) but the picture on the piece (semantics) will not be the same. Even though they both belong to the same domain of jigsaw puzzles. You can see that getting a piece from one puzzle to correctly fit into another is going to require manipulation of the basic syntax (shape) and / or semantics (picture) of the piece. This can also be extended to the relationship that the puzzle has a defined limit of its four sides. It cannot, reasonably, be extended to incorporate new pieces (concepts) discovered after the initial design.
- The multi-level, data-centric approach used in S3Model is akin to creating models (applications) using the popular toy blocks made by Lego® and other companies. If you compare a box of these interlocking blocks to the reference model and the instructions to creating a specific toy model (software application), where these instructions present a concept constraint (implemented as a DM in S3Model). You can see that the same blocks can be used to represent multiple toy models without any change to the physical shape, size or color of each block. Now we can see that when new concepts are created within healthcare, they can be represented as instructions for building a new toy model using the same fundamental building blocks that the original software applications were created upon.
So, the various applications are now part of a larger infrastructure or ecosystem of interoperability. Any application that understands the reference model can now interpret the meaning conveyed in constraint definitions.
Data Models (DMs) can be created using any XML Schema editor or even a plain text editor. However, this is not a recommended approach. Realistic DMs can be several hundred and up to thousands of lines long. They also require Type 4 UUIDs to be created as complexType and element names. These UUIDs should be machine generated.
Data Insights, Inc. has a Data Model Generator (DMGen) for license.
The root element of a DM and all complexType and global elements will use 4 Type UUIDs as defined by the IETF RFC 4122 See: https://www.ietf.org/rfc/rfc4122.txt The filename of a DM may use any format defined by the DM author. The DM author must recognize that the metadata section of the DM must contain the correct RDF:about URI with this filename.
As a matter of consistency and to avoid any possible name clashes, the DMs created by the DMGen also use the DM ID (DM-<uuid>.xsd). To be a viable DM for validation purposes the DM should use the W3C assigned extension of ‘.xsd’. Though many tools may still process the artifact as an XML Schema without it. The S3Model community considers it a matter of good artifact management practice to use the DM ID with the .xsd extension, as the filename.
Versioning of DMs is not supported by these specifications. Though XML Schema does have supporting concepts for versioning of schemas, this is not desirable in DMs. The reasons for this decision focuses primarily around the ability to capture temporal and ontological semantics for data instances and maintain them for all time (future proof data). A key feature of S3Model is the ability to guarantee the semantics for all future time, as intended by the original modeler. We determined that any change in the structure or semantics of a DM, constitutes a new DM. Since the complexTypes are re-usable (See the PCM description below), an approach that tools should use is to allow for copying a DM and assigning a new DM ID. The editing the required PCMs before publishing the new DM. Reusing PCMs across DMs improves the ability to perform data analysis across time.
When a complexType is changed within this new DM, all ancestors (enclosing complexTypes) also must be assigned a new name along with its global element name. For example if the enumerations on a XdStringType restriction are changed, the XdStringType, the XdAdapterType, the parent ClusterType and any enclosing ClusterTypes, the EntryType and the DMType must all get new UUIDs. The DMGen is aware of these processes and makes this an easier task than manual editing.
Model Components (MCs)¶
S3Model DMs are made up of XML schema complexTypes composed by restriction of the Reference Model complexTypes. This is the foundation of interoperability. What is in the Reference Model is the superset of all DMs. Because part of the name is a unique identification the complexTypes can be seen as re-usable components. For example, a domain expert might model a complexType that is a restriction of XdStringType with the enumerations for selecting one of the three measurement systems for temperature; Fahrenheit, Kelvin and Celsius. This MC as well as many others can be reused in many DMs without modification. For this reason, the semantic links for MCs are directly expressed in an xs:appinfo section in each MC. This approach lends itself very well to the creation of RDF triples from this information. For example:
<xs:appinfo> <rdf:Description rdf:about='&S3Model;mc-3a54417d-d1d6-4294-b868-e7a9ab28f8c4'> <rdfs:isDefinedBy rdf:resource='http%3A//purl.obolibrary.org/obo/RO_0002371'/> </rdf:Description> </xs:appinfo>
In this example the subject is &S3Model;mc-3a54417d-d1d6-4294-b868-e7a9ab28f8c4 the predicate is rdfs:isDefinedBy and the object is http%3A//purl.obolibrary.org/obo/RO_0002371
Every xs:appinfo section must begin with the rdf:Description element and have the rdf:about attribute to define the subject, as the containing complexType. This is then followed by one or more predicate/object components. The predicates can be from any vocabulary/terminology. Just be certain that the namespace prefix is correctly defined in the DM header. The S3Model Tools defines common namespaces by default but others may be added as needed. Also be certain that any URLs are properly encoded so that they will be valid inside the DM. RDF triples are a cornerstone of the semantic web. For more information see this tutorial. Of particular interest here is the section titled; Introducing RDF/XML. RDF/XML is one of the syntaxes used to describe semantic links and it is what we use in S3Model.