The Semantic Web and the Web Ontology Language, OWL, are an approach to bringing semantics (i.e., context-appropriate “meaning”) to the web. With “meaning,” a computer could differentiate between the notion of a gene as a unit of inheritance and ‘Gene’ as the first name of a famous actor or rock star. In this capacity, computers could help us find, aggregate, and integrate disparate information from across the web to advance science. Besides using ad hoc heuristics to scrape pages and infer meaning (an error-prone exercise), one could presumably use something like NLP (Natural Language Processing) to allow computers to infer the meaning inherent in various web pages. But this turns out to be a gargantuan task, and today, despite both effort and success, we as a community are still a long way from having computers “read and understand” web pages and human language. Another approach is to formalize just a set of core logical constructs that allow meaning to be expressed, rather like we express mathematical meaning in theorems and proofs. This approach is more difficult for humans, because we rarely think and express ourselves in strict logical formalisms, but it does make it easier for us to program computers to extract meaning and make decisions: something we call discerning suitability-for-purpose.

OWL is the W3C standard for expressing meaning as formalized semantics on the web (viz., OWL Web Ontology Language Overview, OWL Web Ontology Language Guide). OWL enables this by allowing one to use a Description Logic to describe a web resource. A web resource could be a web page, a data set, or even a service that accepts input and returns output. You could serialize (write out) this logic in many forms; for OWL, the recommended format is RDF/XML. While OWL gives us a construct in which to express the logical relationship of resources to one another, it does not give us the actual terms to use.
So while you can use OWL to express that REV7 is a gene involved in DNA repair in yeast, OWL by itself gives us neither the notion of a gene, DNA, repair, ‘DNA repair’, nor yeast, and of course it does not give us the actual web page on REV7 itself. We ourselves, you and I as users and developers, make these terms.

OWL is easy. OWL is based on RDF (Resource Description Framework) and two helper technologies called RDFS (Resource Description Framework Schema) and XSD (XML Schema). OWL is grounded on the notion of three basic entities: individuals, properties, and classes. An individual (also called an instance) is a thing, a web resource. With only one exception, an individual is always a URI (a hyperlink); i.e., a web resource. Thus the implied semantics are that when making statements about individuals, we are making statements about whatever is represented by that hyperlink. (The one exception is an individual that is not a URI; it is like an “anonymous resource.” Such individuals can only be referenced within a single file. Because they are not linked to any universally recognized address space, they are called anonymous, or blank, nodes. An individual cannot be a literal, such as a string or a primitive data type.)

Given two individuals, we can establish one as the subject and the other as the object of a relationship between them. An individual could be both a subject and an object, and indeed, this is quite common when building complex statements. The relationship linking the subject to the object is a property or predicate, and the whole triple of subject-predicate-object is called a statement.
So the statement:

    http://db.yeastgenome.org/cgi-bin/locus.pl?locus=rev7
    http://www.myWebSite.org/myPredicates/hasGeneOntologyAnnotation
    http://db.yeastgenome.org/cgi-bin/GO/go.pl?goid=6281

says that the web page for REV7 at the Saccharomyces Genome Database (the individual representing REV7) has a ‘hasGeneOntologyAnnotation’ relationship to a data page about DNA repair (namely http://db.yeastgenome.org/cgi-bin/GO/go.pl?goid=6281). Notice how you are making statements about stuff on the web without ever having to edit, or reach into, the actual data held by others. This turns out to be extremely powerful. Unlike subjects, objects of statements may be literals, so predicates can also link individuals with literal data values, for example:

    http://db.yeastgenome.org/cgi-bin/locus.pl?locus=rev7
    http://www.myWebSite.org/myPredicates/hasName
    “SGD REV7 Web page”

The third and last entity OWL uses is the class. A class simply represents a set of individuals. The actual individuals themselves do not need to exist. So you can have a class of “DNARepairGenes” or “myFavoriteCartoonCharacters.” A class allows you to specify properties that must be present for any individual that is a member of that set. In semantic web services, those individuals are web resources.

The combination of individuals, their properties, and classes allows us to use reasoners (logic engines, essentially automated theorem provers) to infer logically implied statements that are true, but may not otherwise be explicitly apparent. This is what makes OWL so much more powerful than web service specifications that rely on SOAP or WSDL alone. In OWL (specifically OWL DL, see below), we are not just using syntactic conventions to embed meaning in a sequence of tokens; we are embedding meaning that maps to a description logic formalization, so reasoners can actually infer new knowledge from the logical consequences of statements.
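As a rough illustration of the ideas above, the following Python sketch stores the two REV7 statements as subject-predicate-object triples and applies one hand-written rule to derive a class membership that was never stated explicitly. It is a toy, not an OWL reasoner: the `URI`, `Literal`, and `Statement` types, the `infer_membership` helper, and the `DNARepairGenes` class URI are all hypothetical names invented for this example. It also assumes the class is defined so that having the annotation is sufficient for membership, which is a stronger condition than the "must be present" condition described in the text.

```python
from typing import NamedTuple, Set, Union


class URI(str):
    """A web resource identifier (individual, property, or class)."""


class Literal(NamedTuple):
    """A plain data value; allowed only in the object position."""
    value: str


class Statement(NamedTuple):
    """One subject-predicate-object triple."""
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


# URIs taken from the statements in the text.
REV7 = URI("http://db.yeastgenome.org/cgi-bin/locus.pl?locus=rev7")
DNA_REPAIR = URI("http://db.yeastgenome.org/cgi-bin/GO/go.pl?goid=6281")
HAS_GO = URI("http://www.myWebSite.org/myPredicates/hasGeneOntologyAnnotation")
HAS_NAME = URI("http://www.myWebSite.org/myPredicates/hasName")

# Standard RDF predicate meaning "is a member of this class".
RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

# Hypothetical class URI, invented for this sketch.
DNA_REPAIR_GENES = URI("http://www.myWebSite.org/myClasses/DNARepairGenes")

statements = {
    Statement(REV7, HAS_GO, DNA_REPAIR),
    Statement(REV7, HAS_NAME, Literal("SGD REV7 Web page")),
}


def infer_membership(stmts: Set[Statement], cls: URI,
                     predicate: URI, value: URI) -> Set[Statement]:
    """Toy rule: any subject with (predicate, value) is a member of cls."""
    return {
        Statement(s.subject, RDF_TYPE, cls)
        for s in stmts
        if s.predicate == predicate and s.obj == value
    }


# Derive "REV7 is a DNARepairGenes member" from the annotation statement.
inferred = infer_membership(statements, DNA_REPAIR_GENES, HAS_GO, DNA_REPAIR)
for s in inferred:
    print(s.subject, "is a member of", s.obj)
```

A real reasoner works from class definitions written in OWL itself rather than from a hard-coded Python rule, but the shape of the result is the same: a new statement that follows logically from the ones already asserted.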
This inference of new knowledge is what Kant called synthetic a priori judgments in his famous example that ‘7 + 5’ is not the same as ‘12’. “Twelve” is logically derived new knowledge from ‘7 + 5’, as you can prove to yourself by finishing the statement: ‘7593748939039374583490345 + 9347858383745834939384784 = ’. Even though the answer is logically embedded within the statement, actually executing the mathematical (logical) operator and recognizing the answer is new knowledge, since the types of actions you can take in this world differ depending on whether you know the left-hand or right-hand side of the equality. This is at the philosophical core of encryption algorithms. The bottom line is that reasoners can be powerful engines to find truths that are necessarily true, but are otherwise hidden from us; things like, “Who has the appropriate gene expression data that is relevant to my cancer study?”

Reasoning is hard. Very quickly we can start asking questions, such as “Get me the set of all disjoint sets whose individuals have the value ‘REV7’ in their name,” that, depending on the knowledge base, can be very difficult or even impossible for a reasoner to solve. To address this, OWL is partitioned into three levels. The most expressive level is OWL Full, but you can write statements in OWL Full that no computer may ever be able to fully solve, if by solve we mean, “Get me all the logical implications of these statements.” A slightly less expressive level is called OWL DL (“DL” for Description Logic). In OWL DL you are guaranteed “computational completeness (all entailments are guaranteed to be computed) and decidability (all computations will finish in finite time).” (Of course, “finite” can still be a long time.) SSWAP uses OWL DL. Even less expressive is OWL Lite. It is hoped that OWL Lite may put the least burden on both reasoners and humans needing to code legacy schema into OWL.

A set of valid terms is called a controlled vocabulary.
A set of terms where some terms describe the relationships of other terms to each other is called an ontology. SSWAP (Simple Semantic Web Architecture and Protocol) is an ontology specifically designed to allow web resources to describe themselves; to enable you to query on those resources; to engage them; and to semantically encode the result. Because SSWAP enables this interoperability (it defines a semantic hand-shaking, or rules of engagement), it is called a protocol. SSWAP is a protocol for semantic web services: it gives everyone a common set of terms with specific meaning, allowing you to describe, discover, and share data and services. Because SSWAP uses OWL, SSWAP resources are amenable to reasoning. SSWAP defines terms such as what it means to be a web resource, who provides that resource, and how the resource maps its input to its output. SSWAP does not define the particulars of the resource (such as whether ‘gene’ stands for REV7, Gene Hackman, or Gene Simmons), but it enables you to do so. SSWAP is aimed at being lightweight (it defines only five classes and 13 properties), so instead of setting its own rules for authentication, security, service integrity, etc., it is designed to ride on top of existing protocols that already address these issues, such as SSL and HTTPS. Where this is insufficient, SSWAP allows the community to extend the base protocol and establish its own standards.

As will become clear later, a major hindrance to using large ontologies for semantic web services is that often those ontologies are described in their entirety in a single, monolithic file. This creates problems for semantic web services, because terms within the file are addressed by using the fragment identifier (#).
Fragment identifiers are client-side locators; they are not guaranteed to be sent to web servers (www.w3.org/TR/webarch/#media-type-fragid, www.w3.org/Addressing/URL/4_2_Fragments.html, www.w3.org/TR/webarch/#fragid, www.w3.org/DesignIssues/Fragment). So, for example, a call for three terse terms in a 10 MB ontology could result in the web server sending the entire file three times. For this reason, SSWAP establishes each term as its own file and then uses OWL to join the terms logically (instead of physically) into a meta-ontology. We emphasize this by using the conventional RDF/XML syntax ‘sswap:someTerm’, where sswap is the base URL and each someTerm is a file underneath it.
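The fragment problem can be seen with a minimal Python sketch using the standard library's `urllib.parse.urldefrag`, which splits a URL into the part a client requests and the fragment it keeps for itself. The `example.org` URLs, the term names, and the `document_url` helper are assumptions made up for illustration; they are not actual SSWAP addresses.

```python
from urllib.parse import urldefrag

# Hypothetical locations, for illustration only.
MONOLITHIC_FILE = "http://example.org/ontology.owl"
PER_TERM_BASE = "http://example.org/ontology/"
terms = ["Gene", "DNARepair", "Yeast"]


def document_url(term_url: str) -> str:
    """Return the URL a client would request, with the fragment stripped."""
    return urldefrag(term_url).url


# Three terms addressed by fragment inside one file all resolve to the
# same document, so each lookup may fetch the entire ontology.
monolithic_requests = {document_url(f"{MONOLITHIC_FILE}#{t}") for t in terms}

# Three terms published as separate files resolve to three small documents.
per_term_requests = {document_url(f"{PER_TERM_BASE}{t}") for t in terms}

print(len(monolithic_requests), "distinct document for the fragment style")
print(len(per_term_requests), "distinct documents for the per-term style")
```

Under these assumptions the fragment-style terms collapse to a single requested URL, while the per-term style yields one URL per term, which is the property the one-file-per-term layout relies on.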