It’s generally accepted that Hanns Oertel, in his ‘Lectures on the Study of Language’, was the first to distinguish clearly between the formal and the semantic definitions of a word, the latter relating to its significance or meaning. That was back in 1901, and exactly one hundred years later I first read about ‘The Semantic Web’ in a Scientific American article that included a notable name among its three authors: Tim Berners-Lee.
The Semantic Web is a mechanism for linking the contextual meaning of online data in such a way that it can be processed efficiently by a computer while remaining accessible to humans. The concept of metadata is nothing new, but Berners-Lee goes way beyond simple keyword tagging: his vision is one of a unifying logical language that can progressively link concepts old and new into a truly universal web. It is a structure that will ‘open up the knowledge and workings of humankind to meaningful analysis by software agents’, as Berners-Lee put it in that Scientific American piece (May 2001, http://tinyurl.com). He continued: ‘The Semantic Web will bring structure to the meaningful content of web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.’ This wouldn’t require any great leap forward in AI, merely that the data be published in a more informative format than today’s HTML. By taking a Semantic Web approach, information becomes free to be used within whatever context someone requires, even one far removed from its original purpose.
RDF, FOAF, XML and other acronyms
At the heart of the Semantic Web lies the Resource Description Framework (RDF), a set of specifications based upon the idea of ‘making statements about resources in the form of a subject-predicate-object expression’, according to Wikipedia. Maintained by the World Wide Web Consortium (W3C), which Berners-Lee heads, RDF provides a methodology for describing resources while retaining a relatively simple data model. Publish your information to the public domain using RDF (plus XML, which has become a de facto part of the framework, and uniform resource identifiers, or URIs, which identify and link resources) and, once so published, it can be repurposed in any way another application or user sees fit.
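To make the subject-predicate-object idea concrete, here is a minimal sketch in plain Python rather than a real RDF toolkit. The example.org URIs are invented for illustration, the two predicates are borrowed from the Dublin Core element set, and the `Triple` type and `match` helper are hypothetical names of my own, not part of any RDF specification.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement about a resource: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Assumption: example.org resources are made up for this sketch.
EX = "http://example.org/"
# Dublin Core element URIs, used here as ready-made predicates.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
DC_DATE = "http://purl.org/dc/elements/1.1/date"

triples = [
    Triple(EX + "article", DC_CREATOR, EX + "people/alice"),
    Triple(EX + "article", DC_DATE, "2001-05"),
    Triple(EX + "photo", DC_CREATOR, EX + "people/alice"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return every triple that agrees with the fields that were given.

    A field left as None acts as a wildcard.
    """
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "What has Alice created?" asked by a program that knows nothing
# about articles or photos, only about the shape of the statements.
by_alice = match(triples, predicate=DC_CREATOR, obj=EX + "people/alice")
for t in by_alice:
    print(t.subject)
```

The point of the sketch is that the query never mentions what kind of thing was created: because every statement has the same three-part shape and every name is a URI, data published for one purpose can be picked up and reused for another.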
So will the Semantic Web make meaning understandable to computers, creating a medium for universal information exchange? Well, yes, in theory. The problem is that there are very few working examples out there that you can go and play with. You could take a look at the FOAF Project (www.foaf-project.org), which is all about creating a web of machine-friendly home pages that describe the links between people, the data they create and the things they do. Unfortunately, it seems to have lost some momentum and many of its links are broken. The FOAF Explorer (http://xml.mfd-consult.dk) offers a web-based overview of the FOAF Project and may be a better starting point.
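FOAF stands for Friend of a Friend, and the kind of question its machine-friendly home pages are meant to answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the people and the photo are invented example.org resources, the statements are held as plain tuples rather than parsed from real FOAF files, and the helper names are hypothetical. The `name`, `knows` and `made` terms are taken from the FOAF vocabulary namespace.

```python
# FOAF vocabulary namespace; name, knows and made are FOAF terms.
FOAF = "http://xmlns.com/foaf/0.1/"
# Assumption: invented people and resources, for illustration only.
EX = "http://example.org/"

ALICE, BOB, CAROL = EX + "alice", EX + "bob", EX + "carol"

# Each tuple is (subject, predicate, object).
statements = {
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "knows", BOB),
    (BOB, FOAF + "name", "Bob"),
    (BOB, FOAF + "knows", CAROL),
    (BOB, FOAF + "made", EX + "photos/1"),
    (CAROL, FOAF + "name", "Carol"),
}


def objects(subject, predicate):
    """All objects of statements with this subject and predicate."""
    return {o for s, p, o in statements if s == subject and p == predicate}


def friends_of_friends(person):
    """People known by the people this person knows, excluding the
    person and anyone they already know directly."""
    direct = objects(person, FOAF + "knows")
    indirect = set()
    for friend in direct:
        indirect |= objects(friend, FOAF + "knows")
    return indirect - direct - {person}


for uri in friends_of_friends(ALICE):
    # Look up the human-readable name attached to each URI.
    print(objects(uri, FOAF + "name"))
```

On this toy data the only friend of a friend for Alice is Carol. In a real FOAF web the statements would be scattered across many people's home pages, which is exactly why the broken links mentioned above hurt so much.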
Then there’s BigBlogZoo (www.bigblogzoo.com), which claims to be the world’s first semantic web browser, but in reality bears more than a passing resemblance to a blog/newsfeed browser (which is just what it is, of course). It defines a newsfeed as being a semantic reference to a web page, because the feed adds dates, languages, categories and descriptions to the website itself. Download it and play with it yourself: you get 80,000 or so XML feeds, all categorised using the DMOZ schema and all of which can be spidered according to your preferences.
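To see why a newsfeed can be treated as a description of a page, here is a rough sketch that pulls the date, language, category and description out of a small RSS 2.0 document using only Python’s standard library. The feed itself is invented for illustration, and nothing here reflects how BigBlogZoo works internally; it simply shows the sort of metadata a feed attaches to a link.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Assumption: a made-up RSS 2.0 feed with a single item.
FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.org/</link>
    <description>An invented feed</description>
    <language>en-gb</language>
    <item>
      <title>Semantic Web notes</title>
      <link>http://example.org/notes</link>
      <description>Short notes on RDF and FOAF.</description>
      <category>Computers/Internet</category>
      <pubDate>Tue, 01 May 2001 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

channel = ET.fromstring(FEED).find("channel")
# In RSS 2.0 the language is declared once, at channel level.
language = channel.findtext("language")

records = []
for item in channel.findall("item"):
    records.append({
        "page": item.findtext("link"),
        "description": item.findtext("description"),
        "categories": [c.text for c in item.findall("category")],
        # pubDate uses the RFC 822 date format, which email.utils parses.
        "date": parsedate_to_datetime(item.findtext("pubDate")).date().isoformat(),
        "language": language,
    })

for record in records:
    print(record)
```

Each record is keyed on the page’s URL and carries facts about it that the page’s own HTML need not state, which is the modest sense in which a feed is ‘semantic’.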