So, what is the next wave after Web 2.0 (aka the Social Web)? If we put this question to Tim Berners-Lee, the inventor of the web, the answer would most probably be the Semantic Web. The vision of the Semantic Web is to transform the web into a distributed database in which all the data on the web is interconnected and machine-readable. Some Semantic Web evangelists are so optimistic about its future that they use the terms Semantic Web and Web 3.0 almost synonymously! In Tim Berners-Lee's vision, the web should interconnect all of its data as Linked Data, just as hyperlinks connect webpages in the present web architecture. Tim Berners-Lee is also a pioneer of the Open Data movement. Because Open Data is published in the public domain, it offers the perfect playground for Semantic Web and Linked Data supporters.
In practice, not all Open Data publishers care about publishing their data as Linked Data. However, the W3C, led by Tim Berners-Lee, recommends that Open Data and/or government data publishers put their data on the web as Linked Data (which also helps the W3C progress towards the Semantic Web). According to Tim Berners-Lee, “The term Linked Data refers to a set of best practices for publishing and connecting structured data on the web”. This idea of Linked Data is closely tied to his vision of the Semantic Web.
Image source: www.w3.org
Some Open Data publishers follow the W3C recommendation and publish their Open Data as Linked Data. As a result, the Open Data available on the web can be divided into two categories: Linked Open Data and non-linked Open Data. So the question is: does Open Data really need to be linked? And who needs the Semantic Web?
At the moment, perspectives on Open Data vary from organisation to organisation, depending on each organisation's nature and its motivation for publishing Open Data. For instance, according to Local Government Improvement and Development (UK), “The idea behind Open Data is that information held by government should be freely available to use and re-mix by the public”. The W3C, on the other hand, places more emphasis on the technical aspects and standards of data publishing when discussing Open Data, and its approach is influenced by Tim Berners-Lee's Linked Data concept.
In technical terms, the Semantic Web refers to the use of specific W3C open standards (RDF, OWL, SPARQL, GRDDL, etc.) that have been developed for integrating data over the web and making web data machine-readable. From a more abstract point of view, the purpose of the Semantic Web is to build a collective web wisdom that leads to a global knowledge society. A number of projects on the web follow the W3C Semantic Web standards and are developing interesting applications. DBpedia is one such project: it extracts structured data from Wikipedia articles using Semantic Web technologies, so that people can access Wikipedia's knowledge base more easily and run sophisticated queries against it. DBpedia's ontology is not limited to a single domain, and DBpedia has RDF links to other Open Data knowledge bases such as OpenCyc, WordNet, Freebase, UMBEL and MusicBrainz. The link space created by these RDF links between DBpedia and other Open Data websites can be considered a miniature version of the ideal Semantic Web, in which all data is interlinked across different domains and websites.
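To make the idea of an RDF link space more concrete, here is a minimal sketch in plain Python (no RDF library assumed) that stores triples as (subject, predicate, object) tuples, merges two datasets, and follows owl:sameAs links to gather everything known about one resource. The DBpedia-style resource URI and the OWL/RDFS predicate URIs follow the real vocabularies, but the triples themselves are illustrative, and the second dataset under music.example.org (its URIs and its album predicate) is hypothetical. A real application would use an RDF library and SPARQL rather than these hand-written helpers.

```python
from typing import NamedTuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Dataset A: a DBpedia-style resource (illustrative triples, not copied from DBpedia).
dbpedia = [
    Triple("http://dbpedia.org/resource/Radiohead", RDFS_LABEL, "Radiohead"),
    Triple("http://dbpedia.org/resource/Radiohead", OWL_SAME_AS,
           "http://music.example.org/artist/42"),
]

# Dataset B: a hypothetical MusicBrainz-like dataset with its own URIs.
music = [
    Triple("http://music.example.org/artist/42",
           "http://music.example.org/vocab/album", "OK Computer"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard (like a SPARQL variable)."""
    return [t for t in triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


def same_as_closure(triples, uri):
    """All URIs reachable from uri via owl:sameAs links, followed in both directions."""
    seen, stack = {uri}, [uri]
    while stack:
        current = stack.pop()
        neighbours = [t.obj for t in match(triples, s=current, p=OWL_SAME_AS)]
        neighbours += [t.subject for t in match(triples, p=OWL_SAME_AS, o=current)]
        for n in neighbours:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen


def describe(triples, uri):
    """Collect every (predicate, object) pair about uri and all of its sameAs aliases."""
    facts = set()
    for alias in same_as_closure(triples, uri):
        for t in match(triples, s=alias):
            if t.predicate != OWL_SAME_AS:
                facts.add((t.predicate, t.obj))
    return facts


merged = dbpedia + music
for pred, value in sorted(describe(merged, "http://dbpedia.org/resource/Radiohead")):
    print(pred, "->", value)
```

The `match` helper plays the role of a single SPARQL triple pattern, with `None` standing in for a variable. Querying `dbpedia` alone returns only the label; the album appears only in the merged data, because the owl:sameAs link joins the two datasets.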
However, not all Open Data publishers publish their data as Linked Data. In fact, Linked Data and the Semantic Web attract plenty of criticism, and for various reasons not everyone supports the idea of publishing Open Data as Linked Data. For instance, Pachube does not publish its Open Data as Linked Data. At the time of writing, Pachube uses a folksonomy (user-generated tags) to categorise the data feeds published as Open Data on its platform. Pachube does not use any machine-readable ontology to classify its Open Data.
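Tag-based classification of the kind described above can be sketched as a simple inverted index from tags to feeds. The feed IDs and tags below are hypothetical, not real Pachube data; the sketch only illustrates that a folksonomy matches tags exactly as users typed them.

```python
from collections import defaultdict


def build_tag_index(feeds):
    """Folksonomy-style index: each user tag, exactly as typed, points to the feeds carrying it."""
    index = defaultdict(set)
    for feed_id, tags in feeds.items():
        for tag in tags:
            index[tag].add(feed_id)
    return index


# Hypothetical feeds with user-chosen tags.
feeds = {
    "feed-1": ["temperature", "office"],
    "feed-2": ["temp", "garden"],
    "feed-3": ["Temperature", "london"],
}

index = build_tag_index(feeds)
# Only feed-1 is found: 'temp' and 'Temperature' are treated as different tags.
print(sorted(index["temperature"]))
```

All three feeds arguably measure the same thing, but nothing in the tags themselves tells a machine that; that is the gap an ontology is meant to fill.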
RDF is not the only option for adding a machine-readable semantic layer to data. Triple tags, or machine tags, are another tagging system, used by Flickr and Delicious to attach machine-readable semantic information to photos and bookmarks.
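On Flickr, a machine tag has the form `namespace:predicate=value` (for example `geo:lat=51.5074`). The sketch below separates machine tags from ordinary free-text tags. It assumes a simplified grammar (identifier-like namespace and predicate, everything after `=` as the value, with surrounding quotes stripped), which may be looser or stricter than what any particular service actually enforces.

```python
import re
from typing import NamedTuple, Optional

# Simplified grammar (assumption): namespace and predicate are identifiers,
# and everything after '=' is the value.
MACHINE_TAG = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):([A-Za-z_][A-Za-z0-9_]*)=(.+)$")


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


def parse_tag(tag: str) -> Optional[MachineTag]:
    """Return a MachineTag for 'namespace:predicate=value', or None for a plain tag."""
    m = MACHINE_TAG.match(tag.strip())
    if m is None:
        return None
    namespace, predicate, value = m.groups()
    return MachineTag(namespace, predicate, value.strip('"'))


def split_tags(tags):
    """Separate machine-readable tags from ordinary folksonomy tags."""
    machine, plain = [], []
    for tag in tags:
        parsed = parse_tag(tag)
        if parsed is None:
            plain.append(tag)
        else:
            machine.append(parsed)
    return machine, plain


photo_tags = ["sunset", "geo:lat=51.5074", "geo:lon=-0.1278", "london"]
machine, plain = split_tags(photo_tags)
print(machine)
print(plain)
```

Both kinds of tag live in the same tag list; the fixed syntax is what lets software pick out the machine-readable ones.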
From the above discussion we can see that there are two main approaches to knowledge representation and document indexing on the web: ontology and folksonomy. The difference between them is that a folksonomy is created by users and is meant mainly for humans, whereas an ontology is a formal vocabulary designed to be read and processed by machines. However, new concepts have been emerging that combine folksonomy and ontology, rather than choosing one and discarding the other.
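One way to combine the two approaches is to let users tag freely and then map known tags onto ontology concepts. The mini-ontology, its example.org concept URIs and the synonym lists below are hypothetical; the sketch only shows the mechanism, with unmatched tags kept as ordinary folksonomy tags.

```python
from collections import defaultdict

# Hypothetical mini-ontology: concept URI -> known lower-case synonyms used as tags.
ONTOLOGY = {
    "http://example.org/onto#Temperature": {"temperature", "temp", "thermometer"},
    "http://example.org/onto#Humidity": {"humidity", "rh"},
}

# Invert to synonym -> concept for constant-time lookups.
SYNONYM_TO_CONCEPT = {syn: concept for concept, syns in ONTOLOGY.items() for syn in syns}


def classify(tags):
    """Map free-text tags to ontology concepts; keep unmatched tags as plain folksonomy tags."""
    concepts, leftovers = set(), []
    for tag in tags:
        concept = SYNONYM_TO_CONCEPT.get(tag.strip().lower())
        if concept is not None:
            concepts.add(concept)
        else:
            leftovers.append(tag)
    return concepts, leftovers


def index_feeds(feeds):
    """Build a concept -> feed IDs index so differently tagged feeds land together."""
    index = defaultdict(set)
    for feed_id, tags in feeds.items():
        concepts, _ = classify(tags)
        for concept in concepts:
            index[concept].add(feed_id)
    return index


# Hypothetical feeds tagged freely by their owners.
feeds = {
    "feed-1": ["Temperature", "office"],
    "feed-2": ["temp", "garden"],
    "feed-3": ["RH", "greenhouse"],
}

index = index_feeds(feeds)
print(sorted(index["http://example.org/onto#Temperature"]))
```

Users keep the freedom of a folksonomy, while the mapping gives machines a shared concept to query against; tags the ontology does not know about are not thrown away.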
Almost all Open Data projects use either a folksonomy or an ontology to classify their content and data. As mentioned earlier, combining both methods into one standard could make our web more structured. If we can agree on one standard, similar Open Data projects can share their resources with each other, just as DBpedia does. Therefore, the answer to “Does Open Data really need to be Linked Data?” is probably YES, as linking makes Open Data more structured and searchable; but publishers had better all follow a single standard technology (e.g. RDF or machine tags) to link their Open Data. However, transforming the entire web into a Semantic Web is extremely challenging, and both its feasibility and its necessity remain questionable.
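Settling on a single standard need not mean discarding existing tags: a machine tag already carries a subject (the tagged item), a predicate and a value, so it can be translated mechanically into an RDF triple. The sketch below emits N-Triples lines. The namespace registry, the example.org predicate URIs and the photo URI are hypothetical, and tags whose namespace has no agreed URI are simply skipped.

```python
from typing import Dict, Optional

# Hypothetical registry: machine-tag namespace -> base URI for its predicates.
NAMESPACES: Dict[str, str] = {
    "geo": "http://example.org/geo#",
}


def machine_tag_to_ntriple(resource_uri: str, tag: str) -> Optional[str]:
    """Translate 'namespace:predicate=value' into one N-Triples line, or None if not possible."""
    head, sep, value = tag.partition("=")
    namespace, colon, predicate = head.partition(":")
    if not (sep and colon and namespace and predicate and value):
        return None  # a plain folksonomy tag, not a machine tag
    base = NAMESPACES.get(namespace)
    if base is None:
        return None  # unknown namespace: no agreed URI to map it to
    # Escape backslashes and double quotes so the literal stays valid N-Triples.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{resource_uri}> <{base}{predicate}> "{escaped}" .'


photo = "http://photos.example.org/photo/1"
for tag in ["geo:lat=51.5074", "sunset", "flickr:user=alice"]:
    print(machine_tag_to_ntriple(photo, tag))
```

The only piece that needs agreement between publishers is the namespace registry, which is exactly where a shared standard matters.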