The Google Knowledge Graph
Introduced in May 2012 with the goal of enhancing Google’s Web Search results, the Google Knowledge Graph is a massive entity-relationship graph that stores information about all the entities that Google users care about (e.g. people, places, events, etc.), as well as the “relationships” between them.
It enhances Google Web Search in three different ways. First, it enables query disambiguation at the entity level, as illustrated on the right rail of the search results page for the query “Paris, Texas”. Second, it provides key information about the entity identified in a query (e.g. a short textual description, key facts and relationships, a thumbnail), as illustrated on the right rail of the search results page for the query “Brad Pitt”. Finally, it also suggests related entities relevant to the topic and intent of a query, as illustrated in the carousel displayed at the top of the search results page for the query “Brad Pitt movies”.
Introducing the Open Knowledge Graph
Despite its roots in Freebase and open data, the Google Knowledge Graph is a closed knowledge base. There is no publicly-available API to access its content, and crawling or fetching its content systematically is neither practicable given its scale (500 million objects; 3.5 billion facts and relationships) nor allowed under Google’s Terms and Conditions. In this context, Thomas Steiner and Stefan Mirea launched the Open Knowledge Graph, an experimental project to create an open version of the Google Knowledge Graph, with a publicly-available API.
The underlying platform consists of two main components: the SEKI@Home browser extension (i.e. Search for Embedded Knowledge Items) is used to crowd source the extraction of Google Knowledge Graph facts from google.com, while the Open Knowledge Graph repository is used to store the extracted knowledge centrally, and provide access to it. As people with the browser extension installed search on google.com, the extension anonymously extracts Knowledge Graph facts from Google’s search result pages, and sends them to the central repository where they are mapped against an ontology, converted into RDF triples, and stored in a RDF database with a publicly-accessible SPARQL query interface, creating an open Knowledge Graph with a publicly-accessible API over time.
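To make the pipeline described above more concrete, here is a minimal sketch of the server-side step that maps crowd-sourced fact panel data to an ontology and serializes it as RDF (N-Triples). The namespace, property names, and entity identifiers below are illustrative assumptions, not the project’s actual ontology.

```python
# Hypothetical namespace for the open repository (illustrative only).
OKG = "http://openknowledgegraph.example.org/"

# Hypothetical mapping from Knowledge Graph panel labels to ontology
# properties; the real project mapped some 380 such properties.
PROPERTY_MAP = {
    "Born": OKG + "ontology/birthDate",
    "Spouse": OKG + "ontology/spouse",
    "Height": OKG + "ontology/height",
}

def to_ntriples(entity_id, facts):
    """Convert {label: value} facts for one entity into N-Triples lines."""
    subject = f"<{OKG}entity/{entity_id}>"
    triples = []
    for label, value in facts.items():
        prop = PROPERTY_MAP.get(label)
        if prop is None:
            continue  # unmapped labels are dropped rather than guessed
        # Escape backslashes and quotes per N-Triples literal syntax.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} <{prop}> "{escaped}" .')
    return triples

# Example: one mapped fact ("Born") and one unmapped fact ("Job").
lines = to_ntriples("brad_pitt", {"Born": "December 18, 1963", "Job": "Actor"})
for line in lines:
    print(line)
```

Once stored in an RDF database, such triples can then be queried through the public SPARQL interface, e.g. asking for all facts about a given entity subject.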
Despite its relative anonymity, the project seems to have been quite successful. The exact number of distinct entities collected is not known, but in its short 26-day life (it started on August 11th), the project generated 2,850,510 RDF triples and mapped 380 properties. Unfortunately, it had to be shut down on September 6th, at Google’s request…
Shutting Down the Open Knowledge Graph
Indeed, Jack Menzel, a Product Management Director at Google, reached out to the authors Thomas Steiner and Stefan Mirea (incidentally both affiliated with Google) to state that – by design – Google Knowledge Graph data were only available via (and for) Google’s Web Search, and that Google was already making a lot of structured data publicly available via Freebase, which includes information about 23,407,174 topics as of today.
More importantly, he explained that there were a couple of specific reasons why Google couldn’t “participate” in the Open Knowledge Graph project. First, some of the data in the Google Knowledge Graph come from closed datasets acquired from sources that did not grant Google the rights to redistribute them. Other datasets have more open licenses, but still carry share-alike or attribution constraints. Second, he pointed out that – as a matter of principle – Google blocks any kind of automated extraction that could collect information about its search and ranking technologies, because “they were the proprietary cores of what Google provides”.
A sad, yet expected, conclusion for an interesting experiment. And at the same time very understandable given the resources involved in the creation and maintenance of Google’s Knowledge Graph.
A General Method for Crowd-Sourcing Information Extraction
Although the Open Knowledge Graph experiment, and more specifically the SEKI@Home browser extension, was focused on extracting (and providing access to) information from the Google Knowledge Graph, the approach outlined is actually generic and applicable to any similar database on the Web. More details should be made available later this year, when the associated paper “SEKI@home, or Crowdsourcing an Open Knowledge Graph” is presented at KECSM 2012.
KECSM 2012 paper:
- Thomas Steiner and Stefan Mirea, “SEKI@home, or Crowdsourcing an Open Knowledge Graph”, Proceedings of the 1st International Workshop on Knowledge Extraction and Consolidation from Social Media (KECSM 2012), Boston, USA, 2012.