dbpediaspotlight

Information Extraction framework from Wikipedia, Wikidata & Commons

Technologies
python, java, scala, rdf, nosql
Topics
big data, data science, natural language processing, semantic web
Almost every major Web company has now announced its work on a Knowledge Graph: Google’s Knowledge Graph, Yahoo!’s Web of Objects, Microsoft’s Satori Graph, Walmart Labs’ Social Genome, and Facebook’s Entity Graph, just to cite the biggest ones.

DBpedia

DBpedia is a community-run project that has been working on a free, open-source Knowledge Graph since 2006!

DBpedia currently describes 38.3 million “things” of 685 different “types” in 128 languages, with over 4 billion “facts”. It is interlinked to many other databases. The knowledge in DBpedia is exposed through a technology stack called Linked Data, which has been revolutionizing the way applications interact with the Web: with Linked Data technologies, all APIs are interconnected via standard Web protocols and languages.
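To make the notion of “things”, “types” and “facts” concrete, here is a minimal sketch of how Linked Data expresses a fact as a subject-predicate-object triple of URIs. The three sample facts and the `match` helper are illustrative assumptions for this page, not part of any DBpedia API; real applications would query the DBpedia datasets or endpoint instead of an in-memory list.

```python
# Namespace prefixes commonly used for DBpedia resources and ontology terms.
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A tiny hand-written sample of facts (illustrative, not a DBpedia export).
FACTS = [
    (DBR + "Berlin", RDF_TYPE, DBO + "City"),
    (DBR + "Berlin", DBO + "country", DBR + "Germany"),
    (DBR + "Germany", RDF_TYPE, DBO + "Country"),
]


def match(facts, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in facts
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # Which country is Berlin in, according to the sample facts?
    for _, _, country in match(FACTS, s=DBR + "Berlin", p=DBO + "country"):
        print(country)
```

Because every thing, type and property is identified by a URI, facts from different datasets can refer to the same thing and be combined with the same kind of pattern matching.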

The Web of Data

This Web of Data provides useful knowledge that can complement the Web of documents in many ways. Consider, for instance, how bloggers tag their posts or assign them to categories in order to organize and interconnect them. This is a very simple way to connect unstructured text to a structure (a hierarchy of tags). For a more advanced example, see how the BBC created its World Cup 2010 website by interconnecting textual content and facts from its knowledge base.

Or, more recently, did you see that IBM's Watson used DBpedia data to win the Jeopardy! challenge?

DBpedia Spotlight

DBpedia Spotlight is an open source text annotation tool that connects text to Linked Data by marking names of things in text (we call that Spotting) and selecting between multiple interpretations of these names (we call that Disambiguation). For example, Washington can be interpreted in more than 50 ways including a state, a government or a person. You can already imagine that this is not a trivial task, especially when we're talking about millions of things and hundreds of types.
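The two steps can be illustrated with a toy sketch. This is not how DBpedia Spotlight is implemented: the candidate table, the context keywords and the word-overlap scoring below are all assumptions made up for illustration, and a real system works with millions of candidates and statistical models.

```python
import re

# Hypothetical candidate table: surface form -> {resource URI: context keywords}.
# Only three of the many interpretations of "Washington" are listed.
CANDIDATES = {
    "Washington": {
        "http://dbpedia.org/resource/Washington_(state)":
            {"state", "seattle", "pacific"},
        "http://dbpedia.org/resource/Washington,_D.C.":
            {"capital", "government", "congress"},
        "http://dbpedia.org/resource/George_Washington":
            {"president", "general", "born"},
    },
}


def spot(text):
    """Spotting: find every known name in the text, with its character offset."""
    found = []
    for name in CANDIDATES:
        start = text.find(name)
        while start != -1:
            found.append((name, start))
            start = text.find(name, start + len(name))
    return sorted(found, key=lambda item: item[1])


def disambiguate(name, text):
    """Disambiguation: pick the candidate whose keywords overlap most with the text.

    Ties are resolved in favor of the candidate listed first in the table.
    """
    words = set(re.findall(r"\w+", text.lower()))
    scores = {uri: len(keywords & words)
              for uri, keywords in CANDIDATES[name].items()}
    return max(scores, key=scores.get)


def annotate(text):
    """Run both steps and return (name, offset, chosen URI) for each spot."""
    return [(name, offset, disambiguate(name, text))
            for name, offset in spot(text)]


if __name__ == "__main__":
    print(annotate("Washington was the first president."))
    print(annotate("Seattle is the largest city in Washington state."))
```

Even in this toy version the same name resolves to a person in the first sentence and to a state in the second, purely because of the surrounding words.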

We regularly grow our community through GSoC and can offer you more and more opportunities.
2016 Program

Successful Projects

Contributor
Kunal Jha
Mentor
Sandro, Tim Ermilov, Axel Ngonga, Dimitris Kontokostas
Organization
dbpediaspotlight
DBpedia Lookup Improvements
DBpedia is one of the most extensive and most widely used knowledge bases in over 125 languages. DBpedia Lookup is a tool that allows The DBpedia...
Contributor
wojtuch
Mentor
taraathan@gmail.com, Alexandru, swadpasc
Organization
dbpediaspotlight
Combining DBpedia and Topic Modelling
DBpedia, a crowd- and open-sourced community project extracting the content from Wikipedia, stores this information in a huge RDF graph. DBpedia...
Contributor
FedBai
Mentor
Emanuele Storti, Marco Fossati, Claudia Diamantini, Domenico Potena
Organization
dbpediaspotlight
The List Extractor
The project focuses on the extraction of relevant but hidden data which lies inside lists in Wikipedia pages. The information is unstructured and...
Contributor
Aditya Nambiar
Mentor
Nilesh, Dimitris Kontokostas, Chile
Organization
dbpediaspotlight
Automatic mappings extraction
DBpedia currently maintains a mapping between Wikipedia infobox properties and the DBpedia ontology, since several similar templates exist to...
Contributor
Vincent Bohlen
Mentor
Marco Fossati, Alexandru, swadpasc
Organization
dbpediaspotlight
A Hybrid Classifier/Rule-based Event Extractor for DBpedia Proposal
In modern times the amount of information published on the internet is growing to an immeasurable extent. Humans are no longer able to gather all the...
Contributor
s.papalini
Mentor
Emanuele Storti, Marco Fossati, Claudia Diamantini, Domenico Potena
Organization
dbpediaspotlight
The Table Extractor
Wikipedia is full of data hidden in tables. The aim of this project is to explore the possibilities of taking advantage of all the data represented...
Contributor
Peng_Xu
Mentor
Nilesh, Domenico Potena
Organization
dbpediaspotlight
Inferring infobox template class mappings from Wikipedia + Wikidata
There are many infoboxes on Wikipedia. Every infobox has some properties, and every infobox follows a certain template. In my project, the goal...
Contributor
wmaroy
Mentor
Anastasia Dimou, Alexandru, Dimitris Kontokostas
Organization
dbpediaspotlight
Integrating RML in the DBpedia extraction framework
This project is about integrating RML in the DBpedia extraction framework. DBpedia is derived from Wikipedia infoboxes using the extraction framework...