Contributor
wojtuch

Combining DBpedia and Topic Modelling


Mentors
taraathan@gmail.com, Alexandru, swadpasc
Organization
dbpediaspotlight

DBpedia, a crowd- and open-sourced community project extracting the content from Wikipedia, stores this information in a huge RDF graph. DBpedia Spotlight is a tool which delivers the DBpedia resources that are being mentioned in the document.

Using DBpedia Spotlight to extract and disambiguate Named Entities from Wikipedia articles and then applying a topic modelling algorithm (e.g. LDA) with URIs of DBpedia resources as features would result in a model, which is capable of describing the documents with the proportions of the topics covering them. But because the topics are also represented by DBpedia URIs, this approach could result in a novel RDF hierarchy and ontology with insights for further analysis of the emerged subgraphs.

The direct implication and first application scenario for this project would be utilizing the inference engine in DBpedia Spotlight, as an additional step after the document has been annotated and predicting its topic coverage.