DEVELOPING A NEW DBPEDIA ABSTRACT EXTRACTOR
- Mentors
- Mykola Medynskyi, Dimitris Kontokostas, mhofer
- Organization
- DBpedia
- Technologies
- java, scala
- Topics
- semantic web
DBpedia provides monthly releases produced by the DBpedia Extraction Framework. These releases consist of various data artifacts that mainly stem from the wiki dumps. Some artifacts, however, also rely on API calls to render dynamic content, as is the case for the DBpedia abstracts. The amount of data requested from the APIs is now so large that it can no longer be extracted in full within a month. We propose to address this issue with a four-step strategy:
- a study based on the data recorded during the last abstract extraction
- testing and implementation of the TextExtracts extension, together with improved error management
- a reduction in the number of possible calls
- integration into the framework of the ability to call more than one API
Each step of the project will be developed in a dedicated new GitHub branch of the DBpedia Extraction Framework, which can be documented and used for work on the project.
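As a rough illustration of the last three steps, the following Java sketch builds a TextExtracts query for a batch of page titles and falls back to a second API endpoint when the first one fails. It is a minimal sketch under stated assumptions, not code from the DBpedia Extraction Framework: the class name `AbstractRequestSketch`, its methods, the endpoint URLs and the batch size are all hypothetical. Only the query parameters (`prop=extracts`, `exintro`, `explaintext`, `exlimit`) come from the TextExtracts extension, and the limit that a given wiki accepts for `exlimit` should be checked before relying on any batch size. The HTTP call is passed in as a function so that the sketch runs without network access.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of the DBpedia Extraction Framework.
class AbstractRequestSketch {

    // Build one TextExtracts query URL for a batch of page titles.
    // exintro=1 limits the extract to the text before the first section,
    // explaintext=1 asks for plain text instead of limited HTML.
    static String buildUrl(String endpoint, List<String> titles) {
        String joined = URLEncoder.encode(String.join("|", titles), StandardCharsets.UTF_8);
        return endpoint + "?action=query&format=json&prop=extracts"
                + "&exintro=1&explaintext=1"
                + "&exlimit=" + titles.size()
                + "&titles=" + joined;
    }

    // Split the titles into batches so that fewer requests are sent.
    static List<List<String>> batches(List<String> titles, int batchSize) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < titles.size(); i += batchSize) {
            int end = Math.min(i + batchSize, titles.size());
            result.add(new ArrayList<>(titles.subList(i, end)));
        }
        return result;
    }

    // Try each endpoint in order and move to the next one when the fetcher throws.
    // The fetcher stands in for the real HTTP call (an assumption of this sketch).
    static String fetchWithFallback(List<String> endpoints, List<String> titles,
                                    Function<String, String> fetcher) {
        RuntimeException last = null;
        for (String endpoint : endpoints) {
            try {
                return fetcher.apply(buildUrl(endpoint, titles));
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next endpoint
            }
        }
        throw new IllegalStateException("all endpoints failed", last);
    }

    public static void main(String[] args) {
        // Placeholder endpoints; a real setup would list the actual API mirrors.
        List<String> endpoints = List.of(
                "https://primary.example.org/w/api.php",
                "https://mirror.example.org/w/api.php");
        List<String> titles = List.of("Berlin", "Paris", "Rome");

        for (List<String> batch : batches(titles, 2)) {
            // Fake fetcher: echoes the URL instead of performing a request.
            String response = fetchWithFallback(endpoints, batch, url -> "requested " + url);
            System.out.println(response);
        }
    }
}
```

Grouping titles into one request addresses the number of calls, while the ordered endpoint list shows one simple way to use more than one API. A production version would also need rate limiting, retries with back-off and handling of continuation in the API response, which this sketch leaves out.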