languagetool.org

Style and grammar checker

Technologies
javascript, java, machine learning, tensorflow, ai
Topics
education, artificial intelligence, language, nlp, edtech
What

LanguageTool scans text for style, spelling, and grammar errors. In some cases, it can even find semantic issues. For example, what could be wrong with "Thursday, 27 June 2017"? Well, 27 June 2017 was not a Thursday, and LanguageTool detects that.
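
To illustrate the kind of semantic check described above, here is a minimal sketch in Python of a weekday/date consistency test. It is not LanguageTool's actual rule; the function name `find_weekday_mismatches`, the regular expression, and the restriction to the "Weekday, D Month YYYY" English format are assumptions made for this example.

```python
import re
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Hypothetical pattern: matches only dates like "Thursday, 27 June 2017".
DATE_RE = re.compile(
    r"\b(" + "|".join(WEEKDAYS) + r"),\s+(\d{1,2})\s+("
    + "|".join(MONTHS) + r")\s+(\d{4})\b"
)

def find_weekday_mismatches(text):
    """Return (matched_text, actual_weekday) for each inconsistent date."""
    mismatches = []
    for match in DATE_RE.finditer(text):
        claimed = match.group(1)
        day = int(match.group(2))
        month = MONTHS[match.group(3)]
        year = int(match.group(4))
        try:
            actual = WEEKDAYS[date(year, month, day).weekday()]
        except ValueError:
            # Not a real calendar date (e.g. 31 June); out of scope here.
            continue
        if actual != claimed:
            mismatches.append((match.group(0), actual))
    return mismatches

print(find_weekday_mismatches("See you on Thursday, 27 June 2017."))
```

The sketch reports the actual weekday so that a checker could offer it as a suggested replacement.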

LanguageTool supports more than 20 languages (to varying degrees), including English, Russian, German, Polish, Spanish, and French.

How

Internally, LanguageTool uses four different approaches to find errors:

  • it scans for known error patterns with a pattern language similar to regular expressions, but more powerful
  • it uses Java code to find errors that are too complex for the error-pattern approach
  • it uses statistics to find uncommon sequences of words
  • it uses artificial intelligence to see if commonly confused words are used properly (like ad/add or cease/seize)
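
As a rough illustration of the statistical idea in the list above, the following Python sketch counts word pairs (bigrams) in a tiny corpus and suggests swapping a commonly confused word when the alternative yields more frequently seen word sequences. This is a toy stand-in, not LanguageTool's implementation: the corpus, the function names (`train`, `score`, `suggest`), and the scoring by raw bigram counts are all assumptions for the example, and LanguageTool's own n-gram data and neural network models are not used here.

```python
from collections import Counter

def bigrams(tokens):
    """Return adjacent word pairs of a token list."""
    return list(zip(tokens, tokens[1:]))

def train(sentences):
    """Count bigrams over a (toy) corpus of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(bigrams(sentence.lower().split()))
    return counts

def score(tokens, counts):
    """Sum of corpus counts of all bigrams in the token list."""
    return sum(counts[pair] for pair in bigrams(tokens))

def suggest(sentence, counts, confusion_pairs):
    """Return (position, word, alternative) where the alternative scores higher."""
    tokens = sentence.lower().split()
    suggestions = []
    for i, token in enumerate(tokens):
        alternative = confusion_pairs.get(token)
        if alternative is None:
            continue
        candidate = tokens[:i] + [alternative] + tokens[i + 1:]
        if score(candidate, counts) > score(tokens, counts):
            suggestions.append((i, token, alternative))
    return suggestions

# Hypothetical toy corpus and confusion pair, for illustration only.
corpus = ["please add the file", "we saw an ad on tv",
          "add the numbers", "an ad for shoes"]
counts = train(corpus)
pairs = {"ad": "add", "add": "ad"}

print(suggest("please ad the file", counts, pairs))
print(suggest("we saw an ad on tv", counts, pairs))
```

A real system would need a far larger corpus, smoothing for unseen word pairs, and a threshold to avoid false alarms; the sketch only shows the basic comparison between a sentence and its variant.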

The Future

In the future, artificial intelligence will be the main approach to detecting text errors. We're looking for your help and ideas to apply AI to the proofreading problem, for example by using a seq2seq approach like the one used in machine translation.

LanguageTool is also an end-user application, and users want LanguageTool to be integrated into the software they already use. We're looking for integrations into TinyMCE, CKEditor, and many others (your suggestions are welcome). Plus, the existing browser add-on for Firefox and Chrome needs major UI improvements.

2018 Program

Successful Projects

Contributor
Oleg Serikov
Mentor
Yakov Reztsov
Organization
languagetool.org
Suggestions sorting improvement, migration to the modern server-side framework, migration from Maven to Gradle
During the GSoC I'm going to complete the following tasks: Enhance the suggestions sorting algorithm using the ML-way inspired by after the...
Contributor
Allen Antony
Mentor
jaumeortola
Organization
languagetool.org
Confusion Pair Correction Using Sequence to Sequence Models
LanguageTool (LT) currently uses neural networks to detect confusion between words. So far, it only considers 2 words of context in both directions...