Resources
Resources are grouped into four categories: corpora, tools, events and other.
Corpora
The most popular annotated and unannotated corpora created and/or used by researchers in automatic grammatical error correction.
- The NUS Corpus of Learner English — A corpus of about 1,400 student essays on a wide range of topics, such as environmental pollution and healthcare, annotated by professional English instructors with error tags and corrections across 28 error categories. Annotated: yes. Size: about 1,400 essays.
- The WikEd error corpus ver. 1.0 — A large, publicly available corpus of corrective edits extracted from English Wikipedia. Annotated: no information. Size: about 55M sentences.
Tools
Useful tools that have been used to develop various grammatical error correction systems, for example NLP tools, machine learning toolkits and evaluation scripts.
- Moses — A statistical machine translation system that allows you to automatically train translation models for any language pair. Check the fscorer branch for scripts designed for automatic grammatical error correction.