Thursday 16 April 2020

Finnish NER dataset

https://github.com/mpsilfve/finer-data

The data directory of the repository contains a corpus of Finnish technology-related news articles with manually prepared named entity annotation (digitoday.2014.csv). The text material was extracted from the archives of Digitoday, a Finnish online technology news source (www.digitoday.fi). The corpus consists of 953 articles (193,742 word tokens) annotated with six named entity classes: organization, location, person, product, event, and date. It is available for research purposes and can be used directly for developing NER systems for Finnish. The corpus is described in the article
"A Finnish News Corpus for Named Entity Recognition" (in review)
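
As a rough starting point for working with an annotated corpus like this one, here is a minimal Python sketch that groups tokens into sentences and counts entities per class. The post does not describe the file layout, so the format is an assumption: one token per line, tab-separated columns with the token first and a BIO-style tag second, and a blank line between sentences. The tag names (ORG, PRO, LOC) and the sample sentences are made up for illustration; check the actual file and adjust the column handling before relying on this.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences as lists of (token, tag) pairs.

    ASSUMPTION: one token per line, tab-separated, token in the first
    column and tag in the second, blank line between sentences.
    """
    sentence: Sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        token = columns[0]
        # Treat a missing or empty tag column as "outside any entity".
        tag = columns[1] if len(columns) > 1 and columns[1] else "O"
        sentence.append((token, tag))
    if sentence:
        yield sentence


def count_entities(sentences: Iterable[Sentence]) -> Counter:
    """Count entities per class, treating each B- tag as one entity start."""
    counts: Counter = Counter()
    for sentence in sentences:
        for _, tag in sentence:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
    return counts


# Hypothetical sample in the assumed format (not taken from the corpus).
SAMPLE = (
    "Nokia\tB-ORG\n"
    "julkaisi\tO\n"
    "Lumian\tB-PRO\n"
    "Espoossa\tB-LOC\n"
    ".\tO\n"
    "\n"
    "Hinta\tO\n"
    "nousi\tO\n"
    ".\tO\n"
)

sentences = list(read_sentences(SAMPLE.splitlines()))
entity_counts = count_entities(sentences)
```

To try it on the real file, the same function can take an open file handle, for example open("data/digitoday.2014.csv", encoding="utf-8"), provided the layout matches the assumption above.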
