https://metatext.io/datasets-list/finnish-language
Friday, 7 July 2023
List of Finnish Datasets for NLP Projects
Thursday, 29 June 2023
Text summarization datasets
**Paper:**
https://arxiv.org/abs/1908.08345
**Dataset:**
1) The CNN/DailyMail news highlights dataset: somewhat extractive
- News articles with associated highlights that give a brief overview of each article
- Input documents: truncated to 512 tokens
- https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail
2) The New York Times Annotated Corpus (NYT): somewhat extractive
- Contains 110,540 articles with abstractive summaries
- Input documents: truncated to 800 tokens
- https://research.google/resources/datasets/ny-times-annotated-corpus/
3) XSum: abstractive
- 226,711 news articles, each paired with a one-sentence summary answering the question "What is this article about?"
- Input documents: truncated to 512 tokens
- https://github.com/google-research-datasets/xsum_hallucination_annotations
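The per-dataset input limits listed above can be applied with a small helper. This is a minimal sketch only: the dictionary keys and the function name `truncate_document` are hypothetical, and whitespace splitting is an assumption used for illustration. A subword tokenizer, as typically used with pretrained encoders, would give different token counts.

```python
# Input-length limits as listed in this post.
# The dictionary keys are hypothetical labels chosen for this sketch.
MAX_INPUT_TOKENS = {
    "cnn_dailymail": 512,
    "nyt": 800,
    "xsum": 512,
}


def truncate_document(text: str, dataset: str) -> str:
    """Keep only the first N whitespace-separated tokens of a document.

    Whitespace tokenization is an assumption made for illustration;
    it is not the tokenization used in the paper.
    """
    limit = MAX_INPUT_TOKENS[dataset]
    tokens = text.split()
    return " ".join(tokens[:limit])


if __name__ == "__main__":
    article = "word " * 1000  # a dummy 1000-token document
    print(len(truncate_document(article, "xsum").split()))  # prints 512
```

Documents shorter than the limit are returned unchanged apart from whitespace normalization.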