https://metatext.io/datasets-list/finnish-language
Friday, 7 July 2023
List of Finnish Datasets for NLP Projects
Thursday, 29 June 2023
Text summarisation datasets
**Paper:**
https://arxiv.org/abs/1908.08345
**Dataset:**
1) The CNN/DailyMail news highlights dataset: somewhat extractive
- News articles with associated highlights; each set of highlights gives a brief overview of its article
- Input document: limited to 512 tokens
- https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail
2) The New York Times Annotated Corpus (NYT): somewhat extractive
- Contains 110,540 articles with abstractive summaries
- Input document: limited to 800 tokens
- https://research.google/resources/datasets/ny-times-annotated-corpus/
3) XSum: abstractive
- 226,711 news articles, each paired with a one-sentence summary answering the question "What is this article about?"
- Input document: limited to 512 tokens
- https://github.com/google-research-datasets/xsum_hallucination_annotations
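The input limits listed above (512 tokens for CNN/DailyMail and XSum, 800 for NYT) can be illustrated with a small truncation helper. This is only a minimal sketch: it splits on whitespace as a stand-in for the paper's tokenizer, which is an assumption made here to keep the example self-contained, and the names `MAX_TOKENS` and `truncate_document` are hypothetical.

```python
# Per-dataset input limits, taken from the list above.
# The dictionary keys are illustrative labels, not official dataset identifiers.
MAX_TOKENS = {"cnn_dailymail": 512, "nyt": 800, "xsum": 512}


def truncate_document(text: str, dataset: str) -> str:
    """Keep only the first MAX_TOKENS[dataset] tokens of a document.

    Assumption: tokens are whitespace-separated words. A subword
    tokenizer would generally produce different token counts.
    """
    limit = MAX_TOKENS[dataset]
    tokens = text.split()
    return " ".join(tokens[:limit])


if __name__ == "__main__":
    # Synthetic 1000-token document, used only to show the effect of each limit.
    doc = " ".join(f"w{i}" for i in range(1000))
    for name in MAX_TOKENS:
        print(name, len(truncate_document(doc, name).split()))
```

Documents shorter than the limit pass through unchanged, so the helper can be applied uniformly to a whole corpus.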