https://metatext.io/datasets-list/finnish-language
Friday 7 July 2023
List of Finnish Datasets for NLP Projects
Thursday 29 June 2023
Text summarization datasets
**Paper:**
Text Summarization with Pretrained Encoders (Liu and Lapata, 2019): https://arxiv.org/abs/1908.08345
**Dataset:**
1) CNN/DailyMail news highlights dataset: somewhat extractive
- News articles paired with highlights that give a brief overview of each article
- Input documents: truncated to 512 tokens
- https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail
2) New York Times Annotated Corpus (NYT): somewhat extractive
- Contains 110,540 articles with abstract summaries
- Input documents: truncated to 800 tokens
- https://research.google/resources/datasets/ny-times-annotated-corpus/
3) XSum: abstractive
- 226,711 news articles, each paired with a one-sentence summary answering the question "What is this article about?"
- Input documents: truncated to 512 tokens
- https://github.com/google-research-datasets/xsum_hallucination_annotations
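
The token limits and the extractive/abstractive labels above can be illustrated with a small sketch. It rests on assumptions that are not in the original notes: tokens are split on whitespace (a real pipeline would more likely use a subword tokenizer, so its counts would differ), and the helper names `tokenize`, `truncate_document`, and `novel_ngram_ratio` are hypothetical. The ratio of summary n-grams missing from the source is only one rough proxy for how abstractive a summary is.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Split on whitespace (an assumption; subword tokenizers give different counts)."""
    return text.split()


def truncate_document(text: str, max_tokens: int) -> str:
    """Keep only the first `max_tokens` whitespace tokens of `text`."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    return " ".join(tokenize(text)[:max_tokens])


def novel_ngram_ratio(source: str, summary: str, n: int = 2) -> float:
    """Fraction of distinct summary n-grams that never occur in the source.

    Values near 0 suggest a more extractive summary,
    values near 1 a more abstractive one.
    """
    def ngrams(tokens: List[str]) -> Set[Tuple[str, ...]]:
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    source_ngrams = ngrams(tokenize(source.lower()))
    summary_ngrams = ngrams(tokenize(summary.lower()))
    if not summary_ngrams:
        # Summary shorter than n tokens: no n-grams to compare.
        return 0.0
    return len(summary_ngrams - source_ngrams) / len(summary_ngrams)


if __name__ == "__main__":
    article = "the cat sat on the mat while the dog slept"
    # Limit the input to 6 tokens, analogous to the 512/800 token limits above.
    print(truncate_document(article, 6))
    # A summary copied from the article versus one written in new words.
    print(novel_ngram_ratio(article, "the cat sat on the mat"))
    print(novel_ngram_ratio(article, "a feline rested indoors"))
```

With whitespace tokens, a summary copied verbatim from the article scores 0.0 and a summary that shares no bigrams with it scores 1.0. Real datasets fall in between.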