Data Scientist Dataset Finder Blog
Friday, 7 July 2023
List of Finnish Datasets for NLP Projects
https://metatext.io/datasets-list/finnish-language
Thursday, 29 June 2023
Text summarization datasets
**Paper:** Text Summarization with Pretrained Encoders (Liu and Lapata, 2019)
https://arxiv.org/abs/1908.08345
**Dataset:**
1) The CNN/DailyMail news highlights dataset: somewhat extractive
- News articles paired with highlights that give a brief overview of each article
- Input documents: truncated to 512 tokens
- https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail
2) The New York Times Annotated Corpus (NYT): somewhat extractive
- Contains 110,540 articles with abstractive summaries
- Input documents: truncated to 800 tokens
- https://research.google/resources/datasets/ny-times-annotated-corpus/
3) XSum: abstractive
- 226,711 news articles, each paired with a one-sentence summary that answers the question ‘What is this article about?’
- Input documents: truncated to 512 tokens
- https://github.com/google-research-datasets/xsum_hallucination_annotations
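The token limits above are truncations applied to each input document. The "extractive" and "abstractive" labels describe how much of each summary is copied from its source. A common way to quantify that is the proportion of summary n-grams that never appear in the document.

The sketch below shows both ideas under simplifying assumptions. It uses plain whitespace tokenization, whereas BERT-style models count subword tokens, so real counts differ. The function names are hypothetical.

```python
def truncate_tokens(text: str, max_tokens: int = 512) -> str:
    """Keep only the first `max_tokens` whitespace-separated tokens of `text`.

    Assumption: whitespace tokenization. BERT-style models count subword
    tokens, so a real pipeline would truncate after its own tokenizer.
    """
    return " ".join(text.split()[:max_tokens])


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of unique n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(document: str, summary: str, n: int = 2) -> float:
    """Fraction of the summary's unique n-grams that do not occur in the document.

    Near 0.0: mostly copied (extractive). Near 1.0: mostly rewritten (abstractive).
    """
    doc_grams = ngrams(document.lower().split(), n)
    summary_grams = ngrams(summary.lower().split(), n)
    if not summary_grams:
        return 0.0  # summary too short to contain any n-gram
    return len(summary_grams - doc_grams) / len(summary_grams)


if __name__ == "__main__":
    doc = "the cat sat on the mat"
    print(truncate_tokens(doc, 3))                # the cat sat
    print(novel_ngram_ratio(doc, "the cat sat"))  # 0.0
    print(novel_ngram_ratio(doc, "the cat slept"))  # 0.5
```

This sketch counts unique n-grams; counting with multiplicity is an equally reasonable choice and gives slightly different numbers.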
Sunday, 6 December 2020
pix2pix dataset
Go to this page:
https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/
or download an archive directly (each file below sits at that URL followed by its file name):
| File | Last modified | Size |
|---|---|---|
| cityscapes.tar.gz | 2016-12-02 23:15 | 99M |
| edges2handbags.tar.gz | 2016-12-02 23:16 | 8.0G |
| edges2shoes.tar.gz | 2016-12-02 23:17 | 2.0G |
| facades.tar.gz | 2016-12-02 23:17 | 29M |
| maps.tar.gz | 2016-12-02 23:17 | 239M |
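Below is a minimal sketch for fetching one of these archives with the Python standard library. It makes two assumptions:

- The directory above still serves files at `<base URL><name>.tar.gz`.
- Each archive unpacks into a folder named after the dataset.

The helper names are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path

# Base URL and archive names taken from the directory listing above.
BASE_URL = "https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/"
DATASETS = {"cityscapes", "edges2handbags", "edges2shoes", "facades", "maps"}


def archive_url(name: str) -> str:
    """Build the download URL for one dataset archive."""
    if name not in DATASETS:
        raise ValueError(f"unknown dataset {name!r}; choose from {sorted(DATASETS)}")
    return f"{BASE_URL}{name}.tar.gz"


def download_and_extract(name: str, dest: Path = Path("datasets")) -> Path:
    """Download `<name>.tar.gz` into `dest` (skipped if already present) and unpack it.

    Assumption: the archive contains a top-level folder named after the dataset.
    """
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"{name}.tar.gz"
    if not archive.exists():
        urllib.request.urlretrieve(archive_url(name), archive)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject absolute paths and ".." members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest / name
```

For example, `download_and_extract("facades")` fetches the smallest archive in the listing (29M). Because an existing archive is reused, interrupted runs do not re-download files that are already complete.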
Tuesday, 19 May 2020
RecSys Challenge 2015 dataset
Given the sequence of click events a user performs during a typical session on an e-commerce website, the goal is to predict whether the user will buy something and, if so, which items they will buy. The task can therefore be split into two sub-goals:
- Will the user buy items in this session? (yes/no)
- If so, which items will be bought?
Website:
https://2015.recsyschallenge.com/challenge.html
Dataset link:
https://s3-eu-west-1.amazonaws.com/yc-rdata/yoochoose-data.7z
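The two sub-goals map naturally onto per-session labels. This sketch makes several assumptions:

- The `.7z` archive has already been extracted with a 7-Zip-compatible tool, since the Python standard library cannot read `.7z`.
- The click and buy files are comma-separated with no header row.
- Click rows are laid out as session ID, timestamp, item ID, category.
- Buy rows are laid out as session ID, timestamp, item ID, price, quantity.

Check this layout against the challenge page. The function name is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Iterable


def session_labels(
    click_lines: Iterable[str], buy_lines: Iterable[str]
) -> dict[str, tuple[list[str], bool, set[str]]]:
    """Map each clicked session to (clicked items in time order, will_buy, bought items).

    Assumed columns: clicks = session, timestamp, item, category;
                     buys   = session, timestamp, item, price, quantity.
    Sessions that appear only in the buy file are ignored.
    """
    clicks: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for session, timestamp, item, _category in csv.reader(click_lines):
        clicks[session].append((timestamp, item))

    bought: dict[str, set[str]] = defaultdict(set)
    for session, _timestamp, item, _price, _quantity in csv.reader(buy_lines):
        bought[session].add(item)

    labels = {}
    for session, events in clicks.items():
        # Assumes ISO-8601 timestamps in one uniform format, which sort as strings.
        items_in_order = [item for _, item in sorted(events)]
        buys = bought.get(session, set())
        # Sub-goal 1: does the session end in a purchase? Sub-goal 2: which items?
        labels[session] = (items_in_order, bool(buys), buys)
    return labels
```

Any iterable of CSV lines works, including open file handles (opened with `newline=""`). The extracted file names are not specified here, so substitute whatever the archive actually contains.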
Sunday, 10 May 2020
Network (GML-format graph) datasets
See this page: http://www-personal.umich.edu/~mejn/netdata/
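The files on that page use GML (Graph Modelling Language). GML consists of nested `key value` pairs, where `node [...]` and `edge [...]` blocks hold each node's `id` and each edge's `source`/`target`. Libraries such as NetworkX can read these files directly (`networkx.read_gml`).

The standard-library sketch below only illustrates the format. It assumes well-formed input and does not handle comments or escaped quotes, so it is not a complete GML parser. All names are hypothetical.

```python
import re
from typing import Any

# Tokens: quoted strings, brackets, or bare words/numbers.
_TOKEN = re.compile(r'"[^"]*"|\[|\]|[^\s\[\]"]+')


def _convert(token: str) -> Any:
    """Turn a scalar token into str, int or float."""
    if token.startswith('"'):
        return token[1:-1]
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def _parse_pairs(tokens: list[str], i: int) -> tuple[list[tuple[str, Any]], int]:
    """Parse `key value` pairs until a closing bracket or the end of input."""
    pairs = []
    while i < len(tokens):
        if tokens[i] == "]":
            return pairs, i + 1
        key, value_token = tokens[i], tokens[i + 1]
        if value_token == "[":
            value, i = _parse_pairs(tokens, i + 2)  # nested block
        else:
            value, i = _convert(value_token), i + 2
        pairs.append((key, value))
    return pairs, i


def read_gml(text: str) -> tuple[dict[Any, dict], list[tuple[Any, Any]]]:
    """Return ({node_id: attributes}, [(source, target), ...]) from the `graph` block."""
    top_level, _ = _parse_pairs(_TOKEN.findall(text), 0)
    graph = dict(top_level)["graph"]
    nodes, edges = {}, []
    for key, value in graph:
        attrs = dict(value) if isinstance(value, list) else {}
        if key == "node":
            nodes[attrs["id"]] = attrs
        elif key == "edge":
            edges.append((attrs["source"], attrs["target"]))
    return nodes, edges
```

Graph-level scalars such as `directed 0` are skipped. Only node attributes and edge endpoints are collected.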
Tuesday, 21 April 2020
Monday, 20 April 2020
GAN and image segmentation datasets
| Dataset (download command) | Description | Pre-trained |
|---|---|---|
| `python tools/download-dataset.py facades` | 400 images from the CMP Facades dataset (31 MB) | BtoA |
| `python tools/download-dataset.py cityscapes` | 2,975 images from the Cityscapes training set (113 MB) | |
| `python tools/download-dataset.py maps` | 1,096 training images scraped from Google Maps (246 MB) | |
| `python tools/download-dataset.py edges2shoes` | 50K training images from the UT Zappos50K dataset; edges computed by the HED edge detector plus post-processing (2.2 GB) | AtoB |
| `python tools/download-dataset.py edges2handbags` | 137K Amazon handbag images from the iGAN project; edges computed by the HED edge detector plus post-processing (8.6 GB) | AtoB |
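In pix2pix-style datasets, each training file stores the two domains side by side, with A on the left and B on the right. The AtoB/BtoA direction decides which half is the input and which is the target.

The sketch below shows that split. It assumes the image is already loaded as a row-major grid indexable as `image[row][col]`, such as a list of lists or a NumPy array. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def split_pair(image: Sequence[Sequence], direction: str = "AtoB") -> Tuple[list, list]:
    """Split a side-by-side A|B image into (input, target) halves.

    Assumption: A occupies the left half and B the right half of every row.
    """
    half = len(image[0]) // 2
    a = [row[:half] for row in image]  # left half of each row
    b = [row[half:] for row in image]  # right half of each row
    if direction == "AtoB":
        return a, b
    if direction == "BtoA":
        return b, a
    raise ValueError(f"direction must be 'AtoB' or 'BtoA', got {direction!r}")
```

For example, the facades entry lists a pre-trained BtoA model. Under this assumption, that model's inputs are the right-hand halves and its targets the left-hand halves.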
image segmentation dataset list
- Stanford Background Dataset
- SIFT Flow Dataset
- Barcelona Dataset
- Microsoft COCO dataset
- MSRC Dataset
- LiTS Liver Tumor Segmentation Dataset
- KITTI
- Pascal Context
- Data from Games dataset
- Human parsing dataset
- Mapillary Vistas Dataset
- Microsoft AirSim
- MIT Scene Parsing Benchmark
- COCO 2017 Stuff Segmentation Challenge
- ADE20K Dataset
- Daimler dataset
- ISBI Challenge: Segmentation of neuronal structures in EM stacks
- INRIA Annotations for Graz-02 (IG02)
- Pratheepan Dataset
- Clothing Co-Parsing (CCP) Dataset
- ApolloScape
- UrbanMapper3D
- RoadDetector
- Cityscapes
- CamVid
- Inria Aerial Image Labeling
Sunday, 19 April 2020
COVID-CT
Covid19 Challenge Dataset
Saturday, 18 April 2020
An Open Pan-Cancer Histology Dataset for Nuclei Instance Segmentation and Classification
The ORNL Overhead Vehicle Dataset (OOVD)