https://metatext.io/datasets-list/finnish-language
Data Scientist Dataset Finder Blog
Friday, 7 July 2023
List of Finnish Datasets for NLP Projects
Thursday, 29 June 2023
Text summarization datasets
**Paper:**
https://arxiv.org/abs/1908.08345
**Dataset:**
1) The CNN/DailyMail news highlights dataset: somewhat extractive
- News Articles & Related Highlights: Provides a brief overview of articles
- Input document: limited to 512 tokens
- https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail
2) The New York Times Annotated Corpus (NYT): somewhat extractive
- Contains 110,540 articles with abstract summaries
- Input document: limited to 800 tokens
- https://research.google/resources/datasets/ny-times-annotated-corpus/
3) XSum: Abstractive
- 226,711 news articles, each paired with a one-sentence summary answering the question ‘What is this article about?’
- Input document: limited to 512 tokens
- https://github.com/google-research-datasets/xsum_hallucination_annotations
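The input limits listed above (512 tokens for CNN/DailyMail and XSum, 800 for NYT) amount to cutting each document off before it reaches the model. Here is a minimal sketch of that step, assuming whitespace-separated words stand in for tokens. The paper's actual tokenizer is not shown in this post, and the names `TOKEN_LIMITS` and `truncate_document` are made up for illustration.

```python
# Token limits as listed in this post; the keys are illustrative labels only.
TOKEN_LIMITS = {"cnn_dailymail": 512, "nyt": 800, "xsum": 512}


def truncate_document(text: str, dataset: str) -> str:
    """Keep only the first N whitespace-separated tokens of a document."""
    limit = TOKEN_LIMITS[dataset]
    tokens = text.split()  # assumption: plain whitespace tokens, not subword pieces
    return " ".join(tokens[:limit])


if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(1000))
    print(len(truncate_document(doc, "xsum").split()))
    print(len(truncate_document(doc, "nyt").split()))
```

With a real subword tokenizer the cut-off would fall at a different point in the text, so treat the counts here as approximate.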
Sunday, 6 December 2020
pix2pix dataset
Go to this page:
https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/
or download one of the archives listed there directly:
File | Last modified | Size
cityscapes.tar.gz | 2016-12-02 23:15 | 99M
edges2handbags.tar.gz | 2016-12-02 23:16 | 8.0G
edges2shoes.tar.gz | 2016-12-02 23:17 | 2.0G
facades.tar.gz | 2016-12-02 23:17 | 29M
maps.tar.gz | 2016-12-02 23:17 | 239M
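The archives in the listing can also be fetched from a script. The sketch below assumes each file sits directly under the directory URL given above, which is how such index pages usually work but is not confirmed here, and that the server is still reachable. The function names `archive_url` and `download_and_extract` are hypothetical helpers.

```python
import tarfile
import urllib.request
from pathlib import Path

# Directory URL from this post; archive names taken from the listing.
BASE_URL = "https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/"
ARCHIVES = ("cityscapes", "edges2handbags", "edges2shoes", "facades", "maps")


def archive_url(name: str) -> str:
    """Build the download URL, assuming files live directly under BASE_URL."""
    if name not in ARCHIVES:
        raise ValueError(f"unknown dataset: {name}")
    return f"{BASE_URL}{name}.tar.gz"


def download_and_extract(name: str, dest: str = "datasets") -> Path:
    """Download one archive and unpack it into dest (performs network access)."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / f"{name}.tar.gz"
    urllib.request.urlretrieve(archive_url(name), str(archive_path))
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return dest_dir


if __name__ == "__main__":
    # Only print the URL here; call download_and_extract("facades") to fetch it.
    print(archive_url("facades"))
```

Starting with `facades` (29M) is a cheap way to check the setup before pulling the 8.0G handbags archive.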
Tuesday, 19 May 2020
RecSys Challenge 2015 dataset
Given a sequence of click events performed by a user during a typical session on an e-commerce website, the goal is to predict whether the user will buy something and, if so, which items. The task can therefore be divided into two sub-goals:
- Is the user going to buy items in this session? Yes|No
- If yes, what are the items that are going to be bought?
Website:
https://2015.recsyschallenge.com/challenge.html
dataset link:
https://s3-eu-west-1.amazonaws.com/yc-rdata/yoochoose-data.7z
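The two sub-goals described above translate into two training targets per session: a yes/no label and a set of bought items. Below is a rough sketch of building those targets, assuming click and buy events have already been reduced to `(session_id, item_id)` pairs. That simplified row shape and the name `build_targets` are illustrative assumptions, not the layout of the downloaded files.

```python
from collections import defaultdict


def build_targets(click_rows, buy_rows):
    """Return {session_id: (will_buy, bought_items)} for every clicked session.

    Both inputs are iterables of (session_id, item_id) pairs, a simplified
    shape assumed for this sketch.
    """
    bought = defaultdict(set)
    for session_id, item_id in buy_rows:
        bought[session_id].add(item_id)

    targets = {}
    for session_id, _item_id in click_rows:
        items = set(bought.get(session_id, ()))
        # Sub-goal 1: did the session end in a purchase?
        # Sub-goal 2: which items were bought?
        targets[session_id] = (bool(items), items)
    return targets


if __name__ == "__main__":
    clicks = [(1, 10), (1, 11), (2, 20)]
    buys = [(1, 11)]
    print(build_targets(clicks, buys))
```

Sessions that appear only in the click data get `(False, set())`, which is what makes the first sub-goal a binary classification problem.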
Sunday, 10 May 2020
Network (GML format graph) Dataset
Refer to this page: http://www-personal.umich.edu/~mejn/netdata/
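GML files are plain text made of nested `node [ ... ]` and `edge [ ... ]` blocks, so a quick look at a graph does not require a full parser. The sketch below uses only the standard library and handles a simple subset: integer ids, and no nested blocks before the `id`, `source`, or `target` keys. The function name `parse_gml` is hypothetical, and a dedicated graph library is the safer choice for real work.

```python
import re

# Patterns for the simple GML subset described above (integer ids only).
NODE_RE = re.compile(r"node\s*\[[^\]]*?\bid\s+(-?\d+)")
EDGE_RE = re.compile(
    r"edge\s*\[[^\]]*?\bsource\s+(-?\d+)[^\]]*?\btarget\s+(-?\d+)"
)


def parse_gml(text: str):
    """Return (node_ids, edges) extracted from a simple GML string."""
    nodes = [int(n) for n in NODE_RE.findall(text)]
    edges = [(int(s), int(t)) for s, t in EDGE_RE.findall(text)]
    return nodes, edges


if __name__ == "__main__":
    sample = """
    graph [
      directed 0
      node [ id 0 label "a" ]
      node [ id 1 label "b" ]
      node [ id 2 label "c" ]
      edge [ source 0 target 1 ]
      edge [ source 1 target 2 ]
    ]
    """
    print(parse_gml(sample))
```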
Tuesday, 21 April 2020
Monday, 20 April 2020
GAN, image segmentation dataset
- python tools/download-dataset.py facades: 400 images from the CMP Facades dataset (31MB). Pre-trained: BtoA
- python tools/download-dataset.py cityscapes: 2975 images from the Cityscapes training set (113MB).
- python tools/download-dataset.py maps: 1096 training images scraped from Google Maps (246MB).
- python tools/download-dataset.py edges2shoes: 50k training images from the UT Zappos50K dataset. Edges are computed by HED edge detector + post-processing (2.2GB). Pre-trained: AtoB
- python tools/download-dataset.py edges2handbags: 137K Amazon Handbag images from the iGAN project. Edges are computed by HED edge detector + post-processing (8.6GB). Pre-trained: AtoB
image segmentation dataset list
- Stanford Background Dataset
- Sift Flow Dataset
- Barcelona Dataset
- Microsoft COCO dataset
- MSRC Dataset
- LITS Liver Tumor Segmentation Dataset
- KITTI
- Pascal Context
- Data from Games dataset
- Human parsing dataset
- Mapillary Vistas Dataset
- Microsoft AirSim
- MIT Scene Parsing Benchmark
- COCO 2017 Stuff Segmentation Challenge
- ADE20K Dataset
- Daimler dataset
- ISBI Challenge: Segmentation of neuronal structures in EM stacks
- INRIA Annotations for Graz-02 (IG02)
- Pratheepan Dataset
- Clothing Co-Parsing (CCP) Dataset
- Inria Aerial Image
- ApolloScape
- UrbanMapper3D
- RoadDetector
- Cityscapes
- CamVid
- Inria Aerial Image Labeling
Sunday, 19 April 2020
COVID-CT
Covid19 Challenge Dataset
Saturday, 18 April 2020
An Open Pan-Cancer Histology Dataset for Nuclei Instance Segmentation and Classification
The ORNL Overhead Vehicle Dataset (OOVD)