Data Scientist Dataset Finder Blog
Friday 7 July 2023
List of Finnish Datasets for NLP Projects
https://metatext.io/datasets-list/finnish-language
Thursday 29 June 2023
Text summarization datasets
**Paper:**
https://arxiv.org/abs/1908.08345
**Dataset:**
1) CNN/DailyMail news highlights dataset: somewhat extractive
- News articles paired with highlights that give a brief overview of each article
- Input document: limited to 512 tokens
- https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail
2) New York Times Annotated Corpus (NYT): somewhat extractive
- Contains 110,540 articles with abstractive summaries
- Input document: limited to 800 tokens
- https://research.google/resources/datasets/ny-times-annotated-corpus/
3) XSum: abstractive
- 226,711 news articles, each paired with a one-sentence summary answering the question 'What is this article about?'
- Input document: limited to 512 tokens
- https://github.com/google-research-datasets/xsum_hallucination_annotations
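The "somewhat extractive" and "abstractive" labels above can be quantified by checking how many of a summary's n-grams never appear in its source article. The following is a minimal sketch, assuming plain lowercase whitespace tokenization in place of a real subword tokenizer; `tokenize`, `truncate`, `ngrams` and `novel_ngram_ratio` are hypothetical helpers written for illustration, not part of any dataset's tooling.

```python
def tokenize(text: str) -> list[str]:
    # Hypothetical stand-in tokenizer: lowercase + whitespace split.
    # Models such as the one in the paper use subword tokens, so counts differ.
    return text.lower().split()


def truncate(tokens: list[str], max_tokens: int) -> list[str]:
    # Keep the first max_tokens tokens, e.g. 512 (CNN/DailyMail, XSum) or 800 (NYT).
    return tokens[:max_tokens]


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    # All distinct contiguous n-grams in the token list.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(article: str, summary: str, n: int = 2) -> float:
    """Share of distinct summary n-grams that never appear in the article.

    Near 0.0: the summary mostly copies phrases (extractive).
    Near 1.0: the summary mostly rewrites (abstractive).
    """
    summary_ngrams = ngrams(tokenize(summary), n)
    if not summary_ngrams:  # summary shorter than n tokens
        return 0.0
    article_ngrams = ngrams(tokenize(article), n)
    return len(summary_ngrams - article_ngrams) / len(summary_ngrams)


article = "The cat sat on the mat today"
print(novel_ngram_ratio(article, "The cat sat on the mat"))  # 0.0
print(novel_ngram_ratio(article, "The cat slept"))           # 0.5
```

If the article is truncated first (for example `truncate(tokenize(article), 512)`), summary phrases taken from beyond the cutoff count as novel, which is one reason the input limits above matter for extractive models.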
Sunday 6 December 2020
pix2pix dataset
Go to this page:
https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/
or download the archives directly:
File | Last modified | Size
cityscapes.tar.gz | 2016-12-02 23:15 | 99M
edges2handbags.tar.gz | 2016-12-02 23:16 | 8.0G
edges2shoes.tar.gz | 2016-12-02 23:17 | 2.0G
facades.tar.gz | 2016-12-02 23:17 | 29M
maps.tar.gz | 2016-12-02 23:17 | 239M
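The archives can also be fetched from a script. This is a minimal sketch using only the Python standard library, assuming each archive sits directly under the page URL above and unpacks into a folder named after the dataset; `archive_url` and `download_and_extract` are hypothetical helper names.

```python
import tarfile
import urllib.request
from pathlib import Path

# Base URL from the listing above; assumes each archive sits directly under it.
BASE_URL = "https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/"

# Archive names as listed on the page above.
ARCHIVES = {"cityscapes", "edges2handbags", "edges2shoes", "facades", "maps"}


def archive_url(name: str) -> str:
    """Build the download URL for one of the listed archives."""
    if name not in ARCHIVES:
        raise ValueError(f"unknown pix2pix dataset: {name!r}")
    return f"{BASE_URL}{name}.tar.gz"


def download_and_extract(name: str, dest: str = "datasets") -> Path:
    """Download <name>.tar.gz (if not already present) and unpack it under dest."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / f"{name}.tar.gz"
    if not archive_path.exists():  # skip the download if already fetched
        urllib.request.urlretrieve(archive_url(name), archive_path)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    # Assumption: the archive unpacks into a folder named after the dataset.
    return dest_dir / name
```

For example, `download_and_extract("facades")` fetches the 29M archive; start with a small one, since edges2handbags alone is 8.0G.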
Tuesday 19 May 2020
RecSys Challenge 2015 dataset
Given the sequence of click events a user performs during a typical session on an e-commerce website, the goal is to predict whether the user will buy something and, if so, which items they will buy. The task can therefore be divided into two subgoals:
- Is the user going to buy items in this session? (yes/no)
- If yes, which items are going to be bought?
Website:
https://2015.recsyschallenge.com/challenge.html
Dataset link:
https://s3-eu-west-1.amazonaws.com/yc-rdata/yoochoose-data.7z
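The two subgoals map naturally onto per-session labels. Below is a minimal sketch that derives both labels from click and buy records, assuming header-less CSV rows laid out as `session_id, timestamp, item_id, category` for clicks and `session_id, timestamp, item_id, price, quantity` for buys; those column layouts, and the helper names `read_rows` and `session_targets`, are assumptions for illustration.

```python
import csv
from collections import defaultdict
from collections.abc import Iterable, Iterator


def read_rows(path: str) -> Iterator[list[str]]:
    """Yield rows from a header-less CSV file (assumed format of the data files)."""
    with open(path, newline="") as f:
        yield from csv.reader(f)


def session_targets(
    click_rows: Iterable[list[str]], buy_rows: Iterable[list[str]]
) -> dict[str, tuple[bool, set[str]]]:
    """Map each clicked session to (did_buy, items_bought).

    Assumed row layouts (only the first three columns are used):
      clicks: session_id, timestamp, item_id, category
      buys:   session_id, timestamp, item_id, price, quantity
    """
    # Collect the distinct items bought in each session.
    bought: dict[str, set[str]] = defaultdict(set)
    for session_id, _timestamp, item_id, *_rest in buy_rows:
        bought[session_id].add(item_id)

    # Subgoal 1 label: did the session end in a purchase?
    # Subgoal 2 label: which items were bought?
    targets: dict[str, tuple[bool, set[str]]] = {}
    for session_id, *_rest in click_rows:
        items = bought.get(session_id, set())  # .get does not add keys
        targets[session_id] = (bool(items), set(items))
    return targets


# Toy rows for illustration only (not taken from the real dataset).
clicks = [
    ["s1", "t0", "itemA", "0"],
    ["s1", "t1", "itemB", "0"],
    ["s2", "t2", "itemC", "0"],
]
buys = [["s1", "t3", "itemA", "100", "1"]]
targets = session_targets(clicks, buys)
print(targets)  # {'s1': (True, {'itemA'}), 's2': (False, set())}
```

After unpacking the archive, the same function could be fed `read_rows(...)` over the click and buy files; the exact file names inside the archive are not stated here and should be checked.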
Sunday 10 May 2020
Network datasets (graphs in GML format)
Refer to this page: http://www-personal.umich.edu/~mejn/netdata/
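GML files like these can be loaded with the NetworkX library. A minimal sketch follows, using a small hypothetical graph written inline instead of one of the downloaded files; `SAMPLE_GML` and `load_gml` are illustrative names.

```python
import networkx as nx

# Hypothetical three-node graph in GML syntax, for illustration only.
SAMPLE_GML = """
graph [
  directed 0
  node [ id 0 label "a" ]
  node [ id 1 label "b" ]
  node [ id 2 label "c" ]
  edge [ source 0 target 1 ]
  edge [ source 1 target 2 ]
]
"""


def load_gml(path: str) -> nx.Graph:
    """Read a GML file from disk, keying nodes by their numeric id.

    label="id" avoids relying on a "label" attribute, so files whose
    nodes only carry an id still load.
    """
    return nx.read_gml(path, label="id")


graph = nx.parse_gml(SAMPLE_GML, label="id")
print(graph.number_of_nodes(), graph.number_of_edges())  # 3 2
```

Keying nodes by `id` is a deliberate choice: the default (`label="label"`) requires every node to have a `label` attribute, which not every GML file provides.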
Tuesday 21 April 2020
Monday 20 April 2020
GAN and image segmentation datasets
- `python tools/download-dataset.py facades`: 400 images from the CMP Facades dataset (31 MB). Pre-trained: BtoA
- `python tools/download-dataset.py cityscapes`: 2,975 images from the Cityscapes training set (113 MB)
- `python tools/download-dataset.py maps`: 1,096 training images scraped from Google Maps (246 MB)
- `python tools/download-dataset.py edges2shoes`: 50k training images from the UT Zappos50K dataset; edges are computed by the HED edge detector plus post-processing (2.2 GB). Pre-trained: AtoB
- `python tools/download-dataset.py edges2handbags`: 137k Amazon handbag images from the iGAN project; edges are computed by the HED edge detector plus post-processing (8.6 GB). Pre-trained: AtoB
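The AtoB / BtoA directions refer to which half of each image pair is the model input. The sketch below assumes each training image stores A and B side by side (A on the left, B on the right) in one array of shape height x (2 x width) x channels; `split_ab` and `make_pair` are hypothetical helpers, not functions from the download script.

```python
import numpy as np


def split_ab(image: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a side-by-side pair image (H x 2W x C) into its A and B halves.

    Assumption: A is the left half and B is the right half.
    """
    width = image.shape[1]
    if width % 2:
        raise ValueError("expected an even width for a side-by-side A|B image")
    half = width // 2
    return image[:, :half], image[:, half:]


def make_pair(image: np.ndarray, direction: str) -> tuple[np.ndarray, np.ndarray]:
    """Return (input, target) for the given translation direction."""
    a, b = split_ab(image)
    if direction == "AtoB":
        return a, b
    if direction == "BtoA":
        return b, a
    raise ValueError("direction must be 'AtoB' or 'BtoA'")


# Toy 2 x 4 x 3 "image" standing in for a real facades pair.
pair = np.arange(2 * 4 * 3).reshape(2, 4, 3)
model_input, target = make_pair(pair, "BtoA")
print(model_input.shape, target.shape)  # (2, 2, 3) (2, 2, 3)
```

A real image could be turned into such an array with `np.asarray(PIL.Image.open(path))`; the direction string would then match the "Pre-trained" entry for that dataset.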
Image segmentation dataset list
- Stanford Background Dataset
- Sift Flow Dataset
- Barcelona Dataset
- Microsoft COCO dataset
- MSRC Dataset
- LITS Liver Tumor Segmentation Dataset
- KITTI
- Pascal Context
- Data from Games dataset
- Human parsing dataset
- Mapillary Vistas Dataset
- Microsoft AirSim
- MIT Scene Parsing Benchmark
- COCO 2017 Stuff Segmentation Challenge
- ADE20K Dataset
- INRIA Annotations for Graz-02
- Daimler dataset
- ISBI Challenge: Segmentation of neuronal structures in EM stacks
- Pratheepan Dataset
- Clothing Co-Parsing (CCP) Dataset
- Inria Aerial Image
- ApolloScape
- UrbanMapper3D
- RoadDetector
- Cityscapes
- CamVid
Sunday 19 April 2020
COVID-CT
Covid19 Challenge Dataset
Saturday 18 April 2020
An Open Pan-Cancer Histology Dataset for Nuclei Instance Segmentation and Classification
The ORNL Overhead Vehicle Dataset (OOVD)