Dataset Overview

The SMPD2019 dataset contains 486K posts with images, collected from Flickr (one of the largest photo-sharing websites). SMPD is a multi-faceted data collection that contains rich contextual information and annotations (such as user profiles, post categories, custom tags, geographic information, photo images, and photo metadata). For this year's prediction task, we split the data in chronological order, resulting in a train-to-test ratio of 2:1. The table and figures below show the statistics of SMPD2019.
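As an illustration of what a chronological 2:1 split looks like, here is a minimal Python sketch. It is not the organizers' actual procedure: the function name `chronological_split` and the record fields `post_id` and `post_date` are hypothetical, and the toy records merely stand in for real posts.

```python
from datetime import datetime, timedelta


def chronological_split(posts, train_parts=2, test_parts=1, time_key="post_date"):
    """Sort posts by time and cut once, so every training post precedes every test post."""
    ordered = sorted(posts, key=lambda post: post[time_key])
    # Integer arithmetic keeps the cut point exact for a 2:1 ratio.
    cut = len(ordered) * train_parts // (train_parts + test_parts)
    return ordered[:cut], ordered[cut:]


# Hypothetical toy records in shuffled order; real SMPD2019 field names may differ.
start = datetime(2015, 3, 1)
posts = [
    {"post_id": i, "post_date": start + timedelta(days=(i * 7) % 9)}
    for i in range(9)
]

train, test = chronological_split(posts)
print(len(train), len(test))  # 6 3
```

Cutting on sorted timestamps, rather than sampling at random, keeps every test post later than every training post, which mirrors the setting of predicting popularity for future posts.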

| Dataset | #Posts | #Users | #Categories | Temporal Range (Months) | Avg. Title Length | #Custom Tags |
|---|---|---|---|---|---|---|
| SMPD2019 | 486k | 69k | 756 | 16 | 29 | 250k |

Histogram of Labels:

Hierarchy for 756 Category Classes:

The inner circle shows the 11 first-level categories, the second circle shows the 77 second-level categories, and the outer circle shows the 668 third-level categories. Together these make up the 756 category classes (11 + 77 + 668 = 756).

Photo Tag Cloud:

This tag cloud shows the custom tags provided by users, which comprise 250k distinct words.

Copyright © 2019. SMP Challenge Organization Committee. All rights reserved.