The SMHP dataset was collected from Flickr (a photo-sharing platform) for the headline prediction task. We split the data in chronological order, giving a train-to-test ratio of 10:1. The dataset statistics are shown below.
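A time-ordered split like the one described above can be sketched as follows. This is only an illustration, not the organizers' actual procedure: the `timestamp` field name, the toy records, and the `chronological_split` helper are all assumptions made for the example.

```python
def chronological_split(posts, train_parts=10, test_parts=1, key="timestamp"):
    """Sort posts by time and cut once, so no test post is older than any training post.

    `key` is a hypothetical field name; the real metadata may use a different one.
    """
    ordered = sorted(posts, key=lambda p: p[key])
    cut = len(ordered) * train_parts // (train_parts + test_parts)
    return ordered[:cut], ordered[cut:]


# Toy records with distinct, shuffled Unix timestamps (hypothetical data).
posts = [{"id": i, "timestamp": 1_400_000_000 + 3600 * ((i * 7) % 22)} for i in range(22)]

train, test = chronological_split(posts)
print(len(train), len(test))
```

Cutting once on the sorted list, rather than sampling at random, keeps every test post at least as recent as the newest training post, which mirrors predicting the popularity of future posts.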


You can download the image URLs and their associated metadata here. The training data (including popularity labels) is available now.

Readme Document

Download Link for Train Image URLs (Path Sample: train/77@N93/551891.jpg)
Download Link for Train Data (includes image paths, metadata, and labels)
Download Link for Time Zone of Train Data
Download Link for Test Data (includes image paths and metadata, without labels)
Download Link for Time Zone of Test Data
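As a rough sketch of working with the image paths, the snippet below splits the sample path into its three components. The interpretation of those components (data split, owner identifier, photo identifier) is an assumption suggested by the sample, not something this page states, and `parse_image_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def parse_image_path(path):
    """Split a path of the assumed form <split>/<owner>/<photo>.jpg into named parts."""
    parts = PurePosixPath(path).parts
    if len(parts) != 3:
        raise ValueError(f"expected three path components, got {len(parts)}: {path!r}")
    split, owner, filename = parts
    # The field names below are illustrative guesses, not official column names.
    return {"split": split, "owner": owner, "photo_id": PurePosixPath(filename).stem}


info = parse_image_path("train/77@N93/551891.jpg")
print(info)
```

Check the Readme Document for the authoritative meaning of each path component before relying on this layout.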

Note that during the competition the datasets will ONLY be released to participants who have registered for the challenge. Once the challenge is complete, we will make the data publicly available to the whole research community.

Dataset Statistics

#Post   #User   #Categories   Temporal Range (Months)   Avg. Title Length   #Tags   #POIs   Avg. Views
340K    80K     11            16                        26                  669     103K    306

*In the dataset, we provide the category information for each photo.