Dataset Overview

The SMPD (Social Media Prediction Dataset) contains 486K social multimedia posts from 70K users, together with a range of social media information: anonymized photo-sharing records, user profiles, web images, text, time, location, category, and more. SMPD is a multi-faceted, large-scale, temporal web data collection gathered from Flickr, one of the largest photo-sharing platforms. For the time-series forecasting task, we split the data into chronological training and testing sets (commonly by date and time). The table below shows the statistics of the dataset.

Dataset  | #Posts | #Users | #Categories | Temporal Range (Months) | Avg. Title Length | #Custom Tags
SMPD2019 | 486k   | 70k    | 756         | 16                      | 29                | 250k
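
The chronological split mentioned above can be sketched as follows. This is a minimal illustration, not the organizers' actual procedure: the field names `post_id` and `post_time`, the helper `chronological_split`, and the 75/25 ratio are all assumptions made for the example, since the source does not state the real split ratio or record schema.

```python
from datetime import datetime


def chronological_split(posts, train_fraction=0.75, time_key="post_time"):
    """Sort posts by timestamp and cut once, so every training post
    is no later than any testing post (hypothetical helper)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    ordered = sorted(posts, key=lambda post: post[time_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy records with assumed field names; real SMPD records will differ.
posts = [
    {"post_id": "c", "post_time": datetime(2015, 9, 3, 8, 30)},
    {"post_id": "a", "post_time": datetime(2015, 3, 1, 12, 0)},
    {"post_id": "d", "post_time": datetime(2016, 1, 20, 17, 45)},
    {"post_id": "b", "post_time": datetime(2015, 6, 15, 9, 10)},
]

train, test = chronological_split(posts, train_fraction=0.75)
print([post["post_id"] for post in train])  # ['a', 'b', 'c']
print([post["post_id"] for post in test])   # ['d']
```

Splitting on time rather than at random keeps later posts out of the training set, which is what a forecasting setup needs.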

Histogram of Labels


Hierarchy for 756 Category Classes

The inner circle denotes the 11 first-level categories, the second circle the 77 second-level categories, and the outermost circle the 668 third-level categories.
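
As a quick consistency check, the three level sizes given above add up to the 756 category classes reported for the dataset. The dictionary keys below are illustrative labels only, not identifiers from the dataset.

```python
# Number of classes at each level of the category hierarchy (from the text above).
level_sizes = {"first_level": 11, "second_level": 77, "third_level": 668}

total_classes = sum(level_sizes.values())
print(total_classes)  # 756
```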


Photo Tag Cloud

This tag cloud shows the custom tags provided by users, covering 250k distinct words.

Copyright © 2020. SMP Challenge Organization Committee. All rights reserved.