SMPD 2020

All data

Label

Image

Category

Text

Temporal-Spatial Information

User Profile

Additional Information

Image Dataset

Label

Each row contains the popularity score (log-views) of the corresponding post.

Popularityscore
...
3.2
2.3
...

Image

Each row contains the URL of the corresponding photo or video.

...
"https://www.flickr.com/photos/58708830@N00/385070026"
"https://www.flickr.com/photos/97042891@N00/943750056"
...
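As a minimal sketch, the sample URLs above can be split into their path segments with the standard library. The helper name `split_flickr_url` is hypothetical, and the assumed layout (`/photos/<user>/<photo id>`) is read off the two sample rows only. Whether these segments match the Uid and Pid values used in the other files is not stated here.

```python
from urllib.parse import urlparse


def split_flickr_url(url):
    """Return (user_segment, photo_segment) from a photo page URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Assumed layout, based on the sample rows: ["photos", <user>, <photo id>]
    if len(parts) != 3 or parts[0] != "photos":
        raise ValueError(f"unexpected URL layout: {url}")
    return parts[1], parts[2]


user, photo = split_flickr_url(
    "https://www.flickr.com/photos/58708830@N00/385070026"
)
print(user, photo)
```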

Category

Each row in the file corresponds to the category set of a post.

Uid  Pid  Category  Subcategory  Concept
...
"70478@N10" "564687" "Whether&Season" "Raining" "umbrella"
"37810@N60" "565202" "Fashion" "Girls,Fashion" "skirt"
"25893@N22" "565381" "Whether&Season" "Raining" "puddle"
"3175@N73" "16603" "Entertainment" "Music" "rnb"
...
{
  "Uid": "70478@N10",
  "Pid": 564687,
  "Category": "Whether&Season",
  "Subcategory": "Raining",
  "Concept": "umbrella"
}

Uid: the user this post belongs to.

Pid: the ID of the photo attached to the post. A Pid identifies a particular post.

Category: the first-level category of the post (11 classes).

Subcategory: the second-level category of the post (77 classes).

Concept: a finer-grained description of the post (668 distinct concepts).
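A short sketch of how such records might be indexed and summarized. It assumes the file holds a JSON array of objects shaped like the record above; the on-disk layout is not specified here, so the `raw` string is only an illustration built from the sample rows.

```python
import json
from collections import Counter

# Three records in the shape shown above; values come from the sample rows.
raw = '''[
  {"Uid": "70478@N10", "Pid": 564687, "Category": "Whether&Season",
   "Subcategory": "Raining", "Concept": "umbrella"},
  {"Uid": "37810@N60", "Pid": 565202, "Category": "Fashion",
   "Subcategory": "Girls,Fashion", "Concept": "skirt"},
  {"Uid": "25893@N22", "Pid": 565381, "Category": "Whether&Season",
   "Subcategory": "Raining", "Concept": "puddle"}
]'''

records = json.loads(raw)

# Pid is a number in this record but a string in other examples,
# so normalize it to str before using it as a lookup key.
by_pid = {str(r["Pid"]): r for r in records}

# Count how many posts fall into each first-level category.
category_counts = Counter(r["Category"] for r in records)

print(by_pid["565381"]["Concept"])
print(category_counts.most_common(1))
```

Normalizing Pid to a string keeps lookups consistent when the same post is referenced from files that quote the value differently.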

Text

Each row contains the text information of a post.

Uid Pid Title Mediatype Alltags
...
"70478@N10" "564687" "Sarah Moon 3" "photo" "black hat fashion yellow umbrella"
"37810@N60" "565202" "2016-03-06 22.19.08" "photo" "orange sexy philadelphia hockey nhl sweater bra skirt blonde flowing cheerleader cleavage plaid flyers philadelphiaflyers icegirls"
"25893@N22" "565381" "Tristesse at the Federal Chancellery" "photo" "blackandwhite bw white black reflection berlin wet water rain canon germany puddle deutschland eos blackwhite wasser symmetry sw schwarzweiss puddles reflexion weiss federal schwarz regen tristesse trist reflektion kanzleramt pfuetze 6d nass bundeskanzleramt 2016 symmetrie pftze weis pftzen chancellery schwarzweis federalchancellery pfuetzen canoneos6d hoonose68 againstautotagging sgrossien grossien"
"3175@N73" "16594" "Amari DJ Mona-Lisa" "photo" "newyork celebrity brooklyn radio flickr photos itunes images singer singers celebrities hiphop reggae rb songwriter recordingartists broadcaster rnb songwriters amari hiphopartists cdbaby newreleases femaleartists reverbnation soundcloud famouscelebrities reggaeartists femaleperformers mtvartists amaridjmonalisa amazonmusic newyorkperformers reverbnationartists spotifyartists soundcloudartists jamgo cdbabyartists googleplayartists"
...
{
  "Uid": "70478@N10",
  "Pid": "564687",
  "Title": "Sarah Moon 3",
  "Mediatype": "photo",
  "Alltags": "black hat fashion yellow umbrella"
}

Title: the title of the post defined by the user.

Mediatype: the type of the attached media file, including 'photo' and 'video'.

Alltags: the customized tags from users.
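A minimal sketch of turning the Alltags field into a list of tags. It assumes the tags are stored as one space-separated string, as in the sample rows above; `split_tags` is a hypothetical helper name.

```python
def split_tags(alltags):
    """Split a space-separated Alltags string into a list of tags."""
    # str.split() with no argument also collapses repeated whitespace
    # and returns an empty list for an empty string.
    return alltags.split()


record = {
    "Uid": "70478@N10",
    "Pid": "564687",
    "Title": "Sarah Moon 3",
    "Mediatype": "photo",
    "Alltags": "black hat fashion yellow umbrella",
}

tags = split_tags(record["Alltags"])
print(tags)
```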

Temporal-Spatial Information

Each row offers the date and geographic information of the post.

Uid Pid Postdate Latitude Longitude Geoaccuracy
...
"70478@N10" "564687" "1457068974" "0" "0" "0"
"37810@N60" "565202" "1457273948" "0" "0" "0"
"25893@N22" "565381" "1457239452" "52.520213" "13.373097" "16"
"3263@N23" "17776" "1445400000" "39.051935" "-94.48068" "14"
...
{
  "Uid": "25893@N22",
  "Pid": "565381",
  "Latitude": "52.520213",
  "Longitude": "13.373097",
  "Geoaccuracy": "16"
}

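Postdate: the publish timestamp of the post. It can be converted to a date-time string with the following Python code:

import time

timestamp = 1457068974  # Postdate: Unix timestamp in seconds
# localtime() uses the machine's time zone; use time.gmtime() for UTC
time_array = time.localtime(timestamp)
# named date_string to avoid shadowing the standard datetime module
date_string = time.strftime("%Y-%m-%d %H:%M:%S", time_array)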

Latitude: the latitude whose valid range is -90 to 90. Anything more than 6 decimal places will be truncated.

Longitude: the longitude whose valid range is -180 to 180. Anything more than 6 decimal places will be truncated.

Geoaccuracy: recorded accuracy level of the location information. World level is 1, Country is ~3, Region ~6, City ~11, Street ~16. The current range is 1-16. Defaults to 16 if not specified.
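The field descriptions above can be combined into a small parsing sketch. Two points are assumptions of this example rather than statements from the dataset: values between the quoted anchor levels are binned to the nearest lower anchor, and a Geoaccuracy outside 1-16 (the sample rows use 0) is treated as "no usable location". The names `accuracy_level` and `parse_record` are hypothetical.

```python
from datetime import datetime, timezone

# Approximate anchor levels quoted in the Geoaccuracy description.
LEVELS = [(1, "world"), (3, "country"), (6, "region"), (11, "city"), (16, "street")]


def accuracy_level(geoaccuracy):
    """Return the name of the highest anchor level not exceeding the value.

    Binning values between anchors this way is an assumption of this sketch.
    """
    label = None
    for threshold, name in LEVELS:
        if geoaccuracy >= threshold:
            label = name
    return label


def parse_record(rec):
    """Convert the string fields of one record into typed values."""
    posted = datetime.fromtimestamp(int(rec["Postdate"]), tz=timezone.utc)
    lat, lon = float(rec["Latitude"]), float(rec["Longitude"])
    acc = int(rec["Geoaccuracy"])
    in_range = -90 <= lat <= 90 and -180 <= lon <= 180
    # Assumption: an accuracy outside 1-16 (the sample rows use 0) means
    # that no usable location was recorded for the post.
    has_geo = in_range and 1 <= acc <= 16
    return {
        "posted_utc": posted,
        "location": (lat, lon) if has_geo else None,
        "level": accuracy_level(acc) if has_geo else None,
    }


with_geo = parse_record({"Postdate": "1457239452", "Latitude": "52.520213",
                         "Longitude": "13.373097", "Geoaccuracy": "16"})
no_geo = parse_record({"Postdate": "1457068974", "Latitude": "0",
                       "Longitude": "0", "Geoaccuracy": "0"})
print(with_geo["posted_utc"].isoformat(), with_geo["level"])
print(no_geo["location"])
```

Using an explicit UTC time zone makes the converted Postdate independent of the machine that runs the code.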

User Profile

Each row contains the user data of the post.

photo_firstdate photo_count ispro canbuypro timezone_offset photo_firstdatetaken timezone_id user_description location_description
...
"1213743830" "6828" "1" "0" "1" "1904010100" "9" "0.0866962,-0.0752717,..." "0,0,..."
...
{
  "photo_firstdate": "1213743830",
  "photo_count": "6828",
  "ispro": "1",
  "canbuypro": "0",
  "timezone_offset": "1",
  "photo_firstdatetaken": "1904010100",
  "timezone_id": "9",
  "user_description": "0.0866962,-0.0752717,...",
  "location_description": "0,0,..."
}

Photo_firstdate: the date of the first photo uploaded by the user.

Photo_count: the number of photos posted by the user.

Ispro: whether the user is a pro member.

Photo_firstdatetaken: the date of the first photo taken by the user.

Timezone_offset: the time zone of the user.

User_description: the feature (comma-separated values) used to describe the user data.

Location_description: the feature (comma-separated values) used to describe the user location.
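A rough sketch of converting one user record into typed values. It assumes the description fields are comma-separated numbers, as the sample row suggests. The vectors below are shortened by hand for illustration, because the sample elides the remaining values; their true length is not given here. `parse_vector` and `parse_profile` are hypothetical names.

```python
def parse_vector(text):
    """Parse a comma-separated string of numbers into a list of floats."""
    return [float(x) for x in text.split(",") if x.strip()]


def parse_profile(rec):
    """Convert the string fields of one user record into typed values."""
    return {
        "photo_firstdate": int(rec["photo_firstdate"]),
        "photo_count": int(rec["photo_count"]),
        "ispro": rec["ispro"] == "1",
        "canbuypro": rec["canbuypro"] == "1",
        "user_description": parse_vector(rec["user_description"]),
        "location_description": parse_vector(rec["location_description"]),
    }


profile = parse_profile({
    "photo_firstdate": "1213743830",
    "photo_count": "6828",
    "ispro": "1",
    "canbuypro": "0",
    # Shortened illustrative vectors; the real fields hold more values.
    "user_description": "0.0866962,-0.0752717",
    "location_description": "0,0",
})
print(profile["photo_count"], profile["ispro"], profile["user_description"])
```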

Additional Information

Each row offers the supplementary information of the post.

Uid Pid Pathalias Ispublic Mediastatus
...
"70478@N10" "564687" "None" "1" "ready"
"37810@N60" "565202" "None" "1" "ready"
"25893@N22" "565381" "hoo_nose_68" "1" "ready"
"3652@N11" "19388" "angelo_nairod" "1" "ready"
...
{
  "Uid": "25893@N22",
  "Pid": "565381",
  "Pathalias": "hoo_nose_68",
  "Ispublic": "1",
  "Mediastatus": "ready"
}

Pathalias: the path alias provided by the user.

Ispublic: indicates that the post is authenticated with 'read' permissions.

Mediastatus: indicates that the attached media is ready to be accessed by others.
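Since Uid and Pid appear in several of the files, records for the same post can be combined by Pid. The sketch below merges two small in-memory tables built from the sample rows and keeps the posts that are public and ready. The helper `merge_by_pid` is hypothetical, and reading `"1"` as "public" is an assumption based on the sample values.

```python
def merge_by_pid(*tables):
    """Merge records from several tables into one dict per Pid."""
    merged = {}
    for table in tables:
        for rec in table:
            # Normalize Pid to str, since it may appear as a number or a string.
            merged.setdefault(str(rec["Pid"]), {}).update(rec)
    return merged


text_rows = [
    {"Uid": "25893@N22", "Pid": "565381",
     "Title": "Tristesse at the Federal Chancellery", "Mediatype": "photo"},
]
additional_rows = [
    {"Uid": "25893@N22", "Pid": "565381", "Pathalias": "hoo_nose_68",
     "Ispublic": "1", "Mediastatus": "ready"},
    {"Uid": "70478@N10", "Pid": "564687", "Pathalias": "None",
     "Ispublic": "1", "Mediastatus": "ready"},
]

posts = merge_by_pid(text_rows, additional_rows)

# Assumption: Ispublic == "1" marks a public post.
accessible = [
    pid for pid, post in posts.items()
    if post.get("Ispublic") == "1" and post.get("Mediastatus") == "ready"
]
print(sorted(accessible))
```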


Copyright © 2020. SMP Challenge Organization Committee. All rights reserved.