pop500: 500 videos crawled in February 2009. A video list was retrieved from one of the YouTube API feeds for one of the following categories: most_viewed, favorites, most_discussed, top_rated, and most_popular. The videos in the list were then crawled. This was repeated with varying feed categories at different times.

pop1500: 1,500 videos crawled from February through May 2009 using the same process as pop500.

random3500: 3,500 videos randomly crawled in February 2009. Videos were selected by issuing random one-word queries drawn from the SCOWL word list. For each query, only one randomly chosen results page was considered, regardless of the number of results, and videos were randomly chosen from that page.

random10000: 10,000 videos crawled in May 2009 using the same random process as random3500.

The set sizes are nominal: the actual counts may differ by a few videos in cases where some were thrown out.

The data is in an SGML format similar to the one TREC uses. Each video is enclosed in VIDEO tags and contains multiple elements, as shown below. Some videos may not have any COMMENT elements in the COMMENTS section. The comment rating is the number of people who liked or disliked the comment.
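The layout described above can be read with a few lines of Python. The sketch below is an illustration only, under stated assumptions: it assumes the tags appear literally as <VIDEO>, <COMMENTS>, and <COMMENT>, it does not decode SGML entities or handle nested elements of the same name, and the helper names iter_videos and comments_in are hypothetical, not part of the dataset distribution. The sample string is made up for the example and is not real data.

```python
import re

# Assumption: elements are delimited by literal <VIDEO>...</VIDEO> and
# <COMMENT>...</COMMENT> tags. The \b keeps <COMMENT from matching <COMMENTS>.
VIDEO_RE = re.compile(r"<VIDEO\b[^>]*>(.*?)</VIDEO>", re.DOTALL | re.IGNORECASE)
COMMENT_RE = re.compile(r"<COMMENT\b[^>]*>(.*?)</COMMENT>", re.DOTALL | re.IGNORECASE)


def iter_videos(sgml_text):
    """Yield the raw inner text of each VIDEO element (hypothetical helper)."""
    for match in VIDEO_RE.finditer(sgml_text):
        yield match.group(1)


def comments_in(video_text):
    """Return the stripped inner text of each COMMENT element in one video."""
    return [m.group(1).strip() for m in COMMENT_RE.finditer(video_text)]


# Made-up sample: one video with two comments, one with an empty COMMENTS section.
sample = """
<VIDEO>
<COMMENTS>
<COMMENT>first</COMMENT>
<COMMENT>second</COMMENT>
</COMMENTS>
</VIDEO>
<VIDEO>
<COMMENTS>
</COMMENTS>
</VIDEO>
"""

counts = [len(comments_in(video)) for video in iter_videos(sample)]
print(counts)  # [2, 0]
```

A regular expression is used here rather than an XML parser because SGML files in the TREC style are often not well-formed XML. Videos with an empty COMMENTS section simply produce an empty list.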