Searching the Youtomb database
The number one request we’ve received at Youtomb is a search feature. I’m sure the web based search feature is still forth coming but what we’ve done instead is about 1000 times cooler. We’ve replicated the database that we use and opened it up to the public. So now you can make your own interface or search by hand like we do!The tool I’ve used most is called Navicat and it provides a decent GUI to the mysql database. [WARNING] Navicat is not free however it does come with a free trial. As much as I dislike suggesting non-free software this really is one of the better front ends to mysql I’ve used. To start create a new connection by clicking on the Connection button in the upper left hand side of the window.Don’t forget that we do not require a password for the public database. Once you have created the connection you can click on icon in the left hand side and it will expand to show you the different objects within the database. Click on the Tables icon to drill down further and see the data contained inside. You’ll notice several tables but the most exciting stuff is in ‘artifacts’. This is where we store all of the information related to takedowns and geolocational filtering. Double clicking on the table will open it up and allow you to manually browse the table data. We have millions of records in there but that would take an eternity to download. So instead Navicat breaks up the results into chunks called “pages”. By default you can view 1000 records at a time. Browsing this way can be fun but you will see not only the videos that are taken down but *all* of the videos in our database. Since we record the status of each video we can use (structured query language) SQL to isolate just the records of videos that are no longer up. If you already know SQL then feel free to skip this next paragraph and move on to the sample queries. For those of you who don’t know SQL Navicat provides some decent tools to construct them for you. The first step is to click on the Query button in the toolbar. Then click New Query in the toolbar below that. This will create a new window for you which contains two tabs. The first is Query Builder and the next is SQL. By default it should select SQL and those with experience in the language can feel free to construct their own here. Otherwise click on the Query Builder tab and it will show the various tables available to you. Double click on the ‘artifacts’ table and the window pane to the right should populate with a list of check boxes. Those are the column names of the table and represent the possible data you can include in your search results. I have a set of data that I like to look at when I’m researching that includes: time_published, title, tags, category, num_views, banned_countries, status, description You can click the Run button from the menu bar now if you like however I normally add some more filters to the data. In particular the query constructed still shows all videos regardless of whether they are down or active. Click on the SQL tab and you can see the query that has been constructed so far. To filter out only the videos which are no longer up we have to add a “WHERE” clause. This is a lot simpler then it sounds. At the end of the query append:WHERE status != ‘up’Make sure to use single quotes. What we have done here is modified the query so that any record where the status column is not equal to ‘up’ will be returned. The results we get back will show every record in the database of videos which are down. I also like to order the records returned by adding a few more clauses. Don’t do this until you are comfortable with the above exercise. To have the records returned with the most popular videos on top you can add the ORDER BY clause. I normally use the num_views (number of views) column in descending order to have the most viewed videos appear on top. Ironically it is the least viewed videos that are often times the most interesting. If you are interested in this then add the clauseORDER BY num_views DESCto the end of the query. That’s it for this post. I’m going to put another one up soon that shows how to find the videos in our database that contain the word “Tibet” in the title and have been taken down. After that I’ll put up another post showing how to export your results into CSV format which can be read by spreadsheet programs like OpenOffice, MS Excel, and even Google Docs.