My obsession with YouTube search has entered its second week. This time, though, I can actually share with you what I made to address one of my search annoyances because I didn’t need the YouTube API this time!
Searching YouTube is an interesting puzzle for me because a) you’re searching fairly small descriptive spaces so you have to be careful with your keywords and b) there’s no easy way to eliminate large swathes of search space outside of using keywords. If I were searching Google and not getting the result quality I wanted, I might try adding site:gov or site:edu to limit my search to more restricted Web spaces, where I could check the results and maybe develop the query further. It’s difficult to do that with YouTube.
In fact, exploring YouTube channels is kind of difficult, period. Here’s what you get when you search for channels on YouTube:
This is bad. Sure, you get some description, but it’s not even the full description. And you get the number of subscribers. Yay. Know what you don’t get? You don’t get the number of videos in the channel. You don’t get the date the channel’s last video was uploaded. You don’t get the data points you could use to see if the channel is active or not and how much content it produces. This is important because you can’t preview a channel from the search page. You have to click out and visit it. You have to go back and forth between the search results and the channels that look interesting, because nothing in the results tells you not to waste your time on a channel that died years ago and that nobody bothered to take down.
Maybe YouTube is just hoping you subscribe to whatever looks interesting and thin out your subscriptions later. But how many channels does YouTube have? It’s got to be in the tens of millions. It seems to me that there is simply too much content flowing through YouTube to be this lackadaisical in presenting channel search results. Why not provide more data points in the search results and give the users tools to find the active channels, the content-producing channels, the channels that are going to give them the best experience on the platform?
Or perhaps YouTube is more concerned with users searching for videos as opposed to channels. And that’s certainly a reasonable argument; people might prefer to start by searching by keywords and then find channels they want to follow. If that’s the case, why bother making the channel search good? Maybe that’s the reason.
Whatever the reason is, I hate YouTube’s channel search, so I used the YouTube API to create my own spin. I added in the data I thought was missing, made sure the entire description got included, and even included a player in each result which showed that channel’s last 15 videos. (I didn’t need the API for that, though — I used the channel’s RSS feed.) That was a lot better, but some keywords ended up getting me a lot of low-quality channels and they were hard to eliminate. So I wondered if I could maybe narrow down my search pool by applying another data source to my YouTube search — like Wikipedia!
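For anyone curious about the RSS part, here is a minimal Python sketch of reading a channel feed without an API key. It assumes the public Atom feed at https://www.youtube.com/feeds/videos.xml?channel_id=... and the two XML namespaces shown below. The function names and the sample feed are made up for illustration, and this is not the code behind my tool.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces used by the channel feed (an Atom feed plus YouTube's extension).
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}


def feed_url(channel_id: str) -> str:
    """Build the feed URL for a channel ID (no API key involved)."""
    return f"https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}"


def fetch_feed(channel_id: str) -> str:
    """Download the feed as text. Needs network access; not called below."""
    with urllib.request.urlopen(feed_url(channel_id)) as response:
        return response.read().decode("utf-8")


def parse_feed(xml_text: str) -> list[dict]:
    """Return one dict per <entry> with the video ID, title, and publish date."""
    root = ET.fromstring(xml_text)
    videos = []
    for entry in root.findall("atom:entry", NS):
        videos.append({
            "video_id": entry.findtext("yt:videoId", default="", namespaces=NS),
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "published": entry.findtext("atom:published", default="", namespaces=NS),
        })
    return videos


# Hypothetical, trimmed-down feed used only to show the parsing step.
SAMPLE_FEED = """<feed xmlns:yt="http://www.youtube.com/xml/schemas/2015"
      xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <yt:videoId>abc123</yt:videoId>
    <title>Example video</title>
    <published>2023-01-01T00:00:00+00:00</published>
  </entry>
</feed>"""

videos = parse_feed(SAMPLE_FEED)
```

The newest `published` value in the parsed list is exactly the "is this channel still alive?" signal that the stock channel search leaves out.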
Enter Wikidata
Wikidata properties include P2397, the YouTube Channel ID property, which meant I could isolate Wikipedia pages which have YouTube channels, gather their information into a dataset via the Wikipedia API, and then make that dataset browsable in interesting ways. According to the Wikidata Query Service, Wikipedia has about 114,000 pages with YouTube channels, which was rather a bigger bite than I wanted to chew. But I filtered the list some by restricting it to pages which were instances of various types of media outlet (newspaper, TV station, etc.), and that narrowed it down to about 4,400 pages, with a focus on media properties. Much more manageable.
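A query of that shape might look something like the sketch below. It assumes P31 ("instance of") is the property doing the type filtering and that the class Q-numbers get passed in by hand. The helper names are hypothetical, the Q11032 in the example is what I believe is the item for "newspaper" (check it before relying on it), and the query I actually ran may have differed.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(class_ids: list[str]) -> str:
    """Items that are an instance of any given class AND have a YouTube channel ID."""
    values = " ".join(f"wd:{qid}" for qid in class_ids)
    return (
        "SELECT DISTINCT ?item ?channel WHERE {\n"
        f"  VALUES ?class {{ {values} }}\n"
        "  ?item wdt:P31 ?class ;\n"      # P31 = instance of
        "        wdt:P2397 ?channel .\n"  # P2397 = YouTube channel ID
        "}"
    )


def query_url(class_ids: list[str]) -> str:
    """URL that asks the endpoint for JSON results for the query above."""
    params = urlencode({"query": build_query(class_ids), "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


# Assumption: Q11032 is the Wikidata item for "newspaper".
query = build_query(["Q11032"])
print(query)
```

Adding more media types is just a matter of adding more Q-numbers to the list, which is how a result set can be widened or narrowed without rewriting the query.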
I made one script to gather all the page information: I downloaded a CSV of the page Q-numbers from the Wikidata Query Service and then fed it to the script in chunks. All the pages ended up making a 150 MB JSON file, so I made a second script to filter out all the data I didn’t need and got it down to 4.5 MB. When I was done I had a JSON file of information about over 4,000 Wikipedia-listed media outlets which also have YouTube channels.
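The two steps can be sketched roughly as follows. This is an illustration under assumptions, not my actual scripts: it assumes the CSV has an `item` column holding entity URIs, that entities come back in the standard Wikidata entity JSON layout (`labels`, `descriptions`, `claims`), and that batches of 50 suit the entity-fetching call (as far as I know, that is the per-request ID limit for `wbgetentities`). All function names and the list of kept fields are hypothetical.

```python
import csv
import io


def read_qids(csv_text: str, column: str = "item") -> list[str]:
    """Pull Q-numbers out of a CSV export; accepts full entity URIs or bare IDs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].rsplit("/", 1)[-1] for row in reader]


def chunked(items: list, size: int = 50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def slim_entity(entity: dict) -> dict:
    """Keep only a handful of fields from one full entity record."""
    claims = entity.get("claims", {})

    def first_value(prop: str):
        # Return the first usable value for a property, or None if absent.
        for claim in claims.get(prop, []):
            value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
            if value is not None:
                return value
        return None

    return {
        "id": entity.get("id"),
        "label": entity.get("labels", {}).get("en", {}).get("value"),
        "description": entity.get("descriptions", {}).get("en", {}).get("value"),
        "youtube_channel_id": first_value("P2397"),
    }


# Made-up inputs, just to exercise the helpers.
qids = read_qids("item\nhttp://www.wikidata.org/entity/Q1\nQ2\nQ3\n")
batches = list(chunked(qids, size=2))
slim = slim_entity({
    "id": "Q1",
    "labels": {"en": {"value": "Example Gazette"}},
    "claims": {"P2397": [{"mainsnak": {"datavalue": {"value": "UCexample"}}}]},
})
```

Most of the size reduction comes from `slim_entity`: a full entity record carries labels in every language plus every claim and its references, and a viewer only needs a few of those.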
Once I had the JSON, I just needed a viewer, which I made and put up at https://searchtweaks.com/vcb/ .
The JSON loads automatically when you open the page, so after a few seconds you’ll see a list of outlets. From here you can either narrow the list down via keyword search or by using the Type, Country, and Language dropdown menus. Because not every listing has every piece of metadata, I recommend you do a keyword search in addition to using the dropdown menu if you’re trying to narrow the list by media type or location.
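To see why the dropdowns alone can miss things, here is a small sketch of that kind of filtering. It is written in Python for illustration only (the viewer itself runs in the browser), and the record fields and the `matches` helper are hypothetical.

```python
def matches(record: dict, keyword: str = "", **facets) -> bool:
    """True if the record passes every selected dropdown and the keyword, if any."""
    for field, wanted in facets.items():
        # A record with missing metadata can never match a selected dropdown value.
        if wanted and record.get(field) != wanted:
            return False
    if keyword:
        # The keyword is checked against every non-empty field, description included.
        haystack = " ".join(str(value) for value in record.values() if value).lower()
        return keyword.lower() in haystack
    return True


# Two made-up listings; the second has no type or country metadata.
records = [
    {"name": "Daily Example", "type": "newspaper", "country": "Canada",
     "description": "Daily paper published in Toronto"},
    {"name": "Example Gazette", "type": None, "country": None,
     "description": "Weekly newspaper in Ontario"},
]

by_dropdown = [r["name"] for r in records if matches(r, type="newspaper")]
by_keyword = [r["name"] for r in records if matches(r, keyword="newspaper")]

print(by_dropdown)  # ['Daily Example']
print(by_keyword)   # ['Daily Example', 'Example Gazette']
```

The second listing never shows up under the dropdown because its type field is empty, but the keyword search finds it through its description.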
You can get more details on the channel by clicking on the outlet name. A div will open providing a little more information on the channel along with links to other social platforms and official web sites. Also available in the div is a slideshow of the last 15 videos from that channel. Unless the channel prohibits playing videos outside of YouTube, you can watch those videos right from the listing.
Of course, the biggest disadvantage of searching YouTube channels this way is that this collection of channels is a tiny fraction of YouTube’s total. On the other hand, I kind of like having a topical collection of channels to browse through. Also, I can make more collections! All the viewer does is process a JSON. Any page collection I can turn into a JSON (which basically means any query I can make work in the Wikidata Query Service) can turn into a browsable collection. Wikipedia has about 3,000 people it identifies as “YouTubers” with YouTube channels; I think that’s going to be my next set.
I like the simplicity of this tool because it doesn’t require any API keys at all. But I think I’m going to develop it further; maybe add something that calls the YouTube API to bulk out the JSON with even more channel information. I could offer even more filtering options without having to add an API key requirement to the equation. But the first step is to make a few more datasets and see what I can find.