Exploring YouTube Channels Via Wikidata

My obsession with YouTube search has entered its second week. This time, though, I can actually share what I made to address one of my search annoyances, because I didn’t need the YouTube API to build it!

Searching YouTube is an interesting puzzle for me because a) you’re searching fairly small descriptive spaces, so you have to be careful with your keywords, and b) there’s no easy way to eliminate large swathes of search space other than with keywords. If I were searching Google and not getting the result quality I wanted, I might try adding site:gov or site:edu to limit my search to more restricted Web spaces, where I could check the results and maybe develop the query further. It’s difficult to do that with YouTube.

In fact, exploring YouTube channels is kind of difficult, period. Here’s what you get when you search for channels on YouTube:

Screenshot of a YouTube search for "house music." Four channels are listed. Each shows the channel icon, name, subscriber count, an abridged description, and a subscribe button.

This is bad. Sure, you get some description, but it’s not even the full description. And you get the number of subscribers. Yay. Know what you don’t get? You don’t get the number of videos in the channel. You don’t get the date of the channel’s most recent upload. You don’t get the data points you could use to see whether the channel is active and how much content it produces. This matters because you can’t preview a channel from the search page; you have to click out and visit it. So you end up going back and forth between channels that look interesting, because nothing in the search results warns you not to waste your time on a channel that died years ago and that nobody bothered to take down.

Maybe YouTube is just hoping you subscribe to whatever looks interesting and thin out your subscriptions later. But how many channels does YouTube have? It’s got to be in the tens of millions. It seems to me that there is simply too much content flowing through YouTube to be this lackadaisical in presenting channel search results. Why not provide more data points in the search results and give the users tools to find the active channels, the content-producing channels, the channels that are going to give them the best experience on the platform?

Or perhaps YouTube is more concerned with users searching for videos than for channels. That’s certainly a reasonable argument; people might prefer to search for videos by keyword first and then find channels they want to follow. If that’s the case, why bother making channel search good?

Whatever the reason is, I hate YouTube’s channel search, so I used the YouTube API to create my own spin. I added in the data I thought was missing, made sure the entire description got included, and even included a player in each result which showed that channel’s last 15 videos. (I didn’t need the API for that, though — I used the channel’s RSS feed.) That was a lot better, but some keywords ended up getting me a lot of low-quality channels and they were hard to eliminate. So I wondered if I could maybe narrow down my search pool by applying another data source to my YouTube search — like Wikipedia!
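
Pulling a channel’s recent videos from its feed takes only the Python standard library. This is a minimal sketch, not the code behind my viewer: it assumes the public feed URL pattern shown in feed_url() and the usual Atom and yt namespaces, and the function names are made up for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces used in YouTube's channel Atom feeds
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}


def feed_url(channel_id: str) -> str:
    """Public Atom feed URL for a channel; no API key involved."""
    return f"https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}"


def parse_feed(xml_text: str) -> list:
    """Return one dict per <entry>: video ID, title, and publish time."""
    root = ET.fromstring(xml_text)
    return [
        {
            "video_id": entry.findtext("yt:videoId", default="", namespaces=NS),
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "published": entry.findtext("atom:published", default="", namespaces=NS),
        }
        for entry in root.findall("atom:entry", NS)
    ]


def fetch_feed(channel_id: str) -> list:
    """Download and parse a channel's feed (needs network access)."""
    with urllib.request.urlopen(feed_url(channel_id)) as resp:
        return parse_feed(resp.read().decode("utf-8"))


# Offline usage with a trimmed-down, made-up feed
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:yt="http://www.youtube.com/xml/schemas/2015">
  <entry>
    <yt:videoId>abc123</yt:videoId>
    <title>Example video</title>
    <published>2024-12-16T10:00:00+00:00</published>
  </entry>
</feed>"""

print(parse_feed(SAMPLE))
```

Keeping parsing separate from fetching means the parser can be checked against a saved feed without touching the network.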

Enter Wikidata

Wikidata properties include P2397, the YouTube channel ID property, which meant I could isolate Wikipedia pages whose subjects have YouTube channels, gather their information into a dataset via the Wikipedia API, and then make that dataset browsable in interesting ways. According to the Wikidata Query Service, about 114,000 entries have a YouTube channel ID, which was rather a bigger bite than I wanted to chew. But I filtered the list by restricting it to entries that are instances of various types of media outlet (newspaper, TV station, etc.), and that narrowed it down to about 4,400 entries focused on media properties. Much more manageable.
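
The kind of query involved might look something like this sketch. The SPARQL shape is an assumption rather than my exact query: Q1616075 (television station) comes from the sample record in this post, while Q11032 as “newspaper” is assumed and worth double-checking in the Query Service. wdt:P31 only matches direct instances; adding /wdt:P279* would also pull in subclasses, at the cost of a heavier query.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Count every item with P2397 (YouTube channel ID)
COUNT_ALL = "SELECT (COUNT(DISTINCT ?item) AS ?n) WHERE { ?item wdt:P2397 ?channelId . }"

# Class Q-numbers to keep. Q1616075 (television station) appears in the
# sample record; Q11032 is assumed to be "newspaper", so verify before use.
MEDIA_TYPES = ["Q1616075", "Q11032"]


def media_outlet_query(type_qids):
    """Direct instances of the given classes that also have a YouTube channel ID."""
    values = " ".join(f"wd:{qid}" for qid in type_qids)
    # DISTINCT collapses duplicate rows for items matching more than one class
    return (
        "SELECT DISTINCT ?item ?channelId WHERE {\n"
        f"  VALUES ?type {{ {values} }}\n"
        "  ?item wdt:P31 ?type ;\n"
        "        wdt:P2397 ?channelId .\n"
        "}"
    )


def query_url(sparql, fmt="json"):
    """GET URL for the Wikidata Query Service."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": fmt})


print(media_outlet_query(MEDIA_TYPES))
```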

I wrote one script to gather all the page information: I downloaded a CSV of the pages’ Q-numbers from the Wikidata Query Service and fed it to the script in chunks. All those pages made a 150 MB JSON file, so I wrote a second script to filter out the data I didn’t need, which got it down to 4.5 MB. When I was done, I had a JSON file describing over 4,000 Wikipedia-listed media outlets that also have YouTube channels.
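
Chunking is needed because Wikidata’s wbgetentities action accepts at most 50 IDs per request for ordinary clients. A hedged sketch of that first step, assuming the CSV export has an item column of entity URIs; the column name and helper names are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"
BATCH_SIZE = 50  # wbgetentities takes at most 50 IDs per request (non-bot clients)


def qids_from_csv(csv_text, column="item"):
    """Extract unique Q-numbers from a Query Service CSV export of entity URIs."""
    seen, qids = set(), []
    for row in csv.DictReader(io.StringIO(csv_text)):
        qid = row[column].rsplit("/", 1)[-1]  # .../entity/Q31 -> Q31
        if qid not in seen:
            seen.add(qid)
            qids.append(qid)
    return qids


def chunks(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def entity_urls(qids):
    """One wbgetentities request URL per batch of Q-numbers."""
    for batch in chunks(qids):
        params = {
            "action": "wbgetentities",
            "ids": "|".join(batch),
            "props": "labels|descriptions|claims|sitelinks",
            "languages": "en",
            "format": "json",
        }
        yield API + "?" + urlencode(params)
```

Requesting sitelinks as well gives the English Wikipedia title, which a second call could use to fetch a page summary.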

{
    "id": "Q1033960",
    "label": "VRT CANVAS",
    "description": "Flemish TV station",
    "wikipediaInfo": {
      "title": "VRT Canvas",
      "summary": "VRT Canvas is a Belgian television channel of the Flemish public broadcasting organisation Vlaamse Radio- en Televisieomroeporganisatie (VRT). Specialising in both original and adaptations from western Europe and North America, the channel offers: in-depth news and current affairs, non-mainstream entertainment, documentaries, arthouse films, other cultural programming, and most recently additional children's programming.",
      "url": "https://en.wikipedia.org/wiki/VRT%20Canvas",
      "pageId": "3884053"
    },
    "claims": {
      "INSTANCE_OF": {
        "id": "Q1616075",
        "label": "television station"
      },
      "OFFICIAL_WEBSITE": "http://www.canvas.be/",
      "FOUNDED": "+1997-12-01T00:00:00Z",
      "COUNTRY": {
        "id": "Q31",
        "label": "Belgium"
      }
    }
}
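
Getting from 150 MB to 4.5 MB is mostly a matter of keeping a handful of fields per entity. Here’s a sketch of how a raw wbgetentities entity could be slimmed into the shape above. The property-to-key mapping (P31, P856, P571 for FOUNDED, P17, and a hypothetical YOUTUBE_CHANNEL_ID key for P2397) is my assumption, labels for referenced items like Q1616075 are assumed to come from a separate lookup, and wikipediaInfo is left out because it comes from Wikipedia rather than Wikidata.

```python
# Wikidata property -> key used in the slimmed record (mapping is assumed)
KEEP = {
    "P31": "INSTANCE_OF",
    "P856": "OFFICIAL_WEBSITE",
    "P571": "FOUNDED",              # inception
    "P17": "COUNTRY",
    "P2397": "YOUTUBE_CHANNEL_ID",  # hypothetical key name
}


def first_value(entity, pid):
    """Value of the first statement for pid that has a value, else None."""
    for statement in entity.get("claims", {}).get(pid, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            return snak["datavalue"]["value"]
    return None


def slim(entity, item_labels):
    """Keep English label/description plus a few claims from a wbgetentities entity."""
    record = {
        "id": entity["id"],
        "label": entity.get("labels", {}).get("en", {}).get("value"),
        "description": entity.get("descriptions", {}).get("en", {}).get("value"),
        "claims": {},
    }
    for pid, key in KEEP.items():
        value = first_value(entity, pid)
        if value is None:
            continue
        if isinstance(value, dict) and "id" in value:      # item reference
            record["claims"][key] = {"id": value["id"], "label": item_labels.get(value["id"])}
        elif isinstance(value, dict) and "time" in value:  # date/time value
            record["claims"][key] = value["time"]
        else:                                              # plain string: URL, channel ID
            record["claims"][key] = value
    return record
```

Keeping only the first statement per property is a simplification; outlets with several countries or types would lose all but one.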

Once I had the JSON, all I needed was a viewer, which I built and put up at https://searchtweaks.com/vcb/.

The Video Channel Browser at work. At the top there are some instructions: "Browse information and recent videos of 4000+ media outlets with YouTube channels. (The dataset was extracted from Wikidata / Wikipedia in December 2024.) Search by keyword or use the dropdown menu to specify media type, country, or language. Bear in mind that not all metadata is available for all listings, so if you're searching for a location you might get additional search results in addition to using the dropdown. After search, click on an outlet name for more additional information about it and a slideshow with the last 15 videos from the channel." 
Beneath that is a keyword search field. Underneath THAT are three dropdown menus that let you narrow down the listings by media type, country, or language. Because the data loads from the JSON automatically, a number of media outlets are already visible. They’re listed in a table that includes outlet name, description, type, country, language, and YouTube link. Not every listing has every piece of metadata.

The JSON loads automatically when you open the page, so after a few seconds you’ll see a list of outlets. From here you can narrow the list down either with a keyword search or with the Type, Country, and Language dropdown menus. Because not every listing has every piece of metadata, I recommend doing a keyword search in addition to using the dropdown menus if you’re trying to narrow the list by media type or location.
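
The dropdown caveat follows from how such filters have to work: a dropdown can only match a claim that exists, while a keyword can hit free text. A minimal sketch of that logic in Python (the viewer itself runs in the browser; the LANGUAGE key and the sample records here are hypothetical):

```python
def matches(record, keyword="", type_label=None, country=None, language=None):
    """True if a record passes the keyword search and every chosen dropdown."""
    if keyword:
        # Keyword search covers free text, so it still reaches records
        # whose structured metadata is missing
        text = " ".join(filter(None, [
            record.get("label"),
            record.get("description"),
            record.get("wikipediaInfo", {}).get("summary"),
        ]))
        if keyword.lower() not in text.lower():
            return False
    claims = record.get("claims", {})
    # "LANGUAGE" is a hypothetical key; the sample record doesn't show one
    for key, wanted in (("INSTANCE_OF", type_label), ("COUNTRY", country), ("LANGUAGE", language)):
        if wanted is not None and (claims.get(key) or {}).get("label") != wanted:
            return False  # a missing claim never matches a dropdown choice
    return True


# Made-up records: one with a COUNTRY claim, one without
RECORDS = [
    {"label": "VRT CANVAS", "description": "Flemish TV station",
     "claims": {"COUNTRY": {"id": "Q31", "label": "Belgium"}}},
    {"label": "Example Radio", "description": "radio station in Belgium", "claims": {}},
]

by_dropdown = [r["label"] for r in RECORDS if matches(r, country="Belgium")]
by_keyword = [r["label"] for r in RECORDS if matches(r, keyword="belgium")]
print(by_dropdown, by_keyword)
```

Each search finds a different outlet here, which is why running both catches more of the collection than either alone.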

You can get more details on a channel by clicking on the outlet name. A div opens with a little more information on the channel, along with links to other social platforms and official websites. The div also has a slideshow of the last 15 videos from that channel. Unless the channel prohibits playing its videos outside of YouTube, you can watch them right from the listing.
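
A slideshow like this boils down to a list of embed URLs for the newest videos. A sketch, assuming each video is a dict with a video_id and an ISO 8601 published timestamp (as a channel feed provides) and the standard https://www.youtube.com/embed/ player URL; channels that disable embedding will show an error in the player instead of the video.

```python
from datetime import datetime

EMBED_BASE = "https://www.youtube.com/embed/"


def slideshow(videos, limit=15):
    """Embed URLs for the newest `limit` videos, newest first."""
    newest = sorted(
        videos,
        key=lambda v: datetime.fromisoformat(v["published"]),  # expects "+00:00"-style offsets
        reverse=True,
    )
    return [EMBED_BASE + v["video_id"] for v in newest[:limit]]
```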

A detail div from the Video Channel Browser showing information on Sama TV, a television station in Syria. On the left it's showing basic information about the channel along with links to some social platforms. On the right, a thumbnail shows the most recent video, a live press conference from December 16, 2024.

Of course, the biggest disadvantage of searching YouTube channels this way is that this collection is a tiny fraction of YouTube’s total. On the other hand, I kind of like having a topical collection of channels to browse through. Also, I can make more collections! All the viewer does is process a JSON file. Any page collection I can turn into a JSON (which basically means any query I can get working in the Wikidata Query Service) can become a browsable collection. Wikipedia has about 3,000 people it identifies as “YouTubers” with YouTube channels; I think that’s going to be my next set.
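
Since the viewer only reads the JSON, a new collection should work as long as its records keep the same shape, and the dropdown menus can be built from whatever values the data contains. A hypothetical sketch of that idea, not the viewer’s actual code:

```python
from collections import Counter


def dropdown_options(records, claim_key):
    """Distinct labels for one claim, most common first; records lacking it are skipped."""
    counts = Counter()
    for record in records:
        claim = record.get("claims", {}).get(claim_key)
        if isinstance(claim, dict) and claim.get("label"):
            counts[claim["label"]] += 1
    return [label for label, _ in counts.most_common()]


# Any collection with the same record shape works, e.g. a future "YouTubers" set
collection = [
    {"claims": {"COUNTRY": {"id": "Q31", "label": "Belgium"}}},
    {"claims": {"COUNTRY": {"label": "Syria"}}},
    {"claims": {"COUNTRY": {"id": "Q31", "label": "Belgium"}}},
    {"claims": {}},
]
print(dropdown_options(collection, "COUNTRY"))
```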

I like the simplicity of this tool because it doesn’t require any API keys at all. But I think I’m going to develop it further, maybe by adding a step that calls the YouTube API to bulk out the JSON with more channel information. Because that would happen while building the dataset, I could offer more filtering options without adding an API key requirement for anyone using the viewer. But the first step is to make a few more datasets and see what I can find.
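
For that YouTube API step, the channels.list endpoint of the YouTube Data API v3 returns statistics such as videoCount and subscriberCount for up to 50 channel IDs per call. A sketch of a build-time enrichment pass, assuming records store the channel ID under a hypothetical YOUTUBE_CHANNEL_ID claim; the key would only be needed while building the JSON.

```python
from urllib.parse import urlencode

CHANNELS_ENDPOINT = "https://www.googleapis.com/youtube/v3/channels"


def channel_request_urls(channel_ids, api_key, batch=50):
    """channels.list URLs, at most 50 comma-separated IDs each."""
    for start in range(0, len(channel_ids), batch):
        params = {
            "part": "snippet,statistics",
            "id": ",".join(channel_ids[start:start + batch]),
            "key": api_key,
        }
        yield CHANNELS_ENDPOINT + "?" + urlencode(params)


def merge_statistics(records, response):
    """Copy counts from a parsed channels.list response into matching records."""
    stats_by_id = {item["id"]: item.get("statistics", {}) for item in response.get("items", [])}
    for record in records:
        channel_id = record.get("claims", {}).get("YOUTUBE_CHANNEL_ID")  # hypothetical key
        stats = stats_by_id.get(channel_id)
        if not stats:
            continue
        # The API returns counts as strings
        record["videoCount"] = int(stats.get("videoCount", 0))
        if not stats.get("hiddenSubscriberCount"):
            record["subscriberCount"] = int(stats.get("subscriberCount", 0))
    return records
```

A video count is exactly the kind of data point the YouTube channel search leaves out, and once it is in the JSON the viewer could filter on it with no key at all.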
