Using ChatGPT to Double-Distill Mojeek Results into a Date-Based Topic Overview

My concern about AI-assisted search results has been, from the beginning, the lack of human context. A simple query is rarely going to be sufficient in itself; after all, the user is searching because of some existing information lack. Outside of the most basic queries (When is a movie playing? Where is that restaurant? How many ounces in a pound?) the lack of any understanding around the why of the search risks the answer going off the rails. (This is why librarians do reference interviews, to understand better what patrons want!)

What an original search engine did was give you a bunch of links based on your query, letting you apply your own context by reviewing and evaluating the content of each link. Putting AI on top of that workflow puts the AI in the position of having to infer the search context, because the step of having a human assess the results is removed. AI is trying to go straight from query to result with no human-applied query refinement or framework, and past a low level of complexity these attempts go very sideways and end up in tech media roundups.

The good news is that you can apply even general contextual frameworks to a search query to make the AI’s results better. If you’ve been following this blog for a while you’ve heard me talk about the persistent metadata of time and place, the idea that everything in a physical universe can be described through the metadata of where it is and when it is. (If you have somehow luckily missed my ranting, I wrote an article about persistent metadata and contextual search boundaries here.)

Yesterday I wrote about applying persistent metadata of time via a JavaScript program that does date-based searching on Mojeek one year at a time and summarizes the returned content with ChatGPT. (The program performs date-based search by looking for year strings in the URL via the inurl: syntax and performing additional URL pattern matching/filtering on top of that.) Trying that program gave me more organized search results around a topic since the query was time-structured. The results also showed me how earlier details in a topic’s history tend to get buried as more recent information comes out. I don’t know the inner workings of Google News’ search but in my experience it weights recent news higher in search results, which makes sense. It’s just that the details of past history tend to get overshadowed in the glare of Now.
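To make the year-in-URL idea concrete, here is a minimal sketch in Python (my actual program is JavaScript, and this is not its code). It assumes a year "counts" when it appears in the URL path without other digits touching it. The function names are made up for illustration, and the exact pattern a real filter should use is an assumption you'd want to tune.

```python
import re
from urllib.parse import urlparse


def build_year_query(query: str, year: int) -> str:
    """Append an inurl: restriction so results must have the year in the URL."""
    return f"{query} inurl:{year}"


def url_mentions_year(url: str, year: int) -> bool:
    """Return True if the URL path contains the year with no digit on either side.

    Assumed rule (hypothetical): '/2018/05/story' matches 2018, but
    '/item/120185' does not, because the digits there are part of a longer number.
    """
    path = urlparse(url).path
    return re.search(rf"(?<!\d){year}(?!\d)", path) is not None


def filter_by_year(urls, year: int):
    """Keep only the URLs whose path passes the year check above."""
    return [url for url in urls if url_mentions_year(url, year)]
```

The second filtering pass matters because a bare inurl: match can catch a year-shaped string buried in an ID number, which says nothing about when the page was published.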

After I made that little widget to do one year of date-based searching on Mojeek, I wondered what it would be like to do several years’ worth of searches/summaries at a time and then summarize that: a double distillation of Mojeek’s Web search, if you will (or even if you won’t). So I spent some time expanding what I initially made. I thought it turned out pretty cool!

Matt Gaetz 2018/2020/2022

A screenshot of three ChatGPT summarized biographies of Matt Gaetz from 2018, 2020, and 2022.

My new tool accepts a query and multiple years to search. To test it, I tried searching for Matt Gaetz, using the years 2018, 2020, and 2022 as my time source. I got a summary for each year I searched and an aggregated biography of all the years put together. But what I really liked were the bullet points of critical data from each year, which I got by prompting ChatGPT to extract the most important points from each year’s summary into a list. (I also learned that if you’re writing ChatGPT output to an HTML div, adding “Please format your response with basic HTML” to the end of your prompt can work wonders.)
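The flow described above (summarize each year, pull key points from each summary, then summarize the summaries) can be sketched roughly like this in Python. This is an illustration, not the tool's actual JavaScript. The prompt wording is invented apart from the HTML formatting line, and `ask` is a hypothetical stand-in for whatever function sends a prompt to ChatGPT and returns its reply.

```python
from typing import Callable, Dict

# The formatting request mentioned above, appended to the end of every prompt.
HTML_HINT = "Please format your response with basic HTML."


def double_distill(topic: str, results_by_year: Dict[int, str],
                   ask: Callable[[str], str]) -> dict:
    """Summarize each year's results, then summarize those summaries.

    results_by_year maps a year to the text gathered for that year.
    ask is a placeholder (assumption) for a call to a chat model.
    """
    summaries, key_points = {}, {}
    for year in sorted(results_by_year):
        # First distillation: one summary per year.
        summaries[year] = ask(
            f"Summarize what these {year} search results say about {topic}.\n\n"
            f"{results_by_year[year]}\n\n{HTML_HINT}"
        )
        # Bullet points are extracted from the year's summary, not the raw results.
        key_points[year] = ask(
            f"List the most important points about {topic} from this {year} "
            f"summary.\n\n{summaries[year]}\n\n{HTML_HINT}"
        )
    # Second distillation: one overview built from all the yearly summaries.
    combined = "\n\n".join(f"{year}: {summaries[year]}" for year in sorted(summaries))
    overview = ask(
        f"Write an overview of {topic} across these yearly summaries.\n\n"
        f"{combined}\n\n{HTML_HINT}"
    )
    return {"summaries": summaries, "key_points": key_points, "overview": overview}
```

Passing `ask` in as a parameter keeps the sketch independent of any particular API client, and it means you can try the flow with a fake function before spending any tokens.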

A bulleted list titled "Matt Gaetz - Key Events and Controversies."  Beneath it are lists of salient facts from 2018, 2020, and 2022.

Using the persistent metadata of time to shape our search and ChatGPT’s reporting gives us a timeline-based overview of Matt Gaetz and shows us where current events could overlap with details that might have been overshadowed by later events. For example, six years ago Gaetz was both promoting medical marijuana and pushing for investigations of Google. Those details take a backseat nowadays to Gaetz’ more contemporary status as “Guy who ousted Kevin McCarthy” or “Guy who is under several different investigations.” But they’re relevant to current happenings.

With a bullet list like this I can take salient points and turn them into additional time-focused Mojeek searches easily: “Matt Gaetz” Google investigation inurl:2018, or “Matt Gaetz” Florida politics PACs inurl:2020. Though my initial search was only for “Matt Gaetz,” using URL pattern-matching to restrict content to specific years was enough of a search structure to generate a list of additional details attached to time metadata, which we can then turn into additional Mojeek searches if we want to do further exploring.
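If you wanted to generate those follow-up searches instead of typing them, a small Python sketch might look like the one below. The function name and the shape of the input (a dictionary of years mapped to keyword strings you picked out of the bullet points) are assumptions for illustration. Choosing the keywords is still a human job.

```python
from typing import Dict, List


def follow_up_queries(phrase: str, points_by_year: Dict[int, List[str]]) -> List[str]:
    """Build one year-restricted query per hand-picked key point.

    phrase is wrapped in quotes so it is searched as an exact phrase;
    each entry in points_by_year is a string of extra keywords for that year.
    """
    queries = []
    for year in sorted(points_by_year):
        for keywords in points_by_year[year]:
            queries.append(f'"{phrase}" {keywords} inurl:{year}')
    return queries


# Example using the two searches mentioned above.
for query in follow_up_queries(
    "Matt Gaetz",
    {2018: ["Google investigation"], 2020: ["Florida politics PACs"]},
):
    print(query)
```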

In the Gaetz example I mentioned details about his political career that are less mentioned but still relevant. Also useful, if you’re trying to focus your search results, are details about a topic that are less mentioned and no longer relevant at all! Let’s look at TikTok.

TikTok 2018/2020/2022

Today’s TikTok is a bit different from what it was when it launched in late 2017 / early 2018, as you can tell from the summary paragraphs:

A screenshot of ChatGPT summaries for TikTok taken from source years 2018, 2020, and 2022.

TikTok went from “short-form mobile video app” in 2018 to “social media platform known for its short video content” in 2022. That’s quite a leap! Remember the 15-second video limit? Remember the Musical.ly acquisition? Neither one of them is relevant to TikTok’s current status. Here’s what will happen if you do a Mojeek search for TikTok “15 seconds” “musical.ly”:

A screenshot of Mojeek search results for TikTok "15 seconds" "musical.ly." The results are generally old; the first one is from 2018.

What do you notice about those search results? The first one is from 2018. They’re mostly about historical happenings at TikTok, like the change of name from Musical.ly to TikTok, or about the history of TikTok. That’s because these little details are irrelevant to using TikTok now, so using them in a query filters out more contemporary results.

On the other hand, potential partnerships with Apple Music and doing more with streaming music are more recent developments in the history of TikTok. That is reflected in the search results for TikTok “streaming music” viral “Apple Music”:

A screenshot of Mojeek search results for TikTok "streaming music" viral "Apple Music." The results are generally from 2022 and 2023.

These results are mostly from 2022 and 2023. And did you notice that in this case we didn’t even specify a specific year of content to search with inurl:? Because the concepts we’re searching for relate to specific periods in TikTok’s history, they give our search results a date focus without our having to use the inurl: syntax with a year string. In the earlier case of Matt Gaetz, because we were searching for particular events, it made more sense to use the inurl: syntax, because we wanted to focus on specific time periods.

Current AI-Assisted Search Misses a Step

The way AI-assisted search works now, I believe, relies far too heavily on the (in)ability of the AI to infer context. There needs to be an interstitial step during which the AI can conduct some kind of basic reference interview or the user can choose some kind of understood restrictive structure (dates, times, authoritative information).

The good news is that adding even the most basic of search structures to your query, like a time element via inurl:, can have a huge impact on how well the results are organized and how easily AI can turn them into a useful set of facts for further exploration.