Searching in Data Tide Pools Before Braving Google’s Oceans

I’ve been playing with the idea of building a little wading pool of data that offers a limited but reasonably authoritative collection of information (in this case Wikipedia), and then exploring the relationships within that data to build more complex search engine queries that are less likely to get snared by junk Google results.

I made a Wikipedia concept extractor and I’m using it to build topical searches that I test with strictly-filtered (&fi) Mojeek site sets — expanding that information pool just a little. Will that be enough to narrow my query so I can try it on Google? Or should I bring that lovely new (way cheaper) GPT-4o-mini model in to do another round of concept extraction and refine the query further?
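
For the curious, here’s a minimal sketch of the sort of concept extraction I mean, leaning on the public MediaWiki API’s links property. The function and its filtering choices are illustrative, not my actual extractor:

```python
import requests

def extract_concepts(title, limit=50):
    """Pull linked article titles off a Wikipedia page as rough 'concepts'."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "links",       # outbound wiki links = related concepts
            "titles": title,
            "pllimit": limit,
            "plnamespace": 0,      # article namespace only; skip Talk:, File:, etc.
            "format": "json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    pages = resp.json()["query"]["pages"]
    return [link["title"]
            for page in pages.values()
            for link in page.get("links", [])]

print(extract_concepts("George Washington")[:10])
```

Outbound links are a crude proxy for concepts, but on a heavily linked article they make a decent starting cloud.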

I’m intrigued by the idea of using external information to define your search as precisely as possible before letting your query loose on something like Google. Obviously this is not for basic quick searches, but I can imagine using it for topical searching.
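
If that second GPT-4o-mini round does turn out to be worth it, the refinement step is only a few lines. This is a sketch, not my working setup; the prompt wording and the five-term cutoff are assumptions:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def refine_concepts(subject, concepts):
    """Ask GPT-4o-mini to trim a concept list to the most search-worthy terms."""
    prompt = (
        f"From this list of concepts related to {subject}, pick the five "
        "most distinctive for a web search. One per line, no commentary:\n"
        + "\n".join(concepts)
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip().splitlines()
```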

If you want to search for information about George Washington, having a “wading pool” of data (completely transparent, intentionally limited, and unlikely to drown you) that starts with Wikipedia and expands to a carefully chosen set of reference-type sites (other encyclopedias, a couple of large news sites, etc.) would allow you to explore the cloud of concepts that is George Washington, pick out the ones you’re concerned with at the moment, and build them into a query complex enough to bypass a lot of that top-level, shallow-knowledge AI sewage.
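
The assembly step at the end is the easy part. Once you’ve picked the concepts you care about, something like this snaps them into a query tight enough to try on Google; quoted phrases are one choice among several operators:

```python
def build_query(subject, concepts, max_terms=3):
    """Quote the subject and a few chosen concepts into one tight query."""
    terms = " ".join(f'"{c}"' for c in concepts[:max_terms])
    return f'"{subject}" {terms}'

print(build_query("George Washington",
                  ["Whiskey Rebellion", "Mount Vernon", "Continental Army"]))
# "George Washington" "Whiskey Rebellion" "Mount Vernon" "Continental Army"
```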

Thursday night fun
