If you’ve been reading my stuff for a while you may remember me talking about the idea of persistent metadata. By “persistent metadata,” I mean that everything in a physical universe can be identified by WHERE it is and WHEN it is. A person is born here, educated here, works here, moved here, died here. Along with a place, each of those steps also has a span of time associated with it. And even if that span of time is not formally contextualized, you can use it in your searching; if you know grandpa was educated between 1938 and 1950 and served in the military between 1951 and 1955, you can use those years in time-bounded queries to shape your results while searching a genealogy database.
(If you want to explore my thoughts on persistent metadata, I wrote a three-part essay which goes deeper into it and shows examples of how I have applied these principles to search tools: one, two, three.)
You apply persistent metadata to your searches all the time. When you look for Japanese restaurants near you, you’re applying the persistent metadata of location. When you do a news search and restrict your results to the last 24 hours, you’re applying the persistent metadata of time. You do this because it works to restrict your results in a useful way and because as a human you instinctively understand the importance of time and place. (There’s a reason for the trope of someone getting bonked on the head, waking up, and asking “Where am I?”) It is an easy step to teach someone to apply this understanding to performing a better Web search.
When I was putting my Google Alert replacements together, I replaced the Google News search first, because a news search is inherently date-based. That is to say, if you went to Google News right now and did a search for Kristi Noem, you would not be reading about what she did during the beginning of the covid epidemic. Monitoring news means you’re filtering recent information, which is a much smaller search space than the general Web or even a general News search. You can narrow it further by restricting your search to an aspect of a Web page (like title or URL) but such strategies are generally less necessary.
On the other hand, monitoring the WEB for new information is much more chaotic. Pages get scraped by a search engine but they get rescraped as well. SEO strategies and dynamic content might mean an absolutely ancient page shows up in what was supposed to be a review of recent Web pages. (This is happening a lot with Google Alerts, which is why I’m making this substitute in the first place.)
If I’m monitoring a space as unstructured as the Web, and I can’t rely on the search engine I’m using to provide completely fresh content, then I need to find a way to either aggressively filter the returned data or build a better search query. The Mojeek API tier I’m using restricts me to 10 results per search and a rate limit of one query per second, so it’s not economical to gather up a big bunch of data and filter it. I have to get smart about my queries instead. That’s why my version of Web alerts has a date mode!
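To make the one-query-per-second constraint concrete, here is a minimal sketch of how a monitoring script might pace its queries. It is not the code behind my alerts and it does not call Mojeek at all: `Throttle`, `run_queries`, and the `search` callable are hypothetical names made up for illustration, and `search` stands in for whatever function actually talks to the search API.

```python
import time
from typing import Callable, Dict, List, Optional


class Throttle:
    """Enforce a minimum gap between calls (assumed here: 1 second)."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def run_queries(queries: List[str],
                search: Callable[[str], List[str]],
                throttle: Throttle) -> Dict[str, List[str]]:
    """Run each query through the (hypothetical) search function, paced."""
    results: Dict[str, List[str]] = {}
    for query in queries:
        throttle.wait()
        results[query] = search(query)
    return results
```

With only ten results coming back per paced query, every query has to count, which is the argument for making the query itself smarter.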
A lot of sites which post content on an ongoing basis (newsletters, magazines, blogs, etc) use date strings in their URL patterns like this: /2024/04/30. Combine those date strings with the inurl: search operator and you can create queries which incorporate site information to make your search results fresher. For example, a Web search today for donor-advised-funds inurl:04 inurl:2024 is going to provide much different results than a search for just donor-advised funds.
Now when I specify date mode with my Web alerts, the current month and year are calculated by the program and added to my monitoring query as inurl: strings. This query is not date-perfect (the string 04 can match a day as well as a month) but it’s not meant to be; instead it’s meant to drastically reduce the search space, which it does well. And because it does, I can monitor the Web for general topics like “cultural heritage” without having to wade through oceans of junk results.
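For the curious, date mode can be sketched in a few lines of Python. This is an illustration, not my actual program: the function names are hypothetical, and `url_contains_all` assumes inurl: behaves like a plain substring check on the URL, which is a simplification of whatever a real search engine does.

```python
from datetime import date
from typing import List, Optional


def build_date_mode_query(topic: str, today: Optional[date] = None) -> str:
    """Append inurl: terms for the current month and year to a topic query."""
    today = today or date.today()
    month = f"{today.month:02d}"  # zero-padded month, e.g. "04"
    year = f"{today.year:04d}"    # four-digit year, e.g. "2024"
    return f"{topic} inurl:{month} inurl:{year}"


def url_contains_all(url: str, fragments: List[str]) -> bool:
    """Rough stand-in for inurl: matching (assumption: simple substrings)."""
    return all(fragment in url for fragment in fragments)


if __name__ == "__main__":
    print(build_date_mode_query("donor-advised-funds", date(2024, 4, 30)))
    # The same fragments also match a November post dated the 4th,
    # which is why the query is not date-perfect.
    print(url_contains_all("https://example.com/2024/11/04/post", ["04", "2024"]))
    print(url_contains_all("https://example.com/2023/04/30/post", ["04", "2024"]))
```

The second check shows the imprecision in action: a URL containing /2024/11/04/ satisfies both fragments even though it is not an April page. A URL from April 2023 fails, though, and cutting out that kind of old page is the whole point.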
This has worked so well that I’m intrigued by the idea of adding automatic persistent metadata modifiers in more ways. Is there some way I can apply a location search like this? Hmm. It’s worth thinking about, but first I need to move my Google Alerts over…