Over the weekend I got a Google Alert with an ostensibly “new” news story that was actually from 2010, so I’ve moved “find a substitute for Google Alerts” to the top of my to-do list.
I already have a news API I like okay, but that won’t monitor new Web pages for me — for that I need a Web search API. As much as I like Stract, it doesn’t index new content quickly enough for this case. I looked around and saw that Mojeek offers pay-as-you-go API pricing I can afford, so I got a trial key.
Before I jumped into making my own alerts system, though, I wanted to try a little exercise with Mojeek. Helen E. Brown has been extremely patient letting me throw search ideas at her, and getting her feedback has gotten me thinking more about using datasets to create search contexts and particular search spaces.
Just to see how it would go, I knocked something together that blends two datasets and creates some context for a Mojeek search. In this case all the user has to specify is a U.S. state. The program uses one dataset to identify the current members of Congress for that state and another to identify all the 4-year institutions of higher education in that state. Choose one of the institutions and the program does a Mojeek search for each politician’s name in the university’s Web space (and makes a good guess at the publication year for each result via URL parsing). It’s an interesting way to browse how politicians are (or aren’t) mentioned in indexed higher education Web space.
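For the curious, here is a rough Python sketch of the two moving parts: building one site-restricted query per politician, and guessing a year from a result URL. This is an illustration rather than the actual program. The function names, the `site:` query form, the `example.edu` domain, and the 1990 cutoff are all assumptions made for the example, and it stops short of calling any search API.

```python
import re
from datetime import date
from urllib.parse import urlparse

# A standalone 4-digit number starting with 19 or 20 (not part of a longer number).
YEAR_PATTERN = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")


def build_queries(names, domain):
    """Return one quoted-name, site-restricted query string per name.

    Assumes the search engine accepts a `site:` operator in the query text.
    """
    return [f'"{name}" site:{domain}' for name in names]


def guess_year(url, earliest=1990, latest=None):
    """Guess a publication year from the URL path.

    Returns the first plausible year found, or None if there isn't one.
    This is only a heuristic: plenty of URLs carry no date at all.
    """
    latest = latest or date.today().year
    path = urlparse(url).path
    for match in YEAR_PATTERN.finditer(path):
        year = int(match.group(1))
        if earliest <= year <= latest:
            return year
    return None


# Hypothetical inputs, for illustration only.
queries = build_queries(["Jane Doe", "John Roe"], "example.edu")
year = guess_year("https://www.example.edu/news/2019/03/senator-visits")
```

Only the URL path is checked, so a year in the domain name or query string is ignored, and a URL with no date-like segment simply comes back as `None`.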
So now I’m thinking of contextually triggered alerts. Maybe you’re monitoring the Congressional Record or BusinessWire or, heck, an RSS feed you really like a lot. Something happens, which triggers your Web/News search.
But you can create more complicated searches as well! Easy example: you monitor CNN for mentions of North Carolina Congressman Wiley Nickel. When he’s mentioned, you use the FCC’s API to identify TV stations in North Carolina and search that Web space for a span of days around the CNN mention. The national media mention both triggers and informs your local media search, hopefully avoiding the low-information results you’d get for a basic keyword sweep.
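As a thought experiment, the "triggers and informs" part might be sketched in Python like this. It takes a mention date and a list of station domains and plans one date-bounded search per station. Everything here is hypothetical: the function names, the two-day window, the placeholder domains, and the shape of the plan dictionaries. The FCC lookup and the actual search calls are left out.

```python
from datetime import date, timedelta


def search_window(mention_date, days_before=2, days_after=2):
    """Return the (start, end) dates of a span of days around the mention."""
    start = mention_date - timedelta(days=days_before)
    end = mention_date + timedelta(days=days_after)
    return start, end


def plan_local_searches(name, station_domains, mention_date,
                        days_before=2, days_after=2):
    """Build one search plan per station domain, bounded by the date window.

    `station_domains` stands in for whatever a station lookup would return.
    """
    start, end = search_window(mention_date, days_before, days_after)
    return [
        {
            "query": f'"{name}" site:{domain}',
            "from": start.isoformat(),
            "to": end.isoformat(),
        }
        for domain in station_domains
    ]


# Hypothetical trigger: a national mention on March 1, 2024.
plans = plan_local_searches(
    "Wiley Nickel",
    ["station-one.example", "station-two.example"],
    date(2024, 3, 1),
)
```

How the date bounds get applied would depend on the search API in use; if it has no date filter, the window could be enforced afterward by filtering results on their guessed dates.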
This is way too much fun to think about, but I need to start by duplicating what I have right now!