Autojot
Download: autojot-0.1.8.tar.gz
Autojot indexes every web page you look at and lets you search them
later. It depends on Perl (with LWP), SWISH++, and
FilterProxy.
A public search engine like Google
searches a small subset of everything on the net. That's great for
finding things I haven't seen, but sometimes what I want is a
searchable index of everything I've seen already. When I think to
myself, "I saw something about that a month ago, and I can't remember
where," I want a good way to find it without having to wade through
keyword matches from around the world that tell me about things I
haven't seen.
Warning: this program is kind of a pain to install.
To Do
Bugs to fix
- There's little documentation.
- FilterProxy has changed a lot and Autojot probably doesn't work
with the latest version.
- File locking may not work over NFS. There does not
appear to be a good solution,
though there may be a better method than Perl's flock().
- A URL can only be indexed once per index file. If the content
of a page changes faster than index files are rotated, some of those
versions will not be indexed. This is a limitation of SWISH++.
It would be possible to index each visit by naming files according
to URL + time, but you'd get a lot of duplicates that way. My
workaround is to visit the permanent locations of things and not
worry about what gets indexed for dynamic front pages.
- I haven't tested it with a recent version of SWISH++ (though
because the index format has changed, I can't ever upgrade
without throwing away my data or writing a converter).
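The trade-off in the "URL + time" idea above can be illustrated with a small sketch. It is written in Python rather than Autojot's Perl and is not Autojot's actual code. The function names and the naming scheme (SHA-1 of the URL, optionally followed by a UTC timestamp) are hypothetical, chosen only to contrast "one file per URL" with "one file per visit":

```python
import hashlib
from datetime import datetime, timezone


def url_key(url: str) -> str:
    """Hypothetical scheme: one file name per URL, so a revisit maps to the same file."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"


def url_time_key(url: str, visited: datetime) -> str:
    """Hypothetical scheme: one file name per visit (URL hash plus UTC timestamp)."""
    stamp = visited.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return hashlib.sha1(url.encode("utf-8")).hexdigest() + "-" + stamp + ".html"


if __name__ == "__main__":
    visits = [
        ("http://example.com/", datetime(2003, 5, 1, 9, 0, tzinfo=timezone.utc)),
        ("http://example.com/", datetime(2003, 5, 1, 17, 30, tzinfo=timezone.utc)),
        ("http://example.com/about", datetime(2003, 5, 2, 8, 15, tzinfo=timezone.utc)),
    ]
    per_url = {url_key(u) for u, _ in visits}
    per_visit = {url_time_key(u, t) for u, t in visits}
    # Two distinct URLs, but three distinct visits.
    print(len(per_url), len(per_visit))  # 2 3
```

With per-visit names, every revisit yields a new file even when the page hasn't changed. That is where the duplicates come from.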
Features to add
- Automate rotating the index files.
- Add date to hit report and allow sorting by date instead of score.
- Add configurable report formatting.
- The installation is complex. I should make a Debian package for it now that FilterProxy and SWISH++ are packaged for Debian.
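The "add date to hit report and allow sorting by date" item could look roughly like the following. This is a Python sketch, not Autojot's Perl. It assumes that a visit date is available alongside each hit's score. The `Hit` record, its fields, and the report layout are all hypothetical:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Hit:
    """Hypothetical search hit: page URL, relevance score, and visit date."""
    url: str
    score: int
    visited: date


def sort_hits(hits, by="score"):
    """Return hits best-first by score, or newest-first by visit date."""
    if by == "score":
        key = lambda h: h.score
    elif by == "date":
        key = lambda h: h.visited
    else:
        raise ValueError(f"unknown sort key: {by!r}")
    return sorted(hits, key=key, reverse=True)


def format_report(hits):
    """Render one line per hit: score, visit date, URL."""
    return "\n".join(f"{h.score:4d}  {h.visited.isoformat()}  {h.url}" for h in hits)


if __name__ == "__main__":
    hits = [
        Hit("http://example.com/a", 90, date(2003, 4, 2)),
        Hit("http://example.com/b", 75, date(2003, 5, 20)),
        Hit("http://example.com/c", 60, date(2003, 1, 15)),
    ]
    print(format_report(sort_hits(hits, by="date")))
```

Keeping the sort key separate from the formatting fits the "configurable report formatting" item too, since the two can vary independently.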