Storage space is cheap these days; some companies are even giving it away for free. For me, however, file size is still an important consideration. I use a MacBook as my primary workhorse and I want as much information as possible at my fingertips at all times. This means I don’t want to depend on external drives, CD backups or ‘cloud storage’ (except for backup purposes, of course). I’ve got an 80 gig drive, so I should have plenty of room for my materials, but once you install a few programs, take a few hundred pictures and put a few gigs’ worth of MP3s on the disk, you’re left with a lot less room to play with. This is why I’ve become fussy about what I save from my research.
Web pages are particularly nasty wasters of space. I recently saved a single article from the New York Times website and it took up over 1100k! That’s more than a megabyte for one article. After I’d spent 20 seconds cutting out the ads, JavaScript and other irrelevant material that had been saved in the web archive, I was left with a 7k file. The New York Times had stuck me with more than 1093k of unnecessary material. Now multiply that across thousands of files and you can see just how much drive space could be wasted: a thousand articles like this one would waste more than a gigabyte.
That’s why I’m thankful for Aardvark and make.txt.
Aardvark is a bookmarklet (or a plugin, if you use Firefox) that lets you highlight the part of the page you’re actually interested in and automatically (or sometimes manually) get rid of unwanted content. This is usually enough to remove most of the crap that comes along with a typical web page. make.txt, however, takes this a step further by converting the remaining text on the page into Markdown. This step is pure nerdery, but it does an excellent job of preserving:
- the link to the original source,
- all of the remaining links on the page, and
- the semantic structure of the original document.
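For the curious, here is a rough idea of what that kind of conversion involves. This is a minimal Python sketch, not make.txt’s actual code, and it makes no claim about how make.txt works internally: it uses the standard library’s `html.parser` to turn headings, paragraphs and links into Markdown, and puts a link to the original source at the top. The names `MarkdownSketch` and `html_to_markdown` are hypothetical, and everything else (lists, emphasis, tables) is simply passed through as text.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Hypothetical converter for a tiny HTML subset: h1-h3, p and a."""

    def __init__(self):
        super().__init__()
        self.out = []        # collected Markdown fragments
        self.href = None     # href of the link currently open, if any
        self.link_text = []  # text seen inside the open link

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            # <h2> becomes "## ", keeping the document's heading structure
            self.out.append("\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            # Keep the link inline as [text](url) so it survives in plain text
            self.out.append("[" + "".join(self.link_text) + "](" + self.href + ")")
            self.href = None
        elif tag in ("h1", "h2", "h3", "p"):
            self.out.append("\n")

    def handle_data(self, data):
        if self.href is not None:
            self.link_text.append(data)
        else:
            self.out.append(data)


def html_to_markdown(html, source_url):
    """Convert an already-trimmed HTML fragment, with the source link on top."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    body = "".join(parser.out).strip()
    return "Source: <" + source_url + ">\n\n" + body + "\n"
```

For example, `html_to_markdown('<h1>Title</h1><p>See <a href="https://example.com">this</a> page.</p>', 'https://example.com/article')` returns a short text file that starts with the source link, followed by `# Title` and `See [this](https://example.com) page.`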
What I’m left with is a small, searchable plain-text file that can easily be imported into other applications for editing and quoting. And no ads.