zSearch

From CommerceNet Wiki


Why search the Public Web when you can search your Personal Web?

Over the past year we have speculated quite a bit about how great it would be to search your own drive, email, past web surfing, and so on. Now we're planning to build such a search engine, and the first milestone along the path is a Personal Nutch. Nutch is the open-source search engine project from the mind behind the Lucene indexer.

The most exciting possibility is a new blend of public and private infospheres: an index of everything you've read is one thing, but imagine mining an index expanded to include everything you've ever had recommended to you. By crawling all the sites and pages mentioned in your personal archive, it's possible to build up a vast database of pages that covers all the areas you might care about. That crawl would be only a tiny, focused fraction of a 6-billion-page public index, small enough to fit on an ordinary laptop.
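As a rough illustration of the first step, collecting the links mentioned in a personal archive so a crawler can be pointed at them, here is a minimal Python sketch. The function names (extract_seed_urls, seeds_by_host) and the URL pattern are our own assumptions for illustration, not part of Nutch, and the regular expression is deliberately simple rather than a complete URL parser.

```python
import re
from urllib.parse import urlsplit

# Simplified pattern (an assumption for this sketch): an http(s) URL runs
# until whitespace or a common closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_seed_urls(documents):
    """Return the unique http(s) URLs found in the texts, in first-seen order."""
    seen = set()
    seeds = []
    for text in documents:
        for match in URL_PATTERN.findall(text):
            # Drop sentence punctuation that often trails a URL in prose.
            url = match.rstrip(".,;:!?")
            if not urlsplit(url).netloc:
                continue  # skip fragments such as a bare "http://"
            if url not in seen:
                seen.add(url)
                seeds.append(url)
    return seeds


def seeds_by_host(seeds):
    """Group seed URLs by host, a rough view of the archive's 'neighborhood'."""
    hosts = {}
    for url in seeds:
        hosts.setdefault(urlsplit(url).netloc.lower(), []).append(url)
    return hosts


if __name__ == "__main__":
    archive = [
        "A friend recommended http://nutch.org/docs, worth a look.",
        "Also see https://example.com/a. And again http://nutch.org/docs",
    ]
    for url in extract_seed_urls(archive):
        print(url)
```

The output is a de-duplicated list of URLs, one per line, which is the kind of seed list a focused crawl could start from. In practice the texts would come from saved email, browser history, and files on disk rather than the hard-coded strings used here.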

See a live Nutch crawl of all the public pages at CommerceNet and all the pages we point to: an experiment in indexing the "neighborhood" of an organization. For example, if you search for "Nutch", you'll see pages from our blog, Nutch.org, and CreativeCommons.org.

We're also very enthusiastic about Google Desktop Search. We have some notes on that page about how we think it integrates with Windows Sockets to insert local results into Google searches. We also have some brief notes comparing Nutch/Lucene's performance to a homegrown indexer.

See also: Projects, Publications.
