Webaroo (Beta 1) review
Webaroo promises a fully searchable and browsable web, even when you’re offline. It’s a seemingly impossible claim and we were naturally sceptical as to how effective such a product could be. Wireless connections will soon be everywhere – Acer, Dell and Lenovo will be releasing notebooks with integrated 3G technology. But Acer has also hedged its bets by signing a deal with Webaroo to include the software on its notebooks.
Rather than follow previous web-caching services and simply grab everything, the Webaroo client redirects search queries to its own virtual proxy server, which then searches through ‘web packs’ stored locally. Webaroo’s own servers crawl the web, analysing pages and storing only a subset: those containing the most content in the minimum storage space. These are then grouped into the web packs, which are freely downloadable. The ‘content value’ is determined by the diversity and quality of the pages: the more diverse a set of pages, the more queries are likely to be answered; the higher the quality, the more relevant the results. It’s analogous to an encyclopedia, with a high content density taking up less space than a library of specialist books. Metadata is then extracted to enable caching and searching within that particular subset.
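Webaroo hasn't published its ranking algorithm, but the diversity-and-quality idea behind 'content value' can be sketched in a few lines. Everything here — the field names, the weights, the value-per-byte formula — is an illustrative guess, not Webaroo's actual method:

```python
# Toy sketch of the 'content value' idea: score a set of pages by
# diversity (distinct topics covered) and quality (a per-page weight),
# normalised by storage size, so denser and more varied packs win.
# All names and numbers are assumptions for illustration only.

def content_value(pages):
    """pages: list of dicts with 'topics' (set), 'quality' (0-1), 'size' (bytes)."""
    if not pages:
        return 0.0
    topics = set()
    quality_sum = 0.0
    size = 0
    for p in pages:
        topics |= p["topics"]
        quality_sum += p["quality"]
        size += p["size"]
    diversity = len(topics)               # more topics -> more queries answerable
    avg_quality = quality_sum / len(pages)  # higher quality -> more relevant results
    return (diversity * avg_quality) / size if size else 0.0

pack_a = [  # diverse pack: three topics across two pages
    {"topics": {"news", "sport"}, "quality": 0.9, "size": 10_000},
    {"topics": {"travel"}, "quality": 0.8, "size": 10_000},
]
pack_b = [  # narrow pack: one topic, slightly higher quality
    {"topics": {"news"}, "quality": 0.9, "size": 10_000},
    {"topics": {"news"}, "quality": 0.9, "size": 10_000},
]
assert content_value(pack_a) > content_value(pack_b)
```

On this scoring, the diverse pack beats the narrow one despite its slightly lower average quality — matching the encyclopedia-versus-specialist-library analogy.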
A smart caching system automatically updates the content of a pack, expiring old pages and adding new ones, at a frequency that depends upon the nature of the pack itself.
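A pack-dependent refresh policy of this kind might, under purely assumed intervals and field names, look something like the following sketch — a news pack refreshing hourly, a city guide weekly:

```python
# Illustrative sketch of per-pack cache expiry. The intervals, pack
# kinds and field names are assumptions, not Webaroo's real schedule.

REFRESH_INTERVALS = {
    "news": 3600,               # hourly: news goes stale quickly
    "city_guide": 7 * 24 * 3600,  # weekly: listings change slowly
}

def pages_due_for_update(pack, now):
    """Return pages whose cached copy is older than the pack's interval."""
    interval = REFRESH_INTERVALS[pack["kind"]]
    return [p for p in pack["pages"] if now - p["fetched_at"] > interval]

now = 1_000_000
pack = {
    "kind": "news",
    "pages": [
        {"url": "a", "fetched_at": now - 7200},  # two hours old: stale
        {"url": "b", "fetched_at": now - 600},   # ten minutes old: fresh
    ],
}
assert [p["url"] for p in pages_due_for_update(pack, now)] == ["a"]
```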
The number of web packs will no doubt grow in line with Webaroo's popularity, but for now they're sorely lacking in range. Webaroo is still a beta product, though, and for demonstration and testing purposes the handful of city-specific and global news packs are impressive. The London pack, for example, contains around 15,000 content-dense pages.

Downloading and installing the 5MB setup file may be quick, but be prepared to invest time and hard disk space to download and store the web packs and websites you specify. Webaroo recommends at least 1GB of available space, although in practice you can get away with a lot less. We downloaded the world news pack (26.7MB) and four city packs (367.8MB total), plus a handful of individual sites. For www.pcpro.co.uk to a link depth of 1, plus images, we needed only 1MB. The link depth setting is currently restricted, so much of the content isn’t cached offline. Although it was possible to locate our Real World columns, for example, we couldn’t display the content without an online connection as they went beyond the link depth limit. This is naturally a major drawback with the current implementation – if the web is to be truly searchable offline, the user needs to be able to specify the trade-off between comprehensive coverage and the disk space consumed to enable it.
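The link-depth restriction described above can be pictured as a breadth-first traversal that stops expanding links beyond the limit. The link graph below is a made-up stand-in for real fetched pages — Webaroo's actual crawler isn't public:

```python
from collections import deque

# Depth-limited crawl sketch: pages at max_depth are stored but their
# outgoing links are never followed, so anything deeper stays uncached.

def crawl(start, links, max_depth):
    """links: dict mapping each page to the pages it links to."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # at the limit: cache the page, don't expand it
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen

links = {
    "home": ["reviews", "news"],
    "reviews": ["real-world-column"],  # one level beyond the limit
}
cached = crawl("home", links, max_depth=1)
assert cached == {"home", "reviews", "news"}
assert "real-world-column" not in cached
```

With a depth limit of 1, the column page two clicks from the front page is never cached — exactly the behaviour we hit with our Real World columns, which could be found in search results but not read offline.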
Where Webaroo differs from PDA web-clipping services such as AvantGo is in both keeping the original page formatting and packaging the web into topic-specific chunks. A Windows Mobile 2003 version is also available, but there's currently no Windows Mobile 5 support. If you search while online, you get the option of both the cached version of the site and a live link. Link descriptions themselves can be surreal, but are generally accurate enough to make choosing the right pages easy. It will be interesting to see whether Webaroo can deliver on its promise of squeezing the ‘whole web’ (or at least its content density-driven subset) into a 40GB web pack. Considering there are around 10 billion web pages, at an average of, say, 100KB each, that’s a big reduction from the petabyte (1 million gigabytes) you’d require to store the whole thing locally.
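The back-of-envelope storage sums above check out, assuming an average page size of around 100KB (a round-number assumption, not a measured figure):

```python
# Back-of-envelope check of the 'whole web' storage figures.
pages = 10_000_000_000      # roughly 10 billion pages
avg_size = 100 * 1000       # assumed average of ~100KB per page
total_bytes = pages * avg_size

petabyte = 10**15
assert total_bytes == petabyte  # about a petabyte for the lot

# Against the promised 40GB web pack, that's a reduction factor of:
pack_bytes = 40 * 10**9
reduction = total_bytes / pack_bytes
assert reduction == 25_000  # more than four orders of magnitude
```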