Foxy feeds

Some useful web-technology advances go unnoticed by the majority of users, perhaps the most notable being RSS, which stands for Really Simple Syndication or Rich Site Summary. RSS summarises site contents for distribution and viewing in any format. Its first popular use was to distribute web newsfeeds to a wider audience. It caught on because it is a totally portable format, and is both platform independent and content oriented. Once some data source is made available in RSS format, any application that’s RSS aware can view it, and users who subscribe to a particular feed get fresh content pushed to them any time it changes. Not surprisingly, RSS feeds have become popular with bloggers too.
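
To show why "any application that's RSS aware can view it" is not an exaggeration, here is a minimal Python sketch of pulling the headlines out of a feed. The sample feed, its example.com addresses and the `read_headlines` name are all invented for illustration. Real feeds come in several flavours, and this only copes with the plain RSS 2.0 layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document, made up purely for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>An invented feed</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def read_headlines(feed_xml):
    """Return a (title, link) pair for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in read_headlines(SAMPLE_FEED):
    print(f"{title}: {link}")
```

Because the format is just structured text, the same few lines would work on any platform and for any feed that follows this layout, which is the portability point in a nutshell.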

The applications you use to read RSS feeds are both varied and plentiful, ranging from dedicated standalone aggregators to applets that integrate into mail clients like Outlook (I will be looking at just such an application in next month’s column). If you have been tempted to dip a toe in Mozilla waters following my recent Firefox coverage, note that it now comes with an interesting new technology called Live Bookmarks, which lets you view both RSS and blog headlines directly within the Bookmarks toolbar/menu without having to visit the home site in question. Clicking on any entry displays the full content in the main browser window, and new feeds are added from the New Live Bookmark entry under Bookmarks | Manage Bookmarks | File menu. Alternatively, just click the Live Bookmark icon in the bottom-right corner of the browser window. More information on this can be found at www.mozilla.org/products/firefox/live-bookmarks.html
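
A browser can only offer a feed if the page advertises one, and the usual way of doing that is a `<link rel="alternate">` element in the HTML head. The following is a rough Python sketch of that discovery step. The sample page, the `FeedLinkFinder` class and the `find_feeds` helper are hypothetical, and this is not a description of how Firefox itself is implemented.

```python
from html.parser import HTMLParser

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

# Hypothetical page, invented for illustration.
SAMPLE_PAGE = """<html><head>
<title>Example blog</title>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml"
      title="Example feed" href="http://example.com/feed.xml">
</head><body><p>Hello</p></body></html>"""


class FeedLinkFinder(HTMLParser):
    """Collect (title, href) pairs from feed-advertising <link> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        if "alternate" in rels and attr.get("type", "").lower() in FEED_TYPES:
            self.feeds.append((attr.get("title", ""), attr.get("href", "")))


def find_feeds(html):
    """Return every feed advertised in the given HTML text."""
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.feeds


print(find_feeds(SAMPLE_PAGE))
```

The stylesheet link is ignored because its `rel` value is not `alternate`, so only the genuine feed is reported.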

The trouble with all these RSS and blog feeds is finding the ones you want to read, which is where services such as Technorati (www.technorati.com) come in. Monitoring over six million blogs – and 750 million links as a result – Technorati is a kind of Google for blogs, but it makes a pretty good RSS feed finder as well, because so many blogs have cottoned on to the fact that RSS is a great way of spreading their word. If you want to find more RSS feeds, the aptly named Feedster (www.feedster.com) will not disappoint.

Also putting RSS to good use, as well as being a painfully cool URL, is del.icio.us, which lets you share your bookmarks by publishing them as RSS feeds. Essentially, it is a social bookmarks manager that exploits the same shared browsing concept I already admire so much in StumbleUpon (www.stumbleupon.com), bringing it into your browser by way of a simple bookmark. A shared bookmark facility is not as daft as you may think, and once you have experienced it you will find social browsing hard to give up. Its value lies not in seeing the sites other people have bookmarked – although that in itself is a pretty addictive hobby – but in seeing who else has bookmarked the sites you use most often and what other sites they value equally. The adage about great minds thinking alike couldn’t be more apt in the information age, and you will find invaluable sites and services that would never have crossed your horizon were it not for either StumbleUpon or del.icio.us.

Davey the digital detective

If you have ever wanted to find out more about a website, an IP address or a domain name, the chances are you have stumbled across DNS Stuff (www.dnsstuff.com), which is an all-on-one-page set of DNS, Whois and other IP-related lookups and tests (including reverse lookups, URL obfuscation tests and IP routing lookups). However, if you are a really savvy user, you need to dig a little deeper, and for once I’m not recommending you go to Sam Spade (www.samspade.org), despite that also being in my Bookmark folder’s ‘essential tools’ directory. In fact, you only need to dig down one or two levels at DNS Stuff. First, try www.dnsstuff.com/pages/expert.htm, which adds such things as MAC address lookup and DNS timing lookup. Then, if you are really adventurous, go to www.dnsstuff.com/pages/testbed.htm, which enables credit card merchant category code and charge-back reason code lookups, SSL examination and even telephone number lookup from number fragments.

If your interest lies in finding the right domain name for your new web venture, or in seeing who has registered similar names, I find Whois Source (www.whois.sc) hard to beat. There are two aspects to this site: the name spinner and the domain explorer (a third is just a money maker for the company, auctioning off domains that are about to expire). The name spinner is brilliant when you know the type of name you want but cannot find a variation that’s still available. Instead of spending hours searching, let these people do it for you. Enter the name you want and you will see which top-level domains are already registered, along with an indication of whether each is active or expired, plus the same information for similar domain names. If a name you really want is already taken, use the domain explorer to dig deeper into its details: who owns it, how long it has been active, when it expires, a snapshot of the home page, details of SSL certificate expiry, web host server type and IP address location. There is also a one-click process to monitor a domain, so you get notified if it expires. Be aware that there is a commercial aspect to much of this stuff – you have to sign up if you want to activate a monitor, for example – but the most important search-and-retrieval stuff is free.
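
For the curious, two of the DNS Stuff tests mentioned above, reverse lookups and URL obfuscation, are easy to sketch offline in Python. The function names and sample addresses here are my own inventions, and I am not suggesting this is how DNS Stuff does it. The sketch only builds the name a reverse lookup would query and unpicks one common obfuscation trick, the host written as a single decimal number.

```python
import ipaddress
from urllib.parse import urlsplit


def reverse_lookup_name(ip):
    """Return the DNS name that a reverse (PTR) lookup of this IP queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def deobfuscate_host(url):
    """Return the URL's host, converting a bare decimal number to a dotted quad.

    Any other kind of host is returned unchanged.
    """
    host = urlsplit(url).hostname or ""
    if host.isdecimal() and int(host) < 2**32:
        return str(ipaddress.IPv4Address(int(host)))
    return host


print(reverse_lookup_name("192.0.2.10"))
print(deobfuscate_host("http://3232235777/login"))
print(deobfuscate_host("http://www.bank.example@3232235777/"))
```

The last call shows the classic trick of putting a reassuring name before an `@` sign: everything in front of it is treated as a user name, and the real destination is the number that follows. Resolving the reverse lookup name to an actual host would need a live DNS query, which I have left out so the sketch runs without a network connection.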

Taking the search lead

I have covered search technology a fair bit in this column lately, but I make no apologies, because knowledge management rates right up there alongside security as the hottest of topics on the client side of online computing right now. If further justification is needed, honestly answer the question, ‘can you find the exact information you need, when you need it, and without fuss?’ Although there have been some truly important leaps forward recently (including the move toward integrated Desktop searching from the likes of Google and Copernic), the search garden is far from being all rosy. Every current solution has its limitations, and each day more and more of us stumble across them.

I’m glad to say, though, that the passage of time and force of peer pressure do seem to be solving some of the more long-standing and annoying limits. Google has finally lifted its ten-word search limit, for example. ‘The ten-word what?’ you might be thinking. Serious searchers, though, will be only too aware that up until now Google has placed a ten-word maximum on the keywords used in a single search. Ask Jeeves, Yahoo! and even MSN have never imposed any such keyword limit, and I’m still at a loss as to why Google did it. The good news is that anyone needing to go Google on a whole passage of text, or to automate their searching using the Google API, can now do so with much less hassle, although I’m slightly disappointed to see that the new keyword string ceiling has been set at 32 words, which is still not enough in my humble opinion. If you have been moaning at Google about the ten-word limit, I implore you to continue doing so about the new 32-word one and, while you are at it, about the 101KB page cut-off and the lack of both nested searches and any proximity operator. Talking of which, the only search engine I have found (and recently a client required me to write a report outlining the relative benefits of 50 of them) that does offer proximity searches is the little-known Exalead (beta.exalead.com).
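
If you do want to throw a whole passage at a search engine with a keyword ceiling, one workaround is to break the passage into several queries. Here is a minimal Python sketch of that idea. The `split_into_queries` function is hypothetical, and counting words by splitting on whitespace is my assumption, since I have no details of how Google itself counts them.

```python
MAX_QUERY_WORDS = 32  # ceiling quoted above; treat as an assumption that may change


def split_into_queries(passage, limit=MAX_QUERY_WORDS):
    """Split a passage into consecutive queries of at most `limit` words."""
    words = passage.split()
    return [" ".join(words[i:i + limit]) for i in range(0, len(words), limit)]


# A made-up 70-word passage: w0 w1 ... w69
passage = " ".join(f"w{i}" for i in range(70))
for query in split_into_queries(passage):
    print(len(query.split()), "words:", query[:30], "...")
```

Each query would then be submitted separately, which is more hassle than a single search and rather proves my point about the ceiling still being too low.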

‘Little-known’ does not mean little-used (and certainly not of little use), though, as Exalead has been powering AOL France searches for the past three years and its parent company has been highly active in the enterprise search space for six years. The web search engine was launched last year and features much of the functionality power searchers have been crying out for, including that proximity operator we have missed since the days when AltaVista, for example, used to have one. This is no coincidence, as the Exalead founder, François Bourdoncle, was part of the team that started AltaVista back in 1995, and AltaVista’s founder, Louis Monier, is a member of Exalead’s board. At one billion indexed pages, it is not claiming to be the biggest. In fact, Exalead is not claiming anything, as it appears content for users to sing its praises on the once-bitten-forever-smitten principle. User traffic is increasing at a rate of 20 per cent per week, so it would seem that word of mouth is working pretty well, and with both RSS and blog indexing picking up speed the two-billion-page milestone will be passed before long.

So what is so great about Exalead? Perhaps ‘different’ would be a better word – different in both feature set and functionality, different with regard to navigation and interface. Exalead indexes Word and PDF documents, PowerPoint and MP3 files and all the rest, and it does this in real time. It lets you search by proximity as already mentioned, but also phonetically, and is happy to accept truncated searches too. However, it is the user interface that impresses the most. Switching between text-only, text-with-thumbnail and thumbnail-only results is easy, for example, although thumbnails are currently restricted to websites, while non-HTML documents such as PDF or Word files show just an icon rather than a preview, which is a shame.

More than making up for this is the integrated ‘safe’ Preview browser, which lets you quickly view the page you are interested in within a sandbox environment (so the original document, be it Web/Word/PDF, is not opened), and the forward/backward buttons that enable quick jumps from one highlighted search term to the next. Being able to bookmark sites from this Preview window straight into your browser’s Bookmarks folder is hugely useful. Then there is the Navigation Bar, which lives up to its name, as search results are listed alongside dynamically related search terms and related categories from the Open Directory listings and can be one-click re-sorted by geographic location or document type. Lastly, it is worth mentioning that while Boolean search operators are a feature, as they are in the majority of search engines, Exalead, unlike the rest, allows these to be combined into complex expressions. Such search expressions are particularly powerful, thanks to the completeness of the set of operators allowed. For example, you can include NEAR for proximity searches. It should come as no surprise that Exalead’s slick search offering, soon to be available as a Desktop-searching variant, is coded from the ground up in XML.
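
For anyone who has never used a proximity operator, the idea behind NEAR is simply that two terms must appear within a few words of each other. Here is a minimal Python sketch of that test. The `near` function and its `max_gap` parameter are my own assumptions, and I make no claim that Exalead's operator uses this window size or these exact rules.

```python
import re


def near(text, term_a, term_b, max_gap=10):
    """Return True if the two terms occur within max_gap words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, word in enumerate(words) if word == term_a.lower()]
    positions_b = [i for i, word in enumerate(words) if word == term_b.lower()]
    return any(abs(a - b) <= max_gap for a in positions_a for b in positions_b)


doc = "RSS is a simple way to syndicate a news feed to many readers"

print(near(doc, "rss", "feed"))             # the terms are nine words apart
print(near(doc, "rss", "feed", max_gap=3))  # a tighter window
# Combined with ordinary Boolean logic into a more complex expression:
print(near(doc, "rss", "feed") and "atom" not in doc.lower())
```

The last line gives a flavour of why combining operators matters: a proximity test on its own is useful, but it becomes far more precise once it can be mixed with AND, OR and NOT in a single expression.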

Google pot shots

Although it suffers a little from ‘top of the tree’ syndrome where everyone (including me) takes pot shots at it, Google itself continues to innovate in some areas. Most notable of late is the beta test of Google Print, which is currently being integrated into the main search engine. It is an interesting concept, bringing content that is not currently online into the search space. You do a search as normal and, if there are books within the Google index that contain content matching your search, a books link is flagged at the top of your search results listing. Following this link takes you to a page where you will find additional information about the book and links to purchase it at the major online bookshops. Most importantly, though, the page of the book that matches your search profile will also be displayed in full. Indeed, most books will offer a number of pages either side of the main ‘hit’ available for reading online, together with a contents page.

If the book is out of copyright, the chances are its whole text will be available for online reading, but not otherwise. There are some limitations, such as both browser-printing and image-copying functions being disabled (although if you really must, it is easy to get around these blocks by using screen-capture software). If you use the Google Desktop search, books do not seem to get returned in results, at least not in my browser. At the moment, the main limitation is that you will not get too many hits that return book links, because only a few publishers have as yet signed up. However, I expect this to change fairly quickly, as Google is signing deals with library partners at the moment.

As an alternative, you might want to try Amazon, or rather its A9 search engine (www.a9.com), which has a book-search feature and a better interface for searching inside those books and viewing covers, contents and excerpts. My rather random testing suggested that Amazon currently has a larger index of books to reference than Google, which is not surprising. Similarly, when it comes to articles (of the magazine and periodical variety), I recommend going to LookSmart FindArticles (www.findarticles.com), where there are more than five million full-text and free articles to search through, covering some 900 magazines and journals dating back to 1998. There are also links to premium content, which point to articles that need to be purchased, but even then this works out a lot cheaper than subscribing to the online version of the publication, assuming there is such a subscription available in the first place.

A toad well travelled

Another interesting idea on the search engine front comes courtesy of MrSapo (www.mrsapo.com), which offers a twist on the usual meta-search genre. Instead of bringing the results from multiple search sites into one results index, as is the norm, MrSapo is perhaps best described as being a meta-interface site. You get a single search box and the one search site, with buttons that enable you to perform your search at any of the many choices available, but only one at a time. This may sound like something of a backwards step when compared to a meta-search site, but it works very well. The trouble with meta-searching is that, more often than not, you are faced with information overload and no choice about the search engines being used (which will be determined by partnership agreements and business deals rather than user convenience).

I’m sure many of you will have visited one meta-search site after another to reach the spread of engines you really wanted to search. One way to get around this is to use a product such as Copernic Agent (www.copernic.com), which lets you specify which of the hundreds of search sites listed you want to include in any given meta search. It also lets you create specific search-site sets for different applications and index their results in any number of ways. Unfortunately, this is a rather expensive solution for any but the professional researcher, although it is wonderfully efficient and is used here at the Happy Geek offices on a daily basis. This is where MrSapo comes in, offering a much wider range of search sites that cover audio, video, RSS and blog searches all at the press of a button and all returned on the same page. You only need enter the search terms once, which makes re-submission at multiple sites a breeze. With big names like Google, Yahoo! and MSN included alongside the less well-known but equally useful Clusty, Snap and Vivisimo, to name but a few, it is certainly one to look at.
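
The meta-interface idea, one search box feeding many engines one at a time, boils down to building a ready-made search link per engine from a single set of terms. Here is a minimal Python sketch of that. The `search_links` function is hypothetical, and the URL patterns in `ENGINES` are my assumptions about how these two sites accept queries, so check them before relying on them. I have no knowledge of how MrSapo itself is built.

```python
from urllib.parse import urlencode

# Assumed patterns: engine name -> (search page, name of the query parameter).
ENGINES = {
    "Google": ("https://www.google.com/search", "q"),
    "Yahoo!": ("https://search.yahoo.com/search", "p"),
}


def search_links(terms, engines=ENGINES):
    """Build one search URL per engine from a single set of search terms."""
    return {
        name: f"{base}?{urlencode({param: terms})}"
        for name, (base, param) in engines.items()
    }


for name, url in search_links("rss feed finder").items():
    print(f"{name}: {url}")
```

Adding another engine is just one more entry in the table, which goes some way to explaining how a site like this can offer such a wide range of choices behind a single box.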
