Desktop search gets down to business

01.09.2005
By Mike Heck

When enterprises roll out search applications, it's usually a big IT effort to keep indexes refreshed and the overall systems running. Because of this complexity and the reality that most enterprise knowledge resides on workers' PCs, consumer desktop search technology has infiltrated organizations -- and has caught IT executives off guard.

There's no questioning the benefit of quickly finding that e-mail or spreadsheet squirreled away months ago. Yet there are still red flags concerning security of consumer desktop tools, such as revealing private personal or corporate information or introducing spyware to the enterprise network. More significantly, these tools lack the centralized administration so essential for enterprise deployments.

What, then, distinguishes tools that are free or for personal use from those you'd consider purchasing for your organization? To answer this question, I looked at enterprise products from dtSearch, ISYS Search Software, and X1 Technologies, along with Google's Desktop Search, which has recently been outfitted with corporate features.

I checked the breadth of file types, total number of documents, and systems that each enterprise product indexes, as well as how each accomplishes this. Accuracy is of utmost importance, of course, along with usability. The end-user experience is not, however, just about forming queries and displaying readable results; the operational side, which includes the building and sharing of indexes, is equally significant.

Search performance goes beyond how fast a product indexes and returns results. Thus, in testing these products, I also considered what lies beneath, such as the index size and system resources consumed. Given that IT staff resources come at a premium, I examined how customizable each product is -- and whether rollouts and updates could be performed with existing software management tools.

Last but not least, security is paramount even when these search tools are used within a corporate firewall. Desktop search applications should respect Windows authentication and related permissions, such as log-ins to file servers, Web sites, applications, and local workstations.
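One way to picture that requirement is as a filter applied to every results list before the user sees it. The Python sketch below is purely illustrative and does not reflect how any of the reviewed products work internally: the `trim_results` function, the user names, and the in-memory access list are all assumptions, standing in for what a real product would read from Windows permissions.

```python
def trim_results(hits, user, acl):
    """Keep only the hits that the given user is allowed to read.

    acl maps a document name to the set of users permitted to view it
    (a hypothetical stand-in for real file-system permissions).
    Documents without an entry are hidden, so the default is to deny.
    """
    return [doc for doc in hits if user in acl.get(doc, set())]


# Hypothetical access list and raw search hits.
acl = {
    "salaries.xls": {"hr_admin"},
    "roadmap.doc": {"hr_admin", "jsmith"},
    "lunch-menu.txt": {"hr_admin", "jsmith", "guest"},
}
hits = ["salaries.xls", "roadmap.doc", "lunch-menu.txt", "unlisted.pdf"]

print(trim_results(hits, "jsmith", acl))  # ['roadmap.doc', 'lunch-menu.txt']
```

Denying by default means a document with missing permission data never leaks into someone's results, which is usually the safer failure mode for a search tool.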

dtSearch 7.01

The old man of the bunch, dtSearch introduced its desktop text retrieval software in 1991, and Version 7.01 further improves the product's usability and performance. Besides full-text scanning of Outlook e-mail, indexed documents can be in HTML, PDF, XML, Word, Excel, PowerPoint, WordPerfect, RTF, and ZIP formats. The system can also search unindexed documents, or a combination of indexed and unindexed documents. The network version adds scans of remote file servers.

This application offers a wide range of search options (12 in all), including fuzzy, phonic, natural language, Boolean logic, and proximity. Search results appear in a customizable browser. Navigation commands permit quick scans through documents, although dtSearch lacks results clustering.
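For readers unfamiliar with two of those options, here is a minimal Python sketch of what fuzzy and proximity matching mean in general terms. It is not dtSearch's implementation: the function names, the sample sentence, and the similarity cutoff are assumptions chosen only for illustration, and the fuzzy match relies on the standard library's `difflib` rather than any vendor algorithm.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fuzzy_hits(term, text, cutoff=0.8):
    """Return distinct words in text that closely resemble term (tolerates typos)."""
    words = sorted(set(tokenize(text)))
    return difflib.get_close_matches(term.lower(), words, n=10, cutoff=cutoff)


def within_proximity(term_a, term_b, text, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)


# Hypothetical document text containing a misspelling of "budget".
sample = "The quarterly buget spreadsheet was attached to the e-mail about the merger."

print(fuzzy_hits("budget", sample))
print(within_proximity("spreadsheet", "merger", sample, max_distance=10))
```

The fuzzy query finds the misspelled word even though the exact term never appears, while the proximity check only succeeds when the two terms sit close enough together in the text.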

Among the products reviewed here, dtSearch offers the most options for managing indexes, including merging and creating libraries. You can index Web sites to any level you want, and the spider works on both static HTML pages and dynamic sites, such as those driven by content management systems. One improvement I'd like to see is password encryption -- passwords entered to crawl protected sites could potentially be read by anyone with access to your PC.

A separate application, which introduces two more interfaces, is used for search and displaying results. The search part puts the majority of options on the main tab, so it could be a bit daunting for first-time users. After you learn how to find your way around, however, the design saves time. For example, you can select the indexes to search, features, and relevance all at once.

When I searched local e-mail and Office documents, dtSearch always returned results in less than a second, living up to the company's claims. The software will store original documents or text equivalents, too, bringing the time to search remote servers or Web sites to less than a second as well -- although I found that this setting increases index size by about 20 percent. For the most part, indexes were reasonably sized -- about one-fifth of total document size -- because they're compressed in ZIP format.
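Those two ratios are enough for a back-of-the-envelope disk budget. The sketch below simply applies them; the function name, parameter names, and the 50GB example corpus are assumptions for illustration, and real index sizes will vary with the document mix.

```python
def estimate_index_size(total_document_bytes, cache_text=False,
                        index_ratio=0.20, cache_overhead=0.20):
    """Estimate index size in bytes from the rough ratios observed in this review.

    index_ratio:    index is about one-fifth of total document size.
    cache_overhead: storing documents or text equivalents adds about 20 percent.
    """
    size = total_document_bytes * index_ratio
    if cache_text:
        size *= 1 + cache_overhead
    return size


GB = 1024 ** 3
corpus = 50 * GB  # hypothetical 50GB document store

print(round(estimate_index_size(corpus) / GB, 1))                   # plain index
print(round(estimate_index_size(corpus, cache_text=True) / GB, 1))  # with cached text
```

For a 50GB store, that works out to roughly 10GB for the index alone and about 12GB with cached text enabled.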

I was especially impressed with the transparent features that boost accuracy. For example, dtSearch automatically recognizes fields in XML files and meta information embedded in PDF documents, resulting in more relevant returns.

In addition to precise results, these lists are usable, with the top part of the view showing document name along with a relevance score and the lower pane previewing the document in the original form with the hits highlighted. I had no problem jumping back and forth among documents or opening documents in applications associated with the hits.

To share indexes, users merely make a shortcut to a shared data folder on a networked server -- running dtSearch from my PC automatically searched indexes listed on the server.

IT staff can automatically deploy dtSearch's main executable file using Group Policy Objects in Microsoft's Active Directory or by employing Microsoft SMS (Systems Management Server). It's also easy to create and deploy the separate policy file that specifies options such as the location of shared index libraries. An optional client/server version of dtSearch, Network with Spider, is essentially the same software run from a central location. Users access the shared index using a menu in the client software.

I found dtSearch to be a versatile application that allows you to hunt quickly through large local and remote data stores using practically any search formula, from keywords to fuzzy logic. In some spots the design looks a bit dated, and usability could be improved by integrating indexing and search functions. Still, performance is great and makes dtSearch useful in almost any endeavor.

Google Desktop Search for Enterprise

Almost identical to the consumer version, Google Desktop Search for Enterprise indexes content on your local hard drive from various e-mail apps, business file types, Web pages viewed with the top four browsers, and AIM chats. Employees search and see results using the familiar Google interface.

This version has a legitimate claim to the "enterprise" tag because of its centralized administration and security. IT staff may restrict indexing of secure sites, and there's encryption of users' local indexes to protect them from unauthorized access. Plus, the Google software works on workstations with multiple log-ins: Users search across the files only they can access, while content associated with other accounts remains secure.

After a quick setup, Desktop Search automatically performs an initial index of all PC files and then perpetually refreshes the catalog. The initial indexing requires about an hour.

If you'd rather not wait for the auto-indexing, you may select which items to crawl, such as e-mail, Word documents, or PDF files. Google doesn't recognize as many file formats as do the other products reviewed here -- it covers a little more than 25 -- but it hits all the major types, including Outlook and Notes e-mail and Office documents. I didn't find any documents that I couldn't index.

In addition to built-in formats, plug-ins are available for download, many of which are freeware contributed by outside developers. I recommend looking into these if you have special search needs; the plug-ins are conveniently linked from Google Desktop's Help site and at desktop.google.com/plugins.

Google's numerous search operators include phrase, site, file type, and advanced e-mail; these apply whether you search using a toolbar search box or Web page. By combining appropriate operators, I quickly limited an e-mail search to messages on a specific topic from a certain person. Although this package doesn't match ISYS:desktop's advanced clustering, Google nevertheless does a solid job grouping all e-mail search results on the same topic.

File, chat, and Web search results are also organized well. Each entry on the file results page shows an icon indicating the file's type, a snippet of the content, a link to open the file, and a link to cached versions of the file. Web results show a small thumbnail of the page's layout.

What impressed me most were the functions that should endear Google to your IT operations staff. Desktop Search for Enterprise includes a Group Policy Administrative Template, which permits policy settings targeted at each user. Policies are very granular (such as disabling indexing of certain document types or Web sites), permit securing of indexes with EFS (Encrypted File System), and are especially well documented. Distribution is performed via Microsoft Active Directory server or SMS, and staff can test updated versions of the software before distributing them.

To scan intranets and other corporate data, you'll need a Google Search Appliance or Google Mini in addition to Google Desktop Search. That adds cost to this solution, but not necessarily complexity. Although I wasn't able to test the latest version of the Google Search Appliance software, it has been improved during the past year to address earlier limitations on searching enterprise content. For example, you can now search IBM DB2, SQL Server, MySQL, Oracle, and Sybase databases.

"Free" is a bit misleading when factoring in an appliance to search business repositories, but even considering that addition, the complete solution can come in below the cost of other, similarly configured solutions -- throwing in premium support, however, may alter that.

ISYS:desktop 7

A significant upgrade that streamlines the ISYS:desktop search solution, Version 7 gives users more relevant results. It indexes as many as 64 million documents per index and can chain as many as 128 indexes, for a total of roughly 8 billion documents.

A new taskbar object allows users to search at any time, with options to select indexes or launch the full query interface. In either case, users drill down through results faster because ISYS:desktop 7 now performs on-the-fly categorization. Also contributing to a fine experience is ISYS:desktop's capability of searching more than 140 structured, unstructured, and semistructured file formats.

As does dtSearch, ISYS:desktop 7 employs a separate utility for creating and managing indexes. Using this utility, I was able to define what each index should catalog, choosing among documents, e-mail, Web sites, and specific folders and file types. Each index has a setup wizard; they all include the option to set a daily update schedule and an agent to alert you when new documents appear that meet your search criteria. But, as opposed to the other products tested here, ISYS:desktop does not provide a built-in or optional server component -- to crawl and store searches centrally -- although indexes can be shared.

Indexing speed was reasonably good, with typical throughput of 10GB per hour -- a full index takes 24MB of disk space. An advanced feature, Rich SQL, displays database records in a readable format, and I used the built-in HTML Editing Suite to customize database display templates. ISYS also indexes documents and binary objects stored in SQL and Lotus Notes databases.

In an unusual move for a desktop search tool, ISYS:desktop automatically creates categories when indexing, based on metadata, folder names, database table names, and related attributes. Typically you only see this with specialized products, such as Vivisimo Velocity, and it makes finding relevant information much easier. There are five ways to search indexes. Menu-assisted is by far the easiest method to build exact queries because it guides novices through using conditional operators. I also constructed natural-language queries easily and refined results with Word Wheel ("sounds like" or "starts with"). Web-style search uses a syntax common to public Internet search engines, and the command-line option will appeal to expert users.
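The automatic categorization mentioned above can be pictured with a small sketch that uses just one of the listed attributes, the folder name. This is not how ISYS:desktop builds its categories; the function name and the sample paths are assumptions, and a real product would also draw on metadata and database table names.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def categorize_by_folder(paths):
    """Group document paths under the name of their immediate parent folder."""
    categories = defaultdict(list)
    for raw in paths:
        path = PureWindowsPath(raw)
        # Files sitting directly in a drive root have no folder name.
        categories[path.parent.name or "(root)"].append(path.name)
    return dict(categories)


# Hypothetical search hits.
hits = [
    r"C:\Projects\Merger\term-sheet.doc",
    r"C:\Projects\Merger\valuation.xls",
    r"C:\Finance\Budgets\fy2006.xls",
]

for category, files in categorize_by_folder(hits).items():
    print(category, files)
```

Even this crude grouping shows why categories help: a user scanning for merger paperwork can skip the budget files at a glance instead of reading one flat list.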

After entering a query, ISYS:desktop responds in less than a second with a well-designed results list, including the most relevant documents grouped into categories, those with secondary interest, and a large review pane with hits highlighted. I would like to see a high-fidelity preview that shows PDF or Word documents in their original form; right now, ISYS:desktop displays only a plain-text preview of a document's content. That said, you can immediately launch the original document from the ISYS:desktop 7 toolbar.

ISYS:desktop 7 worked well without much tuning. I searched within results, applied various filters, and hid results that didn't interest me, all of which helped refine results. ISYS:desktop 7 also searches most Asian and European languages -- a boon for international companies.

Those working with a lot of structured data will like ISYS:desktop 7's spiffed-up metadata handling. There are now more metadata search operators, and you can see a document's known metadata by hovering over the results. ISYS:desktop 7 is extensible with scripting tools; for example, you could inject metadata into documents from an external database programmatically to improve results. dtSearch 7 is the only other product in this roundup that offers that type of control.

ISYS:desktop 7 relies on Windows NT security, so users won't see results from documents they don't have permission to view. The system, however, includes only rudimentary network installation options, requiring a file server to be designated as a license server.

Overall, ISYS delivers a fast, usable desktop search experience. You get numerous, well-engineered query options that don't take much time to master, as well as clustered results that greatly help in locating relevant files.

On the downside, this application lacks the manageability options you'll find in the other desktop search tools, such as automated deployment. Searching network resources could be better thought out for enterprise use, such as including a way to share searches or indexes created by others.

X1 Enterprise Edition

X1 is likely a name you know -- the company's software is behind the Yahoo and Earthlink desktop search initiatives. Its just-released flagship product, X1 Enterprise Edition, adds secure search of PC and enterprise data from the desktop.

Conceptually similar to Google Desktop, X1 Desktop Client automatically indexes content on a PC, including network drives that appear on the desktop. The final index updates automatically according to a few basic rules that users establish, such as how often the system should check for new or revised files.

X1's enterprise interface shows polish. I quickly picked the type of search from options that include Files, Documents, Pictures, Email, and Attachments. Searches weren't quite as fast as they were with the other products, but they never took more than two seconds.

The search box doesn't offer any obvious options, yet searches were generally accurate. There are a few basic search rules, such as Boolean logic, to refine results. I also formed my queries with more advanced search commands, such as specifying the subject of an e-mail, which worked well to limit results to specific documents. Still, X1 Enterprise offers fewer ways to refine a search than do the other products.

On the positive side, keywords were highlighted and documents -- including tough formats such as Visio drawings -- were previewed in a separate pane for easy access. The media-enabled preview also plays music and displays images or videos from your results list.

Searches may be saved as Favorites, which cuts time later when looking for similar information, and searches can be mailed, exported, or printed.

X1 Enterprise Server comes into play for searching enterprise repositories. This Microsoft Windows service scans designated network data sources and maintains a central index of data it finds. Creating an initial index of 65,000 documents required several hours, and the index was about 20 percent of the total file size.

Integrating directly with enterprise systems requires the optional Content Connectors, which are client-side and server-side modules. The Content Connector for Exchange extracts e-mail, attachments, and contacts from Exchange servers; the same is true for the Notes connector.

In a future release, X1 plans to expand these Content Connectors to permit indexing of Interwoven document management repositories and Microsoft SharePoint portal sites. X1 is also working on an SDK that should allow IT staff to create a searchable index of proprietary enterprise data or a system that doesn't have a prebuilt connector, such as Microsoft SQL or Oracle databases.

Unfortunately, I found that the security side of the X1 solution requires administrative hand-holding. For example, you have to think through security permissions on each index, then publish the search for others to access the results. I believe this could be streamlined with Active Directory or another directory product.

After remote searches are published, users work with information in the X1 Browser Client, which uses Internet Explorer. The Browser Client allows users with a domain password to access results they are privileged to see, all without additional software. The X1 Mobile Client -- sold separately by WebMessenger -- works the same way for BlackBerrys.

Those running the X1 Client link to the server index from their desktop, accessing local and network indexes from one search interface. You can't yet explore both the client and server results in the same search, however -- X1 representatives said the company will be adding a "search across all" feature in the next release.

X1 provides adequate client customization and distribution through its Deployment Manager. An installer can be run or downloaded from a network share, or you can distribute it using SMS, Active Directory, or with a custom solution such as BMC's Marimba.

Taken as a whole, X1 Enterprise Edition does a good job. The capable X1 Client operates smoothly, can be locked down by administrators, and can be deployed with common management applications. The X1 Enterprise Server, with its current Content Connectors, handles corporate e-mail systems, but you'll need to wait for the SDK and more Content Connectors to index other enterprise data.

X1 also needs to work on the server administration side, and it needs to develop a way to integrate local and remote indexes. That's an important function that dtSearch and Google Desktop perform now.

Searching for the right fit

After putting all four through their paces, I wouldn't hesitate to install any of these products on my PC for local file searching. Each performed searches quickly, produced similar results, and didn't have any major security issues, but if you expand beyond the client desktop, some differences surface. From a pure technology standpoint, I like dtSearch 7. It's fast, extensible, easily deployed, searches across multiple indexes simultaneously, and doesn't mandate a central server. That last point could be a stumbling block for large enterprises because you must figure out an efficient way to share indexes securely.

ISYS:desktop 7 is the only product in this review with true results clustering. What's more, indexes can be shared, and the multilanguage support is handy for international companies. Yet again, scalability could be a problem for very large deployments.

X1, on the other hand, mainly does it right on the desktop and bundles in a server for indexing enterprise data. It falls short in server administration and its currently available Content Connectors, which limit the types of enterprise data that can be searched.

Google Desktop for Enterprise is a model of usability and true to the Google mantra -- which extends to federating local, Web, and intranet search results. The newly released Desktop Search Beta 2, which adds a sidebar and Quick Find function, shows how serious Google is about further improving the search experience. IT shops will like Google Desktop's ease of maintenance, although you will need to budget for an appliance to make this solution viable, and you may not be able to search quite as many back-end systems as you can with the other products.

Clearly, traditional search vendors see value in their superior -- albeit costly and complex -- back-end systems. The challenge for the pure-play desktop search vendors will be to continue to add value in the face of updated enterprise search solutions and vastly better embedded OS search, such as Apple's Spotlight and Microsoft's Windows Vista.

SIDEBAR

Windows Vista offers view of integrated desktop search

There's no reason to postpone planning your enterprise desktop search deployment while waiting for far-off OS-based search technologies. Still, that doesn't mean these developments aren't worth tracking, considering that they have the potential to significantly alter the way we locate information. With that in mind, I examined Microsoft's first Windows Vista beta to see how the desktop search experience may change for end-users.

The most obvious change: Search is planted throughout the Windows Vista interface. You'll find a search bar at the top of every Windows Explorer display. Begin typing a product name, such as Adobe, in the search box attached to the Start menu and Windows Vista immediately returns results that are only in Adobe application formats. No more shuffling through seemingly endless lists of programs.

Windows Vista retains the physical My Documents folder but adds Virtual Folders, which collect related files spread throughout your hard disk. When I opened documents from the Start menu, Windows Vista allowed me to display files by type, keyword, or author -- for example, Windows Vista automatically collected all my spreadsheets in a virtual Excel folder.

Windows Vista Beta 1, however, recognizes only Outlook e-mail; as a comparison, Apple's Spotlight -- the embedded search in Mac OS X 10.4 Tiger -- indexes e-mail messages, iCal calendars, and contacts. Combined with a wider range of indexed documents, Spotlight holds a slight edge early on, but Windows Vista Beta 2 will also search Internet History and any RSS feeds to which you have subscribed.

To refine accuracy, Windows Vista uses a fundamental concept called Labels. In effect, you attach metadata to files, which allows you to further categorize information. You could, for example, have a set of spreadsheets appear segmented by their various divisions or business units. And for developers, Windows Vista also offers a way to include unique file types in a desktop search.

The main search interface is far more refined compared with previous versions of Windows -- I could easily cascade filters to find the particular file I needed. I also liked Windows Vista's visual style of previewing documents in a separate pane and providing direct access to the meta information for easy updates and changes.

It remains to be seen how well this search will work with shared files and enterprise systems, such as SharePoint, databases, and Web servers. Nevertheless, at this early stage, Windows Vista appears headed in the right direction.