Features – Search Engines for Intranets

Nina Platt is an independent consultant in library automation and information management. Her past experience includes 10+ years as technical services manager and systems librarian in the law office environment. In addition to her work as a consultant, she writes and speaks frequently about information resources and technology, has developed a number of Web sites and full-text databases and continues to search for better ways to integrate resources, making research more cost-effective. Compiler of Piper State Court Directory, her most recent project is the online newsletter, ILSR : Integrated Library System Reports.

(Archived August 15, 1998)


Abstract

As Intranets become more commonplace, the ability to search their content has become more important. Without a search engine, much of the content that makes up an Intranet is lost. With so much importance placed on this tool, it is imperative that thoughtful planning precede the implementation. How does one choose a search engine? What criteria are used? What steps should be included in the project plan for putting a search engine in place? Additionally, many commercial web servers now include a search engine as part of the server package. It would seem that a built-in component would be simpler to install, easier to maintain, and less expensive. How do those search engines stack up against the commercial third-party products? Are they easier to work with? Do they meet user needs? A comparison of the search engines delivered with Novell’s intraNetWare, Microsoft’s NT Server and Netscape’s SuiteSpot Server shows the strengths and weaknesses of each product.

A couple of years ago I was given the responsibility of finding a search engine to add search functionality to a web site that contained an archive of court opinions. I decided to look at other courts to see what they were using to get an idea of the products that were available for such a purpose. As I researched, I discovered that while most sites were using freeware like WAIS, Glimpse, ht://Dig, some sites were using commercial products like Fulcrum, Folio, Excite, etc. A smaller set still was using built-in search engines that were part of web server/development products like Web Site and FrontPage.

Options for adding a search component to web sites have changed a great deal in the last two years. While some freeware still exists, a host of software companies have delivered commercial search engines to the market and most commercial web servers now have a built-in search facility. With all the choices, how do web site and Intranet developers make the decision of what search engine to implement? When should they implement the search engine delivered with their web server and when should they implement a third party product? This paper discusses the criteria used for selecting a search engine and compares the search engines delivered with Novell’s intraNetWare, Microsoft’s NT Server and Netscape’s SuiteSpot Server.

1. ASKING THE RIGHT QUESTIONS

I can only imagine that the web developers who settled on the freeware did so because it met their needs without adding a large cost to the development. Those using the commercial products were probably looking for additional features that were not available in the freeware search engines. Finally, those using the built-in search engines or search bots were probably looking for ease of implementation and maintenance. Deciding which search engine to use is not an easy task. Many questions must be answered to ensure that the product selected meets the needs of those who will be searching the web site or Intranet:

1.1 System Platforms

What platform will you be using? Examine the platforms supported carefully. A search engine generally runs only on specific platforms and on specific versions of each platform’s operating system. The exception is a product that ships as source code and is compiled upon installation.

1.2 Installation and Maintenance

How important is easy installation and low maintenance? If web site creators find that time is a precious commodity, or do not have the technical know-how to undertake a complicated installation, then they should look for a search engine that is easy to install and maintain.

1.3 Database Type

Do you want to maintain a collection of files stored in directories or do you want to maintain a datafile that stores the documents? There are advantages to both. The search engine that requires maintenance of a directory or directories of files does not require that you import each file into the database. The search engine that requires the documents be stored in a datafile allows the administrator the luxury of only having to maintain one or more files (depending on the structure of the database).

1.4 File Type

Does the search engine you are considering support the various file types (Word, WordPerfect, Excel, PowerPoint, PDF, etc.) you want to index?

1.5 File Location

Where will the documents that you want to index and search reside? If the documents that are going to be included in the database are spread across file servers or directories, then the search engine chosen must include the ability to index documents wherever they are located.
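As a rough illustration of indexing documents "wherever they are located," the sketch below gathers candidate files from several directory roots before any indexing takes place. The function name `collect_documents` and the extension list are assumptions made for this example; a real search engine would supply its own crawler and its own list of supported file types.

```python
import os

# Assumed set of indexable extensions; adjust to what your search engine supports.
INDEXABLE = {".htm", ".html", ".txt", ".pdf", ".doc", ".xls", ".ppt", ".wpd", ".rtf"}

def collect_documents(roots, extensions=INDEXABLE):
    """Walk every root directory and return paths of files with an indexable extension."""
    found = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                # Compare extensions case-insensitively (BRIEF.TXT and brief.txt both qualify).
                if os.path.splitext(name)[1].lower() in extensions:
                    found.append(os.path.join(dirpath, name))
    return found
```

Each root could be a local directory or a mounted share from another file server; whether a given product can reach such locations is exactly what this question is meant to uncover.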

1.6 Multiple Databases

Do you want to maintain individual databases or be able to choose one or more databases to search? If so, then the search engine has to have the capability to create multiple databases and the search interface must provide users with the option to select one or more databases.

1.7 Search Forms

Do you want a simple search form or one that prompts the user for a number of search criteria? Another option would be to have a simple search form as your initial search page with the option to go to a more advanced form.

1.8 Search Capabilities

What search capabilities do you want to offer your users? The searching functions must be examined carefully to see that they meet user needs. Some search functions supported include:

  • Natural query language. This allows users to enter a question or phrase that best describes the topic for which they are searching.
  • Boolean operators (AND, OR, NOT). These are connectors that allow users to retrieve documents when both terms are contained in documents (AND), when either term is contained in documents (OR), or when one term but not the other is contained in documents (NOT). The default that is used when no connector is entered is generally AND or OR.
  • Proximity operators. These connectors allow users to search where a term is found within so many characters of another term (W/number of characters), or where a term is found ADJacent to or NEAR another term. Another capability offered by some database managers allows the user to specify the order of the terms (e.g., database BEFORE manager).
  • Phrase searching. Allows users to search for an exact phrase.
  • Thesaurus. Uses an operator that replaces terms with synonyms or provides the user with a summary of broader or narrower terms and/or synonyms.
  • Concept searching. Similar to thesaurus, this function will search on all variations of a term.
  • Wildcards. Allows users to truncate terms when they want variations of a term, or to insert wildcards when they are not sure of the spelling or want to specify how many characters should be replaced. A single-character wildcard replaces one character (e.g., Anders?n for Anderson or Andersen). Multiple characters can be replaced by using more than one single-character wildcard (e.g., act??? would retrieve action or acting). A character-string wildcard can be used to search for words that contain the same string of characters (e.g., dark* would retrieve darker, darkness, darkest, etc.). Wildcards can be used for prefixes, suffixes, or characters within a word, depending on how the software was developed.
  • Exact match. Allows users to search on the term exactly as it is entered. This is useful if the database was set up to search for the singular and plural variation of a term.
  • Fuzzy match. Returns records with words that have a similar spelling to search terms.
  • Numeric operators (equals, greater than, less than, etc.). Return records with a specific alphanumeric value.
  • Range operator. Returns records within a range of values.
  • Fielded searches. Allows users to search on a specific field or fields in the database.
  • Query by example. Enables users to find other documents similar to a document in the current result set that the user finds relevant.
  • Advisors. Provide tips on how to construct a better query.
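To make a few of these functions concrete, here is a minimal sketch of how Boolean, proximity and wildcard searches might work over a small in-memory index. It is not how any of the products discussed here are implemented. The sample documents and function names are invented for illustration, and proximity is measured in words rather than characters to keep the example short.

```python
import fnmatch
import re
from collections import defaultdict

# Hypothetical sample documents, invented for this illustration.
DOCS = {
    1: "The database manager approved the action",
    2: "Anderson is acting as manager",
    3: "Andersen wrote about the database",
}

def build_index(docs):
    """Map each lowercased term to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(re.findall(r"\w+", text.lower())):
            index[term].setdefault(doc_id, []).append(pos)
    return index

INDEX = build_index(DOCS)

def docs_with(term):
    """Set of document ids containing the term."""
    return set(INDEX.get(term.lower(), {}))

def search_and(a, b):
    return docs_with(a) & docs_with(b)   # both terms present

def search_or(a, b):
    return docs_with(a) | docs_with(b)   # either term present

def search_not(a, b):
    return docs_with(a) - docs_with(b)   # first term but not the second

def search_near(a, b, distance):
    """Documents where a and b occur within `distance` words of each other."""
    hits = set()
    for doc_id in search_and(a, b):
        pos_a = INDEX[a.lower()][doc_id]
        pos_b = INDEX[b.lower()][doc_id]
        if any(abs(x - y) <= distance for x in pos_a for y in pos_b):
            hits.add(doc_id)
    return hits

def search_wildcard(pattern):
    """Expand ? (one character) and * (any string) against the indexed terms."""
    hits = set()
    for term in list(INDEX):
        if fnmatch.fnmatchcase(term, pattern.lower()):
            hits |= docs_with(term)
    return hits
```

With the sample data, `search_wildcard("anders?n")` matches both Anderson and Andersen, and `search_not("database", "manager")` returns only the document that mentions a database without a manager.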

1.9 Structured Fields or Metadata

Do you want to add structured fields or metadata to your documents? If so, the search engine must support the addition of structured fields. The advantages to adding structured fields are many including the ability to search on specific criteria. The disadvantages include the increased amount of time needed to maintain the database. If you choose to use metadata, will you use the Dublin Core standard or develop your own standard?
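As one possible illustration, Dublin Core elements are often embedded in HTML pages as `<meta>` tags whose names carry a `DC.` prefix. The sketch below, which assumes that convention and uses an invented sample page, pulls those tags into a dictionary that an indexer could treat as searchable fields. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class MetaFieldParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content:
            # Field names are lowercased so "DC.Title" and "dc.title" match.
            self.fields.setdefault(name.lower(), []).append(content)

# Invented sample page; the metadata values are placeholders.
SAMPLE = """<html><head>
<title>Sample Opinion</title>
<meta name="DC.title" content="Sample Opinion">
<meta name="DC.creator" content="Example Court">
<meta name="DC.date" content="1998-08-15">
</head><body>Text of the opinion.</body></html>"""

def extract_fields(page):
    """Return a dict of metadata field name -> list of values."""
    parser = MetaFieldParser()
    parser.feed(page)
    return parser.fields

fields = extract_fields(SAMPLE)
```

Storing each field as a list allows repeated elements, such as several creators on one document.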

1.10 Document Collection

How are you going to collect the documents that will be indexed? Do you want your end users to submit documents for indexing interactively? Does the search engine you are considering have built-in collection support? If not, will you need to develop it? If your end users do submit documents for indexing, do you want them to enter the values that will be stored as metadata? Do you plan to develop data entry standards? If so, how will you ensure that those standards are met?

1.11 Results Display

How do you want the results displayed? Some of the values included in results displays by various search engines include:

  • Title
  • Author
  • Description or summary
  • Size
  • Relevance ranking
  • Number of documents/records found
  • Database from which the document was retrieved.
  • Search terms used
  • Date document was created or indexed
  • Database fields as specified by database administrator or user
  • Terms searched on are highlighted in the document
  • Users can navigate between search terms (or hits) within the retrieved documents

Do you want your users to be able to modify the results display? Different users may find that they need to display different components of the documents they retrieve.
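Two of the display values listed above, relevance ranking and highlighted search terms, can be sketched in a few lines. The example below is only a simplified illustration: it ranks by a raw count of term occurrences, which is far cruder than the ranking a commercial engine would use, and the record layout (`title`, `summary`) is an assumption.

```python
import html
import re

def _term_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for all search terms."""
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def highlight(text, terms):
    """Return HTML-escaped text with each search term wrapped in <b> tags."""
    if not terms:
        return html.escape(text)
    parts, last = [], 0
    for match in _term_pattern(terms).finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        parts.append("<b>" + html.escape(match.group(0)) + "</b>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)

def render_results(docs, terms):
    """List title, hit count and highlighted summary, most hits first."""
    if not terms:
        return []
    pattern = _term_pattern(terms)
    scored = [(len(pattern.findall(d["summary"])), d) for d in docs]
    scored = [(n, d) for n, d in scored if n > 0]      # drop documents with no hits
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        f'{html.escape(d["title"])} [hits: {n}] {highlight(d["summary"], terms)}'
        for n, d in scored
    ]
```

Escaping the text piece by piece keeps characters such as `&` safe for display without disturbing the inserted highlight tags.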

1.12 Viewing Files

Do you want your users to be able to view the HTML or ASCII form of the documents and be able to download the original word processing file? The various search engines handle this in different ways (some of which are more time consuming) and some do not offer the function.

1.13 Cost

How much do you want to pay for the search engine? The cost of search engines ranges anywhere from free to thousands of dollars.

2. DEVELOPING A PLAN

The questions listed above are just a few of the questions that must be considered before selecting and implementing a search engine. As you develop your own implementation plan, you will find that there may be more questions than answers. Here are some things to keep in mind as you proceed.

2.1 Involve End Users

To make a good selection, include end users in the evaluation process. This will ensure that the database that is developed will meet user needs. Involve them before, during and after implementation.

2.2 Do Your Research

Spend time learning about the products. Find articles and reviews that comment on the search engines you are considering. Talk to other developers who have implemented the search engines. They will be able to confirm whether the functionality the vendors say exists really does, and whether the product works without any major bugs or problems. They can also tell you about any problems they encountered in implementation and maintenance.

2.3 Forget the Holy Grail

Do not dismiss a search engine because it does not meet all of your needs. There is no perfect search engine. All of the features of the products must be examined and tradeoffs must be made depending on what is more important. For example, you may find that the users must have Boolean operators but do not need the ability to modify search results.

2.4 Regroup Quickly

If, after implementing a search engine, you find that it does not meet the needs you determined during your initial analysis (even though your research showed that it should), move on to another product that does work. The search engine is too important a component in your Intranet to leave poorly implemented.

3. USING THE BUILT-IN SEARCH ENGINE

It would seem that using the search engines that are packaged with web servers being sold today would be a no-brainer. With some servers, the search engine is set up to start indexing as soon as the server software is installed. Why then, would you even consider using another search engine? It should all come back to user needs. Does the search engine that you have installed as part of your server package do the job for your users?

3.1 Product Functionality

Three of the most common server packages being installed today are Novell’s intraNetWare, Microsoft’s NT Server and Netscape’s SuiteSpot Server. Each comes with a built-in search engine: intraNetWare’s QuickFinder, NT Index Server, and Netscape Compass Server (powered by Verity SEARCH ’97 with extensions), respectively. The following table provides a comparison of each product’s capabilities. This limited list focuses on the features most often asked for by end users. Other comparisons that show ease of administration, index size and other criteria important to Intranet administrators are outside the scope of this paper.

| Functions | QuickFinder | Index Server | Compass Server |
|---|---|---|---|
| Searching | | | |
| Boolean operators | Yes | Yes | Yes |
| Natural language | No | Yes | No |
| Phrase searching | Yes | Yes | Yes |
| Proximity searching | Yes | Yes | Yes |
| Concept searching | No | Yes | No |
| Query by example | No | No | No |
| Wildcards | Yes | Yes | Yes |
| Exact match | Yes | No | Yes |
| Fuzzy match | Yes, limited | Yes, limited | Yes, limited |
| Numeric operators | No | Yes | No |
| Range operators | No | Yes | Yes |
| Fielded/Metadata searches | Yes | Yes | Yes |
| Select specific directories/files to search | Yes | Yes | Yes |
| Search across servers | No | No | Yes |
| Browse categories | No | No | Yes |
| Indexing | | | |
| Document types supported | ASCII, Ami Pro, PDF, HTML, MS Word, MS Excel, OLE, Presentations, Quattro Pro, RTF, Unicode, WordPerfect | HTML, ASCII, ASP, all MS Office, PDF | HTML, MS Word, MS Excel, RTF, PDF, WordPerfect |
| Multiple languages | Yes | Yes | Yes |
| Stop words | No | Yes | No |
| Incremental indexing | Yes | Yes | Yes |
| Dynamic indexing | Yes | Yes | Yes |
| Results Display | | | |
| Relevance ranking | Yes | Yes | Yes |
| Metadata can be displayed | Yes, Author and Abstract | Yes, any valid tag | Yes |
| Document title | Yes | Yes | Yes |
| Date/time last revised | Yes | Yes | Yes |
| Size of file | Yes | Yes | Yes |
| URL | Yes | Yes | Yes |
| Hit highlighting | No | Yes | No |
| Server platforms | Can be installed on a 386-based PC or above | Windows NT | Windows NT, Solaris, HP-UX, AIX, Digital Unix, IRIX |
| Web servers supported | Novell Web server | IIS | Any |
| Pricing (approximate cost for full server package that includes search engine, 50-user license) | With intraNetWare – $4995 | With NT Server 4.0 – $2500 | With SuiteSpot – $7000 |

Upon first glance at this chart, it would seem that the only difference in the products is the price. A few things stand out for each, however, making them unique. Microsoft’s Index Server can be configured to provide hit highlighting, allowing users to easily see the term for which they are searching. It also provides concept searching and stop word functionality, while the others do not. The strong points of Netscape’s Compass Server are its ability to search across servers and the category browsing it provides. Compass Server also has some filtering options that allow individuals to customize the content they receive with their browsers. Novell’s QuickFinder has few unique features, except that it appears to handle more file types than the other two search engines.
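One way to work with a comparison like this is to record the features as data and filter on the ones your users cannot do without. The sketch below encodes a handful of rows from the comparison above. The feature keys and the `candidates` function are names made up for this example, and the values simply repeat what the chart reports.

```python
# A few rows of the comparison, keyed by product; True means "Yes" in the chart.
FEATURES = {
    "QuickFinder": {
        "boolean": True, "natural_language": False, "concept": False,
        "exact_match": True, "search_across_servers": False,
        "browse_categories": False, "stop_words": False, "hit_highlighting": False,
    },
    "Index Server": {
        "boolean": True, "natural_language": True, "concept": True,
        "exact_match": False, "search_across_servers": False,
        "browse_categories": False, "stop_words": True, "hit_highlighting": True,
    },
    "Compass Server": {
        "boolean": True, "natural_language": False, "concept": False,
        "exact_match": True, "search_across_servers": True,
        "browse_categories": True, "stop_words": False, "hit_highlighting": False,
    },
}

def candidates(required):
    """Return the products that offer every feature in `required`."""
    return [
        name for name, feats in FEATURES.items()
        if all(feats.get(feature, False) for feature in required)
    ]
```

For example, `candidates(["hit_highlighting"])` leaves only Index Server, while `candidates(["hit_highlighting", "exact_match"])` returns an empty list, a reminder that some combinations of requirements force a tradeoff or a third-party product.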

3.2 Comparisons to other products

With so many similarities, how does one make a decision? It’s not easy. If what you need is a simple search function, it would be wise to consider implementing the search engine that comes with your server. It will keep life simple and, as the chart above shows, all three search engines reviewed provide the functionality needed in a simple search application. If, however, you are interested in providing your end users with additional functionality that third-party vendors could provide, then you may want to consider other options. Your choices then would include products like AltaVista Search Intranet eXtension 97, I-Search 3.0, Verity’s Search 97, OpenText’s Livelink Intranet, InMagic’s DB/Text WebPublisher, Information Dimension’s Basis Webserver, Lotus Notes Domino, etc. The list of commercial web search products is large (and continues to grow).

However, as you examine other vendor products, keep in mind that some of the third-party search engines being sold today are no more powerful than the built-in search engines reviewed above. Kevin Railsback, in the Internet Computing article “Serving Up Quality Searches,” picked Netscape Compass Server in a comparison with Index Server and some of the third-party products listed above. His selection was based on its ease of use, full support of most major operating systems, reporting capabilities and other factors.

Still, there are good reasons to go with third-party vendors, including the need to implement:

  • Enhanced document management functions including versioning, check-in, check-out, etc.
  • Inclusion of a thesaurus function for use in validation and retrieval.
  • Additional search functions like natural query language searching or query by example.
  • Management of additional file types not listed above.
  • Development of collaborative environments.
  • “Pushing” new resources to the end user.

4. CONCLUSION

Search engines are the mortar of the Intranet. Without a search engine the content will crumble and give way. As important as they are, their implementation must be given high priority with the necessary time allotted for research and development. As end users become more sophisticated in their use of the Intranet, developers will need to be ready to provide them with the tools they need to efficiently find what they need to do their work. If you are in the midst of developing the requirements for your Intranet and have not placed importance on the search engine, now is the time to do so.

  • Create a team of end users and IT staff to conduct a needs analysis.
  • Talk to other companies and organizations to see what they are using for their search engines.
  • Attend trade shows and talk to vendors about their products.
  • Read the literature that reviews search engines.
  • Compile a list of possible products.
  • Compare the functionality of each product to the criteria you developed through needs analysis.
  • Narrow your list down to three possible products.
  • Spend additional time learning about each product.
  • Invite the vendors in for demonstrations.
  • Ask for references and follow up with each reference given.
  • Select product and implement.
  • Follow up with end users. Are they content with how the search engine is working? Are there expectations that were never met? Have their needs changed since the analysis?
  • Continue an ongoing review with end users.

5. REFERENCES

  1. Columb, Todd E. “Beyond UNIX.” PC Magazine Online, 1997.
  2. Hibbard, Justin. “Applications–Straight Line to Relevant Data–Customized Content Should Slash Intranet Search Time.” Information Week, November 17, 1997.
  3. Nance, Barry. “Internal Search Engines Get You Where You Want To Go.” Network Computing, October 8, 1997.
  4. Railsback, Kevin. “Serving Up Quality Searches–Six Server-based Packages for Adding Search Capability to a Website.” Internet Computing, February 16, 1998.
  5. Stern, Morgan & Tom Rasmussen. Building Intranets on NT, NetWare and Solaris : An Administrator’s Guide. San Francisco, CA : Network Press, 1997.
  6. Sonnenreich, Wes & Tim MacInta. Web Developer.Com Guide to Search Engines. John Wiley & Sons, 1998.
  7. Sullivan, Danny. “Search Engine Solutions for Your Site–Make Your Site Easy to Search with an Assortment of Features and Techniques.” NetGuide, December 1, 1996.
  8. Swank, Mark & Drew Kittel. Designing and Implementing Microsoft Index Server. Indianapolis, IN : Sams.net Publishing, 1997.

    © 1998, Information Today, Inc. Reproduced with permission of the publisher: Information Today, Inc. 143 Old Marlton Pike, Medford, NJ 08055-8750 Phone: 609-654-6266 — FAX: 609-654-4309
