Search Engines


Introduction

The research and development of internet search engines has advanced by leaps and bounds over the last two to three years with the appearance of "searchbots", i.e., computer robots (sometimes called web crawlers), and of metasearch engines. The robots roam the world wide web relentlessly, collecting and indexing data as directed by their own individual programs. The older search engines such as Altavista, Yahoo!, Excite, etc., rely on less sophisticated manual methods of data collection, but it is the basic accuracy of the collected data that is imperative. No matter how advanced a search engine may be, the quality of the search result depends on the quality of the data collected.

It is these data banks that we interrogate, via a small query box, when we seek information from the internet.

Choices

So which search engine, or engines, do we use to get our information as efficiently as possible? Not an easy choice.

Firstly, browsers such as Navigator and Internet Explorer usually give a choice of engines: Altavista, Yahoo!, etc., to get you started. No doubt the next generation of browsers will have some of today's newer search engines on offer.

Secondly, we may hear of a "super new search engine" (flavour of the month) from a friend or on a mailing list, read a review in a magazine, or find one on a magazine cover CD.

If you spend a lot of time searching for information on the internet, then web crawler search engines will probably give you a good orientation and allow you to develop a more acute search profile, one that yields more gold and less dirt per search. You will probably also be able to reduce the number of databases searched. Search engines such as Copernic 99 list the engines used and the number of hits each made in the search, so if several give no hits, or only one or two of little relevance, they can be removed from follow-up searches, thus saving time and money.
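For readers who like to see an idea spelled out, the pruning step just described can be sketched in a few lines of Python. This is only an illustration: the function name and the threshold of three hits are my own assumptions, and the hit counts are borrowed from the "Cyphostemma elephantopus" column of the hits table at the end of this article.

```python
def engines_worth_keeping(hits_by_engine, min_hits=3):
    """Return the engines that reached min_hits, best first.

    min_hits=3 is an assumed cut-off standing in for
    "no hits or only one or two".
    """
    kept = [(name, hits) for name, hits in hits_by_engine.items()
            if hits >= min_hits]
    # Stable sort: engines with equal counts keep their original order.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept]


# Hit counts for "Cyphostemma elephantopus" from the hits table.
first_search = {
    "Infoseek": 4,
    "HotBot": 2,
    "Looksmart": 3,
    "Lycos": 2,
    "Go2Net": 3,
    "Thunderstone": 1,
}

follow_up = engines_worth_keeping(first_search)
print(follow_up)  # ['Infoseek', 'Looksmart', 'Go2Net']
```

A hit count says nothing about relevance, so in practice you would still glance at the results before dropping an engine.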

 

Quality

The quality of the search - the ratio of gold to dirt - depends on the quality of the query. However, in many cases you are given no directions on how to phrase or formulate your query, at least not on the query page. When your query comes back with 473,219 hits you may be offered a chance to "refine query"! The query "operators", such as "AND" and "OR", can usually be found by delving into the pages below the query page. The effective use of operators markedly improves the ratio of gold to dirt retrieved.

All search engines operate differently; you have to learn the systems of those you decide to use in order to get the best out of them. It is not always easy.

Some Examples

You would think that if I wanted to search for references to the "Cactus and Succulent Society of NSW" I would type the six words inside the quotes into the query box and up would come my answer - all golden hits! Not necessarily so. Firstly, most search engines ignore connecting words - and, of, the, with, etc. - which leaves four words. Secondly, some engines then search for the occurrence of any of the four single words in a document heading, producing a very large number of hits. Other engines automatically 'link' the four words - they are searched for as a group - and consequently produce a very much smaller number of hits.
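The difference between the two behaviours can be shown with a small Python sketch. It is not how any particular engine works: the list of connecting words is just the four named above, the document headings are invented for the example, and "linked" is modelled simply as "all words must be present".

```python
STOP_WORDS = {"and", "of", "the", "with"}  # the connecting words named above


def keywords(query):
    """Lower-case the query and drop the connecting words."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]


def matches_any(title, words):
    """True if at least one key word occurs in the heading."""
    title_words = set(title.lower().split())
    return any(w in title_words for w in words)


def matches_all(title, words):
    """True only if every key word occurs in the heading."""
    title_words = set(title.lower().split())
    return all(w in title_words for w in words)


# Hypothetical document headings, invented for this example.
titles = [
    "Cactus and Succulent Society of NSW",
    "Royal Society of Chemistry",
    "Succulent plants of Queensland",
    "NSW train timetable",
    "Orchid growers newsletter",
]

words = keywords("Cactus and Succulent Society of NSW")
any_hits = [t for t in titles if matches_any(t, words)]
all_hits = [t for t in titles if matches_all(t, words)]

print(words)          # ['cactus', 'succulent', 'society', 'nsw']
print(len(any_hits))  # 4
print(len(all_hits))  # 1
```

Even on five headings the "any word" search returns four hits, three of them dirt, while the "all words" search returns only the one we wanted.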

The results for some search engines (the home page for the CSS of NSW is on the Cactus-Mall):

PlanetSearch: <www.PlanetSearch.com>

The four words "cactus succulent society nsw" gave 497,533 hits. Deleting "society" and submitting the other three words gave 7,307 hits.

GoTo.com: <www.GoTo.com>

The three words gave six hits - home page, first

Google!: <www.Google.com>

The three words gave 55 hits, with the society's pages as the first and second hits, and listed all four linked pages.

Google! was the only search engine that found the link to the CSS of NSW on the Queensland Succulent Society web site. Unlike the other engines, Google! also looks for linked pages.

Yahoo!: <www.Yahoo.com>

26 hits, home page

Hotbot: <www.HotBot.com>

19 hits, first, home page

Copernic 99: <www.Copernic99.com>

29 hits, first, found the five pages on Cactus-Mall (Google! is one of the engines.)

Other metasearch engines worth examining are Dogpile, Alexa and Web Ferret.

****************************

Copernic 99

A personal review

As its name suggests, Copernic 99 is new on the scene. It is available in two versions: a freeware version and a pay-for version, Copernic 99 Pro. I run the freeware version (C 99), which can be downloaded from the company site: <www.copernic.com>

C 99 Pro has 21 categories, such as Life Style, Health, Kids, Travel, Science, etc.

When you select a category what you are doing is opting to use a pre-selected group of search engines which Copernic believe to be the best for that particular category of information.

Only four categories are available in C 99.

These are the ones that, I think, would cover the needs of most C&S collectors. They are:

The Web: 19 engines

Newsgroups: 3 engines

Email Addresses: 6 engines

Buy Books: 5 engines

You can view all 21 categories from a selection window in C 99, with the extra 17 available in C 99 Pro shown grayed out. They are activated when you decide to upgrade to C 99 Pro.

To search across these 21 categories Copernic uses some 125 search engines - a formidable array!

The engines used by each category are displayed in a screen panel while the search is in progress. Each engine is named and has a sliding indicator showing its progress together with a box showing the number of hits. You can select any or all of the engines in any category.

There are four search options:

Quick Search: Max. of 10 results per engine; total for all engines used, 100 per search

Normal Search: Max. of 20 results per engine; total for all engines used, 200 per search

Detailed Search: Max. of 30 results per engine; total for all engines used, 300 per search

Custom Search: Max. of 300 results per engine; total for all engines used, 1000 per search
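The two limits work together: with 19 engines in The Web category, a Quick Search could collect up to 190 results, so the overall limit of 100 is the one that bites. The Python sketch below shows one way such limits might be applied. Copernic does not document how it merges results, so the round-robin interleaving, the removal of duplicate URLs and all the names here are my own assumptions.

```python
from itertools import zip_longest

# (max per engine, max per search) as listed above
SEARCH_OPTIONS = {
    "Quick": (10, 100),
    "Normal": (20, 200),
    "Detailed": (30, 300),
    "Custom": (300, 1000),
}


def merge_results(results_by_engine, per_engine_max, total_max):
    """Trim each engine's list, interleave the lists round-robin,
    skip duplicate URLs and stop once total_max results are collected."""
    trimmed = [urls[:per_engine_max] for urls in results_by_engine.values()]
    merged, seen = [], set()
    for round_of_urls in zip_longest(*trimmed):
        for url in round_of_urls:
            if len(merged) >= total_max:
                return merged
            # zip_longest pads exhausted engines with None
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Made-up results: 19 engines, 30 distinct URLs each.
fake = {
    f"engine{i}": [f"http://example.org/e{i}/page{j}" for j in range(30)]
    for i in range(19)
}

per_engine, total = SEARCH_OPTIONS["Quick"]
quick = merge_results(fake, per_engine, total)
print(len(quick))  # 100
```

Interleaving means every engine contributes its top results before any engine contributes its tenth, which seems a fair way to fill a limited list.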

When running C 99 I find that most of my plant queries are answered by Google! and Altavista. If I wanted to find every mention of a topic then I would use, in separate searches, the engines which, in C 99, returned the greatest number of hits.

Copernic 99 only consults the best search engines! It skims the cream from other search engines, which it herds into its corral. It has numerous document handling features and functions as well as excellent indexing and history functions. It is fast and uses a browser display to present the search results, which are ranked by relevance. The search key words found are highlighted.

The search results are automatically saved and can be browsed later. If you see a result you would like to examine just click on the link.

When you enter search words (key words) into a search engine, that engine applies its own default search settings before it proceeds. It may treat them as isolated words, as PlanetSearch did, or tie them together as a group, as did Google!, Hotbot and others. Each engine gives you the option to refine its searching by assigning "operators" to each word, or to all the words together as a phrase, to define the way your search will be conducted. These will be found by looking for such things as help, refine, or options buttons. You will be told how to add the operators (and, or, except, +, -, etc.) to your key words to refine the search, i.e., reduce the number of hits.
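As a rough illustration of what the + and - operators do, here is a short Python sketch. The exact syntax differs from engine to engine, so treat this as the common convention only: a word marked + must appear, a word marked - must not, and unmarked words are optional. The function names and sample page titles are made up for the example.

```python
def parse_query(query):
    """Split a query into required (+word), excluded (-word)
    and optional (unmarked) words."""
    required, excluded, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            optional.add(token)
    return required, excluded, optional


def matches(text, query):
    """Apply the +/- convention to the words of one piece of text."""
    required, excluded, optional = parse_query(query)
    words = set(text.lower().split())
    if not required <= words:   # a required word is missing
        return False
    if excluded & words:        # an excluded word is present
        return False
    if required:
        return True
    # No required words: at least one optional word must be present.
    return not optional or bool(optional & words)


print(matches("Agave stricta in habitat", "+agave -seed"))  # True
print(matches("Agave stricta seed list", "+agave -seed"))   # False
print(matches("Cactus show dates", "cactus succulent"))     # True
```

The second example shows how a single minus sign removes a whole class of dirt, such as seed lists, from the results.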

The Email category in C 99 has been useful for me on several occasions. At the bottom of the displayed results you are offered an option to search public records (only in the USA) if the search did not have the address you wanted. I noted two addresses that I knew were several years old and no longer used. There must be many discarded email addresses on the internet. We need a search and destroy web crawler!!

The Buy Books category also saves a lot of time in checking who has the book in question. Otherwise it takes ages to get to a query box only to find they don't list your book. I looked for "Mesembs of the World" and got no hits, although a second-choice offer was made of titles including "world". I ran the same search on Amazon and was offered some 9,500 titles including "world"; I declined. However, I was very successful seeking a Tillandsia Handbook. It was top of the pile and I linked directly to the site! It saved a lot of time.

And that is what Copernic 99 does so well: it saves time. It saves you logging in to every URL one after the other, and it saves waiting for slow downloads of pages of stuff you must wade through to get to the query box. I have found it an efficient tool in providing useful answers to my queries.

If you do not have a pet search engine give Copernic 99 a try. Different search engines have different strengths. There is bound to be one or more to suit your needs - it is only a matter of finding them. The trouble is that the performance of search engines keeps improving and "flavour of the month" keeps changing.

*************************************

Google!

Google! is a new generation search engine. Its stated aim is to increase the relevancy of results. It was created at Stanford University, where it is "research in progress"; indeed, the program is presented as "beta". It can be reached at: <www.google.com>

In a recent article about Yahoo! the author wrote:

"My first choice these days when trying to find something on the Internet is a tiny little site called Google. Google prioritizes search results by how many other pages it has found that link to the page in question, and thus can roughly judge how important or authoritative the rest of the people making up the Internet consider that page to be. The results have been consistently good enough to make me switch."
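The idea in that quote, counting how many other pages link to a page, can be sketched in Python as follows. This is only the simplified picture the quote paints; Google's real ranking is more elaborate and also weighs how important the linking pages themselves are. The page names and links are invented for the example.

```python
from collections import Counter


def rank_by_inbound_links(links, candidates):
    """Order candidate pages by how many distinct other pages link to them.

    links: iterable of (from_page, to_page) pairs
    candidates: the pages that matched the query
    """
    inbound = Counter()
    seen = set()
    for source, target in links:
        # Ignore self-links and repeated links from the same page.
        if source != target and (source, target) not in seen:
            seen.add((source, target))
            inbound[target] += 1
    # Most-linked first; ties broken alphabetically for a stable order.
    return sorted(candidates, key=lambda page: (-inbound[page], page))


# Hypothetical link graph, invented for this example.
links = [
    ("qld-succulents", "css-nsw"),
    ("cactus-mall", "css-nsw"),
    ("cactus-mall", "qld-succulents"),
    ("css-nsw", "css-nsw"),
    ("hobby-page", "css-nsw"),
]
candidates = ["hobby-page", "qld-succulents", "css-nsw"]

ranked = rank_by_inbound_links(links, candidates)
print(ranked)  # ['css-nsw', 'qld-succulents', 'hobby-page']
```

The page that three others point to comes out on top, which is roughly what "authoritative" means in the quote.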

 

When you log on you are immediately presented with the query box on a very simple page.

To use the search system, enter your search words or phrase. If more than one word is entered Google! searches for the phrase first and then the individual words.

It is awesome how Google! is able to find linked pages. Most of my queries have been botanical or horticultural. In each case there has been gold either close to or at the top. Indeed, alongside the Search Now button there is a button, "I'm feeling lucky", which takes you to the first result without showing the rest. It works, often, particularly if you are looking for something specific!

Otherwise results are presented in browser form, ranked, with title, URL and a cache link which, when clicked, presents the copy of the file that Google! stored when it visited and catalogued it (this can avoid that annoying "Error 404" when you try to go to the URL). Clicking on the ranking or relevance bar brings up all the cited - linked - pages for that file! Results can be presented up to 100 at a time. Google! is new, different, very fast, accurate and must be experienced.

Google! is the default search engine used by Netscape. Interestingly, you access it from the Communicator window by typing into the Location Box the word search followed by, say, Agave stricta. The results lack the ranking and cache features of the direct google.com and come in lots of 10, with no option to change this.

Google! is simple and very efficient. An experience you will not forget.

May the search be with you!

 

Russell Johnstone

In the Southern Highlands

New South Wales

Australia


Hits per search

Engine           Cyphostemma   Cyphostemma elephantopus   Uncarina   Bursera   Hydnophytum
AltaVista        10            10(?)                      10         10        10
Copernic99       61            15                         58         71        53
Dogpile          68            20(?)                      62         562       41
Excite           36            92(?)                      20         229       14
Go2Net           39            3                          27         41        33
Google           81            9(?)                       66         641       28
GoTo             10+           3                          10+        10+       10+
HotBot           9             2                          14         190(?)    10
Infoseek         35            4                          26         527       12
Looksmart        100           3                          73         200       46
Lycos            10+           2                          10+        1         7
Mamma            49            5                          45         46        38
MetaFind         13            0                          23         25+       24
Northern Light   143           2                          68         886       48
SavvySearch      29            6                          24         33        23
Sherlock (Mac)   0             0                          0          0         0
Thunderstone     3             1                          3          9         1


Notes:

1. Names selected were those that would be common to Fat Plant searches. Some Search Engines allow for 'Phrase Searches', and with others, the plus (+) sign was used between words.

2. Several Search Engines are Multi (Meta) Search engines.

3. A number with a question mark behind it (?) represents suspect hits.

4. Google Search Engine was by far the fastest!

5. Copernic99 Search Engine is Freeware and, when installed, can be launched from your desktop. A real time-saving factor!

6. Northern Light searches also returned 'Personal Folders' to the user. www.gpdesert.com showed up as a personal folder.

7. Several Search Engines allowed you to customize your searches.

8. Several Search Engines proved to be totally worthless.

9. See the article "Search engines biased, out-of-date, and index no more than 16% of the web".


Fat-Plants Group