07 The Internet Format (Information Research) part 1




This article is from the Information Research FAQ, by David Novak david@spireproject.com with numerous contributions by others.

As Shakh became more proficient with writing, father wrote more
frequently of the family deity. Horus, the falcon god, had long watched
over his family. Horus sees all, his father would write, and even
across the many miles separating you from us, Horus will watch over you
and keep you close. It was a great comfort to Shakh to have the family
deity looking after him.

Shakh too devoted himself to a life of watching and knowing.

- - - - - - - - - - - - - -

We have discussed how information comes packaged in certain
standardized formats like books, articles or news clips. Each format
has particular qualities and standards that reflect the way the
information is prepared. For example, books are dense, factual,
comprehensive and at least six months to a year old.

So how can we apply this newfound wisdom to the internet?

Let's start at the beginning. The internet is an inexpensive and
pervasive system for the delivery of data. It is also the medium of a
dramatic shift in the way we access information.

A (1) dramatic drop in the cost of publishing is fuelling (2) the
liberation of information from previously closed systems, leading to
(3) an emergence of alternative funding for certain public resources
and (4) an eagerly awaited 'direct to consumer' commercial information
industry.

The first mental knot to untie is the separation of internet resources
into distinct formats. Electronic books share most of the qualities of
books published on paper. News stories found on the web share all of
the qualities of news in your local newspaper. The fact they are
electronic or appear as webpages has nothing to do with it. News is
news. Electronic books are almost books.

But if online news is news, and online books are almost books, and both
are not internet formats, what is an internet format?

The search-by-format method is a concept to simplify and understand the
many information resources which exist in the world. The concept is
only as valuable as it is successful at enlightening us. As for the
internet, we have more to learn, but we could safely divide the internet
into several formats at this time, perhaps webpages, online discussion
and FTP resources. Yet this is largely superficial. The real value
comes from understanding the qualities of different types of webpages.
We shall divide the webpage format further.

Must we really learn this?
You would be pardoned for equating searching and the internet. Much of
the hype surrounding internet search tools builds the illusion that the
skill of searching can somehow be distilled computationally then
delivered to you electronically. Through the wonders of modern science,
you can have the best information at your fingertips without having
to learn anything of search technology.

This is a pervasive lie (or marketing fiction). The electronic research
industry has been around for decades and has worked on this problem for
some time. No upstart internet guru has invented a technique to
suddenly transform the search process. Such thinking would work in
section two (Searching is Easy) but is the first illusion we must
shatter for you to progress.

Case in point: the Lycos and All-the-Web search engines use the same
database of webpages. This database is growing rapidly; it stood at
350,000,000 webpages in June 2000 and hopes to reach one billion
webpages by the end of 2001. It stands as a grand achievement in
organization, right?

Wrong. Years ago I was using a unified database of news called Global
Textline (no longer available but replaced by others). It had an
astounding four billion news articles available for advanced text
searching! Four billion news items, representing many years of news
from all over the world. This was superficially 10 times the size of
the current All-the-Web search engine.
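
The comparison can be checked with a little arithmetic. This is only a
sketch using the figures quoted above; the variable names are
illustrative and come from no search tool.

```python
# Figures as quoted in the text above.
global_textline_articles = 4_000_000_000   # Global Textline news articles
all_the_web_pages_2000 = 350_000_000       # All-the-Web webpages, June 2000
all_the_web_target_2001 = 1_000_000_000    # hoped-for size by end of 2001

# How many times larger was the news database than the web index?
ratio_2000 = global_textline_articles / all_the_web_pages_2000
ratio_2001 = global_textline_articles / all_the_web_target_2001

print(f"Against the June 2000 index: {ratio_2000:.1f} times the size")
print(f"Against the 2001 target:     {ratio_2001:.1f} times the size")
```

Even against the hoped-for one billion webpages, the news database
would still be several times larger by a raw count of items.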

No, the internet does not even hold the record for being the largest
information field. Oh, it will surely surpass the quantity of
commercial information, and superficially we could say it may already
have achieved this. But the internet is not a new medium for
information research. It is emerging as a new resource, not a new
phenomenon.

The internet is a new medium for business - most businesses have never
incorporated the immediacy or global nature of internet involvement, so
considerable rethinking is required. The internet is a new medium for
publishing for almost all of us; very few of us published
electronically before the internet emerged. The internet is NOT a new
medium for research. Information researchers have been working
electronically for years. The internet is just a new resource we can
reach for with strengths, weaknesses and peculiar traits we must
appreciate.

By way of an example, let us compare Link Analysis as used in Google
and Raging (of Altavista) with the process of editorial vetting as used
in scientific journals.

Through the magic of link analysis, we can make certain assumptions
about the value of a webpage by adding up the number of other pages
linking to that page. In its simplest form, webpages with at least 100
inbound links from other websites are judged to be quality, valuable
resources. A webpage without any inbound links falls under suspicion of
being of poorer quality. After all, no one has thought it valuable
enough to add a link to their further resources page.
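
The simplest form described above can be sketched in a few lines. This
is a toy illustration only, not how Google or Raging actually rank
pages. The link graph, the page names and the function names are all
made up for the example, and the threshold of 100 is the one mentioned
in the text.

```python
from collections import Counter

# Hypothetical toy link graph: each page maps to the pages it links to.
LINKS = {
    "a.example/index": ["b.example/index", "c.example/guide"],
    "b.example/index": ["c.example/guide"],
    "c.example/guide": ["b.example/index"],
    "d.example/new": ["b.example/index"],
}


def site_of(page):
    """Return the website part of a page name, e.g. 'a.example'."""
    return page.split("/")[0]


def inbound_link_counts(links):
    """Count, for each page, the links arriving from other websites."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # count each source page once
            if site_of(target) != site_of(source):
                counts[target] += 1
    return counts


def judged_valuable(links, threshold=100):
    """Pages with at least `threshold` inbound links from other sites."""
    counts = inbound_link_counts(links)
    return {page for page, n in counts.items() if n >= threshold}


counts = inbound_link_counts(LINKS)
for page in sorted(counts):
    print(page, counts[page])
```

In this toy graph the newest page, d.example/new, has no inbound links
at all, so any threshold above zero leaves it out regardless of how
good it is.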

This logic has some serious shortcomings. Firstly, the process rewards
long-term projects that have been online long enough to earn links. A
brilliant new webpage would have few links - yet. It would be ranked
poorly, undeservedly. Secondly, link analysis rewards websites over
webpages. The pages with the most links are often homepages. Rating
homepages over second level webpages works at odds to keyword
searching. Our keywords will be found in specific, perhaps second-tier
webpages. Links go to the top level. Thirdly, link analysis is a mass
market, popular technique. You are banking on the intellectual finesse
of a mass of mindless computer users much like yourself. It is the same
kind of popular democratic selection that votes B-grade actors into the
presidency.

Let's contrast this with the process of editorial vetting used in
scientific journals. Each article is reviewed by a selection of
knowledgeable peers who understand the topic in great depth. Each
article is further improved by the editing of the journal editors, and
by self-editing, for there is great competition and prestige at stake.
Only a handful of the many submissions are judged worthy and appear in
the printed journal. Success places the author in the standard of
record, stamped with an external statement of truth and importance.

Of course, the logic of editorial vetting also has shortcomings.
Firstly, the process is time and effort intensive. Many of the most
important journals will delay six months or more between submission and
publication. In our digital era this is increasingly unacceptable.
Secondly, the number of submissions accepted is at odds with the pace
of development. So much more happens in the world than can be digested
in this manner. Thirdly, editorial vetting supports the clannish
behavior of which the upper echelons of science are accused. New and
novel developments have difficulty floating to the top if the peer
review process is not open to new ideas.

If link analysis is popular and democratic, editorial vetting is
elitist and autocratic. Both approaches have pros and cons.

Once you have absorbed the drama between link analysis and editorial
vetting, please do not retain the belief that your search needs will be
completely solved for you. Searching is a complex, overgrown garden and
it's time to get your hands dirty.

So what does the internet have to do with searching?
The internet changes searching in two ways. Firstly, the webpage is a
new format to contend with.

"Webpages are often of unknown age, of only guessed at quality and
potentially the easiest information to retrieve. There are many points
of entry to web resources but search tools differ. Try to match your
search tool to your question."
(See http://spireproject.com/webpage.htm)

The internet is also a conduit to many of the pre-existing tools for
searching other formats (books, news, interviews).

With an internet connection, we can reach database retailers and many
commercial quality databases like LOCOC, ERIC, MOCAT and AGIP directly
from the source. We can also remotely search the catalogue of most
libraries in the world. These are not new resources, just new ways to
reach them.

In this day of interconnectivity and change, it is all too tempting to
declare that the information industry is in rapid flux. Everything I have
learned suggests this is not so. There are some changes associated with
new channels but by and large the process of searching for information
remains the same.

Let's look briefly at news as an example. News articles are written by
the reporter, sold to international newswires which then distribute
these stories to interested newspapers and news channels, which
incorporate the news into your newspaper or evening TV news.

Journalist - Newswire - Newspaper/News show - You.

News would also be added to commercial databases of past news. These
databases are then provided to database retailers like Dialog or
Lexis-Nexis who sell occasional access to you.

Journalist - Newswire - Commercial Database - Database Retailer - You.

With the internet, newswires have also provided their text news to
online sites. Text news is thus available for you to browse or search.

Journalist - Newswire - Internet News Sites - You.

I draw your attention to several facts. The fundamental nature of the
industry has not changed. Journalists and newswires still give the
news the same nature as before. It is short, shallow, immediate. It
is created to journalistic standards.

 
