InterNetNews... Salz

The NNTP protocol is defined in Internet RFC
977 [Kantor86], published in February 1986. This
was accompanied by the general public release of a
reference implementation, also called ``nntp.'' This
has been the only NNTP implementation that is gen-
erally available to UNIX sites.

Usenet Software

In addition to InterNetNews, there are two
major Usenet packages available for UNIX sites. All
three share several common implementation details.
A newsgroup name such as comp.foo is mapped to a
directory comp/foo within a global spool directory.
An article posted to a group is assigned a unique
increasing number based on a file called the active
file. If an article is posted to multiple groups, links
are used so that only one copy of the data is kept.
A sys file contains patterns describing what news-
groups the site wishes to receive, and how articles
should be propagated. In most cases, this means that
a record of the article is written to a ``batchfile'' that
is processed off-line to do the actual sending.
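
The shared spool layout described above can be sketched in a few lines. This is a minimal illustration, not code from any of the three packages: the function names are hypothetical, and a Python dict stands in for the active file.

```python
import os
import tempfile


def group_to_dir(spool, group):
    """Map a newsgroup name such as 'comp.foo' to '<spool>/comp/foo'."""
    return os.path.join(spool, *group.split("."))


def store_article(spool, active, groups, text):
    """Store one copy of an article and link it into every listed group.

    `active` is a dict standing in for the active file (an assumption of
    this sketch): it maps each group to the highest number assigned so far.
    """
    paths = []
    for group in groups:
        # Assign the next increasing article number for this group.
        active[group] = active.get(group, 0) + 1
        directory = group_to_dir(spool, group)
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, str(active[group]))
        if not paths:
            # First group: write the only real copy of the data.
            with open(path, "w") as f:
                f.write(text)
        else:
            # Other groups: hard-link to the first copy.
            os.link(paths[0], path)
        paths.append(path)
    return paths


spool = tempfile.mkdtemp()
active = {}
paths = store_article(spool, active, ["comp.foo", "misc.test"],
                      "Subject: example\n\nbody\n")
```

Both returned paths name the same file on disk, so a crossposted article costs one copy of the data plus one directory entry per group.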

The first Usenet package is called B News, also
known as B2.11. The B news model is very simple:
the program rnews is run to process each incoming
article. Locking is used to make sure that only one rnews
process tries to update the active file and his-
tory database. At one site that received over 15,000
articles per day, the locking would often fail so that
10 to 100 duplicates were not uncommon. Because
each article is handled by a separate process, it is
impossible to precalculate or cache any useful data.

More importantly, file I/O has become a major
bottleneck. A site that feeds 10 other sites does over
150,000 open/append/close operations a day on its
batchfiles. It is generally agreed that B news cannot
keep up with current Usenet volume; it is no longer
being maintained, and its author has said more than
once that the software should be considered ``dead.''

C News gets much better performance than B
news by processing articles in batches [Collyer87].
The relaynews program is run several times a day to
process all the articles that have been received since
the last run. Since only one relaynews program is
running, it is not necessary to do fine-grain locking
of any of the supporting data files. More impor-
tantly, it can keep the entire active and sys file in
memory. It can also use buffered I/O on its
batchfiles, reducing the number of system calls by
one or two orders of magnitude.
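
The difference between the two batchfile strategies can be illustrated with a small sketch. The function names are hypothetical and the figures are scaled down (150 articles stand in for 15,000); only file opens are counted, although buffering also reduces the number of write calls.

```python
import os
import tempfile


def feed_unbatched(directory, sites, article_paths):
    """One open/append/close per (article, site) pair, as a
    process-per-article design is forced to do."""
    opens = 0
    for path in article_paths:
        for site in sites:
            with open(os.path.join(directory, site), "a") as batch:
                batch.write(path + "\n")
            opens += 1
    return opens


def feed_batched(directory, sites, article_paths):
    """Open each batchfile once and let buffered I/O coalesce the writes."""
    batches = {site: open(os.path.join(directory, site), "a")
               for site in sites}
    try:
        for path in article_paths:
            for batch in batches.values():
                batch.write(path + "\n")
    finally:
        for batch in batches.values():
            batch.close()
    return len(batches)


sites = ["site%d" % i for i in range(10)]
articles = ["comp/foo/%d" % n for n in range(1, 151)]
unbatched_dir = tempfile.mkdtemp()
batched_dir = tempfile.mkdtemp()
unbatched_opens = feed_unbatched(unbatched_dir, sites, articles)
batched_opens = feed_batched(batched_dir, sites, articles)
```

With 10 feeds and 150 articles the first function opens a batchfile 1,500 times and the second 10 times, while both leave identical batchfiles behind.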

An alpha version of C News was released in
October, 1987. Within four years it surpassed B
news in popularity, and there are now more sites
running C News than ever ran B news.

From the beginning, the NNTP reference
implementation was layered on top of the existing
Usenet software: an article received from a remote
NNTP peer was written to a temporary file and the
appropriate rnews or relaynews program processed it.
To avoid processing an article the system
already has, the server first does a lookup on the history
database to see if the article exists. It soon became
apparent that invoking relaynews for every article
lost all of C News's speed gain, so the NNTP pack-
age was changed to write a set of articles into a
batch, and offer the batch to relaynews.
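
The history lookup can be sketched as the decision a server makes when a peer offers an article with the IHAVE command. The response codes 335 (send the article) and 435 (article not wanted) are those of RFC 977; the function name and the use of a Python set for the history database are assumptions made for illustration.

```python
SEND_ARTICLE = 335   # RFC 977: send article to be transferred
NOT_WANTED = 435     # RFC 977: article not wanted


def ihave_reply(history, message_id):
    """Choose the reply code for 'IHAVE <message-id>'.

    `history` is a set of Message-IDs standing in for the history
    database (an assumption of this sketch).
    """
    if message_id in history:
        return NOT_WANTED
    return SEND_ARTICLE


history = {"<1@example.com>"}
known = ihave_reply(history, "<1@example.com>")
unknown = ihave_reply(history, "<2@example.com>")
```

Note that the lookup only sees articles that have already been processed into the history database.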

When articles arrive faster than relaynews can
process them, they must be spooled. If two sites (B
and C in the previous examples) both offer a third
site (D) the same article at the ``same time'' then an
extra copy will be spooled, only to be rejected when
it is processed, wasting disk space; this problem
multiplies as the number of incoming sites
increases.(2)

To alleviate this problem, most sites run
Paul Vixie's msgidd, a daemon that keeps a
memory-resident list of article Message-IDs offered
within the last 24 hours. The NNTP server is
modified so that it tells this daemon all of the arti-
cles that it is handing to Usenet and queries the dae-
mon before telling the remote site that it needs the
article. This is not a perfect solution -- if the first,
spooled, copy of the article is lost or corrupted, the
site will likely never be offered the article after the
msgidd cache entry has expired. Going further,
msgidd is a work-around for a problem inherent in the
current software architecture.
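
The idea behind such a daemon can be sketched as a small in-memory cache. This is not msgidd's actual code or protocol, which the text does not describe; the class name, the method name and the injectable clock are all hypothetical.

```python
import time


class MessageIdCache:
    """Memory-resident record of recently offered Message-IDs."""

    def __init__(self, ttl_seconds=24 * 60 * 60, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock     # injectable so expiry can be demonstrated
        self.seen = {}         # Message-ID -> time it was first offered

    def _expire(self):
        # Drop every entry that is at least `ttl` seconds old.
        cutoff = self.clock() - self.ttl
        for mid in [m for m, t in self.seen.items() if t <= cutoff]:
            del self.seen[mid]

    def offer(self, message_id):
        """Return True if the server should ask for the article,
        False if it was already offered within the last `ttl` seconds."""
        self._expire()
        if message_id in self.seen:
            return False
        self.seen[message_id] = self.clock()
        return True


now = [0]
cache = MessageIdCache(clock=lambda: now[0])
first = cache.offer("<1@example.com>")    # first peer offers the article
second = cache.offer("<1@example.com>")   # second peer, moments later
now[0] = 24 * 60 * 60 + 1                 # more than a day later
third = cache.offer("<1@example.com>")
```

The second offer is refused even though the first copy may still be sitting unprocessed in the spool. If that copy is lost, nothing in the cache notices, and the article is accepted again only if a peer offers it after the entry has expired.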

Other problems, while not as severe, lead to the
conclusion that a new implementation of Usenet is
needed for Internet sites. For example:

  • Since all articles are spooled, relaynews can-
    not tell the NNTP server the ultimate disposi-
    tion of the article, and the server cannot tell
    its peer at the other end of the wire. This
    hides transmission problems. For example, a
    site tracing the communication has no way of
    finding out an article was rejected because the
    remote site does not receive that particular set
    of newsgroups.

  • The NNTP reference implementation is showing
    signs of age. Maintaining the server is becoming
    a nightmare; over one-tenth of its 6,800 lines
    are #ifdef-related.

  • All articles are written to disk an extra time.
    Disks are getting bigger, but not faster, while
    CPUs, memory, and networks are getting faster.

InterNetNews architecture

There are four key programs in the InterNet-
News package (see Figure 2):
  • Innd is the principal news server for incoming
    newsfeeds;

___________________________________
2 This is quite common for Internet sites, where
redundant fast newsfeeds are common and where many
Usenet administrators seem to be avid players of the
``exchange news with as many other people as possible''
game.

Summer '92 USENIX -- June 8-June 12, 1992 -- San Antonio, TX