InterNetNews...[Salz92]: Usenet Software
In addition to InterNetNews, there are two major Usenet packages available for UNIX sites. All three share several common implementation details. A newsgroup name such as comp.foo is mapped to a directory comp/foo within a global spool directory. An article posted to a group is assigned a unique, increasing number taken from a file called the active file. If an article is posted to multiple groups, links are used so that only one copy of the data is kept. A sys file contains patterns describing which newsgroups the site wishes to receive and how articles should be propagated. In most cases, propagating an article means writing a record of it to a ``batchfile'' that is processed off-line to do the actual sending.
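The shared spool layout can be sketched as follows. This is a minimal illustration, not code from any of the three packages: the function names are hypothetical, the active file is modeled as an in-memory dictionary holding only each group's highest-used article number (the real file carries more fields), and crossposts are assumed to use hard links.

```python
import os
import tempfile


def group_dir(spool, group):
    """Map a newsgroup name such as 'comp.foo' to its directory, spool/comp/foo."""
    return os.path.join(spool, *group.split("."))


def store_article(spool, active, groups, text):
    """Store one article in every group it was posted to (hypothetical helper).

    `active` stands in for the active file: it maps each group to the
    highest article number used so far. The first group receives the
    real file; every other group gets a hard link to it, so the article
    data exists only once on disk. Returns the paths created.
    """
    paths = []
    for group in groups:
        active[group] += 1  # next unique, increasing number for this group
        directory = group_dir(spool, group)
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, str(active[group]))
        if not paths:
            with open(path, "w") as f:  # first group: write the data once
                f.write(text)
        else:
            os.link(paths[0], path)  # crossposts: link, don't copy
        paths.append(path)
    return paths


if __name__ == "__main__":
    spool = tempfile.mkdtemp()
    active = {"comp.foo": 41, "comp.bar": 7}
    paths = store_article(spool, active, ["comp.foo", "comp.bar"], "Subject: test\n\nhello\n")
    print([os.path.relpath(p, spool) for p in paths])
    # on UNIX: ['comp/foo/42', 'comp/bar/8']
```

Because both paths name the same inode, removing the article from one group later does not free the disk space until the last link is gone.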
The first Usenet package is called B News, also known as B2.11. The B News model is very simple: the program rnews is run to process each incoming article. Locking is used to make sure that only one rnews process at a time updates the active file and history database. At one site that received over 15,000 articles per day, the locking often failed, and 10 to 100 duplicates were not uncommon. Because each article is handled by a separate process, it is impossible to pre-calculate or cache any useful data.
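As a rough illustration of per-article locking, here is a common UNIX idiom: creating a lock file with O_CREAT | O_EXCL, which fails atomically if the file already exists. Whether B News used exactly this mechanism is an assumption, and the helper names and the `LOCK` file name are hypothetical. The sketch does not try to reproduce why the locking failed at high volume; it only shows the per-article cost, since every incoming article pays for a create, a possible retry loop, and an unlink before it can touch the active file.

```python
import os
import tempfile
import time


def acquire_lock(path, retries=5, delay=0.1):
    """Try to create the lock file `path`; return True if we now hold the lock.

    O_CREAT | O_EXCL makes creation atomic: if the file already exists,
    os.open raises FileExistsError instead of opening it.
    """
    for _ in range(retries):
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            time.sleep(delay)  # another process holds the lock; wait and retry
            continue
        os.write(fd, str(os.getpid()).encode())  # record the holder's PID
        os.close(fd)
        return True
    return False  # gave up; the caller decides what to do next


def release_lock(path):
    """Release the lock by removing the lock file."""
    os.unlink(path)


if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "LOCK")  # hypothetical lock-file name
    if acquire_lock(lock):
        try:
            pass  # ... update the active file and history database here ...
        finally:
            release_lock(lock)
```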
More importantly, file I/O had become a major bottleneck. At that volume, a site that feeds 10 other sites does over 150,000 open/append/close operations a day on its batchfiles, one for each article and feed. It is generally agreed that B News cannot keep up with current Usenet volume; it is no longer being maintained, and its author has said more than once that the software should be considered ``dead.''
C News gets much better performance than B News by processing articles in batches [Collyer87]. The relaynews program is run several times a day to process all the articles that have been received since the last run. Since only one relaynews process is running, it is not necessary to do fine-grained locking of any of the supporting data files. More importantly, it can keep the entire active and sys files in memory. It can also use buffered I/O on its batchfiles, reducing the number of system calls by one or two orders of magnitude.
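The system-call arithmetic behind the last two paragraphs can be made explicit. The model below is an assumption-laden sketch: it counts 3 calls (open, write, close) per article per feed for the B News style, and for the batched style it assumes one open and one close per batchfile per relaynews run, plus one write per full user-space buffer. The run count, entry size, and buffer size are illustrative parameters, not figures from the paper.

```python
import math


def unbuffered_syscalls(articles, feeds):
    """B News model: every (article, feed) pair is one open/append/close,
    i.e. 3 system calls."""
    return articles * feeds * 3


def buffered_syscalls(articles, feeds, runs, entry_bytes, bufsize):
    """Batched model (assumed): each run opens every batchfile once, appends
    entries through a user-space buffer of `bufsize` bytes, and closes it once.
    Articles are assumed to be spread evenly across runs."""
    per_run = math.ceil(articles / runs)
    writes = math.ceil(per_run * entry_bytes / bufsize)  # one write per full buffer
    return feeds * runs * (2 + writes)  # open + close + buffered writes


if __name__ == "__main__":
    before = unbuffered_syscalls(15_000, 10)
    after = buffered_syscalls(15_000, 10, runs=4, entry_bytes=100, bufsize=1024)
    print(before, after, round(before / after, 1))
```

With 15,000 articles and 10 feeds, the unbuffered model gives 450,000 calls. Assuming 4 runs a day, 100-byte batchfile entries, and a 1,024-byte buffer, the buffered model gives 14,760, roughly a 30-fold reduction; an 8,192-byte buffer brings it down to 1,920.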
An alpha version of C News was released in October 1987. Within four years it surpassed B News in popularity, and there are now more sites running C News than ever ran B News.
From the beginning, the NNTP reference implementation was layered on top of the existing Usenet software: an article received from a remote NNTP peer was written to a temporary file, and the appropriate rnews or relaynews program processed it. To avoid processing an article the system already has, the server first looks it up in the history database. It soon became apparent that invoking relaynews for every article lost all of C News's speed gain, so the NNTP package was changed to write a set of articles into a batch and offer the batch to relaynews.
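A minimal sketch of this layered design, assuming a set of Message-IDs as a stand-in for the history database and writing accepted articles into a batch file in which each article is preceded by a `#! rnews <byte count>` line, as in standard news batches. The class and method names are hypothetical, not the reference implementation's. Note that `wants()` consults only the history, which relaynews updates when it processes a batch; an article sitting in a pending batch is invisible to it, so a second peer offering the same article will be told to send it.

```python
import os
import tempfile


class BatchingServer:
    """Hypothetical model of an NNTP server layered on relaynews.

    `history` stands in for the history database: a set of Message-IDs
    that relaynews has already processed. Accepted articles are collected
    and written as one batch file for relaynews, instead of invoking
    relaynews once per article.
    """

    def __init__(self, history, batch_dir, batch_size=50):
        self.history = history
        self.batch_dir = batch_dir
        self.batch_size = batch_size
        self.pending = []  # accepted articles not yet handed to relaynews
        self.count = 0     # batch files written so far

    def wants(self, message_id):
        # Only the history database is consulted; articles waiting in
        # self.pending are invisible here.
        return message_id not in self.history

    def receive(self, article):
        """Queue an accepted article; return a batch path if one was written."""
        self.pending.append(article.encode())
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return None

    def flush(self):
        """Write pending articles as a news batch, each one preceded by a
        '#! rnews <byte count>' line."""
        if not self.pending:
            return None
        self.count += 1
        path = os.path.join(self.batch_dir, "batch.%d" % self.count)
        with open(path, "wb") as f:
            for data in self.pending:
                f.write(b"#! rnews %d\n" % len(data))
                f.write(data)
        self.pending.clear()
        return path


if __name__ == "__main__":
    server = BatchingServer({"<old@site.b>"}, tempfile.mkdtemp(), batch_size=2)
    print(server.wants("<old@site.b>"))  # False: already in history
    print(server.wants("<new@site.c>"))  # True
```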
When articles arrive faster than relaynews can process them, they must be spooled. If two sites (B and C in the previous examples) both offer a third site (D) the same article at the ``same time,'' then an extra copy will be spooled, only to be rejected when it is processed, wasting disk space; this problem multiplies as the number of incoming sites increases. (This is quite common for Internet sites, which often have redundant fast newsfeeds and where many Usenet administrators seem to be avid players of the ``exchange news with as many other people as possible'' game.)
To alleviate this problem, most sites run Paul Vixie's msgidd, a daemon that keeps a memory-resident list of the Message-IDs of articles offered within the last 24 hours. The NNTP server is modified so that it tells this daemon about every article it hands to Usenet, and queries the daemon before telling the remote site that it needs an article. This is not a perfect solution: if the first, spooled copy of the article is lost or corrupted, other offers of it are refused while the cache entry lasts, and once the entry has expired the site will likely never be offered the article again. More fundamentally, msgidd is a workaround for a problem inherent in the current software architecture.
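A msgidd-style cache reduces to a map from Message-ID to the time it was recorded, with entries expiring after 24 hours. This in-process sketch is an assumption about the data structure only: msgidd itself is a separate daemon that the NNTP server talks to, its protocol is not modeled here, and the names `MsgIdCache` and `should_request` are hypothetical. The clock is injectable so expiry can be tested without waiting.

```python
import time


class MsgIdCache:
    """In-memory cache of recently offered Message-IDs (a msgidd-like sketch).

    Entries expire `ttl` seconds after they were added (24 hours by default).
    `clock` is injectable so expiry can be tested without waiting.
    """

    def __init__(self, ttl=24 * 60 * 60, clock=time.time):
        self.ttl = ttl
        self.clock = clock
        self.seen = {}  # Message-ID -> time it was recorded

    def add(self, message_id):
        """Record an article the server is handing to Usenet."""
        self.seen[message_id] = self.clock()

    def has(self, message_id):
        """True if the ID was recorded within the last `ttl` seconds."""
        recorded = self.seen.get(message_id)
        if recorded is None:
            return False
        if self.clock() - recorded >= self.ttl:
            del self.seen[message_id]  # expired: forget it
            return False
        return True

    def expire(self):
        """Drop every expired entry, bounding memory use."""
        now = self.clock()
        for mid in [m for m, t in self.seen.items() if now - t >= self.ttl]:
            del self.seen[mid]


def should_request(message_id, history, cache):
    """Ask a peer for an article only if neither history nor the cache knows it."""
    return message_id not in history and not cache.has(message_id)
```

In the server, `add()` would be called when an article is handed to Usenet. Once an entry expires, nothing remembers that the article was accepted, which is exactly the failure mode described above if the spooled copy was lost.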
Other problems, while not as severe, lead to the conclusion that a new implementation of Usenet is needed for Internet sites. For example:
- Since all articles are spooled, relaynews cannot tell the NNTP server the ultimate disposition of an article, so the server cannot tell its peer at the other end of the wire. This hides transmission problems. For example, a site tracing the communication has no way of finding out that an article was rejected because the remote site does not receive that particular set of newsgroups.
- The NNTP reference implementation is showing signs of age. The server is becoming a maintenance nightmare; over one-tenth of its 6,800 lines are #ifdef-related.
- Every article is written to disk one extra time: once when it is spooled and again when relaynews files it. Disks are getting bigger but not faster, while CPUs, memory, and networks are all getting faster.
InterNetNews(Salz) [Source:"InterNetNews: Usenet transport for Internet sites"] [Copyright: 1992 Rich Salz]
This is from the Mib Software Usenet RKT
See also the Usenet RKT topic ``Usenet Software before INN.''