InterNetNews...[Salz92]: Introduction

Usenet is a distributed bulletin board system, built as a logical network on top of other networks and connections. By design, messages resemble standard Internet electronic mail messages as defined in RFC822 [Crocker82]. The Usenet message format is described in RFC1036 [Adams87], which defines some additional headers. It also limits the values of some of the standard headers and gives some of them special semantics.

Newsgroups are the classification system of Usenet. The required Newsgroups header specifies where a message, or article, should be filed upon reception. Sites are free to carry whatever groups they want. Most sites carry the core set of so-called ``mainstream'' groups. There are currently about 730 of these groups, and one or two new ones are created every week.

Messages generated at a site are sent to the site's ``neighbors'' who process them and relay them to their neighbors, and so on. Sites can be interconnected -- indeed, on the Internet, this is quite common. See Figure 1.

Figure 1: Small Usenet topology (all links are two-way).

The Path header is used to prevent message loops. For example, an article written at A could get sent to B, D, C, and then back to A. Before propagating an article, a site prepends its own name to the Path header. Before propagating an article to a particular site, the sending host checks that the site that would receive the article does not already appear in the Path line. For example, when the article arrived at site C, the Path would contain D!B!A, so site C would know not to send the article to A.
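The Path check can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from any news system: the function names `prepend_site` and `should_forward` are hypothetical, C's neighbor list is assumed (the extra neighbor E does not appear in the text), and site names are taken to be separated by `!` with the most recent relay leftmost, as the prepending rule implies.

```python
def prepend_site(path_header: str, site: str) -> str:
    """Return the Path with `site` prepended (hypothetical helper)."""
    return f"{site}!{path_header}" if path_header else site


def should_forward(path_header: str, neighbor: str) -> bool:
    """Return True if `neighbor` does not already appear in the Path."""
    return neighbor not in path_header.split("!")


# An article written at A is relayed by B and then D before reaching C.
path_at_c = ""
for site in ["A", "B", "D"]:
    path_at_c = prepend_site(path_at_c, site)

# Assumed neighbors of C; "E" is a made-up site used only for illustration.
neighbors_of_c = ["A", "D", "E"]
targets = [n for n in neighbors_of_c if should_forward(path_at_c, n)]
```

Here C would skip A and D, because both already appear in the Path, and would offer the article only to the hypothetical site E.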

Sites also keep a record of the Message-IDs of all articles they currently have. If D receives an article from B, it will reject the article if C offers it later. For self-protection, most sites keep a record of recent articles that they no longer have. This is very useful when another site dumps a (usually quite large) batch of old news back out to Usenet.
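The duplicate check described above can be sketched as follows. This is a minimal in-memory illustration, assuming two sets of Message-IDs; the class name `History` and its methods are hypothetical, and a real site would keep this record on disk and eventually forget old entries.

```python
class History:
    """Tracks Message-IDs of articles on hand and of recently expired ones."""

    def __init__(self) -> None:
        self.on_hand: set[str] = set()   # articles the site currently has
        self.expired: set[str] = set()   # recent articles it no longer has

    def offer(self, message_id: str) -> bool:
        """Accept the article only if its Message-ID has never been seen."""
        if message_id in self.on_hand or message_id in self.expired:
            return False
        self.on_hand.add(message_id)
        return True

    def expire(self, message_id: str) -> None:
        """Drop the article but remember its Message-ID."""
        if message_id in self.on_hand:
            self.on_hand.remove(message_id)
            self.expired.add(message_id)


history = History()
from_b = history.offer("<123@siteA>")      # first offer, from B
from_c = history.offer("<123@siteA>")      # same article offered by C
history.expire("<123@siteA>")
redumped = history.offer("<123@siteA>")    # old news dumped back later
```

The first offer is accepted and the other two are rejected; the last rejection is the self-protection case, where the article is gone but its Message-ID is still remembered.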

For the past few years, the amount of data generated by Usenet sites has been doubling every year. A site that receives all the mainstream groups is receiving over 17 megabytes a day spread out over 11,000 articles [Adams92]. About 20% of the data is article headers, and while all of the headers must be scanned, only half of that header data must be processed by the Usenet software. (Yes, this means that, as far as the software is concerned, Usenet is over 90% noise.)
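The arithmetic behind these figures can be checked with a short calculation. It is a rough sketch that treats the quoted values as exact and assumes decimal megabytes; the variable names are illustrative only.

```python
total_bytes = 17_000_000   # 17 megabytes a day (decimal megabytes assumed)
articles = 11_000          # articles per day

avg_article_bytes = total_bytes / articles       # roughly 1.5 KB per article

header_bytes = total_bytes * 20 // 100           # about 20% of the data is headers
processed_bytes = header_bytes // 2              # only half of that is processed

processed_fraction = processed_bytes / total_bytes
noise_fraction = 1 - processed_fraction          # share the software can ignore
```

With these rounded inputs, about a tenth of the daily volume is processed and the remaining nine tenths is what the text calls noise.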

The number of sites participating in Usenet has been growing almost as quickly. Based on articles his site receives and survey data sent in by participating sites, Brian Reid estimates that there are 36,000 sites with 1.4 million participants [Reid91]. A ``sendsys'' message to the ``inet'' distribution in June of 1989 received about 200 replies in the first twenty-four hours. A year later, nearly 700 replies were received. (Sendsys is a special article that asks all receiving sites to send back an email message, usually without human intervention; by convention, inet is primarily the set of sites on the Internet.)

The NNTP protocol is defined in Internet RFC 977 [Kantor86] published in February, 1986. This was accompanied by the general public release of a reference implementation, also called ``nntp.'' This has been the only NNTP implementation that is generally available to UNIX sites.

InterNetNews(Salz) [Source:"InterNetNews: Usenet transport for Internet sites"]
The usage statistics and bandwidth estimates above are hopelessly outdated, and the growth rate continues to be very high. A typical estimate for June 1997 is 300K articles per day, amounting to more than 2 Gbytes of data. There are over 20,000 newsgroups.

