Why sample these 200 newsgroups, and not some other set?
The primary criterion for selecting a newsgroup for the sample set is that it has "homogeneous article propagation" on average. This means that most posts in the newsgroup are propagated the same way as most other posts in that newsgroup.
It is not necessary for the newsgroup itself to be propagated the same as other newsgroups. (In fact, a given server may not even carry the newsgroup.) But if the newsgroup is carried, there must not be something about the posts that causes individual articles to be treated differently.
To better understand this description, selection criteria and the desired aspects are discussed below.
There are a few newsgroups in the sample which don't fit these criteria, but there is usually a good reason to compare and report them. "news.announce.newgroups" has sporadic traffic, but it is important for admins to track what they are missing. "misc.lists.filters" is usually read by few people other than the robot cancellers, but it is a great test of a high-traffic newsgroup which doesn't get much spam.
- Relatively constant popularity: 10-300 posts per day.
There is a practical reason for a lower limit. Every newsgroup gets an occasional "spam"
or "MMF" posting. By nature, these are handled differently by different sites, and may be
counted on one site, but not on another. When there are 20 posts daily in a newsgroup,
and one spam comes through, it can throw off the comparison by 1 in 20, or 5%. If a
newsgroup gets only 2 posts daily, and a spam comes through, it can cause a difference
of 33%. (This violates the "homogeneous" principle, since 33% of the articles (1 article)
were handled differently than the other articles in that newsgroup.)
Efforts are made to keep the sample set full of higher traffic newsgroups, but
occasionally there are "lulls" in newsgroups, or the popularity changes. When there
are fewer than 5 articles expected, that newsgroup is not used in the letter grade or GPA.
Newsgroups which are only briefly popular due to current events are not sampled. They
are likely to die out. (A comparison of these "briefly famous" newsgroups may be
reported separately in the future.)
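The arithmetic behind the lower limit can be sketched in a few lines of Python. This is only an illustration, not the code newsrAte actually runs: the function names and the constant name are hypothetical, and the only numbers taken from the text are the 5-article cutoff and the two worked cases (1 article in 20, and 1 spam alongside 2 regular posts).

```python
# Hypothetical names; only the numbers come from the text above.
MIN_EXPECTED_ARTICLES = 5  # below this, the group is left out of the grade/GPA


def stray_share(total_articles: int, stray_articles: int = 1) -> float:
    """Fraction of a newsgroup's daily articles that were handled differently."""
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    return stray_articles / total_articles


def counts_toward_grade(expected_articles: int) -> bool:
    """True when enough articles are expected for the group to be graded."""
    return expected_articles >= MIN_EXPECTED_ARTICLES


# 20 articles in a day, one of them spam: 1 in 20.
print(f"{stray_share(20):.0%}")   # 5%
# 2 regular posts plus 1 spam: 1 article in 3.
print(f"{stray_share(3):.0%}")    # 33%
# A group in a lull with 4 expected articles is skipped for grading.
print(counts_toward_grade(4))     # False
```

The smaller the daily total, the larger the share a single differently-handled article represents, which is why low-traffic groups make poor samples.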
- Newsgroups people care about.
When a newsgroup is in the sample set, there is no need to generalize the data to
estimate performance. The value itself can be used.
Readership statistics and posting statistics are used in selecting the sample set, but
the sample set is not the newsgroups which get the most postings, or even the most readers.
Groups which get the most postings are generally "spam magnet" or "autopost robot"
groups, such as misc.jobs*, or alt.forsale. Newsgroups with the most readers are
alt.binaries.pictures*, which are also spam newsgroups, and get large binary
postings. Such newsgroups have characteristics which don't make them good
candidates for comparison. See "My server has 30,000 newsgroups. You only report 200."
for an explanation.
- Not controversial.
Controversial newsgroups have two undesirable characteristics. They experience "flare
ups" from time to time where the number of posts per day rises sharply and then falls.
This can be due to valid discussion, or flaming. This daily variation can influence the
comparison from day to day. Second, they tend to get "rogue" third-party cancels, which are
handled differently at some sites.
- Not a spam target.
Robot cancellers and automatic filters will reject the spam at some sites, but not others.
A comparison of these "spam target" newsgroups may be reported separately in the
future. This may allow filter developers to "tune" their filters to agree.
- Appropriate article length.
Some newsgroups get binary postings when they should not. These are handled
differently at different sites. Some newsgroups get very long binary postings which are
simply not propagated due to article size cutoffs.
(There are some non-spam, well propagated alt.binaries newsgroups, believe it or not!)
The sample set will change from time to time to meet these and other goals.
View an example report (includes the list of newsgroups sampled.)
What if some sample newsgroups are not on the server?
My server has 30,000 newsgroups. You only report 200.
Can an estimate be made for newsgroups not sampled?
How do I sign up?
Copyright 1998, Forrest J. Cavalier III, Mib Software