usefor-article-07 May 2002

5.5.  Newsgroups

   The Newsgroups-header's content specifies the newsgroup(s) in which
   the article is intended to appear. It is an inheritable header
   (4.2.5.2) which then becomes the default Newsgroups-header of any
   followup, unless a Followup-To-header is present to prescribe
   otherwise.  Articles MUST NOT be passed between relaying agents or to
   serving agents unless the sending agent has been configured to supply
   and the receiving agent to receive at least one of the newsgroup-
   names in the Newsgroups-header.

   References to "Unicode" or "the latest version of the Unicode
   Standard" mean [UNICODE 3.1] or any standard that supersedes it. That
   document contains guarantees of strict future upwards compatibility
   (e.g. no character will be removed or change classification).
   Implementors should be aware that currently unassigned code points
   (Unicode category Cn) may become valid characters in future versions
   of Unicode. Since the poster of an article might have access to a
   newer version of that standard, relaying and serving agents MUST
   accept such characters, but posting agents (and indeed all agents)
   MUST NOT generate them (though they might well follow up to
   newsgroup-names containing them).

      header              =/ Newsgroups-header
      Newsgroups-header   = "Newsgroups"  ":" SP Newsgroups-content
                          *( ";" other-parameter )
      Newsgroups-content  = [FWS] newsgroup-name
                     *( [FWS] ng-delim [FWS] newsgroup-name )
                     [FWS]
      newsgroup-name      = component *( "." component )
      component           = 1*component-glyph
      ng-delim            = ","
      component-glyph     = combiner-base *combiner-mark
      combiner-base       = combiner-ASCII / combiner-extended
      combiner-ASCII      = DIGIT / ALPHA / "+" / "-" / "_"
      combiner-extended   = <any character with a Unicode code value of
                   0080 or greater and a combining class of 0,
                   but excluding any character in Unicode
                   categories Cc, Cf, Cs, Zs, Zl, and Zp>
      combiner-mark       = <any character with a Unicode code value of
                   0080 or greater and a combining class other
                   than 0>

        NOTE: the excluded characters are control characters (Cc),
        format control characters (Cf), surrogates (Cs), and separators
        (Zs, Zl, Zp). In particular, this excludes all whitespace
        characters.  To all intents and purposes, a component-glyph is
        what a user might regard as a single "character" as displayed on
        his screen, though it might be transmitted as several actual
        characters (e.g. q-circumflex is two characters). Note also
        that, in some writing schemes, several component-glyphs will
        merge into one visible object of variable size.
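   The grammar above can be sketched as a small matcher. This is a
   minimal illustration, not a normative implementation: it assumes
   Python's standard unicodedata module, whose tables follow whatever
   Unicode version the interpreter was built with rather than
   [UNICODE 3.1], and the function names are hypothetical. It checks
   syntax only (not normalization), and it lets unassigned code points
   through as combiner-base characters, since relaying and serving
   agents are required to accept them.

```python
import unicodedata

# combiner-ASCII: DIGIT / ALPHA / "+" / "-" / "_"
ASCII_BASE = set(
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "+-_"
)
# Categories excluded from combiner-extended.
EXCLUDED_CATEGORIES = {"Cc", "Cf", "Cs", "Zs", "Zl", "Zp"}


def is_combiner_base(ch):
    """True for combiner-ASCII or combiner-extended characters."""
    if ch in ASCII_BASE:
        return True
    return (
        ord(ch) >= 0x80
        and unicodedata.combining(ch) == 0
        and unicodedata.category(ch) not in EXCLUDED_CATEGORIES
    )


def is_combiner_mark(ch):
    """True for non-ASCII characters with a non-zero combining class."""
    return ord(ch) >= 0x80 and unicodedata.combining(ch) != 0


def split_glyphs(component):
    """Split a component into component-glyphs; None if it does not match."""
    glyphs = []
    for ch in component:
        if is_combiner_base(ch):
            glyphs.append(ch)          # start a new glyph
        elif glyphs and is_combiner_mark(ch):
            glyphs[-1] += ch           # attach the mark to the current glyph
        else:
            return None                # mark with no base, or illegal character
    return glyphs or None              # an empty component is not allowed


def is_newsgroup_name(name):
    """Check newsgroup-name = component *( "." component )."""
    return all(split_glyphs(c) is not None for c in name.split("."))
```

   With this sketch, q followed by U+0302 (the q-circumflex of the NOTE
   above) yields a single component-glyph made of two characters.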

   Each component MUST be invariant under Unicode normalization NFKC
   (cf. the weaker normalization requirement for other headers in
   section 4.4.1 which specified no more than normalization NFC, and see
   also the explanatory NOTE in that section).

        NOTE: As a result of this restriction, a name has only one
        valid form. Implementations can assume that a straight
        comparison of characters or octets is sufficient to compare two
        newsgroup-names.

        The requirement that names be invariant under NFKC, rather than
        NFC, means that all characters with a "compatibility
        decomposition" are forbidden (Unicode provides the property
        "NFKC_NO" to make this test easier).  The effect is to exclude
        variant forms of characters, such as superscripts and
        subscripts, wide and narrow forms, font variants, encircled
        forms, ligatures, and so on, as their use could cause confusion.

        There is insufficient experience in this area to determine
        whether this is the right long-term solution. Implementors
        should therefore be aware that a future version of this standard
        might reduce the requirement in the direction of NFC as opposed
        to NFKC.

        NOTE: An implementation is not required to apply NFKC, or any
        other normalization, to newsgroup names. Only agencies that
        create new groups need to be careful to obey this restriction
        (7.2.1).  However, if a posting agent neglects to normalize a
        newsgroup-name entered manually, this may lead to the user
        posting to a non-existent group without understanding why.
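   The invariance requirement can be tested by normalizing a component
   and comparing the result with the input. The following is a sketch
   only, assuming Python's unicodedata.normalize (which applies the
   Unicode version bundled with the interpreter); the function name is
   hypothetical.

```python
import unicodedata


def is_nfkc_invariant(component):
    """True if NFKC normalization leaves the component unchanged."""
    return unicodedata.normalize("NFKC", component) == component
```

   Under this check a precomposed e-acute (U+00E9) passes, whereas "e"
   followed by U+0301, a superscript two (U+00B2) and the "fi" ligature
   (U+FB01) all fail, because NFKC would rewrite each of them.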

   Newsgroup-names containing non-ASCII characters MUST be encoded in
   UTF-8 and not according to [RFC 2047].

   Components beginning with underline ("_") are reserved for use by
   future versions of this standard and MUST NOT occur in newsgroup
   names (whether in Newsgroups-headers or in newgroup control messages
   (7.2.1)).  However, such names MUST be accepted.

   Components beginning with "+" or "-" are reserved for use by
   implementations and MUST NOT occur in newsgroup names (whether in
   Newsgroups-headers or in newgroup control messages). Implementors may
   assume that this rule will not change in any future version of this
   standard.

        NOTE: For example, implementors may safely use leading "+" and
        "-" to "escape" other entities within something that looks like
        a newsgroup-name.
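   An agent that generates or creates newsgroup-names might flag the
   reserved components as follows. This is a sketch with a hypothetical
   function name; it is meant for the generating side only, since names
   with a leading "_" still have to be accepted on receipt.

```python
def reserved_components(name):
    """Return (component, reason) pairs for every reserved component."""
    found = []
    for component in name.split("."):
        if component.startswith("_"):
            found.append((component, "reserved for future versions"))
        elif component.startswith(("+", "-")):
            found.append((component, "reserved for implementations"))
    return found
```

   Only the first character of each component matters here, so a name
   such as comp.lang.c++ is not affected.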

   Agencies responsible for the administration of particular hierarchies
   Ought to place additional restrictions on the characters they allow
   in newsgroup-names within those hierarchies (such as to accord with
   the languages commonly used within those hierarchies, or to avoid
   perceived ambiguities pertinent to those languages). Where there is
   no such specific policy, the following restrictions SHOULD be applied
   to newsgroup names.

        NOTE: These restrictions are intended to reflect existing
        practice, with some additions to accommodate foreseeable
        enhancements, and are intended both to avoid certain technical
        difficulties and to avoid unnecessary confusion. It may well be
        that experience will allow future extensions to this standard to
        relax some or all of these restrictions.

   The specific restrictions (to be applied in the absence of
   established policies to the contrary) are:

   1. The following characters are forbidden, subject to the comments
      and notes at the end of the list:

      characters in category Cn (Other, Not assigned)         [1]
      characters in category Co (Other, Private Use)          [2]
      characters in category Lt (Letter, Titlecase)           [3]
      characters in category Lu (Letter, Uppercase)           [3]
      characters in category Me (Mark, Enclosing)             [4]
      characters in category Pd (Punctuation, Dash)           [4][5]
      characters in category Pe (Punctuation, Close)          [4]
      characters in category Pf (Punctuation, Final quote)    [4]
      characters in category Pi (Punctuation, Initial quote)  [4]
      characters in category Po (Punctuation, Other)          [4]
      characters in category Ps (Punctuation, Open)           [4]
      characters in category Sc (Symbol, Currency)            [4]
      characters in category Sk (Symbol, Modifier)            [4]
      characters in category Sm (Symbol, Math)                [4][5]
      characters in category So (Symbol, Other)               [4]

      [1] As new characters are added to Unicode, their code points move
          from category Cn to some other category. As stated above,
          implementors should be prepared for this.

      [2] Specific private use characters can be used within a hierarchy
          or co-operating subnet that has agreed meanings for them.

      [3] Traditionally, newsgroup-names have been written in lowercase.
          Posting agents Ought Not to convert uppercase or titlecase
          characters to the corresponding lowercase forms except under
          the explicit instructions of the poster.

      [4] Traditionally, newsgroup names have only used letters, digits,
          and the three special characters "+", "-" and "_". These
          categories correspond to characters outside that set.

      [5] Although the characters "+" and "-" are within categories Sm
          and Pd respectively, they are not forbidden.

   2. A component is forbidden to consist entirely of digits.

        NOTE: This requirement was in [RFC 1036] but nevertheless
        several such groups have appeared in practice and implementors
        should be prepared for them. A common implementation technique
        uses each component as the name of a directory and uses numeric
        filenames for each article within a group. Such an
        implementation needs to be careful when this could cause a clash
        (e.g. between article 123 of group xxx.yyy and the directory for
        group xxx.yyy.123).

   3. A component is limited to 30 component-glyphs and a newsgroup-name
      to 71 component-glyphs. Whilst there is no longer any technical
      reason to limit the length of a component (formerly, it was
      limited to 14 octets) nor of a newsgroup-name, it should be noted
      that these names are also used in the newsgroups line (7.2.1.2)
      where an overall policy limit applies and, moreover, excessively
      long names can be exceedingly inconvenient in practical use.
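   The three default restrictions can be sketched as a single check that
   returns a list of problems. Several points are assumptions made for
   illustration: the function names are hypothetical, "digits" is read
   as the ASCII digits of the DIGIT rule, the "." separators are not
   counted towards the 71-glyph limit, and the category tables are
   those of Python's unicodedata module rather than [UNICODE 3.1].

```python
import unicodedata

# Categories forbidden by restriction 1.
FORBIDDEN_CATEGORIES = {
    "Cn", "Co", "Lt", "Lu", "Me", "Pd", "Pe", "Pf",
    "Pi", "Po", "Ps", "Sc", "Sk", "Sm", "So",
}
EXEMPT = {"+", "-"}           # comment [5]: not forbidden despite Sm / Pd
MAX_COMPONENT_GLYPHS = 30     # restriction 3
MAX_NAME_GLYPHS = 71          # restriction 3


def glyph_count(text):
    """Count component-glyphs: one per character of combining class 0."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


def policy_violations(name):
    """Return a list of default-policy problems (empty if none)."""
    problems = []
    components = name.split(".")
    for component in components:
        for ch in component:
            if ch not in EXEMPT and unicodedata.category(ch) in FORBIDDEN_CATEGORIES:
                problems.append(
                    "forbidden character U+%04X in %r" % (ord(ch), component)
                )
        if component and all(ch in "0123456789" for ch in component):
            problems.append("component %r is all digits" % component)
        if glyph_count(component) > MAX_COMPONENT_GLYPHS:
            problems.append("component %r is too long" % component)
    if sum(glyph_count(c) for c in components) > MAX_NAME_GLYPHS:
        problems.append("newsgroup-name is too long")
    return problems
```

   Because these are policy restrictions that a hierarchy may override,
   a check of this kind is better used to warn than to reject.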

   Serving and relaying agents MUST accept any newsgroup-name that meets
   the above requirements, even if it violates one or more of the
   policy restrictions. Posting and injecting agents MAY reject articles
   containing newsgroup-names that do not meet these restrictions, and
   posting agents MAY attempt to correct them (but only with the
   explicit agreement of the poster for anything more than NFC or NFKC
   normalization). However, because of the large and changing tables
   required to do these checks and corrections throughout the whole of
   Unicode, this standard does not require them to do so. Rather, the
   onus is placed on those who create new newsgroups (7.2.1) to check
   the mandatory requirements, to consider the effects of relaxing the
   other restrictions, and to consider how all this may affect
   propagation of the group.

   Since future extensions to this standard and the Unicode standard,
   including a possible relaxation of the NFKC normalization, plus any
   relaxations of the default restrictions introduced by specific
   hierarchies, might invalidate some such checks, warnings, and
   adjustments, implementations MUST incorporate means to disable them.

      NOTE: The newsgroup-name as encoded in UTF-8 should be regarded as
      the canonical form. Reading agents may convert it to whatever
      character set they are able to display and serving agents may
      possibly need to convert it to some form more suitable as a
      filename. Simple algorithms for both kinds of conversion are
      readily available.  Observe that the syntax does not allow
      comments within the Newsgroups-header; this is to simplify
      processing by relaying and serving agents which have a requirement
      to process this header extremely rapidly.

   The inclusion of folding white space within a Newsgroups-content is a
   newly introduced feature in this standard. It MUST be accepted by all
   conforming implementations (relaying agents, serving agents and
   reading agents).  Posting agents should be aware that such postings
   may be rejected by overly-critical old-style relaying agents. When a
   sufficient number of relaying agents are in conformance, posting
   agents SHOULD generate such whitespace in the form of <CRLF WSP> so
   as to keep the length of lines in the relevant headers (notably
   Newsgroups and Followup-To) to no more than 79 characters (or
   other agreed policy limit - see 4.5).  Before such critical mass
   occurs, injecting agents MAY reformat such headers by removing
   whitespace inserted by the posting agent, but relaying agents MUST
   NOT do so.
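   Folding of the kind described above can be sketched as follows. The
   function names are hypothetical, the fold is placed only after an
   ng-delim and uses a single SP as the WSP, and line length is counted
   in characters rather than octets. A single name longer than the
   limit is left on a line of its own, since the grammar offers no
   place to break inside a newsgroup-name.

```python
def fold_newsgroups(names, limit=79):
    """Build a Newsgroups header, folding with CRLF SP after commas."""
    lines = []
    current = "Newsgroups: "
    has_name = False                    # does the current line hold a name yet?
    last = len(names) - 1
    for i, name in enumerate(names):
        piece = name if i == last else name + ","
        if has_name and len(current) + len(piece) > limit:
            lines.append(current)       # close this line and start a folded one
            current = " "
            has_name = False
        current += piece
        has_name = True
    lines.append(current)
    return "\r\n".join(lines)


def parse_newsgroups(content):
    """Recover the names from a (possibly folded) Newsgroups-content."""
    unfolded = content.replace("\r\n", "")
    return [name.strip(" \t") for name in unfolded.split(",")]
```

   A receiving agent can then apply parse_newsgroups to everything after
   the "Newsgroups:" prefix to get back the original list.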

   Posters SHOULD use only the names of existing newsgroups in the
   Newsgroups-header. However, it is legitimate to cross-post to
   newsgroups which do not exist on the posting agent's host, provided
   that at least one of the newsgroups DOES exist there, and followup
   agents SHOULD accept this (posting agents MAY accept it, but Ought at
   least to alert the poster to the situation and request confirmation).
   Relaying agents MUST NOT rewrite Newsgroups-headers in any way, even
   if some or all of the newsgroups do not exist on the relaying agent's
   host. Serving agents MUST NOT create new newsgroups simply because an
   unrecognized newsgroup-name occurs in a Newsgroups-header (see 7.2.1
   for the correct method of newsgroup creation).

   The Newsgroups-header is intended for use in Netnews articles rather
   than in email messages. It MAY be used in an email message to
   indicate that it is a copy also posted to the listed newsgroups, in
   which case the inclusion of a Posted-And-Mailed header (6.9) would
   also be appropriate. However, it SHOULD NOT be used in an email-only
   reply to a Netnews article (thus the "inheritable" property of this
   header applies only to followups to a newsgroup, and not to followups
   to the poster). Moreover, if a newsgroup-name contains any non-ASCII
   character, it MAY be encoded using the mechanism defined in [RFC
   2047] when sent by email (for which purpose the newsgroup-name SHOULD
   be treated as an encoded-word) but, if it is subsequently returned to
   the Netnews environment, it MUST then be re-encoded into UTF-8. See
   also the further discussion in section 8.8.1.
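
   The round trip between Netnews and email might look like the sketch
   below. It relies on Python's standard email.header module; the two
   function names are hypothetical, and encoding each non-ASCII
   newsgroup-name as its own encoded-word is one reading of the
   recommendation above, not the only possible one.

```python
from email.header import Header, decode_header, make_header


def newsgroups_for_email(names):
    """Encode each non-ASCII name as an RFC 2047 encoded-word."""
    encoded = []
    for name in names:
        if name.isascii():
            encoded.append(name)                       # pure ASCII: send as-is
        else:
            encoded.append(Header(name, "utf-8").encode())
    return ",".join(encoded)


def newsgroups_from_email(content):
    """Decode encoded-words back to text, ready to be emitted as UTF-8."""
    names = []
    for part in content.split(","):
        names.append(str(make_header(decode_header(part.strip()))))
    return names
```

   The decoded strings would then be written back into the
   Newsgroups-header as raw UTF-8 when the article re-enters Netnews.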
#Diff to first older

--- ../usefor-article-06/Newsgroups.out          November 2001
+++ ../usefor-article-07/Newsgroups.out          May 2002
@@ -1,10 +1,13 @@
 5.5.  Newsgroups
 
-   The Newsgroups header's content specifies the newsgroup(s) in which
+   The Newsgroups-header's content specifies the newsgroup(s) in which
    the article is intended to appear. It is an inheritable header
-   (4.2.2.2) which then becomes the default Newsgroups header of any
-   followup, unless a Followup-To header is present to prescribe
-   otherwise.
+   (4.2.5.2) which then becomes the default Newsgroups-header of any
+   followup, unless a Followup-To-header is present to prescribe
+   otherwise.  Articles MUST NOT be passed between relaying agents or to
+   serving agents unless the sending agent has been configured to supply
+   and the receiving agent to receive at least one of the newsgroup-
+   names in the Newsgroups-header.
 
    References to "Unicode" or "the latest version of the Unicode
    Standard" mean [UNICODE 3.1] or any standard that supersedes it. That
@@ -18,15 +21,18 @@
    MUST NOT generate them (though they might well follow up to
    newsgroup-names containing them).
 
-      Newsgroups-content  = newsgroup-name
-                     *( *FWS ng-delim *FWS newsgroup-name )
-                     *FWS
+      header              =/ Newsgroups-header
+      Newsgroups-header   = "Newsgroups"  ":" SP Newsgroups-content
+                          *( ";" other-parameter )
+      Newsgroups-content  = [FWS] newsgroup-name
+                     *( [FWS] ng-delim [FWS] newsgroup-name )
+                     [FWS]
       newsgroup-name      = component *( "." component )
       component           = 1*component-glyph
       ng-delim            = ","
       component-glyph     = combiner-base *combiner-mark
       combiner-base       = combiner-ASCII / combiner-extended
-      combiner-ASCII      = "0"-"9" / %x41-5A / %x61-7A / "+" / "-" / "_"
+      combiner-ASCII      = DIGIT / ALPHA / "+" / "-" / "_"
       combiner-extended   = <any character with a Unicode code value of
                    0080 or greater and a combining class of 0,
                    but excluding any character in Unicode
@@ -49,6 +55,7 @@
    (cf. the weaker normalization requirement for other headers in
    section 4.4.1 which specified no more than normalization NFC, and see
    also the explanatory NOTE in that section).
+
         NOTE: As a result of of this restriction, a name has only one
         valid form. Implementations can assume that a straight
         comparison of characters or octets is sufficient to compare two
@@ -63,7 +70,7 @@
         forms, ligatures, and so on, as their use could cause confusion.
 
         There is insufficient experience in this area to determine
-        whether this is the right long-term solution. Implementers
+        whether this is the right long-term solution. Implementors
         should therefore be aware that a future version of this standard
         might reduce the requirement in the direction of NFC as opposed
         to NFKC.
@@ -77,15 +84,14 @@
 
    Newsgroup-names containing non-ASCII characters MUST be encoded in
    UTF-8 and not according to [RFC 2047].
-
    Components beginning with underline ("_") are reserved for use by
    future versions of this standard and MUST NOT occur in newsgroup
-   names (whether in Newsgroup headers or in newgroup control messages
+   names (whether in Newsgroups-headers or in newgroup control messages
    (7.2.1)).  However, such names MUST be accepted.
 
    Components beginning with "+" or "-" are reserved for use by
    implementations and MUST NOT occur in newsgroup names (whether in
-   Newsgroup headers or in newgroup control messages). Implementors may
+   Newsgroups-headers or in newgroup control messages). Implementors may
    assume that this rule will not change in any future version of this
    standard.
 
@@ -100,8 +106,9 @@
    perceived ambiguities pertinent to those languages). Where there is
    no such specific policy, the following restrictions SHOULD be applied
    to newsgroup names.
+
         NOTE: These restrictions are intended to reflect existing
-        practice, with some additions to accomodate foreseeable
+        practice, with some additions to accommodate foreseeable
         enhancements, and are intended both to avoid certain technical
         difficulties and to avoid unnecessary confusion. It may well be
         that experience will allow future extensions to this standard to
@@ -128,7 +135,6 @@
       characters in category Sk (Symbol, Modifier)            [4]
       characters in category Sm (Symbol, Math)                [4][5]
       characters in category So (Symbol, Other)               [4]
-
       [1] As new characters are added to Unicode, the code point moves
 from category Cn to some other category. As stated above,
 implementors should be prepared for this.
@@ -158,8 +164,6 @@
         implementation needs to be careful when this could cause a clash
         (e.g. between article 123 of group xxx.yyy and the directory for
         group xxx.yyy.123).
-[Open issue a number of people think this should not be a default
-requirement but simply be a NOTE; wording for such is further down.]
 
    3. A component is limited to 30 component-glyphs and a newsgroup-name
       to 71 component-glyphs. Whilst there is no longer any technical
@@ -182,35 +186,21 @@
    the mandatory requirements, to consider the effects of relaxing the
    other restrictions, and to consider how all this may affect
    propagation of the group.
-
    Since future extensions to this standard and the Unicode standard,
    including a possible relaxation of the NFKC normalization, plus any
    relaxations of the default restrictions introduced by specific
    hierarchies might invalidate some such checks, warnings, and
    adjustments, implementations MUST incorporate means to disable them.
 
-[Alternative text for Open issue]
-
-        NOTE: Components composed entirely of digits were forbidden by
-        [RFC 1036] but have nevertheless been used in practice, and are
-        therefore permitted by this specification. A common
-        implementation technique uses each component as the name of a
-        directory and uses numeric filenames for each article within a
-        group. Such an implementation needs to be careful when this
-        could cause a clash (e.g. between article 123 of group xxx.yyy
-        and the directory for group xxx.yyy.123).
-[Open issue: delete the above text if we retain the default requirement
-above.]
-
-        NOTE: The newsgroup-name as encoded in UTF-8 should be regarded
-        as the canonical form. Reading agents may convert it to whatever
-        character set they are able to display (see 4.4.1) and serving
-        agents may possibly need to convert it to some form more
-        suitable as a filename. Simple algorithms for both kinds of
-        conversion are readily available.  Observe that the syntax does
-        not allow comments within the Newsgroups header; this is to
-        simplify processing by relaying and serving agents which have a
-        requirement to process this header extremely rapidly.
+      NOTE: The newsgroup-name as encoded in UTF-8 should be regarded as
+      the canonical form. Reading agents may convert it to whatever
+      character set they are able to display and serving agents may
+      possibly need to convert it to some form more suitable as a
+      filename. Simple algorithms for both kinds of conversion are
+      readily available.  Observe that the syntax does not allow
+      comments within the Newsgroups-header; this is to simplify
+      processing by relaying and serving agents which have a requirement
+      to process this header extremely rapidly.
 
    The inclusion of folding white space within a Newsgroups-content is a
    newly introduced feature in this standard. It MUST be accepted by all
@@ -218,8 +208,8 @@
    reading agents).  Posting agents should be aware that such postings
    may be rejected by overly-critical old-style relaying agents. When a
    sufficient number of relaying agents are in conformance, posting
-   agents SHOULD generate such whitespace in the form of <CRLF WS> so as
-   to keep the length of lines in the relevant headers (notably
+   agents SHOULD generate such whitespace in the form of <CRLF WSP> so
+   as to keep the length of lines in the relevant headers (notably
    Newsgroups and Followup-To) to no more than than 79 characters (or
    other agreed policy limit - see 4.5).  Before such critical mass
    occurs, injecting agents MAY reformat such headers by removing
@@ -227,25 +217,28 @@
    NOT do so.
 
    Posters SHOULD use only the names of existing newsgroups in the
-   Newsgroups header. However, it is legitimate to cross-post to
+   Newsgroups-header. However, it is legitimate to cross-post to a
    newsgroup(s) which do not exist on the posting agent's host, provided
    that at least one of the newsgroups DOES exist there, and followup
    agents SHOULD accept this (posting agents MAY accept it, but Ought at
    least to alert the poster to the situation and request confirmation).
-   Relaying agents MUST NOT rewrite Newsgroups headers in any way, even
+   Relaying agents MUST NOT rewrite Newsgroups-headers in any way, even
    if some or all of the newsgroups do not exist on the relaying agent's
    host. Serving agents MUST NOT create new newsgroups simply because an
-   unrecognised newsgroup-name occurs in a Newsgroups header (see 7.2.1
+   unrecognized newsgroup-name occurs in a Newsgroups-header (see 7.2.1
    for the correct method of newsgroup creation).
 
-   The Newsgroups header is intended for use in Netnews articles rather
-   than in mail messages. It MAY be used in a mail message to indicate
-   that it is a copy also posted to the listed newsgroups, but it SHOULD
-   NOT be used in a mail-only reply to a Netnews article (thus the
-   "inheritable" property of this header applies only to followups to a
-   newsgroup, and not to followups to the poster). Moreover, if a
-   newsgroup-name contains any non-ASCII character, it MAY be encoded
-   using the mechanism defined in [RFC 2047] when sent by mail but, if
-   it is subsequently returned to the Netnews environment, it MUST then
-   be re-encoded into UTF-8.
+   The Newsgroups-header is intended for use in Netnews articles rather
+   than in email messages. It MAY be used in an email message to
+   indicate that it is a copy also posted to the listed newsgroups, in
+   which case the inclusion of a Posted-And-Mailed header (6.9) would
+   also be appropriate. However, it SHOULD NOT be used in an email-only
+   reply to a Netnews article (thus the "inheritable" property of this
+   header applies only to followups to a newsgroup, and not to followups
+   to the poster). Moreover, if a newsgroup-name contains any non-ASCII
+   character, it MAY be encoded using the mechanism defined in [RFC
+   2047] when sent by email (for which purpose the newsgroup-name SHOULD
+   be treated as an encoded-word) but, if it is subsequently returned to
+   the Netnews environment, it MUST then be re-encoded into UTF-8. See
+   also the further discussion in section 8.8.1.
 

Documents were processed to this format by Forrest J. Cavalier III