usefor-article-03 February 2000

2.4.  Syntax Notation

   This standard uses the Augmented Backus-Naur Form (ABNF) described
   in [RFC 2234].  A discussion of that notation is outside the scope
   of this standard, but implementors are expected to be able to
   understand it quickly by reference to the defining document.

   Much of the syntax of News Articles is based on the corresponding
   syntax defined in [MESSFOR] or in the MIME specifications [RFC 2045]
   et seq., which is deemed to have been incorporated into this standard
   as required.  However, there are some important differences arising
   from the fact that [MESSFOR] does not recognise anything other than
   US-ASCII characters, that it does not recognise the MIME headers [RFC
   2045], and that it includes much syntax described as "obsolete".

        NOTE:  News parsers historically have been much less permissive
        than Mail parsers, and this is reflected in the modifications
        referred to, and in some further specific rules.

   The following syntactic forms therefore supersede the corresponding
   rules given in [MESSFOR] and [RFC 2045], thus allowing UTF-8
   characters [RFC 2044] to appear in certain contexts (the four rules
   beginning with "strict-" reflect the corresponding original rules from
   [MESSFOR]).

      UTF8-xtra-head  = %d192-253
      UTF8-xtra-tail  = %d128-191
      UTF8-xtra-char  = UTF8-xtra-head 1*UTF8-xtra-tail
      text            = %d1-9 /            ; all UTF-8 characters except
              %d11-12 /          ; US-ASCII NUL, CR and LF
              %d14-127 /
              UTF8-xtra-char
      ctext           = NO-WS-CTL /        ; all of <text> except
              %d33-39 /          ; SP, HTAB, "(", ")"
              %d42-91 /          ; and "\"
              %d93-126 /
              UTF8-xtra-char
      qtext           = NO-WS-CTL /        ; all of <text> except
              %d33 /             ; SP, HTAB, "\" and DQUOTE
              %d35-91 /
              %d93-126 /
              UTF8-xtra-char
      utext           = NO-WS-CTL /        ; Non white space controls
              %d33-126 /         ; The rest of US-ASCII
              UTF8-xtra-char
      strict-text     = %d1-9 /            ; text restricted to
              %d11-12 /          ; US-ASCII
              %d14-127
      strict-qtext    = NO-WS-CTL /        ; qtext restricted to
              %d33 /             ; US-ASCII
              %d35-91 /
              %d93-127
      strict-quoted-pair
            = "\" strict-text
      strict-quoted-string
            = [CFWS] DQUOTE
                 *([FWS] (strict-qtext / strict-quoted-pair))
                 [FWS] DQUOTE [CFWS]

        NOTE: There are sequences of octets which cannot legitimately
        occur in UTF-8, even a few permitted by the above syntax.  These
        SHOULD NOT be generated by posting agents but, where they occur
        inadvertently, they SHOULD be passed on untouched by other
        agents.
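
   As an illustration of the NOTE above, the following Python sketch
   transcribes the <text> rule into a byte-level regular expression and
   compares it with a strict UTF-8 decoder.  The function names
   matches_text and is_valid_utf8 are hypothetical, and Python's
   built-in decoder is assumed here as a stand-in for "legitimate
   UTF-8"; it follows a later and stricter definition than [RFC 2044],
   so it rejects some sequences that [RFC 2044] allowed.

```python
import re

# Byte-level transcription of the rules above (illustrative sketch only).
# UTF8-xtra-char = UTF8-xtra-head 1*UTF8-xtra-tail
UTF8_XTRA_CHAR = rb"[\xC0-\xFD][\x80-\xBF]+"
# text = %d1-9 / %d11-12 / %d14-127 / UTF8-xtra-char
TEXT = rb"(?:[\x01-\x09\x0B\x0C\x0E-\x7F]|" + UTF8_XTRA_CHAR + rb")"
TEXT_RUN = re.compile(TEXT + rb"*")


def matches_text(octets: bytes) -> bool:
    """True if every octet sequence in `octets` is derivable from <text>."""
    return TEXT_RUN.fullmatch(octets) is not None


def is_valid_utf8(octets: bytes) -> bool:
    """True if Python's strict decoder accepts `octets` (an assumption
    standing in for 'legitimate UTF-8')."""
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A two-octet head followed by two tail octets is permitted by the
# grammar (head 1*tail) but is not a well-formed UTF-8 sequence.
suspect = b"\xC2\x80\x80"
print(matches_text(suspect), is_valid_utf8(suspect))
```

   Under these assumptions, a relaying agent would pass such octets on
   untouched, while a posting agent could use a check like
   is_valid_utf8 to avoid generating them.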

   Wherever in this standard the syntax is stated to be taken from
   [MESSFOR], it is to be understood as the syntax defined by [MESSFOR]
   after making the above changes, but NOT including any syntax defined
   in section 4 ("Obsolete syntax") of [MESSFOR].  Software compliant
   with this standard MUST NOT generate any of the syntactic forms
   defined in that Obsolete Syntax, although it MAY accept such
   syntactic forms. Certain syntax from the MIME specifications [RFC
   2045] et seq. is also considered a part of this standard (see 6.17).

   The following syntactic forms, taken from [RFC 2234] or from
   [MESSFOR], are repeated here for convenience only:

      ALPHA           = %x41-5A /          ; A-Z
              %x61-7A            ; a-z
      CR              = %x0D               ; carriage return
      CRLF            = CR LF
      DIGIT           = %x30-39            ; 0-9
      HTAB            = %x09               ; horizontal tab
      LF              = %x0A               ; line feed
      SP              = %x20               ; space
      NO-WS-CTL       = %d1-8 /            ; US-ASCII control characters
              %d11 /             ; which do not include the
              %d12 /             ; carriage return, line feed,
              %d14-31 /          ; and whitespace characters
              %d127
      WSP             = SP / HTAB          ; Whitespace characters
      FWS             = ([*WSP CRLF] 1*WSP); Folding whitespace
      atext           = ALPHA / DIGIT /
              "!" / "#" /        ; Any character except
              "$" / "%" /        ; controls, SP, and specials.
              "&" / "'" /        ; Used for atoms
              "*" / "+" /
              "-" / "/" /
              "=" / "?" /
              "^" / "_" /
              "`" / "{" /
              "|" / "}" /
              "~"
      atom            = [CFWS] 1*atext [CFWS]
      dot-atom        = [CFWS] dot-atom-text [CFWS]
      dot-atom-text   = 1*atext *( "." 1*atext )
      comment         = "(" *([FWS]
                 (ctext / quoted-pair / comment)) [FWS] ")"
      CFWS            = *([FWS] comment) (([FWS] comment) / FWS )
      DQUOTE          = %d34              ; quote mark
      quoted-pair     = "\" text
      quoted-string   = [CFWS] DQUOTE
                 *([FWS] (qtext / quoted-pair))
                 [FWS] DQUOTE [CFWS]
      unstructured    = *( [FWS] utext ) [FWS]
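
   The <atext> and <dot-atom-text> rules above can be sketched as a
   regular expression.  The following Python fragment is a minimal
   illustration, not a full parser: the name is_dot_atom_text is
   hypothetical, the surrounding [CFWS] of <dot-atom> is deliberately
   left out, and the character class includes "{" alongside "|", "}"
   and "~", following the definition of atext in [MESSFOR].

```python
import re

# atext: ALPHA / DIGIT plus the punctuation listed in the rule above.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# dot-atom-text = 1*atext *( "." 1*atext )
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_text(value: str) -> bool:
    """True if `value` is derivable from <dot-atom-text> (no CFWS)."""
    return DOT_ATOM_TEXT.fullmatch(value) is not None


for candidate in ("comp.lang.c", "a..b", ".leading", "foo bar"):
    print(candidate, is_dot_atom_text(candidate))
```

   Empty components are rejected because each side of a "." requires at
   least one <atext> character.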

        NOTE: CFWS occurs at many places in the syntax in order to allow
        comments and extra whitespace to be inserted almost anywhere.
        The syntax is in fact ambiguous insofar as it may be impossible
        to tell in which of several possible ways a given comment or WS
        was produced. However, this does not lead to semantic ambiguity
        because, unless specifically stated otherwise, the presence or
        absence of a comment or additional WS has no semantic meaning
        and, in particular, it is a matter of indifference whether it
        forms a part of the syntactic construct preceding it or the one
        following it.
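
   One way to see why the ambiguity described in the NOTE above is
   harmless is to discard comments before interpreting a value.  The
   Python sketch below does this under simplifying assumptions: the
   names strip_comments and same_meaning are hypothetical, the input is
   assumed to be an already unfolded header value, and the whitespace
   collapsing in same_meaning ignores the fact that whitespace inside
   a quoted-string is significant.

```python
def strip_comments(value: str) -> str:
    """Remove (possibly nested) comments from an unfolded header value."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a quoted-string (only possible at depth 0)
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # quoted-pair: the next character is taken literally
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def same_meaning(a: str, b: str) -> bool:
    """Compare two values after dropping comments and extra whitespace.

    Simplification: whitespace is collapsed everywhere, including
    inside quoted-strings, where it would really be significant.
    """
    def normalise(v: str) -> str:
        return " ".join(strip_comments(v).split())
    return normalise(a) == normalise(b)


print(same_meaning("foo (a comment (nested)) bar", "foo   bar"))
```

   Because the comment is removed wherever it was attached, it does not
   matter whether the grammar assigned it to the construct before it or
   the one after it.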

        NOTE: Following [RFC 2234], literal text included in the syntax
        is to be regarded as case-insensitive.  However, in
        contradistinction to [MESSFOR], the Netnews protocols are
        sensitive to case in some instances (as in newsgroup names, some
        header parameters, etc.). Care has been taken to indicate this
        explicitly where required.
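
   The distinction drawn in the NOTE above can be sketched in Python as
   two different comparisons.  The function names are hypothetical and
   the fragment only illustrates the principle: literals in the grammar
   match regardless of case, whereas newsgroup names are compared
   exactly.

```python
def matches_literal(candidate: str, literal: str) -> bool:
    """ABNF quoted literals match without regard to case [RFC 2234]."""
    return candidate.lower() == literal.lower()


def same_newsgroup(a: str, b: str) -> bool:
    """Newsgroup names are case-sensitive, so compare them exactly."""
    return a == b


print(matches_literal("SUBJECT", "Subject"))
print(same_newsgroup("comp.lang.c", "Comp.Lang.C"))
```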
usefor-usefor May 2005
usefor-usefor April 2005
usefor-usefor November 2004
usefor-usefor September 2004
News Article Format and Transmission May 2004
News Article Format and Transmission November 2003
News Article Format June 2003
News Article Format April 2003
News Article Format February 2003
News Article Format August 2002
News Article Format May 2002
News Article Format November 2001
News Article Format July 2001
News Article Format April 2001
Son of 1036 June 1994

--- ../s-o-1036/Syntax_Notation.out          June 1994
+++ ../usefor-article-03/Syntax_Notation.out          February 2000
@@ -1,47 +1,134 @@
-2.2. Syntax Notation
+2.4.  Syntax Notation
 
-Although the mechanisms specified  in  this  Draft  are  all
-described  in prose, most are also described formally in the
-modified BNF notation of RFC 822.  Implementors will need to
-be  familiar  with  this  notation  to fully understand this
-specification, and are referred to RFC 822  for  a  complete
-explanation  of  the modified BNF notation.  Here is a brief
-illustrative example:
-
-     sentence  = clause *( punct clause ) "."
-     punct     = ":" / ";"
-     clause    = 1*word [ "(" clause ")" / "," 1*word ]
-     word      = <any English word>
-
-This defines a sentence as some clauses separated by  puncts
-and  ended  by  a period, a punct as a colon or semicolon, a
-clause as at least one <word> optionally followed by  either
-a  parenthesized  clause  or  a  comma and at least one more
-<word>, and a <word> as (informally) any English  word.   <>
-are  used to enclose names when (and only when) distinguish-
-ing them from surrounding text is useful.  The full form  of
-the  repetition  notation  is <m>"*"<n><thing>, denoting <m>
-through <n> repetitions of <thing>; <m>  defaults  to  zero,
-<n>  to  infinity, and the "*" and <n> can be omitted if <m>
-and <n> are equal, so 1*word is one or more  words,  1*5word
-is one through five words, and 2word is exactly two words.
-
-The  character  "\"  is not special in any way in this nota-
-tion.
-
-This Draft is intended  to  be  self-contained;  all  syntax
-rules  used in it are defined within it, and a rule with the
-same name as one found in MAIL does not necessarily have the
-same  definition.   The lexical layer of MAIL is NOT, repeat
-NOT, used in this  Draft,  and  its  presence  must  not  be
-assumed;  notably,  this  Draft  spells out all places where
-
-INTERNET DRAFT to be        NEWS                    sec. 2.2
-
-
-white space is permitted/required and all places where  con-
-structs resembling MAIL comments can occur.
-
-     NOTE:  News  parsers  historically  have been much
-     less permissive than MAIL parsers.
+   This standard uses the Augmented Backus-Naur Form (ABNF) described
+   in [RFC 2234].  A discussion of that notation is outside the scope
+   of this standard, but implementors are expected to be able to
+   understand it quickly by reference to the defining document.
+
+   Much of the syntax of News Articles is based on the corresponding
+   syntax defined in [MESSFOR] or in the MIME specifications [RFC 2045]
+   et seq., which is deemed to have been incorporated into this standard
+   as required.  However, there are some important differences arising
+   from the fact that [MESSFOR] does not recognise anything other than
+   US-ASCII characters, that it does not recognise the MIME headers [RFC
+   2045], and that it includes much syntax described as "obsolete".
+
+        NOTE:  News parsers historically have been much less permissive
+        than Mail parsers, and this is reflected in the modifications
+        referred to, and in some further specific rules.
+
+   The following syntactic forms therefore supersede the corresponding
+   rules given in [MESSFOR] and [RFC 2045], thus allowing UTF-8
+   characters [RFC 2044] to appear in certain contexts (the four rules
+   beginning with "strict-" reflect the corresponding original rules from
+   [MESSFOR]).
+
+      UTF8-xtra-head  = %d192-253
+      UTF8-xtra-tail  = %d128-191
+      UTF8-xtra-char  = UTF8-xtra-head 1*UTF8-xtra-tail
+      text            = %d1-9 /            ; all UTF-8 characters except
+              %d11-12 /          ; US-ASCII NUL, CR and LF
+              %d14-127 /
+              UTF8-xtra-char
+      ctext           = NO-WS-CTL /        ; all of <text> except
+              %d33-39 /          ; SP, HTAB, "(", ")"
+              %d42-91 /          ; and "\"
+              %d93-126 /
+              UTF8-xtra-char
+      qtext           = NO-WS-CTL /        ; all of <text> except
+              %d33 /             ; SP, HTAB, "\" and DQUOTE
+              %d35-91 /
+              %d93-126 /
+              UTF8-xtra-char
+      utext           = NO-WS-CTL /        ; Non white space controls
+              %d33-126 /         ; The rest of US-ASCII
+              UTF8-xtra-char
+      strict-text     = %d1-9 /            ; text restricted to
+              %d11-12 /          ; US-ASCII
+              %d14-127
+      strict-qtext    = NO-WS-CTL /        ; qtext restricted to
+              %d33 /             ; US-ASCII
+              %d35-91 /
+              %d93-127
+      strict-quoted-pair
+            = "\" strict-text
+      strict-quoted-string
+            = [CFWS] DQUOTE
+                 *([FWS] (strict-qtext / strict-quoted-pair))
+                 [FWS] DQUOTE [CFWS]
+
+        NOTE: There are sequences of octets which cannot legitimately
+        occur in UTF-8, even a few permitted by the above syntax.  These
+        SHOULD NOT be generated by posting agents but, where they occur
+        inadvertently, they SHOULD be passed on untouched by other
+        agents.
+
+   Wherever in this standard the syntax is stated to be taken from
+   [MESSFOR], it is to be understood as the syntax defined by [MESSFOR]
+   after making the above changes, but NOT including any syntax defined
+   in section 4 ("Obsolete syntax") of [MESSFOR].  Software compliant
+   with this standard MUST NOT generate any of the syntactic forms
+   defined in that Obsolete Syntax, although it MAY accept such
+   syntactic forms. Certain syntax from the MIME specifications [RFC
+   2045] et seq. is also considered a part of this standard (see 6.17).
+
+   The following syntactic forms, taken from [RFC 2234] or from
+   [MESSFOR], are repeated here for convenience only:
+
+      ALPHA           = %x41-5A /          ; A-Z
+              %x61-7A            ; a-z
+      CR              = %x0D               ; carriage return
+      CRLF            = CR LF
+      DIGIT           = %x30-39            ; 0-9
+      HTAB            = %x09               ; horizontal tab
+      LF              = %x0A               ; line feed
+      SP              = %x20               ; space
+      NO-WS-CTL       = %d1-8 /            ; US-ASCII control characters
+              %d11 /             ; which do not include the
+              %d12 /             ; carriage return, line feed,
+              %d14-31 /          ; and whitespace characters
+              %d127
+      WSP             = SP / HTAB          ; Whitespace characters
+      FWS             = ([*WSP CRLF] 1*WSP); Folding whitespace
+      atext           = ALPHA / DIGIT /
+              "!" / "#" /        ; Any character except
+              "$" / "%" /        ; controls, SP, and specials.
+              "&" / "'" /        ; Used for atoms
+              "*" / "+" /
+              "-" / "/" /
+              "=" / "?" /
+              "^" / "_" /
+              "`" / "{" /
+              "|" / "}" /
+              "~"
+      atom            = [CFWS] 1*atext [CFWS]
+      dot-atom        = [CFWS] dot-atom-text [CFWS]
+      dot-atom-text   = 1*atext *( "." 1*atext )
+      comment         = "(" *([FWS]
+                 (ctext / quoted-pair / comment)) [FWS] ")"
+      CFWS            = *([FWS] comment) (([FWS] comment) / FWS )
+      DQUOTE          = %d34              ; quote mark
+      quoted-pair     = "\" text
+      quoted-string   = [CFWS] DQUOTE
+                 *([FWS] (qtext / quoted-pair))
+                 [FWS] DQUOTE [CFWS]
+      unstructured    = *( [FWS] utext ) [FWS]
+
+        NOTE: CFWS occurs at many places in the syntax in order to allow
+        comments and extra whitespace to be inserted almost anywhere.
+        The syntax is in fact ambiguous insofar as it may be impossible
+        to tell in which of several possible ways a given comment or WS
+        was produced. However, this does not lead to semantic ambiguity
+        because, unless specifically stated otherwise, the presence or
+        absence of a comment or additional WS has no semantic meaning
+        and, in particular, it is a matter of indifference whether it
+        forms a part of the syntactic construct preceding it or the one
+        following it.
+
+        NOTE: Following [RFC 2234], literal text included in the syntax
+        is to be regarded as case-insensitive.  However, in
+        contradistinction to [MESSFOR], the Netnews protocols are
+        sensitive to case in some instances (as in newsgroup names, some
+        header parameters, etc.). Care has been taken to indicate this
+        explicitly where required.
 

Documents were processed to this format by Forrest J. Cavalier III