Son-of-RFC1036

          Header and body lines MAY contain any ASCII characters other
          than CR (ASCII 13), LF (ASCII 10), and NUL (ASCII 0).

               NOTE:  CR  and  LF are excluded because they clash
               with common  EOL  conventions.   NUL  is  excluded
               because  it  clashes with the C end-of-string con-
               vention, which is  significant  to  most  existing
               news   software.    These   three  characters  are
               unlikely to be transmitted successfully.
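
          As an illustration only, a posting or relaying agent could
          check each line for the three excluded characters before
          accepting it.  The sketch below assumes the line has already
          been split from its EOL representation; the name
          forbidden_positions is hypothetical and not part of this
          Draft.

```python
# Octet values excluded from header and body lines: NUL, LF, CR.
FORBIDDEN = {0, 10, 13}


def forbidden_positions(line: bytes) -> list[int]:
    """Return the offsets of NUL, LF, and CR octets within one line.

    The line is assumed to have had its end-of-line marker removed
    already, so any CR or LF found here is embedded in the content.
    """
    return [i for i, octet in enumerate(line) if octet in FORBIDDEN]


if __name__ == "__main__":
    print(forbidden_positions(b"Subject: hello"))
    print(forbidden_positions(b"bad\x00line"))
```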

          However, posters SHOULD avoid using ASCII control characters
          except for tab (ASCII 9), formfeed (ASCII 12), and backspace
          (ASCII 8).  Tab signifies sufficient horizontal white  space
          to  reach  the next of a set of fixed positions; posters are
          warned that there is no standard set of positions,  so  tabs
          should be avoided if precise spacing is essential.  Formfeed
          signifies a point at which a reading agent SHOULD pause  and
          await  reader  interaction  before  displaying further text.
          Backspace SHOULD be used only for  underlining,  done  by  a
          sequence of underscores (ASCII 95) followed by an equal num-
          ber of backspaces, signifying that the same number  of  text
          characters  following  are  to  be  underlined.  Posters are
          warned that underlining  is  not  available  on  all  output
          devices  and  is  best  not relied on for essential meaning.
          Reading agents SHOULD recognize underlining and translate it
          to the appropriate commands for devices that support it.

               NOTE: Interpretation of almost all control charac-
               ters  is  device-specific  to  some  degree,   and
               devices  differ.   Tabs  and  underlining are sup-
               ported, to some extent, by most modern devices and
               reading  agents, hence the cautious exemptions for
               them.  The underlining method is specified because
               the  inverse method, text and then underscores, is
               tempting to the naive... but if sent unaltered  to
               a  device  that shows only the most recent of sev-
               eral overstruck characters rather than  a  compos-
               ite, the result can be utterly unreadable.
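
          One way a reading agent might recognize the underlining
          convention is sketched below.  It assumes the article text
          is already available as a string; parse_underlining and
          render_plain are hypothetical names.  Runs in which the
          underscore and backspace counts differ are left untouched,
          which is a choice of this sketch rather than a requirement.

```python
import re

# One or more underscores immediately followed by one or more backspaces.
_UNDERLINE = re.compile(r"(_+)(\x08+)")


def parse_underlining(text: str) -> list[tuple[str, bool]]:
    """Return (character, underlined) pairs for the given text.

    A run of n underscores followed by exactly n backspaces marks the
    next n characters as underlined; the run itself is consumed.
    """
    out = []
    i = 0
    while i < len(text):
        m = _UNDERLINE.match(text, i)
        if m and len(m.group(1)) == len(m.group(2)):
            n = len(m.group(1))
            start = m.end()
            for ch in text[start:start + n]:
                out.append((ch, True))
            i = start + n
        else:
            out.append((text[i], False))
            i += 1
    return out


def render_plain(text: str) -> str:
    """Drop the underlining for a device that cannot show it."""
    return "".join(ch for ch, _ in parse_underlining(text))
```

          A device-specific renderer would replace render_plain and
          emit whatever commands the output device uses for the
          characters flagged True.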

               NOTE: A common interpretation of tab is that it is
               a request to space forward to  the  next  position
               whose  number  is  one  more than a multiple of 8,
               with positions numbered sequentially  starting  at
               1.  (So tab positions are 9, 17, 25, ...)  Reading
               agents not constrained by existing system  conven-
               tions might wish to use this interpretation.
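
          The arithmetic behind that interpretation can be sketched
          as follows.  The function name expand_tabs and its width
          parameter are assumptions made for illustration, and the
          sketch assumes the line contains no other control
          characters that move the cursor.

```python
def expand_tabs(line: str, width: int = 8) -> str:
    """Replace each tab with spaces up to the next tab position.

    Positions are numbered from 1, and tab positions are those whose
    number is one more than a multiple of width (9, 17, 25, ... for 8).
    """
    out = []
    column = 1  # position the next character will occupy
    for ch in line:
        if ch == "\t":
            # Smallest position > column - 1 of the form k * width + 1.
            target = ((column - 1) // width + 1) * width + 1
            out.append(" " * (target - column))
            column = target
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

          For a single line this gives the same result as Python's
          built-in str.expandtabs(8), which counts columns from 0
          rather than 1.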

               NOTE: It will typically be necessary for a reading
               agent to catch and interpret  formfeed,  not  just
               send  it  to  the output device.  The actions per-
               formed by typical output devices  on  receiving  a
               formfeed  are neither adequate for nor appropriate
               to the pause-for-interaction meaning.
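
          A minimal sketch of catching formfeed in the reading agent,
          rather than forwarding it, is shown below.  The names
          split_pages and show_article, the "--More--" prompt, and
          the pause and out callables are all assumptions of this
          example.

```python
def split_pages(body: str) -> list[str]:
    """Split an article body into pages at each formfeed (ASCII 12)."""
    return body.split("\f")


def show_article(body: str, pause=input, out=print) -> None:
    """Display the body one page at a time.

    The formfeed itself is never sent to the output; instead the
    agent calls pause() and waits for the reader between pages.
    """
    pages = split_pages(body)
    for n, page in enumerate(pages):
        out(page)
        if n < len(pages) - 1:
            pause("--More--")
```

          Passing pause and out as parameters keeps the sketch
          testable; an interactive agent would use its own prompt and
          display routines.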

          Cooperating subnets which wish to employ non-ASCII character
          sets  by using escape sequences (employing, e.g., ESC (ASCII
          27), SO (ASCII 14), and SI (ASCII 15)) to alter the  meaning
          of  superficially-ASCII  characters  MAY do so, but MUST use
          MIME headers to alert reading agents to the particular char-
          acter  set(s)  and escape sequences in use.  A reading agent
          SHOULD NOT pass such an escape sequence through,  unaltered,
          to  the  output  device  unless  the agent confirms that the
          sequence is one used to affect character sets and has reason
          to  believe  that the device is capable of interpreting that
          particular sequence properly.

               NOTE:  Cooperating-subnet  organizers  are  warned
               that  some very old relayers strip certain control
               characters out of articles they pass  along.   ESC
               is known to be among the affected characters.

               NOTE:  There  are  now standard Internet encodings
               for Japanese [rrr] and Vietnamese [rrr] in partic-
               ular.

          Articles  MUST  NOT  contain  any octet with value exceeding
          127, i.e. any octet that is not an ASCII character.

               NOTE: This rule, like others, may  be  relaxed  by
               unanimous  consent of the members of a cooperating
               subnet, provided suitable precautions are taken to
               ensure  that  rule-violating  articles do not leak
               out of the subnet.  (This has already been done in
               many  areas  where  ASCII  is not adequate for the
               local language(s).)  Beware that articles contain-
               ing non-ASCII octets in headers are a violation of
               the MAIL specifications and  are  not  valid  MAIL
               messages.   MIME  offers a way to encode non-ASCII
               characters in ASCII for use in headers;  see  sec-
               tion 4.5.
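
          As a hedged illustration of the MIME header encoding just
          mentioned, the sketch below uses the email.header module of
          the Python standard library.  The helper name
          encode_header_value and the choice of ISO-8859-1 as the
          default character set are assumptions of this example, not
          something this Draft specifies.

```python
from email.header import Header, decode_header


def encode_header_value(text: str, charset: str = "iso-8859-1") -> str:
    """Encode a header value as MIME encoded-words using only ASCII."""
    return Header(text, charset).encode()


def decode_header_value(encoded: str) -> str:
    """Decode the encoded-words produced above back into text."""
    parts = []
    for raw, charset in decode_header(encoded):
        if isinstance(raw, bytes):
            parts.append(raw.decode(charset or "ascii"))
        else:
            parts.append(raw)
    return "".join(parts)


if __name__ == "__main__":
    wire = encode_header_value("Caf\u00e9")
    print(wire)
    print(decode_header_value(wire))
```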

               NOTE: While there is great interest in using 8-bit
               character sets, not all software  can  yet  handle
               them  correctly.  Hence the restriction to cooper-
               ating subnets.  MIME  encodings  can  be  used  to
               transmit  such  characters  while remaining within
               the octet restriction.

          In anticipation of the day when it is possible to  use  non-
          ASCII  characters  safely  anywhere,  and to provide for the
          (substantial) cooperating subnets  that  are  already  using
          them, transmission paths SHOULD treat news articles as unin-
          terpreted sequences of octets (except perhaps for  transfor-
          mations  between  EOL  representations)  and relayers SHOULD
          treat non-ASCII characters in articles as  ordinary  charac-
          ters.

               NOTE:  8-bit  enthusiasts  are warned that not all
               software conforms to  these  recommendations  yet.
               In particular, standard NNTP [rrr] is a 7-bit pro-
               tocol, and  there  may  be  implementations  which
               enforce  this rule.  Be warned, also, that it will
               never be safe to send raw binary data in the  body
               of news articles, because changes of EOL represen-
               tation may (will!) corrupt it.
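
          The corruption described above is easy to reproduce.  The
          sketch below models a transmission path that converts
          between LF and CR LF line endings; the helper names to_crlf
          and to_lf are hypothetical, and real paths may use other
          EOL representations.

```python
import base64


def to_crlf(data: bytes) -> bytes:
    """Convert every line ending to CR LF, as some paths do."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def to_lf(data: bytes) -> bytes:
    """Convert CR LF line endings back to a bare LF."""
    return data.replace(b"\r\n", b"\n")


# Ordinary text survives the round trip.
text = b"line one\nline two\n"
assert to_lf(to_crlf(text)) == text

# Binary data that mixes CR LF with a bare LF does not: the path
# cannot tell which of the two the sender originally meant.
binary = b"\x89PNG\r\n\x1a\n"
assert to_lf(to_crlf(binary)) != binary

# The same data in a MIME base64 encoding contains no EOL octets,
# so the round trip leaves it intact.
encoded = base64.b64encode(binary)
assert base64.b64decode(to_lf(to_crlf(encoded))) == binary
```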

          Except  where  cooperating  subnets   permit   more   direct
          approaches,  MIME [rrr] headers and encodings SHOULD be used
          to transmit non-ASCII content using  ASCII  characters;  see
          section  4.5, appendix B, and the MIME RFCs for details.  If
          article content can be expressed in  ASCII,  it  SHOULD  be.
          Failing  that, the order of preference for character sets is
          that described in MIME [rrr].

               NOTE: Using the MIME facilities, it is possible to
               transmit ANY character set, and ANY form of binary
               data, using only ASCII characters.  Equally impor-
               tant,  such  articles  are self-describing and the
               reading agent can tell which octet-to-symbol  map-
               ping  is  intended!  Designation of some preferred
               character sets is intended to minimize the  number
               of character sets that a reading agent must under-
               stand in order to display most articles  properly.
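
          To illustrate what self-describing means in practice, the
          sketch below builds a body with the Python standard
          library's email.message.EmailMessage.  The function name
          build_body, the ISO-8859-1 character set, and the
          quoted-printable encoding are choices made for this
          example only.

```python
from email.message import EmailMessage


def build_body(text: str, charset: str = "iso-8859-1") -> EmailMessage:
    """Wrap text in a message whose MIME headers name the character
    set and whose transmitted form uses only ASCII octets."""
    msg = EmailMessage()
    msg.set_content(text, charset=charset, cte="quoted-printable")
    return msg


if __name__ == "__main__":
    msg = build_body("na\u00efve text")
    # The Content-Type and Content-Transfer-Encoding headers tell the
    # reading agent how to map the ASCII octets back to symbols.
    print(msg.as_bytes().decode("ascii"))
```

          A reading agent can recover the text with msg.get_content(),
          which consults the same headers.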

          Articles  containing  non-ASCII  characters,  articles using
          ASCII characters (values 0 through 127)  to  refer  to  non-
          ASCII  symbols, and articles using escape sequences to shift
          character sets SHOULD include MIME headers indicating  which
          character set(s) and conventions are being used, and MUST do
          so  unless  such  articles  are  strictly  confined   to   a
          cooperating subnet which has its own pre-agreed conventions.
          MIME encodings are preferred over all these techniques.   If
          it  comes to a relayer's attention that it is being asked to
          pass an article using such techniques outward across what it
          knows  to  be  the boundary of such a cooperating subnet, it
          MUST report this error to its administrator, and MAY  refuse
          to  pass the article beyond the subnet boundary.  If it does
          pass the article, it MUST re-encode it with  MIME  encodings
          to make it conform to this Draft.

               NOTE:  Such re-encoding is a non-trivial task, due
               to MIME rules such as the  prohibition  of  nested
               encodings.   It's not just a matter of pouring the
               body through a simple filter.
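
          Detecting that an article is a candidate for such handling
          is much simpler than re-encoding it.  The sketch below only
          flags articles for closer inspection at a subnet boundary;
          it does not examine MIME headers, so an article that
          already declares its escape sequences properly would also
          be flagged.  The name boundary_concerns is hypothetical.

```python
# Control characters commonly used to shift character sets.
SHIFT_CONTROLS = {0x1B: "ESC", 0x0E: "SO", 0x0F: "SI"}


def boundary_concerns(article: bytes) -> list[str]:
    """List reasons an article may need attention before it is
    passed outward across a cooperating-subnet boundary."""
    reasons = []
    if any(octet > 127 for octet in article):
        reasons.append("octet above 127")
    for value, name in SHIFT_CONTROLS.items():
        if value in article:
            reasons.append(name + " present")
    return reasons
```

          An empty list means the sketch found nothing to report; a
          non-empty list is only a prompt for the further checks and
          administrator report described above.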

          Reading agents SHOULD note MIME headers and attempt to  show
          the   reader  the  closest  possible  approximation  to  the
          intended content.  They SHOULD NOT just send the  octets  of
          the  article to the output device unaltered, unless there is
          reason to believe that the output device will indeed  inter-
          pret  them  correctly.   Reading  agents MUST NOT pass ASCII
          control characters or escape sequences, other than  as  dis-
          cussed above, unaltered to the output device; only by chance
          would the result be the desired one, and  there  is  serious
          potential  for  harmful  side  effects, either accidental or
          malicious.

               NOTE: Exactly what to  do  with  unwanted  control
               characters/sequences  depends on the philosophy of
               the reading agent, but passing  them  straight  to
               the  output device is almost always wrong.  If the
               reading agent wants to mark the presence of such a
               character/sequence  in  circumstances  where  only
               ASCII printable characters are  available,  trans-
               lating  it  to "#" might be a suitable method; "#"
               is a conspicuous character seldom used  in  normal
               text.
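
          The "#" translation might be sketched as follows.  The set
          of controls left alone here (tab, line feed, formfeed, and
          backspace) assumes that the agent handles those separately
          as discussed earlier; the name sanitize is hypothetical.
          Note that only the control character itself is replaced,
          so the printable remainder of an escape sequence stays
          visible.

```python
# Controls the agent is assumed to interpret itself elsewhere.
ALLOWED_CONTROLS = {"\t", "\n", "\f", "\b"}


def sanitize(text: str, marker: str = "#") -> str:
    """Replace unwanted ASCII control characters with a marker."""
    out = []
    for ch in text:
        code = ord(ch)
        is_control = code < 32 or code == 127
        if is_control and ch not in ALLOWED_CONTROLS:
            out.append(marker)
        else:
            out.append(ch)
    return "".join(out)
```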

               NOTE: Reading agents should be aware that many old
               output devices (or the transmission paths to them)
               zero out the top bit of octets sent to them.  This
               can transform non-ASCII characters into ASCII con-
               trol characters.
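
          The effect is simple to demonstrate.  In the sketch below,
          strip_top_bit is a hypothetical stand-in for what such a
          device or path does to each octet.

```python
def strip_top_bit(data: bytes) -> bytes:
    """Clear the top bit of every octet, as some old devices do."""
    return bytes(octet & 0x7F for octet in data)


# Octet 0x9B loses its top bit and becomes 0x1B, which is ESC.
assert strip_top_bit(b"\x9b") == b"\x1b"

# Octet 0x8E becomes 0x0E, which is SO.
assert strip_top_bit(b"\x8e") == b"\x0e"

# Pure ASCII is unaffected.
assert strip_top_bit(b"plain") == b"plain"
```

          A reading agent that filters control characters before the
          top bit is stripped would therefore miss these cases, so
          any filtering is best done on the octets as the device will
          actually see them.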

          Followup  agents MUST be careful to apply appropriate trans-
          formations of representation to  the  outbound  followup  as
          well  as  the  inbound  precursor.  A followup to an article
          containing non-ASCII material is very likely to contain non-
          ASCII material itself.