INTERNET-DRAFT                               Charles H. Lindsey
Usenet Format Working Group                  University of Manchester
                                             July 2001

6.21.3.  Content-Transfer-Encoding
   "Content-Transfer-Encoding: 7bit" is sufficient for article bodies
   (or parts of multiparts) written in pure US-ASCII (or most other
   material representable in 7 bits).  Posting agents SHOULD specify
   "Content-Transfer-Encoding: 8bit" for all other cases unless there
   are pressing reasons to do otherwise. They MAY use "8bit" encoding
   even when "7bit" encoding would have sufficed. Examples of such
   pressing reasons are the following:

   1. The content type implies that the content is (or may be) "8bit-
      unsafe"; i.e.  it may contain octets equivalent to the US-ASCII
      characters CR or LF (other than in the combination CRLF) or NUL.
      In that case one of the Content-Transfer-Encodings "base64" or
      "quoted-printable" MUST be used, and reading agents MUST be able
      to handle both of them. Encoding "binary" MUST NOT be used (except
      in cooperating subnets with alternative transport arrangements)
      because this standard does not mandate a transport mechanism that
      could support it.

        NOTE: If a future extension to the MIME standards were to
        provide a more compact encoding of binary suited to transport
        over an 8bit channel, it could be considered as an alternative
        to base64 once it had gained widespread acceptance.

   2. Many "application" Content-Types are textual in nature, and
      intelligible to humans as well as to machines. Where the posting
      agent can recognize this (either through knowledge of the
      particular application type or by testing), the material SHOULD
      NOT be treated as 8bit-unsafe. This has the added benefit, where
      the posting agent uses other than CRLF for line endings
      internally, of automatically ensuring that line endings are
      processed correctly during transport.

      If, on the other hand, the posting agent recognizes that the
      material is not textual, or cannot reasonably determine it to be
      so, then the material MUST be encoded as for 8bit-unsafe (however,
      in that case, it is the responsibility of the agent generating the
      material to ensure that line endings, if any, are represented
      correctly).

        NOTE: All the application types defined by this standard, namely
        "application/news-transmission", "application/news-groupinfo"
        and "application/news-checkgroups" are textual, and indeed
        designed for human reading.

   3. Although the "text" Content-Types should normally be encoded as
      8bit (or 7bit), if the character set specified by the "charset="
      parameter can include the 3 disallowed octets, then the material
      MUST be encoded as for 8bit-unsafe.  This is most likely to arise
      in the case of 16-bit character sets such as UTF-16 ([UNICODE3.1]
      or [ISO/IEC 10646]).  In addition, where it is known that the
      material is subsequently to be gatewayed from news to mail (8.8),
      the encoding "quoted-printable" MAY be used (otherwise the gateway
      might have to re-encode it itself).

   4. Some protocols REQUIRE the use of a particular Content-Transfer-
      Encoding. In particular, the authentication protocol based on
      [Open]PGP defined in [RFC 2015] and/or [RFC 2015bis] mandates the
      use of one of the encodings "quoted-printable" or "base64".
      Whilst posters might be tempted to risk the use of "8bit" or
      "7bit" encodings (and indeed the referenced standard recommends
      that signed messages using those encodings be accepted and
      interpreted), they should be warned that differences in the
      treatment of trailing whitespace between OpenPGP [RFC 2440] and
      earlier versions of PGP may render signatures written with the one
      unverifiable by the other; and, moreover, Usenet articles are very
      likely to include trailing whitespace in the form of a personal
      signature (4.3.2).
[It is to be hoped that [RFC 2015bis] will have progressed to a full RFC
by the time this draft is finalized.]

   5. The Content-Type message/partial [RFC 2046] is required to use
      encoding "7bit" (the encapsulated complete message may itself use
      encoding "quoted-printable" or "base64", but that information is
      only conveyed along with the first of the partial parts).

        NOTE: Although there would actually be no problem using encoding
        "8bit" in a pure Netnews (as opposed to mail) environment, this
        standard discourages (see 6.21.2.1) the use of "message/partial"
        except for binary material, which will be encoded to pass
        through "7bit" in any case.
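
   As an illustration only, the default choice and the "8bit-unsafe"
   test of items 1 to 3 might be sketched as follows. The function
   names are hypothetical and not defined by this standard. The sketch
   assumes the content is already in its wire form (CRLF line endings),
   ignores line-length limits, and does not cover the protocol-specific
   cases of items 4 and 5. It picks "base64" for unsafe content, though
   "quoted-printable" would be equally permitted.

```python
import re

# Hypothetical helper names; nothing here is defined by the draft.
# Matches a CR not followed by LF, or an LF not preceded by CR.
_BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def is_8bit_unsafe(body: bytes) -> bool:
    """Return True if body contains NUL, or a CR or LF outside CRLF."""
    return b"\x00" in body or _BARE_CR_OR_LF.search(body) is not None


def choose_cte(body: bytes) -> str:
    """Pick a Content-Transfer-Encoding for content in wire form."""
    if is_8bit_unsafe(body):
        # Assumption: base64 chosen; quoted-printable is also allowed.
        return "base64"
    if all(octet < 0x80 for octet in body):
        # Pure 7-bit material; "8bit" MAY be used here as well.
        return "7bit"
    return "8bit"
```

   Under these assumptions, UTF-8 text with CRLF line endings selects
   "8bit", whereas the same text in UTF-16 contains NUL octets and so
   selects "base64", as described in item 3.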

   Injecting and relaying agents MUST NOT change the encoding of
   articles passed to them. Gateways SHOULD NOT change the encoding
   unless absolutely necessary.
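
   On the reading side, item 1 requires reading agents to handle both
   "base64" and "quoted-printable". A minimal sketch of such a decoder,
   using the Python standard library modules base64 and quopri, is
   given below. The function name decode_body is hypothetical, and
   rejecting "binary" outright is an assumption made for this sketch
   rather than a rule stated for reading agents.

```python
import base64
import quopri


def decode_body(body: bytes, cte: str) -> bytes:
    """Undo the Content-Transfer-Encoding of a body (hypothetical helper)."""
    cte = cte.strip().lower()  # encoding names are case-insensitive
    if cte == "base64":
        return base64.b64decode(body)
    if cte == "quoted-printable":
        return quopri.decodestring(body)
    if cte in ("7bit", "8bit"):
        return body  # identity encodings: nothing to undo
    # Assumption: "binary" and unknown encodings are treated as errors.
    raise ValueError("unsupported Content-Transfer-Encoding: " + cte)
```

   Such a decoder serves only to present the article to the reader. It
   is not a licence for injecting or relaying agents to re-encode
   articles passed to them.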

Previous draft (04): 6.21.2. Content-Transfer-Encoding

Diffs to previous draft

--- {draft-04}	Wed Jul 11 21:55:52 2001
+++ {draft-05}	Wed Jul 11 21:55:53 2001
@@ -1,21 +1,84 @@
-
6.21.2.  Content-Transfer-Encoding
-   Posting agents SHOULD specify "Content-Transfer-Encoding: 8bit" for
-   all articles not written in pure US-ASCII and not requiring full
-   binary. They MAY use "8bit" encoding even when "7bit" encoding would
-   have sufficed. They SHOULD specify "base64" when the content type
-   implies binary (i.e. content intended for machine, rather than human,
-   consumption).
+
6.21.3.  Content-Transfer-Encoding
+   "Content-Transfer-Encoding: 7bit" is sufficient for article bodies
+   (or parts of multiparts) written in pure US-ASCII (or most other
+   material representable in 7 bits).  Posting agents SHOULD specify
+   "Content-Transfer-Encoding: 8bit" for all other cases unless there
+   are pressing reasons to do otherwise. They MAY use "8bit" encoding
+   even when "7bit" encoding would have sufficed. Examples of such
+   pressing reasons are the following:
+
+   1. The content type implies that the content is (or may be) "8bit-
+      unsafe"; i.e.  it may contain octets equivalent to the US-ASCII
+      characters CR or LF (other than in the combination CRLF) or NUL.
+      In that case one of the Content-Transfer-Encodings "base64" or
+      "quoted-printable" MUST be used, and reading agents MUST be able
+      to handle both of them. Encoding "binary" MUST NOT be used (except
+      in cooperating subnets with alternative transport arrangements)
+      because this standard does not mandate a transport mechanism that
+      could support it.
 
         NOTE: If a future extension to the MIME standards were to
         provide a more compact encoding of binary suited to transport
         over an 8bit channel, it could be considered as an alternative
         to base64 once it had gained widespread acceptance.
 
-   Posting agents SHOULD NOT specify encoding "quoted-printable", but
-   reading agents MUST interpret that encoding correctly.  Encoding
-   "binary" MUST NOT be used (except in cooperating subnets with
-   alternative transport arrangements) because this standard does not
-   mandate a transport mechanism that could support it.
+   2. It is often the case that "application" Content-Types are textual
+      in nature, and intelligible to humans as well as to machines, and
+      where this state can be recognized by the posting agent (either
+      through knowledge of the particular application type or by
+      testing) the material SHOULD NOT be treated as 8bit-unsafe; this
+      has the added benefit, where the posting agent uses other than
+      CRLF for line endings internally, of automatically ensuring that
+      line endings are processed correctly during transport.
+
+      If, on the other hand, the posting agent recognizes that the
+      material is not textual, or cannot reasonably determine it to be
+      so, then the material MUST be encoded as for 8bit-unsafe (however,
+      in that case, it is the responsibility of the agent generating the
+      material to ensure that lines endings, if any, are represented
+      correctly).
+
+        NOTE: All the application types defined by this standard, namely
+        "application/news-transmission", "application/news-groupinfo"
+        and "application/news-checkgroups" are textual, and indeed
+        designed for human reading.
+
+   3. Although the "text" Content-Types should normally be encoded as
+      8bit (or 7bit), if the character set specified by the "charset="
+      parameter can include the 3 disallowed octets, then the material
+      MUST be encoded as for 8bit-unsafe.  This is most likely to arise
+      in the case of 16-bit character sets such as UTF-16 ([UNICODE3.1]
+      or [ISO/IEC 10646]).  In addition, where it is known that the
+      material is subseqently to be gatewayed from news to mail (8.8),
+      the encoding "quoted-printable" MAY be used (otherwise the gateway
+      might have to re-encode it itself).
+
+   4. Some protocols REQUIRE the use of a particular Content-Transfer-
+      Encoding. In particular, the authentication protocol based on
+      [Open]PGP defined in [RFC 2015] and/or [RFC 2015bis] mandates the
+      use of one of the encodings "quoted-printable" or "base64".
+      Whilst posters might be tempted to risk the use of "8bit" or
+      "7bit" encodings (and indeed the referenced standard recommends
+      that signed messages using those encodings be accepted and
+      interpreted), they should be warned that differences in the
+      treatment of trailing whitespace between OpenPGP [RFC 2440] and
+      earlier versions of PGP may render signatures written with the one
+      unverifiable by the other; and, moreover, Usenet articles are very
+      likely to include trailing whitespace in the form of a personal
+      signature (4.3.2).
+[It is to be hoped that [RFC 2015bis] will have progressed to a full RFC
+by the time this draft is finalized.]
+
+   5. The Content-Type message/partial [RFC 2046] is required to use
+      encoding "7bit" (the encapsulated complete message may itself use
+      encoding "quoted-printable" or "base64", but that information is
+      only conveyed along with the first of the partial parts).
+
+        NOTE: Although there would actually be no problem using encoding
+        "8bit" in a pure Netnews (as opposed to mail) environment, this
+        standard discourages (see 6.21.2.1) the use of "message/partial"
+        except for binary material, which will be encoded to pass
+        through "7bit" in any case.
 
    Injecting and relaying agents MUST NOT change the encoding of
    articles passed to them. Gateways SHOULD NOT change the encoding