All octets found in headers MUST be ASCII characters. However,
it is desirable to have a way of encoding non-ASCII
characters, especially in "human-readable" headers such as
Subject. MIME [rrr] provides a way to do this. Full
details may be found in the MIME specifications; herewith a
quick summary to alert software authors to the issues...
     encoded-word = "=?" charset "?" encoding "?" codes "?="
     charset      = 1*tag-char
     encoding     = 1*tag-char
     tag-char     = <ASCII printable character except !()<>@,;:\"[]/?=>
     codes        = 1*code-char
     code-char    = <ASCII printable character except ?>
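As an illustration only, the grammar above can be turned into a regular expression. The sketch below assumes that "ASCII printable character" means octets 0x21 through 0x7E (space excluded), which the grammar does not state explicitly; the names ENCODED_WORD and parse_encoded_word are hypothetical and not part of this Draft.

```python
import re

# Characters the grammar excludes from tag-char.
_TAG_EXCLUDED = '!()<>@,;:\\"[]/?='

# Assumption: "ASCII printable" covers 0x21-0x7E, so space is not included.
_PRINTABLE = [chr(c) for c in range(0x21, 0x7F)]
_TAG_CHARS = "".join(c for c in _PRINTABLE if c not in _TAG_EXCLUDED)
_CODE_CHARS = "".join(c for c in _PRINTABLE if c != "?")

_TAG = "[" + re.escape(_TAG_CHARS) + "]+"    # 1*tag-char
_CODE = "[" + re.escape(_CODE_CHARS) + "]+"  # 1*code-char

# encoded-word = "=?" charset "?" encoding "?" codes "?="
ENCODED_WORD = re.compile(
    r"=\?(" + _TAG + r")\?(" + _TAG + r")\?(" + _CODE + r")\?="
)


def parse_encoded_word(text):
    """Return (charset, encoding, codes) if text is exactly one
    encoded word under the grammar, otherwise None."""
    m = ENCODED_WORD.fullmatch(text)
    return m.groups() if m else None
```

For example, parse_encoded_word("=?ISO-8859-1?Q?Andr=E9?=") yields the three fields, while a string containing "/" in the charset position is rejected. The sketch checks syntax only; it does not enforce the length limits or the per-header restrictions.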
An encoded word is a sequence of ASCII printable characters
that specifies the character set, encoding method, and bits
of (potentially) non-ASCII characters. Encoded words are
allowed only in certain positions in certain headers. Specific
headers impose restrictions on the content of encoded
words beyond that specified in this section. Posting agents
MUST ensure that any material resembling an encoded word
(complete with all delimiters), in a context where encoded
words may appear, really is an encoded word.
NOTE: The syntax is a bit ugly, but it was
designed to minimize chances of confusion with
legitimate header contents, and to satisfy difficult
constraints on use within existing headers.
An encoded word MUST NOT be more than 75 octets long. Each
line of a header containing encoded word(s) MUST be at most
76 octets long, not counting the EOL.
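A posting agent might check the two length limits along the following lines. This is a minimal sketch, not a definitive implementation: the function name limit_violations is hypothetical, the pattern used to find encoded words is a simplification of the grammar, and the input is assumed to be a single ASCII header line (so that characters and octets coincide).

```python
import re

# Simplified pattern for locating encoded words (an assumption; it is
# looser than the tag-char/code-char grammar).
_EW = re.compile(r"=\?[^?\s]+\?[^?\s]+\?[^?\s]+\?=")

MAX_WORD = 75  # octets per encoded word
MAX_LINE = 76  # octets per line holding encoded word(s), EOL not counted


def limit_violations(line):
    """Return a list of human-readable problems found in one header line."""
    problems = []
    body = line.rstrip("\r\n")  # the EOL does not count toward the limit
    words = _EW.findall(body)
    for word in words:
        if len(word) > MAX_WORD:
            problems.append("encoded word exceeds %d octets" % MAX_WORD)
    # The line limit applies only to lines containing encoded word(s).
    if words and len(body) > MAX_LINE:
        problems.append("line exceeds %d octets" % MAX_LINE)
    return problems
```

A line with no encoded words is never reported by this check, however long it is, because the 76-octet limit stated above applies only to lines that contain encoded words.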
NOTE: These limits are meant to bound the lookahead
needed to determine whether text that begins
"=?" is really an encoded word.
The details of charsets and encodings are defined by MIME
[rrr]; the sequence of preferred character sets is the same
as MIME's. Encoded words SHOULD NOT be used for content
expressible in ASCII.
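The following sketch shows one way a posting agent might honor both this recommendation and the 75-octet limit: ASCII text is passed through untouched, and other text is split into several encoded words. It rests on assumptions not made by this Draft: the charset defaults to "utf-8", the encoding is MIME's "B" (base64), and the name encode_phrase is hypothetical. It does not choose among preferred character sets and does not fold lines to the 76-octet limit.

```python
import base64

MAX_WORD = 75  # octets per encoded word


def _word(chunk, charset):
    """Build one "B"-encoded word for a piece of text."""
    codes = base64.b64encode(chunk.encode(charset)).decode("ascii")
    return "=?%s?B?%s?=" % (charset, codes)


def encode_phrase(text, charset="utf-8"):
    """Return text unchanged if it is pure ASCII; otherwise return one or
    more encoded words, each at most MAX_WORD long, separated by spaces."""
    if text.isascii():
        return text  # expressible in ASCII: no encoded words needed
    words, chunk = [], ""
    for ch in text:
        # Split between characters so that no multi-octet character is
        # divided across two encoded words.
        if chunk and len(_word(chunk + ch, charset)) > MAX_WORD:
            words.append(_word(chunk, charset))
            chunk = ch
        else:
            chunk += ch
    words.append(_word(chunk, charset))
    return " ".join(words)
```

Splitting character by character re-encodes the growing chunk on every step, which is wasteful but keeps the sketch short and makes the length check exact.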
When an encoded word is used, other than in a newsgroup name
(see section 5.5), it MUST be separated from any adjacent
non-space characters (including other encoded words) by
white space. Reading agents displaying the contents of
encoded words (as opposed to their encoded form) should
ignore white space adjacent to encoded words.
UNRESOLVED ISSUE: Should this section be deleted
entirely, or made much more terse? The material
is relevant, but too complex to discuss fully.
NOTE: The deletion of intervening white space permits
using multiple encoded words, implicitly concatenated
by the deletion, to encode text that
will not fit within a single 75-character encoded
word.
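A reading agent could implement the concatenation described in this NOTE roughly as follows. This is a sketch under stated assumptions: it handles only the "B" and "Q" encodings defined by MIME, it uses a simplified pattern rather than the full grammar, and it takes the conservative reading that white space is dropped only when it lies between two encoded words. The names decode_words, _decode_one, and _q_decode are hypothetical.

```python
import base64
import re

# Simplified pattern for locating encoded words (an assumption).
_EW = re.compile(r"=\?([^?\s]+)\?([^?\s]+)\?([^?\s]+)\?=")


def _q_decode(codes):
    """Decode MIME "Q" codes: "_" stands for space, "=XX" for a hex octet."""
    data = codes.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)


def _decode_one(match):
    """Decode a single encoded word; fall back to its raw form on failure."""
    charset, encoding, codes = match.groups()
    try:
        if encoding.upper() == "B":
            raw = base64.b64decode(codes)
        elif encoding.upper() == "Q":
            raw = _q_decode(codes)
        else:
            return match.group(0)  # unknown encoding: show as-is
        return raw.decode(charset)
    except (LookupError, ValueError):
        # Unknown charset, bad base64, or undecodable octets.
        return match.group(0)


def decode_words(text):
    """Replace encoded words by their decoded text, deleting white space
    that separates two adjacent encoded words."""
    out, pos, prev_was_word = [], 0, False
    for m in _EW.finditer(text):
        gap = text[pos:m.start()]
        # Keep the gap unless it is pure white space between two words.
        if not (prev_was_word and gap.strip() == ""):
            out.append(gap)
        out.append(_decode_one(m))
        pos, prev_was_word = m.end(), True
    out.append(text[pos:])
    return "".join(out)
```

With this sketch, two adjacent words such as "=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=" display as "ab", while the space between an encoded word and ordinary text is preserved.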
Reading-agent implementors are warned that although this
Draft completely specifies where encoded words may appear in
the headers it defines, there are other headers (e.g. the
MIME Content-Description header) that MAY contain them.