
12.11 rfc822 -- Parse RFC 2822 mail headers

This module defines a class, Message, which represents an "email message" as defined by the Internet standard RFC 2822 [12.6]. Such messages consist of a collection of message headers and a message body. This module also defines a helper class, AddressList, for parsing RFC 2822 addresses. Please refer to the RFC for information on the specific syntax of RFC 2822 messages.

The mailbox module provides classes to read mailboxes produced by various end-user mail programs.

class Message(file[, seekable])
A Message instance is instantiated with an input object as parameter. Message relies only on the input object having a readline() method; in particular, ordinary file objects qualify. Instantiation reads headers from the input object up to a delimiter line (normally a blank line) and stores them in the instance. The message body, following the headers, is not consumed.

This class can work with any input object that supports a readline() method. If the input object has seek and tell capability, the rewindbody() method will work; also, illegal lines will be pushed back onto the input stream. If the input object lacks seek but has an unread() method that can push back a line of input, Message will use that to push back illegal lines. Thus this class can be used to parse messages coming from a buffered stream.

The optional seekable argument is provided as a workaround for certain stdio libraries in which tell() discards buffered data before discovering that the lseek() system call doesn't work. For maximum portability, you should set the seekable argument to zero to prevent that initial tell() when passing in an unseekable object such as a file object created from a socket object.

Input lines as read from the file may be terminated either by CR-LF or by a single linefeed; a terminating CR-LF is replaced by a single linefeed before the line is stored.

All header matching is done independent of upper or lower case; e.g. m['From'], m['from'] and m['FROM'] all yield the same result.
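The behavior described above (read headers with readline() up to a blank line, normalize CR-LF, leave the body unread, match header names without regard to case) can be sketched as follows. MiniMessage is a hypothetical, heavily simplified stand-in written for illustration; it is not the module's Message class and it ignores continuation lines, push-back of illegal lines, and the seekable argument.

```python
import io


class MiniMessage:
    """Hypothetical illustration only; not the real rfc822.Message."""

    def __init__(self, fp):
        self.fp = fp
        self.headers = {}
        while True:
            line = fp.readline()
            # A terminating CR-LF is replaced by a single linefeed.
            if line.endswith("\r\n"):
                line = line[:-2] + "\n"
            # Stop at end of input or at the blank delimiter line.
            if line in ("", "\n"):
                break
            name, sep, value = line.partition(":")
            if not sep:
                # Not a header line; this sketch simply stops here.
                break
            # Store names in lower case so lookups ignore case.
            self.headers[name.strip().lower()] = value.strip()

    def __getitem__(self, name):
        return self.headers[name.lower()]


raw = "From: jane@example.com\r\nSubject: Hello\r\n\r\nBody text\r\n"
fp = io.StringIO(raw)
m = MiniMessage(fp)
print(m["From"] == m["from"] == m["FROM"])
rest = fp.read()  # the body is still available on the input object
print(repr(rest))
```

Only readline() is called on the input object, which is why any object offering that method would do in this sketch.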

class AddressList(field)
You may instantiate the AddressList helper class using a single string parameter, a comma-separated list of RFC 2822 addresses to be parsed. (The parameter None yields an empty list.)
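As a rough illustration of splitting a comma-separated address field, the sketch below uses getaddresses() from the standard email.utils module as a stand-in. This is an assumption made so the example also runs on Python 3, where the rfc822 module is not available; getaddresses() is a function taking a list of field values, not the AddressList class itself.

```python
from email.utils import getaddresses

# A single header field value holding two comma-separated addresses.
field = "Jane Doe <jane@example.com>, bob@example.org"

# getaddresses() expects a list of field values and returns
# (realname, email_address) pairs.
pairs = getaddresses([field])
for realname, address in pairs:
    print(repr(realname), address)
```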

quote(str)
Return a new string with backslashes in str replaced by two backslashes and double quotes replaced by backslash-double quote.

unquote(str)
Return a new string which is an unquoted version of str. If str ends and begins with double quotes, they are stripped off. Likewise if str ends and begins with angle brackets, they are stripped off.
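A short sketch of both helpers, assuming the functions of the same names in the standard email.utils module (used here because rfc822 is not available on Python 3); their behavior on these inputs matches the descriptions above.

```python
from email.utils import quote, unquote

raw = 'He said "hi" \\ bye'   # contains double quotes and one backslash

# Backslashes are doubled and double quotes gain a leading backslash.
quoted = quote(raw)
print(quoted)

# Surrounding double quotes or angle brackets are stripped.
name = unquote('"Jane Doe"')
addr = unquote("<jane@example.com>")
print(name, addr)
```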

parseaddr(address)
Parse address, which should be the value of some address-containing field such as To: or Cc:, into its constituent "realname" and "email address" parts. Returns a 2-tuple of that information, unless the parse fails, in which case a 2-tuple (None, None) is returned.
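A minimal sketch, assuming parseaddr() from the standard email.utils module as a stand-in for the function documented here. One difference to keep in mind: on a failed parse the email.utils version returns a pair of empty strings rather than (None, None).

```python
from email.utils import parseaddr

# Value of a To: or Cc: field (the header name itself is not included).
realname, address = parseaddr("Jane Doe <jane@example.com>")
print(realname)
print(address)

# A bare address yields an empty realname.
bare = parseaddr("bob@example.org")
print(bare)
```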

dump_address_pair(pair)
The inverse of parseaddr(), this takes a 2-tuple of the form (realname, email_address) and returns the string value suitable for a To: or Cc: header. If the first element of pair is false, then the second element is returned unmodified.
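The inverse direction can be sketched with formataddr() from the standard email.utils module, assumed here as the closest counterpart to dump_address_pair() that runs on Python 3. The exact quoting of the realname may differ between the two functions, so the example only relies on the round trip through parseaddr().

```python
from email.utils import formataddr, parseaddr

pair = ("Jane Doe", "jane@example.com")

# Build a string suitable for a To: or Cc: header value.
header_value = formataddr(pair)
print(header_value)

# Parsing the result gives back the original pair.
round_trip = parseaddr(header_value)
print(round_trip == pair)

# With a false first element, the address comes back unmodified.
only_address = formataddr(("", "jane@example.com"))
print(only_address)
```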

parsedate(date)
Attempts to parse a date according to the rules in RFC 2822. However, some mailers don't follow that format as specified, so parsedate() tries to guess correctly in such cases. date is a string containing an RFC 2822 date, such as 'Mon, 20 Nov 1995 19:12:08 -0500'. If it succeeds in parsing the date, parsedate() returns a 9-tuple that can be passed directly to time.mktime(); otherwise None will be returned. Note that the elements at indexes 6, 7, and 8 of the result tuple (weekday, day of the year, and daylight saving flag) are not usable.
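A small sketch of the call, assuming parsedate() from the standard email.utils module as a stand-in that also runs on Python 3. Only the first six elements of the result are inspected, since the remaining three are not meaningful.

```python
import time
from email.utils import parsedate

fields = parsedate("Mon, 20 Nov 1995 19:12:08 -0500")

# Year, month, day, hour, minute, second.
print(fields[:6])

# The 9-tuple can be handed to time.mktime(), which treats it as local time.
local_stamp = time.mktime(fields)

# An unparseable string yields None.
failed = parsedate("not a date")
print(failed)
```

Note that the -0500 offset plays no part in this result; parsedate_tz() is the function that reports it.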

parsedate_tz(date)
Performs the same function as parsedate(), but returns either None or a 10-tuple; the first 9 elements make up a tuple that can be passed directly to time.mktime(), and the tenth is the offset of the date's timezone from UTC (which is the official term for Greenwich Mean Time). (Note that the sign of the timezone offset is the opposite of the sign of the time.timezone variable for the same timezone; the latter variable follows the POSIX standard while this module follows RFC 2822.) If the input string has no timezone, the last element of the tuple returned is None. Note that the elements at indexes 6, 7, and 8 of the result tuple (weekday, day of the year, and daylight saving flag) are not usable.

mktime_tz(tuple)
Turn a 10-tuple as returned by parsedate_tz() into a UTC timestamp. If the timezone item in the tuple is None, assume local time. Minor deficiency: this first interprets the first 8 elements as a local time and then compensates for the timezone difference; this may yield a slight error around daylight saving time switch dates. The error is not enough to worry about for common use.
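The two functions are typically chained. The sketch below assumes parsedate_tz() and mktime_tz() from the standard email.utils module as stand-ins, and checks the result against calendar.timegm(), which converts a UTC time tuple to a timestamp.

```python
import calendar
from email.utils import mktime_tz, parsedate_tz

stamp = mktime_tz(parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500"))

# 19:12:08 at five hours west of UTC is 00:12:08 UTC on the next day.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
print(stamp == expected)
```

The example date is far from any daylight saving switch, so the minor deficiency mentioned above does not come into play here.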

See Also:

Module mailbox:
Classes to read various mailbox formats produced by end-user mail programs.
Module mimetools:
Subclass of rfc822.Message that handles MIME-encoded messages.



Footnotes

[12.6]
This module originally conformed to RFC 822, hence the name. Since then, RFC 2822 has been released as an update to RFC 822. This module should be considered RFC 2822-conformant, especially in cases where the syntax or semantics have changed since RFC 822.

