Scanner

headerparser.scan_file(fp, **kwargs)

Scan a file for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.

See scan_lines() for more information on the exact behavior of the scanner.

Parameters:
  • fp – A file-like object that can be iterated over to produce lines to pass to scan_lines(). Opening the file in universal newlines mode is recommended.
  • kwargs – scanner options
Return type:

generator of pairs of strings

Raises:
  • MalformedHeaderError – if an invalid header line, i.e., a line without either a colon or leading whitespace, is encountered
  • UnexpectedFoldingError – if a folded (indented) line that is not preceded by a valid header line is encountered
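The recommendation to open fp in universal newlines mode can be illustrated with the standard library alone. The following is a minimal sketch, assuming Python 3 and a throwaway temporary file (the file contents and variable names are made up for illustration); it does not call headerparser itself, it only shows what lines such a file object would produce.

```python
import os
import tempfile

# Write a small header-style file with CR LF line endings. Binary mode is
# used so that the bytes on disk are exactly the ones given here.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"Subject: Hello\r\nTo: world\r\n\r\nBody text\r\n")
    path = tmp.name

try:
    # Text mode with newline=None (the default) enables universal newlines:
    # CR, LF, and CR LF are all translated to "\n" as lines are read.
    with open(path, encoding="utf-8", newline=None) as fp:
        universal_lines = list(fp)

    # newline="" disables the translation, so the CR LF endings are kept.
    with open(path, encoding="utf-8", newline="") as fp:
        raw_lines = list(fp)
finally:
    os.remove(path)

print(universal_lines)  # ['Subject: Hello\n', 'To: world\n', '\n', 'Body text\n']
print(raw_lines[0])     # 'Subject: Hello\r\n'
```

A file object opened the first way is the kind of fp the recommendation above refers to.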
headerparser.scan_lines(iterable, **kwargs)

Scan an iterable of lines for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.

Each field value is a single string, the concatenation of one or more lines, with leading whitespace on lines after the first preserved. The ending of each line is converted to '\n' (added if there is no ending), and the last line of the field value has its trailing line ending (if any) removed.

Note

“Line ending” here means a CR, LF, or CR LF sequence at the end of one of the lines in iterable. Unicode line separators, along with line endings occurring in the middle of a line, are not treated as line endings and are not trimmed or converted to \n.
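To make the definition of a line ending concrete, here is a small sketch of the normalization described above, written with the standard library only. The helper normalize_ending and the LINE_ENDING pattern are hypothetical names used for illustration, not part of headerparser.

```python
import re

# Matches a single CR, LF, or CR LF sequence at the very end of a string.
# \Z anchors at the true end of the string (unlike $, which can also match
# just before a trailing newline).
LINE_ENDING = re.compile(r"(?:\r\n|\r|\n)\Z")


def normalize_ending(line):
    # Strip a trailing CR, LF, or CR LF (if any), then append a single LF.
    return LINE_ENDING.sub("", line) + "\n"


crlf = normalize_ending("Foo: bar\r\n")       # 'Foo: bar\n'
missing = normalize_ending("Foo: bar")        # 'Foo: bar\n' (ending added)
unicode_sep = normalize_ending("a\u2028b\n")  # U+2028 is left untouched
mid_line_cr = normalize_ending("a\rb\n")      # the mid-line CR is left untouched

# For contrast, str.splitlines() does treat U+2028 as a line boundary.
split_result = "a\u2028b\n".splitlines()      # ['a', 'b']
```

The contrast with str.splitlines() is the practical point: input split with that method may be divided at characters that the note above says are not line endings.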

All lines after the first blank line are concatenated & yielded as-is in a (None, body) pair. (Note that body lines which do not end with a line terminator will not have one appended.) If there is no empty line in iterable, then no body pair is yielded. If the empty line is the last line in iterable, the body will be the empty string. If the empty line is the first line in iterable and the skip_leading_newlines option is False (the default), then all other lines will be treated as part of the body and will not be scanned for header fields.
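The rules above can be sketched as a simplified scanner. This is an illustration of the documented behavior only, not the library's implementation: scan_lines_sketch is a hypothetical name, the separator is fixed to the documented default, and ValueError stands in for MalformedHeaderError and UnexpectedFoldingError.

```python
import re

SEPARATOR = re.compile(r"[ \t]*:[ \t]*")       # documented default separator
LINE_ENDING = re.compile(r"(?:\r\n|\r|\n)\Z")  # CR, LF, or CR LF at end of line


def scan_lines_sketch(iterable, skip_leading_newlines=False):
    lines = iter(iterable)
    name, value = None, []   # header field currently being assembled
    seen_content = False     # True once a non-blank line has been read
    for line in lines:
        text = LINE_ENDING.sub("", line)
        if text == "":
            if skip_leading_newlines and not seen_content:
                continue     # discard blank lines before the first field
            if name is not None:
                yield (name, "\n".join(value))
            # Everything after the blank line is the body, joined as-is.
            yield (None, "".join(lines))
            return
        seen_content = True
        if text[0] in " \t":
            # Folded line: it must continue a previous header field.
            if name is None:
                raise ValueError(f"unexpected folded line: {line!r}")
            value.append(text)  # leading whitespace is preserved
            continue
        match = SEPARATOR.search(text)
        if match is None:
            raise ValueError(f"malformed header line: {line!r}")
        if name is not None:
            yield (name, "\n".join(value))
        name, value = text[: match.start()], [text[match.end():]]
    if name is not None:
        yield (name, "\n".join(value))  # no blank line, so no body pair


fields = list(scan_lines_sketch([
    "Subject: Hi\r\n",
    "Received: one\n",
    "  two\n",
    "\n",
    "Body 1\n",
    "Body 2",
]))
print(fields)
# [('Subject', 'Hi'), ('Received', 'one\n  two'), (None, 'Body 1\nBody 2')]
```

Note that the folded value keeps the indentation of its second line, and the body's final line gets no line terminator because the input did not have one.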

Parameters:
  • iterable – an iterable of strings representing lines of input
  • kwargs – scanner options
Return type:

generator of pairs of strings

Raises:
  • MalformedHeaderError – if an invalid header line, i.e., a line without either a colon or leading whitespace, is encountered
  • UnexpectedFoldingError – if a folded (indented) line that is not preceded by a valid header line is encountered
headerparser.scan_string(s, **kwargs)

Scan a string for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.

See scan_lines() for more information on the exact behavior of the scanner.

Parameters:
  • s – a string to be broken into lines and passed to scan_lines()
  • kwargs – scanner options
Return type:

generator of pairs of strings

Raises:
  • MalformedHeaderError – if an invalid header line, i.e., a line without either a colon or leading whitespace, is encountered
  • UnexpectedFoldingError – if a folded (indented) line that is not preceded by a valid header line is encountered

Scanner Options

The following keyword arguments can be passed to HeaderParser and the scanner functions in order to configure scanning behavior:

separator_regex=r'[ \t]*:[ \t]*'
A regex (as a str or compiled regex object) defining the name-value separator. When the regex matches a line, everything before the matched substring becomes the field name, and everything after becomes the first line of the field value. Note that the regex must match any surrounding whitespace in order for it to be trimmed from the key & value.
skip_leading_newlines=False
If True, blank lines at the beginning of the input will be discarded. If False, a blank line at the beginning of the input marks the end of an empty header section and the beginning of the message body.
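The effect of separator_regex on a single line can be sketched with the re module. The helper split_field below is hypothetical and assumes the separator is located by searching for the first occurrence of the regex in the line; it only illustrates why the regex has to consume the surrounding whitespace itself.

```python
import re


def split_field(line, separator_regex=r"[ \t]*:[ \t]*"):
    # re.compile() accepts either a str or an already compiled pattern.
    match = re.compile(separator_regex).search(line)
    if match is None:
        raise ValueError(f"no separator found in {line!r}")
    # Text before the match is the name; text after it starts the value.
    return line[: match.start()], line[match.end():]


default_split = split_field("Key : value")
# ('Key', 'value'): the default regex consumes the spaces around the colon

bare_colon = split_field("Key : value", r":")
# ('Key ', ' value'): whitespace not matched by the regex stays in place

equals_split = split_field("Key = a:b", re.compile(r"\s*=\s*"))
# ('Key', 'a:b'): with a custom separator, a colon is ordinary value text
```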

New in version 0.3.0: separator_regex, skip_leading_newlines