Scanner

Scanner functions perform basic parsing of RFC 822-style header fields, splitting them up into sequences of (name, value) pairs without any further validation or transformation.

In each pair, the first element (the header field name) is the substring of the field's first source line up to, but not including, the first whitespace-padded colon (or other delimiter specified by separator_regex). The second element (the header field value) is a single string formed by concatenating one or more lines, starting with the substring after that first delimiter in the first source line. Leading whitespace on lines after the first is preserved. The ending of each line is converted to '\n' (one is added if the input line has no line ending), and the trailing line ending of the field value's last line, if any, is removed.

Note

“Line ending” here means a CR, LF, or CR LF sequence. Unicode line separators are not treated as line endings and are not trimmed or converted to '\n'.
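The pairing rules above can be modelled in a few lines of standard-library Python. This is a simplified sketch, not headerparser's implementation: the helper name sketch_fields is hypothetical, it assumes its input contains header lines only (no blank line or body), and it raises a plain ValueError where the library would raise ScannerError.

```python
import re

# Default separator, as listed under "Scanner Options" on this page.
SEPARATOR = re.compile(r'[ \t]*:[ \t]*')

def sketch_fields(lines):
    """Hypothetical model: fold header lines into (name, value) pairs."""
    fields = []
    for line in lines:
        # Replace a trailing CR, LF, or CR LF with '\n' (added if absent).
        text = re.sub(r'(\r\n|\r|\n)\Z', '', line) + '\n'
        if text[0] in ' \t' and fields:
            # Continuation line: append it with its leading whitespace intact.
            name, value = fields[-1]
            fields[-1] = (name, value + text)
        else:
            m = SEPARATOR.search(text)
            if m is None:
                raise ValueError(f'not a header line: {line!r}')
            fields.append((text[:m.start()], text[m.end():]))
    # Remove the line ending that terminates each field's last line.
    return [(name, value[:-1]) for name, value in fields]

pairs = sketch_fields(['Subject: Hello\r\n', '  world\n', 'To : me'])
print(pairs)  # [('Subject', 'Hello\n  world'), ('To', 'me')]
```

Note how the CR LF after "Hello" becomes '\n', the continuation line keeps its two leading spaces, and the final line of each value carries no line ending.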

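The various functions differ in how they behave once the end of the header section is encountered:

  • scan() and scan_string() concatenate everything after the first blank line and yield it as a (None, body) pair.
  • scan_next_stanza() and scan_next_stanza_string() stop processing input at the end of the header section; scan_next_stanza() leaves the rest of the iterator unconsumed, and scan_next_stanza_string() returns the rest of the input alongside the header fields.
  • scan_stanzas() and scan_stanzas_string() treat the input as zero or more blank-line-terminated stanzas and process all of them.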

The scan(), scan_next_stanza(), and scan_stanzas() functions take as input an iterable of strings (e.g., a text file object) and treat each string as a single line, regardless of whether it ends with a line ending or not (or even whether it contains a line ending in the middle of the string).

The scan_string(), scan_next_stanza_string(), and scan_stanzas_string() functions take as input a single string which is then broken into lines on CR, LF, and CR LF boundaries and then processed as a list of strings.
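To illustrate the line-splitting rule, the sketch below splits a string on CR, LF, and CR LF only and compares the result with Python's built-in str.splitlines(), which also breaks on Unicode line separators such as U+2028. The helper split_ascii_lines is a hypothetical stand-in written for this example; it is not part of headerparser.

```python
import re

def split_ascii_lines(s):
    """Hypothetical helper: split on CR, LF, and CR LF only, keeping line endings."""
    return re.findall(r'[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+', s)

text = 'A: 1\r\nB: 2\u2028still B\rC: 3'

# Only CR, LF, and CR LF end a line; U+2028 stays inside the second line.
print(split_ascii_lines(text))

# str.splitlines() additionally breaks at U+2028, producing four lines.
print(text.splitlines(keepends=True))
```

This is why passing s.splitlines(keepends=True) to one of the iterable-based functions is not necessarily equivalent to passing s to the corresponding string-based function when the input contains Unicode line separators.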

headerparser.scan(iterable, **kwargs)

New in version 0.4.0.

Scan a text-file-like object or iterable of lines for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.

All lines after the first blank line are concatenated and yielded as-is in a (None, body) pair. (Note that body lines which do not end with a line terminator will not have one appended.) If there is no empty line in iterable, then no body pair is yielded. If the empty line is the last line in iterable, the body will be the empty string. If the empty line is the first line in iterable and the skip_leading_newlines option is false (the default), then all other lines will be treated as part of the body and will not be scanned for header fields.
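The four body cases described above can be checked against a small model. This is a sketch only: split_header_body is a hypothetical helper, it assumes a "blank line" is one that is empty once its line ending is removed, it models the default skip_leading_newlines=False, and it returns None for "no body pair" rather than yielding pairs the way scan() does.

```python
def split_header_body(lines):
    """Hypothetical model: return (header_lines, body), with body None if absent."""
    lines = list(lines)
    for i, line in enumerate(lines):
        if line.rstrip('\r\n') == '':
            # First blank line: everything after it is the body, joined as-is.
            return lines[:i], ''.join(lines[i + 1:])
    # No blank line at all: there is no body.
    return lines, None

# Body lines are concatenated without adding a missing line terminator.
print(split_header_body(['A: 1\n', '\n', 'text\n', 'more']))
# No blank line: no body.
print(split_header_body(['A: 1\n']))
# Blank line is the last line: the body is the empty string.
print(split_header_body(['A: 1\n', '\n']))
# Blank line comes first: everything else is body, not header fields.
print(split_header_body(['\n', 'A: 1\n']))
```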

Parameters:
  • iterable – a text-file-like object or iterable of strings representing lines of input
  • kwargs – scanner options
Return type:

generator of pairs of strings

Raises:

ScannerError – if the header section is malformed

headerparser.scan_string(s, **kwargs)

Scan a string for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.

See scan() for more information on the exact behavior of the scanner.

Parameters:
  • s – a string which will be broken into lines on CR, LF, and CR LF boundaries and passed to scan()
  • kwargs – scanner options
Return type:

generator of pairs of strings

Raises:

ScannerError – if the header section is malformed

headerparser.scan_next_stanza(iterator, **kwargs)

New in version 0.4.0.

Scan a text-file-like object or iterator of lines for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input. Input processing stops as soon as a blank line is encountered, leaving the rest of the iterator unconsumed. (If skip_leading_newlines is true, the function only stops on a blank line after a non-blank line.)
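The "rest of the iterator unconsumed" behavior depends on passing a true iterator rather than a re-iterable container such as a list. The sketch below models only the line consumption, not field parsing. take_stanza_lines is a hypothetical helper, and two things in it are assumptions of this sketch: the terminating blank line itself is consumed, and lines are collected eagerly into a list rather than lazily through a generator.

```python
def take_stanza_lines(iterator, skip_leading_newlines=False):
    """Hypothetical model: consume lines up to and including the first blank line."""
    taken = []
    for line in iterator:
        if line.rstrip('\r\n') == '':
            if skip_leading_newlines and not taken:
                continue  # ignore blank lines before the first non-blank line
            break  # stop here; later lines stay in the iterator
        taken.append(line)
    return taken

lines = iter(['A: 1\n', 'B: 2\n', '\n', 'C: 3\n'])
first = take_stanza_lines(lines)
rest = list(lines)  # whatever the first call did not consume
print(first, rest)

# With skip_leading_newlines, a leading blank line does not end the stanza.
skipped = take_stanza_lines(iter(['\n', 'A: 1\n', '\n', 'B: 2\n']),
                            skip_leading_newlines=True)
print(skipped)
```

Calling the helper again on the same iterator would pick up at the next stanza, which is the usage pattern that stopping at the blank line makes possible.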

Parameters:
  • iterator – a text-file-like object or iterator of strings representing lines of input
  • kwargs – scanner options
Return type:

generator of pairs of strings

Raises:

ScannerError – if the header section is malformed

headerparser.scan_next_stanza_string(s, **kwargs)

New in version 0.4.0.

Scan a string for RFC 822-style header fields and return a pair (fields, extra), where fields is a list of (name, value) pairs for each header field in the input up to the first blank line and extra is everything after that blank line. (If skip_leading_newlines is true, the dividing point is instead the first blank line after a non-blank line.) If there is no appropriate blank line in the input, extra is the empty string.

Parameters:
  • s – a string to scan for header fields
  • kwargs – scanner options
Return type:

pair of a list of pairs of strings and a string

Raises:

ScannerError – if the header section is malformed

headerparser.scan_stanzas(iterable, **kwargs)

New in version 0.4.0.

Scan a text-file-like object or iterable of lines for zero or more stanzas of RFC 822-style header fields and return a generator of lists of (name, value) pairs, where each list represents a stanza of header fields in the input.

The stanzas are terminated by blank lines. Consecutive blank lines between stanzas are treated as a single blank line. Blank lines at the end of the input are discarded without creating a new stanza.
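The blank-line rules for stanzas can be sketched as a grouping step over raw lines. group_stanzas is a hypothetical helper that groups lines only and does not parse them into (name, value) pairs. It also silently drops leading blank lines, which is a simplification of this sketch and not a statement about how the library or skip_leading_newlines treats them.

```python
def group_stanzas(lines):
    """Hypothetical model: group lines into stanzas separated by blank lines."""
    stanza = []
    for line in lines:
        if line.rstrip('\r\n') == '':
            if stanza:
                yield stanza
                stanza = []
            # Further blank lines (consecutive or trailing) produce nothing.
        else:
            stanza.append(line)
    if stanza:
        # Input ended without a terminating blank line.
        yield stanza

stanzas = list(group_stanzas(
    ['A: 1\n', '\n', '\n', 'B: 2\n', 'C: 3\n', '\n', '\n']
))
print(stanzas)  # [['A: 1\n'], ['B: 2\n', 'C: 3\n']]
```

The two blank lines between the stanzas act as one separator, and the two at the end do not produce an empty third stanza.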

Parameters:
  • iterable – a text-file-like object or iterable of strings representing lines of input
  • kwargs – scanner options
Return type:

generator of lists of pairs of strings

Raises:

ScannerError – if the header section is malformed

headerparser.scan_stanzas_string(s, **kwargs)

New in version 0.4.0.

Scan a string for zero or more stanzas of RFC 822-style header fields and return a generator of lists of (name, value) pairs, where each list represents a stanza of header fields in the input.

The stanzas are terminated by blank lines. Consecutive blank lines between stanzas are treated as a single blank line. Blank lines at the end of the input are discarded without creating a new stanza.

Parameters:
  • s – a string to scan for stanzas of header fields
  • kwargs – scanner options
Return type:

generator of lists of pairs of strings

Raises:

ScannerError – if the header section is malformed

Deprecated Functions

headerparser.scan_file(fp, **kwargs)

Scan a file for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.

See scan() for more information on the exact behavior of the scanner.

Deprecated since version 0.4.0: Use scan() instead.

Parameters:
  • fp – a file-like object that can be iterated over to produce lines to pass to scan(). Opening the file in universal newlines mode is recommended.
  • kwargs – scanner options
Return type:

generator of pairs of strings

Raises:

ScannerError – if the header section is malformed

headerparser.scan_lines(fp, **kwargs)

Scan an iterable of lines for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.

See scan() for more information on the exact behavior of the scanner.

Deprecated since version 0.4.0: Use scan() instead.

Parameters:
  • fp – an iterable of strings representing lines of input
  • kwargs – scanner options
Return type:

generator of pairs of strings

Raises:

ScannerError – if the header section is malformed

Scanner Options

The following keyword arguments can be passed to HeaderParser and the scanner functions in order to configure scanning behavior:

separator_regex=r'[ \t]*:[ \t]*'
A regex (as a str or compiled regex object) defining the name-value separator. When the regex matches a line, everything before the matched substring becomes the field name, and everything after becomes the first line of the field value. Note that the regex must match any surrounding whitespace in order for it to be trimmed from the name and value.
skip_leading_newlines=False
If True, blank lines at the beginning of the input will be discarded. If False, a blank line at the beginning of the input marks the end of an empty header section.

New in version 0.3.0: separator_regex, skip_leading_newlines
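The note about whitespace in separator_regex is easiest to see by applying the regex directly with Python's re module. The helper split_field below is hypothetical and only mimics the split described above (text before the match, text after the match). It is not the library's code, and the '=' separator is just an illustrative alternative delimiter.

```python
import re

def split_field(line, separator_regex=r'[ \t]*:[ \t]*'):
    """Hypothetical helper: split one header line on the first separator match."""
    m = re.search(separator_regex, line)
    if m is None:
        raise ValueError(f'no separator in {line!r}')
    return line[:m.start()], line[m.end():]

# Default regex: the whitespace around the colon is part of the match, so it is trimmed.
print(split_field('Key :  value'))                          # ('Key', 'value')

# A bare ':' leaves the surrounding whitespace in the name and the value.
print(split_field('Key :  value', separator_regex=r':'))    # ('Key ', '  value')

# A different delimiter, again matching its own padding.
print(split_field('Key = value', separator_regex=r'\s*=\s*'))  # ('Key', 'value')
```

In practice this means a custom separator_regex usually wants the same [ \t]* padding on both sides that the default has.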