Scanner
headerparser.scan_file(fp, **kwargs)
Scan a file for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.
See scan_lines() for more information on the exact behavior of the scanner.
Parameters:
- fp – a file-like object that can be iterated over to produce lines to pass to scan_lines(). Opening the file in universal newlines mode is recommended.
- kwargs – scanner options
Return type: generator of pairs of strings
Raises:
- MalformedHeaderError – if an invalid header line, i.e., a line without either a colon or leading whitespace, is encountered
- UnexpectedFoldingError – if a folded (indented) line that is not preceded by a valid header line is encountered
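The universal-newlines recommendation can be illustrated with the standard library alone. The sketch below does not call headerparser; the read_lines helper and the sample bytes are made up for the example. It writes a file with mixed CR LF, CR, and LF endings and reads it back in text mode with newline=None (the default for open()), so every line reaches the consumer ending in '\n'.

```python
import os
import tempfile


def read_lines(raw: bytes) -> list:
    """Write raw bytes to a temporary file, then read it back as text
    lines in universal newlines mode (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(raw)
        # newline=None translates "\r\n" and "\r" to "\n" while reading.
        with open(path, "r", encoding="utf-8", newline=None) as fp:
            return list(fp)
    finally:
        os.remove(path)


lines = read_lines(b"Foo: red\r\nBar: green\rBaz: blue\n")
print(lines)
```

A file object opened this way is the kind of iterable of lines that the fp parameter expects.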
headerparser.scan_lines(iterable, **kwargs)
Scan an iterable of lines for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.
Each field value is a single string, the concatenation of one or more lines, with leading whitespace on lines after the first preserved. The ending of each line is converted to '\n' (added if there is no ending), and the last line of the field value has its trailing line ending (if any) removed.
Note: “Line ending” here means a CR, LF, or CR LF sequence at the end of one of the lines in iterable. Unicode line separators, along with line endings occurring in the middle of a line, are not treated as line endings and are not trimmed or converted to \n.
All lines after the first blank line are concatenated and yielded as-is in a (None, body) pair. (Note that body lines which do not end with a line terminator will not have one appended.) If there is no empty line in iterable, then no body pair is yielded. If the empty line is the last line in iterable, the body will be the empty string. If the empty line is the first line in iterable and the skip_leading_newlines option is False (the default), then all other lines will be treated as part of the body and will not be scanned for header fields.
Parameters:
- iterable – an iterable of strings representing lines of input
- kwargs – scanner options
Return type: generator of pairs of strings
Raises:
- MalformedHeaderError – if an invalid header line, i.e., a line without either a colon or leading whitespace, is encountered
- UnexpectedFoldingError – if a folded (indented) line that is not preceded by a valid header line is encountered
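The rules above can be sketched in a few lines of plain Python. This is a simplified illustration written for this page, not headerparser's actual implementation: the function name sketch_scan_lines is hypothetical, it raises ValueError rather than the library's exception classes, and it assumes the separator is located with the first regex match in the line.

```python
import re


def sketch_scan_lines(iterable, separator_regex=r"[ \t]*:[ \t]*",
                      skip_leading_newlines=False):
    """Simplified illustration of the scanning rules (not the library code)."""
    sep = re.compile(separator_regex)
    lines = iter(iterable)
    name, value = None, ""
    begun = False  # becomes True once a non-blank line has been seen
    for raw in lines:
        # Strip exactly one trailing CR LF, CR, or LF, if present.
        line = re.sub(r"(?:\r\n|\r|\n)\Z", "", raw)
        if line == "":
            if skip_leading_newlines and not begun:
                continue
            if name is not None:
                yield (name, value)
            # Everything after the blank line is the body, joined as-is.
            yield (None, "".join(lines))
            return
        begun = True
        if line[0] in " \t":
            # Folded line: must continue an existing field.
            if name is None:
                raise ValueError("folded line without a preceding header")
            value += "\n" + line
        else:
            if name is not None:
                yield (name, value)
            m = sep.search(line)
            if m is None:
                raise ValueError("malformed header line")
            name, value = line[:m.start()], line[m.end():]
    if name is not None:
        yield (name, value)


fields = list(sketch_scan_lines([
    "Subject: Hello\n",
    "  world\r\n",
    "To: someone\n",
    "\n",
    "Body line 1\n",
    "Body line 2",
]))
print(fields)
```

In this sketch the folded line keeps its leading whitespace, its CR LF ending is normalized to '\n' when joined, and the final body line is left without a terminator because none was present in the input.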
headerparser.scan_string(s, **kwargs)
Scan a string for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.
See scan_lines() for more information on the exact behavior of the scanner.
Parameters:
- s – a string which will be broken into lines on CR, LF, and CR LF boundaries and passed to scan_lines()
- kwargs – scanner options
Return type: generator of pairs of strings
Raises:
- MalformedHeaderError – if an invalid header line, i.e., a line without either a colon or leading whitespace, is encountered
- UnexpectedFoldingError – if a folded (indented) line that is not preceded by a valid header line is encountered
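Splitting only on CR, LF, and CR LF differs from Python's str.splitlines(), which also breaks on Unicode line separators. The sketch below shows one way such a split could be written with the re module. The helper name split_lines_ascii is hypothetical, and the regex is an assumption for illustration, not necessarily how the library does it.

```python
import re


def split_lines_ascii(s):
    """Split s after each CR LF, CR, or LF, keeping the terminators.
    A trailing unterminated line is returned as the last item."""
    return re.findall(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+", s)


text = "Foo: a\u2028b\r\nBar: c"
ascii_split = split_lines_ascii(text)
unicode_split = text.splitlines(keepends=True)
print(ascii_split)    # U+2028 stays inside the first line
print(unicode_split)  # str.splitlines() also breaks at U+2028
```

With the ASCII-only split, the U+2028 character remains part of the field value instead of starting a new (and here malformed) header line.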
Scanner Options
The following keyword arguments can be passed to HeaderParser and the scanner functions in order to configure scanning behavior:
separator_regex=r'[ \t]*:[ \t]*'
- A regex (as a str or compiled regex object) defining the name-value separator. When the regex matches a line, everything before the matched substring becomes the field name, and everything after becomes the first line of the field value. Note that the regex must match any surrounding whitespace in order for it to be trimmed from the key and value.
skip_leading_newlines=False
- If True, blank lines at the beginning of the input will be discarded. If False, a blank line at the beginning of the input marks the end of an empty header section and the beginning of the message body.
New in version 0.3.0: separator_regex, skip_leading_newlines
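The effect of separator_regex on whitespace trimming can be seen with the re module directly. This is a minimal sketch of the splitting rule as described, not the library's code: split_field is a hypothetical helper, and locating the separator with the first regex match in the line is an assumption.

```python
import re


def split_field(line, separator_regex=r"[ \t]*:[ \t]*"):
    """Split a header line into (name, value) at the first separator match.
    Returns None if the separator does not occur in the line."""
    # re.compile() accepts either a str or an already compiled pattern.
    sep = re.compile(separator_regex)
    m = sep.search(line)
    if m is None:
        return None
    return line[:m.start()], line[m.end():]


default_split = split_field("Key : value")
bare_colon = split_field("Key : value", r":")
equals_sign = split_field("Key = value", re.compile(r"\s*=\s*"))
print(default_split, bare_colon, equals_sign)
```

With the default regex the surrounding spaces are consumed by the match; with a bare ':' they remain attached to the name and value, which is why a custom separator should match its own surrounding whitespace.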