headerparser — argparse for mail-style headers¶
Input Format
headerparser accepts a syntax that is intended to be a simplified superset of the Internet Message (e-mail) Format specified in RFC 822, RFC 2822, and RFC 5322. Specifically:
- Everything in the input up to (but not including) the first blank line (i.e., a line containing only a line ending) constitutes a stanza or header section. Everything after the first blank line is a free-form message body. If there are no blank lines, the entire input is used as the header section, and there is no body.
Note
By default, blank lines at the beginning of a document are interpreted as the ending of a zero-length stanza. Such blank lines can instead be ignored by setting the skip_leading_newlines scanner option to true.
- A stanza or header section is composed of zero or more header fields. A header field is composed of one or more lines, with all lines after the first beginning with a space or tab. Additionally, the first line must contain a colon (optionally surrounded by whitespace); everything before the colon is the header field name, while everything after (including subsequent lines) is the header field value.
Note
Name-value separators other than a colon can be used by setting the separator_regex scanner option appropriately.
Note
This format only recognizes CR, LF, and CR LF sequences as line endings.
An example:
Key: Value
Foo: Bar
Bar:Whitespace around the colon is optional
Baz : Very optional
Long-Field: This field has a very long value, so I'm going to split it
  across multiple lines.
  
  The above line is all whitespace. This counts as line folding, and so
  we're still in the "Long Field" value, but the RFCs consider such lines
  obsolete, so you should avoid using them.
  .
  One alternative to an all-whitespace line is a line with just indentation
  and a period. Debian package description fields use this.
Foo: Wait, I already defined a value for this key. What happens now?
What happens now: It depends on whether the `multiple` option for the "Foo"
  field was set in the HeaderParser.
If multiple=True: The "Foo" key in the dictionary returned by
  HeaderParser.parse_string() would map to a list of all of Foo's values
If multiple=False: A ParserError is raised
If multiple=False but there's only one "Foo" anyway:
  The "Foo" key in the result dictionary would map to just a single string.
Compare this to: the standard library's `email` package, which accepts
  multi-occurrence fields, but *which* occurrence Message.__getitem__
  returns is unspecified!

Are we still in the header: no
  There was a blank line above, so we're now in the body, which isn't
  processed for headers.
Good thing, too, because this isn't a valid header line.
On the other hand, this is not a valid RFC 822-style document:
An indented first line — without a "Name:" line before it!
A header line without a colon isn't good, either.
Does this make up for the above: no
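The rules above can be condensed into a few lines of code. The following is a minimal sketch using only the standard library, not headerparser's actual implementation: the name sketch_scan is hypothetical, errors are reported as plain ValueError, and the body is rejoined with '\n' rather than preserved verbatim.

```python
import re

# Default name-value separator as documented under "Scanner Options"
SEPARATOR = re.compile(r"[ \t]*:[ \t]*")


def sketch_scan(text):
    """Hypothetical helper: return (fields, body) for an input string.

    fields is a list of (name, value) pairs; body is None when the input
    contains no blank line.
    """
    # Only CR, LF, and CR LF count as line endings
    lines = re.split(r"\r\n|\r|\n", text)
    if lines and lines[-1] == "":
        lines.pop()  # a trailing line ending does not add a blank line
    fields = []
    body = None
    for i, line in enumerate(lines):
        if line == "":
            # First blank line: everything after it is the body
            body = "\n".join(lines[i + 1:])
            break
        if line[0] in " \t":
            # Continuation (folded) line: append to the previous field
            if not fields:
                raise ValueError("indented line with no header before it")
            name, value = fields[-1]
            fields[-1] = (name, value + "\n" + line)
        else:
            m = SEPARATOR.search(line)
            if m is None:
                raise ValueError("header line without a colon: %r" % line)
            fields.append((line[:m.start()], line[m.end():]))
    return fields, body


fields, body = sketch_scan(
    "Key: Value\nBaz : Very optional\nLong: a\n  b\n\nBody text\n"
)
print(fields)
print(body)
```

Note that an all-whitespace line starts with a space or tab, so the sketch treats it as folding, in line with the example above.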
Parser
class headerparser.HeaderParser(normalizer=None, body=None, **kwargs)
A parser for RFC 822-style header sections. Define the fields the parser should recognize with the add_field() method, configure handling of unrecognized fields with add_additional(), and then parse input with parse() or another parse_*() method.
Parameters:
- normalizer (callable) – By default, the parser will consider two field names to be equal iff their lowercased forms are equal. This can be overridden by setting normalizer to a custom callable that takes a field name and returns a “normalized” name for use in equality testing. The normalizer will also be used when looking up keys in the NormalizedDict instances returned by the parser’s parse_*() methods.
- body (bool) – whether the parser should allow or forbid a body after the header section; True means a body is required, False means a body is prohibited, and None (the default) means a body is optional
- kwargs – scanner options
add_additional(enable=True, **kwargs)
Specify how the parser should handle fields in the input that were not previously registered with add_field. By default, unknown fields will cause the parse_* methods to raise an UnknownFieldError, but calling this method with enable=True (the default) will change the parser’s behavior so that all unregistered fields are processed according to the options in **kwargs. (If no options are specified, the additional values will just be stored in the result dictionary.)
If this method is called more than once, only the settings from the last call will be used.
Note that additional field values are always stored in the result dictionary using their field name as the key, and two fields are considered the same (for the purposes of multiple) iff their names are the same after normalization. Customization of the dictionary key and field name can only be done through add_field.
New in version 0.2.0: action argument added
Parameters:
- enable (bool) – whether the parser should accept input fields that were not registered with add_field; setting this to False disables additional fields and restores the parser’s default behavior
- multiple (bool) – If True, each additional header field will be allowed to occur more than once in the input, and each field’s values will be stored in a list. If False (the default), a DuplicateFieldError will be raised if an additional field occurs more than once in the input.
- unfold (bool) – If True (default False), additional field values will be “unfolded” (i.e., line breaks will be removed and whitespace around line breaks will be converted to a single space) before applying type
- type (callable) – a callable to apply to additional field values before storing them in the result dictionary
- choices (iterable) – A sequence of values which additional fields are allowed to have. If choices is defined, all additional field values in the input must have one of the given values (after applying type) or else an InvalidChoiceError is raised.
- action (callable) – A callable to invoke whenever the field is encountered in the input. The callable will be passed the current dictionary of header fields, the field’s name, and the field’s value (after processing with type and unfold and checking against choices). The callable replaces the default behavior of storing the field’s values in the result dictionary, and so the callable must explicitly store the values if desired.
Returns: None
Raises:
- ValueError –
  - if enable is true and a previous call to add_field used a custom dest
  - if choices is an empty sequence
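The unfold, type, and choices options are applied in a fixed order: unfolding first, then the type callable, then the choices check. A minimal sketch of that order, assuming nothing beyond the descriptions above; sketch_process is a hypothetical helper, not part of headerparser, and it raises a plain ValueError where the library raises its own exception classes.

```python
import re


def sketch_process(value, unfold=False, type=None, choices=None):
    """Hypothetical helper mirroring the documented order of operations."""
    if unfold:
        # Replace each line break and adjacent whitespace with one space
        value = re.sub(r"[ \t]*[\r\n][ \t\r\n]*", " ", value).strip()
    if type is not None:
        # The type callable sees the (possibly unfolded) string
        value = type(value)
    if choices is not None and value not in choices:
        # choices is checked against the converted value
        raise ValueError("invalid choice: %r" % (value,))
    return value


print(sketch_process("a\n  b", unfold=True))
print(sketch_process(" 42", type=int, choices=[1, 42]))
```

Because choices is compared after conversion, a field declared with type=int needs integer choices rather than strings.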
add_field(name, *altnames, **kwargs)
Define a header field for the parser to parse. During parsing, if a field is encountered whose name (modulo normalization) equals either name or one of the altnames, the field’s value will be processed according to the options in **kwargs. (If no options are specified, the value will just be stored in the result dictionary.)
New in version 0.2.0: action argument added
Parameters:
- name (string) – the primary name for the field, used in error messages and as the default value of dest
- altnames (strings) – field name synonyms
- dest – The key in the result dictionary in which the field’s value(s) will be stored; defaults to name. When additional headers are enabled (see add_additional), dest must equal (after normalization) one of the field’s names.
- required (bool) – If True (default False), the parse_* methods will raise a MissingFieldError if the field is not present in the input
- default – The value to associate with the field if it is not present in the input. If no default value is specified, the field will be omitted from the result dictionary if it is not present in the input. default cannot be set when the field is required. type, unfold, and action will not be applied to the default value, and the default value need not belong to choices.
- multiple (bool) – If True, the header field will be allowed to occur more than once in the input, and all of the field’s values will be stored in a list. If False (the default), a DuplicateFieldError will be raised if the field occurs more than once in the input.
- unfold (bool) – If True (default False), the field value will be “unfolded” (i.e., line breaks will be removed and whitespace around line breaks will be converted to a single space) before applying type
- type (callable) – a callable to apply to the field value before storing it in the result dictionary
- choices (iterable) – A sequence of values which the field is allowed to have. If choices is defined, all occurrences of the field in the input must have one of the given values (after applying type) or else an InvalidChoiceError is raised.
- action (callable) – A callable to invoke whenever the field is encountered in the input. The callable will be passed the current dictionary of header fields, the field’s name, and the field’s value (after processing with type and unfold and checking against choices). The callable replaces the default behavior of storing the field’s values in the result dictionary, and so the callable must explicitly store the values if desired. When action is defined for a field, dest cannot be.
Returns: None
Raises:
- ValueError –
  - if another field with the same name or dest was already defined
  - if dest is not one of the field’s names and add_additional is enabled
  - if default is defined and required is true
  - if choices is an empty sequence
  - if both dest and action are defined
- TypeError – if name or one of the altnames is not a string
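An action callable receives three positional arguments: the dictionary being built, the field's name, and the processed value. The sketch below shows one possible shape for such a callable. The name collect_unique is hypothetical, and a plain dict stands in for the parser's result dictionary so that the snippet runs without headerparser installed; in real use the function would be passed as action=collect_unique.

```python
def collect_unique(fields, name, value):
    """Hypothetical action: gather a field's values into a set.

    An action replaces the default storage step, so it has to write
    into the dictionary itself.
    """
    fields.setdefault(name, set()).add(value)


# Stand-in for the result dictionary the parser would pass in
result = {}
for value in ["alpha", "beta", "alpha"]:
    collect_unique(result, "Tag", value)

print(sorted(result["Tag"]))
```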
parse(iterable)
New in version 0.4.0.
Parse an RFC 822-style header field section (possibly followed by a message body) from the contents of the given filehandle or sequence of lines and return a dictionary of the header fields (possibly with body attached). If iterable is an iterable of str, newlines will be appended to lines in multiline header fields where not already present but will not be inserted where missing inside the body.
Parameters: iterable – a text-file-like object or iterable of lines to parse
Return type: NormalizedDict
Raises:
- ParserError – if the input fields do not conform to the field definitions declared with add_field and add_additional
- ScannerError – if the header section is malformed
parse_file(fp)
Parse an RFC 822-style header field section (possibly followed by a message body) from the contents of the given filehandle and return a dictionary of the header fields (possibly with body attached).
Deprecated since version 0.4.0: Use parse() instead.
Parameters: fp (file-like object) – the file to parse
Return type: NormalizedDict
Raises:
- ParserError – if the input fields do not conform to the field definitions declared with add_field and add_additional
- ScannerError – if the header section is malformed
parse_lines(iterable)
Parse an RFC 822-style header field section (possibly followed by a message body) from the given sequence of lines and return a dictionary of the header fields (possibly with body attached). Newlines will be inserted where not already present in multiline header fields but will not be inserted inside the body.
Deprecated since version 0.4.0: Use parse() instead.
Parameters: iterable (iterable of strings) – a sequence of lines comprising the text to parse
Return type: NormalizedDict
Raises:
- ParserError – if the input fields do not conform to the field definitions declared with add_field and add_additional
- ScannerError – if the header section is malformed
parse_next_stanza(iterator)
New in version 0.4.0.
Parse an RFC 822-style header field section from the contents of the given filehandle or iterator of lines and return a dictionary of the header fields. Input processing stops at the end of the header section, leaving the rest of the iterator unconsumed. As a message body is not consumed, calling this method when body is true will produce a MissingBodyError.
Parameters: iterator – a text-file-like object or iterator of lines to parse
Return type: NormalizedDict
Raises:
- ParserError – if the input fields do not conform to the field definitions declared with add_field and add_additional
- ScannerError – if a header section is malformed
parse_next_stanza_string(s)
New in version 0.4.0.
Parse an RFC 822-style header field section from the given string and return a pair of a dictionary of the header fields and the rest of the string. As a message body is not consumed, calling this method when body is true will produce a MissingBodyError.
Parameters: s (string) – the text to parse
Return type: pair of NormalizedDict and a string
Raises:
- ParserError – if the input fields do not conform to the field definitions declared with add_field and add_additional
- ScannerError – if a header section is malformed
parse_stanzas(iterable)
New in version 0.4.0.
Parse zero or more stanzas of RFC 822-style header fields from the given filehandle or sequence of lines and return a generator of dictionaries of header fields.
All of the input is treated as header sections, not message bodies; as a result, calling this method when body is true will produce a MissingBodyError.
Parameters: iterable – a text-file-like object or iterable of lines to parse
Return type: generator of NormalizedDict
Raises:
- ParserError – if the input fields do not conform to the field definitions declared with add_field and add_additional
- ScannerError – if a header section is malformed
parse_stanzas_stream(fields)
New in version 0.4.0.
Parse an iterable of iterables of (name, value) pairs as returned by scan_stanzas() or scan_stanzas_string() and return a generator of dictionaries of header fields. This is a low-level method that you will usually not need to call.
Parameters: fields – an iterable of iterables of pairs of strings
Return type: generator of NormalizedDict
Raises:
- ParserError – if the input fields do not conform to the field definitions declared with add_field and add_additional
- ScannerError – if a header section is malformed
parse_stanzas_string(s)
New in version 0.4.0.
Parse zero or more stanzas of RFC 822-style header fields from the given string and return a generator of dictionaries of header fields.
All of the input is treated as header sections, not message bodies; as a result, calling this method when body is true will produce a MissingBodyError.
Parameters: s (string) – the text to parse
Return type: generator of NormalizedDict
Raises:
- ParserError – if the input fields do not conform to the field definitions declared with add_field and add_additional
- ScannerError – if a header section is malformed
parse_stream(fields)
Process a sequence of (name, value) pairs as returned by scan() or scan_string() and return a dictionary of header fields (possibly with body attached). This is a low-level method that you will usually not need to call.
Parameters: fields (iterable of pairs of strings) – a sequence of (name, value) pairs representing the input fields
Return type: NormalizedDict
Raises:
- ParserError – if the input fields do not conform to the field definitions declared with add_field and add_additional
- ValueError – if the input contains more than one body pair
parse_string(s)
Parse an RFC 822-style header field section (possibly followed by a message body) from the given string and return a dictionary of the header fields (possibly with body attached).
Parameters: s (string) – the text to parse
Return type: NormalizedDict
Raises:
- ParserError – if the input fields do not conform to the field definitions declared with add_field and add_additional
- ScannerError – if the header section is malformed
Scanner
Scanner functions perform basic parsing of RFC 822-style header fields, splitting them up into sequences of (name, value) pairs without any further validation or transformation.
In each pair, the first element (the header field name) is the substring up to but not including the first whitespace-padded colon (or other delimiter specified by separator_regex) in the first source line of the header field. The second element (the header field value) is a single string, the concatenation of one or more lines, starting with the substring after the first colon in the first source line, with leading whitespace on lines after the first preserved; the ending of each line is converted to '\n' (added if there is no line ending in the actual input), and the last line of the field value has its trailing line ending (if any) removed.
Note
“Line ending” here means a CR, LF, or CR LF sequence. Unicode line separators are not treated as line endings and are not trimmed or converted to '\n'.
The various functions differ in how they behave once the end of the header section is encountered:
- scan() and scan_string() gather up everything after the header section and (if there is anything) yield it as a (None, body) pair
- scan_next_stanza() and scan_next_stanza_string() stop processing input at the end of the header section; scan_next_stanza() leaves the unprocessed input in the iterator, while scan_next_stanza_string() returns the rest of the input alongside the header fields
- scan_stanzas() and scan_stanzas_string() expect their input to consist entirely of multiple blank-line-terminated header sections, all of which are processed
The scan(), scan_next_stanza(), and scan_stanzas() functions take as input an iterable of strings (e.g., a text file object) and treat each string as a single line, regardless of whether it ends with a line ending or not (or even whether it contains a line ending in the middle of the string).
The scan_string(), scan_next_stanza_string(), and scan_stanzas_string() functions take as input a single string which is then broken into lines on CR, LF, and CR LF boundaries and then processed as a list of strings.
headerparser.scan(iterable, **kwargs)
New in version 0.4.0.
Scan a text-file-like object or iterable of lines for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.
All lines after the first blank line are concatenated & yielded as-is in a (None, body) pair. (Note that body lines which do not end with a line terminator will not have one appended.) If there is no empty line in iterable, then no body pair is yielded. If the empty line is the last line in iterable, the body will be the empty string. If the empty line is the first line in iterable and the skip_leading_newlines option is false (the default), then all other lines will be treated as part of the body and will not be scanned for header fields.
Parameters:
- iterable – a text-file-like object or iterable of strings representing lines of input
- kwargs – scanner options
Return type: generator of pairs of strings
Raises: ScannerError – if the header section is malformed
headerparser.scan_string(s, **kwargs)
Scan a string for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.
See scan() for more information on the exact behavior of the scanner.
Parameters:
- s – a string which will be broken into lines on CR, LF, and CR LF boundaries and passed to scan()
- kwargs – scanner options
Return type: generator of pairs of strings
Raises: ScannerError – if the header section is malformed
headerparser.scan_next_stanza(iterator, **kwargs)
New in version 0.4.0.
Scan a text-file-like object or iterator of lines for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input. Input processing stops as soon as a blank line is encountered, leaving the rest of the iterator unconsumed. (If skip_leading_newlines is true, the function only stops on a blank line after a non-blank line.)
Parameters:
- iterator – a text-file-like object or iterator of strings representing lines of input
- kwargs – scanner options
Return type: generator of pairs of strings
Raises: ScannerError – if the header section is malformed
headerparser.scan_next_stanza_string(s, **kwargs)
New in version 0.4.0.
Scan a string for RFC 822-style header fields and return a pair (fields, extra) where fields is a list of (name, value) pairs for each header field in the input up to the first blank line and extra is everything after the first blank line. (If skip_leading_newlines is true, the dividing point is instead the first blank line after a non-blank line.) If there is no appropriate blank line in the input, extra is the empty string.
Parameters:
- s – a string to scan
- kwargs – scanner options
Return type: pair of a list of pairs of strings and a string
Raises: ScannerError – if the header section is malformed
headerparser.scan_stanzas(iterable, **kwargs)
New in version 0.4.0.
Scan a text-file-like object or iterable of lines for zero or more stanzas of RFC 822-style header fields and return a generator of lists of (name, value) pairs, where each list represents a stanza of header fields in the input.
The stanzas are terminated by blank lines. Consecutive blank lines between stanzas are treated as a single blank line. Blank lines at the end of the input are discarded without creating a new stanza.
Parameters:
- iterable – a text-file-like object or iterable of strings representing lines of input
- kwargs – scanner options
Return type: generator of lists of pairs of strings
Raises: ScannerError – if the header section is malformed
headerparser.scan_stanzas_string(s, **kwargs)
New in version 0.4.0.
Scan a string for zero or more stanzas of RFC 822-style header fields and return a generator of lists of (name, value) pairs, where each list represents a stanza of header fields in the input.
The stanzas are terminated by blank lines. Consecutive blank lines between stanzas are treated as a single blank line. Blank lines at the end of the input are discarded without creating a new stanza.
Parameters:
- s – a string which will be broken into lines on CR, LF, and CR LF boundaries and passed to scan_stanzas()
- kwargs – scanner options
Return type: generator of lists of pairs of strings
Raises: ScannerError – if the header section is malformed
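The stanza-splitting rule (runs of blank lines act as a single separator, and trailing blank lines create no extra stanza) can be illustrated with itertools.groupby. This is a rough sketch under those two rules only, not the library's code: sketch_split_stanzas is a hypothetical name, it returns raw lines rather than (name, value) pairs, and it does not model the skip_leading_newlines option (leading blank lines are simply dropped).

```python
from itertools import groupby


def sketch_split_stanzas(lines):
    """Hypothetical helper: group lines into blank-line-separated stanzas."""

    def is_blank(line):
        # A blank line contains nothing but its line ending
        return line.rstrip("\r\n") == ""

    stanzas = []
    for blank, group in groupby(lines, key=is_blank):
        if not blank:
            # Each run of non-blank lines forms one stanza
            stanzas.append([line.rstrip("\r\n") for line in group])
    return stanzas


lines = ["A: 1\n", "B: 2\n", "\n", "\n", "C: 3\n", "\n"]
print(sketch_split_stanzas(lines))
```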
Deprecated Functions
headerparser.scan_file(fp, **kwargs)
Scan a file for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.
See scan() for more information on the exact behavior of the scanner.
Deprecated since version 0.4.0: Use scan() instead.
Parameters:
- fp – A file-like object that can be iterated over to produce lines to pass to scan(). Opening the file in universal newlines mode is recommended.
- kwargs – scanner options
Return type: generator of pairs of strings
Raises: ScannerError – if the header section is malformed
headerparser.scan_lines(fp, **kwargs)
Scan an iterable of lines for RFC 822-style header fields and return a generator of (name, value) pairs for each header field in the input, plus a (None, body) pair representing the body (if any) after the header section.
See scan() for more information on the exact behavior of the scanner.
Deprecated since version 0.4.0: Use scan() instead.
Parameters:
- iterable – an iterable of strings representing lines of input
- kwargs – scanner options
Return type: generator of pairs of strings
Raises: ScannerError – if the header section is malformed
Scanner Options
The following keyword arguments can be passed to HeaderParser and the scanner functions in order to configure scanning behavior:
separator_regex=r'[ \t]*:[ \t]*'
- A regex (as a str or compiled regex object) defining the name-value separator. When the regex matches a line, everything before the matched substring becomes the field name, and everything after becomes the first line of the field value. Note that the regex must match any surrounding whitespace in order for it to be trimmed from the key & value.
skip_leading_newlines=False
- If True, blank lines at the beginning of the input will be discarded. If False, a blank line at the beginning of the input marks the end of an empty header section.
New in version 0.3.0: separator_regex, skip_leading_newlines
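To see why the separator regex has to match the surrounding whitespace itself, it helps to apply a few candidate patterns to single lines with the re module. The helper below is a hypothetical illustration of the splitting rule described for separator_regex, not a function provided by headerparser.

```python
import re


def split_header_line(line, separator_regex=r"[ \t]*:[ \t]*"):
    """Hypothetical helper: split one line at the first separator match."""
    m = re.search(separator_regex, line)
    if m is None:
        return None  # no separator found on this line
    # Text before the match is the name; text after it starts the value
    return line[:m.start()], line[m.end():]


# Default separator: whitespace around the colon is consumed by the match
print(split_header_line("Baz : Very optional"))
# A custom separator that also matches its surrounding whitespace
print(split_header_line("Key = Value", r"\s*=\s*"))
# A bare "=" leaves the whitespace attached to the name and value
print(split_header_line("Key = Value", "="))
```

The last call returns a name with a trailing space and a value with a leading space, which is the situation the note about surrounding whitespace warns against.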
Utilities
class headerparser.NormalizedDict(data=None, normalizer=None, body=None)
A generalization of a case-insensitive dictionary. NormalizedDict takes a callable (the “normalizer”) that is applied to any key passed to its __getitem__, __setitem__, or __delitem__ method, and the result of the call is then used for the actual lookup. When iterating over a NormalizedDict, each key is returned as the “pre-normalized” form passed to __setitem__ the last time the key was set (but see normalized() below). Aside from this, NormalizedDict behaves like a normal MutableMapping class.
If a normalizer is not specified upon instantiation, a default will be used that converts strings to lowercase and leaves everything else unchanged, so NormalizedDict defaults to yet another case-insensitive dictionary.
Two NormalizedDict instances compare equal iff their normalizers, bodies, and normalized_dict() return values are equal. When comparing a NormalizedDict to any other type of mapping, the other mapping is first converted to a NormalizedDict using the same normalizer.
Parameters:
- data (mapping) – a mapping or iterable of (key, value) pairs with which to initialize the instance
- normalizer (callable) – A callable to apply to keys before looking them up; defaults to lower. The callable MUST be idempotent (i.e., normalizer(x) must equal normalizer(normalizer(x)) for all inputs) or else bad things will happen to your dictionary.
- body (string or None) – initial value for the body attribute
body = None
This is where HeaderParser stores the message body (if any) accompanying the header section represented by the mapping
normalized()
Return a copy of the instance such that iterating over it will return normalized keys instead of the keys passed to __setitem__
>>> normdict = NormalizedDict()
>>> normdict['Foo'] = 23
>>> normdict['bar'] = 42
>>> sorted(normdict)
['Foo', 'bar']
>>> sorted(normdict.normalized())
['bar', 'foo']
Return type: NormalizedDict
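The lookup behavior described for NormalizedDict can be approximated with a small MutableMapping subclass. The class below is a simplified stand-in written for illustration only; SketchNormalizedDict and sketch_lower are hypothetical names, and the body attribute, equality rules, and normalized() method are left out.

```python
from collections.abc import MutableMapping


def sketch_lower(key):
    """Lowercase strings; pass other objects through unchanged."""
    return key.lower() if hasattr(key, "lower") else key


class SketchNormalizedDict(MutableMapping):
    """Simplified stand-in: keys are compared after normalization."""

    def __init__(self, data=None, normalizer=sketch_lower):
        self._normalizer = normalizer
        # Maps normalized key -> (key as last set, value)
        self._data = {}
        if data is not None:
            self.update(data)

    def __getitem__(self, key):
        return self._data[self._normalizer(key)][1]

    def __setitem__(self, key, value):
        self._data[self._normalizer(key)] = (key, value)

    def __delitem__(self, key):
        del self._data[self._normalizer(key)]

    def __iter__(self):
        # Yield each key in the form it was last set with
        return (original for original, _ in self._data.values())

    def __len__(self):
        return len(self._data)


d = SketchNormalizedDict()
d["Foo"] = 23
d["bar"] = 42
print(d["FOO"], sorted(d))
```

Storing the original key next to the value is what lets iteration return the "pre-normalized" form. An idempotent normalizer matters here because keys that have already been normalized may be normalized again on a later lookup.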
headerparser.BOOL(s)
Convert boolean-like strings to bool values. The strings 'yes', 'y', 'on', 'true', and '1' are converted to True, and the strings 'no', 'n', 'off', 'false', and '0' are converted to False. The conversion is case-insensitive and ignores leading & trailing whitespace. Any value that cannot be converted to a bool results in a ValueError.
Parameters: s (string) – a boolean-like string to convert to a bool
Return type: bool
Raises: ValueError – if s is not one of the values listed above
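The conversion table above is small enough to restate in code. This is a sketch of the documented behavior, assuming only what the description says; sketch_bool is a hypothetical name and not the library's BOOL function.

```python
TRUTHY = {"yes", "y", "on", "true", "1"}
FALSY = {"no", "n", "off", "false", "0"}


def sketch_bool(s):
    """Hypothetical re-statement of the documented conversion rules."""
    # Case-insensitive, ignoring leading and trailing whitespace
    normalized = s.strip().lower()
    if normalized in TRUTHY:
        return True
    if normalized in FALSY:
        return False
    raise ValueError("invalid boolean: %r" % (s,))


print(sketch_bool(" Yes "), sketch_bool("OFF"))
```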
headerparser.lower(s)
New in version 0.2.0.
Convert s to lowercase by calling its lower() method if it has one; otherwise, return s unchanged
headerparser.unfold(s)
New in version 0.2.0.
Remove folding whitespace from a string by converting line breaks (and any whitespace adjacent to line breaks) to a single space and removing leading & trailing whitespace.
>>> unfold('This is a \n folded string.\n')
'This is a folded string.'
Parameters: s (string) – a string to unfold
Return type: string
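One way to obtain the unfolding behavior described above is a single regular-expression substitution followed by a strip. The function below is a sketch under that assumption and may differ from the library's implementation in edge cases; sketch_unfold is a hypothetical name.

```python
import re


def sketch_unfold(s):
    """Hypothetical equivalent of the documented unfolding rule."""
    # Collapse each line break plus adjacent whitespace into one space,
    # then trim leading and trailing whitespace
    return re.sub(r"[ \t]*[\r\n][ \t\r\n]*", " ", s).strip()


print(repr(sketch_unfold("This is a \n folded string.\n")))
```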
Exceptions
exception headerparser.errors.Error
Bases: Exception
Superclass for all custom exceptions raised by the package
Parser Errors
exception headerparser.errors.ParserError
Bases: headerparser.errors.Error, ValueError
Superclass for all custom exceptions related to errors in parsing
exception headerparser.errors.BodyNotAllowedError
Bases: headerparser.errors.ParserError
Raised when body=False and the parser encounters a message body
exception headerparser.errors.DuplicateFieldError(name)
Bases: headerparser.errors.ParserError
Raised when a header field not marked as multiple occurs two or more times in the input
- name = None: The name of the duplicated header field
exception headerparser.errors.FieldTypeError(name, value, exc_value)
Bases: headerparser.errors.ParserError
Raised when a type callable raises an exception
- exc_value = None: The exception raised by the type callable
- name = None: The name of the header field for which the type callable was called
- value = None: The value on which the type callable was called
exception headerparser.errors.InvalidChoiceError(name, value)
Bases: headerparser.errors.ParserError
Raised when a header field is given a value that is not one of its allowed choices
- name = None: The name of the header field
- value = None: The invalid value
exception headerparser.errors.MissingBodyError
Bases: headerparser.errors.ParserError
Raised when body=True but there is no message body in the input
exception headerparser.errors.MissingFieldError(name)
Bases: headerparser.errors.ParserError
Raised when a header field marked as required is not present in the input
- name = None: The name of the missing header field
exception headerparser.errors.UnknownFieldError(name)
Bases: headerparser.errors.ParserError
Raised when an unknown header field is encountered and additional header fields are not enabled
- name = None: The name of the unknown header field
Scanner Errors
exception headerparser.errors.ScannerError
Bases: headerparser.errors.Error, ValueError
Superclass for all custom exceptions related to errors in scanning
exception headerparser.errors.MalformedHeaderError(line)
Bases: headerparser.errors.ScannerError
Raised when the scanner encounters an invalid header line, i.e., a line without either a colon or leading whitespace
- line = None: The invalid header line
exception headerparser.errors.UnexpectedFoldingError(line)
Bases: headerparser.errors.ScannerError
Raised when the scanner encounters a folded (indented) line that is not preceded by a valid header line
- line = None: The line containing the unexpected folding (indentation)
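Because ParserError and ScannerError both list ValueError among their bases, a handler written for ValueError also catches them. The sketch below mirrors that inheritance shape with stand-in classes (all Sketch* names are hypothetical) so the effect can be checked without the library installed.

```python
class SketchError(Exception):
    """Stand-in for the package-wide base exception."""


class SketchParserError(SketchError, ValueError):
    """Stand-in for a parsing error that is also a ValueError."""


class SketchDuplicateFieldError(SketchParserError):
    """Stand-in for an error that records the offending field name."""

    def __init__(self, name):
        super().__init__(name)
        self.name = name


def describe(exc):
    """Raise exc and report how a plain ValueError handler sees it."""
    try:
        raise exc
    except ValueError as e:
        return type(e).__name__, getattr(e, "name", None)


print(describe(SketchDuplicateFieldError("Foo")))
```

Catching the package base class instead would separate these errors from unrelated ValueError instances raised elsewhere in the same block.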
Changelog
v0.4.0 (2019-05-29)
- Added a scan() function combining the behavior of scan_file() and scan_lines(), which are now deprecated
- Gave HeaderParser a parse() method combining the behavior of parse_file() and parse_lines(), which are now deprecated
- Added scan_next_stanza() and scan_next_stanza_string() functions for scanning & consuming input only up to the end of the first header section
- Added scan_stanzas() and scan_stanzas_string() functions for scanning input composed entirely of multiple stanzas/header sections
- Gave HeaderParser parse_next_stanza() and parse_next_stanza_string() methods for parsing & consuming input only up to the end of the first header section
- Gave HeaderParser parse_stanzas() and parse_stanzas_string() methods for parsing input composed entirely of multiple stanzas/header sections
v0.3.0 (2018-10-12)¶
- Dropped support for Python 3.3
- Gave HeaderParser and the scanner functions options for configuring scanning behavior:
  - separator_regex
  - skip_leading_newlines
- Fixed a DeprecationWarning in Python 3.7
v0.2.0 (2018-02-14)¶
- NormalizedDict’s default normalizer (exposed as the lower() function) now passes non-strings through unchanged
- HeaderParser instances can now be compared for non-identity equality
- HeaderParser.add_field() and HeaderParser.add_additional() now take an optional action argument for customizing the parser’s behavior when a field is encountered
- Made the unfold() function public
v0.1.0 (2017-03-17)¶
Initial release
headerparser parses key-value pairs in the style of RFC 822 (e-mail) headers and converts them into case-insensitive dictionaries with the trailing message body (if any) attached. Fields can be converted to other types, marked required, or given default values using an API based on the standard library’s argparse module. (Everyone loves argparse, right?) Low-level functions for just scanning header fields (breaking them into sequences of key-value pairs without any further processing) are also included.
Installation¶
Just use pip (you have pip, right?) to install headerparser and its dependencies:
pip install headerparser
Examples¶
Define a parser:
>>> import headerparser
>>> parser = headerparser.HeaderParser()
>>> parser.add_field('Name', required=True)
>>> parser.add_field('Type', choices=['example', 'demonstration', 'prototype'], default='example')
>>> parser.add_field('Public', type=headerparser.BOOL, default=False)
>>> parser.add_field('Tag', multiple=True)
>>> parser.add_field('Data')
Parse some headers and inspect the results:
>>> msg = parser.parse_string('''\
... Name: Sample Input
... Public: yes
... tag: doctest, examples,
...  whatever
... TAG: README
...
... Wait, why I am using a body instead of the "Data" field?
... ''')
>>> sorted(msg.keys())
['Name', 'Public', 'Tag', 'Type']
>>> msg['Name']
'Sample Input'
>>> msg['Public']
True
>>> msg['Tag']
['doctest, examples,\n whatever', 'README']
>>> msg['TYPE']
'example'
>>> msg['Data']
Traceback (most recent call last):
...
KeyError: 'data'
>>> msg.body
'Wait, why I am using a body instead of the "Data" field?\n'
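Note that the KeyError above reports 'data' rather than 'Data', while msg['TYPE'] still finds the 'Type' field. Both are consistent with keys being normalized (lowercased) before lookup. The following is a minimal sketch of that idea using only the standard library; `CaseInsensitiveDict` is a hypothetical class for illustration and is not the library's NormalizedDict, and the `lower()` helper here only mimics the behavior described in the changelog (strings are lowercased, non-strings pass through unchanged).

```python
def lower(key):
    """Lowercase strings; pass anything else through unchanged."""
    return key.lower() if isinstance(key, str) else key


class CaseInsensitiveDict:
    """Hypothetical mapping that normalizes keys on every access."""

    def __init__(self):
        # normalized key -> (key as first given, value)
        self._data = {}

    def __setitem__(self, key, value):
        norm = lower(key)
        # Keep the spelling of the key from its first assignment.
        original = self._data[norm][0] if norm in self._data else key
        self._data[norm] = (original, value)

    def __getitem__(self, key):
        # A missing key raises KeyError with the *normalized* key.
        return self._data[lower(key)][1]

    def keys(self):
        return [original for original, _ in self._data.values()]


fields = CaseInsensitiveDict()
fields["Type"] = "example"
fields["TYPE"] = "demonstration"  # same entry, original spelling kept
```

With this sketch, `fields["type"]` returns the latest value, `fields.keys()` still reports the first spelling, and looking up a missing `"Data"` key raises a KeyError whose argument is the lowercased `'data'`.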
Fail to parse headers that don’t meet your requirements:
>>> parser.parse_string('Type: demonstration')
Traceback (most recent call last):
...
headerparser.errors.MissingFieldError: Required header field 'Name' is not present
>>> parser.parse_string('Name: Bad type\nType: other')
Traceback (most recent call last):
...
headerparser.errors.InvalidChoiceError: 'other' is not a valid choice for 'Type'
>>> parser.parse_string('Name: unknown field\nField: Value')
Traceback (most recent call last):
...
headerparser.errors.UnknownFieldError: Unknown header field 'Field'
Allow fields you didn’t even think of:
>>> parser.add_additional()
>>> msg = parser.parse_string('Name: unknown field\nField: Value')
>>> msg['Field']
'Value'
Just split some headers into names & values and worry about validity later:
>>> for field in headerparser.scan_string('''\
... Name: Scanner Sample
... Unknown headers: no problem
... Unparsed-Boolean: yes
... CaSe-SeNsItIvE-rEsUlTs: true
... Whitespace around colons:optional
... Whitespace around colons : I already said it's optional.
...  That means you have the _option_ to use as much as you want!
...
... And there's a body, too, I guess.
... '''): print(field)
('Name', 'Scanner Sample')
('Unknown headers', 'no problem')
('Unparsed-Boolean', 'yes')
('CaSe-SeNsItIvE-rEsUlTs', 'true')
('Whitespace around colons', 'optional')
('Whitespace around colons', "I already said it's optional.\n That means you have the _option_ to use as much as you want!")
(None, "And there's a body, too, I guess.\n")
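As the output above shows, the scanner reports the message body as a final pair whose name is None. Code that consumes scanner output therefore usually wants to separate that pair from the real header fields. Here is a small sketch of one way to do so; `split_fields()` is a hypothetical helper, not part of the library, and it works on any iterable of (name, value) pairs, so it needs nothing beyond the standard library.

```python
def split_fields(pairs):
    """Separate scanned (name, value) pairs into header fields and a body.

    Returns (headers, body), where headers is a list of (name, value)
    pairs in input order and body is None if no body pair was seen.
    """
    headers = []
    body = None
    for name, value in pairs:
        if name is None:
            body = value  # the pair with a None name carries the body
        else:
            headers.append((name, value))
    return headers, body


# Pairs copied from the scanner output shown above.
scanned = [
    ("Name", "Scanner Sample"),
    ("Whitespace around colons", "optional"),
    (None, "And there's a body, too, I guess.\n"),
]
headers, body = split_fields(scanned)
```

Keeping the headers as a list rather than a dict preserves repeated names, such as the two 'Whitespace around colons' fields in the example, which a plain dict would silently collapse.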