The pyparsing module is an alternative approach to creating and
executing simple grammars, vs. the traditional lex/yacc approach, or the
use of regular expressions. With pyparsing, you don't need to learn a new
syntax for defining grammars or matching expressions - the pyparsing module
provides a library of classes that you use to construct the grammar
directly in Python.
Here is a program to parse "Hello, World!" (or any greeting
of the form "<salutation>, <addressee>!"):
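A minimal sketch of such a program, assuming the Word class and the alphas
character-set constant exported by pyparsing (the variable names are
illustrative):

```python
from pyparsing import Word, alphas

# <salutation> "," <addressee> "!"; bare strings are promoted to Literals.
greet = Word(alphas) + "," + Word(alphas) + "!"

hello = "Hello, World!"
tokens = greet.parseString(hello)
print(hello, "->", tokens.asList())
```

Run as a script, this should print: Hello, World! -> ['Hello', ',', 'World', '!']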
The Python representation of the grammar is quite readable, owing to
the self-explanatory class names, and the use of '+', '|' and '^'
operators.
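To illustrate how the three operators differ, the sketch below builds an
And, a MatchFirst and an Or from the same two sub-expressions. The names
integer and real are made up for this example.

```python
from pyparsing import And, Combine, MatchFirst, Or, Word, nums

integer = Word(nums)
real = Combine(Word(nums) + "." + Word(nums))

pair = integer + real     # '+' builds an And: both parts, in this order
first = integer | real    # '|' builds a MatchFirst: first alternative that matches
longest = integer ^ real  # '^' builds an Or: alternative with the longest match

pair_tokens = pair.parseString("42 3.14").asList()
first_tokens = first.parseString("3.14").asList()      # integer is listed first
longest_tokens = longest.parseString("3.14").asList()  # real matches more text
print(pair_tokens, first_tokens, longest_tokens)
```

With '|' the order of alternatives matters, so the more specific expression
usually goes first; '^' avoids that ordering concern at the cost of trying
every alternative.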
The parsed results returned from parseString() can be accessed as a
nested list, a dictionary, or an object with named attributes.
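A short sketch of the three access styles, assuming results names are
attached with setResultsName(). The names "salutation" and "addressee" are
chosen for this example only.

```python
from pyparsing import Suppress, Word, alphas

# Punctuation is suppressed so that only the two words remain in the results.
greet = (Word(alphas).setResultsName("salutation") + Suppress(",")
         + Word(alphas).setResultsName("addressee") + Suppress("!"))
result = greet.parseString("Hello, World!")

as_list = result.asList()     # list-style access
by_key = result["addressee"]  # dictionary-style access
by_attr = result.salutation   # attribute-style access
print(as_list, by_key, by_attr)
```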
The pyparsing module handles some of the problems that are typically
vexing when writing text parsers: extra or missing whitespace, quoted
strings, and embedded comments.
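As an illustration, the sketch below relies on pyparsing skipping
whitespace between tokens by default, and uses its predefined quotedString
and cStyleComment expressions together with ignore(). The assignment
grammar is invented for this example.

```python
from pyparsing import Word, alphas, cStyleComment, quotedString

# Whitespace between tokens is skipped by default, so one grammar
# accepts all of these spellings.
greet = Word(alphas) + "," + Word(alphas) + "!"
variants = ["Hello, World!", "Hello,World!", "Hello  ,  World  !"]
parsed_variants = [greet.parseString(text).asList() for text in variants]

# quotedString matches a quoted value (quotes are kept in the token);
# ignore() makes the grammar skip over embedded C-style comments.
assignment = Word(alphas) + "=" + quotedString.copy()
assignment.ignore(cStyleComment)
assigned = assignment.parseString('name /* who */ = "J. Smith"').asList()
print(parsed_variants, assigned)
```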
Classes |
And |
Requires all given ParseExpressions to be found in the given
order. |
CaselessLiteral |
Token to match a specified string, ignoring case of letters. |
CharsNotIn |
Token for matching words composed of characters *not* in a given
set. |
Combine |
Converter to concatenate all matching tokens to a single string. |
Dict |
Converter to return a repetitive expression as a list, but also as a
dictionary. |
Empty |
An empty token, will always match. |
FollowedBy |
Lookahead matching of the given parse expression. |
Forward |
Forward declaration of an expression to be defined later - used for
recursive grammars, such as algebraic infix notation. |
GoToColumn |
Token to advance to a specific column of input text; useful for
tabular report scraping. |
Group |
Converter to return the matched tokens as a list - useful for
returning tokens of ZeroOrMore and OneOrMore expressions. |
LineEnd |
Matches if current position is at the end of a line within the parse
string. |
LineStart |
Matches if current position is at the beginning of a line within the
parse string. |
Literal |
Token to exactly match a specified string. |
MatchFirst |
Requires that at least one ParseExpression is found; if several match,
the first one listed is used. |
NoMatch |
A token that will never match. |
NotAny |
Lookahead to disallow matching with the given parse expression. |
OneOrMore |
Repetition of one or more of the given expression. |
Optional |
Optional matching of the given expression. |
Or |
Requires that at least one ParseExpression is found; if several match,
the one that matches the longest string is used. |
ParseElementEnhance |
Abstract subclass of ParserElement, for combining and post-processing
parsed tokens. |
ParseExpression |
Abstract subclass of ParserElement, for combining and post-processing
parsed tokens. |
ParserElement |
Abstract base level parser element class. |
ParseResults |
Structured parse results, providing multiple means of access to the
parsed data: as a list, as a dictionary, or through named attributes. |
PositionToken |
|
SkipTo |
Token for skipping over all undefined text until the matched
expression is found. |
StringEnd |
Matches if current position is at the end of the parse string. |
StringStart |
Matches if current position is at the beginning of the parse
string. |
Suppress |
Converter for ignoring the results of a parsed expression. |
Token |
Abstract ParserElement subclass, for defining atomic matching
patterns. |
TokenConverter |
Abstract subclass of ParseExpression, for converting parsed
results. |
Upcase |
Converter to upper case all matching tokens. |
White |
Special matching class for matching whitespace. |
Word |
Token for matching words composed of allowed character sets. |
ZeroOrMore |
Optional repetition of zero or more of the given expression. |
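Several of the classes listed above can be combined in a small recursive
grammar. The sketch below is an illustration written for this summary, not
an excerpt from the pyparsing distribution; it parses nested parenthesized
lists using Forward, Group, Suppress, ZeroOrMore, Optional and Combine.

```python
from pyparsing import (Combine, Forward, Group, Optional, Suppress, Word,
                       ZeroOrMore, alphas, nums)

# Combine merges the optional sign and the digits into a single token.
integer = Combine(Optional("-") + Word(nums))

# Forward lets item refer to nested before nested has a definition.
nested = Forward()
item = integer | Word(alphas) | nested

# Suppress drops the parentheses; Group keeps each list as its own sublist.
nested << Group(Suppress("(") + ZeroOrMore(item) + Suppress(")"))

tree = nested.parseString("(a -1 (b 2) ())").asList()
print(tree)
```

Without Group, the tokens of the inner lists would be flattened into the
outer list and the nesting would be lost.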
Function Summary |
|
col(loc, strg)
Returns the current column within a string, counting newlines as line
separators. The first column is number 1. |
|
delimitedList(expr, delim, combine)
Helper to define a delimited list of expressions - the delimiter
defaults to ','. |
|
dictOf(key, value)
Helper to easily and clearly define a dictionary by specifying the
respective patterns for the key and value. |
|
line(loc, strg)
Returns the line of text containing loc within a string, counting
newlines as line separators. |
|
lineno(loc, strg)
Returns the current line number within a string, counting newlines as
line separators. The first line is number 1. |
|
oneOf(strs, caseless)
Helper to quickly define a set of alternative Literals. It does
longest-first testing when there is a conflict, regardless of the input
order, but returns a MatchFirst for best performance. |
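The helper functions summarized above can be exercised as follows. This is
a sketch with invented sample data; it passes only the positional arguments
shown in the signatures and leaves delim, combine and caseless at their
defaults.

```python
from pyparsing import (Suppress, Word, alphas, col, delimitedList, dictOf,
                       line, lineno, nums, oneOf)

# delimitedList: comma-separated by default; the delimiters are dropped.
colors = delimitedList(Word(alphas)).parseString("red, green ,blue").asList()

# oneOf: "<=" is tried before "<" even though "<" is listed first.
comparison = oneOf("< <= > >= = !=")
op = comparison.parseString("<= 5")[0]

# dictOf: repeated key/value pairs, retrievable by key afterwards.
settings = dictOf(Word(alphas), Suppress("=") + Word(nums))
config = settings.parseString("width=80 height=24")

# col, lineno and line translate a string offset into a position.
text = "alpha\nbeta\ngamma"
loc = text.index("beta") + 2  # offset of the 't' in "beta"
position = (lineno(loc, text), col(loc, text), line(loc, text))
print(colors, op, config["width"], position)
```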