Problems parsing a string with pyparsing

prettydarknwild@lemmy.world · 1 year ago


UlrikHD · edit-2 1 year ago

Personally, I would recommend using regex for the parsing instead, which would also make it easier to test your expressions. You could then get the list as

import re
result = re.findall(r'\w+|\S', yourstring)  # \w already matches "_", so ULLONG_MAX is preserved as a single word if that's what you want
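To make the regex behaviour concrete, here is a minimal sketch run on a made-up input (the sample string and the tokenize helper are illustrative assumptions, not taken from the original question). It uses \w+|\S, which is equivalent to [\w_]+|\S because \w already covers the underscore.

```python
import re

def tokenize(text):
    # \w+ grabs runs of letters, digits and underscores; \S then catches
    # any other single non-whitespace character, one token per symbol.
    return re.findall(r"\w+|\S", text)

sample = "x = ULLONG_MAX + (y)"  # hypothetical input, not from the thread
print(tokenize(sample))
# → ['x', '=', 'ULLONG_MAX', '+', '(', 'y', ')']
```

Note that each punctuation character comes out as its own token here, so "((" would give two "(" entries.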

As for what’s wrong with your expressions:

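First expression: once the parser hits (, OneOrMore(Char(printables)) takes over and keeps matching every printable character, letters and digits included. Instead, you should use the | operator (MatchFirst in pyparsing, which tries the alternatives left to right) with the alphanumeric expression first so it takes priority: OneOrMore(word | Char(printables))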

Second expression: you're running into the same issue with your use of +. Once string.punctuation takes over, it keeps matching until it encounters a character that is not punctuation, and then the matching stops. Instead you can write:

import string
from pyparsing import OneOrMore, Word, alphanums
parser = OneOrMore(Word(alphanums) | Word(string.punctuation))
result = parser.parseString(yourstring)

Do note that the underscore counts as punctuation (it is part of string.punctuation), so ULLONG_MAX will be split into ULLONG, _ and MAX. I'm not sure whether that's what you want.
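If you would rather keep ULLONG_MAX in one piece with pyparsing, one possible tweak is to move the underscore from the punctuation set to the word set. This is only a sketch: the sample string and the word_chars / punct_chars names are my own assumptions, not from your code.

```python
import string
from pyparsing import OneOrMore, Word, alphanums

# Assumption: identifiers may contain underscores, so "_" joins the word set.
word_chars = alphanums + "_"
# Drop "_" from the punctuation set so the two character sets do not overlap.
punct_chars = string.punctuation.replace("_", "")

parser = OneOrMore(Word(word_chars) | Word(punct_chars))

sample = "x = ULLONG_MAX + (y)"  # hypothetical input, not from the thread
tokens = list(parser.parseString(sample))
print(tokens)
# → ['x', '=', 'ULLONG_MAX', '+', '(', 'y', ')']
```

Unlike the regex version, Word(punct_chars) groups adjacent punctuation, so something like "((" would come back as a single token.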