The regex below tokenizes a Lisp expression, doing as much as possible in ‘one go’: each match captures the next token plus the rest of the line. How does it look, and what can be improved?

    import re

    # goal: capture the next token then get the rest of the line
    # to be used in a while-loop/yield
    tokenizer = re.compile(r"""
        \s*                          # any amount of whitespace...
        # 1. capture group one: token
        (
            ,@                       # special token ,@ ...
            |[(),`']                 # or ) ( , ' ` ...
            |"(?:[^\\"]*(?:\\.)*)*"  # or match on string (unrolling the loop)...
            |;.*                     # or comment-anything...
            |[^\s('"`,;)]*           # or non-special...
        )
        # 2. capture group two: rest-of-line
        (.*)
    """, re.VERBOSE)

Example run (python):

    line = '(define (square x) (* x x))'
    while line:
        token, line = tokenizer.match(line).groups()
        print(token)
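
For reference, here is a minimal sketch of the "while-loop/yield" usage mentioned in the code comment. The names `TOKEN_RE` and `tokens` are hypothetical, not part of the code above. It makes two assumptions: comments and empty tokens should be dropped, and a match that consumes nothing (which seems to happen only on an unterminated string) should raise rather than loop forever. The string alternative is swapped for the usual unrolled form `"[^\\"]*(?:\\.[^\\"]*)*"`, which avoids a quantifier nested directly inside another quantifier.

```python
import re

# Token grammar restated so this block runs on its own.
TOKEN_RE = re.compile(r"""
    \s*                            # skip leading whitespace
    (                              # group 1: the token
        ,@                         # unquote-splicing
        |[(),`']                   # single-character tokens
        |"[^\\"]*(?:\\.[^\\"]*)*"  # string literal (unrolled loop)
        |;.*                       # comment to end of line
        |[^\s('"`,;)]*             # symbol or number (may be empty)
    )
    (.*)                           # group 2: rest of the line
""", re.VERBOSE)


def tokens(line):
    """Yield the tokens of one line, skipping comments (hypothetical helper)."""
    while line:
        token, rest = TOKEN_RE.match(line).groups()
        if rest == line:
            # Nothing was consumed, e.g. an unterminated string literal.
            raise SyntaxError(f"cannot tokenize: {line!r}")
        line = rest
        if token and not token.startswith(";"):
            yield token


print(list(tokens("(define (square x) (* x x))")))
```

Because `(.*)` does not cross newlines, this sketch only handles one line at a time; multi-line input would need to be fed in line by line.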