There are plenty of articles about parsing full Markdown/CommonMark/etc, but I’m interested only in a specific limited subset, so I’m wondering if a better approach is available to me, and would much appreciate any tips or advice.
I only need to support:
No handling of lists, paragraphs, indentation, code blocks, tables, footnotes, superscript, subscript, headings, images, or any of that.
I have sort of solved it by running an individual regex for each feature in a loop, but that leads to problems. For example, the asterisks inside an inline code span get parsed as italics (so `a *int, b *int` becomes impossible to write), and I don’t know whether I can solve that without expensive lookbehinds. I’m also not sure whether complex regular expressions are slower than other methods, which matters because I want to parse the text frequently.
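One way to avoid the code-versus-italics clash without lookbehinds is to match every feature in a single pass with one alternation, listing code spans first so they consume their contents before the italics rule can see them. The sketch below is only an illustration: it assumes backtick code spans and single-asterisk italics (the two features mentioned above), and the names `TOKEN` and `render` and the HTML output are hypothetical choices, not part of any existing library.

```python
import html
import re

# Assumed syntax: `code` and *italics*. Code spans come first in the
# alternation, so they win wherever both alternatives could start.
TOKEN = re.compile(r"`(?P<code>[^`]+)`|\*(?P<em>[^*`]+)\*")


def render(text: str) -> str:
    """Convert the assumed subset to HTML in one left-to-right pass."""
    out = []
    pos = 0
    for m in TOKEN.finditer(text):
        # Plain text between matches is escaped and copied through.
        out.append(html.escape(text[pos:m.start()]))
        if m.group("code") is not None:
            # Asterisks inside a code span are never re-scanned.
            out.append("<code>" + html.escape(m.group("code")) + "</code>")
        else:
            out.append("<em>" + html.escape(m.group("em")) + "</em>")
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(render("`a *int, b *int` and *x*"))
```

Because each character is consumed by at most one match, the output of one rule is never fed into another, which is what goes wrong when separate regexes run in a loop.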
What approaches should I look at for a task like this? Is a finite state machine a better approach?
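For comparison, here is a minimal sketch of what a hand-written finite state machine could look like. It assumes the same two features (backtick code spans and single-asterisk italics), does not handle nesting, escapes, or bold, and treats an unterminated delimiter as literal text. `State`, `tokenize`, and the token names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    TEXT = auto()
    CODE = auto()
    EM = auto()


# Assumed delimiters: ` opens/closes code, * opens/closes emphasis.
OPENERS = {"`": State.CODE, "*": State.EM}
CLOSERS = {State.CODE: "`", State.EM: "*"}
KINDS = {State.CODE: "code", State.EM: "em"}


def tokenize(text):
    """Return a list of (kind, value) tokens; kind is 'text', 'code' or 'em'."""
    tokens = []
    buf = []
    state = State.TEXT

    def emit(kind, value):
        if kind == "text" and not value:
            return  # skip empty text runs
        if kind == "text" and tokens and tokens[-1][0] == "text":
            # Merge adjacent text tokens (happens after an unterminated span).
            tokens[-1] = ("text", tokens[-1][1] + value)
        else:
            tokens.append((kind, value))

    for ch in text:
        if state is State.TEXT:
            if ch in OPENERS:
                emit("text", "".join(buf))
                buf.clear()
                state = OPENERS[ch]
            else:
                buf.append(ch)
        elif ch == CLOSERS[state]:
            # Only the matching closer ends a span; inside CODE, '*' is literal.
            emit(KINDS[state], "".join(buf))
            buf.clear()
            state = State.TEXT
        else:
            buf.append(ch)

    if state is State.TEXT:
        emit("text", "".join(buf))
    else:
        # Unterminated span: put the opening delimiter back as plain text.
        emit("text", CLOSERS[state] + "".join(buf))
    return tokens


print(tokenize("`a *int, b *int` and *x*"))
```

Each character is inspected once, so the cost is linear in the input length. Whether this is actually faster than a compiled regex in a given runtime is something to measure rather than assume.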