'Don't parse markup languages with Regex' is an annoying trollpost and it should die... right?

ChubakPDP11+TakeWithGrainOfSalt · edit-2 3 months ago

'Don't parse markup languages with Regex' is an annoying trollpost and it should die... right?

@Corbin · 3 months ago

[HTML and Markdown] are not grammatically Type 2 (Chomsky-wise, Context-Free); rather, they are Type 3 (Chomsky-wise, Regular).

This is at least half-wrong, in that HTML is clearly not regular. The proof is simple: HTML structurally embeds a language of balanced parentheses (a Dyck language), and such languages are context-free and not regular. I don’t know Markdown well and there are several flavors; it’s quite possible that some flavors are regular. However, traditional Markdown embeds HTML, so if HTML is not regular than neither is Markdown.

I once did a syntax-directed translation of Markdown to HTML in AWK!

Sure. The original Markdown implementation was in Perl and operated similarly. However, note that this doesn’t imply that either language is regular, only that a translation is possible assuming the input is valid Markdown. Punting on recognition means that invalid parse trees have undefined translations, presumably at least sometimes generating invalid HTML.

ChubakPDP11+TakeWithGrainOfSalt · 2 months ago

I remember reading something about recursive languages in Aho’s Automata Book. I will have to check it again. So Regular Languages can’t be recursive? Is that what ‘recursive’ language even means? I have to read the book again.

Thanks for your help man.

@Corbin · 2 months ago

The definition of recursive language can be read on Wikipedia too. Quoting from that page:

All regular, context-free and context-sensitive languages are recursive.

But not all recursive languages are regular. As for “recursive”, it’s a historical term that might not be sufficiently descriptive; today, I’d usually call such languages decidable, to emphasize that they are languages which we can write (always-halting) computer programs to detect.

ChubakPDP11+TakeWithGrainOfSalt · 2 months ago

Thanks! You know last night I fed a model several books and asked it a lot of stuff. This did not come up.

What i do is I glance at the book, then ask the model to explain it to me. I call the model ‘Urkel the Wiz Kid’.

I fed both CS books and CE books. Like Sipsers, A Quantative Approach, Automata of Aho, Harris and Harris, etc.