Hey. I have made some DSLs before, and for all of them, most of which were in C, I have used Flex and Bison. But this time I wanna use Scheme, Cyclone Scheme to be exact. I can potentially use Flex/Bison this time too, because Cyclone has a strong FFI to C.

But I’d rather innovate. I have been writing down the language’s grammar in EBNF:

https://gist.github.com/Chubek/bd54df78fe1f71f46cb262ba990a209b

And my thinking is, why not turn this into a parser? Not something like BNFC that translates BNF (not EBNF) into several parser and lexer specifications plus an AST, I want it to translate EBNF into Scheme.

But if you read the grammar, you will realize that there are some places where it’s not very descriptive and machine-friendly. For that reason, I think an LLM can help.

Now, I need your help. I am mostly active in systems programming. Like Assembly, C stuff. I don’t know much about LLMs and this whole AI revolution. I did some work as an ‘ML-engineer guy’ (not an ML engineer, an ML-engineer guy, there’s a difference!), so I know how this whole thing works. I have also read MITs standard book on mathematical optimization.

But I definitely need to use a pretrained model here. My knowledge of mathematical optimization is useless when you need like 28 million to train a model that would aide you with this?

I don’t want to use an API. I wanna own my software. I do use ChatGPT as a search engine, but that’s about it, I never owned Google anyways!

I know about HuggingFace. What model there do you think would help me?

Also, how do these weights work? If I bind one DNN framework to Cyclone, will the weights trained by another DNN framework work in it too? Do people use frameworks not written in C, so I would have to like triple-bind it? I know both Google’s and Facebook’s are in C. However they are in ‘garbage c’. Well let’s deal with that later.

Anyways, thanks for your help.

TL; DR:

I need an LLM that would be used in an EBNF -> Scheme parser generator.