Hey. I have made some DSLs before, and for all of them, most of which were in C, I have used Flex and Bison. But this time I wanna use Scheme, Cyclone Scheme to be exact. I can potentially use Flex/Bison this time too, because Cyclone has a strong FFI to C.

But I’d rather innovate. I have been writing down the language’s grammar in EBNF:

https://gist.github.com/Chubek/bd54df78fe1f71f46cb262ba990a209b

And my thinking is, why not turn this into a parser? Not something like BNFC, which translates BNF (not EBNF) into several parser and lexer specifications plus an AST; I want it to translate EBNF into Scheme.

But if you read the grammar, you will realize that there are some places where it’s not very descriptive and machine-friendly. For that reason, I think an LLM can help.

Now, I need your help. I am mostly active in systems programming, like Assembly and C stuff. I don’t know much about LLMs and this whole AI revolution. I did some work as an ‘ML-engineer guy’ (not an ML engineer, an ML-engineer guy, there’s a difference!), so I know how this whole thing works. I have also read MIT’s standard book on mathematical optimization.

But I definitely need to use a pretrained model here. My knowledge of mathematical optimization is useless when you need like 28 million to train a model that would aid you with this.

I don’t want to use an API. I wanna own my software. I do use ChatGPT as a search engine, but that’s about it, I never owned Google anyways!

I know about HuggingFace. What model there do you think would help me?

Also, how do these weights work? If I bind one DNN framework to Cyclone, will the weights trained by another DNN framework work in it too? Do people use frameworks not written in C, so I would have to, like, triple-bind it? I know both Google’s and Facebook’s are in C. However, they are in ‘garbage C’. Well, let’s deal with that later.

Anyways, thanks for your help.

TL;DR:

I need an LLM that would be used in an EBNF -> Scheme parser generator.

  • @Corbin
    22 months ago

    I’m not sure I understand your reasoning. Here are the highlights for me.

    > But this time I wanna use Scheme, Cyclone Scheme to be exact.

    Why? Scheme is a fine choice in general, but you should only tie yourself to a specific flavor of Scheme if you need specific features…

    > because Cyclone has a strong FFI to C.

    That’s a good reason! Keep in mind that there are usually multiple possible Schemes. In this case, I’d be aware of CHICKEN Scheme as well.

    > I want it to translate EBNF into Scheme.

    Scheme what? What sorts of programs? Presumably a lexer, a parser, and an AST manipulation kit? In a certain sense, that’s all that one can do with grammars. Don’t get me wrong: if you can parse ANTLR grammars, then you have hundreds of popular languages at your fingertips. But I think that you need to clarify your three languages: your source, your target, and your implementation. I like to use tombstone diagrams for this.

    > I think an LLM can help.

    No, I’m afraid not. Even if all the popular claims about them were true, LLMs wouldn’t help you understand compilers; at best, they might drop some phrases that can be looked up on Wikipedia.

    > I need an LLM that would be used in an EBNF -> Scheme parser generator.

    Wants are not needs. The parser generator could be handwritten. This was a painful lesson for me when I was younger; I faffed around with a parser generator for a couple weeks, and then one of the old hands wrote two thousand lines over a week, lexer and parser and test cases.

    • ChubakPDP11+TakeWithGrainOfSalt (OP)
      12 months ago

      Yes, I realized as much: a hand-rolled LP (lexer/parser) is best. Especially after I read Design Concepts in Programming Languages [2008, MIT] and learned how to represent ASTs in S-expressions. Thanks.
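
For anyone landing here later, here is a rough idea of what a hand-rolled lexer and parser that builds an S-expression-shaped AST might look like. This is only a minimal sketch under several assumptions: it is written in Python rather than Scheme, purely for illustration; the toy arithmetic grammar in the comment is hypothetical and has nothing to do with the grammar in the gist; and the names `tokenize`, `Parser`, `parse`, and `to_sexpr` are made up for this example. Each EBNF rule becomes one method, and each `{ ... }` repetition becomes a `while` loop.

```python
import re

# Hypothetical toy grammar (NOT the grammar from the gist), in EBNF:
#   expr   = term { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = NUMBER | IDENT | "(" expr ")" ;

# One token per match: a number, an identifier, or any single character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(text):
    """Hand-written lexer: turn text into a list of (kind, value) pairs."""
    tokens = []
    pos = 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        number, ident, punct = m.groups()
        if number is not None:
            tokens.append(("num", int(number)))
        elif ident is not None:
            tokens.append(("id", ident))
        elif punct in "+-*/()":
            tokens.append(("op", punct))
        else:
            raise SyntaxError(f"unexpected character {punct!r}")
        pos = m.end()
    return tokens


class Parser:
    """Recursive-descent parser: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", None)

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        # The EBNF repetition { ... } becomes a loop; nesting to the
        # left makes "-" left-associative.
        while self.peek() in (("op", "+"), ("op", "-")):
            _, op = self.advance()
            node = [op, node, self.term()]
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            _, op = self.advance()
            node = [op, node, self.factor()]
        return node

    def factor(self):
        kind, value = self.advance()
        if kind in ("num", "id"):
            return value
        if (kind, value) == ("op", "("):
            node = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")


def parse(text):
    """Parse text into a nested-list AST shaped like an S-expression."""
    parser = Parser(tokenize(text))
    ast = parser.expr()
    if parser.peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return ast


def to_sexpr(node):
    """Render the nested-list AST in S-expression notation."""
    if isinstance(node, list):
        return "(" + " ".join(to_sexpr(n) for n in node) + ")"
    return str(node)


print(to_sexpr(parse("1 + 2 * x")))  # (+ 1 (* 2 x))
```

The AST is just nested lists with the operator in head position, so printing it as an S-expression is a few lines. In Scheme the same structure would presumably be ordinary list data, so no separate AST types would be needed.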