I want to make my own programming language! …for fun.

I’ve been reading LLVM’s own tutorial, which is really good. I’m curious though, for those of you who have written your own languages before… What do you wish you had known before you set out?

In terms of previous experience, I have written a really basic lexer and parser for a non-executable markup language I designed. Now I’m curious about the next level. I have some ideas for a language design I’d like to try out. The language features themselves are nothing new: I’m sure some other language out there has done these things, and done them better. That’s fine! I just want to better understand how all this stuff hangs together.

  • K2yfi · 1 year ago

    As someone who’s spent way too much time agonizing over the perfect parsing technique for my own language, I’m actually gonna go against the norm and recommend figuring out the parser later. Instead, start by building your language’s ASTs directly in memory. From there, you can either build a backend that converts the ASTs to LLVM IR or, what I’d actually do first, write an interpreter that executes directly on the ASTs. This way you figure out the mechanics of your language first; once they’re well established, you can worry about your syntax and how to parse it into ASTs. You learn a lot about your language this way that you might not think about if you start from the parser. It also lets you see real progress and output sooner, which I think is key to staying motivated on these kinds of projects.
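To make the "ASTs first, interpreter first" idea concrete, here is a minimal sketch in Python of a tree-walking interpreter over a hand-built AST. The tiny expression language and the node names `Num`, `Var`, `BinOp`, and `Let` are hypothetical, chosen only for illustration; the point is that you can run programs before any parser exists.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical AST node types for a tiny expression language.
@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str          # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"

@dataclass
class Let:
    name: str        # binds `name` to the value of `bound` while evaluating `body`
    bound: "Expr"
    body: "Expr"

Expr = Union[Num, Var, BinOp, Let]

def evaluate(expr: Expr, env: dict[str, int]) -> int:
    """Tree-walking interpreter: recursively evaluate one AST node."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]  # a KeyError here means an unbound variable
    if isinstance(expr, BinOp):
        left = evaluate(expr.left, env)
        right = evaluate(expr.right, env)
        if expr.op == "+":
            return left + right
        if expr.op == "-":
            return left - right
        if expr.op == "*":
            return left * right
        raise ValueError(f"unknown operator {expr.op!r}")
    if isinstance(expr, Let):
        # Extend a copy of the environment so the binding is scoped to `body`.
        inner = {**env, expr.name: evaluate(expr.bound, env)}
        return evaluate(expr.body, inner)
    raise TypeError(f"not an AST node: {expr!r}")

# No parser yet: build `let x = 2 + 3 in x * 4` by hand.
program = Let("x", BinOp("+", Num(2), Num(3)), BinOp("*", Var("x"), Num(4)))
print(evaluate(program, {}))  # prints 20
```

Because the AST is constructed by hand, you can change evaluation rules (scoping, operators, error handling) freely and settle on concrete syntax later.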

    When it comes time to actually write the parser, I recommend either hand-crafting one directly or using an existing parser generator such as GNU Bison. I do not recommend writing your own parser generator (e.g. LR(k), LALR, LL(k), etc.) unless your language’s syntax is particularly simple. Speaking from experience, real languages have many common syntax features we take for granted that are hard to handle in parser generators. In my case, I spent years bogged down exploring and implementing several state-of-the-art parsing algorithms (I’m a fan of generalized parsing, so Earley, GLR, SRNGLR, GLL, etc.), and only recently made any decent progress when I decided to trash them and hand-write a dumb recursive-descent-esque parser. Once things are working, it’s pretty easy to go back and swap out the parser if you want something fancier.
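A hand-written recursive-descent parser can be quite small. The sketch below is plain Python (not Bison output) and parses a hypothetical arithmetic grammar, shown in the comments, into nested tuples. The rule names `expr`, `term`, and `factor` and the tuple AST shape are assumptions for illustration. Note how the left-recursive rule `expr := expr "+" term` is written as a loop, which keeps operators left-associative without infinite recursion.

```python
import re

# Hypothetical grammar for a small expression language:
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | IDENT | "(" expr ")"
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|\S")  # numbers, identifiers, single symbols

class Parser:
    def __init__(self, src: str):
        self.tokens = TOKEN_RE.findall(src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def parse(self):
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # fold leftwards: left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok.isdigit():
            return ("num", int(tok))
        if tok == "(":
            node = self.expr()
            self.expect(")")
            return node
        if tok[0].isalpha() or tok[0] == "_":
            return ("var", tok)
        raise SyntaxError(f"unexpected token {tok!r}")
```

For example, `Parser("2 * (x + 1)").parse()` returns `('*', ('num', 2), ('+', ('var', 'x'), ('num', 1)))`. Each grammar rule maps to one method, which is also what makes good, rule-specific error messages easy to add.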

    • TheLinuxGuy · 1 year ago

      I agree with avoiding having to make your own parser generator; this is precisely what I’m doing, and it’s hell. I assume you’d still want to pick up some understanding of how parsers differ when it comes to writing grammars. As for ease of use and requiring the least understanding, something like an Earley parser is probably the easiest: it is slower than other parsing algorithms, but it can handle ambiguous grammars, which makes it ideal for first-timers learning how to write a programming language.
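As a rough illustration of why Earley parsing copes with ambiguity, here is a minimal Earley recognizer in Python. It only answers yes or no and builds no parse trees, unlike Lark. The grammar `E -> E "+" E | "n"` is a hypothetical example that is both ambiguous and left-recursive, so a naive recursive-descent parser would loop forever on it, while the Earley chart simply accumulates every partial parse. The sketch assumes the grammar has no empty (epsilon) productions, which keeps the completer step simple.

```python
from typing import NamedTuple

# Hypothetical ambiguous, left-recursive grammar: E -> E "+" E | "n"
# Nonterminals are the dict keys; any other right-hand-side symbol is a terminal.
# Assumption: no empty (epsilon) productions.
GRAMMAR = {
    "E": [("E", "+", "E"), ("n",)],
}

class Item(NamedTuple):
    lhs: str       # the rule's left-hand side
    rhs: tuple     # the rule's right-hand side
    dot: int       # how many rhs symbols have been matched so far
    origin: int    # chart position where this item started

    def next_symbol(self):
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

def earley_recognize(grammar, start, tokens):
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(i, item):
        if item not in chart[i]:
            chart[i].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] keeps growing while we process it
            item = chart[i][j]
            j += 1
            sym = item.next_symbol()
            if sym is None:
                # Completer: advance every item that was waiting for item.lhs.
                for waiting in chart[item.origin]:
                    if waiting.next_symbol() == item.lhs:
                        add(i, waiting._replace(dot=waiting.dot + 1))
            elif sym in grammar:
                # Predictor: start every rule for the expected nonterminal here.
                for rhs in grammar[sym]:
                    add(i, Item(sym, rhs, 0, i))
            elif i < len(tokens) and tokens[i] == sym:
                # Scanner: the expected terminal matches the next token.
                add(i + 1, item._replace(dot=item.dot + 1))

    # Accept if a complete start rule spans the whole input.
    return any(
        item.lhs == start and item.next_symbol() is None and item.origin == 0
        for item in chart[len(tokens)]
    )
```

`earley_recognize(GRAMMAR, "E", ["n", "+", "n", "+", "n"])` accepts the input even though it has two possible parse trees; turning this into a real parser means also recording back-pointers to build a parse forest, which is where most of the remaining complexity lives.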

      • philm · 1 year ago

        I just default to recursive-descent parsers (with Pratt parsing): simple, efficient, great error messages, and almighty (for CFGs). For quick prototyping, I currently really like https://github.com/zesterer/chumsky (Pratt parsing support was just added; I need to try it out again).
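For reference, the core of a Pratt parser is a single loop driven by per-operator binding powers. This is a minimal sketch in plain Python, not chumsky’s API; the operator tables, their binding-power numbers, and the tuple AST are hypothetical choices for illustration. A left binding power lower than the right one makes an operator left-associative (`-`), and the reverse makes it right-associative (`^`).

```python
import re

# Hypothetical binding powers as (left, right). left < right: left-associative;
# left > right: right-associative. Higher numbers bind tighter.
INFIX = {"+": (1, 2), "-": (1, 2), "*": (3, 4), "/": (3, 4), "^": (6, 5)}
PREFIX = {"-": 5}  # unary minus: tighter than "*", looser than "^"

TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|\S")

class Pratt:
    def __init__(self, src: str):
        self.tokens = TOKEN_RE.findall(src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse_expr(self, min_bp=0):
        tok = self.next()
        # Prefix position: literals, variables, parentheses, unary operators.
        if tok.isdigit():
            lhs = ("num", int(tok))
        elif tok == "(":
            lhs = self.parse_expr(0)
            if self.next() != ")":
                raise SyntaxError("expected ')'")
        elif tok in PREFIX:
            lhs = ("neg", self.parse_expr(PREFIX[tok]))
        elif tok[0].isalpha() or tok[0] == "_":
            lhs = ("var", tok)
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

        # Infix loop: keep absorbing operators that bind at least as tightly as min_bp.
        while True:
            op = self.peek()
            if op not in INFIX:
                break  # end of input, ")" or a non-operator token
            left_bp, right_bp = INFIX[op]
            if left_bp < min_bp:
                break  # let the caller's operator claim lhs
            self.next()
            lhs = (op, lhs, self.parse_expr(right_bp))
        return lhs

    def parse(self):
        tree = self.parse_expr(0)
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree
```

`Pratt("1 - 2 - 3").parse()` yields `('-', ('-', ('num', 1), ('num', 2)), ('num', 3))`, while `Pratt("2 ^ 3 ^ 2").parse()` groups to the right. Adding an operator is one table entry rather than a new grammar rule, which is much of why Pratt parsing pairs so well with recursive descent.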

        But writing a parser generator is certainly an interesting academic task.

        • TheLinuxGuy · 1 year ago

          Very nice. I was basically forking Python’s Lark and rewriting it in C, with some adjustments to its Earley parser, in an experiment to parallelize the parsing with Vulkan compute.