Joined 2 years ago
Cake day: June 26th, 2023



  • As someone who’s spent way too much time agonizing over picking the perfect parsing technique for my own language, I’m actually gonna go against the norm and recommend figuring out the parser later. Instead, start by building your language’s ASTs directly in memory. From there, either build a backend that converts ASTs to LLVM IR or, what I’d actually do first, write an interpreter that executes directly on the ASTs. This way you figure out the mechanics of your language first, and once they’re well established, you can worry about your syntax and how to parse it into ASTs. There’s a lot you learn about your language by doing it this way that you don’t necessarily think about if you just start from the parser. It also lets you see real progress/output sooner, which I think is key for staying motivated on these kinds of projects.
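
To make the AST-first idea concrete, here is a minimal Python sketch. The node names (`Num`, `Var`, `BinOp`, `Let`) and the `evaluate` function are made up for illustration and assume a tiny expression language; the only point is that a program can be built as plain objects and run before any parser exists.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: float


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str        # "+" or "*" in this sketch
    left: object
    right: object


@dataclass
class Let:
    name: str      # variable being bound
    value: object  # expression for the variable
    body: object   # expression evaluated with the binding in scope


def evaluate(node, env=None):
    """Walk the tree and compute its value; env maps names to values."""
    env = {} if env is None else env
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, BinOp):
        lhs = evaluate(node.left, env)
        rhs = evaluate(node.right, env)
        if node.op == "+":
            return lhs + rhs
        if node.op == "*":
            return lhs * rhs
        raise ValueError(f"unknown operator: {node.op!r}")
    if isinstance(node, Let):
        inner = {**env, node.name: evaluate(node.value, env)}
        return evaluate(node.body, inner)
    raise TypeError(f"unknown node: {node!r}")


# "x = 3; 2 * x + 1", built by hand with no parser involved
program = Let("x", Num(3), BinOp("+", BinOp("*", Num(2), Var("x")), Num(1)))
print(evaluate(program))  # 7
```

Adding a language feature then means adding a node class and a branch in `evaluate`, which is a quick way to find out whether the feature behaves the way you expected.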

    When it comes time to actually write the parser, I recommend either hand-crafting a parser directly or using an existing parser generator tool like GNU Bison. I do not recommend trying to write your own parser generator (e.g. LR(k), LALR, LL(k), etc.) unless your language’s syntax is particularly simple. Speaking from experience, real languages have many common syntax features we take for granted that are hard to deal with in parser generators. In my case, I spent years bogged down exploring/implementing several state-of-the-art parsing algorithms (I’m a fan of generalized parsing, so Earley, GLR, SRNGLR, GLL, etc.), and only recently made any decent progress when I decided to trash them and just hand-write a dumb recursive-descent-esque parser. Once things are working, it’s pretty easy to go back and swap out the parser if you want something fancier.
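
For a sense of how little a "dumb" hand-written parser needs, here is a rough Python sketch of recursive descent for arithmetic expressions. The grammar, the `tokenize` function, and the `Parser` class are assumptions made up for this example (not the parser described above), and the output is nested tuples rather than real AST classes.

```python
import re

# A number, or a single operator/parenthesis, with optional leading whitespace.
TOKEN = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token: {self.peek()!r}")
        return tree

    def expr(self):
        # expr := term (("+" | "-") term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # term := atom (("*" | "/") atom)*
        node = self.atom()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.atom())
        return node

    def atom(self):
        # atom := NUMBER | "(" expr ")"
        token = self.advance()
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or token in {"+", "-", "*", "/", ")"}:
            raise SyntaxError(f"expected a number, got {token!r}")
        return float(token)


tree = Parser(tokenize("1 + 2 * (3 - 4)")).parse()
print(tree)  # ('+', 1.0, ('*', 2.0, ('-', 3.0, 4.0)))
```

Each grammar rule is one method, so precedence falls out of which method calls which, and weird special cases can be handled with ordinary `if` statements instead of fighting a generator.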


  • I’m making an engineering language where just about everything is an expression. Lately the most interesting thing to me is the juxtapose operator, i.e. if you stick two expressions next to each other without whitespace, they are considered juxtaposed. Initially juxtapose was just going to be for math/multiplication, but I’ve also decided to handle function calls via juxtapose as well (since it lets me get rid of several types of syntax and replace them with pure expression handling).

    Some interesting examples:

    • since the quotes delimit the string, you don’t need the parentheses

      printl'Hello, World!'
      
    • though sometimes you need to disambiguate with parentheses

      s = "Hello, World!"
      printl(s)
      
    • technically you can wrap either operand, so long as they touch

      (printl)s  // though this is bad style for function calls
      
    • this has a neat consequence that string prefixes are just functions, and work pretty seamlessly

      mypath = p"this/is/some/path/object"
      myregex = re"[^i*&2@]"
      myphonetics = ipa"ɛt vɔkavit dɛus aɾidam tɛɾam kɔngɾɛgatsiɔnɛskwɛ"
      

      p, re, and ipa are all just ordinary functions

    • some basic math examples

      x = 3
      y = 2x
      z = (2+3y)(x*2)
      
    • complex numbers/quaternions are pretty seamless

      1 + 2i
      1 + 2i + 3j + 4k
      
    • also physical units will be first class citizens, and fit in pretty nicely with juxtapose

      15kg
      7(kg) * 10(m/s/s)
      25(N/m^2) + 15(Pa)
      1500(W) / 10(A)
      5(A) * 2(Ω)
      8(m*s^-1) / 2(s)
      40(N*m) * 10(rad)
      1000(m^3) * 2(kg/m^3)
      
    • Where it gets really wacky/hard to parse is something like this

      sin(x)^2 + cos(x)^2    // => (sin(x))^2 + (cos(x))^2
      

      Depending on the types of sin and cos, different things can happen. By default sin/cos are functions, so the function call happens first, but if the user redefined them, or constructed an identically formatted expression where they are numeric, then the exponent should happen first

      s = 10
      c = 20
      s(x)^2 + c(x)^2    // => s(x^2) + c(x^2)
      

      but that’s all just a problem for the compiler
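
As a rough illustration only, the type-dependent behaviour of juxtapose can be mimicked in Python with a runtime check. This is a sketch under the assumption that the choice is made by looking at the left operand at runtime; `juxtapose` and `juxtapose_then_power` are hypothetical helper names, and a real compiler would more likely resolve this from static types.

```python
import math
from pathlib import PurePosixPath


def juxtapose(left, right):
    """Call `left` with `right` if it is callable, otherwise multiply."""
    if callable(left):
        return left(right)
    return left * right


def juxtapose_then_power(left, right, exponent):
    """Evaluate `left(right)^exponent`, choosing precedence from the type of `left`."""
    if callable(left):
        # function: call first, then exponentiate => (left(right))^exponent
        return juxtapose(left, right) ** exponent
    # number: exponentiate first, then multiply => left * (right^exponent)
    return juxtapose(left, right ** exponent)


x = 3
y = juxtapose(2, x)                   # 2x
z = juxtapose(2 + 3 * y, x * 2)       # (2+3y)(x*2)
print(y, z)                           # 6 120

# A "string prefix" is just a function applied to a string literal;
# PurePosixPath stands in for p here.
p = PurePosixPath
mypath = juxtapose(p, "this/is/some/path/object")

# sin(x)^2 + cos(x)^2 with functions: the calls happen first
trig = juxtapose_then_power(math.sin, x, 2) + juxtapose_then_power(math.cos, x, 2)

# s(x)^2 + c(x)^2 with numbers: the exponents happen first
s, c = 10, 20
numeric = juxtapose_then_power(s, x, 2) + juxtapose_then_power(c, x, 2)
print(numeric)                        # 270
```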

    Been working on and off on an interpreter/compiler written in Python. Pretty slow going though.