Link to the thread: https://programming.dev/post/8969747

Hello everyone, I followed this thread yesterday and noticed a few very negative reactions to the choice of Java. I follow Java’s evolution from a distance, but it seems to have been evolving in a good direction over the last few years, and performance-wise it would seem to make sense for the back-end of a Lemmy-like platform.

Is that indeed the case? I was just curious to see that much negativity towards one of the most popular languages.

  • BatmanAoD
    1 year ago

    Re: “the guy has no clue what std::unique_ptr is”, are you saying that because of his assertion that unique_ptr has a non-zero cost, whereas Rust’s Box does not?

    He’s actually correct about that, although the difference is fairly minimal, and I believe the difference is outweighed by the unwinding (i.e. panic/exception handling) code that needs to be generated in both cases. But with unwinding disabled, you can see clearly that Rust generates exactly the same code for a Box as for a raw pointer, whereas C++ does not:

    I looked into this because of a Chandler Carruth talk, primarily about unique_ptr, called “There Are No Zero-Cost Abstractions”, which explains in detail why C++ fundamentally can’t optimize unique_ptr to generate the same code as a raw pointer.
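
    For readers without the compiler output at hand, here is a minimal sketch of the kind of comparison being described. It is not the original example: the names consume_raw, consume_unique, foo_raw, foo_unique, and g_sink are hypothetical, and everything sits in one translation unit so that it runs, whereas the discussion is about calls that cross translation units. The size check holds on the mainstream standard libraries with the default deleter, but the standard does not guarantee it.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical global, only here so the calls have an observable effect.
int g_sink = 0;

// Takes ownership through a raw pointer and frees it explicitly.
void consume_raw(int* p) {
    g_sink += *p;
    delete p;
}

// Takes ownership through a unique_ptr passed by value.
void consume_unique(std::unique_ptr<int> p) {
    g_sink += *p;
}  // p's destructor frees the int when the parameter is destroyed

// Forwarding wrappers: these are the functions whose generated code the
// discussion compares. On the Itanium C++ ABI (as far as I understand it),
// a by-value unique_ptr argument lives in a caller-side temporary and is
// passed by address, because its destructor is non-trivial.
int foo_raw(int* p) {
    consume_raw(p);
    return g_sink;
}

int foo_unique(std::unique_ptr<int> p) {
    consume_unique(std::move(p));
    return g_sink;
}

// Assumption: true for the default deleter on common implementations.
static_assert(sizeof(std::unique_ptr<int>) == sizeof(int*),
              "unique_ptr<int> is expected to be pointer-sized here");
```

    Both wrappers do the same work at the source level, so any difference in the generated code comes from how the argument is passed and destroyed, not from the free itself.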

    • AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world
      1 year ago

      That’s a bad apples-to-oranges comparison, unique_ptr frees memory upon destruction, which with the raw pointer version you don’t do. The least you could do is use rvalue references. The class layout of unique_ptr is also hard to optimize away (unless via LTO) because consume isn’t in the same translation unit and the compiler has to keep your binary ABI-compatible with the rest of your binaries. (Also, you’re using Clang 9 by the way; we are at version 17 now.)

      This is much fairer: https://godbolt.org/z/v4PYcd8hf

      Then, if you additionally make the functions’ bodies accessible to the compiler and add a free to the raw pointer version (for fairness, if you insist on having consume or foo destroy the resource), you should get almost identical assembly (still with an extra indirection, visible as an extra mov, because the C++ compiler still doesn’t see how you use them, but IMO that should still be a textbook case for LTO), and the non-zero difference should disappear altogether once you actually use those functions. If it doesn’t, you absolutely should file a bug report.

      Carruth, while an excellent presenter, has been on a “C++ standard committee bad, why don’t we do more ABI-breaking changes, y’all suck, Abseil and Carbon rule” rant spree, which basically materialized in Google stopping active participation in Clang (I haven’t followed the drama since then, so I’m not sure whether Google backtracked on that decision). It’s hard to consider him objective about this since he also has the Carbon project, and his recent Carbon talks are painful to watch: it’s hard to ignore how he’s going from the “C++ optimization chad” he used to be to a Google marketing/sales person.

      • BatmanAoD
        1 year ago

        “That’s a bad apples-to-oranges comparison, unique_ptr frees memory upon destruction, which with the raw pointer version you don’t do.”

        I intentionally crafted an example where the code is simply using unique_ptr (and Box) without freeing the memory, just as it uses the raw pointer without freeing it. The consumes function would of course free it, hence the name. Freeing the memory shouldn’t be all that different between free, ~unique_ptr, and Box::drop.

        Moreover, the Rust code is doing the same thing the C++ code is doing; Box frees memory just like unique_ptr does.

        “The least you could do is use rvalue references.”

        I was surprised to see how much lower-overhead that looks, and I couldn’t remember why I originally wrote the example as passing by value until I reviewed Carruth’s video. But he actually talks about using rvalue references around the 22 minute mark, and then goes back to passing by value, so I assume that’s why I wrote it the way I did. I do think it’s pretty counterintuitive that a type that’s semantically a pointer needs to be passed by reference for efficiency.

        “The class layout of unique_ptr is also hard to optimize away (unless via LTO)…”

        The “class layout” of unique_ptr is just a pointer; are you talking about the struct needing to be on the stack in order to satisfy the ABI? That’s true, but people do in fact need to pass data between multiple different translation units (and even into and out of dynamically-loaded libraries), so that should be possible to do in an efficient manner. And, again, both the raw-pointer version and the Rust version manage to make this work.

        “you’re using Clang 9 by the way…”

        Oops, good catch; I crafted this example a long time ago and did try it with the most recent version, but I guess that must have been in a different tab. But it doesn’t actually make much of a difference here.

        “Then, if you additionally make the functions’ bodies accessible to the compiler and add a free to the raw pointer version… and the non-zero difference should disappear altogether once you actually use those functions…”

        Yes, sure, compiling in one translation unit helps, but as I mentioned above, passing an owning pointer between translation units shouldn’t be inherently inefficient. But also, as far as I can tell, making those changes doesn’t actually make the unique_ptr and raw-pointer assembly equivalent. The && in the signature for “consumes” is odd because the function doesn’t actually take ownership of the pointer so it doesn’t actually free it, and consequently the inlining of the function is a no-op and the destructor is called inside foo. But that doesn’t hinder the raw-pointer comparison much, because the C version just inlines consumes. I don’t read assembly well enough to understand whether the extra mov in the unique_ptr version is very significant or why it exists. (The print_global function is only here to prevent the other functions from being turned into no-ops.)

        https://godbolt.org/z/83T8Gfszv
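
        To make the point about && concrete, here is a small sketch. The helpers peek and take are hypothetical names, not taken from the linked example. An rvalue-reference parameter only gives the callee permission to move from the argument; ownership actually changes hands only if the callee does so.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical helper: accepts an rvalue reference but never moves from it,
// so the caller's unique_ptr still owns the int after the call.
int peek(std::unique_ptr<int>&& p) {
    return *p;
}

// Hypothetical helper: moves from the reference, so ownership ends up in
// the local variable and the caller's unique_ptr is left null.
int take(std::unique_ptr<int>&& p) {
    std::unique_ptr<int> owned = std::move(p);
    return *owned;
}  // owned's destructor frees the int here
```

        After peek(std::move(p)) the caller’s p is still non-null and will be destroyed by the caller; only take leaves it null. That matches the observation above that the destructor call ends up inside foo when the callee never moves from its argument.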

        “Abseil and Carbon rule…”

        Abseil is…a collection of C++ libraries? How does that make him biased against the C++ standards committee? Carbon was announced in 2022, and the talk I linked was given in 2019, so I don’t know if Carruth was on his “rant spree” in your opinion at that point. But the point of linking to Carruth’s talk was just to explain where that example originally came from and to let someone more knowledgeable than myself explain why it would require ABI breakage for C++ to optimize unique_ptr as well as Rust optimizes Box.

        • AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world
          1 year ago

          The reason I said to use rvalue references is that otherwise it is an apples-to-oranges comparison: in the C++ code you have implicit ABI decisions around the calling convention and whose responsibility it is to destroy the temporary.

          “Yes, sure, compiling in one translation unit helps, but as I mentioned above, passing an owning pointer between translation units shouldn’t be inherently inefficient”

          https://godbolt.org/z/9875qMM6Y (or alternatively: https://godbolt.org/z/9xehs3sYP)

          The assembly is identical, the ownership is clearly transferred, and this doesn’t need LTO or looking at the function bodies and is entirely done by the C++ compiler. It involves using (when available) a vendor attribute (see trivial_abi, shouldn’t be an issue given Rust devs are fine with having only one compiler anyway) and writing a UniquePtr class (shouldn’t be used in production code, what I’ve given there is only for illustration purposes) that assumes that the custom deleter cannot have an internal state.
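
          For readers who can’t open the links, a rough sketch of what such a class might look like follows. This is not the code behind those links: UniquePtr here is a stripped-down illustration with a hard-coded delete, and the SKETCH_TRIVIAL_ABI macro, the Tracked type, and the consume function are hypothetical names. The attribute is Clang-specific, so the macro expands to nothing on other compilers, where the class still works but gets the ordinary calling convention.

```cpp
#include <cassert>
#include <utility>

// Clang-only vendor attribute; on other compilers this sketch compiles
// without it and simply loses the calling-convention benefit.
#if defined(__clang__)
#define SKETCH_TRIVIAL_ABI [[clang::trivial_abi]]
#else
#define SKETCH_TRIVIAL_ABI
#endif

// Illustration only, not production code: a stateless owning pointer.
template <typename T>
class SKETCH_TRIVIAL_ABI UniquePtr {
public:
    explicit UniquePtr(T* p = nullptr) noexcept : ptr_(p) {}
    UniquePtr(const UniquePtr&) = delete;
    UniquePtr& operator=(const UniquePtr&) = delete;
    UniquePtr(UniquePtr&& other) noexcept
        : ptr_(std::exchange(other.ptr_, nullptr)) {}
    UniquePtr& operator=(UniquePtr&& other) noexcept {
        if (this != &other) {
            delete ptr_;
            ptr_ = std::exchange(other.ptr_, nullptr);
        }
        return *this;
    }
    ~UniquePtr() { delete ptr_; }

    T& operator*() const noexcept { return *ptr_; }
    T* get() const noexcept { return ptr_; }

private:
    T* ptr_;
};

// Hypothetical type that counts live instances, to observe destruction.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Takes ownership by value; reports whether it received a live object.
bool consume(UniquePtr<Tracked> p) {
    return p.get() != nullptr;
}
```

          With the attribute applied, Clang may pass the object in a register like a plain pointer, and the callee rather than the caller destroys the parameter. The observable behaviour is the same either way: after consume(std::move(p)) the source is null and the Tracked object has been destroyed.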

          This is a zero-runtime-cost abstraction. Whether zeroing that cost depends on which ABI assumptions you’re ready to make, or on LTO, is another matter. We’re literally discussing a “problem” that is not really a problem, because Rust doesn’t yet have the luxury of having that problem: you’re easily forgetting that Rust has only one compiler.

          “Carbon was announced in 2022”

          A project like that usually takes years, so again, very likely that they began working on it years before that. For instance, Google designed Go in 2007 and announced it in November 2009.

          • BatmanAoD
            1 year ago

            So…you had to make your own version of unique_ptr to make it zero-cost? Doesn’t that just confirm the original statement you were disagreeing with, that unique_ptr has a small runtime cost? Or was there some other reason you thought the creator of the video you shared has “no idea” what unique_ptr is?

            I also don’t understand why the standard library can’t use the trivial_abi attribute. Different implementations of the standard library aren’t required to be interoperable, are they?

            I still don’t understand what you think is “apples-to-oranges” here. If you change the Rust code to require the C ABI, there’s no difference in the generated code: https://godbolt.org/z/1xf9qG3n8