• @porgamrer
    3 months ago

    For 99% of use cases this string pool is just slower. Whether intentionally or not, the benchmark code is strange and misleading.

    String and StringPool are only slower in the final benchmark because doing 100,000 allocations in a synchronous loop while retaining a reference to each one is the worst-case scenario for a generational GC. It forcibly and artificially breaks the generational hypothesis, which assumes that most objects die young.

    Conversely, caching 100,000 samples of the same 16 strings (!!!) is the best possible case for the string pool. It spends zero time in GC because the benchmark code contains this very unrealistic pattern.

    Most real code is going to quickly forget intermediate strings and clean them up very cheaply in the nursery generation. If you do need to sample 100,000 substrings in a synchronous loop, you can just use ReadOnlySpan.
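    As a rough illustration of the ReadOnlySpan approach, here is a minimal sketch that samples fixed-width windows of a string and compares each one without allocating a substring. The class name, method name, and the fixed-width windowing are assumptions made for the example, not something taken from the article's benchmark.

```csharp
using System;

public static class SpanSampling
{
    // Hypothetical helper: counts how many consecutive, non-overlapping
    // windows of `width` characters in `text` are equal to `needle`.
    // Each window is a ReadOnlySpan<char> slice over the original string,
    // so no intermediate string objects are created.
    public static int CountWindows(string text, string needle, int width)
    {
        if (width <= 0)
            throw new ArgumentOutOfRangeException(nameof(width));

        ReadOnlySpan<char> source = text.AsSpan();
        ReadOnlySpan<char> target = needle.AsSpan();
        int matches = 0;

        for (int start = 0; start + width <= source.Length; start += width)
        {
            // Slice is a view over the same memory, not a copy.
            ReadOnlySpan<char> window = source.Slice(start, width);
            if (window.SequenceEqual(target))
                matches++;
        }

        return matches;
    }
}
```

    Calling SpanSampling.CountWindows("abcdabcdxxxxabcd", "abcd", 4) walks four windows and allocates nothing on the heap, so there is nothing for the GC to track at all.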

    There are real use-cases for string caches and tries, but they are pretty rare.

    • @[email protected]
      3 months ago

      I think the focus of the article is on highlighting allocation performance (which is the goal of the StringPool) rather than overall performance (i.e. speed), so the benchmark, while artificial, is designed to isolate that specific thing. This is actually pointed out in the article just before it shows the benchmark results:

      "It is important to note that since the focus of StringPool is reducing memory allocation, our main focus in the benchmark is on allocations more than on speed:"

      I agree that an additional benchmark showing it in a more real-world scenario could prove helpful, but the existing benchmark does a good job of highlighting the allocation reduction seen when processing large amounts of char data.

      A more real-world example would be something like a file upload validation method which first checks the file extension against a HashSet<string> of valid extensions. In that scenario we would be able to take the filename as a Span and extract the extension from it as a Span, but we cannot call HashSet.Contains() with a Span; we have to use a string. That would require calling extensionSpan.ToString(), which allocates a new string on every call. Here we could use the StringPool to avoid that unnecessary string allocation (while the article does not use this particular example, it does mention other related scenarios).
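      A rough sketch of that upload check, assuming the StringPool in question is the one from the CommunityToolkit.HighPerformance NuGet package. The class name, method name, and the list of allowed extensions are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using CommunityToolkit.HighPerformance.Buffers; // assumed package: CommunityToolkit.HighPerformance

public static class UploadValidation
{
    // Hypothetical allow-list; comparison is ordinal and case-sensitive,
    // so ".PNG" would be rejected unless the comparer is changed.
    private static readonly HashSet<string> AllowedExtensions =
        new(StringComparer.Ordinal) { ".png", ".jpg", ".pdf" };

    public static bool HasAllowedExtension(ReadOnlySpan<char> fileName)
    {
        // Path.GetExtension has a span overload, so no string is created here.
        ReadOnlySpan<char> extension = Path.GetExtension(fileName);
        if (extension.IsEmpty)
            return false;

        // Instead of extension.ToString(), ask the shared pool for a cached
        // string instance with the same characters. A new string is only
        // allocated when the pool does not already hold a match.
        string pooled = StringPool.Shared.GetOrAdd(extension);

        return AllowedExtensions.Contains(pooled);
    }
}
```

      Since uploads tend to repeat the same handful of extensions, most calls should hit the pool and skip the allocation, which is the situation the StringPool is meant for.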

      Overall, as you mention, the real use-cases for string caches (such as StringPool) are pretty rare. It is a niche topic, but for those who need to do something like that, I think the article presents an accessible introduction.