• @burntsushiOP
    link
    English
    175 months ago

    Cross-posting from reddit:

    The PR has more details, but here are a few ad hoc benchmarks using ripgrep on my M2 mac mini while searching a 5.5GB file.

    This one is just a case insensitive search. A case insensitive regex expands to something like (ignoring Unicode) [Ss][Hh][Ee][Rr]..., which means that it has multiple literal prefixes. In fact, you can enumerate them! As long as the set is small enough, this is something that the new SIMD acceleration on aarch64 can handle (and has done for a long time on x86-64):

    $ time rg-before-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en
    3055
    
    real    8.208
    user    7.731
    sys     0.467
    maxmem  5600 MB
    faults  191
    
    $ time rg-after-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en
    3055
    
    real    1.137
    user    0.695
    sys     0.430
    maxmem  5904 MB
    faults  203
    

    And of course, using multiple literals explicitly also uses this optimization:

    $ time rg-before-teddy-aarch64 -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2018.half.en
    3804
    
    real    9.055
    user    8.580
    sys     0.474
    maxmem  4912 MB
    faults  11
    
    $ time rg-after-teddy-aarch64 -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2018.half.en
    3804
    
    real    1.121
    user    0.697
    sys     0.422
    maxmem  4832 MB
    faults  11
    

    And it doesn’t just work for prefixes, it also works for inner literals too:

    $ time rg-before-teddy-aarch64 -c '\w+\s+(Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)\s+\w+' OpenSubtitles2018.half.en
    773
    
    real    9.065
    user    8.586
    sys     0.477
    maxmem  6384 MB
    faults  11
    
    $ time rg-after-teddy-aarch64 -c '\w+\s+(Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)\s+\w+' OpenSubtitles2018.half.en
    
    773
    
    real    1.124
    user    0.702
    sys     0.421
    maxmem  6784 MB
    faults  11
    

    If you’re curious about how the SIMD stuff works, you can read my description of Teddy here. I ported this algorithm out of the Hyperscan project several years ago, and it has been one of the killer ingredients for making ripgrep fast in a lot of common cases. But it only worked on x86-64. With the rise and popularity of aarch64 and Apple silicon, I was motivated to port it over. I just recently finished analogous work for the memchr crate as well.

    • @snaggen
      link
      45 months ago

      This sounds really great and will probably have quite an impact on a lot of users. So, nice work!