• Aloso
    1 year ago

    The Reddit thread has some interesting discussion, including a solution that uses no SIMD intrinsics yet is more than 200x faster: it iterates with .chunks_exact() and lets the compiler auto-vectorize the loop.
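
For anyone who hasn't seen the pattern, here is a minimal sketch of the general idea. This is not the code from the thread: the function name `sum_chunked`, the `LANES` constant, and the choice of a wrapping `u32` sum are all made up for illustration. The point is only that fixed-size chunks with one accumulator per lane give the optimizer a loop shape it can usually turn into SIMD instructions in release builds, though whether it actually does depends on the target and compiler version.

```rust
// Hypothetical lane count, chosen for illustration only.
const LANES: usize = 8;

/// Wrapping sum of a slice, written so the hot loop works on
/// fixed-size chunks with independent per-lane accumulators.
pub fn sum_chunked(data: &[u32]) -> u32 {
    let mut acc = [0u32; LANES];

    // chunks_exact yields only full chunks of LANES elements;
    // the leftover tail is available through remainder().
    let chunks = data.chunks_exact(LANES);
    let remainder = chunks.remainder();

    for chunk in chunks {
        // Each lane is updated independently, so there is no
        // loop-carried dependency between lanes.
        for i in 0..LANES {
            acc[i] = acc[i].wrapping_add(chunk[i]);
        }
    }

    // Combine the lanes, then handle the tail with a scalar loop.
    let mut total = acc.iter().fold(0u32, |a, &b| a.wrapping_add(b));
    for &x in remainder {
        total = total.wrapping_add(x);
    }
    total
}
```

Calling `sum_chunked(&values)` on any `&[u32]` gives the same result as a plain wrapping sum; only the loop structure differs. Checking the generated assembly is the only way to confirm the loop was actually vectorized.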