• Aloso
    21 year ago

    The Reddit thread has some interesting discussion, including a solution that uses no SIMD intrinsics and is more than 200x faster: it iterates with .chunks_exact() and lets the compiler auto-vectorize the loop.
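
    For anyone who hasn't seen the pattern, here is a rough sketch of the general idea. This is not the code from the thread; the function name `sum_chunked`, the `LANES` constant, and the choice of summing `f32` values are my own assumptions for illustration. Fixed-size chunks plus one independent accumulator per lane give the optimizer a loop shape it can usually turn into SIMD instructions, but whether it actually does depends on the compiler version, target, and optimization level.

    ```rust
    /// Hypothetical example: sum a slice of f32 using fixed-size chunks.
    /// Each lane has its own accumulator, so the inner loop has no
    /// cross-iteration dependency and is a candidate for auto-vectorization.
    pub fn sum_chunked(data: &[f32]) -> f32 {
        // Assumed lane count; 8 f32 values fit one 256-bit vector register.
        const LANES: usize = 8;

        let mut acc = [0.0f32; LANES];

        // chunks_exact yields only full chunks of LANES elements, so the
        // compiler knows every chunk has the same length.
        let chunks = data.chunks_exact(LANES);
        // Elements that did not fit into a full chunk (fewer than LANES).
        let remainder = chunks.remainder();

        for chunk in chunks {
            for i in 0..LANES {
                acc[i] += chunk[i];
            }
        }

        // Combine the per-lane partial sums, then add the leftover elements.
        let mut total: f32 = acc.iter().sum();
        for &x in remainder {
            total += x;
        }
        total
    }
    ```

    Note that this adds the floats in a different order than a plain sequential loop, so the result can differ slightly in the last bits for general input. To check whether the loop was really vectorized, build in release mode and look at the generated assembly.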