Quite some exciting progress since the last progress report! There have been 180 commits since the last progress report.
As of today, rustc_codegen_cranelift is available on nightly! :tada: You can run `rustup component add rustc-codegen-cranelift-preview --toolchain nightly` to install it and then either `CARGO_PROFILE_DEV_CODEGEN_BACKEND=cranelift cargo +nightly build` to use it for the current invocation or add
Trying cranelift for the first time (I think).
Let’s create a “release-dev-cl” profile that inherits from the “release-dev” profile and compare.
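A profile along those lines might look roughly like the fragment below. This is only a sketch: the commenter's actual “release-dev-cl” definition is not shown, so its contents here are an assumption. The `codegen-backend` profile key is an unstable Cargo feature, which is why the `cargo-features` line is needed and why it only works on nightly.

```toml
# Cargo.toml (nightly only). `codegen-backend` is an unstable Cargo feature,
# so it has to be opted into at the very top of the manifest.
cargo-features = ["codegen-backend"]

# Hypothetical profile: same settings as "release-dev", but compiled with Cranelift.
[profile.release-dev-cl]
inherits = "release-dev"
codegen-backend = "cranelift"
```

With something like this in place, `cargo +nightly build --profile release-dev-cl` would select the profile, which makes it easy to time against the LLVM-backed “release-dev” build.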
For reference, “release-dev” is:
    inherits = "release"
    debug = "full"
    codegen-units = 8
    lto = "off"

Cool, cold builds (including deps) went from 73s to 37s, with `zstd-sys` becoming a bigger offender.

But but but…

    warning: unsupported x86 llvm intrinsic llvm.x86.aesni.aesimc; replacing with trap
    warning: unsupported x86 llvm intrinsic llvm.x86.aesni.aesdec; replacing with trap
    warning: unsupported x86 llvm intrinsic llvm.x86.aesni.aesdeclast; replacing with trap

Alright. Which dep is using this. Let’s `cargo vendor` and `rg`.

    % cargo vendor &>/dev/null
    % cd vendor
    % rg -l 'aesimc|aesdec|aesdeclast' | sed 's|/.*||' | sort -u
    aes
    ring

Alright, let’s try another project…
Nice, this one goes from 52s to 19s, and no unsupported intrinsics.
Let’s test the binary.
Hmm, it’s orders of magnitude slower… let’s `perf`…

- LLVM

      1.71% async-global-ex abcd-cli [.] alloc::collections::btree::map::IntoIterᐸK,V,Aᐳ::dying_next
      1.66% async-global-ex abcd-cli [.] ᐸalloc::collections::btree::map::KeysᐸK,Vᐳ as core::iter::traits::iterator::Iteratorᐳ::next
      1.53% async-global-ex libc.so.6 [.] 0x0000000000158c4a
      1.48% blocking-4 abcd-cli [.] ᐸlz4_flex::frame::decompress::FrameDecoderᐸRᐳ as std::io::Readᐳ::read_to_end
      1.35% async-global-ex libc.so.6 [.] 0x000000000015818d
      1.28% blocking-1 abcd-cli [.] ᐸlz4_flex::frame::decompress::FrameDecoderᐸRᐳ as std::io::Readᐳ::read_to_end
      1.25% blocking-4 abcd-cli [.] ᐸcore::iter::adapters::map::MapᐸI,Fᐳ as core::iter::traits::iterator::Iteratorᐳ::try_fold
      1.25% async-global-ex libc.so.6 [.] malloc
      1.19% blocking-4 libc.so.6 [.] malloc
      1.12% blocking-2 abcd-cli [.] ᐸlz4_flex::frame::decompress::FrameDecoderᐸRᐳ as std::io::Readᐳ::read_to_end
      1.06% async-global-ex libc.so.6 [.] 0x0000000000158487
      0.99% async-global-ex abcd-cli [.] ᐸalloc::collections::btree::map::BTreeMapᐸK,V,Aᐳ as core::ops::drop::Dropᐳ::drop
      0.92% blocking-2 libc.so.6 [.] malloc
      0.91% blocking-4 libc.so.6 [.] 0x0000000000158180
      0.85% blocking-2 [kernel.vmlinux] [k] clear_page_erms
      0.84% async-global-ex abcd-cli [.] alloc::collections::btree::search::ᐸimpl alloc::collections::btree::node::NodeRefᐸBorrowType,
      0.81% async-global-ex abcd-cli [.] abcd::all::ELSMap::mk_extracted_st
      0.78% blocking-1 libc.so.6 [.] malloc
      0.75% blocking-4 abcd-cli [.] core::str::converts::from_utf8
      0.75% async-global-ex [kernel.vmlinux] [k] clear_page_erms
      0.74% async-global-ex abcd-cli [.] core::ptr::drop_in_placeᐸabcd::foo::FooStreamᐳ
      0.74% async-global-ex abcd-cli [.] ᐸalloc::string::String as core::fmt::Writeᐳ::write_str
      0.74% blocking-4 libc.so.6 [.] 0x000000000015818d
      0.74% async-global-ex libc.so.6 [.] 0x000000000009a9f8
      0.66% async-global-ex abcd-cli [.] alloc::raw_vec::RawVecᐸT,Aᐳ::reserve::do_reserve_and_handle

- Cranelift

      13.54% async-global-ex abcd-cli [.] alloc::vec::VecᐸT,Aᐳ::extend_with
      2.34% async-global-ex abcd-cli [.] ᐸusize as core::iter::range::Stepᐳ::forward_unchecked
      1.57% async-global-ex libc.so.6 [.] 0x000000000015818d
      1.38% async-global-ex abcd-cli [.] core::clone::impls::ᐸimpl core::clone::Clone for u8ᐳ::clone
      1.34% blocking-4 abcd-cli [.] lz4_flex::block::decompress_safe::decompress_internal
      1.18% async-global-ex abcd-cli [.] ᐸcore::iter::adapters::enumerate::EnumerateᐸIᐳ as core::iter::traits::iterator::Iteratorᐳ::ne
      1.09% blocking-4 abcd-cli [.] lz4_flex::block::decompress_safe::read_u16
      1.00% blocking-4 libc.so.6 [.] 0x000000000015818d
      0.97% blocking-3 abcd-cli [.] lz4_flex::block::decompress_safe::read_u16
      0.94% async-global-ex abcd-cli [.] alloc::collections::btree::search::ᐸimpl alloc::collections::btree::node::NodeRefᐸBorrowType,
      0.86% blocking-2 abcd-cli [.] lz4_flex::block::decompress_safe::read_u16
      0.84% blocking-4 abcd-cli [.] ᐸcore::ops::range::Rangeᐸusizeᐳ as core::slice::index::SliceIndexᐸ[T]ᐳᐳ::index_mut
      0.77% blocking-2 abcd-cli [.] lz4_flex::block::decompress_safe::decompress_internal
      0.72% async-global-ex abcd-cli [.] ᐸcore::slice::iter::IterᐸTᐳ as core::iter::traits::iterator::Iteratorᐳ::next
      0.71% blocking-3 abcd-cli [.] ᐸcore::ops::range::Rangeᐸusizeᐳ as core::slice::index::SliceIndexᐸ[T]ᐳᐳ::index_mut
      0.69% blocking-3 libc.so.6 [.] 0x000000000015818d
      0.68% blocking-2 libc.so.6 [.] 0x000000000015818d
      0.67% blocking-4 abcd-cli [.] ᐸcore::ops::range::Rangeᐸusizeᐳ as core::slice::index::SliceIndexᐸ[T]ᐳᐳ::index
      0.67% blocking-3 abcd-cli [.] lz4_flex::block::decompress_safe::decompress_internal
      0.62% blocking-2 abcd-cli [.] ᐸcore::ops::range::Rangeᐸusizeᐳ as core::slice::index::SliceIndexᐸ[T]ᐳᐳ::index_mut
      0.60% blocking-3 abcd-cli [.] ᐸcore::ops::range::Rangeᐸusizeᐳ as core::slice::index::SliceIndexᐸ[T]ᐳᐳ::index
      0.57% blocking-3 abcd-cli [.] speedy::circular_buffer::CircularBuffer::consume_into
      0.56% async-global-ex abcd-cli [.] alloc::collections::btree::search::ᐸimpl alloc::collections::btree::node::NodeRefᐸBorrowType,
      0.56% blocking-4 abcd-cli [.] speedy::circular_buffer::CircularBuffer::consume_into
      0.54% blocking-3 abcd-cli [.] speedy::reader::Reader::read_u64

Ouch, `Vec::extend_with()`, `usize::forward_unchecked()`, and even worse, `u8::clone()` are slow!

That’s a hell of a comment!
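For anyone wanting to poke at the `Vec::extend_with` hotspot in the Cranelift profile above, here is a minimal sketch of code that reaches it. `Vec::resize` is built on the private `Vec::extend_with`, which writes `value.clone()` once per new element inside a range loop. Generic std code like this is instantiated in the calling crate, so it is compiled by whichever backend that crate uses. One plausible reading of the profile (a guess, not verified against this project) is that a backend that optimizes less aggressively than LLVM leaves the per-element `clone()` and range-step calls as real calls. The `grow` helper and the buffer size are made up for illustration.

```rust
use std::time::Instant;

/// Hypothetical helper: grows `buf` to `len` bytes, filling new slots with `fill`.
/// `Vec::resize` delegates to the private `Vec::extend_with` when growing.
fn grow(buf: &mut Vec<u8>, len: usize, fill: u8) {
    buf.resize(len, fill);
}

fn main() {
    let mut buf = Vec::new();
    let start = Instant::now();
    // 64 MiB is an arbitrary size, chosen only to make the fill measurable.
    grow(&mut buf, 64 * 1024 * 1024, 0xAB);
    println!("resized to {} bytes in {:?}", buf.len(), start.elapsed());
}
```

Building this once with each backend and comparing the printed times (or the `perf` output) should show whether the fill loop is where the time goes.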
Really exciting news! This should mitigate a longstanding problem that Rust developers have had when iterating.


