Recently someone in the community suggested that we try profile-guided optimization (#9387). Let's see how we can use Rust to build a PGO-optimized Databend!
Background
Profile-guided optimization (PGO) is a compiler optimization technique that collects typical execution data (for example, which branches are likely to be taken) while the program runs, and then uses that data to optimize inlining, conditional branches, machine code layout, register allocation, etc.
The technique exists because static analysis has to reason about code performance without actually executing the program. In the absence of runtime information, the compiler cannot take the program's actual behavior into account, so its optimizations may not be fully effective.
PGO allows data to be collected based on application scenarios in a production environment, so the optimizer can optimize the speed for hot code paths and size for cold code paths and produce faster and smaller code for applications.
rustc supports PGO by building data collection into the binaries, then collecting perf data during runtime to prepare for the final compilation optimization. The implementation relies entirely on LLVM.
Workflow
Follow the workflow below to generate a PGO-optimized program:
- Compile the program with instrumentation enabled.
- Run the instrumented program to generate a .profraw file.
- Convert the .profraw file into a .profdata file using LLVM's llvm-profdata tool.
- Compile the program again with the profiling data.
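The two compile steps in the workflow above differ only in the flags passed to rustc. As a minimal sketch, the helper below prints the RUSTFLAGS value for each phase. The function name `pgo_rustflags` and its default directory are hypothetical and chosen for illustration; only the `-Cprofile-generate` and `-Cprofile-use` flags come from rustc.

```shell
#!/bin/sh
# Hypothetical helper: print the RUSTFLAGS value for one PGO phase.
# $1 is the phase ("generate" or "use").
# $2 is the profiling data directory (assumed default: /tmp/pgo-data).
pgo_rustflags() {
    phase="$1"
    dir="${2:-/tmp/pgo-data}"
    case "$phase" in
        # Phase 1: build an instrumented binary that writes .profraw files.
        generate) printf '%s\n' "-Cprofile-generate=${dir}" ;;
        # Phase 2: rebuild using the merged profile.
        use)      printf '%s\n' "-Cprofile-use=${dir}/merged.profdata" ;;
        *)        echo "usage: pgo_rustflags generate|use [dir]" >&2
                  return 1 ;;
    esac
}
```

It could be used as `RUSTFLAGS="$(pgo_rustflags generate)" cargo build --release`, with the run and merge steps in between the two builds.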
Preparations
The data collected during the run will eventually be converted with llvm-profdata. To do so, install the llvm-tools-preview component via rustup, or consider using the program provided by a recent LLVM or Clang version.
rustup component add llvm-tools-preview
After the installation, you may need to add the following directory, which contains llvm-profdata, to your PATH:
~/.rustup/toolchains/<toolchain>/lib/rustlib/<target-triple>/bin/
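The placeholders in that path can usually be filled in from rustc itself. The sketch below assumes the active toolchain is the one that received the llvm-tools-preview component and that you are building for the host triple. The function name `llvm_tools_dir` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the tool directory from a sysroot and a
# target triple, following the layout shown above.
llvm_tools_dir() {
    printf '%s/lib/rustlib/%s/bin\n' "$1" "$2"
}

# Only query rustc when it is actually installed.
if command -v rustc >/dev/null 2>&1; then
    # Sysroot of the active toolchain.
    sysroot="$(rustc --print sysroot)"
    # Host triple, taken from the "host:" line of the verbose version output.
    host="$(rustc -vV | sed -n 's/^host: //p')"
    PATH="$(llvm_tools_dir "$sysroot" "$host"):$PATH"
    export PATH
fi
```

Afterwards, `command -v llvm-profdata` should print a path if the component is installed for that toolchain.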
Step-By-Step
The following procedure uses Databend's SQL logic tests for demonstration purposes only to help us understand how it works, so you may not get positive results for performance. Use a typical workload for your production environment.
The caveat, however, is that the sample of data fed to the program during the profiling stage must be statistically representative of the typical usage scenarios; otherwise, profile-guided feedback has the potential to harm the overall performance of the final build instead of improving it.
Make sure there is no left-over profiling data from previous runs.
rm -rf /tmp/pgo-data
Build the instrumented binaries (with the release profile), using the RUSTFLAGS environment variable to pass the PGO compiler flags to the compilation of all crates in the program.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" \
cargo build --release --target=x86_64-unknown-linux-gnu
Run the instrumented binaries with a typical workload. We strongly recommend using a workload that is statistically representative of the real scenario. This example runs SQL logic tests for reference only.
Start a stand-alone Databend via a script, or a Databend cluster. Note that a production environment is more likely to run in cluster mode.
Import the dataset and run a typical query workload.
BUILD_PROFILE=release ./scripts/ci/deploy/databend-query-standalone.sh
ulimit -n 10000;ulimit -s 16384; cargo run -p sqllogictests --release -- --enable_sandbox --parallel 16 --no-fail-fast
Merge the .profraw files into a .profdata file with llvm-profdata.
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
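The merge has nothing to work with if the instrumented run produced no .profraw files, for example because the workload ran against a non-instrumented binary. A small guard can catch that before merging. This is only a sketch; the helper names `count_profraw` and `merge_profiles` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the .profraw files under a directory.
# Prints 0 if the directory is empty or does not exist.
count_profraw() {
    find "$1" -type f -name '*.profraw' 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical helper: merge profiles only when there is something to merge.
merge_profiles() {
    dir="$1"
    if [ "$(count_profraw "$dir")" -eq 0 ]; then
        echo "no .profraw files in $dir; run the instrumented binary first" >&2
        return 1
    fi
    # Same merge command as above, parameterized by directory.
    llvm-profdata merge -o "$dir/merged.profdata" "$dir"
}
```

Calling `merge_profiles /tmp/pgo-data` then behaves like the plain command above, but fails early with a message when no profile data was collected.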
Use the .profdata file to guide optimizations. Note that both builds use the --release flag, because in practice we always run the release build binary.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata -Cllvm-args=-pgo-warn-missing-function" \
cargo build --release --target=x86_64-unknown-linux-gnu
Run the compiled program again with the previous workload and check the performance:
BUILD_PROFILE=release ./scripts/ci/deploy/databend-query-standalone.sh
ulimit -n 10000;ulimit -s 16384; cargo run -p sqllogictests --release -- --enable_sandbox --parallel 16 --no-fail-fast
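To compare the two builds, it helps to express the difference as a percentage instead of eyeballing raw timings. The sketch below assumes you have already measured the wall-clock time of the same workload against the regular release build and against the PGO build, in the same unit. The function name `improvement_percent` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: relative change in percent between a baseline time
# ($1) and a PGO time ($2), both in the same unit.
# Positive output means the PGO build was faster.
improvement_percent() {
    awk -v base="$1" -v pgo="$2" \
        'BEGIN { printf "%.1f\n", (base - pgo) / base * 100 }'
}
```

For made-up timings of 200 and 150 seconds, `improvement_percent 200 150` prints 25.0. A negative result means the PGO build was slower, which can happen when the profiling workload is not representative.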