Getting Started With PAPI#
The Performance Application Programming Interface (PAPI) is a library that provides performance counters on a variety of platforms. Performance counters provide accurate low-level information about processors behavior during a given execution run. This information can contain simple metrics like total cycle count, cache misses, and instructions executed as well as more high level information like total FLOPS and warp occupancy. PAPI makes these metrics available while profiling.
Installing PAPI#
PAPI can either be installed using your package manager (apt-get install libpapi-dev
on Ubuntu), or from source here:
icl-utk-edu/papi.
Pulling the latest version of PAPI from source has caused build issues before. Therefore, it is recommended to checkout tagged version papi-6-0-0-1-t
.
Building TVM With PAPI#
To include PAPI in your build of TVM, set the following line in you config.cmake
:
set(USE_PAPI ON)
If PAPI is installed in a non-standard place, you can specify where it is like so:
set(USE_PAPI path/to/papi.pc)
Using PAPI While Profiling#
If TVM has been built with PAPI (see above), then you can pass a
tvm.runtime.profiling.PAPIMetricCollector
to
tvm.runtime.GraphModule.profile()
to collect performance metrics. Here
is an example:
import tvm
from tvm import relay
from tvm.relay.testing import mlp
from tvm.runtime import profiler_vm
import numpy as np
target = "llvm"
dev = tvm.cpu()
mod, params = mlp.get_workload(1)
exe = relay.vm.compile(mod, target, params=params)
vm = profiler_vm.VirtualMachineProfiler(exe, dev)
data = tvm.nd.array(np.random.rand(1, 1, 28, 28).astype("float32"), device=dev)
report = vm.profile(
data,
func_name="main",
collectors=[tvm.runtime.profiling.PAPIMetricCollector()],
)
print(report)
Name perf::CACHE-MISSES perf::CYCLES perf::STALLED-CYCLES-BACKEND perf::INSTRUCTIONS perf::STALLED-CYCLES-FRONTEND
fused_nn_dense_nn_bias_add_nn_relu 2,494 1,570,698 85,608 675,564 39,583
fused_nn_dense_nn_bias_add_nn_relu_1 1,149 655,101 13,278 202,297 21,380
fused_nn_dense_nn_bias_add 288 600,184 8,321 163,446 19,513
fused_nn_batch_flatten 301 587,049 4,636 158,636 18,565
fused_nn_softmax 154 575,143 8,018 160,738 18,995
----------
Sum 4,386 3,988,175 119,861 1,360,681 118,036
Total 10,644 8,327,360 179,310 2,660,569 270,044
You can also change which metrics are collected:
report = vm.profile(
data,
func_name="main",
collectors=[tvm.runtime.profiling.PAPIMetricCollector({dev: ["PAPI_FP_OPS"]})],
)
Name PAPI_FP_OPS
fused_nn_dense_nn_bias_add_nn_relu 200,832
fused_nn_dense_nn_bias_add_nn_relu_1 16,448
fused_nn_dense_nn_bias_add 1,548
fused_nn_softmax 160
fused_nn_batch_flatten 0
----------
Sum 218,988
Total 218,988
You can find a list of available metrics by running the papi_avail
and
papi_native_avail
commands.