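Marvell Machine Learning Integration#
1. Introduction#
Marvell(R) supports a family of high-performance Data Processing Units (DPUs) that integrate compute, high-speed I/O and workload accelerators. These workload accelerators include Marvell's Machine Learning Inference Processor (MLIP), a highly optimized, integrated inference engine.
TVM supports Marvell's MLIP through the "mrvl" library. The library partitions the model and compiles supported operations for accelerated execution on the MLIP, and uses LLVM for general compute.
At runtime, the library supports native execution on MLIP hardware as well as on Marvell's ML simulator (mrvl-mlsim).
The library supports Marvell's Octeon family of processors with ML accelerators.
This guide demonstrates how to build TVM with the mrvl codegen and runtime enabled. It also provides example code to compile and run models using the 'mrvl' runtime.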
2. Building TVM with mrvl support#
2.1 Clone TVM repo#
Refer to the following TVM documentation for cloning TVM
2.2 Build and start the TVM - mrvl docker container#
./docker/ demo_mrvl bash # Build the docker container
./docker/ tvm.demo_mrvl # Load the docker image
3. Compiling a model using TVMC command line#
Models can be compiled and run for the mrvl target using TVMC, which is optimized for performance.
Refer to the TVMC documentation for the generic tvmc options.
Additional mrvl-specific options may be added as attributes if necessary. Advanced usage is described later in this document.
3.1 TVMC Compilation Flow for a model#
Refer to the following TVM documentation, for compilation flow
3.2 TVMC - Command line option(s): Syntax for mrvl target#
Compile an ONNX model with tvmc for the mrvl target:
python3 -m tvm.driver.tvmc compile --target="mrvl, llvm"
The following example TVMC compile command targets an ARMv9 core and the integrated MLIP on a cn10ka processor, using only 4 tiles in the block.
python3 -m tvm.driver.tvmc compile --target="mrvl, llvm" \
--target-llvm-mtriple=aarch64-linux-gnu --target-llvm-mcpu=neoverse-n2 \
--target-mrvl-num_tiles=4 \
--target-mrvl-mattr="hw -quantize=fp16 -wb_pin_ocm=1" \
--cross-compiler aarch64-linux-gnu-gcc \
--output model.tar \
3.3 TVMC Compiler: mrvl-specific Command Line Options#
Description of mrvl options
- mcpu:
  The CPU class of the Marvell(R) ML Inference Processor; possible values = {cn10ka, cnf10kb}; defaults to cn10ka.
- num_tiles:
  Maximum number of tiles that may be used; possible values = {1, 2, 4, 8}; defaults to 8.
- mattr:
  Attributes for mrvl; possible values = {quantize, wb_pin_ocm, run_mode}.
  mattr specifies the data type, code generation options and optimizations.
  The supported attributes are:
  1. quantize
     Specifies the data type. Possible values = {fp16, int8}. Default is fp16. int8 support is a work in progress; full support will be added in a future PR.
  2. wb_pin_ocm
     Optimizes runtime by preloading a model's weights and bias into the on-chip memory. Possible values = {0, 1}. Default is 0 (no preload).
  3. run_mode
     Specifies whether to compile for the simulator or for the target hardware (Octeon). Possible values = {sim, hw}. Default is sim (software simulator).
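The option values listed above can be checked before tvmc is invoked. The following is a minimal sketch, not part of TVM: `mrvl_tvmc_args` is a hypothetical helper, the allowed values and defaults are copied from the descriptions above, and the `--target-mrvl-mcpu` flag name is an assumption extrapolated from the `--target-mrvl-num_tiles` and `--target-mrvl-mattr` flags shown in section 3.2.

```python
import shlex

# Allowed values and defaults, copied from the option descriptions above.
MRVL_ALLOWED = {
    "mcpu": {"cn10ka", "cnf10kb"},
    "num_tiles": {1, 2, 4, 8},
    "quantize": {"fp16", "int8"},
    "wb_pin_ocm": {0, 1},
    "run_mode": {"sim", "hw"},
}
MRVL_DEFAULTS = {
    "mcpu": "cn10ka",
    "num_tiles": 8,
    "quantize": "fp16",
    "wb_pin_ocm": 0,
    "run_mode": "sim",
}


def mrvl_tvmc_args(**overrides):
    """Return tvmc flags for the mrvl target (hypothetical helper, not a TVM API)."""
    unknown = set(overrides) - set(MRVL_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown mrvl option(s): {sorted(unknown)}")
    opts = {**MRVL_DEFAULTS, **overrides}
    for name, value in opts.items():
        if value not in MRVL_ALLOWED[name]:
            raise ValueError(f"{name}={value!r} not in {sorted(MRVL_ALLOWED[name])}")
    # mattr packs run_mode, quantize and wb_pin_ocm into one string, mirroring
    # the section 3.2 example: "hw -quantize=fp16 -wb_pin_ocm=1".
    mattr = (
        f"{opts['run_mode']} -quantize={opts['quantize']} "
        f"-wb_pin_ocm={opts['wb_pin_ocm']}"
    )
    return [
        f"--target-mrvl-mcpu={opts['mcpu']}",  # flag name is an assumption
        f"--target-mrvl-num_tiles={opts['num_tiles']}",
        f"--target-mrvl-mattr={mattr}",
    ]


# Usage: print a shell-quoted prefix of a compile command for the hardware target.
flags = mrvl_tvmc_args(num_tiles=4, run_mode="hw", wb_pin_ocm=1)
print(shlex.join(["python3", "-m", "tvm.driver.tvmc", "compile", "--target=mrvl, llvm", *flags]))
```

Rejecting unsupported values up front (for example `num_tiles=3`) gives a clearer error than a failure deep inside the compile step.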
4. Compile ONNX model using the TVMC flow#
In the TVMC mrvl flow, the model is partitioned into Marvell and LLVM regions. Building each partitioned Marvell subgraph generates serialized nodes.json and const.json files. The partitioned nodes.json is a representation of the model graph that is suitable for the Marvell compiler (mrvl-tmlc), which compiles the model graph to generate the model binary with MLIP instructions.
4.1 Compile and Run ONNX model for Simulator + LLVM / x86_64 target#
Model Compilation for Simulator + LLVM / x86_64 target
python3 -m tvm.driver.tvmc compile --target="mrvl, llvm" \
--target-mrvl-num_tiles=4 --output model.tar model.onnx
Run TVM models on x86_64 host using MLIP Simulator
The generated model binary is simulated using Marvell's MLIP simulator (mrvl-mlsim).
python3 -m tvm.driver.tvmc run --inputs infer.npz --outputs predict.npz model.tar --number=0
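The `--inputs` file used above is a NumPy `.npz` archive keyed by model input name. The sketch below shows one way to create such a file. It assumes a single float32 input named `Input3` with shape (1, 1, 28, 28), as in the MNIST example in section 5; `save_tvmc_inputs` is a hypothetical helper, and the random data only stands in for real input.

```python
import numpy as np


def save_tvmc_inputs(path, **named_arrays):
    """Write one array per model input name into an .npz file for `tvmc run --inputs`."""
    np.savez(path, **named_arrays)


# Assumption: input name and shape follow the MNIST example in section 5.
# Replace them with the names and shapes of your own model's inputs.
rng = np.random.default_rng(seed=0)
sample = rng.random((1, 1, 28, 28), dtype=np.float32)
save_tvmc_inputs("infer.npz", Input3=sample)

# Read the file back to confirm the stored names, shapes and dtypes.
with np.load("infer.npz") as archive:
    for name in archive.files:
        print(name, archive[name].shape, archive[name].dtype)
```

The same `np.load` loop can be pointed at `predict.npz` after the run to list whatever output arrays tvmc wrote.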
4.2 Compile and Run ONNX model for Octeon target#
Model Compilation for Octeon target
Please refer to section 3.2 for the example command line.
Run TVM models on the Octeon Target
The RPC flow enables remote execution on the target device from your local machine. Alternatively, the cross-compiled binary can be run directly on the target hardware using the tvmc run command:
python3 -m tvm.driver.tvmc run --inputs infer.npz --outputs predict.npz model.tar
5. Compiling a model using Python APIs#
In addition to using TVMC, models can also be compiled and run using the TVM Python API. Below is an example that compiles and runs the MNIST model.
Download MNIST model from the web
cd $HOME
Import the TVM and other dependent modules
import tvm, onnx
import numpy as np
import tvm.relay as relay
from tvm.contrib import graph_executor
from tvm.relay.op.contrib.mrvl import partition_for_mrvl
from tvm.relay.build_module import build
from keras.datasets import mnist
Load model onnx file
onnx_model = onnx.load("mnist-12.onnx")
Create a Relay graph from MNIST model
shape_dict = {'Input3' : (1,1,28,28)}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
Define option dictionary and Partition the Model
Annotate and partition the graph for mrvl. All operations supported by mrvl are marked and offloaded to the mrvl hardware accelerator. The remaining operations go through the regular LLVM compilation and code generation for ARM.
tvm_target = "llvm"
option_dict = {'num_tiles': 4}
mod = partition_for_mrvl(mod, params, **option_dict)
Build the Relay Graph
Build the Relay graph, using the new module returned by partition_for_mrvl.
with tvm.transform.PassContext(opt_level=3, config={"relay.ext.mrvl.options" : option_dict}):
    # Build the partitioned module with the `build` function imported above
    model_lib = build(mod, tvm_target, params=params)
Generate runtime graph of the model library
dev = tvm.cpu()
model_rt_graph = graph_executor.GraphModule(model_lib["default"](dev))
Get test data and initialize model input
(train_X, train_y), (test_X, test_y) = mnist.load_data()
image = tvm.nd.array(test_X[0].reshape(1, 1, 28, 28).astype("float32") / 255)
inputs_dict = {}
inputs_dict["Input3"] = image
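The reshape and scaling applied to `test_X[0]` above can be wrapped in a small function with a shape check. This is a sketch only: `to_model_input` is a hypothetical helper, and a synthetic image is used so the example runs without keras.

```python
import numpy as np


def to_model_input(image_u8):
    """Convert one 28x28 uint8 grayscale image to a (1, 1, 28, 28) float32 array in [0, 1]."""
    image = np.asarray(image_u8)
    if image.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got shape {image.shape}")
    # Same transformation as in the snippet above: NCHW layout, float32, scaled by 255.
    return image.reshape(1, 1, 28, 28).astype("float32") / 255


# Stand-in for test_X[0]: a synthetic image with a vertical bar of white pixels.
fake_digit = np.zeros((28, 28), dtype=np.uint8)
fake_digit[4:24, 12:16] = 255
batch = to_model_input(fake_digit)
print(batch.shape, batch.dtype, batch.min(), batch.max())
```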
Run Inference and print the output
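# Bind the input, execute the graph, then fetch the first output
model_rt_graph.set_input(**inputs_dict)
model_rt_graph.run()
output_tensor = model_rt_graph.get_output(0).numpy()
print(output_tensor)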
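The printed `output_tensor` can be turned into a predicted digit. The sketch below assumes the model returns one row of unnormalized scores, one per digit class, which is not stated in this guide; `top_prediction` is a hypothetical helper, and the example scores are made up for illustration.

```python
import numpy as np


def top_prediction(logits):
    """Return (class_index, probability) from a (1, num_classes) score array."""
    scores = np.asarray(logits, dtype=np.float64).reshape(-1)
    # Softmax, with the maximum subtracted for numerical stability.
    exp = np.exp(scores - scores.max())
    probs = exp / exp.sum()
    best = int(np.argmax(probs))
    return best, float(probs[best])


# Illustrative scores only; in the flow above, pass output_tensor instead.
example_scores = np.array([[-1.2, 0.3, 0.1, 4.5, -0.7, 0.0, -2.1, 1.9, 0.4, -0.3]])
digit, confidence = top_prediction(example_scores)
print(f"predicted digit: {digit} (softmax probability {confidence:.3f})")
```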