Debugger#
The TVM Debugger is an interface for debugging the execution of TVM computation graphs. It provides access to the graph structure and tensor values at TVM runtime.
Debug Exchange Format#
1. Computational Graph#
The optimized computation graph built by relay is dumped as-is in JSON serialized format; it contains all the information about the graph. The UX can either use this graph directly, or transform it into a format the UX can understand.
The graph JSON format is explained below.
1. nodes
Nodes are either placeholders or computational nodes in JSON. They are stored as a list. A node contains the following information:
op
- operation type; null means this is a placeholder/variable/input node, and tvm_op means this node can be executed
name
- name of the node
inputs
- positions of the inputs for this operation; inputs is a list of (nodeid, index, version) tuples. (Optional)
attrs
- attributes of the node, containing the following information:
flatten_data
- whether the data needs to be flattened before execution
func_name
- name of the fused function, corresponding to the symbol in the library generated by the compilation process
num_inputs
- number of inputs for this node
num_outputs
- number of outputs this node produces
2. arg_nodes
arg_nodes is a list of indices of the nodes that are placeholder/variable/input or constant/param nodes of the graph.
3. heads
heads is a list of entries that are the outputs of the graph.
4. node_row_ptr
node_row_ptr stores the history of the forward path, so you can skip constructing the entire graph in inference tasks.
5. attrs
attrs can contain version numbers or similar helpful information.
storage_id
- memory slot ID of each node in the storage layout
dtype
- datatype of each node (enum value)
dltype
- datatype of each node, in order
shape
- shape of each node, in order
device_index
- device assignment for each entry, in order
Example of a dumped graph:
{
"nodes": [ # List of nodes
{
"op": "null", # operation type = null, this is a placeholder/variable/input or constant/param node
"name": "x", # Name of the argument node
"inputs": [] # inputs for this node, its none since this is an argument node
},
{
"op": "tvm_op", # operation type = tvm_op, this node can be executed
"name": "relu0", # Name of the node
"attrs": { # Attributes of the node
"flatten_data": "0", # Whether this data need to be flattened
"func_name": "fuse_l2_normalize_relu", # Fused function name, corresponds to the symbol in the lib generated by compilation process
"num_inputs": "1", # Number of inputs for this node
"num_outputs": "1" # Number of outputs this node produces
},
"inputs": [[0, 0, 0]] # Position of the inputs for this operation
}
],
"arg_nodes": [0], # Which all nodes in this are argument nodes
"node_row_ptr": [0, 1, 2], # Row indices for faster depth first search
"heads": [[1, 0, 0]], # Position of the output nodes for this operation
"attrs": { # Attributes for the graph
"storage_id": ["list_int", [1, 0]], # memory slot id for each node in the storage layout
"dtype": ["list_int", [0, 0]], # Datatype of each node (enum value)
"dltype": ["list_str", [ # Datatype of each node in order
"float32",
"float32"]],
"shape": ["list_shape", [ # Shape of each node k order
[1, 3, 20, 20],
[1, 3, 20, 20]]],
"device_index": ["list_int", [1, 1]], # Device assignment for each node in order
}
}
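Since the dump is plain JSON, it can be inspected with standard tooling. Below is a minimal sketch that prints each node together with its shape and dtype, assuming the dumped graph (without the explanatory comments above) has been saved to a hypothetical file graph.json:

import json

# Hypothetical path to the dumped graph JSON
with open("graph.json") as f:
    graph = json.load(f)

# Shapes and dtypes are stored positionally in the graph-level "attrs"
# section (as ["list_shape", [...]] pairs), not on the nodes themselves
shapes = graph["attrs"]["shape"][1]
dltypes = graph["attrs"]["dltype"][1]
for i, node in enumerate(graph["nodes"]):
    print(node["name"], node["op"], shapes[i], dltypes[i])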
2. Tensor Dumping#
The tensors received after execution are of type tvm.ndarray. All tensors are saved as binary bytes in a serialized format. The resulting binary bytes can be loaded through the "load_params" API.
Example of loading the parameters:

with open(path_params, "rb") as fi:
    loaded_params = bytearray(fi.read())
module.load_params(loaded_params)
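To inspect the dumped tensors by name instead of feeding them back into a module, the same bytes can also be deserialized into a dictionary of arrays. A minimal sketch using tvm.runtime.load_param_dict, assuming the same path_params file as above:

import tvm

with open(path_params, "rb") as fi:
    param_bytes = bytearray(fi.read())

# Deserialize into a dict mapping tensor names to tvm.nd.NDArray
params = tvm.runtime.load_param_dict(param_bytes)
for name, arr in params.items():
    print(name, arr.shape, arr.dtype)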
How to use the Debugger?#
In config.cmake, set the USE_PROFILER flag to ON

# Whether enable additional graph debug functions
set(USE_PROFILER ON)
Do 'make' tvm, so that it will build
libtvm_runtime.so
In the frontend script file, instead of
from tvm.contrib import graph_executor
, import GraphModuleDebug:
from tvm.contrib.debugger.debug_executor import GraphModuleDebug
from tvm.contrib.debugger.debug_executor import GraphModuleDebug
m = GraphModuleDebug(
lib["debug_create"]("default", dev),
[dev],
lib.graph_json,
dump_root="/tmp/tvmdbg",
)
# set inputs
m.set_input('data', tvm.nd.array(data.astype(dtype)))
m.set_input(**params)
# execute
m.run()
tvm_out = m.get_output(0, tvm.nd.empty(out_shape, dtype)).numpy()
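The snippet above assumes that lib, dev, data and params already exist. One way they might be produced, sketched under the assumption of a Relay module mod and a parameter dict params obtained from a frontend importer:

import numpy as np
import tvm
from tvm import relay

# Build the Relay module into a runnable library; mod and params are assumed
# to come from a frontend importer such as relay.frontend.from_onnx
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

dev = tvm.cpu(0)
dtype = "float32"
# Example input; the shape must match the model's 'data' input
data = np.random.uniform(size=(1, 3, 224, 224)).astype(dtype)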
If the network was previously exported to an external library as a shared object file/dynamic-link library using
lib.export_library("network.so")
, the initialization of the debug runtime will be slightly different:
lib = tvm.runtime.load_module("network.so")
from tvm.contrib.debugger import debug_executor
m = debug_executor.create(lib["get_graph_json"](), lib, dev, dump_root="/tmp/tvmdbg")
# set inputs
m.set_input('data', tvm.nd.array(data.astype(dtype)))
m.set_input(**params)
# execute
m.run()
tvm_out = m.get_output(0, tvm.nd.empty(out_shape, dtype)).numpy()
The outputs are dumped to a temporary folder under /tmp, or to the folder specified when the runtime was created.
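The exact layout inside the dump folder is an implementation detail, so a quick way to see what was produced is simply to walk the directory. A minimal sketch, assuming the dump_root="/tmp/tvmdbg" used above:

import os

dump_root = "/tmp/tvmdbg"
for root, _, files in os.walk(dump_root):
    for name in files:
        path = os.path.join(root, name)
        print(path, os.path.getsize(path), "bytes")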
Sample Output#
Below is a sample output from the debugger.
Node Name Ops Time(us) Time(%) Start Time End Time Shape Inputs Outputs
--------- --- -------- ------- ---------- -------- ----- ------ -------
1_NCHW1c fuse___layout_transform___4 56.52 0.02 15:24:44.177475 15:24:44.177534 (1, 1, 224, 224) 1 1
_contrib_conv2d_nchwc0 fuse__contrib_conv2d_NCHWc 12436.11 3.4 15:24:44.177549 15:24:44.189993 (1, 1, 224, 224, 1) 2 1
relu0_NCHW8c fuse___layout_transform___broadcast_add_relu___layout_transform__ 4375.43 1.2 15:24:44.190027 15:24:44.194410 (8, 1, 5, 5, 1, 8) 2 1
_contrib_conv2d_nchwc1 fuse__contrib_conv2d_NCHWc_1 213108.6 58.28 15:24:44.194440 15:24:44.407558 (1, 8, 224, 224, 8) 2 1
relu1_NCHW8c fuse___layout_transform___broadcast_add_relu___layout_transform__ 2265.57 0.62 15:24:44.407600 15:24:44.409874 (64, 1, 1) 2 1
_contrib_conv2d_nchwc2 fuse__contrib_conv2d_NCHWc_2 104623.15 28.61 15:24:44.409905 15:24:44.514535 (1, 8, 224, 224, 8) 2 1
relu2_NCHW2c fuse___layout_transform___broadcast_add_relu___layout_transform___1 2004.77 0.55 15:24:44.514567 15:24:44.516582 (8, 8, 3, 3, 8, 8) 2 1
_contrib_conv2d_nchwc3 fuse__contrib_conv2d_NCHWc_3 25218.4 6.9 15:24:44.516628 15:24:44.541856 (1, 8, 224, 224, 8) 2 1
reshape1 fuse___layout_transform___broadcast_add_reshape_transpose_reshape 1554.25 0.43 15:24:44.541893 15:24:44.543452 (64, 1, 1) 2 1