Debugger#

The TVM Debugger is an interface for debugging TVM's computation graph execution. It helps provide access to the graph structures and tensor values at the TVM runtime.

Debug Exchange Format#

1. Computational Graph#

The optimized graph built by relay in JSON serialization format is dumped as it is; it contains the whole information about the graph. The UX can either use this graph directly or transform it into a format the UX can understand.

The graph JSON format is explained below:

1. nodes Nodes are either placeholders or computational nodes in JSON. The nodes are stored as a list. A node contains the following information:

  • op - operation type; null means it is a placeholder/variable/input node and tvm_op means this node can be executed

  • name - name of the node

  • inputs - position of the inputs for this operation; the inputs are a list of tuples of the form (nodeid, index, version). (Optional)

  • attrs - attributes of the node, which contain the following information

    • flatten_data - whether the data needs to be flattened before execution

    • func_name - fused function name, corresponds to the symbol in the lib generated by the compilation process

    • num_inputs - number of inputs for this node

    • num_outputs - number of outputs this node produces

2. arg_nodes arg_nodes is a list of indices of nodes which are placeholder/variable/input or constant/param to the graph.

3. heads heads is a list of entries as the output of the graph.

4. node_row_ptr node_row_ptr stores the history of forward path, so you can skip constructing the entire graph in inference tasks.

5. attrs attrs can contain version numbers or similar helpful information.

  • storage_id - memory slot id of each node in the storage layout

  • dtype - datatype of each node (enum value)

  • dltype - datatype of each node in order

  • shape - shape of each node in order

  • device_index - device assignment for each entry in order

Example of a dumped graph:

{
  "nodes": [                                    # List of nodes
    {
      "op": "null",                             # operation type = null, this is a placeholder/variable/input or constant/param node
      "name": "x",                              # Name of the argument node
      "inputs": []                              # inputs for this node, its none since this is an argument node
    },
    {
      "op": "tvm_op",                           # operation type = tvm_op, this node can be executed
      "name": "relu0",                          # Name of the node
      "attrs": {                                # Attributes of the node
        "flatten_data": "0",                    # Whether this data need to be flattened
        "func_name": "fuse_l2_normalize_relu",  # Fused function name, corresponds to the symbol in the lib generated by compilation process
        "num_inputs": "1",                      # Number of inputs for this node
        "num_outputs": "1"                      # Number of outputs this node produces
      },
      "inputs": [[0, 0, 0]]                     # Position of the inputs for this operation
    }
  ],
  "arg_nodes": [0],                             # Which all nodes in this are argument nodes
  "node_row_ptr": [0, 1, 2],                    # Row indices for faster depth first search
  "heads": [[1, 0, 0]],                         # Position of the output nodes for this operation
  "attrs": {                                    # Attributes for the graph
    "storage_id": ["list_int", [1, 0]],         # memory slot id for each node in the storage layout
    "dtype": ["list_int", [0, 0]],              # Datatype of each node (enum value)
    "dltype": ["list_str", [                    # Datatype of each node in order
        "float32",
        "float32"]],
    "shape": ["list_shape", [                   # Shape of each node k order
        [1, 3, 20, 20],
        [1, 3, 20, 20]]],
    "device_index": ["list_int", [1, 1]],       # Device assignment for each node in order
  }
}
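Because the dump is plain JSON (the # annotations above are explanatory only and do not appear in the real dump), it can be inspected with standard tooling. Below is a minimal sketch, assuming the graph was written to a hypothetical file graph.json and that every node has a single output so the per-node attribute lists line up with the node list; it prints each node's name, op, shape and dtype using only Python's standard library.

import json

# Hypothetical path; point it at wherever the serialized graph was written.
with open("graph.json", "r") as f:
    graph = json.load(f)

shapes = graph["attrs"]["shape"][1]    # per-node shapes, in node order
dltypes = graph["attrs"]["dltype"][1]  # per-node data types, in node order

for idx, node in enumerate(graph["nodes"]):
    print("node %d: name=%s op=%s shape=%s dtype=%s"
          % (idx, node["name"], node["op"], shapes[idx], dltypes[idx]))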

2. Tensor dumping#

The tensors received after execution are of tvm.ndarray type. All the tensors are saved as binary bytes in serialized format. The resulting binary bytes can be loaded through the "load_params" API.

Example of loading the parameters:

with open(path_params, "rb") as fi:
    loaded_params = bytearray(fi.read())

module.load_params(loaded_params)
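The same serialized bytes can also be turned back into a dictionary of named tvm.nd.NDArray objects, which is convenient for inspecting the dumped tensors without going through a module. A minimal sketch, assuming a recent TVM where tvm.runtime.load_param_dict is available and path_params points at the dumped file:

import tvm

with open(path_params, "rb") as fi:
    loaded_params = bytearray(fi.read())

# Deserialize into {name: tvm.nd.NDArray} and report each tensor's shape/dtype.
params = tvm.runtime.load_param_dict(loaded_params)
for name, arr in params.items():
    print(name, arr.shape, arr.dtype)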

How to use the Debugger?#

  1. In config.cmake set the USE_PROFILER flag to ON

    # Whether enable additional graph debug functions
    set(USE_PROFILER ON)
    
  2. Do 'make' tvm, so that it builds libtvm_runtime.so

  3. In the frontend script file, instead of from tvm.contrib import graph_executor, import GraphModuleDebug from tvm.contrib.debugger.debug_executor, as shown below

from tvm.contrib.debugger.debug_executor import GraphModuleDebug
m = GraphModuleDebug(
    lib["debug_create"]("default", dev),
    [dev],
    lib.graph_json,
    dump_root="/tmp/tvmdbg",
)
# set inputs
m.set_input('data', tvm.nd.array(data.astype(dtype)))
m.set_input(**params)
# execute
m.run()
tvm_out = m.get_output(0, tvm.nd.empty(out_shape, dtype)).numpy()
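Besides the final outputs, the debug executor can also hand back the tensor produced by an intermediate node. A hedged sketch (the exact signature may vary between TVM versions), reusing the debug module m from above and a node name taken from the dumped graph, here the hypothetical "relu0":

# Retrieve the output of an intermediate node by name (or index) after m.run().
out = tvm.nd.empty(out_shape, dtype)
m.debug_get_output("relu0", out)
print(out.numpy())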
  4. If the network was previously exported to an external library, like a shared object file/dynamic linked library, using lib.export_library("network.so"), the initialization of the debug runtime is slightly different

lib = tvm.runtime.load_module("network.so")
m = graph_executor.create(lib["get_graph_json"](), lib, dev, dump_root="/tmp/tvmdbg")
# set inputs
m.set_input('data', tvm.nd.array(data.astype(dtype)))
m.set_input(**params)
# execute
m.run()
tvm_out = m.get_output(0, tvm.nd.empty(out_shape, dtype)).numpy()

The outputs are dumped to a temporary folder under /tmp, or to the folder specified while creating the runtime.
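The exact file names inside that folder differ between TVM versions, so the simplest way to see what a run produced is to walk the dump directory. A minimal sketch, assuming the dump_root="/tmp/tvmdbg" used above:

import os

dump_root = "/tmp/tvmdbg"
for root, _dirs, files in os.walk(dump_root):
    for name in files:
        path = os.path.join(root, name)
        print(path, os.path.getsize(path), "bytes")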

Sample Output#

A sample output of the debugger is shown below.

Node Name               Ops                                                                  Time(us)   Time(%)  Start Time       End Time         Shape                Inputs  Outputs
---------               ---                                                                  --------   -------  ----------       --------         -----                ------  -------
1_NCHW1c                fuse___layout_transform___4                                          56.52      0.02     15:24:44.177475  15:24:44.177534  (1, 1, 224, 224)     1       1
_contrib_conv2d_nchwc0  fuse__contrib_conv2d_NCHWc                                           12436.11   3.4      15:24:44.177549  15:24:44.189993  (1, 1, 224, 224, 1)  2       1
relu0_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    4375.43    1.2      15:24:44.190027  15:24:44.194410  (8, 1, 5, 5, 1, 8)   2       1
_contrib_conv2d_nchwc1  fuse__contrib_conv2d_NCHWc_1                                         213108.6   58.28    15:24:44.194440  15:24:44.407558  (1, 8, 224, 224, 8)  2       1
relu1_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    2265.57    0.62     15:24:44.407600  15:24:44.409874  (64, 1, 1)           2       1
_contrib_conv2d_nchwc2  fuse__contrib_conv2d_NCHWc_2                                         104623.15  28.61    15:24:44.409905  15:24:44.514535  (1, 8, 224, 224, 8)  2       1
relu2_NCHW2c            fuse___layout_transform___broadcast_add_relu___layout_transform___1  2004.77    0.55     15:24:44.514567  15:24:44.516582  (8, 8, 3, 3, 8, 8)   2       1
_contrib_conv2d_nchwc3  fuse__contrib_conv2d_NCHWc_3                                         25218.4    6.9      15:24:44.516628  15:24:44.541856  (1, 8, 224, 224, 8)  2       1
reshape1                fuse___layout_transform___broadcast_add_reshape_transpose_reshape    1554.25    0.43     15:24:44.541893  15:24:44.543452  (64, 1, 1)           2       1