Debugger

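The TVM Debugger is an interface for debugging the execution of TVM computation graphs. It provides access to the graph structure and to tensor values at the TVM runtime.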

Debug Exchange Format

1. Computational Graph

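The optimized graph built by relay is dumped as is, in JSON-serialized format. It contains all the information about the graph. The UX can either use this graph directly or transform it into a format the UX understands.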

The Graph JSON format is explained below:

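1. nodes: nodes are either placeholders or computational nodes in the JSON. The nodes are stored as a list. A node contains the following information:

  • op - operation type; null means the node is a placeholder/variable/input node, and tvm_op means the node can be executed.

  • name - name of the node.

  • inputs - positions of the inputs to this operation. Inputs is a list of tuples, each containing (nodeid, index, version). (Optional)

  • attrs - attributes of the node, containing the following information:

    • flatten_data - whether this data needs to be flattened before execution.

    • func_name - name of the fused function; it corresponds to the symbol in the library generated by the relay compilation process.

    • num_inputs - number of inputs of this node.

    • num_outputs - number of outputs this node produces.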

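2. arg_nodes: a list of indices of the nodes that are placeholders/variables/inputs or constants/params of the graph.

3. heads: a list of entries, each a (nodeid, index, version) tuple, that form the outputs of the graph.

4. node_row_ptr: for each node, the index of its first output entry (a prefix sum over the nodes' output counts). The outputs of node i are entries node_row_ptr[i] through node_row_ptr[i+1] - 1, so an input tuple (nodeid, index, version) refers to entry node_row_ptr[nodeid] + index.

5. attrs: graph-level attributes, which can also carry version numbers or similar helpful information. The per-entry lists below are indexed by entry id:

  • storage_id - memory slot ID of each entry in the storage layout.

  • dtype - data type of each entry (enum value).

  • dltype - data type of each entry as a string (e.g. float32), in order.

  • shape - shape of each entry, in order.

  • device_index - device assignment of each entry in the graph.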

Example of a dumped graph:

{
  "nodes": [                                    # List of nodes
    {
      "op": "null",                             # operation type = null, this is a placeholder/variable/input or constant/param node
      "name": "x",                              # Name of the argument node
      "inputs": []                              # Inputs of this node; empty since this is an argument node
    },
    {
      "op": "tvm_op",                           # operation type = tvm_op, this node can be executed
      "name": "relu0",                          # Name of the node
      "attrs": {                                # Attributes of the node
        "flatten_data": "0",                    # Whether this data need to be flattened
        "func_name": "fuse_l2_normalize_relu",  # Fused function name, corresponds to the symbol in the lib generated by compilation process
        "num_inputs": "1",                      # Number of inputs for this node
        "num_outputs": "1"                      # Number of outputs this node produces
      },
      "inputs": [[0, 0, 0]]                     # Position of the inputs for this operation
    }
  ],
  "arg_nodes": [0],                             # Indices of the argument nodes
  "node_row_ptr": [0, 1, 2],                    # Entry id of each node's first output (prefix sum of output counts)
  "heads": [[1, 0, 0]],                         # Output entries of the graph
  "attrs": {                                    # Attributes for the graph
    "storage_id": ["list_int", [1, 0]],         # Memory slot id for each entry in the storage layout
    "dtype": ["list_int", [0, 0]],              # Datatype of each entry (enum value)
    "dltype": ["list_str", [                    # Datatype of each entry as a string, in order
        "float32",
        "float32"]],
    "shape": ["list_shape", [                   # Shape of each entry, in order
        [1, 3, 20, 20],
        [1, 3, 20, 20]]],
    "device_index": ["list_int", [1, 1]]        # Device assignment for each entry, in order
  }
}
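The format above can be read with nothing but Python's standard json module. The sketch below parses a comment-free copy of the example graph (the `#` annotations make the dump above invalid JSON), resolves each input tuple to a flat entry id, and looks up that entry's dltype and shape. It assumes, as in TVM's graph executor, that the entry id of (nodeid, index) is node_row_ptr[nodeid] + index; the helper names entry_id and describe are illustrative and not part of TVM.

```python
import json

# The example graph from above as valid JSON (annotations removed).
GRAPH_JSON = """
{
  "nodes": [
    {"op": "null", "name": "x", "inputs": []},
    {"op": "tvm_op", "name": "relu0",
     "attrs": {"flatten_data": "0", "func_name": "fuse_l2_normalize_relu",
               "num_inputs": "1", "num_outputs": "1"},
     "inputs": [[0, 0, 0]]}
  ],
  "arg_nodes": [0],
  "node_row_ptr": [0, 1, 2],
  "heads": [[1, 0, 0]],
  "attrs": {
    "storage_id": ["list_int", [1, 0]],
    "dtype": ["list_int", [0, 0]],
    "dltype": ["list_str", ["float32", "float32"]],
    "shape": ["list_shape", [[1, 3, 20, 20], [1, 3, 20, 20]]],
    "device_index": ["list_int", [1, 1]]
  }
}
"""


def entry_id(graph, node_id, index):
    """Map a (nodeid, index) pair to a flat entry id via node_row_ptr (assumed semantics)."""
    return graph["node_row_ptr"][node_id] + index


def describe(graph):
    """Return one summary line per node: id, name, op and the dtype/shape of each input entry."""
    shapes = graph["attrs"]["shape"][1]
    dltypes = graph["attrs"]["dltype"][1]
    lines = []
    for nid, node in enumerate(graph["nodes"]):
        ins = []
        for src_nid, index, _version in node["inputs"]:
            eid = entry_id(graph, src_nid, index)
            ins.append(f"{dltypes[eid]}{tuple(shapes[eid])}")
        lines.append(f"{nid} {node['name']} op={node['op']} inputs={ins}")
    return lines


graph = json.loads(GRAPH_JSON)
for line in describe(graph):
    print(line)

# Graph outputs: resolve each head entry back to the name of the node producing it.
outputs = [graph["nodes"][nid]["name"] for nid, _index, _version in graph["heads"]]
print("outputs:", outputs)
```

Reading the shape and dltype through the entry id, rather than through the node id, keeps the lookup correct for nodes with more than one output.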

2. Tensor dumping

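The tensors received after execution are of type tvm.ndarray. All of them are saved as binary bytes in a serialized format, and the resulting bytes can be loaded with the API "load_params".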

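Example of loading the parameters:

# path_params: path to the serialized file; module: a graph executor module
with open(path_params, "rb") as fi:
    loaded_params = bytearray(fi.read())

module.load_params(loaded_params)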

How to use Debugger?

  1. In config.cmake, set the USE_PROFILER flag to ON

    # Whether enable additional graph debug functions
    set(USE_PROFILER ON)
    
  2. Run make tvm so that it builds libtvm_runtime.so

  3. In the frontend script file, replace from tvm.contrib import graph_executor with from tvm.contrib.debugger.debug_executor import GraphModuleDebug

import tvm
from tvm.contrib.debugger.debug_executor import GraphModuleDebug

# lib is the factory module returned by relay.build(...); dev is the target
# device, e.g. tvm.cpu(0); data, dtype, params and out_shape come from your model.
m = GraphModuleDebug(
    lib["debug_create"]("default", dev),
    [dev],
    lib.graph_json,
    dump_root="/tmp/tvmdbg",
)
# set inputs
m.set_input('data', tvm.nd.array(data.astype(dtype)))
m.set_input(**params)
# execute
m.run()
tvm_out = m.get_output(0, tvm.nd.empty(out_shape, dtype)).numpy()
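  4. If the network was previously exported to an external library with lib.export_library("network.so"), for example as a shared object file or dynamic link library, the debug runtime is initialized slightly differently: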

import tvm
from tvm.contrib.debugger import debug_executor

lib = tvm.runtime.load_module("network.so")
# Build the debug executor from the graph JSON stored in the exported library
m = debug_executor.create(lib["get_graph_json"](), lib, dev, dump_root="/tmp/tvmdbg")
# set inputs
m.set_input('data', tvm.nd.array(data.astype(dtype)))
m.set_input(**params)
# execute
m.run()
tvm_out = m.get_output(0, tvm.nd.empty(out_shape, dtype)).numpy()

The outputs are dumped to a temporary folder under /tmp, or to the folder specified when creating the runtime.
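To see what a run produced, the dump folder can be walked with the standard library. This is a minimal sketch that assumes the dumped graph is stored as a .json file and the tensors as a .params file somewhere under dump_root; the helper find_dump_files and the file and directory names used in the demo are hypothetical, so check the actual names in your own dump folder.

```python
import os
import tempfile


def find_dump_files(dump_root, suffixes=(".json", ".params")):
    """Walk dump_root recursively and return the sorted paths of files ending in one of suffixes."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(dump_root):
        for name in filenames:
            if name.endswith(suffixes):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Demo on a throwaway directory that mimics a dump folder (hypothetical file names).
with tempfile.TemporaryDirectory() as root:
    sub = os.path.join(root, "example_device_dir")
    os.makedirs(sub)
    for name in ("graph_dump.json", "output_tensors.params", "notes.txt"):
        open(os.path.join(sub, name), "w").close()
    demo = [os.path.relpath(p, root) for p in find_dump_files(root)]

print(demo)
```

After m.run(), calling find_dump_files("/tmp/tvmdbg") (or whatever dump_root was passed) lists the candidate files to inspect.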

Sample Output

The following is an example of the debugger's output:

Node Name               Ops                                                                  Time(us)   Time(%)  Start Time       End Time         Shape                Inputs  Outputs
---------               ---                                                                  --------   -------  ----------       --------         -----                ------  -------
1_NCHW1c                fuse___layout_transform___4                                          56.52      0.02     15:24:44.177475  15:24:44.177534  (1, 1, 224, 224)     1       1
_contrib_conv2d_nchwc0  fuse__contrib_conv2d_NCHWc                                           12436.11   3.4      15:24:44.177549  15:24:44.189993  (1, 1, 224, 224, 1)  2       1
relu0_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    4375.43    1.2      15:24:44.190027  15:24:44.194410  (8, 1, 5, 5, 1, 8)   2       1
_contrib_conv2d_nchwc1  fuse__contrib_conv2d_NCHWc_1                                         213108.6   58.28    15:24:44.194440  15:24:44.407558  (1, 8, 224, 224, 8)  2       1
relu1_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    2265.57    0.62     15:24:44.407600  15:24:44.409874  (64, 1, 1)           2       1
_contrib_conv2d_nchwc2  fuse__contrib_conv2d_NCHWc_2                                         104623.15  28.61    15:24:44.409905  15:24:44.514535  (1, 8, 224, 224, 8)  2       1
relu2_NCHW2c            fuse___layout_transform___broadcast_add_relu___layout_transform___1  2004.77    0.55     15:24:44.514567  15:24:44.516582  (8, 8, 3, 3, 8, 8)   2       1
_contrib_conv2d_nchwc3  fuse__contrib_conv2d_NCHWc_3                                         25218.4    6.9      15:24:44.516628  15:24:44.541856  (1, 8, 224, 224, 8)  2       1
reshape1                fuse___layout_transform___broadcast_add_reshape_transpose_reshape    1554.25    0.43     15:24:44.541893  15:24:44.543452  (64, 1, 1)           2       1
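Because each row reports a node's time both in microseconds and as a share of the total, the report is easy to post-process when looking for hot spots. The sketch below parses the sample above and ranks nodes by Time(us). It assumes whitespace-separated columns and node/op names without spaces, which holds for this sample; parse_report and hottest are hypothetical helpers, not debugger APIs.

```python
# The sample report above, copied verbatim.
REPORT = """\
Node Name               Ops                                                                  Time(us)   Time(%)  Start Time       End Time         Shape                Inputs  Outputs
---------               ---                                                                  --------   -------  ----------       --------         -----                ------  -------
1_NCHW1c                fuse___layout_transform___4                                          56.52      0.02     15:24:44.177475  15:24:44.177534  (1, 1, 224, 224)     1       1
_contrib_conv2d_nchwc0  fuse__contrib_conv2d_NCHWc                                           12436.11   3.4      15:24:44.177549  15:24:44.189993  (1, 1, 224, 224, 1)  2       1
relu0_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    4375.43    1.2      15:24:44.190027  15:24:44.194410  (8, 1, 5, 5, 1, 8)   2       1
_contrib_conv2d_nchwc1  fuse__contrib_conv2d_NCHWc_1                                         213108.6   58.28    15:24:44.194440  15:24:44.407558  (1, 8, 224, 224, 8)  2       1
relu1_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    2265.57    0.62     15:24:44.407600  15:24:44.409874  (64, 1, 1)           2       1
_contrib_conv2d_nchwc2  fuse__contrib_conv2d_NCHWc_2                                         104623.15  28.61    15:24:44.409905  15:24:44.514535  (1, 8, 224, 224, 8)  2       1
relu2_NCHW2c            fuse___layout_transform___broadcast_add_relu___layout_transform___1  2004.77    0.55     15:24:44.514567  15:24:44.516582  (8, 8, 3, 3, 8, 8)   2       1
_contrib_conv2d_nchwc3  fuse__contrib_conv2d_NCHWc_3                                         25218.4    6.9      15:24:44.516628  15:24:44.541856  (1, 8, 224, 224, 8)  2       1
reshape1                fuse___layout_transform___broadcast_add_reshape_transpose_reshape    1554.25    0.43     15:24:44.541893  15:24:44.543452  (64, 1, 1)           2       1
"""


def parse_report(text):
    """Return (node_name, op_name, time_us, time_pct) for every data row."""
    rows = []
    for line in text.splitlines()[2:]:  # skip the header and the dashed separator
        parts = line.split()
        if len(parts) < 4:
            continue  # ignore blank or truncated lines
        rows.append((parts[0], parts[1], float(parts[2]), float(parts[3])))
    return rows


def hottest(rows, k=3):
    """Return the k rows with the largest Time(us), slowest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:k]


rows = parse_report(REPORT)
top = hottest(rows)
for name, op, time_us, pct in top:
    print(f"{name:24} {op:32} {time_us:>10.2f} us {pct:>6.2f} %")
print("share of top 3:", round(sum(row[3] for row in top), 2))
```

For this sample the three slowest nodes are all convolutions, and together they account for about 94% of the measured time.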