{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 调试\n", "\n", "通常在创作变换的过程中,我们的代码并不完全正确。在这种情况下,可能需要进行一些调试。关键是 backwards 工作:首先,检查调用生成的 module 的结果,以证明或否定正确性。然后,检查和调试生成的代码。然后,调试导致生成代码的变换过程。\n", "\n", "## 变换创作中的常见陷阱\n", "\n", "不确定的 {class}`set` 迭代顺序。在 Python 中,设置的数据类型是无序的。例如,使用 {class}`set` 来包含节点等对象的集合可能会导致意外的不确定性。一个例子是迭代一组节点,将它们插入到图中。因为设置的数据类型是无序的,输出程序中运算的顺序将是不确定的,并且可以在程序调用之间更改。推荐的替代方法是使用 {class}`dict` 数据类型,这是 Python 3.7(以及 cPython 3.6)开始按照[插入顺序](https://mail.python.org/pipermail/python-dev/2017-December/151283.html)排序。通过将要重复数据删除的值存储在 {class}`dict` 的键中,{class}`dict` 可以等价地用于 {class}`set`。\n", "\n", "## 检查 module 的正确性\n", "\n", "因为大多数深度学习 module 的输出都是由浮点 {class}`torch.Tensor` 实例组成,检查两个 {class}`torch.nn.Module` 结果之间的等价性不像做简单的相等性检查那样直接。为了激发这个想法,举个例子(RuntimeError:有多个值的张量的布尔值不明确):" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "ename": "RuntimeError", "evalue": "Boolean value of Tensor with more than one value is ambiguous", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m/media/pc/data/4tb/lxw/home/lxw/hub/torch-book/doc/tutorial/fx/Debugging.ipynb Cell 2\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 17\u001b[0m transformed_resnet18 \u001b[39m=\u001b[39m transform(resnet18)\n\u001b[1;32m 19\u001b[0m input_image \u001b[39m=\u001b[39m torch\u001b[39m.\u001b[39mrandn(\u001b[39m5\u001b[39m, \u001b[39m3\u001b[39m, \u001b[39m224\u001b[39m, \u001b[39m224\u001b[39m)\n\u001b[0;32m---> 21\u001b[0m \u001b[39massert\u001b[39;00m resnet18(input_image) \u001b[39m==\u001b[39m transformed_resnet18(input_image)\n", "\u001b[0;31mRuntimeError\u001b[0m: Boolean value of Tensor with more than one value is ambiguous" ] } ], "source": [ "import torch\n", "import torch.fx\n", "import torchvision.models as models\n", "\n", "\n", "def transform(m : torch.nn.Module) -> torch.nn.Module:\n", " gm = torch.fx.symbolic_trace(m)\n", "\n", " # Imagine we're doing some transforms here\n", " # <...>\n", "\n", " gm.recompile()\n", "\n", " return gm\n", "\n", "resnet18 = models.resnet18()\n", "transformed_resnet18 = transform(resnet18)\n", "\n", "input_image = torch.randn(5, 3, 224, 224)\n", "\n", "assert resnet18(input_image) == transformed_resnet18(input_image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在这里,尝试用 `==` 运算符检查两个深度学习模型的值是否相等。然而,由于运算符返回的是张量而不是 `bool` 值的问题,而且由于浮点值的比较应该使用误差边界(或 epsilon)来解释浮点运算的[非交换性](https://floating-point-gui.de/errors/comparison/),这两个问题都没有很好地定义。可以使用 {func}`torch.allclose`,它会考虑到相对和绝对公差阈值的近似比较:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "assert torch.allclose(resnet18(input_image), transformed_resnet18(input_image))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "与参考实现相比,这是工具箱中检查变换模块行为是否如期望的那样的第一个工具。\n", "\n", "## 调试生成的代码\n", "\n", "因为 FX 在 {class}`torch.fx.GraphModule` 上生成 {func}`forward` 函数,所以使用传统的调试技术(如 `print` 语句或 `pdb`)就不那么直接了。幸运的是,有几种技术可以用来调试生成的代码。\n", "\n", "### 使用 `pdb`\n", "\n", "调用 `pdb` 进入正在运行的程序。尽管表示 {class}`torch.fx.Graph` 的代码不在任何源文件中,但是当调用 `forward` 传递时,仍然可以使用 `pdb` 手动进入它。" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--Return--\n", "None\n", "> \u001b[0;32m/tmp/ipykernel_2297333/4158250709.py\u001b[0m(21)\u001b[0;36m\u001b[0;34m()\u001b[0m\n", "\u001b[0;32m 19 \u001b[0;31m\u001b[0;31m# interactive `pdb` prompt. We can use the `step` or `s` command to\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0m\u001b[0;32m 20 \u001b[0;31m\u001b[0;31m# step into the execution of the next line\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0m\u001b[0;32m---> 21 \u001b[0;31m\u001b[0;32mimport\u001b[0m \u001b[0mpdb\u001b[0m\u001b[0;34m;\u001b[0m \u001b[0mpdb\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mset_trace\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0m\u001b[0;32m 22 \u001b[0;31m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0m\u001b[0;32m 23 \u001b[0;31m\u001b[0mmy_module_transformed\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minput_value\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0m\n" ] } ], "source": [ "import torch\n", "from torch import fx\n", "import torchvision.models as models\n", "\n", "def my_pass(inp: torch.nn.Module, tracer_class : type = fx.Tracer) -> torch.nn.Module:\n", " graph = tracer_class().trace(inp)\n", " # Transformation logic here\n", " # <...>\n", "\n", " # Return new Module\n", " return fx.GraphModule(inp, graph)\n", "\n", "my_module = models.resnet18()\n", "my_module_transformed = my_pass(my_module)\n", "\n", "input_value = torch.randn(5, 3, 224, 224)\n", "\n", "# When this line is executed at runtime, we will be dropped into an\n", "# interactive `pdb` prompt. We can use the `step` or `s` command to\n", "# step into the execution of the next line\n", "import pdb; pdb.set_trace()\n", "\n", "my_module_transformed(input_value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 打印生成代码\n", "\n", "如果您想要多次运行相同的代码,那么使用 `pdb` 逐步找到正确的代码可能有点乏味。在这种情况下,一种方法是简单地将生成的 `forward` 传递复制粘贴到代码中,并从那里检查它。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "# Assume that `traced` is a GraphModule that has undergone some\n", "# number of transforms\n", "\n", "# Copy this code for later\n", "print(traced)\n", "# Print the code generated from symbolic tracing. This outputs:\n", "\"\"\"\n", "def forward(self, y):\n", " x = self.x\n", " add_1 = x + y; x = y = None\n", " return add_1\n", "\"\"\"\n", "\n", "# Subclass the original Module\n", "class SubclassM(M):\n", " def __init__(self):\n", " super().__init__()\n", "\n", " # Paste the generated `forward` function (the one we printed and\n", " # copied above) here\n", " def forward(self, y):\n", " x = self.x\n", " add_1 = x + y; x = y = None\n", " return add_1\n", "\n", "# Create an instance of the original, untraced Module. Then, create an\n", "# instance of the Module with the copied `forward` function. We can\n", "# now compare the output of both the original and the traced version.\n", "pre_trace = M()\n", "post_trace = SubclassM()\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 使用 {func}`~torch.fx.GraphModule.to_folder` 函数\n", "\n", "{func}`~torch.fx.GraphModule.to_folder` 是 {class}`~torch.fx.GraphModule` 中的方法,它允许你将生成的 FX 代码转储到文件夹中。尽管像打印生成的代码那样,将 `forward` 传递复制到代码中通常就足够了,但是使用 {func}`~torch.fx.GraphModule.to_folder` 检查模块和参数可能更容易。\n", "\n", "```python\n", "m = symbolic_trace(M())\n", "m.to_folder(\"foo\", \"Bar\")\n", "from foo import Bar\n", "y = Bar()\n", "```\n", "\n", "在运行上面的示例之后,可以查看 `foo/module.py` 中的代码,并根据需要修改它(例如添加 `print` 语句或使用 `pdb`),以调试生成的代码。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 调试变换\n", "\n", "既然已经确定了变换正在创建不正确的代码,现在是调试变换本身的时候了。\n", "\n", "```python\n", "# Sample Module\n", "class M(torch.nn.Module):\n", " def forward(self, x, y):\n", " return x + y\n", "\n", "# Create an instance of `M`\n", "m = M()\n", "\n", "# Symbolically trace an instance of `M` (returns a GraphModule). In\n", "# this example, we'll only be discussing how to inspect a\n", "# GraphModule, so we aren't showing any sample transforms for the\n", "# sake of brevity.\n", "traced = symbolic_trace(m)\n", "\n", "# Print the code produced by tracing the module.\n", "print(traced)\n", "# The generated `forward` function is:\n", "\"\"\"\n", "def forward(self, x, y):\n", " add = x + y; x = y = None\n", " return add\n", "\"\"\"\n", "\n", "# Print the internal Graph.\n", "print(traced.graph)\n", "# This print-out returns:\n", "\"\"\"\n", "graph():\n", " %x : [#users=1] = placeholder[target=x]\n", " %y : [#users=1] = placeholder[target=y]\n", " %add : [#users=1] = call_function[target=operator.add](args = (%x, %y), kwargs = {})\n", " return add\n", "\"\"\"\n", "\n", "# Print a tabular representation of the internal Graph.\n", "traced.graph.print_tabular()\n", "# This gives us:\n", "\"\"\"\n", "opcode name target args kwargs\n", "------------- ------ ----------------------- ------ --------\n", "placeholder x x () {}\n", "placeholder y y () {}\n", "call_function add (x, y) {}\n", "output output output (add,) {}\n", "\"\"\"\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "使用上面的实用函数,可以在应用变换之前和之后比较跟踪的 {class}`torch.nn.Module`。\n", "\n", "抛开上面的例子,考虑下面的代码:\n", "\n", "```python\n", "# Sample user-defined function\n", "def transform_graph(module: torch.nn.Module, tracer_class : type = fx.Tracer) -> torch.nn.Module:\n", " # Get the Graph from our traced Module\n", " g = tracer_class().trace(module)\n", "\n", " \"\"\"\n", " Transformations on `g` go here\n", " \"\"\"\n", "\n", " return fx.GraphModule(module, g)\n", "\n", "# Transform the Graph\n", "transformed = transform_graph(traced)\n", "\n", "# Print the new code after our transforms. Check to see if it was\n", "# what we expected\n", "print(transformed)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "使用上面的例子,假设对 `print(tracing)` 的调用告诉我们变换中有一个错误。希望使用调试器找到哪里出了问题。可以通过中断 `transform_graph(已跟踪),然后按s“进入”对transform_graph(已跟踪)的调用来查看转换过程中发生了什么。" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.10.4 ('tvmx': conda)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "e579259ee6098e2b9319de590d145b4b096774fe457bdf04260e3ba5c171e887" } } }, "nbformat": 4, "nbformat_minor": 2 }