{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "Tce3stUlHN0L" }, "outputs": [], "source": [ "##### Copyright 2021 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "tuOe1ymfHZPu", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "D70XgUYdLwI6" }, "source": [ "# TensorFlow 1.x vs TensorFlow 2 - Behaviors and APIs" ] }, { "cell_type": "markdown", "metadata": { "id": "MfBg1C5NB3X0" }, "source": [ "\n", " \n", " \n", " \n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "akxmN3SQsEcb" }, "source": [ "Under the hood, TensorFlow 2 follows a fundamentally different programming paradigm from TF1.x.\n", "\n", "This guide describes the fundamental differences between TF1.x and TF2 in terms of behaviors and APIs, and how they relate to your migration journey." ] }, { "cell_type": "markdown", "metadata": { "id": "Xzy2mT87mwth" }, "source": [ "## High-level summary of major changes\n", "\n", "Fundamentally, TF1.x and TF2 use a different set of runtime behaviors around execution (eager in TF2), variables, control flow, tensor shapes, and tensor equality comparisons. To be TF2 compatible, your code must be compatible with the full set of TF2 behaviors. During migration, you can enable or disable most of these behaviors individually via the `tf.compat.v1.enable_*` or `tf.compat.v1.disable_*` APIs. The one exception is the removal of collections, which is a side effect of enabling/disabling eager execution.\n", "\n", "At a high level, TensorFlow 2:\n", "\n", "- Removes [redundant APIs](https://github.com/tensorflow/community/blob/master/rfcs/20180827-api-names.md).\n", "- Makes APIs more consistent, for example, [Unified RNNs](https://github.com/tensorflow/community/blob/master/rfcs/20180920-unify-rnn-interface.md) and [Unified Optimizers](https://github.com/tensorflow/community/blob/master/rfcs/20181016-optimizer-unification.md).\n", "- Prefers [functions over sessions](https://github.com/tensorflow/community/blob/master/rfcs/20180918-functions-not-sessions-20.md) and integrates better with the Python runtime, with [eager execution](https://tensorflow.google.cn/guide/eager) enabled by default and `tf.function` providing automatic control dependencies for graphs and compilation.\n", "- Deprecates global graph [collections](https://github.com/tensorflow/community/blob/master/rfcs/20180905-deprecate-collections.md).\n", "- Alters variable concurrency semantics by using [`ResourceVariables` instead of `ReferenceVariables`](https://github.com/tensorflow/community/blob/master/rfcs/20180817-variables-20.md).\n", "- Supports [function-based](https://github.com/tensorflow/community/blob/master/rfcs/20180507-cond-v2.md) and differentiable [control flow](https://github.com/tensorflow/community/blob/master/rfcs/20180821-differentiable-functional-while.md) (Control Flow v2).\n", "- Simplifies the TensorShape API to hold `int`s instead of `tf.compat.v1.Dimension` objects.\n", "- Updates tensor equality mechanics. In TF1.x, the `==` operator on tensors and variables checks for object reference equality. In TF2, it checks for value equality. Additionally, tensors/variables are no longer hashable, but you can get hashable object references to them via `var.ref()` if you need to use them in sets or as `dict` keys.\n", "\n", "The sections below provide more context on the differences between TF1.x and TF2. To learn more about the design process behind TF2, read the [RFCs](https://github.com/tensorflow/community/pulls?utf8=%E2%9C%93&q=is%3Apr) 
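and the [design docs](https://github.com/tensorflow/community/tree/master/rfcs)." ] }, { "cell_type": "markdown", "metadata": { "id": "dlCiIgEE2OhY" }, "source": [ "## API cleanup\n", "\n", "Many APIs are either [gone or moved](https://github.com/tensorflow/community/blob/master/rfcs/20180827-api-names.md) in TF2. Some of the major changes include removing `tf.app`, `tf.flags`, and `tf.logging` in favor of the now open-source [absl-py](https://github.com/abseil/abseil-py), rehoming projects that lived in `tf.contrib`, and cleaning up the main `tf.*` namespace by moving lesser-used functions into subpackages like `tf.math`. Some APIs have been replaced with their TF2 equivalents: `tf.summary`, `tf.keras.metrics`, and `tf.keras.optimizers`.\n", "\n", "### `tf.compat.v1`: Legacy and compatibility API endpoints\n", "\n", "Symbols under the `tf.compat` and `tf.compat.v1` namespaces are not considered TF2 APIs. These namespaces expose a mix of compatibility symbols, as well as legacy API endpoints from TF 1.x. They are intended to aid migration from TF1.x to TF2. However, none of these `compat.v1` APIs are idiomatic TF2 APIs, so do not use them for writing brand-new TF2 code.\n", "\n", "Individual `tf.compat.v1` symbols may be TF2 compatible because they continue to work even with TF2 behaviors enabled (such as `tf.compat.v1.losses.mean_squared_error`), while others are incompatible with TF2 (such as `tf.compat.v1.metrics.accuracy`). Many `compat.v1` symbols (though not all) contain dedicated migration information in their documentation that explains their degree of compatibility with TF2 behaviors, as well as how to migrate them to TF2 APIs.\n", "\n", "The [TF2 upgrade script](https://tensorflow.google.cn/guide/migrate/upgrade) can map many `compat.v1` API symbols to equivalent TF2 APIs when they are aliases or have the same arguments in a different order. You can also use the upgrade script to automatically rename TF1.x APIs.\n", "\n", "### False friend APIs\n", "\n", "There is a set of \"false-friend\" symbols in the TF2 `tf` namespace (not under `compat.v1`) that actually ignore TF2 behaviors under the hood, and/or are not fully compatible with the full set of TF2 behaviors. As such, these APIs are likely to misbehave with TF2 code, potentially without any warning.\n", "\n", "- `tf.estimator.*`: Estimators create and use graphs and sessions under the hood. As such, they should not be considered TF2-compatible. If your code is running estimators, it is not using TF2 behaviors.\n", "- `tf.keras.estimator.model_to_estimator(...)`: This creates an Estimator under the hood, which, as mentioned above, is not TF2-compatible.\n", "- `tf.Graph().as_default()`: This enters TF1.x graph behaviors and does not follow standard TF2-compatible `tf.function` behaviors. Code that enters graphs like this will generally run them via sessions, and should not be considered TF2-compatible.\n", "- `tf.feature_column.*`: The feature column APIs generally rely on TF1-style `tf.compat.v1.get_variable` variable creation and assume that the created variables will be accessed via global collections. As TF2 does not support collections, these APIs may not work correctly when run with TF2 behaviors enabled.\n", "\n", "### Other API changes\n", "\n", "- TF2 features significant improvements to the device placement algorithms, which render the usage of `tf.colocate_with` unnecessary. If removing it causes a performance degradation, [please file a bug](https://github.com/tensorflow/tensorflow/issues).\n", "\n", "- Replace all usages of `tf.compat.v1.ConfigProto` with the equivalent functions from 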
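`tf.config`." ] }, { "cell_type": "markdown", "metadata": { "id": "RxEU79Rd83Yz" }, "source": [ "## Eager execution\n", "\n", "TF1.x required you to manually stitch together an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree) (the graph) by making `tf.*` API calls, and then manually compile the abstract syntax tree by passing a set of output tensors and input tensors to a `session.run` call. TF2 executes eagerly (like Python normally does) and makes graphs and sessions feel like implementation details.\n", "\n", "One notable byproduct of eager execution is that `tf.control_dependencies` is no longer required, as all lines of code execute in order (within a `tf.function`, code with side effects executes in the order written)." ] }, { "cell_type": "markdown", "metadata": { "id": "LH3YizX-9S7g" }, "source": [ "## No more globals\n", "\n", "TF1.x relied heavily on implicit global namespaces and collections. When you called `tf.Variable`, it was put into a collection in the default graph, and it remained there even if you lost track of the Python variable pointing to it. You could then recover that `tf.Variable`, but only if you knew the name it had been created with. This was difficult to do if you were not in control of the variable's creation. As a result, all sorts of mechanisms proliferated to help you find your variables again, and to help frameworks find user-created variables: variable scopes, global collections, helper methods like `tf.get_global_step` and `tf.global_variables_initializer`, optimizers implicitly computing gradients over all trainable variables, and so on. TF2 eliminates all of these mechanisms ([Variables 2.0 RFC](https://github.com/tensorflow/community/pull/11)) in favor of the default mechanism: you keep track of your variables. If you lose track of a `tf.Variable`, it gets garbage collected.\n", "\n", "The requirement to track variables creates some extra work, but with tools like the [modeling shims](./model_mapping.ipynb) and behaviors like [implicit object-oriented variable collections in `tf.Module` and `tf.keras.layers.Layer`](https://tensorflow.google.cn/guide/intro_to_modules), the burden is minimized." ] }, { "cell_type": "markdown", "metadata": { "id": "NXwBgAjJ98J2" }, "source": [ "## Functions, not sessions\n", "\n", "A `session.run` call is almost like a function call: you specify the inputs and the function to be called, and you get back a set of outputs. In TF2, you can decorate a Python function with `tf.function` to mark it for JIT compilation so that TensorFlow runs it as a single graph ([Functions 2.0 RFC](https://github.com/tensorflow/community/pull/20)). This mechanism allows TF2 to gain all of the benefits of graph mode:\n", "\n", "- Performance: The function can be optimized (node pruning, kernel fusion, etc.)\n", "- Portability: The function can be exported/reimported ([SavedModel 2.0 RFC](https://github.com/tensorflow/community/pull/34)), allowing you to reuse and share modular TensorFlow functions.\n", "\n", "```python\n", "# TF1.x\n", "outputs = session.run(f(placeholder), feed_dict={placeholder: input})\n", "# TF2\n", "outputs = f(input)\n", "```\n", "\n", "With the power to freely intersperse Python and TensorFlow code, you can take advantage of Python's expressiveness. However, portable TensorFlow executes in contexts without a Python interpreter, such as mobile, C++, and JavaScript. To help you avoid rewriting your code when adding `tf.function`, 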
[AutoGraph](https://tensorflow.org/guide/function) converts a subset of Python constructs into their TensorFlow equivalents:\n", "\n", "- `for`/`while` -> `tf.while_loop` (`break` and `continue` are supported)\n", "- `if` -> `tf.cond`\n", "- `for _ in dataset` -> `dataset.reduce`\n", "\n", "AutoGraph supports arbitrary nesting of control flow, which makes it possible to implement many complex ML programs performantly and concisely, such as sequence models, reinforcement learning, custom training loops, and more." ] }, { "cell_type": "markdown", "metadata": { "id": "Mj3gaj4tpi7O" }, "source": [ "## Adapting to TF 2.x behavior changes\n", "\n", "Your migration to TF2 is only complete once you have migrated to the full set of TF2 behaviors. The full set of behaviors can be enabled or disabled via `tf.compat.v1.enable_v2_behavior` and `tf.compat.v1.disable_v2_behavior`. The sections below discuss each major behavior change in detail." ] }, { "cell_type": "markdown", "metadata": { "id": "_M0zEtR9p0XD" }, "source": [ "### Using `tf.function`\n", "\n", "The largest changes to your programs during migration are likely to come from the fundamental programming model shift from graphs and sessions to eager execution and `tf.function`. Refer to the [TF2 migration guides](https://tensorflow.org/guide/migrate) to learn more about moving from APIs that are incompatible with eager execution and `tf.function` to APIs that are compatible with them.\n", "\n", "Note: During migration you can choose to directly enable and disable eager execution with `tf.compat.v1.enable_eager_execution` and `tf.compat.v1.disable_eager_execution`, but this may only be done once during the lifetime of your program.\n", "\n", "Below are some common program patterns, not tied to any one API, that may cause problems when switching from `tf.Graph` and `tf.compat.v1.Session` to eager execution with `tf.function`." ] }, { "cell_type": "markdown", "metadata": { "id": "UgwEtwwN2PWy" }, "source": [ "#### Pattern 1: Python object manipulation and variable creation intended to be done only once get run multiple times\n", "\n", "\n", "\n", "In TF1.x programs that rely on graphs and sessions, the expectation is usually that all Python logic in your program runs only once. However, with eager execution and `tf.function` it is fair to expect that your Python logic will run at least once, and possibly more times (either multiple times eagerly, or multiple times across different `tf.function` traces). Sometimes, `tf.function` will even trace twice on the same input, causing unexpected behaviors (see Examples 1 and 2). Refer to the `tf.function` [guide](https://tensorflow.google.cn/guide/function) for more details.\n", "\n", "Note: This pattern usually causes your code to silently misbehave when executing eagerly without `tf.function`, but generally raises an `InaccessibleTensorError` or a `ValueError` when you attempt to wrap the problematic code inside a `tf.function`. To discover and debug this issue, it is recommended that you wrap your code with `tf.function` early on, and use [pdb](https://docs.python.org/3/library/pdb.html) or interactive debugging to identify the source of the `InaccessibleTensorError`.\n", "\n", "**Example 1: Variable creation**\n", "\n", "Consider the example below, where the function creates a variable when called:\n", "\n", "```python\n", "def f():\n", "  v = tf.Variable(1.0)\n", "  return v\n", "\n", "with tf.Graph().as_default():\n", "  with tf.compat.v1.Session() as sess:\n", "    res = f()\n", "    sess.run(tf.compat.v1.global_variables_initializer())\n", "    sess.run(res)\n", "```\n", "\n",
"However, naively wrapping the above function that contains variable creation with `tf.function` is not allowed. `tf.function` only supports [singleton variable creation on the first call](https://tensorflow.google.cn/guide/function#creating_tfvariables). To enforce this, when `tf.function` detects variable creation in the first call, it attempts to trace again and raises an error if there is variable creation in the second trace.\n", "\n", "```python\n", "@tf.function\n", "def f():\n", "  print(\"trace\") # This will print twice because the python body is run twice\n", "  v = tf.Variable(1.0)\n", "  return v\n", "\n", "try:\n", "  f()\n", "except ValueError as e:\n", "  print(e)\n", "```\n", "\n", "A workaround is to cache and reuse the variable after it is created in the first call.\n", "\n", "```python\n", "class Model(tf.Module):\n", "  def __init__(self):\n", "    self.v = None\n", "\n", "  @tf.function\n", "  def __call__(self):\n", "    print(\"trace\") # This will print twice because the python body is run twice\n", "    if self.v is None:\n", "      self.v = tf.Variable(0)\n", "    return self.v\n", "\n", "m = Model()\n", "m()\n", "```\n", "\n", "**Example 2: Out-of-scope tensors due to `tf.function` retracing**\n", "\n", "As demonstrated in Example 1, `tf.function` retraces when it detects variable creation in the first call. This can cause extra confusion, because the two traces create two graphs. When the second graph, created by retracing, attempts to access a tensor from the graph generated during the first trace, TensorFlow raises an error complaining that the tensor is out of scope. To demonstrate the scenario, the code below creates a dataset on the first `tf.function` call. This runs as expected.\n", "\n", "```python\n", "class Model(tf.Module):\n", "  def __init__(self):\n", "    self.dataset = None\n", "\n", "  @tf.function\n", "  def __call__(self):\n", "    print(\"trace\") # This will print once: only traced once\n", "    if self.dataset is None:\n", "      self.dataset = tf.data.Dataset.from_tensors([1, 2, 3])\n", "    it = iter(self.dataset)\n", "    return next(it)\n", "\n", "m = Model()\n", "m()\n", "```\n", "\n", "However, if you also attempt to create a variable on the first `tf.function` call, the code raises an error complaining that the dataset is out of scope. This is because the dataset lives in the first graph, while the second graph is also attempting to access it.\n", "\n", "```python\n", "class Model(tf.Module):\n", "  def __init__(self):\n", "    self.v = None\n", "    self.dataset = None\n", "\n", "  @tf.function\n", "  def __call__(self):\n", "    print(\"trace\") # This will print twice because the python body is run twice\n", "    if self.v is None:\n", "      self.v = tf.Variable(0)\n", "    if self.dataset is None:\n", "      self.dataset = tf.data.Dataset.from_tensors([1, 2, 3])\n", "    it = iter(self.dataset)\n", "    return [self.v, next(it)]\n", "\n", "m = Model()\n", "try:\n", "  m()\n", "except TypeError as e:\n", "  print(e) # Reports that a tensor is out of scope and cannot be used here.\n", "```\n", "\n",
"The most straightforward solution is to ensure that both the variable creation and the dataset creation happen outside of the `tf.function` call. For example:\n", "\n", "```python\n", "class Model(tf.Module):\n", "  def __init__(self):\n", "    self.v = None\n", "    self.dataset = None\n", "\n", "  def initialize(self):\n", "    if self.dataset is None:\n", "      self.dataset = tf.data.Dataset.from_tensors([1, 2, 3])\n", "    if self.v is None:\n", "      self.v = tf.Variable(0)\n", "\n", "  @tf.function\n", "  def __call__(self):\n", "    it = iter(self.dataset)\n", "    return [self.v, next(it)]\n", "\n", "m = Model()\n", "m.initialize()\n", "m()\n", "```\n", "\n", "However, sometimes creating variables inside a `tf.function` is unavoidable (such as slot variables in some [TF Keras optimizers](https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/Optimizer#slots)). Still, you can simply move the dataset creation outside of the `tf.function` call. This works because `tf.function` receives the dataset as an implicit input, so both graphs can access it properly.\n", "\n", "```python\n", "class Model(tf.Module):\n", "  def __init__(self):\n", "    self.v = None\n", "    self.dataset = None\n", "\n", "  def initialize(self):\n", "    if self.dataset is None:\n", "      self.dataset = tf.data.Dataset.from_tensors([1, 2, 3])\n", "\n", "  @tf.function\n", "  def __call__(self):\n", "    if self.v is None:\n", "      self.v = tf.Variable(0)\n", "    it = iter(self.dataset)\n", "    return [self.v, next(it)]\n", "\n", "m = Model()\n", "m.initialize()\n", "m()\n", "```\n", "\n", "**Example 3: Unexpected TensorFlow object re-creation due to dict usage**\n", "\n", "`tf.function` has very poor support for Python side effects such as appending to a list, or checking/adding to a dictionary. More details are in [Better performance with tf.function](https://tensorflow.google.cn/guide/function#executing_python_side_effects). In the example below, the code uses dictionaries to cache datasets and iterators. For the same key, each call to the model returns the same iterator of the dataset.\n", "\n", "```python\n",
"class Model(tf.Module):\n", "  def __init__(self):\n", "    self.datasets = {}\n", "    self.iterators = {}\n", "\n", "  def __call__(self, key):\n", "    if key not in self.datasets:\n", "      self.datasets[key] = tf.compat.v1.data.Dataset.from_tensor_slices([1, 2, 3])\n", "      self.iterators[key] = self.datasets[key].make_initializable_iterator()\n", "    return self.iterators[key]\n", "\n", "with tf.Graph().as_default():\n", "  with tf.compat.v1.Session() as sess:\n", "    m = Model()\n", "    it = m('a')\n", "    sess.run(it.initializer)\n", "    for _ in range(3):\n", "      print(sess.run(it.get_next())) # prints 1, 2, 3\n", "```\n", "\n", "However, the pattern above does not work as expected in `tf.function`. During tracing, `tf.function` ignores the Python side effect of adding to the dictionaries. Instead, it only remembers the creation of a new dataset and iterator. As a result, each call to the model always returns a new iterator. This issue is hard to notice unless the numerical results or performance differences are significant enough. Hence, think carefully about your code before naively wrapping Python code in `tf.function`.\n", "\n", "```python\n", "class Model(tf.Module):\n", "  def __init__(self):\n", "    self.datasets = {}\n", "    self.iterators = {}\n", "\n", "  @tf.function\n", "  def __call__(self, key):\n", "    if key not in self.datasets:\n", "      self.datasets[key] = tf.data.Dataset.from_tensor_slices([1, 2, 3])\n", "      self.iterators[key] = iter(self.datasets[key])\n", "    return self.iterators[key]\n", "\n", "m = Model()\n", "for _ in range(3):\n", "  print(next(m('a'))) # prints 1, 1, 1\n", "```\n", "\n", "You can use [`tf.init_scope`](https://tensorflow.google.cn/api_docs/python/tf/init_scope) to lift the dataset and iterator creation outside of the graph to achieve the expected behavior:\n", "\n", "```python\n", "class Model(tf.Module):\n", "  def __init__(self):\n", "    self.datasets = {}\n", "    self.iterators = {}\n", "\n", "  @tf.function\n", "  def __call__(self, key):\n", "    if key not in self.datasets:\n", "      # Lifts ops out of function-building graphs\n", "      with tf.init_scope():\n", "        self.datasets[key] = tf.data.Dataset.from_tensor_slices([1, 2, 3])\n", "        self.iterators[key] = iter(self.datasets[key])\n", "    return self.iterators[key]\n", "\n", "m = Model()\n", "for _ in range(3):\n", "  print(next(m('a'))) # prints 1, 2, 3\n", "```\n", "\n", "As a general rule of thumb, avoid relying on Python 
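side effects in your logic and only use them to debug your traces.\n", "\n", "**Example 4: Manipulating a global Python list**\n", "\n", "The following TF1.x code uses a global list of losses solely to maintain the losses generated by the current training step. Note that the Python logic that appends losses to the list is only called once, regardless of how many training steps the session runs.\n", "\n", "```python\n", "all_losses = []\n", "\n", "class Model():\n", "  def __call__(...):\n", "    ...\n", "    all_losses.append(regularization_loss)\n", "    all_losses.append(label_loss_a)\n", "    all_losses.append(label_loss_b)\n", "    ...\n", "\n", "g = tf.Graph()\n", "with g.as_default():\n", "  ...\n", "  # initialize all objects\n", "  model = Model()\n", "  optimizer = ...\n", "  ...\n", "  # train step\n", "  model(...)\n", "  total_loss = tf.reduce_sum(all_losses)\n", "  optimizer.minimize(total_loss)\n", "  ...\n", "...\n", "sess = tf.compat.v1.Session(graph=g)\n", "sess.run(...)\n", "```\n", "\n", "However, if this Python logic is naively mapped to TF2 with eager execution, the global list of losses has new values appended to it in each training step. This means that the training step code, which previously expected the list to contain only losses from the current training step, now actually sees the losses from all training steps run so far. This is an unintended behavior change, and the list needs to either be cleared at the start of each step or be made local to the training step.\n", "\n", "```python\n", "all_losses = []\n", "\n", "class Model():\n", "  def __call__(...):\n", "    ...\n", "    all_losses.append(regularization_loss)\n", "    all_losses.append(label_loss_a)\n", "    all_losses.append(label_loss_b)\n", "    ...\n", "\n", "# initialize all objects\n", "model = Model()\n", "optimizer = ...\n", "\n", "def train_step(...):\n", "  ...\n", "  model(...)\n", "  total_loss = tf.reduce_sum(all_losses) # global list is never cleared,\n", "  # Accidentally accumulates sum loss across all training steps\n", "  optimizer.minimize(total_loss)\n", "  ...\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "qaYnjPo-tmTI" }, "source": [ "#### Pattern 2: A symbolic tensor meant to be recomputed every step in TF1.x is accidentally cached with the initial value when switching to eager\n", "\n", "\n", "\n", "This pattern usually causes your code to silently misbehave when executing eagerly outside of `tf.function`, but raises an `InaccessibleTensorError` if the initial value caching occurs inside a `tf.function`. However, be aware that in order to avoid [Pattern 1](#pattern-1) above, you will often inadvertently structure your code so that this initial value caching happens *outside* of any `tf.function` that would be able to raise an error. So, take extra care if you know your program may be susceptible to this pattern.\n", "\n", "The general solution to this pattern is to restructure the code, or use Python callables if necessary, to make sure the value is recomputed each time instead of being accidentally cached.\n", "\n", "**Example 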
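1: Learning rate/hyperparameter/etc. schedules that depend on the global step**\n", "\n", "In the following code snippet, the expectation is that every time the session is run, the most recent `global_step` value is read and a new learning rate is computed.\n", "\n", "```python\n", "g = tf.Graph()\n", "with g.as_default():\n", "  ...\n", "  global_step = tf.Variable(0)\n", "  learning_rate = 1.0 / global_step\n", "  opt = tf.compat.v1.train.GradientDescentOptimizer(learning_rate)\n", "  ...\n", "  global_step.assign_add(1)\n", "...\n", "sess = tf.compat.v1.Session(graph=g)\n", "sess.run(...)\n", "```\n", "\n", "However, when switching to eager, be wary of ending up with a learning rate that is computed only once and then reused, rather than following the intended schedule:\n", "\n", "```python\n", "global_step = tf.Variable(0)\n", "learning_rate = 1.0 / global_step # Wrong! Only computed once!\n", "opt = tf.keras.optimizers.SGD(learning_rate)\n", "\n", "def train_step(...):\n", "  ...\n", "  opt.apply_gradients(...)\n", "  global_step.assign_add(1)\n", "  ...\n", "```\n", "\n", "Because this specific example is a common pattern, and optimizers should be initialized only once rather than at each training step, TF2 optimizers support `tf.keras.optimizers.schedules.LearningRateSchedule` schedules or Python callables as arguments for the learning rate and other hyperparameters.\n", "\n", "**Example 2: Symbolic random number initializations assigned as object attributes and then reused via pointer are accidentally cached when switching to eager**\n", "\n", "Consider the following `NoiseAdder` module:\n", "\n", "```python\n", "class NoiseAdder(tf.Module):\n", "  def __init__(self, shape, mean):\n", "    self.noise_distribution = tf.random.normal(shape=shape, mean=mean)\n", "    self.trainable_scale = tf.Variable(1.0, trainable=True)\n", "\n", "  def add_noise(self, input):\n", "    return (self.noise_distribution + input) * self.trainable_scale\n", "```\n", "\n", "Using it as follows in TF1.x computes a new random noise tensor every time the session is run:\n", "\n", "```python\n", "g = tf.Graph()\n", "with g.as_default():\n", "  ...\n", "  # initialize all variable-containing objects\n", "  noise_adder = NoiseAdder(shape, mean)\n", "  ...\n", "  # computation pass\n", "  x_with_noise = noise_adder.add_noise(x)\n", "  ...\n", "...\n", "sess = tf.compat.v1.Session(graph=g)\n", "sess.run(...)\n", "```\n", "\n", "However, in TF2, initializing the `noise_adder` at the beginning causes `noise_distribution` to be computed only once and frozen for all training steps:\n", "\n", "```python\n", "...\n", "# initialize all variable-containing objects\n", "noise_adder = NoiseAdder(shape, mean) # Freezes `self.noise_distribution`!\n", "...\n", "# computation pass\n", "x_with_noise = noise_adder.add_noise(x)\n", "...\n", "```\n", "\n", "To fix this, refactor `NoiseAdder` to call `tf.random.normal` every time a new random tensor is needed, instead of referring to the same tensor object each time.\n", "\n", "```python\n", "class NoiseAdder(tf.Module):\n", "  def __init__(self, shape, mean):\n", "    self.noise_distribution = lambda: tf.random.normal(shape=shape, mean=mean)\n", "    self.trainable_scale = tf.Variable(1.0, trainable=True)\n", "\n", "  def add_noise(self, input):\n", "    return (self.noise_distribution() + input) * self.trainable_scale\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "j2PXkSflCaCl" }, "source": [ "#### Pattern 3: TF1.x code directly relies on and looks up tensors by name\n", "\n", "\n", "\n", "It is common for TF1.x code tests to rely on checking which tensors or operations are present in a graph. In some rare cases, modeling code also relies on these lookups by name.\n", "\n", "Tensor names are not generated at all when executing eagerly outside of `tf.function`, so all usages of `tf.Tensor.name` must happen inside a `tf.function`. Keep in mind that the actual generated names are very likely to differ between TF1.x and TF2 even within the same `tf.function`, and API guarantees do not ensure stability of the generated names across TF versions.\n", "\n", "Note: Variable names are still generated even outside of `tf.function`, but their names are also not guaranteed to match between TF1.x and TF2 unless you follow the relevant sections of the [model mapping guide](./model_mapping.ipynb).\n" ] }, { "cell_type": "markdown", "metadata": { "id": "5NB3bycl5Lde" }, "source": [ "#### Pattern 4: TF1.x session selectively runs only part of the generated graph\n", "\n", "\n", "\n", "In TF1.x, you can construct a graph and then use a session to run only a subset of it, by choosing a set of inputs and outputs that do not require running every op in the graph.\n", "\n", "For example, you may have both a generator and a discriminator inside a single graph, and use separate `tf.compat.v1.Session.run` calls to alternate between training only the discriminator and training only the generator.\n", "\n", "In TF2, due to automatic control dependencies in `tf.function` and eager execution, there is no selective pruning of `tf.function` traces. A full graph containing all variable updates gets run even if, for example, only the output of the discriminator or the generator is returned from the `tf.function`.\n", "\n", "So, you need to either use multiple `tf.function`s containing different parts of the program, or pass a conditional argument to the `tf.function` that you branch on so that only the parts you actually want to run are executed." ] }, { "cell_type": "markdown", "metadata": { "id": "CnNaUmROp5fV" }, "source": [ "### Collections removal\n", "\n", "When eager execution is enabled, graph collection-related `compat.v1` APIs (including those that read or write to collections under the hood, such as `tf.compat.v1.trainable_variables`) are no longer available. Some may raise a `ValueError`, while others may silently return empty lists.\n", "\n", "The most standard usage of collections in TF1.x is to maintain initializers, the global step, weights, regularization losses, model output losses, and variable updates that need to be run (such as those from `BatchNormalization` layers).\n", "\n", "To handle each of these standard usages:\n", "\n", "1. 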
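Initializers - Ignore. Manual variable initialization is not required with eager execution enabled.\n", "2. Global step - See the documentation of `tf.compat.v1.train.get_or_create_global_step` for migration instructions.\n", "3. Weights - Map your models to `tf.Module`/`tf.keras.layers.Layer`/`tf.keras.Model` by following the guidance in the [model mapping guide](./model_mapping.ipynb), and then use their respective weight-tracking mechanisms, such as `tf.Module.trainable_variables`.\n", "4. Regularization losses - Map your models to `tf.Module`/`tf.keras.layers.Layer`/`tf.keras.Model` by following the guidance in the [model mapping guide](./model_mapping.ipynb), and then use `tf.keras.losses`. Alternatively, you can also manually track your regularization losses.\n", "5. Model output losses - Use the `tf.keras.Model` loss management mechanisms, or separately track your losses without using collections.\n", "6. Weight updates - Ignore this collection. Eager execution and `tf.function` (with AutoGraph and automatic control dependencies) mean that all variable updates are run automatically. So, you do not have to explicitly run all weight updates at the end, but note that this means the weight updates may happen at a different time than they did in your TF1.x code, depending on how you were using control dependencies.\n", "7. Summaries - Refer to the [migrating summary API guide](https://tensorflow.google.cn/tensorboard/migrate).\n", "\n", "More complex collections usage (such as using custom collections) may require you to refactor your code to either maintain your own global stores, or to make it not rely on global stores at all." ] }, { "cell_type": "markdown", "metadata": { "id": "8J_ckZstp8y1" }, "source": [ "### `ResourceVariables` instead of `ReferenceVariables`\n", "\n", "`ResourceVariables` have stronger read-write consistency guarantees than `ReferenceVariables`. This leads to more predictable, easier-to-reason-about semantics regarding whether or not you will observe the result of a previous write when using your variables. This change is extremely unlikely to cause existing code to raise errors or to break silently.\n", "\n", "However, it is ***possible, though unlikely***, that these stronger consistency guarantees increase the memory usage of your specific program. Please file an [issue](https://github.com/tensorflow/tensorflow/issues) if you find this to be the case. Additionally, if you have unit tests that rely on exact string comparisons against the operator names in a graph corresponding to variable reads, be aware that enabling resource variables may slightly change the names of these operators.\n", "\n", "To isolate the impact of this behavior change on your code, if eager execution is disabled you can use `tf.compat.v1.disable_resource_variables()` and `tf.compat.v1.enable_resource_variables()` to globally disable or enable this behavior change. `ResourceVariables` are always used if eager execution is enabled.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "FTU-4P1vux0e" }, "source": [ "### Control Flow v2\n", "\n", "In TF1.x, control flow ops such as `tf.cond` and `tf.while_loop` inline low-level ops such as `Switch`, `Merge`, etc. TF2 provides improved functional control flow ops that are implemented with separate `tf.function` traces for every branch and support higher-order differentiation.\n", "\n", "To isolate the impact of this behavior change on your code, you can use `tf.compat.v1.disable_control_flow_v2()` and `tf.compat.v1.enable_control_flow_v2()` to globally disable or enable this behavior change. However, Control Flow v2 can only be disabled when eager execution is also disabled. With eager execution enabled, TF always uses Control 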
Flow v2.\n", "\n", "This behavior change can dramatically change the structure of generated TF programs that use control flow, as they will contain several nested function traces rather than one flat graph. So, any code that is highly dependent on the exact semantics of the produced traces may require some modification. This includes:\n", "\n", "- Code relying on operator and tensor names\n", "- Code referring to tensors created within a TensorFlow control flow branch from outside of that branch. This is likely to produce an `InaccessibleTensorError`\n", "\n", "This behavior change is intended to be performance neutral to positive, but if you run into an issue where Control Flow v2 performs worse for you than TF1.x control flow, please file an [issue](https://github.com/tensorflow/tensorflow/issues) with reproduction steps." ] }, { "cell_type": "markdown", "metadata": { "id": "W7VwgVCGqE9S" }, "source": [ "## TensorShape API behavior changes\n", "\n", "The `TensorShape` class was simplified to hold `int`s instead of `tf.compat.v1.Dimension` objects. So there is no need to call `.value` to get an `int`.\n", "\n", "Individual `tf.compat.v1.Dimension` objects are still accessible from `tf.TensorShape.dims`.\n", "\n", "To isolate the impact of this behavior change on your code, you can use `tf.compat.v1.disable_v2_tensorshape()` and `tf.compat.v1.enable_v2_tensorshape()` to globally disable or enable this behavior change." ] }, { "cell_type": "markdown", "metadata": { "id": "x36cWcmM8Eu1" }, "source": [ "The following code demonstrates the differences between TF1.x and TF2." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "QF4un9UpVTRA", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "import tensorflow as tf" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "PbpD-kHOZR4A", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "# Create a shape and choose an index\n", "i = 0\n", "shape = tf.TensorShape([16, None, 256])\n", "shape" ] }, { "cell_type": "markdown", "metadata": { "id": "kDFck03neNy0" }, "source": [ "If you had this in TF1.x:\n", "\n", "```python\n", "value = shape[i].value\n", "```\n", "\n", "Then do this in TF2:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "KuR73QGEeNdH", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "value = shape[i]\n", "value" ] }, { "cell_type": "markdown", "metadata": { "id": "bPWPNKRiZmkd" }, "source": [ "If you had this in TF1.x:\n", "\n", "```python\n", "for dim in shape:\n", "  value = dim.value\n", "  print(value)\n", "```\n", "\n", "Then do this in TF2:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "id": "y6s0vuuprJfc", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "for value in shape:\n", "  print(value)" ] }, { "cell_type": "markdown", "metadata": { "id": "YpRgngu3Zw-A" }, "source": [ "If you had this in TF1.x (or used any other dimension method):\n", "\n", "```python\n", "dim = shape[i]\n", "dim.assert_is_compatible_with(other_dim)\n", "```\n", "\n", "Then do this in TF2:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "LpViGEcUZDGX", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "other_dim = 16\n", "Dimension = tf.compat.v1.Dimension\n", "\n", "if shape.rank is None:\n", "  dim = Dimension(None)\n", "else:\n", "  dim = shape.dims[i]\n", "dim.is_compatible_with(other_dim) # or any other dimension method" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "GaiGe36dOdZ_", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "shape = tf.TensorShape(None)\n", "\n", "if shape:\n", "  dim = shape.dims[i]\n", "  dim.is_compatible_with(other_dim) # or any other dimension method" ] }, { "cell_type": "markdown", "metadata": { "id": "3kLLY0I3PI-l" }, "source": [ "The boolean value of a `tf.TensorShape` is `True` if the rank is known, and `False` otherwise." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-Ow1ndKpOnJd", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "print(bool(tf.TensorShape([]))) # Scalar\n", "print(bool(tf.TensorShape([0]))) # 0-length vector\n", "print(bool(tf.TensorShape([1]))) # 1-length vector\n", "print(bool(tf.TensorShape([None]))) # Unknown-length vector\n", "print(bool(tf.TensorShape([1, 10, 100]))) # 3D tensor\n", "print(bool(tf.TensorShape([None, None, None]))) # 3D tensor with no known dimensions\n", "print()\n", "print(bool(tf.TensorShape(None))) # A tensor with unknown rank." 
] }, { "cell_type": "markdown", "metadata": { "id": "KvfEd-uSsWqN" }, "source": [ "### Potential errors due to TensorShape changes\n", "\n", "The TensorShape behavior changes are unlikely to silently break your code. However, you may see shape-related code begin to raise `AttributeError`s, because `int`s and `None`s do not have the same attributes that `tf.compat.v1.Dimension`s do. Below are some examples of these `AttributeError`s:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "r18f8JAGsQi6", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "try:\n", "  # Create a shape and choose an index\n", "  shape = tf.TensorShape([16, None, 256])\n", "  value = shape[0].value\n", "except AttributeError as e:\n", "  # 'int' object has no attribute 'value'\n", "  print(e)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "t9flHru1uIdT", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "try:\n", "  # Create a shape and choose an index\n", "  shape = tf.TensorShape([16, None, 256])\n", "  dim = shape[1]\n", "  other_dim = shape[2]\n", "  dim.assert_is_compatible_with(other_dim)\n", "except AttributeError as e:\n", "  # 'NoneType' object has no attribute 'assert_is_compatible_with'\n", "  print(e)" ] }, { "cell_type": "markdown", "metadata": { "id": "Og7H_TwJqIOF" }, "source": [ "## Tensor equality by value\n", "\n", "The binary `==` and `!=` operators on variables and tensors were changed in TF2 to compare by value rather than by object reference as in TF1.x. Additionally, tensors and variables are no longer directly hashable or usable in sets or as dict keys, because it may not be possible to hash them by value. Instead, they expose a `.ref()` method that you can use to get a hashable reference to the tensor or variable.\n", "\n", "To isolate the impact of this behavior change, you can use `tf.compat.v1.disable_tensor_equality()` and `tf.compat.v1.enable_tensor_equality()` to globally disable or enable this behavior change." ] }, { "cell_type": "markdown", "metadata": { "id": "NGN4oL3lz0ki" }, "source": [ "For example, in TF1.x, comparing two variables that hold the same value with the `==` operator returns false:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "dkGPGpEZ5DI-", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "tf.compat.v1.disable_tensor_equality()\n", "x = tf.Variable(0.0)\n", "y = tf.Variable(0.0)\n", "\n", "x == y" ] }, { "cell_type": "markdown", "metadata": { "id": "RqbewjIFz_oz" }, "source":
[ "While in TF2 with tensor equality checks enabled, `x == y` evaluates to `True` (returned as a boolean tensor)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "V5P_Rwy-zxVE", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "tf.compat.v1.enable_tensor_equality()\n", "x = tf.Variable(0.0)\n", "y = tf.Variable(0.0)\n", "\n", "x == y" ] }, { "cell_type": "markdown", "metadata": { "id": "BqdUPLhHypfs" }, "source": [ "So, in TF2, if you need to compare by object reference, make sure to use `is` and `is not`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "iEjXVxlu4uxo", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "tf.compat.v1.enable_tensor_equality()\n", "x = tf.Variable(0.0)\n", "y = tf.Variable(0.0)\n", "\n", "x is y" ] }, { "cell_type": "markdown", "metadata": { "id": "r2ai1BGN01VI" }, "source": [ "### Hashing tensors and variables\n", "\n", "With TF1.x behaviors, you used to be able to directly add variables and tensors to data structures that require hashing, such as `set` and `dict` keys.\n", "\n", "```python\n", "x = tf.Variable(0.0)\n", "set([x, tf.constant(2.0)])\n", "```\n", "\n", "However, in TF2 with tensor equality enabled, tensors and variables are made unhashable because the `==` and `!=` operator semantics changed to value equality checks." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-TR1KfJu462w", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "tf.compat.v1.enable_tensor_equality()\n", "x = tf.Variable(0.0)\n", "\n", "try:\n", "  set([x, tf.constant(2.0)])\n", "except TypeError as e:\n", "  # TypeError: Variable is unhashable. Instead, use tensor.ref() as the key.\n", "  print(e)" ] }, { "cell_type": "markdown", "metadata": { "id": "CQY7NvNAa7be" }, "source": [ "So, in TF2, if you need to use tensor or variable objects as keys or `set` contents, you can use `tensor.ref()` to get a hashable reference that can be used as a key:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "p-1kVPs01ZuU", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "tf.compat.v1.enable_tensor_equality()\n", "x = tf.Variable(0.0)\n", "\n", "tensor_set = set([x.ref(), tf.constant(2.0).ref()])\n", "assert x.ref() in tensor_set\n", "\n", "tensor_set" ] }, { "cell_type": "markdown", "metadata": { "id": "PqqRqfOYbaOX" }, "source": [ "If needed, you can also get the tensor or variable back from the reference by using `reference.deref()`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "DwRZMYV06M7q", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "referenced_var = x.ref().deref()\n", "assert referenced_var is x\n", "referenced_var" ] },
{ "cell_type": "markdown", "metadata": { "id": "5XSFQbJaReVC" }, "source": [ "## Resources and further reading\n", "\n", "- Visit the [Migrate to TF2](https://tensorflow.org/guide/migrate) section to read more about migrating from TF1.x to TF2.\n", "- Read the [model mapping guide](./model_mapping.ipynb) to learn more about mapping your TF1.x models to work in TF2 directly." ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "tf1_vs_tf2.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }