{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "wJcYs_ERTnnI" }, "outputs": [], "source": [ "##### Copyright 2021 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "HMUDt0CiUJk9", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "77z2OchJTk0l" }, "source": [ "# 从 TPUEstimator 迁移到 TPUStrategy\n", "\n", "
![]() | \n",
" ![]() | \n",
" ![]() | \n",
" ![]() | \n",
"
tf.GradientTape
)进行训练的更多示例,请参阅[使用 TPU](../../guide/tpu.ipynb) 指南。)\n",
"\n",
"Since you need to do some initialization work to connect to the remote cluster and initialize the TPU workers, start by creating a `TPUClusterResolver` to provide the cluster information and connect to the cluster. (Learn more in the *TPU initialization* section of the [Use TPUs](../../guide/tpu.ipynb) guide.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_TgdPNgXoS63",
"vscode": {
"languageId": "python"
}
},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')\n",
"tf.config.experimental_connect_to_cluster(cluster_resolver)\n",
"tf.tpu.experimental.initialize_tpu_system(cluster_resolver)\n",
"print(\"All devices: \", tf.config.list_logical_devices('TPU'))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "R4EHXhN3CVmo"
},
"source": [
"Next, once your data is prepared, you will create a `TPUStrategy`, and then define your model, metrics, and optimizer under this strategy's scope.\n",
"\n",
"To get comparable training speed with `TPUStrategy`, make sure to pick a number for `steps_per_execution` in `Model.compile`, since it specifies the number of batches to run during each `tf.function` call and is critical for performance. This argument is similar to `iterations_per_loop` used in `TPUEstimator`. If you are using custom training loops, make sure multiple steps are run within the `tf.function`-decorated training function. Go to the *Improving performance with multiple steps inside tf.function* section of the [Use TPUs](../../guide/tpu.ipynb) guide for more information.\n",
"\n",
"`tf.distribute.TPUStrategy` can support bounded dynamic shapes, where the upper bound of the dynamic-shape computation can be inferred. However, dynamic shapes may introduce some performance overhead compared with static shapes, so it is generally recommended to make your input shapes static if possible, especially during training. One common op that returns a dynamic shape is `tf.data.Dataset.batch(batch_size)`, since the number of samples remaining in a stream might be less than the batch size. Therefore, when training on a TPU, use `tf.data.Dataset.batch(..., drop_remainder=True)` for the best training performance."
]
},
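{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, you can inspect `Dataset.element_spec` to check whether batching produced a static or a dynamic batch dimension. This small illustration runs on any device; the variable names are purely illustrative:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# With drop_remainder=True, the batch dimension is static (8).\n",
"static_ds = tf.data.Dataset.range(10).batch(8, drop_remainder=True)\n",
"print(static_ds.element_spec.shape)  # (8,)\n",
"\n",
"# Without it, the last batch may be smaller, so the dimension is dynamic (None).\n",
"dynamic_ds = tf.data.Dataset.range(10).batch(8)\n",
"print(dynamic_ds.element_spec.shape)  # (None,)"
]
},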
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "atVciNgPs0fw",
"vscode": {
"languageId": "python"
}
},
"outputs": [],
"source": [
"# Toy data for illustration only; substitute your real features and labels.\n",
"features = [[1., 1.5]]\n",
"labels = [[0.3]]\n",
"eval_features = [[4., 4.5]]\n",
"eval_labels = [[0.8]]\n",
"\n",
"dataset = tf.data.Dataset.from_tensor_slices(\n",
" (features, labels)).shuffle(10).repeat().batch(\n",
" 8, drop_remainder=True).prefetch(2)\n",
"eval_dataset = tf.data.Dataset.from_tensor_slices(\n",
" (eval_features, eval_labels)).batch(1, drop_remainder=True)\n",
"\n",
"strategy = tf.distribute.TPUStrategy(cluster_resolver)\n",
"with strategy.scope():\n",
" model = tf.keras.models.Sequential([tf.keras.layers.Dense(1)])\n",
" optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)\n",
" model.compile(optimizer, \"mse\", steps_per_execution=10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FkM2VZyni98F"
},
"source": [
"Now you can train the model with the training dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Kip65sYBlKiu",
"vscode": {
"languageId": "python"
}
},
"outputs": [],
"source": [
"model.fit(dataset, epochs=5, steps_per_epoch=10)"
]
},
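{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer a custom training loop over `Model.fit`, the sketch below shows one way to run multiple steps inside a single `tf.function` call, as recommended above. It reuses the `strategy`, `model`, `optimizer`, and `dataset` defined earlier; the helper name `train_multiple_steps` is illustrative, not a TensorFlow API:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"per_replica_dataset = strategy.experimental_distribute_dataset(dataset)\n",
"loss_fn = tf.keras.losses.MeanSquaredError(\n",
"    reduction=tf.keras.losses.Reduction.NONE)\n",
"\n",
"@tf.function\n",
"def train_multiple_steps(iterator, steps):\n",
"  def step_fn(inputs):\n",
"    features, labels = inputs\n",
"    with tf.GradientTape() as tape:\n",
"      predictions = model(features, training=True)\n",
"      # Average the per-example losses across the global batch.\n",
"      loss = tf.nn.compute_average_loss(loss_fn(labels, predictions))\n",
"    grads = tape.gradient(loss, model.trainable_variables)\n",
"    optimizer.apply_gradients(zip(grads, model.trainable_variables))\n",
"\n",
"  # Running several steps per tf.function call keeps the TPU busy,\n",
"  # similar to `iterations_per_loop` in TPUEstimator.\n",
"  for _ in tf.range(steps):\n",
"    strategy.run(step_fn, args=(next(iterator),))\n",
"\n",
"iterator = iter(per_replica_dataset)\n",
"train_multiple_steps(iterator, tf.constant(10))"
]
},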
{
"cell_type": "markdown",
"metadata": {
"id": "r0AEK8sNjLOj"
},
"source": [
"Finally, evaluate the model using the evaluation dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6tMRkyfKhqSL",
"vscode": {
"languageId": "python"
}
},
"outputs": [],
"source": [
"model.evaluate(eval_dataset, return_dict=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "67ec4d3f35d6"
},
"source": [
"## Next steps"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gHx_RUL8xcJ3"
},
"source": [
"To learn more about `TPUStrategy` in TensorFlow 2, check out the following resources:\n",
"\n",
"- Guide: [Use TPUs](../../guide/tpu.ipynb) (covering training with Keras `Model.fit`/a custom training loop with `tf.distribute.TPUStrategy`, as well as tips on improving performance with `tf.function`)\n",
"- Guide: [Distributed training with TensorFlow](../../guide/distributed_training.ipynb)\n",
"\n",
"To learn more about custom training, refer to:\n",
"\n",
"- Guide: [Customize what happens in Model.fit](../../guide/keras/customizing_what_happens_in_fit.ipynb)\n",
"- Guide: [Writing a training loop from scratch](https://tensorflow.google.cn/guide/keras/writing_a_training_loop_from_scratch)\n",
"\n",
"TPUs, Google's specialized ASICs for machine learning, are available through [Google Colab](https://colab.research.google.com/), [TPU Research Cloud](https://sites.research.google/trc/), and [Cloud TPU](https://cloud.google.com/tpu)."
]
}
],
"metadata": {
"accelerator": "TPU",
"colab": {
"collapsed_sections": [],
"name": "tpu_estimator.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}