{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "wJcYs_ERTnnI" }, "outputs": [], "source": [ "##### Copyright 2021 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "HMUDt0CiUJk9", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "77z2OchJTk0l" }, "source": [ "# 从 TPU embedding_column 迁移到 TPUEmbedding 层\n", "\n", "\n", " \n", " \n", " \n", " \n", "
在 TensorFlow.org 上查看 在 Google Colab 运行 在 Github 上查看源代码 下载笔记本
" ] }, { "cell_type": "markdown", "metadata": { "id": "meUTrR4I6m1C" }, "source": [ "本指南演示了如何将 [TPU](../../guide/tpu.ipynb) 上的嵌入向量训练从 TensorFlow 1 带有 `TPUEstimator` 的 `embedding_column` API 迁移到 TensorFlow 2 带有 `TPUStrategy` 的 `TPUEmbedding` 层 API。\n", "\n", "嵌入向量是(大型)矩阵。它们是从稀疏特征空间映射到密集向量的查找表。嵌入向量提供高效和密集表示,可以捕捉特征之间复杂的相似度和关系。\n", "\n", "TensorFlow 包括对在 TPU 上训练嵌入向量的专门支持。这种特定于 TPU 的嵌入向量支持可让您训练大于单个 TPU 设备内存的嵌入向量,并在 TPU 上使用稀疏和不规则输入。\n", "\n", "- 在 TensorFlow 1 中,`tf.compat.v1.estimator.tpu.TPUEstimator` 是一个高级 API,它封装了适用于 TPU 的训练、评估、预测和导出。它对 `tf.compat.v1.tpu.experimental.embedding_column` 有特殊支持。\n", "- 要在 TensorFlow 2 中实现此目标,可以使用 TensorFlow Recommenders 的 `tfrs.layers.embedding.TPUEmbedding` 层。对于训练和评估,可以使用 TPU 分布策略 `tf.distribute.TPUStrategy`,例如,它与用于模型构建 (`tf.keras.Model`)、优化器 (`tf.keras.optimizers.Optimizer`) 的 Keras API 兼容,支持使用 `Model.fit` 或者自定义训练循环(使用 `tf.function` 和 `tf.GradientTape`)进行训练。\n", "\n", "有关详情,请参阅 `tfrs.layers.embedding.TPUEmbedding` 层的 API 文档,以及 `tf.tpu.experimental.embedding.TableConfig` 和 `tf.tpu.experimental.embedding.FeatureConfig` 文档。有关 `tf.distribute.TPUStrategy` 的概述,请查看[分布式训练](../../guide/distributed_training.ipynb)指南和[使用 TPU](../../guide/tpu.ipynb) 指南。如果您要从 `TPUEstimator` 迁移到 `TPUStrategy`,请查看 [TPU 迁移指南](tpu_estimator.ipynb)。" ] }, { "cell_type": "markdown", "metadata": { "id": "YdZSoIXEbhg-" }, "source": [ "## 安装\n", "\n", "首先,安装 [TensorFlow Recommenders](https://tensorflow.google.cn/recommenders) 并导入一些必要的软件包:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tYE3RnRN2jNu", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "!pip install tensorflow-recommenders" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "iE0vSfMXumKI", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "import tensorflow as tf\n", "import tensorflow.compat.v1 as tf1\n", "\n", "# TPUEmbedding layer is not part of TensorFlow.\n", "import tensorflow_recommenders as tfrs" ] }, { "cell_type": "markdown", "metadata": { "id": "Jsm9Rxx7s1OZ" }, "source": [ "然后,准备一个用于演示目的的简单数据集:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "m7rnGxsXtDkV", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "features = [[1., 1.5]]\n", "embedding_features_indices = [[0, 0], [0, 1]]\n", "embedding_features_values = [0, 5]\n", "labels = [[0.3]]\n", "eval_features = [[4., 4.5]]\n", "eval_embedding_features_indices = [[0, 0], [0, 1]]\n", "eval_embedding_features_values = [4, 3]\n", "eval_labels = [[0.8]]" ] }, { "cell_type": "markdown", "metadata": { "id": "4uXff1BEssdE" }, "source": [ "## TensorFlow 1:使用 TPUEstimator 在 TPU 上训练嵌入" ] }, { "cell_type": "markdown", "metadata": { "id": "pc-WSeYG2oje" }, "source": [ "在 TensorFlow 1 中,使用 `tf.compat.v1.tpu.experimental.embedding_column` API 设置 TPU 嵌入向量,并使用 `tf.compat.v1.estimator.tpu.TPUEstimator` 在 TPU 上训练/评估模型。\n", "\n", "输入是整数,范围从零到 TPU 嵌入向量表的词汇量大小。首先使用 `tf.feature_column.categorical_column_with_identity` 将输入编码为分类 ID。将 `\"sparse_feature\"` 用于 `key` 参数,因为输入特征是整数值,而 `num_buckets` 是嵌入向量表的词汇量大小 (`10`)。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "sO_y-IRT3dcM", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "embedding_id_column = (\n", " tf1.feature_column.categorical_column_with_identity(\n", " key=\"sparse_feature\", num_buckets=10))" ] }, { "cell_type": "markdown", "metadata": { "id": "57e2dec8ed4a" }, "source": [ "接下来,使用 `tpu.experimental.embedding_column` 将稀疏分类输入转换为密集表示,其中 `dimension` 是嵌入向量表的宽度。它将为每个 `num_buckets` 存储一个嵌入向量。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6d61c855011f", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "embedding_column = tf1.tpu.experimental.embedding_column(\n", " embedding_id_column, dimension=5)" ] }, { "cell_type": "markdown", "metadata": { "id": "c6061452ee5a" }, "source": [ "现在,通过 `tf.estimator.tpu.experimental.EmbeddingConfigSpec` 定义特定于 TPU 的嵌入向量配置。稍后,您会将其作为 `embedding_config_spec` 参数传递给 `tf.estimator.tpu.TPUEstimator`。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6abbf967fc82", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "embedding_config_spec = tf1.estimator.tpu.experimental.EmbeddingConfigSpec(\n", " feature_columns=(embedding_column,),\n", " optimization_parameters=(\n", " tf1.tpu.experimental.AdagradParameters(0.05)))" ] }, { "cell_type": "markdown", "metadata": { "id": "BVWHEQj5a7rN" }, "source": [ "接下来,要使用 `TPUEstimator`,请定义:\n", "\n", "- 用于训练数据的输入函数\n", "- 用于评估数据的评估输入函数\n", "- 用于指示 `TPUEstimator` 如何使用特征和标签定义训练运算的模型函数" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "lqe9obf7suIj", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "def _input_fn(params):\n", " dataset = tf1.data.Dataset.from_tensor_slices((\n", " {\"dense_feature\": features,\n", " \"sparse_feature\": tf1.SparseTensor(\n", " embedding_features_indices,\n", " embedding_features_values, [1, 2])},\n", " labels))\n", " dataset = dataset.repeat()\n", " return dataset.batch(params['batch_size'], drop_remainder=True)\n", "\n", "def _eval_input_fn(params):\n", " dataset = tf1.data.Dataset.from_tensor_slices((\n", " {\"dense_feature\": eval_features,\n", " \"sparse_feature\": tf1.SparseTensor(\n", " eval_embedding_features_indices,\n", " eval_embedding_features_values, [1, 2])},\n", " eval_labels))\n", " dataset = dataset.repeat()\n", " return dataset.batch(params['batch_size'], drop_remainder=True)\n", "\n", "def _model_fn(features, labels, mode, params):\n", " embedding_features = tf1.keras.layers.DenseFeatures(embedding_column)(features)\n", " concatenated_features = tf1.keras.layers.Concatenate(axis=1)(\n", " [embedding_features, features[\"dense_feature\"]])\n", " logits = tf1.layers.Dense(1)(concatenated_features)\n", " loss = tf1.losses.mean_squared_error(labels=labels, predictions=logits)\n", " optimizer = tf1.train.AdagradOptimizer(0.05)\n", " optimizer = tf1.tpu.CrossShardOptimizer(optimizer)\n", " train_op = optimizer.minimize(loss, global_step=tf1.train.get_global_step())\n", " return tf1.estimator.tpu.TPUEstimatorSpec(mode, loss=loss, train_op=train_op)" ] }, { "cell_type": "markdown", "metadata": { "id": "QYnP3Dszc-2R" }, "source": [ "定义这些函数后,创建一个提供聚簇信息的 `tf.distribute.cluster_resolver.TPUClusterResolver` 和一个 `tf.compat.v1.estimator.tpu.RunConfig` 对象。\n", "\n", "连同您已定义的模型函数,现在您可以创建一个 `TPUEstimator`。在这里,您将通过跳过检查点保存来简化流程。随后,您将为 `TPUEstimator` 的训练和评估指定批次大小。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "WAqyqawemlcl", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "cluster_resolver = tf1.distribute.cluster_resolver.TPUClusterResolver(tpu='')\n", "print(\"All devices: \", tf1.config.list_logical_devices('TPU'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "HsOpjW5plH9Q", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "tpu_config = tf1.estimator.tpu.TPUConfig(\n", " iterations_per_loop=10,\n", " per_host_input_for_training=tf1.estimator.tpu.InputPipelineConfig\n", " .PER_HOST_V2)\n", "config = tf1.estimator.tpu.RunConfig(\n", " cluster=cluster_resolver,\n", " save_checkpoints_steps=None,\n", " tpu_config=tpu_config)\n", "estimator = tf1.estimator.tpu.TPUEstimator(\n", " model_fn=_model_fn, config=config, train_batch_size=8, eval_batch_size=8,\n", " embedding_config_spec=embedding_config_spec)" ] }, { "cell_type": "markdown", "metadata": { "id": "Uxw7tWrcepaZ" }, "source": [ "调用 `TPUEstimator.train` 以开始训练模型:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "WZPKFOMAcyrP", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "estimator.train(_input_fn, steps=1)" ] }, { "cell_type": "markdown", "metadata": { "id": "ev1vjIz9euIw" }, "source": [ "然后,调用 `TPUEstimator.evaluate` 以使用评估数据评估模型:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "bqiKRiwWc0cz", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "estimator.evaluate(_eval_input_fn, steps=1)" ] }, { "cell_type": "markdown", "metadata": { "id": "KEmzBjfnsxwT" }, "source": [ "## TensorFlow 2:使用 TPUStrategy 在 TPU 上训练嵌入向量" ] }, { "cell_type": "markdown", "metadata": { "id": "UesuXNbShrbi" }, "source": [ "在 TensorFlow 2 中,要在 TPU 工作进程上进行训练,请将 `tf.distribute.TPUStrategy` 与 Keras API 一起用于模型定义和训练/评估。(有关使用 Keras Model.fit 和自定义训练循环(使用 `tf.function` 和 `tf.GradientTape`)进行训练的更多示例,请参阅[使用 TPU](https://render.githubusercontent.com/guide/tpu.ipynb) 指南。)\n", "\n", "您需要执行一些初始化工作以连接到远程聚簇并初始化 TPU 工作进程,因此首先创建一个 `TPUClusterResolver` 以提供聚簇信息并连接到聚簇。(在[使用 TPU](../../guide/tpu.ipynb) 指南的 *TPU 初始化*部分中了解详情。)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "_TgdPNgXoS63", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')\n", "tf.config.experimental_connect_to_cluster(cluster_resolver)\n", "tf.tpu.experimental.initialize_tpu_system(cluster_resolver)\n", "print(\"All devices: \", tf.config.list_logical_devices('TPU'))" ] }, { "cell_type": "markdown", "metadata": { "id": "94JBD0HxmdPI" }, "source": [ "接下来,准备您的数据。这类似于您在 TensorFlow 1 示例中创建数据集的方式,不同之处在于数据集函数现在传递一个 `tf.distribute.InputContext` 对象而不是 `params` 字典。可以使用此对象来确定本地批次大小(以及此流水线用于哪个主机,以便可以正确对数据分区)。\n", "\n", "- 使用 `tfrs.layers.embedding.TPUEmbedding` API 时,务必在使用 `Dataset.batch` 对数据集进行批处理时包含 `drop_remainder=True` 选项,因为 `TPUEmbedding` 需要固定的批批次大小。\n", "- 此外,如果在同一组设备上进行评估和训练,则必须使用相同的批次大小。\n", "- 最后,您应当将 `tf.keras.utils.experimental.DatasetCreator` 与 `tf.distribute.InputOptions` 中的特殊输入选项 `experimental_fetch_to_device=False`(保存特定于策略的配置)一起使用。下面的代码对此进行了演示:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9NTruOw6mcy9", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "global_batch_size = 8\n", "\n", "def _input_dataset(context: tf.distribute.InputContext):\n", " dataset = tf.data.Dataset.from_tensor_slices((\n", " {\"dense_feature\": features,\n", " \"sparse_feature\": tf.SparseTensor(\n", " embedding_features_indices,\n", " embedding_features_values, [1, 2])},\n", " labels))\n", " dataset = dataset.shuffle(10).repeat()\n", " dataset = dataset.batch(\n", " context.get_per_replica_batch_size(global_batch_size),\n", " drop_remainder=True)\n", " return dataset.prefetch(2)\n", "\n", "def _eval_dataset(context: tf.distribute.InputContext):\n", " dataset = tf.data.Dataset.from_tensor_slices((\n", " {\"dense_feature\": eval_features,\n", " \"sparse_feature\": tf.SparseTensor(\n", " eval_embedding_features_indices,\n", " eval_embedding_features_values, [1, 2])},\n", " eval_labels))\n", " dataset = dataset.repeat()\n", " dataset = dataset.batch(\n", " context.get_per_replica_batch_size(global_batch_size),\n", " drop_remainder=True)\n", " return dataset.prefetch(2)\n", "\n", "input_options = tf.distribute.InputOptions(\n", " experimental_fetch_to_device=False)\n", "\n", "input_dataset = tf.keras.utils.experimental.DatasetCreator(\n", " _input_dataset, input_options=input_options)\n", "\n", "eval_dataset = tf.keras.utils.experimental.DatasetCreator(\n", " _eval_dataset, input_options=input_options)" ] }, { "cell_type": "markdown", "metadata": { "id": "R4EHXhN3CVmo" }, "source": [ "接下来,准备好数据后,将创建一个 `TPUStrategy`,然后在此策略的范围 (`Strategy.scope`) 下定义模型、指标和优化器。\n", "\n", "您应当为 `Model.compile` 中的 `steps_per_execution` 选择一个数字,因为它指定了每次 `tf.function` 调用期间要运行的批次数,并且对性能至关重要。此参数类似于 `TPUEstimator` 中使用的 `iterations_per_loop`。\n", "\n", "TensorFlow 1 中通过 `tf.tpu.experimental.embedding_column`(和 `tf.tpu.experimental.shared_embedding_column`)指定的特征和表配置可以通过一对配置对象在 TensorFlow 2 中直接指定:\n", "\n", "- `tf.tpu.experimental.embedding.FeatureConfig`\n", "- `tf.tpu.experimental.embedding.TableConfig`\n", "\n", "(请参阅相关的 API 文档,了解更多详细信息。)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "atVciNgPs0fw", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "strategy = tf.distribute.TPUStrategy(cluster_resolver)\n", "with strategy.scope():\n", " if hasattr(tf.keras.optimizers, \"legacy\"):\n", " optimizer = tf.keras.optimizers.legacy.Adagrad(learning_rate=0.05)\n", " else:\n", " optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)\n", " dense_input = tf.keras.Input(shape=(2,), dtype=tf.float32, batch_size=global_batch_size)\n", " sparse_input = tf.keras.Input(shape=(), dtype=tf.int32, batch_size=global_batch_size)\n", " embedded_input = tfrs.layers.embedding.TPUEmbedding(\n", " feature_config=tf.tpu.experimental.embedding.FeatureConfig(\n", " table=tf.tpu.experimental.embedding.TableConfig(\n", " vocabulary_size=10,\n", " dim=5,\n", " initializer=tf.initializers.TruncatedNormal(mean=0.0, stddev=1)),\n", " name=\"sparse_input\"),\n", " optimizer=optimizer)(sparse_input)\n", " input = tf.keras.layers.Concatenate(axis=1)([dense_input, embedded_input])\n", " result = tf.keras.layers.Dense(1)(input)\n", " model = tf.keras.Model(inputs={\"dense_feature\": dense_input, \"sparse_feature\": sparse_input}, outputs=result)\n", " model.compile(optimizer, \"mse\", steps_per_execution=10)" ] }, { "cell_type": "markdown", "metadata": { "id": "FkM2VZyni98F" }, "source": [ "这样,您就可以使用训练数据集训练模型了:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Kip65sYBlKiu", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "model.fit(input_dataset, epochs=5, steps_per_epoch=10)" ] }, { "cell_type": "markdown", "metadata": { "id": "r0AEK8sNjLOj" }, "source": [ "最后,使用评估数据集评估模型:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6tMRkyfKhqSL", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "model.evaluate(eval_dataset, steps=1, return_dict=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "a97b888c1911" }, "source": [ "## 后续步骤" ] }, { "cell_type": "markdown", "metadata": { "id": "gHx_RUL8xcJ3" }, "source": [ "在 API 文档中详细了解如何设置特定于 TPU 的嵌入向量:\n", "\n", "- `tfrs.layers.embedding.TPUEmbedding`:特别介绍了特征和表配置、设置优化器、创建模型(使用 Keras [函数式](https://tensorflow.google.cn/guide/keras/functional) API 或通过[子类化](../..guide/keras/custom_layers_and_models.ipynb) `tf.keras.Model`)、训练/评估以及使用 `tf.saved_model` 应用模型\n", "- `tf.tpu.experimental.embedding.TableConfig`\n", "- `tf.tpu.experimental.embedding.FeatureConfig`\n", "\n", "要详细了解 TensorFlow 2 中的 `TPUStrategy`,请查看以下资源:\n", "\n", "- 指南:[使用 TPU](../../guide/tpu.ipynb)(涵盖使用 Keras `Model.fit` 进行训练/使用 `tf.distribute.TPUStrategy` 进行自定义训练循环,以及使用 `tf.function` 提升性能的技巧)\n", "- 指南:[使用 TensorFlow 进行分布式训练](../../guide/distributed_training.ipynb)\n", "- 指南:[从 TPUEstimator 迁移到 TPUStrategy](tpu_estimator.ipynb)。\n", "\n", "要详细了解如何自定义训练,请参阅:\n", "\n", "- 指南:[自定义 Model.fit 的功能](../..guide/keras/customizing_what_happens_in_fit.ipynb)\n", "- 指南:[从头开始编写训练循环](https://tensorflow.google.cn/guide/keras/writing_a_training_loop_from_scratch)\n", "\n", "TPUs(Google 用于机器学习的专用 ASIC)可通过 [Google Colab](https://colab.research.google.com/)、[TPU Research Cloud](https://sites.research.google/trc/) 和 [Cloud TPU](https://cloud.google.com/tpu) 获得。" ] } ], "metadata": { "accelerator": "TPU", "colab": { "collapsed_sections": [], "name": "tpu_embedding.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }