{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "wJcYs_ERTnnI" }, "outputs": [], "source": [ "##### Copyright 2021 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "HMUDt0CiUJk9", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "77z2OchJTk0l" }, "source": [ "# 迁移示例:预设 Estimator\n", "\n", "\n", " \n", "\n", " \n", " \n", "
在 TensorFlow.org 上查看 在 Google Colab 中运行\n", " 在 Github 上查看源代码 下载笔记本
" ] }, { "cell_type": "markdown", "metadata": { "id": "meUTrR4I6m1C" }, "source": [ "预设(或预制)Estimator 在 TensorFlow 1 中一直被用作一种快速简单的方式来针对各种典型用例训练模型。TensorFlow 2 通过 Keras 模型为其中一些方式提供了直接的近似替代。对于那些没有内置 TensorFlow 2 替代的预设 Estimator,您仍然能够相当轻松地构建自己的替代。\n", "\n", "本指南将通过几个直接等效项和自定义替代示例来演示如何使用 Keras 将 TensorFlow 1 的 `tf.estimator` 派生模型迁移到 TensorFlow 2。\n", "\n", "即,本指南包含下列迁移过程的示例:\n", "\n", "- 从 TensorFlow 1 中 `tf.estimator` 的 `LinearEstimator`、`Classifier` 或 `Regressor` 到 TensorFlow 2 中的 Keras `tf.compat.v1.keras.models.LinearModel`\n", "- 从 TensorFlow 1 中 `tf.estimator` 的 `DNNEstimator`、`Classifier` 或 `Regressor` 到 TensorFlow 2 中的自定义 Keras DNN ModelKeras\n", "- 从 TensorFlow 1 中 `tf.estimator` 的 `DNNLinearCombinedEstimator`、`Classifier` 或 `Regressor` 到 TensorFlow 2 中的 `tf.compat.v1.keras.models.WideDeepModel`\n", "- 从 TensorFlow 1 中 `tf.estimator` 的 `BoostedTreesEstimator`、`Classifier` 或 `Regressor` 到 TensorFlow 2 中的 `tfdf.keras.GradientBoostedTreesModel` in\n", "\n", "模型训练的一个常见前身是特征预处理,可以使用 `tf.feature_column` 为 TensorFlow 1 Estimator 模型完成此过程。有关 TensorFlow 2 中特征预处理的更多信息,请参阅[有关从特征列迁移到 Keras 预处理层 API 的本指南](migrating_feature_columns.ipynb)。" ] }, { "cell_type": "markdown", "metadata": { "id": "YdZSoIXEbhg-" }, "source": [ "## 安装\n", "\n", "从几个必要的 TensorFlow 导入开始:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qsgZp0f-nu9s", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "!pip install tensorflow_decision_forests" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "iE0vSfMXumKI", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "import keras\n", "import pandas as pd\n", "import tensorflow as tf\n", "import tensorflow.compat.v1 as tf1\n", "import tensorflow_decision_forests as tfdf\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Jsm9Rxx7s1OZ" }, "source": [ "从标准 Titanic 数据集中准备一些简单的数据进行演示:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "wC6i_bEZPrPY", 
"vscode": { "languageId": "python" } }, "outputs": [], "source": [ "x_train = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')\n", "x_eval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')\n", "x_train['sex'].replace(('male', 'female'), (0, 1), inplace=True)\n", "x_eval['sex'].replace(('male', 'female'), (0, 1), inplace=True)\n", "\n", "x_train['alone'].replace(('n', 'y'), (0, 1), inplace=True)\n", "x_eval['alone'].replace(('n', 'y'), (0, 1), inplace=True)\n", "\n", "x_train['class'].replace(('First', 'Second', 'Third'), (1, 2, 3), inplace=True)\n", "x_eval['class'].replace(('First', 'Second', 'Third'), (1, 2, 3), inplace=True)\n", "\n", "x_train.drop(['embark_town', 'deck'], axis=1, inplace=True)\n", "x_eval.drop(['embark_town', 'deck'], axis=1, inplace=True)\n", "\n", "y_train = x_train.pop('survived')\n", "y_eval = x_eval.pop('survived')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "lqe9obf7suIj", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "# Data setup for TensorFlow 1 with `tf.estimator`\n", "def _input_fn():\n", " return tf1.data.Dataset.from_tensor_slices((dict(x_train), y_train)).batch(32)\n", "\n", "\n", "def _eval_input_fn():\n", " return tf1.data.Dataset.from_tensor_slices((dict(x_eval), y_eval)).batch(32)\n", "\n", "\n", "FEATURE_NAMES = [\n", " 'age', 'fare', 'sex', 'n_siblings_spouses', 'parch', 'class', 'alone'\n", "]\n", "\n", "feature_columns = []\n", "for fn in FEATURE_NAMES:\n", " feat_col = tf1.feature_column.numeric_column(fn, dtype=tf.float32)\n", " feature_columns.append(feat_col)" ] }, { "cell_type": "markdown", "metadata": { "id": "bYSgoezeMrpI" }, "source": [ "然后,创建一个方法来实例化一个简单的样本优化器,以便与我们的各种 TensorFlow 1 Estimator 和 TensorFlow 2 Keras 模型一起使用。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "YHB_nuzVLVLe", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "def 
create_sample_optimizer(tf_version):\n", "  if tf_version == 'tf1':\n", "    # Estimators accept a callable, so the optimizer is built inside the\n", "    # Estimator's graph, where `get_global_step()` can find the global step.\n", "    optimizer = lambda: tf.keras.optimizers.legacy.Ftrl(\n", "        l1_regularization_strength=0.001,\n", "        learning_rate=tf1.train.exponential_decay(\n", "            learning_rate=0.1,\n", "            global_step=tf1.train.get_global_step(),\n", "            decay_steps=10000,\n", "            decay_rate=0.9))\n", "  elif tf_version == 'tf2':\n", "    optimizer = tf.keras.optimizers.legacy.Ftrl(\n", "        l1_regularization_strength=0.001,\n", "        learning_rate=tf.keras.optimizers.schedules.ExponentialDecay(\n", "            initial_learning_rate=0.1, decay_steps=10000, decay_rate=0.9))\n", "  else:\n", "    raise ValueError(f'Unsupported tf_version: {tf_version!r}')\n", "  return optimizer" ] }, { "cell_type": "markdown", "metadata": { "id": "4uXff1BEssdE" }, "source": [ "## Example 1: Migrating from LinearEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "_O7fyhCnpvED" }, "source": [ "### TensorFlow 1: Using LinearEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "A9560BqEOTpb" }, "source": [ "In TensorFlow 1, you can use `tf.estimator.LinearEstimator` to create a baseline linear model for regression and classification problems." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "oWfh0QW4IXTn", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "linear_estimator = tf.estimator.LinearEstimator(\n", "    head=tf.estimator.BinaryClassHead(),\n", "    feature_columns=feature_columns,\n", "    optimizer=create_sample_optimizer('tf1'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hi77Sg4k-0TR", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "linear_estimator.train(input_fn=_input_fn, steps=100)\n", "linear_estimator.evaluate(input_fn=_eval_input_fn, steps=10)" ] }, { "cell_type": "markdown", "metadata": { "id": "KEmzBjfnsxwT" }, "source": [ "### TensorFlow 2: Using Keras LinearModel" ] }, { "cell_type": "markdown", "metadata": { "id": "fkgkGf_AOaRR" }, "source": [ "In TensorFlow 2, you can create an instance of the Keras `tf.compat.v1.keras.experimental.LinearModel`, which is the substitute for `tf.estimator.LinearEstimator`. The `tf.compat.v1.keras` path signifies that the premade model exists for compatibility." ] }, { "cell_type": 
"code", "execution_count": null, "metadata": { "id": "Kip65sYBlKiu", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "linear_model = tf.compat.v1.keras.experimental.LinearModel()\n", "linear_model.compile(loss='mse', optimizer=create_sample_optimizer('tf2'), metrics=['accuracy'])\n", "linear_model.fit(x_train, y_train, epochs=10)\n", "linear_model.evaluate(x_eval, y_eval, return_dict=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "RRrj78Lqplni" }, "source": [ "## 示例 2:从 DNNEstimator 迁移" ] }, { "cell_type": "markdown", "metadata": { "id": "YKl6XZ7Bp1t5" }, "source": [ "### TensorFlow 1:使用 DNNEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "J7wJUmgypln8" }, "source": [ "在 TensorFlow 1 中,可以使用 `tf.estimator.DNNEstimator` 为回归和分类问题创建基线深度神经网络 (DNN) 模型。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qHbgXCzfpln9", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "dnn_estimator = tf.estimator.DNNEstimator(\n", " head=tf.estimator.BinaryClassHead(),\n", " feature_columns=feature_columns,\n", " hidden_units=[128],\n", " activation_fn=tf.nn.relu,\n", " optimizer=create_sample_optimizer('tf1'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6DTnXxU2pln-", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "dnn_estimator.train(input_fn=_input_fn, steps=100)\n", "dnn_estimator.evaluate(input_fn=_eval_input_fn, steps=10)" ] }, { "cell_type": "markdown", "metadata": { "id": "6xJz6px6pln-" }, "source": [ "### TensorFlow 2:使用 Keras 创建自定义 DNN 模型" ] }, { "cell_type": "markdown", "metadata": { "id": "7cgc72rzpln-" }, "source": [ "在 TensorFlow 2 中,可以创建一个自定义 DNN 模型来替代由 `tf.estimator.DNNEstimator` 生成的模型,此模型具有类似级别的用户指定自定义(例如,与前面的示例一样,能够自定义选定的模型优化器)。\n", "\n", "可以使用类似的工作流将 `tf.estimator.experimental.RNNEstimator` 替换为 Keras RNN 模型。Keras 通过 `tf.keras.layers.RNN`、`tf.keras.layers.LSTM` 和 `tf.keras.layers.GRU` 提供了许多内置的可自定义选项。要了解详情,请查看 [使用 Keras 的 RNN 
指南](https://tensorflow.google.cn/guide/keras/rnn)的*内置 RNN 层:简单示例*部分。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "B5SdsjlL49RG", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "dnn_model = tf.keras.models.Sequential(\n", " [tf.keras.layers.Dense(128, activation='relu'),\n", " tf.keras.layers.Dense(1)])\n", "\n", "dnn_model.compile(loss='mse', optimizer=create_sample_optimizer('tf2'), metrics=['accuracy'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JQmRw9_Upln_", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "dnn_model.fit(x_train, y_train, epochs=10)\n", "dnn_model.evaluate(x_eval, y_eval, return_dict=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "UeBHZ0cd1Pl2" }, "source": [ "## 示例 3:从 DNNLinearCombinedEstimator 迁移" ] }, { "cell_type": "markdown", "metadata": { "id": "GfRaObf5g4TU" }, "source": [ "### TensorFlow 1:使用 DNNLinearCombinedEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "2r13RMX-g4TV" }, "source": [ "在 TensorFlow 1 中,可以使用 `tf.estimator.DNNLinearCombinedEstimator` 为回归和分类问题创建基线组合模型,并为其线性和 DNN 组件提供自定义能力。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OyyDCqc5j7rf", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "optimizer = create_sample_optimizer('tf1')\n", "\n", "combined_estimator = tf.estimator.DNNLinearCombinedEstimator(\n", " head=tf.estimator.BinaryClassHead(),\n", " # Wide settings\n", " linear_feature_columns=feature_columns,\n", " linear_optimizer=optimizer,\n", " # Deep settings\n", " dnn_feature_columns=feature_columns,\n", " dnn_hidden_units=[128],\n", " dnn_optimizer=optimizer)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aXN-BxwzmRaf", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "combined_estimator.train(input_fn=_input_fn, steps=100)\n", "combined_estimator.evaluate(input_fn=_eval_input_fn, steps=10)" ] }, { 
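"cell_type": "markdown", "metadata": {}, "source": [ "For classification problems, `tf.estimator.DNNLinearCombinedClassifier` wraps the same wide-and-deep setup with a built-in head, so you do not construct a head yourself. With its default `n_classes=2`, it performs binary classification like the `BinaryClassHead` used above. The following cell is a minimal sketch of that variant. It reuses the same feature columns and sample optimizer and has not been tuned, and the `combined_classifier` name is only illustrative." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: the Classifier variant of the wide-and-deep Estimator above.\n", "# The binary classification head is created internally (n_classes=2 by default).\n", "combined_classifier = tf.estimator.DNNLinearCombinedClassifier(\n", "    # Wide settings\n", "    linear_feature_columns=feature_columns,\n", "    linear_optimizer=create_sample_optimizer('tf1'),\n", "    # Deep settings\n", "    dnn_feature_columns=feature_columns,\n", "    dnn_hidden_units=[128],\n", "    dnn_optimizer=create_sample_optimizer('tf1'))\n", "\n", "combined_classifier.train(input_fn=_input_fn, steps=100)\n", "combined_classifier.evaluate(input_fn=_eval_input_fn, steps=10)" ] }, { 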
"cell_type": "markdown", "metadata": { "id": "BeMikL5ug4TX" }, "source": [ "### TensorFlow 2:使用 Keras WideDeepModel" ] }, { "cell_type": "markdown", "metadata": { "id": "CYByxxBhg4TX" }, "source": [ "在 TensorFlow 2 中,可以创建 Keras `tf.compat.v1.keras.models.WideDeepModel` 的一个实例来替代由 `tf.estimator.DNNLinearCombinedEstimator` 生成的实例,此实例具有类似级别的用户指定自定义(例如,与前面的示例一样,能够自定义选定的模型优化器)。\n", "\n", "此 `WideDeepModel` 是在 `LinearModel` 组件和自定义 DNN 模型的基础上构造的,这两者均已在前面的两个示例中进行了探讨。如果需要,也可以使用自定义线性模型来替代内置的 Keras `LinearModel`。\n", "\n", "如果您想构建自己的模型而不是预设 Estimator,请查看 [Keras 序贯模型](https://tensorflow.google.cn/guide/keras/sequential_model)指南。有关自定义训练和优化器的更多信息,请参阅[自定义训练:演示](https://tensorflow.google.cn/tutorials/customization/custom_training_walkthrough)指南。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mIFM3e-_RLSX", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "# Create LinearModel and DNN Model as in Examples 1 and 2\n", "optimizer = create_sample_optimizer('tf2')\n", "\n", "linear_model = tf.compat.v1.keras.experimental.LinearModel()\n", "linear_model.compile(loss='mse', optimizer=optimizer, metrics=['accuracy'])\n", "linear_model.fit(x_train, y_train, epochs=10, verbose=0)\n", "\n", "dnn_model = tf.keras.models.Sequential(\n", " [tf.keras.layers.Dense(128, activation='relu'),\n", " tf.keras.layers.Dense(1)])\n", "dnn_model.compile(loss='mse', optimizer=optimizer, metrics=['accuracy'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mFmQz9kjmMSx", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "combined_model = tf.compat.v1.keras.experimental.WideDeepModel(linear_model,\n", " dnn_model)\n", "combined_model.compile(\n", " optimizer=[optimizer, optimizer], loss='mse', metrics=['accuracy'])\n", "combined_model.fit([x_train, x_train], y_train, epochs=10)\n", "combined_model.evaluate(x_eval, y_eval, return_dict=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "wP1DBRhpeOJn" }, "source": [ 
"## 示例 4:从 BoostedTreesEstimator 迁移" ] }, { "cell_type": "markdown", "metadata": { "id": "_3mCQVDSeOKD" }, "source": [ "### TensorFlow 1:使用 BoostedTreesEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "oEWYHNt4eOKD" }, "source": [ "在 TensorFlow 1 中,可以使用 `tf.estimator.BoostedTreesEstimator` 创建基线,以使用用于回归和分类问题的决策树集合创建一个基线梯度提升模型。TensorFlow 2 中不再包含此功能。" ] }, { "cell_type": "markdown", "metadata": { "id": "wliVIER1jLnA" }, "source": [ "```\n", "bt_estimator = tf1.estimator.BoostedTreesEstimator(\n", " head=tf.estimator.BinaryClassHead(),\n", " n_batches_per_layer=1,\n", " max_depth=10,\n", " n_trees=1000,\n", " feature_columns=feature_columns)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "-K87uBrZjR0u" }, "source": [ "```\n", "bt_estimator.train(input_fn=_input_fn, steps=1000)\n", "bt_estimator.evaluate(input_fn=_eval_input_fn, steps=100)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "eNuLP6BeeOKF" }, "source": [ "### TensorFlow 2:使用 TensorFlow Decision Forests" ] }, { "cell_type": "markdown", "metadata": { "id": "m3EVq388eOKF" }, "source": [ "在 TensorFlow 2 中,`tf.estimator.BoostedTreesEstimator` 被 [TensorFlow Decision Forests](https://tensorflow.google.cn/decision_forests) 软件包中的 [tfdf.keras.GradientBoostedTreesModel](https://tensorflow.google.cn/decision_forests/api_docs/python/tfdf/keras/GradientBoostedTreesModel#attributes) 所替代。\n", "\n", "与 `tf.estimator.BoostedTreesEstimator` 相比,TensorFlow Decision Forests 具备多项优势,尤其是在质量、速度、易用性和灵活性方面。要了解 TensorFlow Decision Forests,请从[初学者 colab](https://tensorflow.google.cn/decision_forests/tutorials/beginner_colab) 开始。\n", "\n", "以下示例显示了如何使用 TensorFlow 2 训练梯度提升树模型:" ] }, { "cell_type": "markdown", "metadata": { "id": "UB90fXJdVWC5" }, "source": [ "安装 TensorFlow Decision Forests。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9097mTCIVVE9", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "!pip install tensorflow_decision_forests" ] }, { 
"cell_type": "markdown", "metadata": { "id": "B1qTdAS-VpXk" }, "source": [ "创建一个 TensorFlow 数据集。请注意,Decision Forests 原生支持多种类型的特征,不需要预处理。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jkjFHmDTVswY", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "train_dataframe = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')\n", "eval_dataframe = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')\n", "\n", "# Convert the Pandas Dataframes into TensorFlow datasets.\n", "train_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(train_dataframe, label=\"survived\")\n", "eval_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(eval_dataframe, label=\"survived\")" ] }, { "cell_type": "markdown", "metadata": { "id": "7fPa-LfDWDzB" }, "source": [ "在 `train_dataset` 数据集上训练模型。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JO0yCH9hWPvJ", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "# Use the default hyper-parameters of the model.\n", "gbt_model = tfdf.keras.GradientBoostedTreesModel()\n", "gbt_model.fit(train_dataset)" ] }, { "cell_type": "markdown", "metadata": { "id": "2Y5xm29AWGxt" }, "source": [ "在 `eval_dataset` 数据集上评估模型的质量。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JLS_2vKKeOKF", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "gbt_model.compile(metrics=['accuracy'])\n", "gbt_evaluation = gbt_model.evaluate(eval_dataset, return_dict=True)\n", "print(gbt_evaluation)" ] }, { "cell_type": "markdown", "metadata": { "id": "Z22UJ5SUqToQ" }, "source": [ "梯度提升树只是 TensorFlow Decision Forests 中可用的众多决策森林算法之一。例如,随机森林(以 [tfdf.keras.GradientBoostedTreesModel](https://tensorflow.google.cn/decision_forests/api_docs/python/tfdf/keras/RandomForestModel) 的形式提供,非常抗过拟合),而 CART(以 [tfdf.keras.CartModel](https://tensorflow.google.cn/decision_forests/api_docs/python/tfdf/keras/CartModel) 的形式提供)则非常适合模型解释。\n", "\n", 
"在下一个示例中,训练并绘制一个随机森林模型。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "W3slOhn4Zi9X", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "# Train a Random Forest model\n", "rf_model = tfdf.keras.RandomForestModel()\n", "rf_model.fit(train_dataset)\n", "\n", "# Evaluate the Random Forest model\n", "rf_model.compile(metrics=['accuracy'])\n", "rf_evaluation = rf_model.evaluate(eval_dataset, return_dict=True)\n", "print(rf_evaluation)" ] }, { "cell_type": "markdown", "metadata": { "id": "Z0QYolhoZb_k" }, "source": [ "在最后一个示例中,训练和评估一个 CART 模型。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "027bGnCork_W", "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "# Train a CART model\n", "cart_model = tfdf.keras.CartModel()\n", "cart_model.fit(train_dataset)\n", "\n", "# Plot the CART model\n", "tfdf.model_plotter.plot_model_in_colab(cart_model, max_depth=2)" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "canned_estimators.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }