{ "cells": [ { "cell_type": "markdown", "metadata": { "cellView": "form", "id": "tXAbWHtqs1Y2" }, "source": [ "````{admonition} Copyright 2018 The TensorFlow Authors.\n", "```\n", "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", "```\n", "````" ] }, { "cell_type": "markdown", "metadata": { "id": "HTgMAvQq-PU_" }, "source": [ "# Ragged tensors\n", "\n",
"matches the [segmentation](https://tensorflow.google.cn/api_docs/python/tf/math#about_segmentation) format used by operations such as `tf.math.segment_sum`. The `row_limits` scheme matches the format used by operations such as `tf.sequence_mask`.\n",
"- **Uniform dimensions**: As described below, the `uniform_row_length` encoding is used to encode ragged tensors with uniform dimensions."
]
},
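{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of the segmentation compatibility, the following sketch (values made up for illustration) passes `rt.value_rowids()`, the row index of every value, as the segment ids for `tf.math.segment_sum`. The result should match `tf.reduce_sum(rt, axis=1)`. One caveat: `segment_sum` returns `max(segment_id) + 1` rows, so trailing empty rows would be dropped; `tf.math.unsorted_segment_sum` with `num_segments=rt.nrows()` avoids that."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"# Rows: [3, 1, 4, 1], [], [5, 9, 2] (the middle row is empty).\n",
"rt = tf.RaggedTensor.from_row_splits(\n",
"    values=[3, 1, 4, 1, 5, 9, 2],\n",
"    row_splits=[0, 4, 4, 7])\n",
"\n",
"# value_rowids() gives the row of each value: [0, 0, 0, 0, 2, 2, 2]\n",
"print(rt.value_rowids())\n",
"\n",
"# Summing each segment gives the row sums 9, 0, 16.\n",
"print(tf.math.segment_sum(rt.values, rt.value_rowids()))\n",
"print(tf.reduce_sum(rt, axis=1))"
]
},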
{
"cell_type": "markdown",
"metadata": {
"id": "bpB7xKoUPtU6"
},
"source": [
"### Multiple ragged dimensions\n",
"\n",
"A ragged tensor with multiple ragged dimensions is encoded by using a nested `RaggedTensor` for the `values` tensor. Each nested `RaggedTensor` adds a single ragged dimension."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yy3IGT2a-PWb"
},
"outputs": [],
"source": [
"rt = tf.RaggedTensor.from_row_splits(\n",
" values=tf.RaggedTensor.from_row_splits(\n",
" values=[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],\n",
" row_splits=[0, 3, 3, 5, 9, 10]),\n",
" row_splits=[0, 1, 1, 5])\n",
"print(rt)\n",
"print(\"Shape: {}\".format(rt.shape))\n",
"print(\"Number of partitioned dimensions: {}\".format(rt.ragged_rank))"
]
},
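{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each level of nesting can be inspected directly. In this sketch (same values as above), `rt.values` is itself a `RaggedTensor` with one fewer ragged dimension, and peeling off the last layer yields a plain `tf.Tensor`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"# Inner partition: 10 values split into 5 rows.\n",
"inner = tf.RaggedTensor.from_row_splits(\n",
"    values=[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],\n",
"    row_splits=[0, 3, 3, 5, 9, 10])\n",
"# Outer partition: the 5 inner rows grouped into 3 rows.\n",
"rt = tf.RaggedTensor.from_row_splits(values=inner, row_splits=[0, 1, 1, 5])\n",
"\n",
"print(rt.ragged_rank)                           # 2\n",
"print(isinstance(rt.values, tf.RaggedTensor))   # True\n",
"print(rt.values.ragged_rank)                    # 1\n",
"print(isinstance(rt.values.values, tf.Tensor))  # True"
]
},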
{
"cell_type": "markdown",
"metadata": {
"id": "5HqEEDzk-PWc"
},
"source": [
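"The factory function `tf.RaggedTensor.from_nested_row_splits` can be used to construct a `RaggedTensor` with multiple ragged dimensions directly, by providing a list of `row_splits` tensors (outermost first):"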
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "AKYhtFcT-PWd"
},
"outputs": [],
"source": [
"rt = tf.RaggedTensor.from_nested_row_splits(\n",
" flat_values=[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],\n",
" nested_row_splits=([0, 1, 1, 5], [0, 3, 3, 5, 9, 10]))\n",
"print(rt)"
]
},
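{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two constructions are equivalent. This sketch builds the same tensor both ways and compares them with `to_list()`; `nested_row_splits` then returns the splits in the same outermost-first order that `from_nested_row_splits` takes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"flat_values = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]\n",
"\n",
"# One layer at a time: the innermost partition is built first.\n",
"nested = tf.RaggedTensor.from_row_splits(\n",
"    values=tf.RaggedTensor.from_row_splits(\n",
"        values=flat_values, row_splits=[0, 3, 3, 5, 9, 10]),\n",
"    row_splits=[0, 1, 1, 5])\n",
"\n",
"# One call: the outermost row_splits come first.\n",
"direct = tf.RaggedTensor.from_nested_row_splits(\n",
"    flat_values=flat_values,\n",
"    nested_row_splits=([0, 1, 1, 5], [0, 3, 3, 5, 9, 10]))\n",
"\n",
"print(nested.to_list() == direct.to_list())  # True\n",
"print([s.numpy().tolist() for s in direct.nested_row_splits])"
]
},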
{
"cell_type": "markdown",
"metadata": {
"id": "BqAfbkAC56m0"
},
"source": [
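"### Ragged rank and flat values\n",
"\n",
"A ragged tensor's ***ragged rank*** is the number of times its underlying `values` tensor has been partitioned (that is, the nesting depth of `RaggedTensor` objects). The innermost `values` tensor is known as its ***flat_values***. In the following example, `conversations` has ragged_rank=3, and its `flat_values` is a 1D `Tensor` with 24 strings:\n"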
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BXp-Tt2bClem"
},
"outputs": [],
"source": [
"# shape = [batch, (paragraph), (sentence), (word)]\n",
"conversations = tf.ragged.constant(\n",
" [[[[\"I\", \"like\", \"ragged\", \"tensors.\"]],\n",
" [[\"Oh\", \"yeah?\"], [\"What\", \"can\", \"you\", \"use\", \"them\", \"for?\"]],\n",
" [[\"Processing\", \"variable\", \"length\", \"data!\"]]],\n",
" [[[\"I\", \"like\", \"cheese.\"], [\"Do\", \"you?\"]],\n",
" [[\"Yes.\"], [\"I\", \"do.\"]]]])\n",
"conversations.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "DZUMrgxXFd5s"
},
"outputs": [],
"source": [
"assert conversations.ragged_rank == len(conversations.nested_row_splits)\n",
"conversations.ragged_rank # Number of partitioned dimensions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xXLSNpS0Fdvp"
},
"outputs": [],
"source": [
"conversations.flat_values.numpy()"
]
},
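{
"cell_type": "markdown",
"metadata": {},
"source": [
"Together, `flat_values` and `nested_row_splits` hold everything needed to rebuild a ragged tensor. The sketch below uses a smaller, made-up batch with ragged_rank 2, rebuilds it from those two pieces, and then uses `tf.ragged.map_flat_values` to apply an element-wise op to the flat values while keeping the ragged structure."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"# shape = [batch, (sentence), (word)], so ragged_rank is 2.\n",
"docs = tf.ragged.constant([[['a', 'b'], ['c']], [['d', 'e', 'f']]])\n",
"print(docs.ragged_rank)          # 2\n",
"print(docs.flat_values.numpy())  # all 6 strings, every level flattened\n",
"\n",
"# Rebuild from flat_values plus one row_splits tensor per ragged dimension.\n",
"rebuilt = tf.RaggedTensor.from_nested_row_splits(\n",
"    docs.flat_values, docs.nested_row_splits)\n",
"print(rebuilt.to_list() == docs.to_list())  # True\n",
"\n",
"# The op sees only flat_values; the result keeps the same nesting.\n",
"print(tf.ragged.map_flat_values(tf.strings.upper, docs))"
]
},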
{
"cell_type": "markdown",
"metadata": {
"id": "uba2EnAY-PWf"
},
"source": [
"### Uniform inner dimensions\n",
"\n",
"Ragged tensors with uniform inner dimensions are encoded by using a multidimensional `tf.Tensor` for the `flat_values` (that is, the innermost `values`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "z2sHwHdy-PWg"
},
"outputs": [],
"source": [
"rt = tf.RaggedTensor.from_row_splits(\n",
" values=[[1, 3], [0, 0], [1, 3], [5, 3], [3, 3], [1, 2]],\n",
" row_splits=[0, 3, 4, 6])\n",
"print(rt)\n",
"print(\"Shape: {}\".format(rt.shape))\n",
"print(\"Number of partitioned dimensions: {}\".format(rt.ragged_rank))\n",
"print(\"Flat values shape: {}\".format(rt.flat_values.shape))\n",
"print(\"Flat values:\\n{}\".format(rt.flat_values))"
]
},
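{
"cell_type": "markdown",
"metadata": {},
"source": [
"`tf.ragged.constant` can produce the same layout when its `ragged_rank` argument is set. In this sketch, every innermost list has length 2, so `ragged_rank=1` keeps that dimension uniform and `flat_values` becomes a 2D tensor; with the default, that dimension is treated as ragged as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"pylist = [[[1, 3], [0, 0], [1, 3]], [[5, 3]], [[3, 3], [1, 2]]]\n",
"\n",
"# Default: every nested dimension after the first is ragged.\n",
"print(tf.ragged.constant(pylist).shape.as_list())  # [3, None, None]\n",
"\n",
"# ragged_rank=1: only dimension 1 is ragged; the innermost stays uniform.\n",
"rt = tf.ragged.constant(pylist, ragged_rank=1)\n",
"print(rt.shape.as_list())              # [3, None, 2]\n",
"print(rt.flat_values.shape.as_list())  # [6, 2]"
]
},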
{
"cell_type": "markdown",
"metadata": {
"id": "WoGRKd50x_qz"
},
"source": [
"### Uniform non-inner dimensions\n",
"\n",
"Ragged tensors with uniform non-inner dimensions are encoded by partitioning rows with `uniform_row_length`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "70q1aCKwySgS"
},
"outputs": [],
"source": [
"rt = tf.RaggedTensor.from_uniform_row_length(\n",
" values=tf.RaggedTensor.from_row_splits(\n",
" values=[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],\n",
" row_splits=[0, 3, 5, 9, 10]),\n",
" uniform_row_length=2)\n",
"print(rt)\n",
"print(\"Shape: {}\".format(rt.shape))\n",
"print(\"Number of partitioned dimensions: {}\".format(rt.ragged_rank))"
]
}
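,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The uniform partition can be inspected on the result. In this sketch (same values as above), `uniform_row_length` is set only on the outer partition, so the inner partition reports `None`, and the outer `row_splits` are still available, derived from the uniform length."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"inner = tf.RaggedTensor.from_row_splits(\n",
"    values=[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],\n",
"    row_splits=[0, 3, 5, 9, 10])  # 4 ragged rows\n",
"# Group the 4 inner rows into rows of exactly 2.\n",
"rt = tf.RaggedTensor.from_uniform_row_length(values=inner, uniform_row_length=2)\n",
"\n",
"print(rt.shape.as_list())                    # [2, 2, None]\n",
"print(int(rt.uniform_row_length))            # 2\n",
"print(rt.row_splits.numpy().tolist())        # [0, 2, 4]\n",
"print(rt.values.uniform_row_length is None)  # True: inner partition is ragged"
]
}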
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "ragged_tensor.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "xxx",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}