Build the Neural Network#

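Neural networks are made up of layers/modules that perform operations on data. PyTorch's torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch is a subclass of torch.nn.Module. A neural network is itself a module that contains other modules (layers). This nested structure makes it easy to build and manage complex architectures.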

In the following sections, we build a neural network to classify images in the FashionMNIST dataset.

import torch
from torch import nn
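To make the nested structure concrete, here is a minimal sketch of a module that contains another module, which in turn contains layers. The class names `Block` and `TinyNet` and the layer sizes are hypothetical, chosen only for illustration.

```python
import torch
from torch import nn


# Hypothetical two-level model: `Block` and `TinyNet` are illustrative names.
class Block(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.linear(x))


class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = Block(8, 4)    # a module that itself contains modules
        self.head = nn.Linear(4, 2)

    def forward(self, x):
        return self.head(self.block1(x))


net = TinyNet()
# Direct children only.
child_names = [name for name, _ in net.named_children()]
# Every module in the tree, including the root (whose name is the empty string).
all_names = [name for name, _ in net.named_modules()]
print(child_names)
print(all_names)
```

`named_children()` lists only the direct submodules, while `named_modules()` walks the whole tree, which is what lets a parent module manage everything nested inside it.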

Get Device for Training#

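We want to be able to train our model on a hardware accelerator such as a GPU or MPS, if one is available. We check whether torch.cuda or torch.backends.mps is available; otherwise we use the CPU.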

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cuda device
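As a small sketch of what the device string is then used for, the snippet below repeats the same selection logic in a helper (the name `pick_device` is hypothetical) and places tensors on the selected device. It assumes a PyTorch version that provides `torch.backends.mps`.

```python
import torch


# Same selection logic as above, written as a small helper
# (`pick_device` is a hypothetical name).
def pick_device() -> str:
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
x = torch.ones(2, 3)                        # created on the CPU by default
x_dev = x.to(device)                        # on `device` (returns `x` itself if already there)
y_dev = torch.zeros(2, 3, device=device)    # created directly on `device`
print(x.device, x_dev.device, y_dev.device)
```

A model and its inputs must live on the same device, which is why both the model and the input tensors in this tutorial are moved to `device`.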

Define the Class#

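We define our neural network by subclassing torch.nn.Module, and initialize the neural network layers in __init__. Every torch.nn.Module subclass implements the operations on input data in its forward() method.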

class NeuralNetwork(nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

We create an instance of NeuralNetwork, move it to the selected device, and print its structure.

model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

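To use the model, we pass it the input data. This executes the model's forward method along with some background operations, such as any registered hooks. Do not call model.forward() directly!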
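One way to see the difference between calling the module and calling forward() directly is with a forward hook. The following is a minimal sketch on a single `nn.Linear` layer rather than the full model; the hook function name `record_output_shape` is an illustrative choice.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []


def record_output_shape(module, inputs, output):
    # Forward hooks receive the module, its positional inputs, and its output.
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(record_output_shape)
x = torch.rand(3, 4)

layer(x)                           # goes through __call__, so the hook fires
calls_after_call = len(calls)
layer.forward(x)                   # bypasses __call__, so the hook is skipped
calls_after_forward = len(calls)
handle.remove()
print(calls_after_call, calls_after_forward)
```

The hook runs only for the first call, so code that relies on hooks silently loses them when forward() is invoked directly.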

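Calling the model on the input returns a 2-dimensional tensor: dim=0 indexes the samples in the batch, so each row holds the 10 raw predicted values (one per class) for one input, and dim=1 indexes the individual class values within each row. We get the prediction probabilities by passing the result through an instance of the torch.nn.Softmax module.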

X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([3], device='cuda:0')
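The roles of the two dimensions are easier to see with more than one input. The sketch below uses a small stand-in network (`toy_model`, a hypothetical name with the same input and output sizes but fewer layers) and a batch of 4 random images.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Stand-in for the model above: same input and output sizes, fewer layers.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

X = torch.rand(4, 28, 28)              # a batch of 4 images
logits = toy_model(X)                  # shape [4, 10]: one row per image
probs = nn.Softmax(dim=1)(logits)      # normalize across the 10 classes
y_pred = probs.argmax(dim=1)           # one class index per image
print(logits.shape, probs.sum(dim=1), y_pred.shape)
```

Because softmax preserves the ordering of values within a row, taking `argmax` of the logits gives the same predicted classes as taking it of the probabilities.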

Model Layers#

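Let's break down the layers in the FashionMNIST model. To illustrate, we take a sample minibatch of 3 images of size $28 \times 28$ and see what happens to it as it passes through the network.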

input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])

`torch.nn.Flatten`#

We initialize the torch.nn.Flatten layer to convert each 2D $28 \times 28$ image into a contiguous array of $784$ pixel values (the minibatch dimension, at dim=0, is maintained).

flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])
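As a quick check of what flattening does, the sketch below compares `nn.Flatten` with an explicit reshape on a tiny tensor. The tensor sizes are arbitrary and chosen only to keep the example small.

```python
import torch
from torch import nn

batch = torch.arange(2 * 3 * 3, dtype=torch.float32).reshape(2, 3, 3)
flat = nn.Flatten()(batch)                       # start_dim=1 by default: dim 0 is kept
flat_manual = batch.reshape(batch.size(0), -1)   # the same operation written by hand
flat_all = torch.flatten(batch)                  # functional form defaults to start_dim=0
print(flat.shape, flat_all.shape)
```

Note the different defaults: the `nn.Flatten` module keeps the batch dimension, whereas the functional `torch.flatten` collapses every dimension unless `start_dim` is given.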

`torch.nn.Linear`#

torch.nn.Linear is a module that applies a linear transformation to its input using its stored weights and biases.

layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
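The linear transformation can be written out by hand as $y = xW^{T} + b$. The following sketch, with small arbitrary sizes, checks that against the layer's own output.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=6, out_features=4)
x = torch.rand(3, 6)

out = layer(x)
# weight has shape [out_features, in_features], hence the transpose.
manual = x @ layer.weight.T + layer.bias
print(layer.weight.shape, layer.bias.shape, out.shape)
```

The weight is stored as `[out_features, in_features]`, so a layer mapping 784 inputs to 20 outputs holds a `[20, 784]` weight matrix and a length-20 bias.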

`torch.nn.ReLU`#

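Non-linear activations are what create the complex mappings between the model's inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.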

In this model, we use torch.nn.ReLU between our linear layers, but there are other activations you can use to introduce nonlinearity in your model.

print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[ 0.4968, -0.1883,  0.3967, -0.3151,  0.1528,  0.0195, -0.3309, -0.0788,
         -0.3464,  0.0711, -0.4967, -0.0456, -0.0560, -0.1457, -0.1733, -0.0953,
         -0.0309, -0.5995,  0.2890, -0.0185],
        [ 0.3921, -0.2639,  0.6610, -0.4670,  0.0660, -0.0015, -0.1707, -0.0268,
         -0.2920, -0.0329, -0.5934,  0.0480, -0.4621,  0.3773,  0.2170,  0.4392,
          0.1168, -0.7666,  0.1065,  0.1997],
        [ 0.3515, -0.2565,  0.1049, -0.7943,  0.4229,  0.4230, -0.0615,  0.1683,
          0.0265, -0.0217, -0.3115, -0.1737, -0.8239, -0.0935,  0.1612,  0.1222,
          0.0991, -0.7683,  0.0792, -0.0179]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.4968, 0.0000, 0.3967, 0.0000, 0.1528, 0.0195, 0.0000, 0.0000, 0.0000,
         0.0711, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.2890, 0.0000],
        [0.3921, 0.0000, 0.6610, 0.0000, 0.0660, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0480, 0.0000, 0.3773, 0.2170, 0.4392, 0.1168, 0.0000,
         0.1065, 0.1997],
        [0.3515, 0.0000, 0.1049, 0.0000, 0.4229, 0.4230, 0.0000, 0.1683, 0.0265,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1612, 0.1222, 0.0991, 0.0000,
         0.0792, 0.0000]], grad_fn=<ReluBackward0>)
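The output above shows every negative entry replaced by zero. As a small sketch on hand-picked values, ReLU is the element-wise $\max(0, x)$; `nn.Tanh` is included only as one example of an alternative activation.

```python
import torch
from torch import nn

x = torch.tensor([[-1.5, 0.0, 2.0], [0.3, -0.2, 4.0]])
relu_out = nn.ReLU()(x)
clamp_out = torch.clamp(x, min=0)   # the same element-wise max(0, x)
tanh_out = nn.Tanh()(x)             # a different nonlinearity, for comparison
print(relu_out)
print(tanh_out)
```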

`torch.nn.Sequential`#

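torch.nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as they are defined. You can use sequential containers to put together a quick network such as seq_modules.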

seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
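To illustrate the ordering, the sketch below builds a similar container with named entries (the names `fc1`, `relu`, and `fc2` are illustrative) and checks that calling it matches applying each module by hand in definition order.

```python
from collections import OrderedDict

import torch
from torch import nn

torch.manual_seed(0)
seq = nn.Sequential(OrderedDict([
    ("flatten", nn.Flatten()),
    ("fc1", nn.Linear(28 * 28, 20)),
    ("relu", nn.ReLU()),
    ("fc2", nn.Linear(20, 10)),
]))

x = torch.rand(3, 28, 28)
out = seq(x)

# Apply the same modules by hand, in definition order.
manual = x
for module in seq:
    manual = module(manual)

print(out.shape, seq[1], seq.fc2)
```

Entries can be reached by position (`seq[1]`) or, when an `OrderedDict` is used, by name (`seq.fc1`).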

`torch.nn.Softmax`#

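The last linear layer of the neural network returns logits, raw values in $(-\infty, \infty)$, which are passed to the torch.nn.Softmax module. The logits are scaled to values in $[0, 1]$ that represent the model's predicted probability for each class. The dim parameter indicates the dimension along which the values must sum to 1.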

softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
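Softmax maps each logit $z_i$ to $e^{z_i} / \sum_j e^{z_j}$. The sketch below, on hand-picked logits, compares the module's output with that formula computed directly and confirms that each row sums to 1 along dim=1.

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
probs = nn.Softmax(dim=1)(logits)

# Manual version: exponentiate, then normalize each row.
exp = torch.exp(logits)
manual = exp / exp.sum(dim=1, keepdim=True)
print(probs)
print(probs.sum(dim=1))
```

The second row has equal logits, so softmax assigns each of its three classes the same probability.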

Model Parameters#

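Many layers inside a neural network are parameterized, i.e. they have associated weights and biases that are optimized during training. Subclassing torch.nn.Module automatically tracks the parameters and submodules assigned as attributes of your model object, and makes all parameters accessible through the model's torch.nn.Module.parameters() or torch.nn.Module.named_parameters() methods.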

In this example, we iterate over each parameter and print its size and a preview of its values.

print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")
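A common use of these methods is counting parameters. The sketch below rebuilds only the stack of linear and ReLU layers with the same sizes as the model defined earlier (the variable name `stack` is illustrative) and sums the element counts.

```python
import torch
from torch import nn

# Same layer sizes as the linear_relu_stack defined earlier.
stack = nn.Sequential(
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

# Element count per named parameter; ReLU layers contribute none.
per_param = {name: p.numel() for name, p in stack.named_parameters()}
total = sum(per_param.values())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)
print(per_param)
print(total, trainable)
```

Each linear layer contributes `in_features * out_features` weights plus `out_features` biases, for example $784 \times 512 + 512$ for the first layer.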
