Recap: The classification problems discussed so far have all been binary. How, then, should we design a network for multi-class classification?
Output layer: For a multi-class problem such as MNIST handwritten-digit classification, applying a Sigmoid puts each output node's value between 0 and 1, interpreted as the probability $\hat{y}_i$ that the sample belongs to class $i \in [0,9]$.
This effectively treats each class as an independent binary classifier. Suppose the outputs for the ten classes are $\hat{y}=[0.8,0.9,0.91,0.95,0.5,0.3,0.6,0.2,0.1,0.16]$. We could simply take the largest value and conclude the sample is most likely class 3, with $P(3)=0.95$. But $P(0)$, $P(1)$, and $P(2)$ are also very large, which makes no logical sense: a sample should not strongly resemble several classes at once. The class outputs therefore need a competitive relationship, suppressing one another so that the result is unambiguous and satisfies the properties of a probability distribution.
Softmax operation: The Softmax operation solves the problem raised above. It transforms the raw outputs into a probability distribution, positive and summing to 1, via:

$$P(y=i) = \hat{y}_i = \frac{e^{z_i}}{\sum_{j=0}^{K-1} e^{z_j}}, \quad i \in \{0,1,\dots,K-1\}$$
where $z_i$ is the raw output (logit) of the $i$-th node of the final linear layer and $K$ is the number of classes.
Adding a Softmax layer after the output layer thus guarantees $\hat{y}_i > 0$ for every class and $\sum_{i=0}^{K-1} \hat{y}_i = 1$.
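To see these properties concretely, here is a minimal NumPy sketch (the three logit values are chosen arbitrarily for illustration) contrasting independent per-class Sigmoids, whose outputs need not sum to 1, with Softmax, whose outputs always do:

```python
import numpy as np

z = np.array([2.0, 1.5, -0.5])            # illustrative logits for 3 classes

sigmoid = 1 / (1 + np.exp(-z))            # independent per-class scores
print(sigmoid, sigmoid.sum())             # [0.881 0.818 0.378], sum ≈ 2.08

softmax = np.exp(z) / np.exp(z).sum()     # coupled, competitive scores
print(softmax, softmax.sum())             # [0.592 0.359 0.049], sum = 1.0
```

Raising one logit necessarily shrinks the other Softmax probabilities, which is exactly the competitive behavior we asked for.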
Cross-entropy loss: To quantify the difference between the true distribution and the predicted one, we use the cross-entropy over the batch:

$$\text{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j} y_j^{(i)} \log \hat{y}_j^{(i)}$$

where $i$ indexes samples and $j$ indexes classes.
If the ground-truth vector of sample $i$ is one-hot, with $y_j^{(i)} = 1$ for a single class $j$ and 0 elsewhere, the term for that sample simplifies to:

$$\text{Loss}^{(i)} = -\log \hat{y}_j^{(i)}$$
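As a quick worked example (the numbers are made up for illustration): if the one-hot target marks class 1 and the softmax output is $\hat{y} = [0.1, 0.7, 0.2]$, then

$$\text{Loss} = -(0 \cdot \log 0.1 + 1 \cdot \log 0.7 + 0 \cdot \log 0.2) = -\log 0.7 \approx 0.357$$

so the loss is small when the predicted probability of the true class is close to 1, and grows without bound as it approaches 0.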
Implementing the NLLLoss in NumPy:
```python
import numpy as np

y = np.array([1, 0, 0])                # one-hot target: true class is 0
z = np.array([0.2, 0.1, -0.1])         # raw logits
y_pred = np.exp(z) / np.exp(z).sum()   # softmax
loss = (-y * np.log(y_pred)).sum()     # cross-entropy against the one-hot target
print(loss)                            # ≈ 0.9729
```
Implementing cross-entropy loss in PyTorch: The last layer of the network should not apply a nonlinear activation (no Sigmoid). Feed the raw logits directly to the cross-entropy loss, which applies Softmax internally.
```python
import torch

y = torch.LongTensor([2, 0, 1])            # target class indices for 3 samples
y_pred1 = torch.Tensor([[0.1, 0.2, 0.9],   # logits that match the targets well
                        [1.1, 0.1, 0.2],
                        [0.2, 2.1, 0.1]])
y_pred2 = torch.Tensor([[0.8, 0.2, 0.3],   # logits that match the targets poorly
                        [0.2, 0.3, 0.5],
                        [0.2, 0.2, 0.5]])

criterion = torch.nn.CrossEntropyLoss()
loss1 = criterion(y_pred1, y)
loss2 = criterion(y_pred2, y)
print('batch loss1 = ', loss1.data, '\nbatch loss2 = ', loss2.data)
# batch loss1 ≈ 0.4966, batch loss2 ≈ 1.2389
```
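To verify the claim that the softmax step is built in, here is a small self-contained sketch (same tensors as above) showing that `CrossEntropyLoss` on raw logits matches `LogSoftmax` followed by `NLLLoss`:

```python
import torch

y = torch.LongTensor([2, 0, 1])
y_pred1 = torch.Tensor([[0.1, 0.2, 0.9], [1.1, 0.1, 0.2], [0.2, 2.1, 0.1]])

# CrossEntropyLoss applied directly to raw logits ...
ce = torch.nn.CrossEntropyLoss()(y_pred1, y)

# ... equals NLLLoss applied to log-softmax probabilities
log_probs = torch.nn.functional.log_softmax(y_pred1, dim=1)
nll = torch.nn.NLLLoss()(log_probs, y)

print(ce, nll)   # both ≈ 0.4966
```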
Documentation: see the official PyTorch docs for `torch.nn.CrossEntropyLoss` and `torch.nn.NLLLoss`.
Hands-on: MNIST Handwritten-Digit Recognition
Each sample in this dataset is a $28 \times 28$ (width $\times$ height) image, represented as a matrix whose entries are integers between 0 and 255; for training, the values are mapped into the range 0 to 1.
1. Prepare the dataset:

```python
import torch
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch.optim as optim

batch_size = 64
transform = transforms.Compose([
    # Convert the PIL image to a Tensor: maps 0-255 to 0-1 and adds the
    # channel dimension, W x H (28 x 28) -> C x W x H (1 x 28 x 28)
    transforms.ToTensor(),
    # Standardize with (mean,), (std,) -- empirical values computed on MNIST
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST(root='./dataset/mnist/', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_dataset = datasets.MNIST(root='./dataset/mnist/', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)
```
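As a quick sanity check (a sketch, not part of the original tutorial), you can pull one batch from the loader defined above and confirm the shapes and statistics produced by the transform pipeline:

```python
# Fetch a single batch to inspect shapes and value ranges
images, labels = next(iter(train_loader))
print(images.shape)                  # torch.Size([64, 1, 28, 28]): (N, C, W, H)
print(labels.shape)                  # torch.Size([64])
print(images.mean(), images.std())   # roughly 0 and 1 after Normalize
```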
2. Design the model class:
```python
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear1 = torch.nn.Linear(784, 512)
        self.linear2 = torch.nn.Linear(512, 256)
        self.linear3 = torch.nn.Linear(256, 128)
        self.linear4 = torch.nn.Linear(128, 64)
        self.linear5 = torch.nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 784)            # flatten (N, 1, 28, 28) -> (N, 784)
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        x = F.relu(self.linear3(x))
        x = F.relu(self.linear4(x))
        return self.linear5(x)         # raw logits; CrossEntropyLoss adds the softmax

model = Net()
```
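A minimal shape check (illustrative, not from the original post, and reusing the `model` just defined): passing a dummy batch through the untrained network should yield one logit per digit class:

```python
# A dummy batch of 4 MNIST-sized images
dummy = torch.randn(4, 1, 28, 28)
logits = model(dummy)
print(logits.shape)   # torch.Size([4, 10]): raw logits, no softmax applied
```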
3. Loss function and optimizer:

```python
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
```
4. Train and test the model:

```python
e_list = []   # epoch indices for the convergence curve
l_list = []   # average training loss per epoch

def train(epoch):
    running_loss = 0.0   # loss over the most recent 300 batches
    epoch_loss = 0.0     # loss accumulated over the whole epoch
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        epoch_loss += loss.item()
        if batch_idx % 300 == 299:   # report every 300 batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
            running_loss = 0.0
    e_list.append(epoch)
    l_list.append(epoch_loss / len(train_loader))   # per-epoch average for the curve

def test():
    correct = 0
    total = 0
    with torch.no_grad():   # no gradients needed during evaluation
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)   # index of the largest logit
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy on test set: %d %%' % (100 * correct / total))

if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        test()
```
5. Convergence curve:
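A short matplotlib sketch (assuming matplotlib is installed; run it after the training loop so `e_list` and `l_list` are populated) that produces the loss curve:

```python
import matplotlib.pyplot as plt

# Plot the average training loss recorded for each epoch in train()
plt.plot(e_list, l_list)
plt.xlabel('epoch')
plt.ylabel('average training loss')
plt.title('Convergence curve')
plt.grid(True)
plt.show()
```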