Image Recognition: Handwritten Digit Recognition with Keras (Code Included)

Preface
The wave of artificial intelligence has swept the globe, and terms like deep learning and artificial intelligence (AI) constantly surround us. AI's development has seen three rises and two falls: during the 1990s, knowledge reasoning ranked above neural networks, which ranked above machine learning; around 2005, machine learning ranked above knowledge (the Semantic Web), which ranked above neural networks; and since 2017, deep-learning-based neural networks rank above knowledge (knowledge graphs), which ranks above machine learning.
The convolutional neural network (CNN), a representative technique of deep learning, traces its earliest inspiration to the early 1960s, when the neurobiologists Hubel and Wiesel, experimenting on cells in the cat visual cortex, discovered that the brain's visual cortex is organized in layers (the layered structure of a CNN mirrors this closely). As a subfield of machine learning (ML), deep learning has seen a dramatic revival thanks to greater computing power and the availability of large amounts of data. Whether deep learning can be equated with AI is, in this author's view, debatable; it is better regarded as a key technique of the current stage of AI development. Since this article is an introductory hands-on tutorial, finer conceptual details are not explored in depth. Below, the author uses a practical example to walk through the general workflow of deep learning for image processing.
Contents:

Taking handwritten digit recognition as an introductory deep learning project, this article is based on the Keras deep learning library. Modules such as tensorflow need to be configured in advance, and mind the file paths used for saving and loading models and images; when running on your own machine, you may need to create or modify them. The workflow below covers: loading the MNIST dataset with Keras, building a LeNet training network, saving and loading models with Keras, training and prediction on the handwritten digit dataset, and finally plotting the loss/accuracy curves.
The handwritten digit dataset:

Handwritten digit recognition is practically the introductory dataset of deep learning. Keras has the MNIST dataset built in: the training set contains 60,000 samples and the test set 10,000, all single-channel grayscale images of 28×28 pixels, in 10 classes (the digits 0 through 9).
Import the required modules:

```python
# import the necessary packages
import numpy as np
from keras.utils import np_utils
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras import backend as K
from keras.models import load_model
```
Loading the MNIST dataset

Keras can build many kinds of neural network models and load several benchmark datasets to evaluate them. The code below downloads and loads MNIST automatically:

```python
# load the mnist data
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
```
Display the first six images of the MNIST training set:

```python
# plot the first 6 images in grayscale
import matplotlib.pyplot as plt
for i in range(6):
    plt.subplot(3, 2, i + 1)
    plt.imshow(x_train[i], cmap=plt.get_cmap("gray"))
# show
plt.show()
```
Data preprocessing

First, reshape the data into a four-dimensional tensor [samples][width][height][channels] to match the model's expected input:

```python
# reshape the data to four dimensions to match the model input
# reshape to be [samples][width][height][channels]
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1).astype("float32")
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1).astype("float32")
```
To help the model train better, the images are usually normalized:

```python
# normalize pixel values from [0, 255] to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0
```
Finally, the raw MNIST labels are the digits 0–9, and they are usually converted to one-hot vectors. For example, the training label 1 becomes the vector [0,1,0,0,0,0,0,0,0,0]:

```python
# one-hot encode the labels
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
```
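As a sanity check, the conversion that `np_utils.to_categorical` performs can be sketched in plain NumPy (the helper name `one_hot` below is illustrative, not part of Keras):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer labels to one-hot row vectors."""
    out = np.zeros((len(labels), num_classes), dtype="float32")
    out[np.arange(len(labels)), labels] = 1.0
    return out

# label 1 maps to [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(one_hot([1, 3], 10))
```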
Building and training the model

Training hyperparameters:

```python
# parameters
EPOCHS = 10      # number of passes over the training set
INIT_LR = 1e-3   # initial learning rate
BS = 32          # batch size
CLASS_NUM = 10   # number of classes (digits 0-9)
norm_size = 28   # input image width/height
```
This article uses the LeNet architecture, defined below. To swap in a different architecture, such as VGGNet, GoogLeNet, Inception, ResNet, or a design of your own, only this function needs to change.

```python
# define the lenet model
def l_model(width, height, depth, NB_CLASS):
    model = Sequential()
    inputShape = (height, width, depth)
    # if we are using "channels first", update the input shape
    if K.image_data_format() == "channels_first":
        inputShape = (depth, height, width)

    # first set of CONV => RELU => POOL layers
    model.add(Conv2D(20, (5, 5), padding="same", input_shape=inputShape))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    # second set of CONV => RELU => POOL layers
    model.add(Conv2D(50, (5, 5), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    # first (and only) set of FC => RELU layers
    model.add(Flatten())
    model.add(Dense(500))
    model.add(Activation("relu"))

    # softmax classifier
    model.add(Dense(NB_CLASS))
    model.add(Activation("softmax"))

    # return the constructed network architecture
    return model
```
Two more classic models are attached for reference:
VGG16:

```python
import inspect
import os
import time

import numpy as np
import tensorflow as tf

VGG_MEAN = [103.939, 116.779, 123.68]


class Vgg16:
    def __init__(self, vgg16_npy_path=None):
        if vgg16_npy_path is None:
            path = inspect.getfile(Vgg16)
            path = os.path.abspath(os.path.join(path, os.pardir))
            path = os.path.join(path, "vgg16.npy")
            vgg16_npy_path = path
            print(path)

        self.data_dict = np.load(vgg16_npy_path, encoding="latin1").item()
        print("npy file loaded")

    def build(self, rgb):
        """
        Load variables from the npy file to build the VGG.
        :param rgb: rgb image [batch, height, width, 3], values scaled [0, 1]
        """
        start_time = time.time()
        print("build model started")
        rgb_scaled = rgb * 255.0

        # Convert RGB to BGR and subtract the per-channel mean
        red, green, blue = tf.split(axis=3, num_or_size_splits=3, value=rgb_scaled)
        assert red.get_shape().as_list()[1:] == [224, 224, 1]
        assert green.get_shape().as_list()[1:] == [224, 224, 1]
        assert blue.get_shape().as_list()[1:] == [224, 224, 1]
        bgr = tf.concat(axis=3, values=[
            blue - VGG_MEAN[0],
            green - VGG_MEAN[1],
            red - VGG_MEAN[2],
        ])
        assert bgr.get_shape().as_list()[1:] == [224, 224, 3]

        self.conv1_1 = self.conv_layer(bgr, "conv1_1")
        self.conv1_2 = self.conv_layer(self.conv1_1, "conv1_2")
        self.pool1 = self.max_pool(self.conv1_2, "pool1")

        self.conv2_1 = self.conv_layer(self.pool1, "conv2_1")
        self.conv2_2 = self.conv_layer(self.conv2_1, "conv2_2")
        self.pool2 = self.max_pool(self.conv2_2, "pool2")

        self.conv3_1 = self.conv_layer(self.pool2, "conv3_1")
        self.conv3_2 = self.conv_layer(self.conv3_1, "conv3_2")
        self.conv3_3 = self.conv_layer(self.conv3_2, "conv3_3")
        self.pool3 = self.max_pool(self.conv3_3, "pool3")

        self.conv4_1 = self.conv_layer(self.pool3, "conv4_1")
        self.conv4_2 = self.conv_layer(self.conv4_1, "conv4_2")
        self.conv4_3 = self.conv_layer(self.conv4_2, "conv4_3")
        self.pool4 = self.max_pool(self.conv4_3, "pool4")

        self.conv5_1 = self.conv_layer(self.pool4, "conv5_1")
        self.conv5_2 = self.conv_layer(self.conv5_1, "conv5_2")
        self.conv5_3 = self.conv_layer(self.conv5_2, "conv5_3")
        self.pool5 = self.max_pool(self.conv5_3, "pool5")

        self.fc6 = self.fc_layer(self.pool5, "fc6")
        assert self.fc6.get_shape().as_list()[1:] == [4096]
        self.relu6 = tf.nn.relu(self.fc6)

        self.fc7 = self.fc_layer(self.relu6, "fc7")
        self.relu7 = tf.nn.relu(self.fc7)

        self.fc8 = self.fc_layer(self.relu7, "fc8")
        self.prob = tf.nn.softmax(self.fc8, name="prob")

        self.data_dict = None
        print(("build model finished: %ds" % (time.time() - start_time)))

    def avg_pool(self, bottom, name):
        return tf.nn.avg_pool(bottom, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding="SAME", name=name)

    def max_pool(self, bottom, name):
        return tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding="SAME", name=name)

    def conv_layer(self, bottom, name):
        with tf.variable_scope(name):
            filt = self.get_conv_filter(name)
            conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding="SAME")
            conv_biases = self.get_bias(name)
            bias = tf.nn.bias_add(conv, conv_biases)
            relu = tf.nn.relu(bias)
            return relu

    def fc_layer(self, bottom, name):
        with tf.variable_scope(name):
            shape = bottom.get_shape().as_list()
            dim = 1
            for d in shape[1:]:
                dim *= d
            x = tf.reshape(bottom, [-1, dim])
            weights = self.get_fc_weight(name)
            biases = self.get_bias(name)
            # Fully connected layer. Note that the "+" operation automatically
            # broadcasts the biases.
            fc = tf.nn.bias_add(tf.matmul(x, weights), biases)
            return fc

    def get_conv_filter(self, name):
        return tf.constant(self.data_dict[name][0], name="filter")

    def get_bias(self, name):
        return tf.constant(self.data_dict[name][1], name="biases")

    def get_fc_weight(self, name):
        return tf.constant(self.data_dict[name][0], name="weights")
```
GoogleNet:

```python
from keras.models import Model
from keras.utils import plot_model
from keras import regularizers
from keras import backend as K
from keras.layers import Input, Flatten, Dense, Dropout, BatchNormalization, concatenate
from keras.layers.convolutional import Conv2D, MaxPooling2D, AveragePooling2D

# Global constants
NB_CLASS = 20
LEARNING_RATE = 0.01
MOMENTUM = 0.9
ALPHA = 0.0001
BETA = 0.75
GAMMA = 0.1
DROPOUT = 0.4
WEIGHT_DECAY = 0.0005
LRN2D_NORM = True
DATA_FORMAT = "channels_last"  # Theano: "channels_first"; Tensorflow: "channels_last"
USE_BN = True
IM_WIDTH = 224
IM_HEIGHT = 224
EPOCH = 50


def conv2D_lrn2d(x, filters, kernel_size, strides=(1, 1), padding="same",
                 dilation_rate=(1, 1), activation="relu", use_bias=True,
                 kernel_initializer="glorot_uniform", bias_initializer="zeros",
                 kernel_regularizer=None, bias_regularizer=None,
                 activity_regularizer=None, kernel_constraint=None,
                 bias_constraint=None, lrn2d_norm=LRN2D_NORM,
                 weight_decay=WEIGHT_DECAY):
    # l2 regularization
    if weight_decay:
        kernel_regularizer = regularizers.l2(weight_decay)
        bias_regularizer = regularizers.l2(weight_decay)
    else:
        kernel_regularizer = None
        bias_regularizer = None

    x = Conv2D(filters=filters, kernel_size=kernel_size, strides=strides,
               padding=padding, dilation_rate=dilation_rate,
               activation=activation, use_bias=use_bias,
               kernel_initializer=kernel_initializer,
               bias_initializer=bias_initializer,
               kernel_regularizer=kernel_regularizer,
               bias_regularizer=bias_regularizer,
               activity_regularizer=activity_regularizer,
               kernel_constraint=kernel_constraint,
               bias_constraint=bias_constraint)(x)
    if lrn2d_norm:
        # batch normalization (used here in place of the original LRN)
        x = BatchNormalization()(x)
    return x


def inception_module(x, params, concat_axis, padding="same", dilation_rate=(1, 1),
                     activation="relu", use_bias=True,
                     kernel_initializer="glorot_uniform", bias_initializer="zeros",
                     kernel_regularizer=None, bias_regularizer=None,
                     activity_regularizer=None, kernel_constraint=None,
                     bias_constraint=None, lrn2d_norm=LRN2D_NORM, weight_decay=None):
    (branch1, branch2, branch3, branch4) = params
    if weight_decay:
        kernel_regularizer = regularizers.l2(weight_decay)
        bias_regularizer = regularizers.l2(weight_decay)
    else:
        kernel_regularizer = None
        bias_regularizer = None

    # 1x1 pathway
    pathway1 = Conv2D(filters=branch1[0], kernel_size=(1, 1), strides=1,
                      padding=padding, dilation_rate=dilation_rate,
                      activation=activation, use_bias=use_bias,
                      kernel_initializer=kernel_initializer,
                      bias_initializer=bias_initializer,
                      kernel_regularizer=kernel_regularizer,
                      bias_regularizer=bias_regularizer,
                      activity_regularizer=activity_regularizer,
                      kernel_constraint=kernel_constraint,
                      bias_constraint=bias_constraint)(x)

    # 1x1 -> 3x3 pathway
    pathway2 = Conv2D(filters=branch2[0], kernel_size=(1, 1), strides=1,
                      padding=padding, dilation_rate=dilation_rate,
                      activation=activation, use_bias=use_bias,
                      kernel_initializer=kernel_initializer,
                      bias_initializer=bias_initializer,
                      kernel_regularizer=kernel_regularizer,
                      bias_regularizer=bias_regularizer,
                      activity_regularizer=activity_regularizer,
                      kernel_constraint=kernel_constraint,
                      bias_constraint=bias_constraint)(x)
    pathway2 = Conv2D(filters=branch2[1], kernel_size=(3, 3), strides=1,
                      padding=padding, dilation_rate=dilation_rate,
                      activation=activation, use_bias=use_bias,
                      kernel_initializer=kernel_initializer,
                      bias_initializer=bias_initializer,
                      kernel_regularizer=kernel_regularizer,
                      bias_regularizer=bias_regularizer,
                      activity_regularizer=activity_regularizer,
                      kernel_constraint=kernel_constraint,
                      bias_constraint=bias_constraint)(pathway2)

    # 1x1 -> 5x5 pathway
    pathway3 = Conv2D(filters=branch3[0], kernel_size=(1, 1), strides=1,
                      padding=padding, dilation_rate=dilation_rate,
                      activation=activation, use_bias=use_bias,
                      kernel_initializer=kernel_initializer,
                      bias_initializer=bias_initializer,
                      kernel_regularizer=kernel_regularizer,
                      bias_regularizer=bias_regularizer,
                      activity_regularizer=activity_regularizer,
                      kernel_constraint=kernel_constraint,
                      bias_constraint=bias_constraint)(x)
    pathway3 = Conv2D(filters=branch3[1], kernel_size=(5, 5), strides=1,
                      padding=padding, dilation_rate=dilation_rate,
                      activation=activation, use_bias=use_bias,
                      kernel_initializer=kernel_initializer,
                      bias_initializer=bias_initializer,
                      kernel_regularizer=kernel_regularizer,
                      bias_regularizer=bias_regularizer,
                      activity_regularizer=activity_regularizer,
                      kernel_constraint=kernel_constraint,
                      bias_constraint=bias_constraint)(pathway3)

    # 3x3 pool -> 1x1 pathway
    pathway4 = MaxPooling2D(pool_size=(3, 3), strides=1, padding=padding,
                            data_format=DATA_FORMAT)(x)
    pathway4 = Conv2D(filters=branch4[0], kernel_size=(1, 1), strides=1,
                      padding=padding, dilation_rate=dilation_rate,
                      activation=activation, use_bias=use_bias,
                      kernel_initializer=kernel_initializer,
                      bias_initializer=bias_initializer,
                      kernel_regularizer=kernel_regularizer,
                      bias_regularizer=bias_regularizer,
                      activity_regularizer=activity_regularizer,
                      kernel_constraint=kernel_constraint,
                      bias_constraint=bias_constraint)(pathway4)

    return concatenate([pathway1, pathway2, pathway3, pathway4], axis=concat_axis)


class GoogleNet:
    @staticmethod
    def build(width, height, depth, NB_CLASS):
        INP_SHAPE = (height, width, depth)
        img_input = Input(shape=INP_SHAPE)
        CONCAT_AXIS = 3
        # Data format: tensorflow uses channels_last; theano uses channels_first
        if K.image_data_format() == "channels_first":
            INP_SHAPE = (depth, height, width)
            img_input = Input(shape=INP_SHAPE)
            CONCAT_AXIS = 1

        x = conv2D_lrn2d(img_input, 64, (7, 7), 2, padding="same", lrn2d_norm=False)
        x = MaxPooling2D(pool_size=(2, 2), strides=2, padding="same")(x)
        x = BatchNormalization()(x)

        x = conv2D_lrn2d(x, 64, (1, 1), 1, padding="same", lrn2d_norm=False)
        x = conv2D_lrn2d(x, 192, (3, 3), 1, padding="same", lrn2d_norm=True)
        x = MaxPooling2D(pool_size=(2, 2), strides=2, padding="same")(x)

        x = inception_module(x, params=[(64,), (96, 128), (16, 32), (32,)],
                             concat_axis=CONCAT_AXIS)  # 3a
        x = inception_module(x, params=[(128,), (128, 192), (32, 96), (64,)],
                             concat_axis=CONCAT_AXIS)  # 3b
        x = MaxPooling2D(pool_size=(2, 2), strides=2, padding="same")(x)

        x = inception_module(x, params=[(192,), (96, 208), (16, 48), (64,)],
                             concat_axis=CONCAT_AXIS)  # 4a
        x = inception_module(x, params=[(160,), (112, 224), (24, 64), (64,)],
                             concat_axis=CONCAT_AXIS)  # 4b
        x = inception_module(x, params=[(128,), (128, 256), (24, 64), (64,)],
                             concat_axis=CONCAT_AXIS)  # 4c
        x = inception_module(x, params=[(112,), (144, 288), (32, 64), (64,)],
                             concat_axis=CONCAT_AXIS)  # 4d
        x = inception_module(x, params=[(256,), (160, 320), (32, 128), (128,)],
                             concat_axis=CONCAT_AXIS)  # 4e
        x = MaxPooling2D(pool_size=(2, 2), strides=2, padding="same")(x)

        x = inception_module(x, params=[(256,), (160, 320), (32, 128), (128,)],
                             concat_axis=CONCAT_AXIS)  # 5a
        x = inception_module(x, params=[(384,), (192, 384), (48, 128), (128,)],
                             concat_axis=CONCAT_AXIS)  # 5b
        x = AveragePooling2D(pool_size=(1, 1), strides=1, padding="valid")(x)

        x = Flatten()(x)
        x = Dropout(DROPOUT)(x)
        x = Dense(units=NB_CLASS, activation="linear")(x)
        x = Dense(units=NB_CLASS, activation="softmax")(x)

        # Create a Keras Model
        model = Model(inputs=img_input, outputs=[x])
        model.summary()
        # Save a PNG of the model architecture
        # plot_model(model, to_file="../imgs/GoogLeNet.png")

        # return the constructed network architecture
        return model
```
Set the optimizer and loss function, then compile the model:

```python
model = l_model(width=norm_size, height=norm_size, depth=1, NB_CLASS=CLASS_NUM)
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])
```
This article feeds batches through a data generator to save memory:

```python
# use a data generator to save memory
aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
                         height_shift_range=0.1, shear_range=0.2,
                         zoom_range=0.2,
                         horizontal_flip=True,  # note: flips are usually undesirable for digits
                         fill_mode="nearest")
H = model.fit_generator(aug.flow(x_train, y_train, batch_size=BS),
                        steps_per_epoch=len(x_train) // BS,
                        epochs=EPOCHS, verbose=2)
```
Analyzing the results

Plot the loss and accuracy curves for the training phase. With EPOCHS set to 10, the model reaches about 0.98 accuracy (code and figure below).

```python
# plot the training loss and accuracy
N = EPOCHS
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="loss")
plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
plt.title("Training Loss and Accuracy on mnist-img classifier")
plt.xlabel("Epoch")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig("../figure/Figure_2.png")
```
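The overview promised saving and loading the model, which the code so far has not shown. Below is a minimal, self-contained sketch of Keras's `model.save` and `load_model` (already imported above). The file name `lenet_mnist.h5` is arbitrary, a tiny stand-in model replaces the trained LeNet for brevity, and saving to HDF5 requires the h5py package; in the tutorial you would call `save` on the `model` trained above and predict on `x_test`.

```python
import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Dense

# tiny stand-in model; in the tutorial this would be the trained LeNet
model = Sequential([Dense(10, activation="softmax", input_shape=(784,))])
model.compile(loss="categorical_crossentropy", optimizer="adam")

model.save("lenet_mnist.h5")           # architecture + weights + optimizer state
restored = load_model("lenet_mnist.h5")

x = np.random.rand(3, 784).astype("float32")
# the restored model reproduces the original's predictions
assert np.allclose(model.predict(x), restored.predict(x))
pred_labels = np.argmax(restored.predict(x), axis=1)  # most likely class per sample
```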
WeChat official account: 帕帕科技喵
Follows and discussion are welcome!