Deep Learning Tutorial 1.0 (Deep Learning Tutorial.pdf)


Deep Learning Tutorial
Release 0.1
LISA lab, University of Montreal
September 01, 2015

CONTENTS

1 LICENSE
2 Deep Learning Tutorials
3 Getting Started
   3.1 Download
   3.2 Datasets
   3.3 Notation
   3.4 A Primer on Supervised Optimization for Deep Learning
   3.5 Theano/Python Tips
4 Classifying MNIST digits using Logistic Regression
   4.1 The Model
   4.2 Defining a Loss Function
   4.3 Creating a LogisticRegression class
   4.4 Learning the Model
   4.5 Testing the model
   4.6 Putting it All Together
   4.7 Prediction Using a Trained Model
5 Multilayer Perceptron
   5.1 The Model
   5.2 Going from logistic regression to MLP
   5.3 Putting it All Together
   5.4 Tips and Tricks for training MLPs
6 Convolutional Neural Networks (LeNet)
   6.1 Motivation
   6.2 Sparse Connectivity
   6.3 Shared Weights
   6.4 Details and Notation
   6.5 The Convolution Operator
   6.6 MaxPooling
   6.7 The Full Model: LeNet
   6.8 Putting it All Together
   6.9 Running the Code
   6.10 Tips and Tricks
7 Denoising Autoencoders (dA)
   7.1 Autoencoders
   7.2 Denoising Autoencoders
   7.3 Putting it All Together
   7.4 Running the Code
8 Stacked Denoising Autoencoders (SdA)
   8.1 Stacked Autoencoders
   8.2 Putting it all together
   8.3 Running the Code
   8.4 Tips and Tricks
9 Restricted Boltzmann Machines (RBM)
   9.1 Energy-Based Models (EBM)
   9.2 Restricted Boltzmann Machines (RBM)
   9.3 Sampling in an RBM
   9.4 Implementation
   9.5 Results
10 Deep Belief Networks
   10.1 Deep Belief Networks
   10.2 Justifying Greedy-Layer Wise Pre-Training
   10.3 Implementation
   10.4 Putting it all together
   10.5 Running the Code
   10.6 Tips and Tricks
11 Hybrid Monte-Carlo Sampling
   11.1 Theory
   11.2 Implementing HMC Using Theano
   11.3 Testing our Sampler
   11.4 References
12 Recurrent Neural Networks with Word Embeddings
   12.1 Summary
   12.2 Code - Citations - Contact
   12.3 Task
   12.4 Dataset
   12.5 Recurrent Neural Network Model
   12.6 Evaluation
   12.7 Training
   12.8 Running the Code
13 LSTM Networks for Sentiment Analysis
   13.1 Summary
   13.2 Data
   13.3 Model
   13.4 Code - Citations - Contact
   13.5 References
14 Modeling and generating sequences of polyphonic music with the RNN-RBM
   14.1 The RNN-RBM
   14.2 Implementation
   14.3 Results
   14.4 How to improve this code
15 Miscellaneous
   15.1 Plotting Samples and Filters
16 References
Bibliography
Index

CHAPTER ONE
LICENSE

Copyright © 2008–2013, Theano Development Team. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of Theano nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

CHAPTER TWO
DEEP LEARNING TUTORIALS

Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. See these course notes for a brief introduction to Machine Learning for AI and an introduction to Deep Learning algorithms.

Deep Learning is about learning multiple levels of representation and abstraction that help to make sense of data such as images, sound, and text. For more about deep learning algorithms, see for example the monograph or review paper Learning Deep Architectures for AI (Foundations and Trends in Machine Learning, 2009).

[…]

3.3 Notation

[…] each data set $\mathcal{D}$ is an indexed set of pairs $(x^{(i)}, y^{(i)})$. We use superscripts to distinguish training set examples: $x^{(i)} \in \mathbb{R}^D$ is thus the $i$-th training example of dimensionality $D$. Similarly, $y^{(i)} \in \{0, \dots, L\}$ is the $i$-th label assigned to input $x^{(i)}$. It is straightforward to extend these examples to ones where $y^{(i)}$ has other types (e.g. Gaussian for regression, or groups of multinomials for predicting multiple symbols).

3.3.2 Math Conventions

- $W$: upper-case symbols refer to a matrix unless specified otherwise
- $W_{ij}$: element at the $i$-th row and $j$-th column of matrix $W$
- $W_{i \cdot}$, $W_i$: vector, $i$-th row of matrix $W$
- $W_{\cdot j}$: vector, $j$-th column of matrix $W$
- $b$: lower-case symbols refer to a vector unless specified otherwise
- $b_i$: $i$-th element of vector $b$

3.3.3 List of Symbols and acronyms

- $D$: number of input dimensions.
- $D_h^{(i)}$: number of hidden units in the $i$-th layer.
- $f_{\theta}(x)$, $f(x)$: classification function associated with a model $P(Y \mid x, \theta)$, defined as $\arg\max_k P(Y = k \mid x, \theta)$. Note that we will often drop the $\theta$ subscript.
- $L$: number of labels.
- $\mathcal{L}(\theta, \mathcal{D})$: log-likelihood of the model, defined by parameters $\theta$, on data set $\mathcal{D}$.
- $\ell(\theta, \mathcal{D})$: empirical loss of the prediction function $f$ parameterized by $\theta$ on data set $\mathcal{D}$.
- NLL: negative log-likelihood.
- $\theta$: set of all parameters for a given model.

3.3.4 Python Namespaces

Tutorial code often uses the following namespaces:

    import theano
    import theano.tensor as T
    import numpy

3.4 A Primer on Supervised Optimization for Deep Learning

What's exciting about Deep Learning is largely the use of unsupervised learning of deep networks. But supervised learning also plays an important role. The utility of unsupervised pre-training is often evaluated on the basis of what performance can be achieved after supervised fine-tuning. This chapter reviews the basics of supervised learning for classification models, and covers the minibatch stochastic gradient descent algorithm that is used to fine-tune many of the models in the Deep Learning Tutorials. Have a look at these introductory course notes on gradient-based learning for more basics on the notion of optimizing a training criterion using the gradient.

3.4.1 Learning a Classifier

Zero-One Loss

The models presented in these deep learning tutorials are mostly used for classification. The objective in training a classifier is to minimize the number of errors (zero-one loss) on unseen examples. If $f: \mathbb{R}^D \rightarrow \{0, \dots, L\}$ is the prediction function, then this loss can be written as:

$$\ell_{0,1} = \sum_{i=0}^{|\mathcal{D}|} I_{f(x^{(i)}) \neq y^{(i)}}$$

where either $\mathcal{D}$ is the training set (during training) or $\mathcal{D} \cap \mathcal{D}_{\text{train}} = \emptyset$ (to avoid biasing the evaluation of validation or test error). $I$ is the indicator function, defined as:

$$I_x = \begin{cases} 1 & \text{if } x \text{ is True} \\ 0 & \text{otherwise} \end{cases}$$

In this tutorial, $f$ is defined as:

$$f(x) = \arg\max_k P(Y = k \mid x, \theta)$$

In Python, using Theano, this can be written as:

    # zero_one_loss is a Theano variable representing a symbolic
    # expression of the zero-one loss; to get the actual value, this
    # symbolic expression has to be compiled into a Theano function
    # (see the Theano tutorial for more details)
    zero_one_loss = T.sum(T.neq(T.argmax(p_y_given_x), y))
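For concreteness, the compile step the comment refers to might look like the following minimal sketch. This is not from the tutorial itself: the toy inputs probs and labels are invented here, and the axis=1 argument assumes p_y_given_x stores one row of class probabilities per example.

    import numpy
    import theano
    import theano.tensor as T

    # Symbolic inputs: a matrix of class probabilities and a vector of true labels.
    p_y_given_x = T.matrix('p_y_given_x')  # shape: (n_examples, n_classes)
    y = T.ivector('y')

    # Zero-one loss: count the examples whose most probable class differs from y.
    zero_one_loss = T.sum(T.neq(T.argmax(p_y_given_x, axis=1), y))

    # Compiling the symbolic expression yields an ordinary callable.
    compute_loss = theano.function([p_y_given_x, y], zero_one_loss)

    probs = numpy.array([[0.1, 0.9],
                         [0.8, 0.2],
                         [0.4, 0.6]], dtype=theano.config.floatX)
    labels = numpy.array([1, 0, 0], dtype='int32')
    print(compute_loss(probs, labels))  # 1: only the third prediction is wrong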
Negative Log-Likelihood Loss

Since the zero-one loss is not differentiable, optimizing it for large models (thousands or millions of parameters) is prohibitively expensive computationally. We thus maximize the log-likelihood of our classifier given all the labels in a training set:

$$\mathcal{L}(\theta, \mathcal{D}) = \sum_{i=0}^{|\mathcal{D}|} \log P(Y = y^{(i)} \mid x^{(i)}, \theta)$$

The likelihood of the correct class is not the same as the number of right predictions, but from the point of view of a randomly initialized classifier they are pretty similar. Remember that likelihood and zero-one loss are different objectives; you should see that they are correlated on the validation set, but sometimes one will rise while the other falls, or vice versa.

Since we usually speak in terms of minimizing a loss function, learning will thus attempt to minimize the negative log-likelihood (NLL), defined as:

$$NLL(\theta, \mathcal{D}) = -\sum_{i=0}^{|\mathcal{D}|} \log P(Y = y^{(i)} \mid x^{(i)}, \theta)$$

The NLL of our classifier is a differentiable surrogate for the zero-one loss, and we use the gradient of this function over our training data as a supervised learning signal for deep learning of a classifier. This can be computed using the following line of code:

    # NLL is a symbolic variable; to get the actual value of NLL, this
    # symbolic expression has to be compiled into a Theano function
    # (see the Theano tutorial for more details)
    NLL = -T.sum(T.log(p_y_given_x)[T.arange(y.shape[0]), y])

Note on syntax: T.arange(y.shape[0]) is a vector of integers [0, 1, 2, ..., len(y)-1]. Indexing a matrix M by the two vectors [0, 1, ..., K] and [a, b, ..., k] returns the elements M[0, a], M[1, b], ..., M[K, k] as a vector. Here, we use this syntax to retrieve the log-probability of the correct labels, y.

3.4.2 Stochastic Gradient Descent

What is ordinary gradient descent? It is a simple algorithm in which we repeatedly make small steps downward on an error surface defined by a loss function of some parameters. For the purpose of ordinary gradient descent we consider that the training data is rolled into the loss function. Then the pseudocode of this algorithm can be described as: […]
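A minimal sketch of the idea (my own illustration, not the tutorial's pseudocode; grad_fn, data, and the step sizes are hypothetical placeholders) might look like this, contrasting ordinary gradient descent with the minibatch stochastic variant the chapter goes on to develop:

    import numpy

    def gradient_descent(params, grad_fn, learning_rate=0.1, n_steps=100):
        # Ordinary gradient descent: every step uses the gradient of the
        # loss over the entire training set, which grad_fn is assumed to
        # close over.
        for _ in range(n_steps):
            params = params - learning_rate * grad_fn(params)
        return params

    def minibatch_sgd(params, grad_fn, data, batch_size=20,
                      learning_rate=0.1, n_epochs=10):
        # Minibatch SGD: each step estimates the gradient from a small
        # slice of the data, trading per-step accuracy for many more,
        # much cheaper parameter updates.
        for _ in range(n_epochs):
            for start in range(0, len(data), batch_size):
                batch = data[start:start + batch_size]
                params = params - learning_rate * grad_fn(params, batch)
        return params

    # Toy usage: minimize loss(p) = (p - 3)^2, whose gradient is 2 * (p - 3).
    w = gradient_descent(numpy.float64(0.0), lambda p: 2.0 * (p - 3.0))
    print(w)  # converges toward 3.0

In the tutorial code proper, each minibatch update is typically performed by a compiled Theano function rather than by hand-written NumPy arithmetic.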
