Neural Networks II. Chen Gao. Virginia Tech Spring 2019 ECE-5424G / CS-5824


Neural Networks Origins: Algorithms that try to mimic the brain. What is this?

A single neuron in the brain. [Figure: a biological neuron, with its input and output labeled.] Slide credit: Andrew Ng

An artificial neuron: Logistic unit. Input $x = [x_0, x_1, x_2, x_3]^T$, where $x_0 = 1$ is the bias unit; weights (parameters) $\theta = [\theta_0, \theta_1, \theta_2, \theta_3]^T$. Output: $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$, the sigmoid (logistic) activation function. Slide credit: Andrew Ng
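
As a concrete illustration, here is a minimal NumPy sketch of this logistic unit (the names `logistic_unit`, `theta`, and `x` are our own, not from the slides):

```python
import numpy as np

def logistic_unit(theta, x):
    """Single artificial neuron: sigmoid of the weighted input.

    theta: weight vector, including the bias weight theta_0.
    x:     input vector, including the bias unit x_0 = 1.
    """
    z = theta @ x                        # pre-activation: theta^T x
    return 1.0 / (1.0 + np.exp(-z))     # sigmoid squashes z into (0, 1)

# Example: 3 features plus a bias unit.
x = np.array([1.0, 0.5, -1.2, 3.0])        # x_0 = 1 is the bias unit
theta = np.array([0.1, -0.4, 0.25, 0.08])
print(logistic_unit(theta, x))             # a value in (0, 1)
```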

Visualization of weights, bias, activation function: the range of the unit's output is determined by g(·); the bias b only changes the position of the hyperplane. Slide credit: Hugo Larochelle

Activation - sigmoid. Squashes the neuron's pre-activation between 0 and 1. Always positive. Bounded. Strictly increasing. $g(x) = \frac{1}{1 + e^{-x}}$ Slide credit: Hugo Larochelle

Activation - hyperbolic tangent (tanh). Squashes the neuron's pre-activation between -1 and 1. Can be positive or negative. Bounded. Strictly increasing. $g(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ Slide credit: Hugo Larochelle

Activation - rectified linear (ReLU). Bounded below by 0 (always non-negative). Not upper bounded. Tends to give neurons with sparse activities. $g(x) = \mathrm{relu}(x) = \max(0, x)$ Slide credit: Hugo Larochelle

Activation - softmax. For multi-class classification: we need multiple outputs (1 output per class), and we would like to estimate the conditional probability $p(y = c \mid x)$. We use the softmax activation function at the output: $g(x) = \mathrm{softmax}(x) = \left[\frac{e^{x_1}}{\sum_c e^{x_c}}, \ldots, \frac{e^{x_C}}{\sum_c e^{x_c}}\right]^T$ Slide credit: Hugo Larochelle
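
The four activations above are one-liners in NumPy. The following sketch is our own (not from the slides); it includes the usual max-subtraction trick for a numerically stable softmax:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # output in (0, 1)

def tanh(x):
    return np.tanh(x)                    # output in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)            # non-negative, sparse

def softmax(x):
    e = np.exp(x - np.max(x))            # subtract max for numerical stability
    return e / e.sum()                   # entries sum to 1: a distribution over classes

print(softmax(np.array([2.0, 1.0, 0.1])))  # e.g. [0.659 0.242 0.099]
```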

Universal approximation theorem: "a single hidden layer neural network with a linear output unit can approximate any continuous function arbitrarily well, given enough hidden units" (Hornik, 1991). Slide credit: Hugo Larochelle

Neural network: multilayer. [Figure: inputs $x_0, x_1, x_2, x_3$ form Layer 1; hidden units $a_0, a_1, a_2, a_3$ form Layer 2 (hidden); the output $h_\Theta(x)$ is Layer 3.] Slide credit: Andrew Ng

Neural network. Notation: $a_i^{(j)}$ = activation of unit $i$ in layer $j$; $\Theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j+1$, with $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$.
$a_1^{(2)} = g(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3)$
$a_2^{(2)} = g(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3)$
$a_3^{(2)} = g(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3)$
$h_\Theta(x) = g(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)})$
Size of $\Theta^{(j)}$? $s_{j+1} \times (s_j + 1)$. Slide credit: Andrew Ng

Neural network with pre-activations. Write $z_i^{(2)}$ for the weighted sum feeding unit $i$ of layer 2, so that:
$a_1^{(2)} = g(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3) = g(z_1^{(2)})$
$a_2^{(2)} = g(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3) = g(z_2^{(2)})$
$a_3^{(2)} = g(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3) = g(z_3^{(2)})$
$h_\Theta(x) = g(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)}) = g(z^{(3)})$
Why do we need g(·)? Without a nonlinear activation, the composition of layers collapses to a single linear map, no more expressive than a one-layer model. Slide credit: Andrew Ng

Neural network, vectorized. With $x = a^{(1)}$ and pre-activation vector $z^{(2)} = [z_1^{(2)}, z_2^{(2)}, z_3^{(2)}]^T$:
$z^{(2)} = \Theta^{(1)} x = \Theta^{(1)} a^{(1)}$
$a^{(2)} = g(z^{(2)})$ (add $a_0^{(2)} = 1$)
$z^{(3)} = \Theta^{(2)} a^{(2)}$
$h_\Theta(x) = a^{(3)} = g(z^{(3)})$
Slide credit: Andrew Ng

Flow graph - Forward propagation: $x \to z^{(2)} \to a^{(2)} \to z^{(3)} \to a^{(3)} = h_\Theta(x)$, with parameters $(W^{(1)}, b^{(1)})$ and $(W^{(2)}, b^{(2)})$ on the two edges. The computation is exactly the vectorized recurrence above. How do we evaluate our prediction?
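
A minimal NumPy sketch of this forward pass (our own illustration; the weight shapes follow the $s_{j+1} \times (s_j + 1)$ convention from the earlier slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(Thetas, x):
    """Forward propagation through a fully connected network.

    Thetas: list of weight matrices, Thetas[l] of shape (s_{l+1}, s_l + 1).
    x:      input feature vector (without the bias unit).
    Returns the activations of every layer, bias units included.
    """
    a = np.insert(x, 0, 1.0)             # a^(1): prepend bias unit a_0 = 1
    activations = [a]
    for l, Theta in enumerate(Thetas):
        z = Theta @ a                    # pre-activation z^(l+1)
        a = sigmoid(z)                   # activation a^(l+1) = g(z^(l+1))
        if l < len(Thetas) - 1:
            a = np.insert(a, 0, 1.0)     # add bias unit for hidden layers
        activations.append(a)
    return activations                   # last entry is h_Theta(x)

# Example: 3 inputs -> 3 hidden units -> 1 output.
rng = np.random.default_rng(0)
Thetas = [rng.normal(size=(3, 4)), rng.normal(size=(1, 4))]
print(forward(Thetas, np.array([0.5, -1.2, 3.0]))[-1])
```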

Cost function. Logistic regression: $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$. Neural network (with $K$ output units): $J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log (h_\Theta(x^{(i)}))_k + (1 - y_k^{(i)}) \log(1 - (h_\Theta(x^{(i)}))_k) \right] + \frac{\lambda}{2m} \sum_{l} \sum_{i} \sum_{j} (\Theta_{ji}^{(l)})^2$. Slide credit: Andrew Ng
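
For reference, a direct NumPy transcription of the unregularized part of the neural-network cost (a sketch of our own; `H` and `Y` are our names for stacked predictions and one-hot labels):

```python
import numpy as np

def nn_cost(H, Y):
    """Cross-entropy cost, summed over K output units, averaged over m examples.

    H: (m, K) array of network outputs h_Theta(x^(i)), each entry in (0, 1).
    Y: (m, K) array of one-hot labels.
    """
    m = Y.shape[0]
    eps = 1e-12                          # avoid log(0)
    return -np.sum(Y * np.log(H + eps) + (1 - Y) * np.log(1 - H + eps)) / m
```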

Gradient computation. Need to compute: the cost $J(\Theta)$ and its partial derivatives $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$. Slide credit: Andrew Ng

Gradient computation. Given one training example $(x, y)$, forward propagation through a 4-layer network:
$a^{(1)} = x$
$z^{(2)} = \Theta^{(1)} a^{(1)}$; $a^{(2)} = g(z^{(2)})$ (add $a_0^{(2)}$)
$z^{(3)} = \Theta^{(2)} a^{(2)}$; $a^{(3)} = g(z^{(3)})$ (add $a_0^{(3)}$)
$z^{(4)} = \Theta^{(3)} a^{(3)}$; $a^{(4)} = g(z^{(4)}) = h_\Theta(x)$
Slide credit: Andrew Ng

Gradient computation: Backpropagation. Intuition: $\delta_j^{(l)}$ = "error" of node $j$ in layer $l$. For each output unit (layer $L = 4$): $\delta^{(4)} = a^{(4)} - y$. Moving backwards with the chain rule,
$\delta^{(3)} = \frac{\partial J}{\partial z^{(3)}} = \frac{\partial J}{\partial z^{(4)}} \cdot \frac{\partial z^{(4)}}{\partial a^{(3)}} \cdot \frac{\partial a^{(3)}}{\partial z^{(3)}} = (\Theta^{(3)})^T \delta^{(4)} \odot g'(z^{(3)})$
(where $\odot$ is the elementwise product), using $z^{(4)} = \Theta^{(3)} a^{(3)}$ and $a^{(3)} = g(z^{(3)})$. Slide credit: Andrew Ng

Backpropagation algorithm. Training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$. Set the gradient accumulators $\Delta^{(l)} = 0$ for all $l$. For $i = 1$ to $m$: set $a^{(1)} = x^{(i)}$; perform forward propagation to compute $a^{(l)}$ for $l = 2, \ldots, L$; use $y^{(i)}$ to compute $\delta^{(L)} = a^{(L)} - y^{(i)}$; compute $\delta^{(L-1)}, \delta^{(L-2)}, \ldots, \delta^{(2)}$; accumulate $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$. Slide credit: Andrew Ng
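
Here is a compact NumPy sketch of one pass of this algorithm for the sigmoid network above (our own illustration: single example, no regularization):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_one_example(Thetas, x, y):
    """Gradients of the cross-entropy cost w.r.t. each Theta^(l), one example."""
    # Forward pass, keeping activations and pre-activations.
    a = np.insert(x, 0, 1.0)
    As, Zs = [a], []
    for l, Theta in enumerate(Thetas):
        z = Theta @ a
        a = sigmoid(z)
        if l < len(Thetas) - 1:
            a = np.insert(a, 0, 1.0)     # bias unit for hidden layers
        Zs.append(z); As.append(a)

    # Backward pass: delta^(L) = a^(L) - y for sigmoid output + cross-entropy.
    grads = [np.zeros_like(T) for T in Thetas]
    delta = As[-1] - y
    for l in range(len(Thetas) - 1, -1, -1):
        grads[l] = np.outer(delta, As[l])           # Delta^(l) += delta^(l+1) (a^(l))^T
        if l > 0:
            da = Thetas[l].T @ delta                # back through the weights
            da = da[1:]                             # drop the bias component
            g = sigmoid(Zs[l - 1])
            delta = da * g * (1 - g)                # elementwise product with g'(z)
    return grads
```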

Activation - sigmoid. Partial derivative: $g'(x) = g(x)(1 - g(x))$, where $g(x) = \frac{1}{1 + e^{-x}}$. Slide credit: Hugo Larochelle

Activation - hyperbolic tangent (tanh). Partial derivative: $g'(x) = 1 - g(x)^2$, where $g(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$. Slide credit: Hugo Larochelle

Activation - rectified linear (ReLU). Partial derivative: $g'(x) = \mathbf{1}_{[x > 0]}$, where $g(x) = \mathrm{relu}(x) = \max(0, x)$. Slide credit: Hugo Larochelle
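
These three derivative formulas are easy to sanity-check numerically against a finite difference; a small sketch of our own:

```python
import numpy as np

def check_grad(g, g_prime, x, h=1e-6):
    """Compare an analytic derivative against a central finite difference."""
    numeric = (g(x + h) - g(x - h)) / (2 * h)
    return abs(numeric - g_prime(x))

sig   = lambda x: 1.0 / (1.0 + np.exp(-x))
dsig  = lambda x: sig(x) * (1 - sig(x))        # g'(x) = g(x)(1 - g(x))
dtanh = lambda x: 1 - np.tanh(x) ** 2          # g'(x) = 1 - g(x)^2
drelu = lambda x: float(x > 0)                 # g'(x) = 1[x > 0]

print(check_grad(sig, dsig, 0.3))                       # ~1e-11 or smaller
print(check_grad(np.tanh, dtanh, -0.7))
print(check_grad(lambda x: max(0.0, x), drelu, 0.5))    # away from the kink at 0
```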

Initialization. For biases: initialize all to 0. For weights: can't initialize all weights to the same value; we can show that all hidden units in a layer would then always behave the same, so we need to break symmetry. Recipe: sample from a uniform distribution U[-b, b]; the idea is to sample around 0 but break symmetry. Slide credit: Hugo Larochelle
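
A minimal sketch of this recipe (`init_layer`, `n_in`, `n_out`, and the bound `b=0.1` are our own illustrative choices, not from the slide):

```python
import numpy as np

def init_layer(n_out, n_in, b=0.1, seed=0):
    """Symmetry-breaking initialization for one layer.

    Weights are sampled from U[-b, b]: centered at 0 but all different,
    so hidden units do not behave identically. Biases all start at 0.
    (Common refinements scale b with the layer sizes, e.g. Glorot-style
    b = sqrt(6 / (n_in + n_out)); that detail is not shown here.)
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(-b, b, size=(n_out, n_in))  # break symmetry
    bias = np.zeros(n_out)                      # biases can all start at 0
    return W, bias

W, bias = init_layer(3, 4)
```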

Putting it together. Pick a network architecture. Number of input units: dimension of the features. Number of output units: number of classes. Reasonable default: 1 hidden layer; or, if more than 1 hidden layer, use the same number of hidden units in every layer (usually, the more the better). Choose the remaining hyperparameters by grid search. Slide credit: Hugo Larochelle

Putting it together. Early stopping: use validation-set performance to select the best configuration. To select the number of epochs, stop training when the validation-set error starts to increase. Slide credit: Hugo Larochelle
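
In code, early stopping amounts to a loop of this shape (a sketch; `train_one_epoch` and `validation_error` are hypothetical placeholders for your own training and evaluation routines):

```python
def fit_with_early_stopping(model, train_one_epoch, validation_error,
                            patience=5, max_epochs=200):
    """Stop when validation error has not improved for `patience` epochs."""
    best_err, best_epoch, waited = float("inf"), 0, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        err = validation_error(model)
        if err < best_err:
            best_err, best_epoch, waited = err, epoch, 0   # new best: keep going
        else:
            waited += 1                                    # validation error went up
            if waited >= patience:
                break                                      # stop: likely overfitting
    return best_epoch, best_err
```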

Other tricks of the trade. Normalize your (real-valued) data. Decay the learning rate: as we get closer to the optimum, it makes sense to take smaller update steps. Mini-batches can give a more accurate estimate of the risk gradient than single examples. Momentum: use an exponential average of the previous gradients. Slide credit: Hugo Larochelle
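
The momentum and learning-rate-decay tricks, written out (a sketch under the usual conventions; `grad` stands for whatever gradient your backprop pass returns):

```python
import numpy as np

def sgd_momentum_step(theta, grad, velocity, lr=0.01, beta=0.9):
    """One SGD update with momentum.

    velocity accumulates an exponential average of past gradients,
    smoothing the update direction across mini-batches.
    """
    velocity = beta * velocity + (1 - beta) * grad
    theta = theta - lr * velocity
    return theta, velocity

def decayed_lr(lr0, epoch, decay=0.01):
    """Shrink the step size as training progresses (one common schedule)."""
    return lr0 / (1.0 + decay * epoch)
```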

Dropout. Idea: "cripple" the neural network by removing hidden units: each hidden unit is set to 0 with probability 0.5. Hidden units then cannot co-adapt to other units, so they must be more generally useful. Slide credit: Hugo Larochelle
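
As a final sketch, applying dropout to a layer's activations during training (the "inverted dropout" rescaling is a common implementation choice we add here; it is not spelled out on the slide):

```python
import numpy as np

def dropout(a, p=0.5, training=True, rng=None):
    """Set each hidden activation to 0 with probability p during training.

    Uses inverted dropout: surviving units are scaled by 1/(1-p) so that
    expected activations match at test time (no-op when not training).
    """
    if not training:
        return a
    rng = rng or np.random.default_rng()
    mask = rng.random(a.shape) >= p      # keep each unit with probability 1-p
    return a * mask / (1.0 - p)

a = np.array([0.2, 0.9, 0.5, 0.7])
print(dropout(a, p=0.5, rng=np.random.default_rng(0)))
```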