Attacking and defending neural networks. HU Xiaolin (胡晓林), Department of Computer Science and Technology, Tsinghua University, Beijing, China


Outline: Background; Attacking methods; Defending methods

AI is booming. The US and China have huge AI plans; the United Kingdom plans a $1.3 billion AI push; France spends $1.8 billion on AI to compete with the US and China; the EU wants to invest 18 billion pounds in AI development.

Neural networks are powerful. Many powerful models: ResNet (He et al., 2016), Inception V1 (Szegedy et al., 2014), Inception V2 (Szegedy et al., 2016), Inception V3 (Szegedy et al., 2016), DenseNet (Huang et al., 2017). Most of them achieve higher accuracy than humans on the ImageNet classification task.

...but unreliable! (Figure: legitimate examples vs. adversarial examples crafted specifically to fool ResNet.)

Black-box attacks (transferability): cross-model transferability (figure: ResNet and Inception V3 both predicting "dog") and cross-data transferability.

Physically realizable attack (Sharif et al., CCS 2016): adversarial eyeglass frames fool a face recognition system (figure: Kaylee Defer, Nancy Travis).

Physical attack (Athalye et al., ICLR 2018).

Definitions. Given a classifier $f(x): x \in X \to y \in Y$, which maps the input sample $x$ to the label $y$. Add a small noise $\delta$ to $x$ to obtain $x^* = x + \delta$. If $f(x^*) \neq y$, then $x^*$ is called an adversarial example, and $x$ is called the normal (legitimate) sample. $\delta$ is called the adversarial perturbation; its magnitude is often limited, e.g. $\|\delta\|_p \leq \epsilon$, where $p$ can be 0, 1, 2 or $\infty$, and $\epsilon$ is a small constant. If $f(x^*) = y^*$, where $y^*$ is a specified label, the attack is called a targeted attack. If the target of $f(x^*)$ is not specified to any particular class, the attack is called an untargeted attack.

Definitions. Given a classifier $f(x): x \in X \to y \in Y$, which maps the input sample $x$ to the label $y$. Add a small noise $\delta$ to $x$ to obtain $x^* = x + \delta$. Attacking a certain model: perturb $x$ to $x^*$ and make the model misclassify $x^*$, i.e., $f(x^*) \neq y$. White-box attack: the model is known. Black-box attack: the model is unknown. Defending a certain model: make the model map $x^*$ to $y$.
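To make these definitions concrete, here is a minimal PyTorch sketch of when an attack counts as successful. Everything in it (the helper name is_adversarial, the default epsilon, the choice of the $L_\infty$ constraint, and the assumption that the model takes a batched image tensor) is illustrative rather than taken from the slides.

```python
import torch

def is_adversarial(model, x, delta, y_true, y_target=None, eps=8/255):
    """Check whether x + delta is an adversarial example for `model`.

    Untargeted attack: succeed if the prediction differs from the true label y.
    Targeted attack:   succeed if the prediction equals the specified label y*.
    The perturbation must satisfy ||delta||_inf <= eps.
    """
    if delta.abs().max() > eps:                 # magnitude constraint on delta
        return False
    pred = model(x + delta).argmax(dim=1).item()
    if y_target is None:                        # untargeted attack
        return pred != y_true
    return pred == y_target                     # targeted attack
```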

Outline: Background; Attacking methods; Defending methods; Interpretability; Summary

Principles for attacking a classification model. Recall: learning a neural network amounts to minimizing a loss function $L$ (often cross entropy + regularizer) w.r.t. the weights and biases, $\min_{w,b} L(x, y; w, b)$, where $(x, y)$ denotes the input and desired output pair. Attacking: maximizing the above loss function w.r.t. the input, $\max_{x^*} L(x^*, y; w, b)$ subject to $\|x^* - x\|_p < \epsilon$.

Attacking methods. One-step FGSM (exploits linearity of the decision boundary): $x^* = x + \epsilon \cdot \mathrm{sign}(\nabla_x L(x, y))$. Poor white-box attack ability, good black-box attack ability. Iterative FGSM (I-FGSM): $x^*_0 = x$, $x^*_{t+1} = \mathrm{clip}(x^*_t + \alpha \cdot \mathrm{sign}(\nabla_x L(x^*_t, y)))$. Optimization-based methods: $\min_{x^*} d(x, x^*) - L(x^*, y)$. Good white-box attack ability, poor black-box attack ability.
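A hedged PyTorch sketch of the two gradient-based updates above. Cross-entropy as the loss $L$, pixel values in [0, 1], and the function names fgsm / i_fgsm are assumptions for illustration, not the exact settings used in the talk.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-step FGSM: x* = x + eps * sign(grad_x L(x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def i_fgsm(model, x, y, eps, alpha, steps):
    """Iterative FGSM: x*_{t+1} = clip(x*_t + alpha * sign(grad_x L(x*_t, y)))."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        # project back into the eps-ball around x and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    return x_adv
```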

Optimization with momentum. Constrained optimization view of adversarial attacks: $\arg\max_{x^*} L(x^*, y)$ s.t. $\|x^* - x\|_\infty \leq \epsilon$. Momentum in optimization algorithms: accelerates gradient descent; helps escape poor local minima and maxima; stabilizes the update directions of stochastic gradient descent. Momentum can likewise be used for adversarial attacks: the resulting white-box and black-box attack abilities are both strong (good transferability). Dong, Liao, Pang, Su, Zhu, Hu, Li, Boosting Adversarial Attacks with Momentum, CVPR 2018.

Momentum Iterative FGSM. I-FGSM: $x^*_0 = x$, $x^*_{t+1} = \mathrm{clip}(x^*_t + \alpha \cdot \mathrm{sign}(\nabla_x L(x^*_t, y)))$. MI-FGSM: $x^*_0 = x$, $g_0 = 0$, $g_{t+1} = \mu \cdot g_t + \frac{\nabla_x L(x^*_t, y)}{\|\nabla_x L(x^*_t, y)\|_1}$, $x^*_{t+1} = \mathrm{clip}(x^*_t + \alpha \cdot \mathrm{sign}(g_{t+1}))$. Here $\mu$ is the decay factor; $g_t$ accumulates the gradients w.r.t. the input over the first $t$ iterations; the current gradient is normalized (by its $L_1$ norm).
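A sketch of the MI-FGSM update above in PyTorch; the cross-entropy loss, the [0, 1] pixel range, and per-example $L_1$ normalization over the image dimensions are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mi_fgsm(model, x, y, eps, alpha, steps, mu=1.0):
    """Momentum Iterative FGSM: accumulate normalized gradients with decay mu."""
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)                          # g_0 = 0
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # g_{t+1} = mu * g_t + grad / ||grad||_1  (normalize the current gradient)
        l1 = grad.abs().flatten(1).sum(dim=1).view(-1, 1, 1, 1)
        g = mu * g + grad / l1
        # x*_{t+1} = clip(x*_t + alpha * sign(g_{t+1}))
        x_adv = x_adv + alpha * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    return x_adv
```

Setting mu = 0 recovers plain I-FGSM, which matches how the slide presents the method as a modification of the iterative update.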

Non-targeted results: $\epsilon = 16$, $\mu = 1.0$, 10 iterations.

Attacking an ensemble of models. If an adversarial example remains adversarial for multiple models, it is more likely to be misclassified by other black-box models. Ensemble in logits: $l(x) = \sum_{i=1}^{K} w_i l_i(x)$, with the loss defined as $J(x, y) = -\mathbf{1}_y \cdot \log(\mathrm{softmax}(l(x)))$. Comparisons: ensemble in predictions, $p(x) = \sum_{i=1}^{K} w_i p_i(x)$; ensemble in loss, $J(x, y) = \sum_{i=1}^{K} w_i J_i(x, y)$.
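A short sketch of the three fusion strategies compared above, assuming a list of PyTorch models and weights $w_i$ that sum to one; the function names and signatures are illustrative.

```python
import torch
import torch.nn.functional as F

def loss_ensemble_logits(models, weights, x, y):
    """J(x, y) = -1_y . log softmax( sum_i w_i * l_i(x) )  -- fuse logits."""
    fused_logits = sum(w * m(x) for m, w in zip(models, weights))
    return F.cross_entropy(fused_logits, y)

def loss_ensemble_predictions(models, weights, x, y):
    """Comparison: fuse softmax predictions, p(x) = sum_i w_i * p_i(x)."""
    p = sum(w * F.softmax(m(x), dim=1) for m, w in zip(models, weights))
    return F.nll_loss(torch.log(p), y)

def loss_ensemble_losses(models, weights, x, y):
    """Comparison: fuse per-model losses, J(x, y) = sum_i w_i * J_i(x, y)."""
    return sum(w * F.cross_entropy(m(x), y) for m, w in zip(models, weights))
```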

Non-targeted results (2): a total of 7 models; in each row, an ensemble of 6 models is attacked and tested on the one model indicated in the first column.

NIPS 2017 Competition. Three tracks: Non-targeted Adversarial Attack (1st place), Targeted Adversarial Attack (1st place), Defense Against Adversarial Attack (1st place). Evaluation: given a dataset (5000 ImageNet-compatible images), $\mathrm{Score}_{\mathrm{attack}} = \sum_{\mathrm{defense}} \sum_{k=1}^{N} [\mathrm{defense}(\mathrm{attack}(x_k)) \neq y_k]$; $\mathrm{Score}_{\mathrm{target}} = \sum_{\mathrm{defense}} \sum_{k=1}^{N} [\mathrm{defense}(\mathrm{target}(x_k)) = y_k^{\mathrm{target}}]$; $\mathrm{Score}_{\mathrm{defense}} = \sum_{\mathrm{attack}} \sum_{k=1}^{N} [\mathrm{defense}(\mathrm{attack}(x_k)) = y_k]$. Requirement: $4 \leq \epsilon \leq 16$, based on the $L_\infty$ norm. Running time $\leq 500$ s for 100 images.
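A toy sketch of how the scores above could be computed, with attacks and defenses modeled as plain Python callables; all names and signatures are illustrative rather than the official evaluation code.

```python
def attack_score(attack, defenses, images, labels):
    """Non-targeted track: count examples that each defense misclassifies."""
    return sum(defense(attack(x)) != y
               for defense in defenses
               for x, y in zip(images, labels))

def targeted_attack_score(attack, defenses, images, target_labels):
    """Targeted track: count examples classified as the attacker's target label."""
    return sum(defense(attack(x)) == t
               for defense in defenses
               for x, t in zip(images, target_labels))

def defense_score(defense, attacks, images, labels):
    """Defense track: count adversarial examples still classified correctly."""
    return sum(defense(attack(x)) == y
               for attack in attacks
               for x, y in zip(images, labels))
```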

Outline: Background; Attacking methods; Defending methods; Interpretability; Summary

Defending methods. Obfuscated gradient methods: input transformations (e.g., JPEG compression) (Guo et al., 2018); thermometer encoding (Buckman et al., 2018); local intrinsic dimensionality (LID) (Ma et al., 2018); stochastic activation pruning (Dhillon et al., 2018); etc. Not powerful enough (Athalye et al., ICML 2018 best paper), and they cannot defend against black-box attacks well.

Defending against black-box attacks. Adversarial training (Kurakin, Goodfellow, Bengio, 2017): add adversarial examples into the training set; computationally expensive. Ensemble adversarial training (Tramèr et al., 2018): use more than one model for adversarial training; even more computationally expensive; worked best until our method, HGD (Liao et al., 2018).

Motivation for denoising. Misclassification: the adversarial image $x + \delta$ is mapped to a wrong label. After denoising: $x + \delta$ is first mapped to an estimated clean image $\hat{x}$, which is then classified as $y$ (figure: adversarial image → estimated clean image).

Neural networks for denoising: denoising autoencoder (DAE) and denoising U-Net (DUNET). Pixel Guided Denoiser (PGD): the denoiser estimates the noise $d\hat{x}$, outputs the denoised image $\hat{x} = x^* - d\hat{x}$, and is trained with the pixel-level L1 loss $\|\hat{x} - x\|_1$. Liao, Liang, Dong, Pang, Hu, Zhu, Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser, CVPR 2018.

Architecture of the DU-Net (figure).

Experiment setting. Attack methods: FGSM and I-FGSM. Training set: 30K source images. Test set for white-box attack: 10K source images. Test set for black-box attack: 10K source images.

Results. DAE is poor at reconstructing large images. DU-NET removes more noise under white-box attack, but its accuracy was lower than DAE's under white-box attack. Why?

Error amplification: the small perturbation remaining after pixel-level denoising is amplified as it propagates through the network layers (figure compares the responses to a clean image and an adversarial image). Let's construct a loss in higher layers!

Proposed schemes (training and testing pipelines). Naive: the CNN is applied directly to the adversarial image, for both training and testing. Pixel Guided Denoiser (PGD): train the denoiser D with an L1 loss between the denoised image and the clean image; at test time, pass the denoised adversarial image to the CNN. High-level representation Guided Denoiser (HGD): train D with a loss between the CNN features of the denoised image (Feat1) and of the clean image (Feat2); at test time, pass the denoised adversarial image to the CNN.
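A minimal PyTorch sketch contrasting the PGD and HGD training losses described above; denoiser and feature_extractor are assumed stand-ins for the DUNET and a fixed pretrained classifier (or one of its intermediate layers).

```python
import torch

def pgd_loss(denoiser, x_adv, x_clean):
    """Pixel Guided Denoiser: L1 distance between denoised and clean images."""
    x_den = denoiser(x_adv)
    return (x_den - x_clean).abs().mean()

def hgd_loss(denoiser, feature_extractor, x_adv, x_clean):
    """High-level representation Guided Denoiser: L1 distance between the CNN
    features of the denoised image (Feat1) and of the clean image (Feat2).
    The feature extractor stays fixed; only the denoiser is trained."""
    x_den = denoiser(x_adv)
    feat_den = feature_extractor(x_den)              # Feat1
    with torch.no_grad():
        feat_clean = feature_extractor(x_clean)      # Feat2
    return (feat_den - feat_clean).abs().mean()
```

Choosing an intermediate feature map, the logits, or the predicted class label as the guidance signal gives the FGD, LGD, and CGD variants listed on the next slide.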

Variants of HGD: feature guided denoiser (FGD), logits guided denoiser (LGD), class label guided denoiser (CGD).

Robustness of HGD (figure).

Transferability of HGD (figure): the denoiser D is trained with the feature loss (L1 between Feat1 and Feat2) computed by CNN1, but at test time it is placed in front of a different model CNN2; the two CNNs for training and testing can be different. The target model CNN2 is ResNet.


Summary and discussion. Current deep learning models are not robust; as DL becomes more widely used, the risk of adversarial attacks grows. One attacking method, Momentum I-FGSM, was presented. One defending method, the high-level representation guided denoiser (HGD), was presented. Defending techniques are not effective if they are known to the attacker; HGD can also be fooled. Humans are much more robust to adversarial examples, so brain-inspired computing is promising.

Open source. Momentum IFGSM: https://github.com/dongyp13/non-targeted-adversarial-Attacks and https://github.com/dongyp13/targeted-adversarial-Attack. HGD: https://github.com/lfz/guided-denoise

Q & A