Course 495: Advanced Statistical Machine Learning/Pattern Recognition


Course 495: Advanced Statistical Machine Learning/Pattern Recognition Lectures: Stefanos Zafeiriou Goal (Lectures): To present modern statistical machine learning/pattern recognition algorithms. The course focuses on statistical latent variable models (continuous & discrete). Goal (Tutorials): To provide students with the mathematical tools necessary for a deep understanding of the models. Main Material: Pattern Recognition & Machine Learning by C. Bishop, Chapters 1, 2, 12, 13, 8, 9. More material on the website: http://ibug.doc.ic.ac.uk/courses/advanced-statistical-machine-learning-495/ Course email: course495imperial@gmail.com

Statistical Machine Learning The two main concepts are: machine learning and statistics. Machine Learning: A branch of Artificial Intelligence (A.I.) that focuses on designing, developing and studying the properties of algorithms that learn from data. Statistics: A branch of applied mathematics that studies the collection, organization, analysis, interpretation, presentation and visualization of data, modelling their randomness and uncertainty using probability theory, as well as linear algebra and analysis.

Statistical Machine Learning Learning: Learning is at the core of the problem of Intelligence. Models of learning are used for understanding the function of the brain. Learning is used to develop modern intelligent machines. In many disciplines learning is now one of the main lines of research, e.g. signal/speech/(medical) image processing, computer vision, control/robotics, natural language processing, bioinformatics, etc. It is starting to dominate other domains such as software engineering, security sciences, finance, etc. Major players invest huge amounts of money in modern learning systems (e.g., Deep Learning at Google and Facebook). The next 25 years will be the age of machine learning.

Machine Learning in movies Terminator 2

Machine Learning in movies Movie: 2001: A Space Odyssey Director: Stanley Kubrick HAL 9000

Statistical Machine Learning Some applications of modern learning algorithms: face detection, object/face tracking, biometrics, speech/gesture recognition, image segmentation, finance, bioinformatics.

Applications Object & face detection (modern cameras)

Applications Face/Iris/Fingerprint recognition (biometrics)

Applications Object-target tracking

Applications Speech recognition (Google voice search): a waveform is mapped to the text "Hello world".

Applications Gesture recognition (Kinect games)

Applications Image Segmentation

Applications Biological data: discover genes associated with a disease. Brain data: "what are you thinking?"

What does the machine learn in each application? Face recognition: learn a classifier, i.e. a function mapping an image to an identity (the design of classifiers is covered in Course 424: Neural Computation). But images are very high-dimensional objects, so we need to reduce them!

Latent Variable Models An image of $m \times n$ pixels is vectorized into an $F$-dimensional observation with $F = m \cdot n$, and mapped to a latent representation of much lower dimension $d$, with $d \ll F$. The mapping can be described by a deterministic model or by a probabilistic model, each with its own parameters. The probabilistic version corresponds to the graphical model in which, for each of the $N$ data points, a latent variable $y_n$ generates the observation $x_n$ under parameters $\theta$ (plate notation).

Latent Variable Models Deterministic model: there is no randomness (uncertainty) associated with the model; we can compute the actual values of the latent space. Probabilistic model: we assign probability distributions to the latent variables and model their dependencies; we can compute only statistics of the latent variables. Probabilistic models are more flexible than deterministic ones. Generative probabilistic models: model the observations as drawn from a probability density function (pdf), i.e., model the way the data are generated.

Latent Variable Models Generative probabilistic models: model the complete (joint) likelihood of both the data and the latent structure. But what is the latent (hidden) structure? It is intuitive: for a face image, the latent (hidden) structure is {left eyebrow} {right eyebrow} {left eye} {right eye} {mouth} {tongue} {nose}.

Latent Variable Models Speech: the word "need" has the latent phoneme structure n iy d. Image: the latent structure separates object from background.

Latent Variable Models (Static) General concept: the observations $x_1, x_2, \dots, x_N$ share a common linear structure through corresponding latent variables $y_1, y_2, \dots, y_N$. We want to find the parameters $\theta$ by joint likelihood maximization:

$$p(x_1, \dots, x_N, y_1, \dots, y_N \mid \theta) = \prod_{i=1}^{N} p(x_i \mid y_i, W, \mu, \sigma) \prod_{i=1}^{N} p(y_i)$$
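To make the factorization concrete, here is a minimal Python sketch that evaluates this joint log-likelihood for a linear-Gaussian instance of the model ($x_i = W y_i + \mu + \text{noise}$, $y_i \sim N(0, I)$). The Gaussian choices, dimensions and numbers are illustrative assumptions, not the course's prescribed setup.

```python
import numpy as np
from scipy.stats import multivariate_normal

def joint_log_likelihood(X, Y, W, mu, sigma2):
    """Log of the factorized joint p(X, Y | theta) for a linear-Gaussian
    latent variable model: x_i = W y_i + mu + noise, y_i ~ N(0, I)."""
    F, d = W.shape
    ll = 0.0
    for x, y in zip(X, Y):
        # p(x_i | y_i, W, mu, sigma): Gaussian centred on the linear map
        ll += multivariate_normal.logpdf(x, mean=W @ y + mu, cov=sigma2 * np.eye(F))
        # p(y_i): standard Gaussian prior on the latent variable
        ll += multivariate_normal.logpdf(y, mean=np.zeros(d), cov=np.eye(d))
    return ll

# Toy example: N = 5 observations in F = 4 dimensions, latent dimension d = 2.
rng = np.random.default_rng(0)
W, mu, sigma2 = rng.standard_normal((4, 2)), np.zeros(4), 0.1
Y = rng.standard_normal((5, 2))
X = Y @ W.T + mu + np.sqrt(sigma2) * rng.standard_normal((5, 4))
print(joint_log_likelihood(X, Y, W, mu, sigma2))
```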


Latent Variable Models (Dynamic, Continuous) A chain of latent variables $y_1, y_2, \dots, y_T$ generates the observations $x_1, x_2, \dots, x_T$. Generative model:

$$x_n = W y_n + e_n, \qquad y_1 = \mu_0 + u, \qquad y_n = A y_{n-1} + v_n$$

Noise distributions: $e \sim N(0, \Sigma)$, $u \sim N(0, P_0)$, $v \sim N(0, \Gamma)$. Parameters: $\theta = \{W, A, \mu_0, \Sigma, \Gamma, P_0\}$.

Latent Variable Models (Dynamic, Continuous) Markov property: $p(y_i \mid y_1, \dots, y_{i-1}) = p(y_i \mid y_{i-1})$. Joint likelihood:

$$p(x_1, \dots, x_T, y_1, \dots, y_T) = p(y_1 \mid \mu_0, P_0) \prod_{i=1}^{T} p(x_i \mid y_i, W, \Sigma) \prod_{i=2}^{T} p(y_i \mid y_{i-1}, A, \Gamma)$$
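A minimal sketch of data generation from this linear dynamical system, assuming the generative equations above; the specific parameter values (transition matrix, noise scales, dimensions) are illustrative.

```python
import numpy as np

def sample_lds(W, A, mu0, Sigma, Gamma, P0, T, rng):
    """Draw one trajectory from the linear dynamical system
    y_1 = mu0 + u,  y_n = A y_{n-1} + v_n,  x_n = W y_n + e_n."""
    F, d = W.shape
    Y = np.zeros((T, d))
    X = np.zeros((T, F))
    Y[0] = rng.multivariate_normal(mu0, P0)                # y_1 ~ N(mu0, P0)
    for t in range(1, T):
        Y[t] = rng.multivariate_normal(A @ Y[t-1], Gamma)  # transition noise v
    for t in range(T):
        X[t] = rng.multivariate_normal(W @ Y[t], Sigma)    # emission noise e
    return X, Y

rng = np.random.default_rng(1)
d, F, T = 2, 3, 50
X, Y = sample_lds(W=rng.standard_normal((F, d)),
                  A=0.9 * np.eye(d), mu0=np.zeros(d),
                  Sigma=0.1 * np.eye(F), Gamma=0.05 * np.eye(d),
                  P0=np.eye(d), T=T, rng=rng)
print(X.shape, Y.shape)  # (50, 3) (50, 2)
```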

Latent Variable Models (Dynamic, Discrete) Word: need. Phonemes: n iy d. A chain of latent variables $y_1, y_2, \dots, y_T$ generates the observations $x_1, x_2, \dots, x_T$, but now the latent structure takes discrete values: $y_t \in \{\text{start}, n, iy, d, \text{end}\}$.

Latent Variable Models (Dynamic, Discrete) The transition probabilities form a matrix $A = [a_{ij}] = [p(y_t \mid y_{t-1})]$ and the initial-state probabilities a vector $\pi = [p(y_1)]$. For the states {s, n, iy, d, e}:

          y_t:  s     n     iy    d     e
y_{t-1} = s:    a_11  a_12  a_13  a_14  a_15
y_{t-1} = n:    a_21  a_22  a_23  a_24  a_25
y_{t-1} = iy:   a_31  a_32  a_33  a_34  a_35
y_{t-1} = d:    a_41  a_42  a_43  a_44  a_45
y_{t-1} = e:    a_51  a_52  a_53  a_54  a_55

Latent Variable Models (Dynamic, Discrete) The matrix of transition probabilities can be represented as an automaton: a left-to-right chain start → n → iy → d → end, with self-transitions $a_{22}, a_{33}, a_{44}$, forward transitions $a_{12}, a_{23}, a_{34}, a_{45}$ and a skip transition $a_{35}$. Data generation: if the latent variable is $y_t = b$, then $x_t \sim N(x_t \mid m_b, S_b)$. Parameters: $\theta = \{A, \{m_b, S_b\}_b, \pi\}$.
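A minimal sketch of this data-generation process for a left-to-right HMM with Gaussian emissions ($x_t \sim N(m_b, S_b)$ when $y_t = b$). The transition probabilities, emission means and the choice to make "start" non-emitting are illustrative assumptions.

```python
import numpy as np

states = ["start", "n", "iy", "d", "end"]
A = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],   # start -> n
              [0.0, 0.6, 0.4, 0.0, 0.0],   # n: stay or move to iy
              [0.0, 0.0, 0.6, 0.3, 0.1],   # iy: stay, move to d, or skip to end
              [0.0, 0.0, 0.0, 0.6, 0.4],   # d: stay or move to end
              [0.0, 0.0, 0.0, 0.0, 1.0]])  # end is absorbing
pi = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # always begin in "start"
m = np.linspace(-2, 2, len(states))        # per-state emission means m_b
s = 0.3                                    # shared emission std (S_b)

rng = np.random.default_rng(2)
y = rng.choice(len(states), p=pi)
path, obs = [], []
while states[y] != "end":
    if states[y] != "start":               # "start" is non-emitting here
        obs.append(rng.normal(m[y], s))    # x_t ~ N(m_b, S_b)
        path.append(states[y])
    y = rng.choice(len(states), p=A[y])    # y_t ~ p(y_t | y_{t-1})
print(path, np.round(obs, 2))
```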

Latent Variable Models (Dynamic, Discrete) Word model: an HMM over the states {s, n, iy, d, e}. Word-level HMMs are then chained together to form a language model.

Latent Variable Models (Spatial) Applications: image segmentation, e.g. brain tumour segmentation.

Latent Variable Models (Spatial) Undirected spatial dependencies: each latent variable $y_{uv}$ on a grid is connected to its neighbours (e.g. $y_{u,v+1}$, $y_{u+1,v}$) and generates an observation $x_{uv}$. Such models are called Markov random fields.

Latent Variable Models (Spatial) Markov random fields. The joint distribution over the latent grid factorizes over the maximal cliques $C$:

$$p(y_{11}, \dots, y_{nm}) = \frac{1}{Z} \prod_{C} \psi(y_C)$$

Potential function: $\psi(y_C) = e^{-E(y_C)}$. Partition function: $Z = \sum_{y} \prod_{C} \psi(y_C)$. Markov blanket: conditioned on the rest of the field, non-neighbouring variables are independent, $p(y_{uv}, y_{u'v'} \mid Y \setminus \{y_{uv}, y_{u'v'}\}) = p(y_{uv} \mid Y \setminus y_{uv})\, p(y_{u'v'} \mid Y \setminus y_{u'v'})$. Complete likelihood:

$$p(X, Y \mid \theta) = \frac{1}{Z} \prod_{uv} p(x_{uv} \mid y_{uv}, \theta) \prod_{C} \psi(y_C \mid \theta)$$
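To see why the partition function $Z$ is the hard part, here is a minimal sketch that brute-forces it on a tiny binary MRF with pairwise potentials $\psi(y_i, y_j) = e^{-E(y_i, y_j)}$. The 2×2 grid, the coupling value and the energy are illustrative assumptions; enumeration is only feasible for very small grids, since the state space grows as $2^{nm}$.

```python
import itertools
import numpy as np

beta = 0.8                      # smoothness: neighbouring labels prefer to agree
n, m = 2, 2
edges = []                      # 4-neighbourhood edges of the grid
for u in range(n):
    for v in range(m):
        if u + 1 < n: edges.append(((u, v), (u + 1, v)))
        if v + 1 < m: edges.append(((u, v), (u, v + 1)))

def unnorm(Y):
    """Unnormalized probability: product of pairwise clique potentials."""
    E = sum(-beta if Y[a] == Y[b] else beta for a, b in edges)
    return np.exp(-E)

sites = list(itertools.product(range(n), range(m)))
Z = 0.0
for labels in itertools.product([0, 1], repeat=len(sites)):
    Y = dict(zip(sites, labels))
    Z += unnorm(Y)              # sum over all 2^(n*m) label configurations
print("Z =", Z)                 # normalizer; p(Y) = unnorm(Y) / Z
```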

Summary: what will we study? Deterministic Component Analysis (3 weeks). Unsupervised approaches: Principal Component Analysis, Independent Component Analysis, Graph-based Component Analysis, Slow Feature Analysis. Supervised approaches: Linear Discriminant Analysis. What will we learn? How to find the latent space directly, and how to find the latent space via linear projections.

Summary: what will we study? Three families of graphical models: static data (a plate over $n = 1, \dots, N$ with parameters $\theta$ and latent $y_n$ generating $x_n$), sequential data (a chain $y_1, y_2, \dots, y_T$ emitting $x_1, x_2, \dots, x_T$), and spatial data (a grid of latent $y_{uv}$ emitting $x_{uv}$).

Summary: what will we study? Probabilistic Principal Component Analysis. What will we learn? How to formulate the problem probabilistically, and how to find both the data moments $E[y_i]$, $E[y y^T]$ and the parameters $\theta$.
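As a preview, here is a minimal sketch of the closed-form maximum-likelihood solution for PPCA from Bishop's PRML Chapter 12 (the course textbook): $W_{ML} = U_d (\Lambda_d - \sigma^2 I)^{1/2}$ up to a rotation, with $\sigma^2_{ML}$ the average of the discarded eigenvalues. The synthetic data and dimensions are illustrative.

```python
import numpy as np

def ppca_ml(X, d):
    """Closed-form ML PPCA (Tipping & Bishop; Bishop, PRML Ch. 12):
    W = U_d (L_d - sigma^2 I)^(1/2) (taking the rotation R = I), with
    sigma^2 the mean of the discarded sample-covariance eigenvalues."""
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False)           # sample covariance (F x F)
    lam, U = np.linalg.eigh(S)                 # ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]             # sort descending
    sigma2 = lam[d:].mean()                    # ML noise variance
    W = U[:, :d] @ np.diag(np.sqrt(lam[:d] - sigma2))
    return W, mu, sigma2

rng = np.random.default_rng(3)
Y = rng.standard_normal((500, 2))              # latent samples, d = 2
X = Y @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((500, 5))
W, mu, sigma2 = ppca_ml(X, d=2)
print(W.shape, round(sigma2, 4))               # (5, 2) and a small noise variance
```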

Summary: what will we study? Sequential data (3 weeks): a chain of latent variables $y_1, y_2, \dots, y_T$ emitting observations $x_1, x_2, \dots, x_T$. What are the models? The Kalman filter (1 week), the particle filter (1 week), and the Hidden Markov Model (1 week). What will we learn? How to formulate the problems probabilistically and learn the parameters.

Summary: what will we study? Spatial data (2-3 weeks): a grid of latent variables $y_{uv}$ emitting observations $x_{uv}$. What are the models? Gaussian Markov Random Fields (1 week); Discrete Markov Random Fields (1 week) and the Mean Field Approximation. What will we learn? How to formulate the problems probabilistically and learn the parameters.

What are the tools we need? We need elements of differential/integral calculus with vectors and matrices, e.g. the matrix gradient

$$\nabla_W f(W) = \left[ \frac{\partial f(W)}{\partial w_{ij}} \right]$$

The Matrix Cookbook (by Kaare Brandt Petersen & Michael Syskind Pedersen) is a useful reference, and Mike Brookes (EEE) has a nice page: http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html We also need linear algebra (matrix multiplication, matrix inversion, etc.), with a special focus on eigenanalysis. An excellent book is Matrix Computations by Gene H. Golub & Charles F. Van Loan.
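A small sketch of the matrix-gradient convention above: for $f(W) = \operatorname{tr}(W^T A W)$ the analytic gradient is $(A + A^T) W$, which we can verify entry-wise with a central finite difference. The function and matrix sizes are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
W = rng.standard_normal((3, 2))

f = lambda W: np.trace(W.T @ A @ W)
grad_analytic = (A + A.T) @ W        # matrix of partials [d f / d w_ij]

# Central finite-difference check of one entry [i, j].
i, j, eps = 1, 0, 1e-6
E = np.zeros_like(W); E[i, j] = eps
grad_fd = (f(W + E) - f(W - E)) / (2 * eps)
print(grad_analytic[i, j], grad_fd)  # the two values should agree closely
```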

What are the tools we need? We need elements of optimization (mainly simple quadratic optimization problems with constraints, which result in generalized eigenvalue problems). Refresh your memory on how Lagrange multipliers are used. We need tools from probability/statistics: random variables, probability density/mass functions, marginalization. Given a pdf $p(x)$, the probability of an event $A$ is computed as

$$P(x \in A) = \int_A p(x)\, dx$$

Marginal distributions:

$$p(x) = \int_y p(x, y)\, dy, \qquad p(y) = \int_x p(x, y)\, dx$$

First- and second-order moments:

$$E[x] = \int x\, p(x)\, dx, \qquad E[x x^T] = \int x x^T p(x)\, dx$$
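The moment integrals above can be approximated by Monte Carlo averages over samples from $p(x)$, as in this minimal sketch; here $p(x)$ is a 2-D Gaussian chosen purely for illustration, so the true answers are known.

```python
import numpy as np

rng = np.random.default_rng(5)
mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])
X = rng.multivariate_normal(mean, cov, size=200_000)  # samples from p(x)

E_x = X.mean(axis=0)                                  # approximates E[x]
E_xxT = (X[:, :, None] * X[:, None, :]).mean(axis=0)  # approximates E[x x^T]
print(E_x)                              # close to [1, -2]
print(E_xxT - np.outer(E_x, E_x))       # close to cov (second central moment)
```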

What are the tools we need? Bayes' rule and conditional independence. The product rule, $p(x, y) = p(x \mid y)\, p(y)$, gives Bayes' rule $p(y \mid x) = p(x \mid y)\, p(y) / p(x)$. Conditional independence of $x$ and $z$ given $y$: $p(x, z, y) = p(x \mid y)\, p(z \mid y)\, p(y)$. Finally, we need tools from algorithms: recursion and dynamic programming.
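A tiny numeric check of the product rule and Bayes' rule on a discrete joint distribution $p(x, y)$; the probability values are made up for illustration.

```python
import numpy as np

p_xy = np.array([[0.10, 0.20],    # rows: x in {0, 1}
                 [0.30, 0.40]])   # cols: y in {0, 1}
p_y = p_xy.sum(axis=0)            # marginal p(y): sum over x
p_x = p_xy.sum(axis=1)            # marginal p(x): sum over y

p_x_given_y = p_xy / p_y          # p(x | y) = p(x, y) / p(y)
# Product rule: p(x, y) = p(x | y) p(y)
assert np.allclose(p_x_given_y * p_y, p_xy)
# Bayes' rule: p(y | x) = p(x | y) p(y) / p(x)
p_y_given_x = (p_x_given_y * p_y) / p_x[:, None]
assert np.allclose(p_y_given_x.sum(axis=1), 1.0)  # valid conditional
print(p_y_given_x)
```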

What should we always have in mind? What is my model? What are my model's parameters? How do I find them (Maximum Likelihood, Expectation Maximization)? How do I use the model?

Assignments Assessment: 90% from written exams & 10% from assignments. Two assignments: the first will be given next week and should be delivered by the 21st of February; the second will be given on the 24th of February and should be delivered by the 17th of March.

Adv. Statistical Machine Learning Lectures
Lectures 1-2: Introduction
Lectures 3-4: A primer on calculus, linear algebra, probability/statistics
Lectures 5-6: Deterministic Component Analysis (1)
Lectures 7-8: Deterministic Component Analysis (2)
Lectures 9-10: Deterministic Component Analysis (3)
Lectures 11-12: Probabilistic Principal Component Analysis
Lectures 13-14: Sequential Data: Kalman Filter (1)
Lectures 15-16: Sequential Data: Kalman Filter (2)
Lectures 17-18: Sequential Data: Hidden Markov Model (1)
Lectures 18-19: Sequential Data: Hidden Markov Model (2)
Lectures 19-20: Sequential Data: Particle Filtering (1)
Lectures 20-21: Sequential Data: Particle Filtering (2)
Lectures 22-23: Spatial Data: Gaussian Markov Random Field (GMRF)
Lectures 23-24: Spatial Data: GMRF (2)
Lectures 25-28: Spatial Data: Discrete MRF (Mean Field)

Machine Learning (models/parameters) What does the machine learn in each application (the parameters of a model)? Object detection: learn a classifier, i.e. a function (the design of classifiers is covered in Neural Computation).