Introduction to Pattern Recognition


Introduction to Pattern Recognition. Jason Corso, SUNY at Buffalo, 19 January 2011.

Examples of Pattern Recognition in the Real World: Hand-Written Digit Recognition.

Examples of Pattern Recognition in the Real World: Computational Finance and the Stock Market.

Examples of Pattern Recognition in the Real World: Bioinformatics and Gene Expression Analysis.

Examples of Pattern Recognition in the Real World: Biometrics.

Examples of Pattern Recognition in the Real World: It is also a novel by William Gibson! Do let me know if you want to borrow it!

Pattern Recognition By Example. Example: Sorting Fish (Salmon vs. Sea Bass).

Pattern Recognition By Example, Sorting Fish: Pattern Recognition System Requirements. Set up a camera to watch the fish coming through on the conveyor belt. Classify each fish as salmon or sea bass. Prefer to mistake sea bass for salmon. [FIGURE 1.1. The objects to be classified are first sensed by a transducer (camera), whose signals are preprocessed. Next the features are extracted, and finally the classification is emitted, here either salmon or sea bass. Although the information flow is often chosen to be from the source to the classifier, some systems employ information flow in which earlier levels of processing can be altered based on the tentative or preliminary results of later stages.]

Pattern Recognition By Example: A Note on Preprocessing. Inevitably, preprocessing will be necessary. Preprocessing is the act of modifying the input data to simplify subsequent operations without losing relevant information. Examples of preprocessing (for varying types of data): noise removal; element segmentation (spatial or temporal); alignment or registration of the query to a canonical frame; fixed transformations of the data, such as a change of color space (image specific), a wavelet decomposition, or a transformation from a denumerable representation (e.g., text) to a 1-of-B vector space. Preprocessing is a key part of our Pattern Recognition toolbox, but we will talk about it directly very little in this course.
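To ground the last transformation above, a minimal sketch (hypothetical four-symbol vocabulary; any denumerable alphabet works the same way) of mapping symbols into a 1-of-B vector space:

```python
import numpy as np

# Hypothetical vocabulary of B symbols; the words are placeholders.
vocab = ["salmon", "sea bass", "conveyor", "camera"]
index = {word: i for i, word in enumerate(vocab)}

def one_of_b(word):
    """Map a symbol to its 1-of-B indicator vector."""
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

print(one_of_b("sea bass"))  # [0. 1. 0. 0.]
```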

Pattern Recognition By Example, Patterns and Models: Ideal State Space, the Space of All Fish (Salmon, Sea Bass). Clear that the populations of salmon and sea bass are indeed distinct. The space of all fish is quite large. Each dimension is defined by some property of the fish, most of which we cannot even measure with the camera.

Pattern Recognition By Example, Patterns and Models: Real State Space, the Space of All Fish Given a Set of Features (Salmon, Sea Bass). When we choose a set of possible features, we are projecting this very high-dimensional space down into a lower-dimensional space.

Pattern Recognition By Example, Patterns and Models: Features as Marginals. Indeed, we can think of each individual feature as a single marginal distribution over the space of all fish; in other words, a projection down into a one-dimensional space.

Pattern Recognition By Example, Patterns and Models: Models. We build a model of each phenomenon we want to classify (a model of salmon, a model of sea bass), which is an approximate representation given the features we've selected.

Pattern Recognition By Example, Patterns and Models: Models. "The overarching goal and approach in pattern classification is to hypothesize the class of these models, process the sensed data to eliminate noise (not due to the models), and for any sensed pattern choose the model that corresponds best." (DHS)

Pattern Recognition By Example, Selecting Feature(s) for the Fish: Modeling for the Fish Example. Suppose an expert at the fish packing plant tells us that length is the best feature. We cautiously trust this expert. We gather a few examples from our installation to analyze the length feature. These examples are our training set; we want to be sure to gather a representative population of them. We analyze the length feature by building histograms: marginal distributions. But this is a disappointing result: the sea bass length does exceed the salmon length on average, but clearly not always. [FIGURE 1.2. Histograms for the length feature for the two categories. No single threshold value of the length will serve to unambiguously discriminate between the two categories; using length alone, we will have some errors. The value marked l* will lead to the smallest number of errors, on average. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright 2001 by John Wiley & Sons, Inc.]
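To make the histogram analysis concrete, here is a minimal sketch on synthetic lengths (the plant data behind Figure 1.2 is unavailable, so the two distributions below are assumptions): sweep candidate thresholds and pick the l* with the fewest training errors, assuming equal priors.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed length samples: sea bass longer on average, with heavy overlap.
salmon = rng.normal(loc=10.0, scale=3.0, size=200)
sea_bass = rng.normal(loc=16.0, scale=3.0, size=200)

# Classify length > t as sea bass; average the two class error rates
# (equal priors) and keep the threshold with the smallest error.
candidates = np.linspace(5, 25, 401)
errors = [0.5 * np.mean(salmon > t) + 0.5 * np.mean(sea_bass <= t)
          for t in candidates]
l_star = candidates[int(np.argmin(errors))]
print(f"l* = {l_star:.2f}, training error = {min(errors):.3f}")
```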

Pattern Recognition By Example, Selecting Feature(s) for the Fish: Lightness Feature. Try another feature after inspecting the data: lightness. This feature exhibits a much better separation between the two classes. [FIGURE 1.3. Histograms for the lightness feature for the two categories. No single threshold value x* (decision boundary) will serve to unambiguously discriminate between the two categories; using lightness alone, we will have some errors. The value marked x* will lead to the smallest number of errors, on average. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright 2001 by John Wiley & Sons, Inc.]

Pattern Recognition By Example: Feature Combination. Modeling for the fish example: seldom will one feature be enough in practice. In the fish example, perhaps lightness, x_1, and width, x_2, will jointly do better than either alone. This is an example of a 2D feature space:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \tag{1}$$

[FIGURE 1.4. The two features of lightness and width for sea bass and salmon.]
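A minimal sketch of working in this 2D feature space (the lightness/width values are synthetic assumptions, and the least-squares linear fit is one simple stand-in for the dark line of Figure 1.4, not the book's method):

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed 2D features [lightness, width]; salmon darker and narrower.
salmon = rng.normal([3.5, 16.0], 0.8, size=(100, 2))
sea_bass = rng.normal([6.5, 19.0], 0.8, size=(100, 2))

X = np.vstack([salmon, sea_bass])
y = np.array([-1.0] * 100 + [+1.0] * 100)   # -1 salmon, +1 sea bass

# Least-squares linear classifier: sign(w0 + w1*x1 + w2*x2).
A = np.hstack([np.ones((200, 1)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

pred = np.sign(A @ w)
print("training accuracy:", np.mean(pred == y))
```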


Key Ideas in Pattern Recognition: Curse of Dimensionality. The two features obviously separate the classes much better than one alone. This suggests adding a third feature. And a fourth feature. And so on. Key questions: How many features are required? Is there a point where we have too many features? How do we know beforehand which features will work best? What happens when there is feature redundancy or correlation?

Key Ideas in Pattern Recognition, Decision Boundaries and Generalization: Decision Boundary. The decision boundary is the sub-space in which classification among multiple possible outcomes is equal. Off the decision boundary, all classification is unambiguous. [FIGURE 1.3, as above. FIGURE 1.4. The two features of lightness and width for sea bass and salmon. The dark line could serve as a decision boundary of our classifier. Overall classification error on the data shown is lower than if we use only one feature as in Fig. 1.3, but there will still be some errors. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright 2001 by John Wiley & Sons, Inc.]


Key Ideas in Pattern Recognition, Decision Boundaries and Generalization: Bias-Variance Dilemma. Depending on the available features, the complexity of the problem, and the classifier, the decision boundaries will also vary in complexity. Simple decision boundaries (e.g., linear) seem to miss some obvious trends in the data: bias. Complex decision boundaries seem to lock onto the idiosyncrasies of the training data set: variance. A central issue in pattern recognition is to build classifiers that can work properly on novel query data; hence, generalization is key. Can we predict how well our classifier will generalize to novel data? [FIGURE 1.5. Overly complex models for the fish will lead to decision boundaries that are complicated. While such a decision may lead to perfect classification of our training samples, it would lead to poor performance on future patterns. The novel test point marked ? is evidently most likely a salmon, whereas the complex decision boundary shown leads it to be classified as a sea bass. FIGURE 1.6. The decision boundary shown might represent the optimal tradeoff between performance on the training set and simplicity of classifier, thereby giving the highest accuracy on new patterns. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright 2001 by John Wiley & Sons, Inc.]

Key Ideas in Pattern Recognition, Cost and Decision Theory: Decision Theory. In many situations, the consequences of our classifications are not equally costly. Recalling the fish example, it is acceptable to have tasty pieces of salmon in cans labeled sea bass, but the converse is not so. Hence, we need to adjust our decisions (decision boundaries) to incorporate these varying costs. For the lightness feature on the fish, we would want to move the boundary to smaller values of lightness. Our underlying goal is to establish a decision boundary to minimize the overall cost; this is called decision theory. [FIGURE 1.3, as above.]
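To make the cost adjustment concrete, here is a minimal sketch (synthetic lightness distributions and a made-up cost matrix; both are assumptions, not the plant's numbers). Penalizing "sea bass labeled salmon" more heavily pulls the minimizing threshold toward smaller lightness, exactly as the slide argues.

```python
import numpy as np

rng = np.random.default_rng(2)
# Assumed lightness samples: salmon darker than sea bass.
salmon = rng.normal(4.0, 1.0, 500)
sea_bass = rng.normal(7.0, 1.0, 500)

# Costs (assumed): mislabeling sea bass as salmon is much worse.
COST_BASS_AS_SALMON = 5.0
COST_SALMON_AS_BASS = 1.0

def expected_cost(t):
    # Classify lightness < t as salmon, >= t as sea bass (equal priors).
    return 0.5 * (COST_SALMON_AS_BASS * np.mean(salmon >= t)
                  + COST_BASS_AS_SALMON * np.mean(sea_bass < t))

ts = np.linspace(0, 12, 601)
t_err = ts[np.argmin([np.mean(salmon >= t) + np.mean(sea_bass < t) for t in ts])]
t_cost = ts[np.argmin([expected_cost(t) for t in ts])]
# The min-cost threshold lands at a smaller lightness than the min-error one.
print(f"min-error threshold: {t_err:.2f}, min-cost threshold: {t_cost:.2f}")
```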

Key Ideas in Pattern Recognition: Pattern Recognition Definitions. Any ideas? DHS: Pattern recognition is the act of taking in raw data and taking an action based on the category of the pattern. DHS: Pattern classification is to take in raw data, eliminate noise, and process it to select the most likely model that it represents. Jordan: The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as classifying data into different categories.

Key Ideas in Pattern Recognition, Definitions: Types of Pattern Recognition Approaches. Statistical: focus on the statistics of the patterns; the primary emphasis of our course. Syntactic: classifiers are defined using a set of logical rules; grammars can group rules.

Key Ideas in Pattern Recognition, Definitions: Feature Extraction and Classification. Feature extraction: to characterize an object to be recognized by measurements whose values are very similar for objects in the same category and very different for objects in different categories. Invariant features, those that are invariant to irrelevant transformations of the underlying data, are preferred. Classification: to assign a category to the object based on the feature vector provided during feature extraction. The perfect feature extractor would yield a representation that is trivial to classify; the perfect classifier would yield a perfect model from an arbitrary set of features. But these are seldom plausible.

Key Ideas in Pattern Recognition, Definitions: Classification Objective Functions. For classification, there are numerous underlying objective functions that we can seek to optimize. Minimum-error-rate classification seeks to minimize the error rate: the percentage of new patterns assigned to the wrong category. Total expected cost, or risk, minimization is also often used. Important underlying questions are: How do we map knowledge about costs to best affect our classification decision? Can we estimate the total risk and hence know if our classifier is acceptable even before we deploy it? Can we bound the risk?
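In symbols, the standard DHS formulation (the notation is conventional, not quoted from this slide): the conditional risk of taking action $\alpha_i$ given feature vector $\mathbf{x}$ is

$$R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x}),$$

where $\lambda(\alpha_i \mid \omega_j)$ is the cost of taking action $\alpha_i$ when the true class is $\omega_j$. The Bayes decision rule selects the action that minimizes $R(\alpha_i \mid \mathbf{x})$; minimum-error-rate classification is the special case with zero-one costs.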

Key Ideas in Pattern Recognition: No Free Lunch Theorem. A question you're probably asking is, "What is the best classifier?" Any ideas? We will learn that indeed no such generally best classifier exists; this is described in the No Free Lunch Theorem. If the goal is to obtain good generalization performance, there are no context-independent or usage-independent reasons to favor one learning or classification method over another. When confronting a new pattern recognition problem, appreciation of this theorem reminds us to focus on the aspects that matter most: prior information, data distribution, amount of training data, and cost or reward function.

Key Ideas in Pattern Recognition: Analysis By Synthesis. The availability of large collections of data on which to base our pattern recognition models is important. In the case of little data (and sometimes even in the case of much data), we can use analysis by synthesis to test our models. Given a model, we can randomly sample examples from it to analyze how close they are to our few examples and what we expect to see based on our knowledge of the problem.
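A minimal sketch of the idea (all values assumed; a Gaussian is just one convenient model): fit a simple model to a handful of observed lengths, sample synthetic examples from it, and check whether they look like what we expect.

```python
import numpy as np

rng = np.random.default_rng(3)
# A handful of observed salmon lengths (assumed values).
observed = np.array([9.1, 10.4, 8.7, 11.2, 9.9])

# Fit a simple Gaussian model to the few examples.
mu, sigma = observed.mean(), observed.std(ddof=1)

# Synthesize new examples from the model and inspect them.
synthetic = rng.normal(mu, sigma, size=1000)
print(f"model: mu={mu:.2f}, sigma={sigma:.2f}")
print(f"synthetic range: [{synthetic.min():.1f}, {synthetic.max():.1f}]")
# If the synthetic samples look implausible (e.g., negative lengths),
# the model, not the data, is the problem.
```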

Key Ideas in Pattern Recognition: Classifier Ensembles. Classifier combination is obvious: get the power of multiple models for a single decision. But what happens when the different classifiers disagree? How do we separate the available training data for each classifier? Should the classifiers be learned jointly or in silos? Examples: bagging, boosting, neural networks (?).
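Since the slide stops at naming the examples, here is a minimal bagging sketch (numpy only; the 1D synthetic data and threshold-stump learner are assumptions for illustration): train one stump per bootstrap resample and combine them by majority vote.

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.concatenate([rng.normal(4, 1.5, 100), rng.normal(7, 1.5, 100)])
y = np.array([-1] * 100 + [+1] * 100)

def fit_stump(Xb, yb):
    """Pick the threshold minimizing training error on one bootstrap sample."""
    ts = np.linspace(Xb.min(), Xb.max(), 100)
    errs = [np.mean(np.where(Xb > t, 1, -1) != yb) for t in ts]
    return ts[int(np.argmin(errs))]

# Bagging: bootstrap resample, fit a stump per resample, majority-vote.
stumps = []
for _ in range(25):
    idx = rng.integers(0, len(X), len(X))
    stumps.append(fit_stump(X[idx], y[idx]))

votes = np.array([np.where(X > t, 1, -1) for t in stumps])
pred = np.sign(votes.sum(axis=0))   # 25 voters, so no ties
print("bagged training accuracy:", np.mean(pred == y))
```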

Key Ideas in Pattern Recognition: SO MANY QUESTIONS...

Wrap-Up: Logistical Things. Read the course webpage (now and regularly): http://www.cse.buffalo.edu/~jcorso/t/CSE555. Read the syllabus: http://www.cse.buffalo.edu/~jcorso/t/CSE555/files/syllabus.pdf. Read the newsgroup: sunyab.cse.555. THE BOOK: Duda, Hart, and Stork, Pattern Classification.

Wrap-Up: Policy on Reading and Lecture Notes. Lecture notes are provided (mostly) via PDF linked from the course website. For lectures that are given primarily on the board, no notes are provided. It is always in your best interest to attend the lectures rather than exclusively read the book and notes; the notes are provided for reference. In the interest of the environment, I request that you do NOT print out the lecture notes. The lecture notes linked from the website may be updated from time to time based on the lecture progress, questions, and errors. Check back regularly.

Wrap-Up: Next Up! A more realistic set of motivating application examples based in Mobile Robotics, given by Dr. Julian Ryde.