Object Recognition. Selim Aksoy. Bilkent University

Similar documents
INTRODUCTION TO PATTERN RECOGNITION

Deakin Research Online

Deformable Convolutional Networks

Petacat: Applying ideas from Copycat to image understanding

THe rip currents are very fast moving narrow channels,

Introduction to Pattern Recognition

Predicting Human Behavior from Public Cameras with Convolutional Neural Networks

Recognition of Tennis Strokes using Key Postures

Game, Shot and Match: Event-based Indexing of Tennis

Pedestrian Protection System for ADAS using ARM 9

Automated Proactive Road Safety Analysis

1.1 The size of the search space Modeling the problem Change over time Constraints... 21

Safety Assessment of Installing Traffic Signals at High-Speed Expressway Intersections

Self-Driving Vehicles That (Fore) See

Performance of Fully Automated 3D Cracking Survey with Pixel Accuracy based on Deep Learning

Modelling Approaches and Results of the FHDO Biomedical Computer Science Group at ImageCLEF 2015 Medical Classification Task

Published in: IEEE Conference on Advanced Video and Signal Based Surveillance, AVSS 2007

Advanced PMA Capabilities for MCM

D-Case Modeling Guide for Target System

Cricket umpire assistance and ball tracking system using a single smartphone camera

JPEG-Compatibility Steganalysis Using Block-Histogram of Recompression Artifacts

Visual Traffic Jam Analysis Based on Trajectory Data

Chapter 5 DATA COLLECTION FOR TRANSPORTATION SAFETY STUDIES

SIDRA INTERSECTION 6.1 UPDATE HISTORY

HIGH RESOLUTION DEPTH IMAGE RECOVERY ALGORITHM USING GRAYSCALE IMAGE.

4. Learning Pedestrian Models

FACE DETECTION. Lab on Project Based Learning May Xin Huang, Mark Ison, Daniel Martínez Visual Perception

knn & Naïve Bayes Hongning Wang

Introduction to Pattern Recognition

Capacity and Level of Service LOS

Application of Dijkstra s Algorithm in the Evacuation System Utilizing Exit Signs

LABORATORY EXERCISE 1 CONTROL VALVE CHARACTERISTICS

Title: 4-Way-Stop Wait-Time Prediction Group members (1): David Held

View-invariant Estimation of Height and Stride for Gait Recognition

Universal Style Transfer via Feature Transforms

Vision Zero High Injury Network Methodology

Generation of See-Through Baseball Movie from Multi-Camera Views

FEATURES. Features. UCI Machine Learning Repository. Admin 9/23/13

Supplementary Material for Bayes Merging of Multiple Vocabularies for Scalable Image Retrieval

A New Benchmark for Vison-Based Cyclist Detection

A computer program that improves its performance at some task through experience.

CS145: INTRODUCTION TO DATA MINING

Biomechanics and Models of Locomotion

Commonsense Knowledge Acquisition and Applications

Beach Cities Living Streets Design Manual and Aviation Boulevard Multimodal Corridor Plan

Design of a Pedestrian Detection System Based on OpenCV. Ning Xu and Yong Ren*

Analysis of Variance. Copyright 2014 Pearson Education, Inc.

Open Research Online The Open University s repository of research publications and other research outputs

Fine-grained Walking Activity Recognition via Driving Recorder Dataset

A Bag-of-Gait Model for Gait Recognition

Fine-Grain Annotation of Cricket Videos

Recognizing Human Gait Types 183

Assistive guidance system for the visually impaired Rohit Takhar 1, Tushar Sharma 1, Udit Arora 1, Sohit Verma 1 1

#19 MONITORING AND PREDICTING PEDESTRIAN BEHAVIOR USING TRAFFIC CAMERAS

Using Perceptual Context to Ground Language

Active Pedestrian Safety: from Research to Reality

Deconstructing Data Science

Rural Highway Overtaking Lanes

FAULT DIAGNOSIS IN DEAERATOR USING FUZZY LOGIC

NBA Plays. Mitchell Kates. at the. September 2014

At each type of conflict location, the risk is affected by certain parameters:

Predicting Horse Racing Results with Machine Learning

Gait Analysis for Recognition and Classification

The Application of Pedestrian Microscopic Simulation Technology in Researching the Influenced Realm around Urban Rail Transit Station

Chapter 6. Analysis of the framework with FARS Dataset

Finding your feet: modelling the batting abilities of cricketers using Gaussian processes

Naïve Bayes. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Title: Modeling Crossing Behavior of Drivers and Pedestrians at Uncontrolled Intersections and Mid-block Crossings

Queue analysis for the toll station of the Öresund fixed link. Pontus Matstoms *

Human Pose Tracking III: Dynamics. David Fleet University of Toronto

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 6. Wenbing Zhao. Department of Electrical and Computer Engineering

Software Reliability 1

Bayesian Optimized Random Forest for Movement Classification with Smartphones

Currents measurements in the coast of Montevideo, Uruguay

Introduction to topological data analysis

Outline. Terminology. EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 6. Steps in Capacity Planning and Management

Introduction to Highway Safety Course

Transformer fault diagnosis using Dissolved Gas Analysis technology and Bayesian networks

PREDICTING the outcomes of sporting events

Pedestrian traffic flow operations on a platform: observations and comparison with simulation tool SimPed

Reliability of Safety-Critical Systems Chapter 3. Failures and Failure Analysis

Naïve Bayes. Robot Image Credit: Viktoriya Sukhanova 123RF.com

NEIGHBORHOOD TRAFFIC CALMING POLICY

A Trajectory-Based Analysis of Coordinated Team Activity in Basketball Game

A SYSTEM FOR THE AUTOMATIC ANNOTATION OF TENNIS MATCHES. W.J. Christmas, A. Kostin, F. Yan, I. Kolonias and J. Kittler

Obtain a Simulation Model of a Pedestrian Collision Imminent Braking System Based on the Vehicle Testing Data

Application of Bayesian Networks to Shopping Assistance

Questions & Answers About the Operate within Operate within IROLs Standard

A Novel Approach to Evaluate Pedestrian Safety at Unsignalized Crossings using Trajectory Data

5.1 Introduction. Learning Objectives

Visual Background Recommendation for Dance Performances Using Dancer-Shared Images

Chapter 5: Methods and Philosophy of Statistical Process Control

Pedestrian Dynamics: Models of Pedestrian Behaviour

Uncovering Anomalous Usage of Medical Records via Social Network Analysis

Fun Neural Net Demo Site. CS 188: Artificial Intelligence. N-Layer Neural Network. Multi-class Softmax Σ >0? Deep Learning II

Impact of new imaging modalities on video surveillance and invasion of privacy

Early Skip Decision based on Merge Index of SKIP for HEVC Encoding

Course 495: Advanced Statistical Machine Learning/Pattern Recognition

Calculation of Trail Usage from Counter Data

Purpose. Scope. Process flow OPERATING PROCEDURE 07: HAZARD LOG MANAGEMENT

Transcription:

Image Classification and Object Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr

Image classification Image (scene) classification is a fundamental problem in image understanding. Automatic techniques for associating scenes with semantic labels have a high potential for improving the performance of other computer vision applications such as browsing (natural grouping of images instead of clusters based only on low-level features), retrieval (filtering images in archives based on content), and object recognition (the probability of an unknown object/region that exhibits several local features of a ship actually being a ship can be increased if the scene context is known to be a coast with high confidence but can be decreased if no water related context is dominant in that scene). CS 484, Spring 2012 2012, Selim Aksoy 2

Image classification The image classification problem has two critical components: representing images and learning models for semantic categories using these representations. Early work used low-level global features extracted from the whole image or from a fixed spatial layout. More recent approaches exploit local statistics in images using patches extracted by interest point detectors. Other configurations that use regions and their spatial relationships are also proposed. CS 484, Spring 2012 2012, Selim Aksoy 3

Hierarchical image classification Images Others (Face/close-up) Indoor Outdoor Others (Face/close-up) City Landscape Others Sunset Mountain/forest Mountain Forest Hierarchy of 11 scene categories (Vailaya et al., Image classification for content-based indexing, IEEE Trans. Image Processing, 2001). CS 484, Spring 2012 2012, Selim Aksoy 4

Hierarchical image classification Image representation: Mean and std. dev. of LUV values in 10x10 blocks for indoor/outdoor classification. Edge direction histograms for city/landscape classification. Histograms of HSV and LUV values for sunset/mountain/forest classification. Classification: Class-conditional density estimation using vector quantization. Bayesian classification. CS 484, Spring 2012 2012, Selim Aksoy 5

Hierarchical image classification CS 484, Spring 2012 2012, Selim Aksoy 6

Hierarchical image classification CS 484, Spring 2012 2012, Selim Aksoy 7

Image classification using bag-of-words Caltech data set: 13 natural scene categories. IDIAP data set (left to right): mountain, forest, indoor, city-panorama, city-street. CS 484, Spring 2012 2012, Selim Aksoy 8

Image classification using bag-of-words Flowchart from Fei-Fei Li, Pietro Perona, A Bayesian hierarchical model for learning natural scene categories, IEEE CVPR, 2005. CS 484, Spring 2012 2012, Selim Aksoy 9

Image classification using bag-of-words A codebook obtained from 650 training examples from 13 categories. Image patches are detected by a sliding grid and random sampling of scales. CS 484, Spring 2012 2012, Selim Aksoy 10

CS 484, Spring 2012 2012, Selim Aksoy 11

CS 484, Spring 2012 2012, Selim Aksoy 12

Image classification using bag-of-words CS 484, Spring 2012 2012, Selim Aksoy 13

Image classification using bag-of-words Flowchart from Quelhas et al., A thousand words in a scene, IEEE Trans. PAMI, 2007. Probabilistic Latent Semantic Analysis (PLSA) is used to learn aspect models to capture co-occurrences of visterms (visual terms). Bag-of-visterms representation or the aspect parameters are given as input to Support Vector Machines for classification. CS 484, Spring 2012 2012, Selim Aksoy 14

Image classification using bag-of-words CS 484, Spring 2012 2012, Selim Aksoy 15

CS 484, Spring 2012 2012, Selim Aksoy 16

Image classification using bag-of-regions D. Gökalp, S. Aksoy, Scene classification using bag-of-regions representations, IEEE CVPR, Beyond Patches Workshop, 2007. Region segmentation Region clustering region codebook Above-below spatial relationships region pairs Statistical region selection: identify region types that are frequently found in a particular class of scenes but rarely exist in other classes, and consistently occur together in the same class of scenes. Bayesian scene classification using bag of individual regions, bag of region pairs. CS 484, Spring 2012 2012, Selim Aksoy 17

Image classification using bag-of-regions Examples for region clusters. Each row represents a different cluster. CS 484, Spring 2012 2012, Selim Aksoy 18

Image classification using bag-of-regions CS 484, Spring 2012 2012, Selim Aksoy 19

Image classification using bag-of-regions Examples for correctly classified scenes. Examples for wrongly classified scenes. CS 484, Spring 2012 2012, Selim Aksoy 20

Image classification using factor graphs Boutell et al., Scene Parsing Using Region-Based Generative Models, IEEE Trans. Multimedia, 2007. CS 484, Spring 2012 2012, Selim Aksoy 21

ICCV 2005 Beijing, Short Course, Oct 15 Recognizing and Learning Object Categories Li Fei-Fei, UIUC Rob Fergus, MIT Antonio Torralba, MIT

Agenda Introduction Bag of words models Part-based models Discriminative methods Segmentation and recognition Conclusions CS 484, Spring 2012 2012, Selim Aksoy 23

perceptible vision material thing CS 484, Spring 2012 2012, Selim Aksoy 24

How many object categories are there? Biederman 1987 CS 484, Spring 2012 2012, Selim Aksoy 25

So what does object recognition involve? CS 484, Spring 2012 2012, Selim Aksoy 26

Verification: is that a bus? CS 484, Spring 2012 2012, Selim Aksoy 27

Detection: are there cars? CS 484, Spring 2012 2012, Selim Aksoy 28

Identification: is that a picture of Mao? CS 484, Spring 2012 2012, Selim Aksoy 29

Object categorization sky building flag banner bus face street lamp bus wall cars CS 484, Spring 2012 2012, Selim Aksoy 30

Scene and context categorization outdoor city traffic CS 484, Spring 2012 2012, Selim Aksoy 31

Challenges 1: view point variation Michelangelo 1475-1564 CS 484, Spring 2012 2012, Selim Aksoy 32

Challenges 2: illumination slide credit: S. Ullman CS 484, Spring 2012 2012, Selim Aksoy 33

Challenges 3: occlusion Magritte, 1957 CS 484, Spring 2012 2012, Selim Aksoy 34

Challenges 4: scale CS 484, Spring 2012 2012, Selim Aksoy 35

Challenges 5: deformation Xu, Beihong 1943 CS 484, Spring 2012 2012, Selim Aksoy 36

Challenges 6: background clutter Klimt, 1913 CS 484, Spring 2012 2012, Selim Aksoy 37

History: single object recognition CS 484, Spring 2012 2012, Selim Aksoy 38

Challenges 7: intra-class variation CS 484, Spring 2012 2012, Selim Aksoy 39

History: early object categorization CS 484, Spring 2012 2012, Selim Aksoy 40

CS 484, Spring 2012 2012, Selim Aksoy 41

ANIMALS PLANTS OBJECTS INANIMATE.. VERTEBRATE NATURAL MAN-MADE MAMMALS BIRDS TAPIR BOAR GROUSE CAMERA CS 484, Spring 2012 2012, Selim Aksoy 42

CS 484, Spring 2012 2012, Selim Aksoy 43

Scenes, Objects, and Parts Scene Objects Parts E. Sudderth, A. Torralba, W. Freeman, A. Willsky. ICCV 2005. Features CS 484, Spring 2012 2012, Selim Aksoy 44

Object categorization: the statistical viewpoint p( zebra image ) vs. p( no zebra image) Bayes rule: p ( zebra image ) p ( image zebra ) p ( zebra ) = p( no zebra image) p( image no zebra) p( no zebra) posterior ratio likelihood ratio prior ratio CS 484, Spring 2012 2012, Selim Aksoy 45

Object categorization: the statistical viewpoint p( zebra image) p( image zebra) p( zebra) = p( no zebra image) p( image no zebra) p( no zebra) posterior ratio likelihood ratio prior ratio Discriminative methods model posterior Generative methods model likelihood and prior CS 484, Spring 2012 2012, Selim Aksoy 46

Discriminative Direct modeling of p( zebra image) p( no zebra image) Decision boundary Zebra Non-zebra CS 484, Spring 2012 2012, Selim Aksoy 47

Generative Model p( image zebra) and p( image no zebra) p ( image zebra ) Low High p( image no zebra) Middle Middle Low CS 484, Spring 2012 2012, Selim Aksoy 48

Three main issues Representation How to represent an object category Learning How to form the classifier, given training data Recognition How the classifier is to be used on novel data CS 484, Spring 2012 2012, Selim Aksoy 49

Representation Generative / discriminative / hybrid CS 484, Spring 2012 2012, Selim Aksoy 50

Representation Generative / discriminative / hybrid Appearance only or location and appearance CS 484, Spring 2012 2012, Selim Aksoy 51

Representation Generative / discriminative / hybrid Appearance only or location and appearance Invariances View point Illumination Occlusion Scale Deformation Clutter etc. CS 484, Spring 2012 2012, Selim Aksoy 52

Representation Generative / discriminative / hybrid Appearance only or location and appearance invariances Part-based or global w/subwindow CS 484, Spring 2012 2012, Selim Aksoy 53

Representation Generative / discriminative / hybrid Appearance only or location and appearance invariances Parts or global w/subwindow Use set of features or each pixel in image CS 484, Spring 2012 2012, Selim Aksoy 54

Learning Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning CS 484, Spring 2012 2012, Selim Aksoy 55

Learning Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) Methods of training: generative vs. discriminative class de ensities 5 4 3 2 1 p(x C 1 ) p(x C 2 ) posterior pro obabilities 1 0.8 0.6 0.4 0.2 p(c 1 x) p(c 2 x) 0 0 0.2 0.4 0.6 0.8 1 x 0 0 0.2 0.4 0.6 0.8 1 x CS 484, Spring 2012 2012, Selim Aksoy 56

Learning Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.) Level of supervision Manual segmentation; bounding box; image labels; noisy labels Contains a motorbike CS 484, Spring 2012 2012, Selim Aksoy 57

Learning Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.) Level of supervision Manual segmentation; bounding box; image labels; noisy labels Batch/incremental (on category and image level; user-feedback ) CS 484, Spring 2012 2012, Selim Aksoy 58

Learning Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.) Level of supervision Manual segmentation; bounding box; image labels; noisy labels Batch/incremental (on category and image level; user-feedback ) Training images: Issue of overfitting Negative images for discriminative methods CS 484, Spring 2012 2012, Selim Aksoy 59

Learning Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.) Level of supervision Manual segmentation; bounding box; image labels; noisy labels Batch/incremental (on category and image level; user-feedback ) Training images: Issue of overfitting Negative images for discriminative methods Priors CS 484, Spring 2012 2012, Selim Aksoy 60

Recognition Scale / orientation range to search over Speed CS 484, Spring 2012 2012, Selim Aksoy 61

Object recognition using context The relationships between objects and the scene setting can be characterized in five ways: Interposition: objects interrupt their background Support: objects tend to rest on surfaces Probability: objects tend to be found in some contexts but not others Position: given an object is probable in a scene, it is often found in some positions and not others Familiar size: objects have a limited set of size relations with other objects CS 484, Spring 2012 2012, Selim Aksoy 62

Object recognition using context An ideal setting where the objects are perfectly segmented, the regions are classified, and the objects labels are refined with respect to semantic context in the image. (Rabinovich et al. Objects in Context, ICCV, 2007) CS 484, Spring 2012 2012, Selim Aksoy 63

Object recognition using context First column: segmentation results; second column: classification without contextual constraints; third column: classification with co-occurrence contextual constraints implemented using a conditional random field model. (Rabinovich et al. Objects in Context, ICCV, 2007) CS 484, Spring 2012 2012, Selim Aksoy 64

Object recognition using context better Using above below inside around relationships as spatial constraints better better worse First column: input image; second column: classification with co-occurrence contextual constraints; third column: classification with spatial and co-occurrence contextual constraints. (Galleguillos et al. Object Categorization using Co-Occurrence, Location and Appearance, CVPR, 2008) CS 484, Spring 2012 2012, Selim Aksoy 65

Object recognition using context Geometric context: learn labeling of the image into geometric classes such as support/ground (green), vertical planar (green) and sky (blue). The arrows show the direction that the vertical planar surface is facing. (Hoiem et al., Geometric Context from a Single Image, ICCV, 2005) CS 484, Spring 2012 2012, Selim Aksoy 66

Object recognition using context (a) (b) (c) (d) Putting objects in perspective: Car (a) and pedestrian (b) detection by using only local information. Car (c) and pedestrian (d) detection by using context. (Hoiem et al. Putting Objects in Perspective, CVPR, 2006) CS 484, Spring 2012 2012, Selim Aksoy 67

Object recognition using context Contextual recognition by maximizing a scene probability function that incorporates outputs of individual object detectors (confidence values for assigned object labels) and pairwise interactions between objects (likelihood of pairwise spatial relationships). Left column: without using context; right column: with using context. (Firat Kalaycilar, An object recognition framework using contextual interactions among objects, M.S. Thesis, Bilkent University, 2009) CS 484, Spring 2012 2012, Selim Aksoy 68