Faster Nearest Neighbors: Voronoi Diagrams and k-d Trees


25 Faster Nearest Neighbors: Voronoi Diagrams and k-d Trees
Jonathan Richard Shewchuk

SPEEDING UP NEAREST NEIGHBOR CLASSIFIERS

Can we preprocess training pts to obtain sublinear query time?

2-5 dimensions: Voronoi diagrams
Medium dim (up to 30): k-d trees
Large dim: locality sensitive hashing [still researchy, not widely adopted]
Largest dim: exhaustive k-NN, but can use PCA or random projection [or another dimensionality reduction method]

Voronoi Diagrams

Let P be a point set. The Voronoi cell of $w \in P$ is

    $\mathrm{Vor}\, w = \{ p \in \mathbb{R}^d : |pw| \le |pv| \ \ \forall v \in P \}$

[A Voronoi cell is always a convex polyhedron or polytope.]

The Voronoi diagram of P is the set of P's Voronoi cells.

[Figures: voro.pdf, vormcdonalds.jpg, voronoigregoreichinger.jpg, saltflat-1.jpg]

[Voronoi diagrams sometimes arise in nature (salt flats, giraffe, crystallography).]
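The Voronoi cell definition above can be checked directly against exhaustive search. The following is a minimal sketch, not part of the original notes: the helper names `in_voronoi_cell` and `nearest_site` and the three example sites are hypothetical, and the Euclidean distance is assumed.

```python
import math

def in_voronoi_cell(p, w, sites):
    """Return True if p lies in Vor w, i.e. |pw| <= |pv| for every site v."""
    d_w = math.dist(p, w)
    return all(d_w <= math.dist(p, v) for v in sites)

def nearest_site(p, sites):
    """Exhaustive 1-NN in O(n) time: the site whose Voronoi cell contains p."""
    return min(sites, key=lambda v: math.dist(p, v))

# Hypothetical example data.
sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
q = (1.0, 1.0)
owner = nearest_site(q, sites)
print(owner, in_voronoi_cell(q, owner, sites))
```

A point on a shared cell boundary, such as (2, 0) here, satisfies the inequality for both neighboring sites, which is why the definition uses $\le$ rather than $<$.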

[Figures: giraffe-1.jpg, perovskite.jpg, vortex.pdf]

[Believe it or not, the first published Voronoi diagram dates back to 1644, in the book Principia Philosophiae by the famous mathematician and philosopher René Descartes. He claimed that the solar system consists of vortices. In each region, matter is revolving around one of the fixed stars (vortex.pdf). His physics was wrong, but his idea of dividing space into polyhedral regions has survived.]

Size (e.g. # of vertices) $\in O(n^{\lceil d/2 \rceil})$

[This upper bound is tight when d is a small constant. As d grows, the tightest asymptotic upper bound is somewhat smaller than this, but the complexity still grows exponentially with d.]

... but often in practice it is O(n).

[Here I'm leaving out a constant that may grow exponentially with d.]

Point location: Given query point q, find the point $w \in P$ for which $q \in \mathrm{Vor}\, w$.

2D: O(n log n) time to compute V.d. and a trapezoidal map for pt location
    O(log n) query time [because of the trapezoidal map]

[That's a pretty great running time compared to the linear query time of exhaustive search.]

dD: Use binary space partition tree (BSP tree) for pt location

[Unfortunately, it's difficult to characterize the running time of this strategy, although it is likely to be reasonably fast in 3-5 dimensions.]

1-NN only!

[A standard Voronoi diagram supports only 1-nearest neighbor queries. If you want the k nearest neighbors, there is something called an order-k Voronoi diagram that has a cell for each possible set of k nearest neighbors. But nobody uses those, for two reasons. First, the size of an order-k Voronoi diagram is $O(k^2 n)$ in 2D, and worse in higher dimensions. Second, there's no software available to compute one.]

[There are also Voronoi diagrams for other distance metrics, like the $L_1$ and $L_\infty$ norms.]

[Voronoi diagrams are good for 1-nearest neighbor in 2 or 3 dimensions, maybe 4 or 5, but for anything beyond that, k-d trees are much simpler and probably faster.]
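To make the worst-case size bound concrete, here is the exponent $\lceil d/2 \rceil$ evaluated for the dimensions where Voronoi diagrams are recommended. The value $n = 10^3$ is an illustrative assumption, and constant factors are ignored.

```latex
\[
n^{\lceil d/2 \rceil}:\qquad
d = 2 \Rightarrow n, \qquad
d = 3 \Rightarrow n^{2}, \qquad
d = 4 \Rightarrow n^{2}, \qquad
d = 5 \Rightarrow n^{3}.
\]
\[
\text{With } n = 10^{3}:\qquad
10^{3}, \qquad 10^{6}, \qquad 10^{6}, \qquad 10^{9}.
\]
```

These are worst-case figures only; as noted above, the size is often O(n) in practice.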

k-d Trees

Decision trees for NN search. Differences: [compared to decision trees]

Choose splitting feature w/greatest width: feature i in $\max_{i,j,k} (X_{ji} - X_{ki})$.
[With nearest neighbor search, we don't care about the entropy. Instead, what we want is that if we draw a sphere around the query point, it won't intersect very many boxes of the decision tree. So it helps if the boxes are nearly cubical, rather than long and thin.]
Cheap alternative: rotate through the features. [We split on the first feature at depth 1, the second feature at depth 2, and so on. This builds the tree faster, by a factor of O(d).]

Choose splitting value: median point for feature i, or $\frac{X_{ji} + X_{ki}}{2}$.
Median guarantees $\lfloor \log_2 n \rfloor$ tree depth; O(nd log n) tree-building time.
[... or just O(n log n) time if you rotate through the features. By contrast, splitting near the center does more to improve the aspect ratios of the boxes, but it could unbalance your tree. You can alternate between medians at odd depths and centers at even depths, which also guarantees an O(log n) depth.]

Each internal node stores a sample point. [... that lies in the node's box. Usually the splitting point.]
[Some k-d tree implementations have points only at the leaves, but it's usually better to have points in internal nodes too, so when we search the tree, we might stop searching earlier.]

[Draw this by hand. kdtreestructure.pdf] [Figure: 11 numbered sample points and the corresponding tree. Labels: root represents $\mathbb{R}^2$; right halfplane; lower right quarter plane.]

Goal: given query pt q, find a sample pt w such that $|qw| \le (1 + \epsilon)\,|qs|$, where s is the closest sample pt.
$\epsilon = 0 \Rightarrow$ exact NN; $\epsilon > 0 \Rightarrow$ approximate NN.

The alg. maintains:
Nearest neighbor found so far (or k nearest). (distance goes down ↓)
Binary heap of unexplored subtrees, keyed by distance from q. (minimum key goes up ↑)

[Draw this by hand. kdtreequery.pdf] [A query in progress. Labels: q; nearest so far.]

[Each subtree represents an axis-aligned box. The query tries to avoid searching most of the subtrees by searching the boxes close to q first. We measure the distance from q to a box and use it as a key for the subtree in the heap. The search stops when the distance to the kth-nearest neighbor found so far, divided by $1 + \epsilon$, is $\le$ the distance to the nearest unexplored box. For example, in the figure above, the query never visits the box at far upper left or the box at far lower right, because those boxes don't intersect the circle.]
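The construction rules above (split on the feature of greatest width, at the median, storing the splitting point in the internal node) can be sketched as follows. This is an illustrative sketch rather than the notes' own code: the `Node` class, `build`, and `depth` are hypothetical names, and the sketch re-sorts at every level for brevity, so it does not achieve the O(nd log n) building time quoted above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point             # sample point stored at this node (the splitting point)
    axis: int                # index of the splitting feature
    left: Optional["Node"]   # subtree of points with coordinate <= point[axis]
    right: Optional["Node"]  # subtree of points with coordinate >= point[axis]

def build(points: Sequence[Point]) -> Optional[Node]:
    """Build a k-d tree using widest-feature, median-point splits."""
    if not points:
        return None
    d = len(points[0])
    # Splitting feature: the one with the greatest width max_j X_ji - min_k X_ki.
    axis = max(range(d),
               key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
    # Splitting value: the median point along that feature.
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis, build(pts[:mid]), build(pts[mid + 1:]))

def depth(node: Optional[Node]) -> int:
    """Depth in edges; an empty tree has depth -1 and a single node depth 0."""
    return -1 if node is None else 1 + max(depth(node.left), depth(node.right))

# Hypothetical data: 11 points that are much wider in x than in y.
points = [(float(i), float(i % 3)) for i in range(11)]
tree = build(points)
print(tree.axis, tree.point, depth(tree))
```

Because each child receives at most half of the remaining points (rounded down), the depth in edges is $\lfloor \log_2 n \rfloor$, matching the median guarantee stated above.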

    $Q \leftarrow$ heap containing root node with key zero
    $r \leftarrow \infty$
    while $Q$ not empty and $\mathrm{minkey}(Q) < \frac{r}{1+\epsilon}$
        $B \leftarrow \mathrm{removemin}(Q)$
        $w \leftarrow$ B's sample point
        $r \leftarrow \min\{r, |qw|\}$    [Optimization: store square of r instead.]
        $B', B'' \leftarrow$ child boxes of B
        if $\mathrm{dist}(q, B') < \frac{r}{1+\epsilon}$ then $\mathrm{insert}(Q, B', \mathrm{dist}(q, B'))$    [The key for $B'$ is $\mathrm{dist}(q, B')$]
        if $\mathrm{dist}(q, B'') < \frac{r}{1+\epsilon}$ then $\mathrm{insert}(Q, B'', \mathrm{dist}(q, B''))$
    return point that determined r

For k-NN, replace "r" with a max-heap holding the k nearest neighbors [... just like in the exhaustive search algorithm I discussed last lecture.]

Works with any $L_p$ norm for $p \in [1, \infty]$. [k-d trees are not limited to the Euclidean ($L_2$) norm.]

Why $\epsilon$-approximate NN?

[Draw this by hand. kdtreeproblem.pdf] [A worst-case exact NN query.]

[In the worst case, we may have to visit every node in the k-d tree to find the nearest neighbor. In that case, the k-d tree is slower than simple exhaustive search. This is an example where an approximate nearest neighbor search can be much faster. In practice, settling for an approximate nearest neighbor sometimes improves the speed by a factor of 10 or even 100, because you don't need to look at most of the tree to do a query. This is especially true in high dimensions; remember that in high-dimensional space, the nearest point often isn't much closer than a lot of other points.]

Software:
ANN (David Mount & Sunil Arya, U. Maryland)
FLANN (Marius Muja & David Lowe, U. British Columbia)
GeRaF (Georgios Samaras, U. Athens) [random forests!]
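The query pseudocode above can be sketched in Python for the 1-NN case. This is a minimal sketch under stated assumptions, not the API of ANN, FLANN, or GeRaF: the tuple-based node layout, the helpers `build`, `box_dist`, and `nearest`, the Euclidean norm, and the example points are all hypothetical choices made for illustration.

```python
import heapq
import itertools
import math

def build(points):
    """Compact k-d tree: node = (point, axis, left, right); widest-feature median splits."""
    if not points:
        return None
    d = len(points[0])
    axis = max(range(d),
               key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis, build(pts[:mid]), build(pts[mid + 1:]))

def box_dist(q, lo, hi):
    """Euclidean distance from q to the axis-aligned box [lo, hi] (0 if q is inside)."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def nearest(root, q, eps=0.0):
    """Return (point, r, nodes_visited) with r <= (1 + eps) * (true NN distance)."""
    if root is None:
        return None, math.inf, 0
    d = len(q)
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    # Heap entries: (key, tie, node, box lower corner, box upper corner).
    heap = [(0.0, next(tie), root, (-math.inf,) * d, (math.inf,) * d)]
    best, r, visited = None, math.inf, 0
    while heap and heap[0][0] < r / (1.0 + eps):
        _, _, (point, axis, left, right), lo, hi = heapq.heappop(heap)
        visited += 1
        dist = math.dist(q, point)
        if dist < r:
            best, r = point, dist
        # Child boxes: cut the current box at the sample point's coordinate.
        left_hi = hi[:axis] + (point[axis],) + hi[axis + 1:]
        right_lo = lo[:axis] + (point[axis],) + lo[axis + 1:]
        for child, clo, chi in ((left, lo, left_hi), (right, right_lo, hi)):
            if child is not None:
                key = box_dist(q, clo, chi)
                if key < r / (1.0 + eps):
                    heapq.heappush(heap, (key, next(tie), child, clo, chi))
    return best, r, visited

# Hypothetical usage: exact query (eps = 0) and approximate query (eps = 0.5).
pts = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build(pts)
print(nearest(tree, (9.0, 2.0)))
print(nearest(tree, (9.0, 2.0), eps=0.5))
```

The sketch stores r itself rather than its square, so it omits the optimization mentioned in the pseudocode. Extending it to k-NN would follow the remark above: replace `r` with a max-heap of the k best distances and use that heap's largest entry in the two comparisons.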

Example: im2gps

[I want to emphasize the fact that exhaustive nearest neighbor search really is one of the first classifiers you should try in practice, even if it seems too simple. So here's an example of a modern research paper that uses 1-NN and 120-NN search to solve a problem.]

Paper by James Hays and [our own] Prof. Alexei Efros.

[Goal: given a query photograph, determine where on the planet the photo was taken. Called geolocalization. They evaluated both 1-NN and 120-NN with a complex set of features. What they did not do, however, is treat each photograph as one long vector. That's okay for tiny digits, but too expensive for millions of travel photographs. Instead, they reduced each photo to a small descriptor made up of a variety of features that extract the essence of each photo.]

[Show slides (im2gps.pdf). Sorry, images not included here. http://graphics.cs.cmu.edu/projects/im2gps/]

[Features, in rough order from most effective to least:
1. GIST: A compact image descriptor based on oriented edge detection (Gabor filters) + histograms.
2. Textons: A histogram of textures, created after assembling a dictionary of common textures.
3. A shrunk 16 × 16 image.
4. A color histogram.
5. Another histogram of edges, this one based on the Canny edge detector, invented by our own Prof. John Canny.
6. A geometric descriptor that's particularly good for identifying ground, sky, and vertical lines.]

[Bottom line: With 120-NN, their most sophisticated implementation came within 64 km of the correct location about 50% of the time.]

RELATED CLASSES

[If you like machine learning and you'll still be here next year, here are some courses you might want to take.]

CS C281A (spring): Statistical Learning Theory [C281A is the most direct continuation of CS 189/289A.]
EE 127 (spring), EE 227BT (fall): Numerical Optimization [a core part of ML]
[It's hard to overemphasize the importance of numerical optimization to machine learning, as well as other CS fields like graphics, theory, and scientific computing.]
EE 126 (both): Random Processes [Markov chains, expectation maximization, PageRank]
EE C106A/B (fall/spring): Intro to Robotics [dynamics, control, sensing]
Math 110: Linear Algebra [but the real gold is in Math 221]
Math 221: Matrix Computations [how to solve linear systems, compute SVDs, eigenvectors, etc.]
CS 194-26 (fall): Computational Photography (Efros)
CS 294-43 (fall): Visual Object and Activity Recognition (Efros/Darrell)
CS 294-112 (fall): Deep Reinforcement Learning (Levine)
CS 298-115 (fall): Algorithmic Human-Robot Interaction (Dragan)
CS 298-131 (fall): Special Topics in Deep Learning (Song/Darrell)
VS 265 (?): Neural Computation
CS C280 (?): Computer Vision
CS C267 (?): Scientific Computing [parallelization, practical matrix algebra, some graph partitioning]