CS 188: Artificial Intelligence, Spring 2011
Lecture 19: Dynamic Bayes Nets, Naïve Bayes
4/6/2011
Pieter Abbeel, UC Berkeley
Slides adapted from Dan Klein.

Announcements
- W4 out, due next week Monday
- P4 out, due next week Friday
- Mid-semester survey

Announcements II
- Course contest
  - Regular tournaments. Instructions have been posted!
  - First week extra credit for top 20, next week top 10, then top 5, then top 3.
  - First nightly tournament: tentatively Monday night

P4: Ghostbusters 2.0
- Plot: Pacman's grandfather, Grandpac, learned to hunt ghosts for sport. He was blinded by his power, but could hear the ghosts' banging and clanging.
- Transition Model: All ghosts move randomly, but are sometimes biased
- Emission Model: Pacman knows a noisy distance to each ghost
[Figure: noisy distance probability, true distance = 8]

Today
- Dynamic Bayes Nets (DBNs) [sometimes called temporal Bayes nets]
- Demos:
  - Localization
  - Simultaneous Localization And Mapping (SLAM)
- Start machine learning

Dynamic Bayes Nets (DBNs)
- We want to track multiple variables over time, using multiple sources of evidence
- Idea: Repeat a fixed Bayes net structure at each time
- Variables from time t can condition on those from t-1
[Figure: DBN unrolled for t = 1, 2, 3, with ghost variables $G_t^a$, $G_t^b$ and evidence variables $E_t^a$, $E_t^b$ at each step]
- Discrete valued dynamic Bayes nets are also HMMs

Exact Inference in DBNs
- Variable elimination applies to dynamic Bayes nets
- Procedure: "unroll" the network for T time steps, then eliminate variables until $P(X_T \mid e_{1:T})$ is computed
[Figure: DBN unrolled for t = 1, 2, 3, with variables $G_t^a$, $G_t^b$, $E_t^a$, $E_t^b$]
- Online belief updates: Eliminate all variables from the previous time step; store factors for current time only

DBN Particle Filters
- A particle is a complete sample for a time step
- Initialize: Generate prior samples for the t=1 Bayes net
  - Example particle: $G_1^a = (3,3)$, $G_1^b = (5,3)$
- Elapse time: Sample a successor for each particle
  - Example successor: $G_2^a = (2,3)$, $G_2^b = (6,3)$
- Observe: Weight each entire sample by the likelihood of the evidence conditioned on the sample
  - Likelihood: $P(E_1^a \mid G_1^a) \cdot P(E_1^b \mid G_1^b)$
- Resample: Select prior samples (tuples of values) in proportion to their likelihood
[Demo]
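
The elapse, observe, and resample steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the P4 project code: the names `particle_filter_step`, `transition_sample`, and `emission_prob` are hypothetical, and the fallback used when every particle gets zero weight is an assumption made here for the sake of a runnable example.

```python
import random

def particle_filter_step(particles, transition_sample, emission_prob,
                         evidence, rng=random):
    """One elapse-observe-resample cycle for a DBN particle filter.

    particles: list of tuples, one entry per ghost, e.g. ((3, 3), (5, 3)).
    transition_sample(pos, rng): hypothetical sampler for the next position.
    emission_prob(reading, pos): hypothetical model for P(reading | pos).
    evidence: tuple of readings, one per ghost.
    """
    # Elapse time: sample a successor for every ghost in every particle.
    moved = [tuple(transition_sample(g, rng) for g in p) for p in particles]

    # Observe: weight each entire particle by the product of the
    # per-ghost evidence likelihoods.
    weights = []
    for p in moved:
        w = 1.0
        for reading, g in zip(evidence, p):
            w *= emission_prob(reading, g)
        weights.append(w)

    # Assumption: if the evidence rules out every particle, keep the
    # unweighted successors instead of failing.
    if sum(weights) == 0:
        return moved

    # Resample: draw whole particles in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))
```

Each particle is a tuple over all ghosts, so a particle is weighted and resampled as one unit rather than ghost by ghost.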

Trick I to Improve Particle Filtering Performance: Low Variance Resampling
- Advantages:
  - More systematic coverage of the space of samples
  - If all samples have the same importance weight, no samples are lost
  - Lower computational complexity
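
One common way to implement low variance resampling is to use a single random offset and then step through the cumulative weights at equal intervals. The sketch below follows that idea; the function name `low_variance_resample` and its signature are assumptions made for illustration, not code from the lecture.

```python
import random

def low_variance_resample(particles, weights, rng=random):
    """Resample len(particles) particles using one random draw."""
    n = len(particles)
    total = sum(weights)
    if n == 0 or total <= 0:
        raise ValueError("need at least one particle with positive weight")

    step = total / n
    start = rng.random() * step      # single random offset in [0, step)
    resampled = []
    i = 0
    cumulative = weights[0]
    for m in range(n):
        pointer = start + m * step   # equally spaced pointers
        # Advance to the particle whose weight interval contains pointer.
        while pointer >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
    return resampled
```

With equal weights every pointer falls in a different particle's interval, so each particle is kept exactly once. The loop makes a single pass over the weights, which is where the lower cost comes from.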

Trick II to Improve Particle Filtering Performance: Regularization
- If there is little or no noise in the transition model, all particles will start to coincide
- → regularization: introduce additional (artificial) noise into the transition model

SLAM
- SLAM = Simultaneous Localization And Mapping
  - We do not know the map or our location
  - Our belief state is over maps and positions!
  - Main techniques: Kalman filtering (Gaussian HMMs) and particle methods
- [DEMOS] DP-SLAM, Ron Parr

Robot Localization
- In robot localization:
  - We know the map, but not the robot's position
  - Observations may be vectors of range finder readings
  - State space and readings are typically continuous (works basically like a very fine grid) and so we cannot store B(X)
  - Particle filtering is a main technique
- [Demos] Global-floor

SLAM
- SLAM = Simultaneous Localization And Mapping
  - We do not know the map or our location
  - State consists of position AND map!
  - Main techniques: Kalman filtering (Gaussian HMMs) and particle methods

Particle Filter Example
[Figure: 3 particles, each carrying its own map: map of particle 1, map of particle 2, map of particle 3]

SLAM DEMOS
- fastslam.avi, visionslam_helioffice.wmv

Further readings
- We are done with Part II: Probabilistic Reasoning
- To learn more (beyond scope of 188):
  - Koller and Friedman, Probabilistic Graphical Models (CS281A)
  - Thrun, Burgard and Fox, Probabilistic Robotics (CS287)

Part III: Machine Learning
- Up until now: how to reason in a model and how to make optimal decisions
- Machine learning: how to acquire a model on the basis of data / experience
  - Learning parameters (e.g. probabilities)
  - Learning structure (e.g. BN graphs)
  - Learning hidden concepts (e.g. clustering)

Machine Learning Today
- An ML Example: Parameter Estimation
  - Maximum likelihood
  - Smoothing
- Applications
- Main concepts
- Naïve Bayes

Parameter Estimation
[Figure: observed draws labeled r and g]
- Estimating the distribution of a random variable
- Elicitation: ask a human (why is this hard?)
- Empirically: use training data (learning!)
  - E.g.: for each outcome x, look at the empirical rate of that value: $P_{ML}(x) = \frac{\text{count}(x)}{\text{total samples}}$
  - Example sample r g g: $P_{ML}(r) = 1/3$
  - This is the estimate that maximizes the likelihood of the data
- Issue: overfitting. E.g., what if we only observed 1 jelly bean?
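
The empirical-rate estimate described above amounts to counting. A minimal Python sketch follows; the function name `max_likelihood_estimate` is chosen here for illustration.

```python
from collections import Counter

def max_likelihood_estimate(samples):
    """Relative frequency of each observed outcome."""
    if not samples:
        raise ValueError("cannot estimate a distribution from no data")
    counts = Counter(samples)
    total = len(samples)
    return {x: c / total for x, c in counts.items()}

# The sample r g g from the slide.
estimate = max_likelihood_estimate(["r", "g", "g"])

# Overfitting: with a single observation, every unseen outcome
# implicitly gets probability zero.
one_bean = max_likelihood_estimate(["r"])
```

For the single-sample case, `one_bean` assigns all of the probability mass to `"r"`, which is the overfitting issue raised on the slide.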
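
Estimation: Smoothing
- Relative frequencies are the maximum likelihood estimates
- In Bayesian statistics, we think of the parameters as just another random variable, with its own distribution

Estimation: Laplace Smoothing
- Laplace's estimate: Pretend you saw every outcome once more than you actually did: $P_{LAP}(x) = \frac{c(x) + 1}{N + |X|}$, where $c(x)$ is the count of x, $N$ the number of samples, and $|X|$ the number of possible outcomes
  - Example sample H H T: $P_{ML}(H) = 2/3$, $P_{LAP}(H) = 3/5$
- Can derive this as a MAP estimate with Dirichlet priors (see cs281)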
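
A small Python sketch of Laplace's estimate, under the assumption that the full set of possible outcomes is known in advance. The function name `laplace_estimate` is illustrative, and the parameter `k` generalizes "once more" to "k extra times" with `k=1` as the default.

```python
from collections import Counter

def laplace_estimate(samples, outcomes, k=1):
    """Smoothed estimate: pretend each outcome was seen k extra times."""
    counts = Counter(samples)
    denom = len(samples) + k * len(outcomes)
    if denom == 0:
        raise ValueError("no data and no smoothing: estimate is undefined")
    # Counter returns 0 for unseen outcomes, so they still get k / denom.
    return {x: (counts[x] + k) / denom for x in outcomes}

flips = ["H", "H", "T"]
smoothed = laplace_estimate(flips, outcomes=["H", "T"])        # H: 3/5, T: 2/5
unsmoothed = laplace_estimate(flips, outcomes=["H", "T"], k=0)  # relative frequencies
```

Setting `k=0` recovers the relative frequencies, and larger values of `k` pull the estimate toward uniform.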

Estimation: Laplace Smoothing
- Laplace's estimate (extended): Pretend you saw every outcome k extra times: $P_{LAP,k}(x) = \frac{c(x) + k}{N + k|X|}$
  - Example sample: H H T
  - What's Laplace with k = 0?
  - k is the strength of the prior
- Laplace for conditionals: Smooth each condition independently: $P_{LAP,k}(x \mid y) = \frac{c(x, y) + k}{c(y) + k|X|}$

Example: Spam Filter
- Input: email
- Output: spam/ham
- Setup:
  - Get a large collection of example emails, each labeled "spam" or "ham"
  - Note: someone has to hand label all this data!
  - Want to learn to predict labels of new, future emails
- Features: The attributes used to make the ham / spam decision
  - Words: FREE!
  - Text Patterns: $dd, CAPS
  - Non-text: SenderInContacts
- Example emails:
  - "Dear Sir. First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret."
  - "TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT. 99 MILLION EMAIL ADDRESSES FOR ONLY $99"
  - "Ok, Iknow this is blatantly OT but I'm beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use, I know it was working pre being stuck in the corner, but when I plugged it in, hit the power nothing happened."
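
The feature types listed for the spam filter (words, text patterns, non-text attributes) could be computed roughly as follows. This is a sketch under assumptions: the feature names, the reading of `$dd` as a dollar sign followed by two digits, and the `extract_features` signature are all illustrative choices, not part of the lecture.

```python
import re

def extract_features(text, sender, contacts):
    """Map one email to a dictionary of attribute-value pairs."""
    words = text.split()
    return {
        # Word feature: does the token FREE appear?
        "contains_FREE": bool(re.search(r"\bFREE\b", text)),
        # Text pattern, assumed meaning of $dd: "$" followed by two digits.
        "has_dollar_amount": bool(re.search(r"\$\d\d", text)),
        # Text pattern: number of purely alphabetic, all-uppercase words.
        "num_caps_words": sum(
            1 for w in words if len(w) > 1 and w.isalpha() and w.isupper()
        ),
        # Non-text feature: is the sender already a known contact?
        "sender_in_contacts": sender in contacts,
    }

features = extract_features(
    "99 MILLION EMAIL ADDRESSES FOR ONLY $99",
    sender="unknown@example.com",
    contacts={"friend@example.com"},
)
```

A classifier then works on these dictionaries rather than on the raw text.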

Example: Digit Recognition
- Input: images / pixel grids
- Output: a digit 0-9
- Setup:
  - Get a large collection of example images, each labeled with a digit
  - Note: someone has to hand label all this data!
  - Want to learn to predict labels of new, future digit images
- Features: The attributes used to make the digit decision
  - Pixels: (6,8)=ON
  - Shape Patterns: NumComponents, AspectRatio, NumLoops
[Figure: example digit images labeled 0, 1, 2, 1, and one unlabeled image marked ??]

Other Classification Tasks
- In classification, we predict labels y (classes) for inputs x
- Examples:
  - Spam detection (input: document, classes: spam / ham)
  - OCR (input: images, classes: characters)
  - Medical diagnosis (input: symptoms, classes: diseases)
  - Automatic essay grader (input: document, classes: grades)
  - Fraud detection (input: account activity, classes: fraud / no fraud)
  - Customer service email routing
  - ... many more
- Classification is an important commercial technology!

Important Concepts
- Data: labeled instances, e.g. emails marked spam/ham
  - Training set
  - Held out set
  - Test set
- Features: attribute-value pairs which characterize each x
- Experimentation cycle
  - Learn parameters (e.g. model probabilities) on the training set
  - (Tune hyperparameters on the held-out set)
  - Compute accuracy on the test set
  - Very important: never "peek" at the test set!
- Evaluation
  - Accuracy: fraction of instances predicted correctly
- Overfitting and generalization
  - Want a classifier which does well on test data
  - Overfitting: fitting the training data very closely, but not generalizing well
  - We'll investigate overfitting and generalization formally in a few lectures
[Figure: data split into Training Data, Held-Out Data, Test Data]
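
The experimentation cycle above can be sketched as three small helpers. The names `split_data`, `accuracy`, and `tune`, the 80/10/10 split, and the `fit(train, k)` interface are assumptions for illustration; the lecture does not prescribe specific proportions or an API.

```python
import random

def split_data(instances, train_frac=0.8, held_out_frac=0.1, seed=0):
    """Shuffle once, then cut into training, held-out, and test sets."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_held = int(len(shuffled) * held_out_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_held],
            shuffled[n_train + n_held:])

def accuracy(classifier, labeled):
    """Fraction of (x, y) instances for which classifier(x) == y."""
    correct = sum(1 for x, y in labeled if classifier(x) == y)
    return correct / len(labeled)

def tune(train, held_out, fit, candidates):
    """Pick the hyperparameter whose classifier scores best on held-out data.

    fit(train, k) is a hypothetical training routine returning a classifier.
    """
    return max(candidates, key=lambda k: accuracy(fit(train, k), held_out))
```

The test set appears only in a final call such as `accuracy(best_classifier, test)`, after all tuning has been done on the held-out set.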