CS 188: Artificial Intelligence, Spring 2010
Lecture 21: DBNs, Viterbi, Speech Recognition
4/8/2010
Pieter Abbeel, UC Berkeley

Announcements
- Written 6 due tonight
- Project 4 up! Due 4/15 -- start early!
- Course contest update: planning to post by Friday night

P4: Ghostbusters
- Plot: Pacman's grandfather, Grandpac, learned to hunt ghosts for sport. He was blinded by his power, but could hear the ghosts banging and clanging.
- Transition model: all ghosts move randomly, but are sometimes biased
- Emission model: Pacman knows a noisy distance to each ghost
- [Figure: noisy distance probability; true distance = 8]

Today
- Dynamic Bayes Nets (DBNs) [sometimes called temporal Bayes nets]
- HMMs: most likely explanation queries
- Speech recognition: a massive HMM! (Details of this section not required)
- Start machine learning

Dynamic Bayes Nets (DBNs)
- We want to track multiple variables over time, using multiple sources of evidence
- Idea: repeat a fixed Bayes net structure at each time step
- Variables from time t can condition on those from t-1
- [Figure: DBN unrolled for t = 1, 2, 3]
- Discrete-valued dynamic Bayes nets are also HMMs

Exact Inference in DBNs
- Variable elimination applies to dynamic Bayes nets
- Procedure: "unroll" the network for T time steps, then eliminate variables until P(X_T | e_{1:T}) is computed
- Online belief updates: eliminate all variables from the previous time step; store factors for the current time only
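The online belief update above (sum out the previous time step, keep only a factor over the current state) can be sketched for the simplest discrete DBN, a plain HMM. The rain/sun states, umbrella evidence, and all numbers below are illustrative assumptions, not values from the slides:

```python
# Exact online belief update for a two-state HMM (illustrative model).
T = {"rain": {"rain": 0.7, "sun": 0.3},
     "sun":  {"rain": 0.3, "sun": 0.7}}           # P(X_t | X_{t-1})
E = {"rain": {"umbrella": 0.9, "none": 0.1},
     "sun":  {"umbrella": 0.2, "none": 0.8}}      # P(e_t | X_t)

def elapse_time(belief):
    """Eliminate X_{t-1}: B'(x_t) = sum_{x'} P(x_t | x') B(x')."""
    return {x: sum(T[xp][x] * belief[xp] for xp in belief) for x in belief}

def observe(belief, evidence):
    """Weight by the emission probability, then renormalize."""
    unnorm = {x: E[x][evidence] * belief[x] for x in belief}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}

belief = {"rain": 0.5, "sun": 0.5}                # prior at t = 1
for e in ["umbrella", "umbrella", "none"]:
    belief = observe(elapse_time(belief), e)      # store only current factor
print(belief)
```

Only the current belief dictionary is ever stored, which is exactly the "eliminate the previous time step" procedure; the full unrolled network is never materialized.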
DBN Particle Filters
- A particle is a complete sample for a time step
- Initialize: generate prior samples for the t = 1 Bayes net. Example particle: (3,3) and (5,3)
- Elapse time: sample a successor for each particle. Example successor: (2,3) and (6,3)
- Observe: weight each entire sample by the likelihood of the evidence conditioned on the sample. Likelihood: P(E^a | G^a) * P(E^b | G^b), one factor per ghost
- Resample: select prior samples (tuples of values) in proportion to their likelihood

SLAM
- SLAM = Simultaneous Localization And Mapping
- We do not know the map or our location
- Our belief state is over maps and positions!
- Main techniques: Kalman filtering (Gaussian HMMs) and particle methods
- [DEMOS: intel-lab-raw-odo.wmv, intel-lab-scan-matching.wmv, vision-slam_helioffice.wmv]

Speech and Language
- Speech technologies: automatic speech recognition (ASR), text-to-speech synthesis (TTS), dialog systems
- Language processing technologies: machine translation, information extraction, web search, question answering, text classification, spam filtering, etc.

HMMs: MLE Queries
- HMMs are defined by: states X, observations E, initial distribution P(X_1), transitions P(X_t | X_{t-1}), emissions P(E_t | X_t)
- Query: most likely explanation: x*_{1:t} = arg max_{x_{1:t}} P(x_{1:t} | e_{1:t})

State Path Trellis
- State trellis: graph of states and transitions over time
- Each arc represents some transition; each arc has a weight
- Each path is a sequence of states; the product of the weights on a path is that sequence's probability
- Can think of the Forward (and now Viterbi) algorithms as computing sums of all paths (best paths) in this graph
- [Figure: state trellis]
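The four particle-filter steps above (initialize, elapse time, observe/weight, resample) can be sketched for a single hidden variable. The 1-D grid world, the random-walk step model, and the noisy-distance likelihood table below are all made-up illustrations, not the Ghostbusters models:

```python
# Particle filter sketch: track one position on a 1-D grid (illustrative).
import random
from collections import Counter

random.seed(0)
POSITIONS = list(range(10))

def step(pos):
    """Elapse time: sample a successor (random walk, clipped to the grid)."""
    return min(max(pos + random.choice([-1, 0, 1]), 0), 9)

def likelihood(observed_dist, pos, source=0):
    """Observe: P(evidence | particle) for a noisy distance reading (made up)."""
    err = abs(observed_dist - abs(pos - source))
    return {0: 0.6, 1: 0.2, 2: 0.1}.get(err, 0.01)

# Initialize: prior samples
particles = [random.choice(POSITIONS) for _ in range(500)]

for obs in [5, 4, 4, 3]:                               # noisy distance readings
    particles = [step(p) for p in particles]           # elapse time
    weights = [likelihood(obs, p) for p in particles]  # observe: weight samples
    # Resample: select particles in proportion to their likelihood
    particles = random.choices(particles, weights=weights, k=len(particles))

# Belief: empirical distribution over positions
belief = {p: c / len(particles) for p, c in Counter(particles).items()}
print(belief)
```

In the Ghostbusters setting each particle would be a tuple of all ghost positions, and the weight a product of per-ghost evidence likelihoods, but the loop structure is the same.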
Viterbi Algorithm
[Figure: max-product recurrence on the state trellis]

Example
[Figure: worked Viterbi example on the trellis]

Digitizing Speech

Speech in an Hour
- Speech input is an acoustic wave form
- [Figure: wave form of "speech lab", segmented s p ee ch l a b]
- Graphs from Simon Arnfield's web tutorial on speech, Sheffield: http://www.psyc.leeds.ac.uk/research/cogn/speech/tutorial/

Spectral Analysis
- Frequency gives pitch; amplitude gives volume
- Sampling at ~8 kHz (phone), ~16 kHz (mic); kHz = 1000 cycles/sec
- Fourier transform of the wave displayed as a spectrogram: darkness indicates energy at each frequency
- [Figure: spectrogram of "speech lab"; the "l" to "a" transition]
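The Viterbi algorithm replaces the Forward algorithm's sum over paths with a max, keeping for each state the probability of the best path into it plus a backpointer. A minimal sketch on a two-state trellis; the rain/sun states, umbrella evidence, and probabilities are illustrative assumptions:

```python
# Viterbi: most likely state sequence on a small trellis (illustrative model).
states = ["rain", "sun"]
init  = {"rain": 0.5, "sun": 0.5}
trans = {"rain": {"rain": 0.7, "sun": 0.3},
         "sun":  {"rain": 0.3, "sun": 0.7}}
emit  = {"rain": {"umbrella": 0.9, "none": 0.1},
         "sun":  {"umbrella": 0.2, "none": 0.8}}

def viterbi(evidence):
    # m[x] = probability of the best path ending in state x
    m = {x: init[x] * emit[x][evidence[0]] for x in states}
    back = []                                   # backpointers, one dict per step
    for e in evidence[1:]:
        ptr, new_m = {}, {}
        for x in states:
            # Best predecessor: max over arcs, each weighted P(x|x') P(e|x)
            best_prev = max(states, key=lambda xp: m[xp] * trans[xp][x])
            ptr[x] = best_prev
            new_m[x] = m[best_prev] * trans[best_prev][x] * emit[x][e]
        m, back = new_m, back + [ptr]
    # Follow backpointers from the best final state
    path = [max(states, key=lambda x: m[x])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["umbrella", "umbrella", "none"]))  # → ['rain', 'rain', 'sun']
```

Each arc weight is P(x_t | x_{t-1}) * P(e_t | x_t), so the product along a path is exactly the joint probability of that state sequence with the evidence, as described on the trellis slide.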
Adding 100 Hz + 1000 Hz Waves
[Figure: sum of a 100 Hz and a 1000 Hz wave; amplitude vs. time over 0-0.05 s]

Spectrum
- Frequency components (100 and 1000 Hz) on the x-axis
- [Figure: spectrum with components at 100 Hz and 1000 Hz]

Part of [ae] from "lab"
- Note the complex wave repeating nine times in the figure
- Plus smaller waves which repeat 4 times for every large pattern
- Large wave has a frequency of 250 Hz (9 times in .036 seconds)
- Small wave roughly 4 times this, or roughly 1000 Hz
- Two little tiny waves on top of the peak of the 1000 Hz waves

Back to Spectra
- The spectrum represents these frequency components
- Computed by the Fourier transform, an algorithm which separates out each frequency component of a wave
- x-axis shows frequency, y-axis shows magnitude (in decibels, a log measure of amplitude)
- Peaks at 930 Hz, 1860 Hz, and 3020 Hz

Resonances of the vocal tract
- The human vocal tract as an open tube: closed end (glottis), open end (lips), length 17.5 cm
- Air in a tube of a given length will tend to vibrate at the resonance frequency of the tube
- Constraint: the pressure differential should be maximal at the (closed) glottal end and minimal at the (open) lip end
- Figure from W. Barry Speech Science slides
- From Mark Liberman's website
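The "separates out each frequency component" claim can be checked directly: build the 100 Hz + 1000 Hz wave from the slide and evaluate a naive discrete Fourier transform at a few frequencies. The sampling rate, duration, and unit amplitudes are illustrative assumptions:

```python
# Recover the components of a 100 Hz + 1000 Hz wave with a naive DFT.
import cmath
import math

fs = 8000                          # ~8 kHz, "phone quality" sampling
n = fs // 10                       # 0.1 s of signal
wave = [math.sin(2 * math.pi * 100 * t / fs) +
        math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]

def magnitude(freq):
    """Normalized |DFT| of the wave at one frequency."""
    return abs(sum(x * cmath.exp(-2j * math.pi * freq * t / fs)
                   for t, x in enumerate(wave))) / n

for f in [100, 500, 1000]:
    print(f, round(magnitude(f), 3))
# The spectrum shows energy only at the two component frequencies:
# magnitude ~0.5 (half the unit amplitude) at 100 and 1000 Hz, ~0 at 500 Hz.
```

A spectrogram is just this computation repeated on short, overlapping time slices of the wave, with magnitude drawn as darkness.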
Acoustic Feature Sequence
- Time slices are translated into acoustic feature vectors (~39 real numbers per slice): ... e_12 e_13 e_14 e_15 e_16 ...
- These are the observations; now we need the hidden states X

State Space
- P(E|X) encodes which acoustic vectors are appropriate for each phoneme (each kind of sound)
- P(X|X') encodes how sounds can be strung together
- We will have one state for each sound in each word
- From some state x, can only: stay in the same state (e.g. speaking slowly), or move to the next position in the word; at the end of the word, move to the start of the next word
- We build a little state graph for each word and chain them together to form our state space X

HMMs for Speech

Decoding
- While there are some practical issues, finding the words given the acoustics is an HMM inference problem
- We want to know which state sequence x_{1:T} is most likely given the evidence e_{1:T}: x*_{1:T} = arg max_{x_{1:T}} P(x_{1:T} | e_{1:T})
- From the state sequence x, we can simply read off the words

End of Part II!
- Now we're done with our unit on probabilistic reasoning
- Last part of class: machine learning

Parameter Estimation
- Estimating the distribution of a random variable
- Elicitation: ask a human! Usually need domain experts, and sophisticated ways of eliciting probabilities (e.g. betting games); trouble calibrating
- Empirically: use training data. For each outcome x, look at the empirical rate of that value: P_ML(x) = count(x) / total samples. Example: from samples r, g, g we get P_ML(r) = 1/3, P_ML(g) = 2/3
- This is the estimate that maximizes the likelihood of the data
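The empirical estimate from the r/g example is a one-liner: count each outcome and divide by the total number of samples. A minimal sketch (the helper name `ml_estimate` is ours, not from the slides):

```python
# Maximum-likelihood estimate of a discrete distribution from samples.
from collections import Counter

def ml_estimate(samples):
    """P_ML(x) = count(x) / total samples."""
    counts = Counter(samples)
    total = len(samples)
    return {x: c / total for x, c in counts.items()}

print(ml_estimate(["r", "g", "g"]))   # P_ML(r) = 1/3, P_ML(g) = 2/3
```

Among all distributions over {r, g}, this is the one that assigns the observed data r, g, g the highest probability, which is why it is called the maximum-likelihood estimate.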