CS 2750 Machine Learning
Lecture 4: Density estimation
Milos Hauskrecht
milos@cs.pitt.edu
5329 Sennott Square

Announcements
Homework: due on Wednesday, before the class.
Reports: hand in before the class.
Programs: submit electronically.
Collaborations on homeworks: you may discuss material with your fellow students, but the report and programs should be written individually.
Outline
- Density estimation
- Bernoulli distribution
- Binomial distribution

Density estimation
Data: $D = \{D_1, D_2, \ldots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values.
Attributes: modeled by random variables $X = \{X_1, X_2, \ldots, X_d\}$ with continuous or discrete values. E.g., blood pressure with numerical values, or chest pain with discrete values [no-pain, mild, moderate, strong].
Underlying true probability distribution: $p(X)$.
Density estimation
Data: $D = \{D_1, D_2, \ldots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values.
Objective: estimate the underlying true probability distribution over the variables $X$, $p(X)$, using the examples in $D$.
True distribution $p(X)$ → $n$ samples $D = \{D_1, D_2, \ldots, D_n\}$ → estimate $\hat{p}(X)$.
Standard iid assumptions: the samples
- are independent of each other,
- come from the same (identical) distribution (fixed $p(X)$).

Density estimation
Types of density estimation:
- Parametric: the distribution is modeled using a set of parameters $\Theta$, $p(X \mid \Theta)$. Example: the mean and covariances of a multivariate normal. Estimation: find parameters $\Theta$ describing the data $D$.
- Non-parametric: the model of the distribution utilizes all examples in $D$, as if all examples were parameters of the distribution. Example: nearest-neighbor.
- Semi-parametric.
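For the parametric case, the multivariate-normal example above reduces to estimating a mean vector and a covariance matrix from $D$. The sketch below is a minimal plain-Python illustration, assuming iid samples given as equal-length lists of floats; the function name `fit_gaussian_ml` and the data are hypothetical, and the covariance is normalized by $n$ (the maximum likelihood choice) rather than $n - 1$.

```python
from typing import List, Tuple

def fit_gaussian_ml(samples: List[List[float]]) -> Tuple[List[float], List[List[float]]]:
    """ML estimates of the mean vector and covariance matrix of a multivariate normal.

    The covariance divides by n (the ML estimate), not by n - 1.
    """
    n = len(samples)
    d = len(samples[0])
    # Mean: average of each attribute over the n examples.
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    # Covariance: average product of the centered attribute values.
    cov = [
        [sum((x[j] - mean[j]) * (x[k] - mean[k]) for x in samples) / n for k in range(d)]
        for j in range(d)
    ]
    return mean, cov

# Hypothetical data set D with n = 3 examples of d = 2 attributes.
data = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
mean, cov = fit_gaussian_ml(data)
```

Here the estimated parameters $\Theta$ are just the returned `mean` and `cov`; once they are computed, the examples themselves are no longer needed, which is the contrast with the non-parametric case.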
Learning via parameter estimation
In this lecture we consider parametric density estimation.
Basic settings:
- A set of random variables $X = \{X_1, X_2, \ldots, X_d\}$
- A model of the distribution over the variables in $X$ with parameters $\Theta$: $\hat{p}(X \mid \Theta)$
- Data $D = \{D_1, D_2, \ldots, D_n\}$
Objective: find a description of the parameters $\Theta$ so that they fit the observed data $D$.

Parameter estimation
- Maximum likelihood (ML): maximize $p(D \mid \Theta)$.
  Yields: one set of parameters $\Theta_{ML}$.
  The target distribution is approximated as $\hat{p}(X) = p(X \mid \Theta_{ML})$.
- Bayesian parameter estimation: uses the posterior distribution over possible parameters
  $$p(\Theta \mid D) = \frac{p(D \mid \Theta)\, p(\Theta)}{p(D)}$$
  Yields: all possible settings of $\Theta$ and their weights.
  The target distribution is approximated as
  $$\hat{p}(X) = p(X \mid D) = \int_{\Theta} p(X \mid \Theta)\, p(\Theta \mid D)\, d\Theta$$
Parameter estimation
Other possible criteria:
- Maximum a posteriori probability (MAP): maximize $p(\Theta \mid D)$ (the mode of the posterior).
  Yields: one set of parameters $\Theta_{MAP}$.
  Approximation: $\hat{p}(X) = p(X \mid \Theta_{MAP})$.
- Expected value of the parameter: $\hat{\Theta} = E(\Theta)$ (the mean of the posterior).
  The expectation is taken with regard to the posterior $p(\Theta \mid D)$.
  Yields: one set of parameters.
  Approximation: $\hat{p}(X) = p(X \mid \hat{\Theta})$.

Example: Bernoulli distribution
Coin example: we have a coin that can be biased.
Outcomes: two possible values, head or tail.
Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$.
Model: probability of a head $\theta$; probability of a tail $(1 - \theta)$.
Objective: we would like to estimate the probability of a head, $\hat{\theta}$.
Probability of an outcome $x_i$ (Bernoulli distribution):
$$P(x_i \mid \theta) = \theta^{x_i} (1 - \theta)^{1 - x_i}$$
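The Bernoulli formula above picks out $\theta$ when $x_i = 1$ and $(1 - \theta)$ when $x_i = 0$. A minimal sketch; the function name `bernoulli_pmf` is hypothetical and the input checks are an added assumption, not part of the slides.

```python
def bernoulli_pmf(x: int, theta: float) -> float:
    """P(x | theta) = theta**x * (1 - theta)**(1 - x) for an outcome x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 1 (head) or 0 (tail)")
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    # Exactly one of the two factors differs from 1, depending on x.
    return theta ** x * (1 - theta) ** (1 - x)
```

For example, with $\theta = 0.3$ a head has probability 0.3 and a tail 0.7, and the two always sum to 1.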
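Maximum likelihood (ML) estimate
Likelihood of data:
$$P(D \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i}$$
Maximum likelihood estimate:
$$\theta_{ML} = \arg\max_{\theta} P(D \mid \theta)$$
Optimize the log-likelihood:
$$l(D, \theta) = \log P(D \mid \theta) = \sum_{i=1}^{n} \left[ x_i \log \theta + (1 - x_i) \log (1 - \theta) \right] = N_1 \log \theta + N_2 \log (1 - \theta)$$
where $N_1$ is the number of heads seen and $N_2$ is the number of tails seen.

Maximum likelihood (ML) estimate
Optimize the log-likelihood $l(D, \theta) = N_1 \log \theta + N_2 \log (1 - \theta)$.
Set the derivative to zero:
$$\frac{\partial l(D, \theta)}{\partial \theta} = \frac{N_1}{\theta} - \frac{N_2}{1 - \theta} = 0$$
Solving for $\theta$:
$$\theta = \frac{N_1}{N_1 + N_2}$$
ML solution:
$$\theta_{ML} = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}$$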
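The closed-form result can be checked numerically: a brute-force search over $\theta$ should land on $N_1 / N$. A minimal sketch with hypothetical data (5 heads, 3 tails) and hypothetical helper names; the grid search is only a sanity check, not how the estimate is meant to be computed.

```python
import math

def bernoulli_log_likelihood(xs, theta):
    """l(D, theta) = N1 log(theta) + N2 log(1 - theta), for 0 < theta < 1."""
    n1 = sum(xs)          # number of heads (x_i = 1)
    n2 = len(xs) - n1     # number of tails (x_i = 0)
    return n1 * math.log(theta) + n2 * math.log(1 - theta)

def bernoulli_ml(xs):
    """Closed-form ML estimate: theta_ML = N1 / (N1 + N2)."""
    return sum(xs) / len(xs)

xs = [1, 0, 1, 1, 0, 1, 0, 1]   # hypothetical sequence: 5 heads, 3 tails
theta_ml = bernoulli_ml(xs)

# Sanity check: the best point on a fine grid over (0, 1) matches the closed form.
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: bernoulli_log_likelihood(xs, t))
```

Because the log-likelihood is strictly concave in $\theta$ when $N_1, N_2 > 0$, the stationary point found by setting the derivative to zero is the unique maximum.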
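Bayesian parameter estimate
Posterior distribution via Bayes rule:
$$p(\theta \mid D) = \frac{P(D \mid \theta)\, p(\theta)}{P(D)}$$
- $P(D \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^{N_1} (1 - \theta)^{N_2}$ is the likelihood of data
- $p(\theta)$ is the prior probability on $\theta$
How to choose the prior probability?

Prior distribution
Choice of the prior: the Beta distribution
$$p(\theta) = \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)} \theta^{\alpha_1 - 1} (1 - \theta)^{\alpha_2 - 1}$$
Why? The Beta distribution fits binomial sampling: the two are conjugate choices, so the posterior is again a Beta distribution:
$$p(\theta \mid D) = \frac{P(D \mid \theta)\, \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2)}{P(D)} = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$$
where $\Gamma(x) = \int_0^{\infty} t^{x - 1} e^{-t}\, dt$ is the Gamma function.
Parameters: $\alpha_1, \alpha_2$.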
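Conjugacy can be checked numerically: if the posterior really is $\mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$, then $\log P(D \mid \theta) + \log p(\theta) - \log p(\theta \mid D)$ must equal the same constant, $\log P(D)$, at every $\theta$. A minimal sketch using `math.lgamma` for the Gamma-function normalizer; the prior values, counts, and helper names are hypothetical.

```python
import math

def beta_log_pdf(theta, a1, a2):
    """log Beta(theta | a1, a2), with the Gamma-function normalizer computed in log space."""
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    return log_norm + (a1 - 1) * math.log(theta) + (a2 - 1) * math.log(1 - theta)

def beta_posterior(a1, a2, n1, n2):
    """Conjugate update: Beta(a1, a2) prior plus (N1 heads, N2 tails) -> Beta(a1 + N1, a2 + N2)."""
    return a1 + n1, a2 + n2

a1, a2 = 2.0, 2.0   # hypothetical prior parameters
n1, n2 = 5, 3       # hypothetical counts
b1, b2 = beta_posterior(a1, a2, n1, n2)

# log-likelihood + log-prior - log-posterior should be constant in theta (it equals log P(D)).
diffs = []
for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
    log_lik = n1 * math.log(theta) + n2 * math.log(1 - theta)
    diffs.append(log_lik + beta_log_pdf(theta, a1, a2) - beta_log_pdf(theta, b1, b2))
```

Working with `lgamma` instead of `math.gamma` avoids overflow once $N$ grows into the hundreds.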
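[Figure: Beta distribution, density curves for several settings of the two Beta parameters; horizontal axis $\theta$ from 0 to 1.]

MAP solution
Maximum a posteriori estimate:
- Selects the mode of the posterior distribution:
$$\theta_{MAP} = \arg\max_{\theta} p(\theta \mid D)$$
With the prior $p(\theta) = \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2)$, the posterior is
$$p(\theta \mid D) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2) = \frac{\Gamma(\alpha_1 + \alpha_2 + N)}{\Gamma(\alpha_1 + N_1)\Gamma(\alpha_2 + N_2)} \theta^{N_1 + \alpha_1 - 1} (1 - \theta)^{N_2 + \alpha_2 - 1}$$
MAP solution:
$$\theta_{MAP} = \frac{\alpha_1 + N_1 - 1}{\alpha_1 + \alpha_2 + N - 2}$$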
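The MAP formula is the mode of the posterior Beta density, so a grid search over the unnormalized log-posterior should recover it. A minimal sketch with a hypothetical Beta(2, 2) prior and counts $N_1 = 5$, $N_2 = 3$; the helper names are hypothetical, and the formula is only an interior mode when both posterior parameters exceed 1 (an assumption this sketch makes explicit).

```python
import math

def beta_map(a1, a2, n1, n2):
    """Mode of Beta(a1 + N1, a2 + N2): (a1 + N1 - 1) / (a1 + a2 + N - 2).
    Assumes a1 + N1 > 1 and a2 + N2 > 1, so the mode lies strictly inside (0, 1)."""
    n = n1 + n2
    return (a1 + n1 - 1) / (a1 + a2 + n - 2)

def log_posterior_unnorm(theta, a1, a2, n1, n2):
    """log of theta^(N1 + a1 - 1) * (1 - theta)^(N2 + a2 - 1): the posterior up to a constant."""
    return (n1 + a1 - 1) * math.log(theta) + (n2 + a2 - 1) * math.log(1 - theta)

a1, a2, n1, n2 = 2.0, 2.0, 5, 3     # hypothetical prior and counts
theta_map = beta_map(a1, a2, n1, n2)  # (2 + 5 - 1) / (4 + 8 - 2)

grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_posterior_unnorm(t, a1, a2, n1, n2))
```

The Gamma-function normalizer does not depend on $\theta$, so it can be dropped when searching for the mode.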
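Bayesian framework
Both ML and MAP estimates pick one value of the parameter.
Assume: there are two different parameter settings that are close in terms of their probability values. Using only one of them may introduce a strong bias, for example if we use it for predictions.
Bayesian parameter estimate:
- Remedies the limitation of one choice
- Uses all possible parameter values, weighted by the posterior
$$p(\theta \mid D) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$$
- The posterior can be used to define $\hat{p}(X)$:
$$\hat{p}(X) = p(X \mid D) = \int_{\Theta} p(X \mid \Theta)\, p(\Theta \mid D)\, d\Theta$$

Bayesian framework
Predictive probability of an outcome $x = 1$ in the next trial:
$$P(x = 1 \mid D) = \int_0^1 P(x = 1 \mid \theta)\, p(\theta \mid D)\, d\theta = \int_0^1 \theta\, p(\theta \mid D)\, d\theta = E(\theta)$$
where $p(\theta \mid D)$ is the posterior density.
This is equivalent to the expected value of the parameter, with the expectation taken with regard to the posterior distribution $p(\theta \mid D) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$.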
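The predictive probability is an integral over all parameter values, which can be approximated directly with a simple quadrature rule. A minimal sketch using the midpoint rule, assuming a hypothetical Beta(2, 2) prior and counts $N_1 = 5$, $N_2 = 3$; the function name `predictive_head` and the step count are hypothetical choices.

```python
import math

def predictive_head(a1: float, a2: float, n1: int, n2: int, steps: int = 100_000) -> float:
    """Approximate P(x = 1 | D) = integral over [0, 1] of theta * Beta(theta | a1 + N1, a2 + N2)
    using the midpoint rule on `steps` equal sub-intervals."""
    b1, b2 = a1 + n1, a2 + n2  # posterior Beta parameters (conjugate update)
    log_norm = math.lgamma(b1 + b2) - math.lgamma(b1) - math.lgamma(b2)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h  # midpoint of the i-th sub-interval
        density = math.exp(log_norm + (b1 - 1) * math.log(theta) + (b2 - 1) * math.log(1 - theta))
        total += theta * density * h  # P(x = 1 | theta) = theta, weighted by the posterior
    return total

p_head = predictive_head(2.0, 2.0, 5, 3)
```

The numerical result agrees with the posterior mean $(\alpha_1 + N_1) / (\alpha_1 + \alpha_2 + N) = 7/12$ to within the quadrature error, which is the equivalence stated above.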
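Expected value of the parameter
How to obtain the expected value? For a Beta distribution with parameters $\eta_1, \eta_2$:
$$E(\theta) = \int_0^1 \theta\, \mathrm{Beta}(\theta \mid \eta_1, \eta_2)\, d\theta = \frac{\Gamma(\eta_1 + \eta_2)}{\Gamma(\eta_1)\Gamma(\eta_2)} \int_0^1 \theta^{\eta_1} (1 - \theta)^{\eta_2 - 1}\, d\theta = \frac{\Gamma(\eta_1 + \eta_2)}{\Gamma(\eta_1)\Gamma(\eta_2)} \cdot \frac{\Gamma(\eta_1 + 1)\Gamma(\eta_2)}{\Gamma(\eta_1 + \eta_2 + 1)} = \frac{\eta_1}{\eta_1 + \eta_2}$$
Note: $\Gamma(x + 1) = x\,\Gamma(x)$, and $\Gamma(n) = (n - 1)!$ for integer values of $n$.

Expected value of the parameter
Substituting the results for the posterior, $p(\theta \mid D) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$, we get
$$E(\theta) = \frac{\alpha_1 + N_1}{\alpha_1 + \alpha_2 + N}$$
Note that the mean of the posterior is yet another reasonable parameter choice:
$$\hat{\theta} = E(\theta)$$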
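The last step of the derivation uses $\Gamma(x + 1) = x\,\Gamma(x)$ to collapse the Gamma-function ratio to $\eta_1 / (\eta_1 + \eta_2)$. A minimal sketch that evaluates both sides; the helper names are hypothetical, and the Gamma ratio is computed in log space with `math.lgamma` to stay stable for large arguments.

```python
import math

def beta_mean_via_gamma(e1: float, e2: float) -> float:
    """E(theta) under Beta(e1, e2), written with Gamma functions:
    Gamma(e1 + e2) / (Gamma(e1) Gamma(e2)) * Gamma(e1 + 1) Gamma(e2) / Gamma(e1 + e2 + 1)."""
    log_val = (math.lgamma(e1 + e2) - math.lgamma(e1) - math.lgamma(e2)
               + math.lgamma(e1 + 1) + math.lgamma(e2) - math.lgamma(e1 + e2 + 1))
    return math.exp(log_val)

def beta_mean(e1: float, e2: float) -> float:
    """Closed form obtained after applying Gamma(x + 1) = x * Gamma(x)."""
    return e1 / (e1 + e2)
```

With the posterior parameters $\eta_1 = \alpha_1 + N_1$ and $\eta_2 = \alpha_2 + N_2$, `beta_mean` gives exactly $E(\theta) = (\alpha_1 + N_1) / (\alpha_1 + \alpha_2 + N)$.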
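Binomial distribution
Example problem: a biased coin.
Outcomes: two possible values, head or tail.
Data: $D$, a set of order-independent outcomes. We treat $D$ as a multiset!
- $N_1$: number of heads seen
- $N_2$: number of tails seen
Model: probability of a head $\theta$; probability of a tail $(1 - \theta)$.
Objective: we would like to estimate the probability of a head, $\hat{\theta}$.
Probability of an outcome (binomial distribution):
$$P(N_1 \mid N, \theta) = \binom{N}{N_1} \theta^{N_1} (1 - \theta)^{N_2} = \frac{N!}{N_1!\, N_2!} \theta^{N_1} (1 - \theta)^{N_2}$$

Maximum likelihood (ML) estimate
Likelihood of data:
$$P(D \mid \theta) = \frac{N!}{N_1!\, N_2!} \theta^{N_1} (1 - \theta)^{N_2}$$
Log-likelihood:
$$l(D, \theta) = \log \frac{N!}{N_1!\, N_2!} + N_1 \log \theta + N_2 \log (1 - \theta)$$
The first term is a constant from the point of view of optimization!
ML solution:
$$\theta_{ML} = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}$$
The same as for the Bernoulli distribution with an iid sequence of examples!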
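The combinatorial term shifts the binomial log-likelihood by the same amount at every $\theta$, so it cannot move the maximizer. A minimal sketch comparing the two log-likelihoods on a grid, assuming hypothetical counts $N_1 = 5$, $N_2 = 3$ (so the constant is $\log \binom{8}{5} = \log 56$); helper names are hypothetical and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_log_likelihood(n1: int, n2: int, theta: float) -> float:
    """log[N! / (N1! N2!)] + N1 log(theta) + N2 log(1 - theta)."""
    log_coef = math.log(math.comb(n1 + n2, n1))
    return log_coef + n1 * math.log(theta) + n2 * math.log(1 - theta)

def bernoulli_log_likelihood(n1: int, n2: int, theta: float) -> float:
    """Same counts seen as an ordered iid sequence: no combinatorial term."""
    return n1 * math.log(theta) + n2 * math.log(1 - theta)

n1, n2 = 5, 3   # hypothetical counts
grid = [i / 1000 for i in range(1, 1000)]
best_binomial = max(grid, key=lambda t: binomial_log_likelihood(n1, n2, t))
best_bernoulli = max(grid, key=lambda t: bernoulli_log_likelihood(n1, n2, t))

# The gap between the two curves is the constant log C(N, N1), whatever theta is.
gap = binomial_log_likelihood(n1, n2, 0.3) - bernoulli_log_likelihood(n1, n2, 0.3)
```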
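Posterior density
Posterior density via Bayes rule:
$$p(\theta \mid D) = \frac{P(D \mid \theta)\, p(\theta)}{P(D)}$$
Prior choice: $p(\theta) = \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2)$
Likelihood: $P(D \mid \theta) = \frac{N!}{N_1!\, N_2!} \theta^{N_1} (1 - \theta)^{N_2}$
Posterior: $p(\theta \mid D) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$
MAP estimate:
$$\theta_{MAP} = \arg\max_{\theta} p(\theta \mid D) = \frac{\alpha_1 + N_1 - 1}{\alpha_1 + \alpha_2 + N - 2}$$

Expected value of the parameter
The result is the same as for the Bernoulli distribution.
Expected value of the parameter:
$$E(\theta) = \frac{\alpha_1 + N_1}{\alpha_1 + \alpha_2 + N}$$
Predictive probability of the event $x = 1$:
$$P(x = 1 \mid D) = E(\theta) = \frac{\alpha_1 + N_1}{\alpha_1 + \alpha_2 + N}$$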
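To see why the choice of estimator matters when data are scarce, the hypothetical helper `estimates` below evaluates the three closed-form results (ML, MAP, posterior mean) side by side. It is a sketch that assumes a Beta$(\alpha_1, \alpha_2)$ prior with $\alpha_1 + N_1 > 1$ and $\alpha_2 + N_2 > 1$, so that the MAP formula gives an interior mode; the prior Beta(2, 2) and the counts are hypothetical.

```python
def estimates(n1: int, n2: int, a1: float, a2: float):
    """ML, MAP and posterior-mean estimates of theta for N1 heads and N2 tails
    under a Beta(a1, a2) prior (MAP formula assumes a1 + N1 > 1 and a2 + N2 > 1)."""
    n = n1 + n2
    ml = n1 / n                               # N1 / N
    map_ = (a1 + n1 - 1) / (a1 + a2 + n - 2)  # mode of Beta(a1 + N1, a2 + N2)
    mean = (a1 + n1) / (a1 + a2 + n)          # mean of Beta(a1 + N1, a2 + N2)
    return ml, map_, mean

ml, map_, mean = estimates(3, 0, 2.0, 2.0)    # three heads, no tails
```

With three heads and no tails, ML returns $\theta = 1$, so it would assign probability zero to a tail on the next toss. Under the Beta(2, 2) prior, MAP gives 0.8 and the posterior mean gives $5/7 \approx 0.714$, both pulled toward 1/2.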