World Congress of International Fuzzy Systems Association 2011 and Asia Fuzzy Systems Society International Conference 2011, Surabaya-Bali, Indonesia, 21-25 June 2011, ISBN: 978-602-99359-0-5

Fuzzy Hidden Markov Models for Indonesian Speech Classification

*Intan Nurma Yulita, Telkom Institute of Technology, intanurma@gmail.com
The Houw Liong, Telkom Institute of Technology, houwhee@yahoo.co.id
Adiwijaya, Telkom Institute of Technology, adw@ittelkom.ac.id

Abstract: Indonesia has many ethnic groups and therefore many dialects. Speech classification is difficult when the database contains speech signals from speakers whose characteristics differ because of gender and dialect. These differences affect the frequency, intonation, amplitude, and period of the speech, so the system must be trained on a wide variety of reference templates. This study designs a solution to this problem for Indonesian speech classification by combining fuzzy methods with Hidden Markov Models. The proposed Fuzzy Hidden Markov Model (FHMM) consists of Fuzzy C-Means Clustering, which replaces the vector quantization process, and a new forward and backward method that handles the membership degrees of the data. The results show that FHMM is better than HMM, with an accuracy improvement of 3.33 percentage points.

Keywords: Fuzzy, Hidden Markov Models, Indonesian, Speech, Classification

1 INTRODUCTION

Over the past several decades, speech classification technology has been studied extensively. There are many approaches to speech classification, for example template-based, knowledge-based, and stochastic-based approaches [13]. The most successful results were obtained with the hidden Markov model (HMM) [8]. Other results were obtained with Artificial Neural Networks [11], Support Vector Machines [10], fuzzy methods [12], and clustering [14]. Speech classification is a "language-dependent" system: a classifier built for one language cannot be applied to another language, because each language has its own list of phonemes.
Many studies have been carried out abroad, but their results cannot be applied well to Indonesian. English is the language for which most speech recognition systems have been developed [2, 3, 7, 8, 9, 10, 11, 12, 14]. Studies of Indonesian speech classification are still few: existing work covers a speaker adaptation system [6] and the development of a corpus for Indonesian speech classification [4].

Speech classification is difficult because speech has some unique characteristics. At different times, the same word takes a different form even when it is spoken by the same person. Speech classification is therefore even more difficult if the database uses speech signals from various people who have different characteristics because of gender and dialect. These differences affect the frequency, intonation, amplitude, and period of the speech, so the system must be trained on a wide variety of reference templates. Further study is therefore still needed.

Hidden Markov Models are a common approach to classifying speech. However, a method is needed that solves the problem above for Indonesian speech classification. This study designs such a method by combining fuzzy methods with Hidden Markov Models. A fuzzy approach handles variant forms of speech better than a non-fuzzy one. If the number of variants is higher, the area of each cluster of Fuzzy C-Means Clustering is wider. Some studies have already combined fuzzy methods with HMM [5, 7, 8, 11], but they were not designed to solve the problem of differing dialect characteristics, nor for Indonesian speech. A new design of fuzzy Hidden Markov Models is proposed in this study. The model consists of Fuzzy C-Means Clustering, which is designed to replace the vector quantization process, and a new forward and backward method that handles the membership degrees of the data.

2 MATERIALS AND METHODS

2.1 Raw data obtaining

This study used speech recognition data from the Research and Development Center of Telkom.
Data collection was conducted in a soundproof room (that is, the speech contains no noise), and 70 speakers were involved. Experiments were performed on a speech data set with varied dialect and gender characteristics. The speakers came from several Indonesian ethnic groups (Sundanese, Javanese, Batak, Betawi, and Balinese), but no information was available on their proportions. The data set was divided into training data (80% of the data set) and testing data (20% of the data set). The speakers in the training data and the testing data were different because the speech classification system is speaker-independent. Table 1 lists the words used for the training data.

Table 1. Training data

| Word | Sounds | Information |
|---|---|---|
| Balaysalasa | /balaysalasa/ and /baleysalasa/ | 101 files from male and female speakers |
| Lubuklinggaw | /lubuklinggaw/ and /lubuklinggo/ | 101 files from male and female speakers |
| Prabumulih | /prabumulih/ and /prabumuleh/ | 101 files from male and female speakers |
| Tanjungenim | /tanjungenim/ and /tanjungénim/ | 100 files from male and female speakers |
| Tarempa | /tarempa/ and /tarémpa/ | 98 files from male and female speakers |

Table 1 shows that each word is pronounced in markedly different ways.

2.2 Preprocessing

The purpose of preprocessing is to make all input signals conform to the specifications required by the system [2]. The first step is centering, which shifts the discrete amplitude distribution so that its center lies on the axis y = 0. Centering thus makes the average amplitude of the signal zero. The next step is normalization, which equalizes the maximum amplitude of the sound signals. Normalization is done by dividing each discrete amplitude value by the maximum amplitude value.

2.3 Feature Extraction

This process aims at obtaining the characteristics of the voice signal. In this study, MFCC is implemented for feature extraction. It produces 24 parameter values: 12 cepstral values and the 12 first-order derivatives of these cepstral values. As a result, every utterance is divided into a number of frames, and each frame has 24 feature values.

2.4 Vector Quantization (VQ)

The output of feature extraction is shorter than the original signal. However, an HMM needs an observation sequence as input [2], and the observations must represent all the variation in the existing cepstral values. VQ is used to form discrete symbols (a codebook) from the series of observations, so that the HMM can work with a shorter vector representation. The VQ process is divided into two stages: codebook formation and codebook index determination. When the codebook is constructed, the input to VQ is the feature vectors of the whole variety of known voice signals. A clustering algorithm groups the feature vectors into clusters. The cluster centers form the codebook.
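To make the centering and normalization of Section 2.2 and the codebook index determination of this section concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy 2-dimensional frames (instead of 24 MFCC values), the hand-picked codebook, and the choice of the maximum absolute amplitude as the normalization peak are all assumptions made for illustration.

```python
import math

def center_and_normalize(signal):
    """Shift the signal to zero mean, then scale by its peak amplitude."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    # Assumption: "maximum amplitude" is read as the maximum absolute value.
    peak = max(abs(s) for s in centered)
    if peak == 0.0:
        return centered  # constant signal: nothing to scale
    return [s / peak for s in centered]

def quantize(frames, codebook):
    """Replace each feature vector by the index of its nearest codebook
    vector, using the Euclidean distance."""
    indices = []
    for frame in frames:
        distances = [math.dist(frame, center) for center in codebook]
        indices.append(distances.index(min(distances)))
    return indices

# Hypothetical toy data: 2 features per frame and a 2-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [0.4, 0.3]]

print(center_and_normalize([1.0, 2.0, 3.0]))
print(quantize(frames, codebook))
```

The list returned by `quantize` is the kind of discrete observation sequence that a discrete HMM consumes; in the paper the codebook itself comes from a clustering algorithm, which this sketch does not reproduce.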
After the codebook is constructed, the next step of VQ replaces each feature vector with the codebook vector that has the smallest Euclidean distance to it. The output of VQ is the input of the Hidden Markov Models.

2.5 Hidden Markov Models (HMM)

An HMM is a Markov chain whose states emit output symbols according to probability distributions [3, 9]. The observations for each state are described separately by a probability function or probability density function, which gives the probability of producing an observation in that state. Unlike the observable Markov model (OMM), an HMM is a doubly stochastic process: the underlying process is not directly observable (hidden) and can only be observed through another set of stochastic processes that produce the sequence of observations.

2.5.1 Basic Element

An HMM with discrete observation symbols has the following elements [3, 9]:

1. The HMM consists of $N$ states, labeled $\{1, 2, \ldots, N\}$; the state at time $t$ is denoted $q_t$. $N$ is a tested parameter in this study.
2. The number of observation symbols ($M$). An observation symbol is the output being modeled: $V = \{V_1, \ldots, V_M\}$.
3. The transition probability distribution from one state to another ($A$): $A = \{a_{ij}\}$, $1 \le i, j \le N$.
4. The observation probability distribution of the $k$-th symbol in the $j$-th state ($B$): $B = \{b_j(V_k)\}$, $1 \le j \le N$, $1 \le k \le M$.
5. The initial state probability distribution ($\pi$): $\pi_i = P(q_1 = i)$, $1 \le i \le N$.

An HMM requires the specification of two model parameters, $N$ and $M$, and of the three probability measures $A$, $B$, and $\pi$. The model is usually written as $\lambda = (A, B, \pi)$.

2.5.2 Basic problem and solution

There are three basic problems to be solved for an HMM [3, 9]:

1. Given an observation sequence $O = \{O_1, O_2, \ldots, O_T\}$ and a model $\lambda = (A, B, \pi)$, how can the probability of the observation sequence, $P(O \mid \lambda)$, be calculated efficiently?
2. Given an observation sequence $O = \{O_1, O_2, \ldots, O_T\}$ and a model $\lambda = (A, B, \pi)$, how can the optimal state sequence that best explains the observations be chosen?
3. How can the parameters of the model $\lambda = (A, B, \pi)$ be set to maximize $P(O \mid \lambda)$?

The solutions to the problems above are [3, 9]:

1. Evaluation (evaluation of probabilities)

The straightforward method is to examine every possible state sequence of length $T$ (the number of observations), which is not efficient. A simpler alternative is the forward and backward procedures.

A. Forward procedure

The forward variable $\alpha_t(i)$ at time $t$ and state $i$ is defined by

$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, q_t = i \mid \lambda)$$

The forward probability can be computed inductively for $N$ states and $T$ symbols with the following steps:

a) Initialization:
$$\alpha_1(i) = \pi_i\, b_i(O_1), \quad 1 \le i \le N \tag{1}$$

b) Induction:
$$\alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big]\, b_j(O_{t+1}), \quad 1 \le j \le N,\; 1 \le t \le T-1 \tag{2}$$

c) Termination:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i) \tag{3}$$

The forward probability is calculated on a trellis diagram. There are $N$ nodes in each time slot of the trellis, and all possible state sequences merge into these $N$ states.
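The forward procedure of Eqs. (1)-(3) can be sketched in a few lines of plain Python. The sketch also includes the inefficient method mentioned above (summing over every possible state sequence) so that the two can be compared. The function names and the 2-state, 2-symbol model are hypothetical and chosen only for illustration; observations are assumed to be integer symbol indices, and `B[i][k]` is assumed to hold $b_i(V_k)$.

```python
from itertools import product

def forward(pi, A, B, obs):
    """Return P(O | lambda) for a discrete HMM using Eqs. (1)-(3)."""
    n = len(pi)
    # Initialization, Eq. (1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction, Eq. (2)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination, Eq. (3)
    return sum(alpha)

def brute_force(pi, A, B, obs):
    """Sum the joint probability over all N**T state sequences."""
    n = len(pi)
    total = 0.0
    for states in product(range(n), repeat=len(obs)):
        p = pi[states[0]] * B[states[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
        total += p
    return total

# Hypothetical model: 2 states, 2 observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1, 0]

print(forward(pi, A, B, obs), brute_force(pi, A, B, obs))
```

Both functions return the same probability, but `forward` needs on the order of $N^2 T$ multiplications, whereas `brute_force` enumerates $N^T$ sequences.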
B. Backward procedure

The backward variable $\beta_t(i)$ at time $t$ and state $i$ is defined by

$$\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid q_t = i, \lambda)$$

The steps of the backward procedure are as follows:

a) Initialization:
$$\beta_T(i) = 1, \quad 1 \le i \le N \tag{4}$$

b) Induction:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N,\; t = T-1, T-2, \ldots, 1 \tag{5}$$

To be in state $i$ at time $t$ and account for the observations from time $t+1$ onward, every possible state $j$ at time $t+1$ is considered. Each term accounts for the transition from $i$ to $j$ ($a_{ij}$), the observation $O_{t+1}$ in state $j$ ($b_j(O_{t+1})$), and the remaining observations from state $j$ ($\beta_{t+1}(j)$).

C. Forward-backward procedure

The forward and backward procedures can be combined to obtain $P(O \mid \lambda)$. The probability of the observations up to time $t$, ending in state $i$, is given by the forward variable $\alpha_t(i)$. The backward variable gives the probability of the observation symbol sequence from time $t+1$ to $T$. The forward-backward procedure is expressed by the following formula:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j) \tag{6}$$

2. Decoding

The second problem is to find the hidden state sequence for a sequence of observations generated by the model. The solution finds the optimal state sequence with the Viterbi algorithm (dynamic programming). The Viterbi algorithm maximizes $P(Q \mid O, \lambda)$ and thus produces the optimal state sequence. By Bayes' rule, this probability is expressed as:

$$P(Q \mid O, \lambda) = \frac{P(Q, O \mid \lambda)}{P(O \mid \lambda)} \tag{7}$$

3. Training

The solution to the third problem is to adjust the model parameters (training) according to a given optimality criterion. The usual method is the Baum-Welch algorithm, an iterative method that finds a local maximum of the probability function. Training continues until a stopping criterion is met. Each re-estimated model should fit the training data better than the previous model.
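As an illustration of the backward procedure in Eqs. (4)-(5) and of Viterbi decoding, here is a minimal sketch in plain Python. The function names and the 2-state model are hypothetical. Because $P(O \mid \lambda)$ in Eq. (7) does not depend on $Q$, the sketch maximizes the joint probability $P(Q, O \mid \lambda)$, which yields the same state sequence. Baum-Welch training is not shown.

```python
def backward(pi, A, B, obs):
    """Return P(O | lambda) computed with the backward variable."""
    n = len(pi)
    beta = [1.0] * n  # Initialization, Eq. (4)
    # Induction, Eq. (5): t = T-1, ..., 1 consumes O_T, ..., O_2
    for o in reversed(obs[1:]):
        beta = [sum(A[i][j] * B[j][o] * beta[j] for j in range(n))
                for i in range(n)]
    # Combine beta_1 with the initial distribution and the first observation.
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(n))

def viterbi(pi, A, B, obs):
    """Return (best state path, P(Q, O | lambda)) for a discrete HMM."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            # Best predecessor of state j at this time step.
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backpointers.append(step)
        delta = new_delta
    # Trace the best path back from the most probable final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical model: 2 states, 2 observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]

print(backward(pi, A, B, [0, 1]))
print(viterbi(pi, A, B, [0, 1]))
```

For the same model and observations, `backward` returns the same probability as the forward procedure, which is a convenient consistency check when implementing both.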
2.6 Fuzzy Hidden Markov Models (FHMM)

The proposed FHMM does not implement vector quantization. It is replaced by Fuzzy C-Means Clustering, which has two functions. First, it obtains the codebook by clustering; the codebook is the set of cluster centers. Second, it converts the feature extraction output into data with a membership degree for each cluster. This data is the input of the Fuzzy Hidden Markov Models.

Figure 1. Speech classification using FHMM

As the block diagram shows, the system has two paths, training and testing. Both paths share the same stages: preprocessing, feature extraction, and Fuzzy C-Means Clustering. The system input is speech. The speech is normalized, and features are then extracted from the normalized speech. In training, Fuzzy C-Means Clustering is run to obtain the codebook. After the codebook is constructed, each feature vector is replaced with a row of membership degrees of the frame for each cluster. In testing, Fuzzy C-Means Clustering likewise replaces each feature vector with a row of membership degrees of the frame for each cluster. After Fuzzy C-Means Clustering, the training path performs the re-estimation process for the FHMM, and the testing path decides which reference model is most similar. The system output is text.

2.6.1 Fuzzy C-Means Clustering

The steps of Fuzzy C-Means Clustering are as follows [1]:

1. Input the initial data, a matrix $X$ of size $n \times m$ ($n$ = number of frames, $m$ = number of features).

2. Determine the parameters:
   a) Number of clusters ($c$): tested parameter
   b) Maximum number of iterations (MaxIter): 1000
   c) Smallest expected error ($\xi$): $10^{-5}$
   d) Initial iteration ($t$): 1
   e) Power ($w$): tested parameter

   The number of clusters indicates the number of variations of recognized sound: with 16 clusters there are 16 variations of recognized sound. The power of Fuzzy C-Means Clustering indicates the range of each cluster. With a power of 2 the cluster range is wider than with a power of 1.3, so data points receive higher membership degrees in clusters that are farther away.

3. Generate random values $\mu_{ik}$ ($i = 1, \ldots, n$; $k = 1, \ldots, c$) as the elements of the initial partition matrix $U$, whose size is the number of frames by the number of clusters. Then calculate the partition matrix ($\mu$) by normalizing each row:
$$Q_i = \sum_{k=1}^{c} \mu_{ik}, \qquad \mu_{ik} \leftarrow \frac{\mu_{ik}}{Q_i} \tag{8}$$

4. Calculate the $k$-th cluster center ($V_k$):
$$V_{kj} = \frac{\sum_{i=1}^{n} (\mu_{ik})^{w}\, X_{ij}}{\sum_{i=1}^{n} (\mu_{ik})^{w}} \tag{9}$$

5. Calculate the objective function ($P_t$) at iteration $t$:
$$P_t = \sum_{i=1}^{n} \sum_{k=1}^{c} \Big[\sum_{j=1}^{m} (X_{ij} - V_{kj})^2\Big] (\mu_{ik})^{w} \tag{10}$$

6. Update the partition matrix ($\mu$) at each iteration:
$$\mu_{ik} = \frac{\Big[\sum_{j=1}^{m} (X_{ij} - V_{kj})^2\Big]^{-\frac{1}{w-1}}}{\sum_{k=1}^{c} \Big[\sum_{j=1}^{m} (X_{ij} - V_{kj})^2\Big]^{-\frac{1}{w-1}}} \tag{11}$$

7. Check the stop condition:
   1) If the change in the objective function value is less than the expected error value, or the iteration count exceeds the maximum, that is, $(|P_t - P_{t-1}| < \xi)$ or $(t > \text{MaxIter})$, then stop.
   2) Otherwise, set $t = t + 1$ and repeat from step 4.

Fuzzy C-Means Clustering yields the cluster centers (the codebook). After the codebook is obtained, each feature vector is replaced with a row of membership degrees of the frame for each cluster. The membership degree of the data for each cluster ($O_{xz}$) is calculated with the following equation:

$$O_{xz} = \frac{\Big[\sum_{y=1}^{m} (X_{xy} - V_{zy})^2\Big]^{-\frac{1}{w-1}}}{\sum_{z=1}^{c} \Big[\sum_{y=1}^{m} (X_{xy} - V_{zy})^2\Big]^{-\frac{1}{w-1}}} \tag{12}$$

Note:
a) $x$: index of the frame of the observation data
b) $y$: index of the feature
c) $z$: index of the cluster

2.6.2 Fuzzy Forward-Backward

The difference between the HMM and the FHMM lies in the observation. In the HMM, each observation refers to a single codebook value for one frame. In the FHMM, each observation refers to one frame but carries a value for every codebook entry, each with a different membership degree. A new framework for the forward and backward calculation is therefore needed. This subsection also shows another fuzzy forward and backward calculation from the literature [5].

Initialization of the forward calculation ($t = 1$):

a. HMM:
$$\alpha_1(i) = \pi_i\, b_i(O_1) \tag{13}$$
b. Proposed FHMM:
$$\alpha_1(i) = \pi_i\, b_i(O_1) \tag{14}$$
c. Other FHMM:
$$\alpha_1(i) = \pi_i \sum_{k=1}^{c} u(k, 1)\, b_i(k) \tag{15}$$

Induction of the forward calculation ($t = 2, \ldots, T$):

a. HMM:
$$\alpha_t(j) = \Big[\sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\Big]\, b_j(O_t) \tag{16}$$
b. Proposed FHMM:
$$\alpha_t(j) = \Big[\sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\Big]\, b_j(O_t) \tag{17}$$
c. Other FHMM:
$$\alpha_t(j) = \Big[\sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\Big] \sum_{k=1}^{c} u(k, t)\, b_j(k) \tag{18}$$

Induction of the backward calculation ($t = T-1, \ldots, 1$):

a. HMM:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j) \tag{19}$$
b. Proposed FHMM:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j) \tag{20}$$
c. Other FHMM:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, \beta_{t+1}(j) \sum_{k=1}^{c} u(k, t+1)\, b_j(k) \tag{21}$$

Calculation of forward-backward:

a. HMM:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j) \tag{22}$$
b. Proposed FHMM:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j) \tag{23}$$
c. Other FHMM:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, \beta_{t+1}(j) \sum_{k=1}^{c} u(k, t+1)\, b_j(k) \tag{24}$$

Note:

a. In the proposed FHMM, the observation probability is
$$b_i(O_x) = \sum_{z=1}^{c} B_{iz}\, O_{xz} \tag{25}$$
where $B_{iz}$ is the entry of the observation probability distribution $B$ for cluster $z$ in state $i$, and $O_{xz}$ is the membership degree from (12). The input is thus observation data with a membership degree for each cluster, and the output is the observation probability of the $x$-th observation in the $i$-th state.

b. In the other FHMM,
$$u(k, t) = \text{similarity}(cb(k), O_t) \tag{26}$$
where $cb(k)$ is the cluster center vector with index $k$.

c. Similarity measures ($m$ represents the number of features):

Table 2. Similarity measure

| Measure | Formula | Eq. |
|---|---|---|
| Cosine similarity | $s(x_i, x_j) = \dfrac{\sum_{k=1}^{m} x_{ik}\, x_{jk}}{\sqrt{\sum_{k=1}^{m} x_{ik}^2 \sum_{k=1}^{m} x_{jk}^2}}$ | (27) |
| Manhattan distance | $d(x_i, x_j) = \sum_{k=1}^{m} \lvert x_{ik} - x_{jk} \rvert$ | (28) |
| Euclidean distance | $d(x_i, x_j) = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2}$ | (29) |
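The membership computation of Eq. (12) and the fuzzy observation probability of Eq. (25) can be sketched as follows, together with a forward pass that uses them as in Eqs. (14) and (17). This is a minimal illustration rather than the authors' code. The function names, the one-dimensional toy frames, the layout `B[i][z]` for the probability of cluster `z` in state `i`, and the handling of a frame that coincides exactly with a cluster center are all assumptions.

```python
def fcm_memberships(frame, centers, w):
    """Membership degree of one frame in every cluster, following Eq. (12)."""
    sq_dists = [sum((x - v) ** 2 for x, v in zip(frame, center))
                for center in centers]
    if 0.0 in sq_dists:
        # Assumption: a frame lying exactly on a center belongs fully to it.
        hits = [1.0 if d == 0.0 else 0.0 for d in sq_dists]
        return [h / sum(hits) for h in hits]
    weights = [d ** (-1.0 / (w - 1.0)) for d in sq_dists]
    total = sum(weights)
    return [wt / total for wt in weights]

def fuzzy_emission(B, memberships):
    """b_i(O_x) for every state i: membership-weighted sum, Eq. (25)."""
    return [sum(b_iz * o_xz for b_iz, o_xz in zip(row, memberships))
            for row in B]

def fuzzy_forward(pi, A, B, membership_seq):
    """Forward pass with fuzzy observation probabilities, Eqs. (14), (17)."""
    n = len(pi)
    b = fuzzy_emission(B, membership_seq[0])
    alpha = [pi[i] * b[i] for i in range(n)]
    for memberships in membership_seq[1:]:
        b = fuzzy_emission(B, memberships)
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * b[j]
                 for j in range(n)]
    return sum(alpha)

# Hypothetical toy setup: 1 feature per frame, 2 clusters, 2 states.
centers = [[0.0], [3.0]]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]

frames = [[1.0], [2.5]]
membership_seq = [fcm_memberships(f, centers, 2.0) for f in frames]
print(membership_seq)
print(fuzzy_forward(pi, A, B, membership_seq))
```

With one-hot membership rows, `fuzzy_forward` reduces to the ordinary discrete forward procedure, so the HMM is a special case of this formulation. Calling `fcm_memberships` with a smaller `w` gives sharper memberships and a larger `w` gives flatter ones. For `w` very close to 1 the exponent becomes large, so a production implementation would need extra numerical care.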
Note that the four formulas of the proposed forward and backward calculation differ from those of the HMM because every value $b(O)$ refers to all codebook entries with different degrees of membership.

3 EXPERIMENTAL RESULTS

3.1 Comparison of HMM and FHMM when the number of clusters is varied

The purpose of this experiment was to obtain the optimal number of clusters. The fixed variables were the number of states and the power (w). In this experiment, the number of states was 7 and the power (w) was 1.1.

Table 3. Accuracy when the number of clusters was varied

| Method | 16 clusters | 32 clusters |
|---|---|---|
| HMM | 66.67% | 80% |
| FHMM | 84.17% | 88.33% |

Table 3 shows the accuracies of HMM and FHMM when the number of clusters was varied. When the number of clusters increased, the accuracies of both FHMM and HMM increased. The optimal number of clusters was 32, which means that the system required 32 variants of recognized sound to obtain good accuracy. The experiment did not try 64 clusters because this study had only five recognized words, all with few phonemes. With 64 clusters, the system would become overspecialized.

3.2 Comparison of FHMM for each power (w)

The purpose of this experiment was to obtain the optimal power (w) of FHMM. The fixed variables were the number of states and the number of clusters. In this experiment, the number of states was 7 and the number of clusters was 32, the optimal number obtained from the experiment in Table 3.

Table 4. Accuracy when the power (w) was varied

| w | Accuracy |
|---|---|
| 1.05 | 92.5% |
| 1.1 | 88.33% |
| 1.3 | 83.33% |
| 1.5 | 65% |
| 1.7 | 46.67% |

Table 4 shows the FHMM accuracy for powers (w) ranging from 1.05 to 1.7. The optimal power (w) was 1.05, and the FHMM accuracy decreased as w increased. The result is explained by the following figure:

Figure 2. The influence of w

If the power (w) was 1.3, each data point had different degrees of membership for each cluster. In contrast, if the power (w) was 2, three clusters covered the same region and each data point had the same degree of membership for each cluster. This means that there is no difference among the observation data, and the system recognizes only one label.

3.3 Comparison of FHMM for each number of states

The purpose of this experiment was to obtain the optimal number of states. The fixed variables were the number of clusters and the power (w). In this experiment, the number of clusters was 32 and the power was 1.05. These were the optimal values obtained from the experiments in Tables 3 and 4.

Table 5. Accuracy when the number of states was varied

| Method | 5 states | 6 states | 7 states | 8 states | 9 states | 10 states |
|---|---|---|---|---|---|---|
| HMM | 80% | 86.67% | 80% | 86.67% | 89.17% | 85.83% |
| FHMM | 90% | 90% | 92.5% | 90% | 91.6% | 91.67% |

According to Table 5, the optimal number of states was 9 for HMM and 7 for FHMM. The accuracy showed no consistent dependence on the number of states: as the number of states increased, the accuracy sometimes increased and sometimes decreased.

3.4 Comparison of HMM and FHMM

The purpose of this experiment was to compare HMM and FHMM under their optimal conditions (best accuracy). The optimal condition of HMM was 32 clusters and 9 states. The optimal condition of FHMM was 32 clusters, a power (w) of 1.05, and 7 states.

Table 6. HMM and FHMM

| Method | Accuracy |
|---|---|
| HMM | 89.17% |
| FHMM | 92.50% |

According to Table 6, FHMM was better than HMM. FHMM improved on the HMM accuracy by 3.33 percentage points (92.50% versus 89.17%).

4 CONCLUSION AND RECOMMENDATION

4.1 Conclusion

From the analysis of the performance of FHMM on the data in this study, it can be concluded that the optimal condition of FHMM is 32 clusters, 7 states, and a power (w) of 1.05. Under this condition, the accuracy of FHMM is 92.50%, which is 3.33 percentage points better than the accuracy of HMM.

4.2 Recommendations for future works

Since the proposed method is an effective approach to Indonesian speech classification, it is strongly recommended to apply FHMM to a bigger database and to implement FHMM with a more efficient time complexity, because the proposed method needs a longer time than HMM.
Another recommendation is to use a higher frequency with FHMM.
REFERENCES

[1] Book: Sri Kusumadewi and Hari Purnomo: Aplikasi Logika Fuzzy untuk Pendukung Keputusan, Penerbit Graha Ilmu, pp. 84-85, 2004.
[2] Book: B. H. Juang and Lawrence R. Rabiner: Fundamentals of Speech Recognition, Prentice-Hall International, Inc, 1993.
[3] Journal: B. H. Juang and L. R. Rabiner: Hidden Markov Models for Speech Recognition, Technometrics, Vol. 33, No. 3, pp. 251-272, 1991.
[4] Journal: Dessi Puji Lestari, Koji Iwano, and Sadaoki Furui: A Large Vocabulary Continuous Speech Recognition System for Indonesian Language, 15th Indonesian Scientific Conference in Japan Proceedings, ISSN: 1881-4034, 2006.
[5] Journal: Harun Uguz, Ali Ozturk, Rıdvan Saracoglu, and Ahmet Arslan: A Biomedical System Based on Fuzzy Discrete Hidden Markov Model for The Diagnosis of The Brain Diseases, Expert Systems With Applications 35, pp. 1104-1114, 2008.
[6] Journal: Hammam Riza and Oskar Riandi: Toward Asian Speech Translation System: Developing Speech Recognition and Machine Translation for Indonesian Language, International Joint Conference on Natural Language Processing, 2008.
[7] Journal: Jia Zeng and Zhi-Qiang Liu: Interval Type-2 Fuzzy Hidden Markov Models, Proceedings of International Conference on Fuzzy Systems, vol. 2, pp. 123-1128, 2004.
[8] Journal: Jia Zeng and Zhi-Qiang Liu: Type-2 Fuzzy Hidden Markov Models to Phoneme Recognition, Proceedings of the 17th International Conference on Pattern Recognition, 2004.
[9] Journal: Lawrence R. Rabiner: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77, No. 2, 1989.
[10] Journal: Lei Chen, Sule Gunduz, and M. Tamer Ozsu: Mixed Type Audio Classification with Support Vector Machine, Proceedings of the IEEE International Conference on Multimedia and Expo, 2006.
[11] Journal: Patricia Melin, Jerica Urias, Daniel Solano, et al.: Voice Recognition with Neural Networks, Type-2 Fuzzy Logic and Genetic Algorithms, Engineering Letters, 13:2, 2006.
[12] Journal: Ramin Halavati, Saeed Bagheri Shouraki, Mahsa Eshraghi, and Milad Alemzadeh: A Novel Fuzzy Approach to Speech Processing, 5th Hybrid Intelligent Systems Conference, 2004.
[13] Journal: Sinout D. Shenouda, Fayez W. Zaki, and Amr Goneid: Hybrid Fuzzy HMM System for Arabic Connectionist Speech Recognition, Proceedings of the 5th WSEAS International Conference on Signal Processing, Robotics and Automation, pp. 64-69, 2006.
[14] Journal: Stephen E. Levinson, Lawrence R. Rabiner, Aaron E. Rosenberg, and Jay G. Wilpon: Interactive Clustering Techniques for Selecting Speaker-Independent Reference Templates For Isolated Word Recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-27, 1979.