A Novel Application of Fuzzy Set Theory and Topic Model in Sentence Extraction for Vietnamese Text


IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.8, August 2010

A Novel Application of Fuzzy Set Theory and Topic Model in Sentence Extraction for Vietnamese Text

Ha Nguyen Thi Thu (1) and Nguyen Thien Luan (2)
(1) Department of Computer Science, Electric Power University, 235 Hoang Quoc Viet, Ha Noi, Viet Nam
(2) Faculty of Information Technology, Le Qui Don Technical University, 100 Hoang Quoc Viet, Ha Noi, Viet Nam

Summary

Vietnamese shares a key characteristic with some other Asian languages such as Chinese, Japanese and Korean: words are not delimited by spaces. In this article, we present a method that applies fuzzy set theory and a topic model to extract sentences from Vietnamese texts that have been categorized by topic. The method identifies important features such as the length of a sentence, the weight of the terms in a sentence and the position of a sentence, and then extracts important sentences according to a ratio that determines how much of the original text is extracted. We also built a system based on this method; in our experiments it obtained good results that satisfied the given requirements.

Key words: Vietnamese text, sentence extraction, topic model, fuzzy set theory

1. Introduction

A summary is defined as "a text that is produced from one or more texts, that conveys important information in the original text(s), and that is no longer than half of the original text(s) and usually significantly less than that" (Radev et al., 2002) [1]. Lin (2000) proposed constructing an automatic text summarization system that includes a sentence-extraction process [3]. For a long time, most automatic text summarization systems have been based on extracting sentences from one or more texts and then combining them to make the summary. The earliest research on text summarization used sentence extraction based on specific features: word and phrase frequency (Luhn, 1958), the position of the sentence in the text (Baxendale, 1958), and important phrases (Edmundson, 1969) [1].
The frequency of a word is mainly computed with the tf * idf method. Most of these methods have been applied effectively to English. However, for Asian languages with single-syllable units such as Vietnamese, Chinese, Japanese or Korean, it is very difficult to separate words. Unlike in English, words in these languages cannot be determined from spaces alone [4]. Determining the weight of a sentence from the frequency of the words it contains is therefore not appropriate. In addition, research on Vietnamese text summarization is still new, and corpora for Vietnamese text summarization and word segmentation remain limited.

We studied Vietnamese texts categorized by topic and found that the weight of a word (hereafter called a term) depends on the topic. We have therefore built a set of terms for each topic and applied fuzzy set theory to determine the degree of membership of each term in a text, which makes it easier to identify the important sentences to extract.

The structure of this article is as follows. In Section 2, we introduce the structure of the automatic Vietnamese sentence extraction system. In Section 3, we present the methodology, which uses a topic model to build sets of terms and fuzzy set theory to determine the weight of each sentence in a text. Section 4 shows the results of our system and compares them with some other methods, and Section 5 is the conclusion.

2. Modeling of sentence extraction

Fuzzy set theory is quite effective for approximately solving problems of classification and identification of texts, scripts, images, sounds and so on. A fuzzy set A on a classical set X is defined as follows:

$$A = \{(x, \mu_A(x)) \mid x \in X\}, \qquad \mu_A : X \to [0, 1]$$

The membership function $\mu_A(x)$ quantifies the degree to which an element x of the base set X belongs to A. If the function returns 0 for an element, that element is not in the set; the value 1 describes a full member of the set. Values strictly between 0 and 1 describe fuzzy elements.
We propose the following model for automatically calculating the weight of sentences in Vietnamese text.

Manuscript received August 5, 2010. Manuscript revised August 20, 2010.

[Fig. 1: Calculating the weight of a sentence]

3. Method of sentence extraction

3.1 Topic Model

Vietnamese has no standard word segmentation, so in this research we have applied the topic model [4] to build a set of terms corresponding to each topic. First, we build sets of training documents that have been classified by topic. For each topic we have a training set:

$$D = \{d_1, d_2, \ldots, d_n\} \qquad (1)$$

where D is the training set, a collection of documents $d_i$ on the same topic. For each topic, we build a corresponding set of terms:

$$T = \{t_1, t_2, \ldots, t_m\} \qquad (2)$$

3.2 Degree of membership of terms in the training set

The degree of membership of a term in the training set indicates the importance of the term to the topic [6]. It is the number of texts containing the term divided by the total number of texts in the training set:

$$\mu_D(t) = \frac{\sum_{i=1}^{n} d_i(t)}{n} \qquad (3)$$

where $\mu_D(t)$ is the degree of membership of t in D, n is the number of documents in D, and

$$d_i(t) = \begin{cases} 1 & \text{if } t \in d_i \\ 0 & \text{if } t \notin d_i \end{cases}$$

3.3 Degree of membership of terms in the test document

While the degree of membership of a term in the training set indicates the importance of the term to the training set, the degree of membership of a term in the test document indicates its importance to the test document. It is the number of occurrences of term t divided by the total number of term occurrences in the test document:

$$\mu_C(t) = \frac{N(t)}{\sum_{i=1}^{m} N(t_i)} \qquad (4)$$

where $\mu_C(t)$ is the degree of membership of t in the test document C and $N(t)$ is the number of times t appears in C.

3.4 Calculating the importance of sentences

We propose the following method for calculating the weight of a sentence $s_i$:

$$F_i = a_1 \frac{\sum_{t \in s_i} \mu_D(t)}{\max\{\mu_D\}} + a_2 \frac{\sum_{t \in s_i} \mu_C(t)}{\max\{\mu_C\}} + a_3 \frac{|s_i|}{|S|} + a_4 \cdot position \qquad (5)$$

where $a_1, a_2, a_3, a_4$ are linear coefficients that weight, respectively, the degree of membership of the terms in the training set, the degree of membership of the terms in the test document, the length of $s_i$, and the position of $s_i$ in the document, and

$$position = \begin{cases} 1 & \text{if } s_i \text{ is at the head or foot of the document} \\ 0 & \text{otherwise} \end{cases}$$

The algorithm is described as follows:

ALGORITHM CALCULATING THE IMPORTANCE OF SENTENCES
Input: C: test document
Output: R: set of important sentences
1. Initialization: R ← Ø; S ← Ø
2. Sentence segmentation: S ← {s_1, s_2, ..., s_k}
3. Calculating the weight of each sentence in the test document:
   For each s_i in S
      L(s_i) ← 0
      For each term t in s_i
         L(s_i) ← L(s_i) + μ_D(t) + μ_C(t)
      F(s_i) ← L(s_i) + |s_i| / |S| + position

Fig. 2 Procedure for calculating the weight of a sentence

3.5 Method for extracting important sentences

After the weights of the sentences have been calculated, the sentences are sorted in descending order of weight and extracted according to the ratio r, which is the length of the set of extracted sentences divided by the length of the set of sentences in the original text. The sentence extraction algorithm is as follows:

SENTENCE EXTRACTION ALGORITHM
Input: C: test document; ratio r
Output: R: set of sentences
Begin
   R ← Ø; p ← 1
   // select sentences to extract; i_p is the index of the p-th heaviest sentence
   Do While length(R) + |s[i_p]| <= |C| * r
   Begin
      R ← R + s[i_p]
      t[i_p] ← true
      p ← p + 1
   End
   // arrange the selected sentences in their original order
   R ← Ø
   For i = 1 to n do
      if t[i] = true then R ← R + s[i]
End

Fig. 3 Procedure for selecting sentences

4. Result and evaluation

We evaluated this approach by applying it to summarize Vietnamese news and comparing it with other current approaches; our research shows interesting and satisfactory results.

Based on the method proposed above, we have built a system that calculates the importance of sentences and automatically extracts sentences from Vietnamese texts. We tested it on four topics: sport, technology, politics and economy. For each topic, we collected about 300 texts for training from the Vietnamese website http://www.vnexpress.net. The architecture of our automatic Vietnamese sentence extraction system is shown in Fig. 4.

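The sentence weighting of Section 3.4 and the selection procedure of Section 3.5 can be sketched as follows. Several points are assumptions made for illustration: sentences are lists of terms, |S| is taken to be the total number of terms in the document, length is measured in terms, all four coefficients default to 1.0, and the names `sentence_weight` and `extract_sentences` are hypothetical. Unlike the pseudocode in Fig. 3, the loop also stops when no sentences remain.

```python
def sentence_weight(terms, is_edge, doc_len, mu_d, mu_c,
                    coeffs=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four features in Eq. (5) for one sentence.

    terms   : list of terms in the sentence
    is_edge : True if the sentence is at the head or foot of the document
    doc_len : total number of terms in the document (assumed meaning of |S|)
    mu_d    : dict term -> degree of membership in the training set
    mu_c    : dict term -> degree of membership in the test document
    """
    a1, a2, a3, a4 = coeffs
    # Guard against empty or all-zero membership tables.
    max_d = max(mu_d.values(), default=0.0) or 1.0
    max_c = max(mu_c.values(), default=0.0) or 1.0
    train_part = sum(mu_d.get(t, 0.0) for t in terms) / max_d
    test_part = sum(mu_c.get(t, 0.0) for t in terms) / max_c
    length_part = len(terms) / doc_len if doc_len else 0.0
    position = 1.0 if is_edge else 0.0
    return a1 * train_part + a2 * test_part + a3 * length_part + a4 * position


def extract_sentences(sentences, weights, ratio):
    """Pick the heaviest sentences within the length budget |C| * r,
    then return them in their original order."""
    budget = ratio * sum(len(s) for s in sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    chosen, used = set(), 0
    for i in ranked:
        if used + len(sentences[i]) > budget:
            break  # same stopping rule as the While loop in Fig. 3
        chosen.add(i)
        used += len(sentences[i])
    return [sentences[i] for i in sorted(chosen)]
```

Stopping at the first sentence that does not fit mirrors the pseudocode; a variant that skips it and tries shorter sentences would fill the budget more tightly but would no longer follow Fig. 3.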
Fig. 4 Architecture for extraction

At present, there is no standard assessment method for Vietnamese summarization. We therefore compared our results with the extracts produced by the method of Le Thanh Ha et al. [Le Thanh Ha, Quyet Thang Huynh, Mai Luong, 2005]. The comparison shows that our extraction results are better and more stable than theirs. In addition, we compared our results with an online summary system, http://smmry.com, and with extracts produced by humans.

To evaluate our method against the online summary system, we downloaded English documents from http://news.bbc.co.uk for experimentation. We translated them into Vietnamese, applied our system to extract sentences, translated the extracted sentences back into English, and compared them with the result of smmry.com.

Real Madrid are set to complete the signing of Portugal centre-back Ricardo Carvalho from Chelsea. The Spanish giants have agreed a £6.7m fee to sign the 32-year-old, who will link up once again with former Chelsea boss Jose Mourinho. "Chelsea can confirm it has agreed terms with Real Madrid for the transfer of Carvalho," said a club statement. Carvalho, who was also with Mourinho at Porto, joined Chelsea in 2004 for £19.8m and won the league three times. The Chelsea statement added: "The transfer is subject to a medical and the player agreeing personal terms. Chelsea would like to thank Riccy for his six years of service, and we wish him well in his future career." As well as winning three Premier League titles during his time at Stamford Bridge, Carvalho also won the FA Cup three times, the League Cup twice and the Community Shield twice. He becomes Real's fifth signing of the summer, following Portugal's Angel di Maria, Spanish duo Pedro Leon and Sergio Canales and Germany World Cup star Sami Khedira.

A statement on the Spanish club's website began: "Best known for his perfect positioning, his foresight and his ability to advance the ball out of the first third, Ricardo Alberto Silveira Carvalho joins Real Madrid at the age of 32 and with 19 titles to speak of." Last month, Carvalho told a Spanish newspaper it would be a dream come true to join Mourinho at the Bernabeu. "If there was a possibility to sign with Real Madrid, I would go there right now swimming or running," he said. "It would be a dream to be able to play for Madrid, which I consider to be the best club in the world, and follow the orders of the best coach in the history of football. "With Mourinho I experienced two marvellous stages at Porto and Chelsea. To have the opportunity to win another Champions League with him at Real Madrid would be tremendous." Carvalho played more than 200 games for Chelsea, scoring 11 goals. He is the fifth player to leave the Stamford Bridge club this summer after Joe Cole, Michael Ballack, Juliano Belletti and Deco.

Fig. 5 Original text for experiment

From the original text, we extracted the five most important sentences. The following is the result produced by the online summary system smmry.com.

SENTENCES EXTRACTED BY SMMRY.COM

The Spanish giants have agreed a £6.7m fee to sign the 32-year-old, who will link up once again with former Chelsea boss Jose Mourinho. "Chelsea can confirm it has agreed terms with Real Madrid for the transfer of Carvalho," said a club statement. Carvalho, who was with Mourinho at Porto, joined Chelsea in 2004 for £19. "It would be a dream to be able to play for Madrid, which I consider to be the best club in the world, and follow the orders of the best coach in the history of football. "With Mourinho I experienced two marvellous stages at Porto and Chelsea. Have the opportunity to win another Champions League with him at Real Madrid would be remendous." Carvalho played more than 200 games for Chelsea, scoring 11 goals.

Fig. 6 Extracted by smmry.com

We then used our system to extract sentences from the original text. The following is the result produced by our system.

SENTENCES EXTRACTED BY OUR SYSTEM

Chelsea có thể xác nhận nó đã đồng ý các điều khoản với Real Madrid về việc chuyển nhượng của Carvalho," một câu lạc bộ tuyên bố Carvalho cùng với Mourinho tại Porto, gia nhập Chelsea vào năm 2004 với giá 19.8m và chiến thắng ở giải Vô địch ba lần. Chelsea tuyên bố thêm: "chuyển giao này phải được một y tế và người chơi đồng ý các điều khoản cá nhân. Chelsea muốn cảm ơn Riccy sau sáu năm phục vụ, và chúng tôi mong muốn sự nghiệp tương lai của anh ấy tốt." Cũng như giành ba danh hiệu Premier League trong thời gian của mình tại Stamford Bridge, Carvalho cũng đã giành FA Cup ba lần, League Cup hai lần và Community Shield hai lần. Có cơ hội để giành chiến thắng khác với Champions League tại Real Madrid "Carvalho đã chơi hơn 200 trò chơi cho Chelsea, ghi được 11 bàn thắng.. Anh là cầu thủ thứ năm rời khỏi câu lạc bộ sân Stamford Bridge mùa hè này sau Joe Cole, Michael Ballack, Juliano Belletti và Deco.

TRANSLATED TO ENGLISH

"Chelsea can confirm it has agreed terms with Real Madrid for the transfer of Carvalho," said a club statement. Carvalho, who was also with Mourinho at Porto, joined Chelsea in 2004 for £19.8m and won the league three times. The Chelsea statement added: "The transfer is subject to a medical and the player agreeing personal terms. Chelsea would like to thank Riccy for his six years of service, and we wish him well in his future career." As well as winning three Premier League titles during his time at Stamford Bridge, Carvalho also won the FA Cup three times, the League Cup twice and the Community Shield twice. Have the opportunity to win another Champions League with him at Real Madrid would be remendous." Carvalho played more than 200 games for Chelsea, scoring 11 goals. He is the fifth player to leave the Stamford Bridge club this summer after Joe Cole, Michael Ballack, Juliano Belletti and Deco.

Fig. 7 Extracted by our system
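The comparison of system extracts with human extracts in this evaluation relies on precision, the share of system-extracted sentences that a human also extracted. A minimal sketch, assuming each extract is given as a collection of sentence indices into the original text (the function name `precision` and this representation are illustrative assumptions, not part of the described system):

```python
def precision(system_extract, human_extract):
    """Precision = correct / (correct + wrong).

    correct: sentences extracted by both the system and the human
    wrong  : sentences extracted by the system but not by the human
    """
    system = set(system_extract)
    human = set(human_extract)
    if not system:
        return 0.0  # nothing extracted, so no correct sentences
    correct = len(system & human)
    wrong = len(system - human)
    return correct / (correct + wrong)
```

For instance, if the system extracts sentences 1, 2, 3 and 4 and the human extracts 2, 3 and 5, two of the four system sentences are correct and the precision is 0.5.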

Precision, the traditional assessment measure, is given by:

$$\text{Precision} = \frac{\text{correct}}{\text{correct} + \text{wrong}}$$

where correct is the number of sentences extracted by both the human and the system, and wrong is the number of sentences extracted by the system but not by the human.

Table 1: Traditional evaluation (precision) by compression rate

Method          80%     60%     40%     20%
smmry.com       0.761   0.783   0.697   0.61
Ours            0.875   0.802   0.67    0.64
Human           0.91    0.85    0.863   0.82
Le Thanh Ha     0.62    0.754   0.698   0.543

5. Conclusion

This article presents the application of fuzzy set theory and a topic model to calculate the importance of sentences in the extraction process, one of the processes in an automatic Vietnamese text summarization system. Based on the extracted sentences, we further reduced the sentences to make the summary more compact in length and more concise in content and meaning. As Vietnamese is very similar to Chinese (over 80% of Vietnamese words are borrowed from Chinese), Japanese and Korean, we expect that this method, once applied well to Vietnamese, can also be applied to Chinese, Japanese and Korean.

Acknowledgments

We would like to thank the experts of the University of Engineering and Technology, Vietnam National University, and the Japan Advanced Institute of Science and Technology, Dr Nguyen Le Minh, Dr Nguyen Huu Quynh, Dr Nguyen Van Vinh and Dr Nguyen Phuong Thai, for their great help in building the experimental summarization application.

References

[1] Dipanjan Das and Andre F.T. Martins (2007). A Survey on Automatic Text Summarization.
[2] Chin-Yew Lin and Eduard Hovy. The Potential and Limitations of Automatic Sentence Extraction for Summarization. In Proceedings of the HLT-NAACL 2003 Workshop on Automatic Summarization, May 30 to June 1, 2003, Edmonton, Canada.
[3] Hongyan Jing and Kathleen R. McKeown. Cut and paste based text summarization. In Proceedings of the 1st Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-2000), pages 178-185, 2000.
[4] Mark Steyvers and Tom Griffiths. Probabilistic Topic Models.
[5] Thanh, Le Ha; Quyet, Thang Huynh; Chi, Mai Luong. A Primary Study on Summarization of Documents in Vietnamese. Proceedings of the First World Congress of the International Federation for Systems Research, Nov. 14-17, 2005, Kobe, Japan.
[6] Dwi H. Widyantoro and John Yen. A Fuzzy Similarity Approach in Text Classification Task. Department of Computer Science, Texas A&M University, College Station, TX 77844-3112.
[7] Minh, Le Nguyen; Shimazu, Akira; Xuan, Hieu Phan; Tu, Bao Ho; Horiguchi, Susumu. Sentence Extraction with Support Vector Machine Ensemble. Proceedings of the First World Congress of the International Federation for Systems Research, Nov. 14-17, 2005, Kobe, Japan, Symposium 5.
[8] K. Han, Y. Song, and H. Rim. KU Text Summarization System for DUC 2003. In Document Understanding Conference, Draft Papers, pages 118-121, 2003.
[9] C.-Y. Lin. Improving Summarization Performance by Sentence Compression - A Pilot Study. In Proceedings of the International Workshop on Information Retrieval with Asian Language, pages 1-8, 2003.
[10] C.-Y. Lin and E. Hovy. The Potential and Limitations of Automatic Sentence Extraction for Summarization. In Text Summarization: Proceedings of the HLT-NAACL Workshop, pages 73-80, 2003.

Ha Nguyen Thi Thu is a Lecturer with Viet Nam Electric Power University (Ha Noi, Viet Nam). She is a Fellow. She is interested in Natural Language Processing (NLP), Data Mining and Machine Learning. She received her M.E. degree from Guilin University of Electronic Technology, China, in 2006.

Nguyen Thien Luan is a Lecturer with the Le Qui Don Technical University (Ha Noi, Viet Nam). His research interests include fuzzy logic, image processing, communication and network security. He has authored or co-authored more than 20 scientific articles, book chapters and reports, and has chaired many scientific research projects in the areas of his research. He received his Ph.D. in 1989.