
International Journal of Software Engineering & Applications (IJSEA), Vol.4, No.5, September 2013

DETECTION AND REFACTORING OF BAD SMELL CAUSED BY LARGE SCALE

Jiang Dexun 1, Ma Peijun 2, Su Xiaohong 3, Wang Tiantian 4

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
1 negrocanfly@163.com, 2 silverghost192@163.cn, 3 suxh@hit.edu.cn, 4 wangtt@hit.edu.cn

ABSTRACT

Bad smells are signs of potential problems in code. Detecting them, however, remains time-consuming for software engineers, even though a number of bad smell detection and refactoring tools have been proposed. Large Class is a bad smell caused by large scale, and it is hard to detect automatically. This paper proposes a Large Class detection approach based on a class length distribution model and cohesion metrics. Class lengths in a program follow a certain distribution. The classes of a program are first grouped by length, and the class length distribution model is then applied to the grouped data. Cohesion metrics are also analyzed for bad smell detection. Detection experiments on open source programs show that this approach detects the Large Class bad smell effectively and accurately, and that it can produce a refactoring scheme that improves the design quality of programs.

KEYWORDS

Distribution rule; Class length distribution model; Cohesion metrics; Bad smell detection; Refactoring scheme

1. INTRODUCTION

As software development has advanced, the number of analysis tools available for detecting bad smells has increased significantly. These tools are gaining acceptance in practice. However, they do little to detect some bad smells, such as Long Method, Large Class and Long Parameter List [1], and this reduces their effectiveness. These bad smells are fundamentally about program structure and components. Software programs are built from components at every level, and each higher-level component is composed of lower-level ones. Characters compose keywords, while keywords, variables and operators compose statements.
The composition levels of object-oriented programs are shown in Figure 1.

DOI: 10.5121/ijsea.2013.4501

Figure 1. Composition levels of object-oriented programs.

Large Class [1] is one of the classical bad smells, and it means that a class is too large. A class may become large because it has many instance variables or many methods. Large Class has a long history, but its detection has always been vague. According to its definition [1], this bad smell should be detected from class length statistics, and class length is usually measured in lines of code. In practice, however, it is difficult to choose a threshold that decides whether a particular class is too large. The Large Class bad smell is therefore hard to detect, particularly in business and open source programs, and a fixed threshold is poorly suited to the task.

This paper proposes a Large Class detection method based on scale distribution. First, the lengths of all classes in a program are extracted and used to build a distribution model of class scale. The groups that lie farthest from the distribution curve become candidate groups for the Large Class bad smell. The cohesion metrics of the classes in these groups are then measured to confirm which classes are Large Classes.

The rest of the paper is organized as follows. Section 2 gives a short overview of related work. Section 3 builds the class length distribution model to describe the distribution rules of class length. Section 4 uses this model, together with cohesion metrics, to propose the Large Class detection method. Section 5 discusses how to derive a proper refactoring scheme, and Section 6 presents the experimental results. Section 7 concludes the paper.

2. RELATED WORK

Over the past decades, many studies have addressed bad smells in program code. Webster [2] introduced smells in the context of object-oriented programming and classified them as conceptual, political, coding, and quality assurance pitfalls.
Riel [3] defined 61 heuristics characterizing good object-oriented programming. They let engineers assess the quality of their systems manually and provide a basis for improving design and implementation. Beck and Fowler [1] compiled 22 code smells, which are design problems in source code and form the basis for suggesting refactorings. Travassos et al. [4] introduced a process based on manual inspections and reading techniques to identify smells. Manual detection of bad smells, however, is time-consuming and error-prone, so researchers have turned their attention to automatic detection. Marinescu [5] presented a metric-based approach that detects code smells with detection strategies, implemented in the iPlasma tool. Tahvildari and Kontogiannis [6] used an object-oriented metrics suite of complexity, coupling, and cohesion metrics to detect classes whose quality has deteriorated and to re-engineer the detected design flaws. A limitation of their approach is that it indicates the kind of transformation required but does not specify which methods, attributes, or classes it should apply to (this step requires human interpretation). O'Keeffe and Ó Cinnéide [7] treated object-oriented design as a search problem in the space of alternative designs, which applies search-based approaches to optimization problems in software engineering. Metric-based detection requires choosing suitable metrics and fixing the judging thresholds in advance.

Some approaches use visualization techniques to analyze complex software. These semi-automatic approaches are interesting compromises between fully automatic detection, which can be efficient but loses track of context, and manual inspection, which is slow and inaccurate [8, 9]. However, they require human expertise and are therefore still time-consuming. Other approaches detect smells fully automatically and use visualization only to present the results [10, 11], but the visual results still require manual intervention.

Some cohesion-related bad smells can be detected with distance theory. Simon et al. [12] defined a distance-based metric to measure the cohesion between attributes and methods. The approach in this paper draws on [12] in that it also uses the Jaccard distance, but it introduces several new definitions and processes as improvements. The distance metric is defined not only among entities (attributes and methods) but also between classes. In [13], distances between entities and classes are defined to measure the cohesion among them. Distance-based detection requires more computation. This paper uses the entity-to-class distance equation to compute the cohesion degree of a class.

There is little research on detecting the Large Class bad smell.
Liu et al. [14] proposed a detection and resolution sequence for different kinds of bad smells, including Large Class, to simplify their detection and resolution. However, that work focuses on scheduling the detection rather than on Large Class detection itself, and it does not describe a specific detection process. Class size measures have been introduced for Large Class detection: a class whose size is large is regarded as a Large Class. In bad smell detection tools, class size is mainly measured [15] by the number of lines of code (NLOC) or by the number of attributes and methods. PMD [16] and Checkstyle [17] both use NLOC as their detection strategy, with thresholds of 1000 and 2000 respectively. A fixed threshold is poorly suited to Large Class detection and easily causes false detections. These tools also offer no refactoring support for the Large Class bad smell.

This research shows that Large Class detection has so far relied on comparison with a fixed threshold. Because the threshold is chosen manually, it has low objectivity. The refactoring method is also decided manually, and no suggestion or scheme is provided.

3. THE DISTRIBUTION OF CLASS LENGTH

3.1. Class length distribution appearance

Object-oriented programs contain many classes of differing length. In this paper, a class is called a larger class if its length exceeds the average class length of the program, and a smaller class otherwise.
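The larger/smaller classification above can be sketched as follows. This is a minimal illustration that assumes class lengths are already available as a list of line counts. The helper name `split_by_average` and the sample lengths are hypothetical, and counting a class exactly at the average as "smaller" is an assumption.

```python
from statistics import mean


def split_by_average(lengths):
    """Classify class lengths (lines of code) as larger or smaller classes.

    A class longer than the program's average class length is a "larger"
    class; every other class (including one exactly at the average, an
    assumption) is a "smaller" class.
    """
    avg = mean(lengths)
    larger = [a for a in lengths if a > avg]
    smaller = [a for a in lengths if a <= avg]
    return avg, larger, smaller


# Hypothetical class lengths for a toy program (not data from the paper).
lengths = [12, 30, 45, 60, 80, 95, 110, 400, 900]
avg, larger, smaller = split_by_average(lengths)
print(f"average={avg:.1f}, larger={len(larger)}, smaller={len(smaller)}")
# average=192.4, larger=2, smaller=7
```

In this toy input, two long classes pull the average well above the typical class length, so only two of the nine classes count as larger.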

In some programs a larger share of classes are larger classes, while in others the opposite holds. This depends on the functions each program must provide, and also on the coding habits and programming styles of its developers. Table 1 lists the class length statistics of some open source programs.

Table 1. Class length statistics of open source programs.

Program | Number of classes | Average length | Larger classes | Smaller classes
HSQLDB-2.2.7 | 111 | 500 | 29.73% | 70.27%
Tyrant-0.96 | 117 | 101 | 27.35% | 72.65%
Tyrant-0.334 | 262 | 169 | 27.10% | 72.90%
swingwt-0.60 | 44 | 41 | 22.73% | 77.27%
Trama | 16 | 249 | 25% | 75%
ArgoUML | 1874 | 91 | 19.80% | 80.20%
Spring Frk | 1531 | 57 | 27.69% | 72.31%
Azureus_Vuze4812 | 1597 | 129 | 28.18% | 71.82%

A fixed threshold can cause mistakes in Large Class detection. For some programs (such as HSQLDB-2.2.7 in Table 1) it flags most classes as too large, while for others (such as Spring Frk) it finds no Large Class bad smell at all. The fixed threshold is also set manually, which lowers its objectivity. Because programs differ in coding habits and programming styles, fixed-threshold detection results are inaccurate and unconvincing.

Table 1 also shows that larger classes are the minority in every program, with a ratio of larger to smaller classes roughly between 1:4 and 3:7. Functionally, common classes, especially frequently used ones, are usually designed to be small and easy to use. Large classes, by contrast, are designed for complex functionality and computational algorithms. Programs contain more simple, common classes than complex ones, so smaller classes form the majority of the class length statistics and larger classes the minority.
In addition, some classes are created during functional design to provide certain functions but are never fully coded. This is especially visible when several versions of a program design are compared. Such classes may contain only a few member variables, some comments, or even just a class name. These unfinished classes further add to the number of smaller classes. This analysis of the statistics and of program design makes the numerical relationship between larger and smaller classes clear, and leads to the following conjecture.

Conjecture: the class length statistics of programs conform to a certain distribution rule.

This distribution rule must be verified against program statistics.

3.2. Verification of the distribution conjecture

The curve fitting process for class length statistics is shown in Figure 2.

Figure 2. Process of class length statistics curve fitting.

3.2.1 Obtain the data

Collect the number of classes and the length of each class, with class length measured in lines of code. Let $n$ be the number of classes in the program and $A_i$ the length of class $C_i$, $i = 1, 2, \ldots, n$.

3.2.2 Data statistics

Grouping

The classes are grouped according to Sturges' equation:

$$N = 1 + 3.32\,\lg n \tag{1}$$

where $n$ is the number of classes. Equation (1) divides the classes into $N$ groups, named $G_i$, $i = 1, 2, \ldots, N$.

Getting the interval scope

Find the maximum length $A_{max}$ and minimum length $A_{min}$ over all classes, and the span $X = A_{max} - A_{min}$. The interval $[A_{min}, A_{max}]$ is divided into $N$ parts, so each sub-interval has length $m = X / N$, and the interval of group $G_i$ is $[A_{min} + (i-1)m,\; A_{min} + i\,m]$, $i = 1, 2, \ldots, N$.

Class number statistics

The number of classes in group $G_i$ is denoted $P_i$, and the counting algorithm is shown in Figure 3.
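The grouping and counting steps of Section 3.2.2 can be sketched in Python as follows. This minimal sketch rests on two assumptions the paper does not state. First, $N$ from Equation (1) is rounded up to an integer. Second, each class is assigned to exactly one group using half-open sub-intervals, with the longest class placed in the last group; the closed intervals in Figure 3 would count a class lying exactly on a boundary twice. The function names are hypothetical.

```python
import math


def sturges_groups(n):
    """Number of groups N = 1 + 3.32 * lg(n), Equation (1), rounded up (assumed)."""
    return math.ceil(1 + 3.32 * math.log10(n))


def count_per_group(lengths):
    """Return P = [P_1, ..., P_N], the number of classes in each group G_i.

    Group G_i covers [A_min + (i-1)*m, A_min + i*m) with m = (A_max - A_min) / N;
    the longest class (A_max) is clamped into the last group.
    """
    N = sturges_groups(len(lengths))
    a_min, a_max = min(lengths), max(lengths)
    m = (a_max - a_min) / N
    P = [0] * N
    for a in lengths:
        i = int((a - a_min) / m) if m > 0 else 0  # 0-based group index
        P[min(i, N - 1)] += 1
    return P


print(sturges_groups(111), count_per_group(list(range(1, 17))))
# 8 [3, 3, 3, 3, 4]
```

Assigning each class to a single group keeps $\sum_i P_i = n$, which the residual computation in Section 4.1 implicitly relies on when it compares $P_i$ with the fitted curve.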

Algorithm: Class number statistics
Input: class lengths A_i, i = 1, 2, ..., n
Output: P_j, j = 1, 2, ..., N
Begin
  Foreach (A_i)
    Foreach (j = 1, 2, ..., N)
      If (A_i ∈ [A_min + (j - 1)m, A_min + j·m])
        P_j++;
      EndIf
    EndFor
  EndFor
End

Figure 3. Class number statistics algorithm.

When the algorithm finishes, the vector $P$ has been filled in.

3.2.3 Curve fitting

Graphical vector

The group index is used as the x-axis data and $P_i$ as the y-axis data. This creates a series of points in rectangular coordinates that represents the class length statistics.

Curve fitting

The point set of class length statistics is fitted with the curve that gives the least Mean Squared Error (MSE), and the fitting is performed for each candidate type of statistical curve. After the class length statistics of a large number of open source programs were fitted, the exponential curve was found to be the optimal fitting curve:

$$y = y_0 + A\,e^{R_0 x} \tag{2}$$

The residual threshold $T$ is calculated from the statistics of a large number of programs, as the average group MSE obtained when fitting the open source programs. This threshold is used for bad smell detection. If, for a program under inspection, the residual $R$ of a group exceeds $T$, that group contains the Large Class bad smell, and the number of bad smell classes in the group is $R - T$.
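The fitting and residual check can be sketched without a numerical library. For a fixed $R_0$, Equation (2) is linear in $y_0$ and $A$. The sketch therefore scans a grid of $R_0$ values, solves the two-parameter least-squares problem in closed form for each, and keeps the fit with the least MSE. The grid scan and its range, averaging MSEs across reference programs to get $T$, rounding the class count up to a whole number, and all helper names are assumptions for illustration rather than the paper's fitting procedure. The group index $x$ starts at 1.

```python
import math


def fit_exponential(xs, ys, r0_grid=None):
    """Least-squares fit of y = y0 + A * exp(R0 * x), as in Equation (2).

    For each candidate R0 the model is linear in (y0, A), so the normal
    equations are solved in closed form; the candidate with the least MSE
    is returned as (y0, A, R0, MSE). The R0 grid is an illustrative choice.
    """
    if r0_grid is None:
        r0_grid = [k / 100 for k in range(-300, 0)]  # R0 from -3.00 to -0.01
    n = len(xs)
    best = None
    for r0 in r0_grid:
        e = [math.exp(r0 * x) for x in xs]
        s_e, s_y = sum(e), sum(ys)
        s_ee = sum(v * v for v in e)
        s_ey = sum(v * y for v, y in zip(e, ys))
        det = n * s_ee - s_e * s_e
        if abs(det) < 1e-12:  # basis nearly collinear: skip this R0
            continue
        a = (n * s_ey - s_e * s_y) / det
        y0 = (s_y - a * s_e) / n
        mse = sum((y0 + a * v - y) ** 2 for v, y in zip(e, ys)) / n
        if best is None or mse < best[3]:
            best = (y0, a, r0, mse)
    return best


def residual_threshold(reference_mses):
    """T: the mean fitting MSE over reference programs (assumed reading)."""
    return sum(reference_mses) / len(reference_mses)


def locate_bad_smell_groups(P, y0, a, r0, T):
    """Map each group index i (1-based) whose residual P_i - y_i exceeds T
    to its number of bad smell classes, R - T rounded up (assumption)."""
    flagged = {}
    for i, p in enumerate(P, start=1):
        r_plus = p - (y0 + a * math.exp(r0 * i))
        if r_plus > T:
            flagged[i] = math.ceil(r_plus - T)
    return flagged
```

For example, with the fitted curve $y = 1 + 40e^{-0.7x}$ and $T = 0.5$, the group counts P = [21, 11, 6, 6, 2] flag only group 4. Its residual is about 2.57, so the sketch reports $\lceil 2.07 \rceil = 3$ candidate classes there.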

4. BAD SMELL DETECTION

The programs under inspection are usually open source programs containing many classes. The detection method takes source code as input and outputs the bad smell classes.

4.1. Bad smell location in group

Sturges' equation is used to divide the classes into groups by length, which yields an N-dimensional vector $P = \{P_1, P_2, \ldots, P_N\}$. This vector is fitted with an exponential curve in rectangular coordinates, and the optimal fitting curve, with the least MSE, is

$$y = y_0' + A'\,e^{R_0' x} \tag{3}$$

After curve fitting, the positive residual of group $G_i$ is

$$R_i^+ = P_i - y_i \tag{4}$$

where $P_i$ is the number of classes in group $G_i$ and $y_i$ is the value of curve (3) at $x = i$. If $R_i^+ > T$, group $G_i$ contains bad smell classes, and their number $N_i$ is computed with Equation (5):

$$N_i = R_i^+ - T \tag{5}$$

4.2. Bad smell location in class

The group location above shows that the bad smell groups are not necessarily the groups of the longest classes. In the same way, the identification method does not simply select the $N_i$ longest classes. This is the key to Large Class detection: the decision rests on the metrics of all the classes rather than on the metrics of the target class alone (its length or anything else). In this paper, bad smell classes are located using the inner cohesion of classes, and the cohesion metric is defined with entity distance theory. Entity distance theory requires the following definitions.

Definition 1 (Entity): an entity, denoted $E$, is an attribute $a$ or a method $m$ of a class.

Definition 2 (Property Set): the property set of an entity $E$, denoted $P(E)$, is the set of entities that have invoking relations with $E$. A method and an attribute, or two methods, have an invoking relation when the method uses (accesses or calls) the other. More precisely, $P(a)$ contains $a$ itself and all the methods that use $a$. $P(m)$ contains $m$ itself and all the attributes and methods that $m$ uses.
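Definition 2 amounts to building $P(E)$ from a "uses" relation. The sketch below assumes the relation is a dictionary that maps every method to the set of attributes and methods it accesses or calls, with methods that use nothing mapped to an empty set. Entity names are plain strings, and the function name and example entities are hypothetical.

```python
def property_sets(uses, attributes):
    """Build P(E) for every entity, following Definition 2.

    uses: dict mapping each method to the attributes/methods it uses.
    attributes: iterable of attribute names.
    P(m) = {m} plus everything m uses; P(a) = {a} plus every method using a.
    """
    prop = {m: {m} | set(used) for m, used in uses.items()}
    for a in attributes:
        prop[a] = {a} | {m for m, used in uses.items() if a in used}
    return prop


# Hypothetical class: getBalance reads balance; deposit reads balance and calls log.
uses = {"getBalance": {"balance"}, "deposit": {"balance", "log"}, "log": set()}
prop = property_sets(uses, ["balance"])
print(sorted(prop["balance"]))  # ['balance', 'deposit', 'getBalance']
print(sorted(prop["deposit"]))  # ['balance', 'deposit', 'log']
```

Following the definition as written, $P(m)$ lists only what $m$ uses, not the methods that call $m$, so here P(log) is just {log}.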
Definition 3 (Distance): the distance between entities $E_1$ and $E_2$ is

$$Dist(E_1, E_2) = 1 - \frac{|P(E_1) \cap P(E_2)|}{|P(E_1) \cup P(E_2)|} \tag{6}$$

where $|P(x)|$ is the number of members of $P(x)$. The distance between an entity $e$ and a class $C$ is the average of the distances between $e$ and the entities of $C$:

$$D(e, C) = \begin{cases} \dfrac{1}{|E_C| - 1} \displaystyle\sum_{y \in E_C,\, y \neq e} Dist(e, y), & e \in E_C \\[2ex] \dfrac{1}{|E_C|} \displaystyle\sum_{y \in E_C} Dist(e, y), & e \notin E_C \end{cases} \tag{7}$$

where $E_C$ is the set of entities that $C$ contains.

Definition 4 (Cohesion Metric): the cohesion metric of a class $C$ is the ratio of two averages. The numerator is the average distance to $C$ of the entities outside $C$, and the denominator is the average distance to $C$ of the entities inside $C$:

$$Cohesion(C) = \frac{\dfrac{1}{|\{e : e \notin C\}|}\displaystyle\sum_{e \notin C} D(e, C)}{\dfrac{1}{|\{e : e \in C\}|}\displaystyle\sum_{e \in C} D(e, C)} \tag{8}$$

A smaller cohesion metric value means a lower degree of cohesion. Therefore, in each bad smell group $G_i$, the $N_i$ classes with the smallest cohesion metric values are identified as having the Large Class bad smell.

5. REFACTORING SCHEME

This section refactors the classes confirmed to have the Large Class bad smell. The refactoring is Extract Class, which divides the target class into two or more new classes. In practice, the target class is divided into two parts, and bad smell detection is then executed again. The basic idea of the refactoring scheme is to divide the entities of the target class according to their cohesion degree. The two key questions are therefore how to represent the cohesion degree between the entities of a class and how to cluster those entities.

5.1. Cohesion degree representation of entities in class

The cohesion degree is represented as the distance between two entities. The distance between entities $E_1$ and $E_2$ is given by Equation (6). Before clustering, the distances between every pair of entities in the target class are computed. A lower distance value means a higher cohesion degree.
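Equations (6) to (8) and the selection rule of Section 4.2 can be sketched as follows. The sketch assumes that property sets are already built and that classes are given as a mapping from class name to the set of its entity names. It also makes three choices the paper does not spell out. "Entities outside $C$" is read as the entities of all other classes in the program (so at least two classes are needed). A single-entity class is given an inside distance of 0. A zero inside average yields infinity. All names are hypothetical.

```python
import math


def dist(p1, p2):
    """Jaccard distance between two property sets, Equation (6)."""
    union = p1 | p2
    return 1 - len(p1 & p2) / len(union) if union else 0.0


def entity_class_distance(e, class_entities, prop):
    """D(e, C), Equation (7): mean Dist(e, y) over the entities y of C,
    leaving out e itself when e belongs to C."""
    others = [y for y in class_entities if y != e]
    if not others:  # single-entity class: no peers to compare (assumption)
        return 0.0
    return sum(dist(prop[e], prop[y]) for y in others) / len(others)


def cohesion(name, classes, prop):
    """Cohesion(C), Equation (8): average D(e, C) over entities outside C
    divided by the average D(e, C) over entities inside C."""
    inside = classes[name]
    outside = [e for c, ents in classes.items() if c != name for e in ents]
    avg_out = sum(entity_class_distance(e, inside, prop) for e in outside) / len(outside)
    avg_in = sum(entity_class_distance(e, inside, prop) for e in inside) / len(inside)
    return math.inf if avg_in == 0 else avg_out / avg_in


def pick_large_classes(cohesion_by_class, n):
    """The n classes with the smallest cohesion metric (Section 4.2)."""
    return sorted(cohesion_by_class, key=cohesion_by_class.get)[:n]


# Cohesion values of Tyrant0.96 group 8 (Table 4), with N = 2 from Table 3.
print(pick_large_classes({"Creature": 5.763, "GameScreen": 3.125, "Map": 12.061}, 2))
# ['GameScreen', 'Creature']
```

The selection step reproduces the paper's Tyrant0.96 result: Creature and GameScreen have the two smallest cohesion values in group 8 and are reported as Large Classes, while Map is not.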

5.2. Entities clustering algorithm

This paper uses the agglomerative clustering algorithm [18], a hierarchical clustering algorithm. The process is as follows:

1) Assign each entity to its own cluster. The distance between two such clusters is the distance between their entities.
2) Repeatedly merge the two clusters with the lowest distance value until only two clusters remain. After each merge, the distance from the new cluster to any other cluster is the average of the distances from the two merged clusters to that cluster.
3) Output the two clusters (each containing several entities).

The agglomerative clustering algorithm is given in Figure 4:

Algorithm: Agglomerative Clustering Algorithm
Input: all entities and their pairwise distances
Output: two new clusters
Begin
  assign each entity to its own cluster;
  While (the number of clusters is more than 2)
    merge the two clusters A, B with the lowest distance value into cluster C;
    Foreach (any other cluster X in the class)
      Dist(C, X) = Avg(Dist(A, X), Dist(B, X));
    EndFor
  EndWhile
End

Figure 4. Agglomerative clustering algorithm of refactoring.

After the algorithm, the Extract Class refactoring is executed according to the two new clusters.

6. EXPERIMENTAL RESULTS

Several Java open source programs are used to detect Large Class bad smells. They are listed in Table 2.

Table 2. Open source programs in Large Class bad smell detection.

Program name | Number of classes
HSQLDB 2.2.4 | 111
Tyrant 0.96 | 116
SwingWT 0.60 | 44
Trama | 16
ArgoUML | 1874
JFreeChart 1.0.13 | 504
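The clustering of Section 5.2 (Figure 4) can be sketched as below. This is a minimal, unoptimized sketch: it rescans all cluster pairs on every merge. The distance function is passed in, so it could be the Jaccard distance of Equation (6) applied to property sets. Ties are broken by the order in which entities are listed, which is an assumption. The names and toy distances are hypothetical.

```python
from itertools import combinations


def split_into_two(entities, dist):
    """Agglomerative clustering as in Figure 4: start from singleton
    clusters, repeatedly merge the two closest clusters, and set the distance
    from the merged cluster to every other cluster to the average of the two
    previous distances, until two clusters remain. Needs >= 2 distinct entities."""
    clusters = [frozenset([e]) for e in entities]
    d = {frozenset(pair): dist(next(iter(pair[0])), next(iter(pair[1])))
         for pair in combinations(clusters, 2)}
    while len(clusters) > 2:
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = a | b
        clusters = [c for c in clusters if c != a and c != b]
        for x in clusters:
            d[frozenset((merged, x))] = (d[frozenset((a, x))] + d[frozenset((b, x))]) / 2
        clusters.append(merged)
    return clusters


# Toy distances: a1/m1 and a2/m2 are tightly related, everything else is far.
close = {frozenset(("a1", "m1")): 0.0, frozenset(("a2", "m2")): 0.1}


def toy_dist(x, y):
    return close.get(frozenset((x, y)), 0.9)


print(sorted(sorted(c) for c in split_into_two(["a1", "m1", "a2", "m2"], toy_dist)))
# [['a1', 'm1'], ['a2', 'm2']]
```

Figure 4 averages the two old distances without weighting by cluster size. That makes this the simple-average update (often called WPGMA) rather than average linkage over all entity pairs.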

6.1. Large Class bad smell location in group

If every group of the statistics data fits the curve well (as judged by the threshold comparison), the program has no Large Class bad smell. When a group's residual is negative, no bad smell is detected in that group. Table 3 shows the results of Large Class group location for the programs in Table 2.

Table 3. Results of Large Class bad smell group location.

Group | HSQLDB2.2.4 | Tyrant0.96 | SwingWT0.60 | Trama | ArgoUML | Soul3.0
1 | 0 | 0 | 0 | 0 | 0 | 0
2 | 0 | 0 | 0 | 0 | 0 | 0
3 | 0 | 0 | 0 | 0 | 0 | 0
4 | 0 | 0 | 0 | 0 | 0 | 0
5 | 0 | 0 | 0 | 1 | 0 | 0
6 | 0 | 0 | 0 | | 0 | 0
7 | 0 | 0 | 3 | | 0 | 4
8 | 0 | 2 | | | 0 | 0
9 | | | | | 1 | 0
10 | | | | | 1 | 0
11 | | | | | 0 |
12 | | | | | 2 |

In Table 3, each entry is the value $N$ computed with Equation (5): the number of Large Classes in that group. A nonzero value therefore indicates a Large Class bad smell in the group, and zero means the group has none. Because the programs contain different numbers of classes, they are divided into different numbers of groups, so some cells are blank. For example, HSQLDB2.2.4 has only 8 groups, so its cells for groups 9 to 12 are blank. ArgoUML has 12 groups, more than any other program.

6.2. Large Class bad smell location in class

Equation (8) is used to compute the cohesion metrics of the classes in each located group and determine which classes have the bad smell. Following the location method of Section 4.2, the N classes with the smallest cohesion metrics are detected as Large Classes. Table 4 shows the cohesion metrics of the class members of group 8 in Tyrant0.96. Accordingly, the classes Creature and GameScreen are both located as Large Classes.
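The differing group counts in Table 3 follow from Equation (1). The paper does not state how $N$ is rounded. Assuming it is rounded up to the next integer, the class counts of Table 2 reproduce the group counts visible in Table 3:

```latex
\begin{aligned}
\text{Trama } (n = 16):     &\quad N = 1 + 3.32\lg 16   \approx 5.00  \;\Rightarrow\; 5\\
\text{SwingWT } (n = 44):   &\quad N = 1 + 3.32\lg 44   \approx 6.46  \;\Rightarrow\; 7\\
\text{HSQLDB } (n = 111):   &\quad N = 1 + 3.32\lg 111  \approx 7.79  \;\Rightarrow\; 8\\
\text{Tyrant } (n = 116):   &\quad N = 1 + 3.32\lg 116  \approx 7.85  \;\Rightarrow\; 8\\
\text{ArgoUML } (n = 1874): &\quad N = 1 + 3.32\lg 1874 \approx 11.87 \;\Rightarrow\; 12
\end{aligned}
```

For Trama the exact value is 4.998, so rounding up and rounding to the nearest integer agree. SwingWT (6.46) is the case that distinguishes the two rules: rounding to the nearest integer would give 6 groups, whereas Table 3 lists a SwingWT entry for group 7.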

Internatonal Journal of Software Engneerng & Applcatons (IJSEA), Vol.4, No.5, September 2013 Table 4. Coheson metrcs of group 8 class members of Tyrant0.96 program Class name Number of lnes Coheson metrc Creature 898 5.763 GameScreen 625 3.125 Map 788 12.061 Table 5 shows the coheson metrcs of group 7 class members n JFreeChart1.0.13 program. After the coheson metrcs computng and analyss, the classes AbstractRenderer, PePlot, CategoryPlot and ChartPanel are dentfed as Large Class. Table 5. Coheson metrcs of group 7 class members of JFreeChart1.0.13 program Class name Number of lnes Coheson metrc AbstractRenderer 1879 4.932 PePlot 1725 5.375 DatasetUtltes 1808 5.864 ChartPanel 1642 3.509 DateAxs 1752 12.826 6.3. Code refactorng results The classes wth bad smell should be refactored by Extract Class accordng to the enttes dstance and agglomeratve clusterng algorthm. After refactorng the programs should be test agan. Fgure 5 shows the test results of Tyrant0.96 before and after refactorng. (a) Fttng curve before refactorng (b) Fttng curve after refactorng Fgure 5. Comparsons on the results of Tyrant0.96 before and after refactorng. In Fgure 5(a), MSE of the data s 0.01810, and that s 0.01037 n Fgure 5(b). In Fgure 5(b), the curve has better approxmaton than that n (a). The MSE s less than threshold, so the refactorng s effectve and there s no Large Class bad smell at all. 6.4 Comparsons wth PMD and Checkstyle tools In the secton of related work, the refactorng tools PMD and Checkstyle are ntroduced. PMD and Checkstyle have the ablty for Large Class bad smell detecton, and no refactorng operaton suggeston. As mentoned, n these tools, f the lne number of one class s hgher preset threshold, the class s detected as Large Class. The threshold of PMD for Large Class s 1000, and the 11

threshold of Checkstyle is 2000. Neither tool can provide refactoring schemes for the Large Class bad smells it detects. The detection results of these two tools are compared with the approach in this paper in Table 6. After manual confirmation, the precision of the method in this paper and that of PMD and Checkstyle are compared in Table 7.

Table 6. Large Class detection results of the method in this paper, PMD and Checkstyle

Detection tool / method    Tyrant0.96                            JFreeChart1.0.13
                           Number   Large Class names            Number   Large Class names
Method in this paper       2        Creature, GameScreen         4        AbstractRenderer, PiePlot, DatasetUtilities, ChartPanel
PMD                        0        --                           12       AbstractRenderer, PiePlot, DatasetUtilities, ChartPanel, DateAxis, ChartFactory, AbstractXYItemRenderer, ContourPlot, ThermometerPlot, AbstractCategoryItemRenderer, XYPlot, CategoryPlot
Checkstyle                 0        --                           2        XYPlot, CategoryPlot

Table 7. Precision comparisons of the method in this paper and PMD & Checkstyle

Program             Bad smell detection precision (%)              Refactoring scheme precision (%)
                    PMD      Checkstyle    Method in this paper    Method in this paper
Tyrant0.96          --       --            100                     100
JFreeChart1.0.13    33.33    0             100                     100

-- means that the precision cannot be computed, because the tool detected no Large Class.

As Table 7 shows, the method in this paper reaches 100% detection precision on both programs, whereas on JFreeChart1.0.13 PMD reaches 33.33% and Checkstyle 0%. On the tested programs, the method therefore clearly outperforms the existing Large Class detection tools. In small-scale programs, classes are generally small and the probability of a Large Class is low; in large-scale programs the reverse holds. The CLDM is therefore more suitable for Large Class detection and refactoring in larger-scale programs: because small-scale programs contain fewer Large Classes, the false positive rate of CLDM is higher on them. In addition, because their thresholds differ, the detection precision and recall of PMD and Checkstyle vary across programs of different scales, and it is unclear which threshold would be effective for all programs.

7. CONCLUSION

In this paper, an approach to Large Class bad smell detection and refactoring has been proposed. The analysis shows that fixed-threshold-based detection is rigid and error-prone, and a
new model is developed to describe the statistical distribution of class length. In this model, class groups that lie far from the distribution curve are treated as potentially containing bad smells. Cohesion metrics are then computed to confirm which classes in these groups have the bad smell. Finally, an Extract Class refactoring scheme is derived with the agglomerative clustering technique. The contributions of this paper are as follows. First, the characteristics of the Large Class bad smell are quantified with statistical analysis. Second, an approach based on length and cohesion metrics is proposed for Large Class bad smell detection.

ACKNOWLEDGEMENTS

This research is supported by the National Natural Science Foundation of China under Grant No.61173021 and the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20112302120052 and 20092302110040).

REFERENCES

[1] M. Fowler, (1999) Refactoring: Improving the Design of Existing Code, Addison-Wesley, pp89-92.
[2] B.F. Webster, (1995) Pitfalls of Object-Oriented Development, 1st ed., M&T Books, Feb.
[3] A.J. Riel, (1996) Object-Oriented Design Heuristics, Addison-Wesley.
[4] G. Travassos, F. Shull, M. Fredericks & V.R. Basili, (1999) Detecting Defects in Object-Oriented Designs: Using Reading Techniques to Increase Software Quality, Proceedings of 14th Conference in Object-Oriented Programming, Systems, Languages, and Applications, pp47-56.
[5] R. Marinescu, (2004) Detection Strategies: Metrics-Based Rules for Detecting Design Flaws, Proceedings of 20th International Conference in Software Maintenance, pp350-359.
[6] L. Tahvildari & K. Kontogiannis, (2003) A Metric-Based Approach to Enhance Design Quality through Meta-Pattern Transformations, Proceedings of 7th European Conference on Software Maintenance and Reengineering, pp183-192.
[7] M. O'Keeffe & M. O'Cinneide, (2008) Search-Based Refactoring: An Empirical Study, Journal of Software Maintenance and Evolution: Research and Practice, pp345-364.
[8] K. Dhambri, H. Sahraoui & P. Poulin, (2008) Visual Detection of Design Anomalies, Proceedings of 12th European Conference in Software Maintenance and Reengineering, pp279-283.
[9] G. Langelier, H.A. Sahraoui & P. Poulin, (2005) Visualization-Based Analysis of Quality for Large-Scale Software Systems, Proceedings of 20th International Conference in Automated Software Engineering, pp214-223.
[10] M. Lanza & R. Marinescu, (2006) Object-Oriented Metrics in Practice, Springer-Verlag, pp125-128.
[11] E. van Emden & L. Moonen, (2002) Java Quality Assurance by Detecting Code Smells, Proceedings of 9th Working Conference in Reverse Engineering, pp120-128.
[12] F. Simon, F. Steinbruckner & C. Lewerentz, (2001) Metrics Based Refactoring, Proceedings of 5th European Conference in Software Maintenance and Reengineering, pp30-38.
[13] D.X. Jiang & P.J. Ma, (2012) Detecting Bad Smells With Weight Based Distance Metrics Theory, Proceedings of 2nd International Conference on Instrumentation, Measurement, Computer, Communication and Control, pp299-304.
[14] H. Liu, Z.Y. Ma & W.Z. Shao, (2012) Schedule of Bad Smell Detection and Resolution: A New Way to Save Effort, IEEE Transactions on Software Engineering, Vol. 38, No. 1, pp220-235.
[15] D. Fontana, A. Francesca & P. Braione, (2012) Automatic Detection of Bad Smells in Code: An Experimental Assessment, Journal of Object Technology, Vol. 11, No. 2, pp1-38.
[16] http://pmd.sourceforge.net.
[17] http://checkstyle.sourceforge.net.
[18] J.W. Han & M. Kamber, (2005) Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers.
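The fixed-threshold rule that Section 6.4 attributes to PMD and Checkstyle, and the precision values in Table 7, can be checked with the short Python sketch below. It is an illustration under stated assumptions, not the tools' own code: the function names are hypothetical, and the manually confirmed Large Classes are assumed to be exactly the classes reported by the method in this paper. That assumption reproduces the 33.33% (4 of 12) and 0% (0 of 2) entries for JFreeChart1.0.13, and a result of None corresponds to the "--" entries.

```python
from typing import Dict, Optional, Set

# Line-count thresholds quoted in Section 6.4 (lines per class).
PMD_THRESHOLD = 1000
CHECKSTYLE_THRESHOLD = 2000


def fixed_threshold_detect(class_lengths: Dict[str, int], threshold: int) -> Set[str]:
    """Flag every class whose number of lines exceeds the fixed threshold."""
    return {name for name, lines in class_lengths.items() if lines > threshold}


def precision(detected: Set[str], confirmed: Set[str]) -> Optional[float]:
    """Detection precision in percent, or None ('--') when nothing was detected."""
    if not detected:
        return None  # 0/0: the precision cannot be computed
    return 100.0 * len(detected & confirmed) / len(detected)


# Class lengths from Table 4 (Tyrant0.96) and Table 5 (JFreeChart1.0.13).
tyrant = {"Creature": 898, "GameScreen": 625, "Map": 788}
jfree_group7 = {"AbstractRenderer": 1879, "PiePlot": 1725, "DatasetUtilities": 1808,
                "ChartPanel": 1642, "DateAxis": 1752}

# Detection lists for JFreeChart1.0.13 from Table 6.
pmd_jfree = {"AbstractRenderer", "PiePlot", "DatasetUtilities", "ChartPanel",
             "DateAxis", "ChartFactory", "AbstractXYItemRenderer", "ContourPlot",
             "ThermometerPlot", "AbstractCategoryItemRenderer", "XYPlot", "CategoryPlot"}
checkstyle_jfree = {"XYPlot", "CategoryPlot"}

# Assumption: the manually confirmed Large Classes are the ones the paper's method reports.
tyrant_confirmed = {"Creature", "GameScreen"}
confirmed_jfree = {"AbstractRenderer", "PiePlot", "DatasetUtilities", "ChartPanel"}

if __name__ == "__main__":
    print(precision(fixed_threshold_detect(tyrant, PMD_THRESHOLD), tyrant_confirmed))  # None
    print(round(precision(pmd_jfree, confirmed_jfree), 2))  # 33.33
    print(precision(checkstyle_jfree, confirmed_jfree))  # 0.0
```

All five group-7 classes in Table 5 exceed PMD's 1000-line threshold, which is consistent with PMD flagging DateAxis even though the paper's cohesion analysis does not identify it as a Large Class. None of them exceeds Checkstyle's 2000 lines.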
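Section 6.3 refactors each confirmed Large Class by Extract Class, grouping its members with an entity distance and agglomerative clustering. The distance is not restated in this section, so the Python sketch below assumes a Jaccard-style distance between the sets of entities (fields and methods) each member uses, in the style of Simon et al. [12], together with average linkage and a hypothetical stopping cutoff. The member names and entity sets are invented for illustration; every resulting cluster with more than one member would be a candidate for extraction into a new class.

```python
from itertools import combinations
from typing import Dict, List, Set


def entity_distance(a: Set[str], b: Set[str]) -> float:
    """Jaccard distance between the entity sets used by two class members."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative_clusters(members: Dict[str, Set[str]], cutoff: float) -> List[Set[str]]:
    """Repeatedly merge the closest pair of clusters (average linkage) until
    the smallest inter-cluster distance exceeds the cutoff."""
    clusters: List[Set[str]] = [{name} for name in members]

    def linkage(c1: Set[str], c2: Set[str]) -> float:
        dists = [entity_distance(members[x], members[y]) for x in c1 for y in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        if linkage(clusters[i], clusters[j]) > cutoff:
            break  # remaining clusters are too far apart to merge
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i] |= merged
    return clusters


# Hypothetical members of a Large Class and the entities (fields/methods) each uses.
example_members = {
    "move": {"x", "y", "speed"},
    "teleport": {"x", "y"},
    "attack": {"strength", "weapon"},
    "parry": {"weapon", "skill"},
}

if __name__ == "__main__":
    for cluster in agglomerative_clusters(example_members, cutoff=0.7):
        print(sorted(cluster))
```

With a cutoff of 0.7 this prints ['move', 'teleport'] and then ['attack', 'parry']: the movement members share the entities x and y, the combat members share weapon, and the two groups share nothing, so they stay apart. Average linkage and the 0.7 cutoff are illustrative choices, not values from the paper; a lower cutoff keeps more, smaller groups apart.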