The Application of Data Mining in Sports Events

2nd International Conference on Electronics, Network and Computer Engineering (ICENCE 2016)
2016. The authors - Published by Atlantis Press

Jiao Yu-ping 1,a, Li Xia 2,b
1 Physical Education Department, Guangdong University of Foreign Studies, China
2 Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, China
a michael0924@qq.com, b 200211025@oamail.gdufs.edu.cn

Key words: Data mining; The Asian Games; Volleyball; Correlation analysis

Abstract. In recent years, data mining has played an increasingly important role in many fields. Using association analysis and a classification algorithm from data mining, this paper analyzes the result data of the volleyball games held at a university volleyball venue during the 2010 Guangzhou Asian Games. It studies the correlation among competing countries, competition time and audience number, and analyzes the correlation between competition results and athletes' physical qualities. These results may be significant for scientific game guidance and athlete selection.

1. Introduction

With the development of China's competitive sports and the constant improvement of athletes' competitive levels, higher demands have been placed on the efficiency of sports training and competition. In the age of the Internet and technology, both the training of competitive athletes and the selection of athletes need more scientific prediction, management and planning. However, the data management of China's competitive sports is at present largely disordered: modern information technology and the data accumulated over years of competitive sports development have not been fully used, and regularities and patterns that go beyond experience have not been mined efficiently either. Over the past 10 years, data mining and analysis techniques have become increasingly mature. If they are applied to the training and competition process, that process will become more scientific and standardized.

Data mining is defined as extracting hidden, previously unknown but potentially useful information from large amounts of real-world data that are massive, incomplete, noisy, ambiguous and random. This information can be presented in various forms such as concepts, rules, patterns and laws. Simply put, data mining is a deep-level data analysis approach: an information technology that is not limited to search and access but finds potential links between different data.

This paper analyzes the result data of the volleyball games held at a university volleyball venue during the 2010 Guangzhou Asian Games, using the Apriori algorithm and the ID3 algorithm. It studies the correlation among competing countries, competition time and audience number, and the correlation between competition results and athletes' physical qualities, respectively. These results may be significant for scientific game guidance and athlete selection.

2. The application of data mining in the data analysis of Asian Games volleyball matches

2.1 Experimental data

All data for this research come from the information department of the main venue of the women's volleyball games at the 2010 Asian Games (Guangwai Gymnasium). The data are authentic and reliable, and include detailed information such as the weight and height of athletes from 11 competing nations and statistics of 35 matches.

2.1.2 Data preprocessing

The data in this article come from the record of the volleyball group games at the 2010 Asian Games, Guangwai Gymnasium. The record contains 5 data fields: Num, Country1, Country2, Time and Class. There are 35 records in total, covering 11 competing countries such as KAZ, TPE and CHN. In order to facilitate mining, we preprocessed the collected data.

3 Analysis of experimental results

3.1 Association analysis of competing nations, audience number and game time

The preprocessed ARFF documents were mined with the association analysis method in the WEKA software. The results are as follows.

    Apriori
    =======
    Minimum support: 0.1 (4 instances)
    Minimum metric <confidence>: 0.8
    Number of cycles performed: 18
    Generated sets of large itemsets:
    Size of set of large itemsets L(1): 15
    Size of set of large itemsets L(2): 7
    Best rules found:
    1. Country1=MGL 4 ==> Time=1 4 conf:(1)
    2. Country1=CHN 4 ==> Class=3 4 conf:(1)
    3. Class=3 5 ==> Country1=CHN 4 conf:(0.8)
    4. Country2=MDV 5 ==> Time=1 4 conf:(0.8)

Analysis of experimental results: On the basis of the 4 association rules above, we can reach the following conclusions.

(1) When the CHN team takes part in a match, the audience always reaches 1500 or above (rule 2). As a result, the organizing committee should assign more staff to volunteering, security and logistical support to avoid accidents such as stampedes caused by overcrowding.

(2) When the MGL (Mongolia) or MDV (Maldives) team takes part in a match, the match is typically limited to 40-60 minutes because of the weak strength of those teams (rules 1 and 4). Under this circumstance, the organizing committee should have the corresponding staff take up their positions as quickly as possible to fit the short match time and prepare well for the next match.
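The numbers in the WEKA output can be read as follows: in rule 3, 5 records have Class=3 and 4 of those also have Country1=CHN, so the confidence is 4/5 = 0.8. The Python sketch below shows how such counts and confidences could be computed for a single rule. The function name rule_stats and the toy records are hypothetical illustrations; they are not the paper's data and not WEKA's implementation.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def rule_stats(records: List[Record],
               antecedent: Record,
               consequent: Record) -> Tuple[int, int, float]:
    """Return (antecedent count, rule count, confidence) for antecedent ==> consequent."""

    def matches(record: Record, condition: Record) -> bool:
        # A record matches when every attribute in the condition has the required value.
        return all(record.get(key) == value for key, value in condition.items())

    with_antecedent = [r for r in records if matches(r, antecedent)]
    with_both = [r for r in with_antecedent if matches(r, consequent)]
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return len(with_antecedent), len(with_both), confidence


# Hypothetical toy records using the five field names from the paper
# (Num omitted). These values are made up for illustration only.
toy_records: List[Record] = [
    {"Country1": "CHN", "Country2": "KOR", "Time": "3", "Class": "3"},
    {"Country1": "CHN", "Country2": "THA", "Time": "2", "Class": "3"},
    {"Country1": "MGL", "Country2": "MDV", "Time": "1", "Class": "1"},
    {"Country1": "JPN", "Country2": "CHN", "Time": "2", "Class": "3"},
    {"Country1": "KAZ", "Country2": "MDV", "Time": "1", "Class": "1"},
]

if __name__ == "__main__":
    print(rule_stats(toy_records, {"Country1": "CHN"}, {"Class": "3"}))
    print(rule_stats(toy_records, {"Class": "3"}, {"Country1": "CHN"}))
```

Apriori itself additionally prunes the search by first keeping only itemsets that meet the minimum support; the sketch evaluates one given rule and does not perform that search.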
3.2 Data mining analysis of single-technique rank and athletes' physical qualities based on correlation analysis

3.2.1 Experimental objectives

Starting from the scores of the 35 women's volleyball matches, we look for rules that can serve as an index for athlete selection. We hope to confirm a relation between physical quality and scores through correlation analysis of the physical quality data of the top 48 athletes in the single-technique ranking.

3.2.2 Introduction of data set and data preprocessing

The raw data set and the preprocessed data used in this experiment are described in Tables 1, 2 and 3. This research spent a large amount of time on the preprocessing of the data.

Table 1. Athletes' technical ranking of Asian group games
File: 3 Athlete's technical ranking of Asian group games.txt
Data sources: Technical ranking of volleyball athletes at the Guangwai volleyball venue in the 2010 Asian Games
Data preprocessing: We double-checked the data against the original hard-copy files and cleaned and corrected null and default values.
Number of records: 48 (only athletes ranked in the top 48 by total technical score are recorded)
Number of fields: 7

    Abbreviation  Field name  Type      Description                       Range
    rk            rank        figure                                      1~48
    no            number      category                                    1~13
    noc           nation      category  nation code (see list below)
    spike         spike       figure    times of athlete's spike          16~120
    block         block       figure    times of athlete's block          0~21
    serve         serve       figure    times of athlete's serve          0~19
    total         total       figure    total scores of the three skills  27~139

Nation codes: KAZ=Kazakhstan, TPE=Taipei (China), PRK=Democratic People's Republic of Korea, THA=Thailand, TJK=Tajikistan, MGL=Mongolia, MDV=Maldives, KOR=Korea, JPN=Japan, IND=India, CHN=China.

Table 2. Data of athletes' physical quality
File: 4--- of athlete's physical quality.txt
Data sources: Athletes' physical quality data from the group games at the Guangwai volleyball venue in the 2010 Asian Games
Data preprocessing: We double-checked the data against the original hard-copy files and cleaned and corrected null and default values. Missing official data are represented by "?".
Number of records: 130 (athletes from the 11 countries participating in the women's volleyball games of this Asian Games)
Number of fields: 7

    Abbreviation  Field name                  Type      Description                           Range
    noc           nation                      category  nation code (as in Table 1)
    no            number                      category
    height        height                      figure    athlete's height                      167~192
    weight        weight (kg)                 figure    athlete's weight                      61~90
    spikemax      the highest point of spike  figure    the max height of spike that the ...
    blockmax      the highest point of block  figure    the max height of block that the ...
    birth         year of birth               figure    athlete's year of birth               1978~1994

Table 3. Comparison of technical ranking and athletes' physical index
File: 5---- Comparison of technical ranking and athlete's physical index.arff
Data sources: The two data sources corresponding to Table 1 and Table 2
Data preprocessing: Data from Table 1 and Table 2 were combined, and data transformation was done to adapt the data set to the WEKA format.
Number of records: 48 (athletes from the 11 countries participating in the women's volleyball games of this Asian Games)
Number of fields: 12

    Abbreviation  Field name                  Type      Description                           Range
    rk            rank                        figure                                          1~48
    no            number                      category                                        1~13
    noc           nation                      category  nation code (as in Table 1)
    spike         spike                       figure    times of athlete's spike              16~120
    block         block                       figure    times of athlete's block              0~21
    serve         serve                       figure    times of athlete's serve              0~19
    total         total score                 figure    total scores of the three skills      27~139
    height        height                      figure    athlete's height                      167~192
    weight        weight (kg)                 figure    athlete's weight                      61~90
    spikemax      the highest point of spike  figure    the max height of spike that the ...
    blockmax      the highest point of block  figure    the max height of block that the ...
    birth         year of birth               figure    athlete's year of birth               1978~1994

After organizing the required data sheets (Tables 1, 2 and 3), in order to mine the relationship between athletes' single-skill ranking and physical quality, the athletes' height, weight, spikemax, blockmax and birth were extracted from Table 2 (data of athletes' physical quality). The corresponding single-skill rankings in Table 1 were extracted as well to form the 48 samples. Because the data are analyzed with the Apriori algorithm and 6 attributes of the prepared data set are numeric, we need to divide them into sections to suit this algorithm. We divided each attribute into 4-6 sections according to fixed rules. The resulting file, 51.arff, is described below.
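The division into sections can be illustrated with a short Python sketch. It assumes the equal-width intervals given for height (166-170 is section 1, up to 191-195 as section 6) and weight (61-65 is section 1, up to 86-90 as section 6) in the section table of this paper. The function names to_section, height_section and weight_section are hypothetical and are not taken from the authors' preprocessing scripts.

```python
def to_section(value: int, lower: int, width: int, n_sections: int) -> int:
    """Map a numeric value to a 1-based section index using equal-width intervals.

    Section 1 covers [lower, lower + width - 1], section 2 the next `width`
    values, and so on up to `n_sections`.
    """
    upper = lower + width * n_sections - 1
    if value < lower or value > upper:
        raise ValueError(f"{value} is outside the covered range {lower}-{upper}")
    return (value - lower) // width + 1


def height_section(height_cm: int) -> int:
    # Assumed intervals: 166-170 -> 1, 171-175 -> 2, ..., 191-195 -> 6.
    return to_section(height_cm, lower=166, width=5, n_sections=6)


def weight_section(weight_kg: int) -> int:
    # Assumed intervals: 61-65 -> 1, 66-70 -> 2, ..., 86-90 -> 6.
    return to_section(weight_kg, lower=61, width=5, n_sections=6)


if __name__ == "__main__":
    # An athlete of 185 cm and 70 kg falls into height section 4 and weight section 2.
    print(height_section(185), weight_section(70))
```

Replacing numeric values by section labels in this way turns every attribute into a categorical one, which is what Apriori requires. Missing values (recorded as "?" in the source data) would need separate handling and are not covered by this sketch.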

Table 4. Comparison of technical ranking and athletes' physical index (sectioned)
File: 51---- Comparison of technical ranking and athlete's physical index.arff
Data sources: Table 1 and Table 2
Data preprocessing: Because the data set contains many numeric attributes, we converted several attributes into sections in order to perform correlation analysis with the Apriori algorithm.
Number of records: 48 (athletes from the 11 countries participating in the women's volleyball games of this Asian Games)
Number of fields: 6

    Abbreviation  Field name                  Type      Description                           Values
    height        athlete's height            category  athlete's height                      1,2,3,4,5,6
    weight        athlete's weight            category  athlete's weight                      1,2,3,4,5,6
    spikemax      the highest point of spike  category  the max height of spike that the ...  1,2,3,4,5,6
    blockmax      the highest point of block  category  the max height of block that the ...  1,2,3,4,5,6
    birth         year of birth               category  athlete's year of birth               1,2,3,4
    class         ranking class               category  athlete's ranking section             1,2,3,4,5,6

The expression of data sections:

    Section  height (cm)  weight (kg)  spike (cm)  block (cm)  birth  total
    1        166-170      61-65        261-270     261-270     76-80  20-39
    2        171-175      66-70        271-280     271-280     81-85  40-59
    3        176-180      71-75        281-290     281-290     86-90  60-79
    4        181-185      76-80        291-300     291-300     91-95  80-99
    5        186-190      81-85        301-310     301-310            100-119
    6        191-195      86-90        311-320     311-320            120-139

3.2.3 Mining data by WEKA and Apriori algorithm

WEKA and the Apriori algorithm were used to mine the data in Tables 3 and 4. In the mining process, the confidence coefficients were set to 0.9 and 0.7 respectively (due to space constraints, we do not provide the experimental output but only the analysis). The following conclusions were drawn.

(1) There is no strong relationship between physical quality and competition results. Athletes around 25 years old not only have excellent physical fitness but also abundant competition experience, which helps them adjust their mental state during competition.

(2) Although there is no strong relationship between physical quality and competition results, height confers an advantage in volleyball. Athletes with appropriate weight and height serve more powerfully and block more easily, which is essential for achieving good results.

(3) Therefore, when selecting athletes, choosing athletes of about 185 cm in height and 70 kg in weight is more favorable for improving competition results.

(4) A limitation of this experiment is that the mining results may contain errors because the data set is small.

3.3 Data mining of differences between athletes' physical conditions of each competing nation on the basis of correlation analysis

3.3.1 Experimental objectives

To make objective recommendations for the training and selection of athletes by analyzing data such as height, weight, spike and block of athletes from the 11 competing countries.

Data preprocessing: enter the hard-copy data into the computer; delete records without the athlete's personal information; and convert some field types (transformation into intervals). One of the original 11 countries was also deleted because it had never taken part in international games before and its athletes' information was absent. Therefore, athlete information for 10 countries was left.

3.3.2 Data mining of correlation analysis

Mining the data on the differences in physical quality among athletes of different nations with the Apriori algorithm, we can draw the following conclusions:

(1) The heights of spiking and blocking are strongly correlated, and by changing the parameters it can be confirmed that the height of the players is related to their level of blocking.

(2) The heights of spiking and blocking are also related to the weight of the players. Generally, lightweight players ( 64 kg) can reach heights of no more than 289 cm when spiking and no more than 279 cm when blocking. As a result, players who are comparatively taller and lighter should be a better choice for training and selection.

Summary

Data mining is a new information technology with strong vitality and practical effect, which can be widely applied in all fields. The field of sports in particular has accumulated massive data, so sports statistics undoubtedly play an irreplaceable role in scientific studies of sports. The data mining method put forward in this paper can be applied not only to technical-tactical analysis and staffing arrangements for volleyball games but also to the analysis of other events. Data mining will exert a greater influence on all fields in the future and has broad application prospects in the field of sports.