Extensible Detection and Indexing of Highlight Events in Broadcasted Sports Video


Dian W. Tjondronegoro 1, Yi-Ping Phoebe Chen 2, Binh Pham 3

1 School of Information Systems, Queensland University of Technology, Brisbane, Australia
2 School of Information Technology, Deakin University, Melbourne, Australia
3 Faculty of Information Technology, Queensland University of Technology, Brisbane, Australia

dian@qut.edu.au, phoebe@deakin.edu.au, b.pham@qut.edu.au

Abstract

Content-based indexing is fundamental to support and sustain the ongoing growth of broadcasted sports video. The main challenge is to design extensible frameworks to detect and index highlight events. This paper presents: 1) A statistical-driven event detection approach that utilizes a minimum amount of manual knowledge and is based on a universal scope-of-detection and audio-visual features; 2) A semi-schema-based indexing that combines the benefits of schema-based modeling, to ensure that the video indexes are valid at all times without manual checking, and schema-less modeling, to allow several passes of instantiation in which additional elements can be declared. To demonstrate the performance of the event detection, a large dataset of sport videos with a total of around 15 hours, including soccer, basketball and Australian football, is used.

Keywords: Extensible sports video indexing, multimodal event detection

1 Introduction

Sports video indexing approaches can be categorised based on low-level (perceptual) features and high-level semantic annotation (Djeraba, 2002). There are some elements beyond the perceptual level (known as the semantic gap) which can make feature-based indexing tedious and inaccurate. For example, users cannot always describe the visual characteristics of certain objects they want to view for each query. In contrast, the main benefit of semantic-based indexing is the ability to support more intuitive queries. However, semantic annotation is generally time-consuming, and often incomplete due to the limitations of manual supervision and the currently available techniques for automatic semantic extraction.
Therefore, video should be indexed using semantics that can be extracted automatically with minimal human intervention. Events-based indexing can be noted as the most suitable indexing technique for sport videos, as sport highlights on TV, in magazines or on the internet are commonly described using a set of events, particularly the important or exciting ones.

Copyright 2006, Australian Computer Society, Inc. This paper appeared at the Twenty-Ninth Australasian Computer Science Conference (ACSC2006), Hobart, Tasmania, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 48. Vladimir Estivill-Castro and Gill Dobbie, Eds. Reproduction for academic, not-for profit purposes permitted provided this text is included.

As there is not yet a complete solution that can extract all events automatically, we need to design frameworks that support extensible detection and indexing of (highlight) events. Extensibility is emphasized because the algorithms developed for automatic extraction of features and semantics in sports video need to be extended gradually while improving the performance. As a result of more extractable contents, the indexing scheme needs to support continuous updates. Sections 2 and 3 of this paper address each of these issues respectively. Following this, the experimental results that use a large dataset are reported before we close with some conclusions and future work.

2 Extensible Events Detection

It is well established that sports events can be detected based on the occurrences of specific audio and visual features which can be extracted automatically. To date, there are two main approaches to fuse audio-visual features. One alternative, called the machine-learning approach, uses probabilistic models to automatically capture the unique patterns of audio-visual feature measurements in specific (highlight) events. For example, a Hidden Markov Model (HMM) can be trained to capture the transitions of still, standing, walking, throwing, jumping-down and running-down states during athletic sports events (Wu et al., 2002).
The main benefit of using such an approach is the potential robustness, thanks to the modest usage of domain-specific knowledge, which is only needed to select the best feature set to describe each event. However, one of the most challenging requirements for constructing reliable models is to use features that can be detected flawlessly during training due to the absence of manual supervision. Moreover, adding a new feature into a particular model will require re-training of the whole model. Thus, it is generally difficult to build extensible models that allow gradual development or improvement in the feature extraction algorithms. To tackle this limitation, our statistical-driven models are constructed based on the characteristics of each feature. Any addition of a new feature will only result in updates of the rules that are associated with that feature.

Another alternative for audio-visual fusion is to use manual heuristic rules. For example, the temporal gaps between specific features during a basketball goal have a predictable pattern that can be perceived manually (Nepal et al., 2001). The main benefit of this approach is the absence of comprehensive training for each highlight, and the computations are relatively less complex. However, this method usually relies on manual observations to construct the detection models for different events. Even though the numbers of domains and events of interest are limited and the amount of effort is affordable, we primarily aim to reduce the subjectivity and limitation of manual decisions.

These two approaches still have two major drawbacks, namely: 1) The lack of a definitive solution for the scope of highlight detection, such as where to start and finish the extraction. For example, Ekin et al. (Ekin and Tekalp, 2003b) detect goals by examining the video frames between the global shot that causes the goal and the global shot that shows the restart of the game. However, this template scope was not used to detect other events. On the other hand, Han et al. (Han et al., 2003) used a static temporal segment of 30-40 sec (empirical) for soccer highlights detection. 2) The lack of a universal set of features for detecting different highlights and across different sports. Features that best describe a highlight are selected using domain knowledge. For instance, whistle in soccer is only used to detect foul and offside, while excitement and goal-area are used to identify goal attempts (Duan et al., 2003).

In order to solve the first drawback, some approaches (Xu et al., 1998, Li and Ibrahim Sezan, 2001) have claimed that highlights are mainly contained in a play scene. However, based on a user study as reported in our earlier paper (Tjondronegoro et al., 2004b), we have found that most users need to watch the whole play and break to fully understand an event. For example, when a whistle is blown during a play in soccer video, we would expect that something has happened. During the break, the close-up views of the players, a replay scene, and/or the text display will confirm whether it was a foul or offside. Consequently, it is expected that automated semantic analysis should also use both play and break segments to detect highlights.
As for the second drawback, we aim to reduce the amount of manual choice of the feature set. For instance, it is quite intuitive to decide that the most effective event-dependent features to describe a soccer foul are whistle, followed by referee appearance. However, we were able to identify some additional characteristics of foul that could be easily missed by manual observation, such as shorter duration and less excitement (both compared to shoot), based on statistical features that will be discussed in section 2.2.

2.1 Play-Break as Standard Scope of Events

Most broadcasted sport videos use transitions of typical shot types to emphasize story boundaries while aiding important contents with additional items. For example, a long global shot is normally used to describe an attacking play that could end with the scoring of a goal. After a goal is scored, zoom-in and close-up shots will be dominantly used to capture players' and supporters' celebration during the break. Subsequently, some slow-motion replay shots and artificial texts are usually inserted to add some additional contents to the goal highlight. Based on this example, it should be clear that play-break sequences are effective containers for semantic content since they contain all the required details. Using this assumption, we should be able to extract all the phenomenal features from play-break that can be utilized for highlights detection. Thus, as shown in Figure 1, the scoping of highlight (event) detection should be from the last play shot until the last break shot.

Figure 1. Extracting Events from Play-Break.

Analysis of camera-view transitions in a sports video has been used successfully for play-break segmentation (Ekin and Tekalp, 2003a). We have extended this approach by adding replay-based correction to improve the performance. Figure 2 shows how a replay scene (R) can fix the boundaries of play-break sequences which are formed by a sequential play scene (P) and break scene (B). Please note that .s indicates start while .e indicates end.
For example, R.s is short for the start of the replay scene.

Figure 2. Locations of Replays in Play-breaks.

Based on these scenarios, an algorithm to perform replay-scene-based play-break segmentation has been developed. This algorithm aims to: 1) fix the inaccurate boundaries of play-break sequences due to shorter breaks; 2) locate missing sequences due to missed breaks; and 3) avoid false sequences due to falsely detected play which is followed by a break.

Algorithm to fix play-break boundaries, based on replay scene locations

If (A.s > B.s) & (A.e < B.e): A strict_during B
If (A.s > B.s & A.e <= B.e) OR (A.s >= B.s & A.e < B.e): A during B
If A.e = B.e: A meets B

(1) If [R strict_during P] & [(R.e - P.e) >= dur_thres]: B.s = R.s; B.e = R.e; create a new sequence where [P2.s = R.e+1] & [P2.e = P.e]
(2) If [R strict_during P] & [(R.e - P.e) <= dur_thres]: P.e = R.e; B.s = R.e+1
(3) If [R meets B] & [R.s < P.e]: P.e = R.s
(4-5) If ([R during B] & [R meets B]) OR [R strict_during B]: no processing required
(6) If [R during B] & [(R.e - P2.s) >= dur_thres]: B.e = R.e; amend the neighbor sequence: [P2.s = R.e+1]
(7) If [R during P2] & [(R.e - P2.s) >= dur_thres]: attach sequence 2 to sequence 1 (i.e. combine seq 1 and seq 2 into one sequence)
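The interval relations and one of the correction rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the Scene class, the function names and the use of integer frame positions are hypothetical, and only rule (3) is shown because its condition is unambiguous in the text.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """A scene with inclusive start (s) and end (e) positions (frame indices assumed)."""
    s: int
    e: int


def strict_during(a: Scene, b: Scene) -> bool:
    # A starts after B starts and ends before B ends.
    return a.s > b.s and a.e < b.e


def during(a: Scene, b: Scene) -> bool:
    # A lies inside B and shares at most one boundary with it.
    return (a.s > b.s and a.e <= b.e) or (a.s >= b.s and a.e < b.e)


def meets(a: Scene, b: Scene) -> bool:
    # In the paper's notation, "A meets B" means both scenes end at the same position.
    return a.e == b.e


def apply_rule_3(play: Scene, brk: Scene, replay: Scene) -> bool:
    """Rule (3): the replay ends with the break but starts before the play ends,
    so the play is cut back to the start of the replay. Returns True if applied."""
    if meets(replay, brk) and replay.s < play.e:
        play.e = replay.s
        return True
    return False
```

For instance, with a play scene covering positions 0 to 100, a break covering 101 to 150 and a replay covering 90 to 150, the rule moves the end of the play back to 90.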

It is important to note that some broadcasters insert advertisements (ads) in between or during the replay. To obtain the correct length of the total break, the total length of the ads has to be taken into account.

2.2 Statistical-Driven Events Detection

As most of the current cinematic heuristics for highlight detection are heavily based on manual discoveries and domain-specific rules, we aim to minimize the amount of manual supervision in discovering the phenomenal features that exist in each of the different highlights. Moreover, in developing the rules for highlight detection, we should use as little domain knowledge as possible to make the framework more flexible for other sports with minimum adjustments. For this purpose, we have conducted a semi-supervised training from different broadcasters and different matches for each highlight to determine the characteristics of play-break sequences containing different highlights and no highlights. It is semi-supervised training as we manually classify the specific highlight that each play-break sequence contains. Moreover, the automatically detected play-break boundaries and mid-level feature locations within each play-break, such as excitement, are manually checked to ensure the accuracy of training. During training, statistics of each highlight are calculated with the following parameters (the examples are based on AFL video):

SqD = duration of the currently observed play-break sequence. For example, we can predict that a sequence that contains a goal will be much longer than a sequence with no highlight.

BrR = duration of break / SqD. Rather than measuring the length of a break to determine a highlight, the ratio of the break segment within a sequence is more robust and descriptive. For example, we can distinguish goal from behind based on the fact that goal has a higher break ratio than behind due to a longer goal celebration and slow-motion replay.

PlR = duration of play scene / SqD. We find that most non-highlight sequences have the highest play ratio since they usually contain a very short break.
RpD = duration of (slow-motion) replay scene in the sequence. This measurement implicitly represents the number of replay shots, which is generally hard to determine due to many camera changes during a slow-motion replay.

ExcR = duration of excitement / SqD. Typically, a goal consists of a very high excitement ratio whereas a non-highlight usually contains no excitement.

NgR = duration of the frames containing goal-area / duration of play-break sequence. A high ratio of near-goal area during a play potentially indicates a goal.

CuR = length of close-up views that include crowd, stadium, and advertisements within the sequence / SqD. We find that the ratio of close-up views used in a sequence can predict the type of highlight. For example, goal and behind highlights generally have a higher close-up ratio due to focusing on just one player, such as the shooter, and the goal celebration.

The statistical data of the universal feature sets within each highlight after a training that uses 20 samples is presented in Table 1. Based on the trained statistics, we have constructed a novel set of statistical-driven heuristics to detect soccer, AFL, and basketball highlights. We do not need to use any domain-specific knowledge, thereby making the approach less subjective and robust when applied to similar sports. As each feature can be considered independently, more features can be introduced without the necessity to make major changes in the highlight classification rules. Moreover, our model does not need to be re-trained as a whole, thereby promoting extensibility. Hence, our approach will reap the full benefit when a larger set of features is developed or improved gradually.

Highlight classification is performed as:

[HgtClass] = Classify_Highlight(D, NgR, ExcR, CuR, PlR, RpR)

where HgtClass is the highlight class most likely contained by the sequence, while D, NgR, and so on are the statistical parameters described earlier. This equation will be performed according to the sport genre.
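The parameters listed above can be computed from per-sequence durations, as in the following Python sketch. It is an illustration only: the PlayBreak container and the function name are hypothetical, all durations are assumed to be in seconds, and the sequence duration is assumed to equal the play duration plus the break duration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PlayBreak:
    """Durations (seconds) measured within one play-break sequence (hypothetical container)."""
    play: float        # play scene
    brk: float         # break scene
    replay: float      # slow-motion replay scene
    excitement: float  # frames with detected excitement
    near_goal: float   # frames containing the goal area
    close_up: float    # close-up views


def sequence_features(pb: PlayBreak) -> Dict[str, float]:
    # Assumption: SqD is the play duration plus the break duration.
    sqd = pb.play + pb.brk
    if sqd <= 0:
        raise ValueError("sequence duration must be positive")
    return {
        "SqD": sqd,
        "BrR": pb.brk / sqd,
        "PlR": pb.play / sqd,
        "RpD": pb.replay,            # kept as a duration, not a ratio
        "ExcR": pb.excitement / sqd,
        "NgR": pb.near_goal / sqd,
        "CuR": pb.close_up / sqd,
    }
```

Under the stated assumption, BrR and PlR always sum to 1, so a long celebration after a goal shows up as a high BrR and a correspondingly low PlR.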
In order to classify which highlight is contained in a sequence, the algorithm uses some measurements. For example, in soccer, G, S, F, and No are the highlight scores for goal, shoot, foul and non-highlight respectively. Each of these measurements is incremented by 1 point when certain rules are met. Thus, users should be able to intuitively decide the most likely highlight of each sequence based on the highest score. However, to reduce users' workload, we can apply some post-processing to automate or assist their decision.

Each cell below gives (avg; max; min). Columns: Duration (D), Play Ratio (PlR), Near Goal (NgR), Excitement (ExcR), Close-up (CuR), Replay (RpD).

Soccer: G = Goal, S = Shoot, F = Foul, N = Non
Class | D           | PlR              | NgR            | ExcR             | CuR              | RpD
G     | 73; 104; 43 | 0.30; 0.46; 0.07 | 0.47; 1; 0.13  | 0.45; 0.83; 0.10 | 0.26; 0.51; 0.08 | 25; 34; 20
S     | 36; 73; 10  | 0.57; 0.87; 0.15 | 0.55; 0.93; 0  | 0.35; 0.79; 0    | 0.23; 0.74; 0    | 6; 16; 0
F     | 38; 72; 14  | 0.64; 0.97; 0.08 | 0.23; 0.81; 0  | 0.20; 0.50; 0    | 0.12; 0.29; 0    | 6; 23; 0
N     | 24; 40; 5   | 0.73; 0.91; 0.47 | 0.17; 0.1; 0   | 0.2; 0.6; 0      | 0.2; 0.6; 0      | 0; 0; 0

AFL: G = Goal, B = Behind, M = Mark, T = Tackle, N = Non
Class | D           | PlR              | NgR              | ExcR          | CuR           | RpD
G     | 72; 120; 40 | 0.17; 0.33; 0.06 | 0.13; 0.43; 0.02 | 0.29; 0.54; 0 | 0.35; 0.86; 0 | 9; 23; 0
B     | 31; 53; 7   | 0.38; 0.92; 0.10 | 0.10; 0.39; 0.02 | 0.38; 0.86; 0 | 0.35; 0.76; 0 | 6; 40; 0
M     | 26; 65; 8   | 0.62; 0.86; 0.26 | 0.02; 0.23; 0    | 0.32; 0.91; 0 | 0.28; 0.56; 0 | 1; 14; 0
T     | 25; 63; 10  | 0.55; 0.83; 0.08 | 0.01; 0.05; 0    | 0.22; 0.59; 0 | 0.18; 0.44; 0 | 4; 14; 0
N     | 20; 42; 8   | 0.52; 0.81; 0.17 | 0.01; 0.08; 0    | 0.30; 0.75; 0 | 0.29; 0.69; 0 | 0; 0; 0

Basketball: G = Goal, F = Foul, FT = Free throw, T = Timeout
Class | D              | PlR              | NgR              | ExcR             | CuR              | RpD
G     | 24; 51.6; 9.6  | 0.71; 0.94; 0.27 | 0.49; 0.92; 0.04 | 0.41; 0.82; 0.05 | 0.11; 0.3; 0     | 0; 0; 0
F     | 28.8; 60; 12   | 0.48; 0.72; 0.13 | 0.43; 0.93; 0    | 0.34; 0.78; 0    | 0.27; 0.69; 0    | 4.8; 13; 0
FT    | 20.4; 30; 11   | 0.50; 0.81; 0.23 | 0.55; 1; 0.05    | 0.44; 0.90; 0    | 0.26; 0.68; 0    | 0; 0; 0
T     | 124.8; 255; 25 | 0.12; 0.24; 0.05 | 0.34; 0.85; 0    | 0.24; 0.43; 0.05 | 0.49; 0.78; 0.16 | 16; 40; 0
N     | -              | -                | -                | -                | 0.2; 0.63; 0     | -

Table 1. Statistics of Soccer, AFL, and Basketball Highlights.

The essence of highlight classification is comparing the value of each input parameter against the typical statistical characteristics: min, avg, and max, which are denoted as a stat. The following algorithm describes the calculation that can be applied to any sport (using soccer as an example).

Common event classification algorithm

Let Det_Soccer_Region(val) = Region(val, stat_G, stat_S, stat_F, stat_N)
Perform region_1..n = Det_Soccer_Region (D), (NgR), (ExcR), (CuR), (PlR), (RpR)
For region_1 to region_n
    Increment the corresponding highlight score //G, Sh, F, No in this case

where

$$
\mathrm{Region}(val, stat_1, stat_2, \ldots, stat_n) =
\begin{cases}
1, & \text{if } (AvgD_1 = MinAvgD) \ \&\ (TD_1 = MinTD) \\
2, & \text{if } (AvgD_2 = MinAvgD) \ \&\ (TD_2 = MinTD) \\
\vdots \\
n, & \text{if } (AvgD_n = MinAvgD) \ \&\ (TD_n = MinTD)
\end{cases}
$$

$$
stat = \{avg, min, max\}, \quad
TD = |val - stat_{max}| + |val - stat_{min}|, \quad
AvgD = |val - stat_{avg}|,
$$

$$
MinAvgD = \min(AvgD_1, AvgD_2, \ldots, AvgD_n), \quad
MinTD = \min(TD_1, TD_2, \ldots, TD_n).
$$

It is to be noted that in Det_Soccer_Region(val), stat_X matches the value input. Therefore, when val is NgR, the stat_G = {G_avg, G_max, G_min} listed for NgR is used, according to the statistics table. In addition to the common algorithm, we can improve the accuracy of the event classification for a particular sport based on its statistical phenomena. This concept is described in the rest of this section.

2.2.1 Events Classification in Soccer

When play ratio, sequence duration and near-goal ratio fall within the statistics of goal or shoot, it is likely that the sequence contains goal or shoot. Otherwise, we will usually find a foul or non-highlight. However, shoot often has similar characteristics to foul. In order to differentiate goal from shoot, and shoot/foul from non-highlight, we apply some statistical features:

Goal vs. Shoot: Compared to shoot, goal has longer duration, more replays and more excitement. However, goal has a shorter play scene due to the dominance of break during celebration.

Shoot, Foul, vs. Non-highlight (None): None does not contain any replay, whereas foul contains longer replay than shoot on average. Foul has the lowest close-up ratio as compared to shoot and none. None has the shortest duration as compared to shoot and foul. None contains the least excitement as compared to shoot and foul, whereas foul has less excitement than shoot.
Based on these findings, the following algorithm is developed.

Specific algorithm to classify highlight events in soccer

Perform region_1..3 = Det_Soccer_Region (PlR), (D), (NgR) accordingly
If all of region_1, region_2 and region_3 = 1 or 2 //Most likely to be goal or shoot
    Increment G and Sh
    Perform region_4..7 = Det_Soccer_Region (ExcR), (RpD), (PlR), (D)
    For region_4 to region_7
        If current region = 1, increment G
        Else if current region = 2, increment Sh
Else //Most likely to be foul, shoot, or non-highlight
    Increment F, Sh, No
    Perform region_4..7 = Det_Soccer_Region (CuR), (ExcR), (D), (RpD)
    For region_4 to region_7
        If current region = 2, increment Sh
        Else if current region = 3, increment F
        Else if current region = 4, increment No

It should be noted that a more compact representation of this algorithm is presented in Figure 3, where {val} is the convention for region_1..N = Det_Soccer_Region (val_1), (val_2), ..., (val_N). Thus, squares denote the statistics that need to be checked, whereas the non-boxed texts are the associated highlight point(s) that will be incremented based on the outputs of each region. This representation is used for describing other sports.

2.2.2 Events Classification in AFL

In AFL, a goal is scored when the ball is kicked completely over the goal line by a player of the attacking team without being touched by any other player. A behind is scored when the football touches or passes over the goal post after being touched by another player, or the football passes completely over the behind line. A mark is taken if a player catches or takes control of the football within the playing surface after it has been kicked by another player a distance of at least 15 meters and the ball has not touched the ground or been touched by another player. A tackle is when the attacking player is forced to stop moving because of being held (tackled) by a player from the defensive team. Based on these definitions, it should be clear that goal is the hardest event to achieve. Thus, it will be celebrated longest and given the greatest emphasis by the broadcaster.
Consequently, behind, mark and tackle can be listed in the order of their importance (i.e. behind is more interesting than mark). Figure 4 shows the highlight classification rules for AFL. Let G, B, M, T, No be the highlight scores for goal, behind, mark, tackle and non-highlight respectively. Thus, for AFL event detection:

Det_AFL_Region(val) = Region(val, stat_G, stat_B, stat_M, stat_T, stat_N).

The algorithm first checks whether the current PlR belongs to stat_G (i.e. output = 1) and NgR is greater than the minimum of the typical values for goal and behind; if so, the sequence is most likely to contain either goal or behind. This is followed by comparing ExcR, RpD, and PlR values: the outputs determine which score is incremented from G or B. Else (if PlR does not belong to stat_G), the sequence is more likely to contain mark, tackle, or none. This is followed by comparing D, CuR, PlR, and RpR values: the outputs determine which score is incremented from M, T, or N.
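The common Region function and the AFL rules described above can be sketched together in Python. This is a rough illustration, not the authors' code, and it rests on several assumptions that the text leaves open: TD and AvgD are taken as absolute differences; Region returns None when no class minimises both distances; "the minimum of the typical values for goal and behind" is read as the smaller of the two NgR minima in Table 1; and RpD stands in for RpR in the second branch. The names region, classify_afl, AFL and LABELS are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Stat = Tuple[float, float, float]  # (avg, max, min), the order used in Table 1


def region(val: float, stats: Sequence[Stat]) -> Optional[int]:
    """Return the 1-based index of the class whose statistics minimise both
    AvgD = |val - avg| and TD = |val - max| + |val - min|, or None if no
    single class minimises both (fallback behaviour is an assumption)."""
    avg_d = [abs(val - avg) for avg, _, _ in stats]
    td = [abs(val - mx) + abs(val - mn) for _, mx, mn in stats]
    for i in range(len(stats)):
        if avg_d[i] == min(avg_d) and td[i] == min(td):
            return i + 1
    return None


LABELS = ["G", "B", "M", "T", "N"]  # goal, behind, mark, tackle, non-highlight

# AFL statistics copied from Table 1, one (avg, max, min) triple per class in LABELS order.
AFL: Dict[str, Sequence[Stat]] = {
    "D": [(72, 120, 40), (31, 53, 7), (26, 65, 8), (25, 63, 10), (20, 42, 8)],
    "PlR": [(0.17, 0.33, 0.06), (0.38, 0.92, 0.10), (0.62, 0.86, 0.26),
            (0.55, 0.83, 0.08), (0.52, 0.81, 0.17)],
    "NgR": [(0.13, 0.43, 0.02), (0.10, 0.39, 0.02), (0.02, 0.23, 0),
            (0.01, 0.05, 0), (0.01, 0.08, 0)],
    "ExcR": [(0.29, 0.54, 0), (0.38, 0.86, 0), (0.32, 0.91, 0),
             (0.22, 0.59, 0), (0.30, 0.75, 0)],
    "CuR": [(0.35, 0.86, 0), (0.35, 0.76, 0), (0.28, 0.56, 0),
            (0.18, 0.44, 0), (0.29, 0.69, 0)],
    "RpD": [(9, 23, 0), (6, 40, 0), (1, 14, 0), (4, 14, 0), (0, 0, 0)],
}


def classify_afl(seq: Dict[str, float]) -> Dict[str, int]:
    """Score one play-break sequence; the highest score suggests the highlight."""
    score = dict.fromkeys(LABELS, 0)
    # Assumption: threshold is the smaller of the goal and behind NgR minima.
    ngr_floor = min(AFL["NgR"][0][2], AFL["NgR"][1][2])
    if region(seq["PlR"], AFL["PlR"]) == 1 and seq["NgR"] > ngr_floor:
        score["G"] += 1
        score["B"] += 1
        features, allowed = ("ExcR", "RpD", "PlR"), (1, 2)
    else:
        for label in ("M", "T", "N"):
            score[label] += 1
        # Assumption: RpD is used where the text names RpR.
        features, allowed = ("D", "CuR", "PlR", "RpD"), (3, 4, 5)
    for name in features:
        r = region(seq[name], AFL[name])
        if r in allowed:
            score[LABELS[r - 1]] += 1
    return score
```

Because every feature is looked up independently in the statistics table, adding a new feature under this reading only requires a new table row and a new name in the relevant feature tuple.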

Figure 3. Highlight Classification Rules for Soccer.
Figure 4. Highlight Classification Rules for AFL.
Figure 5. Highlight Classification Rules for Basketball.

2.2.3 Events Classification in Basketball

Compared to soccer and AFL, goals in basketball are not celebrated and do not need a special resume such as kick-off. Therefore, the rules applied to soccer and AFL cannot be used directly for basketball goals. Figure 5 shows the highlight classification rules for basketball. Let G, FT, F, T be the highlight scores for goal, free throw, foul, and timeout respectively. Thus, for basketball event detection, let:

Det_Basketball_Region(val) = Region(val, stat_G, stat_FT, stat_F, stat_T)

The algorithm first checks whether the current PlR belongs to stat_T (i.e. output = 4); if so, the sequence is most likely to contain a timeout. This is followed by comparing CuR, RpD, NgR, and D values: each time the output of a comparison is equal to 4, T is further incremented. Else (if the current PlR does not belong to stat_T), the sequence is more likely to contain goal, free throw, or foul (if RpD > 0). This is followed by checking:

If NgR belongs to region stat_G or stat_FT (i.e. output = 1 or 2), then the comparison is based on the values of CuR, PlR, D, and NgD: the outputs determine which score is incremented from G or FT.

Else (if NgR does not belong to region stat_G or stat_FT), then the comparison is based on the values of CuR, PlR, NgD, and ExcR: each time the output of a comparison is equal to 3, F is further incremented.

3 Extensible Indexing

For the indexing of events, object-oriented (OO) modeling is recognized for its ability to support complex data definitions. We have identified two main alternatives in using OO for modelling data, based on the models presented in AVIS (Adali et al., 1996) and OVID (Oomoto and Tanaka, 1997), namely schema-based and schema-less. A schema-based model (Adali et al., 1996) can be composed of three types of entities (i.e. index-able items) in a video database, namely: 1) video objects, which capture entities that are present in the video frames; 2) activity types, which are the subject of a frame sequence; and 3) events, which are the instantiation of an activity type. Thus, their model allows users to query the location of the occurrence of their desired objects or events. The main benefit of using a schema-based model is its capability to support easy updates due to the strict components that have to be followed exactly for each entity. However, the main limitation is the difficulty of including new descriptions during instantiation of video models due to the static schema; therefore, the model is not extensible.

In contrast, schema-less modeling (Oomoto and Tanaka, 1997) is designed based on the fact that each video interval can be regarded as a video object, in which the attributes can be objects, events, or other video objects. Thus, the content of a video object is more dynamic. Moreover, they also proposed dynamic calculation of inheritance, overlap, merge and projection of intervals to satisfy user queries. However, there are two main problems with schema-less modelling. First, query difficulties arise as users/developers must inspect the attribute definition of each object to develop a query, because each object has its own attribute structure. Second, total dependency on users or applications for supervising the instantiation of video objects occurs because a schema is not present.

In order to combine the strengths of schema-based and schema-less modelling, this section demonstrates the utilization of XML to design and construct a semi-schema-based video model. Schema-based matching ensures that the video indexes are valid during data operations such as insertion, thereby minimizing the need for manual checking. However, the model is also semi-schema-based as it allows additional declared elements in the instantiated objects as compared to its schema definition.
Moreover, not all elements in an object need to be instantiated at one time, as video content extraction often requires several passes due to its complexity and lengthy processing; this supports an extensible modeling scheme.

In addition to the strength of OO modeling, the video model also attempts to benefit from the relational modeling scheme. In particular, the utilization of referential integrity (Connolly and Begg, 2002) allows an object to include elements which are referenced from the existing objects within the database. The main purpose is to reduce the number of objects nested within other objects, thereby avoiding complex hierarchies and potential redundancies. Hence, overall, the proposed video model supports an object-relational modeling approach while adopting semi-schema based index construction and maintenance.

The sport video indexing is designed using two main abstraction classes, namely, segment and event. Each segment is instantiated with a unique key (segment Id). A segment, as shown in Figure 7, can be instantiated as a video-, audio- or visual-segment, which is extracted from a raw video track when mid-level features (e.g. whistle and excitement) can be detected. An event can be instantiated into generic (e.g. interesting event), domain-specific (e.g. soccer goal), or further-tactical (e.g. soccer free kick) semantics.

Events and segments are chosen as they can provide an effective description for many sport games. For example, most users will benefit from watching soccer goals as the most celebrated and exciting events. Segments are used as the text-alternative annotations to describe the goal. As shown in Figure 6, the last near-goal segment in a play-break sequence containing a goal describes how the goal was scored. Face and text displays can inform who scored the goal (i.e. the actor of the event) and the updated score. The replay scene shows the goal from different angles to further emphasize the details of how the goal was scored. In most cases, when the replay scene is associated with excitement, the content is more important. Excitement during the last play shot in a goal is usually associated with descriptive narration about the goal. In fact, we (humans) can often hear a goal without actually seeing it.

Figure 6. Goal Event with Segment-Based Annotations.

We have utilized some of the main benefits of using XML to store and index the extracted information from sport videos:

- XML is extensible, allowing additional information to be added without affecting other data. This is important to support gradual developments of feature extraction techniques that can add extractable segments and events.
- XML is internally descriptive and can be displayed in various ways. This is important to allow users to browse the XML data directly, while search results can also be returned as XML that can provide direct link(s) to the video location.
- XML fully supports semi-structured aspects that match video database characteristics:

1) An object can be described using attributes (properties), other objects (i.e. nested objects), or heterogeneous elements (i.e. any element). Instantiated objects from the same class may not have the same number of attributes, as not all attributes are compulsory, depending on the min and max occurs.
2) XML supports two types of relationships: nesting and referencing. However, to reduce redundancy, we have used referencing instead of nested object classes.

We have used XML Schema to define and construct the XML-based video schema as it has replaced DTD as the most descriptive language. Due to its expressive power, XML Schema has also been used as the basis of the MPEG-7 DDL (Data Definition Language) and the XQuery data model. Therefore, we should be able to easily leverage our proposed model to support MPEG-7 standard multimedia descriptions and XQuery implementation.

For a more compact representation of XML Schema, this section demonstrates the use of ORA-SS (Object-Relationship-Attribute notation for Semi-Structured data) (Dobbie et al., 2000) to design the video model, as shown in Figure 7 to Figure 9 (located on the last page). ORA-SS notation is chosen for its ability to represent most of XML Schema's features. It is to be noted that our diagrams extend the ORA-SS notation by demonstrating a more complex sample which integrates an inheritance diagram with a schema diagram. We have also introduced two additional notations: 1) italic text indicates an abstract object, 2) a symbol (in Figure 9) indicates a repeated object, to avoid complex/crossing lines.

The following describes the overall video indexing model. As shown in Figure 9, a sport video (SV) is a type of video segment which consists of SV components, overall summary, and hierarchical summary. SV components are composed of: 1) a segment collection, which stores a flat list of video, visual and audio segments that can be extracted from the sport video, 2) a syntactic relation collection, which stores all the syntactic relations such as "composed of" and "starts after" between one source segment and one or more destination segments, and 3) a semantic relation collection, which records all the semantic relations such as "is actor of" and "appears in" between one source segment or semantic object and one or more destination segments or semantic objects.
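As an illustration of the flat segment collection and the referencing relations, the following Python sketch builds a tiny XML fragment with the standard `xml.etree.ElementTree` module. The element and attribute names (`SportVideo`, `SVComponents`, `Relation`, `ref`, and so on) and the segment ids are assumptions made for this example; they are not taken from the schema in Figures 7 to 9.

```python
import xml.etree.ElementTree as ET


def build_sport_video():
    """Build a tiny SV components tree: flat segments plus one relation."""
    sv = ET.Element("SportVideo")
    components = ET.SubElement(sv, "SVComponents")

    # 1) Flat segment collection: no segment is nested inside another.
    segments = ET.SubElement(components, "SegmentCollection")
    for seg_id, kind in [("PB1", "video"), ("P1", "visual"), ("B1", "visual")]:
        ET.SubElement(segments, "Segment", {"id": seg_id, "type": kind})

    # 2) Syntactic relations point at segments by id instead of nesting them.
    syntactic = ET.SubElement(components, "SyntacticRelationCollection")
    rel = ET.SubElement(syntactic, "Relation", {"name": "composedOf", "source": "PB1"})
    for dest in ("P1", "B1"):
        ET.SubElement(rel, "Destination", {"ref": dest})

    # 3) Semantic relations would follow the same referencing pattern.
    ET.SubElement(components, "SemanticRelationCollection")
    return sv


def dangling_references(sv):
    """Return referenced ids that are missing from the segment collection."""
    known = {s.get("id") for s in sv.iter("Segment")}
    used = set()
    for rel in sv.iter("Relation"):
        used.add(rel.get("source"))
        used.update(d.get("ref") for d in rel.iter("Destination"))
    return sorted(used - known)
```

`dangling_references` is a simple stand-in for a referential integrity check: it reports any relation endpoint that does not resolve to a stored segment. Because relations only carry ids, a segment can be added to the collection before or after the relations that mention it.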
Overall summary describes the sport video game as a whole; it includes where (stadium), when (date and time), who (teams that compete), final result, and match statistics. Match statistics can be stored as XML tags or as a visual frame, such as a text display that depicts the number of goals, shots, fouls, red/yellow cards, and counter attacks in a soccer game.

Hierarchical summary is composed of a comprehensive summary and a highlight events (HE) summary. The comprehensive summary describes a sport video in terms of play-break sequences, which are the main story decomposition unit in most sport videos. For example, an attacking attempt during a play is stopped when there is a goal or foul. Each play-break can contain zero or one (key) event and can be decomposed into one or more play and break shots. Each play or break can be described by text-alternative annotations, including face, replay and excitement, which are referenced (segments) from the segment collection. On the other hand, the HE summary organizes highlight events into common summary themes such as soccer goals and basketball free throws.

Each time a sport video is instantiated, it will be specialized into the classified genre, such as soccer video, basketball video and AFL video. Therefore, a soccer video will inherit all components of a (general) sport video while providing extra attributes such as sport category and some extra components. In particular, for each type of sport video, we can extract domain-specific events such as a soccer goal. Each domain event can be described using specific roles such as goal scorer. It should be noted that goal scorer will reference a player that is defined elsewhere in order to avoid nested components. Similarly, domain events are referenced by hierarchical summaries.

Finally, a sport video database is composed of one or more classified sport videos and one semantic object collection. The semantic object collection defines the details of all the semantic objects that appear in the sport videos. For example, player can be instantiated into soccer player, which is described by the specific attributes of a soccer player such as squad number and preferred position.

Figure 7. Extensible Indexing Scheme (1).

Figure 8. Extensible Indexing Scheme (2).

It is to be noted that, in order to achieve a faster gradual index construction, all segments should be extractable incrementally at the same level, without concern for the hierarchy. For example, assuming that PB1 contains P1, P2, and B1, the system should be able to add B1 without necessarily attaching it to PB1. This allows the system to easily add P1 and P2 at a later time. Therefore, hierarchy structures should be stored separately as a hierarchical view or processed dynamically when required by users for browsing.

Using the proposed video model, we have demonstrated a sport video indexing scheme that supports:

- Extensible video indexes that allow gradual extraction of segments and events without affecting the others. For example, we can introduce more segments and events incrementally without affecting the existing ones. Similarly, more semantic objects, such as stadium and referee, can be introduced at a later stage when many sport videos share the same stadium and referee.
- Object-relationship modeling scheme.
In particular, we have demonstrated that inheritance and referencing are important features in video database modeling. Inheritance enables us to reuse existing parent components while refining them with more specific items. Referencing enables us to store video components in a flat list which can be referenced by hierarchical structures to avoid redundancies.
- Semi-schema based modeling scheme. As shown in Figure 7, we allow users/applications to add ANY additional elements (or attributes) into a segment description as long as the element has been declared somewhere else in the proposed schema, or in another schema within a particular scope. In fact, we may attach ANY to other elements in our data model to allow more flexibility, as users often know better than developers what they want to describe. However, we aim to gradually modify the schema with new components, especially when the extra information provided by users can be used to enrich the current video model.

4 Experimental Results

Performance results for mid-level feature extraction (required during training and evaluation), including view classification, near-goal, and excitement, have been presented in our previous papers (Tjondronegoro et al., 2004a). For AFL and basketball videos, we only need to ensure that the adaptive thresholds are effective for each video sample. For this purpose, we compare the truth and the automatic results of feature detection on each video for a duration of 5-10 minutes. We then select the best empirical thresholds that can be applied to all videos within the same domain.

Missing and/or false detections of individual mid-level features have a less significant impact on the highlights classification, as the models depend on the fusion of all features. For example, a soccer goal will still be detectable even if the near-goal ratio and excitement are not detected perfectly. Nevertheless, the more accurately the mid-level features can be extracted, the more accurately the highlight points will be calculated.
Hence, during the experiment we have set a minimum value that a highlight point should reach to be trusted. For all sport videos, we have successfully applied a minimum of 3 points for all highlights, which means that at least 3 mid-level features can be detected. In almost all cases, highlights can be detected with a 6 to 7 point minimum threshold.

Table 2 describes the video samples used during the experiment. For each sport, we have used videos from different competitions, broadcasters and/or stages of tournament. The purpose is that, for example, a final match is expected to contain more excitement than a group match, while an exhibition will show many replay scenes to display players' skills. Our experiment was conducted using MATLAB 6.5 with the image processing toolbox. The videos are captured directly from a TV tuner and compressed into .mpg format, which can be read into MATLAB image matrices.

Sample Group (Broadcaster):
- Soccer: UEFA Champions League Group Stage Matches (SBS)
- Soccer: UEFA Champions League Elimination Rounds (SBS)
- Soccer: FIFA World Cup Final (Nine)
- Soccer: International Exhibition (SBS)
- Soccer: FIFA 100th Anniversary Exhibition (SBS)
- AFL League Matches (Nine)
- AFL League Matches (Ten)
- AFL League Final Rounds (Ten)
- Basketball: Athens 2004 Olympics (Seven)
- Basketball: Athens 2004 Olympics (SBS)

Videos (team1-team2_period-[duration]):
- ManchesterUtd-Deportivo1,2-[9:51, 19:50]
- Madrid-Milan1,2[9:55,9:52]
- Juventus-Madrid1,2:[19:45,9:50]
- Milan-Internazionale1,2:[9:40,5:53]
- Milan-Depor1,2-[51:15,49:36] (S1)
- Madrid-BayerMunich1,2-[59:41,59:00] (S2)
- Depor-Porto-[50:01,59:30] (S3)
- Brazil-Germany [9:29,19:46]
- Aussie-SthAfrica1,2-[48:31,47:50] (S4)
- Brazil-France1,2-[31:36,37:39] (S5)
- COL-GEEL_2-[28:39] (A3)
- StK-HAW_3-[19:33] (A4)
- Rich-StK_4-[25:20] (A5)
- COL-HAW_2-[28:15] (A1)
- ESS-BL_2-[35:28] (A2)
- BL-ADEL_1,2:[35:33,18:00] (A6)
- Port-Geel_3,4-[30:37,29:00] (A7)
- Women: AusBrazil_1,2,3-[19:50,19:41,4:20] (B1)
- Women: Russia-USA_3-[19:58] (B2)
- Men: Australia-USA_1,2-[29:51,6:15] (B3)
- Men: USA-Angola_2,3-[22:25,15:01] (B4)
- Women: Australia-USA_1,2-[24:04-11:11] (B5)

Table 2. Sample Video Data.

4.1 Performance of Play-Break Segmentation

Play-break scoping plays a significant role in ensuring that we can extract all of the features that usually exist in each highlight. Moreover, the statistics (especially play-/break-dominance) will be affected when the play-break sequences are not detected perfectly. Table 3 to Table 5 depict the performance of the play-break segmentation algorithm on soccer, AFL and basketball videos, respectively. It is to be noted that RC = Replay-based (P-B sequence) Correction, PD = perfectly detected, D = detected, M = missed detection, F = false detection, Tru = total number in truth, Det = total detected, RR = Recall Rate, PR = Precision Rate, and PD_decr = perfectly detected decrease rate if RC is not used, where:

$Tru = PD + D + M$, $Det = PD + D + F$, $RR = (PD + D)/Tru \times 100\%$, $PR = (PD + D)/Det \times 100\%$, and $PD\_decr = (PD_{RC} - PD_{noRC})/PD_{RC} \times 100\%$, with $PD_{RC}$ and $PD_{noRC}$ denoting PD with and without RC.

The results demonstrate that RC is generally useful to improve the play-break segmentation performance. This is because many (if not most) replay scenes, especially in soccer and AFL, use global (i.e. play) shots.
This is shown by PD_decr, RR, and PR, as RC improves or preserves these performance statistics for nearly all videos. In particular, the RR and PR for soccer 1-1 with RC are 100% each, but they are reduced to below 50% without RC. In soccer 1-1 without RC, the PD dropped from 49 to 12 (i.e. 75% worse), whereas M increases from 0 to 25 and F increases from 0 to 5. This is due to the fact that the soccer1 video contains many replay scenes which are played abruptly during a play, thereby causing a too-long play scene and a missed break. However, based on the statistics shown in Table 5, RC for basketball may not be as important as for soccer and AFL. This is because basketball's replay scenes use more break shots, such as zoom-in and close-up, as compared to soccer and AFL.

4.2 Performance of Soccer Events Detection

Based on Table 6 and Table 7, most soccer highlights can be distinguished from non-highlights with high recall and precision. As there are normally not many goal highlights in a soccer match, it would be ideal to have a high RR over a reasonable PR; 5 out of 7 goals are correctly detected from the 5 sample videos, while 2 shoots and 1 non-highlight are classified as goals. The shoot segments detected as goals were very exciting and nearly resulted in a goal. On the other hand, the non-highlight detected as a goal also consists of a long duration, replay scenes and excited commentary due to a fight between players.

The foul detection is also effective as the RR is 81% and most of the misdetections are detected as either shoot or non-highlight, which have the closest characteristics. However, the PR is considerably low since some shoots and non-highlights are detected as fouls. An alternative solution is to use whistle existence for foul detection, but we still need to achieve a really accurate whistle detection that can overcome the high level of noise in most sport domains. Only 46 out of 266 non-highlight sequences were incorrectly detected as highlights.
These additional highlights will still be presented to the viewers as there are generally not many significant events during a soccer video. In fact, most of these false highlights can still be interesting for some viewers as they often consist of long excitement, near-goal duration and replay scenes.

4.3 Performance of Basketball Events Detection

Highlights detection in basketball is slightly harder than in soccer and AFL due to the fact that: 1) goals are generally not celebrated as much as in soccer and AFL, 2) non-highlights are often detected as goals and vice versa. Fortunately, non-highlights mainly just include ball-out play, which hardly happens in basketball matches. Thus, we have decided to exclude non-highlight detection and replace it with timeout detection, which can be regarded as non-highlights for most viewers. However, for some sport fans, timeouts may still be interesting as they show the players and coaches for each team and some replay scenes.

In addition to these problems, sequences containing fouls are sometimes inseparable from the resulting free throws. For such cases, the fouls are often detected as goals due to the high amount of excitement and long near-goal duration. Fouls which are detected as goals can actually be avoided by applying a higher minimum highlight point for goals, but at the expense of missing some goal segments. For our experiment, we did not use this option as we want to use a universal threshold for all highlights.

Based on Table 8 and Table 9, basketball goal detection achieves high RR and reasonable PR. This is due to the fact that goals generally have very unique characteristics as compared to fouls and free throws. Timeouts can be detected very accurately (high RR and PR) due to their very long duration and many replay scenes. Moreover, most broadcasters will play some in-between advertisements when a timeout is longer than 2 minutes, thereby increasing the close-up ratio. Free throw is also detected very well due to the fact that a free throw is mainly played in near-goal position; that is, the camera focuses on capturing the player with the ball to shoot.
However, it is generally distinguishable from a goal based on: less excitement, higher near-goal, and more close-up shots; that is, a goal scorer is often just shown with zoom-in views to keep the game flowing. However, the system only detected 28 out of 54 foul events. This problem is caused by the fact that after a foul, basketball videos often abruptly switch to a replay scene which is followed by a time-out or free throw. This can be fixed with the introduction of additional knowledge such as whistle detection.

4.4 Performance of AFL Events Detection

As shown in Table 10 and Table 11, the overall performance of the AFL highlights detection is found to yield promising results. All 37 goals from the 7 videos were correctly detected. Although the RR of behind detection seems to be low, most of the misdetections are actually detected as goals. Moreover, a behind is still a subtype of goal except that it has a lower point awarded. The slightly lower performance for detection of mark and tackle is caused by the fact that our system does not include the whistle feature, which is predominantly used during these events. Based on the experimental results, mark is the hardest to detect and needs additional knowledge. In Table 11, PR and RR for behind are N/A as 1 behind was detected as goal, while Mark = N/A because 5 marks were detected as goal.

5 Conclusion and Future Work

We have proposed an extensible approach for detecting events in sports video. The use of play-break scoping for all highlights has enabled us to obtain statistical phenomena of the features contained in each highlight. Since the rules for highlight classification are driven by the statistics, none or a low amount of domain-specific knowledge is required. Therefore, the proposed algorithms should be more robust for different sports, especially field-ball goal-oriented games. Based on the experimental results, play-break sequences are proven to be effective containers for detecting highlights. Thus, play-breaks need to be perfectly segmented, and we have shown that replay correction improves the performance.

We have also proposed a segment-event based video data model which is designed using semi-schema-based and object-relationship modeling schemes. The schema is developed into XML Schema with ORA-SS notation.
The proposed schema is extensible as it supports incremental development of algorithms for feature-semantic extraction. Moreover, the schema does not need to be complete at one time, while allowing users to add additional elements. We have also emphasized the usage of referencing relationships to avoid redundant data. Referencing also allows the system to add segments and events in a way that achieves more straightforward and faster data insertions.

In order to further verify and improve the robustness of the proposed algorithms for events detection, we have incorporated more sport genres, such as volleyball, tennis and gymnastics, into the existing dataset. The extracted information will allow the system to construct a larger sample of video database data, which consequently would verify the benefits of using the proposed video indexing model.

Soccer play-break detection

| Video | PD | D | M | F | Tru | Det | RR | PR | PD_decr |
|---|---|---|---|---|---|---|---|---|---|
| S1-1 (RC) | 49 | 0 | 0 | 0 | 49 | 49 | 100.00 | 100.00 | |
| S1-1 | 12 | 12 | 25 | 5 | 49 | 54 | 48.98 | 44.44 | 75.51 |
| S1-2 (RC) | 53 | 0 | 0 | 1 | 53 | 54 | 100.00 | 98.15 | |
| S1-2 | 36 | 10 | 7 | 1 | 53 | 54 | 86.79 | 85.19 | 32.08 |
| S2-1 (RC) | 54 | 1 | 1 | 12 | 56 | 68 | 98.21 | 80.88 | |
| S2-1 | 53 | 2 | 1 | 12 | 56 | 68 | 98.21 | 80.88 | 1.85 |
| S2-2 (RC) | 58 | 1 | 0 | 7 | 59 | 66 | 100.00 | 89.39 | |
| S2-2 | 55 | 4 | 0 | 7 | 59 | 66 | 100.00 | 89.39 | 5.17 |
| S3-1 (RC) | 49 | 0 | 0 | 4 | 49 | 53 | 100.00 | 92.45 | |
| S3-1 | 45 | 4 | 0 | 5 | 49 | 54 | 100.00 | 90.74 | 8.16 |
| S3-2 (RC) | 69 | 0 | 0 | 3 | 69 | 72 | 100.00 | 95.83 | |
| S3-2 | 65 | 4 | 0 | 5 | 69 | 74 | 100.00 | 93.24 | 5.80 |
| S4-1 (RC) | 49 | 0 | 0 | 9 | 49 | 58 | 100.00 | 84.48 | |
| S4-1 | 40 | 8 | 1 | 13 | 49 | 62 | 97.96 | 77.42 | 18.37 |
| S4-2 (RC) | 47 | 0 | 0 | 9 | 47 | 56 | 100.00 | 83.93 | |
| S4-2 | 36 | 11 | 0 | 12 | 47 | 59 | 100.00 | 79.66 | 23.40 |
| S5 (RC) | 48 | 0 | 0 | 0 | 48 | 48 | 100.00 | 100.00 | |
| S5 | 24 | 16 | 8 | 1 | 48 | 49 | 83.33 | 81.63 | 50.00 |

Table 3. Play-Break Detection in Soccer Videos.
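The statistics in Table 3 can be recomputed from the raw counts with a few lines of Python. This is a sketch for checking the arithmetic, not code from the original experiment (which used MATLAB); the function names are hypothetical. Tru and Det are taken directly from the table as inputs, because the tabulated Det values appear to equal PD + D + M + F rather than the PD + D + F given in Section 4.1.

```python
def play_break_metrics(pd, d, tru, det):
    """Return (RR, PR) in percent for one table row.

    pd: perfectly detected, d: detected, tru: total in truth,
    det: total detected (read from the table, not derived here).
    """
    detected = pd + d
    return 100.0 * detected / tru, 100.0 * detected / det


def pd_decrease(pd_with_rc, pd_without_rc):
    """Return PD_decr in percent: the relative drop in PD when RC is not used."""
    return 100.0 * (pd_with_rc - pd_without_rc) / pd_with_rc


# Row S1-1 without RC: PD=12, D=12, Tru=49, Det=54; PD with RC is 49.
rr, pr = play_break_metrics(12, 12, 49, 54)
print(f"RR={rr:.2f} PR={pr:.2f} PD_decr={pd_decrease(49, 12):.2f}")
# prints: RR=48.98 PR=44.44 PD_decr=75.51
```

The printed values match the S1-1 row of Table 3, which supports reading RR as (PD + D)/Tru and PD_decr as the drop in PD relative to the RC result.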
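AFL play-break detection

| Video | PD | D | M | F | Tru | Det | RR | PR | PD_decr |
|---|---|---|---|---|---|---|---|---|---|
| A1 (RC) | 34 | 0 | 0 | 5 | 34 | 39 | 100.00 | 87.18 | |
| A1 | 29 | 5 | 0 | 8 | 34 | 42 | 100.00 | 80.95 | 14.71 |
| A2 (RC) | 21 | 6 | 0 | 8 | 27 | 35 | 100.00 | 77.14 | |
| A2 | 16 | 10 | 1 | 5 | 27 | 32 | 96.30 | 81.25 | 23.81 |
| A3 (RC) | 20 | 3 | 0 | 4 | 23 | 27 | 100.00 | 85.19 | |
| A3 | 17 | 6 | 0 | 6 | 23 | 29 | 100.00 | 79.31 | 15.00 |
| A4 (RC) | 29 | 0 | 0 | 1 | 29 | 30 | 100.00 | 96.67 | |
| A4 | 21 | 6 | 2 | 2 | 29 | 31 | 93.10 | 87.10 | 27.59 |
| A5 (RC) | 34 | 0 | 0 | 1 | 34 | 35 | 100.00 | 97.14 | |
| A5 | 23 | 4 | 7 | 3 | 34 | 37 | 79.41 | 72.97 | 32.35 |
| A6 (RC) | 50 | 2 | 0 | 3 | 52 | 55 | 100.00 | 94.55 | |
| A6 | 36 | 10 | 6 | 7 | 52 | 59 | 88.46 | 77.97 | 28.00 |
| A7 (RC) | 41 | 10 | 4 | 4 | 55 | 59 | 92.73 | 86.44 | |
| A7 | 39 | 12 | 4 | 6 | 55 | 61 | 92.73 | 83.61 | 4.88 |

Table 4. Play-Break Detection Results in AFL Videos.

Basketball play-break detection

| Video | PD | D | M | F | Tru | Det | RR | PR | PD_decr |
|---|---|---|---|---|---|---|---|---|---|
| B1 (RC) | 32 | 6 | 2 | 3 | 40 | 43 | 95.00 | 88.37 | |
| B1 | 31 | 7 | 2 | 4 | 40 | 44 | 95.00 | 86.36 | 3.13 |
| B2 (RC) | 19 | 2 | 0 | 2 | 21 | 23 | 100.00 | 91.30 | |
| B2 | 18 | 3 | 0 | 3 | 21 | 24 | 100.00 | 87.50 | 5.26 |
| B3 (RC) | 39 | 3 | 0 | 1 | 42 | 43 | 100.00 | 97.67 | |
| B3 | 38 | 4 | 0 | 2 | 42 | 44 | 100.00 | 95.45 | 2.56 |
| B4 (RC) | 26 | 5 | 2 | 2 | 33 | 35 | 93.94 | 88.57 | |
| B4 | 25 | 6 | 0 | 3 | 31 | 34 | 100.00 | 91.18 | 3.85 |
| B5 (RC) | 39 | 0 | 1 | 1 | 40 | 41 | 97.50 | 95.12 | |
| B5 | 25 | 13 | 2 | 5 | 40 | 45 | 95.00 | 84.44 | 35.90 |

Table 5. Play-Break Detection Results in Basketball.

Highlight classification of 5 soccer videos (rows: ground truth, columns: detected class)

| Ground truth | Goal | Shoot | Foul | No | Truth |
|---|---|---|---|---|---|
| Goal | 5 | 0 | 2 | 0 | 7 |
| Shoot | 2 | 66 | 32 | 12 | 112 |
| Foul | 0 | 13 | 91 | 13 | 117 |
| No | 1 | 11 | 34 | 220 | 266 |
| Detected | 8 | 90 | 159 | 245 | |

Table 6. Events Detection Results in Soccer Videos.

| Event | S1 RR | S1 PR | S2 RR | S2 PR | S3 RR | S3 PR | S4 RR | S4 PR | S5 RR | S5 PR | Average RR | Average PR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Goal | 60 | 100.0 | 100 | 50.0 | N/A | N/A | 100 | 33.3 | N/A | N/A | 86.7 | 61.1 |
| Shoot | 39.4 | 76.5 | 64.0 | 84.2 | 80.0 | 66.7 | 78.9 | 71.4 | 40.0 | 66.7 | 60.5 | 73.1 |
| Foul | 85.2 | 53.5 | 68.0 | 53.1 | 71.4 | 78.9 | 88.9 | 38.1 | 92.9 | 52.0 | 81.3 | 55.1 |
| No | 86.5 | 82.1 | 86.3 | 88.5 | 90.5 | 90.5 | 75.8 | 100.0 | 60.0 | 80.0 | 79.8 | 88.2 |

Table 7. Distribution of Soccer Events Detection.

Highlight classification of 5 basketball videos (rows: ground truth, columns: detected class)

| Ground truth | Goal | Free throw | Foul | Timeout | Truth |
|---|---|---|---|---|---|
| Goal | 56 | 0 | 0 | 2 | 58 |
| Free throw | 4 | 14 | 0 | 0 | 18 |
| Foul | 21 | 2 | 28 | 3 | 54 |
| Timeout | 0 | 0 | 0 | 13 | 13 |
| Total Detected | 81 | 16 | 28 | 18 | |

Table 8. Basketball Events Detection Results.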
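As a worked example of how a confusion matrix such as Table 8 relates to recall and precision, the sketch below computes pooled per-class rates from the tabulated counts. The function and variable names are hypothetical. Note that these pooled rates are computed over all five videos at once, so they are close to, but not identical with, the per-video averages reported in Table 9.

```python
LABELS = ["Goal", "Free throw", "Foul", "Timeout"]

# Rows are ground truth, columns are detected classes (counts from Table 8).
CONFUSION = [
    [56, 0, 0, 2],
    [4, 14, 0, 0],
    [21, 2, 28, 3],
    [0, 0, 0, 13],
]


def recall_precision(matrix, labels):
    """Return {label: (recall %, precision %)}; None where undefined."""
    results = {}
    for i, label in enumerate(labels):
        truth = sum(matrix[i])                    # row total
        detected = sum(row[i] for row in matrix)  # column total
        hit = matrix[i][i]                        # correctly classified
        rr = 100.0 * hit / truth if truth else None
        pr = 100.0 * hit / detected if detected else None
        results[label] = (rr, pr)
    return results


for label, (rr, pr) in recall_precision(CONFUSION, LABELS).items():
    print(f"{label}: RR={rr:.2f} PR={pr:.2f}")
```

For fouls, this gives a recall of 28/54 (about 51.85%) with 100% precision, which reflects the observation in Section 4.3 that only 28 out of 54 fouls were detected while no other event was mislabelled as a foul.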

| Event | B1 RR | B1 PR | B2 RR | B2 PR | B3 RR | B3 PR | B4 RR | B4 PR | B5 RR | B5 PR | Average RR | Average PR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Goal | 100 | 72.2 | 75 | 50.0 | 95 | 70.4 | 100 | 72.2 | 100 | 66.7 | 94 | 66.3 |
| Free throw | 100.0 | 66.7 | 100.0 | 75.0 | 80.0 | 100.0 | 50.0 | 100.0 | 66.7 | 100.0 | 79.33 | 88.3 |
| Foul | 64.7 | 100.0 | 50.0 | 100.0 | 30.8 | 100.0 | 37.5 | 100.0 | 75.0 | 100.0 | 51.59 | 100.0 |
| Timeout | 100.0 | 100.0 | 100.0 | 50.0 | 100.0 | 40.0 | 100.0 | 66.7 | 100.0 | 100.0 | 100 | 71.3 |

Table 9. Distribution of Basketball Events Detection.

Highlight classification of 7 AFL videos (rows: ground truth, columns: detected class)

| Ground truth | Goal | Behind | Mark | Tackle | No | Truth |
|---|---|---|---|---|---|---|
| Goal | 37 | 0 | 0 | 0 | 0 | 37 |
| Behind | 11 | 12 | 7 | 0 | 2 | 32 |
| Mark | 15 | 1 | 35 | 8 | 5 | 64 |
| Tackle | 4 | 0 | 9 | 20 | 2 | 35 |
| No | 4 | 4 | 11 | 3 | 33 | 55 |
| Detected | 71 | 17 | 62 | 31 | 42 | |

Table 10. Events Detection Results in AFL Videos.

| Event | A1 RR | A1 PR | A2 RR | A2 PR | A3 RR | A3 PR | A4 RR | A4 PR | A5 RR | A5 PR | A6 RR | A6 PR | A7 RR | A7 PR | AVG RR | AVG PR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Goal | 100.0 | 44.4 | 100.0 | 52.9 | 100.0 | 57.1 | 100.0 | 33.3 | 100.0 | 50.0 | 100.0 | 63.6 | 100.0 | 53.3 | 100.0 | 50.7 |
| Behind | 50.0 | 100.0 | N/A | N/A | 33.3 | 33.3 | 33.3 | 66.7 | 50.0 | 100.0 | 33.3 | 100.0 | 33.3 | 50.0 | 38.9 | 75.0 |
| Mark | 50.0 | 60.0 | N/A | N/A | 60.0 | 60.0 | 77.8 | 77.8 | 60.0 | 42.9 | 66.7 | 42.1 | 47.1 | 80.0 | 60.3 | 60.5 |
| Tackle | 80.0 | 66.7 | 100.0 | 75.0 | 25.0 | 100.0 | 100.0 | 100.0 | 12.5 | 33.3 | 85.7 | 66.7 | 50.0 | 50.0 | 64.7 | 70.2 |
| No | 77.8 | 100.0 | 33.3 | 100.0 | 50.0 | 50.0 | 50.0 | 66.7 | 71.4 | 71.4 | 46.2 | 100.0 | 75.0 | 69.2 | 57.7 | 79.6 |

Table 11. Distribution of AFL Events Detection.

Figure 9. Extensible Indexing Scheme (3).

6 References

Adali, S., Candan, K. S., Chen, S.-S., Erol, K. and Subrahmanian, V. S. (1996) 'The Advanced Video Information System: Data Structures and Query Processing' Multimedia Systems, 4, 172-186.

Connolly, T. M. and Begg, C. E. (2002) Database systems: a practical approach to design, implementation, and management, Addison-Wesley, Harlow [England]; [New York].

Djeraba, C. (2002) 'Content-based multimedia indexing and retrieval' Multimedia, IEEE, 9, 18-22.

Dobbie, G., Xiaoying, W., Ling, T. W. and Lee, M. L. (2000) In Technical Report, Department of Computer Science, National University of Singapore.

Duan, L.-Y., Xu, M., Chua, T.-S., Qi, T. and Xu, C.-S. (2003) In ACM MM2004, ACM, Berkeley, USA, pp. 33-44.

Ekin, A. and Tekalp, A. M. (2003a) In International Conference on Multimedia and Expo 2003 (ICME03), Vol. 1, IEEE, pp. 6-9 July 2003.

Ekin, A. and Tekalp, M. (2003b) 'Automatic Soccer Video Analysis and Summarization' IEEE Transaction on Image Processing, 12, 796-807.

Han, M., Hua, W., Chen, T. and Gong, Y. (2003) In Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint Conference of the Fourth International Conference on, Vol. 2, pp. 950-954.

Li, B. and Ibrahim Sezan, M. (2001) In Content-Based Access of Image and Video Libraries, 2001. (CBAIVL 2001). IEEE Workshop on, Sharp Labs. of America, Camas, WA, USA, pp. 132-138.

Nepal, S., Srinivasan, U. and Reynolds, G. (2001) In ACM International Conference on Multimedia, ACM, Ottawa, Canada, pp. 261-269.

Oomoto, E. and Tanaka, K. (1997) In The Handbook of Multimedia Information Management (Ed, William I. Grosky, R. J. and R. M.) Prentice Hall, Upper Saddle River, NJ, pp. 405-448.

Tjondronegoro, D., Chen, Y.-P. P. and Pham, B. (2004a) 'Integrating Highlights to Play-break Sequences for More Complete Sport Video Summarization' IEEE Multimedia, Oct-Dec 2004, 22-37.

Tjondronegoro, D., Chen, Y.-P. P. and Pham, B. (2004b) In The 6th International ACM Multimedia Information Retrieval Workshop, ACM Press, New York, USA, pp. 267-274.

Wu, C., Ma, Y.-F., Zhang, H.-J. and Zhong, Y.-Z. (2002) In Multimedia and Expo, 2002. Proceedings. 2002 IEEE International Conference on, Vol. 1, pp. 805-808.

Xu, P., Xie, L. and Chang, S.-F. (1998) In IEEE International Conference on Multimedia and Expo, IEEE, Tokyo, Japan.