Machine Learning for Stock Selection


Robert J. Yan
Computer Science Dept., The University of Western Ontario
jyan@csd.uwo.ca

Charles X. Ling
Computer Science Dept., The University of Western Ontario
cling@csd.uwo.ca

ABSTRACT
In this paper, we propose a new method called Prototype Ranking (PR) designed for the stock selection problem. PR takes into account the huge size of real-world stock data and applies a modified competitive learning technique to predict the ranks of stocks. The primary target of PR is to select the top performing stocks among many ordinary stocks. PR is designed to perform the learning and testing in a noisy stock sample set where the top performing stocks are usually the minority. The performance of PR is evaluated by a trading simulation on real stock data. Each week the stocks with the highest predicted ranks are chosen to construct a portfolio. In the period 1978-2004, PR's portfolio earns a much higher average return as well as a higher risk-adjusted return than Cooper's method, which shows that the PR method leads to a clear profit improvement.

Categories and Subject Descriptors
I.5 [PATTERN RECOGNITION]

General Terms
Algorithms

Keywords
Stock selection

1. INTRODUCTION
Recently a considerable amount of work has been devoted to predicting stocks based on machine learning techniques (e.g., [1;3;6]). These methods use a set of training samples to generate an approximation of the underlying function of the data. Compared with statistical methods, machine learning methods do not involve assumptions about sample independence or special distributions [7]. These assumptions may not always be met in real-world situations, to which machine learning methods are designed to adapt.

In this paper, we investigate the issue of stock selection to form a portfolio with high return. In a real-world trading environment, given a set of stocks, how can we select the best stocks? This task involves predicting a ranking of the stocks and choosing the top ones to form the portfolio.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD'07, August 12-15, 2007, San Jose, California, USA. Copyright 2007 ACM 978-1-59593-609-7/07/0008...$5.00.

The usual categorical prediction systems (e.g., price/return trend prediction [3], which only predicts the direction of the price movement rather than the expected price) are not appropriate for this task. For instance, we do not know how to select the 5 best stocks if the system predicts that 20 stocks will move upward. Therefore, the task of stock selection needs a continuous prediction system.

All the stock price/return prediction methods (e.g., linear regression) are continuous systems. However, they may still lead to unreliable results. When it comes to individual stock prediction, the majority of previous methods (e.g., [6]) select the model that achieves the maximum overall prediction accuracy (i.e., the minimum sum of squared deviations from actual outputs) for all stocks. However, in the case of stock selection, where the goal is to form a portfolio from the best stocks, we only care about the top performing stocks. Thus, the model optimized for all stocks may not be suitable for our task.

We propose a new method, Prototype Ranking (PR), that is based on competitive learning [5]. PR is designed for the stock selection task rather than the individual stock prediction task. The overall prediction accuracy is no longer the primary objective during the model search. Instead, PR tries to learn a network of prototypes, where the prototypes are super points that each represent a group of nearby training samples, and the whole network can be considered as a model. This network has a better chance to distinguish the top performing stocks from ordinary stocks.
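The contrast drawn above between categorical and continuous predictions can be illustrated with a small sketch. The stock identifiers, the predicted values, and the helper `select_top` are all hypothetical and are not taken from the paper; the point is only that a direction label cannot order the "up" stocks, whereas a continuous score can.

```python
def select_top(scores, n):
    """Return the n stock ids with the highest continuous scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical predicted weekly returns for six stocks (illustrative values only).
predicted_return = {
    "A": 0.021, "B": 0.004, "C": 0.013,
    "D": -0.008, "E": 0.017, "F": 0.009,
}

# A categorical system only reports the direction of each move.
direction = {s: ("up" if r > 0 else "down") for s, r in predicted_return.items()}
up_stocks = [s for s, d in direction.items() if d == "up"]

# Five stocks are labelled "up", so the labels alone cannot pick the best two.
# The continuous scores can.
top2 = select_top(predicted_return, 2)
print(len(up_stocks), top2)
```

A ranking score plays the same role as the continuous return here: only the ordering matters for portfolio formation.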
PR is applied to samples of NYSE and AMEX individual stocks over the period 1978 to 2004. The experimental results show that PR is robust in short-term stock selection, and that its performance is better than that of Cooper's traditional selection method [2] after transaction costs. Section 2 defines the task of stock selection. Section 3 introduces the process of PR learning and testing. The experimental results are shown in Section 4. A conclusion is given in Section 5.

2. DEFINING STOCK SELECTION TASK
In this section, we discuss the formulation of the stock selection task and its evaluation. We assume that trading days (when the market is open) are divided into weeks of five days labeled by the index t. The task of stock selection is to find the n best performing stocks in the set of stocks that we choose for week t, given only the information set available at the start of the week. In order to formulate stock selection as a machine learning task we need to specify the following entities:

The training sample set of week t, with N samples S(j, ·), j = 1, ..., N. Note that each sample in the training set is associated with a specific week prior to week t.

A sample S(j, t) = (X(j, t), RR(j, t)), where S lies in the sample space, X is the predictor vector, and RR is the stock's real return. There exists an underlying function f such that f(X(·)) = RR(·).

A separate testing sample set of week t, with M samples S(j, t) = (X(j, t), RR(j, t)), j = 1, ..., M.

As in a typical machine learning process, a ranking function g that approximates f is learned from the training set by a specific algorithm. The rank of a testing sample j in the testing set is then predicted by Rank(j, t) = g(X(j, t)). After all the testing samples are assigned predicted ranks, the n stocks with the highest/lowest ranks are selected to form a portfolio for week t. This process is repeated from the first testing week $t_s$ to the last testing week $t_e$.

We can see that such a stock selection task depends on two key decisions: How do we find g? What choice do we make for the predictor vector? We discuss how to use the competitive-learning-based method PR to find the ranking function g in Section 3. For the predictor vector, we follow Cooper [2] in the choice of predictors. This is discussed in Section 4.1.

3. PROTOTYPE RANKING
In this section, we discuss the algorithm of the PR method, which consists of a training process and a testing process. PR applies a modified competitive learning method to learn a ranking function g from the training sample set and generates predicted ranks for the testing samples.

A quick review of traditional competitive learning is as follows. A competitive learning model (network) consists of H prototypes $p_1, p_2, \ldots, p_H$. Prototypes can be thought of as super points that represent a group of actual training samples around them in the input space. Each prototype $p_i$ has an associated reference vector $w_i$ in the input space. The general competitive learning process can be described as follows:

1. Initialize the set by randomly choosing $w_i$ for each $p_i$.
2. For each training sample S, calculate the distance from S to each $w_i$ and choose the one or several closest prototypes (winners).
3. Adapt the reference vectors of the winners towards S: $w_i \leftarrow w_i + \varepsilon\,(S - w_i)$, where $\varepsilon$ is the learning rate.

Competitive learning algorithms are widely used for clustering analysis [5] and feature mapping [4].
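The three steps of traditional competitive learning reviewed above can be sketched as follows. This is a minimal illustration, not the modified tree-based algorithm that PR uses: it assumes a single winner per sample, squared Euclidean distance, a constant learning rate, and random initialization inside the bounding box of the data. The names `adapt`, `train_prototypes`, `epochs`, and `seed` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adapt(w, s, epsilon):
    """Step 3: move reference vector w towards sample s by a fraction epsilon."""
    return [wi + epsilon * (si - wi) for wi, si in zip(w, s)]


def train_prototypes(samples, num_prototypes, epsilon=0.1, epochs=10, seed=0):
    """Winner-take-all competitive learning over a list of sample vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    lo = [min(s[d] for s in samples) for d in range(dim)]
    hi = [max(s[d] for s in samples) for d in range(dim)]
    # Step 1: choose each reference vector at random (assumed: inside the data range).
    protos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
              for _ in range(num_prototypes)]
    for _ in range(epochs):
        for s in samples:
            # Step 2: the winner is the prototype closest to the sample.
            win = min(range(num_prototypes), key=lambda i: sq_dist(protos[i], s))
            # Step 3: adapt only the winner towards the sample.
            protos[win] = adapt(protos[win], s, epsilon)
    return protos


samples = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
protos = train_prototypes(samples, num_prototypes=2)
print(protos)
```

Because each update is a convex combination of the old reference vector and a sample, the prototypes stay inside the range of the data, which is one reason they can act as representatives of nearby samples.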
3.1 PR Training
As shown in Figure 2, PR training consists of the following three steps.

(1) Data preparation. The raw stock data is converted into samples. For each week, the samples are divided into training samples and testing samples.

(2) Training the prototype tree. Traditional competitive learning defines a mapping from the input data to a single prototype network. A modified competitive learning algorithm is introduced in this paper, which maps the input data into multiple prototype networks arranged in a tree structure. We call these networks a prototype tree. Figure 1 shows an example of a two-dimensional prototype tree with depth = 3. In the PR algorithm, an initial complete k-ary prototype tree of depth L is first created. Each node represents a fixed prototype in the predictor space, which is a subspace of the input sample space. Nodes at the same depth are distributed uniformly to compose a network. The training process maps a training sample set to a prototype tree. For each training sample S(j, t), PR searches for its nearest prototypes (winners) on each tree level m. Those winning prototypes are then adapted to S(j, t). Note that in PR, the search for winners is performed in the predictor space instead of the entire sample space, because the prediction task needs patterns in the predictor space. At the end of this step, we obtain a trained prototype tree, which reflects the patterns in the training samples.

(3) Optimizing the trained prototype tree into a ranking model for the minority samples. This step first prunes the redundant prototypes. If all the children of a prototype are similar to each other, they can be replaced by their parent prototype without information loss. Considering that the majority of stocks in a stock dataset are ordinary, most prototypes in the tree will be trained to be ordinary. Such a tree tends to give ordinary predictions, which are meaningless to us. Pruning dramatically decreases the number of ordinary prototypes. After pruning, the tree has a better chance to generate extreme predictions. However, a single prediction is not what we need. To make the pruned tree predict relative relations among stocks, we assign each prototype an expected rank.
By doing so, the pruned tree is converted into a ranking model. When it is used for prediction, testing samples that are close to prototypes with high expected ranks obtain high predicted ranking scores.

Figure 1. Illustration of a 2D prototype tree

3.2 PR Testing
The idea of PR testing is to assign each testing sample a predicted ranking score. Inside the ranking model obtained from training, there are a number of prototypes with expected ranks distributed in the predictor space. Since prototypes always represent the nearby samples, the rank of a testing sample should be close to the ranks of its neighbouring prototypes. Therefore, we may apply kernel regression [10] to calculate the predicted rank of a testing sample. The testing stocks with the highest/lowest predicted ranking scores are selected to form a portfolio. The real return of this portfolio is then evaluated as a measure of the performance of PR.

In summary, the PR method has several properties:

It has the ability to process real-world stock datasets. To adapt to new data, the model must be renewed every week. Considering the huge size of a real-world stock dataset, batch methods that use all the available data to build a new model each week become impractical. Instead, PR adopts an on-line update mechanism: it uses only the latest data to update the old model.

The PR method takes the properties of stock data into account. By applying prototypes, PR can handle data noise and data imbalance (i.e., there are many more samples belonging to one category than another).

The PR method does not predict the individual stock return or price. The goal of PR is to generate ranking scores. The ranking score can be considered as the relative price/return and is more predictable than the individual price/return [8].

Figure 2. The framework of PR. Training: (1) data preparation, (2) training the prototype tree, (3) optimizing the prototype tree for ranking. Testing: (4) predicting sample ranks.

4. EXPERIMENTS
In this section, some empirical results are discussed. In Section 4.1, we first introduce the data used in the experiments. The experimental procedure and the performance measure are discussed in Section 4.2. In the following sections, we provide the results of the two experiments that we designed to evaluate the PR method.

4.1 Data
The data come from the database of the Center for Research in Security Prices (CRSP). We examine all samples of NYSE and AMEX individual stocks over the period 1962 (Dec.) to 2004 (Dec.). The stock universe we study is revised monthly.
It consists of the 300 NYSE and AMEX stocks that have the largest market capitalization. In all, 504 different stocks were chosen. We convert the daily data into five-trading-day weekly data. In a given week, we omit any stock that has missing volume or price information for any of the previous ten days. Samples in the weekly data set have the same format: S(j, t) = (X(j, t), RR(j, t)), where t is the index of the week and j is the stock's permanent number. The predictor vector X(j, t) contains three predictors:

Predictor 1: x(1, j, t) = the return of stock j for week t-1.
Predictor 2: x(2, j, t) = the return of stock j for week t-2.
Predictor 3: x(3, j, t) = the volume ratio, defined as $(V_1 - V_2)/(V_1 + V_2)$, where $V_1$ and $V_2$ are the volumes of stock j for weeks t-1 and t-2.

Compared with the volume ratio Cooper used, which is represented by $(V_1 - V_2)/V_1$, our volume ratio leads to a more symmetric distribution of values.

4.2 Procedure
The PR method is evaluated in the time period from the first week of 1978 to the last week of 2004. We apply PR to the training set of each week to train a model and then make predictions for the stocks in the testing set. To evaluate the performance of PR, we need to compare the predicted results with the real results. As mentioned in Section 1, an overall criterion (e.g., the sum of squared errors) is not appropriate. What we need to evaluate is the efficiency of the algorithm, that is, whether or not the stocks chosen by PR are profitable. Clearly, this can be evaluated by checking the real return of the chosen stocks, i.e., the portfolio.

In this paper we use a simple portfolio formation scheme. Each week we form a neutral portfolio consisting of n stocks long and n stocks short. The long (short) stocks are those with the highest (lowest) ranks. Each stock has equal weight (except when several stocks are tied for last place, in which case all those stocks are chosen with equally reduced weight). The average return of these portfolios over the testing time period, denoted ARP, is what we study.

The PR method aims to minimize the danger of data snooping. Data snooping occurs when a given set of data is used more than once for purposes of inference or model selection [9].
Therefore, the learning parameters must be decided prior to the testing time period. In this experiment, PR searches for the optimal values of its parameters in the time period from 1963 to 1977 and performs learning and testing in 1978-2004. The optimal parameter values are d=4, ( ) 0.9, k=9, s=4.5, and T=0.8.

We design two experiments for different evaluation purposes. Experiment 1 tests the predictability of the PR method.

Experiment 2 compares the PR method with Cooper's method both before and after transaction costs. In these experiments, we divide the testing period into two (1978-1993 and 1994-2004), because 1978-1993 was the period Cooper used for his tests, so we can obtain a direct comparison.

4.3 Experiment 1
The predictability of PR can be evaluated by comparing the returns of different portfolios it constructs in the same time period. Given a week t, two portfolios $P_1$ and $P_2$ are constructed. $P_1$ has $2n_1$ stocks and $P_2$ has $2n_2$ stocks. We denote the expected return and the real return of a portfolio P in week t as $\widehat{RP}_t(P)$ and $RP_t(P)$ respectively. Naturally, if a portfolio performs as predicted, the algorithm that generates the portfolio is considered to have predictability. The condition of predictability can be defined as follows. Assume that an algorithm predicts that $P_1$ is better than $P_2$, which means that $\widehat{RP}_t(P_1) > \widehat{RP}_t(P_2)$. If

$$\left(\widehat{RP}_t(P_1) > \widehat{RP}_t(P_2)\right) \Rightarrow \left(RP_t(P_1) > RP_t(P_2)\right),$$

then this algorithm has predictability in week t.

However, PR does not really calculate the expected return of a portfolio. PR always chooses the stock with the highest (lowest) rank, and the chosen stock always has the highest expected return in the set of remaining stocks. The more stocks are involved in a portfolio, the lower its expected return. Therefore we may change the condition of predictability to: if $(n_1 < n_2) \Rightarrow (RP_t(P_1) > RP_t(P_2))$, then PR has predictability in week t. Similarly, the condition of predictability in a certain time period is defined as follows: if $(n_1 < n_2) \Rightarrow (ARP(P_1) > ARP(P_2))$, then PR has predictability in this time period.

However, the above condition only works in a pure dataset with no noise. Given a real-world stock dataset, PR generates two portfolios, $P_1$ with $n_1$ stocks and $P_2$ with $n_2$ stocks. If $n_1$ and $n_2$ are too close (e.g., $n_1 = 5$ and $n_2 = 6$), then even if PR has a certain level of predictability, the above condition may still be violated because of the heavy noise in the training samples. To correctly reflect PR's predictability in a noisy environment, the difference between $n_1$ and $n_2$ should be large enough to tolerate the noise. We always set $n_2 - n_1 = 5$.
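The period-level predictability condition described above can be checked mechanically once the average realized return of each portfolio size is known. The sketch below is illustrative only: the weekly returns are hypothetical numbers, the function names are not from the paper, and the fixed size gap of 5 is an assumption about how the comparison pairs are formed.

```python
def average_returns_by_size(weekly_returns):
    """Map each portfolio size n to its average realized weekly return (ARP)."""
    return {n: sum(r) / len(r) for n, r in weekly_returns.items()}


def has_predictability(arp, gap=5):
    """True if ARP(P1) > ARP(P2) for every pair of sizes with n2 - n1 == gap.

    Returns False when no pair of sizes is separated by exactly `gap`,
    because the condition cannot be tested in that case.
    """
    sizes = sorted(arp)
    pairs = [(a, b) for a in sizes for b in sizes if b - a == gap]
    return bool(pairs) and all(arp[a] > arp[b] for a, b in pairs)


# Hypothetical realized weekly returns for three portfolio sizes.
weekly_returns = {
    5: [0.020, 0.010],
    10: [0.012, 0.008],
    15: [0.005, 0.007],
}
arp = average_returns_by_size(weekly_returns)
print(has_predictability(arp))
```

With ten portfolio sizes spaced 5 apart, this check covers nine adjacent pairs.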
In this experiment, for both time periods 1978-1993 and 1994-2004, PR generates ten portfolios with different stock numbers 2n (n = 5i, i = 1, ..., 10). We compare these portfolios and present the results in Figure 3. For each time period, the predictability condition has been tested in 9 cases. In all the cases, the condition is satisfied. In both periods, the average return of the portfolio increases steadily as n decreases from 50. In addition, we calculate the return difference $d_{b/w}$ between the return of the expected best portfolio and the return of the expected worst portfolio. $d_{b/w}$ represents the level of predictability in a way:

$$d_{b/w} = RP(P_i) - RP(P_j), \quad P_i = \arg\max_P \{\widehat{RP}(P)\}, \quad P_j = \arg\min_P \{\widehat{RP}(P)\}.$$

We may rewrite the equation as follows:

$$d_{b/w} = RP(P(n=5)) - RP(P(n=50)).$$

In 1978-1993, $d_{b/w}$ is 1.0% and in 1994-2004, $d_{b/w}$ is 0.7%. They are both significant changes. All these results show strong evidence of the predictability of PR over 1978-1993 and 1994-2004.

Figure 3. The overall predictability of PR

4.4 Experiment 2
In this paper, we focus on short-term stock selection based only on historical return and volume information. Cooper [2] investigated the same problem and proposed a method (CP). In the learning phase, Cooper first divides the historical distribution of each predictor's values into deciles. Using the decile boundary values, the three-dimensional predictor space is partitioned into 1000 cells, with each cell assigned the average one-week return of all stocks in it. In the testing phase, the average return of a cell is used as the predicted return of the testing samples belonging to that cell. CP involves no machine learning techniques. The comparison between PR and CP will show us whether the stock selection task benefits from applying machine learning techniques.

We apply both PR and CP to the same stock data set using the same procedure discussed in Section 4.2. For each week from 1978 to 2004, each method forms three weekly portfolios with 10, 20, and 30 stocks respectively. Table 1 reports their performance in 1978-1993 and 1994-2004. In all cases, PR earns a higher average return than CP. The average margin of the three PR returns over the three CP returns is 76.3% in 1978-1993 and 58.4% in 1994-2004. We also compare the risk-adjusted portfolio performance, which is usually measured by the Sharpe Ratio (return/std.). Table 1 shows that in this case PR also outperforms CP. For example, the Sharpe Ratios of the three PR portfolios over 1978-1993 are 0.51, 0.52, and 0.52 respectively. In contrast, the Sharpe Ratios of the CP portfolios are 0.32, 0.38, and 0.37 over the same time period. The comparison in 1994-2004 shows similar results.

The above experiment compares the predictability of PR and CP and shows that PR generates more accurate and stable predictions. We also evaluate whether PR's predictions are more profitable than CP's predictions under transaction costs. The final investment value (FIV) of the portfolios under transaction costs is used as the measure. Although estimating the real transaction costs of each trade is difficult, it is reasonable to suppose that the costs for CP and PR would have been similar. For technical convenience, we follow Cooper [2] in setting the round-trip cost levels to evaluate the after-cost performance of both methods:

Transaction costs $c_l$ = 0.25% for l = 1 (low transaction costs), 0.5% for l = 2 (medium transaction costs), 0.75% for l = 3 (high transaction costs).

Table 1. The Performance Comparison: PR vs. CP

Portfolio   Performance        1978-1993        1994-2004
                               PR      CP       PR      CP
10-stock    Ave. Return (%)    1.69    0.89     1.31    0.81
            STD (%)            3.3     2.8      6.2     5.1
            Sharpe Ratio       0.51    0.32     0.21    0.16
20-stock    Ave. Return (%)    1.35    0.80     1.32    0.81
            STD (%)            2.6     2.1      5.1     4.3
            Sharpe Ratio       0.52    0.38     0.26    0.19
30-stock    Ave. Return (%)    1.14    0.67     1.16    0.77
            STD (%)            2.2     1.8      4.6     3.5
            Sharpe Ratio       0.52    0.37     0.27    0.22

We compare the 10-stock portfolios of PR and CP, which represent their best profitability. Considering that investing is a continuous process, we do not split the testing period. Therefore we calculate the FIV of the portfolios in 2004 (assuming that investors start off with $1 in 1978 and reinvest the portfolio income every week) under different transaction costs. These results are shown in Table 2.

Table 2. The FIV Comparison: PR vs. CP

Transaction Costs   FIV of PR (2004) ($)   FIV of CP (2004) ($)
Low                 6E5                    256.5
Medium              77.7                   0.22
High                1.43                   0

For both methods, the profit drops dramatically as the transaction costs increase from 0.25% to 0.75%. At the same cost level, PR always outperforms CP. At the low cost level, the FIVs of PR and CP in 2004 are $6E5 and $256, respectively. In the cases of medium and high transaction costs, the PR portfolios are still profitable: the FIVs of PR are $77.7 (medium) and $1.43 (high). In contrast, the profit of the CP portfolios disappears under medium or high transaction costs. As we expected, PR survives a higher level of costs than CP and shows better profitability.

5. CONCLUSION
This paper proposed a machine learning method called Prototype Ranking (PR) for short-term stock prediction. The goal of the PR method is to select the n best performing stocks from a stock set based on the ranking function g learned from the historical stock data.
PR applies a modified competitive learning technique, which is designed for discovering models in a noisy and imbalanced environment. In the testing phase, each testing sample is assigned a predicted ranking score and the stocks with the highest/lowest ranks are selected to form a portfolio. The experimental results show strong evidence of the predictability of PR. In addition, PR outperforms CP, which is a non-machine-learning method. This shows the advantage of applying machine learning to short-term stock prediction.

This work can be further improved in two directions. First, given the current predictors, we may apply boosting techniques to improve the accuracy. Second, in this paper we only apply short-term prediction. It is possible to combine short-term prediction with long-term prediction for stock selection.

6. ACKNOWLEDGMENTS
We appreciate the access to the CRSP database provided by the University of Western Ontario via the WRDS system.

REFERENCES
[1] Avramov, D., and Chordia, T. (2006), Predicting stock returns, Journal of Financial Economics 82, 387-415.
[2] Cooper, M. (1999), Filter rules based on price and volume in individual security overreaction, Review of Financial Studies 12, 901-935.
[3] Edwards, R.D., and Magee, J., Technical Analysis of Stock Trends (Amacom Books, 1997).
[4] Fritzke, B. (1994), Growing Cell Structures - A Self-Organizing Network for Unsupervised and Supervised Learning, Neural Networks 7, 1441-1460.
[5] Fritzke, B., Some competitive learning methods. 1997.
[6] Hamid, S.A., and Iqbal, Z. (2004), Using neural networks for forecasting volatility of S&P 500 Index futures prices, Journal of Business Research 57, 1116-1125.
[7] Hastie, T., Tibshirani, R., and Friedman, J.H., The Elements of Statistical Learning (Springer, 2003).
[8] Hellstrom, T. (2001), Optimizing the Sharpe Ratio for a Rank Based Trading System, EPIA 2001, LNAI 2258, 130-141.
[9] Sullivan, R., Timmermann, A., and White, H. (1999), Data-Snooping, technical trading rule performance, and the bootstrap, The Journal of Finance 54, 1647-1691.
[10] Wolberg, J.R., Expert trading systems: modeling financial markets with kernel regression (Wiley, New York, 2000).