Evaluation and Ranking of Market Forecasters

Background, motivation, and outcomes Methodology Results Conclusion

Evaluation and Ranking of Market Forecasters

Amir Salehipour

ARC DECRA Fellow, University of Technology Sydney

Joint work with David H. Bailey, Jonathan M. Borwein, andMarcos Lopez de Prado

September 29, 2017


Table of Contents

1 Background, motivation, and outcomes

2 MethodologyEvaluation algorithmTraining and testing the algorithm

3 ResultsForecaster accuracyTraders and investors

4 Conclusion


Background, motivation, and outcomes

Motivation

1 Many investors rely on market experts and forecasters when makinginvestment decisions.

2 Ranking and grading market forecasters provides investors with metrics onwhich they may choose forecasters with the best record of accuracy.

Aim of this studyThis study develops a novel ranking methodology to rank the market forecaster;in particular,

1 we distinguish forecasts by their time frame, and specificity, rather thanconsidering all forecasts equally important, and

2 we analyze the impact of the number of forecasts made by a particularforecaster.

Outcomes of this study

1 Across a dataset including 6,627 forecasts made by 68 forecasters, theaccuracy is around 48% implying the majority of forecasters perform atlevels not significantly different than chance.


Forecasts are not reliable enough!

“The forecasts were least useful when they mattered most” [Kaissar].

When analyzing a set of strategists’ predictions from 1999 to 2016, Kaissar foundthat forecasts were surprisingly unreliable during major inflection points [Kaissar]:

1 The strategists overestimated the S&P 500’s year-end price by 26.2% onaverage during the three recession years 2000 to 2002.

2 They underestimated the index’s level by 10.6% for the initial recovery year2003.

3 They overestimated the S&P 500’s year-end level by a whopping 64.3% in2008, but then underestimated the index by 10.9% for the first half of 2009.


Previous ranking

In 2012, the CXO Advisory Group ranked 68 forecasters based on their 6,582forecasts (forecasts made for the S&P 500 index) [CXO1].

That study acknowledged some weaknesses:

All predictions and forecasts were considered equally significant.

The analysis was not adjusted based on the number of forecasts made by aparticular forecaster: some experts made only a handful of predictions, whileothers made many; weighting these the same may lead to distortions whentheir forecasting records are compared.


Our ranking

We extended and advanced the previous ranking (therefore, we used the samedataset).

In particular, the major contributions of our study are:

Investigating in greater detail how market forecasters can be graded andranked;

developing an alternative and comprehensive ranking methodology;

recognizing and prioritizing forecasts by considering different weights forshort- and long-term forecasts, or for important/specific forecasts; and,

investigating and analyzing the most effective and meaningful metrics andmeasures.


Our ranking methodology

1 Part 1:

Every forecast statement is evaluated by calculating the return of the S&P500 index over four periods of one month, three months, nine months, and 12months.

The correctness of the forecast is determined in accordance with the timeframes.

2 Part 2

Each individual forecast is treated according to two factors of time frame andimportance/specificity (not all forecasts are equally important).

Long-term forecasts are treated as more significant than the short-termforecasts (because in the long-term underlying trends, if any, tend toovercome short-term noise; wt ∈ {0.25, 0.50, 0.75, 1.00}).

Specific forecasts are treated more important than non-specific ones(ws ∈ {0.50, 1.00}).


Forecaster’s score

Combined weight for a forecast:

w+i = wt × ws if forecast i is correct

w−i = wt × ws if forecast i is not correct

Where w+i implies when the forecast is true, and w−

i when it is false.

Then, score (accuracy) of a forecaster may be obtained as:

εj =Σ

nji=1w

+i

Σnji=1w

+i + Σ

nji=1w

−i

(1)

where j is the forecaster’s index, and nj is the total number of forecasts made byforecaster j .


Evaluation algorithm

We developed an algorithm, and coded that in the programming languagePython 2.7.

The algorithm reads every record in the dataset, processes the forecaststatements by assigning weights according to the six sets of pre-definedkeywords, and derives the forecasters score.

Keywords: Kt = {{Kt1}, {Kt2}, {Kt3}, {Kt4}}, and Ks = {{Ks1}, {Ks2}},where Kt includes subsets of keywords associated with time frames, and Ks

includes subsets of keywords associated with the importance of the forecasts.

The algorithm analyzes every forecast by applying both sets of keywords tofind any match, and then assigns weights accordingly.


Training and testing

Training the algorithm:

The performance of the algorithm depends on the keywords; we consider aset of 14 forecasters (about 20%) as the training dataset.

We manually analyze and evaluate every forecast in the training set, andcalculate the score of the forecasts.

Then we apply the algorithm to the training dataset, and compare theforecasters’ accuracy obtained by the algorithm against the one obtainedmanually. We update the sets of keywords accordingly.

After several attempts, we obtained an accuracy of 92.16% for thealgorithm’s performance.

Testing the algorithm:

We apply the algorithm to the remaining 54 forecasters, which we call thetesting dataset.


Results – Forecasters accuracy

0

1

2

3

4

5

6

01020304050607080

Abb

y Jo

seph

Coh

enA

den

Sis

ters

Ben

Zac

ksB

erni

e S

chae

ffer

Bill

Car

aB

ill Fl

ecke

nste

inB

ob B

rinke

rB

ob D

oll

Bob

Hoy

eC

abot

Mar

ket L

ette

rC

arl F

utia

Car

l Sw

enlin

Cha

rles

Bid

erm

anC

lif D

roke

Com

stoc

k P

artn

ers

Cur

t Hes

ler

Dan

Sul

livan

Dav

id D

rem

anD

avid

Nas

sar

Den

nis

Slo

thow

erD

on H

ays

Don

Lus

kin

Don

ald

Row

eD

oug

Kas

sG

ary

D. H

albe

rtG

ary

Kal

tbau

mG

ary

Sav

age

Gar

y S

hilli

ngIg

or G

reen

wal

dJa

ck S

chan

nep

Jam

es D

ines

Jam

es O

berw

eis

Jam

es S

tew

art

Jaso

n K

elly

Jere

my

Gra

ntha

mJi

m C

ram

erJi

m J

ubak

Jim

Pup

lava

John

Buc

king

ham

John

Mau

ldin

Jon

Mar

kman

Ken

Fis

her

Lasz

lo B

iriny

iLi

nda

Sch

urm

anLo

uis

Nav

ellie

rM

arc

Fabe

rM

ark

Arb

eter

Mar

tin G

oldb

erg

Mik

e P

aule

noff

Nad

eem

Wal

ayat

Pau

l Tra

cyP

eter

Elia

des

Pric

e H

eadl

eyR

icha

rd B

and

Ric

hard

Mor

oney

Ric

hard

Rho

des

Ric

hard

Rus

sell

Rob

ert D

rach

Rob

ert M

cHug

hR

ober

t Pre

chte

rS

&P

Out

look

Ste

phen

Lee

bS

teve

Sav

illeS

teve

Sju

gger

udS

teve

n Jo

n K

apla

nTi

m W

ood

Tobi

n S

mith

Trad

ing

Wire

%

Total accuracy versus accuracy per forecast and forecast share

Accuracy (This study) Accuracy per forecast Forecast share (no. of forecasts/total forecasts)

Because not every forecaster has made an equal number of forecasts, we analyzedthe accuracy per forecast:

ej =εjnj

(2)

Where, εj is obtained by the algorithm, and nj number of forecasts made byforecaster j .And forecast share of forecaster j :

sj =nj

Σjnj× 100 (3)


Results – Forecasters accuracy vs CXO study

Accuracy gap grasps the changes in the forecasters accuracy between two studies:

-20

-15

-10

-5

0

5

10

15

20

Abby

Jos

eph

Coh

enA

den

Sist

ers

Ben

Zack

sBe

rnie

Sch

aeffe

rB

ill C

ara

Bill

Fle

cken

stei

nB

ob B

rinke

rBo

b D

oll

Bob

Hoy

eC

abot

Mar

ket L

ette

rC

arl F

utia

Car

l Sw

enlin

Cha

rles

Bide

rman

Clif

Dro

keC

omst

ock

Partn

ers

Cur

t Hes

ler

Dan

Sul

livan

Dav

id D

rem

anD

avid

Nas

sar

Den

nis

Slot

how

erD

on H

ays

Don

Lus

kin

Don

ald

Row

eD

oug

Kass

Gar

y D

. Hal

bert

Gar

y Ka

ltbau

mG

ary

Sava

geG

ary

Shi

lling

Igor

Gre

enw

ald

Jack

Sch

anne

pJa

mes

Din

esJa

mes

Obe

rwei

sJa

mes

Ste

war

tJa

son

Kel

lyJe

rem

y G

rant

ham

Jim

Cra

mer

Jim

Jub

akJi

m P

upla

vaJo

hn B

ucki

ngha

mJo

hn M

auld

inJo

n M

arkm

anKe

n Fi

sher

Lasz

lo B

iriny

iLi

nda

Schu

rman

Loui

s N

avel

lier

Mar

c Fa

ber

Mar

k Ar

bete

rM

artin

Gol

dber

gM

ike

Paul

enof

fN

adee

m W

alay

atP

aul T

racy

Pet

er E

liade

sP

rice

Hea

dley

Ric

hard

Ban

dR

icha

rd M

oron

eyR

icha

rd R

hode

sR

icha

rd R

usse

llR

ober

t Dra

chR

ober

t McH

ugh

Rob

ert P

rech

ter

S&P

Out

look

Ste

phen

Lee

bS

teve

Sav

illeSt

eve

Sjug

geru

dSt

even

Jon

Kap

lan

Tim

Woo

dTo

bin

Smith

Trad

ing

Wire

%

Gap = (Accuracy of this study - Accuracy of benchmark)

Positive values reflect improvement in the accuracy over the previous study, andnegative values reflect decreased accuracy.

In particular, 63.24% of forecasters have lower accuracy.


Results – Forecasters time frame and specificity

0%10%20%30%40%50%60%70%80%90%

100%

Abb

y Jo

seph

Coh

enA

den

Sis

ters

Ben

Zac

ksB

erni

e S

chae

ffer

Bill

Car

aB

ill Fl

ecke

nste

inB

ob B

rinke

rB

ob D

oll

Bob

Hoy

eC

abot

Mar

ket L

ette

rC

arl F

utia

Car

l Sw

enlin

Cha

rles

Bid

erm

anC

lif D

roke

Com

stoc

k P

artn

ers

Cur

t Hes

ler

Dan

Sul

livan

Dav

id D

rem

anD

avid

Nas

sar

Den

nis

Slo

thow

erD

on H

ays

Don

Lus

kin

Don

ald

Row

eD

oug

Kas

sG

ary

D. H

albe

rtG

ary

Kal

tbau

mG

ary

Sav

age

Gar

y S

hilli

ngIg

or G

reen

wal

dJa

ck S

chan

nep

Jam

es D

ines

Jam

es O

berw

eis

Jam

es S

tew

art

Jaso

n K

elly

Jere

my

Gra

ntha

mJi

m C

ram

erJi

m J

ubak

Jim

Pup

lava

John

Buc

king

ham

John

Mau

ldin

Jon

Mar

kman

Ken

Fis

her

Lasz

lo B

iriny

iLi

nda

Sch

urm

anLo

uis

Nav

ellie

rM

arc

Fabe

rM

ark

Arb

eter

Mar

tin G

oldb

erg

Mik

e P

aule

noff

Nad

eem

Wal

ayat

Pau

l Tra

cyP

eter

Elia

des

Pric

e H

eadl

eyR

icha

rd B

and

Ric

hard

Mor

oney

Ric

hard

Rho

des

Ric

hard

Rus

sell

Rob

ert D

rach

Rob

ert M

cHug

hR

ober

t Pre

chte

rS

&P

Out

look

Ste

phen

Lee

bS

teve

Sav

ille

Ste

ve S

jugg

erud

Ste

ven

Jon

Kap

lan

Tim

Woo

dTo

bin

Sm

ithTr

adin

g W

ire

Percentage of forecasts time frames

% of forecasts with weight 1.00 % of forecasts with weight 0.75 % of forecasts with weight 0.50 % of forecasts with weight 0.25

0%10%20%30%40%50%60%70%80%90%

100%

Abb

y Jo

seph

Coh

enA

den

Sis

ters

Ben

Zac

ksB

erni

e S

chae

ffer

Bill

Car

aB

ill F

leck

enst

ein

Bob

Brin

ker

Bob

Dol

lB

ob H

oye

Cab

ot M

arke

t Let

ter

Car

l Fut

iaC

arl S

wen

linC

harle

s B

ider

man

Clif

Dro

keC

omst

ock

Par

tner

sC

urt H

esle

rD

an S

ulliv

anD

avid

Dre

man

Dav

id N

assa

rD

enni

s S

loth

ower

Don

Hay

sD

on L

uski

nD

onal

d R

owe

Dou

g K

ass

Gar

y D

. Hal

bert

Gar

y K

altb

aum

Gar

y S

avag

eG

ary

Shilli

ngIg

or G

reen

wal

dJa

ck S

chan

nep

Jam

es D

ines

Jam

es O

berw

eis

Jam

es S

tew

art

Jaso

n K

elly

Jere

my

Gra

ntha

mJi

m C

ram

erJi

m J

ubak

Jim

Pup

lava

John

Buc

king

ham

John

Mau

ldin

Jon

Mar

kman

Ken

Fis

her

Lasz

lo B

iriny

iLi

nda

Sch

urm

anLo

uis

Nav

ellie

rM

arc

Fabe

rM

ark

Arb

eter

Mar

tin G

oldb

erg

Mik

e P

aule

noff

Nad

eem

Wal

ayat

Pau

l Tra

cyP

eter

Elia

des

Pric

e H

eadl

eyR

icha

rd B

and

Ric

hard

Mor

oney

Ric

hard

Rho

des

Ric

hard

Rus

sell

Rob

ert D

rach

Rob

ert M

cHug

hR

ober

t Pre

chte

rS

&P O

utlo

okS

teph

en L

eeb

Ste

ve S

avill

eS

teve

Sju

gger

udS

teve

n Jo

n K

apla

nTi

m W

ood

Tobi

n S

mith

Trad

ing

Wire

Percentage of specific forecasts versus non-specific forecasts

% of specific forecasts % of non-specific forecasts


Results – Traders and investors

Long-term forecasters or “investors”, who have at least 30% of the forecasts withweights 0.75 and 1.00, and short-term forecasters or “traders”, with the majorityof the forecasts with weights 0.25 and 0.50.

We observed no forecaster has 50% or more of his forecasts with weights greaterthan 0.50.


Results – Traders and investors

Investors

0

10

20

30

40

50

60

0

10

20

30

40

50

60

Ken

Fis

her

Jam

es O

berw

eis

Pau

l Tra

cy

Ste

phen

Lee

b

Clif

Dro

ke

Gar

y D

. Ha

lber

t

Jere

my

Gra

ntha

m

Jim

Cra

me

r

Abb

y Jo

seph

Co

hen

Cur

t H

esl

er

Ste

ven

Jon

Kap

lan

Ste

ve S

avi

lle Per

cen

tage

of

fore

cast

s

Acc

urac

y

Accuracy of forecasters in the investor group

Accuracy (This study) Percentage of forecasts with weights > 0.5

Traders

0

20

40

60

80

100

01020304050607080

John

Buc

king

ham

Jack

Sch

anne

p

Dav

id N

assa

r

Dav

id D

rem

an

Cab

ot M

arke

t Let

ter

Loui

s N

avel

lier

Lasz

lo B

iriny

i

Stev

e S

jugg

erud

Rob

ert D

rach

Jaso

n Ke

lly

Bob

Dol

l

Dan

Sul

livan

Ade

n Si

ster

s

Don

Lus

kin

Ben

Zack

s

Gar

y K

altb

aum

Ric

hard

Mor

oney

Igor

Gre

enw

ald

Tobi

n S

mith

Car

l Sw

enlin

Mar

k Ar

bete

r

Ric

hard

Rho

des

Car

l Fut

ia

Don

Hay

s

Jam

es S

tew

art

Trad

ing

Wire

S&P

Out

look

Bob

Brin

ker

Pet

er E

liade

s

Jon

Mar

kman

Mar

tin G

oldb

erg

Jam

es D

ines

Cha

rles

Bid

erm

an

Den

nis

Slot

how

er

Bill

Car

a

Tim

Woo

d

Ber

nie

Scha

effe

r

Lind

a S

chur

man

Ric

hard

Ban

d

Don

ald

Row

e

Pric

e H

eadl

ey

Dou

g Ka

ss

Gar

y S

avag

e

Mar

c Fa

ber

Jim

Jub

ak

Ric

hard

Rus

sell

John

Mau

ldin

Nad

eem

Wal

ayat

Gar

y S

hillin

g

Jim

Pup

lava

Bill

Flec

kens

tein

Com

stoc

k Pa

rtner

s

Bob

Hoy

e

Rob

ert M

cHug

h

Mik

e P

aule

noff

Rob

ert P

rech

ter Pe

rcen

tage

of f

orec

asts

Accu

racy

Accuracy of forecasters in the trader group

Accuracy (This study) Percentage of forecasts with weights <= 0.5


Results – Summary

Major finding: the majority of forecasters perform at levels not significantlydifferent than chance, which makes it very difficult to tell if there is any skillpresent.

Across all forecasts, the accuracy is around 48%.

Two-thirds of forecasts predict as far as only a month.

Only one-third of forecasts predict periods over one month.

Two-thirds of forecasters have an accuracy level below 50%.

Only about 6% of forecasters have their accuracy values between 70%and 79%; the highest accuracy value is still below 80%.


Publication and awards

David H. Bailey, Jonathan M. Borwein, Amir Salehipour, and Marcos L opezde Prado. “Evaluation and ranking of market forecasters”. In: Journal ofInvestment Management. Forthcoming.

Awarded “Silver Bullet” award (20 May 2017).

Among the most viewed papers on the SSRN (www.ssrn.com) with a total2,762 views within six months (28 September 2017).


Bibliography

Terry Burnham, “Ben Bernanke as Easter bunny: Why the Fed can’t prevent the coming crash”,PBS NewsHour, 11 July 2013, available at http://www.pbs.org/newshour/making-sense/ben-bernanke-as-easter-bunny-why-the-fed-cant-prevent-the-coming-crash/.

Terry Burnham, “Why one economist isn’t running with the bulls: Dow 5,000 remains closer thanyou think”, PBS NewsHour, 21 May 2014, available at http://www.pbs.org/newshour/making-sense/one-economist-isnt-running-bulls-dow-5000-remains-closer-think/.

Nir Kaissar, “S&P 500 forecasts: Crystal ball or magic 8?,” Bloomberg News, 23 December 2016,available at https://www.bloomberg.com/gadfly/articles/2016-12-23/s-p-500-forecasts-mostly-hit-mark-until-they-matter-most.

Steve LeCompte, editor, “Guru grades”, CXO Advisory Group, 2013, available athttp://www.cxoadvisory.com/gurus/.

Steve LeCompte, editor, “The most intriguing gurus?”, CXO Advisory Group, 2009, available athttp://www.cxoadvisory.com/4025/investing-expertise/the-most-intriguing-gurus/.

Miles Udland, “Here’s what 13 top Wall Street pros are predicting for stocks in 2015”, BusinessInsider, 3 January 2015, available athttp://www.businessinsider.com/wall-street-2015-sp-500-forecasts-2015-1.

Minitab Inc, “Minitab 17 Statistical Software”, www.minitab.com, 2015, www.minitab.com.

http://www.pbs.org/newshour/making-sense/ben-bernanke-as-easter-bunny-why-the-fed-cant-prevent-the-coming-crash/

http://www.pbs.org/newshour/making-sense/ben-bernanke-as-easter-bunny-why-the-fed-cant-prevent-the-coming-crash/

http://www.pbs.org/newshour/making-sense/one-economist-isnt-running-bulls-dow-5000-remains-closer-think/

http://www.pbs.org/newshour/making-sense/one-economist-isnt-running-bulls-dow-5000-remains-closer-think/

https://www.bloomberg.com/gadfly/articles/2016-12-23/s-p-500-forecasts-mostly-hit-mark-until-they-matter-most

https://www.bloomberg.com/gadfly/articles/2016-12-23/s-p-500-forecasts-mostly-hit-mark-until-they-matter-most

http://www.cxoadvisory.com/gurus/

http://www.cxoadvisory.com/4025/investing-expertise/the-most-intriguing-gurus/

http://www.businessinsider.com/wall-street-2015-sp-500-forecasts-2015-1

www.minitab.com

Documents

Evaluation and Ranking of Market Forecasters