7/27/2019 Discovery of Stock Trading Expertise Using Genetic Programming
Laboratoire des Sciences de l'Image, de l'Informatique et de la Télédétection
LSIIT - UMR 7005
Fundamental and Applied Computer Science Research Master
Internship Research Report
Discovery of Stock Trading Expertise Using Genetic Programming
By: Syed Muhammed Ali Jafri
Supervisor: Pr. Jerzy KORCZAK
Illkirch, September 2006
Contents
1. Introduction
2. Background
2.1 Financial Prediction
2.2 Evolutionary Computing
2.3 Genetic Algorithms
2.3.1 String representation
2.3.2 Crossover and mutation operations
2.3.3 Fitness based selection
2.4 Genetic Programming
2.4.1 Tree structure representation
2.4.2 Operators and terminals
2.4.3 GP Crossover and mutation
3. Internet Bourse Experts System
3.1 Introduction
3.2 Conceptual Flow
3.3 GP Engine in IBE
4. System Design and Implementation
4.1 Problem Definition
4.2 Implementation details
4.2.1 Technical Indicator set
4.2.2 Fitness Measurement
4.2.3 Description of GP Engine
5. Experimentation
5.1 Experimental Aims and Objectives
5.2 Trading Procedure
5.3 Experimental Input Data
5.4 Parameters
5.5 Results
5.6 Discussion of Results
6. Conclusion
Appendix A: GP Algorithm Parameters
Bibliography
1. Introduction
Evolutionary computation has been extensively applied to problems whose solution space is
irregular, i.e., too large and highly complex, so that it is difficult to employ conventional
optimization procedures to search for the global optimum [Chen, 1998].
Solution spaces for financial time series data are highly irregular. General acceptance of this
property has in fact fostered the growth of financial engineering [Chen, 1998].
Many strategies and frameworks have been employed, ranging from the traditional and more popular autoregressive statistical approaches such as ARCH and GARCH [Gourieroux, 1997] to more recent machine learning and evolutionary approaches such as neural networks [Krishnaswamy et al., 2000], genetic algorithms [Allen, 1999], [Korczak, 2001], [Lipinski, 2003], [Korczak, 2004] and genetic programming [Langdon, 1995], [Kaboudan, 1999], [Santini, 2000], [Hui, 2003], [Castebrunet, 2005], the last of which is the concern of this report.
The objective is to model a process of evolution-based learning and to create a genetic programming based system that can accept high-frequency stock market data, analyze it rapidly and give BUY/SELL/HOLD signals.
This system will generate trees of technical trading rules joined together by logical operators. Every decision signal at a given point in time will be the result of a training stage and a testing stage. The training stage will include the generation of trees, performance testing and evolution. At the end of the training stage, a single best tree will remain, which will be used to generate a decision for the testing stage. For subsequent time steps, a selection of the best trees from the previous time step will be reused.
The idea is that at each time step the better-performing trees will be used and promoted, while the weaker trees are discarded.
This report begins with an introduction to the theory and practice of financial prediction and a
description of evolutionary computation techniques, with a particular focus on Genetic
Programming. This is followed by a review of an existing system, Internet Bourse Experts.
Details of the design and implementation of the GP system are included next, together with a
discussion of various design choices and also a description of the dynamically-adaptive GP
algorithm and how this algorithm will fit into IBE.
2. Background
2.1 Financial Prediction
The idea behind financial prediction is to use historical pricing data of the assets traded to
identify unique trends and patterns in the fluctuations of prices [Pantazopoulos et al., 1998].
These patterns and trends are used to predict what the forthcoming price movements will be
and decisions to buy or sell an asset are made on this basis.
Classes of patterns and/or trends are uniquely identified using technical indicators, which can be either quantitative or qualitative. Many technical indicators are based on moving-average computations or on series of local minima and maxima [Lendasse et al., 2001].
2.2 Evolutionary Computing
Evolutionary Computing is concerned with computer programs that behave like living organisms undergoing the Darwinian mechanisms of natural selection, for the purpose of optimization, adaptation or search [Koza, 1992], [Spears, 2003]. All evolutionary algorithms represent a set of possible solutions to a given problem as a population of individuals. The fitness of each candidate solution is tested, and the best individuals are permitted to survive and produce offspring derived from them. In nature, this process is seen to create complex and highly adapted organisms, that is, optimized solutions to the problem of survival and reproduction in the natural environment [Xu et al., 2003].
2.3 Genetic Algorithms
Genetic Algorithms operate on a population of individuals represented by character strings
[Holland, 1975], [Mitchell et al., 1992]. These are evaluated according to a fitness function
appropriate to the problem in hand. Pairs of individuals, selected at random but biased
according to fitness, are recombined to create members of a new population. Starting from an
initial population of randomly generated candidate solutions, successive generations are
produced until some termination criterion is reached; this may be the convergence of the
average and maximum fitness values, or simply a limit on the number of generations.
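The generational loop just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the toy "OneMax" fitness (number of 1-bits), the +1 offset on the selection weights, single-point crossover, bit-flip mutation and all parameter values are hypothetical choices.

```python
import random

def onemax(bits):
    """Hypothetical toy fitness: the number of 1s in the string."""
    return sum(bits)

def select(population, fitnesses, rng):
    """Fitness-biased random choice; the +1 offset (an assumption) keeps
    every weight positive even when all fitnesses are 0."""
    return rng.choices(population, weights=[f + 1 for f in fitnesses], k=1)[0]

def run_ga(length=20, pop_size=30, generations=50, p_mut=0.01, seed=0):
    rng = random.Random(seed)
    # Initial population of randomly generated candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):               # termination: generation limit
        fitnesses = [onemax(ind) for ind in population]
        if max(fitnesses) == length:           # or stop once the optimum appears
            break
        offspring = []
        while len(offspring) < pop_size:
            a = select(population, fitnesses, rng)
            b = select(population, fitnesses, rng)
            cut = rng.randint(1, length - 1)   # single-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                # Bit-flip mutation, applied independently to each position.
                offspring.append([1 - g if rng.random() < p_mut else g for g in child])
        population = offspring[:pop_size]      # the new generation replaces the old
    return max(population, key=onemax)

best = run_ga()
print(onemax(best), best)
```

Fixing the seed makes a run reproducible, which is convenient when comparing parameter settings.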
2.3.1 String representation
Genetic Algorithms represent candidate solutions as strings - finite sequences of characters from a given alphabet (typically binary or integer numeric). The method of mapping a
candidate solution to a GA string depends on the problem domain: The string may represent,
for example, an ordered sequence of operations, or a set of independent parameters. However,
a particular location in the string sequence always represents the same part or parameter of the
solution.
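To illustrate the fixed-position mapping, the sketch below decodes a binary GA string in which each position always stands for the same parameter: here, hypothetically, whether each of four trading rules is switched on, followed by a 4-bit integer parameter such as a look-back window. The rule list and the 8-bit layout are assumptions chosen for illustration only.

```python
RULES = ["PCB", "SMAC", "MACD", "RSI"]  # hypothetical rule library, one bit each

def decode(bits):
    """Map an 8-character binary string to (active rules, integer parameter).

    Positions 0-3: inclusion flag of RULES[i]; positions 4-7: a 4-bit
    unsigned integer. The meaning of each position never changes.
    """
    if len(bits) != 8 or any(b not in "01" for b in bits):
        raise ValueError("expected an 8-character binary string")
    active = [rule for rule, flag in zip(RULES, bits[:4]) if flag == "1"]
    window = int(bits[4:], 2)
    return active, window

print(decode("10110101"))  # (['PCB', 'MACD', 'RSI'], 5)
```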
2.3.2 Crossover and mutation operations
The string representation used in GA is analogous to the structure of biological genetic
material - DNA. In the same way, the method of creating new GA strings mimics the
recombination mechanisms of DNA.
Crossover is the operation of exchanging information, or genetic material, between two individuals. It works by swapping the values at corresponding locations between pairs of
strings. There are various methods for implementing crossover suited to different applications.
The simplest method is single point crossover: a point is selected at random to divide each
string into two sections, one of which is swapped over. Alternatively, a greater number of
crossover points may be used so that more than one contiguous sub-sequence is exchanged
between parent strings. Another method, uniform crossover, acts on individual locations -
swapping each according to a fixed probability.
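The three crossover variants can be sketched on Python strings as follows. This is a minimal illustration and the function names are hypothetical. Each variant keeps every position's pair of parental values and only redistributes them between the two children.

```python
import random

def single_point(a, b, rng):
    """Swap the tails of two equal-length strings after one random cut."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def two_point(a, b, rng):
    """Exchange the contiguous segment between two random cut points."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def uniform(a, b, rng, p_swap=0.5):
    """Swap each position independently with probability p_swap."""
    c1, c2 = list(a), list(b)
    for k in range(len(a)):
        if rng.random() < p_swap:
            c1[k], c2[k] = c2[k], c1[k]
    return "".join(c1), "".join(c2)

rng = random.Random(1)
print(single_point("00000000", "11111111", rng))
print(two_point("00000000", "11111111", rng))
print(uniform("00000000", "11111111", rng))
```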
Mutation is simply the action of randomly changing the value of individual locations or sub-
strings within a GA sequence. Although crossover is the main factor in the evolutionary
behavior in GA, mutation is important because it is the only way of introducing new genetic
material into the overall population.
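A one-function sketch of bit-flip mutation on a binary string; the per-bit probability p_mut is a hypothetical parameter. It is the step that can create a value absent from both parents, for example a 1 at a position where every individual holds a 0.

```python
import random

def bit_flip(bits, p_mut, rng):
    """Flip each bit of a binary string independently with probability p_mut."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < p_mut else b for b in bits
    )

rng = random.Random(42)
print(bit_flip("00000000", 0.25, rng))
```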
2.3.3 Fitness based selection
As stated previously, individuals are selected for reproduction randomly, but with the
probability of selection weighted according to the measured fitness of the candidate solution.
There are many methods by which fitness based selection can be implemented [Blickle, 1995].
The following are the three most successful in terms of effectiveness and popularity:
Fitness Proportional Selection (FPS)
The sum total of the fitness values of all population members is calculated and a
random number is selected between zero and this value. Running through all
population members, the fitness values are summed a second time. When the sum
exceeds the randomly generated number, the current population member is returned. If
the total fitness sum is thought of as the circumference of a circle, then each individual
is represented by a sector of the circle equal to its fitness value. If a pointer is placed at
a random position on the wheel, the probability of it falling within any individual
sector is proportional to the fitness of that individual. This is why FPS is also known as
Roulette Wheel Selection. A disadvantage of this method is that a fitness proportional
selection weighting may not always be suitable. It may be desirable to disproportionately bias selection in favor of individuals whose fitness is only
marginally greater than average, or to have only a small bias towards individuals who
have very high relative measured fitness. Another problem with this method is that it
does not work with negative fitness values.
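In symbols, with N individuals of fitness f_1, ..., f_N (assumed non-negative), one spin of the wheel returns individual i with the probability below. The worked examples use hypothetical fitness values; the second shows why negative values break the method.

```latex
p_i = \frac{f_i}{\sum_{j=1}^{N} f_j}
\qquad
f = (1, 3, 6):\ \sum_j f_j = 10,\ p = (0.1,\ 0.3,\ 0.6)
\qquad
f = (-2, 3, 4):\ \sum_j f_j = 5,\ p_1 = -0.4 \ (\text{not a probability})
```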
Rank Selection
This method works like FPS, except that the fraction of the roulette wheel assigned to each individual depends on its rank position rather than on its absolute fitness. The degree of bias can be controlled by raising the rank position value to a chosen power. This is a comparatively slow method because the population must be sorted according to fitness.
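A minimal sketch of rank selection, assuming weights equal to the rank position (1 = worst) raised to a hypothetical `power` parameter. In the example, individual b would take about 99% of a fitness-proportional wheel, but under linear ranking it receives only its rank share, 4 out of 1 + 2 + 3 + 4 = 10, so about 40% of picks.

```python
import random

def rank_select(population, fitness, rng, power=1.0):
    """Rank selection: weight = (rank position) ** power, rank 1 = worst.

    power = 1 gives linear ranking; larger values bias selection more
    strongly towards the top ranks. The sort is the costly step.
    """
    ranked = sorted(population, key=fitness)            # worst ... best
    weights = [(r + 1) ** power for r in range(len(ranked))]
    return rng.choices(ranked, weights=weights, k=1)[0]

pop = ["a", "b", "c", "d"]
fit = {"a": 0.1, "b": 50.0, "c": 0.2, "d": 0.3}.get    # hypothetical fitness values
rng = random.Random(7)
picks = [rank_select(pop, fit, rng) for _ in range(10000)]
print({x: picks.count(x) for x in pop})
```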
Tournament Selection
A group of individuals is selected from the population at random, and the fittest member of this group is returned. The degree of selection bias is related to the size of the group, or tournament: the greater the size, the greater the relative weight of fitter individuals. With a tournament size of two, the fittest member of the population is about twice as likely to be selected as the median member. This method is the most computationally efficient, as only the individuals selected for the tournament need to be inspected.
Various other schemes also exist, such as truncation selection, linear ranking selection and exponential ranking selection, but they are not considered further in this report.
For a comprehensive review on selection schemes please refer to [Blickle, 1995].
2.4 Genetic Programming
Genetic Programming (GP) is an extension of GA [Koza, 1992]. GP uses a similar
evolutionary procedure for search and optimization based on selective recombination from a
population of candidate solutions. It differs from GA in the representation of the candidate
solutions.
2.4.1 Tree structure representation
The tree structure is a hierarchical model consisting of a set of interconnected nodes (see figure 1). Each node can have several connections to nodes at a lower level, but only a single
parent connection.
The name genetic programming refers to the fact that the tree structure is usually used to
represent a function in the style of a computer program syntax tree. The branch nodes
represent functions - they take values passed by their immediate descendants as input
arguments and return an output to their parent. The terminal leaf nodes represent input
arguments or variables. The branching hierarchy denotes the evaluation ordering of functions.
In contrast to genetic algorithms, the tree representation of GP facilitates the creation of
candidate solutions of variable size and complexity - crossover and mutation operations can alter the size of individual trees. Another important difference is that, unlike GA, there is no
specific mapping of individual parts of the tree to a part of a candidate solution. The GP
function parse tree returns output values from a given set of input variables.
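To make the representation concrete, here is a minimal sketch of a parse tree and its bottom-up evaluation, assuming a Boolean function set like the one in figure 1 (AND, OR, NOT, IF) and terminals supplied as a dictionary of named inputs. The class, the function names and the example tree are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A GP tree node: a function name with children, or a terminal name."""
    name: str
    children: list = field(default_factory=list)

# Function set: name -> implementation applied to the children's values.
FUNCTIONS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "NOT": lambda a: not a,
    "IF": lambda c, t, e: t if c else e,   # all three branches are evaluated first
}

def evaluate(node, inputs):
    """Evaluate bottom-up: children first, then the parent's function."""
    if not node.children:                  # terminal leaf: look up an input
        return inputs[node.name]
    args = [evaluate(child, inputs) for child in node.children]
    return FUNCTIONS[node.name](*args)

# Hypothetical tree: (Input1 AND Input2) OR NOT Input3
tree = Node("OR", [Node("AND", [Node("Input1"), Node("Input2")]),
                   Node("NOT", [Node("Input3")])])
print(evaluate(tree, {"Input1": True, "Input2": False, "Input3": False}))  # True
```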
2.4.2 Operators and terminals
The GP tree structure is constructed from two sets of node types - functions and terminals.
The branch nodes - those which have at least one connection to a child node - are taken from
the function set. This set typically consists of simple logical (AND, OR, etc.), conditional (IF-THEN-ELSE), arithmetic (+, -, *, /) or comparison (<, >, =) operators. The choice of function set is a design decision which depends on the problem domain and on the data types that the GP function should take as input and return as output. The terminal set consists of all the data input variables which are to be evaluated by the GP function.
Function and terminal sets must be chosen such that they are capable of expressing a solution to the problem. This means that the designer should have knowledge about the problem
domain - including some idea of the likely form of solutions.
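As an illustration of how the function and terminal sets drive tree construction, the sketch below implements the common "grow" initialization on nested tuples `(function, child, ...)` with string terminals. The function set, its arities, the terminal names (each standing for "this indicator signals BUY") and the 0.3 terminal probability are all hypothetical choices.

```python
import random

# Hypothetical function set (name -> arity) and terminal set.
FUNCTIONS = {"AND": 2, "OR": 2, "NOT": 1}
TERMINALS = ["PCB_buy", "SMAC_buy", "MACD_buy", "RSI_buy"]

def grow(rng, max_depth, p_terminal=0.3):
    """'Grow' initialization: a terminal at depth 0, otherwise a terminal
    with probability p_terminal or a function node with random children."""
    if max_depth == 0 or rng.random() < p_terminal:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(grow(rng, max_depth - 1, p_terminal)
                           for _ in range(FUNCTIONS[name]))

def depth(tree):
    """Depth of a tuple tree; a bare terminal has depth 0."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree[1:])

rng = random.Random(5)
t = grow(rng, max_depth=3)
print(t, depth(t))
```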
2.4.3 GP Crossover and mutation
Genetic Programming implements crossover and mutation operations equivalent to those used
in GA. To carry out the process of crossover on a pair of trees, a single node is selected at
random from each - these form the crossover points. The sub-trees originating at these nodes
are swapped over, creating two new GP trees (shown in figure 2). If the two sub-trees contain
a different number of nodes, then the resulting offspring trees will differ in size from the parents. Crossover is easy to implement in code by swapping over pointers between parent and
child nodes at the selected points.
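A sketch of subtree crossover on trees stored as nested tuples `(function, child, ...)` with string terminals. Because tuples are immutable, the swap builds new trees instead of exchanging pointers as described above; the node-addressing scheme (paths of child indices) and all names are assumptions for illustration.

```python
import random

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if not isinstance(tree, str):              # function node: (name, *children)
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(parent1, parent2, rng):
    """Pick one random node in each parent and swap the sub-trees rooted there."""
    p1, s1 = rng.choice(list(subtrees(parent1)))
    p2, s2 = rng.choice(list(subtrees(parent2)))
    return replace_at(parent1, p1, s2), replace_at(parent2, p2, s1)

a = ("AND", ("OR", "In1", "In2"), ("NOT", "In3"))
b = ("OR", "In4", ("AND", "In1", "In3"))
print(crossover(a, b, random.Random(0)))
```

The total number of nodes across the two children always equals that of the two parents, while each child's size can change.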
The parent trees are selected using the roulette wheel selection algorithm, in which parents are chosen according to their fitness: the fitter an individual, the greater its chance of being selected. Imagine a roulette wheel on which all the individuals in the population are placed, the size of each individual's section being proportional to its fitness value: the larger the value, the larger the section.
A marble is thrown onto the roulette wheel and the individual on which it stops is selected. Clearly, individuals with larger fitness values will be selected more often.
This process can be described by the following algorithm:
1. Calculate the sum S of the fitnesses of all individuals in the population.
2. Generate a random number r from the interval (0, S).
3. Go through the population, accumulating a running sum s of the fitnesses starting from 0. When s is greater than r, stop and return the current individual.
Step 1 needs to be performed only once for each population.
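The three steps translate almost line for line into Python. In this sketch (names hypothetical), S can be computed once by the caller and passed in, and r is drawn from [0, S) so that the running sum is guaranteed to pass it; fitness values are assumed to be non-negative.

```python
import random

def roulette_select(population, fitnesses, rng, total=None):
    """Roulette wheel selection following steps 1-3 above."""
    if total is None:
        total = sum(fitnesses)                 # step 1: S (pass it in to reuse)
    r = rng.random() * total                   # step 2: r in [0, S)
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f                           # step 3: running sum s
        if running > r:
            return individual
    return population[-1]                      # guard against float round-off

pop = ["A", "B", "C"]
fit = [1.0, 3.0, 6.0]
rng = random.Random(11)
S = sum(fit)                                   # computed once per population
picks = [roulette_select(pop, fit, rng, S) for _ in range(10000)]
print({x: picks.count(x) / len(picks) for x in pop})   # close to 0.1, 0.3, 0.6
```

Individuals with zero fitness are never returned (except by the round-off guard), because the running sum does not grow when they are passed.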
Mutation works in a similar way; in the strictest definition a new randomly generated sub-tree
is inserted at a randomly selected node and the displaced section is discarded.
In our application, mutation can be any one of the following operations:
1. Removing a randomly selected node from the tree (Deletion operation).
2. Adding a node at a randomly selected point in the tree without removing any portion of the original tree (Addition operation).
3. The classical definition of mutation: removing a section of a tree and replacing it with a randomly generated sub-tree (Replacement operation).
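The three mutation variants can be sketched on nested-tuple trees as follows. The text does not say what happens to the children of a deleted node, so the deletion operator here assumes the node is replaced by one of its children; the function set, terminal names and depth limits are likewise hypothetical.

```python
import random

FUNCTIONS = {"AND": 2, "OR": 2, "NOT": 1}            # hypothetical function set
TERMINALS = ["In1", "In2", "In3", "In4"]             # hypothetical terminal set

def random_tree(rng, depth):
    """Small random tree ('grow' style) used by addition and replacement."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(random_tree(rng, depth - 1) for _ in range(FUNCTIONS[name]))

def nodes(tree, path=()):
    """Yield (path, subtree) for every node of a tuple tree."""
    yield path, tree
    if not isinstance(tree, str):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def replace_at(tree, path, new):
    """Copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def delete(tree, rng):
    """Deletion (assumed reading): a random function node is replaced by one
    of its children, so that node disappears and the rest is kept."""
    funcs = [(p, t) for p, t in nodes(tree) if not isinstance(t, str)]
    if not funcs:
        return tree                                  # lone terminal: nothing to delete
    path, node = rng.choice(funcs)
    return replace_at(tree, path, rng.choice(node[1:]))

def add(tree, rng):
    """Addition: insert a new function node above a random node; the original
    sub-tree becomes its first child, so nothing is removed."""
    path, node = rng.choice(list(nodes(tree)))
    name = rng.choice(sorted(FUNCTIONS))
    extra = [random_tree(rng, 1) for _ in range(FUNCTIONS[name] - 1)]
    return replace_at(tree, path, (name, node, *extra))

def replace(tree, rng, depth=2):
    """Replacement (classical mutation): swap a random sub-tree for a new one."""
    path, _ = rng.choice(list(nodes(tree)))
    return replace_at(tree, path, random_tree(rng, depth))

t = ("AND", ("OR", "In1", "In2"), ("NOT", "In3"))
rng = random.Random(2)
for op in (delete, add, replace):
    print(op.__name__, op(t, rng))
```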
Because crossover can exchange sub-trees between different locations, unlike in GA, there is
less need for mutation in creating and maintaining diversity in the population of candidate
solutions. Therefore the mutation operator is sometimes left out of GP algorithms if the
population is made large enough to ensure sufficient initial diversity of available building
blocks [Mitchell,1998].
Figure 1: GP parse-tree representation of two functions taking four separate input parameters
[Figure: two parse trees, Parent 1 and Parent 2, built from AND, OR, NOT and IF nodes over Input 1 to Input 4, with Subtree 1 and Subtree 2 marked; each root produces a return value.]
Figure 2: Result of GP crossover.
[Figure: the offspring trees Child 1 and Child 2.]
3. Internet Bourse Experts System
One of the goals of this project is to analyze an existing system, namely the Internet Bourse Experts system, and to attempt to improve on its existing experts generator, which uses GA, by replacing it with a GP based system.
3.1 Introduction
Internet Bourse Experts (IBE) is an on-line multi-agent system, based on a client-server architecture, which analyzes financial data and is able to generate stock trading expertise. In this context, an expertise is a composition of trading rules encoded as GA strings [Korczak, Kustner, 2001], [Korczak, Lipinski, 2004], [Lipinski, 2003].
Given a library of trading rules, the objective is to find the best-performing collection of trading rules and to judge its efficiency, without giving much priority to economic relevance.
IBE uses genetic algorithms, which apply the "survival of the fittest" principle, to create AI-based experts whose decisions are based on a subset of the trading rules. The result is not necessarily a global optimum, but the most effective expert under the circumstances. The fitness function used to evaluate experts in the population is explicitly tailored to stock trading. The evolutionary approach presented here, by which knowledge-based trading systems are built, is evaluated on real financial time series.
There is a large number of trading rules based on technical analysis indicators. Using these rules, financial experts and market traders make decisions on the stock market: to buy, to sell, or to defer action and do nothing.
For more details refer to [Lipinski, 2003], [Korczak, 2001], [Korczak, 2004].
3.2 Conceptual Flow
Within the system, a number of intelligent agents exist [Zitvogel, 2003]. These agents are specialized and represent different methods of analyzing and processing the data as well as heterogeneous events. Each agent is autonomous, using its own methods to analyze the market and concentrating on its own objectives.
The starting point of the system, as shown in the diagram (see figure 3), is the stock market data from which all events originate. This data is preprocessed by the database agents and then stored in the database.
Preprocessing consists of grouping the data and calculating an average. The reason for this is the large volume of data that arrives from the stock market every second: it is neither useful nor possible to store all of it. It is therefore preferable that the data be preprocessed and stored in the database by intelligent agents, while the stock market is continuously analyzed by other intelligent agents, known as market watch agents, which try to detect important events as early as possible to keep the system on track. As a consequence, the system can easily adapt to new situations.
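The text specifies only "grouping and averaging". A minimal sketch of one possible reading, assuming ticks arrive as (timestamp in seconds, price) pairs and are grouped into fixed 60-second buckets, each replaced by its mean price; the bucket length, the tick format and the function name are assumptions.

```python
from collections import defaultdict

def aggregate_ticks(ticks, bucket_seconds=60):
    """Group (timestamp_seconds, price) ticks into fixed-length time buckets
    and keep one averaged price per bucket, in chronological order."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, price in ticks:
        bucket = int(ts // bucket_seconds) * bucket_seconds   # bucket start time
        sums[bucket] += price
        counts[bucket] += 1
    return [(b, sums[b] / counts[b]) for b in sorted(sums)]

ticks = [(0, 10.0), (15, 10.2), (59, 10.4), (61, 10.6), (119, 10.8), (130, 11.0)]
print(aggregate_ticks(ticks))
```

Six ticks become three stored records, one per minute, which is the kind of volume reduction motivating the preprocessing step.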
When the preprocessed data finally arrives in the financial database, it is treated by two classes of agents. The first class focuses on a global analysis of the market, such as the analysis of volatility, and the second concentrates on the analysis of individual stocks. The second class thus forms the specialist experts that are used to define the state of quotations of a particular stock. In certain cases, these agents require supplementary knowledge, which is stored in an experts database managed by the experts observation agents.
The expert generator uses genetic algorithms to find the best composition of rules. Each composition forms an expert that competes with the others.
In addition to the agents that process numerical data, there are agents based on textual analysis. Certain agents can observe news flashes concerning a particular stock. After the text analysis phase, they generate an additional signal for other agents, telling them to change some of their parameters.
At the end, the output of each agent is captured by the visualization agent and presented to the
user.
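Fig. 3: Agents of IBE
[Figure: agent architecture diagram. Components: Live Stock Market Data, Stock Watch Agents, Database Agents, Financial Database, Volatility Agents, Action Analysis Agents, Text Analysis Agents, Experts Generator, Experts Observation Agents, Experts Database, Users Database, Security Agents, Visualization Agents, Users.]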
3.3 GP Engine in IBE
The idea is to replace the GA based engine in the Experts Generator module with our GP engine. As stated in the previous section, the agents can be divided into two classes: the first class deals with a global analysis of the market and the second is concerned with individual stocks.
Each tree will generate trading signals with respect to the current point in time of the stock being monitored. During the system's migration phase, when the GA engine in IBE is replaced with the GP engine, the first class of agents will remain unchanged; only the second class of agents will undergo a change. As the Experts Generator module receives data, it will use the genetic program to generate trees of trading rules with certain predefined parameters. As with the genetic algorithm, each tree will be a composition of trading rules.
4. System Design and Implementation
4.1 Problem Definition
The objective is to create a system which uses a GP based method to optimize a set of existing technical trading rules on historical quotation data. This system must be able to evolve optimized candidate solutions and also implement an appropriate dynamically adaptive GP learning algorithm, meaning that it must continually evolve and adapt to the changing dynamics of the stock market. It must be able to produce trading expertise, promoting the fittest experts and rejecting the weaker ones.
It must also be taken into account that what is fit now might be weak at the next moment. This calls for continuous fitness evaluation.
Price Channel Breakout (PCB)
If the current price exceeds the maximum from the previous n time units, BUY; if it
goes below the minimum from this period, SELL; otherwise HOLD.
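$$\text{Signal}_{PCB} = \begin{cases} \text{BUY} & \text{if } \text{Price}_{current} > \max_{current-n \le x \le current-1} \text{Price}_x \\ \text{SELL} & \text{if } \text{Price}_{current} < \min_{current-n \le x \le current-1} \text{Price}_x \\ \text{HOLD} & \text{otherwise} \end{cases}$$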
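A minimal Python sketch of the PCB rule, assuming prices arrive as a plain list ordered oldest to newest. The channel is computed over the n prices preceding the current one: if the window included the current price, the price could never exceed the window maximum. The function name is hypothetical.

```python
def price_channel_breakout(prices, n):
    """PCB signal for the last price, compared with the maximum and
    minimum of the n prices that precede it."""
    if len(prices) < n + 1:
        raise ValueError("need at least n + 1 prices")
    window = prices[-n - 1:-1]          # the previous n time units
    current = prices[-1]
    if current > max(window):
        return "BUY"
    if current < min(window):
        return "SELL"
    return "HOLD"

print(price_channel_breakout([10, 11, 12, 11, 13], n=3))   # BUY
```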
Simple Moving Average Crossover (SMAC)
If a short term (5-day) moving average value crosses above a long term (50-day)
moving average then BUY; if the short term average crosses below then SELL.
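$$\text{Signal}_{SMAC} = \begin{cases} \text{BUY} & \text{if } \text{MovingAverage}_{ShortTerm} > \text{MovingAverage}_{LongTerm} \\ \text{SELL} & \text{if } \text{MovingAverage}_{ShortTerm} < \text{MovingAverage}_{LongTerm} \\ \text{HOLD} & \text{otherwise} \end{cases}$$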
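A sketch of the SMAC rule as written above, using the 5-day and 50-day windows from the text. Like the rule, it compares the current averages rather than detecting the moment of crossing; the function names are hypothetical.

```python
def sma(prices, window):
    """Simple moving average of the last `window` prices."""
    if len(prices) < window:
        raise ValueError("not enough prices")
    return sum(prices[-window:]) / window

def smac_signal(prices, short=5, long=50):
    """Compare the short-term and long-term simple moving averages."""
    short_ma, long_ma = sma(prices, short), sma(prices, long)
    if short_ma > long_ma:
        return "BUY"
    if short_ma < long_ma:
        return "SELL"
    return "HOLD"

rising = [100 + 0.5 * t for t in range(60)]
print(smac_signal(rising))   # BUY: recent prices are above the 50-day average
```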
Moving Average Convergence Divergence (MACD)
The MACD is the difference between the short-term and long-term Exponential Moving Averages (EMA) of the price. If the MACD crosses above its own EMA value, return a BUY indicator; return SELL if it crosses below.
$$\text{MACD} = \text{EMA}_{ShortTerm} - \text{EMA}_{LongTerm}$$
$$\text{Signal}_{MACD} = \begin{cases} \text{BUY} & \text{if } \text{MACD} > \text{EMA}(\text{MACD}) \\ \text{SELL} & \text{if } \text{MACD} < \text{EMA}(\text{MACD}) \\ \text{HOLD} & \text{otherwise} \end{cases}$$
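A sketch of the MACD rule. The text does not fix the periods, so the usual 12/26/9 defaults are assumed here, as is seeding each EMA with the first value of its series; the function names are hypothetical.

```python
def ema_series(values, span):
    """EMA with smoothing factor 2 / (span + 1), seeded with the first value
    (an assumption); returns one EMA value per input value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd_signal(prices, short_span=12, long_span=26, signal_span=9):
    """MACD = EMA_short - EMA_long, compared with its own EMA (signal line)."""
    fast = ema_series(prices, short_span)
    slow = ema_series(prices, long_span)
    macd = [f - s for f, s in zip(fast, slow)]
    signal_line = ema_series(macd, signal_span)
    if macd[-1] > signal_line[-1]:
        return "BUY"
    if macd[-1] < signal_line[-1]:
        return "SELL"
    return "HOLD"

print(macd_signal([100.0 + t for t in range(60)]))
```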
Relative Strength Index (RSI)
The RSI compares the magnitude of a stock's recent gains to the magnitude of its recent losses and turns that information into a number that ranges from 0 to 100. It takes a single parameter, n, the number of time periods to use in the calculation.
$$\text{AverageGain}_1 = \frac{\text{TotalGains}}{n}, \qquad \text{AverageLoss}_1 = \frac{\text{TotalLosses}}{n}$$
$$\text{RelativeStrength}_1 = \frac{\text{AverageGain}_1}{\text{AverageLoss}_1}$$
For count = 2 to n:
$$\text{RelativeStrength}_{count} = \frac{\left[\text{AverageGain}_{count-1}\,(count-1) + \text{Gain}_{count}\right]/count}{\left[\text{AverageLoss}_{count-1}\,(count-1) + \text{Loss}_{count}\right]/count}$$
$$\text{RelativeStrengthIndex} = 100 - \frac{100}{1 + \text{RelativeStrength}_n}$$
$$\text{Signal}_{RSI} = \begin{cases} \text{BUY} & \text{if } \text{RelativeStrengthIndex} > 70 \\ \text{SELL} & \text{if } \text{RelativeStrengthIndex} < 30 \\ \text{HOLD} & \text{otherwise} \end{cases}$$
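The recurrence above resembles the widely used Wilder smoothing, which uses the fixed window n where the recurrence uses count. The sketch below implements the Wilder variant, a standard formulation swapped in for illustration and not necessarily the author's exact computation, and applies the thresholds stated in the rule. The function names and the neutral value of 50 for a perfectly flat series are assumptions.

```python
def wilder_rsi(prices, n=14):
    """RSI using Wilder's smoothing (assumed formulation).

    The first averages are simple means over the first n price changes;
    each later change updates them as (previous * (n - 1) + current) / n.
    """
    if len(prices) < n + 1:
        raise ValueError("need at least n + 1 prices")
    changes = [b - a for a, b in zip(prices, prices[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    avg_gain = sum(gains[:n]) / n
    avg_loss = sum(losses[:n]) / n
    for gain, loss in zip(gains[n:], losses[n:]):
        avg_gain = (avg_gain * (n - 1) + gain) / n
        avg_loss = (avg_loss * (n - 1) + loss) / n
    if avg_gain == 0 and avg_loss == 0:
        return 50.0                    # no movement at all: assumed neutral value
    if avg_loss == 0:
        return 100.0                   # no losses: RS is unbounded, RSI saturates
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

def rsi_signal(prices, n=14):
    """Thresholds as given in the rule above: > 70 BUY, < 30 SELL."""
    value = wilder_rsi(prices, n)
    if value > 70:
        return "BUY"
    if value < 30:
        return "SELL"
    return "HOLD"

print(wilder_rsi([1, 2, 3, 2], n=2))   # 50.0
```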
K-Stochastic
A technical momentum indicator that compares a security's closing price to its price
range over a given time period. The oscillator's sensitivity to market movements can be
reduced by adjusting the time period or by taking a moving average of the result. It
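takes a single parameter, n, the number of time periods to use in the calculation.
$$\%K = 100 \times \frac{\text{Price}_{current} - \text{LowestPrice}_n}{\text{HighestPrice}_n - \text{LowestPrice}_n}$$
%D = 3-period moving average of %K
$$\text{Signal}_{K} = \begin{cases} \text{BUY} & \text{if } \%K > \%D \\ \text{SELL} & \text{if } \%K < \%D \\ \text{HOLD} & \text{otherwise} \end{cases}$$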
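A sketch of the %K/%D rule. The text gives no separate high/low series, so this version assumes closing prices only, with the n-period window including the current price; returning 50 for a flat window, where %K is undefined, is an arbitrary assumption. Function names are hypothetical.

```python
def stochastic_k(prices, n):
    """%K for the last price: its position within the n-period range."""
    window = prices[-n:]
    low, high = min(window), max(window)
    if high == low:
        return 50.0     # flat range: %K undefined, 50 assumed as a neutral value
    return 100.0 * (prices[-1] - low) / (high - low)

def stochastic_signal(prices, n=14):
    """%D = 3-period simple moving average of %K; BUY if %K > %D, SELL if below."""
    if len(prices) < n + 2:
        raise ValueError("need n + 2 prices to form three %K values")
    ks = [stochastic_k(prices[: len(prices) - i], n) for i in (2, 1, 0)]
    k, d = ks[-1], sum(ks) / 3
    if k > d:
        return "BUY"
    if k < d:
        return "SELL"
    return "HOLD"

print(stochastic_signal([5, 4, 3, 2, 3], n=3))   # BUY
```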
1-Day Price Change
The 1-Day Price Change indicator gives a BUY signal if the price has risen from the previous day's value and SELL if it has dropped. This is a naive trading strategy used to benchmark the performance of the GP-evolved trading rules.
$$\text{Signal}_{1D} = \begin{cases} \text{BUY} & \text{if } \text{Price}_{current} > \text{Price}_{current-1} \\ \text{SELL} & \text{if } \text{Price}_{current} < \text{Price}_{current-1} \\ \text{HOLD} & \text{otherwise} \end{cases}$$
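For completeness, the benchmark rule in Python (function name hypothetical), applied day by day to a short made-up price series:

```python
def one_day_change_signal(prices):
    """Naive benchmark: compare today's price with yesterday's."""
    today, yesterday = prices[-1], prices[-2]
    if today > yesterday:
        return "BUY"
    if today < yesterday:
        return "SELL"
    return "HOLD"

history = [10.0, 10.5, 10.2, 10.2, 10.8]
print([one_day_change_signal(history[: t + 1]) for t in range(1, len(history))])
# ['BUY', 'SELL', 'HOLD', 'BUY']
```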
4.2.2 Fitness Measurement
The fitness of individual technical trading rules is measured directly from the returns
generated by simulated trading using those rules [Altenberg, 1993].
A set of ratios measuring the performance of a stock's movement has been studied [Lipinski, 2003]. While not very informative on their own, these ratios provide valuable insight into the dynamics of a stock price when used in conjunction with each other. They include:
Sharpe Ratio
$$\text{Sharpe Ratio} = \frac{r_p - r_f}{\sigma_p}$$
where:
r_p is the expected portfolio return,
r_f is the risk-free rate,
σ_p is the portfolio standard deviation.
Source: [Sharpe, 1996]
The Sharpe ratio measures risk-adjusted performance. Internationally, current Sharpe ratios range from −1.7 to 2.5, with an average of 0.9; by modern standards, a ratio above 1.0 is preferred [Domash, 2006].
The larger the Sharpe ratio the better (the more consistent the results). The ratio will be
negative if the average return is less than the risk-free return. Some systems exhibit a
Sharpe ratio of 0.5 or more, and ratios above 1.0 are sometimes seen. For a long-term
system, open profit should be included in each month's profit and loss data in order for
the Sharpe ratio to be meaningful. If there are fewer than 12 months of data, we do not calculate the Sharpe ratio, because such a small number of data points might not be statistically significant and could give misleading results.
The one-year (short-term) Sharpe ratio provides an indication of how well a system has
performed in the most recent 12 months. The calculation uses the average monthly
profit/loss in excess of the risk-free return for the most recent 12 months, divided by
the standard deviation of monthly profits and losses over the same period.
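A sketch of the one-year calculation described above, assuming monthly profit/loss figures expressed as returns and a constant monthly risk-free rate. The sample standard deviation is an assumption, since the text does not say which estimator is used; the function name and example figures are hypothetical.

```python
from statistics import mean, stdev

def one_year_sharpe(monthly_returns, monthly_risk_free):
    """Short-term Sharpe ratio from the most recent 12 monthly returns.

    Numerator: average monthly return in excess of the risk-free rate.
    Denominator: sample standard deviation of the same 12 monthly returns
    (subtracting a constant rate would not change it).
    """
    if len(monthly_returns) < 12:
        return None                      # fewer than 12 months: not calculated
    recent = monthly_returns[-12:]
    sd = stdev(recent)
    if sd == 0:
        return None                      # no variability: ratio undefined
    return (mean(recent) - monthly_risk_free) / sd

# Hypothetical monthly returns in percent, alternating +2 and 0; risk-free 0.5% a month.
print(one_year_sharpe([2.0, 0.0] * 6, 0.5))
```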
Sortino ratio
$$\text{Sortino Ratio} = \frac{\langle R \rangle - R_f}{\sigma_d}$$
where:
⟨R⟩ is the expected return,
R_f is the risk-free rate of return,
σ_d is the standard deviation of negative asset returns.
Source: [Sortino, 1994]
The larger the Sortino ratio the better. The Sortino ratio will be larger if the profit is
high, and if the disappointments are small. For a given average disappointment, the
Sortino ratio would be better if there were many small disappointments, rather than a
few large disappointments (see the examples below). The ratio will be negative if the
average return is below the risk-free return. Some systems exhibit a Sortino ratio of 1
or more, and ratios above 2 may be seen.
If there are fewer than 24 months of data, we do not calculate the Sortino ratio, because we feel that there may not be enough "disappointment" data to be statistically meaningful.
When "average" and "standard deviation" of the disappointments are mentioned, the
calculations include the zero values. For example, for disappointments of 1.5, 1.5, 0, 0,
0, 0 the average is 0.5 and the standard deviation is 0.8; for disappointments of 1, 1, 1,
0, 0, 0 the average is again 0.5 but the standard deviation is 0.5, which is significantly
smaller than in the first example.
The one-year (short-term) Sortino ratio is calculated to provide an indication of how well a system has performed in the most recent 12 months.
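A sketch of the Sortino computation, assuming disappointments are measured as shortfalls below the risk-free rate (the text does not state the threshold) and using the sample standard deviation with the zero values included. This estimator reproduces both worked figures given above (0.8 and 0.5). Function names are hypothetical.

```python
from statistics import mean, stdev

def disappointments(returns, threshold):
    """Shortfall below the threshold in each period; 0 when it is met."""
    return [max(threshold - r, 0.0) for r in returns]

def sortino(returns, risk_free):
    """(mean return - risk-free rate) / std. dev. of disappointments,
    zeros included. Threshold = risk-free rate (an assumption)."""
    if len(returns) < 24:
        return None                      # fewer than 24 months: not calculated
    downside = stdev(disappointments(returns, risk_free))
    if downside == 0:
        return float("inf")              # no disappointments at all
    return (mean(returns) - risk_free) / downside

# The two disappointment series from the text (sample standard deviation):
print(mean([1.5, 1.5, 0.0, 0.0, 0.0, 0.0]), round(stdev([1.5, 1.5, 0.0, 0.0, 0.0, 0.0]), 1))  # 0.5 0.8
print(mean([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]), round(stdev([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]), 1))  # 0.5 0.5
```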
A seemingly obvious method to evaluate the fitness of a trading rule is to see whether it generates any profits. Overheads such as transaction costs have to be taken into account. It must also be considered that a trading rule may perform well under other conditions and time periods even though it performs poorly in the current one; this implies giving individual trading rules a second chance.
Another means of judging fitness is to compare the results of trades made by individual trading rules with the results of the BUY and HOLD strategy. Many authors have disputed the effectiveness of this strategy: in some of the literature it is considered a poor choice for short-term investments, although it provides steady performance for long-term portfolios. For a comparative study, however, it does provide a useful indication [Koza et al., 1996].
Initially both of these methods will be used, and after due experimentation, the decision to deploy one or both of them will be made.
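Both fitness measures can be sketched with a simple trading simulation. The following are assumptions, not taken from the text: all-in/all-out positions without short selling, a hypothetical proportional transaction cost of 0.1% per trade, and fitness expressed either as final capital or as final capital minus that of buy-and-hold. All names are hypothetical.

```python
def simulate(prices, signals, cost_rate=0.001, capital=1.0):
    """Final capital from trading one asset on BUY/SELL/HOLD signals
    (one signal per price). All-in on BUY, all-out on SELL; every trade
    pays a proportional cost. No short selling."""
    cash, shares = capital, 0.0
    for price, signal in zip(prices, signals):
        if signal == "BUY" and cash > 0:
            shares = cash * (1 - cost_rate) / price
            cash = 0.0
        elif signal == "SELL" and shares > 0:
            cash = shares * price * (1 - cost_rate)
            shares = 0.0
    return cash + shares * prices[-1]        # open position marked to the last price

def buy_and_hold(prices, cost_rate=0.001, capital=1.0):
    """Benchmark: buy at the first price and hold until the end."""
    return capital * (1 - cost_rate) * prices[-1] / prices[0]

def excess_fitness(prices, signals, cost_rate=0.001):
    """Positive when the rule beat buy-and-hold over the same period."""
    return simulate(prices, signals, cost_rate) - buy_and_hold(prices, cost_rate)

prices = [10.0, 12.0, 9.0, 11.0]
signals = ["BUY", "SELL", "BUY", "HOLD"]
print(simulate(prices, signals), buy_and_hold(prices), excess_fitness(prices, signals))
```

Raising `cost_rate` penalizes rules that trade often, which is one way to account for the overheads mentioned above.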
4.2.3 Description of GP Engine
The GP algorithm is detailed in this section. The flow chart in figure 5 details the components of this algorithm. In the flowchart, the functionality of each module has been divided according to the level of the GP hierarchy with which it is concerned. Furthermore, before the actual algorithm is presented, the notation and terminology used within it are explained so as to facilitate understanding of the algorithm.
Fig. 5: Flow Chart of GP Algorithm
Figure 5 shows a detailed flowchart representation of the algorithm.
Specification, Notation & Terminology
Each function of the algorithm is represented by a letter (A, B, C, etc.) and a name detailing its functionality. Any function can be called from another function, and arguments are provided in italics.
An in-depth analysis of the algorithm parameters and their significance is provided in Appendix A.
Object_Offset / Object_Count: An integer relating to an object such as a population, generation or expert; it identifies the count or the offset of the object being worked upon.
Node_Library: A text file containing nodes for the GP tree. Nodes are randomly selected from this library to create GP trees.
T, T_test / T_train: The variable T is an integer representation of the total time for a stock, as a number of "ticks". When subscripted with either test or train, it specifies the tick at which to begin testing or training.
N_TRE / N_GEN: The integer N defines the maximum number of trees or generations that can be created during any instant.
Object[Object_Count]: A direct representation of the object at index Object_Count.
C_test / C_train (time): The capital, i.e. the performance measure. There are separate capital values for the testing period and the training period; the net testing capital is written C_test^net(T_test − 1 + Population_Offset).
Operation_Limit: An integer defining the limit for any operation, starting at 0.
P_x: The percentage for any context X. X can be elitism, crossover, mutation, the percentage of a generation to be carried over, or the percentage of the stock quotations to be used for testing.
Rand: Any random variable.
R_dec: The boundary limit.
B_time: B identifies the stock; when subscripted with a time value, it denotes the value of the stock at that time.
A_lower / A_upper (time): Generated by the moving average parameter; at any given time, these are the lower and upper limits of the moving-average boundary.
Algorithm
Initialization:
1. Initialize parameters
2. Initialize Population_Offset = 0
3. Retrieve Node_Library
4. Set Population_{Population_Offset} = Population Creation from Node_Library
5. Train, Test and Evolve Population_{Population_Offset}
Train, Test and EvolvePopulation
1. While T_test + Population_Offset < T
i. Train, Test and Evolve Generation
ii. Increment Population_Offset by 1
iii. Set Population_{Population_Offset} = Population Creation from Population_{Population_Offset-1}
Train, Test and Evolve Generation
1. While Generation_Count
iv. Mutation_Limit = P_mut * N_TRE / 100
4. For Count = 0 to Elitism_Limit
i. Generation_{Expert_Count} = Previous_Generation_{Count}
ii. Increment Expert_Count
5. For Count = 0 to Crossover_Count
i. Initialize Parent_Expert_1 = Roulette Wheel Selection of Expert from Previous_Generation at Population_Offset
ii. Initialize Parent_Expert_2 = Roulette Wheel Selection of Expert from Previous_Generation at Population_Offset
iii. Generation_{Expert_Count}, Generation_{Expert_Count+1} = Expert Creation From Crossover of Parent_Expert_1 and Parent_Expert_2
iv. Increment Expert_Count by 2
6. For Count = 0 to Mutation_Count
i. Generation_{Expert_Count} = Expert Creation From Mutation of Generation_{Count}
ii. Increment Expert_Count
Roulette Wheel Selection of Expert from Generation at Population_Offset
1. Initialize Performance_Sum = Sum of C^test_net(T_test - 1 + Population_Offset) of each Expert in Generation
2. Initialize Performance_Ratio = 0
3. Initialize Count = 0
4. Select a random double value Rand between 0 and 1.
5. While Performance_Ratio
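The roulette wheel selection steps above can be sketched in Python as follows. This is a minimal sketch, assuming each expert's fitness is its net testing capital (a non-negative number) and that the truncated loop accumulates each expert's share of Performance_Sum until that running ratio exceeds Rand; the function name and signature are hypothetical.

```python
import random
from typing import Sequence


def roulette_wheel_select(fitness: Sequence[float], rng: random.Random) -> int:
    """Return the index of an expert chosen with probability proportional to its fitness.

    fitness: one non-negative value per expert (e.g. its net testing capital),
    not all zero.
    """
    performance_sum = sum(fitness)              # step 1: Performance_Sum
    performance_ratio = 0.0                     # step 2: Performance_Ratio
    rand = rng.random()                         # step 4: Rand in [0, 1)
    for count, value in enumerate(fitness):     # steps 3 and 5: walk around the wheel
        performance_ratio += value / performance_sum
        if rand < performance_ratio:
            return count
    return len(fitness) - 1                     # guard against floating-point round-off


rng = random.Random(42)
net_capitals = [100_500.0, 99_800.0, 101_200.0]
picks = [roulette_wheel_select(net_capitals, rng) for _ in range(1000)]
print([picks.count(i) for i in range(len(net_capitals))])
```

Because every expert's net capital stays close to the initial 100,000, the three experts in the example are picked almost equally often: fitness-proportionate selection on raw net capital exerts only weak selection pressure.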
Tree Creation From Node_Library:
1. Initialize an empty Tree
2. While Tree_Depth
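The tree creation steps are truncated in this copy. As an illustration only, the following Python sketch grows a random tree from a node library while respecting a maximum depth (N_DEP). It assumes the library splits into binary logical operators and technical-indicator terminals; the names in OPERATORS and TERMINALS and the leaf probability p_leaf are hypothetical, and this simple "grow" strategy may differ from the original procedure.

```python
import random
from dataclasses import dataclass, field
from typing import List

# Hypothetical node library: logical operators join technical-indicator terminals.
OPERATORS = ["AND", "OR"]
TERMINALS = ["RSI_BELOW_30", "PRICE_ABOVE_MA", "MA_CROSS_UP"]


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

    def depth(self) -> int:
        return 1 + max((child.depth() for child in self.children), default=0)


def grow_tree(rng: random.Random, max_depth: int, p_leaf: float = 0.3) -> Node:
    """Randomly grow a tree whose depth never exceeds max_depth (N_DEP)."""
    if max_depth <= 1 or rng.random() < p_leaf:
        return Node(rng.choice(TERMINALS))      # stop: pick an indicator terminal
    operator = rng.choice(OPERATORS)            # expand: binary logical operator
    return Node(operator, [grow_tree(rng, max_depth - 1, p_leaf),
                           grow_tree(rng, max_depth - 1, p_leaf)])


def to_string(node: Node) -> str:
    """Serialize a tree to a parenthesized string that could later be parsed."""
    if not node.children:
        return node.label
    left, right = node.children
    return f"({to_string(left)} {node.label} {to_string(right)})"


tree = grow_tree(random.Random(7), max_depth=4)
print(to_string(tree), tree.depth())
```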
4.2.4 Conclusion
The problem of rapidly analyzing stock market price data for a given stock and giving a BUY, SELL or HOLD decision has been detailed. A solution was proposed: a genetic programming based algorithm that incorporates a set of financial technical indicator functions. These technical indicator functions were also detailed and explained along with their context. A flow chart of the modules of the GP algorithm was presented to give a clearer view of its structure, and finally the algorithm itself was presented with a technical specification.
We are now ready to run experiments by assigning parameter values and to draw conclusions from these experiments.
5. Experimentation
5.1 Experimental Aims and Objectives
The primary objective of the experimental work was to demonstrate the effectiveness (or otherwise) of this system and of the general concept, GP optimization of technical indicator (TI) based trading rules, in making profitable forecasts of stock price movements.
It is necessary to demonstrate that the GP algorithm learns rules with some predictive power beyond the training period, as opposed to merely learning the behavior of the training data. The stock data used for the experiments has variable tick rates and is sufficiently unpredictable to serve this goal. By applying various parameter settings, the experiments also aim to establish whether any profit is achieved and to discern relationships between parameters and performance.
5.2 Trading Procedure
A selection of the parameters of the GP algorithm will be assigned a range of values, and the GP algorithm will then be applied to the experimental input data. The American trading strategy will be used: at the beginning of the experiment on each data set, the initial number of stocks in hand will be zero, and towards the end of the data a SELL decision will be forced.
A trial is run with the first value of each parameter, and its performance is measured and noted. Then one parameter is assigned the next value in its range and the process is repeated.
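The trial procedure reads as a one-factor-at-a-time sweep. A minimal Python sketch follows, assuming the first value in each range of Fig. 9 is the baseline, that every other parameter returns to its baseline while one parameter is varied (the text does not say whether values are reset), and that mutation is paired with crossover as in the rows of Fig. 9; the function name is hypothetical.

```python
from typing import Dict, List

# Varied parameters and their ranges, as listed in Fig. 9.
RANGES: Dict[str, List[float]] = {
    "R_MA": [10, 15, 30],
    "N_GEN": [5, 10, 20],
    "N_TRE": [100, 200, 300],
    "P_cross": [80, 85, 90],
    "P_test": [60, 75, 90],
}


def one_at_a_time_trials(ranges: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Baseline trial (first value of every range), then one trial per alternative
    value of each parameter, with all other parameters held at the baseline."""
    baseline = {name: values[0] for name, values in ranges.items()}
    trials = [dict(baseline)]
    for name, values in ranges.items():
        for value in values[1:]:
            trial = dict(baseline)
            trial[name] = value
            trials.append(trial)
    for trial in trials:
        # Assumption: mutation is paired with crossover (80/20, 85/15, 90/10).
        trial["P_mut"] = 100 - trial["P_cross"]
    return trials


for trial in one_at_a_time_trials(RANGES):
    print(trial)
```

Under these assumptions the sweep needs 11 trials per stock instead of 3^5 = 243 for a full grid; the trade-off is that interactions between parameters are not explored.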
The total time taken for the whole process divided by the number of populations (the time per population) will be used as one performance measure. This ratio has two advantages. First, it gives a reasonable estimate of how much time is needed to calculate a decision for one quotation. Second, the data sets differ in size, and the testing-percentage parameter changes how much of the data is tested, so raw run times would give biased comparisons; normalizing by the number of populations removes this concern.
The second measure of performance will be the net profit at the end, that is, the difference between the net worth at the end and the initial capital.
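Both performance measures are a simple ratio and a simple difference. A short Python sketch follows; the function names are hypothetical and the example numbers are made up for illustration, not results from this report.

```python
def time_per_population(total_seconds: float, n_populations: int) -> float:
    """Total run time normalized by the number of populations created."""
    if n_populations <= 0:
        raise ValueError("n_populations must be positive")
    return total_seconds / n_populations


def net_profit(initial_capital: float, final_net_worth: float) -> float:
    """Net worth at the end of the run minus the initial capital."""
    return final_net_worth - initial_capital


print(time_per_population(2540.0, 254))   # 10.0 (hypothetical: 2540 s over 254 populations)
print(net_profit(100_000.0, 101_200.0))   # 1200.0 (hypothetical final net worth)
```

Since the testing point advances by one quotation per population (Appendix A), the time per population approximates the time needed per tested quotation.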
5.3 Experimental Input Data
The experiments used stock market price data for simulated evaluation of trading
strategies.
Experiments were conducted using share price data for AXA, Peugeot S.A. and STMicroelectronics N.V., traded on the Paris Stock Exchange (Bourse de Paris). Data for all stocks covers the same 6-day period, from 29th May 2006 to 3rd June 2006. The price values were plotted from a spreadsheet and visually inspected for anomalous values, such as negative volume values at the start of the trading day, before being used as input for the GP system as CSV files. Note also that each graph contains periods represented by sloping straight lines. These are periods of market inactivity, between the close of one trading day and the opening of the next morning.
Fig. 6: AXA Data
Fig. 7: Peugeot Data
Fig. 8: ST Microelectronics Data
5.4 Parameters
The parameters of the GP algorithm are explained below, together with their values or value ranges and the reasons for choosing them.
Initial Capital C_0, Commission P_com, Initial Number of Stocks S_0, Decision Boundary R_dec and Trading Strategy
Initial Capital and Commission are fixed at 100,000 and 0.2% respectively.
Since the American trading strategy is used, the Initial Number of Stocks is always zero.
The Decision Boundary is set at 0.2%: a lower value would allow too many decisions to be taken, greatly increasing the total commission paid, while a higher value would filter out too much and allow too few decisions.
Moving Average Range R_MA
The Moving Average Range determines how narrowly decisions are filtered according to price fluctuations. A low value allows decisions to be made in response to smaller shifts, whereas a high value is less sensitive. The values selected are 10, 15 and 30.
Buy Sell Percentage
This value is fixed at 50 so that the effects of decisions are more apparent.
Number of Generations N_GEN and Number of Trees in each Generation N_TRE
The number of generations and the number of trees affect both performance and processing time. Low values take less processing time but sacrifice performance, and vice versa.
The values used are 5, 10 and 20 for the Number of Generations and 100, 200 and 300 for the Number of Trees in each Generation.
Maximum Tree Depth N_DEP
This parameter is fixed at 20, as deeper trees would needlessly increase processing time and resource usage.
Percentage of Previous Population Carried Forward P_carry
This parameter is fixed at 50, as this provides an equal mix of expertise from the old population and new expertise from randomly created nodes.
Elitism P_elite
The value for elitism is fixed at 2, as too high a value would lead to premature convergence.
Crossover Probability P_cross and Mutation Probability P_mut
These parameters affect how much expertise is exchanged, or genetically modified, between generations.
The values used are 80, 85 and 90 for crossover and 20, 15 and 10 for mutation, respectively.
Replacement, Addition and Deletion Probability in Mutation P_rep_mut, P_add_mut, P_del_mut
Each of these is fixed at 33.33.
Training Start Quotation Limit T_train
The training start time is fixed at 30, as a lower value would limit the effectiveness of some of the technical indicators.
Percentage of Quotations for Testing P_test
This parameter determines how much of the data is used for training and how much for testing.
Its values are 60, 75 and 90.
Figure 9 summarizes the parameter values and ranges.
Parameter    Value(s)
C_0          100,000
P_com        0.2
S_0          0
R_dec        0.2
R_MA         10, 15, 30
N_GEN        5, 10, 20
N_TRE        100, 200, 300
N_DEP        20
P_carry      50
P_elite      2
P_cross      80, 85, 90
P_mut        20, 15, 10
P_rep_mut    33.33
P_add_mut    33.33
P_del_mut    33.33
T_train      30
P_test       60, 75, 90
Fig. 9: Summary of Parameters
5.5 Results
Representation of Results
Performance is measured as the ratio of the profit gained to the initial capital invested. The results are shown on scatter charts with the parameter values on the X-axis and the profit ratio on the Y-axis. These charts help reveal how the profit ratio tends to vary with a given parameter, and they also help identify anomalies.
Buy & Hold as a Performance Indicator
Before the trials on each stock, the application also performs a Buy & Hold run as a baseline. Stocks are bought at the beginning of each business day, in an amount determined by the Buy/Sell percentage, which is fixed during parameterization. The profit obtained in each such run is also marked on the scatter charts.
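A minimal Python sketch of the Buy & Hold baseline and its profit ratio follows. It assumes that, under the American trading strategy, everything bought at the start of a day is sold at that day's last quotation, that fractional stocks are allowed, and that commission is charged per stock as in the capital formulas of Appendix A; the function name and the quotes are hypothetical.

```python
from typing import List


def buy_and_hold_profit_ratio(days: List[List[float]], c0: float,
                              p_buy: float, p_com: float) -> float:
    """Buy at each day's first quote and sell everything at its last quote.

    days  : trading days, each a time-ordered list of quotes B_t
    p_buy : fraction of capital spent on each purchase (0.5 for the 50% setting)
    p_com : commission per stock traded, as in the Appendix A capital formulas
    """
    capital = c0
    for quotes in days:
        open_price, close_price = quotes[0], quotes[-1]
        shares = p_buy * capital / open_price
        capital -= open_price * shares + p_com * shares    # BUY at the open
        capital += close_price * shares - p_com * shares   # SELL at the close
    return (capital - c0) / c0                             # profit ratio


days = [[100.0, 101.0, 102.0], [102.0, 99.0, 101.0]]      # made-up quotes
print(buy_and_hold_profit_ratio(days, c0=100_000.0, p_buy=0.5, p_com=0.0))
```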
Variation of Profit
Figure 10 shows the net profit as a function of the moving average range; profit tends to be higher at a moving average range of 30. AXA is a slight exception, showing higher profit at a moving average range of 10. Because this anomaly appears isolated, it is treated as a random occurrence and discounted.
Fig. 10: Scatter charts of profit ratio versus Moving Average Range (R_MA) for AXA, Peugeot and STM, each also showing the Buy & Hold ratio
Figure 11 shows the net profit as a function of the number of generations per population; profit tends to be higher at 10 generations. AXA is an exception, showing a slightly higher profit at 20.
Fig. 11: Scatter charts of profit ratio versus Number of Generations (N_GEN) for AXA, Peugeot and STM, each also showing the Buy & Hold ratio
Figure 12 shows the net profit as a function of the number of trees per generation; profit tends to be higher at the intermediate value of 200.
Fig. 12: Scatter charts of profit ratio versus Number of Trees (N_TRE) for AXA, Peugeot and STM, each also showing the Buy & Hold ratio
Figure 13 shows the net profit as a function of the crossover percentage; profit ratios tend to be highest at 80 percent.
Fig. 13: Scatter charts of profit ratio versus Crossover percentage (P_cross) for AXA, Peugeot and STM, each also showing the Buy & Hold ratio
Figure 14 shows the net profit as a function of the testing percentage; profit tends to be higher at 75 percent. STMicroelectronics is a marked exception, showing higher profit at 90 percent.
Fig. 14: Scatter charts of profit ratio versus Testing percentage (P_test) for AXA, Peugeot and STM, each also showing the Buy & Hold ratio
Measure of Profit and Time per Population
Figure 15 shows the net profit as a function of the time per population in seconds. Higher profit values lie closer to the low end of the time range, suggesting that higher profits are more likely to be obtained with shorter run times, at or below about 10 seconds per population.
Fig. 15: Scatter charts of profit ratio versus time per population (seconds) for AXA, Peugeot and STM, each also showing the Buy & Hold ratio
The evolutionary performance of the GP algorithm was reasonably sensitive to the control parameters: varying the crossover and mutation probabilities, the number of generations and so on had a noticeable effect on the profit values attained.
5.6 Discussion of Results
If the anomalies in the above results are disregarded, the following parameter settings are expected to give very high, if not the highest, profit values at a time ratio of less than 10 seconds per population.
For each data set, the parameters that produced the highest profit differ slightly from the settings proposed above.
For certain data sets, some parameters appear to follow a pattern with respect to profit. For example, AXA shows increasing profit with increasing moving average range, whereas STMicroelectronics shows the opposite, with profit increasing as the moving average range decreases. A few parameters, such as the number of generations, show no apparent pattern.
Figures 19, 20 and 21 show the stock data and the BUY/SELL decisions of the GP algorithm for AXA, Peugeot and STMicroelectronics respectively. The circles represent SELL decisions and the squares represent BUY decisions. The figures show subsets of the original stock data so that they are easier to read on paper.
Fig. 19: Graph output of AXA quotes with BUY/SELL decisions
Fig. 20: Graph output of Peugeot quotes with BUY/SELL decisions
Fig. 21: Graph output of STM quotes with BUY/SELL decisions
6. Conclusion
This report put forth the problem of analyzing financial time series data to suggest actions to be taken in quasi-real time. A solution based on genetic programming was proposed. The idea was to create GP trees with financial technical indicators as branches and logical operators joining these branches.
To appreciate the significance of this endeavor, current systems employing similar techniques were studied. The greatest inspiration was the Internet Bourse Experts system, which employs a genetic algorithm. Another GP-based system, EDDIE, was also analyzed in depth. A development platform also had to be chosen to make designing the software easier.
The initial tasks included an intensive study of evolutionary computing and stock market trading methodologies. A tentative GP algorithm was devised, and a hierarchical structure for the major objects (population, generation, expert and tree) was proposed. The functionalities of each object were designed so that any property or function could be accessed at any point in the program. A representation for the tree structure was researched; functionalities such as tree construction, parsing, removal and modification of nodes, and evaluation had to be incorporated in this representation. A grammar that emulates a typical GP tree structure was devised for this representation.
The technical indicators used in the project were selected for the benefits they showed in previous work in this domain; the extensive library of IBE's trading functions was a natural source.
All experiments were conducted on real stock price data. In all cases, as demonstrated during the experimentation phase, the results were more profitable than those of the Buy & Hold technique.
The tick frequency differed between data sets: although the time period was the same for each, the number of quotations was not. AXA contained 1013 quotations, Peugeot 828 and STMicroelectronics 860. This factor was not taken into
Appendix A
GP Algorithm Parameters
Trading-Specific Parameters
Initial Capital
This defines the initial working capital, of type double, before any trading decisions are made. It is represented as C_0, and the amount of working capital at any subsequent time period t is represented as C_t.
Commission
This is the commission charged per transaction as a percentage of the number of stocks bought or sold. It is of type double and is represented as P_com.
Buy Sell Percentage
This parameter defines the percentage of capital used to buy stocks if the decision is BUY, or the percentage of stocks in hand to sell if the decision is SELL. Both are of type double. They are represented as P_buy and P_sell respectively.
Initial Number of Stocks
This parameter defines the initial number of stocks in hand at the start of the trading day. It is of type integer and is represented as S_0. The number of stocks at any given time t is represented as S_t.
Decision Boundary
This parameter defines the minimum difference in stock prices that allows a decision to take place. It is of type double and is represented by R_dec. At any point in time, the absolute difference between subsequent stock prices must be greater than or equal to R_dec.
Moving Average Range
This parameter defines the number of previous time periods used in calculating a moving average of stock prices. It is of type integer and is represented as R_MA. The commission percentage P_com of the moving average at a given time t is added to and subtracted from the moving average to create a boundary; if the current stock price falls inside this boundary, no decision is made. This allows decisions to be made according to fluctuations in price movement trends.
Trading Strategy
This parameter defines whether to use the American trading strategy. If it is used, the initial number of stocks is fixed at 0 and all stocks are sold at the end of the trading day.
Importance of Trading-Specific Parameters
Portfolio Management and Performance (P_t)
The capital at time t, C_t, and the number of stocks in hand at time t, S_t, change continuously depending on whether the decision is BUY or SELL.
A. If the decision is BUY and the working capital is more than zero, the following formulas are used:
$$S_{buy} = \frac{P_{buy} \, C_{t-1}}{B_t}$$
where S_buy is the number of stocks to buy and B_t is the stock value at time t;
$$S_t = S_{t-1} + S_{buy}$$
$$C_t = C_{t-1} - B_t \, S_{buy} - P_{com} \, S_{buy}$$
B. If the decision is SELL and the number of stocks is more than zero, the following formulas are used:
$$S_{sell} = P_{sell} \, S_{t-1}$$
where S_sell is the number of stocks to sell;
$$S_t = S_{t-1} - S_{sell}$$
$$C_t = C_{t-1} + B_t \, S_{sell} - P_{com} \, S_{sell}$$
C. After either of these steps, the net worth, which is used as the performance measure, is calculated:
$$C_{net}(t) = C_t + B_t \, S_t$$
where C_net(t) is the net worth (performance) at time t.
All variable portfolio quantities (capital, number of stocks and net worth/performance) are kept separately for the training and testing periods. They are represented as $C^{test}_t$, $S^{test}_t$ and $C^{test}_{net}(t)$ for testing, and as $C^{train}_t$, $S^{train}_t$ and $C^{train}_{net}(t)$ for training.
As can be seen from the above formulas, the commission P_com always figures into the calculation and is always deducted from the capital, whether the decision is BUY or SELL. Decisions therefore have to be made carefully, as too many of them would deplete the capital too soon. The following section describes how this can be avoided.
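The update rules A to C can be sketched in Python as follows. This is a minimal sketch, assuming P_buy and P_sell are given as fractions (0.5 for 50%), that S_sell is that fraction of the stocks in hand, as the Buy Sell Percentage definition states, and that the commission is charged per stock exactly as in the capital formulas; the names Portfolio, apply_decision and net_worth are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Portfolio:
    capital: float   # C_t
    stocks: float    # S_t (fractional; the formulas do not round)


def apply_decision(p: Portfolio, decision: str, price: float,
                   p_buy: float, p_sell: float, p_com: float) -> Portfolio:
    """Return the portfolio after one BUY, SELL or HOLD decision at price B_t."""
    if decision == "BUY" and p.capital > 0:
        s_buy = p_buy * p.capital / price
        return Portfolio(p.capital - price * s_buy - p_com * s_buy,
                         p.stocks + s_buy)
    if decision == "SELL" and p.stocks > 0:
        s_sell = p_sell * p.stocks          # fraction of stocks in hand (assumption)
        return Portfolio(p.capital + price * s_sell - p_com * s_sell,
                         p.stocks - s_sell)
    return p                                # HOLD, or nothing to trade


def net_worth(p: Portfolio, price: float) -> float:
    """C_net(t) = C_t + B_t * S_t."""
    return p.capital + price * p.stocks


p = Portfolio(capital=100_000.0, stocks=0.0)
p = apply_decision(p, "BUY", 100.0, p_buy=0.5, p_sell=0.5, p_com=0.2)
p = apply_decision(p, "SELL", 110.0, p_buy=0.5, p_sell=0.5, p_com=0.2)
print(p, net_worth(p, 110.0))
```

In the example, a buy at 100 and a partial sell at 110 pay 100 and 50 in commission, which illustrates how frequent decisions erode capital.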
Filtering measures
Two filtering measures are used: the moving average and the decision boundary.
For the moving average:
1. Calculate the moving average from time t to t - R_MA as follows:
I. Set A_t := 0
II. For Index = 0 to R_MA: A_t = (A_t * Index + B_{t-Index}) / (Index + 1)
where A_t is the moving average at time t and Index is a counter.
2. Calculate an upper and a lower limit of the moving average by adding and subtracting the commission P_com, as a percentage of the moving average:
$$A^{upper}_t = A_t + \frac{P_{com} \, A_t}{100}$$
$$A^{lower}_t = A_t - \frac{P_{com} \, A_t}{100}$$
where $A^{upper}_t$ is the upper limit and $A^{lower}_t$ is the lower limit.
If the stock value at time t does not lie between $A^{lower}_t$ and $A^{upper}_t$, and the absolute difference between the current and previous values is greater than or equal to the decision boundary R_dec, a request for a BUY or SELL decision is made:
$$\text{IF } \left(|B_t - B_{t-1}| \ge R_{dec}\right) \text{ AND NOT } \left(A^{lower}_t \le B_t \le A^{upper}_t\right) \text{ THEN REQUEST DECISION}$$
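The two filters can be combined into a single check. A minimal Python sketch follows, assuming that "for Index = 0 to R_MA" includes R_MA (so R_MA + 1 quotations are averaged), that R_dec is an absolute price difference as defined above, and that t >= R_MA, which T_train = 30 is meant to guarantee; the function names are hypothetical.

```python
from typing import Sequence


def moving_average(prices: Sequence[float], t: int, r_ma: int) -> float:
    """A_t as an incremental mean of B_t, B_{t-1}, ..., B_{t-R_MA} (R_MA + 1 quotes)."""
    if t < r_ma:
        raise ValueError("need at least R_MA earlier quotations")
    a_t = 0.0
    for index in range(r_ma + 1):                   # Index = 0 .. R_MA inclusive
        a_t = (a_t * index + prices[t - index]) / (index + 1)
    return a_t


def should_request_decision(prices: Sequence[float], t: int, r_ma: int,
                            p_com: float, r_dec: float) -> bool:
    """True when B_t is outside the moving-average band and |B_t - B_{t-1}| >= R_dec."""
    a_t = moving_average(prices, t, r_ma)
    upper = a_t + p_com * a_t / 100
    lower = a_t - p_com * a_t / 100
    inside_band = lower <= prices[t] <= upper
    return abs(prices[t] - prices[t - 1]) >= r_dec and not inside_band


flat = [100.0] * 11
jump = [100.0] * 10 + [101.0]
print(should_request_decision(flat, 10, r_ma=5, p_com=0.2, r_dec=0.2))   # False
print(should_request_decision(jump, 10, r_ma=5, p_com=0.2, r_dec=0.2))   # True
```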
Genetic Programming Specific Parameters
Number of Generations
This defines the maximum number of generations in each population. It is of type integer and is represented as N_GEN.
Number of Trees in each Generation
This defines the maximum number of trees in each generation. It is of type integer and is represented as N_TRE.
Maximum Tree Depth
This defines the maximum depth of a tree. It is of type integer and is represented as N_DEP.
Percentage of Previous Population Carried Forward
This defines the percentage of the top members of the previous population that are used to create the new population. It is of type integer and is represented as P_carry.
The very first population consists of trees generated from randomly selected leaf nodes from a library. This kind of randomness is sufficient for an initial population, but subsequent populations need the expertise of the previous population, as it may provide a reasonable solution for the forthcoming sample space.
Elitism
This defines the percentage of elitist trees, based on performance, that are carried forward into the next generation unchanged. It is of type integer and is represented as P_elite.
Crossover Probability
This defines the percentage of the number of trees from a previous generation that are used to create new trees using the crossover process described in section. It is of type integer and is represented as P_cross.
Mutation Probability
This defines the percentage of the number of trees from the previous generation that are used to create new trees using the mutation process described in section. It is of type integer and is represented as P_mut.
Mutation consists of replacement, addition or deletion operations, and the probability of each operation occurring is defined as P_rep_mut, P_add_mut and P_del_mut respectively. They are of type double.
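A small Python sketch of how the mutation count (Mutation_Limit = P_mut * N_TRE / 100, from the algorithm above) and the choice of mutation operation could be computed. Rounding the limit down to a whole number of experts and the function names are assumptions, and the three operations are only named here, not implemented.

```python
import random


def mutation_limit(p_mut: float, n_tre: int) -> int:
    """Mutation_Limit = P_mut * N_TRE / 100, rounded down to a whole number of experts."""
    return int(p_mut * n_tre / 100)


def choose_mutation(rng: random.Random, p_rep: float = 33.33,
                    p_add: float = 33.33, p_del: float = 33.33) -> str:
    """Pick replacement, addition or deletion with weights P_rep_mut, P_add_mut, P_del_mut."""
    return rng.choices(["replace", "add", "delete"], weights=[p_rep, p_add, p_del], k=1)[0]


rng = random.Random(0)
print(mutation_limit(20, 200))                  # 40
print([choose_mutation(rng) for _ in range(5)])
```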
Importance of Genetic Programming Specific Parameters
Hierarchical Structure
The Tree is the base object. The conceptual structure of the tree has been detailed in section. The technical indicators and logical operators are stored as strings, and the string is parsed to evaluate the tree. When a tree operates on stock market data, it gives a BUY, SELL or HOLD decision.
Trees can either be created from random nodes, from crossover operations or from
mutation operations.
Each Expert contains 2 trees, a BUY tree and a SELL tree. The results of both trees undergo an XOR operation to return a single result. Each expert maintains a record of parameters, performance, capital and number of stocks. Hereafter, an expert refers to the pair of BUY and SELL trees.
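A minimal sketch of the XOR combination, assuming each tree evaluates to a boolean signal (true meaning "act") and that conflicting or absent signals map to HOLD; this mapping is an interpretation of the text rather than a detail it states, and the function name is hypothetical.

```python
def expert_decision(buy_signal: bool, sell_signal: bool) -> str:
    """Combine the BUY-tree and SELL-tree outputs of one expert with XOR.

    Exactly one tree firing yields that tree's decision; both or neither yield HOLD.
    """
    if buy_signal != sell_signal:               # XOR on booleans
        return "BUY" if buy_signal else "SELL"
    return "HOLD"


for buy in (False, True):
    for sell in (False, True):
        print(buy, sell, expert_decision(buy, sell))
```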
Each Generation contains N_TRE experts. The first generation, GEN_0, contains experts created from random nodes. Subsequent generations, GEN_1 to GEN_{N_GEN}, contain high-ranking experts from the previous generation and new experts created by the genetic operations of mutation and crossover.
Each Population contains N_GEN generations. The first generation in the first population is randomly created, and subsequent generations in the same population are created through elitism and genetic operations, as detailed above. However, the first generation of each subsequent population consists of P_carry percent of the elitist members of the fittest generation from the previous population, with the remainder randomly generated.
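The carry-forward rule for the first generation of a new population can be sketched in Python as below. It is a minimal sketch, assuming experts are ranked by net capital and that the random remainder comes from some expert factory; Expert, make_random_expert and the function name are hypothetical.

```python
import itertools
from typing import Callable, List, Tuple

Expert = Tuple[str, float]   # hypothetical: (identifier, net capital used as fitness)


def first_generation_of_next_population(
        fittest_generation: List[Expert], n_tre: int, p_carry: float,
        make_random_expert: Callable[[], Expert]) -> List[Expert]:
    """Carry the top P_carry percent of experts unchanged; fill the rest randomly."""
    ranked = sorted(fittest_generation, key=lambda expert: expert[1], reverse=True)
    n_carry = int(p_carry * n_tre / 100)
    carried = ranked[:n_carry]
    fresh = [make_random_expert() for _ in range(n_tre - len(carried))]
    return carried + fresh


_ids = itertools.count()


def make_random_expert() -> Expert:
    """Stand-in for creating an expert from random nodes."""
    return (f"random{next(_ids)}", 0.0)


previous = [(f"e{i}", 100_000.0 + i) for i in range(10)]
new_generation = first_generation_of_next_population(previous, n_tre=10, p_carry=50,
                                                     make_random_expert=make_random_expert)
print([name for name, _ in new_generation])
```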
Figure A.1 exhibits the hierarchical structure of the objects described above.
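Fig. A.1: Hierarchical Structure of GP Objects (a population POP_0 contains generations GEN_0 to GEN_{N_GEN}; each generation contains experts EXP_0 to EXP_{N_TRE}; each expert holds a Buy Tree and a Sell Tree)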
Training and Testing Specific Parameters
Training Start Quotation Limit
This defines the point in time in the stock market sample at which training begins. It is an integer and must be at least 30, because some of the technical indicators used in the application read quotations several time points back. It is represented as T_train.
Percentage of Quotations for Testing
This defines what percentage of the stock market sample to use for testing. It is an integer, represented as P_test.
Consequently:
$$T_{test} = \frac{P_{test} \, T}{100}$$
where T represents the total time line in the sample space and T_test is the time at which testing begins.
Importance of Training and Testing Specific Parameters
The sample space, in this case the stock market data, is divided into TRAINING and TESTING periods, which move forward by a single time unit as populations progress.
The first training period runs from T_train to T_test - 1, and the first testing period is at T_test. This means that each expert of the first generation of the first population is applied to the stock quotations in this training period. The performance of each expert is calculated and the fittest are used to create the second generation. This process repeats until N_GEN generations have been created. The last generation is expected to be the fittest according to the fitness measure, and its fittest tree is applied to the first testing period, T_test.
At this point, a new population is created and the training and testing periods are offset by 1: the training period now runs from T_train + 1 to T_test, and the testing period is T_test + 1. Instead of creating the first generation of this new population entirely at random, P_carry percent of the elitist experts from the last generation of the previous population are carried into it unchanged, and the remainder are randomly generated.
This process repeats until the last point in the sample space has been tested, that is, until
$$T_{test} + offset = T$$
where offset is an integer initialized at 0 and incremented by 1 each time a new population is created.
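The sliding training and testing periods can be enumerated with a short Python generator. This is a minimal sketch, assuming quotations are indexed from 0 to T - 1, that T_test is rounded down to an integer, and that T_train = 30 as in the experiments; the function name is hypothetical. The example uses the 1013 AXA quotations and P_test = 75.

```python
from typing import Iterator, Tuple


def rolling_windows(total_ticks: int, p_test: int,
                    t_train: int = 30) -> Iterator[Tuple[int, int, int]]:
    """Yield (training start, training end, testing tick), one triple per population.

    T_test = (P_test * T) / 100 is rounded down here; both periods move forward by
    one tick per population until T_test + offset == T.
    """
    t_test = (p_test * total_ticks) // 100
    offset = 0
    while t_test + offset < total_ticks:
        yield (t_train + offset, t_test - 1 + offset, t_test + offset)
        offset += 1


windows = list(rolling_windows(total_ticks=1013, p_test=75))   # AXA: 1013 quotations
print(len(windows), windows[0], windows[-1])   # 254 (30, 758, 759) (283, 1011, 1012)
```

Each triple corresponds to one population, so the number of populations, and hence the total run time, grows with the number of quotations remaining after T_test.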
[Goodhart, 1995]
Goodhart, C., O'Hara, M., High Frequency Data in Financial Markets: Issues and Applications, London School of Economics, 1995.
[Gourieroux, 1997]
Gourieroux, C., ARCH Models and Financial Applications, Springer Verlag, 1997.
[Holland, 1975]
Holland, J., Adaptation in Natural and Artificial Systems, 1975.
[Hui, 2003]
Hui, A., Using Genetic Programming to Perform Time-Series Forecasting of Stock Prices, http://ww.genetic-programming.org, 2003.
[Kaboudan, 2000]
Kaboudan, M., Genetic Programming Prediction of Stock Prices, Computational Economics, Volume 16, pp. 207-236, 2000.
[Korczak, 2001]
Korczak, J., Kustner, P., A Stock Trading System using Genetic Approach and Object-Oriented Database Technology, Proceedings on Workshop on Artificial Intelligence for Financial Time Series Analysis, 2001.
[Korczak, 2004]
Korczak, J., Lipinski, P., Evolutionary building of stock trading Experts in a Real-Time System, Proceedings of the 2004 Congress on Evolutionary Computation, CEC 2004, pp. 940-947, 2004.
[Korczak, 2001]
Korczak, J., Roger, P., Stock timing using genetic algorithms, Applied Stochastic Models in Business and Industry, Volume 18, pp. 121-134, 2001.
[Koza, 1992]
Koza, J., Genetic Programming: On the Programming of Computers by Means of Natural Selection, The MIT Press, 1992.
[Koza, 1995]
Koza, J., Survey of Genetic Algorithms and Genetic Programming, Proceedings of the WESCON 95 Conference Record, 1995.
[Koza et al., 1996]
Koza, J., Bennett III, F., Andre, K., Keane, M., Artificial Intelligence in Design, http://www.genetic-programming.com, 1996.
[Krishnaswamy et al., 2000]
Krishnaswamy, C., Gilbert, E., Pashley, M., Neural Network Applications in Finance: A Practical Introduction, Financial Practice and Education, 2000.
[Langdon, 1995]
Langdon, W., Qureshi, A., Genetic Programming: Computers using "Natural Selection" to generate programs, The MIT Press, 1995.
[Lendasse et al., 2001]
Lendasse, A., Lee, J., de Bodt, E., Wertz, V., Verleysen, M., Dimension Reduction of Technical Indicators for the Prediction of Financial Time Series - Application to the BEL20 Market Index, European Journal of Economic and Social Systems 15, Vol. 2, pp. 31-48, 2001.
[Lipinski, 2003]
Lipinski, P., Evolutionary Data-Mining Methods in Discovering Stock Market Expertise from Financial Time Series, PhD Thesis, ULP Strasbourg, 2003.
[Mitchell et al., 1992]
Mitchell, M., Forrest, S., Holland, J., The royal road for genetic algorithms: Fitness landscapes and GA performance, Proceedings of the First European Conference on Artificial Life, Paris, France, pp. 245, 1992.
[Molgedey, 2000]
Molgedey, L., Ebeling, W., Intraday Patterns and Local Predictability of High Frequency Financial Time Series, Physica A: Statistical Mechanics and its Applications, Volume 287, Issues 3-4, pp. 420-428, 2000.
[Pantazopoulos et al., 1998]
Pantazopoulos, K., Tsoukalas, L., Bourbakis, N., Brun, M., Houstis, E., Financial prediction and trading strategies using neuro-fuzzy approaches, IEEE Transactions on Systems, Man and Cybernetics, Part B, Volume 28, Issue 4, pp. 520-531, 1998.
[Santini, 2000]
Santini, M., Tettamanzi, A., Genetic Programming for Financial Time Series Prediction, Proceedings of EuroGP'2001, Volume 2038, pp. 360-371, 2001.
[Sharpe, 1996]
Sharpe, W., Mutual Fund Performance, Journal of Business, pp. 119-138, 1966.
[Sortino, 1994]
Sortino, F., Price, L., Performance Measurement in a Downside Risk Framework, The Journal of Investing, pp. 59-65, 1994.
[Spears, 2003]
Spears, W., Gordon-Spears, D., Evolution of strategies for resource protection problems, Advances in evolutionary computing: theory and applications, Springer-Verlag, 2003.
[Xu et al., 2003]
Xu, Z., Leung, K., Liang, Y., Leung, Y., Efficiency Speed-up Strategies for Evolutionary Computation: Fundamentals and Fast-GAs, Applied Mathematics and Computation, v.142, pp. 341-388, 2003.
[Zitvogel, 2003]
Zitvogel, O., Développement d'un Système Multi-Agents, Interface Intelligente, Négociation et Gestion de Bases de Données, Internship Report, LSIIT-AFD, Illkirch, 2003.