Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Case Study: A Linked Linear Regression -
Data Envelopment Analysis Model for the Long-Term Prediction of
Mutual Fund Performance
Cindy Chao, Greg Higgins Michael McGee
Department of Computer Science and Engineering SOUTHERN METHODIST UNIVERSITY
Dallas, Texas
May 14, 1994
LINKED LINEAR RECRESSION - DEA MODEL
Contents
Abstract.............................................................................................................................3
Introduction......................................................................................................................3
Whatis a Mutual Fund? .................................................................................................4
DEAFactors......................................................................................................................4
Model Development and Validation.............................................................................6
Examinationof Resiilts ...................................................................................................9
InvestmentPerformance............................................................................................... 10
Conclusion...................................................................................................................... 11
References....................................................................................................................... 12
ProfessionalReferences................................................................................................. 13
Appendix........................................................................................................................ 14
DEAFormulation .............................................................................................. 15
DEACode ........................................................................................................... 16
ForcastingCode ................................................................................................. 19
2
Case Study: A Linked Linear Regression - Data Envelopment Analysis Model for the Long-Term Prediction of Mutual Fund Performance
Michael McGee Greg Higgins Cindy Chao Department of Computer Science and Engineering Southern Methodist University Dallas, Texas 75275
Data Envelopment Analysis (DEA) can be used to evaluate managerial efficiency of mutual funds. By extending the DEA methodology with linear regression, a new prediction tool for future efficiency is developed. The linked model was tested on a sample of fifty growth mutual funds over a three year time horizon - resulting in accurate predictions of future efficiency with 99% confidence.
The current mutual fund market contains nearly $1.4 trillion
dollars; this rapid growth of mutual
funds as a long—term investment instrument has led to an increased
current assessment of a fund's management. DEA is a linear programming methodology that
computes relative efficiency scores among a group of homogeneous
demand for evaluation and forecasting decision—making units (DMUs). techniques. The performance of a Combined with linear regression, DEA mutual fund is directly linked to the performance of the management staff. Data Envelopment Analysis (DEA) can
be used to evaluate managerial
efficiency; however, this is only, a
becomes a prediction tool for long-term forecasting. The linked model not only evaluates return on investment
but also the quality of the fund's managerial team on a long—term basis.
3
LINKED LINEAR REGRESSION - DEA MODEL
What is a Mutual Fund?
A mutual fund is an investment company that pools the money of many institutional and individual investors into one professionally
managed investment account. The
fund manager creates a portfolio for the shareholders by buying a variety of "true" financial instruments, such as corporate stocks and bonds, and
government and municipal securities, in accordance with the fund's particular investment objectives. Fund managers are professional investors who know the financial market and
follow it daily with extensive research capabilities to effectively investigate all investment opportunities. The fund is always monitored, with individual securities constantly being bought, sold, and traded in order to optimize
the investment objectives. The beauty of investing in sometimes twenty different securities lies in the protection
of the investment through portfolio diversification.
Mutual funds are divided into six major categories according to their investment objectives, which are outlined in the fund's prospectus, such as goals, level of risk, and the type and frequency of return desired. Since the
goal of the linked model is forecasting of long-term performance, growth mutual funds were selected because their objectives emphasize the accumulation of capital gains, and they have the greatest long-term growth potential.
DEA Factors
The inputs and outputs of the envelopment analysis were made up of the main characteristics of a mutual fund. They were selected because they are important factors to investors and are directly linked to the performance of the fund, manager. The inputs
selected were beta - a measure of risk, expense ratio, and loads & other fees. The outputs used were net asset value (NAV) per share, dividends, and capital gains. Inputs
Beta is a measure of the fund
portfolio's average volatility,or risk, relative to the S&P 500. The beta value of the market is a constant equal to 1.00. A fund with a beta value greater than 1.00 is more volatile than the market. It is expected to outperform the "average fund" in a bull (rising) market and bring in greater returns, while it is expected to fall more than
4
LINKED LINEAR REGRESSION DEA MODEL
the average fund in times of a bear (declining) market. A fund with a beta value less than 1.00 Possesses the characteristics of the converse situation. For example, a portfolio with a beta value of 0.75 is expected to be 25% less responsive to the general economic trend than the average fund and return 25% more conservative than the average fund.
Expense ratio is a measure which relates expenses to the average net assets for a period. It is calculated by taking the annual expenses paid by the fund (including management and investment advisory fees, custodial
and transfer agency charges, legal fees, and distribution costs) divided by the
average shares outstanding for the period. The greater the expense ratio, the larger the percentage of the shareholder's investment gets paid out directly to the management company. i The expense ratio does not include
loads or commissions paid for sales or f transaction fees. f
Loads and other fees represent all C other charges made to the shareholder which are not accounted for in the h expense ratio. A load is a portion of the c offering price of the fund that pays for v costs such as sales commissions,
brokerage costs, and transaction fees. This value is a Percentage of the
Purchase price taken off either before
the money is invested (front-end load) or after investment when the
shareholder redeems the shares (back- end load). The amount of the load is established by the managerial company, but is paid to the sales and transaction team. Some funds have a load of 0.00%, meaning that they have the same
benefits as load funds, except they are sold directly by the fund company
rather than through a sales broker so that no loads or other fees are incurred. With no-load funds, the investor must
seek out and contact the investment firm directly. There is a distinct advantage to investing in no-load funds. For example, if we had invested 1000 in a particular portfolio with a
7.50% front-end load, $75 would be :aken off the top, and we would be left vith only $925 worth of shares in the
und. Had this been a no-load fund, the till $1000 would have been utilized. )utputs
Net asset value per share represents DW much each share is worth. It is LlCulated by taking the total market
due of a fund's shares (securities plus sh plus accrued earnings minus
5
LINKED LINEAR REGRESSION - DEA MODEL
liabilities) divided by the number of
shares outstanding. No profits or losses are realized through NAV until the
fund is sold. Interest lies in the amount
of increase or decrease in NAV from time of purchase to time of sale.
Dividends are payments declared by a corporation's board of directors and made to shareholders, usually on a quarterly basis. A fund's dividends come directly from the dividends paid
out by the individual securities within the portfolio. They are paid out equally among the shareholders on a scheduled time frame.
Capital gains on a mutual fund are the profits made from the sale of a
capital asset (such as a security, stock, or bond) within the portfolio. The
number used for capital gains is adjusted for losses and stock splits. The sale of an asset occurs when the
manager decides that a security needs to be sold for the benefit of the
portfolio. Capital gains are distributed to shareholders when they are realized
(when a security is sold and a profit is made) and are paid out as capital distributions.
Dividends and capital gains are t treated as separate outputs because
they are incurred, paid out, and taxed f
in different ways. Shareholders can reinvest dividends and capital
distributions if they elect to do so.
Model Development and Validation The decision—making units (DMUs)
in this analysis were the 50 growth
mutual funds. A prerequisite of DEA is homogeneous DMUs. By selecting the
funds from the same sector (growth funds), this requirement is met.
The DEA model uses the dual formulation of the linear program as described by Charnes, Cooper and
Rhodes (1978). The objective function of this form is to minimize the
objective value of the current DM10. If a DM10 is efficient the optimal value is 1, while an inefficient DM10 is driven
towards zero. The first step in the
development of the linked model is the generation of DEA scores for the
mutual funds being studied. The data collected for the mutual funds is used
as the inputs and outputs for the DEA code which is documented in the appendix. The DEA code is written
using the SAS/OR package allowing :he DEA scores to be fed directly into he regression phase of the model.
DEA scores are generated for the ifty mutual funds in the study using
LINKED LINEAR REGRESSION - DEA MODEL
the 1990 data collected from Morningstar Mutual Funds. At this point
DEA scores are also generated for 1993 and stored for use in validating the model's forecasts. The 1990 DEA scores are then regressed against all of the six
DEA factors previously described. This regression model is unique
from previous research in that the goal of the regression is not to create an efficiency score, but to forecast an efficiency score for the future of a particular DIv[LT. Previous work by Thanassoulis (1993) compared the DEA
methodology with linear regression as a means of creating efficiency scores. Our model links the two methods into one prediction tool. The full factor model that resulted appeared on the surface to be a good predictor of efficiency (R2 =
0.8239). The full factor model included three non-significant factors at the 5%
level of significance. Beta (p-value = 0.68), load (p-value = 0.65), and dividend (p-value = 0.50) were all eliminated from the model leaving only
expense ratio, net asset value, and capital gains as significant factors. The revised model remained significant
with an R2 = 0.8213.
As stated before, this model appears to be a good predictor; however, linear
IflI &A A*Iy.I. •_.
S.. I I
•.I 5.7 5.5 •.4 •.5 II 1.2
.,a,.t.. V.,.. - *fl P5(5
Figure 1: Residual vs. Predicted DEA Score
regression models must be validated
through diagnostic tests. One assumption in regression analysis underpins the entire methodology - errors are a randomly distributed variable following a normal distribution with mean equal to zero, N(0,a2). Regression analysis in SAS is based on a linear relationship between the dependent and independent variables.
These assumptions can be verified through the examination of the residuals, or errors, of the regression model. A plot of the residuals versus the predicted values, Figure 1, reveals a distinct pattern indicating a non-linear
relationship between the independent and dependent variables. This requires the examination of univariate statistics
7
LINKED LINEAR REGRESSION — DEA MODEL
for each of the variables in the regression model. A standard normal quantile plot for each of the variables allows the determination of the distribution of each variable.
From the univariate procedures, some of the variables demonstrated an
exponential distribution requiring a log—linear transformation in order to validate the regression model. Net asset value, capital gains, dividends and the actual DEA score follow exponential distributions while beta, load, and expense ratio are all normally distributed. A second data set of the inputs and outputs was created by
taking the natural log of each of the exponentially distributed variables. Once normality was confirmed
through univarjate procedures, a new full factor linear regression model was fitted. The newly formed model resulted in an R2 = 0.8163 and was validated through the residual plot shown in Figure 2. Using the SAS regression report, the regression equation was created using the
coefficients of the significant factors:
beta, expense ratio, net asset value, and dividend.
The next step is the creation of predicted efficiency scores. The
common method for forecast model verification is the prediction of the
past. The linked model is verified by using the input and output data for
1993 in the regression equation created from the 1990 data. The predicted
values are then compared to the actual 1993 DEA scores generated earlier by the DEA code. The forecast errors are evaluated using a significance test on the mean difference. The test is
conducted at the 1% level of significance. H0: mean difference = 0; Ha: mean difference # 0. The resulting test statistic: z = -0.5164 and critical value, z = 2.57 leads to the conclusion that the prediction values are not
significantly different from the actual
DEA scores with 99% confidence.
•
II,:
•I
,
• .i — •.S P.O AS PfldItt*O V.,.. It LOLA POLO
Figure 2: Residual vs. Predicted DEA Score—after linear traa
8
LINKED LINEAR REGRESSION - DEA MODEL
Fund DEA 1990 Predicted 1993 DEA 1993 Error
FTRNX 1.00000 1.00000 0.62320 -0.37680
PRWAX 1.00000 0.78983 0.29415 -0.49568
G1NLX 1.00000 0.33207 0.23691 -0.09516
ACRNX 1.00000 1.00000 1.00000 0.00000
FDETX 0.41944 0.98900 1.00000 0.01100
CFIMX 0.73415 0.98032 1.00000 0.01968
LOMCX 0.32965 0.68927 1.00000 0.31073Table 1: Efficient MUtUI t-unas
Examination of Results
In analyzing the results of the sample studied, it is important to remember that a mutual fund is mainly used as a long-term investment, providing stable low-risk growth. A
potential shareholder wants to invest in a fund that will be efficient in the future, regardless of its current efficiency. Therefore we focus our examination on the set of funds with efficient scores in either 1990 or 1993.
Table 1 shows the 1990 DEA, Predicted
1993, and actual 1993 DEA scores along with the errors for each prediction (Error = 1993 DEA - Predicted 1993).
Of the first four funds listed (all receiving efficient ratings in 1990), the
future efficiency of three funds was correctly predicted. Our linked model
predicted future behavior of currently efficient funds with 75% accuracy. Although the error for T. Rowe Price New America Fund, PRWAX, is high, we are not concerned with the magnitude of inefficiency, only the
state of its efficiency. Of the last four funds in the table
(all receiving efficient ratings in 1993), three funds were predicted accurately. The fifth and sixth funds, Fidelity Destiny Portfolios: Destiny II, FDETX, and Clipper Fund, CFIMX, both
received a predicted score of approximately 0.98 for 1993. Based on the previous significance test, these values are not significantly different
from 1.00. Thus our model predicts the funds that will be efficient in the future
with 75% accuracy.
LINKED LINEAR REGRESSION - DEA MODEL
Table 2: Investment Simulation Results
Investment Performance Since investors are concerned with
the long-term growth and appreciation of their investments, DEA by itself is not an effective tool to aid in the
development of investment strategies.
The percent of appreciation we
received in the portfolio of 1990 efficient mutual funds was 46.92%, while the appreciation was significantly higher for the portfolio of funds recommended by the linked
However, combined with linear model at 60.41%. regression, DEA gives us the ability to estimate the long-term efficiency of a fund. To illustrate the effectiveness of the linked model relative to DEA alone, we simulated an investment in the efficient funds in 1990 and the funds
predicted to be efficient by our linked model. Since mutual funds allow the purchase of fractional shares, we invested $1000 in each of the mutual
funds. The results are given in Table 2.
10
LINKED LINEAR REGRESSION - DEA MODEL
Conclusion The linked model allows us to
broaden the range of applications for DEA. The linked model enables the
prediction of future efficiency, which is of great benefit to investors because it
provides more insight into the long-term viability of a mutual fund. The model also requires little historical data
for each DM0 unlike other forecasting methods. Another benefit is the relative
compactness of the model; very few predictors are required to explain a high degree of variability and achieve
accurate predictions. Data envelopment analysis by itself is not an accurate predictor of future efficiency of mutual funds. Only one fund that
was efficient in 1990 remained in the efficient set in 1993. Linear regression
and DEA combine to give us an accurate prediction for the future
behavior of DMUs.
An immediate extension to this
research is the definition of an "ideal
type" mutual fund. An optimization algorithm will be used to generate values for the regression factors and
record the combinations which yield and efficient fund prediction. Once this
benchmark is defined a growth mutual fund prospectus can simply be visually inspected to determine managerial efficiency. The next phase of this research would be to compare the accuracy of the linked model with that
of other forecasting methods, such as time series decomposition. Another step in the continued development of this area of research is to improve the accuracy of the predictions through
further modifications to the regression model using a weighted regression
model or the inclusion of adaptive
error control.
11
LINKED LINEAR REGRESSION - DEA MODEL
References
Barr, R. S., L. M. Seiford, and T. F. Siems (1991), "An Envelopment Analysis Approach to Measuring the Management Quality of Banks," The Annals of Operations Research. J.C. Baltzer AG, Science Publishers.
Dilorio, F. C. (1991), SAS Applications Programming: a Gentle Introduction. Boston: PWS-KENT Publishing Company.
Flatt, L. G. (1994), "FUNDamentals: An Envelopment-Analysis Approach to Measuring the Managerial Quality of Mutual Funds," technical report. Dallas, Texas: Southern Methodist University, Department of Computer Science and Engineering.
Morningstar Mutual Funds User's Guide (1992), Chicago: Morningstar, Inc.
Morningstar Mutual Funds (1993), Chicago: Morningstar, Inc.
Norman, M. and B. Stoker (1991), Data Envelopment Analysis: The Assessment of Performance. New York: John Wiley & Sons.
SAS Institute Inc. (1989), SAS/OR User's Guide, Version 6, First Edition. Cary, NC: SAS Institute Inc.
Sexton, T. R., S. Sleeper, and R. E. Taggart, Jr. (1994), "Improving Pupil Transportation in North Carolina," Interfaces, January-February 1994. Providence, RI: The Institute of Management Sciences.
Thanassoulis, E. (1993), "A Comparison of Regression Analysis and Data Envelopment Analysis as Alternative Methods for Performance Assessments," O.R. Journal of the Operations Research Society. Coventry: Operational Research Society Ltd.
12
LINKED LINEAR REGRESSION - DEA MODEL
Professional References
Barr, Richard S., Ph.D. Associate Professor, School of Engineering and Applied Sciences. Southern Methodist University, Dallas, Texas.
Dula, Jose, Ph.D. Assistant Professor, School of Engineering and Applied Sciences. Southern Methodist University, Dallas, Texas.
Guerra, Rudy, Ph.D. Assistant Professor, Department of Statistical Science. Southern Methodist University, Dallas, Texas.
Welch, David. Senior Systems Analyst II, Bradfield Computer Center. Southern Methodist University, Dallas, Texas.
13
LINKED LINEAR REGRESSION - DEA MODEL
Appendix
DEA Formulation Mat hematical formuation of the DEA model used in the Linked Model.
DEA Code The DEA code is developed using Release 6.0 of the SAS System on an IBM 3090 mainframe. The code presented requires only two input files . One file contains the input and output factor data while the second file contains a listing of the mutualfrnd ticker symbols. The program is construct ed from a series of SAS Macros which fully automate the formulation and solution of the multiple linear programs required in Data Envelopment Analysis.
Forecasting Code Developed as an extension to the DEA code, the forecasting model uses the same computing platform and software. The program reads the same data files as the DEA code as well as the DEA scores generated by the DEA study, The program generates predicted efficiency scores and all necessary statistical procedures for validating the model.
14
LINKED LINEAR REGRESSION - DEA MODEL
DEA Formulation
A
[X
Inputs
10outputs
+ Slacks =
e
-
A.
Min E)
A-A:!^O A7
e^!o
Mat heinaticalformulation of DEA requires that the inputs and out puts for each DMU are formated in a matrix, A The matrix is multiplied by a vector, A,to construct the linear programming constraints.
Minimize 0
Subject to:
Inputs Beta: X,DMUl+XDMU2+... +xODMU5O-x51(0i'l):5OiO
Load: xIDMUI+xDMU2+... +xo'DMU50-X5(0i1)!90O
E_Pafio: xiDMUi +x2DMU2+... +x5oDMUso-x51(0i l):50 O
Output$ NAV: X1DMUI+XDMU2+... +X5ODMU5O-X51(00) 011
Cap_Gain: xrDMUr +x2 DMU2+ ... +x5ODMU5Ox5(0rO) 0i • 1
Dividend: xrDMUr +)(2 DMU2 + ... + X50 DMUso - X51 (0 0) 1
All Variables 2: 0
Mere a Is current ciecislon-maldng unit (DMU)
Linear program solved by SAS for each DM11.
15
LINKED LINEAR REGRESSION —DEA MODEL
DEA Code Part 1: Initialization
This section reads the data files into a SAS dataset. Using the SAS Data Management Language datasets are created for the right hand side vectors, as well as the theta vectors. The two datasets are perfectly matched - each right hand side has exactly one corresponding theta vector. There is one right hand side and one theta vector for each DMU in the study. These vectors are merged with the inputs and outputs to formulate the linear program. The final step in the initialization is the renaming of variables to the proper ticker symbols in order to aid in report interpretation.
MICHAEL MCGEE SENIOR DESIGN GROUP '94 DEA ANALYZER
FILENAME NDT MENAME CTAA';
FILENAME DAflAF9O DATAA'; OPTIONS UNESIZE.72;
%LET SYMBOLS = ACRNX LOMCX HRTVX STCSX BEONX GINIX PARNX VCAGX DEVLXAGBBX
FPPTX FDE1X FDESI( FCN1( FBGRX FDVLX SRFCX FRGRX BRW1X SAMYX FDFFX FAGOX MLSVXALTFX ANEFX GPAFX PRWAX CGRWX OVOPX WTMGX
NYVTX SEOGX BVALX BARAXAFGPX FTRNXAMRGX DNLDX HCRPX NBSSX PRGM( INNSI( TWHIX DWM( LMITX JHNX SRSPX CFIMX cWBX BOSSX
DATA TICK; INFILENDT; ARRAY TICKR(50) $5;
INPUT TICKRr);
r MUTUAL FUND TICKER SYMBOLS
DATA ONE; INFILE DAT; ARRAY FUND (50)
INPUT JD_$ FUND() _TYPE_ $ _RHS.;
r GENERATE THETA COLUMNS 1
DATA NEW-COL; SET ONE; ARRAY THETA (SO);
ARRAY FUND(50);
DO I ITO W. IF N_ = I THEN THETA(I) I;
ELSE IF_N== 2 &_N_ <--4 THEN THETA(I .FUNO(I);
ELSETHETA(I) =0;
END DROP I;
P GENERATE RIGHT HAND SIDE VECTORS 1
DATA RANGE; ARRAY RH(50); ARRAY FUND(50);
SET ONE; DO I = ITO SE; IF _N_ = I THEN RH(I) MISSING;
ELSE IF _N_ 4 THEN RH(I)= FUND([),
ELSE RH(I) 0;
END; DROP MISSING I FUNDI.FUND50;
DATA PARAM; MERGE NEW_COL RANGE;
P RENAMES THE FUNDSTO THE PROPER NASDAQTICKER SYMBOL!
PROC DATASETS;
MODIFY ONE; RENAME FUNDI =ACRNX
FUND2=LOMCX
FUND3= HRTVX FUND4 STCSX FUNDS=BEONX FUNDO=GINLX
FUNDT=PARNX FUND8=VCAGA FUND9=DEVLC FUNDIO=AGBBX
FUNDII= FPPIX FUNDI2= FDEIX
16
LINKED LINEAR REGRESSION - DEA MODEL
FUNDI3 FDESX FUND14 FCNTX FUNDI5= FBGFO( FUND16 FDVLX
FUND17 SRFCX FUNDI8 FRGRX
FUNDI9 BRW1X FUND20SAMVX
FUND21 FDFFX FUND22 FAGOX FUND23=MLSVX FUND24.ALTFX FUND26ANEFX
EUND26 GPAFX FUND27. PRWAX
FUND28=CGRWX
FUND29. QVOPX FUND30.WTMGX FUND3I NWTX FUND SEX3X
FUND33 BVALX FUND34BARAX FUND35=AFOPX FUND36= FTRNX FUND37AMRGX
FUND38= DNLDX
FUND39 HCRPX FUND40=NBSSX FUND41 PRGY( FUND42= INNDX
FUND43 TWHIX FUND44=DWIVX FUND45=LMVTX FUND46=JHNGX FUND47 SRSPX FUND48CFX FUND49 CLMBX
FUND5O= BOSSX;
Part 2: DEA Macro This macro accepts one parameter:
the index of the DMIJ to evaluate. The macro merges the input/output dataset with the appropriate theta and right hand side vectors. This dataset now contains the LP formulation for SAS to analyze. The procedure generates detailed printed reports for each of the
NAME DEA DEC: AUTOMATES THE CREATION OF PROBLEM DATASET.
MERGES THE CURRENT THETAAND RHS INTO THE COEFFICIENT MATRIX AND EXECUTES THE 1?,
PAI*A: NDMU -THE INDEX NUMBER OF THE DMU TO BE ANALYZED
%MACRO DEA(NOMU);
DATA —NULL—; ARRAY TICKR(50) $S; SEE TICK: CALL S'PUT (FUND)(',PUT(TICKR{&NDMU}$S.));
RUN;
DATA CUR—U; ARRAY THETA(50);
ARRAY RH{5O}; IMIDMU; SET PARAM; CTHETA THETA(I);
RHS_ RH{I}; KEEP CTHETA_RHS_;
DATA PROBLEM; MERGE ONE CUR_DMU;
TIELEI DATA ENVELOPMENT ANALYSIS'; TITLE2 CURREN1 E(AU= &FUNDX';
P EXECUTE THE LP' / PROC LP DATAPROBLEM POUT a &FUNDX;
%MEND DEA
17
LINKED LINEAR REGRESSION - DEA MODEL
Part 3: Loop Macro ' In order to iterate the execution of a
SAS procedure without repetitive statements a macro was written to loop around the calls to the DEA macro. The loop macro accepts one parameter: the number of DMUs in study.
Part 4: Extract Macro In addition to the written reports
generated by the DEA macro, datasets were created in memory containing the results of the LPs. This macro combines the reports for each of the DMUs and extracts the DEA scores into a dataset which is then fed to the forecasting model.
NAME: LOOP DEM. ITERATIVELY CALLS THE DEA MACRO PA4: LIMIT - NUMBER OF OMUS TO BE PROCESSED
%MACRO LOOP(LIMIT) %DO INDEX I %TO &LIIrr;
%DEA(&INDEX); %ENE;
%MEND LOOE;
NAME: EXTRACT DEW. EXTRACTS THE OBJECT WE VALUES FROM THE
INDIVIDUAL FUND REPORTS AND CREATES DATASET CONTAINING THE IN FORMATION
%MACRO EXTRACT;
P MERGE LP OUTPUT DATASETS AND REMOVE UNIMPORtANT VARIABLES V
DATA REEl; SEE &SYMBOLS; KEEP _VAR_ _VALUE.
P EXTRACT OBJECTIVE FUNCTION VALUES i
DATA RES2;
SEE RESI; IF _VAft. OBJECr THEN SCORE = -VALUE-;
ELSE DELETE; KEEP SCORE
P GENERATE LISTING OF MUTUAL FUNDS DATA RESE; SEE TICK;
ARRAY TIOKR(50) $5; DO I ITO 50 PCHANGE THIS TO 50 IN PRODUCTION RUN 1
NAME TICKR(I); OUTPUT;
END;
KEEP NAME;
P MERGE CNTASEI' INTO ANAL FORM!
DATA RESULTS;
MERGE RESE RESE;
PROC PRINT DATA=RESULTS;
WEND EXTRACT;
rEXECUTABLE CODE
%LOOP(50); %EXTRACT;
18
LINKED LINEAR REGRESSION - DEA MODEL
Forcasting Code
P Mutual Fund Data Files 1
filename dati 'mfs93 data a'; filename dat2 'mfname data a'; filename dat5 'mfs90 data a';
P DEA Score Data Files 1
filename dat3 'score93 data a'; filename dat4 'score90 data a';
option linesize 72 nonumber nodate;
P Read 1993 Mutual Fund Data */ data one;
mule dati; array val(50); input name $ val{};
proc transpose; P Transpose Matrix: from lO*DMU to DMUIO 1
run;
proc datasets; P Assign proper names to lOs 1
modify data 1; rename colt =Beta
co12 = Load co13 = E_Rabe col4=NAV col5 = Cap-Gain co16 Dividend
_Name_= Fund;
data funds; P Create List of Fund Names / infile dat2; input Fund $;
data scr93; r Read 1993 DEA scores from file 1
infile dat3; input dea93;
tie1 '1993 DEA Analysis';
data comp93; set datal; merge datal funds scr93;
/ Run All possible regression models 1
proc reg; tle2 'All Possible Regression Models';
model dea93=Beta Load E_Ratio NAV Cap-Gain Dividend/ selecben=Rsquare;
/* Full Fit Model */ proc rag;
title2'Full Model'; model dea93=Beta Load E_Ratio NAV Cap-Gain Dividend;
proc rag; tle2 'Minimum Spec Model;
model dea93 = E_Ratio NAV CAP_GAIN;
/* 1990 Data Analysis */
data info9o; infile dat5; array val(50}; input name $ val{};
proc transpose; P Transpose Matrix from IODMU to DMUIO 1
run;
proc datasets; P Assign proper names to lOs modify data2; rename coil = Beta
col2 Load co13 E_Rabo col4=NAV col5 = Cap-Gain coI6 Dividend
_Name_= Fund;
data scr9o; infile dat4; input dea90;
data comp90; merge data2 funds scr90;
tiflel 1990 DEA Analysis';
proc reg data=comp9o; tille2 'All Possible Regressions'; model dea9o= Beta Load E_Ratio NAV Cap-Gain Dividend
/selection rsquare;
proc reg data=cornp9o; tifle2 'Full Model'; model dea90= Beta Load E_Ratio NAV Cap-Gain Dividend;
proc rag data=comp9o; tide2 'Minimum Spec'; model dea90= E_Ratio NAV CAP-GAIN; plot residual.(predicted. E_Ratio NAV Cap-gain);
/* Forecasting Validation Model 1
P Create predicted values using forecast model compare to actual 1993 values.
19
LINKED LINEAR REGRESSION - DEA MODEL
data predict; model Ldea = Beta Load E_Ratio LNAV LCG LDiv set datal; plot residuat(predicted.); Exp_dea=-0.177220E_Rao + 0.016477NAV +
0.152344Cap_Gain+0.279313; proc reg data = linear; if Exp_dea> 1 then Exp_dea = 1; model Idea = Beta E_Rao LNAV LDIV;
data forecast; merge predict funds scr93 scr9o; / Linear Regression Equation / Err = dea93 - exp_dea; data predict2; keep fund dea9O exp_dea dea93 err; set trans93;
PDEA 0.6874598eta - 0.525658E_Ratio + 0.753563LNAV + O.229900LD1V -2.930810;
Exp_dea = exp(PDEA); tine 'Regression Predictions'; if Exp_dea> I then Exp_dea = 1; tiUe2'Non-Linear Model'; proc pnnt data=forecast; / Compute Forecast Errors
var fund dea90 dec93 exp_dea err; data frcs; merge predict2 funds scr93 scr90;
proc plot; Err = dea93 - exp_dea; plot exp_dea * dea93; keep fund dea90 exp_dea dea93 err;
/ added 4/18/94'1 P Plot 93 Prediction vs 90 Actual data fcas title 'Regression Predictions';
merge scr90 forecast; tille2'Linear Model'; keep exp_dea dea90 dea93 fund; proc print data=frcst2;
var fund dea90 dea93 exp_dea err;
tillel 'Predicted 93 vs. Actual 1990'; tiUe2 'Non-Linear Model'; proc plot data=fcas
plot exp_dea * dea90;
P Linear transform to improve predictions data linear;
set comp90; Ldea = log(dea90); LNAV = log(NAV); it Cap—Gain = 0 then LCG = 0;
else LCG log(Cap_Gain); LEA log(E_Ratio); if Dividend = 0 then LDIV = 0;
else LDIV = LOG(Dividend);
P Transform forecast inputs to linear model /
data trans93; set comp93; Ldea log(dea93); LNAV = log(NAV); if Cap—Gain = 0 then LCG = 0;
else LCG = log(Cap_Gain); LER = log(E_Ratio); if Dividend = 0 then LDIV = 0;
else LDIV = LOG(Dividend);
Title 'Linear Forecast Model'; title2 'Model Verification'; proc rag data = linear;
20
Case Study: A Linked Linear Regression -
Data Envelopment Analysis Model for the Long-Term Prediction of
Mutual Fund Performance
Cindy Chao, Greg Higgins Michael McGee
Department of Computer Science and Engineering SOUTHERN METHODIST UNIVERSITY
Dallas, Texas
May 14, 1994
C
LINKED LINEAR REGRESSION - DEA MODEL
Contents
Abstract.............................................................................................................................3
Introduction...................................................................................................................... 3
Whatis a Mutual Fund? .................................................................................................4
DEAFactors......................................................................................................................4
Model Development and Validation............................................................................. 6
Examinationof Results ...................................................................................................9
InvestmentPerformance............................................................................................... 10
Conclusion...................................................................................................................... 11
References....................................................................................................................... 12
ProfessionalReferences................................................................................................. 13
Appendix........................................................................................................................ 14
DEAFormulation .............................................................................................. 15
DEACode ........................................................................................................... 16
ForcastingCode ................................................................................................. 19
2
Case Study: A Linked Linear Regression - Data Envelopment Analysis Model for the Long-Term Prediction of Mutual Fund Performance
Michael McGee Greg Higgins Cindy Chao Department of Computer Science and Engineering Southern Methodist University Dallas, Texas 75275
Data Envelopment Analysis (DEA) can be used to evaluate managerial efficiency of mutual funds. By extending the DEA methodology with linear regression, a new prediction tool for future efficiency is developed. The linked model was tested on a sample of fifty growth mutual funds over a three year time horizon - resulting in accurate predictions of future efficiency with 99% confidence.
The current mutual fund market current assessment of a fund's contains nearly $1.4 trillion management. DEA is a linear
dollars; this rapid growth of mutual funds as a long-term investment instrument has led to an increased demand for evaluation and forecasting techniques. The performance of a mutual fund is directly linked to the performance of the management staff. Data Envelopment Analysis (DEA) can be used to evaluate managerial
programming methodology that computes relative efficiency scores among a group of homogeneous decision-making units (DMU5). Combined with linear regression, DEA becomes a prediction tool for long-term forecasting. The linked model not only evaluates return on investment but also the quality of the fund's
efficiency; however, this is only a managerial team on a long-term basis.
'3
LINKED LINEAR REGRESSION - DEA MODEL
What is a Mutual Fund? A mutual fund is an investment
company that pools the money of many institutional and individual investors into one professionally managed investment account. The fund manager creates a portfolio for the shareholders by buying a variety of "true" financial instruments, such as
corporate stocks and bonds, and government and municipal securities, in accordance with the fund's particular investment objectives. Fund managers are professional investors who know the financial market and follow it daily with extensive research capabilities to effectively investigate all
investment opportunities. The fund is always monitored, with individual securities constantly being bought, sold, and traded in order to optimize
the investment objectives. The beauty of investing in sometimes twenty
goal of the linked model is forecasting of long-term performance, growth mutual funds were selected because their objectives emphasize the accumulation of capital gains, and they have the greatest long-term growth potential.
DEA Factors The inputs and outputs of the
envelopment analysis were made up of the main characteristics of a mutual fund. They were selected because they are important factors to investors and are directly linked to the performance
of the fund manager. The inputs selected were beta - a measure of risk,
expense ratio, and loads & other fees. The outputs used were net asset value (NAV) per share, dividends, and capital gains.
Inputs Beta is a measure of the fund
different securities lies in the protection portfolio's average volatility, or risk,
of the investment through portfolio relative to the S&P 500. The beta value
diversification. Mutual funds are divided into six
major categories according to their investment objectives, which are outlined in the fund's prospectus, such as goals, level of risk, and the type and frequency of return desired. Since the
of the market is a constant equal to 1.00. A fund with a beta value greater than 1.00 is more volatile than the market. It is expected to outperform the "average fund" in a bull (rising) market and bring in greater returns, while it is expected to fall more than
4
LINKED LINEAR REGRESSION - DEA MODEL
the average fund in times of a bear brokerage costs, and transaction fees. (declining) market. A fund with a beta This value is a percentage of the value less than 1.00 possesses the purchase price taken off either before characteristics of the converse the money is invested (front-end load) situation. For example, a portfolio with a beta value of 0.75 is expected to be 25% less responsive to the general
or after investment when the shareholder redeems the shares (back-end load). The amount of the load is
economic trend than the average fund established by the managerial company, and return 25% more conservative than but is paid to the sales and transaction the average fund. team. Some funds have a load of 0.00%,
Ll
Expense ratio is a measure which relates expenses to the average net assets for a period. It is calculated by taking the annual expenses paid by the fund (including management and
investment advisory fees, custodial and transfer agency charges, legal fees, and distribution costs) divided by the
average shares outstanding for the period. The greater the expense ratio, the larger the percentage of the shareholder's investment gets paid out directly to the management company. The expense ratio does not include loads or commissions paid for sales or transaction fees.
Loads and other fees represent all
other charges made to the shareholder which are not accounted for in the expense ratio. A load is a portion of the offering price of the fund that pays for
costs such as sales commissions,
meaning that they have the same benefits as load funds, except they are sold directly by the fund company rather than through a sales broker so that no loads or other fees are incurred.
With no-load funds, the investor must seek out and contact the investment firm directly. There is a distinct advantage to investing in no-load
funds. For example, if we had invested $1000 in a particular portfolio with a 7.50% front-end load, $75 would be taken off the top, and we would be left with only $925 worth of shares in the fund. Had this been a no-load fund, the full $1000 would have been utilized.
Outputs Net asset value per share represents
how much each share is worth. It is calculated by taking the total market value of a fund's shares (securities plus
cash plus accrued earnings minus
LINKED LINEAR REGRESSION - DEA MODEL
liabilities) divided by the number of
shares outstanding. No profits or losses are realized through NAV until the fund is sold. Interest lies in the amount
of increase or decrease in NAV from
time of purchase to time of sale. Dividends are payments declared
by a corporation's board of directors and made to shareholders, usually on a quarterly basis. A fund's dividends come directly from the dividends paid out by the individual securities within
in different ways. Shareholders can
reinvest dividends and capital distributions if they elect to do so.
Model Development and Validation The decision—making units (DMUs)
in this analysis were the 50 growth mutual funds. A prerequisite of DEA is homogeneous DMUs. By selecting the funds from the same sector (growth funds), this requirement is met.
The DEA model uses the dual
the portfolio. They are paid out equally formulation of the linear program as
among the shareholders on a scheduled time frame.
Capital gains on a mutual fund are
the profits made from the sale of a capital asset (such as a security, stock, or bond) within the portfolio. The number used for capital gains is
adjusted for losses and stock splits. The sale of an asset occurs when the manager decides that a security needs to be sold for the benefit of the
described by Charnes, Cooper and Rhodes (1978). The objective function of this form is to minimize the
objective value of the current DMU. If a DMU is efficient the optimal value is 1, while an inefficient DMU is driven towards zero. The first step in the development of the linked model is the generation of DEA scores for the mutual funds being studied. The data collected for the mutual funds is used
portfolio. Capital gains are distributed as the inputs and outputs for the DEA to shareholders when they are realized code which is documented in the
(when a security is sold and a profit is made) and are paid out as capital distributions.
Dividends and capital gains are treated as separate outputs because they are incurred, paid out, and taxed
appendix. The DEA code is written
using the SAS/OR package allowing the DEA scores to be fed directly into the regression phase of the model.
DEA scores are generated for the fifty mutual funds in the study using
LINKED LINEAR REGRESSION - DEA MODEL
n
.
S
S
S
the 1990 data collected from Morningstar Mutual Funds. At this point
DEA scores are also generated for 1993 and stored for use in validating the model's forecasts. The 1990 DEA scores are then regressed against all of the six
DEA factors previously described. This regression model is unique
from previous research in that the goal of the regression is not to create an efficiency score, but to forecast an efficiency score for the future of a particular DMtJ. Previous work by Thanassoulis (1993) compared the DEA methodology with linear regression as a means of creating efficiency scores. Our model links the two methods into one prediction tool. The full factor model that resulted appeared on the surface to be a good predictor of efficiency (R2 =
0.8239). The full factor model included three non-significant factors at the 5%
level of significance. Beta (p-value = 0.68), load (p-value = 0.65), and dividend (p-value = 0.50) were all eliminated from the model leaving only expense ratio, net asset value, and capital gains as significant factors. The revised model remained significant
with an R2 = 0.8213.
As stated before, this model appears to be a good predictor; however, linear
fl fLAW flED
Figure 1: Residual vs. Predicted DEA Score
regression models must be validated
through diagnostic tests. One assumption in regression analysis underpins the entire methodology - errors are a randomly distributed
variable following a normal distribution with mean equal to zero, N(0,(Y). Regression analysis in SAS is based on a linear relationship between the dependent and independent variables. These assumptions can be verified through the examination of the residuals, or errors, of the regression model. A plot of the residuals versus the predicted values, Figure 1, reveals a distinct pattern indicating a non-linear
relationship between the independent
and dependent variables. This requires the examination of univariate statistics
S
7
0
LINKED LINEAR REGRESSION - DEA MODEL
S
.
S
S
S
S
for each of the variables in the regression model. A standard normal quantile plot for each of the variables allows the determination of the
distribution of each variable. From the univariate procedures,
some of the variables demonstrated an
exponential distribution requiring a log—linear transformation in order to validate the regression model. Net asset value, capital gains, dividends and the actual DEA score follow exponential distributions while beta, load, and expense ratio are all normally distributed. A second data set of the
inputs and outputs was created by
taking the natural log of each of the exponentially distributed variables. Once normality was confirmed
through univariate procedures, a new full factor linear regression model was fitted. The newly formed model resulted in an R2 = 0.8163 and was
validated through the residual plot shown in Figure 2. Using the SAS
regression report, the regression equation was created using the coefficients of the significant factors: beta, expense ratio, net asset value, and dividend.
The next step is the creation of
predicted efficiency scores. The
Figure 2: Residual vs. Predicted DEA Score—after linear transform
common method for forecast model verification is the prediction of the
past. The linked model is verified by
using the input and output data for 1993 in the regression equation created from the 1990 data. The predicted values are then compared to the actual 1993 DEA scores generated earlier by the DEA code. The forecast errors are evaluated using a significance test on
the mean difference. The test is conducted at the 1% level of significance. H0: mean difference = 0;
Ha : mean difference # 0. The resulting test statistic: z = -0.5164 and critical value, z = 2.57 leads to the conclusion that the prediction values are not
significantly different from the actual DEA scores with 99% confidence.
8
0
LINKED LINEAR REGRESSION - DEA MODEL
Fund flEA 1990 Predicted 1993 flEA 1993 Error FTRNX 1.00000 1.00000 0.62320 -0.37680
PRWAX 1.00000 0.78983 0.29415 -0.49568
GINLX 1.00000 0.33207 0.23691 -0.09516
ACRNX 1.00000 1.00000 1.00000 0.00000
FDETX 0.41944 0.98900 1.00000 0.01100
CFIMX 0.73415 0.98032 1.00000 0.01968
LOMCX 0.32965 0.68927 1.00000 10.31073Table 1: Efficient Mutual Funds
Examination of Results In analyzing the results of the
sample studied, it is important to remember that a mutual fund is mainly used as a long-term investment,
providing stable low-risk growth. A potential shareholder wants to invest in a fund that will be efficient in the future, regardless of its current efficiency. Therefore we focus our examination on the set of funds with efficient scores in either 1990 or 1993. Table 1 shows the 1990 DEA, Predicted
1993, and actual 1993 DEA scores along with the errors for each prediction (Error = 1993 DEA - Predicted 1993).
Of the first four funds listed (all receiving efficient ratings in 1990), the future efficiency of three funds was correctly predicted. Our linked model
predicted future behavior of currently efficient funds with 75% accuracy. Although the error for T. Rowe Price New America Fund, PRWAX, is high, we are not concerned with the
magnitude of inefficiency, only the
state of its efficiency. Of the last four funds in the table
(all receiving efficient ratings in 1993),
three funds were predicted accurately. The fifth and sixth funds, Fidelity Destiny Portfolios: Destiny II, FDETX, and Clipper Fund, CFIMX, both
received' a predicted score of approximately 0.98 for 1993. Based on the previous significance test, these values are not significantly different from 1.00. Thus our model predicts the funds that will be efficient in the future with 75% accuracy.
LINKED LINEAR REGRESSION - DEA MODEL
Table 2: Investment Simulation Results
Investment Performance Since investors are concerned with
the long-term growth and appreciation of their investments, DEA by itself is not an effective tool to aid in the development of investment strategies. However, combined with linear regression, DEA gives us the ability to estimate the long-term efficiency of a
fund. To illustrate the effectiveness of the linked model relative to DEA alone, we simulated an investment in the efficient funds in 1990 and the funds
predicted to be efficient by our linked model. Since mutual funds allow the purchase of fractional shares, we invested $1000 in each of the mutual funds. The results are given in Table 2.
The percent of appreciation we
received in the portfolio of 1990 efficient mutual funds was 46.92%, while the appreciation was significantly higher for the portfolio of
funds recommended by the linked model at 60.41%.
10
LINKED LINEAR REGRESSION - DEA MODEL
Conclusion An immediate extension to this
Q
The linked model allows us to broaden the range of applications for DEA. The linked model enables the
prediction of future efficiency, which is of great benefit to investors because it provides more insight into the long-term viability of a mutual fund. The model also requires little historical data for each DMIJ unlike other forecasting methods. Another benefit is the relative compactness of the model; very few predictors are required to explain a high degree of variability and achieve accurate predictions. Data envelopment analysis by itself is not an accurate predictor of future efficiency
of mutual funds. Only one fund that was efficient in 1990 remained in the efficient set in 1993. Linear regression and DEA combine to give us an accurate prediction for the future behavior of DMUs.
research is the definition of an "ideal type" mutual fund. An optimization algorithm will be used to generate
values for the regression factors and record the combinations which yield and efficient fund prediction. Once this benchmark is defined a growth mutual fund prospectus can simply be visually inspected to determine managerial efficiency. The next phase of this research would be to compare the accuracy of the linked model with that of other forecasting methods, such as time series decomposition. Another step in the continued development of this area of research is to improve the
accuracy of the predictions through further modifications to the regression model using a weighted regression model or the inclusion of adaptive error control.
11
LINKED LINEAR REGRESSION - DEA MODEL
References
Barr, R. S., L. M. Seiford, and T. F. Siems (1991), "An Envelopment Analysis Approach to Measuring the Management Quality of Banks," The Annals of Operations Research. J.C. Baltzer AG, Science Publishers.
Dilorio, F. C. (1991), SAS Applications Programming: a Gentle Introduction. Boston: PWS-KENT Publishing Company.
Flatt, L. G. (1994), "FUNDamentals: An Envelopment-Analysis Approach to Measuring the Managerial Quality of Mutual Funds," technical report. Dallas, Texas: Southern Methodist University, Department of Computer Science and Engineering.
Morningstar Mutual Funds User's Guide (1992), Chicago: Morningstar, Inc.
Morningstar Mutual Funds (1993), Chicago: Morningstar, Inc.
Norman, M. and B. Stoker (1991), Data Envelopment Analysis: The Assessment of Performance. New York: John Wiley & Sons.
SAS Institute Inc. (1989), SAS/OR User's Guide, Version 6, First Edition. Cary, NC: SAS Institute Inc.
Sexton, T. R., S. Sleeper, and R. E. Taggart, Jr. (1994), "Improving Pupil Transportation in North Carolina," Interfaces, January-February 1994. Providence, RI: The Institute of Management Sciences.
Thanassoulis, E. (1993), "A Comparison of Regression Analysis and Data Envelopment Analysis as Alternative Methods for Performance Assessments," O.R. Journal of the Operations Research Society. Coventry: Operational Research Society Ltd.
12
LINKED LINEAR REGRESSION - DEA MODEL
[ii Professional References
Barr, Richard S., Ph.D. Associate Professor, School of Engineering and Applied Sciences. Southern Methodist University, Dallas, Texas.
Dula, Jose, Ph.D. Assistant Professor, School of Engineering and Applied Sciences. Southern Methodist University, Dallas, Texas.
Guerra, Rudy, Ph.D. Assistant Professor, Department of Statistical Science. Southern Methodist University, Dallas, Texas.
Welch, David. Senior Systems Analyst II, Bradfield Computer Center. Southern Methodist University, Dallas, Texas.
13
LINKED LINEAR REGRESSION - DEA MODEL
Appendix
DEA Formulation Mat hematicalformuation of the DEA model used in the Linked Model.
DEA Code The DEA code is developed using Release 6.0 of the SAS System on an IBM 3090 mainframe. The code presented requires only two input files. One file contains the input and output factor data while the second file contains a listing of the mutualfund ticker symbols. The program is construct ed from a series of SAS Macros which fully automate the formulation and solution of the multiple linear programs required in Data Envelopment Analysis.
Forecasting Code Developed as an extension to the DEA code, the forecasting model uses the same computing platform and software. The program reads the same data files as the DEA code as well as the DEA scores generated by the DEA study, The program generates predicted efficiency scores and all necessary statistical procedures for validating the model.
14
LINKED LINEAR REGRESSION - DEA MODEL
DEA Formulation
A
Ex0
Inputs = =
-------- ] S
+ Slacks = outputs
A.
Mine
A7-A!^O A7
e^!o
Mat heinatical formulation of DEA requires that the inputs and out puts for each DMU are formated in a matrix, A The matrix is multiplied by a vector, 2,to construct the linear programming constraints.
Minimize 0 Subject to:
Inputs Beta: xi'DMUi +xDMU2+... +xsoDMUso - xs(E 1)<-E 0
Load: xIDMUI+x'DMU2+...+x5oDMU50-x5I(&1):50O
E_Patio: XIDMUI + X2 'DMU2 +... + x5a • DMU50 - Xsi (& 1)--^0 '0
OutputsNAV: xDMU +X'DMU2+... +XDMU50-x51(00) 01
Cap_Gain: xDMUi + X2 • DMU2 + + Xo DMU5O - Xti (0 0) 0 * 1 Dividend: xiDMUi + X2 • DMU2 +... + x50 DMU5O X51 (e 0) 0 1
All Variables ^! 0
where 01$ current declslon-inaldng unit (OMU)
Linear program solved by SAS for each DM11.
F]
15
fl
LINKED LINEAR REGRESSION - DEA MODEL
DEA Code S
Part 1: Initialization MICHAEL MCGEE
SENIOR DESIGN GROUP '94
This section reads the data files into DEA ANALYZER
S
a SAS dataset. Using the SAS Data Management Language datasets are created for the right hand side vectors, as well as the theta vectors. The two
FILENAME NOT MFNAME DATAA'; FILENAME DAT F9O DATA A'; OPTIONS LINESIZE72;
%LEI SYMBOLS ACRNX LOMCX HRTVX STCSX BEONX GINIX PARNX VCAGX DEVLXAGBBX FPPTX FDETX FDESX FCND( FBGRX FDVLX SRFCX FRGRX BIRWIX SAMVX FDFFX FAGOX MLSVX ALTFX ANEFX GPAFX PRWAX CGRWI( OVOPX WTMGX NWTX SEOCX BVALX BARANAFGPX FTRNXAMRGX DNLDX HCRPX NBSSX PRGWI( INNDX TWHIX DW1X LMVTX JHNGX SRSPX CFIMX CLMBX BOSSX
rU
datasets are perfectly matched - each DATA TICK; WILE NOT;
right hand side has exactly one ARMY TICKR(so)ss;
INPUT TICKR?);
corresponding theta vector. There is one . MUTUAL FUND TICKER SYMBOLS •1
right hand side and one theta vector for
S
each DMtJ in the study. These vectors are merged with the inputs and outputs to formulate the linear program. The final step in the initialization is the renaming of variables to the proper ticker symbols in order to aid in report interpretation.
DATA ONE; INFILE DAT; ARRAY FUND (50) INPUT _ID_ $
FUND (} _TYPE_ $ _RHS_;
P GENERATE THETACOLUMNS DATA NEW-COL; SET ONE; ARRAY THETA (SO); ARRAY FUND(5O); DOIITO5 IF _N_ = I THEN THETA(I) = I;
ELSE IF _N_ >=2 & _N_ <=4 THEN THETA{I}= -FUND(I);
ELSE THETA(I) 0; END DROP I;
P GENERATE RIGHT HAND SIDE VECTORS DATA RANGE;
ARRAY RH(50); ARRAY FUND(50);
SET ONE; DO I = ITO 5 IF _N_ I THEN RH(I} = MISSING; ELSE IF _N_>4 THEN RH(I)= FUND{I};
ELSE RH(I) 0; END; DROP MISSING I FUNDI-FUND5O;
DATA PARAM; MERGE NEW-COL RANGE;
P RENAMES THE FUNDS TO THE PROPER NASDAQ TICKER SYMBOL / PROC DATASETS; MODIFY ONE; RENAME FUNDI ACRNX
FUNO2 LOMCX
FUNDS HRT\
FUND4 = STCSX FUNDS = BEONX FUND6 = GINLX FUND7= PARNX FUND8 = VCAGA FUNDS = DEVLC FUNDIO= AGBBX FUNDII= FPPT)( FUNDI2= FDETX
iri
LINKED LINEAR REGRESSION - DEA MODEL
FUNDI3= FDESX FUNDl4 FCN1X FUNDIS= FBGRX FUNDI6= FDVLX FUNDI7= SRFCX FUND118 FRGRX FUNDI9= BRWIX FUND20= SAM'fX FUND2I FDFFX FUND22 FAGOX RJND23= MLSVX FUND24ALTFX FUND25=ANEFX FUND26= GPAFX FUND27= PRWAX FUND28 CGRY( FUND29 QVOPX FUND30= WTMGX FUND3I= NYVTX FUND32 SEcGX FUND33 BVALX FUND34= BARAX FUND35=AFGPX FUND36 FTRNX FUND37 AMRGX FUND38DNLDX FUND39= HCRPX FUND40=NBSSX FUND4I= PRGWX FUND42 INNDX FUND431WHD( FUND44 DWIvX FUND45= LMVD( FUND46= JHNGX FUND47= SRSPX FUND48= CFX FUND49= CLMB)( FUNDSO. BOSSX;
Part 2: DEA Macro 'NAME: DEA
This macro accepts one parameter:DESC: AUTOMATES THE CREATION OF PROBLEM DATASET.
MERGES THE CURRENT THETAAND RHS INTOTHE COEFFICIENT MATRIXAND EXECUTESTHE LP.
the index of the DM1.1 to evaluate. The PAPM:NDMU-THEINDEXNUMBEROFTHEDMUTOBEANALYZED •1
macro merges the input/output dataset %MACRODEA(NDMU); DATA _NULL_;
with the appropriate theta and right ARMY TICKR(So)ss; SErTK;
hand side vectors. This dataset nowCALL SYMPUT ('FUNDX',PUr(TICKR(&NDMU},SS.)); RUN;
contains the LP formulation for SAS to DATA CURU;
analyze. The procedure generatesARRAY THETA{50}; ARRAYRH(5O); h&NDMU;
detailed printed reports for each of the SET PARM; CTHEIA=A TH ErA{I};
Dtv[(Js._RHS_ = RH{I}; KEEP CTHETA_RHS_;
DATA PROBLEM; MERGE ONE CU R_DMU;
TITLEI 'DATA ENVELOPMENT ANALYSIS'; TITLE2 CURRENT OMU= &FUNDX';
/ EXECUTETHE LP '/ PROC LP DATA=PROBLEM POUT &FUNDX;
%MEND DEA;
17
LINKED LINEAR REGRESSION - DEA MODEL
Part 3: Loop Macro PNAME LOOP
In order to iterate the execution of a DESC: ITERATIVELY CALLS THE DEA MACRO PAPM: LIMIT • NUMBER OFDMU5TOBEPROCESSED
SAS procedure without repetitive%WCROLOOLIM
statements a macro was written to loop %DO INDEX I%TO8LAIIT; %DEA(AJNDEX);
around the calls to the DEA macro. The%EN %MENDLOO
loop macro accepts one parameter: the number of DMUs in study.
Part 4: Extract Macro "NAME EXTRACT
In addition to the written reportsDESC: EXTRACTS THE OBJECTIVE VALUES FROM THE
INDIVIDUAL FUND REPORTSANDCREATESA DATASET CONTAINING THE INFORMATION
generated by the DEA macro, datasets were created in memory containing the
-/.MACRO EXTRACT;
P MERGE LP OUTPUT DATASETSAND REMOVE UNIMPORTANT VARIABLES 'I
results of the LPs. This macro combines DATA RESI;
the reports for each of the DMUs andSET &SYMBOLS; KEEP VARVALUE;
extracts the DEA scores into a dataset P EXTRACT OBJECTFIE FUNCTION VALUES /
which is then fed to the forecasting DATARES2;
- -1 1
model.
SET RESI; IF -VAR OBJECT'THEN SCORE =_VALUE_;
ELSE DELETE; KEEP SCORE;
P GENERATE LISTING OF MUTUAL FUNDS/ DATA RES3; SET TICK;
ARRAY TICKR{SO} $5;
DO ITO 501, PCHANGE THIS TO 50 IN PRODUCTION RUN 'I NAME TICKR{l); OUTPUT;
END;
KEEP NAME;
P MERGE DATASET INTO FINAL FORM •/ DATA RESULTS; MERGE RES3 RES2;
PROC PRINT DATMRESULTS; %MEND EXTRACT;
P
EXECUTABLE CODE
%LOOP(50); %EXTRACT;
18
LINKED LINEAR REGRESSION - DEA MODEL
Forcasting Code
1 Mutual Fund Data Files 'I filename datl 'm1s93 data a'; filename daf2 'mfname data a'; filename dat5 'mfsgo data a';
1 DEA Score Data Files 1
filename dat3 'score93 data a'; filename dat4 'score90 data a';
option linesize = 72 nonumber nodate;
/* Read 1993 Mutual Fund Data •/
data one; infile
del I;
array val(50); input name $val{};
proc transpose; / Transpose Matrix: from IODMU to DMU*IO *1
run;
proc datasets; P Assign proper names to lOs 1
modify data 1; rename coil = Beta
co12 = Load co13 = E_Ratio col4=NAV co15 = Cap—Gain co16 = Dividend
_Name—= Fund;
data funds; P Create List of Fund Names 1
infile dat2; input Fund $;
data scr93; / Read 1993 DEA scores from file 1
infile dat3; input dea93;
titlel '1993 DEAAnalysis';
data comp93; set datal; merge data 1 funds scr93;
1 Run All possible regression models 1
proc reg; title2 'All Possible Regression Models'; model dea93=Beta Load E_Ratio NAV Cap_Gain Dividend/
selection=Rsquare;
proc reg; tle2 'Minimum Spec Model';
model dea93 = E_Ratio NAV CAP_GAIN;
1* 1990 Data Analysis /
data info9o; intile dat5; array val(50); input name $ val{};
proc transpose; J* Transpose Matrix: from IODMU to DMUIO I
run;
proc datasets; /* Assign proper names to lOs 1
modify data2; rename coil = Beta
co12 = Load co13 = E_Ratio col4=NAV col5 Cap—Gain co16 = Dividend
_Name—= Fund;
data scr90; infile dat4; input dea9o;
data comp9o; merge data2 funds scr9o;
title 1 '1990 DEA Analysis';
proc reg dala=comp9o; tifle2 'All Possible Regressions'; model dea90= Beta Load E_Ratio NAV Cap —Gain Dividend
/selection = rsquare;
proc reg data=comp90; tie2 'Full Model'; model dea9o= Beta Load E_Ratio NAV Cap —Gain Dividend;
proc reg data=comp90; title2 'Minimum Spec'; model dea9o= E_Ratio NAV CAP_GAIN; plot residual.(predicted. E_Ratio NAV Cap—gain);
1 Full Fit Model 1
proc reg;
P Forecasting Validation Model /
title2'Full Model';
P Create predicted values using forecast model model dea93=Beta Load E_Ratio NAV Cap—Gain Dividend;
compare to actual 1993 values.
19
LINKED LINEAR REGRESSION - DEA MODEL
data predict; model Ldea = Beta Load E_Ratio LNAV LCG LDiv; set data 11; plot residual.(predicted.); Exp_dea=-0.1 77220E_Rao + 0.016477NAV +
0. 152344*Cap_Gain+0.279313; proc reg data = linear; if Exp_dea> 1 then Exp_dea = 1; model Idea = Beta E_Rao LNAV LDIV;
data forecast; merge predict funds scr93 scr9o; Err = dea93 - exp_dea; keep fund dea90 exp_dea dea93 err;
title 'Regression Predictions'; title2'Non-Linear Model';
S proc print data=forecast; var fund dea90 dea93 exp_dea err;
proc plot; plot exp_dea dea93;
1 added 4/18/94/ 1 Plot 93 Prediction vs 90 Actual 1 data fcast
merge scr90 forecast; keep exp_dea dea90 dea93 fund;
title 1 'Predicted 93 vs. Actual 1990'; tifle2 'Non-Linear Model'; proc plot data=fcast
plot exp_dea * dea90;
/* Linear transform to improve predictions 1 data linear;
set comp90; Ldea = log(deago); LNAV = log(NAV); if Cap—Gain = 0 then LCG = 0;
else LCG = log(Cap_Gain); LEA = log(E_Ratio); if Dividend = 0 then LDIV = 0;
else LDIV = LOG(Dividend);
/ Transform forecast inputs to linear model */
data trans93; set comp93; Ldea = log(dea93); LNAV log(NAV); if Cap—Gain = 0 then LCG = 0;
else LCG = Iog(Cap_Gain); LEA = log(E_Aatio); if Dividend = 0 then LDIV = 0;
else LDIV LOG(Dividend);
Title 'Linear Forecast Model'; title2 'Model Verification'; proc reg data = linear;
I' Linear Regression Equation 1 data predict2;
set trans93; PDEA = 0.687459Beta - 0.525658E —Ratio + 0.753563LNAV
+ 0.229900LD1V -2.930810; Exp_dea = exp(PDEA); if Exp_dea> 1 then Exp_dea = 1;
1 Compute Forecast Errors 1 data frcs;
merge predict2 funds scr93 scr9o; Err dea93 - exp_dea; keep fund dea90 exp_dea dea93 err;
title Regression Predictions'; title2'Linear Model'; proc print data=frcst2;
var fund dea90 dea93 exp_dea err;
20