15
82 1541-1672/11/$26.00 © 2011 IEEE IEEE INTELLIGENT SYSTEMS Published by the IEEE Computer Society TRENDS & CONTROVERSIES Editor: Hsinchun Chen, University of Arizona, [email protected] Smart Market and Money and services (such as a retail market or real estate market). In economics and fi nance, fi nancial and monetary concepts such as emerging markets, commodity markets, and the stock market are often mentioned. In all these areas, one of the most challenging research directions is model- ing and predicting market movements. In recent years, the availability of diverse and voluminous market-related mass media and social media content (or Business Big Data) and the emer- gence of sophisticated, scalable text and social mining techniques present a unique opportunity for advancing research relating to smart market and money. This research area, at the intersec- tion of computational and fi nance research, aims at developing intelligent (smart) mechanisms and algorithms for predicting market and stock performances. Market Modeling and Analysis In economics, finance, accounting, and market- ing, researchers have developed many sophisti- cated theories and analytical models. Most of these modeling and analytical techniques are quantitative in nature and rely on carefully con- structed and highly relevant market- or firm- specific metrics, such as market return, market capitalization, and book-to-market ratio in the classical Fama-French three-factors model. 1 Nu- merous online, unstructured (text) Web-enabled business data sources considered by financial ana- lysts in their expert industry and firm analysis, in- cluding 10K/10Q SEC reports, mass media news, local news, Internet news, financial blogs, inves- tor forums, and tweets, are providing new op- portunities for extracting market- or firm-specific intelligence. Large-scale automated analysis of this text-based, qualitative content has only recently become possible via techniques such as topic ex- traction, named-entity recognition, sentiment and affect analysis, multilingual language analysis, so- cial network analysis, statistical machine learning, and temporal data and text mining. David Leinweber, an AI researcher by train- ing and an early innovator in the application of modern IT in trading and investing, argued in his highly acclaimed book Nerds on Wall Street that future Wall Street development opportunities are in advanced electronic tools and understanding both quantitative and qualitative information. 2 Some practitioners believe that “stocks are stories and bonds are mathematics.” Advanced text ana- lytics for mass and social media, when combined with well-grounded financial analytical models, could provide important new insights for under- standing and predicting the markets. In the IT community, this kind of data- and text-analytics-based approach to business analysis has gained significant attention and traction and is often referred to as business intelligence and ana- lytics. 3 BI has been used as an umbrella term to describe concepts and methods to improve busi- ness decision making by using fact-based support systems. BI includes the underlying architectures, tools, databases, applications, and methodologies relevant to business decision making. As a data- centric approach, BI heavily relies on various ad- vanced data collection, extraction, and analysis technologies. Since 2004, Web intelligence, Web analytics, Web 2.0, and user-generated content have begun to usher in a new and exciting era of Business Intelligence 2.0 research. Advanced information-extraction, topic-identification, opinion- mining, and time-series analysis techniques can be applied to traditional business information and the new BI 2.0 content for various accounting, finance, and marketing applications. M arket, a term frequently used in mass me- dia and academic publications, is an elusive concept. In marketing, researchers and practitio- ners describe market as a place to exchange products Hsinchun Chen, University of Arizona

Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

82 1541-1672/11/$26.00 © 2011 IEEE Ieee InTeLLIGenT SYSTemSPublished by the IEEE Computer Society

T R E N D S & C O N T R O V E R S I E SEditor: Hsinchun chen, University of Arizona, [email protected]

Smart Market and Money

and services (such as a retail market or real estate market). In economics and fi nance, fi nancial and monetary concepts such as emerging markets, commodity markets, and the stock market are often mentioned. In all these areas, one of the most challenging research directions is model-ing and predicting market movements. In recent years, the availability of diverse and voluminous market-related mass media and social media content (or Business Big Data) and the emer-gence of sophisticated, scalable text and social mining techniques present a unique opportunity for advancing research relating to smart market and money. This research area, at the intersec-tion of computational and fi nance research, aims at developing intelligent (smart) mechanisms and algorithms for predicting market and stock performances.

Market Modeling and AnalysisIn economics, fi nance, accounting, and market-ing, researchers have developed many sophisti-cated theories and analytical models. Most of these modeling and analytical techniques are quantitative in nature and rely on carefully con-structed and highly relevant market- or fi rm-specifi c metrics, such as market return, market capitalization, and book-to-market ratio in the classical Fama-French three-factors model.1 Nu-merous online, unstructured (text) Web-enabled business data sources considered by fi nancial ana-lysts in their expert industry and fi rm analysis, in-cluding 10K/10Q SEC reports, mass media news, local news, Internet news, fi nancial blogs, inves-tor forums, and tweets, are providing new op-portunities for extracting market- or fi rm-specifi c

intelligence. Large-scale automated analysis of this text-based, qualitative content has only recently become possible via techniques such as topic ex-traction, named-entity recognition, sentiment and affect analysis, multilingual language analysis, so-cial network analysis, statistical machine learning, and temporal data and text mining.

David Leinweber, an AI researcher by train-ing and an early innovator in the application of modern IT in trading and investing, argued in his highly acclaimed book Nerds on Wall Street that future Wall Street development opportunities are in advanced electronic tools and understanding both quantitative and qualitative information.2 Some practitioners believe that “stocks are stories and bonds are mathematics.” Advanced text ana-lytics for mass and social media, when combined with well-grounded fi nancial analytical models, could provide important new insights for under-standing and predicting the markets.

In the IT community, this kind of data- and text-analytics-based approach to business analysis has gained signifi cant attention and traction and is often referred to as business intelligence and ana-lytics.3 BI has been used as an umbrella term to describe concepts and methods to improve busi-ness decision making by using fact-based support systems. BI includes the underlying architectures, tools, databases, applications, and methodologies relevant to business decision making. As a data-centric approach, BI heavily relies on various ad-vanced data collection, extraction, and analysis technologies. Since 2004, Web intelligence, Web analytics, Web 2.0, and user-generated content have begun to usher in a new and exciting era of Business Intelligence 2.0 research. Advanced information-extraction, topic-identifi cation, opinion-mining, and time-series analysis techniques can be applied to traditional business information and the new BI 2.0 content for various accounting, fi nance, and marketing applications.

M arket, a term frequently used in mass me-

dia and academic publications, is an elusive

concept. In marketing, researchers and practitio-

ners describe market as a place to exchange products

Hsinchun Chen, University of Arizona

IS-26-06-TandC.indd 82 11/10/11 3:02 PM

Page 2: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

november/december 2011 www.computer.org/intelligent 83

Market PredictionOne of the most challenging areas of economics- and finance-related ana-lytical research is in predicting stock performances. In the popular press and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general public alike. American TV personality (on CNBC’s Mad Money), former hedge fund manager, and bestselling author Jim Cramer is one such example. Ac-cording to an article on CNBC’s web-site titled “Mad Money Manifesto,” Cramer claims that the show’s mis-sion and his job

is not to tell you what to think, but to

teach you how to think about the market

like a pro. This show is not about picking

stocks. It’s not about giving you tips that

will make you money overnight—tips are

for waiters. Our mission is educational,

to teach you how to analyze stocks and

the market through the prism of events.

Can you really beat the market? Theoretical perspectives on stock be-havior hold pessimistic assessments of the predictability of stock behavior. The famous Efficient Market Hypoth-esis argues that the price of a stock reflects all available information and the market reacts instantaneously, making it impossible to outperform the market. The Random Walk The-ory states that the price of a stock var-ies randomly over time, so future pre-diction of the market is impossible.

Despite these theories, industry fi-nancial analysts have adopted two ap-proaches to stock prediction. Funda-mentalists utilize fundamental and financial measures of the economy, in-dustrial sector, and firms to predict market and firm performances. For example, the Fama-French three-factor model considers market return, mar-ket capitalization, and book-to-market

ratio in its analysis.1 On the other hand, technicians use historical time-series information of the stock and market behavior, such as historical price, volatility, and trading volume, to predict the market. In addition to standard regression-based analytical techniques, various machine learn-ing methods have been adopted for financial analysis, including artificial neural networks, Bayesian classifiers, and support vector machines.

Recently researchers have incorpo-rated firm-related news article mea-sures. Computer scientists have devel-oped trend-based language models, press release categorization (for ex-ample, good, bad, or neutral), and textual representation of news ar-ticles.4 For example, using proper nouns and past stock price as repre-sentations and support vector regres-sion (SVR) for analysis, the AZFin-Text system was able to outperform major quantitative (quant) funds during a five-week testbed period in 2005 based on 2,809 news articles and 10 million stock quotes.4 Several studies have also attempted to corre-late Web forums with stock behav-ior. Early studies focused on activity, without content analysis, and identi-fied concurrent relationships. Subse-quent research measured opinions in forum discussions and identified pre-dictive relationships between forum discussion sentiment and subsequent stock returns, volatility, and trading volume.5,6 With the widespread avail-ability of Business Big Data and the recent advancement in text and Web mining, tremendous opportunities exist for computational and finance researchers to advance research relat-ing to smart market and money.

In This IssueThis T&C Department includes three article on smart market and money from distinguished experts in

information systems and business. Each article presents unique per-spectives, advanced computational methods, and selected results and examples.

In “AZ SmartStock: Stock Predic-tion with Targeted Sentiment and Life Support,” Hsinchun Chen, Ed-ward Chun-Neng Huang, Hsinmin Lu, and Shu-Hsing Li report on the design and testing of the AZ Smart-Stock system, which incorporates target sentiment and life support in a prototype stock-trading engine. We considered transaction costs and simulated trading performed using data collected for 129 trading days in 2008. The proposed trading model outperformed other benchmark mod-els in the 10-day trading window. This article also suggests several di-rections for future research in pre-dicting market movements.

In “A Stakeholder Approach to Stock Prediction Using Finance So-cial Media,” my colleague David Zimbra and I describe research that utilizes firm-related finance Web fo-rum discussions to predict stock re-turns and trading of firm stock. Rec-ognizing the diversity among forum participants, we segmented them into distinct stakeholder groups based on their interactions (posting activi-ties) in the forums. By analyzing fine-grained stockholder groups, this sys-tem reported improved stock-return prediction versus a baseline system and aggregated forum model.

In the final article, “Computational Intelligence for Smart Markets: In-dividual Behavior and Preferences,” Paulo B. Goes argues that in today’s Web-enabled marketplaces, the eco-nomic environment is much more complex than the preference model-ing used by experimental economists. The monitoring opportunities avail-able with the Internet provide am-ple opportunities to build analytics

IS-26-06-TandC.indd 83 11/10/11 3:02 PM

Page 3: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

84 www.computer.org/intelligent Ieee InTeLLIGenT SYSTemS

and computational intelligence to understand in real time the com-plexities of the preference structure and behaviors of today’s heteroge-neous market participants. Goes summarizes selected Web-based auc-tion research that illustrates how to acquire computational intelli-gence on the preferences and behav-iors of the participants in these new microeconomies.

AcknowledgmentsThis material is based in part on work sup-ported by the US National Science Founda-tion (NSF) under grant CNS-0709338 and CBET-0730908 and the Defense Threat Reduction Agency (DTRA) under award HDTRA1-09-0-0058.

References 1. E. Fama and K. French, “Common

Risk Factors in the Returns on Stocks

and Bonds,” J. Financial Economics,

vol. 33, 1993, pp. 3–56.

2. D. Leinweber, Nerds on Wall Street,

John Wiley & Sons, 2009.

3. H. Chen, “Business and Market Intel-

ligence 2.0,” IEEE Intelligent Systems,

vol. 25, no. 1, 2010, pp. 68–71.

4. R. Schumaker and H. Chen, “Textual

Analysis of Stock Market Prediction

Using Breaking Financial News: The

AZFinText System,” ACM Trans.

Information Systems, vol. 27, no. 2,

2009, article no. 12.

5. W. Antweiler and M. Frank, “Is All

That Talk Just Noise? The Information

Content of Internet Stock Message

Boards,” J. Finance, vol. 59, no. 3, 2004,

pp. 1259–1295.

6. S. Das and M. Chen, “Yahoo! for

Amazon: Sentiment Extraction from

Small Talk on the Web,” Manage-

ment Science, vol. 53, no. 9, 2007,

pp. 1375–1388.

Hsinchun chen is the director of the Ar-

tificial Intelligence Lab at the University

of Arizona. Contact him at hchen@eller.

arizona.edu.

AZ Smart Stock: Stock Prediction with Targeted Sentiment and Life Support

Hsinchun Chen, University of ArizonaEdward Chun-Neng Huang, MicrosoftHsin-Min Lu and Shu-Hsing Li, National Taiwan University

Stock market prediction has long been a challenging research topic. One research stream focuses on us-ing text information to predict stock market movements. Robert Schu-maker and Hsinchun Chen inves-tigated various representations of text data in breaking financial news, in conjunction with past stock re-turns, for intraday stock price pre-diction.1 Their experiments suggest that proper nouns can deliver the best prediction accuracy in simulated trading. Their experiments, nonethe-less, did not consider news sentiment (good or bad) and ignored transac-tion costs, which could significantly reduce profitability in high-frequency intraday trading. Sentiment extracted from financial text data (such as fo-rums and news articles) has been ex-amined as a proxy of public opinions and correlated to stock market activi-ties by prior studies.

In other research, Paul Tetlock ad-opted GI Inquirer, a popular senti-ment dictionary, to evaluate the pes-simism level of the daily “Abreast of the Market” column in the Wall Street Journal.2 His results showed that the pessimism level was corre-lated with the downward pressure on market prices. Werner Antweiler and Murray Frank examined the rela-tionship between sentiment in Yahoo Finance Web forum discussions and stock behaviors.3 They developed a naïve Bayes-based sentiment classifier to classify forum messages into buy, hold, or sell. To represent aggregated

sentiment, they proposed a measure of disagreement that they found to be associated with stock volatility and trading volume. Using the event study framework, Antweiler and Frank later found that news events observed from the Wall Street Journal had sta-tistically significant effects on cumu-lative abnormal stock returns in five- to 40-day windows.4 Their results suggest that stock markets might need some time to fully absorb the impact of new information. However, their experiment did not consider the effect of repetitive news from multi-ple news sources.

We identified several research op-portunities from previous studies. News sentiment can contain valu-able information, but one news arti-cle might cover multiple companies. Thus, associating sentiment with the right company (targeted sentiment) presents a challenge. Old, often- repetitive news might also contain significantly less value and could be processed differently if we consider an aging theory (life support). Lastly, prior studies have not systematically studied the impact of trading win-dows and often ignored transaction costs.

Our work addresses these chal-lenges by developing a text-based stock-prediction engine with tar-geted sentiment and life-support considerations in a real-world finan-cial setting. This article presents our system design and summarizes our findings.

System DesignOur inter-day trading experiments follow Antweiler and Frank’s work,4 with five-, 10-, 20-, and 40-day trad-ing windows. For a given day t, our system collects news articles pub-lished between 16:00 on day t – 1 and 16:00 on day t. We use the in-formation in this text collection to

IS-26-06-TandC.indd 84 11/10/11 3:02 PM

Page 4: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

november/december 2011 www.computer.org/intelligent 85

predict future stock prices. Trading decisions are made based on the pre-dicted prices. Figure 1 presents our AZ SmartStock system design, which consists of four main components: data collection, data representation, stock price prediction, and simulated trading.

Data CollectionBoth historical news articles and stock market data need to be col-lected for our text analytics. We fo-cus on S&P 500 firms to minimize the potential illiquid problem asso-ciated with thinly traded stocks. We extracted news articles from major newswires on Yahoo Finance. We collected high-frequency trading data from the Wharton Research Data Ser-vices (WRDS). This dataset was also used to estimate bid-ask spreads asso-ciated with the transactions suggested by our stock price prediction module.

Data RepresentationWe identified company names in news articles using the well-established Stanford Named Entity Recognizer (NER) tool (http://nlp.stanford.edu/software/CRF-NER.shtml). The ex-tracted named entities were then com-pared with the set of S&P 500 com-pany names on that date. The matching process started from the tightest rule (exact match) to the loosest one (par-tial name match).

To extract the specific sentiments to-ward a given company from the news articles, we used companies identified from the previous step as sentiment tar-gets. We then adopted a sentiment dic-tionary developed by Tim Loughran and Bill McDonald specifically for fi-nancial text analytics, to classify words in news articles into positive and nega-tive categories.5 Six negation rules were implemented following Loughran and McDonald’s work.5

Our sentiment-extraction approach works at the sentence level to link companies with specific-sentiment words. The algorithm was developed based on two assumptions:

• sentiments about a company will be expressed after the company has been mentioned and

• companies are more likely to be as-sociated with sentiment words closer to them.

Unlike sentiment analysis, life sup-port aims to extract the degree of topic novelty. Following previous work,6 a company’s life support comes from the aggregated energy (novelty) of terms used in the news articles where the company is mentioned. The energy is measured by the degree to which the distributions of terms have changed. We do this by first observing the chang-ing proportion of individual terms in

Figure 1. The AZ SmartStock system design. The four main components are data collection, data representation, stock price prediction, and simulated trading.

Firm information

Data collection

News database

Data collection

Trade and quote

Trade and quote

Life support Sentiment

Company name identification

Feature extraction

Stock price prediction models

Stock price prediction

Simulated tradingTransaction cost System evaluation

Data representation

Yahoo! Financebreaking news

IS-26-06-TandC.indd 85 11/10/11 3:02 PM

Page 5: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

86 www.computer.org/intelligent Ieee InTeLLIGenT SYSTemS

each news source. Specifically, the energy score of a term k on day t is given by summing the chi-square sta-tistics of different news sources:

Ea b c d a d b ca b c d ak ts s s s s s s s

s s s s s,

( )( )( )( )(

=+ + + −+ +

2

++ +∑ c b ds s ss)( )

Table 1 gives values of as, bs, cs, and ds.

Then, we transform the energy scores of terms into life-support values using a sigmoid function. The life support score LSg,t of a com-pany g on day t are the aggregation of life supports of all terms in the news articles where g has been men-tioned. LSg,t will decay over time by a constant decay function, LS′g,t = Max(LSg,t – b,0), where b is the de-cay nutrition factor, an empirical constant.

We further propose adjusted life support, which combines life sup-port and sentiment scores that rep-resent the joint effect of topic nov-elty and opinion directions of news events. The adjusted life support score of a company g on day t, AdjLSg,t, is defined as LSg,t * Sg. The total ad-justed life support of company g on day t is the accumulative adjusted life support (TotalAdjLSg,t), which is cal-culated as AdjLSg,t + AdjLS′g,t–1, + ... + AdjLS′g,0.

Stock Price PredictionAlthough there are many predictive approaches available, we adopted support vector regression (SVR) in this research, similar to a prior study.1 SVR are an extension of support vec-tor machines, which are known to de-liver excellent prediction performance in classifying discrete outcomes. Our firm-level stock-prediction models

considered past stock returns and text data in past news articles when making predictions for future stocks. To study the effect of different text representations, we adopted several model specifications. The first model, a baseline (M-Reg), predicts future returns using past returns:

Rt+n,t = <a, Rt> + b

where <•, •> denotes the dot product, Rt = (Pricet – Pricet–1)/ Pricet–1 denotes the stock return on day t, and Rt+n,t = (Pricet+n – Pricet)/ Pricet denotes the stock return on the following n days.

The second model, sentiment model (M-Senti), includes additional sentiment- related information as input. We cap-ture two aspects of news sentiment in the model. The first aspect is the absolute sentiment on day t: Senti-mentt = POSt – NEGt, where POSt is the number of positive words appear-ing on day t and NEGt is the num-ber of negative words appearing on day t. The second aspect of sentiment is the volume of sentiment words: SentiWordst = POSt + NEGt. Com-bining the two sentiment variables, our second model can be written as follows:

Rt+n,t = <a, (Sentimentt, SentiWordst, Rt)> + b

The third model, life-support model (M-LS), was included to in-vestigate the effect of life-support scores. Instead of including senti-ment scores, here we added life-sup-port scores to the baseline regression model:

Rt+n,t = <a, (TotalAdjLSt, SentiWordst, Rt)> + b

All models were trained using 60 days of historical data with a linear kernel for SVR.

Simulated TradingWe aggregated news articles by day and conducted prediction at the clos-ing time (16:00) of a trading day. We experimented with the holding peri-ods of five, 10, 20, and 40 days. This study considered transparent trans-action costs and a bid-ask spread. The transparent transaction cost was set to $4.95 per order ($9.90 for a roundtrip) according to the online broker TradeKing. In addition, we also considered two regulatory fees that occur during transactions: a Se-curities and Exchange Commission fee, which is the principal amount times 0.0000169 and applied to the sale transactions of all equities, and a trading activity fee, which is 0.000075 per share for equity sells with a maximum charge of $3.75 per sale transaction. We also considered the bid-ask spread by matching the quote records with the trade records (which we don’t discuss in detail here due to space limitations).

Our study incorporates two sim-ple trading strategies: buy and short. The buy strategy is to purchase and hold stocks for a period of time. The short strategy short-sells stocks first and buys them back later. Therefore, the buy strategy gets a positive re-turn when the stock price increases, while the short strategy gains posi-tive returns for stocks with decreas-ing prices.

At the end of each trading day, trading decisions of different hold-ing periods were made independently. For each holding period, the top five stocks with the highest predicted pos-itive and negative returns are traded with a $1,000 investment for each stock. Positions were held for a pre-specified trading window and closed on the last day of the trading window. We used three measures to evaluate the performance of our prediction engine: mean square error (MSE), directional

Table 1. Variables for determining the energy score of a term k.

DayIn news sources

In other news sources

t as bs

t – 1 cs ds

IS-26-06-TandC.indd 86 11/10/11 3:02 PM

Page 6: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

november/december 2011 www.computer.org/intelligent 87

accuracy (DA), and return of simu-lated trading. We report selected DA and return results here.

Experimental ResultsWe evaluated the performance of our AZ SmartStock system using 114,347 news articles published from 1 Janu-ary 2008 to 31 October 2008 (129 trading days during the 2008 US financial collapse) from the Yahoo Finance Breaking News section. We compared the performance of M-Reg, M-Senti, and M-LS prediction models.

Table 2 gives the overall DA results. The performances of the three mod-els did not differ significantly from one another. Overall, M-LS achieved the best directional accuracy (52.05 percent) after holding the stock for 10 days. The 10-day holding window consistently performed better than the five-, 20-, and 40-day holding windows.

Simulated trading was conducted to evaluate each model’s profitability. To compare the returns for different trading windows, we converted all the reported returns into returns of the entire testing time period, 129 days. Holding the S&P 500 during this pe-riod of time yielded a return of –30.10 percent (during the 2008 US financial collapse).

Table 3 and Figure 2 summarize the trading returns of the three mod-els across different trading windows. The results show that M-LS had the best performance across all trad-ing windows—albeit all the models achieved negative returns during the 2008 financial collapse. The best per-formance was obtained with M-LS for a 10-day trading window, result-ing in a return of –0.35 percent. M-Senti came in the second place in most cases. In addition, all models out-performed the S&P 500 index in all trading windows. The 10-day trading

window resulted in significantly bet-ter returns than other trading win-dows, which is consistent with the DA results.

Despite our promising results, there are several caveats thus far. Our results might have been heavily influenced by the highly volatile and disastrous financial events in 2008. A more systematic experimentation of other, longer time periods involv-ing ups and downs in the financial

markets are necessary. In addition, it would be interesting to perform an additional sensitivity analysis of each model’s selected best and worst stock recommendations to better understand the reasons for the suc-cess and failure of different tech-niques. We also hope to better inter-pret the decision rules or knowledge derived from the AZ SmartStock text analytics so we can make more informed decisions about

Table 2. Directional accuracy (DA) results.

Directional accuracy (%)

Prediction model 5 days 10 days 20 days 40 days

Baseline (M-Reg) 50.88 51.67 48.66 42.26

Sentiment (M-Senti) 50.79 51.76 48.44 42.38

Life support (M-LS) 50.90 52.05 42.98 44.01

Table 3. Trading returns across different trading windows.

Prediction model 5 days (%) 10 days (%) 20 days (%) 40 days (%)

M-Reg -10.32 -3.09% -5.67 -17.07

M-Senti -12.51 -0.71 -5.28 -15.72

M-LS -9.95 -0.35 -4.02 -13.77

Figure 2. Comparison of different prediction models: the baseline model (M-Reg), sentiment model (M-Senti), and life support model (M-LS). M-LS had the best performance across all trading windows. In all trading windows, all the models outperformed the S&P 500 index, which yielded a –30.10 percent return during this period.

–35.00

–30.00

–25.00

–20.00

–15.00

–10.00

–5.00

0.00

5.00

5 days

10 days

20 days

40 days

M-Reg

M-Senti

M-LS

SP500

Retu

rns

(%)

Trading window

IS-26-06-TandC.indd 87 11/10/11 3:02 PM

Page 7: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

88 www.computer.org/intelligent Ieee InTeLLIGenT SYSTemS

future market movements and firm performances.

AcknowledgmentsThis material is based in part upon work sup-ported by the US National Science Foundation (NSF) under grants CNS-0709338 and CBET-0730908 and the US Department of Defense (DOD) under award HDTRA1-09-0-0058. We thank all members of the Business Intel-ligence research team at the AI Lab.

References 1. R. Schumaker and H. Chen, “Textual

Analysis of Stock Market Prediction

Using Breaking Financial News: The

AZFinText System,” ACM Trans. Infor-

mation Systems, vol. 27, no. 2, 2009,

article no. 12.

2. P. Tetlock, “Giving Content to Investor

Sentiment: The Role of Media in the

Stock Market,” J. Finance, vol. 62,

no. 3, 2007, pp. 1139–1168.

3. W. Antweiler and M. Frank, “Is All

That Talk Just Noise? The Informa-

tion Content of Internet Stock Message

Boards,” J. Finance, vol. 59, no. 3,

2004, pp. 1259–1295.

4. W. Antweiler and M. Frank, “Do US

Stock Markets Typically Overreact to

Corporate News Stories?” working

paper, 2006, http://papers.ssrn.com/

sol3/papers.cfm?abstract_id=878091.

5. T. Loughran and B. McDonald, “When

is a Liability not a Liability? Textual

Analysis, Dictionaries, and 10-Ks,”

J. Finance, vol. 66, no. 1, 2011, pp. 35–65.

6. K. Chen, L. Luesukprasert, and S.T.

Chou, “Hot Topic Extraction Based on

Timeline Analysis and Multidimensional

Sentence Modeling,” IEEE Trans.

Knowledge and Data Eng., vol. 19,

no. 8, 2007, pp. 1016–1025.

Hsinchun chen is the director of the Artificial

Intelligence Lab at the University of Arizona.

Contact him at [email protected].

edward chun-neng Huang is a software

design engineer in test (SDET) at Microsoft.

Contact him at [email protected].

Hsin-min Lu is an assistant professor at the

National Taiwan University. Contact him at

[email protected].

Shu-Hsing Li is the dean of the College of

Management at the National Taiwan Uni-

versity. Contact him at shli@management.

ntu.edu.tw.

A Stakeholder Approach to Stock Prediction Using Finance Social Media

David Zimbra and Hsinchun Chen, University of Arizona

The prediction of a firm’s stock re-turn has long been of interest to re-searchers in the finance and machine learning disciplines. Traditionally, the firm’s financial measures and historical stock behavior informa-tion are utilized for prediction. More recently, researchers have demon-strated improved performance by in-cluding professional news articles on the firm.1 Finance Web forums on the firm have also been revealed as valuable sources in explaining subse-quent stock behavior,2 although few studies have leveraged them in a true predictive context performing simu-lated trading based on the extracted information. This article examines the inclusion of measures extracted from the discussion of a firm within a finance Web forum in the simu-lated trading of firm stock for one year.

Stock-Return Prediction and Finance Web ForumsAlthough the two prominent theo-retical perspectives on stock market behavior, the efficient market hy-pothesis and the random walk theory, provide pessimistic assessments of its predictability, researchers have dem-onstrated in empirical studies that

stock-return prediction might be pos-sible. Stock trading philosophies and the prediction of return have gen-erally followed the fundamentalist and technician approaches. Accord-ing to the fundamentalist approach, a stock’s price is determined by the fundamental and financial measures of the economy, industry, and firm. Technicians utilize historical time-series information of the stock and market behavior to predict future returns. In simulated trading, fun-damentalist strategies correspond to longer waiting times before reacting to new information on the firm, while technicians respond quickly to cap-ture profits before the market fully absorbs the new information into the stock price.3

Following the technician philoso-phy, researchers have integrated pro-fessional news articles into the pre-dictive models since new information on the firm is often released through the press, improving model perfor-mance by capitalizing on the time lag before investors react.1 Approaches to the automated analysis of the news articles generally take two forms, where the article content is repre-sented as textual features directly ap-plied to learn the relationship with stock return or by performing senti-ment analysis on the article, such as classifying good, bad, and neutral news, and using the derived senti-ment measures to predict return.

With similar motivations, research-ers have examined the relationships between discussions in firm-related finance Web forums, such as Yahoo Finance, and subsequent stock be-havior.2 Studies have revealed these forums provide significant explana-tory power of subsequent firm stock return, with unique information not covered in the professional news. In addition to considering forum activ-ity, such as message-posting volume,

IS-26-06-TandC.indd 88 11/10/11 3:02 PM

Page 8: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

november/december 2011 www.computer.org/intelligent 89

researchers have developed “bullish-ness” classifiers to perform sentiment analysis and classify messages repre-senting investors’ buy, hold, and sell positions. Bullishness classifiers typi-cally performed modestly, with 60 to 70 percent accuracy attributed to a noisy forum environment.2 Although investors and shareholders are specif-ically targeted by these classification schemes, forums hosted on major Web platforms can attract a diverse collection of the firm’s stakeholders. Thus, the diversity of the forum pop-ulation might be responsible for the unimpressive performance in classify-ing investment positions.

Departing from the prior literature, we adopt a stakeholder, rather than shareholder, perspective of forum participants and segment them into stakeholder groups to be assessed in-dividually. According to the stake-holder theory of the firm, satisfying the interests of a diverse collection of stakeholders, beyond those of the shareholders, is essential to success. Following that theory, a stakeholder approach to forum analysis recog-nizes that various stakeholder groups might have distinctive information in explaining firm stock return. Al-though few studies have leveraged firm-related finance Web forums in a true predictive context, this re-search performs simulated trading of firm stock based on the measures ex-tracted from online discussions.

Web Forum Analysis and Stock-Return ModelingFigure 3 shows the analytic frame-work we applied in this research. The framework’s four stages include stakeholder social network extrac-tion, stakeholder clustering, senti-ment analysis, and stock modeling.

Web forum participants form a so-cial network through their interac-tions in discussions. Subgroups are a

nontrivial structural feature of social networks, and identifying such sub-groups in the forum social network might reveal distinctive stakeholder perspectives within the population. To perform the stakeholder segmen-tation through clustering, we first ex-tract the forum social network and represent it using by an interaction matrix. In this research, posting mes-sages in the same discussion thread constitutes a relationship between participants. Each time participants interact in a discussion thread, their relationship is strengthened.

Because stakeholders often be-long to more than one stakeholder group, we apply a probabilistic clus-tering approach to group-related fo-rum participants. Specifically, a finite mixture model is utilized, with the expectation-maximization algorithm for estimating parameter values. We determine the number of clusters to represent the stakeholder groups in the forum using maximum likeli-hood estimation and cross validation. To ensure the independence of attri-butes included in the mixture model, we extract principle components from the interaction matrix prior to the clustering. Principle-component analysis also serves to reduce the highly dimensional feature space.

After clustering, stakeholders are rep-resented by their probabilistic assign-ments to each of the identified clusters throughout the stock modeling.

Similar to prior studies on firm-related finance Web forums and stock behavior,2 we perform senti-ment analysis on the forum messages. However, unlike the bullishness clas-sifiers devised to interpret investor communications, we apply a more general approach to evaluate the sen-timents expressed by various stake-holder groups. Specifically, we use the Opinion Finder (OF) system for sub-jectivity analysis4 and Senti Word-Net (SWN) lexicon5 in a combined fashion for sentiment analysis. Lexi-cal approaches to sentiment analysis are general, but they lack contextual knowledge of the specific usage of terms, which might result in errone-ous application. To incorporate con-textual information and ensure the correct SWN entry is applied based on the specific usage of the term in a subjective statement, we apply OF prior to assigning an SWN score. OF enables targeted application of SWN and provides measures of subjectivity to enrich the analysis.

To evaluate the application of firm-related finance Web forums in the prediction of daily stock return, we

Figure 3. Analytic framework. The framework consists of four stages: stakeholder social network extraction, stakeholder clustering, sentiment analysis, and stock modeling.

Web forum

Analytic framework

Stakeholder social network extraction

Stakeholder clustering

Sentiment analysis

Principle-component analysis Probabilistic clustering

Stock return modeling

Baseline model Forum model Stakeholder model

OpinionFinder

SentiWordNet

IS-26-06-TandC.indd 89 11/10/11 3:02 PM

Page 9: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

90 www.computer.org/intelligent Ieee InTeLLIGenT SYSTemS

compare the performance of a baseline model consisting of well- established fundamental and techni-cian explanatory variables to mod-els that incorporate forum measures. The dependent variable in all models is the log-difference in the daily close price of stock. The independent vari-ables included in the baseline model are the Fama-French factors: market return; book-to-market ratio; mar-ket capitalization; two lagged terms of prior daily stock return, volatil-ity, and trading volume; and dummy variables for the day of the week. In addition to the baseline variables, the forum-level model includes six fo-rum measures characterizing the fo-rum discussions during the prior day (using the trading day definition of 16:00 to 16:00): the number of mes-sages posted, average message length,

and average and variance in senti-ment and subjectivity.

Instead of forum-level measures, the stakeholder-level model incor-porates six measures for each of the stakeholder groups identified in the clustering. All stock-return mod-els utilize support vector regression, calibrated with five months of daily historical information and applied for the prediction of daily return for one month of trading days. For each month during the year of trad-ing, we perform stakeholder clus-tering to assess the current state of the forum and estimate new stock-return models. To identify the most relevant variables and ensure parsi-monious prediction models, feature selection is performed prior to model calibration using a correlation-based approach similar to step-wise regression;

the algorithm seeks subsets of input variables with high correlation with the dependent variable, and low in-ter-correlation. During the predic-tion period, participants in the fo-rum are classified according to the stakeholder groups established in the clustering. In the simulated trad-ing, firm stock is bought or shorted and sold on a trading day based on each model’s daily stock-return predictions.

Experimentation and ResultsFor our experiment, we selected Wal-Mart for stock-return modeling due to its prominence in the market and diverse and active collection of stake-holder groups. Prior studies have ex-amined Yahoo Finance firm-related Web forums, which we use in this research. We performed Wal-Mart stock trading based on our models’ stock-return predictions for one year (250 trading days) from 1 March 2006 to 28 February 2007. In total, we used 17 months of data, including the 5 months required to calibrate the first prediction models prior to trad-ing. Table 4 summarizes the Yahoo Finance forum data in the study.

To construct the stakeholder-level stock-return prediction models, we performed clustering each month dur-ing the year of trading to identify the current stakeholder groups in the fo-rum and assign participants to their appropriate mixture of groups. For example, to develop the first predic-tion model for March 2006, the stake-holder clustering was based on the previous five months of forum data (from 1 October 2005 to 28 February 2006). Eight stakeholder groups were identified, three of which were consid-ered major with more than 10 percent of forum participants.

Table 5 gives the top word bigrams and trigrams representing the three

Table 4. Summary of the Yahoo Finance Forum data.

Forum MessagesDiscussion

threads StakeholdersMessages per thread

Messages per stakeholder

Yahoo Finance (finance.yahoo.com), Wal-Mart (WMT)

134,201 40,633 5,533 3.3 24.25

Table 5. Major stakeholder groups in the March 2006 prediction model.

Stakeholder group

Forum participants (%) Top word bigrams Top word trigrams

1 53 wall street http news yahoo

long term http finance messages

bottom line moneycentral msn com

holiday season long-term sentiment strong

wmt stock quote profile research

2 11 sales growth higher gas prices

middle class wal-mart super center

small town www nlpc org

shop wal-mart ap wal mart

people want pay labor costs

4 23 million people pay health care

mom pop choose spend money

work wal-mart wal-mart distribution center

united states employee health care

past years everyday low prices

IS-26-06-TandC.indd 90 11/10/11 3:02 PM

Page 10: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

november/december 2011 www.computer.org/intelligent 91

major stakeholder groups (ranked by their term frequency). Stakeholder group 1 seemed to consist of techni-cal investors, heavily engaged in the exchange of news. Group 2 primarily discussed Wal-Mart’s growth and its impact on consumers and communi-ties. Stakeholder group 4 seemed to be made up of employees conversing on work and healthcare related issues.

In the simulated trading of Wal-Mart stock, each model began with an initial investment of $10,000. Ac-cording to each model’s daily predic-tions, if the anticipated daily stock return was greater than 0.1 percent (or less than –0.1 percent), the Wal-Mart stock was bought (shorted). Unchanged positions on consecutive trading days were held; otherwise, the stock was sold. Although prior stud-ies typically disregard trading costs, for additional realism, we incorpo-rated an $8 charge per transaction.

Table 6 presents the results of the ex-periment. For each of the prediction models, we report the final value of the

investment after completion of trading during the year as well as directional accuracy, the performance in correctly predicting the direction of the daily stock return, positive or negative.

As an additional point of reference, holding the Wal-Mart stock for the year would have resulted in an ending investment value of $10,096, providing an annual return of less than 1 percent. Results from the experimentation and simulated trading revealed each predic-tion model performed well, with bet-ter than 50 percent directional accu-racy, and earned substantial profit in simulated trading of Wal-Mart stock. Forum-level variables incorporated into the model, however, provided little

improvement over the baseline. Only after stakeholder segmentation and ex-traction of group-level measures from the forum was the improvement in di-rectional accuracy over the baseline statistically significant. The stake-holder model also produced a 44 per-cent annual return in simulated trad-ing, an impressive 27 percent increase over the baseline. Figure 4 depicts the investment values for each model over the year of trading.

This research shows that recogniz-ing the true diversity among forum participants, segmenting them into stakeholder groups based on their in-teractions in the forum social network, and assessing them independently

Table 6. Results of stock-return prediction models and simulated trading.

ModelDirectional

accuracy (%)Ending investment

value (%)Annual

return (%)

Baseline 56.8 $11,767 17

Forum 58.0 $11,643 16

Stakeholder 61.2* $14,413 44

*Pair-wise t-test; improvement over baseline model at p < 0.10.

Figure 4. Investment values during one year of simulated trading. The stakeholder model produced a 27 percent increase over the baseline annual return in simulated trading.

9,000

10,000

11,000

12,000

13,000

14,000

15,000

Start 5/1/2006 5/30/2006 6/27/2006 7/26/2006 8/23/2006 9/21/2006 10/19/2006 11/16/2006 12/15/2006 1/18/2007 2/15/2007 3/16/2007

BaselineForumStakeholder

IS-26-06-TandC.indd 91 11/10/11 3:02 PM

Page 11: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

92 www.computer.org/intelligent Ieee InTeLLIGenT SYSTemS

refined the measures extracted from the forum and improved stock-return prediction. The impressive perfor-mance of the stakeholder-level model represented a statistically significant improvement over the baseline in di-rectional accuracy and provided an an-nual return of 44 percent in simulated trading of firm stock.

This study contributes to the emerging trend in stock-return pre-diction research to incorporate ad-ditional sources of text-based infor-mation (professional news articles or online social media) into the models to improve performance. However, the study has several limitations, in-cluding the generality of the findings since the analysis covered only one firm for one year and the consistency of the approach under different mar-ket conditions and industries. In ad-dition to expanding the number of firms, industries, and time periods included in our future research, we intend to examine shorter time win-dows for stock-return prediction and higher-frequency trading.

AcknowledgmentsFunding for this research was provided by US National Science Foundation grants CNS-0709338 and CBET-0730908 and Defense Threat Reduction Agency (DTRA) grant HDTRA1-09-1-0058.

References 1. R. Schumaker and H. Chen, “Textual

Analysis of Stock Market Prediction

using Breaking Financial News: The

AZFinText System,” ACM Trans. Infor-

mation Systems, vol. 27, no. 2, 2009,

article no. 12.

2. S. Das and M. Chen, “Yahoo! for

Amazon: Sentiment Extraction from

Small Talk on the Web,” Manage-

ment Science, vol. 53, no. 9, 2007,

pp. 1375–1388.

3. B. LeBaron, W. B. Arthur, and

R. Palmer, “Time Series Properties of

an Artificial Stock Market,” J. Economic

Dynamics and Control, vol. 23,

no. 9–10, 1999, pp. 1487–1516.

4. T. Wilson et al., “OpinionFinder:

A System for Subjectivity Analysis,”

Proc. Conf. Human Language Technol-

ogy and Empirical Methods in Natural

Language Processing, Assoc. for Com-

putational Linguistics, 2005, pp. 34–35.

5. A. Esuli and F. Sebastiani, “SentiWord-

Net: A Publicly Available Lexical Resource

for Opinion Mining,” Proc. Conf. Lan-

guage Resources and Evaluation, 2006.

david Zimbra is a doctoral student in man-

agement information systems at the Univer-

sity of Arizona. Contact him at zimbra@

email.arizona.edu.

Hsinchun chen is the director of the Ar-

tificial Intelligence Lab at the University

of Arizona. Contact him at hchen@eller.

arizona.edu.

Computational Intelligence for Smart Markets: Individual Behavior and Preferences

Paulo B. Goes, University of Arizona

In an influential work for the exper-imental economics field, Nobel Prize winner Vernon Smith established the concept of a microeconomic system as consisting of an economic environ-ment, together with an economic in-stitution (or economic mechanism).1 The economic environment is the preferences of the people in the econ-omy. In the context of experimen-tal economics, Kevin A. Mccabe, Stephen J. Rassenti, and Vernon L. Smith introduced the concept of “smart computer-assisted markets.”2 The “smartness” of the markets came from the real-time computation achieved by the optimization algo-rithms to derive prices and determine winners in the auction experiments.

In the years that followed, the term “smart market” continued to be applied to computational and opti-mization efforts of price and winner determination involved in combina-torial auctions, usually formulated through complex math programming allocation problems. In a recent re-search commentary, Martin Bichler, Alok Gupta, and Wolfgang Ketter re-positioned the term “smart markets” in the more general context of com-putational intelligence for decision making by market participants.3

In today’s Web-enabled market-places, the economic environment is more complex than the preference modeling used by experimental econ-omists. The monitoring opportunities available with the Internet provide ample opportunities to build analyt-ics and computational intelligence to understand in real time the complexi-ties of the preference structure and behaviors of today’s heterogeneous market participants. This article uses the context of Web-based auctions to illustrate how to acquire compu-tational intelligence on participants’ preferences and behaviors in these new microeconomies.

Online Auctions Bidding BehaviorOver the last two decades, online auction sites, such as eBay and samsclub.com, have developed sophis-ticated user interfaces that let bid-ders search the entire marketplace of all available auctions, compare posted price alternatives, place bids, and monitor bidding on auctions of interest. These enhanced user inter-faces, the long duration of the auc-tions, the heterogeneous population of bidders who join and leave an auc-tion at will, and the knowledge acqui-sition opportunities of the Internet have pushed the boundaries and rede-fined Smith’s economic environment.

IS-26-06-TandC.indd 92 11/10/11 3:02 PM

Page 12: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

november/december 2011 www.computer.org/intelligent 93

Bidders’ preference models must ac-count for all the additional behav-ioral characteristics of the bidder’s interaction with the economic insti-tution (mechanism) that could impact an auction’s outcome.

Unlike the constrained experimen-tal economics environment, where bidders are assumed to be homoge-neous, a great deal of heterogeneity exists among bidders in the online environment. Bidders are smart con-sumers or resellers who approach the bidding process in different ways, af-fected by how they process informa-tion, their participation cost struc-ture, and their purchasing intentions and demand. For example, in the same auction, we can fi nd experi-enced small retailers who are pur-chasing items for resale and indi-viduals who are buying for personal consumption. Insights from bidding behavior of heterogeneous consum-ers are critical to designing auction mechanisms, which directly affect the auctioneers’ revenue, the appeal of the marketing channel to the consum-ers, and the effi ciency of the goods-consumers allocation.

Determining Bidders’ Participation Profi lesA typical consumer-oriented online auction lasts from several hours to sev-eral days. This helps attract more bid-ders, who are unknown beforehand to the auctioneer, and lets the auctioneer promote the auction to multiple time zones. Ravi Bapna and his colleagues used clustering techniques on detailed bidding data available from the inter-actions between bidders and auction-eers.4 Time of entry (TOE) captures the time of the fi rst bid placed by the bidder in the auction, while time of exit(TOX) clocks the time of the last bid by the bidder in the auction. NOB is the total number of bids placed by the bid-der in the auction between TOE and

TOX. Together, these variables cap-ture bidders’ degree of participation in an auction and can also be associated with the bidders’ cognitive preference of how they approach the interactive bidding process. Through a k-means cluster analysis on a large dataset from Yankee auctions, the authors were able to identify the following general partic-ipation profi les or strategies:

•A participator is a bidder with low TOE, high TOX, and high NOB. This bidder monitors the auction throughout its duration and bids actively.

• An evaluator is a bidder with rela-tively low or medium TOE, which generally coincides with the TOX.

This bidder places close to one bid (NOB = 1). This is the profi le of bidders who know how much they want to pay, probably have good knowledge about the item and its common value, and do not want to spend time actively participating throughout the auction.

•An opportunist is a bidder with a NOB close to 1, but who also has a high TOE and high TOX. These bidders place their only bids toward the end of the auction, looking for a bargain. This general strategy has also been identifi ed in the eBay en-vironment as “sniping.”

The differences in how consumers approach bidding affect the auction

outcomes. In the Yankee auction that utilizes a popular online multi-item, ascending, discriminatory mecha-nism, evaluators tend to realize the lowest levels of surplus, while oppor-tunists have a higher probability of winning. The actual mix of profi les present in any given auction dictates the competitive nature of the bidding and the fi nal outcome.

This bidding-strategy classifica-tion, which was found using data mining, has proven robust. Using data from various online auctions in different environments—consumer to consumer (C2C), business to con-sumer (B2C), and even B2B—similar clusterings were obtained.

The Internet has also opened op-portunities for retailers to increasingly explore the concept of sequential auc-tions to dispose of a large inventory of items. Dell and Sam’s Club, for exam-ple, routinely conduct a series of auc-tions selling identical items through their own auction sites to liquidate large inventories. These sequences of auctions often span several weeks. Bidders in such sequences have the op-portunity to participate in multiple auctions of the same item, and as they do, they learn from the experience and fi ne tune their bidding behavior to maximize their payoffs. This evo-lution in bidding behavior affects the nature of demand in these auctions.

In an earlier work, my colleagues and I introduced the variable time since last auction (TLA) to measure how long the repeat bidder waited between auctions of the same sequence to bid again.5 With the enhanced tu-ple [TOE, TOX, NOB, TLA], we performed cluster analyses on exten-sive datasets of sequential auctions and confi rmed the original catego-rization of evaluators, participators, and opportunists, but with a recur-rence dimension. Table 7 shows the results. New clusters with high TLA

The differences in how consumers approach bidding affect the auction outcomes.

IS-26-06-TandC.indd 93 11/10/11 3:02 PM

Page 13: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

94 www.computer.org/intelligent Ieee InTeLLIGenT SYSTemS

levels were identified as “intermittent strategies,” in which evaluators and opportunists intermittently opted to skip auctions. The vast majority of re-peat bidders were assigned to clusters

that reflected their recurrent partici-pation, not skipping auctions in the sequence.

By applying longitudinal data ana-lytics, we were able to capture the ef-fect of learning and experience as bidders repeat their participation throughout the auction sequence.5 We also revealed that there is demand het-erogeneity among the participants of sequential auctions. Individual bidders who want to purchase only one item coexist with resellers who are par-ticipating to purchase multiple items. These two types of consumers learn from their experiences differently. Indi-vidual buyers start bidding using early or middle evaluator strategies and tend to try out various strategies over time. As they accumulate more experience, they shift into using opportunist be-havior. Resellers tend to stick with opportunist or late-bidding strategies throughout the auction sequence.

Understanding Bidders’ BehaviorsAdditional analysis of the actual competitive behavior of bidders within each auction is possible us-ing a network structure called the outbid-by graph. Each node of the

graph is a bidder who is a winner at some point in the auction. There is a direct arc between bidder i and bid-der j, if bidder j directly outbid bidder i. Cycles in the graphs indicate com-petition activity. The numbers along each arc display the normalized time of the bids, so we can track the tim-ing of the competitive interactions. For example, suppose bidder A places the first bid in an auction at time 0.2 (when 20 percent of the auction dura-tion has elapsed). At time 0.3, bidder B bids higher and is now the current winner. At time 0.31, bidder A reacts and outbids bidder B. This bidding sequence defines a cycle in the graph (see Figure 5).

Figure 6 shows two real eBay auc-tions using outbid graphs. In Figure 6a, bidder “bizzywildcat” places the first bid at time 0.35. He or she is out-bid by “ivannada” at time 0.55 but regains the item at time 0.58. Three bidders subsequently become win-ners at times 0.63, 0.77, and 0.86, re-spectively. A bidding war takes place at the end of the auction between “tikidude11” and “terrible88,” who ends up winning the auction.

In Figure 6b, most of the bidding activity takes place toward the end

Table 7. Mean (standard deviation) of behavioral clusters of bidders participating in sequential auctions.

Bidders adopting strategy

Clusters/strategies Percentage No. of bids Time of entry Time of exit Time since last auction

Recurrent strategies

Early evaluators (EE-R) 22.76 1.17(0.53)

1.83(1.08)

1.92(1.13)

0.66(2.49)

Middle evaluators (ME-R) 24.56 1.20(0.48)

5.55(1.06)

5.64(1.07)

0.62(1.60)

Opportunists (O-R) 36.89 1.15(0.41)

9.04(0.88)

9.20(0.77)

0.77(2.71)

Participators (P-R) 5.45 2.56(0.84)

2.83(1.72)

8.97(1.13)

0.49(1.44)

Intermittent strategies

Early evaluators (EE-I) 2.73 1.15(0.50)

1.90(1.05)

1.97(1.14)

6.40(7.97)

Middle evaluators (ME-I) 2.82 1.40(0.70)

5.41(1.49)

6.22(1.49)

8.86(10.56)

Opportunists (O-I) 4.79 1.11(0.38)

9.17(0.82)

9.30(0.71)

9.04(12.11)

Figure 5. The outbid-by graph. Example with two bidders. In the bidding sequence, Bidder A eventually outbids bidder B at time 0.31.

Auctionbegins

Bidder A

Bidder B

0.20

0.30 0.31

IS-26-06-TandC.indd 94 11/10/11 3:02 PM

Page 14: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

november/december 2011 www.computer.org/intelligent 95

of the auction (after 92 percent of to-tal duration), with several bidders en-gaging in a bidding war. The graph in Figure 6a shows a more linear bidding process with fewer competi-tive flare outs, while the one in Fig-ure 6b displays a much more com-petitive auction, especially toward the end.

Constructed in real time, the outbid graph can reveal patterns of competition and detailed characterization of the bidding process. It provides the ba-sis for advanced data mining analyses such as graph clustering and sequence mining, with the idea of identifying and characterizing bidding patterns and whole auction patterns that are useful for decision making, both for the auctioneers and potential bidding agents that can aid the bidder.

Overall, in wide-open, online, con-sumer-oriented auctions, it is impor-tant to realize that heterogeneous bid-ders with different levels of experience,

different purposes, and different bid-ding behavior coexist throughout on-line auctions. The specific mix of behavior clusters present at any indi-vidual auction together with the com-petitive interactions will determine the outcome of the auction.

Auctioneers have the tools to mon-itor participants’ interactions in real time, perform analytics, and use the knowledge obtained to set the market-place and auction parameters to optimize their objectives. Bidders or third-party agents can also closely monitor an auction’s progress. I an-ticipate more advanced bidding tools will be created that can help bidders with the process of selecting bidding strategies that optimize their desir-able outcomes. In the eBay environ-ment, for example, third-party ven-dors are already offering sniping agents to regular bidders.

Future environments where all par-ticipants are armed with computa-tional intelligence to guide their de-cisions will generate a sophisticated computational game. We are not far from such an environment today. In small focused B2B environments such as energy and telecommunication marketplaces, this is already happen-ing. Still, there will always be a het-erogeneity of consumers with differ-ent levels of expertise and experience in general, open consumer-oriented markets.

In just two decades, the Internet has developed into a massive collec-tion of interconnected virtual sen-sors that capture all kinds of in-formation about consumers—what they do, where they are, what they think, and how they communicate with each other. Connectivity is everywhere, so the monitoring capa-bilities of tweets, social media, web-sites, blogs, news feeds, and mobile

Figure 6. Outbid-by graphs of actual eBay auctions. (a) In the first auction, various bidders outbid one another early on, and then a bidding war between “tikidude11” and “terrible88” ultimately determines the winner. (b) In the second example, most of the bidding activity takes place toward the end of the auction.

Auctionbegins

Auctionendspicayou69

waycool

audra1972

tko60

winwithu

viet1301

0.70

0.92

0.75

0.992 0.9960.9930.98

0.99

0.9970.95

0.991

Auctionbegins

Auctionends

johannat

ivannada jonathang

kbehm77

tikidude11

terryble88

bizzywildcat

0.35

0.55

0.58

0.63

0.77

0.86

0.95

0.96 0.98 0.98

(a) (b)

IS-26-06-TandC.indd 95 11/10/11 3:02 PM

Page 15: Hsinchun Chen, University of Arizona M€¦ · and stock advisory columns, “beat-ing the market” has often been con-sidered the elusive Holy Grail for practitioners and the general

96 www.computer.org/intelligent Ieee InTeLLIGenT SYSTemS

devices are limitless. Monitoring plat-forms can be built that lead to compu-tational intelligence of the underlying economic environment with accurate modeling of participants’ behaviors and intentions. Designers of smart marketplaces and their participants should carefully consider the existence of these capabilities.

References 1. V. Smith, “Microeconomic Systems as

an Experimental Science,” Am.

Economic Rev., vol. 72, no. 5, 1982,

pp. 923–955.

2. K. McCabe, S. Rassenti, and V. Smith,

“Smart Computer-Assisted Markets,”

Science, vol. 254, no. 5031, 1991,

pp. 534–538.

3. M. Bichler, A. Gupta, and W. Ketter,

“Research Commentary: Designing Smart

Markets,” Information Systems Research,

vol. 21, no. 4, 2010, pp. 688–699.

4. R. Bapna et al., “User Heterogeneity

and its Impact on Electronic Auction

Market Design: An Empirical Explo-

ration,” Management Information

Systems Q., vol. 28, no. 1, 2004,

pp. 21–43.

5. P. Goes, G. Karuga, and A. Tripathi,

“Bidding Behavior Evolution in Sequen-

tial Auctions: Characterization and

Analysis,” working paper, Univ.

of Arizona, 2011.

Paulo b. Goes is the Salter Distinguished

Professor of Management and Technology

and head of the Management Informa-

tion Systems Department at the University

of Arizona. Contact him at pgoes@email.

arizona.edu.

Selected CS articles and columns are also available for free at

http://ComputingNow.computer.org.

Call

Publish Your Paper

in IEEE Intelligent Systems

IEEE Intelligent Systems seeks papers on all aspects

of artifi cial intelligence,

focusing on the development

of the latest research into

practical, fi elded applications.

For guidelines, see

www.computer.org/mc/

intelligent/author.htm.

The #1 AI Magazine www.computer.org/intelligent

Be on the Cutting Edge of Artifi cial Intelligence!

IEEE

Articlesfor

IS-26-06-TandC.indd 96 11/10/11 3:03 PM