Automated Repricing in Comparison Shopping Agents: Price Prediction and Pricing Strategy Extraction Using Decision Trees

Master's Thesis
Manuel Zahn | 1397961
Wirtschaftsinformatik (Information Systems)

Information Systems Group (Fachgebiet Wirtschaftsinformatik)
Department of Law and Economics

Manuel Zahn
Matriculation number: 1397961
Degree program: Master Wirtschaftsinformatik (Information Systems)

Master's Thesis
Topic: "Automated Repricing in Comparison Shopping Agents: Price Prediction and Pricing Strategy Extraction Using Decision Trees"

Submitted: 22.10.2016

Supervisor: Dr. Irina Heimbach

Prof. Dr. Oliver Hinz
Information Systems Group
Department of Law and Economics
Technische Universität Darmstadt
Hochschulstraße 1
64289 Darmstadt

Prof. Dr. Johannes Fürnkranz
Knowledge Engineering Group
Department of Computer Science
Technische Universität Darmstadt
Hochschulstraße 10
64289 Darmstadt

In cooperation with:
Patagona GmbH
Poststraße 9
64293 Darmstadt
Declaration regarding the Master's Thesis

I hereby declare that I have written the present Master's thesis without the help of third parties and using only the cited sources and aids. All passages taken from sources are marked as such. This thesis has not previously been submitted in the same or a similar form to any examination authority.

Darmstadt, 22.10.2016
(Manuel Zahn)
Abstract
Prices on comparison shopping agents (CSAs) often emerge from complex automated rules, such as alignment with competitor prices. Drawing conclusions from prices to their underlying pricing strategies is a major challenge. From an economic point of view, such pricing insights are crucial, since they can be used for price prediction, which enables better enforcement of one's own pricing strategies. Following a strict divide-and-conquer concept, this thesis analyzes the feasibility of automated pricing strategy extraction and price prediction.
In a first step, 21.6 million offers are crawled from a German CSA. This recent and unique dataset covers 100 products over a time span of 80 days. Subsequently, a fine-grained market analysis has been conducted for multiple dimensions1 and multiple research fields. The key findings comprise daily minimum price change rates and reseller price change rates of about every third day. Based on product minimum price change rates of up to every second hour, the market analysis has shown that the dataset provides sufficient price dynamics for the purpose of gaining pricing insights.
First, the problem has been simplified to the question: Is it possible to detect a price series' origin by partitioning into manual and automated2 creation? This has been tested via supervised classification. Experts have classified the price series, and decision tree algorithms are then used to classify the price series based on a broad feature set. A comprehensive evaluation with 10-fold cross validation indicates that automated repricing detection is possible with high accuracy.3 This result lays the foundation for the subsequent tasks.
The price prediction is performed with two different approaches. On the one hand, there are time-series-based predictors which rely on the pure reseller price series, using methods like support vector regression and ARIMA models. On the other hand, there are feature-based algorithms which use a combination of decision and regression trees as key building blocks. A time series cross validation with up to 80 folds has been conducted. The feature-based algorithms achieve promising forecasting results for different types of price changes, with up to 11% fewer prediction errors than a reference 'no price change' predictor.
The pricing strategy extraction is based on a combined heuristic approach built on two classes of features and well-founded methods. For example, causality measures are used to identify competitor interlink strategies, and motif discovery methods are used for extracting time-dependent strategies. Based on 6,632 reseller price series, six different types of strategies are extracted.
The results of this thesis facilitate a deeper understanding of pricing mechanisms on CSAs and enable online retailers and repricing providers to be a step ahead of their competitors.
1 The market analysis dimensions encompass aggregation level, time, price and availability.
2 In terms of 'by an automated repricing algorithm'.
3 Reaching up to 97.11% area under the receiver operator characteristic (ROC) curve for the testing sets.
Contents
List of Figures 2
List of Tables 3
List of Algorithms 4
List of Abbreviations 5
1 Introduction 6
  1.1 Motivation 6
  1.2 Research Problem 6
  1.3 Objective 8
  1.4 Structure 9
2 Literature Review 10
  2.1 Dynamic Pricing 10
  2.2 Comparison Shopping Agents 12
    2.2.1 Price Dispersion 13
    2.2.2 Customer Characteristics 16
    2.2.3 Reseller Characteristics 16
  2.3 Dynamic Pricing Strategies 17
  2.4 Pricing Strategy Extraction 20
  2.5 Price Prediction 21
3 Market Review of Repricing Providers 24
  3.1 Repricing Providers in Germany 25
  3.2 Repricing Providers in USA 25
  3.3 Discussion 26
4 Competitive Market Analysis 29
  4.1 Approach and Settings 29
  4.2 Implementation 31
  4.3 Results 32
    4.3.1 1D: All Offers 32
    4.3.2 1D: Product Categories 35
    4.3.3 1D: Products 36
    4.3.4 1D: Resellers 37
  4.4 Discussion 38
5 Analysis 39
  5.1 Automated Repricing Classification 40
    5.1.1 Concepts 40
    5.1.2 Approach 43
    5.1.3 Evaluation 46
    5.1.4 Discussion 48
  5.2 Price Prediction 50
    5.2.1 Concepts 50
    5.2.2 Approach 52
    5.2.3 Evaluation 58
    5.2.4 Discussion 63
  5.3 Pricing Strategy Extraction 66
    5.3.1 Concepts 66
    5.3.2 Approach 66
    5.3.3 Evaluation 69
    5.3.4 Discussion 74
6 Conclusion 76
Bibliography 78
A Product Selection Process 84
B Classification Feature Selection Algorithms 86
C Classification Classifiers Grid Search Configuration 88
D Evaluation of Different Balancing Schemes 89
E Detailed Classification Results 90
F Large Decision Tree Examples 92
G Prediction Classifier Grid Search Configuration 93
H Start Hour Prediction Comparison 94
I Detailed Minimum Price Prediction Results 95
J Detailed Reseller Price Prediction Results 97
List of Figures
1 The environment of a repricing provider (from an e-commerce perspective). 7
2 A typical offer section from a CSA (idealo.de). 12
3 The market analysis concept. 31
4 The offer origin on idealo.de. 34
5 Analysis of price trends on a day of the week base. 34
6 Delta analysis of all offers by different time horizons. 36
7 Product categories under consideration of different deltas. 37
8 Product with GTIN 8628264 with two high frequency repricing resellers. 37
9 Analysis overview. 39
10 A simple decision tree example. 40
11 5-fold cross validation partitioning scheme. 42
12 The evaluation scheme of the automated repricing classification. 45
13 Automated repricing ratio of categories. 47
14 A generated C4.5 tree. 48
15 Classification prediction results. 48
16 Transition of the classification prediction results from theory to practice. 49
17 5-fold time series cross validation partitioning scheme. 50
18 A simple regression tree example. 51
19 The price delta prediction concept. 54
20 The evaluation scheme of the decision/regression tree price predictor. 57
21 Minimum price prediction results for simple delta. 60
22 Minimum price prediction results for direction delta. 61
23 Minimum price prediction results for absolute delta. 61
24 A grown M5 tree for minimum price prediction. 62
25 Reseller price delta prediction results of the car category. 63
26 A grown M5 tree for reseller price prediction. 64
27 The pricing strategy extraction pipeline. 67
28 Extracted pricing strategies. 70
29 Interlink between mein-reifen-outlet.de and giga-reifen.de. 71
30 Interlink between acom-pc.de and future-x.de. 71
31 Night time frame strategy. 72
32 Daily assortment repricing strategy. 73
33 The target position strategy in action (GTIN 3439602810019). 74
34 Different balancing schemes with REP trees. 89
35 A generated C4.5 tree of medium size. 92
36 A generated C4.5 tree of large size. 92
37 Start hours and RMSE stability. 94
List of Tables
1 Baseline DP directions based on Boer (2015) and Gönsch et al. (2013, p. 511). 10
2 Price dispersion explanation approaches based on Grover et al. (2006, pp. 300-302). 15
3 Summary of observed strategies. 24
4 The underlying strategy parameters. 25
5 Repricing providers in Germany. 26
6 Repricing providers in the USA. 28
7 Examples of deltas and delta ratios. 29
8 The composition of a product quintuple. 29
9 The 100 selected products. 30
10 The different offer analyzers. 33
11 Minimum price trends of selected categories. 35
12 Overview of classification features. 44
13 Overview of time series prediction methods of R's forecast package. 53
14 Overview of prediction features. 56
15 Price persistence ratios of the decision tree approaches (predictive car category with all resellers). 64
16 Correctly identified target position strategies. 73
17 Top 40 (10/25/2015 - 13 PM) of Billiger.de. 84
18 Top 40 mapped categories of Billiger.de. 85
19 Product category selection. 85
20 Classification grid search parameters. 88
21 Base results of the automated repricing classification. 90
22 Detailed results of the automated repricing classification. 90
23 Preferred features of the automated repricing classifiers. 91
24 Price prediction grid search parameters. 93
25 Minimum price prediction results for daily simple price deltas. 95
26 Minimum price prediction results for daily direction price deltas. 96
27 Minimum price prediction results for daily absolute price deltas. 96
28 Reseller price prediction results for the car product category. 97
List of Algorithms
1 Random forest algorithm. 41
2 Interlink strategy extractor scheme. 69
3 Greedy feature selection algorithm. 86
4 Binary feature selection algorithm. 87
List of Abbreviations
API Application Programming Interface
AR Automated Repricing
ARIMA Autoregressive Integrated Moving Average
BATS Box-Cox transform, ARMA errors, Trend, and Seasonal components
CSA Comparison Shopping Agent
DSHW Double-Seasonal Holt Winters
DP Dynamic Pricing
ETS Exponential Smoothing
GTIN Global Trade Item Number
HW Holt Winters
KPI Key Performance Indicator
LR Linear Regression
MAE Mean Absolute Error
MLP Multilayer Perceptron
MR Manual Repricing
NNETAR Neural Network Auto Regression
PPR Price Persistence Ratio
RMSE Root Mean Squared Error
ROC Receiver Operator Characteristic
SMOTE Synthetic Minority Over-sampling Technique
STL Seasonal and Trend decomposition using Loess
SVR Support Vector Regression
TBATS Trigonometric BATS
UTC Coordinated Universal Time
1 Introduction
Resellers on comparison shopping agents often use sophisticated dynamic repricing policies for
setting prices. Drawing conclusions from prices to their underlying pricing strategies is a major
challenge. In this thesis, the market dynamics of a comparison shopping agent are analyzed and
machine learning methods are applied in order to reverse engineer pricing knowledge. Pricing
insights can have a great impact from an online retailer's perspective, since future prices can be predicted, which in turn is the key to Pareto-optimizing one's own prices.
1.1 Motivation
Dynamic pricing describes price analysis and price adjustments of products or services in a
market environment, where prices can easily and frequently be adjusted (Boer 2015a, p. 2).
That characterizes the environment in which online resellers operate. Often there is a fine pricing line between minimizing lost sales and maximizing margins. Advanced pricing strategies have to be implemented to reach the online reseller's goals. The application of dynamic pricing software for automated price determination, so-called repricing tools, is a key factor for maintaining margins in electronic commerce. It can be assumed that such price intelligence is emerging and widespread, with adoption of up to 25% among online resellers (Baird and Rosenblum 2013, p. 21 (US); Skorupa 2014, p. 2 (worldwide)).
However, repricing tools only consider the current pricing situation. Developing and applying pricing strategies is a crucial task for an online reseller, and the manual implementation of pricing strategies is arduous, time consuming and thus ineffective. For this reason, repricing providers arose. They offer automated repricing services relying on a broad spectrum of pricing strategies and corresponding parameters. Researchers have noted that repricing providers have to learn, adapt to and anticipate changes in the dynamic e-commerce environment (Kephart et al. 2000, p. 749). So what do those artificial pricing strategies look like, and is it possible to deduce them?
The thesis is fostered by cooperation with a leading German repricing provider which grants
access to its crawling framework. The crawled dataset consists of 21.6 million offers for 100
products of a single German comparison shopping agent (CSA). Building on top of a fine-grained
market analysis, this master thesis analyses historic reseller price series on a CSA with machine
learning methods. Primarily, state-of-the-art decision tree approaches are applied. The intended
objective is to derive pricing strategies of resellers on product level by using exploited pricing
intelligence. If the pricing strategies of the competitors are known, the future pricing structure
can be predicted. Vice versa, new repricing algorithms can be developed in order to optimize
prices from an online retailer’s perspective. Further, this thesis will show that pricing strategies
do not necessarily have to be known in order to make good forecasts of price changes.
1.2 Research Problem
This subsection describes the environment of a repricing provider which is shown in figure 1.
Figure 1: The environment of a repricing provider (from an e-commerce perspective).

On the one hand, there are customers, who may have a basket of desired goods at a specific point in time. These customers pursue different goals, such as buying their products for the
lowest price, buying only from trustworthy shops, buying only products with a fast delivery,
or a combination of the mentioned intentions. These goals are decomposed into parameters. The
customers may use a CSA.
A CSA acts as an intermediary and periodically aggregates pricing information from multiple
online retailers. This information is provided as a price overview at product level. A CSA typically collects further offer information like delivery time and retailer rankings.
On the other hand, there are online retailers who may decide to spend money for being listed on
the CSAs. An online retailer typically performs price adjustments in order to achieve his goals.
This process is further called pareto price optimization. The goals can reach from maximizing
his margins/sales/customer satisfaction to minimizing costs/stocks/delivery time. The pareto
price optimization can be performed either manually or automated by following a specific ap-
proach.
A competition-based automated approach is offered by repricing providers.4 The main repricing service covers crawling offers from CSAs and calculating price recommendations according to the chosen strategy of the online retailer. In this way the customer's criteria are indirectly represented. Usually the online retailer wants to be at the top of the CSA's list for the purpose of attracting many customers.

4 During the further course of this thesis, automated competition-based repricing activities are associated with repricing providers. However, there also exist online retailers who have self-developed solutions for this purpose.
The repricing providers consider prices at a specific point in time (depending on the crawling interval). However, the calculated price recommendations are future-oriented and valid until the next offer crawling process. In a dynamic market environment the pricing strategy of an online retailer may therefore not be fulfilled. This is exactly where this thesis continues: by trying to derive the other online retailers' pricing strategies and by predicting prices for the next time frame.
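A minimal sketch can illustrate how such a competition-based recommendation works. The rule below is a hypothetical target-position strategy (the function and parameter names are invented for illustration, not taken from any actual repricing provider): it aims at a given rank in the CSA's price list, undercuts the competitor currently at that rank by a configurable price gap, and clips the result to the retailer's minimum/maximum price bounds.

```python
def recommend_price(competitor_prices, min_price, max_price,
                    price_gap=0.01, target_position=1):
    """Hypothetical target-position rule: undercut the competitor at
    `target_position` by `price_gap`, clipped to [min_price, max_price]."""
    ranked = sorted(competitor_prices)
    if len(ranked) < target_position:
        # Too few competitors to target the desired rank: fall back to the cap.
        return max_price
    candidate = ranked[target_position - 1] - price_gap
    return round(min(max(candidate, min_price), max_price), 2)

# Undercut the cheapest competitor by one cent, but never go below the floor.
print(recommend_price([19.99, 21.50, 24.90], min_price=18.00, max_price=25.00))
```

In practice such a recommendation is only valid until the next crawling interval, which is precisely the gap this thesis addresses.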
1.3 Objective
This thesis aims to perform an in-depth examination of pricing dynamics and their underlying pricing strategies on CSAs, both theoretically and practically. In order to fulfill this goal, the thesis addresses three central research questions:
1. What are the pricing dynamics on a CSA? Can price series provide enough information in
order to derive advanced pricing insights?
2. To which extent can pricing strategies be extracted?
3. How precise can prices be predicted using the gained pricing knowledge?
A fine-grained evaluation should be conducted by applying state-of-the-art machine learning methods and comparable approaches. Among the machine learning methods, the focus is on decision trees. The developed approaches should be wrapped in proofs of concept. This thesis pioneers by contributing:
• A thorough market analysis regarding price changes on a CSA
• Extracting pricing strategies on a real dataset of a CSA
• Applying a decision tree approach to predict prices on a CSA
• Predicting prices on a CSA for all resellers instead of focusing on minimum prices
The practical implication consists of providing hints for transforming this thesis' applied approaches into applicable features for repricing providers.
The scope is limited to a business perspective, more precisely a repricing provider's perspective. Consumers and CSAs are considered as black boxes. This implies that a consumer perspective and corresponding questions (like how pricing strategies influence the buying decisions of different kinds of consumers) are out of scope for the evaluation. The CSAs are considered as black-box online marketplaces. This means that questions like how a CSA can influence prices on its platform are out of scope, too. The price prediction is limited to the next period. The thesis operates on historical price information. This implies that no active market intervention takes place. It is assumed that information like internal sales or demand data is not available to the repricing providers.
1.4 Structure
The remainder of this thesis is structured as follows: The next chapter gives a conceptual overview ranging from dynamic pricing basics to CSA mechanisms. In chapter three, the market of repricing providers is examined with a focus on Germany and the USA. Based on these findings, repricing strategies and corresponding parameters are derived. A fine-grained market analysis of a German CSA is conducted in chapter four. The main chapter five comprises evaluated proofs of concept for automated repricing classification, price prediction and pricing strategy extraction. The sixth chapter concludes the thesis.
2 Literature Review
This chapter supplies background information ranging from dynamic pricing to price prediction. Basic pricing mechanisms on CSAs are explained. Key messages from related work are
embedded in the context of the thesis’ subjects.
2.1 Dynamic Pricing
Dynamic Pricing (DP) describes price analysis and price adjustments of products or services in
a market environment, where prices can easily and frequently be adjusted (Boer 2015a, p. 2).
Such price adjustments are shaped by their realtime character (Lin and Sibdari 2009, p. 969).
Since online retailers operate in such a changeful environment (Boer 2014, p. 863), it is essential to understand the basic processes and directions of DP, particularly since pricing is a vital aspect of a reseller's activities due to its close link to economic success (Kopalle et al. 2009).
DP has its origin in the travel industry decades ago and was subsequently deployed in the retail industry (Chen and Chen 2014, p. 1). It enables resellers to increase revenue by synchronizing supply with demand, to respond to dynamic demand patterns and to segment customers (Chen and Chen 2014, p. 1). DP models can be distinguished into four baseline directions, as presented in table 1.
DP Direction       Characterized by
Demand-based       Further differentiation into static and dynamic demand curves with different consumer types
Competition-based  Number of competitors in the modeled market
Learning-based     Pricing policies that consider uncertainty regarding the relation between price and expected demand
Inventory-based    Depends on the reseller's capacity (reaching from limited to infinite inventory levels)

Table 1: Baseline DP directions based on Boer (2015) and Gönsch et al. (2013, p. 511).
Demand-based DP is denoted by customer differentiation. Most classical demand-based DP models assume myopic customer behavior, where customers buy products as soon as the price falls below their product valuations (Levin et al. 2009, p. 32). Strategic customer behavior, in contrast, takes future prices into account (Elmaghraby and Keskinocak 2003), which can be seen as a counteraction to DP. Strategic customers take the option of delaying their purchases into account (Levin et al. 2009, p. 41). Strategic behavior of customers can have serious impacts on revenues when DP is not used (Levin et al. 2009, p. 32). Typically, a reseller wants to adjust his prices in line with demand. This alignment has the intention of skimming the customers' reservation prices (Lin and Sibdari 2009, p. 969). A fundamental problem lies in the reseller's state of not knowing the consumers' response to different selling prices. Hence, the revenue-optimizing prices cannot be known in advance (Boer 2015b, p. 1). Most studies in this field are written under the assumption of stable demand functions, which is not realistic.
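The myopic model can be made concrete with a small sketch (all numbers invented): each customer buys as soon as the posted price is at or below his reservation price, so demand at a price is simply the count of valuations at or above it. The reseller's fundamental problem is that the valuations are unobserved, so the revenue-optimal price computed below cannot be found in practice.

```python
def demand(price, valuations):
    """Myopic customers buy immediately once the posted price is at or
    below their reservation price."""
    return sum(v >= price for v in valuations)

def best_price(candidates, valuations):
    """The revenue-optimal price maximizes price * demand(price); a real
    reseller cannot compute this because valuations are unobserved."""
    return max(candidates, key=lambda p: p * demand(p, valuations))

valuations = [5, 8, 8, 12, 15, 20]   # hidden reservation prices (invented)
candidates = [5, 8, 10, 12, 15, 20]  # prices the reseller could post
p = best_price(candidates, valuations)
print(p, p * demand(p, valuations))  # best price and the revenue it yields
```

This also shows why strategic customers complicate matters: if buyers delay purchases anticipating lower prices, the valuations themselves become price-path dependent.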
The reseller has to pay attention when applying customer segmentation with DP. Customers expect price changes as long as there have been price changes in the past (Bergen et al. 2003, p. 668). But there is a fine line between a negative reaction caused by the perception of price discrimination and potential economic benefits, especially since this practice relies on loyal customers with low price sensitivity (Weisstein et al. 2013, p. 505).
The most interesting direction for this thesis is competition-based DP. This direction requires monitoring of the competitors' prices. A common assumption in corresponding models is that each reseller has in-depth knowledge about the market participants. For example, this knowledge may include a reseller's pricing strategy or remaining capacity and the customers' reservation prices or demand curves. This assumption is unrealistic (Sato and Sawaki 2013, p. 223).
Lin and Sibdari (2009) develop a game-theoretic model for DP which is in accordance with
the basic nature of CSAs. It considers competition and price comparison shoppers. Their myopic
buying decisions are based on prices, inventory levels and reservation prices. However, Lin and
Sibdari (2009, p. 971) assume real-time inventory levels as public information.
Levin et al. (2009) develop a stochastic game-theoretic model with DP under competition
and a dedicated strategic customer model. The authors conclude that strategic customers reduce the reseller's profit. Additionally, if myopic customers are not treated as such, the reseller's profit decreases as well. This model requires perfect knowledge of the market information, including remaining capacity and market segments, for both resellers and customers.
Currie et al. (2007) present a DP model for airline tickets. It is characterized by limited
inventory, a fixed time constraint, finite horizon, changing ticket demand and two competitors.
The competitors' prices can be modeled with any price function. This function needs to be
known in advance, but this could be improved by using forecasts as an alternative. The actual
optimization problem is solved by calculus of variations and Lagrangian multipliers.
DP with assortments needs special treatment by modeling cross-interactions (Kachani and
Shmatov 2010).
Learning-based DP tries to derive the relation between price and market response. Boer (2015) uses historic sales data and a corresponding estimator. He forecasts sales via a sliding-window linear regression, giving the most recent sales higher weights. However, the model operates under the assumption of a monopolist.
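A simplified sketch of such an estimator (illustrative only, not Boer's actual model; all names and numbers are invented): fit a linear relation between price and sales on the last few observations, with exponentially decaying weights so that recent sales count more.

```python
def weighted_linreg(xs, ys, weights):
    """Closed-form weighted least squares fit of y = a + b * x."""
    total = sum(weights)
    xm = sum(w * x for w, x in zip(weights, xs)) / total
    ym = sum(w * y for w, y in zip(weights, ys)) / total
    b = (sum(w * (x - xm) * (y - ym) for w, x, y in zip(weights, xs, ys))
         / sum(w * (x - xm) ** 2 for w, x in zip(weights, xs)))
    return ym - b * xm, b  # intercept, slope

def forecast_sales(prices, sales, window=5, decay=0.8):
    """Fit on the last `window` (price, sales) pairs; the most recent
    observation gets weight 1, the one before it `decay`, and so on.
    Returns a function predicting sales at a candidate price."""
    px, sy = prices[-window:], sales[-window:]
    w = [decay ** (len(px) - 1 - i) for i in range(len(px))]
    a, b = weighted_linreg(px, sy, w)
    return lambda price: a + b * price

predict = forecast_sales([10, 11, 12, 13, 14], [50, 46, 41, 37, 33])
print(round(predict(15), 1))  # expected sales if the price is raised to 15
```

The decaying weights implement the "most recent sales get higher weights" idea; the sliding window keeps the estimate responsive to a changing price-demand relation.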
Primarily, resellers have to perform price experiments in order to learn about the price
which generates the highest profit (Boer 2014, p. 863).
DP can be further used for improving inventory and capacity management (Transchel and
Minner 2009). The interesting aspect of inventory-based DP is that this approach indirectly considers demand, which in turn influences the inventory.
From a reseller's point of view, it is paramount that one's own pricing strategy is interwoven with the competitors' and has bidirectional effects (Kopalle et al. 2009).
2.2 Comparison Shopping Agents
A Comparison Shopping Agent (CSA), also known as price comparison website, shopbot or price
comparison engine, acts as an intermediary between customers and resellers. The CSA periodically
aggregates objective data (e.g. prices) and quantified subjective data (e.g. service quality) from
multiple online retailers.
CSAs are a popular resource for strategic customers. As early as 2001, 45.7% of hardware online shoppers used CSAs (Zhang and Jing 2011, p. 3). According to a study by Aprimo (2012), 96% of smartphone users intend to perform price comparisons in the future. Over 50% of smartphone users use their device in local stores for price comparisons, with consumer electronics (39%) being the top category for mobile price comparison.
A typical offer section from a CSA is shown in figure 2. It provides product description,
pricing information, availability, reseller reputation and an affiliate link. This information is
supplied to the customer as sorted, quickly accessible price overviews at product level.
Figure 2: A typical offer section from a CSA (idealo.de).
In general, CSAs can be categorized by the type of relationship to their resellers (Wan et al. 2003, pp. 500-501):
• Independent CSA: There exists no partnership, and ads are displayed on the price comparison website.
• Dependent CSA: There exists a contractual partnership in which the reseller pays for the offered services.
• Embedded CSA: The comparison mechanism is integrated into the platform itself, as implemented by Amazon's marketplace.
According to Moraga-González and Wildenbeest (2011, p. 6), the business models of CSAs can be distinguished based on their revenue model:
1. The customers don't have to pay, and the resellers are charged either a flat fee or, more recently, per click (cost-per-click). The fees can be category-dependent, as on Pricegrabber, or calculated per transaction, as on pricefight.com.
2. Free for both parties, e.g. Google Shopping before February 2013.
3. The customers are charged, which is less common.
For example, geizhals.at is an Austrian dependent CSA which has implemented the first business model. Sellers have to pay fixed fees for clickthroughs. The fee is reduced if geizhals.at is embedded on the reseller's web site. In Austria, electronic online resellers cannot afford not to be represented on geizhals.at (Hackl et al. 2014, p. 202).
There is considerable uncertainty about the quality of a CSA (Clement and Schreiber 2013, pp. 265-269; Mei-Pochtler and Hepp 2013, p. 78). Since a CSA may not provide accurate and complete information, customers have to use multiple CSAs or conduct their own online price research (Pathak 2012, p. 64; Zhang and Jing 2011). A high perceived quality of a CSA is important because it increases the customer's purchase intention (Bretschneider et al. 2015, pp. 46-51) and therefore the CSA's revenue. There also exist less common meta/derivative CSAs which have no direct connection to resellers but rather crawl other CSAs (Wan et al. 2003, pp. 502-503). Examples include roboshopper.net and meta-preisvergleich.de.
Typically, a CSA only provides product price comparisons. A more sophisticated kind of problem is the price comparison of whole consumer baskets. This class of optimization problems is known as the 'Internet Shopping Optimization Problem'. Błazewicz et al. (2010, pp. 386-387) prove that this kind of problem is NP-hard5. The CSA geizhals.de provides such a consumer basket optimization with a brute-force approach bounded by a time constraint.
Pathak (2012, pp. 69-70) discovers significant temporal delays of up to 3.39 days between prices in online shops and on six major CSAs. Reasons for incomplete information can be found in temporal delay and selection bias (Pathak 2012, p. 65).
2.2.1 Price Dispersion
During the early stages of electronic commerce, a transformation into archetypal economic models was predicted (Brynjolfsson and Smith 2000). The media jumped on the economic bandwagon and made auspicious promises (Economist 1999):
THE explosive growth of the Internet promises a new age of perfectly competitive mar-
kets. With perfect information about prices and products at their fingertips, consumers
can quickly and easily find the best deals. In this brave new world, retailers’ profit
margins will be competed away, as they are all forced to price at cost.
5 NP-hard is a complexity class in computer science comprising problems that are at least as hard as those solvable in non-deterministic polynomial time.
At first glance, the prediction seems to be justified by findings like the following: A CSA is characterized by almost zero sunk costs6, minimal resource requirements and market transparency (Haynes and Thompson 2008a, p. 4; Haynes and Thompson 2008b, p. 471). Homogeneous products are offered, which in general are well suited for price comparison (Clement and Schreiber 2013, pp. 267-268). A CSA enables a strong reduction of the customer's search costs (Bakos 1997; Ellison and Ellison 2009, p. 428). Further, a CSA curbs resellers' obfuscation techniques (Ellison and Ellison 2009) such as false prices or excessive delivery costs. This reduction has been confirmed with test purchases (Baye; Morgan, and Scholten 2004, p. 18). CSAs establish a higher level of price transparency and reduce information asymmetries (Clement and Schreiber 2013, p. 285).
Why do all these characteristics not lead to price convergence in a multi-reseller environment? In theory, as long as all firms are Bertrand oligopolists and the customers are fully informed, all transactions take place at the perfectly competitive price (Bakos 1997, pp. 3,10; Baye; Morgan, and Scholten 2004, pp. 4-5,18). This statement is based on the Bertrand model, which implies that competition should cut prices until the marginal production costs are reached. The main assumptions are (Tirole 1988, pp. 209-212):
• The offered products are homogeneous.
• At least two resellers operate in the market, the resellers do not cooperate, and they face the same product costs.
• The customer is always a strategic buyer who has no search costs and only purchases at the lowest price.
Varian (1980) pioneered the field with a model that explains price dispersion via search theory. He differentiates between uninformed customers, who buy at random local shops, and informed customers, who know the price distribution of the local shops, e.g. from newspapers. The more resellers, the higher the price dispersion, which is explained by the different search costs of the two customer types. Counterintuitively, he predicts a positive correlation between the number of resellers and the average selling price.
Baye and Morgan (2001) transfer Varian's model to electronic markets, where CSAs act as intermediaries connecting customers and resellers. The customers are divided into those who use CSAs and those who do not. If all resellers were listed on the CSA, this would lead to Bertrand competition and hence to price convergence. However, the CSA desires price dispersion in order to sustain its business model. Hence, to maximize its profit, the CSA must set reseller fees high enough to prevent all resellers from joining. A shortcoming of this model is its assumption of buyer fees, which has not materialized in practice.
6 Sunk costs are already incurred irreversible costs.
Grover et al. (2006, pp. 300-302) conducted a meta-analysis and identified three main explanatory approaches for price dispersion in electronic markets. These findings are presented in table 2.
Explanatory Approach      Stated by    Exemplary Reasons
Search costs              10 papers    Reseller loyalty, reputation, product popularity
Service differentiation   6 papers     Fulfillment, ordering process, consumer satisfaction
Market characteristics    6 papers     Number of resellers, stage in product life cycle, average price
Table 2: Price dispersion explanation approaches based on Grover et al. (2006, pp. 300-302).
A key message for explaining price dispersion is the following correlation: the more resellers offer a product, the higher the price dispersion (Haynes and Thompson 2008b, p. 467; Baye; Morgan, and Scholten 2004), because new resellers introduce their offers with low prices (Bounie et al. 2012, p. 10).
Hackl et al. (2014) performed an analysis of resellers' margins with data gathered from geizhals.at combined with wholesale prices from a hardware producer. They rely on daily pricing data for 70 digital cameras from January 2007 until December 2008. Hackl et al. (2014) observe that the more resellers, the lower their margins. The number of substitutes is also essential, since it is negatively correlated with the margin. The further a product has progressed in its life cycle, the lower its price and, in turn, the lower the margin (Hackl et al. 2014, p. 215).
Many researchers have contributed valuable insights on price dispersion driven by the heterogeneity of the electronic market. On the customer side, the main differentiation is between strategic and myopic buyers (consulting a CSA or not) (Varian 1980, p. 652; Grover et al. 2006). Besides prices, customers consider shipping services, availability and reputation, among others (Klausegger 2009, p. 16). This variety of identified customer attributes of interest provides supplementary explanations of price dispersion (Zhang and Jing 2011, p. 2). Resellers therefore implement differentiation schemes in order to access different market segments with the corresponding customer groups (Clay et al. 2001, p. 521).
Furthermore, researchers discovered: the more information overload7, the higher the price dispersion in electronic markets; likewise, the more information equivocality8, the higher the price dispersion (Grover et al. 2006).
In summary, the electronic commerce reality has shown that price dispersion is pervasive.
Consequently, the prerequisite for different pricing strategies is ensured.
7 In terms of incomplete information: may lead to ineffective decisions.
8 Online feedback systems like consumer ratings are needed for online buying decisions.
2.2.2 Customer Characteristics
Prices on CSAs have great impact on the customer's price perception. As a result, prices on CSAs serve as internal reference prices and acceptable price ranges (Jung et al. 2014, p. 2084; Broeckelmann and Groeppel-Klein 2008). Notably, consumers are more sensitive to shipping costs than to item prices (Brynjolfsson and Smith 2001, p. 5).
The top three offer selection criteria in a CSA are price, availability and reseller rating. Surprisingly, the main intention of using a CSA is not searching for the lowest offers (42%) but researching the best-fitting products and actually available manufacturers (51.3%). Thus, a CSA has great influence on the manufacturer selection (69.4%). The study was conducted by the Austrian CSA geizhals.at by asking 2,000 of their users in 2009 (Klausegger 2009).
The position on CSAs is crucial for strategic customers. Findings on the clickthrough behavior on search result pages can be transferred, since the results are ranked in the same manner. Petrescu et al. (2014) analyzed 465,000 keywords on 5,000 websites of Google search results. 67.6% of the clickthroughs are generated by the top five hits.
According to Brynjolfsson and Smith (2001, p. 15), 49% of the CSA users chose the lowest offer for books on the former CSA evenbetter.com. Baye et al. (2009) reveal that moving from first to second place results in a loss of 60% of clicks. Their dataset consists of PDAs offered on kelkoo.com. Further, they discover a loss of 17% in clickthrough rates for each competitor positioned above. That is a vital point for the further course of this thesis: since top positions in a CSA generate more clicks and hence sales, it is reasonable for resellers to try to reach top positions with advanced pricing strategies.
2.2.3 Reseller Characteristics
The main motivation for a reseller to be listed on CSAs is gaining more visibility in order to increase sales. Schieder and Lorenz (2012, pp. 18,20) have carried out a study about the general usage of pricing intelligence with 44 online resellers. 30% of the resellers use methods of 'dynamic price optimization', and 61.5% of those report a clear profit increase.
A study from the Austrian CSA geizhals.at confirms the findings above: 60.7% of their listed resellers observe increases in sales and profits after being listed. The main reason for being listed on a CSA is acquiring new customers. The results are based on an online survey with 89 resellers (Klausegger 2011, pp. 7-8).
The more customers use CSAs, the higher the resellers' pressure to be listed (Clement and Schreiber 2013, p. 267). Unfortunately for resellers, customers show great loyalty to a CSA but not to the resellers (Zhang and Jing 2011, p. 8). However, a good reseller reputation is important, especially for resellers charging a price premium (Bodur et al. 2015, p. 137). Hence, the reseller rating (which quantifies the reseller reputation) is positively correlated with the reseller choice made by customers on CSAs (Bodur et al. 2015, p. 135). Brynjolfsson and Smith (2001, p. 45)
discovered that retailers with highly rated reputations and previously visited retailers have significant price advantages of 3.1% and 6.8%, respectively, in the customers' view. Waldfogel and Chen (2006, pp. 447-448) dispute the importance of reseller reputation. They state that the more CSAs are used, the less important the reputation of the reseller becomes. The reason for this assertion can be found in the increasing price sensitivity and the accompanying decrease in loyal customers (Kocas 2002, pp. 117-118).
The top three pricing business challenges are: Increased price sensitivity of consumers (55%),
increased pricing aggressiveness of competitors (48%) and increased price transparency (47%).
This statement originates from a study based on 123 worldwide online resellers (Baird and
Rosenblum 2015, p. 6).
Riekhof and Wurr (2013, p. 10) asked 231 German resellers about the main obstacles to pricing. The top two answers are cost calculations (88%) and competitor analysis (70%).
Based on a dataset from Amazon US/UK/FR, Bounie et al. (2012, p. 1) analyze the Amazon marketplace. They observe a reseller price adjustment only every 20th day. That is remarkably low, since the market analysis in chapter 3 of this thesis reveals that resellers adjust their prices on average every third day in a current dataset. The last point illustrates a problem of the related work with focus on CSAs: many of these studies lack current datasets, and old datasets are reused repeatedly, e.g. Bounie et al. (2012) use a dataset from 2006, Zhang and Jing (2011) use a dataset from 2001 and Ellison and Ellison (2009) use a dataset from 2000-2001. Since then, CSAs have evolved, more and more repricing providers with high-frequency repricing have entered the market (see chapter 3.1), and price comparison can be performed even more easily using dedicated mobile apps. So, some papers are already outdated before they are published. This thesis tries to overcome that issue with an in-depth market analysis of a recent dataset. The dataset is backed by a high-frequency crawling interval of 15 minutes, which results in an unprecedented amount of crawled offers for CSAs in the literature (21.6 million offers).
2.3 Dynamic Pricing Strategies
This chapter summarizes theoretical automated repricing approaches developed in the literature. Different reseller pricing strategies can sustain themselves side by side because they map onto the heterogeneous buying strategies of the customers (Grover et al. 2006). Since manual repricing is slow and expensive and thus inflexible, a growing need for automated pricing strategies arose. Multiple pricing strategies have been developed and tested in simulated CSA environments:
Undercutting Strategy | competition-based | Deck and Wilson (2003)
This strategy's main action consists in undercutting the lowest price by a fixed amount. Supplementary, minimal and maximal price boundaries are set. As soon as the lowest price can't be undercut within these boundaries, the maximal price is set. The resulting prices are the same as in a game-theoretic prediction, so this is probably what manual price setting converges to.
Low Price Matching Strategy | competition-based | Deck and Wilson (2003)
This strategy tries to match the lowest price. Price boundaries exist here too. As long as the lowest offer can't be reached, the next reachable price is matched. Compared to a game-theoretic prediction, the resulting prices are higher.
Trigger Pricing Strategy | competition-based | Deck and Wilson (2003)
This strategy starts by setting an initial price. If another reseller is at or below a threshold (trigger), an associated new price is set. The resulting prices are lower compared to a game-theoretic prediction.
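The three competition-based rules above reduce to a few lines each. The following sketch uses illustrative function and parameter names; the exact boundary handling in the cited papers may differ:

```python
def undercut(competitor_prices, step, p_min, p_max):
    """Undercutting: beat the lowest competitor price by a fixed step;
    fall back to the maximal price when the lower boundary is violated."""
    target = min(competitor_prices) - step
    return target if target >= p_min else p_max

def match_lowest(competitor_prices, p_min, p_max):
    """Low price matching: match the lowest competitor price that is
    still reachable within the boundaries."""
    reachable = [p for p in competitor_prices if p >= p_min]
    return min(reachable) if reachable else p_max

def trigger_pricing(competitor_prices, threshold, trigger_price, initial_price):
    """Trigger pricing: switch to a predefined price once any competitor
    is at or below the threshold, otherwise keep the initial price."""
    return trigger_price if min(competitor_prices) <= threshold else initial_price
```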
Beat Half the Market Strategy | competition-based | Hertweck et al. (2009)
This strategy aims for a middle position in the CSA rankings.
Tiered Pricing Strategy | learning-based | Dasgupta and Melliar-Smith (2003)
Dasgupta and Melliar-Smith (2003) introduce a strategy which attempts dynamic pricing by deriving the customer's purchase intention. The intention is classified as price-sensitive (comparison shopping) or price-insensitive based on the reseller selection criterion and the historic purchase behavior. The strategy tries to learn the buyers' reservation prices. The price-insensitive buyers are charged higher prices. The prices for price-sensitive buyers are calculated by fitting a polynomial to historical price and profit data. This fit is used to predict future prices and profits via non-linear regression. In theory, this strategy can increase the reseller's profit by up to 20%. The feasibility of deriving the customer's purchase intention has been confirmed by other researchers, e.g. Moe (2003). She shows that resellers can differentiate the shopping behavior of their customers by analyzing clickstream data. She classifies the derivable shop visitation behavior into directed buying, search/deliberation, hedonic browsing and knowledge building.
Reinforcement Learning Strategy | learning-based | Kephart et al. (2000)
The reinforcement learning9 strategy is based on Q-Learning and learns anticipated future dis-
counted profits. Subsequently, the repricing policy with the highest future discounted profit is
chosen.
Profit Price Adaption Strategy | learning-based | Kutschinski et al. (2003)
This strategy estimates profits based on current price and price/profit history with a single state
Q-Learner.
Q-Learner Strategy | learning-based | Kutschinski et al. (2003)
This strategy uses Q-Learning in combination with a Boltzmann price selection mechanism. At
the beginning, this mechanism allows a wide range of possible profit functions and keeps getting
9 Reinforcement learning is learning from feedback (Kutschinski et al. 2003, p. 2209). Q-Learning (Watkins andDayan 1992) is a reinforcement learning technique which is based on dynamic programming. It learns to actoptimally in Markovian environments by experiencing the consequences of actions.
2 Literature Review 18
more restrictive after each iteration. It learns a profit function and tries to undercut competitors.
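A minimal single-state Q-learner with Boltzmann price selection could look as follows; the class name, parameter values and the temperature floor are illustrative assumptions, not taken from Kutschinski et al. (2003):

```python
import math
import random

class BoltzmannQPricer:
    """Single-state Q-learner over a discrete price grid with
    Boltzmann (softmax) exploration."""

    def __init__(self, prices, alpha=0.2, temperature=2.0, cooling=0.95):
        self.prices = prices
        self.q = {p: 0.0 for p in prices}  # estimated profit per price
        self.alpha = alpha                 # learning rate
        self.temperature = temperature     # high temperature = wide exploration
        self.cooling = cooling             # temperature shrinks each iteration

    def choose_price(self):
        # Boltzmann selection: higher estimated profit -> higher probability.
        weights = [math.exp(self.q[p] / self.temperature) for p in self.prices]
        return random.choices(self.prices, weights=weights)[0]

    def update(self, price, observed_profit):
        # Move the profit estimate towards the observed profit ...
        self.q[price] += self.alpha * (observed_profit - self.q[price])
        # ... and make the selection mechanism more restrictive (greedier),
        # with a floor to keep the softmax numerically stable.
        self.temperature = max(self.temperature * self.cooling, 0.05)
```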
Derivative Following Strategy | key performance indicator | Kephart et al. (2000)
This strategy is detached from competition and customers. It consists of incremental price changes in one direction. The strategy's pricing behavior can be aligned with key performance indicators like profitability or revenue. As soon as the indicator decreases, the direction of the price changes is reversed. The adjustments can be enhanced by adaptive stepwise price adjustments (Dasgupta and Das 2000, p. 4).
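One derivative-following iteration reduces to a simple sketch; the function signature and the fixed step size are illustrative assumptions:

```python
def derivative_following_step(price, step, last_profit, current_profit):
    """One iteration of derivative following: keep moving the price in the
    current direction while the performance indicator improves, and reverse
    the direction once the indicator drops."""
    if current_profit < last_profit:
        step = -step  # reverse direction on deteriorating profit
    return price + step, step
```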
Goal-directed Strategy | inventory-based | DiMicco et al. (2001)
This strategy is an inventory-based variation of the Derivative Following Strategy. The main input parameter is a time span by whose end a product should be sold. The strategy adjusts the product prices according to the inventory level and adapts based on inventory changes and time progress. There is no direct consideration of competitors and buyers.
Ramezani et al. (2011) present an advanced Goal-directed Strategy. It focuses on the number of products sold and the corresponding changes in inventory. An evolutionary algorithm is used for optimizing pricing step amplitudes and price change thresholds.
Game-theoretic Strategy | game-theoretic | Kephart et al. (2000)
The strategy calculates a random distribution of prices considering the ratio of strategic buyers, the buyers' reservation prices and the number of resellers. This strategy's drawback is that all this knowledge is required in advance.
In general, the presented competition-based approaches are over-simplistic; e.g., they can be reduced to modifications of the later introduced Target Position Strategy. Chapter 3 shows that, in practice, the pricing strategies in use are much more elaborate, with a phalanx of adjustable parameters. Furthermore, the complexity of repricing strategies increases with the growing number of configuration parameters (e.g. internal clickstream data like basket activities, sales, number of product views) (Meyer 2012, pp. 69-70).
The learning-based strategies, which all rely on Q-Learning, suffer from slow learning rates and unrealistic assumptions about the economic environment. The reason is the Markov property, which has to be fulfilled: it states that the environment is not allowed to change during learning (Kutschinski et al. 2003, p. 2209).
All theoretical strategies have only been probed in simulated markets, not under real conditions, so their actual impact cannot be assessed. In practice, Haynes and Thompson discover a skimming price strategy. They observe reseller fluctuations of up to 35% per week on nextag.com. This observation can be traced back to so-called 'hit-and-run' pricing strategies, where resellers enter the market with low prices for a short period of time until they exit (Haynes and Thompson 2008a, p. 19; Haynes and Thompson 2008b, p. 467).
In the airline ticket context, Sato and Sawaki (2013) state that knowledge of the competitor's pricing strategy has great impact on maximizing the expected revenue. Generally speaking, there exists no perfect pricing strategy. Depending on the degree of CSA usage and the competitors' strategies, different pricing strategies are more effective (Hertweck et al. 2009, pp. 166-168).
2.4 Pricing Strategy Extraction
Pricing intelligence is exceedingly useful for resellers. Presently, the top three applications are weekly price reviews, adaptive price adjustments and monthly/quarterly key reviews. This statement originates from a study based on 123 worldwide online resellers (Baird and Rosenblum 2015, p. 13).
However, extracted underlying pricing strategies go a step further and can be seen as blueprints for pricing intelligence. To the best of my knowledge, there exists only a single paper which addresses pricing strategy extraction on CSAs. The approach of Hertweck et al. (2010) consists of two main stages:
1. Classifying strategies of competitors
2. Providing best counterstrategies in a simulated market
They model a market with one product, 1,000 strategic and myopic customers, four competitors and a 30-day horizon. The competitors each use one of five common strategies: manual, lowest price match, trigger, derivative following, and beat half the market (see the previous subchapter 2.3).
During the first stage, random competitor strategies are created. Based on the historic prices, eleven basic statistical features are derived, e.g. the number of price changes, the price standard deviation and the average position. A modular neural network is trained for each strategy. The authors achieve a strategy accuracy of 65.3% up to 92.7%. At least three of four competitor strategies are correctly identified in 85.9% of cases.
The second stage consists of the calculation of a table containing the best counterstrategies for all strategy combinations.
Hertweck et al. (2010) conclude that a profitability increase of 2.4% is possible in their simulated environment.
Their concept shows several shortcomings: If their approach considered all competitors, disproportionately more computational effort would be needed to calculate all combinations. Furthermore, the more strategies have to be extracted, the lower the probability of identifying all of them correctly. However, concentrating on the top four competitors is a promising approach to reduce complexity and achieves good results. Comparing the used features, this thesis' features provide a wider spectrum of sophisticated measures (see table 12 with 40 historic features and table 14 with 15 current features). Hertweck et al. (2010) train and evaluate their model in a synthetic environment, whereas in this thesis a real dataset is used. Finally, resellers apply far more advanced strategies (see chapter 3), which the five applied basic strategies by no means cover.
2.5 Price Prediction
The closest and most sophisticated approach was provided by Decide with their service on decide.com.10 They offered a paid service for predicting the best time to buy a product online. Decide analyzed the offer price history on CSAs and informed their customers as soon as the lowest price for the next two weeks had been predicted. Supplementary, they granted a price guarantee compensating for suboptimal purchase recommendations: if a cheaper product price was offered within two weeks, the difference to the cheapest offer was refunded. Decide was acquired by eBay in September 201311 and the service was shut down. Decide's approach is based on the co-founder's previous paper:
Etzioni et al. (2003) predict prices for airline tickets and make recommendations for purchase decisions. They combine multiple techniques. Ripper (Cohen 1995) is used as a separate-and-conquer rule learner based on flight number, remaining hours until departure, current price and airline. Subsequently, Q-Learning is applied for making purchase decisions for the next interval. It is based on reinforcement learning and optimizes decisions based on discounted future rewards (negative and positive). A moving average model, a time series prediction technique, is used to make a secondary purchase decision. All techniques are combined via stacked generalization into aggregated rules. Etzioni et al. (2003) achieve 4.4% savings on ticket prices based on a real dataset containing 12,000 ticket prices within a 41-day period. Their savings correspond to 61.8% of the overall possible savings.
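The moving-average component can be illustrated as a simple buy/wait rule; the window size and the exact decision rule are assumptions for illustration, not the precise model of Etzioni et al. (2003):

```python
def moving_average_decision(prices, window=5):
    """Secondary purchase rule based on a moving average: buy when the
    current price falls below the recent moving average, otherwise wait."""
    recent = prices[-window:]
    avg = sum(recent) / len(recent)
    return "buy" if prices[-1] < avg else "wait"
```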
In principle, related papers concerning ticket prices in the airline industry are well suited for the context of this thesis. The airline industry varies prices by seasonality, availability and competition (Etzioni et al. 2003, p. 119), similar to electronic commerce. Both domains deal with uncertainty about future prices (Agrawal et al. 2011b, p. 709). Intermediaries also operate there in the form of flight CSAs, and ticket prices likewise exhibit a stepwise character. The differences are discussed later in this thesis.
A further analogy from the airline industry also addresses purchase recommendations for flight tickets. Domínguez-Menchero et al. (2014) exploit the general nature of flight prices, which manifests in the negative correlation with the remaining days until departure. Instead of flight prices, their model uses the reciprocal saving rates of the ticket prices for a horizon of 30 days. Every route is estimated by isotonic regression, which tries to find a best fit in a point cloud with non-increasing piece-wise functions.
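Such a non-increasing fit can be computed with the pool-adjacent-violators algorithm. The following is a generic sketch of antitonic least-squares regression, not the exact estimator of Domínguez-Menchero et al. (2014):

```python
def antitonic_fit(y):
    """Best non-increasing piece-wise constant least-squares fit to the
    sequence y, via pool-adjacent-violators applied to the negated series."""
    # Blocks of [weighted mean, weight] for the negated (non-decreasing) problem.
    blocks = []
    for v in (-x for x in y):
        blocks.append([v, 1])
        # Merge adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    # Expand the blocks back into a fitted value per observation.
    fit = []
    for mean, weight in blocks:
        fit.extend([-mean] * weight)
    return fit
```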
Groves and Gini (2015) provide a comprehensive approach which can be seen as the state of the art in airline ticket price prediction. Their algorithm can be applied to specified routes and travel dates. Daily decisions are made with a time horizon of up to 60 days. The underlying concept is composed of a dedicated feature selection algorithm which uses 92 historic and current features. The current features refer to a particular point in time. Subsequently, a
10 Accessed via the internet archive: https://web.archive.org/web/20130614192602/https://www.decide.com (visited on 09/30/2016).
11 https://www.crunchbase.com/organization/decide-com (visited on 09/30/2016).
regression model based on partial least squares is applied. Ripper is used for creating decision policies. Finally, a parameter search for finding the best configuration of the previous steps is applied. In an evaluation, their approach clearly outperforms the baseline approach of Etzioni et al. (2003).
This thesis adopts the separation of feature creation as well (see chapter 5.2.2).
The main difference between the domains of airline tickets and consumer products lies in their nature of usage. Waiting time for consumer products is associated with a loss in utility, whereas airline tickets don't lose utility when the ticket purchase is delayed. So, for consumer products, there exists a tradeoff between potential price drops and loss in utility (Agrawal et al. 2011b, pp. 709-710; Agrawal et al. 2011a, p. 352). In contrast, this thesis focuses on pure price prediction.
Groves and Gini (2015, 3:5-3:6) detect strong and cyclic patterns in airline ticket prices. Such patterns may emerge more clearly than in electronic markets. Further, Groves and Gini (2015, 3:7-3:8) differentiate three base types of competing airlines:
• The low category airlines, which compete with the cheapest offers.
• The medium category airlines, which perform aggressive pricing above the low category airlines.
• The high category airlines, which hold the price premium and rarely adjust their prices.
Compared to the complex strategies found in CSAs in chapter 3, these strategies are easier to predict. Additionally, ticket prices usually rise the closer the departure date comes (Domínguez-Menchero et al. 2014, p. 140), whereas product prices decline over their market cycle (Agrawal et al. 2011a, pp. 714-715). Hence, the approaches from the airline industry are a good baseline, but they need adjustments to match the CSA case.
Agrawal et al. (2011) transfer the knowledge of airline ticket purchase recommendations to the electronic commerce context. They implement a system that helps customers decide when to make a purchase. The system uses the price history and derived features in order to forecast future price distributions with autoregressive models and smoothing methods like Holt-Winters (see chapter 5.2.1). The resulting price distributions are used for building recommendation policies. Additionally, the authors take sales volume, seasonality and competitive products into account. However, sales data and data on all surrogate products are not available in most cases.
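Holt-Winters denotes exponential smoothing with level, trend and seasonal components. A minimal sketch of the trend-only variant (Holt's linear method) illustrates the principle; the smoothing parameters are illustrative defaults:

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Holt's linear (double exponential) smoothing: maintains a level and a
    trend estimate and extrapolates them `horizon` steps ahead."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        last_level = level
        # Smooth the level towards the new observation ...
        level = alpha * y + (1 - alpha) * (level + trend)
        # ... and smooth the trend towards the latest level change.
        trend = beta * (level - last_level) + (1 - beta) * trend
    return level + horizon * trend
```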
Lucchese et al. (2012) use a hedonic model for price prediction via autoregressive models in heterogeneous markets. It is based on the hedonic base assumption, which states that product quality can be disassembled into product features. Further, the product quality can be associated with a corresponding price. They consider multiple products and their co-dependencies.
The authors may achieve good results in simulated markets, but in practice all surrogates would have to be known and tracked. Besides, product prices are affected by more than the underlying product features, e.g. by competition, reseller-specific costs or seasonality.
Agrawal et al. (2011) develop an interesting modular concept for encapsulating purchase recommendations. They implement three different recommendation strategies for customers making product purchases on a CSA. Their strategies incorporate forecast algorithms as black boxes. Further, a loss in utility is considered, connected to the prolonged waiting time before the desired products can be used. The three recommendation strategies process the calculated forecast distributions by evaluating, for example, the future maximum or average expected utility.
Ahmad et al. (2016) investigate the offline version of price prediction under competition. They provide three different approaches for determining local competitors. Subsequently, four vector-based autoregression models are used to predict the retail prices of nearby resellers via historic prices. Besides, they can also predict wholesale prices. Their approach is only transferable to a limited extent, since offline characteristics determine the prices. Such characteristics can be found in the geographical locations, which influence the customers' search costs, or in the predefined competitors.
Research about stock market prediction is not considered in this thesis. Stock prices undergo high-frequency adjustments, whereas prices on CSAs are more stable and can be described as piecewise functions. The stock market often depends on events which can abruptly influence the stock prices. CSAs do not show event-triggered impacts to a large extent; the prices may be influenced by events like manufacturer price drops, seasonality or campaigns. However, price pattern recognition techniques from the stock market can be employed, as shown in chapter 5.3.2.
In summary, the example of decide.com has shown that the prediction of the lowest prices is possible with high accuracy in the form of high-level purchase recommendations. The field of airline ticket prediction provides promising techniques like dedicated feature creation processes.
3 Market Review of Repricing Providers
A repricing provider is an agent that adjusts or recommends prices automatically on the seller's behalf in response to changing market conditions (Kephart et al. 2000, p. 732). Often, the term 'price optimization' is used by repricing providers. This term is inaccurate, since an optimum can't be achieved due to conflicting goals; only Pareto-optimal prices can be reached (Meyer 2012, p. 68).

This chapter provides information about the repricing providers operating in Germany and in the USA. The repricing providers' websites served as the data basis. Unfortunately, they supply only sparse information about their available repricing strategies and underlying strategy parameters. However, by considering a wide spectrum of repricing providers, a realistic picture of repricing characteristics emerges. The market review was conducted in December 2015.
Strategy | Description
Target Position Strategy | This strategy aims for a specified position in a CSA. It is accompanied by a price gap parameter to the next competitor and a decision whether delivery costs are considered.
Pull-Up Strategy | This strategy is a specialized version of the Target Position Strategy. First, it matches the desired target position. Subsequently, the strategy raises the price by a predefined amount in the next iteration. If the competitor pulls up too, an upward pricing spiral has been triggered.
Time Frame Strategy | This is a meta strategy which triggers other strategies or pricing policies based on the current time. Common distinctions are day/night or workday/weekend.
Sole Vendor Strategy | This is a repricing rule which applies as long as no other competitors offer the dedicated product on the CSA.
Interlink Strategy | This strategy is characterized by an alignment on a specified competitor. A price gap may be chosen. The same result can be achieved by applying a whitelist with one competitor to the Target Position Strategy.
Buy-Box Strategy | The main goal of this strategy is to step into Amazon's buy-box. This is not necessarily achieved by the lowest price, because other criteria like shop reputation and availability have to be considered too.
KPI Maximization Strategy | This strategy has no direct reference to the competition, since it adaptively orientates on economic key performance indicators (KPIs) like sales or profit. Occasionally, customer behavior or seasonal components are incorporated.
Table 3: Summary of observed strategies.
A summary of the observed strategy landscape is outlined in table 3. The most popular strategy is the Target Position Strategy (competition-based). Table 4 shows the most important underlying strategy parameters.
Parameter | Description
Price Boundary | The price boundary specifies a valid price range for repricing activities.
Gap | This parameter defines a price gap relative to the aimed-at competitor. The gap can be relative or absolute. A price gap of zero means matching the aimed-at competitor.
Consideration of Delivery Costs | Are delivery costs included in the position calculation on the CSA?
Shop Reputation | Usually, customers on CSAs have the opportunity to rate the resellers, which results in this quantified parameter.
Availability | This parameter expresses the availability of the product.
Blacklist | This kind of list contains competitors which are excluded from the repricing activities.
Whitelist | This kind of list contains only those competitors which are considered for repricing.
AdjustToNextPricier | If the AdjustToNextPricier option is active and the desired target position can't be reached due to the price boundary, the target position is realigned to the next reachable competitor.
Table 4: The underlying strategy parameters.
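A minimal sketch shows how several of these parameters might interact in a Target Position Strategy aiming at position one. The function name, the default gap and the example prices are hypothetical; real providers combine far more parameters:

```python
def target_position_price(competitor_prices, min_price, max_price,
                          gap=0.01, blacklist=frozenset(),
                          adjust_to_next_pricier=True):
    """Undercut the cheapest non-blacklisted competitor by `gap`,
    clipped to the price boundary [min_price, max_price]. If the
    target can't be reached and AdjustToNextPricier is active,
    realign on the next reachable (pricier) competitor."""
    prices = sorted(p for reseller, p in competitor_prices.items()
                    if reseller not in blacklist)
    if not prices:                       # Sole Vendor case
        return max_price
    for competitor_price in prices:      # cheapest competitor first
        candidate = competitor_price - gap
        if candidate >= min_price:
            return min(candidate, max_price)
        if not adjust_to_next_pricier:
            break
    return min_price                     # boundary prevents any undercut
```

For example, with competitors at 9.00, 10.00 and 12.00 and a minimum price of 11.00, the strategy cannot undercut the two cheaper shops; with AdjustToNextPricier it realigns on the 12.00 competitor and returns 11.99.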
3.1 Repricing Providers in Germany
The repricing providers in Germany rely on competition-based strategies, as shown in table 5. wisergermany.de is the only provider which links the competition-based strategy with a demand-based strategy oriented on price elasticity. clousale.com advertises a high-frequency repricing interval of up to two minutes; they rely on the dedicated CSA APIs. beny-software.de and patagona.de offer an adaptive crawling interval whose crawling frequency is determined by the degree of price deltas. Four repricers handle CSAs very generically by offering the integration of further CSAs as required. The other repricers focus on Amazon and Ebay. Typically, the pricing model is based on the number of products.
Repricing Provider | CSAs | Strategies | Strategy Parameters | Pricing Model
becoding.de | unlimited | Target Position | N/A | #Products
beny-software.de | unlimited | Target Position, Pull-Up | Price Boundary, Delivery Time, Shop Reputation, Delivery Costs Consideration | #Products, #CSAs, Setup
clousale.com | Amazon, Ebay | Target Position | Price Boundary, Gap, Blacklist, Whitelist | #Sales, #Products, Contract Duration
cludes.de/repricing | Amazon | Target Position | Price Boundary or Wholesale Price*x, Delivery Time, Blacklist | #Products
jtl-software.de | Amazon | Buy-Box, Sole Vendor | Price Boundary, Gap | #Sales, #Products
logicsale.de | Amazon, Ebay | Target Position, Time Frame | Price Boundary, Gap, Blacklist | #Products, Contract Duration
patagona.de | unlimited | Target Position, Pull-Up, Time Frame | Price Boundary, Gap, Delivery Costs Consideration, Blacklist, Whitelist, AdjustToNextPricier | #Products, Crawling Interval
preisanalytics.de | unlimited | Target Position | Price Boundary, Gap | #Products, #CSAs
priceparser.de | Amazon | Target Position | Price Boundary, Gap, Shop Reputation, Delivery Costs Consideration, Delivery Time, Blacklist, Whitelist | Software License
repricing.de | Amazon | Target Position | Price Boundary, Gap, Blacklist, Whitelist | #Products, #CSAs
wisergermany.de | Amazon, Ebay | Target Position, Time Frame, Sales | Price Boundary, Gap, Traffic, Sales Speed, Conversion-Rate | #Products, #Shops
Table 5: Repricing providers in Germany.

3.2 Repricing Providers in USA

The repricing providers in the USA put an emphasis on the Buy-Box Strategy for Amazon, as shown in table 6. Whereas repricing providers normally hide strategy details, channelmax.net supplies a public documentation in which more than 60 repricing parameters are specified.12
appeagle.com and ereprice.com provide instant repricing by using the price delta notifications of the Amazon API. darwinpricing.com is based on geographical customer segmentation and learns the price sensitivities of customers via A/B pricing tests. feedvisor.com advertises a rule-independent, fully automated repricing strategy which optimizes profit. Unfortunately, they don't give further details except that machine learning techniques are used which mainly learn from sales data.
3.3 Discussion
There is a broad spectrum of strategies applied in practice. Most strategies are competition-based, and in a few cases machine learning methods are already used. The strategies and underlying parameters go far beyond the strategies developed in the literature (see chapter 2.3). Buy-Box repricing in Germany is not as popular as in the USA. In the US market, dedicated CSAs have emerged, e.g., with a focus on Airbnb or geographical customer segmentation.
Price boundaries are crucial parameters for repricing activities. Missing price boundaries can have a great impact on the calculated prices. Eisen (2011) provides a good example of missing maximum boundaries on Amazon. A book from 1992 about the genetics of flies reached $23,698,655.93 plus $3.99 delivery costs. Two large book resellers were the only market participants:
12 http://www.channelmax.info/wiki/mediawiki-1.15.1/index.php5?title=Talk:RepriceRule (visited on 10/08/2016).
• On the one hand, reseller 'profnath' targeted position one by providing a price of 0.9983 times the current lowest competitor price.
• On the other hand, reseller 'bordeebook' aimed for position two by providing a price of 1.27059 times the current lowest competitor price.
This constellation triggered an upward pricing spiral. Far more important are minimum price boundaries: missing or disregarded minimum prices can cause high losses. The repricing providers appeagle.com and repricerexpress.com were responsible for one-penny listings on Amazon and correspondingly high losses (Steiner 2012; Holland 2014).
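The interplay of the two rules can be replayed in a few lines. The multipliers are those reported by Eisen (2011); the starting price and the function name are hypothetical:

```python
def simulate_spiral(start_price, rounds):
    """Replay the book pricing spiral: in each round 'profnath'
    undercuts the competitor at 0.9983x, after which 'bordeebook'
    re-prices itself at 1.27059x of profnath's new price."""
    profnath = bordeebook = start_price
    for _ in range(rounds):
        profnath = 0.9983 * bordeebook     # slight undercut
        bordeebook = 1.27059 * profnath    # stay above profnath
    return profnath, bordeebook

# Starting from a hypothetical $20, both prices pass the million-dollar
# mark within about 50 rounds: the combined factor per round is
# 0.9983 * 1.27059, roughly 1.268, so prices grow exponentially until a
# price boundary or a human intervenes.
profnath_price, bordeebook_price = simulate_spiral(20.0, 50)
```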
In theory, repricing providers are vulnerable to price wars, so machine learning techniques are needed which account for the future consequences of pricing (Kephart et al. 2000, p. 749). In practice, price boundaries limit downward pricing spirals, as long as they are properly set. Additionally, appropriate countermeasures like the Pull-Up or profit maximization strategy exist for further limitation. The more such interactions are observed on a CSA, the higher the possibility of exposing the underlying pricing strategies.
Repricing Provider | CSAs | Strategies | Strategy Parameters | Pricing Model
appeagle.com | Amazon, Ebay, Rakuten | Buy-Box | Price Boundary, Gap, AdjustToNextPricier, Product Condition, Shop Reputation, Blacklist | #Products, Features
beyondpricing.com | Airbnb | Profit Maximization | Min/Basis Price, Tag, Events, Season, Neighborhood, Demand | #Sales
bqool.com | Amazon | Buy-Box, Profit Maximization, Interlink | Price Boundary, Gap, Blacklist, AdjustToNextPricier | #Products, Features
channelmax.net | Amazon, Rakuten | Interlink, Buy-Box, Pull-Up, Time Frame | Blacklist, Whitelist, Price Boundary, Gap, Delivery Costs, Shop Reputation, Product Condition | #Products, Features
darwinpricing.com | Own Shop | Profit Maximization | Location (based on price indices) | #Sales, Features
ecomengine.com | Amazon | Target Position | Price Boundary, Gap, Delivery Costs, Interval | N/A
ereprice.com | Amazon | Buy-Box, Sole Vendor | Price Boundary, Top N Competitors, Gap | #Products
everbooked.com | Airbnb | Profit Maximization | Price Boundary, Demand, Events, Weekday, Season, Availability | #Sales, Features
feedvisor.com | Amazon | Buy-Box combined with Pull-Up | Shop Reputation, Delivery Time, Delivery Costs, Price Elasticity, Prediction? | #Features, #Sales
get4it.com | Amazon, Ebay, Bigcommerce | Buy-Box | Shop Reputation | #Products
marketyze.com | BestBuy, ReStockIt, sears, Ebay | N/A | Rounding | N/A
repriceit.com | Amazon | Target Position, Time Frame | Price Boundary, Delivery Costs, Product Condition, #Offers, Shop Reputation | #Products
repricerexpress.com | Amazon | Buy-Box, Time Frame | Price Boundary, Gap, Blacklist, Delivery Time, Shop Reputation, Product Condition, AdjustToNextPricier | #Products
solidcommerce.com | Amazon, Ebay, sears, Rakuten, newegg, Overstock.com, etsy | Target Position, Time Frame, Sole Vendor | Price Boundary, Shop Reputation, Product Condition, Costs, Gap, Amazon Costs | N/A
teikametrics.com | Amazon, Ebay | Profit Maximization | Shop Reputation, Blacklist, Delivery Costs | N/A
wiser.com | see wisergermany.de
Table 6: Repricing providers in the USA.
4 Competitive Market Analysis
This chapter examines the dataset consisting of 21.6 million offers which have been crawled from a major German CSA. The focus lies on analyzing price changes in order to determine the degree of pricing dynamics.

It is important to define a price change, which is synonymously called a price delta in the following. Table 7 shows examples of delta calculations. Between n timestamps there can exist only n-1 deltas.
Reseller | t1 | t2 | t3 | Deltas | Possible Deltas | Delta Ratio
A | 1 | 1 | 1 | 0 | 2 | 0.0
B | 1 | 1 | 2 | 1 | 2 | 0.5
C | 1 | 2 | 1 | 2 | 2 | 1.0
Table 7: Examples of deltas and delta ratios.
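The delta and delta ratio definition translates directly into code. This is an illustrative sketch; the price series are the examples from table 7:

```python
def delta_ratio(price_series):
    """Share of realized deltas among the n-1 possible deltas
    between n timestamps (cf. table 7)."""
    if len(price_series) < 2:
        return 0.0
    # Count timestamps at which the price differs from its predecessor.
    deltas = sum(1 for prev, cur in zip(price_series, price_series[1:])
                 if cur != prev)
    return deltas / (len(price_series) - 1)
```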
4.1 Approach and Settings
100 products have been chosen by a dedicated product selection process. The selection builds on a top 40 ranking of the most popular products on the German CSA Billiger.de on 10/25/2015. This ranking is used for filtering popular categories. Subsequently, a category distribution has been retrieved. Twenty popular products reflecting the category distribution have been chosen as reference products. Every product is further extended to a quintuple by enriching it with corresponding products. The composition of a quintuple can be obtained from table 8. The 100 selected products are presented in table 9. The intermediary steps of the product selection can be found in appendix A.
Distinction | Main Dimension
Reference (Base) | -
Variant | Configuration/Appearance
Predecessor | Publication Date
Substitute | Manufacturer
Cheap Substitute | Price
Table 8: The composition of a product quintuple.
The crawling framework of Patagona has been used. It enables offer extraction on a wide range of European CSAs at high-frequency crawling intervals. The crawling framework is further treated as a black box. idealo.de has been chosen as the target CSA since it represents a
Category | Quintuple Id | Product | GTIN | Reference | Variant | Predecessor | Substitute | Cheap Substitute
1 Samsung Galaxy S6 32GB Black Sapphire 8806086676137 �
2 Samsung Galaxy S6 32GB Gold Platinum 8806086936651 �
Q01-G-III 3 Samsung Galaxy S5 16GB Charcoal Black 4250698798406 �
4 Apple iPhone 6 64GB Spacegrau 0888462064101 �
5 Sony Xperia Z3 Compact Black 7311271485889 �
6 Motorola Moto X (2. Generation) 16GB Schwarz 6947681520554 �
7 Motorola Moto X (2. Generation) 32GB Schwarz 6947681521735 �
Q02-G-II 8 Motorola Moto X Walnuß 6947681521148 �
9 LG G3 16GB Schwarz 8806084958235 �
10 Huawei Ascend P7 Schwarz 6901443004836 �
11 Microsoft Lumia 640 schwarz 6438158728189 �
12 Microsoft Lumia 640 orange 6438158724808 �
Q03-G-II 13 Nokia Lumia 635 Schwarz 6438158708068 �
14 Motorola Moto G (2. Generation) 8GB Schwarz 6947681519374 �
15 Samsung Galaxy J1 Schwarz 8806086669122 �
Smartphone 16 Wiko Rainbow Jam 8GB schwarz 4016138998450 �
17 Wiko Rainbow Jam 16GB schwarz 4016138998528 �
Q04-G-II 18 Wiko Rainbow Schwarz 6297000671000 �
19 Motorola Moto E (2. Generation) schwarz 6947681523258 �
20 Sony Xperia E1 Black 4055432001978 �
21 Honor 7 grau 0637825998191 �
22 Honor 7 silber 6901443074020 �
Q05-G-III 23 Honor 6 schwarz 6901443026623 �
24 Huawei P8 Titanium Grey 6901443056705 �
25 LG G Flex 2 16GB Platinum Silver 8806084978172 �
26 Samsung Galaxy S6 Edge+ 32GB Black Sapphire 8806086960687 �
27 Samsung Galaxy S6 Edge+ 64GB Black Sapphire 8806086960601 �
Q06-G-III 28 Samsung Galaxy Note 4 Charcoal Black 8806086371292 �
29 Apple iPhone 6 Plus 16GB Spacegrau 0888462039147 �
30 Motorola Moto X Play 16GB schwarz 6947681527683 �
31 Maxi-Cosi Pebble - Black Raven (2015) 8712930089186 �
32 Maxi-Cosi Pebble - Mosaic Blue (2014) 8712930090366 �
Q07-G-II 33 Maxi-Cosi Pebble - Total Black (2011) 8712930051329 �
34 Cybex Aton Q plus - Storm Cloud 4250183799697 �
35 Römer Baby Safe Plus SHR II Black Thunder 4000984096415 �
36 Gesslein S4 2014 (316000) 4250652384188 �
37 Gesslein S4 2014 (174000) 4250652384096 �
Q08-G-I 38 Gesslein S4 2013 (917000) 4250190167212 �
39 Maclaren BMW 5010902199219 �
Kids 40 Quinny Zapp Red Rumour 8712930081210 �
41 DerDieDas ErgoFlex Panther 4006047405613 �
42 DerDieDas ErgoFlex XL Panther 4006047406610 �
Q09-G-II 43 DerDieDas XLight Candy Castle 4006047404968 �
44 McNeill Ergo Light Plus Caro Softpink 4017245935987 �
45 Scout Buddy Street Soccer 4007953379111 �
46 Milupa Aptamil 2 (800 g) 4008976022336 �
47 Milupa Aptamil 3 (800 g) 4008976022343 �
Q10-V-I 48 Milupa Milumil 2 (800 g) 4008976032878 �
49 Töpfer Lactana Bio 2 (600 g) 4006303122001 �
50 Holle Bio-Folgemilch 2 (600 g) 7640104950394 �
51 Novartis Voltaren Schmerzgel forte 23,2 mg/g (150 g) 08628270 �
52 Novartis Voltaren Schmerzgel forte 23,2 mg/g (100 g) 08628264 �
Q11-V-I 53 Novartis Voltaren Schmerzgel (180 g) 06998784 �
54 Hermes Doc Ibuprofen Schmerzgel (150 g) 4058900010236 �
55 ratiopharm Diclofenac Gel (150 g) 0609788909156 �
56 Stada Grippostad C Kapseln (24 Stk.) 00571748 �
57 Stada Grippostad C Stickpack Granulat (12 Stk.) 09671871 �
Healthcare Q12-V-I 58 Stada Echinacea Classic Tropfen (50 ml) 01309337 �
59 Bayer Aspirin Complex Granulat (10 Stk.) 03227112 �
60 ratiopharm Grippal + C Brausetabletten (20 Stk.) 00999877 �
61 Thierry Mugler Alien Eau de Toilette (30 ml) 3439602810118 �
62 Thierry Mugler Alien Eau de Toilette (60 ml) 3439602810019 �
Q13-V-I 63 Thierry Mugler Womanity Eau pour Elles Eau de Toilette (50 ml) 3439601200118 �
64 Chloé Eau de Toilette (30 ml) 3607340309410 �
65 Jil Sander Eve Eau de Toilette (30 ml) 3607342216754 �
66 Weber Master-Touch GBS 57 cm Black 0077924033025
67 Weber Master-Touch GBS 57 cm Special Edition 0077924032950 �
Q14-G-II 68 Weber One-Touch Original 47 cm Black 0077924003592 �
69 Rösle No.1 Sport F60 Holzkohle-Kugelgrill 4004293250056 �
DIY & Garden 70 Landmann Kugelgrill Black Pearl Comfort (31341) 4000810313419 �
71 Bosch GSR 10,8 V-EC Professional (2 x 2,0 Ah, in L-Boxx) 3165140739108 �
72 Bosch GSR 10,8 V-EC Professional (2 x 2,5 Ah Akkus in L-Boxx) 3165140822114 �
Q15-G-II 73 Bosch GSR 10,8-2-LI Professional 2 x 2,0 Ah + L-Boxx (0 601 868 109) 3165140727495 �
74 DeWalt DCD790D2 (mit 2 x 2,0 Ah Akkus) 5035048410622 �
75 Einhell BT-CD 14,4 2B 4006825538250 �
76 Continental ContiWinterContact TS 830 P 205/55 R16 91H 4019238434033 �
77 Continental ContiWinterContact TS 830 P ContiSeal 205/55 R16 91H 4019238454291 �
Car Q16-G-I 78 Continental ContiWinterContact TS 850 205/55 R16 91H 4019238560688 �
79 Goodyear Ultra Grip 9 205/55 R16 91H 5452000447166 �
80 Nexen Winguard Snow’G 205/55 R16 91H 8807622186608 �
81 Canon EOS 700D Kit 18-55 mm Canon IS STM 3662362017743 �
82 Canon EOS 700D Kit 18-135 mm Canon IS STM 8714574602585 �
Photography Q17-G-III 83 Canon EOS 600D Kit 18-55 mm [Canon DC III] 4960999984094 �
84 Nikon D5300 Kit 18-55 mm Nikon VR II schwarz 0018208935871 �
85 Sony Alpha 58 Kit 18-55 mm 4013675005603 �
86 Philips Senseo Viva Café HD 7825/69 Schwarz 8710103761945 �
87 Philips Senseo Viva Café HD 7825/40 Sizzling Grape 8710103558033 �
Q18-G-I 88 Philips Senseo Original HD 7810/60 schwarz 8710103168836 �
89 Petra KM 42.17 Artenso latte schwarz glänzend 4211129758109 �
Home 90 Petra KM 34.00 4211129851701
91 Apple MacBook Air 13" 2015 (MJVE2D/A) 0888462348164 �
92 Apple MacBook Air 13" 2015 (MJVG2D/A) 4005922018313 �
Computer Q19-G-III 93 Apple MacBook Air 13" 2014 (MD761) 0885909943074 �
94 Asus Zenbook UX305FA-FC159T 4712900139884 �
95 Lenovo IdeaPad U330P (59424883) 0888772347536 �
96 Microsoft Xbox One 500GB 0885370808315 �
97 Microsoft Xbox One 1TB 0885370898279 �
Entertainment Q20-G-III 98 Microsoft Xbox 360 E 500GB 0885370767360 �
99 Sony PlayStation 4 (PS4) 500GB 0711719437017 �
100 Nintendo Wii U Basic Pack 0045496311018 �
Table 9: The 100 selected products.
major German CSA. Offers have been crawled over a period of 80 days, from 11/1/2015 until 1/19/2016. A high-frequency crawling interval of 15 minutes has been applied. In total, the dataset consists of 21,621,484 offers.

The main approach is defined by dividing the dataset into appropriate buckets along multiple dimensions in order to enable fine-grained statements. The market analysis has been performed on a computer with an Intel i7-6700 processor with 4x 3.4 GHz and 32 GB memory. The memory size allows loading and processing the market data on the fly without the need for a database.
4.2 Implementation
Scala13 is primarily used as the underlying programming language. Once the dataset has been loaded from JSON product offer files, a multidimensional market analysis is conducted. The basic concept is shown in figure 3. Initially, the offers are grouped on a high aggregation level.
Figure 3: The market analysis concept. The offers are grouped by a product grouping dimension (1D: all, categories, quintuples, products, resellers) and a leaf dimension (2D: hours, weekdays, yeardays, price, price classes, availability) into offer buckets (1D x 2D). The analyzers (MarketPlaceAnalyzer, PriceAnalyzer, DeltaAnalyzer, MinPriceDeltaAnalyzer, DeliveryCostAnalyzer, ResellerNumberAnalyzer, OfferCountAnalyzer, PriceLeaderChangeAnalyzer, PriceTrendAnalyzer, MinPriceTrendAnalyzer, DeliveryCostDeltaAnalyzer, Top3VendorAnalyzer) are applied to each bucket; the results are written to csv files and plotted with gnuplot templates.
The offers are either ungrouped, grouped on the product level (by product category, quintuple, or individual product), or grouped on the reseller level. Afterwards, the resulting offer segments are further divided by the second dimension:
13 http://www.scala-lang.org
1. Time 2D: This dimension illustrates time dependencies between offers. The offers are grouped by either the hour of day, the day of week, or on a day-of-the-year basis.

2. Price 2D: This dimension separates the products based on their average price. Either the price is continuously mapped or the products are classified into three price classes: Class I (0-100€), Class II (100-300€) and Class III (300-1500€).

3. Availability 2D: This dimension separates the offers into available and out-of-stock offers.
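The two-dimensional bucketing can be sketched as follows. The thesis implementation is in Scala; this Python sketch with hypothetical offer records only mirrors the concept:

```python
from collections import defaultdict

def price_class(avg_price):
    """Map an average price to the three price classes defined above."""
    if avg_price < 100:
        return "Class I"
    if avg_price < 300:
        return "Class II"
    return "Class III"          # 300-1500€

def bucketize(offers, dim1, dim2):
    """Divide offers into (1D x 2D) buckets; dim1 and dim2 are key
    functions, e.g. the product grouping and the leaf dimension."""
    buckets = defaultdict(list)
    for offer in offers:
        buckets[(dim1(offer), dim2(offer))].append(offer)
    return buckets
```

A category-by-hour segmentation would then read `bucketize(offers, lambda o: o["category"], lambda o: o["hour"])`, after which each analyzer is applied per bucket.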
A dozen different analyzers are applied to the segmented offer buckets. The analysis aspects range from simple tasks like counting the offers to more complex tasks like calculating the degree of price leader changes. The analyzers are described in table 10.
Afterwards, the analysis results are written to csv files. Gnuplot14 is used for visualizing the market analysis charts. The charts are plotted based on individual templates for each analyzer. The market analysis and gnuplot scripts are fully parallelized.
4.3 Results
Running the market analysis including plotting takes two hours and produces 123,268 csv files and corresponding plots. An excerpt of the plots is presented in this subchapter, with a focus on price deltas.
4.3.1 1D: All Offers
The dataset contains 21,621,484 offers from 1,589 distinct resellers. The average price of the offers amounts to 283.45€ with average delivery costs of 2.73€. The top three resellers are Amazon (66 products), otto.de (47 products) and jacob-computer.de (44 products). The average delta ratio is 0.31%, which corresponds to a price change every third day per reseller. In general, there are 27% more price cuts than price hikes.

The average minimum price delta is 1.00%, which corresponds to a daily minimum price change rate. This means that three times more repricing activities can be expected on the first position. Again, there are more minimum price cuts than minimum price hikes (16%). In principle, the surplus of price cuts confirms the detected negative price trend with a slope of -0.000494 and a 95% confidence interval of 0.000140.
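The trend figures above come from a linear regression over the price series (cf. the PriceTrendAnalyzer in table 10). A sketch of the slope and confidence computation; using 1.96 times the standard error for the 95% interval is an assumption (normal approximation, adequate for the large n in this dataset):

```python
import math

def trend_with_confidence(prices):
    """Least-squares slope of a price series over time indices 0..n-1,
    with a 95% confidence half-width (1.96 * standard error)."""
    n = len(prices)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(prices) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, prices))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residual_ss = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(xs, prices))
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, 1.96 * std_err
```

A perfectly linear falling series yields the exact slope with a confidence half-width of zero; noisy series widen the interval accordingly.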
Surprisingly at first glance, the price leader change ratio amounts to 1.64% and hence is 64% higher than the minimum price delta ratio. The reason can be found in cases where more than one price leader exists. If another reseller matches the first position, the minimum price remains unchanged. However, a price leader change has actually taken place and is detected.
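This distinction can be made concrete with a small sketch over hypothetical market snapshots: in the series below the minimum price never changes, yet one price leader change is detected when a second reseller matches the first position.

```python
def price_leader_changes(snapshots):
    """Count price leader changes across market snapshots
    (reseller -> price). A change is counted whenever the set of
    resellers holding the minimum price differs from the previous
    snapshot, even if the minimum price itself is unchanged."""
    changes = 0
    previous_leaders = None
    for snapshot in snapshots:
        minimum = min(snapshot.values())
        leaders = {r for r, p in snapshot.items() if p == minimum}
        if previous_leaders is not None and leaders != previous_leaders:
            changes += 1
        previous_leaders = leaders
    return changes
```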
Remarkably high is the delivery cost delta ratio of 0.05%, which corresponds to a delivery cost change every 20th day. A striking aspect is the equipartition of delivery cost cuts and hikes, which can later be deduced from time-based delivery cost patterns.
14 http://gnuplot.sourceforge.net
Analyzer | Description
MarketPlaceAnalyzer | Offers on idealo.de are often provided indirectly via other marketplaces. This analyzer groups the offers by their origin. It distinguishes the marketplaces of Amazon, Rakuten, Ebay and Hitmeister. Finally, a marketplace distribution is calculated.
PriceAnalyzer | This analyzer calculates basic price statistics in the form of minimum, maximum, mean, geometric mean, standard deviation and variance.
DeliveryCostAnalyzer | This analyzer calculates basic delivery cost statistics in the form of minimum, maximum, mean, geometric mean, standard deviation and variance.
DeltaAnalyzer | This analyzer calculates the price delta ratio subdivided into the delta directions. If a reseller offers multiple product variants, only the variant with the lowest price is considered. Note: If time dimensions are used, the most recent offers from the previous bucket are considered too.
MinPriceDeltaAnalyzer | This analyzer builds on the DeltaAnalyzer. However, it calculates deltas only on the minimum price series of the products.
DeliveryCostDeltaAnalyzer | This analyzer builds on the DeltaAnalyzer. It focuses on delivery cost deltas.
PriceLeaderChangeAnalyzer | This analyzer builds on the DeltaAnalyzer. However, price leader changes are considered. A price leader change means that the current price leader(s) are not equal to the previous price leader(s).
PriceTrendAnalyzer | In order to assess the overall price trend, this analyzer calculates a linear regression and returns the slope and the 95% confidence interval.
MinPriceTrendAnalyzer | This analyzer builds on the PriceTrendAnalyzer. However, it calculates the trend of the minimum prices.
ResellerNumberAnalyzer | The ResellerNumberAnalyzer counts the distinct number of resellers.
OfferCountAnalyzer | This analyzer returns the plain number of offers.
Top3VendorAnalyzer | This analyzer returns the three vendors which offer the most products of the current offer segment.
Table 10: The different offer analyzers.
The offer origin can be obtained from figure 4; 16% of the offers originate from marketplaces.
Figure 4: The offer origin on idealo.de (Non-Marketplace 84%, Amazon 6%, Ebay 5%, Rakuten 3%, Hitmeister 2%).
The hourly delta analysis reveals significant differences between daytime and nighttime, as shown in figure 6(a). The lowest price delta ratio is reached at 4 am (UTC) with 0.14%. At 8 am (UTC) the price delta ratio peaks at 0.56%. However, those differences have not been confirmed for the minimum price delta ratio.

The delta analysis by weekdays is shown in figure 6(b), where one corresponds to Monday. A clear difference between workdays and weekends is evident. The average price delta ratio is 0.35% for workdays and 0.23% for weekends. Further analyses of price leader change ratios, minimum price delta ratios and delivery cost delta ratios come to the same conclusion.

The price trend analysis in figure 5 reveals that the downward price trend mostly peaks on Monday and Tuesday, whereas Friday tends towards an increasing price trend. However, the confidence intervals are wide, which can be explained by large differences between the categories.
Figure 5: Analysis of price trends on a day-of-the-week basis (trend value with 95% confidence interval; weekdays 1-7).
Figure 6(c) shows the average price delta ratios on a daily basis. Day one represents 11/1/2015. The workday/weekend differences are once again underlined; days 7/8, 14/15, 21/22 etc. are weekends. The price delta ratio reaches its maximum on November 24th (a Tuesday) at 0.83%. In fact, the whole week until November 27th (Friday) forms a top plateau. This can be explained by Amazon's Black Friday week, which took place in this period. The period between Christmas and the beginning of 2016 is characterized by low repricing activities (day 54 until day 64).
The day-based reseller number analysis reveals that the number of resellers decreases between the beginning and the end of a month. For example, in November the number of distinct resellers drops from 591 to 560. Some resellers may have a limited monthly budget for their listings on CSAs.
The offers can be separated into 15,753,939 available and 5,867,545 out-of-stock offers. An offer is available if the delivery time is equal to or less than two days. The price delta ratio of available products (0.29%) is lower than that of the out-of-stock counterpart (0.35%). The same applies to the availability difference for the minimum price delta ratios (0.97% vs. 1.38%). The minimum price trend amounts to -0.006568 (0.000362 confidence) for unavailable and -0.000913 (0.000166 confidence) for available products. A possible explanation for the stronger downward minimum price trend could be a compensation for the longer waiting time.
4.3.2 1D: Product Categories
A category-based analysis of different deltas is presented in figure 7. Regarding the price delta change ratios, the car category clearly exhibits the highest ratio with 1.96%. Regarding the minimum price delta change ratios, the car category still exhibits the highest ratio. However, the healthcare category holds the second highest ratio with 1.74%, although its plain delta ratio only reaches 0.17%. This difference by a factor of ten leads to the conclusion that a limited number of resellers apply high-frequency repricing. The car category exhibits an average price leader change ratio of 5.01% and peaks on November 24th with 10.79%.
The electronic categories show a strong downward minimum price trend, as shown in table 11.
Category Price Trend 95% Confidence
Entertainment -0.003617 0.000255
Computer -0.003073 0.000584
Smartphone -0.001848 0.000180
Table 11: Minimum price trends of selected categories.
Figure 6: Delta analysis of all offers by different time horizons (delta down and delta up ratios): (a) by hours of day (UTC), (b) by weekdays, (c) by days starting from 11/1/2015.
4.3.3 1D: Products
A car tire15 holds the highest average price delta ratio of 2.51%. Furthermore, this product reaches a price leader change ratio peak of 47.50% on 1/11/2016. A school backpack16 has the overall lowest average price delta ratio of 0.01%.

In the first four days of the dataset, the overall minimum price delta is very high. This can be traced back to products from the healthcare category, e.g., GTINs 8628264, 571748, 3227112 and 4058900010236. This behavior is exemplarily shown in figure 8(a) for the product with GTIN 8628264 (Novartis Voltaren Schmerzgel forte 23,2 mg/g (100 g)). The resellers apolux.de and apotheke-online.de make high-frequency jumps between predefined boundaries.
15 GTIN 4019238434033
16 GTIN 4017245935987
Figure 7: Product categories under consideration of different deltas (delta down ratio and delta up ratio). (a) Delta analysis by product categories. (b) Minimum price delta analysis by product categories.
Figure 8: Product with GTIN 8628264 with two high frequency repricing resellers. (a) Daily deltas of a single product. (b) Excerpt of a price series (in Euro) with the alternating price leaders apotheke-online.de and apolux.de, 11/1 to 11/5.
Another car tire17 exhibits the highest average minimum price delta rate of 7.86%, which
corresponds to a minimum price change roughly every two hours.
4.3.4 1D: Resellers
This subsection provides three examples of resellers and their repricing activities.
Amazon shows a price change rate of 0.62%, which is more than twice the resellers' average.
The price hike and price cut ratios are nearly identical. The highest price change rate,
2.51%, is achieved on 12/15/2015. The highest minimum price change rate, 5.30%, is
reached on 12/27/2015.
17 GTIN 4019238454291
Mindfactory is a good example for explaining the high rate of delivery cost deltas. Mindfactory
offers 'Midnight-Shopping'18. In order to have full order books in the morning, Mindfactory
drops the delivery costs from midnight to 6 am for a major part of its assortment. Accordingly,
the detected average delivery cost delta at midnight (CET) is 31.90%. Overall, Mindfactory
reveals an average delivery cost delta of 2.24%. The delivery cost hikes and cuts are equally
distributed.
good-tires [ebay.de] is the reseller with the highest average price delta ratio.19 It accounts
for 55.21%, and the price hikes and cuts are equally distributed as well.
4.4 Discussion
This chapter focused on price deltas since they express pricing strategy actions on the operational
level. Only if enough price changes come into the picture is drawing conclusions about
price patterns and pricing strategies viable. This analysis recognizes an almost seven times
higher price change rate compared to Bounie et al. (2012, p. 1). High frequency market
dynamics have been detected which may be sufficient for deriving pricing strategies and
forecasting prices. Repricing activity is especially notable at the first price position. Events like
Christmas or Amazon's Black Friday can express themselves in alternating price deltas. However,
price change behavior differs greatly between product categories. Time-dependent repricing
activities have been discovered, such as differences between workday/weekend and day/night.
That may be an indication of automated repricing since, e.g. during the night, manual repricing
activities should be at a minimum, whereas automated repricers remain active.
18 http://www.mindfactory.de/info_center.php/icID/16
19 For a reseller with at least 1,000 offers.
5 Analysis
This chapter’s primary objective is the derivation of pricing insights based on reseller price
histories of a CSA. The foundation of all three tasks is a preprocessing step which calculates
features for the price series. Breaking down the objective, three main tasks have been conducted
like shown in figure 9:
Figure 9: Analysis overview. A shared feature set extraction computes price delta, price, position and price gap features; these feed the automated repricing classification via supervised classification (chapter 5.1), the price prediction via price delta transformation (chapter 5.2) and the pricing strategy extraction via serial extraction (chapter 5.3).
1. Task: Supervised classification of reseller price series regarding automated repricing
2. Task: Prediction of minimum and reseller price deltas
3. Task: Pricing strategy extraction with six strategy-dependent extractors arranged in a
serial extraction process
The dataset presented in the previous chapter 4 serves as base data. The analysis has
been conducted on a computer with an Intel i7-6700 processor (4x 3.4 GHz) and 32 GB of
memory. Regarding the implementation, Scala is used as the underlying programming language.
All analysis steps are fully parallelized. Weka20 in version 3.8 has been used for the decision tree
approaches. Each of the following subchapters has a concept section which explains the fundamental
concepts used in the approaches.
20 http://www.cs.waikato.ac.nz/ml/weka
5.1 Automated Repricing Classification
The main goal of this subsection is to ascertain the possibility of detecting the origin of a
price series on a CSA. This task distinguishes between manual and automated repricing. Only
if this problem is solvable are more in-depth pricing insights conceivable. In summary, the approach
consists of supervised learning. The historic price series have been classified by repricing
domain experts. Subsequently, decision tree algorithms are applied and validated in a comprehensive
evaluation. A transition step is performed in order to deduce conclusions for the real world
application of the presented approaches.
5.1.1 Concepts
Supervised classification learning is a learning scheme which uses a set of classified examples
(training set) in order to classify unseen examples21 (testing set). This process is called supervised
since it is provided with the training set including the actual outcome in the form of a class.
The testing set is used for applying metrics for determining the success rate (Witten and Frank
2005, pp. 42-43).
Decision tree approaches are a common approach to supervised classification learning. Attributes
from the training set are used to establish a tree structure. The tree nodes contain rules for
attribute examinations and the edges represent the actual decisions. The leaf nodes determine
the classes (Witten and Frank 2005, p. 62). A simple decision tree example is shown in figure
10.
time until due date of master thesis
  < 30 days   -> write thesis
  >= 30 days  -> weather
                   good -> cycling
                   bad  -> write thesis
Figure 10: A simple decision tree example.
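The toy tree from figure 10 can be sketched as nested conditionals, where each inner node tests an attribute and each leaf returns a class. This is an illustrative Scala sketch, not thesis code; the names `Activity` and `decide` are invented for the example.

```scala
// Illustrative sketch of the decision tree in figure 10 (names are hypothetical).
sealed trait Activity
case object WriteThesis extends Activity
case object Cycling extends Activity

// Inner nodes test attributes; leaves return the class.
def decide(daysUntilDue: Int, weatherGood: Boolean): Activity =
  if (daysUntilDue < 30) WriteThesis   // edge: < 30 days
  else if (weatherGood) Cycling        // edge: >= 30 days, good weather
  else WriteThesis                     // edge: >= 30 days, bad weather
```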
C4.5 is a decision tree approach which is based on a greedy divide-and-conquer concept. At-
tributes are added top down during tree construction according to their information gain22.
21 Examples are further called instances. Every instance is determined by values of predefined attributes.
22 Information gain corresponds to an increase in the average purity of the subsets.
C4.5 uses two pruning concepts for preventing overfitting23. On the one hand, prepruning is
used during the decision tree growing process. A parameter for setting the minimum num-
ber of instances per leaf node can be specified. Additionally, C4.5 only allows pruning for
nodes with at least two successors. On the other hand, a confidence-based postpruning is used.
Since the actual error for the testing set is not known in advance, such error has to be es-
timated by confidence intervals on each node. The confidence intervals and a parametrized
confidence level determine the pruning decision. Furthermore, C4.5 supports missing values,
discrete and continuous attributes and weighting of attributes (Quinlan 1993; Witten and Frank
2005, pp. 187-199).
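The confidence-based postpruning estimates an upper bound on the node error from the observed error rate. A sketch of this estimate, following the formulation in Witten and Frank (2005); the function name is illustrative and z = 0.69 approximates C4.5's default 25% confidence level:

```scala
// Sketch of a C4.5-style pessimistic error estimate (after Witten and Frank 2005).
// f = observed error rate at a node, n = number of instances at the node,
// z = normal quantile for the confidence level (assumption: z ~ 0.69 for CF = 25%).
def pessimisticErrorRate(f: Double, n: Double, z: Double = 0.69): Double = {
  val z2 = z * z
  (f + z2 / (2 * n) + z * math.sqrt(f / n - f * f / n + z2 / (4 * n * n))) / (1 + z2 / n)
}
```

The estimate is always larger than the observed error rate, which is what makes the pruning decision pessimistic.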
A REP tree is a decision tree approach which also takes advantage of splitting by information
gain. It uses reduced error pruning. This postpruning technique estimates the expected testing
set error by holding back a training subset. However, the actual tree is then built on less training
data. REP trees are optimized for fast execution since numeric attributes are sorted only once.
They support prepruning with minimum instances and the missing values treatment of C4.5
(Witten and Frank 2005, pp. 193-194, 407-408).
Random forest is a multiple decision tree approach which builds a predefined number of randomized
decision trees. The basic algorithm is presented in algorithm 1. The forest
remains unpruned. Unknown instances are classified by a majority vote over all built trees
(Breiman 2001; Witten and Frank 2005, pp. 320-321).
Algorithm 1 Random forest algorithm.
for a predefined number of trees do
    bootstrap <- select a random training subset
    while growing a tree with bootstrap do
        for every node do
            select a random subset of attributes
            split the node by e.g. information gain
        end for
    end while
end for
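The majority vote over the built trees can be sketched in a few lines; this is an illustrative Scala sketch, not the Weka implementation:

```scala
// Sketch: classify an unknown instance by a majority vote over the per-tree predictions.
def majorityVote[A](treePredictions: Seq[A]): A =
  treePredictions.groupBy(identity).maxBy(_._2.size)._1
```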
Cross validation is an accuracy estimation method. The dataset is randomized and split into
k subsets (folds) of approximately the same size. During k iterations each subset is used
exactly once as the testing set, whereas the k-1 other subsets are used as training data. An example
of a five-fold cross validation can be found in figure 11. Ten-fold cross validation has become
the standard accuracy estimation method (Kohavi 1995; Witten and Frank 2005, pp. 149-151).
Cross validation can prevent overfitting (Hsu et al. 2003, p. 5).
23 Overfitting describes the problem that occurs if a decision tree's complexity is too high. In this case, the decision tree may be well suited for the training set, but it is overfitted due to exactly matching the dynamics of the training set.
Figure 11: 5-fold cross validation partitioning scheme.
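The partitioning into k folds can be sketched on index level; a Scala sketch with an invented function name, not thesis code:

```scala
// Sketch of k-fold partitioning: shuffle the indices, cut them into k groups of
// roughly equal size, and use each group exactly once as the testing fold.
def kFoldIndices(n: Int, k: Int, seed: Long = 42L): Seq[(Seq[Int], Seq[Int])] = {
  val rnd = new scala.util.Random(seed)
  val shuffled = rnd.shuffle((0 until n).toList)
  val folds: Seq[List[Int]] = shuffled.grouped(math.ceil(n.toDouble / k).toInt).toSeq
  folds.indices.map { i =>
    val testing = folds(i)
    val training = folds.indices.filter(_ != i).flatMap(folds).toList
    (training, testing) // (training indices, testing indices) for fold i
  }
}
```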
Grid search is a naive approach for finding a good classifier configuration. A classifier is trained
with different predetermined configurations, and the configuration with the highest prediction
accuracy is chosen (Hsu et al. 2003, pp. 5-8).
Feature selection approaches try to remove irrelevant and redundant attributes24. Sophis-
ticated wrapper approaches have been developed for the purpose of feature subset selection.
These approaches consider the underlying classifier as black box and measure the classifier per-
formance during feature selection (Kohavi and John 1997; Guyon and Elisseeff 2003).
Auto balancing describes the process of establishing a class balance within datasets. A dataset
is imbalanced if its classes are not equally distributed. Imbalanced datasets have major impacts
on the classifier and prediction accuracy. Classifiers tend to strongly optimize their
models for the majority class. At first glance, this leads to good prediction results. However, in
such cases the prediction accuracy for the minority class is often very low.
Therefore, countermeasures have been developed. A simple solution is to establish class
balance by:
• Reweighting the classes
• Reducing the number of instances of the majority class (undersampling)
• Increasing the number of instances of the minority class by injecting duplicates (oversam-
pling)
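The naive oversampling variant from the list above can be sketched as follows; an illustrative Scala sketch with a deterministic duplication order, not thesis code:

```scala
// Sketch of naive oversampling: repeat minority instances until the class
// reaches a target size (duplicates are injected in cyclic order here).
def oversample[A](minority: Seq[A], targetSize: Int): Seq[A] =
  Iterator.continually(minority).flatten.take(targetSize).toSeq
```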
Chawla et al. (2002) developed SMOTE25 which is a more sophisticated method to over-
come imbalanced datasets. SMOTE generates synthetic instances of the minority class by using
a predefined number of nearest neighbours. Additionally, undersampling of the majority class
is recommended.
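The core step of SMOTE, interpolating between a minority instance and one of its nearest neighbours, can be sketched as follows. This is an illustrative Scala sketch of the interpolation only; neighbour search and the random choice of `gap` from [0,1] (as in Chawla et al. 2002) are omitted:

```scala
// Sketch of SMOTE's core idea: a synthetic minority instance lies on the line
// segment between an instance x and one of its nearest neighbours.
def smoteSynthetic(x: Vector[Double], neighbour: Vector[Double], gap: Double): Vector[Double] =
  x.zip(neighbour).map { case (a, b) => a + gap * (b - a) }
```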
In order to measure prediction accuracy various metrics have been developed (Witten and
Frank 2005, pp. 168-173):
24 Features are a synonym for attributes.25 SMOTE stands for Synthetic Minority Over-sampling Technique.
5 Analysis 42
• ROC stands for receiver operating characteristic and describes the trade-off between the true
positive rate (y-axis) and the false positive rate (x-axis): The higher the area under the ROC
curve, the better the model. This metric captures how well the classifier can separate
the determined classes.
• Precision (number of relevant documents retrieved / total number of documents retrieved) is a metric for accuracy.
• Recall (number of relevant documents retrieved / total number of relevant documents) is a metric for completeness.
• F-measure (2 * recall * precision / (recall + precision)) combines precision and recall as their harmonic mean.
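The listed metrics follow directly from the true/false positive and negative counts; a minimal Scala sketch:

```scala
// Precision, recall and F-measure from confusion-matrix counts:
// tp = true positives, fp = false positives, fn = false negatives.
def precision(tp: Int, fp: Int): Double = tp.toDouble / (tp + fp)
def recall(tp: Int, fn: Int): Double = tp.toDouble / (tp + fn)
def fMeasure(p: Double, r: Double): Double = 2 * p * r / (p + r)
```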
5.1.2 Approach
Supervised classification learning techniques are used. Repricing domain experts have classified
the 7,300 price series on product level as either manual repricing or automated repricing.
Classification criteria were, for example: very frequent price changes, fast price changes in order
to hold a position, price change patterns, and price changes at conspicuous times. Two main
classification schemes of automated repricers are further distinguished:
• Pure: This class contains exactly the experts' classification.
• Injected: The injected class incorporates the identified automated repricers per product
from the pure class. Additionally, if a reseller is classified as an automated repricer for at
least two products, its whole assortment is tagged as having automated repricing origin. The
basic idea behind this concept is that it is unreasonable for a reseller to apply automated
repricing to only one product.
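The injected scheme can be sketched on (reseller, product) pairs; an illustrative Scala sketch with invented names, not thesis code:

```scala
// Sketch of the 'injected' scheme: a reseller tagged as an automated repricer
// for at least two products gets its whole assortment tagged as automated.
def injectedScheme(pureAuto: Set[(String, String)],
                   assortment: Set[(String, String)]): Set[(String, String)] = {
  val autoResellers = pureAuto.groupBy(_._1).collect {
    case (reseller, pairs) if pairs.map(_._2).size >= 2 => reseller
  }.toSet
  pureAuto ++ assortment.filter { case (reseller, _) => autoResellers(reseller) }
}
```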
For each reseller price series a wide range of features has been calculated. The main domains are
delta, gap, position and price features. Features are tagged as meta features if competition is
incorporated. The offers are prefiltered by dropping resellers which could not be determined
during the offer crawling process. Further, offers with inexplicable prices (above 10,000€ or
below 0.01€) are filtered. Table 12 shows the forty features used.
The features which are preferred by the decision tree classifiers are marked in the 'importance'
column, where three stars mean most important. Unsurprisingly, the price delta
features are the most selected features. However, other features are important too, such as the most
frequent cent ending: a manually repricing reseller may more often have prices ending
on 99 cents, whereas automated repricing resellers may have other, irregular cent endings. A high
distinct price ratio can only be reached if a lot of price changes have taken place. Availability is
selected as an important feature, which confirms the observation from chapter 4.3.1 that price
delta ratios differ regarding availability.
The evaluation scheme is presented in figure 12. The main goal of the evaluation is an
optimized classifier setup in order to provide good prediction results. The training set is auto
balanced by SMOTE since only 5% of the resellers are classified as automated repricers. If
Category | Nr | Name | Description | Range | Meta | Importance
DELTA | 1 | avgDelta | Degree of price changes in reseller price series | [0..1] | | ***
DELTA | 2 | avgDeltaToProduct | Ratio of avgDelta to average degree of price changes of product | [0..max] | M |
DELTA | 3 | avgDeltaToMinPriceProduct | Ratio of avgDelta to degree of min price changes of product | [0..max] | M |
DELTA | 4 | avgTop3ShortestChangeRatio | Average of the top 3 shortest price change intervals in milliseconds | [x..max] | | ***
DELTA | 5 | deltaDownRatio | Ratio of reseller's price decreases to all possible price changes | [0..1] | | **
DELTA | 6 | deltaUpRatio | Ratio of reseller's price increases to all possible price changes | [0..1] | |
DELTA | 7 | downUpDeltaRatio | Ratio of deltaDownRatio and deltaUpRatio | [0..max] | | **
DELTA | 8 | longestPlateau | Longest period in days with no price changes | [0..80] | |
DELTA | 9 | mainDeltaTime | Most frequent hour of reseller's price changes | [0..23] | |
DELTA | 10 | maxDeltaDayRatio | Highest reseller delta ratio achieved on a single day | [0..1] | | **
DELTA | 11 | nightDeltaRatio | Ratio of how many reseller deltas are made between 23h and 7h | [0..1] | |
GAP | 12 | avgGapToMinPrice | Relative gap of the reseller price series to the product min price | [1..max] | M |
GAP | 13 | avgHigherGap | Average absolute gap to next higher offer | [0..max] | M |
GAP | 14 | avgLowerGap | Average absolute gap to next lower offer | [0..max] | M |
GAP | 15 | avgRelativeHigherGap | Average gap to next higher offer compared to product min price | [0..1] | M |
GAP | 16 | avgRelativeLowerGap | Average gap to next lower offer compared to product min price | [0..1] | M |
GAP | 17 | mainAbsoluteHigherGap | Most frequent absolute gap to next higher offer | [0..max] | M |
GAP | 18 | mainAbsoluteLowerGap | Most frequent absolute gap to next lower offer | [0..max] | M |
POSITION | 19 | avgPos | Average position exclusive delivery costs | [1..max] | M |
POSITION | 20 | avgPosWithDelivery | Average position inclusive delivery costs | [1..max] | M |
POSITION | 21 | degreeInTop3 | Reseller's degree in top 3 without delivery costs | [0..1] | M | *
POSITION | 22 | degreeInTop3WithDelivery | Reseller's degree in top 3 with delivery costs | [0..1] | M |
POSITION | 23 | degreeInTop10 | Reseller's degree in top 10 without delivery costs | [0..1] | M |
POSITION | 24 | degreeInTop10WithDelivery | Reseller's degree in top 10 with delivery costs | [0..1] | M |
POSITION | 25 | endogenousChangeRatio | Ratio of endogenous position changes | [0..1] | M |
POSITION | 26 | exogenousChangeRatio | Ratio of exogenous position changes | [0..1] | M |
POSITION | 27 | maxPosition | Maximum position of the reseller without delivery costs | [1..max] | M |
POSITION | 28 | minPosition | Minimum position of the reseller without delivery costs | [1..max] | M |
POSITION | 29 | positionSpan | Difference between reseller's max and min position without delivery costs | [0..n] | M |
PRICE | 30 | avgPriceToProduct | Ratio of average price to average product price | [0..max] | M |
PRICE | 31 | distinctPriceRatio | Number of distinct prices | [1..max] | | *
PRICE | 32 | priceSegments | Number of coherent price segments in reseller price series | [1..max] | |
PRICE | 33 | priceTrend | Reseller's price trend by linear regression | [min..max] | |
PRICE | 34 | priceTrendComparison | Ratio of reseller's price trend to product price trend | [min..max] | M |
PRICE | 35 | relativeMedianSpan | Relative price span between reseller's median price and min price | [0..1] | |
PRICE | 36 | mostFrequentCentEnding | Most often used cent amount by reseller | [0..0.99] | | *
PRICE | 37 | relativePriceSpan | Relative price span between reseller's max and min price | [0..1] | |
REST | 38 | availability | Degree of offer availability for delivery | [0..1] | | *
REST | 39 | numberOfResellers | Number of resellers which are selling this product | [1..max] | M |
REST | 40 | offerRatio | Ratio of average number of reseller offers to average number of product offers | [0..max] | M | ***
Table 12: Overview of classification features (Meta: M if competition is incorporated).
Figure 12: The evaluation scheme of the automated repricing classification. The 21.6M offers are filtered (1.8M dropped), yielding 7.3K price series; preprocessing calculates the 40 features and the experts classify the series, from which the pure and injected datasets are derived. Inside the feature wrapper approach, the greedy and binary feature selectors are combined with a 5-fold cross validation and a grid search to find the best classifier configuration, producing optimized feature sets per evaluation context. The actual evaluation then uses auto balancing, a 10-fold cross validation and a grid search over the classification schemes (pure, injected) and classifiers (C4.5, REP tree, random forest); the training set is used to build the model and the testing set to predict. Applied metric: area under the ROC curve.
no balancing scheme were used, prediction models would be built which nearly always predict
manual repricing. Appendix B covers the impacts of different balancing mechanisms on the
achieved metrics.
In order to get stable evaluation results a ten-fold cross validation is conducted. Three
different decision tree classifiers have been used: C4.5, REP tree and random forest. Pruning
has been activated for C4.5 and REP trees.
Two feature selectors have been developed. The greedy feature selector adds features
iteratively; in each iteration the feature with the highest positive metric gain is selected.
The binary feature selector iterates over all features and builds two classes: with and without
the current feature. Random samples are generated for each class. If the average metric of
the class incorporating the current feature is higher than that of the class without it,
the current feature is added. Further details of the feature selection schemes can be found in
appendix B.
A naive grid search has been implemented in order to ensure an optimized configuration
of the used classifiers. This brute force approach is needed to ensure comparability between
cross validation folds. The grid search parameters can be obtained from appendix C. Feature selection
and grid search are combined in a deeply nested way to consider interdependencies.
Before the actual evaluation is triggered, the sophisticated feature wrapper approach is
applied with the intention of filtering an optimized feature set. Afterwards, the actual evaluation
can be set up with the previously optimized features for different scenarios with the classification
schemes and classifiers. The applied metric is the area under the ROC curve since it represents
the general classification performance incorporating both classes.
5.1.3 Evaluation
The experts have classified 383 price series as artificial, which amounts to 5.25%. The automated
repricing ratio per category is shown in figure 13. The car category is
characterized by a high degree of automated repricing of 21.68%. Accordingly, a car tyre26 scored
the highest automated repricing ratio with 41.98%.
Figure 15(a) presents the classification results with the pure classification scheme. The results
are averaged over the ten cross validation folds. The full evaluation took 43 hours.
Random forest performs best and achieves, with assistance of the binary random feature selector,
an average predictive ROC area of 97.11% (testing ROC area was 98.59%). Hereby, the F-measure
accounts for 88.94%, whereas the manual repricing part achieves 89.49% and the
automated repricing part reaches 87.33%.
The C4.5 classifier attains up to 95.94% predictive ROC area via the binary feature selector
(testing ROC area was 97.28%). This classifier scores the highest F-measure with 89.97%, which
is one percent more than random forest. An example C4.5 tree with the pure classification scheme
and binary feature selection is pictured in figure 14. The leaves contain the number of classified
and misclassified instances. The strong correlation of price deltas to automated repricing
classification becomes apparent. Examples of larger C4.5 trees can be viewed in appendix F.
26 GTIN 4019238434033
Figure 13: Automated repricing ratio of categories.
The fast REP tree places third with up to 95.05% predictive ROC area.
If we look at the injected results in figure 15(b), the predictive ROC area drops significantly,
by approximately 6% across all classifiers. The classifier ordering remains unchanged. Consequently,
random forest scores best with up to 91.48% predictive ROC area (testing ROC area
was 93.08%). An F-measure of up to 82.14% has been realized. In general, the prediction results
are still at a high level. Detailed results including all class-dependent metrics, tree sizes and
preferred features are shown in appendix E.
Across all scenarios, the binary sampling feature selector performs better, with an average
ROC area of 92.67% compared to 91.99% for greedy. The binary feature selector selects on average
thirteen features, whereas the greedy feature selector chooses four. In summary, the binary
feature selector calculated three times more trees.
So, an interesting question arises:
Are the promising classification results transferable to practice?
In practice, a high frequency crawling interval of 15 minutes is expensive. Furthermore, price
series information covering 80 days may not be available.
Therefore, the crawling interval and the time range are synthetically reduced for the
subsequent analysis. The tests were conducted with the C4.5 classifier configured with the
binary feature selector using the pure classification scheme under a 10-fold cross validation.
Figure 16(a) shows a slow metric decrease as the crawling interval is reduced down to a twice-daily
interval. The ROC area falls from 95.94% (15 min interval) to 94.03% (720 min interval).
This interval incorporates the preferred features: degreeInTop10WithDelivery, deltaDownRatio
Figure 14: A generated C4.5 tree. The root splits on maxDeltaDayRatio; further splits use features such as deltaUpRatio, avgDelta, deltaDownRatio, avgTop3ShortestChangeRatio, degreeInTop10, longestPlateau, maxPosition and mostFrequentCentEnding.
Figure 15: Classification prediction results (ROC area by feature selection mechanism, binary vs. greedy, for REP tree, C4.5 and random forest). (a) Classification prediction results with the pure classification scheme. (b) Classification prediction results with the injected classification scheme.
and avgTop3ShortestChangeRatio. Ultimately, at a daily crawling interval a predictive ROC area
of 90.89% is reached. At this stage, the overall F-measure amounts to 81.69% and the F-measure
for the automated repricing class (AR) stands at 78.31%. Although the number of
underlying offers is reduced by almost a factor of 100, the achieved metrics are still at a high level.
Figure 16(b) continues with the synthetic daily crawling interval. Now the time range is
reduced. Covering only twenty offers, an average predictive ROC area of 89.89% is still
reached. Nonetheless, the F-measure of the automated repricing class drops to 74.81%, and for a
time range of ten days it further decreases to 66.19%.
5.1.4 Discussion
In summary, the random forest classifier and binary random feature selector perform best at au-
tomated repricing detection. The feature selectors significantly reduce the number of features
Figure 16: Transition of the classification prediction results from theory to practice (ROC area, F-measure and F-measure(AR)). (a) Sampling rate curve [minutes]. (b) Time range curve [days].
under minimal prediction accuracy loss. Auto balancing can establish class equipartitioning.
Promising classifier results have been achieved for the high frequency crawling interval. A
transition to practice is possible, since even a time range of only 20 daily observations yields
good results. The author recommends a time range of at least 20 days with a twice-daily crawling interval.
However, the ground truth is not known. The results are based on supervised classification by
domain experts, so the actual automated repricing distribution may look different.
An interesting subsequent question is whether classifier models can be created that
handle different sampling intervals and time ranges all at once. The features could enable this
since they are designed as relative values.
In pursuance of unmasking automated repricing resellers, a single price series may be sufficient
to draw conclusions about the whole assortment. This information could be used in
practice as additional information by repricing providers. Blacklists or dedicated strategies
for handling the uncovered resellers are conceivable. This chapter has shown that automated
repricing classification is feasible. Hence, the foundations for prediction and strategy extraction
are laid.
5.2 Price Prediction
The price prediction abstracts from the underlying pricing strategies. The hypothesis is that accurate
forecasts can be made without knowledge of the strategies. A sophisticated combination of decision and
regression trees has been developed. The approach is compared to a broad spectrum of other
non-feature-based predictors. The divide-and-conquer concept is consistently pursued by consideration
of:
• Price delta series instead of plain price series
• Different price delta types
• Multiple stage solutions
An exhaustive evaluation has been conducted by varying prediction intervals, target prediction
series and approach configurations.
5.2.1 Concepts
Hyndman and Athanasopoulos (2014, pp. 46-51) describe a special treatment regarding cross
validation during the evaluation of time series. Randomization can't be applied since it
deconstructs the underlying time dependency. Time series cross validation needs to consider
prior observations as the training set. In order to produce reliable forecasts, a minimum number of
observations k is needed. The basic process is described here:
1. Select the observation at time k+i for the testing set and rely on the observations at time
points 1, 2, ..., k+i-1 to build the prediction model. Compute the forecast error for time k+i.
2. Repeat the above step for i = 1..n, where n is the number of desired folds.
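The steps above can be sketched as an expanding-window fold generator; an illustrative Scala sketch on index level with invented names, not thesis code:

```scala
// Sketch of time series cross validation folds: fold i trains on the first
// k+i-1 observations (indices 0..k+i-2) and tests on the observation at index k+i-1.
def timeSeriesFolds(k: Int, numFolds: Int): Seq[(Range, Int)] =
  (1 to numFolds).map(i => (0 until (k + i - 1), k + i - 1))
```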
An example of a five-fold time series cross validation is shown in figure 17.
Figure 17: 5-fold time series cross validation partitioning scheme.
Regression trees differ from normal decision trees by using continuous values at the leafs
(Quinlan 1992, p. 343). A simple regression tree example is shown in figure 18. The values in
the leafs are averages of the incorporating instances. Regression tree algorithms minimize the
instance variations instead of using information gain as splitting criterion (Witten and Frank
2005, pp. 243-244).
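The variance-based splitting criterion can be sketched as follows; an illustrative Scala sketch of one common formulation (size-weighted variance reduction), not the exact Weka implementation:

```scala
// Sketch of the regression tree splitting criterion: choose the split that
// maximally reduces the size-weighted variance of the target values.
def variance(xs: Seq[Double]): Double = {
  val mean = xs.sum / xs.size
  xs.map(x => (x - mean) * (x - mean)).sum / xs.size
}

def varianceReduction(parent: Seq[Double], left: Seq[Double], right: Seq[Double]): Double = {
  val n = parent.size.toDouble
  variance(parent) - (left.size / n) * variance(left) - (right.size / n) * variance(right)
}
```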
bike condition
  poor -> 0 km cycling
  top  -> season
            winter -> motivation
                        normal   -> 30 km cycling
                        superior -> 60 km cycling
            summer -> tour de france stage live on tv
                        no  -> 80 km cycling
                        yes -> 0 km cycling
Figure 18: A simple regression tree example.
M5 trees were developed by Quinlan (1992). They are refined regression trees: instead of numerical
values, multivariate linear models are placed at the leaves. Further, a reduced error
pruning technique is applied, as well as smoothing for compensating severe discontinuities
between neighbouring linear models. Wang and Witten (1997) enhanced M5 trees by a number
of additions: a compensation factor is added to the linear models, a minimum instance
number at the leaves is introduced and a treatment for missing values is implemented.
Support vector regression is a non-linear technique that tries to fit a flat function to a training
set. A tolerance parameter determines a boundary at which deviations are considered
relevant (via a loss function). First, the training set is mapped into a feature space.
A dot product calculation is performed by the underlying kernel function and weights are added.
Overfitting is prevented by slack variables for constraint relaxation (Smola and Schölkopf 2004).
Support vector regression is used, for example, in financial market prediction and electric utility
forecasting (Sapankevych and Sankar 2009, p. 26).
In order to evaluate numeric forecasts, prediction metrics have to be applied which comprise
the magnitude of prediction error. The forecast error of the ith observation can be described
as the difference between actual and predicted value, e_i = y_i − ŷ_i. The two most commonly
used metrics are further presented (Hyndman and Athanasopoulos 2014, pp. 46-51; Witten
and Frank 2005, pp. 176-179):
Mean absolute error: MAE = average(|e_i|)
Root mean squared error: RMSE = sqrt(average(e_i^2))
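Both metrics are straightforward to compute; a minimal sketch in Python (the example values are invented):

```python
import math

def mae(actual, predicted):
    # Mean absolute error: average magnitude of the forecast errors.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Root mean squared error: penalizes large deviations more strongly.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [100.0, 102.0, 101.0, 105.0]
predicted = [100.0, 100.0, 103.0, 101.0]
# errors: 0, 2, -2, 4
```

The squared term in the RMSE is what makes a single error of 4 weigh more than two errors of 2, which is why the two metrics can rank predictors differently.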
The metrics differ in their scales. RMSE penalizes larger deviations more strongly.
This thesis uses the forecast package27 of the statistical software R28. This package provides a
wide range of statistical methods for predicting univariate time series which are briefly recapit-
ulated in table 13.
The Kalman filter estimates the state of a process by minimizing the mean squared error.
A predetermined set of equations is recursively used. Predictions are based on current state
and external variables plus their probability distributions and dependencies (Welch and Bishop
2006).
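The recursive predict/update cycle can be illustrated with a minimal scalar filter (a strong simplification of the general state-space formulation; the noise parameters below are arbitrary example values, not taken from the thesis):

```python
def kalman_1d(measurements, process_var=1e-5, measurement_var=0.1):
    """Minimal scalar Kalman filter: recursively fuses the current estimate
    with each new measurement, weighted by the Kalman gain."""
    estimate, error = measurements[0], 1.0
    estimates = [estimate]
    for z in measurements[1:]:
        # Predict: state assumed constant, uncertainty grows by process noise.
        error += process_var
        # Update: the gain balances estimate uncertainty against measurement noise.
        gain = error / (error + measurement_var)
        estimate += gain * (z - estimate)
        error *= (1 - gain)
        estimates.append(estimate)
    return estimates

smoothed = kalman_1d([10.0, 10.4, 9.8, 10.2, 10.1])
```

Each estimate is a convex combination of the previous estimate and the new measurement, so the output stays within the range of the observed values while damping the noise.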
5.2.2 Approach
The first step of the price prediction approaches is a price series simplification. The series are
converted to a price delta series. Such a normalization process has the advantage of having zero
mean and the same scaling (Arias et al. 2013, p. 8:13).
Besides, three price delta types are distinguished:
1. Simple Delta: This type reduces the price delta to the essentials: a boolean
value indicating whether a price delta takes place or not.
2. Direction Delta: This delta type additionally considers the delta direction. Three
values are possible: positive delta (price hike), no delta and neg-
ative delta (price cut).
3. Absolute Delta: This delta reflects the actual numerical price delta value.
This trichotomy splits the problem into more tractable prediction sub-tasks.
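The three delta types can be derived from a price series in a few lines. A sketch using integer cent prices (the example data and function names are invented):

```python
def to_deltas(prices):
    # Absolute delta: numeric difference between consecutive prices (in cents).
    return [b - a for a, b in zip(prices, prices[1:])]

def simple_delta(deltas):
    # Boolean: did a price change take place at all?
    return [d != 0 for d in deltas]

def direction_delta(deltas):
    # +1 price hike, 0 no change, -1 price cut.
    return [(d > 0) - (d < 0) for d in deltas]

prices = [1999, 1999, 1849, 1849, 1999]
absolute = to_deltas(prices)
```

Each representation discards progressively less information, which is exactly what makes the simple delta the easiest and the absolute delta the hardest sub-task.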
Primarily, two types of approaches can be separated at high level. On the one hand, there are
time series-based approaches which consider only the delta series. This sort of predictor has
the capability of predicting the price delta for any point in time. The reason for this is that their
skeletons are based on time-dependent functions. On the other hand, there are feature-based
approaches which calculate features on the basis of the price series. An approach overview is
given in figure 19.
The no delta predictor always forecasts no price change. The evaluation will later show that
such a 'no change predictor' achieves good results in stable environments. This predictor can
be seen as a reference because the repricing providers implicitly rely on the assumption of stable
prices during their price calculations.
27 https://cran.r-project.org/web/packages/forecast/index.html
28 https://www.r-project.org
Autoregressive Integrated Moving Average (ARIMA): Autoregressive means regression against itself plus allowance of a randomized variable (white noise). Moving average models use past forecast errors instead of past values for prediction. ARIMA combines both concepts (Hyndman and Athanasopoulos 2014, pp. 223-230; Hyndman and Khandakar 2008, pp. 8-12).

Exponential Smoothing (ETS): The core element of this method is a weighting of points in time: the most recent time points receive higher weights (Hyndman; Koehler, et al. 2002; Hyndman and Athanasopoulos 2014, pp. 171-212).

Box-Cox transform, ARMA errors, Trend, and Seasonal components (BATS): This technique considers the features incorporated in its acronym. The Box-Cox transformation tries to optimize the regression models by logarithmically transforming the target values; the objective is to probe different transformation parameters and to select the best transformation. BATS is a multi-seasonal model which relies on exponential smoothing (Livera et al. 2011).

Trigonometric BATS (TBATS): TBATS is an extended version of BATS with better consideration of non-integer seasonality (Livera et al. 2011).

Holt Winters (HW): This technique is a simple version of exponential smoothing with only one seasonal component. It tries to decompose the time series into a seasonal, slope and level component (Hyndman and Athanasopoulos 2014, pp. 188-194).

Double-Seasonal Holt Winters (DSHW): DSHW is an extended version of HW which considers two seasonal components (Taylor 2003).

Seasonal and Trend decomposition using Loess (STL): STL supports any type of seasonality. In addition, this method provides outlier robustness and support for changing seasonal components. Time series are decomposed into seasonal, trend and irregular components (Hyndman and Athanasopoulos 2014, pp. 163-167).

Neural Network Autoregression (NNETAR): Artificial neural networks allow complex non-linear relationships between the response variables and their predictors by reconstructing simplified nerve structures. NNETAR is a feed-forward neural network with hidden (intermediate) layers: each layer of nodes receives inputs from the previous layer until the output (prediction) layer is reached. The predictors are weighted by a learning algorithm that reduces a cost function like the RMSE (Hyndman and Athanasopoulos 2014, pp. 276-280).

Table 13: Overview of time series prediction methods of R's forecast package.
[Diagram: the offer concentrator feeds two parameters (type of delta to predict, sample interval) into the predictor classes: time series-based predictors (simple, R, Weka) that predict the next delta, a feature-based decision/regression tree predictor that predicts the period delta, a hybrid Weka overlay predictor combining time series and features, and a dummy no delta predictor as reference. The R predictor offers the models ARIMA, BATS, ETS, TBATS, STL, HW, DSHW and NNETAR; the decision/regression tree predictor combines a random forest for the direction delta with an M5 tree for the absolute delta, plus grid search, balancing (none, SMOTE, weight-based) and pretraining.]
Figure 19: The price delta prediction concept.
The simple predictor represents a heuristic which calculates the average price delta interval.
As soon as a newly begun time interval is detected during the prediction process, the most fre-
quent price delta is predicted.
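A minimal sketch of this heuristic (the function names and data are invented for illustration; the thesis implementation is not shown):

```python
from collections import Counter

def train_simple_predictor(deltas):
    """Learn the average spacing between nonzero deltas and the most
    frequent nonzero delta value."""
    change_points = [i for i, d in enumerate(deltas) if d != 0]
    gaps = [b - a for a, b in zip(change_points, change_points[1:])]
    avg_interval = sum(gaps) / len(gaps) if gaps else None
    most_common = Counter(d for d in deltas if d != 0).most_common(1)[0][0]
    return avg_interval, most_common

def predict_simple(t, last_change, avg_interval, most_common):
    # Predict the most frequent delta once a new interval has begun, else no change.
    return most_common if t - last_change >= avg_interval else 0

avg, common = train_simple_predictor([0, -50, 0, 0, -50, 0, 0, -50])
```

The predictor therefore only encodes "how often" and "by how much, typically", which is why it serves as a cheap baseline rather than a serious competitor.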
The R predictor uses the forecast package of R. R is controlled by the rscala29 plugin which
enables R execution within Scala. In order to cope with missing values in the dataset, a Kalman
filter with ARIMA approximation has been used to impute the missing values.30 The previously
presented eight time series prediction methods from table 13 can be used as predictors (namely
ARIMA, ETS, BATS, TBATS, HW, DSHW, STL, NNETAR). In general, the forecast package of R
has the benefit of auto-configuration by automatically analyzing the given time series. Since
the Holt Winters methods (HW, DSHW) cannot handle zero or negative values, the corresponding
delta series has been shifted to a positive series. Afterwards, the prediction is shifted
back to match the original delta series.
The weka predictor is based on a forecasting plugin from Pentaho31 which relies on Weka. The
basic concept comprises a time point deconstruction by removing the temporal order. Hereby,
the time points are split up into 'lagged variables'; up to 24 lags are allowed. The implementation
used here considers time dynamics of: a) hour of day, b) morning/afternoon, c) working
day/weekend and d) day of week. The actual prediction is based on the lagged variables and
an underlying base predictor. Linear regression, support vector regression (SVR) and a multilayer
perceptron (MLP) can be selected. Besides, this approach supports balancing by weights
in order to handle imbalanced datasets.
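The lag construction can be illustrated as follows (a simplified stand-in for the plugin's behaviour, not its actual code; the function name is invented):

```python
def lagged_matrix(series, max_lag):
    """Turn a time series into rows of lagged variables plus the target,
    removing the explicit temporal order: each row carries its own history."""
    rows = []
    for t in range(max_lag, len(series)):
        features = series[t - max_lag:t]   # lag_max ... lag_1
        rows.append((features, series[t])) # (lagged inputs, target)
    return rows

rows = lagged_matrix([1, 2, 3, 4, 5, 6], max_lag=3)
```

Once the series is in this tabular form, any standard regression learner (linear regression, SVR, MLP) can be trained on it without knowing it came from a time series.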
The weka overlay predictor supports the same characteristics as the normal weka predictor
plus the consolidation of overlay data. Overlay data, also known as 'intervention variables',
are time-specific features incorporated into the model. The features of the decision/regression
tree predictor approach are used.
The decision/regression tree predictor is based on features. The features are separated into
current and historic features. The fifteen current features are only valid for a certain point in
time and are kept as implicit as possible. They are shown in table 14. The introduction of this
new class of features has the main goal of deriving meta rules that consider both classes of
features, e.g.
IF currentHour==9 AND mainDeltaTime==9 AND positionLost THEN predict delta.
The decision/regression tree predictor simplifies the prediction task by using two stages:
1. Prediction of the direction delta with a random forest
29 https://cran.r-project.org/web/packages/rscala/index.html
30 The R package 'imputeTS' has been used for this purpose: https://cran.r-project.org/web/packages/imputeTS/index.html
31 http://wiki.pentaho.com/display/DATAMINING/Time+Series+Analysis+and+Forecasting+with+Weka
Nr  Name                                    Description                                                       Range   Meta
1   currentDay                              Current week day [UTC]                                            [1..7]
2   currentHour                             Current hour of day [UTC]                                         [0..23]
3   currentPosition                         Current position without delivery costs                           [1..n]  ✓
4   currentPositionWithDelivery             Current position considering delivery costs                       [1..n]  ✓
5   currentLowerGap                         Current gap to the next lower offer                               [0..n]  ✓
6   currentHigherGap                        Current gap to the next higher offer                              [0..n]  ✓
7   currentAvailability                     Availability status of the offer                                  [0,1]
8   hoursSinceLastDelta                     Time in hours since the reseller changed his price                [0..n]  ✓
9   hoursSinceLastPositionLost              Time in hours since the reseller deteriorated his price rank      [0..n]  ✓
10  hoursSinceLastPositionGained            Time in hours since the reseller improved his price rank          [0..n]  ✓
11  hoursSinceLastEndogenousPositionChange  Time in hours since the reseller changed his position caused by an own price change  [0..n]  ✓
12  hoursSinceLastExogenousPositionChange   Time in hours since the reseller changed his position caused by competitors          [0..n]  ✓
13  hoursSinceLastAvailable                 Time in hours since the reseller's product was last available     [0..n]  ✓
14  aloneOnPrice                            Are competitors in the same price class?                          [0,1]   ✓
15  currentResellers                        The number of current resellers                                   [1..n]  ✓
Table 14: Overview of prediction features.
2. If an actual price delta is predicted, the actual deflection is determined with an M5 regres-
sion tree. Hereby, only training data of the currently predicted delta class are used.
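The two-stage idea can be illustrated with trivial stand-ins: a per-feature majority vote replaces the random forest, and a per-class average replaces the M5 tree. This is purely illustrative; the actual predictor uses the Weka learners named above.

```python
from collections import defaultdict

def train_two_stage(features, deltas):
    """Stage 1: majority direction per feature value (stand-in for the random forest).
    Stage 2: average magnitude per direction class (stand-in for the M5 tree)."""
    direction_votes = defaultdict(lambda: defaultdict(int))
    magnitude = defaultdict(list)
    for f, d in zip(features, deltas):
        direction = (d > 0) - (d < 0)
        direction_votes[f][direction] += 1
        if direction != 0:
            magnitude[direction].append(d)

    def predict(f):
        votes = direction_votes.get(f)
        if not votes:
            return 0
        direction = max(votes, key=votes.get)
        if direction == 0 or not magnitude[direction]:
            return 0
        # Stage 2 is trained only on instances of the predicted delta class.
        values = magnitude[direction]
        return sum(values) / len(values)

    return predict

predict = train_two_stage(["9h", "9h", "12h", "12h"], [-100, -200, 0, 0])
```

The key design point survives the simplification: the magnitude model never sees "no change" instances, so it is not biased towards zero.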
This approach is enriched with the following capabilities:
Grid search: The classifier configurations are optimized with grid search. The corresponding parameters can be obtained from appendix G.

Auto balancing: Auto balancing handles imbalanced datasets.

Assortment prediction: This capability considers alternative averaged reseller delta features regarding the reseller's assortment. The concerned features can be found in chapter 5.1.2.

Pretraining: The predictor can be assigned further training instances, e.g. all assortment instances or all minimum price instances.

Market simulation: The price deltas are predicted for a new time frame. The features are recalculated based on the prediction, and iteratively the next price deltas can be forecasted. In this way the market can be simulated. However, with each forecast the prediction error increases.
[Diagram: offers enter a 20-fold time series cross validation over the minimum price and reseller price series; for every accumulated interval, current features (current time, current position, time since last delta, ...) and historic features (avg delta, main delta time, degree in top 3, ...) are calculated and Weka instances are created; (1) the delta class for the next period is predicted with a grid-searched random forest (positive delta, no delta, negative delta); (2) the exact delta is predicted with grid-searched M5 regression trees using only instances of the specific delta class; MAE and RMSE are applied as metrics.]
Figure 20: The evaluation scheme of the decision/regression tree price predictor.
In a preprocessing step, the offers are synthetically concentrated. Thereby, sampling rates of
24, 12 and 6 hours are chosen as concentration intervals. These intervals reflect the prediction
window. A further prerequisite is the selection of the desired delta prediction type. Besides,
the decision/regression tree predictor is equipped with either the pretraining, assortment or
normal prediction method, plus a balancing scheme. Afterwards, a nested in-depth evaluation is
applied. The simplified evaluation concept is shown exemplarily for the decision/regression
tree predictor in figure 20:
1. A time series cross validation with up to 80 folds has been conducted. In every fold,
only currently offering resellers are considered. For example, a 20-fold time series cross
validation with a daily crawling interval of 80 days leads to a first fold which comprises
the first 60 days (corresponding to 60 features and price series points) in order to predict
the price of the 61st day. The cross validation approach used in this thesis keeps the training
(60 days) and testing set (20 days) sizes stable. The different simulated crawling intervals of
24/12/6 hours correspond to a 20-/40-/80-fold time series cross validation.
2. Different product aggregation schemes are applied, namely all products and the car prod-
uct category. The car category has been chosen due to its high degree of pricing dynamics
(chapter 4.3.2).
3. Dedicated price series are analyzed: either a synthetic minimum price series or reseller
price series. The prediction of the minimum price series is a simplification of the prediction
problem: more price deltas occur at the first position, and the minimum price series
generally has a high economic impact.
4. The additional features for the current timestamp are calculated.
5. The presented two-stage prediction is applied with configurations optimized by grid
search.
6. The error measures MAE and RMSE are calculated.
A full market simulation over multiple periods is not part of the evaluation.
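The fold construction from step 1 can be sketched as follows, assuming a fixed-size training window that slides forward in time (an illustration of the scheme, not the thesis implementation):

```python
def ts_cross_validation_folds(n_points, n_folds):
    """Yield (train_indices, test_index) pairs with a fixed-size training
    window sliding forward in time (no shuffling, temporal order intact)."""
    train_size = n_points - n_folds
    for k in range(n_folds):
        yield list(range(k, k + train_size)), k + train_size

# 80 daily observations, 20 folds: train on 60 days, predict the next day.
folds = list(ts_cross_validation_folds(80, 20))
```

Unlike ordinary k-fold cross validation, no fold ever trains on data that lies after its test point, which is essential for honest time series evaluation.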
5.2.3 Evaluation
The evaluation’s main goal is the analysis of the developed decision/regression tree approach
(further called decision tree predictor) for predicting price deltas. Excerpts of a sophisticated
analysis as described in the previous chapter are presented, namely:
1. The prediction of minimum prices (100 products) under consideration of different crawl-
ing intervals and delta types
2. The full prediction of reseller prices of a dedicated product category (5 products) under
consideration of different delta types and a daily crawling interval
RMSE has been chosen as target metric due to its harsher punishment of larger devia-
tions. The RMSE values are all averaged. Deviations within the simple delta are counted as
one. Deviations within the direction delta are counted as the gap between the following encodings:
priceHike=1, noDelta=0 and priceCut=-1. Deviations within the absolute delta are
considered as such. The evaluation has been conducted with a crawling start time at 8 am
(UTC). Appendix H shows the independence of the starting hour by calculation of the RMSE
stability. A starting hour at 8 am (UTC) has been chosen since at this hour the price delta ratio
peaks (see chapter 4.3.1).
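With the encoding above, the direction delta RMSE can be computed directly (the example series are invented):

```python
import math

# Direction deltas encoded as: price hike = 1, no delta = 0, price cut = -1.
def direction_rmse(actual, predicted):
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Predicting a cut (-1) when a hike (1) occurs counts as a deviation of 2:
score = direction_rmse([1, 0, -1, 0], [-1, 0, -1, 1])
```

The encoding makes confusing a hike with a cut twice as costly as missing a change altogether, which matches the economic intuition behind the metric.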
Forecasting Minimum Price Series of all Products
The first prediction task includes forecasting all minimum prices. Synthetically created crawling intervals of 24, 12 and 6 hours have been selected. The broad spectrum of prediction approaches has been applied, covering all delta types. Eight prediction approaches are visualized, selected by the predictors' performance (in terms of lowest RMSE):
• The no delta predictor which can be seen as reference
• The simple delta predictor
• The two best configured decision tree predictors
• The two best R predictors
• The two best weka predictors
The complete prediction results for the daily crawling interval are unveiled in appendix I.
An offer concentration with daily intervals increases the average minimum price delta to 25%.
Figure 21 presents the prediction results for the simple delta. The pretrained decision tree predic-
tor performs best in all crawling intervals. The reference no delta predictor has been surpassed
by 5.32% (1440 minutes), 5.83% (720 minutes) and 1.40% (360 minutes). The pretrained
decision tree predictor performs slightly better than its simplified version without pretraining.
The R predictors are only within range at the daily interval with a RMSE of 0.4988. The other
predictors reduce errors with a higher crawling rate whereas HW reaches a RMSE of 0.8482
at the 360 minutes interval. The weka predictors with support vector regression (SVR) strongly
reduce their prediction errors with increasing crawling rate. At the highest crawling rate the
weka predictor with SVR achieves a RMSE of 0.2744 which is the third lowest error measure.
The no delta predictor exhibits an error measure of 0.2741 whereas the pretrained decision tree
predictor can take the lead with 0.2702 RMSE.
An interesting point is the observation of price persistence ratios. Price persistence ratios
(PPR) are defined as PPR = 1 − delta ratio. The no delta predictor has per definition a
PPR of one. The PPR of the pretrained decision tree predictor increases from 93% (24 hours) to
97% (12 hours) and finally reaches 99% (6 hours). The PPR of the weka predictor with SVR
massively increases (43% ⇒ 93% ⇒ 98% across the crawling intervals), which explains the corresponding reduced errors.
Figure 21: Minimum price prediction results for the simple delta.
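Computing a predictor's PPR from its forecast deltas is a one-liner (the example data are invented):

```python
def price_persistence_ratio(predicted_deltas):
    """Share of predictions that forecast no price change (PPR = 1 - delta ratio)."""
    no_change = sum(1 for d in predicted_deltas if d == 0)
    return no_change / len(predicted_deltas)

# 2 of 10 predictions forecast a change:
ppr = price_persistence_ratio([0, 0, -50, 0, 0, 0, 120, 0, 0, 0])
```

A PPR close to one means the predictor behaves almost like the no delta reference, which is why the metric helps interpret the RMSE gaps above.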
Figure 22 presents the direction price delta prediction results. In general, it shows a similar
picture to the previous delta type. The pretrained decision tree predictor takes the lead again.
However, the gaps to the no delta predictor become smaller (2.54% ⇒ 1.78% ⇒ 0.10%
across the crawling intervals). Again, the two best R predictors are far behind and the
weka predictors' PPR is strongly positively correlated with the crawling interval.
In the task of predicting the most difficult delta, the absolute delta, a convergence towards
the no delta predictor can be observed at a crawling interval of 12 hours. The results are shown
in figure 23. The pretrained decision tree predictor exhibits a RMSE of 6.62 (no delta predictor:
6.85 RMSE). The evaluation has only partly been run for the 360 minutes interval due to time
constraints. Surprisingly, the ETS predictor has a RMSE which is only 0.47% worse than the no
delta predictor even though it has a PPR of 28%. Initially, the weka overlay predictor has a lower
RMSE than its counterpart without overlay (7.04 versus 11.14 at the daily crawling interval).
Across all scenarios the simple predictor has an RMSE which is approximately 10% higher
compared to the no delta predictor. The MLP approaches cannot come close to the leading ones
although they achieved promising results in pre-tests on time series pattern recognition. The
concept of the weka overlay predictor looked promising through its consideration of time series,
derived time patterns and features as overlay data. Nonetheless, in this evaluation it could not
realize its potential. In not a single case could a decision tree predictor with a balancing scheme
reach the top two decision tree predictors. Therefore, auto balancing is not applicable for
this kind of prediction problem. The lower the sampling interval, the lower the probability of a
price change. So, the approaches are more likely to adapt to the no delta predictor.
Figure 22: Minimum price prediction results for the direction delta.
Figure 23: Minimum price prediction results for the absolute delta.
Figure 24 shows an example M5 tree of the pretrained decision tree predictor, illustrating the incorporation of both feature types for minimum price prediction.
[Tree diagram: splits on relativePriceSpan, availability, currentHigherGap, deltaDownRatio, priceTrend, currentDay, mainAbsoluteHigherGap, offerRatio and nightDeltaRatio, leading to twelve linear models LM 1-LM 12.]
Figure 24: A grown M5 tree for minimum price prediction.
Forecasting Reseller Price Series of the Car Product Category
The second prediction task consists of predicting all reseller price series of the car product
category. This category has been chosen because it exposed the highest category delta ratio
in the market analysis (chapter 4.3.2). Hereafter, result excerpts are presented comprising
the prediction results with a daily crawling interval and a comparison between the no delta
predictor and different decision tree approaches. The daily interval has been chosen due to its
high pricing dynamics. In this excerpt, the approaches are reduced to the no delta predictor as
reference and differently configured decision tree predictors. The approach selection has been made
since both selected predictor types clearly outperform the other approaches. The decision tree
approach is always configured without balancing. Further, the decision tree approach is either
setup with assortment prediction, pretraining or as plain predictor.
The car product category is denoted as a high-frequency repricing category. After the offer
interval is reduced to a daily one, the category exhibits an average daily delta ratio of 35% and
comprises 89 distinct resellers.
Figure 25 presents the prediction results clustered by delta type. Instead of using abso-
lute averaged RMSEs, a new normalized RMSE is introduced. It is normalized with respect to the
reference no delta predictor. A lower normalized RMSE means better performance in comparison to the
reference predictor. A comparison within the group of decision tree predictors revealed that the
assortment approach cannot provide any advantages over the plain decision tree predictor. The
pretrained approach shows serious advantages over the other two approaches when using the
direction and absolute delta. The pretrained decision tree predictor outperforms the plain de-
cision tree predictor by 45% in the absolute delta scenario. However, when using the
simple delta, the pretrained predictor is slightly worse (1.74%). The direction delta scenario is
the only prediction case in which the pretrained decision tree predictor cannot take the lead over
the reference predictor. On the other hand, it takes the lead again by 11.01% over the
reference predictor for absolute deltas. Complete information on the underlying prediction
values is offered by appendix J.
Figure 25: Reseller price delta prediction results of the car category.
The PPRs of the decision tree predictors can be obtained from table 15. Noticeable is the
high PPR of the pretrained decision tree predictor.
An example M5 tree of the pretrained decision tree predictor in figure 26 shows the incorporation
of both feature types for reseller price prediction.

Delta Type       | Decision Tree Predictor | Assortment Decision Tree Predictor | Pretrained Decision Tree Predictor
Simple Delta     | 68.63%                  | 69.31%                             | 87.01%
Direction Delta  | 83.68%                  | 83.68%                             | 98.03%
Absolute Delta   | 75.73%                  | 83.68%                             | 90.92%
Table 15: Price persistence ratios of the decision tree approaches (prediction for the car category with all resellers).

[Tree diagram: splits on degreeInTop10WithDelivery, hoursSinceLastEndogenousPositionChange, relativeMedianSpan, currentLowerGap, avgPriceToProduct, hoursSinceLastExogenousPositionChange, degreeInTop10 and avgLowerGap, leading to nine linear models LM 1-LM 9.]
Figure 26: A grown M5 tree for reseller price prediction.

5.2.4 Discussion

Different decision tree predictors were the subject of a thorough evaluation. They were compared to
a broad spectrum of other elaborate approaches. The pretrained decision tree predictor achieves
the most promising prediction results. In the vast majority of scenarios this predictor is at least
as good as the reference no delta predictor. Thereby, up to 11% error reduction is realized. The
other approaches cannot catch up. Most of them are based on plain price series approaches (R
predictors, simple predictor, weka predictors). That alone may not be sufficient for adapting to
the artificial piece-wise price series on CSAs.
The pretrained decision tree approach shows that a focus on a single reseller price series
is not enough. The consideration of the whole assortment via pretraining leads to major pre-
diction improvements. This is shown exemplarily by the reseller's absolute delta prediction
in the car category, in which the error metric could be improved by 45%. Compared to the
small step of assortment consideration (from one to a maximum of five considered reseller
price series), the pretraining is highly effective. Building on larger datasets should reveal more
prediction-relevant connections like extended reseller assortment data and in general more
training examples which should lead to further error reduction.
In contrast, the usage of assortment features showed no advantages. Auto balancing
schemes have negative prediction implications, increasing the used error metrics. The de-
tachment of the temporal order may deliver a partial explanation.
Prediction over small time frames is not very promising in practice because the pricing en-
vironment is too stable in such cases. Larger prediction time ranges increase the probability of
a price change and therefore make prediction more reasonable. A one-day-ahead prediction
would be a good starting point. Since the minimum price series itself
is very meaningful and easier to predict, the author suggests predicting the corresponding absolute
deltas. This information can be wrapped in a price tendency feature which repricing providers
can integrate in their frontend.
Essentially, the presented prediction approaches abstract from strategies. This has the
major advantage of predicting all reseller prices without knowing the blueprints (strategies).
However, all underlying strategies are somehow considered in terms of implicit mappings via
'hidden layers'. Knowing pricing strategies is very valuable, but knowing all strategies in order
to make full market predictions is an enormous challenge.
5.3 Pricing Strategy Extraction
The pricing strategy extraction relies on a heuristic filtering process with individual filters for
six strategies. The extracted strategies are manually analyzed on a sample basis. The extraction
pipeline works with a real dataset instead of synthetic strategies or self-defined models.
Problem-specific methods are applied without need for a training set. Since the task of strategy
extraction is a complex one, the main goal of this analysis is not the full derivation of all pricing
strategies. It is rather seen as proof of concept.
5.3.1 Concepts
Granger causality (Granger 1969) is a statistical test for time series. If past data of time series
X improves the prediction quality of another time series Y, then X Granger-causes Y. Granger
causality is based on transforming Granger's key idea into a regression model which is solv-
able with a hypothesis test. Shibuya et al. (2009) apply Granger causality to numerical
and symbolic time series. The authors achieve good results for predicting stock closing prices.
They outperform vector autoregression models for small datasets (small in terms of fewer than
500 samples).
A motif is a previously unknown pattern in a time series (Tanaka et al. 2005, p. 269). Motif
discovery describes the mining of motifs (Lin; Keogh, et al. 2002, p. 1). Discretization transforms
continuous time series data into a discrete equivalent. In motif discovery, discretization is regularly
done by a transformation into a symbolic representation (Lin; Keogh, et al. 2002, pp. 3-4).
A piecewise aggregate function is often used for that purpose (Minnen et al. 2007, p. 3). A
sliding window method is applied to account for temporal delays (Tanaka et al. 2005,
pp. 279-281). Random projection can be used for locating approximately equal motifs (Minnen
et al. 2007, pp. 3-4). Tanaka et al. (2005, pp. 279-281) present a motif discovery scheme:
the most frequently appearing patterns are extracted based on a symbolic representation and
a sliding window method. The original time series data is mapped back so that the distances of the
mined pattern classes can be calculated. Finally, thresholds filter the discovered motifs. Minnen
et al. (2007) present an approach for discovering recurring patterns in multivariate time series
data.
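The symbolic discretization and window-based motif counting can be sketched as follows (a heavily simplified stand-in for SAX-based motif discovery, without normalization, random projection or distance thresholds; symbols and data are invented):

```python
from collections import Counter

def discretize(deltas):
    # Symbolic representation of a direction delta series: u(p), n(one), d(own).
    return "".join("u" if d > 0 else "d" if d < 0 else "n" for d in deltas)

def count_motifs(symbols, window):
    # Slide a window over the symbolic series and count recurring subwords.
    return Counter(symbols[i:i + window] for i in range(len(symbols) - window + 1))

symbols = discretize([0, 5, -5, 0, 5, -5, 0, 5, -5])
motifs = count_motifs(symbols, window=3)
```

Recurring subwords with high counts are motif candidates; a real implementation would then map them back to the numeric series and filter by distance thresholds, as described above.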
5.3.2 Approach
The basic idea behind the strategy extraction concept is a serial extraction process with strategy-
specific methods. Once a strategy is identified, it is not put back into the extraction pipeline.
Therefore, the extractor order is relevant. The extraction pipeline can be seen as a handcrafted
decision tree. The strategy extraction process is shown in figure 27. Besides pricing strategy
types, the extractors are intended to derive the underlying strategy parameters.
A reseller price delta ratio higher than 0.3% is a precondition for the automated
repricing strategies (time frame, pull-up, target position and interlink). This limit is aligned with
the average delta ratio discovered in the market analysis (chapter 4.3.1). Dedicated strategy
extractors have been implemented, which are described in the following:
[Pipeline diagram: price series and features are passed through the extractors in order: static, hit & run, time frame, pull-up, target position, interlink; anything left is labeled unknown strategy.]
Figure 27: The pricing strategy extraction pipeline.
1. Static Strategy Extractor
Firstly, the static strategy extractor is applied. A static strategy is characterized by no price
changes. As long as no price changes are detected the strategy is identified as static.
2. Hit and Run Strategy Extractor
A hit and run strategy is characterized by low prices and short offering periods, and has already been
detected in CSAs (Haynes and Thompson 2008a, p. 19; Haynes and Thompson 2008b, p. 467).
The corresponding extractor analyzes all price segments (periods of offering) of the reseller’s
price series. If all price segments have a maximum duration of three days and the reseller price
is below the average price, the strategy is identified as hit and run strategy. The number of
price segments, the average offering time in hours and the average position are returned as
underlying strategy parameters.
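The two conditions and the returned parameters can be sketched directly; the segment representation and function names are assumptions for illustration.

```python
# Hedged sketch of the hit-and-run check. A "segment" is one continuous
# offering period; its dict layout is an assumption.

def is_hit_and_run(segments, reseller_avg_price, market_avg_price,
                   max_segment_hours=72):
    # All offering periods must last at most three days (72 hours) and
    # the reseller must price below the market average.
    short = all(s["duration_hours"] <= max_segment_hours for s in segments)
    cheap = reseller_avg_price < market_avg_price
    return short and cheap

def hit_and_run_parameters(segments, positions):
    # Underlying strategy parameters as named in the text.
    return {
        "segment_count": len(segments),
        "avg_offering_hours": sum(s["duration_hours"] for s in segments) / len(segments),
        "avg_position": sum(positions) / len(positions),
    }
```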
3. Time Frame Strategy ExtractorThe time frame strategy extractor is two-parted:
1. Initially a motif discovery extractor is applied. It uses the jMotif SAX-VSM32 library which
has been introduced by Senin and Malinchik (2013). The extractor is set up with window
sizes for discovering motifs in the course of days and weeks, as well as a delta alphabet.
The window sizes comprise a minimum number of three weekly occurrences and five
daily appearances. Beforehand, the offers are concentrated to an hourly basis in order to
smooth the price series. The price series is simplified to a direction delta series. Only valid
motifs are accepted. Valid means that all three delta direction characteristics occur or two
of them with each: Two or more occurrences. At least three/five repetitions are required
for weekly/daily motifs. The discovered motif, the number of appearances, the first and
last appearance as well as the identified window size are derived as underlying strategy
parameters.
32 https://github.com/jMotif/sax-vsm_classic
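The simplification to a direction delta series and the motif validity rule can be sketched as follows; the function names are assumptions, and the actual motif discovery is left to the jMotif library.

```python
from collections import Counter

# Sketch of the direction delta simplification and the motif validity rule.

def to_direction_deltas(prices):
    # 'u' = price went up, 'd' = down, 's' = stayed the same.
    out = []
    for prev, cur in zip(prices, prices[1:]):
        out.append("u" if cur > prev else "d" if cur < prev else "s")
    return "".join(out)

def is_valid_motif(motif):
    # Valid: all three delta characters occur, or exactly two of them
    # occur with at least two occurrences each.
    counts = Counter(motif)
    if len(counts) == 3:
        return True
    return len(counts) == 2 and all(c >= 2 for c in counts.values())
```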
2. The second extractor is a simplified heuristic which identifies whether two major daily price change times exist. This can be seen as a minimum criterion for further automated time-based strategies. The two change times should each account for more than 25% of all deltas and lie at least two hours apart. Due to its generic strategy description, this extractor is applied last. The two major price change times are returned as underlying strategy parameters.
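The heuristic above reduces to finding the two most frequent change hours and checking the 25% share and two-hour gap conditions; this is a sketch under those stated thresholds, with assumed names.

```python
from collections import Counter

# Sketch of the two-major-change-times heuristic: the two most frequent
# daily price change hours must each cover more than 25% of all deltas
# and lie at least two hours apart (on a wrap-around 24h clock).

def two_major_change_times(change_hours, min_share=0.25, min_gap=2):
    counts = Counter(change_hours).most_common(2)
    if len(counts) < 2:
        return None
    (h1, c1), (h2, c2) = counts
    total = len(change_hours)
    gap = min(abs(h1 - h2), 24 - abs(h1 - h2))
    if c1 / total > min_share and c2 / total > min_share and gap >= min_gap:
        return sorted((h1, h2))
    return None
```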
4. Pull-Up Strategy Extractor
This extractor is a special case of the target position strategy extractor; the detection of a target position strategy is therefore a prerequisite. It detects whether a top three position is held and two major price gaps exist, each of which accounts for at least 40% of all gaps. The gap to the next lower competitor should be zero, because matching the better position is the first half of this special strategy. The higher gap should be within ten cents. The pull-up gap is added as an additional underlying strategy parameter.
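A sketch of the pull-up conditions, assuming gaps are given in cents and the target position prerequisite has already been checked; names and the gap representation are illustrative.

```python
from collections import Counter

# Hedged sketch of the pull-up check: a top-three position plus two
# dominant gap values to the next-cheaper competitor, one of zero (the
# matched position) and one of at most ten cents (the pull-up).

def looks_like_pull_up(avg_position, gaps_cents, min_share=0.40):
    if avg_position > 3:
        return None
    top_two = Counter(gaps_cents).most_common(2)
    if len(top_two) < 2:
        return None
    total = len(gaps_cents)
    shares_ok = all(count / total >= min_share for _, count in top_two)
    gap_values = sorted(g for g, _ in top_two)
    if shares_ok and gap_values[0] == 0 and 0 < gap_values[1] <= 10:
        return {"pull_up_gap_cents": gap_values[1]}
    return None
```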
5. Target Position Strategy Extractor
The target position strategy extractor expects the following conditions to be fulfilled:
• The average position is lower than ten, because high target positions are assumed to be unrealistic due to their missing customer impact.
• A main position is detected, meaning that the reseller stays on this position for more than 70% of the price series. Either:
– A main position with delivery costs is held.
– A main position without delivery costs is held.
The target position, the price gap, minimum and maximum boundaries, the main price change time in hours (UTC) and the delivery cost consideration are derived from the reseller features.
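The core of the conditions above can be sketched as follows; this simplified version ignores the with/without delivery cost variants and uses assumed names.

```python
from collections import Counter

# Sketch of the target position conditions: average position below ten
# and one main position held for more than 70% of the series.

def extract_target_position(positions, min_share=0.70, max_avg_position=10):
    avg = sum(positions) / len(positions)
    if avg >= max_avg_position:
        return None
    main_pos, count = Counter(positions).most_common(1)[0]
    share = count / len(positions)
    if share > min_share:
        return {"target_position": main_pos, "share": share}
    return None
```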
6. Interlink Strategy Extractor
This strategy extractor performs a Granger causality test. It uses R in combination with the lmtest library33. The underlying logic can be obtained from algorithm 2. This extractor retrieves the interlinked competitor, the interlink lag and the corresponding p-value.
A preprocessing step cleans the offers of unidentified resellers and inexplicable prices. If a reseller offers multiple product variants which are all assigned to a single GTIN, only the variant with the lowest price is considered. In addition, the offers are aggregated on an hourly basis.
33 https://cran.r-project.org/web/packages/lmtest/index.html
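The preprocessing step can be sketched as follows; the offer dict layout is an assumption, and the lowest-price variant is picked per reseller, GTIN and hour for simplicity.

```python
# Sketch of the preprocessing: per reseller, GTIN and hour keep only the
# cheapest offer (collapsing multiple variants of one GTIN), yielding an
# hourly price series per reseller and product.

def preprocess(offers):
    # offers: list of dicts with keys reseller, gtin, hour, price.
    cheapest = {}
    for o in offers:
        key = (o["reseller"], o["gtin"], o["hour"])
        if key not in cheapest or o["price"] < cheapest[key]["price"]:
            cheapest[key] = o
    return sorted(cheapest.values(), key=lambda o: o["hour"])
```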
Algorithm 2 Interlink strategy extractor scheme.
concentrate market data                                ▷ 120/360 min for daily/weekly motifs
convert reseller price series to delta series          ▷ direction delta
for all resellers do
    nearCompetitors ← filter surrounding competitors   ▷ ±2% avg reseller price
    for competitor in nearCompetitors do
        for all allowed lags do                        ▷ [1..23]
            congruentSeries ← calculate intersection: reseller ↔ competitor
            if congruentSeries.size > threshold then   ▷ two weeks
                causality ← calculate Granger causality
            end if
        end for
    end for
    maxCausality ← filter maximum causality
    if maxCausality.pValue ≤ threshold then            ▷ 0.01
        found interlink strategy
    end if
end for
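The Granger causality computation at the heart of algorithm 2 can be sketched in a few lines. The thesis uses the R lmtest package; the sketch below instead computes the Granger F statistic with plain least squares (restricted model: own lags only; unrestricted model: own lags plus the competitor's lags), which is the same test idea under stated assumptions. The data and names are illustrative.

```python
import numpy as np

# Minimal Granger-causality F statistic: does adding lagged values of x
# improve the prediction of y beyond y's own lags?

def granger_f(y, x, lag):
    rows = range(lag, len(y))
    Y = np.array([y[t] for t in rows])
    own = np.array([[1.0] + [y[t - k] for k in range(1, lag + 1)]
                    for t in rows])
    full = np.array([[1.0] + [y[t - k] for k in range(1, lag + 1)]
                     + [x[t - k] for k in range(1, lag + 1)] for t in rows])
    # Residual sums of squares of the restricted and unrestricted models.
    rss_r = np.sum((Y - own @ np.linalg.lstsq(own, Y, rcond=None)[0]) ** 2)
    rss_u = np.sum((Y - full @ np.linalg.lstsq(full, Y, rcond=None)[0]) ** 2)
    df = len(Y) - full.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / df)

# Usage: y follows x with a one-step delay, so the F statistic is large;
# the extractor would then convert F to a p-value and keep interlinks
# with p <= 0.01.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.concatenate(([0.0], x[:-1])) + 0.01 * rng.normal(size=200)
print(granger_f(y, x, lag=1))
```

Looping this over near competitors and lags 1 to 23 and keeping the maximum causality mirrors the structure of algorithm 2.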
5.3.3 Evaluation
The preprocessing step yielded 6,632 analyzable reseller price series. The strategy extraction pipeline assigned the reseller price series to the following pricing strategies:
• 3,484 manual strategies (static and hit&run strategies)
• 2,641 unknown strategies
• 507 automated strategies (time frame, target position, pull-up and interlink strategies)
The distribution of extracted strategies is plotted in figure 28.
In chapter 5.1 an automated repricing classification dataset has been created by domain experts. Compared against this dataset, the heuristic strategy extraction achieved a recall of 44.90% and a precision of 33.93%. The ratio of detected automated repricing strategies is 7.64%, compared to 5.25% in the classification dataset (chapter 5.1.3).
Subsequently, samples of the resulting strategy buckets are analyzed manually.
Hit and Run Strategy
The 450 discovered hit and run strategies originate from 282 distinct resellers. This strategy
bucket is characterized by an average of 12.54 price segments with a maximum price segment
count of 683. The average offering time is 10.71 hours. Of the 450 strategies, 111 are positioned
in the top three.
Interlink Strategy
The interlink strategy extractor achieves promising results. Approximately every second found interlink strategy can be justified by manual inspection (sample basis of n=20). Figure 29
Figure 28: Extracted pricing strategies (static: 3,034; unknown: 2,641; hit and run: 450; interlink: 319; time frame: 148; target position: 35; pull up: 5).
shows two strongly interlinked reseller price series of a car tyre (GTIN 4019238454291). Reseller mein-reifen-outlet.de Granger-causes giga-reifen.de: giga-reifen.de has implemented an interlink strategy and responds within five hours by price alignments. Figure 30 shows two weakly interlinked reseller price series of a notebook (GTIN 888462348164). Reseller acom-pc.de Granger-causes future-x.de.
The interlink strategy extractor tends to detect co-dependencies; the initiator is unclear in such cases. Nearly fully congruent reseller time series have been tracked down. Such an interlink has been proven for the resellers electronis.de and mp3-player.de for a smartphone (GTIN 888462039147). A closer look (domain check) at those resellers reveals that they belong to the same company. The same is true for the resellers notebooksbilliger.de and nullprozentshop.de; nullprozentshop.de responds with slightly delayed price adjustments (within two hours). These two findings may explain the observed co-dependencies. The extractor also spots a reseller's own offers injected from other market places. Such co-dependencies could be further exploited in order to derive shop connections and draw a corresponding map.
However, due to the multi-reseller dependency, it is conceivable that one price change causes multiple resellers to react, triggering a cascade. The reverse engineering of such cascades, or even nested cascades, makes locating the original cause more complicated.
Time Frame Strategy
The detected time frame strategies can be divided into 125 simple and 23 motif time frame strategies. The motif discovery time frame strategy extractor achieved excellent results: all classified time frame strategies have been manually verified. Six weekly-based and seventeen daily-based motifs have been discovered.
Reseller reifensuche.com applies a night-based time frame strategy, which can be seen in figure 31 (GTIN 4019238454291). Between 1 am (UTC) and 5 am (UTC) the prices are regularly
Figure 29: Interlink between mein-reifen-outlet.de and giga-reifen.de.
Figure 30: Interlink between acom-pc.de and future-x.de.
adjusted to predefined price steps. In this case, the extractor identified 23 occurrences of the most frequent motif (between 11/7/2015 and 12/30/2015). Furthermore, this reseller applies the described strategy to all of its products from the car category.
Figure 31: Night time frame strategy (reifensuche.com).
Another assortment repricing strategy has been detected for reseller plus.de. This reseller adjusts its assortment at 0:00 (UTC) with predefined prices, which can be seen in figure 32.
Unfortunately, the second time frame extractor (heuristic) has proven error-prone: no correctly classified time frame strategy was found on a sample basis (n=20). This extractor is vulnerable to misclassification of time series containing price segments.
Target Position Strategy
The target position strategy extraction heuristic is underperforming. Out of 35 supposed target position strategies, only two could be manually verified as real target position strategies. Both correspond to the same reseller, parfumgroup.de. The derived strategies are shown in table 16. Reseller parfumgroup.de tries to underbid the competitor easycosmetic.de and triggers a downward pricing spiral. Remarkable are the reaction interval of only 30 minutes and the absence of price boundaries.
Pull-Up Strategy
There is evidence that three extracted price series are correctly assigned to a pull-up strategy. The underlying extractor assumes apolux.de to be using the pull-up strategy for three products: GTIN 6998784, 8628270 and 4058900010236. A price series of apolux.de can be found in figure 8(b) of chapter 4.3.3. The same high-frequency 'pull-ups' around position one can also be seen in the three assumed products.
Figure 32: Daily assortment repricing strategy (plus.de, GTIN 4006825538250 and GTIN 4211129851701).
Reseller        GTIN           Target    Delivery Costs  Price  Main Price Change  Minimum  Maximum
                               Position  Consideration   Gap    Time (UTC)         Price    Price
parfumgroup.de  3439602810118  1         true            0.11€  5 am               26.42€   29.70€
parfumgroup.de  3439602810019  1         true            0.11€  5 am               40.49€   50.94€

Table 16: Correctly identified target position strategies.
Figure 33: The target position strategy in action (parfumgroup.de vs. easycosmetic.de, GTIN 3439602810019).
5.3.4 Discussion
This chapter has shown that, depending on the extractor's quality, pricing strategy extraction is possible. The bottom line is threefold:
1. The manual strategy extractors (namely static and hit & run) are based on simple assumptions. These extractors can explain over 52% of all used strategies.
2. The profound automated repricing strategy extractors (namely interlink and time frame with motif discovery) can realize their potential.
3. The heuristic-based strategy extractors (namely target position and simple time frame) fail in their mission. The heuristics are not sufficient to cope with the dynamic environment on a CSA and its artificial price series. Moreover, it is difficult to justify the heuristics' thresholds and values, which may be overfitted to the dataset.
The presented bucket pre-selection approach has serious disadvantages, such as ignoring the type II error and building only on predefined strategies. However, it does not need training data and can be stacked with problem-specific extractors.
A practical implementation should focus on small steps: the derivation of strategy parameters like minimum and maximum price boundaries and crawling date distributions. This information is already meaningful for customers. In order to tackle the full strategy extraction, the author suggests a combined approach of the manual strategy extractors (static and hit & run) plus the classification engine from chapter 5.1.
6 Conclusion
This thesis was driven by unmasking and exploiting prices in e-commerce. Backed by a recent dataset of 21.6 million crawled offers, this thesis goes far beyond the existing literature on CSAs.
A market review of repricing providers revealed that a broad spectrum of sophisticated pricing strategies is already deployed. A competitive market analysis has given evidence of advanced pricing dynamics which differ on dimensions like time and product layer. The main goal, namely the derivation of pricing strategies and the forecasting of prices, has been approached by a consistent divide-and-conquer concept. The smallest step towards extracting pricing strategies is the ability to distinguish between handcrafted and artificial price series. This task has been addressed by supervised classification grounded on multiple decision trees and rich auto-tuning techniques. Even with a restricted daily crawling interval, accurate classification is still possible.
The linchpin of the price prediction is a two-stage decision tree predictor using random forests and M5 regression trees. The approach exploits assortment knowledge in order to predict single price series. Up to 11% fewer prediction errors are made in comparison to a reference stable price predictor, while a wide range of time series predictors is outperformed. These results reveal that the strategies themselves are not necessary to make good predictions. The presented approach benefits from the assortment; vice versa, more optimization potential is expected when larger assortments are considered. The strategy extraction concept builds on serial extraction with different strategy-dependent filters. The Granger causality-based interlink strategy extractor and the motif discovery time frame strategy extractor show promising results. However, the general approach has shortcomings, like ignoring the type II error and partially relying on heuristics. These separated heuristics could not fulfill their task of strategy extraction.
This thesis is limited by using only one CSA; cross-CSA interactions are not considered. Further, only a single-step-ahead prediction is performed. Complete market simulations could be the next prediction level. Changing and enlarging the selected products could reveal more interactions and assortment insights. The main machine learning approach concentrates on decision trees. However, other promising concepts should be explored, like artificial neural networks or vector autoregression.
Wherever possible, the thesis gives practice-oriented guidance for repricing providers.
Enhancing the prediction performance could be achieved by a smart recombination (stacking) of the prediction approaches. Since the presented decision tree predictor relies on two stages, it can be reused for these experiments. Additionally, executing multiple predictions and deciding by majority vote is conceivable.
In order to overcome the strategy extraction issue, an all-new concept should be considered: the gained insights from the market analysis could be used in combination with an implementation of the identified strategies in order to generate synthetic price series with underlying strategies. Afterwards, unsupervised learning algorithms are able to train a strategy-dependent
model. An evaluation should be conducted with a real dataset, which enables the estimation of the prediction accuracy and the assurance that overfitting is avoided. Furthermore, a reseller price series should not be tagged with a single assumed strategy; instead, a probability vector should be supplied over all strategies.
Another promising approach would be probing different repricing algorithms and prediction schemes with a real reseller. In such a case, the real strategy and prediction impacts could be measured instead of relying on theoretical models.
The reverse engineering of price intelligence is viable, and using that knowledge brings us back to the future of repricing.
Bibliography
Agrawal, R.; Ieong, S., and Velu, R. (2011a). Ameliorating Buyer’s Remorse. In: Proceedings ofthe 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD’11. San Diego, California, USA: ACM, pp. 351–359.
– (2011b). Timing when to Buy. In: Proceedings of the 20th ACM International Conference onInformation and Knowledge Management. CIKM ’11. Glasgow, Scotland, UK: ACM, pp. 709–718.
Ahmad, H. W.; Zilles, S.; Hamilton, H. J., and Dosselmann, R. (2016). Prediction of retailprices of products using local competitors. In: International Journal of Business Intelligence andData Mining 11 (1), pp. 19–30.
Aprimo (2012). Showrooming Uncovers a New World of Retail Opportunities. URL: http://www.teradata.de/Resources/White-Papers/Showrooming-Uncovers-a-New-World-of-Retail-Opportunities (visited on 09/26/2016).
Arias, M.; Arratia, A., and Xuriguera, R. (2013). Forecasting with twitter data. In: ACM Trans-actions on Intelligent Systems and Technology (TIST) 5 (1), 8:1–8:24.
Baird, N. and Rosenblum, P. (2015). Pricing 2015: Learning To Live In A Dynamic, Pro-motional World. RSR Retail Systems Research. URL: http : / / www . rsrresearch .com/research/pricing- 2015- learning- to- live- in- a- dynamic-promotional-world (visited on 09/26/2016).
– (2013). Tough Love: An In Depth Look at Retail Pricing Practices. RSR Retail Systems Re-search. URL: http://www.rsrresearch.com/research/tough-love-an-in-depth-look-at-retail-pricing-practices (visited on 09/26/2016).
Bakos, J. Y. (1997). Reducing Buyer Search Costs: Implications for Electronic Marketplaces. In:Management Science 43 (12), pp. 1676–1692.
Baye, M. R.; Gatti, J. R. J.; Kattuman, P., and Morgan, J. (2009). Clicks, discontinuities, andfirm demand online. In: Journal of Economics & Management Strategy 18 (4), pp. 935–975.
Baye, M. R. and Morgan, J. (2001). Information gatekeepers on the internet and the competi-tiveness of homogeneous product markets. In: The American Economic Review 91 (3), pp. 454–474.
Baye, M. R.; Morgan, J., and Scholten, P. (2004). Price dispersion in the small and in the large:Evidence from an internet price comparison site. In: The Journal of Industrial Economics 52 (4),pp. 463–496.
Bergen, M.; Ritson, M.; Dutta, S.; Levy, D., and Zbaracki, M. (2003). Shattering the myth ofcostless price changes. In: European Management Journal 21 (6), pp. 663–669.
Błazewicz, J.; Kovalyov, M.; Musiał, J.; Urbanski, A., and Wojciechowski, A. (2010). In-ternet Shopping Optimization Problem. In: International Journal of Applied Mathematics andComputer Science 20 (2), pp. 385–390.
Bodur, H. O.; Klein, N. M., and Arora, N. (2015). Online price search: Impact of price comparisonsites on offline price evaluations. In: Journal of Retailing 91 (1), pp. 125–139.
Boer, A. V. d. (2015a). Dynamic pricing and learning: Historical origins, current research, andnew directions. In: Surveys in Operations Research and Management Science 20 (1), pp. 1–18.
– (2014). Dynamic Pricing with Multiple Products and Partially Specified Demand Distribution.In: Mathematics of Operations Research 39 (3), pp. 863–888.
– (2015b). Tracking the market: Dynamic pricing and learning in a changing environment. In:European Journal of Operational Research 247 (3), pp. 914–927.
Bounie, D.; Eang, B.; Sirbu, M. A., and Waelbroeck, P. (2012). Online Price Dispersion: AnInternational Comparison. Tech. rep. Department of Economics and Social Sciences, TelecomParisTech, pp. 1–19.
Breiman, L. (2001). Random Forests. In: Machine Learning 45 (1), pp. 5–32.
6 Bibliography 78
Bretschneider, U.; Gierczak, M. M.; Sonnick, A., and Leimeister, J. M. (2015). Auf derJagd nach dem günstigsten Preis: Was beeinflusst die Kaufabsicht von Nutzern von Produkt-und Preisvergleichsseiten? In: Marktplätze im Umbruch. Springer, pp. 43–53.
Broeckelmann, P. and Groeppel-Klein, A. (2008). Usage of mobile price comparison sites at thepoint of sale and its influence on consumers’ shopping behaviour. In: The International Reviewof Retail, Distribution and Consumer Research 18 (2), pp. 149–166.
Brynjolfsson, E. and Smith, M. D. (2000). Frictionless Commerce? A Comparison of Internet andConventional Retailers. In: Management Science 46 (4), pp. 563–585.
– (2001). The great equalizer? Consumer choice behavior at Internet shopbots. Tech. rep. 4208-01.MIT Sloan Working Paper, pp. 1–63.
Chawla, N. V.; Bowyer, K. W.; Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: Synthetic Mi-nority Over-sampling Technique. In: Journal of Artificial Intelligence Research 16 (1), pp. 321–357.
Chen, M. and Chen, Z.-L. (2014). Recent Developments in Dynamic Pricing Research: Multi-ple Products, Competition, and Limited Demand Information. In: Production and OperationsManagement 24 (5), pp. 704–731.
Clay, K.; Krishnan, R., and Wolff, E. (2001). Prices and price dispersion on the web: evidencefrom the online book industry. In: The Journal of Industrial Economics 49 (4), pp. 521–539.
Clement, R. and Schreiber, D. (2013). Internet-Ökonomie. Springer Berlin Heidelberg.Chap. Leistungsfähigkeit elektronischer Märkte, pp. 255–299.
Cohen, W. W. (1995). Fast Effective Rule Induction. In: Proceedings of the twelfth internationalconference on machine learning, pp. 115–123.
Currie, C. S. M.; Cheng, R. C. H., and Smith, H. K. (2007). Dynamic pricing of airline ticketswith competition. In: The Journal of the Operational Research Society 59 (8), pp. 1026–1037.
Dasgupta, P. and Das, R. (2000). Dynamic pricing with limited competitor information in amulti-agent economy. In: Proceedings of the International Conference on Cooperative Informa-tion Systems. Springer, pp. 299–310.
Dasgupta, P. and Melliar-Smith, P. M. (2003). Dynamic consumer profiling and tiered pricingusing software agents. In: Electronic Commerce Research 3 (3-4), pp. 277–296.
Deck, C. A. and Wilson, B. J. (2003). Automated pricing rules in electronic posted offer markets.In: Economic Inquiry 41 (2), pp. 208–223.
DiMicco, J. M.; Greenwald, A., and Maes, P. (2001). Dynamic Pricing Strategies Under a FiniteTime Horizon. In: Proceedings of the 3rd ACM Conference on Electronic Commerce. EC ’01.Tampa, Florida, USA: ACM, pp. 95–104.
Domínguez-Menchero, J. S.; Rivera, J., and Torres-Manzanera, E. (2014). Optimal purchasetiming in the airline market. In: Journal of Air Transport Management 40 (C), pp. 137–143.
Economist, T. (1999). Frictions in cyberspace. URL: http://www.economist.com/node/346410 (visited on 10/06/2016).
Eisen, M. (2011). Amazon’s $23,698,655.93 book about flies. URL: http : / / www .michaeleisen.org/blog/?p=358 (visited on 10/08/2016).
Ellison, G. and Ellison, S. F. (2009). Search, obfuscation, and price elasticities on the internet.In: Econometrica 77 (2), pp. 427–452.
Elmaghraby, W. and Keskinocak, P. (2003). Dynamic Pricing in the Presence of Inventory Consid-erations: Research Overview, Current Practices, and Future Directions. In: Management Science49 (10), pp. 1287–1309.
Etzioni, O.; Tuchinda, R.; Knoblock, C. A., and Yates, A. (2003). To Buy or Not to Buy: MiningAirfare Data to Minimize Ticket Purchase Price. In: Proceedings of the Ninth ACM SIGKDD In-ternational Conference on Knowledge Discovery and Data Mining. KDD ’03. Washington, D.C.:ACM, pp. 119–128.
Bibliography 79
Gönsch, J.; Klein, R.; Neugebauer, M., and Steinhardt, C. (2013). Dynamic pricing with strate-gic customers. In: Journal of Business Economics 83 (5), pp. 505–549.
Granger, C. W. J. (1969). Investigating Causal Relations by Econometric Models and Cross-spectralMethods. In: Econometrica 37 (3), pp. 424–438.
Grover, V.; Lim, J., and Ayyagari, R. (2006). The dark side of information and market efficiencyin e-markets. In: Decision Sciences 37 (3), pp. 297–324.
Groves, W. and Gini, M. (2015). On Optimizing Airline Ticket Purchase Timing. In: ACM Trans-actions on Intelligent Systems and Technology (TIST) 7 (1), 3:1–3:28.
Guyon, I. and Elisseeff, A. (2003). An Introduction to Variable and Feature Selection. In: Journalof Machine Learning Research 3 (March), pp. 1157–1182.
Hackl, F.; Kummer, M. E.; Winter-Ebmer, R., and Zulehner, C. (2014). Market structure andmarket performance in E-commerce. In: European Economic Review 68 (1), pp. 199–218.
Haynes, M. and Thompson, S. (2008a). Entry and exit behavior at a shopbot: E-sellers as kirzne-rian entrepreneurs.
– (2008b). Price, price dispersion and number of sellers at a low entry cost shopbot. In: Interna-tional Journal of Industrial Organization 26 (2), pp. 459–472.
Hertweck, B. M.; Rakes, T. R., and Rees, L. P. (2009). The effects of comparison shoppingbehaviour on dynamic pricing strategy selection in an agent-enabled e-market. In: Internationaljournal of electronic business 7 (2), pp. 149–169.
– (2010). Using an intelligent agent to classify competitor behavior and develop an effective E-market counterstrategy. In: Expert Systems with Applications 37 (12), pp. 8841–8849.
Holland, M. (2014). Fehler bei Amazon: Hunderte Waren für einen Penny verkauft. URL: http://www.heise.de/newsticker/meldung/Fehler-bei-Amazon-Hunderte-Waren-fuer-einen-Penny-verkauft-2490907.html (visited on 10/08/2016).
Hsu, C.-W.; Chang, C.-C., and Lin, C.-J. (2003). A Practical Guide to Support Vector Classifica-tion. Tech. rep. National Taiwan University, Taipei 106, Taiwan.
Hyndman, R. J. and Athanasopoulos, G. (2014). Forecasting: Principles and Practice. Otexts.292 pp.
Hyndman, R. J. and Khandakar, Y. (2008). Automatic Time Series Forecasting: The forecastPackage for R. In: Journal of Statistical Software 26 (3), pp. 1–22.
Hyndman, R. J.; Koehler, A. B.; Snyder, R. D., and Grose, S. (2002). A state space frameworkfor automatic forecasting using exponential smoothing methods. In: International Journal ofForecasting 18 (3), pp. 439–454.
Jung, K.; Cho, Y. C., and Lee, S. (2014). Online shoppers’ response to price comparison sites. In:Journal of Business Research 67 (10), pp. 2079–2087.
Kachani, S. and Shmatov, K. (2010). Competitive Pricing in a Multi-Product Multi-AttributeEnvironment. In: Production and Operations Management 20 (5), pp. 668–680.
Kephart, J. O.; Hanson, J. E., and Greenwald, A. R. (2000). Dynamic pricing by softwareagents. In: Computer Networks 32 (6), pp. 731–752.
Klausegger, C. (2011). Geizhals Händlerbefragung 2011. URL: http://unternehmen.geizhals . de / about / files / presse / Geizhals _ Haendlerstudie _30112011.pdf (visited on 09/26/2016).
– (2009). Österreichische Konsumenten unter der Lupe. URL: http : / / unternehmen .geizhals.de/about/files/presse/Geizhals_Userbefragung.pdf (vis-ited on 09/26/2016).
Kocas, C. (2002). Evolution of Prices in Electronic Markets Under Diffusion of Price-ComparisonShopping. In: Journal of Management Information Systems 19 (3), pp. 99–119.
Kohavi, R. (1995). A Study of Cross-validation and Bootstrap for Accuracy Estimation and ModelSelection. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence -
Bibliography 80
Volume 2. IJCAI’95. Montreal, Quebec, Canada: Morgan Kaufmann Publishers Inc., pp. 1137–1143.
Kohavi, R. and John, G. H. (1997). Wrappers for feature subset selection. In: Artificial Intelligence97 (1-2), pp. 273–324.
Kopalle, P.; Biswas, D.; Chintagunta, P. K.; Fan, J.; Pauwels, K.; Ratchford, B. T., and Sills,J. A. (2009). Retailer Pricing and Competitive Effects. In: Journal of Retailing 85 (1), pp. 56–70.
Kutschinski, E.; Uthmann, T., and Polani, D. (2003). Learning competitive pricing strategies bymulti-agent reinforcement learning. In: Journal of Economic Dynamics and Control 27 (11-12),pp. 2207–2218.
Levin, Y.; McGill, J., and Nediak, M. (2009). Dynamic Pricing in the Presence of Strategic Con-sumers and Oligopolistic Competition. In: Management Science 55 (1), pp. 32–46.
Lin, J.; Keogh, E.; Lonardi, S., and Patel, P. (2002). Finding Motifs in Time Series. In: Proceed-ings of the 2nd Workshop on Temporal Data Mining, pp. 1–11.
Lin, K. Y. and Sibdari, S. Y. (2009). Dynamic price competition with discrete customer choices.In: European Journal of Operational Research 197 (3), pp. 969–980.
Livera, A. M. D.; Hyndman, R. J., and Snyder, R. D. (2011). Forecasting Time Series WithComplex Seasonal Patterns Using Exponential Smoothing. In: Journal of the American StatisticalAssociation 106 (496), pp. 1513–1527.
Lucchese, G.; Ketter, W.; Dalen, J. van, and Collins, J. (2012). Forecasting Prices in DynamicHeterogeneous Product Markets Using Multivariate Prediction Methods. In: Proceedings of the13th International Conference on Electronic Commerce. ICEC ’11. Liverpool, United Kingdom:ACM, 26:1–26:10.
Mei-Pochtler, A. and Hepp, M. (2013). Die neue Welt des Handels. In: Retail Business. Springer,pp. 77–98.
Meyer, S. (2012). Dynamische Preisoptimierung im E-Commerce. In: Information Managementund Consulting Sonderausgabe, pp. 68–75.
Minnen, D.; Isbell, C.; Essa, I., and Starner, T. (2007). Detecting Subdimensional Motifs: An Ef-ficient Algorithm for Generalized Multivariate Pattern Discovery. In: Seventh IEEE InternationalConference on Data Mining (ICDM 2007). Institute of Electrical and Electronics Engineers(IEEE), pp. 1–10.
Moe, W. W. (2003). Buying, Searching, or Browsing: Differentiating Between Online ShoppersUsing In-Store Navigational Clickstream. In: Journal of Consumer Psychology 13 (1-2), pp. 29–39.
Moraga-González, J. L. and Wildenbeest, M. R. (2011). Comparison sites. Tech. rep. 933. IESEBusiness School - University of Navarra, pp. 1–31.
Pathak, B. K. (2012). Comparison shopping agents and online price dispersion: A search cost basedexplanation. In: Journal of Theoretical and Applied Electronic Commerce Research 7 (1), pp. 64–76.
Petrescu, P.; Ghita, M., and Loiz, D. (2014). Google Organic CTR Study 2014. Advanced WebRanking. URL: https://www.advancedwebranking.com/google-ctr-study-2014.html (visited on 09/26/2016).
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Francisco, CA, USA: MorganKaufmann Publishers Inc.
Quinlan, R. J. (1992). Learning with Continuous Classes. In: 5th Australian Joint Conference onArtificial Intelligence. Singapore: World Scientific, pp. 343–348.
Ramezani, S.; Bosman, P. A., and Poutre, H. L. (2011). Adaptive Strategies for Dynamic PricingAgents. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelli-gence and Intelligent Agent Technology. Vol. 2. Institute of Electrical and Electronics Engineers(IEEE), pp. 323–328.
Bibliography 81
Riekhof, H.-C. and Wurr, F. (2013). Steigerung der Wertschöpfung durch intelligentes Pricing:Eine empirische Untersuchung. Tech. rep. 2013/02. PFH Private Hochschule Göttingen.
Sapankevych, N. and Sankar, R. (2009). Time Series Prediction Using Support Vector Machines:A Survey. In: IEEE Computational Intelligence Magazine 4 (2), pp. 24–38.
Sato, K. and Sawaki, K. (2013). A continuous-time dynamic pricing model knowing the competi-tor’s pricing strategy. In: European Journal of Operational Research 229 (1), pp. 223–229.
Schieder, C. and Lorenz, K. (2012). Pricing-Intelligence-Studie 2012. Technische UniversitätChemnitz. URL: https://www.tu-chemnitz.de/wirtschaft/wi2/wp/wp-content/uploads/2012/04/Pricing-Studie-State-of-the-Art-im-E-Commerce_v1.5.pdf (visited on 09/26/2016).
Senin, P. and Malinchik, S. (2013). SAX-VSM: Interpretable Time Series Classification Using SAXand Vector Space Model. In: 2013 IEEE 13th International Conference on Data Mining. Instituteof Electrical and Electronics Engineers (IEEE), pp. 1175–1180.
Shibuya, T.; Harada, T., and Kuniyoshi, Y. (2009). Causality quantification and its applications.In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery anddata mining - KDD ’09. Association for Computing Machinery (ACM), pp. 787–796.
Skorupa, J. (2014). Pricing Intelligence Goes to War. RIS. URL: http://risnews.edgl.com/retail-research/Pricing-Intelligence-Goes-to-War90346 (visited on 09/26/2016).
Smola, A. J. and Schölkopf, B. (2004). A tutorial on support vector regression. In: Statistics andComputing 14 (3), pp. 199–222.
Steiner, I. (2012). Appeagle Repricing Glitch Causes Penny Listings on Amazon. EcommerceBytes. URL: http://www.ecommercebytes.com/cab/abn/y12/m07/i18/s02 (visited on 10/08/2016).
Tanaka, Y.; Iwamoto, K., and Uehara, K. (2005). Discovery of Time-Series Motif from Multi-Dimensional Data Based on MDL Principle. In: Machine Learning 58 (2-3), pp. 269–300.
Taylor, J. W. (2003). Short-term electricity demand forecasting using double seasonal exponential smoothing. In: Journal of the Operational Research Society 54 (8), pp. 799–805.
Tirole, J. (1988). The Theory of Industrial Organization. MIT Press Ltd, pp. 209–212. 479 pp.

Transchel, S. and Minner, S. (2009). The impact of dynamic pricing on the economic order decision. In: European Journal of Operational Research 198 (3), pp. 773–789.

Varian, H. R. (1980). A Model of Sales. In: The American Economic Review 70 (4), pp. 651–659.

Waldfogel, J. and Chen, L. (2006). Does information undermine brand? Information intermediary use and preference for branded web retailers. In: The Journal of Industrial Economics 54 (4), pp. 425–449.
Wan, Y.; Menon, S., and Ramaprasad, A. (2003). A Classification of Product Comparison Agents. In: Proceedings of the 5th International Conference on Electronic Commerce. ICEC ’03. Pittsburgh, Pennsylvania, USA: ACM, pp. 498–504.
Wang, Y. and Witten, I. H. (1997). Induction of model trees for predicting continuous classes. In:Poster papers of the 9th European Conference on Machine Learning. Springer.
Watkins, C. J. C. H. and Dayan, P. (1992). Q-learning. In: Machine Learning 8 (3-4), pp. 279–292.
Weisstein, F. L.; Monroe, K. B., and Kukar-Kinney, M. (2013). Effects of price framing on consumers’ perceptions of online dynamic pricing practices. In: Journal of the Academy of Marketing Science 41 (5), pp. 501–514.
Welch, G. and Bishop, G. (2006). An Introduction to the Kalman Filter. Tech. rep. TR 95-041.University of North Carolina at Chapel Hill, pp. 1–16.
Witten, I. H. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed. Morgan Kaufmann.
Zhang, J. and Jing, B. (2011). The Impacts of Shopbots on Online Consumer Search. In: Proceedings of the 44th Hawaii International Conference on System Sciences (HICSS), pp. 1–10.
A Product Selection Process
This section shows the intermediate steps of the product selection. Table 17 presents a snapshot of the CSA Billiger.de with its top 40 most popular products on 10/25/2015. The corresponding categories act as a baseline and are scaled up by a factor of 2.5 in table 18. This step is performed with the purpose of obtaining a representative distribution for 100 products. Subsequently, the categories are simplified in table 19 and the number of quintuples is derived.
Position Product Category Price
1 Thierry Mugler Alien Eau de Pa... Gesundheit & Kosmetik – Kosmetik – Parfum 35.94 €
2 Apple iPhone 6 Handys & Telefon – Handys – Handys ohne Vertrag 545.00 €
3 Samsung Galaxy S5 Handys & Telefon – Handys – Handys ohne Vertrag 320.00 €
4 Sony PS4 500GB Unterhaltungselektronik – Konsolen & Zubehör – Konsolen 317.99 €
5 Samsung Galaxy S6 Handys & Telefon – Handys – Handys ohne Vertrag 446.20 €
6 Maxi-Cosi Pebble Baby & Kind – Unterwegs – Autokindersitze 148.54 €
7 Apple iPhone 6 Plus Handys & Telefon – Handys – Handys ohne Vertrag 684.99 €
8 Logitech UE Boom Unterhaltungselektronik – Audio & HiFi – HiFi 129.00 €
9 KitchenAid Artisan Küchenmasch... Haushalt – Küchengeräte – Küchenmaschine 394.68 €
10 Apple iPad mini 3 Computer & Software – Tablet PCs & Zubehör –Tablet PCs 312.79 €
11 Huawei P8 Lite Handys & Telefon – Handys – Handys ohne Vertrag 207.00 €
12 Sony Xperia Z3+ Handys & Telefon – Handys – Handys ohne Vertrag 495.88 €
13 Bosch GSR 10,8-2-LI Professional Heimwerken & Garten – Werkstatt & Werkzeug – Elektrowerkzeug 49.00 €
14 HTC One M9 Handys & Telefon – Handys – Handys ohne Vertrag 459.08 €
15 ABC-Design Turbo 4S Baby & Kind – Unterwegs – Kinderwagen 292.81 €
16 Sony Alpha 6000 Fotografie – Fotografie & Camcorder – Digitalkameras 468.20 €
17 Samsung Galaxy Alpha Handys & Telefon – Handys – Handys ohne Vertrag 349.00 €
18 Valentino Valentina Eau de Par... Gesundheit & Kosmetik – Kosmetik – Parfum 30.99 €
19 ABC-Design 3Tec Baby & Kind – Unterwegs – Kinderwagen 292.81 €
20 Microsoft Lumia 640 Handys & Telefon – Handys – Handys ohne Vertrag 119.84 €
21 Samsung Galaxy A3 Handys & Telefon – Handys – Handys ohne Vertrag 178.48 €
22 Bosch PSR 18 LI-2 Heimwerken & Garten – Werkstatt & Werkzeug – Elektrowerkzeug 78.83 €
23 Samsung Galaxy S5 mini Handys & Telefon – Handys – Handys ohne Vertrag 237.56 €
24 Voltaren Schmerzgel Gesundheit & Kosmetik – Arzneimittel 3.94 €
25 Philips SensoTouch 3D Gesundheit & Kosmetik – Kosmetik – Elektrischer Rasierer 141.31 €
26 Hankook Ventus Prime 2 K115 Auto & Motorrad – Reifen – PKW-Reifen 38.95 €
27 Samsung Galaxy S III mini Handys & Telefon – Handys – Handys ohne Vertrag 111.00 €
28 FC Bayern München Bettwäsche Spiel, Sport & Freizeit – Fanartikel 35.99 €
29 Makita BHP453 Heimwerken & Garten – Werkstatt & Werkzeug – Elektrowerkzeug 64.90 €
30 Goodyear Vector 4Seasons Auto & Motorrad – Reifen – PKW-Reifen 37.10 €
31 Loceryl Nagellack Gesundheit & Kosmetik – Arzneimittel 15.05 €
32 McNeill Ergo Light Compact Baby & Kind – Schulbedarf – Schulranzen Sets 109.95 €
33 Orthomol-Immun Gesundheit & Kosmetik – Arzneimittel 11.46 €
34 CYBEX Pallas Baby & Kind – Unterwegs – Autokindersitze 137.93 €
35 Maxi-Cosi CabrioFix Baby & Kind – Unterwegs – Autokindersitze 115.66 €
36 Bosch PSR 14,4 LI-2 Heimwerken & Garten – Werkstatt & Werkzeug – Elektrowerkzeug 114.99 €
37 Apple MacBook Pro Computer & Software – Notebooks 999.00 €
38 Maxi-Cosi Tobi Baby & Kind – Unterwegs – Autokindersitze 159.31 €
39 ABC-Design Turbo 6S Baby & Kind – Unterwegs – Kinderwagen 249.99 €
40 Quinny Zapp Xtra Baby & Kind – Unterwegs – Kinderwagen 150.64 €
Table 17: Top 40 (10/25/2015, 13:00) of Billiger.de.
A Product Selection Process 84
Categories Entries Scaled Entries
Handys & Telefon – Handys – Handys ohne Vertrag 12 30
Baby & Kind – Unterwegs – Autokindersitze 4 10
Baby & Kind – Unterwegs – Kinderwagen 4 10
Heimwerken & Garten – Werkstatt & Werkzeug – Elektrowerkzeug 4 10
Gesundheit & Kosmetik – Arzneimittel 3 7.5
Auto & Motorrad – Reifen – PKW-Reifen 2 5
Gesundheit & Kosmetik – Kosmetik – Parfum 2 5
Baby & Kind – Schulbedarf – Schulranzen Sets 1 2.5
Computer & Software – Notebooks 1 2.5
Computer & Software – Tablet PCs & Zubehör – Tablet PCs 1 2.5
Fotografie – Fotografie & Camcorder – Digitalkameras 1 2.5
Gesundheit & Kosmetik – Kosmetik – Elektrischer Rasierer 1 2.5
Haushalt – Küchengeräte – Küchenmaschine 1 2.5
Spiel, Sport & Freizeit – Fanartikel 1 2.5
Unterhaltungselektronik – Audio & HiFi – HiFi 1 2.5
Unterhaltungselektronik – Konsolen & Zubehör – Konsolen 1 2.5
40 100
Table 18: Top 40 mapped categories of Billiger.de.
Mapped Categories Products Quintuples
Smartphone 30 6
Baby & Kind 20 4
Gesundheit & Kosmetik 15 3
Computer & Software 5 1
Heimwerken & Garten 10 2
Unterhaltungselektronik 5 1
Auto & Motorrad 5 1
Fotografie 5 1
Haushalt 5 1
100 20
Table 19: Product category selection.
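The scaling step behind tables 18 and 19 can be sketched as follows. The category names and entry counts are taken from the tables above, but the `SCALE` constant and the dictionary layout are illustrative assumptions rather than code from the thesis (table 19 additionally redistributes the fractional Schulranzen entries, which this sketch omits).

```python
# Illustrative reconstruction of the scaling in tables 18 and 19:
# top-40 category counts are scaled by 100/40 = 2.5 to target 100
# products, which are then grouped into quintuples of five products.
SCALE = 100 / 40  # = 2.5

top40_counts = {  # simplified categories with their top-40 entries
    "Smartphone": 12,
    "Baby & Kind": 4 + 4,                # Autokindersitze + Kinderwagen
    "Gesundheit & Kosmetik": 3 + 2 + 1,  # Arzneimittel + Parfum + Rasierer
    "Heimwerken & Garten": 4,
}

products = {cat: round(n * SCALE) for cat, n in top40_counts.items()}
quintuples = {cat: p // 5 for cat, p in products.items()}

print(products["Smartphone"], quintuples["Smartphone"])  # → 30 6
```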
B Classification Feature Selection Algorithms
Two different feature selection mechanisms have been developed. The first is presented in algorithm 3. The main idea is to sort the features by a calculated metric and select the feature with the highest metric gain. In every iteration the expected gains are recalculated and features are added until no further gain is possible or all features are selected. The selection scheme is wrapped in an n-fold cross validation.
Algorithm 3 Greedy feature selection algorithm.
procedure SELECT(selectedFeatures, remainingFeatures, currentBestMeasure)
    if remainingFeatures.isEmpty then
        return selectedFeatures
    else
        for n-fold Cross Validation do
            for all remainingFeatures do
                classifier.calculateGainsByGridSearch(selectedFeatures + remainingFeature)
            end for
        end for
        bestRemainingFeature = selectRemainingFeature(byHighestAvgGain)
        if bestRemainingFeature.maxGain > 0 then
            return SELECT(selectedFeatures.add(bestRemainingFeature),
                          remainingFeatures.remove(bestRemainingFeature),
                          currentBestMeasure + bestRemainingFeature.maxGain)
        else
            return selectedFeatures
        end if
    end if
end procedure
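As a minimal sketch of the greedy scheme in algorithm 3, forward selection can be expressed as below. The `cv_score` function is a hypothetical stand-in for the n-fold cross-validated grid-search measure, and the feature names and their utilities are invented for illustration.

```python
def cv_score(features):
    """Hypothetical stand-in for the cross-validated grid-search
    measure; a real implementation would train the classifier here."""
    utility = {"offerRatio": 0.6, "avgDelta": 0.25, "noise": -0.05}
    return sum(utility.get(f, 0.0) for f in features)

def greedy_select(remaining):
    """Forward selection: repeatedly add the feature with the highest
    expected gain until no feature improves the measure."""
    selected, best = [], 0.0
    while remaining:
        gains = {f: cv_score(selected + [f]) - best for f in remaining}
        feature, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain <= 0:
            break  # no remaining feature yields a positive gain
        selected.append(feature)
        remaining.remove(feature)
        best += gain
    return selected

print(greedy_select(["noise", "offerRatio", "avgDelta"]))  # → ['offerRatio', 'avgDelta']
```

Like algorithm 3, this stops as soon as the best remaining feature yields no positive gain, which is exactly the point where a local maximum can trap the search.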
The greedy feature selection is fast but has serious shortcomings, such as the possibility of being trapped in a local maximum. Therefore, a second feature selection algorithm has been developed which relies on binary random sampling. This algorithm is shown in algorithm 4. The main idea is to pick a feature and create two classes of random samples: one class contains the current feature and the other does not. During sampling it becomes apparent whether the feature should be added. The whole algorithm is again wrapped in an n-fold cross validation.
Note: The applied metrics are always averaged over the number of folds.
B Classification Feature Selection Algorithms 86
Algorithm 4 Binary feature selection algorithm.
procedure SELECT(selectedFeatures, remainingFeatures, currentBestMeasure)
    if remainingFeatures.isEmpty then
        return selectedFeatures
    else
        potentialFeature = remainingFeatures.head
        for n-fold Cross Validation do
            for samples per n do
                r = selectRandom(remainingFeatures.remove(potentialFeature))
                classA = classifier.calculateGainByGridSearch(selectedFeatures + potentialFeature + r)
                classB = classifier.calculateGainByGridSearch(selectedFeatures + r)
            end for
        end for
        if avgGain(classA) > avgGain(classB) && avgGain(classA) > 0 then
            return SELECT(selectedFeatures + potentialFeature,
                          remainingFeatures.drop(1),
                          currentBestMeasure + avgGain(classA))
        else
            return SELECT(selectedFeatures, remainingFeatures.drop(1), currentBestMeasure)
        end if
    end if
end procedure
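The binary sampling idea of algorithm 4 can be sketched compactly as below. Again, `cv_gain` is a hypothetical stand-in for the cross-validated grid-search gain, and the feature utilities are invented: for each candidate, paired random samples with and without the candidate are compared by their average gain.

```python
import random

def cv_gain(features, utility):
    """Hypothetical stand-in for the cross-validated grid-search gain."""
    return sum(utility.get(f, 0.0) for f in features)

def binary_select(remaining, utility, samples=20, seed=0):
    """Per candidate feature, draw paired random feature subsets and
    keep the candidate only if its inclusion raises the average gain."""
    rng = random.Random(seed)
    selected = []
    for candidate in list(remaining):
        others = [f for f in remaining if f != candidate and f not in selected]
        class_a, class_b = [], []
        for _ in range(samples):
            r = rng.sample(others, k=min(2, len(others)))
            class_a.append(cv_gain(selected + [candidate] + r, utility))
            class_b.append(cv_gain(selected + r, utility))
        avg_a = sum(class_a) / samples
        if avg_a > sum(class_b) / samples and avg_a > 0:
            selected.append(candidate)
    return selected

utility = {"offerRatio": 0.6, "avgDelta": 0.25, "noise": -0.05}
print(binary_select(list(utility), utility))  # → ['offerRatio', 'avgDelta']
```

Unlike the greedy scheme, every feature gets a vote even after an unproductive iteration, which reduces the risk of stopping in a local maximum.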
C Classification Classifiers Grid Search Configuration
This section shows the classifier grid search parameters for the task of classifying price series into manual and automated repricing. The grid search is implemented as a brute-force approach resulting in 140 configurations for random forest, 42 configurations for C4.5 and 9 configurations for REP tree. The list of grid search parameters is shown in table 20.
Classifier Parameter Values
Random Forest Number of Trees 100, 200
Tree Depth default, 10
Minimum Instances 1, 2, 5, 10, 20, 50, 80
Number of Attributes default, 2, 5, 10, 15
C4.5 Confidence Intervals 0.01, 0.05, 0.10, 0.15, 0.25, 0.50
Minimum Instances 1, 2, 5, 10, 20, 50, 80
REP tree Minimum Instances 2, 5, 10, 15, 20, 30, 40, 50, 80
Table 20: Classification grid search parameters.
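The configuration counts stated above follow directly from the Cartesian product of the parameter lists in table 20; a quick sketch to verify them:

```python
from itertools import product

grids = {  # parameter lists copied from table 20
    "Random Forest": [
        [100, 200],                       # number of trees
        ["default", 10],                  # tree depth
        [1, 2, 5, 10, 20, 50, 80],        # minimum instances
        ["default", 2, 5, 10, 15],        # number of attributes
    ],
    "C4.5": [
        [0.01, 0.05, 0.10, 0.15, 0.25, 0.50],  # confidence intervals
        [1, 2, 5, 10, 20, 50, 80],             # minimum instances
    ],
    "REP tree": [
        [2, 5, 10, 15, 20, 30, 40, 50, 80],    # minimum instances
    ],
}

configs = {name: len(list(product(*params))) for name, params in grids.items()}
print(configs)  # → {'Random Forest': 140, 'C4.5': 42, 'REP tree': 9}
```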
C Classification Classifiers Grid Search Configuration 88
D Evaluation of Different Balancing Schemes
The automated repricing dataset is imbalanced: the manual repricing (MR) class accounts for almost 95%. Decision tree approaches are often misled in such cases since the optimization focuses on the majority class. Figure 34 shows different balancing schemes and their metric impacts for REP trees. The averaged results are based on a 10-fold cross validation with pure and injected classification schemes for both feature selectors: binary and greedy. If no balancing countermeasure is applied, the classifier performs poorly for the automated repricing (AR) class. Weight-based balancing massively improves the F-measure of the automated repricing class at the expense of overall accuracy. SMOTE achieves both overall accuracy and class-dependent accuracy. Similar balancing impacts are observed for C4.5 and random forest.
[Figure: achieved metric (F-measure(AR), F-measure(MR), ROC area; y-axis from 0.5 to 1.0) per balancing scheme: None, Weight-based, SMOTE.]
Figure 34: Different balancing schemes with REP trees.
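The weight-based scheme in figure 34 can be sketched as instance weights inversely proportional to class frequency. This is the common convention; the thesis does not spell out the exact formula, so the sketch below is an assumption.

```python
from collections import Counter

def balanced_weights(labels):
    """Weight each class inversely to its frequency so that both
    classes contribute equally to the training objective."""
    counts = Counter(labels)
    total, k = len(labels), len(counts)
    return {cls: total / (k * n) for cls, n in counts.items()}

# ~95% manual repricing (MR) vs. ~5% automated repricing (AR)
weights = balanced_weights(["MR"] * 95 + ["AR"] * 5)
print(weights)  # → {'MR': 0.526..., 'AR': 10.0}
```

SMOTE instead synthesizes new minority-class instances by interpolating between nearest neighbours, which is why it can raise minority-class recall without sacrificing overall accuracy.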
D Evaluation of Different Balancing Schemes 89
E Detailed Classification Results
This section covers the detailed evaluation results of the automated repricing classification. Table 21 shows the general class prediction results. The class-dependent results can be found in table 22. Table 23 shows the features which are favored by the classifiers. 'Preferred' means that the respective feature is selected in at least 60% of the cross validation folds.
Classifier C4.5 Random Forest REP Tree
Auto Repricer Policy pure injected pure injected pure injected
Feature Selector binary greedy binary greedy binary greedy binary greedy binary greedy binary greedy
Avg Number of Features 12 4 13 6 14 5 14 6 13 3 13 4
Calculated Trees 38210 12695 38420 16145 36330 12695 35930 14060 35740 8890 35750 11225
Avg Tree Size 68 89 145 123 N/A N/A N/A N/A 63 76 101 106
Avg Training ROC Area 97.28% 97.39% 90.45% 90.20% 98.59% 98.35% 93.08% 92.93% 97.58% 96.53% 90.03% 88.42%
Avg Prediction ROC Area 95.94% 95.22% 88.56% 88.04% 97.11% 96.99% 91.48% 90.46% 95.05% 94.70% 87.86% 86.53%
Table 21: Base results of the automated repricing classification.
Classifier C4.5 Random Forest REP Tree
Auto Repricer Policy pure injected pure injected pure injected
Feature Selector binary greedy binary greedy binary greedy binary greedy binary greedy binary greedy
Avg AR Precision 92.56% 91.83% 86.54% 86.92% 94.43% 93.88% 90.31% 88.50% 93.01% 88.41% 86.34% 87.22%
Avg AR Recall 86.00% 83.66% 72.22% 71.41% 81.58% 84.35% 70.48% 73.13% 82.25% 87.34% 71.78% 65.87%
Avg AR F-measure 88.88% 87.24% 78.44% 78.24% 87.33% 88.62% 78.96% 79.79% 86.78% 87.61% 78.03% 74.07%
Avg AR ROC Area 95.94% 95.22% 88.56% 88.04% 97.11% 96.99% 91.48% 90.46% 95.05% 94.70% 87.86% 86.53%
Avg MR Precision 87.09% 85.56% 77.14% 76.31% 84.36% 86.21% 76.53% 78.09% 85.04% 87.71% 76.77% 73.77%
Avg MR Recall 93.99% 93.45% 89.25% 89.50% 95.79% 95.21% 92.69% 90.73% 94.57% 89.59% 89.05% 90.78%
Avg MR F-measure 90.20% 89.14% 82.59% 82.29% 89.49% 90.21% 83.76% 83.84% 89.08% 88.46% 82.27% 81.13%
Avg MR ROC Area 95.94% 95.22% 88.56% 88.04% 97.11% 96.99% 91.48% 90.46% 95.05% 94.70% 87.86% 86.53%
Avg Total Precision 90.61% 89.42% 82.12% 81.89% 90.05% 90.78% 83.59% 83.44% 89.86% 88.93% 81.85% 80.81%
Avg Total Recall 89.99% 88.72% 81.08% 80.81% 88.98% 89.91% 81.99% 82.39% 88.54% 88.52% 80.70% 78.61%
Avg Total F-measure 89.97% 88.68% 80.85% 80.61% 88.94% 89.89% 81.68% 82.14% 88.47% 88.52% 80.46% 77.95%
Avg Total ROC Area 95.94% 95.22% 88.56% 88.04% 97.11% 96.99% 91.48% 90.46% 95.05% 94.70% 87.86% 86.53%
Table 22: Detailed results of the automated repricing classification.
E Detailed Classification Results 90
Auto Repricer Policy pure injected
Feature Selector binary greedy binary greedy
C4.5 Top Features
AvgDelta avgTop3ShortestChangeRatio avgTop3ShortestChangeRatio availability
avgDeltaToMinPriceProduct maxDeltaDayRatio offerRatio avgTop3ShortestChangeRatio
deltaDownRatio degreeInTop3 downUpDeltaRatio
avgTop3ShortestChangeRatio maxDeltaDayRatio offerRatio
offerRatio
maxDeltaDayRatio
Random Forest Top Features
avgDelta avgDelta avgDelta availability
distinctPriceRatio offerRatio availability distinctPriceRatio
deltaDownRatio maxDeltaDayRatio distinctPriceRatio deltaDownRatio
avgTop3ShortestChangeRatio avgPriceToProduct avgTop3ShortestChangeRatio
downUpDeltaRatio deltaUpRatio offerRatio
deltaUpRatio offerRatio mostFrequentCentEnding
offerRatio numberOfResellers
priceSegments longestPlateau
endogenousChangeRatio degreeInTop3
relativeMedianSpan mostFrequentCentEnding
avgDeltaToProduct
REP Tree Top Features
avgDelta N/A avgRelativeLowerGap deltaDownRatio
distinctPriceRatio offerRatio mainDeltaTime
downUpDeltaRatio avgGapToMinPrice
offerRatio
Table 23: Preferred features of the automated repricing classifiers.
F Large Decision Tree Examples
Figure 35 shows a resulting C4.5 decision tree with greedy selection and pure classification scheme. The leaves contain the number of classified and misclassified instances.
[Figure: a medium-sized C4.5 tree splitting repeatedly on maxDeltaDayRatio and avgTop3ShortestChangeRatio; leaves are labeled manual or auto with (classified/misclassified) instance counts.]
Figure 35: A generated C4.5 tree of medium size.
Figure 36 shows a resulting C4.5 decision tree with binary selection and pure classification scheme.
[Figure: a large C4.5 tree splitting on features such as maxDeltaDayRatio, avgDeltaToMinPriceProduct, offerRatio, availability, nightDeltaRatio and numberOfResellers; leaves are labeled manual or auto with (classified/misclassified) instance counts.]
Figure 36: A generated C4.5 tree of large size.
F Large Decision Tree Examples 92
G Prediction Classifier Grid Search Configuration
This section shows the classifier grid search parameters for the task of predicting prices. A decision/regression tree approach has been developed which uses a random forest and an M5 tree classifier. The grid search is implemented as a brute-force approach resulting in 24 configurations for the first prediction stage (price delta direction) and 144 configurations for the second stage (price delta amplitude prediction). The list of grid search parameters is shown in table 24.
Classifier Parameter Values
Random Forest Number of Trees 100
Tree Depth 10
Minimum Instances 2, 5, 10, 20, 50, 80
Number of Attributes default, 5, 10, 20
M5 Tree Minimum Instances 2, 5, 10, 20, 50, 80
Table 24: Price prediction grid search parameters.
G Prediction Classifier Grid Search Configuration 93
H Start Hour Prediction Comparison
This section analyzes the impact of the start hour on the prediction results. The effect is demonstrated by an absolute prediction based on a daily crawling interval with synthetic minimum prices. A 20-fold time series cross validation has been conducted. On the one hand, a decision tree predictor configured with no auto balancing and activated grid search is applied. On the other hand, a no delta predictor is applied. Figure 37 shows the RMSE stability depending on the start hour of the crawling interval (UTC). The RMSE stability is defined as:
RMSE stability = RMSE(no delta predictor) / RMSE(decision tree predictor)
This metric represents the extent to which the decision tree predictor outperforms the no delta predictor.
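A minimal sketch of the metric; the predictor outputs below are invented example values, not results from the thesis.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error over aligned price series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rmse_stability(actual, baseline, tree):
    """Values > 1 mean the decision tree predictor beats the no delta
    predictor, which simply repeats the last observed price."""
    return rmse(actual, baseline) / rmse(actual, tree)

actual   = [10.0, 10.5, 10.2, 11.0]  # observed minimum prices
baseline = [10.0, 10.0, 10.5, 10.2]  # no delta: previous day's price
tree     = [10.1, 10.4, 10.3, 10.8]  # hypothetical tree predictions
print(round(rmse_stability(actual, baseline, tree), 2))  # → 3.74
```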
[Figure: RMSE stability (y-axis from 0.9 to 1.1) over the crawling start hour 0–23 (UTC).]
Figure 37: Start hours and RMSE stability.
The results show that high RMSE stability is reached with a standard deviation of 1.5629%. The standard deviation for the simple delta is even better at 0.9459%. These results illustrate that a fixed start hour can be chosen in order to reduce computational complexity while keeping the different prediction approaches comparable.
H Start Hour Prediction Comparison 94
I Detailed Minimum Price Prediction Results
This section provides detailed information about the minimum price prediction results. The underlying crawling interval is on a daily basis. The results are clustered by the corresponding delta type: simple delta (table 25), direction delta (table 26) and absolute delta (table 27). The decision tree approach's configuration refers to the used balancing scheme.
Predictor Configuration MAE RMSE PPR Grid Search Config
No Delta Predictor default 0.2141 0.4577 100.00%
Simple Predictor default 0.2544 0.5002 93.11%
Pretrained Decision Tree Predictor none 0.1911 0.4333 92.75% rfInstances=50;rfAttributes=5
Decision Tree Predictor none 0.2100 0.4555 90.60% rfInstances=50;rfAttributes=20
SMOTE 0.2151 0.4610 86.05% rfInstances=20;rfAttributes=5
Weight-based 0.2268 0.4728 82.05% rfInstances=2;rfAttributes=20
R Predictor BATS 0.6598 0.8114 13.47%
HW 0.2524 0.4988 76.28%
TBATS 0.6598 0.8114 13.47%
DSHW 0.2524 0.4988 76.28%
ARIMA 0.6348 0.7959 16.57%
NNETAR 0.6863 0.8276 10.71%
ETS 0.6301 0.7930 17.76%
STL 0.2524 0.4988 76.28%
Weka Predictor MLP 0.6051 0.7772 20.71%
LR 0.6900 0.8300 10.60%
SVR 0.4341 0.6582 43.43%
Weka Overlay Predictor MLP 0.5878 0.7660 22.65%
LR 0.6854 0.8273 11.06%
SVR 0.6041 0.7762 19.80%
Table 25: Minimum price prediction results for daily simple price deltas.
I Detailed Minimum Price Prediction Results 95
Predictor Configuration MAE RMSE PPR Grid Search Config
No Delta Predictor default 0.2141 0.4577 100.00%
Simple Predictor default 0.2800 0.5465 90.46%
Pretrained Decision Tree Predictor none 0.2013 0.4460 97.40% rfInstances=20;rfAttributes=5
Decision Tree Predictor none 0.2110 0.4698 94.10% rfInstances=20;rfAttributes=5
SMOTE 0.2197 0.4879 91.55% rfInstances=20;rfAttributes=default
Weight-based 0.2708 0.5601 82.55% rfInstances=2;rfAttributes=default
R Predictor BATS 0.8871 1.0298 11.37%
HW 0.3464 0.6591 76.28%
TBATS 0.8871 1.0298 11.37%
DSHW 0.3464 0.6591 76.28%
ARIMA 0.4410 0.7075 66.82%
NNETAR 0.7880 0.9771 20.98%
ETS 0.8019 0.9820 22.56%
STL 0.3464 0.6591 76.28%
Weka Predictor MLP 0.8943 1.0431 9.28%
LR 0.8472 1.0094 14.08%
SVR 0.6098 0.8624 40.87%
Weka Overlay Predictor MLP 0.8760 1.0333 11.22%
LR 0.8580 1.0100 11.27%
SVR 0.8035 0.9898 17.64%
Table 26: Minimum price prediction results for daily direction price deltas.
Predictor Configuration MAE RMSE PPR Grid Search Config
No Delta Predictor default 1.7322 6.8497 100.00%
Simple Predictor default 2.4473 7.8105 89.80%
Pretrained Decision Tree Predictor none 1.7731 6.6221 95.75% rfInstances=20;rfAttributes=20;m5Instances=10
Decision Tree Predictor none 1.8934 7.0232 93.30% rfInstances=20;rfAttributes=10;m5Instances=2
SMOTE 2.0825 7.6431 90.35% rfInstances=10;rfAttributes=10;m5Instances=2
Weight-based 2.9019 8.9427 80.25% rfInstances=5;rfAttributes=10;m5Instances=10
R Predictor BATS 2.3854 7.3318 11.88%
HW 3.3660 10.9172 76.28%
TBATS 2.3854 7.3318 11.88%
DSHW 3.3660 10.9172 76.28%
ARIMA 2.1149 6.9536 68.72%
NNETAR 3.0202 9.1919 15.91%
ETS 1.9060 6.9117 23.22%
STL 3.3660 10.9172 76.28%
Weka Predictor MLP 9.8493 23.624 9.74%
LR 2.5876 9.1841 21.74%
SVR 3.4907 11.0387 18.92%
Weka Overlay Predictor MLP 15.1973 81.8412 11.17%
LR 2.6923 7.1430 15.97%
SVR 4.8870 14.3723 14.48%
Table 27: Minimum price prediction results for daily absolute price deltas.
J Detailed Reseller Price Prediction Results
Table 28 provides detailed information about the reseller price prediction results within the car product category. The underlying crawling interval is on a daily basis.
Delta Type Predictor MAE RMSE PPR Grid Search Config
Absolute Delta
No Delta Predictor 0.7748 5.9499 100.00% default
Decision Tree Predictor 1.1023 7.6885 75.73% rfInstances=2;rfAttributes=0;m5Instances=2
Assortment Decision Tree Predictor 1.0279 8.0185 83.68% rfInstances=80;rfAttributes=5;m5Instances=2
Pretrained Decision Tree Predictor 0.7818 5.2951 90.92% rfInstances=10;rfAttributes=20;m5Instances=20
Direction Delta
No Delta Predictor 0.3161 0.5547 100.00% default
Decision Tree Predictor 0.3758 0.6822 83.68% rfInstances=50;rfAttributes=20
Assortment Decision Tree Predictor 0.3758 0.6822 83.68% rfInstances=50;rfAttributes=20
Pretrained Decision Tree Predictor 0.3165 0.567 98.03% rfInstances=80;rfAttributes=5
Simple Delta
No Delta Predictor 0.3161 0.5547 100.00% default
Decision Tree Predictor 0.2690 0.5146 68.63% rfInstances=10;rfAttributes=5
Assortment Decision Tree Predictor 0.2690 0.5142 69.31% rfInstances=2;rfAttributes=5
Pretrained Decision Tree Predictor 0.2788 0.5236 87.01% rfInstances=20;rfAttributes=0
Table 28: Reseller price prediction results for the car product category.
J Detailed Reseller Price Prediction Results 97