
Bachelor Project

ChargeQuest©

Identifying new EV charging station locations based on user trip data

Authors:
A.J.H. Sigtermans (s1914766)
J. van Breemen (s2262967)
K.Y. Kliffen (s2369494)

Supervisor: prof. dr. A. Lazovik

July 2nd, 2015

Contents

1 Project description
  1.1 Domain definitions
  1.2 Main goal
  1.3 Research questions

2 Problem statement
  2.1 Definitions
  2.2 Problem formulation

3 Methods to gather data
  3.1 Data needed
  3.2 Existing data sources
  3.3 Gathering data through submission systems (inquiry-style)
  3.4 Gathering data through tracking applications
  3.5 Preliminary conclusions

4 Requirements
  4.1 Components
  4.2 Stakeholders
  4.3 Non-functional requirements
  4.4 Functional requirements
  4.5 Use cases
  4.6 Privacy
  4.7 Conclusions

5 Technical research
  5.1 API
  5.2 Storage methods
  5.3 Hosting and processing architecture
  5.4 Location aware app
  5.5 Preliminary conclusions

6 Prototyping
  6.1 Data gathering app
  6.2 Centralised platform
  6.3 Implemented architecture
  6.4 Testing
  6.5 Data simulation

7 Data analysis
  7.1 General approach
  7.2 Test data generation
  7.3 Weight determination
  7.4 Research on clustering algorithms
  7.5 Identifying large clusters
  7.6 Processing large clusters
  7.7 Post processing
  7.8 Input Parameters
  7.9 Complexity
  7.10 Results

8 Conclusions

A Appendix
  A.1 API specification
  A.2 List of places used for simulation purposes
  A.3 Contributions


1 Project description

This study focuses on the growth of the infrastructure supporting electric vehicles. Over the past years, electric vehicle usage has been on the rise, and technologies that have been researched for years are becoming ready for the consumer market. The use of electric vehicles is growing rapidly, and the Netherlands currently has one of the highest coverages of both charging stations and electric vehicles relative to the number of inhabitants [21]. For example, the Netherlands currently has over 7600 charging stations. That is the highest number in any of the EU countries, with Germany (a substantially bigger country in terms of both population and area) taking second place with approximately 3800 charging stations. Though electric vehicles are being adopted quickly, the main bottleneck for massive growth remains their limited range. Whilst researchers, engineers and car manufacturers focus on increasing battery capacity (with very promising results shown by companies such as Tesla), it is up to local governments, infrastructure providers and electric utility companies to improve the infrastructure for recharging electric vehicles.

Figure 1: Charging station coverage in the EU (chargemap.com)

This study will focus on helping these last-mentioned parties gain insight into where to place new charging stations, based on the needs of both current and future electric vehicle drivers. By gathering data on driving behaviour (trips), processing it (adding weights and distinguishing variables), and analysing it, new charging station locations will be identified. This helps the aforementioned parties to grow the electric vehicle infrastructure in an efficient manner, towards a ‘global’ coverage similar to that of fossil fuels.

Though this study focuses on the situation in the Netherlands, which already has a high density of charging stations, it will be of great importance to other countries that are in an earlier stage of electric vehicle infrastructure development. The Netherlands is a good reference case since a ‘general’ coverage has already been realised, and efficient and optimised decisions are hence of greater importance. Furthermore, the methods described for identifying charging station locations will not only be applicable to supplying electricity (recharging facilities) but can, with the right interpretation and constraints, also be applied to other (renewable) fuels that are not yet broadly available.


1.1 Domain definitions

As follows from the above introduction, this study will discuss many topics regarding electric vehicles. For context, and to avoid confusion due to different interpretations, some definitions from the electric vehicle domain are given below. The terms explained here will be used frequently throughout this thesis.

Hybrid Electric Vehicles: A Hybrid Electric Vehicle (HEV) combines a conventional internal combustion engine (ICE) with an electric propulsion system. The electric propulsion system is used to improve either fuel efficiency or performance. The electricity for the electric motor is provided by a generator linked to the ICE, or by a battery that is charged through regenerative braking.

Plug-in Hybrid Electric Vehicles: A Plug-in Hybrid Electric Vehicle (PHEV) is the same as a HEV, except that its battery can be charged from external power sources and hence generally has a larger capacity (allowing for some kilometers of pure electric driving). In some cases the car is driven entirely by the electric motors, and the ICE only serves as a generator to recharge the battery whilst driving. Traditional fuel sources such as petrol are still considered to provide a large share of the power needed to drive these cars. Battery capacities of PHEVs are limited, with a typical full electric range of 30-80 kilometers.

Electric Vehicles with Range Extender: An Electric Vehicle (EV) with Range Extender (RE, sometimes combined as EV-RE) is an electric car with a (small) traditional petrol engine that can recharge the battery once the driver has used up all of the battery’s capacity. Electricity is considered to be the main power source for these cars, which have small fuel tanks for the range-extending petrol engine. Battery capacities typically allow for 80-200 kilometers of full electric driving.

Full Electric Vehicles: Full Electric Vehicles are powered by an electric motor that draws power only from the drive battery. No energy source other than electricity is involved. These vehicles can be recharged, and apply techniques such as regenerative braking to use their battery charge efficiently. Battery capacities typically allow for 150-300 kilometers of fully electric driving, though there are exceptions at both the low and the high end (for example, Tesla currently offers cars with a range of up to 400 kilometers).

Range Anxiety: Range Anxiety is the fear of running out of battery capacity before reaching one’s destination (or a charging station), as experienced by owners of electric vehicles, i.e. the fear of being stranded at the roadside with no power remaining in the car battery.

Battery: When referring to the battery, this study means the battery used to power the electric motors (also referred to as the ‘drive battery’ by some manufacturers). There might be some confusion on this topic, as all types of cars also have a (small) battery for on-board computers, starting the engine, etcetera.

1.2 Main goal

The main objective of this study is to research and develop (prototype) methods and models that inform private investors or governments on the placement of new charging station infrastructure, and to show that conclusions can be drawn from user driving behaviour. The main goal of the research is hence to develop a system, which might include an app (subject to research), that helps users determine whether an electric vehicle is a viable transport solution for them, while providing local authorities and service providers with the data necessary to expand the electric vehicle charging infrastructure. This goal is summarised in the main research question of this study:

How can user generated trip data be used to identify new charging station locations?


1.3 Research questions

To achieve the aforementioned goal and answer the main research question, several additional and more specific research questions have been defined. Together, their answers will form a concrete proposal on how to achieve the specified goal. The questions are specified below. For each question, an overview is given of the research proposed to answer it, as well as of the related deliverables described later in this thesis.

The first research question focuses on the gathering of data: what sources are available? What new methods can be developed to gather data? How can data be gathered from users (what incentives are needed)? This leads to the following research question:

How can we gather data on the current and future need for electric vehicle charging stations from both current and, more importantly, future customers?

To answer this question, this study will first identify the data needed, based on various other studies on driver behaviour and on the other research questions posed. Afterwards, various available data sources will be evaluated, as well as new methods to gather data. Finally, preliminary conclusions will be drawn to determine the data needed and the relevant sources and/or methods to gather this data. This will serve as input to the other research questions, and will help to reach the main goal.

The second research question focuses on the processing architecture needed for the data to be gathered, as well as on the analysis to be performed on that data. Implementing the answer to the first research question poses new questions: large amounts of data will be gathered, but where and how should they be stored? Various factors will have to be taken into account, not only those related to the gathered data but, more importantly, those related to further processing and analysis of the data. This leads to the following research question:

How can we design a data storage and processing architecture that allows for easy processing and statistical analysis of the data gathered?

To answer this question, available techniques and platforms will be evaluated. Furthermore, the setup of similar studies will be evaluated and might be used as an example. Studies that compare various techniques or platforms will also be consulted to draw conclusions on an optimal architecture to support data gathering and analysis. The answer to this question will help build a solid foundation for analysing the data, based on the answers to the third and last research question.

The third and last research question focuses on analysing the gathered data to arrive at advice on the placement of new charging stations (to achieve the main goal of this study), and builds heavily upon the answers to the two preceding research questions. The data gathered represents many individual needs. These will need to be weighted and afterwards clustered to identify new charging station locations, as evidently not every need can be addressed individually. This leads to the following research question on analysis:

How can we analyse gathered data, and process it into valuable conclusions that inform local authorities and other service providers of desired charging station locations?

The answer to this question will describe the analysis ‘pipeline’ to be used on the data gathered. As with the two other questions, existing studies will be analysed, in this case studies on geographical data analysis and clustering. Various clustering methods will be identified and compared. The answer to this question will consist of a description of the analysis procedure to follow for the gathered data, as identified by the first research question, using the architecture outlined on the basis of the second research question.


2 Problem statement

Though the research questions give a good overview of the goals of this study, they are still broad and can be interpreted in many ways. Therefore, this section focuses on defining the problem statement: what is the problem at hand, and what exact question does this study aim to answer?

2.1 Definitions

Many of the terms used in relation to electric vehicles, charging, and renewable and sustainable initiatives like the ones proposed in this study have a very broad interpretation. Therefore, to define the problem statement correctly, some definitions are first specified explicitly below. Each definition is summed up in one line, followed by some further explanation of why this definition was chosen, for interested readers and for more context.

Electric Vehicles

Cars with an electric motor used for propulsion, powered by a battery that can be recharged from an external energy source.

Electric Vehicles (EVs) come in many forms. They share one property that defines them as electric vehicles: electric motors or battery systems in some way assist in the propulsion of the car. That is indeed a very broad term, and the definition of EVs in this study will therefore be narrowed down. The various types of EVs have already been described in the project description. Of those, the Full Electric Vehicle would be the most interesting subject in a study that aims to focus on full electric driving. However, both Electric Vehicles with Range Extender and (though to a lesser extent) Plug-in Hybrid Electric Vehicles are still capable of drastically reducing CO2 emissions by making optimal use of their battery and its range. Hence, those can be considered when evaluating the need for charging stations. Identifying the points where the batteries of these types of cars need to be recharged, and trying to fulfil this demand by identifying new charging station locations, will give these vehicles Full Electric Vehicle characteristics as well.

Hybrid Electric Vehicles will be disregarded in this study, as they only use technology such as batteries and electric motors to regenerate electricity or drive more efficiently: they are not dependent on recharging, and hence not dependent on charging stations or on the location of future charging stations.

Charging stations

Physical location defined by GPS coordinates, with a defined number of outlets that can be used to recharge Electric Vehicles. A charging station can be either a local charging station (bound to a home, office or other physical location), or an on-route charging station located alongside driving routes that offers fast-charging capabilities.

Charging station is a very broad term for all sorts of devices, locations and adaptors that can be used to charge electric vehicles. In this study, a charging station is defined as a physical location that offers one or more outlets with which an electric vehicle battery can be charged. The number of outlets at one charging station can be relevant in terms of capacity, but will mostly be ignored in this study, as capacity is a problem of its own. The location of charging stations will be identified using GPS coordinates (latitude and longitude). This study will focus on the location and coverage of charging station infrastructure, rather than on individual charging station capacity. Also, various standards exist for charging. That is, various connectors and protocols exist that might influence a car’s ability to charge at a certain charging station. A standard is however forming [20], and standardisation as well as compatibility will only increase. Again, this study focuses on location, not on technology or other local problems.

A very important difference between charging station outlets is, however, whether or not they support fast charging. Furthermore, two types of locations exist for charging stations. The first are charging stations located at ‘parking’ or ‘static’ locations, such as homes, offices and city centres. This study will from now on refer to those as ‘local charging stations’. The second are charging stations located on-route, that is, along highways, main roads, hubs, or other locations where people (car owners) do not regularly spend large amounts of time. This study will from now on refer to those as ‘on-route charging stations’.

Though some deviations occur, this study will assume that all on-route charging stations are equipped with fast chargers (making it possible to indeed recharge on-route), and that local charging stations have either fast chargers (though of limited use, as cars will be parked there for a longer time) or ‘regular’ chargers (taking a longer time of 3-8 hours to fully recharge a car battery). Please note that with ‘regular’ chargers, too, the time needed to fully charge a car varies, and that it also depends on the car itself. All of these charging times are however considered too long for an on-route stopover.
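As an illustration only, the charging station definition and the assumption that on-route stations always offer fast charging could be captured in a small data structure. The sketch below is not taken from the prototype; the names `StationKind` and `ChargingStation` and their fields are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class StationKind(Enum):
    LOCAL = "local"        # bound to a home, office or city centre
    ON_ROUTE = "on-route"  # located alongside driving routes


@dataclass(frozen=True)
class ChargingStation:
    lat: float           # latitude in decimal degrees
    lon: float           # longitude in decimal degrees
    outlets: int         # number of outlets at this location
    kind: StationKind
    fast_charging: bool

    def __post_init__(self):
        if self.outlets < 1:
            raise ValueError("a charging station needs at least one outlet")
        # Assumption made in this study: on-route stations are fast chargers.
        if self.kind is StationKind.ON_ROUTE and not self.fast_charging:
            raise ValueError("on-route stations are assumed to offer fast charging")
```

Local stations may be created with or without fast charging, whereas an on-route station without fast charging is rejected.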

This study will mainly focus on identifying new locations for on-route charging stations. Though suggestions will be made on the identification of new local charging stations, or on how the data gathered can be used for this purpose, the main goal remains to identify on-route charging stations. This follows from the problem statement defined below: the study focuses on increasing the coverage of charging stations, rather than on local issues.

Trip

A route that is driven by a user, denoted by a start- and endpoint, and intermediate waypoints approximately 250 meters apart. All points consist of GPS coordinates (latitude, longitude) and a timestamp.

Trips are the basic data element used in the analysis done in this study. A trip identifies a series of points that together represent an actual driven route. Whereas many datasets only contain single points with no relation between them, this study proposes to track the full routes of users, as the relation between points is relevant for determining new charging station locations.
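To make the trip definition concrete, the following minimal sketch models a trip as an ordered sequence of timestamped GPS points and computes its length. The class and function names are hypothetical, the timestamp encoding (seconds since the Unix epoch) is an assumption, and the haversine formula is used here as one common way to approximate the distance between consecutive points.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass(frozen=True)
class Point:
    lat: float        # latitude in decimal degrees
    lon: float        # longitude in decimal degrees
    timestamp: float  # seconds since the Unix epoch (assumed encoding)


@dataclass(frozen=True)
class Trip:
    points: tuple  # ordered Points: start point, waypoints, end point

    @property
    def start(self):
        return self.points[0]

    @property
    def end(self):
        return self.points[-1]


def haversine_m(a, b):
    """Great-circle distance between two points, in metres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def trip_length_m(trip):
    """Sum of the distances between consecutive points of a trip."""
    return sum(haversine_m(p, q) for p, q in zip(trip.points, trip.points[1:]))
```

Because the waypoints are roughly 250 meters apart, summing straight-line segments in this way should stay close to the distance actually driven.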

User driving behaviour

The behaviour of users (car owners) as denoted by the trips they make.

User driving behaviour is a term that is mentioned both in the problem statement and in various other parts of this study. Of course, this term is open to broad interpretation, ranging from driving speed to aggressive or slow driving, etcetera.

In this study, user driving behaviour will be narrowed down to the following aspects: the trips a car owner takes and the routes within these trips, including details of the starting point, the end point and the waypoints (intermediate points the driver passes through when travelling from starting point to end point). These points are indicated using GPS coordinates. Additionally, timestamps of these points will be recorded, as additional information and context to the coordinates. Other parameters and their relevance might be discussed, but the study will focus on the above.

When talking about incorporating user driving behaviour, or drawing conclusions from it, this study intends to use the data described above in mathematical applications to derive conclusions. This is in contrast to other studies, which use statistical or simulated data.


Transition to Electric Vehicles

The increase in electric vehicle use as promoted by governments, facilitated by infrastructure growth and technical developments, and accelerated by the growing scarcity of fossil fuels and the resulting increase in the prices of these fuels.

The transition to electric vehicles forms part of the underlying motivation for this study, and hence it is defined here. When referred to in this study, it represents the plans and goals governments announce and the global awareness of the need to use more electrically powered vehicles, due to their increased efficiency with respect to petrol-fuelled cars and the possibilities of generating energy from renewable sources (whereas for fossil fuels these options are limited and still cause CO2 emissions). The transition includes the development of batteries with increased capacity, stimulating people to buy electric vehicles rather than petrol-fuelled cars, and of course expanding the infrastructure needed to use electric vehicles. The latter is most important in this study: it is what this study aims to contribute to.

2.2 Problem formulation

Given the current coverage of electric vehicle charging stations, and the ongoing transition to electric vehicles,

where should new charging stations be located to match user driving behaviour and hence assist in the adoption of electric vehicles?

Answering this question is the main goal of this study. The stakeholders involved are the AutoJUICE project, local governments, and private companies involved in the EV market.

Besides locating new charging stations, the answers to this question, and more importantly the underlying research, will serve to demonstrate how user driving behaviour can be an accurate measure that needs to be incorporated in future infrastructure decisions such as the placement of EV charging stations.

Current coverage refers to the electric vehicle charging stations that already exist. This study uses open data sources on charging stations, such as openchargemap.io, as a reference for this.


3 Methods to gather data

The first research question of this thesis focuses on how to gather valuable data that can later be used to create insights and draw conclusions as to where to place new charging station infrastructure (the main goal of this project). Of course, there are numerous ways in which data can be gathered. However, data can be of different qualities: is it as useful and significant as other data? More importantly, from which group of users is the data gathered? In this specific research topic, it is very important to first define the target group, or target audience, whose data will make it possible to identify new charging station locations. Gathering data from the wrong group of users could lead to useless conclusions. For example, data gathered from truck drivers would make it possible to validly determine new charging station locations, but those would be of no value, since truck drivers are not in need of charging stations at this point in time.

Having identified the target group for gathering data, it is important to specify exactly what data, or rather which variables, need to be gathered: what variables are needed to draw valid conclusions and answer the research questions posed? Together with the target audience, this forms a concrete description of the data needed.

Furthermore, one has to think about how the data gathering method can actually be carried out in practice: though having users enter their exact behaviour into a statistical software package might give the most accurate data, it may not yield much data, as it is too difficult or time-consuming for users. Various methods will need to be considered.

In this study, three methods of gathering data will be evaluated:

• Existing data sources

• Gathering data through submission systems (inquiry-style)

• Gathering data through tracking applications


3.1 Data needed

Before going into further detail on any of those methods, it is important to identify the data one wants to have available: what is the data model that an existing data source must comply with in order to provide viable data? The main need that will be identified in later parts of this thesis is the route that a car travels along. That includes not only the source and destination, but also some waypoints on the route, in order to describe the route. After all, this research aims to find new charging station locations, and those need to match actual needs, implying that the routes actually driven are to be considered (not just any route from source to destination). Of course, after clustering data from various drivers, the conclusions reached may require some drivers to take alternate routes in order to make their trip using electricity only.

Various other papers and research projects on electric vehicles have considered what data is needed to draw conclusions on driving behaviour, routes, or new charging station locations. They confirm a data model that stores route data.

The research of Xiaomin Xi, Ramteen Sioshansi and Vincenzo Marano [16] uses regions and states to identify routes. It states that at least the routes between these regions, as well as a source and destination, need to be saved. Additionally, they save trip time, which is relevant as it decreases electric vehicle range. For example, when waiting in a traffic jam the range decreases, which implies a need for a charging station closer than one might expect based solely on the route (distance).

Other papers, such as that of Sung Hoon Chung and Changhyun Kwon [17], which mainly focuses on using existing data (in this specific case from the Korea Expressway Corporation), suggest looking at routes (origin-destination pairs) only, and discarding shorter routes. They record the number of vehicles travelling on a certain route rather than registering individual routes. This system is very similar to a flow system (with a source and a sink) and allows for easy mathematical modelling. It does however lack detail if all roads are to be considered; it only focuses on main highways.
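The flow-style representation described above can be sketched as a simple count of vehicles per origin-destination pair, with shorter routes discarded. This is a minimal illustration and not the method of the cited paper: the function name `od_flows`, the distance threshold, and the place names and distances in the example are all assumptions.

```python
from collections import Counter


def od_flows(trips, min_distance_km):
    """Count trips per (origin, destination) pair, dropping short routes.

    `trips` is an iterable of (origin, destination, distance_km) tuples,
    where origin and destination are region or junction labels.
    """
    flows = Counter()
    for origin, destination, distance_km in trips:
        if distance_km >= min_distance_km:
            flows[(origin, destination)] += 1
    return flows


# Illustrative input; the distances are rough placeholders, not measurements.
trips = [
    ("Groningen", "Utrecht", 190.0),
    ("Groningen", "Utrecht", 190.0),
    ("Groningen", "Assen", 30.0),
    ("Utrecht", "Groningen", 190.0),
]
flows = od_flows(trips, min_distance_km=50.0)
```

The result only retains how many vehicles travel between two labels, which shows why this representation loses the detail of the route actually driven.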

A further review of the literature shows that many studies use existing traffic data, which indicates that such data can be useful in determining new charging station locations. These are however all studies focused on identifying charging station locations along highways and main roads only.

Before continuing, it is important to recall the distinction between the two types of charging stations made in the problem statement, namely on-route and local charging stations, and to identify the implications their specifications have for the data that needs to be gathered.

On-route charging stations: These charging stations are located on-route, and are used to extend the range of a single trip. They will mainly, if not solely, be fast-charging stations, which recharge the car battery up to 80% in a short amount of time (from 10 minutes up to 1 hour). The 80% threshold has to do with the technical workings of batteries and their charging processes, which this research will not elaborate on further, as it is out of scope and widely discussed and explained elsewhere. There is no economically viable market for an on-route recharge that takes 5-8 hours (depending on the car model and battery capacity); evidently, no car owner would want to wait that long at a roadside petrol station. To indicate future on-route charging station locations, information on the route taken is important, as well as the “status” of the car at a given point on that route: what is the remaining range, based on factors such as battery capacity? Together, these pieces of information make it possible to identify areas of demand for charging stations (i.e. where car batteries are empty, given a starting point with full charge).

Local charging stations: These are charging stations located at the start- and endpoints of a car owner’s route. They are located in residential areas, at office spaces, and in city centres. These charging stations offer a cheaper and full recharge of a car battery, though charging can take three to eight hours. Indicating new locations for such local charging stations can be based on the same data as for on-route charging stations, although only the start- and endpoints are of interest in this case. Car status is of less influence, and hence other, simpler data sources that only indicate locations where EV owners are present are viable for identifying new charging station locations of this type.
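The two kinds of demand described above can be illustrated with a small sketch. For on-route demand it walks along a trip that starts with a full charge and reports the last recorded point that is still within range; for local demand it simply returns the start- and endpoint. The function names and the `range_km` parameter are hypothetical, points are plain (latitude, longitude) pairs, and straight-line distances between consecutive waypoints are assumed to approximate the distance driven. The weighting and clustering of such points is a separate step and is not shown here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius in kilometres


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) pairs, in kilometres."""
    (lat1, lon1), (lat2, lon2) = a, b
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def on_route_demand_point(points, range_km):
    """Last point reachable on a full charge, or None if the trip fits within range."""
    travelled = 0.0
    for previous, current in zip(points, points[1:]):
        travelled += haversine_km(previous, current)
        if travelled > range_km:
            return previous
    return None


def local_demand_points(points):
    """Start- and endpoint of a trip, the only points relevant for local stations."""
    return points[0], points[-1]
```

A trip that returns `None` from `on_route_demand_point` creates no on-route demand under these assumptions, but its endpoints still count towards local demand.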

In addition to all that is mentioned above, it is important to note that a point consists of a latitude and a longitude coordinate, as this has been established internationally as the standard way to indicate a location (and many services, like GPS, offer ways to retrieve these points).

It is also important to determine the target audience from which this study would like to gather data. In this case, that is both current EV drivers and future EV drivers, who might buy an electric vehicle. In all cases, the target audience consists of drivers of consumer-size vehicles, not trucks.

Concluding, the data needed can be described as follows:

The data needed to decide where to place new on-route as well as local EV charging stations does not consist only of points, but of a start- and endpoint and the waypoints between these two points, which together represent a trip made by a car owner. Hence, trips consisting of a start- and endpoint and intermediate points should be collected, where points are denoted by their GPS latitude and longitude coordinates. This data should be collected from both current and future drivers of EVs, where EVs are consumer-sized vehicles. Additional parameters may be added to broaden the possibilities for analysis later on.


3.2 Existing data sources

Having identified the data needed, data sources now need to be found. First of all, existing datasources are evaluated. Those can be split up into two types of data sources: data sources thatgather data for the same purposes (route-tracking), and data sources for other applications that,with the right transformations, might yield the needed data.

Many existing studies make use of traffic data from governments or navigation and map suppliers. Those can be categorised as non-route-tracking data (which can only be used for identifying local charging station locations). These data sources provide information on traffic flows and individual locations of cars, but not on the routes taken by a car. Given this data, assumptions would need to be made about the actual routes (source-destination pairs) car owners take. Hence, this data is only of limited value. Transforming it into some sort of route form would yield inaccurate data that might even be false.

Much more interesting is data gathered with route tracking in mind. Many applications of such route trackers are already in place, especially in the business-driver market, which is of particular interest for the EV market as well (many business drivers commute a lot, with the occasional customer visit, where they will need to recharge on-route or at their destination).

Many companies already track vehicles they own, manage or rent for other purposes, such as anti-theft, legal requirements (tax-related), fleet management and policy supervision. An overview of these existing external data sources is given below:

Lease car companies: Lease car companies equip some of their cars with tracking software for administrative or insurance purposes (in the case of expensive or exclusive cars). Some only allow for location tracking, whilst others in some cases also allow for route tracking.

Large corporations: Large corporations tend to track their cars for both policy supervision (do employees use their cars for the right reasons?) and tax-related administration. These companies are also a target group for research on the possibilities of EV driving. Hence, this would form a great source of data. Their main goal in collecting data is tax-related administration, which only requires a start- and endpoint. Though routes may be guessed from this data, the exact routes taken are not identifiable. Hence, the data is again of limited value for on-route charging station locations, though very useful and accurate for local charging station locations.

Insurance companies: Insurance companies sometimes request GPS tracking facilities in cars, or give car owners discounts when their car is equipped with such a system. Hence, this is another great source of tracking data. The data is linked to a car, and so is more usable than unlinked traffic data. In these systems, however, routes are mostly untracked. With correct analysis (detecting differences in movement), this data can nevertheless be converted to the data model described before.

Tracking apps: Many apps have been developed for car owners to track their own routes. These apps are especially popular amongst small business owners (freelancers), as they allow for easy administration of their driving for tax purposes. This data would match the data need as described previously, and is also linked to individual users.
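As an illustration of how untracked location data (such as that of insurance trackers) might be converted into trips, the following is a minimal sketch and not the method used in this study. It assumes that a tracker only reports positions while the car is moving, so that a long silence between two consecutive position reports marks the boundary between two trips. The function name `split_into_trips` and the threshold of ten minutes are hypothetical choices.

```python
from datetime import datetime, timedelta


def split_into_trips(pings, max_gap=timedelta(minutes=10)):
    """Group (timestamp, latitude, longitude) pings into trips.

    A new trip is started whenever the time between two consecutive
    pings exceeds max_gap (an assumed threshold, not from the study).
    """
    trips = []
    current = []
    for ping in sorted(pings, key=lambda p: p[0]):
        # A long silence since the previous ping closes the current trip.
        if current and ping[0] - current[-1][0] > max_gap:
            trips.append(current)
            current = []
        current.append(ping)
    if current:
        trips.append(current)
    return trips
```

The first and last ping of every group would then serve as start- and endpoint, with the pings in between as waypoints.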

Concluding, some of these existing services could be used as a basis for route tracking data that reflects user driving behaviour. However, the data is not always accurate, and gathering large amounts of data might be difficult due to privacy concerns or other restrictions from companies unwilling to share their data.

3.3 Gathering data through submission systems (inquiry-style)

Though existing data sources allow for easy data gathering (the data only needs to be filtered and processed), and hence may yield large quantities of data, the quality of this data can be somewhat poor, especially since many of the existing data sources do not register routes, or register only a source and destination. For research on the placement of new charging stations, detailed information on a route would be the preferred data source. One way to obtain such data is to ask users to submit it. This can be done through offline summaries; however, those would require a lot of administration. Furthermore, users might judge data irrelevant and leave it out of their submission, whilst that data (route) would actually be relevant to this study. This is not desired if one wants to get an accurate view of all the routes a car owner takes.

A company called Allego, based in the Netherlands, has created a platform that uses a web portal to gather data. They introduced a service called openbaarladen.nl, which allows local governments (municipalities) to open up a web portal where users can submit their requests and desired locations for new charging stations. This system, however, only allows users themselves to identify new locations for charging stations, and does not calculate these locations based on actual car owner driving patterns. Questions can be raised as to whether charging station locations requested by users increase the overall coverage of the charging station infrastructure. Systems and portals like these might only yield new charging station locations that are of use to a single car owner, not to the infrastructure coverage as a whole. Hence, this type of data would mainly be of use in determining new locations for local charging stations.

Of course, an inquiry or web portal such as that designed by Allego could also be designed to allow for user input of routes driven. Many fellow researchers, however, state in their findings that inquiries on driving behaviour or routes are expensive, may easily produce bias due to the design of the questionnaire or tool, and that it is hard to gather large quantities of data [18].

3.4 Gathering data through tracking applications

As can be (preliminarily) concluded from the previous two sections on data gathering methods, it is hard to gather the data needed for identifying new fast (on-route) charging station locations. Those stations in particular are needed to improve the charging station infrastructure and hence extend electric vehicle range, in the hope of overcoming the range anxiety of prospective car buyers.

This research therefore proposes a different method, less static than the ones mentioned before, that actively involves the user (car owners and hence prospective EV buyers) in gathering data. By making use of the capabilities of the smartphones many of those car owners carry with them on a daily basis, such as GPS tracking and mobile connectivity, one would be able to collect large amounts of unbiased data on the day-to-day driving behaviour of future electric vehicle drivers. Some main problems arise when envisioning such a system. First of all, car owners will need an incentive to participate in such research. Secondly, privacy concerns play an important role: would car owners be willing to give away data on their day-to-day driving behaviour, which could potentially be an infringement on their privacy? (After all, a car owner’s driving behaviour can yield great insight into his or her personal life.) Thirdly, one would mainly want to attract users interested in buying an electric vehicle, as they need to be facilitated by the charging station infrastructure.

Two of these problems could be combined to form a solution: building an app that tracks a car owner’s driving behaviour to analyse whether he/she will be able to use an electric vehicle for daily usage, or whether, with this specific car owner’s driving behaviour, driving electric would be problematic. Whilst users are informed, which could form an incentive, valuable data on prospective EV drivers is collected. The routes of these people form exactly the (future) demand for electric vehicle range; they can provide insights into where charging stations should be added, and which routes are in highest ’need’ of additional charging capacity and infrastructure. This way, the app will in the end serve car owners in two ways: first of all by helping them with their car buying decision, and, if that decision is to buy an EV, by using the data they supplied to help improve the charging infrastructure. This way, both the range anxiety and the infrastructural problems of EVs are addressed.

The remainder of this research, including the prototypes developed, will mainly focus on the data collection and the analysis of this data to gain valuable insights into where to place new charging stations. The advice part of the app, which serves as an incentive, will be a topic for further research and development. It does, however, play an important role in making this method (a personalised app) a viable way of gathering data on driving behaviour and routes.

Lastly, an argument in favour of this app, and hence to be addressed in this thesis, is the interest in such an app from commercial parties, which could provide further funding and promotion. Car dealerships, specifically those selling EVs, are faced with the concept of range anxiety on a daily basis; (prospective) customers assume they will not be able to perform their regular trips, or will end up stranded halfway, whilst the EV is perfectly capable of completing the trip [19]. Giving customers insight into their driving behaviour may assist in showing the possibilities rather than the limitations of EV driving. This app will help car dealers to show this, and hence they might be interested in providing further funding for the realisation of this app and ultimately the research.

Apart from a personalised advice app, other tracking applications could of course also be of use, for example using data supplied directly by cars or other appliances. Though this would be optimal for data gathering, it would take a long time and involve many parties to gather data in this way. Before such technology is integrated into a sufficient number of cars, much precious time would have passed. Therefore, apps are at this point considered to be the most efficient way to gather data, as they allow for fast implementation.

3.5 Preliminary conclusions

Based on the findings described above, trip data will need to be collected to allow for all research proposed by this study. The minimal requirement the data (source) will need to fulfil is as follows:

Every datapoint (trip) should consist of a start- and endpoint, denoted by GPS coordinates, preferably with a timestamp. In addition, every trip will consist of waypoints that describe the route the car has taken from start- to endpoint. Those points again consist of GPS coordinates, and preferably a timestamp.
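The minimal requirement above could be captured in a data model along the following lines. This is only a sketch: the class and field names (`Point`, `Trip`, `user_id`) are hypothetical and not taken from the prototype, and the timestamp and user link are optional, in line with the requirement.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Point:
    """A location given by GPS latitude and longitude, optionally timestamped."""
    latitude: float
    longitude: float
    timestamp: Optional[datetime] = None

    def __post_init__(self):
        # Reject coordinates outside the valid latitude/longitude ranges.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")


@dataclass
class Trip:
    """A trip: a start point, an end point and the waypoints in between."""
    start: Point
    end: Point
    waypoints: List[Point] = field(default_factory=list)
    user_id: Optional[str] = None  # optional link to a user (assumed name)

    def route(self) -> List[Point]:
        """Return all points of the trip in driving order."""
        return [self.start, *self.waypoints, self.end]
```

For the local charging station analysis only `start` and `end` would be needed, whereas the on-route analysis would use the full `route()`.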

In addition to the above, the trip may be linked to a user in order to later provide this user with feedback on his/her trips. This is however not a requirement, and is up for further discussion in later parts of this thesis.

Based on the research on existing data and various data gathering methods, this research will further focus on developing an app concept and a corresponding prototype for gathering data. All this contributes to showing that such an app can deliver valuable data, and is in this case to be preferred over other solutions. It will help to form solid conclusions on the location of future charging stations. This research will now further focus on how to realise such an app (the architecture involved), how to analyse the gathered data (clustering), and which conclusions to draw from this analysis.

4 Requirements

Having identified the data need in the previous section, as well as that an app is the preferred method to gather this data, this section will now focus on eliciting the software requirements for such an app. These requirements describe an optimal and extended form of such an app, which would be ready for general use and publishing in app stores. Based on these requirements, a subset will be selected to create a prototype that can first be used to prove the research method of this study and the analysis that has to be performed after data collection. Based on this study, a more extended version of the app as described below could be realised for large-scale deployment. These requirements mainly serve to explain how this study could be realised in practice, and how it would be able to impact society.

Apart from an app to gather data, the data will also need to be processed and ultimately presented to stakeholders of the project. To do so, some requirements will be given for the system as a whole. These will be requirements on, for example, data storage and data accessibility. Furthermore, requirements will be given for a web portal that will allow the stakeholders to view analyses and conclusions that are relevant to them. Hence, three main components of the system can be identified:

• An app to gather data

• A platform to store and process this data

• A web portal to review analyses

Putting these three components together yields a system that helps to expand the EV charging station infrastructure by analysing user data. A description of each of the three components is first given below; afterwards, functional and non-functional requirements, as well as use cases and stakeholders, are identified for the system as a whole (as many cross-references and identical parts exist between the three components). Since this thesis is not on software requirements engineering or software development, the requirements elicited here are only meant to give the reader an overview of the intentions and ideas behind a platform that will support this research. Hence, no elaborate details on implementation will be given. Various implementations could be based upon this basic set of requirements.

4.1 Components

An app to gather data

The need to gather the required data inspired the idea of creating an app that could do exactly that. It was also indicated that users will need an incentive to use this app. Since the motivation for this study is the transition to electric vehicles, a good incentive for this app would be one that stimulates that transition as well. Therefore, this study suggests creating an app that will, alongside gathering data for the analysis of new charging station locations, give users insight into their driving behaviour. It will tell users whether their driving behaviour would be feasible with the current coverage of EV charging stations, and what range their future EV would need to have (what type of EV is best for them). The app will gather data on user trips, and inform users whether their day-to-day car use would be possible using an electric vehicle (and how often they would need to recharge). This serves both the interest of gathering data and that of informing the public of the possibilities of electric vehicles: many future buyers of electric cars have so-called “range anxiety”, a phenomenon that makes users doubt their freedom of movement when switching to an electric vehicle. Of course, privacy considerations will have to be taken into account when developing such an app. However great the incentive, the privacy of users should be respected, both from an ethical perspective and because of the possibility that privacy issues outweigh the incentive.

A platform to store and process data

After the data has been gathered by the users, it will need to be collected and stored. To do so, a centralised platform needs to be created that can accept data from the app and support other features of the app as well (such as user identification, and components needed for the advice to be returned to the user). This suggests the implementation of an API to interact with the centralised platform. The platform will need to store the data in a way that is secure, reliable and allows for fast processing. The (geospatial) nature of the data needs to be taken into account. Also, the platform will need to be scalable, since large amounts of data might be gathered and analysed. Finally, the platform will also need to store results from analyses, and be able to serve them on request. This again hints at the implementation of an API, not only for receiving data, but also for sending data (on request).

A web portal to review analyses

Having gathered data, and being able to store and process it, the last missing piece of the puzzle is a portal to actually display the results, which helps to turn the insights gathered into decisions and changes towards smarter placement of new charging stations and increased EV usage. The web portal should allow stakeholders to view data relevant to them, in processed form and in tools that help them make decisions. This will include maps with options to filter the data displayed, as well as tools to operate the processing of data or alter parameters for this processing.

4.2 Stakeholders

Several stakeholders can be identified for the system. Though some stakeholders may be more relevant to some components, they are not tied to a specific component, since they might indirectly affect other components of the system.

Prospective EV buyers: Owners of fossil-fuelled cars will be using the app to get advice on their possibilities of buying a (plug-in) (hybrid) EV. The app will analyse their driving needs, and show them what percentage of their routes could have been made using a (plug-in) (hybrid) EV.

Current EV owners: Current EV owners may use the app to supply data. The advice will of course be of less use to them, though data on their trips may help to expand the charging station infrastructure, which will eventually benefit them. It will also help to place the charging stations closer to their actual needs (whilst they now might have to divert their routes).

EV car dealers: Besides the prospective EV buyers, EV car dealers are stakeholders as well. To them, the app could serve as the ultimate way of convincing (future) customers of the possibilities of electric vehicle driving. This app is a great tool for car dealers to overcome the range anxiety problem.

Data analysts: This group of stakeholders consists of researchers and other parties that are looking for relevant data on the use of (electric) vehicles, in order to provide other parties (stakeholders) with valuable insights on, for example, charging station infrastructure, as is the case in this research. For them, it is relevant to be able to access the raw data.

Governments: Local authorities, as well as governments or certain departments of governments, are interested in the behaviour and needs of (future) electric vehicle owners in order to make decisions on where to invest or facilitate so as to grow electric vehicle use and reach sustainability goals. The web portal, which provides insights from the various analyses, gives governments a powerful tool in steering towards the use of electric vehicles.

Electric utility companies: Electric utility companies are the main providers of charging stations, or of the power that is to be ’sold’ at charging stations. They are very interested in data that could help them decide which new locations for charging stations are most economically viable or could yield interesting results in the (near) future or, if they only supply electricity to other operators of the charging stations, where electricity grid connections will be needed in the near future. Hence, they are stakeholders interested in the conclusions that can be drawn based on the data gathered by the app. They will be using the web portal to be informed of these conclusions.

AutoJUICE project: The AutoJUICE project is currently building a platform for electric vehicle infrastructure extensions: they provide services to both electric utility companies and governments, and centralise the needs of stakeholders as discussed above. The app and analysis components of this research might be integrated with the AutoJUICE platform.

4.3 Non-functional requirements

The following non-functional requirements can be identified for the system as a whole and its various components:

Scalability: The system should be scalable, meaning that with an increasing number of users supplying data (increasing app use), an increasing number of requests for analysis of the data, or an increasing number of requests for raw data, the components of the system (specifically the centralised platform) should be easy to scale up in order to process all the requests.

Privacy: The privacy of the users supplying data should at all times be guaranteed. That is, one should not be able to trace back a set of data to a given person. The advice app would need to register which routes belong to one user; details of the user should however not be stored. Furthermore, data provided to third parties and used in analyses should be anonymised, meaning that user or app identification should be stripped. As there are several ways of tackling this issue, and as it is such an important requirement, a subsection has been devoted to possible solutions with regard to privacy at the end of this requirements section.

Performance: Though analyses of large quantities of data take time, the system should strive for optimal performance, eliminating weak points in data storage and access wherever possible. This boils down to choosing a storage method that allows for both storing large quantities of data and efficiently retrieving this data later on.

Compatibility: The system as a whole should be compatible with as many types of users as possible, both on the input (app) side and on the output side (export from the centralised platform through both the web app and export functionalities). This translates to the app being available for at least the three major platforms: iOS, Android and Windows. For the centralised platform, it translates to using industry standards when it comes to technology choices. Finally, for the web portal it translates to applying state-of-the-art web techniques, and implementing the web app to be available on a broad spectrum of devices (responsive).

Openness / accessibility: The system, and specifically the centralised platform, should be open. That is, the data it contains should be easily accessible to third parties, so that they can continue researching it. The same requirements as described under ’Compatibility’ apply.
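To illustrate the privacy requirement above in combination with open access to the data, the stripping of user or app identification before data leaves the platform could look as follows. This is a minimal sketch under assumptions: trips are represented as dictionaries, and the key names `user_id` and `app_id` are hypothetical. Note that it only removes explicit identifiers; the start- and endpoints of a trip may themselves still reveal something about a person, which is why privacy is treated separately later in this section.

```python
def anonymise_for_export(trips, identifying_keys=("user_id", "app_id")):
    """Return copies of trip records with identifying fields removed.

    The stored records are left untouched, so the platform can still
    group routes per user internally for the advice feature.
    """
    return [
        {key: value for key, value in trip.items() if key not in identifying_keys}
        for trip in trips
    ]
```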

4.4 Functional requirements

REST API: The centralised platform should implement a REST API, allowing apps to provide their data to the platform and make themselves known to the system. Furthermore, a REST API should be provided to retrieve results or raw data from the platform, preferably in JSON format, though alternative formats may be supplied. The app should communicate its results using the REST API, in JSON format.

Incentive: The app should provide an incentive for users. That is, it should not merely gather data, but provide some result or bonus feature to the user. This could either be information supplied, or advice given. This study proposes advice given (see below).

Advice in app: The app should provide the user with advice on his/her ability to make the collected trips with an electric vehicle, and on which type of electric vehicle would be best suited for this user. The app could also indicate that EV driving is at this point in time not feasible for the given user. The user should be shown the percentage of trips he or she would have been able to make with an EV without on-route recharging, and the percentage of trips he or she would have been able to make with one or more recharges. Additionally, the user could be shown these specific trips; those might be incidental trips the user might be willing to make with a different mode of transport. Finally, predictions could be given on when the driving behaviour could be met due to the expansion of either charging station infrastructure or battery capacities (new types of vehicles released).

Maps - plotting points & trips: The web portal should allow for points (GPS latitude/longitude combinations) as well as trips (ordered sets of GPS latitude/longitude combinations, represented as polylines) to be plotted on a geographical map. The underlying map should preferably be from a service such as Google Maps that also shows PoIs (Points of Interest), cities and roads. Those points could either be datapoints as entered, or datapoints resulting from an analysis. The user should be able to select which datapoints he or she wants to view.

Showing user statistics: The portal should allow administrators to view statistics on the number of users currently using the system, and the number of trips uploaded to the system. The number of users consists of two types of users: app users that supply data, and users that use the portal. For the latter, statistics on the usage and the number of analyses made could be shown.

Downloading data: Users of the web portal should be able to download (filtered) selections or full sets of data for further analysis on their own system. They should also be able to download data from the analysis (identified charging station locations). Downloading of this data might occur through either the web portal or the REST API.

Generating and viewing analyses: Users of the web portal should be able to start the analysis process (which will be performed on the centralised platform). The results will afterwards be shown on the web portal. For various analyses some parameters can be set. Though defaults and advice will be given by the ChargeQuest project, the user will be allowed to change those. Such parameters might include the distance a user is assumed to drive on a full battery. The full list of parameters will be further elaborated on in the Analysis section of this study.

Saving analyses: The analyses might take quite some time to process. Therefore, results of an analysis, including the parameters used to generate them, might be saved per user of the web portal. This way, a user can quickly retrieve or compare several analyses. A user should be given the option to rerun the analyses if new data becomes available (using the same parameters as the saved analyses). A list of saved analyses should be shown to the user.

4.5 Use cases

Though the above requirements give an overview of the components the system should consist of, the use cases below might give further insight into the envisioned system.

4.5.1 Using the app to collect data on trips

Stakeholders involved
Prospective EV buyers, current EV owners, EV car dealers, data analysts

Components involved
App, centralised platform

Description
A user (car driver) who has the app installed on his or her smartphone wants to gather data to retrieve advice from the app (incentive) and contribute to the research. The user will have to download and install the app on his or her smartphone once, and enable it (accepting that the app might poll the user’s location). Afterwards, the app will continuously monitor movement and, if it detects that the movement is part of a trip, will start registering a trip. The user can at any given time choose to upload the trips collected until that moment, or delete some of the trips before doing so.

Preconditions

• The user is in possession of a smartphone with GPS capabilities

• The smartphone has a working internet connection

• The user is about to make one or more trips with his/her vehicle

Flow of events

1. The user downloads and installs the app

2. The user activates the app and approves it tracking his/her location

3. The app checks for movement and, if it identifies it as part of a trip, saves this movement (GPS waypoints)

4. Upon completion of a trip, the app saves the set of waypoints collected as a trip

Steps 3 and 4 may be repeated

5. The user may choose to remove certain trips

6. The user starts uploading of the trips from the app

7. The centralised platform receives the trips through the API and confirms receipt to the app

Steps 3 through 7 may be repeated

Postconditions

• One or more trips have been added to the database of trips on the centralised platform
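The upload in step 6 could, for instance, send the collected trips to the REST API as a JSON document. The sketch below only shows how such a payload might be built; the field names (`trips`, `points`, `lat`, `lon`, `timestamp`) are assumptions made for illustration and do not describe the API definition of the prototype. Timestamps are assumed to be ISO 8601 strings.

```python
import json


def trips_to_json(trips):
    """Serialise trips to a JSON string for upload.

    Each trip is an ordered list of (latitude, longitude, timestamp)
    tuples; the first and last tuple are the start- and endpoint.
    """
    payload = {
        "trips": [
            {
                "points": [
                    {"lat": lat, "lon": lon, "timestamp": timestamp}
                    for lat, lon, timestamp in trip
                ]
            }
            for trip in trips
        ]
    }
    return json.dumps(payload)
```

Keeping the points of a trip in one ordered list preserves the route, so the platform can derive start- and endpoints itself.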

4.5.2 Getting advice on EV choice

Stakeholders involved
Prospective EV buyers, EV car dealers

Components involved
App, centralised platform

Description
A good incentive for using the app will be to provide users with personalised advice on whether or not their trips can be completed with an EV, and if so, with what type of EV. Alternatively, an (EV) car dealer might advise a car owner to use this app to retrieve these insights. After allowing the app to gather some data over a period chosen by the user, the user might activate the advice option. This will then request information from the server on the user’s trips, and provide statistics on his/her driving behaviour. These will be matched against a number of available EVs.

Preconditions

• The user has uploaded one or more trips to the centralised platform

• The smartphone of the user has a working internet connection

Flow of events

1. The user requests personal advice from the app

2. The centralised platform receives the request and responds with statistics

3. The app visualises the statistics on the smartphone

4. The app can retrieve available EV cars and their specifications from the centralised platform

5. The app matches the statistics to available EV cars, and shows the result to the user

Postconditions
This use case has no implications on the state of the system or data in the system.
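The matching in step 5 can be illustrated with a small sketch that computes, for each vehicle, the percentage of trips that fit within its range without on-route recharging. Several simplifying assumptions are made here that are not part of this study: trip length is approximated by summing great-circle (haversine) distances between consecutive waypoints, every trip is assumed to start with a full battery, and the vehicle names and ranges passed in are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def trip_length_km(points):
    """Approximate trip length as the sum of distances between waypoints."""
    return sum(haversine_km(p, q) for p, q in zip(points, points[1:]))


def share_within_range(trips, range_km):
    """Percentage of trips that can be driven on one full battery."""
    if not trips:
        return 0.0
    feasible = sum(1 for trip in trips if trip_length_km(trip) <= range_km)
    return 100.0 * feasible / len(trips)


def match_vehicles(trips, vehicles):
    """Map each vehicle name to the percentage of trips within its range.

    vehicles maps a name to a range in km (illustrative values only).
    """
    return {name: share_within_range(trips, rng) for name, rng in vehicles.items()}
```

The trips that fall outside the range of a vehicle are the ones that would need one or more on-route recharges, or a different mode of transport.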

4.5.3 Accessing data stored on the platform

Stakeholders involved
Data analysts, Governments, AutoJUICE project, Electric utility companies

Components involved
Centralised platform, Third party applications

Description
To be able to expand the platform, it must be set up as an open API. Third party applications should be able to connect and request stored information on trips and/or results. This way, information can be reused for different purposes.

Preconditions

• The storage component contains information on trips and/or results (otherwise yields empty results)

• The third party application uses REST and is implemented to use the API definition as specified

Flow of events

1. Third party application initialises a connection

2. Third party application performs request

3. The centralised platform executes the request and collects the requested data

4. The third party application receives the results of its request

Steps 1 to 4 may be repeated

Postconditions

• The requested data has been sent to the third party application

4.5.4 Analysing trip data on the platform, and viewing the conclusions

Stakeholders involved
Data analysts, Governments, Electric utility companies, AutoJUICE

Components involved
Centralised platform, Web portal

Description
A user of the web portal (note: this is a different user than that of the app) can obtain analyses and results via the web portal. After logging in with the supplied credentials, he or she is able to select the analyses available on the centralised platform. The results will consist of indicated new charging station locations on a map. The user can either select existing analyses that have already been performed using suggested parameters, or choose to alter the parameters. The list of available parameters will be specified for each analysis later on in this thesis. When running a new analysis, some waiting time might be incurred. Afterwards, the results will be plotted on the map. The user may then choose to select another analysis, or end his or her session.

Preconditions

• The storage component contains information on trips and/or results (otherwise no analysis is possible)

• The user has received credentials for the web portal

Flow of events

1. The user logs in to the web portal, and selects the viewer component

2. The user can choose to do one of the following:

• Choose an existing analysis (computed earlier) from a dropdown menu

• Enter parameters for a customized analysis

3. The centralised platform retrieves the previous results or performs the custom analysis

4. The results of the analysis are plotted on the map; the user may interact with the map

Steps 1 through 4 may be repeated

Postconditions

• If a custom analysis was generated, its results will now be stored on the centralised platform

4.5.5 Viewing the trips on which data has been gathered

Stakeholders involved
Data analysts, Governments, Electric utility companies, AutoJUICE

Components involved
Centralised platform, Web portal

Description
A user of the web portal (note: this is a different user than that of the app) can obtain an overview of the trips gathered by the system. This gives statistical insights and might allow for insights into the analysis process. After logging in, the user can select the trip viewer. The user might choose to filter the trips, either on date reported or on location. Afterwards, the chosen filtered subset, or the full set if unfiltered, will be mapped. Since these are trips, no individual points will be plotted, but lines between the points a trip consists of. The map will show roads and highways, to allow verification of the plotted trips.

Preconditions

• The storage component contains information on trips (otherwise no trips will be shown)

• The user has received credentials for the web portal

Flow of events

1. The user logs in to the web portal, and selects the trip viewer component

2. The user selects the filters he/she wants to apply, or chooses to use no filters

3. The centralised platform retrieves the stored trips and plots those on the map.

Steps 1 through 3 may be repeated

Postconditions

This use case has no implications for the state of the system or the data in the system.


4.6 Privacy

When designing an app that tracks users and gathers large amounts of data, it is important to consider the privacy of the users of the app [10], in this case the car owners. This section discusses some issues and concerns with regard to privacy that can serve as requirements for the design of the app. Two possible solutions are discussed.

Issues

When dealing with personal information such as locations, one has to comply with legislation such as the Dutch laws on the safekeeping of personal data [6]. This has to be kept in mind when storing the data. It might also require extra functionality, such as the possibility to delete data under the European right to be forgotten.

Car owners might nevertheless be willing to provide information in exchange for functionality [7]. This already happens with Android apps in the Play Store, which have an all-or-nothing policy on device permissions: without accepting all of the permissions, one cannot use the app. Some users, however, might share their location data in ignorance of the consequences [11].

A privacy preserving solution

One initiative used for location sharing with social networks is called PlocShare [12]. It uses a public-private key concept with an access list to share one's location with allowed parties. This can be used to ensure that location data from users is shared safely with the program used to calculate a possible car advice. In the meantime, the car owner's location data is hidden from third parties and can only be shared if the car owner allows it.

For calculating the advice for the location of new charging poles, data identifying individual car owners does not have to be shown, just their trips. This method, however, does not protect against the shared data being stored: car owners lose control of their data once they share it with another party.

User choice

Another possible solution is to let the car owner decide what data they would like to share. For this, the mobile app constantly monitors the location and stores this data on the device. The app generates a notification when some data has been collected (e.g. once a week). This allows the car owner to review the gathered data before it is sent to the server.


Figure 2: Possible data flow diagram

Aside from giving users a choice on what they are willing to share, this lets them remove trips they don’t think are representative of their car usage, such as trips in public transit.

4.7 Conclusions

These requirements only give a global overview of the basics needed by any system supporting a study like this, or by a future, larger deployment supporting infrastructure decisions or sustainability goals. The main technical needs can nevertheless be derived from them, and the next section elaborates on these needs.


5 Technical research

The second research question of this thesis focuses on what architecture is needed to store the gathered data, and later on process it. This section discusses the techniques available that could be used to create such a platform.

Before going into detail on technologies, it is important to state what such a platform should do:

• Serve incoming requests from information sources, such as the mobile app

• Provide storage for large amounts of location information

• Provide fast access to queries, including queries using location information

5.1 API

Some of the functionality of the centralised platform can be offered as an API, to be precise, a Representational State Transfer (REST) API. REST is a software architecture style consisting of guidelines and best practices for creating scalable web services [27].

It uses a lightweight payload (the body of a request containing extra information) in JSON format on top of HTTP. The requests from mobile devices consist of only two options: uploading information on driven trips and requesting advice based on the driven trips. These requests can be handled in parallel, so scaling should not be a real problem; load balancing is common practice in real-world applications [9]. Since REST is based on HTTP, it can be implemented on most platforms that are capable of making internet connections. Many web servers, application servers and specialised applications use REST as a means of communication. It can also be implemented on top of existing services such as a web server with PHP, which does not require any special third-party libraries.

The REST API can be implemented by attaching server-side computations to the HTTP verbs (GET, PUT, POST and DELETE). With both PUT and POST it is possible to send a payload to the REST API containing the information to be inserted. One example of a large API using REST is the Twitter API [32], which enables developers to communicate with Twitter's services. This allows for third-party apps and integration with other services.
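To illustrate how computations can be attached to HTTP verbs, the following is a minimal sketch, written in Python for brevity rather than in the PHP used for the prototype. The routes `/trips` and `/advice`, the payload fields and the in-memory store are hypothetical and only stand in for the two mobile requests described above.

```python
import json

# Hypothetical in-memory store standing in for the platform's database.
TRIPS = []

def post_trips(body):
    """POST /trips: parse the JSON payload and store the uploaded trip."""
    trip = json.loads(body)
    TRIPS.append(trip)
    return 201, json.dumps({"stored": len(TRIPS)})

def get_advice(body):
    """GET /advice: return a placeholder advice based on the stored trips."""
    return 200, json.dumps({"trips_considered": len(TRIPS)})

# Server-side computations are attached to (verb, path) pairs.
ROUTES = {
    ("POST", "/trips"): post_trips,
    ("GET", "/advice"): get_advice,
}

def handle(verb, path, body=""):
    """Dispatch a request to its handler, or answer 404 for unknown routes."""
    handler = ROUTES.get((verb, path))
    if handler is None:
        return 404, json.dumps({"error": "not found"})
    return handler(body)
```

Because each request is handled independently of the others, several servers running the same dispatcher could sit behind a load balancer, provided they share the same storage.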

Since REST can be used with most modern web platforms, it might be wise to combine the REST API with the portal. This allows one machine to serve two purposes.


5.2 Storage methods

A key component of the architecture is the storage. Both the front end and the analysis server are perfectly horizontally scalable. The storage, however, must also scale to larger spatial data sets. This type of data has special requirements, which will be discussed first. When storing spatial data several options come to mind. In this section the two major distinct storage methods, relational and non-relational (NoSQL), will be evaluated.

(Geo)spatial databases

The data gathering methods discussed all deliver coordinates as data points. A database has to provide functionality to perform queries based on this spatial data. Some databases already feature such operations; these are called spatial databases. A spatial database is a database specially equipped to store data with a geometric component and to retrieve results using topological and distance-based queries [3]. Non-spatial databases use regular indexes to look up values, an approach that is often not optimal for spatial data. Spatial databases instead use a spatial index to speed up operations on the geographical data stored. An index suitable for this project would be a sphere index, which takes the distance between coordinates on the earth's surface into account. Another benefit of spatial databases is that they have built-in functions for comparing spatial fields. This allows queries to quickly look up coordinate points close to a given point.
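As an illustration of what a distance-based query has to compute, the sketch below uses the haversine formula for the great-circle distance and a brute-force scan for nearby points. It is not how a spatial database works internally: the function names are hypothetical, and a spatial index exists precisely to avoid visiting every stored point as this scan does.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def points_within(points, centre, radius_m):
    """Brute-force scan over all (lat, lon) points; a spatial index avoids this."""
    lat0, lon0 = centre
    return [p for p in points if haversine_m(lat0, lon0, p[0], p[1]) <= radius_m]
```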

Relational spatial data storage

Relational storage consists of row-based entities. An object model is divided over multiple tables, each having one or more entries per object. Spatial fields are stored in a similar manner to normal primitive fields. Four popular relational database systems with a spatial implementation are MySQL, PostGIS, Oracle Spatial and IBM-DB2 Spatial [4]. MySQL [14] and PostGIS are open source and therefore free, while the other two are closed-source applications. Only the open-source systems are discussed here.

MySQL

MySQL is an open source relational database which is developed by Oracle. MySQL has built-in functionality for storing geospatial data inside regular database fields. Geospatial types can be retrieved in both binary and text formats. In MySQL these types are stored in binary format, but need to be translated to a text format to be used in queries. Indexes can be placed on these fields, enabling the built-in functions to issue queries for nearby locations.

PostGIS

PostGIS is similar to MySQL, but has specialised components for spatial data. It is an extension for the PostgreSQL database software, which is an open source alternative to the corporate-developed MySQL. Similar to MySQL, it has the ability to store geometry directly in fields. It has built-in functionality to perform operations on these geometries, and they can be compared directly inside queries, for example to look up locations near a requested location. It also has capabilities to define spatial indexes.

In both relational storage methods, size is an issue. Tables with thousands of rows have poorperformance[4][2]. Considering the amount of collected location points, this might be an issue.


Non-Relational spatial data storage

NoSQL, or Not Only SQL, differs from the relational storage method, which stores data in tabular form. Many NoSQL systems use document-oriented storage. This means that in the data storage design a complete object can be stored in a single document, instead of in rows which have relations to each other in a database. NoSQL is most appropriate when the geographical and other data collected have a high frequency of change, a high variety of structure or a high volume. Relational databases tend not to be able to provide the needed efficiency in terms of performance and scalability [1] [2]. Two open source NoSQL databases will be discussed: Accumulo and MongoDB.

Accumulo

Apache’s Accumulo is a mapping system inspired by Google’s Bigtable design. In this design, key-value pairs are stored in which the key of every data object contains a row key, a column key and a timestamp. Values themselves are represented by byte arrays [3]. An example of this key-value storage is shown in the image below.

Figure 3: Key-Pair storage example[3]

Accumulo builds on this principle and provides a back end for such systems. It is built upon Hadoop, ZooKeeper and Thrift. Lookups can be based on these keys to search for certain results. One could, for example, filter on a certain period in time, and combine this with other factors such as key identifiers to retrieve a subset of results. Unfortunately, it has no built-in functionality for spatial data.

MongoDB

MongoDB [8] is one of the most popular NoSQL databases. It is a document-oriented database, written in C++, with interfaces to many programming languages. MongoDB has built-in functions to handle geospatial queries, and it is possible to specify an index on coordinates. These have to be formatted in GeoJSON [28], an open format for specifying coordinates and other geometries. The format is able to define several geometric values such as points, shapes and line strings. It uses the same JSON notation that MongoDB uses for its documents. MongoDB provides concurrent access, and only locks at a field level of a document. It supports dynamic queries, just like relational database systems [2].

Downsides of NoSQL

Most NoSQL databases do not provide ACID properties [2]. Updates are eventually propagated to other servers. Therefore the acronym BASE (Basically Available, Soft state, Eventually consistent) is used. This should not cause problems here, however, since the data is mostly written rather than read. Reading only happens in small requests from users for their advice, or in one large batch by the analysis server.


5.3 Hosting and processing architecture

In the envisioned scenario the architecture of the program consists of several modules. The general architecture is shown in figure 4.

Figure 4: Utopian Architecture model

The architecture consists of several major components. First, the phone is an individual component that supplies location information to the REST API, which is kept generic so that it supports multiple formats of input. This enables users to also supply information from other sources (e.g. an integrated system in a vehicle). This information is saved to a general storage location and is processed by multiple input nodes.

To determine the location of a new charging station, a separate component is introduced that only interacts with other components through a database. This analysis server performs a number of analysis steps on the data, resulting in an estimated ideal location for a new charging station (see chapter 7). The server has to interact with an external database that stores the locations of all existing charging stations (and has to fetch and store them on a regular basis). The final results generated by the analysis server can then be stored in the general database again for further processing and visualisation.

The web portal is capable of visualising the data stored in the database: not only the new locations of charging stations, but also aspects such as the stored trip information, existing solutions and preliminary results of the analysis server.

The architecture enables different input methods (as long as they meet the demands of a defined API) to be analysed and a wide range of results to be visualised. The separation between the modules requires every component to run as a completely separate module with no dependencies other than the general storage location.


5.4 Location aware app

The above focuses on the centralised platform, on what happens after data is gathered. But as suggested by this study, an app will be used to gather data. Some technical research on the feasibility of such an app is necessary as well.

Many apps exist that provide location-aware services [31]. Some examples are Buienalert, an app that warns the user if it is going to rain within the next 5 minutes, and Strava, an app to track running and cycling trips. The techniques used in these apps are common practice, and the Strava app in particular provides functionality similar to that of the proposed app.

From the requirements the app has to contain some basic features:

• Location polling, which is the gathering of coordinates of the phone.

• Storing trips

• Sending the stored trips to the server

Both location polling and sending information to the server require a lot of power from the phone [29]. Therefore, the sending of data is minimised by storing data locally. Storing trips on the device makes it possible to send all trips at once.

Location polling is the other major energy-consuming activity. Modern mobile phones often offer two location services [30]: GPS and wireless networks. The wireless network option is generally less energy consuming, at the cost of accuracy. GPS provides better accuracy, but consumes more energy.

Normal time-based polling results in unevenly distributed points, because the car will not drive at the same speed throughout a trip. To eliminate this, the polling needs a filter based on the distance to the last point stored by the app. Multiple distances are possible as a filter. Choosing a larger distance results in fewer points stored per trip, but decreases the resolution at which trips can be used by the analysis. To get the right number of points per trip, an estimate is used; this variable may be changed for different needs. Based on the definition of a trip, a distance of 250 metres is used. Both GPS and wireless networks will be tested with the distance-based filter; see section 6.4 for the results of both types of location polling.
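The distance-based filter can be sketched as follows. This is an illustration only, under the assumption that time-based fixes arrive as (latitude, longitude) pairs; the function names are hypothetical, and on Android the platform's location services can apply a minimum distance themselves.

```python
import math

def approx_distance_m(a, b):
    """Equirectangular approximation, adequate for distances of a few hundred metres."""
    lat1, lon1 = a
    lat2, lon2 = b
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6_371_000 * math.hypot(x, y)

def distance_filter(fixes, min_distance_m=250):
    """Keep a fix only if it lies at least min_distance_m from the last stored fix."""
    stored = []
    for fix in fixes:
        if not stored or approx_distance_m(stored[-1], fix) >= min_distance_m:
            stored.append(fix)
    return stored
```

With fixes roughly 111 metres apart along a straight road, for example, every third fix is stored, so the stored points are evenly spaced regardless of how fast the car was driving.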

Development and design of an app is not the main aim of this research, as several solutions already exist that provide functionality similar to what is needed.

5.5 Preliminary conclusions

To recall, the research question that was discussed in this section is: How to design a data storage and processing architecture that allows for easy processing and statistical analysis of the data gathered? The following preliminary conclusions can be drawn based on the technical research conducted.

The requirements for the platform do not require new technologies to be developed, since existing software solutions are sufficient. The app will use existing techniques for retrieving the data. Two methods are possible: GPS and mobile networks. Research on spatial data yields two possible solutions for storing the data. Of these two options, a NoSQL option would be the best solution, as it can store large amounts of data in a document-oriented way.


6 Prototyping

Since the focus of this study is on identifying new charging station locations based on user data, rather than on software development, the requirements given and the technical research done mainly serve to identify possibilities and prove feasibility. Part of the features will however be implemented as a prototype, to assist in further research, specifically on the analysis. What features will be implemented, and in what way, is elaborated on in this section. This also helps to show which compromises with regard to the envisioned requirements have been made.

6.1 Data gathering app

The first component that requires prototyping is the app to gather data. The prototype helps to show that polling is technically feasible, and helps to supply the necessary test data. The incentive (advice) part of the app will not be implemented, as it does not serve the research but is merely a practical implication.

Currently three major mobile platforms exist [13]. These are the Android OS from Google, Apple’s iOS and Microsoft’s Windows Phone.

Figure 5: Relative market share of mobile platforms

Figure 5 shows that Android currently has the largest market share. Combined with the open nature of the platform, this makes it an ideal target for developing a prototype. The second largest platform is iOS from Apple. Unfortunately, its App Store is a closed platform. The other platforms, such as Windows Phone and BlackBerry OS, have lower market shares and should therefore not be the first to develop for.

Developing apps for all platforms is out of scope for this project. Only the app for Android will be developed, due to its high market share and the openness of the platform. The app described here will not contain platform-specific functionality and could be ported to the other platforms easily.

As said, the prototype app will only contain the tracking functionality, and functionality that interacts with the centralised platform, since these are the components which need to be tested for feasibility.

Therefore the app should consist of a background polling service, polling the location every250 metres. The app should be able to determine trips from the gathered data. If the phonedoes not move within 20 minutes, a new trip is started after the phone moves 250 metres.
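A minimal sketch of this trip detection rule is given below. It assumes that fixes are only delivered after 250 metres of movement, so that a gap of more than 20 minutes between two consecutive fixes means the phone stood still in between. The tuple layout and the function name are hypothetical.

```python
def split_into_trips(fixes, max_gap_s=20 * 60):
    """Split time-sorted fixes (timestamp_s, lat, lon) into trips.

    A new trip starts whenever the time since the previous fix
    exceeds max_gap_s, i.e. the phone did not move for that long.
    """
    trips = []
    for fix in fixes:
        if trips and fix[0] - trips[-1][-1][0] <= max_gap_s:
            trips[-1].append(fix)  # still the same trip
        else:
            trips.append([fix])    # first fix, or standstill detected
    return trips
```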

Figure 6: Screen shot of trip upload selection

After making a few trips the app shows completed trips in a list, allowing the car owner to select and send the stored trips to the server. The app should convert the trip to the given API format seen in figure 25 from the appendix.

For the prototype two other API functions are implemented. These provide authentication of the server and registration of the app with the web service. These definitions can be seen in figure 26 in the appendix. It is also possible to ignore or delete trips locally on the device, to save memory and protect privacy. It will not be possible to delete trips from the server once they have been sent. These are the bare minimum components the app needs to function properly.


6.2 Centralised platform

The centralised platform should implement both a REST API for the mobile app and an API to later be used for displaying information to the other stakeholders. Furthermore, it should offer the capabilities for storage and processing of the data. Though the techniques to be used here already exist, and hence prototyping might not be considered necessary, such a centralised platform is in fact needed: to answer all research questions, analysis also needs to be performed. Therefore, the centralised platform needs to be implemented as well, although a less extensive version than proposed, featuring only basic functionality, will do.

As discussed in section 5.2 there are two choices for storing the data: either a MySQL database or a MongoDB database. The latter is preferred because of its performance with large data sets.

API

Because the web platform has no special requirements for the underlying implementation, the Laravel PHP framework is chosen for prototype development. It integrates easily with all the other components of the system, and it uses the PHP programming language, which is widely used and documented (which allows for quicker prototyping, and easier extension by others based on the prototypes developed for this study).

The Laravel PHP framework can also be used to create web applications, such as the dynamic overview of the results from the analysis (web portal). This is an additional reason to choose the Laravel PHP framework, as it requires only one technique to be used and understood for prototype development, leaving more time for research.

Data storage

Having identified the techniques to use in the prototype of the centralised platform, many choices on the actual implementation can be made. Though many of these are not that relevant to the research, and will not be discussed here in detail, the way data is stored on the centralised platform is discussed, as it is in part an answer to the second research question on how to store and process data.

Of course, data could be stored in other ways while the architectural choice for the MongoDB platform remains valid. This storage design does however take the properties and strengths of MongoDB into account as much as possible.

It is proposed to store data in three different MongoDB databases. An overview of the complete storage design can be seen in figure 7. The GeoJSON [28] point format will be used for storing locations as coordinates. By storing the coordinates in this GeoJSON format, MongoDB can generate indexes for easy (geospatially aware) querying. An example can be seen below.

"startpoint": {
    "type": "Point",
    "coordinates": [53.0, 3.5]
}
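To show how such a point could be used in a geospatial query, the sketch below builds a GeoJSON point and a filter document for MongoDB's `$near` operator, which requires a `2dsphere` index on the queried field. The helper names and the example coordinates are hypothetical. Note that GeoJSON lists the longitude of a position first, followed by the latitude.

```python
def geojson_point(lon, lat):
    """GeoJSON positions list longitude first, then latitude."""
    return {"type": "Point", "coordinates": [lon, lat]}

def near_query(field, lon, lat, max_distance_m):
    """Filter document for MongoDB's $near operator on a 2dsphere-indexed field."""
    return {
        field: {
            "$near": {
                "$geometry": geojson_point(lon, lat),
                "$maxDistance": max_distance_m,  # in metres
            }
        }
    }

# With a driver, this dictionary could be passed to a find call on a
# collection once a 2dsphere index exists on "startpoint".
query = near_query("startpoint", 6.56, 53.22, 500)
```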


The first database shown, the Portal database, will contain three collections:

Users A collection of users of the portal with credentials

AppUsers A collection of users of the mobile app with app identifiers

Analyses A collection of all analyses run with their respective parameters; see section 7.8, Input parameters, for more details on the input parameters

The Users collection will only consist of credentials used for authentication. The Analyses collection will link back to the Users collection to show which analyses were executed by which user.

The AppUsers collection will be used to identify individual users of the app by an AppIdentifier. It can be used to calculate car advice based on their stored trips. It also records whether the user's car can be charged at his or her home address, which eliminates the need for a charging station there.

The second database, AnalysisResults, will consist of two types of collections. For each analysis request, a new collection will be made. Due to MongoDB’s size limit of 16 MB per document, it is not feasible to put all results into one document.

The two types of collections that will exist are:

newchargerlocations which contain coordinates of the new charger locations.

needs which are coordinates with a given weight defined in section 7.3. The documents in this collection also contain a colour attribute for visualising the weight in the viewer of the platform.

The third database will consist of three static collections which can periodically be updated from external sources:

FastChargers a collection of coordinates of all fast charging poles

AllChargers a collection of all local charging stations

Trips a collection of all the trips made by the users of the app

Please note that the FastChargers collection is a subset of the AllChargers collection, but the two are kept separate for easy use in the various analysis steps.

For each trip in the Trips collection, the corresponding app identifier is stored, to keep track of all trips belonging to a user. This is not used in the prototype, but would be relevant when providing advice using the app (personalised recommendations or analyses could then be made). Besides that, both the start and end location of a trip are stored. These can be used to calculate the needs for charging stations at static places such as homes and shopping centres. Next, an array of points is used to keep track of the route. In GeoJSON it is also possible to specify an array of locations as a polyline. This takes less data to store the trip, but the information about time is then lost. Therefore, the point format is chosen here. The time information could be used to extend the clustering algorithm to specify at what times needs are present, such as weekdays or weekends.
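As an illustration of this design, a trip document could be assembled as sketched below. Apart from `startpoint`, all field names are hypothetical; the actual layout is the one shown in figure 7. The sketch stores one timestamped GeoJSON point per fix, which is what preserves the time information that a polyline would lose.

```python
from datetime import datetime, timezone

def point(lon, lat):
    """GeoJSON point; longitude comes before latitude."""
    return {"type": "Point", "coordinates": [lon, lat]}

def build_trip(app_identifier, timed_fixes):
    """Build a trip document from (datetime, lon, lat) fixes in driving order."""
    first, last = timed_fixes[0], timed_fixes[-1]
    return {
        "appidentifier": app_identifier,
        "startpoint": point(first[1], first[2]),
        "endpoint": point(last[1], last[2]),
        # One entry per fix, so the time of each location is preserved.
        "points": [
            {"time": t.isoformat(), "location": point(lon, lat)}
            for t, lon, lat in timed_fixes
        ],
    }
```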


Figure 7: The storage design


6.3 Implemented architecture

In the section on technical research, an envisioned architecture was proposed. The diagram shown in figure 8 reflects the actual situation that will be implemented as a prototype. The changes in relation to the architecture model in figure 4 are:

• Only one server implements the REST API, without load balancing, since this is not needed for small groups of test users.

• A simulation program is run on a server to insert simulated trips based on directions from the Google Maps API.

Figure 8: Implemented Architecture model

The programming language used for the app is Java, as this is the language built into Android. The storage of large geospatial data is handled by MongoDB, as it offers good geospatial query support and performs well on large data sets. The analysis server makes use of the Java programming language as well, since it is highly portable (platform independent), scalable and has good drivers for interaction with MongoDB. This enables an easy and uniform application whose results can be viewed on the web portal. The web portal is written in Laravel (a PHP framework), as it has good compatibility with MongoDB drivers and integrates easily with external APIs like Google Maps.


6.4 Testing

The main goal of the app is to track a user's trips. For this, the phone's internal GPS functionality is used. Most mobile platforms also provide location services based on wireless connections such as WiFi and cellular networks. The accuracy of the latter is lower than that of a GPS location. One example of a trip tracked with the location provided by cellular networks can be seen in figure 9. The trip only followed the highway, without detours. Since the points are often not near roads and it cannot be guaranteed that the phone polls every 250 metres, this method is not recommended.

Figure 9: A trip tracked with wireless connections

Provided that the GPS used in phones is sufficient for navigation purposes, it should also be sufficient for storing the location information. The location points in figure 10 were retrieved by requesting updates from the Android platform every 250 metres, at intervals of at least 5 seconds, following the same route as in the previous figure.


Figure 10: A trip tracked with GPS

Collecting multiple trips is done by continuously polling for the location data. Since the Android operating system will only give a new location every 250 metres, it will not update when the car owner is standing still or the car is parked somewhere. For testing purposes, every movement is considered part of a trip. The car owner has to manually activate the polling and disable it after a trip is done. However, if the car owner does not move 250 metres within 20 minutes, a new trip is started automatically when the next update arrives. This will work when the car owner goes to work, home or a meeting. It will fail, however, when the car owner goes to a large shopping centre and walks more than 250 metres in 20 minutes. Solutions to this will be discussed in section 8 on future work. The prototype app will not have this functionality.


6.5 Data simulation

Though a prototype of the app can be used to show that tracking users and supplying this information to a centralised platform is feasible, it would not, in a short amount of time, yield enough trips to do further research on analysis, as proposed by this study. Therefore, a need arises to simulate data that approximates the data generated by the app as closely as possible.

To allow for data simulation, a Java application can be developed that uses the Google Maps Directions API to generate trips between two locations. For a given pair of places A and B, routes are requested from the Google Maps Directions API. The API supplies up to three different routes per request. These alternative routes make sure that not only main roads are used (over and over again), but that other (local) routes are also simulated as part of a trip (creating a greater coverage).

After retrieving the trips, which are supplied as polylines by the Google API, points are taken approximately 250 metres apart, just as the app would do in reality. The resulting set of points, as well as a start and end point, can then be submitted to the centralised platform as if they were generated by the app. This way, analysis can be performed on large amounts of data, and the capacity of the platform can be tested, even though testing of the prototype app is still in its early phases.
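Taking points at a fixed spacing along a polyline can be sketched as follows. This is a Python illustration rather than the Java simulator itself, and it assumes the polyline has already been decoded into (latitude, longitude) pairs; the function names are hypothetical. Spacing is measured along the route, and positions within a segment are interpolated linearly in degrees, which is a reasonable approximation for the short segments of a road polyline.

```python
import math

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dlmb = math.radians(b[1] - a[1])
    h = math.sin((phi2 - phi1) / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(h))

def resample(polyline, step_m=250):
    """Return points spaced step_m apart along the route, starting at its first vertex."""
    samples = [polyline[0]]
    until_next = step_m  # distance still to travel before the next sample
    for start, end in zip(polyline, polyline[1:]):
        seg = haversine_m(start, end)
        pos = 0.0  # distance already covered on this segment
        while seg - pos >= until_next:
            pos += until_next
            f = pos / seg
            samples.append((start[0] + f * (end[0] - start[0]),
                            start[1] + f * (end[1] - start[1])))
            until_next = step_m
        until_next -= seg - pos  # carry the remainder over to the next segment
    return samples
```

The remainder carried over between segments keeps the spacing constant across polyline vertices, so the simulated points resemble those produced by the distance-based polling of the app.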

A close look at an intersection provides a more detailed view of the similarities between the simulated results and the results from testing the app. As seen in figure 11, neither trip follows the road exactly: since both methods poll only every 250 metres, some shortcuts are made. The shortcuts are not a big issue, because only the points are used by the framework to determine a new charging station location.

(a) Part of a generated trip (b) Part of a trip tracked with the app

Figure 11: Comparison between generated and app results

Which locations need to be chosen for the start and end points of a trip will be elaborated on in the analysis section, as this section only focuses on the technical side of the problem. Though simulation is a valid way to test whether the analysis process works, actual user data will eventually be needed to draw valid conclusions.


7 Data analysis

This section on data analysis discusses how to determine the locations of new charging stations based on user-generated data. The previous sections described what data is needed, how this data can be gathered and what techniques are to be used to do so. Furthermore, a platform has been set out on which the data can be gathered and processed. This section proposes analyses to perform on the dataset and platform that will yield conclusions on where to place new EV charging stations.

The proposed method consists of four major steps that each will be evaluated as a separate component (and can be executed as such). Together, these four steps form the data analysis component of this study.

Figure 12: Data Analysis Model

7.1 General approach

The general approach can be divided into four steps, which are briefly described first to give the reader a quick overview of the setup of the program. After that, every aspect of the analysis is discussed in detail to give the reader further insight. As already noted in the problem statement, two different types of charging stations exist. It is important to note that the model can process both types of charging stations, but each requires a change in the parameters to give the correct answer.

On-route charging stations

People who have an electric vehicle and often use it to drive to work will likely not need to recharge on every route. If these users deviate from their normal driving route, however, they might require a recharge along the route (depending on the range of the electric vehicle). The model is able to calculate the demand at such locations but requires certain knowledge. Assuming everyone starts his or her trip with a full battery, it is required to know the range an electric vehicle can drive until the battery runs out of power. This can be a predefined range for every vehicle, which is currently implemented in the model for ease of generating results, or a more real-world scenario where such information is entered by a user. The latter scenario would either require a user to give some details about his or her electric vehicle or would require an integration with the vehicle. The demand or need these users have on a route is highly dependent on the existing charging solutions on their route and the driving behaviour of these users.

Local charging stations

The model can also identify the need for local charging stations. However, initially it is important to identify the actual need of such data points (as discussed before). Once these points are identified and represent users that do not have a recharge possibility at locations where they often come to a standstill (e.g. at work or at home), the model can process this data to identify locations that have a high demand for this kind of station.

Both these types of stations require the model to run separately with different parameters. Initially the model will calculate a weight that identifies how high the actual demand is for a new charging station at a certain location. This weight depends on factors such as the available charging stations in a given area and could include other factors (excluded in this model) such as demographics, convenience and installation possibilities in that area. This weight is represented by a numerical value in a certain range. For this study, a range between 0 and 5 is chosen (double values).

After these numerical values have been calculated it is important to identify dense areas of demand (areas with a large number of high-valued weights in a small area). Adjusted versions of existing cluster-identifying algorithms are used to discover these areas and to group them together (if required) for further processing. After such dense areas have been identified, a different clustering algorithm, which requires knowledge of the data set that has been obtained by the previous algorithm, will be executed. If the identified dense area is of substantial size, this step will split it up into areas of equal demand. The clustering algorithm will then define centralised points that represent the locations of new charging stations inside the large cluster.

The post-processing step will help the user visualise the demand and will make it possible, with some human-aided input, to choose a location for a new charging station.

Figure 13: Data Analysis Pipeline

What is important to realise here is that every step can be executed as a separate module. Every module has separate input parameters and stores its results in a different database. If a user changes parameters that only belong to a certain step, not all steps have to be executed again (saving execution time); an example is a scenario where a user wishes to change some visual results. It is however required that if a user changes the input parameters of a module, all following modules are re-executed as well (e.g. changing parameters in step 2 will require re-execution of steps 3 & 4).
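The re-execution rule can be illustrated with a minimal Python sketch. The step names are placeholders chosen for this illustration (loosely following the four analysis steps of this section); they are not identifiers from the actual implementation.

```python
# Hypothetical step names in pipeline order; the real module names are not given in this thesis.
PIPELINE_STEPS = [
    "weight_determination",
    "identify_clusters",
    "process_clusters",
    "post_processing",
]


def steps_to_rerun(changed_step):
    """Return the changed step and every step after it, in execution order."""
    index = PIPELINE_STEPS.index(changed_step)
    return PIPELINE_STEPS[index:]


if __name__ == "__main__":
    # Changing a parameter of step 2 requires re-running steps 2, 3 and 4.
    print(steps_to_rerun("identify_clusters"))
```

Because every module stores its results separately, the steps before the changed one can be reused without recomputation.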

7.2 Test data generation

As discussed in the earlier section on methods to gather data, there is a need to collect trip data from users. The proposed and preferred method to do so is using an app. Though a prototype for this app has been developed, and full requirements for such a data gathering method have been given, it was not feasible for this study to actually gather enough representative data to test the analysis of user-generated data. The workings of the app have been tested (technical parts); however, in the relatively short period it was not possible to gather enough user data to get a representative view of user driving behaviour in the Netherlands.

Therefore, for the research on analysis it was chosen to simulate the type of data that would be generated by the app: trips between various cities (large and small) within a country, which simulate the driving behaviour of individual users. This way, the same type of data is collected. Of course, this study has to assume here that people would be driving between the specified cities. The outcomes of the analysis will therefore show where charging stations should be located, given that users drive as simulated. Whilst the actual trips may differ, the method of gathering data and analysing it is the same, hence this simulation can be used to answer the research questions on analysis of user-generated data.

To mimic a real life situation as closely as possible, 40 cities have been chosen in all parts of the Netherlands, both large and small. A list specifying them can be found in the appendix. The pairs of cities will hence represent local traffic (within municipalities or provinces) as well as traffic from one part of the country to another. This simulates both longer-distance trips (which will incur needs for EV charging stations) and trips that will not render any needs due to their short distances. This is the spectrum of results that is currently to be expected when gathering data using an app.

The actual simulation was made using the Google Maps Directions API, as explained in the section on prototyping. For the given list of cities all possible pairs of cities are computed. For each of these pairs, routes and alternative routes are requested from the Google Maps Directions API. This results in two to three routes (two for cities that are closer together) between the two given cities. It was chosen to only request routes between unique pairs of cities, where A to B equals B to A, to limit the number of requests to the Google Maps Directions API.
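As a small illustration of the pairing step, the sketch below enumerates unordered city pairs with Python's standard library. The city names are placeholders (the actual list is in the appendix), and the call to the Directions API itself is deliberately left out.

```python
from itertools import combinations


def unique_city_pairs(cities):
    """Return every unordered pair of distinct cities exactly once (A-B, never also B-A)."""
    return list(combinations(cities, 2))


if __name__ == "__main__":
    # Placeholder names; the real 40 cities are listed in the appendix.
    placeholder_cities = [f"city_{i}" for i in range(40)]
    pairs = unique_city_pairs(placeholder_cities)
    print(len(pairs))  # 40 * 39 / 2 = 780 route requests instead of 1560
```

For 40 cities this gives 780 pairs, half the number of requests that ordered pairs would need.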

The given routes are then transformed to trips according to the data model specified in the technical requirements: each trip consists of a start- and endpoint, and n waypoints, approximately 250 meters apart. These are then inserted into the system as if they were entered by an app user. What happens afterwards is equal to the 'regular' process, and will hence be described further on in this section.
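The transformation from a route to waypoints roughly 250 meters apart could be sketched as follows. This is an illustration under assumptions, not the code used in the study: the route is taken to be an already decoded list of (latitude, longitude) tuples, distances use the haversine formula, and points are interpolated linearly within a segment.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def resample(polyline, spacing_m=250.0):
    """Walk along the polyline and emit a point every spacing_m metres.

    The first and last vertex are always kept as start- and endpoint.
    """
    if not polyline:
        return []
    points = [polyline[0]]
    until_next = spacing_m  # distance left to travel before the next sample
    for start, end in zip(polyline, polyline[1:]):
        seg_len = haversine_m(start, end)
        pos = 0.0  # distance already consumed on this segment
        while seg_len - pos >= until_next:
            pos += until_next
            t = pos / seg_len
            points.append((start[0] + t * (end[0] - start[0]),
                           start[1] + t * (end[1] - start[1])))
            until_next = spacing_m
        until_next -= seg_len - pos
    if points[-1] != polyline[-1]:
        points.append(polyline[-1])
    return points
```

Linear interpolation in latitude and longitude is only an approximation, but over segments of a few hundred metres the error is small compared to the 250 meter spacing.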

Though trips from, for example, north to south are (close to) identical to the same trip from south to north, it is important to note that in further analysis this study will identify so-called 'needs': points on the route where a driver will need to recharge, based on his or her distance from the start location. Hence, the analysis result might be biased (since needs tend to occur only in the later parts of a route, not at the start). To remove this possible bias, after retrieving the routes from the Google Maps Directions API, an inverted version of each route was added to the simulation as well.

The simulation renders the following map of trips:

Figure 14: Trips simulated using the Google Maps Directions API

One can clearly recognise the Dutch grid of main highways, as well as some local streets and roads. Please note that due to the alternative routes, part of the trips plotted run through Belgium and Germany (at the bottom and right parts of the image). Therefore, one might not directly recognise the distinct shape of the Netherlands.

Test data for analysis of local charging stations

The simulated data described above is ideal for analysis of on-route charging stations; however, only a limited number of start and end locations are included in those routes: only 40 unique ones, at the centre of each of the listed cities. This is of little use when trying to locate new local charging stations, and is of course not representative of the data that would be gathered using an app. Such data would show a much more diverse spread of locations, with a different start- and endpoint for at least each user of the app. Hence, specific test data needs to be generated to evaluate new local charging stations. To do so, for a given city (in this case Groningen was chosen), 1000 random locations (GPS coordinates: latitude/longitude combinations) within the city borders were generated. Using the Google Roads API, these points were mapped to the nearest road. The resulting 1000 locations serve as test data for analyses of new local charging station locations.
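Generating such random locations could look roughly like the sketch below. It is a simplification: a rectangular bounding box with assumed, approximate coordinates stands in for the actual city borders, and the snapping to the nearest road through the Google Roads API is not shown.

```python
import random

# Rough, illustrative bounding box around Groningen as (min_lat, min_lon, max_lat, max_lon).
# These values are assumptions for this sketch, not the city borders used in the study.
GRONINGEN_BBOX = (53.18, 6.50, 53.26, 6.63)


def random_locations(bbox, count, seed=None):
    """Draw `count` uniformly distributed (lat, lon) points inside the bounding box."""
    rng = random.Random(seed)
    min_lat, min_lon, max_lat, max_lon = bbox
    return [
        (rng.uniform(min_lat, max_lat), rng.uniform(min_lon, max_lon))
        for _ in range(count)
    ]


if __name__ == "__main__":
    locations = random_locations(GRONINGEN_BBOX, 1000, seed=42)
    print(len(locations), locations[0])
```

Passing a fixed seed makes the test data reproducible between runs.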

7.3 Weight determination

As the on-route charging station scenario requires the most complex approach with regard to analysis (it requires more parameters than the local charging station scenario), this study will use it to describe the main setup of the determination of weights and the following steps. A weight is defined by a number that can lie in any range but requires a lower and an upper boundary. The lower boundary reduces the number of data points in the database: it filters out insignificant points to shrink the dataset that has to be analysed further. The upper boundary is used to separate larger areas of demand into separate areas that do not surpass this upper boundary. To determine the needs in this model the factors described below are taken into consideration. Though these factors form a solid basis for weight determination, it would be wise for future implementations to consider additional factors as well.

Car battery range

The range a car can drive on a certain battery is an important factor to consider. The exact range a car can drive is of course not a proper indication, as one would already be stranded at the moment a recharge is required. Based on the data that can be found on [26], a safe margin to use would be twenty to thirty kilometres. In the current model this factor is a predetermined user input, but it could very easily be implemented as a variable for every trip. This would however require certain knowledge from users, such as the type of electric vehicle, or perhaps an integration with the API of the electric vehicle (if one wants really exact results). In this study it is chosen not to integrate this, as the gathered data also has to include users that do not yet make use of an electric vehicle. Therefore a fixed amount that represents a mean of the range of an electric vehicle is the current value for this parameter. The algorithm proposed here is universal, so it could be applied in a different manner (with a per-trip parameter) as well.

Existing charging solutions

If one were to run out of battery soon it is important to consider current solutions (that is, current EV charging stations) within a given range. To do so all existing solutions (in the Netherlands) are fetched from openchargemap.org [26] and stored in a database for faster processing. The further these stations are located from a place that requires one to recharge, the higher the weight of the need will be.

Maximum detour distance

A user might have to take a small detour to recharge his or her vehicle. This detour distance is currently a parameter that is user input but is estimated to be somewhere between two and five kilometres. What is crucial here is that once a charging station lies within this distance, the range anxiety counter (the range of the car) is set to full capacity again (or something a bit lower if user input suggests otherwise). Thus demand will not occur again until the car runs out of battery. If one wishes to test areas that already have a dense population of available charging stations, one can simply remove this factor (set it to 0) or lower it and add all points to the database. This would result in dense areas having a large number of low weights, which could be analysed by the model to determine areas that still have a need even though existing solutions are available. This would only be a proper representation of urban areas.

Other Factors

One can also consider other factors such as the points of interest and demographics within a given area to increase the weight of a need. These steps are not performed in this model but can very easily be implemented if a source for such data is found.

Tests were performed on the data set on how best to determine the needs based on the given factors. These tests showed that a linear approach is the safest way to represent a need. It is important however to identify an upper limit so that certain areas do not get a high need even though users do not visit the location often (e.g. filter those points). To calculate the need for on-route charging stations the following algorithm is used:

for all points 250 meter apart do
    if traveledDistance > maxDistance - safeFactor then
        weight = distance^eps * iniWeight   or   weight = needsDistance / (maxRadius / maxWeight)
        if distance < detourDistance then
            traveledDistance = percentageCharged
end

Here the distance value is the distance to the nearest charging station. eps (epsilon) is a factor that smooths the line and is recommended to be set to a value below 1. The iniWeight value represents an initial weight value; it could be adjusted if it is combined with another weight factor, but is initially set to 1. In the model the weight is capped at a distance of twenty kilometres (i.e. the maximum weight of 5 is reached at 20 km).
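To make the need calculation more concrete, a possible Python reading of the algorithm is sketched below. Several things are assumptions of this sketch rather than facts from the model: the default parameter values (80 km range, 25 km safety margin, 3 km detour) are merely picked from the ranges mentioned in this section, the recharge check is taken to happen only once a need has occurred, a recharge resets the travelled distance to zero (a full battery), and the power form of the first weight formula is one interpretation of the role of eps.

```python
def linear_weight(distance_km, max_radius_km=20.0, max_weight=5.0):
    """Linear variant: weight grows with the distance to the nearest station, capped at max_weight."""
    return min(distance_km / (max_radius_km / max_weight), max_weight)


def smoothed_weight(distance_km, eps=0.5, ini_weight=1.0, max_weight=5.0):
    """Smoothed variant, assuming eps acts as an exponent below 1."""
    return min(distance_km ** eps * ini_weight, max_weight)


def need_weights(station_distances_km, max_distance_km=80.0, safe_factor_km=25.0,
                 detour_distance_km=3.0, step_km=0.25, min_weight=1.0):
    """Return (point index, weight) for every trip point where a need occurs.

    station_distances_km[i] is the distance from trip point i to its nearest
    charging station; consecutive trip points are step_km apart.
    """
    needs = []
    traveled_km = 0.0
    for index, distance_km in enumerate(station_distances_km):
        traveled_km += step_km
        if traveled_km > max_distance_km - safe_factor_km:
            weight = linear_weight(distance_km)
            if weight >= min_weight:  # lower boundary filters insignificant points
                needs.append((index, weight))
            if distance_km < detour_distance_km:
                traveled_km = 0.0  # assumed: recharge to full capacity
    return needs
```

With these defaults, a 100 km trip without any station within 10 km yields needs from 55.25 km onwards, each with weight 2.5.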

As an example, the need calculation has been executed for the test data displayed before. This gives the following results, where a darker colour (purple) represents a higher demand.

Figure 15: Need points; darker represents a higher need. Max = 5 / Min = 1. Battery distance = 80 km

This data is now ready for further processing. The reason the demand is extremely high in Belgium and Germany is that the charging stations in those countries are not part of the current database, thus demand is significant on these big highways. About 80 - 90% of the points in the original trips database have been filtered out in this step. If one were to make drastic changes to the parameters of the weight determination algorithm though (e.g. change the battery range to 0), this could reduce to 0% (though this would not represent a realistic scenario). In this scenario every charging station (so also local charging stations) is included. A more realistic scenario where cars on the road only make use of fast chargers (255 stations instead of 6500) is shown below. Further analysis is performed on the latter dataset (so the dataset where only existing on-route fast charging stations are considered, not local ones).

Figure 16: Needs for fast chargers. Battery range equals 100 km

In the scenario that one wishes to find new locations for local charging stations it is important that the input of the model consists only of start- and endpoints of users. Here one can consider all available solutions (so not just fast chargers) as it is assumed users stop here for a longer period of time. If one were to execute this on the test data as generated for Groningen the results would be as follows:

Figure 17: Needs in Groningen on randomized data

Note that this data (that of the local charging stations) is highly randomised, so it does not represent a realistic scenario. It is merely used to show that the method designed for on-route charging stations can also be applied to local charging stations, given that the right data is supplied. It can be observed that the western part of Groningen has a higher demand than its surroundings due to large industrial areas located there (and the small number of charging stations currently available).

7.4 Research on clustering algorithms

The next step, clustering the needs, requires one to identify dense areas with high needs (which represent demand). To do so, clustering algorithms are analysed, and based on this research a proper implementation has to be chosen for every step.

7.4.1 Hierarchical Clustering

The principle of the hierarchical clustering algorithm is that all points within a certain distance belong to a certain cluster. For example, a large cluster is formed if all the points are within a certain distance threshold. In the opposite scenario, data points that lie further apart end up in small, isolated clusters.

Hierarchical clustering methods construct the clusters by recursively partitioning the instances in either a top-down or bottom-up fashion [23]. A proper manner of representing the clusters is by using a dendrogram, hence the name of the method. At first sight, the complexity of hierarchical clustering algorithms (O(n^3) or O(2^n)) can result in some issues. However, the algorithm can be optimized depending on the scenario. The two major approaches can be described as follows [23]:

Agglomerative hierarchical clustering: bottom-up, where every data point initially forms its own cluster and these clusters are merged until a desired scenario is obtained.

Divisive hierarchical clustering: top-down, where all data points start in one cluster, which is split up into sub-clusters until an appropriate size and distance is reached.

The merging and division of the clusters can be decided based on some criteria [23].

Single-link clustering: also known as nearest neighbour clustering (nn-method). These methods consider the distance between two clusters to be equal to the shortest distance from any member of one cluster to any member of the other cluster.

Complete-link clustering: methods that consider the distance between two clusters to be equal to the longest distance from any member of one cluster to any member of the other cluster [23].

Average-link clustering: methods that consider the distance between two clusters to be equal to the average distance from any member of one cluster to any member of the other cluster.
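The three linkage criteria differ only in how the pairwise distances between two clusters are aggregated, which a short Python sketch can show. Plain Euclidean distance on planar coordinates is assumed here for simplicity; it is not a statement about the distance measure used in this study.

```python
from itertools import product
from math import dist


def pairwise_distances(cluster_a, cluster_b):
    """All distances between a member of cluster_a and a member of cluster_b."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_link(cluster_a, cluster_b):
    """Shortest distance between any two members (nearest neighbour)."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_link(cluster_a, cluster_b):
    """Longest distance between any two members."""
    return max(pairwise_distances(cluster_a, cluster_b))


def average_link(cluster_a, cluster_b):
    """Average distance over all member pairs."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)
```

For the clusters [(0, 0), (1, 0)] and [(3, 0), (5, 0)] the pairwise distances are 3, 5, 2 and 4, so the three criteria give 2, 5 and 3.5 respectively.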

7.4.2 k-means clustering

K-means clustering, also referred to as Lloyd's algorithm (the most used implementation), is a widely used and accepted algorithm to cluster vector data. The goal of the k-means clustering algorithm is to divide a group of data points into clusters, each with a central point (the mean) to which the data points of that cluster are nearest. A major difference with the previous hierarchical algorithm is however that k-means clustering is a partitioning clustering algorithm. This means that a data set will be split up into a predefined number k of clusters, partitioning it into new data sets [24]. As in this case a large amount of data is available, which has to be clustered around a centralised point that represents a cluster with a single location, an implementation of k-means is likely to be used. The algorithm is as follows:

Initialize k random clustroids µ1, µ2, ..., µk.
do
    assign data points to the nearest µi (compute Ci).
    recompute µi as the mean of the points in Ci.
until no change in µ1, µ2, ..., µk.
return C1, C2, ..., Ck and µ1, µ2, ..., µk.
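A compact Python version of these steps is sketched below as an illustration, not as the implementation used in this study. It assumes planar coordinates with Euclidean distance, and the optional `initial` argument is added only so that the example can be run deterministically.

```python
import random
from math import dist


def k_means(points, k, seed=None, initial=None):
    """Lloyd's algorithm: returns (clusters, centroids) once the centroids stop changing."""
    rng = random.Random(seed)
    centroids = [tuple(c) for c in initial] if initial is not None else rng.sample(points, k)
    while True:
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: every centroid moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            return clusters, centroids
        centroids = new_centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 0), (10, 2)]
    clusters, centroids = k_means(data, k=2, initial=[(0, 0), (10, 0)])
    print(centroids)
```

An empty cluster keeps its previous centroid here; other implementations re-seed such a centroid instead.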

An example of the results of the above algorithm is shown in the image below:

Figure 18: k-Means clustering [5]

Here the originally spread data points are divided among the coloured clusters, each with a centralised data point (the circle). The complexity of k-means is hard to define, as one cannot give a polynomial bound on the number of iterations needed until no change occurs. Therefore some form of limit on this factor will likely be required to keep the running time bounded.

7.4.3 Density-based clustering

These types of algorithms take points that are closely packed together and group them in a cluster, marking less significant points as unimportant in the process. The best known algorithm in this area is the density-based algorithm for discovering clusters, DBSCAN [24]. This algorithm has two parameters: a minimum number of points for a cluster and a distance that determines the neighbourhood within which points are grouped. The algorithm is distance based: all data points that lie within a certain range of other data points in the same region are clustered together to form an individual cluster. The downside of this algorithm when applied to locating charging stations is that in busy areas, such as cities where a lot of demand will occur, a cluster will be relatively large. Precautions will have to be taken to prevent this or to preserve this information when clustering. A maximum cluster size or a smaller value of the maximum distance might resolve these "city issues". Something that might turn out to be difficult when implementing this algorithm is determining the maximum distance parameter (eps) and the minimum data points parameter (minPts) [24]. Precise values of these parameters are likely best settled in a testing environment; however, Martin Ester et al. [24] discuss a proper method for doing this in their paper.
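The role of eps and minPts can be seen in a minimal, unoptimised DBSCAN sketch in Python. It is meant as an illustration only: it uses Euclidean distance on planar coordinates, scans all points for every neighbourhood query (quadratic time), and does not include the adjustments made for this study.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and NOISE for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense itself
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # core point: keep expanding the cluster
    return labels
```

Points that are not within eps of enough other points and are not reachable from a dense point keep the NOISE label.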

7.4.4 Distribution Based Clustering

Distribution based clustering is a clustering method related to distribution models (e.g. the Gaussian, or normal, distribution). Clusters are separated based on the likelihood that points belong to the same distribution. For this kind of clustering method prior knowledge about the data set is required; it focuses more on combining elements that belong to the same data set than on identifying separate areas. Xiaowei Xu et al. [15] propose an algorithm (DBCLASD) to perform cluster identification based on geospatial data using distribution based clustering. While this is a good algorithm on its own, it does require some prior knowledge of the dataset that in the scenario of EV charging stations would be hard or impossible to determine. The paper of Xiaowei Xu et al. [15] also contains the following table:

Comparison between distribution based clustering and DBSCAN [15]. Execution time in seconds.

Number of points    5000   10000   15000   20000   25000
DBCLASD               77     159     229     348     457
DBSCAN                51      83     134     183     223
CLARANS             5031   22666   64714  138555  250790

Based on these execution time statistics, it is safe to conclude that for this implementation regarding charging stations, DBSCAN seems a more appropriate method to execute, as it requires less information regarding the dataset and has a better execution time on large data sets.

7.4.5 Other considerations

This section contains other considered algorithms with a brief explanation as to why they are not considered relevant to this study and hence not explained in detail. Please note that research was performed regarding geospatial implementations of these methods.

Graph theory

Even though graph theory has a close relation with the road networks this study analyses, it does not have much to do with identification of dense areas of demand. Because graph analysis methods such as Travelling Salesman or Dijkstra's algorithm require information such as a begin point and an endpoint, no application for them was found within the scope of this project.

Flow networks

With requirements such as a source and a sink, flow networks might find an application in route calculations. However, they are incapable of performing the identification of dense areas of demand that this study requires. Therefore they are disregarded for the same reasons as graph theory.

UPGMA

Unweighted Pair Group Method with Arithmetic Mean is a hierarchical clustering method used mostly in bioinformatics to generate phenetic trees. As this study does not require such a dataset it was rejected as a possible solution to our problem.

OPTICS algorithm

The OPTICS algorithm is an extension of DBSCAN that takes away the factor epsilon and enables it to identify data of varying densities. This algorithm was considered for a period and has also been partially implemented; if used, however, it would also identify less important areas of demand (with a large separation) that are filtered out by the original DBSCAN algorithm. It is not discussed in detail because the difference with DBSCAN is minimal.

Vector Quantization

Vector Quantization can be used for multiple purposes which can greatly enhance the setup of an algorithm. It is however important to note that if one were to use it for clustering purposes it converges to a k-means solution in an incremental manner.

7.5 Identifying large clusters

After the needs have been determined, as can be observed in figure 15, one has to single out the areas with high demand. At this stage the only things known about the dataset are the number of points it contains and its total weight (as determined by the first step). Therefore a clustering algorithm that requires nearly no prior knowledge about a data set is chosen, namely DBSCAN [24].

In this algorithm points are clustered together, as previously discussed, according to the distance that separates them from each other. Another important factor for DBSCAN is the minimal number of points that should belong to a cluster. These factors highly depend on the size of the input data. If this gets larger, the maximum distance (epsilon) might have to decrease and the minimal number of points might have to increase. Currently an estimate is given based on the size of the total dataset; however, the user may need to set these parameters manually, which is also possible.

The structure of the DBSCAN algorithm has been changed in some minor details such that it disregards points with a low weight that are located near the edge of the maximum distance (epsilon). This is done so that larger but separate clusters do not tend to form one whole cluster. An example of the results of a DBSCAN clustering performed on the needs database shown in figure 16 can be observed below.

Figure 19: DBSCAN on the needsDB in figure 16. Eps = 2 km, minPts = 100.

Please note that all the black points are considered noise and every colour defines an individual cluster. At this scale the image might make it hard to see that some areas are less dense than others and therefore do not form clusters. One can observe that some denser areas are present and therefore got a cluster assigned that might span a large area. Such an example can be identified at the A6 and A7 highways with Emmeloord as its central point.

An important thing that can be observed here, and that also confirms the decision to choose DBSCAN, is that it is able to identify noise. An example can be seen on the German autobahn. Because this route is a simulated alternative route, it is not used that often and therefore will not meet the minimum points threshold. This information is therefore filtered out and does not require any further processing (it is marked as insignificant). This enables this analysis step to filter out clusters that form a high and dense area of demand because there are no charging solutions nearby, but that are only visited by a small proportion of people. Thus they might not be (as) interesting as the remainder of the data. If one were to consider areas that have high demand but are not visited often, identification is also possible. This could be done by setting a threshold that requires a high weight with a low maximum number of points.

After this identification process is finished there is additional information available on the data. DBSCAN identifies the total weight of the clusters, their size and maximum range, and the total number of clusters found. This information is then stored in a separate database that can be used for further processing.

When calculating the demand on a local level it is important that the epsilon parameter (the maximum distance between points) is lowered so that DBSCAN will identify clusters of cities, villages or densely populated urban areas. These areas of high demand will then be stored; their most crucial processing takes place in the next step of the analysis model. An example of the identification of a cluster in Groningen can be observed in the image below. In a real-world scenario this would be split up into different clusters in different urban areas.

Figure 20: DBSCAN on an urban area to identify clusters of cities.

7.6 Processing large clusters

Once the initial data set has been split up into smaller clusters and additional knowledge has been obtained in the previous step, this step can calculate where new charging stations should be placed. DBSCAN will tend to cluster together large groups of points if they are within a dense area. This is not a bad thing, though it will result in one huge area (cluster) where there is a high demand (an example can be seen in figure 19 at the red cluster near Emmeloord). This area requires further processing by splitting it up into multiple clusters, each with a centralised location that forms the actual new charging station location. If this last step were not executed, the charging station location would just end up somewhere in the middle of a field, with needs kilometres away from the actual charging station location.

Clusters that are considered less dense (have a smaller size) and do not require further processing by means of splitting up still require a mean calculation of where the weight of the cluster is highest, to identify the exact new charging station location. This mean value then represents the central point where the weight (or size) in this area is at its highest and thus represents the location with the highest demand.

To perform both these steps, an adjusted version of Lloyd's algorithm (k-means clustering) is implemented. This version of k-means uses weights to determine the means instead of locations and also determines the 'no change' condition based upon the weights of the clusters. In the second scenario, where one only wishes to know the centralised point of a smaller cluster, the value of k can be set to a default value of 1 to calculate the mean of the cluster. In the other scenario the value of k has to be calculated.

To calculate the value of k, estimates are made based on the size and total weight of every individual cluster (the clusters identified by DBSCAN); however, some user input may be required, as the weights can vary greatly depending on demographics as well as on the calculation method of the weight parameter. Two important variables that can be estimated but can also be entered by the user are introduced: minimum and maximum weight. The minimum weight defines the minimum demand for a charging station before a recommendation will be placed there (anything below this threshold will be disregarded); the maximum weight, if exceeded, is used to calculate the value of k, which determines into how many separate clusters the data should be split. For example, if a city is identified, the result from the previous step will be a large cluster. This cluster will then be split up into a large number of clusters, identified by a large value of k. Likely some clusters will not reach the minimal threshold and will be discarded whilst others will reach a significant value.
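How the two thresholds could translate into a value of k is sketched below. The exact estimate used in the model is not spelled out in the text, so the rule here (the total cluster weight divided by the maximum weight, rounded up) is purely an assumption for illustration, as are the function name and the example thresholds.

```python
import math


def estimate_k(total_weight, min_weight, max_weight):
    """Return 0 to discard the cluster, 1 to only compute its centre, or k > 1 to split it.

    Assumed rule: split into enough clusters that the average cluster weight
    does not exceed max_weight.
    """
    if total_weight < min_weight:
        return 0
    if total_weight <= max_weight:
        return 1
    return math.ceil(total_weight / max_weight)


if __name__ == "__main__":
    # Illustrative thresholds only.
    for total in (500, 1500, 4500):
        print(total, estimate_k(total, min_weight=800, max_weight=2000))
```

A cluster that is split may still produce sub-clusters below the minimum weight, which would then be discarded after the k-means step.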

The k-means algorithm by default continues until there is no change in the dataset. As the dataset under consideration (user trip data) can get large, a maximum number of iterations for the k-means algorithm can be given. Another parameter defines 'no change' as a maximum deviation, expressed relative to the given maximum weight. For example, if the maximum weight is set to 2000, a weight difference of one percent would mean that no cluster differs by more than 20 weight-points. The algorithm is adjusted as follows:

1: Initialize k random clustroids µ1, µ2, ..., µk.
2: do assign data points to nearest µi (compute Ci).
3:    depthIterator ← depthIterator + 1.
4:    recompute µi as the mean of weights in Ci.
5: until weight differs no more than 10% of maxWeight in µ1, µ2, ..., µk
6:    or depthIterator > maxDepth
7: return C1, C2, ..., Ck and µ1, µ2, ..., µk.
8: end
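The adjusted algorithm above can be sketched in Python as follows. This is an interpretation and not the implementation used in this study. It assumes planar (x, y) coordinates with Euclidean distance, reads "mean of weights" as the weight-weighted mean of the point locations, and reads the stop criterion as "no cluster's total weight changed by more than the given fraction of maxWeight since the previous iteration". The function and parameter names are hypothetical.

```python
import random


def weighted_kmeans(points, k, max_weight, max_depth=100, deviation=0.10, seed=0):
    """points: list of (x, y, weight) tuples. Returns (clusters, centroids)."""
    rng = random.Random(seed)
    # Step 1: pick k random data points as the initial centroids.
    centroids = [(x, y) for x, y, _ in rng.sample(points, k)]
    prev_totals = None
    clusters = []
    for _ in range(max_depth):  # depth limit, steps 3 and 6
        # Step 2: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Step 4: recompute each centroid as the weighted mean of its members.
        totals = []
        for i, members in enumerate(clusters):
            total = sum(w for _, _, w in members)
            totals.append(total)
            if total > 0:
                centroids[i] = (
                    sum(x * w for x, _, w in members) / total,
                    sum(y * w for _, y, w in members) / total,
                )
        # Step 5: stop once no cluster weight moved more than the threshold.
        if prev_totals is not None and all(
            abs(a - b) <= deviation * max_weight
            for a, b in zip(totals, prev_totals)
        ):
            break
        prev_totals = totals
    return clusters, centroids
```

With k = 1 the function simply returns the weighted mean of all points, which matches the scenario in which only a centralised point of a small cluster is needed.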


In the implementation of k-means it was also important to consider in what order the data points are picked, as the geospatial database tends to order them. This study executed some tests and concluded that the most accurate results, in a short period of time, are achieved if the points are requested from the database in random order.

An example of the results of the k-means algorithm, based on the on-route (fast charger only) scenario shown above, can be observed below. Here, every cluster with a weight lower than 800 is discarded, and every cluster with a size above 5000 or a total weight above 2000 is split by k-means into different clusters. Please note that the weight values are in the domain between 0 and 5 and are represented by a double.
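The thresholds of this example can be expressed as a small decision rule, sketched below. The function name and the order of the checks are assumptions. Testing the minimum weight first is harmless under the assumption that sub-clusters never weigh more than the cluster they come from, so splitting a cluster below the minimum weight would only produce sub-clusters that are discarded anyway.

```python
def triage_cluster(size, total_weight,
                   min_weight=800.0, max_weight=2000.0, max_size=5000):
    """Decide what happens to a DBSCAN cluster before k-means is applied.

    The default thresholds are the values of the example in the text.
    """
    if total_weight < min_weight:
        return "discard"  # demand too low for a recommendation
    if size > max_size or total_weight > max_weight:
        return "split"    # too large or too heavy: hand over to k-means
    return "keep"         # small enough: only its mean is calculated (k = 1)
```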

Figure 21: k-means results on data-set of figure [16]

The results show that the large clusters located on the A6 and A7 in figure 19 have been reduced to multiple points located on highways, instead of one big cluster with a centralised mean. The information known about every individual point is then stored in a database that can further process the gathered information.

When analysing clusters on a local level, such as in figure 17, the minimum and maximum weight parameters have to be adjusted to lower values (or perhaps, if really dense areas are evaluated, to higher values). The results will be visualised later on in this thesis.


7.7 Post processing

The k-means algorithm uploads the generated results, that is, the new charging locations with their sizes and weights, to a database. The post-processing step then uses this information to visualise the data for a user in an appropriate way. A distinction can be made as to whether the user wishes to filter on the value of the demand or on the weight of the new charging station location. After evaluating these factors for every location found, the following results are generated:

Figure 22: Post-processing results on figure [16] based on weights.

The darker the colour, the higher the demand for a new charging station at that location. Based on these locations one can determine where within a certain area it is most advisable to place a new charging station. Some human input will be required to determine the final and most suitable location for a station (e.g. near a currently existing gas station).

It is important to recognise areas with a denser demand (in this example the A6). It is recommended to place a charging station at the darkest spot in such an area; if one chooses to do so, it is important to note that the surrounding demand will also decline. For example, if one would place a fast charger on the spot with the highest demand on the A6, for example at Emmeloord, the model would generate the following result:


Figure 23: With a new fast charger placed at Emmeloord

It can be observed that this has significant effects on the recommendations throughout the whole country (but especially in the surrounding area). However, as the number of fast chargers on the A6 and A7 is still limited and a lot of traffic from the simulation tends to pass along these roads, recommendations still exist along this route.

When analysing the results on a local level such as Groningen, as shown in figure 17, the following results are produced:

Figure 24: Post-Processing results on data in figure 17

Please note that the input data (for local charging stations) was highly randomised and does not represent a real-world scenario in any way. It is only used to show that the methodology for on-route charging stations can be applied to local charging station locations as well. The results that can be observed in the image show that there is a higher demand in the northern part of the city. The smaller-sized but higher-weight demand that could be noticed in the west has been combined with a smaller demand near the Paddepoel region, and this combined location now has the highest demand for local charging.


7.8 Input Parameters

In this chapter several parameters were introduced that the program uses as input. This section summarises these parameters so that the user can set appropriate values on the web portal.

General parameters

Battery Range: The total battery range in meters a car can drive before a recharge is required.

Max Sacrifice: A parameter that indicates, in meters, how far one would deviate from their original route to recharge their battery.

Minimal Weight: This variable is used to filter out insignificant charging stations. Every charging station below this threshold is removed from the visual results.

Maximum Weight: A threshold that serves as input for the k-means algorithm. If a cluster identified by DBSCAN exceeds this amount, it is split by k-means so that every resulting cluster has a weight of this value or less.

Maximum Size: The maximum size (total number of points) a cluster can take. This value has to be set to a high value if one were to choose to filter on weights. Setting it to a low value is especially handy if, for example, dense areas of demand with low weights but large sizes tend to form in urban areas.

Minimal Size: The minimal size a cluster should have. Especially handy if one were to focus on placing charging stations in more urban areas (set it to a higher value) instead of areas with high demand.

Advanced Parameters

DBSCANRange: A double value that represents the epsilon value for DBSCAN. This is the maximum distance in kilometres by which points can be separated from each other.

Linear Weights: Decides which of the two weight calculation methods is used by the weights determination algorithm (section 7.3). If true, the second option is chosen, which tends to form a more linear line. The first option suits a more urban approach, while the latter is a more general solution.

Epsilon: The epsilon value that influences the non-linear weights determination calculation.

Maximum Scan Range: The maximum distance, in meters, that is searched around a weight point for a charging station.

Sort on Weights: Default value is yes. If no, the program visualises the dots based on the size and not on the weights/size. This can be useful in more urban areas where one is interested in areas that get visited often instead of areas of high demand.

Maximum k-means depth: An iteration limit that bounds the running time of k-means, since the number of iterations needed to reach ‘no change’ is hard to predict. Anything between 50-100 should result in a fast and accurate answer.

Maximum k-means deviation: The maximum change in weight, as a percentage of the maximum weight, that is still regarded as ‘no change’ between k-means iterations. If every cluster stays within this threshold, k-means terminates before the maximum depth given above is reached.
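As an illustration, the parameters above could be grouped into a single configuration object, as in the Python sketch below. The class name, the field names and the `validate` checks are hypothetical. Of the example values, only the weight thresholds (800 and 2000), the maximum size (5000), the default of Sort on Weights and the k-means depth (within 50-100) are taken from this thesis; all other values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisParameters:
    # General parameters
    battery_range_m: float
    max_sacrifice_m: float
    min_weight: float
    max_weight: float
    max_size: int
    min_size: int
    # Advanced parameters
    dbscan_range_km: float
    linear_weights: bool
    epsilon: float
    max_scan_range_m: float
    kmeans_max_deviation_pct: float
    sort_on_weights: bool = True   # default value is yes
    kmeans_max_depth: int = 100    # 50-100 is suggested in the text

    def validate(self) -> list:
        """Return a list of human-readable problems (empty if consistent)."""
        problems = []
        if self.min_weight > self.max_weight:
            problems.append("min_weight exceeds max_weight")
        if self.min_size > self.max_size:
            problems.append("min_size exceeds max_size")
        if self.kmeans_max_depth < 1:
            problems.append("kmeans_max_depth must be at least 1")
        if not 0 <= self.kmeans_max_deviation_pct <= 100:
            problems.append("kmeans_max_deviation_pct must be a percentage")
        return problems


# Placeholder values, except where noted in the paragraph above.
example = AnalysisParameters(
    battery_range_m=120_000.0, max_sacrifice_m=2_000.0,
    min_weight=800.0, max_weight=2000.0, max_size=5000, min_size=10,
    dbscan_range_km=1.0, linear_weights=True, epsilon=0.5,
    max_scan_range_m=5_000.0, kmeans_max_deviation_pct=1.0,
)
```

Checking the combination of values before a run starts would catch, for example, a minimal weight that is set above the maximum weight.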


7.9 Complexity

When executing the model it is easy to see that DBSCAN is the slowest of the implemented algorithms and will likely form the complexity barrier. The complexity of the weight calculation is O(n ∗ m), where n is the size of the data set and m the number of existing charging stations. This is because it simply iterates over all the points and looks at some surrounding factors (namely the existing charging stations) for every point. This requires approximately 10-20% of the total execution time.

Even though the first step tends to filter out a large proportion of the data, a large dataset still remains. As one can observe in the algorithm research, DBSCAN still has a long execution time even with a relatively small number of points. The geospatial database implementation that was introduced at a later moment did reduce the execution time considerably, though the complexity of DBSCAN remains O(n²). Execution of the DBSCAN algorithm takes about 70% of the total execution time.
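The quadratic behaviour can be seen in a brute-force version of DBSCAN, sketched below in Python. This is the textbook algorithm without a spatial index and without the weight-based adaptation of this study. It assumes planar coordinates, and the `min_pts` parameter (the minimum number of neighbours of a core point) is part of standard DBSCAN and is not one of the parameters listed in section 7.8. Every point triggers at most one region query, and every query scans all n points, which gives O(n²).

```python
def region_query(points, i, eps):
    """Indices of all points within eps of points[i]; scans all n points."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points)
            if (x - xi) ** 2 + (y - yi) ** 2 <= eps ** 2]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1           # points[i] is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels
```

A spatial index, such as the one offered by a geospatial database, speeds up `region_query` in practice, which is consistent with the shorter execution time reported above.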

The last significant step is the division into smaller clusters by Lloyd’s algorithm. The complexity of k-means is hard to define, as the number of iterations needed to reach ‘no change’ is hard to predict; this study therefore assumes the worst-case scenario. As this study has introduced a factor that limits the depth of k-means, the complexity is O(n ∗ maxDepth), where n is the size of the input dataset and the number of clusters k is treated as a constant. It is important to note that if the maxDepth factor is not introduced, or is set to infinite, the number of iterations is no longer bounded in this way and the worst-case running time of Lloyd’s algorithm can grow superpolynomially in n (finding an optimal k-means clustering is NP-hard). K-means execution takes about 10% of the total execution time.

The last step iterates only once over the newly found charging station locations and looks at the saved details of every point to visualise them. At this point the database is at its smallest size; the step runs in O(n) and takes less than 1% of the execution time. The weakest link is therefore either k-means or DBSCAN. However, as the amount of data processed by DBSCAN is much higher and as this study has introduced a maxDepth variable in the k-means implementation, it is safe to assume that the overall complexity is O(n²), where n is the total number of points in the dataset. Keep in mind, however, that the weights determination filters out about 80% of the original dataset, hence decreasing n.

To analyse the execution time in seconds one has to consider both hardware and input data. The input data consisted of the data generated in figure 14, containing 3000 trips and over 500,000 points (n). The hardware node used had a dual-core Xeon processor, 2 GB of RAM and 32 GB of SSD storage (CloudVPS Standard 2). The operating system was Ubuntu Server 15.04. The total execution time was 210 seconds; however, this can increase or decrease greatly when the format of the input data changes (e.g. when the size of n at step 2 increases).


7.10 Results

In this section a model was designed to answer the following research question:

How can we analyse gathered data, and process it into valuable conclusions that inform local authorities and other service providers with desired charging station locations?

The model takes as input the user-generated data, collected as described in prior sections. Based on this data the model is capable of determining new charging locations. The model requires input parameters that can be estimated by the program but may differ if someone were to analyse data outside of the Netherlands. The question can be answered both on a large scale, such as a country, and on a small scale, such as a city. A clear distinction has to be made between two kinds of charging stations: on-route charging stations, which offer the option to charge during a trip, and local charging stations, which offer the option to recharge when standing still (for example at a Point of Interest). The first requires a fast charger solution to be implemented at the indicated location; the latter has fewer requirements, also when analysing or gathering data.

The answer to the research question comes in the form of a four-step approach, which makes use of a weight determination system created for just this purpose. Two existing clustering algorithms are adjusted so that they are capable of identifying and processing clusters of demand to determine a final position for a charging station (steps 2 and 3). A post-processing method is then executed that focuses on visualisation for the users, which eventually results in a close approximation of where to place a new charging station (step 4). Some final user input will be required to determine the final position.

To test the model and verify the results, this study initially attempted to make use of real user-generated data, but came to the conclusion that this would not provide a sufficient amount of input data. A simulation was therefore created that mimics real-world traffic on the road network in the Netherlands. The data was verified with the input parameters of common electric vehicles such as the Nissan Leaf. The results yield realistic conclusions and tend to pinpoint realistic locations for new charging stations. A new charging station has also been manually added to the database, after which the model was executed again to confirm that the resulting changes are appropriate. The results on a smaller scale are also satisfying, although test data generation on such a small scale is far more difficult and less true to reality. It did, however, show that the proposed analysis solution allows the identification of both new on-route and new local charging station locations.


8 Conclusions

Various data sources on driver location and car use exist. During this study it has become apparent that two types of charging stations exist that require different approaches towards identifying new locations. The first type is the on-route charging station, which extends the range of a single trip by recharging during the trip, making use of fast charger capabilities. The second type is the local charging station, which provides the ability to charge during longer stops or at home and office locations.

For local charging stations, readily available data sources that only provide single, non-linked datapoints (GPS coordinates) on the whereabouts of drivers might be of use in analysing new charging station locations. To be able to analyse new locations for both local and on-route charging stations, information on user trips is needed: datapoints (GPS coordinates) that are connected to represent the route a car driver travels along. This forms an adequate source of data for identifying new on-route charging station locations, and a rich and enhanced data source for identifying local charging stations. Such data could adequately be collected from users by tracking them using an app, which furthermore provides users with insights into the use of electric vehicles. This way, infrastructure can be improved, and users can be informed and made aware of the possibilities electric driving has to offer them, allowing for an increase in the use of electric vehicles.

The technology needed to collect the trip data is readily available. Implementation of the solution would consist of three main components: an app to gather data, a centralised platform to store, process and analyse it, and a web portal to make available the insights and conclusions that can be drawn from the analysis. Prototyping has proven the suggested pipeline, from a user’s smartphone with GPS capabilities, to analysis on a centralised platform, initiated from a web portal for end-users of (the insights provided by) the data. Many further applications of the app, the collected data and the web portal have been and can be identified. This way, this study contributes to the transition to electric vehicles, and to the further use of renewable fuels instead of fossil fuels.

To convert the gathered data into conclusions leading to new charging station locations, a four-step approach has been defined. First, weights are calculated based on the gathered trip data (where is the need for charging stations located, and where is it most urgent). Secondly, clusters are identified based on these weights using an adaptation of the DBSCAN algorithm. Afterwards, by applying the methodology of k-means clustering, these clusters are further processed to form a basis for conclusions on new charging station locations. Finally, post-processing visualises the results, and allows third parties to easily spot new charging station locations.

The analysis results have been verified by comparing them against existing charging locations and sample data on driving behaviour (trips). This study has successfully been able to identify locations with low coverage, and hence to increase the effective range for the simulated trips. Also, comparisons with runs that do not take existing charging stations into account (which the analysis is normally programmed to do) show that existing charging stations effectively influence the result, hence optimising the newly identified charging station locations for increasing the coverage in large areas, instead of expanding the coverage near existing charging stations or points of interest.

The above serves to prove that new charging station locations can be identified based on user driving behaviour (trip data). Based on the verifications and comparisons against existing charging station locations, this study may hence conclude that user driving behaviour (trip data) forms a great data source for identifying new charging station locations, and that it can assist in the global transition to electric vehicles.


Future work

Though this study shows promising results, and is able to answer the research questions in a satisfactory way as shown in the above conclusions, there are of course some suggestions this study would like to make for future work in the EV charging station domain. This study might be used as a basis for this future work, or this work may be improved by implementing the suggestions below.

Android application

The app has several shortcomings. In future implementations these should be addressed. Developing the car advice part, which gives car owners an incentive to use the app, would be an important feature to implement. This can be done by calculating the distance driven for each trip and determining whether it is possible to make that trip with each of a number of given cars (taking existing charging poles into account). The app then advises the user on what type of car is needed to make those trips.
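A minimal sketch of such a car advice calculation is given below in Python. It is a simplification under stated assumptions: a trip is a list of (latitude, longitude) GPS points, the distance driven is approximated by summing great-circle (haversine) distances between consecutive points, and existing charging poles are ignored, so a car qualifies only if its range covers the longest trip without recharging. The function names, the car names and the ranges are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trip_length_m(trip):
    """Sum of the distances between consecutive GPS points of one trip."""
    return sum(haversine_m(*a, *b) for a, b in zip(trip, trip[1:]))


def cars_able_to_drive(trips, car_ranges_m):
    """Names of the cars whose range covers the longest recorded trip.

    Simplification: charging poles along the route are not considered.
    """
    longest = max((trip_length_m(t) for t in trips), default=0.0)
    return sorted(name for name, rng in car_ranges_m.items() if rng >= longest)


# Hypothetical example: one trip of one degree of longitude on the equator
# (roughly 111 km) and two made-up cars.
example_trips = [[(0.0, 0.0), (0.0, 1.0)]]
example_cars = {"short-range car": 100_000.0, "long-range car": 200_000.0}
advice = cars_able_to_drive(example_trips, example_cars)
```

Taking charging poles into account, as suggested above, would require checking for every trip whether a pole lies within reach before the battery range is exhausted.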

Tracking other values with the app, such as speed and acceleration, is also possible. These have an impact on battery usage for electric cars and could be a factor in choosing which car to use. This is only needed for the car advice incentive. This information can also be used to improve the detection of movement in a car and to disable the polling when on a bike or on foot.

Another method for distinguishing car movement from other types is checking for the availability of a Bluetooth car kit. Most cars already have this feature; checking whether the phone is connected to a known car kit is a proper way to confirm that movement is made by car.

Use a more battery-friendly method for polling the locations instead of the device-only options. Google has provided a new location API with its 5.0 platform. Unfortunately the adoption of this version is still very low, and it is therefore not used in the prototype. In newer versions of the app, this should be considered.

General considerations

In this thesis, the analysis uses only a small part of the potential of the data. Driving behaviour could also be considered: instead of only measuring the trips, speed and acceleration could be taken into account as well. These can have a major impact on the range of an electric vehicle. This might increase the validity of the advice given by the app.

Take into account why people drive to certain destinations (work, leisure, shopping, etc.). This information could give insight into what type of chargers are needed. Shopping centres would need a faster charger than a home or work location, as car owners tend to spend more time in the latter locations.

Data analysis

Implement additional parameters to determine the weight of demand, such as nearby points of interest.

Upload different available solutions such as bio-fuel stations to determine where these should be placed next.

Test the model on a large set of real user generated data.

Integrate battery range parameter in application or car API so that the information differs for every vehicle.


References

[1] Pouria Amirian, Anahid Basiri, Adam Winstanley. Efficient Online Sharing of Geospatial Big Data Using NoSQL XML Databases. DOI: 10.1109/COMGEO.2013.34

[2] Cattell, Rick. Scalable SQL and NoSQL Data Stores. DOI: 10.1145/1978915.1978919

[3] Anthony Fox, Chris Eichelberger, James Hughes, Skylar Lyon (Commonwealth Computer Research, Inc). Spatio-temporal Indexing in Non-relational Distributed Databases. DOI: 10.1109/BigData.2013.6691586

[4] Zhonghai Zhou (Ocean University of China, College of Marine), Wenwen Li, Brian Griglak, Carmen Caiseda, Qunying Huang (College of Science, George Mason University). Evaluating Query Performance on Object-Relational Spatial Databases. DOI: 10.1109/ICCSIT.2009.5234509

[5] Wikipedia. K-means clustering. Image source & algorithm setup. http://en.wikipedia.org/wiki/K-means_clustering

[6] Dutch Government. Wet bescherming persoonsgegevens. http://wetten.overheid.nl/BWBR0011468/geldigheidsdatum_22-04-2015

[7] Tang, Karen P. and Lin, Jialiu and Hong, Jason I. and Siewiorek, Daniel P. and Sadeh, Norman. Rethinking Location Sharing: Exploring the Implications of Social-driven vs. Purpose-driven Location Sharing. Proceedings of the 12th ACM International Conference on Ubiquitous Computing. DOI: 10.1145/1864349.1864363

[8] MongoDB official website. MongoDB features and spatial capabilities. https://www.mongodb.org/

[9] Khiyaita, A. and Zbakh, M. and El Bakkali, H. and El Kettani, D. Load balancing cloud computing: State of art. 2012 National Days of Network Security and Systems (JNS2). DOI: 10.1109/JNS2.2012.6249253

[10] Minch, R.P. Location Privacy in the Era of the Internet of Things and Big Data Analytics. 2015 48th Hawaii International Conference on System Sciences (HICSS), Jan 2015. DOI: 10.1109/HICSS.2015.185

[11] Linna Li and Goodchild, M.F. Is privacy still an issue in the era of big data? - Location disclosure in spatial footprints. 2013 21st International Conference on Geoinformatics (GEOINFORMATICS), June 2013. DOI: 10.1109/Geoinformatics.2013.6626191

[12] Jiazhu Dai and Liang Hua. PLocShare: A privacy-preserving location sharing scheme in mobile social network. 2014 IEEE Workshop on Electronics, Computer and Applications, May 2014. DOI: 10.1109/IWECA.2014.6845580


[13] IDC. Smartphone OS data. http://www.idc.com/prodserv/smartphone-os-market-share.jsp

[14] MySQL. MySQL official website. https://www.mysql.com/

[15] Xiaowei Xu, Martin Ester, Hans-Peter Kriegel, Jörg Sander. A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases. Data Engineering, 1998. Proceedings., 14th International Conference on. DOI: 10.1109/ICDE.1998.655795

[16] Xiaomin Xi, Ramteen Sioshansi and Vincenzo Marano. Simulation-optimization model for location of a public electric vehicle charging infrastructure. Transportation Research Part D: Transport and Environment, vol. 22, 2013. DOI: 10.1016/j.trd.2013.02.014

[17] Sung Hoon Chung, Changhyun Kwon. Multi-period planning for electric car charging station locations: A case of Korean Expressways. European Journal of Operational Research, Volume 242, Issue 2, 16 April 2015, Pages 677-68. DOI: 10.1016/j.ejor.2014.10.029

[18] Shih-Ching Lo. Classification of Driving Behavior by Pattern Recognition in Multiclass Users Traffic Flow. AIP Conference Proceedings, 2007. DOI: 10.1063/1.2836259

[19] Jeremy Neubauer, Eric Wood. The impact of range anxiety and home, workplace, and public charging infrastructure on simulated battery electric vehicle lifetime utility. Journal of Power Sources, 2014, vol. 257, pages 12-20. DOI: 10.1016/j.jpowsour.2014.01.075

[20] Bart Lubbers. The Fastned Story deel 1. 2014, http://fastned.nl/nl/onze-missie, ISBN 978-94-6203-755-7

[21] Rijksdienst voor Ondernemend Nederland. Cijfers elektrisch vervoer. 2015, http://www.rvo.nl/onderwerpen/duurzaam-ondernemen/energie-en-milieu-innovaties/elektrisch-rijden/stand-van-zaken/cijfers

[22] Stephen P. Borgatti, University of South Carolina. How to Explain Hierarchical Clustering. http://www.analytictech.com/networks/hiclus.htm

[23] Rokach, Lior, and Oded Maimon. Clustering methods. Data mining and knowledge discovery handbook. Springer US, 2005. 321-352.

[24] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD-96 Proceedings


[25] Thanh N. Tran, Klaudia Drab, Michal Daszykowski. Revised DBSCAN algorithm to cluster data with dense adjacent clusters. Chemometrics and Intelligent Laboratory Systems 120 (2013) 92-96

[26] Open ChargeMap. http://openchargemap.org/site/

[27] Fielding, Roy T. and Taylor, Richard N. Principled Design of the Modern Web Architecture. Proceedings of the 22nd International Conference on Software Engineering. DOI: 10.1145/337180.337228

[28] Howard Butler et al. GeoJSON Specification. http://geojson.org/

[29] Ge Bai and Hansi Mou and Yinhong Hou and Yongqiang Lyu and Weikang Yang. Android Power Management and Analyses of Power Consumption in an Android Smartphone. Conference on Embedded and Ubiquitous Computing, 2013. DOI: 10.1109/HPCC.and.EUC.2013.338

[30] Bhatia, Shaveta and Hilal, Saba. A new approach for Location based Tracking. International Journal of Computer Science Issues, vol. 10, nr. 3, 2013

[31] Kumar, S. and Qadeer, M.A. and Gupta, A. Location based services using android (LBSOID). Internet Multimedia Services Architecture and Applications (IMSAA), 2009 IEEE International Conference on. DOI: 10.1109/IMSAA.2009.5439442

[32] Twitter inc. Twitter REST API. https://dev.twitter.com/rest/public


A Appendix

A.1 API specification

Figure 25: API definition of sending trips to the platform


Figure 26: API definition of misc. functions

A.2 List of places used for simulation purposes

The places (cities and villages) below have been used in the simulation of data for the analysis, as described in the section on analysis of this thesis.

• Amsterdam

• Den Haag

• Groningen

• Amerongen

• Venlo

• Lutjebroek

• Ouddorp

• Maastricht

• Middelburg


• Zwolle

• Haarlem

• Lelystad

• Den Helder

• Volendam

• Leiden

• Den Bosch

• Makkum

• Arnhem

• Apeldoorn

• Enschede

• Emmen

• Stadskanaal

• Delfzijl

• Reeuwijk

• Heerlen

• Hilversum

• Urk

• Dronten

• Winschoten

• Zutphen

• Barneveld

• Nijmegen

• Doetinchem

• Vries

• Leeuwarden

• Breda

• Veghel

• Steenwijk

• Assen

• Utrecht


A.3 Contributions

Though the ChargeQuest research project has been a team effort, it is required by university regulations to supply a formal identification of contribution to the thesis and research, by various members of the team. This is supplied below. Please be aware that, in the end, all three team members contributed to all parts of the research, whereas below only the main contributors are mentioned.

Jorden van Breemen

• 7. Data analysis

• Implementation Java analysis software

• Contributions to 4. Requirements

• Contributions to 5.2 & 5.3

Klaas Kliffen

• 5. Technical Research

• 6. Prototyping

• Contributions to 4. Requirements

• Development of the prototype app

• Partial implementation of DBSCAN (Data analysis)

Alex-Jan Sigtermans

• 1. Project description

• 2. Problem statement

• 3. Methods to gather data

• 4. Requirements

• 8. Conclusions

• Contributions to 6. Prototyping

• Contributions to 7.2 Test data generation

• Development of the web portal

• Development of trip data simulator
