
Geospatial Analysis of Diseases for Ratnakala Health ERP™

Submitted in partial fulfilment of the requirements for the degree of

BACHELOR OF TECHNOLOGY

in

Computer Engineering

Submitted by

Shivani P. Desai

(201209100310055)

C.G.P.I.T., UKA Tarsadia University

Internal Guide

Asst. Prof. Rachna M. Patel

C.G.P.I.T., UKA Tarsadia University

External Guide

Mr. Ankit Moradiya

(C.E.O. Ratnakala Software Pvt. Ltd.)

Department of Computer Engineering & Information Technology

Chhotubhai Gopalbhai Patel Institute of Technology

UKA Tarsadia University, Bardoli

May, 2016

Contents

Confidential iii

Abstract iv

Acknowledgment v

Contents vi

List of Tables viii

List of Figures ix

Abbreviations x

1 Introduction 2

1.1 Health ERP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2.1 Data mining . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Geospatial analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3.1 What is spatial pattern? . . . . . . . . . . . . . . . . . . . . 4

1.3.2 Families of Spatial Data Mining Patterns . . . . . . . . . . . 4

1.3.3 GIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3.3.1 GIS datasets . . . . . . . . . . . . . . . . . . . . . 5

1.3.4 Why geospatial analysis of diseases is needed? . . . . . . . . 5

1.3.5 Disease mapping . . . . . . . . . . . . . . . . . . . . . . . . 5

1.3.6 Geospatial analysis process . . . . . . . . . . . . . . . . . . . 5

1.4 Software requirement . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Problem 8

2.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2.1 SAP Lumira . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2.2 ESRI MAPS . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3 Literature review 11

3.1 Spatial Epidemiology . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.1.1 Framework for spatial analysis . . . . . . . . . . . . . . . . . 11

3.2 GIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.2.1 Layered Technology . . . . . . . . . . . . . . . . . . . . . . . 14

3.3 Disease clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4 Data mining process model 17

4.1 CRISP-DM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.1.1 Business Understanding . . . . . . . . . . . . . . . . . . . . 17

©2016 Ratnakala Software Pvt. Ltd. All Rights Reserved


4.1.2 Data Understanding . . . . . . . . . . . . . . . . . . . . . . 18

4.1.3 Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . 18

4.1.4 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4.1.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4.1.6 Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

5 Algorithm 20

5.1 Epidemiology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

5.1.1 Frequency based measures used in epidemiology . . . . . . . 20

5.2 Spatial data mining . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

5.2.1 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

5.3 Predictive Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

5.3.1 Linear regression . . . . . . . . . . . . . . . . . . . . . . . . 23

5.3.2 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

6 Database Design 27

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

6.2 Database Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 28

6.2.1 res diseases . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

6.2.2 res area wise diseases . . . . . . . . . . . . . . . . . . . . . . 28

7 Implementation 30

7.1 Calculating risk ratio . . . . . . . . . . . . . . . . . . . . . . . . . . 30

7.1.1 By entering two months and the disease for which comparison will be done. According to that risk ratio will be generated. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

7.1.2 When there is new entry in the database, risk ratio will be added to the database according to the algorithm. . . . . . . 31

7.2 Google map APIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

7.3 Visualizations in SAP Lumira . . . . . . . . . . . . . . . . . . . . . 33

7.3.1 Analysis of diseases in Surat . . . . . . . . . . . . . . . . . . 33

7.3.1.1 Cancer analysis . . . . . . . . . . . . . . . . . . . . 42

7.3.1.2 Swine flu analysis . . . . . . . . . . . . . . . . . . . 45

7.3.2 Analysis of diseases in Gujarat . . . . . . . . . . . . . . . . . 47

7.3.3 Analysis of diseases in 8 cities of India . . . . . . . . . . . . 59

8 Conclusion 75

9 Future prediction 77

Bibliography 81


List of Figures

1.1 Geospatial analysis process . . . . . . . . . . . . . . . . . . . . . . . 6

3.1 Conceptual framework of spatial epidemiological data analysis . . . 12

3.2 GIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.1 CRISP-DM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

7.1 Enter details to find risk ratio . . . . . . . . . . . . . . . . . . . . . 30

7.2 Risk ratio as a result . . . . . . . . . . . . . . . . . . . . . . . . . . 31

7.3 disease count for different diseases in Surat . . . . . . . . . . . . . . 33

7.4 Acne spreads in Dabholi day by day . . . . . . . . . . . . . . . . . . 34

7.5 Acne spreads in quarter 1 . . . . . . . . . . . . . . . . . . . . . . . 35




7.9 List of disease count of acne in Adajan on different day . . . . . . . 37

7.10 comparison of acne, actinic keratoses and acute myocardial infarction in Surat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

7.11 comparison of acne and myocardial infarction spreading month by month in Dabholi . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

7.12 Total disease count of diseases in different zones of Surat . . . . . . 40

7.13 risk of diseases in Surat . . . . . . . . . . . . . . . . . . . . . . . . . 41

7.14 disease count of different cancers in Surat . . . . . . . . . . . . . . 42

7.15 Cancers spread day by day in Surat . . . . . . . . . . . . . . . . . . 43

7.16 Disease count of different cancers in different landmark in Surat . . 44

7.17 Swine flu spreads day by day in Surat . . . . . . . . . . . . . . . . . 45

7.18 Swine flu affects different landmarks in Surat . . . . . . . . . . . . . 46

7.19 Total disease count in different localities of Gujarat . . . . . . . . . 47

7.20 Top 5 diseases which affect most in Gujarat . . . . . . . . . . . . 48

7.21 Bottom 3 diseases which affect less in Gujarat . . . . . . . . . . . 49

7.22 Anxiety affects different localities of Gujarat . . . . . . . . . . . . . 50

7.23 Comparison of mental diseases’ effects in different in Gujarat . . . . 51

7.24 Comparison of different cancers in Gujarat . . . . . . . . . . . . . . 52

7.25 Blood pressure patients in Gujarat in quarter 1 . . . . . . . . . . 53




7.29 Most affected landmark of Gujarat . . . . . . . . . . . . . . . . . . . 56

7.30 Most affected and least affected locality of Gujarat . . . . . . . . . 57

7.31 Comparison of acne, acid indigestion upset stomach, actinic keratoses, acute myocardial infarction in Gujarat . . . . . . . . . . . 58

7.32 Total disease count for different localities of India . . . . . . . . . . 59

7.33 Comparison of spreading of leptospirosis between Surat and Bangalore 60



7.34 Top 10 disease count in India . . . . . . . . . . . . . . . . . . . . . 61

7.35 Top 10 disease count in India . . . . . . . . . . . . . . . . . . . . . 62

7.36 Swine flu spreads in India in February . . . . . . . . . . . . . . . . 63

7.37 Swine flu spreads in India in March . . . . . . . . . . . . . . . . . . 64

7.38 Swine flu spreads in India in April . . . . . . . . . . . . . . . . . . . 64

7.39 Swine flu spreads in India in May . . . . . . . . . . . . . . . . . . . 65

7.40 Swine flu spreads in India in July . . . . . . . . . . . . . . . . . . . 66

7.41 Swine flu spreads in India in August . . . . . . . . . . . . . . . . . 66

7.42 Swine flu spreads in India in October . . . . . . . . . . . . . . . . . 67

7.43 Swine flu spreads in India in December . . . . . . . . . . . . . . . . 68

7.44 Most and least affected state by diseases in India . . . . . . . . . . 69

7.45 Comparison of anemia, asthma, bacterial infections, eye allergies and keratitis in India . . . . . . . . . . . . . . . . . . . . . . . . . . 70

7.46 Comparison of anemia, asthma, bacterial infections, eye allergies and keratitis in India . . . . . . . . . . . . . . . . . . . . . . . . . . 71

7.47 Comparison of allergies, alopecia, altitude illness, Alzheimer's and amblyopia in India . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

7.48 Disease count for different diseases in different localities in India . . 73

7.49 Bottom 3 disease count in India . . . . . . . . . . . . . . . . . . . . 74

9.1 Prediction of diseases in India in 2016 by linear regression . . . . . 77

9.2 Prediction of diseases in India in 2016 by forecasting . . . . . . . . 78

9.3 Prediction of acid indigestion upset stomach in India in 2016 by linear regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

9.4 Prediction of acid indigestion upset stomach in India in 2016 by forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80


Abbreviations

ERP Enterprise Resource Planning

GPS Global Positioning System

GIS Geographic Information System

CRISP-DM CRoss Industry Standard Process for Data Mining

ESRI Environmental Systems Research Institute

DBMS Database Management System

API Application Programming Interface


Company Profile

Ratnakala Software Pvt. Ltd.

205/206, Luxuria Business Hub,

Before VR Mall,

Piplod Road,

Surat – 395007,

Gujarat, India.

The company began as Monali Solutions and transformed into Ratnakala Software

Pvt. Ltd. on the 27th August, 2014. The objectives of the company are to offer

consultancy, advisory and all related services in the field of Information Technology

including computer hardware and software, software development, data commu-

nication, telecommunication, manufacturing and process control and automation,

hardware selection, system design, manpower selection, implementation, training

and to spread computer literacy and computer aided education in rural and urban

areas through application of modern techniques.

The company specializes in developing mobile applications for the two main mobile platforms, Android and iOS, for all device models, whether tablets or phones.

The company develops mobile applications for end users as well as client-specific applications. The company also develops ERPs for institutes to optimize the usage of their resources for growth in the real-world market.

The company is a subsidiary of Ratnakala Exports Company, one of the world’s

largest exporters of polished diamonds.


Chapter 1

Introduction

1.1 Health ERP

Health ERP is a valued decision-making solution in treatment. The software helps improve operations by enhancing productivity, profitability, growth and overall business processes. In the coming years, ERP will be in demand in the health care industry.

The key benefits that ERP provides to the health care sector are business intelligence, better patient care and reduced operational costs. Health ERP is software that eases the business work and treatment process in the health industry through graphical analysis and decision support. Health involves activities that serve society by keeping people healthy, protecting the environment, making sure that the water and food supply are safe, and providing sufficient health services. An ERP system eliminates duplication and manual processes and proactively increases patient safety through the use of efficient and effective information systems.


1.2 Data analysis

Analysis of data is a process of inspecting, cleaning, transforming, and modeling

data with the goal of discovering useful information, suggesting conclusions, and

supporting decision-making. Data Analysis refers to breaking a whole into its

separate components for individual examination. Data analysis is a process for

obtaining raw data and converting it into information useful for decision-making

by users. Data is collected and analyzed to answer questions, test hypotheses or

disprove theories.[1]

1.2.1 Data mining

Data mining is a particular data analysis technique that focuses on modeling and

knowledge discovery for predictive rather than purely descriptive purposes. The

overall goal of the data mining process is to extract information from a data set

and transform it into an understandable structure for further use.[1]

1.3 Geospatial analysis

Geospatial analysis, or just spatial analysis, is an approach to applying statistical analysis and other analytic techniques to data which has a geographical or spatial aspect. Such analysis would typically employ software capable of rendering maps, processing spatial data, and applying analytical methods to terrestrial or geographic datasets, including the use of geographic information systems and geomatics. It is the gathering, display and manipulation of imagery, GPS, satellite photography and historical data, described explicitly in terms of geographical coordinates or implicitly in terms of a street address, postal code or forest stand identifier, as they are applied to geographic models.

Geospatial data is generally not just 2D or 3D data; it is high-dimensional data.[1]



1.3.1 What is spatial pattern?

• A frequent arrangement, configuration, composition, regularity.

• A rule, law, method, design, description.

• A major direction, trend, prediction.

1.3.2 Families of Spatial Data Mining Patterns

• Location prediction:

– Where will a phenomenon occur?

– Where will disease occur?

– Which diseases are predictable at a particular spatial location?

– What should be recommended to health care organizations to control disease in an affected area?

• Interaction:

– Which subsets of spatial phenomena interact?

– Which spatial events are correlated with other spatial events?

• Hotspot:

– Which locations are unusual or share commonalities?

– Where are cases spatially clustered?

– Which diseases are common in a particular spatial area?
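As a minimal sketch of the hotspot question, the following Python counts cases per area and flags areas whose count is unusually high. The flagging rule (mean plus a multiple of the standard deviation), the function name `find_hotspots`, and the case counts are assumptions made for illustration; they are not taken from the project's data or method.

```python
from collections import Counter
from statistics import mean, pstdev


def find_hotspots(case_areas, z_threshold=1.5):
    """Return the areas whose case count is unusually high.

    case_areas: iterable of area names, one entry per reported case.
    An area is flagged when its count exceeds
    mean + z_threshold * standard deviation of the per-area counts.
    This rule is an illustrative assumption, not the thesis method.
    """
    counts = Counter(case_areas)
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every area has the same count, so nothing stands out
    cutoff = mu + z_threshold * sigma
    return sorted(area for area, n in counts.items() if n > cutoff)


# Hypothetical case list: one area name per reported case.
cases = (["Adajan"] * 3 + ["Dabholi"] * 12 + ["Piplod"] * 2
         + ["Varachha"] * 4 + ["Athwa"] * 3)
print(find_hotspots(cases))  # prints ['Dabholi']
```

A rule this simple ignores population size and neighbourhood effects, so it only serves to show what "unusual location" can mean in terms of counts.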

1.3.3 GIS

Geographical Information System is a large domain that provides a variety of ca-

pabilities designed to capture, store, manipulate, analyze, manage, and present all

types of geographical data, and utilizes geospatial analysis in a variety of contexts,

operations and applications.



1.3.3.1 GIS datasets

GIS data comes as layers; in this case, there are layers for diseases and landmarks. Layers have features. A GIS layer has two views.

1. map view : It acts as a visual representation of the data and of a particular attribute of the dataset.

2. data view : It is used to create a smaller dataset from a large dataset using the query tool.[2]

1.3.4 Why geospatial analysis of diseases is needed?

Health care organizations will be able to analyze spatial and time data to predict

movements of disease outbreaks over time and adequately prepare for potential

epidemics before they occur.

1.3.5 Disease mapping

Disease mapping is often carried out to investigate the geographical distribution of

disease burden. Area-specific estimates of risk may inform public health resource

allocation by estimating the disease burden in specific areas, and the informal

comparison of risk maps with exposure maps may provide clues to generate hypotheses. It provides information on a measure of disease occurrence across a geographic space. Disease maps provide a rapid visual summary of complex geographic information.[4]

1.3.6 Geospatial analysis process

1. Requirement: Gathering data as per the customer's requirements.



Figure 1.1: (Geospatial analysis process)

2. Obtain Primary Data: Obtain data and evaluate spatial and temporal

suitability and availability.

3. Get Related Data: Obtain reference and supporting materials including

previous information, mapping data, imagery, technical data, operational

data. Evaluate spatial and temporal suitability and availability of each.

Prepare this data for use as necessary.

4. Arrange Materials in Work Environment: Import and display mate-

rials in geographic information system. Query and view as necessary. Use

background maps, images, and other data such as elevation to enhance mean-

ing.

5. Conduct Overall Familiarization/Orientation: Perform initial layout

including move to map, move text to map, determine and set the proper

scale, perform ortho-rectification and rubber-sheeting of inputs as necessary,

select future data, eliminate data that causes unnecessary visual clutter.

6. Conduct exploitation and analysis: Perform analysis and extraction, including identifying features and objects, making updates, identifying and examining changes, extracting features, examining alternative approaches and evaluating results.

7. Manipulate Data: Manipulate geospatial information, including annotating with positions and routes, adding textual information, editing data as necessary to remove anomalies, applying tools and applications such as line of sight, draping imagery, identifying decision points and conflicts, adjusting data and ortho-rectifying.

8. Wrap-up/Report: Complete the product and report, including generating products, sending them to storage, adding grids, generating additional data requests and generating value-added products.

1.4 Software requirement

1. SAP Lumira

2. ArcGIS online


Chapter 2

Problem

2.1 Problem Statement

Nowadays, many diseases occur in different areas, so geospatial analysis of diseases across areas is needed. With that analysis, health care organizations will be able to analyze spatial and time data to predict movements of disease outbreaks over time and adequately prepare for potential epidemics before they occur. Health care institutes keep adding to their repository of patients' disease-related information, which could be made more useful by carrying out relational analysis.

2.2 Approach

2.2.1 SAP Lumira

The approach includes a geospatial analysis using GIS software and SAP Lumira, a self-service analysis tool that offers several geo-maps which combine a database with a map. The following geo-maps are used for locating different diseases in different areas.


• Geo Bubble Chart

• Geo Choropleth Chart

• Geo Pie Chart

• Geo map

By applying different dimensions and measures we can analyze the database.

There are two approaches:

1. GIS approach : The main UI is a map, and data from SAP Lumira is accessed from the map.

2. SAP Lumira approach : The map is embedded within the SAP Lumira UI.

On its own, SAP Lumira has limited capability for geospatial analysis because it offers only a limited set of geo-maps.

2.2.2 ESRI MAPS

For some time, SAP Lumira has allowed plotting data on basic map outlines using longitude and latitude coordinates, but not at street level as we are used to in, say, Google Maps. However, this has changed in the most recent version, SAP Lumira 1.17, as SAP has announced a partnership with ESRI that enables integration of the ArcGIS Online service within SAP Lumira.

Esri's mapping software helps you understand and visualize data to make decisions based on the insights from geo-charts.

With Esri Maps integrated in SAP Lumira you can enable your geo-business data with intuitive mapping and analytical tools. You will quickly discover new patterns in the geo charts within Lumira and effortlessly share your insights across the organization for greater collaboration. One of the nice features of Esri that is not present in the native geo implementation is the concept of layers. You can have a choropleth map and then do a bubble plot on top of that. Another feature (also not present in the native geo offering) is the ability to show different map views: topographic, street, satellite, gray.[5]


Chapter 3

Literature review

3.1 Spatial Epidemiology

Spatial Epidemiology is the description and analysis of the geographic, or spatial, variations in disease with respect to demographic, environmental, behavioral, socioeconomic, genetic, and risk factors. The spread of infectious diseases is closely associated with the concepts of spatial and spatiotemporal proximity, as individuals who are linked in a spatial and a temporal sense are at a higher risk of getting infected. Proximity to environmental risk factors is therefore important. Thus, knowledge of the spatial and temporal variations of diseases, and characterization of their spatial structure, is essential for the epidemiologist to better understand the population's interactions with its environment.

3.1.1 Framework for spatial analysis

Spatial epidemiology comprises a wide range of methods, and determining which ones to use can be challenging. The methods fall into four groups, as illustrated in Fig. 3.1, that can be used to define a logical, sequential process for conducting spatial analysis:

1. Data


Figure 3.1: Conceptual framework of spatial epidemiological data analysis

The objectives of spatial epidemiological analysis are the description of spatial patterns, identification of disease clusters, and explanation or prediction of disease risk. Central to these objectives is the need for data. Geographic data systems include georeferenced feature data and attributes, be they points or areas. These data are obtained from field surveys, from remotely sensed imagery, or from existing data generated either by government organizations or by bodies closely linked to government, such as cadastral, postal, meteorological or national census statistics and health organizations.



2. GIS and DBMS

Management of the data is performed using GIS and database management

systems (DBMS), and is of relevance throughout the various phases of spatial

data analysis. GIS provide a platform for managing these data, computing

spatial relationships such as proximity to source of infection, connectivity

and directional relationships between spatial units, and visualizing both the

raw data and results from spatial analysis within a cartographic context.

3. Visualization and exploration

Visualization and exploration cover techniques that focus solely on examin-

ing the spatial dimension of the data. Visualization tools are used resulting

in maps that describe spatial patterns and which are useful for both stim-

ulating more complex analyses and for communicating the results of such

analyses. Exploration of spatial data involves the use of statistical methods

to determine whether observed patterns are random in space. However there

is some overlap between visualization and exploration, since meaningful vi-

sual presentation will require the use of quantitative analytical methods.

4. Modeling

Modeling covers analytical procedures that simulate real-world conditions within a GIS using the spatial relationships of geographic features. Modeling introduces the concept of cause-effect relationships, using both spatial and non-spatial data sources to explain or predict spatial patterns.[7]
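The exploration step asks whether an observed pattern is random in space. One commonly used statistic for this is Moran's I, sketched below in plain Python. The text does not name a specific statistic, so the choice of Moran's I, the function name `morans_i`, the chain-shaped neighbour matrix and the counts are all assumptions made for illustration.

```python
def morans_i(values, weights):
    """Compute Moran's I, a measure of spatial autocorrelation.

    values:  one observation per area (for example, a disease count).
    weights: square matrix where weights[i][j] > 0 if areas i and j
             are neighbours, and 0 otherwise.
    Values well above 0 suggest that similar counts cluster together,
    values near 0 suggest no spatial pattern, and negative values
    suggest that neighbouring areas tend to differ.
    """
    n = len(values)
    mean_value = sum(values) / n
    deviations = [v - mean_value for v in values]
    total_weight = sum(sum(row) for row in weights)
    variance_sum = sum(d * d for d in deviations)
    if total_weight == 0 or variance_sum == 0:
        raise ValueError("Moran's I is undefined for these inputs")
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / total_weight) * (cross_products / variance_sum)


# Four hypothetical areas arranged in a line: 0 - 1 - 2 - 3.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10, 9, 2, 1]     # high counts sit next to high counts
alternating = [10, 1, 10, 1]  # neighbours always differ
print(morans_i(clustered, chain) > 0)
print(morans_i(alternating, chain) < 0)
```

In practice the statistic is compared against its distribution under random relabelling of the areas before a pattern is declared non-random; that step is left out here to keep the sketch short.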

3.2 GIS

A GIS is an information system (hardware, software, data) for geographical datasets that enables us to apply many analysis models to generate derived information that can be visualized as maps. A Geographic Information System helps us understand our world, answer questions about our environment, and supports us during decision-making.



Figure 3.2: (GIS)

3.2.1 Layered Technology

GIS provide powerful tools for addressing geographical and environmental issues. Consider the schematic diagram below. Imagine that the GIS allows us to arrange information about a given region or city as a set of maps, with each map displaying information about one characteristic of the region. In the case below, a set of maps that will be helpful for urban transportation planning has been gathered.

Each of these separate maps is referred to as a layer, coverage, or level, and each layer has been carefully overlaid on the others so that every location is precisely matched to its corresponding locations on all the other maps. The bottom layer of this diagram is the most important, for it represents the grid of a locational reference system (such as latitude and longitude) to which all the maps have been precisely registered.

Once these maps have been registered carefully within a common locational reference system, information displayed on the different layers can be compared and analyzed in combination. Transit routes can be compared to the location of shopping malls, population density to centers of employment. In addition, single locations or areas can be separated from surrounding locations, as in the diagram below, by simply cutting all the layers of the desired location from the larger map. Whether for one location or the entire region, GIS offers a means of searching for spatial patterns and processes.
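The layer idea can be sketched with plain Python dictionaries, assuming every layer is keyed by the same grid cell so that locations line up across layers. The layer names, grid cells, values and the helper names `overlay` and `clip` are hypothetical and chosen only to illustrate comparing layers and cutting one area out of all of them.

```python
def overlay(layers, location):
    """Collect the value that each layer holds at one grid location."""
    return {name: cells.get(location) for name, cells in layers.items()}


def clip(layers, locations):
    """Cut the same set of locations out of every layer."""
    wanted = set(locations)
    return {
        name: {loc: value for loc, value in cells.items() if loc in wanted}
        for name, cells in layers.items()
    }


# Hypothetical layers registered to a shared (row, column) grid.
layers = {
    "population_density": {(0, 0): 1200, (0, 1): 800, (1, 0): 450},
    "disease_count": {(0, 0): 14, (0, 1): 3, (1, 0): 7},
}

# Compare the layers at one location.
print(overlay(layers, (0, 0)))

# Separate a single area from its surroundings across all layers.
print(clip(layers, [(0, 1), (1, 0)]))
```

Because all layers share one reference grid, combining them is a lookup by key; a real GIS does the same registration with geographic coordinates instead of grid indices.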

There are 3 perspectives:

1. The Data: A single data repository, the geodatabase, holds every geographic data set together with its business logic and behavior.

2. The Map: A set of geometric features that represents a geographic reality. It is a window for exploring the data.

3. The Model: Analysis tools that create new geographic information from existing data.[3][9]

3.3 Disease clustering

Clustering is the process of distinguishing and categorizing objects according to their characteristics and certain criteria. Cluster analysis is a branch of pattern recognition and is a form of unsupervised classification. It is widely used in pattern recognition and image segmentation because the method is simple and efficient and requires no training process. According to the calculation method, clustering algorithms are divided into the following categories: partition-based, density-based, grid-based and model-based algorithms.

The general processing flow of cluster analysis is as follows. First, we preprocess the data set that we want to analyze: we remove redundant and noisy information, reasonably fill in missing features, and perform feature extraction in order to obtain the principal components of the data and so reduce the dimension of the computation. Then we select an appropriate model and design the clustering algorithm based on the specific requirements and application scenarios.
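As one possible instance of the "select a model and design the algorithm" step, here is a minimal k-means sketch, a widely used partition-style clustering method, in plain Python. The choice of k-means, the function name `k_means`, the deterministic initialisation from the first k points, and the coordinates are assumptions for illustration, not the algorithm or data used in this project.

```python
import math


def k_means(points, k, iterations=20):
    """Group points into k clusters with a basic k-means loop.

    points: list of (latitude, longitude) tuples.
    Straight-line distance on raw coordinates is used, which is only a
    rough approximation of ground distance over a small region.
    Returns (centroids, assignment), where assignment[i] is the cluster
    index of points[i].
    """
    # Assumption: start from the first k points so results are repeatable.
    centroids = [points[i] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignment


# Hypothetical case locations forming two small groups.
case_points = [(21.17, 72.83), (21.23, 72.86), (21.18, 72.84), (21.24, 72.87)]
centroids, assignment = k_means(case_points, k=2)
print(assignment)  # prints [0, 1, 0, 1]
```

The number of clusters has to be chosen in advance, which is one reason the evaluation step that follows may send the analyst back to adjust the model.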

The clustering results on the test data are then analyzed against the corresponding requirements; good clustering results can offer guidance. If the clustering results do not satisfy the requirements, we need to recollect the data along multiple dimensions and correct the model and algorithm again to achieve reasonable and accurate analysis results.

Finally, we can mine new knowledge that has guiding significance for real-world applications. Cluster analysis in the medical field is still in an early stage of development. However, with the informatization and digitization of medical diagnosis and management systems, the medical industry has accumulated massive amounts of exploitable medical information.

Using data mining methods to mine rules and implicit knowledge models from massive databases is very important in the decision-making process of medical diagnosis. Applications of cluster analysis in the medical field include exploring and verifying clinical efficacy, identifying disease types, and medical image segmentation. Efficacy exploration and validation clusters the clinical data of patients over the course of treatment and compares the cluster results at different times to determine the effectiveness and feasibility of a treatment; it can also reveal individual differences in treatment response. Identifying disease types helps us understand the pathogenesis of a disease and can provide a scientific basis for early prevention and post-treatment care [17].


Chapter 4

Data mining process model

4.1 CRISP-DM

CRoss Industry Standard Process for Data Mining, commonly known by its acronym

CRISP-DM, is a data mining process model that describes commonly used ap-

proaches that data mining experts use to tackle problems. CRISP-DM model for

data mining is divided into six phases. The sequence of the phases is not strict and

moving back and forth between different phases is always required. The arrows

in the process diagram indicate the most important and frequent dependencies

between phases.

4.1.1 Business Understanding

This initial phase focuses on understanding the project objectives and require-

ments from a business perspective, and then converting this knowledge into a

data mining problem definition, and a preliminary plan designed to achieve the

objectives. A decision model, especially one built using the Decision Model and

Notation standard can be used.


Figure 4.1: (CRISP-DM)

4.1.2 Data Understanding

The data understanding phase starts with an initial data collection and proceeds

with activities in order to get familiar with the data, to identify data quality

problems, to discover first insights into the data, or to detect interesting subsets

to form hypotheses for hidden information.

4.1.3 Data Preparation

The data preparation phase covers all activities to construct the final data set (data

that will be fed into the modeling tools) from the initial raw data. Data preparation

tasks are likely to be performed multiple times, and not in any prescribed order.

Tasks include table, record, and attribute selection as well as transformation and

cleaning of data for modeling tools.
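The preparation tasks named above (attribute selection, record selection, cleaning and transformation) can be sketched as a small Python function. The field names `disease`, `area` and `count`, the sample records and the function name `prepare_records` are hypothetical; the project's actual table layout is not assumed here.

```python
def prepare_records(raw_records, attributes):
    """Build a modelling data set from raw records.

    Attribute selection: keep only the requested attributes.
    Record selection/cleaning: drop records where a kept attribute
    is missing or empty.
    Transformation: trim and lower-case text values so that
    "Acne " and "acne" are treated as the same category.
    """
    prepared = []
    for record in raw_records:
        row = {name: record.get(name) for name in attributes}
        if any(value is None or value == "" for value in row.values()):
            continue  # incomplete record, skip it
        row = {
            name: value.strip().lower() if isinstance(value, str) else value
            for name, value in row.items()
        }
        prepared.append(row)
    return prepared


# Hypothetical raw input with one extra attribute and one empty field.
raw = [
    {"disease": "Acne ", "area": "Adajan", "count": 3, "note": "x"},
    {"disease": "", "area": "Dabholi", "count": 5},
    {"disease": "Swine flu", "area": "Dabholi", "count": 2},
]
clean = prepare_records(raw, ["disease", "area", "count"])
print(clean)
```

Because preparation is usually repeated as the model changes, keeping it in one function makes it easy to rerun with a different attribute list.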



4.1.4 Modeling

In this phase, various modeling techniques are selected and applied, and their pa-

rameters are calibrated to optimal values. Typically, there are several techniques

for the same data mining problem type. Some techniques have specific require-

ments on the form of data. Therefore, stepping back to the data preparation phase

is often needed.

4.1.5 Evaluation

At this stage in the project you have built a model (or models) that appears to

have high quality, from a data analysis perspective. Before proceeding to final

deployment of the model, it is important to more thoroughly evaluate the model,

and review the steps executed to construct the model, to be certain it properly

achieves the business objectives. A key objective is to determine if there is some

important business issue that has not been sufficiently considered. At the end of

this phase, a decision on the use of the data mining results should be reached.

4.1.6 Deployment

Creation of the model is generally not the end of the project. Even if the purpose

of the model is to increase knowledge of the data, the knowledge gained will need

to be organized and presented in a way that is useful to the customer. Depending

on the requirements, the deployment phase can be as simple as generating a report

or as complex as implementing a repeatable data scoring or data mining process.

In many cases it will be the customer, not the data analyst, who will carry out

the deployment steps. Even if the analyst deploys the model it is important for

the customer to understand up front the actions which will need to be carried out

in order to actually make use of the created models.[3]


Chapter 5

Algorithm

5.1 Epidemiology

Epidemiology is the study of the patterns, causes, and effects of health and disease

conditions in different areas. It is the cornerstone of public health, and shapes

policy decisions and evidence-based practice by identifying risk factors for disease

and targets for preventive health care. Epidemiologists help with study design,

collection, and statistical analysis of data, and interpretation and dissemination

of results.

5.1.1 Frequency based measures used in epidemiology

Epidemiologists use a variety of methods to summarize data. One fundamental method is the frequency distribution, a table which displays how many people fall into each category of a variable such as landmark or disease status. There are different frequency-based measures; in this project the risk ratio is calculated.

Epidemiologic data come in many forms and sizes. One of the most common forms is a rectangular database made up of rows and columns. Each row contains information about one individual and is called a record or observation. Each column contains information about one characteristic, such as race or date of birth, and is called a variable. The first column of an epidemiologic database usually contains the individual's name, initials, or identification number, which allows us to identify who is who.

The size of the database depends on the number of records and the number of variables. A small database may fit on a single sheet of paper; larger databases with thousands of records and hundreds of variables are best handled with a computer. When we investigate an outbreak, we usually create a database called a line listing. In a line listing, each row represents a case of the disease we are investigating. Columns contain identifying information, clinical details, descriptive epidemiology factors, and possible etiologic factors.

1. Risk ratio

A risk ratio, or relative risk, compares the risk of some health-related event, such as disease or death, in two groups. The two groups are typically differentiated by a demographic factor, by time period (e.g., January versus February), or by exposure to a suspected risk factor. Often, the group of primary interest is labeled the "exposed" group and the comparison group is labeled the "unexposed" group. We place the group that we are primarily interested in in the numerator and the group we are comparing it with in the denominator:

Risk ratio = risk in group of primary interest / risk in comparison group

In this project the two groups are two months, and the risk ratio is calculated as follows:

Step 1: Find the ratio of the cases of one particular disease in one month to all cases of all diseases in that month.

Step 2: Find the ratio of the cases of the same disease in the next month to all cases of all diseases in that month.

Step 3: Take the ratio of the two values calculated in step 1 and step 2. This is the risk ratio of one month for the particular disease compared to the other month.

A risk ratio of 1.0 indicates identical risk in the two months. A risk ratio greater than 1.0 indicates an increased risk for the numerator group, while a risk ratio less than 1.0 indicates a decreased risk for the numerator group. [11]
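The three steps can be sketched in a few lines of Python. This is only an illustration: the function name `risk_ratio` and the case counts are made up and are not taken from the project data.

```python
def risk_ratio(cases_a, total_a, cases_b, total_b):
    """Risk ratio of month A (numerator) relative to month B (denominator).

    cases_*: cases of the disease of interest in that month
    total_*: all cases of all diseases in that month
    """
    if total_a == 0 or total_b == 0 or cases_b == 0:
        raise ValueError("risk ratio is undefined for an empty month or zero comparison risk")
    risk_a = cases_a / total_a   # step 1: share of the disease in month A
    risk_b = cases_b / total_b   # step 2: share of the disease in month B
    return risk_a / risk_b       # step 3: ratio of the two shares


# Hypothetical counts: 12 of 200 cases in January, 6 of 150 cases in February.
january_vs_february = risk_ratio(12, 200, 6, 150)
print(january_vs_february)
```

With these made-up counts the result is above 1.0, which would be read as a higher risk in January than in February.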



5.2 Spatial data mining

Spatial data mining is the process of discovering interesting and previously un-

known, but potentially useful patterns from large spatial data sets. Extracting

interesting and useful patterns from spatial datasets is more difficult than extract-

ing the corresponding patterns from traditional numeric and categorical data due

to the complexity of spatial data types, spatial relationships, and spatial autocor-

relation.

5.2.1 Clustering

Clustering is a process in which data points are grouped into clusters. Given a set of data points, each with a set of features, the points are grouped so that data points in the same cluster are similar to each other while points in separate clusters are different from each other. Spatial clustering is the process of grouping similar objects based on their distance, connectivity, or relative density in space, and has been employed for spatial analysis for years. In short, spatial clustering is the process of discovering groups in large databases.

Spatial view: rows in a database = points in a multi-dimensional space. Visualization may reveal interesting groups.

• Hierarchical clustering: clusters are repeatedly split or merged until a stop criterion is reached.

• Partition clustering: start with random central points, assign each point to its nearest central point, then update the central points.

• Density clustering: find clusters based on the density of regions.

Here, selecting a disease and a landmark creates one cluster. Likewise, all the diseases at the different landmarks form different clusters. The size of a cluster is decided by the disease count. [14]
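The grouping described at the end of this section, one cluster per disease and landmark with its size given by the disease count, can be illustrated with a simple counter. This is a sketch under that reading of the text; the function name and the sample records are hypothetical.

```python
from collections import Counter


def cluster_sizes(cases):
    """Count cases per (disease, landmark) pair; each distinct pair is one cluster."""
    return Counter((disease, landmark) for disease, landmark in cases)


# Made-up case records: one (disease name, landmark) tuple per registered case.
cases = [
    ("acne", "Adajan"),
    ("acne", "Adajan"),
    ("acne", "Dabholi"),
    ("convulsions", "Adajan"),
]
sizes = cluster_sizes(cases)
for (disease, landmark), count in sizes.items():
    print(disease, landmark, count)
```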



5.3 Predictive Analysis

5.3.1 Linear regression

In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory (or independent) variables denoted X. We can rarely expect the relationship between two variables to be "perfect". There are always other variables that affect the dependent variable. Differences in these other variables between observations will cause some data points to lie above the regression line and others to lie below it.

Consider three data points. In general, no single line passes through all three points. Choosing the line passing through any two of the three points leaves one point off the line, so we say that there is one degree of freedom in choosing the line. (In the case of only two points, there are zero degrees of freedom; if we added a fourth point, there would be two degrees of freedom.) In the case of only two data points, the regression line passes through both points, so the residuals are zero: the data points do not deviate from the line. With three or more data points we cannot find a line that makes all the residuals zero, except in the unusual case where all the points happen to lie on the same line.

1. Least squares methodology

There are several different techniques for linear regression analysis; here a simple linear regression analysis using the method of least squares is used. We fit the straight line through the data points that provides the best fit to those points. This line is given by the equation

\[ y = a + bx \tag{5.1} \]

where y and x are our variables, e.g. disease count and year, month, quarter or day. b is known as the gradient and is the amount by which y increases for every unit increase in x; for example, if the disease count increases by 4 every day, then b has the value 4. a is known as the intercept and is the point where the straight line meets the y axis. In our example this is the fitted disease count at x = 0, i.e. at the start of the month.

As well as calculating the two values a (intercept) and b (gradient), we also want to calculate the correlation coefficient (denoted by r), which is a measure of how well the points fit the straight line. r lies between -1 and 1. An absolute value of 0.5 or below means that there is little or no linear relationship, while an absolute value above 0.8 means that there is a strong linear relationship.

The correlation coefficient is also known as the product-moment coefficient of correlation or Pearson's correlation. Its square, r-squared, is also often reported.

Gradient:

We begin by calculating the gradient b. This is given by the formula

\[ b = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)} \tag{5.2} \]

which is the covariance of x and y divided by the variance of x. The covariance of x and y is given by

\[ \mathrm{cov}(x, y) = \frac{1}{n} \sum (x - \bar{x})(y - \bar{y}) \tag{5.3} \]

and the variance of x is given by

\[ \mathrm{var}(x) = \frac{1}{n} \sum (x - \bar{x})^2 \tag{5.4} \]

where \bar{x} and \bar{y} are the means of x and y and n is the number of data points.

Intercept:

Once we have found b we can calculate the intercept a by

\[ a = \bar{y} - b\bar{x} \tag{5.5} \]

That gives us the values we need for our straight line equation.

Correlation coefficient:

To calculate the correlation coefficient we use

\[ r = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}} \tag{5.6} \]

You can apply a linear regression to your data to visualize a linear trend or to predict future data based on the linear trend in your data. Linear regression uses a measure and a dimension that is part of a time hierarchy (for example, Month) as its inputs. In SAP Lumira, this algorithm is used to find trends in data. It determines how an individual variable influences another variable with the least squares methodology. [21]
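As a worked illustration of the least squares calculation, the following sketch computes the intercept a, the gradient b and the correlation coefficient r from the covariance and variance, both taken as averages over the n data points. The helper names and the sample series are assumptions made for the example; this is not the SAP Lumira implementation.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Average product of deviations from the means (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys):
    """Return (a, b, r) for the least squares line y = a + b*x."""
    var_x = covariance(xs, xs)      # variance is the covariance of x with itself
    var_y = covariance(ys, ys)
    cov_xy = covariance(xs, ys)
    b = cov_xy / var_x              # gradient
    a = mean(ys) - b * mean(xs)     # intercept
    r = cov_xy / sqrt(var_x * var_y)  # correlation coefficient
    return a, b, r


# Made-up series: the disease count grows by 4 cases every day.
days = [1, 2, 3, 4, 5]
counts = [6, 10, 14, 18, 22]
a, b, r = fit_line(days, counts)
print(a, b, r)
```

Because the sample points lie exactly on a line, the fitted gradient equals the daily increase of 4 and the correlation coefficient equals 1.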

5.3.2 Forecasting

The forecasting capability in SAP Lumira lets you use historical data as the ba-

sis for predicting future values. The forecasting feature analyzes the trends and

cycles of a time series to predict future values. Forecasting uses a measure and

a dimension that is part of a time hierarchy (for example, Month) as its inputs.

You specify how many forecasted values you want the algorithm to produce. SAP

Lumira provides two algorithms for forecasting future data:

1. SAP Predictive Analytics time series analysis: computes several models that are compared for the best result. It does this by breaking a time series into four components:

• Trend: A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. A trend is sometimes said to be "changing direction" when it goes from an increasing trend to a decreasing trend.

• Cycles: A cyclic pattern exists when data exhibit rises and falls that are not of a fixed period. The duration of these fluctuations is usually at least 2 years.

• Fluctuations: An irregular rising and falling in number or amount, i.e. a variation in quantity over time.

• Information Residue: A residual in forecasting is the difference between an observed value and its forecast based on other observations.

2. Triple Exponential Smoothing: Use this algorithm to smooth the source data and find seasonal trends in the data. A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week). Seasonality is always of a fixed and known period. [25][22]
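To make triple exponential smoothing concrete, the sketch below implements the textbook additive Holt-Winters form, which tracks a level, a trend and one seasonal offset per position in the season. How SAP Lumira implements the algorithm internally is not described here, so the initialization, the parameter names (`alpha`, `beta`, `gamma`) and the sample series are all assumptions.

```python
def triple_exponential_smoothing(series, season_length, alpha, beta, gamma, n_forecast):
    """Additive Holt-Winters smoothing; returns n_forecast predicted values."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Initial level: mean of the first season.
    level = sum(series[:m]) / m
    # Initial trend: average per-step change between the first two seasons.
    trend = sum((series[m + i] - series[i]) / m for i in range(m)) / m
    # Initial seasonal offsets: deviation of each first-season value from the level.
    seasonals = [series[i] - level for i in range(m)]

    # Update level, trend and seasonal offsets with every later observation.
    for t in range(m, len(series)):
        value = series[t]
        season = seasonals[t % m]
        previous_level = level
        level = alpha * (value - season) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (value - level) + (1 - gamma) * season

    # Forecast h steps ahead: extrapolate the trend and reuse the seasonal offset.
    n = len(series)
    return [level + h * trend + seasonals[(n - 1 + h) % m]
            for h in range(1, n_forecast + 1)]


# Made-up quarterly pattern that repeats every 3 periods with no trend.
history = [1, 2, 3, 1, 2, 3, 1, 2, 3]
forecast = triple_exponential_smoothing(history, 3, 0.5, 0.5, 0.5, 3)
print(forecast)
```

On a series that repeats exactly, the forecast simply continues the seasonal pattern; on real disease counts the three smoothing parameters would need to be tuned.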


Chapter 6

Database Design

6.1 Introduction

Database design is the process of producing a detailed data model of a database.

This data model contains all the needed logical and physical design choices and

physical storage parameters needed to generate a design in a data definition lan-

guage, which can then be used to create a database. A fully attributed data model

contains detailed attributes for each entity.

In a majority of cases, the person who designs a database has expertise in database design rather than in the domain from which the data to be stored is drawn (e.g. financial information, biological information). Therefore, the data to be stored in the database must be determined in cooperation with a person who does have expertise in that domain and who is aware of what data must be stored within the system. This process is generally considered part of requirements analysis.

Once a database designer is aware of the data which is to be stored within the database, they must then determine where dependencies lie within the data. Sometimes when data is changed, other data that is not visible is changed as well. Once the relationships and dependencies amongst the various pieces of information have been determined, it is possible to arrange the data into a logical structure which can then be mapped into the storage objects supported by the database management system.

6.2 Database Introduction

The tables referred to in the development of this module are as follows:

• res diseases

• res area wise diseases

6.2.1 res diseases

The attributes of this table are as follows.

• disease id : the unique id of each disease (e.g. d1, d2, d3).

• disease name : the disease name for a particular disease id.

6.2.2 res area wise diseases

The attributes of this table are as follows.

• patient id : the unique id of the patient (e.g. PAT1, PAT2).

• disease id : the disease id by which the disease name is fetched from the res diseases table.

• month : the month in which the case is registered.

• year : the year in which the case is registered.

• landmark : the address of the patient, or the area of the particular case affected by that disease.

• locality : the area or region the registered case is from.

• province : the state the registered case is from.

• country : the country the registered case is from.

• latitude : the latitude of the particular area.

• logitude : the longitude of the particular area. Latitude and longitude are used for locating the particular area on the geo map.

• postcode : the pincode of the particular area.
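The two tables can be sketched as a small in-memory SQLite schema. The underscore-separated table and column names, the column types and the sample row (including its coordinates and postcode) are assumptions made for illustration; the actual schema of the Health ERP database is not shown in this report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE res_diseases (
    disease_id   TEXT PRIMARY KEY,   -- e.g. d1, d2, d3
    disease_name TEXT NOT NULL
);
CREATE TABLE res_area_wise_diseases (
    patient_id TEXT,                 -- e.g. PAT1, PAT2
    disease_id TEXT REFERENCES res_diseases(disease_id),
    month      INTEGER,
    year       INTEGER,
    landmark   TEXT,
    locality   TEXT,
    province   TEXT,
    country    TEXT,
    latitude   REAL,
    longitude  REAL,
    postcode   TEXT
);
""")

# Illustrative rows only; the values are not taken from the project data.
conn.execute("INSERT INTO res_diseases VALUES ('d1', 'acne')")
conn.execute(
    "INSERT INTO res_area_wise_diseases VALUES "
    "('PAT1', 'd1', 1, 2016, 'Adajan', 'Surat', 'Gujarat', 'India', 21.19, 72.79, '395009')"
)

# Disease count per disease name and landmark, the measure used in the charts.
rows = conn.execute("""
    SELECT d.disease_name, a.landmark, COUNT(*) AS disease_count
    FROM res_area_wise_diseases AS a
    JOIN res_diseases AS d ON d.disease_id = a.disease_id
    GROUP BY d.disease_name, a.landmark
""").fetchall()
print(rows)
```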


Chapter 7

Implementation

7.1 Calculating risk ratio

The risk ratio can be implemented in two ways.

7.1.1 By entering two months and the disease for which

comparison will be done. According to that risk ratio

will be generated.

Figure 7.1: (Enter details to find risk ratio)


Figure 7.2: Risk ratio as a result

7.1.2 When there is a new entry in the database, the risk ratio is added to the database according to the algorithm.

As the risk ratio is important for comparing the risk of a particular disease in two months, it can be added as a new column in the database. Here, the risk ratio of one month relative to the immediately following month is computed (i.e. the risk ratio of January relative to February, of February relative to March, and so on). For that, the following steps are implemented.

Step 1: Count the cases of a particular disease in a particular month, then count all the cases in that month (e.g. the cases of disease d1 in January and the total cases of January). Find the ratio of the two.

Step 2: Count the cases of that disease in the next month, then count all the cases in that month (e.g. the cases of disease d1 in February and the total cases of February). Find the ratio of the two.

Step 3: Divide the ratio from step 1 by the ratio from step 2 and insert the resulting risk ratio in the row matching that disease and the first month (i.e. January).

Step 4: Repeat from step 1 for the next pair of months. After completing one disease, repeat for all the diseases and store the values accordingly.

So, whenever a new record (row) is inserted, this function is called and the risk ratio is stored according to the month and disease. This file is then imported into SAP Lumira.
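A minimal sketch of this procedure is given below, assuming each record is reduced to a disease id and a month number. The function name, the record layout and the choice to skip months for which the next month has no cases of the disease are assumptions, not the project's actual code.

```python
from collections import Counter


def monthly_risk_ratios(records):
    """Risk ratio of each month relative to the next month, per disease.

    records: iterable of (disease_id, month) tuples with month in 1..12.
    Returns {(disease_id, month): risk ratio of month vs. month + 1}.
    """
    records = list(records)
    cases_per_disease_month = Counter(records)
    cases_per_month = Counter(month for _, month in records)

    ratios = {}
    for (disease, month), cases in cases_per_disease_month.items():
        next_month = month + 1
        next_cases = cases_per_disease_month.get((disease, next_month), 0)
        if next_month > 12 or next_cases == 0:
            continue  # ratio undefined: no following month or no cases in it
        risk_now = cases / cases_per_month[month]                # step 1
        risk_next = next_cases / cases_per_month[next_month]     # step 2
        ratios[(disease, month)] = risk_now / risk_next          # step 3
    return ratios


# Made-up records: (disease id, month).
records = ([("d1", 1)] * 2 + [("d2", 1)] * 2 +
           [("d1", 2)] * 1 + [("d2", 2)] * 3)
ratios = monthly_risk_ratios(records)
print(ratios)
```

In practice the returned values would be written back to the matching rows as the new risk ratio column before the file is exported.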

7.2 Google map APIs

Google APIs are a set of application programming interfaces (APIs) developed by Google which allow communication with Google services and their integration into other services. With an API key, the Google Maps APIs return the longitude and latitude for the selected landmark address. When an address is selected, its longitude and latitude are inserted automatically, so locations are easy to find and take less time to enter.
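One way to obtain coordinates for an address is the Google Geocoding web service; whether the project used this service or another Google Maps API is not stated, so the sketch below is an assumption. It only builds the request URL and reads the latitude and longitude from a response of the documented JSON shape; the API key and the sample response values are placeholders.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build the request URL for one address lookup."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def extract_lat_lng(response):
    """Return (latitude, longitude) of the first result, or None if there is none."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


url = build_geocode_url("Adajan, Surat", "YOUR_API_KEY")  # placeholder key

# Hand-written sample in the shape of a geocoding response (values are illustrative).
sample_response = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 21.19, "lng": 72.79}}}],
}
coordinates = extract_lat_lng(sample_response)
print(url)
print(coordinates)
```

The returned pair would then be stored in the latitude and longitude attributes of the case record.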



7.3 Visualizations in SAP Lumira

7.3.1 Analysis of diseases in Surat

1. How significant is the impact of different diseases in different areas?

Visualization: disease count by landmark and disease name (column chart)

Figure 7.3: Disease count for different diseases in Surat

Here, the X-axis represents the disease count, the Y-axis represents the landmark, and the color shows the disease name. The graph shows the disease count of different diseases for a particular area, so we can see which diseases have the most impact on that area.

For example, the disease counts for Adajan can be seen: there are 37 cases of convulsion, the maximum among all the diseases. After convulsion, Adajan is most affected by emphysema, acne, actinic keratoses and digestive spasms, with disease counts of 28, 22, 20 and 15 respectively.



2. How do diseases spread day by day in different areas?

1) Visualization: disease count by landmark, day and disease name (line chart)

Figure 7.4: Acne spreads in Dabholi day by day

In the graph, the X-axis represents the disease count, the Y-axis represents the day, and the color shows the disease name. From the graph we can see how a disease increases and decreases in different areas (a filter is applied), so this line chart lets us analyze the spreading of diseases day by day.

Here, the chart shows how acne spreads in Dabholi day by day. The disease count shows the cases of that disease on a particular day. For example, on day 1 the disease count for acne is 4, which decreases to 1 on day 2. After that it increases again by 2, and so on. In the middle of the month the effect of acne remains constant. The highest disease counts are on day 1 and day 15.

2) Visualization: disease count by landmark LongLat and disease name (geo bubble map)

Figure 7.5: Acne spreads in quarter 1

Here, the geo bubble map shows the areas, located by latitude and longitude (the geographic dimensions), that are affected by the diseases. It also shows the day-by-day spreading of diseases. The size of the circle shows the disease count and the color shows the disease name.

In Adajan the disease count of acne is 4 in quarter 1, 6 in quarter 2, 6 in quarter 3 and 4 in quarter 4, so the count rises from quarter 1 to quarter 2 before falling back in quarter 4. We can analyze this for each day as the animation is based on day. This is the only geo map having an animation feature.

3) Visualization: disease count by landmark, disease name and day (cross tab chart)

disease_count by landmark, disease_name, day

landmark   disease_name   day   disease_count
Adajan     Acne             4   1
                            5   1
                            6   3
                            7   2
                            9   3
                           10   2
                           15   1
                           19   1
                           21   2
                           23   1


This cross tab gives the day-by-day disease count for different diseases in different areas, so the daily counts can be read off directly. The highest disease count of acne in Adajan is 3, on the 6th and 9th day of the month. The lowest disease count of acne in Adajan is 1, on the 4th, 5th, 15th, 19th and 23rd.



3. Which area is most affected by a particular disease among all diseases?

Visualization: disease count by landmark LongLat and disease name (geo pie map)

Figure 7.10: Comparison of acne, actinic keratoses and acute myocardial infarction in Surat

This geo pie map shows, geospatially, the disease count of different diseases for different areas. From it we can see which areas are most affected by particular diseases. The color shows the disease name and the size of the circle shows the disease count.

Here we can see that there are 8 cases of acne in Bhuvneshwari society, which is fewer than for the other diseases, actinic keratoses and acute myocardial infarction. In this way the selected diseases can be compared in a particular area.



4. How do two diseases spread month by month in a particular area?

Visualization: disease count by landmark, month and disease name (line chart)

Figure 7.11: Comparison of acne and myocardial infarction spreading month by month in Dabholi

Here, the X-axis represents location and month, the Y-axis represents the disease count, and the color represents the disease name. The two lines show the increase and decrease of each disease in different months.

From the graph we can say that in Dabholi the number of cases is high for both diseases. Acute myocardial infarction disappears after June, whereas acne disappears after September. The maximum number of cases of acne and acute myocardial infarction occurs in May, with 6 and 5 cases respectively.



5. Which population of Surat is more affected by diseases?

Visualization: disease count by landmark groups (pie chart)

Figure 7.12: Total disease count of diseases in different zones of Surat

Here, the color represents the different zones of Surat; the different areas are grouped into zones.

The chart shows which zones are most and least affected by the diseases. It is observed that the West zone is the most affected, with a disease count of 2,679, so hospitals, doctors and medicines for every disease are required there so that the diseases can be reduced quickly. The East zone is the least affected, with a disease count of 174.



6. Which areas are at which risk for particular diseases?

Visualization: disease count and risk count by landmark and disease name (2-axis combined column line chart)

Figure 7.13: Risk of diseases in Surat

Here, the X-axis represents the disease count, the 2nd X-axis represents the risk count, and the Y-axis represents the landmark. The graph shows the risk of diseases in different areas. Here the risk count is a measure with the following formula:

if disease count >= 50 then 3 else if disease count < 25 then 1 else 2

where

3 indicates "high risk"

2 indicates "average risk"

1 indicates "low risk"
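The risk count measure can be written as a small function. The thresholds are read here as "50 or more is high risk" and "below 25 is low risk", an assumption chosen because it agrees with the example in this section where 42 cases count as average risk; the function name is hypothetical.

```python
def risk_count(disease_count):
    """Map a disease count to a risk level: 3 = high, 2 = average, 1 = low."""
    if disease_count >= 50:   # assumed threshold for high risk
        return 3
    if disease_count < 25:    # assumed threshold for low risk
        return 1
    return 2


for count in (10, 42, 60):
    print(count, risk_count(count))
```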

From the graph we can see that there are 42 cases of convulsions, which is at average risk because the risk count is 2. Dabholi and Anand mahal road are at high risk, so all precautions must be taken to control the diseases there.



7.3.1.1 Cancer analysis

7. Which type of cancer is most common in Surat?

Visualization: disease count by disease name (heat map)

Figure 7.14: Disease count of different cancers in Surat

In the heat map, different colors represent the disease count of each cancer in Surat. There are mainly 9 types of cancer appearing in Surat: cancer, throat cancer, breast cancer, skin cancer, anal cancer, cervical cancer, liver cancer, lung cancer and kidney cancer.

It is observed that cervical cancer has a higher disease count than any other cancer, with 87 cases, so it is necessary to find a way to control this cancer. It is followed by anal cancer with 45 cases. The other cancers, liver cancer, lung cancer, cancer, throat cancer, breast cancer, kidney cancer and skin cancer, have disease counts of 2, 1, 2, 4, 2, 1 and 2 respectively.



8. How are two cancers spreading day by day?

Visualization: disease count by day and disease name (line chart)

Figure 7.15: Cancers spread day by day in Surat

Here, the X-axis represents the day, the Y-axis represents the disease count, and the color represents the disease name. The line chart shows how anal cancer and cervical cancer spread day by day.

Cervical cancer increases faster than anal cancer. Anal cancer is absent between day 5 and day 8, but cervical cancer appears on every day. The highest disease count of anal cancer is 8, on the 19th day, and the highest disease count of cervical cancer is 7, on the 17th and 30th day. At the end of the month cervical cancer decreases faster.



9. Which area is most affected by which cancer?

Visualization: disease count by landmark LongLat and disease name (stacked column chart)

Figure 7.16: Disease count of different cancers in different landmarks in Surat

Here, the X-axis represents the landmark, the Y-axis represents the disease count, and the color indicates the disease name. The graph shows which area has which type of cancer. There are mainly 9 types of cancer appearing in Surat: anal cancer, breast cancer, cervical cancer, kidney cancer, lung cancer, cancer, skin cancer, liver cancer and throat cancer. For example, in the Canal Road area there are 6 cases of cervical cancer, the maximum among all the cancers, 3 cases of anal cancer, and 1 case of breast cancer, the minimum. The remaining cancers do not appear in the Canal Road area.



7.3.1.2 Swine flu analysis

10. How does swine flu spread month by month?

Visualization: disease count by month and disease name (line chart)

Figure 7.17: Swine flu spreads day by day in Surat

Here, the X-axis represents the month, the Y-axis represents the disease count, and the color represents the disease name. It is observed that from January onward swine flu decreases month by month.

In January the disease count of swine flu is 4, which decreases to 2 in February and stays constant until May. After May the effect of swine flu decreases faster as medicines are available. The disease counts of swine flu in August and October are 1 and 1 respectively. So by the end of the year the effect of swine flu has decreased in Surat.



11. How does swine flu affect different landmarks of Surat?

Visualization: disease count by disease name and landmark (column chart)

Figure 7.18: Swine flu affects different landmarks in Surat

Here, the X-axis represents the disease name, the Y-axis represents the disease count, and the color represents the landmark.

It is observed that the Opposite gail tower area and Ved road are the most affected by swine flu, each with a disease count of 2. The other landmarks, Anand mahal, Bhuvneshwary, Green city, Opposite new l.p.savani school, Vasupujiya green and Near old bank of baroda, each have a disease count of 1. So overall, all these landmarks are affected by swine flu.



7.3.2 Analysis of diseases in Gujarat

12. Which locality is most affected by diseases in Gujarat?

Visualization: disease count by locality (column chart)

Figure 7.19: Total disease count in different localities of Gujarat

Here, the X-axis represents the locality, the Y-axis represents the disease count, and the color shows the disease name.

It is observed that Aanand is the most affected locality (disease count 6828) and Gandhinagar is the least affected locality (disease count 4991) in Gujarat. The other localities, Ahmedabad, Bharuch, Jamnagar, Rajkot, Vadodara and Surat, have disease counts of 5829, 6167, 5895, 6204, 6268 and 6118 respectively.



13. Which disease has the most impact in Gujarat?

Visualization: top 5 disease count by disease name (column chart)

Figure 7.20: Top 5 diseases which affect Gujarat most

Here, the X-axis represents the disease name, the Y-axis represents the disease count, and the color shows the province. It is observed that convulsions (epilepsy seizures) have a disease count of 10,393, the largest impact in Gujarat.

After that, emphysema, acne, actinic keratoses and acute myocardial infarction complete the top 5 diseases, with disease counts of 8185, 4753, 4508 and 4358. So it is necessary to find a way to control these diseases.



14. Which disease has the least impact in Gujarat?

Visualization: Bottom 4 disease count by disease name

Figure 7.21: Bottom 3 diseases which affect Gujarat less

Here, the X-axis represents the disease name, the Y-axis represents the disease count, and the color shows the province.

It is observed that alcohol withdrawal, Alzheimer's, motion sickness, bed wetting, malabsorption, Meniere's, menstrual cramps, myocarditis, nasal allergy and sleep apnea are the bottom 10 diseases, with disease counts of 6, 7, 7, 12, 12, 12, 12, 12, 12 and 12, and so have little impact in Gujarat. Alcohol withdrawal, with a disease count of only 6, has the least impact in Gujarat.



15. How does anxiety appear in different localities month by month?

Visualization: disease count by month and locality (line graph)

Figure 7.22: Anxiety affects different localities of Gujarat

Here, the X-axis represents the month, the Y-axis represents the disease count, and the color shows the province.

It is observed that in February (the 2nd month) the disease count of anxiety is at its maximum: 13 in Aanand, 11 in Surat, Vadodara and Ahmedabad, and 5 in Gandhinagar. After that, anxiety decreases in March: the disease counts for Surat, Vadodara, Aanand and Gandhinagar are 7, 6, 6 and 5 respectively. Anxiety is consistent in Gandhinagar in almost all months. In Ahmedabad and Vadodara there are no cases of anxiety after March. In Surat, it appears every 2-3 months.



16. What is the impact of diseases like depression, fatigue, anxiety and migraine on the people of different localities?

Visualization: disease count by locality and disease name (stacked column chart)

Figure 7.23: Comparison of the effects of mental diseases in different localities in Gujarat

Here, the X-axis represents the locality, the Y-axis represents the disease count, and the color shows the disease count.

Here we can see that all 4 diseases, anxiety, depression, fatigue and migraine, appear in every locality, except that there is not a single case of migraine in Jamnagar. Depression is lowest in Gandhinagar (disease count 6) compared to other localities. Migraine is higher in Gandhinagar (disease count 16) than in other localities. Patients with fatigue and migraine are fewer in Ahmedabad than in other localities.



17. How do different types of cancer affect the different localities?

Visualization: disease count by disease name and locality (heat map)

Figure 7.24: Comparison of different cancers in Gujarat

Here, the X-axis represents the disease name, the Y-axis represents the locality, and the color shows the disease count.

Cervical cancer is the most widespread of all the cancers in Gujarat. Aanand is most affected by anal cancer, with a disease count of 91. Lung cancer has the fewest cases in all localities. The maximum numbers of cases of breast cancer, skin cancer, lung cancer, liver cancer and throat cancer are in Gandhinagar, which are respectively 26 and 25. Cervical cancer is highest in Rajkot and Jamnagar, with a disease count of 87.



18. What is the scenario of blood pressure in different localities by quarter?

Visualization: disease count by landmark LongLat, disease name and locality (geo bubble map)

Animation: by quarter

Figure 7.25: Blood pressure patients in Gujarat in quarter 1

Here, the size of the circle shows the disease count, the color shows the landmark, and 1 quarter = 3 months. The graph shows the blood pressure scenario in the 1st quarter, where almost all localities, such as Lalpur, Amaran, Bhadla, Chandlekha, Gudel, Adas, Colony, Dahej, Nabipur, Valia and Dabholi, are affected by blood pressure. So blood pressure patients are found all over Gujarat in the 1st quarter.




The next map shows the situation of blood pressure in the 2nd quarter, where the disease appears less than in the 1st quarter. So the number of blood pressure patients all over Gujarat is lower in the 2nd quarter than in the 1st quarter.

The disease disappears in some localities, such as Chandlekha, Amran, Dahej, Gudel, Nabipur, Valia, Adas and Colony. There are some localities, such as Desar, Atkot, V R mall, Kanisha, Kareli and Navjivan, in which blood pressure patients are noticed.

In quarter 3 the disease decreases and only 2 cases are noticed, in Chandlekha in Gandhinagar. So medicines need to be supplied in Chandlekha, as it is the only landmark having blood pressure patients in quarter 3.

In quarter 4 the disease decreases significantly and has almost disappeared from all the landmarks; only 1 case is noticed, in Chandlekha. So in quarter 4 the number of blood pressure patients decreases.





So the number of blood pressure patients all over Gujarat is highest in the 1st quarter and decreases in each following quarter.



19. Which landmark is most affected by diseases?

Visualization: disease count by landmark LongLat (geo choropleth chart)

Figure 7.29: Most affected landmark of Gujarat

Here, the color represents the disease count. From this geo map it is observed that all the localities of Gujarat are affected by different diseases. Lalpur is the landmark most affected by different diseases in Gujarat, with a disease count of 1,876.

This can be controlled by providing proper medicines, hospitals, pharmacies etc. Social awareness is also needed to control the diseases.



20. Which are the most affected and least affected sub regions in Gujarat?

Visualization: disease count by sub region

Figure 7.30: Most affected and least affected locality of Gujarat

Here, the color represents the disease count. From the graph, Aanand is the most affected locality in Gujarat, with a disease count of 6828, and Panchmahals is the least affected, with a disease count of 738. So it is noticed that west Gujarat is more affected than east Gujarat, and additional health care facilities are needed in west Gujarat.

21. Which locality is affected by which different diseases?

Visualization: disease count by landmark LongLat and disease name (Geo pie map)

Figure 7.31: Comparison of acne, acid indigestion upset stomach, actinic keratoses and acute myocardial infarction in Gujarat

Here color represents the disease count and size of circle represents disease
count. This map shows that Lalpur is the most affected locality. Among the
filtered diseases (acne, acid indigestion upset stomach, actinic keratoses and
acute myocardial infarction), Lalpur is most affected by actinic keratoses, and
acid indigestion upset stomach has the least impact at 15.84%. So with this
graph, diseases can be compared within a particular locality.



7.3.3 Analysis of diseases in 8 cities of India

22. Which city is most affected by diseases in India?

Visualization: disease count by locality (column chart)

Figure 7.32: Total disease count for different localities of India

Here the X-axis represents disease count and color shows the locality.

It is observed that Bangalore is the most affected by diseases, with a disease
count of 11645, far above the other localities, and Ahmedabad is the least
affected, with a disease count of 5829. The localities Ahmedabad, Mumbai,
New Delhi, Hyderabad, Pune and Surat have disease counts of 5829, 6123,
6225, 6224, 6203 and 6118 respectively. So Bangalore needs to find ways to
control the diseases.
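As a minimal sketch of the ranking read off this column chart, the snippet below uses only the disease counts quoted in the text (no count for Kolkata is stated there, so it is omitted). The variable names are illustrative assumptions and not part of the original analysis.

```python
# Disease counts per city as quoted in the text (Kolkata's count is not given there).
disease_count = {
    "Bangalore": 11645,
    "Ahmedabad": 5829,
    "Mumbai": 6123,
    "New Delhi": 6225,
    "Hyderabad": 6224,
    "Pune": 6203,
    "Surat": 6118,
}

# Sort cities from the highest to the lowest disease count.
ranked = sorted(disease_count.items(), key=lambda item: item[1], reverse=True)

most_affected = ranked[0]
least_affected = ranked[-1]
# Gap between the most affected city and the runner-up.
gap_to_second = ranked[0][1] - ranked[1][1]

print("Most affected:", most_affected)
print("Least affected:", least_affected)
print("Gap to second place:", gap_to_second)
```

Sorting the whole dictionary, rather than only taking the maximum and minimum, also exposes how close the remaining cities are to one another.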



23. Comparison of leptospirosis spreading between two cities day by day.

Visualization: disease count by day and locality

Figure 7.33: Comparison of spreading of leptospirosis between Surat and Bangalore

Here the X-axis represents day, the Y-axis represents disease count and color
shows the locality.

The graph shows how leptospirosis spreads day by day in Surat and Bangalore.
There is a big difference between the two cities: leptospirosis spreads and
increases faster in Bangalore than in Surat. On the 5th day the count is 1 in
Surat. On the 6th day both localities have a disease count of 1. On the 10th
day leptospirosis increases to a disease count of 2 and then remains the same.
After the 11th day it increases in both localities. The disease count of
leptospirosis in Bangalore and Surat on the 12th day is 6 and 3 respectively.



24. Which city is affected by which disease most?

Visualization: Top 10 disease count on disease name by disease name and
locality (Heat map)

Figure 7.34: Top 10 disease count in India

Here the X-axis represents disease name, the Y-axis represents locality and
color shows the disease count.

From the graph we can analyse how the top 10 diseases affect different
cities. It is observed that Bangalore is most affected by convulsions and
emphysema, with disease counts of 1368 and 1128 respectively. The remaining
cities Pune, New Delhi, Hyderabad, Mumbai, Surat, Ahmedabad and Kolkata are
also affected by convulsions, with disease counts of 726, 726, 726, 715, 715,
671 and 577 respectively. Kolkata is the 2nd most affected by acute myocardial
infarction and is least affected by actinic keratoses.



25. How are cities affected by some mental diseases?

Visualization: disease count by locality and disease name (Stacked column chart)

Figure 7.35: Top 10 disease count in India

Here the X-axis represents disease name, the Y-axis represents locality and
color shows the disease count.

The stacked column chart suggests that anxiety cases are more frequent in
Bangalore. So Bangalore needs some civic system that reduces the anxiety of
its people. Blood pressure patients are more numerous in Kolkata. Depression
patients are equal in Pune and New Delhi (3 cases). Insomnia is more frequent
in Bangalore (4 cases). Ahmedabad is least affected by the mental diseases (6
cases). The remaining cities Surat, Mumbai, Pune, New Delhi, Bangalore,
Hyderabad and Kolkata have disease counts of 7, 9, 13, 10, 14, 8 and 11
respectively. There are more cases of fatigue in Kolkata and Pune, as there is
more physical hard work. Hyderabad, Mumbai, Ahmedabad and Kolkata are the
least affected by insomnia, as their disease count is 1.



26. How is swine flu spreading in India month by month?

Visualization: disease count by region and disease name (geo bubble map)

Animation: by month

Figure 7.36: Swine flu spreads in India in February

Here color represents disease and size of circle represents disease count. It is
observed that in February all the states except Tamil Nadu are affected by
swine flu: Gujarat, Maharashtra, Delhi, Karnataka, Andhra Pradesh and West
Bengal have disease counts of 4, 3, 2, 4, 2 and 2 respectively. Gujarat and
Karnataka are the most affected.

In March, all the states are affected by swine flu except West Bengal and
Tamil Nadu. Gujarat, Maharashtra, Delhi, Karnataka and Andhra Pradesh have
swine flu disease counts of 4, 4, 2, 3 and 2 respectively.

In Maharashtra swine flu increases compared with February, when its disease
count was 3. Swine flu has disappeared from West Bengal in March.



Figure 7.37: Swine flu spreads in India in March

Figure 7.38: Swine flu spreads in India in April

In April, swine flu has disappeared from all the states which were affected,
and West Bengal is affected by swine flu again. It is observed that West
Bengal has 1 swine flu patient in April. So medicines and proper hospital
facilities are needed in West Bengal.

Figure 7.39: Swine flu spreads in India in May

In May, swine flu increases again in all the states: Gujarat, Maharashtra,
Karnataka, Andhra Pradesh, Tamil Nadu, Delhi and West Bengal have disease
counts of 4, 4, 3, 2, 1, 2 and 2. Gujarat and Maharashtra have the most and
Tamil Nadu has the least number of swine flu patients. Swine flu increases in
West Bengal. Tamil Nadu, being the least affected, may have the best hospital
and medicine facilities.

In July, swine flu disappears from the states Gujarat, Maharashtra,
Karnataka, Tamil Nadu, Delhi and Andhra Pradesh, but not from West Bengal.
So the effect of swine flu rises and falls repeatedly in all the states, and in
West Bengal swine flu is lower than before, as medicines are made available
and other help is provided.



Figure 7.40: Swine flu spreads in India in July

Figure 7.41: Swine flu spreads in India in August



In August, swine flu starts spreading again in all states except Tamil Nadu.
Gujarat, Maharashtra, Karnataka, Andhra Pradesh, Delhi and West Bengal have
disease counts of 2, 2, 2, 1, 1 and 1 respectively. As per the observation,
West Bengal is the most consistently affected by swine flu, as swine flu is
noticed there in nearly every month.

Figure 7.42: Swine flu spreads in India in October

In October, the situation of swine flu in the different states remains the same.
The people of Gujarat, Maharashtra, Karnataka, Andhra Pradesh, Delhi and West
Bengal take more time to recover, as the effect is the same in October.
Maharashtra and Karnataka have a disease count of 2 and are the most affected.

In December, swine flu has disappeared from all the states except West
Bengal, which has a disease count of 1.

So we can predict that the disease will have disappeared from Gujarat,
Maharashtra, Karnataka, Delhi, Tamil Nadu and Andhra Pradesh in the next
months, and it is possible that West Bengal takes some months to recover.
West Bengal needs some help from other states to recover from swine flu.



Figure 7.43: Swine flu spreads in India in December

So it is observed that the states are affected the most in March and May
and the least in April, July and December.
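The month-by-month description above can be tabulated to see how persistently each state is affected. The sketch below is an illustration only: it encodes which states the text reports as affected in each month (for October, the states listed in that paragraph) and counts the months per state. The data structure and names are assumptions made for this example.

```python
# States affected by swine flu in each month, as described in the text above.
ALL_STATES = {
    "Gujarat", "Maharashtra", "Delhi", "Karnataka",
    "Andhra Pradesh", "Tamil Nadu", "West Bengal",
}

affected_by_month = {
    "February": ALL_STATES - {"Tamil Nadu"},
    "March": ALL_STATES - {"Tamil Nadu", "West Bengal"},
    "April": {"West Bengal"},
    "May": set(ALL_STATES),
    "July": {"West Bengal"},
    "August": ALL_STATES - {"Tamil Nadu"},
    "October": ALL_STATES - {"Tamil Nadu"},
    "December": {"West Bengal"},
}

# Count in how many of the reported months each state is affected.
months_affected = {
    state: sum(state in states for states in affected_by_month.values())
    for state in ALL_STATES
}

# Print the states from the most to the least persistently affected.
for state, months in sorted(months_affected.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{state}: affected in {months} of {len(affected_by_month)} months")
```

Counting months rather than cases is a deliberate choice here, since exact case counts are not stated for every month.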



27. Which state is most affected by disease?

Visualization: disease count by Region (Geo choropleth graph)

Figure 7.44: Most and least affected state by diseases in India

Here color shows the disease count. From the geo choropleth map it is
observed that Karnataka is the most affected and Tamil Nadu is the least
affected state.

So from this graph we can learn which state needs more hospitals, medicines,
technologies, doctors etc., what the scenario of diseases is in the different
states of India, and how the states recover from diseases. We can also take
help from the least affected state.

28. Which state is most affected by disease?

Visualization: disease count by Region and disease name (Geo pie chart)

Figure 7.45: Comparison of anemia, asthma, bacterial infections, eye allergies and keratitis in India

Here color represents disease name and size of circle represents disease count.
It is observed that Karnataka is the most affected by anemia, West Bengal is
most affected by bacterial infections, and Tamil Nadu is not affected by
keratitis or asthma.

From this graph, anemia, asthma, bacterial infections, eye allergies and
keratitis can be compared. Asthma is less frequent in all the states.

29. Comparison of different cities influenced by different diseases.

Visualization: disease count by locality and disease name (area chart)

Figure 7.46: Comparison of anemia, asthma, bacterial infections, eye allergies and keratitis in India

Here the X-axis represents locality, the Y-axis represents disease count and
color shows disease name.

From the graph, bites and stings, leukemia and leg pain or cramps are more
frequent in Kolkata, with disease counts of 17, 6 and 22. There are fewer cases
of leukemia overall than of the other diseases among these 4. Bangalore is also
affected by leg pain or cramps and by bites and stings, with disease counts of
22 and 15 respectively. Surat, Ahmedabad, Mumbai and Delhi are the least
affected by these 4 diseases. Leukemia is the least widespread and bites and
stings is the most widespread among these 4 diseases.

30. How do allergies, alopecia, altitude illness, Alzheimer's and amblyopia
affect different localities?

Visualization: disease count by disease name and locality

Figure 7.47: Comparison of allergies, alopecia, altitude illness, Alzheimer's and amblyopia in India

Here the X-axis represents disease count, the Y-axis represents disease name
and color shows locality.

From the graph, it is observed that all the localities except Kolkata are most
affected by amblyopia. The disease counts of amblyopia for Ahmedabad,
Bangalore, Mumbai, New Delhi, Hyderabad, Kolkata, Pune and Surat are 7, 14,
8, 10, 10, 0, 8 and 9 respectively. The disease counts of allergies for the same
cities are 1, 2, 1, 1, 1, 6, 1 and 1 respectively. The disease counts of alopecia
are 1, 2, 1, 1, 1, 6, 1 and 1 respectively. The disease counts of altitude illness
are 1, 2, 1, 1, 1, 5, 1 and 1 respectively. Amblyopia is highest in Bangalore;
the rest of the diseases are more frequent in Kolkata.

31. Comparison of different cities influenced by different diseases.

Visualization: disease count by locality and disease name (Stacked column chart)

Figure 7.48: Disease count for different diseases in different localities in India

Here the X-axis represents disease name, the Y-axis represents disease count
and color shows locality.

From the graph it can be seen which diseases are the top 5 and which have
less effect. Convulsion, emphysema, acne, actinic keratoses and acute
myocardial infarction, with disease counts of 10000, 8000, 4500, 4300 and 4200,
are the top 5 diseases.

32. Which diseases are the least widespread?

Visualization: Bottom 3 disease count by disease name (donut chart)

Figure 7.49: Bottom 3 disease count in India

Here color shows disease name.

From the graph, it is observed that bed-wetting is the least frequently
appearing disease (7.74%). Alcohol withdrawal, allergies and allergic reaction,
altitude illness, autism, kidney cancer, lung cancer, Meniere's disease,
menstrual cramps, sleep apnea and vitiligo also appear rarely. These are the
least widespread diseases in India.


Chapter 8

Conclusion

From the analysis done, the following conclusions are derived:

In Surat, Adajan is the most affected landmark. Acne is spreading faster than
acute myocardial infarction in Dabholi. The south zone is the most affected and
the east zone is less affected by diseases. The most widespread cancer is
cervical cancer. The areas affected by liver cancer are Near mahavir petrol
pump and U m road.

In Gujarat, Aanand is the most affected locality and Gandhinagar is the least
affected locality by diseases. Convulsion (epilepsy seizures) has the most
impact and alcohol withdrawal has the least impact. In February, anxiety
appears the most. Depression is the least and anxiety is the most frequent in
Gandhinagar. Cervical cancer is the most widespread cancer and lung cancer is
the least widespread cancer. Gandhinagar is the locality most affected by
cancers. In the 1st quarter, blood pressure appeared the most. Lalpur is the
most affected landmark.

In India, Bangalore is the most affected and Ahmedabad is the least affected
city. Ahmedabad is least affected by mental diseases. There are more cases of
fatigue in Pune and Kolkata. West Bengal is most affected by swine flu.
Karnataka is the most affected and Tamil Nadu is the least affected state.
West Bengal is most affected by bacterial infection. Bites and stings, leukemia,
leg pain or cramps, Alzheimer's, altitude illness, alopecia and allergies are
more frequent in Kolkata.


Leptospirosis and amblyopia are more frequent in Bangalore. Convulsion,
emphysema, acne, actinic keratoses and acute myocardial infarction are the top
5 most widespread diseases. Bed-wetting is the least frequently appearing
disease.


Chapter 9

Future prediction

1. What will be the future scenario of diseases in India overall?

1) Visualization: disease count and Linear Regression: disease count by
Month (By linear regression)

Figure 9.1: Prediction of diseases in India in 2016 by linear regression


Here the X-axis represents year, quarter and month, the Y-axis represents
disease count and Linear Regression: disease count, and color distinguishes
the disease count from Linear Regression: disease count.

From the graph, we can see that, on the basis of previous data, the predicted
disease count decreases after January 2016. So from this we can predict the
outbreak of diseases in future and either prevent the diseases from occurring
or control them after they occur.

2) Visualization: disease count and forecast: disease count by Month (By forecasting)

Figure 9.2: Prediction of diseases in India in 2016 by forecasting

Here the X-axis represents year, quarter and month, the Y-axis represents
disease count and Forecasting: disease count, and color distinguishes the
disease count from Forecasting: disease count. The above graph shows the
disease count predicted by forecasting: the disease count will decrease in the
next 2 months.
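To illustrate the kind of linear-regression prediction described above, the following is a minimal sketch of an ordinary least-squares line fitted to monthly disease counts and extended past the last observed month. The function names and the sample counts are hypothetical and are not the data or the tool used in this work.

```python
def fit_line(counts):
    """Least-squares fit of count = intercept + slope * month_index."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_next(counts, steps):
    """Extend the fitted line `steps` months past the last observation."""
    intercept, slope = fit_line(counts)
    n = len(counts)
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical monthly disease counts (illustrative only, not the thesis data).
monthly_counts = [120, 116, 111, 104, 101, 96]
intercept, slope = fit_line(monthly_counts)
print("slope per month:", round(slope, 2))
print("next 2 months:", [round(v, 1) for v in predict_next(monthly_counts, 2)])
```

A negative slope corresponds to a predicted decrease in disease count, and a positive slope to a predicted increase.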



2. What are the possibilities of acid indigestion upset stomach in India
in 2016?

1) Visualization: disease count and Linear Regression: disease count by
month for acid indigestion upset stomach in India

Figure 9.3: Prediction of acid indigestion upset stomach in India in 2016 by linear regression

Here the X-axis represents month, the Y-axis represents disease count and
Linear Regression: disease count, and color distinguishes the disease count
from Linear Regression: disease count.

From the graph, we can see that, on the basis of previous data, the predicted
disease count for acid indigestion upset stomach increases after January 2016.

2) Visualization: disease count and Forecasting: disease count by
month for acid indigestion upset stomach in India

Figure 9.4: Prediction of acid indigestion upset stomach in India in 2016 by forecasting

Here the X-axis represents month, the Y-axis represents disease count and
Forecasting: disease count, and color distinguishes the disease count from
Forecasting: disease count. From the graph, we can see that, on the basis of
previous data, the predicted disease count for acid indigestion upset stomach
increases after January 2016.
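The text does not state which algorithm produces the forecast series. As one possible illustration, the sketch below assumes Holt's linear trend method (double exponential smoothing), which, unlike a single regression line, gives more weight to recent months. The function name, the smoothing parameters `alpha` and `beta` and the sample counts are all hypothetical.

```python
def holt_forecast(values, steps, alpha=0.5, beta=0.3):
    """Forecast `steps` future points with Holt's linear trend method.

    alpha smooths the level and beta smooths the trend; both are
    illustrative defaults, not values taken from the thesis.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Initialise the level with the first value and the trend with the first difference.
    level = values[0]
    trend = values[1] - values[0]
    for y in values[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # The h-step-ahead forecast continues the last level along the last trend.
    return [level + h * trend for h in range(1, steps + 1)]


# Hypothetical monthly counts for one disease (illustrative only).
monthly_counts = [30, 34, 33, 38, 41, 45]
forecast = holt_forecast(monthly_counts, steps=2)
print("forecast for the next 2 months:", [round(v, 1) for v in forecast])
```

With a rising series such as this one the forecast keeps increasing, which matches the kind of upward prediction described for acid indigestion upset stomach.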


Bibliography

[1] https://www.wikipedia.org [Date Accessed: 15 January 2016]

[2] "Conceptual framework for spatial analysis" http://www.powershow.com/view/9bbc7-OWNmZ/Conceptual_frameworks_for_spatial_analysis_powerpoint_ppt_presentation [Date Accessed: 15 January 2016]

[3] Pete Chapman, Julian Clinton, Randy Kerber, "Step by step data mining guide CRISP DM" https://the-modeling-agency.com/crisp-dm.pdf [Date Accessed: 16 January 2016]

[4] Jerry Asaana Anamzui-Ya, "Spatial Analysis and Mapping of Cholera Causing Factors in Kumasi, Ghana", March 2012 https://www.itc.nl/library/papers_2012/msc/gfm/asaana.pdf [Date Accessed: 18 January 2016]

[5] http://training.esri.com/Courses/StartGIS_10/index.cfm(course_tutorial) [Date Accessed: 18 January 2016]

[6] K. Rajalakshmi, Dr. S. S. Dhenakaran, N. Roobini, "Comparative Analysis of K-Means Algorithm in Disease Prediction", International Journal of Science, Engineering and Technology Research (IJSETR), Volume 4, Issue 7, July 2015 http://ijsetr.org/wp-content/uploads/2015/07/IJSETR-VOL-4-ISSUE-7-2697-2699.pdf [Date Accessed: 15 February 2016]

[7] https://courseware.e-education.psu.edu/courses/bootcamp/lo09/04.html [Date Accessed: 15 February 2016]


[8] http://www.colorado.edu/geography/gcraft/notes/intro/intro.html [Date Accessed: 19 February 2016]

[9] copiadewhatisgisjaimenievesignaciovazquez-111116134133-phpapp01.pdf http://www.slideshare.net/ESRI/what-is-gis-10190355 [Date Accessed: 19 February 2016]

[10] "Geographic Data Mining and Knowledge Discovery, Research Monographs in GIS", Taylor and Francis, 2001. http://www.dbs.ifi.lmu.de/Publikationen/Papers/Chapter7.revised.pdf

[11] "Frequency Measures Used in Epidemiology" https://www.uic.edu/sph/prepare/courses/ph490/resources/epilesson02.pdf [Date Accessed: 21 February 2016]

[12] Martin Ester, Hans-Peter Kriegel, Jörg Sander (University of Munich), "Algorithms and Applications for Spatial Data Mining" [Date Accessed: 21 February 2016]

[13] https://www.google.co.in/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=geospatial%20analysis%20in%20sap%20lumira

[14] "Spatial data mining" www.cs.sjsu.edu/faculty/.../Spatial%20Data%20Mining_CS157B_Satoru_Hozumi.ppt [Date Accessed: 21 February 2016]

[15] Xin Wang (Department of Geomatics Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada T2N 1N4), Howard Hamilton (Department of Computer Science, University of Regina, Regina, Canada S4S 0A2), "Using Clustering Methods in Geospatial Information Systems" http://www.ucalgary.ca/wangx/files/wangx/geoinformaticsxwhh2008.pdf [Date Accessed: 21 February 2016]



[16] http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/e01da20b-db14-3210b984-83e648239c61QuickLink=index&overridelayout=true&60155311981591 (SAP Lumira tutorials) [Date Accessed: 29 March 2016]

[17] Kaijian Xia, Yue Wu, Xiaogang Ren (Changshu No.1 People's Hospital, Jiangsu, Changshu, China), Yong Jin (School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, China), "Research in Clustering Algorithm for Diseases Analysis" [Date Accessed: 29 March 2016]

[18] Selwyn Zhou (BI Consultant, ATCG Solutions), "Introduce Basic Algorithm for Predictive Analysis" http://www.atcgsolutions.com/blog/introduce-basic-algorithm-for-predictive-analysis [Date Accessed: 31 March 2016]

[19] http://scn.sap.com/docs/DOC-53142 [Date Accessed: 31 March 2016]

[20] David M. Lane, "Introduction to linear regression" http://onlinestatbook.com/2/regression/intro.html [Date Accessed: 31 March 2016]

[21] SAP Lumira, SCN, http://scn.sap.com/community/lumira [Date Accessed: 31 March 2016]

[22] "Linear regression analysis and web intelligence" http://www.gulland.com/wp/?p=534 [Date Accessed: 1 April 2016]

[23] https://cp.hana.ondemand.com/dps/d/preview/5a4bc2cea197421a8ce8474ef803e596/1.28/en-US/frameset.htm?6630b086c9444170b5ebe0f52cbdc977.html

[24] http://www.esri.com [Date Accessed: 7 April 2016]

[25] https://www.otexts.org/fpp/6/1 [Date Accessed: 7 April 2016]

