Geospatial Analysis of Diseases for Ratnakala
Health ERP™
Submitted in partial fulfilment of the requirements for the degree of
BACHELOR OF TECHNOLOGY
in
Computer Engineering
Submitted by
Shivani P. Desai
(201209100310055)
C.G.P.I.T., UKA Tarsadia University
Internal Guide
Asst. Prof. Rachna M. Patel
C.G.P.I.T., UKA Tarsadia University
External Guide
Mr. Ankit Moradiya
(C.E.O. Ratnakala Software Pvt. Ltd.)
Department of Computer Engineering & Information Technology
Chhotubhai Gopalbhai Patel Institute of Technology
UKA Tarsadia University, Bardoli
May, 2016
Contents
Confidential
Abstract
Acknowledgment
Contents
List of Tables
List of Figures
Abbreviations
1 Introduction
1.1 Health ERP
1.2 Data analysis
1.2.1 Data mining
1.3 Geospatial analysis
1.3.1 What is a spatial pattern?
1.3.2 Families of Spatial Data Mining Patterns
1.3.3 GIS
1.3.3.1 GIS datasets
1.3.4 Why is geospatial analysis of diseases needed?
1.3.5 Disease mapping
1.3.6 Geospatial analysis process
1.4 Software requirement
2 Problem
2.1 Problem Statement
2.2 Approach
2.2.1 SAP Lumira
2.2.2 ESRI MAPS
3 Literature review
3.1 Spatial Epidemiology
3.1.1 Framework for spatial analysis
3.2 GIS
3.2.1 Layered Technology
3.3 Disease clustering
4 Data mining process model
4.1 CRISP-DM
4.1.1 Business Understanding
©2016 Ratnakala Software Pvt. Ltd. All Rights Reserved
4.1.2 Data Understanding
4.1.3 Data Preparation
4.1.4 Modeling
4.1.5 Evaluation
4.1.6 Deployment
5 Algorithm
5.1 Epidemiology
5.1.1 Frequency based measures used in epidemiology
5.2 Spatial data mining
5.2.1 Clustering
5.3 Predictive Analysis
5.3.1 Linear regression
5.3.2 Forecasting
6 Database Design
6.1 Introduction
6.2 Database Introduction
6.2.1 res diseases
6.2.2 res area wise diseases
7 Implementation
7.1 Calculating risk ratio
7.1.1 By entering two months and the disease for which comparison will be done. According to that risk ratio will be generated.
7.1.2 When there is new entry in the database, risk ratio will be added to the database according to the algorithm.
7.2 Google map APIs
7.3 Visualizations in SAP Lumira
7.3.1 Analysis of diseases in Surat
7.3.1.1 Cancer analysis
7.3.1.2 Swine flu analysis
7.3.2 Analysis of diseases in Gujarat
7.3.3 Analysis of diseases in 8 cities of India
8 Conclusion
9 Future prediction
Bibliography
List of Figures
1.1 Geospatial analysis process
3.1 Conceptual framework of spatial epidemiological data analysis
3.2 GIS
4.1 CRISP-DM
7.1 Enter details to find risk ratio
7.2 Risk ratio as a result
7.3 Disease count for different diseases in Surat
7.4 Acne spreads in Dabholi day by day
7.5 Acne spreads in quarter 1
7.6 Acne spreads in quarter 2
7.7 Acne spreads in quarter 3
7.8 Acne spreads in quarter 4
7.9 List of disease count of acne in Adajan on different days
7.10 Comparison of acne, actinic keratoses and acute myocardial infarction in Surat
7.11 Comparison of acne and myocardial infarction spreading month by month in Dabholi
7.12 Total disease count of diseases in different zones of Surat
7.13 Risk of diseases in Surat
7.14 Disease count of different cancers in Surat
7.15 Cancers spread day by day in Surat
7.16 Disease count of different cancers in different landmarks in Surat
7.17 Swine flu spreads day by day in Surat
7.18 Swine flu affects different landmarks in Surat
7.19 Total disease count in different localities of Gujarat
7.20 Top 5 diseases that affect Gujarat the most
7.21 Bottom 3 diseases that affect Gujarat the least
7.22 Anxiety affects different localities of Gujarat
7.23 Comparison of mental diseases’ effects in different in Gujarat
7.24 Comparison of different cancers in Gujarat
7.25 Blood pressure patients in Gujarat in quarter 1
7.26 Blood pressure patients in Gujarat in quarter 2
7.27 Blood pressure patients in Gujarat in quarter 3
7.28 Blood pressure patients in Gujarat in quarter 4
7.29 Most affected landmark of Gujarat
7.30 Most affected and least affected locality of Gujarat
7.31 Comparison of acne, acid indigestion upset stomach, actinic keratoses, acute myocardial infarction in Gujarat
7.32 Total disease count for different localities of India
7.33 Comparison of spreading of leptospirosis between Surat and Bangalore
7.34 Top 10 disease count in India
7.35 Top 10 disease count in India
7.36 Swine flu spreads in India in February
7.37 Swine flu spreads in India in March
7.38 Swine flu spreads in India in April
7.39 Swine flu spreads in India in May
7.40 Swine flu spreads in India in July
7.41 Swine flu spreads in India in August
7.42 Swine flu spreads in India in October
7.43 Swine flu spreads in India in December
7.44 Most and least affected state by diseases in India
7.45 Comparison of anemia, asthma, bacterial infections, eye allergies and keratitis in India
7.46 Comparison of anemia, asthma, bacterial infections, eye allergies and keratitis in India
7.47 Comparison of allergies, alopecia, altitude illness, Alzheimer and amblyopia in India
7.48 Disease count for different diseases in different localities in India
7.49 Bottom 3 disease count in India
9.1 Prediction of diseases in India in 2016 by linear regression
9.2 Prediction of diseases in India in 2016 by forecasting
9.3 Prediction of acid indigestion upset stomach in India in 2016 by linear regression
9.4 Prediction of acid indigestion upset stomach in India in 2016 by forecasting
Abbreviations
ERP Enterprise Resource Planning
GPS Global Positioning System
GIS Geographic Information System
CRISP-DM CRoss Industry Standard Process for Data Mining
ESRI Environmental Systems Research Institute
DBMS Data Base Management System
API Application Programming Interface
Company Profile
Ratnakala Software Pvt. Ltd.
205/206, Luxuria Business Hub,
Before VR Mall,
Piplod Road,
Surat – 395007,
Gujarat, India.
The company began as Monali Solutions and became Ratnakala Software Pvt. Ltd.
on 27 August 2014. The objectives of the company are to offer consultancy,
advisory and all related services in the field of Information Technology,
including computer hardware and software, software development, data
communication, telecommunication, manufacturing and process control and
automation, hardware selection, system design, manpower selection,
implementation and training, and to spread computer literacy and computer-aided
education in rural and urban areas through the application of modern techniques.
The company specializes in developing mobile applications for the two main
mobile platforms, Android and iOS, for all device models, whether tablets or
phones.
The company develops mobile applications for end users as well as client-specific
applications. It also develops ERPs for institutes, to optimize their use of
resources and support their growth in the real-world market.
The company is a subsidiary of Ratnakala Exports Company, one of the world’s
largest exporters of polished diamonds.
Chapter 1
Introduction
1.1 Health ERP
Health ERP has been a valued solution for decision making in treatment. The
software has helped improve operations, with enhancements to productivity,
profitability, growth and overall business processes. In the coming years, ERP
will be in demand in the health care industry.
The key benefits that ERP provides to the health care sector are business
intelligence, better patient care, and reduced operational costs. Health ERP is
software that eases business work and the treatment process in the health
industry through graphical analysis and decision making. Health involves
activities that work for society and people by keeping people healthy,
protecting the environment, making sure that the water and food supply are
safe, and providing sufficient health services. Having an ERP system in place
eliminates duplication and manual processes and proactively increases patient
safety through the use of efficient and effective information systems.
1.2 Data analysis
Analysis of data is a process of inspecting, cleaning, transforming, and modeling
data with the goal of discovering useful information, suggesting conclusions, and
supporting decision-making. Data Analysis refers to breaking a whole into its
separate components for individual examination. Data analysis is a process for
obtaining raw data and converting it into information useful for decision-making
by users. Data is collected and analyzed to answer questions, test hypotheses or
disprove theories.[1]
1.2.1 Data mining
Data mining is a particular data analysis technique that focuses on modeling and
knowledge discovery for predictive rather than purely descriptive purposes. The
overall goal of the data mining process is to extract information from a data set
and transform it into an understandable structure for further use.[1]
1.3 Geospatial analysis
Geospatial analysis, or just spatial analysis, is an approach to applying
statistical analysis and other analytic techniques to data that has a
geographical or spatial aspect. Such analysis typically employs software capable
of rendering maps, processing spatial data, and applying analytical methods to
terrestrial or geographic datasets, including the use of geographic information
and geomatics.
It is the gathering, display and manipulation of imagery, GPS, satellite
photography and historical data, described explicitly in terms of geographical
coordinates or implicitly in terms of street address, postal code or forest
stand identifier, as they are applied to geographic models.
Generally, geospatial data is not 2D or 3D data. It is high-dimensional data.[1]
1.3.1 What is a spatial pattern?
• A frequent arrangement, configuration, composition, regularity.
• A rule, law, method, design, description.
• A major direction, trend, prediction.
1.3.2 Families of Spatial Data Mining Patterns
• Location prediction:
– Where will a phenomenon occur?
– Where will disease occur?
– Which diseases are predictable in a particular spatial location?
– What should be recommended to health care organizations to control
disease in an affected area?
• Interaction:
– Which subsets of spatial phenomena interact?
– Which spatial events are correlated with another spatial event?
• Hotspot:
– Which locations are unusual or share commonalities?
– Which locations are spatially clustered?
– Which diseases are common in a particular spatial area?
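As a minimal sketch of the hotspot question, the following Python counts cases per locality and flags the localities whose count lies well above the average. The records, the locality names used as sample data, and the threshold rule (mean plus one standard deviation) are illustrative assumptions, not the method used in this project.

```python
from collections import Counter
from statistics import mean, pstdev

def find_hotspots(records, k=1.0):
    """Return localities whose case count exceeds mean + k * std of all counts."""
    counts = Counter(locality for locality, _disease in records)
    values = list(counts.values())
    threshold = mean(values) + k * pstdev(values)
    return sorted(loc for loc, n in counts.items() if n > threshold)

# Hypothetical (locality, disease) records, for illustration only.
records = (
    [("Adajan", "acne")] * 2
    + [("Dabholi", "acne")] * 9
    + [("Piplod", "acne")] * 3
    + [("Varachha", "acne")] * 2
)
hotspots = find_hotspots(records)
```

With these sample counts only one locality exceeds the threshold. A real analysis would normally also account for population size and spatial adjacency, which this sketch ignores.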
1.3.3 GIS
A Geographic Information System (GIS) is a large domain that provides a variety
of capabilities designed to capture, store, manipulate, analyze, manage, and
present all types of geographical data, and utilizes geospatial analysis in a
variety of contexts, operations and applications.
1.3.3.1 GIS datasets
GIS datasets come as layers; in this case, there are layers for diseases and
landmarks. Layers have features. A GIS layer has two views:
1. Map view: a visual representation of the data and of a particular attribute
of the dataset.
2. Data view: used to create a smaller dataset from a large dataset using a
query tool.[2]
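The data view idea, deriving a smaller dataset from a larger one with a query, can be sketched in Python. The field names and rows below are hypothetical and stand in for whatever attributes the disease layer actually carries; the `query` helper is likewise an illustration and not part of any GIS product.

```python
def query(rows, **conditions):
    """Return only the rows whose fields match every given condition."""
    return [
        row for row in rows
        if all(row.get(field) == value for field, value in conditions.items())
    ]

# Hypothetical rows of a disease layer.
layer = [
    {"disease": "swine flu", "landmark": "Adajan", "cases": 4},
    {"disease": "acne", "landmark": "Adajan", "cases": 7},
    {"disease": "swine flu", "landmark": "Dabholi", "cases": 2},
]

swine_flu_rows = query(layer, disease="swine flu")
adajan_swine_flu = query(layer, disease="swine flu", landmark="Adajan")
```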
1.3.4 Why is geospatial analysis of diseases needed?
Health care organizations will be able to analyze spatial and time data to predict
movements of disease outbreaks over time and adequately prepare for potential
epidemics before they occur.
1.3.5 Disease mapping
Disease mapping is often carried out to investigate the geographical distribution of
disease burden. Area-specific estimates of risk may inform public health resource
allocation by estimating the disease burden in specific areas, and the informal
comparison of risk maps with exposure maps may provide clues to generate
hypotheses. Disease mapping provides information on a measure of disease
occurrence across a geographic space. Disease maps provide a rapid visual
summary of complex geographic information.[4]
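To illustrate what an area-specific estimate might look like, the sketch below computes a crude rate (cases per 1,000 residents) for each area. The area names, case counts and populations are hypothetical, and a crude rate is only one of several possible measures of disease occurrence.

```python
def rate_per_1000(cases, population):
    """Crude disease rate: cases per 1,000 residents of an area."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * cases / population

# Hypothetical case counts and populations per area.
areas = {
    "Area A": {"cases": 30, "population": 15000},
    "Area B": {"cases": 12, "population": 3000},
}

rates = {name: rate_per_1000(a["cases"], a["population"]) for name, a in areas.items()}
highest = max(rates, key=rates.get)
```

In this sample, Area A has more cases but Area B has the higher rate, which is why a disease map usually shows a rate or risk measure and not the raw count alone.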
1.3.6 Geospatial analysis process
1. Requirement: Gathering data as per the customer's requirements.
Figure 1.1: (Geospatial analysis process)
2. Obtain Primary Data: Obtain data and evaluate spatial and temporal
suitability and availability.
3. Get Related Data: Obtain reference and supporting materials including
previous information, mapping data, imagery, technical data, operational
data. Evaluate spatial and temporal suitability and availability of each.
Prepare this data for use as necessary.
4. Arrange Materials in Work Environment: Import and display mate-
rials in geographic information system. Query and view as necessary. Use
background maps, images, and other data such as elevation to enhance mean-
ing.
5. Conduct Overall Familiarization/Orientation: Perform initial layout
including move to map, move text to map, determine and set the proper
scale, perform ortho-rectification and rubber-sheeting of inputs as necessary,
select future data, eliminate data that causes unnecessary visual clutter.
6. Conduct exploitation and analysis: Perform analysis and extraction to
include identify features and objects, updates, identify and examine changes,
extract features, examine alternative approaches, evaluate result.
7. Manipulate Data: Manipulate geospatial information including annotate
with positions and routes, add textual information, edit data as necessary to
remove anomalies, apply tools and applications such as line of sight, drape
imagery, identify decision points and conflicts, adjust data, ortho-rectify.
8. Wrap-up/Report: Complete product and report including generate products,
send to storage, add grids, generate additional data requests, generate
value added products.
1.4 Software requirement
1. SAP Lumira
2. ArcGIS online
Chapter 2
Problem
2.1 Problem Statement
Nowadays, many diseases occur in different areas, so geospatial analysis of
diseases in those areas is needed. With that analysis, health care
organizations will be able to analyze spatial and time data to predict movements
of disease outbreaks over time and adequately prepare for potential epidemics
before they occur. Health care institutes keep adding to their repositories of
patients' disease-related information, which could be made more useful by
carrying out relational analysis.
2.2 Approach
2.2.1 SAP Lumira
The approach includes a geospatial analysis using GIS software and SAP
Lumira, a self-service analysis tool that offers several geo-maps, which
combine the database with geographic maps. Different geo-maps are used for
locating different diseases in different areas:
• Geo Bubble Chart
• Geo Choropleth Chart
• Geo Pie Chart
• Geo map
By applying different dimensions and measures we can analyze the database.
There are two approaches:
1. GIS approach: The main UI is a map, and data from SAP Lumira is accessed
from the map.
2. SAP Lumira approach: The map is embedded within the SAP Lumira UI.
SAP Lumira is capable of geospatial analysis, but it offers only a limited set
of geo-maps.
2.2.2 ESRI MAPS
SAP Lumira has allowed plotting data on basic map outlines using longitude and
latitude coordinates for some time, but not at street level as we are used to
in, say, Google Maps. However, in SAP Lumira 1.17, the most recent version, this
has changed: SAP has announced a partnership with Esri that enables integration
of Esri's ArcGIS Online service within SAP Lumira.
Esri, a provider of mapping software, helps you understand and visualize data to
make decisions based on the insights from geo-charts.
With Esri Maps integrated in SAP Lumira you can enable your geo-business data
with intuitive mapping and analytical tools. You will quickly discover new patterns
in the geo charts within Lumira and effortlessly share your insights across the
organization for greater collaboration. One of the nice features of Esri that is not
present in the native geo implementation is the concept of layers. You can have a
choropleth map and then do a bubble plot on top of that. Another feature (also
not present in the native geo offering) is the ability to show different map views:
topographic, street, satellite, gray.[5]
Chapter 3
Literature review
3.1 Spatial Epidemiology
Spatial epidemiology is the description and analysis of the geographic, or
spatial, variations in disease with respect to demographic, environmental,
behavioral, socioeconomic, genetic, and risk factors. The spread of infectious
diseases is closely associated with the concepts of spatial and spatiotemporal
proximity, as individuals who are linked in a spatial and a temporal sense are
at a higher risk of getting infected. Proximity to environmental risk factors is
therefore important. Thus, knowledge of the spatial and temporal variations of
diseases, and characterization of their spatial structure, is essential for the
epidemiologist to better understand the population's interactions with its
environment.
3.1.1 Framework for spatial analysis
Spatial epidemiology comprises a wide range of methods. Determining which
ones to use can be challenging. Four groups, as illustrated in Figure 3.1, can
be used to define a logical, sequential process for conducting spatial analysis:
1. Data
Figure 3.1: Conceptual framework of spatial epidemiological data analysis
The objectives of spatial epidemiological analysis are the description of spa-
tial patterns, identification of disease clusters, and explanation or prediction
of disease risk. Central to these objectives is the need for data. Geographic
data systems include georeferenced feature data and attributes, be they
points or areas. These data are obtained from field surveys, remotely sensed
imagery, or existing data generated either by government organizations or by
those closely linked to government, such as cadastral, postal, meteorological
or national census statistics and health organizations.
2. GIS and DBMS
Management of the data is performed using GIS and database management
systems (DBMS), and is of relevance throughout the various phases of spatial
data analysis. GIS provide a platform for managing these data, computing
spatial relationships such as proximity to source of infection, connectivity
and directional relationships between spatial units, and visualizing both the
raw data and results from spatial analysis within a cartographic context.
3. Visualization and exploration
Visualization and exploration cover techniques that focus solely on examining
the spatial dimension of the data. Visualization tools produce maps that
describe spatial patterns and that are useful both for stimulating more
complex analyses and for communicating the results of such analyses.
Exploration of spatial data involves the use of statistical methods to
determine whether observed patterns are random in space. However, there is some
overlap between visualization and exploration, since meaningful visual
presentation requires the use of quantitative analytical methods.
4. Modeling
Modeling covers analytical procedures that simulate real-world conditions
within a GIS using the spatial relationships of geographic features. Modeling
introduces the concept of cause-effect relationships, using both spatial and
non-spatial data sources to explain or predict spatial patterns.[7]
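One of the spatial relationships named in the framework, proximity to a source of infection, can be sketched with the haversine great-circle distance. The coordinates, case names and the 5 km radius below are hypothetical; a GIS would normally provide this computation, so the code only illustrates the idea.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(cases, source, radius_km):
    """Return the names of cases lying within radius_km of the source point."""
    return [name for name, lat, lon in cases
            if haversine_km(lat, lon, source[0], source[1]) <= radius_km]

# Hypothetical coordinates, for illustration only.
source = (21.17, 72.83)
cases = [("case-1", 21.18, 72.83), ("case-2", 21.60, 72.83), ("case-3", 21.17, 72.85)]
nearby = within_radius(cases, source, radius_km=5.0)
```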
3.2 GIS
GIS is an information system (hardware, software, data) for geographical
datasets that enables us to apply many analysis models to generate derived
information that can be visualized as maps. A Geographic Information System
helps us understand our world, answer questions about our environment and
supports us during decision-making.
Figure 3.2: (GIS)
3.2.1 Layered Technology
GIS provides powerful tools for addressing geographical and environmental issues.
Consider the schematic diagram below. Imagine that the GIS allows us to arrange
information about a given region or city as a set of maps, with each map displaying
information about one characteristic of the region. In the case below, a set of maps
that will be helpful for urban transportation planning has been gathered.
Each of these separate maps is referred to as a layer, coverage, or level, and each
layer has been carefully overlaid on the others so that every location is precisely
matched to its corresponding locations on all the other maps. The bottom layer
of this diagram is the most important, for it represents the grid of a locational
reference system (such as latitude and longitude) to which all the maps have been
precisely registered.
Once these maps have been registered carefully within a common locational
reference system, information displayed on the different layers can be compared
and analyzed in combination. Transit routes can be compared to the location of
shopping malls, population density to centers of employment. In addition, single
locations or areas can be separated from surrounding locations, as in the diagram
below, by simply cutting all the layers of the desired location from the larger
map. Whether for one location or the entire region, GIS offers a means of
searching for spatial patterns and processes.
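The layering idea can be sketched in Python by treating each layer as a mapping from a grid cell to one attribute and combining the layers cell by cell. The grid cells, layer names and values are hypothetical, and `overlay` is an illustrative helper, not a function of any GIS package.

```python
def overlay(*layers):
    """Combine layers keyed by the same grid cell into one record per cell.

    Only cells present in every layer are kept, mirroring maps that have been
    registered to a common reference grid.
    """
    common_cells = set(layers[0]).intersection(*layers[1:])
    return {cell: tuple(layer[cell] for layer in layers) for cell in sorted(common_cells)}

# Hypothetical layers: grid cell (row, col) -> attribute value.
population_density = {(0, 0): 1200, (0, 1): 300, (1, 0): 4500}
disease_count = {(0, 0): 6, (1, 0): 40, (1, 1): 3}

combined = overlay(population_density, disease_count)
```

Looking up a single cell in `combined` corresponds to cutting all layers at one location: every attribute registered to that cell comes back together.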
There are 3 perspectives:
1. The Data: A single data repository, the geodatabase, holding every geographic
data set, business logic and behavior.
2. The Map: A set of geometric features that represents a geographic reality.
It is a window for exploring the data.
3. The Model: Analysis tools that create new geographic information from
existing data.[3][9]
3.3 Disease clustering
Clustering is the process of distinguishing and categorizing objects according
to their characteristics and certain criteria. Cluster analysis is a branch of
pattern recognition and is a form of unsupervised classification. It is widely
used in pattern recognition and image segmentation because the method is simple
and efficient and needs no training process. According to the calculation
method, clustering algorithms are divided into the following categories:
partition-based algorithms, density-based algorithms, grid-based algorithms and
model-based algorithms.
The general processing flow of cluster analysis for data is shown in figure 1.
First, we preprocess the data set that we want to analyze: we remove redundant
and noisy information, reasonably fill in missing features, and perform feature
extraction to obtain the principal components of the data and so reduce the
dimension of the calculation. Then we select the appropriate model and design a
clustering algorithm based on the specific requirements and application
scenarios.
The clustering results on the test data are then analyzed against the
corresponding requirements. The clustering results can carry certain guiding
significance. If the clustering results do not satisfy the requirements, we
need to recollect the data from multiple dimensions and correct the model and
algorithm again to achieve reasonable and accurate analysis results.
Finally, we can mine new knowledge, which has certain guiding significance for
real-world applications. Cluster analysis in the medical field is still at an
early stage of development. However, with the informatization and digitization
of medical diagnosis and management systems, the medical industry has
accumulated massive and exploitable medical information.
Using data mining methods to mine rules and implicit knowledge models from
massive databases is very important in the decision-making process of medical
diagnosis. Applications of cluster analysis in the medical field include
exploration and verification of clinical efficacy, type identification of
disease, and medical image segmentation. Efficacy exploration and validation
works by clustering the clinical data of patients over the course of treatment
and comparing the cluster results at different times to determine the
effectiveness and feasibility of the treatment; it can also reveal individual
differences in response to different treatments. The type identification of
disease allows us to understand the pathogenesis of the disease, and it can
provide a scientific basis for early prevention and post-treatment.[17]
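As a minimal sketch of the clustering step in this flow, the following Python implements k-means on 2-D points. K-means is chosen here only as a familiar example; the text does not state which algorithm the project uses. The sample points are hypothetical, and taking the first k points as initial centroids is a simplification made so that the result is reproducible.

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster 2-D points with k-means.

    Uses the first k points as initial centroids so the result is deterministic.
    Returns (centroids, labels).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels

# Hypothetical case locations forming two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (2.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, k=2)
```

If the resulting clusters do not meet the requirements, the flow described above loops back: the data or the choice of k is revised and the algorithm is run again.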
Chapter 4
Data mining process model
4.1 CRISP-DM
CRoss Industry Standard Process for Data Mining, commonly known by its acronym
CRISP-DM, is a data mining process model that describes commonly used ap-
proaches that data mining experts use to tackle problems. CRISP-DM model for
data mining is divided into six phases. The sequence of the phases is not strict and
moving back and forth between different phases is always required. The arrows
in the process diagram indicate the most important and frequent dependencies
between phases.
4.1.1 Business Understanding
This initial phase focuses on understanding the project objectives and require-
ments from a business perspective, and then converting this knowledge into a
data mining problem definition, and a preliminary plan designed to achieve the
objectives. A decision model, especially one built using the Decision Model and
Notation standard, can be used.
Figure 4.1: (CRISP-DM)
4.1.2 Data Understanding
The data understanding phase starts with an initial data collection and proceeds
with activities in order to get familiar with the data, to identify data quality
problems, to discover first insights into the data, or to detect interesting subsets
to form hypotheses for hidden information.
4.1.3 Data Preparation
The data preparation phase covers all activities to construct the final data set
(data that will be fed into the modeling tools) from the initial raw data. Data
preparation tasks are likely to be performed multiple times, and not in any
prescribed order. Tasks include table, record, and attribute selection as well
as transformation and cleaning of data for modeling tools.
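The preparation tasks listed above (record selection, attribute selection, cleaning) can be sketched as one small Python function. The field names and raw rows are hypothetical, and the cleaning rules (trim whitespace, lower-case text, drop rows with missing values) are assumptions chosen for illustration.

```python
def prepare(raw_rows, wanted_fields):
    """Select the wanted attributes, normalise text, and drop unusable records."""
    prepared = []
    for row in raw_rows:
        # Record selection: skip rows missing any wanted attribute.
        if any(row.get(field) in (None, "") for field in wanted_fields):
            continue
        # Attribute selection and cleaning: keep wanted fields, tidy strings.
        clean = {}
        for field in wanted_fields:
            value = row[field]
            clean[field] = value.strip().lower() if isinstance(value, str) else value
        prepared.append(clean)
    return prepared

# Hypothetical raw records with inconsistent text and a missing value.
raw = [
    {"disease": " Acne ", "area": "Adajan", "count": 3, "note": "n/a"},
    {"disease": "", "area": "Dabholi", "count": 1, "note": ""},
    {"disease": "Swine Flu", "area": "DABHOLI ", "count": 5},
]
dataset = prepare(raw, ["disease", "area", "count"])
```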
4.1.4 Modeling
In this phase, various modeling techniques are selected and applied, and their pa-
rameters are calibrated to optimal values. Typically, there are several techniques
for the same data mining problem type. Some techniques have specific require-
ments on the form of data. Therefore, stepping back to the data preparation phase
is often needed.
4.1.5 Evaluation
At this stage in the project you have built a model (or models) that appears to
have high quality, from a data analysis perspective. Before proceeding to final
deployment of the model, it is important to more thoroughly evaluate the model,
and review the steps executed to construct the model, to be certain it properly
achieves the business objectives. A key objective is to determine if there is some
important business issue that has not been sufficiently considered. At the end of
this phase, a decision on the use of the data mining results should be reached.
4.1.6 Deployment
Creation of the model is generally not the end of the project. Even if the purpose
of the model is to increase knowledge of the data, the knowledge gained will need
to be organized and presented in a way that is useful to the customer. Depending
on the requirements, the deployment phase can be as simple as generating a report
or as complex as implementing a repeatable data scoring or data mining process.
In many cases it will be the customer, not the data analyst, who will carry out
the deployment steps. Even if the analyst deploys the model it is important for
the customer to understand up front the actions which will need to be carried out
in order to actually make use of the created models.[3]
Chapter 5
Algorithm
5.1 Epidemiology
Epidemiology is the study of the patterns, causes, and effects of health and disease
conditions in different areas. It is the cornerstone of public health, and shapes
policy decisions and evidence-based practice by identifying risk factors for disease
and targets for preventive health care. Epidemiologists help with study design,
collection, and statistical analysis of data, and interpretation and dissemination
of results.
5.1.1 Frequency based measures used in epidemiology
Epidemiologists use a variety of methods to summarize data. One fundamental
method is the frequency distribution: a table that
displays how many people fall into each category of a variable, such as landmark
or disease status. There are several frequency-based measures; in this project the risk
ratio is calculated.
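As a minimal sketch of a frequency distribution, the snippet below counts how many cases fall into each category of one variable. The records and the helper name `frequency_distribution` are hypothetical and only illustrate the idea; they are not taken from the project data.

```python
from collections import Counter

# Hypothetical records: one (landmark, disease_name) pair per registered case.
cases = [
    ("Adajan", "acne"),
    ("Adajan", "convulsions"),
    ("Dabholi", "acne"),
    ("Adajan", "convulsions"),
    ("Dabholi", "emphysema"),
]

def frequency_distribution(records, field_index):
    """Count how many records fall into each category of one variable."""
    return Counter(record[field_index] for record in records)

by_landmark = frequency_distribution(cases, 0)  # cases per landmark
by_disease = frequency_distribution(cases, 1)   # cases per disease
```

Each resulting counter is one frequency table: the keys are the categories and the values are the number of cases in each.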
Epidemiologic data come in many forms and sizes. One of the most common
forms is a rectangular database made up of rows and columns. Each row contains
information about one individual and is called a record or observation. Each
column contains information about one characteristic, such as race or date of birth,
and is called a variable. The first column of an epidemiologic database
usually contains the individual's name, initials, or identification number, which
allows us to identify who is who.
The size of the database depends on the number of records and the number of
variables. A small database may fit on a single sheet of paper; larger databases
with thousands of records and hundreds of variables are best handled with a computer.
When we investigate an outbreak, we usually create a database called a
line listing. In a line listing, each row represents a case of the disease we are
investigating. Columns contain identifying information, clinical details, descriptive
epidemiology factors, and possible etiologic factors.
1. Risk ratio A risk ratio, or relative risk, compares the risk of some
health-related event, such as disease or death, in two groups. The two groups are
typically differentiated by demographic factors such as month (e.g., January
versus February) or by exposure to a suspected risk factor (e.g., occurrences
of disease). Often, the group of primary interest is labeled the
exposed group and the comparison group is labeled the "unexposed" group.
We place the group that we are primarily interested in in the numerator and
the group we are comparing it with in the denominator:
step 1: Find the ratio of the cases of one particular disease in one month to
all cases of all diseases in that month.
step 2: Find the same ratio for the next month: the cases of that disease divided by
all cases of all diseases in that month.
step 3: Take the ratio of the two values calculated in step 1 and step 2. This is
the risk ratio of one month compared with the other for that disease.
Risk ratio = risk of group of primary interest / risk of comparison
group
A risk ratio of 1.0 indicates identical risk in the two months. A risk ratio
greater than 1.0 indicates an increased risk for the numerator group, while a
risk ratio less than 1.0 indicates a decreased risk for the numerator group.[11]
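A small worked example may make the three steps concrete. The numbers are hypothetical (not taken from the project data): suppose disease d1 accounts for 10 of 200 cases registered in January and 8 of 320 cases registered in February.

```latex
\[
\text{risk}_{\text{Jan}} = \frac{10}{200} = 0.05, \qquad
\text{risk}_{\text{Feb}} = \frac{8}{320} = 0.025, \qquad
\text{Risk ratio} = \frac{\text{risk}_{\text{Jan}}}{\text{risk}_{\text{Feb}}}
                  = \frac{0.05}{0.025} = 2.0
\]
```

Under these assumed counts the risk ratio is greater than 1.0, so the share of d1 among all cases was twice as high in January as in February, even though the absolute counts (10 versus 8) are close.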
5.2 Spatial data mining
Spatial data mining is the process of discovering interesting and previously un-
known, but potentially useful patterns from large spatial data sets. Extracting
interesting and useful patterns from spatial datasets is more difficult than extract-
ing the corresponding patterns from traditional numeric and categorical data due
to the complexity of spatial data types, spatial relationships, and spatial autocor-
relation.
5.2.1 Clustering
Clustering is a process in which features are grouped into clusters. Given a
set of data points, each with a set of features, the points are grouped so
that data points in the same cluster are similar to each other, while points in separate
clusters differ from each other. Spatial clustering is the process of grouping
similar objects based on their distance, connectivity, or relative density in space,
and it has been employed for spatial analysis for years. In short, spatial clustering
is the process of discovering groups in large databases.
Spatial view: rows in a database correspond to points in a multi-dimensional space,
and visualization may reveal interesting groups. In hierarchical clustering, all points
start in one cluster, which is split and merged until a stop criterion is reached. In partition
clustering, we start with random central points, assign each point to its nearest central point
and update the central points, an approach with statistical rigor. In density clustering, clusters
are found based on the density of regions. Here, selecting a disease and a landmark
creates one cluster. Likewise, all the diseases for the different landmarks form
separate clusters. The size of each cluster is decided by its disease
count.[14]
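The partition clustering loop described above (random central points, assignment to the nearest central point, update of the central points) is the classic k-means procedure. The following is a minimal sketch of that loop for 2-D points, not the clustering that SAP Lumira performs internally; the function name, the fixed iteration count and the sample coordinates are assumptions made for illustration.

```python
import random

def k_means(points, k, iterations=20, seed=0):
    """Partition 2-D points into k clusters with the basic k-means loop."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen input points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        for i, (x, y) in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assignment[i] == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centers, assignment

# Hypothetical (latitude, longitude) pairs forming two separate groups.
points = [
    (21.19, 72.79), (21.20, 72.80), (21.19, 72.80),
    (21.24, 72.85), (21.25, 72.86), (21.24, 72.86),
]
centers, assignment = k_means(points, k=2)
```

Squared Euclidean distance on raw latitude and longitude is a simplification that is only reasonable over a small area such as a single city.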
5.3 Predictive Analysis
5.3.1 Linear regression
In statistics, linear regression is an approach for modeling the relationship between
a scalar dependent variable y and one or more explanatory variables (or
independent variables) denoted X. We can rarely expect the relationship between
two variables to be "perfect". There are always other variables that
affect the dependent (endogenous) variable. Differences in these other variables between
observations will cause some data points to lie above the regression line and others to
lie below it.
Consider three data points that do not lie on a common line: no single line passes
through all three. Choosing the line passing through
any two of the three points leaves one point off the line, so we say that there is
one degree of freedom in choosing the line. (In the case of only two points, there
are zero degrees of freedom; if we added a fourth point, there would be two
degrees of freedom.) In the case of only two data points, the regression line passes
through both points, so the residuals are zero: the data points do not deviate from
the line. With three or more data points we cannot find a line that makes all the
residuals zero, except in the unusual case where all the points happen to lie on the
same line.
1. Least squares methodology There are several different techniques for linear
regression analysis; here we use simple linear regression with
the method of least squares. We fit a straight line through the
data points that provides the best fit to those points. This line is given
by the equation
y = a + bx (5.1)
where y and x are our variables, e.g. disease count and year, month, quarter or day.
b is known as the gradient and is the amount by which y increases for every
unit increase in x; for example, if the disease count increases by 4 every day,
then b has the value 4. a is known as the intercept and is the point where
the straight line meets the y axis. In our example this would be the
disease count that the line predicts at x = 0, the baseline just before the 1st day of the month.
As well as calculating the two values a (intercept) and b (gradient), we also
want to calculate the correlation coefficient (denoted by r), which is a measure
of how well the points fit the straight line. r ranges from -1 to 1, and its sign
gives the direction of the relationship. Its magnitude is a value between
0 and 1, where, as a rule of thumb, a result of 0.5 or below means that there is little or no
linear relationship, while values above 0.8 mean that there is a strong
linear relationship.
The correlation coefficient is also known as the product-moment coefficient
of correlation or Pearson's correlation. Its square, r-squared, is also
commonly reported.
Gradient:
We begin by calculating the gradient b. This is given by the formula
$b = \operatorname{cov}(x, y) / \operatorname{var}(x)$ (5.2)
which is the covariance of x and y divided by the variance of x. The covariance
of x and y is given by
$\operatorname{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ (5.3)
and the variance of x is given by
$\operatorname{var}(x) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$ (5.4)
where $\bar{x}$ and $\bar{y}$ are the means of x and y and n is the number of data points.
Intercept:
Once we have found b we can then calculate the intercept (a) by
$a = \bar{y} - b\bar{x}$ (5.5)
That gives us the values we need for our straight line equation.
Correlation Coefficient:
To calculate our correlation coefficient we use
$r = \operatorname{cov}(x, y) / \sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}$ (5.6)
In SAP Lumira, you can apply a linear regression to your data to visualize a linear trend
or to predict future data based on that trend. Linear
regression uses a measure and a dimension that is part of a time hierarchy
(for example, Month) as its inputs. Use this algorithm to find
trends in data: it determines how an individual variable influences another
variable with the least squares methodology.[21]
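The gradient, intercept and correlation coefficient formulas can be sketched in a few lines of plain Python. This is an illustration only, not SAP Lumira's implementation. It assumes that covariance and variance are both divided by n, so that their ratio is the least-squares slope; the function name and the sample data are hypothetical.

```python
from math import sqrt

def least_squares_fit(xs, ys):
    """Return (a, b, r) for the line y = a + b*x fitted by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all normalised by n (assumption: population form).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    b = cov_xy / var_x                # gradient
    a = mean_y - b * mean_x           # intercept
    r = cov_xy / sqrt(var_x * var_y)  # correlation coefficient
    return a, b, r

# Hypothetical data: the disease count grows by 4 cases every day.
days = [1, 2, 3, 4, 5]
counts = [6, 10, 14, 18, 22]
a, b, r = least_squares_fit(days, counts)
```

For this perfectly linear sample the fit gives a gradient of 4, an intercept of 2 and a correlation coefficient of 1. The function would raise a division error if all x values (or all y values) were identical, which a production version would need to guard against.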
5.3.2 Forecasting
The forecasting capability in SAP Lumira lets you use historical data as the ba-
sis for predicting future values. The forecasting feature analyzes the trends and
cycles of a time series to predict future values. Forecasting uses a measure and
a dimension that is part of a time hierarchy (for example, Month) as its inputs.
You specify how many forecasted values you want the algorithm to produce. SAP
Lumira provides two algorithms for forecasting future data:
1. SAP Predictive Analytics: Time series analysis computes several models
that are compared for best results. It does this by breaking a time series
into four components:
• Trend: A trend exists when there is a long-term increase or decrease
in the data. It does not have to be linear. Sometimes we will refer to a
trend "changing direction" when it goes from an increasing trend
to a decreasing trend.
• Cycles: A cyclic pattern exists when data exhibit rises and falls that
are not of fixed period. The duration of these fluctuations is usually
at least 2 years.
• Fluctuations: An irregular rise and fall in number or amount,
that is, a variation in quantity over time.
• Information Residue: A residual in forecasting is the difference between
an observed value and its forecast based on other observations.
2. Triple Exponential Smoothing: Use this algorithm to smooth the source
data and find seasonal trends in data. A seasonal pattern exists when a series
is influenced by seasonal factors (e.g., the quarter of the year, the month, or
day of the week). Seasonality is always of a fixed and known period.[25][22]
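To make the idea of triple exponential smoothing concrete, here is a minimal sketch of the textbook additive Holt-Winters form, which tracks a level, a trend and one seasonal offset per position in the season. The source does not describe how SAP Lumira implements the algorithm, so the initialisation, the parameter names (`alpha`, `beta`, `gamma`, `season_length`) and the sample series are all assumptions.

```python
def triple_exponential_smoothing(series, season_length, alpha, beta, gamma, n_forecast):
    """Additive Holt-Winters smoothing; returns n_forecast predicted values."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Simple initialisation: level = mean of the first season,
    # trend = average per-step change between the first two seasons.
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    seasonals = [series[i] - level for i in range(m)]
    for t, value in enumerate(series):
        season = seasonals[t % m]
        last_level = level
        # Update level, trend and the seasonal offset for this position.
        level = alpha * (value - season) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (value - level) + (1 - gamma) * season
    n = len(series)
    # Forecast h+1 steps ahead: extrapolate the trend and reuse the seasonal offsets.
    return [level + (h + 1) * trend + seasonals[(n + h) % m] for h in range(n_forecast)]

# Hypothetical monthly counts repeating with a season of length 3 and no trend.
history = [10, 20, 30, 10, 20, 30, 10, 20, 30]
forecast = triple_exponential_smoothing(history, 3, 0.5, 0.5, 0.5, n_forecast=3)
```

For this purely periodic sample the forecast simply continues the pattern. On real data the three smoothing parameters would have to be tuned, for example by minimising the forecast error on held-out months.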
Chapter 6
Database Design
6.1 Introduction
Database design is the process of producing a detailed data model of a database.
This data model contains all the needed logical and physical design choices and
physical storage parameters needed to generate a design in a data definition lan-
guage, which can then be used to create a database. A fully attributed data model
contains detailed attributes for each entity.
In the majority of cases, the person designing a database has
expertise in database design rather than in the domain
from which the data to be stored is drawn (e.g. financial information, biological
information, etc.). Therefore, the data to be stored in the database must be
determined in cooperation with a person who does have expertise in that domain and
who is aware of what data must be stored within the system. This process is
generally considered part of requirements analysis.
Once a database designer is aware of the data which is to be stored within the
database, they must then determine where the dependencies lie within the data.
Sometimes changing one piece of data also changes other data that is not visible.
Once the relationships and dependencies amongst the various pieces of information
have been determined, it is possible to arrange the data into a logical structure
which can then be mapped into the storage objects supported by the database
management system.
6.2 Database Introduction
The tables referred to in the development of this module are as follows:
• res_diseases
• res_area_wise_diseases
6.2.1 res_diseases
The attributes of this table are as follows.
• disease_id : The unique id for each disease (e.g. d1, d2, d3, ...).
• disease_name : The disease name for a particular disease_id.
6.2.2 res_area_wise_diseases
The attributes of this table are as follows.
• patient_id : The unique id for the patient (e.g. PAT1, PAT2, ...).
• disease_id : The disease id by which the disease name is fetched from
the res_diseases table.
• month : The month in which the case is registered.
• year : The year in which the case is registered.
• landmark : The address of the patient, or the area of the particular case
affected by that disease.
• locality : The area or region from which the case is registered.
• province : The state from which the case is registered.
• country : The country from which the case is registered.
• latitude : The latitude of the particular area.
• longitude : The longitude of the particular area. Latitude and longitude
are used for locating the area on the geo-map.
• postcode : The pincode of the particular area.
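The two tables can be sketched as a data definition using Python's built-in `sqlite3` module. This is an assumed rendering, not the project's actual schema: the underscored table and column names, the column types, the foreign key and the sample row (including its coordinates and postcode) are illustrative choices, since the text lists only attribute names and meanings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the sketch
conn.executescript("""
CREATE TABLE res_diseases (
    disease_id   TEXT PRIMARY KEY,
    disease_name TEXT NOT NULL
);
CREATE TABLE res_area_wise_diseases (
    patient_id TEXT NOT NULL,
    disease_id TEXT NOT NULL REFERENCES res_diseases(disease_id),
    month      INTEGER,
    year       INTEGER,
    landmark   TEXT,
    locality   TEXT,
    province   TEXT,
    country    TEXT,
    latitude   REAL,
    longitude  REAL,
    postcode   TEXT
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO res_diseases VALUES (?, ?)", ("d1", "acne"))
conn.execute(
    "INSERT INTO res_area_wise_diseases VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("PAT1", "d1", 1, 2016, "Adajan", "Surat", "Gujarat", "India",
     21.1959, 72.7933, "395009"),
)

# Fetch the disease name for a case through disease_id, as described above.
row = conn.execute("""
    SELECT a.patient_id, d.disease_name, a.landmark
    FROM res_area_wise_diseases AS a
    JOIN res_diseases AS d ON d.disease_id = a.disease_id
""").fetchone()
```

Keeping the disease name only in `res_diseases` and joining on `disease_id` avoids repeating the name in every case row.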
Chapter 7
Implementation
7.1 Calculating risk ratio
The risk ratio can be obtained in two ways.
7.1.1 By entering two months and the disease for which the
comparison will be done; the risk ratio is generated
from these inputs.
Figure 7.1: (Enter details to find risk ratio)
Figure 7.2: Risk ratio as a result
7.1.2 When there is a new entry in the database, the risk ratio
is added to the database according to the algorithm.
As the risk ratio is important for comparing the risk of a particular disease in two months,
it can be added as a new column in the database. Here, the risk ratio of one month
relative to the immediately following month is calculated (i.e. the risk ratio of January relative
to February, of February relative to March, and so on). For that, the following
steps are implemented.
step 1: Count the cases of a particular disease in a particular month, then count
all the cases in that month (here, the cases of disease d1 in January and
the total cases of January). Find the ratio of the two.
step 2: Count the cases of that disease in the next month, then
count all the cases in that month (here, the cases of disease d1 in February
and the total cases of February). Find the ratio of the two.
step 3: Divide the ratio from step 1 by the ratio from step 2 and insert the
resulting risk ratio in the row matching that disease and the 1st month (i.e.
January).
step 4: Repeat from step 1 for the next pair of months. After completing one disease,
do the same for all the other diseases and store the values accordingly.
So, when a new record (row) is inserted, this function is called
and the value of the risk ratio is stored according to the month and disease. This file is then
imported into SAP Lumira.
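The steps above can be sketched as a single function that computes the risk ratio of each month relative to the following month for every disease. This is a minimal illustration, not the project's actual function: the name `monthly_risk_ratios`, the record layout and the sample data are hypothetical, and pairs with no cases in the following month are skipped because the ratio is undefined there.

```python
from collections import Counter

def monthly_risk_ratios(records, months):
    """records: list of (disease_id, month); months: months in calendar order.
    Returns {(disease_id, month): risk ratio of that month vs. the next month}."""
    disease_month = Counter(records)                      # cases per disease and month
    month_total = Counter(month for _, month in records)  # all cases per month
    diseases = {disease for disease, _ in records}
    ratios = {}
    for disease in diseases:
        for current, following in zip(months, months[1:]):
            if month_total[current] == 0 or month_total[following] == 0:
                continue  # no cases registered in one of the months
            risk_current = disease_month[(disease, current)] / month_total[current]
            risk_next = disease_month[(disease, following)] / month_total[following]
            if risk_next == 0:
                continue  # ratio undefined when the next month has no such cases
            ratios[(disease, current)] = risk_current / risk_next
    return ratios

# Hypothetical data: 4 cases in January and 4 cases in February.
records = (
    [("d1", "Jan")] * 2 + [("d2", "Jan")] * 2
    + [("d1", "Feb")] * 1 + [("d2", "Feb")] * 3
)
ratios = monthly_risk_ratios(records, ["Jan", "Feb"])
```

With these counts d1 makes up half of the January cases and a quarter of the February cases, so its January risk ratio is 2. The last month in the list has no following month, so it gets no value.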
7.2 Google map APIs
Google APIs is a set of application programming interfaces (APIs) developed by
Google which allow communication with Google services and their integration into
other services. The Google Maps APIs are accessed with an API key and return the longitude
and latitude for the selected address of the landmark. When an address
is selected, its longitude and latitude are inserted automatically, so locations are easy
to find and take less time to enter.
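The text does not say which Google Maps API the module calls. The sketch below assumes the Geocoding web service, which takes an `address` and a `key` parameter and returns JSON with the coordinates under `results[0].geometry.location`. The helper names, the placeholder key and the sample response values are hypothetical, and no network request is made here.

```python
import json
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(address, api_key):
    """Build the request URL for one address lookup."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})

def extract_lat_lng(response_text):
    """Return (latitude, longitude) from a geocoding response, or None on failure."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

url = build_geocode_url("Adajan, Surat", "YOUR_API_KEY")

# Hypothetical response body, trimmed to the fields used above.
sample_response = json.dumps({
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 21.1959, "lng": 72.7933}}}],
})
coordinates = extract_lat_lng(sample_response)
```

The returned pair can then be stored in the latitude and longitude columns of the case record.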
7.3 Visualizations in SAP Lumira
7.3.1 Analysis of diseases in Surat
1. How significant is the impact of different diseases in different areas?
Visualization: disease count by landmark and disease name (column chart)
Figure 7.3: Disease count for different diseases in Surat
Here, the X-axis represents the disease count, the Y-axis represents the landmark,
and color shows the disease name. This graph represents the disease count
of different diseases for a particular area, so from it we can analyze
which diseases have the most impact on that area.
Here, the disease counts of different diseases in Adajan can be seen. For example, there
are 37 cases of convulsion, the maximum among all the diseases. After
convulsion, Adajan is most affected by emphysema, acne, actinic keratosis and
digestive spasms, with disease counts of 28, 22, 20 and 15 respectively.
2. How do diseases spread day by day in different areas?
1) Visualization: disease count by landmark, day and disease name (line chart)
Figure 7.4: Acne spreads in Dabholi day by day
In the graph, the X-axis represents the disease count, the Y-axis represents the day,
and color shows the disease name. From the graph we can analyze
how a disease increases and decreases in different areas (a filter is applied), so
this line chart lets us analyze the spreading of diseases day by day.
Here, the chart shows how acne is spreading in Dabholi day by day. The disease count
shows the cases of that disease on a particular day. For example, on day 1 the disease count is
4 for acne, which decreases to 1 on day 2. After that it increases again
by 2, and so on. It is noticed that in the middle of the month
the effect of acne remains constant. The highest disease counts are on day 1 and day 15.
2) Visualization: disease count by landmark LongLat and disease name
(geo bubble map)
Figure 7.5: Acne spreads in quarter 1
Figure 7.6: Acne spreads in quarter 2
Figure 7.7: Acne spreads in quarter 3
Figure 7.8: Acne spreads in quarter 4
Here, the geo bubble map shows the areas, located by latitude and longitude (which
are geographic dimensions), affected by the diseases. It also shows the day-by-day
spreading of diseases. The size of the circle shows the disease count. Color
shows the disease name.
Here, in Adajan the disease count of acne is 4 in quarter 1, 6 in quarter 2,
6 in quarter 3 and 4 in quarter 4. So it is observed that the disease increases
after quarter 1 and falls back in quarter 4. We can analyze this for each day, as the
animation is based on day. This is the only geo map having the animation feature.
3) Visualization: disease count by landmark, disease name, day (cross tab
chart)
disease_count by landmark, disease_name, day
landmark   disease_name   day   disease_count
Adajan     Acne           4     1
                          5     1
                          6     3
                          7     2
                          9     3
                          10    2
                          15    1
                          19    1
                          21    2
                          23    1
Figure 7.9: Acne spreads in quarter 4
This cross tab gives the details of the day-by-day disease count for
different diseases in different areas. From it we can easily observe
the day-by-day counts of diseases in different areas. The
highest disease count of acne in Adajan is 3, on the 6th and 9th days of the
month. The lowest disease count of acne in Adajan is 1, on the 4th, 5th, 15th, 19th and 23rd.
3. Which is the area most affected by a particular disease among all
diseases?
Visualization: disease count by landmark LongLat and disease name (geo
pie map)
Figure 7.10: Comparison of acne, actinic keratoses and acute myocardial infarction in Surat
This geo pie map shows the disease counts of different diseases for different
areas geospatially. From it we can analyze which areas are most
affected by particular diseases. Color shows the disease name and the size
of the circle shows the disease count.
Here we can see that there are 8 cases of acne in Bhuvneshwari society,
which is less than the other diseases, actinic keratoses and acute
myocardial infarction. In this way we can compare the diseases in a particular
area.
4. How do two diseases spread month by month in a particular area?
Visualization: disease count by landmark, month, disease name (line chart)
Figure 7.11: Comparison of acne and myocardial infarction spreading month by month in Dabholi
Here, the X-axis represents location and month, the Y-axis represents disease count
and color represents disease name. The two lines show the increase and decrease
of each disease in different months.
From the graph we can say that in Dabholi there are more cases of both
diseases. Acute myocardial infarction disappears after June, whereas acne disappears
after September. The maximum numbers of cases of acne and acute myocardial
infarction are in May, at 6 and 5 respectively.
5. Which population of Surat is more affected by diseases?
Visualization: disease count by landmark groups (pie chart)
Figure 7.12: Total disease count of diseases in different zones of Surat
Here, color represents the different zones of Surat. Different areas are grouped into
zones.
The chart shows which zones are most and least affected by the diseases. It is observed
that the West zone is most affected, with a disease count of 2,679. So
there is a requirement for hospitals, doctors and medicines for every disease, by
which the diseases can be reduced quickly. The East zone is least affected,
with a disease count of 174.
6. Which areas are at which risk for particular diseases?
Visualization: disease count and risk count by landmark, disease name (2-axis
combined column line chart)
Figure 7.13: Risk of diseases in Surat
Here, the X-axis represents disease count, the 2nd X-axis represents risk count, and the
Y-axis represents landmark. The graph shows the risk of diseases in different
areas. Here risk count is a measure with the following formula:
if disease count > 50 then 3 else if disease count < 25 then 1 else 2
where
3 indicates "high risk"
2 indicates "average risk"
1 indicates "low risk"
Here, from the graph we can see that there are 42 cases of convulsions, which
is at average risk as the risk count is 2. Dabholi and Anand mahal
road are at high risk, so all precautions must be taken to
control the diseases there.
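The risk count measure can be sketched as a small function. The thresholds are read here so that they agree with the worked example in the text (42 cases map to risk count 2): counts above 50 are assumed to be high risk and counts below 25 low risk. The function name and the label table are hypothetical.

```python
RISK_LABELS = {3: "high risk", 2: "average risk", 1: "low risk"}

def risk_count(disease_count):
    """Map a disease count to the 1-3 risk scale (assumed thresholds 25 and 50)."""
    if disease_count > 50:
        return 3  # high risk
    if disease_count < 25:
        return 1  # low risk
    return 2      # average risk: 25 to 50 cases inclusive

convulsions_risk = RISK_LABELS[risk_count(42)]
```

Under this reading a count of exactly 25 or exactly 50 falls into the average band, which is a boundary choice the text does not settle.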
7.3.1.1 Cancer analysis
7. Which type of cancer is most common in Surat?
Visualization: disease count by disease name (heat map)
Figure 7.14: Disease count of different cancers in Surat
In the heat map, different colors represent the disease count of each cancer in
Surat. There are mainly 9 types of cancer appearing in Surat: cancer,
throat cancer, breast cancer, skin cancer, anal cancer, cervical cancer, liver
cancer, lung cancer and kidney cancer.
It is observed that cervical cancer has a higher disease count than any
other cancer, so it is necessary to find a way to control it. There
are 87 cases of cervical cancer, followed by 45 cases of anal cancer. The other
cancers, liver cancer, lung cancer, cancer, throat cancer, breast cancer, kidney
cancer and skin cancer, have disease counts of 2, 1, 2, 4, 2, 1 and 2 respectively.
8. How are two cancers spreading day by day?
Visualization: disease count by day and disease name (line chart)
Figure 7.15: Cancers spread day by day in Surat
Here, the X-axis represents day, the Y-axis represents disease count and color
represents disease name. The line chart shows how anal
cancer and cervical cancer are spreading day by day.
Cervical cancer is increasing faster than anal cancer. Anal cancer disappears
between day 5 and day 8, but cervical cancer appears on every
day. The highest disease count of anal cancer is 8, on the 19th day, and the highest
disease count of cervical cancer is 7, on the 17th and 30th days. At the end of the month
cervical cancer decreases faster.
9. Which area is most affected by which cancer?
Visualization: disease count by landmark LongLat and disease name (stacked
column chart)
Figure 7.16: Disease count of different cancers in different landmarks in Surat
Here, the X-axis represents landmark, the Y-axis represents disease count and color
shows disease name. This graph shows which area has which
type of cancer. There are mainly 9 types of cancer appearing in Surat, i.e. anal
cancer, breast cancer, cervical cancer, kidney cancer, lung cancer, cancer,
skin cancer, liver cancer and throat cancer. In the canal road area, there are 6 cases of cervical
cancer, the maximum among all the cancers, 3 cases of anal cancer and 1 case of breast cancer,
the minimum. The rest of the cancers do not
appear in the canal road area.
7.3.1.2 Swine flu analysis
10. How is swine flu spreading month by month?
Visualization: disease count by month and disease name (line chart)
Figure 7.17: Swine flu spreads day by day in Surat
Here, the X-axis represents month, the Y-axis represents disease count and color
represents disease name. It is observed that from January onward swine flu decreases
month by month.
In January the disease count of swine flu is 4, which decreases to 2 in February
and stays constant until May. After May the effect of swine
flu decreases faster as the medicines are available. The disease counts of swine
flu in August and October are 1 and 1 respectively. So by the end of the year the
effect of swine flu in Surat has decreased.
11. How does swine flu affect different landmarks of Surat?
Visualization: disease count by disease name and landmark (column chart)
Figure 7.18: Swine flu affects different landmarks in Surat
Here, the X-axis represents disease name, the Y-axis represents disease count and
color represents landmark.
It is observed that the Opposite gail tower area and Ved road are most
affected by swine flu, each with a disease count of 2. The other landmarks, Anand mahal,
Bhuvneshwary, Green city, Opposite new l.p.savani school, Vasupujiya green and Near
old bank of baroda, each have a disease count of 1. So overall all the landmarks are
affected by swine flu.
7.3.2 Analysis of diseases in Gujarat
12. Which locality is most affected by diseases in Gujarat?
Visualization: disease count by locality (column chart)
Figure 7.19: Total disease count in different localities of Gujarat
Here, the X-axis represents locality, the Y-axis represents disease count and
color shows the disease name.
It is observed that Aanand is the most affected locality (with a disease
count of 6828) and Gandhinagar is the least affected locality (with a disease
count of 4991) in Gujarat. The other localities, Ahemdabad, Bharuch, Jamnagar,
Rajkot, Vadodara and Surat, have disease counts of 5829, 6167, 5895, 6204, 6268
and 6118 respectively.
13. Which disease has the most impact in Gujarat?
Visualization: Top 5 disease count by disease name (column chart)
Figure 7.20: Top 5 diseases which affect Gujarat most
Here, the X-axis represents disease name, the Y-axis represents disease count
and color shows the province. It is observed that convulsions (epilepsy
seizures), with a disease count of 10,393, has the most
impact in Gujarat.
After that, emphysema, acne, actinic keratoses and acute myocardial
infarction complete the top 5 diseases, with disease counts of 8185, 4753, 4508 and 4358
respectively. So it is necessary to find a way to
control these diseases.
14. Which disease has the least impact in Gujarat?
Visualization: Bottom 4 disease count by disease name
Figure 7.21: Bottom 3 diseases which affect Gujarat least
Here, the X-axis represents disease name, the Y-axis represents disease count
and color shows the province.
It is observed that alcohol withdrawal, Alzheimer, motion sickness, bed wetting,
malabsorption, Meniere's, menstrual cramps, myocarditis, nasal allergy and
sleep apnea are the bottom 10 diseases, with disease counts of 6, 7, 7, 12, 12, 12, 12,
12, 12 and 12 respectively, so they have little impact in Gujarat. Alcohol withdrawal,
with a disease count of only 6, has the least impact in Gujarat.
15. How does anxiety appear in different localities month by month?
Visualization: disease count by month and locality (line graph)
Figure 7.22: Anxiety affects different localities of Gujarat
Here, the X-axis represents month, the Y-axis represents disease count and color
shows the province.
It is observed that February (the 2nd month) has the maximum disease
count of anxiety: 13 in Aanand, 11 in Surat, Vadodara
and Ahemdabad, and 5 in Gandhinagar. After that anxiety decreases in
March, when the disease counts for Surat, Vadodara, Aanand and Gandhinagar
are 7, 6, 6 and 5 respectively. Anxiety is consistent in Gandhinagar in almost all
months. In Ahemdabad and Vadodara there are no cases of anxiety after March.
In Surat, it reappears every 2 to 3 months.
16. What is the impact of diseases like depression, fatigue, anxiety and
migraine on the people of different localities?
Visualization: disease count by locality and disease name (stacked column
chart)
Figure 7.23: Comparison of the effects of mental diseases in different localities in Gujarat
Here, the X-axis represents locality, the Y-axis represents disease count and
color shows the disease count.
Here, we can see that all 4 diseases, anxiety, depression, fatigue and
migraine, appear in every locality, except that there is not a single case of migraine
in Jamnagar. Depression is lowest in Gandhinagar (disease count 6)
compared to the other localities. Migraine is higher in Gandhinagar (disease
count 16) than in the other localities. Patients with fatigue and migraine are fewer
in Ahemdabad than in the other localities.
17. How do different types of cancer affect the different localities?
Visualization: disease count by disease name and locality (heat map)
Figure 7.24: Comparison of different cancers in Gujarat
Here, the X-axis represents disease name, the Y-axis represents locality and
color shows the disease count.
Cervical cancer is the most widespread of all the cancers in
Gujarat. Aanand is most affected by anal cancer, with a disease count of
91. Lung cancer has the fewest cases in all localities. The maximum
numbers of cases of breast cancer, skin cancer, lung cancer, liver cancer and
throat cancer are in Gandhinagar (the reported counts are 26 and 25). Cervical
cancer is most common in Rajkot and Jamnagar, with a disease count of 87.
18. What is the scenario of blood pressure in different localities by quarter?
Visualization: disease count by landmark LongLat, disease name and locality (geo bubble map)
Animation: by quarter
Figure 7.25: Blood pressure patients in Gujarat in quarter 1
Here the size of the circle shows the disease count and color shows the landmark; 1 quarter = 3 months. The map shows the blood pressure scenario in the 1st quarter, where almost all localities, such as Lalpur, Amaran, Bhadla, Chandlekha, Gudel, Adas, Colony, Dahej, Nabipur, Valia and Dabholi, are affected by blood pressure. So blood pressure patients are found all over Gujarat in the 1st quarter.
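Since 1 quarter = 3 months, the quarterly animation amounts to grouping monthly records into quarters. A minimal Python sketch of that grouping follows; the function name `quarter_of` and the sample records are hypothetical.

```python
from collections import Counter

def quarter_of(month):
    """Map a month number (1-12) to its quarter (1-4); 1 quarter = 3 months."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return (month - 1) // 3 + 1

# Hypothetical (month, landmark) records for a single disease.
records = [
    (1, "Lalpur"),
    (2, "Lalpur"),
    (3, "Dahej"),
    (5, "Desar"),
    (8, "Chandlekha"),
]

# Number of cases per quarter, one animation frame per key.
by_quarter = Counter(quarter_of(month) for month, _ in records)
```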
Figure 7.26: Blood pressure patients in Gujarat in quarter 2
It shows the situation of blood pressure in the 2nd quarter, where the disease appears less than in the 1st quarter. So the number of blood pressure patients decreases from the 1st quarter to the 2nd quarter all over Gujarat.
The disease disappears in some localities such as Chandlekha, Amran, Dahej, Gudel, Nabipur, Valia, Adas and Colony. In some other localities, such as Desar, Atkot, V R mall, Kanisha, Kareli and Navjivan, blood pressure patients are noticed.
In quarter 3 the disease decreases further and only 2 cases are noticed, in Chandlekha in Gandhinagar. So medicines need to be supplied to Chandlekha, as it is the only landmark with blood pressure patients in quarter 3.
In quarter 4 the disease decreases significantly and has almost disappeared from all the landmarks; only 1 case is noticed, in Chandlekha.
Figure 7.27: Blood pressure patients in Gujarat in quarter 3
Figure 7.28: Blood pressure patients in Gujarat in quarter 4
So the number of blood pressure patients is highest in the 1st quarter all over Gujarat and decreases in each subsequent quarter.
19. Which landmark is most affected by diseases?
Visualization: disease count by landmark LongLat (geo choropleth chart)
Figure 7.29: Most affected landmark of Gujarat
Here color represents the disease count. From this geo map it is observed that all the localities of Gujarat are affected by different diseases. Lalpur is the landmark most affected by different diseases in Gujarat, with a disease count of 1,876.
So the diseases can be controlled by providing proper medicines, hospitals, pharmacies, etc. Social awareness is also needed to control the diseases.
20. Which are the most affected and least affected sub regions in Gujarat?
Visualization: disease count by sub region
Figure 7.30: Most affected and least affected locality of Gujarat
Here color represents the disease count. From the graph, Anand is the most affected locality in Gujarat, with a disease count of 6828, and Panchmahals is the least affected, with a disease count of 738. So it is noticed that west Gujarat is more affected than east Gujarat, and different health care facilities are needed in west Gujarat.
21. Which locality is affected by which diseases?
Visualization: disease count by landmark LongLat and disease name (geo pie map)
Figure 7.31: Comparison of acne, acid indigestion upset stomach, actinic keratoses and acute myocardial infarction in Gujarat
Here color represents the disease name and the size of the circle represents the disease count. This map shows that Lalpur is the most affected landmark. Among the filtered diseases (acne, acid indigestion upset stomach, actinic keratoses and acute myocardial infarction), it is most affected by actinic keratoses, while acid indigestion upset stomach has the least impact, at 15.84%. So this graph allows diseases to be compared within a particular locality.
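The percentage shown on each pie slice is the disease's share of the total count at that landmark. The Python sketch below illustrates the calculation; the counts are invented for illustration and do not reproduce the 15.84% figure from the map.

```python
def percentage_shares(counts):
    """Convert {name: count} into {name: percent of total}, rounded to 2 places."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cases to share out")
    return {name: round(100 * n / total, 2) for name, n in counts.items()}

# Hypothetical counts for the four filtered diseases at one landmark.
counts = {
    "acne": 25,
    "acid indigestion upset stomach": 15,
    "actinic keratoses": 35,
    "acute myocardial infarction": 25,
}

shares = percentage_shares(counts)
largest = max(shares, key=shares.get)
smallest = min(shares, key=shares.get)
```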
7.3.3 Analysis of diseases in 8 cities of India
22. Which city is most affected by diseases in India?
Visualization: disease count by locality (column chart)
Figure 7.32: Total disease count for different localities of India
Here the X-axis represents the disease count and color shows the locality.
It is observed that Bangalore is the most affected by diseases, with a disease count of 11645, far higher than the other localities, and Ahmedabad is the least affected, with a disease count of 5829. Ahmedabad, Mumbai, New Delhi, Hyderabad, Pune and Surat have disease counts of 5829, 6123, 6225, 6224, 6203 and 6118 respectively. So Bangalore needs to find a way to control the diseases.
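The most and least affected cities can be read off the totals quoted above with a simple maximum and minimum. The sketch below uses only the seven counts stated in the text (Kolkata's total is not given) and writes the city names in standard spelling; the variable names are illustrative.

```python
# Disease counts quoted in the text above (Kolkata's total is not stated).
city_counts = {
    "Bangalore": 11645,
    "Ahmedabad": 5829,
    "Mumbai": 6123,
    "New Delhi": 6225,
    "Hyderabad": 6224,
    "Pune": 6203,
    "Surat": 6118,
}

most_affected = max(city_counts, key=city_counts.get)
least_affected = min(city_counts, key=city_counts.get)

# How far the top city stands above the runner-up.
ranked = sorted(city_counts.values(), reverse=True)
gap_to_second = ranked[0] - ranked[1]
```

The gap to the second-highest city (New Delhi) makes the "far higher than the other localities" observation concrete.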
23. Comparison of the spread of leptospirosis between two cities day by day.
Visualization: disease count by day and locality
Figure 7.33: Comparison of the spread of leptospirosis between Surat and Bangalore
Here the X-axis represents the day, the Y-axis represents the disease count, and color shows the locality.
The graph shows how leptospirosis spreads day by day in Surat and Bangalore. There is a big difference between the two cities: leptospirosis spreads and increases faster in Bangalore than in Surat. On the 5th day the count is 1 in Surat. On the 6th day both localities have a disease count of 1. On the 10th day leptospirosis increases to a disease count of 2 and then remains the same. After the 11th day it increases in both localities. The disease counts of leptospirosis in Bangalore and Surat on the 12th day are 6 and 3 respectively.
24. Which city is most affected by which disease?
Visualization: top 10 disease count on disease name, by disease name and locality (heat map)
Figure 7.34: Top 10 disease count in India
Here the X-axis represents the disease name, the Y-axis represents the locality, and color shows the disease count.
From the graph we can analyse how the top 10 diseases affect different cities. It is observed that Bangalore is most affected by convulsions and emphysema, with disease counts of 1368 and 1128 respectively. The remaining cities, Pune, New Delhi, Hyderabad, Mumbai, Surat, Ahmedabad and Kolkata, are also affected by convulsions, with disease counts of 726, 726, 726, 715, 715, 671 and 577 respectively. Kolkata is the 2nd most affected by acute myocardial infarction and the least affected by actinic keratoses.
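A "top N" filter of this kind ranks diseases by their total count over all localities and keeps only the heat-map cells belonging to the highest-ranked diseases. The Python sketch below illustrates the idea with N = 2; the cell values and the helper name `top_n_diseases` are hypothetical.

```python
from collections import Counter

# Hypothetical (locality, disease, count) cells of a heat map.
cells = [
    ("Bangalore", "convulsions", 5),
    ("Pune", "convulsions", 3),
    ("Bangalore", "emphysema", 4),
    ("Pune", "emphysema", 2),
    ("Bangalore", "acne", 1),
    ("Pune", "acne", 1),
]

def top_n_diseases(rows, n):
    """Return the n diseases with the largest total count over all localities."""
    totals = Counter()
    for _, disease, count in rows:
        totals[disease] += count
    return [disease for disease, _ in totals.most_common(n)]

top = top_n_diseases(cells, 2)

# Keep only the cells that belong to a top-ranked disease.
filtered = [row for row in cells if row[1] in top]
```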
25. How are cities affected by some mental diseases?
Visualization: disease count by locality and disease name (stacked column chart)
Figure 7.35: Top 10 disease count in India
Here the X-axis represents the disease name, the Y-axis represents the locality, and color shows the disease count.
Here the stacked column chart suggests that anxiety cases are higher in Bangalore. So Bangalore needs some civic measures that decrease people's anxiety. Blood pressure patients are more numerous in Kolkata. Depression patients are equal in Pune and New Delhi (3 cases). Insomnia is higher in Bangalore (4 cases). Ahmedabad is the least affected by the mental diseases (6 cases). The remaining cities, Surat, Mumbai, Pune, New Delhi, Bangalore, Hyderabad and Kolkata, have disease counts of 7, 9, 13, 10, 14, 8 and 11 respectively. There are more cases of fatigue in Kolkata and Pune, as physical hard work is greater there. Hyderabad, Mumbai, Ahmedabad and Kolkata are the least affected by insomnia, as their disease count is 1.
26. How is swine flu spreading in India month by month?
Visualization: disease count by region and disease name (geo bubble map)
Animation: by month
Figure 7.36: Swine flu spreads in India in February
Here color represents the disease and the size of the circle represents the disease count. It is observed that in February all the states except Tamil Nadu are affected by swine flu: Gujarat, Maharashtra, Delhi, Karnataka, Andhra Pradesh and West Bengal have disease counts of 4, 3, 2, 4, 2 and 2 respectively. Gujarat and Karnataka are the most affected.
In March, all the states are affected by swine flu except West Bengal and Tamil Nadu. Gujarat, Maharashtra, Delhi, Karnataka and Andhra Pradesh have disease counts of 4, 4, 2, 3 and 2 respectively.
In Maharashtra swine flu increases compared with February, when its disease count was 3. Swine flu disappears from West Bengal in March.
Figure 7.37: Swine flu spreads in India in March
Figure 7.38: Swine flu spreads in India in April
In April, swine flu disappears from all the states that were affected, and West Bengal is affected by swine flu again. It is observed that West Bengal has 1 patient of swine flu in April. So medicines and proper hospital facilities are needed in West Bengal.
Figure 7.39: Swine flu spreads in India in May
In May, swine flu increases again in all the states: Gujarat, Maharashtra, Karnataka, Andhra Pradesh, Tamil Nadu, Delhi and West Bengal have disease counts of 4, 4, 3, 2, 1, 2 and 2. Gujarat and Maharashtra have the most and Tamil Nadu has the fewest swine flu patients. Swine flu increases in West Bengal. Tamil Nadu is the least affected, which suggests that it has good hospital and medicine facilities.
In July, swine flu disappears from all the states (Gujarat, Maharashtra, Karnataka, Tamil Nadu, Delhi and Andhra Pradesh) except West Bengal. So the effect of swine flu rises and falls repeatedly in all the states, and in West Bengal swine flu decreases compared with before, as medicines are made available and all other help is provided.
Figure 7.40: Swine flu spreads in India in July
Figure 7.41: Swine flu spreads in India in August
In August, swine flu starts spreading again in all states except Tamil Nadu. Gujarat, Maharashtra, Karnataka, Andhra Pradesh, Delhi and West Bengal have disease counts of 2, 2, 2, 1, 1 and 1. As per the observation, West Bengal is the most consistently affected by swine flu, as swine flu is noticed there in almost every month.
Figure 7.42: Swine flu spreads in India in October
In October, the situation of swine flu in the different states remains the same. The people of Gujarat, Maharashtra, Karnataka, Andhra Pradesh, Delhi and West Bengal take more time to recover, as the effect is the same in October. Maharashtra and Karnataka have a disease count of 2 and are the most affected.
In December, swine flu disappears from all the states except West Bengal, which has a disease count of 1.
So we can predict that the disease will disappear from Gujarat, Maharashtra, Karnataka, Delhi, Tamil Nadu and Andhra Pradesh in the next months, and it is possible that West Bengal will take some months to recover. West Bengal needs some help from other states to recover from swine flu.
Figure 7.43: Swine flu spreads in India in December
So it is observed that all states are most affected in March and May, and least affected in April, July and December.
27. Which state is most affected by diseases?
Visualization: disease count by region (geo choropleth graph)
Figure 7.44: Most and least affected states by diseases in India
Here color shows the disease count. From the geo choropleth map it is observed that Karnataka is the most affected and Tamil Nadu is the least affected state.
So from this graph we can learn which state needs more hospitals, medicines, technologies, doctors, etc., what the scenario of diseases is in different states of India, and how the states recover from diseases. We can also take help from the least affected state.
28. Which state is most affected by which disease?
Visualization: disease count by region and disease name (geo pie chart)
Figure 7.45: Comparison of anemia, asthma, bacterial infections, eye allergies and keratitis in India
Here color represents the disease name and the size of the circle represents the disease count.
It is observed that Karnataka is the most affected by anemia, West Bengal is the most affected by bacterial infections, and Tamil Nadu is not affected by keratitis or asthma.
From this graph, anemia, asthma, bacterial infections, eye allergies and keratitis can be compared. Asthma is low in all the states.
29. Comparison of different cities affected by different diseases.
Visualization: disease count by locality and disease name (area chart)
Figure 7.46: Comparison of anemia, asthma, bacterial infections, eye allergies and keratitis in India
Here the X-axis represents the locality, the Y-axis represents the disease count, and color shows the disease name.
From the graph, bites and stings, leukemia, and leg pain or cramps are higher in Kolkata, with disease counts of 17, 6 and 22. There are fewer cases of leukemia overall than of the other diseases. Bangalore is also affected by leg pain or cramps and by bites and stings, with disease counts of 22 and 15 respectively. Surat, Ahmedabad, Mumbai and Delhi are the least affected by these 4 diseases. Leukemia is the least widespread and bites and stings is the most widespread among these 4 diseases.
30. How do allergies, alopecia, altitude illness, Alzheimer's and amblyopia affect different localities?
Visualization: disease count by disease name and locality
Figure 7.47: Comparison of allergies, alopecia, altitude illness, Alzheimer's and amblyopia in India
Here the X-axis represents the disease count, the Y-axis represents the disease name, and color shows the locality.
From the graph, it is observed that all the localities except Kolkata are most affected by amblyopia. The disease counts of amblyopia for Ahmedabad, Bangalore, Mumbai, New Delhi, Hyderabad, Kolkata, Pune and Surat are 7, 14, 8, 10, 10, 0, 8 and 9 respectively. The disease counts of allergies for the same cities are 1, 2, 1, 1, 1, 6, 1 and 1 respectively. The disease counts of alopecia are 1, 2, 1, 1, 1, 6, 1 and 1 respectively. The disease counts of altitude illness are 1, 2, 1, 1, 1, 5, 1 and 1 respectively. Amblyopia is highest in Bangalore; the rest of the diseases are highest in Kolkata.
31. Comparison of different cities affected by different diseases.
Visualization: disease count by locality and disease name (stacked column chart)
Figure 7.48: Disease count for different diseases in different localities in India
Here the X-axis represents the disease name, the Y-axis represents the disease count, and color shows the locality.
From the graph it can be seen which diseases are the top 5 and which have less effect. Convulsions, emphysema, acne, actinic keratoses and acute myocardial infarction, with disease counts of 10000, 8000, 4500, 4300 and 4200, are the top 5 diseases.
32. Which diseases are the least widespread?
Visualization: bottom 3 disease count by disease name (donut chart)
Figure 7.49: Bottom 3 disease count in India
Here color shows the disease name.
From the graph, it is observed that bed-wetting is the least frequent disease (7.74%). Alcohol withdrawal, allergies and allergic reactions, altitude illness, autism, kidney cancer, lung cancer, Meniere's disease, menstrual cramps, sleep apnea and vitiligo also appear rarely. These are the least prevalent diseases in India.
Chapter 8
Conclusion
From the analysis done, the following conclusions are derived:
In Surat, Adajan is the most affected landmark. Acne is spreading faster than acute myocardial infarction in Dabholi. The south zone is the most affected and the east zone is less affected by diseases. The most widespread cancer is cervical cancer. The areas affected by liver cancer are Near Mahavir Petrol Pump and U M Road.
In Gujarat, Anand is the most affected locality and Gandhinagar is the least affected locality by diseases. Convulsions (epilepsy seizures) have the most impact and alcohol withdrawal has the least impact. Anxiety appears the most in February. Depression is the least and anxiety is the most in Gandhinagar. Cervical cancer is the most widespread cancer and lung cancer is the least widespread. Gandhinagar is the locality most affected by cancers. Blood pressure appears most in the 1st quarter. Lalpur is the most affected landmark.
In India, Bangalore is the most affected city and Ahmedabad is the least affected. Ahmedabad is also the least affected by mental diseases. There are more cases of fatigue in Pune and Kolkata. West Bengal is most affected by swine flu. Karnataka is the most affected state and Tamil Nadu is the least affected. West Bengal is most affected by bacterial infections. Bites and stings, leukemia, leg pain or cramps, Alzheimer's, altitude illness, alopecia and allergies are higher in Kolkata. Leptospirosis and amblyopia are higher in Bangalore. Convulsions, emphysema, acne, actinic keratoses and acute myocardial infarction are the top 5 most widespread diseases. Bed-wetting is the least frequent disease.
Chapter 9
Future prediction
1. What will be the future scenario of diseases in India overall?
1) Visualization: disease count and Linear Regression: disease count by month (by linear regression)
Figure 9.1: Prediction of diseases in India in 2016 by linear regression
Here the X-axis represents year, quarter and month, the Y-axis represents disease count and Linear Regression: disease count, and color distinguishes the disease count from the Linear Regression: disease count.
From the graph, we can see that, on the basis of previous data, the predicted disease count decreases after January 2016. So from this we can predict outbreaks of diseases in the future, and either prevent the diseases from occurring or control them after they occur.
2) Visualization: disease count and Forecast: disease count by month (by forecasting)
Figure 9.2: Prediction of diseases in India in 2016 by forecasting
Here the X-axis represents year, quarter and month, the Y-axis represents disease count and Forecasting: disease count, and color distinguishes the disease count from the Forecasting: disease count. The above graph shows the disease count predicted by forecasting; the disease count will decrease in the next 2 months.
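The text does not say which forecasting method produced this curve. As an assumed stand-in, the sketch below shows simple exponential smoothing, one of the basic time-series forecasting techniques; the smoothing factor `alpha` and the monthly counts are hypothetical, and this is not necessarily the algorithm SAP Lumira applies.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing.

    Assumed stand-in method; alpha weights the newest observation.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # Blend the new observation with the previous smoothed level.
        level = alpha * value + (1 - alpha) * level
    return level

monthly_counts = [120, 110, 100, 90]  # hypothetical, decreasing counts
forecast = exponential_smoothing_forecast(monthly_counts, alpha=0.5)
```

Simple exponential smoothing lags behind a steady trend, so for data that keeps falling it predicts a value above the last observation; a trend-aware method would be needed to extrapolate the decline itself.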
2. What are the possibilities of acid indigestion upset stomach in India in 2016?
1) Visualization: disease count and Linear Regression: disease count by month for acid indigestion upset stomach in India
Figure 9.3: Prediction of acid indigestion upset stomach in India in 2016 by linear regression
Here the X-axis represents the month, the Y-axis represents disease count and Linear Regression: disease count, and color distinguishes the disease count from the Linear Regression: disease count.
From the graph, we can see that, on the basis of previous data, the predicted disease count for acid indigestion upset stomach increases after January 2016.
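A linear regression trend line of this kind fits a straight line to the monthly counts by least squares and extends it to later months. The Python sketch below shows the standard calculation; the monthly values and the helper name `fit_line` are hypothetical, and the numbers are not the project's data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

months = [1, 2, 3, 4]
counts = [10, 12, 14, 16]  # hypothetical monthly disease counts

slope, intercept = fit_line(months, counts)

# Extend the fitted line one month beyond the data.
predicted_month_5 = slope * 5 + intercept
```

A positive slope corresponds to an increasing predicted disease count, and a negative slope to a decreasing one.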
2) Visualization: disease count and Forecasting: disease count by month for acid indigestion upset stomach in India
Figure 9.4: Prediction of acid indigestion upset stomach in India in 2016 by forecasting
Here the X-axis represents the month, the Y-axis represents disease count and Forecasting: disease count, and color distinguishes the disease count from the Forecasting: disease count. From the graph, we can see that, on the basis of previous data, the predicted disease count for acid indigestion upset stomach increases after January 2016.
Bibliography
[1] https://www.wikipedia.org [Date Accesesd : 15 january 2016]
[2] ”conceptual framework for spatial analysis” http://www.powershow.com/
view/9bbc7-OWNmZ/Conceptual_frameworks_for_spatial_analysis_
powerpoint_ppt_presentation [Date Accesesd : 15 january 2016]
[3] Pete Chapman, Julian Clinton, Randy Kerber, Step by step data mining guide
CRISP DM https://the-modeling-agency.com/crisp-dm.pdf [Date Ac-
cesesd : 16 january 2016]
[4] ”SPATIAL ANALYSIS AND MAPPING OF CHOLERA CAUSING
FACTORS IN KUMASI, GHANA.” JERRY ASAANA ANAMZUI-
YA March, 2012 https://www.itc.nl/library/papers_2012/msc/gfm/
asaana.pdf[Date Accesesd : 18 january 2016]
[5] http://training.esri.com/Courses/StartGIS_10/index.cfm(course_
tutorial)[Date Accesesd : 18 january 2016]
[6] International Journal of Science, Engineering and Technology Research
(IJSETR), Volume 4, Issue 7, July 2015 2697 Comparative Anal-
ysis of K-Means Algorithm in Disease Prediction K.Rajalakshmi1,
Dr.S.S.Dhenakaran2, N.Roobini http://ijsetr.org/wp-content/
uploads/2015/07/IJSETR-VOL-4-ISSUE-7-2697-2699.pdf [Date Ac-
cessed : 15 February 2016]
[7] Rhttps://courseware.e-education.psu.edu/courses/bootcamp/lo09/
04.html [Date Accessed : 15 February 2016]
81
Refrences 82
[8] http://www.colorado.edu/geography/gcraft/notes/intro/intro.html
[Date Accessed : 19 February 2016]
[9] copiadewhatisgisjaimenievesignaciovazquez-111116134133-phpapp01.pdf
http://www.slideshare.net/ESRI/what-is-gis-10190355 [Date Ac-
cessed: 19 February 2016]
[10] ”Geographic Data Mining and Knowledge Discovery,Research
Monographs in GIS” Taylor and Francis, 2001.url-
http://www.dbs.ifi.lmu.de/Publikationen/Papers/Chapter7.revised.pdf
[Date Accessed : 21 February 2016]
[11] ”Frequency Measures Used in Epidemiology”https://www.uic.edu/sph/
prepare/courses/ph490/resources/epilesson02.pdf [Date Accessed : 21
February 2016]
[12] ”Algorithms and Applications for Spatial Data Mining” Martin Ester, Hans-
Peter Kriegel, Jorg Sander (University of Munich) [Date Accessed : 21 Febru-
ary 2016]
[13] https://www.google.co.in/webhp?sourceid=chrome-instant&ion=
1&espv=2&ie=UTF-8#q=geospatial%20analysis%20in%20sap%20lumira
[Date Accessed : 21 February 2016]
[14] ”Spatial data mining” www.cs.sjsu.edu/faculty/.../Spatial%20Data%
20Mining_CS157B_Satoru_Hozumi.ppt[Date Accessed : 21 February 2016]
[15] ”Using Clustering Methods in Geospatial Information Systems” Xin
Wang ,Department of Geomatics Engineering, Schulich School of
EngineeringUniversity of Calgary, Calgary, AB Canada T2N 1N4
[email protected],Howard Hamilton,Department of Computer Sci-
ence, University of Regina, Regina, Canada S4S 0A2, Hamil-
[email protected]://www.ucalgary.ca/wangx/files/wangx/
geoinformaticsxwhh2008.pdf [Date Accessed : 21 February 2016]
c©2016 Ratnakala Software Pvt. Ltd. All Rights Reserved
Refrences 83
[16] http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/
uuid/e01da20b-db14-3210b984-83e648239c61QuickLink=index&
overridelayout=true&60155311981591(SAP LUMIRA tutorials)[Date
Accessed : 29 March 2016]
[17] ”Research in Clustering Algorithm for Diseases Analysis” Kaijian XIA, Yue
WU, Xiaogang REN Changshu No.1 People’s Hospital, Jiangsu, Changshu,
China Email: [email protected] Yong JIN School of Computer Science and
Engineering, Changshu Institute of Technology, Changshu, China Email: jiny-
[email protected] [Date Accessed : 29 March 2016]
[18] ”Introduce Basic Algorithm for Predictive Analy-
sis” SelwynZhou, BI Consultant, ATCG Solutions Sel-
[email protected] http://www.atcgsolutions.com/blog/
introduce-basic-algorithm-for-predictive-analysis [Date Accessed :
31 March 2016]
[19] http://scn.sap.com/docs/DOC-53142 [Date Accessed : 31 March 2016]
[20] ”Introduction to linear regression” Author:David M. Lane http://
onlinestatbook.com/2/regression/intro.html [Date Accessed : 31
March 2016]
[21] SAP Lumira — SCN , http://scn.sap.com/community/lumira[Date Ac-
cessed : 31 March3 2016]
[22] ”Linear regression analysis and web intelligence” http://www.gulland.com/
wp/?p=534[Date Accessed : 1 April 2016]
[23] https://cp.hana.ondemand.com/dps/d/preview/
5a4bc2cea197421a8ce8474ef803e596/1.28/en-US/frameset.htm?
6630b086c9444170b5ebe0f52cbdc977.html
[24] http://www.esri.com[Date Accessed : 7 April 2016]
[25] https://www.otexts.org/fpp/6/1[Date Accessed : 7 April 2016]
c©2016 Ratnakala Software Pvt. Ltd. All Rights Reserved