Name of the students: 1) Rohit Jain (10103473) 2) Neeraj Chaudhary (10103525) Name of the supervisor: Mr. Vivek Mishra.

Major ppt

Importance of project

Today the stock market is one of the major markets in which people invest to earn money. The stock market reflects variations in the market economy and has drawn the attention of ten million investors since it opened. The stock market is characterized by high risk and high yield, so investors pay close attention to stock market analysis and try to forecast its trend. However, the stock market is affected by politics, the economy and many other factors, and its internal dynamics are complex: price changes are non-linear and share data is very noisy. As a result, traditional mathematical and statistical techniques for forecasting the stock market have not yielded suitable results.

Hence, we are going to analyze a stock using different algorithms built with tools and techniques that include Hadoop and MapReduce.

INTRODUCTION

Analysis of data is a process of inspecting, cleaning, transforming, and

modeling data with the goal of discovering useful information, suggesting

conclusions, and supporting decision making. Data analysis has multiple

facets and approaches, encompassing diverse techniques under a variety

of names, in different business, science, and social science domains.

We are going to analyse the data of a stock to find different types of trends in it using Hadoop and MapReduce.

Hadoop is an open source framework for writing and running distributed

applications that process large amounts of data. Distributed computing is

a wide and varied field, but the key distinctions of Hadoop are that it is

MapReduce is a data processing model. Its greatest advantage is the

easy scaling of data processing over multiple computing nodes. Under

the MapReduce model, the data processing primitives are called

mappers and reducers. Decomposing a data processing application

into mappers and reducers is sometimes nontrivial. But, once you

write an application in the MapReduce form, scaling the application

to run over hundreds, thousands, or even tens of thousands of

machines in a cluster is merely a configuration change. This simple

scalability is what has attracted many programmers to the MapReduce

model.

Technical and graphical indicators used

We are analyzing the data of a particular stock for the past several years using different types of algorithms. We need to find the number of days on which the same percentage change occurred across the whole data set.

For example, given a particular stock, we would like to know how often in the past several years it has changed by 1%, 2%, 3%, etc. (somewhat like a Fourier transform, which maps time-domain data into the frequency domain).

Further, we will be using different technical indicators, for analysis purposes only, which include:

•Simple Moving Average (SMA)

•Exponential Moving Average (EMA)

•On Balance Volume (OBV)

TECHNICAL INDICATORS USED

These indicators are used for analysis only. Using one of the following features, users can see a graph for that company by entering a period as input.

Simple Moving Average (SMA)

1) SMA is the most basic moving average used in trading.

2) It is based on the closing price.

Exponential Moving Average (EMA)

1) EMA tries to reduce lag by giving more weight to recent prices.

2) EMA(current) = ((Price(current) - EMA(previous)) * Multiplier) + EMA(previous)

Multiplier = 2 / (Time period + 1)

Overall description of the project

Our project aims at analyzing the data of a particular stock using Hadoop MapReduce.

We propose some algorithms to analyse the stock data. Initially we find the frequency of stock changes, using an Excel sheet of the stock's data as input. We use MapReduce functions to perform this operation so that the data can be analysed.

Then we use another algorithm to forecast the trend of the stock with a technical indicator, the exponential moving average (EMA). After this we use a graphical stock trend indicator to understand the trend of the stock.

The whole project runs on Hadoop MapReduce.

Functional and Non-Functional Requirements

Having worked out the procedure to implement our algorithm over the course of the project, we identified some requirements that are needed for the project to function properly.

A functional requirement describes what a software system should do, while non-functional requirements place constraints on how the system will do so.

•Functional Requirements:

•Hadoop should handle the input stock data.

•The mapper must have a key for mapping the data.

•The reducer must integrate the data into an output.

•Non-Functional Requirements:

•Scalability: The application must work for large data sets. It should not fail under this condition.

•Reliability: The application must be reliable in every respect for the user who is analyzing the data.

•Efficiency: Specifies how well the software utilizes scarce resources:

Component description and dependency details

An Excel file of a particular stock is used as the input for the project. We used the Excel file of BP stock from Yahoo Finance.

•Software Requirements

•Oracle (Sun) Java 6: Oracle (Sun) Java 6 is the reference implementation for Java 6.

•Hadoop: Hadoop Map/Reduce is a software framework for easily

writing applications which process vast amounts of data (multi-terabyte

data-sets) in-parallel on large clusters (thousands of nodes) of

commodity hardware in a reliable, fault-tolerant manner.

•Hardware Requirements

•PC, 1.6 GHz or higher

•3 GB RAM or higher

•Operating System: Ubuntu

Overall Architecture

We take an Excel file as input, apply the map function to it, and then reduce the result to get the output.

Proposed Algorithm

Algorithm based on percentage change of stock:

This is an algorithm to compute the frequency of stock market changes. For example, given a particular stock, we would like to know how often in the past several years it has changed by 1%, 2%, 3%, etc. (somewhat like a Fourier transform, which maps time-domain data into the frequency domain).

Yahoo Finance provides the BP stock data as an Excel sheet for the analysis.

Map Function:

Primarily we are writing a stream processor here that atomically performs what needs to happen on one line of data. That is perfect for us: we simply take the opening price and the closing price, calculate the percent change and emit it.

// Input columns: Date,Open,High,Low,Close,Volume,Adj Close
String[] tokens = value.toString().split(",");
float open = Float.parseFloat(tokens[1]);
float close = Float.parseFloat(tokens[4]);
float change = ((close - open) / open) * 100;
// Key: the day's percentage change, rounded to at most two decimals
word.set(new DecimalFormat("0.##").format((double) change) + "%");
context.write(word, one);

We will get a stream of (name, value) pairs with the name being the

percentage change for the day and the value being the integer ‘1’. This

function can be distributed over X number of machines, each one

performing its streaming function in parallel and independent of the

others.
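The per-line work of the map step can be tried outside Hadoop. The following is a minimal sketch in plain Java (8 or later), not the project's actual Mapper class: the class name PercentChangeMapSketch, the helper changeKey and the sample row are hypothetical, and the column order is assumed to be the one given in the comment above.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical stand-in for the body of the map function; no Hadoop types are used.
class PercentChangeMapSketch {

    // Turns one CSV row into the key the mapper would emit.
    // Assumed column order: Date,Open,High,Low,Close,Volume,Adj Close
    static String changeKey(String csvLine) {
        String[] tokens = csvLine.split(",");
        double open = Double.parseDouble(tokens[1]);
        double close = Double.parseDouble(tokens[4]);
        double change = (close - open) / open * 100.0;
        // Locale.ROOT keeps "." as the decimal separator on every machine.
        DecimalFormat fmt =
                new DecimalFormat("0.##", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return fmt.format(change) + "%";
    }

    public static void main(String[] args) {
        // Made-up sample row, not real BP data.
        String row = "2013-01-02,40.00,41.00,39.50,40.50,1000,40.50";
        System.out.println(changeKey(row) + "\t1"); // the (name, value) pair for this row
    }
}
```

A real input file would normally start with a header row, which would have to be skipped before parsing because "Open" is not a number.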

Reduce Function:

This function is going to take the (name, value) outputs from all the

mappers and process that data accordingly (often ‘reducing’ it). In our

case we are simply going to count the number of times a particular

percentage change happens. In essence we are going to change this:

1.2% 1

1.3% 1

1.2% 1

Into

1.2% 2

1.3% 1

int sum = 0;
// Add up the 1s emitted by the mappers for this key
for (IntWritable val : values) {
    sum += val.get();
}
context.write(key, new IntWritable(sum));
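The counting done by the reduce step can also be checked without a cluster. This is a rough sketch in plain Java (8 or later) under the assumption that the framework has already delivered every mapper key; the class FrequencyReduceSketch and the method countByKey are hypothetical names, and a TreeMap stands in for Hadoop's grouping of values by key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for shuffle + reduce; no Hadoop types are used.
class FrequencyReduceSketch {

    // Each key arrives with an implicit value of 1, so summing the values
    // for a key is the same as counting how often the key occurs.
    static Map<String, Integer> countByKey(List<String> mapperKeys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : mapperKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The three mapper outputs from the example above.
        List<String> keys = Arrays.asList("1.2%", "1.3%", "1.2%");
        System.out.println(countByKey(keys)); // prints {1.2%=2, 1.3%=1}
    }
}
```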

Technical indicators algorithm:

These indicators are used for analysis only. Using one of the following features, users can see a graph for that company by entering a period as input.

A. Simple Moving Average (SMA)

1) SMA is the most basic moving average used in trading.

2) It is based on the closing price.

Example: daily closing prices are 11, 12, 13, 14, 15, 16, 17.

The 5-day moving averages are:

1st value: (11+12+13+14+15)/5 = 13

2nd value: (12+13+14+15+16)/5 = 14

3rd value: (13+14+15+16+17)/5 = 15, and so on.
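The worked example can be reproduced with a short sliding-window loop. This is an illustrative sketch in plain Java, not code from the project: the class SimpleMovingAverageSketch and the method sma are hypothetical names, and the period of 5 is taken from the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the SMA example; not part of the project code.
class SimpleMovingAverageSketch {

    // Returns one average per full window of `period` closing prices.
    static List<Double> sma(double[] closes, int period) {
        List<Double> averages = new ArrayList<>();
        double windowSum = 0.0;
        for (int i = 0; i < closes.length; i++) {
            windowSum += closes[i];
            if (i >= period) {
                windowSum -= closes[i - period]; // drop the price that left the window
            }
            if (i >= period - 1) {
                averages.add(windowSum / period);
            }
        }
        return averages;
    }

    public static void main(String[] args) {
        double[] closes = {11, 12, 13, 14, 15, 16, 17};
        System.out.println(sma(closes, 5)); // prints [13.0, 14.0, 15.0]
    }
}
```

Keeping a running sum avoids re-adding the whole window for every day, which matters little for seven prices but helps for several years of data.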

B. Exponential Moving Average (EMA)

1) EMA tries to reduce lag by giving more weight to recent prices.

2) EMA(current) = ((Price(current) - EMA(previous)) * Multiplier) + EMA(previous)

Multiplier = 2 / (Time period + 1)
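The recurrence can be sketched in a few lines of plain Java. The slides do not say how the first EMA value is obtained, so this sketch assumes it is seeded with the first closing price (another common choice is the SMA of the first period). The class ExponentialMovingAverageSketch, the method ema and the sample prices are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper for the EMA formula; not part of the project code.
class ExponentialMovingAverageSketch {

    // Applies EMA(current) = (Price(current) - EMA(previous)) * Multiplier + EMA(previous).
    static double[] ema(double[] closes, int period) {
        double multiplier = 2.0 / (period + 1);
        double[] out = new double[closes.length];
        if (closes.length == 0) {
            return out;
        }
        out[0] = closes[0]; // assumption: seed with the first closing price
        for (int i = 1; i < closes.length; i++) {
            out[i] = (closes[i] - out[i - 1]) * multiplier + out[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up prices; a period of 3 gives Multiplier = 2 / (3 + 1) = 0.5.
        double[] closes = {10, 13, 16};
        System.out.println(Arrays.toString(ema(closes, 3))); // prints [10.0, 11.5, 13.75]
    }
}
```

With a multiplier of 0.5 each new value lies halfway between the previous EMA and the latest price, which shows how recent prices receive more weight than in the SMA.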

Conclusion

Investing in stocks is a common side business for companies and individuals seeking compound interest, the time value of money, tax benefits and diversification. Investing in a good, rising stock is therefore necessary to get the desired profit, and a stock change indicator is very helpful to the user in selecting a good stock. Hadoop is open source software that can handle huge amounts of data quite easily. Hadoop has modules such as MapReduce, HDFS, Hadoop Common and Hadoop YARN. MapReduce is a programming model; in our project the map function calculates the percentage change of the stock, assigns that change as the key and gives a value of 1 for each key. The reduce function reads a key and the set of values associated with it, then calculates the sum of those values and gives the key and that sum (the frequency) as the final output. The EMA algorithm mainly focuses on recent price values. By analyzing these values the user can choose a less risky stock. By drawing a graph of the EMA of closing prices the user can understand the trend of the stock, and so can invest in a less risky or more risky stock (according to their choice) with an upward trend.

Future work

We have planned the following for the future.

•We want to add some more technical indicators (like back propagation neural networks) to this program so that a person can compare the result of each indicator. The user will have the freedom to give importance to a particular condition (indicator).

•We want to add some graphical indicators (like OBVP) to this project so that the user gets graphical knowledge along with statistical knowledge and can better understand the trend of the stock.

•We want to link this project to a website so that more people can benefit from it.
