Real Time Machine Learning Architecture & Sentiment Analysis Quantcon 2016, Singapore Juan CHENG, PHD Data Scientist [email protected] www.infotrie.com @infotrie www.finsents.com @finsents

Page 1: Real time machine learning architecture & sentiment analysis

Real Time Machine Learning Architecture & Sentiment Analysis

Quantcon 2016, Singapore

Juan CHENG, PhD, Data Scientist, [email protected]

www.infotrie.com @infotrie

www.finsents.com @finsents

Page 2: Real time machine learning architecture & sentiment analysis

Agenda

● About us
● News analytics in finance
● A news analytics case
  • Information extraction of text
  • Text feature extraction for machine learning classification
  • Big data tools applied
  • Architecture that combines all

Page 3: Real time machine learning architecture & sentiment analysis

Our team

Frederic GEORJON, CEO

Ajil GEORGE, Head of Development Center

Daniel ABROUK, Head of EMEA

LONG Zhicheng, CTO

Locations: Paris/Singapore, London, Singapore, India

Page 4: Real time machine learning architecture & sentiment analysis

Services

FinSentS.com
➔ Real-time information and trading portal
➔ Millions of sources / Multilingual
➔ SaaS or on premises
➔ Real-time alerts
➔ Actionable signals

Sentiment Data
➔ Through API or third parties
➔ Up to 15 years of history
➔ Low latency / Tick by tick
➔ 50,000+ entities
➔ Stocks, Forex, commodities, indices, macroeconomic topics, etc.

Consultancy and Training
➔ Trading Technology
➔ Algorithmic trading
➔ Big Data
➔ Natural Language Processing (NLP)
➔ Machine Learning

Page 5: Real time machine learning architecture & sentiment analysis

Do you use news in your strategies?

A. No, I find news noisy. There is just too much of it.

B. No, I’m a quant. I find it hard to quantify news.

C. Yes, but I find using news not very efficient. I have to manually relate it to my portfolio.

Page 6: Real time machine learning architecture & sentiment analysis

News Analytics in Finance

Access to News / News management

- Visualization tools
- Filtering tools
- On demand view

Feed from multiple sources:
- Social media
- Web-based content
- Private sources
- Internal data

News Content Alerts based on sentiment indicator

Provide accurate information from the Big Data environment and push it in front of users in real time for risk management

Dashboard

- Consolidated dashboard
- Portfolio alerts

Actionable indicators

Users receive news signals for trading / hedging / risk management based on the sentiment indicator

Algo Trading / Robo Trading

Real-time algorithmic trading: sentiment indicator and news analytics

Equity Research / Sales Team, Hedging Trader / Prop Trader

- News tag cloud
- Filtering newsfeed with social media blotter, news blotter
- Search engine on demand

- Topics detection
- Rumours alerts
- News qualification per importance

- Relevant information from a single screen
- Automatic alert
- Integrated to OMS

Provide relevant news analytics indicators for hedging or trade idea generation

News analytics signals fully integrated into algo trading strategies

Page 7: Real time machine learning architecture & sentiment analysis

Reuters
MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT
AT&T acquires Time Warner for $85 billion

NEW YORK - AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion, the boldest move yet by a telecommunications company to acquire content to stream over its high-speed network to attract a growing number of online viewers.

The trend of consolidation comes as technology advances have been upending traditional entertainment companies. Many in the industry believe that getting bigger is the best way to compete with companies like Google, Apple, Netflix and Facebook.

David Goldman and Paul R. La Monica contributed to this report.

What’s in the news?

Page 8: Real time machine learning architecture & sentiment analysis

Source

Category

Time

Location

Named Entity

Sentiment

Event

Hacking skills, regex, NLP, named entity recognition, POS taggers

What’s in the news?
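As a rough illustration of the regex part of this slide, the sketch below pulls the source, category, time, location, company names and a coarse event label out of the quoted Reuters item. The restored line breaks, the patterns, the `EVENT_TRIGGERS` lexicon and the `extract` function are all assumptions made for this example, not the speaker's pipeline. Sentiment is left out because it needs a model rather than a pattern.

```python
import re
from datetime import datetime

# Header and first sentence of the news item quoted on the slide,
# with line breaks restored (the exact layout is an assumption).
RAW = """Reuters
MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT
AT&T acquires Time Warner for $85 billion
NEW YORK- AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion"""

# Illustrative patterns; a production system would need far more robust rules.
HEADER = re.compile(
    r"^(?P<category>[A-Z ]+) \| (?P<date>\w{3} \w{3} \d{1,2}, \d{4}) \| "
    r"(?P<time>\d{1,2}:\d{2}[ap]m) (?P<tz>[A-Z]{2,4})$"
)
DATELINE = re.compile(r"^(?P<location>[A-Z][A-Z ]+?)\s*-\s")
COMPANY = re.compile(r"\b([A-Z][\w&]*(?: [A-Z][\w&]*)* Inc)\b")
AMOUNT = re.compile(r"\$(\d+(?:\.\d+)?) billion")

# Hypothetical event lexicon: trigger phrase -> event label.
EVENT_TRIGGERS = {"agreed to buy": "acquisition", "acquires": "acquisition"}


def extract(raw):
    """Return the fields that can be recovered with regular expressions alone."""
    source, header, title, body = raw.split("\n", 3)
    head = HEADER.match(header)
    # The timestamp stays naive; the time zone is returned separately.
    published = datetime.strptime(
        f"{head['date']} {head['time']}", "%a %b %d, %Y %I:%M%p"
    )
    text = f"{title} {body}".lower()
    event = next(
        (label for phrase, label in EVENT_TRIGGERS.items() if phrase in text),
        None,
    )
    return {
        "source": source.strip(),
        "category": head["category"],
        "published": published,
        "timezone": head["tz"],
        "location": DATELINE.match(body)["location"],
        "entities": COMPANY.findall(body),
        "amounts_usd_bn": [float(x) for x in AMOUNT.findall(body)],
        "event": event,
    }


info = extract(RAW)
```

Rules like these break quickly on other feeds, which is presumably why the slide lists named entity recognition and POS taggers next to regex.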

Page 9: Real time machine learning architecture & sentiment analysis

Text feature extraction

Train Document Set:

d1: The sky is blue.
d2: The sun is bright.

Test Document Set:

d3: The sun in the sky is bright.
d4: We can see the shining sun, the bright sun.

Vector Space Model (VSM)

[Matrix: one row per document (d1, d2, ...), one column per term (t1, t2, ...)]

Page 10: Real time machine learning architecture & sentiment analysis

Text feature extraction

Train Document Set:

d1: The sky is blue.
d2: The sun is bright.

Vocabulary

Term frequency (TF)
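The vocabulary and term-frequency steps can be sketched in a few lines of Python. The stop-word list below is an assumption (the slide does not say which words were dropped), and the helper names are made up for this example.

```python
import re
from collections import Counter

train = {"d1": "The sky is blue.", "d2": "The sun is bright."}
test = {
    "d3": "The sun in the sky is bright.",
    "d4": "We can see the shining sun, the bright sun.",
}

# Assumed stop-word list; not taken from the slides.
STOP_WORDS = {"the", "is", "in", "we", "can", "see"}


def tokenize(text):
    """Lower-case, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(docs):
    """Collect the distinct terms of the training set, sorted for a stable column order."""
    return sorted({tok for text in docs.values() for tok in tokenize(text)})


def tf_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


vocabulary = build_vocabulary(train)
tf = {name: tf_vector(text, vocabulary) for name, text in test.items()}
```

Terms that occur only in the test set, such as "shining", have no column because the vocabulary is built from the training documents.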

Page 11: Real time machine learning architecture & sentiment analysis

Text feature extraction

TF alone emphasizes terms that are present in almost the entire corpus; IDF scales them down

TF-IDF

TF example / IDF example

Normalized TF-IDF
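A minimal sketch of normalized TF-IDF on the four example sentences follows. It assumes the unsmoothed textbook form idf(t) = ln(N / df(t)) and L2 normalization; the slide's own formulas did not survive extraction, and libraries often use smoothed variants. Stop words are deliberately kept here so the effect on "the" is visible.

```python
import math
import re
from collections import Counter

docs = {
    "d1": "The sky is blue.",
    "d2": "The sun is bright.",
    "d3": "The sun in the sky is bright.",
    "d4": "We can see the shining sun, the bright sun.",
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


tokens = {name: tokenize(text) for name, text in docs.items()}
n_docs = len(tokens)

# Document frequency: number of documents that contain each term.
df = Counter(term for toks in tokens.values() for term in set(toks))

# Unsmoothed inverse document frequency (an assumed variant).
idf = {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf(toks):
    """Return the L2-normalized TF-IDF weights of one tokenized document."""
    tf = Counter(toks)
    weights = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: (w / norm if norm else 0.0) for t, w in weights.items()}


vectors = {name: tfidf(toks) for name, toks in tokens.items()}
```

Because "the" occurs in every document, its IDF is zero and it drops out of every vector, while a rare term such as "blue" dominates d1.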

Page 12: Real time machine learning architecture & sentiment analysis

Text feature extraction

Train Document Set:

d1: The sky is blue.
d2: The sun is bright.

Test Document Set:

d3: The sun in the sky is bright.
d4: We can see the shining sun, the bright sun.

Vector Space Model (VSM)

[Matrix: one row per document (d1, d2, ...), one column per term (t1, t2, ...)]

Machine Learning
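Once documents are vectors, any standard learner can consume them. The slide does not name a model, so the sketch below uses a nearest-neighbour lookup by cosine similarity as a simple stand-in. The count vectors assume the vocabulary (blue, bright, sky, sun) with stop words removed.

```python
import math

# Term-count vectors over the vocabulary (blue, bright, sky, sun),
# assuming stop words such as "the" and "is" were removed.
train = {"d1": [1, 0, 1, 0], "d2": [0, 1, 0, 1]}
test = {"d3": [0, 1, 1, 1], "d4": [0, 1, 0, 2]}


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vec, candidates):
    """Name of the candidate vector most similar to vec."""
    return max(candidates, key=lambda name: cosine(vec, candidates[name]))


matches = {name: nearest(vec, train) for name, vec in test.items()}
```

With labeled training documents, the label of the nearest neighbour would serve as the prediction. A real sentiment classifier would be trained on far more data.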

Page 13: Real time machine learning architecture & sentiment analysis

Processes in text analytics

Text
- Dow Jones, Bloomberg
- Web news, blogs, Twitter
- 1000+ sources

NLP
- Companies, indexes
- People, locations, organizations
- Events
- Regions

Feature Extraction

Classification

Training
- 15 years history
- Tens of millions of articles

Sentiment

Indexing
- Sector/industry
- Commodity, FX, ETFs
- Political, country risk
- Macroeconomic
- Fear, greed, anger, happiness

Aggregation
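The final aggregation step can be pictured as rolling article-level scores up to one figure per indexed entity. The records, the score range and the `aggregate` helper below are hypothetical. The deck does not describe the actual aggregation rule, so a plain mean with an article count is assumed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical article-level output of the classification step:
# (entity, index dimension, sentiment score in [-1, 1]).
scored_articles = [
    ("AT&T", "company", 0.6),
    ("AT&T", "company", 0.2),
    ("Time Warner", "company", 0.8),
    ("Telecom", "sector", 0.4),
    ("Telecom", "sector", -0.2),
]


def aggregate(records):
    """Group scores by (dimension, entity) and report the mean and article count."""
    buckets = defaultdict(list)
    for entity, dimension, score in records:
        buckets[(dimension, entity)].append(score)
    return {
        key: {"mean": mean(scores), "count": len(scores)}
        for key, scores in buckets.items()
    }


summary = aggregate(scored_articles)
```

Keying on the dimension as well as the entity keeps a company-level figure separate from a sector-level one that happens to share a name.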

Page 14: Real time machine learning architecture & sentiment analysis

Architecture requirements

❏ Guaranteed data processing
❏ Horizontal scalability
❏ Fault-tolerance
❏ Higher level abstraction than message passing
❏ Real-time machine learning for classification and predictive analytics

Page 15: Real time machine learning architecture & sentiment analysis

Analytics on Massive Historical Text Data

Analytics on recent past

Realtime analytics

Batch layer

Real-time layer

Architecture Solutions
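Splitting the work into a batch layer and a real-time layer resembles what is commonly called a lambda architecture: queries combine a precomputed batch view with a small view of events that arrived since the last batch run. The class below is a toy sketch of that idea. `SentimentViews` and its methods are invented for illustration, and it assumes each new batch view already covers every event seen so far.

```python
class SentimentViews:
    """Toy serving layer holding a batch view and a real-time view.

    Both views map entity -> (score_sum, article_count).
    """

    def __init__(self):
        self.batch = {}      # rebuilt periodically from the full history
        self.realtime = {}   # events received since the last batch run

    def on_event(self, entity, score):
        """Real-time layer: fold one freshly scored article into the live view."""
        total, count = self.realtime.get(entity, (0.0, 0))
        self.realtime[entity] = (total + score, count + 1)

    def load_batch(self, view):
        """Batch layer: swap in a recomputed view and reset the live view.

        Assumes the new batch view already includes all events seen so far.
        """
        self.batch = dict(view)
        self.realtime = {}

    def mean_sentiment(self, entity):
        """Answer a query by merging both views; None if the entity is unknown."""
        b_total, b_count = self.batch.get(entity, (0.0, 0))
        r_total, r_count = self.realtime.get(entity, (0.0, 0))
        count = b_count + r_count
        return (b_total + r_total) / count if count else None


views = SentimentViews()
views.load_batch({"AT&T": (3.0, 10)})   # hypothetical historical aggregate
views.on_event("AT&T", 1.0)             # hypothetical live article
merged = views.mean_sentiment("AT&T")
```

Storing sums and counts rather than means keeps the two views mergeable without revisiting the raw articles.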

Page 16: Real time machine learning architecture & sentiment analysis

Fast and general engine for large-scale distributed data processing

Memory, Network, CPUs, Disk

Reference: spark

[Figure: Logistic regression in Hadoop and Spark]

What’s Spark?

Page 17: Real time machine learning architecture & sentiment analysis

What’s Storm?

An open-source distributed real-time computation system that makes it easy to process unbounded streams of data

Storm was benchmarked at processing one million 100 byte messages per second per node on hardware with the following specs:

Processor: 2x Intel [email protected]

Memory: 24 GB

Reference: storm

Spout

bolt
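The spout and bolt roles can be mimicked with plain Python generators: a spout is the source of a stream of tuples, and each bolt consumes one stream and emits another. This is only a single-process analogy, not the Storm API. The function names and the second headline are made up for the example.

```python
from collections import Counter


def headline_spout(headlines):
    """Spout: the source of the stream; emits one tuple per headline."""
    for headline in headlines:
        yield (headline,)


def split_bolt(stream):
    """Bolt: consumes headline tuples and emits one tuple per lower-cased word."""
    for (headline,) in stream:
        for word in headline.lower().split():
            yield (word,)


def count_bolt(stream):
    """Bolt: keeps running counts and emits the updated count for each word."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


# The second headline is hypothetical sample data.
headlines = ["AT&T acquires Time Warner", "Time Warner shares rise"]

# Wiring spout -> bolt -> bolt plays the role of a topology.
emitted = list(count_bolt(split_bolt(headline_spout(headlines))))
```

In Storm the same graph would run continuously across worker processes, with the framework handling routing, parallelism and replay of failed tuples.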

Page 18: Real time machine learning architecture & sentiment analysis

Requirements

✓ Guaranteed data processing
✓ Horizontal scalability
✓ Fault-tolerance
✓ Higher level abstraction than message passing
✓ Real-time machine learning for classification and predictive analytics

Page 19: Real time machine learning architecture & sentiment analysis

Architecture

Producers: blogs, Twitter, news, Bloomberg...

Kafka

Apache Storm: filter, topic classification, sentiment calculation, entity detection, stock mapping, sentiment aggregation

Apache Spark: model training, batch cleaning, batch calculation

DFS: NLP models, ML models

NoSQL database: cache, persistent

Solr

Relational database

Web app

Page 20: Real time machine learning architecture & sentiment analysis

Use cases

Apply similar architecture in

➔ Scale analysis pipeline

➔ Live stats

➔ Recommendations

➔ Predictions

➔ Realtime analytics

➔ Online machine learning

Page 21: Real time machine learning architecture & sentiment analysis

Thank you!!!

Available @ [email protected] @infotrie

www.finsents.com @finsents

Page 22: Real time machine learning architecture & sentiment analysis

Use case in trading I: positive buzz

Sentiment in itself is a powerful trading indicator from which multiple trading strategies can be built

Simulate impact of complex events

Page 23: Real time machine learning architecture & sentiment analysis

Use case in trading II: monitoring & rebalancing

MiFID alert: improve client communication

Regulatory: process complex / low-signal events

ESG monitoring: Environmental – Social – Governance

A union calls for a strike in a factory in Argentina?

Negative news coverage of a stock I hold is accelerating in the Chinese press but has not yet reached the English press?

A European company employs children in Bangladesh (*)?

ACTIONS

Page 24: Real time machine learning architecture & sentiment analysis

Spark basics - word count

[Diagram labels: dfs, Job, Executor]

# text_file is an RDD of text lines
counts = (text_file.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
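To make the three steps of the word count concrete without a cluster, here is a single-process Python sketch that mirrors flatMap, map and reduceByKey on an in-memory list. It only imitates the data flow. Spark would run the same steps in parallel on partitions of the input, and the sample lines are an assumption.

```python
from functools import reduce
from itertools import chain, groupby
from operator import itemgetter

# Stand-in for the lines of the input file (sample data, not from the slide).
lines = ["the sky is blue", "the sun is bright"]

# flatMap: each line yields many words, flattened into one sequence.
words = chain.from_iterable(line.split(" ") for line in lines)

# map: turn every word into a (word, 1) pair.
pairs = [(word, 1) for word in words]

# reduceByKey: bring equal keys together, then fold their values with a + b.
pairs.sort(key=itemgetter(0))
counts = {
    word: reduce(lambda a, b: a + b, (n for _, n in group))
    for word, group in groupby(pairs, key=itemgetter(0))
}
```

The sort before `groupby` plays the role of the shuffle that brings identical keys together before they are reduced.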

Page 25: Real time machine learning architecture & sentiment analysis

Storm basics

Nimbus

Zookeeper

Zookeeper

Worker

Worker

Worker

Worker

Page 26: Real time machine learning architecture & sentiment analysis

Big Data in Finance

Big Data

Variety
- News, blogs, social media, analyst reports, company announcements, traders’ chat rooms…
- Financial reports, prices, economic events...
- Weather, GPS, images...

Volume
- ETL
- Machine learning
- Correlation analysis, regressions…

Velocity
- As fast as possible