Real Time Machine Learning Architecture & Sentiment Analysis
Quantcon 2016, Singapore
Juan CHENG, PhD
Data [email protected]
www.infotrie.com @infotrie
www.finsents.com @finsents
Agenda
● About us
● News analytics in finance
● A news analytics case
  • Information extraction of text
  • Text feature extraction for machine learning classification
  • Big data tools applied
  • Architecture that combines all
Our team
Frederic GEORJON, CEO
Ajil GEORGE, Head of Development Center
Daniel ABROUK, Head of EMEA
Paris/Singapore London
LONG Zhicheng, CTO
Singapore India
Services
FinSentS.com
➔ Real-time information and trading portal
➔ Millions of sources / Multilingual
➔ SaaS or on premises
➔ Real-time alerts
➔ Actionable signals
Sentiment Data
➔ Through API or third parties
➔ Up to 15 years of history
➔ Low latency / Tick by tick
➔ 50,000+ entities
➔ Stock, Forex, commodities, index, macroeconomic topics, etc.
Consultancy and Training
➔ Trading Technology
➔ Algorithmic trading
➔ Big Data
➔ Natural Language Processing (NLP)
➔ Machine Learning
Do you use news in your strategies?
A. No, I find news noisy. There is just too much of it.
B. No, I'm a quant. I find it hard to quantify news.
C. Yes, but I find using news is not very efficient. I have to manually relate it to my portfolio.
News Analytics in Finance
Access to News / News management
- Visualization tools
- Filtering tools
- On-demand view
Feed from multiple sources:
- Social media
- Web-based content
- Private sources
- Internal data
News Content Alerts based on sentiment indicator
Provide accurate information from the Big Data environment and push it in front of users in real time for risk management
Dashboard
- Consolidated dashboard
- Portfolio alerts
Actionable indicators
Users receive news signals for trading / hedging / risk management based on the sentiment indicator
Algo Trading / Robo Trading
Real-time algorithmic trading with sentiment indicator and news analytics
Equity Research / Sales Team
Hedging Trader / Prop Trader
- News tag cloud
- Filtering newsfeed with social media blotter, news blotter
- Search engine on demand
- Topics detection
- Rumours alerts
- News qualification per importance
- Relevant information from a single screen
- Automatic alert
- Integrated to OMS
Provide relevant news analytics indicators for hedging or trade idea generation
News analytics signals fully integrated into algo trading strategies
Reuters
MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT
AT&T acquires Time Warner for $85 billion
NEW YORK - AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion, the boldest move yet by a telecommunications company to acquire content to stream over its high-speed network to attract a growing number of online viewers.
The trend of consolidation comes as technology advances have been upending traditional entertainment companies. Many in the industry believe that getting bigger is the best way to compete with companies like Google, Apple, Netflix and Facebook.
David Goldman and Paul R. La Monica contributed to this report.
What’s in the news?
Source
Category
Time
Location
Named Entity
Sentiment
Event
Hacking skills, regex, NLP, named entity recognition, POS taggers
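As a rough illustration of the "regex" end of that toolbox, the sketch below pulls category, time, location, company names and deal size out of a shortened copy of the sample article. The function name `extract` and the patterns are assumptions made for this example, not the speaker's pipeline; a production system would rely on a trained named entity recognizer rather than the naive "words before Inc" rule.

```python
import re
from datetime import datetime

# Shortened copy of the sample article from the slide.
HEADER = "MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT"
BODY = ("NEW YORK - AT&T Inc said it agreed to buy Time Warner Inc "
        "for $85.4 billion, the boldest move yet by a telecommunications company.")


def extract(header, body):
    """Pull category, time, location, companies and deal size from one article."""
    # Header fields are separated by '|': category, date, time.
    category, date_part, time_part = [p.strip() for p in header.split("|")]
    clock = time_part.split()[0]  # keep "2:18am", drop the zone label "EDT"
    published = datetime.strptime(f"{date_part} {clock}", "%a %b %d, %Y %I:%M%p")

    # Dateline: an all-caps place name followed by a dash.
    dateline = re.match(r"([A-Z]+(?: [A-Z]+)*)\s*-\s*", body)
    location = dateline.group(1) if dateline else None
    text = body[dateline.end():] if dateline else body

    # Naive entity rule (assumption): capitalised words directly before "Inc".
    companies = re.findall(r"([A-Z][\w&]*(?: [A-Z][\w&]*)*) Inc\b", text)

    # Monetary amount such as "$85.4 billion", kept as (number, unit) strings.
    money = re.search(r"\$(\d+(?:\.\d+)?) (million|billion)", text)
    amount = money.groups() if money else None

    return {
        "category": category,
        "time": published,
        "location": location,
        "companies": companies,
        "amount": amount,
    }


record = extract(HEADER, BODY)
print(record)
```

Sentiment and event type are not covered here; they need the classification steps discussed later in the deck.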
Text feature extraction
Train Document Set:
d1: The sky is blue.
d2: The sun is bright.
Test Document Set:
d3: The sun in the sky is bright.
d4: We can see the shining sun, the bright sun.
Vector Space Model (VSM)
Document-term matrix: rows d1, d2, ... (documents); columns t1, t2, ... (terms)
Text feature extraction
Train Document Set:
d1: The sky is blue.
d2: The sun is bright.
Vocabulary
Term frequency (TF)
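A minimal sketch of how the vocabulary and term-frequency vectors could be built for the four example documents. The stop-word list and the helper names (`tokenize`, `build_vocabulary`, `tf_vector`) are assumptions made for this illustration; the slides do not say which preprocessing was used.

```python
import re
from collections import Counter

# Assumed stop-word list, chosen only so the toy vocabulary stays small.
STOP_WORDS = {"the", "is", "in", "we", "can", "see"}

train = {"d1": "The sky is blue.", "d2": "The sun is bright."}
test = {
    "d3": "The sun in the sky is bright.",
    "d4": "We can see the shining sun, the bright sun.",
}


def tokenize(text):
    """Lower-case, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(docs):
    """Map each term seen in the training documents to a column index."""
    vocab = {}
    for text in docs.values():
        for term in tokenize(text):
            vocab.setdefault(term, len(vocab))
    return vocab


def tf_vector(text, vocab):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]  # dicts keep insertion order


vocabulary = build_vocabulary(train)
for name, text in test.items():
    print(name, tf_vector(text, vocabulary))
```

Terms that never occurred in the training set (here "shining") have no column and are simply ignored when a test document is vectorized.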
Text feature extraction
TF overemphasizes terms that appear in almost every document of the corpus
TF-IDF
TF example, IDF example
Normalized TF-IDF
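The weighting can be sketched as follows. The count matrix is an assumed input (rows d3 and d4, columns sky, blue, sun, bright, counted after removing common stop words), and the smoothed formula idf(t) = ln((1 + N) / (1 + df(t))) + 1 is one widely used variant, not necessarily the one shown on the slide. Each row is then scaled to unit L2 length.

```python
import math

# Assumed raw term counts: rows are d3 and d4, columns are the terms below.
TERMS = ["sky", "blue", "sun", "bright"]
COUNTS = [
    [1, 0, 1, 1],  # d3: The sun in the sky is bright.
    [0, 0, 2, 1],  # d4: We can see the shining sun, the bright sun.
]


def idf_weights(counts):
    """Smoothed inverse document frequency for every column."""
    n_docs = len(counts)
    weights = []
    for col in range(len(counts[0])):
        df = sum(1 for row in counts if row[col] > 0)  # documents containing the term
        weights.append(math.log((1 + n_docs) / (1 + df)) + 1)
    return weights


def tfidf_normalized(counts):
    """Multiply TF by IDF, then scale each row to unit L2 norm."""
    idf = idf_weights(counts)
    rows = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted))
        rows.append([x / norm if norm else 0.0 for x in weighted])
    return rows


idf = idf_weights(COUNTS)
matrix = tfidf_normalized(COUNTS)
for term, weight in zip(TERMS, idf):
    print(f"idf({term}) = {weight:.3f}")
for row in matrix:
    print([round(x, 3) for x in row])
```

"sky" occurs in only one of the two documents, so it receives a higher IDF than "sun" or "bright", which occur in both.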
Machine Learning
- Companies, indexes
- People, locations, organizations
- Events
- Regions
NLP
Text
- Dow Jones, Bloomberg
- Web news, blogs, Twitter
- 1000+ sources
Feature Extraction
Classification
Sentiment
- 15 years of history
- Tens of millions of articles
Training
Indexing
- Sector/industry
- Commodity, FX, ETFs
- Political, country risk
- Macroeconomic
- Fear, greed, anger, happiness
Aggregation
Processes in text analytics
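To make the training, classification and aggregation steps concrete, here is a toy sketch: a multinomial Naive Bayes classifier trained on four invented headlines, followed by a net-sentiment score per entity. Naive Bayes, the headlines, and the tickers ACME and GLOBEX are all assumptions chosen for illustration; the deck does not state which classifier or data the production system uses.

```python
import math
import re
from collections import Counter, defaultdict

# Toy labelled headlines, invented purely for illustration.
TRAINING = [
    ("profit surges after strong sales", "positive"),
    ("shares rally on upbeat earnings", "positive"),
    ("stock plunges on weak earnings", "negative"),
    ("profit warning after weak sales", "negative"),
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label (the 'Training' step)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Pick the label with the highest log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # unseen words carry no evidence
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def net_sentiment(articles, model):
    """Aggregate per entity: mean of +1 (positive) and -1 (negative)."""
    votes = defaultdict(list)
    for entity, text in articles:
        votes[entity].append(1 if classify(text, model) == "positive" else -1)
    return {entity: sum(v) / len(v) for entity, v in votes.items()}


model = train(TRAINING)
articles = [  # hypothetical (entity, headline) pairs
    ("ACME", "shares rally on strong sales"),
    ("ACME", "stock plunges after profit warning"),
    ("ACME", "shares rally on upbeat earnings"),
    ("GLOBEX", "profit warning after weak sales"),
]
scores = net_sentiment(articles, model)
print(scores)
```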
Architecture requirements
❏ Guaranteed data processing
❏ Horizontal scalability
❏ Fault-tolerance
❏ Higher level abstraction than message passing
❏ Real-time machine learning for classification and predictive analytics
Analytics on Massive Historical Text Data
Analytics on the recent past
Real-time analytics
Batch layer / Real-time layer
Architecture Solutions
Fast and general engine for large-scale distributed data processing
Memory, Network, CPUs, Disk
Reference: spark
Logistic regression in Hadoop and Spark
What's Spark?
What’s Storm?
An open-source distributed real-time computation system that makes it easy to process unbounded streams of data
Storm was benchmarked at processing one million 100-byte messages per second per node on hardware with the following specs:
Processor: 2x Intel [email protected]
Memory: 24 GB
Reference: storm
Spout
bolt
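In Storm, a spout is the source of a stream and a bolt consumes a stream, processes it, and may emit a new one. The sketch below mimics that data flow with plain Python generators in a single process. It is an analogy only and does not use the Storm API; the names `sentence_spout`, `split_bolt` and `count_bolt` are made up for this example.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the source of the stream, emitting one tuple at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Bolt: consumes sentences and emits one word per tuple."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream):
    """Bolt: keeps running counts and emits (word, count so far)."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


# Wire the topology: spout -> split bolt -> count bolt.
topology = count_bolt(split_bolt(sentence_spout(["the sun", "the sky"])))
results = list(topology)
print(results)
```

A real topology would run many instances of each spout and bolt in parallel across worker processes, with Storm routing tuples between them.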
Requirements
✓ Guaranteed data processing
✓ Horizontal scalability
✓ Fault-tolerance
✓ Higher level abstraction than message passing
✓ Real-time machine learning for classification and predictive analytics
NoSQL Database (cache, persistent)
Kafka
Filter, topic classification, sentiment calculation, entity detection, stock mapping, sentiment aggregation
Apache Storm
DFS: NLP models, ML models
Producers: blogs, Twitter, news, Bloomberg...
Model training, batch cleaning, batch calculation
Apache Spark
Solr
Relational Database
Web app
Architecture
Use cases
Apply similar architecture in:
➔ Scale analysis pipeline
➔ Live stats
➔ Recommendations
➔ Predictions
➔ Real-time analytics
➔ Online machine learning
USE CASE in Trading I - Positive buzz
Sentiment in itself is a powerful trading indicator out of which multiple trading strategies can be built
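One way such an indicator might be turned into a signal is a rolling average with a threshold, sketched below. The rule, the window of 3 and the threshold of 0.5 are hypothetical values chosen for illustration and are not taken from the talk; sentiment scores are assumed to lie in [-1, 1].

```python
from collections import deque


def buzz_signals(scores, window=3, threshold=0.5):
    """Yield 'buy' when the rolling mean sentiment exceeds the threshold.

    No signal is raised until a full window of scores has been seen.
    """
    recent = deque(maxlen=window)  # drops the oldest score automatically
    for score in scores:
        recent.append(score)
        rolling_mean = sum(recent) / len(recent)
        if len(recent) == window and rolling_mean > threshold:
            yield "buy"
        else:
            yield "hold"


# Hypothetical per-period sentiment scores for one stock.
signals = list(buzz_signals([0.1, 0.6, 0.7, 0.8, -0.2]))
print(signals)
```

Only the fourth period, where the last three scores average 0.7, crosses the threshold.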
Simulate impact of complex events
USE CASE in Trading II - Monitoring & Rebalancing
MiFID alert: improve client communication
Regulatory
Process complex / low-signal events
ESG monitoring: Environmental – Social – Governance
A union calls for a strike at a factory in Argentina?
Negative news coverage of a stock I hold is accelerating in the Chinese press but has not yet reached the English press?
A European company employs children in Bangladesh (*)?
ACTIONS
dfs
Spark basics - word count
text_file.flatMap(lambda line: line.split(" ")) \
         .map(lambda word: (word, 1)) \
         .reduceByKey(lambda a, b: a + b)
Job
Executor
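The three transformations in the word count can be traced without a cluster. The sketch below emulates flatMap, map and reduceByKey on a local Python list; `flat_map` and `reduce_by_key` are stand-ins written for this example, not Spark functions, and nothing here is distributed across executors.

```python
from itertools import chain


def flat_map(func, items):
    """Apply func to every item and flatten the resulting lists."""
    return chain.from_iterable(func(item) for item in items)


def reduce_by_key(func, pairs):
    """Merge the values of equal keys with func, as reduceByKey does."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Stand-in for the lines of text_file.
lines = ["the sky is blue", "the sun is bright"]

words = flat_map(lambda line: line.split(" "), lines)  # one word per element
pairs = map(lambda word: (word, 1), words)             # (word, 1) tuples
counts = reduce_by_key(lambda a, b: a + b, pairs)      # sum the 1s per word
print(counts)
```

In Spark the same chain is evaluated lazily and split into tasks that run on the executors.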
Storm basics
Nimbus
ZooKeeper (x2)
Worker (x4)
Big Data in Finance
Velocity
Big Data
Variety
- News, blogs, social media, analyst reports, company announcement, traders’ chat room…
- Financial reports, price, economic events...
- Weather, GPS, image....
Volume
- ETL
- Machine learning
- Correlation analysis
- Regressions...
- As fast as possible