Scraping the Web with Scrapinghub For Finance

Using Web Data for Finance


Page 1: Using Web Data for Finance

Scraping the Web with Scrapinghub For Finance

Page 2: Using Web Data for Finance

We turn web content into useful data

Page 3: Using Web Data for Finance

About Scrapinghub

Scrapinghub specializes in data extraction. Our platform is used to scrape over 4 billion web pages a month.

We offer:

● Professional Services to handle the web scraping for you

● Off-the-shelf datasets so you can get data hassle free

● A cloud-based platform that makes scraping a breeze

Page 4: Using Web Data for Finance

Founded in 2010, largest 100% remote company based outside of the US

We’re 134 teammates in 48 countries

Page 5: Using Web Data for Finance

“Getting information off the Internet is like taking a drink from a fire hydrant.”

– Mitchell Kapor

Page 6: Using Web Data for Finance

Scrapy

Scrapy is a web scraping framework that gets the dirty work related to web crawling out of your way.

Benefits
● No platform lock-in: Open Source
● Very popular (13k+ ★)
● Battle tested
● Highly extensible
● Great documentation

Page 7: Using Web Data for Finance

Portia

Portia is a Visual Scraping tool that lets you get data without needing to write code.

Benefits
● No platform lock-in: Open Source
● JavaScript dynamic content generation
● Ideal for non-developers
● Extensible
● It’s as easy as annotating a page

Page 8: Using Web Data for Finance

Portia

Page 9: Using Web Data for Finance

Large Scale Infrastructure

Meet Scrapy Cloud, our PaaS for web crawlers:

● Scalable: Crawlers run on EC2 instances or dedicated servers
● Crawlera add-on
● Control your spiders: Command line, API or web UI
● Machine learning integration: BigML, MonkeyLearn
● No lock-in: scrapyd to run Scrapy spiders on your own infrastructure

Page 10: Using Web Data for Finance

Broad Crawls

Frontera allows us to build large scale web crawlers in Python:

● Scrapy support out of the box
● Distribute and scale custom web crawlers across servers
● Crawl Frontier Framework: large scale URL prioritization logic
● Aduana to prioritize URLs based on link analysis (PageRank, HITS)
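To make the "URL prioritization" idea concrete, here is a minimal sketch of a crawl frontier in plain Python. It is not Frontera's or Aduana's API: the `CrawlFrontier` class, its method names, and the numeric `score` are all assumptions made for illustration. In a real broad crawl the score could come from link analysis such as PageRank or HITS.

```python
import heapq
from urllib.parse import urldefrag


class CrawlFrontier:
    """Toy crawl frontier (hypothetical): hands out the highest-scoring unseen URL first."""

    def __init__(self):
        self._heap = []      # entries are (-score, insertion_order, url)
        self._seen = set()   # URLs already scheduled, to avoid re-crawling
        self._counter = 0    # tie-breaker so equal scores pop in FIFO order

    def add(self, url, score=0.0):
        """Schedule a URL unless it was scheduled before. Returns True if added."""
        url = urldefrag(url).url  # drop "#fragment" so duplicates collapse
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the largest first
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1
        return True

    def next_url(self):
        """Return the highest-priority URL, or None when the frontier is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

A crawler loop would call `next_url()`, fetch the page, and feed every discovered link back through `add()` with a freshly computed score.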

Page 11: Using Web Data for Finance

Web Scraping Use Cases

Page 12: Using Web Data for Finance

Competitive Pricing

Companies use web scraping to monitor the pricing and the ratings of competitors:

● Scrape online retailers
● Structure the data in a search engine or DB
● Create an interface to search for products
● Sentiment analysis for product rankings
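As a rough sketch of the "structure the data" step, the snippet below normalizes scraped price strings and finds the cheapest offer per product. The `PriceRecord` fields, the helper names, and the handled currency symbols are assumptions for illustration; a production pipeline would also need currency conversion and product matching across retailers.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PriceRecord:
    """One scraped offer (hypothetical schema)."""
    retailer: str
    sku: str
    price: Decimal


def parse_price(text):
    """Turn a scraped price string such as '$1,299.00' into a Decimal."""
    cleaned = text.strip().lstrip("$€£").replace(",", "")
    return Decimal(cleaned)


def cheapest_by_sku(records):
    """Return {sku: PriceRecord} holding the lowest price seen per product."""
    best = {}
    for rec in records:
        if rec.sku not in best or rec.price < best[rec.sku].price:
            best[rec.sku] = rec
    return best
```

`Decimal` is used instead of `float` so that prices compare exactly, which matters when two retailers differ by a cent.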

Page 13: Using Web Data for Finance

Monitor Resellers

We help a leading IT manufacturer monitor the activities of their resellers:

● Tracking and watching out for stolen goods
● Pricing agreement violations
● Customer support responses to complaints
● Product line quality checks

Page 14: Using Web Data for Finance

Lead Generation

Mine scraped data to identify who to target in a company for your outbound sales campaigns:

● Locate possible leads in your target market
● Identify the right contacts within each one
● Augment the information you already have on them

Page 15: Using Web Data for Finance

Real Estate

Crawl property websites and use the data obtained in order to:

● Estimate house prices and rental values
● Track housing stock movements
● Give insight into real estate agents and homeowners

Page 16: Using Web Data for Finance

Fraud Detection

Monitor for sellers that offer products violating the ToS of credit card companies, including:
● Drugs
● Weapons
● Gambling

Identify stolen cards and IDs on the Dark Web:
● Forums where hackers share ID numbers / PINs

Page 17: Using Web Data for Finance

Company Reputation

Sentiment analysis of a company or product through newsletters, social networks and other natural language data sources.

● NLP to create an associated sentiment indicator.
● Tracking the relevant news supporting the indicator can lead to market insights for long-term trends.
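A sentiment indicator can be as simple or as sophisticated as the data allows. The sketch below is a deliberately naive, lexicon-based version: the word lists and function names are hypothetical, and a real system would more likely use a trained NLP model than keyword counting.

```python
import re

# Hypothetical word lists, for illustration only
POSITIVE = {"growth", "profit", "beat", "strong", "upgrade"}
NEGATIVE = {"loss", "lawsuit", "recall", "weak", "downgrade"}


def sentiment_score(text):
    """Score one text in [-1, 1]: (positives - negatives) / matched words."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # Texts with no lexicon words are treated as neutral
    return 0.0 if total == 0 else (pos - neg) / total


def sentiment_indicator(texts):
    """Average the per-text scores into a single indicator for a period."""
    scores = [sentiment_score(t) for t in texts]
    return sum(scores) / len(scores) if scores else 0.0
```

Computing `sentiment_indicator` over each day's scraped headlines yields a time series that can be tracked alongside the news items behind it.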

Page 18: Using Web Data for Finance

Consumer Behavior

Extract data from forums and websites like Reddit to evaluate consumer reviews and commentary:

● Volume of comments across brands
● Topics of discussion
● Comparisons with other brands and products
● Evaluate product launches and marketing tactics

Page 19: Using Web Data for Finance

Tracking Legislation

Monitor bills and regulations that are being discussed in Congress. Access court judgments and opinions in order to:

● Follow discussions
● Try to forecast legislative outcomes
● Track regulations that impact different economic sectors

Page 20: Using Web Data for Finance

Hiring

Crawl and extract data from job boards and other sources in order to:
● Understand hiring trends in different sectors or regions
● Find candidates for jobs, or future leaders
● Spot and rescue employees who are shopping for a new job

Page 21: Using Web Data for Finance

Monitoring Corruption

Journalists and analysts can create Open Data by extracting information from difficult-to-access government websites:

● Track the activities of lobbyists
● Patterns in the behavior of government officials
● Disruptions in the economy due to corruption allegations

Page 22: Using Web Data for Finance

Thank you!

scrapinghub.com
