Introduction of Feedy
#pyconapac #pyconapac2016
Masashi Shibata
Student Programmer in Japan
c-bata
PyCon JP staff
c_bata_en
RSS Feed
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
  <channel>
    <title>c-bata’s weblog</title>
    <link>http://example.com</link>
    <description>c-bata’s weblog</description>
    <item>
      <title>Introduction of Feedy</title>
      <link>http://example.com/foo</link>
      <description>short description</description>
    </item>
    <item>
      <title>XML Tutorial</title>
      <link>http://example.com/bar</link>
      <description>Description about bar</description>
    </item>
    :
RSS Feed XML
Each feed item consists of:
title
url
description
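As a rough illustration of those three fields, the sketch below reads them from an RSS 2.0 document with Python's standard `xml.etree.ElementTree`. The sample feed and the `parse_items` helper are made up for this example and are not part of Feedy.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, modelled on the XML shown on the slide.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>c-bata's weblog</title>
    <link>http://example.com</link>
    <description>c-bata's weblog</description>
    <item>
      <title>Introduction of Feedy</title>
      <link>http://example.com/foo</link>
      <description>short description</description>
    </item>
    <item>
      <title>XML Tutorial</title>
      <link>http://example.com/bar</link>
      <description>Description about bar</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return one dict per <item> with its title, url and description."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


items = parse_items(SAMPLE_FEED)
for entry in items:
    print(entry["title"], entry["url"])
```

Note that the feed carries only a short description per item, not the article's HTML.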
Collecting images from RSS Feed
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
  <channel>
    <title>c-bata’s weblog</title>
    <link>http://example.com</link>
    <description>c-bata’s weblog</description>
    <item>
      <title>Introduction of Feedy</title>
      <link>http://example.com/foo</link>
      <description>Description about foo</description>
    </item>
    <item>
      <title>XML Tutorial</title>
      <link>http://example.com/bar</link>
      <description>Description about bar</description>
    </item>
    :
RSS Feed XML

Article 1
<html lang=en>
  <head>
    <title>introduction of feedy | c-bata’s weblog</title>
  </head>
  <body>
    <h1>Introduction of Feedy</h1>
    :
  </body>
</html>

Article 2
<html lang=en>
  <head>
    <title>introduction of feedy | c-bata’s weblog</title>
  </head>
  <body>
    <h1>Introduction of Feedy</h1>
    :
  </body>
</html>
FEED ITEMS
If you want to collect images, you have to fetch the HTML of each article.
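Done by hand, this is a two-step job: parse the feed, then download every linked page. The sketch below uses only the standard library. The names `fetch` and `collect_article_html` are hypothetical helpers for illustration, not Feedy's API, and error handling is left out.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download a URL and return the raw response bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def collect_article_html(feed_url, fetcher=fetch):
    """Map each article link found in the feed to that article's HTML.

    `fetcher` is injectable so the function can be tried without
    network access (an assumption of this sketch).
    """
    feed = ET.fromstring(fetcher(feed_url))
    pages = {}
    for item in feed.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link:
            # Step 2: one extra request per feed item.
            pages[link] = fetcher(link)
    return pages
```

Calling `collect_article_html('<RSS_FEED_URL>')` issues one request for the feed plus one request per item.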
Scrapy is a little complex when you only want to fetch RSS feed items.
http://doc.scrapy.org/en/1.0/topics/architecture.html
$ pip install feedy
Usage

from feedy import Feedy

app = Feedy('feedy.dat')


@app.add('<RSS_FEED_URL>')
def func(info, body):
    # do something
    pass


if __name__ == '__main__':
    app.run()
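The `@app.add(...)` style is a decorator-based registry. The following is a minimal sketch of that general pattern and is not Feedy's actual implementation: `MiniApp`, `fetch_entries`, and the shape of `info` are all assumptions made for illustration.

```python
class MiniApp:
    """Toy registry: maps feed URLs to handler functions."""

    def __init__(self):
        self.handlers = {}

    def add(self, feed_url):
        """Return a decorator that registers a handler for feed_url."""
        def decorator(func):
            self.handlers[feed_url] = func
            return func  # leave the decorated function usable as-is
        return decorator

    def run(self, fetch_entries):
        """Call each handler once per (info, body) pair of its feed.

        fetch_entries is a hypothetical callable taking a feed URL and
        returning an iterable of (info, body) pairs.
        """
        for feed_url, handler in self.handlers.items():
            for info, body in fetch_entries(feed_url):
                handler(info, body)


mini = MiniApp()
titles = []


@mini.add("http://example.com/rss")
def remember_title(info, body):
    titles.append(info["title"])


# Stand-in for real fetching: a single fake entry for any feed URL.
mini.run(lambda url: [({"title": "Introduction of Feedy"}, "<html></html>")])
print(titles)
```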
With BeautifulSoup4

from feedy import Feedy
from bs4 import BeautifulSoup

app = Feedy('feedy.dat')


@app.add('<RSS_FEED_URL>')
def func(info, body):
    soup = BeautifulSoup(body, "html.parser")
    # do something


if __name__ == '__main__':
    app.run()
body: the HTML body of each feed item
Collecting images using Feedy

from feedy import Feedy
from bs4 import BeautifulSoup

app = Feedy('feedy.dat')


@app.add('http://rss.cnn.com/rss/edition.rss')
def cnn(info, body):
    soup = BeautifulSoup(body, "html.parser")
    # Print the source URL of every <img> with class "foo".
    for x in soup.find_all('img', attrs={'class': 'foo'}):
        print(x['src'])


if __name__ == '__main__':
    app.run()
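To see what the image filter does without a network or extra packages, here is a rough standard-library equivalent. It swaps BeautifulSoup for `html.parser.HTMLParser`, and `ImageCollector` and `image_sources` are hypothetical names for this sketch only. It roughly mirrors `find_all('img', attrs={'class': 'foo'})` by matching any `<img>` whose class list contains the given class.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src of <img> tags, optionally filtered by CSS class."""

    def __init__(self, css_class=None):
        super().__init__()
        self.css_class = css_class
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attrs.get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            src = attrs.get("src")
            if src:
                self.sources.append(src)


def image_sources(html, css_class=None):
    """Return the src values of matching <img> tags in document order."""
    collector = ImageCollector(css_class)
    collector.feed(html)
    return collector.sources


sample = (
    '<body><img class="foo big" src="/a.png">'
    '<img class="bar" src="/b.png">'
    '<img class="foo" src="/c.png" /></body>'
)
print(image_sources(sample, "foo"))
```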
Adding other RSS feeds

@app.add('http://site1.com/rss')
def site1(info, body):
    soup = BeautifulSoup(body, "html.parser")
    for x in soup.find_all('img', attrs={'class': 'foo'}):
        print(x['src'])


@app.add('http://any-other-website.com/rss')
def site2(info, body):
    soup = BeautifulSoup(body, "html.parser")
    for x in soup.find_all('img', attrs={'class': 'bar'}):
        print(x['src'])
Plugins
Getting social share counts

from feedy import Feedy
from feedy_plugins import social_share_plugin

app = Feedy(store='feedy.dat', ignore_fetched=True)
# Install the plugin under the same name it was imported with.
app.install(social_share_plugin)


@app.add('http://rss.cnn.com/rss/edition.rss')
def cnn_shared(info, body, social_count):
    article = {
        'pocket': social_count['pocket_count'],
        'facebook': social_count['facebook_count'],
    }
    print(article)
github.com/c-bata/feedy
Pull requests and issues are welcome :)
More details are available on GitHub.
Thank you!