Introduction of Feedy


#pyconapac #pyconapac2016

Masashi Shibata

Student Programmer in Japan

c-bata

PyCon JP staff

c_bata_en

RSS Feed

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
  <channel>
    <title>c-bata’s weblog</title>
    <link>http://example.com</link>
    <description>c-bata’s weblog</description>
    <item>
      <title>Introduction of Feedy</title>
      <link>http://example.com/foo</link>
      <description>short description</description>
    </item>
    <item>
      <title>XML Tutorial</title>
      <link>http://example.com/bar</link>
      <description>Description about bar</description>
    </item>
    :

RSS Feed XML

Each feed item consists of a title, a URL, and a description.
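As a rough illustration of the three fields, the sketch below reads the sample feed with Python's standard `xml.etree.ElementTree` module. The closing `</channel>` and `</rss>` tags are an assumption added so the sample parses (the slide elides the rest of the feed), and `parse_items` is a hypothetical helper name, not part of Feedy.

```python
import xml.etree.ElementTree as ET

# Sample feed from the slide; the closing tags are assumed so that it parses.
RSS = """<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
  <channel>
    <title>c-bata's weblog</title>
    <link>http://example.com</link>
    <description>c-bata's weblog</description>
    <item>
      <title>Introduction of Feedy</title>
      <link>http://example.com/foo</link>
      <description>short description</description>
    </item>
    <item>
      <title>XML Tutorial</title>
      <link>http://example.com/bar</link>
      <description>Description about bar</description>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return one dict per <item> with its title, URL and description."""
    root = ET.fromstring(rss_text.encode("utf-8"))
    return [
        {
            "title": item.findtext("title"),
            "url": item.findtext("link"),
            "description": item.findtext("description"),
        }
        for item in root.iter("item")
    ]


items = parse_items(RSS)
for item in items:
    print(item["title"], item["url"])
```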


Collecting images from RSS Feed

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
  <channel>
    <title>c-bata’s weblog</title>
    <link>http://example.com</link>
    <description>c-bata’s weblog</description>
    <item>
      <title>Introduction of Feedy</title>
      <link>http://example.com/foo</link>
      <description>Description about foo</description>
    </item>
    <item>
      <title>XML Tutorial</title>
      <link>http://example.com/bar</link>
      <description>Description about bar</description>
    </item>
    :

RSS Feed XML

<html lang=en>
  <head>
    <title>introduction of feedy | c-bata’s weblog</title>
  </head>
  <body>
    <h1>Introduction of Feedy</h1>
    :
  </body>
</html>

Article 1

<html lang=en>
  <head>
    <title>introduction of feedy | c-bata’s weblog</title>
  </head>
  <body>
    <h1>Introduction of Feedy</h1>
    :
  </body>
</html>

Article 2

FEED ITEMS

If you want to collect images, you have to fetch the HTML of each article.
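To make that extra step concrete, here is a minimal sketch that pulls image URLs out of one article's HTML using only the standard library's `html.parser`. The `<img>` tags in the sample page, the `ImageCollector` class, and the `collect_images` helper are all assumptions for illustration. A real script would first download each item's link (for example with `urllib.request.urlopen`), which is left out so the sketch runs offline.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def collect_images(html):
    """Return the image URLs found in an HTML string, in document order."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


# Hypothetical article page; the <img> tags are not in the original slide.
ARTICLE_HTML = """<html lang="en">
<head><title>introduction of feedy | c-bata's weblog</title></head>
<body>
<h1>Introduction of Feedy</h1>
<img src="http://example.com/images/feedy.png" alt="feedy">
<p>text <img src="/images/logo.png"/></p>
</body>
</html>"""

images = collect_images(ARTICLE_HTML)
print(images)
```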

A little complex when just fetching RSS feed items.

http://doc.scrapy.org/en/1.0/topics/architecture.html

$ pip install feedy

Usage

from feedy import Feedy

app = Feedy('feedy.dat')

@app.add('<RSS_FEED_URL>')
def func(info, body):
    # do something
    pass

if __name__ == '__main__':
    app.run()
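The `@app.add(...)` style is the ordinary decorator-registration pattern. The sketch below shows how such a pattern can work in general. `MiniFeedy` and `fake_fetch_entries` are hypothetical stand-ins written for this illustration and are not Feedy's actual implementation; the contents of `info` are also assumed.

```python
class MiniFeedy:
    """Hypothetical stand-in for illustration; NOT Feedy's real code."""

    def __init__(self):
        self.handlers = {}  # feed URL -> callback function

    def add(self, feed_url):
        """Return a decorator that registers a callback for feed_url."""
        def decorator(func):
            self.handlers[feed_url] = func
            return func
        return decorator

    def run(self, fetch_entries):
        """Call each callback once per (info, body) pair of its feed.

        fetch_entries(feed_url) is an assumed hook that yields those pairs.
        """
        for feed_url, func in self.handlers.items():
            for info, body in fetch_entries(feed_url):
                func(info, body)


def fake_fetch_entries(feed_url):
    """Offline replacement for real fetching: yields one canned entry."""
    info = {"title": "Introduction of Feedy", "link": "http://example.com/foo"}
    yield info, "<html>...</html>"


app = MiniFeedy()
seen = []


@app.add("http://example.com/rss")
def collect(info, body):
    seen.append((info["title"], body))


app.run(fake_fetch_entries)
print(seen)
```

Keeping the callback signature as `(info, body)` mirrors the usage on the slide, so each handler only deals with one article at a time.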

With BeautifulSoup4

from feedy import Feedy
from bs4 import BeautifulSoup

app = Feedy('feedy.dat')

@app.add('<RSS_FEED_URL>')
def func(info, body):
    soup = BeautifulSoup(body, "html.parser")
    # do something


body is the HTML body of each feed item.

Collecting images using Feedy

from feedy import Feedy
from bs4 import BeautifulSoup

app = Feedy('feedy.dat')

@app.add('http://rss.cnn.com/rss/edition.rss')
def cnn(info, body):
    soup = BeautifulSoup(body, "html.parser")
    for x in soup.find_all('img', attrs={'class': 'foo'}):
        print(x['src'])


Adding other RSS Feed

@app.add('http://site1.com/rss')
def site1(info, body):
    soup = BeautifulSoup(body, "html.parser")
    for x in soup.find_all('img', attrs={'class': 'foo'}):
        print(x['src'])

@app.add('http://any-other-website.com/rss')
def site2(info, body):
    soup = BeautifulSoup(body, "html.parser")
    for x in soup.find_all('img', attrs={'class': 'bar'}):
        print(x['src'])

Plugins

Getting social share counts

from feedy import Feedy
from feedy_plugins import social_share_plugin

app = Feedy(store='feedy.dat', ignore_fetched=True)
app.install(social_share_plugin)

@app.add('http://rss.cnn.com/rss/edition.rss')
def cnn_shared(info, body, social_count):
    article = {
        'pocket': social_count['pocket_count'],
        'facebook': social_count['facebook_count'],
    }
    print(article)
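One way a plugin could supply an extra argument such as `social_count` is to compute keyword arguments before the callback is called. The sketch below illustrates that idea only. `MiniFeedy`, `dispatch`, and `fake_share_plugin` are hypothetical names, the share counts are canned numbers, and nothing here is taken from Feedy's or feedy_plugins' actual code.

```python
class MiniFeedy:
    """Hypothetical stand-in for illustration; NOT Feedy's real code."""

    def __init__(self):
        self.handlers = {}  # feed URL -> callback function
        self.plugins = []   # callables returning extra keyword arguments

    def install(self, plugin):
        self.plugins.append(plugin)

    def add(self, feed_url):
        def decorator(func):
            self.handlers[feed_url] = func
            return func
        return decorator

    def dispatch(self, feed_url, info, body):
        """Call the callback with (info, body) plus plugin-provided kwargs."""
        extra = {}
        for plugin in self.plugins:
            extra.update(plugin(info, body))
        return self.handlers[feed_url](info, body, **extra)


def fake_share_plugin(info, body):
    """Assumed plugin shape: returns canned counts instead of calling any API."""
    return {"social_count": {"pocket_count": 3, "facebook_count": 5}}


app = MiniFeedy()
app.install(fake_share_plugin)


@app.add("http://example.com/rss")
def shared(info, body, social_count):
    return {
        "pocket": social_count["pocket_count"],
        "facebook": social_count["facebook_count"],
    }


article = app.dispatch(
    "http://example.com/rss",
    {"title": "Introduction of Feedy"},
    "<html></html>",
)
print(article)
```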

github.com/c-bata/feedy

Pull requests and issues are welcome :)

More details are available on GitHub.

http://github.com/c-bata/feedy

Thank you!
