Hacking RSS: Filtering & Processing Obscene Amounts of Information #hackingRSS
Dawn Foster, Intel Community Manager for MeeGo, [email protected]

Hacking RSS: Filtering & Processing Obscene Amounts of Information (short version)


DESCRIPTION

The 15 minute version of the longer talk that I delivered at SXSW in March. More details: http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/


Page 1: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Hacking RSS: Filtering & Processing Obscene Amounts of Information
#hackingRSS

Dawn Foster
Intel Community Manager for MeeGo
[email protected]

Page 2: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Information Overload

CD Photo: http://www.flickr.com/photos/chefranden/2751354004/

Page 3: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Who Cares?

● Most of it is …
  – complete crap
  – out of date / obsolete
  – not interesting to you
  – irrelevant for you

Junk Pile: http://www.flickr.com/photos/zen/4013525/

Page 4: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

You Want to Find the Needle

Haystacks: http://www.flickr.com/photos/rasekh/4911673659/

Page 5: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

RSS Alone is a Start

● Sources you care about delivered right to you. But …
  – Do you care about everything in each feed?
  – What about the feeds you aren't subscribed to?
  – Can you keep up with what you have?

Page 6: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Prioritize Your Reader

● Put things you care about at the top
● Categorize
● Don't try to read everything

Page 7: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

The Real Magic is in Filtering RSS

● In my Google Reader right now:
  – Analyst research blogs mentioning Online Community
  – Analyst research blogs mentioning MeeGo
  – Searches across social sites mentioning me, my projects, my websites etc., filtering out things I don't care about
  – My favorite blogs filtered using PostRank to find only the ones with a lot of comments or social mentions

Diagram labels: Complete Crap, Interesting, Maybe Relevant, Yay!

Page 8: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

RSS Filtering Tools

● Yahoo Pipes (my favorite)
  – More powerful & flexible: options to filter any data found in any field in the RSS feed (URL, title, description, author …)
  – Downside: takes some time to learn & can be a little flaky at times. Also a single point of failure if Yahoo ever killed it.
● Other Options
  – FeedRinse: easy to use, not as flexible. Import RSS feeds, add filters, get new RSS feeds out.
  – RSS readers with filtering / alerts (FeedDemon)
  – Code: write your own filters
  – Note: many free RSS filtering services have gone out of business; they can be bandwidth intensive & costly to host.

Page 9: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Yahoo Pipes Filtering Example

● Input:
  – WebWorkerDaily
  – ReadWriteWeb
● Filter by content:
  – Collaborate
  – Collaboration
  – Collaborative
● Output:
  – 1 RSS Feed
  – Matching 3 keywords

2 Minute Yahoo Pipe Video How-to's: http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
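Outside of Pipes, the same filter could be sketched in a few lines of Python. This is only an illustration: the sample feed below is invented and stands in for the merged WebWorkerDaily and ReadWriteWeb feeds, and `matching_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for the merged input feeds;
# a real script would download the feeds first.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Merged sample</title>
<item><title>Tools that help remote teams collaborate</title>
<description>Shared documents and chat.</description></item>
<item><title>New phone review</title>
<description>Battery life and camera.</description></item>
<item><title>Office trends</title>
<description>Why collaboration software keeps growing.</description></item>
</channel></rss>"""

KEYWORDS = ("collaborate", "collaboration", "collaborative")


def matching_items(rss_text, keywords=KEYWORDS):
    """Return <item> elements whose title or description contains a keyword."""
    root = ET.fromstring(rss_text)
    matches = []
    for item in root.iter("item"):
        text = " ".join(
            (item.findtext(field) or "") for field in ("title", "description")
        ).lower()
        if any(keyword in text for keyword in keywords):
            matches.append(item)
    return matches


matched_titles = [item.findtext("title") for item in matching_items(SAMPLE_RSS)]
print(matched_titles)
```

The match is a case-insensitive substring test on two fields; extending the tuple of field names would filter on other parts of the item.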

Page 10: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

PostRank

● Best Posts in a feed
● Ranked on engagement (links, sharing, comments)
● Can get output as RSS feed
● Feed includes postrank number as a field

Page 11: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

What's In a Feed? PostRank (Yahoo Pipes View)

● Content in feeds varies wildly depending on site.
● Common: title, author, pubDate, link, content, description
● Site-specific: postrank, lat/long, image links, username, twitter source … (most RSS readers don't show these)
● API: usually has additional data & can output RSS
● If it's in the feed, you can use it!
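To see which fields an item really carries, you can dump every child element instead of relying on what a reader displays. A minimal sketch, assuming a made-up item: the `ext` namespace and its `score` field are invented stand-ins for a site-specific field such as a PostRank number.

```python
import xml.etree.ElementTree as ET

# Hypothetical item: the "ext" namespace and "score" field are invented
# stand-ins for a site-specific extension field.
SAMPLE_RSS = """<rss version="2.0" xmlns:ext="http://example.com/ext">
<channel><title>Sample</title>
<item>
<title>Example post</title>
<link>http://example.com/post</link>
<pubDate>Mon, 01 Nov 2010 10:00:00 GMT</pubDate>
<ext:score>7.5</ext:score>
</item>
</channel></rss>"""


def item_fields(rss_text):
    """Return one {tag: text} dict per item, including extension fields."""
    root = ET.fromstring(rss_text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("item")
    ]


fields = item_fields(SAMPLE_RSS)
for tag, value in fields[0].items():
    print(tag, "=", value)
```

ElementTree reports namespaced fields with the namespace URI in braces, so the extension field appears under the key `{http://example.com/ext}score`.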

Page 12: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Reformatting / Modifying RSS Feeds

Don't be satisfied with default RSS feed formats!

Twitter Search

Twitter RSS Feed

Modify & more quickly scan key data

Page 13: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Yahoo Pipes: Reformat Twitter Feed

● Input:
  – Twitter Search feed
● Loop String Build:
  – Author
  – : (spacing)
  – Title
● Loop Assign:
  – Store result back into title
● Output:
  – 1 RSS feed
  – Efficient format
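The string-build and assign steps map onto a short loop in Python. This sketch assumes an RSS-style feed where each item has `title` and `author` fields; the sample data is invented and the real Twitter search feed may name its fields differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical search-result feed; field names are an assumption.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Search results</title>
<item><title>Trying out Yahoo Pipes today</title><author>alice</author></item>
<item><title>RSS is not dead</title><author>bob</author></item>
</channel></rss>"""


def prefix_titles_with_author(rss_text):
    """Rewrite each item title as "author: title" and return the new XML."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        title = item.find("title")
        author = item.findtext("author")
        if title is not None and author:
            # String build, then assign the result back into the title field.
            title.text = f"{author}: {title.text or ''}"
    return ET.tostring(root, encoding="unicode")


reformatted = prefix_titles_with_author(SAMPLE_RSS)
new_titles = [
    item.findtext("title") for item in ET.fromstring(reformatted).iter("item")
]
print(new_titles)
```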

Page 14: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

BackTweets (BackType API)

● Data about links on Twitter
● Finds links regardless of shortening service
● No RSS Feeds
● But … You can use API + Pipes to build one!

Page 15: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

BackType + Twitter API + Pipes Output

● Data from BackType + Twitter
● Built an RSS feed using Yahoo Pipes
● Included the information relevant for me
● Could have included or filtered on: name, listed count, location, profile image, user URL, ...

Page 16: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Admit it, we ALL do vanity searches

● You can enter your search queries in Google, Twitter, Flickr …
  – Add a new project & have to update all of them
  – Can be hard to filter out some results
  – May have duplicates from multiple searches
● Yahoo Pipes
  – Update keywords in a CSV file
  – Use CSV file as input into a bunch of searches (RSS or API inputs)
  – Filter out what you don't want
  – Get 1 filtered RSS feed as output

2 minute video: http://fastwonderblog.com/2009/05/01/keyword-csv-files-and-searching-2-minute-yahoo-pipes-demo/
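The keyword-file approach can be sketched without Pipes as well. Everything concrete here is an assumption made for illustration: the CSV contents, the two search endpoints, the `q` query parameter, and the helper names.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical keyword file contents; in practice this would be a CSV file
# you edit whenever you add a project.
KEYWORDS_CSV = "keyword\nDawn Foster\nfastwonderblog\nMeeGo\n"

# Hypothetical search endpoints that return RSS; replace them with the real
# feed URLs of the services you search.
SEARCH_ENDPOINTS = [
    "http://search.example.com/rss",
    "http://photos.example.com/feed",
]


def load_keywords(csv_text):
    """Read the 'keyword' column from CSV text."""
    return [row["keyword"] for row in csv.DictReader(io.StringIO(csv_text))]


def build_search_urls(keywords, endpoints):
    """Return one search URL per (endpoint, keyword) pair."""
    return [
        f"{endpoint}?{urlencode({'q': keyword})}"
        for endpoint in endpoints
        for keyword in keywords
    ]


def dedupe_and_filter(items, blocked_words):
    """Drop items with a blocked word in the title and repeated links."""
    seen, kept = set(), []
    for item in items:
        title = item["title"].lower()
        if any(word in title for word in blocked_words):
            continue
        if item["link"] not in seen:
            seen.add(item["link"])
            kept.append(item)
    return kept


urls = build_search_urls(load_keywords(KEYWORDS_CSV), SEARCH_ENDPOINTS)
print(len(urls), urls[0])
```

Adding a project then means adding one line to the CSV; every search picks it up, and `dedupe_and_filter` collapses results that several searches return for the same link.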

Page 17: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

How Should / Shouldn't You Use All of This?

● Do:
  – Use this for personal productivity
  – Play around, create prototypes and understand the possibilities
● Don't:
  – Don't violate licenses on content or republish w/o permission
  – Don't use in critical or production environments
● For production use or putting data on websites:
  – Re-write in a real programming language with cached results and error checking

XKCD Comic: http://xkcd.com/327/
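As one possible shape for "cached results and error checking", the sketch below wraps any fetch function in a small in-memory cache that serves a stale copy when a refresh fails. `CachedFetcher`, the 15-minute default, and the fake fetch used in the demo are all assumptions for illustration, not part of the talk.

```python
import time


class CachedFetcher:
    """Cache results per URL and fall back to stale data on errors."""

    def __init__(self, fetch, max_age_seconds=900, clock=time.time):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._cache = {}  # url -> (timestamp, body)

    def get(self, url):
        now = self._clock()
        cached = self._cache.get(url)
        if cached and now - cached[0] < self._max_age:
            return cached[1]
        try:
            body = self._fetch(url)
        except OSError:
            # Network failures (urllib's URLError is an OSError): serve the
            # stale copy if one exists, otherwise re-raise.
            if cached:
                return cached[1]
            raise
        self._cache[url] = (now, body)
        return body


# Demo with a fake fetch function, so no network access is needed.
calls = []


def fake_fetch(url):
    calls.append(url)
    return f"feed for {url}"


fetcher = CachedFetcher(fake_fetch)
first = fetcher.get("http://example.com/feed")
second = fetcher.get("http://example.com/feed")
print(first == second, len(calls))
```

The second `get` is answered from the cache, so the upstream service is contacted only once per URL within the cache window.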

Page 18: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Learn More

About Dawn:
● Intel Community Manager for MeeGo
● Author of Companies and Communities
● More Info: http://fastwonderblog.com
● [email protected]
● @geekygirldawn on Twitter

Additional Reading & audio from 1 hour version of this talk:
● http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/

Photo of Dawn: http://www.flickr.com/photos/ahockley/3036575066/

Page 19: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Backup

Page 20: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Outsource / Crowdsource New Sources

Page 21: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Yahoo Pipes: Reformat PostRank Feed

● Input:
  – 3 PostRank feeds
● Loop String Build:
  – PostRank
  – : (spacing)
  – Title
● Loop Assign:
  – Store result back into title
● Output:
  – 1 RSS feed
  – Efficient format

Page 22: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Yahoo Pipes PostRank Example

● Input PostRank Feeds:
  – Engadget
  – CrunchGear
  – Boy Genius
● Filter by content
  – Tablet
● Sort:
  – PostRank
● Output
  – 1 RSS feed
  – Best tablet posts
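The filter-then-sort step is easy to express once the items are parsed. In this sketch the items are plain dictionaries; the titles and postrank values are invented, and `best_posts` is a hypothetical helper name.

```python
# Hypothetical items already parsed from several feeds; the titles and
# postrank values are invented for illustration.
ITEMS = [
    {"title": "New tablet hands-on", "postrank": 8.2},
    {"title": "Phone rumor roundup", "postrank": 9.5},
    {"title": "Tablet price drop", "postrank": 4.1},
    {"title": "Is this the best tablet yet?", "postrank": 9.9},
]


def best_posts(items, keyword):
    """Keep items mentioning keyword, highest postrank first."""
    keyword = keyword.lower()
    matches = [item for item in items if keyword in item["title"].lower()]
    return sorted(matches, key=lambda item: item["postrank"], reverse=True)


ranked = [item["title"] for item in best_posts(ITEMS, "tablet")]
print(ranked)
```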

Page 23: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Using Web APIs 101

● Many API calls are basically URLs
● Constructing URLs
  – Use API documentation/examples to format the URL
  – http://api.twitter.com/1/statuses/show/ID.xml
  – Reads as: version 1 of the API, show the status for ID, in the given format (here .xml)
● API keys
  – Tell the API who you are (like a password)
● Rate limiting
  – You only get so many requests & then you're cut off
  – Limited by IP or API key
  – Chill out for a while & come back

XKCD Comic: http://xkcd.com/844/
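Both ideas, building the URL from a pattern and backing off when rate limited, can be sketched as follows. The URL pattern is the historical one from the slide and may no longer be served. Treating any HTTP error as a rate limit, the 60-second starting pause, and the fake fetch in the demo are simplifying assumptions.

```python
import time
import urllib.error


def status_url(status_id, fmt="xml"):
    """Build the URL pattern from the slide: version / resource / ID.format."""
    return f"http://api.twitter.com/1/statuses/show/{status_id}.{fmt}"


def fetch_with_backoff(fetch, url, retries=3, wait_seconds=60, sleep=time.sleep):
    """Call fetch(url); on an HTTP error, wait and retry with a doubled pause.

    Treating every HTTPError as a rate limit is a simplifying assumption;
    real APIs document the specific status code they use.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except urllib.error.HTTPError:
            if attempt == retries - 1:
                raise
            sleep(wait_seconds * 2 ** attempt)


# Demo with a fake fetch that fails twice, so no network access is needed.
attempts = []


def fake_fetch(url):
    attempts.append(url)
    if len(attempts) < 3:
        raise urllib.error.HTTPError(url, 429, "Too Many Requests", None, None)
    return "<status/>"


waits = []
result = fetch_with_backoff(fake_fetch, status_url(12345), sleep=waits.append)
print(result, waits)
```

Passing `sleep` in as a parameter keeps the demo instant; in real use the default `time.sleep` does the "chill out for a while" part.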

Page 24: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Backtweets API + Twitter API + Yahoo Pipes

● What we want to do:
  – Start with a set of URLs (blog posts in a feed)
  – Find any tweet mentioning those URLs
  – Return the tweet and data about the person who posted it
● Mission: Build feed using only data from these 2 APIs
● BackType API provides Tweet ID (not humanly useful)
  – http://api.backtype.com/tweets/search/links.xml?q=URL&mode=batch&key=KEY
  – List of Twitter Status IDs for Tweets linking to URL
  – Note: I think this feature may be deprecated
● Twitter API uses Tweet ID to get everything else
  – http://api.twitter.com/1/statuses/show/ID.xml
  – Returns a single status with all relevant data for ID
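The two-step lookup comes down to building one URL per blog post for BackType, then one URL per returned tweet ID for Twitter. The sketch below only constructs the URLs from the patterns on the slide; the post URL, API key, and tweet IDs are made-up inputs, and parsing the BackType response is left out because its format is not shown here.

```python
from urllib.parse import urlencode

# URL patterns as shown on the slide.
BACKTYPE_SEARCH = "http://api.backtype.com/tweets/search/links.xml"
TWITTER_STATUS = "http://api.twitter.com/1/statuses/show/{id}.xml"


def backtype_url(post_url, api_key):
    """Build the BackType link-search URL for one blog post URL."""
    query = urlencode({"q": post_url, "mode": "batch", "key": api_key})
    return f"{BACKTYPE_SEARCH}?{query}"


def twitter_urls(tweet_ids):
    """Build one Twitter status URL per tweet ID returned by BackType."""
    return [TWITTER_STATUS.format(id=tweet_id) for tweet_id in tweet_ids]


# Hypothetical inputs: a made-up post URL, API key, and tweet IDs.
lookup = backtype_url("http://example.com/post", "KEY")
statuses = twitter_urls([101, 102])
print(lookup)
print(statuses)
```

`urlencode` percent-encodes the blog post URL so it can travel safely inside the `q` parameter.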

Page 25: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

BackTweets API: Get Tweet ID

● Take WebWorkerDaily Author Feed
● Use WWD URLs to build URLs for BackType API call
● Fetch data from BackType URLs to get Tweet ID

Page 26: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Twitter API: Get Data Based on Tweet ID

● Use BackType tweet ID to build URL for Twitter API
● Fetch data about Tweet & User from Twitter API
● Re-Build title to show “user (followers): tweet”

Page 27: Hacking RSS: Filtering & Processing  Obscene Amounts of Information (short version)

Add Filters to BackType + Twitter Example

● Show only tweets from people with 1000+ followers
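The follower filter and the rebuilt title can be combined in one step. A minimal sketch, assuming the tweet and user data have already been fetched into dictionaries: the records and field names below are illustrative, not the exact names in the Twitter API response.

```python
# Hypothetical records combining tweet text with user data; the field names
# are illustrative, not the exact names in the Twitter API response.
TWEETS = [
    {"user": "alice", "followers": 2500, "text": "Great post on RSS filtering"},
    {"user": "bob", "followers": 120, "text": "Reading about Yahoo Pipes"},
    {"user": "carol", "followers": 1000, "text": "Bookmarking this"},
]


def feed_titles(tweets, min_followers=1000):
    """Keep tweets from users with enough followers and format the titles."""
    return [
        f"{t['user']} ({t['followers']}): {t['text']}"
        for t in tweets
        if t["followers"] >= min_followers
    ]


titles = feed_titles(TWEETS)
print(titles)
```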