Hacking RSS: Filtering & Processing Obscene Amounts of Information #hackingRSS Dawn Foster Intel Community Manager for MeeGo [email protected]

SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information


DESCRIPTION

Information overload is less about having too much information and more about not having the right tools and techniques to filter and process information to find the pieces that are most relevant for you. This presentation will focus on showing you a variety of tips and techniques to get you started down the path of looking at RSS feeds in a completely different light. The default RSS feeds generated by your favorite blog or website are just a starting point waiting to be hacked and manipulated to serve your needs. Most people read RSS feeds, but few people take the time to go one step further to hack on those RSS feeds to find only the most interesting posts. I combine tools like Yahoo Pipes, BackTweets, PostRank and more with some simple API calls to be able to find what I need while automatically discarding the rest. You start with one or more RSS feeds and then feed those results into other services to gather more information that can be used to further filter or process the results. This process is easier than it sounds once you learn a few simple tools and techniques, and no “real” programming experience is required to get started. This session will show you some tips and tricks to get you started down the path of hacking your RSS feeds.


Page 1: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Hacking RSS: Filtering & Processing Obscene Amounts of Information
#hackingRSS

Dawn Foster
Intel Community Manager for MeeGo
[email protected]

Page 2: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Information Overload

CD Photo: http://www.flickr.com/photos/chefranden/2751354004/

Page 3: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Who Cares?

● Most of it is …
  – complete crap
  – out of date / obsolete
  – not interesting to you
  – irrelevant for you

Junk Pile: http://www.flickr.com/photos/zen/4013525/

Page 4: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

You Want to Find the Needle

Haystacks: http://www.flickr.com/photos/rasekh/4911673659/

Page 5: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

RSS Alone is a Start

● Sources you care about delivered right to you. But …
  – Do you care about everything in each feed?
  – What about the feeds you aren't subscribed to?
  – Can you keep up with what you have?

Page 6: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Prioritize Your Reader

● Put things you care about at the top
● Categorize
● Don't try to read everything

Page 7: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Outsource / Crowdsource New Sources

Page 8: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

The Real Magic is in Filtering RSS

● In my Google Reader right now:
  – Analyst research blogs mentioning Online Community
  – Analyst research blogs mentioning MeeGo
  – Searches across social sites mentioning me, my projects, my websites etc., filtering out things I don't care about
  – My favorite blogs filtered using PostRank to find only the ones with a lot of comments or social mentions

Figure labels: Complete Crap, Interesting, Maybe Relevant, Yay!

Page 9: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

RSS Filtering Tools

● Yahoo Pipes (my favorite)
  – More powerful & flexible: options to filter any data found in any field in the RSS feed (URL, title, description, author …)
  – Downside: takes some time to learn & can be a little flaky at times. Also a single point of failure if Yahoo ever killed it.
● Other Options
  – FeedRinse: easy to use, not as flexible. Import RSS feeds, add filters, get new RSS feeds out.
  – RSS readers with filtering / alerts (FeedDemon)
  – Code: write your own filters
  – Note: many free RSS filtering services have gone out of business; they can be bandwidth intensive & costly to host.

Page 10: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Yahoo Pipes Filtering Example

● Input:
  – WebWorkerDaily
  – ReadWriteWeb
● Filter by content:
  – Collaborate
  – Collaboration
  – Collaborative
● Output:
  – 1 RSS Feed
  – Matching 3 keywords

2 Minute Yahoo Pipe Video How-to's: http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
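The filtering example above can be approximated in a few lines of Python. This is a minimal sketch and not how Yahoo Pipes works internally: it assumes plain RSS 2.0 items with title and description elements, and the sample feed text and the function name filter_items are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real version would fetch the
# WebWorkerDaily and ReadWriteWeb feeds instead.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Tools for remote collaboration</title><description>Teams</description></item>
<item><title>New phone released</title><description>Specs and price</description></item>
<item><title>Office trends</title><description>How teams collaborate online</description></item>
</channel></rss>"""

KEYWORDS = ("collaborate", "collaboration", "collaborative")


def filter_items(rss_text, keywords=KEYWORDS):
    """Return titles of items whose title or description mentions a keyword."""
    root = ET.fromstring(rss_text)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        text = (title + " " + description).lower()
        if any(word in text for word in keywords):
            matches.append(title)
    return matches


print(filter_items(SAMPLE_RSS))
```

Matching is a case-insensitive substring test, which is roughly what a "contains" rule does; a stricter version could match whole words only.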

Page 11: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

PostRank

● Best Posts in a feed
● Ranked on engagement (links, sharing, comments)
● Can get output as RSS feed
● Feed includes postrank number as a field

Page 12: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

What's In a Feed? PostRank (Yahoo Pipes View)

● Content in feeds varies wildly depending on site.
● Common: title, author, pubDate, link, content, description
● Site-specific: postrank, lat/long, image links, username, twitter source … (most RSS readers don't show these)
● API: usually has additional data & can output RSS
● If it's in the feed, you can use it!
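One way to see which fields a feed item really carries is to list every child element, including site-specific extensions. The sketch below is an illustration only: the sample item, the ext namespace URL, and the function name item_fields are assumptions, not the actual PostRank feed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical item with one common field set and one site-specific field.
SAMPLE_ITEM = """<item xmlns:ext="http://example.com/ext">
<title>New tablet announced</title>
<link>http://example.com/tablet</link>
<ext:postrank>8.5</ext:postrank>
</item>"""


def item_fields(item_xml):
    """Map each child element name to its text, dropping namespace prefixes."""
    item = ET.fromstring(item_xml)
    # ElementTree reports namespaced tags as "{namespace}name"; keep only the name.
    return {child.tag.split("}")[-1]: (child.text or "") for child in item}


for name, value in item_fields(SAMPLE_ITEM).items():
    print(name, "=", value)
```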

Page 13: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Yahoo Pipes PostRank Example

● Input PostRank Feeds:
  – Engadget
  – CrunchGear
  – Boy Genius
● Filter by content
  – Tablet
● Sort:
  – PostRank
● Output
  – 1 RSS feed
  – Best tablet posts
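The same filter-then-sort idea can be sketched in Python. The items below are invented, and the "postrank" key is only a stand-in for the score field that the feed carries; the function name best_posts is hypothetical.

```python
# Hypothetical items merged from several PostRank-style feeds.
items = [
    {"title": "Tablet review roundup", "postrank": 7.2},
    {"title": "Phone rumor of the day", "postrank": 9.1},
    {"title": "New tablet announced", "postrank": 8.5},
]


def best_posts(items, keyword):
    """Keep items whose title mentions keyword, highest score first."""
    hits = [i for i in items if keyword.lower() in i["title"].lower()]
    return sorted(hits, key=lambda i: i["postrank"], reverse=True)


for post in best_posts(items, "tablet"):
    print(post["postrank"], post["title"])
```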

Page 14: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Reformatting / Modifying RSS Feeds

Don't be satisfied with default RSS feed formats!

Figure labels: Twitter Search, Twitter RSS Feed

Modify & more quickly scan key data

Page 15: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Yahoo Pipes: Reformat Twitter Feed

● Input:
  – Twitter Search feed
● Loop String Build:
  – Author
  – : (spacing)
  – Title
● Loop Assign:
  – Store result back into title
● Output:
  – 1 RSS feed
  – Efficient format

Page 16: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Yahoo Pipes: Reformat PostRank Feed

● Input:
  – 3 PostRank feeds
● Loop String Build:
  – PostRank
  – : (spacing)
  – Title
● Loop Assign:
  – Store result back into title
● Output:
  – 1 RSS feed
  – Efficient format
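Both reformatting pipes follow the same pattern: build a string from a field plus the title, then store it back into the title. A rough Python equivalent, assuming each item is a plain dictionary (the sample values and the name prefix_title are made up):

```python
def prefix_title(item, field):
    """Return a copy of item whose title is '<field value>: <old title>'."""
    updated = dict(item)  # leave the original item untouched
    updated["title"] = f"{item[field]}: {item['title']}"
    return updated


# Hypothetical items standing in for a Twitter search result and a PostRank entry.
tweet = {"author": "geekygirldawn", "title": "Slides are up"}
post = {"postrank": 8.5, "title": "New tablet announced"}

print(prefix_title(tweet, "author")["title"])
print(prefix_title(post, "postrank")["title"])
```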

Page 17: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Using Web APIs 101

● Many API calls are basically URLs
● Constructing URLs
  – Use API documentation/examples to format the URL
  – http://api.twitter.com/1/statuses/show/ID.xml
  – Version 1 of the API: show the status for ID in the given format (.xml)
● API keys
  – Tell the API who you are (like a password)
● Rate limiting
  – You only get so much & then you're cut off
  – Limited by IP or API key
  – Chill out for a while & come back

XKCD Comic: http://xkcd.com/844/
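As a sketch of the two ideas on this slide, the Python below builds the status URL from its pattern and retries a call after waiting when it is refused. The URL pattern is the one from the slide as it appeared in 2011 and may no longer respond; the helper names, and the convention that fetch returns None when rate limited, are assumptions made for illustration.

```python
import time

# URL pattern copied from the slide; ID and format are filled in per call.
STATUS_URL = "http://api.twitter.com/1/statuses/show/{id}.{fmt}"


def status_url(tweet_id, fmt="xml"):
    """Build the API URL for one status ID in the requested format."""
    return STATUS_URL.format(id=tweet_id, fmt=fmt)


def call_with_backoff(fetch, url, retries=3, wait=1.0, sleep=time.sleep):
    """Call fetch(url); a None result is treated as 'rate limited'.

    Waits longer after each refusal and gives up after `retries` attempts.
    """
    for attempt in range(retries):
        result = fetch(url)
        if result is not None:
            return result
        sleep(wait * (2 ** attempt))  # chill out for a while & come back
    return None


print(status_url(12345))
```

Passing fetch and sleep in as arguments keeps the sketch independent of any particular HTTP library.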

Page 18: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

BackTweets (BackType API)

● Data about links on Twitter
● Finds links regardless of shortening service
● No RSS Feeds
● But … You can use API + Pipes to build one!

Page 19: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Backtweets API + Twitter API + Yahoo Pipes

● What we want to do:
  – Start with a set of URLs (blog posts in a feed)
  – Find any tweet mentioning those URLs
  – Return the tweet and data about the person who posted it
● Mission: Build feed using only data from these 2 APIs
● BackType API provides Tweet ID (not humanly useful)
  – http://api.backtype.com/tweets/search/links.xml?q=URL&mode=batch&key=KEY
  – List of Twitter Status IDs for Tweets linking to URL
  – Note: I think this feature may be deprecated
● Twitter API uses Tweet ID to get everything else
  – http://api.twitter.com/1/statuses/show/ID.xml
  – Returns a single status with all relevant data for ID
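The two-step lookup described above can be sketched in Python as URL building plus a loop. The URL patterns are the ones on the slide, which date from 2011 and may no longer respond, so the actual fetching is left to two caller-supplied functions; fetch_ids, fetch_status, and the function names are hypothetical.

```python
from urllib.parse import urlencode

BACKTYPE_URL = "http://api.backtype.com/tweets/search/links.xml"
TWITTER_URL = "http://api.twitter.com/1/statuses/show/{id}.xml"


def backtype_query(post_url, api_key):
    """Build the BackType call that lists tweet IDs linking to post_url."""
    params = {"q": post_url, "mode": "batch", "key": api_key}
    return BACKTYPE_URL + "?" + urlencode(params)  # urlencode escapes the URL


def tweets_for_posts(post_urls, api_key, fetch_ids, fetch_status):
    """For each post, look up tweet IDs, then fetch each tweet's details.

    fetch_ids(url) is assumed to return a list of tweet IDs, and
    fetch_status(url) a record describing one tweet.
    """
    results = []
    for post_url in post_urls:
        for tweet_id in fetch_ids(backtype_query(post_url, api_key)):
            results.append(fetch_status(TWITTER_URL.format(id=tweet_id)))
    return results


print(backtype_query("http://example.com/post", "KEY"))
```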

Page 20: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

BackTweets API: Get Tweet ID

● Take WebWorkerDaily Author Feed
● Use WWD URLs to build URLs for BackType API call
● Fetch data from BackType URLs to get Tweet ID

Page 21: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Twitter API: Get Data Based on Tweet ID

● Use BackType tweet ID to build URL for Twitter API
● Fetch data about Tweet & User from Twitter API
● Re-Build title to show “user (followers): tweet”

Page 22: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

BackType + Twitter API + Pipes Output

● Data from BackType + Twitter
● Built an RSS feed using Yahoo Pipes
● Included the information relevant for me
● Could have included or filtered on: name, listed count, location, profile image, user URL, ...

Page 23: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Add Filters to BackType + Twitter Example

● Show only tweets from people with 1000+ followers
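A rough Python version of the last two steps, rebuilding the title as "user (followers): tweet" and keeping only accounts with 1000 or more followers. The records and key names below are invented for illustration; real API responses carry many more fields under their own names.

```python
# Hypothetical records shaped like the fields the pipe uses.
tweets = [
    {"user": "alice", "followers": 2500, "text": "Great post on RSS"},
    {"user": "bob", "followers": 120, "text": "Reading this now"},
    {"user": "carol", "followers": 1000, "text": "Useful filtering tips"},
]


def popular_titles(tweets, min_followers=1000):
    """Return 'user (followers): tweet' for accounts at or above the threshold."""
    return [
        f"{t['user']} ({t['followers']}): {t['text']}"
        for t in tweets
        if t["followers"] >= min_followers
    ]


for title in popular_titles(tweets):
    print(title)
```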

Page 24: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Admit it, we ALL do vanity searches

● You can enter your search queries in Google, Twitter, Flickr …
  – Add a new project & have to update all of them
  – Can be hard to filter out some results
  – May have duplicates from multiple searches
● Yahoo Pipes
  – Update keywords in a CSV file
  – Use CSV file as input into a bunch of searches (RSS or API inputs)
  – Filter out what you don't want
  – Get 1 filtered RSS feed as output

2 minute video: http://fastwonderblog.com/2009/05/01/keyword-csv-files-and-searching-2-minute-yahoo-pipes-demo/
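The keyword-file approach can be sketched in Python as follows. The CSV layout (a single "keyword" column), the search functions, and the blocked-term filter are all assumptions for illustration; each search is just a function that returns items with a title and a link.

```python
import csv
import io

# Hypothetical keyword file contents; one keyword per row.
KEYWORDS_CSV = "keyword\nMeeGo\nfastwonderblog\n"


def load_keywords(csv_text):
    """Read the 'keyword' column from CSV text."""
    return [row["keyword"] for row in csv.DictReader(io.StringIO(csv_text))]


def merged_results(keywords, searches, blocked=()):
    """Run every search for every keyword, dropping duplicates and blocked terms."""
    seen, merged = set(), []
    for keyword in keywords:
        for search in searches:
            for item in search(keyword):
                title = item["title"].lower()
                if item["link"] in seen or any(b in title for b in blocked):
                    continue
                seen.add(item["link"])  # the link identifies a duplicate
                merged.append(item)
    return merged


def fake_search(keyword):
    """Stand-in for a real RSS or API search."""
    return [{"title": f"{keyword} news", "link": f"http://example.com/{keyword}"}]


keywords = load_keywords(KEYWORDS_CSV)
for item in merged_results(keywords, [fake_search, fake_search]):
    print(item["title"], item["link"])
```

Adding a project then means adding one row to the keyword file rather than editing every search.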

Page 25: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

How Should / Shouldn't You Use All of This?

● Do:
  – Use this for personal productivity
  – Play around and understand the possibilities
  – Create prototypes for something you might want to build
● Don't: Use in critical or production environments
● Everything I've done here could be done in most programming languages
● For production use or putting data on websites:
  – Re-write in a real programming language with cached results and error checking

XKCD Comic: http://xkcd.com/327/
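As one possible reading of "cached results and error checking", the sketch below wraps any fetch function so that repeated requests reuse a recent result and a failed request falls back to the last good one. The names, the five-minute default, and the stale-on-error policy are assumptions, not something the slides specify.

```python
import time


def make_cached_fetch(fetch, max_age=300.0, clock=time.monotonic):
    """Wrap fetch(url) with a time-limited cache and a stale-on-error fallback."""
    cache = {}  # url -> (time stored, value)

    def cached(url):
        now = clock()
        if url in cache and now - cache[url][0] < max_age:
            return cache[url][1]  # fresh enough, skip the request
        try:
            value = fetch(url)
        except Exception:
            if url in cache:
                return cache[url][1]  # serve stale data rather than fail
            raise  # nothing cached, let the caller handle the error
        cache[url] = (now, value)
        return value

    return cached


calls = []
cached_fetch = make_cached_fetch(lambda url: calls.append(url) or "feed data")
cached_fetch("http://example.com/feed")
cached_fetch("http://example.com/feed")
print(len(calls))
```

Caching also helps with rate limits, since repeated reads of the same feed no longer count as new requests.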

Page 26: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

Q & A

About Dawn:
● Intel Community Manager for MeeGo
● More Info: http://fastwonderblog.com
● [email protected]
● @geekygirldawn on Twitter

Additional Reading:
● http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/


Photo of Dawn: http://www.flickr.com/photos/ahockley/3036575066/

Page 27: SXSW Hacking RSS: Filtering & Processing Obscene Amounts of Information

03/15/11