1.9 Making Robust Twitter Requests
• Problem – You want to write a long-running script that harvests large amounts of data, such as the friend and follower ids for a very popular Twitterer; however, the Twitter API is inherently unreliable and imposes rate limits, so you must always expect the unexpected.
• Solution – Write an abstraction for making Twitter requests that accounts for rate limiting and other types of HTTP errors, so that you can focus on the problem at hand. Rate limits are just a very specific kind of HTTP error.
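One possible shape for such an abstraction is sketched below. It is only an illustration under stated assumptions: `TwitterHTTPError`, `make_robust_request`, and the grouping of status codes into "rate limited" and "transient" are hypothetical names and choices, not part of any Twitter client library, and the 15-minute default wait assumes the API v1.1 rate limit window.

```python
import time


class TwitterHTTPError(Exception):
    """Hypothetical exception carrying the HTTP status code of a failed request."""

    def __init__(self, status_code, retry_after=None):
        super().__init__("HTTP %d" % status_code)
        self.status_code = status_code
        self.retry_after = retry_after  # seconds to wait, if known


# Assumed grouping of status codes for this sketch.
RATE_LIMIT_CODES = {420, 429}
TRANSIENT_CODES = {500, 502, 503, 504}


def make_robust_request(request, *args, max_retries=5, sleep=time.sleep, **kwargs):
    """Call request(*args, **kwargs), retrying on rate limits and server errors."""
    wait = 1  # seconds; doubled after each transient failure
    for attempt in range(max_retries + 1):
        try:
            return request(*args, **kwargs)
        except TwitterHTTPError as e:
            if attempt == max_retries:
                raise  # out of retries: let the caller decide
            if e.status_code in RATE_LIMIT_CODES:
                # Assumption: wait out a 15-minute window unless told otherwise.
                sleep(e.retry_after or 15 * 60)
            elif e.status_code in TRANSIENT_CODES:
                sleep(wait)
                wait *= 2
            else:
                raise  # e.g. 401, 403, 404: retrying will not help
```

The `sleep` parameter exists so the waiting behaviour can be replaced in tests; in a real harvesting script the default `time.sleep` would be used, and `request` would be whatever function actually performs the API call and raises an error carrying the status code.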
Error Codes & Responses
Code | Text | Description
200 | OK | Success!
304 | Not Modified | There was no new data to return.
400 | Bad Request | The request was invalid. An accompanying error message will explain why. This is the status code that will be returned during version 1.0 rate limiting. In API v1.1, a request without authentication is considered invalid and you will get this response.
401 | Unauthorized | Authentication credentials were missing or incorrect.
403 | Forbidden | The request is understood, but it has been refused or access is not allowed. An accompanying error message will explain why. This code is used when requests are being denied due to update limits.
404 | Not Found | The URI requested is invalid or the resource requested, such as a user, does not exist. Also returned when the requested format is not supported by the requested method.
406 | Not Acceptable | Returned by the Search API when an invalid format is specified in the request.
410 | Gone | This resource is gone. Used to indicate that an API endpoint has been turned off. For example: "The Twitter REST API v1 will soon stop functioning. Please migrate to API v1.1."
420 | Enhance Your Calm | Returned by the version 1 Search and Trends APIs when you are being rate limited.
422 | Unprocessable Entity | Returned when an image uploaded to POST account/update_profile_banner is unable to be processed.
429 | Too Many Requests | Returned in API v1.1 when a request cannot be served due to the application's rate limit having been exhausted for the resource. See Rate Limiting in API v1.1.
500 | Internal Server Error | Something is broken. Please post to the group so the Twitter team can investigate.
502 | Bad Gateway | Twitter is down or being upgraded.
503 | Service Unavailable | The Twitter servers are up, but overloaded with requests. Try again later.
504 | Gateway timeout | The Twitter servers are up, but the request couldn't be serviced due to some failure within our stack. Try again later.
Error Messages
• {"errors":[{"message":"Sorry, that page does not exist","code":34}]}
• <?xml version="1.0" encoding="UTF-8"?>
  <errors><error code="34">Sorry, that page does not exist</error></errors>
Error Codes
Code | Text | Description
32 | Could not authenticate you | Your call could not be completed as dialed.
34 | Sorry, that page does not exist | Corresponds with an HTTP 404 - the specified resource was not found.
88 | Rate limit exceeded | The request limit for this resource has been reached for the current rate limit window.
89 | Invalid or expired token | The access token used in the request is incorrect or has expired. Used in API v1.1.
130 | Over capacity | Corresponds with an HTTP 503 - Twitter is temporarily over capacity.
131 | Internal error | Corresponds with an HTTP 500 - an unknown internal error occurred.
135 | Could not authenticate you | Corresponds with an HTTP 401 - it means that your oauth_timestamp is either ahead of or behind our acceptable range.
215 | Bad authentication data | Typically sent with 1.1 responses with HTTP code 400. The method requires authentication but it was not presented or was wholly invalid.
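As an illustration of how the JSON error body shown under Error Messages relates to the codes in this table, the following sketch extracts the (code, message) pairs from a response body. The function name `parse_twitter_errors` is hypothetical, and the sketch only handles the JSON form, not the XML one.

```python
import json


def parse_twitter_errors(body):
    """Return a list of (code, message) pairs from a JSON error body."""
    try:
        payload = json.loads(body)
    except ValueError:
        return []  # not JSON (for example, an XML or HTML error page)
    if not isinstance(payload, dict):
        return []
    errors = payload.get("errors", [])
    if not isinstance(errors, list):
        return []
    return [(e.get("code"), e.get("message")) for e in errors if isinstance(e, dict)]


body = '{"errors":[{"message":"Sorry, that page does not exist","code":34}]}'
print(parse_twitter_errors(body))  # [(34, 'Sorry, that page does not exist')]
```

A harvesting script could use the numeric code to decide what to do next, for example waiting when it sees 88 (rate limit exceeded) and giving up on the resource when it sees 34.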
1.10
• Problem – You want to harvest and store tweets from a collection of id values, or harvest entire timelines of tweets.
• Solution – Use the /statuses/show resource to fetch a single tweet by its id value; the various /statuses/*_timeline methods can be used to fetch timeline data. CouchDB is a great option for persistent storage, and also provides a map/reduce processing paradigm and built-in ways to share your analysis with others.
Document
{
  "_id": "tansac",
  "_rev": "1",
  "profile": {
    "nickname": "tansanc",
    "name": {"firstname": "종명", "lastname": "김"},
    "birthdate": "1987-05-31"
  }
}
Schema Free
{
  "_id": "tansac",
  "_rev": "2",
  "profile": {
    "nickname": "tansanc",
    "name": {"firstname": "종명", "lastname": "김"},
    "birthdate": "1987-05-31",
    "hasBrother": true
  }
}
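The schema-free idea can be illustrated without a database at all: a document is just a nested JSON object, and a new field can appear in a later revision without any schema change. The sketch below uses plain Python dictionaries; the helper `add_field` is hypothetical, and the revision values "1" and "2" are the simplified ones from the slide rather than revision strings produced by a real CouchDB server.

```python
import copy
import json

doc_v1 = {
    "_id": "tansac",
    "_rev": "1",
    "profile": {
        "nickname": "tansanc",
        "name": {"firstname": "종명", "lastname": "김"},
        "birthdate": "1987-05-31",
    },
}


def add_field(doc, key, value, new_rev):
    """Return a copy of doc with one extra profile field and a new revision."""
    updated = copy.deepcopy(doc)  # leave the earlier revision untouched
    updated["profile"][key] = value
    updated["_rev"] = new_rev
    return updated


doc_v2 = add_field(doc_v1, "hasBrother", True, "2")
print(json.dumps(doc_v2, ensure_ascii=False, indent=2))
```

No table definition had to change for `hasBrother` to appear, which is the property that makes this kind of store convenient for tweets whose fields vary between API versions.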
tweepy get timeline
• API.public_timeline()
– Returns the 20 most recent statuses from non-protected users who have set a custom user icon. The public timeline is cached for 60 seconds, so requesting it more often than that is a waste of resources.
– Parameters: None
– Returns: list of Status objects
• API.home_timeline()
– Returns the 20 most recent statuses, including retweets, posted by the authenticating user and that user’s friends. This is the equivalent of /timeline/home on the Web.
– Parameters: since_id, max_id, count, page
– Returns: list of Status objects
• API.friends_timeline()
– Returns the 20 most recent statuses posted by the authenticating user and that user’s friends.
– Parameters: since_id, max_id, count, page
– Returns: list of Status objects
• API.user_timeline()
– Returns the 20 most recent statuses posted by the authenticating user. It’s also possible to request another user’s timeline via the id parameter.
– Parameters: (id or user_id or screen_name), since_id, max_id, count, page
– Returns: list of Status objects
• http://pythonhosted.org/tweepy/html/api.html#timeline-methods
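Because each timeline call returns only a page of statuses, harvesting an entire timeline means calling the method repeatedly while moving max_id backwards. The sketch below shows that loop against a stand-in function so that it runs without credentials: `harvest_timeline` and `fake_timeline` are hypothetical, and statuses are modelled as dictionaries with an "id" key, which is an assumption made for this example only.

```python
def harvest_timeline(fetch_page, max_pages=10, count=20):
    """Collect statuses page by page, walking backwards with max_id."""
    statuses = []
    max_id = None
    for _ in range(max_pages):
        kwargs = {"count": count}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = fetch_page(**kwargs)
        if not page:
            break  # nothing older is available
        statuses.extend(page)
        # Ask next time only for statuses older than the oldest one seen.
        max_id = min(s["id"] for s in page) - 1
    return statuses


def fake_timeline(count=20, max_id=None):
    """Stand-in for a timeline method: 50 statuses with ids 50 down to 1."""
    ids = [i for i in range(50, 0, -1) if max_id is None or i <= max_id]
    return [{"id": i} for i in ids[:count]]


tweets = harvest_timeline(fake_timeline)
print(len(tweets))  # 50
```

With a real client, `fetch_page` would wrap one of the timeline methods listed above and accept the same count and max_id parameters; the id lookup would need to be adapted to however that client exposes a status id.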