
My Little Data in a Big Data World


Page 1: My Little Data in a Big Data World

My Little Data in a Big Data World

A Poster by Candida Haynes - PyData NYC 2015

Page 2: My Little Data in a Big Data World

Description

You share on social media. You use email. Big data machines “think” they know you once they have analyzed some of your patterns. Now imagine “writing” a deliberate data story.

This poster describes an early process of

1) retrieving personal data,
2) securing and exploring it, and
3) identifying the tools that will allow individuals to start cultivating and/or preserving their data identities with code.

Page 3: My Little Data in a Big Data World


Considerations at Each Step:

Does it expose personal data or thought processes to the web? To a platform that does not already have the data?

Friction for beginners, i.e. how difficult the step is and how much information is available online for someone with limited knowledge of jargon.

Page 4: My Little Data in a Big Data World

More on Friction

Most documentation describes how to pull information from distributed APIs.

Required precautions might delay beginners and ultimately lose them by seeming (or being) too far from their project goals.

Page 5: My Little Data in a Big Data World


Problem-Solving Strategy: Local Security


Computed behind the firewall that was already on my system.

Encrypted hard drive, protected via password.

Page 6: My Little Data in a Big Data World

Storage and Memory Management

USB drive: 4 gigabytes

Did not encrypt the USB drive, but will consider it in the next iteration to discover if / how that changes the process.

Page 7: My Little Data in a Big Data World

Getting Your Data

Some social media sites have a page where you can request a download file of your data. I chose to use Twitter and found my request link here: https://twitter.com/settings/account

Timing: The packaging and prep of your data download can take several minutes, several hours, or a few days. The email alert that the data was ready for download took less than two minutes to arrive this time.

Page 8: My Little Data in a Big Data World

Twitter Data Characteristics

Personal information that is / was already consumable by the public.

Delivered as a ZIP file with Twitter's encryption while in transit.

The Twitter archive file, which had a numerical name, included a tweets.csv file, which matched the data type in Bokeh's example.

Each word in a tweet had its own column, which made counting easier.

Page 9: My Little Data in a Big Data World


The data I downloaded appeared in a folder once I unzipped the file.
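For readers who would rather unzip with code than with a file manager, the step above can be sketched with Python's standard-library zipfile module. This is only an illustration: the function name and the file names in the comments are hypothetical placeholders, not the names of the actual archive.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Unzip a downloaded archive into dest_dir and return the file names it held."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()  # look at the contents before extracting
        archive.extractall(dest)    # files are written to the local drive only
    return names


# Example call (hypothetical file names, substitute your own download):
# for name in extract_archive("my_twitter_archive.zip", "my_twitter_archive"):
#     print(name)
```

Everything here runs locally, so no data is sent to a platform that does not already have it.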

Page 10: My Little Data in a Big Data World

Grailbird.data.tweets_2008_12 =
[ {
    "source" : "\u003Ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003ETwitter Web Client\u003C\/a\u003E",
    "entities" : {
        "user_mentions" : [ ],
        "media" : [ ],
        "hashtags" : [ ],
        "urls" : [ ]
    },
    "geo" : { },
    "id_str" : "1064099455",
    "text" : "I need cookies.",
    "id" : 1064099455,
    "created_at" : "2008-12-18 00:00:00 +0000",
    "user" : {
        "name" : "Dida Lakes",
        "screen_name" : "dihaynes",
        "protected" : false,
        "id_str" : "18206386",
        "profile_image_url_https" : "https:\/\/pbs.twimg.com\/profile_images\/654911288917659648\/QKsP0wHR_normal.jpg",
        "id" : 18206386,

This is a JSON file of my first tweet!
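The file shown above starts with a JavaScript assignment (Grailbird.data.tweets_2008_12 =) before the JSON itself, so a JSON parser will not read it as-is. A minimal sketch of one way to handle that in Python follows. The function name load_grailbird is made up for illustration, and the sample is a shortened copy of the excerpt above, closed off so that it is valid JSON.

```python
import json


def load_grailbird(js_text):
    """Parse one archive .js file: drop the 'Grailbird.data... =' prefix, then read the JSON."""
    # Everything after the first "=" is treated as the JSON array of tweets.
    _, _, payload = js_text.partition("=")
    return json.loads(payload)


# Shortened copy of the excerpt above (only three fields kept).
sample = '''Grailbird.data.tweets_2008_12 =
[ {
  "id_str" : "1064099455",
  "text" : "I need cookies.",
  "created_at" : "2008-12-18 00:00:00 +0000"
} ]'''

tweets = load_grailbird(sample)
print(tweets[0]["text"])  # I need cookies.
```

Splitting on the first "=" only is deliberate: later "=" characters, such as the one inside the "source" link, belong to the data.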

Page 11: My Little Data in a Big Data World

Understanding Data: OpenOffice Calc as a Problem-Solving Tool

Handled a .CSV file with > 7k tweets
Adequately displayed data
Sorting and counting tools did not require code

Problem: Dependencies
Solution: Anaconda

Not having the right software and configurations for the new software you are installing causes errors. Anaconda resolves a lot of those problems and is recommended in the Bokeh documentation.
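The sorting and counting done in Calc can also be approximated with a few lines of Python once the dependencies are in place. The sketch below counts tweets per month using only the standard library. It assumes the CSV has a column named timestamp whose values begin with YYYY-MM-DD; that column name and all sample rows except the first tweet's text are assumptions for illustration, so check them against your own tweets.csv.

```python
import csv
import io
from collections import Counter


def tweets_per_month(csv_file, timestamp_column="timestamp"):
    """Count rows per YYYY-MM, assuming timestamps start with 'YYYY-MM-DD'."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[row[timestamp_column][:7]] += 1  # first 7 characters: year and month
    return counts


# Tiny in-memory stand-in for tweets.csv (assumed column names, mostly made-up rows).
sample = io.StringIO(
    "timestamp,text\n"
    "2008-12-18 00:00:00 +0000,I need cookies.\n"
    "2008-12-19 00:00:00 +0000,Still need cookies.\n"
    "2009-01-02 00:00:00 +0000,Happy new year\n"
)
for month, count in sorted(tweets_per_month(sample).items()):
    print(month, count)
```

With a real file, the same function can be given an open file handle, for example one created with open("tweets.csv", newline="", encoding="utf-8").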

Page 12: My Little Data in a Big Data World

Word Cloud: Visual Alternative to Calc

From @dihaynes on Twitter (sans cleansing) via Jason Davies.

You can explore more word cloud generators at http://worditout.com/ and http://tagcrowd.com.
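A word cloud is driven by word frequencies, and those can be computed locally before anything is pasted into a web tool. The following is a rough sketch, not the method the generators above use: the function name, the tokenizing rule, and the second sample sentence are assumptions for illustration.

```python
import re
from collections import Counter


def word_counts(texts, top=10):
    """Return the most common lowercase words across an iterable of tweet texts."""
    counts = Counter()
    for text in texts:
        # Keep letters, digits and apostrophes; an optional leading @ or #
        # lets mentions and hashtags survive as single tokens.
        counts.update(re.findall(r"[#@]?[a-z0-9']+", text.lower()))
    return counts.most_common(top)


print(word_counts(["I need cookies.", "Cookies need milk."], top=2))
```

Like the word cloud above, this does no cleansing, so very common words will dominate unless they are filtered out first.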

Page 13: My Little Data in a Big Data World

Data Visualization Tool: Bokeh

Data that matched my interest in employment data for future projects

Learning opportunities via dependencies that I could use to interact with tools that had previously posed too much friction

It had a presentation format that would allow for quick interactions

“Employment Sample” visualization had colors that resonated with me

Page 14: My Little Data in a Big Data World

Visual Inspiration

Data visualization from Bokeh sample.

http://bokeh.pydata.org/en/latest/docs/gallery/unemployment.html
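As a first step toward a chart like the gallery sample, personal counts can be drawn with Bokeh's basic plotting interface. The sketch below is a much simpler bar chart, not a reproduction of the unemployment example, and the function name and monthly totals are made up for illustration.

```python
def build_bar_chart(counts, title="Tweets per month"):
    """Return a Bokeh figure with one bar per key in counts (a dict of label -> number)."""
    # Imported here so the rest of a script still runs where Bokeh is not installed.
    from bokeh.plotting import figure

    labels = sorted(counts)
    plot = figure(x_range=labels, title=title)  # a list of strings gives a categorical axis
    plot.vbar(x=labels, top=[counts[label] for label in labels], width=0.8)
    return plot


# Made-up monthly totals, for illustration only.
example_counts = {"2008-12": 2, "2009-01": 1}

# To open the chart in a browser:
# from bokeh.plotting import show
# show(build_bar_chart(example_counts))
```

The chart is rendered to a local HTML file and opened in the browser, which keeps the data on your own machine.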

Page 15: My Little Data in a Big Data World

Questions for Discussion

What is the role of data science in society?

What would you add to the story of this project?

What are some moments when data science and storytelling are at odds? When are they not?

What questions do you still have about the data? About the process?