My Little Data in a Big Data World

A Poster by Candida Haynes - PyData NYC 2015

Description

You share on social media. You use email. Big data machines “think” they know you once they have analyzed some of your patterns. Now imagine “writing” a deliberate data story. This poster describes an early process of:

1) retrieving personal data,

2) securing and exploring it, and

3) identifying the tools that will allow individuals to start cultivating and/or preserving their data identities with code.

Considerations at Each Step:

Does it expose personal data or thought processes to the web? To a platform that does not already have the data?

Friction for beginners – i.e. difficulty, information available online for someone with limited knowledge of jargon.

More on Friction

Most documentation describes how to pull information from distributed APIs.

Required precautions might delay beginners and ultimately lose them by seeming (or being) too far from their project goals.

Problem-Solving Strategy: Local Security

Computed behind the firewall that was already on my system.

Encrypted hard drive, protected via password.

Storage and Memory Management

USB drive, 4 gigabytes.

Did not encrypt the USB drive, but will consider it in the next iteration to discover if and how that changes the process.

Getting Your Data

Some social media sites have a page where you can request a download file of your data. I chose to use Twitter and found my request link here: https://twitter.com/settings/account

Timing: Sometimes the packaging and prep of your data download can take several minutes, several hours, or a few days. The email alert that the data was ready for download took less than two minutes to arrive this time.

Twitter Data Characteristics

Personal information that is / was already consumable by the public.

Delivered as a ZIP file with Twitter's encryption while in transit.

The Twitter archive file, which had a numerical name, included a tweets.csv file, which matched the data type in Bokeh's example.

Each word in a tweet had its own column, which made counting easier.

The data I downloaded appeared in a folder once I unzipped the file.
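The unzip-and-inspect step can also be done in Python with only the standard library, so nothing leaves the local machine. This is a minimal sketch, not the process used for the poster. The helper names are hypothetical. The only file name taken from the archive is tweets.csv.

```python
import csv
import zipfile


def extract_archive(zip_path, out_dir):
    """Unzip the downloaded archive locally and return the names of its files."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        return archive.namelist()


def csv_header(csv_path):
    """Return the column names found in the first row of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle))
```

For example, calling extract_archive("twitter_archive.zip", "my_archive") (both names are placeholders) and then csv_header("my_archive/tweets.csv") shows which columns are available before opening the file in a spreadsheet.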

Grailbird.data.tweets_2008_12 =
[ {
  "source" : "\u003Ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003ETwitter Web Client\u003C\/a\u003E",
  "entities" : {
    "user_mentions" : [ ],
    "media" : [ ],
    "hashtags" : [ ],
    "urls" : [ ]
  },
  "geo" : { },
  "id_str" : "1064099455",
  "text" : "I need cookies.",
  "id" : 1064099455,
  "created_at" : "2008-12-18 00:00:00 +0000",
  "user" : {
    "name" : "Dida Lakes",
    "screen_name" : "dihaynes",
    "protected" : false,
    "id_str" : "18206386",
    "profile_image_url_https" : "https:\/\/pbs.twimg.com\/profile_images\/654911288917659648\/QKsP0wHR_normal.jpg",
    "id" : 18206386,

This is a JSON file of my first tweet!
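The archive file begins with a JavaScript assignment (Grailbird.data.tweets_2008_12 =), so a JSON parser will not accept it as is. One way to read it in Python is to drop everything up to the first equals sign and parse the rest. This is a sketch. The function name load_grailbird is hypothetical, and the sample is a shortened copy of the tweet shown above.

```python
import json


def load_grailbird(js_text):
    """Drop the 'Grailbird.data... =' prefix, then parse the remaining JSON."""
    # partition splits at the first "=" only, so any "=" inside tweet text is untouched.
    _, _, payload = js_text.partition("=")
    return json.loads(payload)


# Shortened sample based on the tweet shown on this poster.
sample = '''Grailbird.data.tweets_2008_12 =
[ {
  "id_str" : "1064099455",
  "text" : "I need cookies.",
  "created_at" : "2008-12-18 00:00:00 +0000"
} ]'''

tweets = load_grailbird(sample)
print(tweets[0]["text"])  # prints: I need cookies.
```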

Understanding Data: OpenOffice Calc as a Problem-Solving Tool

Handled a .CSV file with > 7k tweets

Adequately displayed data

Sorting and counting tools did not require code

Problem: Dependencies

Solution: Anaconda

Not having the right software and configurations for the new software you are installing causes errors. Anaconda resolves a lot of those problems and is recommended in the Bokeh documentation.

Word Cloud: Visual Alternative to Calc

From @dihaynes on Twitter (sans cleansing) via Jason Davies.

You can explore more word cloud generators at http://worditout.com/ and http://tagcrowd.com.
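A word cloud is a picture of word frequencies, and those can be counted locally without sending tweets to a website. The sketch below makes its own assumptions. The function name and the regular expression are illustrative, and the only cleansing is lowercasing.

```python
import re
from collections import Counter


def word_counts(texts):
    """Count lowercase words across an iterable of tweet texts."""
    counts = Counter()
    for text in texts:
        # Keep letters, digits, apostrophes, and the # and @ used in hashtags and mentions.
        counts.update(re.findall(r"[a-z0-9#@']+", text.lower()))
    return counts


# The first text is the tweet shown on this poster; the second is an invented sample.
counts = word_counts(["I need cookies.", "Cookies need milk."])
print(counts.most_common(2))
```

The most frequent words are the ones a word cloud would draw largest.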

Data Visualization Tool: Bokeh

Data that matched my interest in employment data for future projects

Learning opportunities via dependencies that I could use to interact with tools that had previously posed too much friction

It had a presentation format that would allow for quick interactions

“Employment Sample” visualization had colors that resonated with me

Visual Inspiration

Data visualization from Bokeh sample:

http://bokeh.pydata.org/en/latest/docs/gallery/unemployment.html
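As a possible first step toward a Bokeh chart of personal data, the sketch below counts tweets per month and draws a simple bar chart. It is a deliberately simpler stand-in for the gallery example, not a reproduction of it. The function names are hypothetical, and the timestamp format is assumed to match the created_at value in the archive ("2008-12-18 00:00:00 +0000").

```python
from collections import Counter


def tweets_per_month(timestamps):
    """Count tweets per 'YYYY-MM', assuming timestamps start with 'YYYY-MM-DD'."""
    return Counter(stamp[:7] for stamp in timestamps)


def save_bar_chart(counts, html_path):
    """Write a Bokeh bar chart of the monthly counts to an HTML file on disk."""
    # Imported here so the counting helper still works without Bokeh installed.
    from bokeh.plotting import figure, output_file, save

    months = sorted(counts)
    plot = figure(x_range=months, title="Tweets per month")
    plot.vbar(x=months, top=[counts[m] for m in months], width=0.8)
    output_file(html_path)
    save(plot)
```

Calling save_bar_chart(tweets_per_month(stamps), "tweets.html"), where stamps is a list of timestamp strings, writes a chart that opens in a browser.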

Questions for Discussion?

What is the role of data science in society?

What would you add to the story of this project?

What are some moments when data science and storytelling are at odds? When are they not?

What questions do you still have about the data? About the process?
