23 April 2014

English R Lightning Talks @ BURN (2014-04-22)

DESCRIPTION

6 lightning talks on various R topics at the Budapest Users of R Network:
• László Gönczy - Exploratory data analysis: project experience and ongoing developments
• Gergely Horváth - R workshop in Bucharest
• Imre Kocsis - Bigvis: plotting (relatively) large data in R
• András Tajti - Changing User Roles in an Online Forum
• Dénes Tóth - Dilemmas in package development: interactive visualization, GUIs, largish data, extensibility
• Gergely Daróczi - Transforming R objects to Pandoc’s markdown
More details: http://www.meetup.com/Budapest-Users-of-R-Network/events/174345362/

Page 1: English R Lightning Talks @ BURN (2014-04-22)

23 April 2014

László Gönczy: Exploratory data analysis: project experience and ongoing developments

Quanopt

Gergely Horváth: R workshop in Bucharest

KSH

buchaRest

Literature

Romania - organizer

Giraffe and church

Big-big professor

Mr V. Tepes alias Dracula

Hungary

Serbia

Ancient hero - Traian

Austria

Romania

RO – GB - NL

Quanopt

Imre Kocsis: Bigvis: plotting (relatively) large data in R

Budapest University of Technology and Economics, Department of Measurement and Information Systems

Bigvis: plotting (relatively) large data in R

Kocsis Imre

BURN Meetup, 2014.04.22.

Let’s do Exploratory Data Analysis!

„Flight data”

2008: 113 MB data frame

~7 million rows x 29 columns

> system.time(print(qplot(data = b, x = Distance, y = AirTime)))
   user  system elapsed
  102.2    60.2   163.5

SotA

Relatively painless visual EDA

Relatively painless handling of Big Data

[…]

[…]

bigvis

From Hadley Wickham

A rather generic approach

o Paper: vita.had.co.nz/papers/bigvis.pdf

o Slides: files.meetup.com/1406240/bigvis.pdf

A reference implementation in R

o ggplot2 gets a huge boost

o GitHub: hadley/bigvis

Big Data EDA?

Subsampling is a hassle.

You probably want…

0. For the whole data

1. Summary statistics over

2. Interval-binned data

+ Error approx. would be nice

+ Suppress outliers (or not)
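The idea of summarizing the whole data set over interval bins can be sketched in base R. This is only an illustration on simulated data, not the bigvis implementation: the `Distance` and `AirTime` vectors merely mimic the flight columns, and the bin width of 100 is an arbitrary assumption.

```r
# Minimal base-R sketch of "summary statistics over interval-binned data".
# Simulated data; Distance and AirTime only mimic the flight data columns.
set.seed(1)
n <- 1e5
Distance <- runif(n, 50, 2500)
AirTime  <- Distance / 8 + rnorm(n, sd = 10)

width  <- 100                          # bin width (an assumption)
bin_id <- floor(Distance / width)      # fixed-width interval binning

binned <- data.frame(
  Distance = sort(unique(bin_id)) * width + width / 2,   # bin centres
  count    = as.vector(tapply(AirTime, bin_id, length)),
  mean     = as.vector(tapply(AirTime, bin_id, mean))
)

# A few dozen rows now stand in for 100,000 points
plot(binned$Distance, binned$mean, type = "b",
     xlab = "Distance (bin centre)", ylab = "mean AirTime")
```

Plotting cost then depends on the number of bins rather than on the number of raw rows.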

Put in pictures…

ggplot2 bigvis

Few seconds

bigvis (simplified) workflow

Data in memory

bin(): interval binning

condense(): count, sum, mean, median, sd

smooth(): smooth out errors

peel(): peel off outliers
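What the last two steps do can be imitated with simplified stand-ins in base R. The helpers `smooth_bins()` and `peel_bins()` below are hypothetical and are not the bigvis functions; the kernel bandwidth and the 99% retention level are assumptions chosen for the example.

```r
# Simplified stand-ins for the smooth and peel steps; not the bigvis code.
set.seed(2)
centre <- seq(50, 2450, by = 100)              # 25 bin centres
count  <- c(rpois(23, 4000), 3, 1)             # the last two bins are nearly empty
avg    <- centre / 8 + rnorm(25, sd = 200 / sqrt(count))  # noisier where sparse

# smooth: count-weighted Gaussian kernel average over neighbouring bins
smooth_bins <- function(x, y, w, h) {
  sapply(x, function(x0) {
    k <- dnorm((x - x0) / h) * w
    sum(k * y) / sum(k)
  })
}
avg_smooth <- smooth_bins(centre, avg, count, h = 150)

# peel: keep the densest bins that together hold at least `keep` of the data
peel_bins <- function(w, keep = 0.99) {
  ord   <- order(w, decreasing = TRUE)
  share <- cumsum(w[ord]) / sum(w)
  sort(ord[seq_len(which(share >= keep)[1])])
}
kept <- peel_bins(count)

data.frame(centre, count, avg, avg_smooth)[kept, ]
```

Here the two sparse bins are dropped, so their noisy means no longer stretch the plot axes.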

… and then plot with ggplot

Some other aspects

Some further automatic magic with KDE

Relative error estimation with alpha / hues

Vis. patterns for (n, m)-d datasets

o n: # of binned variables

o m: # of summaries

o Dens. estimate: (1,1)-d, earlier: (2,1)-d

Parallelization & decoupling?

The pattern can scale by moving concerns out of R

bin: see MapReduce

Some formulations are easy for stream processing, too

Pipeline: data → bin → summarize → smooth → visualize

Summary: depends…

Distributive stats: count, sum, min, max

Algebraic stats: mean, sd, higher moments

Holistic…? (quantiles, countdistinct)
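The difference between the three kinds of statistics shows up when partial results from several workers have to be merged. The sketch below assumes a hypothetical split into four chunks. Distributive pieces merge directly, algebraic ones are derived from a fixed set of distributive pieces, and a holistic statistic such as the median cannot in general be rebuilt from per-chunk values.

```r
# Sketch: per-chunk partial summaries that merge exactly (assumed chunking).
set.seed(3)
x      <- rexp(1e5)
chunks <- split(x, rep(1:4, length.out = length(x)))   # e.g. four workers

# Each worker returns only distributive pieces
partial <- lapply(chunks, function(v)
  c(n = length(v), s = sum(v), ss = sum(v^2), lo = min(v), hi = max(v)))
p <- do.call(rbind, partial)

n      <- sum(p[, "n"])
mean_  <- sum(p[, "s"]) / n                                # algebraic: count and sum
sd_    <- sqrt((sum(p[, "ss"]) - n * mean_^2) / (n - 1))   # algebraic: plus sum of squares
range_ <- c(min(p[, "lo"]), max(p[, "hi"]))                # distributive

# Holistic: the median of chunk medians is generally NOT the overall median
c(merged = median(sapply(chunks, median)), exact = median(x))
```

The sum-of-squares formula is used for brevity; it can lose precision when the mean is large relative to the spread.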

Input: mostly „resolution” bound

R excels here

Towards interactive EDA?

Bin-summarize-smooth can still take long…

Precompute/cache…

… and e.g. update after new batches

Raw data-at-rest

RDBMS / in-memory summarized data

client

András Tajti: Changing User Roles in an Online Forum

Changing User Roles in an Online Forum

András Tajti

BURN meetup

04.23.2014.

Questions

1. Can we declare patterns in user behaviour?

2. Can we detect the change of the behaviour?

Of course, we can!

I will show you one way...

Theoretical tools

● You need features to describe behaviour:
  – Network science
● You need to find the most important variables:
  – Principal component analysis
● You need to find users with similar behaviour:
  – Cluster analysis

Practical tools

● To do all the computations, I used R packages:
  – igraph for extracting network features
  – pcaPP and rrcov for PCA
  – fpc for cluster evaluation
● Of course, mostly base R functions were used:
  – princomp for PCA
  – hclust for hierarchical clustering
  – compiler package for faster computation

What does a forum look like?

● One post is either a reply to another or not:
  – One post has an out-degree of at most one
  – It can have a larger in-degree, as any later post can refer to it.
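The degree structure described above can be illustrated on a toy thread in base R. The `posts` table and its column names are made up for the example; the talk itself used igraph for the network features.

```r
# Toy thread: post id and the id of the post it replies to (NA = not a reply)
posts <- data.frame(
  id       = 1:6,
  user     = c("a", "b", "c", "a", "b", "d"),
  reply_to = c(NA, 1, 1, 2, 1, 4)
)

out_degree <- as.integer(!is.na(posts$reply_to))             # 0 or 1 by construction
in_degree  <- tabulate(posts$reply_to, nbins = nrow(posts))  # replies received per post

cbind(posts, out_degree, in_degree)
# in_degree is 3 1 0 1 0 0: post 1 received three replies
```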

Users' features

● To describe behaviour, I used:
  – Number of posts
  – Number of neighbours
  – Parent users' in- and out-degree
  – All of the above as ranks and relative ranks

Choosing important features

● Main problem: all variables have heavy-tailed distributions
  – Principal component analysis works best for normally distributed variables
  – Alternatives:
    ● Robust correlation estimation
    ● Projection pursuit methods
  – Winner: ROBPCA from rrcov, as PcaHubert
  – Mostly the same results as the original princomp

Searching groups

● Cluster analysis:
  – Hierarchical, with Euclidean distance and complete linkage
  – Used on the PCA scores increased with explained variance
  – Technical limits on the number of clusters:
    ● Min.: 2 (the result contained groupings with at least three groups)
    ● Max.: 30 (was reached a few times)
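A minimal sketch of this clustering step with base R functions only, on simulated heavy-tailed features. The feature names are illustrative, `k = 4` is arbitrary, and multiplying each score column by its share of explained variance is just one possible reading of "increased with explained variance".

```r
# Sketch on simulated heavy-tailed user features (all names are illustrative)
set.seed(4)
features <- data.frame(
  posts      = rlnorm(200, 2, 1),
  neighbours = rlnorm(200, 1, 1),
  in_degree  = rlnorm(200, 1, 1.2)
)

pca    <- princomp(features, cor = TRUE)
expl   <- pca$sdev^2 / sum(pca$sdev^2)        # share of variance per component
scores <- sweep(pca$scores, 2, expl, `*`)     # assumed weighting of the scores

tree   <- hclust(dist(scores, method = "euclidean"), method = "complete")
groups <- cutree(tree, k = 4)                 # k = 4 is arbitrary here
table(groups)
```

Swapping `princomp` for a robust alternative such as `rrcov::PcaHubert` would only change the line that produces the scores.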

Selecting cluster numbers

● For every goodness measure, I was looking for:
  – First local minimum/maximum
  – Sharpest “elbow”

Select by eye

What is changing?

● I used “time windows” to slice the data
● One window contained 1000 posts and their full thread
● I ran role detection for all sets
● Then compared memberships between clusters

How to compare memberships?

● There are users only in one or the other dataset

● Two groups are similar if they have significant number of common users:
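One plausible way to quantify this, offered purely as an assumption and not necessarily the measure used in the talk, is the Jaccard overlap between clusters, restricted to users present in both windows. The membership vectors `w1` and `w2` are toy data.

```r
# Cluster memberships in two consecutive windows (toy data)
w1 <- c(u1 = 1, u2 = 1, u3 = 2, u4 = 2, u5 = 3)
w2 <- c(u2 = 1, u3 = 1, u4 = 2, u5 = 2, u6 = 2)

common <- intersect(names(w1), names(w2))   # users present in both windows

# Jaccard overlap between every cluster of window 1 and every cluster of window 2
overlap <- sapply(sort(unique(w2)), function(b) {
  sapply(sort(unique(w1)), function(a) {
    A <- intersect(names(w1)[w1 == a], common)
    B <- intersect(names(w2)[w2 == b], common)
    length(intersect(A, B)) / max(1, length(union(A, B)))
  })
})
dimnames(overlap) <- list(window1 = sort(unique(w1)), window2 = sort(unique(w2)))
overlap
```

Rows are clusters of the first window and columns clusters of the second; a large entry suggests that the same role persisted.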

Thank You!

@atajti

The code will be available at github.com/atajti/changingForumRoles

MTA

Dénes Tóth: Dilemmas in package development: interactive visualization, GUIs, largish data, extensibility

BURN Meetup, Budapest, 23.04.2014.

Dénes Tóth / MTA TTK KPI / humlab.cogpsyphy.hu

EEG

• Electroencephalography (EEG)
  – Voltage fluctuations (μV) recorded at the scalp
  – A typical setup: 32-128 channels, 500-1000 Hz sampling rate, 30-90 minutes recording time, 20-30 participants → 200 MB – 2 GB / participant
  – Tasks: raw data import + signal processing (filtering, resampling, artifact correction [e.g. eye movements])
  – Visual inspection is unavoidable → interactive visualization is a must

ERP

• Cognitive experiments: what does the brain do if exposed to A versus B
  – EEG & events = Event-related potentials (ERP)
  – 40-200 repetitions per condition, factorial design (Fac1 x Fac2)
  – Tasks: marker handling, segmentation, artifact rejection, averaging, time-frequency analyses → extract components & do statistics (e.g. clustering, ANOVA, etc.) → tremendous number of analytic possibilities
  – randomization statistics (e.g. 5000 permutations)
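As an illustration of the randomization statistics mentioned for ERP data, here is a generic permutation test with 5000 permutations in base R. The two simulated conditions and the simple difference-of-means statistic are assumptions; an actual ERP analysis would typically permute within participants and across many channels and time points.

```r
# Sketch: permutation test for a difference between two conditions (simulated)
set.seed(5)
cond_a <- rnorm(40, mean = 0.0)
cond_b <- rnorm(40, mean = 0.8)

observed <- mean(cond_b) - mean(cond_a)
pooled   <- c(cond_a, cond_b)
n_perm   <- 5000

perm_diff <- replicate(n_perm, {
  idx <- sample(length(pooled), length(cond_a))   # random relabelling
  mean(pooled[-idx]) - mean(pooled[idx])
})

# Two-sided p-value; the +1 counts the observed labelling itself
p_value <- (sum(abs(perm_diff) >= abs(observed)) + 1) / (n_perm + 1)
p_value
```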

eegR package

• No dedicated comprehensive package in R for EEG, but a lot of related packages (e.g. signal, mFilter, icaOcularCorrection + one trillion statistical methods)
• Present (eegR)
  – Base data class: array
  – Basic operation: apply-like
  – ~60 functions, ~4000 lines → appropriate for a specific workflow
  – No cohesive system

• Future (dream :)
  – Covers all basic analytic steps + highly extensible
  – Provides workflow + GUI + scripting
  – Handles out-of-memory datasets well, easy parallelization
  – Interactive visualization capabilities

Question I. One package or related packages?

• One package
  – Pros: easy install process, better tuning options
  – Cons: less general, harder to extend
• One core package + extensions
  – Pros: anyone can write extensions, easy to invoke other packages
  – Cons: the core package must be very well written

Question I/a. How to write a good core?

• Range
  – Introduce only classes, methods and utility functions, or provide a basic stand-alone package?
• Classes
  – S3 / S4 / R5?
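To make the S3 / S4 / R5 choice concrete, the same tiny container can be written in all three class systems. The class, field and method names below are hypothetical and are not taken from eegR.

```r
# The same tiny "EEG recording" container in three class systems (toy example)
library(methods)

# S3: a list with a class attribute and a generic/method pair
eeg_s3 <- structure(list(data = matrix(0, 2, 5), srate = 500), class = "eeg")
n_channels <- function(x, ...) UseMethod("n_channels")
n_channels.eeg <- function(x, ...) nrow(x$data)

# S4: formal slots with type checking
setClass("EEG", representation(data = "matrix", srate = "numeric"))
setGeneric("nChannels", function(x) standardGeneric("nChannels"))
setMethod("nChannels", "EEG", function(x) nrow(x@data))
eeg_s4 <- new("EEG", data = matrix(0, 2, 5), srate = 500)

# R5 (reference class): mutable object, methods live inside the class
EEGRef <- setRefClass("EEGRef",
  fields  = list(data = "matrix", srate = "numeric"),
  methods = list(
    nChannels = function() nrow(data),
    resample  = function(new_rate) { srate <<- new_rate; invisible(.self) }
  ))
eeg_r5 <- EEGRef$new(data = matrix(0, 2, 5), srate = 500)
eeg_r5$resample(250)      # modifies the object in place

c(n_channels(eeg_s3), nChannels(eeg_s4), eeg_r5$nChannels(), eeg_r5$srate)
```

S3 is the lightest for extension authors, S4 adds validation of slots, and reference classes give in-place modification, which may matter for large recordings.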

Question II. What about the user interface?

Workflow approach: R AnalyticFlow
• Pros:
  – the natural way of EEG signal processing
  – unconstrained scripting
• Cons:
  – reliability?
  – performance?

• GUI coverage
  – Full GUI ←→ subtask- (function-) related GUI
• GUI type
  – Desktop GUI ←→ web based GUI
  – gWidgets2 | gWidgetsWWW2 | Shiny | ...

Question III. Which out-of-RAM package to choose?

• SciDB would be great, but only available on Linux
• Two candidates: ff & gdsfmt packages
  – ff package: more comprehensive
  – gdsfmt package: lightweight & fast

Question IV. Interactive visualization?

• iPlots, playwith, ggvis etc.: good, but not efficient
• Performance issues: a 10-sec part can contain 128 x 1000 x 10 = 1,280,000 data points
• Candidates for line plots:
  – Acinonyx
  – rCharts w. Dygraphs

• Acinonyx (iPlots Extreme)
  – Pros: very fast, iContainer
  – Cons: very poor documentation, not on CRAN
• rCharts and other web-based tools, esp. JavaScript libraries
  – Dygraphs: fast and nice, but no official port to rCharts
  – Communication between JS and R?

Thank you!

Q1: One package or related packages?

Q1a: What should the base package cover? Do I need S4 or R5?

Q2: User interface? → R AnalyticFlow, GUIs

Q3: How to handle out-of-memory data?

Q4: Interactive visualization?

rapporter.net

Gergely Daróczi: pander: Transforming R objects to Pandoc’s markdown

pander: A Pandoc writer in R
Transforming R objects to Pandoc’s markdown

Gergely Daróczi

Budapest Users of R Network

23 April 2014

What is pander?
A collection of helper functions to print markdown syntax

> ?pandoc.(footnote|header|horizontal.rule|image|link|p)(.return)?
> ?pandoc.(emphasis|strikeout|strong|verbatim)(.return)?

> pandoc.strong('foobar')
**foobar**

> pandoc.strong.return('foobar')
[1] "**foobar**"

> pandoc.header('foobar', level = 2)

## foobar

> pandoc.header('foobar', style = 'setext')

foobar
======

Gergely Daróczi (rapporter.net) pander: A Pandoc writer in R 23/4/2014 2 / 15

What is pander?
Collection of helper functions to map R objects to markdown

> ?pandoc.(list|table)(.return)?

> pandoc.list(list('foo', list('bar')))

* foo
    * bar

> pandoc.table(head(iris, 2), split.table = Inf)

-------------------------------------------------------------------
 Sepal.Length   Sepal.Width   Petal.Length   Petal.Width   Species
-------------- ------------- -------------- ------------- ---------
     5.1            3.5           1.4            0.2       setosa

     4.9             3            1.4            0.2       setosa
-------------------------------------------------------------------

What is pander?
Collection of helper functions to map R objects to various markdown languages

> pandoc.table(head(iris, 2), split.table = Inf, style = 'rmarkdown')

|  Sepal.Length  |  Sepal.Width  |  Petal.Length  |  Petal.Width  |  Species  |
|:--------------:|:-------------:|:--------------:|:-------------:|:---------:|
|      5.1       |      3.5      |      1.4       |      0.2      |  setosa   |
|      4.9       |       3       |      1.4       |      0.2      |  setosa   |

> pandoc.table(head(iris, 2), split.table = Inf, style = 'simple')

 Sepal.Length   Sepal.Width   Petal.Length   Petal.Width   Species
-------------- ------------- -------------- ------------- ---------
     5.1            3.5           1.4            0.2       setosa
     4.9             3            1.4            0.2       setosa

What is pander?
Collection of helper functions to map R objects to various markdown languages

> iris$Species <- 'foos and bars'; names(iris) <- gsub('.', ' ', names(iris), fixed = TRUE)
> pandoc.table(head(iris, 4), split.table = Inf, style = 'grid',
+              split.cells = 5, justify = 'left')

+----------+---------+----------+---------+------------+
| Sepal    | Sepal   | Petal    | Petal   | Species    |
| Length   | Width   | Length   | Width   |            |
+==========+=========+==========+=========+============+
| 5.1      | 3.5     | 1.4      | 0.2     | setosa     |
+----------+---------+----------+---------+------------+
| 4.9      | 3       | 1.4      | 0.2     | setosa     |
+----------+---------+----------+---------+------------+
| 4.7      | 3.2     | 1.3      | 0.2     | setosa     |
+----------+---------+----------+---------+------------+
| 4.6      | 3.1     | 1.5      | 0.2     | foos       |
|          |         |          |         | and        |
|          |         |          |         | bars       |
+----------+---------+----------+---------+------------+

What is pander?
S3 method to map R objects to markdown

> ?pander(.return)?

> methods(pander)

[1] pander.anova* pander.aov* pander.cast_df* pander.character*

[5] pander.data.frame* pander.default* pander.density* pander.evals*

[9] pander.factor* pander.glm* pander.htest* pander.image*

[13] pander.list* pander.lm* pander.logical* pander.matrix*

[17] pander.NULL* pander.numeric* pander.option pander.POSIXct*

[21] pander.POSIXt* pander.prcomp* pander.rapport* pander.table*

Non-visible functions are asterisked

> pander(head(iris, 1), split.table = Inf)

-------------------------------------------------------------------

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

-------------- ------------- -------------- ------------- ---------

5.1 3.5 1.4 0.2 setosa

-------------------------------------------------------------------
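The S3 approach, one generic plus one method per object class, can be sketched with a hypothetical mini-generic. `to_md()` and its methods are made up for illustration and are not pander's implementation; they only mimic the flavour of its output.

```r
# Hypothetical mini-generic, NOT pander's implementation
to_md <- function(x, ...) UseMethod("to_md")

# character vectors: emphasised items joined into a sentence-like list
to_md.character <- function(x, ...) {
  x <- paste0("_", x, "_")
  if (length(x) < 2) return(x)
  paste(paste(head(x, -1), collapse = ", "), "and", tail(x, 1))
}

# data frames: a pipe table
to_md.data.frame <- function(x, ...) {
  header <- paste0("| ", paste(names(x), collapse = " | "), " |")
  rule   <- paste0("|", paste(rep("---", ncol(x)), collapse = "|"), "|")
  rows   <- apply(as.matrix(format(x)), 1, function(r)
    paste0("| ", paste(trimws(r), collapse = " | "), " |"))
  paste(c(header, rule, rows), collapse = "\n")
}

to_md(letters[1:3])               # "_a_, _b_ and _c_"
cat(to_md(head(iris, 2)), "\n")
```

Supporting a new class then only requires adding another `to_md.<class>` method, which is how the list of `pander.*` methods above grows.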

> pander(letters[1:7])

_a_, _b_, _c_, _d_, _e_, _f_ and _g_

> pander(ks.test(runif(50), runif(50)))

---------------------------------------------------

Test statistic P value Alternative hypothesis

---------------- --------- ------------------------

0.18 _0.3959_ two-sided

---------------------------------------------------

Table: Two-sample Kolmogorov-Smirnov test: `runif(50)` and `runif(50)`

> pander(chisq.test(table(mtcars$am, mtcars$gear)))

---------------------------------------

Test statistic df P value

---------------- ---- -----------------

20.94 2 _2.831e-05_ * * *

---------------------------------------

Table: Pearson’s Chi-squared test: `table(mtcars$am, mtcars$gear)`

Warning message:
In chisq.test(table(mtcars$am, mtcars$gear)) :

Chi-squared approximation may be incorrect

> pander(lm(mtcars$wt ~ mtcars$hp), summary = TRUE)

--------------------------------------------------------------

&nbsp; Estimate Std. Error t value Pr(>|t|)

----------------- ---------- ------------ --------- ----------

**mtcars$hp** 0.009401 0.00196 4.796 4.146e-05

**(Intercept)** 1.838 0.3165 5.808 2.389e-06

--------------------------------------------------------------

-------------------------------------------------------------

Observations Residual Std. Error $R^2$ Adjusted $R^2$

-------------- --------------------- ------- ----------------

32 0.7483 0.4339 0.4151

-------------------------------------------------------------

Table: Fitting linear model: mtcars$wt ~ mtcars$hp

What is pander?
S3 method to map R objects to pretty formatted markdown

> panderOptions('table.split.table', Inf)

> panderOptions('table.style', 'grid')

> emphasize.cells(which(iris > 1.3, arr.ind = TRUE))

> pander(iris)

+----------------+---------------+----------------+---------------+------------+

| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |

+================+===============+================+===============+============+

| *5.1* | *3.5* | *1.4* | 0.2 | setosa |

+----------------+---------------+----------------+---------------+------------+

| *4.9* | *3* | *1.4* | 0.2 | setosa |

+----------------+---------------+----------------+---------------+------------+

| *4.7* | *3.2* | 1.3 | 0.2 | setosa |

+----------------+---------------+----------------+---------------+------------+

| *4.6* | *3.1* | *1.5* | 0.2 | setosa |

+----------------+---------------+----------------+---------------+------------+

| *5* | *3.6* | *1.4* | 0.2 | setosa |

+----------------+---------------+----------------+---------------+------------+

| *5.4* | *3.9* | *1.7* | 0.4 | setosa |

+----------------+---------------+----------------+---------------+------------+

| *4.6* | *3.4* | *1.4* | 0.3 | setosa |

+----------------+---------------+----------------+---------------+------------+

| *5* | *3.4* | *1.5* | 0.2 | setosa |

+----------------+---------------+----------------+---------------+------------+

| *4.4* | *2.9* | *1.4* | 0.2 | setosa |

+----------------+---------------+----------------+---------------+------------+

What is pander?
Tool for literate programming like Sweave, knitr or brew

> ?Pandoc.brew

> Pandoc.brew(text = '
+ Pi equals to <%= pi %>, and the best damn cars are:
+ <%= head(mtcars, 2) %>
+ ')

Pi equals to _3.142_, and the best damn cars are:

--------------------------------------------------------

&nbsp; mpg cyl disp hp drat wt

------------------- ----- ----- ------ ---- ------ -----

**Mazda RX4** 21 6 160 110 3.9 2.62

**Mazda RX4 Wag** 21 6 160 110 3.9 2.875

--------------------------------------------------------

Table: Table continues below

--------------------------------------------------

&nbsp; qsec vs am gear carb

------------------- ------ ---- ---- ------ ------

**Mazda RX4** 16.46 0 1 4 4

**Mazda RX4 Wag** 17.02 0 1 4 4

--------------------------------------------------

Features of Pandoc.brew:

• brew loops and conditional parts of a report just like with brew,
• capturing plots and images with automatically applied theme,
• render all R objects automatically in Pandoc’s markdown,
• recording all warning/error messages plus the raw R objects along with anything printed to stdout and the printed results,
• custom caching mechanism to disk or RAM with auto-dependency,
• convert to HTML/pdf/odt/docx at one go,
• no chunk options (only workaround),
• building reports also in interactive session with an R5 reference class.

http://rapporter.github.io/pander/#brew-to-pandoc

What is pander?
Tool for literate programming like Sweave, knitr or brew – with global options

?panderOptions
?evalsOptions

• number formatting style (decimal mark, digits, trailing spaces etc.),
• date format,
• table formats (split, alignment, caption etc.),
• vector options (separator, copula, wrapper character),
• global graph settings for base, lattice and ggplot2 calls:
  – color palette, font settings, grid, legend position, axis labels angle etc.
• plot dimensions, resolution,
• cache options, hooks, filter output etc.

http://rapporter.github.io/pander/#pander-options

What is pander?
Tool for literate programming like Sweave, knitr or brew – a quick comparison

> require(wordcloud)
> pkgs <- ctv:::.get_pkgs_from_ctv_or_repos('ReproducibleResearch')[[1]]
> wordcloud(pkgs, rep(1, times = length(pkgs)), colors = rainbow(length(pkgs)),
+           random.color = TRUE)

And pander is intended to be a wrapper around Pandoc, so transforming markdown files to other document formats:

> ?Pandoc.convert
> Pandoc.brew(..., convert = '(html|pdf|odt|docx)', ...)

Job advertisements

Data Scientist

Requirements:
* Data warehouse experience
* SQL, NoSQL
* Programming (e.g. Perl)
* English

Advantages:
* R programming
* Math or insurance degree
* German

Rails programmer

Requirements:
* 2 yrs of Rails experience
* jQuery, Ajax
* git
* work without specs :)

Advantages:
* stats knowledge
* GH and SO activity
* SaaS experience

01 László Gönczy Exploratory data analysis.

02 Gergely Horváth R workshop in Bucharest.

03 Imre Kocsis Bigvis: plotting large data in R.

04 András Tajti Changing User Roles in a Forum.

05 Dénes Tóth Dilemmas in package development.