Post-processing Stan Results (LT version)

Daiki Hojo, M1, Graduate School of Humanities, Senshu University

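About these slides

• This is a shortened version of the slides below, with some supplements and additions, prepared for a lightning talk (LT) at a study group.

http://www.slideshare.net/daikihojo/stan-70425025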

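Goals of these slides

① Extract what you need from Stan results

Why ▶ Stan results are S4 class objects, which are awkward to handle

② Print results from Stan (and JAGS) easily

Why ▶ Getting results out should not take extra effort

(a few hardcore enthusiasts excepted)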

What is an S4 class object?

• The data contained in an object from an S4 class is defined by the slots in the class definition.

• Each slot in an object is a component of the object; like components (that is, elements) of a list, these may be extracted and set, using the function slot() or more often the operator "@". However, they differ from list components in important ways. First, slots can only be referred to by name, not by position, and there is no partial matching of names as with list elements.

• All the objects from a particular class have the same set of slot names; specifically, the slot names that are contained in the class definition. Each slot in each object always is an object of the class specified for this slot in the definition of the current class. The word "is" corresponds to the R function of the same name (is), meaning that the class of the object in the slot must be the same as the class specified in the definition, or some class that extends the one in the definition (a subclass).

• A special slot name, .Data, stands for the ‘data part’ of the object. An object from a class with a data part is defined by specifying that the class contains one of the R object types or one of the special pseudo-classes, matrix or array, usually because the definition of the class, or of one of its superclasses, has included the type or pseudo-class in its contains argument. A second special slot name, .xData, is used to enable inheritance from abnormal types such as "environment". See the section on inheriting from non-S4 classes for details on the representation and for the behavior of S3 methods with objects from these classes.

• Some slot names correspond to attributes used in old-style S3 objects and in R objects without an explicit class, for example, the names attribute. If you define a class for which that attribute will be set, such as a subclass of named vectors, you should include "names" as a slot. See the definition of class "namedList" for an example. Using the names() assignment to set such names will generate a warning if there is no names slot and an error if the object in question is not a vector type. A slot called "names" can be used anywhere, but only if it is assigned as a slot, not via the default names() assignment.

from https://stat.ethz.ch/R-manual/R-devel/library/methods/html/Classes_Details.html
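The slot behavior described in the quotation can be tried out in base R. The following is a minimal sketch: the class `ToyFit` and its slots are hypothetical and exist only for illustration; they are not the definition of rstan's `stanfit` class.

```r
library(methods)

# Hypothetical toy class for illustration; this is NOT rstan's stanfit class.
setClass("ToyFit", slots = c(model_name = "character", draws = "numeric"))

toy <- new("ToyFit", model_name = "demo", draws = c(0.1, 0.5, 0.9))

isS4(toy)            # check that this is an S4 object
slotNames(toy)       # slot names fixed by the class definition
toy@model_name       # extract a slot with the "@" operator
slot(toy, "draws")   # the same kind of access with slot()

# "$" is not defined for this S4 class, so the call signals an error.
dollar_ok <- tryCatch({ toy$model_name; TRUE }, error = function(e) FALSE)

# Slots are matched by full name only; a partial name signals an error.
partial_ok <- tryCatch({ toy@model; TRUE }, error = function(e) FALSE)
```

Both `dollar_ok` and `partial_ok` end up `FALSE`, which is the practical difference from a list: elements are reached with `@` and only by their full slot name.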







In short: a complicated object structure







You cannot extract elements with fit$








Extract them with fit@ instead


The usual example

• Run 8schools.stan

library(rstan)                               # load the package
rstan_options(auto_write = TRUE)             # save the compiled model to avoid recompiling
options(mc.cores = parallel::detectCores())  # run the chains in parallel

# Eight schools data
dat <- list(J = 8,
            y = c(28, 8, -3, 7, -1, 1, 18, 12),
            sigma = c(15, 10, 16, 11, 9, 11, 10, 18))

fit <- stan(file = "8schools.stan", data = dat, iter = 1000, chains = 4)

Setup complete
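The slides do not show the contents of 8schools.stan. As a sketch, the file could be created from R as below, assuming the non-centered eight schools model from the rstan getting-started example (the parameter names mu, tau, eta, and theta match the output shown later in these slides). The model text and the path `stan_path` are assumptions, not taken from the slides.

```r
# Assumed model: non-centered eight schools (not shown in the original slides).
stan_code <- c(
  "data {",
  "  int<lower=0> J;             // number of schools",
  "  vector[J] y;                // estimated treatment effects",
  "  vector<lower=0>[J] sigma;   // standard errors of the effects",
  "}",
  "parameters {",
  "  real mu;                    // population mean",
  "  real<lower=0> tau;          // population scale",
  "  vector[J] eta;              // standardized school-level deviations",
  "}",
  "transformed parameters {",
  "  vector[J] theta = mu + tau * eta;  // school-level effects",
  "}",
  "model {",
  "  eta ~ normal(0, 1);",
  "  y ~ normal(theta, sigma);",
  "}"
)

# Hypothetical location; write to "8schools.stan" in the working directory
# instead if you want to run the stan() call from the slide unchanged.
stan_path <- file.path(tempdir(), "8schools.stan")
writeLines(stan_code, stan_path)
```

With this file in place, `stan(file = stan_path, data = dat, iter = 1000, chains = 4)` would be the equivalent call.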


• Extract the posterior mean (EAP)

summary(fit)$summary[, "mean"]

• Extract the posterior median (MED)

summary(fit)$summary[, "50%"]

• Check the 95% credible interval

summary(fit)$summary[, c("2.5%", "97.5%")]

• Check the effective sample size

summary(fit)$summary[, "n_eff"]

• Check R-hat

summary(fit)$summary[, "Rhat"]
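These calls work because `summary(fit)$summary` is an ordinary matrix with parameters as row names and statistics as column names. The sketch below rebuilds a small stand-in matrix without running Stan. The numbers are the rounded values printed by MCMCsummary(fit) later in these slides; the name `s` is hypothetical, and the real matrix has additional columns such as se_mean, sd, and n_eff.

```r
# Stand-in for summary(fit)$summary, built from the rounded values printed by
# MCMCsummary(fit) in these slides. The object name `s` is hypothetical.
s <- rbind(
  c( 7.77,  -1.69,   7.67,  17.10, 1.01),
  c( 6.58,   0.36,   5.24,  20.01, 1.02),
  c(11.55,  -1.19,  10.28,  30.59, 1.01),
  c( 8.27,  -6.25,   8.08,  23.59, 1.00),
  c(-39.42, -45.09, -39.19, -34.83, 1.01)
)
rownames(s) <- c("mu", "tau", "theta[1]", "theta[8]", "lp__")
colnames(s) <- c("mean", "2.5%", "50%", "97.5%", "Rhat")

eap <- s[, "mean"]               # one column: a named numeric vector
med <- s[, "50%"]
ci  <- s[, c("2.5%", "97.5%")]   # two columns: still a matrix
```

Selecting one column drops to a named vector, while selecting several keeps a matrix, so downstream code can index by parameter name in either case.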














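• Check whether any 95% credible interval straddles 0

※ This uses the fact that the product of the two endpoints is negative when an interval straddles 0. The call returns TRUE only if no interval straddles 0.

all(apply(summary(fit)$summary[, c("2.5%", "97.5%")], 1, prod) > 0, na.rm = TRUE)

• Check the effective sample size

※ Sampling is judged unproblematic when n_eff is at least 10% of the total number of post-warmup draws (BDA3).

all(summary(fit)$summary[, "n_eff"] / nrow(as.matrix(fit)) >= 0.10, na.rm = TRUE)

• Check R-hat

※ The chains are judged to have converged when Rhat is 1.10 or below (Gelman & Rubin, 1992; BDA3).

all(summary(fit)$summary[, "Rhat"] <= 1.10, na.rm = TRUE)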
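The endpoint-product trick and the 10% rule can be checked by hand on a few numbers. In this sketch the interval endpoints for mu and tau are the values printed by MCMCsummary(fit) in these slides, while the `n_eff` values are hypothetical numbers chosen only to illustrate the ratio; `n_draws` assumes the default warmup of iter / 2.

```r
# Interval endpoints copied from the MCMCsummary(fit) output in these slides.
ci <- rbind(mu = c(-1.69, 17.10), tau = c(0.36, 20.01))
colnames(ci) <- c("2.5%", "97.5%")

# Product of the two endpoints: negative exactly when the interval straddles 0.
endpoint_product <- apply(ci, 1, prod)
straddles_zero <- endpoint_product < 0
no_interval_straddles_zero <- all(endpoint_product > 0)

# HYPOTHETICAL effective sample sizes, for illustration only (not from the slides).
n_eff <- c(mu = 450, tau = 150)
# 4 chains x 500 post-warmup draws, assuming iter = 1000 and warmup = iter / 2.
n_draws <- 4 * 500
ratio_ok <- n_eff / n_draws >= 0.10
```

Here the interval for mu straddles 0 and the one for tau does not, so the overall check is `FALSE`. Comparing raw `n_eff` with 0.10 would pass for almost any fit, which is why the ratio to the number of draws is used.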













• Extract the information on "mu"

summary(fit)$summary["mu", ]

• Extract the information on "mu" and "tau"

summary(fit)$summary[c("mu", "tau"), ]

• Extract the information on "theta[1]" to "theta[8]"

summary(fit)$summary[paste0("theta[", 1:fit@par_dims$theta, "]"), ]
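The last call builds the row names "theta[1]" to "theta[8]" as strings and uses them as a row index. A small base R sketch of that step follows, together with a pattern-based alternative. The vector `par_names` is a hypothetical stand-in for `rownames(summary(fit)$summary)`, and the length 8 is written out instead of being read from `fit@par_dims$theta`.

```r
# Hypothetical stand-in for rownames(summary(fit)$summary).
par_names <- c("mu", "tau",
               paste0("eta[", 1:8, "]"),
               paste0("theta[", 1:8, "]"),
               "lp__")

# Build the names explicitly, as in the slide.
theta_by_name <- paste0("theta[", 1:8, "]")

# Alternative: select every name that starts with "theta[".
theta_by_pattern <- par_names[grepl("^theta\\[", par_names)]

same_selection <- identical(theta_by_name, theta_by_pattern)
```

rstan's `summary()` method also accepts a `pars` argument, so `summary(fit, pars = "theta")$summary` should be another way to restrict the table to one parameter.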










How can we print results from Stan (and JAGS) easily?

▶ With the recently released MCMCvis package


Key features

▶ Prints results from Stan and JAGS in the same way

▶ Makes it easy to compare estimates

> MCMCsummary(fit)

           mean   2.5%    50%  97.5%   Rhat
mu         7.77  -1.69   7.67  17.10   1.01
tau        6.58   0.36   5.24  20.01   1.02
eta[1]     0.42  -1.52   0.42   2.15   1.00
...
eta[8]     0.07  -1.76   0.07   1.92   1.00
theta[1]  11.55  -1.19  10.28  30.59   1.01
...
theta[8]   8.27  -6.25   8.08  23.59   1.00
lp__     -39.42 -45.09 -39.19 -34.83   1.01


> MCMCplot(fit)

• Shows whether each credible interval straddles 0 (the reference value can be changed to any value)


> MCMCtrace(fit)

• Draws the trace plot on the left and the posterior distribution on the right (same as the JAGS output)
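The three MCMCvis calls can be narrowed to specific parameters. The sketch below assumes that MCMCvis is installed and that a stanfit object named `fit` already exists; otherwise it does nothing. The `params`, `ref`, and `pdf` arguments are used as I understand them from the MCMCvis documentation, and the reference value 5 is an arbitrary illustration.

```r
# Run only when MCMCvis is installed and a fitted object `fit` is available.
can_run <- requireNamespace("MCMCvis", quietly = TRUE) && exists("fit")

if (can_run) {
  # Summary table restricted to mu and tau.
  print(MCMCvis::MCMCsummary(fit, params = c("mu", "tau")))

  # Interval plot for theta, with the reference line moved from 0 to 5
  # (5 is an arbitrary value chosen for illustration).
  MCMCvis::MCMCplot(fit, params = "theta", ref = 5)

  # Trace and density plots for mu, drawn on the current device
  # instead of being written to a PDF file.
  MCMCvis::MCMCtrace(fit, params = "mu", pdf = FALSE)
}
```

Because the same functions accept JAGS output as well, the same three calls should apply to a JAGS fit of the same model, which is what makes side-by-side comparison easy.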

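For those who want to go further

• Stan's built-in functions can draw a variety of plots

▶ http://www.slideshare.net/daikihojo/stan-70425025

• Write ggplot2 code more easily / tidy up the appearance

▶ with the ggpubr package

▶ with the ggThemeAssist package

▶ http://www.slideshare.net/daikihojo/ggplot-72164183

(LT material for the Kunisato Lab)

• Specify any color you like

▶ with the colourpicker package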
