Post-processing Stan Results (LT version)

Daiki Hojo, M1, Graduate School of Humanities, Senshu University

About these slides

• This is a shortened version of the deck below, with some supplements and additions, prepared for a lightning talk (LT) at a study group

http://www.slideshare.net/daikihojo/stan-70425025

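Goals of these slides

① Extract what you want from Stan results

Why ▶ Stan results are S4 class objects and awkward to handle

② Easily output Stan (and JAGS) results

Why ▶ Getting results out should not take extra effort

(a few die-hard enthusiasts excepted)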

What is an S4 class object?

• The data contained in an object from an S4 class is defined by the slots in the class definition.

• Each slot in an object is a component of the object; like components (that is, elements) of a list, these may be extracted and set, using the function slot() or more often the operator "@". However, they differ from list components in important ways. First, slots can only be referred to by name, not by position, and there is no partial matching of names as with list elements.

• All the objects from a particular class have the same set of slot names; specifically, the slot names that are contained in the class definition. Each slot in each object always is an object of the class specified for this slot in the definition of the current class. The word "is" corresponds to the R function of the same name (is), meaning that the class of the object in the slot must be the same as the class specified in the definition, or some class that extends the one in the definition (a subclass).

• A special slot name, .Data, stands for the ‘data part’ of the object. An object from a class with a data part is defined by specifying that the class contains one of the R object types or one of the special pseudo-classes, matrix or array, usually because the definition of the class, or of one of its superclasses, has included the type or pseudo-class in its contains argument. A second special slot name, .xData, is used to enable inheritance from abnormal types such as "environment". See the section on inheriting from non-S4 classes for details on the representation and for the behavior of S3 methods with objects from these classes.

• Some slot names correspond to attributes used in old-style S3 objects and in R objects without an explicit class, for example, the names attribute. If you define a class for which that attribute will be set, such as a subclass of named vectors, you should include "names" as a slot. See the definition of class "namedList" for an example. Using the names() assignment to set such names will generate a warning if there is no names slot and an error if the object in question is not a vector type. A slot called "names" can be used anywhere, but only if it is assigned as a slot, not via the default names() assignment.

from https://stat.ethz.ch/R-manual/R-devel/library/methods/html/Classes_Details.html

In short, it is a complex object structure

Its contents cannot be extracted with fit$

Extract them with fit@ instead
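The difference between `$` and `@` can be tried out without running Stan. The sketch below defines a small S4 class, `ToyFit`, which is purely hypothetical and only stands in for a fitted-model object. Its slot names are made up for illustration and are not a description of the real stanfit class.

```r
library(methods)

# Hypothetical toy class standing in for a fitted-model object
setClass("ToyFit", representation(model_name = "character", par_dims = "list"))

toy <- new("ToyFit",
           model_name = "8schools",
           par_dims   = list(mu = integer(0), theta = 8L))

slotNames(toy)                # list the slots defined by the class
toy@model_name                # extract a slot with the @ operator
slot(toy, "par_dims")$theta   # slot() is equivalent; the slot itself is an ordinary list

# `$` is not defined for this S4 object, so the call raises an error
dollar_result <- tryCatch(toy$model_name, error = function(e) "error")
dollar_result
```

Once a slot has been pulled out with `@`, whatever it holds (here a plain list) can be handled with the usual `$` and `[[` operators.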

The usual setup

• Run 8schools.stan

library(rstan)                               # load the package
rstan_options(auto_write = TRUE)             # save the compiled model so it is not recompiled every time
options(mc.cores = parallel::detectCores())  # run the chains in parallel

dat <- list(J = 8,
            y = c(28, 8, -3, 7, -1, 1, 18, 12),
            sigma = c(15, 10, 16, 11, 9, 11, 10, 18))

fit <- stan(file = "8schools.stan", data = dat, iter = 1000, chains = 4)

Ready to go
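The slides do not show the contents of 8schools.stan. The parameter names that appear in the output later (mu, tau, eta, theta) match the well-known eight schools example from the RStan getting-started material, so the sketch below assumes that model and writes it to disk from R. The exact file used in the talk may differ, and vector types are used here instead of arrays only to avoid depending on a particular array syntax.

```r
# Assumption: a non-centered eight schools model, reconstructed from the
# parameter names in the slides; the file used in the talk is not shown.
model_code <- "
data {
  int<lower=0> J;            // number of schools
  vector[J] y;               // estimated treatment effects
  vector<lower=0>[J] sigma;  // standard errors of the effect estimates
}
parameters {
  real mu;                   // population mean effect
  real<lower=0> tau;         // between-school standard deviation
  vector[J] eta;             // standardized school-level deviations
}
transformed parameters {
  vector[J] theta = mu + tau * eta;  // school-level effects
}
model {
  target += normal_lpdf(eta | 0, 1);
  target += normal_lpdf(y | theta, sigma);
}
"

# Write the model next to the script so stan(file = "8schools.stan", ...) can find it
writeLines(model_code, "8schools.stan")
```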

① Extract what you want from Stan results

• Extract the posterior mean (EAP)

summary(fit)$summary[, "mean"]

• Extract the posterior median (MED)

summary(fit)$summary[, "50%"]

• Check the 95% credible interval

summary(fit)$summary[, c("2.5%", "97.5%")]

• Check the effective sample size

summary(fit)$summary[, "n_eff"]

• Check R-hat

summary(fit)$summary[, "Rhat"]
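The "mean", "50%", "2.5%" and "97.5%" columns are simply summaries of the posterior draws. As a rough sketch of what they contain, the code below computes the same quantities by hand. The vector `draws` is simulated here so the example runs without Stan; with a real fit it would be something like `rstan::extract(fit)$mu`.

```r
set.seed(1)

# Hypothetical stand-in for the posterior draws of one parameter
draws <- rnorm(2000, mean = 8, sd = 5)

eap <- mean(draws)                               # posterior mean (EAP)
med <- median(draws)                             # posterior median (MED)
ci  <- quantile(draws, probs = c(0.025, 0.975))  # equal-tailed 95% credible interval

# Same column labels as the summary matrix
round(c(mean = eap, `50%` = med, ci), 2)
```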

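• Check whether any 95% credible interval straddles 0

※ If an interval straddles 0, the product of its two endpoints is negative, so TRUE means that no interval straddles 0

all(apply(summary(fit)$summary[, c("2.5%", "97.5%")], 1, prod) > 0, na.rm = TRUE)

• Check the effective sample size

※ If n_eff is at least 10% of the total number of post-warmup draws, sampling is judged to be fine (BDA3)

all(summary(fit)$summary[, "n_eff"] / nrow(as.matrix(fit)) > 0.10, na.rm = TRUE)

• Check R-hat

※ A value of 1.10 or below is judged to indicate convergence (Gelman & Rubin, 1992; BDA3)

all(summary(fit)$summary[, "Rhat"] <= 1.10, na.rm = TRUE)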
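When one of these all() checks comes back FALSE, it helps to see which parameter caused it. The sketch below wraps the three checks in a hypothetical helper, `check_summary()`, that reports them per parameter. The toy matrix reuses the mu and tau interval endpoints and R-hat values shown in the MCMCsummary output later in these slides, while its n_eff values and the total of 2000 draws are made up for illustration.

```r
# Hypothetical helper: per-parameter version of the three checks.
# `s` is a summary matrix with "2.5%", "97.5%", "n_eff" and "Rhat" columns;
# `n_draws` is the total number of post-warmup draws across all chains.
check_summary <- function(s, n_draws) {
  data.frame(
    excludes_zero = s[, "2.5%"] * s[, "97.5%"] > 0,  # TRUE if the interval does not straddle 0
    neff_ok       = s[, "n_eff"] / n_draws >= 0.10,  # effective sample size ratio of at least 10%
    rhat_ok       = s[, "Rhat"] <= 1.10,             # R-hat of 1.10 or below
    row.names     = rownames(s)
  )
}

# Toy summary matrix; the n_eff values are illustrative only
toy <- matrix(c(-1.69, 17.10, 500, 1.01,
                 0.36, 20.01, 400, 1.02),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("mu", "tau"),
                              c("2.5%", "97.5%", "n_eff", "Rhat")))

res <- check_summary(toy, n_draws = 2000)
res
```

With a real fit, the call would look like `check_summary(summary(fit)$summary, nrow(as.matrix(fit)))`.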

• Extract the information for "mu"

summary(fit)$summary["mu", ]

• Extract the information for "mu" and "tau"

summary(fit)$summary[c("mu", "tau"), ]

• Extract the information for "theta[1]" to "theta[8]"

summary(fit)$summary[paste0("theta[", 1:fit@par_dims$theta, "]"), ]
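Building the row names with paste0() requires knowing the length of theta. An alternative that does not is to match the row names with a regular expression. The sketch below compares the two on a toy matrix whose row names mimic those of a Stan summary. The values and the three-element theta are made up for illustration.

```r
# Toy stand-in for summary(fit)$summary; the numbers are placeholders
par_names <- c("mu", "tau", paste0("theta[", 1:3, "]"), "lp__")
s <- matrix(as.numeric(seq_along(par_names)), ncol = 1,
            dimnames = list(par_names, "mean"))

# Select by explicitly constructed names (the approach used in the slides)
by_name <- s[paste0("theta[", 1:3, "]"), , drop = FALSE]

# Select by pattern: every row whose name starts with "theta["
by_regex <- s[grepl("^theta\\[", rownames(s)), , drop = FALSE]

identical(by_name, by_regex)
```

`drop = FALSE` keeps the result a matrix even when only one row or column is selected.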

② Easily output Stan (and JAGS) results

How can Stan (and JAGS) results be output easily?

▶ With the recently released MCMCvis package

Key features

▶ Results from Stan and JAGS can be output in the same way

▶ Estimates are easy to compare

> MCMCsummary(fit)
           mean   2.5%    50%  97.5%   Rhat
mu         7.77  -1.69   7.67  17.10   1.01
tau        6.58   0.36   5.24  20.01   1.02
eta[1]     0.42  -1.52   0.42   2.15   1.00
...
eta[8]     0.07  -1.76   0.07   1.92   1.00
theta[1]  11.55  -1.19  10.28  30.59   1.01
...
theta[8]   8.27  -6.25   8.08  23.59   1.00
lp__     -39.42 -45.09 -39.19 -34.83   1.01

> MCMCplot(fit)

• Shows whether each credible interval straddles 0 (the reference value can be changed to any value)

> MCMCtrace(fit)

• Draws the trace plot on the left and the posterior distribution on the right (the same layout as JAGS output)
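The three MCMCvis functions can be tried without a fitted Stan model, because they also accept a coda `mcmc.list`, the format JAGS output usually takes. The sketch below is only an illustration: the draws are simulated, not real MCMC output, and the reference value of 5 is arbitrary. It assumes that the `params`, `ref` and `pdf` arguments behave as described in the MCMCvis documentation for the installed version.

```r
library(coda)     # mcmc / mcmc.list containers
library(MCMCvis)

set.seed(1)

# Hypothetical stand-in for real MCMC output: one "chain" of simulated draws
make_chain <- function() {
  mcmc(cbind(mu  = rnorm(500, mean = 8, sd = 5),
             tau = abs(rnorm(500, mean = 6, sd = 5))))
}
draws <- mcmc.list(make_chain(), make_chain())

# Summary table restricted to selected parameters
out <- MCMCsummary(draws, params = c("mu", "tau"))
print(out)

# Caterpillar plot judged against a reference value other than 0
MCMCplot(draws, params = c("mu", "tau"), ref = 5)

# Trace and density plots drawn on the current device instead of a PDF file
MCMCtrace(draws, params = "mu", pdf = FALSE)
```

Passing a stanfit object such as `fit` in place of `draws` uses the same calls, which is what makes comparing Stan and JAGS results straightforward.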

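For those who want to go even deeper

• Stan's built-in functions can draw a variety of plots

▶ http://www.slideshare.net/daikihojo/stan-70425025

• Write ggplot2 code more easily and tidy up its appearance

▶ with the ggpubr package

▶ with the ggThemeAssist package

▶ http://www.slideshare.net/daikihojo/ggplot-72164183

(LT slides from the Kunisato lab)

• Specify whatever colors you like

▶ with the colourpicker package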