Waikato Environment for Knowledge Analysiskuze-lab/RS2016/Weka_Lecture_Slides_161026.pdf · ・ベイジアンネットワーク Bayesian network ・ロジスティック識別 Logistic

Waikato Environment for Knowledge Analysis

http://www.cs.waikato.ac.nz/ml/weka/downloading.html

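○ Weka (Waikato Environment for Knowledge Analysis) is machine learning software developed at the University of Waikato, New Zealand, and written in Java. It is free software licensed under the GNU General Public License.

○ It is a collection of visualization tools and algorithms for data analysis and predictive modeling, with a graphical user interface (GUI) that makes these functions easy to use. Weka has the following advantages:

・ Free software licensed under the GNU General Public License

・ Implemented entirely in Java, so it runs on almost any platform

・ Offers a comprehensive collection of data preprocessors and modeling techniques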

○ Weka supports standard data mining tasks such as data preprocessing, clustering, classification, regression analysis, visualization, and feature selection.

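○ Data mining: the attempt to discover meaningful knowledge in large volumes of data that are otherwise no more than accumulated records.

○ Machine learning: in processing big data, the computer itself discovers knowledge and rules and uses them for inference, classification, and prediction.

○ Pattern recognition: the human ability to recognize characters and speech is modeled by some method and reproduced on a computer.

○ Supervised / unsupervised learning: the cases where a correct answer is, or is not, provided for each data item. In remote sensing, this corresponds to whether ground truth data are available.

○ Reinforcement learning: a teaching signal is given as a "reward" when the best result is obtained.

○ Deep learning: feature extraction from raw data is performed using, for example, multilayer neural networks.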

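RGB color composite image (True Color)

◆ Shown below is the output of a C program that creates three images (B, G, and R) from the BMP file of the original color image.

◆ Image size: 480×480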

=== BMP file header ===

[0] B 66 [1] M 77 [2] 6 54 [3] 140 [4] 10 [5] 0 [6] 0 [7] 0 [8] 0 [9] 0

[10] 6 54 [11] 0 [12] 0 [13] 0 [14] ( 40 [←Windows] [15] 0 [16] 0 [17] 0

[18] 224 [19] 1 [<Width/3] [20] 0 [21] 0 [22] 224 [23] 1 [<- Height] [24]

0 [25] 0 [26] 1 [27] 0 [28] 24 [<- Color bit] [29] 0 [30] 0 [31] 0 [32] 0

[33] 0 [34] 0 [35] 0 [36] 0 [37] 0 [38] 196 [39] 14 [40] 0 [41] 0 [42]

196 [43] 14 [44] 0 [45] 0 [46] 0 [47] 0 [48] 0 [49] 0 [50] 0 [51] 0 [52]

0 [53] 0 (0) 0[54] ] 93

*** width = 1440 (= 480*3: three times the pixel width, because each pixel has B, G, and R bytes)

*** height = 480

*** pixel = 691200

(1) 104 [bottom-left B] (2) 104 [G] (3) 124 [R] (4) 100 (5) 113 (6) 129 (7) 85 (8) 100

(9) 109 (10) 72 (11) 90 (12) 91 (13) 76 (14) 100 (15) 90 (16) 66 (17) 92 (18) 74

(19) 51 (20) 79 (21) 56 (22) 56 (23) 85 (24) 60 (25) 76 (26) 105 (27) 82 (28) 91

(29) 117 (30) 103 (31) 94 (32) 117 (33) 113 (34) 90 (35) 109 (36) 117 (37) 91

(38) 107 (39) 130 (40) 91 (41) 103 (42) 137 (43) 93 (44) 101 (45) 148 (46) 109

(47) 113 (48) 154 (49) 131 (50) 131 (51) 155 (52) 98 (53) 95 (54) 110 ......

BMP file
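The header dump above can be read programmatically. The following Python sketch is not the C program used in the lecture; it is a minimal illustration that assumes an uncompressed 24-bit BMP with a 54-byte header and a positive height (rows stored bottom-up). The function names `parse_bmp_header`, `split_bgr`, and `make_tiny_bmp` are hypothetical, and the 2×1 test image is built only from the first two pixels of the dump.

```python
import struct

def parse_bmp_header(data):
    """Read the header fields that appear in the dump (little-endian)."""
    if data[0:2] != b"BM":                       # bytes [0] 'B' and [1] 'M'
        raise ValueError("not a BMP file")
    # bytes [2..13]: file size, two reserved words, offset of the pixel data
    file_size, _, _, offset = struct.unpack_from("<IHHI", data, 2)
    # bytes [14..29]: header size, width, height, planes, bits per pixel
    _, width, height, _, bits = struct.unpack_from("<IiiHH", data, 14)
    return {"file_size": file_size, "offset": offset,
            "width": width, "height": height, "bits": bits}

def split_bgr(data):
    """Return three 2-D lists (B, G, R); rows are in file order (bottom row first)."""
    info = parse_bmp_header(data)
    if info["bits"] != 24:
        raise ValueError("only 24-bit BMP is handled in this sketch")
    width, height = info["width"], info["height"]
    row_bytes = (width * 3 + 3) // 4 * 4         # each row is padded to a multiple of 4 bytes
    blue, green, red = [], [], []
    for row in range(height):
        start = info["offset"] + row * row_bytes
        line = data[start:start + width * 3]
        blue.append(list(line[0::3]))            # byte order within a pixel is B, G, R
        green.append(list(line[1::3]))
        red.append(list(line[2::3]))
    return blue, green, red

def make_tiny_bmp():
    """Build a hypothetical 2x1 image from the first two pixels of the dump."""
    pixels = bytes([104, 104, 124, 100, 113, 129]) + b"\x00\x00"   # 2 padding bytes
    file_header = b"BM" + struct.pack("<IHHI", 54 + len(pixels), 0, 0, 54)
    info_header = struct.pack("<IiiHHIIiiII", 40, 2, 1, 1, 24, 0,
                              len(pixels), 3780, 3780, 0, 0)
    return file_header + info_header + pixels

tiny = make_tiny_bmp()
header = parse_bmp_header(tiny)
blue, green, red = split_bgr(tiny)
print(header)
print("B:", blue, "G:", green, "R:", red)
```

The same little-endian reading explains the dump: bytes [2] to [5] are 54, 140, 10, 0, which give a file size of 54 + 140×256 + 10×65536 = 691254 bytes, that is, the 691200 pixel bytes plus the 54-byte header.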

Today’s questions

1. What is “machine learning” and what is “neural network”?

2. Describe how machine learning is related to the analysis of remote sensing data (or can be related to your own research topic).

Textbook: Masahiro Araki, フリーソフトではじめる機械学習入門 (Introduction to Machine Learning with Free Software), Morikita Publishing

ISBN 978-4-627-85211-2

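K-nearest neighbor method, KNN (K Nearest Neighbor)

○ A method for class discrimination. The training data are plotted in a vector space; when an unknown data point arrives, the K points nearest to it are taken in order of distance, and the class of the unknown point is estimated by majority vote among them.

○ Flow of class discrimination for the case in the figure:

1. Plot the known data (training data) as yellow and purple circles.

2. Decide the value of K in advance (K = 3).

3. When a red star is obtained as unknown data, take the three nearest points.

4. Estimate the class of the star by majority vote among those three points.

○ In this case, the unknown red star is estimated to belong to Class B.

○ The result depends on K. For example, with K = 6 the red star is judged to be Class A.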

Typically k is odd when the number of classes is 2. Let's say k = 3 and there is 1 instance of C1 and 2 instances of C2. In this case, KNN says that the new point has to be labeled as C2, as it forms the majority.
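The voting procedure can be sketched in a few lines of Python. The coordinates below are hypothetical (the figure's actual points are not recoverable); they are chosen only so that K = 3 gives Class B and K = 6 gives Class A, as in the slide. The function name `knn_predict` is likewise an assumption, and Euclidean distance is used as the distance measure.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (coordinates, class label)
train = [
    ((0.5, 0.0), "B"), ((0.0, 0.6), "B"), ((3.0, 3.0), "B"),
    ((0.7, 0.0), "A"), ((1.0, 0.0), "A"), ((0.0, 1.1), "A"), ((-1.2, 0.0), "A"),
]
star = (0.0, 0.0)  # the unknown point (the "red star")

class_k3 = knn_predict(train, star, 3)
class_k6 = knn_predict(train, star, 6)
print("K=3:", class_k3, " K=6:", class_k6)
```

With these points the three nearest neighbors are two B and one A, while the six nearest are two B and four A, so the estimated class changes with K.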

Principal Component Analysis (PCA)

(a)Read iris.arff at the Preprocess tab.

(b)Choose PrincipalComponents in the Filter range.

(filters → unsupervised → attribute → PrincipalComponents)

(c) Push Apply button.

PC1 = -0.581petallength -0.566petalwidth -0.522sepallength + 0.263sepalwidth

PC2 = +0.926petallength +0.372petalwidth +0.065sepallength + 0.021sepalwidth

PC1: minimum -3.298, maximum 2.765

PC2: minimum -2.649, maximum 2.713

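○ Concept learning: deriving, from "examples of a concept", a "general rule" that represents the concept, and thereby performing categorization. Examples that fit the concept are called positive examples, and examples that do not are called negative examples. By finding a concept that fits all the positive examples and none of the negative examples, one can judge whether a new example fits the concept.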

➢ A concept is the way in which a category or class of

objects is represented mentally. Concepts allow individuals

to discern class membership or non-membership, relate

different classes of objects, and provide context for learning

new information about classes and class membership.

➢ Often the problem faced by a learning agent is expressed

as a concept learning problem, whereby the agent must learn

a concept which achieves the correct categorization. Usually,

to get a machine learning agent to learn a concept, both

positive and negative examples of the concept must be

supplied.

https://ja.wikipedia.org/wiki/決定木

Decision tree

http://www.saedsayad.com/decision_tree.htm

➢ Decision tree builds classification or regression models in the form of a

tree structure. It breaks down a dataset into smaller and smaller subsets

while at the same time an associated decision tree is incrementally

developed.

➢ The final result is a tree with decision nodes and leaf nodes. A decision

node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and

Rainy). Leaf node (e.g., Play) represents a classification or decision.

➢ The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.

J48 pruned tree

------------------

outlook = sunny

| humidity = high: no (3.0)

| humidity = normal: yes (2.0)

outlook = overcast: yes (4.0)

outlook = rainy

| windy = TRUE: no (2.0)

| windy = FALSE: yes (3.0)

Number of Leaves : 5

Size of the tree : 8

Decision Tree

(a) Read weather.nominal.arff at the Preprocess tab.

(b) Choose trees → J48 in the Classifier range.

(c) Choose “Use training set” in the Test options range.

(d) Right-click at Result list → Visualize tree

○ With 3 bits, $2^3 = 8$ states can be distinguished: (000, 001, 010, 011, 100, 101, 110, 111).

$$S = \log_2 W = \log_2 8 = 3$$

Eight different events can be described using 3 bits.

○ Suppose that exactly one of several events occurs, each with the same probability $P$. The amount of information $I$ obtained on learning which event actually occurred is

$$I = \log_2 \frac{1}{P} = -\log_2 P$$

○ If event $i$ occurs with probability $P_i$, and $-\log_2 P_i$ of information is obtained when it occurs, the average information entropy is

$$S = -\sum_i P_i \log_2 P_i$$

➢ Shannon's entropy can be thought of as a way to quantify the potential reduction in our uncertainty once we have learnt the outcome of the probabilistic process.
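As a small illustration of how entropy connects to decision trees, the Python sketch below evaluates the entropy formula for the weather data (9 "yes", 5 "no") and the reduction in entropy obtained by splitting on outlook, using the leaf counts visible in the J48 tree. The function names are hypothetical, and "information gain" is used here as the usual name for that reduction; this is a sketch of the idea, not the exact criterion implemented inside J48.

```python
import math

def entropy(counts):
    """Average information entropy S = -sum(P_i * log2(P_i)) from class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts):
    """Entropy of the parent minus the size-weighted entropy of the subsets."""
    total = sum(parent_counts)
    remainder = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - remainder

bits_for_eight = entropy([1] * 8)        # eight equally likely events

play = [9, 5]                            # [yes, no] over all 14 instances
outlook = [[2, 3], [4, 0], [3, 2]]       # [yes, no] for sunny, overcast, rainy
entropy_play = entropy(play)
gain_outlook = information_gain(play, outlook)

print(f"S(8 equal events) = {bits_for_eight:.3f} bits")
print(f"S(play)           = {entropy_play:.3f} bits")
print(f"gain(outlook)     = {gain_outlook:.3f} bits")
```

The overcast subset contains only "yes" instances, so its entropy is zero; this is why the overcast branch of the tree ends in a leaf immediately.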

Classification and Regression

<5.1, 3.5, 1.4, 0.2> -> <Iris-setosa>

<sunny, 85.0, 90, true> -> <no>

In classification, the answer is a label.

In regression, the answer is a number.

Classification methods

・Decision tree

・Bayesian network

・Logistic classification

・Neural network

・Support vector machine

Definition of the joint probability of events A and B

$$P(A, B) = P(B \mid A)\,P(A) = P(A \mid B)\,P(B)$$

Bayes' theorem for events A and B

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

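Example: a network of two nodes, Rain and Wet grass, joined by an arc (Rain -> Wet grass).

Prior probability: $P(R) = 0.4$

Conditional probabilities: $P(W \mid R) = 0.9$, $P(W \mid \neg R) = 0.2$

The probability that the grass is wet:

$$P(W) = P(W \mid R)\,P(R) + P(W \mid \neg R)\,P(\neg R) = 0.9 \times 0.4 + 0.2 \times 0.6 = 0.48$$

When it is observed that the grass is wet, what is the probability that it has rained?

$$P(R \mid W) = \frac{P(W \mid R)\,P(R)}{P(W)} = \frac{0.9 \times 0.4}{0.48} = 0.75$$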
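The same calculation can be checked numerically. This Python sketch simply evaluates the two formulas with the values given for the Rain / Wet grass example; the function name `posterior_from_bayes` is hypothetical.

```python
def posterior_from_bayes(prior, likelihood, likelihood_if_not):
    """Return (P(evidence), P(cause | evidence)) for a two-node network."""
    evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)
    return evidence, likelihood * prior / evidence

# P(R) = 0.4, P(W|R) = 0.9, P(W|not R) = 0.2
p_wet, p_rain_given_wet = posterior_from_bayes(0.4, 0.9, 0.2)
print(f"P(W)   = {p_wet:.2f}")
print(f"P(R|W) = {p_rain_given_wet:.2f}")
```

Observing wet grass raises the probability of rain from the prior 0.4 to 0.75.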

p.86 Weka Explorer/ BayesNet Algorithm

(1) Preprocessor: read weather.nominal.arff

(2) Classify: classifier -> bayes -> BayesNet

(3) Test options: use training set -> Start

(4) Result list region: (right-click) Visualize graph

(5) (Floppy disk button) -> XML BIF files

outlook, temperature, humidity, windy, play
sunny, hot, high, FALSE, no
sunny, hot, high, TRUE, no
overcast, hot, high, FALSE, yes
rainy, mild, high, FALSE, yes
rainy, cool, normal, FALSE, yes
rainy, cool, normal, TRUE, no
overcast, cool, normal, TRUE, yes
sunny, mild, high, FALSE, no
sunny, cool, normal, FALSE, yes
rainy, mild, normal, FALSE, yes
sunny, mild, normal, TRUE, yes
overcast, mild, high, TRUE, yes
overcast, hot, normal, FALSE, yes
rainy, mild, high, TRUE, no

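Probability tables of the learned network:

P(play): yes = 0.63, no = 0.37

P(outlook | play)
play   sunny   overcast   rainy
yes    0.24    0.43       0.33
no     0.54    0.08       0.38

P(humidity | play)
play   high    normal
yes    0.35    0.65
no     0.75    0.25

P(windy | play)
play   TRUE    FALSE
yes    0.35    0.65
no     0.58    0.42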

=== Confusion Matrix ===
 a b <-- classified as
 9 0 | a = yes
 1 4 | b = no
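The values in the probability tables are not plain relative frequencies: for example, overcast never occurs with play = no, yet its probability is 0.08 rather than 0. The Python sketch below assumes that each count is increased by a constant alpha = 0.5 before normalizing. This value is an assumption chosen because it reproduces the numbers on the slide; the slide itself does not state the estimator. The function name `smoothed_cpt` is hypothetical.

```python
def smoothed_cpt(counts, alpha=0.5):
    """Estimate probabilities from counts, adding alpha to every count (assumed smoothing)."""
    total = sum(counts.values()) + alpha * len(counts)
    return {value: (n + alpha) / total for value, n in counts.items()}

# Counts taken from the 14 instances of the weather data
play_counts = {"yes": 9, "no": 5}
outlook_given_yes = {"sunny": 2, "overcast": 4, "rainy": 3}
outlook_given_no = {"sunny": 3, "overcast": 0, "rainy": 2}

p_play = smoothed_cpt(play_counts)
p_outlook_yes = smoothed_cpt(outlook_given_yes)
p_outlook_no = smoothed_cpt(outlook_given_no)

for name, table in [("P(play)", p_play),
                    ("P(outlook | yes)", p_outlook_yes),
                    ("P(outlook | no)", p_outlook_no)]:
    print(name, {k: round(v, 2) for k, v in table.items()})
```

Smoothing of this kind keeps a single unseen combination from forcing a product of probabilities to zero.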

Right-click -> Show Properties

Click, then change the maximum number of parents from 1 to 2

Maximum number of parents = 2

=== Confusion Matrix ===
 a b <-- classified as
 9 0 | a = yes
 0 5 | b = no

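Probability tables with maximum number of parents = 2:

P(play): yes = 0.63, no = 0.37

P(outlook | play)
play      sunny   overcast   rainy
play      0.24    0.43       0.33
no play   0.54    0.08       0.38

P(humidity | play, temperature), entries given as (high, normal)
          hot             mild            cool
play      (0.5, 0.5)      (0.5, 0.5)      (0.125, 0.875)
no play   (0.83, 0.17)    (0.83, 0.17)    (0.25, 0.75)

P(windy | play, outlook), entries given as (TRUE, FALSE)
          sunny           overcast        rainy
play      (0.5, 0.5)      (0.5, 0.5)      (0.125, 0.875)
no play   (0.375, 0.625)  (0.5, 0.5)      (0.83, 0.17)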

p.89 Chapter 5: Generative models and discriminative models (Logistic classification)

Chapters 5 to 7: classification (regression) of feature vectors x expressed as numerical values

○ Neural network: the parameters of a nonlinear, complex model are adjusted so that the model reproduces the training data.

○ Support vector machine (SVM): the training data are mapped to a higher-dimensional space, and a linear model is used to determine the classification plane as simply as possible.

p.91 Naive Bayes classifier

○ A normal distribution $N(\mu, \sigma)$ is assumed for numerical data.

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

○ Posterior probability $P(\omega_i \mid x)$: when input $x$ is observed, the probability that it belongs to class $\omega_i$.

○ Each feature $x_j$ is assumed to follow a normal distribution independently of the others (Naive Bayes).

The result of Naive Bayes classification is given by

$$C_{NB} = \arg\max_i \; P(\omega_i) \prod_{j=1}^{d} p(x_j \mid \omega_i)$$

p.92 Weka Explorer/ diabetes.arff

(1) Preprocessor: read diabetes.arff

(2) Classify: classifier -> bayes -> NaiveBayes

(3) Test options: use training set

% 5. Number of Instances: 768
% 6. Number of Attributes: 8 plus class
% 7. For Each Attribute: (all numeric-valued)
%    1. Number of times pregnant
%    2. Plasma glucose concentration a 2 hours in an oral glucose tolerance test
%    3. Diastolic blood pressure (mm Hg)
%    4. Triceps skin fold thickness (mm)
%    5. 2-Hour serum insulin (mu U/ml)
%    6. Body mass index (weight in kg/(height in m)^2)
%    7. Diabetes pedigree function
%    8. Age (years)
%    9. Class variable (0 or 1)

6,148,72,35,0,33.6,0.627,50,tested_positive
1,85,66,29,0,26.6,0.351,31,tested_negative
8,183,64,0,0,23.3,0.672,32,tested_positive
1,89,66,23,94,28.1,0.167,21,tested_negative
0,137,40,35,168,43.1,2.288,33,tested_positive
5,116,74,0,0,25.6,0.201,30,tested_negative
3,78,50,32,88,31,0.248,26,tested_positive
10,115,0,0,0,35.3,0.134,29,tested_negative
2,197,70,45,543,30.5,0.158,53,tested_positive

The first eight columns are the features $x_j$ ($j = 1, \ldots, 8$); the last column is the class $\omega_i$.

Attribute                                 tested_negative   tested_positive
                                          (0.65)            (0.35)
=========================================================================
preg (number of pregnancies)
  mean                                    3.4234            4.9795
  std. dev.                               3.0166            3.6827
plas (plasma glucose)
  mean                                    109.9541          141.2581
  std. dev.                               26.1114           31.8728
pres (diastolic blood pressure, mmHg)
  mean                                    68.1397           70.718
  std. dev.                               17.9834           21.4094
skin (triceps skin fold thickness)
  mean                                    19.8356           22.2824
  std. dev.                               14.8974           17.6992
insu (serum insulin)
  mean                                    68.8507           100.2812
  std. dev.                               98.828            138.4883
mass (BMI)
  mean                                    30.3009           35.1475
  std. dev.                               7.6833            7.2537
pedi (diabetes pedigree function)
  mean                                    0.4297            0.5504
  std. dev.                               0.2986            0.3715
age (years)
  mean                                    31.2494           37.0808
  std. dev.                               11.6059           10.9146

=== Summary ===
Correctly Classified Instances      586    76.3 %
Incorrectly Classified Instances    182    23.7 %

=== Confusion Matrix ===
   a   b <-- classified as
 421  79 | a = tested_negative
 103 165 | b = tested_positive
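To make the role of the means and standard deviations concrete, the following Python sketch applies the Naive Bayes rule to the first instance of diabetes.arff using the class priors, means, and standard deviations from the Weka output. It is a plain Gaussian Naive Bayes calculation; Weka's own NaiveBayes implementation may differ in numerical details, so the probability obtained here should be read as illustrative only. The function names are hypothetical.

```python
import math

PRIORS = {"tested_negative": 0.65, "tested_positive": 0.35}

# (mean, std. dev.) for preg, plas, pres, skin, insu, mass, pedi, age
STATS = {
    "tested_negative": [(3.4234, 3.0166), (109.9541, 26.1114), (68.1397, 17.9834),
                        (19.8356, 14.8974), (68.8507, 98.828), (30.3009, 7.6833),
                        (0.4297, 0.2986), (31.2494, 11.6059)],
    "tested_positive": [(4.9795, 3.6827), (141.2581, 31.8728), (70.718, 21.4094),
                        (22.2824, 17.6992), (100.2812, 138.4883), (35.1475, 7.2537),
                        (0.5504, 0.3715), (37.0808, 10.9146)],
}

def gaussian_log_pdf(x, mean, std):
    """Logarithm of the normal density with the given mean and standard deviation."""
    return -math.log(math.sqrt(2 * math.pi) * std) - (x - mean) ** 2 / (2 * std ** 2)

def naive_bayes_posterior(x):
    """Posterior P(class | x), treating the features as independent Gaussians."""
    log_scores = {
        c: math.log(PRIORS[c]) + sum(gaussian_log_pdf(v, m, s)
                                     for v, (m, s) in zip(x, STATS[c]))
        for c in PRIORS
    }
    top = max(log_scores.values())                 # subtract the maximum for numerical stability
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

x1 = [6, 148, 72, 35, 0, 33.6, 0.627, 50]          # first instance of diabetes.arff
post = naive_bayes_posterior(x1)
predicted = max(post, key=post.get)
print(predicted, {c: round(p, 3) for c, p in post.items()})
```

Working with logarithms turns the product over the eight features into a sum, which avoids underflow when many small densities are multiplied.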

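Logistic classifier

Sigmoid function

$$y = \frac{1}{1 + e^{-z}}$$

Probability of positive class

$$p(\oplus \mid \boldsymbol{x}_i) = \frac{1}{1 + \exp\left(-(w_0 + \boldsymbol{w} \cdot \boldsymbol{x}_i)\right)}$$

Probability of negative class

$$p(\ominus \mid \boldsymbol{x}) = 1 - p(\oplus \mid \boldsymbol{x})$$

Classifying plane (decision boundary): its position has to be optimized.

$$\boldsymbol{w} = (w_1, w_2, \ldots, w_d)$$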

https://ece.uwaterloo.ca/~dwharder/NumericalAnalysis/11Optimization/gradient/

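Steepest descent method

Minimize the following error (cost) function:

$$E(\boldsymbol{w}) = -\sum_{\boldsymbol{x}_i \in D} \left[ y_i \log p(\oplus \mid \boldsymbol{x}_i) + (1 - y_i) \log p(\ominus \mid \boldsymbol{x}_i) \right]$$

$$p(\oplus \mid \boldsymbol{x}_i) = \frac{1}{1 + \exp\left(-(w_0 + \boldsymbol{w} \cdot \boldsymbol{x}_i)\right)}$$

D: learning data

$y_i = 1$ for positive class

$y_i = 0$ for negative class

Weight $\boldsymbol{w}$ (i.e. the classifying line) is revised as follows:

$$w_j \leftarrow w_j - \eta \frac{\partial E(\boldsymbol{w})}{\partial w_j}$$

$$w_j \leftarrow w_j + \eta \sum_{\boldsymbol{x}_i \in D} \left[ y_i - p(\oplus \mid \boldsymbol{x}_i) \right] (\boldsymbol{x}_i)_j \qquad (1)$$

$\eta$: learning rate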

Question: prove eq. (1)
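The update rule of eq. (1) can be tried on a toy problem. The Python sketch below is a minimal illustration, not Weka's SimpleLogistic algorithm: the four one-dimensional training points, the learning rate, and the number of passes are all hypothetical choices, and the bias $w_0$ is updated with the same rule by treating it as the weight of a constant input 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w0, w, x):
    """p(positive | x) = sigmoid(w0 + w . x)."""
    return sigmoid(w0 + sum(wj * xj for wj, xj in zip(w, x)))

def cross_entropy(w0, w, data):
    """Error function E(w) summed over the learning data."""
    total = 0.0
    for x, y in data:
        p = predict(w0, w, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def train(data, eta=0.1, epochs=2000):
    """Batch steepest descent using eq. (1)."""
    d = len(data[0][0])
    w0, w = 0.0, [0.0] * d
    for _ in range(epochs):
        residuals = [(y - predict(w0, w, x), x) for x, y in data]
        w0 += eta * sum(r for r, _ in residuals)          # constant input 1 for the bias
        w = [wj + eta * sum(r * x[j] for r, x in residuals)
             for j, wj in enumerate(w)]
    return w0, w

# Hypothetical learning data: (feature vector, label), label 1 = positive class
data = [([0.0], 0), ([1.0], 0), ([2.0], 1), ([3.0], 1)]

error_before = cross_entropy(0.0, [0.0], data)
w0, w = train(data)
error_after = cross_entropy(w0, w, data)
print(f"E before = {error_before:.3f}, E after = {error_after:.3f}")
print("p(+|x=0) =", round(predict(w0, w, [0.0]), 3),
      " p(+|x=3) =", round(predict(w0, w, [3.0]), 3))
```

Each pass moves the weights in the direction that raises the probability of the observed labels, so the error decreases and the boundary settles between the two groups of points.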

p.100 Weka Explorer/ diabetes.arff

(1) Preprocessor: read diabetes.arff

(2) Classify: classifier -> functions -> Simple Logistic


=== Confusion Matrix ===

a b <-- classified as

446 54 | a = tested_negative

111 157 | b = tested_positive

=== Summary ===

Correctly Classified Instances 603 78.5 %

Incorrectly Classified Instances 165 21.5 %

Class 1 :

-4.18 +

[preg] * 0.06 +

[plas] * 0.02 +

[pres] * -0.01 +

[insu] * -0 +

[mass] * 0.04 +

[pedi] * 0.47 +

[age] * 0.01

$$\boldsymbol{w} \cdot \boldsymbol{x}_1 + w_0 = (0.06, 0.02, -0.01, 0, 0, 0.04, 0.47, 0.01) \cdot (6, 148, 72, 35, 0, 33.6, 0.627, 50) - 4.18 = 0.559$$

$$p(\oplus \mid \boldsymbol{x}_1) = \frac{1}{1 + \exp(-0.559)} = 0.636$$

Thus, x1 is classified as tested_positive with a probability of 0.636.
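The arithmetic for the first instance can be reproduced with a few lines of Python. The weights are the rounded coefficients listed under "Class 1" (skin does not appear in the model, so its weight is taken as 0), and the feature values are those of the first row of diabetes.arff; the variable names are hypothetical.

```python
import math

w0 = -4.18
# Weights for preg, plas, pres, skin, insu, mass, pedi, age
w = [0.06, 0.02, -0.01, 0.0, 0.0, 0.04, 0.47, 0.01]
x1 = [6, 148, 72, 35, 0, 33.6, 0.627, 50]

z = w0 + sum(wj * xj for wj, xj in zip(w, x1))   # w . x1 + w0
p_positive = 1.0 / (1.0 + math.exp(-z))          # sigmoid of z

print(f"w.x1 + w0 = {z:.3f}")
print(f"p(+|x1)   = {p_positive:.3f}")
```

Because the probability exceeds 0.5, the instance falls on the tested_positive side of the classifying plane.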



Neural Network

A single unit takes the inputs $x_1, x_2, \ldots, x_d$, forms the weighted sum

$$h = \sum_{j=1}^{d} w_j x_j$$

and outputs $o = \mathrm{sigmoid}(h)$, which lies in [0, 1].

○ A simple perceptron is the calculation model obtained when the sigmoid function in this figure is replaced by a step function.

○ Back-propagation is a supervised learning algorithm used in machine learning to train neural networks. Since the sigmoid function is differentiable, the back-propagation algorithm can be applied.

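p.111 Learning process in neural network

$$h_i = \sum_{j=1}^{d} w_j (\boldsymbol{x}_i)_j$$

The system should learn how to minimize the following error function:

$$E(\boldsymbol{w}) = \frac{1}{2} \sum_{\boldsymbol{x}_i \in D} (y_i - o_i)^2 = \frac{1}{2} \sum_{\boldsymbol{x}_i \in D} \left( y_i - \frac{1}{1 + \exp(-h_i)} \right)^2$$

D: learning data; $\eta$: learning rate

Steepest descent method

$$w_j \leftarrow w_j - \eta \frac{\partial E(\boldsymbol{w})}{\partial w_j}$$

$$w_j \leftarrow w_j + \eta \sum_{\boldsymbol{x}_i \in D} (y_i - o_i)\, o_i (1 - o_i)\, a_{ij}, \qquad a_{ij} = \frac{\partial h_i}{\partial w_j}$$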

○ In stochastic gradient descent (SGD; online learning), the gradient is approximated by the value calculated from a single training example. The training data are shuffled at each pass, and passes over the training set are repeated many times until the weights $w_j$ converge.
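A minimal Python sketch of this online procedure for a single sigmoid unit is given below. For one example at a time it applies $w_j \leftarrow w_j + \eta\,(y - o)\,o(1 - o)\,x_j$, which is the steepest-descent step for the squared error of that example. The training set (the logical OR function with a constant input 1 serving as a bias), the learning rate, the number of passes, and the random seed are all hypothetical choices.

```python
import math
import random

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

def unit_output(w, x):
    """Output o = sigmoid(h), with h the weighted sum of the inputs."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def train_unit_sgd(data, eta=0.5, epochs=3000, seed=0):
    """Stochastic gradient descent: one weight update per training example."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)                      # shuffle the data at every pass
        for x, y in samples:
            o = unit_output(w, x)
            delta = (y - o) * o * (1.0 - o)       # error term for this example
            w = [wj + eta * delta * xj for wj, xj in zip(w, x)]
    return w

# Hypothetical data: OR function; the last input is a constant 1 (bias)
or_data = [([0.0, 0.0, 1.0], 0), ([0.0, 1.0, 1.0], 1),
           ([1.0, 0.0, 1.0], 1), ([1.0, 1.0, 1.0], 1)]

weights = train_unit_sgd(or_data)
outputs = [unit_output(weights, x) for x, _ in or_data]
print([round(o, 3) for o in outputs])
```

A single unit is enough here because OR is linearly separable; problems that are not linearly separable need a middle layer.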

p.113 Weka Explorer/ Multilayer Perceptron

(1) Preprocessor: read iris.arff

(2) Classify: classifier -> functions -> MultilayerPerceptron

(3) Click “MultilayerPerceptron” -> Property -> GUI true; hiddenLayers a (auto); OK


(5) Controls: start -> accept


a b c <-- classified as

50 0 0 | a = Iris-setosa

0 49 1 | b = Iris-versicolor

0 1 49 | c = Iris-virginica

Neural Network

=== Summary ===

Correctly Classified Instances 148 98.7 %

Incorrectly Classified Instances 2 1.3 %

Q: Analyze diabetes.arff with a neural network.

p.112 Back-propagation algorithm

Input: Learning data D with class labels

Output: Weights w

Initialize all link weights w to small values

repeat
  for all x ∈ D do
    o ← nn(w, x)
    /* error δ_k is calculated for the k-th unit of the output layer */
    δ_k ← o_k (1 − o_k) (y_k − o_k)
    /* error δ_h is calculated for the h-th unit of the middle layer */
    δ_h ← o_h (1 − o_h) Σ_{k ∈ outputs} w_hk δ_k
    /* revise the weights */
    /* output layer */
    w_hk ← w_hk + η δ_k a_h
    /* middle layer */
    w_ih ← w_ih + η δ_h x^(i)
  end for
until the correction becomes small enough (or a preset number of iterations is reached)
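The pseudocode can be turned into a short runnable Python sketch for a network with one middle layer. Everything specific here is an assumption made for illustration: the XOR training set, three middle-layer units, a constant bias input added to each layer, the learning rate, the fixed number of passes (used instead of a convergence test), and the function names. Here `a` holds the middle-layer outputs, so `a[h]` plays the role of both $o_h$ and $a_h$ in the pseudocode. The sketch only reports the total squared error before and after training; it makes no claim that the network always reaches a perfect fit.

```python
import math
import random

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

def forward(w_ih, w_hk, x):
    """Return (middle-layer outputs with a trailing bias 1, output-layer outputs)."""
    xb = x + [1.0]
    a = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_ih] + [1.0]
    o = [sigmoid(sum(w * v for w, v in zip(row, a))) for row in w_hk]
    return a, o

def total_error(w_ih, w_hk, data):
    """E = 1/2 * sum of squared differences between targets and outputs."""
    return 0.5 * sum((y_k - o_k) ** 2
                     for x, y in data
                     for y_k, o_k in zip(y, forward(w_ih, w_hk, x)[1]))

def train(data, n_hidden=3, eta=0.5, epochs=10000, seed=1):
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    # small random initial weights; row = receiving unit, column = sending unit
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_hk = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(epochs):
        for x, y in data:
            a, o = forward(w_ih, w_hk, x)
            # errors of the output layer, then of the middle layer
            delta_k = [o_k * (1 - o_k) * (y_k - o_k) for o_k, y_k in zip(o, y)]
            delta_h = [a[h] * (1 - a[h]) *
                       sum(w_hk[k][h] * delta_k[k] for k in range(n_out))
                       for h in range(n_hidden)]
            # revise the weights: output layer first, then middle layer
            for k in range(n_out):
                for h in range(n_hidden + 1):
                    w_hk[k][h] += eta * delta_k[k] * a[h]
            xb = x + [1.0]
            for h in range(n_hidden):
                for i in range(n_in + 1):
                    w_ih[h][i] += eta * delta_h[h] * xb[i]
    return w_ih, w_hk

xor_data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]

initial_error = total_error(*train(xor_data, epochs=0), xor_data)   # untrained weights
final_error = total_error(*train(xor_data), xor_data)
print(f"E before = {initial_error:.4f}, E after = {final_error:.4f}")
```

The middle-layer errors are computed from the output-layer weights before those weights are revised, which matches the order of the steps in the pseudocode.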

Report: deadline November 9 (Wed) (no lecture on Nov. 2)

1. Explain how the consideration on “information entropy” is related

to the formation of “decision tree”. Explain the advantage and

disadvantage of decision tree classifier.

2. Discuss how the probability is considered in the “naive Bayesian”

and “Bayesian network” classifiers.

3. Explain the mathematical background of “logistic classifier”.

4. Explain the mathematical background of “neural network”.

Today’s questions

Q1 What are the important aspects of logistic classifier and

neural network?

Q2 What’s new?

Fiji – ImageJ Trainable Weka Segmentation

○ https://fiji.sc/

○ Fiji is licensed under the GNU General Public License. It builds on top of

the ImageJ2 core, which is licensed under the permissive BSD 2-clause

license. Plugins and other components have their own licenses.

○ Fiji is developed by contributors around the world. It is maintained by

Curtis Rueden and the ImageJ development team at the Laboratory for

Optical and Computational Instrumentation (LOCI) at the University of

Wisconsin-Madison.

○ The Fiji project is driven by a strong desire to

improve the tools available for life sciences to

process and analyze data. To this end, Fiji

collaborates closely with the following projects:
