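Course Plan and Credit Requirements (Lecture 7) dogra.csis.oita-u.ac.jp/tkm/lecture/sim2011/2011_07/sim... Lecture materials: 2 Course Plan and Credit Requirements - Schedule - Lecture 1: What is a virtual experiment? Course objectives and plan, reference books, etc.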

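November 25, 2011

Academic Year 2011, Second Semester

Advanced Topics in Virtual Experiments (Lecture 7)
Graduate School of Information Science and Electrical Engineering, Department of Informatics, Advanced Course
Instructor: Toshiya Takami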

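Instructor

- Name: Toshiya Takami (高見 利也)
- E-mail: [email protected]
- Affiliation: Research Institute for Information Technology
- Office: Room 604, Research Institute for Information Technology building, Hakozaki Campus; Room 1006, 10th floor, West Building 2, Ito Campus (Fridays only)
- Research fields: nonlinear physics, large-scale parallel computing
- Web page: http://dogra.cc.kyushu-u.ac.jp/tkm
- Lecture materials: http://dogra.cc.kyushu-u.ac.jp/tkm/lecture/sim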

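Course Plan and Credit Requirements

- Schedule
  - Lecture 1: What is a virtual experiment? Course objectives and plan, reference books, etc.
  - Lectures 2-5: Simulations in various fields
  - Lectures 6-9: Elemental techniques of simulation
  - Lectures 10-13: Scientific simulation in practice
  - Lectures 14-15: Summary, etc.
- Grading criteria
  - Attendance 40%, reports 60%
  - No examination
- Reports
  - Four reports in total are planned. Submit them by e-mail.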

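Last Week: Elemental Techniques (1)

- Computer fundamentals and algorithms
  - Computational complexity
  - Fast algorithms
    - Sorting
    - FFT
  - Pseudo-random number generation
- Computational examples
  - Eigenvalue statistics of a two-dimensional billiard system
  - Eigenstate expansion and the Fourier transform

\[ -\left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) f(x, y) = E f(x, y), \qquad f(x, y) = 0 \ \text{on the boundary} \]

[Figure: level-spacing distribution P(S) versus Spacing (0.0-5.0), compared with the curve exp(-S)]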

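Today's Topics

- This is the second of four lectures on the elemental techniques of simulation. Today's subject is linear computation and matrix computation.
- Contents
  - Solving linear equations
    - Example: boundary value problem for a temperature distribution
    - Example: steady state of a two-dimensional fluid
  - Matrix symmetry and eigenvalue computation
    - Example: eigenvalue statistics of random matrices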

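Solving Linear Equations (1)

- LU decomposition [using $A = LU$, solve $Ax = b \iff Ux = L^{-1} b$]
  - Among the methods for solving linear equations, Gaussian elimination and LU decomposition are treated in detail in other courses, so they are omitted here.
- Linpack benchmark
  - The benchmark on which the Top500 list, published twice a year, is based.
  - Linpack was originally a linear algebra library written by Jack Dongarra (University of Tennessee) and colleagues. A program that measures floating-point performance with the library's Gaussian elimination routine was later released as the benchmark.
  - The library itself is continued today as LAPACK.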
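The relation $Ax = b \iff Ux = L^{-1}b$ can be illustrated with a small sketch. The code below is not from the lecture: it is a minimal Doolittle factorization without pivoting, so it assumes that no zero pivot appears (true for the diagonally dominant matrices used later). The function names `lu_decompose` and `lu_solve` are hypothetical.

```python
def lu_decompose(a):
    """Doolittle LU factorization without pivoting: returns (L, U) with A = L U."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]  # assumes a nonzero pivot
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve L U x = b by forward substitution followed by back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b, i.e. y = L^{-1} b
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: U x = y
        s = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / upper[i][i]
    return x
```

In practice one would call a library routine (for example from LAPACK) rather than hand-written code; the sketch only makes the two triangular solves explicit.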

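Solving Linear Equations (2)

- Solve $Ax = b$ by iteration, where $A = L + D + U$.
  - Jacobi method: $D x^{(j+1)} = b - (L + U) x^{(j)}$
  - Gauss-Seidel method: $(D + U) x^{(j+1)} = b - L x^{(j)}$
  - Convergence: when $x^{(j+1)} = x^{(j)}$, $x^{(j)}$ is the solution.
  - The Jacobi method is well suited to parallel computation. The Gauss-Seidel method converges faster.
- SOR method ($0 < w < 2$):
  \[ x^{(j+1)} = x^{(j)} + w \left( x^{(j+1)}_{GS} - x^{(j)} \right) = w\, x^{(j+1)}_{GS} + (1 - w)\, x^{(j)} \]
- Conjugate Gradient method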

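Application: Boundary Value Problem for a Temperature Distribution (1)

- As an exercise, consider the boundary value problem for the Laplace equation $\nabla^2 u(x, y) = 0$.
- The interior temperatures are arranged into a single column vector.
- The equation to solve is then $Au = b$, with $A$ and $b$ as given in the report excerpt below.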

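1.2 Example Answer

For example, one can reason as follows and then write the program. Since "the value in a cell equals the average of its four neighbors above, below, left, and right", number the cells in the horizontal and vertical directions and write the temperature of cell (i, j) as u(i, j). The equation to solve is

\[ u(i, j) = \frac{u(i-1, j) + u(i+1, j) + u(i, j-1) + u(i, j+1)}{4} \tag{1} \]

The solution is the state in which this holds for every interior cell (i, j = 1, ..., N - 2). To solve it by "updating the temperatures" as the problem asks, write the state after n updates as $u^{(n)}(i, j)$ and take the initial state from the input data:

\[ u^{(0)}(i, j) = (\text{initial temperature of the interior cells}) \quad (1 \le i \le N-2 \ \text{and} \ 1 \le j \le N-2) \tag{2} \]

For the outermost cells, define for all n:

\[ u^{(n)}(i, 0) = (\text{temperature of the top edge}) \tag{3} \]
\[ u^{(n)}(i, N-1) = (\text{temperature of the bottom edge}) \tag{4} \]
\[ u^{(n)}(0, j) = (\text{temperature of the left edge}) \tag{5} \]
\[ u^{(n)}(N-1, j) = (\text{temperature of the right edge}) \tag{6} \]

To solve from the initial condition (2) under the boundary conditions (3)-(6), define

\[ u^{(n+1)}(i, j) = \frac{u^{(n)}(i-1, j) + u^{(n)}(i+1, j) + u^{(n)}(i, j-1) + u^{(n)}(i, j+1)}{4} \tag{7} \]

and compute $\{u^{(n+1)}(i, j)\}$ from $\{u^{(n)}(i, j)\}$ step by step. When the example is solved with a termination condition of 0.1, the loop stops after 8 iterations, and the result agrees with the sample answer to within the error tolerance.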
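The update rule (7) can be sketched as follows. This is an illustrative reconstruction, not the program used for the report: the function name `jacobi_heat`, its argument names, and the choice of stopping on the largest change per sweep are assumptions, and the boundary values in the usage are made up because the input data of the exercise are not given here.

```python
def jacobi_heat(n, top, bottom, left, right, interior=0.0, tol=0.1, max_iter=100000):
    """Iterate Eq. (7) on an n x n grid until the largest change is below tol.

    Returns (grid, iterations), where grid[i][j] holds u(i, j).
    """
    u = [[interior] * n for _ in range(n)]
    for i in range(n):
        u[i][0] = top         # Eq. (3): u(i, 0)
        u[i][n - 1] = bottom  # Eq. (4): u(i, N-1)
    for j in range(n):
        u[0][j] = left        # Eq. (5): u(0, j)
        u[n - 1][j] = right   # Eq. (6): u(N-1, j)
    # Corner cells end up with the left/right values; the stencil never reads them.
    for iteration in range(1, max_iter + 1):
        new = [row[:] for row in u]  # u^(n+1) is built only from u^(n)
        delta = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
                delta = max(delta, abs(new[i][j] - u[i][j]))
        u = new
        if delta < tol:
            return u, iteration
    raise RuntimeError("no convergence within max_iter sweeps")


if __name__ == "__main__":
    grid, count = jacobi_heat(5, 100.0, 0.0, 0.0, 0.0, tol=0.1)
    print(count, grid[2][2])
```

Keeping a separate copy `new` is what makes this the simultaneous update of Eq. (7); overwriting `u` in place would give a different iteration.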

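2 Problem Size and Number of Iterations

The number N of cells along each side is sometimes called the problem size. The example uses N = 5, but in a real problem the temperature distribution is smooth and continuous, so one computes with sufficiently large values such as N = 10, 20, 50, 100, .... The termination condition $\Delta T = 0.1$ can likewise be tightened until the values barely change (for example $\Delta T = 1.0 \times 10^{-10}$).

Because the program is a simple loop, the cost of one iteration (counted in floating-point operations) is the cost per cell (3 additions, 1 multiplication, plus a subtraction for the error estimate) times the number of cells (N × N). One usually omits the details and considers only the order in the problem size N; for this problem one iteration costs $O(N^2)$. Note, however, that the total cost until the termination condition is met also depends on the number of iterations.

We therefore first plot how the temperature change per step decreases for several problem sizes. Figure 1(a) shows the temperature change against the number of iterations for N = 5 (red), 10 (green), 20 (blue), 50 (yellow), and 100 (purple). The number of iterations needed for convergence grows with the problem size. Can the number of iterations be reduced? To address this question, the next section treats the problem as a general linear equation.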

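2.1 Linear Computation

For N = 5, place the temperature u(i, j) of interior cell (i, j) (i, j = 1, 2, 3) at position k = i - 1 + 3(j - 1) (with k = 0 as the first element), so that the temperature state is represented by the vector

\[ \mathbf{u} \equiv \left( u(1,1)\ u(2,1)\ u(3,1)\ u(1,2)\ u(2,2)\ u(3,2)\ u(1,3)\ u(2,3)\ u(3,3) \right)^T \tag{8} \]

Here the transpose symbol $(\cdots)^T$ is used to write a column vector as the transpose of a row vector. Rewriting Eq. (1) in the same way as a linear equation, and taking the boundary conditions into account (top $T_u$, bottom $T_d$, left $T_l$, right $T_r$), gives

\[
\mathbf{u} = \frac{1}{4}
\begin{pmatrix}
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0
\end{pmatrix}
\mathbf{u} + \frac{1}{4}
\begin{pmatrix}
T_u + T_l \\ T_u \\ T_u + T_r \\ T_l \\ 0 \\ T_r \\ T_l + T_d \\ T_d \\ T_r + T_d
\end{pmatrix}
\tag{9}
\]

[Figure: the 3 × 3 interior cells (1,1), (2,1), (3,1) / (1,2), (2,2), (3,2) / (1,3), (2,3), (3,3), bounded by $T_u$ (top), $T_l$ (left), $T_d$ (bottom), and $T_r$ (right)]

Figure 1: Convergence as the problem size increases: (a) Jacobi method, (b) Gauss-Seidel method. Red, green, blue, yellow, and purple correspond to N = 5, 10, 20, 50, 100. The horizontal axis is the number of iterations (1-1000) and the vertical axis is the maximum temperature change, Delta (0.001-10.0).

Thus, finding the temperature distribution u(i, j) amounts to solving the linear equation

\[ A \mathbf{u} = \mathbf{b} \tag{10} \]

for $\mathbf{u}$, where

\[
A =
\begin{pmatrix}
4 & -1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
-1 & 4 & -1 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & -1 & 4 & 0 & 0 & -1 & 0 & 0 & 0 \\
-1 & 0 & 0 & 4 & -1 & 0 & -1 & 0 & 0 \\
0 & -1 & 0 & -1 & 4 & -1 & 0 & -1 & 0 \\
0 & 0 & -1 & 0 & -1 & 4 & 0 & 0 & -1 \\
0 & 0 & 0 & -1 & 0 & 0 & 4 & -1 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & -1 & 4 & -1 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & -1 & 4
\end{pmatrix},
\qquad
\mathbf{b} =
\begin{pmatrix}
T_u + T_l \\ T_u \\ T_u + T_r \\ T_l \\ 0 \\ T_r \\ T_l + T_d \\ T_d \\ T_r + T_d
\end{pmatrix}
\tag{11}
\]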
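The ordering k = i - 1 + 3(j - 1) and the resulting matrix and right-hand side of Eq. (11) can be generated mechanically. The sketch below is only an illustration: `build_system` is a hypothetical helper, and it generalizes the factor 3 to N - 2 interior cells per row, which is an assumption consistent with the N = 5 case.

```python
def build_system(n, t_up, t_down, t_left, t_right):
    """Assemble A and b of Eq. (11) for an n x n grid with (n-2)^2 interior unknowns."""
    m = n - 2
    size = m * m
    a = [[0.0] * size for _ in range(size)]
    b = [0.0] * size
    for j in range(1, m + 1):
        for i in range(1, m + 1):
            k = (i - 1) + m * (j - 1)  # position of u(i, j) in the vector of Eq. (8)
            a[k][k] = 4.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 1 <= ii <= m and 1 <= jj <= m:
                    a[k][(ii - 1) + m * (jj - 1)] = -1.0  # interior neighbor
                elif jj == 0:
                    b[k] += t_up      # neighbor lies on the top edge
                elif jj == m + 1:
                    b[k] += t_down    # neighbor lies on the bottom edge
                elif ii == 0:
                    b[k] += t_left    # neighbor lies on the left edge
                else:
                    b[k] += t_right   # neighbor lies on the right edge
    return a, b


if __name__ == "__main__":
    a, b = build_system(5, 1.0, 2.0, 4.0, 8.0)
    for row in a:
        print(" ".join(f"{value:3.0f}" for value in row))
    print(b)
```

With distinct boundary values such as 1, 2, 4, 8, each entry of `b` shows directly which edges a cell touches.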

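2.1.1 Jacobi Method

Now consider what kind of method the iterative solution of the example actually was. Decompose the matrix A into a diagonal matrix D ($D_{kk} = 4$), a lower triangular matrix L, and an upper triangular matrix U, so that A = L + D + U. With these matrices, the solution method of the example (obtaining $u^{(n+1)}$ from $u^{(n)}$) can be written as

\[ u^{(n+1)} = D^{-1} \left[ b - (L + U)\, u^{(n)} \right] \tag{12} \]

so the example computed $u^{(n)}$ step by step from the initial value

\[ u^{(0)} = (T_0\ T_0\ T_0\ T_0\ T_0\ T_0\ T_0\ T_0\ T_0)^T \tag{13} \]

This way of solving a linear equation is called the Jacobi method. For a positive definite and diagonally dominant matrix ($|A_{ii}| > \sum_{j \ne i} |A_{ij}|$ for all i), this iteration is known to converge. In the present problem, however, some rows or columns have a sum of absolute off-diagonal values equal to the absolute diagonal value, so this criterion alone does not guarantee convergence. In the numerical computation, convergence can become slow when N is large.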

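2.1.2 Gauss-Seidel Method

The method that replaces the Jacobi iteration by the closely related recurrence

\[ u^{(n+1)} = (D + U)^{-1} \left[ b - L\, u^{(n)} \right] \tag{14} \]

is called the Gauss-Seidel method. The inverse of the upper triangular matrix, $(D + U)^{-1}$, appears in the formula, but it is easy to apply, so the cost is almost the same as for the Jacobi method. The convergence condition is the same (positive definite and diagonally dominant), yet on practical problems the method often converges faster than the Jacobi method.

Plotting the residual against the number of iterations, as for the Jacobi method, gives Figure 1(b). Compared with the Jacobi result (Figure 1(a)), the Gauss-Seidel method needs fewer iterations.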
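The difference between the two iterations is small in code: Jacobi reads only old values, while Gauss-Seidel reads values already updated in the current sweep. The sketch below is an assumption-laden illustration (hypothetical function `relax`, a made-up boundary with only the top edge heated, and a sweep in increasing index order, whereas Eq. (14) with $(D + U)^{-1}$ corresponds to sweeping in the opposite order).

```python
def relax(n, in_place, tol=1e-6, top=100.0, max_iter=100000):
    """Return (grid, iterations); in_place=False is Jacobi, True is Gauss-Seidel."""
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        u[i][0] = top  # heated top edge; the other edges stay at 0
    for iteration in range(1, max_iter + 1):
        # Jacobi reads from a frozen copy; Gauss-Seidel reads the grid being updated.
        src = u if in_place else [row[:] for row in u]
        delta = 0.0
        for j in range(1, n - 1):
            for i in range(1, n - 1):
                new = 0.25 * (src[i - 1][j] + src[i + 1][j] + src[i][j - 1] + src[i][j + 1])
                delta = max(delta, abs(new - u[i][j]))
                u[i][j] = new
        if delta < tol:
            return u, iteration
    raise RuntimeError("no convergence within max_iter sweeps")


if __name__ == "__main__":
    for size in (5, 10, 20):
        _, jacobi_count = relax(size, in_place=False)
        _, gs_count = relax(size, in_place=True)
        print(size, jacobi_count, gs_count)
```

The in-place version also needs no second array, which is one practical reason it is often preferred on a single processor.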

(The matrix and vector above are for the case N = 5.)

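Application: Boundary Value Problem for a Temperature Distribution (2)

- Differences between linear solvers
  - Even for large N, the SOR and CG methods converge in a practical number of iterations.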


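Figure 2: Acceleration parameter of the SOR method and the number of iterations until convergence. The horizontal axis is Omega (1.0-2.0) and the vertical axis is the number of iterations (10-100000), with one curve each for N = 5, 10, 20, 50, 100.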

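2.1.3 SOR Method

The SOR (Successive Over-Relaxation) method, rendered in Japanese as 逐次加速緩和法, is one technique for accelerating the convergence of the Gauss-Seidel method. Writing the Gauss-Seidel approximation as $u^{(k+1)}$,

\[ u^{(k+1)} = (D + U)^{-1} \left[ b - L u^{(k)} \right] = D^{-1} \left[ b - U u^{(k+1)} - L u^{(k)} \right] \tag{15} \]

Normally this is taken directly as the state at iteration k + 1. Here, instead, the change from iteration k is scaled by $\omega$:

\[ u^{(k+1)} \leftarrow u^{(k)} + \omega \left[ u^{(k+1)} - u^{(k)} \right] \tag{16} \]

where the $u^{(k+1)}$ on the right-hand side is the Gauss-Seidel value of Eq. (15). The parameter $\omega$ is chosen for the problem at hand; from the spectral radius of the iteration matrix, convergence requires $0 < \omega < 2$. The choice $\omega = 1$ is equivalent to the Gauss-Seidel method.

This scaling can shorten the iteration, and with a suitable value of $\omega$ the method can converge far faster than the Gauss-Seidel method. In some problems for which the plain Gauss-Seidel method does not converge, a suitable $\omega$ also makes the iteration converge. To see how the number of iterations depends on $\omega$, the value is varied gradually and the iteration count is plotted (Figure 2). For the two-dimensional Poisson equation discretized with N points per direction (N - 2 when the fixed boundary is excluded), the optimal value of $\omega$ is

\[ \omega = \frac{2}{1 + \sin\left[ \pi / (N - 1) \right]} \tag{17} \]

(see Iwanami Lectures on Applied Mathematics, "Linear Computation" (1992)). This optimal value agrees with the region where the iteration count in Figure 2 is smallest. Figure 3(a) shows how the (maximum) temperature change per iteration converges when this value is used.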
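Equations (16) and (17) translate into a few lines of code. The following sketch is not the lecture's program: `sor_iterations` and `optimal_omega` are hypothetical names, the boundary (top edge at 100, the rest at 0) is made up, and the stopping rule on the largest change per sweep is an assumption modeled on the earlier exercise.

```python
import math


def optimal_omega(n):
    """Eq. (17): optimal relaxation parameter for an n-point grid per direction."""
    return 2.0 / (1.0 + math.sin(math.pi / (n - 1)))


def sor_iterations(n, omega, tol=1e-6, top=100.0, max_iter=100000):
    """Count SOR sweeps until the largest change in a sweep drops below tol."""
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        u[i][0] = top  # heated top edge; the other edges stay at 0
    for iteration in range(1, max_iter + 1):
        delta = 0.0
        for j in range(1, n - 1):
            for i in range(1, n - 1):
                # Gauss-Seidel value from the most recent neighbors, as in Eq. (15)
                gs = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
                change = omega * (gs - u[i][j])  # scaled change of Eq. (16)
                u[i][j] += change
                delta = max(delta, abs(change))
        if delta < tol:
            return iteration
    raise RuntimeError("no convergence within max_iter sweeps")


if __name__ == "__main__":
    size = 20
    best = optimal_omega(size)
    print("omega = 1.0 :", sor_iterations(size, 1.0))
    print(f"omega = {best:.4f}:", sor_iterations(size, best))
```

Scanning `omega` over a range such as 1.0 to 1.95 with this function reproduces the kind of curve described for Figure 2.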

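2.2 Conjugate Gradient

Suppose the vectors $\{x_k\}$ are mutually conjugate with respect to a positive definite symmetric matrix A, that is, $x_i^T A x_j = 0$ for $i \ne j$. The solution of the linear equation (10) can then be written as

\[ u_n = \sum_{k=1}^{n} \alpha_k x_k \tag{18} \]

where

\[ x_k^T A u_n = \alpha_k\, x_k^T A x_k = x_k^T b \iff \alpha_k = \frac{x_k^T b}{x_k^T A x_k} \tag{19} \]

On the other hand, u is the vector that minimizes the quadratic form

\[ f(u) \equiv \frac{1}{2} u^T A u - b^T u \tag{20} \]

so if a conjugate vector $x_{k+1}$ is introduced close to the direction of the gradient of $f$ at an approximate solution $u_k$,

\[ \nabla_u f = A u_k - b \tag{21} \]

the approximate solution $u_k$ can be brought step by step closer to the true solution u.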

Figure 3: Problem-size dependence of the iterative methods: (a) SOR method, (b) CG method [red, green, blue, yellow, and purple correspond to N = 5, 10, 20, 50, 100]. The horizontal axis is the number of iterations (1-1000) and the vertical axis is Delta (0.001-10.0).

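When starting from the initial state $u_0 = x_0 = 0$, the direction of the gradient at that point defines $x_1 = r_0 = b$. Alternatively, when a given vector $a\ (\ne 0)$ is used as the initial basis vector $x_1 = a$, set $u_0 = 0$ and $r_0 = b$. Once a new conjugate vector $x_k$ has been introduced, define the next approximate solution

\[ u_k = u_{k-1} + \alpha_k x_k \tag{22} \]

so that $x_k^T A u_k = x_k^T b$ holds. With $r_{k-1} \equiv b - A u_{k-1}$, this gives

\[ \alpha_k = \frac{x_k^T b}{x_k^T A x_k} = \frac{x_k^T r_{k-1}}{x_k^T A x_k} \tag{23} \]

The residual of this approximate solution can then be expressed with the same coefficient $\alpha_k$:

\[ r_k \equiv b - A u_k = r_{k-1} - \alpha_k A x_k \tag{24} \]

The gradient of Eq. (20) at $u_k$ is $-r_k$, so the new conjugate vector $x_{k+1}$ is defined as

\[ x_{k+1} = r_k + \beta_k x_k \tag{25} \]

and the coefficient $\beta_k$ follows from the conjugacy condition:

\[ x_k^T A x_{k+1} = x_k^T A r_k + \beta_k\, x_k^T A x_k = 0 \iff \beta_k = - \frac{x_k^T A r_k}{x_k^T A x_k} \tag{26} \]

Repeating this procedure to obtain $u_k$ gradually reduces the residual vector $r_k$ and finally yields the solution of the linear equation (10). This way of solving a linear equation is called the conjugate gradient method. Because the problem involves an N × N positive definite symmetric matrix A, there are at most N mutually conjugate vectors, so ideally (in the absence of rounding error) the procedure converges in a finite number of steps not exceeding the order of the matrix. Figure 3(b) shows the convergence when the temperature distribution problem is solved with the CG method.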
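Equations (22)-(26) map directly onto a short loop. The sketch below follows the document's notation ($x_k$ for search directions, $r_k$ for residuals) but is otherwise an assumption: `apply_a` is a hypothetical matrix-free product with the matrix A of Eq. (11), generalized to an m × m block of interior cells, and the stopping tolerance on the residual norm is made up.

```python
import math


def apply_a(v, m):
    """Multiply by the five-point matrix A (4 on the diagonal, -1 for neighbors)."""
    out = [0.0] * (m * m)
    for j in range(m):
        for i in range(m):
            k = i + m * j
            s = 4.0 * v[k]
            if i > 0:
                s -= v[k - 1]
            if i < m - 1:
                s -= v[k + 1]
            if j > 0:
                s -= v[k - m]
            if j < m - 1:
                s -= v[k + m]
            out[k] = s
    return out


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def conjugate_gradient(b, m, tol=1e-10):
    """Solve A u = b starting from u_0 = 0; returns (u, number of steps)."""
    size = len(b)
    u = [0.0] * size
    r = b[:]   # r_0 = b - A u_0 = b
    x = r[:]   # x_1 = r_0
    for k in range(1, size + 1):
        ax = apply_a(x, m)
        xax = dot(x, ax)
        alpha = dot(x, r) / xax                             # Eq. (23)
        u = [ui + alpha * xi for ui, xi in zip(u, x)]       # Eq. (22)
        r = [ri - alpha * axi for ri, axi in zip(r, ax)]    # Eq. (24)
        if math.sqrt(dot(r, r)) < tol:
            return u, k
        beta = -dot(ax, r) / xax  # Eq. (26); x^T A r = (A x)^T r because A is symmetric
        x = [ri + beta * xi for ri, xi in zip(r, x)]        # Eq. (25)
    return u, size


if __name__ == "__main__":
    # N = 5 with all four edges at 100: b from Eq. (11), exact solution u = 100 everywhere.
    rhs = [200.0, 100.0, 200.0, 100.0, 0.0, 100.0, 200.0, 100.0, 200.0]
    solution, steps = conjugate_gradient(rhs, 3)
    print(steps, solution)
```

The loop is capped at the matrix order, reflecting the finite-termination property stated above; with rounding errors a production code would normally rely on the tolerance instead.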

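2.2.1 LU Decomposition

What is being solved here is the simple linear equation (10), so as long as storage for the matrix A can be allocated, u can also be obtained by LU decomposition. A is a real symmetric matrix of size about $N^2 \times N^2$ (more precisely $(N-2)^2 \times (N-2)^2$), but all elements farther than about N from the diagonal are zero, so the LU decomposition costs less than for a matrix whose elements are all nonzero. The nonzero elements of each row lie within N - 2 positions on either side of the diagonal, so a rough estimate of the cost of forward elimination is

\[ 2(N-2) \times (N-2) \times (N-2)^2 \approx 2(N-2)^4 \tag{27} \]

and of back substitution

\[ 2(N-2) \times (N-2)^2 \approx 2(N-2)^3 \tag{28} \]

Methods that manipulate the matrix in this way raise a storage problem that did not arise for the Jacobi and Gauss-Seidel methods, because solving by LU decomposition or other matrix computations requires storing not only the cell data but also the matrix being manipulated. The cell data ...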

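Application: Lid-driven Cavity (1)

- The two-dimensional lid-driven cavity problem
  - In a two-dimensional region like the one in the figure, the top lid moves to the right. How does the fluid inside move?
- Incompressible Navier-Stokes equations
- Stream function and vorticity
- The equations that finally have to be solved are derived in the note below.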

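754 E. ERTURK, T. C. CORKE AND C. GÖKÇÖL

Figure 1. Schematic view of driven cavity flow. [The schematic shows the primary vortex; the moving lid with u = 1, v = 0; the three stationary walls with u = 0, v = 0; and the secondary vortices BR1, BR2, BR3, BL1, BL2, BL3, TL1, TL2.]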

where subscript 0 refers to points on the wall and 1 refers to points adjacent to the wall, $\Delta h$ refers to grid spacing and U refers to the velocity of the wall, with U being equal to 1 on the moving wall and 0 on the stationary walls.

We note that, it is well understood [28, 32, 35, 36] that, even though Thom's method is locally first order accurate, the global solution obtained using Thom's method preserves second order accuracy. Therefore in this study, since three point second order central difference is used inside the cavity and Thom's method is used at the wall boundary conditions, the presented solutions are second order accurate.

During our computations we monitored the residual of the steady streamfunction and vorticity Equations (1) and (2) as a measure of the convergence to the steady state solution, where the residual of each equation is given as

\[ R_\psi = \frac{\psi^{n+1}_{i-1,j} - 2\psi^{n+1}_{i,j} + \psi^{n+1}_{i+1,j}}{\Delta x^2} + \frac{\psi^{n+1}_{i,j-1} - 2\psi^{n+1}_{i,j} + \psi^{n+1}_{i,j+1}}{\Delta y^2} + \omega^{n+1}_{i,j} \tag{22} \]

\[
R_\omega = \frac{1}{Re} \frac{\omega^{n+1}_{i-1,j} - 2\omega^{n+1}_{i,j} + \omega^{n+1}_{i+1,j}}{\Delta x^2}
+ \frac{1}{Re} \frac{\omega^{n+1}_{i,j-1} - 2\omega^{n+1}_{i,j} + \omega^{n+1}_{i,j+1}}{\Delta y^2}
- \frac{\psi^{n+1}_{i,j+1} - \psi^{n+1}_{i,j-1}}{2\Delta y} \frac{\omega^{n+1}_{i+1,j} - \omega^{n+1}_{i-1,j}}{2\Delta x}
+ \frac{\psi^{n+1}_{i+1,j} - \psi^{n+1}_{i-1,j}}{2\Delta x} \frac{\omega^{n+1}_{i,j+1} - \omega^{n+1}_{i,j-1}}{2\Delta y}
\tag{23}
\]

The magnitude of these residuals is an indication of the degree to which the solution has converged to steady state. In the limit these residuals would be zero. In our computations, for all Reynolds numbers, we considered that convergence was achieved when for each Equations (22) and (23) the maximum of the absolute residual in the computational domain (max(|R_ψ|)

Copyright © 2005 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Fluids 2005; 48:747-774

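Numerical Computation of Two-Dimensional Incompressible Flow with the Stream Function and Vorticity
Research Institute for Information Technology, Kyushu University

Toshiya Takami

May 29, 2010

1 Introduction

Computation of two-dimensional incompressible flow is comparatively easy to implement and inexpensive, so it also has educational value, for example when results are displayed while the computation runs. This note outlines one of the basic approaches, numerical computation in the stream function and vorticity formulation, and describes the numerical results.

2 Navier-Stokes Equation

For an incompressible Newtonian fluid, the equations obtained from the continuity equation and conservation of momentum are

\[ \nabla \cdot \mathbf{u} = 0 \tag{1} \]

\[ \rho \frac{D\mathbf{u}}{Dt} = \rho \mathbf{F} - \nabla p + \mu \nabla^2 \mathbf{u} \tag{2} \]

where $\mathbf{u}$ is the flow velocity (m/s), $\mathbf{F}$ is the external force (N), p is the pressure (Pa), $\rho$ is the density, and $\mu$ is the viscosity coefficient. The Lagrangian derivative is

\[ \frac{D}{Dt} \equiv \frac{\partial}{\partial t} + \mathbf{u} \cdot \nabla \tag{3} \]

so dividing everything by $\rho$ and considering the case without external force ($\mathbf{F} = 0$) gives

\[ \frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} = -\nabla p' + \frac{1}{Re} \nabla^2 \mathbf{u} \tag{4} \]

where $p' \equiv p / \rho$ is the normalized pressure and $Re = 1/\nu = \rho/\mu$ is the Reynolds number.

3 Stream function and vorticity

We consider a two-dimensional fluid and define the flow velocity $\mathbf{u}$ through the stream function $\psi(x, y)$ as

\[ \mathbf{u} = \nabla \times (0, 0, \psi) = \left( \frac{\partial \psi}{\partial y},\ -\frac{\partial \psi}{\partial x},\ 0 \right) \tag{5} \]

Incompressibility, Eq. (1), is then satisfied automatically. Defining the vorticity as

\[ \mathbf{w} \equiv \nabla \times \mathbf{u} \tag{6} \]

the z component $\omega$ of the vorticity in the two-dimensional case ($\mathbf{u} = (u, v, 0)$) is

\[ \omega = \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y} = -\left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) \psi \tag{7} \]

Subtracting the y derivative of the x component of the Navier-Stokes equation (4) from the x derivative of its y component eliminates the pressure term $p'$ and yields the vorticity transport equation

\[ \frac{\partial \omega}{\partial t} + \mathbf{u} \cdot \nabla \omega = \frac{1}{Re} \nabla^2 \omega \tag{8} \]

Rewritten as a coupled system for the stream function $\psi$ and the vorticity $\omega$, this becomes

\[ \nabla^2 \psi = -\omega \tag{9} \]

\[ \frac{\partial \omega}{\partial t} = \frac{1}{Re} \nabla^2 \omega + \frac{\partial \psi}{\partial x} \frac{\partial \omega}{\partial y} - \frac{\partial \psi}{\partial y} \frac{\partial \omega}{\partial x} \tag{10} \]

For a two-dimensional incompressible fluid, these equations are equivalent to the Navier-Stokes equations.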

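4 Steady State Solution

To obtain the steady solution we adopt the method of Erturk et al. [1]. In this method a pseudo-time evolution is carried out and the solution converges to the steady state as $t \to \infty$. The steady solution satisfies

\[ \frac{\partial \psi}{\partial t} = \frac{\partial \omega}{\partial t} = 0 \tag{11} \]

so, considering a pseudo-time evolution, we obtain the coupled differential equations

\[ \frac{\partial \psi}{\partial t} = \nabla^2 \psi + \omega \tag{12} \]

\[ \frac{\partial \omega}{\partial t} = \frac{1}{Re} \nabla^2 \omega + \frac{\partial \psi}{\partial x} \frac{\partial \omega}{\partial y} - \frac{\partial \psi}{\partial y} \frac{\partial \omega}{\partial x} \tag{13} \]

For the pseudo-time evolution the order of accuracy of the time integration does not matter: once a stable converged solution is found, it is the steady solution. We therefore integrate in time with the implicit Euler method and iterate until convergence.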

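4.1 Finite Difference

Taking differences in time gives

\[ \left( 1 - \Delta t\, \nabla^2 \right) \psi^{n+1} = \psi^n + \Delta t\, \omega^n \tag{14} \]

\[ \left( 1 - \frac{\Delta t}{Re} \nabla^2 - \Delta t \left( \frac{\partial \psi}{\partial x} \right)^{n} \frac{\partial}{\partial y} + \Delta t \left( \frac{\partial \psi}{\partial y} \right)^{n} \frac{\partial}{\partial x} \right) \omega^{n+1} = \omega^n \tag{15} \]

If $\Delta t$ is sufficiently small, the coefficient matrix is diagonally dominant, so the computation proceeds stably.

For the spatial discretization we adopt a uniform mesh and replace the differential operators by differences. With $N_x + 1$ and $N_y + 1$ grid points in the x and y directions, the mesh spacings are $\Delta x = 1/N_x$ and $\Delta y = 1/N_y$. First derivatives use central differences,

\[ \frac{\partial \psi}{\partial x} = \frac{\psi_{i+1,j} - \psi_{i-1,j}}{2\Delta x}, \qquad \frac{\partial \psi}{\partial y} = \frac{\psi_{i,j+1} - \psi_{i,j-1}}{2\Delta y} \tag{16} \]

\[ \frac{\partial \omega}{\partial x} = \frac{\omega_{i+1,j} - \omega_{i-1,j}}{2\Delta x}, \qquad \frac{\partial \omega}{\partial y} = \frac{\omega_{i,j+1} - \omega_{i,j-1}}{2\Delta y} \tag{17} \]

and second derivatives are likewise

\[ \frac{\partial^2 \psi}{\partial x^2} = \frac{\psi_{i+1,j} + \psi_{i-1,j} - 2\psi_{i,j}}{\Delta x^2}, \qquad \frac{\partial^2 \psi}{\partial y^2} = \frac{\psi_{i,j+1} + \psi_{i,j-1} - 2\psi_{i,j}}{\Delta y^2} \tag{18} \]

\[ \frac{\partial^2 \omega}{\partial x^2} = \frac{\omega_{i+1,j} + \omega_{i-1,j} - 2\omega_{i,j}}{\Delta x^2}, \qquad \frac{\partial^2 \omega}{\partial y^2} = \frac{\omega_{i,j+1} + \omega_{i,j-1} - 2\omega_{i,j}}{\Delta y^2} \tag{19} \]

Using the values on both sides of each grid point (five points in total: the neighbors in the x and y directions and the center), the difference form of the equations to be solved is

\[
\left[ 1 + 2\Delta t \left( \frac{1}{\Delta x^2} + \frac{1}{\Delta y^2} \right) \right] \psi^{n+1}_{i,j}
- \Delta t \left[ \frac{\psi^{n+1}_{i+1,j} + \psi^{n+1}_{i-1,j}}{\Delta x^2} + \frac{\psi^{n+1}_{i,j+1} + \psi^{n+1}_{i,j-1}}{\Delta y^2} \right]
= \psi^n_{i,j} + \Delta t\, \omega^n_{i,j}
\tag{20}
\]

\[
\left[ 1 + \frac{2\Delta t}{Re} \left( \frac{1}{\Delta x^2} + \frac{1}{\Delta y^2} \right) \right] \omega^{n+1}_{i,j}
- \frac{\Delta t}{Re} \left[ \frac{\omega^{n+1}_{i+1,j} + \omega^{n+1}_{i-1,j}}{\Delta x^2} + \frac{\omega^{n+1}_{i,j+1} + \omega^{n+1}_{i,j-1}}{\Delta y^2} \right]
- \Delta t\, \frac{ (\psi^n_{i+1,j} - \psi^n_{i-1,j})(\omega^{n+1}_{i,j+1} - \omega^{n+1}_{i,j-1}) - (\psi^n_{i,j+1} - \psi^n_{i,j-1})(\omega^{n+1}_{i+1,j} - \omega^{n+1}_{i-1,j}) }{4 \Delta x \Delta y}
= \omega^n_{i,j}
\tag{21}
\]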
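The difference formulas (16)-(19) can be checked against a function whose derivatives are known. The sketch below is an illustration only: the test field $\psi = \sin(\pi x)\sin(\pi y)$ with $\omega = 2\pi^2 \psi$ is a manufactured example that satisfies $\nabla^2 \psi = -\omega$, not the cavity flow itself, and all function names are hypothetical.

```python
import math


def make_grid(func, nx, ny):
    """Sample func(x, y) on the (nx+1) x (ny+1) uniform mesh of the unit square."""
    return [[func(i / nx, j / ny) for j in range(ny + 1)] for i in range(nx + 1)]


def d_dx(f, i, j, dx):
    """Central first difference in x, as in Eq. (16)."""
    return (f[i + 1][j] - f[i - 1][j]) / (2.0 * dx)


def d_dy(f, i, j, dy):
    """Central first difference in y, as in Eq. (16)."""
    return (f[i][j + 1] - f[i][j - 1]) / (2.0 * dy)


def laplacian(f, i, j, dx, dy):
    """Five-point Laplacian built from the second differences of Eq. (18)."""
    d2x = (f[i + 1][j] + f[i - 1][j] - 2.0 * f[i][j]) / dx ** 2
    d2y = (f[i][j + 1] + f[i][j - 1] - 2.0 * f[i][j]) / dy ** 2
    return d2x + d2y


def poisson_residual(psi, omega, i, j, dx, dy):
    """Discrete residual of Eq. (9): laplacian(psi) + omega."""
    return laplacian(psi, i, j, dx, dy) + omega[i][j]


if __name__ == "__main__":
    n = 32
    h = 1.0 / n
    psi = make_grid(lambda x, y: math.sin(math.pi * x) * math.sin(math.pi * y), n, n)
    omega = make_grid(
        lambda x, y: 2.0 * math.pi ** 2 * math.sin(math.pi * x) * math.sin(math.pi * y), n, n
    )
    print(d_dx(psi, 8, 8, h), math.pi / 2.0)
    print(poisson_residual(psi, omega, 8, 8, h, h))
```

The remaining residual is the truncation error of the second-order differences; it shrinks roughly by a factor of four when the mesh spacing is halved.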

[1] E. Erturk, T. C. Corke, and C. Gökçöl, "Numerical solutions of 2-D steady incompressible driven cavity flow at high Reynolds numbers," Int. J. Numer. Meth. Fluids 48, 747-774 (2005).

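Application: Lid-driven Cavity (2)

- Solve the steady problem by pseudo-time evolution (Eqs. (12) and (13) of the note).
- Discretize space and time for the numerical computation (Eqs. (20) and (21) of the note).
- Solvers: Jacobi, Gauss-Seidel, SOR, and CG methods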



Application: Lid-driven Cavity (3)

- Numerical results
  - As the Reynolds number increases, complex structures appear.

[Figures: computed flow for Re = 10, Re = 1000, and Re = 5000]

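Supercomputing 2011

- Report on SC11, held last week in Seattle
  - The RIKEN/Fujitsu "K computer" is again ranked first in the world on the TOP500 list.
  - It also takes the top position in every category of the HPC Challenge.
  - A scientific computation performed on the K computer received the Gordon Bell Award.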

of supercomputers in terms of processing performance in 28 tests derived from frequently-used computational patterns in the field of scientific computation. Among these, the four challenging benchmarks are: 1) Global HPL (operating speed in solving large-scale simultaneous linear equations); 2) Global RandomAccess (random memory access performance in parallel processing); 3) EP STREAM (Triad) per system (memory access speed under multiple loads); and 4) Global FFT (total performance of Fast Fourier Transform). The HPC Challenge Class 1 Awards are awarded to the top-ranked performance on each of these four benchmarks.

The University of Tsukuba contributed extensively to increasing the computational speed for the Global FFT benchmark. As a result, the performance results of the K computer were submitted to the Class 1 award category.

The top three rankings achieved on the four benchmarks for the HPC Challenge Class 1 Awards for 2011 are as follows:

Top 3 Rankings of Four Benchmarks for HPC Challenge Class 1 Awards 2011

Benchmark (unit)                     Rank            Performance   System       Institutional Facility
Global HPL (TFLOP/s)                 1st place       2,118         K computer   RIKEN
                                     1st runner up   1,533         Cray XT5     ORNL
                                     2nd runner up   736           Cray XT5     UTK
Global RandomAccess (GUPS)           1st place       121           K computer   RIKEN
                                     1st runner up   117           IBM BG/P     LLNL
                                     2nd runner up   103           IBM BG/P     ANL
EP STREAM (Triad) per system (TB/s)  1st place       812           K computer   RIKEN
                                     1st runner up   398           Cray XT5     ORNL
                                     2nd runner up   267           IBM BG/P     LLNL
Global FFT (TFLOP/s)                 1st place       34.7          K computer   RIKEN
                                     1st runner up   11.9          NEC SX-9     JAMSTEC
                                     2nd runner up   10.7          Cray XT5     ORNL

The HPC Challenge Class 1 Awards evaluate the performance of supercomputers from four different angles, and the K computer delivers world-class performance on all four benchmarks.

With the understanding that its use would be widely shared by researchers and engineers inside and outside RIKEN from the very start, the development of the K computer has proceeded with the aim of creating a supercomputer that combines superior computational performance with the versatility that enables it to run applications for a wide range of fields. The HPC Challenge results demonstrate the versatility of the K computer and the all-around high performance levels it delivers as a supercomputer.

Glossary & Notes

The "K computer", which is being jointly developed by RIKEN and Fujitsu, is part of the High-Performance Computing Infrastructure (HPCI) initiative led by Japan's Ministry of Education, Culture, Sports, Science and Technology (MEXT). The K computer's availability for shared use is scheduled for 2012. The "K computer" is the nickname RIKEN has been using for the supercomputer of this project since July 2010. "K" comes from the Japanese Kanji character "Kei" which means ten peta or 10 to the 16th power. In its original sense, "Kei" expresses a large gateway, and it is hoped that the system will be a new gateway to computational science.

The HPC Challenge Awards consist of the Class 1 benchmark performance competition and the Class 2 "Most Productivity" awards for the most "elegant" implementation of computationally intensive kernels. The Class 1 awards consist of the following four benchmarks, each of which evaluates the performance of key system components (CPU computational


First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer

Yukihiro Hasegawa (1), Jun-Ichi Iwata (2), Miwako Tsuji (2), Daisuke Takahashi (2), Atsushi Oshiyama (3), Kazuo Minami (1), Taisuke Boku (2), Fumiyoshi Shoji (1), Atsuya Uno (1), Motoyoshi Kurokawa (1), Hikaru Inoue (4), Ikuo Miyoshi (5) and Mitsuo Yokokawa (1)

(1) Development Group, Next-Generation Supercomputer R&D Center, RIKEN
(2) Center for Computational Sciences, University of Tsukuba
(3) Department of Applied Physics, School of Engineering, The University of Tokyo
(4) Computational Science and Engineering Solution Division, Technical Computing Solution Unit, Fujitsu Limited
(5) PA Project, Next Generation Technical Computing Unit, Fujitsu Limited

{y.hasegawa, minami_kaz, shoji, uno, motoyosi, yokokawa}@riken.jp [email protected]
{tsuji, daisuke, taisuke}@ccs.tsukuba.ac.jp [email protected]
{inoue-hikaru, miyoshi.ikuo}@jp.fujitsu.com

ABSTRACT

Real space DFT (RSDFT) is a simulation technique most suitable for massively-parallel architectures to perform first-principles electronic-structure calculations based on density functional theory. We here report unprecedented simulations on the electron states of silicon nanowires with up to 107,292 atoms carried out during the initial performance evaluation phase of the K computer being developed at RIKEN.

The RSDFT code has been parallelized and optimized so as to make effective use of the various capabilities of the K computer. Simulation results for the self-consistent electron states of a silicon nanowire with 10,000 atoms were obtained in a run lasting about 24 hours and using 6,144 cores of the K computer. A 3.08 peta-flops sustained performance was measured for one iteration of the SCF calculation in a 107,292-atom Si nanowire calculation using 442,368 cores, which is 43.63% of the peak performance of 7.07 peta-flops.

Categories and Subject Descriptors

J.2 [Computer Applications]: Physical Sciences and Engineering - engineering, physics

General Terms

Performance

Keywords

K computer, Tofu, Next-generation supercomputer, Real-space density functional theory, RSDFT, Self-consistent electron states, Silicon nanowire, Peta-flops

1. INTRODUCTION Computer simulations are needed to clarify and predict the properties of materials having promising applications. In particular, first-principles electronic structure calculations based on density functional theory (DFT) have been performed on a variety of materials by using diverse software implementations on parallel computers.

Real space DFT (RSDFT) code [1], developed by Iwata et al. of the University of Tsukuba, is a simulation technique to perform first-principles electronic structure calculations. “Real space” means that three-dimensional physical coordinates are discretized, and the wave functions, electron density, and potential field are calculated at the resulting discrete lattice points or grids. One of the advantages of this method is that it is suitable for parallel computations. In fact, the Hamiltonian matrix of the real-space formulation is sparse, and the Fast Fourier Transform (FFT), which usually requires global communications traversing all compute nodes of parallel computers, is unnecessary for the Hamiltonian matrix operations.

The RSDFT code has been parallelized in order to run on parallel computers, like PACS-CS[2, 3] and the T2K open supercomputer[4, 5], by incorporating matrix-matrix products for Gram-Schmidt orthogonalization with a high cache-hit ratio and thread parallelization with OpenMP directives for multicore processors [1, 6, 7]. Sustained performance of 10%-20% of peak performance has been obtained on these systems for simulations of several thousand atoms. Furthermore, the real-space method has been used to simulate systems consisting of thousands of atoms, and it is apparently promising for much larger systems containing 10,000-100,000 atoms. However, to represent the actual behavior of genuine materials, much more computational resources, i.e., more CPU cycles and a larger storage volume, are needed to make simulations of up to 100,000 atoms.

Supercomputers are essential tools in computational science and engineering. Their high performance stems from the use of high-speed execution units such as SIMD circuits and an appropriate interconnect network balanced with CPU performance. Several supercomputer development projects have been undertaken as

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SC’11, November 12-18, 2011, Seattle, WA, U.S.A. Copyright 2011 ACM978-1-4503-0771-0/11/11…$10.00.


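Matrix Symmetry and Eigenvalue Computation (1)

- A real symmetric matrix S ($S^T = S$) has real eigenvalues $E_j$.
  - It is diagonalized by an orthogonal matrix P with $P^T P = P P^T = I$:
    \[ P^T S P = \mathrm{diag}(E_0, E_1, \ldots, E_{N-1}) \]
    where $P = (u_0, u_1, \ldots, u_{N-1})$ and the $u_j$ are real column vectors satisfying $S u_j = E_j u_j$.
- A complex Hermitian matrix H ($H^\dagger = H$) has real eigenvalues $E_j$.
  - It is diagonalized by a unitary matrix U with $U^\dagger U = U U^\dagger = I$:
    \[ U^\dagger H U = \mathrm{diag}(E_0, E_1, \ldots, E_{N-1}) \]
    where $U = (v_0, v_1, \ldots, v_{N-1})$ and the $v_j$ are complex column vectors satisfying $H v_j = E_j v_j$.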

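Matrix Symmetry and Eigenvalue Computation (2)

- An orthogonal matrix P is a matrix with real elements that satisfies $P^T P = P P^T = I$.
  - Its eigenvalues have absolute value 1, and its eigenvectors are in general complex vectors.
  - With an antisymmetric matrix A ($A^T = -A$), it can be written as $P = \exp(A)$ (this covers the orthogonal matrices with $\det P = +1$). Then $P^T P = P P^T = \exp(A^T)\exp(A) = \exp(-A)\exp(A) = I$.
- A unitary matrix U satisfies $U^\dagger U = U U^\dagger = I$.
  - Its eigenvalues have absolute value 1, and its eigenvectors are in general complex vectors.
  - With a Hermitian matrix H ($H^\dagger = H$), it can be written as $U = \exp(iH)$. Then $U^\dagger U = U U^\dagger = \exp(-iH^\dagger)\exp(iH) = \exp(-iH)\exp(iH) = I$.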
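The statement $P = \exp(A)$ for an antisymmetric A can be verified numerically in the simplest case. The sketch below is an assumption-based illustration: it evaluates the matrix exponential by a truncated Taylor series (adequate only for small matrices with modest norm) and uses the 2 × 2 antisymmetric matrix with entries $\pm\theta$, whose exponential is the rotation by $\theta$. The helper names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two small dense matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


def mat_exp(a, terms=30):
    """Truncated Taylor series exp(A) = I + A + A^2/2! + ... (small matrices only)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[value / k for value in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


if __name__ == "__main__":
    theta = 0.7
    antisym = [[0.0, -theta], [theta, 0.0]]  # A^T = -A
    p = mat_exp(antisym)
    print(p)                         # close to the rotation matrix by theta
    print(mat_mul(transpose(p), p))  # close to the identity
```

For this P the eigenvalues are $\cos\theta \pm i\sin\theta$, which illustrates both claims on the slide: unit modulus, and complex eigenvectors for a real matrix.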

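Example of Dense-Matrix Eigenvalue Computation (1)

- In practical numerical work one almost never writes a linear algebra library oneself. Normally a general-purpose library such as LAPACK is used.
  - Obtain it from Netlib and link it with C/C++/FORTRAN programs.
  - See Netlib, http://www.netlib.org/
- Alternatively, use a numerical computing environment such as MATLAB.
  - In general one pays a license fee (perhaps several tens of thousands of yen for academic use) and installs it on a specific computer.
- Compatible, free clone environments have also been developed.
  - Octave, Scilab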

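Example of Dense-Matrix Eigenvalue Computation (2)

- Eigenvalue statistics of random matrices
- GOE random matrix: a real symmetric matrix with
  \[ \left\langle |A_{ij}|^2 \right\rangle = \begin{cases} \dfrac{1}{4N} & (i \ne j) \\[2mm] \dfrac{1}{2N} & (i = j) \end{cases} \]
  The ensemble of matrices whose distribution is invariant under orthogonal transformations (Gaussian Orthogonal Ensemble).
- GUE random matrix: a Hermitian matrix with
  \[ \left\langle |A_{ij}|^2 \right\rangle = \frac{1}{4N} \]
  The ensemble of matrices whose distribution is invariant under unitary transformations (Gaussian Unitary Ensemble).
- Each element is a normally distributed (Gaussian) random number whose mean square is the value given above.
- The eigenvalues are distributed in the shape of a semicircle (Wigner's semi-circle law).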
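The GOE normalization can be checked by sampling. The sketch below is a standard-library Python illustration rather than the Octave code used in the lecture; `goe_matrix` and `mean_squares` are hypothetical names, and the seed and matrix size in the usage are arbitrary.

```python
import random


def goe_matrix(n, rng):
    """Real symmetric matrix with <A_ij^2> = 1/(4n) off the diagonal and 1/(2n) on it."""
    scale = 1.0 / (2.0 * n ** 0.5)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        mat[i][i] = rng.gauss(0.0, 1.0) * scale * 2.0 ** 0.5
        for j in range(i + 1, n):
            value = rng.gauss(0.0, 1.0) * scale
            mat[i][j] = value
            mat[j][i] = value  # enforce A^T = A
    return mat


def mean_squares(mat):
    """Return (mean of A_ii^2, mean of A_ij^2 over i != j)."""
    n = len(mat)
    diag = sum(mat[i][i] ** 2 for i in range(n)) / n
    off = sum(mat[i][j] ** 2 for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return diag, off


if __name__ == "__main__":
    size = 200
    sample = goe_matrix(size, random.Random(2011))
    diag, off = mean_squares(sample)
    print("diagonal    :", diag, "expected", 1.0 / (2 * size))
    print("off-diagonal:", off, "expected", 1.0 / (4 * size))
```

The sample means fluctuate around the expected values; the diagonal estimate is noisier because it averages only N numbers.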

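Results of Dense-Matrix Eigenvalue Computation (1)

- Here we try Octave, a MATLAB clone.
- The MATLAB files that were written, and how they are used:

goeMatrix.m

    function mat = goeMatrix(N)
      % GOE sample: real symmetric, <A_ij^2> = 1/(4N) off the diagonal, 1/(2N) on it
      mat = zeros(N,N);
      scl = 1 / (2.0 * sqrt(N));
      for k=1:N-1
        mat(k,k+1:N) = randn(1,N-k) * scl;   % strict upper triangle
      end
      mat = mat + mat.' + diag(randn(1,N)) * scl * sqrt(2);
    end

gueMatrix.m

    function mat = gueMatrix(N)
      % GUE sample: complex Hermitian, <|A_ij|^2> = 1/(4N) for all elements
      mat = zeros(N,N);
      scl = 1 / (2.0 * sqrt(N));
      for k=1:N-1
        % complex Gaussian entries: independent real and imaginary parts
        mat(k,k+1:N) = (randn(1,N-k) + 1i * randn(1,N-k)) * scl / sqrt(2);
      end
      mat = mat + mat' + diag(randn(1,N)) * scl;   % mat' is the conjugate transpose
    end

Commands used:
- e = eig(M);
- save filename obj
- etc.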

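Results of Dense-Matrix Eigenvalue Computation (2)

- Example calculations with Octave (a MATLAB clone)
  - CPU time measurements (MacBook Air)
  - Eigenvalue spacing distributions: GOE and GUE differ

[Figure: CPU time required for diagonalization. CPU Time (sec), 0.001-1000, versus Size of Matrix, 100-10000, for real symmetric matrices and complex Hermitian matrices]

[Figure: eigenvalue spacing distribution of GOE matrices. Distribution (0.0-1.0) versus Spacing (0.0-3.0)]

[Figure: eigenvalue spacing distribution of GUE matrices. Distribution (0.0-1.0) versus Spacing (0.0-3.0)]

Theoretical curves shown in the spacing plots:

\[
P(S) =
\begin{cases}
\dfrac{\pi}{2}\, S \exp\left( -\dfrac{\pi}{4} S^2 \right) & \text{(GOE)} \\[3mm]
\dfrac{32}{\pi^2}\, S^2 \exp\left( -\dfrac{4}{\pi} S^2 \right) & \text{(GUE)} \\[3mm]
\exp(-S) & \text{(uncorrelated levels)}
\end{cases}
\]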
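A quick numerical check confirms that the three spacing distributions above are normalized and have unit mean spacing. This sketch is only a sanity check under stated assumptions: the function names are hypothetical, and the trapezoidal rule on [0, 40] is an arbitrary choice that is accurate enough because all three curves decay rapidly.

```python
import math


def p_goe(s):
    """Wigner surmise for the GOE."""
    return 0.5 * math.pi * s * math.exp(-0.25 * math.pi * s * s)


def p_gue(s):
    """Wigner surmise for the GUE."""
    return 32.0 / math.pi ** 2 * s * s * math.exp(-4.0 * s * s / math.pi)


def p_poisson(s):
    """Spacing distribution of uncorrelated levels."""
    return math.exp(-s)


def integrate(func, upper=40.0, steps=20000):
    """Trapezoidal rule on [0, upper]."""
    h = upper / steps
    total = 0.5 * (func(0.0) + func(upper))
    for k in range(1, steps):
        total += func(k * h)
    return total * h


if __name__ == "__main__":
    for name, p in (("GOE", p_goe), ("GUE", p_gue), ("Poisson", p_poisson)):
        norm = integrate(p)
        mean = integrate(lambda s: s * p(s))
        print(name, norm, mean, p(0.0))
```

The value at S = 0 shows the qualitative difference: the GOE and GUE curves vanish there (level repulsion, linear and quadratic respectively), while the uncorrelated case starts at 1.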

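References

- Iwanami Lectures on Applied Mathematics (岩波講座応用数学), "Linear Computation" (線形計算), Iwanami Shoten
- Ryuichi Ashino et al., "Hayawakari MATLAB" (はやわかりMATLAB), Kyoritsu Shuppan
- GNU Octave, http://www.gnu.org/software/octave/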
