2010 ASE: Automatic Detection of Nocuous Coordination Ambiguities in Natural Language Requirements

Presented Paper

• Title

"Automatic Detection of Nocuous Coordination Ambiguities in Natural Language Requirements"

• Authors: Hui Yang, Alistair Willis, Anne De Roeck, Bashar Nuseibeh

• Source

– Automated Software Engineering (ASE) 2010


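Overview

• Goal: detect ambiguity in requirements documents

• Method: extract sentences that contain ambiguity and decide whether each ambiguity is nocuous

• Result: nocuous ambiguities were detected with good accuracy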


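Background

Requirements documents are written in natural language

Natural language contains ambiguity

Ambiguity causes misunderstandings among stakeholders

Misunderstandings in the requirements analysis phase carry over into downstream phases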


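Coordination Ambiguity

• Ambiguity caused by "and" and "or"

They support a typing system for architectural components and connectors.

1. architectural components and connectors (low attachment: "architectural" modifies only "components")

2. architectural components and architectural connectors (high attachment: "architectural" modifies both nouns)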

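Nocuous Ambiguity

They support a typing system for architectural components and connectors.

Low attachment: 10 / High attachment: 7 (of 17 judges) → nocuous: the judges disagree

It is described the size of vector-based input and output.

Low attachment: 1 / High attachment: 16 (of 17 judges) → innocuous: almost all judges agree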
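The contrast between the two examples can be made concrete with a small calculation. This is a minimal sketch, assuming a coordination is labelled nocuous when the share of judges who pick the most popular reading falls below a chosen threshold (the slides say the threshold is variable but do not spell out the rule). The names `certainty` and `is_nocuous` and the default threshold of 0.75 are illustrative assumptions.

```python
from typing import Dict


def certainty(votes: Dict[str, int]) -> float:
    """Share of judges who chose the most popular reading."""
    return max(votes.values()) / sum(votes.values())


def is_nocuous(votes: Dict[str, int], threshold: float = 0.75) -> bool:
    """Assumed rule: nocuous when agreement falls below the threshold."""
    return certainty(votes) < threshold


# Judgment counts from the slide (17 judges per sentence)
components = {"low": 10, "high": 7}  # "architectural components and connectors"
vector_io = {"low": 1, "high": 16}   # "vector-based input and output"

print(round(certainty(components), 3), is_nocuous(components))  # 0.588 True
print(round(certainty(vector_io), 3), is_nocuous(vector_io))    # 0.941 False
```

Raising the threshold makes more coordinations count as nocuous: at 0.95, both examples would be labelled nocuous.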

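NAI Tool (Nocuous Ambiguity Identification Tool)

• Identifies nocuous ambiguities

Text preprocessing module

Ambiguous sentence detection module

Coordination construction extraction module

Nocuous/innocuous classification module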

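NAI Tool (Text Preprocessing Module)

• Finds sentence boundaries and splits the text into sentences

• Distinguishes single words from multiword expressions

• Assigns POS tags (noun, verb, adjective, adverb, etc.)

Uses the existing Stanford parser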
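The preprocessing steps can be illustrated with a toy sketch. It does not call the Stanford parser: sentence splitting and tokenization use naive regular expressions, and POS tagging is a lookup in a hypothetical hand-made lexicon that uses the coarse tags from the slides (n, v, p, c, adj). Multiword expressions are not handled. All names here (`LEXICON`, `split_sentences`, `tokenize`, `tag`) are illustrative.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for a real POS tagger
LEXICON: Dict[str, str] = {
    "they": "pron", "support": "v", "a": "det", "typing": "n", "system": "n",
    "for": "p", "architectural": "adj", "components": "n", "and": "c",
    "or": "c", "connectors": "n",
}


def split_sentences(text: str) -> List[str]:
    # Naive boundary rule: a sentence ends at ., ! or ? followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    # Words (hyphenated compounds such as "vector-based" stay whole) and punctuation
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*|[^\sA-Za-z]", sentence)


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    # Unknown words get "?", non-alphabetic tokens get "punct"
    return [(t, LEXICON.get(t.lower(), "?" if t[0].isalpha() else "punct")) for t in tokens]


text = ("They support a typing system for architectural components and connectors. "
        "It is described the size of vector-based input and output.")
first = split_sentences(text)[0]
print(tag(tokenize(first))[-5:])
# [('architectural', 'adj'), ('components', 'n'), ('and', 'c'), ('connectors', 'n'), ('.', 'punct')]
```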


c: "and" or "or" / n: noun / v: verb / p: preposition

Underline: modifier / adj: adjectival modifier / adv: adverbial modifier / nn: noun acting as a modifier

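NAI Tool (Ambiguous Sentence Detection Module, Part 1)

• Extracts the sentences in a requirements document that match an ambiguity pattern

Requirements document (POS-tagged) → matched against the ambiguity patterns → extracted: ambiguous sentences in the requirements document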
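Pattern matching over POS tags can be sketched as a scan over the tag sequence. This minimal version assumes a single pattern, "modifier n c n", where the modifier is an adjective or a noun used as a modifier (tagged "n" here for simplicity); the slides' full set of patterns is not reproduced. The names `match_positions` and `ambiguous_sentences` are hypothetical.

```python
from typing import List, Sequence, Tuple

Tagged = List[Tuple[str, str]]  # (word, POS tag) pairs for one sentence

# Assumed single pattern: modifier n c n
MODIFIER_TAGS = {"adj", "n"}


def match_positions(tagged: Tagged) -> List[int]:
    """Start indices where the tag sequence matches modifier n c n."""
    tags = [t for _, t in tagged]
    hits = []
    for i in range(len(tags) - 3):
        if (tags[i] in MODIFIER_TAGS and tags[i + 1] == "n"
                and tags[i + 2] == "c" and tags[i + 3] == "n"):
            hits.append(i)
    return hits


def ambiguous_sentences(document: Sequence[Tagged]) -> List[Tagged]:
    """Keep only the sentences that contain at least one pattern match."""
    return [s for s in document if match_positions(s)]
```

For the tagged sentence "They support a typing system for architectural components and connectors.", the scan reports one match starting at "architectural".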

NAI Tool (Ambiguous Sentence Detection Module, Part 2)

• Uses word positions and POS tags to detect the two readings

1. architectural components and connectors

2. architectural components and architectural connectors
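Once the modifier and the two conjuncts are known, both readings follow mechanically. A minimal sketch, with brackets marking what the modifier attaches to; the `Coordination` type and `readings` function are illustrative names, not part of the tool.

```python
from typing import NamedTuple, Tuple


class Coordination(NamedTuple):
    modifier: str
    n1: str
    conj: str
    n2: str


def readings(coord: Coordination) -> Tuple[str, str]:
    """Return (low-attachment reading, high-attachment reading)."""
    m, n1, c, n2 = coord
    low = f"[{m} {n1}] {c} [{n2}]"        # modifier applies to n1 only
    high = f"[{m} {n1}] {c} [{m} {n2}]"   # modifier distributes over both nouns
    return low, high


print(readings(Coordination("architectural", "components", "and", "connectors")))
```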

NAI Tool (Coordination Construction Extraction Module)

architectural | components | and | connectors
modifier | n1 | c | n2

(each element is recorded with its position and POS tag)
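The extraction step can be sketched by anchoring on a conjunction and reading the slots around it. This assumes a fixed "modifier n1 c n2" layout; the slides do not describe the module in detail, so `extract_construction` and its rule are hypothetical. Each slot keeps the element's position and POS tag, as in the table above.

```python
from typing import Dict, List, Optional, Tuple

Tagged = List[Tuple[str, str]]  # (word, POS tag) pairs
Slot = Tuple[int, str, str]     # (position, word, POS tag)


def extract_construction(tagged: Tagged) -> Optional[Dict[str, Slot]]:
    """Return the first conjunction whose neighbours fit modifier n1 c n2."""
    for i, (word, tag) in enumerate(tagged):
        if tag != "c" or i < 2 or i + 1 >= len(tagged):
            continue
        (w_mod, t_mod), (w_n1, t_n1) = tagged[i - 2], tagged[i - 1]
        w_n2, t_n2 = tagged[i + 1]
        if t_mod in {"adj", "n"} and t_n1 == "n" and t_n2 == "n":
            return {
                "modifier": (i - 2, w_mod, t_mod),
                "n1": (i - 1, w_n1, t_n1),
                "c": (i, word, tag),
                "n2": (i + 1, w_n2, t_n2),
            }
    return None  # no coordination of this shape in the sentence
```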

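Using a requirements document containing 138 ambiguous sentences, 17 experts (computer science researchers and staff) each chose one of:

• HA (high attachment: the modifier applies to both nouns)

• LA (low attachment: the modifier applies to only one noun)

• A (ambiguous: cannot decide)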


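NAI Tool (Nocuous/Innocuous Classification Module, Part 1)

• LogitBoost (an existing algorithm) was chosen as the classifier's learning algorithm

– It outperformed decision trees, J48, Naive Bayes, SVM, and Logistic Regression

• The classifier's threshold is variable

– Gray: innocuous

– Black: nocuous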
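The slides name LogitBoost but not its configuration, and the tool used an existing implementation. For readers unfamiliar with the algorithm, here is a minimal two-class LogitBoost with weighted regression stumps, following the standard formulation by Friedman, Hastie and Tibshirani (2000). It is not the tool's code, and the feature vectors are hypothetical placeholders for whatever per-coordination features the tool computes.

```python
import math
from typing import List, Sequence, Tuple

# (feature index, split value, output if x[j] <= split, output otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: Sequence[Sequence[float]], z: List[float], w: List[float]) -> Stump:
    """Weak learner: weighted least-squares regression stump fitted to z."""
    mean = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    best: Stump = (0, math.inf, mean, mean)  # constant fallback if no split helps
    best_err = sum(wi * (zi - mean) ** 2 for wi, zi in zip(w, z))
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            ml = sum(w[i] * z[i] for i in left) / sum(w[i] for i in left)
            mr = sum(w[i] * z[i] for i in right) / sum(w[i] for i in right)
            err = sum(w[i] * (z[i] - ml) ** 2 for i in left) + sum(
                w[i] * (z[i] - mr) ** 2 for i in right
            )
            if err < best_err:
                best, best_err = (j, t, ml, mr), err
    return best


def stump_value(s: Stump, x: Sequence[float]) -> float:
    j, t, left, right = s
    return left if x[j] <= t else right


def prob(f: float) -> float:
    """p = e^F / (e^F + e^-F), with F capped and p kept away from 0 and 1."""
    f = max(-30.0, min(30.0, f))
    return min(max(1.0 / (1.0 + math.exp(-2.0 * f)), 1e-6), 1.0 - 1e-6)


def train_logitboost(X: Sequence[Sequence[float]], y: List[int], rounds: int = 20) -> List[Stump]:
    """Two-class LogitBoost; y[i] is 1 for nocuous, 0 for innocuous."""
    F = [0.0] * len(X)
    stumps: List[Stump] = []
    for _ in range(rounds):
        p = [prob(f) for f in F]
        w = [pi * (1.0 - pi) for pi in p]
        # Working response, clipped to [-4, 4] for numerical stability
        z = [max(-4.0, min(4.0, (yi - pi) / wi)) for yi, pi, wi in zip(y, p, w)]
        s = fit_stump(X, z, w)
        stumps.append(s)
        F = [f + 0.5 * stump_value(s, x) for f, x in zip(F, X)]
    return stumps


def predict_nocuous(stumps: List[Stump], x: Sequence[float]) -> bool:
    return sum(0.5 * stump_value(s, x) for s in stumps) > 0
```

Each row of `X` would hold the (hypothetical) numeric features of one coordination, and `predict_nocuous` returning True corresponds to black (nocuous) on the slide, False to gray (innocuous).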


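NAI Tool (Nocuous/Innocuous Classification Module, Part 2)

Evaluation of the NAI Tool (Overview)

• Of the 138 instances, 90% were used for training and the remaining 10% for evaluation

• Recall and precision were measured

TP: nocuous ambiguities correctly identified

FN: nocuous ambiguities that were missed

FP: innocuous ambiguities judged nocuous

R (recall) = TP / (TP + FN)

P (precision) = TP / (TP + FP)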
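The two measures follow directly from the TP, FN, and FP counts defined above. A minimal sketch, where True marks a nocuous ambiguity; the function name `precision_recall` is illustrative, and the labels in the check are a toy example, not the paper's data.

```python
from typing import Sequence, Tuple


def precision_recall(gold: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (P, R); True means nocuous in both gold and predicted labels."""
    tp = sum(g and p for g, p in zip(gold, predicted))        # nocuous, found
    fp = sum((not g) and p for g, p in zip(gold, predicted))  # innocuous, flagged
    fn = sum(g and not p for g, p in zip(gold, predicted))    # nocuous, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```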


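Evaluation of the NAI Tool (Results)

• Precision (P): about 80%

• Recall (R):

→ 100% at thresholds of 75% and above

(precision stays almost unchanged)

→ poor at thresholds of 40 to 50%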

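Conclusion

Ambiguous sentences were extracted from requirements documents

Each ambiguous sentence was classified as nocuous or innocuous

Nocuous ambiguities were successfully identified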


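Personal Opinion

Strengths

• Showed that ambiguities come in two kinds, nocuous and innocuous

• Both precision and recall were good

• Enjoyable to read

Weaknesses

• The description of the NAI tool's coordination construction extraction module is shallow (I would like it described in more detail)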
