
Research Article

Domain Terminology Collection for Semantic Interpretation of Sensor Network Data

Myunggwon Hwang, Jinhyung Kim, Jangwon Gim, Sa-kwang Song, Hanmin Jung, and Do-Heon Jeong

Department of Computer Intelligence Research, Korea Institute of Science and Technology Information (KISTI), 245 Daehak-ro, Yuseong-gu, Daejeon 305-806, Republic of Korea

Correspondence should be addressed to Do-Heon Jeong; heon@kisti.re.kr

Received 20 October 2013; Accepted 27 December 2013; Published 13 February 2014

Academic Editor: Hwa-Young Jeong

Copyright © 2014 Myunggwon Hwang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Many studies have investigated the management of data delivered over sensor networks and attempted to standardize their relations. Sensor data come from numerous tangible and intangible sources, and existing work has focused on the integration and management of the sensor data itself. The data should be interpreted according to the sensor environment and related objects, even though the data type, and even the value, is exactly the same. This means that the sensor data should have semantic connections with all objects, and so a knowledge base that covers all domains should be constructed. In this paper, we suggest a method of domain terminology collection based on Wikipedia category information in order to prepare seed data for such knowledge bases. However, Wikipedia has two weaknesses, namely, loops and unreasonable generalizations in the category structure. To overcome these weaknesses, we utilize a horizontal bootstrapping method for category searches and domain-term collection. Both the category-article and article-link relations defined in Wikipedia are employed as terminology indicators, and we use a new measure to calculate the similarity between categories. By evaluating various aspects of the proposed approach, we show that it outperforms the baseline method, having wider coverage and higher precision. The collected domain terminologies can assist the construction of domain knowledge bases for the semantic interpretation of sensor data.

1. Introduction

Many studies have considered the integrated management of data received from sensor networks [1, 2]. In particular, some significant research has focused on ontology-based approaches for developing standardized and semantic relations between the data [3–6]. The data collected from sensors represents various tangible and intangible objects, such as temperature, acceleration, GPS, light, barometric pressure, magnetic degree, and acoustic measurements. Existing research deals with the integration of the sensor data itself, the definition of standard schemes, and management applications for understanding the sensor data. However, the data could be interpreted differently according to the environment and which objects are related to the sensor, even though the data type, and even its value, may be the same. For example, two 1 °C measurements from a refrigerator and an aquarium will have very different interpretations. To make appropriate decisions in different situations, the conceptual idea of the sensor network domain should be related to other concepts in different domains. To attack this issue, knowledge bases incorporating ontology, taxonomy, folksonomy, or thesaurus information should first be constructed, allowing reliable connections to be formed between concepts of the sensor network and concepts of other domain knowledge bases. The fundamental step in constructing knowledge bases is to collect domain terminologies, and our research deals with a domain-term collection method.

Domain-terms, which are the main components of the knowledge, are words and compound words that have specific meanings in a specific context (definition of the term "Terminology": http://en.wikipedia.org/wiki/Terminology). Constructing knowledge bases manually requires considerable labor, cost, and time and can sometimes result in conflict [7, 8]. Therefore, the automatic construction of a body of knowledge by extracting domain-terms from various sources

Hindawi Publishing Corporation, International Journal of Distributed Sensor Networks, Volume 2014, Article ID 827319, 9 pages. http://dx.doi.org/10.1155/2014/827319


is a popular area of research [8–15]. Nowadays, Wikipedia (WP) and similar repositories are widely employed as information sources [7, 16, 17]. WP contains diverse forms to explain concepts (hereafter, we use concept, term, and article with the same meaning), such as abstract information (specific and long definitions), tabular information, the main article content, article links, and category information. The term "article" is generally used in WP, but it also means "title of article". In this paper, we use "article" and "term" interchangeably. Moreover, WP provides highly reliable and widely used content because it is based on semantic information from the collective intelligence of contributors worldwide. However, WP has a couple of weaknesses in its category structure (we detail these with examples in Section 2). One is that it has loops in the category hierarchy, and the other is that a significant number of categories are unreasonably generalized. These weaknesses were similarly identified in previous work [7]. General methods of extracting domain-terms from knowledge such as Princeton WordNet [18] use a vertical search (top-down or bottom-up) that chooses a representative term (e.g., science) covering a field of interest and extracts as domain-terms all of the terms (e.g., natural science, life science, biology, botany, ecology, genetic science, morphology, anatomy, biomedical science, medical science, information science, and natural language processing) contained under the representative term [19]. Because of the weaknesses identified above, such methods cannot be applied to WP. This research proposes a horizontal method to resolve the difficulties of a vertical search. The method requires one domain category as input (multiple categories are possible, but we consider only the single case in this paper). The entry category contains many articles. We call these domain articles, and each domain article is involved in one or more categories. We consider the categories connected to the domain articles as candidate categories that can be deeply related to the entry category. Then, our method measures the similarity between the domain category and the candidate category. If the similarity matches or exceeds a predetermined threshold, the candidate and its articles are added to the domain category group and the domain article group, respectively. The method generates a similar category group and a domain terminology group through iterative processes, and we evaluate its category grouping and term collection performance.

The remainder of this paper is organized as follows. Section 2 describes our motivation for this research. Section 3 proposes the domain-term collection method through domain category grouping. Section 4 presents experimental results and evaluates the performance of the proposed approach, and finally, in Section 5, we summarize our research.

2. Motivation

Many applications employ various WP components for semantic information processing. WP is an agglomeration of knowledge that has been cultivated by contributors from diverse fields; thus, its content has wide coverage and high reliability. In particular, the hierarchical structure of categories and the semantic networks between articles appear similar to the human knowledge system. These strengths allow WP to be widely used; unfortunately, however, additional processes are needed. We indicate a couple of weaknesses of WP in this section. Box 1 shows a case of loop relations in the hierarchical structure, which represents one of the weaknesses.

The category "Natural language processing" has "Concepts" as a supercategory, and each "Concept" has itself as one of its superconcepts a few steps later. The WP hierarchy contains many loop cases, and this poses difficulties during a vertical search. Even if this were resolved programmatically, there would be another obstacle, as shown in Box 2.

Box 2 enumerates the supercategories of "Natural language processing" after its loop cases have been removed. The initial category is a computer science technology, but this soon becomes connected to "Mind", "Marxism", "Humans", "Taxonomy", "Classification systems", "Libraries", "Collective intelligence", "Internet", "World", and "People". Some of the connections are appropriate, but others suffer from excessive generalization between categories. We call this inappropriate generalization, and it causes undesirable categories and terms to be collected in a domain category during a vertical search. Therefore, we propose a method of searching horizontally for related categories.

This research considers a category of interest as the entry domain category and measures the similarity of article intersections between this domain and a candidate category. If the similarity is equal to or exceeds a predetermined threshold, the candidate is used as an element of the domain set. Some well-known similarity measures for the degree of article intersection, such as the Jaccard similarity coefficient (JSC) or the Dice coefficient (DC), have significant limitations, as shown in Table 1.

Each case consists of two categories, and we wish to determine whether the candidate can be added to the domain set. In the first case, the categories have the same number of articles, 50 of which are shared as an intersection set. The similarity values are 0.333 and 0.5 according to the JSC and the DC, respectively. In the second case, almost all of the candidate articles are included in the domain category, but the similarities are lower than those of the first case. We believe that the second case should have higher similarity values, because the initial domain category was chosen by the user, which means that the domain has our trust. However, existing methods cannot provide a suitable measure of similarity. In this paper, we suggest simple new measurements to mitigate this limitation.
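This limitation is easy to reproduce. Below is a minimal sketch that recomputes the JSC and DC values of Table 1; the article sets themselves are hypothetical stand-ins with the stated sizes and overlaps:

```python
def jaccard(a, b):
    """Jaccard similarity coefficient: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def dice(a, b):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))

# Case 1: two categories of 100 articles each, sharing 50.
domain1    = {f"d{i}" for i in range(100)}
candidate1 = {f"d{i}" for i in range(50)} | {f"c{i}" for i in range(50)}
print(round(jaccard(domain1, candidate1), 3))  # 0.333
print(round(dice(domain1, candidate1), 3))     # 0.5

# Case 2: 18 of the candidate's 20 articles lie inside the 100-article domain.
domain2    = {f"d{i}" for i in range(100)}
candidate2 = {f"d{i}" for i in range(18)} | {f"c{i}" for i in range(2)}
print(round(jaccard(domain2, candidate2), 3))  # 0.176
print(round(dice(domain2, candidate2), 3))     # 0.3
```

Both measures score Case 2 lower than Case 1, even though almost the whole candidate falls inside the trusted domain.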

3. Domain-Term Extraction Based on Category Grouping

Articles included in the same WP category describe similar content. However, WP assigns many similar categories (e.g., "Word sense disambiguation", "Ontology learning", and "Data mining") to a single category (e.g., "Natural language processing"). To collect domain-terms that have wide coverage, we suggest a bootstrapping method that remedies the weaknesses of WP and improves on the existing similarity


Natural language processing → Computational linguistics → Natural language and computing → Human-computer interaction → Artificial intelligence → Futurology → Social change → Social philosophy → Human sciences → Interdisciplinary fields → Science → Knowledge → Concepts → Mental content → Consciousness → Philosophy of mind → Conceptions of self → Concepts → ⋯

Box 1: Case of a loop in the WP category hierarchy ("Concepts" is the same category concept, but it occurs iteratively in the hierarchy).

Natural language processing → Computational linguistics → Natural language and computing → ⋯ → Artificial intelligence → ⋯ → Human sciences → Social sciences → ⋯ → Science → Knowledge → Concepts → Mental content → Consciousness → Mind → Concepts in metaphysics → ⋯ → Form → Ontology → Reality → ⋯ → Scientific observation → Data collection → ⋯ → Probability → Philosophical logic → ⋯ → Neo-Marxism → Marxism → ⋯ → Reasoning → Intelligence → ⋯ → Humans → ⋯ → Phyla → Taxonomic categories → Taxonomy → Classification systems → Library science → Libraries → ⋯ → Human welfare organizations → Non-profit organizations by beneficiaries → Non-profit organizations → ⋯ → Constitutions → Legal documents → Documents → ⋯ → Crowd psychology → Collective intelligence → ⋯ → Cyberspace → Internet → Wide area networks → ⋯ → World → ⋯ → People

Box 2: Case of inappropriate generalization in the WP category hierarchy.

Table 1: Similarity issues for bootstrapping methods.

Case  Categories   Count of articles  |C1 ∩ C2|  JSC    DC
1     Domain 1     100                50         0.333  0.5
      Candidate 1  100
2     Domain 2     100                18         0.176  0.3
      Candidate 2  20

measurements mentioned in the previous section. We now describe these processes in detail using real examples.

3.1. System Flow. The proposed method takes one category, which the user selects as an entry (trigger), and follows the flowchart shown in Figure 1. Starting from the entry category, we determine similar categories through a horizontal search. In the search, articles included in the entry act as "clues" for measuring the similarity and "bridges" for preparing the next candidate category. Figure 2 illustrates an example of a category-article network. If the category "Natural language processing" is given as the entry, the method finds articles for similarity measurement and prepares the categories of each article as the next candidates. This means that "Information science", "Knowledge representation", "Machine learning", "Artificial intelligence applications", "Data mining", and so forth are processed individually as candidate categories. We now explain the process shown in Figure 1 using similar examples.

3.2. Domain-Term Selection through Category Grouping (Bootstrapping Method). To group similar categories, we choose a horizontal category search and propose new similarity measurements that enable the group to be enriched. The bootstrapping process proceeds as follows.

(1) An initial domain category (DC) consists of a user-selected category: DC = {user-selected category}. For example, DC = {Natural language processing}. The length of DC increases throughout the iterative process.

(2) The domain articles (DA) of one category are collected. At first, DA consists of the articles of the entry, but it gradually becomes enriched with articles from new domain categories. Explicitly, DA = {(art, dist, count, dw)_i}, 1 ≤ i ≤ n, where art, dist, count, and dw denote an article, a distance from an entry category, an accumulated (overlapped) count, and a domain-term weight, respectively. The initial elements of DA have values of 1, 1, and 1.0 for dist, count, and weight, respectively. For example, DA = {(Concept mining, 1, 1, 1.0), (Information retrieval, 1, 1, 1.0), (Language guessing, 1, 1, 1.0), (Stemming, 1, 1, 1.0), (Content determination, 1, 1, 1.0), ⋯}.

(3) There are two options, depending on whether an article-link network is used in the similarity measurements. We explain the options using Figure 3 and Table 2. Figure 3 shows the network between the categories "Natural language processing" and "Data mining", whereas Table 2 defines the network types.

(i) The first option considers only category-article networks, such as {Content determination, Information retrieval, Languageware, Concept mining, Document classification, Text mining, Automatic summarization, String kernel, Sentic computing, ⋯} for "Natural language processing". Type 1 in Table 2 is related to this option.

(ii) The second option uses more complex networks that utilize the category-article network as well

[Figure 1: Flowchart for domain category grouping and domain-term selection.]

[Figure 2: Example of a category-article network.]

[Figure 3: Network structure between the categories "Natural language processing" and "Data mining". Solid lines and dotted lines represent the category-article network and the article-link network, respectively, whereas (C) denotes a category. Here, a link is a connection between WP articles.]

as article links. Types 2 and 3 in Table 2 represent this option.

If the first option is chosen, the method goes to Step 4; otherwise, it goes to Step 11.

(4) Candidate categories (CC) are collected using one article of DA. If DC already contains a candidate, it is not included in CC. Explicitly, CC(art_i) = {cat_j}, 1 ≤ j ≤ m. For example, CC(Concept mining) = {Data mining, Artificial intelligence applications}.

(5) To prepare clues that indicate suitable categories, a set of candidate articles (CA) of one candidate category is formed. For example, CA(Data mining) = {Extension neural network, Big data, Data classification (business intelligence), Document classification, Web mining, Text mining, Concept mining, ⋯}. If there is an intersection article, then an intersection set IS is constructed to measure the similarity: IS(set_i, set_j) = (set_i ∩ set_j) ⇒ {ia_k}, 1 ≤ k ≤ p. For example, IS(Natural language processing, Data mining) = {Languageware, Concept mining, Document classification, Text mining}.

(6) To eliminate the limitation described in Section 2, we propose a new similarity measurement:

sim(DA, CA) = α × |IS(DA, CA)| / |DA| + (1 − α) × |IS(DA, CA)| / |CA|,   (1)


Table 2: Network types and examples. C denotes a category, A an article, and L a link.

Type  Semantic network                Example of the network
1     C1 → (A1 = A2) ← C2             Natural language processing → Document classification ← Data mining
2     C1 → A1 → (L1 = A2) ← C2        Natural language processing → Information retrieval → Text corpus ← Data mining
      C1 → (A1 = L2) ← A2 ← C2        Natural language processing → Automatic summarization ← Text mining ← Data mining
3     C1 → A1 → (L1 = L2) ← A2 ← C2   Natural language processing → Information retrieval → Information ← Data visualization ← Data mining

where we basically assign a value of 0.5 to α. Based on (1), we can calculate the similarities of Cases 1 and 2 in Table 1 to obtain values of 0.5 and 0.54, respectively. However, we must consider an additional constraint in the bootstrapping method. Table 3 shows another example. Based on (1), we find that both cases have the same similarity value, as shown in Table 3. Even so, Case 2 is inappropriate because the coverage of DA is too narrow; this may cause the generalization problem. Thus, before calculating similarities in the bootstrapping method, the similarity constraint

|DA| ≥ |CA|   (2)

should be satisfied. According to this constraint and (1), the similarity between "Natural language processing" and "Data mining" is calculated as follows: |DA| = 117, |CA| = 93, |IS(DA, CA)| = 4, and sim(Natural language processing, Data mining) = (4/117 + 4/93)/2 = 0.039.
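Equation (1) together with constraint (2) can be sketched as follows; the sets are synthetic stand-ins sized to match the worked "Natural language processing"/"Data mining" example (|DA| = 117, |CA| = 93, 4 shared articles):

```python
def sim(da, ca, alpha=0.5):
    """Similarity (1): weighted sum of the share of each side that the
    intersection covers. Constraint (2) requires |DA| >= |CA|."""
    if len(da) < len(ca):
        return None  # constraint (2) violated: candidate is rejected
    inter = len(da & ca)
    return alpha * inter / len(da) + (1 - alpha) * inter / len(ca)

# Synthetic stand-ins: 117 domain articles, 93 candidate articles, 4 shared.
da = {f"dom{i}" for i in range(117)}
ca = {f"dom{i}" for i in range(4)} | {f"cand{i}" for i in range(89)}
print(round(sim(da, ca), 3))  # 0.039, matching the worked example
```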

(7) If the similarity exceeds a predetermined threshold, go to Step 8; otherwise, we skip Step 8 and proceed to Step 9.

(8) If the candidate category has a similarity that is greater than the threshold, the system considers the candidate to belong to the domain and inserts the candidate and its articles into DC and DA, respectively. Here, new articles are

Table 3: Examples of similarity measurement.

Case  Categories  Count of articles  |C1 ∩ C2|  sim
1     DA1         400                100        0.625
      CA1         100
2     DA2         100                100        0.625
      CA2         400

accompanied by supplementary values of dist, count, and dw. Specifically, dist is the category distance from the entry category (in the case of "Data mining", dist is 2), count is 1, and dw is the similarity calculated in Step 6. If DA already contains an article, the original supplementary values are increased by adding the new ones. These values are used later for domain-term selection.
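Step 8's bookkeeping can be sketched with a dictionary from article to its (dist, count, dw) triple; the article names and the dictionary representation are illustrative, not the authors' implementation:

```python
def absorb(da, article, dist, dw):
    """Insert a newly accepted article into DA (Step 8). If the article is
    already present, its supplementary values are increased by adding the
    new ones, as the text specifies; count grows by 1 per re-encounter."""
    if article in da:
        d0, c0, w0 = da[article]
        da[article] = (d0 + dist, c0 + 1, w0 + dw)
    else:
        da[article] = (dist, 1, dw)

da = {"Concept mining": (1, 1, 1.0)}    # seeded from the entry category
absorb(da, "Big data", 2, 0.039)        # new article from an absorbed category
absorb(da, "Concept mining", 2, 0.039)  # re-encountered: values accumulate
print(da["Big data"])                   # (2, 1, 0.039)
```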

(9) If there is at least one element in CC, we return to Step 4 for the next candidate category; otherwise, proceed to Step 10.

(10) If there is at least one element in DA, we return to Step 2 for the new CC of the next domain article; otherwise, proceed to Step 19.

(11) To use the links of articles as additional clues for the similarity measurement, a domain link set (DLS) is collected: DLS(DA) = {link_dom_j}, 1 ≤ j ≤ m. For example, DLS = {Information, Metadata, Relational database, World Wide Web, Data (computing), Document retrieval, ⋯}.

(12) This step is the same as Step 4.

(13) This step is the same as Step 5.

(14) Construct a candidate link set (CLS) with links from CA. Explicitly, CLS(CA(cat_k)) = {link_cat_t}, 1 ≤ t ≤ q. For example, CLS(CA(Data mining)) = {Computer science, Data set, Artificial intelligence, Database management system, Business intelligence, Neural network, Cluster analysis, ⋯}.

(15) If the similarity constraint is satisfied, (1) is applied to determine the category-article similarity. We use additional similarity measurements for article-link networks. The network can have different degrees of relatedness according to the network type (see Table 2). If two categories have complex, close connections, the similarity should be greater because of common features in the neighborhood. To apply both characteristics, we propose a further similarity measure:

sim(set_i, set_j) = (|IS(set_i, set_j)| / distance) × (1 / (|set_i| + |set_j|)),   (3)

where distance is the number of articles that exist on the network (this is different from dist in DA). According to (3), we can calculate sim(DLS, CA),


sim(DA, CLS), and sim(DLS, CLS), which have distances of 2, 2, and 3, respectively. The final similarity (final_sim) between DA and CA is determined by summing sim(DA, CA) from (1) with sim(DLS, CA), sim(DA, CLS), and sim(DLS, CLS) from (3). The similarity between "Natural language processing" and "Data mining" can thus be calculated as follows:

|DA| = 117, |CA| = 93,
|DLS| = 3633, |CLS| = 3220,
|IS(DA, CA)| = 4, |IS(DLS, CA)| = 91,
|IS(DA, CLS)| = 63, |IS(DLS, CLS)| = 6771,

sim(DA, CA) = (4/117 + 4/93)/2 = 0.039,
sim(DLS, CA) = (91/2)/(3633 + 93) = 0.012,
sim(DA, CLS) = (63/2)/(117 + 3220) = 0.009,
sim(DLS, CLS) = (6771/3)/(3633 + 3220) = 0.3293,

final_sim(Natural language processing, Data mining) = 0.3893.   (4)
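The arithmetic of (4) can be checked directly from the reported set sizes; note that the reported 0.3893 is the sum of the rounded components (the unrounded sum is approximately 0.3896):

```python
def sim_link(inter, size_i, size_j, distance):
    """Similarity (3): (|IS| / distance) * 1 / (|set_i| + |set_j|)."""
    return inter / distance / (size_i + size_j)

DA, CA, DLS, CLS = 117, 93, 3633, 3220  # set sizes from the worked example
sim_da_ca   = (4 / DA + 4 / CA) / 2     # equation (1) with alpha = 0.5
sim_dls_ca  = sim_link(91, DLS, CA, 2)
sim_da_cls  = sim_link(63, DA, CLS, 2)
sim_dls_cls = sim_link(6771, DLS, CLS, 3)
print(round(sim_da_ca, 3))    # 0.039
print(round(sim_dls_ca, 3))   # 0.012
print(round(sim_da_cls, 3))   # 0.009
print(round(sim_dls_cls, 4))  # 0.3293
final_sim = round(sim_da_ca, 3) + round(sim_dls_ca, 3) \
          + round(sim_da_cls, 3) + round(sim_dls_cls, 4)
print(round(final_sim, 4))    # 0.3893
```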

(16) If the similarity exceeds the predetermined threshold, go to Step 17; otherwise, Step 18 is carried out.

(17) If the similarity exceeds the threshold, the system enriches DC, DA, and DLS. This step is similar to Step 8.

(18) If there is at least one element in CC, we return to Step 12 for the next candidate category; otherwise, go to Step 10.

(19) Output the DC and DA acquired through the bootstrapping process.

(20) Terminate the bootstrapping process and evaluate DC and DA, which are enlarged in Step 8 or 17. For DA, the evaluations are divided into three types by supplementary value (dist, count, and dw). In the case of domain weight, we use values normalized according to

weight′(art_i) = weight(art_i) / max_{art_k ∈ DA} weight(art_k).   (5)
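Normalization (5) divides each accumulated weight by the maximum weight in DA; a sketch, reusing the illustrative (dist, count, dw) dictionary form (the article names and values are hypothetical):

```python
def normalize_weights(da):
    """Equation (5): rescale every domain-term weight dw by the maximum
    weight in DA, so the strongest article gets weight 1.0."""
    max_w = max(w for (_, _, w) in da.values())
    return {art: (d, c, w / max_w) for art, (d, c, w) in da.items()}

da = {"RDF": (2, 3, 1.5), "SPARQL": (2, 1, 0.6)}
norm = normalize_weights(da)
print(norm["RDF"][2])               # 1.0
print(round(norm["SPARQL"][2], 2))  # 0.4
```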

We have described all of the bootstrapping steps for domain-term selection by grouping similar categories. In the next section, a few aspects of performance are evaluated.
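Putting Steps 1-10 together (the category-article option only), the loop can be sketched over a toy graph. This is a simplified reading of the procedure, not the authors' implementation: the category and article names are invented, the supplementary (dist, count, dw) bookkeeping is omitted, and ties are broken by sorting for determinism:

```python
from collections import deque

def bootstrap(entry, cat2arts, art2cats, threshold, alpha=0.5):
    """Toy sketch of the horizontal bootstrapping loop (Steps 1-10,
    category-article option only). Returns the grouped domain
    categories (DC) and domain articles (DA)."""
    dc = {entry}                   # Step 1: initial domain category
    da = set(cat2arts[entry])      # Step 2: its articles seed DA
    pending = deque(sorted(da))    # articles whose categories remain to probe
    while pending:                 # Steps 9-10: iterate until exhausted
        art = pending.popleft()
        for cand in sorted(art2cats.get(art, set()) - dc):  # Step 4: CC
            ca = cat2arts[cand]    # Step 5: candidate articles
            if len(da) < len(ca):  # constraint (2)
                continue
            inter = len(da & ca)
            s = alpha * inter / len(da) + (1 - alpha) * inter / len(ca)  # (1)
            if s >= threshold:     # Steps 7-8: absorb the candidate
                dc.add(cand)
                new = ca - da
                da |= new
                pending.extend(sorted(new))
    return dc, da

# Invented mini category-article graph.
cat2arts = {
    "NLP": {"parsing", "stemming", "text mining", "IR"},
    "Data mining": {"text mining", "clustering", "IR"},
    "Gardening": {"pruning"},
}
art2cats = {}
for c, arts in cat2arts.items():
    for a in arts:
        art2cats.setdefault(a, set()).add(c)

dc, da = bootstrap("NLP", cat2arts, art2cats, threshold=0.3)
# "Data mining" shares 2 of its 3 articles with NLP's 4, so
# sim = (2/4 + 2/3)/2 ≈ 0.58 >= 0.3 and it joins; "Gardening" never does.
print(sorted(dc))  # ['Data mining', 'NLP']
```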

4. Experimental Evaluations

This section considers the evaluation of DC and DA. One objective of our research is to select as many domain-terms

Table 4: Domain categories collected for "Semantic Web" by NSL with threshold value 0.2.

Category                                            sim   dist  RE
Metadata publishing                                 0.47  2     1
Semantic HTML                                       0.40  2     1
RDF                                                 0.61  2     1
Knowledge engineering                               0.42  2     1
Folksonomy                                          0.27  2     1
Triplestores                                        0.68  2     1
Domain-specific knowledge representation languages  0.37  2     1
RDF data access                                     0.52  2     1
Book swapping                                       0.22  9     0
Rule engines                                        0.28  2     1
Ontology languages                                  0.35  2     1
Knowledge bases                                     0.30  2     1

RE: relevance evaluation.

as possible, for which we proposed the bootstrapping method for similar category grouping. In the process, DC and DA become enriched with categories and articles, respectively, and each article has supplementary values of dist, count, and dw.

To evaluate the quality of DC and DA, we used an article-category dataset and a Pagelinks dataset that are included among the WP components in DBpedia, version 3.7 (DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web; http://dbpedia.org/About). To utilize the WP networks, we implemented two versions of our system, referred to as new similarity (NS) and new similarity with links (NSL). We varied the threshold of each system from 0.1 to 0.9. Moreover, we chose entry sets of 40 categories (each set has one category) from fields of computer science, such as "Natural language processing", "Speech recognition", and "Semantic Web". The results are compared with those of a baseline method that employs the DC (Dice coefficient) similarity measure. Table 4 shows some of the similar categories collected by NSL for the entry category "Semantic Web" with a threshold of 0.2.

We invited domain specialists to examine the results, and each collection was manually checked by each evaluator. The RE field in Table 4 shows the actual checked results, where values of 1 and 0 were ascribed by the evaluators for relevance and irrelevance, respectively. Tables 5 and 6 present summaries of the DC evaluations.

There was no improvement for thresholds greater than 0.6, and the bootstrapping was incomplete for a threshold of 0.1. Therefore, we present results for threshold values from 0.2 to 0.6. In the experiments, the baseline attained an extension rate of only 118% (40 categories extended by 7 categories), even though its precision was 100%. The aim of information processing is to reduce the time taken to accomplish certain objectives, which implies that the information system should

8 International Journal of Distributed Sensor Networks

XHTML + RDFa, Bath Profile, Sidecar file, COinS, Metadata publishing, MARC standards, WizFolio, Qiqqa, ISO-TimeML, TimeML, Metadata Authority Description Schema, Bookends (software), RIS (file format), Metadata Object Description Schema, EndNote, Refer (software), ISO 2709, BibTeX, XML, S5 (file format), Semantic HTML, Simple HTML Ontology Extensions, Opera Show Format, XOXO, XHTML Friends Network, StrixDB, Graph Style Sheets, TriX (syntax), TriG (syntax), RDF feed, Redland RDF Application Framework, RDF query language, Turtle (syntax), RDFLib, Notation3, SPARQL, D3web, Artificial architecture, NetWeaver Developer, Knowledge engineer, Frame language

Box 3: Domain articles (DA) collected for "Semantic Web" on NLS with threshold value 0.2.

Table 5: DC evaluations with baseline and NS.

Type                Baseline           NS
Threshold value     0.2      0.3      0.2      0.3      0.4      0.5      0.6
Extended count      7        —        282      133      87       82       —
Extension rate (%)  118      —        805      433      318      305      —
Appropriate         7        —        241      113      80       74       —
Inappropriate       0        —        41       20       7        8        —
Precision (%)       100.00   —        85.46    84.96    91.95    90.24    —

Table 6: DC evaluations with NSL.

Type                NSL
Threshold value     0.2      0.3      0.4      0.5      0.6
Extended count      938      346      207      134      47
Extension rate (%)  2445     965      618      435      218
Appropriate         799      293      187      127      47
Inappropriate       139      53       20       7        0
Precision (%)       85.18    84.68    90.34    94.78    100.00

Table 7: DA evaluations based on distance, including articles within the indicated distance.

Distance        2      3      4      5      All
Appropriate     1092   1224   1246   1253   1253
Inappropriate   252    336    471    508    514
Precision (%)   81.3   78.5   72.6   71.2   70.9

Table 8: DA evaluations based on count (overlapped).

Overlapped count   1      2      3      4
Appropriate        1253   330    90     15
Inappropriate      514    80     14     1
Precision (%)      70.9   80.5   86.5   93.8

Table 9: DA evaluations based on normalized weight.

Threshold value   0      0.1    0.2    0.3
Appropriate       1253   1195   870    755
Inappropriate     514    292    69     22
Precision (%)     70.9   80.4   92.7   97.2

provide varied results. In this respect, we do not expect the baseline results to be helpful. However, it is apparent that NSL provides wide extension and high precision. The maximum extension rate was 2445%, with 799 appropriate categories, for a threshold of 0.2. The minimum precision was around 84% when the threshold was 0.3.
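Reading the baseline row of Table 5, the derived columns appear to follow extension rate = (40 + extended count) / 40 and precision = appropriate / extended count; this reading is inferred from the reported numbers rather than stated in the text, and it reproduces the Table 6 (NSL) figures:

```python
def extension_rate(extended, entries=40):
    # (entry categories + newly extended categories) / entry categories, in percent
    return round((entries + extended) / entries * 100)

def precision(appropriate, extended):
    # Share of extended categories judged relevant by the evaluators, in percent
    return round(appropriate / extended * 100, 2)

# NSL at threshold 0.2 (Table 6): 938 extended categories, 799 judged appropriate
print(extension_rate(938), precision(799, 938))  # 2445 85.18
```

The same two functions reproduce the baseline row (7 extended, 118%, 100%) and every other column of Tables 5 and 6.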

In addition to evaluating DC, we evaluated DA with the NSL results for a threshold of 0.2. To examine the influence of the distance, count, and domain weight, we analyzed the results according to each factor. Six DAs were selected at random, with a total of 1769 articles (terms). Box 3 enumerates a part of the collected articles for the domain "Semantic Web", and Tables 7, 8, and 9 show the evaluation results with respect to distance, count, and weight.

The basic performance of the domain-term selection attained a precision of 70.9%. As expected, the precision was inversely proportional to the distance; however, a distance of 4 produced almost all of the unrelated articles. The weight and count could be used as important criteria to select domain-terms; we found that the weight returned more refined results than the count (the weight returned 755 appropriate terms with 97.2% precision at a threshold of 0.3). At short distances, there are many names of people and organizations, such as "Squarespace", "Rackspace Cloud", and "Nsite Software (Platform as a Service)". These names were selected by the bootstrapping because their associated categories (e.g., "Cloud platforms", "Cloud storage", and "Cloud infrastructure") were similar to the entry (e.g., "Cloud computing"). This situation is not caused by our method but by the definition of the article-category relations of WP. We believe that this can be resolved by processing content (abstracts) or tabular information in the future.

5. Conclusions and Future Work

This paper has proposed a method of domain-term collection through a bootstrapping process to assist the semantic interpretation of data from sensor networks. To achieve this, we identified weaknesses in the WP category hierarchy


(i.e., loops and inappropriate generalizations) and chose a horizontal rather than vertical category search. We proposed new semantic similarity measurements and a similarity constraint to surpass existing methods. Moreover, we employed category-article networks and article-link networks to elicit information for the category similarity measurement. In performance evaluations, our category grouping based on NSL yielded the greatest number of proper results. In terms of domain-term selection, we confirmed that the results obtained with normalized weights had the best precision and extension rate. The distance-based metric had no positive influence on our research; when the distance was greater than three, almost all of the terms were unrelated. However, we believe that the collected domain terminologies can assist the construction of domain knowledge bases for the semantic interpretation of sensor data.

WP has additional weaknesses to those mentioned in this paper, especially in the category-article relation. For example, the term "Paco Nathan" is a personal name that has "Natural language processing" as one of its categories. The relation between the two, that is, "Paco Nathan" has expertise in "Natural language processing", causes noise and negatively influences semantic information processing. We think that this problem can be solved in future work by processing additional WP components, such as abstract or tabular information. Moreover, our research employed only the out-links of WP articles. If the in-links were considered, we expect that the results would be more significant, with wider coverage of domain-terms and higher relevance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. Keßler and K. Janowicz, "Linking sensor data: why, to what, and how?" in Proceedings of the 3rd International Workshop on Semantic Sensor Networks, pp. 77-91, 2010.

[2] S. Tilak, N. B. Abu-Ghazaleh, and W. Heinzelman, "A taxonomy of wireless micro-sensor network models," ACM SIGMOBILE Mobile Computing and Communications Review, vol. 6, no. 2, pp. 28-36, 2002.

[3] G. Goodwin and D. J. Russomanno, "An ontology-based sensor network prototype environment," in Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pp. 1-2, 2006.

[4] K. Janowicz and M. Compton, "The Stimulus-Sensor-Observation ontology design pattern and its integration into the Semantic Sensor Network ontology," in Proceedings of the 3rd International Workshop on Semantic Sensor Networks, pp. 92-106, 2010.

[5] P. Barnaghi, S. Meissner, M. Presser, and K. Moessner, "Sense and sens'ability: semantic data modelling for sensor networks," in Proceedings of the ICT Mobile Summit, pp. 1-9, 2009.

[6] A. Bröring, P. Maué, K. Janowicz, D. Nüst, and C. Malewski, "Semantically-enabled sensor plug & play for the sensor web," Sensors, vol. 11, no. 8, pp. 7568-7605, 2011.

[7] M. Hwang, D. Choi, and P. Kim, "A method for knowledge base enrichment using Wikipedia document information," Information, vol. 13, no. 5, pp. 1599-1612, 2010.

[8] P. Velardi, A. Cucchiarelli, and M. Petit, "A taxonomy learning method and its application to characterize a scientific web community," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 2, pp. 180-191, 2007.

[9] H. Avancini, A. Lavelli, B. Magnini, F. Sebastiani, and R. Zanoli, "Expanding domain-specific lexicons by term categorization," in Proceedings of the ACM Symposium on Applied Computing, pp. 793-797, March 2003.

[10] J. J. Jung, Y. H. Yu, and K. S. Jo, "Collaborative web browsing based on ontology learning from bookmarks," in Proceedings of the International Conference on Computational Science, pp. 513-520, 2004.

[11] J. J. Jung, "Computational reputation model based on selecting consensus choices: an empirical study on semantic wiki platform," Expert Systems with Applications, vol. 39, no. 10, pp. 9002-9007, 2012.

[12] J. Lehmann and L. Bühmann, "ORE: a tool for repairing and enriching knowledge bases," in Proceedings of the 9th International Semantic Web Conference, pp. 177-193, 2010.

[13] J. Liu, Y.-C. Liu, W. Jiang, and X.-L. Wang, "Research on automatic acquisition of domain terms," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 3026-3031, July 2008.

[14] S. Minocha and P. G. Thomas, "Collaborative learning in a wiki environment: experiences from a software engineering course," New Review of Hypermedia and Multimedia, vol. 13, no. 2, pp. 187-209, 2007.

[15] R. Navigli and P. Velardi, "Ontology enrichment through automatic semantic annotation of on-line glossaries," in Proceedings of Knowledge Engineering and Knowledge Management, pp. 126-140, 2006.

[16] C. Müller and I. Gurevych, "Using Wikipedia and Wiktionary in domain-specific information retrieval," in Proceedings of the 9th Cross-Language Evaluation Forum Conference on Evaluating Systems for Multilingual and Multimodal Information Access, pp. 219-226, 2009.

[17] J. Vivaldi and H. Rodríguez, "Finding domain terms using Wikipedia," in Proceedings of the 7th International Conference on Language Resources and Evaluation, pp. 386-393, 2010.

[18] G. A. Miller, "WordNet: a lexical database for English," Communications of the ACM, vol. 38, no. 11, pp. 39-41, 1995.

[19] S. Lee, S.-Y. Huh, and R. D. McNiel, "Automatic generation of concept hierarchies using WordNet," Expert Systems with Applications, vol. 35, no. 3, pp. 1132-1144, 2008.



is a popular area of research [8-15]. Nowadays, Wikipedia (WP) and similar repositories are widely employed as information sources [7, 16, 17]. WP contains diverse forms to explain concepts (hereafter, we use "concept", "term", and "article" with the same meaning), such as abstract information (specific and long definitions), tabular information, the main article content, article links, and category information. The term "article" is generally used in WP, but it also means "title of article"; in this paper, we use "article" and "term" interchangeably. Moreover, WP provides highly reliable and widely used content because it is based on semantic information from the collective intelligence of contributors worldwide. However, WP has a couple of weaknesses in its category structure (we detail these with examples in Section 2). One is that it has loops in the category hierarchy, and the other is that a significant number of categories are unreasonably generalized. These weaknesses were similarly identified in previous work [7]. General methods of extracting domain-terms from knowledge such as Princeton WordNet [18] use a vertical search (top-down or bottom-up) that chooses a representative term (e.g., science) covering a field of interest and extracts as domain-terms all of the terms (e.g., natural science, life science, biology, botany, ecology, genetic science, morphology, anatomy, biomedical science, medical science, information science, and natural language processing) contained under the representative term [19]. Because of the weaknesses identified above, such methods cannot be applied to WP. This research proposes a horizontal method to resolve the difficulties of a vertical search. The method requires one domain category as input (multiple categories are possible, but we consider only the single case in this paper). The entry category contains many articles; we call these domain articles, and each domain article is involved in one or more categories. We consider the categories connected to the domain articles as candidate categories that can be deeply related to the entry category. Then, our method measures the similarity between the domain category and the candidate category. If the similarity matches or exceeds a predetermined threshold, the candidate and its articles are added to the domain category group and the domain article group, respectively. The method generates a similar category group and a domain terminology group through iterative processes, and we evaluate its category grouping and term collection performance.

The remainder of this paper is organized as follows. Section 2 describes our motivation for this research. Section 3 proposes the domain-term collection method through domain category grouping. Section 4 presents experimental results and evaluates the performance of the proposed approach, and finally, in Section 5, we summarize our research.

2. Motivation

Many applications employ various WP components for semantic information processing. WP is an agglomeration of knowledge that has been cultivated by contributors from diverse fields; thus, its content has wide coverage and high reliability. In particular, the hierarchical structure of categories and the semantic networks between articles appear similar to the human knowledge system. These strengths allow WP to be widely used; however, unfortunately, additional processes are needed. We indicate a couple of weaknesses of WP in this section. Box 1 shows a case of loop relations in the hierarchical structure, which represents one of the weaknesses.

The category "Natural language processing" has "Concepts" as a supercategory, and each "Concept" has itself as one of the superconcepts a few steps later. The WP hierarchy contains many loop cases, and this poses difficulties during a vertical search. Even if this were resolved programmatically, there would be another obstacle, as shown in Box 2.

Box 2 enumerates the supercategories of "Natural language processing" after its loop cases have been removed. The initial category is a computer science technology, but this soon becomes connected to "Mind", "Marxism", "Humans", "Taxonomy", "Classification systems", "Libraries", "Collective intelligence", "Internet", "World", and "People". Some of the connections are appropriate, but others suffer from excessive generalization between categories. We call this inappropriate generalization, and it causes undesirable categories and terms to be collected into a domain category during a vertical search. Therefore, we propose a method of searching horizontally for related categories.

This research considers a category of interest as the entry domain category and measures the similarity of article intersections between this domain and the other candidate category. If the similarity is equal to or exceeds a predetermined threshold, the candidate is used as an element of the domain set. Some well-known similarity measures for the degree of article intersection, such as the Jaccard similarity coefficient (JSC) or the Dice coefficient (DC), have significant limitations, as shown in Table 1.

Each case consists of two categories, and we wish to determine whether the candidate can be added to the domain set. In the first case, the categories have the same number of articles, 50 of which are shared as an intersection set. The similarity values are 0.333 and 0.5 according to the JSC and the DC, respectively. In the second case, almost all of the candidate articles are included in the domain category, but the similarities are lower than those of the first case. We believe that the second case should have higher similarity values, because the initial domain category was chosen by the user, which means that the domain has our trust. However, existing methods cannot provide a suitable measure of similarity. In this paper, we suggest simple new measurements to mitigate this limitation.
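The limitation illustrated in Table 1 can be reproduced in a few lines; the set sizes are taken from the table, and jaccard and dice are the standard definitions of the two coefficients:

```python
def jaccard(a, b, inter):
    # Jaccard similarity coefficient: |A ∩ B| / |A ∪ B|
    return inter / (a + b - inter)

def dice(a, b, inter):
    # Dice coefficient: 2|A ∩ B| / (|A| + |B|)
    return 2 * inter / (a + b)

# Case 1: two categories of 100 articles each, sharing 50
print(round(jaccard(100, 100, 50), 3), round(dice(100, 100, 50), 2))  # 0.333 0.5
# Case 2: 18 of the candidate's 20 articles already lie in the domain
print(round(jaccard(100, 20, 18), 3), round(dice(100, 20, 18), 2))    # 0.176 0.3
```

Both coefficients score Case 2 lower than Case 1, even though 90% of the candidate is contained in the trusted domain; this is exactly the behavior the new measurement in Section 3 is designed to correct.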

3. Domain-Term Extraction Based on Category Grouping

Articles included in the same WP category describe similar content. However, WP assigns many similar categories (e.g., "Word sense disambiguation", "Ontology learning", and "Data mining") to a single category (e.g., "Natural language processing"). To collect domain-terms that have wide coverage, we suggest a bootstrapping method that remedies the weaknesses of WP and improves on the existing similarity


Natural language processing → Computational linguistics → Natural language and computing → Human-computer interaction → Artificial intelligence → Futurology → Social change → Social philosophy → Human sciences → Interdisciplinary fields → Science → Knowledge → Concepts → Mental content → Consciousness → Philosophy of mind → Conceptions of self → Concepts → ⋯

Box 1: Case of a loop in the WP category hierarchy (bold represents the same category concept, which occurs iteratively in the hierarchy).

Natural language processing → Computational linguistics → Natural language and computing → ⋯ → Artificial intelligence → ⋯ → Human sciences → Social sciences → ⋯ → Science → Knowledge → Concepts → Mental content → Consciousness → Mind → Concepts in metaphysics → ⋯ → Form → Ontology → Reality → ⋯ → Scientific observation → Data collection → ⋯ → Probability → Philosophical logic → ⋯ → Neo-Marxism → Marxism → ⋯ → Reasoning → Intelligence → ⋯ → Humans → ⋯ → Phyla → Taxonomic categories → Taxonomy → Classification systems → Library science → Libraries → ⋯ → Human welfare organizations → Non-profit organizations by beneficiaries → Non-profit organizations → ⋯ → Constitutions → Legal documents → Documents → ⋯ → Crowd psychology → Collective intelligence → ⋯ → Cyberspace → Internet → Wide area networks → ⋯ → World → ⋯ → People

Box 2: Case of inappropriate generalization in the WP category hierarchy (bold represents the same category concept, which occurs iteratively in the hierarchy).

Table 1: Similarity issues for bootstrapping methods.

Cases   Categories     Count of articles   |C1 ∩ C2|   JSC     DC
1       Domain 1       100                 50          0.333   0.5
        Candidate 1    100
2       Domain 2       100                 18          0.176   0.3
        Candidate 2    20

measurements mentioned in the previous section. We now describe these processes in detail using real examples.

3.1. System Flow. The proposed method takes one category, which the user selects as an entry (trigger), and follows the flowchart shown in Figure 1. Starting from the entry category, we determine similar categories through a horizontal search. In the search, articles included in the entry act as "clues" for measuring the similarity and "bridges" for preparing the next candidate category. Figure 2 illustrates an example of a category-article network. If the category "Natural language processing" is given as the entry, the method finds articles for similarity measurement and prepares the categories of each article as the next candidates. This means that "Information science", "Knowledge representation", "Machine learning", "Artificial intelligence applications", "Data mining", and so forth are processed individually as candidate categories. We now explain the process shown in Figure 1 using similar examples.

3.2. Domain-Term Selection through Category Grouping (Bootstrapping Method). To group similar categories, we choose a horizontal category search and propose new similarity measurements that enable the group to be enriched. The bootstrapping process proceeds as follows.

(1) An initial domain category (DC) consists of a user-selected category: DC = {user selected category}. For example, DC = {Natural language processing}. The length of DC increases throughout the iterative process.

(2) The domain articles (DA) of one category are collected. At first, DA consists of the articles of the entry, but it gradually becomes enriched with articles from new domain categories. Explicitly, DA = {(art, dist, count, dw)_i}, 1 ≤ i ≤ n, where art, dist, count, and dw denote an article, a distance from the entry category, an accumulated (overlapped) count, and a domain-term weight, respectively. The initial elements of DA have values of 1, 1, and 1.0 for dist, count, and dw, respectively. For example, DA = {(Concept mining, 1, 1, 1.0), (Information retrieval, 1, 1, 1.0), (Language guessing, 1, 1, 1.0), (Stemming, 1, 1, 1.0), (Content determination, 1, 1, 1.0), ...}.
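The DA bookkeeping described in Step 2 can be sketched as a dictionary keyed by article title; the field names (dist, count, dw) follow the definitions above, while the class itself is only an illustrative assumption about how the tuples might be stored:

```python
from dataclasses import dataclass

@dataclass
class DomainArticle:
    dist: int    # category distance from the entry category
    count: int   # how many similar categories contributed this article
    dw: float    # accumulated domain-term weight

# Initial DA for the entry "Natural language processing":
# every article of the entry starts with dist = 1, count = 1, dw = 1.0.
DA = {title: DomainArticle(1, 1, 1.0)
      for title in ("Concept mining", "Information retrieval",
                    "Language guessing", "Stemming", "Content determination")}
print(DA["Stemming"])  # DomainArticle(dist=1, count=1, dw=1.0)
```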

(3) There are two options to choose, namely, whether an article-link network is used in the similarity measurements. We explain the options using Figure 3 and Table 2. Figure 3 shows the network between the categories "Natural language processing" and "Data mining", whereas Table 2 defines the network types.

(i) The first option considers only category-article networks, such as {Content determination, Information retrieval, Languageware, Concept mining, Document classification, Text mining, Automatic summarization, String kernel, Sentic computing, ...} for "Natural language processing". Type 1 in Table 2 is related to this option.

(ii) The second option uses more complex networksthat utilize the category-article network as well


[Figure 1 residue; the recoverable flowchart nodes are: Start; (1) Entry category (user-interested category); (2) Collecting domain articles; (3) Link type = 1?; (4) Collecting candidate categories; (5) Collecting candidate articles; (6)-(10) similarity measurement against the threshold value and movement of candidates to DC and DA; (11) Collecting links of domain articles; (12) Collecting candidate categories; (13) Collecting candidate articles; (14) Collecting links of candidate articles; (15)-(18) similarity measurement and movement of candidates to DC, DA, and DLS; (19) Print DC and DA; Evaluation (domain category and domain term); End.]

Figure 1: Flowchart for domain category grouping and domain-term selection.


[Figure 2 residue; the recoverable nodes are the categories (C) Natural language processing, (C) Computational linguistics, (C) Data mining, (C) Artificial intelligence applications, (C) Information science, and (C) Machine learning, and the articles (A) Concept mining, (A) Document classification, (A) Automatic summarization, and (A) Knowledge representation.]

Figure 2: Example of a category-article network.

[Figure 3 residue; the recoverable nodes are the categories (C) Natural language processing and (C) Data mining, and the articles Languageware, Concept mining, Document classification, Text mining, Text corpus, Information retrieval, Content determination, String kernel, Cluster analysis, Automatic summarization, Information, Data visualization, Ontology, Sentic computing, and Biomedical text mining.]

Figure 3: Network structure between the categories "Natural language processing" and "Data mining". Solid lines and dotted lines represent the category-article network and the article-link network, respectively, whereas (C) denotes a category. Here, a link is a connection between WP articles.

as article links. Types 2 and 3 in Table 2 represent this option.

If the first option is chosen, the method goes to Step 4; otherwise, it goes to Step 11.

(4) Candidate categories (CC) are collected using one article of DA. If DC already contains a candidate, it is not included in CC. Explicitly, CC(art_i) = {cat_j}, 1 ≤ j ≤ m. For example, CC(Concept mining) = {Data mining, Artificial intelligence applications}.

(5) To prepare clues that indicate suitable categories, a set of candidate articles (CA) of one candidate category is formed. For example, CA(Data mining) = {Extension neural network, Big data, Data classification (business intelligence), Document classification, Web mining, Text mining, Concept mining, ...}. If there is an intersection article, then an intersection set IS is constructed to measure the similarity: IS(set_i, set_j) = (set_i ∩ set_j) ⇒ {ia_k}, 1 ≤ k ≤ p. For example, IS(Natural language processing, Data mining) = {Languageware, Concept mining, Document classification, Text mining}.

(6) To eliminate the limitation described in Section 2, we propose a new similarity measurement:

sim(DA, CA) = α × |IS(DA, CA)| / |DA| + (1 − α) × |IS(DA, CA)| / |CA|,    (1)


Table 2: Network types and examples. C denotes a category, A an article, and L a link.

Type   Semantic network                  Example of the network
1      C1 → (A1 = A2) ← C2               Natural language processing → Document classification ← Data mining
2      C1 → A1 → (L1 = A2) ← C2          Natural language processing → Information retrieval → Text corpus ← Data mining
       C1 → (A1 = L2) ← A2 ← C2          Natural language processing → Automatic summarization ← Text mining ← Data mining
3      C1 → A1 → (L1 = L2) ← A2 ← C2     Natural language processing → Information retrieval → Information ← Data visualization ← Data mining

Bold represents the category concepts connected with super- and subrelations but with inappropriate generalization in the hierarchical structure.

where basically we assign a value of 0.5 to α. Based on (1), we can calculate the similarities of Cases 1 and 2 in Table 1 to obtain values of 0.5 and 0.54, respectively. However, we must consider an additional constraint in the bootstrapping method. Table 3 shows another example: based on (1), both cases have the same similarity value. Even so, Case 2 is inappropriate because the coverage of DA is too narrow, which may cause the generalization problem. Thus, before calculating similarities in the bootstrapping method, the similarity constraint

|DA| ≥ |CA|    (2)

should be satisfied. According to this constraint and (1), the similarity between "Natural language processing" and "Data mining" is calculated as follows: |DA| = 117, |CA| = 93, |IS(DA, CA)| = 4, sim(Natural language processing, Data mining) = (4/117 + 4/93)/2 = 0.039.
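A minimal sketch of the new similarity (1) together with the constraint (2), checked against the running "Natural language processing" versus "Data mining" example (set sizes and intersection count as given in the text):

```python
def sim(da_size, ca_size, inter, alpha=0.5):
    """New similarity (1): intersection ratio against DA and against CA,
    mixed by alpha (0.5 by default, as in the paper)."""
    return alpha * inter / da_size + (1 - alpha) * inter / ca_size

def candidate_ok(da_size, ca_size):
    """Similarity constraint (2): the domain set must not be smaller
    than the candidate set."""
    return da_size >= ca_size

# Running example: |DA| = 117, |CA| = 93, |IS| = 4
assert candidate_ok(117, 93)
print(round(sim(117, 93, 4), 3))  # 0.039
```

With alpha = 0.5, Case 2 of Table 1 now scores 0.54 against 0.5 for Case 1, which is the ordering argued for in Section 2.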

(7) If the similarity exceeds a predetermined threshold, go to Step 8; otherwise, we skip Step 8 and proceed to Step 9.

(8) If the candidate category has a similarity that is greater than the threshold, the system considers the candidate to belong to the domain and inserts the candidate and its articles into DC and DA. Here, new articles are

Table 3: Examples of similarity measurement.

Cases   Categories   Count of articles   |C1 ∩ C2|   sim
1       DA1          400                 100         0.625
        CA1          100
2       DA2          100                 100         0.625
        CA2          400

accompanied by supplementary values of dist, count, and dw. Specifically, dist is the category distance from the entry category (in the case of "Data mining", dist is 2), count is 1, and dw is the similarity calculated in Step 6. If DA already contains an article, the original supplementary values are increased by adding the new ones. These values are used later for domain-term selection.
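The update in Step 8 can be sketched as follows. The text says existing values "are increased by adding the new ones", which clearly fixes the rule for count and dw; the merge rule for dist is not specified, so keeping the shorter distance is our assumption:

```python
def add_article(DA, title, dist, sim):
    """Insert a candidate's article into DA (Step 8).
    New articles get (dist, count=1, dw=sim); for articles already in DA,
    count and dw are accumulated (dist handling is an assumption)."""
    if title in DA:
        old_dist, old_count, old_dw = DA[title]
        DA[title] = (min(old_dist, dist), old_count + 1, old_dw + sim)
    else:
        DA[title] = (dist, 1, sim)

DA = {"Concept mining": (1, 1, 1.0)}          # seeded from the entry category
add_article(DA, "Text corpus", dist=2, sim=0.39)     # new article
add_article(DA, "Concept mining", dist=2, sim=0.39)  # overlapped article
print(DA["Concept mining"])
```

After the two calls, "Concept mining" has count 2 and an accumulated weight of 1.39, which is how overlapped articles come to dominate the count- and weight-based rankings evaluated in Section 4.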

(9) If there is at least one element in CC, we return to Step 4 for the next candidate category; otherwise, proceed to Step 10.

(10) If there is at least one element in DA, we return to Step 2 for the new CC of the next domain article; otherwise, proceed to Step 19.

(11) To use the links of articles as additional clues for the similarity measurement, a domain link set (DLS) is collected: DLS(DA) = {link dom_j}, 1 ≤ j ≤ m. For example, DLS = {Information, Metadata, Relational database, World Wide Web, Data (computing), Document retrieval, ...}.

(12) This step is the same as Step 4.

(13) This step is the same as Step 5.

(14) Construct a candidate link set (CLS) with links from CA. Explicitly, CLS(CA(cat_k)) = {link cat_t}, 1 ≤ t ≤ q. For example, CLS(CA(Data mining)) = {Computer science, Data set, Artificial intelligence, Database management system, Business intelligence, Neural network, Cluster analysis, ...}.

(15) If the similarity constraint is satisfied, (1) is applied to determine the category-article similarity. We use additional similarity measurements for article-link networks. The network can have different degrees of relatedness according to the network type (see Table 2). If two categories have complex close connections, the similarity should be greater because of common features in the neighborhood. To apply both characteristics, we propose a further similarity measure:

sim(set_i, set_j) = (|IS(set_i, set_j)| / distance) × 1 / (|set_i| + |set_j|),    (3)

where distance is the number of articles that exist on the network (this is different from dist in DA). According to (3), we can calculate sim(DLS, CA),


sim(DA, CLS), and sim(DLS, CLS), which have distances of 2, 2, and 3, respectively. The final similarity (final sim) between DA and CA is determined by summing sim(DA, CA) from (1) with sim(DLS, CA), sim(DA, CLS), and sim(DLS, CLS) from (3). The similarity between "Natural language processing" and "Data mining" can thus be calculated as follows:

|DA| = 117, |CA| = 93,
|DLS| = 3633, |CLS| = 3220,
|IS(DA, CA)| = 4, |IS(DLS, CA)| = 91,
|IS(DA, CLS)| = 63, |IS(DLS, CLS)| = 6771,
sim(DA, CA) = (4/117 + 4/93)/2 = 0.039,
sim(DLS, CA) = (91/2)/(3633 + 93) = 0.012,
sim(DA, CLS) = (63/2)/(117 + 3220) = 0.009,
sim(DLS, CLS) = (6771/3)/(3633 + 3220) = 0.3293,
final sim(Natural language processing, Data mining) = 0.3893.    (4)

(16) If the similarity exceeds the predetermined threshold, go to Step 17; otherwise, Step 18 is carried out.

(17) If the similarity exceeds the threshold, the system enriches DC, DA, and DLS. This step is similar to Step 8.

(18) If there is at least one element in CC, we return to Step 12 for the next candidate category; otherwise, go to Step 10.

(19) Output the DC and DA acquired through the bootstrapping process.

(20) Terminate the bootstrapping process and evaluate DC and DA, which are increased in Step 8 or 17. For DA, the evaluations are divided into three types by supplementary value (dist, count, and dw). In the case of domain weight, we use values normalized according to

weight'(art_i) = weight(art_i) / max_{art_k ∈ DA} weight(art_k)   (5)

We have described all of the bootstrapping steps for domain-term selection by grouping similar categories. In the next section, a few aspects of performance are evaluated.
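As a sanity check, the arithmetic of the worked example in (4) and the normalization in (5) can be reproduced in a few lines; this is a sketch using the set sizes and intersection counts quoted in the text, and the article weights in the normalization example are illustrative, not values from the paper.

```python
# Reproduce final_sim from (1), (3), and the counts in the worked example (4).

def category_sim(inter, n_da, n_ca, alpha=0.5):
    # Equation (1): weighted average of the two intersection ratios.
    return alpha * inter / n_da + (1 - alpha) * inter / n_ca

def network_sim(inter, n_i, n_j, distance):
    # Equation (3): intersection size damped by the network distance.
    return inter / distance / (n_i + n_j)

final_sim = (
    category_sim(4, 117, 93)            # sim(DA, CA)
    + network_sim(91, 3633, 93, 2)      # sim(DLS, CA), distance 2
    + network_sim(63, 117, 3220, 2)     # sim(DA, CLS), distance 2
    + network_sim(6771, 3633, 3220, 3)  # sim(DLS, CLS), distance 3
)
print(round(final_sim, 4))  # 0.3896 (the paper reports 0.3893 after rounding each term)

# Equation (5): normalize domain weights by the maximum weight in DA.
def normalize(weights):
    top = max(weights.values())
    return {art: w / top for art, w in weights.items()}

print(normalize({"SPARQL": 0.8, "Triplestores": 0.4}))  # {'SPARQL': 1.0, 'Triplestores': 0.5}
```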

4. Experimental Evaluations

This section considers the evaluation of DC and DA. One objective of our research is to select as many domain terms

Table 4: Domain categories collected for "Semantic Web" on NLS with threshold value 0.2.

Category                                            | sim  | dist | RE
Metadata publishing                                 | 0.47 | 2    | 1
Semantic HTML                                       | 0.40 | 2    | 1
RDF                                                 | 0.61 | 2    | 1
Knowledge engineering                               | 0.42 | 2    | 1
Folksonomy                                          | 0.27 | 2    | 1
Triplestores                                        | 0.68 | 2    | 1
Domain-specific knowledge representation languages  | 0.37 | 2    | 1
RDF data access                                     | 0.52 | 2    | 1
Book swapping                                       | 0.22 | 9    | 0
Rule engines                                        | 0.28 | 2    | 1
Ontology languages                                  | 0.35 | 2    | 1
Knowledge bases                                     | 0.30 | 2    | 1

RE: relevance evaluation.

as possible, for which we proposed the bootstrapping method for similar category grouping. In the process, DC and DA become enriched with categories and articles, respectively, and each article has supplementary values of dist, count, and dw.

To evaluate the quality of DC and DA, we used an article-category dataset and a Pagelinks dataset that are included among the WP components in DBpedia (DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web; http://dbpedia.org/About), version 3.7. To utilize the WP networks, we implemented two versions of our system, referred to as new similarity (NS) and new similarity with links (NSL). We varied the threshold of each system from 0.1 to 0.9. Moreover, we chose entry sets of 40 categories (each set has one category) from fields of computer science, such as "Natural language processing", "Speech recognition", and "Semantic Web". The results are compared with those of a baseline method that employs the DC similarity measure. Table 4 shows some of the similar categories collected by NSL for the entry category "Semantic Web" with a threshold of 0.2.

We invited domain specialists to examine the results, and each collection was manually checked by each evaluator. The RE field in Table 4 shows the actual checked results, where values of 1 and 0 were ascribed by the evaluators for relevance and irrelevance, respectively. Tables 5 and 6 present summaries of the DC evaluations.

There was no improvement for thresholds greater than 0.6, and the bootstrapping was incomplete for a threshold of 0.1. Therefore, we present results for threshold values from 0.2 to 0.6. In the experiments, the baseline attained an extension rate of only 1.18 (40 categories extended by 7 categories), even though its precision was 100%. The aim of information processing is to reduce the time taken to accomplish certain objectives, which implies that the information system should


XHTML + RDFa, Bath Profile, Sidecar file, COinS, Metadata publishing, MARC standards, WizFolio, Qiqqa, ISO-TimeML, TimeML, Metadata Authority Description Schema, Bookends (software), RIS (file format), Metadata Object Description Schema, EndNote, Refer (software), ISO 2709, BibTeX, XML, S5 (file format), Semantic HTML, Simple HTML Ontology Extensions, Opera Show Format, XOXO, XHTML Friends Network, StrixDB, Graph Style Sheets, TriX (syntax), TriG (syntax), RDF feed, Redland RDF Application Framework, RDF query language, Turtle (syntax), RDFLib, Notation3, SPARQL, D3web, Artificial architecture, NetWeaver Developer, Knowledge engineer, Frame language

Box 3: Domain articles (DA) collected for "Semantic Web" on NLS with threshold value 0.2.

Table 5: DC evaluations with baseline and NS.

                  Baseline       NS
Threshold value   0.2     0.3    0.2     0.3     0.4     0.5     0.6
Extended count    7       --     282     133     87      82      --
Extension rate    1.18    --     8.05    4.33    3.18    3.05    --
Appropriate       7       --     241     113     80      74      --
Inappropriate     0       --     41      20      7       8       --
Precision (%)     100.00  --     85.46   84.96   91.95   90.24   --

Table 6: DC evaluations with NSL.

Threshold value   0.2     0.3     0.4     0.5     0.6
Extended count    938     346     207     134     47
Extension rate    24.45   9.65    6.18    4.35    2.18
Appropriate       799     293     187     127     47
Inappropriate     139     53      20      7       0
Precision (%)     85.18   84.68   90.34   94.78   100.00

Table 7: DA evaluations based on distance, including articles within the indicated distance.

Distance        2      3      4      5      All
Appropriate     1092   1224   1246   1253   1253
Inappropriate   252    336    471    508    514
Precision (%)   81.3   78.5   72.6   71.2   70.9

Table 8: DA evaluations based on count (overlapped).

Overlapped count   1      2      3      4
Appropriate        1253   330    90     15
Inappropriate      514    80     14     1
Precision (%)      70.9   80.5   86.5   93.8

Table 9: DA evaluations based on normalized weight.

Threshold value   0      0.1    0.2    0.3
Appropriate       1253   1195   870    755
Inappropriate     514    292    69     22
Precision (%)     70.9   80.4   92.7   97.2

provide varied results. In this respect, we do not expect the baseline results to be helpful. However, it is apparent that NSL provides wide extension and high precision. The maximum extension rate was 24.45, with 799 appropriate categories for a threshold of 0.2. The minimum precision was around 84% when the threshold was 0.3.

In addition to evaluating DC, we evaluated DA with the NSL results for a threshold of 0.2. To examine the influence of the distance, count, and domain weight, we analyzed the results according to each factor. Six DAs were selected at random, with a total of 1769 articles (terms). Box 3 enumerates a part of the collected articles for the domain "Semantic Web", and Tables 7, 8, and 9 show the evaluation results with respect to distance, count, and weight.

The basic performance of the domain-term selection attained a precision of 70.9%. As expected, the precision was inversely proportional to the distance; however, a distance of 4 produced almost all of the unrelated articles. The weight and count could be used as important criteria to select domain terms; we found that the weight returned more refined results than the count (the weight returned 755 appropriate terms with 97.2% precision at a threshold of 0.3). At short distances, there are many names of people and organizations, such as "Squarespace", "Rackspace Cloud", and "Nsite Software (Platform as a Service)". These names were selected by the bootstrapping because their associated categories (e.g., "Cloud platforms", "Cloud storage", and "Cloud infrastructure") were similar to the entry (e.g., "Cloud computing"). This situation is not caused by our method but by the definition of the article-category relations of WP. We believe that this can be resolved by processing content (abstracts) or tabular information in the future.
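The weight-based selection evaluated in Table 9 amounts to keeping the articles whose normalized weight (5) clears the threshold. A sketch, with illustrative article weights (not values from the paper):

```python
# Keep DA articles whose weight, normalized by the maximum weight in DA
# as in equation (5), reaches the threshold; 0.3 gave the best precision
# in the Table 9 evaluation.
def select_terms(da_weights, threshold=0.3):
    top = max(da_weights.values())
    return [art for art, w in da_weights.items() if w / top >= threshold]

da = {"SPARQL": 0.9, "Turtle (syntax)": 0.5, "Squarespace": 0.05}
print(select_terms(da))  # ['SPARQL', 'Turtle (syntax)']
```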

5. Conclusions and Future Work

This paper has proposed a method of domain-term collection through a bootstrapping process to assist the semantic interpretation of data from sensor networks. To achieve this, we identified weaknesses in the WP category hierarchy


(i.e., loops and inappropriate generalizations) and chose a horizontal rather than vertical category search. We proposed new semantic similarity measurements and a similarity constraint to surpass existing methods. Moreover, we employed category-article networks and article-link networks to elicit information for the category similarity measurement. In performance evaluations, our category grouping based on NSL yielded the greatest number of proper results. In terms of domain-term selection, we confirmed that the results obtained with normalized weights had the best precision and extension rate. The distance-based metric had no positive influence on our research; when the distance was greater than three, almost all of the terms were unrelated. However, we believe that the collected domain terminologies can assist the construction of domain knowledge bases for the semantic interpretation of sensor data.

WP has additional weaknesses to those mentioned in this paper, especially in the category-article relation. For example, the term "Paco Nathan" is a personal name that has "Natural language processing" as one of its categories. The relation between the two, that is, "Paco Nathan" has expertise in "Natural language processing", causes noise and negatively influences semantic information processing. We think that this problem can be solved in future work by processing additional WP components, such as abstract or tabular information. Moreover, our research employed only the out-links of WP articles. If the in-links were considered, we expect that the results would be more significant, with wider coverage of domain terms and higher relevance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. Kebler and K. Janowicz, "Linking sensor data - why, to what, and how?" in Proceedings of the 3rd International Workshop on Semantic Sensor Networks, pp. 77-91, 2010.

[2] S. Tilak, N. B. Abu-Ghazaleh, and W. Heinzelman, "A taxonomy of wireless micro-sensor network models," ACM SIGMOBILE Mobile Computing and Communications Review, vol. 6, no. 2, pp. 28-36, 2002.

[3] G. Goodwin and D. J. Russomanno, "An ontology-based sensor network prototype environment," in Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pp. 1-2, 2006.

[4] K. Janowicz and M. Compton, "The Stimulus-Sensor-Observation ontology design pattern and its integration into the Semantic Sensor Network ontology," in Proceedings of the 3rd International Workshop on Semantic Sensor Networks, pp. 92-106, 2010.

[5] P. Barnaghi, S. Meissner, M. Presser, and K. Moessner, "Sense and sens'ability: semantic data modelling for sensor networks," in Proceedings of the ICT Mobile Summit, pp. 1-9, 2009.

[6] A. Broring, P. Maue, K. Janowicz, D. Nust, and C. Malewski, "Semantically-enabled sensor plug & play for the sensor web," Sensors, vol. 11, no. 8, pp. 7568-7605, 2011.

[7] M. Hwang, D. Choi, and P. Kim, "A method for knowledge base enrichment using Wikipedia document information," Information, vol. 13, no. 5, pp. 1599-1612, 2010.

[8] P. Velardi, A. Cucchiarelli, and M. Petit, "A taxonomy learning method and its application to characterize a scientific web community," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 2, pp. 180-191, 2007.

[9] H. Avancini, A. Lavelli, B. Magnini, F. Sebastiani, and R. Zanoli, "Expanding domain-specific lexicons by term categorization," in Proceedings of the ACM Symposium on Applied Computing, pp. 793-797, March 2003.

[10] J. J. Jung, Y. H. Yu, and K. S. Jo, "Collaborative web browsing based on ontology learning from bookmarks," in Proceedings of the International Conference on Computational Science, pp. 513-520, 2004.

[11] J. J. Jung, "Computational reputation model based on selecting consensus choices: an empirical study on semantic wiki platform," Expert Systems with Applications, vol. 39, no. 10, pp. 9002-9007, 2012.

[12] J. Lehmann and L. Buhmann, "ORE - a tool for repairing and enriching knowledge bases," in Proceedings of the 9th International Semantic Web Conference, pp. 177-193, 2010.

[13] J. Liu, Y.-C. Liu, W. Jiang, and X.-L. Wang, "Research on automatic acquisition of domain terms," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 3026-3031, July 2008.

[14] S. Minocha and P. G. Thomas, "Collaborative learning in a wiki environment: experiences from a software engineering course," New Review of Hypermedia and Multimedia, vol. 13, no. 2, pp. 187-209, 2007.

[15] R. Navigli and P. Velardi, "Ontology enrichment through automatic semantic annotation of on-line glossaries," in Proceedings of Knowledge Engineering and Knowledge Management, pp. 126-140, 2006.

[16] C. Muller and I. Gurevych, "Using Wikipedia and Wiktionary in domain-specific information retrieval," in Proceedings of the 9th Cross-Language Evaluation Forum Conference on Evaluating Systems for Multilingual and Multimodal Information Access, pp. 219-226, 2009.

[17] J. Vivaldi and H. Rodriquez, "Finding domain terms using Wikipedia," in Proceedings of the 7th International Conference on Language Resources and Evaluation, pp. 386-393, 2010.

[18] G. A. Miller, "WordNet: a lexical database for English," Communications of the ACM, vol. 38, no. 11, pp. 39-41, 1995.

[19] S. Lee, S.-Y. Huh, and R. D. McNiel, "Automatic generation of concept hierarchies using WordNet," Expert Systems with Applications, vol. 35, no. 3, pp. 1132-1144, 2008.



Natural language processing → Computational linguistics → Natural language and computing → Human-computer interaction → Artificial intelligence → Futurology → Social change → Social philosophy → Human sciences → Interdisciplinary fields → Science → Knowledge → Concepts → Mental content → Consciousness → Philosophy of mind → Conceptions of self → Concepts → ...

Box 1: Case of a loop in the WP category hierarchy (bold represents the same category concept, but it occurs iteratively in the hierarchy).

Natural language processing → Computational linguistics → Natural language and computing → ... → Artificial intelligence → ... → Human sciences → Social sciences → ... → Science → Knowledge → Concepts → Mental content → Consciousness → Mind → Concepts in metaphysics → ... → Form → Ontology → Reality → ... → Scientific observation → Data collection → ... → Probability → Philosophical logic → ... → Neo-Marxism → Marxism → ... → Reasoning → Intelligence → ... → Humans → ... → Phyla → Taxonomic categories → Taxonomy → Classification systems → Library science → Libraries → ... → Human welfare organizations → Non-profit organizations by beneficiaries → Non-profit organizations → ... → Constitutions → Legal documents → Documents → ... → Crowd psychology → Collective intelligence → ... → Cyberspace → Internet → Wide area networks → ... → World → ... → People

Box 2: Case of inappropriate generalization in the WP category hierarchy (bold represents the same category concept, but it occurs iteratively in the hierarchy).

Table 1: Similarity issues for bootstrapping methods.

Case | Categories    | Count of articles | |C1 ∩ C2| | JSC   | DC
1    | Domain 1      | 100               | 50        | 0.333 | 0.5
     | Candidate 1   | 100               |           |       |
2    | Domain 2      | 100               | 18        | 0.176 | 0.3
     | Candidate 2   | 20                |           |       |
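The JSC and DC columns of Table 1 are the Jaccard similarity coefficient and the Dice coefficient. Written over set sizes, they can be checked directly; a sketch using the counts from the table:

```python
def jaccard(n_a, n_b, inter):
    # Jaccard similarity coefficient: |A ∩ B| / |A ∪ B|
    return inter / (n_a + n_b - inter)

def dice(n_a, n_b, inter):
    # Dice coefficient: 2|A ∩ B| / (|A| + |B|)
    return 2 * inter / (n_a + n_b)

# Case 1: two categories of 100 articles sharing 50
print(round(jaccard(100, 100, 50), 3), dice(100, 100, 50))  # 0.333 0.5
# Case 2: categories of 100 and 20 articles sharing 18
print(round(jaccard(100, 20, 18), 3), dice(100, 20, 18))    # 0.176 0.3
```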

measurements mentioned in the previous section. We now describe these processes in detail using real examples.

3.1. System Flow. The proposed method takes one category, which the user selects as an entry (trigger), and follows the flowchart shown in Figure 1. Starting from the entry category, we determine similar categories through a horizontal search. In the search, articles included in the entry act as "clues" for measuring the similarity and "bridges" for preparing the next candidate category. Figure 2 illustrates an example of a category-article network. If the category "Natural language processing" is given as the entry, the method finds articles for similarity measurement and prepares the categories of each article as the next candidates. This means that "Information science", "Knowledge representation", "Machine learning", "Artificial intelligence applications", "Data mining", and so forth are processed individually as candidate categories. We now explain the process shown in Figure 1 using similar examples.

3.2. Domain-Term Selection through Category Grouping (Bootstrapping Method). To group similar categories, we choose a horizontal category search and propose new similarity measurements that enable the group to be enriched. The bootstrapping process proceeds as follows.

(1) An initial domain category (DC) consists of a user-selected category: DC = {user-selected category}. For example, DC = {Natural language processing}. The length of DC increases throughout the iterative process.

(2) The domain articles (DA) of one category are collected. At first, DA consists of the articles of the entry, but it gradually becomes enriched with articles from new domain categories. Explicitly, DA = {(art, dist, count, dw)_i}, 1 ≤ i ≤ n, where art, dist, count, and dw denote an article, a distance from the entry category, an accumulated (overlapped) count, and a domain-term weight, respectively. The initial elements of DA have values of 1, 1, and 1.0 for dist, count, and weight, respectively. For example, DA = {(Concept mining, 1, 1, 1.0), (Information retrieval, 1, 1, 1.0), (Language guessing, 1, 1, 1.0), (Stemming, 1, 1, 1.0), (Content determination, 1, 1, 1.0)}.

(3) There are two options that choose whether an article-link network is used in the similarity measurements. We explain the options using Figure 3 and Table 2. Figure 3 shows the network between the categories "Natural language processing" and "Data mining", whereas Table 2 defines the network types.

(i) The first option considers only category-article networks, such as {Content determination, Information retrieval, Languageware, Concept mining, Document classification, Text mining, Automatic summarization, String kernel, Sentic computing, ...} for "Natural language processing". Type 1 in Table 2 is related to this option.

(ii) The second option uses more complex networks that utilize the category-article network as well


Figure 1: Flowchart for domain category grouping and domain-term selection.


Figure 2: Example of a category-article network.

Figure 3: Network structure between the categories "Natural language processing" and "Data mining". Solid lines and dotted lines represent the category-article network and the article-link network, respectively, whereas (C) denotes a category. Here, a link is a connection between WP articles.

as article links. Types 2 and 3 in Table 2 represent this option.

If the first option is chosen, the method goes to Step 4; otherwise, it goes to Step 11.

(4) Candidate categories (CC) are collected using one article of DA. If DC already contains a candidate, it is not included in CC. Explicitly, CC(art_i) = {cat_j}, 1 ≤ j ≤ m. For example, CC(Concept mining) = {Data mining, Artificial intelligence applications}.

(5) To prepare clues that indicate suitable categories, a set of candidate articles (CA) of one candidate category is formed. For example, CA(Data mining) = {Extension neural network, Big data, Data classification (business intelligence), Document classification, Web mining, Text mining, Concept mining, ...}. If there is an intersection article, then an intersection set IS is constructed to measure the similarity: IS(set_i, set_j) = (set_i ∩ set_j) ⇒ {ia_k}, 1 ≤ k ≤ p. For example, IS(Natural language processing, Data mining) = {Languageware, Concept mining, Document classification, Text mining}.

(6) To eliminate the limitation described in Section 2, we propose a new similarity measurement:

sim(DA, CA) = α × |IS(DA, CA)| / |DA| + (1 − α) × |IS(DA, CA)| / |CA|   (1)


Table 2: Network types and examples. C denotes a category, A an article, and L a link.

Type 1: C1 → (A1 = A2) ← C2
  Example: Natural language processing → Document classification ← Data mining

Type 2: C1 → A1 → (L1 = A2) ← C2
  Example: Natural language processing → Information retrieval → Text corpus ← Data mining
        C1 → (A1 = L2) ← A2 ← C2
  Example: Natural language processing → Automatic summarization ← Text mining ← Data mining

Type 3: C1 → A1 → (L1 = L2) ← A2 ← C2
  Example: Natural language processing → Information retrieval → Information ← Data visualization ← Data mining

Bold represents category concepts connected with super- and sub-relations but with inappropriate generalization in the hierarchical structure.

where basically we assign a value of 0.5 to α. Based on (1), we can calculate the similarities of Cases 1 and 2 in Table 1 to obtain values of 0.5 and 0.54, respectively. However, we must consider an additional constraint in the bootstrapping method. Table 3 shows another example. Based on (1), we find that both cases have the same similarity value, as shown in Table 3. Even so, Case 2 is inappropriate because the coverage of DA is too narrow. This may cause the generalization problem. Thus, before calculating similarities in the bootstrapping method, the similarity constraint

|DA| ≥ |CA|   (2)

should be satisfied. According to this constraint and (1), the similarity between "Natural language processing" and "Data mining" is calculated as follows: |DA| = 117, |CA| = 93, |IS(DA, CA)| = 4, sim(Natural language processing, Data mining) = (4/117 + 4/93)/2 = 0.039.

(7) If the similarity exceeds a predetermined threshold, go to Step 8; otherwise, we skip Step 8 and proceed to Step 9.

(8) If the candidate category has a similarity that is greater than the threshold, the system considers the candidate to be in the domain and inserts the candidate and its articles into DC and DA. Here, new articles are

Table 3: Examples of similarity measurement.

Case | Categories | Count of articles | |C1 ∩ C2| | sim
1    | DA1        | 400               | 100       | 0.625
     | CA1        | 100               |           |
2    | DA2        | 100               | 100       | 0.625
     | CA2        | 400               |           |

accompanied by supplementary values of dist, count, and dw. Specifically, dist is the category distance from the entry category (in the case of "Data mining", dist is 2), count is 1, and dw is the similarity calculated in Step 6. If DA already contains an article, the original supplementary values are increased by adding the new ones. These values are used later for domain-term selection.

(9) If there is at least one element in CC, we return to Step 4 for the next candidate category; otherwise, proceed to Step 10.
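The category-article branch of the loop (Steps 1-10) can be condensed into a short sketch. This is a simplified illustration, not the authors' implementation: `articles_of` and `categories_of` are hypothetical lookup tables standing in for the Wikipedia dumps, and the count/dw bookkeeping of Step 8 is omitted for brevity.

```python
from collections import deque

def bootstrap(entry, articles_of, categories_of, threshold=0.2, alpha=0.5):
    dc = {entry}                               # Step 1: domain categories
    da = dict.fromkeys(articles_of[entry], 1)  # Step 2: article -> dist
    queue = deque(da)
    while queue:                               # Step 10: more domain articles?
        art = queue.popleft()
        for cand in categories_of.get(art, ()):   # Step 4: candidate categories
            if cand in dc:
                continue
            ca = articles_of.get(cand, set())     # Step 5: candidate articles
            if not ca or len(da) < len(ca):       # similarity constraint (2)
                continue
            inter = len(ca & da.keys())           # |IS(DA, CA)|
            sim = alpha * inter / len(da) + (1 - alpha) * inter / len(ca)  # (1)
            if sim >= threshold:                  # Steps 7-8: enrich DC and DA
                dc.add(cand)
                for a in ca - da.keys():
                    da[a] = da[art] + 1           # dist grows with each hop
                    queue.append(a)
    return dc, da

# Toy example: "Data mining" shares "Text mining" with the entry category.
toy_articles = {"Natural language processing": {"Concept mining", "Text mining"},
                "Data mining": {"Text mining", "Big data"}}
toy_categories = {"Concept mining": ["Data mining"],
                  "Text mining": ["Natural language processing", "Data mining"],
                  "Big data": ["Data mining"]}
dc, da = bootstrap("Natural language processing", toy_articles, toy_categories)
print(sorted(dc))  # ['Data mining', 'Natural language processing']
```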


Overlapped count 1 2 3 4Appropriate 1253 330 90 15Inappropriate 514 80 14 1Precision () 709 805 865 938

Table 9 DA evaluations based on normalized weight

Threshold value 0 01 02 03Appropriate 1253 1195 870 755Inappropriate 514 292 69 22Precision () 709 804 927 972

provide varied results In this respect we do not expect thebaseline results to be helpful However it is apparent that NSLprovides wide extension and high precision The maximum

extension rate was 2445 with 799 appropriate categories fora threshold of 02 The minimum precision was around 84when the threshold was 03

In addition to evaluating DC we evaluated DA withthe NSL results for a threshold of 02 To examine theinfluence of the distance count and domain weight weanalyzed the results according to each factor Six DAs wereselected at randomwith a total of 1769 articles (terms) Box 3enumerates a part of collected articles for a domain ldquoSemanticWebrdquo And Tables 7 8 and 9 show the evaluation results withrespect to distance count and weight

The basic performance of the domain-term selectionattained precision of 709 As expected the precision wasinversely proportional to the distance however a distanceof 4 produced almost all of the unrelated articles Theweight and count could be used as important criteria toselect domain-terms we found that the weight returnedmore refined results than the count (the weight returned755 appropriate terms with 972 precision at a thresholdof 03) At short distances there are many names of peopleand organizations such as ldquoSquarespacerdquo ldquoRackspace Cloudrdquoand ldquoNsite Software (Platform as a Service)rdquo These nameswere selected by the bootstrapping because their associatedcategories (eg ldquoCloud platformsrdquo ldquoCloud storagerdquo andldquoCloud infrastructurerdquo) were similar to the entry (eg ldquoCloudcomputingrdquo) This situation is not caused by our methodbut by the definition of the article-category relations of WPWe believe that this can be resolved by processing content(abstracts) or tabular information in the future

5 Conclusions and Future Work

This paper has proposed a method of domain-term collec-tion through a bootstrapping process to assist the semanticinterpretation of data from sensor networks To achievethis we identified weaknesses in the WP category hierarchy

International Journal of Distributed Sensor Networks 9

(ie loops and inappropriate generalizations) and chose ahorizontal rather than vertical category searchWeproposednew semantic similarity measurements and a similarity con-straint to surpass existing methods Moreover we employedcategory-article networks and article-link networks to elicitinformation for the category similarity measurement Inperformance evaluations our category grouping based onNSL yielded the greatest number of proper results In termsof domain-term selection we confirmed that the resultsobtained with normalized weights had the best precision andextension rate The distance-based metric had no positiveinfluence on our research When the distance was greaterthan three almost all of the terms were unrelated Howeverwe believe that the collected domain terminologies can assistthe construction of domain knowledge bases for the semanticinterpretation of sensor data

WP has additional weaknesses to those mentioned inthis paper especially in the category-article relation Forexample the term ldquoPaco Nathanrdquo is a personal name thathas ldquoNatural language processingrdquo as one of its categoriesThe relation between the two that is ldquoPaco Nathanrdquo hasexpertise in ldquoNatural language processingrdquo causes noise andnegatively influences semantic information processing Wethink that this problem can be solved in future work byprocessing additional WP components such as abstract ortabular information Moreover our research employed onlythe out-links of WP articles If the in-links were consideredwe expect that the results would be more significant withwider coverage of domain-terms and higher relevance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] C Kebler and K Janowicz ldquoLinking sensor datamdashwhy to whatand howrdquo in Proceedings of the 3rd International Workshop onSemantic Sensor Networks pp 77ndash91 2010

[2] S Tilak N B Abu-Ghazaleh andWHeinzelman ldquoA taxonomyof wireless micro-sensor network modelsrdquo ACM SIGMOBILEMobile Computing and Communications Review vol 6 no 2pp 28ndash36 2002

[3] G Goodwin and D J Russomanno ldquoAn ontology-bases sensornetwork prototype environmentrdquo in Proceedings of the 5thInternational Conference on Information Processing in SensorNetworks pp 1ndash2 2006

[4] K Janowicz and M Compton ldquoThe Stimulus-Sensor-Observation ontology design pattern and its integration intothe Semantic Sensor Network ontologyrdquo in Proceedings of the3rd International Workshop on Semantic Sensor Networks pp92ndash106 2010

[5] P Barnaghi S Meissner M Presser and K Moessner ldquoSenseand sensrsquoability semantic data modelling for sensor networksrdquoin Proceedings of the ICT Mobile Summit pp 1ndash9 2009

[6] A Broring P Maue K Janowicz D Nust and C MalewskildquoSemantically-enabled sensor plug amp play for the sensor webrdquoSensors vol 11 no 8 pp 7568ndash7605 2011

[7] M Hwang D Choi and P Kim ldquoAmethod for knowledge baseenrichment using wikipedia document informationrdquo Informa-tion vol 13 no 5 pp 1599ndash1612 2010

[8] P Velardi A Cucchiarelli and M Petit ldquoA taxonomy learningmethod and its application to characterize a scientific web com-munityrdquo IEEETransactions onKnowledge andData Engineeringvol 19 no 2 pp 180ndash191 2007

[9] H Avancini A Lavelli B Magnini F Sebastiani and R ZanolildquoExpanding domain-specific lexicons by term categorizationrdquoin Proceedings of the ACM Symposium on Applied Computingpp 793ndash797 March 2003

[10] J J Jung Y H Yu and K S Jo ldquoCollaborative web browsingbased on ontology learning from bookmarksrdquo in Proceedings ofthe International Conference on Computational Science pp 513ndash520 2004

[11] J J Jung ldquoComputational reputation model based on select-ing consensus choices an empirical study on semantic wikiplatformrdquo Expert Systems with Applications vol 39 no 10 pp9002ndash9007 2012

[12] J Lehmann and L Buhmann ldquoOREmdasha tool for repairingand enriching knowledge basesrdquo in Proceedings of the 9thInternational Semantic Web Conference pp 177ndash193 2010

[13] J Liu Y-C Liu W Jiang and X-L Wang ldquoResearch onautomatic acquisition of domain termsrdquo inProceedings of the 7thInternational Conference on Machine Learning and Cybernetics(ICMLC rsquo08) pp 3026ndash3031 July 2008

[14] SMinocha andPGThomas ldquoCollaborative Learning in aWikiEnvironment experiences from a software engineering courserdquoNew Review of Hypermedia and Multimedia vol 13 no 2 pp187ndash209 2007

[15] R Navigli and P Velardi ldquoOntology enrichment through auto-matic semantic annotation of on-line glossariesrdquo in Proceedingsof the Knowledge Engineering and Knowledge Management pp126ndash140 2006

[16] C Muller and I Gurevych ldquoUsing wikipedia and wiktionaryin domain-specific information retrievalrdquo in Proceedings of the9th Cross-Language Evaluation ForumConference on EvaluatingSystems forMultilingual andMultimodal Information Access pp219ndash226 2009

[17] J Vivaldi and H Rodriquez ldquoFinding domain terms usingwikipediardquo in Proceedings of the 7th International Conference onLanguage Resources and Evaluation pp 386ndash393 2010

[18] G AMiller ldquoWordNet a lexical database for EnglishrdquoCommu-nications of the ACM vol 38 no 11 pp 39ndash41 1995

[19] S Lee S-Y Huh and R D McNiel ldquoAutomatic generationof concept hierarchies using WordNetrdquo Expert Systems withApplications vol 35 no 3 pp 1132ndash1144 2008


4 International Journal of Distributed Sensor Networks

[Figure 1: Flowchart for domain category grouping and domain-term selection. From an entry category (Step 1), domain articles are collected (Step 2); depending on the link type (Step 3), either candidate categories and articles are gathered (Steps 4-5) or the links of domain and candidate articles are gathered as well (Steps 11-14). Similarities sim(cat_i, cat_k) are measured against a threshold (Steps 6-7 and 15-16), accepted candidates are moved into DC, DA, and DLS (Steps 8 and 17), the loops repeat while |CC| > 0 and |DA| > 0 (Steps 9-10 and 18), and finally DC and DA are printed and evaluated (Steps 19-20).]
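Read as pseudocode, the loop of Steps 2-10 in the flowchart can be sketched roughly as follows. This is a simplification under my own naming: `categories_of` and `articles_of` stand for the WP lookups, `similarity` for the measure introduced in Step 6, and the dist/count/dw bookkeeping is omitted.

```python
def bootstrap(entry_category, categories_of, articles_of, similarity, threshold):
    """Horizontal bootstrapping: grow DC/DA outward from one entry category."""
    dc = {entry_category}                      # domain categories (DC)
    da = set(articles_of(entry_category))      # domain articles (DA)
    frontier = [entry_category]
    while frontier:
        current = frontier.pop()
        for art in articles_of(current):
            for cand in categories_of(art):    # candidate categories (CC)
                if cand in dc:
                    continue
                ca = set(articles_of(cand))    # candidate articles (CA)
                # Constraint (2), |DA| >= |CA|, guards against generalization
                if len(da) >= len(ca) and similarity(da, ca) >= threshold:
                    dc.add(cand)
                    da |= ca
                    frontier.append(cand)
    return dc, da
```

On real WP data the two lookup callables would be backed by the DBpedia article-categories and Pagelinks tables used in Section 4.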


[Figure 2: Example of a category-article network. The category "Natural language processing" links to articles such as "Concept mining", "Document classification", "Automatic summarization", and "Knowledge representation", and these articles in turn belong to further categories such as "Computational linguistics", "Data mining", "Artificial intelligence applications", "Information science", and "Machine learning".]

[Figure 3: Network structure between the categories "Natural language processing" and "Data mining". Solid lines and dotted lines represent the category-article network and the article-link network, respectively, whereas (C) denotes a category. Here, a link is a connection between WP articles. The shared neighborhood includes "Languageware", "Concept mining", "Document classification", "Text mining", "Text corpus", "Information retrieval", "Content determination", "String kernel", "Cluster analysis", "Automatic summarization", "Data visualization", "Ontology", "Sentic computing", and "Biomedical text mining".]

as article links. Types 2 and 3 in Table 2 represent this option.

If the first option is chosen, the method goes to Step 4; otherwise, it goes to Step 11.

(4) Candidate categories (CC) are collected using one article of DA. If DC already contains a candidate, it is not included in CC. Explicitly, CC(art_i) = {cat_j | 1 ≤ j ≤ m}. For example, CC(Concept mining) = {Data mining, Artificial intelligence applications}.

(5) To prepare clues that indicate suitable categories, a set of candidate articles (CA) of one candidate category is formed. For example, CA(Data mining) = {Extension neural network, Big data, Data classification (business intelligence), Document classification, Web mining, Text mining, Concept mining}. If there is an intersection article, then an intersection set IS is constructed to measure the similarity: IS(set_i, set_j) = set_i ∩ set_j = {ia_k | 1 ≤ k ≤ p}. For example, IS(Natural language processing, Data mining) = {Languageware, Concept mining, Document classification, Text mining}.

(6) To eliminate the limitation described in Section 2, we propose a new similarity measurement:

sim(DA, CA) = α × |IS(DA, CA)| / |DA| + (1 − α) × |IS(DA, CA)| / |CA|,   (1)


Table 2: Network types and examples. C denotes a category, A an article, and L a link.

Type 1: C1 → (A1 = A2) ← C2
  Example: Natural language processing → Document classification ← Data mining

Type 2: C1 → A1 → (L1 = A2) ← C2
  Example: Natural language processing → Information retrieval → Text corpus ← Data mining
  and C1 → (A1 = L2) ← A2 ← C2
  Example: Natural language processing → Automatic summarization ← Text mining ← Data mining

Type 3: C1 → A1 → (L1 = L2) ← A2 ← C2
  Example: Natural language processing → Information retrieval → Information ← Data visualization ← Data mining

Bold (in the original table) marks category concepts connected with super- and subrelations but representing inappropriate generalization in the hierarchical structure.

where basically we assign a value of 0.5 to α. Based on (1), we can calculate the similarities of Cases 1 and 2 in Table 1 to obtain values of 0.5 and 0.54, respectively. However, we must consider an additional constraint in the bootstrapping method. Table 3 shows another example: based on (1), both cases have the same similarity value. Even so, Case 2 is inappropriate because the coverage of DA is too narrow, which may cause the generalization problem. Thus, before calculating similarities in the bootstrapping method, the similarity constraint

|DA| ≥ |CA|   (2)

should be satisfied. According to this constraint and (1), the similarity between "Natural language processing" and "Data mining" is calculated as follows: |DA| = 117, |CA| = 93, |IS(DA, CA)| = 4, and sim(Natural language processing, Data mining) = (4/117 + 4/93)/2 = 0.039.

(7) If the similarity exceeds a predetermined threshold, go to Step 8; otherwise, skip Step 8 and proceed to Step 9.

(8) If the candidate category has a similarity that is greater than the threshold, the system considers the candidate to belong to the domain and inserts the candidate and its articles into DC and DA. Here, new articles are

Table 3: Examples of similarity measurement.

Case   Categories   Count of articles   |C1 ∩ C2|   sim
1      DA1          400                 100         0.625
       CA1          100
2      DA2          100                 100         0.625
       CA2          400

accompanied by supplementary values of dist, count, and dw. Specifically, dist is the category distance from the entry category (in the case of "Data mining", dist is 2), count is 1, and dw is the similarity calculated in Step 6. If DA already contains an article, the original supplementary values are increased by adding the new ones. These values are used later for domain-term selection.

(9) If there is at least one element in CC, we return to Step 4 for the next candidate category; otherwise, proceed to Step 10.

(10) If there is at least one element in DA, we return to Step 2 for the new CC of the next domain article; otherwise, proceed to Step 19.

(11) To use the links of articles as additional clues for the similarity measurement, a domain link set (DLS) is collected: DLS(DA) = {link_dom_j | 1 ≤ j ≤ m}. For example, DLS = {Information, Metadata, Relational database, World Wide Web, Data (computing), Document retrieval}.

(12) This step is the same as Step 4.

(13) This step is the same as Step 5.

(14) Construct a candidate link set (CLS) with links from CA. Explicitly, CLS(CA(cat_k)) = {link_cat_t | 1 ≤ t ≤ q}. For example, CLS(CA(Data mining)) = {Computer science, Data set, Artificial intelligence, Database management system, Business intelligence, Neural network, Cluster analysis}.

(15) If the similarity constraint is satisfied, (1) is applied to determine the category-article similarity. We use additional similarity measurements for the article-link networks. A network can have different degrees of relatedness according to its type (see Table 2), and if two categories have complex, close connections, the similarity should be greater because of common features in the neighborhood. To apply both characteristics, we propose a further similarity measure:

sim(set_i, set_j) = (|IS(set_i, set_j)| / distance) × 1 / (|set_i| + |set_j|),   (3)

where distance is the number of articles that exist on the network (this is different from dist in DA). According to (3), we can calculate sim(DLS, CA),


sim(DA, CLS), and sim(DLS, CLS), which have distances of 2, 2, and 3, respectively. The final similarity (final_sim) between DA and CA is determined by summing sim(DA, CA) from (1) with sim(DLS, CA), sim(DA, CLS), and sim(DLS, CLS) from (3). The similarity between "Natural language processing" and "Data mining" can thus be calculated as follows:

|DA| = 117, |CA| = 93, |DLS| = 3633, |CLS| = 3220,
|IS(DA, CA)| = 4, |IS(DLS, CA)| = 91, |IS(DA, CLS)| = 63, |IS(DLS, CLS)| = 6771,

sim(DA, CA) = (4/117 + 4/93)/2 = 0.039,
sim(DLS, CA) = (91/2)/(3633 + 93) = 0.012,
sim(DA, CLS) = (63/2)/(117 + 3220) = 0.009,
sim(DLS, CLS) = (6771/3)/(3633 + 3220) = 0.3293,

final_sim(Natural language processing, Data mining) = 0.3893.   (4)

(16) If the similarity exceeds the predetermined threshold, go to Step 17; otherwise, Step 18 is carried out.

(17) If the similarity exceeds the threshold, the system enriches DC, DA, and DLS. This step is similar to Step 8.

(18) If there is at least one element in CC, we return to Step 12 for the next candidate category; otherwise, go to Step 10.

(19) Output the DC and DA acquired through the bootstrapping process.

(20) Terminate the bootstrapping process and evaluate DC and DA, which are increased in Step 8 or 17. For DA, the evaluations are divided into three types by supplementary value (dist, count, and dw). In the case of the domain weight, we use values normalized according to

weight′(art_i) = weight(art_i) / max_{art_k ∈ DA} weight(art_k).   (5)

We have described all of the bootstrapping steps for domain-term selection by grouping similar categories. In the next section, a few aspects of performance are evaluated.
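As a consistency check on (1)-(4), the worked "Natural language processing" vs. "Data mining" figures can be reproduced directly from the quoted cardinalities; the function names below are mine, not the paper's.

```python
def sim_basic(inter, da, ca, alpha=0.5):
    """Eq. (1): intersection ratio weighted over both set sizes."""
    return alpha * inter / da + (1 - alpha) * inter / ca

def sim_link(inter, size_i, size_j, distance):
    """Eq. (3): shared elements discounted by network distance."""
    return (inter / distance) / (size_i + size_j)

# Cardinalities from the worked example; constraint (2) holds: |DA| >= |CA|
DA, CA, DLS, CLS = 117, 93, 3633, 3220

s_da_ca = sim_basic(4, DA, CA)           # ~0.039, from (1)
s_dls_ca = sim_link(91, DLS, CA, 2)      # ~0.012
s_da_cls = sim_link(63, DA, CLS, 2)      # ~0.009
s_dls_cls = sim_link(6771, DLS, CLS, 3)  # ~0.329

final_sim = s_da_ca + s_dls_ca + s_da_cls + s_dls_cls  # ~0.39, as in (4)
```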

4. Experimental Evaluations

This section considers the evaluation of DC and DA. One objective of our research is to select as many domain-terms as possible, for which we proposed the bootstrapping method for similar category grouping. In the process, DC and DA become enriched with categories and articles, respectively, and each article has supplementary values of dist, count, and dw.

Table 4: Domain categories collected for "Semantic Web" by NSL with threshold value 0.2.

Categories                                           sim    dist   RE
Metadata publishing                                  0.47   2      1
Semantic HTML                                        0.40   2      1
RDF                                                  0.61   2      1
Knowledge engineering                                0.42   2      1
Folksonomy                                           0.27   2      1
Triplestores                                         0.68   2      1
Domain-specific knowledge representation languages   0.37   2      1
RDF data access                                      0.52   2      1
Book swapping                                        0.22   9      0
Rule engines                                         0.28   2      1
Ontology languages                                   0.35   2      1
Knowledge bases                                      0.30   2      1

RE: relevance evaluation.

To evaluate the quality of DC and DA, we used an article-category dataset and a Pagelinks dataset that are included among the WP components in DBpedia version 3.7 (DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web; http://dbpedia.org/About). To utilize the WP networks, we implemented two versions of our system, referred to as new similarity (NS) and new similarity with links (NSL). We varied the threshold of each system from 0.1 to 0.9. Moreover, we chose entry sets of 40 categories (each set has one category) from fields of computer science, such as "Natural language processing", "Speech recognition", and "Semantic Web". The results are compared with those of a baseline method that employs the DC similarity measure. Table 4 shows some of the similar categories collected by NSL for the entry category "Semantic Web" with a threshold of 0.2.
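For context, DBpedia 3.7 distributes these components as N-Triples dumps. A minimal sketch of building a category → articles table from the article-categories file might look as follows; the `dcterms:subject` predicate and the resource URI shapes are my assumptions about the dump format, not details given in the paper.

```python
import re
from collections import defaultdict

# One N-Triples line: <subject> <predicate> <object> .
TRIPLE = re.compile(r"<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.")
SUBJECT = "http://purl.org/dc/terms/subject"  # assumed article-category predicate

def category_articles(lines):
    """Map each Wikipedia category name to the set of articles filed under it."""
    table = defaultdict(set)
    for line in lines:
        m = TRIPLE.match(line)
        if m and m.group(2) == SUBJECT:
            article, _, category = m.groups()
            table[category.rsplit("Category:", 1)[-1]].add(article.rsplit("/", 1)[-1])
    return table
```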

We invited domain specialists to examine the results, and each collection was manually checked by each evaluator. The RE field in Table 4 shows the actual checked results, where values of 1 and 0 were ascribed by the evaluators for relevance and irrelevance, respectively. Tables 5 and 6 present summaries of the DC evaluations.

There was no improvement for thresholds greater than 0.6, and the bootstrapping was incomplete for a threshold of 0.1. Therefore, we present results for threshold values from 0.2 to 0.6. In the experiments, the baseline attained an extension rate of only 1.18 (40 categories extended by 7 categories), even though its precision was 100%. The aim of information processing is to reduce the time taken to accomplish certain objectives, which implies that the information system should


XHTML + RDFa, Bath Profile, Sidecar file, COinS, Metadata publishing, MARC standards, WizFolio, Qiqqa, ISO-TimeML, TimeML, Metadata Authority Description Schema, Bookends (software), RIS (file format), Metadata Object Description Schema, EndNote, Refer (software), ISO 2709, BibTeX, XML, S5 (file format), Semantic HTML, Simple HTML Ontology Extensions, Opera Show Format, XOXO, XHTML Friends Network, StrixDB, Graph Style Sheets, TriX (syntax), TriG (syntax), RDF feed, Redland RDF Application Framework, RDF query language, Turtle (syntax), RDFLib, Notation3, SPARQL, D3web, Artificial architecture, NetWeaver Developer, Knowledge engineer, Frame language

Box 3: Domain articles (DA) collected for "Semantic Web" by NSL with threshold value 0.2.

Table 5: DC evaluations with baseline and NS.

                    Baseline          NS
Threshold value     0.2      0.3     0.2     0.3     0.4     0.5     0.6
Extended count      7        —       282     133     87      82      —
Extension rate      1.18     —       8.05    4.33    3.18    3.05    —
Appropriate         7        —       241     113     80      74      —
Inappropriate       0        —       41      20      7       8       —
Precision (%)       100.00   —       85.46   84.96   91.95   90.24   —

Table 6: DC evaluations with NSL.

Threshold value     0.2     0.3     0.4     0.5     0.6
Extended count      938     346     207     134     47
Extension rate      24.45   9.65    6.18    4.35    2.18
Appropriate         799     293     187     127     47
Inappropriate       139     53      20      7       0
Precision (%)       85.18   84.68   90.34   94.78   100.00

Table 7: DA evaluations based on distance, including articles within the indicated distance.

Distance          2      3      4      5      All
Appropriate       1092   1224   1246   1253   1253
Inappropriate     252    336    471    508    514
Precision (%)     81.3   78.5   72.6   71.2   70.9

Table 8: DA evaluations based on count (overlapped).

Overlapped count   1      2      3      4
Appropriate        1253   330    90     15
Inappropriate      514    80     14     1
Precision (%)      70.9   80.5   86.5   93.8

Table 9: DA evaluations based on normalized weight.

Threshold value    0      0.1    0.2    0.3
Appropriate        1253   1195   870    755
Inappropriate      514    292    69     22
Precision (%)      70.9   80.4   92.7   97.2

provide varied results. In this respect, we do not expect the baseline results to be helpful. However, it is apparent that NSL provides wide extension and high precision: the maximum extension rate was 24.45, with 799 appropriate categories for a threshold of 0.2, and the minimum precision was around 84.7%, at a threshold of 0.3.

In addition to evaluating DC, we evaluated DA with the NSL results for a threshold of 0.2. To examine the influence of the distance, count, and domain weight, we analyzed the results according to each factor. Six DAs were selected at random, with a total of 1769 articles (terms). Box 3 enumerates some of the collected articles for the domain "Semantic Web", and Tables 7, 8, and 9 show the evaluation results with respect to distance, count, and weight.

The basic performance of the domain-term selection attained a precision of 70.9%. As expected, the precision was inversely proportional to the distance; moreover, the articles found at a distance of 4 were almost all unrelated. The weight and count could be used as important criteria to select domain-terms; we found that the weight returned more refined results than the count (the weight returned 755 appropriate terms with 97.2% precision at a threshold of 0.3). At short distances, there are many names of people and organizations, such as "Squarespace", "Rackspace Cloud", and "Nsite Software (Platform as a Service)". These names were selected by the bootstrapping because their associated categories (e.g., "Cloud platforms", "Cloud storage", and "Cloud infrastructure") were similar to the entry (e.g., "Cloud computing"). This situation is not caused by our method but by the definition of the article-category relations of WP. We believe that this can be resolved by processing content (abstracts) or tabular information in the future.
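The weight-based selection that Table 9 evaluates can be sketched as follows: normalize each article's accumulated dw per (5), then keep the articles whose normalized weight reaches the threshold. The weights below are illustrative values, not figures from the paper.

```python
def normalized(weights):
    """Eq. (5): divide each accumulated domain weight by the maximum in DA."""
    top = max(weights.values())
    return {art: w / top for art, w in weights.items()}

def select_terms(weights, threshold):
    """Domain-term selection: keep articles with normalized weight >= threshold."""
    return {art for art, w in normalized(weights).items() if w >= threshold}

# Illustrative accumulated weights (dw sums) for three candidate articles
weights = {"SPARQL": 0.61, "Turtle (syntax)": 0.35, "Book swapping": 0.05}
terms = select_terms(weights, 0.3)  # {"SPARQL", "Turtle (syntax)"}
```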

5. Conclusions and Future Work

This paper has proposed a method of domain-term collection through a bootstrapping process to assist the semantic interpretation of data from sensor networks. To achieve this, we identified weaknesses in the WP category hierarchy (i.e., loops and inappropriate generalizations) and chose a horizontal rather than vertical category search. We proposed new semantic similarity measurements and a similarity constraint to surpass existing methods. Moreover, we employed category-article networks and article-link networks to elicit information for the category similarity measurement. In performance evaluations, our category grouping based on NSL yielded the greatest number of proper results. In terms of domain-term selection, we confirmed that the results obtained with normalized weights had the best precision and extension rate. The distance-based metric had no positive influence on our research; when the distance was greater than three, almost all of the terms were unrelated. However, we believe that the collected domain terminologies can assist the construction of domain knowledge bases for the semantic interpretation of sensor data.

WP has additional weaknesses beyond those mentioned in this paper, especially in the category-article relation. For example, the term "Paco Nathan" is a personal name that has "Natural language processing" as one of its categories. The relation between the two, that is, "Paco Nathan" has expertise in "Natural language processing", causes noise and negatively influences semantic information processing. We think that this problem can be solved in future work by processing additional WP components, such as abstract or tabular information. Moreover, our research employed only the out-links of WP articles. If the in-links were considered, we expect that the results would be more significant, with wider coverage of domain-terms and higher relevance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. Kebler and K. Janowicz, "Linking sensor data: why, to what, and how?" in Proceedings of the 3rd International Workshop on Semantic Sensor Networks, pp. 77-91, 2010.
[2] S. Tilak, N. B. Abu-Ghazaleh, and W. Heinzelman, "A taxonomy of wireless micro-sensor network models," ACM SIGMOBILE Mobile Computing and Communications Review, vol. 6, no. 2, pp. 28-36, 2002.
[3] G. Goodwin and D. J. Russomanno, "An ontology-based sensor network prototype environment," in Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pp. 1-2, 2006.
[4] K. Janowicz and M. Compton, "The Stimulus-Sensor-Observation ontology design pattern and its integration into the Semantic Sensor Network ontology," in Proceedings of the 3rd International Workshop on Semantic Sensor Networks, pp. 92-106, 2010.
[5] P. Barnaghi, S. Meissner, M. Presser, and K. Moessner, "Sense and sens'ability: semantic data modelling for sensor networks," in Proceedings of the ICT Mobile Summit, pp. 1-9, 2009.
[6] A. Broring, P. Maue, K. Janowicz, D. Nust, and C. Malewski, "Semantically-enabled sensor plug & play for the sensor web," Sensors, vol. 11, no. 8, pp. 7568-7605, 2011.
[7] M. Hwang, D. Choi, and P. Kim, "A method for knowledge base enrichment using Wikipedia document information," Information, vol. 13, no. 5, pp. 1599-1612, 2010.
[8] P. Velardi, A. Cucchiarelli, and M. Petit, "A taxonomy learning method and its application to characterize a scientific web community," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 2, pp. 180-191, 2007.
[9] H. Avancini, A. Lavelli, B. Magnini, F. Sebastiani, and R. Zanoli, "Expanding domain-specific lexicons by term categorization," in Proceedings of the ACM Symposium on Applied Computing, pp. 793-797, March 2003.
[10] J. J. Jung, Y. H. Yu, and K. S. Jo, "Collaborative web browsing based on ontology learning from bookmarks," in Proceedings of the International Conference on Computational Science, pp. 513-520, 2004.
[11] J. J. Jung, "Computational reputation model based on selecting consensus choices: an empirical study on semantic wiki platform," Expert Systems with Applications, vol. 39, no. 10, pp. 9002-9007, 2012.
[12] J. Lehmann and L. Buhmann, "ORE: a tool for repairing and enriching knowledge bases," in Proceedings of the 9th International Semantic Web Conference, pp. 177-193, 2010.
[13] J. Liu, Y.-C. Liu, W. Jiang, and X.-L. Wang, "Research on automatic acquisition of domain terms," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 3026-3031, July 2008.
[14] S. Minocha and P. G. Thomas, "Collaborative learning in a wiki environment: experiences from a software engineering course," New Review of Hypermedia and Multimedia, vol. 13, no. 2, pp. 187-209, 2007.
[15] R. Navigli and P. Velardi, "Ontology enrichment through automatic semantic annotation of on-line glossaries," in Proceedings of Knowledge Engineering and Knowledge Management, pp. 126-140, 2006.
[16] C. Muller and I. Gurevych, "Using Wikipedia and Wiktionary in domain-specific information retrieval," in Proceedings of the 9th Cross-Language Evaluation Forum Conference on Evaluating Systems for Multilingual and Multimodal Information Access, pp. 219-226, 2009.
[17] J. Vivaldi and H. Rodriquez, "Finding domain terms using Wikipedia," in Proceedings of the 7th International Conference on Language Resources and Evaluation, pp. 386-393, 2010.
[18] G. A. Miller, "WordNet: a lexical database for English," Communications of the ACM, vol. 38, no. 11, pp. 39-41, 1995.
[19] S. Lee, S.-Y. Huh, and R. D. McNiel, "Automatic generation of concept hierarchies using WordNet," Expert Systems with Applications, vol. 35, no. 3, pp. 1132-1144, 2008.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 5: Research Article Domain Terminology Collection for ...downloads.hindawi.com/journals/ijdsn/2014/827319.pdf · knowledge bases for the semantic interpretation of sensor data. 1. Introduction

International Journal of Distributed Sensor Networks 5

Figure 2: Example of a category-article network. (C) marks categories (e.g., "Natural language processing," "Computational linguistics," "Data mining," "Artificial intelligence applications," "Information science," and "Machine learning") and (A) marks articles (e.g., "Concept mining," "Document classification," "Automatic summarization," and "Knowledge representation").

Figure 3: Network structure between the categories "Natural language processing" and "Data mining." Solid lines and dotted lines represent the category-article network and the article-link network, respectively, whereas (C) denotes a category. Here, a link is a connection between WP articles (e.g., "Languageware," "Concept mining," "Document classification," "Text mining," "Information retrieval," "Text corpus," "Cluster analysis," "Automatic summarization," "Data visualization," "Ontology," "Sentic computing," and "Biomedical text mining").

as article links. Types 2 and 3 in Table 2 represent this option.

If the first option is chosen, the method goes to Step 4; otherwise, it goes to Step 11.

(4) Candidate categories (CC) are collected using one article of DA. If DC already contains a candidate, it is not included in CC. Explicitly, CC(art_i) = {cat_j | 1 ≤ j ≤ m}. For example, CC(Concept mining) = {Data mining, Artificial intelligence applications}.

(5) To prepare clues that indicate suitable categories, a set of candidate articles (CA) of one candidate category is formed. For example, CA(Data mining) = {Extension neural network, Big data, Data classification (business intelligence), Document classification, Web mining, Text mining, Concept mining, ...}. If there is an intersection article, then an intersection set IS is constructed to measure the similarity: IS(set_i, set_j) = (set_i ∩ set_j) = {ia_k | 1 ≤ k ≤ p}. For example, IS(Natural language processing, Data mining) = {Languageware, Concept mining, Document classification, Text mining}.

(6) To eliminate the limitation described in Section 2, we propose a new similarity measurement:

sim(DA, CA) = α × |IS(DA, CA)| / |DA| + (1 − α) × |IS(DA, CA)| / |CA|,   (1)


Table 2: Network types and examples. C denotes a category, A an article, and L a link.

Type | Semantic network                  | Example of the network
1    | C1 → (A1 = A2) ← C2               | Natural language processing → Document classification ← Data mining
2    | C1 → A1 → (L1 = A2) ← C2          | Natural language processing → Information retrieval → Text corpus ← Data mining
2    | C1 → (A1 = L2) ← A2 ← C2          | Natural language processing → Automatic summarization ← Text mining ← Data mining
3    | C1 → A1 → (L1 = L2) ← A2 ← C2     | Natural language processing → Information retrieval → Information ← Data visualization ← Data mining

Bold (in the original typesetting) marks category concepts connected by super-/subordinate relations that nevertheless represent inappropriate generalization in the hierarchical structure.

where we assign a value of 0.5 to α by default. Based on (1), we can calculate the similarities of Cases 1 and 2 in Table 1, obtaining values of 0.5 and 0.54, respectively. However, we must consider an additional constraint in the bootstrapping method. Table 3 shows another example: based on (1), both cases have the same similarity value. Even so, Case 2 is inappropriate because the coverage of DA is too narrow, which may cause the generalization problem. Thus, before calculating similarities in the bootstrapping method, the similarity constraint

|DA| ≥ |CA|   (2)

should be satisfied. According to this constraint and (1), the similarity between "Natural language processing" and "Data mining" is calculated as follows: |DA| = 117, |CA| = 93, |IS(DA, CA)| = 4; sim(Natural language processing, Data mining) = (4/117 + 4/93)/2 = 0.039.
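As a quick check, equation (1) and constraint (2) can be reproduced from the set sizes quoted above. This is a sketch using only the cardinalities from the paper's example, not the actual article sets:

```python
# Equation (1) with alpha = 0.5, applied to the paper's example:
# DA = "Natural language processing" articles (117), CA = "Data mining"
# articles (93), with 4 intersection articles.
def sim(intersection, da_size, ca_size, alpha=0.5):
    return alpha * intersection / da_size + (1 - alpha) * intersection / ca_size

da_size, ca_size, inter = 117, 93, 4
assert da_size >= ca_size            # similarity constraint (2): |DA| >= |CA|
value = sim(inter, da_size, ca_size)
print(round(value, 3))               # 0.039, as in the text
```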

(7) If the similarity exceeds a predetermined threshold, go to Step 8; otherwise, skip Step 8 and proceed to Step 9.

(8) If the candidate category has a similarity greater than the threshold, the system considers the candidate to belong to the domain and inserts the candidate and its articles into DC and DA. Here, new articles are

Table 3: Examples of similarity measurement.

Case | Categories | Count of articles | |C1 ∩ C2| | sim
1    | DA1 / CA1  | 400 / 100         | 100       | 0.625
2    | DA2 / CA2  | 100 / 400         | 100       | 0.625

accompanied by supplementary values of dist, count, and dw. Specifically, dist is the category distance from the entry category (in the case of "Data mining," dist is 2), count is 1, and dw is the similarity calculated in Step 6. If DA already contains an article, the original supplementary values are increased by adding the new ones. These values are used later for domain-term selection.

(9) If there is at least one element in CC, we return to Step 4 for the next candidate category; otherwise, proceed to Step 10.

(10) If there is at least one element in DA, we return to Step 2 for the new CC of the next domain article; otherwise, proceed to Step 19.

(11) To use the links of articles as additional clues for the similarity measurement, a domain link set (DLS) is collected: DLS(DA) = {link_dom_j | 1 ≤ j ≤ m}. For example, DLS = {Information, Metadata, Relational database, World Wide Web, Data (computing), Document retrieval, ...}.

(12) This step is the same as Step 4.

(13) This step is the same as Step 5.

(14) Construct a candidate link set (CLS) with links from CA. Explicitly, CLS(CA(cat_k)) = {link_cat_t | 1 ≤ t ≤ q}. For example, CLS(CA(Data mining)) = {Computer science, Data set, Artificial intelligence, Database management system, Business intelligence, Neural network, Cluster analysis, ...}.

(15) If the similarity constraint is satisfied, (1) is applied to determine the category-article similarity. We use additional similarity measurements for article-link networks. The network can have different degrees of relatedness according to the network type (see Table 2). If two categories have complex, close connections, the similarity should be greater because of common features in the neighborhood. To apply both characteristics, we propose a further similarity measure:

sim(set_i, set_j) = (|IS(set_i, set_j)| / distance) × (1 / (|set_i| + |set_j|)),   (3)

where distance is the number of articles that exist on the network (this is different from dist in DA). According to (3), we can calculate sim(DLS, CA), sim(DA, CLS), and sim(DLS, CLS), which have distances of 2, 2, and 3, respectively. The final similarity (final_sim) between DA and CA is determined by summing sim(DA, CA) from (1) with sim(DLS, CA), sim(DA, CLS), and sim(DLS, CLS) from (3). The similarity between "Natural language processing" and "Data mining" can thus be calculated as follows:

|DA| = 117, |CA| = 93, |DLS| = 3633, |CLS| = 3220,
|IS(DA, CA)| = 4, |IS(DLS, CA)| = 91, |IS(DA, CLS)| = 63, |IS(DLS, CLS)| = 6771,

sim(DA, CA) = (4/117 + 4/93)/2 = 0.039,
sim(DLS, CA) = (91/2)/(3633 + 93) = 0.012,
sim(DA, CLS) = (63/2)/(117 + 3220) = 0.009,
sim(DLS, CLS) = (6771/3)/(3633 + 3220) = 0.3293,

final_sim(Natural language processing, Data mining) = 0.3893.   (4)
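The arithmetic in (4) can be verified directly. The sketch below uses only the cardinalities reported above and rounds each component as the text does before summing:

```python
# Equation (3): |IS| / distance * 1 / (|set_i| + |set_j|).
def link_sim(intersection, distance, size_i, size_j):
    return intersection / distance / (size_i + size_j)

sim_da_ca   = round((4 / 117 + 4 / 93) / 2, 3)         # equation (1): 0.039
sim_dls_ca  = round(link_sim(91, 2, 3633, 93), 3)      # 0.012
sim_da_cls  = round(link_sim(63, 2, 117, 3220), 3)     # 0.009
sim_dls_cls = round(link_sim(6771, 3, 3633, 3220), 4)  # 0.3293
final_sim = sim_da_ca + sim_dls_ca + sim_da_cls + sim_dls_cls
print(round(final_sim, 4))                             # 0.3893
```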

(16) If the similarity exceeds the predetermined threshold, go to Step 17; otherwise, Step 18 is carried out.

(17) If the similarity exceeds the threshold, the system enriches DC, DA, and DLS. This step is similar to Step 8.

(18) If there is at least one element in CC, we return to Step 12 for the next candidate category; otherwise, go to Step 10.

(19) Output the DC and DA acquired through the bootstrapping process.

(20) Terminate the bootstrapping process and evaluate DC and DA, which were increased in Steps 8 and 17. For DA, the evaluations are divided into three types by supplementary value (dist, count, and dw). In the case of the domain weight, we use values normalized according to

weight′(art_i) = weight(art_i) / max_{art_k ∈ DA} weight(art_k).   (5)
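A minimal sketch of the normalization in (5); the dw values below are hypothetical placeholders, not figures from the experiments:

```python
# Normalize each article's domain weight (dw) by the maximum weight in DA,
# per equation (5). The weights here are illustrative, invented values.
weights = {"Concept mining": 0.68, "Text mining": 0.61, "Languageware": 0.17}
max_weight = max(weights.values())
normalized = {art: w / max_weight for art, w in weights.items()}
print(normalized["Concept mining"])  # 1.0 (the maximum normalizes to 1)
```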

We have described all of the bootstrapping steps for domain-term selection by grouping similar categories. In the next section, a few aspects of performance are evaluated.
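For readers who prefer code, the overall loop (Steps 2-10, category-article networks only) can be sketched as follows. This is a schematic reading of the steps, not the authors' implementation; the tiny category-to-article map and the function names are invented for illustration:

```python
# Horizontal bootstrapping over a toy category-article graph (illustrative data).
ARTICLES = {  # category -> its articles
    "Natural language processing": {"Concept mining", "Document classification",
                                    "Automatic summarization", "Text corpus"},
    "Data mining": {"Concept mining", "Document classification", "Text mining"},
    "Biology": {"Cell (biology)", "Evolution"},
}
CATEGORIES = {}  # inverted index: article -> categories
for cat, arts in ARTICLES.items():
    for art in arts:
        CATEGORIES.setdefault(art, set()).add(cat)

def sim(da, ca, alpha=0.5):  # equation (1) on explicit article sets
    inter = len(da & ca)
    return alpha * inter / len(da) + (1 - alpha) * inter / len(ca)

def bootstrap(entry, threshold=0.2):
    dc = {entry}                    # domain categories (DC)
    da = set(ARTICLES[entry])       # domain articles (DA)
    queue = list(da)                # articles whose categories are unexplored
    while queue:
        art = queue.pop()
        for cand in CATEGORIES.get(art, set()) - dc:  # Step 4: candidates (CC)
            ca = ARTICLES[cand]                       # Step 5: candidate articles
            if len(da) >= len(ca) and sim(da, ca) > threshold:  # (2), (1), Step 7
                dc.add(cand)                          # Step 8: enrich DC and DA
                new_articles = ca - da
                da |= new_articles
                queue.extend(new_articles)            # horizontal expansion
    return dc, da

dc, da = bootstrap("Natural language processing")
print(sorted(dc))  # ['Data mining', 'Natural language processing']
```

Note that the unrelated "Biology" category is never reached because none of its articles connect to the growing domain article set, which is the intended effect of the horizontal search.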

4. Experimental Evaluations

This section considers the evaluation of DC and DA. One objective of our research is to select as many domain terms

Table 4: Domain categories collected for "Semantic Web" with NSL and threshold value 0.2. RE: relevance evaluation.

Categories                                          | sim  | dist | RE
Metadata publishing                                 | 0.47 | 2    | 1
Semantic HTML                                       | 0.40 | 2    | 1
RDF                                                 | 0.61 | 2    | 1
Knowledge engineering                               | 0.42 | 2    | 1
Folksonomy                                          | 0.27 | 2    | 1
Triplestores                                        | 0.68 | 2    | 1
Domain-specific knowledge representation languages  | 0.37 | 2    | 1
RDF data access                                     | 0.52 | 2    | 1
Book swapping                                       | 0.22 | 9    | 0
Rule engines                                        | 0.28 | 2    | 1
Ontology languages                                  | 0.35 | 2    | 1
Knowledge bases                                     | 0.30 | 2    | 1

as possible, for which we proposed the bootstrapping method for similar category grouping. In the process, DC and DA become enriched with categories and articles, respectively, and each article has supplementary values of dist, count, and dw.

To evaluate the quality of DC and DA, we used an article-category dataset and a Pagelinks dataset that are included among the WP components in DBpedia version 3.7 (DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web; http://dbpedia.org/About). To utilize the WP networks, we implemented two versions of our system, referred to as new similarity (NS) and new similarity with links (NSL). We varied the threshold of each system from 0.1 to 0.9. Moreover, we chose entry sets of 40 categories (each set has one category) from fields of computer science, such as "Natural language processing," "Speech recognition," and "Semantic Web." The results are compared with those of a baseline method that employs the DC similarity measure. Table 4 shows some of the similar categories collected by NSL for the entry category "Semantic Web" with a threshold of 0.2.

We invited domain specialists to examine the results, and each collection was manually checked by each evaluator. The RE field in Table 4 shows the actual checked results, where values of 1 and 0 were ascribed by the evaluators for relevance and irrelevance, respectively. Tables 5 and 6 present summaries of the DC evaluations.

There was no improvement for thresholds greater than 0.6, and the bootstrapping was incomplete for a threshold of 0.1. Therefore, we present results for threshold values from 0.2 to 0.6. In the experiments, the baseline attained an extension rate of only 1.18 (40 categories extended by 7 categories), even though its precision was 100%. The aim of information processing is to reduce the time taken to accomplish certain objectives, which implies that the information system should


Box 3: Domain articles (DA) collected for "Semantic Web" with NSL and threshold value 0.2: XHTML + RDFa, Bath Profile, Sidecar file, COinS, Metadata publishing, MARC standards, WizFolio, Qiqqa, ISO-TimeML, TimeML, Metadata Authority Description Schema, Bookends (software), RIS (file format), Metadata Object Description Schema, EndNote, Refer (software), ISO 2709, BibTeX, XML, S5 (file format), Semantic HTML, Simple HTML Ontology Extensions, Opera Show Format, XOXO, XHTML Friends Network, StrixDB, Graph Style Sheets, TriX (syntax), TriG (syntax), RDF feed, Redland RDF Application Framework, RDF query language, Turtle (syntax), RDFLib, Notation3, SPARQL, D3web, Artificial architecture, NetWeaver Developer, Knowledge engineer, Frame language.

Table 5: DC evaluations with the baseline and NS.

Type           | Baseline 0.2 | Baseline 0.3 | NS 0.2 | NS 0.3 | NS 0.4 | NS 0.5 | NS 0.6
Extended count | 7            | —            | 282    | 133    | 87     | 82     | —
Extension rate | 1.18         | —            | 8.05   | 4.33   | 3.18   | 3.05   | —
Appropriate    | 7            | —            | 241    | 113    | 80     | 74     | —
Inappropriate  | 0            | —            | 41     | 20     | 7      | 8      | —
Precision (%)  | 100.00       | —            | 85.46  | 84.96  | 91.95  | 90.24  | —

Table 6: DC evaluations with NSL.

Threshold value | 0.2   | 0.3   | 0.4   | 0.5   | 0.6
Extended count  | 938   | 346   | 207   | 134   | 47
Extension rate  | 24.45 | 9.65  | 6.18  | 4.35  | 2.18
Appropriate     | 799   | 293   | 187   | 127   | 47
Inappropriate   | 139   | 53    | 20    | 7     | 0
Precision (%)   | 85.18 | 84.68 | 90.34 | 94.78 | 100.00

Table 7: DA evaluations based on distance, including articles within the indicated distance.

Distance      | 2    | 3    | 4    | 5    | All
Appropriate   | 1092 | 1224 | 1246 | 1253 | 1253
Inappropriate | 252  | 336  | 471  | 508  | 514
Precision (%) | 81.3 | 78.5 | 72.6 | 71.2 | 70.9

Table 8: DA evaluations based on count (overlapped).

Overlapped count | 1    | 2    | 3    | 4
Appropriate      | 1253 | 330  | 90   | 15
Inappropriate    | 514  | 80   | 14   | 1
Precision (%)    | 70.9 | 80.5 | 86.5 | 93.8

Table 9: DA evaluations based on normalized weight.

Threshold value | 0    | 0.1  | 0.2  | 0.3
Appropriate     | 1253 | 1195 | 870  | 755
Inappropriate   | 514  | 292  | 69   | 22
Precision (%)   | 70.9 | 80.4 | 92.7 | 97.2

provide varied results. In this respect, we do not expect the baseline results to be helpful. However, it is apparent that NSL provides wide extension and high precision. The maximum extension rate was 24.45, with 799 appropriate categories for a threshold of 0.2. The minimum precision was around 84% when the threshold was 0.3.

In addition to evaluating DC, we evaluated DA with the NSL results for a threshold of 0.2. To examine the influence of the distance, count, and domain weight, we analyzed the results according to each factor. Six DAs were selected at random, with a total of 1769 articles (terms). Box 3 enumerates some of the collected articles for the domain "Semantic Web," and Tables 7, 8, and 9 show the evaluation results with respect to distance, count, and weight.

The basic performance of the domain-term selection attained a precision of 70.9%. As expected, the precision was inversely proportional to the distance; however, a distance of 4 produced almost all of the unrelated articles. The weight and count could be used as important criteria to select domain terms; we found that the weight returned more refined results than the count (the weight returned 755 appropriate terms with 97.2% precision at a threshold of 0.3). At short distances, there are many names of people and organizations, such as "Squarespace," "Rackspace Cloud," and "Nsite Software (Platform as a Service)." These names were selected by the bootstrapping because their associated categories (e.g., "Cloud platforms," "Cloud storage," and "Cloud infrastructure") were similar to the entry (e.g., "Cloud computing"). This situation is not caused by our method but by the definition of the article-category relations of WP. We believe that this can be resolved by processing content (abstracts) or tabular information in the future.

5. Conclusions and Future Work

This paper has proposed a method of domain-term collection through a bootstrapping process to assist the semantic interpretation of data from sensor networks. To achieve this, we identified weaknesses in the WP category hierarchy (i.e., loops and inappropriate generalizations) and chose a horizontal rather than vertical category search. We proposed new semantic similarity measurements and a similarity constraint to surpass existing methods. Moreover, we employed category-article networks and article-link networks to elicit information for the category similarity measurement. In performance evaluations, our category grouping based on NSL yielded the greatest number of proper results. In terms of domain-term selection, we confirmed that the results obtained with normalized weights had the best precision and extension rate. The distance-based metric had no positive influence on our research: when the distance was greater than three, almost all of the terms were unrelated. However, we believe that the collected domain terminologies can assist the construction of domain knowledge bases for the semantic interpretation of sensor data.

WP has additional weaknesses beyond those mentioned in this paper, especially in the category-article relation. For example, the term "Paco Nathan" is a personal name that has "Natural language processing" as one of its categories. The relation between the two, that is, "Paco Nathan" has expertise in "Natural language processing," causes noise and negatively influences semantic information processing. We think that this problem can be solved in future work by processing additional WP components, such as abstract or tabular information. Moreover, our research employed only the out-links of WP articles. If the in-links were also considered, we expect that the results would be more significant, with wider coverage of domain terms and higher relevance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. Kebler and K. Janowicz, "Linking sensor data—why, to what, and how?" in Proceedings of the 3rd International Workshop on Semantic Sensor Networks, pp. 77–91, 2010.

[2] S. Tilak, N. B. Abu-Ghazaleh, and W. Heinzelman, "A taxonomy of wireless micro-sensor network models," ACM SIGMOBILE Mobile Computing and Communications Review, vol. 6, no. 2, pp. 28–36, 2002.

[3] G. Goodwin and D. J. Russomanno, "An ontology-based sensor network prototype environment," in Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pp. 1–2, 2006.

[4] K. Janowicz and M. Compton, "The Stimulus-Sensor-Observation ontology design pattern and its integration into the Semantic Sensor Network ontology," in Proceedings of the 3rd International Workshop on Semantic Sensor Networks, pp. 92–106, 2010.

[5] P. Barnaghi, S. Meissner, M. Presser, and K. Moessner, "Sense and sens'ability: semantic data modelling for sensor networks," in Proceedings of the ICT Mobile Summit, pp. 1–9, 2009.

[6] A. Broring, P. Maue, K. Janowicz, D. Nust, and C. Malewski, "Semantically-enabled sensor plug & play for the sensor web," Sensors, vol. 11, no. 8, pp. 7568–7605, 2011.

[7] M. Hwang, D. Choi, and P. Kim, "A method for knowledge base enrichment using Wikipedia document information," Information, vol. 13, no. 5, pp. 1599–1612, 2010.

[8] P. Velardi, A. Cucchiarelli, and M. Petit, "A taxonomy learning method and its application to characterize a scientific web community," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 2, pp. 180–191, 2007.

[9] H. Avancini, A. Lavelli, B. Magnini, F. Sebastiani, and R. Zanoli, "Expanding domain-specific lexicons by term categorization," in Proceedings of the ACM Symposium on Applied Computing, pp. 793–797, March 2003.

[10] J. J. Jung, Y. H. Yu, and K. S. Jo, "Collaborative web browsing based on ontology learning from bookmarks," in Proceedings of the International Conference on Computational Science, pp. 513–520, 2004.

[11] J. J. Jung, "Computational reputation model based on selecting consensus choices: an empirical study on semantic wiki platform," Expert Systems with Applications, vol. 39, no. 10, pp. 9002–9007, 2012.

[12] J. Lehmann and L. Buhmann, "ORE—a tool for repairing and enriching knowledge bases," in Proceedings of the 9th International Semantic Web Conference, pp. 177–193, 2010.

[13] J. Liu, Y.-C. Liu, W. Jiang, and X.-L. Wang, "Research on automatic acquisition of domain terms," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 3026–3031, July 2008.

[14] S. Minocha and P. G. Thomas, "Collaborative learning in a wiki environment: experiences from a software engineering course," New Review of Hypermedia and Multimedia, vol. 13, no. 2, pp. 187–209, 2007.

[15] R. Navigli and P. Velardi, "Ontology enrichment through automatic semantic annotation of on-line glossaries," in Proceedings of Knowledge Engineering and Knowledge Management, pp. 126–140, 2006.

[16] C. Muller and I. Gurevych, "Using Wikipedia and Wiktionary in domain-specific information retrieval," in Proceedings of the 9th Cross-Language Evaluation Forum Conference on Evaluating Systems for Multilingual and Multimodal Information Access, pp. 219–226, 2009.

[17] J. Vivaldi and H. Rodriquez, "Finding domain terms using Wikipedia," in Proceedings of the 7th International Conference on Language Resources and Evaluation, pp. 386–393, 2010.

[18] G. A. Miller, "WordNet: a lexical database for English," Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.

[19] S. Lee, S.-Y. Huh, and R. D. McNiel, "Automatic generation of concept hierarchies using WordNet," Expert Systems with Applications, vol. 35, no. 3, pp. 1132–1144, 2008.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 6: Research Article Domain Terminology Collection for ...downloads.hindawi.com/journals/ijdsn/2014/827319.pdf · knowledge bases for the semantic interpretation of sensor data. 1. Introduction

6 International Journal of Distributed Sensor Networks

Table 2 Network types and examples C denotes a category A anarticle and L a link

Type Semantic network Examples of thenetwork

1 C1 rarr (A1 = A2)larr C2

Natural languageprocessing rarrDocument

classificationlarrData mining

2

C1 rarr A1 rarr (L1 = A2)larr C2

Natural languageprocessing rarrInformation

retrieval rarr Textcorpuslarr data

mining

C1 rarr (A1 = L2)larr A2larr C2

Natural languageprocessing rarrAutomatic

summarizationlarrText mininglarrData mining

3 C1 rarr A1 rarr (L1 = L2)larr A2larr C2

Natural languageprocessing rarrInformationretrieval rarr

InformationlarrData visualizationlarr Data mining

Bold represents the category concepts connected with super and subrelations but inappropriate generalization in hierarchical structure

where basically we assign a value of 05 to120572 Based on(1) we can calculate the similarities of Cases 1 and 2 inTable 1 to obtain values of 05 and 054 respectivelyHowever we must consider an additional constraintin the bootstrapping method Table 3 shows anotherexample Based on (1) we find that both cases havethe same similarity value as shown in Table 3 Evenso Case 2 is inappropriate because the coverage ofDA is too narrow This may cause the generalizationproblem Thus before calculating similarities in thebootstrapping method the similarity constraint

|DA| ge |CA| (2)

should be satisfiedAccording to this constraint and (1) the similaritybetween ldquoNatural language processingrdquo and ldquoDataminingrdquo is calculated as follows |DA| = 117 |CA| =93 |119868119878 (DACA)| = 4 sim (Natural language process-ing Data mining) = (4117 + 493)2 = 0039

(7) If the similarity exceeds a predetermined thresholdgo to Step 8 otherwise we skip Step 8 and proceed toStep 9

(8) If the candidate category has a similarity that isgreater than the threshold the system considers thecandidate to be the domain and inserts the candidateand its articles into DC andDAHere new articles are

Table 3 Examples of similarity measurement

Cases Categories Count of articles |1198621

cap 1198622

| sim

1 DA1 400 100 0625CA1 100

2 DA2 100 00 0625CA2 400

accompanied by supplementary values of dist countand dw Specifically dist is the category distance fromthe entry category (in the case of ldquoData miningrdquo distis 2) count is 1 and dw is the similarity calculated inStep 6 If DA already contains an article the originalsupplementary values are increased by adding thenewonesThese values are used later for domain-termselection

(9) If there is at least one element in CC we return to Step4 for the next candidate category otherwise proceedto Step 10

(10) If there is at least one element in DA we return toStep 2 for the new CC of the next domain articleotherwise proceed to Step 19

(11) To use the links of articles as additional clues for thesimilarity measurement a domain link set (DLS) iscollected DLS(DA) = link domj 1 =lt j =lt m Forexample DLS = Information Metadata Relationaldatabase World Wide Web Data (computing) Docu-ment retrieval

(12) This step is the same as Step 4(13) This step is the same as Step 5(14) Construct a candidate link set (CLS) with links from

CA Explicitly CLS(CA (catk)) = link cat119905 1 =lt

t =lt q For example CLS(CA (Data mining)) =Computer science Data set Artificial intelligenceDatabase management system Business intelligenceNeural network Cluster analysis

(15) If the similarity constraint is satisfied (1) is appliedto determine the category-article similarity We useadditional similarity measurements for article-linknetworks The network can have different degreesof relatedness according to the network type (seeTable 2) If two categories have complex close con-nections the similarity should be greater becauseof common features in the neighborhood To applyboth characteristics we propose a further similaritymeasure

sim (set119894 set119895)

=

10038161003816100381610038161003816119868119878 (set

119894 set119895)

10038161003816100381610038161003816

distancetimes

1

1003816100381610038161003816set119894

1003816100381610038161003816+

10038161003816100381610038161003816set119895

10038161003816100381610038161003816

(3)

where distance is the number of articles that existon the network (this is different from dist in DA)According to (3) we can calculate sim (DLS CA)

International Journal of Distributed Sensor Networks 7

sim (DA CLS) and sim (DLS CLS) which have dis-tances of 2 2 and 3 respectively The final similarity(final sim) between DA and CA is determined bysumming sim (DA CA) from (1) with sim (DLS CA)sim (DA CLS) and sim(DLS CLS) from (3) Thesimilarity between ldquoNatural language processingrdquo andldquoData miningrdquo can thus be calculated as follows

|DA| = 117 |CA| = 93

|DLS| = 3633 |CLS| = 3220

|119868119878 (DACA)| = 4 |119868119878 (DLSCA)| = 91

|119868119878 (DACLS)| = 63 |119868119878 (DLSCLS)| = 6771

sim (DACA) = (4117 + 493)2

= 0039

sim (DLSCA) = (912)(3633 + 93)

= 0012

sim (DACLS) = (632)(117 + 3220)

= 0009

sim (DLSCLS) = (67713)(3633 + 3220)

= 03293

final sim (Natural language processingData mining)

= 03893

(4)

(16) If the similarity exceeds the predetermined threshold, go to Step 17; otherwise, Step 18 is carried out.

(17) If the similarity exceeds the threshold, the system enriches DC, DA, and DLS. This step is similar to Step 8.

(18) If there is at least one element in CC, we return to Step 12 for the next candidate category; otherwise, go to Step 10.

(19) Output the DC and DA acquired through the bootstrapping process.

(20) Terminate the bootstrapping process and evaluate DC and DA, which are increased in Step 8 or 17. For DA, the evaluations are divided into three types by supplementary value (dist, count, and dw). In the case of domain weight, we use values normalized according to

    weight'(art_i) = weight(art_i) / max_{art_k ∈ DA} weight(art_k).    (5)
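In code, the normalization in (5) is a division by the largest weight in DA, so the top-weighted article maps to 1.0. The article names and weight values below are made-up illustration data, not results from the experiments:

```python
# Eq. (5): normalize each article's domain weight by the maximum weight in DA.
# The weights here are hypothetical illustration values.
weights = {"SPARQL": 3.2, "Turtle (syntax)": 1.6, "Knowledge engineer": 0.8}

max_weight = max(weights.values())
normalized = {art: w / max_weight for art, w in weights.items()}
print(normalized["SPARQL"])  # 1.0 (the top-weighted article)
```

This is what makes the weight thresholds in the evaluation (e.g., 0.1 to 0.3 in Table 9) comparable across domains of different sizes.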

We have described all of the bootstrapping steps for domain-term selection by grouping similar categories. In the next section, a few aspects of performance are evaluated.

4. Experimental Evaluations

This section considers the evaluation of DC and DA. One objective of our research is to select as many domain-terms

Table 4: Domain categories collected for "Semantic Web" on NSL with threshold value 0.2.

Categories                                            sim    dist   RE
Metadata publishing                                   0.47   2      1
Semantic HTML                                         0.40   2      1
RDF                                                   0.61   2      1
Knowledge engineering                                 0.42   2      1
Folksonomy                                            0.27   2      1
Triplestores                                          0.68   2      1
Domain-specific knowledge representation languages    0.37   2      1
RDF data access                                       0.52   2      1
Book swapping                                         0.22   9      0
Rule engines                                          0.28   2      1
Ontology languages                                    0.35   2      1
Knowledge bases                                       0.30   2      1

RE: relevance evaluation.

as possible, for which we proposed the bootstrapping method for similar category grouping. In the process, DC and DA become enriched with categories and articles, respectively, and each article has supplementary values of dist, count, and dw.

To evaluate the quality of DC and DA, we used an article-category dataset and a Pagelinks dataset that are included among the WP components in DBpedia version 3.7 (DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web; http://dbpedia.org/About). To utilize the WP networks, we implemented two versions of our system, referred to as new similarity (NS) and new similarity with links (NSL). We varied the threshold of each system from 0.1 to 0.9. Moreover, we chose entry sets of 40 categories (each set has one category) from fields of computer science, such as "Natural language processing", "Speech recognition", and "Semantic Web". The results are compared with those of a baseline method that employs the DC similarity measure. Table 4 shows some of the similar categories collected by NSL for the entry category "Semantic Web" with a threshold of 0.2.

We invited domain specialists to examine the results, and each collection was manually checked by each evaluator. The RE field in Table 4 shows the actual checked results, where values of 1 and 0 were ascribed by the evaluators for relevance and irrelevance, respectively. Tables 5 and 6 present summaries of the DC evaluations.

There was no improvement for thresholds greater than 0.6, and the bootstrapping was incomplete for a threshold of 0.1. Therefore, we present results for threshold values from 0.2 to 0.6. In the experiments, the baseline attained an extension rate of only 118% (40 categories extended by 7 categories), even though its precision was 100%. The aim of information processing is to reduce the time taken to accomplish certain objectives, which implies that the information system should


XHTML + RDFa, Bath Profile, Sidecar file, COinS, Metadata publishing, MARC standards, WizFolio, Qiqqa, ISO-TimeML, TimeML, Metadata Authority Description Schema, Bookends (software), RIS (file format), Metadata Object Description Schema, EndNote, Refer (software), ISO 2709, BibTeX, XML, S5 (file format), Semantic HTML, Simple HTML Ontology Extensions, Opera Show Format, XOXO, XHTML Friends Network, StrixDB, Graph Style Sheets, TriX (syntax), TriG (syntax), RDF feed, Redland RDF Application Framework, RDF query language, Turtle (syntax), RDFLib, Notation3, SPARQL, D3web, Artificial architecture, NetWeaver Developer, Knowledge engineer, Frame language

Box 3: Domain articles (DA) collected for "Semantic Web" on NSL with threshold value 0.2.

Table 5: DC evaluations with baseline and NS.

Type                 Baseline        NS
Threshold value      0.2     0.3     0.2     0.3     0.4     0.5     0.6
Extended count       7       -       282     133     87      82      -
Extension rate (%)   118     -       805     433     318     305     -
Appropriate          7       -       241     113     80      74      -
Inappropriate        0       -       41      20      7       8       -
Precision (%)        100.00  -       85.46   84.96   91.95   90.24   -

Table 6: DC evaluations with NSL.

Threshold value      0.2     0.3     0.4     0.5     0.6
Extended count       938     346     207     134     47
Extension rate (%)   2445    965     618     435     218
Appropriate          799     293     187     127     47
Inappropriate        139     53      20      7       0
Precision (%)        85.18   84.68   90.34   94.78   100.00
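The derived rows in Tables 5 and 6 follow directly from the raw counts. As inferred from the reported figures, the extension rate relates the 40 entry categories plus the extended categories to the 40 entries, and precision is the share of extended categories judged appropriate:

```python
# Recompute the derived rows of Tables 5 and 6 from the raw counts.
# The extension-rate definition is inferred from the reported figures:
# (entry categories + extended categories) / entry categories, in percent.
ENTRY_CATEGORIES = 40  # one entry category per set

def extension_rate(extended_count):
    return 100 * (ENTRY_CATEGORIES + extended_count) / ENTRY_CATEGORIES

def precision(appropriate, extended_count):
    return 100 * appropriate / extended_count

# NSL at threshold 0.2: 938 extended categories, 799 judged appropriate
print(extension_rate(938))            # 2445.0
print(round(precision(799, 938), 2))  # 85.18
```

The same definitions reproduce the NS column (282 extended gives an extension rate of 805%) and the baseline row (7 extended gives roughly 118%).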

Table 7: DA evaluations based on distance (including articles within the indicated distance).

Distance             2       3       4       5       All
Appropriate          1092    1224    1246    1253    1253
Inappropriate        252     336     471     508     514
Precision (%)        81.3    78.5    72.6    71.2    70.9

Table 8: DA evaluations based on count (overlapped).

Overlapped count     1       2       3       4
Appropriate          1253    330     90      15
Inappropriate        514     80      14      1
Precision (%)        70.9    80.5    86.5    93.8

Table 9: DA evaluations based on normalized weight.

Threshold value      0       0.1     0.2     0.3
Appropriate          1253    1195    870     755
Inappropriate        514     292     69      22
Precision (%)        70.9    80.4    92.7    97.2

provide varied results. In this respect, we do not expect the baseline results to be helpful. However, it is apparent that NSL provides wide extension and high precision. The maximum extension rate was 2445%, with 799 appropriate categories, for a threshold of 0.2. The minimum precision was around 84%, when the threshold was 0.3.

In addition to evaluating DC, we evaluated DA with the NSL results for a threshold of 0.2. To examine the influence of the distance, count, and domain weight, we analyzed the results according to each factor. Six DAs were selected at random, with a total of 1769 articles (terms). Box 3 enumerates a part of the collected articles for the domain "Semantic Web", and Tables 7, 8, and 9 show the evaluation results with respect to distance, count, and weight.

The basic performance of the domain-term selection attained a precision of 70.9%. As expected, the precision was inversely proportional to the distance; however, a distance of 4 produced almost all of the unrelated articles. The weight and count could be used as important criteria to select domain-terms; we found that the weight returned more refined results than the count (the weight returned 755 appropriate terms with 97.2% precision at a threshold of 0.3). At short distances, there are many names of people and organizations, such as "Squarespace", "Rackspace Cloud", and "Nsite Software (Platform as a Service)". These names were selected by the bootstrapping because their associated categories (e.g., "Cloud platforms", "Cloud storage", and "Cloud infrastructure") were similar to the entry (e.g., "Cloud computing"). This situation is not caused by our method but by the definition of the article-category relations of WP. We believe that this can be resolved by processing content (abstracts) or tabular information in the future.

5. Conclusions and Future Work

This paper has proposed a method of domain-term collection through a bootstrapping process to assist the semantic interpretation of data from sensor networks. To achieve this, we identified weaknesses in the WP category hierarchy (i.e., loops and inappropriate generalizations) and chose a horizontal rather than vertical category search. We proposed new semantic similarity measurements and a similarity constraint to surpass existing methods. Moreover, we employed category-article networks and article-link networks to elicit information for the category similarity measurement. In performance evaluations, our category grouping based on NSL yielded the greatest number of proper results. In terms of domain-term selection, we confirmed that the results obtained with normalized weights had the best precision and extension rate. The distance-based metric had no positive influence on our research: when the distance was greater than three, almost all of the terms were unrelated. However, we believe that the collected domain terminologies can assist the construction of domain knowledge bases for the semantic interpretation of sensor data.

WP has additional weaknesses beyond those mentioned in this paper, especially in the category-article relation. For example, the term "Paco Nathan" is a personal name that has "Natural language processing" as one of its categories. The relation between the two, that is, "Paco Nathan" has expertise in "Natural language processing", causes noise and negatively influences semantic information processing. We think that this problem can be solved in future work by processing additional WP components, such as abstract or tabular information. Moreover, our research employed only the out-links of WP articles. If the in-links were considered, we expect that the results would be more significant, with wider coverage of domain-terms and higher relevance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. Keßler and K. Janowicz, "Linking sensor data—why, to what, and how?" in Proceedings of the 3rd International Workshop on Semantic Sensor Networks, pp. 77–91, 2010.

[2] S. Tilak, N. B. Abu-Ghazaleh, and W. Heinzelman, "A taxonomy of wireless micro-sensor network models," ACM SIGMOBILE Mobile Computing and Communications Review, vol. 6, no. 2, pp. 28–36, 2002.

[3] G. Goodwin and D. J. Russomanno, "An ontology-based sensor network prototype environment," in Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pp. 1–2, 2006.

[4] K. Janowicz and M. Compton, "The Stimulus-Sensor-Observation ontology design pattern and its integration into the Semantic Sensor Network ontology," in Proceedings of the 3rd International Workshop on Semantic Sensor Networks, pp. 92–106, 2010.

[5] P. Barnaghi, S. Meissner, M. Presser, and K. Moessner, "Sense and sens'ability: semantic data modelling for sensor networks," in Proceedings of the ICT Mobile Summit, pp. 1–9, 2009.

[6] A. Bröring, P. Maué, K. Janowicz, D. Nüst, and C. Malewski, "Semantically-enabled sensor plug & play for the sensor web," Sensors, vol. 11, no. 8, pp. 7568–7605, 2011.

[7] M. Hwang, D. Choi, and P. Kim, "A method for knowledge base enrichment using Wikipedia document information," Information, vol. 13, no. 5, pp. 1599–1612, 2010.

[8] P. Velardi, A. Cucchiarelli, and M. Petit, "A taxonomy learning method and its application to characterize a scientific web community," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 2, pp. 180–191, 2007.

[9] H. Avancini, A. Lavelli, B. Magnini, F. Sebastiani, and R. Zanoli, "Expanding domain-specific lexicons by term categorization," in Proceedings of the ACM Symposium on Applied Computing, pp. 793–797, March 2003.

[10] J. J. Jung, Y. H. Yu, and K. S. Jo, "Collaborative web browsing based on ontology learning from bookmarks," in Proceedings of the International Conference on Computational Science, pp. 513–520, 2004.

[11] J. J. Jung, "Computational reputation model based on selecting consensus choices: an empirical study on semantic wiki platform," Expert Systems with Applications, vol. 39, no. 10, pp. 9002–9007, 2012.

[12] J. Lehmann and L. Bühmann, "ORE—a tool for repairing and enriching knowledge bases," in Proceedings of the 9th International Semantic Web Conference, pp. 177–193, 2010.

[13] J. Liu, Y.-C. Liu, W. Jiang, and X.-L. Wang, "Research on automatic acquisition of domain terms," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 3026–3031, July 2008.

[14] S. Minocha and P. G. Thomas, "Collaborative learning in a wiki environment: experiences from a software engineering course," New Review of Hypermedia and Multimedia, vol. 13, no. 2, pp. 187–209, 2007.

[15] R. Navigli and P. Velardi, "Ontology enrichment through automatic semantic annotation of on-line glossaries," in Proceedings of Knowledge Engineering and Knowledge Management, pp. 126–140, 2006.

[16] C. Müller and I. Gurevych, "Using Wikipedia and Wiktionary in domain-specific information retrieval," in Proceedings of the 9th Cross-Language Evaluation Forum Conference on Evaluating Systems for Multilingual and Multimodal Information Access, pp. 219–226, 2009.

[17] J. Vivaldi and H. Rodríguez, "Finding domain terms using Wikipedia," in Proceedings of the 7th International Conference on Language Resources and Evaluation, pp. 386–393, 2010.

[18] G. A. Miller, "WordNet: a lexical database for English," Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.

[19] S. Lee, S.-Y. Huh, and R. D. McNiel, "Automatic generation of concept hierarchies using WordNet," Expert Systems with Applications, vol. 35, no. 3, pp. 1132–1144, 2008.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 7: Research Article Domain Terminology Collection for ...downloads.hindawi.com/journals/ijdsn/2014/827319.pdf · knowledge bases for the semantic interpretation of sensor data. 1. Introduction

International Journal of Distributed Sensor Networks 7

sim (DA CLS) and sim (DLS CLS) which have dis-tances of 2 2 and 3 respectively The final similarity(final sim) between DA and CA is determined bysumming sim (DA CA) from (1) with sim (DLS CA)sim (DA CLS) and sim(DLS CLS) from (3) Thesimilarity between ldquoNatural language processingrdquo andldquoData miningrdquo can thus be calculated as follows

|DA| = 117 |CA| = 93

|DLS| = 3633 |CLS| = 3220

|119868119878 (DACA)| = 4 |119868119878 (DLSCA)| = 91

|119868119878 (DACLS)| = 63 |119868119878 (DLSCLS)| = 6771

sim (DACA) = (4117 + 493)2

= 0039

sim (DLSCA) = (912)(3633 + 93)

= 0012

sim (DACLS) = (632)(117 + 3220)

= 0009

sim (DLSCLS) = (67713)(3633 + 3220)

= 03293

final sim (Natural language processingData mining)

= 03893

(4)

(16) If the similarity exceeds the predetermined thresholdgo to Step 17 otherwise Step 18 is carried out

(17) If the similarity exceeds the threshold the systemenriches DC DA andDLSThis step is similar to Step8

(18) If there is at least one element in CC we return to Step12 for the next candidate category otherwise go toStep 10

(19) Output the DC and DA acquired through the boot-strapping process

(20) Terminate the bootstrapping process and evaluateDCand DA which are increased in Step 8 or 17 ForDA the evaluations are divided into three types bysupplementary value (dist count and dw) In thecase of domain weight we use values normalizedaccording to

weight1015840 (art119894) =

weight (art119894)

argart119896isinDA max weight (art119896)

(5)

We have described all of the bootstrapping steps fordomain-term selection by grouping similar categories In thenext section a few aspects of performance are evaluated

4 Experimental Evaluations

This section considers the evaluation of DC and DA Oneobjective of our research is to select as many domain-terms

Table 4 Domain categories collected for ldquoSemantic Webrdquo on NLSwith threshold value 02

Categories sim dist REMetadata publishing 047 2 1Semantic HTML 040 2 1RDF 061 2 1Knowledge engineering 042 2 1Folksonomy 027 2 1Triplestores 068 2 1Domain-specificknowledge representationlanguages

037 2 1

RDF data access 052 2 1Book swapping 022 9 0Rule engines 028 2 1Ontology languages 035 2 1Knowledge bases 030 2 1

RE relevance evaluation

as possible for which we proposed the bootstrappingmethodfor similar category grouping In the process DC and DAbecome enriched with categories and articles respectivelyand each article has supplementary values of dist count anddw

To evaluate the quality of DC and DA we used an article-category dataset and a Pagelinks dataset that are includedamong theWP components inDBpedia (DBpedia is a crowd-sourced community effort to extract structured informationfrom Wikipedia and make this information available on theWeb httpdbpediaorgAbout) version 37 To utilize theWP networks we implemented two versions of our systemreferred to as new similarity (NS) and new similarity withlinks (NSL) We varied the threshold of each system from 01to 09 Moreover we chose entry sets of 40 categories (eachset has one category) from fields of computer science suchas ldquoNatural language processingrdquo ldquoSpeech recognitionrdquo andldquoSemantic Webrdquo The results are compared with those of abaseline method that employs the DC similarity measureTable 4 shows some of the similar categories collected by NSLfor the entry category ldquoSemanticWebrdquo with a threshold of 02

We invited domain specialists to examine the resultsand each collection was manually checked by each evaluatorThe RE field in Table 4 shows the actual checked resultswhere values of 1 and 0 were ascribed by the evaluators forrelevance and irrelevance respectively Tables 5 and 6 presentsummaries of the DC evaluations

There was no improvement for thresholds greater than06 and the bootstrapping was incomplete for a threshold of01Therefore we present results for threshold values from02to 06 In the experiments the baseline attained an extensionrate of only 118 (40 categories extended by 7 categories) eventhough its precision was 100 The aim of the informationprocessing is to reduce the time taken to accomplish certainobjectives which implies that the information system should

8 International Journal of Distributed Sensor Networks

XHTML + RDFa Bath Profile Sidecar file COinS Metadata publishing MARC standards WizFolio QiqqaISO-TimeML TimeML Metadata Authority Description Schema Bookends (software) RIS (file format)Metadata Object Description Schema EndNote Refer (software) ISO 2709 BibTeX XML S5 (file format)Semantic HTML Simple HTML Ontology Extensions Opera Show Format XOXO XHTML Friends NetworkStrixDB Graph Style Sheets TriX (syntax) TriG (syntax) RDF feed Redland RDF Application FrameworkRDF query language Turtle (syntax) RDFLib Notation3 SPARQL D3web Artificial architectureNetWeaver Developer Knowledge engineer Frame language

Box 3 Domain articles (DA) collected for ldquoSemantic Webrdquo on NLS with threshold value 02

Table 5 DC evaluations with baseline and NS

Type Baseline NSThreshold value 02 03 02 03 04 05 06Extended count 7 mdash 282 133 87 82 mdashExtension rate 118 mdash 805 433 318 305 mdashAppropriate 7 mdash 241 113 80 74 mdashInappropriate 0 mdash 41 20 7 8 mdashPrecision () 10000 mdash 8546 8496 9195 9024 mdash

Table 6 DC evaluations with NSL

Type NSLThreshold value 02 03 04 05 06Extended count 938 346 207 134 47Extension rate 2445 965 618 435 218Appropriate 799 293 187 127 47Inappropriate 139 53 20 7 0Precision () 8518 8468 9034 9478 10000

Table 7 DA evaluations based on distance including articles withinthe indicated distance

Distance 2 3 4 5 AllAppropriate 1092 1224 1246 1253 1253Inappropriate 252 336 471 508 514Precision () 813 785 726 712 709

Table 8 DA evaluations based on count (overlapped)

Overlapped count 1 2 3 4Appropriate 1253 330 90 15Inappropriate 514 80 14 1Precision () 709 805 865 938

Table 9 DA evaluations based on normalized weight

Threshold value 0 01 02 03Appropriate 1253 1195 870 755Inappropriate 514 292 69 22Precision () 709 804 927 972

provide varied results In this respect we do not expect thebaseline results to be helpful However it is apparent that NSLprovides wide extension and high precision The maximum

extension rate was 2445 with 799 appropriate categories fora threshold of 02 The minimum precision was around 84when the threshold was 03

In addition to evaluating DC we evaluated DA withthe NSL results for a threshold of 02 To examine theinfluence of the distance count and domain weight weanalyzed the results according to each factor Six DAs wereselected at randomwith a total of 1769 articles (terms) Box 3enumerates a part of collected articles for a domain ldquoSemanticWebrdquo And Tables 7 8 and 9 show the evaluation results withrespect to distance count and weight

The basic performance of the domain-term selectionattained precision of 709 As expected the precision wasinversely proportional to the distance however a distanceof 4 produced almost all of the unrelated articles Theweight and count could be used as important criteria toselect domain-terms we found that the weight returnedmore refined results than the count (the weight returned755 appropriate terms with 972 precision at a thresholdof 03) At short distances there are many names of peopleand organizations such as ldquoSquarespacerdquo ldquoRackspace Cloudrdquoand ldquoNsite Software (Platform as a Service)rdquo These nameswere selected by the bootstrapping because their associatedcategories (eg ldquoCloud platformsrdquo ldquoCloud storagerdquo andldquoCloud infrastructurerdquo) were similar to the entry (eg ldquoCloudcomputingrdquo) This situation is not caused by our methodbut by the definition of the article-category relations of WPWe believe that this can be resolved by processing content(abstracts) or tabular information in the future

5 Conclusions and Future Work

This paper has proposed a method of domain-term collec-tion through a bootstrapping process to assist the semanticinterpretation of data from sensor networks To achievethis we identified weaknesses in the WP category hierarchy

International Journal of Distributed Sensor Networks 9

(ie loops and inappropriate generalizations) and chose ahorizontal rather than vertical category searchWeproposednew semantic similarity measurements and a similarity con-straint to surpass existing methods Moreover we employedcategory-article networks and article-link networks to elicitinformation for the category similarity measurement Inperformance evaluations our category grouping based onNSL yielded the greatest number of proper results In termsof domain-term selection we confirmed that the resultsobtained with normalized weights had the best precision andextension rate The distance-based metric had no positiveinfluence on our research When the distance was greaterthan three almost all of the terms were unrelated Howeverwe believe that the collected domain terminologies can assistthe construction of domain knowledge bases for the semanticinterpretation of sensor data

WP has additional weaknesses to those mentioned inthis paper especially in the category-article relation Forexample the term ldquoPaco Nathanrdquo is a personal name thathas ldquoNatural language processingrdquo as one of its categoriesThe relation between the two that is ldquoPaco Nathanrdquo hasexpertise in ldquoNatural language processingrdquo causes noise andnegatively influences semantic information processing Wethink that this problem can be solved in future work byprocessing additional WP components such as abstract ortabular information Moreover our research employed onlythe out-links of WP articles If the in-links were consideredwe expect that the results would be more significant withwider coverage of domain-terms and higher relevance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] C Kebler and K Janowicz ldquoLinking sensor datamdashwhy to whatand howrdquo in Proceedings of the 3rd International Workshop onSemantic Sensor Networks pp 77ndash91 2010

[2] S Tilak N B Abu-Ghazaleh andWHeinzelman ldquoA taxonomyof wireless micro-sensor network modelsrdquo ACM SIGMOBILEMobile Computing and Communications Review vol 6 no 2pp 28ndash36 2002

[3] G Goodwin and D J Russomanno ldquoAn ontology-bases sensornetwork prototype environmentrdquo in Proceedings of the 5thInternational Conference on Information Processing in SensorNetworks pp 1ndash2 2006

[4] K Janowicz and M Compton ldquoThe Stimulus-Sensor-Observation ontology design pattern and its integration intothe Semantic Sensor Network ontologyrdquo in Proceedings of the3rd International Workshop on Semantic Sensor Networks pp92ndash106 2010

[5] P Barnaghi S Meissner M Presser and K Moessner ldquoSenseand sensrsquoability semantic data modelling for sensor networksrdquoin Proceedings of the ICT Mobile Summit pp 1ndash9 2009

[6] A Broring P Maue K Janowicz D Nust and C MalewskildquoSemantically-enabled sensor plug amp play for the sensor webrdquoSensors vol 11 no 8 pp 7568ndash7605 2011

[7] M Hwang D Choi and P Kim ldquoAmethod for knowledge baseenrichment using wikipedia document informationrdquo Informa-tion vol 13 no 5 pp 1599ndash1612 2010

[8] P Velardi A Cucchiarelli and M Petit ldquoA taxonomy learningmethod and its application to characterize a scientific web com-munityrdquo IEEETransactions onKnowledge andData Engineeringvol 19 no 2 pp 180ndash191 2007

[9] H Avancini A Lavelli B Magnini F Sebastiani and R ZanolildquoExpanding domain-specific lexicons by term categorizationrdquoin Proceedings of the ACM Symposium on Applied Computingpp 793ndash797 March 2003

[10] J J Jung Y H Yu and K S Jo ldquoCollaborative web browsingbased on ontology learning from bookmarksrdquo in Proceedings ofthe International Conference on Computational Science pp 513ndash520 2004

[11] J J Jung ldquoComputational reputation model based on select-ing consensus choices an empirical study on semantic wikiplatformrdquo Expert Systems with Applications vol 39 no 10 pp9002ndash9007 2012

[12] J Lehmann and L Buhmann ldquoOREmdasha tool for repairingand enriching knowledge basesrdquo in Proceedings of the 9thInternational Semantic Web Conference pp 177ndash193 2010

[13] J Liu Y-C Liu W Jiang and X-L Wang ldquoResearch onautomatic acquisition of domain termsrdquo inProceedings of the 7thInternational Conference on Machine Learning and Cybernetics(ICMLC rsquo08) pp 3026ndash3031 July 2008

[14] SMinocha andPGThomas ldquoCollaborative Learning in aWikiEnvironment experiences from a software engineering courserdquoNew Review of Hypermedia and Multimedia vol 13 no 2 pp187ndash209 2007

[15] R Navigli and P Velardi ldquoOntology enrichment through auto-matic semantic annotation of on-line glossariesrdquo in Proceedingsof the Knowledge Engineering and Knowledge Management pp126ndash140 2006

[16] C Muller and I Gurevych ldquoUsing wikipedia and wiktionaryin domain-specific information retrievalrdquo in Proceedings of the9th Cross-Language Evaluation ForumConference on EvaluatingSystems forMultilingual andMultimodal Information Access pp219ndash226 2009

[17] J Vivaldi and H Rodriquez ldquoFinding domain terms usingwikipediardquo in Proceedings of the 7th International Conference onLanguage Resources and Evaluation pp 386ndash393 2010

[18] G AMiller ldquoWordNet a lexical database for EnglishrdquoCommu-nications of the ACM vol 38 no 11 pp 39ndash41 1995

[19] S Lee S-Y Huh and R D McNiel ldquoAutomatic generationof concept hierarchies using WordNetrdquo Expert Systems withApplications vol 35 no 3 pp 1132ndash1144 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 8: Research Article Domain Terminology Collection for ...downloads.hindawi.com/journals/ijdsn/2014/827319.pdf · knowledge bases for the semantic interpretation of sensor data. 1. Introduction

8 International Journal of Distributed Sensor Networks

XHTML + RDFa Bath Profile Sidecar file COinS Metadata publishing MARC standards WizFolio QiqqaISO-TimeML TimeML Metadata Authority Description Schema Bookends (software) RIS (file format)Metadata Object Description Schema EndNote Refer (software) ISO 2709 BibTeX XML S5 (file format)Semantic HTML Simple HTML Ontology Extensions Opera Show Format XOXO XHTML Friends NetworkStrixDB Graph Style Sheets TriX (syntax) TriG (syntax) RDF feed Redland RDF Application FrameworkRDF query language Turtle (syntax) RDFLib Notation3 SPARQL D3web Artificial architectureNetWeaver Developer Knowledge engineer Frame language

Box 3 Domain articles (DA) collected for ldquoSemantic Webrdquo on NLS with threshold value 02

Table 5 DC evaluations with baseline and NS

Type Baseline NSThreshold value 02 03 02 03 04 05 06Extended count 7 mdash 282 133 87 82 mdashExtension rate 118 mdash 805 433 318 305 mdashAppropriate 7 mdash 241 113 80 74 mdashInappropriate 0 mdash 41 20 7 8 mdashPrecision () 10000 mdash 8546 8496 9195 9024 mdash

Table 6 DC evaluations with NSL

Type NSLThreshold value 02 03 04 05 06Extended count 938 346 207 134 47Extension rate 2445 965 618 435 218Appropriate 799 293 187 127 47Inappropriate 139 53 20 7 0Precision () 8518 8468 9034 9478 10000

Table 7 DA evaluations based on distance including articles withinthe indicated distance

Distance 2 3 4 5 AllAppropriate 1092 1224 1246 1253 1253Inappropriate 252 336 471 508 514Precision () 813 785 726 712 709

Table 8 DA evaluations based on count (overlapped)

Overlapped count 1 2 3 4Appropriate 1253 330 90 15Inappropriate 514 80 14 1Precision () 709 805 865 938

Table 9 DA evaluations based on normalized weight

Threshold value 0 01 02 03Appropriate 1253 1195 870 755Inappropriate 514 292 69 22Precision () 709 804 927 972

provide varied results. In this respect, we do not expect the baseline results to be helpful. However, it is apparent that NSL provides wide extension and high precision. The maximum extension rate was 24.45, with 799 appropriate categories for a threshold of 0.2, and the minimum precision was around 84% when the threshold was 0.3.

In addition to evaluating DC, we evaluated DA with the NSL results for a threshold of 0.2. To examine the influence of the distance, count, and domain weight, we analyzed the results according to each factor. Six DAs were selected at random, with a total of 1,769 articles (terms). Box 3 enumerates a part of the collected articles for the domain "Semantic Web", and Tables 7, 8, and 9 show the evaluation results with respect to distance, count, and weight.

The basic performance of the domain-term selection attained a precision of 70.9%. As expected, the precision was inversely proportional to the distance; however, a distance of 4 produced almost all of the unrelated articles. The weight and count could be used as important criteria to select domain terms; we found that the weight returned more refined results than the count (the weight returned 755 appropriate terms with 97.2% precision at a threshold of 0.3). At short distances, there are many names of people and organizations, such as "Squarespace", "Rackspace Cloud", and "Nsite Software (Platform as a Service)". These names were selected by the bootstrapping because their associated categories (e.g., "Cloud platforms", "Cloud storage", and "Cloud infrastructure") were similar to the entry (e.g., "Cloud computing"). This situation is not caused by our method but by the definition of the article-category relations of WP. We believe that this can be resolved by processing content (abstracts) or tabular information in the future.
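The weight-based selection described above can be sketched as a simple threshold filter: each collected article carries a domain weight, normalized to [0, 1], and only articles meeting the threshold are kept as domain terms. The weighting scheme in this sketch (normalization by the maximum weight) and the demo values are assumptions for illustration; the paper's actual weight combines category-article and article-link evidence.

```python
# Hypothetical sketch of the normalized-weight filter for domain-term
# selection. The normalization scheme (divide by the maximum weight in the
# domain) and the demo weights below are illustrative assumptions.
from typing import Dict, List

def filter_by_normalized_weight(weights: Dict[str, float],
                                threshold: float = 0.3) -> List[str]:
    """Keep article titles whose normalized weight meets the threshold."""
    if not weights:
        return []
    max_w = max(weights.values())
    return [title for title, w in weights.items() if w / max_w >= threshold]

# Toy usage: a low-weight noise term (a personal name) is filtered out.
demo = {"SPARQL": 0.9, "Turtle (syntax)": 0.6, "Paco Nathan": 0.1}
print(filter_by_normalized_weight(demo, 0.3))
```

With the threshold of 0.3 that gave the best refinement in Table 9, the two high-weight articles survive while the noise term is discarded.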

5. Conclusions and Future Work

This paper has proposed a method of domain-term collection through a bootstrapping process to assist the semantic interpretation of data from sensor networks. To achieve this, we identified weaknesses in the WP category hierarchy (i.e., loops and inappropriate generalizations) and chose a horizontal rather than vertical category search. We proposed new semantic similarity measurements and a similarity constraint to surpass existing methods. Moreover, we employed category-article networks and article-link networks to elicit information for the category similarity measurement. In performance evaluations, our category grouping based on NSL yielded the greatest number of proper results. In terms of domain-term selection, we confirmed that the results obtained with normalized weights had the best precision and extension rate. The distance-based metric had no positive influence on our research; when the distance was greater than three, almost all of the terms were unrelated. However, we believe that the collected domain terminologies can assist the construction of domain knowledge bases for the semantic interpretation of sensor data.

WP has additional weaknesses to those mentioned in this paper, especially in the category-article relation. For example, the term "Paco Nathan" is a personal name that has "Natural language processing" as one of its categories. The relation between the two, that is, "Paco Nathan" has expertise in "Natural language processing", causes noise and negatively influences semantic information processing. We think that this problem can be solved in future work by processing additional WP components, such as abstract or tabular information. Moreover, our research employed only the out-links of WP articles. If the in-links were considered, we expect that the results would be more significant, with wider coverage of domain terms and higher relevance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. Keßler and K. Janowicz, "Linking sensor data—why, to what, and how?" in Proceedings of the 3rd International Workshop on Semantic Sensor Networks, pp. 77–91, 2010.

[2] S. Tilak, N. B. Abu-Ghazaleh, and W. Heinzelman, "A taxonomy of wireless micro-sensor network models," ACM SIGMOBILE Mobile Computing and Communications Review, vol. 6, no. 2, pp. 28–36, 2002.

[3] G. Goodwin and D. J. Russomanno, "An ontology-based sensor network prototype environment," in Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pp. 1–2, 2006.

[4] K. Janowicz and M. Compton, "The Stimulus-Sensor-Observation ontology design pattern and its integration into the Semantic Sensor Network ontology," in Proceedings of the 3rd International Workshop on Semantic Sensor Networks, pp. 92–106, 2010.

[5] P. Barnaghi, S. Meissner, M. Presser, and K. Moessner, "Sense and sens'ability: semantic data modelling for sensor networks," in Proceedings of the ICT Mobile Summit, pp. 1–9, 2009.

[6] A. Bröring, P. Maué, K. Janowicz, D. Nüst, and C. Malewski, "Semantically-enabled sensor plug & play for the sensor web," Sensors, vol. 11, no. 8, pp. 7568–7605, 2011.

[7] M. Hwang, D. Choi, and P. Kim, "A method for knowledge base enrichment using Wikipedia document information," Information, vol. 13, no. 5, pp. 1599–1612, 2010.

[8] P. Velardi, A. Cucchiarelli, and M. Petit, "A taxonomy learning method and its application to characterize a scientific web community," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 2, pp. 180–191, 2007.

[9] H. Avancini, A. Lavelli, B. Magnini, F. Sebastiani, and R. Zanoli, "Expanding domain-specific lexicons by term categorization," in Proceedings of the ACM Symposium on Applied Computing, pp. 793–797, March 2003.

[10] J. J. Jung, Y. H. Yu, and K. S. Jo, "Collaborative web browsing based on ontology learning from bookmarks," in Proceedings of the International Conference on Computational Science, pp. 513–520, 2004.

[11] J. J. Jung, "Computational reputation model based on selecting consensus choices: an empirical study on semantic wiki platform," Expert Systems with Applications, vol. 39, no. 10, pp. 9002–9007, 2012.

[12] J. Lehmann and L. Bühmann, "ORE—a tool for repairing and enriching knowledge bases," in Proceedings of the 9th International Semantic Web Conference, pp. 177–193, 2010.

[13] J. Liu, Y.-C. Liu, W. Jiang, and X.-L. Wang, "Research on automatic acquisition of domain terms," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 3026–3031, July 2008.

[14] S. Minocha and P. G. Thomas, "Collaborative learning in a wiki environment: experiences from a software engineering course," New Review of Hypermedia and Multimedia, vol. 13, no. 2, pp. 187–209, 2007.

[15] R. Navigli and P. Velardi, "Ontology enrichment through automatic semantic annotation of on-line glossaries," in Proceedings of Knowledge Engineering and Knowledge Management, pp. 126–140, 2006.

[16] C. Müller and I. Gurevych, "Using Wikipedia and Wiktionary in domain-specific information retrieval," in Proceedings of the 9th Cross-Language Evaluation Forum Conference on Evaluating Systems for Multilingual and Multimodal Information Access, pp. 219–226, 2009.

[17] J. Vivaldi and H. Rodríguez, "Finding domain terms using Wikipedia," in Proceedings of the 7th International Conference on Language Resources and Evaluation, pp. 386–393, 2010.

[18] G. A. Miller, "WordNet: a lexical database for English," Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.

[19] S. Lee, S.-Y. Huh, and R. D. McNiel, "Automatic generation of concept hierarchies using WordNet," Expert Systems with Applications, vol. 35, no. 3, pp. 1132–1144, 2008.
