8
Networks of Compound Object Comparators Lukasz Sosnowski Systems Research Institute, Polish Academy of Sciences ul. Newelska 6, 01-447 Warsaw, Poland and Dituel Sp. z o.o. ul. Ostrobramska 101 lok. 206, 04-041 Warsaw, Poland [email protected] Dominik ´ Sl¸ezak Institute of Mathematics, University of Warsaw ul. Banacha 2, 02-097 Warsaw, Poland and Infobright Inc. ul. Krzywickiego 34 lok. 219, 02-078 Warsaw, Poland [email protected] Abstract —We discuss the theoretical background and some practical implementations of multilayered net- works of units, which compare compound objects for the purposes of their identification, matching and pars- ing. We show how our approach arises from the previous research on fuzzy comparators. We also show in what sense the flow of information in the considered networks follows the principles of rough mereology. Finally, we present some case studies of utilization of networks of comparators in the image and text processing. Index Terms—Compound Objects, Fuzzy Compara- tors, Networks of Comparators, Rough Mereology I. Introduction In our research so far, we proved that comparators of compound objects could be a useful tool for the tasks of identification and classification [1], [2]. In this article, we extend our methodology toward a framework for compara- tor networks, increasing a range of available techniques aimed at solving decision problems related to compound objects. In order to do this, we attempt to follow some paradigms of hierarchical learning and modeling [3], [4], as well as some mathematical models useful for dealing with relationships between object components [5], [6]. The article is organized as follows. Section II presents the background for our research. Sections III and IV outline the previous studies on comparators. Sections V and VI introduce the networks of comparators. Sections VII and VIII bring some case studies and conclusions. II. Preliminaries In our approach, the structures of compound objects and the corresponding comparator networks are formu- lated by utilizing ontologies. We follow [7] (see also [8]), where ontology is defined as a system specifying the structure of concepts reflecting representations of groups of objects with some common characteristics, various types of taxonomic and non-taxonomic relationships between Supported by the grant SP/I/1/77065/10 in frame of the strategic scientific research and experimental development program: “Inter- disciplinary System for Interactive Scientific and Scientific-Technical Information” founded by Polish National Centre for Research and Development (NCBiR), as well as grants 2011/01/B/ST6/03867 and 2012/05/B/ST6/03215 from Polish National Science Centre (NCN). concepts, as well as axioms and lexicons defining how to understand concepts and relations between them. An object is any element of the real world having its representation capable of being expressed by the adopted ontology. The following properties arise from ontological representation of objects: 1) An object always belongs to a certain class in the ontology. An object may belong to several classes. 2) An object has a property within a class. Features may vary by class. 3) The object may be in relation to other objects in the same ontology. We distinguish between compound and simple objects. Simple objects are atomic, standard entities that have cer- tain characteristics, but are not composed of other objects. Compound objects may consist of other objects, either simple or compound, according to their structure defined in the ontology. Their specification includes relations and connections between their sub-objects. Compound objects satisfy the following additional properties: 1) We can extract from them a minimum of two objects that can be independent entities. 2) Component sub-objects are interrelated by using ontological relationships. We will also refer to the notion of an information granule – a clump of objects drawn together on the basis of indis- tinguishability, similarity, or functionality [9], [10]. In our case, granules will be constructed as data representations of objects and their closest surroundings. Finally, we will use some notions from the mereology theory, where the basic idea is the relation of being a part [11]. We will follow the principles of rough mereology [3], which deals with questions of the form to what degree X is part of Y, using rough inclusion μ(X, Y ). As a result of an ontology-based decomposition of com- pound objects onto sub-objects we can reach a level of complexity, which begins to be easy enough to adjust a measure of similarity. By using rough mereology, we can then determine degrees of similarity between more compound objects by combining degrees of relationships between their simpler components.

Networks of compound object comparators

  • Upload
    uw

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Networks of Compound Object Comparators Lukasz Sosnowski

Systems Research Institute, Polish Academy of Sciencesul. Newelska 6, 01-447 Warsaw, Poland

andDituel Sp. z o.o.

ul. Ostrobramska 101 lok. 206, 04-041 Warsaw, [email protected]

Dominik SlezakInstitute of Mathematics, University of Warsaw

ul. Banacha 2, 02-097 Warsaw, Polandand

Infobright Inc.ul. Krzywickiego 34 lok. 219, 02-078 Warsaw, Poland

[email protected]

Abstract—We discuss the theoretical background andsome practical implementations of multilayered net-works of units, which compare compound objects forthe purposes of their identification, matching and pars-ing. We show how our approach arises from the previousresearch on fuzzy comparators. We also show in whatsense the flow of information in the considered networksfollows the principles of rough mereology. Finally, wepresent some case studies of utilization of networks ofcomparators in the image and text processing.

Index Terms—Compound Objects, Fuzzy Compara-tors, Networks of Comparators, Rough Mereology

I. IntroductionIn our research so far, we proved that comparators of

compound objects could be a useful tool for the tasks ofidentification and classification [1], [2]. In this article, weextend our methodology toward a framework for compara-tor networks, increasing a range of available techniquesaimed at solving decision problems related to compoundobjects. In order to do this, we attempt to follow someparadigms of hierarchical learning and modeling [3], [4],as well as some mathematical models useful for dealingwith relationships between object components [5], [6].

The article is organized as follows. Section II presentsthe background for our research. Sections III and IVoutline the previous studies on comparators. Sections Vand VI introduce the networks of comparators. SectionsVII and VIII bring some case studies and conclusions.

II. PreliminariesIn our approach, the structures of compound objects

and the corresponding comparator networks are formu-lated by utilizing ontologies. We follow [7] (see also [8]),where ontology is defined as a system specifying thestructure of concepts reflecting representations of groupsof objects with some common characteristics, various typesof taxonomic and non-taxonomic relationships between

Supported by the grant SP/I/1/77065/10 in frame of the strategicscientific research and experimental development program: “Inter-disciplinary System for Interactive Scientific and Scientific-TechnicalInformation” founded by Polish National Centre for Research andDevelopment (NCBiR), as well as grants 2011/01/B/ST6/03867 and2012/05/B/ST6/03215 from Polish National Science Centre (NCN).

concepts, as well as axioms and lexicons defining how tounderstand concepts and relations between them.

An object is any element of the real world having itsrepresentation capable of being expressed by the adoptedontology. The following properties arise from ontologicalrepresentation of objects:

1) An object always belongs to a certain class in theontology. An object may belong to several classes.

2) An object has a property within a class. Featuresmay vary by class.

3) The object may be in relation to other objects in thesame ontology.

We distinguish between compound and simple objects.Simple objects are atomic, standard entities that have cer-tain characteristics, but are not composed of other objects.Compound objects may consist of other objects, eithersimple or compound, according to their structure definedin the ontology. Their specification includes relations andconnections between their sub-objects. Compound objectssatisfy the following additional properties:

1) We can extract from them a minimum of two objectsthat can be independent entities.

2) Component sub-objects are interrelated by usingontological relationships.

We will also refer to the notion of an information granule– a clump of objects drawn together on the basis of indis-tinguishability, similarity, or functionality [9], [10]. In ourcase, granules will be constructed as data representationsof objects and their closest surroundings.

Finally, we will use some notions from the mereologytheory, where the basic idea is the relation of being a part[11]. We will follow the principles of rough mereology [3],which deals with questions of the form to what degree X ispart of Y, using rough inclusion µ(X,Y ).

As a result of an ontology-based decomposition of com-pound objects onto sub-objects we can reach a level ofcomplexity, which begins to be easy enough to adjusta measure of similarity. By using rough mereology, wecan then determine degrees of similarity between morecompound objects by combining degrees of relationshipsbetween their simpler components.

III. Comparators of Compound Objects

In this section, we outline our previous research oncomparators [1], [2]. Surely, the presented model is just oneof possible approaches. For reference, in the next section,we will recall some earlier works in this area.

By a comparator we mean a structure consisting of sev-eral interlinked components aimed at analyzing similaritybetween compound objects (see Figure 1). The outcome ofthe similarity analysis will be denoted by C(a,B), where Cstands for comparator, a is a compound object belongingto an input set A, and B is a set of reference objects.C(a,B) takes a form of a subset of B, optionally sorted

and weighted with respect to the degree of relevance ofselected reference objects to the input object a. This typeof outcome is produced by so called characteristic functionfch, which takes as input the values of the membershipfunction of a fuzzy relation R over A × B, denoted byµR : A×B → [0, 1]. Function fch may take different formsdepending on a specific application.

The values of µR(a, b) can be transformed by using anactivation function g : [0, 1] → {0, 1}, where g(µR(a, b))equals 1 if µR(a, b) ≥ p, 0 otherwise, for a pre-defined ofadaptively tuned quality parameter p.

We also define the family of exception rules Z. The j-thrule specified for the reference object bi ∈ B takes a formof binary function zij : A → {0, 1}. If an input objecta ∈ A satisfies the rule zij , denoted as zij(a), then we putµR(a, bi) = 0 in order to immediately prohibit comparisonof a to bi regardless of other parameters.

IV. Types of Comparators

The studies on digital comparators started in the 50’s.In the literature from that period, one can find examplesof different variants of comparators applied to binarynumbers, such as e.g. Gray coding. In the same time,researchers have begun distinguishing between exact com-parators, returning accurate outcomes, and some results ofapproximate calculations aiming to simplify comparatorsbased solely on determination of the sign substraction.Such approximate comparators have found practical ap-plications in digital circuits. In 1958, the first integratedcircuit was developed by Texas Instruments [12].

Since the early 70’s, comparators have found their appli-cation in software development – this time not as physicaldigital closed chips, but rather computer programs oper-ating on variables, structures and objects. The basic idearemained the same – to return a relationship status forinputs. In the case of the Java programming language,comparator is nowadays a function introducing a linearorder on a collection of objects. It can be then used to con-trol an order within selected data structures such as trees,maps, and others. In this area of applications, comparatorsreturn three possible categories of results correspondingto a status of relationship between considered objects: anegative integer, a positive integer, or zero.

Fig. 1. Activity diagram of a comparator

Due to relatively easy implementation and efficiency,both analog and digital comparators have usually returnedbinary {0, 1} or ternary {−1, 0, 1} results [13], [14]. How-ever, due to successful applications of fuzzy sets in thecontrol and modeling [5], [10], fuzzy comparators becometo gain some attention too. A fuzzy comparator extendsits classical counterpart by the ability to operate withsimilarity degrees within the unit interval.

The above aspects converge to the model of comparatorapplied in our previous studies [1], [2]. Modern compara-tors should be able to deal not only with the numbers,but also with arbitrary features of considered objects.From a software design perspective, they are abstractunits that are not focused on any specific type of re-lationship functions, applicable to different objects andtheir attributes. In Section III, we relied on fuzzy relationsin order to formally introduce degrees of relationshipsbetween objects. However, depending on a type of appli-cation, one can refer equally successfully to the already-mentioned membership functions in rough mereology [3]or, e.g., to the principles of near sets [15]. Last but notleast, given some specific types of applications that requirecomparing input objects with larger reference sets, the ideaof operating with possibly ordered collections of objectswith their relationship degrees as the comparator outputsturns out to be especially useful.

V. Networks of Comparators

There are a number of network-based approaches thatcan be utilized to process similarities between objects[6], [16]. The networks proposed in this paper are one-directional structures designed for examining similaritybetween compound objects in an incremental way, whereeach further layer can take into account results obtainedin the previous one. Our networks are also designed tofollow complex structures that capture the specification ofcompound objects. There are two basic types of networkswith this respect: networks of monolithic objects andnetworks of structured objects. The first type treats anobject as a monolithic entity, without taking into accountits structure, and investigates various features of the objectas a whole. The second type will be discussed later.

The networks of comparators of compound objects canbe a convenient tool for solving complex decision prob-lems, using a simple, reusable and universal constructionmethodology, complementary to the existing computa-tional intelligence tools. Below we enumerate the funda-mental elements of the proposed network design.

Network layers correspond to contexts for which a com-parison is done. A context may include a common featurewhich is examined by different comparators or differentfeatures related to each other through the specificationof a compound object. There are three types of layers:input, hidden (optional) and output. Different layers maycorrespond to different contexts. A context depends on thedomain knowledge about objects, namely their relation-ships with other objects at the same level of hierarchy, aswell as relationships between their sub-objects.

The input object a is converted into its representationscreated for each comparator in the input layer. This way,we start constructing an information granule related to a,which includes a’s representations for different compara-tors. The granule construction is important due to the factthat different results for different representations can belinked together in a clear way [1], [2].

Each comparator unit enables the execution of a processoutlined in Section III. There are many types of compoundobject comparators. They may vary from layer to layer, ifit is justified by the efficiency of the overall solution.

The aggregator synthesizes the comparator outcomes. Itproduces the same type of reference object-related outputas described for comparators in Section III. It is alwayspresent in the output layer. It may also be included inother layers, depending on the network structure. Thegeneral network scheme does not reflect relationshipsbetween layers. These relationships may affect networkperformance and, therefore, may lead to introducing morethan one aggregator, as shown in Figure 2.

The translator is an optional component responsible foradaptation of the previous layer’s results in case the hierar-chical context of the next layer is different. It expresses theresults of the previous layer by means of reference objects

Fig. 2. General scheme of a network for monolithic objects; Cii –comparators, Aj – aggregators, Ag – global aggregator

used in the next layer, taking into account relationshipsbetween objects specific for both layers. Given the outcomeof the previous layer’s aggregator in a form of a subset ofoptionally ordered and weighted reference objects, one canfollow the methods of rough mereology [3] to translate itto the level of degrees of relationships between sub-objectsor super-objects considered in the next layer.

The output is a result of processing the input objectthrough the network. The result is obtained as a subset ofreference objects processed by the aggregator.

VI. Network ConstructionLet us start by discussing how to design connections

between the neighboring layers. Each network has at leasttwo layers. Let us put the considered element a at thenetwork input. It is a starting point for creating an infor-mation granule around it. The granule focuses on similarelements (objects) to enrich the accumulated knowledgeabout a. It will gather more and more knowledge in thenext stages of processing, which allows taking a better finaldecision. Comparator units located in each single layerare not connected. Each comparator examines selected

features and returns results independently from the others.The network structure can be built on the basis of groupsof independent object features, as well as the structure ofthe considered compound objects. Structural informationabout objects can be also a basis for defining their featuresanalyzed by a comparator. Thus, each network layer can bederived either from the context associated with a group offeatures labeling the object as a single entity, or from theanalysis of the structure of object relations – processingobject as a part of another object (we refer to this case asgeneralization) or looking at an object as a set of smallersub-objects (the case of decomposition). Depending onthe position (context) of the object in the hierarchy ofthe considered compound objects, we can consider variousobject relationships. Such relationships let grow the inputgranule built around a with characteristics of a’s sub-objects and/or super-objects. Below we discuss the twotypes of above-outlined networks.

A. Networks of Monolithic ObjectsIn this case, objects are treated as monolithic entities

without considering neither relationships between theirsub-objects nor their relationships with other componentsof super-objects they are a part of. The same inputobject is processed by each layer, although the subsets ofreference objects it is compared with may vary. At eachlayer, such subsets may be specified by features we arelooking at. As discussed in Section III, subsets of referenceobjects are also the outputs of particular comparators,which makes the results obtained from the previous layerseasily interpretable by subsequent layers.

From the performance perspective, the idea is to reducecardinality of the reference set by the results of theprevious layers. It minimizes the number of comparisonsand, therefore, speeds the process us. On the other hand,for aggregators that transmit relatively small subsets ofreference objects to each next layer there is a risk thatthose of them which would optimally fit a given inputobject are eliminated too quickly in the initial layers. Thisis a reformulation of a classical problem how to balancebetween the speed and accuracy of computations.

As an example, let us show how we used the abovetype of network to solve a task of matching authors ofpublications. Our task was related to a part of an R&Dproject financed by Polish government, aimed at designingand implementing a system for semantic search within arepository of scientific information. Our system is calledSONCA (Search based on ON tologies and CompoundAnalytics; see e.g. [17], [18]). In SONCA, each acquireddocument is stored in a parsed form in a data warehousecalled SoncaDB, in order to make the best possible use ofthe available structural and semantic information whilesearching, filtering, grouping, etc. While parsing docu-ments a lot of duplicates are generated. This is because wecreate a separate instance to preserve every appearance ofa document, including an instance for any reference to it

Fig. 3. A network for matching authors

in other documents. We do the same with other types ofderived instances too – in particular, the authors of pub-lications considered in our example below. These instancesets are initially not synchronized due to requirements forthe parser to work efficiently and transparently. Thus, datamatching needs to be done within SoncaDB.

In our experiment, we analyzed a set of authors ex-tracted from references written down in various formats,with possible errors. Our task was to identify the same per-sons. As one of solutions, we attempted to use a networkwith three layers. The input layer consists of three com-parators running concurrently: Csfl – surname’s first lettercomparator, Cnfl – name’s first letter comparator, Cslen

– surname’s length comparator. The first two of them arequite straightforward while the third one examines thesurname’s lengths with a certain tolerance (e.g., up toone character). All three comparators examine very simplefeatures, so they can be efficiently computed. They let uslimit the search space in the next layer containing fivemore sophisticated comparators, which investigate simi-larities with respect to name, surname, sorted acronym,publication category and co-authors (Figure 3).

The results of the above calculations are synthesized.The final outcome is interpreted as follows: if the networkindicates similarity to a reference object in a sufficientlystrong degree (up to the quality parameter p; see SectionIII), then the input object is matched with an alreadyexisting element of the reference set. If the answer of thenetwork is empty, then the input is treated as a new object,and it is added to the set of reference objects.

TABLE IDistribution data for comparators in the input layer

surnameinitial

count

s 987m 843b 812c 645k 607l 588h 577g 507p 489w 479d 449r 438t 388a 355f 315v 293n 262j 213e 178z 170o 154y 145i 94u 55x 34q 21

nameinitial

count

m 1027j 1014a 875s 814c 625r 560d 518p 463k 435g 432t 428l 407e 369h 355b 304f 252n 243y 219i 167w 167v 141o 92z 57x 57u 57q 20

surnamelength

count

6 18207 15675 15158 13374 10169 88610 4943 43911 28512 1702 15313 11914 8316 5615 4317 3619 2118 2020 1321 923 722 524 21 126 1

Our experiment consisted of 10098 records. Distribu-tions for comparators in the input layer are presented inTable I. The aggregator selects data which will be passedto the second layer. In the current implementation, thisis done by calculating the average of degrees of localsimilarities for the pair (a, bi), where a and bi are an inputand a reference object, respectively.

Note that comparators in the second layer may refer tofeatures of publications that the considered persons areparsed from. As an example, assume that a given authorinstance a is parsed from a publication that we have al-ready classied as belonging to a specific thematic category.Then, when analyzing a reference object bi correspondingto an already-identified person, we may examine whetherthat person’s publications (of course only those alreadyregistered in SoncaDB) fall into the same category.

The above case study is an example of utilization ofa well-known strategy of solving complex computationaltasks, where one first heuristically partitions a problemonto more feasible pieces, then conducts detailed compu-tations over such pieces, and finally aggregates local results[18], [19]. In our case, partitioning is performed by the firstlayer of comparators over the Cartesian product A × Bof input and reference objects, while detailed calculationsrefer to comparators in the second layer. Also, we cansee that those second layer comparators may be based onfeatures derived from relationships of considered objectswith other types of objects, although they do not takeinto account their compound structures.

Fig. 4. An object a and its sub-objects in a relationships treerepresenting the hierarchy of compound objects

B. Networks of Structured Objects

In this case, construction and tuning of the networkis additionally based on the knowledge about the objectstructure. Depending on the position of an object in thehierarchy, we can process with operations of its general-ization, decomposition, or both. It requires creating layerscorresponding to particular levels of the hierarchy, whichmeans that all comparators in a given layer need to referto the features defined over the same level of sub-objectsor super-objects of the considered input object.

As stated before, we refer to each level in the hierarchyof compound objects as to a context of the comparison pro-cess. Decomposition operation will just cause the appear-ance of various sub-objects in the same context, and thegeneralization will cause the appearance of super-objects,which are by definition more general objects containing ourconsidered object as one of sub-objects. Below we take acloser look at these two types of operations.

Generalization can be used when our input object has aparent object in the hierarchy of considered objects. Con-sider an object structure illustrated by Figure 4. Assumingthat the input object is at the level of aa, we can makea generalization toward a. The idea is that by findingsome best fits for the super-object at the level of a wecan limit in the next step a set of reference objects for aa.As described before, we use translators between layers. Inour example, we can express the result of the given layerby means of reference objects from another layer, followingthe fact that each a consists of aa and ab.

In summary, we identify objects belonging to the highestbranches of the hierarchy in the first layer. Then we trans-late each input object to other objects, going down layerby layer. Instead of considering all possible objects at theconsecutive hierarchy levels we only deal with those withconnections found in the previous layers. This strategy issimilar to that introduced before for monolithic objects,although now we explicitly change the levels of hierarchyinstead of just varying features used by comparators.

Fig. 5. General scheme of a network for structured objects; Tj –translators; refer to Figure 2 for other components

Let us now consider the operation of decomposition.In this case, layers represent the hierarchy levels in anorder from the most atomic to the most compound. Theremay be also some layers of comparators unrelated todecomposition, focused on specific (super-)object features.Comparators process the corresponding objects and sup-ply granules representing results at various levels of decom-position. This way, we examine the degrees of similarity(relationship, closeness, proximity) of sub-objects of aninput object to their reference counterparts. Then, basingon the similarity of sub-objects and possibly additionalfeatures, the network concludes the final response (out-put). The main idea is here to express similarity of aninput object through the similarity of its sub-objects. Thisis clearly in line with rough mereology, which we use in theconstruction of aggregators for this type of network.

Let us emphasize one more time that we assume all com-parators in a given network layer to correspond to a singlehierarchy level. Such assumption has already occurred inthe literature on layered approaches to compound objects[16]. However, according to our knowledge, this is the firstcase of using it in the area of comparators.

Fig. 6. A network for identifying administrative regions

VII. More Examples

A. Administrative Area Identification

This example shows application of comparators in thecommercial project aimed at the visualization of the re-sults of the 2010 Polish local elections [2] and the 2011Polish parliament elections. It demonstrates how domainknowledge can lead to optimization of the implementedsolution. The overall task was to assess and visualize(display) the turn out ratios for the administrative areasof Poland. There were three sources of input data:

1) Turn out results for each area;2) Contour of each area;3) The map of Poland divided into areas.

Results and contours were labeled with the area’s adminis-trative codes. However, the codes were not present on themap. In order to identify areas extracted from the map,we matched them against the repository of shapes.

TABLE IIOptimization process based on the knowledge; Ref. set –reference set cardinality, Min – the minimal number ofcomparisons, Max – the maximal number of comparisons

Ref. set Min Maxno domain knowledge 2874 2874 2874

only voivodeships 16 16 16only counties 379 379 379

only communes 2479 2479 2479voivodeship avg: 155 79 314

county avg: 8 3 19

The reference objects and objects to be identified are theareas in the administrative division of Poland (voivode-ships, counties, and communes). They belong to the hier-archy which has the following properties:

1) Voivodeship consists of counties;2) County consists of communes (boroughs);3) Each area is a part of exactly one super-area.

Knowledge of the hierarchy makes it possible to limit thecardinality of reference set with respect to the parentobject, e.g., counties selected by voivodeship, communesselected by county. Limiting the cardinality of referenceset directly corresponds to reduction of processing timeand an increase in the efficiency of identification. In orderto embed such additional knowledge, one can use themultilayered network illustrated by Figure 6. Each layer insuch a network is responsible for identifying the elementbelonging to the appropriate level of the hierarchy. Bychoosing the object at the first level, we automatically nar-row down the reference set for the second level, basing onhierarchical relationships between objects. Such networkscan be built in many different ways by using differenttypes of compound object comparators. Table II reportsexperiments carried out for the considered network.

B. Parsing and Matching of Named EntitiesLet us now go back to the task of matching (deduplicat-

ing) the instances of scientists who occur in the SoncaDB[17], [18] database as the authors of loaded scientific pub-lications. Let us note that, in some cases, it is difficult toextract from the input texts describing publications theircomponents corresponding to their authors, titles, datesand other relevant properties. It happens often, e.g., withthe literature items occurring at the ends of publications– such literature items should be interpreted, parsed andloaded as the publications themselves, but straightforwardalgorithms attempting to do it basing on a local textanalysis may not be able to handle it credibly enough.In such cases, the texts describing publications are loadedinto the SoncaDB database in a form of semantic blobswith no properties and related instances assigned. Then,there is a need of developing a specific layer that wouldextract important information from such text descriptionsby discovering their internal structures and comparingtheir components with other instances already stored in

Fig. 7. A network aimed at reference parsing; Cas – comparator ofauthor’s surname, Can – author’s name, Ct – title, Cy – year, Cd –date, Cs – structure, Ag – global aggregator

the database, prior to further steps of matching, synthesisand semantic indexing of the SoncaDB content.

The task of extracting information from an unparsed,uninterpreted text describing metadata of a scientificpublication fits perfectly the comparator networks forstructured objects. We can consider this process at twolevels, with an automatic feedback between them. The firstlevel is to identify the internal structure of the text, i.e.,to discover the sub-texts corresponding to the author(s),the title, and other relevant types of information that canbe further converted into instances, their properties andrelations between them. With this respect, one may con-sider a reference set of structures describing publicationmetadata as compound objects with some of componentslinked to each other by their relative position in the text.The degree of closeness of the input text to a particularstructure needs to be, however, verified by simulatingwhether components extracted from the initial text usinga given structure are similar to some of reference instancesof analogous types already stored in the database. Thus,as an example, the hypothesis that a given text follows astructure “Author, Author, (...): Title. (...)” can be veri-

fied by running its first component through the already-described comparator network designed for the authors’names. The same can be done with other components dy-namically derived under a hypothesis of a given structure.According to the principles of rough mereology, the degreesof proximity/similarity of particular components to themost relevant reference instances can be then merged intothe degree of proximity/similarity of the whole compoundobject to the tested metadata structure.

Starting from the beginning, we need to decompose theinput description to smaller pieces of text, correspondingto sub-objects of the original object. We propose to makethis division in places of punctuation marks. After thisoperation we have a few sub-objects connected with oneinput. Next, we compute similarities to a certain classof objects such as author’s surname, name, initial, title,year, etc. All computations are done by comparators ininput layer of the network. The next layer consists of com-parators which examine the existence of each particularclass of objects within an assumed text structure. The lastlayer corresponds to the global aggregator. We use a roughinclusion function, which determines similarity of objectsbasing on similarities between their sub-objects. However,it is worth mentioning that in this case it is especiallyimportant to use exclusion rules (see Section III), whichnarrow down the overall analysis only to the most realistictext description structures [20].

VIII. ConclusionsNetworks of comparators of compound objects are un-

doubtedly an interesting tool for solving decision prob-lems. It should be noted that they are characterized by acommon universal approach to various tasks.

The next step in this area will be the implementationof decision support systems in order to further expand thepractical case studies. The crucial aspects with this respectare an expressive power of the comparator networks forstructured objects, as well as the scalability and flexibilityof computations assured by the principles of fuzzy com-parators and rough mereology.

References[1] D. Slezak and L. Sosnowski, “SQL-based Compound Object

Comparators: A Case Study of Images Stored in ICE,” in Proc.of FGIT-ASEA 2010, ser. Communications in Computer andInformation Science, vol. 117, 2010, pp. 303–316.

[2] L. Sosnowski and D. Slezak, “Comparators for Compound Ob-ject Identification,” in Proc. of RSFDGrC 2011, ser. LectureNotes in Computer Science, vol. 6743, 2011, pp. 342–349.

[3] L. Polkowski, Approximate Reasoning by Parts: An Introductionto Rough Mereology, ser. Intelligent Systems Reference Library.Springer, 2011.

[4] P. Stone, Layered Learning in Multiagent Systems: A WinningApproach to Robotic Soccer. MIT Press, 2000.

[5] J. Kacprzyk, Multistage Fuzzy Control: A Model-based Approachto Fuzzy Control and Decision Making. John Wiley & Sons,Limited, 2012.

[6] Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, and F.-J. Huang,A Tutorial on Energy-based Learning, ser. Neural InformationProcessing Systems. MIT Press, 2007.

[7] S. Staab and A. Maedche, “Knowledge Portals: Ontologies atWork,” AI Magazine, vol. 22, no. 2, pp. 63–75, 2001.

[8] C. Calero, F. Ruiz, and M. Piattini, Eds., Ontologies for Soft-ware Engineering and Software Technology. Springer, 2006.

[9] J. Stepaniuk, Rough-Granular Computing in Knowledge Dis-covery and Data Mining, ser. Studies in Computational Intelli-gence. Springer, 2008, vol. 152.

[10] L. A. Zadeh, Computing with Words – Principal Concepts andIdeas, ser. Studies in Fuzziness and Soft Computing. Springer,2012, vol. 277.

[11] S. Lesniewski, “Foundations of the General Theory of Sets,”in Stanislaw Lesniewski: Collected Works, Volumes I & II,S. Surma, J. Srzednicki, D. Barnett, and V. Rickey, Eds.Springer, 1991.

[12] M. Nesenbergs and V. O. Mowery, “Logic Synthesis of SomeHigh-Speed Digital Comparators,” Bell System Technical Jour-nal, vol. 38, 1959.

[13] H. Malmstadt, C. Enke, and S. Crouch, Electronics and Instru-mentation for Scientist. The Benjamin/Cummings PublishingCompany, Inc., 1981.

[14] E. Noroozi, S. M. Daud, A. Sabouhi, and H. Abas, “A NewDynamic Hash Algorithm in Digital Signature,” in Proc. ofAMLTA 2012, ser. Communications in Computer and Informa-tion Science, vol. 322, 2012, pp. 583–589.

[15] J. F. Peters, “Near Sets: An Introduction,” Mathematics inComputer Science, vol. 7, no. 1, pp. 3–9, 2013.

[16] M. Szczuka and D. Slezak, “Feedforward Neural Networks forCompound Signals,” Theoretical Computer Science, vol. 412,no. 42, pp. 5960–5973, 2011.

[17] R. Bembenik, L. Skonieczny, H. Rybinski, and M. Niezgodka,Eds., Intelligent Tools for Building a Scientific InformationPlatform, ser. Studies in Computational Intelligence. Springer,2012, vol. 390.

[18] M. Szczuka and D. Slezak, “How Deep Data Becomes Big Data,”in Proc. of NAFIPS-IFSA 2013, 2013.

[19] Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst, “HaLoop:Efficient Iterative Data Processing on Large Clusters,” PVLDB,vol. 3, no. 1, pp. 285–296, 2010.

[20] S. Jonnalagadda and P. Topham, “NEMO: Extraction andNormalization of Organization Names from PubMed AffiliationStrings,” Journal of Biomedical Discovery and Collaboration,vol. 5, pp. 50–75, 2010.