

Collaborative Project

LOD2 - Creating Knowledge out of Interlinked Data

Project Number: 257943 Start Date of Project: 01/09/2010 Duration: 48 months

Deliverable 9a.3.2

Implementation of Data Analytics in the Public Contract Filing Application

Dissemination Level Public

Due Date of Deliverable Month 46, 2014-06-30

Actual Submission Date Month 48, 2014-09-05

Work Package WP 9a, LOD2 for a Distributed Marketplace for Public Sector Contracts

Task T9a.3

Type Prototype

Approval Status Approved

Version 1.0

Number of Pages 38

Filename deliverable-9a.3.2.pdf

Abstract: The previously investigated data analytics methods have been connected to the Public Contract Filing Application, thus providing guidance for the users based on aggregated information. Additional user-oriented features of PCFA are also described, such as the use of product ontologies for specification of procured products. Lightweight end-user evaluation was carried out.

The information in this document reflects only the author’s views and the European Community is not liable for any use that may be made of the information contained therein. The information in this document is provided "as is" without guarantee or warranty of any kind, express or implied, including but not limited to the fitness of the information for a particular purpose. The user thereof uses the information at his/her sole risk and liability.

Project funded by the European Commission within the Seventh Framework Programme (2007 - 2013)


D9a.3.2 - v. 1.0

History

Version  Date        Reason                                              Revised by
0.1      2014-08-20  Initial version                                     Vojtech Svátek
0.8      2014-09-03  Document ready for peer review                      all authors
1.0      2014-09-05  Revision after peer review (by S. Campinas, NUIG)   all authors

Author List

Organization Name Contact Information

UEP Vojtech Svátek [email protected]

UEP Jindrich Mynarz [email protected]

UEP Václav Zeman [email protected]

UEP Marek Dudás [email protected]

UEP Jirí Helmich [email protected]

UEP Jakub Hrkal [email protected]

UEP Patrik Kompus [email protected]

I2G Witold Abramowicz [email protected]

I2G Dominik Filipiak [email protected]

I2G Łukasz Grzybowski [email protected]

I2G Mateusz Jarmuzek [email protected]

I2G Krzysztof Wecel [email protected]


Executive Summary

The deliverable concludes task T9a.3, which focused on the application of data mining and analytic methods over public procurement linked data. The Public Contract Filing Application (PCFA) is the unifying node from which the different kinds of analytic functionality are accessible. Since this is the final deliverable of WP9a, it describes, aside from the implementation of analytic methods in PCFA as its main topic, the second round of analytic experiments undertaken on procurement linked data, as well as the main extensions of the PCFA itself since D9a.1.2, in which the first version was introduced.

The originally foreseen functionality of semantic analytical reporting was not implemented for two reasons: 1) the LISp-Miner system that allows this reporting was eventually found not to be sufficiently scalable, and was replaced by other tools; 2) interactive parametrization and running of data mining tasks was found excessively complex to be carried out by end users such as contract authorities and bidders. The focus was shifted from building and publishing extensive data mining models to providing useful pieces of information to the end user, based on procurement linked data sources. Such pieces of information are, however, mainly relevant in the context of the interactive process of public contract filing, and are not suitable for publishing as linked data. For this reason, the title was changed from “Implementation of Data Analytics and Semantic Reporting for the Linked Data from the Web Application for Filing Public Contracts” to “Implementation of Data Analytics in the Public Contract Filing Application”.

Overall, the textual part of the deliverable consists of:

• description of the overall updated user interface of the PCFA (Sect. 2)

• description of the individual PCFA features related to data mining and analytics (Sect. 3–5)

• description of other data mining experiments, whose outcomes could also potentially be exploited in the PCFA (Sect. 6–8)

• report on the end-user evaluation of PCFA as a whole, by contract authorities (Sect. 9).


Table of Contents

1 Introduction
2 Updated User Interface of PCFA
2.1 Mainstream Version of PCFA
2.2 PCFA version with Product Ontologies
3 Recommendation of Contract Parameter Values
4 Prediction of Number of Tenders
4.1 Dataset Exploration
4.2 Learning Procedure
4.3 Evaluation
4.4 API Description
5 Interactive Graph Exploration for Chosen Entity
6 Affinity Analysis by Association Rules
6.1 Data Co-Occurrence
6.2 Business Entities Relationship
7 Clustering in Public Procurement
7.1 Matchmaking
7.2 Learning Procedure
7.3 Results and Applications
7.4 Implementation and API
8 Ad-hoc Analysis of Procurement Linked Data
9 Evaluation by Contract Authorities
9.1 Evaluation setting
9.2 Evaluation notes – UEP
9.3 Evaluation notes – CHMI
10 Conclusions
11 References


1 Introduction

The deliverable presents a prototype implementation of the Public Contracts Filing Application (PCFA), with focus on the functionality that exploits data mining technology. The other part of the advanced support tools, related to matchmaking, has already been described in deliverable D9a.2.2 [9]. Results of the presented work are embodied in the updated version of PCFA (as the user-facing tool) and in three web services interfacing the tools that have various data-mining-based or visual analytic models behind them.

The deliverable partly builds on the preceding T9a.3 deliverable, D9a.3.1 [13], which summarized a number of studies on procurement linked data mining, as carried out in the previous phase of the project. First, the implemented data mining functionality itself draws on the experience gained from these studies. Second, since this is the last deliverable of WP9a, its last section is devoted to additional experiments in procurement linked data mining (not directly connected to PCFA) that had not been available at the time of writing D9a.3.1.

The (lightweight) usability tests presented in this deliverable, however, cover both the matchmaking and the analytical functionality, since the separation of these kinds of functionality is purely technological and does not make any sense in the context of user scenarios. Therefore (and also with respect to the delayed submission of D9a.2.2 and thus the very short time span between the completion of the matchmaking and the analytical functionality) the testing users have been confronted with the PCFA only once, as described here.


2 Updated User Interface of PCFA

The PCFA has recently evolved into two branches. The first, mainstream one, integrates the matchmaking functionality and the support based on data mining. The second branch aims to demonstrate the integration of product ontologies as a potential source for a more elaborate description of a certain kind of public contracts: those consisting in procuring common customer goods (as subject of e-commerce, for which the ontologies have originally been developed). This branch is a result of a student project, which had been separated from the mainstream one at a certain point so as not to interfere with the key functionality planned in the Description of Work. The two branches have not been integrated yet at the moment of writing this deliverable. We therefore start with the interface of the mainstream version and then shift to the slightly modified interface of the ‘product ontology’ version.

A demo of the mainstream version is available at http://lod2.vse.cz:8080/pc-filing-app/, and that of the ‘product ontology’ version at http://lod2-dev.vse.cz:8080/pcfa/. The application should work in any standard browser, although Chrome is a safer option (e.g., the forms for product-ontology-based specification have suboptimal layout in some other browsers).

2.1 Mainstream Version of PCFA

As with the previous version described in deliverable D9a.1.2 [8], the core functionality is that of creating and publishing contracts (calls for tenders) on the side of the buyer (contract authority) and that of creating company profiles and preparing tenders on the side of the potential supplier (bidder). The login page thus provides four¹ options: buyer login, buyer registration, supplier login and supplier registration.

Most of the novel functionality is concentrated in the contract list view, which can list either calls for tenders under preparation or contracts that have already been published. Fig. 1 shows a list of calls under preparation.² The second-to-last column provides the options to publish the call (i.e., move it to the RDF graph residing in the public space), edit its values and delete it; newly, it also offers the invocation of the number-of-bidders predictor implemented by I2G (its implementation is described in Section 4). The last column offers the invocation of the matchmaking services described in the recent deliverable D9a.2.2 [9], i.e. search for published contracts (across all contract authorities in the dataset) similar to the current one, and search for suitable suppliers for the current contract.

When a new call is being created, ‘smart’ support by PCFA is provided in the form of string-based autocompletion and context-based value recommendation. String-based autocompletion concerns the choice of codes, namely, CPV codes for commodities³ and NUTS codes for localities. Fig. 2 shows a snapshot of creating a new contract, where the user typed the string “video p” and immediately received three possible CPV codes for this string to choose from. Context-based value recommendation is currently only implemented for the recommendation of a typical price for the given type of commodity,⁴ and is currently only applied to Czech contracts, since the training data are from the Czech Republic (the respective invocation button is thus only available when the currency of the estimated price is set to CZK). Its internals are described in Section 3.

¹The ‘product ontology’ version additionally provides a fifth option for admin login, which allows importing and customizing a new product ontology. See Section 2.2.

²These have been created during the evaluation sessions with Czech contract authorities, therefore their titles are in Czech.

³This functionality was already available in the first version of the application; however, its performance has meanwhile been increased by an order of magnitude.

⁴As the evaluation sessions indicated, this functionality is only relevant for a limited scope of commodities, because of the limited input data available.
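The string-based autocompletion described above can be illustrated with a minimal sketch. It assumes a case-insensitive substring match over code labels; the deliverable does not describe how PCFA actually matches the typed string, and the codes and labels below are made-up placeholders rather than real CPV entries.

```python
# Illustrative sketch only: SAMPLE_CODES holds made-up placeholder entries,
# not real CPV codes, and the matching rule (case-insensitive substring)
# is an assumption, not the documented PCFA behaviour.
from typing import List, Tuple

CodeEntry = Tuple[str, str]  # (code, label)

SAMPLE_CODES: List[CodeEntry] = [
    ("X001", "Video projectors"),
    ("X002", "Video players"),
    ("X003", "Video phones"),
    ("X004", "Video cameras"),
    ("X005", "Office furniture"),
]


def autocomplete(typed: str, entries: List[CodeEntry], limit: int = 10) -> List[CodeEntry]:
    """Return up to `limit` entries whose label contains the typed string."""
    needle = typed.strip().lower()
    if not needle:
        return []
    matches = [entry for entry in entries if needle in entry[1].lower()]
    return matches[:limit]


for code, label in autocomplete("video p", SAMPLE_CODES):
    print(code, label)
```

With the placeholder list, typing "video p" keeps the three labels that contain that string and drops the rest, which mirrors the situation shown in Fig. 2.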


Figure 1: Functionality provided in the list-of-contracts view

Figure 2: Code autocompletion for a new call


Figure 3: Ordered list of similar contracts

Calling the matchmaker for similar public contracts yields a list as in Fig. 3. Currently, no further analysis is available for this view.⁵ Calling the matchmaker for suitable suppliers (based on past contracts) yields a list as in Fig. 4. In this view, the provisional exploratory visualization can be invoked using the “Visualization” button; it is described in Section 5.

The supplier side offers no advanced functionality beyond what is offered for the buyer side. Namely, the supplier can create or update its profile containing, in particular, the CPV codes and location (incl. NUTS codes) of its business. The matchmaking is then carried out in the same way as when finding suppliers for a new contract, except that the direction is reversed. The resulting ordered list of relevant calls for tenders then looks very similar to that in Fig. 3.

2.2 PCFA version with Product Ontologies

Additional functionality offered in the contract list view is that of providing additional specifications according to a product ontology. For contracts whose CPV code matches that of an ontology already accessible to the system, the functionality can be invoked via the respective “add specification” button, as in Fig. 5. The additional form generated from a bicycle ontology is in Fig. 6. Quantitative data can be specified either as specific values or as an interval range (using the S/R switch on the left). The rdfs:comment for the given property in the original ontology is available as contextual help (here for the “Size of frame” property). All this additional data is stored according to the GoodRelations ontology⁶ [6] and to the specific product ontology.

For a new type of commodity for which a product ontology compliant with GoodRelations⁷ has been identified, this ontology can be uploaded in the advanced user interface (currently accessible through an ‘Admin’ login button on the login page). Let us assume that the application is to be extended so as to allow detailed specifications of coffee machines (e.g., for an office building). For this domain, an ontology is available at http://purl.org/opdm/coffeemachine. Its upload starts with a basic dialog specifying the name, description and URI of the ontology, after which the ontology is included in the ontology list, see Fig. 7. Subsequently, a class from the ontology that corresponds to the commodity in question is chosen, and one or more CPV codes are specified for it (so as to connect it with relevant contracts), as in Fig. 8. Then the user selects the properties of this class to be covered in the detailed contract form, as in Fig. 9, separately for ‘main properties’ (corresponding to properties modeled as ‘qualitative properties’ or ‘quantitative properties’ in terms of GoodRelations) and ‘additional properties’ (corresponding to plain data properties of the given product ontology). Finally, the generated form is displayed for approval. This mechanism makes it possible to cover only those classes and properties of the whole ontology that are relevant for matchmaking in public procurement.

⁵The functionality of ‘synoptic’ display is only under construction at the moment of writing this deliverable.

⁶http://purl.org/goodrelations/v1

⁷A repository of such ontologies is at http://www.ebusiness-unibw.org/ontologies/opdm/, as a result of the OPDM project. However, even ontologies designed independently of GoodRelations can be ‘canonicalized’, using semi-automated pattern-based transformation, as shown for a subset of Freebase in [5].

Figure 4: Ordered list of suitable suppliers

Figure 5: Invocation of product-ontology-based form through a CPV code

Figure 6: Specification of ontology-based details on the procured product

Figure 7: List of uploaded product ontologies


Figure 8: Specification of class and CPV codes

Figure 9: Specification of properties


3 Recommendation of Contract Parameter Values

The service for estimated price recommendation currently takes as input the main CPV code and the expected duration of the contract. In the tentative version available at the time of writing this version of the deliverable, it outputs a single number. However, given the roughness of this estimate, a more adequate version of the service, outputting a numerical interval, is pending.

The recommender itself is currently a regression tree trained on the whole set of Czech data. The underlying mining framework is BigML.⁸

⁸https://bigml.com/


4 Prediction of Number of Tenders

The number of bids submitted in response to a given contract notice is a highly fluctuating variable and is inherently hard to predict. It depends on many factors, many of which are not even included in public contract details. Moreover, only some of the attributes describing the contract actually contribute to the variance in the number of tenders.

4.1 Dataset Exploration

An exploration of Polish public contracts⁹ shows that over 38% of contract notices attracted just one offer. This is not a positive tendency, as the idea of public contracting is to stimulate competitiveness. There was also one contract that received 610 offers. In some cases we observed large numbers of rejected offers, the biggest three being 298, 245, and 111. A single submitted bid can be the result of an over-specified contract notice or a short notification deadline. There are also other factors, which are analysed below.

Table 1 presents the number of contracts in relation to the number of tenders. The percentage gives the share of contracts with the given number of tenders in the whole dataset.

Table 1: Number of contracts with certain number of tenders

Number of tenders   Count    Percent
1                   129910   38.2%
2                   67929    20.0%
3                   46412    13.6%
4                   29850    8.8%
5                   19367    5.7%
6                   12679    3.7%
7                   8277     2.4%
8                   5639     1.7%
9                   3920     1.2%
10                  2799     0.8%

We can observe that the number of contracts drops rapidly as the number of bids grows. The chart is even more telling: Figure 10 presents the relationship between the number of tenders and the number of contracts, both on a logarithmic scale. The almost linear dependency between the logarithms of the two variables suggests a long-tail effect. This phenomenon has also been formulated as Zipf’s law, i.e. the distribution of the number of bidders can be approximated with a Zipfian distribution, one of the discrete power law probability distributions [14].
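The near-linear relationship on log scales can be checked with a small sketch. Under a power law, count(k) is proportional to k^(-s), so ln(count) is linear in ln(k) with slope -s. The code below fits that line by ordinary least squares to the ten rows of Table 1. It is only an illustration under simplifying assumptions: it ignores contracts with more than ten tenders, and a least-squares fit on log-log data is a rough estimate rather than a rigorous power-law test.

```python
import math

# Counts of contracts by number of tenders, copied from Table 1.
TENDER_COUNTS = {
    1: 129910, 2: 67929, 3: 46412, 4: 29850, 5: 19367,
    6: 12679, 7: 8277, 8: 5639, 9: 3920, 10: 2799,
}


def fit_log_log(counts):
    """Least-squares fit of ln(count) = intercept + slope * ln(k)."""
    xs = [math.log(k) for k in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_log_log(TENDER_COUNTS)
# The negated slope is a rough estimate of the power-law exponent s.
print(f"estimated exponent: {-slope:.2f}")
```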

⁹We included the whole year 2013 to avoid potential seasonal fluctuations, as budgets are planned year to year.


Figure 10: Number of contracts by number of tenders on log scales

4.2 Learning Procedure

The first step is to select, with the aid of domain knowledge, a set of contract attributes that should be useful for building the prediction model, and to transform them by SPARQL queries from the graph format to a traditional table form in a CSV file. Attributes have to be understandable and present in most of the contracts to avoid missing data issues. At least some of them have to be common to national contract descriptions to allow building models for various European countries. Based on our experience in data mining, we have pre-selected the following attributes:

• contract identifier (not really used in mining but needed for contract representation)

• estimated value of contract as assessed by contracting authority

• public procurement procedure type

• type of provision, one of three values: works, supplies or services

• criterion for selection of the best bid, i.e. value A if based on price only, B if another criterion is also considered, C if part of the contract uses both previous criteria

• numerical CPV code defined in the contract

• number of contract parts

• month when contract notice has been published

• whether prepayment is expected (boolean encoded as a number)

• whether additional orders are possible

• whether variant offers are allowed

• whether the contract is connected with an EU programme.
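The transformation from SPARQL query results to a CSV table can be sketched as follows. The input is assumed to be in the standard SPARQL JSON results format; the variable names and the sample bindings are hypothetical and only stand in for a few of the attributes listed above. The actual queries used in the project are not given in this deliverable.

```python
import csv
import io


def bindings_to_csv(sparql_json: dict) -> str:
    """Flatten SPARQL JSON results into CSV text, one row per solution."""
    columns = sparql_json["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for binding in sparql_json["results"]["bindings"]:
        # Unbound variables are absent from a binding; write them as empty cells.
        writer.writerow({col: binding.get(col, {}).get("value", "") for col in columns})
    return buffer.getvalue()


# Hypothetical sample result with made-up contracts and variable names.
SAMPLE = {
    "head": {"vars": ["contract", "estimatedValue", "procedureType", "numberOfTenders"]},
    "results": {"bindings": [
        {"contract": {"type": "uri", "value": "http://example.org/contract/1"},
         "estimatedValue": {"type": "literal", "value": "250000"},
         "procedureType": {"type": "literal", "value": "open"},
         "numberOfTenders": {"type": "literal", "value": "3"}},
        {"contract": {"type": "uri", "value": "http://example.org/contract/2"},
         "procedureType": {"type": "literal", "value": "restricted"},
         "numberOfTenders": {"type": "literal", "value": "1"}},
    ]},
}

print(bindings_to_csv(SAMPLE))
```

Writing unbound variables as empty cells keeps every row the same width, so missing values stay visible to the mining tool instead of shifting columns.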


Data is in fact stored in a tabular form. This form of storage allows easy, fast access and incremental addition of new instances by appending to the end of the file for the purposes of model generation. It is convenient for traditional data mining tasks.

We have identified the following algorithms as suitable for testing and evaluation: Linear Regression, Regression Tree (fast decision tree learner [12], M5 [11]), Support Vector Machine and k-NN methods. They have been tested and evaluated on a sample. Some of the models require only numeric features as input, therefore pre-processing was necessary. Nominal variables have been transformed to numerical ones. The following features have been normalized by the Z-transformation method: month, estimated value, number of parts in the contract and CPV code. The RapidMiner Community 5.3¹⁰ tool with the Weka extension has been used to construct and test the data mining models.
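The Z-transformation mentioned above replaces each value x of a feature by z = (x - mean) / standard deviation, so that the feature has zero mean and unit variance. A minimal sketch follows. It uses the population standard deviation, which is an assumption; the normalization operator used in the actual RapidMiner process may differ in that detail.

```python
from statistics import mean, pstdev


def z_transform(values):
    """Return the z-scores of a list of numbers (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical publication months of five contract notices.
months = [1, 3, 6, 9, 12]
print(z_transform(months))
```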

Polish public contracts from 2013 have been used as a learning dataset. Getting full information about a contract requires combining information from two notice types: contract notice and contract award notice (in Polish they are described in document types ZP-400 and ZP-403 respectively). Not all contracts announced in 2013 were awarded in 2013, and there were also some contracts announced in 2012 which have not been taken into account. As a result of merging, our dataset contained 126 thousand instances. We have decided to use all examples from the dataset to provide the greatest possible accuracy of the prediction model. Since public contracts are published daily, it is important that the model can be obtained in a relatively short time. This is also justified by the need to rebuild the model regularly and to keep the tools available to users; incremental model building is best avoided. Table 2 presents measured times for model building.

Table 2: Execution time of the model building process

| Algorithm | Average model building time |
|---|---|
| Fast decision tree learner | 8 s |
| k-NN | 5 min 42 s |
| Linear Regression | 20 s |
| M5 | 5 min 4 s |
| SVM | > 2 h |

Our criteria concerning the applicability of the model included pre-processing, model configuration, and performance measurement. Based on these, we have rejected the SVM-based method because its performance was not satisfactory. Model configuration was also problematic: it was hard to find parameter values producing a comprehensive outcome. Moreover, the results could not be easily interpreted.

The prediction method has been implemented with RapidMiner mechanisms in the background. It consists of two separate processes defined in the RapidMiner GUI. The first process reads and pre-processes the original data and creates the prediction model. The second process applies the model to new instances passed by external applications via the API and returns the prediction along with an average error. This error is used to define the boundaries of the prediction returned by the API.
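One plausible way to turn a prediction and an average error into boundaries is a symmetric interval around the prediction. The sketch below is an assumption, not the deliverable's actual RapidMiner process: the function name, the clipping at zero and the error value 7 are illustrative, while the key names follow the sample output given in Listing 2.

```python
def prediction_with_bounds(prediction, average_error):
    """Attach a symmetric error interval to a predicted number of tenders."""
    # A contract cannot attract a negative number of tenders (assumed clipping).
    lower = max(0, prediction - average_error)
    upper = prediction + average_error
    return {
        "prediction": prediction,
        "topBoundary": upper,
        "bottomBoundary": lower,
    }

# Illustrative values: a prediction of 10 tenders with an average error of 7.
print(prediction_with_bounds(10, 7))
```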

¹⁰http://rapidminer.com/


4.3 Evaluation

In order to evaluate the results we have used two measures. The first one, root mean squared error, is a frequently used measure of the average error of a prediction model. The second is the correlation coefficient between the actual number of tenders and the prediction. It measures the strength and direction of the relationship between the two variables and thus indicates the accuracy of the prediction. Regression trees have proved best in terms of model accuracy, and both of them (fast decision tree learner and M5) gave similar results. Considering both accuracy and model building time, we have decided to use the method based on the fast decision tree learner.

Table 3: Accuracy of the tested algorithms

| Algorithm | Root mean squared error | Correlation |
|---|---|---|
| Fast decision tree learner | 6.547 | 0.740 |
| k-NN | 7.017 | 0.700 |
| Linear Regression | 6.875 | 0.708 |
| M5 | 6.509 | 0.744 |
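Both evaluation measures are simple to compute. The following sketch shows the standard definitions of root mean squared error and the Pearson correlation coefficient; the function names and the sample numbers are illustrative and are not the deliverable's data.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Illustrative numbers of tenders: actual values and model predictions.
actual = [3, 8, 12, 5, 20]
predicted = [4, 7, 10, 6, 15]
print(rmse(actual, predicted), pearson(actual, predicted))
```

A lower RMSE and a correlation closer to 1 both indicate a better model, which is how the rows of Table 3 can be compared.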

4.4 API Description

The prediction application is written in Java with the Spring Framework¹¹, and RapidMiner is used as a Java library. The paths of all required files and other parameters are changed in the process file programmatically. The mechanism is designed to delete old files and produce new ones, so the model, intermediate results and calculations need not be stored in memory. Moreover, it is easy to change the learning algorithm by modifying the process in the RapidMiner tool; just one file in the project has to be replaced, without code changes and recompilation. The RESTful API provides access to the data mining algorithm through the HTTP POST method at the http://data.i2g.pl/contract-analytics/bidders endpoint. It accepts public contracts in the form of JSON or JSON-LD as input, which should contain the attributes mentioned in Section 4.2.

The mining process then starts: it transforms the input data as required, applies a stored model and computes the results, which are returned as JSON in the response to the HTTP POST request. The result consists of the predicted number of tenders, with the lower and upper bound of the possible prediction error. Listing 1 presents a sample input and Listing 2 a sample output of the prediction method.

{"estimatedValue":"2895793.68", "procedureType":"Open",
 "criterion":"A", "cpv":"454530007", "numParts":"1", "month":"1",
 "advance":"0", "addendum":"1", "variant":"0", "ue":"0",
 "kind":"Works", "id":"2013_100300"}

Listing 1: Sample input for prediction method provided by the API

{"prediction":10, "topBoundary":17, "bottomBoundary":3}

Listing 2: Sample output for prediction method provided by the API
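A client call to the API could look roughly as follows. This is a hedged sketch using only the Python standard library: the endpoint URL and the payload come from the text and Listing 1, whereas the `Content-Type` header, the function names and the timeout are assumptions. The endpoint may no longer be reachable, so the example only builds the request; `predict_bidders` would actually send it.

```python
import json
import urllib.request

ENDPOINT = "http://data.i2g.pl/contract-analytics/bidders"  # from the text above

contract = {
    "estimatedValue": "2895793.68", "procedureType": "Open",
    "criterion": "A", "cpv": "454530007", "numParts": "1", "month": "1",
    "advance": "0", "addendum": "1", "variant": "0", "ue": "0",
    "kind": "Works", "id": "2013_100300",
}

def build_request(payload, url=ENDPOINT):
    """Prepare an HTTP POST request carrying the contract as JSON."""
    body = json.dumps(payload).encode("utf-8")
    # The Content-Type header is an assumption; the deliverable does not state it.
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

def predict_bidders(payload, url=ENDPOINT, timeout=30):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

request = build_request(contract)
print(request.get_method(), request.full_url)
```

If the service responds as in Listing 2, `predict_bidders(contract)` would return a dictionary with the keys `prediction`, `topBoundary` and `bottomBoundary`.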

¹¹http://spring.io/


5 Interactive Graph Exploration for Chosen Entity

While PCFA provides the usual tabular views of procurement data, we aimed to leverage the broader range of views provided by LOD2 tools. In deliverable D9a.3.1 [13] (Sect. 6.2), Payola [7] was already introduced as an interactive visualization tool that allows the user to iteratively explore the space of RDF data related to public procurement. Therefore, a connection from PCFA to Payola was implemented: from the list of business entities proposed by the matchmaker as suitable suppliers, the user can proceed to the Payola view corresponding to the RDF graph neighborhood of the given business entity. On the Payola side, a ‘data source’ corresponding to the public data space of PCFA is configured using its SPARQL endpoint and named graph. PCFA links to the views in Payola by generating URLs based on the data source endpoint URL, the URI of the selected business entity, and the ID of the selected visualization plugin. In the Payola node-link display, the user can interactively navigate to further nodes, which expand in turn.

An example is shown in Fig. 11: a potential supplier of beamers is connected with five previous tenders (at the bottom) and additional information, e.g., contact details.¹² It is then possible to navigate to a tender to see who the buyer was, and to display the contracts of this buyer (contracting authority) in a similar way.

¹²The graph shown is not the ‘regular tree’ shape displayed initially, but the result of some manual rearranging of nodes for better visibility.


Figure 11: RDF environment of a proposed supplier, in Payola visualizer


6 Affinity Analysis by Association Rules

Study of affinity and co-occurrence of data can help in discovering non-trivial relationships between attributes. Patterns describing the behaviour of contracting authorities and bidders are the most visible examples in the public procurement domain. Frequent relationships between the same business entities may be an indicator of illegal behaviour such as restricting competition on the market, and as such they are a case for supervisory bodies. Contracting authorities may benefit from a diagnosis of the relations between CPV codes and estimated prices. Tenderers may be interested in information about increased turnover in certain periods (e.g. the end of the year) and also in the involvement of universities in EU programmes, so that they can prepare a bidding strategy.

6.1 Data Co-Occurrence

The data for our studies have been prepared analogously to the prediction and clustering cases, see Section 4. CPV codes have been limited to the first two characters, which represent the division of the contract subject. The analysis has been performed using the Apriori algorithm [3] in the RapidMiner tool. The study of data co-occurrence allows us to put forward hypotheses about the public contract domain. The generated rules have been evaluated by the classic support and confidence measures, and additionally by conviction, another measure that tackles weaknesses of the former two. According to [4] it is calculated as:

$$\mathrm{conv}(A \rightarrow C) = \frac{1 - \mathrm{sup}(C)}{1 - \mathrm{conf}(A \rightarrow C)}$$

where A is the antecedent of the rule and C is its consequent. Conviction attempts to measure the degree of implication of a rule. It is infinite for logical implications (confidence = 1.0) and equals 1.0 if A and C are independent. Conviction can be particularly successful in classification.

Due to the quite large data set, the lower bound for the minimum support was set to 1% and the minimum confidence to 80%. An extract of the 25 rules with the highest confidence is presented in Figure 12.
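The conviction measure can be evaluated directly from the support of the consequent and the confidence of the rule. The sketch below is illustrative: the function name and the sample numbers are assumptions, with support and confidence expressed as fractions rather than percentages.

```python
import math

def conviction(support_consequent, confidence):
    """conv(A -> C) = (1 - sup(C)) / (1 - conf(A -> C))."""
    if confidence >= 1.0:
        # A rule that always holds is a logical implication.
        return math.inf
    return (1.0 - support_consequent) / (1.0 - confidence)

# Illustrative rule: the consequent occurs in 0.1% of all baskets and the
# rule holds in 10 of the 11 baskets containing the antecedent.
print(round(conviction(0.001, 10 / 11), 2))

# If A and C are independent, conf(A -> C) equals sup(C) and conviction is 1.
print(conviction(0.25, 0.25))
```

Unlike confidence, conviction takes the frequency of the consequent into account, so a rule with high confidence can still receive a value close to 1 when its consequent is common anyway.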

Figure 12: Association rules between attributes in Polish public contracts


Not all rules are valuable. Some of them describe obvious relations or even business rules, like the relation between CPV and kind of contract, e.g. CPV code 80 (Education and training services) will always be classified as a service delivery kind of contract.

After filtering out the irrelevant ones, the following list of interesting rules remains:

(a) EstimatedValue = high ∧ Tenders = very high ∧ MainObject = 33 → NumParts = more than one

(b) Tenders = high ∧ MainObject = 33 → NumParts = more than one

(c) NumParts = more than one ∧ Tenders = very high ∧ MainObject = 80 → UE = true

(d) MainObject = 80 → UE = true

(e) Month = 1 quarter ∧ MainObject = 80 → UE = true

(f) UE = true ∧ Criterion = B → Kind = Services

(g) Month = 1 quarter ∧ UE = true ∧ Kind = Works → EstimatedValue = high

Interestingly, CPV division 80 (Education and training services) is very often associated with European Union programmes (d), but according to the conviction measure this is not a really strong relationship in this dataset. The dependency also occurs in more specific rules, i.e. in conjunction with the first quarter (e) and, in another rule, together with a very high number of tenders and more than one part (c). The connection of European Union programmes with the education and training sector is not unexpected, because it is one of the intervention areas of the EU. The relationship between CPV divisions and the time dimension may indicate that certain contracts appear more often at specific times of the year. Contracts with more than one part are especially characteristic of CPV division 33 (Medical equipments, pharmaceuticals and personal care products), in combination with a high or very high number of tenders and a high estimated value of the contract (a, b). The European Commission report Study on Corruption in the Healthcare Sector [1] points out many improper activities in the Polish healthcare sector, and one of the most common issues is the restriction of competition by inappropriate procurement specification. Another rule links service contracts with EU programmes and multiple criteria (f). The enormous popularity of limiting the selection criteria to price only is a specific feature of the Polish public procurement system [2]. In 90% of tenders in the competitive procedure type in Poland, the price is the exclusive award criterion, whereas in Europe the figure is only about 30%.

6.2 Business Entities Relationship

In the previous section we studied dependencies between all attributes. Now we shall focus on association mining between entities: contracting authorities and contractors.

First, we get the most frequent associations between contracting authorities and contractors in the whole dataset. This can be achieved with a simple SPARQL query, which is presented in Listing 3.

select ?authority ?contractor ?count
where {{
    select ?authority ?contractor count(?contract) as ?count
    from <http://data.i2g.pl/bzp2013>
    where {
      ?notice zp:noticeType "ZP-403" .
      ?contract pc:notice ?notice ;
                pc:contractingAuthority ?authority .
      ?subcontract pc:isLotFor ?contract ;
                   pc:awardedTender ?tender .
      ?contractor pc:supplierFor ?tender .
    }
    group by ?authority ?contractor
  }
  filter(?count>9)
}
order by desc(?count)

Listing 3: SPARQL query for getting most frequent associations between contracting authorities and contractors
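The grouping, filtering and ordering performed by the query can be mirrored on a plain list of result rows. The following sketch is an assumption for illustration only: each tuple stands for one (authority, contractor) row matched by the inner graph pattern, and the names in the sample data are invented.

```python
from collections import Counter

def frequent_pairs(awards, min_count=10):
    """Count (authority, contractor) pairs and keep the frequent ones.

    min_count=10 corresponds to the query condition filter(?count>9).
    """
    counts = Counter(awards)
    # most_common() already orders by descending count, like "order by desc".
    return [(a, c, n) for (a, c), n in counts.most_common() if n >= min_count]

# Invented sample rows: one tuple per awarded contract.
awards = (
    [("Authority A", "Contractor X")] * 12
    + [("Authority A", "Contractor Y")] * 3
    + [("Authority B", "Contractor X")] * 10
)
for authority, contractor, count in frequent_pairs(awards):
    print(authority, contractor, count)
```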

Based on the query above we obtain Table 4, showing how many times the given entities signed a contract. In the example, Telekomunikacja Polska S.A. has delivered 71 contracts to Region Wsparcia Teleinformatycznego we Wrocławiu.

The number of contracts rarely reflects the real economic background. In order to estimate the impact we have to look into the value of contracts. The query is prepared in a similar way, so we skip the SPARQL code. Results are presented in Table 5.

A quick look at the table reveals that there are some problems with data quality. For example, it is unlikely that one theatre delivers services worth 1.6 billion PLN (400 million euro) to another theatre. Also, the amount of 200 million PLN in two cases concerning the same public body looks like a placeholder.

Other interesting results have been obtained when we looked into relations between public bodies and contractors through a basket analogy. The interpretation may be tricky, but it goes as follows. First, we create baskets. In the first case the baskets are created by authorities, i.e. contractors that delivered goods to the same public body are put into the same “basket”. Second, we analyse whether there are any associations between contractors, i.e. whether they deliver goods to the same public bodies (not necessarily in the same contract). These techniques may be useful in looking for similar companies or direct competitors. Results are presented in Table 6.

A similar exercise has been conducted in the reverse direction: now the baskets are created by contractors, i.e. public bodies served by the same contractor are put into the same “basket”. We then analyse whether there are any relations between authorities, i.e. whether they are served by the same set of contractors. The existence of such relations may be an indicator of a shallow market, i.e. one without many competitors. Results of this analysis are presented in Table 7. Astonishingly, all entities mentioned in the table belong to the healthcare sector.
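The basket analogy can be made concrete with a small sketch. It is a simplification under assumptions: the deliverable used the Apriori algorithm in RapidMiner and also reports rules with two items on one side, whereas the code below only derives rules with a single antecedent and a single consequent. All names and thresholds are illustrative.

```python
from collections import defaultdict
from itertools import permutations

def build_baskets(pairs, by="authority"):
    """Group (authority, contractor) pairs into baskets.

    by="authority": each basket holds the contractors of one public body.
    by="contractor": each basket holds the public bodies of one contractor.
    """
    baskets = defaultdict(set)
    for authority, contractor in pairs:
        if by == "authority":
            baskets[authority].add(contractor)
        else:
            baskets[contractor].add(authority)
    return list(baskets.values())

def pair_rules(baskets, min_count=2, min_confidence=0.7):
    """Return rules (antecedent, consequent, confidence, support, count)."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for basket in baskets:
        for item in basket:
            item_count[item] += 1
        for a, c in permutations(sorted(basket), 2):
            pair_count[(a, c)] += 1
    rules = []
    for (a, c), count in pair_count.items():
        confidence = count / item_count[a]
        if count >= min_count and confidence >= min_confidence:
            rules.append((a, c, confidence, count / len(baskets), count))
    return sorted(rules, key=lambda rule: -rule[2])

# Invented sample: hospitals H1..H3 and suppliers S1, S2.
pairs = [("H1", "S1"), ("H1", "S2"), ("H2", "S1"), ("H2", "S2"), ("H3", "S1")]
print(pair_rules(build_baskets(pairs, by="authority")))
print(pair_rules(build_baskets(pairs, by="contractor")))
```

With baskets built by authority the rules relate contractors to each other; with baskets built by contractor they relate public bodies, which corresponds to the two directions described above.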


Table 4: Most frequent contracts between contracting authorities and contractors

| Contracting authority | Contractor | Num. of contracts |
|---|---|---|
| Region Wsparcia Teleinformatycznego we Wrocławiu | Telekomunikacja Polska S.A., Warszawa | 71 |
| Akademia Morska w Gdyni | ALICO Sp. z o.o., Gdansk | 68 |
| Politechnika Rzeszowska im. Ignacego Łukasiewicza | ALCHEM Grupa Sp. z o.o., Rzeszow | 68 |
| Uniwersytet Marii Curie-Skłodowskiej | Reset-PC Wojciech Kondratowicz-Kucewicz i Adam Zams Sp.j, Lublin | 56 |
| Szpital Specjalistyczny im. Sz. Starkiewicza w Dabrowie Górniczej | Salus International Sp. z o.o., Katowice | 53 |
| Politechnika Lubelska | Reset-PC Sp.J., Lublin | 53 |
| Akademia im. Jana Długosza w Czestochowie | eMDF Komputer Monika Fryst, Czestochowa | 52 |
| Miejski Zarzad Budynków Mieszkalnych | Przedsiebiorstwo Wodociagow i Kanalizacji Sp. z o.o., Kalisz | 52 |
| Samodzielny Publiczny Szpital Kliniczny Nr 1 we Wrocławiu | Centrum Zaopatrzenia Medycznego CEZAL S.A., Wroclaw | 48 |
| Politechnika Krakowska im. Tadeusza Kościuszki | Michał Juszczyk, Kraków | 48 |


Table 5: Most frequent associations between contracting authorities and contractors

| Contracting authority | Contractor | Contract value (PLN) |
|---|---|---|
| Teatr im. Ludwika Solskiego w Tarnowie | Teatr im. Adama Mickiewicza w Czestochowie | 1,650,016,500 |
| Opolskie Towarzystwo Budownictwa Społecznego Sp. z o.o. | ENERGOPOL TRADE OPOLE Sp. z o.o. | 1,098,392,573 |
| Miasto Gliwice, Wydział Geodezji i Kartografii | MIRBUD S.A., Skierniewice | 321,413,476.45 |
| Bałtycka Instytucja Gospodarki Budzetowej BALTICA | Budjafex Przedsiebiorstwo Konserwacji Zabytkow Uslug Budowlanych, Dzialdowo | 200,000,000 |
| Bałtycka Instytucja Gospodarki Budzetowej BALTICA | Konsorcjum firm Zaklad Robot Ogolnobudowlanych Stanislaw Repinski GARDA Sp. z o.o. | 200,000,000 |
| Gmina Bystrzyca Kłodzka | Gospodarczy Bank Spoldzielczy w Strzelinie | 190,700,010 |
| Urzad Gminy w Jednorozcu | Przedsiebiorstwo Wielobranzowe ZIEJA Ryszard, Lomza | 114,800,436.18 |
| Dom Pomocy Społecznej w Kole | ECO-THERM Sp. z o.o. | 112,947,075 |
| Gmina Miejska Hajnówka | Przedsiebiorstwo Wodociagow i Kanalizacji Sp. z o.o., Hajnowka | 66,753,296.18 |
| Zespół Oświaty Gminnej, Strzelin | DANIEL - Przewozy Autokarowe Szydziak Daniel, Strzelin | 57,802,876 |


Table 6: Authorities as a basket in association rules analysis

| Antecedent | Consequent | confidence | support | count | conviction |
|---|---|---|---|---|---|
| Przedsiebiorstwo Handlowe JONEX KATARZYNA JONSKA-Zielona Gora-4a43 | IceQB spolka z ograniczona odpowiedzialnoscia-Zielona Gora-7c6b | 100.00 | 0.06 | 10.00 | inf |
| MERCK Sp. z o.o.-Warszawa-91fa ∧ BIOKOM Baka Olszewski Sp.j.-Janki-fc9f | Prospecta Sp. z o.o.-Warszawa-b76a | 90.91 | 0.06 | 10.00 | 10.99 |
| Life Technologies Polska Sp. z o.o.-Warszawa-c278 ∧ BIOKOM Baka Olszewski Sp.j.-Janki-fc9f | Prospecta Sp. z o.o.-Warszawa-b76a | 90.91 | 0.06 | 10.00 | 10.99 |
| Sorin Group Polska Sp. z o.o.-Warszawa-27ed | Medtronic Poland Sp. z o.o.-Warszawa-a2b2 ∧ MAQUET Polska Sp. z o.o.-Warszawa-80b6 | 76.92 | 0.06 | 10.00 | 4.33 |
| Meranco Aparatura Kontrolno-Pomiarowa i Laboratoryjna Sp. z o.o.-Poznan-5e6a ∧ EURx Sp. z o.o.-Gdansk-4d0e | Prospecta Sp. z o.o.-Warszawa-b76a | 76.92 | 0.06 | 10.00 | 4.33 |
| BIOKOM Baka Olszewski Sp.j.-Janki-fc9f | Prospecta Sp. z o.o.-Warszawa-b76a ∧ MERCK Sp. z o.o.-Warszawa-91fa | 71.43 | 0.06 | 10.00 | 3.50 |
| Nova Spine Sp. z o.o.-Tyniec Maly-a498 ∧ Medtronic Poland Sp. z o.o.-Warszawa-a2b2 | LFC Sp. z o.o.-Zielona gora-9bc1 | 71.43 | 0.06 | 10.00 | 3.50 |
| BIOKOM Baka Olszewski Sp.j.-Janki-fc9f | Prospecta Sp. z o.o.-Warszawa-b76a ∧ Life Technologies Polska Sp. z o.o.-Warszawa-c278 | 71.43 | 0.06 | 10.00 | 3.50 |
| BIOKOM Baka Olszewski Sp.j.-Janki-fc9f | MERCK Sp. z o.o.-Warszawa-91fa ∧ ALAB Sp. z o.o.-Warszawa-c56e | 71.43 | 0.06 | 10.00 | 3.50 |
| Profarm PS Sp. z o.o.-Stara Iwiczna-a599 ∧ BIOTRONIK Polska Sp. z o.o.-Poznan-e1bb | PROCARDIA MEDICAL Sp. z o.o.-Warszawa-9a19 | 71.43 | 0.06 | 10.00 | 3.50 |
| Perlan Technologies Polska Sp. z o.o.-Warszawa-a10e ∧ LABART Sp. z o.o.-Gdansk-01f3 | LINEGAL CHEMICALS sp. z o.o.-Warszawa-d3d4 | 71.43 | 0.06 | 10.00 | 3.50 |
| SYMBIOS Sp. z o.o.-Straszyn-1aef ∧ ALAB Sp. z o.o.-Warszawa-c56e | EURx Sp. z o.o.-Gdansk-4d0e | 71.43 | 0.06 | 10.00 | 3.50 |


Table 7: Contractors as a basket in association rules analysis

| Antecedent | Consequent | confidence | support | count | conviction |
|---|---|---|---|---|---|
| Samodzielny Publiczny Zakład Opieki Zdrowotnej w Hajnówce ∧ Wojewódzki Szpital Specjalistyczny w Olsztynie | Szpital Lipno sp. z o.o. | 90.91 | 0.01 | 10.00 | 11 |
| Powiatowy Szpital w Aleksandrowie Kujawskim Sp. z o.o. ∧ Wojewódzki Specjalistyczny Szpital im. M. Pirogowa w Łodzi | Szpital Lipno sp. z o.o. | 84.62 | 0.01 | 11.00 | 6.5 |
| I Szpital Miejski im. dr E. Sonnenberga ∧ Dolnośląski Szpital Specjalistyczny im. T. Marciniaka | Poddebickie Centrum Zdrowia sp. z o.o. | 71.43 | 0.01 | 10.00 | 3.5 |
| Wielospecjalistyczny Szpital Miejski w Poznaniu ∧ Radomski Szpital Specjalistyczny im. dr Tytusa Chałubinskiego | Szpital Lipno sp. z o.o. | 76.92 | 0.01 | 10.00 | 4.33 |
| Regionalny Szpital Specjalistyczny im. dr Władysława Bieganskiego ∧ Szpital Specjalistyczny im. J. Śniadeckiego w Nowym Saczu | Szpital Lipno sp. z o.o. | 76.92 | 0.01 | 10.00 | 4.33 |
| Szpital Specjalistyczny im. J. Śniadeckiego w Nowym Saczu ∧ Instytut Matki i Dziecka | Szpital Lipno sp. z o.o. | 76.92 | 0.01 | 10.00 | 4.33 |
| Wojewódzki Specjalistyczny Szpital im. M. Pirogowa w Łodzi ∧ Poddebickie Centrum Zdrowia sp. z o.o. | I Szpital Miejski im. dr E. Sonnenberga | 90.91 | 0.01 | 10.00 | 11 |
| Wielospecjalistyczny Szpital Miejski w Poznaniu ∧ Regionalny Szpital Specjalistyczny im. dr Władysława Bieganskiego | Szpital Lipno sp. z o.o. | 70.59 | 0.01 | 12.00 | 3.4 |
| Specjalistyczny Szpital Miejski im. M. Kopernika w Toruniu ∧ Wojewódzki Szpital Podkarpacki im. Jana Pawła II | Szpital Powiatowy im. Jana Mikulicza w Biskupcu | 90.91 | 0.01 | 10.00 | 11 |
| Szpital Powiatowy im. Jana Mikulicza w Biskupcu ∧ Szpital Powiatowy im. Edmunda Biernackiego | I Szpital Miejski im. dr E. Sonnenberga | 83.33 | 0.01 | 10.00 | 6 |


7 Clustering in Public Procurement

Clustering of contracts offers an interesting business case both for contracting authorities and for bidders. Contracting authorities may find it useful to see similar contracts as an inspiration when preparing a specific contract notice. This is particularly true for smaller offices where only one person is responsible for procurement, and only part-time. Another opportunity is the aggregation of demand: if several similar contracts are announced together, the authorities can get a better price by procuring as a group than individually. Moreover, the whole process should be cheaper, and the resources saved can be devoted to evaluation.

Bidders can gain even more, as clustering offers a sophisticated mechanism for monitoring new notices similar to the contracts they have already carried out. It is more expressive than a typical search language or the interfaces offered by the majority of portals publishing information about public contracts. All they need to do is identify the past contracts that would be most suitable for them, and an appropriate cluster will then be devised.

This initial idea has been further elaborated, and clustering is now perceived as a pre-processing task for other methods, which can then be performed more efficiently.

7.1 Matchmaking

Matchmaking, for efficiency reasons, is based on the comparison of one attribute, the Common Procurement Vocabulary (CPV) code. Clustering methods allow similar contracts to be found not only by looking at one specific attribute but also by considering other contract details which are deemed important. The point of clustering here is to reduce the number of comparisons. In fact, the bulk of the computation is carried out off-line, in the clustering phase, and not in the matchmaking phase.

The improved matchmaking is then achieved in the following steps:

1. attribute selection and transformation to table format

2. data pre-processing

3. building of clustering model

4. application of model to new contract

5. calculation of similarity between new contract and its neighbours in the allocated cluster

6. selection of top N contracts similar to the new example.
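Steps 4 to 6 can be sketched as follows, assuming that a clustering model (step 3) is already available as a list of centroids together with the pre-processed contracts assigned to each cluster. The data layout, the function names and the sample vectors are assumptions made for illustration; the deliverable performed these steps with RapidMiner.

```python
from math import dist  # Euclidean distance, available since Python 3.8

def nearest_cluster(vector, centroids):
    """Step 4: index of the centroid closest to the new contract."""
    return min(range(len(centroids)), key=lambda i: dist(vector, centroids[i]))

def top_n_similar(vector, centroids, clusters, n=3):
    """Steps 5 and 6: rank only the members of the allocated cluster."""
    members = clusters[nearest_cluster(vector, centroids)]
    ranked = sorted(members, key=lambda member: dist(vector, member[1]))
    return [contract_id for contract_id, _ in ranked[:n]]

# Invented model: two clusters of (contract id, feature vector) pairs.
centroids = [(0.0, 0.0), (10.0, 10.0)]
clusters = [
    [("a", (0.0, 1.0)), ("b", (1.0, 0.0)), ("c", (-2.0, 0.0))],
    [("d", (10.0, 9.0)), ("e", (12.0, 12.0))],
]
print(top_n_similar((0.2, 0.0), centroids, clusters, n=2))
```

The new contract is compared with the centroids and then only with the members of one cluster, which is how clustering reduces the number of comparisons needed at matchmaking time.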

7.2 Learning Procedure

The learning starts with the selection of a common set of attributes, which are then transferred to a CSV file as in the prediction case (see Section 4). Exploratory data analysis shows that contracts of different kinds (supplies, services, and construction works) have different characteristics and varying sets of features defining their specificity. For example, contracts related to works have a wide range of estimated values and a very narrow range of CPV codes, whereas supplies have a fairly stable monthly occurrence, and services have the lowest percentage of contracts defining price as the exclusive criterion for contract award (illustrated in Figure 13). For this reason, the clustering process is carried out separately for these distinct kinds of contracts.


Figure 13: The frequency of use of the criterion in relation to the kind of contract (A – only price, B – not only price, for parts, C – some parts have other criterion than price, for contracts)

Similarity metrics depend very much on data preparation. In the pre-processing step the nominal variables have been transformed to numerical ones; the month variable has been normalised by Z-transformation, the number of parts has been scaled to the range [-1, 1], and CPV codes and the estimated value of the contract to the range [-20, 20]¹³. Different transformations of attributes result in different distances between objects with regard to the Euclidean measure, and hence in a different importance of the attributes. For example, two different but close CPV codes make contracts less similar than two close months do, hence the scaling.

Although a lot of clustering methods are available, most of them have to be rejected due to their relatively high computational complexity or the necessity to define the number of clusters a priori. We need to remember that such methods also have to cope with high dimensionality; a typical contract can be described with over 200 attributes.

In the case of public procurement the requirements for clustering are specific. We are mostly interested in comparing new contracts to the existing “space”. On the one hand, the obtained clusters should not be too small, because finding contracts similar to outliers would be impossible. On the other hand, too large clusters can make it difficult to find a representative cluster and then a similar contract within it. Efficiency would deteriorate in both cases. Unfortunately, in the domain of public procurement we often have to deal with unique contract examples. Proper setting of the parameters of the clustering algorithm is crucial, because the model should generalise contracts at the appropriate level, so that the number of generated clusters is neither too high nor too low.
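Linear scaling to a target range can be sketched as below. The function name and the sample values are illustrative assumptions; the wider range gives an attribute more weight in the Euclidean distance than an attribute scaled to a narrow range.

```python
def scale_linear(values, low, high):
    """Map the smallest value to `low`, the largest to `high`, linearly."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant attribute is mapped to the middle of the range.
        return [(low + high) / 2 for _ in values]
    return [low + (v - v_min) / (v_max - v_min) * (high - low) for v in values]

# Invented estimated contract values, scaled to [-20, 20].
print(scale_linear([100, 550, 1000], -20, 20))  # [-20.0, 0.0, 20.0]

# Invented numbers of parts, scaled to [-1, 1].
print(scale_linear([1, 2, 5], -1, 1))  # [-1.0, -0.5, 1.0]
```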

After experiments, the X-means algorithm [10], which is an extension of K-means, has been selected. Other algorithms were not as efficient in identifying the correct number of clusters and sometimes generated inconsistent groups. X-means is able to cope efficiently with the whole Polish dataset from the year 2013, and it determines the correct number of clusters based on a heuristic. During the local tests, the described method performed all steps from 2 to 6 in about 30 seconds. This is possible thanks to the use of a simple K-means method with the Euclidean metric. The correctness of the model has been checked by domain experts and by measuring the average within-cluster distance and the Davies–Bouldin index, which defines the ratio of intra-cluster distances to inter-cluster distances.

¹³The contract of lowest value was assigned -20 and the most expensive one +20; the rest were scaled linearly.
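The two numeric measures mentioned above can be illustrated with a small sketch. It assumes the standard definition of the Davies–Bouldin index (for each cluster, the worst ratio of summed scatter to centroid distance, averaged over all clusters) and the Euclidean metric; the function names are illustrative and this is not the RapidMiner implementation used in the project.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def average_within_cluster_distance(clusters):
    """Mean Euclidean distance of every example to the centroid of its cluster."""
    total, count = 0.0, 0
    for members in clusters:
        center = centroid(members)
        total += sum(math.dist(p, center) for p in members)
        count += len(members)
    return total / count


def davies_bouldin(clusters):
    """Davies-Bouldin index for at least two clusters; lower values are better."""
    centers = [centroid(members) for members in clusters]
    # Scatter of a cluster: average distance of its members to its centroid.
    scatter = [
        sum(math.dist(p, center) for p in members) / len(members)
        for members, center in zip(clusters, centers)
    ]
    k = len(clusters)
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst_ratios) / k


# Toy data: two compact, well separated clusters in two dimensions.
toy_clusters = [
    [[0.0, 0.0], [0.0, 2.0]],
    [[10.0, 0.0], [10.0, 2.0]],
]
print(davies_bouldin(toy_clusters), average_within_cluster_distance(toy_clusters))
```

Compact and well separated clusters give a small index, which is why the measure can be used to compare clustering configurations.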

7.3 Results and Applications

The X-means algorithm uses a heuristic to determine the appropriate number of clusters, but the user specifies the lower bound kmin and the upper bound kmax of the range as parameters. The default range is [2, 60], but in our experiments it produced too small a number of clusters. We have decided that kmin should not be greater than 20 and that kmax should be left at 60 (default and the smallest possible value). Based on the Davies–Bouldin index, the average distance within a cluster, and the cardinality of clusters, we have identified the best configuration to be [5, 60]. Similar values have also been indicated by experts. Evidence for this decision can be seen in Figures 14 and 15. This setting gives the best ratio of the presented measures, does not produce clusters smaller than 23 examples for any contract kind in the initial dataset, and provides consistent results. Setting kmin to 9 generates the smallest number of clusters, but the other measures are not satisfactory.

Figure 14: Values of measures according to the kmin parameter for works contracts

Figure 15: Cardinality of clusters for works contracts
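One possible way to turn the criteria discussed above into a selection rule is sketched below. The rule itself (discard runs that contain a cluster below a minimum size, then prefer the lowest Davies–Bouldin index and break ties by the average within-cluster distance) is an assumption made for illustration; the text does not state how the measures were combined. The class name, the function name and all numbers are hypothetical toy values and are not the measurements shown in the figures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusteringRun:
    """Measures recorded for one clustering run (all values supplied by the caller)."""
    k_min: int
    k_max: int
    davies_bouldin: float        # lower means better separated clusters
    avg_within_distance: float   # lower means more compact clusters
    cluster_sizes: List[int]     # cardinality of each generated cluster


def pick_configuration(runs: List[ClusteringRun], min_cluster_size: int) -> Optional[ClusteringRun]:
    """Return the eligible run with the lowest Davies-Bouldin index.

    A run is eligible when none of its clusters is smaller than
    min_cluster_size. Ties are broken by the average within-cluster
    distance. Returns None when no run is eligible.
    """
    eligible = [r for r in runs if min(r.cluster_sizes) >= min_cluster_size]
    if not eligible:
        return None
    return min(eligible, key=lambda r: (r.davies_bouldin, r.avg_within_distance))


# Toy values for illustration only.
toy_runs = [
    ClusteringRun(2, 10, 1.40, 0.90, [50, 5]),
    ClusteringRun(4, 10, 1.10, 0.70, [20, 15, 12, 10]),
    ClusteringRun(6, 10, 1.30, 0.80, [12, 11, 10, 10, 10, 10]),
]
best = pick_configuration(toy_runs, min_cluster_size=10)
print(best.k_min, best.k_max)
```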


The first and obvious exploitation of the results has already been described at the beginning of this section. Contracting authorities can classify a prepared contract into one of the learned clusters and then take advantage of the common characteristics of its members. The most similar contracts can be a good inspiration with regard to terms of reference, offered prices, and also the number of bidders. We have already identified two use cases:

• price suggestion (auto-completion)

• matchmaking (contract-to-contract).
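As a rough illustration of the price suggestion use case, the sketch below derives a suggested price from the prices of the most similar contracts. The choice of the median together with the observed range is an assumption; the text does not specify how a suggestion is aggregated, and the function name and the prices are hypothetical.

```python
from statistics import median


def suggest_price(similar_contract_prices):
    """Summarise the prices of the most similar contracts.

    Returns None when no similar contract is known, otherwise a dict with
    the median as the suggested value and the observed minimum and maximum.
    """
    prices = sorted(similar_contract_prices)
    if not prices:
        return None
    return {"suggested": median(prices), "low": prices[0], "high": prices[-1]}


# Hypothetical prices of five contracts returned by a similarity search.
suggestion = suggest_price([100.0, 120.0, 90.0, 400.0, 110.0])
print(suggestion)
```

The median is used here because a single unusually expensive contract (400.0 in the toy data) would otherwise dominate an average.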

Contractors can use a cluster as a monitoring tool, e.g. all new contracts in a marked cluster should result in a notification. For this purpose, clusters on various levels, i.e. with different numbers of contracts, are useful.

The matchmaking API already offers several comparison methods: contract to contract and contractor to contract (see deliverable D9a.2.2 [9]). Unfortunately, matchmaking over the whole graph is not efficient, and clustering can be used to speed it up. We can also think about another kind of matching, i.e. contractor to contractor. The application is not obvious, but it can be used by contractors who want to know their competitors. It may also be leveraged by contracting authorities who want to look for another contractor in case one went bankrupt, want to check references or form consortia. It can also help in predicting the number of bidders by looking at the number of active players in a specific market niche.

To conclude, the results of the proposed method can be used for both matchmaking and predictive tasks.

7.4 Implementation and API

The approach to implementation is similar to that of the prediction method, i.e. a Java application with a RapidMiner back-end. The calculation is divided into three separate RapidMiner processes.

The first process a) reads and transforms the original data, b) builds the clustering model, and c) persists the results for future use. Additionally, the transformed examples are saved to a file along with their assigned clusters. The second process applies the saved model to new data, assigning each new case to one of the existing clusters. The third process calculates the similarity between a new example and the existing instances in the same cluster and returns the top N similar items.

External clients do not have direct access to cluster building. Although the method is very quick and we have included a kind of hidden web method to start the learning process, it is not intended for a broader audience. Proper functioning of the method requires provision of contract data, and the database holds hundreds of thousands of contracts. Because the method is very quick we have not considered incremental clustering; the method itself can be run quite often, e.g. daily when new data arrives in the triple store.

The mining process is exposed as an API at the http://data.i2g.pl/contract-analytics/similar endpoint through the HTTP POST method.¹⁴ It expects a contract as JSON or JSON-LD, as shown in Listing 1, and returns its output as a JSON array containing contract identifiers, as shown in Listing 4. The JSON-LD version is presented in Listing 5. In order to reduce the transfer of probably unnecessary data, only ids are returned as a result of the API method call. More data can normally be obtained by dereferencing the URIs.

¹⁴Documentation and a sample call are available at http://data.i2g.pl/contract-analytics/similar/doc
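A client call to this endpoint might be prepared as in the following sketch, which uses only the Python standard library. The endpoint URL and the POST method come from the text; the media types used to distinguish JSON from JSON-LD, the function names and the placeholder payload are assumptions, since the expected contract structure is the one given in Listing 1 and is not repeated here.

```python
import json
import urllib.request

# Endpoint named in the text.
ENDPOINT = "http://data.i2g.pl/contract-analytics/similar"


def build_similar_request(contract: dict, json_ld: bool = False) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the contract description."""
    # Assumption: the service distinguishes JSON from JSON-LD by media type.
    media_type = "application/ld+json" if json_ld else "application/json"
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(contract).encode("utf-8"),
        headers={"Content-Type": media_type, "Accept": media_type},
        method="POST",
    )


def fetch_similar(contract: dict, timeout: float = 10.0):
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_similar_request(contract), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical placeholder payload; the real structure is the one given in Listing 1.
placeholder_contract = {"placeholder": "contract description goes here"}
request = build_similar_request(placeholder_contract)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the sketch testable without network access; `fetch_similar` is the only function that actually contacts the service.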


[
  "http://data.i2g.pl/zp/contract/2013_440888",
  "http://data.i2g.pl/zp/contract/2013_231667",
  "http://data.i2g.pl/zp/contract/2013_425650",
  "http://data.i2g.pl/zp/contract/2013_204511",
  "http://data.i2g.pl/zp/contract/2013_449790"
]

Listing 4: Sample output for clustering method provided by the API

{
  "@context": {
    "zp": "http://data.i2g.pl/dic/zamowienia-publiczne#",
    "zpc": "http://data.i2g.pl/zp/contract/"
  },
  "zp:similar": {
    "@list": [
      {"@id": "zpc:2013_440888"},
      {"@id": "zpc:2013_231667"},
      {"@id": "zpc:2013_425650"},
      {"@id": "zpc:2013_204511"},
      {"@id": "zpc:2013_449790"}
    ]
  }
}

Listing 5: Sample output for clustering method provided by the API (JSON-LD version)
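The two output formats carry the same identifiers. The sketch below shows how a client could recover the full contract URIs from the JSON-LD version by expanding the compact identifiers with the prefixes declared in `@context`. It is a minimal prefix expansion written for this example, not a complete JSON-LD processor, and the function names are illustrative.

```python
import json

# JSON-LD response in the shape of Listing 5 (shortened to two contracts).
JSON_LD_OUTPUT = """
{
  "@context": {
    "zp": "http://data.i2g.pl/dic/zamowienia-publiczne#",
    "zpc": "http://data.i2g.pl/zp/contract/"
  },
  "zp:similar": {
    "@list": [
      {"@id": "zpc:2013_440888"},
      {"@id": "zpc:2013_231667"}
    ]
  }
}
"""


def expand_compact_iri(value, context):
    """Replace a declared prefix (the text before the first ':') by its namespace."""
    prefix, separator, local_name = value.partition(":")
    if separator and prefix in context:
        return context[prefix] + local_name
    return value  # already a full URI, or an undeclared prefix


def similar_contract_uris(document):
    """Full URIs of the contracts listed under zp:similar, in ranking order."""
    context = document.get("@context", {})
    return [
        expand_compact_iri(item["@id"], context)
        for item in document["zp:similar"]["@list"]
    ]


uris = similar_contract_uris(json.loads(JSON_LD_OUTPUT))
print(uris)
```

The expanded values are the same URIs that the plain JSON version returns directly, so they can be dereferenced in the same way.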


8 Ad-hoc Analysis of Procurement Linked Data

To increase usability, a dedicated application for the analysis of public contracts in Poland has been designed. It allows the user to browse notices, contracts, procedures (sets of related contracts), contracting authorities and bidders. Additionally, the results of data mining activities are presented in a convenient form. The application has direct access to linked data stored in a local Virtuoso installation.

The welcome screen presents some basic information about the Polish public procurement market (see Figure 16). The screen also contains placeholders for further development of the application towards personalisation. The following information can be relevant for the user: liked contracts, observed procedures, and other messages generated as alerts in response to specified contract search criteria.

Figure 16: Welcome page of uzpApp

From this screen one can go to detailed information about a certain procedure (see Figure 17). It collects all notices related to the given procedure: from the call for tenders (ZP-400), through amendment notices (ZP-406), to the contract award (ZP-403). The title of the procedure is taken from the first notice. Visual elements give hints about the current status (open, closed, awarded), the kind of contract, and the type of procedure. Depending on the status, the page can also display the name of the awarded contractor. Other important attributes are presented at the bottom of the page. Again, further browsing to the contracting authority or the contractor is possible.

Details of a notice can be checked on a separate page dedicated to the given contract (see Figure 18). However, only the subset of attributes that is in fact interesting for a user is presented. Some contract types, like ZP-400, can contain almost 150 different attributes characteristic only of that type, so presenting them efficiently is crucial. Common attributes are moved to the header, so that different types of notices look similar.

A contracting authority is described by its name, and also its logo where available; the address data is accompanied by a map to make the entity easier to locate (see Figure 19). If the organisation has other “branches”¹⁵, they are displayed as well. Aggregated statistics about the entity, such as the most popular CPV codes or the estimated value of contracts, can be of interest. For convenience, the list of procedures is displayed along with their status: open (announced, waiting for tenders), closed (the deadline for tender submission has passed) and awarded (the award notice has been published). From an exploratory point of view it is interesting to see related contractors, where related means that they were awarded the biggest number of contracts and the value of those contracts was the highest (various sorting options).
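The three statuses can be derived from the notices collected for a procedure. The following sketch assumes that a procedure is represented by the set of its notice types (ZP-400 for the call for tenders, ZP-406 for amendments, ZP-403 for the contract award) and by its tender submission deadline; how uzpApp actually computes the status is not described here, so the function name and the rule are illustrative.

```python
from datetime import date


def procedure_status(notice_types, submission_deadline, today):
    """Classify a procedure as 'open', 'closed' or 'awarded'."""
    if "ZP-403" in notice_types:      # a contract award notice has been published
        return "awarded"
    if today > submission_deadline:   # the deadline for tender submission has passed
        return "closed"
    return "open"                     # announced and still waiting for tenders


deadline = date(2014, 6, 30)
print(procedure_status({"ZP-400"}, deadline, today=date(2014, 6, 1)))
print(procedure_status({"ZP-400", "ZP-406"}, deadline, today=date(2014, 7, 15)))
print(procedure_status({"ZP-400", "ZP-403"}, deadline, today=date(2014, 8, 1)))
```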

¹⁵For example, this is very common among big universities, which consist of several faculties that have a certain financial independence and organise the tendering process themselves.


Figure 17: Procedure presented in uzpApp

Figure 18: View on notice in uzpApp


Finally, contractors can be viewed as well (see Figure 20). Again, there is a name accompanied by a logo where available, and basic address data. This page also displays a list of awarded contracts, so a contracting authority can verify the references provided for any future tender. One can also see which contracting authorities are the most popular for a given contractor.

Figure 19: View on contracting authority in uzpApp

Figure 20: View on contractors in uzpApp


9 Evaluation by Contract Authorities

In the second half of August 2014, the PCFA was evaluated by two contract authorities from the Czech Republic, primarily from the point of view of the potential utility of the proposed functions in the public contract filing process. In addition, the terminology used in the Czech version of the application was verified.

The contract authorities involved were:

• University of Economics, Prague (UEP), a large educational and research body, represented by the Head of its legal department, the official responsible for the publication of most public contract notices.¹⁶ UEP officially publishes around 50 public contracts per year.¹⁷

• Czech Hydrometeorological Institute (CHMI), a central government institution chartered by the Ministry of Environment for the fields of air quality, hydrology, climatology and meteorology. The contacted user was a member of the Department of Hydrology who is regularly involved in the procurement process. CHMI officially publishes around 60 public contracts per year,¹⁸ of which about a third are related to hydrology.

Further usability evaluation (in particular, with representatives of municipalities) is foreseen for September 2014.

9.1 Evaluation setting

Since the application is primarily an interface to advanced functionality rather than comprehensive software covering the procurement lifecycle according to legal requirements (in terms of European and national-level regulations), the evaluation did not aspire to follow the workflow of the contract authorities in detail. Rather, the sessions with the users were arranged as follows:

1. The whole session was planned to take about one hour. Only the part of the PCFA focused on the buyer side (i.e. the procurement authority) was shown, i.e. neither the supplier side nor the form generation admin interface (for product ontologies).

2. The user was briefly informed about the context of the work (LOD2 project goals, procurement use case etc.) and the motivations for the development of the PCFA. In particular, the role of the application as an interface to advanced functionality, which is to be evaluated in the first place, was stressed.

3. The user presented her view of the procurement process within her institute/department.

4. The application functionality was demonstrated, taking into account the key points identified in the previous phase, and considering one or more hypothetical public contracts of the typical kind for the given authority.

5. Feedback on the utility of the individual advanced functions was collected. When the utility was not obvious from the use case currently examined, the user was asked if she could identify a different case for which the relevance of the PCFA support would be higher.

6. Analogously, the user was asked about desirable functionality that is not present in the current version of the PCFA (nor in mainstream procurement portals).

¹⁶Besides, prior to this ‘true’ evaluation session, the local best practices were also discussed with the administrator of the official procurement software (used for publishing the calls for tenders within the contract authority profile).

¹⁷The list, in Czech, is at https://zakazky.vse.cz/index.php?m=contract&a=index&type=all&state=all&page=2.

¹⁸The list, in Czech, is at http://sluzby.e-zakazky.cz/Profil-Zadavatele/1df58e7d-df4e-4bcf-9211-c436099efafd; however, further contracts will be published at the portal of another provider, see https://ezak.mzp.cz/profile_display_7.html.

9.2 Evaluation notes – UEP

Given the relatively loose setting for the evaluation, the comments and observations are only listed in text.

• No error occurred in the application (in either version: the standard one and the one featuring forms based on product ontologies).

• The basic form for PC filing was found intuitive and covering the most important features. Compliance with the law would require, among other things, specifying not only the date but also the time of the tender submission deadline (this would, however, be an unnecessary burden for a purely demonstrative application).

• The simple codelist-based autocompletion for CPV and NUTS codes was welcomed surprisingly well. The user was not aware of similar functionality within the mainstream portals; she had always had to select the value from a codelist opened sideways.

• The possibility to ‘play’ with a call under preparation in a private space, while already having access to advanced matchmaking and predictive functionality, was considered highly advantageous. As a possible enhancement, it was suggested to allow multiple users to share a common private space within the contract authority institution. In a large organization such as UEP, people from various departments and centers may be entitled to prepare the draft of the contract notice, which is however finalized by the legal expert; they might not often meet face to face in this process and would benefit from a collaborative environment.

• It was also noted that the publication model should be careful with respect to the Czech law, which disallows disclosing a call earlier than 1 month after the official publication in the Czech Public Procurement Bulletin, except for a certain kind of low-price contract notice (corresponding to the “Restricted (Accelerated)” procurement method).

• Forms based on product ontologies were considered a major enhancement; they would most likely be used for procurement of devices (such as computers and beamers) by the entitled members of the computing center or similar (rather than by the legal department as the end-stop of the notice preparation).

• In order to precisely specify the estimated price, the user had never previously used the search for previous similar contracts, as implemented in the mainstream portals,¹⁹ and was skeptical about its suitability for most contracts published at her institute, since they usually concern the procurement of a high number of units of some commodity or of a variable amount of utilities (such as electricity). For the rare cases when a single large commodity is procured (like building a canteen or waste water station), comparison to historical cases, both the institution's own and those of similar kinds of buyers, could however be beneficial. For such cases, manual price comparison based on concrete past contracts retrieved via similarity search was identified as more relevant for making the price estimate than a mere ‘blackbox’ recommendation of a price (or price interval) based on machine-aggregated past contracts. Namely, the latter might be based on cases that differ in some important characteristic that can only be deduced manually.

• Prediction of the number of tenders was identified as potentially useful, since it would allow iterative tuning of the price (to avoid both an excessive price and a too low/high number of tenders) prior to the publication of the contract notice. A prerequisite, however, is that the prediction is based on relevant training data.

¹⁹However in the form of full-text or form-based search rather than in the form of similarity-based search.


• The direct invitation of suppliers based on matchmaking would mainly bring an advantage for contracts with the “Restricted (Accelerated)” procurement method, since other methods require official publication first.²⁰

• The possibility to display the link graph of the prospective buyer, even prior to a tender submission (or call publication), was identified as useful, as regards its previous tenders and awarded contracts, but also information that is not yet available in the Payola graph: personal structure and information about insolvency. As expected, however, the current view in Payola was perceived as non-intuitive for a common user.

9.3 Evaluation notes – CHMI

Given the relatively loose setting for the evaluation, the comments and observations are only listed in text.

• The application ran smoothly, except for two errors caused by minor bugs (fixed shortly after the session).

• The basic form for PC filing was found intuitive; the user was however surprised that the kind of contract (such as ‘Supplies’ or ‘Services’) is not to be filled in it, especially since the scope of documents to be appended (also part of the form) depends on it in reality.

• In contrast, the contract type button was considered misleading. Similarly, the text ‘Tenders are sealed’ with a checkbox was considered potentially misleading, as it might indicate that the form filling would take place in the phase when the tenders have already been submitted (the formulation ‘Tenders are to be sealed’ was suggested instead).

• The simple codelist-based autocompletion for CPV and NUTS codes was well received. It was however pointed out that its use for CPV codes might lower the precision in situations when the user does not know which code s/he seeks, as s/he might encounter a code with similar text that however belongs to a different branch than desired. In such situations, the possibility to navigate within a tree should be offered as an alternative.

• It was mentioned that for larger-size contracts (to which the user was accustomed) the workflow also contains the publication of a preliminary contract notice, already including a price estimate, i.e. it is not possible to freely tune the parameters of the call in a private space and then publish it in full.

• Forms based on product ontologies were considered interesting but not useful for the kind of contracts considered by the user (which is Services). It was mentioned that services in hydrology are so variable that it is probably not realistic to devise a ‘service ontology’ for each of them. A minor comment on the interface was also made (if a mandatory field is left empty, the system should specifically lead the user to that field rather than returning a generic error message).

• Similar to the UEP user, the CHMI user was skeptical about the usefulness of the price recommendation feature for the contracts at their institute; e.g., the price for water monitoring services depends on the size of the region covered, the number of parameters followed, and the like. She also criticized the lack of distinction between the price with and without VAT.

• Similarly, the prediction of the number of tenders as currently implemented, primarily based on CPV codes, was found too rough. It was stated that a CPV code for services overarches different kinds of services, such that for some of them there is a high number of possible suppliers while others can only be fulfilled by 2-3 specialized companies.

• The list of potential suppliers provided by the system for a model contract was identified as ‘tolerable, given the little information provided’. Five of the top 20 listed suppliers were those having competence for at least a part of the contract (water abduction or lab test) and could be considered relevant suppliers if a joint tender were set up. However, the two most relevant companies (awarded individually in the previous years) were not proposed by the matchmaker. On the other hand, more than a half of the proposed suppliers were irrelevant; again, the clear reason is the broadness of the CPV codes employed. The user expects that the inclusion of a full-text matcher over the short textual descriptions (as planned to be integrated soon) could significantly improve the results. Yet, even with the best possible matchmaker at hand, the support provided by such an application can only have an auxiliary role, since the award decision has to take into account subtle know-how that cannot be captured by structured data as demonstrated.

• The possibility to display the link graph of the prospective buyer was shown, but no significant discussion (aside from the comment on low user-friendliness) took place, because of the shortage of time.

²⁰This observation was also independently made by a potential user from a municipality, who was however consulted before the application was functional; his comments are thus otherwise not included here.


10 Conclusions

This final deliverable of WP9a, i.e. of the LOD2 Public Procurement use case, focused on the integration of functionality resulting from all three tasks of this work package inside a prototype called Public Contract Filing Application (PCFA). While special focus was on the recommendation functionality fuelled by data mining, the evaluation by real users (contract authorities) indicated that even less complex and less automated functions, such as code-based autocompletion or discovery of matching suppliers, could be useful. In turn, recommendation based on aggregation of a large number of cases may be too coarse-grained and its value thus limited. In general, the area of public procurement is very diverse, and future R&D activities should aim to provide specifically tailored support solutions for different kinds of contract authorities as well as contract kinds, such as supplies of common commercial products vs. specialized services vs. construction activities; we anticipate that the last category will be rather important for the municipality-level users, at whom we aim in the next phase of evaluation (as an immediate follow-up to LOD2-supported activities, at national level).


11 References

[1] Study on corruption in the healthcare sector, 10 2013. European Commission HOME/2011/ISEC/PR/047-A2.

[2] System zamówien publicznych a rozwój konkurencji w gospodarce, 09 2013. Urzad Ochrony Konkurencji i Konsumentów.

[3] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules, 1994.

[4] Paulo J. Azevedo and Alípio M. Jorge. Comparing rule measures for predictive association rules. In Joost N. Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, Dunja Mladenič, and Andrzej Skowron, editors, Machine Learning: ECML 2007, volume 4701 of Lecture Notes in Computer Science, pages 510–517. Springer Berlin Heidelberg, 2007.

[5] Marek Dudás, Ondrej Zamazal, Vojtech Svátek, and Jindrich Mynarz. Exploiting freebase to obtain goodrelations-based product ontologies. In Proceedings of the 15th International Conf. on Electronic Commerce and Web Technologies. Springer-Verlag, 2014.

[6] Martin Hepp. Goodrelations: An ontology for describing products and services offers on the web. In Aldo Gangemi and Jérôme Euzenat, editors, EKAW, volume 5268 of Lecture Notes in Computer Science, pages 329–346. Springer, 2008.

[7] Jakub Klímek, Jirí Helmich, and Martin Necaský. Payola: Collaborative linked data analysis and visualization framework. In Philipp Cimiano, Miriam Fernández, Vanessa Lopez, Stefan Schlobach, and Johanna Völker, editors, ESWC (Satellite Events), volume 7955 of Lecture Notes in Computer Science, pages 147–151. Springer, 2013.

[8] Jindrich Mynarz, Martin Necaský, and Jirí Veselka. Lod2 deliverable 9a.2.1: Web application for filing public contracts. Deliverable D9a.1.2, LOD2 EU Project, 2012.

[9] Jindrich Mynarz, Václav Zeman, and Marek Dudás. Lod2 deliverable 9a.2.2: Stable implementation of matching functionality into web application for filing public contracts. Deliverable D9a.2.2, LOD2 EU Project, 2014.

[10] Dan Pelleg and Andrew Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In Proceedings of the 17th International Conf. on Machine Learning, pages 727–734. Morgan Kaufmann, 2000.

[11] J. R. Quinlan. Learning with continuous classes. pages 343–348. World Scientific, 1992.

[12] Jiang Su and Harry Zhang. A fast decision tree learning algorithm. American Association for Artificial Intelligence, 2006.

[13] Vojtech Svátek, Jindrich Mynarz, David Chudán, Jakub Klímek, Łukasz Grzybowski, Mateusz Jarmuzek, Krzysztof Wecel, Lorenz Bühmann, and Sander van der Waal. Lod2 deliverable 9a.3.1: Application of data analytics methods of linked data in the domain of psc. Deliverable D9a.3.1, LOD2 EU Project, 2014.

[14] George K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.
