

Towards a Comprehensive Call Ontology for Research 2.0

Vladimir Tomberg, David Lamas, Mart Laanpere
Tallinn University, Narva mnt 25, 10120 Tallinn, Estonia, +372 6409 355

Wolfgang Reinhardt
University of Paderborn, Fuerstenallee 11, 33102 Paderborn, Germany, +49 5251 606603

Jelena Jovanovic
University of Belgrade, Jove Ilica 154, 11000 Belgrade, Serbia, +381 11 3950 853

ABSTRACT

A Call for Papers (CfP) is a small but well-structured and information-rich message with a relatively short lifespan. CfPs play an important role in academic life, not just as an advertisement format, but also as a trigger of, and advance organiser for, collaborative academic writing. This paper explores the possibilities of creating a comprehensive ontology for CfPs that would be relevant and useful in the Research 2.0 context for two main target groups: authors involved in collaborative writing of academic papers, and conference organisers or journal editors. Our study is conducted in three phases. First, we identify existing ontologies and other representation frameworks that could provide concepts relevant for CfPs. Next, a sample of conference CfPs is analysed and compared to find the common structures and peculiarities that could be used for extending the existing ontologies. Finally, we propose a Call ontology together with two usage scenarios.

Categories and Subject Descriptors

H.1.2 [User/Machine Systems]: Human Information Processing. I.2.4 [Ontologies]

General Terms

Design, Standardization.

Keywords

Research 2.0, ontology, Call for Papers, collaborative writing, scientific events.

1. INTRODUCTION

It is hard to overestimate the role and value of the Call for Papers (CfP) in the contemporary scientific publishing process. Researchers sometimes spend a lot of time deleting dozens of useless CfPs from their email inboxes together with other unwanted messages, and sometimes spend significantly more time looking for relevant CfPs on the Web. The situation in which a researcher receives word of an appropriate CfP a day after the submission deadline is very familiar to people who write and publish scientific papers. The ability to receive useful CfPs in time and to extract personalised CfP information from the mass of unwanted CfPs would be highly appreciated by the research community.

Researchers frequently experience significant information overload while trying to find the right conference or journal for submitting a paper [10]. The challenge of helping interested individuals and groups discover useful CfPs has already been addressed by several initiatives, like WikiCFP¹. The semantic aspect has proven essential; for instance, Xin et al. [15] have demonstrated the value of publishing CfP data in a semantically rich format that can be directly consumed by the applications researchers use for planning the writing process and managing their schedules.

This paper addresses the following research problem: which of the existing CfP-related ontologies could be combined and extended in order to allow for automated retrieval and processing of CfP data. More precisely, our aim is to analyse the available ontologies and other representation frameworks related to CfPs, events, portals and communities, then to compare them with the existing practices of CfP providers, and, finally, to suggest a comprehensive CfP ontology together with two usage scenarios.

Timely receipt of relevant CfPs is especially critical in the context of collaborative authoring of research papers. We argue that the proposed CfP ontology would serve as a useful foundation for developing new services and applications for the Research 2.0 domain. Research 2.0 is a frequently used umbrella term for altered scholarly practices that emerge through the existence of new methods and technologies. It can be defined [14] as the application of new practices that focus on opening up the research process in order to broaden participation and collaboration with the help of new Semantic Web technologies inspired by social media. Typical Research 2.0 services require ontologies for sharing and managing bibliographic references, analysing collaboration patterns and recommending resources as well as potential partners. The research presented here introduces a new technique for creating, semantically annotating and sharing CfPs within research communities. If the presented ontology is widely adopted, we envision new mash-up environments that have the potential to enrich existing scholarly practices. One such service is a new collaborative academic writing assistant called Timeliner, which is currently being developed by our research group. The new ontology can also be used for building new semantic mash-up tools for conference planning and management, as we will illustrate with the scientific event management system ginkgo in Section 7.

1 http://www.wikicfp.com

2. COLLABORATIVE ACADEMIC WRITING

The primary context for our research work is collaborative academic writing. CfPs serve as important triggers and guidelines for the process of co-authoring a joint research paper. We have developed a conceptual model and design for Timeliner, a mash-up tool supporting the orchestration of the collaborative writing process, in which two or more researchers use various social media tools while co-authoring an academic paper. Timeliner aggregates the content from these social media tools (Mendeley, EverNote, Google Apps, etc.) as well as CfP-related data into a joint multi-level timeline inspired by the Gantt chart (Figure 1). Relevant CfPs are recommended to a researcher by Timeliner (or by a co-author). If a CfP is accepted, all related deadlines (submission, acceptance, registration, etc.) and links to resources are placed on the timeline.

Figure 1. Sharing resources between Timeliner users
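The step of placing an accepted CfP's deadlines on a shared timeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Timeliner's implementation: the `Deadline` and `Timeline` names, their fields, and the example dates are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Deadline:
    """One dated milestone taken from a CfP (hypothetical structure)."""
    kind: str   # e.g. "submission", "acceptance", "registration"
    due: date
    call: str   # label of the CfP the deadline belongs to


@dataclass
class Timeline:
    """A shared timeline that keeps its entries in chronological order."""
    entries: list = field(default_factory=list)

    def accept_call(self, call, deadlines):
        """Place every deadline of an accepted CfP on the timeline."""
        for kind, due in deadlines.items():
            self.entries.append(Deadline(kind, due, call))
        self.entries.sort(key=lambda d: d.due)

    def upcoming(self, today):
        """Return the deadlines that have not passed yet."""
        return [d for d in self.entries if d.due >= today]


timeline = Timeline()
# Invented example dates, for illustration only.
timeline.accept_call("Example Conf", {
    "submission": date(2011, 6, 1),
    "acceptance": date(2011, 7, 15),
    "registration": date(2011, 8, 1),
})
next_up = timeline.upcoming(date(2011, 6, 15))
```

With machine-readable deadlines, a tool of this kind only needs the date and the deadline type; no text parsing is involved.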

Semantically enhanced CfPs can address two opposite sides of the CfP publication process:

• Researchers, for whom solutions for semantic search, filtering and recommendation of CfPs should be designed.

• CfP providers, who need semantically rich tools and workflows for the dissemination of CfPs. To provide the most effective and standardized way of data processing, such tools should be integrated into software for the organization of events.

Although in the context of developing the Timeliner service we are mainly interested in the first of these targets, in this paper we take the latter approach. We believe that generating standardized and semantically rich data will make the dissemination of CfPs, and their subsequent processing by Research 2.0 tools like Timeliner, more effective.

3. STATE OF THE ART

Compared to other developments in the Research 2.0 domain, the practices of distributing CfPs have not changed significantly during the last decade. The majority of solutions for handling CfPs are not designed for automatic processing by computers. Almost all conference organizers publish their CfPs on the Web as plain HTML documents, sometimes accompanied by PDF files for printing purposes. In this section we briefly review the different forms of publishing CfPs that prevail in today's research community.

3.1 Mailing Lists and Blogs

The simplest ways to collect and advertise CfPs are mailing lists and blogs. Mailing lists are perhaps the oldest tool in digital scholarly communication. Lists like dbworld² and AISWorld³ are still very popular among researchers looking for CfPs. Unfortunately, mailing lists are not very efficient in terms of filtering data or semantic search; with the usual subscription, a typical subscriber has to filter all mails for relevant CfPs manually.

2 http://www.cs.wisc.edu/dbworld/
3 http://www.aisnet.org/AIS_Lists/publiclists.aspx

Although blogs are relatively modern Web tools and are usually seen as one of the building blocks of Web 2.0, they handle CfPs in a quite old-fashioned manner. In most cases, CfPs are published in blogs without any semantic mark-up. Most blog platforms enable subscription to RSS feeds, thereby offering an additional form of CfP dissemination. However, the ability of RSS to convey the semantic meaning of posts is limited by the RSS vocabulary. This vocabulary is built around publishing posts, and all metadata processing revolves around the ‘title’, ‘link’, and ‘description’ elements. Therefore, while RSS is considered a good tool for the distribution of news, pure RSS without any extension offers no way to pass CfP-related semantic data.
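The limitation of the core RSS item vocabulary can be illustrated with a short sketch. The feed below is invented for the example; the point is only that a consumer sees ‘title’, ‘link’ and ‘description’, so a deadline is available solely as free text inside the description.

```python
import xml.etree.ElementTree as ET

# A made-up feed item; the deadline exists only inside free text.
RSS = """<rss version="2.0"><channel>
  <title>Example CfP blog</title>
  <item>
    <title>CfP: Example Workshop</title>
    <link>http://example.org/cfp/1</link>
    <description>Submissions are due on 1 June.</description>
  </item>
</channel></rss>"""

item = ET.fromstring(RSS).find("channel/item")

# Map each child element of the item to its text content.
fields = {child.tag: child.text for child in item}
```

Nothing in `fields` identifies the deadline as a date; a client would have to parse the description text to recover it.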

3.2 Web Based Services

Web-based services form the next group of services that deal with CfP data. As opposed to mailing lists, these services try to implement a semantic approach to processing CfP data, though these efforts are very limited. There are several Web collections of CfPs; one of the most popular in this category seems to be WikiCFP. Started in 2007, this site currently serves a collection of more than 15,000 CfPs, which are managed in the traditional wiki manner: the information is entered and maintained manually by registered users. The CfP submission form has a minimal set of fields. WikiCFP allows one to describe three types of calls and one all-purpose ‘others’ type, and provides fields for the name of the event, a textual description of the location and dates of the event, and four fields for specific types of deadline. Users can also provide a Web link to the event and four categories for the CfP, but all other information that does not fit the proposed vocabulary goes into one general ‘Call For Papers’ field.

Calling WikiCFP a ‘Semantic Wiki’ can be considered misleading. In addition to the absence of hyperlinking, which is considered an essential wiki mechanism [6], the service provides very limited processing of semantic data. Currently WikiCFP offers three types of output: plain HTML, which has no accompanying semantic data at all; a standard RSS feed for specific categories, which provides the names of CfPs and links to the WikiCFP pages containing them; and, upon request, access to CfP data from past years for analysis purposes. These data are available in an XML format based on the same limited vocabulary described above.

Another good example of a CfP Web service is eventseer.net. Started in 1999 and today holding a collection of around 16,000 events, this service differs from WikiCFP in the way it collects data. Instead of relying on a form with specific fields, eventseer.net expects users to send their CfP data in plain-text emails. The service parses these emails and tries to extract semantic data. As can be seen in its CfP lists, eventseer.net mostly works with deadline dates and scientific domain topics, allowing CfPs to be sorted across domains. In terms of data output, eventseer.net differs little from WikiCFP; it uses HTML without semantic mark-up. One original and useful feature of eventseer.net is the possibility of importing CfP deadlines into Google Calendar. The user's Google Calendar subscribes to the eventseer.net iCal feed, which includes all the eventseer.net feeds the user is subscribed to; these cannot easily be filtered separately.
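To make the filtering problem concrete, the following sketch shows how a calendar feed could be restricted to one topic if each deadline carried topic tags. It is not the eventseer.net feed format: the record layout, identifiers and dates are invented, and only the iCalendar property names (UID, DTSTAMP, DTSTART, SUMMARY, CATEGORIES) come from RFC 5545. Text escaping and line folding are omitted for brevity.

```python
from datetime import date, datetime, timezone


def deadline_vevent(uid, summary, due, topics, stamp):
    """Render one all-day deadline as RFC 5545 VEVENT content lines."""
    return [
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp:%Y%m%dT%H%M%SZ}",
        f"DTSTART;VALUE=DATE:{due:%Y%m%d}",
        f"SUMMARY:{summary}",
        f"CATEGORIES:{','.join(topics)}",
        "END:VEVENT",
    ]


def to_calendar(deadlines, topic, stamp):
    """Build a calendar that contains only the deadlines tagged with `topic`."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//cfp-sketch//EN"]
    for d in deadlines:
        if topic in d["topics"]:
            lines += deadline_vevent(d["uid"], d["summary"], d["due"], d["topics"], stamp)
    lines.append("END:VCALENDAR")
    # iCalendar content lines are separated by CRLF.
    return "\r\n".join(lines) + "\r\n"


# Invented deadlines, for illustration only.
DEADLINES = [
    {"uid": "cfp-1@example.org", "summary": "Example Workshop: submission",
     "due": date(2011, 6, 1), "topics": ["e-learning"]},
    {"uid": "cfp-2@example.org", "summary": "Other Conf: submission",
     "due": date(2011, 7, 1), "topics": ["databases"]},
]
ics = to_calendar(DEADLINES, "e-learning",
                  datetime(2011, 5, 1, 12, 0, tzinfo=timezone.utc))
```

The filter works only because the topic is stored as structured data next to the date, which is the kind of information a CfP vocabulary would need to carry.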

There are some other CfP-collecting web services, e.g. Conference Alerts⁴, PapersInvited⁵ and AllConferences.Com⁶. In general, they are based on the same approach as the two solutions described above and provide very similar sets of functionalities.

3.3 Adding Semantics to HTML

One way for data providers to pass on semantic data is to embed the semantics explicitly into web pages, which are subsequently parsed by web crawlers. Although adding rich semantic data to HTML pages is a very popular topic today, many challenges remain, and embedding semantics into CfPs presented in HTML form has no simple solution yet. Currently there are three well-known technologies that support adding semantics to HTML documents: microformats, RDFa, and microdata. However, each of these solutions has its own limitations.

3.3.1 Microformats

Microformats are the most widely adopted and popular format for embedding semantic data in HTML documents. Their most visible limitations are the limited number of supported vocabularies, the inability to mix microformat vocabularies with any other vocabulary, and a very restricted set of domains (modeling people, events, etc.). Each candidate microformat vocabulary must be accepted by the microformats community, and in practice a long process of discussion takes place before a new microformat is accepted. The developers of microformats are conservative about accepting new ones; they try to keep the set of microformats as compact as possible, proposing instead to build compound microformats by reusing existing ones.

CfPs can sometimes be very short and simple, but sometimes they have a complex structure containing different types of data. In such cases, combining existing microformats is probably a hard challenge, and a specific CfP microformat would be very welcome. The idea of creating a CfP microformat was first proposed by McCracken on the microformats discussion board in 2006⁷. The microformats community was not very enthusiastic about this proposal. As can be observed on the microformats website, the interests of the community concentrate on more general and popular formats. The number of people who need a CfP microformat is very small relative to the number of people interested in other microformats. Today, five years after the initial proposal for a CfP microformat, we have a draft version of the new hRecipe microformat, but will probably never have hCfP.

3.3.2 RDFa

In contrast to microformats, the RDFa specification allows the use of any third-party vocabulary and offers a truly semantic approach without strict limitations. As RDFa was initially designed for use in conjunction with the XHTML2 syntax, whose development was stopped by W3C in 2009, interest in RDFa decreased over the following year, and the focus of web developers moved to the new HTML5 syntax. However, because the technology is supported by several content providers like Google, BBC⁸, and IPTC⁹, RDFa got a second wind last year. The most visible indicator of this is the publication in April 2011 of the W3C Working Draft ‘Support for RDFa in HTML4 and HTML5’¹⁰. Considering these circumstances, we can expect growth in RDFa use in the coming years.

4 http://conferencealerts.com
5 http://papersinvited.com/
6 http://www.allconferences.com
7 http://www.mail-archive.com/microformats-discuss@microformats.org/msg02224.html

3.3.3 Microdata

Microdata is a new player in the field of semantic data representation, designed for the new HTML5 syntax. The first draft of HTML5 appeared in 2008. Microdata, like RDFa, has no vocabulary limitations; the difference between them is the way they address vocabularies. While RDFa uses traditional XML namespaces, microdata uses the full URIs of vocabularies for this purpose [7]. The microdata specification proposes a native way of integrating metadata into HTML pages. However, no web browser supports microdata at the moment. In spite of that, Google supports microdata (as well as RDFa and microformats) in Google Rich Snippets. According to the adopted vocabulary¹¹, however, only limited types of data entities can be extracted for the search result page [3]. For CfPs, the most useful information that can be retrieved using this vocabulary relates to events and people, and the data fields for events and people are limited to a few very basic concepts. While this limited vocabulary is sufficient for simple event types like "Festival" or "Concert", without an appropriate extension it is too weak to support rich CfP data.
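As an illustration of how microdata attributes could carry CfP fields inside ordinary HTML, the sketch below marks up a call and reads the properties back with Python's standard `html.parser`. The `itemtype` URL and the property names `name` and `submissionDeadline` are invented for the example; they do not belong to any published vocabulary. The parser handles only a flat, non-nested item.

```python
from html.parser import HTMLParser

# Hypothetical markup: the itemtype URL and property names are invented.
HTML = """
<div itemscope itemtype="http://example.org/vocab/CallForPapers">
  <span itemprop="name">Example Workshop</span>
  <time itemprop="submissionDeadline" datetime="2011-06-01">1 June</time>
</div>
"""


class ItemProps(HTMLParser):
    """Collect itemprop values from a flat (non-nested) microdata item."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._pending = None  # itemprop still waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "time" and "datetime" in attrs:
            # For <time>, the machine-readable value is the datetime attribute.
            self.props[prop] = attrs["datetime"]
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.props[self._pending] = data.strip()
            self._pending = None


parser = ItemProps()
parser.feed(HTML)
```

The human-readable text ("1 June") and the machine-readable value ("2011-06-01") coexist in the same page, which is the property that makes embedded semantics attractive for CfP publishers.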

3.4 Parsing CfP Data

The first attempts at extracting CfP data were conducted as early as 1989 [8]. Similar studies were repeated by different researchers with varying degrees of extraction accuracy [2]. Almost all authors claimed that the extraction performance in their experiments was reasonable, but not optimal. In contrast to parsing, e.g., bibliographic data, which has standardised document layouts (like LaTeX style files), CfP extraction faces a diversity of data representation formats [12]. The absence of a unified, standardized vocabulary for CfPs, the often complex structure of multi-level CfPs, and the continuously extended types of data used in CfPs do not allow for comprehensive data extraction.
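The difficulty described above can be illustrated with a toy extractor. The sketch below is not one of the cited extraction systems; the sample sentences and patterns are invented. It shows that every additional way of writing a deadline needs its own pattern, and that an unanticipated format is silently missed.

```python
import re
from datetime import datetime

# Invented CfP fragments that state the same deadline in different ways.
SAMPLES = [
    "Submission deadline: 1 June 2011",
    "Papers due June 1, 2011 (extended)",
    "Deadline: 01.06.2011",
]

# Each pattern pairs a regex with the strptime format for what it captures.
# Month names are matched in English (the default C locale is assumed).
PATTERNS = [
    (re.compile(r"(\d{1,2} [A-Z][a-z]+ \d{4})"), "%d %B %Y"),
    (re.compile(r"([A-Z][a-z]+ \d{1,2}, \d{4})"), "%B %d, %Y"),
]


def extract_date(text):
    """Return the first date recognised by a known pattern, else None."""
    for regex, fmt in PATTERNS:
        match = regex.search(text)
        if match:
            try:
                return datetime.strptime(match.group(1), fmt).date()
            except ValueError:
                continue  # matched the shape but not a real date
    return None


results = [extract_date(s) for s in SAMPLES]
```

The third sample uses a numeric format that neither pattern covers, so the extractor returns `None` for it even though a human reader sees the same deadline.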

3.5 CfP Related Representation Frameworks

The most curious fact about CfPs is the long-standing existence of a multitude of vocabularies and ontologies that can be adapted for CfP description. Although some of these vocabularies were developed six years ago, very few widely used solutions make use of them. Instead of proposing ways of publishing semantically rich CfP data, researchers and developers have often proposed approaches for parsing collected plain data.

In this part we give an overview of current CfP representation frameworks, by which we mean ontologies and vocabularies. There are many vocabularies that support the semantic description of events, people, and places. Some of them consist only of general terms; others have a precise, domain-specific character. For this review we have selected only those vocabularies that clearly overlap with the concepts present in modern CfPs.

8 http://www.bbc.co.uk/blogs/radiolabs/2008/06/microformats_and_rdfa_and_rdf.shtml
9 http://dev.iptc.org/rNews
10 http://www.w3.org/TR/rdfa-in-html/
11 http://www.google.com/support/webmasters/bin/answer.py?answer=99170

CfPs are used for different purposes, e.g. for soliciting submissions to conferences, workshops, journals, etc. Often, CfPs are also intended to initiate the process of scientific writing or other types of activities. CfPs mention people with very different roles as well as different types of calendar dates. As a consequence, designing a versatile CfP vocabulary that can suit all possible requirements is a complex task.

The DERI¹² researchers who designed a very compact CfP vocabulary¹³ in 2006 possibly had this complexity in mind. In spite of its ‘Call for Papers’ name, the vocabulary also describes other kinds of calls, represented by six classes: CallForExhibits, CallForParticipation, CallForDemos, CallForPapers, CallForVideos, and CallForReports. The only available properties are ‘deadline’, ‘for’ (the event the call is for), and ‘details’ (a free description in plain or HTML text). Assuming that such a small vocabulary would not suit all potential needs, its designers at DERI proposed using it in combination with other vocabularies, for example with the LODE¹⁴ ontology.
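The shape of such a compact vocabulary can be sketched with plain subject-predicate-object tuples. The class and property names below follow the description above, but the `cfp:` and `ex:` prefixes, the `rdf:type` shorthand and all instance data are placeholders chosen for this example, not the vocabulary's actual URIs.

```python
# Prefixed names stand in for full URIs; "cfp:" and "ex:" are placeholders.
CALL_TYPES = {
    "cfp:CallForExhibits",
    "cfp:CallForParticipation",
    "cfp:CallForDemos",
    "cfp:CallForPapers",
    "cfp:CallForVideos",
    "cfp:CallForReports",
}


def make_call(call_id, call_type, event, deadline, details):
    """Describe one call with the three available properties."""
    if call_type not in CALL_TYPES:
        raise ValueError(f"unknown call type: {call_type}")
    return [
        (call_id, "rdf:type", call_type),
        (call_id, "cfp:for", event),
        (call_id, "cfp:deadline", deadline),
        (call_id, "cfp:details", details),
    ]


# Invented instance data, for illustration only.
triples = make_call(
    "ex:call1",
    "cfp:CallForPapers",
    "ex:ExampleWorkshop2011",
    "2011-06-01",
    "Topics include ontologies and collaborative writing.",
)

# Topics, page limits or submission instructions have no property of their
# own; they can only be placed in the free-text 'details' value.
details = [o for s, p, o in triples if p == "cfp:details"]
```

The sketch makes the limitation visible: anything beyond the event and a single deadline ends up in unstructured text.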

The LODE (Linking Open Descriptions of Events) ontology was designed through comparison and mapping of several existing event ontologies [11] to enable interoperable modelling of the “factual” aspects of events, such as what happened, where it happened, when it happened, and who was involved.

For CfPs intended specifically for conferences, the ESWC2006 Conference Ontology¹⁵ can be considered relevant. It has a well-developed vocabulary which, in addition to descriptions of CfPs, contains terms for many types of conference sub-events, involved participants, places, times, and other concepts widely used in modern CfPs.

The AKT Portal ontology¹⁶ offers a more general vocabulary that describes people, projects, publications, geographical data, etc.

The SWRC¹⁷ (Semantic Web for Research Communities) ontology offers a general vocabulary for research communities, including concepts such as persons, organizations, publications (bibliographic metadata) and their relationships. It consists of six top-level concepts: Person, Publication, Event, Organization, Topic and Project [13].

Designed at DERI, the Semantic Web Portal (SWPortal) ontology¹⁸ has concepts very similar to those of SWRC. For example, Person, Publication, Conference and Meeting exist in both vocabularies. However, SWPortal uses a different design, and is therefore also selected for examination in this paper.

Although bibliographical data are not highly important (but still useful) for CfPs, they are more critical for conference and journal management systems. To connect events with bibliographic data, DERI’s SWPortal and the SwetoDblp¹⁹ ontologies can be adapted. The SwetoDblp ontology connects bibliography concepts such as Universities, Publishers, Series, and so on. It focuses on the bibliography data of publications in the Computer Science domain and uses FOAF, DC and OPUS as its vocabularies. The SwetoDblp ontology is created through SAX parsing of a large XML document from the DBLP Computer Science Bibliography indexing website²⁰ [1].

12 http://www.deri.ie/
13 http://sw.deri.org/2005/08/conf/
14 http://linkedevents.org/ontology/
15 http://www.eswc2006.org/technologies/ontology-content/2006-09-21.html
16 http://www.aktors.org/ontology/portal
17 http://ontoware.org/swrc/
18 http://sw-portal.deri.org/ontologies/swportal.html

The last ontology we present here is SEDE, the Scholarly Event Description Ontology²¹. While the authors of SEDE acknowledge that they ‘may not have included every concept that might be useful’ [5], the ontology includes the majority of concepts related to scholarly events, including CfPs.

In the following section, we examine the CfP-related representation frameworks, looking for viable candidates to be combined and extended in order to allow for automated retrieval and processing of CfP data. In Section 5, we take a sample of contemporary conference CfPs to build a comprehensive set of data representation requirements. Finally, in Section 6, we map and examine the set of currently existing vocabularies.

4. ANALYSIS OF CFP RELATED REPRESENTATION FRAMEWORKS

In this section, we further explore the use of ontologies and defined vocabularies for representing CfP-related data. First, we perform a qualitative assessment of each representation framework introduced in Section 3.5. The assessment criteria are overall CfP data representation ability, comprehensiveness, and scalability, that is, the ability to be used in a broad range of usage scenarios.

As said before, the CFP vocabulary is rather limited. It has three first-level classes: Call, Person (from FOAF) and vEvent (from iCalendar). Call is the superclass for six specific call targets, such as papers (CallForPapers) and demonstrations (CallForDemos). All calls can have the deadline and details properties, as well as an associated event (vEvent), which does not allow any CfP-specific data, such as the field of interest, to be represented. Also, the details property allows only a textual description and hence does not facilitate the envisioned customization of a CfP feed. Although the CFP vocabulary is adequate for representing simple CfP data, it does not go beyond the current, traditional CfP documentation techniques, as it does not provide an adequate level of detail and scope. It is, however, flexible in the sense that it can represent events with multiple calls.

Moving on to LODE, one understands that it was designed for publishing descriptions of historical events as linked data, and for mapping between other event-related vocabularies and ontologies [11]. Although powerful (and necessarily flexible) when addressing historical events, LODE fails to provide the expressiveness to adequately describe a complex event that will take place in the near future.

On the other hand, the 2006 European Semantic Web Conference ontology (ESWC2006) provides adequate CfP data representation ability as well as comprehensiveness and scalability: it adequately covers CfP data, allowing for the representation of events with multiple calls and for the usual submission, notification and delivery deadlines. The ESWC2006 ontology also has the advantage of facilitating a rich description of the event associated with a specific call or a specific set of calls. CfP-wise, its major drawbacks are the use of a generic textual property for representing submission instructions and the inability to differentiate subject and topic lists per call.

19 http://lsdis.cs.uga.edu/projects/semdis/swetodblp/
20 http://www.informatik.uni-trier.de/~ley/db/
21 http://eventography.org/sede/

The AKT Portal ontology has a rich classification of academic staff that can be used to complement specific CfP-related ontologies. However, its scope does not really fit our need to represent CfP data in a machine-understandable way, so it fails to fulfil our first assessment criterion.

The SWRC ontology is designed for modelling entities of research communities such as persons, organisations, publications and their relationships. Although it also fails to meet our criteria for an overall representation of CfPs, it could still be useful: by expanding another vocabulary, like the ESWC2006 ontology, with SWRC, we would enable an interesting perspective on the tapestry of scientific events and CfPs, highlighting hidden or implicit relations between scientific events.

The SWPortal ontology acts as the skeleton for a Semantic Web community portal – a software platform for the exchange among people working in a common scientific area. Hence, the ontology models concepts within domains such as information exchange and cooperative research, but not CfP related data. Although some of the information represented by this ontology potentially relates to CfP, it is also directly available in more CfP-oriented ontologies.

Like many of the previously assessed ontologies, SwetoDblp utilizes concepts and relationships from FOAF and Dublin Core. It is, however, only marginally related to CfPs, as it is dedicated to the bibliography data of Computer Science publications. Although apparently unrelated, this ontology may well provide the necessary representation framework for describing an event’s proceedings or article collection editors and database indexing schemes, a kind of information missing from the previously analyzed CfP-oriented ontologies, such as ESWC2006.

Finally, the SEDE ontology offers very comprehensive conference (event) description concepts, which might prove relevant and valid (as shown in Section 6), such as program, committee, series, place, acceptance rate, and proceedings. It also offers a call description mechanism that not only accounts for common deadline information, including early registration dates, but also facilitates the description of a general knowledge domain and topic lists that can be associated with conference tracks and sessions.

Based on the previous discussion, the best candidates for answering the question driving this paper are: FOAF, DC and other common concept ontologies as a foundation; then the ESWC2006 and SEDE ontologies, complemented with the SWRC ontology for eliciting inter-CfP relationships; and possibly small parts of SwetoDblp.

This picture will become clearer in the next section (Section 5), where we analyse existing CfP practices, and in Section 6, where we put together the ontology foundation that allows for automated retrieval and processing of CfP data by various Semantic Web applications.

5. ANALYSIS OF EXISTING CFP PRACTICES

To analyze the concepts used in contemporary CfPs, we selected a sample of 12 CfPs for conferences and journals in the domain of Technology-Enhanced Learning. This sample was collected from WikiCFP and compared with the original versions available on the web sites of the respective conferences or journals. In the sample, 7 CfPs originated from conferences and 5 CfPs from journal special issues. We started from the two conferences most familiar to us, and then picked additional ones from the WikiCFP list in the order of their appearance in the time-sorted list. We stopped adding new CfPs to the sample when three consecutive additions did not provide any new concepts. One might question the representativeness of our CfP sample, because it was drawn from just one narrow domain. We therefore compared our sample with randomly picked CfPs from other domains (psychology, sociology, medical research and biology), which showed no significant differences with regard to the data structure. On this basis we assume that our sample is reasonably representative of CfPs in other domains as well.
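The stopping rule used for building the sample can be sketched as follows. This is only an illustration of the rule as described, not the procedure we actually ran; the function name `sample_until_saturated`, the `patience` parameter and the toy concept sets are assumptions made for the example.

```python
from typing import Iterable, List, Set, Tuple


def sample_until_saturated(
    cfps: Iterable[Tuple[str, Set[str]]], patience: int = 3
) -> Tuple[List[str], Set[str]]:
    """Add CfPs in order until `patience` consecutive ones add no new concept."""
    sample: List[str] = []
    seen: Set[str] = set()
    barren_run = 0  # consecutive CfPs that contributed nothing new
    for name, concepts in cfps:
        new = concepts - seen
        sample.append(name)
        seen |= concepts
        barren_run = 0 if new else barren_run + 1
        if barren_run >= patience:
            break
    return sample, seen


# Toy data (hypothetical): each CfP is paired with the concepts it mentions.
toy_cfps = [
    ("cfp-a", {"call name", "deadline", "topics"}),
    ("cfp-b", {"call name", "location"}),
    ("cfp-c", {"deadline"}),
    ("cfp-d", {"topics", "location"}),
    ("cfp-e", {"call name"}),
    ("cfp-f", {"keynote speaker"}),  # never reached: sampling stops at cfp-e
]
sample, concepts = sample_until_saturated(toy_cfps)
```

In this toy run the last three sampled CfPs add nothing new, so sampling stops before the sixth one is inspected.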

The CfPs for regular journal issues were not considered in this paper because their variety, heterogeneity and, in many cases, absence of deadlines could radically increase the complexity of our CfP ontology. Most scientific journals accept submissions at any time.

The first finding from our analysis was the inconsistency of the data. In several cases there was a clear difference between the original CfPs on web pages and their counterparts in WikiCFP. Sometimes only a very limited version of a CfP was presented on the conference web page, whereas in WikiCFP the CfP for the same event contained a lot of additional data. This observation led us to assume that the WikiCFP web form (based on the WikiCFP vocabulary) scaffolds event organizers to provide more information about their event, and in a more standardised manner. By filling in a standard form, organizers appear to create more complete CfPs than when they write CfPs in free form. This assumption can serve as an additional argument for the adoption of a CfP standard.

Further analysis of the terminology used in CfPs revealed some groups of terms that are de facto used by most CfP writers. We found that all concepts used can be categorised into six groups: Events, Places, Submissions, Publications, Dates, and People. The distribution of terms between these groups indicated another tendency: there is an evident difference between the vocabularies used in CfPs for events (e.g., conferences) and in CfPs for special issues of journals. For example, the “places” concept was almost never used in CfPs for special journal issues. Furthermore, compared to CfPs for events, CfPs for special issues use different types of people (e.g. guest editor vs. keynote speaker), do not differentiate types of submissions (e.g. papers, posters, workshops) and have a very weak call description structure.

By examining CfPs we tried to find the terms most frequently used for naming specific concepts. We expected that this knowledge would allow us to find appropriate terms in existing vocabularies. We found that only a very small number of concepts are used in every CfP in our sample. More precisely, we found only three such concepts: call name, event description (written in free form), and deadline dates. A fourth concept, the description of conference topics, is used in all but two CfPs. These four concepts can be considered a basic, universal set that fits almost ideally into a basic CfP vocabulary. However, the amount of semantic data in this set is still not impressive.

There is also a solid set of concepts used in CfPs related to events such as conferences. Almost all such CfPs use papers as a type of submission and the term full paper as a type of paper, along with the description and location of the event and the dates for submission, notification and final submission. In turn, all CfPs for journals use a URL link to writing guidelines.

Sometimes the same concept is named differently in different CfPs, for example: full paper / long paper; topics / scope / subject coverage; camera-ready / final papers / final manuscript; conference registration / authors registration, and so on.
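Such naming variants can be reconciled with a simple lookup table before terms are mapped to ontology concepts. The sketch below uses only the variants listed above; which variant is treated as the canonical label, and the names `SYNONYMS` and `normalise`, are assumptions of this example.

```python
from typing import Dict, List

# Canonical label (key) and the variants observed in the sample (values).
# Treating the first variant as canonical is an arbitrary choice for this sketch.
SYNONYMS: Dict[str, List[str]] = {
    "full paper": ["full paper", "long paper"],
    "topics": ["topics", "scope", "subject coverage"],
    "camera-ready": ["camera-ready", "final papers", "final manuscript"],
    "conference registration": ["conference registration", "authors registration"],
}

# Inverted table: every variant points to its canonical label.
CANONICAL: Dict[str, str] = {
    variant: canon for canon, variants in SYNONYMS.items() for variant in variants
}


def normalise(term: str) -> str:
    """Map a term found in a CfP to its canonical label, if one is known."""
    key = " ".join(term.lower().split())  # lower-case and collapse whitespace
    return CANONICAL.get(key, key)
```

Terms without a known variant, such as "workshop", are returned unchanged apart from case and whitespace normalisation.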

The existing CfPs use diverse vocabularies for describing events. Among the terms used are: conference, poster session, symposia, interactive events, demonstrations, showcases, workshops, tutorials, doctoral consortium and others.

We can conclude that today there is a versatile but well-established vocabulary used by the organizers of events when writing CfPs. In spite of the variety of terms used, adapting the existing vocabularies (discussed in sections 4 and 5) to this de facto vocabulary seems an achievable task.

6. DESIGNING THE CALL ONTOLOGY

After eliciting the popular terms used in modern CfPs and analysing CfP-related representation frameworks, we tried to find out which ontologies and vocabularies can provide a formal representation of these terms for the purpose of building semantically rich CfPs.

Because of the size of the picture illustrating the resulting data model – our Call ontology22 – we have split it into five separate pictures; each picture illustrates one logical group of concepts, namely: events, submissions, dates, publications, and persons.

Figure 2 presents the Events concepts of the resulting Call ontology.

Figure 2. Call ontology: concepts for Events23

22 Available now from http://dl.dropbox.com/u/29012918/call-for-papers-ontology.rdf and from http://timeliner.net/ontologies/call/call.rdf in the near future

23 The call prefix stands for the namespace of the Call ontology; eswc stands for the ESWC2006 ontology; the cfp prefix replaces the namespace of DERI’s CfP ontology; finally, dc stands for the Dublin Core vocabulary

Following best practices in ontology development, we tried to reuse existing ontologies and vocabularies as much as possible. However, choosing the right vocabulary is not a simple task; for example, the Call concept exists in DERI’s CfP ontology as well as in the ESWC2006 and SEDE ontologies, but it is defined in different contexts. We found the CfP ontology’s definition of the Call concept to be the most flexible and thus the most suitable for reuse and adaptation to our modelling needs. As mentioned before, the CfP ontology is designed in a very general manner; this allows almost any required structure to be built on top of it while preserving backward compatibility. Accordingly, we chose DERI’s CfP ontology as the foundation of our Call ontology, and defined the call:Call class in our ontology as a subclass of the cfp:Call class.
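The practical effect of declaring call:Call a subclass of cfp:Call is that any application that understands cfp:Call can also treat instances of call:Call as calls. A minimal sketch of this entailment is given below. Triples are written as plain tuples of prefixed names, the namespaces are left unexpanded, and the instance ex:myCfp as well as the helper names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

triples: Set[Triple] = {
    ("call:Call", "rdfs:subClassOf", "cfp:Call"),  # stated in the Call ontology
    ("ex:myCfp", "rdf:type", "call:Call"),         # ex:myCfp is a made-up instance
}


def superclasses(cls: str, graph: Set[Triple]) -> Set[str]:
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def is_instance_of(resource: str, cls: str, graph: Set[Triple]) -> bool:
    """True if resource is typed with cls or with any subclass of cls."""
    return any(
        s == resource and p == "rdf:type" and cls in superclasses(o, graph)
        for s, p, o in graph
    )
```

With these two triples, ex:myCfp is recognised as a cfp:Call although it was only ever typed as call:Call.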

As with the Call concept, we examined the other CfP concepts that are used in practice. For the concepts that have unambiguous analogues in the examined ontologies, we reused the existing ontologies; for the rest, we introduced new concepts in our Call ontology. The ultimate objective of the Call ontology is to ensure the overall logical consistency of CfP terms.

Figure 3 illustrates the Call ontology’s concepts for Submissions.

Figure 3. Call ontology: concepts for Submissions

We have introduced the call:Submission class to represent different kinds of submissions that a CfP might call for. The call:forSubmission property relates a specific CfP (i.e., an instance of call:Call) with one or more kinds of submissions (i.e., instances of the call:Submission) that it calls for. To model the kinds of submissions announced in a specific CfP, we have introduced a number of sub-properties of the call:forSubmission property (shown on Figure 3), each one corresponding to one type of submission that we identified in contemporary CfPs.
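The sub-property pattern means that a query for call:forSubmission also returns values stated with any of its more specific sub-properties. The sketch below illustrates this with plain tuples. The sub-property names call:forFullPaper and call:forPoster, the ex: resources and the helper `values` are hypothetical placeholders, since the actual names appear only in Figure 3.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sub-properties of call:forSubmission (names assumed for the sketch).
SUB_PROPERTY_OF: Dict[str, str] = {
    "call:forFullPaper": "call:forSubmission",
    "call:forPoster": "call:forSubmission",
}

triples: List[Triple] = [
    ("ex:myCfp", "call:forFullPaper", "ex:fullPaper"),
    ("ex:myCfp", "call:forPoster", "ex:poster"),
    ("ex:myCfp", "dc:title", "Example CfP"),
]


def values(subject: str, prop: str, graph: List[Triple]) -> List[str]:
    """Objects linked to subject by prop or by a direct sub-property of prop."""
    return sorted(
        o
        for s, p, o in graph
        if s == subject and (p == prop or SUB_PROPERTY_OF.get(p) == prop)
    )
```

Asking for the general property returns both kinds of submission, while asking for a specific sub-property returns only that kind.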

Figure 4. Call ontology: concepts for Dates24

24 The time prefix stands for the namespace of the W3C Time ontology

To assign formal semantics to time related terms (Figure 4), we used the W3C Time Ontology25, as the most basic standard way for describing calendar data. We found no vocabulary or ontology providing concepts that could be used for the purpose of representing different kinds of deadlines typical for CfPs. Therefore, in the Call ontology, we introduced a set of properties for that purpose. Specifically, as Figure 4 indicates, different kinds of deadlines are modelled as sub-properties of the general call:hasDeadline property. This way, we can associate a specific CfP (i.e., an instance of call:Call as its formal representation) with one or more time:DateTimeDescription instances to represent different kinds of deadlines published in the CfP.
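An application consuming such data would typically turn the deadline descriptions into calendar dates, for instance to find the next deadline of a call. The following sketch assumes that each deadline is described by the year, month and day fields of a time:DateTimeDescription. The three sub-property names and all dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DateTimeDescription:
    """Mirrors the year/month/day fields of time:DateTimeDescription."""
    year: int
    month: int
    day: int

    def to_date(self) -> date:
        return date(self.year, self.month, self.day)


# Keys are hypothetical sub-properties of call:hasDeadline; dates are made up.
deadlines: Dict[str, DateTimeDescription] = {
    "call:hasSubmissionDeadline": DateTimeDescription(2011, 5, 15),
    "call:hasNotificationDeadline": DateTimeDescription(2011, 6, 30),
    "call:hasCameraReadyDeadline": DateTimeDescription(2011, 7, 20),
}


def next_deadline(
    deadlines: Dict[str, DateTimeDescription], today: date
) -> Optional[Tuple[str, date]]:
    """Return the earliest deadline on or after `today`, or None if all passed."""
    upcoming = [
        (d.to_date(), prop) for prop, d in deadlines.items() if d.to_date() >= today
    ]
    if not upcoming:
        return None
    when, prop = min(upcoming)
    return prop, when
```

Because every deadline is attached through a sub-property of call:hasDeadline, the application knows not only when the next deadline falls but also what kind of deadline it is.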

Figure 5. Call ontology: concepts for Publications26

The part of the Call ontology covering publications seems to be the smallest part of our solution. This is due to the fact that publication related data usually have very scarce description in CfPs for conferences. On the other hand, in CfPs for journals, the situation is different. This can be covered by combining our Call ontology with specialized ontologies such as Bibo27 (Bibliographic Ontology) and BibTeX ontologies.

The easiest task was to find good equivalents for terms related to persons and organizations mentioned in CfPs. The popular FOAF ontology seemed to be a valid choice because of its versatility and wide use for describing people.

Figure 6. Call ontology: concepts for People and Organizations

The presented approach illustrates how the proposed Call ontology fills the identified shortage of concepts in CfP related representation frameworks.

25 http://www.w3.org/TR/owl-time/

26 The bibo prefix stands for the Bibo ontology

27 http://bibliontology.com/specification

7. USAGE SCENARIOS

We propose two specific use case scenarios for the application of the introduced Call ontology. The first is related to Timeliner, a tool for supporting collaborative scientific writing; the second use case is related to a novel conference management system, ginkgo28.

Use case 1: Timeliner. The workflow in Timeliner starts when the user subscribes to a personalized CfP feed. This feed can integrate data from different sources, but its accuracy can be significantly improved when semantically annotated data is used, which is where the proposed Call ontology becomes important. In particular, we plan to design a special-purpose semantic repository that would use the Call ontology to collect the data generated by modern conference management systems (CMS). Using such personalized CfP data, users can import the relevant information into Timeliner to start the process of collaborative writing.
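How a personalized feed could be derived from structured CfP data is sketched below. This is not Timeliner's implementation. The `Call` record, the `personalised_feed` function and the sample calls are hypothetical and only show that topic and deadline filtering becomes straightforward once these fields are available in structured form.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Call:
    """Simplified, hypothetical view of one CfP."""
    title: str
    topics: FrozenSet[str]
    submission_deadline: date


def personalised_feed(
    calls: Iterable[Call], interests: Iterable[str], today: date
) -> List[Call]:
    """Keep open calls sharing at least one topic with the user, soonest first."""
    wanted = {t.lower() for t in interests}
    matches = [
        c
        for c in calls
        if c.submission_deadline >= today and wanted & {t.lower() for t in c.topics}
    ]
    return sorted(matches, key=lambda c: c.submission_deadline)


# Made-up example data.
calls = [
    Call("Workshop A", frozenset({"Ontologies", "Research 2.0"}), date(2011, 9, 1)),
    Call("Conference B", frozenset({"Biology"}), date(2011, 8, 1)),
    Call("Special Issue C", frozenset({"ontologies"}), date(2011, 7, 1)),
    Call("Conference D", frozenset({"Research 2.0"}), date(2011, 3, 1)),  # closed
]
feed = personalised_feed(calls, ["ontologies", "research 2.0"], date(2011, 6, 1))
```

The feed contains Special Issue C followed by Workshop A; Conference B does not match the interests and Conference D is already past its deadline.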

Use case 2: ginkgo. ginkgo is an innovative approach to the management of scientific events that combines classic features of conference management systems with well-known features of social networking sites [10]. Most of the currently available conference management systems support only a small portion of the life cycle of scientific events: the preparation phase, in which the organizers set up the tool, define deadlines and accepted submission formats, and invite members of the program committee; the submission phase, in which authors submit their proposals to the event; and finally, the review and selection phase, in which the members of the program committee review, rate and accept or reject submissions. Some conference management systems support the organizers in planning the technical event program, and only a small proportion of tools allow participants to register online for the event. So far there are no dedicated systems that support researchers in finding suitable events they might be interested in attending or in selecting appropriate talks during an event they are attending. As a consequence of the upturn of Social Media services in recent years, many researchers have relied on such services to network with the community that forms around each scientific event. The (online) network that is built during a scientific event often strongly influences researchers’ further work and the events they attend in the future. ginkgo is the first system for scientific event management that directly supports such Research 2.0 [16] practices together with full support for all the phases and roles involved in the organization of such events [10]. For example, ginkgo adopts the following/follower pattern known from popular Web 2.0 applications and applies it to researchers and scientific events. Moreover, the system uses the interface design pattern of Activity Streams to support the user’s awareness of ongoing activities.

28 http://ginkgosem.com

If the existing conference management systems support the publication of Calls for Papers at all, they do it in a free and unstructured form. Most of the time organizers are only given the possibility to upload the CfP in PDF format. As one of ginkgo’s main goals is to enhance the awareness of scholars in their networks, broad but properly targeted dissemination of an event’s CfP can not only increase the chances of receiving many submissions but also enhance the research network’s awareness of the given event. The provision of semantically enriched CfP data also offers numerous possibilities for (re-)using CfP data in ginkgo and other emerging Research 2.0 applications, thus meeting their existing requirements and offering opportunities to address novel, still unforeseen needs. Currently ginkgo supports the creation of CfPs in Markdown syntax [17] and does not use any fixed vocabulary for it. In the future, ginkgo could apply the Call ontology presented here together with an easy-to-use editor that supports organizers in the creation of semantically enriched CfPs. ginkgo would provide a SPARQL endpoint where crawlers could access all available CfPs of events managed with the system.
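A crawler could retrieve calls and their deadlines from such an endpoint with a query along the following lines. The sketch only builds the request and does not send it. The endpoint URL and the namespace URI are placeholders, since neither is fixed in this paper; only the terms call:Call and call:hasDeadline come from the Call ontology. Note that deadlines stated through sub-properties of call:hasDeadline would be matched only if the endpoint applies RDFS inference.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # placeholder, not a real ginkgo URL
CALL_NS = "http://example.org/call#"    # placeholder for the Call namespace

# Select every call together with its deadline descriptions.
QUERY = f"""
PREFIX call: <{CALL_NS}>
SELECT ?cfp ?deadline
WHERE {{
  ?cfp a call:Call ;
       call:hasDeadline ?deadline .
}}
"""


def request_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET request URL carrying the query."""
    return endpoint + "?" + urlencode({"query": query})


url = request_url(ENDPOINT, QUERY)
```

Passing the query in the `query` parameter of a GET request follows the standard SPARQL Protocol, so the same request shape would work against any conforming endpoint.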

8. CONCLUSIONS AND FUTURE RESEARCH

Considering the wide choice of available vocabularies, their practical adaptation to support CfP publishing services seems an obvious task. To support CfP providers with integrated solutions, the existing conference management systems should be examined with a view to service integration, and new lightweight and easy-to-use conference management systems should be developed.

As a future challenge, we see the design of a service capable of aggregating and making use of semantically rich CfP data generated by conference management systems. This service should: 1) consume advertised CfPs, and 2) provide structured and personalized access to such data for researchers.

9. REFERENCES

[1] Aleman-Meza, B., Hakimpour, F., Arpinar, I.B., and Sheth, A.P. SwetoDblp ontology of Computer Science publications. Web Semantics: Science, Services and Agents on the World Wide Web, 5 (2007), 151-155.

[2] Brennhaug, K.E.: EventSeer: Testing Different Approaches to Topical Crawling for Call for Paper Announcements (2005) http://ntnu.diva-portal.org/smash/get/diva2:348108/FULLTEXT01

[3] Hop, W., Lachner, S., Frasincar, F., De Virgilio, R., Automatic Web Page Annotation with Google Rich Snippets. (2010), Springer Berlin / Heidelberg, 957-974.

[4] Hänse, M., Kan, M.Y., and Karduck, A., Kairos: Proactive Harvesting of Research Paper Metadata from Scientific Conference Web Sites (2010), 226-235.

[5] Jeong, S., Kim, H.-G., SEDE: An ontology for scholarly event description. Journal of Information Science, 36 (2010), 209-227.

[6] Kalb, H., Bukvova, H., Schoop, E. (2009). "The Digital Researcher: Exploring the Use of Social Software in the Research Process,". Sprouts: Working Papers on Information Systems, 9(34). http://sprouts.aisnet.org/9-34

[7] Lange, C. Integrated Semantic Web Collaboration on Semiformal Mathematical Knowledge. Jacobs University Bremen – School of Engineering and Science, Bremen, 2010.

[8] F. Lazarinis, “Combining Information Retrieval with Information Extraction for Efficient Retrieval of Calls for Papers”, In Proc. of IRSG’1998, 1998

[9] Manh, C. P., Cao, Y., and Klamma, R. Clustering Technique for Collaborative Filtering and the Application to Venue Recommendation. In Proceedings of I-KNOW 2010 (Graz 2010), J.UCS – Journal of Universal Computer Science, 343-354.

[10] Reinhardt, W., Maicher, J., Drachsler, H, and Sloep, P.: ginkgo. Awareness Support in Scientific Event Management. Submitted to the Special Track on Recommendation, Data Sharing, and Research Practices in Science 2.0 at the 11th International Conference on Knowledge Management and Knowledge Technologies (i-KNOW 2011), 2011.

[11] Shaw, R., Troncy, R., & Hardman, L., (2009). LODE: Linking Open Descriptions of Events. UC Berkeley: School of Information. Report 2009-036. Retrieved from: http://escholarship.org/uc/item/4pd6b5mh

[12] Schneider, K.Michael., Information extraction from calls for papers with conditional random fields and layout features. Artificial Intelligence Review, 25, 1 (2006), 67-77.

[13] Sure, Y., Bloehdorn, S., Haase, P., Hartmann, J., Oberle, D., The SWRC Ontology - Semantic Web for Research Communities. Lecture Notes in Computer Science, Volume 3808 (2005), 218-231.

[14] Ullmann, T.D., Wild, F., Scott, P., Duval, E., Vandeputte, B., Parra, G., Reinhardt, W., Heinze, N., Kraker, P., and Fessl, A. Components of a Research 2.0 Infrastructure. In Proceedings of EC-TEL. 2010, 590-595.

[15] Xin, X., Li, J., and Tang, J. Enhancing Semantic Web by Semantic Annotation: Experiences in Building an Automatic Conference Calendar. (Silicon Valley 2007), IEEE/WIC/ACM International Conference on Web Intelligence.

[16] Waldrop, M.: Science 2.0. Scientific American, 298(5), 46-51, 2008.

[17] Gruber, J.: Markdown: Syntax. http://daringfireball.net/projects/markdown/syntax, 2011.