Applying Text Classification in Conference Management: Some Lessons Learned
Andreas Pesenhofer, Helmut Berger, Michael Dittenbach, Andreas Rauber


Overview

  • Conference Management Systems
  • Classification & Clustering
  • Case Studies
      • ECDL 2005
      • ECR
  • Conclusions

Conference Management Systems

  • Set of tools to support conference workflow
  • Basic support for paper submission & review collection
  • Many tasks for further automation
      • Selection of the program committee
      • Topic assignment of submissions
      • Paper to reviewer assignment
      • Support in review generation
      • Poster arrangement
      • Post-conference access to papers

Classification & Clustering

  • Topic assignment of submissions
      • Problem: authors uncertain about precise topic assignment (conference terminology)
      • Solution: support by automatic assignment
      • Method: ATC (automatic text classification) based on abstracts
  • Poster arrangement & post-conference access to papers
      • Problem: topic-based arrangement
      • Solution: clustering
      • Method: SOM (self-organizing map) & Mnemonic SOM

ATC for topic assignment

  • Train model based on previous conferences
  • Abstract submission
  • Automatic assignment
  • Confirmation
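
The four steps of this workflow can be sketched in a few lines of Python. The slides do not name the classifier that was used, so this sketch substitutes a simple nearest-centroid classifier over bag-of-words counts; the toy abstracts, the topic labels, and the function names `train` and `suggest_topic` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lower-case the text and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())

def train(labelled_abstracts):
    """Step 1: build one summed term-count vector (centroid) per topic."""
    centroids = defaultdict(Counter)
    for topic, abstract in labelled_abstracts:
        centroids[topic].update(tokenize(abstract))
    return dict(centroids)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest_topic(centroids, abstract):
    """Steps 2-3: rank topics for a submitted abstract, most similar first."""
    vec = Counter(tokenize(abstract))
    return sorted(centroids, key=lambda t: cosine(vec, centroids[t]), reverse=True)

# Hypothetical stand-in for abstracts of previous conferences.
previous = [
    ("Digital Preservation", "long term preservation of web archives and migration"),
    ("Information Retrieval", "ranking and search of documents with query expansion"),
    ("User Studies", "evaluation of user interfaces with a user study"),
]
model = train(previous)
ranking = suggest_topic(model, "A study of web archiving and long term preservation")

# Step 4: the author confirms the first suggestion or picks another topic.
print(ranking[0])
```

Returning a ranked list rather than a single label keeps the confirmation step cheap: the author only has to accept the first entry or choose one further down.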

Clustering for organization

  • Arrange posters thematically
  • Non-rectangular SOMs reflecting conference site
  • Mnemonic SOMs simplify post-conference paper access
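
To make the idea of a non-rectangular map concrete, here is a minimal self-organizing map whose units exist only where a mask allows them. This is a sketch under stated assumptions, not the implementation behind the slides: the floor plan, the 3-term poster vectors, the decay schedules, and the names `grid_units`, `best_unit` and `train_som` are all hypothetical.

```python
import math
import random

# Hypothetical floor plan: '#' marks a spot where a poster board can stand.
FLOOR_PLAN = [
    "####",
    "#..#",
    "####",
]

def grid_units(plan):
    # Only marked cells become map units, which makes the map non-rectangular.
    return [(r, c) for r, row in enumerate(plan) for c, ch in enumerate(row) if ch == "#"]

def best_unit(weights, vec):
    # Unit whose weight vector is closest (squared Euclidean distance) to the input.
    return min(weights, key=lambda u: sum((w - x) ** 2 for w, x in zip(weights[u], vec)))

def train_som(vectors, plan, epochs=50, seed=0):
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {u: [rng.random() for _ in range(dim)] for u in grid_units(plan)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac              # learning rate decays linearly
        radius = 0.5 + 2.0 * frac      # neighbourhood radius shrinks over time
        for vec in vectors:
            br, bc = best_unit(weights, vec)
            for (r, c), w in weights.items():
                # Grid distance is measured straight across the plan, ignoring holes.
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-dist2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += rate * influence * (vec[i] - w[i])
    return weights

# Hypothetical 3-term document vectors for six posters.
posters = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.1], [0, 0, 1], [0.1, 0, 0.9]]
som = train_som(posters, FLOOR_PLAN)
placement = [best_unit(som, p) for p in posters]
print(placement)
```

Each poster is placed at the grid position of its best-matching unit, so every placement is guaranteed to fall on a cell that exists in the floor plan.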


ECDL 2005 – ATC data

  • English abstracts of previous ECDL conferences
  • Topics of the conference call -> seven categories defined
  • Pre-processing: removal of all numbers, punctuation marks and special characters; transformation to lower case
  • tf-idf weighting
  • 4,141 unique terms
  • IG (information gain): 3,460 top-ranked terms, average accuracy over all categories of 58.60%
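
The pre-processing and weighting steps listed on this slide can be sketched as follows. The slides do not state which tf-idf variant was used, so the formula here (raw term frequency times log(N / df)) is an assumption, as are the function names and the two sample sentences.

```python
import math
import re
from collections import Counter

def preprocess(text):
    # Drop numbers, punctuation marks and special characters, then lower-case.
    # Assumes English text: non-ASCII letters are removed as well.
    return re.sub(r"[^A-Za-z\s]", " ", text).lower().split()

def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: tf * math.log(n_docs / doc_freq[t]) for t, tf in counts.items()})
    return vectors

# Hypothetical sample texts.
docs = [
    "Digital libraries: 3 case studies (2005).",
    "Digital preservation & web archiving!",
]
tokens = preprocess(docs[0])
vecs = tfidf(docs)
print(tokens)
print(vecs[0])
```

With this variant a term that occurs in every document gets weight zero, which is why "digital" contributes nothing to either vector in the sample.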

ECDL – training data

class-ID  class description                                                                           sum
1         Concepts of Digital Libraries, Concepts of Documents and Metadata                            34
2         System Architectures, Open Archives, Collection Building, Integration and Interoperability   40
3         Information Retrieval, Information Organization, Search and Usage                            67
4         User Studies, System Evaluation, Personalization, User Interfaces and User Centered Design   50
5         Digital Preservation, Web Archiving and Long Term Access                                     12
6         Digital Library Applications and Case Studies                                                65
7         Multimedia, Mixed Media, Audio, Video, 3D and non-traditional Objects                        43
          sum over the selected abstracts                                                             311

ECDL 2005 – classification results

class-ID     1    2    3    4    5    6    7  total  recall    F1
1            1    1    2    2    .    1    1      8    0.13  0.17
2            1   17    1    .    .    .    .     19    0.89  0.77
3            1    3   26    6    .    2    .     38    0.68  0.69
4            .    .    4   21    .    2    1     28    0.75  0.71
5            1    1    3    .    .    1    1      7    0.00  0.00
6            .    3    1    2    .   12    1     19    0.63  0.65
7            .    .    .    .    .    .    3      3    1.00  0.60
precision 0.25 0.68 0.70 0.68 0.00 0.67 0.43
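
The recall, precision and F1 figures in this table follow from the confusion matrix itself. The sketch below recomputes them, assuming that rows are the actual class and columns the assigned class; that is the reading under which the slide's values are reproduced (for example class 2: precision 0.68, recall 0.89, F1 0.77). The function name `per_class_scores` is hypothetical.

```python
# Confusion matrix transcribed from the slide ('.' entries written as 0).
# Assumed reading: rows = actual class, columns = assigned class.
CONFUSION = [
    [1, 1, 2, 2, 0, 1, 1],
    [1, 17, 1, 0, 0, 0, 0],
    [1, 3, 26, 6, 0, 2, 0],
    [0, 0, 4, 21, 0, 2, 1],
    [1, 1, 3, 0, 0, 1, 1],
    [0, 3, 1, 2, 0, 12, 1],
    [0, 0, 0, 0, 0, 0, 3],
]

def per_class_scores(matrix):
    """Return a (precision, recall, F1) tuple for every class."""
    scores = []
    for k in range(len(matrix)):
        true_pos = matrix[k][k]
        actual = sum(matrix[k])                     # row total
        assigned = sum(row[k] for row in matrix)    # column total
        recall = true_pos / actual if actual else 0.0
        # A class that is never assigned has undefined precision; report 0 as the slide does.
        precision = true_pos / assigned if assigned else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores

scores = per_class_scores(CONFUSION)
for class_id, (p, r, f1) in enumerate(scores, start=1):
    print(f"class {class_id}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Class 5 (Digital Preservation, 12 training abstracts) is never assigned to any test abstract, so both its precision and recall come out as zero.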

ECDL 2005 – SOM data

  • Poster and paper organization
      • Full text of accepted posters of ECDL 2005
      • Term selection based on minimal word length and document frequencies
      • 30 posters, 569 terms
  • Post-conference access
      • 71 papers and posters, 5,654 terms


ECDL 2005 – SOM


ECDL 2005 – SOM (2)


ECR - Data

  • Abstracts of the ECR (European Congress of Radiology)
  • Training set: ECR 2003 & 2004, 1,952 documents
  • Test set: ECR 2005, 924 documents
  • Same steps as for the ECDL data
  • Resulting in 14,887 unique terms
  • IG (information gain): 5,720 top-ranked terms, average accuracy over all categories of 73.57%
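
Reading IG as information gain, term selection ranks every term by how much knowing its presence or absence reduces the entropy of the class label, and keeps the top-ranked terms. The sketch below shows that ranking on a hypothetical four-document corpus; the function names and the toy data are assumptions, and the slides do not say how the cut-off of 5,720 terms was chosen.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG(term) = H(C) - P(term) * H(C | term) - P(no term) * H(C | no term)."""
    with_term = [c for d, c in zip(docs, labels) if term in d]
    without = [c for d, c in zip(docs, labels) if term not in d]
    n = len(labels)
    return (entropy(labels)
            - len(with_term) / n * entropy(with_term)
            - len(without) / n * entropy(without))

def top_terms(docs, labels, k):
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(vocabulary, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]

# Hypothetical toy corpus: each document is represented by the set of its terms.
docs = [
    {"stent", "artery"},
    {"artery", "contrast"},
    {"mammography", "contrast"},
    {"mammography", "biopsy"},
]
labels = ["Vascular", "Vascular", "Breast", "Breast"]
selected = top_terms(docs, labels, 2)
print(selected)
```

In the toy corpus "artery" and "mammography" separate the two classes perfectly, while "contrast" occurs once in each class and therefore carries no information about the label.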

ECR – training data

class-ID  class description                2003  2004   sum
1         Abdominal and Gastrointestinal    160   119   279
2         Breast                             80    59   139
3         Cardiac                            70    70   140
4         Chest                              60    70   130
5         Computer Applications              30    30    60
6         Contrast Media                     40    39    79
7         Genitourinary                      70    60   130
8         Head and Neck                      40    40    80
9         Interventional Radiology          130   117   247
10        Musculoskeletal                    90    80   170
11        Neuro                              90    99   189
12        Pediatric                          30    40    70
13        Physics in Radiology               40    40    80
14        Radiographers                      10    10    20
15        Vascular                           69    70   139
          sum over the selected abstracts  1009   943  1952

ECR 2005 – classification results

class-ID     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15  total  recall    F1
1          111    1    .    1    2    2    2    .    2    1    2    .    1    .    1    126    0.88  0.79
2            1   61    .    .    .    .    .    .    1    .    .    .    6    .    .     69    0.88  0.87
3            1    .   73    .    .    .    .    .    .    1    .    .    3    .    2     80    0.91  0.86
4            6    .    5   49    1    .    .    .    3    .    1    .    .    .    5     70    0.70  0.77
5            2    2    .    3   10    .    .    .    .    .    3    .    7    .    3     30    0.33  0.43
6           12    .    .    1    .   26    2    .    1    2    1    .    1    .    3     49    0.53  0.61
7            5    .    .    .    .    1   38    .    5    3    2    .    3    .    1     58    0.66  0.73
8            4    .    .    1    .    2    4    8    2    2    4    .    2    .    1     30    0.27  0.39
9            2    4    2    .    1    3    .    .   99    2    2    .    .    .    5    120    0.83  0.81
10           2    2    .    1    1    1    .    .    2   60    5    1    2    .    1     78    0.77  0.78
11           1    .    1    1    .    .    .    1    4    .   64    2    1    .    4     79    0.81  0.73
12           4    .    1    .    .    .    .    1    1    1   10   11    .    .    1     30    0.37  0.50
13           .    1    3    .    .    .    .    .    .    1    2    .   39    .    2     48    0.81  0.68
14           2    .    .    .    1    .    .    .    .    3    .    .    .    2    .      8    0.25  0.40
15           2    .    4    .    .    1    .    1    3    .    .    .    1    .   37     49    0.76  0.64
precision 0.72 0.86 0.82 0.86 0.63 0.72 0.83 0.73 0.80 0.79 0.67 0.79 0.59 1.00 0.56
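
With fifteen classes the matrix is hard to scan by eye, so it can help to list the largest off-diagonal cells, that is, the class pairs that are confused most often. The sketch below does this for the ECR 2005 matrix, again assuming that rows are the actual class and columns the assigned class; the function name `top_confusions` is hypothetical.

```python
# Class names and confusion matrix transcribed from the slides ('.' written as 0).
# Assumed reading: rows = actual class, columns = assigned class.
NAMES = [
    "Abdominal and Gastrointestinal", "Breast", "Cardiac", "Chest",
    "Computer Applications", "Contrast Media", "Genitourinary", "Head and Neck",
    "Interventional Radiology", "Musculoskeletal", "Neuro", "Pediatric",
    "Physics in Radiology", "Radiographers", "Vascular",
]
CONFUSION = [
    [111, 1, 0, 1, 2, 2, 2, 0, 2, 1, 2, 0, 1, 0, 1],
    [1, 61, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 6, 0, 0],
    [1, 0, 73, 0, 0, 0, 0, 0, 0, 1, 0, 0, 3, 0, 2],
    [6, 0, 5, 49, 1, 0, 0, 0, 3, 0, 1, 0, 0, 0, 5],
    [2, 2, 0, 3, 10, 0, 0, 0, 0, 0, 3, 0, 7, 0, 3],
    [12, 0, 0, 1, 0, 26, 2, 0, 1, 2, 1, 0, 1, 0, 3],
    [5, 0, 0, 0, 0, 1, 38, 0, 5, 3, 2, 0, 3, 0, 1],
    [4, 0, 0, 1, 0, 2, 4, 8, 2, 2, 4, 0, 2, 0, 1],
    [2, 4, 2, 0, 1, 3, 0, 0, 99, 2, 2, 0, 0, 0, 5],
    [2, 2, 0, 1, 1, 1, 0, 0, 2, 60, 5, 1, 2, 0, 1],
    [1, 0, 1, 1, 0, 0, 0, 1, 4, 0, 64, 2, 1, 0, 4],
    [4, 0, 1, 0, 0, 0, 0, 1, 1, 1, 10, 11, 0, 0, 1],
    [0, 1, 3, 0, 0, 0, 0, 0, 0, 1, 2, 0, 39, 0, 2],
    [2, 0, 0, 0, 1, 0, 0, 0, 0, 3, 0, 0, 0, 2, 0],
    [2, 0, 4, 0, 0, 1, 0, 1, 3, 0, 0, 0, 1, 0, 37],
]

def top_confusions(matrix, names, k):
    """Return the k largest off-diagonal cells as (count, actual, assigned)."""
    cells = [
        (count, names[i], names[j])
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j and count > 0
    ]
    return sorted(cells, reverse=True)[:k]

top3 = top_confusions(CONFUSION, NAMES, 3)
for count, actual, assigned in top3:
    print(f"{count:3d}  {actual} -> {assigned}")
```

The three largest cells are Contrast Media assigned to Abdominal and Gastrointestinal (12), Pediatric assigned to Neuro (10), and Computer Applications assigned to Physics in Radiology (7).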

Conclusions

  • Quality is proportional to the amount of training documents
  • Structure of the classes (overlapping?)
  • The bulk of submissions can be dealt with automatically
  • May be used for session assignment
  • Arrange posters & papers thematically
  • Easy to memorize & find
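
The first conclusion can be checked informally against the ECR figures reported in the slides by correlating the per-class training-set size with the per-class F1 value. This is only an illustrative sketch: fifteen data points do not establish proportionality, and the helper `pearson` is written out here rather than taken from the original work.

```python
import math

# Per-class training documents (ECR 2003 + 2004) and F1 on ECR 2005, from the slides.
TRAIN_DOCS = [279, 139, 140, 130, 60, 79, 130, 80, 247, 170, 189, 70, 80, 20, 139]
F1 = [0.79, 0.87, 0.86, 0.77, 0.43, 0.61, 0.73, 0.39, 0.81, 0.78, 0.73, 0.50, 0.68, 0.40, 0.64]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson(TRAIN_DOCS, F1)
print(f"Pearson r = {r:.2f}")
```

The correlation is clearly positive, driven mainly by the small classes (Radiographers, Computer Applications, Head and Neck, Pediatric), which also have the lowest F1 values.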


Questions?

E-Commerce Competence Center

Donau-City-Strasse 1

1220 Vienna Austria

Phone: +43/1/522 71 71-20

Fax: +43/1/522 71 71-71

Internet: http://www.ec3.at/



ECDL 2005