Parallel Apriori Algorithm Using MPI
Congressional Voting Records
Çankaya University
Computer Engineering Department
Ahmet Artu YILDIRIM
January 2010
Efficient Association Rules Mining Using MPI
Overview
• The Apriori algorithm is used to discover association rules
• Computation time becomes the main bottleneck when the dataset is large
• The aim is to reduce the running time of the mining process by distributing the computation across several computers
Apriori Algorithm (Example)
• Confidence({5} → {2,3}) = support_count({2,3,5}) / support_count({5}) = 2/3 ≈ 0.67
• Min support = 50%
• Min support count = 0.5 × 4 transactions = 2
• Min confidence = 0.50
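The example's transaction table is not reproduced here, so the following is only a minimal sketch of how the support count and confidence above could be computed. It assumes each itemset is stored as a 64-bit mask (bit i set means item i is present); the type and function names are hypothetical, not taken from the original program.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: an itemset is a 64-bit mask where bit i
   means "item i is present" (item numbers 1..63 fit). */
typedef uint64_t itemset_t;

/* Build an itemset mask from a list of item numbers. */
itemset_t make_itemset(const int *items, size_t n)
{
    itemset_t s = 0;
    for (size_t i = 0; i < n; i++)
        s |= (itemset_t)1 << items[i];
    return s;
}

/* Support count: how many transactions contain every item of s. */
int support_count(const itemset_t *db, size_t ntrans, itemset_t s)
{
    int count = 0;
    for (size_t t = 0; t < ntrans; t++)
        if ((db[t] & s) == s)
            count++;
    return count;
}

/* confidence(X -> Y) = support_count(X u Y) / support_count(X);
   returns 0 when X never occurs. */
double confidence(const itemset_t *db, size_t ntrans, itemset_t x, itemset_t y)
{
    int sx = support_count(db, ntrans, x);
    if (sx == 0)
        return 0.0;
    return (double)support_count(db, ntrans, x | y) / sx;
}
```

With a hypothetical four-transaction database {1,3,4}, {2,3,5}, {1,2,3,5}, {2,5}, item 5 occurs in three transactions and {2,3,5} in two, so confidence({5} → {2,3}) evaluates to 2/3, matching the figures above.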
Technology and Methodology
• Platform: GNU/Linux 2.6.20.7 i386
• Programming language: ISO C99
• Cross-platform APIs: MPICH (MPI implementation) and GLib (utility library)
• Compiler suite: GNU toolchain
• Division Methodology:
1. Dataset division
2. Large frequent itemset division
• This work uses the dataset division methodology
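The slides do not say how the transactions are split, so this sketch assumes a contiguous block split: each process gets a nearly equal slice of the transactions and counts candidate support only within that slice. The itemset-as-bitmask representation and the names block_range and count_local_support are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t itemset_t;   /* bit i set = item i present (assumption) */

/* Contiguous block split of ntrans transactions over nprocs processes:
   the first ntrans % nprocs ranks get one extra transaction.
   On return, rank owns transactions [*begin, *end). */
void block_range(size_t ntrans, int nprocs, int rank, size_t *begin, size_t *end)
{
    size_t base = ntrans / (size_t)nprocs;
    size_t extra = ntrans % (size_t)nprocs;
    size_t r = (size_t)rank;
    *begin = r * base + (r < extra ? r : extra);
    *end = *begin + base + (r < extra ? 1 : 0);
}

/* Local support counting: each process scans only its own slice of the
   dataset and counts how many of those transactions contain each candidate. */
void count_local_support(const itemset_t *db, size_t begin, size_t end,
                         const itemset_t *cand, size_t ncand, long *local)
{
    for (size_t c = 0; c < ncand; c++) {
        local[c] = 0;
        for (size_t t = begin; t < end; t++)
            if ((db[t] & cand[c]) == cand[c])
                local[c]++;
    }
}
```

For the 435 records on 4 processes this gives slices of 109, 109, 109 and 108 transactions. The local counts still have to be summed across processes before any itemset can be declared frequent.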
Data Division (Merging Local Support)
Parallel Apriori Algorithm Flowchart
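The flowchart itself is not reproduced here. One step a level-wise Apriori loop repeats on every process is generating the candidate (k+1)-itemsets from the frequent k-itemsets (join, then prune). The sketch below shows that step under the assumption that itemsets are 64-bit masks; the 34 items of this dataset fit. It joins any two k-itemsets whose union has k+1 items. After pruning, this yields the same candidates as the classic sorted-prefix join. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t itemset_t;   /* bit i set = item i present (assumption) */

/* Number of items in an itemset (portable popcount). */
static int item_count(itemset_t s)
{
    int n = 0;
    for (; s != 0; s &= s - 1)   /* clear the lowest set bit each step */
        n++;
    return n;
}

/* Linear search: is s one of the n itemsets in set[]? */
static int contains(const itemset_t *set, size_t n, itemset_t s)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == s)
            return 1;
    return 0;
}

/* Candidate generation: from the n frequent k-itemsets in lk[], write the
   candidate (k+1)-itemsets to out[] (room for cap entries); returns the count.
   Join: union of two k-itemsets that differ in exactly one item.
   Prune: drop the union if any of its k-item subsets is not in lk[]. */
size_t apriori_gen(const itemset_t *lk, size_t n, int k, itemset_t *out, size_t cap)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++) {
            itemset_t u = lk[i] | lk[j];
            if (item_count(u) != k + 1 || contains(out, m, u))
                continue;                          /* bad join or duplicate */
            int keep = 1;
            for (itemset_t rest = u; rest != 0 && keep; rest &= rest - 1) {
                itemset_t lowest = rest & (~rest + 1);   /* lowest set bit */
                if (!contains(lk, n, u & ~lowest))       /* k-subset without it */
                    keep = 0;
            }
            if (keep && m < cap)
                out[m++] = u;
        }
    return m;
}
```

For example, from the frequent 2-itemsets {1,3}, {2,3}, {2,5} and {3,5}, only {2,3,5} survives: {1,2,3} and {1,3,5} are pruned because {1,2} and {1,5} are not frequent.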
Dataset
• 1984 United States congressional voting records
• Attribute information: party (democrat, republican) and 16 yes/no votes: handicapped infants, water project cost sharing, adoption of the budget resolution, physician fee freeze, El Salvador aid, religious groups in schools, anti-satellite test ban, aid to Nicaraguan contras, MX missile, immigration, synfuels corporation cutback, education spending, superfund right to sue, crime, duty-free exports, export administration act South Africa
Preprocessing of Dataset
• The data is transformed before mining: every attribute value is mapped to an integer item number
• Items are numbered as follows: democrat = 1, republican = 2, handicapped infants yes = 3, handicapped infants no = 4, water project cost sharing yes = 5, …
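Following the numbering pattern above, issue i (counting from 0) maps to 2i + 3 for "yes" and 2i + 4 for "no", so the last issue's "no" becomes item 34, which matches attributecount=34. The sketch below assumes that records arrive in the comma-separated form used by the UCI copy of the dataset (party, then 16 votes of y, n or ?). It also assumes that unknown votes ("?") are simply skipped, because the slides do not say how they are handled. record_to_items is a hypothetical name.

```c
#include <string.h>

#define NUM_ISSUES 16

/* Convert one record such as "republican,n,y,...,y" into item numbers:
   democrat = 1, republican = 2, issue i (0-based) yes = 2*i + 3, no = 2*i + 4.
   Unknown votes ('?') are skipped (assumption). Returns the number of items
   written to items[] (at most 17), or -1 if the record is malformed. */
int record_to_items(const char *line, int *items)
{
    int n = 0;
    const char *p;
    if (strncmp(line, "democrat,", 9) == 0) {
        items[n++] = 1;
        p = line + 9;
    } else if (strncmp(line, "republican,", 11) == 0) {
        items[n++] = 2;
        p = line + 11;
    } else {
        return -1;
    }
    for (int i = 0; i < NUM_ISSUES; i++) {
        char v = p[0];
        if (v == 'y')
            items[n++] = 2 * i + 3;
        else if (v == 'n')
            items[n++] = 2 * i + 4;
        else if (v != '?')
            return -1;
        /* every vote except the last is followed by a comma */
        if (i < NUM_ISSUES - 1) {
            if (p[1] != ',')
                return -1;
            p += 2;
        }
    }
    return n;
}
```

Each returned item list is one transaction; OR-ing the items into a bitmask gives the representation assumed in the other sketches.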
Config File and Run Command
Config File:
attributecount=34
transactioncount=435
minsupportpercent=50
minconfidencepercent=80
Command (x is the number of processes; machines is the file listing the cluster hosts):
mpirun -np x -machinefile machines ./aprioriparallel
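As a hedged sketch of how the key=value config file shown above could be read, the code below uses a hypothetical struct and function names rather than the original program's. It also converts the support percentage into a minimum support count. It rounds up, which is an assumption: for 50% of 435 transactions it gives 218 (217.5 rounded up).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical struct holding the four settings from the config file. */
struct apriori_config {
    int attributecount;
    int transactioncount;
    int minsupportpercent;
    int minconfidencepercent;
};

/* Read "key=value" lines from f; unknown keys are ignored.
   Returns the number of recognised keys. */
int parse_config(FILE *f, struct apriori_config *cfg)
{
    char key[64];
    int value, found = 0;
    memset(cfg, 0, sizeof *cfg);
    while (fscanf(f, " %63[^=]=%d", key, &value) == 2) {
        if (strcmp(key, "attributecount") == 0)
            cfg->attributecount = value;
        else if (strcmp(key, "transactioncount") == 0)
            cfg->transactioncount = value;
        else if (strcmp(key, "minsupportpercent") == 0)
            cfg->minsupportpercent = value;
        else if (strcmp(key, "minconfidencepercent") == 0)
            cfg->minconfidencepercent = value;
        else
            continue;   /* unknown key: skip it */
        found++;
    }
    return found;
}

/* Smallest count c with c / transactioncount >= minsupportpercent / 100,
   i.e. the percentage threshold rounded up to a whole number of transactions. */
int min_support_count(const struct apriori_config *cfg)
{
    return (cfg->minsupportpercent * cfg->transactioncount + 99) / 100;
}
```

Keeping the threshold as an integer count lets every process compare global support counts without floating-point rounding.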
Program Output
Rules
Rules found at the 80% confidence threshold:
• Democrats support:
  - Adoption of the budget resolution
  - Aid to Nicaraguan contras
• Democrats do NOT support:
  - Physician fee freeze
Rules (cont.)
Rules found at the 80% confidence threshold:
• Those who do not support the physician fee freeze support the adoption of the budget resolution
• Those who support the adoption of the budget resolution also do not support the physician fee freeze
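Rules like these can be derived from each frequent itemset by trying every non-empty proper subset X as the antecedent. A rule X → (itemset minus X) is kept when its confidence reaches the threshold. The sketch below is an illustration under assumptions, not the original implementation: itemsets are 64-bit masks, support is recounted by a plain scan, and generate_rules is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t itemset_t;   /* bit i set = item i present (assumption) */

/* Number of transactions that contain every item of s. */
static long support_count(const itemset_t *db, size_t ntrans, itemset_t s)
{
    long count = 0;
    for (size_t t = 0; t < ntrans; t++)
        if ((db[t] & s) == s)
            count++;
    return count;
}

/* Generate rules X -> (freq \ X) from one frequent itemset freq, keeping those
   with confidence >= minconf_percent. Antecedents go to ante[], consequents
   to cons[] (room for cap rules); returns the number of rules written. */
size_t generate_rules(const itemset_t *db, size_t ntrans, itemset_t freq,
                      int minconf_percent, itemset_t *ante, itemset_t *cons, size_t cap)
{
    long whole = support_count(db, ntrans, freq);
    size_t m = 0;
    /* Visit every non-empty proper subset x of freq (standard submask walk). */
    for (itemset_t x = (freq - 1) & freq; x != 0; x = (x - 1) & freq) {
        long sx = support_count(db, ntrans, x);
        /* confidence = whole / sx; compare as integers to avoid rounding */
        if (sx > 0 && whole * 100 >= (long)minconf_percent * sx && m < cap) {
            ante[m] = x;
            cons[m] = freq & ~x;
            m++;
        }
    }
    return m;
}
```

On the hypothetical four-transaction database from the Apriori example ({1,3,4}, {2,3,5}, {1,2,3,5}, {2,5}), the itemset {2,3,5} at 80% yields only {3,5} → {2} and {2,3} → {5}, each with confidence 1.0.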
Parallel Computation Speed Up
• Run on the Çankaya University wee cluster
• Node specs: 600 MHz CPU, 250 MB RAM
• Speedup = ts / tp, where ts is the sequential running time and tp is the parallel running time
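The speedup formula can be written directly as code. The sketch also adds parallel efficiency, E = S / p, a standard companion measure that does not appear on the slide and is included only for illustration. The function names are hypothetical.

```c
/* Speedup S = ts / tp, with ts the sequential and tp the parallel running time. */
double speedup(double ts, double tp)
{
    return ts / tp;
}

/* Efficiency E = S / p: the fraction of ideal (linear) speedup reached on p
   processes. Not part of the slide; added for illustration. */
double efficiency(double ts, double tp, int p)
{
    return speedup(ts, tp) / p;
}
```

For instance, a hypothetical run taking 120 s sequentially and 40 s on 4 processes would give S = 3 and E = 0.75. These are placeholder timings, not measurements from this work.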
Conclusion
• The parallel version of the Apriori algorithm reduces running time on large datasets
• It scales by adding nodes (computers) or memory, without modifying the code
• It achieves a high price-performance ratio by using less powerful computers
Thank You
Questions?