Fall-12: Early Adoption of NSF/TCPP PDC Curriculum
Exploring Computer Vision and Image Processing Algorithms in Teaching Parallel Programming
Professor Dan Connors- [email protected] Department of Electrical Engineering, University of Colorado Denver
Introduction
• Multicore processors and GPUs (Graphics Processing Units) are universally available to student programmers
• Need to prepare students for the future of parallel programming
• Substantial programmer burden in developing optimized implementations for current parallel programming languages and architecture models
• Effective parallel programming requires knowledge of parallel computing principles and advanced architecture concepts
Requires a staged approach
Curriculum Design
Computer Vision Algorithms
• Computer vision includes methods for acquiring, processing, analyzing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information.
• There is definite interest in ways that computer vision will impact society such as assisted driving, augmented reality, biometric search, and medical image analysis.
• Core foundational algorithms that relate to parallel computing concepts and have OpenCL/CUDA assignments:
• Scale-invariant feature transform (or SIFT) is an algorithm in computer vision to detect and describe local features in images. The algorithm was published by David Lowe in 1999.
• Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, and video tracking.
• Object Detection: Histogram generation and similarity matching
• Student project extensions of histogram generation: GPU-accelerated thumbnail-based image and video mosaics
• Edge Detection: Sobel Filter
• Parallel K-Means Clustering and KNN Classification
• Glyph Recognition and Matching
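As an illustration of the histogram generation and similarity matching item above, the following is a minimal sequential sketch in plain Python. It is not the course's assignment code: the function names, the 16-bin grayscale histogram, and the use of histogram intersection as the similarity measure are all assumptions chosen for brevity, and the pixel lists are made-up data.

```python
def gray_histogram(pixels, bins=16, max_value=256):
    """Count how many pixel intensities fall into each equal-width bin."""
    hist = [0] * bins
    for value in pixels:
        # Each pixel maps to a bin independently of every other pixel.
        hist[value * bins // max_value] += 1
    return hist


def normalize(hist):
    """Scale counts so they sum to 1.0, making image size irrelevant."""
    total = sum(hist)
    return [count / total for count in hist] if total else [0.0] * len(hist)


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms (1.0 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Made-up 8-bit "images" stored as flat pixel lists (illustrative only).
dark = [10, 20, 30, 40] * 4
also_dark = [12, 22, 28, 44] * 4
bright = [200, 210, 220, 250] * 4

h_dark = normalize(gray_histogram(dark))
print(histogram_intersection(h_dark, normalize(gray_histogram(also_dark))))  # 1.0
print(histogram_intersection(h_dark, normalize(gray_histogram(bright))))     # 0.0
```

The per-pixel loop is the part that maps naturally to one GPU thread per pixel. The increment of a shared bin counter is the step that would need an atomic update in a parallel version.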
NVIDIA CUDA GPU SIFT Assignment
CUDA programming of SIFT matching explores both parallel programming models and architecture concepts
• Architecture: DRAM latency, shared memory, bank conflicts
• Programming: data-parallel thread execution model, synchronization
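The poster does not specify how keypoints are matched, so the following is only a sequential sketch under stated assumptions: brute-force nearest-neighbor search over descriptors, with Lowe's ratio test used to reject ambiguous matches. The names `match_one` and `match_all`, the 0.8 ratio, and the 4-element toy descriptors (real SIFT descriptors have 128 elements) are illustrative choices. The body of `match_one` is written as the work a single data-parallel thread might do for one KP1 keypoint.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_one(i, kp1, kp2, ratio=0.8):
    """Work of one logical thread: find the KP2 match for KP1[i], or -1."""
    best, second, best_j = math.inf, math.inf, -1
    for j, candidate in enumerate(kp2):
        d = squared_distance(kp1[i], candidate)
        if d < best:
            best, second, best_j = d, best, j
        elif d < second:
            second = d
    # Ratio test on squared distances, so the ratio is squared as well.
    if best < (ratio * ratio) * second:
        return best_j
    return -1


def match_all(kp1, kp2, ratio=0.8):
    """Sequential baseline: run the per-keypoint work one index at a time."""
    return [match_one(i, kp1, kp2, ratio) for i in range(len(kp1))]


# Toy descriptors (made-up data for illustration).
kp1 = [[0, 0, 0, 0], [10, 10, 10, 10], [5, 5, 5, 5]]
kp2 = [[0, 0, 0, 1], [10, 10, 9, 10], [4, 4, 4, 4], [6, 6, 6, 6]]
print(match_all(kp1, kp2))  # [0, 1, -1]
```

The third keypoint is rejected because its two closest candidates are equally distant. Since each call to `match_one` reads all of KP2 but writes only its own result, the outer loop has no dependencies between iterations, which is what makes it a candidate for one thread per keypoint.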
Summary and Future Work
ELEC 1201-1 Intro to Electrical Engineering
ELEC 1510-3 Logic Design
Digital Foundation 8 hours
Digital Core 6 hours
Digital Specialty 7 hours
ELEC 1520-3 Embedded Systems I
ELEC 3651-3 Digital Hardware Design
ELEC 2531-1 Logic Laboratory
ELEC 2520-3 Embedded Systems II
ELEC 4501-3 Microprocessor-Based Design (MBD)
ELEC 4521-1 MBD Lab
ELEC 4511-3 Hardware-Software Interface Design (HW-SW)
ELEC 4561-1 HW-SW Lab
ELEC 4723-3 Computer Architecture
ELEC 4727-3 Computer Vision Acceleration
NSF/TCPP PDC Curriculum Early Adopter Initiative
• GPUs represent a highlight of the research work being done at the University of Colorado Denver in computer engineering and embedded systems
• Seek ways to integrate emerging research tools into courses
• Provide demonstration of research concepts in GPGPU computing that advance the state of the art in multiple disciplines:
• Mobile and unmanned aerial vehicle computer-vision
• Acceleration of neurobiology modeling and neuron simulation
ELEC 4727 Computer Vision Acceleration with MC/GPU
Here is an example of the expected results for comparing keypoints. This measures the execution time when the keypoint file KP1 and KP2 is compared to
[Figure: Performance Comparison of SIFT Matching Algorithms. Y-axis: Seconds (0 to 90). X-axis: Total Keypoints (KP1 + KP2), 2000 to 20000. Series: Sequential, CUDA.]
• The NVIDIA Corporation funded our program with a CUDA Center of Teaching Excellence grant
• Enabled the resource for an additional Teaching Assistant (TA) for one semester
• The NSF/TCPP Curriculum awarded an Early Adopter Award for developing core curriculum for CS/CE undergraduates related to parallel and distributed computing (PDC) topics. (2012)
• Performance for evaluating full SIFT matches with varying keypoint file sizes on two GPUs: NVIDIA GTX 280 and NVIDIA GTX 480 (with 240 and 480 cores, respectively).
• The performance difference between the GTX 280 and GTX 480 is clearly visible, so the link between doubling the number of transistors (Moore’s law) and doubling performance in the architecture model can be communicated and discussed.
• A new special topics (elective) course with focus on hands-on experience and applications of GPU architectures
• GPU architectures, CUDA programming, and computer vision algorithms with OpenCV
• Enrollment: 23 undergraduates, 13 graduates in spring 2013
• NSF/TCPP PDC curriculum being adopted at University of Colorado Denver
• Computer vision serves to promote interest in parallel/distributed computing education and computer engineering
• Critical to provide real-world applications of parallel programming and ways for students to explore on their own new concepts
• OpenCV provides advantage
Integrated Research
• The primary focus of our approach is to motivate the area of computer engineering by exploring foundational algorithms and their implementation on parallel computing systems in four strategic courses:
• ELEC 1520 - Embedded Systems I: Intro to C Programming
• ELEC 2520 - Embedded Systems II: Microcontroller Systems
• ELEC 4723 - Advanced Computer Architecture
• ELEC 4727 - Computer Vision Acceleration with GPU & Multicore Processors
• Goal is to integrate parallel concepts early and consistently throughout curriculum
• Course modifications
• Demonstration of new technologies and architectures
• CUDA/OpenCL Application Programming Interfaces (APIs)
• Parallel programming examples of core algorithms
• Performance comparison for evaluating SIFT matching for a range of keypoint file sizes for both the CPU and GPU models
• This example helps demonstrate the massively parallel capabilities of the GPU as students gain insight into natural differences in performance models.
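One way to reason about why execution time grows quickly with keypoint file size is to count descriptor comparisons. The sketch below assumes brute-force matching (every KP1 descriptor compared against every KP2 descriptor) and, purely for illustration, that KP1 and KP2 each hold half of the total keypoints. Neither assumption is stated on the poster, and the function names are hypothetical.

```python
def pairwise_comparisons(n1, n2):
    """Descriptor comparisons for brute-force matching of n1 against n2."""
    return n1 * n2


def comparisons_for_total(total_keypoints):
    """Comparisons when KP1 and KP2 each hold half of the total (assumed)."""
    half = total_keypoints // 2
    return pairwise_comparisons(half, half)


def per_thread_comparisons(n2):
    """With one thread per KP1 keypoint, each thread scans all of KP2."""
    return n2


for total in (2000, 10000, 20000):
    print(total, comparisons_for_total(total))
# 2000 1000000
# 10000 25000000
# 20000 100000000
```

Under these assumptions, a tenfold increase in total keypoints means a hundredfold increase in sequential work, while the work per thread in a one-thread-per-keypoint mapping grows only tenfold.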
General Purpose GPU (GPGPU) Computing
• Arithmetic intensity and application domain scaling
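Arithmetic intensity is the ratio of floating-point operations to bytes moved to and from memory. As a rough worked example, the sketch below estimates it for one descriptor-pair distance computation in SIFT matching. The model is deliberately simplified and its inputs are assumptions: 128-element descriptors stored as 32-bit floats, three operations per element (subtract, multiply, accumulate), and a hypothetical `reuse` factor standing in for how many times each descriptor fetched from DRAM is reused from on-chip storage such as shared memory.

```python
DIM = 128             # SIFT descriptor length
BYTES_PER_FLOAT = 4   # assumes 32-bit float storage


def arithmetic_intensity(flops, bytes_moved):
    """Floating-point operations performed per byte of memory traffic."""
    return flops / bytes_moved


def intensity_per_pair(reuse=1):
    """Estimated intensity of one descriptor-pair distance computation.

    reuse=1 models a naive kernel that fetches both descriptors from DRAM
    for every pair. Larger values model on-chip reuse (simplified).
    """
    flops = 3 * DIM                               # sub, mul, add per element
    dram_bytes = 2 * DIM * BYTES_PER_FLOAT / reuse
    return arithmetic_intensity(flops, dram_bytes)


print(intensity_per_pair(reuse=1))   # 0.375
print(intensity_per_pair(reuse=32))  # 12.0
```

A value well below one operation per byte suggests a kernel limited by memory bandwidth rather than arithmetic, which is one way to motivate the shared memory topics in the SIFT assignment.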
Acknowledgements
• Students Kyle Dunn and Jeff Wiencrot helped develop OpenCV-based computer vision examples and helped build course support for the CUDA computing environment
Approach
Goals
• Early Stage
• Enable students to detect and describe coding and computation cases with inherent parallelism
• Later Stage
• Observe students selecting GPU parallel solutions for semester-end projects and senior projects in computationally-intensive scenarios
• The focus of our approach is to integrate GPU-related programming concepts in distinct phases within multiple courses in the curriculum
• Adopt real applications/real systems as the learning motivation and use them in teaching related topics
• Provide project-based experiences and opportunities
1st Year: Motivate Parallelism
2nd Year: Code Parallelism
3rd Year: Optimize Parallelism
4th Year: Explore Parallelism
• Expose students to high-level concepts of GPU parallelism
• Relate parallelism to performance
• Real-world application domains & demonstrations
• Students deploy APIs and GPU templates
• Understand the scenarios for deploying GPGPU codes
• Students advance their understanding of the GPU model
• Overcome performance bottlenecks with knowledge of computer organization concepts
• Students independently leverage GPU systems
• Investigate open-ended projects with GPU acceleration