PERFORMANCE AND POWER OPTIMIZATION FOR COGNITIVE PROCESSOR ... and power optimization for cognitive processor design using deep-submicron very large scale integration ... neural network models,

  • AFRL-RI-RS-TR-2010-076 Final Technical Report March 2010 PERFORMANCE AND POWER OPTIMIZATION FOR COGNITIVE PROCESSOR DESIGN USING DEEP-SUBMICRON VERY LARGE SCALE INTEGRATION (VLSI) TECHNOLOGY State University of New York at Binghamton

    APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.

    STINFO COPY

    AIR FORCE RESEARCH LABORATORY INFORMATION DIRECTORATE

    ROME RESEARCH SITE ROME, NEW YORK

  • NOTICE AND SIGNATURE PAGE

    Using Government drawings, specifications, or other data included in this document for any purpose other than Government procurement does not in any way obligate the U.S. Government. The fact that the Government formulated or supplied the drawings, specifications, or other data does not license the holder or any other person or corporation; or convey any rights or permission to manufacture, use, or sell any patented invention that may relate to them.

    This report was cleared for public release by the 88th ABW, Wright-Patterson AFB Public Affairs Office and is available to the general public, including foreign nationals. Copies may be obtained from the Defense Technical Information Center (DTIC) (http://www.dtic.mil).

    AFRL-RI-RS-TR-2010-076 HAS BEEN REVIEWED AND IS APPROVED FOR PUBLICATION IN ACCORDANCE WITH ASSIGNED DISTRIBUTION STATEMENT.

    FOR THE DIRECTOR:

    /s/ THOMAS RENZ, Work Unit Manager

    /s/ EDWARD J. JONES, Deputy Chief, Advanced Computing Division, Information Directorate

    This report is published in the interest of scientific and technical information exchange, and its publication does not constitute the Government's approval or disapproval of its ideas or findings.

  • REPORT DOCUMENTATION PAGE Form Approved OMB No. 0704-0188

    Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Washington Headquarters Service, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188) Washington, DC 20503. PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS.

    1. REPORT DATE (DD-MM-YYYY) MARCH 2010

    2. REPORT TYPE Final

    3. DATES COVERED (From - To) October 2008 - October 2009

    4. TITLE AND SUBTITLE PERFORMANCE AND POWER OPTIMIZATION FOR COGNITIVE PROCESSOR DESIGN USING DEEP-SUBMICRON VERY LARGE SCALE INTEGRATION (VLSI) TECHNOLOGY

    5a. CONTRACT NUMBER N/A

    5b. GRANT NUMBER FA8750-09-2-0011

    5c. PROGRAM ELEMENT NUMBER 61101E

    5d. PROJECT NUMBER BINA

    5e. TASK NUMBER CS

    5f. WORK UNIT NUMBER 09

    6. AUTHOR(S) Qing Wu and Qinru Qiu

    7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) State University of New York at Binghamton, Department of Electrical Engineering, 4400 Vestal Parkway East, Binghamton, NY 13902

    8. PERFORMING ORGANIZATION REPORT NUMBER N/A

    9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) AFRL/RITA, 525 Brooks Road, Rome NY 13441-4505

    10. SPONSOR/MONITOR'S ACRONYM(S) N/A

    11. SPONSORING/MONITORING AGENCY REPORT NUMBER AFRL-RI-RS-TR-2010-076

    12. DISTRIBUTION AVAILABILITY STATEMENT APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. PA# 88ABW-2010-1156 Date Cleared: 15-March-2010

    13. SUPPLEMENTARY NOTES

    14. ABSTRACT In the first part of this project, we investigated performance and power optimization techniques for the floating point unit design as part of the Air Force Research Laboratory (AFRL) cognitive processor project. Our main focus was on exploring design and synthesis methodologies that lead to optimized area and power consumption while fulfilling the performance requirements. Other tasks in this part included tight integration and interaction of logic/physical synthesis, custom circuit design, etc. Simulation and timing analysis results show that our post-layout designs met the area, timing and power requirements of the project. In the second part of the project, we developed a multi-layer cognitive model and algorithm for intelligent text recognition. The algorithm integrates three layers of different cognitive computing models in order to achieve the best accuracy in optical text recognition, as well as the best computation performance on a massively parallel computing cluster. In the first layer, we developed a novel neural network model that performs character recognition from images. The new model is able to provide more than one answer for an input image, which is essential for the second layer, word-level recognition based on cogent confabulation. The word confabulation layer also provides multiple candidates, which are cross-checked by the third layer, the sentence confabulation algorithm. We believe that the multi-layer cognitive model concept invented by this project has significant innovation potential in the areas of optical text recognition, machine learning and natural language processing.

    15. SUBJECT TERMS Floating Point, Synthesis, Cognitive Computing, Brain State in a Box, Confabulation

    16. SECURITY CLASSIFICATION OF: a. REPORT U; b. ABSTRACT U; c. THIS PAGE U

    17. LIMITATION OF ABSTRACT UU

    18. NUMBER OF PAGES 35

    19a. NAME OF RESPONSIBLE PERSON Thomas Renz

    19b. TELEPHONE NUMBER (Include area code) N/A

    Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std. Z39.18

  • TABLE OF CONTENTS

    1.0 Summary
    2.0 Introduction
    2.1 IEEE Standards in Floating Point Numbers and Computations
    2.2 Cognitive Models and Algorithms for Intelligent Text Recognition
    2.2.1 Brain-State-in-a-Box Neural Network Model
    2.2.2 Confabulation Theory
    3.0 Floating Point Unit Design and Synthesis
    4.0 Multi-layer Cognitive Model in Intelligent Text Recognition
    4.1 Overview of the algorithms, software and hardware platforms
    4.2 Modified BSB algorithm with racing mechanism
    4.3 Performance evaluation of the intelligent text recognition program
    5.0 Conclusions
    6.0 References
    7.0 List of Symbols, Abbreviations, and Acronyms

  • LIST OF FIGURES

    Figure 1: I/O pins and descriptions for single precision floating point adder/multiplier
    Figure 2: A confabulation example
    Figure 3: The ASIC-style design and synthesis flow for FPU
    Figure 4: Screen shots of the final layouts
    Figure 5: Projected performance and power roadmap for FPU
    Figure 6: Mixed ASIC and custom design style for multipliers
    Figure 7: Schematics of custom multiplier design
    Figure 8: (a) A tainted text image (b) Layers of intelligent text recognition
    Figure 9: Multi-layer hybrid cognitive model for ITR
    Figure 10: Implementation platform for ITR at Binghamton University
    Figure 11: Overall software flow and partition
    Figure 12: Training vector for letter a in Times font
    Figure 13: Illustration of the training and recall processes of character recognition
    Figure 14: 1/2/3/5-scratch input images of a
    Figure 15: Screen shot of the ITR program at work
    Figure 16: Average word confabulation time versus scratch numbers
    Figure 17: Average word confabulation time versus scratch numbers Novel
    Figure 18: Percentage of word confabulations vs. scratch probability
    Figure 19: Percentage of incorrect word Confabulations vs. scratch severity - 20%
    Figure 20: Percentage of incorrect word Confabulations vs. scratch severity - 40%

  • LIST OF TABLES

    Table 1. IEEE 754 Rounding Modes
    Table 2. IEEE 754 Status Flags
    Table 3. Area and power consumption numbers after logic synthesis
    Table 4. Area and timing numbers after physical synthesis
    Table 5. Convergence numbers for 1-scratch images
    Table 6. Convergence numbers for 5-scratch images
    Table 7. Statistics of testing documents
    Table 8. Sentence confabulation accuracy results

  • 1.0 SUMMARY

    In the first part of this project, we investigated performance and power optimization techniques for the floating point unit design as part of the Air Force Research Laboratory (AFRL) cognitive processor project. Our main focus was on exploring design and synthesis methodologies that lead to optimized area and power consumption while fulfilling performance requirements. We also obtained accurate power and area estimates for the final design from the synthesis and simulation flow. Other tasks in this part included tight integration and interaction of logic/physical synthesis, custom circuit design, etc. Simulation and timing analysis results show that our post-layout designs met the area, timing and power requirements of the project.

    In the second part, we developed a multi-layer cognitive model and algorithm for intelligent text recognition. The algorithm integrates three layers of different cognitive computing models in order to achieve the best accuracy in optical text recognition, as well as the best computation performance on a massively parallel computing cluster. In the first layer, we developed a novel neural network model that performs character recognition from images. Unlike other neural network models, the new model is able to provide more than one answer for an input image. This feature is essential for the second layer, which is word-level recognition based on cogent confabulation. Similarly, the word confabulation layer is able to p
