
An introduction to VBX Aug 2014


Introduction

www.verbyx.com

[email protected]

August 2014


Verbyx Inc.

A Corporation registered in the State of Wyoming

Incorporation Date 4/21/2011

Registered Address (not to be used for correspondence)

125 S King St

Jackson

WY 83001

Mailing Address

4417 13th St No 154

St Cloud, FL 34769

Statement of Confidentiality and Copyright

© 2014 Verbyx Inc. All rights reserved. Written and published by Verbyx Inc. Except for the purpose of evaluating this proposal, no part of this document may be reproduced, stored in, or introduced into a database or retrieval system, or transmitted, in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), without the prior written permission of Verbyx Inc.

Verbyx Inc. reserves all rights in the confidential information and intellectual property contained in this document. This document contains information relating to the business, commercial, financial and technical activities of Verbyx Inc. This information is intended for the sole use of the recipient only and the disclosure of this information to a third party would expose Verbyx Inc. to considerable disadvantage.

The products or architecture names mentioned in this document are registered trademarks, trademarks, and trade names of their respective owners.

Printed in United States

Verbyx Inc. – Commercial in Confidence www.verbyx.com [email protected] Page 2


Contents

EXECUTIVE SUMMARY
INTRODUCTION
  The Industry
  Verbyx the Company
  Verbyx Products
THE PEOPLE
  Management
    Gary Pearson – CEO and Co-founder
    Chang Qing Shu PhD – Chief Scientist and Co-founder
    John Sawyer – Chief Technology Officer
  The Board of Advisors
    Mark Rogerson
    Marty Sinicrope
    Kevin Brown
    Graeme Riley
THE SPEECH RECOGNITION MARKET
THE VERBYX ADVANTAGE
A BRIEF DISCUSSION OF ASR TECHNOLOGY
  Natural Language Processing
  Language, Dialect and Accent Support
THE VERBYX DIFFERENCE
  The Verbyx ASR - VRX
  VRX Developments
  The VRX Difference
VRX CAPABILITIES AND FEATURES
  Primary ASR Features
  Voice Model Language Support
VERBYX INTELLECTUAL PROPERTY
OUR BUSINESS
  CUSTOMERS
    CURRENT CLIENTS
    NEAR TERM OPPORTUNITIES
APPENDIX A - Chang-Qing Shu Ph.D.
  SUMMARY
  PATENTS
  HONORS
  SKILLS
  EDUCATION
  EXPERIENCE
    Adacel System Inc. Orlando, FL
    Senior Scientist 2007 – 2011
    Convergys Reston, VA
    Senior Speech Recognition Scientist 2005 – 2007
    Carnegie Speech Company Pittsburgh, PA
    Speech Recognition Scientist/Engineer 2002 – 2005
    Speech Solution Group, BBN Technologies, Verizon Cambridge, MA
    Researcher/Engineer 1994 – 2001
    Invented and developed new capabilities for Speech Recognition Engine:
    Improved Hark for recognition accuracy, memory usage, and running time:
    Developed application-specific language model:
    Image Business Systems, Inc. New York, NY
    Researcher/Engineer 1991 - 1994
    Image Processing Laboratory, New Jersey Institute of Technology Newark, NJ
    Postdoctoral Fellow 1989 - 1991
  PUBLICATIONS
    Conference Papers Speech Recognition
    Journal Papers Image Processing
    Conference Papers Image Processing
    Journal/Book’s Papers-Physics


EXECUTIVE SUMMARY

1. Verbyx is a company at the cutting edge of speech recognition technology development
2. We have no debt
3. We are approaching our 4th anniversary as a US Corporation
4. We have an outstanding product
5. We have a great team
6. We have happy customers
7. We have a number of unique pieces of intellectual property

If any of this excites you then read on.

INTRODUCTION

Automatic Speech Recognition (ASR) has promised a change in the way that humans interact with machines. For over 30 years the hope has been that the technology can liberate us from our attachment to the keyboard and mouse. ASR is not only the interface between man and machine; it is the bridge to universal language translation and global communication.

Automatic Speech Recognition (ASR) is the ability to take spoken words and sentences and convert them, through the use of computer programs, into a machine-readable format. With speech converted into machine-readable format, the potential for what can be done with it is only limited by our imagination.

At the core of all speech recognition enabled applications is a speech recognition engine. For example, the popular Apple iPhone application Siri and the Dragon NaturallySpeaking dictation program have completely different uses, yet they both share a common speech recognition engine from the same vendor.

Verbyx has a clear vision of the path needed to make ASR ubiquitous in our daily lives by solving the fundamental limitations of current ASR technologies. Verbyx has significantly advanced the key technologies required to greatly improve the accuracy of today’s technology, making it applicable to many more applications and domains. Verbyx has also developed a technology that significantly reduces the cost of ASR development. It will permit substantial growth in the market by making ASR economically viable in many more languages, dialects and accents.

We have created a new approach to speech recognition that will dramatically alter the future of how we interact with machines and each other. Verbyx ASR technology promises to provide a path to the future that the industry has been seeking since the advent of the technology in the 1950s.


The Industry

The speech recognition market is divided into two categories: a handful of companies, such as Verbyx, that develop core speech recognition technologies, and a much larger segment that develops products and services that utilize these core speech recognition technologies.

Verbyx the Company

Verbyx was born out of the frustrations of using today’s ASR in developing applications in the simulation and training and command and control industries. These frustrations included the lack of progress in the advancement of the technology, poor (and in some cases non-existent) customer support, considerable expense and the time-consuming need to develop “solutions” to overcome the limitations of ASR.

Verbyx was incorporated early in 2011 in Jackson, Wyoming. We have, however, been developing our ASR technology since 2009. Our Chief Scientist has been developing new theories and algorithms since the early 1990s. We are a team of experienced industry and business professionals.

Verbyx has no debt, venture capital investment or angel investment. Verbyx was initially funded from founder resources and for the past 18 months we have operated using customer generated revenues.

Verbyx Products

Verbyx generates revenue from selling speech recognition products and services:

● VRX Prime – The software responsible for the speech recognition process.

● Acoustic Models (or Voice Models) – Every speech recognition engine requires an acoustic model. The model tells the speech engine how to interpret the specific characteristics of a speaker or, more typically, a group of speakers. The acoustic model is specific to language, accent and even subject matter domain; e.g., a United States English model is not suitable for an Australian English speaking user.

● Consulting services – Verbyx provides speech recognition consulting services.

Verbyx does not compete with its customers. To avoid this conflict of interest, we do not develop our own end user applications; we simply provide the best ASR technology for others to use.

THE PEOPLE

Management

The team behind Verbyx has a powerful combination of science and technology skills and management and industry experience that best positions us for success in our mission to provide the world’s best speech recognition capabilities. While all of our employees are significant contributors to our business, only executive personnel are listed for brevity.

Gary Pearson – CEO and Co-founder

Gary has over 30 years of management experience, with 16 years in technology companies. Gary has recently served as the Chief Operating Officer for Adacel Systems. Adacel is a company that leads the world in speech recognition systems for advanced military fighter aircraft and highly sophisticated simulation training systems. Under his strategic leadership, Adacel has developed a reputation as the go-to company for speech recognition enabled applications in complex environments.

Chang Qing Shu PhD – Chief Scientist and Co-founder

Dr. Shu is the originator of many of the innovative features and technologies in VRX. Dr. Shu has a Ph.D. in Theoretical Physics, an M.S. in Theoretical Physics and an M.S. in Computer Science. He has significant experience in ASR engine development, voice and language model creation and the development of applications that make use of ASR. Dr. Shu in his remarkable career has been a primary contributor to reduced error rates in ASR applications by most of the leading speech technology companies.

Dr. Shu began his career as a research engineer; he played a significant role in the development of the BBN Hark recognizer and has developed numerous applications for Convergys and Carnegie Speech Company. Prior to founding Verbyx, Dr. Shu was the speech scientist for Adacel Systems, a company that specializes in complex speech recognition enabled simulation systems for training pilots and air traffic controllers. The combination of the practical skills typically found in speech recognition engineers with the in-depth knowledge of the pure science of speech recognition and over 30 years of industry experience makes Dr. Shu unique in his field. Refer to Appendix A for further details on the achievements of Dr. Shu.

John Sawyer – Chief Technology Officer

John is the designer of the Verbyx VRX speech engine and manages the Verbyx engineering group. As an accomplished software engineer and technical leader with 17 years’ experience, he brings extensive software development expertise to Verbyx. He is an expert in large-scale object-oriented software design and development, and possesses a deep knowledge of real-time simulation systems and speech recognition. Prior to joining Verbyx, John held technical leadership roles at Raytheon and Adacel building simulation and training systems for the air traffic control and air defense markets. John studied Mathematics at the University of Surrey (UK) and holds a BSc Honors in Software Engineering from Coventry University (UK).

The Board of Advisors

Mark Rogerson

Mark is Chief Executive Officer at Speedy Services, a $550 million integrated services provider across infrastructure, industrial, construction and event markets.

Mark previously spent 8 years with Serco Group PLC. During that time, Mark played a key role in the evolution of the company from an £800m to a £4.5Bn, 37-country enterprise, having held multiple P&L and general management roles in the UK and overseas. He led Serco’s Defense and Aviation Business, and was the Operations and Transformation Director in one of Serco’s largest operating divisions. He served on the Board of Serco North America (NA), where he led the competitive due diligence and subsequent transition and integration for Serco’s then largest M&A transaction - a $500M revenue federal IT services company.

Marty Sinicrope

Marty is the President and Chief Operating Officer for GlobalTranz, a $300 million logistics and transportation company headquartered in Phoenix, Arizona. Marty has a 29-year track record of developing new business opportunities and implementing key business strategies, while providing insight and guidance in all aspects of management, sales and corporate direction. Since his appointment in May 2012, Marty has proven his worth by doubling the company’s revenue.

Kevin Brown

Kevin is the Managing Director at VoxPeritus, a company that provides consulting services to the customer experience and customer services industry. VoxPeritus, founded by Kevin, specializes in voice channel and voice user interface design and optimization. Kevin is a proven technology, customer service, operations and sales leader with 30 years of global experience designing and managing Customer Experience solutions. He possesses a unique and diverse background supporting the healthcare, transportation, automotive, retail, financial and high-tech industries. Kevin has previously served as the Chief Architect of Speech Solutions and Senior Enterprise Contact Center Architect for Hewlett Packard.

Graeme Riley

Graeme is a commercially aware leader with substantial experience in the aviation, software and finance industries. Over the last 25 years Graeme has held positions ranging from operational Air Traffic Control, Project Management, Subject Matter Expert and IT security (both operational and management) through to senior management positions with responsibility for large global teams. He has worked extensively across Europe, Middle and Far East, particularly China, as well as the Australasian region. Graeme holds a degree in Computer Science as well as a Certified Information Systems Security Professional (CISSP) qualification. He is also trained in Financial Management and Computer Forensics. Graeme is currently based in Europe and works as Head of IT for a Global Swiss Insurer with responsibility for the end-to-end IT service for the Middle and Far East regions.


THE SPEECH RECOGNITION MARKET

Speech recognition companies fall into one or both of the following niches:

● Application Developers - Companies that create and sell applications that have a speech recognition component, e.g., in-car speech recognition, computer dictation systems and automatic telephone answering systems (Interactive Voice Response, or IVR). These companies for the most part do not own their own ASR capabilities but license ASR technology from others.

● Core Technology Providers (i.e., Verbyx) - Companies that develop the underlying speech recognition engines used by application developers.

While there are thousands of companies that fall into the application developer category, there are only a handful of companies that can provide core ASR technologies. Details on major competitors can be found in Appendix A.

Speech recognition technologies are becoming integral parts of products and services that span an ever increasing array of industries. This maturing industry owes much of its growth to advances from the critical triad of automatic speech recognition (ASR), text-to-speech (TTS), and voice biometrics technologies.

Companies across all sectors, seeking a competitive edge that will differentiate them in an increasingly crowded business environment, want products that can help them retain as well as grow their customers. Brokerages, airlines, and banks rely on speech recognition functionality to not only enhance their customer contacts, but also to comply with security requirements dictated by law and the security-conscious expectations of customers.

To address the rapidly growing mobile traffic demand of such highly developed regions as North America and Europe, speech recognition providers are partnering with manufacturers who are incorporating voice-activated multimodal options into their products. These applications do everything from helping drivers navigate to their destination and workers voice-pick warehouse inventory to helping doctors automate medical transcription processes and allowing Web users to browse by voice commands.

Marketers with a watchful eye are not only training their sights on the pent-up product demand of growing Asia-Pacific populations, they are also factoring in the potential of the emerging middle class in Latin America when they develop their marketing strategies.


Figure: Speech recognition is finding acceptance in increasingly demanding applications such as aircraft cockpits.


A BRIEF DISCUSSION OF ASR TECHNOLOGY

There is a common perception that Automatic Speech Recognition (ASR) does not work outside of controlled domains. The typical experience of the general population is that of the telephone automated voice system. We have all experienced at some point just how terrible that interaction can be.

Speech engine vendors continue to take a common approach to the underlying technology. It is this narrow vision that has led to the stagnation of the technology. The approach taken by current vendors has remained largely unchanged for over 20 years.

Speech recognition involves an extremely complex core process of determining what has actually been spoken by the user. This core process is fundamentally flawed and produces high error rates. The unique nature of how each person speaks only adds to the complication. ASR systems use a number of techniques to take the core recognition results and produce what is in reality a highly educated guess. Increasing accuracy has required extremely large amounts of training data to be processed in order to provide a better “guess”. This has only provided small incremental improvements in accuracy as the core recognition is still very poor.

Natural Language Processing

The industry has shifted to covering up the mediocre performance of ASR by adopting increasingly complex Natural Language Processing / Understanding (NLP/NLU) systems, a form of artificial intelligence (AI). AI has been a very important evolution in voice-enabled applications, and no doubt many of today’s apps would not be possible without AI. However, AI can only do so much to overcome poor recognition accuracy. AI is typically unique to each application, with the best systems requiring many complicated domain specific rules to be created and programmed. Although AI can give the appearance of improved usability, its bespoke nature adds time, complexity and cost to each project. Any improvement at the ASR level can only result in improved AI capabilities. AI complements speech recognition; it does not replace it.

AI is a valuable component in the development of usable applications, but it is a field best left to the many companies that specialize in this domain. Verbyx does not develop AI technology and will make use of the best AI systems the industry can provide.

Language, Dialect and Accent Support

The second significant ASR issue is one of language, dialect and accent support. Existing technology performs significantly better for those languages for which large amounts of training data can be provided but poorly or not at all for less common languages, regional dialects and individual accents. Thousands of hours of accurately transcribed audio samples are required for each additional language. The cost of preparing this data makes supporting less common languages and dialects uneconomical for all current technology providers. As a result, the benefits of ASR are not available outside of a small percentage of languages.


THE VERBYX DIFFERENCE

Our approach is simple but in practice the execution is extremely complex. This is important as it places a large barrier to entry to any competition that intends to try and follow. Verbyx will offer an ASR that performs at a level well beyond anything else in the market. It will allow Verbyx to rapidly release products to new markets at significantly reduced costs to our customers and partners.

Verbyx has been approached by numerous organizations that are interested in using the Verbyx VRX speech engine. The Verbyx strategy is to win a handful of key contracts with organizations that we believe we can effectively serve and that in turn can provide a mutually beneficial relationship.

It would be understandable to think that speech recognition technology has recently made significant breakthroughs in accuracy and usability. Recent high profile marketing by Apple with the Siri personal assistant on the iPhone has generated enormous amounts of press and web activity telling us all how this time it’s the “real thing”. As impressive as Siri is, it is in fact a combination of speech recognition technology that has remained largely unchanged for over 20 years and an increasingly complex natural language processing system.

The Verbyx ASR - VRX

Verbyx has already developed technology that can compete comfortably with existing vendors using the traditional approach. However, Verbyx is not interested in being just another ASR vendor.

The Verbyx ASR engine (VRX Prime) supports both the constrained grammar model approach and the statistical language model approach to speech recognition. Furthermore, it supports both methods within a single version of the ASR. In both internal and external independent tests, it compares very favorably to the dominant industry players. VRX has been adopted by a $50 million technology company as the cornerstone of its speech recognition strategy. Verbyx was also selected to develop a new approach to key-phrase spotting for speech analytics. The Verbyx SKIP process is capable of processing audio at well over 50x real time. The decision to replace what was previously considered a best-in-class offering was based on both performance and value.

VRX Developments

While our competition continues to make incremental improvements through the use of increasingly large training data samples, Verbyx has attacked the issue at a “molecular” level. We asked ourselves two questions. First, instead of trying to develop a more effective model for handling large data, how can we eliminate the large data requirement? Second, what are the main failures of the existing science and what can be done to eliminate those failings? We deliver a product that is sharply focused on the core phoneme recognition. Verbyx scientists use their deep understanding of physics, mathematics, linguistics and biology (aural physiology, neuroscience) to deliver genuinely groundbreaking improvements.

The VRX Difference

Verbyx decided from the outset that the solution was to fundamentally change the science and address the weaknesses of speech recognition. We have already made a number of real and practical breakthroughs in ASR technology. The first breakthrough concerns the voice model, and with it the cost and time taken to develop a speech recognition system for new languages and accents.

The voice model (also known as an acoustic model) is a component used in speech recognition that represents how people pronounce sentences, words and pieces of words called phonemes. It is critical to the accuracy of an ASR. A voice model is required for each new language and in most cases for new accents of the same language, e.g., the voice model for a US English speaker is not the same as a voice model for a United Kingdom English speaker. Additionally, a voice model created to work well with a native of New York City will not work as well for a native of New Orleans.

Voice models require hundreds and preferably thousands of hours of specialized audio files for creating or training the voice model. Creating a voice model can cost hundreds of thousands of dollars and take many months of computer processing. The audio files are often not available in sufficient quantity. This is the reason speech recognition works better for common languages but is poor or non-existent for less popular languages. It is also the reason accent specific speech recognition is seldom available; it is just not economical.

Figure: The recognition process (Recognizer) is dependent on the language and voice models. Language Model – describes how words can be combined to form longer sequences of words. Voice Model – a representation of how people speak.

Verbyx has, through detailed scientific study, discovered a new law of speech recognition that has eliminated this issue and opened up a significantly larger market to our ASR technologies. Our standard voice model (created with less than 20 hours of training data) produced an error rate 60% lower than the leading competing product in an independent customer side-by-side test. Of note, the competing product had undergone 12 years of customer evolution in fielded applications and its model had been trained with thousands of hours of data. Our latest technology was recently used to create a voice model using less than 60 minutes of training audio and in only a few minutes of computer processing time. Such a capability opens new addressable markets which were previously not viable.
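A figure such as "60% lower" is a relative reduction in error rate. This document does not state which metric the customer test used, so the sketch below assumes word error rate (WER), a widely used ASR measure: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function names and the sample numbers are illustrative assumptions, not Verbyx test data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Hypothetical numbers: a baseline WER of 25% against a new WER of 10%
# corresponds to a relative reduction of about 60%.
print(relative_reduction(0.25, 0.10))
```

Note that a relative reduction says nothing about the absolute error rates involved, so both numbers are needed to compare two engines fairly.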

The second hurdle is the underlying poor accuracy of the ASR. While some applications appear to operate well for some people, these applications required significant expenditure in natural language processing and other techniques to overcome the accuracy limitations. All speech recognition engines produce an error rate of approximately 50% at the core of the recognition process (phoneme recognition). Developers spend significant sums to introduce capabilities that allow the ASR to make its best guess from the inaccurate information that it is given. Verbyx is currently developing new speech recognition technology using new science that is expected to reduce the errors at the core of the process by possibly as much as 80%. Any improvement in the currently stagnant process has an exponential effect on the performance of the application, resulting in a dramatic reduction in the cost of development and deployment.
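To see why an improvement at the phoneme level can compound, consider a deliberately simplified model in which a word is recognized correctly only if every one of its phonemes is correct, and phoneme errors are independent. This is an assumption made purely for illustration; real engines, including VRX, use language models and context to recover from phoneme errors, so actual word accuracy is far higher than this model suggests. The 50% and 80% inputs are the figures quoted above.

```python
def word_accuracy(phoneme_error_rate: float, phonemes_per_word: int) -> float:
    """Chance that all phonemes in a word are correct, assuming independent errors."""
    if not 0.0 <= phoneme_error_rate <= 1.0:
        raise ValueError("phoneme_error_rate must be between 0 and 1")
    if phonemes_per_word < 1:
        raise ValueError("phonemes_per_word must be at least 1")
    return (1.0 - phoneme_error_rate) ** phonemes_per_word


current_error = 0.50                           # approximate figure quoted in the text
improved_error = current_error * (1 - 0.80)    # an 80% reduction gives roughly 0.10

# Compare the two error rates for words of increasing length.
for length in (1, 3, 5):
    before = word_accuracy(current_error, length)
    after = word_accuracy(improved_error, length)
    print(f"{length} phonemes: {before:.3f} -> {after:.3f}")
```

Under these assumptions the gap between the two error rates widens with word length, which is the sense in which a gain at the core of the process is amplified further up the chain.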

VRX CAPABILITIES AND FEATURES

VRX Prime is a standards-based ASR that supports both constrained grammars and statistical language models. Both approaches are available within the same ASR engine.

Constrained Grammar – Commonly used where it is possible to define all of the specific phrases that need to be recognized. Accuracy is typically much higher than with a statistical language model, but more upfront work is required from the application developer. Example applications: telephone IVR, command and control, automobile control systems.

Statistical Language – Commonly used where free or conversational speech recognition is required. It does not require a comprehensive list of supported phrases to be defined, but it is generally less accurate than a constrained grammar. Example applications: voice mail transcription, call analytics.
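The contrast between the two approaches can be sketched in a few lines of Python. This is an illustration only and does not use the VRX API: the phrase list and training sentences are invented, and a bigram model with add-one smoothing is assumed here as one common, simple form of statistical language model.

```python
from collections import Counter

# Constrained grammar: the application lists every phrase it accepts.
# These phrases are hypothetical examples.
GRAMMAR = {"call home", "call office", "play music", "stop music"}


def grammar_accepts(phrase: str) -> bool:
    """A phrase is valid only if it was explicitly defined up front."""
    return phrase in GRAMMAR


class BigramModel:
    """Tiny bigram language model with add-one smoothing (illustrative)."""

    def __init__(self, sentences):
        self.history_counts = Counter()
        self.bigram_counts = Counter()
        self.vocab = set()
        for sentence in sentences:
            words = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(words)
            self.history_counts.update(words[:-1])
            self.bigram_counts.update(zip(words, words[1:]))

    def probability(self, phrase: str) -> float:
        """Product of smoothed bigram probabilities for the phrase."""
        words = ["<s>"] + phrase.split() + ["</s>"]
        prob = 1.0
        for prev, word in zip(words, words[1:]):
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.history_counts[prev] + len(self.vocab)
            prob *= numerator / denominator
        return prob


model = BigramModel(
    ["call home", "call the office", "play some music", "stop the music"]
)

print(grammar_accepts("call home"))        # a listed phrase
print(grammar_accepts("call the office"))  # not listed, so the grammar rejects it
print(model.probability("call the office") > model.probability("office the call"))
```

The grammar gives a hard yes or no for each phrase, while the statistical model assigns every word sequence some probability and simply ranks plausible word orders above implausible ones.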

Access to VRX and its associated features is through a C++ or C API. Existing deployments of VRX are Windows based; all VRX development and design supports porting VRX to other operating systems with minimal effort and modification.

Primary ASR Features

Key Phrase Identification

Dynamic Grammar Rules - VRX supports dynamic grammars, which add rules to a static pre-loaded grammar.

Partial Results - VRX provides partial results during the real-time recognition process.

Grammar Weighting - VRX supports grammar rule weights to adjust the relative weight of each rule.

W3C ABNF Grammar Format - The VRX grammar compiler supports W3C ABNF grammars.

Auto Flat – VRX automatically determines whether flattening or not flattening a grammar gives the best performance. Additionally, VRX Auto Flat is capable of automatically flattening specific parts of the grammar.

Auto-pronunciation - VRX provides auto-pronunciation, which the grammar compiler uses when words are not defined in the dictionary.

N-best Results - VRX returns a configurable number of hypothesis results, called the n-best results, ordered by confidence level.

Natural Language Interpretation - VRX supports W3C SISR tags attached to grammar rules.

Confidence Score - VRX returns a confidence score at word level for each result.

Acoustic Model Tuning - Verbyx can train new acoustic models and tune or customize existing ones. This allows performance optimization:

• for a population with specific accents

• where background noise is more prominent

• for specific words in a vocabulary

Multi-word Pronunciations - VRX supports multi-word definitions across multiple sub-grammar rules.

Crossword Triphone Handling – VRX automatically handles the recognition difficulties caused by co-articulation across word boundaries.
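The N-best Results and Confidence Score features listed above can be pictured with a short sketch. The classes and values below are hypothetical and are not the VRX API. In particular, taking the utterance confidence as the mean of the word-level confidences is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    confidence: float  # word-level confidence in [0, 1]


@dataclass
class Hypothesis:
    words: List[Word]

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)

    @property
    def confidence(self) -> float:
        # Assumption: utterance confidence is the mean of word confidences.
        return sum(w.confidence for w in self.words) / len(self.words)


def n_best(hypotheses: List[Hypothesis], n: int) -> List[Hypothesis]:
    """Return the n highest-confidence hypotheses, best first."""
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)[:n]


# Invented candidate results for a single utterance.
candidates = [
    Hypothesis([Word("turn", 0.90), Word("left", 0.50)]),
    Hypothesis([Word("turn", 0.90), Word("right", 0.80)]),
    Hypothesis([Word("burn", 0.30), Word("right", 0.80)]),
]

for hyp in n_best(candidates, n=2):
    print(f"{hyp.confidence:.2f}  {hyp.text}")
```

An application would typically take the first entry, and could fall back to later entries or ask the user to confirm when a word-level confidence is low.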

Voice Model Language Support

As previously mentioned, Verbyx has developed unique intellectual property that allows new voice models to be trained with limited training data in just a few days. In addition, we can adapt existing models for improved performance using our voice model adaptation process, which takes additional training data and within a few minutes produces an optimized version of the model. Verbyx can very quickly provide support for the following acoustic models:

English: UK, US, Indian, Canadian, Australian, and any variant of non-native English accent

French

German

Italian

Dutch

Portuguese

Turkish

Russian

Ukrainian

Spanish

For languages for which we do not have existing models or training data, models can be generated with as little as 10 hours of transcribed audio and an appropriate dictionary.

VERBYX INTELLECTUAL PROPERTY

To date, Verbyx has created more than 20 patentable and commercially viable technologies and ideas. Most of our evolving technology is made possible by our discovery of 3 new laws of speech recognition. The discovery and proof of these laws have provided a fundamental breakthrough in the way we think about and implement speech recognition. A partial list of those technologies is given here.

Description | Status
Shu’s Laws – Discovery and proof of 3 new fundamental laws of speech recognition | Complete
ASR Decoder (VRX) – New design and implementation of constrained grammar recognition | Complete
Micro Voice Models - A method for recognizing, with high accuracy, words that are typically difficult for an ASR, e.g., Yes/No, Digits, Alpha Beta spelling | Complete
Voice Model Merging – A method for merging two voice models each with different base phoneme sets | Complete
Synthetic Voice Model – A method for creating voice models with a limited quantity of training data | Complete
SKIP (Spoken Key-Phrase Identification Process) – A method for identifying key-phrases in very fast time | Complete
Bi-Phone Based Voice Model – New tools and methods for creating Bi-Phone level voice models | Complete, further optimizations possible
Rapid Voice Model Adaptation – Very fast method for updating voice models with additional training data | Complete, further optimizations possible
New Voice Model Training Method – Based on discovery of 3 new laws of speech recognition | Work in Progress
Voice Model Creation with Limited Training Data – New method for creating voice models with very limited training data | Complete, further optimizations possible
Speaker Independent Models – New method for creating user independent models for voice biometrics speaker identification | Work in Progress
Phoneme Distance Calculation – New method for calculating phoneme distance matrix for ASR performance optimization | Work in Progress
Voice Biometrics – New methods for combined speech recognition, speaker identification and speaker verification | Work in Progress
Confidence Scores – New methods for calculation of confidence scores at the utterance, word and phoneme levels | Work in Progress
Very Fast Time Recognition - Method for recognition at the phoneme level in very fast time | Work in Progress
Keyword Identification - Method for identifying keywords in large audio streams in very fast time | Work in Progress
Spoken Language Learning - New methods of measuring phonetic differences between student and reference pronunciation | Early Stage
Phoneme Segmentation - New method for extracting critical phoneme segmentation features for significantly improved accuracy | Early Stage
Noise Handling – New method for improving speech recognition performance in high noise environments | Early Stage
Multi Language Recognition – New method for handling multi-language recognition in a single instance of a decoder | Early Stage
Bi-Phone Based Recognition - New approach to Bi-Phone level based speech recognition | Early Stage

OUR BUSINESS

Adoption of speech recognition can be a complex process for many organizations, which in turn places a heavy burden on Verbyx resources. This burden distracts us from our primary technology goals. For the time being, Verbyx has chosen to be very selective when taking on new clients. This approach has been successful for the past 3 years.

The speech recognition industry is on the cusp of an explosion in the adoption of the technology into many more devices than we find today. The market for Verbyx will continually expand as the technological innovations become incorporated into the core VRX speech recognition engine.

The Verbyx philosophy of business development is to focus on key players in speech recognition vertical markets. We do not intend to build speech-enabled applications but to license our technology to those that do. This approach minimizes the need for Verbyx to interact with potentially thousands of customers and, as a consequence, permits us to maintain a very low cost base for the organization.

CUSTOMERS

Even at this relatively early developmental stage, VRX is on par with, or better than, what is considered the best speech recognition system currently available. Verbyx is pursuing several key accounts that have expressed a large degree of interest in licensing the VRX speech engine. These accounts are typically established companies that currently license speech engines from competing vendors and are seeking alternative solutions.

Verbyx has signed non-disclosure agreements with a number of these organizations and, as such, detailed information relating to these clients and potential clients cannot be provided. Brief descriptions of a number of active opportunities are listed below in a manner that does not breach these agreements. It is important to note that all of the opportunities listed below can be satisfied with the same base Verbyx technology. Some of the applications do require supplementary features but, on the whole, Verbyx needs to develop only a single speech engine to serve multiple speech application markets.

CURRENT CLIENTS

$50 Million Technology Company

This client previously licensed speech recognition engines from two separate vendors, Nuance and SRI. Verbyx has a signed contract that guarantees revenue with considerable upside potential. The company has chosen Verbyx as its partner and VRX as the cornerstone of its speech recognition strategy. VRX Prime has been delivered and is being integrated with the client’s products. This company is set to see significant speech-related growth opportunities from new industry recommendations and a growing reputation as the go-to provider in its existing markets. It has developed a strong reputation for implementing speech recognition in simulation and training as well as command and control applications for military and civilian customers, in particular aircraft cockpit avionics integration.

Computer Gaming Company

Verbyx has a contractual agreement to provide VRX to a small niche computer gaming company. This company has previously been successful in developing very popular add-on products for Microsoft Flight Simulator. Verbyx will also provide consulting software development services for the integration of speech components within the game.

Independent Speech Vendor Reseller Agreement

Verbyx has signed a reseller agreement with this company. Our partner is well known in the speech industry with a particular focus on telephone Interactive Voice Response (IVR) systems.

Consulting Services

Verbyx has already developed a reputation as an expert in the industry. We offer free consulting services to start-ups and consider this effort to be a key part of our marketing campaign. Many of these companies have expressed a desire to switch to VRX.

Call Analytics

Call Analytics is a growing industry in which speech recognition is used to analyze the content of customer service calls and provide valuable metrics and feedback.

Verbyx is working closely with a leading organization in the call analytics market. We have a contract to develop a very fast time keyword analysis system that has generated considerable excitement. The solution has been delivered and is in pre-production testing. Verbyx was called in when existing vendors were unable to provide a solution for the client’s specific needs.

Language and Voice Conversion

Verbyx has developed and delivered key enabling technologies that allow this company to provide unique solutions for voice communications services.

APPENDIX A - Chang-Qing Shu Ph.D.

SUMMARY

• 9 U.S. patents (5 issued and 4 in review) plus 15 algorithms in automatic speech recognition (ASR).
• Over 15 years of ASR industry experience in both recognizer-level and application-level algorithms, source code and interface development. Developed 19 ASR products including ATC, Telephony, Education, Automobiles and stock trading.
• Strong capability in improving recognition accuracy, memory usage, and CPU usage.
• Expert in audio data mining in preparation of pure-training data from raw data.
• Strong capability in creating specific language models, training acoustic models and tuning decoders for various applications/ASR engines.
• Expert in designing and running ASR experiments for algorithmic development and system diagnosis.
• Software development using C, C++, Java, VC++, Perl, Tk/Tcl, VoiceXML, HTML, JavaScript, Lisp, FORTRAN, BASIC, PROLOG, MATLAB, and Shell scripting.
• Over 26 publications in journals and books.
• Ph.D. and M.S. in Physics, and M.S. in Computer Science.

PATENTS

• Chang-Qing Shu, “Selected phoneme rejection in automatic speech system”, US Patent #6016470, January 2000.
• Chang-Qing Shu and H. Shu, “System and methods for implementing cepstra-based segmentation in speech recognition systems,” US Patent #6959278 B1, Oct 25, 2005.
• Chang-Qing Shu, “Using word confidence score, insertion and substitution thresholds for selected words in speech recognition”, filed 24 Oct 2008, S/N 12/258,098.
• Chang-Qing Shu, “System and method for making user dependent language model”, filed 3 March 2009, S/N 12/396,933.
• Chang-Qing Shu, “System and method for training an acoustic model with reduced feature space variation”, filed 30 March 2009, S/N 12/413,896.
• Chang-Qing Shu, “Phonetic distance measurement system and related methods”, filed 22 July 2009, S/N 12/491,765.
• Chang-Qing Shu, Han Shu and John Merwin, “Integrated language model, related system and methods”, filed 9 Feb 2010, S/N 12/701,788.
• Chang-Qing Shu and Dezhi Liao, “System and Methods for Automatic Microphone Volume Setting”, filed 13 Jul 2010, S/N 12/835,440.
• Chang-Qing Shu, “System and Method for Merging Audio Data Streams for Use in Speech Recognition Application”, filed 20 Aug 2010, S/N 12/860,245.

HONORS

• Has been called a “PIONEER” in motion analysis in the field of image processing.
• BBN Publication Awards in 1996, 1997, 1998, and 2000.
• Recipient of one of the first 50 Ph.D.s ever awarded in mainland China.

SKILLS

Programming languages: C, C++, Java, VC++, Perl, Tk/Tcl, VoiceXML, HTML, JavaScript, Lisp, FORTRAN, BASIC, PROLOG, MATLAB, and Shell scripting.

Programming tools: ClearCase, CVS, GDB, Purify, and Quantify.

Platforms: UNIX/LINUX and MS Windows.

Databases: SQL, Sybase, ORACLE and Access.

EDUCATION

• Continuous speech recognition course, Massachusetts Institute of Technology, 1995.
• Postdoctoral fellowship in image processing, New Jersey Institute of Technology, 1989-1991.
• Ph.D., M.S. in Physics, Institute of Physics, Academia Sinica, China, 1985 and 1981.
• M.S. in Computer Science, Queens College, City University of New York, 1988.
• B.S. in Physics, Nanjing Normal University, Nanjing, China, 1966.

EXPERIENCE

Adacel System Inc., Orlando, FL
Senior Scientist, 2007 – 2011

• Invented 7 US patents and filed them with the USPTO.
• Created an O(n) algorithm for decoding the FSG grammar.
• Created an algorithm for speeding up decoder processing and achieved xRT < 0.1.
• Created a tool, and the algorithm behind it, to segment an audio recording ~10 hours long into a series of single-command wave files of ~20 seconds each.
• Reduced an application grammar size from 9! to 45 with a single reasonable assumption.
• Invented an algorithm, and created a tool based on it, for automatic microphone volume control. The tool can be used in all ADACEL speech recognition products to increase recognition accuracy. It also made the user interface friendlier.

Convergys, Reston, VA
Senior Speech Recognition Scientist, 2005 – 2007

• Successfully reduced ASR (absolute) error rate by up to 20% in 15 IVR products built on the NUANCE or SpeechWorks ASR engine.
• Created 8 tuning reports with up to a 5% increase in the business performance index in IVR applications.
• Created/corrected over 600 grammars for various applications such as banks, governments, airlines and insurance.

Carnegie Speech Company, Pittsburgh, PA
Speech Recognition Scientist/Engineer, 2002 – 2005

• Corrected two major algorithm errors in the baseline pitch extraction engine. The number of incorrectly hypothesized zero-pitch frames was reduced from 12% to 0.013%.
• Modified the SPHINX speech recognition engine as follows: a) added dynamic language model in the decoder; b) added rejection language model; c) added two different rejection algorithms; d) created tools for building error rate statistical table; f) modified the interface in VC++ to accommodate the changes in SPHINX.
• Created a batch test system in Perl and VC++ for regression testing and tuning/optimizing of the production system. The tuning/optimizing tools resulted in an error rate reduction of more than 20% for the production system.
• Designed and implemented a waveform splicing algorithm to synthesize incorrectly-pronounced waveforms from correctly-pronounced waveforms. This procedure reduced the need for manually collecting incorrectly-pronounced waveforms, and saved data collection cost of more than 10 person-years.
• Created two web versions of an educational ASR application using Tk/Tcl and Java.

Speech Solution Group, BBN Technologies, Verizon, Cambridge, MA
Researcher/Engineer, 1994 – 2001

Invented and developed new capabilities for Speech Recognition Engine:

• Patented novel phoneme loop rejection: correctly detected 95% of noise and out-of-grammar sentences while falsely rejecting only 0.5% of in-grammar sentences. Sole inventor of this patented algorithm.
• Recognition server: created new tools to generate application-independent acoustic models and dynamic language models; enabled the decoder to utilize the acoustic model and hot-swap the language model; allowed a single decoder to be shared by multiple applications, reducing overall hardware cost for applications.
• Algorithm for producing phoneme distance matrix (PDM): a language-specific PDM is a necessary component for word loop rejection. Invented the algorithm to generate the language-specific PDM by mapping it from a phoneme confusion matrix, and eliminated the need for a human expert in every language.
• Bigram language model builder: with less than 30 lines of source code modification, expanded Hark’s language model capability from finite state grammar to bigram, increasing the number of possible applications.
• Software release for Hark 3.1 and 4.0: as part of the release team, enhanced the Hark decoder and grammar generator for robustness, and fixed a major bug with large silence penalty under time pressure. Ensured an on-schedule release.

Improved Hark for recognition accuracy, memory usage, and running time:

• Language model optimizer: developed a new algorithm which combines nodes at both the word and triphone level to reduce the overall size of the language model. Applied this algorithm to the application of routing calls by person’s name, reducing the language model size by 80% while maintaining the same recognition accuracy.
• Efficient lattice n-best decoder: created jointly with two others an efficient decoder with the ability to produce n-best lists. With the aid of a profiler, found that the word process array data structure was problematic; replacing it with a word processor list reduced the processing time by as much as 95%.
• Crossword grammar builder: enabled usage of a crossword acoustic model, which reduced the recognition error by more than 10% relative.
• Bug fixes for Hark: discovered and fixed more than 30 important bugs throughout all four major components of Hark.

Created data mining algorithm/procedure for preparing training data from raw data:

• Time and transcription matching algorithm for directory assistance system: invented an algorithm to correct erroneous time alignments between audio recording and transcription, resulting in a time saving of 6 human months for transcribing the same audio data.
• Query segmentation for directory assistance system: invented a procedure that segments query utterances by their city and listing portions, improving segmentation accuracy from 80% to 96%.

Developed application-specific language model:

• Keyword plus filler language model for directory assistance system: designed and implemented the new language model, enabled keyword-based recognition and reduced both the false acceptance and the false rejection error rate by more than 10%.
• Noise robustness for voice-controlled appliances in automobiles: applied phoneme loop rejection to increase noise robustness, reducing recognition error rate by 18% relative.
• Automating credit card customer service: designed and constructed a language model for recognizing digits while ignoring non-digit words and noise, reducing recognition error of credit card number sequences by more than 16% relative.
• Automating order filling on stock trading floors: identified and improved recognition of the most frequently confused word by modifying the phonetic spelling, reducing the overall error rate by 10% relative.

Image Business Systems, Inc., New York, NY
Researcher/Engineer, 1991 - 1994

Designed and implemented high-throughput OCR check-reading machines for commercial banks. Designed and implemented 2 of 11 total components: OCR module and form-filling module. Also implemented 3 other components: data verification interface, form-template generation module, and imaging processing module.

Image Processing Laboratory, New Jersey Institute of Technology, Newark, NJ
Postdoctoral Fellow, 1989 - 1991

Helped to establish the lab from its beginning, identified research directions, developed and tested new image processing algorithms, and jointly supervised M.S. and Ph.D. students.

PUBLICATIONS

Conference Papers - Speech Recognition

1. Chang-Qing Shu, “Rejection grammar in speech recognition system,” Proceedings of International Conferences of Signal Processing, pp.1187-1190, 1998.

Journal Papers - Image Processing

1. J. Pan, Y. Shi, and Chang-Qing Shu, “Feedback technique in optical flow determination,” IEEE Transactions on Image Processing, vol.7, no.7, pp.1061-1067, July 1998.

2. Y. Shi, Chang-Qing Shu, and J. Pan, “Unified optical flow field approach to motion analysis from a sequence of stereo images,” Pattern Recognition, vol.27, no.12, pp.1577-1590, 1994.

3. Chang-Qing Shu and Y. Shi, “Direct recovering of Nth order surface structure using UOFF approach,” Pattern Recognition, vol.26, no.8, pp.1137-1148, 1993.

4. Chang-Qing Shu and Jacob Rootenberg, “Minimum upper bound on number of tests for detecting faults in an unaugmented programmable logic array,” International Journal of Systems Science 22, pp.2275-2283, 1991.

5. Chang-Qing Shu and Y. Shi, “On unified optical flow field,” Pattern Recognition, vol.24, no.6, pp.579-586, June 1991.

Conference Papers - Image Processing

1. J. Pan, Y. Shi and Chang-Qing Shu, “A Kalman filter in motion analysis from stereo image sequences,” Proceedings of IEEE 1994 International Conference on Image Processing, vol.3, pp.63-67, Austin, Texas, USA, November 1994.

2. J. Pan, Y. Shi and Chang-Qing Shu, “A correlation-feedback approach to optical flow determination,” Proceedings of IEEE 1994 International Symposium on Circuits and Systems, vol.3, pp.33-36, London, UK, May 1994.

3. Y. Shi, Chang-Qing Shu and J. Pan, “Unified optical flow field approach to motion analysis from a sequence of stereo images,” Proceedings of the 8th IEEE Workshop on Image and Multidimensional Signal Processing, pp.230-231, Cannes, France, September, 1993.

4. Y. Shi, Chang-Qing Shu and M. Salhi, “Direct recovering of Nth order surface structure using UOFF approach,” Proceedings of IEEE 1992 International Symposium on Circuits and Systems, pp.1475-1478, San Diego, CA, USA, May 1992.

5. Y. Shi, Chang-Qing Shu and Y. Xie, “Motion analysis based on a four-frame rectangular model,” Proceedings of the 1992 Conference on Information Sciences and Systems, pp.138, Princeton University, Princeton, NJ, USA, March 1992.

6. Chang-Qing Shu, Y. Zhu, Y. Shi and C. Lu, “Recovering surface structure characterized by an Nth degree polynomial equation,” IEEE Seventh Workshop on Multidimensional Signal Processing, vol.5.7, Lake Placid, NY, USA, September 1991.

7. Chang-Qing Shu, Y. Shi, J. N. Pan and L. Zhou, “A new approach to 3-D position estimation based on the unified optical flow field,” Proceedings of 1991 IEEE Workshop on Visual Signal Processing and Communications, pp.83-86, National Chiao Tung University, Hsinchu, Taiwan, June 1991.

8. Chang-Qing Shu and Y. Shi, “Computation of motion from stereo image sequence using the unified optical flow field,” presented at SPIE's 1990 International Symposium on Optical and Optoelectronic Applied Science and Engineering, San Diego, CA, July 1990; and collected in Applications of Digital Image Processing XIII, Andrew G. Tescher, Editor, Proc. SPIE 1349, pp.346-357, 1990.

9. Chang-Qing Shu, Y. Shi and C. Lu, “A new arrangement for motion estimation from a binocular image sequence using unified optical flow field,” Proceedings of the 1990 Conference on Information Sciences and Systems, p.444, Princeton University, NJ, USA, March 1990.

10. Chang-Qing Shu and Y. Shi, “Unified optical flow field,” Proceedings of the 1990 Conference on Information Sciences and Systems, p.445, Princeton University, NJ, USA, March 1990.

Journal/Book Papers - Physics

1 Shu Changqing, Lin Lei and Wang Liangyu, “I-N-A Phase Diagrams of Liquid Crystals”, Communications in Theoretical Physics, Beijing, vol.1, p.107, 1982.

2 Shu Changqing and Lin Lei, “Molecular Theory of Liquid Crystals”, Acta Physica Sinica, vol.31, p.915, 1982.

3 Lin Lei, Shu Changqing, Shen Juelian, P.M. Lam and Huang Yun, “Soliton Propagation in Liquid Crystals”, Physical Review Letters, vol.49, p.1335, 1982; vol.52, p.2190 (E), 1984.

4 Lin Lei and Shu Changqing, “Soliton Propagation in Shearing Liquid Crystals”, Acta Physica Sinica, vol.33, p.165, 1984. [China Physics vol.4, p.598, 1984].

5 Shu Changqing and Lin Lei, “Theory of Homologous Liquid Crystals. I Phase Diagrams and the Even-Odd Effect”, Molecular Crystals and Liquid Crystals, vol.112, p.213, 1984.

6 Shu Changqing and Lin Lei, “Theory of Homologous Liquid Crystals. II Orientation Correlation Functions”, Molecular Crystals and Liquid Crystals, vol.112, p.233, 1984.

7 Shu Changqing, Xu Gang and Lin Lei, “Temporal and Spatial Distribution of Director Angles in Soliton Experiments of Liquid Crystals”, Acta Physica Sinica, vol.34, p.88, 1985.

8 He Gang, Shu Changqing and Lin Lei, “Molecular Orientations and Optical Patterns of Rotating Nematics”, Molecular Crystals and Liquid Crystals, vol.124, p.53, 1985.

9 L. Lin, C.Q. Shu and G. Xu, “Comment on ‘On Solitary Waves in Liquid Crystals’”, Physics Letters, vol.109A, p.277, 1985.

10 Xu Gang, Shu Changqing and Lin Lei, “Multiple Scales Analysis of a Nonlinear Ordinary Differential Equation”, Journal of Mathematical Physics, vol.26, p.1566, 1985.

11 Lin Lei, Shu Changqing and Xu Gang, “Generation and Detection of Propagating Solitons in Shearing Liquid Crystals”, Journal of Statistical Physics, vol.39, p.633, 1985; vol.43, p.391 (E), 1986.

12 Shu Changqing and Lin Lei, “Solitons Generated by Pressure Gradients in Nematic Liquid Crystals”, Molecular Crystals and Liquid Crystals, vol.131, p.47, 1985.

13 Liang Zhong-Cheng, Shao Ren-Fan, Shu Chang-Qing, Wang Liang-Yu and Lin Lei, “Variation of Velocities and Widths of Two-Dimensional Solitons with Pressure Gradients in Nematic Disc Cells”, Molecular Crystals and Liquid Crystals Letters, vol.3, p.113, 1986.

14 L. Lam and Shu Changqing, “Nonlinear Waves in Liquid Crystals”, in Proceedings of the International Conference on Nonlinear Mechanics, Shanghai, October 28-31, 1985, edited by Chien Wei-Zang (World Scientific, Singapore, 1986), p.735.

15 Lin Lei and Shu Changqing, “Comment on ‘Nerve Propagation and Wall in Liquid Crystals’”, Physics Letters A, vol.119, p.178, 1986.

16 Shao Renfan, Zheng Shu, Liang Zhongcheng, Shu Changqing and Lin Lei, “Experiments on Ring-Shaped Solitons in Nematic Liquid Crystals”, Molecular Crystals and Liquid Crystals, vol.144, p.345, 1987.

17 Shu Changqing and Lin Lei, “Pattern Formation in Thermal Convective Nematic Liquid Crystals”, Molecular Crystals and Liquid Crystals, vol.146, p.97, 1987.

18 Xu Gang, Shu Changqing and Lin Lei, “Perturbed Solitons in Nematic Liquid Crystals Under Time-Dependent Shear”, Physical Review A, vol.36, p.277, 1987.

19 C.Q. Shu, R.F. Shao, S. Zheng, Z.C. Liang, G. He, G. Xu and L. Lam, “Two-Dimensional Axisymmetric Solitons in Nematic Liquid Crystals”, Liquid Crystals, vol.2, p.717, 1987.

20 L. Lam and C.Q. Shu, “Chapter 3. Solitons in Shearing Liquid Crystals”, in Solitons in Liquid Crystals, edited by L. Lam and J. Prost (Springer, New York, 1992).

21 L. Lam, C.Q. Shu and S. Bodefeld, “Active Walks and Path Dependent Phenomena in Social Systems”, in Nonlinear Physics for Beginners, edited by L. Lam (World Scientific, River Edge, N.J., 1998).
