

The future of robot assistants: Building a hands-free voice-controlled quadcopter

Frau Amar, Pedro

Academic Year 2015-2016

Director: EMILIA GÓMEZ GUTIÉRREZ

Degree in Engineering of Audiovisual Systems

Final Degree Project (Treball de Fi de Grau)


The future of robot assistants:

Building a hands-free voice-controlled quadcopter

Pedro Frau Amar

FINAL PROJECT

DEGREE IN ENGINEERING OF AUDIOVISUAL SYSTEMS

POLYTECHNIC SCHOOL UPF

JUNE 2016

Director:

Emilia Gómez Gutiérrez

Department of Information and Communication Technologies


Dedication

Every challenge needs self-effort. Every project needs commitment. Every goal is reached with perseverance and hard work. No matter what you do, take the tools and knowledge your environment gives you.

I want to dedicate my humble effort to my mother, who has always looked out for my future and my well-being; to my uncle Rodney, who has always been interested in my projects regardless of their nature; but I especially want to dedicate this project to my beloved father, may he rest in peace. He taught me how to be a real engineer, concerned about the future, and gave me knowledge in several fields. He shared many hobbies with me and helped me so many times with my projects.

Regardless of the final mark, this project is a success for me and carries a little bit of my father in it.

We will never forget you and will always love you.


Special thanks to:

First of all, I would like to thank Mrs. Verónica Moreno. After my father's death, she was always concerned about me and my academic career, and was always very helpful when I needed someone to guide me in my studies.

Secondly, I would like to thank my supervisor, Dr. Emilia Gómez. She helped me when she was my teacher and guided me throughout this project. She has been very attentive and helpful and has always kept her office open for me.

I would also like to thank Dr. Boris Bellalta and Dr. Antoni Ivorra, who also shared their time with me to advise me on the steps of my project.

Finally, I would like to thank my mother, my father, my brother and my sister, the rest of my family and my friends. They have always been interested in my path and have supported all my decisions.


Abstract:

The development and applications of drones have grown over the last few years. Their uses are extensive and their versatility has no apparent limits. The evolution of these unmanned aerial vehicles raises questions about the future. Although people currently use them mostly for entertainment, new functionalities can be found. In this project we are particularly interested in their potential to assist people with physical disabilities, so we propose to transform the drone into a hands-free device using voice control. We show potential applications of this solution and the challenges it involves. We build a 1 kg quadcopter, let the user control it with her/his voice, and incorporate a voice synthesizer to communicate relevant information to the user and make the device more human-friendly. The project is framed within the ongoing evolution of intelligent systems and robotics.

Resumen:

El desarrollo y aplicaciones de los drones ha incrementado estos últimos años. Sus usos son

extensos y su versatilidad no tiene límites potenciales. La evolución de estos vehículos

aéreos no tripulados crea inquietudes de cara al futuro. Aunque actualmente la gente está

usando los drones para su entretenimiento, nuevas funcionalidades pueden ser halladas. En

este proyecto nos interesa particularmente su potencial para asistir a personas con

discapacidades físicas, por lo que proponemos transformar el dron en un dispositivo manos

libres usando control por voz. Para ello nos enfrentamos a una serie de retos tecnológicos

que establecerá un rango con diferentes aplicaciones. En el proyecto hemos construido un

cuadricóptero de 1 kg, que el usuario puede controlar mediante su voz, e incorporamos un

sintetizador de voz para comunicar al usuario información relevante. El proyecto pretende

contribuir a la evolución actual de sistemas inteligentes y robótica.


Preface

With the omnipresence of camera-carrying multicopters at trade shows, we can only ask ourselves how big the drone industry will become and how long it will take to get there. Multicopter technology is clearly focusing on cinematography, even though it is still in its early days. This can be seen in the improvement of image stabilization and in falling prices. Nowadays, enterprising shooters have captured amazing footage by flying tiny multicopters into places that no other machine can reach safely.

In addition, companies are raising funding on the promise of their products. Multicopters are becoming more powerful and acquiring new skills, among which we can see the appearance of hands-free technology.

Most current commercial multicopters include a GPS transceiver, which is used to geolocalize the device in 2D space. Companies such as DJI and Parrot are now improving their technology by including simple image recognition algorithms in the mobile apps that control their multicopters. These are first steps towards hands-free control of this technology, and that is what we aim to achieve with this project.

Regarding this project and the challenges it involves, there are three major areas of work. First of all, we have to face all the issues of building the quadcopter, including all the research that has to be done prior to construction. Then come the programming and first tests of the quadcopter, with all the issues related to quadcopter stability (and later modification of the drone hardware if needed). And finally, everything related to programming, testing and modifying the speech controller, including the training of the recognizer.


Summary

Abstract
Preface
Figures List
Tables List

Chapter 1. INTRODUCTION, MOTIVATION AND CONTEXT
  1. INTRODUCTION
    1.1 - Context
    1.2 - Personal motivation
    1.3 - Goals of the project
    1.4 - Structure of the report
    1.5 - A little bit of history
    1.6 - The culmination of an engineering degree
    1.7 - Requirements and challenges

Chapter 2. BUILDING THE QUADCOPTER
  1. THE COMPONENTS
    1.1 - The processor
    1.2 - The Frame
    1.3 - Motors, Propellers and ESC's
    1.4 - The Gyro
    1.5 - Others
  2. BUILDING THE QUADCOPTER
    2.1 - Electronic circuit
    2.2 - The program
    2.3 - Testing and changes
    2.4 - Future work

Chapter 3. THE SPEECH CONTROLLER
  1. INTRODUCTION
  2. SPEECH RECOGNITION
    2.1 - Recognizing the voice
    2.2 - Options
    2.3 - Tests and Fails
    2.4 - Dictionary
    2.5 - Conclusions
  3. SPEECH GENERATION
    3.1 - Introduction
    3.2 - The synthesizer
    3.3 - Conclusions and future work
  4. PROGRAMMING AND TRAINING

Chapter 4. SYSTEM INTEGRATION
  1. ARDUINO AND THE SPEECH CONTROLLER
    1.1 - The WiFi transceiver
    1.2 - Linking Arduino with external software
  2. THE FINAL CONTROLLER
    2.1 - Testing
    2.2 - Fixing problems
    2.3 - Expected final result and future work

Chapter 5: CONCLUSIONS
  1. CONCLUSIONS

Bibliography

Annex


Figures List

[1] Micro quadcopter
[2] DJI Phantom (similar dimensions with respect to the project's quadcopter)
[3] Structure of the solution
[4] Motors rotating for stable upward movement of the quadcopter. The opposite configuration (i.e. all blue) would make the multicopter go down.
[5] Motors rotating for the quadcopter to go forward. The opposite configuration would make the multicopter go backward.
[6] Motors rotating for the quadcopter to go left. The opposite configuration would make the multicopter go right.
[7] Motors rotating for the quadcopter to rotate left. The opposite configuration would make the multicopter rotate right.
[8] Quadcopter circuitry
[9] Quadcopter seen from its side (without the landing gear)
[10] Quadcopter seen from the top
[11] Fully mounted quadcopter
[12] Schematic showing the structure of speech processing in this project
[13] Phases of the speech recognition
[14] Flow chart explaining the behaviour of the whole solution
[15] Conceptual behaviour of speech synthesis
[16] Flow chart explaining the structure of the program
[17] ESP8266 module mounted on the quadcopter
[18] Including the WiFi transceiver in the circuit

Tables List

[1] Recognizer APIs comparison table


Chapter 1.

INTRODUCTION, MOTIVATION

AND CONTEXT


1. INTRODUCTION

1.1 - Context

Nowadays, we are all surrounded by technology. In the last few years, a phenomenon has emerged and invaded our world, our homes and, sometimes, even our intimacy. This phenomenon is commonly known as the "drone". A drone is an Unmanned Aerial Vehicle (from now on UAV), which means there is no pilot aboard; it may therefore have different degrees of autonomy, from a remote control operator to a fully autonomous computer-driven machine.

We can find several types of drones: military airplane-shaped drones, quadcopters, hexacopters, octocopters, decacopters and even dodecacopters. Throughout this project we will focus on multicopters, more precisely on commercial quadcopters.

At present, the use of commercial multicopters has expanded because they are cheap and reliable and can satisfy several human needs, from aerial views to radio-controlled toys. Anyone can acquire a multicopter at almost any price, from micro quadcopters (approximately 40€) to big multicopters (between 150€ and 4000€ approximately, depending on the brand, specifications, accessories, etc.).

Figure 1: Micro quadcopter. Figure 2: DJI Phantom 2 (similar dimensions with respect to the project's quadcopter).

An important point is that the multicopter has some advantages over other aerial devices. Good multicopters are easily maneuverable and very stable, and they usually carry a camera that provides a First Person View (from now on FPV) and sends the image to another device. So, all in all, we can be anywhere, anytime, without leaving home. And this is what makes the multicopter as popular as it currently is.

1 The first image has been obtained from Sharper Image, http://www.sharperimage.com; its author is unknown. The second image has been obtained from Wikipedia, http://wikipedia.org; its author is unknown.


However, there is a problem. Multicopters generally have to be controlled with our hands by holding a transmitter. This restricts the use and capabilities of the multicopter to people who can use their hands, excluding people with disabilities and people whose hands are busy. Let us imagine a few contexts:

- What if you are a disabled person and want to enjoy multicopter capabilities?

- What if you are on a rescue mission, holding your equipment, and you want to fly the multicopter for terrain reconnaissance?

- What if you are cycling, running or climbing and you want the multicopter to follow you and take pictures during your route?

Some solutions for following people have already been implemented on commercial multicopters. For example, DJI implements an image recognition algorithm in some of their multicopters to make the device follow someone. But this does not give the remote pilot full control of the multicopter.

1.2 - Personal motivation

After completing three of the four years of the degree, after all the exams and new knowledge, I thought I needed to start a personal summer project and put my acquired skills into practice.

Robotics is something that captivates me, which is why I bought myself an Arduino starter kit. I thought it was a good way to start building and programming simple robots. In addition, I have always played with radio-controlled devices, and in the last few years I had been looking for a quadcopter.

Searching on the internet, I found several tutorials explaining how to build a quadcopter using Arduino, so I started the first phase of the project: building and programming an entertainment radio-controlled quadcopter.

Then, starting the fourth year of the degree, I realized that I had to find an end-of-degree project, so I thought to myself: why not complete my summer project? I then started searching for possible uses of the quadcopter and found that commercial multicopters cannot be controlled without the use of the hands.

Hands-free controlled multicopters could be a revolution, improving their versatility. But what is the best way to control a device without using the hands?


1.3 - Goals of the project

A first proposal to use speech recognition was made to my current supervisor. She told me it might be interesting to use Brain-Computer Interfaces (from now on BCI); however, after discussing that possibility, we found that the response of a BCI may not be versatile enough to control all the movements of the quadcopter, so we returned to the first proposal.

The main goal of this project is then to find a good speech recognition Application Programming Interface (from now on API) as a basis, in order to create an appropriate vocabulary and program a speech-based controller for the quadcopter.

Most probably, what will give us the most headaches is training the speech controller once we find a good API. As we are talking about a quadcopter, the recognition accuracy has to be almost a hundred percent, or we may have problems such as hurting someone or crashing the quadcopter. In addition, linking the speech controller with the actual quadcopter controller (the one on the Arduino) may also be difficult.

All in all, the whole controller has to be very robust to be usable. Otherwise it may fail.

1.4 - Structure of the report

This report follows a logical structure based on the different parts of the project, each with its sub-parts.

The first chapter provides an introduction to the motivation and context of this project. We present the technological background and the link with the degree in audiovisual systems engineering.

Chapter 2 focuses on the quadcopter itself. We present the main components and the phases of circuitry, testing and improvement.

Then, chapter 3 presents the speech controller, where we integrate and train a state-of-the-art speech recognition engine and a speech synthesizer to give feedback and make the device more human-friendly.

Finally, the fourth chapter provides an overview of the integrated system and a discussion of the main conclusions and future challenges of this project.


1.5 - A little bit of history

On August 22nd, 1849, the Republic of San Marco surrendered to Austria. The Republic had been born from a revolt against Austria in Venice in 1848. The Austrians ended up besieging Venice, which led to starvation and outbreaks of cholera. Here the first UAVs appeared: a number of bomb-filled balloons sent from Austria to attack Venice.

Then, in the early nineteen-hundreds, drone development and innovation started, as we can imagine, for military purposes. We can mention a pilotless torpedo invented by the Dayton-Wright Airplane Company during World War I, or A. M. Low's "Aerial Target" in 1916, the first attempt at an unmanned aerial vehicle. After that, a succession of UAVs appeared over the years, a succession of weapons that became machines to kill people.

It was not until 2012 that commercial multicopters became popular. Daniel Mellinger and Alex Kushleyev, two students from the University of Pennsylvania, developed the first quadcopter as we know them today. This first prototype was presented in a TED talk by Professor Vijay Kumar [2].

This is how UAVs became agile, light and small. Big UAVs weighing several tons and measuring several meters became small multicopters weighing mere grams and measuring less than a meter.

After that, several companies such as Parrot [3] or DJI [4] started developing their own multicopters for commercial use.

1.6 - The culmination of an engineering degree

This project is an end point, the culmination of my undergraduate studies. Over the past years I have been learning to program in different languages; studying the basics of calculus, algebra and physics; and learning how sound is created, transformed, propagated, recorded and studied. I have learnt how image processing works, and how video is recorded, stored and played. I have studied how robots are designed, created and programmed and how their circuitry works. I have seen the basics of internet communication systems.

Now let us think about the project. It concerns the design of a flying robot that has to be controlled using a speech recognizer. All in all, this project brings together robotics, air physics, calculus for the quadcopter controller, electronics for the whole circuitry, sound for the speech controller, networks for the PC-quadcopter communication, and programming in different languages for the whole project.

2 https://www.youtube.com/watch?v=4ErEBkj_3PY

3 http://www.parrot.com/

4 http://www.dji.com/


As we can see, the project includes a little bit of everything seen in the degree except image processing. This last part could be implemented in later versions of the drone, using a camera with image processing software to detect targets, for example.

This work is a complete engineering project that culminates four years of studies.

1.7 - Requirements and challenges

The idea is to build and test an affordable quadcopter that has to integrate a speech recognition engine working in real time, in noisy acoustic conditions (motors and other surrounding noises). In addition, it adds a voice synthesis engine, and we try to document the whole process, keeping it as open as possible to make the project reproducible in the future.

This project focuses on a four-leg quadcopter, each leg measuring approximately 22 cm.

We need the following components:

- The structure (commonly known as the f450 structure): it consists of a four-legged plastic structure weighing approximately 300 g.

- The battery: we will search for a battery with a good weight-to-duration ratio, approximately 200 g for approximately 20 min of flight (3000 mAh).

- The gyroscope: its weight is negligible, but we will be searching for a gyro that gives us an accurate measure of the quadcopter's orientation along the X, Y and Z axes.

- The Electronic Speed Controllers (from now on ESC): with these controllers we send the signal that sets the speed of the motors. Their weight is approximately 4 x 30 g.

- The WiFi transceiver ESP8266: this part allows us to communicate with the computer via WiFi. Its weight is negligible.

- The Arduino [5] board: this board contains an ATmega328P, which is an 8-bit microcontroller. Arduino is very versatile and can be used in several fields such as robotics, networks, sensors and data acquisition. The weight of our particular board is approximately 40 g.

This makes a total of approximately 700 g, including wires and other parts.
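As a quick sanity check, the component masses listed above can be tallied in a short script. The figures are the approximate values from the text; the "wires and others" allowance is an assumption added here for illustration:

```python
# Rough mass budget for the quadcopter, from the approximate component
# masses listed above. The wiring allowance is an estimate made here,
# not a value stated in the text.
masses_g = {
    "f450 frame": 300,
    "battery (3000 mAh LiPo)": 200,
    "4 x ESC": 4 * 30,
    "Arduino UNO board": 40,
    "gyro + ESP8266 (negligible)": 0,
    "wires and others (allowance)": 40,
}

total_g = sum(masses_g.values())
print(f"estimated total: {total_g} g")  # close to the ~700 g quoted above
```

This total is what the motor-propeller search below has to be able to lift with a comfortable margin.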

Now we can start searching for the best motor-propeller combination that can lift that weight, taking into account that we must search for f450 motors. Usually, for an f450 we can use 9045 or 1045 propellers (these are the common ones). The digits stand for the length of the propeller and the pitch of the blade (for example, a 1045 propeller is 10 inches long with a 4.5-inch pitch).

5 https://www.arduino.cc/


Chapter 2.

BUILDING THE QUADCOPTER


1. THE COMPONENTS

This is an overview of the components built into the quadcopter:

Figure 3: Structure of the whole solution

1.1 - The processor

A processor allows us to determine the conditions under which the motors accelerate or not. Apart from that, we need it to get the response of the gyro and interpret it; we need to get the signal received by the transceiver and eventually tell the transceiver to send information about the quadcopter's status back to the controller. We need a board into which we can plug all the components and on which we can write a program to control everything. This is why, for this project, we use an Arduino UNO board.

An Arduino board contains a microcontroller and provides a set of digital and analog input/output pins that can interface with various expansion boards and other circuits. These boards include several communication interfaces for loading programs, which are written in the Arduino integrated development environment, based on the Processing [6] language, which also supports C and C++.

The idea of using this processor thus comes from the ease and versatility offered by these boards.

6 https://processing.org/


1.2 - The Frame

There is not much to say about the frame, but a few things have to be clear:

- We are talking about an f450 frame. This means we are using a structure with 22 cm long legs, which conditions the rest of the components of the quadcopter.

- It has to be light and robust, as the quadcopter may eventually fall.

- We have to make sure we get all the screws to fix the motors, as not all motor packs come with them.

- In our case, the frame has an internal circuit to feed the components. This means we can solder the battery terminals to the board of the frame, as well as the terminals of each Electronic Speed Controller (ESC).

1.3 - Motors, Propellers and ESC's

An important question to take into account: what should the sense of rotation of each motor be? Let us think about all the possible movements of the quadcopter:

- The quadcopter has to be able to go up and down. This can be performed by a "simple" increase or decrease of the power of all the motors. See Figure 4.

Figure 4: Motors rotating for stable upward movement of the quadcopter. The opposite configuration (i.e. all blue) would make the multicopter go down.

7 Figures 4, 5, 6 and 7 have been obtained from Pinterest; their author is Ricardo Cámara. https://es.pinterest.com/pin/566679565588950278/. The images have been modified.

Page 26: The future of robot assistants: Building a hands-free

- 25 -

- The quadcopter has to be able to go forward and backwards. This can be performed by an increase of the power of the rear motors with respect to the front ones to go forward, and the other way around to go backwards. See Figure 5.

Figure 5: Motors rotating for the quadcopter to go forward. The opposite configuration would make the multicopter go backwards.

- The quadcopter must be able to go side to side. This uses the same principle as the forward/backwards movement, but with the side motors. See Figure 6.

Figure 6: Motors rotating for the quadcopter to go left. The opposite configuration would make the multicopter go right.


There is a final case, in which the quadcopter rotates right or left, and here is where the problem appears. To make this happen, the sense of rotation of the motors has a repercussion on the rotation of the quadcopter. For this reason, and for better stability, we need two motors rotating clockwise and two counterclockwise.

Motors with the same sense of rotation must be placed diagonally opposite each other across the central axis of the quadcopter. Given that, with the clockwise motors turning faster than the counterclockwise ones, the quadcopter turns counterclockwise; and, the other way around, with the counterclockwise motors turning faster than the clockwise ones, the quadcopter turns clockwise. See Figure 7.

Figure 7: Motors rotating for the quadcopter to rotate left. The opposite configuration would make the multicopter rotate right.
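The four movement cases above can be condensed into a single motor-mixing rule. The sketch below only illustrates the principle for a "+" configuration; the motor order and sign conventions are assumptions made here, not the project's actual flight code:

```python
def mix(throttle, pitch, roll, yaw):
    """Map high-level commands to per-motor power for a "+" quadcopter.

    Conventions assumed for this illustration:
      pitch > 0 -> go forward  (rear motors speed up, front slows down)
      roll  > 0 -> go left     (right motor speeds up, left slows down)
      yaw   > 0 -> rotate left (the clockwise pair, front/rear, speeds up)
    """
    front = throttle - pitch + yaw
    rear  = throttle + pitch + yaw
    left  = throttle - roll  - yaw
    right = throttle + roll  - yaw
    # Clamp each motor to a normalized 0..100 power range.
    clamp = lambda m: max(0.0, min(100.0, m))
    return {name: clamp(power) for name, power in
            {"front": front, "rear": rear, "left": left, "right": right}.items()}

print(mix(50, 10, 0, 0))  # forward: rear gets more power than front
```

Pure up/down motion falls out of the same rule: with pitch, roll and yaw at zero, all four motors receive the same throttle, matching Figure 4.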

1.4 - The Gyro

The gyro is the device that allows us to control the stability of the quadcopter. We are using a 3-axis gyro, which means that it reads the response of the quadcopter on the x, y and z axes.

The idea here is to get the current angular speed of the drone on all three axes and send an order to correct the inclination of the quadcopter if needed.


1.5 - Others

Apart from the components mentioned above, we need something to control the quadcopter. In the first version we built a normal radio-controlled drone, which means that we bought a transmitter with its receiver; in our case it was a six-channel transmitter, although we only need 4 channels to control the quadcopter. In the second version this transmitter is not needed; instead, we use a WiFi transceiver to communicate with the computer.


2. BUILDING THE QUADCOPTER

2.1 - Electronic circuit

The following schematic shows the circuit that has been built for the quadcopter:

Figure 8: Quadcopter circuitry

As we can see, a battery provides the circuit with power (11.1 V). The Arduino board has to be powered by a 5 V source, which is why we have a voltage divider. The gyro and the receiver are powered by the Arduino itself. Finally, the ESC's are plugged into digital pins 4-7, the receiver inputs into digital pins 8-11, and the gyro into analog pins 4 and 5.
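The 11.1 V-to-5 V step-down mentioned above follows the standard divider equation Vout = Vin * R2 / (R1 + R2). The resistor values below are hypothetical, chosen only so the ratio comes out right; note that a bare divider sets the unloaded voltage only, which is why regulated supplies (e.g. a BEC) are also common in such builds:

```python
def divider_vout(vin, r1, r2):
    """Unloaded output of a resistive divider: vin across r1+r2, tapped at r2."""
    return vin * r2 / (r1 + r2)

# Hypothetical values: 12.2 kOhm over 10 kOhm turns 11.1 V into about 5.0 V.
print(divider_vout(11.1, 12_200, 10_000))  # ~5.0
```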

2.2 - The program

Since I was building a quadcopter myself and wanted it to fly properly, I did not write the control code myself; I used the one provided by Mr. Joop Brokking (see the videos in the bibliography).

We are not going to explain the whole controller, because it is quite complicated, but the main idea is that we want to set up a proportional-integral-derivative (from now on PID) controller whose main goal is to keep the gyro angular rates equal to the inputs of the transceiver.

The idea is to sum a proportional part, an integral part and a derivative part, following this equation:

    output(t) = Kp * e(t) + Ki * sum(e) + Kd * (e(t) - e(t-1))

where Kp stands for the proportional gain, Ki for the integral gain, Kd for the derivative gain, and e(t) for the difference between the gyro output and the receiver output.

The proportional part consists of the difference between the gyro output and the receiver output, multiplied by the gain of the proportional part, which we obtain by trial and error. What we want to achieve with it is the possibility of keeping the multicopter oscillating around the center position.

The integral part is the difference between the gyro output and the receiver output, multiplied by the gain of the integral part, which is also obtained by trial and error, and summed with the previous integral output. What we see is that the multicopter overcompensates, just as with the proportional part.

Finally, the derivative part consists of the difference between the current error (gyro output minus receiver output) and the previous error, multiplied by the derivative gain, which again is obtained by trial and error. This part only responds to changes in angular motion, which means that it only fights the first movement, but keeps the motors at the same throttle after that.

This essentially makes the quadcopter stabilize itself, avoiding abrupt changes in the behaviour of the multicopter. This program will later need to be modified to retrieve the results of the speech recognition, but the stabilization part always remains the same.

2.3 - Testing and changes

The first design of the quadcopter included 4 f250 motors with 5030 propellers. In our first tests we lifted the quadcopter several meters, but it was not stable; it kept tipping over. We saw that f450 drones were using another type of motor, and we discovered that our problem was related to the size of the frame, not the weight. We tried 6030 propellers, but the problem was that we needed 9045 ones, and we could not use them on our motors, so we had to buy the proper f450 ones.


Figure 9: Quadcopter from the side, without the landing gear.

Figure 10: Quadcopter from the top

Once we solved this problem, we ran tests again and saw that the quadcopter had difficulty lifting off the ground and was taking too many hits on landing. We decided to buy a landing gear to cushion the landing and raise the quadcopter further from the ground.


Figure 11: Image of the full mounted quadcopter

For a visualization of the quadcopter flying, see the link to the SCIFO YouTube channel in the annex.

2.4 - Future work

Currently the quadcopter has two remaining problems, both directly related to the transmitter. First of all, I am using a 6-channel transmitter designed for airplanes. This means its throttle stick has no spring, so the neutral position is not recovered automatically. As a result, we cannot keep the quadcopter stable in place by trying to put the stick in the neutral position, and the quadcopter slowly tips over. Second, the transmitter does not allow changing flight modes. The correct flight mode would be "stabilize mode", which would level out the quadcopter after directional corrections. The transmitter I bought is not prepared for this, so it does not send the actual position of the sticks to the controller but accumulates the value instead. For clarification: if you move the stick to position X, the quadcopter remains in that state as long as the stick stays within the range [0, X]. The only way to stabilize it is to move the stick out of this range.


Chapter 3.

THE SPEECH CONTROLLER


1. INTRODUCTION

In this chapter we explain the basics of speech recognition and speech synthesis. The idea is to capture a voice input and generate a voice output that depends on the recognition results. Mainly, we integrate a state-of-the-art speech recognizer, configure it with our specific vocabulary, and test and train it with our own acoustic input in order to obtain good recognition accuracy.

We first get the input audio given by a speaker, analyze it and compare it with a set of

acoustic models generated through training with annotated audio excerpts. We interpret the

results in order to make a decision and map it to a set of quadcopter actions. We then

synthesize a message to provide some feedback to the user. Figure 12 provides an overview

of the process.

Figure 12: Schematic showing the structure of speech processing in this project


2. SPEECH RECOGNITION

2.1 - Recognizing the voice

To put in place the process of speech recognition we follow this diagram:

Figure 13: Phases of the speech recognition


As we can see in the diagram above, there is a phase prior to the speech recognition itself which consists of training the recognizer. To do that, we first have to create a database of recordings, a dictionary with the appropriate phonetic transcriptions, and the language model.

The challenges we face during the project are:

- Correct training: a poorly trained recognizer may yield accuracy as low as 0-10%, which is unusable.

- Noise: we need to take into account all possible scenarios, including noisy ones that may degrade recognition accuracy. For that, we have to run several tests in different places and under different acoustic conditions.

- Speaker independence: there is a huge difference between a system that must be used by several speakers and one used by a single speaker. In this case, we train the recognizer for one speaker to simplify the training task.

2.2 - Options

The options that we have considered have been the following ones:

Sphinx4 [8]: widely used in the open source community, CMU Sphinx is designed specifically for low-resource platforms; it has a flexible design and focuses on practical application instead of research. It also has an active development and release schedule and a large, active community.

Voce [9]: it is based on CMU Sphinx, but it is a prebuilt library that you can use but neither adapt nor modify. It is very easy to obtain and use.

Pypi [10]: it is simply a package including several built-in recognizers that lets you test different tools. In our case we wanted it because of the possibility of using the Google speech recognition tool. One of its advantages is that it is written in Python, which is a very simple and versatile programming language.

[8] http://cmusphinx.sourceforge.net/
[9] http://voce.sourceforge.net/
[10] https://pypi.python.org/pypi/SpeechRecognition/


2.3 - Tests and Fails

We made the first tests with Sphinx4, but in our first attempt we were not able to set it up correctly, so we turned to Voce.

We found Voce very interesting and simple to use, but it has a major problem: it cannot be trained, and the results we were getting were terrible. After testing more than 15 times with my own voice, we discovered that Voce was using a complete US-English dictionary, so there were too many words to compare against my untrained patterns. The accuracy was close to 0%, as the results were essentially random.

Then we found Pypi, which gave us two possibilities: to use the built-in version of Sphinx4 or, more interestingly for us, to use the Google speech recognizer. The Google API was perfect in terms of word recognition, but it had a major problem: it took more than 5 seconds to return a result, which is far too long. It was recording the audio, sending it to the server, performing the recognition, and sending back the result. Controlling a quadcopter requires an almost instantaneous response, so this solution was not adequate. When trying to use Sphinx4 from Pypi, we were unable to set up the Pocketsphinx module correctly. In the end, we found the way to use Sphinx4 directly.

The following table compares the three options in terms of simplicity, accuracy, trainability

and profitability:

|         | Simplicity | Accuracy | Trainability | Profitability |
|---------|------------|----------|--------------|---------------|
| Sphinx4 | Very simple to use | Acceptable accuracy (60-80%) but needs to be trained | Can be trained quite simply | Can be exploited |
| Voce    | Very simple to use and includes a synthesizer | 0-20% accuracy | Cannot be trained as it is precompiled | Cannot be exploited because of its lack of trainability |
| Pypi    | Very simple to use and includes Google speech recognition | 80-100% accuracy | Cannot be trained, but this is not necessary | Cannot be exploited because it takes too much time to return the result |

Table 1: Recognizer APIs comparison table


2.4 - Dictionary

For this project we use a simple vocabulary to control the quadcopter. Each order consists of a listener (the wake word SCIFO) followed by two words controlling the multicopter. In all, our dictionary (.dict) file contains the following words:

scifo, start, stop, motors, go, rotate, up, down, left, right

So some example orders would be: "scifo start motors", "scifo go up" or "scifo rotate left".
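As an illustration, the .dict file maps each of these words to its phonemes. The transcriptions below follow the CMU pronouncing dictionary; the one for the made-up wake word SCIFO is our own assumption:

```
scifo   S IY F OW
start   S T AA R T
stop    S T AA P
motors  M OW T ER Z
go      G OW
rotate  R OW T EY T
up      AH P
down    D AW N
left    L EH F T
right   R AY T
```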

The following flowchart shows the structure of the whole solution and the execution sequence:

Figure 14: Flow chart explaining the behaviour of the whole solution


2.5 - Conclusions

As a conclusion of this section, we can say that speech recognition is a very helpful tool, but it has major issues that must be taken into account. Not every API can be used in every scenario; this means that we have to know exactly what our needs and capabilities are. In addition, training is very important, so it has to be done correctly.

For future work, it would be interesting to improve system training in order to have better

results.


3. SPEECH GENERATION

3.1 - Introduction

To give feedback to the user, we want to implement a simple solution that is human friendly.

Text-to-speech (TTS) technology is improving in naturalness day after day. It is based on the simple principle of translating a string of words into synthesized audio. A TTS engine converts written text to a phonemic representation, and then uses fundamental frequency (pitch), duration, position of phonemes in the syllable, and neighboring phones to convert the phonemic representation into waveforms that can be output as sound.

The following schematic shows the conceptual behaviour of text-to-speech:

Figure 15: Conceptual behaviour of speech synthesis

As this project is not focused on audio synthesis, we do not spend much time modifying the synthesizer to obtain more natural results, even though that is the main challenge of a good synthesizer. Here we are satisfied with finding a synthesizer that is easily understood and easy to use.


3.2 - The synthesizer

We are using FreeTTS [11], a speech synthesis system written entirely in Java. The only thing we have to do to add it to our project is to import its jar libraries.

The idea is to make the quadcopter give us feedback of every modification on its behaviour.

So anytime the quadcopter gets an order, it replies with feedback information.
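For instance, the reply sentence can be derived from the command character with a small lookup table. The exact wording below is illustrative, and the class name is ours; in the project, the resulting string is what gets handed to the FreeTTS voice:

```java
import java.util.Map;

public class Feedback {
    // Maps each command character to the sentence spoken back to the user.
    private static final Map<Character, String> REPLIES = Map.of(
        '1', "Starting motors",
        '0', "Stopping motors",
        'w', "Going up",
        's', "Going down",
        'a', "Going left",
        'd', "Going right",
        'q', "Rotating left",
        'e', "Rotating right");

    // Builds the feedback sentence for a given command character.
    public static String replyFor(char command) {
        return REPLIES.getOrDefault(command, "Command not recognized");
    }
}
```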

We do not need a specific vocabulary for speech synthesis, as we just want the quadcopter to give us understandable feedback. This means the standard en-US synthesis voice is enough.

3.3 - Conclusions and future work

Using FreeTTS we get good results in terms of intelligibility, but naturalness could be greatly improved: the resulting voice is very robotic. FreeTTS allows us to use several voices that come precompiled in the jar libraries, or to create new ones.

In our project we use the synthesizer just for replies and simple interaction. In future versions, we could implement new features to create a more intelligent system we can talk to. In addition, an interesting improvement would be a smoother, more natural voice.

[11] http://freetts.sourceforge.net/docs/index.php


4. PROGRAMMING AND TRAINING

Our program is structured as follows:

Figure 16: Flow chart explaining the structure of the program

First of all, we need to import all the libraries involved in our project, i.e. the CMU Sphinx libraries, FreeTTS and java.net.

The second phase is to initialize all the parameters of the synthesizer and set up all the configuration parameters of the recognizer. This means that we have to specify the paths of the acoustic model, the dictionary and the language model.

Then we create a socket for PC-quadcopter communication, specifying the quadcopter's IP and the communication port.

Finally, the interpreter takes the information given by the speaker and sends an order to the quadcopter.
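The interpreter step can be sketched as a small lookup from the recognized phrase to the command character that is later sent to the quadcopter (the characters match the list in the system integration chapter). The class and method names here are ours, not the project's:

```java
import java.util.Map;

public class OrderInterpreter {
    // Two-word orders mapped to the single command character sent over WiFi.
    private static final Map<String, Character> ORDERS = Map.of(
        "start motors", '1',
        "stop motors", '0',
        "go up", 'w',
        "go down", 's',
        "go left", 'a',
        "go right", 'd',
        "rotate left", 'q',
        "rotate right", 'e');

    // Returns the command character, or null when the utterance is not
    // a valid three-word order starting with the wake word "scifo".
    public static Character interpret(String hypothesis) {
        String[] words = hypothesis.trim().toLowerCase().split("\\s+");
        if (words.length != 3 || !words[0].equals("scifo")) {
            return null;
        }
        return ORDERS.get(words[1] + " " + words[2]);
    }
}
```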


To train the recognizer we need to change the acoustic model. The process is quite complex. We have two possibilities:

- We can create a new acoustic model to substitute the existing one. The problem is that we would have to record several hours of audio, which might be unnecessary given the second option.

- We can adapt the existing acoustic model, which is what we do.

Adapting the acoustic model is a complex process. Mainly, we have to record the entire dictionary in separate wave files to create an adaptation corpus. Then we have to generate all the acoustic feature files using the sphinx_fe tool provided by CMU Sphinx.

Finally, we update the existing acoustic model with the new parameters, and we obtain our adapted acoustic model, which we include in our Java project.
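As a sketch of the feature-extraction step, following the CMU Sphinx adaptation tutorial; here scifo.fileids (the list of recorded wave files) and the en-us model directory are assumed names for our setup:

```
sphinx_fe -argfile en-us/feat.params -samprate 16000 \
          -c scifo.fileids -di . -do . -ei wav -eo mfc -mswav yes
```

The tutorial then uses the bw and map_adapt tools to accumulate observation counts from these features and to update the model parameters.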


Chapter 4.

SYSTEM INTEGRATION


1. ARDUINO AND THE SPEECH CONTROLLER

1.1 - The WiFi transceiver

Once we get the result of the speech recognition, we need to send an order to the

multicopter. Since the beginning, we were using a 6 channel transmitter with its receiver

and controlling the quadcopter with hands. Now, we need a simple way to send commands

from our PC to the quadcopter. We could use three different solutions:

- Send the information via cable using the Arduino communication port. In this case we would not be able to let the quadcopter fly freely, so this is not the best solution for us.

- Send the information using Bluetooth. This would be a good solution, but it would not give us a large range, maybe just a few meters.

- Send the information using WiFi. WiFi communication is easy to set up and gives us a wide range of possible distances depending on our needs and possibilities, from several meters to several kilometers. In addition, WiFi is a reliable technology in terms of communication issues.

What we use, then, is a simple and cheap WiFi module called ESP8266, which we connect to the same local area network as our PC, sending information from the computer to the multicopter using the Telnet protocol.

Figure 17: ESP8266 module mounted on the quadcopter


The following scheme shows the way Arduino board and ESP8266 are connected:

Figure 18: Including the WiFi transceiver to the circuit

1.2 - Linking Arduino with external software

As soon as we have the WiFi module connected to the Arduino UNO, we need to link the speech recognizer with the quadcopter. With this link, we are able to send simple character-oriented codes that the multicopter can interpret and act upon.

To do this, we use the Telnet protocol, an application layer protocol that provides bidirectional, interactive, text-oriented communication between two nodes using a virtual terminal connection.

All in all, what we do is connect the PC to the IP of the quadcopter on a certain port. Then, depending on the speech recognition result, we obtain a certain character that we send through Telnet to the multicopter.


We send the following information:

- “1”: Start motors

- “0”: Stop motors

- “w”: Go up

- “s”: Go down

- “a”: Go left

- “d”: Go right

- “q”: Rotate left

- “e”: Rotate right
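The characters above can be sent with a short method built on java.net.Socket. This is a minimal sketch with names of ours; in the actual project the connection details may differ (for instance, the socket may be kept open across commands rather than reopened each time):

```java
import java.io.OutputStream;
import java.net.Socket;

public class CommandSender {
    // Opens a TCP connection to the quadcopter's ESP8266 and sends one
    // command character ('1', '0', 'w', 's', 'a', 'd', 'q' or 'e').
    public static void send(String host, int port, char command) throws Exception {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(command);
            out.flush();
        } // try-with-resources closes the socket
    }
}
```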


2. THE FINAL CONTROLLER

2.1 - Testing

After the first tests, we have seen that the solution works, but with major issues. The recognition works at first, but once the motors start there is too much acoustic noise to continue with the program execution. This noise is created by the motors, and the "start motors" state cannot be interrupted.

In addition, we have to handle some exceptions that were not taken into account. For example, if the listener detects fewer than three words (for example "SCIFO GO"), the program throws an "Out of bounds" exception that stops the execution of the recognizer. This exception is caused by the attempt to access a nonexistent position of the array of orders ("orders[2]").

2.2 - Fixing problems

Regarding the noise that may break the recognition, we try to increase recognizer accuracy with a better acoustic model adaptation. We increase the number of recordings and test different scenarios so that every case is taken into account.

For the "Out of bounds" exception, we change the behaviour so that the program handles the case and acts accordingly.
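A simple length check avoids the exception entirely. This is a sketch, with names of ours rather than the project's:

```java
public class SafeOrderParser {
    // Returns the word at the requested position, or null when the hypothesis
    // contains fewer words (e.g. "SCIFO GO" has no orders[2]), instead of
    // letting an ArrayIndexOutOfBoundsException stop the recognizer.
    public static String wordAt(String hypothesis, int index) {
        String[] orders = hypothesis.trim().split("\\s+");
        return index < orders.length ? orders[index] : null;
    }
}
```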

2.3 - Expected final result and future work

For the final result, we expect the quadcopter to interact easily with the speaker and behave normally. Most probably, as it is a handmade multicopter, it will have stability problems; those problems already exist in the hand-controlled version of the quadcopter.

Regarding future work, it would be interesting to improve the stability of the quadcopter and, depending on the final result, also improve the recognizer's behaviour and accuracy. In addition, it would be interesting to add features to the quadcopter such as an FPV camera and a GPS.


Chapter 5.

CONCLUSIONS


1. CONCLUSIONS

There are many possible configurations when it comes to programming a multicopter. What we have seen in this project is that we can always add capabilities to machines in order to make them smarter and more helpful to humans.

As always, it is not difficult to implement a solution for a given need; what is more difficult is to improve this solution so that it works perfectly. During the project we have faced many difficulties and unexpected problems, which are always present in projects of any kind. What we always need to control is the time available to achieve our goals. In this project, time has been correctly invested, as we have obtained promising outcomes.

In addition, it is always important to gather all the available information before starting a project of this scope, as it might save you a lot of time and headaches. As a matter of fact, if we had spent a little more time researching motors, we would have spent 100 € instead of 180 € for that purpose.

This is just one possible implementation of a speech-controlled quadcopter, and other approaches may arise and improve on this one. The main idea we want to share is that intelligent systems and robotics are the future. Extending this solution to more user scenarios could be a revolution, along with other projects involving artificial intelligence.


Bibliography

Information

[1] Anonymous, Wikipedia [https://en.wikipedia.org/], March 2016, Unmanned aerial vehicle.
[2] Anonymous, Wikipedia [https://en.wikipedia.org/], April 2016, First-person view (Radio Control).
[3] Brett Holman, Airminded [http://airminded.org/], August 2009, The first aerial bomb.
[4] Anonymous, Wikipedia [https://en.wikipedia.org/], May 2016, History of unmanned aerial vehicles.
[5] Anonymous, Nesta [http://www.nesta.org.uk/], Unknown, Drones: a history of flying robots.
[6] CMU Sphinx developers, CMU Sphinx [http://cmusphinx.sourceforge.net/], December 2010, Before you start.
[7] CMU Sphinx developers, CMU Sphinx [http://cmusphinx.sourceforge.net/], November 2015, Sphinx-4 application programmer's guide.
[8] CMU Sphinx developers, CMU Sphinx [http://cmusphinx.sourceforge.net/], December 2011, Generating a dictionary.
[9] CMU Sphinx developers, CMU Sphinx [http://cmusphinx.sourceforge.net/], March 2016, Adapting the default acoustic model.
[10] FreeTTS developers team, FreeTTS [http://freetts.sourceforge.net/docs/index.php], Unknown, What is FreeTTS.
[11] Anonymous, Wikipedia [https://en.wikipedia.org/], March 2016, Telnet.
[12] Lawrence R. Rabiner and Ronald W. Schafer, Introduction to digital speech processing, Foundations and Trends in Signal Processing, 1(1):1-194, January 2007. DOI: http://dx.doi.org/10.1561/2000000001
[13] Anonymous, Wikipedia [https://en.wikipedia.org/], June 2016, Speech synthesis.
[14] Anonymous, Wikipedia [https://en.wikipedia.org/], June 2016, Arduino.
[15] Bryant Frazer, Studio Daily [http://www.studiodaily.com/], June 2015, Studio Special Report: The State of the Art in Drone Technology.

Third-Party Images

[1] Ricardo Cámara, Pinterest, https://es.pinterest.com/pin/566679565588950278/, Drone schematic.
[2] Unknown, Sharper Image, http://www.sharperimage.com/si/view/product/RC+Micro+Drone/203803, Micro drone image.
[3] Unknown, Wikipedia, https://upload.wikimedia.org/wikipedia/commons/4/41/DJI_Phantom_2_Vision%2B_V3_hovering_over_Weissfluhjoch_%28cropped%29.jpg, DJI Phantom.


Building the drone tutorials

Here are the six video tutorials that I followed to build the drone.

Author: Mr. Joop Brokking

Date: April - May 2015

[Part 1]: https://www.youtube.com/watch?v=2pHdO8m6T7c

[Part 2]: https://www.youtube.com/watch?v=bENjl1KQbvo

[Part 3]: https://www.youtube.com/watch?v=nCPEJTUYch8

[Part 4]: https://www.youtube.com/watch?v=fqEkVcqxtU8

[Part 5]: https://www.youtube.com/watch?v=JBvnB0279-Q

[Part 6]: https://www.youtube.com/watch?v=2MRiVSyedS4


ANNEX

Link to my github repository:

https://github.com/pedrofrau/SCIFO

Youtube SCIFO videos:

https://www.youtube.com/channel/UC0ug-lOxs-FQt0cTlnkCLhQ

Link to prof. Vijay Kumar’s TED talk:

https://www.youtube.com/watch?v=4ErEBkj_3PY

Consulted web pages for multicopter examples:

http://www.parrot.com/

http://www.dji.com/

http://www.arduino.cc/

https://processing.org/

http://cmusphinx.sourceforge.net/

http://voce.sourceforge.net/

https://pypi.python.org/pypi/SpeechRecognition/

http://freetts.sourceforge.net/docs/index.php