Computer Vision: history and applications Albert Alemany Font Helsinki Metropolia University of Applied Sciences Media Engineering April 2014


Computer Vision: history and applications

Albert Alemany Font

Helsinki Metropolia University of Applied Sciences

Media Engineering

April 2014

Table of contents

1. Introduction

2. Understanding what is computer vision
    2.1. What is “vision”?
    2.2. Computer vision and its related disciplines

3. History of computer vision

4. Applications of computer vision
    4.1. Face and smile detection
    4.2. Optical character recognition (OCR)
    4.3. Smart cars
    4.4. Medical imaging
    4.5. Video-based interaction: gaming
    4.6. Computer vision as a barrier

5. Conclusions

6. References

1. Introduction

According to Aristotle, vision is knowing what is where by looking, a definition that is still essentially valid. From the information that arrives at our eyes, our visual system and brain identify the objects we are interested in and their position in the environment, which is very important for many of our activities. Computer vision tries to emulate that capacity in computers: by interpreting acquired images, for example from a camera, the different objects in the environment can be recognized, along with their position in space.

The ease with which we “see” led the first artificial intelligence researchers to think, around 1960, that making a computer interpret images would be relatively easy, but it turned out otherwise [4]. Many years of research have shown that it is a very complex problem. However, considerable progress has been made over the last few years.

Computer vision brings together different fields such as mathematics, physics, biology and engineering. It gives us a better understanding of human vision and of how we perceive and interpret things. We are surrounded by images and video, and more and more useful applications are being developed; applications that are touching our lives, making them easier, safer and more fun.

The goal of this thesis is to investigate how computer vision has evolved since it first appeared, and to explore the different applications that have been developed and how they have improved our lives. I will also reflect on where computer vision is heading in the coming years and discuss how we should address it from an ethical point of view.

2. Understanding what is computer vision

2.1. What is “vision”?

Vision is the window to the world for many organisms. Its main function is to recognize and locate objects in the environment through image processing. Computational vision is the study of these processes, in order to understand them and to build machines with similar capacities.

There are different definitions of vision. The following are among the most important:

“Vision is knowing what is where by looking” (Aristotle)

“Vision is obtaining, from the information of our senses, valid properties of the external world”, Gibson [3].

“Vision is a process that produces, from images of the external world, a description that is useful to the observer and that does not contain irrelevant information”, Marr [7].

All of these definitions are essentially valid, but the one closest to the current idea of computer vision is probably Marr's. His definition has three important aspects that we have to consider: (i) vision is a computational process, (ii) the description obtained depends on the observer and (iii) information that is not useful must be removed (information reduction).

2.2. Computer vision and its related disciplines

The term “computer vision” has been used a lot in the last few years and it is often mistaken for other concepts. Figure 1 shows the different disciplines and fields related to computer vision.

Figure 1: Computer vision related disciplines

Digital image processing is the process that takes an image and produces a modified version of it. Figures 2 and 3 illustrate two examples. The first one shows segmentation, where the goal is to identify the pixels of an image that belong to an object. In that case, the output is a binary image made of white and black pixels, which mean “object” and “no object” respectively. The second example shows the restoration of an image. In that case, a blurry image becomes sharper.

Figure 2: Image processing – segmentation: the goal is to separate the study object from the background of the image [6].

Figure 3: Image processing - restoration: the goal is to remove the movement of the camera when the photography was taken [5].
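The segmentation described above can be sketched in a few lines of Python. This is only a minimal illustration that assumes a fixed global brightness threshold; it is not the method used to produce Figure 2 [6], and the function name `segment_by_threshold`, the tiny image and the threshold value are all hypothetical.

```python
def segment_by_threshold(gray, threshold):
    """Return a binary mask: 1 ("object") where a pixel is brighter
    than the threshold, 0 ("no object") everywhere else."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in gray]


# Hypothetical 4x4 grayscale image (0 = black, 255 = white):
# a bright 2x2 object on a dark background.
image = [
    [10, 12, 11, 9],
    [10, 200, 210, 12],
    [11, 205, 198, 10],
    [9, 10, 12, 11],
]

# The threshold of 128 is an assumption chosen for this toy image.
mask = segment_by_threshold(image, threshold=128)

for row in mask:
    print(row)
```

Real images rarely separate this cleanly, so practical segmentation usually has to choose the threshold from the image itself or use colour and neighbourhood information as well.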

Machine vision is similar to computer vision, but it is more practical, whereas computer vision is more academic. Machine vision is not as advanced as computer vision in a theoretical sense. Computer vision involves a lot of deep mathematics, while in machine vision practical issues such as cost and processing speed are likely to dominate over academic matters [8].

3. History of computer vision

In the 1960s, Larry Roberts wrote his thesis on the possibility of extracting 3D geometric information from 2D views. This led to a lot of research in MIT's artificial intelligence lab as well as in other research institutions. In the 1970s, MIT's artificial intelligence lab started a course in computer vision. In the 1980s, OCR (optical character recognition) systems began to be used in various industrial applications to read and verify letters, symbols and numbers. Smart cameras were developed in the late 1980s. In the 1990s, the first face recognition systems appeared. [1] [4]

4. Applications of computer vision

4.1. Face and smile detection

The first face recognition systems appeared in the 1990s. Nowadays almost any digital camera is able to detect faces and adjust the exposure and flash to obtain the best results. Figure 4 shows an example of how a camera detects the faces of the people standing in front of it and draws a rectangle around each of them. Some cameras also have an “auto trigger” option, where the photo is taken automatically when the person in front of the camera smiles.

Figure 4: Automatic face detection

4.2. Optical character recognition (OCR)

Optical character recognition is the technology that converts scanned documents into text that a computer can read. As Figure 5 shows, OCR software is used for license plate recognition. Such systems must be able to locate a vehicle's license plate under varying illumination, perspective and surroundings.

Figure 5: License Plate Recognition OCR software

Another application of OCR is converting handwriting in real time to control a computer. This

is called pen computing. A tablet is used to replace a keyboard and commands are sent to

the computer using gesture recognition.

This technology is also used in database indexing. Printed documents are converted into electronic copies and become searchable. This is what Google Books does: Google has scanned a large number of books and magazines and converted them into text, and people can now search their contents.

4.3. Smart cars

With the help of computer vision, our society has been able to develop cars that can effectively drive themselves. An autonomous vehicle can imitate human driving abilities. It is able to sense its surrounding environment and to act accordingly. To do that, it uses technologies such as radar, lidar, GPS and computer vision.

This type of vehicle not only makes a driverless trip possible, but also brings other advantages. Such vehicles would reduce the number of car accidents if their autonomous systems prove safer than a human driver. They could also increase the capacity of highways and decrease traffic congestion, because the safety distance between vehicles could be reduced. Another advantage is the possible reduction of traffic signs, since these vehicles could receive the information electronically.
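The link between safety distance and highway capacity can be made concrete with a rough calculation. The sketch below assumes an idealized lane in which all vehicles travel at the same constant speed with the same spacing; the function name `lane_capacity` and every number in it (speed, vehicle length, gaps) are hypothetical values chosen only for illustration, not measurements.

```python
def lane_capacity(speed_kmh, vehicle_length_m, gap_m):
    """Vehicles per hour passing a fixed point in one lane, assuming
    constant speed and identical spacing between all vehicles."""
    speed_m_per_hour = speed_kmh * 1000
    # Each vehicle occupies its own length plus the gap to the next one.
    return speed_m_per_hour / (vehicle_length_m + gap_m)


# Hypothetical scenario: 5 m long cars travelling at 100 km/h.
human_driven = lane_capacity(100, vehicle_length_m=5, gap_m=55)
autonomous = lane_capacity(100, vehicle_length_m=5, gap_m=15)

print(round(human_driven), "vehicles/hour with a 55 m gap")
print(round(autonomous), "vehicles/hour with a 15 m gap")
```

Under these assumptions, cutting the gap from 55 m to 15 m triples the number of vehicles a lane can carry per hour. Real traffic is far less regular, so the figures only show the direction of the effect.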

4.4. Medical imaging

To help physicians in the diagnosis process, computers create 3D models by combining different 2D scans such as CT (computerized tomography) and MRI (magnetic resonance imaging) [2].

In addition, by processing a magnetic resonance image, internal structures can be located easily, giving the surgeon a kind of X-ray vision, which is a step towards minimally invasive surgery [2].
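As a rough illustration of how 2D scans can be combined into a 3D model, the sketch below stacks a few slices into a volume and then cuts that volume along a different direction. It assumes the slices are already aligned and equally sized, which real CT and MRI data generally require extra processing to achieve; the function names `stack_slices` and `side_view` and the tiny intensity values are hypothetical.

```python
def stack_slices(slices):
    """Combine equally sized 2D slices into a 3D volume,
    indexed as volume[z][y][x]."""
    height = len(slices[0])
    width = len(slices[0][0])
    for s in slices:
        if len(s) != height or any(len(row) != width for row in s):
            raise ValueError("all slices must have the same size")
    return [[list(row) for row in s] for s in slices]


def side_view(volume, x):
    """Cut the volume at a fixed x position, producing a 2D view
    (z by y) that none of the original slices contained."""
    return [[row[x] for row in s] for s in volume]


# Three hypothetical 3x3 slices through a small bright structure.
slices = [
    [[0, 0, 0], [0, 5, 0], [0, 0, 0]],
    [[0, 5, 0], [5, 9, 5], [0, 5, 0]],
    [[0, 0, 0], [0, 5, 0], [0, 0, 0]],
]

volume = stack_slices(slices)
view = side_view(volume, x=1)

for row in view:
    print(row)
```

Once the slices form a volume, the structure can be inspected from any direction, which is the basic idea behind the 3D models mentioned above.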

4.5. Video-based interaction: gaming

Vision-based interfaces have been developed recently, allowing players to move their bodies to interact with the game. The interface can sense the position of the body, the orientation of the head, the direction of gaze and the different gestures made by the player. The character in the game can then respond accordingly. These interfaces provide a much more exciting and fun experience overall.

Applying computer vision to computer games poses some challenges. The response time must be as short as possible, and the hardware cost needs to be very low.

4.6. Computer vision as a barrier

For decades, many artists such as Salvador Dalí and M.C. Escher have worked with optical illusions. An optical illusion is any illusion of the sense of sight that makes us perceive reality incorrectly. A computer would not have any difficulty with an optical illusion, because optical illusions arise from human physiological and cognitive factors.

However, there are other images whose content a human can interpret rather easily, while current computers cannot. Figure 6 shows a Google CAPTCHA. A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a test used in computing to determine whether the user is a human or not. CAPTCHAs are used as a barrier on the Internet, and they have been getting harder over the last few years because OCR systems are getting better.

Figure 6: Google CAPTCHA

5. Conclusions

Computer vision should not be thought of as a question of when computers will have enough artificial intelligence to do what humans can do. It is not trying to mimic human abilities, but to extend them.

Although there is still a lot to discover in the field of computer vision, and some books on the topic become obsolete as soon as they are published, computer vision is reaching a point where it is changing our lives, and it will change them even more drastically.

From image inspection and assembly tasks to motion-controller devices, computer vision is touching our lives in every area. Object recognition, face detection, smart cars and medical imaging are only a few of the applications that are changing how we live and coexist, as humans and as a society.

Computer vision is leading us to a sensor-driven world, which is going to help us improve as a society in many respects, but probably not in all of them. Currently the majority of security cameras rely on human intervention to detect strange behavior or anomalies, but that is going to change. There will come a point where these cameras, by using biometric identification techniques, will be able to recognize, identify and track people.

Until now, computer vision has provided only useful tools, but soon some questions will need to be answered about where to draw the line between what is ethically and morally correct and what is not, especially considering that current laws on data collection and management, privacy and surveillance are blurry.

6. References

[1] Y. Aloimonos, Special Issue on Purposive and Qualitative Active Vision, 1992.

[2] C. H. Chen, Computer Vision in Medical Imaging, Oct. 15, 2013.

[3] J. J. Gibson, The Ecological Approach to Visual Perception. Boston: Houghton Mifflin, 1979.

[4] J. Gribbin, Historia de la Ciencia (1543-2001). Barcelona: Editorial Crítica, 2003.

[5] D. Mery and D. Filbert, A fast non-iterative algorithm for the removal of blur caused by uniform linear motion in X-ray images. In Proceedings of the 15th World Conference on Non-Destructive Testing (15th WCNDT), Rome, Oct. 15-21, 2000.

[6] D. Mery and F. Pedreschi, Segmentation of colour food images using a robust algorithm. Journal of Food Engineering, 2004 (accepted April 2004).

[7] D. Marr, Vision. San Francisco: Freeman, 1982.

[8] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, 1998.
