
  • ROAD MODELING FOR AUTONOMOUS

    VEHICLE (FINAL REPORT)

    PREPARED AT : INNOVATION CELL IIT BOMBAY

    SUBMITTED BY: AMARTANSH DUBEY

    MENTOR: EBRAHIM ATTARWALA

    ACKNOWLEDGEMENTS

    I take this opportunity to express my gratitude to the people who have been instrumental in the successful completion of this project. I am

    thankful to Innovation Cell, IIT Bombay, for providing me with this opportunity to learn, explore and implement my skills. I thank my

    mentor EBRAHIM ATTARWALA and all the other mentors for their constant guidance and support throughout the course of the project.

    PROBLEM STATEMENT:

    To develop image processing algorithms for an autonomous vehicle, which basically involves: 1) Lane detection. 2) Vehicle detection and recognition. 3) Character, text and road sign detection and recognition. 4) Controlling the motion of the vehicle using the above three algorithms in real time.

  • INDEX 1. INTRODUCTION TO IMAGE PROCESSING AND OPENCV

    2. OpenCV vs. MATLAB (why OpenCV, not MATLAB)

    3. WHY FPGA OR GPU NEEDED FOR REAL TIME IP

    4. FPGA (Field Programmable Gate Array):

    5. ADVANTAGES OF USING FPGA FOR IMAGE PROCESSING:

    6. DISADVANTAGES OF USING FPGA FOR IMAGE PROCESSING:

    7. IMAGE PROCESSING ALGORITHMS FOR LANE DETECTION

    8. Part A: Briefing the concepts of image processing used in the codes for detecting lanes, removing noise, etc.

    SUBHEADINGS: CANNY/SOBEL EDGE DETECTION.

    HOUGH STANDARD TRANSFORM (for shape detection).

    HOUGH PROBABILISTIC TRANSFORM (for shape detection).

    BIRD EYE VIEW / PERSPECTIVE VISION.

    GAUSSIAN BLUR.

    HORIZON REMOVAL.

    OTHER SMALL CONCEPTS.

    9. Part B: Algorithms and codes which worked well and detected lanes successfully but were not adaptive to all conditions.

    SUBHEADINGS:

    ALGORITHM-1 & FLOWCHART AND PROBLEM IN IT
    ALGORITHM-2 AND PROBLEM IN IT

  • ALGORITHM-3 AND PROBLEM IN IT

    10. Part C: Final code. SUBHEADINGS:

    Final algorithm. Final flowchart.

    RESULT.

    11. TRAFFIC SIGN, FACE, CHARACTER RECOGNITION: SUBHEADINGS:

    WHY CASCADED CLASSIFIER NEEDED:

    SAMPLING:

    SETTING REGION OF INTEREST OR CROPPING

    Training and Classification:

    OUTPUT OF CODE

    12. FUTURE WORK THAT CAN BE DONE

    13. IMPORTANT LINKS CONTAINING

    CODES, SAMPLE VIDEOS, OUTPUT VIDEOS

  • INTRODUCTION TO IMAGE PROCESSING AND OPENCV:

    Image processing is a method to convert an image into digital form and perform some operations on it, in order to get an enhanced image or to extract some useful information from it. It is a type of signal processing in which the input is an image, such as a video frame or photograph, and the output may be an image or characteristics associated with that image. Usually an image processing system treats images as two-dimensional signals and applies standard signal processing methods to them.

    It is among the rapidly growing technologies today, with applications in various aspects of business. Image processing also forms a core research area within the engineering and computer science disciplines.

    Image processing basically includes the following three steps:

    - Importing the image with an optical scanner or by digital photography.

    - Analyzing and manipulating the image, which includes data compression, image enhancement and spotting patterns that are not visible to human eyes, as in satellite photographs.

    - Output, the last stage, in which the result can be an altered image or a report based on image analysis.

    OPENCV:

    OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at real-time computer vision, developed by Intel Russia research center in Nizhny Novgorod, and now supported by Willow Garage and Itseez.[1] It is free for use under the open source BSD license. The library is cross-platform. It focuses mainly on real-time image processing. If the library finds Intel's Integrated Performance Primitives on the system, it will use these proprietary optimized routines to accelerate itself.

    OpenCV is written in C++ and its primary interface is in C++, but it still retains a less comprehensive though extensive older C interface. There are now full interfaces in Python, Java and MATLAB/OCTAVE (as of version 2.5).
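    To give a flavour of the API, here is a minimal sketch (the file name "road.jpg" is a placeholder) that loads an image, converts it to grayscale and displays both, using the OpenCV 2.x C++ interface:

        #include <opencv2/opencv.hpp>

        int main()
        {
            cv::Mat img = cv::imread("road.jpg");   // placeholder input file
            if (img.empty())
                return -1;                          // file not found

            cv::Mat gray;
            cv::cvtColor(img, gray, CV_BGR2GRAY);   // OpenCV stores colour images as BGR

            cv::imshow("input", img);
            cv::imshow("grayscale", gray);
            cv::waitKey(0);                         // wait for a key press before exiting
            return 0;
        }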

  • OpenCV vs. MATLAB (why OpenCV, not MATLAB)

    Speed: MATLAB is built on Java, and Java is built upon C. So when you run a MATLAB program, your computer is busy trying to interpret all that MATLAB code, then it turns it into Java, and finally executes the code. OpenCV, on the other hand, is basically a library of functions written in C/C++. You are closer to directly providing machine language code to the computer, so ultimately you get more image processing done per processing cycle and less interpreting. As a result, programs written with OpenCV run much faster than similar programs written in MATLAB.

    Resources needed: Due to its high-level nature, MATLAB uses a lot of your system's resources. MATLAB code can require over a gigabyte of RAM to run through a video; in comparison, typical OpenCV programs only require about 70 MB of RAM to run in real time. The difference is huge!

    Cost: The list price for the base MATLAB (no toolboxes, commercial single-user licence) is around USD 2150. OpenCV is free!

    Visual Studio 2010: I used Microsoft Visual Studio 2010 as the programming platform for writing, compiling and running my OpenCV code; it is very easy to integrate the OpenCV libraries with Visual Studio, and it is fast and efficient.

    WHY FPGA OR GPU NEEDED: An autonomous vehicle has to recognize lanes, obstacles (vehicles, etc.), traffic signs, faces and so on in real time, otherwise it may cause serious accidents, so it is a big challenge to execute the algorithms in real time without any lag. As the complexity of the code increased, my laptop started showing some lag. I tried some embedded boards like the PANDA BOARD and the INTEL ATOM DE2i-150, but because the complexity of my code was high, real-time execution was not achievable; the lag was not too large, but if the complexity increases further it may become quite significant.

  • FPGA (Field Programmable Gate Array): FPGAs are programmable semiconductor devices that are based around a matrix

    of Configurable Logic Blocks (CLBs) connected through programmable interconnects. FPGAs can be programmed to match the desired application or functionality requirements using an HDL, i.e. a hardware description language such as VHDL or VERILOG. The logic blocks shown below are made up of simple gate arrangements that form a multiplexer-like design with a specific lookup table (truth table), and this truth table is fed to the logic blocks using the HDL.

    ADVANTAGES OF USING FPGA FOR IMAGE PROCESSING:

    The biggest advantage of using an FPGA is parallel processing. As mentioned earlier, an FPGA is an array of lakhs (hundreds of thousands) of programmable gates which can be arranged into the desired building blocks according to our programming needs. Different regions of the FPGA can be used for different processing, which is called parallel processing; this provides great flexibility over a normal CPU, because we can decompose a complex algorithm into simpler ones and execute them in parallel, which substantially boosts speed. Processing can also be shared between the CPU and the FPGA.

  • ALTERA INTEL ATOM DE2i-150 BOARD:

    As shown in the figure, this is the INTEL ATOM DE2i-150 board. It is a nice combination of an Intel Atom processor running at 1.6 GHz and a Cyclone IV FPGA. The Atom processor and the FPGA are connected by a communication bus to boost the processing and achieve real-time feedback. When I started working on this board, its Intel Atom processor worked well for less complex code, but for more complex code it started lagging, so I tried to link the FPGA available on this board to the Atom processor via the PCIe communication bus to boost the speed of execution. After a lot of searching I finally came to the conclusion that there are many disadvantages in using an FPGA, mentioned below.

    DISADVANTAGES OF USING FPGA FOR IMAGE PROCESSING:

    There are three major problems with FPGAs.

    First, an FPGA does not have any memory of its own; it is just hardware with no storage capacity. Once the power is off, the FPGA loses its program, so one always has to use some processing unit to program the FPGA.

    Secondly, one cannot use familiar languages like C, C++, Java, etc. to program an FPGA; it must be programmed in a hardware description language like VHDL or VERILOG, and implementing a complex algorithm in an HDL is quite hard.

  • Third, OpenCV is a dynamic library, whereas an FPGA is bare hardware with no memory, so we can't use OpenCV for image processing on it; we have to use other tools like OPENCL or VIVADO, and these tools are not as efficient as OpenCV.

    Because of these problems I used an i7 processor, which has enough processing speed for my code.

    IMAGE PROCESSING ALGORITHMS FOR LANE

    DETECTION: I have divided this section into three parts:

    Part A: Briefing the concepts of image processing used in the codes for detecting lanes, removing noise, etc.

    Part B: Algorithms and codes which worked well and detected lanes successfully but were not adaptive to all conditions, depending on factors like the type of road, lanes and surroundings. I tried 3 such algorithms and finally reached the final algorithm.

    Part C: The final algorithm and code for lane detection, which works well for all roads and conditions.

  • PART A: FOLLOWINGS ARE CONCEPTS AND TOOLS I USED IN LANE

    DETECTION:

    1) CANNY/SOBEL EDGE DETECTION.

    2) HOUGH TRANSFORM (for shape detection).

    3)BIRD EYE VIEW / PERSPECTIVE VISION.

    4) GAUSSIAN BLUR.

    5)HORIZON REMOVAL.

    6) There are many other small concepts which I have dealt with in the explanation of the codes.

    DETAILED EXPLANATIONS:

    1) CANNY/SOBEL EDGE DETECTION:

    1. The Canny edge detector was developed by John F. Canny in 1986. Also known to many as the optimal detector, the Canny algorithm aims to satisfy three main criteria:

    o Low error rate: Meaning a good detection of only existent edges.

    o Good localization: The distance between detected edge pixels and real edge pixels has to be minimized.

    o Minimal response: Only one detector response per edge.

    IMPLEMENTATION:

  • 1) Filter out any noise. The Gaussian filter is used for this purpose. An example of a Gaussian kernel of size 5 that might be used is shown below:

        K = (1/159) | 2  4  5  4  2 |
                    | 4  9 12  9  4 |
                    | 5 12 15 12  5 |
                    | 4  9 12  9  4 |
                    | 2  4  5  4  2 |

    2) Find the intensity gradient of the image: apply a pair of convolution masks (in the x and y directions),

        G_x = | -1  0 +1 |        G_y = | -1 -2 -1 |
              | -2  0 +2 |              |  0  0  0 |
              | -1  0 +1 |              | +1 +2 +1 |

    and find the gradient strength and direction with:

        G = sqrt(G_x^2 + G_y^2),    theta = arctan(G_y / G_x)

    The direction is rounded to one of four possible angles (namely 0, 45, 90 or 135 degrees).

    3) Non-maximum suppression is applied. This removes pixels that are not considered to be part of an edge; hence only thin lines (candidate edges) will remain.

    4) Hysteresis: the final step. Canny uses two thresholds (upper and lower): if a pixel gradient is higher than the upper threshold, the pixel is accepted as an edge; if a pixel gradient value is below the lower threshold, it is rejected; if the pixel gradient is between the two thresholds, it is accepted only if it is connected to a pixel that is above the upper threshold. Canny recommended an upper:lower ratio between 2:1 and 3:1.
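    In OpenCV the whole pipeline is wrapped up by cv::GaussianBlur and cv::Canny. A minimal sketch (the blur size, the lower threshold of 50 and the 3:1 ratio are assumptions to be tuned):

        // 'gray' is the grayscale frame from the previous step
        cv::Mat blurred, edges;
        cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 1.5);   // step 1: suppress noise
        int lowThresh = 50;                                     // assumed lower threshold
        cv::Canny(blurred, edges, lowThresh, 3 * lowThresh);    // steps 2-4 happen inside Canny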

  • 2) HOUGH TRANSFORM: The Hough transform is a function available in OpenCV which helps in detecting standard geometric shapes like circles, lines and ellipses. I used the Hough line transform for lane detection. There are two types of Hough line transform: 1) the HOUGH STANDARD TRANSFORM and 2) the HOUGH PROBABILISTIC TRANSFORM.

    2A) HOUGH STANDARD TRANSFORM: When we treat an image as a matrix of pixels, a line in this image matrix can be represented in two basic forms: 1) the Cartesian coordinate system, with parameters (m, b); 2) the polar coordinate system, with parameters (r, theta).

    In the standard Hough transform we express lines in the polar system. Hence, a line equation can be written as:

        y = (-cos(theta)/sin(theta)) x + r/sin(theta)

    or, rearranging:

        r = x cos(theta) + y sin(theta)

    so each pair (r, theta) represents a line that passes through the point (x, y).

    In general, for each point (x0, y0) we can define the family of lines that goes through that point as:

        r_theta = x0 cos(theta) + y0 sin(theta)

    If for a given (x0, y0) we plot this family of lines, we get a sinusoid. For instance, for x0 = 8, y0 = 6, we get a sinusoidal curve in the (theta, r) plane.

  • We consider only points such that r > 0 and 0 < theta < 2*pi.

    We can do the same operation for all the points in an image. If the curves of two different points intersect in the (theta, r) plane, that means both points belong to the same line. For instance, continuing the example above and drawing the plots for two more points, the three curves intersect in one single point (theta0, r0); these coordinates are the parameters (r, theta) of the line on which all three points lie. It means that, in general, a line can be detected by finding the number of intersections between curves: the more curves intersect, the more points the line represented by that intersection contains. We can therefore define a threshold on the minimum number of intersections needed to detect a line.
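    In OpenCV this accumulator-and-threshold procedure is what cv::HoughLines performs: it returns the (r, theta) pairs whose vote count exceeds the threshold. A minimal sketch (the 100-vote threshold and the colour image 'display' are assumptions):

        #include <cmath>
        #include <vector>

        std::vector<cv::Vec2f> lines;
        cv::HoughLines(edges, lines, 1, CV_PI / 180, 100);   // 1 px and 1 degree resolution
        for (size_t i = 0; i < lines.size(); i++) {
            float r = lines[i][0], theta = lines[i][1];
            double a = std::cos(theta), b = std::sin(theta);
            cv::Point p0(cvRound(r * a), cvRound(r * b));    // foot of the perpendicular
            // extend the line far in both directions along its direction vector
            cv::Point p1(cvRound(p0.x - 1000 * b), cvRound(p0.y + 1000 * a));
            cv::Point p2(cvRound(p0.x + 1000 * b), cvRound(p0.y - 1000 * a));
            cv::line(display, p1, p2, cv::Scalar(0, 0, 255), 2);
        }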

    2B) HOUGH PROBABILISTIC TRANSFORM: It is a more efficient and accurate way to detect lines than the standard Hough transform because, instead of returning lines in polar coordinates, it directly gives the two Cartesian endpoints of each detected line, so it is easy to interpret the data returned by the function.

    ADVANTAGES OF HOUGH PROBABILISTIC OVER HOUGH STANDARD: (The advantages can be clearly seen from the two screenshots above of the standard and probabilistic Hough implementations on a road.)

    In the probabilistic Hough transform there is a parameter minLineLength, used to set the minimum line length; line segments shorter than this are rejected. It is not present in the standard Hough transform.

    Another parameter present in the probabilistic but not in the standard transform is maxLineGap, the maximum allowed gap between points on the same line for them to be linked.

    The third and most important advantage is that the probabilistic Hough transform directly returns Cartesian coordinates, not polar ones. A sketch of its use follows.
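    A minimal sketch of cv::HoughLinesP (the vote threshold and the two length parameters are assumptions to be tuned per video):

        std::vector<cv::Vec4i> segs;
        cv::HoughLinesP(edges, segs, 1, CV_PI / 180,
                        50,     // accumulator votes needed
                        50,     // minLineLength: reject shorter segments
                        10);    // maxLineGap: bridge small breaks in a lane marking
        for (size_t i = 0; i < segs.size(); i++) {
            cv::Vec4i l = segs[i];   // (x1, y1, x2, y2) already in Cartesian pixels
            cv::line(display, cv::Point(l[0], l[1]), cv::Point(l[2], l[3]),
                     cv::Scalar(0, 255, 0), 2);
        }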

  • 3) BIRD EYE VIEW / PERSPECTIVE VIEW:

    When a camera takes an image it covers an area in the shape of a trapezium: as the vertical distance increases the camera covers more area, and near the camera it covers less area. As a result, it can be seen in the image above that lanes which are parallel appear to converge and intersect at some point. So the camera sees the lanes as non-parallel lines, but we want the real geometry (as if a bird were viewing the road from the top).

    So, to get the lanes parallel, I remapped the pixels of the trapezium into a rectangle by calculating the relation between real-world distance and pixels (for example, 1 cm = 10 pixels), forming two matrices to map point 2 onto point C and similarly point 3 onto point D, and then using the remapping function available in OpenCV. This gives me parallel lanes as output; a sketch is given below.
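    A minimal sketch of this remapping with cv::getPerspectiveTransform / cv::warpPerspective; the four trapezium corners are assumptions that must be measured for the actual camera mounting:

        // corners of the road trapezium in the camera image (assumed values, 640x480 frame)
        cv::Point2f src[4] = { cv::Point2f(220, 240), cv::Point2f(420, 240),   // far corners (points 2 and 3)
                               cv::Point2f(0, 480),   cv::Point2f(640, 480) }; // near corners
        // where they should land in the bird-eye image (points C and D on top)
        cv::Point2f dst[4] = { cv::Point2f(0, 0),     cv::Point2f(640, 0),
                               cv::Point2f(0, 480),   cv::Point2f(640, 480) };
        cv::Mat H = cv::getPerspectiveTransform(src, dst);
        cv::Mat birdEye;
        cv::warpPerspective(frame, birdEye, H, frame.size());   // lanes now appear parallel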

  • 4) HORIZON REMOVAL: This means removing the horizon (the sky and other unwanted area above the road), which helps in removing unwanted noise and disturbances. It is done by accessing the rows and columns and setting the pixels of the unwanted region to 0, or by using the function setROI. A sketch follows.
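    A one-line sketch of the row-zeroing approach; the horizon row of 200 is an assumption to be tuned for the camera angle:

        int horizonRow = 200;   // assumed: everything above this row is sky
        frame(cv::Rect(0, 0, frame.cols, horizonRow)).setTo(cv::Scalar::all(0));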

    5) GAUSSIAN BLUR:

    It is used to remove noise from the input image. When the Gaussian blur function is applied to the matrix of pixels of the input image, it produces the output image by convolving it with a kernel whose weights follow the 2D Gaussian:

        G(x, y) = (1 / (2*pi*sigma^2)) * exp(-(x^2 + y^2) / (2*sigma^2))

    The function uses this kernel to perform a weighted averaging of the pixels of the input image.
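    In code this is a single call; the 5x5 kernel size here is an assumption (larger kernels blur more):

        cv::Mat denoised;
        cv::GaussianBlur(frame, denoised, cv::Size(5, 5), 0);   // sigma computed from the kernel size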

    PART B: Here I have mentioned the algorithms and codes which worked well and detected lanes successfully but were not adaptive to all conditions, depending on factors like the type of road, lanes and surroundings. I tried 3 such algorithms and finally reached the final algorithm.

    ALGORITHM 1 & FLOWCHART:

  • ALGORITHM-1 FLOWCHART (it is not adaptive for every type of lane):

    Start main function
    → LOAD IMAGE OR VIDEO
    → CONVERT IMAGE OR FRAME INTO GRAYSCALE AND APPLY CANNY EDGE DETECTOR FUNCTION
    → REMOVE NOISE AND REMOVE HORIZON
    → APPLY HOUGH STANDARD TRANSFORM AND HOUGH PROBABILISTIC TRANSFORM SEPARATELY ON THE SAME IMAGE WITH ANGLE THRESHOLDING
    → PERFORM BITWISE AND ON EACH PIXEL OF BOTH PROCESSED IMAGES, i.e. ONE WITH HOUGH STANDARD TRANSFORM AND ONE WITH HOUGH PROBABILISTIC
    → DISPLAY THE PROCESSED IMAGE
    → RELEASE ALL THE MEMORY USED BY THE IMAGES AND WINDOWS

  • IMPORTANT NOTE: code, sample videos and output videos are uploaded on Google Drive:

    https://drive.google.com/folderview?id=0BxV8Z1s8nFXWcDRuOXN1WEFRV3c&usp=sharing

    THIS IS THE OUTPUT WHEN THE ALGORITHM IS EXECUTED:

    The advantage of removing the horizon can be seen: there are no lines above the boundary that separates road and sky.

    PROBLEM WITH ALGORITHM 1: For controlling a self-driving car and maintaining a fixed distance from both lanes, I needed the equations of the detected lines. That was not possible here, because more than one line was detecting each lane; I needed one Hough line per lane so that I could find its equation and the distance of the car from the lane.

    ALGORITHM 2: As discussed above, finding the equation of a lane is possible only when I get one line per lane, so I used a running (weighted) mean of the values of rho and theta to get one single line, for example (here k is a counter incremented inside the loop to build the weighted mean of rho and theta):

        rho1 = (rho1*(k-1) + rho) / k;
        theta1 = (theta1*(k-1) + theta) / k;
        k++;
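    For context, here is a sketch of how this running mean sits inside the cv::HoughLines loop (assuming 'lines' holds only the candidates for one lane after angle thresholding):

        float rho1 = 0.f, theta1 = 0.f;
        int k = 1;
        for (size_t i = 0; i < lines.size(); i++) {
            float rho = lines[i][0], theta = lines[i][1];
            rho1   = (rho1   * (k - 1) + rho)   / k;   // incremental mean of r
            theta1 = (theta1 * (k - 1) + theta) / k;   // incremental mean of theta
            k++;
        }
        // (rho1, theta1) now describes the single averaged line for this lane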

  • IMPORTANT NOTE: code, sample videos and output videos are uploaded on Google Drive: https://drive.google.com/folderview?id=0BxV8Z1s8nFXWcDRuOXN1WEFRV3c&usp=sharing

    PROBLEM WITH ALGORITHM 2:

    ALGORITHM 2 solves the problem algorithm 1 had: now I get one Hough line for each lane and can find the equation of the line. But in the case of 4-lane or 6-lane roads, the other lanes are also detected and get added to the weighted mean of the Hough lines, causing wrong results. This is a serious problem, so I dumped this code and moved to another algorithm. One more problem with this code is that it does not work for curved roads, because I applied a threshold on the values of theta obtained from the Hough function:

        if (theta < (CV_PI - 19.*CV_PI/30) || theta > 19.*CV_PI/30.)   // for first lane
        ...
        else if (theta < (CV_PI/180)*70 && theta > (CV_PI/180)*20)     // for 2nd lane

    This does not allow the Hough function to output lines whose theta is outside this range, so curves could not be detected; only straight lanes were detected successfully.

    ALGORITHM 3:

    In this code I used the distance of the Hough lines as a tool to filter out extra Hough lines and obtain one Hough line for each lane; this avoids the weighted mean and the errors mentioned in algorithm 2. Since a polar line satisfies r = x cos(theta) + y sin(theta), the column at which each Hough line crosses a fixed reference row (row 350) is:

        column = (-sin(theta)/cos(theta))*350 + rho/cos(theta);

    I divided the Hough lines into two categories, those on the left half of the image and those on the right, and then kept only the lines nearest to the centre of the camera (on the left and right of the centre respectively), because between the lanes there is very little chance of having lane-like thick straight lines. Small lines which could be falsely detected are removed by the Gaussian blur or by tuning the parameters of the probabilistic Hough function (increasing the minimum gap between points which should be detected as lines). K-means clustering can also be used to remove other lines, and road signs which might cause problems are removed using a Haar classifier. A sketch of the distance filter follows.
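    A sketch of that filter (the image centre of 320 px and the reference row 350 are assumptions for a 640x480 frame):

        #include <cmath>

        int centerX = 320;                          // assumed image centre
        double bestLeft = -1e9, bestRight = 1e9;    // columns of the nearest line on each side
        for (size_t i = 0; i < lines.size(); i++) {
            float rho = lines[i][0], theta = lines[i][1];
            if (std::fabs(std::cos(theta)) < 1e-3)
                continue;                           // skip near-horizontal lines
            double column = -std::sin(theta) / std::cos(theta) * 350 + rho / std::cos(theta);
            if (column < centerX && column > bestLeft)  bestLeft  = column;  // left lane
            if (column > centerX && column < bestRight) bestRight = column;  // right lane
        }
        // bestLeft / bestRight are the lane positions used to localize the car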

    PROBLEM WITH ALGORITHM 3: This code solves the problem of the weighted mean, but curves still could not be detected because the thresholding applied on theta is still there. Moreover, since the lines are very slanted and it is not possible to calculate the pixel-to-real-world distance relation, it is not possible to localize the car between the lanes.

  • IMPORTANT NOTE: code, sample videos and output videos are uploaded on Google Drive: https://drive.google.com/folderview?id=0BxV8Z1s8nFXWcDRuOXN1WEFRV3c&usp=sharing

    PART C

    Final algorithm: To get accurate distances and the relation between real-world distance and pixels, I applied the BIRD EYE VIEW (PERSPECTIVE VISION) already described above. This gives me the relation between pixels and real-world distance, and now the car can be localized between the lanes.

    Also, for detecting curved lanes I used the following algorithm: the probabilistic Hough transform returns pairs of coordinates which define line segments. I find the centroid of each pair of coordinates and join it to the centroid of the next pair (instead of joining the endpoint pairs themselves, I join the centroids). This algorithm works well for detecting curved lanes because, in this way, the Hough lines are formed along the tangents (chords) of the curve. This is motivated by the MEAN VALUE THEOREM: if a function f is continuous on the closed interval [a, b], where a < b, and differentiable on the open interval (a, b), then there exists a point c in (a, b) such that

        f'(c) = (f(b) - f(a)) / (b - a),

    i.e. somewhere on the arc the tangent is parallel to the chord joining the endpoints. A sketch of the centroid-joining step follows.
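    A sketch of the centroid-joining step (sorting the midpoints by row so the polyline runs away from the camera is my assumption about the ordering):

        #include <algorithm>
        #include <vector>

        std::vector<cv::Point> centroids;
        for (size_t i = 0; i < segs.size(); i++) {
            cv::Vec4i l = segs[i];   // segment endpoints from cv::HoughLinesP
            centroids.push_back(cv::Point((l[0] + l[2]) / 2, (l[1] + l[3]) / 2));
        }
        std::sort(centroids.begin(), centroids.end(),
                  [](const cv::Point& a, const cv::Point& b) { return a.y < b.y; });
        // join consecutive centroids: the chain of short chords follows the curve
        for (size_t i = 1; i < centroids.size(); i++)
            cv::line(display, centroids[i - 1], centroids[i], cv::Scalar(0, 255, 0), 2);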

    FINAL FLOWCHART (shown below). IMPORTANT NOTE: code, sample videos and output videos are uploaded on Google Drive:

    https://drive.google.com/folderview?id=0BxV8Z1s8nFXWcDRuOXN1WEFRV3c&usp=sharing

  • FINAL FLOWCHART:

    Start main function
    → LOAD IMAGE OR VIDEO
    → CONVERT IMAGE OR FRAME INTO GRAYSCALE AND APPLY CANNY EDGE DETECTOR FUNCTION
    → APPLY BIRD EYE VIEW AND REMOVE NOISE
    → APPLY HOUGH PROBABILISTIC TRANSFORM AND GET THE PAIRS OF COORDINATES OF THE LINES IN A VECTOR
    → JOIN CENTROIDS OF THE PAIRS OF COORDINATES INSTEAD OF JOINING THE COORDINATES, TO DETECT CURVES, AND FILTER UNWANTED HOUGH LINES BY THE MINIMUM-DISTANCE METHOD MENTIONED ABOVE
    → OPEN SERIAL PORT TO SEND DATA THROUGH UART
    → DISPLAY THE PROCESSED IMAGE
    → RELEASE ALL THE MEMORY USED BY THE IMAGES AND WINDOWS

  • RESULT

    When the code is executed, the following results are displayed in the output windows:

    1) FIRST WINDOW: It shows the input image.

  • 2) SECOND WINDOW: It shows the image after applying BIRD EYE VIEW, CANNY EDGE and GAUSSIAN BLUR.

    3) THIRD WINDOW: It shows the image on which all the detected Hough lines are drawn, without filtering.

    4) FOURTH WINDOW: It shows the final image with only the Hough lines on the lanes.

    5) FIFTH WINDOW (shown at last): It shows the distance of the centre of the camera from both lanes, and sends the data for localizing the car to the processor using UART.

    IMPORTANT NOTE: code, sample videos and output videos are uploaded on Google Drive: https://drive.google.com/folderview?id=0BxV8Z1s8nFXWcDRuOXN1WEFRV3c&usp=sharing

    TRAFFIC SIGN, FACE, CHARACTER RECOGNITION

    WHY A CASCADE CLASSIFIER IS NEEDED: Object recognition in OpenCV can be done in several ways, for example:

    COLOR THRESHOLDING: Recognising an object of a particular color is easy with color thresholding, but it has several constraints; for example, if another object of the same color appears in the background, one may get wrong results. So one cannot rely on it for good results if the background is not fixed.

    CONTOUR DETECTION: It increases accuracy compared to color recognition. It detects closed areas, and one can filter out other objects of the same color (in the background) using the tools provided by the contour functions; for example, with the contour area one can set the working range of the camera (if the area is required to be greater than a particular value, then objects of the same color in the background having a small area are removed). But objects of approximately the same area and color may still cause misleading results; see the sketch below.
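    A sketch of such an area filter ('mask' is an assumed binary image produced by colour thresholding, and the 500-pixel minimum area is an assumption):

        std::vector<std::vector<cv::Point> > contours;
        cv::findContours(mask, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);  // note: modifies 'mask'
        for (size_t i = 0; i < contours.size(); i++) {
            if (cv::contourArea(contours[i]) > 500)   // drop small same-coloured background blobs
                cv::drawContours(display, contours, (int)i, cv::Scalar(0, 0, 255), 2);
        }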

    FEATURE EXTRACTION USING CASCADE CLASSIFIERS: A classifier defines classes, or categories of objects. By running samples of each class through the classifier to train it on what constitutes a given class, you can then run the trained classifier on unknown images to determine to which class each belongs. There are many classifiers available on the internet, like HAAR, LBP, etc.

    With these classifiers one can not only detect color but also get good results on complex tasks which are not possible with contour detection. Classifiers can be used for face detection, character and text recognition, and much more.

    A classifier works on feature extraction. It involves the following steps:

  • 1) SAMPLING: Sampling means collecting sample images of the object which is to be detected. This is a very important step, and for good results sampling should be done accurately. Generally, for a good face detection program, more than 1000 samples have to be taken. Suppose I want to detect a traffic sign; for that I have to gather sample images of the sign from all possible angles and brightness conditions. The more samples gathered, the higher the accuracy. In order to train our own classifier we need samples, which means we need a lot of images that show the object we want to detect (positive samples) and even more images without the object (negative samples).

    POSITIVE IMAGES: These are images of the object to be detected. Take photos of the object you want to detect, look for them on the internet, extract them from a video, or take some Polaroid pictures to generate positive samples for OpenCV to work with. It is also important that they differ in lighting and background.

    NEGATIVE IMAGES: Negative images are the ones that don't show the object to be detected. In the best case, if one wants to train a highly accurate classifier, one should have a lot of negative images that look exactly like the positive ones, except that they don't contain the object we want to recognize. As I want to detect stop signs on walls, the negative images would ideally be a lot of pictures of walls, maybe even with other signs. Keep an eye on the aspect ratios of the cropped images; they shouldn't differ too much. The best results come from positive images that look exactly like the ones in which you'd want to detect the object, except that they are cropped so only the object is visible.

    2) SETTING REGION OF INTEREST OR CROPPING: The second step is setting the region of interest in all the gathered sample images, that is, removing the unwanted background and selecting only the features of the object which characterize it. You need to crop the images so that only the desired object is visible.

    3) Now collect the information of the cropped images in a text file; this information means their size and position in the original image. A link which explains the method of doing so is given at the end of the report. An illustration of the typical workflow follows.
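    As an illustration only (all paths, counts and sizes here are assumptions), the description file and OpenCV's training tools are typically used like this:

        positives.txt, one line per image (path, object count, then x y width height):
            pos/sign01.jpg 1 120 80 45 45
            pos/sign02.jpg 1 60 40 50 50

        pack the positive samples into a .vec file:
            opencv_createsamples -info positives.txt -vec signs.vec -num 200 -w 24 -h 24

        train the cascade (bg.txt simply lists the negative image paths):
            opencv_traincascade -data classifier -vec signs.vec -bg bg.txt -numPos 180 -numNeg 400 -numStages 15 -w 24 -h 24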

    4) TRAINING AND CLASSIFICATION: Training is the process of taking content that is known to belong to specified classes and creating a classifier on the basis of that known content. Classification is the process of taking a classifier built with such a training set and running it on unknown content to determine class membership for the unknown content. Training is an iterative process whereby you build the best classifier possible, and classification is a one-time process designed to run on unknown content.

    Finally, after training, an XML file is generated which is loaded in the main program to match the features and detect whether the object is present or not; a sketch of this last step follows.
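    A sketch of loading the trained XML and running it on a grayscale frame (the file name "sign_cascade.xml" and the detector parameters are assumptions):

        cv::CascadeClassifier cascade;
        if (!cascade.load("sign_cascade.xml"))   // XML produced by training
            return -1;
        std::vector<cv::Rect> found;
        cascade.detectMultiScale(gray, found,
                                 1.1,            // scale step between image pyramid levels
                                 3,              // min neighbours: higher = fewer false positives
                                 0, cv::Size(24, 24));   // ignore detections smaller than this
        for (size_t i = 0; i < found.size(); i++)
            cv::rectangle(display, found[i], cv::Scalar(255, 0, 0), 2);   // mark each detection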

  • OUTPUT OF CODE: Here is the output when the code was run:

    TRAFFIC SIGN, FACE AND VEHICLE RECOGNITION:

    IMPORTANT NOTE: code, sample videos and output videos are uploaded on Google Drive: https://drive.google.com/folderview?id=0BxV8Z1s8nFXWcDRuOXN1WEFRV3c&usp=sharing

  • FUTURE WORK THAT CAN BE DONE

    1) Pedestrian recognition for better localization.

    2) In vehicle and road sign detection I used 200 positive images for training, which sometimes gives wrong results, so the number of samples should be increased.

    3) Shadow and illumination correction for better results.

    4) Multithreading of the cascade classifier can be used; it increases the speed of training.

    5) The LBP classifier can be tried as an alternative; it trains much faster, though it is usually somewhat less accurate than HAAR.

    IMPORTANT NOTE: code, sample videos and output videos are uploaded on Google Drive: https://drive.google.com/folderview?id=0BxV8Z1s8nFXWcDRuOXN1WEFRV3c&usp=sharing
