dc.description.abstract |
Computer vision is a state-of-the-art technology for object detection problems. Accurate obstacle detection could help blind and visually impaired people navigate safely while walking. However, object detection and tracking are among the most challenging tasks in computer vision. Video analysis involves two basic steps: detecting target objects in consecutive frames, and analyzing those objects to understand their behavior. Recently, several object detection methods based on Convolutional Neural Networks have improved performance in terms of both speed and accuracy. Still, the slow recognition speed of most of these methods limits their use in real-time applications. A unified object detection model, You Only Look Once (YOLO), was proposed, which regresses directly from an input image to object class scores and positions. It runs at 45 fps on the PASCAL VOC 2007 dataset, and YOLOv2 (the second version of the model) more than doubles that detection speed while achieving higher accuracy. However, this model still has limitations when applied to obstacle detection tasks for assisting the blind. It processes each image individually, whereas in a video stream the location of an object changes continuously; thus the model ignores a great deal of important information shared between consecutive frames. In this thesis work, we applied YOLOv2 to our own prepared dataset (three classes, namely pothole, garbage bin, and pole, with all images collected in the daytime) and proposed a technique called Short-Term Memory, which carries information between consecutive frames to reinforce YOLO's detection capability in video streaming by incorporating object location and size estimation. With this approach we achieved a mean Average Precision of 60.17% at an average detection speed of 34.6 fps. |
en_US |