Deep Learning Object Recognition Algorithm for Video

Technology #14476

Questions about this technology? Ask a Technology Manager

Download Printable PDF

Categories
Researchers
Jose C. Principe
Rakesh Chalasani
Managed By
Richard Croley
Assistant Director 352-392-8929
Patent Protection
US Patent 9,536,177

Architecture Extends Content-Based Retrieval to Video without Prior Training

This object recognition algorithm is capable of discriminating objects in videos without requiring extensive training as do most available methods. Based on a deep learning architecture normally developed for images, it provides video processing and object tracking to aid in computer vision applications such as self-driving cars, automated military drones, and surveillance. In 2012, the computer vision market was valued at $4.37 billion. Automated object recognition is classically a complex field requiring the specification of the large number of variations an object can have in an environment, including position, rotation, and scale. Many object recognition algorithms capture and process the entire image at once, losing the finer detail and requiring high processing power. Researchers at the University of Florida have developed an unsupervised object recognition algorithm in video that doesn’t require extensive training. The algorithm narrows the targeted data, reducing the amount of information and power necessary for processing.

Application

Algorithm for natural object recognition in video

Advantages

  • Capable of learning model parameters, requiring no human intervention for training
  • Extends content-based retrieval to natural video, aiding computer vision applications
  • Uses time to disambiguate images, resulting in higher quality performance

Technology

This object recognition software is based on deep learning but uses a dynamic model to handle video processing. Therefore it is capable of processing the large numbers of variations an object can have in an environment. The variations include scale, rotation, position, etc. The model sparsely represents the observations, analyzes parts of the input data independently and combines them in a hierarchical fashion with top down information. The inputs from the images are processed before being combined to form a globally invariant representation. These invariant representations can then be fed to a classifier for robust object recognition.