Apparatus and Method for Detecting Scene Text

Technology #16964

Xiaolin Li
Pan He
Managed By
Richard Croley
Assistant Director 352-392-8929
Patent Protection
US Patent Pending

Precisely Identifies Individual Words from a Rough Region of Text Found in the Natural World

This computer vision system detects unconstrained text and symbols on objects in images captured in the natural world rather than viewed on a screen. Identifying and analyzing text in natural images has numerous potential applications, including image retrieval, industrial automation, and robot navigation. However, image text analysis remains challenging because text attributes vary widely in scale, orientation, illumination level, and font, and because text often appears against highly complex backgrounds. Available text detection systems largely rely on bottom-up or component-based analytical procedures, both of which suffer from limitations that significantly reduce efficiency and performance.

Researchers at the University of Florida have developed a fast yet accurate text detection system that predicts word-level bounding boxes from a natural image in a single pass. This real-time detector works reliably on images containing text and symbols at multiple scales and orientations. It achieves state-of-the-art performance by incorporating more local detail and stronger context information, and it avoids errors by using fewer sequential steps.


Application

Computer vision text or symbol detector for use in optical character recognition (OCR) systems, document scanners, or text-based image searches


Advantages

  • Reduces background interference, allowing it to identify extremely challenging text with much higher word-level accuracy
  • Uses a hierarchical, elimination-based algorithm to decode words, enabling it to account for specific text region features and capture stronger context information
  • Applies artificial intelligence search routines, allowing it to learn from errors and speed up the detection process


Technology

This technology detects text in three steps: it analyzes a natural image, identifies text regions within the image, and then marks the detected text by placing a bounding box around it. Suppressing background interference makes it possible to identify even very small text within an image. Additionally, the detector can enhance detail and context information within the scene to reliably detect multi-scale or multi-oriented text in a single image.
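
The last step, placing a bounding box around an identified text region, can be illustrated with a minimal sketch. This is not the patented method, whose internals are not described here. It assumes that an earlier stage has already produced a per-pixel text score between 0 and 1, and it groups above-threshold pixels into connected regions before boxing each one. The function name `boxes_from_score_map`, the 0.5 threshold, and the sample `example_scores` grid are all hypothetical, chosen only for illustration.

```python
from collections import deque
from typing import List, Tuple

# (left, top, right, bottom) in inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def boxes_from_score_map(scores: List[List[float]], threshold: float = 0.5) -> List[Box]:
    """Group above-threshold pixels into 4-connected regions and box each one.

    Illustrative only: `scores` stands in for the output of some text/non-text
    classifier, which this sketch does not implement.
    """
    height = len(scores)
    width = len(scores[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []

    for y in range(height):
        for x in range(width):
            if seen[y][x] or scores[y][x] < threshold:
                continue
            # Breadth-first flood fill over one connected text region,
            # tracking the extreme coordinates as we go.
            left, top, right, bottom = x, y, x, y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and scores[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


# Hypothetical 3x5 score map containing two separate text-like regions.
example_scores = [
    [0.1, 0.9, 0.8, 0.0, 0.0],
    [0.0, 0.7, 0.9, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0, 0.9],
]
print(boxes_from_score_map(example_scores))  # [(1, 0, 2, 1), (4, 1, 4, 2)]
```

This sketch produces axis-aligned boxes only. Handling multi-oriented text, as the technology is described as doing, would require a rotated-box representation, which is outside the scope of this example.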