OCR for World Wide Web images
Jiangying Zhou
Daniel P. Lopresti
Zhibin Lei
3 April 1997
proceedings article
Published by SPIE-Intl Soc Optical Eng, Vol. 3027, pp. 58-66
https://doi.org/10.1117/12.270080
Abstract
A significant amount of text now present in World Wide Web documents is embedded in image data, and a large portion of it does not appear elsewhere at all. To make this information available, we need to develop techniques for recovering textual information from in-line Web images. In this paper, we describe two methods for Web image OCR. Recognizing text extracted from in-line Web images is difficult because characters in these images are often rendered at a low spatial resolution. Such images are typically considered to be 'low quality' by traditional OCR technologies. Our proposed methods utilize the information contained in the color bits to compensate for the loss of information due to low sampling resolution. The first method uses a polynomial surface fitting technique for object recognition. The second method is based on the traditional n-tuple technique. We collected a small set of character samples from Web documents and tested the two algorithms. Preliminary experimental results show that our n-tuple method works quite well. However, the surface fitting method performs rather poorly due to the coarseness and small number of color shades used in the text.
© (1997) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.
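The abstract names the traditional n-tuple technique but does not give the authors' formulation. As a rough illustration only, the sketch below shows a generic n-tuple classifier applied to small images whose pixels take a few quantized shade levels rather than just black and white. The class name `NTupleClassifier`, its parameters (`n`, `num_tuples`, `seed`), and the toy glyphs are all assumptions made for this example; none of them come from the paper.

```python
import random
from collections import defaultdict


class NTupleClassifier:
    """Generic n-tuple classifier over small multi-level images (illustrative only)."""

    def __init__(self, height, width, n=3, num_tuples=40, seed=0):
        rng = random.Random(seed)
        positions = [(r, c) for r in range(height) for c in range(width)]
        # Each tuple is a fixed random selection of n distinct pixel positions.
        self.tuples = [rng.sample(positions, n) for _ in range(num_tuples)]
        # memory[label][i] holds the value combinations seen at tuple i in training.
        self.memory = defaultdict(lambda: [set() for _ in self.tuples])

    def _features(self, image):
        # Read the pixel values at each tuple's positions.
        return [tuple(image[r][c] for r, c in t) for t in self.tuples]

    def train(self, image, label):
        for i, combo in enumerate(self._features(image)):
            self.memory[label][i].add(combo)

    def classify(self, image):
        feats = self._features(image)
        # Score of a class = number of tuples whose combination was seen in training.
        scores = {
            label: sum(combo in seen[i] for i, combo in enumerate(feats))
            for label, seen in self.memory.items()
        }
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical 5x5 glyphs with shade levels 0 (background) to 3 (full ink).
    glyph_i = [[0, 1, 3, 1, 0] for _ in range(5)]
    glyph_l = [[3, 1, 0, 0, 0] for _ in range(4)] + [[3, 3, 3, 2, 1]]

    clf = NTupleClassifier(height=5, width=5)
    clf.train(glyph_i, "I")
    clf.train(glyph_l, "L")
    print(clf.classify(glyph_i), clf.classify(glyph_l))
```

Because each tuple records the full combination of shade values rather than a single on/off bit, the intermediate levels produced by anti-aliased, low-resolution text contribute to the match, which is the general idea behind using color information to offset low sampling resolution.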
Keywords
OPTICAL CHARACTER RECOGNITION
ALGORITHMS
OBJECT RECOGNITION
INTERNET
Cited by 7 articles