Abstract
An approach based on edge-pixel clustering is proposed for extracting Chinese and English text areas from an image. The image is segmented into pixel subclasses based on the colors and positions of its edge pixels. Initial text areas are then extracted according to the features of edges in text areas, and their boundaries are expanded to cover the entire text areas. Furthermore, a text area binarization algorithm is presented that improves the efficiency of post-processing by reducing the number of binary images when the text color polarity is unknown. The experimental results show that the proposed approach is effective, with an integrality of up to 99%.
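The abstract does not give the details of the binarization algorithm, so the following is only a minimal illustrative sketch of the general idea of handling unknown text color polarity with a single binary image. It assumes a grayscale text area given as a list of rows, a global mean threshold, and a border-based polarity guess (text areas are assumed to be bordered mostly by background). The function name `binarize_text_area` and all of these choices are hypothetical and are not taken from the paper.

```python
def binarize_text_area(gray):
    """Return a 0/1 image where 1 marks pixels guessed to be text.

    Assumptions (not from the paper): `gray` is a non-empty 2D list of
    intensities in 0..255, the threshold is the global mean, and the
    border of the area consists mostly of background pixels.
    """
    height, width = len(gray), len(gray[0])

    # Global mean intensity used as a simple threshold.
    pixels = [v for row in gray for v in row]
    threshold = sum(pixels) / len(pixels)

    # Mean intensity of the border pixels approximates the background.
    border = [
        gray[y][x]
        for y in range(height)
        for x in range(width)
        if y in (0, height - 1) or x in (0, width - 1)
    ]
    background = sum(border) / len(border)

    # A bright background implies dark text, and vice versa, so only one
    # binary image is produced instead of one per possible polarity.
    dark_text = background > threshold
    return [
        [int(v < threshold) if dark_text else int(v > threshold) for v in row]
        for row in gray
    ]


dark_on_light = [
    [200, 200, 200, 200],
    [200, 20, 20, 200],
    [200, 200, 200, 200],
]
light_on_dark = [[255 - v for v in row] for row in dark_on_light]

mask_a = binarize_text_area(dark_on_light)
mask_b = binarize_text_area(light_on_dark)
```

In this sketch both inputs yield the same text mask, which illustrates why resolving the polarity up front avoids passing two candidate binary images to later stages such as OCR.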
Original language | English |
---|---|
Pages (from-to) | 729-734 |
Number of pages | 6 |
Journal | Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics |
Volume | 18 |
Issue number | 5 |
Publication status | Published - May 2006 |
Keywords
- Clustering
- Image binary
- Image retrieval
- Optical character recognition (OCR)
- Text area extraction