Word Segmentation in Handwritten Korean Text Lines based on GAP Clustering

Jeong, Seon-Hwa;Kim, Soo-Hyung;

한국정보과학회논문지:소프트웨어및응용 (Journal of KIISE:Software and Applications)

Vol. 27, No. 6 / pp. 660-667 / 2000 / 1229-6848 (pISSN)

한국정보과학회 (Korean Institute of Information Scientists and Engineers)

Original title (Korean): GAP 군집화에 기반한 필기 한글 단어 분리

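Seon-Hwa Jeong (Department of Computer Science, Chonnam National University);
Soo-Hyung Kim (Research Institute of Information and Communication, Chonnam National University)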

Published: 2000.06.15

Abstract

This paper proposes a word segmentation method for handwritten Korean text line images. The method separates words using gap size, where a gap is a white run found in the vertical projection of a line image. After the sizes of all gaps in a line image are measured, each gap is classified as either an inter-word gap or an inter-character gap. We adopt three distance measures previously proposed for word segmentation of handwritten English text lines and apply three clustering-based classification techniques, in order to select the best combination of gap metric and classifier for Korean text lines. Experiments were run on 305 text line images extracted manually from address lines written on live mail envelopes. The best performance came from the BB (bounding box) distance combined with sequential clustering, with a cumulative word segmentation success rate of 88.52% up to the third-ranked hypothesis. The processing time is about 0.05 seconds per line image.
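The gap extraction and classification steps described in the abstract can be illustrated with a minimal sketch. It assumes a binarized line image given as rows of 0 (background) and 1 (ink). The function names `find_gaps` and `split_gaps` are hypothetical, the gap size used here is the plain white-run width rather than the BB distance, and the classifier is a simple one-dimensional two-means split standing in for the paper's clustering techniques, whose details the abstract does not give.

```python
from typing import List, Tuple


def find_gaps(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start_column, width) for each interior white run
    in the vertical projection of a binary line image (1 = ink)."""
    width = len(image[0])
    # Vertical projection: number of ink pixels in each column.
    projection = [sum(row[x] for row in image) for x in range(width)]
    gaps = []
    x = 0
    while x < width:
        if projection[x] == 0:
            start = x
            while x < width and projection[x] == 0:
                x += 1
            # Ignore the left and right margins of the line.
            if start > 0 and x < width:
                gaps.append((start, x - start))
        else:
            x += 1
    return gaps


def split_gaps(widths: List[int], iterations: int = 20) -> List[bool]:
    """Label each gap width: True = inter-word, False = inter-character.
    Assumption: a 1-D two-means split, not the paper's classifier."""
    if not widths:
        return []
    lo, hi = float(min(widths)), float(max(widths))
    if lo == hi:
        # All gaps are the same size, so no word boundary is proposed.
        return [False] * len(widths)
    for _ in range(iterations):
        threshold = (lo + hi) / 2
        small = [w for w in widths if w <= threshold]
        large = [w for w in widths if w > threshold]
        lo = sum(small) / len(small)
        hi = sum(large) / len(large)
    threshold = (lo + hi) / 2
    return [w > threshold for w in widths]


# Toy line image: two rows, 16 columns.
rows = ["1100101000001101",
        "0100100000001001"]
image = [[int(c) for c in row] for row in rows]
gaps = find_gaps(image)
labels = split_gaps([w for _, w in gaps])
print(gaps)    # [(2, 2), (5, 1), (7, 5), (14, 1)]
print(labels)  # [False, False, True, False]
```

In the toy image only the five-column gap is labeled as an inter-word gap. A ranked list of segmentation hypotheses, as evaluated in the paper, would need more than this single hard split.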
