• Title/Summary/Keyword: special symbol detection (특수기호 검출)

Search Result 6, Processing Time 0.039 seconds

Decomposition of a Text Block into Words Using Projection Profiles, Gaps and Special Symbols (투영 프로파일, GaP 및 특수 기호를 이용한 텍스트 영역의 어절 단위 분할)

  • Jeong Chang Bu;Kim Soo Hyung
    • Journal of KIISE:Software and Applications / v.31 no.9 / pp.1121-1130 / 2004
  • This paper proposes a method for line and word segmentation of machine-printed text blocks. To separate a text region into lines, it analyzes the horizontal projection profile and applies a recursive projection profile cut. For word segmentation, gaps in each text line are found by connected component analysis, and the between-word gaps are then identified by hierarchical clustering. In addition, a special symbol detection technique uses morphological features to find two types of special symbols lying between words. An experiment with 84 text regions from English and Korean documents shows that the proposed method achieves 99.92% word segmentation accuracy, while the commercial OCR software Armi 6.0 Pro$^{TM}$ achieves 97.58%.
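The line segmentation step in this abstract can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows in which 1 marks an ink pixel, and it performs a single, non-recursive cut at empty rows rather than the recursive projection profile cut the paper describes. The names `horizontal_profile`, `split_lines`, and `min_ink` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def horizontal_profile(image: List[List[int]]) -> List[int]:
    """Count the ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def split_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    A row belongs to a line when its profile value is at least min_ink
    (an assumed threshold, not a value taken from the paper).
    """
    profile = horizontal_profile(image)
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # the line ended on the previous row
            start = None
    if start is not None:                  # line runs to the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy image: two text lines separated by two blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
]
print(split_lines(page))
```

A recursive variant would apply the same cut again inside each returned range, for example with a stricter threshold, until no further split is found.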

A System for the Decomposition of Text Block into Words (텍스트 영역에 대한 단어 단위 분할 시스템)

  • Jeong, Chang-Boo;Kwag, Hee-Kue;Jeong, Seon-Hwa;Kim, Soo-Hyung
    • Proceedings of the Korea Information Processing Society Conference / 2000.10a / pp.293-296 / 2000
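  • This paper proposes a word segmentation system intended for a document image retrieval and indexing system based on keyword recognition. The proposed system takes as input the text regions extracted by image preprocessing and document structure analysis, and it segments them into words with a hierarchical approach: each text region is first split into text lines, and each text line is then split into words. Text line segmentation finds the cut points by applying a horizontal projection profile. Word segmentation extracts the connected components, computes the gaps between them, and applies a gap clustering technique to find the word boundaries. Special symbols, which degrade word segmentation performance, are detected using heuristic information. In an evaluation on 50 text regions, the proposed system achieved 99.83% accuracy.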
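The gap clustering idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: connected components are reduced to horizontal extents `(x_left, x_right)`, and the gap widths are split into two clusters at the largest jump between sorted values, which is what single-linkage clustering into two groups gives in one dimension. The function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int]  # (x_left, x_right) of one connected component


def gaps_between(boxes: List[Box]) -> List[int]:
    """Horizontal gap widths between neighbouring components, left to right."""
    ordered = sorted(boxes)
    return [max(0, b[0] - a[1]) for a, b in zip(ordered, ordered[1:])]


def word_gap_threshold(gaps: List[int]) -> float:
    """Split gap widths into two clusters at the largest jump in sorted order.

    Returns the midpoint of that jump; gaps above it count as between-word gaps.
    """
    values = sorted(set(gaps))
    if len(values) < 2:
        return float("inf")  # one gap size only: treat the line as one word
    _, low, high = max((b - a, a, b) for a, b in zip(values, values[1:]))
    return (low + high) / 2


def group_words(boxes: List[Box]) -> List[List[Box]]:
    """Group the components of one text line into words."""
    if not boxes:
        return []
    ordered = sorted(boxes)
    gaps = gaps_between(ordered)
    threshold = word_gap_threshold(gaps)
    words = [[ordered[0]]]
    for box, gap in zip(ordered[1:], gaps):
        if gap > threshold:
            words.append([box])    # wide gap: start a new word
        else:
            words[-1].append(box)  # narrow gap: same word
    return words


# Five components; the third gap is much wider than the others.
line = [(0, 4), (6, 10), (12, 16), (26, 30), (32, 36)]
print(group_words(line))
```

Special symbols are a problem for this kind of clustering because their spacing does not follow the usual character or word gap pattern, which is why the abstract handles them with a separate heuristic detection step.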


Word Extraction from Table Regions in Document Images (문서 영상 내 테이블 영역에서의 단어 추출)

  • Jeong, Chang-Bu;Kim, Soo-Hyung
    • The KIPS Transactions:PartB / v.12B no.4 s.100 / pp.369-378 / 2005
  • A document image is segmented and classified into text, picture, or table regions by document layout analysis, and the words in table regions are significant for keyword spotting because they are more meaningful than the words in other regions. This paper proposes a method to extract words from table regions in document images. Since word extraction from a table region amounts in practice to extracting words from the cell regions that compose the table, the cells must be extracted correctly. In the cell extraction module, the table frame is extracted first by analyzing connected components, and the intersection points are then extracted from the table frame. False intersections are corrected using the correlation between neighboring intersections, and the cells are extracted from the intersection information. Text regions in the individual cells are located using the connected component information obtained during cell extraction, and they are segmented into text lines using projection profiles. Finally, the segmented lines are divided into words using gap clustering and special symbol detection. The experiment, performed on table images extracted from Korean documents, shows $99.16\%$ word extraction accuracy.
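As a rough illustration of extracting cells from intersection information, the sketch below assumes the table frame has already been reduced to a set of `(x, y)` intersection points on a regular grid. It reports a cell wherever all four corners between adjacent grid coordinates are present. This is a simplification and not the paper's algorithm: it does not correct false intersections and it ignores merged cells. The name `cells_from_intersections` is hypothetical.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]
Cell = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def cells_from_intersections(points: Set[Point]) -> List[Cell]:
    """List the rectangles whose four corners are all known intersections."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    cells: List[Cell] = []
    for y0, y1 in zip(ys, ys[1:]):          # adjacent grid rows
        for x0, x1 in zip(xs, xs[1:]):      # adjacent grid columns
            corners = {(x0, y0), (x1, y0), (x0, y1), (x1, y1)}
            if corners <= points:           # every corner was detected
                cells.append((x0, y0, x1, y1))
    return cells


# A 2 x 2 table: three vertical and three horizontal frame lines.
grid = {(x, y) for x in (0, 50, 100) for y in (0, 20, 40)}
print(cells_from_intersections(grid))
```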

Word Image Decomposition from Image Regions in Document Images using Statistical Analyses (문서 영상의 그림 영역에서 통계적 분석을 이용한 단어 영상 추출)

  • Jeong, Chang-Bu;Kim, Soo-Hyung
    • The KIPS Transactions:PartB / v.13B no.6 s.109 / pp.591-600 / 2006
  • This paper describes the development and implementation of an algorithm that uses statistical analyses to extract word images from image regions containing mixed text and graphics in document images. To extract word images from such regions, the character components must be separated from the graphic components. For this step, we propose a separation method based on box-plot analysis of structural component statistics. The accuracy of this method is not sensitive to variation between images because the separation criterion is defined by the statistics of the components themselves. The character regions are then determined by analyzing the local crowdedness of the separated character components. Finally, we divide the character regions into text lines and word images using projection profile analysis, gap clustering, special symbol detection, and related techniques. The proposed system reduces the influence of image variation because its criterion is based on the statistics of the image regions. We also tested the proposed method in a document image processing system for keyword spotting and demonstrated the need for studying the proposed method.
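A box-plot criterion of this kind can be sketched as below. The abstract does not say which component statistic is used or how the fences are set, so this example assumes component heights as the feature and the conventional 1.5 × IQR fences. Both choices, and the name `split_by_boxplot`, are assumptions made for illustration.

```python
import statistics
from typing import List, Tuple


def split_by_boxplot(heights: List[float], k: float = 1.5) -> Tuple[List[float], List[float]]:
    """Split component heights into (characters, graphics) using box-plot fences.

    Values inside [Q1 - k*IQR, Q3 + k*IQR] are treated as character
    components; values outside are treated as graphic components.
    """
    q1, _, q3 = statistics.quantiles(heights, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    characters = [h for h in heights if low <= h <= high]
    graphics = [h for h in heights if h < low or h > high]
    return characters, graphics


# Six character-sized components and one tall graphic component.
component_heights = [10, 11, 12, 12, 13, 14, 90]
chars, graphics = split_by_boxplot(component_heights)
print(chars, graphics)
```

Because the fences are computed from each image's own component statistics, no fixed size threshold has to be tuned per image, which matches the robustness argument made in the abstract.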

Effect of collection time on the chemical composition and levels of thiobarbituric acid reactive substance of Godulbaegi (Youngia sonchifolia M.) (채취시기에 따른 고들빼기의 성분 조성과 산화방지활성)

  • Hwang, Tae Yean;Huh, Chang Ki
    • Food Science and Preservation / v.24 no.6 / pp.786-794 / 2017
  • This study analyzes the chemical composition and thiobarbituric acid reactive substance levels of Godulbaegi (Youngia sonchifolia M.) depending on collection time. The moisture and crude fat content in leaf and root decreased, while crude fiber, crude protein, carbohydrate, and ash increased, as collection time increased. The mineral elements tended to increase in each sample as collection time increased. The content of vitamin B also increased as collection time increased. Vitamin C content was approximately five times higher in the leaves than in the roots. Total amino acids in leaf and root increased considerably as collection time increased. The content of phenolic compounds was higher in the root than in the leaf, and these contents increased. Antioxidant activity of Godulbaegi was higher in the root than in the leaf and increased as collection time increased.

Physicochemical Characteristics of Sikhye (Korean Traditional Rice Beverage) with Specialty Rice Varieties (특수미 품종에 따른 식혜의 이화학적 특성)

  • Kim, Kee-Jong;Woo, Koan-Sik;Lee, Jin-Seok;Chun, A-Reum;Choi, Yoon-Hee;Song, Jin;Suh, Sae-Jung;Kim, Sun-Lim;Jeong, Heon-Sang
    • Journal of the Korean Society of Food Science and Nutrition / v.37 no.11 / pp.1523-1528 / 2008
  • This study compared the physicochemical characteristics and sensory quality of Sikhye (a Korean traditional rice beverage) prepared with specialty rice varieties. The results showed that Ilpum had higher hulling recovery, milled/brown rice ratio, and milling recovery than Sulgaeng, Baegjinju, Baegjinju 1, and Dongjinchal. The alkali digestive value, protein content, and amylose content of Sulgaeng were 6.3, 7.3%, and 19.3%, respectively. The highest brix degree was $10.00^{\circ}Bx$ in Baegjinju Sikhye. The turbidity values were 0.4440, 0.4100, 0.3828, 0.3372, and 0.1414 for Ilpum, Baegjinju, Baegjinju 1, Sulgaeng, and Dongjinchal Sikhye, respectively. There were no significant differences in pH and maltose content among the groups. The highest L-value was 44.62 in Ilpum Sikhye. The a-value and b-value ranges were $-1.66{\sim}-0.70$ and $-9.18{\sim}-5.19$, respectively. Finally, the sensory evaluation indicated no significant differences in appearance, aroma, and taste between the groups, and the Sulgaeng Sikhye showed higher overall quality than the Dongjinchal Sikhye used as the control.