Search | Korea Science

Automatic Word-Spacing of Syllable Bi-gram Information for Korean OCR Postprocessing (음절 Bi-gram정보를 이용한 한국어 OCR 후처리용 자동 띄어쓰기)

Jeon, Nam-Youl;Park, Hyuk-Ro
- Proceedings of the Korean Society for Cognitive Science Conference
- /
- 2000.06a
- /
- pp.95-100
- /
- 2000
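The output of a character recognizer applied to scanned document images is passed through morphological analysis and eojeol (word-phrase) analysis so that large volumes of document information can be stored in a database and made available for full text retrieval. However, data containing misrecognized characters or incorrect word spacing cannot be used directly for morphological or eojeol analysis. For Hangul character recognition, the character-level recognition rate is about 90.5%, but the eojeol-level recognition rate, which accounts for both character recognition errors and word-spacing errors, is markedly lower. To address this, we perform automatic word spacing that takes the syllable characteristics of Korean into account and relies on a well-trained corpus and syllable-level bi-gram information rather than a dictionary. In experiments, the word-spacing accuracy was about 86.2%, although it varies with the size of the training corpus and the location information of the spacing errors. Applying character recognition post-processing such as morphological analysis and language evaluation to this result should substantially improve the recognition rate of character recognition systems.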
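The abstract does not give the scoring formula, so the following is only a minimal sketch of dictionary-free, syllable bi-gram word spacing. It assumes a simple rule: insert a space between two adjacent syllables when the training corpus separated that pair more often than a threshold. The names `train` and `respace`, the threshold of 0.5, and the two-sentence toy corpus are illustrative assumptions, not the authors' method.

```python
from collections import Counter

def train(sentences):
    """Count, for each adjacent syllable pair, how often a space fell between them."""
    spaced = Counter()  # pair -> times a space separated the two syllables
    total = Counter()   # pair -> times the pair was seen at all
    for sentence in sentences:
        chars = []  # syllables with spaces removed
        gaps = []   # gaps[i] is True when a space followed chars[i]
        for word in sentence.split():
            for j, ch in enumerate(word):
                chars.append(ch)
                gaps.append(j == len(word) - 1)
        for i in range(len(chars) - 1):
            pair = (chars[i], chars[i + 1])
            total[pair] += 1
            if gaps[i]:
                spaced[pair] += 1
    return spaced, total

def respace(text, model, threshold=0.5):
    """Drop existing spaces, then re-insert them from bi-gram statistics."""
    spaced, total = model
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return ""
    out = [chars[0]]
    for prev, cur in zip(chars, chars[1:]):
        seen = total[(prev, cur)]
        # Unseen pairs get no space; a real system would need smoothing here.
        if seen and spaced[(prev, cur)] / seen > threshold:
            out.append(" ")
        out.append(cur)
    return "".join(out)

# Toy corpus (hypothetical); real training would use a large, correctly spaced corpus.
model = train(["나는 학교에 간다", "나는 집에 간다"])
result = respace("나는학교에간다", model)
```

With this toy corpus, `result` is `"나는 학교에 간다"`. Because existing spaces are discarded before re-spacing, the same function also corrects text whose spaces are in the wrong places.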

Facial Features Detection for Facial Caricaturing System (캐리커처 생성 시스템을 위한 얼굴 특징 추출 연구)

Lee, Ok-Kyoung;Park, Yeun-Chool;Oh, Hae-Seok
- Proceedings of the Korea Information Processing Society Conference
- /
- 2000.10b
- /
- pp.1329-1332
- /
- 2000
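A caricature generation system segments an input portrait photograph to extract facial features (ears, eyes, mouth, and nose), then uses the extracted feature information to retrieve and map caricature images with similar features. Facial feature extraction in the system uses color and shape information. The aim of this paper is to segment portrait photographs for caricature generation and extract feature information for each sub-region. Vertical and horizontal histograms are the main tools used to obtain the facial features. In addition, positional information in the photograph is used to identify and extract the features within the face, so that accurate information is available.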
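The role of the vertical and horizontal histograms can be illustrated with projection profiles over a binary mask. This is a sketch under assumptions: the abstract does not describe the segmentation or thresholds, so the mask below is a hand-made toy and the helper names `projection_histograms` and `bands` are hypothetical.

```python
def projection_histograms(mask):
    """mask is a list of rows; 1 marks a dark (feature) pixel, 0 marks background."""
    horizontal = [sum(row) for row in mask]       # one count per row
    vertical = [sum(col) for col in zip(*mask)]   # one count per column
    return horizontal, vertical

def bands(profile, min_count=1):
    """Return (start, end) index ranges, end exclusive, where profile >= min_count."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i
        elif value < min_count and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

# Toy 6x8 mask: two "eyes" on row 1 and a "mouth" on row 4.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
horizontal, vertical = projection_histograms(mask)
feature_rows = bands(horizontal)   # row ranges that contain a feature
eye_columns = bands(mask[1])       # column ranges of the two eyes within the eye row
```

Peaks in the row profile locate candidate feature bands, and the column profile inside each band separates left from right features; positional knowledge (eyes above the mouth) then labels the bands.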

Automatic Word-Spacing of Syllable Bi-gram Information for Korean OCR Postprocessing (음절 Bi-gram정보를 이용한 한국어 OCR 후처리용 자동 띄어쓰기)

Jeon, Nam-Youl;Park, Hyuk-Ro
- Annual Conference on Human and Language Technology
- /
- 2000.10d
- /
- pp.95-100
- /
- 2000
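The output of a character recognizer applied to scanned document images is passed through morphological analysis and eojeol (word-phrase) analysis so that large volumes of document information can be stored in a database and made available for full text retrieval. However, data containing misrecognized characters or incorrect word spacing cannot be used directly for morphological or eojeol analysis. For Hangul character recognition, the character-level recognition rate is about 90.5%, but the eojeol-level recognition rate, which accounts for both character recognition errors and word-spacing errors, is markedly lower. To address this, we perform automatic word spacing that takes the syllable characteristics of Korean into account and relies on a well-trained corpus and syllable-level bigram information rather than a dictionary. In experiments, the word-spacing accuracy was about 86.2%, although it varies with the size of the training corpus and the location information of the spacing errors. Applying character recognition post-processing such as morphological analysis and language evaluation to this result should substantially improve the recognition rate of character recognition systems.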

Data Model Design and Pilot Development for Viewing of Compensation Evidence Data in Public Service (공익사업 보상증빙자료뷰어 개발을 위한 데이터 모델 설계 및 파일럿시스템 개발)

Seo, Myoung-Bae
- Proceedings of the Korea Information Processing Society Conference
- /
- 2011.11a
- /
- pp.1510-1511
- /
- 2011
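Government bodies and their affiliated agencies, public corporations, and specialized compensation institutions that handle compensation work have developed and operated their own compensation systems since the mid-2000s. These systems, however, keep electronic records only for projects currently in progress, so records of completed compensation cases are still managed manually. As a result, staff spend a great deal of time searching archive rooms for voluminous records when responding to complaints about past compensation, and the loss or damage of records has led to lost lawsuits against complainants, wasting public funds. To minimize the harm caused by lost or damaged records and to respond to complaints efficiently, we selected the key ledgers that must be preserved, extracted key ledger information to design a data model for an evidence document viewer that retrieves images by combining them with metadata, and developed a pilot system to validate the model.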
https://doi.org/10.3745/PKIPS.y2011m11a.1510

A Design and Implementation of a Query Interpreter for SQL/MM Part5 (SQL/MM Part5를 지원하는 쿼리변환기의 설계 및 구현)

Kang Gi-Jun;Lee Bu-Kwon;Seo Yeong-Geon
- Journal of Digital Contents Society
- /
- v.6 no.2
- /
- pp.107-112
- /
- 2005
Research on representing and processing multimedia data in databases is needed, because advances in internet technology have increased the importance and use of such data. An RDBMS provides only a storage structure for multimedia and offers insufficient support for multimedia data types, representation, and queries. To address this problem, ISO/IEC standardized SQL Multimedia (SQL/MM) for multimedia data. However, while ORDBMSs support SQL/MM, RDBMSs do not. Therefore, this thesis proposes a query interpreter that supports SQL/MM in MS-SQL 2000, an RDBMS, and introduces an image retrieval application that uses it. The query interpreter converts SQL/MM into SQL and additionally checks for duplicate images. An image processing application built on the query interpreter can easily be integrated and operated with traditional RDBMS-based systems.

Implementation of A Clipping-based Conversion Server for Building Wireless Internet Sites (Clipping 기반의 무선 인터넷 사이트 구축용 변환 서버 구현)

Cho, Seung-Ho;Cha, Jeong-Hoon
- The KIPS Transactions:PartA
- /
- v.11A no.2
- /
- pp.165-174
- /
- 2004
Because far less content is available for the wireless internet than for the wired internet, there is a strong need to convert wired internet content into wireless internet content. The conversion server implemented in this paper automatically recognizes the type of user agent making a request, retrieves source documents from the web site specified by a URL, generates metaXML documents as an intermediate form, and converts them into wireless markup documents appropriate for the user agent. The conversion server interoperates with an image converter for image conversion and with the clipper, an authoring tool for clipping existing wired internet documents. We performed experiments on the conversion server's ability to transcode static and dynamic web pages specified by a URL. In the performance results for dynamic web pages, the conversion server showed better throughput when its thread pool maintained 5 threads than with 1 or 10 threads.
https://doi.org/10.3745/KIPSTA.2004.11A.2.165 KSCI

Generating Pairwise Comparison Set for Crowed Sourcing based Deep Learning (크라우드 소싱 기반 딥러닝 선호 학습을 위한 쌍체 비교 셋 생성)

Yoo, Kihyun;Lee, Donggi;Lee, Chang Woo;Nam, Kwang Woo
- Journal of Korea Society of Industrial Information Systems
- /
- v.27 no.5
- /
- pp.1-11
- /
- 2022
With the development of deep learning technology, various research and development efforts are underway to estimate preference rankings through learning, with applications in fields such as web search, gene classification, recommendation systems, and image search. Deep learning-based preference ranking is estimated with approximation algorithms, which build more than k comparison sets for every comparison target to ensure adequate accuracy; how the comparison sets are built affects learning. In this paper, we propose a k-disjoint comparison set generation algorithm and a k-chain comparison set generation algorithm, novel algorithms for generating paired comparison sets for crowdsourcing-based deep learning affinity measurements. In particular, the experiments confirmed that the k-chain algorithm, like the conventional circular generation algorithm, has a random nature that supports stable preference evaluation while ensuring connectivity between data.
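The abstract does not define the k-disjoint, k-chain, or circular algorithms, so the sketch below is not any of them as published. It shows one generic circular scheme, offered only to make the two stated requirements concrete: every target takes part in at least k comparisons, and the comparison graph stays connected. The function name `circular_pairs` and the offset rule are assumptions.

```python
from collections import Counter

def circular_pairs(items, k):
    """Pair each item with the k items that follow it on a ring (hypothetical scheme)."""
    n = len(items)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(items)")
    seen = set()   # unordered index pairs already emitted
    pairs = []
    for offset in range(1, k + 1):
        for i in range(n):
            j = (i + offset) % n
            key = (min(i, j), max(i, j))
            if key not in seen:   # skip the mirror pair when the ring wraps around
                seen.add(key)
                pairs.append((items[i], items[j]))
    return pairs

images = ["img0", "img1", "img2", "img3", "img4", "img5"]
pairs = circular_pairs(images, k=2)

# How many comparisons each image takes part in.
appearances = Counter(name for pair in pairs for name in pair)
```

The offset-1 pairs alone form a ring through every item, which is what keeps the comparison graph connected; shuffling `items` before the call would add the randomness the abstract mentions.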
https://doi.org/10.9723/jksiis.2022.27.5.001 KSCI

Design and Implementation of a Browser for Educational PDA Contents (교육용 PDA 컨텐츠 브라우저의 설계 및 구현)

신재룡
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.6 no.8
- /
- pp.1223-1233
- /
- 2002
Recently, various electronic books (E-Books) based on the PDA (personal digital assistant), which can easily be used anytime and anywhere, have been developed. An E-Book has far less volume and weight than a traditional book. For that reason, it is easy to carry and can present content through diverse functions such as searching, bookmarks, a dictionary, and playback of color images, sound, or video. Because of these advantages, many E-Book products have emerged in the market. However, products for educational content are scarce, because they require not only the usual functions but also additional ones such as problem solving. It is therefore necessary to develop a browser and an editor for educational content. In this paper, we express educational content in XML and define the document structure with XML Schema. We then design and implement an editor and a browser that can manage educational content on a PDA.
KSCI

Face Annotation System for Social Network Environments (소셜 네트웍 환경에서의 얼굴 주석 시스템)

Chai, Kwon-Taeg;Byun, Hye-Ran
- Journal of KIISE:Computing Practices and Letters
- /
- v.15 no.8
- /
- pp.601-605
- /
- 2009
Recently, Social Network Sites (SNSs) based on photo sharing and publishing have been attracting increasing attention from academic and industry researchers. Millions of users have integrated these sites into their daily practices to communicate with people online. In this paper, we propose an efficient face annotation and retrieval system for SNSs. Since the system needs to deal with a huge database consisting of an increasing number of users and images, both effectiveness and efficiency are required. To deal with this problem, we propose a face annotation classifier that adopts an online learning and social decomposition approach. The proposed method is shown to have comparable accuracy to, and better efficiency than, the widely used Support Vector Machine. Consequently, the proposed framework can reduce the user's tedious effort in annotating face images and provide a fast response to millions of users.
KSCI

The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce (온라인 쇼핑몰에서 상품 설명 이미지 내의 키워드 인식을 위한 딥러닝 훈련 데이터 자동 생성 방안)

Kim, Kitae;Oh, Wonseok;Lim, Geunwon;Cha, Eunwoo;Shin, Minyoung;Kim, Jongwoo
- Journal of Intelligence and Information Systems
- /
- v.24 no.1
- /
- pp.1-23
- /
- 2018
Since the start of the 21st century, a variety of high-quality services have emerged with the growth of the internet and information and communication technologies. In particular, the e-commerce industry, led by companies such as Amazon and eBay, is growing explosively. As e-commerce grows, more products are registered at online shopping malls, so customers can easily compare products and buy what they want. This growth has also created a problem: with so many products registered, customers find it difficult to locate what they really need. A search with a general keyword returns too many products, whereas a search with specific product details returns few, because concrete product attributes are rarely registered. Automatically recognizing text in images can be a solution. Most product details are written in catalogs in image format, so most product information cannot be found through text input in current text-based search systems. If the information in images is converted to text, customers can search by product details, which makes shopping more convenient. Various existing OCR (Optical Character Recognition) programs can recognize text in images, but they are hard to apply to catalogs because they struggle in certain circumstances, such as when the text is not large enough or the fonts are inconsistent. Therefore, this research proposes a way to recognize keywords in catalogs with deep learning, which has been the state of the art in image recognition since the 2010s.
The Single Shot Multibox Detector (SSD), a model with well-regarded object detection performance, can be used with its structure redesigned to account for the differences between text and objects. However, because deep learning algorithms of this kind are trained by supervised learning, the SSD model needs a large amount of labeled training data. One way to collect data is to label the location and class of the text in catalogs manually, but manual collection causes many problems. Some keywords would be missed because humans make mistakes while labeling. Collecting training data at the required scale is too time-consuming, or costly if many workers are hired to shorten the time. Furthermore, if specific keywords need to be trained, finding images that contain those words is also difficult. To solve the data issue, this research developed a program that creates training data automatically. The program generates catalog-like images containing various keywords and pictures and saves the location information of the keywords at the same time. With this program, data can be collected efficiently and the performance of the SSD model also improves. The SSD model recorded a recognition rate of 81.99% with 20,000 samples created by the program. Moreover, this research tested the SSD model on data with different characteristics to analyze which features of the data influence text recognition performance. The results show that the number of labeled keywords, the addition of overlapping keyword labels, the presence of unlabeled keywords, the spacing between keywords, and differences in background images are all related to the performance of the SSD model. These findings can guide performance improvements for the SSD model or other deep learning-based text recognizers through high-quality data.
The SSD model redesigned to recognize text in images and the program developed for creating training data are expected to contribute to improved search systems in e-commerce. Suppliers can spend less time registering keywords for products, and customers can search for products using the product details written in the catalog.
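The idea of generating training data together with its labels can be sketched as follows. This is an assumption-laden illustration, not the program described in the abstract: it only chooses non-overlapping positions for keywords on a blank canvas and records their bounding boxes, and it leaves out the actual rendering of text and background pictures. The function names, the fixed character width, and the canvas size are all hypothetical.

```python
import random

def overlaps(a, b):
    """True when two (x1, y1, x2, y2) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def make_sample(keywords, width=300, height=300, char_w=12, line_h=20,
                seed=None, max_tries=100):
    """Place each keyword at a random free position and return its label record."""
    rng = random.Random(seed)
    labels = []
    for word in keywords:
        w, h = char_w * len(word), line_h   # assumed fixed-width glyphs
        if w > width or h > height:
            continue                        # keyword cannot fit on the canvas
        for _ in range(max_tries):
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            box = (x, y, x + w, y + h)
            if not any(overlaps(box, other["box"]) for other in labels):
                labels.append({"keyword": word, "box": box})
                break
    return labels

labels = make_sample(["cotton", "waterproof", "XL"], seed=0)
```

Each record pairs a keyword (the class) with its box (the location), which is the kind of supervision a detector such as SSD is trained on. Because the generator controls placement, factors the abstract identifies as relevant, such as spacing between keywords or overlapping labels, can be varied deliberately.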
https://doi.org/10.13088/jiis.2018.24.1.001 KSCI
