• Title/Summary/Keyword: 문자집합 (character set)


Development of polygon object set matching algorithm between heterogeneous digital maps - using the genetic algorithm based on the shape similarities (형상 유사도 기반의 유전 알고리즘을 활용한 이종 수치지도 간의 면 객체 집합 정합 알고리즘 개발)

  • Huh, Yong;Lee, Jeabin
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.31 no.1
    • /
    • pp.1-9
    • /
    • 2013
  • This paper proposes a matching algorithm that finds corresponding polygon feature sets between heterogeneous digital maps. The algorithm finds corresponding sets by optimizing their shape similarity, on the assumption that feature sets describing the same real-world entities are represented by similar shapes. A binary code indicates whether or not each polygon feature is chosen as a member of a corresponding set, and these codes are concatenated into a binary string that serves as a candidate solution to the matching problem. Starting from initial candidate solutions, a genetic algorithm iteratively optimizes the candidates until a termination condition is met, and the solution with the highest similarity is returned. The proposed method was applied to topographical and cadastral maps of an urban region in Suwon, Korea, to find corresponding polygon feature sets for block areas, and the results show its feasibility. Assessed against manual detection results, the method achieved an overall accuracy of 0.946.
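
The binary-string encoding and iterative optimization described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `genetic_search`, the one-point crossover, the elitist selection, the fixed generation count used as the termination condition, and the toy fitness function are all assumptions made for illustration. In the paper the fitness would be a shape similarity between polygon sets.

```python
import random

def genetic_search(n_bits, fitness, pop_size=30, generations=60,
                   mutation_rate=0.05, seed=0):
    """Search for a high-fitness binary string (bit i = 1 means feature i is selected)."""
    rng = random.Random(seed)
    # Initial candidate solutions: random binary strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):  # assumed termination condition: fixed number of generations
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # keep the better half unchanged (elitism)
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with a small probability (mutation).
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Hypothetical stand-in for shape similarity: agreement with a hidden target selection.
target = [1, 0, 1, 1, 0, 0, 1, 0]

def toy_similarity(bits):
    return sum(b == t for b, t in zip(bits, target)) / len(target)

best = genetic_search(len(target), toy_similarity)
print(best, toy_similarity(best))
```

Because the better half of each generation is carried over unchanged, the best similarity in the population never decreases from one generation to the next.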

A Word Dictionary Structure for the Postprocessing of Hangul Recognition (한글인식 후처리용 단어사전의 기억구조)

  • ;Yoshinao Aoki
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.19 no.9
    • /
    • pp.1702-1709
    • /
    • 1994
  • In the postprocessing stage of a Hangul recognition system, the storage structure for contextual information strongly affects the recognition rate and speed of the entire system. A trie is generally used to represent the context as a word dictionary, but its memory space efficiency is low. We therefore propose a new word dictionary structure that has better space efficiency while retaining the merits of a trie. Because Hangul characters are composed of phonemes, words can be represented either by phonemes or by characters. In the representation by phonemes (P-mode), retrieval is fast but space efficiency is low. In the representation by characters (C-mode), space efficiency is high but retrieval is slow. In this paper the two representations are combined to form a hybrid representation (H-mode). First, an optimal level for the combination is selected using two characteristic curves, node utilization and dispersion. The input words are then represented as a trie in P-mode from the first level to the optimal level, and the remainder is represented as a sequentially linked list in C-mode. Experimental results for six word sets show that the proposed structure is more efficient: retrieval in H-mode is as fast as in P-mode, and its space efficiency is as good as that of C-mode.

A Sliding Window-based Multivariate Stream Data Classification (슬라이딩 윈도우 기반 다변량 스트림 데이타 분류 기법)

  • Seo, Sung-Bo;Kang, Jae-Woo;Nam, Kwang-Woo;Ryu, Keun-Ho
    • Journal of KIISE:Databases
    • /
    • v.33 no.2
    • /
    • pp.163-174
    • /
    • 2006
  • In a distributed wireless sensor network, it is difficult to transmit and analyze the entire data stream because of limited network, power, and processor resources. It is therefore appropriate to classify the continuous stream data first and then apply alternative stream data processing. We propose a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes a sliding window of multivariate stream data as input and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a standard text classification algorithm to classify the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms. For supervised classification we tested a Bayesian classifier and SVM, and for unsupervised classification we tested Jaccard, TFIDF, Jaro, and Jaro-Winkler. In our experiments, SVM and TFIDF outperformed the other classification methods. In particular, classification accuracy improved when the correlation of attributes was considered along with the n-gram tokens of symbols.
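
The preprocessing step in this abstract (window, then symbol string, then n-gram tokens) can be sketched as follows. The abstract does not give the actual discretization rule, so the three-symbol alphabet (`U` for up, `D` for down, `S` for steady), the threshold `eps`, and all function names are assumptions. The sketch handles a single attribute; a multivariate stream would produce one symbol string per attribute.

```python
def sliding_windows(stream, size, step=1):
    """Yield consecutive windows of `size` samples, advancing by `step`."""
    for start in range(0, len(stream) - size + 1, step):
        yield stream[start:start + size]

def discretize(window, eps=0.5):
    """Map each change between neighbouring samples to a symbol (assumed scheme)."""
    symbols = []
    for prev, cur in zip(window, window[1:]):
        delta = cur - prev
        if delta > eps:
            symbols.append("U")   # signal went up
        elif delta < -eps:
            symbols.append("D")   # signal went down
        else:
            symbols.append("S")   # signal stayed roughly steady
    return "".join(symbols)

def ngrams(text, n=2):
    """Split a symbol string into overlapping n-gram tokens for a text classifier."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

stream = [1.0, 2.0, 2.1, 0.5, 0.4, 3.0]
strings = [discretize(w) for w in sliding_windows(stream, size=4, step=2)]
print(strings)             # ['USD', 'DSU']
print(ngrams(strings[0]))  # ['US', 'SD']
```

The resulting token lists are what a standard text classifier would then consume.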

A String Analysis based System for Classifying Android Apps Accessing Harmful Sites (유해 사이트를 접속하는 안드로이드 앱을 문자열 분석으로 검사하는 시스템)

  • Choi, Kwang-Hoon;Ko, Kwang-Man;Park, Hee-Wan;Youn, Jong-Hee
    • The KIPS Transactions:PartA
    • /
    • v.19A no.4
    • /
    • pp.187-194
    • /
    • 2012
  • This paper proposes a string-analysis-based system for classifying Android apps that may access so-called harmful sites, and reports experimental results for real Android apps on the market. The system first transforms the Android app binary code into Java bytecode, then performs string analysis to compute the set of strings at all program points, and classifies the app as harmful if the computed set contains URLs of sites classified as harmful because they provide inappropriate content. With the proposed approach, this classification is performed at the distribution stage, before the apps are installed and executed. The system is also suitable for the automatic management of Android apps in the market. It can be combined with existing methods that use DNS servers or monitoring modules so that harmful Android apps are identified better at different stages.

Handwritten Image Segmentation by the Modified Area-based Region Selection Technique (변형된 면적기반영역선별 기법에 의한 문자영상분할)

  • Hwang Jae-Ho
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.43 no.5 s.311
    • /
    • pp.30-36
    • /
    • 2006
  • This paper proposes a new method of handwritten image segmentation based on a relative comparison of region areas. The original image consists of two distinct regions, information and background. In contrast to this binary original, the observed image is gray scale and contains complex regions with speckles and noise caused by degradation or contamination. Applying a threshold or a statistical approach to such an image causes a region-deformation problem during binarization. In the first step, an efficient iterated conditional mode (ICM) using a lozenge-shaped block forms regions in the binary image. In the second step, the information region is estimated by a selection procedure and restored to its original state. At each current pixel, both the decision on which region the pixel belongs to and the calculation of that region's area are carried out iteratively. All region areas are sorted into a set and selected using a decision parameter obtained statistically. Experiments show that the approach is effective on ink-rubbed copy images (拓本, 'Takbon') and efficient at shape restoration. Experiments on gray-scale images show promising shape extraction results compared with threshold segmentation and the conventional ICM method.

Hangul Font Editor based on Multiple Master Glyph Algorithm (다중 마스터 글리프 알고리즘을 적용한 한글 글꼴 에디터)

  • Lim, Soon-Bum;Kim, Hyun-Young;Chung, Hwaju;Park, Ki-Deok;Choi, Kyong-Sun
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.11
    • /
    • pp.699-705
    • /
    • 2015
  • Generating a Hangul font requires thousands of glyphs, all of which must be prepared before the font can be produced. This paper presents the Multiple Master Glyph Algorithm, a process that automatically generates a target number of glyphs from a very small number of glyphs by using a combination rule setting and a glyph interpolation method. A font editor that can generate Hangul glyphs and fonts was developed based on this algorithm. The editor automatically generates a target number of fundamental glyphs from a combination rule setting and four master glyphs, which the user can set up. The automatically generated glyphs can then be combined automatically into a target font covering the 2,350 Hangul characters of the KS X 1001 standard or the 11,172 Hangul characters of the Unicode standard. The efficiency of the proposed Hangul editor is analyzed quantitatively through application to several commercial typefaces.
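
The figure of 11,172 Unicode Hangul syllables follows from the way precomposed syllables are laid out in Unicode: 19 leading consonants × 21 vowels × 28 trailing positions (including "no trailing consonant"), starting at U+AC00. The sketch below shows only this code-point arithmetic, not the paper's glyph combination rules; the function name `compose` is a hypothetical helper.

```python
LEADS, VOWELS, TAILS = 19, 21, 28   # jamo counts used by the Unicode Hangul syllable block
BASE = 0xAC00                       # code point of the first syllable

def compose(lead, vowel, tail=0):
    """Return the precomposed syllable for zero-based jamo indices (tail 0 = none)."""
    if not (0 <= lead < LEADS and 0 <= vowel < VOWELS and 0 <= tail < TAILS):
        raise ValueError("jamo index out of range")
    return chr(BASE + (lead * VOWELS + vowel) * TAILS + tail)

total = LEADS * VOWELS * TAILS
print(total)                 # 11172
print(compose(0, 0))         # first syllable of the block
print(compose(18, 0, 4))     # lead index 18, vowel index 0, tail index 4
print(compose(18, 20, 27))   # last syllable of the block
```

A font editor that builds syllable glyphs from component glyphs would iterate over the same three index ranges.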

An Implementation Method of the Character Recognizer for the Sorting Rate Improvement of an Automatic Postal Envelope Sorting Machine (우편물 자동구분기의 구분율 향상을 위한 문자인식기의 구현 방법)

  • Lim, Kil-Taek;Jeong, Seon-Hwa;Jang, Seung-Ick;Kim, Ho-Yon
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.12 no.4
    • /
    • pp.15-24
    • /
    • 2007
  • The recognition of postal address images is indispensable for the automatic sorting of postal envelopes. Address image recognition consists of three steps: address image preprocessing, character recognition, and address interpretation. The character images extracted in the preprocessing step are forwarded to the character recognition step, in which multiple candidate characters with reliability scores are obtained for each extracted character image. Using those scored character candidates, the address interpretation step produces the final valid address for the input envelope image. The envelope sorting rate depends on the performance of all three steps, among which character recognition is particularly important. A good character recognizer is one that produces valid candidates with reliable scores, which eases the address interpretation step. In this paper, we propose a method of generating character candidates with reliable recognition scores. We use the existing MLP (multilayer perceptron) neural network of the address recognition system in current automatic postal envelope sorters as the classifier for each image from the preprocessing step. The MLP is well known as one of the best classifiers in terms of processing speed and recognition rate. False alarms, however, may occur in its recognition results, which makes address interpretation hard. To ease address interpretation and improve the envelope sorting rate, we propose methods to re-estimate the recognition score (confidence) of the existing MLP classifier: generating statistical recognition properties of the classifier, and combining the MLP with a subspace classifier that acts as a re-estimator of the confidence. To confirm the superiority of the proposed method, we used character images of real postal envelopes from sorters in the post office. The experimental results show that the proposed method achieves high reliability in terms of error and rejection for individual characters and non-characters.

Eojeol-Block Bidirectional Algorithm for Automatic Word Spacing of Hangul Sentences (한글 문장의 자동 띄어쓰기를 위한 어절 블록 양방향 알고리즘)

  • Kang, Seung-Shik
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.4
    • /
    • pp.441-447
    • /
    • 2000
  • Automatic word spacing is needed to solve the automatic indexing problem of non-spaced documents and the space-insertion problem at line ends in character recognition systems. We propose a word spacing algorithm that automatically finds word spacing positions. It is based on recognizing Eojeol components using sentence partitioning and a bidirectional longest-match algorithm. Sentence partitioning extracts Eojeol-blocks, where the Eojeol boundary is relatively clear, and a Korean morphological analyzer is applied bidirectionally to recognize Eojeol components. We tested the algorithm on two sentence groups of about 4,500 Eojeols. The space-level recall ratio was 97.3% and the Eojeol-level recall ratio was 93.2%.

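
The bidirectional longest-match idea in this abstract can be sketched in a simplified form. The paper applies a Korean morphological analyzer to Eojeol-blocks; the sketch below swaps in a plain lexicon lookup on a whole sentence, so it only illustrates the matching direction, not the authors' method. The lexicon, the function names, and the tie-breaking rule (prefer the segmentation with fewer pieces, forward on a tie) are all assumptions.

```python
def forward_longest_match(text, lexicon):
    """Segment left to right, taking the longest lexicon entry at each position."""
    words, i = [], 0
    max_len = max(map(len, lexicon))
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:   # fall back to a single character
                words.append(piece)
                i += size
                break
    return words

def backward_longest_match(text, lexicon):
    """Segment right to left, taking the longest lexicon entry ending at each position."""
    words, j = [], len(text)
    max_len = max(map(len, lexicon))
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    return words[::-1]

def space_words(text, lexicon):
    """Insert spaces using whichever direction yields fewer pieces (assumed rule)."""
    fwd = forward_longest_match(text, lexicon)
    bwd = backward_longest_match(text, lexicon)
    best = bwd if len(bwd) < len(fwd) else fwd
    return " ".join(best)

lexicon = {"나는", "학교에", "간다"}      # toy lexicon, hypothetical
print(space_words("나는학교에간다", lexicon))  # 나는 학교에 간다
```

Running both directions matters when they disagree: an ambiguous boundary that the forward pass splits greedily may be resolved differently from the right.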

A Study on Children's Picture Book as a Communication Medium (커뮤니케이션 매체로서 어린이 그림책에 대한 연구)

  • 박경희
    • Archives of design research
    • /
    • v.14 no.1
    • /
    • pp.7-16
    • /
    • 2001
  • Human beings have led their lives, including a desirable social life, through communication. Human communication has changed and expanded through language, letters, print media, broadcast media, and, more recently, network communication media. Since the invention of characters, books have been the communication medium with the longest history, and they have preserved and handed down the human spiritual world. Children's picture books are likewise a communication medium: they involve a sender of information, a message, a medium, and a receiver, and together these form a communication process. The senders (writer, illustrator, and editor) analyze children, who are a special kind of receiver, encode messages, compose the content, and achieve communication by selecting media and delivering them effectively to children. Considering the developmental characteristics and desires of children as receivers, visual media such as writing and illustrations are the most appropriate for communicating with them. First, picture books serve communication between grown-ups and children, and also communication with society, helping children find their identity and perform their roles. Second, through the messages of writing and illustrations, young children acquire the communication abilities of their age; that is, they experience and learn visual communication and written communication.

A Study on the Hangul Character Code System for KS X 1001 Information Interchange considering AMI/HDB-3 Line Encoding and HDLC Flag (AMI/HDB-3 회선부호화 및 HDLC FLAG를 고려한 KS X 1001 정보교환용 한글낱자 부호체계 개선연구)

  • Woo, Je-Teak;Hong, Wan-Pyo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.10 no.1
    • /
    • pp.65-72
    • /
    • 2015
  • AMI/HDB-3, a line-encoding method that uses a scrambling technique, is used primarily for long-distance data transmission. For the Hangul character code defined in the information interchange code standard (KS X 1001; confirmed 2014), this paper presents new Hangul consonant and vowel code tables that take into account HDLC flag bit or character stuffing at the data link layer and AMI/HDB-3 scrambling at the physical layer, in order to increase data transmission efficiency. A comparison of the existing code set with the proposed one, using ($4{\times}4$)-bit source coding rules and usage-frequency statistics for Hangul consonants and vowels, showed that data processing efficiency improved by about 22.01%.
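
To see why a code table can affect transmission efficiency at the data link layer, it helps to recall standard HDLC bit stuffing: the flag is `01111110`, so the sender inserts a `0` after every run of five consecutive `1` bits in the payload. Codes with fewer long runs of ones therefore incur fewer stuffed bits. The sketch below shows only this generic mechanism, not the paper's code tables or its HDB-3 analysis; the function names are hypothetical.

```python
FLAG = "01111110"   # HDLC frame delimiter

def bit_stuff(bits):
    """Insert a 0 after every five consecutive 1 bits so the flag never appears in data."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")   # stuffed bit
            run = 0
    return "".join(out)

def stuffing_overhead(bits):
    """Number of extra bits added by stuffing."""
    return len(bit_stuff(bits)) - len(bits)

print(bit_stuff("01111110"))            # 011111010
print(stuffing_overhead("1111111111"))  # 2
print(stuffing_overhead("1010101010"))  # 0
```

Comparing `stuffing_overhead` across candidate bit patterns gives a rough feel for how assigning frequent characters to patterns without long runs of ones reduces the transmitted length.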