• Title/Summary/Keyword: 문자집합 (character set)

Search Result 87, Processing Time 0.025 seconds

Molecular Computing Simulation of Cognitive Anagram Solving (애너그램 문제 인지적 해결과정의 분자컴퓨팅 시뮬레이션)

  • Chun, Hyo-Sun;Lee, Ji-Hoon;Ryu, Je-Hwan;Baek, Christina;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices
    • /
    • v.20 no.12
    • /
    • pp.700-705
    • /
    • 2014
  • An anagram is a form of word play to find a new word from a set of given alphabet letters. Good human anagram solvers use the strategy of bigrams. They explore a constraint satisfaction network in parallel and answers consequently pop out quickly. In this paper, we propose a molecular computational algorithm using the same process as this. We encoded letters into DNA sequences and made bigrams and then words by connecting the letter sequences. From letters and bigrams, we performed DNA hybridization, ligation, gel electrophoresis and finally, extraction and separation to extract bigrams. From the matched bigrams and words, we performed the four molecular operations again to distinguish between right and wrong results. Experimental results show that our molecular computer can identify correct answers and incorrect answers. Our work shows a new possibility for modeling the cognitive and parallel thinking process of a human.

Automatic Inter-Phoneme Similarity Calculation Method Using PAM Matrix Model (PAM 행렬 모델을 이용한 음소 간 유사도 자동 계산 기법)

  • Kim, Sung-Hwan;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.3
    • /
    • pp.34-43
    • /
    • 2012
  • Determining the similarity between two strings can be applied to various areas such as information retrieval, spell checking and spam filtering. Similarity calculation between Korean strings based on dynamic programming methods first requires a definition of the similarity between phonemes. However, existing methods are limited in that they use manually set similarity scores. In this paper, we propose a method to automatically calculate inter-phoneme similarity from a given set of variant words using a PAM-like probabilistic model. Our proposed method first finds the pairs of similar words in a given word set, and extracts derivation rules from text alignment results among the similar word pairs. Then, similarity scores are calculated from the frequencies of variations between different phonemes. In experiments, we show an improvement of 10.1%~14.1% and 8.1%~11.8% in sensitivity compared with the simple match-mismatch scoring scheme and the manually set inter-phoneme similarity scheme, respectively, at a specificity of 77.2%~80.4%.

A String Reconstruction Algorithm and Its Application to Exponentiation Problems (문자열 재구성 알고리즘 및 멱승문제 응용)

  • Sim, Jeong-Seop;Lee, Mun-Kyu;Kim, Dong-Kyue
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.9_10
    • /
    • pp.476-484
    • /
    • 2008
  • Most string problems and their solutions are relevant to diverse applications such as pattern matching, data compression and, more recently, bioinformatics. However, there have been few works on the relations between string problems and cryptographic problems. In this paper, we consider the following string reconstruction problems and show how these problems can be applied to cryptography. Given a string $x$ of length $n$ over a constant-sized alphabet $\Sigma$ and a set $W$ of strings of lengths at most an integer $k\ (\leq n)$, the first problem is to find the sequence of strings in $W$ that reconstructs $x$ by the minimum number of concatenations. We propose an $O(kn+L)$-time algorithm for this problem, where $L$ is the sum of the lengths of all strings in the given set, using suffix trees and a shortest path algorithm for directed acyclic graphs. The other is a dynamic version of the first problem, for which we propose an $O(k^3n+L)$-time algorithm. Finally, we show that exponentiation problems that arise in cryptography can be successfully reduced to these problems and propose a new solution for exponentiation.
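The first problem stated in this abstract can be illustrated with a small sketch. This is not the authors' suffix-tree and DAG shortest-path algorithm; it is a plain dynamic program over prefix positions, and the function name `min_concatenation` and the sample inputs are assumptions made only for illustration.

```python
from typing import List, Optional, Set


def min_concatenation(x: str, words: Set[str]) -> Optional[List[str]]:
    """Return a shortest sequence of strings from `words` that concatenates to x.

    Simple table DP (about n*k substring lookups); a stand-in for the
    suffix-tree based method, not a reimplementation of it.
    Returns None when x cannot be built from `words`.
    """
    n = len(x)
    k = max((len(w) for w in words), default=0)  # longest usable piece
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: fewest pieces needed to build x[:i]
    prev = [-1] * (n + 1)   # prev[i]: start index of the last piece of x[:i]
    best[0] = 0
    for i in range(1, n + 1):
        for j in range(max(0, i - k), i):
            if best[j] + 1 < best[i] and x[j:i] in words:
                best[i] = best[j] + 1
                prev[i] = j
    if best[n] == inf:
        return None
    # Walk the prev pointers backwards to recover the pieces.
    pieces: List[str] = []
    i = n
    while i > 0:
        pieces.append(x[prev[i]:i])
        i = prev[i]
    return pieces[::-1]


if __name__ == "__main__":
    print(min_concatenation("abcab", {"a", "b", "c", "ab", "abc"}))
```

A result with m pieces corresponds to m - 1 concatenations, which is the quantity the abstract minimizes.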

A Novel Fuzzy Neural Network and Learning Algorithm for Invariant Handwritten Character Recognition (변형에 무관한 필기체 문자 인식을 위한 퍼지 신경망과 학습 알고리즘)

  • Yu, Jeong-Su
    • Journal of The Korean Association of Information Education
    • /
    • v.1 no.1
    • /
    • pp.28-37
    • /
    • 1997
  • This paper presents a new neural network based on fuzzy sets and its application to invariant character recognition. The fuzzy neural network consists of five layers. Simulation results show that the network can recognize handwritten characters under distortion, translation, rotation and changes in size, and even with noise (8~30%). Invariance to translation, distortion, size and noise is achieved by layer L2, and rotation invariance by layer L5. The network recognizes the 108 training examples with a 100% recognition rate when they are shifted in eight directions by 1 pixel and 2 pixels. The network also recognizes all the distorted characters with a 100% recognition rate. The simulations show that test patterns are recognized correctly over a ${\pm}20^{\circ}$ range of rotation. The proposed network also correctly recalls all the learned characters with a 100% recognition rate. The proposed network is simple, and its learning and recall are very fast. It also works for the segmentation and recognition of handwritten characters.

Consideration of Roman Character in KS X 1001 Code System for Information Interchange considered AMI/HDB-3 and HDLC FLAG (AMI/HDB-3 회선부호화 및 HDLC FLAG를 고려한 KS X 1001 정보 교환용 로마문자 부호체계고찰)

  • Hong, Wan-Pyo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.7
    • /
    • pp.1017-1023
    • /
    • 2013
  • Data communication systems transmit source codes, which are encoded in information devices such as computers, over the transmission line as a line-coded signal. The AMI method is used as the line coding method for long-distance transmission. Its disadvantage is the loss of bit synchronization when four or more consecutive binary '0' bits enter the line coder. Scrambling is used to overcome this problem. The HDB-3 scrambling method, standardized by ITU-T, is adopted in the Korean standard. When HDB-3 is used, each run of four or more consecutive '0' bits must be converted to a specific bit pattern. As a result, when the source codes contain many such '0' bit streams, data transmission efficiency decreases because of the extra processing in the line coder. This paper studies the Roman character code system in KS X 1001, the Korean standard code for information interchange in data communication systems. Based on the results, the paper proposes an optimized Roman character code system. The simulation considered the character coding rule for $4 \times 4$ bits and statistical data on the frequency of use of Roman characters. The results show that, when the proposed Roman character coding system is applied, data transmission efficiency can be increased to about 134% compared with the existing code system.
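To make the zero-run condition in this abstract concrete, the sketch below counts how many four-zero blocks a bit string contains, which is the number of substitutions an HDB-3 scrambler would have to perform on it. The helper names `count_zero_blocks` and `code_to_bits` and the sample 16-bit value are assumptions for illustration; the sketch does not reproduce the paper's coding rule or its HDLC flag analysis.

```python
def count_zero_blocks(bits: str) -> int:
    """Count non-overlapping blocks of four consecutive '0' bits.

    HDB-3 replaces every such block with a substitution pattern, so this
    count approximates the scrambling work a bit stream causes.
    """
    count = 0
    run = 0  # length of the current run of zeros since the last block
    for b in bits:
        if b == "0":
            run += 1
            if run == 4:
                count += 1
                run = 0  # a substitution is made; start counting afresh
        else:
            run = 0
    return count


def code_to_bits(code: int, width: int = 16) -> str:
    """Render an integer code as a fixed-width binary string."""
    return format(code, f"0{width}b")


if __name__ == "__main__":
    sample = 0x2341  # arbitrary 16-bit value used only as an example
    bits = code_to_bits(sample)
    print(bits, count_zero_blocks(bits))
```

A code table whose frequently used characters give a count of zero would, under this simplified view, need no scrambling for those characters.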

Handwritten Korean Word Recognition for Address Recognition (주소 인식 시스템을 위한 필기 한글 단어 인식)

  • 권진욱;이관용;변혜란;이일병
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1997.11a
    • /
    • pp.201-204
    • /
    • 1997
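  • Recently, research has been under way on automatically recognizing addresses so that tasks such as mail sorting can be carried out efficiently. Existing studies perform recognition at the level of individual characters and then produce the final result through a simple dictionary-style DB. However, the performance of recognizers for handwritten characters with a complex structure, such as Hangul, is still inadequate. It is therefore difficult to obtain satisfactory results with the current approach, which depends on the performance of the individual-character recognizer. In this paper, we propose a system that recognizes words by using the connection information between the characters of words appearing in addresses, without relying heavily on the character recognition results. The system consists of a part that recognizes individual characters with a statistical recognizer and a part that combines the character recognition results and produces the final result through word-level recognition. The statistical recognizer was implemented in a simple form using the nearest neighbor method. The word recognition module builds a lexical information network using an HMM, so that the relationships among all characters in a word can be represented, and uses it to recognize words appearing in addresses. In experiments with the PE92 Hangul character data, good results were obtained because the HMM-based lexical information network compensated for the poor performance of the statistical recognizer. This word recognition method is expected to be easily applicable to word sets other than addresses.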

Consideration of CJK Joint Hanja Unicode when is used in AMI/HDB-3 Line Coding (AMI/HDB-3 회선부호화와 한·중·일 한자 유니코드 체계 고찰)

  • Tai, Dong-Zhen;Hong, Wan Pyo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.7
    • /
    • pp.1011-1015
    • /
    • 2013
  • This paper analyzes the rate at which CJK joint Chinese character Unicode violates the source coding rule. The 150 Chinese characters with a relatively high frequency of use in Unicode were chosen for study. These 150 characters account for about 50% of the total frequency of use of Chinese characters. AMI/HDB-3 line coding/scrambling and the HDLC protocol were applied in the study. According to the analysis, 77 of the 150 characters violated the rule, accounting for a frequency of use of 29%. Therefore, when the codes of the 77 violating characters are replaced with character codes that conform to the source coding rule, the processing rate of the line coder can be improved by about 37%.

Implementation of a Spam Message Filtering System using Sentence Similarity Measurements (문장유사도 측정 기법을 통한 스팸 필터링 시스템 구현)

  • Ou, SooBin;Lee, Jongwoo
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.1
    • /
    • pp.57-64
    • /
    • 2017
  • Short message service (SMS) is one of the most important communication methods for people who use mobile phones. However, illegal advertising spam messages exploit people because they can be sent without the need for friend registration. Recently, spam message filtering systems that use machine learning have been developed, but they have disadvantages such as requiring many calculations. In this paper, we implemented a spam message filtering system using the set-based POI search algorithm and sentence similarity without servers. This algorithm can judge whether the input query is a spam message using only letter composition, without any server computing. Therefore, we can filter a spam message even when the input text has been intentionally modified. We added a specific preprocessing option which aims to enable spam filtering. Based on the experimental results, we observe that our spam message filtering system performs better than the original set-based POI search algorithm. We evaluate the proposed system through extensive simulation. According to the simulation results, the proposed system achieves high accuracy on text messages that are not filtered by the 3 major telecom companies.

Word Vectorization Method Based on Bag of Characters (Bag of Characters를 응용한 단어의 벡터 표현 생성 방법)

  • Lee, Chanhee;Lee, Seolhwa;Lim, Heuiseok
    • Proceedings of The KACE
    • /
    • 2017.08a
    • /
    • pp.47-49
    • /
    • 2017
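  • When neural-network-based natural language processing systems convert words into vectors, there are broadly two approaches: using an index and a lookup table, or using a convolutional neural network or a recurrent neural network. The former requires that the vocabulary the system can accept be defined in advance, and it is difficult to add new words to the vocabulary. The latter generates the vector representation from the characters that make up the word, so no vocabulary is needed, but it requires an additional neural network structure, which increases model complexity and the number of parameters. In this study, to overcome the limitations of both approaches, we propose a method that applies Bag of Characters to generate a vector representation from the set of characters that make up a word. Because the proposed method operates on characters, no vocabulary needs to be defined, and because no neural network structure is used, it does not increase system complexity. In addition, because information about a word's constituent characters is reflected in its vector representation, performance on Out-Of-Vocabulary words is also expected to be better than that of vocabulary-based methods.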
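A minimal sketch of a bag-of-characters word vector follows, assuming a fixed character inventory and raw counts as vector entries. The abstract does not specify the exact representation, so the function name `bag_of_characters`, the count-based encoding, and the toy alphabet are illustrative assumptions rather than the authors' method.

```python
from collections import Counter
from typing import List, Sequence


def bag_of_characters(word: str, alphabet: Sequence[str]) -> List[int]:
    """Map a word to a count vector over a fixed character inventory.

    Character order inside the word is ignored, and characters outside
    `alphabet` are dropped (an assumption of this sketch).
    """
    counts = Counter(word)
    return [counts[ch] for ch in alphabet]


if __name__ == "__main__":
    alphabet = "abn"  # toy inventory; a real system would cover all characters
    print(bag_of_characters("banana", alphabet))
```

Because the vector depends only on the characters present, any word built from known characters receives a representation without a predefined vocabulary, which is the property the abstract relies on for Out-Of-Vocabulary words.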

Coding Rule of Characters by 2 bytes with 4×4 bits to Improve the Transmission Efficiency in Data Communications (데이터 전송 효율을 고려한 4비트행×4비트열 2 바이트 문자 부호화 규칙에 관한 연구)

  • Hong, Wan-Pyo
    • Journal of Advanced Navigation Technology
    • /
    • v.15 no.5
    • /
    • pp.749-756
    • /
    • 2011
  • This paper proposes a coding rule for the characters, symbols, and so on that are used in computers and other information devices. Using this coding rule may improve transmission efficiency in data communications by reducing the number of scrambling operations performed during line coding in the transmitter's coder. The paper considers two-byte (16-bit) codes arranged as 4-bit columns $\times$ 4-bit rows. In the paper, we applied the rule to the code system of Roman characters in KS X 1001.