• Title/Summary/Keyword: Finite Automata

Word Recognition Using K-L Dynamic Coefficients (K-L 동적 계수를 이용한 단어 인식)

  • 김주곤
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.06c
    • /
    • pp.103-106
    • /
    • 1998
  • This paper proposes a method of constructing phoneme models using K-L (Karhunen-Loève) coefficients as dynamic features to improve the recognition accuracy of a speech recognition system, and examines its effectiveness through phoneme, word, and connected-digit recognition experiments. The speech data for the experiments were 445 words recorded by the Electronics and Telecommunications Research Institute (ETRI) and four-connected digit strings recorded by the Korean Language Engineering Research Institute. To verify the effectiveness of the K-L dynamic features, mel-cepstra were extracted as static features, and K-L coefficients and regression coefficients as dynamic features, after which phoneme, word, and digit recognition experiments were performed. As the basic recognition unit, 48 phoneme-like units (PLUs) were used as phoneme models, and for word and digit recognition the OPDP (One Pass Dynamic Programming) method with syntactic control by finite state automata (FSA) was used. In phoneme recognition, the static mel-cepstra gave 39.8%, while the K-L dynamic coefficients gave 52.4%, a 12.6% improvement. In addition, mel-cepstra combined with regression coefficients gave 60.1%, and K-L coefficients combined with regression coefficients also achieved a high rate of 60.4%. Extending these experiments to word recognition, the conventional mel-cepstral coefficients gave 65.5%, while the K-L coefficients gave 75.8%, a 10.3% improvement; mel-cepstra combined with regression coefficients reached 91.2%, and K-L coefficients combined with regression coefficients 91.4%. Applied to four-connected digit strings, mel-cepstra gave 67.5% and K-L coefficients 75.3%, a 7.8% improvement, and the combination of K-L and regression coefficients also showed a comparatively high rate, confirming the effectiveness of the K-L coefficients for digit strings as well.

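The K-L dynamic features described above can be illustrated with a small sketch: a local Karhunen-Loève (principal-axis) transform over a sliding window of static feature frames. This is a minimal illustration with hypothetical function names, window size, and coefficient count, not the authors' exact formulation:

```python
import numpy as np

def kl_dynamic_coefficients(frames, window=5, n_coeffs=3):
    """Project each window of static feature frames onto its principal
    (Karhunen-Loeve) axes to obtain dynamic coefficients.

    frames: (T, D) array of static features (e.g. mel-cepstra).
    Returns an array of shape (T - window + 1, n_coeffs).
    """
    T, D = frames.shape
    out = []
    for t in range(T - window + 1):
        seg = frames[t:t + window]              # (window, D) local trajectory
        seg = seg - seg.mean(axis=0)            # center the trajectory
        # Eigendecomposition of the local covariance = K-L transform
        cov = seg.T @ seg / window
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        basis = eigvecs[:, ::-1][:, :n_coeffs]  # top axes by variance
        # Project the most recent frame onto the K-L axes
        out.append(seg[-1] @ basis)
    return np.array(out)
```

In a real recognizer these coefficients would be appended to the static mel-cepstral vector before HMM training.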

A Development of Intelligent Simulation Tools based on Multi-agent (멀티 에이전트 기반의 지능형 시뮬레이션 도구의 개발)

  • Woo, Chong-Woo;Kim, Dae-Ryung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.12 no.6
    • /
    • pp.21-30
    • /
    • 2007
  • Simulation means modeling the structures or behaviors of various objects and experimenting with them on a computer system. The major approaches include DEVS (Discrete Event System Specification), Petri nets, and automata. However, simulation problems are becoming more complex and complicated these days, so intelligent agent-based approaches are being studied. In this paper, we describe an intelligent agent-based simulation tool that supports simulation experiments more efficiently. The significance of our system is as follows. First, the system provides AI algorithms through its system libraries. Second, the system supports a simple method of designing simulation models, since it is built on a Finite State Machine (FSM) structure. Finally, the system acts as a simulation framework by providing the user not only the simulation engine but also user-friendly tools such as a modeler, a scriptor, and a simulator. The system mainly consists of the main simulation engine, utility tools, and some other assist tools; it was tested and showed efficient results on three different problems.

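The Finite State Machine structure that such a tool builds on can be sketched as a transition table keyed by (state, event). The traffic-light model below is a toy illustration with made-up names, not part of the described system:

```python
class FiniteStateMachine:
    """A minimal FSM: an initial state and a transition table
    mapping (state, event) -> next state."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions  # dict: (state, event) -> state

    def fire(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"no transition for {key}")
        self.state = self.transitions[key]
        return self.state

# A toy traffic-light model driven by a single "timer" event
light = FiniteStateMachine("red", {
    ("red", "timer"): "green",
    ("green", "timer"): "yellow",
    ("yellow", "timer"): "red",
})
```

A simulation model in this style is just a set of such tables, which is what makes FSM-based model design simple.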

Automatic Generation of Code Optimizer for DFA Pattern Matching (DFA 패턴 매칭을 위한 코드 최적화기의 자동적 생성)

  • Yun, Sung-Lim;Oh, Se-Man
    • The KIPS Transactions:PartA
    • /
    • v.14A no.1 s.105
    • /
    • pp.31-38
    • /
    • 2007
  • Code optimization converts a program into an equivalent but more efficient one, and this process is carried out by a code optimizer. This paper designs and implements a code optimizer generator that automatically produces code optimizers. In other words, a code optimizer is automatically generated for DFA pattern matching, which finds the optimal code for an incoming pattern description. Through a normalization process, DFA pattern matching removes the redundant comparisons that occur while patterns are searched for, and it simplifies and improves the structure of pattern shapes at low cost. Automatically generating a code optimizer for DFA pattern matching eliminates the extra effort of rebuilding the code optimizer every time the code undergoes a transformation, and it enables a formalism for code optimization. A further advantage of building a DFA for optimization is that matching is faster and the cost of the code optimizer generator is reduced.
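For a flavor of how a set of literal patterns compiles into a DFA, the sketch below builds a trie-shaped automaton and reports the longest pattern matching at a position. The function names are hypothetical, and this is far simpler than the tree-pattern matching a real code selector performs:

```python
def build_dfa(patterns):
    """Build a trie-shaped DFA over a set of literal patterns.
    States are integers; state 0 is the start; accepting states
    map to the pattern they recognize."""
    trans, accept = {}, {}
    next_state = 1
    for pat in patterns:
        state = 0
        for ch in pat:
            if (state, ch) not in trans:
                trans[(state, ch)] = next_state
                next_state += 1
            state = trans[(state, ch)]
        accept[state] = pat        # reaching this state matches `pat`
    return trans, accept

def match_at(trans, accept, text, pos):
    """Return the longest pattern matching `text` starting at `pos`, or None."""
    state, best = 0, None
    for ch in text[pos:]:
        if (state, ch) not in trans:
            break                  # dead state: no further match possible
        state = trans[(state, ch)]
        if state in accept:
            best = accept[state]   # remember the longest accept seen so far
    return best
```

Because all comparisons are folded into the shared trie, each input character is examined once per position, which is the redundancy-removal idea the abstract refers to.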

Judgment about the Usefulness of Automatically Extracted Temporal Information from News Articles for Event Detection and Tracking (사건 탐지 및 추적을 위해 신문기사에서 자동 추출된 시간정보의 유용성 판단)

  • Kim Pyung;Myaeng Sung-Hyon
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.6
    • /
    • pp.564-573
    • /
    • 2006
  • Temporal information plays an important role in natural language processing (NLP) applications such as information extraction, discourse analysis, automatic summarization, and question answering. In the topic detection and tracking (TDT) area, the temporal information most often used is the publication date of a message, which is readily available but limited in its usefulness. We developed a relatively simple NLP method for extracting temporal information from Korean news articles, with the goal of improving the performance of TDT tasks. To extract temporal information, we make use of finite state automata and a lexicon containing time-revealing vocabulary. Extracted information is converted into a canonicalized representation of a time point or a time duration. We first evaluated the extraction and canonicalization methods for accuracy, and then investigated the extent to which temporal information extracted this way can help TDT tasks. The experimental results show that time information extracted from text indeed helps improve both precision and recall significantly.
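The extraction idea (finite-state patterns plus a lexicon of time-revealing vocabulary, anchored on the publication date) can be sketched roughly as below. The lexicon entries, date format, and function name are illustrative assumptions; the paper's Korean patterns are considerably more elaborate:

```python
import re
from datetime import date, timedelta

# A tiny lexicon of time-revealing vocabulary mapped to day offsets
RELATIVE = {"yesterday": -1, "today": 0, "tomorrow": 1}
# A single absolute-date pattern (a compiled regex is a finite automaton)
ABSOLUTE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def canonicalize_times(text, publication_date):
    """Resolve relative and absolute time expressions in `text` to
    canonical dates, anchoring relative words on the publication date."""
    found = []
    for word, offset in RELATIVE.items():
        if word in text.lower():
            found.append(publication_date + timedelta(days=offset))
    for y, m, d in ABSOLUTE.findall(text):
        found.append(date(int(y), int(m), int(d)))
    return found
```

The canonical dates can then index events on a timeline instead of relying on the publication date alone.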

A Hybrid Multiple Pattern Matching Scheme to Reduce Packet Inspection Time (패킷검사시간을 단축하기 위한 혼합형 다중패턴매칭 기법)

  • Lee, Jae-Kook;Kim, Hyong-Shik
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.21 no.1
    • /
    • pp.27-37
    • /
    • 2011
  • IDS/IPSs (Intrusion Detection/Prevention Systems) have been widely deployed to protect internal networks against Internet attacks, and reducing packet inspection time is one of the most important challenges in improving their performance. Since an IDS/IPS needs to match multiple patterns against incoming traffic, it must apply a multiple pattern matching scheme; some such schemes use finite automata, while others use a shift table. In this paper, we first show that the performance of those schemes degrades on various kinds of pattern sets and payloads, and then propose a hybrid multiple pattern matching scheme that combines the two. The proposed scheme is organized to guarantee an appropriate level of performance in all cases. Experimental results on real traffic show that the time required for multiple pattern matching can be reduced effectively.
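The shift-table side of such a hybrid can be sketched with a Wu-Manber-style bad-character table built over the pattern set. This toy verifies candidate positions by direct comparison where a real IDS/IPS would hand them to an automaton; all names here are illustrative, not the paper's design:

```python
def build_shift_table(patterns):
    """Bad-character shift table over a pattern set: for each character,
    the safe skip distance, based on the shortest pattern length m."""
    m = min(len(p) for p in patterns)
    shift = {}
    for p in patterns:
        for i, ch in enumerate(p[:m]):
            # A char at position i of some m-prefix allows a skip of m-1-i
            shift[ch] = min(shift.get(ch, m), m - 1 - i)
    return shift, m

def hybrid_search(text, patterns):
    """Skip ahead with the shift table; verify candidates directly."""
    shift, m = build_shift_table(patterns)
    hits, i = [], 0
    while i + m <= len(text):
        s = shift.get(text[i + m - 1], m)  # unseen chars allow a full skip
        if s == 0:                          # candidate window: verify here
            for p in patterns:
                if text.startswith(p, i):
                    hits.append((i, p))
            i += 1
        else:
            i += s
    return hits
```

With long patterns and benign payloads the table skips most of the packet, which is where shift-table schemes beat automata; short patterns shrink the skips, which is where the automaton side of a hybrid takes over.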

Part-of-speech Tagging for Hindi Corpus in Poor Resource Scenario

  • Modi, Deepa;Nain, Neeta;Nehra, Maninder
    • Journal of Multimedia Information System
    • /
    • v.5 no.3
    • /
    • pp.147-154
    • /
    • 2018
  • Natural language processing (NLP) is an emerging research area that studies how machines can be used to perceive and manipulate text written in natural languages. We can perform different tasks on natural languages by analyzing them through various annotation tasks such as parsing, chunking, part-of-speech tagging, and lexical analysis. These annotation tasks depend on the morphological structure of a particular natural language. The focus of this work is part-of-speech (POS) tagging for Hindi. Part-of-speech tagging, also known as grammatical tagging, is the process of assigning a grammatical category to each word of a given text; these categories can be noun, verb, time, date, number, etc. Hindi is the most widely used and official language of India, and it is among the top five most spoken languages of the world. For English and other languages, a diverse range of POS taggers is available, but these cannot be applied to Hindi, since Hindi is one of the most morphologically rich languages and its morphological structure differs significantly from theirs. Thus, this work presents a POS tagger for Hindi using a hybrid approach that combines probability-based and rule-based methods. For tagging known words a unigram probability model is used, whereas for tagging unknown words various lexical and contextual features are used. Finite state automata are constructed to express the different rules, and regular expressions are then used to implement them. A tagset of 29 standard part-of-speech tags was also prepared for this task, including two unique tags, a date tag and a time tag, which support all possible formats. Regular expressions implement all pattern-based tags such as time, date, number, and special symbols. The aim of the presented approach is to increase the correctness of automatic Hindi POS tagging while bounding the need for a large human-made corpus: the probability-based model increases automatic tagging, and the rule-based model bounds the requirement for an already-trained corpus. The approach is based on a very small labeled training set (around 9,000 words) and yields a best precision of 96.54% and an average precision of 95.08%, with a best accuracy of 91.39% and an average accuracy of 88.15%.
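The hybrid scheme (a unigram lexicon for known words, regex-implemented automata rules as a fallback) can be sketched as below. The tag names, rule patterns, and default fallback are illustrative assumptions, not the paper's 29-tag Hindi tagset:

```python
import re
from collections import Counter, defaultdict

# Rule patterns for unknown/special tokens (hypothetical tag names);
# each compiled regex is a small finite automaton
RULES = [
    (re.compile(r"^\d{1,2}:\d{2}$"), "TIME"),
    (re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$"), "DATE"),
    (re.compile(r"^\d+$"), "NUM"),
]

class HybridTagger:
    """Unigram tagger for known words; rule-based fallback for the rest."""

    def __init__(self, tagged_corpus):
        counts = defaultdict(Counter)
        for word, tag in tagged_corpus:
            counts[word][tag] += 1
        # Most frequent tag per known word (the unigram model)
        self.lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def tag(self, word):
        if word in self.lexicon:
            return self.lexicon[word]
        for pattern, tag in RULES:      # unknown word: try the automata rules
            if pattern.match(word):
                return tag
        return "NOUN"                   # default open-class guess
```

Because the rules carry the unknown words, the unigram lexicon can stay small, which is how a 9,000-word training set remains workable.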