• Title/Summary/Keyword: morphological analyzer

Search Result 145, Processing Time 0.025 seconds

Development of Text Mining-Based Accounting Terminology Analyzer for Financial Information Utilization (재정정보 활용을 위한 텍스트 마이닝 기반 회계용어 형태소 분석기 구축)

  • Jung, Geon-Yong;Yoon, Seung-Sik;Kang, Ju-Young
    • The Journal of Information Systems
    • /
    • v.28 no.4
    • /
    • pp.155-174
    • /
    • 2019
  • Purpose Social interest in financial statement notes has recently increased. However, contrary to the keen interest in financial statement notes, there is no morphological analyzer for accounting terms, which is why researchers are having considerable difficulty in carrying out research. In this study, we build a morphological analyzer for accounting related text mining techniques. This morphological analyzer can handle accounting terms like financial statements and we expect it to serve as a springboard for growth in the text mining research field. Design/methodology/approach In this study, we build customized korean morphological analyzer to extract proper accounting terms. First, we collect Company's Financial Statement notes, financial information data published by KPFIS(Korea Public Finance Information Service), K-IFRS accounting terms data. Second, we cleaning and tokeninzing and removing stopwords. Third, we customize morphological analyzer using n-gram methodology. Findings Existing morphological analyzer cannot extract accounting terms because it split accounting terms to many nouns. In this study, the new customized morphological analyzer can detect more appropriate accounting terms comparing to the existing morphological analyzer. We found that accounting words that were not detected by existing morphological analyzers were detected in new customized morphological analyzers.

MADE: Morphological Analyzer Development Environment (MADE : 형태소 분석기 개발환경)

  • Shim, Kwang-Seob
    • Journal of Internet Computing and Services
    • /
    • v.8 no.4
    • /
    • pp.159-171
    • /
    • 2007
  • This paper proposes a software tool MADE that is useful to develop a practical Korean morphological analyzer. A morphological analysis is performed by using adjacency conditions provided by a morphological dictionary. This means that developing a morphological analyzer is reduced merely to constructing a morphological dictionary. No programming skill is required in this process, MADE provides with useful functions that facilitate the construction of a dictionary. Once a dictionary is constructed, the morphological analysis engine embedded in MADE may be used as a stand-alone morphological analyzer or be integrated into an application software which requires a Korean morphological analysis module.

  • PDF

High-Performance Korean Morphological Analyzer Using the MapReduce Framework on the GPU

  • Cho, Shi-Won;Lee, Dong-Wook
    • Journal of Electrical Engineering and Technology
    • /
    • v.6 no.4
    • /
    • pp.573-579
    • /
    • 2011
  • To meet the scalability and performance requirements of data analyses, which often involve voluminous data, efficient parallel or concurrent algorithms and frameworks are essential. We present a high-performance Korean morphological analyzer which employs the MapReduce framework on the graphics processing unit (GPU). MapReduce is a programming framework introduced by Google to aid the development of web search applications on a large number of central processing units (CPUs). GPUs are designed as a special-purpose co-processor. Their programming interfaces are typically formulated for graphics applications. Compared to CPUs, GPUs have greater computation power and memory bandwidth; however, GPUs are more difficult to program because of the design of their architectures. The performance of the Korean morphological analyzer using the MapReduce framework on the GPU is evaluated in comparison with the CPU-based model. The proposed Korean Morphological analyzer shows promising scalable performance on distributed computing with the GPU.

A Morphological Analyzer with Multi-Threads Method (다중 스레드 방식을 도입한 형태소 해석기)

  • 최유경;안동언;정성종;이신원;두길수;노영만;오형진;김금영;이동광
    • Proceedings of the IEEK Conference
    • /
    • 2001.06c
    • /
    • pp.181-184
    • /
    • 2001
  • In recent, a morphological analyzer be used for indexing system in information retrieval system. A morphological analyzer as a indexing system must have multiprocessing ability to deal with multiple users and documents. To meet the needs of these, we propose a morphological analyzer with multi-threads method. To use multi-threads method, we consider memory allocation problem, threads synchronization problem, code optimization and so on. In this paper, first, we report several manners for multi-threads. And next, to evaluate our proposed system, we make a comparison test between proposed system and existing system.

  • PDF

Functional Expansion of Morphological Analyzer Based on Longest Phrase Matching For Efficient Korean Parsing (효율적인 한국어 파싱을 위한 최장일치 기반의 형태소 분석기 기능 확장)

  • Lee, Hyeon-yoeng;Lee, Jong-seok;Kang, Byeong-do;Yang, Seung-weon
    • Journal of Digital Contents Society
    • /
    • v.17 no.3
    • /
    • pp.203-210
    • /
    • 2016
  • Korean is free of omission of sentence elements and modifying scope, so managing it on morphological analyzer is better than parser. In this paper, we propose functional expansion methods of the morphological analyzer to ease the burden of parsing. This method is a longest phrase matching method. When the series of several morpheme have one syntax category by processing of Unknown-words, Compound verbs, Compound nouns, Numbers and Symbols, our method combines them into a syntactic unit. And then, it is to treat by giving them a semantic features as syntax unit. The proposed morphological analysis method removes unnecessary morphological ambiguities and deceases results of morphological analysis, so improves accuracy of tagger and parser. By empirical results, we found that our method deceases 73.4% of Parsing tree and 52.4% of parsing time on average.

A Hybrid Approach for the Morpho-Lexical Disambiguation of Arabic

  • Bousmaha, Kheira Zineb;Rahmouni, Mustapha Kamel;Kouninef, Belkacem;Hadrich, Lamia Belguith
    • Journal of Information Processing Systems
    • /
    • v.12 no.3
    • /
    • pp.358-380
    • /
    • 2016
  • In order to considerably reduce the ambiguity rate, we propose in this article a disambiguation approach that is based on the selection of the right diacritics at different analysis levels. This hybrid approach combines a linguistic approach with a multi-criteria decision one and could be considered as an alternative choice to solve the morpho-lexical ambiguity problem regardless of the diacritics rate of the processed text. As to its evaluation, we tried the disambiguation on the online Alkhalil morphological analyzer (the proposed approach can be used on any morphological analyzer of the Arabic language) and obtained encouraging results with an F-measure of more than 80%.

Cloning of Korean Morphological Analyzers using Pre-analyzed Eojeol Dictionary and Syllable-based Probabilistic Model (기분석 어절 사전과 음절 단위의 확률 모델을 이용한 한국어 형태소 분석기 복제)

  • Shim, Kwangseob
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.3
    • /
    • pp.119-126
    • /
    • 2016
  • In this study, we verified the feasibility of a Korean morphological analyzer that uses a pre-analyzed Eojeol dictionary and syllable-based probabilistic model. For the verification, MACH and KLT2000, Korean morphological analyzers, were cloned with a pre-analyzed eojeol dictionary and syllable-based probabilistic model. The analysis results were compared between the cloned morphological analyzer, MACH, and KLT2000. The 10 million Eojeol Sejong corpus was segmented into 10 sets for cross-validation. The 10-fold cross-validated precision and recall for cloned MACH and KLT2000 were 97.16%, 98.31% and 96.80%, 99.03%, respectively. Analysis speed of a cloned MACH was 308,000 Eojeols per second, and the speed of a cloned KLT2000 was 436,000 Eojeols per second. The experimental results indicated that a Korean morphological analyzer that uses a pre-analyzed eojeol dictionary and syllable-based probabilistic model could be used in practical applications.

A Reranking Model for Korean Morphological Analysis Based on Sequence-to-Sequence Model (Sequence-to-Sequence 모델 기반으로 한 한국어 형태소 분석의 재순위화 모델)

  • Choi, Yong-Seok;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.4
    • /
    • pp.121-128
    • /
    • 2018
  • A Korean morphological analyzer adopts sequence-to-sequence (seq2seq) model, which can generate an output sequence of different length from an input. In general, a seq2seq based Korean morphological analyzer takes a syllable-unit based sequence as an input, and output a syllable-unit based sequence. Syllable-based morphological analysis has the advantage that unknown words can be easily handled, but has the disadvantages that morpheme-based information is ignored. In this paper, we propose a reranking model as a post-processor of seq2seq model that can improve the accuracy of morphological analysis. The seq2seq based morphological analyzer can generate K results by using a beam-search method. The reranking model exploits morpheme-unit embedding information as well as n-gram of morphemes in order to reorder K results. The experimental results show that the reranking model can improve 1.17% F1 score comparing with the original seq2seq model.

A Light Weighted Robust Korean Morphological Analyzer for Korean-to-English Mobile Translator (한영 모바일 번역기를 위한 강건하고 경량화된 한국어 형태소 분석기)

  • Yuh, Sang-Hwa
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.2
    • /
    • pp.191-199
    • /
    • 2009
  • In this paper we present a light weighted robust Korean morphological analyzer for mobile devices such as mobile phones, smart phones, and PDA phones. Such mobile devices are not suitable for natural language interfaces for their low CPU performance and memory restriction. In order to overcome the difficulties we propose 1) an online analysis by using Key Event Handler mechanism, 2) and a robust analysis of the Korean sentences with spacing errors without its correction pre-processing. We adapt the proposed Korean analyzer to a Korean-English mobile translator, which shows 5.8% memory usage reduction and 19.0% enhancement of average response time.

Syllable-based POS Tagging without Korean Morphological Analysis (형태소 분석기 사용을 배제한 음절 단위의 한국어 품사 태깅)

  • Shim, Kwang-Seob
    • Korean Journal of Cognitive Science
    • /
    • v.22 no.3
    • /
    • pp.327-345
    • /
    • 2011
  • In this paper, a new approach to Korean POS (Part-of-Speech) tagging is proposed. In previous works, a Korean POS tagger was regarded as a post-processor of a morphological analyzer, and as such a tagger was used to determine the most likely morpheme/POS sequence from morphological analysis. In the proposed approach, however, the POS tagger is supposed to generate the most likely morpheme and POS pair sequence directly from the given sentences. 398,632 eojeol POS-tagged corpus and 33,467 eojeol test data are used for training and evaluation, respectively. The proposed approach shows 96.31% of POS tagging accuracy.

  • PDF