• Title/Summary/Keyword: Statistical Language Model

TAKTAG: Two phase learning method for hybrid statistical/rule-based part-of-speech disambiguation (TAKTAG: 통계와 규칙에 기반한 2단계 학습을 통한 품사 중의성 해결)

  • Shin, Sang-Hyun;Lee, Geun-Bae;Lee, Jong-Hyeok
    • Annual Conference on Human and Language Technology / 1995.10a / pp.169-174 / 1995
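  • Part-of-speech tagging removes the ambiguity that remains after morphological analysis, and both statistical and rule-based methods are widely used for it. Each approach, however, has its own limitations. The hidden Markov model (HMM), a statistical method, is flexible, but for Korean, an agglutinative language, its limited window sometimes prevents it from properly consulting the words or parts of speech that hold the clue to disambiguation. Rule-based methods, on the other hand, are themselves dependent on the parts of speech, so they offer neither flexibility nor accuracy for a new tagset or language. To overcome the limitations of these two approaches, this paper proposes a Korean tagging model that integrates statistics and rules. After the statistical model has been trained, rules are learned automatically in a second phase, generating rules for the cases the statistical model cannot handle. Passing through these two automatic learning phases, statistical and then rule-based, made it possible to develop a highly accurate Korean tagger that compensates for the weaknesses of both models.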
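
The two-phase idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption: a most-frequent-tag lookup stands in for the paper's HMM, the rule template (previous tag, predicted tag) → corrected tag is hypothetical, and the toy English data replaces Korean morphemes. The abstract does not say what form TAKTAG's rules actually take.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sents):
    """Phase 1 (stand-in for the HMM): remember the most frequent tag per word."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag_unigram(model, words, default="NOUN"):
    """Tag each word independently; unknown words get a default tag."""
    return [model.get(w, default) for w in words]

def learn_rules(model, tagged_sents, min_gain=1):
    """Phase 2: learn correction rules from the statistical tagger's errors.

    A rule maps (previous predicted tag, predicted tag) to a corrected tag.
    It is kept only if it fixes more tokens than it would break.
    """
    fixes, breaks = Counter(), Counter()
    for sent in tagged_sents:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        pred = tag_unigram(model, words)
        for i in range(1, len(sent)):
            ctx = (pred[i - 1], pred[i])
            if pred[i] != gold[i]:
                fixes[(ctx, gold[i])] += 1
            else:
                breaks[ctx] += 1
    rules = {}
    for (ctx, new_tag), gain in fixes.most_common():
        if ctx not in rules and gain - breaks[ctx] >= min_gain:
            rules[ctx] = new_tag
    return rules

def tag_two_phase(model, rules, words):
    """Run the statistical tagger, then apply the learned correction rules."""
    tags = tag_unigram(model, words)
    out = list(tags)
    for i in range(1, len(tags)):
        out[i] = rules.get((tags[i - 1], tags[i]), tags[i])
    return out

# Hypothetical toy training data.
train = [
    [("the", "DET"), ("race", "NOUN")],
    [("the", "DET"), ("race", "NOUN")],
    [("to", "PART"), ("race", "VERB")],
]
model = train_unigram(train)
rules = learn_rules(model, train)
print(tag_two_phase(model, rules, ["to", "race"]))
```

In this toy setup the first phase always tags "race" as a noun, and the second phase learns a rule that corrects it after "to", which is the kind of residual error the abstract says the rule phase is meant to cover.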

Supervised text data augmentation method for deep neural networks

  • Jaehwan Seol;Jieun Jung;Yeonseok Choi;Yong-Seok Choi
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.343-354 / 2023
  • Recently, general language models have improved considerably with architectures such as GPT-3, proposed by Brown et al. (2020). Nevertheless, complex models can hardly be trained when very little data is available. Data augmentation, which addresses this problem, has been highly successful for image data: image augmentation significantly improves model performance without any additional data or architectural changes (Perez and Wang, 2017). However, applying this technique to textual data is challenging because it is not obvious what noise should be added. We have therefore developed a novel method for augmenting text data. We divide the data into signals, which carry positive or negative meaning, and noise, which does not, and then use k-doc augmentation to randomly combine signals and noise from all the data to generate new data.
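
The signal/noise recombination described in this abstract can be sketched roughly as follows. This is an interpretation, not the authors' procedure: the keyword lexicons, the sentence-level split, and the parameters `k` (number of signal sentences drawn) and `n_noise` are all hypothetical, since the abstract does not define how signals are detected or what "k-doc" means precisely.

```python
import random
import re

# Hypothetical sentiment lexicons used to decide what counts as "signal".
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "hate", "terrible"}

def split_signal_noise(sentences):
    """Split sentences into signals (contain a sentiment word) and noise."""
    signals, noise = [], []
    for s in sentences:
        words = set(re.findall(r"[a-z']+", s.lower()))
        if words & (POSITIVE | NEGATIVE):
            signals.append(s)
        else:
            noise.append(s)
    return signals, noise

def augment(docs, label, k, n_noise, rng):
    """Build one new document for `label` from recombined signals and noise.

    docs: list of (label, [sentences]). Signals come only from documents with
    the same label; noise carries no label, so it is pooled from all documents.
    """
    signal_pool, noise_pool = [], []
    for doc_label, sentences in docs:
        signals, noise = split_signal_noise(sentences)
        if doc_label == label:
            signal_pool.extend(signals)
        noise_pool.extend(noise)
    chosen = rng.sample(signal_pool, min(k, len(signal_pool)))
    chosen += rng.sample(noise_pool, min(n_noise, len(noise_pool)))
    rng.shuffle(chosen)
    return label, chosen

# Hypothetical toy corpus.
docs = [
    ("pos", ["I love this phone.", "It arrived on Monday."]),
    ("neg", ["The battery is terrible.", "I bought it online."]),
    ("pos", ["Excellent screen.", "The box was blue."]),
]
new_label, new_doc = augment(docs, "pos", k=2, n_noise=1, rng=random.Random(0))
print(new_label, new_doc)
```

Because the noise sentences are assumed to be label-free, mixing them across documents changes the surface form of a training example without changing its label.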

Nursing Process of Abdominal Surgery Patients (복부수술환자의 간호과정)

  • Yoo, Hyung-Sook
    • Journal of Korean Academy of Nursing Administration / v.8 no.3 / pp.411-430 / 2002
  • Purpose: This study aimed to develop a nursing process model for abdominal surgery patients using the nursing diagnoses of NANDA, the Nursing Interventions Classification (NIC), and the Nursing Outcomes Classification (NOC). Method: Data were collected from the nursing records of sixty abdominal surgery patients admitted to a university hospital and from open questionnaires completed by thirteen nurses. A systematic nursing process for each of the most common nursing diagnoses was developed by statistical analysis of database queries against the clinical database of abdominal surgery patients. Result: 51 nursing diagnoses were identified in abdominal surgery patients. The most common nursing diagnoses were, in order, Pain, Risk for Infection, Sleep Pattern Disturbance, Hyperthermia, and Altered Nutrition: Less Than Body Requirements. The linkage lists of NANDA to NIC and NANDA to NOC, and the nursing activities corresponding to the nursing diagnoses of abdominal surgery patients, were identified for the unit. Conclusion: The nursing process for abdominal surgery patients comprised core nursing diagnoses, core nursing interventions, and core nursing outcomes, which provide the most reliable data for the unit and could help nurses carry out the nursing process easily without full knowledge of the nursing language classification system. Therefore, a unit nursing process model that uses a standardized nursing language system and contains the unit's own core nursing process data could support nurses' decision making and recording of the nursing process, especially in a computerized patient record system.

Neural Model for Named Entity Recognition Considering Aligned Representation

  • Sun, Hongyang;Kim, Taewhan
    • Proceedings of the Korea Information Processing Society Conference / 2018.10a / pp.613-616 / 2018
  • Sequence tagging is an important task in Natural Language Processing (NLP), and Named Entity Recognition (NER) is one of its key problems. So far, the most widely adopted model for NER combines a bidirectional long short-term memory (BiLSTM) neural network with a Conditional Random Field (CRF), a statistical sequence prediction method. In this work, we improve the prediction accuracy of the BiLSTM by adding an aligned word representation mechanism. Experiments on multilingual (English, Spanish, and Dutch) datasets confirmed that the proposed model outperformed the existing state-of-the-art models.

Energy Flow Finite Element Analysis(EFFEA) of Coplanar Coupled Mindlin Plates (동일 평면상에서 연성된 Mindlin 판 구조물의 에너지흐름유한요소해석)

  • Park, Young-Ho
    • Journal of the Society of Naval Architects of Korea / v.53 no.4 / pp.307-314 / 2016
  • Energy flow analysis (EFA) is a representative method for predicting the statistical energetics of structures at high frequencies. Generally, as the frequency increases, the shear distortion and rotatory inertia effects in the out-of-plane motion of beams and plates become important. Therefore, predicting the out-of-plane energetics of coupled structures in the high-frequency range requires energy flow analyses of the Timoshenko beam and the Mindlin plate. Unlike the energy flow model of the Kirchhoff plate, that of the Mindlin plate consists of three energy governing equations (for the out-of-plane shear wave, the bending-dominant flexural wave, and the shear-dominant flexural wave). This paper presents the energy flow finite element analysis (EFFEA) of coplanar coupled Mindlin plates, for which the energy flow finite element formulation of the out-of-plane energetics in the Mindlin plate was carried out. A general EFFEA program was implemented in MATLAB®. To verify the EFFEA of the Mindlin plate, various numerical applications were carried out successfully.

Part-Of-Speech Tagging using multiple sources of statistical data (이종의 통계정보를 이용한 품사 부착 기법)

  • Cho, Seh-Yeong
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.4 / pp.501-506 / 2008
  • Statistical POS tagging is prone to error because of the inherent limitations of statistical data, especially when it comes from a single source. It is therefore widely agreed that further improvement lies in exploiting various knowledge sources. However, these sources are bound to be inconsistent with each other. This paper shows that a maximum entropy model can be applied to Korean POS tagging. We use n-gram data and trigger pair data as the knowledge sources, and show how the perplexity measure varies when the two are combined using the maximum entropy method. In the experiment, a trigram model achieved 94.9% accuracy with a Hidden Markov Model, and accuracy increased to 95.6% when it was combined with trigger pair data using the maximum entropy method. This shows that further improvement is possible when various knowledge sources are developed and combined using the ME method.
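
How a maximum entropy (log-linear) model can combine an n-gram source with trigger pairs, and how perplexity responds, can be sketched as below. The feature templates, the tag names, and the weights are hypothetical and fixed by hand; the paper trains its weights from data, and the abstract does not describe its exact features.

```python
import math

def features(history, tag):
    """Active features for predicting `tag` after `history` (a list of tags)."""
    feats = []
    if history:
        feats.append(("bigram", history[-1], tag))   # n-gram knowledge source
    for earlier in set(history[:-1]):
        feats.append(("trigger", earlier, tag))      # trigger-pair knowledge source
    return feats

def maxent_probs(history, candidates, weights):
    """Log-linear distribution: p(tag | history) is proportional to exp(sum of weights)."""
    scores = {c: sum(weights.get(f, 0.0) for f in features(history, c))
              for c in candidates}
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}

def perplexity(probs):
    """Perplexity of a sequence given the probability assigned to each true item."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

tags = ["N", "V", "J"]                      # hypothetical tagset
history = ["D", "N"]                        # hypothetical context; the true next tag is "V"
ngram_only = {("bigram", "N", "V"): 1.0}    # hand-set weights, for illustration only
combined = {("bigram", "N", "V"): 1.0, ("trigger", "D", "V"): 0.5}

p_ngram = maxent_probs(history, tags, ngram_only)["V"]
p_combined = maxent_probs(history, tags, combined)["V"]
print(perplexity([p_ngram]), perplexity([p_combined]))
```

With these hand-set weights, adding the trigger feature raises the probability of the correct tag and therefore lowers perplexity; with trained weights the effect depends on how informative the extra source is.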

Numerical model for bolted T-stubs with two bolt rows

  • Daidie, Alain;Chakhari, Jamel;Zghal, Ali
    • Structural Engineering and Mechanics / v.26 no.3 / pp.343-361 / 2007
  • This article presents a numerical tool for dimensioning two-threaded fasteners connecting prismatic parts subjected to fatigue tension loads that are coplanar with the screw axis. A simplified numerical model is developed from unidirectional finite elements, modeling the connected parts and screws with bent elements and the elastic contact layer between the parts with springs. An algorithm that updates the contact stiffness matrix and calculates the forces and displacements at each node of the structure, and thus the normal stresses in the screws under both static and fatigue loading, is then developed in C. An experimental study is conducted in parallel with the numerical approach to validate the assumptions of the developed model, the numerical model, and the 3D finite element results. Since stiffness values for the compressive zones in the parts are difficult to determine analytically, a statistical software method is used, from which a tuning factor is derived for identifying these stiffness values. The method is also applied to determine the influence of each parameter on the fatigue behaviour of each screw. Finally, the developed model will be used to establish a new, sophisticated, fast and accurate tool for dimensioning bolted mechanical structures.

A Homonym Disambiguation System Based on Statistical Model Using Sense Category and Distance Weights (의미범주 및 거리 가중치를 고려한 통계기반 동형이의어 분별 시스템)

  • Kim, Jun-Su;Kim, Chang-Hwan;Lee, Wang-Woo;Lee, Soo-Dong;Ock, Cheol-Young
    • Annual Conference on Human and Language Technology / 2001.10d / pp.487-493 / 2001
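  • This paper analyzes the results of an external experiment on a statistical homonym disambiguation system based on Bayes' theorem, and presents a model that adds sense category weights and distance weights for neighboring eojeols (word phrases) to improve accuracy. The system uses sense information built from a sense-tagged corpus of dictionary definitions (1.2 million eojeols). When applied to homonyms appearing in dictionary definition sentences, it achieved an average accuracy of 98.32% on the 200 most frequent homonyms. For the external experiment, 49 of the 200 homonyms used in the internal experiment (31 substantives and 18 predicates) were selected, and 50,703 sentences containing them were extracted from the Sejong Project POS-tagged corpus (3.5 million eojeols). In experiments that assigned sense category and distance weights to the five eojeols before and after the target homonym, accuracy improved by 2.93% over the existing statistical disambiguation model.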
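
A Bayes-based disambiguator with distance weights over a window of five words on each side can be sketched as follows. The 1/d weighting function, the add-one smoothing, and the English toy examples are assumptions made for illustration; the abstract does not give the actual weight values, and the sense category weights it mentions are omitted here.

```python
import math
from collections import Counter, defaultdict

WINDOW = 5  # five words before and after the target, as in the abstract

def context(words, idx, window=WINDOW):
    """Return (word, distance) pairs for the words around position idx."""
    lo = max(0, idx - window)
    hi = min(len(words), idx + window + 1)
    return [(words[j], abs(j - idx)) for j in range(lo, hi) if j != idx]

def train(examples):
    """examples: list of (words, target index, sense label)."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, idx, sense in examples:
        sense_counts[sense] += 1
        for w, _ in context(words, idx):
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab

def disambiguate(model, words, idx, weight=lambda d: 1.0 / d):
    """Pick the sense with the highest distance-weighted naive Bayes score.

    `weight` is a hypothetical distance weighting: nearer words count more.
    """
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(n / total)                      # log prior
        for w, d in context(words, idx):
            p = (word_counts[sense][w] + 1) / denom      # add-one smoothing
            score += weight(d) * math.log(p)             # distance-weighted likelihood
        if score > best_score:
            best, best_score = sense, score
    return best

# Hypothetical sense-tagged examples for the English homonym "bank".
examples = [
    ("he sat on the bank of the river".split(), 4, "river"),
    ("she deposited money at the bank".split(), 5, "finance"),
]
model = train(examples)
print(disambiguate(model, "the river bank was muddy".split(), 2))
```

Passing `weight=lambda d: 1.0` recovers an unweighted naive Bayes score, which makes it easy to compare the two settings on the same data.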

Research on Big Data Integration Method

  • Kim, Jee-Hyun;Cho, Young-Im
    • Journal of the Korea Society of Computer and Information / v.22 no.1 / pp.49-56 / 2017
  • In this paper we propose an approach to big data integration for analyzing, visualizing, and predicting market trends: an integrated data model built with the R language, described as the future of statistics, and Hadoop, a parallel processing framework for data. Four approaches that use R and Hadoop are described and their strengths and weaknesses analyzed: the ff package in R, R with Streaming (a Hadoop utility), and Rhipe and RHadoop, which are R and Hadoop interface packages. Rhipe and RHadoop are proposed as a complete data integration model. Integrating R, which is popular for statistical algorithms, with Hadoop, which comprises a distributed file system and a resource management platform and can implement the MapReduce programming model, provides a new environment in which R code can be written and deployed in Hadoop without any data movement. This model enables high-performance predictive analysis and a deep understanding of big data.

A study on the aspect-based sentiment analysis of multilingual customer reviews (다국어 사용자 후기에 대한 속성기반 감성분석 연구)

  • Sungyoung Ji;Siyoon Lee;Daewoo Choi;Kee-Hoon Kang
    • The Korean Journal of Applied Statistics / v.36 no.6 / pp.515-528 / 2023
  • With the growth of the e-commerce market, consumers increasingly rely on user reviews to make purchasing decisions. Consequently, researchers are actively studying how to analyze these reviews effectively. Among the various methods of sentiment analysis, aspect-based sentiment analysis, which examines user reviews from multiple angles rather than relying solely on simple positive or negative sentiment, is gaining widespread attention. One methodology for aspect-based sentiment analysis uses transformer-based models, the latest natural language processing technology. In this paper, we conduct an aspect-based sentiment analysis of multilingual user reviews on two real datasets using these models. Specifically, we use restaurant data from the SemEval 2016 public dataset and multilingual user review data from the cosmetic domain. We compare the performance of transformer-based models for aspect-based sentiment analysis and apply various methodologies to improve their performance. Models using multilingual data are expected to be highly useful in that they can analyze multiple languages in one model without building separate models for each language.