• Title/Summary/Keyword: random fields


Exploiting Features of Writer's Intent in Automatic Spacing (자동 띄어쓰기에서 글쓴이 의도를 반영한 자질의 활용)

  • Lee, Jeong-wook;Kim, Jae-Hoon
    • Annual Conference on Human and Language Technology / 2021.10a / pp.528-531 / 2021
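  • Spacing errors affect Korean language processing as a whole, so automatic spacing is an essential component. Most writers do not make spacing errors, so the writer's intent should be reflected in the spacing system. However, most automatic spacing systems remove all existing spacing information and then insert space characters anew. To mitigate this problem, this paper proposes adding features that reflect the writer's intent to the machine learning model. In the experiments, we use CRFs (Conditional Random Fields) to compare and analyze the performance of an existing system and a spacing system that reflects the user's intent.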

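The idea of keeping the writer's spacing as a feature can be sketched as follows. This is a minimal illustration, not the authors' feature set: the function name `char_features`, the feature keys, and the context window of one character are all assumptions. Each non-space character gets a feature dictionary, and `orig_space_after` records whether the writer typed a space right after it.

```python
def char_features(text):
    """Return (chars, features) for a possibly mis-spaced input string.

    Hypothetical feature layout: one dict per non-space character.
    `orig_space_after` keeps the writer's own spacing as a cue instead of
    discarding it before tagging.
    """
    chars, had_space = [], []
    for ch in text:
        if ch == " ":
            if had_space:
                had_space[-1] = True  # the previous character was followed by a space
        else:
            chars.append(ch)
            had_space.append(False)

    feats = []
    for i, ch in enumerate(chars):
        feats.append({
            "char": ch,
            "prev_char": chars[i - 1] if i > 0 else "<BOS>",
            "next_char": chars[i + 1] if i + 1 < len(chars) else "<EOS>",
            "orig_space_after": had_space[i],  # assumed writer-intent feature
        })
    return chars, feats


chars, feats = char_features("나는 학교에간다")
print(chars)
print([f["orig_space_after"] for f in feats])
```

Feature dictionaries of this kind could be passed to any CRF toolkit that accepts per-token features, with a space/no-space label per character as the target.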

Korean Named Entity Recognition Using ELECTRA and Label Attention Network (ELECTRA와 Label Attention Network를 이용한 한국어 개체명 인식)

  • Kim, Hong-Jin;Oh, Shin-Hyeok;Kim, Hark-Soo
    • Annual Conference on Human and Language Technology / 2020.10a / pp.333-336 / 2020
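  • Named entity recognition is the task of finding words with a unique meaning in a sentence, such as person, location, and organization names, and classifying them into entity types. As deep learning research has progressed, approaches that combine an RNN (Recurrent Neural Network) with a CRF (Conditional Random Fields) have shown good performance on named entity recognition. However, the time complexity of a CRF is proportional to the square of the number of classes to be classified, and some recent studies have reported that it performs worse than an RNN with a Softmax model. This paper proposes a named entity recognition model that uses a LAN (Label Attention Network), which compensates for the shortcomings of the CRF, together with syllable-level ELECTRA, a pretrained language model.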

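The quadratic cost mentioned in the abstract can be seen in a plain Viterbi decoder for a linear-chain model. The sketch below is an illustration under assumed inputs (score tables `emissions` and `transitions` are hypothetical, and it is not the paper's model); it counts transition evaluations, which come to (T - 1) x K x K for T positions and K labels.

```python
def viterbi(emissions, transitions):
    """Decode the best label path for a linear-chain model.

    emissions: list of T lists, each with K scores.
    transitions: K x K scores; transitions[i][j] scores label i -> label j.
    Returns (best_path, number_of_transition_evaluations).
    """
    T, K = len(emissions), len(transitions)
    score = list(emissions[0])
    back = []
    ops = 0
    for t in range(1, T):
        new_score, ptr = [], []
        for j in range(K):            # K target labels ...
            best_i, best = 0, float("-inf")
            for i in range(K):        # ... times K source labels per position
                ops += 1
                cand = score[i] + transitions[i][j]
                if cand > best:
                    best_i, best = i, cand
            new_score.append(best + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        back.append(ptr)

    # Follow back-pointers from the best final label.
    path = [max(range(K), key=lambda j: score[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, ops


path, ops = viterbi([[1, 0], [0, 1], [1, 0]], [[0, 0], [0, 0]])
print(path, ops)
```

Doubling the number of labels quadruples `ops` for the same sentence length, which is the cost that label-attention approaches aim to avoid.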

A Big Data Based Random Motif Frequency Method for Analyzing Human Proteins (인간 단백질 분석을 위한 빅 데이타 기반 RMF 방법)

  • Kim, Eun-Mi;Jeong, Jong-Cheol;Lee, Bae-Ho
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.13 no.6 / pp.1397-1404 / 2018
  • Due to the technical difficulties and high cost of obtaining 3-dimensional structure data, sequence-based approaches in proteins have not been widely acknowledged. A motif can be defined as any segment of a protein or gene sequence. Owing to this simplicity, motifs have been actively and widely used in various areas. However, the motif itself has not been studied comprehensively. The contribution of this study to analyzing human proteins with artificial intelligence methods falls into three areas: (1) To the best of our knowledge, this research is the first comprehensive motif analysis, covering motifs in all human proteins in the Protein Data Bank (PDB) associated with the Enzyme Commission (EC) number database and the Structural Classification of Proteins (SCOP). (2) We analyze motifs in depth in three categories: pattern, statistical, and functional analysis of clusters. (3) Finally, and most importantly, we propose a random motif frequency (RMF) matric that can efficiently distinguish the characteristics of proteins by identifying interface residues from non-interface residues and clustering protein functions based on big data while varying the size of the random motif.

Experimental Comparison of Network Intrusion Detection Models Solving Imbalanced Data Problem (데이터의 불균형성을 제거한 네트워크 침입 탐지 모델 비교 분석)

  • Lee, Jong-Hwa;Bang, Jiwon;Kim, Jong-Wouk;Choi, Mi-Jung
    • KNOM Review / v.23 no.2 / pp.18-28 / 2020
  • With the development of the virtual community, the benefits that IT provides to people in fields such as healthcare, industry, communication, and culture are increasing, and the quality of life is improving. Accordingly, various malicious attacks target the developed network environment. Firewalls and intrusion detection systems exist to detect these attacks in advance, but they are limited in detecting malicious attacks that evolve day by day. To address this problem, intrusion detection research using machine learning is being actively conducted, but false positives and false negatives occur because of imbalance in the training dataset. In this paper, a Random Oversampling method is used to address the imbalance of the UNSW-NB15 dataset used for network intrusion detection. Through experiments, we compare and analyze the accuracy, precision, recall, F1-score, training and prediction time, and hardware resource consumption of the models. Building on this study using the Random Oversampling method, we aim to develop more efficient network intrusion detection models using other methods and high-performance models that can address the imbalanced data problem.
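Random oversampling, as named in the abstract, duplicates randomly chosen minority-class samples until every class matches the size of the largest one. The following is a minimal sketch with toy data; the function name `random_oversample` and the labels are illustrative assumptions, not the paper's code or the UNSW-NB15 schema.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))  # sample with replacement
            out_y.append(cls)
    return out_x, out_y


# Toy data: four "normal" records and one "attack" record.
X = [[0], [1], [2], [3], [9]]
y = ["normal", "normal", "normal", "normal", "attack"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling is usually applied to the training split only, so that duplicated records do not leak into the evaluation data.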

A Study on the Prediction of Uniaxial Compressive Strength Classification Using Slurry TBM Data and Random Forest (이수식 TBM 데이터와 랜덤포레스트를 이용한 일축압축강도 분류 예측에 관한 연구)

  • Tae-Ho Kang;Soon-Wook Choi;Chulho Lee;Soo-Ho Chang
    • Tunnel and Underground Space / v.33 no.6 / pp.547-560 / 2023
  • Recently, research on predicting ground classification using machine learning techniques, TBM excavation data, and ground data has been increasing. In this study, multi-class classification of uniaxial compressive strength (UCS) was carried out by applying a random forest model, a decision-tree-based machine learning technique widely used in various fields, to machine data and ground data acquired at three slurry shield TBM sites. For the classification, the data were split into training and test sets at a ratio of 7:3, and a grid search with 5-fold cross-validation was used to select the optimal parameters. The random forest classifier for UCS achieved high accuracy, 0.983 on the training set and 0.982 on the test set. However, because of the imbalance in data distribution between classes, recall was low for class 4. Additional research is judged necessary to increase the amount of measured UCS data acquired at various sites.
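How overall accuracy can stay high while recall for a rare class is low can be shown with a small calculation. The labels below are made-up toy values chosen for illustration, not the study's data, and the helper names are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each true class, the fraction of its samples that were recovered."""
    recalls = {}
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls[cls] = hits / total
    return recalls


# Toy example: 95 samples of class 1 and only 5 samples of class 4.
y_true = [1] * 95 + [4] * 5
y_pred = [1] * 95 + [1] * 3 + [4] * 2  # three class-4 samples are missed

print(accuracy(y_true, y_pred))
print(recall_per_class(y_true, y_pred))
```

Because the rare class contributes few samples, missing most of them barely moves the accuracy, which is why per-class recall is reported alongside it.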

A Statistical Prediction Model of Speakers' Intentions in a Goal-Oriented Dialogue (목적지향 대화에서 화자 의도의 통계적 예측 모델)

  • Kim, Dong-Hyun;Kim, Hark-Soo;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.35 no.9 / pp.554-561 / 2008
  • A technique for predicting the user's intention can be used as a post-processing method for reducing the search space of an automatic speech recognizer. A technique for predicting the system's intention can be used as a pre-processing method for generating a flexible sentence. To satisfy these practical needs, we propose a statistical model to predict speakers' intentions, which are generalized into pairs of a speech act and a concept sequence. In contrast to the previous model, which uses simple n-gram statistics of speech acts, the proposed model represents the dialogue history of the current utterance as a feature set spanning various linguistic levels (i.e., n-grams of speech act and concept sequence pairs, clue words, and state information of a domain frame). The proposed model then predicts the intention of the next utterance by using the feature set as input to CRFs (Conditional Random Fields). In an experiment in a schedule management domain, the proposed model showed a precision of 76.25% in predicting the user's speech act and 64.21% in predicting the user's concept sequence. It also showed a precision of 88.11% in predicting the system's speech act and 87.19% in predicting the system's concept sequence. In addition, the proposed model showed 29.32% higher average precision than the previous model.

Do Inner Planets Modulate the Space Environment of the Earth?

  • Kim, Jung-Hee;Chang, Heon-Young
    • Journal of Astronomy and Space Sciences / v.31 no.1 / pp.7-13 / 2014
  • Variabilities in the solar wind cause disturbances throughout the heliosphere on all temporal and spatial scales, which leads to changeable space weather. From the viewpoint of space weather forecasting in particular, it is important to know in advance the direct and indirect causes that modulate the space environment near the Earth. Recently, there have been discussions on the role of the interaction of the solar wind with Mercury in affecting the solar wind velocity in the Earth's neighborhood during its inferior conjunctions. In this study we investigate whether other parameters describing the space environment near the Earth are modulated by the inner planets' wake, by examining whether the interplanetary magnetic field and the proton density in the solar wind observed by the Advanced Composition Explorer (ACE) spacecraft, and the geomagnetic field via the Dst index and the Auroral Electrojet index (AE index), depend on the relative position of the inner planets. We find that there are indeed apparent variations. For example, the mean variations of the geomagnetic fields measured in the Earth's neighborhood appear to vary on a timescale of about 10 to 25 days. Those variations in the parameters we have studied, however, turn out to be part of random fluctuations and have nothing to do with the relative position of the inner planets. Moreover, variations of the proton density in the solar wind, the Dst index, and the AE index are found to follow a Gaussian distribution. Finally, we point out that some properties of the behavior of the random fluctuations remain to be studied.

On the Temporal Variability of Geomagnetic Field and Transfer Function at Icheon Observatory (이천관측소에서 측정된 지자기장 및 지자기 전달함수의 시간적 변동성)

  • Lee, Duk-Kee;Kwon, Byung-Doo;Youn, Yong-Hoon;Yang, Jun-Mo
    • Journal of the Korean Earth Science Society / v.25 no.7 / pp.604-614 / 2004
  • Using three-component geomagnetic data from a permanent geomagnetic observatory in Icheon, we computed the power spectrum of each geomagnetic component and the amplitude, phase, and estimation error of the transfer function for each day in the 6-month period from July 2002 to December 2002. The temporal variation of the power spectrum appears random, with alternating relatively strong and weak magnitudes, which is considered to reflect solar activity. However, there is no clear long-term trend. The amplitude, phase, and error of the transfer function show some random patterns at periods over 1000 s and under 100 s, but they appear comparatively stable, without manifest temporal changes. Furthermore, we estimated the electric field by assuming $P_1^0$ spherical harmonics and then calculated the approximate apparent resistivity for each day. As a result, the variations of resistivity depend on the temporal magnitude of spectral power in the horizontal magnetic fields rather than on hydrological changes near the surface.

Study on the Characteristics of Infinite Slope Failures by Probabilistic Seepage Analysis (확률론적 침투해석을 통한 무한사면 파괴의 특성 연구)

  • Cho, Sung-Eun
    • Journal of the Korean Geotechnical Society / v.30 no.10 / pp.5-18 / 2014
  • Many regions around the world are vulnerable to rainfall-induced slope failures. A variety of methods have been proposed for revealing the mechanism of slope failure initiation. Current analysis methods, however, do not consider the effects of non-homogeneous soil profiles and variable hydraulic responses on rainfall-induced slope failures. In this study, probabilistic stability analyses were conducted for weathered residual soil slopes with different soil thicknesses overlying impermeable bedrock to study how rainfall-induced failure mechanisms depend on soil thickness. A series of seepage and stability analyses of an infinite slope based on one-dimensional random fields were performed to consider the effects of uncertainty due to the spatial heterogeneity of hydraulic conductivity on the failure of unsaturated slopes due to rainfall infiltration. The results showed that a probabilistic framework can be used to efficiently consider various failure patterns caused by spatial variability of hydraulic conductivity in rainfall infiltration assessment for an infinite slope.
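One realization of a one-dimensional random field of hydraulic conductivity over depth can be sketched as follows. The abstract does not state the distribution or correlation model, so the lognormal marginal, the exponential autocorrelation exp(-|dz| / corr_len), and every parameter name and value here are assumptions made for illustration. An AR(1) recursion is used because it reproduces an exponential autocorrelation on an equally spaced grid.

```python
import math
import random


def lognormal_field_1d(n, dz, corr_len, mean_ln_k, std_ln_k, seed=0):
    """Generate n correlated conductivity values at depth spacing dz.

    Assumed model: ln(K) is a stationary Gaussian process with mean
    mean_ln_k, standard deviation std_ln_k, and autocorrelation
    exp(-|lag| / corr_len).
    """
    rng = random.Random(seed)
    rho = math.exp(-dz / corr_len)  # correlation between adjacent depths
    g = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        # AR(1) step keeps unit variance and the exponential correlation
        g.append(rho * g[-1] + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
    return [math.exp(mean_ln_k + std_ln_k * v) for v in g]


# Hypothetical profile: 20 layers of 0.1 m, correlation length 0.5 m.
k_profile = lognormal_field_1d(n=20, dz=0.1, corr_len=0.5,
                               mean_ln_k=math.log(1e-6), std_ln_k=0.5)
print(len(k_profile), min(k_profile) > 0.0)
```

In a Monte Carlo setting, many such profiles (one per seed) would each be fed to the seepage and stability analysis, and the fraction of failing realizations would estimate the probability of failure.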

Joint analysis of binary and continuous data using skewed logit model in developmental toxicity studies (발달 독성학에서 비대칭 로짓 모형을 사용한 이진수 자료와 연속형 자료에 대한 결합분석)

  • Kim, Yeong-hwa;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.33 no.2 / pp.123-136 / 2020
  • It is common to encounter correlated multiple outcomes measured on the same subject in various research fields. In developmental toxicity studies, the presence of malformed pups and fetal weight are measured on pregnant dams exposed to different levels of a toxic substance. Joint analysis of two such outcomes can result in more efficient inferences than separate models for each outcome. Most methods for joint modeling assume a normal distribution for the random effects. However, in developmental toxicity studies, the response distributions may change irregularly in location and shape as the level of the toxic substance changes, which may not be captured by a normal random effects model. Motivated by applications in developmental toxicity studies, we propose a Bayesian joint model for binary and continuous outcomes. In our model, we incorporate a skewed logit model for the binary outcome to allow the response distributions to flexibly take both symmetric and asymmetric shapes across toxic levels. We apply our proposed method to data from a developmental toxicity study of diethylhexyl phthalate.