• Title/Summary/Keyword: Conditional random field


Encoding Dictionary Feature for Deep Learning-based Named Entity Recognition

  • Ronran, Chirawan;Unankard, Sayan;Lee, Seungwoo
    • International Journal of Contents / v.17 no.4 / pp.1-15 / 2021
  • Named entity recognition (NER) is a crucial NLP task that aims to extract information from text. To build NER systems, deep learning (DL) models are trained with dictionary features by mapping each word in the dataset to dictionary features and generating a unique index. However, this technique can generate noisy labels, which pose significant challenges for the NER task. In this paper, we proposed DL-dictionary features and evaluated them on two datasets: OntoNotes 5.0 and GFID, our new infectious disease outbreak dataset. We concatenated (1) a Bidirectional Long Short-Term Memory (BiLSTM) character representation and (2) a pre-trained embedding with (3) our proposed features, named the Convolutional Neural Network (CNN), BiLSTM, and self-attention dictionaries. The combined features (1-3) were fed through a BiLSTM-Conditional Random Field (CRF) model to predict named entity classes. We compared these outputs with predictions obtained from the BiLSTM character representation, the pre-trained embedding, and dictionary features from previous research, which used exact-matching and partial-matching dictionary techniques. The findings showed that the model employing our dictionary features outperformed models that used existing dictionary features. We also computed the F1 score on the GFID dataset, applying this technique to the extraction of medical or healthcare information.
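The exact-matching dictionary baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's proposed CNN, BiLSTM, or self-attention dictionary encoder; the toy `DICTIONARY`, the B-/I- labelling scheme, and the `dictionary_features` helper are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy dictionary: lower-cased surface phrase -> entity type.
DICTIONARY: Dict[str, str] = {
    "world health organization": "ORG",
    "ebola": "DISEASE",
    "sierra leone": "LOC",
}

# Each dictionary label gets a unique integer index; 0 means "no match".
LABEL_INDEX: Dict[str, int] = {"O": 0}
for _etype in sorted(set(DICTIONARY.values())):
    for _prefix in ("B", "I"):
        LABEL_INDEX[f"{_prefix}-{_etype}"] = len(LABEL_INDEX)


def dictionary_features(tokens: List[str], max_len: int = 4) -> List[int]:
    """Return one dictionary-feature index per token (longest exact match)."""
    lowered = [t.lower() for t in tokens]
    feats = [LABEL_INDEX["O"]] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word entries win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = DICTIONARY.get(" ".join(lowered[i:i + n]))
            if etype is not None:
                feats[i] = LABEL_INDEX[f"B-{etype}"]
                for j in range(i + 1, i + n):
                    feats[j] = LABEL_INDEX[f"I-{etype}"]
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return feats
```

In a model like the one described, such indices would typically be passed through an embedding layer and concatenated with the character and word representations. Exact matching of this kind is also where noisy labels can arise, since any token that happens to coincide with a dictionary entry is marked regardless of context.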

Assessment of Turbulent Spectral Estimators in LDV (LDV의 난류 스펙트럼 추정치 평가)

  • 이도환;성형진
    • Transactions of the Korean Society of Mechanical Engineers / v.16 no.9 / pp.1788-1795 / 1992
  • Numerical simulations have been performed to investigate various spectral estimators used in LDV signal processing. To simulate the particle arrival-time statistics, known as a doubly stochastic Poisson process, an autoregressive vector model was adopted to construct a primary velocity field. The conditional Poisson process with a random rate parameter was generated by rescaling time using the mean value function. Two estimators were applied to obtain power spectral density functions: the direct transform based on randomly sampled sequences, and the standard periodogram using data periodically resampled by sample-and-hold interpolation. For low turbulence intensity flows, the direct transform with a constant Poisson intensity is in good agreement with the theoretical spectrum. For high data density flows, the periodogram using the sample-and-hold sequences is better than the direct transform in terms of stability and the weighting of the velocity bias. High Reynolds stress and high fluctuation of the transverse velocity component affect the velocity bias, which increases the distortion of spectral components in the direct transform.
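The sample-and-hold interpolation step can be sketched as follows. This is a generic zero-order-hold resampler under stated assumptions, not the authors' code: the function name, the grid starting at t = 0, and the rule for grid points before the first arrival are all illustrative choices.

```python
from bisect import bisect_right
from typing import List, Sequence


def sample_and_hold(times: Sequence[float], values: Sequence[float],
                    dt: float, t_end: float) -> List[float]:
    """Resample irregular samples onto the uniform grid 0, dt, 2*dt, ... < t_end.

    `times` must be sorted in increasing order. Each grid point takes the most
    recent measured value (zero-order hold). Grid points before the first
    arrival take the first measured value (an assumption of this sketch).
    """
    if len(times) == 0 or len(times) != len(values):
        raise ValueError("times and values must be non-empty and of equal length")
    resampled = []
    k = 0
    while k * dt < t_end:
        idx = bisect_right(times, k * dt) - 1  # last arrival at or before the grid time
        resampled.append(values[max(idx, 0)])
        k += 1
    return resampled
```

The uniformly spaced output can then be fed to a standard periodogram routine, which is what makes this route attractive when the data density is high.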

Compiler Analysis Framework Using SVM-Based Genetic Algorithm : Feature and Model Selection Sensitivity (SVM 기반 유전 알고리즘을 이용한 컴파일러 분석 프레임워크 : 특징 및 모델 선택 민감성)

  • Hwang, Cheol-Hun;Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.4 / pp.537-544 / 2020
  • Detection-evasion techniques such as mutation and obfuscation are advancing along with malware technology. In malware detection, detecting unknown malware is important, and Malware Authorship Attribution, which detects unknown malicious code by identifying its author from distributed malware, is being studied. In this paper, we extract the compiler information that affects binary-based author identification and investigate how sensitive classification efficiency is to feature selection, probabilistic and non-probabilistic models, and optimization across studies. In the experiments, feature selection by information gain combined with the support vector machine, a non-probabilistic model, showed high efficiency. Among the optimization studies, feature selection and model optimization through the proposed framework yielded high classification accuracy, together with a 48% feature reduction and 53 faster execution speed. This study confirms the sensitivity of classification efficiency to the feature selection, model, and optimization methods.
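Feature selection by information gain can be illustrated with a short sketch over discrete features. The helper names (`entropy`, `information_gain`, `top_k_features`) and the toy column layout are assumptions; the abstract does not describe how the authors computed or thresholded the scores.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(Y) in bits of a non-empty label sequence."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - sum over x of p(x) * H(Y | X = x)."""
    total = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / total * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns: Dict[str, Sequence[Hashable]],
                   labels: Sequence[Hashable], k: int) -> List[str]:
    """Rank feature columns by information gain and keep the k best names."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The selected columns would then be passed to a classifier such as a support vector machine; continuous features would need to be discretized first for this particular formulation.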

Impacts on Residence Time and Water Quality of the Saemangeum Reservoir Caused by Inner Development (새만금 내부개발이 체류시간 및 수질변화에 미치는 영향)

  • Yoo, Sang-Cheol;Suh, Seung-Won;Lee, Hwa-Young
    • Journal of the Korean Society for Marine Environment & Energy / v.15 no.3 / pp.186-197 / 2012
  • To understand the hydrodynamic and water quality changes in the Saemangeum reservoir under the inner development plan, intensive numerical simulations were carried out using EFDC. Inner dike construction and the proposed dredging plans may cause stratification and change the flow field. Notably, a highly conditional gate operation schedule that adjusts to a target water elevation of -1.5 m causes severe stratification and hence plays an important role in poor water quality. Residence time simulations using random walk particle tracking show that the hydrodynamic characteristics depend greatly on riverine inflow conditions. It is also inferred that the northern part of the Mangyeong reservoir behaves as a dead zone and is a major cause of water quality deterioration, owing to benthic flux from settled sediment that resides there for a long time.

Development of the Rule-based Smart Tourism Chatbot using Neo4J graph database

  • Kim, Dong-Hyun;Im, Hyeon-Su;Hyeon, Jong-Heon;Jwa, Jeong-Woo
    • International Journal of Internet, Broadcasting and Communication / v.13 no.2 / pp.179-186 / 2021
  • We have developed a smart tourism app and Instagram and YouTube content to provide personalized tourism information and travel product information to individual tourists. In this paper, we develop a rule-based smart tourism chatbot using the khaiii (Kakao Hangul Analyzer III) morphological analyzer and the Neo4J graph database. To understand the intention of a user's question, the proposed chatbot system uses a morpheme analyzer, a proper noun dictionary of tourist destination names, and a general noun dictionary of words frequently used in tourist information searches. The tourism knowledge base, built on the Neo4J graph database, provides adequate answers to tourists' questions. The Neo4J nodes are Area, based on the tourist destination address; Contents, which holds tourist information as properties; and Service, which holds service attribute data frequently used for search. A Neo4J query is created from the analyzed intention of a tourist's question, using the properties of nodes and relationships in the Neo4J database, and the answer is obtained by searching the tourism knowledge base. We built the tourism knowledge base from more than 1300 items of Jeju tourism information used in the smart tourism app. We plan to develop a multilingual smart tour chatbot using named entity recognition (NER), intention classification with conditional random fields (CRF), and transfer learning with pretrained language models.

A Korean Community-based Question Answering System Using Multiple Machine Learning Methods (다중 기계학습 방법을 이용한 한국어 커뮤니티 기반 질의-응답 시스템)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • Journal of KIISE / v.43 no.10 / pp.1085-1093 / 2016
  • A community-based Question Answering system provides answers to questions from documents uploaded to web communities. To enhance question analysis, earlier methods developed specific rules suited to a target domain or applied machine learning to parts of the process. However, these methods incur excessive cost when expanding to new fields, or lead to systems that are overfitted to a specific field. This paper proposes a multiple machine learning method that automates the overall process by applying an appropriate machine learning technique at each step of a community-based Question Answering system. The system is divided into a question analysis part and an answer selection part. The question analysis part consists of a question focus extractor, which identifies the focused phrases in questions using conditional random fields, and a question type classifier, which classifies the topics of questions using a support vector machine. In the answer selection part, we train the weights used by the similarity estimation models through an artificial neural network. In addition, there are many cases in which the results of morphological analysis are unreliable for data uploaded to web communities. We therefore suggest a method that minimizes the impact of morphological analysis by using character features in the question analysis stage. The proposed system outperforms the former system, achieving a Mean Average Precision of 0.765 and an R-Precision of 0.872.
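The two evaluation measures reported in the abstract have standard definitions, which the following minimal sketch implements. The function names and the toy identifiers in the usage note are illustrative and are not taken from the paper.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Mean of precision@k over the ranks k at which a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[Hashable], Set[Hashable]]]) -> float:
    """Average the per-question average precision over all questions."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


def r_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Precision within the top R results, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for item in ranked[:r] if item in relevant) / r
```

For a ranking `["a", "b", "c", "d"]` with relevant answers `{"a", "c"}`, the average precision is (1/1 + 2/3) / 2 and the R-Precision is 1/2, since only one of the top two results is relevant.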

Improving Bidirectional LSTM-CRF model Of Sequence Tagging by using Ontology knowledge based feature (온톨로지 지식 기반 특성치를 활용한 Bidirectional LSTM-CRF 모델의 시퀀스 태깅 성능 향상에 관한 연구)

  • Jin, Seunghee;Jang, Heewon;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.253-266 / 2018
  • This paper proposes a sequence tagging methodology to improve the performance of NER (Named Entity Recognition) used in QA systems. To retrieve the correct answers stored in a database, the user's query must be converted into a database language such as SQL (Structured Query Language) so that the computer can interpret it. This requires identifying the classes or data names in the database that the query refers to. The existing method, which looks up the words of the query in the database and recognizes objects from the matches, cannot distinguish homonyms or multi-word phrases because it does not consider the context of the user's query. If there are multiple search results, all of them are returned, so the query can have many interpretations and the time complexity of the computation becomes large. To overcome these problems, this study reflects the contextual meaning of the query using a Bidirectional LSTM-CRF. We also address a disadvantage of neural network models, their inability to identify untrained words, by using an ontology knowledge based feature. Experiments were conducted on an ontology knowledge base in the music domain, and the performance was evaluated. To accurately evaluate the L-Bidirectional LSTM-CRF proposed in this study, we replaced words in the learned queries with untrained words and tested whether words that are in the database but were not seen during training were correctly identified. As a result, the model could recognize objects in context and could recognize untrained words without re-training the L-Bidirectional LSTM-CRF model, and the overall object recognition performance was confirmed to improve.
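The role of the CRF layer on top of a Bidirectional LSTM can be illustrated by its decoding step. The sketch below is a generic Viterbi decoder for a linear-chain CRF, not the authors' implementation; the score layout (`emissions[t][j]` for per-token tag scores, `transitions[i][j]` for tag-to-tag scores) is an assumption, and start and end transitions are omitted for brevity.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]:   score of tag j at position t (e.g. produced by a BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each tag
    backptr: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score = []
        pointers = []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backptr.append(pointers)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backptr):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path
```

Because the transition scores are shared across positions, a strongly negative score for an invalid move (such as O followed by I in a BIO scheme) can override a locally attractive emission score, which is how the CRF layer brings sequence-level context into the tagging decision.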