• 제목/요약/키워드: natural language classification

검색결과 191건 처리시간 0.026초

Machine에 의한 자연 언어 이해의 효과성 및 탄력성 중대를 위한 자연언어 이해 기법과 분류 기법과 연결적 통합 사용에 대한 연구 (A Study of Improving the Flexibility and Effectiveness of Natural Anguage Understanding Considering Natural Language Classification Methodologies)

  • 이현부
    • 한국지능시스템학회논문지
    • /
    • 제1권3호
    • /
    • pp.20-32
    • /
    • 1991
  • This study seeks a way a way of dealing with unformatted natural language considering fuzzy set theory. The goal of the study is to establish a framework of an effective language understanding system that is linked to language classification system This study has found that languate understanding is strongly influenced by the language classification. The understanding of language. This study shows that the precision of language classification depends upon the way of how the language is classified in advance. In this study, a fuzzy logic was used to improve the precision of language classification. It was considered that the fuzzy logic might be albe to distinctively classify nuatural language texts into pretinent homogenious groups where contents of the language were identical. Accordingly, in the study, it was expected that classification of language were precisely classified by the fuzzy logic. An experimentalsystems was designed to evaluate the performane of a natural language understanding system that was connected to a fuzzy language classification system. Finally, the experiment suggests that a successful language understanding should require an real time interaction between mem andmachine fuzzy provious language classification.

  • PDF

기계학습을 이용한 한국어 대화시스템 도메인 분류 (Machine Learning Based Domain Classification for Korean Dialog System)

  • 정영섭
    • 융합정보논문지
    • /
    • 제9권8호
    • /
    • pp.1-8
    • /
    • 2019
  • 대화시스템은 인간과 컴퓨터의 상호작용에 새로운 패러다임이 되고 있다. 자연어로써 상호작용함으로써 인간은 보다 자연스럽고 편리하게 각종 서비스를 누릴 수 있게 되었다. 대화시스템의 구조는 일반적으로 음성 인식, 자연어 이해, 문맥 파악 등의 여러 모듈의 파이프라인으로 이뤄지는데, 본 연구에서는 자연어 이해 모듈의 도메인 분류 문제를 풀기 위해 convolutional neural network, random forest 등의 기계학습 모델을 비교하였다. 사람이 직접 태깅한 총 7개 서비스 도메인 데이터에 대하여 각 문장의 도메인을 분류하는 실험을 수행하였고 random forest 모델이 F1 score 0.97 이상으로 가장 높은 성능을 달성한 것을 보였다. 향후 다른 기계학습 모델들을 추가 실험함으로써 도메인 분류 성능 개선을 지속할 계획이다.

자연어 질의유형 판별과 응답 추출을 위한 어휘 의미 체계에 관한 연구 (A Study on Work Semantic Categories for Natural Language Question Type Classification and Answer Extraction)

  • 윤성희
    • 한국산학기술학회논문지
    • /
    • 제5권6호
    • /
    • pp.539-545
    • /
    • 2004
  • 자연어 질의를 입력하고 문서로부터 질의에 대한 정답을 추출하여 제공하는 질의응답 시스템에서는 사용자의 질의 의도를 파악하여 질의 유형을 분류하는 과정이 매우 중요하다. 본 논문에서는 질의 유형을 분류하기 위해 복잡한 분류 규칙이나 대용량의 사전 정보를 이용하지 않고 질의의 의도를 나타내는 어휘들을 추출하고 인접 명사들의 의미 정보를 이용하여 질의 및 정답 유형을 결정할 수 있는 방법을 제안한다. 또 동의어 정보와 접미사 정보를 이용하고, 의문사가 생략된 경우 어휘 의미 정보를 이용하여 질의 유형 분류기의 성능을 향상시킬 수 있음을 보인다.

  • PDF

자연어 처리 기반 『상한론(傷寒論)』 변병진단체계(辨病診斷體系) 분류를 위한 기계학습 모델 선정 (Selecting Machine Learning Model Based on Natural Language Processing for Shanghanlun Diagnostic System Classification)

  • 김영남
    • 대한상한금궤의학회지
    • /
    • 제14권1호
    • /
    • pp.41-50
    • /
    • 2022
  • Objective : The purpose of this study is to explore the most suitable machine learning model algorithm for Shanghanlun diagnostic system classification using natural language processing (NLP). Methods : A total of 201 data items were collected from 『Shanghanlun』 and 『Clinical Shanghanlun』, 'Taeyangbyeong-gyeolhyung' and 'Eumyangyeokchahunobokbyeong' were excluded to prevent oversampling or undersampling. Data were pretreated using a twitter Korean tokenizer and trained by logistic regression, ridge regression, lasso regression, naive bayes classifier, decision tree, and random forest algorithms. The accuracy of the models were compared. Results : As a result of machine learning, ridge regression and naive Bayes classifier showed an accuracy of 0.843, logistic regression and random forest showed an accuracy of 0.804, and decision tree showed an accuracy of 0.745, while lasso regression showed an accuracy of 0.608. Conclusions : Ridge regression and naive Bayes classifier are suitable NLP machine learning models for the Shanghanlun diagnostic system classification.

  • PDF

깊은 신경망 기반 대용량 텍스트 데이터 분류 기술 (Large-Scale Text Classification with Deep Neural Networks)

  • 조휘열;김진화;김경민;장정호;엄재홍;장병탁
    • 정보과학회 컴퓨팅의 실제 논문지
    • /
    • 제23권5호
    • /
    • pp.322-327
    • /
    • 2017
  • 문서 분류 문제는 오랜 기간 동안 자연어 처리 분야에서 연구되어 왔다. 우리는 기존 컨볼루션 신경망을 이용했던 연구에서 나아가, 순환 신경망에 기반을 둔 문서 분류를 수행하였고 그 결과를 종합하여 제시하려 한다. 컨볼루션 신경망은 단층 컨볼루션 신경망을 사용했으며, 순환 신경망은 가장 성능이 좋다고 알려져 있는 장기-단기 기억 신경망과 회로형 순환 유닛을 활용하였다. 실험 결과, 분류 정확도는 Multinomial Naïve Bayesian Classifier < SVM < LSTM < CNN < GRU의 순서로 나타났다. 따라서 텍스트 문서 분류 문제는 시퀀스를 고려하는 것 보다는 문서의 feature를 추출하여 분류하는 문제에 가깝다는 것을 확인할 수 있었다. 그리고 GRU가 LSTM보다 문서의 feature 추출에 더 적합하다는 것을 알 수 있었으며 적절한 feature와 시퀀스 정보를 함께 활용할 때 가장 성능이 잘 나온다는 것을 확인할 수 있었다.

Academic Registration Text Classification Using Machine Learning

  • Alhawas, Mohammed S;Almurayziq, Tariq S
    • International Journal of Computer Science & Network Security
    • /
    • 제22권1호
    • /
    • pp.93-96
    • /
    • 2022
  • Natural language processing (NLP) is utilized to understand a natural text. Text analysis systems use natural language algorithms to find the meaning of large amounts of text. Text classification represents a basic task of NLP with a wide range of applications such as topic labeling, sentiment analysis, spam detection, and intent detection. The algorithm can transform user's unstructured thoughts into more structured data. In this work, a text classifier has been developed that uses academic admission and registration texts as input, analyzes its content, and then automatically assigns relevant tags such as admission, graduate school, and registration. In this work, the well-known algorithms support vector machine SVM and K-nearest neighbor (kNN) algorithms are used to develop the above-mentioned classifier. The obtained results showed that the SVM classifier outperformed the kNN classifier with an overall accuracy of 98.9%. in addition, the mean absolute error of SVM was 0.0064 while it was 0.0098 for kNN classifier. Based on the obtained results, the SVM is used to implement the academic text classification in this work.

Building Hybrid Stop-Words Technique with Normalization for Pre-Processing Arabic Text

  • Atwan, Jaffar
    • International Journal of Computer Science & Network Security
    • /
    • 제22권7호
    • /
    • pp.65-74
    • /
    • 2022
  • In natural language processing, commonly used words such as prepositions are referred to as stop-words; they have no inherent meaning and are therefore ignored in indexing and retrieval tasks. The removal of stop-words from Arabic text has a significant impact in terms of reducing the size of a cor- pus text, which leads to an improvement in the effectiveness and performance of Arabic-language processing systems. This study investigated the effectiveness of applying a stop-word lists elimination with normalization as a preprocessing step. The idea was to merge statistical method with the linguistic method to attain the best efficacy, and comparing the effects of this two-pronged approach in reducing corpus size for Ara- bic natural language processing systems. Three stop-word lists were considered: an Arabic Text Lookup Stop-list, Frequency- based Stop-list using Zipf's law, and Combined Stop-list. An experiment was conducted using a selected file from the Arabic Newswire data set. In the experiment, the size of the cor- pus was compared after removing the words contained in each list. The results showed that the best reduction in size was achieved by using the Combined Stop-list with normalization, with a word count reduction of 452930 and a compression rate of 30%.

도서관$\cdot$정보학에서의 인공지능의 응용에 관한 고찰 (Artificial Intelligence Applications in Library and Information Science)

  • 정영미
    • 한국문헌정보학회지
    • /
    • 제14권
    • /
    • pp.67-92
    • /
    • 1987
  • In this paper, artificial intelligence applications in library and information science are reviewed. Especially, natural language processing and expert systems are represented as the two major application areas. In natural language processing, natural language interface systems and .question-answering systems are discussed in detail with some specific examples. In the second part of the paper, online search intermidiary systems, reference expert systems, classification and cataloging expert systems are described as possible expert systems to be developed in libraries and information systems. As a conclusion, implications of the artificial intelligence applications for librarians and information scientists are suggested.

  • PDF

딥러닝을 이용한 언어별 단어 분류 기법 (Language-based Classification of Words using Deep Learning)

  • 듀크;다후다;조인휘
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2021년도 춘계학술발표대회
    • /
    • pp.411-414
    • /
    • 2021
  • One of the elements of technology that has become extremely critical within the field of education today is Deep learning. It has been especially used in the area of natural language processing, with some word-representation vectors playing a critical role. However, some of the low-resource languages, such as Swahili, which is spoken in East and Central Africa, do not fall into this category. Natural Language Processing is a field of artificial intelligence where systems and computational algorithms are built that can automatically understand, analyze, manipulate, and potentially generate human language. After coming to discover that some African languages fail to have a proper representation within language processing, even going so far as to describe them as lower resource languages because of inadequate data for NLP, we decided to study the Swahili language. As it stands currently, language modeling using neural networks requires adequate data to guarantee quality word representation, which is important for natural language processing (NLP) tasks. Most African languages have no data for such processing. The main aim of this project is to recognize and focus on the classification of words in English, Swahili, and Korean with a particular emphasis on the low-resource Swahili language. Finally, we are going to create our own dataset and reprocess the data using Python Script, formulate the syllabic alphabet, and finally develop an English, Swahili, and Korean word analogy dataset.

CNN-based Skip-Gram Method for Improving Classification Accuracy of Chinese Text

  • Xu, Wenhua;Huang, Hao;Zhang, Jie;Gu, Hao;Yang, Jie;Gui, Guan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제13권12호
    • /
    • pp.6080-6096
    • /
    • 2019
  • Text classification is one of the fundamental techniques in natural language processing. Numerous studies are based on text classification, such as news subject classification, question answering system classification, and movie review classification. Traditional text classification methods are used to extract features and then classify them. However, traditional methods are too complex to operate, and their accuracy is not sufficiently high. Recently, convolutional neural network (CNN) based one-hot method has been proposed in text classification to solve this problem. In this paper, we propose an improved method using CNN based skip-gram method for Chinese text classification and it conducts in Sogou news corpus. Experimental results indicate that CNN with the skip-gram model performs more efficiently than CNN-based one-hot method.