• Title/Summary/Keyword: natural language classification

Search Result 191, Processing Time 0.029 seconds

A Study of Improving the Flexibility and Effectiveness of Natural Anguage Understanding Considering Natural Language Classification Methodologies (Machine에 의한 자연 언어 이해의 효과성 및 탄력성 중대를 위한 자연언어 이해 기법과 분류 기법과 연결적 통합 사용에 대한 연구)

  • 이현부
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.1 no.3
    • /
    • pp.20-32
    • /
    • 1991
  • This study seeks a way a way of dealing with unformatted natural language considering fuzzy set theory. The goal of the study is to establish a framework of an effective language understanding system that is linked to language classification system This study has found that languate understanding is strongly influenced by the language classification. The understanding of language. This study shows that the precision of language classification depends upon the way of how the language is classified in advance. In this study, a fuzzy logic was used to improve the precision of language classification. It was considered that the fuzzy logic might be albe to distinctively classify nuatural language texts into pretinent homogenious groups where contents of the language were identical. Accordingly, in the study, it was expected that classification of language were precisely classified by the fuzzy logic. An experimentalsystems was designed to evaluate the performane of a natural language understanding system that was connected to a fuzzy language classification system. Finally, the experiment suggests that a successful language understanding should require an real time interaction between mem andmachine fuzzy provious language classification.

  • PDF

Machine Learning Based Domain Classification for Korean Dialog System (기계학습을 이용한 한국어 대화시스템 도메인 분류)

  • Jeong, Young-Seob
    • Journal of Convergence for Information Technology
    • /
    • v.9 no.8
    • /
    • pp.1-8
    • /
    • 2019
  • Dialog system is becoming a new dominant interaction way between human and computer. It allows people to be provided with various services through natural language. The dialog system has a common structure of a pipeline consisting of several modules (e.g., speech recognition, natural language understanding, and dialog management). In this paper, we tackle a task of domain classification for the natural language understanding module by employing machine learning models such as convolutional neural network and random forest. For our dataset of seven service domains, we showed that the random forest model achieved the best performance (F1 score 0.97). As a future work, we will keep finding a better approach for domain classification by investigating other machine learning models.

A Study on Work Semantic Categories for Natural Language Question Type Classification and Answer Extraction (자연어 질의유형 판별과 응답 추출을 위한 어휘 의미 체계에 관한 연구)

  • Yoon Sung-Hee
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.5 no.6
    • /
    • pp.539-545
    • /
    • 2004
  • For question answering system that extracts an answer and output to user‘s natural language question, a process of question type classification from user’s natural language query is very important. This paper proposes a question and answer type classifier using the interrogatives and word semantic categories instead of complicated classifying rules and huge dictionaries. Synonyms and postfix information are also used for question type classification. Experiments show that the semantic categories are helpful for question type classifying without interrogatives.

  • PDF

Selecting Machine Learning Model Based on Natural Language Processing for Shanghanlun Diagnostic System Classification (자연어 처리 기반 『상한론(傷寒論)』 변병진단체계(辨病診斷體系) 분류를 위한 기계학습 모델 선정)

  • Young-Nam Kim
    • 대한상한금궤의학회지
    • /
    • v.14 no.1
    • /
    • pp.41-50
    • /
    • 2022
  • Objective : The purpose of this study is to explore the most suitable machine learning model algorithm for Shanghanlun diagnostic system classification using natural language processing (NLP). Methods : A total of 201 data items were collected from 『Shanghanlun』 and 『Clinical Shanghanlun』, 'Taeyangbyeong-gyeolhyung' and 'Eumyangyeokchahunobokbyeong' were excluded to prevent oversampling or undersampling. Data were pretreated using a twitter Korean tokenizer and trained by logistic regression, ridge regression, lasso regression, naive bayes classifier, decision tree, and random forest algorithms. The accuracy of the models were compared. Results : As a result of machine learning, ridge regression and naive Bayes classifier showed an accuracy of 0.843, logistic regression and random forest showed an accuracy of 0.804, and decision tree showed an accuracy of 0.745, while lasso regression showed an accuracy of 0.608. Conclusions : Ridge regression and naive Bayes classifier are suitable NLP machine learning models for the Shanghanlun diagnostic system classification.

  • PDF

Large-Scale Text Classification with Deep Neural Networks (깊은 신경망 기반 대용량 텍스트 데이터 분류 기술)

  • Jo, Hwiyeol;Kim, Jin-Hwa;Kim, Kyung-Min;Chang, Jeong-Ho;Eom, Jae-Hong;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.5
    • /
    • pp.322-327
    • /
    • 2017
  • The classification problem in the field of Natural Language Processing has been studied for a long time. Continuing forward with our previous research, which classifies large-scale text using Convolutional Neural Networks (CNN), we implemented Recurrent Neural Networks (RNN), Long-Short Term Memory (LSTM) and Gated Recurrent Units (GRU). The experiment's result revealed that the performance of classification algorithms was Multinomial Naïve Bayesian Classifier < Support Vector Machine (SVM) < LSTM < CNN < GRU, in order. The result can be interpreted as follows: First, the result of CNN was better than LSTM. Therefore, the text classification problem might be related more to feature extraction problem than to natural language understanding problems. Second, judging from the results the GRU showed better performance in feature extraction than LSTM. Finally, the result that the GRU was better than CNN implies that text classification algorithms should consider feature extraction and sequential information. We presented the results of fine-tuning in deep neural networks to provide some intuition regard natural language processing to future researchers.

Academic Registration Text Classification Using Machine Learning

  • Alhawas, Mohammed S;Almurayziq, Tariq S
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.1
    • /
    • pp.93-96
    • /
    • 2022
  • Natural language processing (NLP) is utilized to understand a natural text. Text analysis systems use natural language algorithms to find the meaning of large amounts of text. Text classification represents a basic task of NLP with a wide range of applications such as topic labeling, sentiment analysis, spam detection, and intent detection. The algorithm can transform user's unstructured thoughts into more structured data. In this work, a text classifier has been developed that uses academic admission and registration texts as input, analyzes its content, and then automatically assigns relevant tags such as admission, graduate school, and registration. In this work, the well-known algorithms support vector machine SVM and K-nearest neighbor (kNN) algorithms are used to develop the above-mentioned classifier. The obtained results showed that the SVM classifier outperformed the kNN classifier with an overall accuracy of 98.9%. in addition, the mean absolute error of SVM was 0.0064 while it was 0.0098 for kNN classifier. Based on the obtained results, the SVM is used to implement the academic text classification in this work.

Building Hybrid Stop-Words Technique with Normalization for Pre-Processing Arabic Text

  • Atwan, Jaffar
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.7
    • /
    • pp.65-74
    • /
    • 2022
  • In natural language processing, commonly used words such as prepositions are referred to as stop-words; they have no inherent meaning and are therefore ignored in indexing and retrieval tasks. The removal of stop-words from Arabic text has a significant impact in terms of reducing the size of a cor- pus text, which leads to an improvement in the effectiveness and performance of Arabic-language processing systems. This study investigated the effectiveness of applying a stop-word lists elimination with normalization as a preprocessing step. The idea was to merge statistical method with the linguistic method to attain the best efficacy, and comparing the effects of this two-pronged approach in reducing corpus size for Ara- bic natural language processing systems. Three stop-word lists were considered: an Arabic Text Lookup Stop-list, Frequency- based Stop-list using Zipf's law, and Combined Stop-list. An experiment was conducted using a selected file from the Arabic Newswire data set. In the experiment, the size of the cor- pus was compared after removing the words contained in each list. The results showed that the best reduction in size was achieved by using the Combined Stop-list with normalization, with a word count reduction of 452930 and a compression rate of 30%.

Artificial Intelligence Applications in Library and Information Science (도서관$\cdot$정보학에서의 인공지능의 응용에 관한 고찰)

  • Chung Young Mee
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.14
    • /
    • pp.67-92
    • /
    • 1987
  • In this paper, artificial intelligence applications in library and information science are reviewed. Especially, natural language processing and expert systems are represented as the two major application areas. In natural language processing, natural language interface systems and .question-answering systems are discussed in detail with some specific examples. In the second part of the paper, online search intermidiary systems, reference expert systems, classification and cataloging expert systems are described as possible expert systems to be developed in libraries and information systems. As a conclusion, implications of the artificial intelligence applications for librarians and information scientists are suggested.

  • PDF

Language-based Classification of Words using Deep Learning (딥러닝을 이용한 언어별 단어 분류 기법)

  • Zacharia, Nyambegera Duke;Dahouda, Mwamba Kasongo;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.05a
    • /
    • pp.411-414
    • /
    • 2021
  • One of the elements of technology that has become extremely critical within the field of education today is Deep learning. It has been especially used in the area of natural language processing, with some word-representation vectors playing a critical role. However, some of the low-resource languages, such as Swahili, which is spoken in East and Central Africa, do not fall into this category. Natural Language Processing is a field of artificial intelligence where systems and computational algorithms are built that can automatically understand, analyze, manipulate, and potentially generate human language. After coming to discover that some African languages fail to have a proper representation within language processing, even going so far as to describe them as lower resource languages because of inadequate data for NLP, we decided to study the Swahili language. As it stands currently, language modeling using neural networks requires adequate data to guarantee quality word representation, which is important for natural language processing (NLP) tasks. Most African languages have no data for such processing. The main aim of this project is to recognize and focus on the classification of words in English, Swahili, and Korean with a particular emphasis on the low-resource Swahili language. Finally, we are going to create our own dataset and reprocess the data using Python Script, formulate the syllabic alphabet, and finally develop an English, Swahili, and Korean word analogy dataset.

CNN-based Skip-Gram Method for Improving Classification Accuracy of Chinese Text

  • Xu, Wenhua;Huang, Hao;Zhang, Jie;Gu, Hao;Yang, Jie;Gui, Guan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.12
    • /
    • pp.6080-6096
    • /
    • 2019
  • Text classification is one of the fundamental techniques in natural language processing. Numerous studies are based on text classification, such as news subject classification, question answering system classification, and movie review classification. Traditional text classification methods are used to extract features and then classify them. However, traditional methods are too complex to operate, and their accuracy is not sufficiently high. Recently, convolutional neural network (CNN) based one-hot method has been proposed in text classification to solve this problem. In this paper, we propose an improved method using CNN based skip-gram method for Chinese text classification and it conducts in Sogou news corpus. Experimental results indicate that CNN with the skip-gram model performs more efficiently than CNN-based one-hot method.