• Title/Summary/Keyword: Morpheme Analysis

Search Result 122, Processing Time 0.024 seconds

A Study on Applicability of Machine Learning for Book Classification of Public Libraries: Focusing on Social Science and Arts (공공도서관 도서 분류를 위한 머신러닝 적용 가능성 연구 - 사회과학과 예술분야를 중심으로 -)

  • Kwak, Chul Wan
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.32 no.1
    • /
    • pp.133-150
    • /
    • 2021
  • The purpose of this study is to identify the applicability of machine learning targeting titles in the classification of books in public libraries. Data analysis was performed using Python's scikit-learn library through the Jupiter notebook of the Anaconda platform. KoNLPy analyzer and Okt class were used for Hangul morpheme analysis. The units of analysis were 2,000 title fields and KDC classification class numbers (300 and 600) extracted from the KORMARC records of public libraries. As a result of analyzing the data using six machine learning models, it showed a possibility of applying machine learning to book classification. Among the models used, the neural network model has the highest accuracy of title classification. The study suggested the need for improving the accuracy of title classification, the need for research on book titles, tokenization of titles, and stop words.

Functional Expansion of Morphological Analyzer Based on Longest Phrase Matching For Efficient Korean Parsing (효율적인 한국어 파싱을 위한 최장일치 기반의 형태소 분석기 기능 확장)

  • Lee, Hyeon-yoeng;Lee, Jong-seok;Kang, Byeong-do;Yang, Seung-weon
    • Journal of Digital Contents Society
    • /
    • v.17 no.3
    • /
    • pp.203-210
    • /
    • 2016
  • Korean is free of omission of sentence elements and modifying scope, so managing it on morphological analyzer is better than parser. In this paper, we propose functional expansion methods of the morphological analyzer to ease the burden of parsing. This method is a longest phrase matching method. When the series of several morpheme have one syntax category by processing of Unknown-words, Compound verbs, Compound nouns, Numbers and Symbols, our method combines them into a syntactic unit. And then, it is to treat by giving them a semantic features as syntax unit. The proposed morphological analysis method removes unnecessary morphological ambiguities and deceases results of morphological analysis, so improves accuracy of tagger and parser. By empirical results, we found that our method deceases 73.4% of Parsing tree and 52.4% of parsing time on average.

Automatic Generation of Pronunciation Variants for Korean Continuous Speech Recognition (한국어 연속음성 인식을 위한 발음열 자동 생성)

  • 이경님;전재훈;정민화
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.2
    • /
    • pp.35-43
    • /
    • 2001
  • Many speech recognition systems have used pronunciation lexicon with possible multiple phonetic transcriptions for each word. The pronunciation lexicon is of often manually created. This process requires a lot of time and efforts, and furthermore, it is very difficult to maintain consistency of lexicon. To handle these problems, we present a model based on morphophon-ological analysis for automatically generating Korean pronunciation variants. By analyzing phonological variations frequently found in spoken Korean, we have derived about 700 phonemic contexts that would trigger the multilevel application of the corresponding phonological process, which consists of phonemic and allophonic rules. In generating pronunciation variants, morphological analysis is preceded to handle variations of phonological words. According to the morphological category, a set of tables reflecting phonemic context is looked up to generate pronunciation variants. Our experiments show that the proposed model produces mostly correct pronunciation variants of phonological words. Then we estimated how useful the pronunciation lexicon and training phonetic transcription using this proposed systems.

  • PDF

Implementation of A Plagiarism Detecting System with Sentence and Syntactic Word Similarities (문장 및 어절 유사도를 이용한 표절 탐지 시스템 구현)

  • Maeng, Joosoo;Park, Ji Su;Shon, Jin Gon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.8 no.3
    • /
    • pp.109-114
    • /
    • 2019
  • The similarity detecting method that is basically used in most plagiarism detecting systems is to use the frequency of shared words based on morphological analysis. However, this method has limitations on detecting accurate degree of similarity, especially when similar words concerning the same topics are used, sentences are partially separately excerpted, or postpositions and endings of words are similar. In order to overcome this problem, we have designed and implemented a plagiarism detecting system that provides more reliable similarity information by measuring sentence similarity and syntactic word similarity in addition to the conventional word similarity. We have carried out a comparison of on our system with a conventional system using only word similarity. The comparative experiment has shown that our system can detect plagiarized document that the conventional system can detect or cannot.

A Study for Used Transaction Analysis System using Big Data (빅데이터를 이용한 중고 거래 분석 시스템 연구)

  • Ahn, Byeongtae
    • Journal of Digital Convergence
    • /
    • v.19 no.6
    • /
    • pp.259-264
    • /
    • 2021
  • Recently, as the number of used trading sites supporting used trading increases, users want to search for a variety of information in real time. This new change has enabled a new type of C2C (Commerce to Commerce) transaction in the e-commerce base. However, since each used trading site has its own characteristics, it is difficult to standardize the whole. Therefore, in this paper, we studied a system that provides the transaction data used by the user in real time and provides the desired information quickly. In this paper, we researched the crawler system necessary for the development of the integrated trading system for used goods through Internet e-commerce, and made it possible to provide information in the web environment desired by the user through the defined morpheme analyzer. Therefore, in this study, we designed a system that provides information desired by users without accessing various used goods sites.

BEHIND CHICKEN RATINGS: An Exploratory Analysis of Yogiyo Reviews Through Text Mining (치킨 리뷰의 이면: 텍스트 마이닝을 통한 리뷰의 탐색적 분석을 중심으로)

  • Kim, Jungyeom;Choi, Eunsol;Yoon, Soohyun;Lee, Youbeen;Kim, Dongwhan
    • The Journal of the Korea Contents Association
    • /
    • v.21 no.11
    • /
    • pp.30-40
    • /
    • 2021
  • Ratings and reviews, despite their growing influence on restaurants' sales and reputation, entail a few limitations due to the burgeoning of reviews and inaccuracies in rating systems. This study explores the texts in reviews and ratings of a delivery application and discovers ways to elevate review credibility and usefulness. Through a text mining method, we concluded that the delivery application 'Yogiyo' has (1) a five-star oriented rating dispersion, (2) a strong positive correlation between rating factors (taste, quantity, and delivery) and (3) distinct part of speech and morpheme proportions depending on review polarity. We created a chicken-specialized negative word dictionary under four main topics and 20 sub-topic classifications after extracting a total of 367 negative words. We provide insights on how the research on delivery app reviews should progress, centered on fried chicken reviews.

Analysis of interest in non-face-to-face medical counseling of modern people in the medical industry (의료 산업에 있어 현대인의 비대면 의학 상담에 대한 관심도 분석 기법)

  • Kang, Yooseong;Park, Jong Hoon;Oh, Hayoung;Lee, Se Uk
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.11
    • /
    • pp.1571-1576
    • /
    • 2022
  • This study aims to analyze the interest of modern people in non-face-to-face medical counseling in the medical industrys. Big data was collected on two social platforms, 지식인, a platform that allows experts to receive medical counseling, and YouTube. In addition to the top five keywords of telephone counseling, "internal medicine", "general medicine", "department of neurology", "department of mental health", and "pediatrics", a data set was built from each platform with a total of eight search terms: "specialist", "medical counseling", and "health information". Afterwards, pre-processing processes such as morpheme classification, disease extraction, and normalization were performed based on the crawled data. Data was visualized with word clouds, broken line graphs, quarterly graphs, and bar graphs by disease frequency based on word frequency. An emotional classification model was constructed only for YouTube data, and the performance of GRU and BERT-based models was compared.

Study on the Methodology for Extracting Information from SNS Using a Sentiment Analysis (SNS 감성분석을 이용한 정보 추출 방법론에 관한 연구)

  • Hong, Doopyo;Jeong, Harim;Park, Sangmin;Han, Eum;Kim, Honghoi;Yun, Ilsoo
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.16 no.6
    • /
    • pp.141-155
    • /
    • 2017
  • As the use of SNS becomes more active, many people are posting their thoughts about specific events in their SNS in the form of text. As a result, SNS is used in various fields such as finance and distribution to conduct service satisfaction surveys and consumer monitoring. However, in the transportation area, there are not enough cases to utilize unstructured data analysis such as emotional analysis. In this study, we developed an emotional analysis methodology that can be used in transportation by using highway VOC data, which is atypical data collected by Korea Expressway Corporation. The developed methodology consists of morpheme analysis, emotional dictionary construction, and emotional discrimination of the collected unstructured data. The developed methodology was verified using highway related tweet data. As a result of the analysis, it can be guessed that many information and information about the construction and the accident were related to the highway during the analysis period. Also, it seems that users complain about the delay caused by construction and accident.

How the Title of Investment Strategy Report Affects Stock Price Forecast: Using Text Mining Method (투자전략 보고서의 제목이 주가 예측에 미치는 영향: 텍스트마이닝 중심으로)

  • Jang, Joon-Kyu;Lee, Kyu Hyun;Lee, Zoonky
    • The Journal of Bigdata
    • /
    • v.1 no.2
    • /
    • pp.21-34
    • /
    • 2016
  • There are various investment strategy reports available online, prepared by many financial analysts. If the correlation between the title of the report and analyst forecast can be found, we can tell from the title whether analyst' forecast will be reliable or not. The objective of this study is to see the correlation between the title of analyst investment strategy report and the actual result of forecast by using the Text Mining technique. The result of actual analysis showed that "strong buy and sell call" appeared in the title lead the higher accuracy of analyst forecast and fulfillment ratio. The results that potential investors can get better information by reading the title of the analyst report. We hope that this study could be the basis for new methodologies in this area.

  • PDF

Statistical Approach to Sentiment Classification using MapReduce (맵리듀스를 이용한 통계적 접근의 감성 분류)

  • Kang, Mun-Su;Baek, Seung-Hee;Choi, Young-Sik
    • Science of Emotion and Sensibility
    • /
    • v.15 no.4
    • /
    • pp.425-440
    • /
    • 2012
  • As the scale of the internet grows, the amount of subjective data increases. Thus, A need to classify automatically subjective data arises. Sentiment classification is a classification of subjective data by various types of sentiments. The sentiment classification researches have been studied focused on NLP(Natural Language Processing) and sentiment word dictionary. The former sentiment classification researches have two critical problems. First, the performance of morpheme analysis in NLP have fallen short of expectations. Second, it is not easy to choose sentiment words and determine how much a word has a sentiment. To solve these problems, this paper suggests a combination of using web-scale data and a statistical approach to sentiment classification. The proposed method of this paper is using statistics of words from web-scale data, rather than finding a meaning of a word. This approach differs from the former researches depended on NLP algorithms, it focuses on data. Hadoop and MapReduce will be used to handle web-scale data.

  • PDF