• Title/Summary/Keyword: Morpheme


A Study for Used Transaction Analysis System using Big Data (빅데이터를 이용한 중고 거래 분석 시스템 연구)

  • Ahn, Byeongtae
    • Journal of Digital Convergence
    • /
    • v.19 no.6
    • /
    • pp.259-264
    • /
    • 2021
  • Recently, as the number of sites supporting used-goods trading has increased, users want to search a variety of information in real time. This change has enabled a new type of C2C (consumer-to-consumer) transaction in e-commerce. However, since each used-goods trading site has its own characteristics, it is difficult to standardize them as a whole. Therefore, in this paper, we studied a system that provides used-goods transaction data to the user in real time and delivers the desired information quickly. We researched the crawler system necessary for developing an integrated trading system for used goods in Internet e-commerce, and made it possible to provide the information the user wants in a web environment through a defined morpheme analyzer. In this study, we thus designed a system that provides the information users want without requiring them to access multiple used-goods sites.

A Study on Applicability of Machine Learning for Book Classification of Public Libraries: Focusing on Social Science and Arts (공공도서관 도서 분류를 위한 머신러닝 적용 가능성 연구 - 사회과학과 예술분야를 중심으로 -)

  • Kwak, Chul Wan
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.32 no.1
    • /
    • pp.133-150
    • /
    • 2021
  • The purpose of this study is to identify the applicability of machine learning to the classification of books in public libraries based on their titles. Data analysis was performed using Python's scikit-learn library in a Jupyter Notebook on the Anaconda platform. The KoNLPy analyzer and its Okt class were used for Hangul morpheme analysis. The units of analysis were 2,000 title fields and KDC classification class numbers (300 and 600) extracted from the KORMARC records of public libraries. Analysis of the data with six machine learning models showed that machine learning can be applied to book classification. Among the models used, the neural network model had the highest title classification accuracy. The study suggested the need for improving the accuracy of title classification and for research on book titles, tokenization of titles, and stop words.

Predicate Recognition Method using BiLSTM Model and Morpheme Features (BiLSTM 모델과 형태소 자질을 이용한 서술어 인식 방법)

  • Nam, Chung-Hyeon;Jang, Kyung-Sik
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.1
    • /
    • pp.24-29
    • /
    • 2022
  • Semantic role labeling, which is used in various natural language processing fields such as information extraction and question answering systems, is the task of identifying the arguments for a given sentence and predicate. The predicates used as semantic role labeling input are extracted from lexical analysis results such as POS tagging, but this approach cannot extract all linguistic patterns because predicates in Korean take various forms depending on the meaning of the sentence. In this paper, we propose a Korean predicate recognition method using a neural network model with pre-trained embedding models and lexical features. The experiments compare performance across model hyperparameters and with or without the embedding models and lexical features. As a result, we confirm that the performance of the proposed neural network model was 92.63%.

BEHIND CHICKEN RATINGS: An Exploratory Analysis of Yogiyo Reviews Through Text Mining (치킨 리뷰의 이면: 텍스트 마이닝을 통한 리뷰의 탐색적 분석을 중심으로)

  • Kim, Jungyeom;Choi, Eunsol;Yoon, Soohyun;Lee, Youbeen;Kim, Dongwhan
    • The Journal of the Korea Contents Association
    • /
    • v.21 no.11
    • /
    • pp.30-40
    • /
    • 2021
  • Ratings and reviews, despite their growing influence on restaurants' sales and reputation, entail a few limitations due to the burgeoning of reviews and inaccuracies in rating systems. This study explores the texts in reviews and ratings of a delivery application and discovers ways to elevate review credibility and usefulness. Through a text mining method, we concluded that the delivery application 'Yogiyo' has (1) a five-star oriented rating dispersion, (2) a strong positive correlation between rating factors (taste, quantity, and delivery) and (3) distinct part of speech and morpheme proportions depending on review polarity. We created a chicken-specialized negative word dictionary under four main topics and 20 sub-topic classifications after extracting a total of 367 negative words. We provide insights on how the research on delivery app reviews should progress, centered on fried chicken reviews.

Analysis of interest in non-face-to-face medical counseling of modern people in the medical industry (의료 산업에 있어 현대인의 비대면 의학 상담에 대한 관심도 분석 기법)

  • Kang, Yooseong;Park, Jong Hoon;Oh, Hayoung;Lee, Se Uk
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.11
    • /
    • pp.1571-1576
    • /
    • 2022
  • This study aims to analyze the interest of modern people in non-face-to-face medical counseling in the medical industry. Big data was collected from two social platforms: 지식인, a platform that allows users to receive medical counseling from experts, and YouTube. In addition to the top five keywords of telephone counseling, "internal medicine", "general medicine", "department of neurology", "department of mental health", and "pediatrics", a data set was built from each platform with a total of eight search terms, the remaining three being "specialist", "medical counseling", and "health information". Afterwards, pre-processing steps such as morpheme classification, disease extraction, and normalization were performed on the crawled data. Data was visualized with word clouds, line graphs, quarterly graphs, and bar graphs of disease frequency based on word frequency. An emotional classification model was constructed only for the YouTube data, and the performance of GRU-based and BERT-based models was compared.

Improvement of recommendation system using attribute-based opinion mining of online customer reviews

  • Misun Lee;Hyunchul Ahn
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.12
    • /
    • pp.259-266
    • /
    • 2023
  • In this paper, we propose an algorithm that improves the accuracy of collaborative filtering using attribute-based opinion mining (ABOM). For the experiment, a total of 1,227 online consumer reviews of smartphone apps written by domestic smartphone users were analyzed. After morpheme analysis using the KKMA (Kkokkoma) analyzer and emotional word analysis using KOSAC, attributes are extracted using LDA topic modeling, and the topic modeling results for each weighted review are used to add together the collaborative filtering ratings and the sentiment score. MAE, MAPE, and RMSE, which are statistical measures of average prediction error, were used for evaluation. Through experiments, we predicted online customers' app ratings (APP_Score) by combining traditional collaborative filtering with the ABOM technique, which combines LDA attribute extraction and sentiment analysis. The analysis found that rating prediction using collaborative filtering with attribute-based opinion mining was more accurate than rating prediction using traditional collaborative filtering alone.
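
The three error measures named in this abstract can be sketched as follows. This is a minimal illustration of the standard definitions of MAE, MAPE, and RMSE, not the authors' evaluation code; the function names and the toy ratings are assumptions made for the example.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the rating errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical app ratings and predictions, for illustration only.
actual_ratings = [4, 2, 5]
predicted_ratings = [3, 2, 4]

print(mae(actual_ratings, predicted_ratings))
print(mape(actual_ratings, predicted_ratings))
print(rmse(actual_ratings, predicted_ratings))
```

Lower values on all three measures indicate more accurate rating predictions, which is the sense in which two recommenders would be compared.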

Two-Path Language Modeling Considering Word Order Structure of Korean (한국어의 어순 구조를 고려한 Two-Path 언어모델링)

  • Shin, Joong-Hwi;Park, Jae-Hyun;Lee, Jung-Tae;Rim, Hae-Chang
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.8
    • /
    • pp.435-442
    • /
    • 2008
  • The n-gram model is appropriate for languages such as English, in which word order is grammatically rigid. However, it is not suitable for Korean, in which word order is relatively free. Previous work proposed a two-ply HMM that reflected the characteristics of Korean but failed to reflect word-order structures among words. In this paper, we define a new segment unit that combines two words in order to reflect the word-order characteristics of adjacent words that appear in verbal morphemes. Moreover, we propose a two-path language model that estimates probabilities depending on the context, based on the proposed segment unit. Experimental results show that the proposed two-path language model improves perplexity by 25.68% compared to previous Korean language models and reduces perplexity by 94.03% for the prediction of verbal morphemes where words are combined.
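
For readers unfamiliar with the evaluation measure, the sketch below shows how perplexity is computed for a conventional bigram model and how a relative perplexity reduction is expressed as a percentage. It is not the two-path model from the paper: the add-one smoothing, the function names, and the toy corpus are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count contexts and bigrams over tokenized sentences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        vocab.update(padded[1:])  # every token that can be predicted
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def perplexity(corpus, model):
    """exp of the average negative log-probability per predicted token."""
    context_counts, bigram_counts, vocab = model
    log_sum, n = 0.0, 0
    for tokens in corpus:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            # Add-one (Laplace) smoothing, an assumption of this sketch.
            p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))
            log_sum += math.log(p)
            n += 1
    return math.exp(-log_sum / n)


def relative_reduction(baseline, proposed):
    """Perplexity improvement of `proposed` over `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Toy corpus (hypothetical tokens), used for both training and evaluation.
toy_corpus = [["a", "b", "c"], ["a", "c", "b"]]
model = train_bigram(toy_corpus)
print(perplexity(toy_corpus, model))
print(relative_reduction(200.0, 150.0))
```

Because a bigram model conditions only on the immediately preceding token, reordering the same words changes the probabilities it assigns, which is the weakness for free word-order languages that the abstract describes.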

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character based on sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently because they are probabilistic models based on the frequency of each unit in the training set. Recently, with the development of deep learning algorithms, recurrent neural network (RNN) models and long short-term memory (LSTM) models have been widely used for neural language modeling (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for morphologically rich languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that comprises Korean text. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was conducted on Old Testament texts using the deep learning package Keras, based on Theano.
After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters, each paired with the following 21st character as output. In total, 1,023,411 input-output pairs were included in the dataset, and we divided them into training, validation, and test sets in a 70:15:15 proportion. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. As a result, all the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, which were clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm also took the longest to train for both the 3- and 4-layer LSTM models. On average, the 4-layer LSTM model took 69% longer to train than the 3-layer LSTM model. However, its validation loss and perplexity were not significantly improved, and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-layer LSTM model tended to generate sentences closer to natural language than the 3-layer LSTM model. Although there were slight differences in the completeness of the generated sentences between the models, the sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfect grammatically. The results of this study are expected to be widely used for Korean language processing in the fields of language processing and speech recognition, which are the basis of artificial intelligence systems.
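
The data preparation described in this abstract (decomposing Korean text into phoneme-level units, forming 20-character inputs with the 21st character as the target, and splitting 70:15:15) can be sketched as below. This is an assumption-laden illustration rather than the authors' pipeline: it uses the standard Unicode arithmetic for precomposed Hangul syllables and emits conjoining jamo, whereas the abstract does not say how the 74 characters were encoded. All function names are hypothetical.

```python
HANGUL_FIRST = 0xAC00  # first precomposed syllable, "가"
HANGUL_LAST = 0xD7A3   # last precomposed syllable, "힣"


def decompose(text):
    """Split Hangul syllables into conjoining jamo; pass other characters through."""
    units = []
    for ch in text:
        code = ord(ch)
        if HANGUL_FIRST <= code <= HANGUL_LAST:
            # Each syllable index encodes (lead, vowel, tail) with 21 vowels and 28 tails.
            lead, rest = divmod(code - HANGUL_FIRST, 21 * 28)
            vowel, tail = divmod(rest, 28)
            units.append(chr(0x1100 + lead))
            units.append(chr(0x1161 + vowel))
            if tail:  # tail index 0 means "no final consonant"
                units.append(chr(0x11A7 + tail))
        else:
            units.append(ch)
    return units


def make_windows(units, context=20):
    """Pair each run of `context` units with the unit that follows it."""
    return [(units[i:i + context], units[i + context])
            for i in range(len(units) - context)]


def split_70_15_15(samples):
    """Split samples into training, validation, and test sets in order."""
    n_train = len(samples) * 70 // 100
    n_val = len(samples) * 15 // 100
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


units = decompose("한국어 문장 생성")
print(len(units), "phoneme-level units")
pairs = make_windows(list(range(120)), context=20)  # stand-in sequence of 120 units
train, val, test = split_70_15_15(pairs)
print(len(train), len(val), len(test))
```

In practice the pairs would usually be shuffled before splitting and each unit mapped to an integer or one-hot index before being fed to an LSTM; both steps are omitted here for brevity.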

Types of Place Names According to the Named Sources and Those Cultural-Political Meanings (명명 유연성에 따른 지명 유형과 문화정치적 의의)

  • Kim, Sun-Bae
    • Journal of the Korean association of regional geographers
    • /
    • v.17 no.3
    • /
    • pp.270-296
    • /
    • 2011
  • The named source preserved in every place name points to the close relationship between a place name and its place, and it is also a fundamental condition for geographical research on place names. Meanwhile, the named source may be recognized differently depending on which social subjects produce and change the place names. Place names represent and constitute the identity and ideology of diverse social subjects. This aspect is related to cultural politics, which is concerned with conflicts and contestation among different social subjects over the meaning of place names. In particular, the Gongju-Mok Jingwan Area in the Korean peninsula has a long history and a geopolitical location as a borderland and buffer zone. As a result, it has provided many conditions for cultural diversity and power relations, both of which have caused social subjects to contest their social power across space and time and have led to several types of changes in place names. Therefore, this paper aims to investigate the types of place names according to the named source, especially that of the front morpheme of place names, and their cultural-political meanings. These place names are classified into three large groups: physical place names, social place names, and economic place names. These types of place names have represented the place identity and ideology of diverse social subjects, and have also changed along with the power relations among those subjects.


Research on the Development of Facets for Improvement in Searching Records: Focusing on Presidential Records (기록물의 검색 향상을 위한 패싯 개발에 관한 연구 - 대통령기록물을 중심으로 -)

  • Seong, Hyoju;Rieh, Hae-young
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.17 no.2
    • /
    • pp.165-188
    • /
    • 2017
  • As recognition of the importance of user-oriented services grows, finding aids that could improve the effectiveness of searching have received heightened attention. Considering the importance of facets in finding aids for improving search effectiveness and the importance of presidential records in Korea, this study sought to derive facet elements that can be applied to a presidential records retrieval system by analyzing various resources, using presidential records as cases. To derive facet elements based on the characteristics of presidential records, the websites of the National Archives (NARA) and Presidential (Prime Ministers') Archives, as well as their search options, were examined as cases. In addition, the morphemes in each title of the presidential records were analyzed, as well as the terms entered by users of the Presidential Archives Portal of Korea, the terms used in requests for information disclosure to the Presidential Archives in Korea, the search options of the Presidential Archives Portal, and the elements of the description and metadata standards. The significance of this study lies in suggesting a methodology for developing various facets as main elements of finding aids, using presidential records as cases.