• Title/Summary/Keyword: morphemes


A Study on the Linkability of Public Information Using Social Network Analysis (사회 연결망 분석을 활용한 공공데이터 간 연관성에 관한 연구)

  • Jeong, Da Woon;Yi, Mi Sook;Shin, Dong Bin
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.6 / pp.461-470 / 2017
  • In Korea, beginning with the Government 3.0 policy, the use of public data as a driver of economic growth has become a major issue. However, Korea currently does no more than open and provide the accumulated data held in the public domain. To resolve this issue, it is necessary not only to open and provide public information but also to create new information by linking the data and developing related services. This study therefore analyzes the linkability of public information and provides lists of linkable public data. To do so, we first preprocessed the data with respect to its accessibility and workability. Next, we derived the major keywords in the public data through morpheme analysis, and then identified the core keywords (top 10) and their linkable keyword lists through social network analysis. Based on these results, a subsequent study will derive new information by linking the public data and creating various services and information content. Furthermore, practical as well as conceptual linking measures need to be developed, and related legislation must be prepared.

Relationship between Result of Sentiment Analysis and User Satisfaction -The case of Korean Meteorological Administration- (감성분석 결과와 사용자 만족도와의 관계 -기상청 사례를 중심으로-)

  • Kim, In-Gyum;Kim, Hye-Min;Lim, Byunghwan;Lee, Ki-Kwang
    • The Journal of the Korea Contents Association / v.16 no.10 / pp.393-402 / 2016
  • To compensate for the limitations of the satisfaction survey currently conducted by the Korea Meteorological Administration (KMA), sentiment analysis of a social networking service (SNS) can be used. Tweets mentioning 'KMA' that were posted from 2011 to 2014 were collected and then classified, using Naïve Bayes classification, into three sentiments: positive, negative, and neutral. An additional dictionary was built from morphemes that appeared exclusively in the positive, negative, or neutral class under the basic Naïve Bayes classification, which improved the accuracy of the sentiment analysis. When sentiments were classified with the basic Naïve Bayes classifier, the training data were reproduced with about 75% accuracy, whereas classification with the additional dictionary reached 97% accuracy. With the additional dictionary, the sentiments of the verification data were classified with about 75% accuracy. This lower accuracy could be improved by a better dictionary built from more training data that includes diverse weather-related keywords, and by continuous updating of the dictionary. Meanwhile, in contrast to sentiment analysis based on dictionary definitions of individual words, classifying sentiment by the meaning of the whole sentence could explain the increased rate of negative sentiment and the change in satisfaction. Therefore, sentiment analysis of SNS can be considered a useful tool for complementing surveys in the future.
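The classification step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a multinomial Naïve Bayes model with add-one smoothing, uses English tokens as stand-ins for Korean morphemes, and reads the "additional dictionary" as the set of morphemes seen in exactly one sentiment class. The function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Count documents and tokens per class from (tokens, label) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "token_counts": token_counts, "vocab": vocab}


def classify(model, tokens):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["token_counts"][label]
        total_tokens = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # tokens never seen in training are ignored
            score += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def exclusive_morphemes(model):
    """Map each morpheme seen in exactly one class to that class (assumed reading)."""
    result = {}
    for tok in model["vocab"]:
        labels = [lab for lab, c in model["token_counts"].items() if c[tok] > 0]
        if len(labels) == 1:
            result[tok] = labels[0]
    return result


# Toy training data (hypothetical); real input would be morpheme-analyzed tweets.
train = [
    (["forecast", "accurate", "thanks"], "positive"),
    (["forecast", "wrong", "again"], "negative"),
    (["rain", "tomorrow", "forecast"], "neutral"),
]
model = train_naive_bayes(train)
predicted = classify(model, ["forecast", "wrong"])
extra_dictionary = exclusive_morphemes(model)
```

Here `forecast` occurs in all three classes and is therefore left out of `extra_dictionary`, while `wrong` is kept as a negative-only cue.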

Exploratory Research on Automating the Analysis of Scientific Argumentation Using Machine Learning (머신 러닝을 활용한 과학 논변 구성 요소 코딩 자동화 가능성 탐색 연구)

  • Lee, Gyeong-Geon;Ha, Heesoo;Hong, Hun-Gi;Kim, Heui-Baik
    • Journal of The Korean Association For Science Education / v.38 no.2 / pp.219-234 / 2018
  • In this study, we explored the possibility of automating the analysis of elements of scientific argument in the context of a Korean classroom. To gather training data, we collected 990 sentences from science education journals that illustrate the results of coding elements of argumentation according to Toulmin's argumentation structure framework. We extracted 483 sentences as a test data set from transcripts of students' discourse in scientific argumentation activities. The words and morphemes of each argument were analyzed using the Python 'KoNLPy' package and its 'Kkma' module for Korean natural language processing. After constructing the 'argument-morpheme:class' matrix for the 1,473 sentences, five machine learning techniques were applied to generate predictive models relating each sentence to the element of argument to which it corresponded. The accuracy of the predictive models was assessed by comparing their output with the researchers' prior coding and measuring the degree of agreement. The predictive model generated by the k-nearest neighbor (KNN) algorithm showed the highest degree of agreement [54.04% ($\kappa = 0.22$)] when machine learning was performed using the morphemes of each sentence. The KNN model showed higher agreement [55.07% ($\kappa = 0.24$)] when the coding results of the previous sentence were added to the prediction process. The results thus indicate the importance of considering the context of discourse by reflecting the codes of previous sentences in the analysis. The results are significant in that they show the possibility of automating the analysis of students' argumentation activities in Korean by applying machine learning.

A Study on Detecting Fake Reviews Using Machine Learning: Focusing on User Behavior Analysis (머신러닝을 활용한 가짜리뷰 탐지 연구: 사용자 행동 분석을 중심으로)

  • Lee, Min Cheol;Yoon, Hyun Shik
    • Knowledge Management Research / v.21 no.3 / pp.177-195 / 2020
  • Growing public awareness of fake reviews has prompted researchers to propose ways of coping with them, either by analyzing the content of fake reviews or by detecting them through their structural characteristics. This study collected data from Naver blog posts, used a machine learning model to detect the habitual patterns that users follow unconsciously from variables extracted from blogs and blog posts, and applied the technique to predicting fake reviews. Data analysis showed a very strong relationship between the total number of posts registered on the writer's blog and the date on which the post in question was registered. Random Forest was found to be the most suitable model for detecting advertising reviews. If a review is predicted to be an advertisement by the model proposed in this study, it is very likely to be a fake review that violates the guidelines on investigation into markings and advertising regarding recommendation and guarantee in the Law of Marking and Advertising. Instead of analyzing the morphemes in the content of a post, this study analyzes the writer's behavior, collects characteristic data on blogs and blog posts with an automated system rather than by hand, and determines whether a given post is an advertisement. This approach is expected to improve the efficiency and effectiveness of fake review detection.

An Efficient Method for Korean Noun Extraction Using Noun Patterns (명사 출현 특성을 이용한 효율적인 한국어 명사 추출 방법)

  • 이도길;이상주;임해창
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.173-183 / 2003
  • Morphological analysis is the most widely used method for extracting nouns from Korean texts. To extract nouns from each Eojeol, a morphological analyzer performs frequent dictionary lookups and applies many morphonological rules, and therefore requires many operations. Moreover, a morphological analyzer generates all the possible morphological interpretations (sequences of morphemes) of a given Eojeol, which may be unnecessary from the point of view of noun extraction. To reduce this unnecessary computation, this paper proposes a method for Korean noun extraction that exploits noun occurrence characteristics. Noun patterns denote the conditions under which an Eojeol does or does not include a noun; these serve as positive cues and negative cues, respectively. When exclusive information is used as a negative cue, the search space of morphological analysis can be reduced by ignoring Eojeols that do not include nouns. Post-noun syllable sequences (PNSS), used as positive cues, make it possible to extract nouns simply by checking the part of the Eojeol preceding the PNSS, and to guess unknown nouns. In addition, morphonological information is used in place of many morphonological rules to recover the lexical form from its altered surface form. Experimental results show that the proposed method is faster than other systems based on morphological analysis without losing accuracy.
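The cue-based idea in this abstract can be illustrated with a small sketch. It is a simplification under stated assumptions and does not reproduce the paper's patterns: the post-noun syllable sequences are a handful of common particles chosen for illustration, the negative cues are modeled as a set of whole Eojeols known to contain no noun, and recovery of altered surface forms is not attempted. All names and lists are hypothetical.

```python
# Hypothetical cue lists for illustration only; the paper's actual
# patterns are not given in the abstract.
POST_NOUN_SEQUENCES = ["에서는", "에서", "으로", "은", "는", "이", "가", "을", "를", "의", "에"]
NO_NOUN_EOJEOLS = {"많이", "그리고", "매우"}  # negative cues: Eojeols without nouns


def extract_noun(eojeol):
    """Return the noun candidate of one Eojeol, or None if no cue applies."""
    if eojeol in NO_NOUN_EOJEOLS:
        return None  # negative cue: skip without any further analysis
    # Positive cue: try the longest post-noun sequence first.
    for seq in sorted(POST_NOUN_SEQUENCES, key=len, reverse=True):
        if eojeol.endswith(seq) and len(eojeol) > len(seq):
            return eojeol[: -len(seq)]  # the part preceding the sequence
    return None


def extract_nouns(text):
    """Split text into Eojeols on whitespace and collect noun candidates."""
    candidates = (extract_noun(eojeol) for eojeol in text.split())
    return [noun for noun in candidates if noun is not None]


nouns = extract_nouns("학교에서 친구를 많이 만났다")
print(nouns)  # ['학교', '친구']
```

Because no dictionary lookup is involved, a prefix is returned even when the noun is unknown, which mirrors the abstract's remark about guessing unknown nouns. The negative cue for `많이` prevents the particle `이` from yielding a spurious candidate.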

On Doublets (쌍형어에 대하여)

  • Yi, Eun-Gyeong
    • Cross-Cultural Studies / v.50 / pp.425-451 / 2018
  • In this paper, we examine the issues raised in discussions of doublets. By definition, the term doublet generally refers to a pair of words that share a common etymon, but it is also applied to a pair of words or grammatical morphemes that have the same meaning and similar forms. We show that the typical doublet is a pair of words with a common etymology. Doublets can be divided into subtypes according to the similarities or differences in their meaning or form. The type furthest from the typical doublet is a pair of words that do not share a common etymon but have the same meaning and are similar in form. The second issue is whether doublets include only words. For example, if some josas (postpositions or particles) share a common etymon, they can be accepted as a kind of doublet. Suffixes may likewise be recognized as doublets if they share a common etymon. Alternatively, it is not necessary to recognize the suffixes themselves as doublets, because the derivatives formed with those suffixes can be accepted as doublets. In the case of endings, a pair of endings that have the same meaning and a common etymon may be recognized as a doublet. Otherwise, the word forms to which the endings attach can be accepted as doublets. However, because endings in Korean typically have syntactic properties, the endings themselves, rather than the words that carry them, should be considered doublets. Finally, we conclude that there is room for debate as to whether stem doublets or ending doublets are lexical items in the lexicon. They can be regarded as plural underlying forms and deserve further research.

Characteristics of Narrative Writing in Normal Aging: Story Grammar and Syntactic Structure (노년층의 글쓰기 특성 -이야기문법과 구문구조)

  • Kim, Hyeon Ah;Won, Sae Rom;Lee, Bo Eun;Yoon, Ji Hye
    • 재활복지 / v.21 no.1 / pp.193-212 / 2017
  • The elderly often produce irrelevant speech and go off-topic more easily than the young; they also generate fewer syntactic structures and make errors with grammatical morphemes. In particular, the elderly might have more difficulty with writing, since it requires more complex cognitive processes than storytelling. The participants in this study were 32 young people and 32 older people. They were asked to write a short story based on a Korean fairy tale ('Heungbu Nolbu'). The data were analyzed in terms of narrative composition and syntactic structure. The study revealed the following. First, in terms of composition, the elderly group showed a significantly lower total number of story grammar elements and episodes, and produced more off-topic statements. Second, in terms of syntax, although there was no significant difference between the two groups in the number of complex sentences produced, the elderly group generated more inadequate cohesive devices and used fewer relative and adverbial clauses. These findings suggest that the elderly tend to produce more off-topic statements when performing the task and show decreased coherence through their lower use of relative and adverbial clauses. However, this study also found that the elderly were able to write more complex and longer sentences using visual feedback.

A Study of Relationship Derivation Technique using object extraction Technique (개체추출기법을 이용한 관계성 도출기법)

  • Kim, Jong-hee;Lee, Eun-seok;Kim, Jeong-su;Park, Jong-kook;Kim, Jong-bae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.309-311 / 2014
  • Despite increasing demand for big data applications based on the analysis of scattered unstructured data, few relevant studies have been reported. Accordingly, the present study proposes a technique that enables sentence-based semantic analysis by extracting objects from collected web information and automatically analyzing the relationships between those objects with collective intelligence and language processing technology. Specifically, the collected information is stored in a DBMS in structured form, and then its morpheme and feature information is analyzed. The morphemes obtained are classified into objects of interest, marginal objects, and objects of non-interest. Then, with an inter-object attribute recognition technique, the relationships between objects are analyzed in terms of their degree, scope, and nature. As a result, the relevance between pieces of information is analyzed on the basis of certain keywords, using an inter-object relationship extraction technique that can determine positivity and negativity. The study also proposes a method for designing a system suited to real-time, large-capacity processing and applicable to high value-added services.


Automatic Classification and Vocabulary Analysis of Political Bias in News Articles by Using Subword Tokenization (부분 단어 토큰화 기법을 이용한 뉴스 기사 정치적 편향성 자동 분류 및 어휘 분석)

  • Cho, Dan Bi;Lee, Hyun Young;Jung, Won Sup;Kang, Seung Shik
    • KIPS Transactions on Software and Data Engineering / v.10 no.1 / pp.1-8 / 2021
  • Political news articles show polarized and biased characteristics, such as conservative and liberal leanings, which is called political bias. We constructed a keyword-based dataset to classify the bias of news articles. Most embedding research represents a sentence as a sequence of morphemes. In our work, we expect the number of unknown tokens to be reduced if sentences are composed of subwords segmented by a language model. We propose a document embedding model with subword tokenization and apply it to an SVM and a feedforward neural network to classify political bias. Compared with the document embedding model based on morphological analysis, the document embedding model with subwords showed the highest accuracy, at 78.22%. We also confirmed that subword tokenization reduced the number of unknown tokens. Using the best-performing embedding model in our bias classification task, we extracted keywords associated with politicians. The bias of the keywords was verified by their average similarity to the vectors of politicians from each political tendency.

Analysis of User Reviews of Running Applications Using Text Mining: Focusing on Nike Run Club and Runkeeper (텍스트마이닝을 활용한 러닝 어플리케이션 사용자 리뷰 분석: Nike Run Club과 Runkeeper를 중심으로)

  • Gimun Ryu;Ilgwang Kim
    • Journal of Industrial Convergence / v.22 no.4 / pp.11-19 / 2024
  • The purpose of this study was to analyze user reviews of running applications using text mining. The analysis data were user reviews of Nike Run Club and Runkeeper in the Google Play Store, collected with the selenium package for Python 3; morphemes were separated with the OKT analyzer, keeping only Korean nouns. After morpheme separation, we created a rankNL dictionary to remove stopwords. To analyze the data, we used TF, TF-IDF, and LDA topic modeling. The results of this study are as follows. First, the keywords 'record', 'app', and 'workout' were identified as the top keywords in the user reviews of Nike Run Club and Runkeeper, and their rankings differed between TF and TF-IDF. Second, LDA topic modeling identified the topics 'basic items', 'additional features', 'errors', and 'location-based data' for Nike Run Club, and the topics 'errors', 'voice function', 'running data', 'benefits', and 'motivation' for Runkeeper. Based on these results, it is recommended that errors be fixed and improvements made to strengthen the competitiveness of the applications.
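The difference between TF and TF-IDF rankings noted in this abstract can be illustrated with a short sketch. It assumes an unsmoothed inverse document frequency, log(N / df), summed over documents; the abstract does not state which TF-IDF variant the authors used, and the toy reviews below are hypothetical stand-ins for noun lists produced by a morphological analyzer.

```python
import math
from collections import Counter


def term_frequency(docs):
    """Total count of each term across all documents."""
    tf = Counter()
    for doc in docs:
        tf.update(doc)
    return tf


def tf_idf(docs):
    """Corpus-level TF-IDF: sum over documents of count * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            scores[term] += count * math.log(n_docs / doc_freq[term])
    return scores


# Hypothetical reviews, already reduced to nouns.
reviews = [
    ["record", "app", "error"],
    ["record", "workout", "app"],
    ["record", "gps", "error"],
]
tf = term_frequency(reviews)
scores = tf_idf(reviews)
```

In this toy corpus `record` is the most frequent term, yet its TF-IDF score is zero because it appears in every review, while the rarer `workout` and `gps` score highest. This is one way the two rankings can diverge.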