• Title/Summary/Keyword: Morpheme Analysis

Improvement of recommendation system using attribute-based opinion mining of online customer reviews

  • Misun Lee;Hyunchul Ahn
    • Journal of the Korea Society of Computer and Information / v.28 no.12 / pp.259-266 / 2023
  • In this paper, we propose an algorithm that improves the accuracy of collaborative filtering (CF) using attribute-based opinion mining (ABOM). The experiment used 1,227 online consumer reviews of smartphone apps written by domestic smartphone users. After morpheme analysis with the KKMA (Kkokkoma) analyzer and emotional word analysis with KOSAC, attributes are extracted using LDA topic modeling, and the weighted topic modeling results for each review are used to combine the collaborative filtering rating with the sentiment score. Prediction error was evaluated with MAE, MAPE, and RMSE, statistical measures of a model's average prediction error. In the experiments, we predicted online customers' app ratings (APP_Score) by combining traditional collaborative filtering with the ABOM technique, which combines LDA attribute extraction and sentiment analysis. The analysis showed that ratings predicted by CF with attribute-based opinion mining were more accurate than those predicted by traditional collaborative filtering.
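
The three error measures named in this abstract have standard definitions, which can be sketched as follows. This is a minimal illustration, not the authors' evaluation code; the function names and the sample ratings are hypothetical, and MAPE is reported here as a percentage under the assumption that no actual rating is zero.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical app ratings, for illustration only (not data from the paper).
actual = [4.0, 2.0, 5.0, 1.0]
predicted = [3.0, 2.0, 4.0, 2.0]

print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

Lower values on all three measures indicate more accurate rating predictions, which is the sense in which the abstract compares the two recommenders.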

Analysis of Phonological Reduction in Conversational Japanese (현대일본어의 회화문에 나타난 축약형의 음운론적 분석)

  • Choi Young-sook;Sato Shigeru;Pahk Hy-tay
    • Proceedings of the KSPS conference / 1996.10a / pp.198-206 / 1996
  • Using eighteen text materials from various genres of present-day Japanese, we collected phonologically reduced forms frequently observed in conversational Japanese and classified them in search of a unified explanation of phonological reduction phenomena. We found 7,516 cases of reduced forms, which we divided into 43 categories according to the types of phonological change they have undergone. The general tendency is that deletion and fusion of a phoneme or an entire syllable take place frequently, resulting in a decrease in the number of syllables. Typical examples frequently observed throughout the materials are: /noda/ → /nda/, /teiru/ → /teru/, /dewa/ → /zja/, /tesimau/ → /cjau/. From a morphosyntactic point of view, phonological reduction often occurs at NP and VP morpheme boundaries. The following findings are drawn from phonological observations of reduction. (1) Vowels are more easily deleted than consonants. (2) Bilabials (/m/, /b/, and /w/) are the most likely candidates for deletion. (3) In a concatenation of vowels, closed vowels are absorbed into open vowels, or two adjacent vowels combine to create another vowel, in which case reconstruction of the original sequence is not always predictable. (4) Alveolars are palatalized under the influence of front vowels. (5) Regressive assimilation takes place in a syllable starting with ill, changing the entire syllable into a phonological choked sound or a syllabic nasal, depending on the voicing of the following phoneme.
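
The four reduction pairs quoted in this abstract can be treated as simple rewrite rules over romanized text. The sketch below is only an illustration of that idea, not the authors' classification procedure: the function name and the example words are hypothetical, and plain substring replacement ignores the morpheme-boundary conditions the abstract mentions, so it would over-apply on real text.

```python
# Reduction pairs quoted in the abstract (full form -> reduced form).
# Longer patterns are listed first so they are tried before shorter ones.
REDUCTIONS = [
    ("tesimau", "cjau"),
    ("teiru", "teru"),
    ("noda", "nda"),
    ("dewa", "zja"),
]

def reduce_form(romanized):
    """Apply each rewrite rule wherever its full form occurs (naive matching)."""
    for full, reduced in REDUCTIONS:
        romanized = romanized.replace(full, reduced)
    return romanized

# Hypothetical romanized inputs, for illustration only.
for word in ["tabeteiru", "tabetesimau", "ikunoda", "soredewa"]:
    print(word, "->", reduce_form(word))
```

Each rule shortens the string, which matches the abstract's observation that reduction decreases the number of syllables.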

Maximum Likelihood-based Automatic Lexicon Generation for AI Assistant-based Interaction with Mobile Devices

  • Lee, Donghyun;Park, Jae-Hyun;Kim, Kwang-Ho;Park, Jeong-Sik;Kim, Ji-Hwan;Jang, Gil-Jin;Park, Unsang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.9 / pp.4264-4279 / 2017
  • In this paper, maximum likelihood-based automatic lexicon generation using mixed syllables is proposed for an unlimited-vocabulary voice interface for East Asian languages (e.g. Korean, Chinese and Japanese) in AI assistant-based interaction with mobile devices. The conventional lexicon has two inevitable problems: 1) tedious, repeated addition of out-of-lexicon units to the lexicon, and 2) the propagation of errors during morpheme analysis and space segmentation. The proposed method provides an automatic framework to solve these problems. It achieves overall accuracy similar to that of previous methods when a sentence contains one out-of-lexicon word, but yields absolute improvements in word accuracy of 1.62%, 5.58%, and 10.09% when the number of out-of-lexicon words in a sentence is two, three, and four, respectively.

Practical Development and Application of a Korean Morphological Analyzer for Automatic Indexing (자동 색인을 위한 한국어 형태소 분석기의 실제적인 구현 및 적용)

  • Choi, Sung-Pil;Seo, Jerry;Chae, Young-Suk
    • The KIPS Transactions:PartB / v.9B no.5 / pp.689-700 / 2002
  • In this paper, we developed a Korean morphological analyzer for automatic indexing, which is essential for information retrieval. Since it is important to index large document sets efficiently, we concentrated on maximizing the speed of word analysis and on the modularization and structuring of the system rather than on introducing new concepts or ideas. In this respect, our system is characterized by software engineering considerations for real-world use rather than by theoretical issues. First, a word dictionary was structured. Then modules that analyze substantive words and inflected words were introduced. Furthermore, a numeral analyzer was developed, and an unknown word analyzer using morpheme patterns was introduced. The whole system was integrated into K-2000, an information retrieval system.

An Analysis of Cancer Survival Narratives Using Computerized Text Analysis Program (컴퓨터 텍스트 분석프로그램을 적용한 암환자의 투병수기 분석)

  • Kim, Dal Sook;Park, Ah Hyun;Kang, Nam Jun
    • Journal of Korean Academy of Nursing / v.44 no.3 / pp.328-338 / 2014
  • Purpose: This study was done to explore the experiences of persons living through the periods of cancer diagnosis, treatment, and self-care. Methods: With permission, the texts of 29 cancer survival narratives (8 men and 21 women, winners in contests sponsored by two institutes) were analyzed using Kang's Korean-Computerized-Text-Analysis-Program, which connects the commonly used Korean-Morphological-Analyzer with the 21st-century-Sejong-Modern-Korean-Corpora representing laymen's Korean-language-use. Experiences were explored based on the words included in the 100 most highly-used-morphemes. For interpretation, we used 'categorizing words by meaning', 'comparing use-rate by periods and to the 21st-century-Sejong-Modern-Korean-Corpora', and highly-used-morphemes that appeared only in a specific period. Results: The most highly-used-word-morphemes were first-person-pronouns, followed by diagnosis treatment-related-words, mind-expression-words, cancer, persons-in-meaningful-interaction, living and eating, information-related-verbs, and emotion-expression-words, at 240 to 0.8 times the layman use-rate. 'Diagnosis-process', 'cancer-thought', 'things-to-come-after-diagnosis', 'physician husband', 'result-related-information', 'meaningful-things before diagnosis-period', and 'locus-of-cause' dominated life in the diagnosis-period. 'Treatment', 'unreliable-body', 'husband people mother physician', 'treatment-related-uncertainty', 'hard-time', and 'waiting-time' represented experiences in the treatment-period. Themes of living in the self-care-period were complex and included 'living-as-a-human', 'self-managing-of-diseased-body', 'positive-emotion', and 'connecting past present future'. Conclusion: The results show that the experience of living for persons with cancer is influenced by each period's own situational-characteristics. Experiences of the diagnosis-period and treatment-period are negative and disease-oriented, while those of the self-care-period are positive and present-oriented.

Morphological Analysis Study for the Development of DB on the Manufacture Process of Prescription and Medicinal Food (처방 및 약선요리 제조 과정의 데이터베이스 구축을 위한 형태소 분석 연구)

  • Kim, Thae-Yul;Hwang, Su-Jung;Kim, Ki-Wook;Lee, Byung-Wook
    • Journal of Korean Medical classics / v.29 no.2 / pp.79-90 / 2016
  • Objectives: Treatment using foods has been recorded since the Zhou Dynasty of China. Modifications to the cooking process of medicinal food or the manufacturing process of herbal medicines alter the ingredients that determine their actual efficacy, and may markedly affect patients, including by making the food or medicine difficult to consume. Systematic management of such processes is therefore essential, and the management of this knowledge system must be standardized and made convenient to administer by applying IT. This study aims to overcome the failure of existing material-oriented knowledge systems on medicinal herbs to incorporate knowledge of the cooking process, which markedly influences the actual efficacy of the herbs. Methods: The cooking and manufacturing processes of prescriptions were analyzed using morphological analysis of natural language. We aimed to build a data structure for the terminology that represents the manufacturing process of prescriptions and medicinal food. The data structure is a combination of the smallest units of natural language. We built the database by analyzing the morphemes of the natural language used to express the manufacturing process of prescriptions and medicinal food. Results & Conclusions: As a result, we could express the preparation of Cheonjin-won, Guseon-wangdogo, and Sanyagbaegboglyeongtalagjuk in the DB. We concluded that developing the DB by extracting a total of 15 types of concepts, including 'order', 'action', and 'continuous action', helped systematize knowledge on medicinal herbs, including the manufacturing process.

Implementation of User Recommendation System based on Video Contents Story Analysis and Viewing Pattern Analysis (영상 스토리 분석과 시청 패턴 분석 기반의 추천 시스템 구현)

  • Lee, Hyoun-Sup;Kim, Minyoung;Lee, Ji-Hoon;Kim, Jin-Deog
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.12 / pp.1567-1573 / 2020
  • The development of Internet technology has brought about the era of one-person media. Individuals produce content on their own and upload it to online services, and many users watch that content on Internet-capable devices. Currently, most users find and watch the content they want through the search functions provided by existing online services. These functions rely on information entered by the user who uploaded the content. When content must be retrieved from such limited word data, the search results include information the user does not want. To solve this problem, this paper presents a system that actively analyzes the videos in an online service and extracts and reflects the characteristics of each video. Morphemes are extracted from the story content, based on the voice data of a video, and analyzed with big data technology.

XML Document Keyword Weight Analysis based Paragraph Extraction Model (XML 문서 키워드 가중치 분석 기반 문단 추출 모델)

  • Lee, Jongwon;Kang, Inshik;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.11 / pp.2133-2138 / 2017
  • Existing analysis of XML documents and other documents has centered on words. Such analysis can be implemented using a morpheme analyzer, but it only classifies the many words in a document and cannot grasp the document's core content. For a user to understand a document efficiently, the paragraphs containing a main word must be extracted and presented to the user. The proposed system retrieves a keyword in the normalized XML document. It then extracts the paragraphs containing the keyword the user entered for searching and displays them to the user. In addition, the system reports the frequency and weight of the keyword used in the search, orders the extracted paragraphs, and eliminates redundancy so that the user can understand the document. The proposed system can minimize the time and effort required to understand a document by allowing the user to understand it without reading the whole document.
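
The keyword-driven paragraph extraction described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: the `<p>` paragraph tag, the sample document, the function name, and the definition of weight as keyword count divided by paragraph length are all hypothetical choices made for the example.

```python
import xml.etree.ElementTree as ET

def extract_paragraphs(xml_text, keyword, tag="p"):
    """Return (paragraph, frequency, weight) tuples for paragraphs containing
    the keyword, without duplicates, ordered by frequency and then weight."""
    root = ET.fromstring(xml_text)
    seen = set()
    results = []
    for elem in root.iter(tag):
        # Collapse whitespace so identical paragraphs compare equal.
        text = " ".join("".join(elem.itertext()).split())
        if not text or text in seen:
            continue  # skip empty or redundant paragraphs
        seen.add(text)
        words = text.lower().split()
        freq = sum(1 for w in words if w.strip(".,;:!?") == keyword.lower())
        if freq:
            # Assumed weight: share of the paragraph's words that are the keyword.
            results.append((text, freq, freq / len(words)))
    results.sort(key=lambda r: (-r[1], -r[2]))
    return results

# Hypothetical normalized XML document, for illustration only.
SAMPLE_XML = """<doc>
<p>Morpheme analysis splits text. Morpheme tags follow.</p>
<p>Unrelated paragraph.</p>
<p>A morpheme is a unit.</p>
<p>A morpheme is a unit.</p>
</doc>"""

results = extract_paragraphs(SAMPLE_XML, "morpheme")
for text, freq, weight in results:
    print(freq, round(weight, 3), text)
```

Matching here is exact word matching after lowercasing; a system for Korean text would presumably match on morphemes instead, which this sketch does not attempt.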

Rule Construction for Determination of Thematic Roles by Using Large Corpora and Computational Dictionaries (대규모 말뭉치와 전산 언어 사전을 이용한 의미역 결정 규칙의 구축)

  • Kang, Sin-Jae;Park, Jung-Hye
    • The KIPS Transactions:PartB / v.10B no.2 / pp.219-228 / 2003
  • This paper presents an efficient method for constructing rules that determine thematic roles from syntactic relations in Korean language processing. This process is one of the core tasks of semantic analysis and an important open issue in natural language processing. Describing rules for determining thematic roles using only general linguistic knowledge and experience is problematic, since the final result may differ according to the subjective views of researchers, and it is impossible to construct rules that cover all cases. Our method, however, is objective and efficient because it draws on large corpora, which contain practical usages of the Korean language, and on case frames in the Sejong Electronic Lexicon of Korean, which is being developed by dozens of Korean linguistic researchers. To determine thematic roles more accurately, our system uses syntactic relations, semantic classes, morpheme information, and the position of double subjects. In particular, using semantic classes increases the applicability of the rules.

Design and Implementation of Minutes Summary System Based on Word Frequency and Similarity Analysis (단어 빈도와 유사도 분석 기반의 회의록 요약 시스템 설계 및 구현)

  • Heo, Kanhgo;Yang, Jinwoo;Kim, Donghyun;Bok, Kyoungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.19 no.10 / pp.620-629 / 2019
  • An automated minutes summary system is required to objectively summarize and classify the contents of debates or discussions for decision making. This paper designs and implements a minutes summary system that uses the word2vec model to complement existing minutes summary systems. The proposed system uses the word2vec model to remove index words during morpheme analysis and to extract representative sentences with common opinions from documents. The proposed system automatically classifies documents collected during the meeting process and extracts representative sentences that represent the agenda among various opinions. The conference host can quickly identify and manage all the agendas discussed at the meeting through the proposed system. The proposed system analyzes the various agendas of large-scale debates or discussions and extracts sentences that can serve as representative opinions, supporting fast and accurate decision making.
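
The idea of picking a representative sentence by similarity can be sketched as below. Note the substitution: the paper uses word2vec embeddings, whereas this example swaps in plain word-frequency vectors with cosine similarity so that it runs with the standard library alone. The function names and sample sentences are hypothetical, and whitespace tokenization stands in for morpheme analysis.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-frequency vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def representative_sentence(sentences):
    """Return the sentence with the highest mean similarity to all the others."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    best_index, best_score = 0, -1.0
    for i, vec in enumerate(vectors):
        others = [cosine(vec, v) for j, v in enumerate(vectors) if j != i]
        score = sum(others) / len(others) if others else 0.0
        if score > best_score:
            best_index, best_score = i, score
    return sentences[best_index]

# Hypothetical meeting statements, for illustration only.
statements = [
    "budget increase approved",
    "budget approved",
    "lunch menu changed",
    "budget increase delayed",
]
print(representative_sentence(statements))
```

Word-frequency vectors only capture exact word overlap; embeddings such as word2vec would also relate sentences that use different but semantically similar words, which is presumably why the paper adopts them.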