• Title/Summary/Keyword: text mining technique

A Technique to Link Bug and Commit Report based on Commit History (커밋 히스토리에 기반한 버그 및 커밋 연결 기법)

  • Chae, Youngjae;Lee, Eunjoo
    • KIISE Transactions on Computing Practices / v.22 no.5 / pp.235-239 / 2016
  • A 'commit-bug link', the link between a commit in the commit history and a bug report, is used for software maintenance and defect prediction in bug tracking systems. Previous studies detected these links automatically based on text similarity, time interval, and keywords. Existing approaches depend on the quality of the commit history and can therefore miss links. In this paper, we propose a technique that links commits and bug reports using not only the messages in the commit history but also the similarity of the files in the commit history that are coupled with bug reports. The experimental results demonstrate the applicability of the suggested approach.
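
The idea of combining message similarity with file similarity can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cosine and Jaccard measures, the equal weighting, and every name (`link_score`, `w_text`, the sample paths) are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts."""
    ca, cb = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def jaccard(files_a, files_b):
    """Share of files common to both sets."""
    a, b = set(files_a), set(files_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def link_score(commit_msg, commit_files, bug_text, bug_files, w_text=0.5):
    """Weighted mix of text and file similarity (the weight is an assumption)."""
    return w_text * cosine(commit_msg, bug_text) + (1 - w_text) * jaccard(
        commit_files, bug_files
    )


# Hypothetical commit and bug report data.
bug_text = "Null pointer crash in parser"
bug_files = ["src/parser.py", "src/lexer.py"]
related = link_score("Fix null pointer in parser", ["src/parser.py"], bug_text, bug_files)
unrelated = link_score("Update docs", ["README.md"], bug_text, bug_files)
print(related > unrelated)
```

A candidate link would then be accepted when the score exceeds a threshold tuned on known links; the file term lets a commit with a terse message still match a bug report that touches the same files.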

Convergence of Korean Traditional Dance and K-Pop Dance : An Analysis of Comments on 2018 MMA BTS 'IDOL' Videos on YouTube (한국 전통춤과 K-pop 댄스의 융합 : 2018 MMA 방탄소년단 'IDOL' 유튜브 댓글 분석)

  • Yoo, Ji-Young;Kim, Mi-Kyung
    • Journal of Korea Entertainment Industry Association / v.13 no.8 / pp.189-198 / 2019
  • This study aims to interpret the reactions of the Korean public through text mining of YouTube comments on videos of the intro performance at the December 2018 MMA. For this, comments on 15 YouTube videos from the past 10 months were collected. A total of 5,135 comments were gathered by crawling with Python and BeautifulSoup, the data was refined over a total of 3 sessions, and a final total of 5,080 comments were used as analysis material. Text mining was used for data analysis, and refinement, analysis, and visualization were carried out with the Textom program. Keyword analysis ranked the keywords 'performance', 'Korea', 'video', 'top', 'cool', 'dance', 'idol', 'legend', 'love', and 'gratitude' in that order, and keywords such as 'patriotism' and 'Olympics' also appeared frequently. N-gram analysis showed that comments with contexts such as 'a top performance that will remain a legend among Korean idol performances' and 'an idol performance that displayed the traditional culture of Korea' ranked highly. Based on these keyword analysis results, topic modeling was applied and 5 top keywords were extracted from a total of 5 topics. Analysis of topic contents and distribution showed that the topics in the comments on this performance's videos largely consisted of 3 reactions: 'high praise regarding the stage performance', 'affection towards the fusion and artistic sublimation of Korean traditional dance', and 'gratitude towards the uploading of cool dance videos'.

Text Mining Analysis Technique on ECDIS Accident Report (텍스트 마이닝 기법을 활용한 ECDIS 사고보고서 분석)

  • Lee, Jeong-Seok;Lee, Bo-Kyeong;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety / v.25 no.4 / pp.405-412 / 2019
  • SOLAS requires that ECDIS be installed on ships of more than 500 gross tonnage engaged in international navigation by the first inspection after July 1, 2018. Several accidents related to the use of ECDIS have occurred since its installation as a new major navigation instrument. The 12 incident reports issued by MAIB, BSU, BEAmer, DMAIB, and DSB were analyzed, and the causes of the accidents were found to be related to the navigator's operation and to the ECDIS system. The text was analyzed using R to quantify the words related to the causes of the accidents. We used text mining techniques such as Wordcloud, Wordnetwork, and Wordweight to represent the importance of words according to their frequency of occurrence. Wordcloud uses the N-gram model to display word frequencies in cloud form. In the uni-gram analysis of the N-gram model, 'ECDIS' was the most frequent word, and the bi-gram analysis showed that the term "Safety Contour" was used most frequently. Based on the bi-gram analysis, the causative words were classified into those concerning the officer and those concerning the ECDIS system, and the related words were represented by Wordnetwork. Finally, the words related to the officer and to the ECDIS system were each compiled into a word corpus, and Wordweight was applied to analyze the change in corpus frequency by year. The trend-line graph of corpus variation shows that the officer corpus has recently decreased while the ECDIS system corpus is gradually increasing.
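
The uni-gram and bi-gram counting described above can be illustrated with a short sketch. The study itself used R; this Python version is only an illustration, and the sample sentence, the stop-word list, and the function names are assumptions made for the example.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "was", "not", "on", "a"}


def ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Count n-grams after lowercasing and dropping stop words.

    Note: removing stop words first means an n-gram may span a removed
    word or a sentence boundary.
    """
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]
    return Counter(ngrams(tokens, n))


# Hypothetical report excerpt.
report = (
    "The safety contour was not set. "
    "The officer ignored the safety contour alarm on the ECDIS."
)
unigrams = ngram_counts(report, 1)
bigrams = ngram_counts(report, 2)
top_bigram, top_count = bigrams.most_common(1)[0]
print(top_bigram, top_count)
```

The resulting counts are what a word cloud scales its font sizes by.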

Comparative Study of User Reactions in OTT Service Platforms Using Text Mining (텍스트 마이닝을 활용한 OTT 서비스 플랫폼별 사용자 반응 비교 연구)

  • Soonchan Kwon;Jieun Kim;Beakcheol Jang
    • Journal of Internet Computing and Services / v.25 no.3 / pp.43-54 / 2024
  • This study employs text mining techniques to compare user responses across various Over-The-Top (OTT) service platforms. The primary objective of the research is to understand user satisfaction with OTT service platforms and contribute to the formulation of more effective review strategies. The key questions addressed in this study involve identifying prominent topics and keywords in user reviews of different OTT services and comprehending platform-specific user reactions. TF-IDF is utilized to extract significant words from positive and negative reviews, while BERTopic, an advanced topic modeling technique, is employed for a more nuanced and comprehensive analysis of intricate user reviews. The results from TF-IDF analysis reveal that positive app reviews exhibit a high frequency of content-related words, whereas negative reviews display a high frequency of words associated with potential issues during app usage. Through the utilization of BERTopic, we were able to extract keywords related to content diversity, app performance components, payment, and compatibility, by associating them with content attributes. This enabled us to verify that the distinguishing attributes of the platforms vary among themselves. The findings of this study offer significant insights into user behavior and preferences, which OTT service providers can leverage to improve user experience and satisfaction. We also anticipate that researchers exploring deep learning models will find our study results valuable for conducting analyses on user review text data.
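
For readers unfamiliar with TF-IDF, the following is a minimal sketch of one common textbook variant (term frequency times log inverse document frequency). It is not the pipeline used in the study; the review texts and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf(documents):
    """Return one {term: score} dict per document.

    score = (count / document length) * log(N / document frequency)
    """
    token_lists = [tokenize(d) for d in documents]
    n_docs = len(token_lists)
    doc_freq = Counter(t for toks in token_lists for t in set(toks))
    scores = []
    for toks in token_lists:
        counts = Counter(toks)
        scores.append(
            {t: (c / len(toks)) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return scores


# Hypothetical review texts, one positive and one negative.
reviews = [
    "great shows great movies and original content",
    "the app keeps crashing and payment failed",
]
scores = tfidf(reviews)
top_positive = max(scores[0], key=scores[0].get)
print(top_positive)
```

A term that occurs in every document (here "and") receives a score of zero, which is why TF-IDF surfaces words that distinguish one group of reviews from another.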

Research trends in the Korean Journal of Women Health Nursing from 2011 to 2021: a quantitative content analysis

  • Ju-Hee Nho;Sookkyoung Park
    • Women's Health Nursing / v.29 no.2 / pp.128-136 / 2023
  • Purpose: Topic modeling is a text mining technique that extracts concepts from textual data and uncovers semantic structures and potential knowledge frameworks within context. This study aimed to identify major keywords and network structures for each major topic to discern research trends in women's health nursing published in the Korean Journal of Women Health Nursing (KJWHN) using text network analysis and topic modeling. Methods: The study targeted papers with English abstracts among 373 articles published in KJWHN from January 2011 to December 2021. Text network analysis and topic modeling were employed, and the analysis consisted of five steps: (1) data collection, (2) word extraction and refinement, (3) extraction of keywords and creation of networks, (4) network centrality analysis and key topic selection, and (5) topic modeling. Results: Six major keywords, each corresponding to a topic, were extracted through topic modeling analysis: "gynecologic neoplasms," "menopausal health," "health behavior," "infertility," "women's health in transition," and "nursing education for women." Conclusion: The latent topics from the target studies primarily focused on the health of women across all age groups. Research related to women's health is evolving with changing times and warrants further progress in the future. Future research on women's health nursing should explore various topics that reflect changes in social trends, and research methods should be diversified accordingly.

A Hangul Document Classification System using Case-based Reasoning (사례기반 추론을 이용한 한글 문서분류 시스템)

  • Lee, Jae-Sik;Lee, Jong-Woon
    • Asia pacific journal of information systems / v.12 no.2 / pp.179-195 / 2002
  • In this research, we developed an efficient Hangul document classification system for text mining. By 'efficient' we mean maintaining acceptable classification performance while requiring less computing time. In our system, given a query document, k documents are first retrieved from the document case base using the k-nearest neighbor technique, the main algorithm of case-based reasoning. Then the TFIDF method, the traditional vector model in information retrieval, is applied to the query document and the k retrieved documents to classify the query document. We call this procedure the 'CB_TFIDF' method. Our results showed that the classification accuracy of CB_TFIDF was similar to that of the traditional TFIDF method, while the average time to classify one document decreased remarkably.
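
The two-stage procedure (retrieve k cases, then apply TFIDF only to those) might look roughly like the sketch below. The abstract does not say how the retrieval stage measures nearness or how the final label is chosen, so the token-overlap retrieval, the smoothed IDF, the similarity-weighted vote, and all names here are assumptions, not the CB_TFIDF method itself.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def tfidf_vectors(docs):
    """TF-IDF vectors with a smoothed IDF, log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append(
            {t: (c / len(d)) * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def classify(query, case_base, k=3):
    """Stage 1: cheap retrieval of k cases. Stage 2: TF-IDF on those k only."""
    retrieved = sorted(case_base, key=lambda case: jaccard(query, case[0]), reverse=True)[:k]
    vectors = tfidf_vectors([query] + [tokens for tokens, _ in retrieved])
    votes = defaultdict(float)
    for vec, (_, label) in zip(vectors[1:], retrieved):
        votes[label] += cosine(vectors[0], vec)
    return max(votes, key=votes.get)


# Hypothetical case base of (tokens, label) pairs.
case_base = [
    (["team", "wins", "match", "goal"], "sports"),
    (["player", "scores", "goal", "match"], "sports"),
    (["stock", "market", "price", "falls"], "finance"),
    (["bank", "raises", "interest", "rate"], "finance"),
]
predicted = classify(["goal", "match", "team"], case_base, k=3)
print(predicted)
```

The time saving in such a design comes from building TF-IDF vectors over k + 1 documents instead of the whole case base.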

A Technique for Extracting GeoSemantic Knowledge from Micro-blog (마이크로 블로그기반의 공간 지식 추출 기법연구)

  • Ha, Su-Wook;Nam, Kwang-Woo;Ryu, Keun-Ho
    • Spatial Information Research / v.20 no.2 / pp.129-136 / 2012
  • Recently, international organizations such as ISO/TC211, OGC, and INSPIRE (Infrastructure for Spatial Information in Europe) have been making an effort to share geospatial data using semantic web technologies. In addition, smartphones and social networking services give participants community-based opportunities to share issues about social phenomena by geographic area, and many researchers are trying to find methods of extracting issues from them. However, serviceable spatial ontologies are still insufficient at the application level, and studies of spatial information extraction from SNS have focused on finding users' locations or on geocoding by text mining. Therefore, a study on extracting spatial phenomena from social media information and converting them into geosemantic knowledge would be very useful. In this paper, we propose a framework for extracting keywords from micro-blogs, one of the social media services, finding their relationships using a data mining technique, and converting them into spatiotemporal knowledge. The result of this study could be used to implement a related system, as a procedure and ontology model for constructing geosemantic issues. It is expected to improve the effectiveness of finding, publishing, and analysing spatial issues.

An Analysis of School Life Sensibility of Students at Korea National College of Agriculture and Fisheries Using Unstructured Data Mining(1) (비정형 데이터 마이닝을 활용한 한국농수산대학 재학생의 학교생활 감성 분석(1))

  • Joo, J.S.;Lee, S.Y.;Kim, J.S.;Song, C.Y.;Shin, Y.K.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research / v.21 no.1 / pp.99-114 / 2019
  • In this study we examined the preferences of students at Korea National College of Agriculture and Fisheries (KNCAF) regarding eight factors of college life. Opinion mining and text mining were used to analyze the unstructured data, and the text mining results were visualized as word clouds. The college life factors comprised eight topics closely related to students: 'my present', 'my 10 years later', 'friendship', 'college festival', 'student restaurant', 'college dormitory', 'KNCAF', and 'long-term field practice'. For the text submitted by the students, we built a dictionary of positive and negative words and evaluated preference by classifying emotions as positive or negative. As a result, KNCAF students showed more than 85% positive emotions about 'student restaurant' and 'friendship', whereas positive feelings about 'long-term field practice' and 'college dormitory' were lowest, not exceeding 60%. The remaining topics showed satisfaction of 69.3~74.2%. By gender, male students showed higher positive emotions on 'my present', 'my 10 years later', 'friendship', 'college dormitory', and 'long-term field practice', and female students on 'college festival', 'student restaurant', and 'KNCAF'. In addition, the main positive and negative words were extracted using text mining, and word clouds were created to visualize the results.
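
Dictionary-based polarity scoring of the kind described above can be sketched in a few lines. The word lists, the sample responses, and the rule of ignoring neutral responses when computing the positive rate are all assumptions for illustration; the study's actual dictionary and scoring rule are not given in the abstract.

```python
import re

# Hypothetical sentiment dictionaries.
POSITIVE = {"good", "delicious", "happy", "fun"}
NEGATIVE = {"bad", "noisy", "tired", "boring"}


def classify(text):
    """Label a response by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def positive_rate(texts):
    """Share of positive responses among those that are not neutral."""
    decided = [label for label in map(classify, texts) if label != "neutral"]
    return decided.count("positive") / len(decided) if decided else 0.0


# Hypothetical student responses grouped by topic.
responses = {
    "student restaurant": ["The food is delicious and good", "Lunch was good", "The queue is bad"],
    "college dormitory": ["The room is noisy", "I am tired and the hall is noisy", "Fun roommates"],
}
rates = {topic: positive_rate(texts) for topic, texts in responses.items()}
print(rates)
```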

Classification of ratings in online reviews (온라인 리뷰에서 평점의 분류)

  • Choi, Dongjun;Choi, Hosik;Park, Changyi
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.845-854 / 2016
  • Sentiment analysis, or opinion mining, is a text mining technique employed to identify subjective information or opinions of individuals from documents such as blogs, reviews, articles, or social networks. The literature has considered only the binary classification of ratings based on online review texts. However, because reviews can be neutral as well as positive or negative, multi-class classification is more appropriate than binary classification. To this end, we consider the multi-class classification of ratings based on review texts. In the preprocessing stage, we extract words related to ratings using the chi-square statistic. The extracted words are then used as input variables to multi-class classifiers such as support vector machines and the proportional odds model, whose predictive performances are compared.
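
The chi-square screening step can be illustrated with the standard Pearson statistic on a word-presence by rating-class table. This is a generic sketch, not the paper's code; the toy reviews and the function name `chi_square` are assumptions.

```python
from collections import Counter


def chi_square(docs, labels, word):
    """Pearson chi-square between presence of `word` and the class label.

    docs: list of token sets; labels: one class label per document.
    Cells with an expected count of zero are skipped.
    """
    n = len(docs)
    class_totals = Counter(labels)
    present = Counter(label for doc, label in zip(docs, labels) if word in doc)
    n_present = sum(present.values())
    stat = 0.0
    for label, total in class_totals.items():
        rows = ((present[label], n_present), (total - present[label], n - n_present))
        for observed, row_total in rows:
            expected = row_total * total / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical reviews as token sets, with rating classes.
docs = [{"good", "film"}, {"good", "plot"}, {"dull", "film"}, {"weak", "plot"}]
labels = ["positive", "positive", "negative", "negative"]
print(chi_square(docs, labels, "good"), chi_square(docs, labels, "film"))
```

Words with a large statistic (here "good") are kept as classifier inputs, while words spread evenly across classes (here "film") score zero and are dropped. The same function works unchanged with three or more rating classes.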

The Analysis of the Visitors' Experiences in Yeonnam-dong before and after the Gyeongui Line Park Project - A Text Mining Approach - (경의선숲길 조성 전후의 연남동 방문자의 경험 분석 - 블로그 텍스트 분석을 중심으로 -)

  • Kim, Sae-Ryung;Choi, Yunwon;Yoon, Heeyeun
    • Journal of the Korean Institute of Landscape Architecture / v.47 no.4 / pp.33-49 / 2019
  • The purpose of this study was to investigate the changes in the experiences of visitors to Yeonnam-dong during the period covering the development of a linear park, the Gyeongui Line Park. This study used a text mining technique to analyze Naver Blog postings by those who visited Yeonnam-dong from June 2013 to May 2017, divided into four periods: June 2013 to May 2014, June 2014 to May 2015, June 2015 to May 2016, and June 2016 to May 2017. The keywords used were 'Yeonnam-dong', 'Gyeongui Line', and 'Yeontral Park', and the data was further refined and resampled. A semantic network analysis was conducted on the basis of the co-occurrences of words. The results of the study were as follows. Throughout the entire period, the main experience of visitors to Yeonnam-dong was consistently 'food culture', but activities related to 'market', 'browsing', and 'buy' increased. Also, activities such as 'walk', 'play', and 'rest' in the park newly appeared after the construction of the park. Moreover, more diverse opinions about Yeonnam-dong were expressed on the blogs, and Yeonnam-dong began to be recognized as a place where a variety of activities can be enjoyed. Lastly, when visitors wrote about the theme 'food culture', the scope of the keywords expanded from simple ones, such as 'eat', 'photograph', and 'chatting', to 'market', 'browsing', and 'walk'. The sub-themes that appeared with the park also expanded to various topics with the emergence of the Gyeongui Line Book Street. This study analyzed the change in visitors' experiences objectively with text mining, a quantitative methodology. Due to the nature of text mining, however, subjective judgment was inevitably involved in the refining process. Also, further research is required to assess the direct relationship between these changes and the park construction.
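
A co-occurrence network of the kind used for semantic network analysis can be built by counting how often two words appear in the same posting. The sketch below is illustrative only; the sample postings are hypothetical and weighted degree is just one simple centrality measure, not necessarily the one used in the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(posts):
    """Count, for each unordered word pair, the postings containing both."""
    edges = Counter()
    for words in posts:
        for pair in combinations(sorted(set(words)), 2):
            edges[pair] += 1
    return edges


def weighted_degree(edges):
    """Sum of edge weights attached to each word."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical blog postings, already reduced to keywords.
posts = [
    ["eat", "photograph", "cafe"],
    ["walk", "park", "rest"],
    ["eat", "walk", "park"],
]
edges = cooccurrence_edges(posts)
degree = weighted_degree(edges)
print(edges[("park", "walk")], degree["park"])
```

Comparing such networks built separately for each period would show which words newly connect to a theme such as 'food culture' over time.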