• Title/Summary/Keyword: text mining technique

Search Result 222, Processing Time 0.025 seconds

The Utilization of Local Document Information to Improve Statistical Context-Sensitive Spelling Error Correction (통계적 문맥의존 철자오류 교정 기법의 향상을 위한 지역적 문서 정보의 활용)

  • Lee, Jung-Hun;Kim, Minho;Kwon, Hyuk-Chul
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.7
    • /
    • pp.446-451
    • /
    • 2017
  • The statistical context-sensitive spelling correction technique in this thesis is based upon Shannon's noisy channel model. The interpolation method is used for the improvement of the correction method proposed in the paper, and the general interpolation method is to fill the middle value of the probability by (N-1)-gram and (N-2)-gram. This method is based upon the same statistical corpus. In the proposed method, interpolation is performed using the frequency information between the statistical corpus and the correction document. The advantages of using frequency of correction documents are twofold. First, the probability of the coined word existing only in the correction document can be obtained. Second, even if there are two correction candidates with ambiguous probability values, the ambiguity is solved by correcting them by referring to the correction document. The method proposed in this thesis showed better precision and recall than the existing correction model.

A Study on Analysis of Topic Modeling using Customer Reviews based on Sharing Economy: Focusing on Sharing Parking (공유경제 기반의 고객리뷰를 이용한 토픽모델링 분석: 공유주차를 중심으로)

  • Lee, Taewon
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.25 no.3
    • /
    • pp.39-51
    • /
    • 2020
  • This study will examine the social issues and consumer awareness of sharing parking through the method text mining. In this experiment, the topic by keyword was extracted and analyzed using TFIDF (Term frequency inverse document frequency) and LDA (Latent dirichlet allocation) technique. As a result of categorization by topic, citizens' complaints such as local government agreements, parking space negotiations, parking culture improvement, citizen participation, etc., played an important role in implementing shared parking services. The contribution of this study highly differentiated from previous studies that conducted exploratory studies using corporate and regional cases, and can be said to have a high academic contribution. In addition, based on the results obtained by utilizing the LDA analysis in this study, there is a practical contribution that it can be applied or utilized in establishing a sharing economy policy for revitalizing the local economy.

Research on Designing Korean Emotional Dictionary using Intelligent Natural Language Crawling System in SNS (SNS대상의 지능형 자연어 수집, 처리 시스템 구현을 통한 한국형 감성사전 구축에 관한 연구)

  • Lee, Jong-Hwa
    • The Journal of Information Systems
    • /
    • v.29 no.3
    • /
    • pp.237-251
    • /
    • 2020
  • Purpose The research was studied the hierarchical Hangul emotion index by organizing all the emotions which SNS users are thinking. As a preliminary study by the researcher, the English-based Plutchick (1980)'s emotional standard was reinterpreted in Korean, and a hashtag with implicit meaning on SNS was studied. To build a multidimensional emotion dictionary and classify three-dimensional emotions, an emotion seed was selected for the composition of seven emotion sets, and an emotion word dictionary was constructed by collecting SNS hashtags derived from each emotion seed. We also want to explore the priority of each Hangul emotion index. Design/methodology/approach In the process of transforming the matrix through the vector process of words constituting the sentence, weights were extracted using TF-IDF (Term Frequency Inverse Document Frequency), and the dimension reduction technique of the matrix in the emotion set was NMF (Nonnegative Matrix Factorization) algorithm. The emotional dimension was solved by using the characteristic value of the emotional word. The cosine distance algorithm was used to measure the distance between vectors by measuring the similarity of emotion words in the emotion set. Findings Customer needs analysis is a force to read changes in emotions, and Korean emotion word research is the customer's needs. In addition, the ranking of the emotion words within the emotion set will be a special criterion for reading the depth of the emotion. The sentiment index study of this research believes that by providing companies with effective information for emotional marketing, new business opportunities will be expanded and valued. In addition, if the emotion dictionary is eventually connected to the emotional DNA of the product, it will be possible to define the "emotional DNA", which is a set of emotions that the product should have.

Similarity Analysis of Hospitalization using Crowding Distance

  • Jung, Yong Gyu;Choi, Young Jin;Cha, Byeong Heon
    • International journal of advanced smart convergence
    • /
    • v.5 no.2
    • /
    • pp.53-58
    • /
    • 2016
  • With the growing use of big data and data mining, it serves to understand how such techniques can be used to understand various relationships in the healthcare field. This study uses hierarchical methods of data analysis to explore similarities in hospitalization across several New York state counties. The study utilized methods of measuring crowding distance of data for age-specific hospitalization period. Crowding distance is defined as the longest distance, or least similarity, between urban cities. It is expected that the city of Clinton have the greatest distance, while Albany the other cities are closer because they are connected by the shortest distance to each step. Similarities were stronger across hospital stays categorized by age. Hierarchical clustering can be applied to predict the similarity of data across the 10 cities of hospitalization with the measurement of crowding distance. In order to enhance the performance of hierarchical clustering, comparison can be made across congestion distance when crowding distance is applied first through the application of converting text to an attribute vector. Measurements of similarity between two objects are dependent on the measurement method used in clustering but is distinguished from the similarity of the distance; where the smaller the distance value the more similar two things are to one other. By applying this specific technique, it is found that the distance between crowding is reduced consistently in relationship to similarity between the data increases to enhance the performance of the experiments through the application of special techniques. Furthermore, through the similarity by city hospitalization period, when the construction of hospital wards in cities, by referring to results of experiments, or predict possible will land to the extent of the size of the hospital facilities hospital stay is expected to be useful in efficiently managing the patient in a similar area.

Application of Sentiment Analysis and Topic Modeling on Rural Solar PV Issues : Comparison of News Articles and Blog Posts (감성분석과 토픽모델링을 활용한 농촌태양광 관련 이슈 연구 : 언론 기사와 블로그 포스트 비교)

  • Ki, Jaehong;Ahn, Seunghyeok
    • Journal of Digital Convergence
    • /
    • v.18 no.9
    • /
    • pp.17-27
    • /
    • 2020
  • News articles and blog posts have influence on social agenda setting and this study applied text mining on the subject of solar PV in rural area appeared in those media. Texts are gained from online news articles and blog posts with rural solar PV as a keyword by web scrapping, and these are analysed by sentiment analysis and topic modeling technique. Sentiment analysis shows that the proportion of negative texts are significantly lower in blog posts compared to news articles. Result of topic modeling shows that topics related to government policy have the largest loading in positive articles whereas various topics are relatively evenly distributed in negative articles. For blog posts, topics related to rural area installation and environmental damage are have the largest loading in positive and negative texts, respectively. This research reveals issues related to rural solar PV by combining sentiment analysis and topic modeling that were separately applied in previous studies.

Analysis of Consulting Research Trends Using Topic Modeling (토픽 모델링을 활용한 컨설팅 연구동향 분석)

  • Kim, Min Kwan;Lee, Yong;Han, Chang Hee
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.40 no.4
    • /
    • pp.46-54
    • /
    • 2017
  • 'Consulting', which is the main research topic of the knowledge service industry, is a field of study that is essential for the growth and development of companies and proliferation to specialized fields. However, it is difficult to grasp the current status of international research related to consulting, mainly on which topics are being studied, and what are the latest research topics. The purpose of this study is to analyze the research trends of academic research related to 'consulting' by applying quantitative analysis such as topic modeling and statistic analysis. In this study, we collected statistical data related to consulting in the Scopus DB of Elsevier, which is a representative academic database, and conducted a quantitative analysis on 15,888 documents. We scientifically analyzed the research trends related to consulting based on the bibliographic data of academic research published all over the world. Specifically, the trends of the number of articles published in the major countries including Korea, the author key word trend, and the research topic trend were compared by country and year. This study is significant in that it presents the result of quantitative analysis based on bibliographic data in the academic DB in order to scientifically analyze the trend of academic research related to consulting. Especially, it is meaningful that the traditional frequency-based quantitative bibliographic analysis method and the text mining (topic modeling) technique are used together and analyzed. The results of this study can be used as a tool to guide the direction of research in consulting field. It is expected that it will help to predict the promising field, changes and trends of consulting industry related research through the trend analysis.

Analysis of drama viewership related words through unstructured data collection (비정형데이터 수집을 통한 드라마 시청률 연관어 분석)

  • Kang, Sun-Kyoung;Lee, Hyun-Chang;Shin, Seong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.8
    • /
    • pp.1567-1574
    • /
    • 2017
  • In this paper, we analyzed the stereotyped and non - stereotyped data in order to analyze the drama 's ratings. The formalized data collection collected 19 items from the four areas of drama information, person information, broadcasting information, and audience rating information of each broadcasting company. Atypical data were collected from bulletin boards, pre - broadcast blogs and post - broadcast blogs operated by each broadcasting company using a crawling technique. As a result of comparing the differences according to the four areas for each broadcaster from the collected regular data, the results were similar to each other. And we derived seven related words by analyzing the correlation of occurrence frequencies from unstructured data collected from bulletin boards and blogs of each broadcasting company. The derived associations were obtained through reliability analysis.

A Study on the Perception of Data 3 Act through Big Data Analysis (빅데이터 분석을 통한 데이터 3법 인식에 관한 연구)

  • Oh, Jungjoo;Lee, Hwansoo
    • Convergence Security Journal
    • /
    • v.21 no.2
    • /
    • pp.19-28
    • /
    • 2021
  • Korea is promoting a digital new deal policy for the digital transformation and innovation accelerating of the industry. However, because of the strict existing data-related laws, there are still restrictions on the industry's use of data for the digital new deal policy. In order to solve this issue, a revised bill of the Data 3 Act has been proposed, but there is still insufficient discussion on how it will actually affect the activation of data use in the industry. Therefore, this study aims to analyze the perception of public opinion on the Data 3 Act and the implications of the revision of the Data 3 Act. To this end, the revision of the Data 3 Act and related research trends were analyzed, and the perception of the Data 3 Act was analyzed using a big data analysis technique. According to the analysis results, while promoting the vitalization of the data industry in line with the purpose of the revision, the Data 3 Act has a concern that it focuses on specific industries. The results of this study are meaningful in providing implications for future improvement plans by analyzing online perceptions of the industrial impact of the Data 3 Act in the early stages of implementation through big data analysis.

Research trends in statistics for domestic and international journal using paper abstract data (초록데이터를 활용한 국내외 통계학 분야 연구동향)

  • Yang, Jong-Hoon;Kwak, Il-Youp
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.2
    • /
    • pp.267-278
    • /
    • 2021
  • As time goes by, the amount of data is increasing regardless of government, business, domestic or overseas. Accordingly, research on big data is increasing in academia. Statistics is one of the major disciplines of big data research, and it will be interesting to understand the research trend of statistics through big data in the growing number of papers in statistics. In this study, we analyzed what studies are being conducted through abstract data of statistical papers in Korea and abroad. Research trends in domestic and international were analyzed through the frequency of keyword data of the papers, and the relationship between the keywords was visualized through the Word Embedding method. In addition to the keywords selected by the authors, words that are importantly used in statistical papers selected through Textrank were also visualized. Lastly, 10 topics were investigated by applying the LDA technique to the abstract data. Through the analysis of each topic, we investigated which research topics are frequently studied and which words are used importantly.

Sensitivity of abacus and Chasdaq in the Chinese stock market through analysis of Weibo sentiment related to Corona-19 (코로나-19관련 웨이보 정서 분석을 통한 중국 주식시장의 주판 및 차스닥의 민감도 예측 기법)

  • Li, Jiaqi;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.1
    • /
    • pp.1-7
    • /
    • 2021
  • Investor mood from social media is gaining increasing attention for leading a price movement in stock market. Based on the behavioral finance theory, this study argues that sentiment extracted from social media using big data technique can predict a real-time (short-run) price momentum in Chinese stock market. Collecting Sina Weibo posts that related to COVID-19 using keyword method, a daily influential weighted sentiment factors is extracted from the sizable raw data of over 2 millions of posts. We examine one supervised and 4 unsupervised sentiment analysis model, and use the best performed word-frequency and BiLSTM mdoel. The test result shows a similar movement between stock price change and sentiment factor. It indicates that public mood extracted from social media can in some extent represent the investors' sentiment and make a difference in stock market fluctuation when people are concentrating on a special events that can cause effect on the stock market.