• Title/Summary/Keyword: big data mining


Signed Hellinger measure for directional association (연관성 방향을 고려한 부호 헬링거 측도의 제안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.27 no.2 / pp.353-362 / 2016
  • According to Wikipedia, data mining is the process of discovering patterns in a big data set, involving methods at the intersection of association rules, decision trees, clustering, artificial intelligence, machine learning, and database systems. An association rule is a method for discovering interesting relations between items in large transaction sets by means of interestingness measures. Association rule interestingness measures play a major role in the knowledge discovery process in databases, and many researchers have developed them. Among them, the Hellinger measure is a good association threshold that considers both the information content and the generality of a rule. However, it has the drawback that it cannot determine the direction of the association. In this paper we propose a signed Hellinger measure that can be interpreted operationally, and we check the three conditions for an association threshold. Furthermore, we investigate some of its aspects through a few examples. The results show that the signed Hellinger measure is better than the Hellinger measure because the signed one is able to estimate the correct direction of association.
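The abstract does not give the formula of the signed Hellinger measure, so it is not reproduced here. As a minimal sketch of what "direction of association" means, the example below computes three standard interestingness measures (support, confidence, lift) for one rule. This is a swapped-in illustration, not the paper's measure; the toy transactions and the function name `rule_metrics` are assumptions made for this example.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent in t)
    n_b = sum(1 for t in transactions if consequent in t)
    n_ab = sum(1 for t in transactions if antecedent in t and consequent in t)
    if n == 0 or n_a == 0 or n_b == 0:
        raise ValueError("both items must occur in at least one transaction")
    support = n_ab / n              # P(A and B)
    confidence = n_ab / n_a         # P(B | A)
    lift = confidence / (n_b / n)   # P(B | A) / P(B)
    return support, confidence, lift


# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"butter"},
]

support, confidence, lift = rule_metrics(transactions, "bread", "milk")

# Lift above 1 indicates a positive association, below 1 a negative one.
direction = "positive" if lift > 1 else "negative" if lift < 1 else "none"
print(support, confidence, lift, direction)
```

Lift carries a direction through its position relative to 1, which is the kind of information an unsigned measure cannot express.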

Does Rain Really Cause Toothache? Statistical Analysis Based on Google Trends

  • Jeon, Se-Jeong
    • Journal of dental hygiene science / v.21 no.2 / pp.104-110 / 2021
  • Background: In many countries, the myth that rain makes the body ache has been expressed in various forms, and a number of studies investigating it have been reported. However, these studies, which depended on patients' experience or memory, had obvious limitations. Google Trends is a big data analysis service provided by Google LLC, based on search terms and video viewing, and attempts to use it in various fields are continuing. In this study, we sought to introduce the value of Google Trends as a research tool, which has emerged along with technological advancements, by investigating whether toothaches really occur more frequently on rainy days. Methods: Keywords were selected as objectively as possible by applying web crawling and text mining techniques, and the keyword "bi", meaning rain in Korean, was added to verify the reliability of the Google Trends data. The correlation was statistically analyzed using precipitation and temperature data provided by the Korea Meteorological Administration and daily search volume data provided by Google Trends. Results: The keywords "chi-gwa", "chi-tong", and "chung-chi" were selected, which in Korean mean 'dental clinic', 'toothache', and 'tooth decay', respectively. A significant correlation was found between the amount of precipitation and the search volume for tooth decay. No correlation was found between precipitation and the other keywords or other combinations. As expected, a very significant correlation was found between the amount of precipitation, temperature, and the search volume for "bi". Conclusion: Rain seems to actually be a cause of toothache, and provided that keywords are selected objectively, Google Trends is considered to be very useful as a research tool in the future.
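The abstract does not state which correlation statistic was used. As a rough sketch of the kind of analysis described, the example below computes a Pearson correlation coefficient between daily precipitation and daily search volume. The choice of Pearson's r, the function name `pearson_r`, and all numbers are assumptions invented for illustration; they are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily values, invented for illustration only.
precipitation_mm = [0.0, 5.0, 12.0, 0.0, 30.0, 2.0, 18.0]
search_volume = [40, 45, 55, 38, 80, 42, 60]

r = pearson_r(precipitation_mm, search_volume)
print(f"r = {r:.3f}")
```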

Big Data News Analysis in Healthcare Using Topic Modeling and Time Series Regression Analysis (토픽모델링과 시계열 회귀분석을 활용한 헬스케어 분야의 뉴스 빅데이터 분석 연구)

  • Eun-Jung Kim;Suk-Gwon Chang;Sang-Yong Tom Lee
    • Information Systems Review / v.25 no.3 / pp.163-177 / 2023
  • This research aims to identify key initiatives and a policy approach to support the industrialization of the sector. The research collected a total of 91,873 news data points relating to healthcare between 2013 and 2022. A total of 20 topics were derived through topic modeling analysis, and time series regression analysis identified 4 hot topics (Healthcare, Biopharmaceuticals, Corporate outlook·Sales, Government·Policy) and 3 cold topics (Smart devices, Stocks·Investment, Urban development·Construction) as significant. The research findings will serve as an important data source for government institutions that are engaged in the formulation and implementation of Korea's policies.
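The hot/cold labelling described above can be sketched as a trend test on each topic's yearly share. The example below fits an ordinary least squares slope per topic and labels a rising topic "hot" and a falling one "cold". It is a simplified sketch under assumptions: the abstract does not specify the regression model, the significance test is omitted here, and the topic names and shares are invented placeholders.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


years = list(range(2013, 2023))  # 2013..2022, the period named in the abstract

# Hypothetical yearly topic shares, invented for illustration only.
topic_share = {
    "Topic A": [0.02 + 0.005 * i for i in range(10)],  # rising share
    "Topic B": [0.08 - 0.004 * i for i in range(10)],  # falling share
}

slopes = {name: ols_slope(years, shares) for name, shares in topic_share.items()}
labels = {name: ("hot" if s > 0 else "cold") for name, s in slopes.items()}
print(labels)
```

In practice the slope would also be tested for statistical significance before a topic is labelled, which this sketch leaves out.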

Analysis of News Agenda Using Text mining and Semantic Network Analysis: Focused on COVID-19 Emotions (텍스트 마이닝과 의미 네트워크 분석을 활용한 뉴스 의제 분석: 코로나 19 관련 감정을 중심으로)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.47-64 / 2021
  • The global spread of COVID-19 has not only affected many parts of our daily lives but has also had a huge impact on many areas, including the economy and society. As the number of confirmed cases and deaths increases, medical staff and the public are said to be experiencing psychological problems such as anxiety, depression, and stress. The collective tragedy that accompanies the epidemic raises fear and anxiety, which are known to cause enormous disruptions to the behavior and psychological well-being of many people. Long-term negative emotions can reduce people's immunity and destroy their physical balance, so it is essential to understand people's psychological state during COVID-19. This study suggests a method of monitoring media news reflecting the current situation, in which the prolonged COVID-19 pandemic requires psychological as well as physical quarantine. It also shows how a simpler method of analyzing social media networks can be applied to such cases. The aim of this study is to assist health policymakers in fast and complex decision-making processes. News plays a major role in setting the policy agenda. Among various major media, news headlines are considered important in the field of communication science as a summary of the core content that the media want to convey to their audiences. The news data used in this study were easily collected using "Bigkinds", a service built by integrating big data technology. With the collected news data, keywords were classified through text mining, and the relationships between words were visualized through semantic network analysis between keywords. Using the KrKwic program, a Korean semantic network analysis tool, text mining was performed and word frequencies were calculated to easily identify keywords. The frequency of words appearing in keywords of articles related to COVID-19 emotions was checked and visualized in a word cloud. 'China', 'anxiety', 'situation', 'mind', 'social', and 'health' appeared frequently in relation to COVID-19 emotions. In addition, UCINET, a specialized social network analysis program, was used to analyze connection centrality and perform cluster analysis, and the graph was visualized using NetDraw. Analysis of the connection centrality showed that the most central keywords in the keyword-centric network were 'psychology', 'COVID-19', 'blue', and 'anxiety'. The network of co-occurrence frequencies among the keywords appearing in the news headlines was visualized as a graph. The thickness of a line on the graph is proportional to the frequency of co-occurrence, so two words that frequently appear together are connected by a thick line. The 'COVID-blue' pair is displayed with the thickest line, and the 'COVID-emotion' and 'COVID-anxiety' pairs are displayed with relatively thick lines. 'Blue' in relation to COVID-19 is a word that means depression, and it was confirmed that COVID-19 and depression are keywords that deserve attention now. The research methodology used in this study offers the convenience of quickly measuring social phenomena and changes while reducing costs. In this study, by analyzing news headlines, we were able to identify people's feelings and perceptions on issues related to COVID-19 depression, and to identify the main agendas to be analyzed by deriving important keywords. By presenting and visualizing the subjects and important keywords related to COVID-19 emotions at a glance, the study can provide medical policy managers with a variety of perspectives when identifying and researching the phenomenon. The results are expected to serve as basic data for support, treatment, and service development for psychological quarantine issues related to COVID-19.
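The co-occurrence network and centrality analysis described above was carried out with KrKwic, UCINET, and NetDraw. As a tool-independent sketch of the same idea, the example below counts keyword co-occurrences per headline (the edge weights that determine line thickness) and computes a normalized degree centrality. It assumes that "connection centrality" corresponds to degree centrality; the headlines are invented placeholders, and the function names are made up for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how often each unordered keyword pair appears in the same document."""
    counts = Counter()
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


def degree_centrality(edges):
    """Share of other nodes each node is directly connected to."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    if len(neighbors) < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (len(neighbors) - 1) for node, adj in neighbors.items()}


# Hypothetical keyword sets extracted from headlines, invented for illustration.
headlines = [
    ["covid", "blue", "anxiety"],
    ["covid", "blue"],
    ["covid", "anxiety", "health"],
]

edge_weights = cooccurrence(headlines)          # weight ~ line thickness
centrality = degree_centrality(edge_weights)    # keys of the Counter are the edges

for pair, weight in edge_weights.most_common():
    print(pair, weight)
print(centrality)
```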

Text Mining-Based Emerging Trend Analysis for e-Learning Contents Targeting for CEO (텍스트마이닝을 통한 최고경영자 대상 이러닝 콘텐츠 트렌드 분석)

  • Kyung-Hoon Kim;Myungsin Chae;Byungtae Lee
    • Information Systems Review / v.19 no.2 / pp.1-19 / 2017
  • Original scripts of e-learning lectures for the CEOs of corporation S were analyzed using topic analysis, which is a text mining method. Twenty-two topics were extracted based on the keywords chosen from five-year records that ranged from 2011 to 2015. Research analysis was then conducted on various issues. Promising topics were selected through evaluation and element analysis of the members of each topic. In management and economics, members demonstrated high satisfaction and interest toward topics in marketing strategy, human resource management, and communication. In the field of humanities, philosophy, history of war, and history drew high interest and satisfaction, whereas in the field of lifestyle, mind health drew high interest and satisfaction. Studies were also conducted to identify topics on the proportion of content, but these studies failed to increase member satisfaction. In the field of IT, educational content responds sensitively to changes of the times, but it may not increase the interest and satisfaction of members. The present study found that content production for CEOs should draw out deep implications for value innovation through technology application instead of stopping at the technical aspect of information delivery. Previous studies classified contents superficially based on the name of the content program when analyzing the status of content operation. However, text mining can derive deep content and subject classification based on the contents of unstructured script data. This approach can reveal current shortages and necessary fields if the service contents of the themes are displayed by year. This study was based on data obtained from influential e-learning companies in Korea. Obtaining practical results was difficult because data were not acquired from portal sites or social networking services. The content of e-learning trends of CEOs was analyzed. Data analysis was also conducted on the intellectual interests of CEOs in each field.

A Study on the Intelligence Information System's Research Identity Using the Keywords Profiling and Co-word Analysis (주제어 프로파일링 및 동시출현분석을 통한 지능정보시스템 연구의 정체성에 관한 연구)

  • Yoon, Seong Jeong;Kim, Min Yong
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.139-155 / 2016
  • The purpose of this study is to identify the research identity of the Korea Intelligent Information Systems Society through keyword profiling and co-word analysis of keywords collected from studies published in the most recent three years (2014 to 2016). To understand the research identity of intelligent information systems, we establish the relative position of this research by collecting and comparing the keywords and research methodologies of similar societies, namely the Korea Society of Management Information Systems and the Korea Association of Information Systems, along with those of the Korea Intelligent Information Systems Society. The Korea Intelligent Information Systems Society focuses on four research areas: artificial intelligence/data mining, intelligent Internet, knowledge management, and optimization techniques. We therefore analyze research trends in representative journals for these four research areas. For data-related journals, we examine the keywords and research methodologies of the Korean Society for Big Data Service and the Korean Journal of Big Data. Through this research, we identify recent research trends from research keywords and compare study methodologies and analysis tools. Finally, the position and orientation of current research trends in the Korea Intelligent Information Systems Society can be determined. As a result, this study reveals research areas pursued only by the Korea Intelligent Information Systems Society, thereby establishing its legitimacy and identity. This research can therefore suggest specific future research areas for intelligent information systems. Furthermore, we predict the possibility of convergence between similar research areas and the Korea Intelligent Information Systems Society from an overall ecosystem perspective.

A Study on Tourism Behavior in the New normal Era Using Big Data (빅데이터를 활용한 뉴노멀(New normal)시대의 관광행태 변화에 관한 연구)

  • Kyoung-mi Yoo;Jong-cheon Kang;Youn-hee Choi
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.167-181 / 2023
  • This study used TEXTOM, a social network analysis program, to analyze changes in tourism behavior after the travel restrictions imposed during the COVID-19 outbreak were eased. Data on the keywords 'domestic travel' and 'overseas travel' were collected from blogs, cafes, and news provided by Naver, Google, and Daum. The collection period was set from April to December 2022, when social distancing was lifted, and 2019 and 2020 were each treated as one full year and compared with 2022. A total of 80 keywords were extracted through text mining, and centrality analysis was performed using NetDraw. Finally, the correlated keywords were grouped into four clusters through CONCOR analysis. The results show that tourism behavior in 2022 was characterized by a recovery of tourism to the level before the outbreak of COVID-19, segmentation of travel based on each person's preferred theme, and selection of a tourist destination after prioritizing each country's COVID-19 mitigation policy. The study is expected to provide basic data for the development of tourism marketing strategies and tourism products for the newly emerging tourism ecosystem after COVID-19.

The Perception Analysis of Autonomous Vehicles using Network Graph (네트워크 그래프를 활용한 자율주행차에 대한 인식 분석)

  • Hyo-gyeong Park;Yeon-hwi You;Sung-jung Yong;Seo-young Lee;Il-young Moon
    • Journal of Practical Engineering Education / v.15 no.1 / pp.97-105 / 2023
  • Recently, with the development of artificial intelligence technology, many technologies for user convenience are being developed. Among them, interest in autonomous vehicles is increasing day by day. Currently, many automobile companies are aiming to commercialize autonomous vehicles. To lay the foundation for new and reasonable government policies that support commercialization, we analyzed changes in and perceptions of public opinion through news article data. In this paper, 35,891 news articles from the past three years mentioning terms similar to 'autonomous vehicles' were collected and analyzed as a network. The analysis derived major keywords such as 'autonomous driving', 'AI', 'future', 'Hyundai Motor', 'autonomous driving vehicle', 'automobile', 'industrial', and 'electric vehicle'. In addition, the autonomous vehicle industry is developing into a faster and more diverse platform and service industry by converging with various industries such as semiconductor companies and big tech companies as well as automobile companies, and attention is being paid to this convergence of industries. To keep track of changes in and perceptions of public opinion, it is necessary to continue analyzing SNS data or technology trends.

Analysis of the Research Trends by Environmental Spatial-Information Using Text-Mining Technology (텍스트 마이닝 기법을 활용한 환경공간정보 연구 동향 분석)

  • OH, Kwan-Young;LEE, Moung-Jin;PARK, Bo-Young;LEE, Jung-Ho;YOON, Jung-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.20 no.1 / pp.113-126 / 2017
  • This study aimed to quantitatively analyze the trends in environmental research that utilizes environmental geospatial information through text mining, one of the big data analysis technologies. The analysis was conducted on a total of 869 papers published in the Republic of Korea, which were collected from the National Digital Science Library (NDSL). On the basis of the classification scheme, the keywords extracted from the papers were recategorized into 10 environmental fields including "general environment", "climate", and "air quality", and 20 environmental geospatial information fields including "satellite image", "numerical map", and "disaster". With the recategorized keywords, their frequency levels and time series changes in the collected papers were analyzed, as well as the association rules between keywords. First, the results of frequency analysis showed that "general environment" (40.85%) and "satellite image" (24.87%) had the highest frequency levels among environmental fields and environmental geospatial information fields, respectively. Second, the results of the time series analysis on environmental fields showed that the share of "climate" between 1996 and 2000 was high, but since 2001, that of "general environment" has increased. In terms of environmental geospatial information fields, the demand for "satellite image" was highest throughout the period analyzed, and its utilization share has also gradually increased. Third, a total of 80 association rules were generated for environmental fields and environmental geospatial information fields. Among environmental fields, "general environment" generated the highest number of association rules (17) with environmental geospatial information fields such as "satellite image" and "digital map".

Text Mining Analysis of Media Coverage of Maritime Sports: Perceptions of Yachting, Rowing, and Canoeing (텍스트마이닝을 활용한 해양스포츠에 대한 언론 보도기사 분석: 요트, 조정, 카누를 중심으로)

  • Ji-Hyeon Kim;Bo-Kyeong Kim
    • Journal of the Korean Society of Marine Environment & Safety / v.29 no.6 / pp.609-619 / 2023
  • This study aimed to investigate the formation of the social perception of domestic maritime sports using text mining analysis of keywords and topics from domestic media coverage over the past 10 years related to representative maritime sports, including yachting, rowing, and canoeing. The results are as follows: First, term frequency (TF) and word cloud analyses identified the top keywords: "maritime," "competition," "experience," "tourism," "world," "yachting," "canoeing," "leisure," and "participation." Second, semantic network analysis revealed that yachting was correlated with terms like "maritime," "industry," "competition," "leisure," "tourism," "boat," "facilities," and "business"; rowing with terms like "competition" and "Chungju"; and canoeing with terms like "maritime," "competition," "experience," "leisure," and "tourism." Third, topic modeling analysis indicated that yachting, rowing, and canoeing are perceived as elite sports and maritime leisure sports. However, the perception of these sports was shown to have little impact on society, public opinion, and social transformation. Taken together, these results suggest that yachting and canoeing have gradually shifted from being perceived as elite sports to being seen as essential elements of the maritime leisure industry. In contrast, rowing remains primarily associated with elite sports, and its popularization as a maritime leisure sport appears limited at this time.