• Title/Summary/Keyword: dirichlet distribution

Search Result 75, Processing Time 0.021 seconds

An analysis of the change in media's reports and attitudes about face masks during the COVID-19 pandemic in South Korea: a study using Big Data latent dirichlet allocation (LDA) topic modelling (빅데이터 LDA 토픽 모델링을 활용한 국내 코로나19 대유행 기간 마스크 관련 언론 보도 및 태도 변화 분석)

  • Suh, Ye-Ryoung;Koh, Keumseok Peter;Lee, Jaewoo
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.5
    • /
    • pp.731-740
    • /
    • 2021
  • This study applied LDA topic modeling analysis to collect and analyze news media big data related to face masks in the three waves of the COVID-19 pandemic in Korea. The results empirically show that media reports focused on mask production and distribution policies in the first wave and the mandatory mask wearing in the second wave. In contrast, more reports on trivial, gossipy events consist of the media coverage in the second and third waves. The findings imply that Korea's governmental interventions to address the shortage of face masks and to regulate mask wearing were successful relatively in a short time. In contrast, the study also reports that there may be relative less number of science-based news reports like the ones on the effectiveness of face masks or different levels of filter types. This study exemplifies how a big data analysis can be applied to evaluate and enhance public health communication.

A Trend Analysis of Radiological Research in Korea using Topic Modeling (토픽모델링을 이용한 국내 방사선 학술연구 트렌드 분석)

  • Hong, Dong-Hee
    • Journal of the Korean Society of Radiology
    • /
    • v.16 no.3
    • /
    • pp.343-349
    • /
    • 2022
  • We intend to use topic modeling to identify radiation-themed papers published from 1989 to 2022 and analyze the relevance and weight between topics. This study analyzed topics derived from national subjects for 717 papers published until recently in 2022 to contribute to the revitalization of research in the field of radiation. Through text mining, overall research trends on the subject distribution of the study were analyzed, and five topics were derived through topic modeling. First, among the papers to be analyzed, a total of 1,675 words were frequency-analyzed through the preprocessing process of key words in a total of 717 papers centered on keywords. Second, as a result of analyzing topics based on the association of constituent words for five topics, it was found that studies focused on minimizing dose in the range that does not degrade image quality in the fields of radiation, image, CT clinical. In addition, it was found that various studies were mainly conducted in the MRI, and the study of ultrasound in various areas of disease analysis was actively attempted.

Analyzing the discriminative characteristic of cover letters using text mining focused on Air Force applicants (텍스트 마이닝을 이용한 공군 부사관 지원자 자기소개서의 차별적 특성 분석)

  • Kwon, Hyeok;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.75-94
    • /
    • 2021
  • The low birth rate and shortened military service period are causing concerns about selecting excellent military officers. The Republic of Korea entered a low birth rate society in 1984 and an aged society in 2018 respectively, and is expected to be in a super-aged society in 2025. In addition, the troop-oriented military is changed as a state-of-the-art weapons-oriented military, and the reduction of the military service period was implemented in 2018 to ease the burden of military service for young people and play a role in the society early. Some observe that the application rate for military officers is falling due to a decrease of manpower resources and a preference for shortened mandatory military service over military officers. This requires further consideration of the policy of securing excellent military officers. Most of the related studies have used social scientists' methodologies, but this study applies the methodology of text mining suitable for large-scale documents analysis. This study extracts words of discriminative characteristics from the Republic of Korea Air Force Non-Commissioned Officer Applicant cover letters and analyzes the polarity of pass and fail. It consists of three steps in total. First, the application is divided into general and technical fields, and the words characterized in the cover letter are ordered according to the difference in the frequency ratio of each field. The greater the difference in the proportion of each application field, the field character is defined as 'more discriminative'. Based on this, we extract the top 50 words representing discriminative characteristics in general fields and the top 50 words representing discriminative characteristics in technology fields. Second, the number of appropriate topics in the overall cover letter is calculated through the LDA. It uses perplexity score and coherence score. Based on the appropriate number of topics, we then use LDA to generate topic and probability, and estimate which topic words of discriminative characteristic belong to. Subsequently, the keyword indicators of questions used to set the labeling candidate index, and the most appropriate index indicator is set as the label for the topic when considering the topic-specific word distribution. Third, using L-LDA, which sets the cover letter and label as pass and fail, we generate topics and probabilities for each field of pass and fail labels. Furthermore, we extract only words of discriminative characteristics that give labeled topics among generated topics and probabilities by pass and fail labels. Next, we extract the difference between the probability on the pass label and the probability on the fail label by word of the labeled discriminative characteristic. A positive figure can be seen as having the polarity of pass, and a negative figure can be seen as having the polarity of fail. This study is the first research to reflect the characteristics of cover letters of Republic of Korea Air Force non-commissioned officer applicants, not in the private sector. Moreover, these methodologies can apply text mining techniques for multiple documents, rather survey or interview methods, to reduce analysis time and increase reliability for the entire population. For this reason, the methodology proposed in the study is also applicable to other forms of multiple documents in the field of military personnel. This study shows that L-LDA is more suitable than LDA to extract discriminative characteristics of Republic of Korea Air Force Noncommissioned cover letters. Furthermore, this study proposes a methodology that uses a combination of LDA and L-LDA. Therefore, through the analysis of the results of the acquisition of non-commissioned Republic of Korea Air Force officers, we would like to provide information available for acquisition and promotional policies and propose a methodology available for research in the field of military manpower acquisition.

Case Study of Big Data-Based Agri-food Recommendation System According to Types of Customers (빅데이터 기반 소비자 유형별 농식품 추천시스템 구축 사례)

  • Moon, Junghoon;Jang, Ikhoon;Choe, Young Chan;Kim, Jin Gyo;Bock, Gene
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.40 no.5
    • /
    • pp.903-913
    • /
    • 2015
  • The Korea Agency of Education, Promotion and Information Service in Food, Agriculture, Forestry and Fisheries launched a public data portal service in January 2015. The service provides customized information for consumers through an agri-food recommendation system built-in portal service. The recommendation system has fallowing characteristics. First, the system can increase recommendation accuracy by using a wide variety of agri-food related data, including SNS opinion mining, consumer's purchase data, climate data, and wholesale price data. Second, the system uses segmentation method based on consumer's lifestyle and megatrends factors to overcome the cold start problem. Third, the system recommends agri-foods to users reflecting various preference contextual factors by using recommendation algorithm, dirichlet-multinomial distribution. In addition, the system provides diverse information related to recommended agri-foods to increase interest in agri-food of service users.

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.141-156
    • /
    • 2013
  • Social media is a representative form of the Web 2.0 that shapes the change of a user's information behavior by allowing users to produce their own contents without any expert skills. In particular, as a new communication medium, it has a profound impact on the social change by enabling users to communicate with the masses and acquaintances their opinions and thoughts. Social media data plays a significant role in an emerging Big Data arena. A variety of research areas such as social network analysis, opinion mining, and so on, therefore, have paid attention to discover meaningful information from vast amounts of data buried in social media. Social media has recently become main foci to the field of Information Retrieval and Text Mining because not only it produces massive unstructured textual data in real-time but also it serves as an influential channel for opinion leading. But most of the previous studies have adopted broad-brush and limited approaches. These approaches have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system to capture the trend in real-time processing big stream datasets of Twitter. The system offers the functions of term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes of topical trend, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets which contain candidates' name and election on Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects the trend of society effectively. The system also retrieves the list of terms co-occurred by given query terms. We compare the results of term co-occurrence retrieval by giving influential candidates' name, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn' as query terms. General terms which are related to presidential election such as 'Presidential Election', 'Proclamation in Support', Public opinion poll' appear frequently. Also the results show specific terms that differentiate each candidate's feature such as 'Park Jung Hee' and 'Yuk Young Su' from the query 'Guen Hae Park', 'a single candidacy agreement' and 'Time of voting extension' from the query 'Jae In Moon' and 'a single candidacy agreement' and 'down contract' from the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic can show one of two types of patterns-Rising tendency and Falling tendencydepending on the change of the probability distribution. To determine the relationship between topic trends in Twitter and social issues in the real world, we compare topic trends with related news articles. We are able to identify that Twitter can track the issue faster than the other media, newspapers. The user network in Twitter is different from those of other social media because of distinctive characteristics of making relationships in Twitter. Twitter users can make their relationships by exchanging mentions. We visualize and analyze mention based networks of 136,754 users. We put three candidates' name as query terms-Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn'. The results show that Twitter users mention all candidates' name regardless of their political tendencies. This case study discloses that Twitter could be an effective tool to detect and predict dynamic changes of social issues, and mention-based user networks could show different aspects of user behavior as a unique network that is uniquely found in Twitter.