• Title/Summary/Keyword: IDF

Text Mining Driven Content Analysis of Social Perception on Schizophrenia Before and After the Revision of the Terminology (조현병과 정신분열병에 대한 뉴스 프레임 분석을 통해 본 사회적 인식의 변화)

  • Kim, Hyunji;Park, Seojeong;Song, Chaemin;Song, Min
    • Journal of the Korean Society for Library and Information Science / v.53 no.4 / pp.285-307 / 2019
  • In 2011, the Korean Medical Association revised the name of schizophrenia to reduce the social stigma attached to patients. Although about nine years have passed since the revision of the terminology, no study has quantitatively analyzed how much social awareness has changed. This study therefore investigates the changes in social awareness of schizophrenia caused by the revision of the disease name by analyzing Naver news articles related to the disease. For text analysis, LDA topic modeling, TF-IDF, word co-occurrence, and sentiment analysis techniques were used. The results showed that social awareness of the disease was more negative after the revision of the terminology. In addition, of the two terms in use after the revision, the former term was associated with more negative awareness. In other words, the revision of the disease name did not resolve the stigma.

Active Senior Contents Trend Analysis using LDA Topic Modeling (LDA 토픽 모델링을 이용한 액티브 시니어 콘텐츠 트렌드 분석)

  • Lee, Dongwoo;Kim, Yoosin;Shin, Eunjung
    • Journal of Internet Computing and Services / v.22 no.5 / pp.35-45 / 2021
  • The purpose of this study is to understand the characteristics and trends of active seniors. As the baby boom generation reaches old age, its members are more active than traditional seniors. These seniors are called active seniors and form a new consumer group. Many countries and companies are interested in providing relevant policies and services, but there is a lack of research on active senior trends. This study collected 8,740 posts related to active seniors on social media from January 1st, 2018 to June 31st, 2021, and conducted keyword frequency analysis, TF-IDF analysis, and LDA topic modeling. Through LDA topic modeling, topics were classified into 10 categories: lifestyle, benefits, shopping, government business, government education, health, society and economy, care industry, silver housing, and leisure. The results of this study can be used as fundamental data to help understand the academic and industrial aspects of active seniors.

Multi-Modal Based Malware Similarity Estimation Method (멀티모달 기반 악성코드 유사도 계산 기법)

  • Yoo, Jeong Do;Kim, Taekyu;Kim, In-sung;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.2 / pp.347-363 / 2019
  • Malware has its own unique behavioral characteristics, like DNA for living things. To respond to APT (Advanced Persistent Threat) attacks in advance, behavioral characteristics must be extracted from malware. To this end, each malware sample must be classified based on its behavioral similarity. In this paper, various similarities of Windows malware are estimated, and the malware family is predicted based on these similarity values. The similarity measures used in this paper are as follows: 'TF-IDF cosine similarity', 'Nilsimsa similarity', 'malware function cosine similarity', and 'Jaccard similarity'. As a result, we find that the prediction rate varies widely across similarity measures. Although no similarity measure can be applied to malware classification with high accuracy, this result can help in selecting a similarity measure for classifying a specific malware family.
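
Two of the measures named in this abstract, TF-IDF cosine similarity and Jaccard similarity, can be illustrated with a minimal sketch. The paper's actual features and preprocessing are not described in the abstract, so the API-call traces, the function names, and the smoothed IDF variant below are all assumptions made for illustration only.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption, not taken from the paper) keeps
        # weights positive even for terms present in every document.
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def jaccard(a, b):
    """Jaccard similarity: shared distinct tokens over all distinct tokens."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical API-call traces standing in for extracted malware behavior.
samples = [
    ["CreateFile", "WriteFile", "RegSetValue", "Connect"],
    ["CreateFile", "WriteFile", "RegSetValue", "Send"],
    ["OpenProcess", "VirtualAlloc", "WriteProcessMemory"],
]
vecs = tfidf_vectors(samples)
sim_01 = cosine(vecs[0], vecs[1])   # samples sharing three calls
sim_02 = cosine(vecs[0], vecs[2])   # samples sharing no calls
jac_01 = jaccard(samples[0], samples[1])
```

In this toy data the first two traces share most calls and score higher than the unrelated pair under both measures. Cosine similarity weights rarer calls more heavily, while Jaccard similarity treats every distinct call equally.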

Influencer Attribute Analysis based Recommendation System (인플루언서 속성 분석 기반 추천 시스템)

  • Park, JeongReun;Park, Jiwon;Kim, Minwoo;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.11 / pp.1321-1329 / 2019
  • With the development of social information networks, marketing methods are changing in various ways. Unlike earlier successful marketing methods based on celebrities and financial support, influencer-based marketing has become a major trend. In this paper, we first extract influencer features from more than 54 YouTube channels using multi-dimensional qualitative analysis based on YouTube meta information and comment data, and model representative themes to maximize personalized video satisfaction. In addition, this study aims to provide a supplementary means for successful promotion and marketing by creating and distributing videos of new items with reference to existing influencer features. To that end, we treat all comments on the videos of each channel as one document and apply the TF-IDF (Term Frequency and Inverse Document Frequency) and LDA (Latent Dirichlet Allocation) algorithms to maximize the performance of the proposed scheme. Based on the performance evaluation, we show that the proposed scheme outperforms other schemes.
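
The document construction described in this abstract, where all comments of a channel form one document, can be sketched as follows. This is only an illustration under stated assumptions: the channel names, comments, whitespace tokenization, and unsmoothed IDF are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (channel, comment) pairs; real input would be YouTube comments.
comments = [
    ("beauty_ch", "love this lipstick review"),
    ("beauty_ch", "great lipstick shade"),
    ("gaming_ch", "great boss fight walkthrough"),
    ("gaming_ch", "this boss is hard"),
]

# Merge every comment of a channel into a single token list,
# so that each channel becomes exactly one document.
channel_docs = defaultdict(list)
for channel, text in comments:
    channel_docs[channel].extend(text.lower().split())

n_docs = len(channel_docs)
# Number of channel documents in which each term appears.
doc_freq = Counter(t for tokens in channel_docs.values() for t in set(tokens))

def top_terms(channel, k=3):
    """Rank a channel's terms by TF-IDF, breaking ties alphabetically."""
    tokens = channel_docs[channel]
    tf = Counter(tokens)
    scores = {
        t: (c / len(tokens)) * math.log(n_docs / doc_freq[t])
        for t, c in tf.items()
    }
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Calling `top_terms("beauty_ch")` ranks terms specific to that channel first, while terms used in every channel receive a weight of zero under this unsmoothed IDF and fall to the bottom of the ranking.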

Analysis of Sustainability Report Content Using GRI: Public and Private Enterprise Perspective (GRI를 이용한 지속가능보고서 구성 분석: 공,사 기업 관점으로)

  • Yun, Ji Hye;Lee, Jong Hwa
    • Knowledge Management Research / v.23 no.3 / pp.153-171 / 2022
  • With the global ESG management craze, domestic and foreign companies voluntarily declare sustainable management and actively respond by establishing strategies. The Financial Services Commission is mandating the disclosure of sustainability reports representing ESG management in stages and will expand the mandate to SMEs in the future. Sustainability reports are mainly disclosed according to international standards such as GRI, SASB, and TCFD, and many domestic companies use the GRI Standards guidelines. This study examines the composition of sustainability reports and compares public and private companies against the GRI Standards to analyze sustainable management by type. Through TF-IDF modeling, this study revealed that public enterprises focused on social and labor topics, while private enterprises focused on the economy and the environment. In addition, the electronic and information communication industries focused on product responsibility. Unlike previous studies that quantified and analyzed sustainability management according to grade, the current study analyzed sustainability reports, which are unstructured data. Therefore, the results of this study are expected to provide valuable theoretical and practical implications for researchers and supervisors interested in ESG management.

Effects of Duration and Time Distribution of Probability Rainfall on Paddy Fields Inundation (설계강우의 지속시간 및 시간분포에 따른 배수개선 농경지 침수 영향 분석)

  • Jun, Sang-Min;Kim, Kwi-Hoon;Lee, Hyunji;Kang, Ki-Ho;Yoo, Seung-Hwan;Choi, Jin-Yong;Kang, Moon-Seong
    • Journal of The Korean Society of Agricultural Engineers / v.64 no.2 / pp.47-55 / 2022
  • The objective of this study was to analyze the effect of the duration and time distribution of probability rainfall on farmland inundation for paddy fields in drainage improvement project sites. In this study, eight drainage improvement project sites were selected for inundation modeling. Hourly rainfall data were collected, and 20- and 30-year frequency probability rainfalls were estimated for 14 different durations. Probability rainfalls were distributed using Intensity-Duration-Frequency (IDF) and Huff time distribution methods. Design floods were calculated for 48 hr and critical duration, and IDF time distribution and Huff time distribution were used for 48 hr duration and critical duration, respectively. Inundation modeling was carried out for each study district using 48 hr and critical duration rainfalls. The results showed that six of the eight districts had a larger flood discharge when critical duration and the Huff distribution were applied. The results of inundation depth analysis showed similar trends to those of design flood calculations. However, the inundation durations showed different tendencies from the inundation depth. The IDF time distribution concentrates most of the rainfall at the beginning of the rainfall event, and its theoretical background is unclear. It is considered desirable to apply critical duration and Huff time distribution to agricultural production infrastructure design standards in consideration of uniformity with other design standards such as flood calculation standard guidelines.

Analysis of Public Perception and Policy Implications of Foreign Workers through Social Big Data analysis (소셜 빅데이터분석을 통한 외국인근로자에 관한 국민 인식 분석과 정책적 함의)

  • Ha, Jae-Been;Lee, Do-Eun
    • Journal of Digital Convergence / v.19 no.11 / pp.1-10 / 2021
  • This paper examines public awareness of foreign workers on social platforms using text mining, a big data technique, and draws suggestions regarding foreign workers. To this end, data were collected with the search keyword 'Foreign Worker' from Jan. 1 to Dec. 31, 2020; frequency analysis, TF-IDF analysis, and degree centrality analysis were performed; and the top 100 keywords were extracted for comparison. Furthermore, Ucinet6.0 and Netdraw were used to analyze semantic networks, and through CONCOR analysis, the data were clustered into the following eight groups: foreigner policy issue, regional community issue, business owner's perspective issue, employment issue, working environment issue, legal issue, immigration issue, and human rights issue. Based on these results, the study identified national awareness of foreign workers and the main issues, and provided basic data for policy proposals on foreign workers and for related research.

Analysis of Traffic Improvement Measures in Transportation Impact Assessment Using Text Mining : Focusing on City Development Projects in Gyeonggi Province (텍스트마이닝을 활용한 교통영향평가 교통개선대책 분석 : 경기도 도시개발사업을 대상으로)

  • Eun Hye Yang;Hee Chan Kang;Woo-Young Ahn
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.22 no.2 / pp.182-194 / 2023
  • Traffic impact assessment plays a crucial role in resolving traffic issues that may arise during the implementation of urban and transportation projects. However, reported results diverge, presumably because the items reviewed differ. In this study, we analyze traffic improvement measures approved for traffic impact assessment, identify key items, and present items that should be included in assessments. Specifically, TF-IDF and N-gram analysis and text mining were performed with focus on urban development projects approved in Gyeonggi Province. The results obtained show that keywords associated with newly established transportation infrastructure, such as roads and intersections, were essential assessment items, followed by the locations of entrances and exits and pedestrian connectivity. We recommend that considerations of the items presented in this study be incorporated into future traffic impact assessment guidelines and standards to improve the consistency and objectivity of the assessment process.

A Case Study on Text Analysis Using Meal Kit Product Review Data (밀키트 제품 리뷰 데이터를 이용한 텍스트 분석 사례 연구)

  • Choi, Hyeseon;Yeon, Kyupil
    • The Journal of the Korea Contents Association / v.22 no.5 / pp.1-15 / 2022
  • In this study, text analysis was performed on meal kit product review data to identify factors affecting the evaluation of meal kit products. The data used for the analysis were collected by scraping 334,498 reviews of meal kit products on the Naver shopping site. After preprocessing the text data, word clouds and sentiment analyses based on word frequency and normalized TF-IDF were performed. A logistic regression model was applied to predict the polarity of reviews on meal kit products. From the logistic regression models derived for each product category, the main factors that caused positive and negative emotions were identified. As a result, it was verified that text analysis can be a useful tool that provides a basis for maximizing positive factors for a specific category, menu, and material and removing negative risk factors when developing a meal kit product.

Metaverse Platform Customer Review Analysis Using Text Mining Techniques (텍스트 마이닝 기법을 활용한 메타버스 플랫폼 고객 리뷰 분석)

  • Hye Jin Kim;Jung Seung Lee;Soo Kyung Kim
    • Journal of Information Technology Applications and Management / v.31 no.1 / pp.113-122 / 2024
  • This comprehensive study delves into the analysis of user review data across various metaverse platforms, employing advanced text mining techniques such as TF-IDF and Word2Vec to gain insights into user perceptions. The primary objective is to uncover the factors that contribute to user satisfaction and dissatisfaction, thereby providing a nuanced understanding of user experiences in the metaverse. Through TF-IDF analysis, the research identifies key words and phrases frequently mentioned in user reviews, highlighting aspects that resonate positively with users, such as the ability to engage in creative activities and social interactions within these virtual environments. Word2Vec analysis further enriches this understanding by revealing the contextual relationships between words, offering a deeper insight into user sentiments and the specific features that enhance their engagement with the platforms. A significant finding of this study is the identification of common grievances among users, particularly related to the processes of refunds and login, which point to broader issues within payment systems and user interface designs across platforms. These insights are critical for developers and operators of metaverse platforms, suggesting a focused approach towards enhancing user experiences by amplifying positive aspects. The research underscores the importance of continuous improvement in user interface design and the transparency of payment systems to foster a loyal user base. By providing a comprehensive analysis of user reviews, this study offers valuable guidance for the strategic development and optimization of metaverse platforms, ensuring they remain responsive to user needs and continue to evolve as vibrant, engaging virtual environments.