• Title/Summary/Keyword: 단어빈도 (word frequency)

Data Analysis Research to Analyze the Cause of Low Birth Rate (저출산 원인 확인을 위한 데이터 분석연구)

  • Lee, Jeongwon;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.496-498 / 2021
  • In Korea, the total population grew steadily on the strength of high fertility rates before 1980, but since the mid-1980s the fertility rate has fallen sharply and is now below the population replacement level. Low birth rates in a region are not simply a matter of voluntary refusal to have children; the structural causes within the local community need to be identified from various angles. Focusing on the budget area, which has a very low fertility rate among various areas, we collected local Internet news and data from a representative local online cafe in which many mothers participate. Factors inhibiting childbirth were analyzed using the frequency of co-occurring words that became issues related to population decline, low birthrate, and child-rearing welfare.
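  The co-occurrence frequency analysis mentioned in this abstract could be sketched roughly as follows. This is only an illustration under assumptions: the documents are taken to be already tokenized, and the function name `cooccurrence_counts` and the sample tokens are hypothetical rather than taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count in how many documents each unordered word pair appears together."""
    counts = Counter()
    for tokens in docs:
        # Use the set of unique tokens so a pair counts once per document.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts


# Hypothetical, already-tokenized posts (illustrative only, not study data).
docs = [
    ["childcare", "cost", "housing"],
    ["childcare", "cost", "job"],
    ["housing", "job"],
]
pairs = cooccurrence_counts(docs)
print(pairs.most_common(3))
```

  Counting each pair once per document keeps a single long post from dominating the totals; counting within a sliding window would be an equally reasonable alternative.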

Convolutional Neural Network-based Malware Classification Method utilizing Local Feature-based Global Image (로컬 특징 기반 글로벌 이미지를 사용한 CNN 기반의 악성코드 분류 방법)

  • Jang, Sejun;Sung, Yunsick
    • Proceedings of the Korea Information Processing Society Conference / 2020.05a / pp.222-223 / 2020
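  • Damage caused by malware has been increasing recently. Because the appropriate response differs according to the type of malware, research on classifying malware by type is also important. Existing approaches classify malware by type using a global image of the malware generated through a malware visualization process. The global image is generated from binary information extracted from the malware. However, when malware is classified using only the global image, classification accuracy drops because the features that matter for each type of malware are not considered. This paper proposes a malware classification method that uses a local feature-based global image, in which the features of each malware type are represented in the malware's global image. First, a binary is extracted from the malware and used to generate a global image. Second, local features are extracted from the malware, and the key local features for each malware type are selected using the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm. Third, the key features of each malware family are converted to pixels and applied to the generated global image. Fourth, a convolutional model is trained on the resulting local feature-based global images, and the trained convolutional model is used to classify malware by type.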
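  The second step, selecting key local features per malware type with TF-IDF, might look something like the sketch below. It uses the textbook weighting tf × log(N/df) and treats the pooled features of each family as one document; both are assumptions, since the abstract does not give the exact formulation. The function name `top_tfidf_features` and the opcode-like tokens are hypothetical.

```python
import math
from collections import Counter


def top_tfidf_features(family_docs, k=2):
    """Return the k highest-scoring TF-IDF features for each family.

    family_docs maps a family name to the list of local features
    (for example opcode tokens) pooled from that family's samples.
    """
    n_docs = len(family_docs)
    # Document frequency: in how many families each feature appears.
    df = Counter()
    for feats in family_docs.values():
        df.update(set(feats))

    selected = {}
    for family, feats in family_docs.items():
        tf = Counter(feats)
        total = len(feats)
        scores = {
            f: (c / total) * math.log(n_docs / df[f])
            for f, c in tf.items()
        }
        # Sort by score (descending), then by name for a stable order.
        ranked = sorted(scores, key=lambda f: (-scores[f], f))
        selected[family] = ranked[:k]
    return selected


# Hypothetical feature lists (illustrative only).
family_docs = {
    "family_a": ["mov", "mov", "xor", "call"],
    "family_b": ["mov", "push", "push", "call"],
    "family_c": ["mov", "jmp", "jmp", "jmp"],
}
selected = top_tfidf_features(family_docs, k=1)
print(selected)
```

  A feature such as `mov` that appears in every family gets an IDF of zero, so it is never selected as a distinguishing feature even when it is frequent.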

Extracting User-Specific Advertising Keywords Based on Textual Data Mining from KakaoTalk (카카오톡에서의 텍스트 데이터 마이닝 기반의 사용자별 적합 광고 키워드 도출)

  • Yerim Jeon;Dayeong So;Jimin Lee;Eunjin (Jinny) Jo;Jihoon Moon
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.368-369 / 2023
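  • Advertisement recommendation based on conversation data is drawing attention in advertising and marketing as an important technology for delivering customer-tailored ads and maximizing marketing effectiveness. This paper analyzes text data generated in chat rooms of KakaoTalk, a mobile instant messenger, and proposes suitable advertising keywords for each conversation topic. To this end, conversations are divided by topic into beauty, food and beverage, and commerce; the text is preprocessed with Okt from KoNLPy; and keyword frequencies are extracted and presented as word clouds. In addition, conversation topics are further subdivided using Latent Dirichlet Allocation (LDA), and topic-specific conversation keywords are analyzed through labeling. In the experiments, the conversation topics could be divided into online shopping, hair, beauty care, and food, and suitable advertising keywords could be proposed by using Word2Vec to derive keywords similar to specific words from the top keywords of each topic.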

Properties and Quantitative Analysis of Bias in Korean Language Models: A Comparison with English Language Models and Improvement Suggestions (한국어 언어모델의 속성 및 정량적 편향 분석: 영어 언어모델과의 비교 및 개선 제안)

  • Jaemin Kim;Dong-Kyu Chae
    • Annual Conference on Human and Language Technology / 2023.10a / pp.558-562 / 2023
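  • With the recent emergence of ChatGPT, interest in text generation models has grown, and research on metrics for evaluating performance on text generation tasks is being actively conducted. Because traditional metrics based on word frequency cannot account for semantic similarity, BERTScore, a metric that uses pre-trained language models, has mainly been used. However, this approach raises fairness concerns because of biases present in the data on which the pre-trained language models were trained. Analysis of bias in Korean pre-trained language models is therefore needed, but existing studies of bias in Korean pre-trained language models are limited in that they did not consider the biases across the various attributes that arise in society. Studies that compare attribute-specific biases across pre-trained language models based on different languages have also been lacking. This paper therefore analyzes the attribute-specific biases of Korean pre-trained language models, compares them with the attribute-specific biases of English pre-trained language models, and constructs a comparable dataset. In addition, through bias analysis by type and size of Korean pre-trained language models, it provides a guide for selecting a suitable model.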

Trend Properties and a Ranking Method for Automatic Trend Analysis (자동 트렌드 탐지를 위한 속성의 정의 및 트렌드 순위 결정 방법)

  • Oh, Heung-Seon;Choi, Yoon-Jung;Shin, Wook-Hyun;Jeong, Yoon-Jae;Myaeng, Sung-Hyon
    • Journal of KIISE:Software and Applications / v.36 no.3 / pp.236-243 / 2009
  • With advances in topic detection and tracking (TDT), automatic trend analysis from a collection of time-stamped documents, such as patents, newspapers, and blog pages, is a challenging research problem. Past research in this area has mainly focused on showing a trend line over time for a given concept by measuring the strength of trend-associated term frequency information. For detection of emerging trends, either a simple criterion such as frequency change was used, or an overall comparison was made against training data. We note that in order to show the most salient trends detected among many possibilities, it is critical to devise a ranking function. To this end, we define four properties (change, persistency, stability, and volume) of trend lines drawn from frequency information, to quantify various aspects of trends, and propose a method by which trend lines can be ranked. The properties are examined individually and in combination in a series of experiments for their validity using the ranking algorithm. The results show that a judicious combination of the four properties is a better indicator of salient trends than any single criterion used in the past for ranking or detecting emerging trends.

The Relationship Between Korean Handwriting Skill and Visual Fixation (비장애 아동의 한글쓰기 숙련도와 시선고정 간의 관련성)

  • Hong, Mi Young;Lee, Cho Hee;Kim, Eunbin;Lee, Onseok;Kim, Eun Young
    • The Journal of Korean Academy of Sensory Integration / v.17 no.1 / pp.1-8 / 2019
  • Objective: This paper aimed to measure the relationship between Korean handwriting performance and visual fixation in children. Methods: Twenty-one typically developing children aged 7 to 9 years participated in the study. The children performed a Korean handwriting task while wearing Tobii Pro Glasses 2. The Korean handwriting task consisted of 10 words from elementary school textbooks. Handwriting skill was measured by the coefficient of variation of the letter size, and visual fixation by the fixation count and duration. Correlation analysis was performed to investigate the relation between visual fixation and the coefficient of variation of the letter size. Results: The visual fixation count per second was positively correlated with the coefficient of variation of the vertical size of Korean handwriting, indicating that the more consistent the vertical size of the letters, the smaller the fixation count per second. Conclusion: The results suggested a relation between Korean handwriting performance and visual fixation in typically developing children. This study is the first attempt to measure eye movement during the Korean handwriting process, and suggests a future direction for research on students' development in writing Korean.
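  The two quantities this abstract relies on, the coefficient of variation of letter size and its correlation with fixation counts, can be sketched as below. This is a minimal illustration, assuming the sample standard deviation and a Pearson correlation (the abstract does not say which correlation coefficient was used); the helper names and the numbers are hypothetical, not study data.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical vertical letter sizes (mm) for three children.
letter_heights = [
    [10.0, 12.0, 8.0],
    [10.0, 11.0, 9.0],
    [10.0, 10.5, 9.5],
]
# Hypothetical fixation counts per second for the same children.
fixations_per_sec = [3.0, 2.0, 1.5]

cv_values = [coefficient_of_variation(h) for h in letter_heights]
r = pearson_r(cv_values, fixations_per_sec)
print(cv_values, r)
```

  A lower coefficient of variation means more consistent letter size, so a positive correlation here corresponds to the abstract's reading that more consistent writers fixate less often per second.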

A Study on the Feature Point Extraction Methodology based on XML for Searching Hidden Vault Anti-Forensics Apps (은닉형 Vault 안티포렌식 앱 탐색을 위한 XML 기반 특징점 추출 방법론 연구)

  • Kim, Dae-gyu;Kim, Chang-soo
    • Journal of Internet Computing and Services / v.23 no.2 / pp.61-70 / 2022
  • Ordinary smartphone users often use Vault apps to protect personal information such as their photos and videos. However, there are increasing cases of criminals using Vault app functions for anti-forensic purposes to hide illegal videos. These apps are among those registered on Google Play. This paper proposes a methodology that applies text mining techniques to extract feature points through XML-based keyword frequency analysis, in order to explore Vault apps used by criminals. XML syntax was compared and analyzed using strings.xml files included in the app for 15 hidden Vault anti-forensics apps and non-hidden Vault apps, respectively. In hidden Vault anti-forensics apps, more hidden-related words are found at a higher frequency in the first and second rounds of terminology processing. Unlike most conventional methods, which statically analyze APK files from an engineering point of view, this paper is meaningful in that it takes a humanities and sociological point of view to find features for classifying anti-forensics apps. In conclusion, applying text mining techniques through XML parsing can provide basic data for exploring hidden Vault anti-forensics apps.
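  The keyword frequency analysis over strings.xml described in this abstract could be approximated with the sketch below. It is an illustration only: it assumes the file has already been extracted from the APK in plain-text form, the function name `keyword_frequencies` is hypothetical, and the sample resource strings are invented. The tokenizer handles only ASCII letters, and the paper's terminology-processing rounds are not reproduced.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def keyword_frequencies(strings_xml):
    """Count lowercase word tokens in the <string> values of a strings.xml document."""
    root = ET.fromstring(strings_xml)
    counts = Counter()
    for node in root.iter("string"):
        # itertext() also picks up text nested in inline markup such as <b>.
        text = "".join(node.itertext())
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


# Invented sample resource file (illustrative only).
sample = """<resources>
    <string name="app_name">Calculator</string>
    <string name="hint">Enter PIN to unlock hidden photos</string>
    <string name="menu_hide">Hide photos and hidden videos</string>
</resources>"""

freq = keyword_frequencies(sample)
print(freq.most_common(3))
```

  Comparing such counts between apps known to hide content and ordinary Vault apps is one way the frequency difference reported in the abstract could be examined.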

A Study on the Learning Effect and Satisfaction of Practical Classes for Students Majoring in Radiology in a Non-face-to-face Class Environment (방사선학 전공 학생의 비대면 전공 실습 수업에 대한 학습효과와 만족도에 관한 고찰)

  • Sung-Jin, Kang
    • Journal of the Korean Society of Radiology / v.16 no.7 / pp.995-1006 / 2022
  • The purpose of this study was to use a survey to investigate the current status of practical course operation in a non-face-to-face online environment, its learning effects, and the experiences and perceptions of students majoring in radiology. The questionnaire consisted of a total of 34 items in 5 areas: general characteristics of subjects, current learning participation in non-face-to-face environments, learning satisfaction, learning outcomes, and improvements and requirements. For the analysis of the questionnaire responses, frequency analysis was performed on the response frequency, ratio, and scale for each item. Based on the general characteristics of the survey respondents, cross-analysis using the chi-square test was performed for participation in non-face-to-face learning, learning performance, and learning satisfaction. Improvements and requirements were qualitatively analyzed by the repetition frequency of words with the same meaning. By analyzing the responses to a total of 397 questionnaires, the study identified directions for the future design and development of practical classes in a non-face-to-face environment, along with basic information and implications for their efficient operation. Based on this, continued thought and effort are needed for the efficient operation of non-face-to-face practice classes in the post-corona era.

Study on Korea Social Perceptions on the Forest Fires of Newspaper Analysis (신문사설 분석을 통한 산불에 대한 사회적 인식연구)

  • Kim, Bomi;Park, Joowon
    • Journal of Korean Society of Forest Science / v.108 no.1 / pp.88-96 / 2019
  • The purpose of this study is to understand when forest fire as a natural phenomenon becomes constructed as a social issue in Korea; how the forest fire-related discourses in editorials, which reflect social perceptions, have changed regarding the principal subject and the measures of forest fire management; and whether the social perception of forest fire affects the forest fire policy of the state. A total of 44 editorials related to forest fires from 1988 to 2017 were analyzed. The social perceptions in these editorials were examined through the categories of forest fire editorials, main keywords, and the content of social perceptions on 'the main subject responsible for forest fire management' and 'forest fire prevention measures,' using categorization, frequency analysis, and context analysis of the words used. It is found that in the first period, forest fire management measures were recognized as a part of overall forest management. In the second period, ecological management approaches emerged in forest fire management. As forest fires were managed as a type of social disaster during the third period, the perception that the state should protect the people from forest fires was gradually reinforced. In the 3rd, 4th, and 5th National Forest Plans, the forest fire management policy of each period focused on enlightening the general public, protecting forest resources and ecosystems, and preventing loss of lives, respectively. Comparing the social perceptions identified through the analysis of editorials on forest fires with the forest fire policies, it is found that the social perception of forest fire and the forest fire management plans have changed in interconnected ways.

Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.163-176 / 2014
  • Social media is becoming the platform for users to communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs, such as Twitter, have gained in popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' efforts and investment for content generation by recommending shorter posts. There has been a lot of research into capturing social phenomena and analyzing the chatter of microblogs. However, measuring television ratings has been given little attention so far. Currently, the most common method to measure TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch. In a similar way, microblog users are interacting with each other while watching television or movies, or visiting a new place. In order to measure TV ratings, some features are significant during certain hours of the day, or days of the week, whereas these same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model capturing the time-sensitive relevance is required to estimate TV ratings. Therefore, modeling time-related characteristics of features should be key when measuring TV ratings through microblogs. We show that capturing the time-dependency of features in measuring TV ratings is vitally necessary for improving their accuracy. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set for the experiment. After excluding data such as advertising or promoted tweets, we selected 149 thousand tweets for analysis.
The number of tweets reaches its maximum level on the broadcasting day and increases rapidly around the broadcasting time. This result stems from the characteristics of the public channel, which broadcasts the program at the predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings. This result implies that a simple tweet rate does not reflect the satisfaction or response to the TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons or newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We find that there is a time-dependency in the correlation of features between the periods before and after the broadcasting time. Since the TV program is broadcast at the predetermined time regularly, users post tweets expressing their expectation for the program or disappointment over not being able to watch the program. The highly correlated features before the broadcast are different from the features after broadcasting. This result shows that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have the highest correlation before the broadcasting time, whereas 68 words reach the highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show a high relevance, despite containing a negative meaning. Understanding the time-dependency of features can be helpful in improving the accuracy of TV ratings measurement. This research contributes a basis to estimate the response to or satisfaction with the broadcasted programs using the time dependency of words in Twitter chatter. More research is needed to refine the methodology for predicting or measuring TV ratings.