• Title/Summary/Keyword: Frequency based Text Analysis

Selecting a key issue through association analysis of realtime search words (실시간 검색어 연관 분석을 통한 핵심 이슈 선정)

  • Chong, Min-Yeong
    • Journal of Digital Convergence / v.13 no.12 / pp.161-169 / 2015
  • Typical portal sites refresh their realtime search words every few seconds, listing them in descending order of search frequency to show which issues are rapidly gaining interest. However, because the list is reordered within such a short time, it can pass over the key issues of the day. This paper proposes a method for deriving a key issue through association analysis of realtime search words. The proposed method first scores realtime search words according to their ranking and relative interest, and derives the top 10 search words through descriptive statistics for groups. It then extracts association rules based on 'support' and 'confidence', and chooses the key issue from a graph that visualizes the resulting rules. The experimental results show that the key issue derived through association rules is more meaningful than the top-ranked realtime search word.
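
The 'support' and 'confidence' measures named in the abstract above can be illustrated with a minimal sketch. The abstract does not give the paper's scoring or rule-mining procedure, so the helper names (`support`, `confidence`, `pair_rules`) and the sample snapshots below are hypothetical, and only single-word rules are considered.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

def pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples for
    one-word -> one-word rules that pass both thresholds."""
    items = sorted({item for t in transactions for item in t})
    rules = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            s = support(transactions, {x, y})
            c = confidence(transactions, {x}, {y})
            if s >= min_support and c >= min_confidence:
                rules.append((x, y, s, c))
    return rules

# Hypothetical data: each set holds the search words seen in one time window.
snapshots = [
    {"typhoon", "school closure", "flight delay"},
    {"typhoon", "school closure"},
    {"typhoon", "flight delay"},
    {"concert", "ticket"},
]

for rule in pair_rules(snapshots, min_support=0.5, min_confidence=1.0):
    print(rule)
```

In this toy data, a word that many high-confidence rules point to ("typhoon") would be a natural candidate for the key issue, which is the intuition behind choosing from the rule graph rather than from the top-ranked word alone.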

Prediction Techniques for Difficulty Level of Hanja Using Multiple Linear Regression (다중 회귀 분석을 이용한 한자 난이도 예측 기법 연구)

  • Choi, Jeongwhan;Noh, Jiwoo;Kim, Suntae
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.6 / pp.219-225 / 2019
  • The existing method of selecting difficulty levels for Hanja characters has a problem: some of the Hanja characters it selects differ from the Sino-Korean words used in real life, and it is impossible to know how often each character is used. To solve this problem, we measure the difficulty of Hanja characters using multiple regression analysis with frequencies as features. FWS and FHU are counted from elementary school textbooks. A questionnaire built from the two frequencies and the stroke count asks for the appropriate time to learn each Hanja character, and the answers are used as the target variable for regression. We use stepwise regression to select the appropriate features and then perform multiple linear regression. The R² score of the model was 0.1105 and the RMSE was 0.1105.
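
As a rough illustration of the regression step and the two reported metrics, the sketch below fits an ordinary least squares model with two features and computes R² and RMSE. It is not the authors' code: the feature values are made up, the stepwise selection step is omitted, and `fit_linear`, `predict`, `r2_score`, and `rmse` are hypothetical helper names. The solver assumes the features are not collinear.

```python
import math

def fit_linear(X, y):
    """Ordinary least squares via the normal equations, with an intercept.
    Returns [intercept, coef_1, ..., coef_k]. Assumes X^T X is invertible."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

def predict(coeffs, x):
    """Apply the fitted model to one feature vector."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up data: two frequency-like features per character and a difficulty score.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1.2, 2.9, 4.1, 6.0, 7.8]

coeffs = fit_linear(X, y)
y_hat = [predict(coeffs, x) for x in X]
print(coeffs, r2_score(y, y_hat), rmse(y, y_hat))
```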

Identifying Interdisciplinary Trends of Humanities, Sociology, Science and Technology Research in Korea Using Topic Modeling and Network Analysis (인문사회 과학기술 분야 연구의 학제적 동향 분석 : 토픽 모델링과 네트워크 분석의 활용)

  • Choi, Jaewoong;Jang, Jaehyuk;Kim, Dae Hwan;Yoon, Janghyeok
    • Journal of Korean Society of Industrial and Systems Engineering / v.42 no.1 / pp.74-86 / 2019
  • As many existing research fields have matured academically, researchers have encountered a number of academic, social, and other problems that cannot be addressed with the internal knowledge and methodologies of existing disciplines. Pioneering researchers are therefore following a new paradigm that breaks the boundaries between prior disciplines, fuses them, and seeks new approaches. Moreover, developed countries including Korea are actively supporting and fostering convergence research at the national level. Nevertheless, there is insufficient research analyzing convergence trends in national R&D support projects and the content those projects mainly deal with. This study therefore collected and preprocessed research proposal data from the National Research Foundation of Korea, transforming the proposal documents into term-frequency matrices. Based on the matrices, this study derived detailed research topics through Latent Dirichlet Allocation, a topic modeling algorithm. Next, this study identified the research topics each proposal mainly deals with, visualized the convergence relationships, and analyzed them quantitatively. Specifically, in addition to visualizing the convergence relationships and analyzing the time-varying number of research proposals per topic, this study analyzed the centralities of the detailed research topics to derive clues about near-future convergence. The results can give researchers specific insights on research direction and can be used to monitor domestic convergence R&D trends by year.
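
The term-frequency matrix that feeds the topic model can be sketched as follows. This is only an illustration under simplifying assumptions: tokens are split on whitespace (Korean proposal text would need morphological analysis instead), the sample documents are invented, and the Latent Dirichlet Allocation step itself is not shown. `term_frequency_matrix` is a hypothetical helper name.

```python
from collections import Counter

def term_frequency_matrix(documents):
    """Return (vocabulary, matrix), where matrix[d][t] is the number of times
    vocabulary[t] occurs in documents[d]."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix

# Invented stand-ins for preprocessed proposal texts.
proposals = [
    "topic model topic",
    "network model",
]

vocab, tf = term_frequency_matrix(proposals)
print(vocab)
for row in tf:
    print(row)
```

Each row is one document and each column one term; a topic modeling library would take a matrix of this shape as input.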

Evaluating real-time search query variation for intelligent information retrieval service (지능 정보검색 서비스를 위한 실시간검색어 변화량 평가)

  • Chong, Min-Young
    • Journal of Digital Convergence / v.16 no.12 / pp.335-342 / 2018
  • The search service, a core service of portal sites, presents rapidly rising search queries based on the highest instantaneous search frequency, so it is difficult to immediately surface a query that holds a high degree of interest over a certain period. It is therefore necessary to overcome this problem and to provide a more intelligent information retrieval service by delivering improved analysis of changes in search queries. In this paper, we present criteria for measuring the interest, continuity, and attention of real-time search queries. According to these criteria, we measure and summarize changes in real-time search queries by hour, day, week, and month over a period of time to assess issues of high interest, long-lasting issues of interest, and issues that need attention in the future.

Sensitivity of abacus and Chasdaq in the Chinese stock market through analysis of Weibo sentiment related to Corona-19 (코로나-19관련 웨이보 정서 분석을 통한 중국 주식시장의 주판 및 차스닥의 민감도 예측 기법)

  • Li, Jiaqi;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.1 / pp.1-7 / 2021
  • Investor mood on social media is gaining increasing attention for its role in leading price movements in the stock market. Based on behavioral finance theory, this study argues that sentiment extracted from social media using big data techniques can predict real-time (short-run) price momentum in the Chinese stock market. We collect Sina Weibo posts related to COVID-19 using a keyword method and extract daily influence-weighted sentiment factors from the sizable raw data of over 2 million posts. We examine one supervised and four unsupervised sentiment analysis models, and use the best-performing word-frequency and BiLSTM models. The test results show similar movement between stock price changes and the sentiment factor. This indicates that public mood extracted from social media can to some extent represent investors' sentiment and can affect stock market fluctuations when people are focused on a special event that can influence the stock market.

A Suggestion and an analysis on Changes on trend of the 'Virtual Tourism' before and after the Covid 19 Crisis using Textmining Method (텍스트 마이닝을 활용한 '가상관광'의 코로나19 전후 트렌드 분석 및 방향성 제언)

  • Sung, Yun-A
    • Journal of the Korea Convergence Society / v.13 no.4 / pp.155-161 / 2022
  • The outbreak of Covid 19 increased interest in 'Virtual Tourism'. In this research, keywords related to "Virtual Tourism" were collected through a search engine and analyzed with data mining methods such as log-odds ratio, frequency, and network analysis. The results show that dependency on information and communication increased in the field of "Virtual Tourism" after Covid 19, and that the trend changed from "securement of the contents diversity" to "project related to economic recovery." As demand for "Virtual Reality" such as the metaverse increases, an economic and circular structure is needed in which the government establishes related policy and funding plans based on research; local governments and private companies plan and produce differentiated content focusing on AISAS (Attention, Interest, Search, Action, Share); and research institutions and universities develop, apply, assess, and commercialize the technology.
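
A log-odds ratio comparison of keyword use between two periods can be sketched as below. The abstract does not state which formula the study used, so this assumes a common add-one smoothed variant, log(((n_after + 1) / (N_after + 1)) / ((n_before + 1) / (N_before + 1))), and the token lists are invented for illustration. `log_odds_ratio` is a hypothetical helper name.

```python
import math
from collections import Counter

def log_odds_ratio(before_tokens, after_tokens):
    """Map each word to a smoothed log ratio of its relative frequency after
    versus before. Positive values mean the word is more typical of 'after'."""
    before, after = Counter(before_tokens), Counter(after_tokens)
    n_before, n_after = sum(before.values()), sum(after.values())
    scores = {}
    for word in set(before) | set(after):
        ratio_after = (after[word] + 1) / (n_after + 1)
        ratio_before = (before[word] + 1) / (n_before + 1)
        scores[word] = math.log(ratio_after / ratio_before)
    return scores

# Invented keyword lists for the two periods.
before_covid = ["content", "content", "diversity"]
after_covid = ["economy", "recovery", "content"]

scores = log_odds_ratio(before_covid, after_covid)
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {score:+.3f}")
```

Sorting by score puts the words most characteristic of the later period first and those most characteristic of the earlier period last.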

User Centered Interface Design of Web-based Attention Testing Tools: Inhibition of Return(IOR) and Graphic UI (웹 기반 주의력 검사의 사용자 인터페이스 설계: 회귀억제 과제와 그래픽 UI를 중심으로)

  • Kwahk, Ji-Eun;Kwak, Ho-Wan
    • Korean Journal of Cognitive Science / v.19 no.4 / pp.331-367 / 2008
  • This study aims to validate a web-based neuropsychological testing tool developed by Kwak (2007) and to suggest solutions to potential problems that can undermine its validity. When it targets a wider range of subjects, a web-based neuropsychological testing tool is challenged by high drop-out rates, lack of motivation, lack of interactivity with the experimenter, fear of computers, etc. As a possible solution to these threats, this study redesigns the user interface of a web-based attention testing tool through three phases of study. In Study 1, an extensive analysis of Kwak's (2007) attention testing tool was conducted to identify potential usability problems. Three usability experts used the Heuristic Walkthrough (HW) method to review various design features. As a result, many problems were found throughout the tool. The findings showed that the design of instructions, user information survey forms, task screen, results screen, etc. did not conform to the needs of users and their tasks. In Study 2, 11 guidelines for the design of web-based attention testing tools were established based on the findings from Study 1. The guidelines were used to optimize the design and organization of the tool so that it fits user and task needs. The resulting new design alternative was then implemented as a working prototype in the Java programming language. In Study 3, a comparative study was conducted to demonstrate the advantages of the new design of the attention testing tool (named the graphic style tool) over the existing design (named the text style tool). A total of 60 subjects participated in user testing sessions where their error frequency, error patterns, and subjective satisfaction were measured through performance observation and questionnaires. The task performance measurements revealed a number of user errors of various types in the existing text style tool. The questionnaire results also supported the new graphic style tool: users rated it higher than the existing text style tool in terms of overall satisfaction, screen design, terms and system information, ease of learning, and system performance.

Content Analysis of the 'Housing' Unit in the 2015 Revised Middle School Technology and Home Economics Textbook Using Text Mining (텍스트 마이닝을 이용한 2015 개정 중학교 기술·가정 교과서의 주생활 단원 내용분석)

  • Kim, Do-Yeon
    • Journal of Korean Home Economics Education Association / v.34 no.2 / pp.1-19 / 2022
  • The purpose of this study is to analyze the keywords of middle school textbooks based on the 2015 revised technology and home economics curriculum in order to understand the core concepts and content composition of the 'housing' unit. Using the TEXTOM and UCINET programs, the frequencies and centralities of the keywords were analyzed, and CONCOR analysis was performed. The results are as follows. First, the content system of the 'housing' unit is divided into 'life culture' and 'safety' in the 'family life and safety' area. Second, in the 'safety' section, the most frequent words were, in descending order, indoor, occurrence, use, noise, and safety accidents. Words related to daily life, safety accidents, and prevention were confirmed to be closely connected to each other. In the 'life culture' section, the most frequent words were, in descending order, space, housing, family, and residential space, and the correlations between these keywords were also high. Third, the most influential core keywords were indoor and occurrence in the 'safety' section, and space, family, and housing in the 'life culture' section. Fourth, the 'safety' section was divided into two subunits, 'safe living environment' and 'comfortable living environment', and the 'life culture' section was divided into four subunits, 'living space composition', 'space utilization', 'housing value and lifestyle', and 'housing culture'.
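
To make the keyword centrality idea concrete, the sketch below builds a co-occurrence network from tokenized sentences and computes degree centrality. It is an assumption-laden illustration: the abstract does not say which centrality measures were computed in UCINET, the sentences are invented, the CONCOR step is not shown, and `cooccurrence_edges` and `degree_centrality` are hypothetical helper names.

```python
from itertools import combinations

def cooccurrence_edges(sentences):
    """Link every pair of distinct keywords that appear in the same sentence."""
    edges = set()
    for words in sentences:
        for a, b in combinations(sorted(set(words)), 2):
            edges.add((a, b))
    return edges

def degree_centrality(edges):
    """Degree of each node divided by (number of nodes - 1)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

# Invented keyword lists, one per textbook sentence.
sentences = [
    ["indoor", "noise", "safety"],
    ["indoor", "accident"],
]

centrality = degree_centrality(cooccurrence_edges(sentences))
for word, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.2f}")
```

A keyword linked to every other keyword scores 1.0, which matches the intuition of an "influential core keyword" in the abstract.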

Exploring the Trend of Korean Creative Dance by Analyzing Research Topics : Application of Text Mining (연구주제 분석을 통한 한국창작무용 경향 탐색 : 텍스트 마이닝의 적용)

  • Yoo, Ji-Young;Kim, Woo-Kyung
    • Journal of Korea Entertainment Industry Association / v.14 no.6 / pp.53-60 / 2020
  • This study is based on the assumption that trends in a phenomenon and trends in research on it are contextually consistent. The purpose of this study is therefore to explore trends in dance through a subject analysis of Korean creative dance research using text mining. To that end, 1,291 words from 616 journal article titles collected from a paper search website were analyzed. Data collection, refinement, and analysis were all performed with R 3.6.0. The findings are as follows. First, keywords representing the times were frequently used before the 2000s, but research on Korean creative dance in terms of education and physical training was also found. Second, the frequency of keywords related to dance troupe performances was high after the 2000s, but it was confirmed that Choi Seung-hee still held an important position in the study of Korean creative dance. Third, an analysis of the overall research subjects in Korean creative dance studies showed that research on 'Art of Choi Seung-hee in the modern era' accounted for the highest proportion. Fourth, the Hot Topics, which are rising as of 2000, were 'the performance activities of the National Dance Company' and 'the choreography expression and utilization of traditional dance'. Since the National Dance Company's recent performances advocate 'modernization based on tradition', it was confirmed that Korean creative dance since the 2000s has focused on the use of traditional dance motifs. Fifth, the Cold Topic, which has been falling as of 2000, was the study of 'dancing expressions by age'. Interest in this research is judged to have decreased because of the tendency to mix various dance styles after the genre of Korean creative dance was established.

Study on the Viewers' Perception of Investigative Journalism Before and After Pandemic Using Big Data (빅데이터를 활용한 팬데믹 전후 탐사보도프로그램에 대한 시청자 인식연구)

  • Kyunghee Kim;Soonchul Kwon;Seunghyun Lee
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.311-320 / 2023
  • This paper analyzes viewers' perception of investigative journalism before and after COVID-19 and examines the direction of investigative journalism using big data. Building on previous research used as a social science model, we investigated the relationships between words related to TV current affairs programs and investigative journalism in big data before and after the appearance of COVID-19. We visualized changes in viewers' perception of investigative journalism by analyzing text data obtained through Textom, with TV current affairs programs and investigative journalism as keywords. Data were collected from 2017 to June 2022 and refined for analysis. We visualized connectivity centrality using Ucinet 6.0 and Netdraw, and clustered the keywords and their frequencies using CONCOR analysis. Our study found a clear change in viewer perception before and after the pandemic. As an implication, big data analysis was conducted with investigative journalism as the main keyword, and a direction for investigative journalism was presented based on the analysis. Furthermore, based on previous research, we suggest effective approaches for investigative journalism after the pandemic to better engage viewers.