• Title/Summary/Keyword: Document Frequency

Text Mining and Association Rules Analysis to a Self-Introduction Letter of Freshman at Korea National College of Agricultural and Fisheries (1) (한국농수산대학 신입생 자기소개서의 텍스트 마이닝과 연관규칙 분석 (1))

  • Joo, J.S.;Lee, S.Y.;Kim, J.S.;Shin, Y.K.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research / v.22 no.1 / pp.113-129 / 2020
  • In this study we applied text mining, with topic analysis and correlation analysis, to extract meaningful information and rules from the self-introduction letters of freshmen at Korea National College of Agriculture and Fisheries in 2020. The items analyzed were those describing 'academic' and 'in-school activities' during high school. In the text mining results, the top keywords for the 'academic' item were 'study', 'thought', 'effort', 'problem', and 'friend', and the top keywords for 'in-school activities' were 'activity', 'thought', 'friend', 'club', and 'school', in that order. In the correlation analysis, the keywords 'thinking', 'studying', 'effort', and 'time' played a central role in the 'academic' item, while 'thought', 'activity', 'school', 'time', and 'friend' were central for 'in-school activities'. The results of the frequency analysis and association analysis were visualized with word clouds and correlation graphs to make them easier to interpret. In the next study, TF-IDF (Term Frequency-Inverse Document Frequency) analysis, which combines keyword frequency with inverse document frequency, will be performed as a method of extracting keywords from a large number of documents.
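
For reference, one common formulation of the TF-IDF weight mentioned in this abstract is sketched below. The abstract does not say which variant the authors intend to use, so the symbols are assumptions made here for illustration: $f_{t,d}$ is the count of term $t$ in document $d$, $N$ is the number of documents, and $n_t$ is the number of documents containing $t$.

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \log\frac{N}{n_t}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
```

Under this variant, a term that occurs in every document has $\mathrm{idf}(t) = \log 1 = 0$, so very common words drop out of the keyword ranking regardless of how often they appear.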

The Information Effect on Foreigner's Intraday in Stock Index Futures (주가지수선물에 있어 외국인의 하루중 정보효과에 관한 연구)

  • 신연수
    • The Journal of Information Technology / v.1 no.2 / pp.181-193 / 1998
  • The measure of public information flow developed here is the order frequency. In the first part of the analysis, I document the general pattern of public information, with an emphasis on the intraday arrival of information. Overall, I find that public information arrival is nonconstant. Consistent with earlier studies, I find that information arrival exhibits an inverted U-shaped pattern across intraday trading. Over the average trading day, the flow of public information increases throughout the morning hours and then falls over the period between 3:00 P.M. and 3:05 P.M. The second part of my analysis focuses on the relation between the public information variable and the measure of intraday order frequency, specifically the insignificant effect. Although the number of insignificant effects increases moderately over the course of intraday trading, the result is remarkable in light of the aggregate nature of the public information and order frequency variables employed. The foreign investor group changes homogeneously.

An Analysis of Indications of Meridians in DongUiBoGam Using Data Mining (데이터마이닝을 이용한 동의보감에서 경락의 주치특성 분석)

  • Chae, Younbyoung;Ryu, Yeonhee;Jung, Won-Mo
    • Korean Journal of Acupuncture / v.36 no.4 / pp.292-299 / 2019
  • Objectives: DongUiBoGam is one of the representative medical texts of Korea. We used text mining methods to analyze the characteristics of the indications of each meridian in the second chapter of DongUiBoGam, WaeHyeong, which addresses external body elements. We also visualized the relationships between the meridians and the disease sites. Methods: Using the term frequency-inverse document frequency (TF-IDF) method, we quantified values regarding the indications of each meridian according to the frequency of the occurrences of 14 meridians and 14 disease sites. The spatial patterns of the indications of each meridian were visualized on a human body template according to the TF-IDF values. Using hierarchical clustering methods, twelve meridians were clustered into four groups based on the TF-IDF distributions of each meridian. Results: TF-IDF values of each meridian showed different constellation patterns at different disease sites. The spatial patterns of the indications of each meridian were similar to the route of the corresponding meridian. Conclusions: The present study identified spatial patterns between meridians and disease sites. These findings suggest that the constellations of the indications of meridians are primarily associated with the lines of the meridian system. We strongly believe that these findings will further the current understanding of indications of acupoints and meridians.
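
The clustering step described in this abstract can be illustrated with a minimal sketch. The abstract does not state the linkage or distance used, so average linkage with Euclidean distance is an assumption here, and the profile names and TF-IDF values are invented placeholders rather than data from the study.

```python
import math

def agglomerative(vectors, k):
    """Average-linkage agglomerative clustering of named vectors into k groups."""
    clusters = [[name] for name in sorted(vectors)]

    def dist(c1, c2):
        # Mean Euclidean distance over all cross-cluster pairs (average linkage).
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(math.dist(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical TF-IDF profiles over three disease sites (values invented for illustration).
profiles = {
    "meridian_1": [0.9, 0.1, 0.0],
    "meridian_2": [0.8, 0.2, 0.0],
    "meridian_3": [0.0, 0.1, 0.9],
    "meridian_4": [0.1, 0.0, 0.8],
}
groups = agglomerative(profiles, 2)
```

With these toy profiles, the two vectors concentrated on the first site end up in one group and the two concentrated on the third site in the other.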

Design of WWW IR System Based on Keyword Clustering Architecture (색인어 말뭉치 처리를 기반으로 한 웹 정보검색 시스템의 설계)

  • 송점동;이정현;최준혁
    • The Journal of Information Technology / v.1 no.1 / pp.13-26 / 1998
  • In general information retrieval systems, improper keywords are often extracted and search results differ from the user's intent, because the systems use only term frequency information to select keywords and do not consider their meanings. Because recall and precision are inversely related, improving precision is limited unless the semantics of keywords are considered. In this paper, a system that can improve precision without decreasing recall is designed and implemented by introducing a client user module that sends feedback reflecting the user's intention to the server. For this purpose, keywords are selected using relative term frequency and inverse document frequency, and co-occurring words are extracted from the original documents. The keywords are then clustered by their semantics using the calculated mutual information. The system can reject inappropriate documents using segmented semantic information according to feedback from the client user module. Consequently, the precision of the system is improved without decreasing recall.
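
As an illustration of how mutual information can be computed from co-occurrence, the sketch below uses pointwise mutual information (PMI) estimated at document level. This is a stand-in: the abstract does not give the exact mutual information formula, and the function name `pmi_table` and the sample documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_table(docs):
    """Pointwise mutual information for word pairs that co-occur in a document."""
    n = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    # PMI(x, y) = log( P(x, y) / (P(x) * P(y)) ), with probabilities estimated per document.
    return {
        (x, y): math.log((c / n) / ((word_df[x] / n) * (word_df[y] / n)))
        for (x, y), c in pair_df.items()
    }

docs = [
    ["web", "search", "index"],
    ["web", "search"],
    ["index", "cluster"],
    ["cluster", "keyword"],
]
pmi = pmi_table(docs)
```

Pairs with high PMI, such as `("search", "web")` in this toy corpus, co-occur more often than chance would predict and are natural candidates for the same keyword cluster.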

Analysis of Safety of the Chemical Facilities by Korea Risk Based-Inspection in the Petrochemical Plant (석유화학공장에서의 한국형 위험기반검사에 의한 화학설비의 안정성 평가)

  • Kim, Tae-Ok;Lee, Hern-Chang;Shin, Pyng-Sik;Choi, Byung-Nam;Jo, Ji-Hoon;Choi, Byung-Young;Park, Sung-Hoo;Kim, Hung-Kun
    • Journal of the Korean Society of Safety / v.22 no.6 / pp.35-40 / 2007
  • As a way of improving the safety of chemical facilities, risk-based inspection (RBI) was carried out for the facilities of a petrochemical plant using the KS-RBI Ver. 3.0 program, which was developed on the basis of the API-581 base resource document (BRD). From the evaluation results of the KS-RBI program, we obtained an evaluation of the process safety management (PSM) of the plant, the risk of the process, the risk of static facilities and pipes, and the damage mechanisms of the facilities. We could also suggest a proper inspection plan (frequency and method of inspection) using the calculated risk and the status of the facilities. Therefore, the plant could achieve reduced inspection costs through extended inspection intervals, improved productivity, improved reliability of the facilities, and computerized history management.

Analysis of Media Articles on COVID-19 and Nurses Using Text Mining and Topic Modeling (텍스트 마이닝과 토픽모델링 분석을 활용한 코로나19와 간호사에 대한 언론기사 분석)

  • An, Jiyeon;Yi, Yunjeong;Lee, Bokim
    • Research in Community and Public Health Nursing / v.32 no.4 / pp.467-476 / 2021
  • Purpose: The purpose of this study is to understand the social perceptions of nurses in the context of the COVID-19 outbreak through analysis of media articles. Methods: Among the media articles reported from January 1st to September 30th, 2020, those containing the keywords '[corona or Wuhan pneumonia or covid] and [nurse or nursing]' are extracted. After the selection process, text mining and topic modeling are performed on 454 media articles using textom version 4.5. Results: The top 30 keywords by frequency include 'Nurse', 'Corona', 'Isolation', 'Support', 'Shortage', 'Protective Clothing', and so on. Keywords that rank high in Term Frequency-Inverse Document Frequency (TF-IDF) values are 'Daegu', 'President', 'Gwangju', 'manpower', and so on. As a result of the topic analysis, 10 topics are derived, such as 'Local infection', 'Dispatch of personnel', 'Message for thanks', and 'Delivery of one's heart'. Conclusion: Nurses are both contributors to and victims of COVID-19 prevention. The government and the nurses' community should make efforts to improve poor working conditions and manpower shortages.

A Study on Fashion Startup Ecosystem Trends in Korea Using Big Data Analysis - Focusing on Newspaper Articles in 2012-2022 - (빅데이터 분석을 활용한 우리나라 패션 스타트업 생태계의 추세 연구 - 2012~2022년 신문기사를 중심으로 -)

  • Soojung Lim;Sunjin Hwang
    • Journal of Fashion Business / v.27 no.1 / pp.1-15 / 2023
  • This study divided articles from 2012 to 2022 into two time periods, with the aim of using big data analysis to look at patterns in the ecosystem of fashion startups. The research method extracted top keywords based on TF (Term Frequency) and TF-IDF (Term Frequency-Inverse Document Frequency), analyzed the network, and derived centrality values. Comparing the fashion startup ecosystems of the first and second periods, the elements of policy, support, market, finance, and human capital were derived in the first period, and the elements of policy, support, market, finance, and culture were derived in the second period. In the first period, the fashion startup ecosystem focused on fostering new designer startups by emphasizing support, finance, and human capital factors and focusing on policies. In the second period, online-based fashion platform startups and fashion tech startups appeared with the support of digital transformation and fulfillment services triggered by COVID-19 (coronavirus disease 2019), private finances were emphasized, and cultural factors were derived along with success stories of fashion startups. This study is meaningful in that it helps in developing strategies for fashion startups to grow into sustainable companies.
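
The network step in this abstract can be illustrated with a small keyword co-occurrence graph. The abstract does not say which centrality measure was derived, so normalized degree centrality is assumed here, and the sample keyword lists are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

def degree_centrality(docs):
    """Normalized degree centrality in a keyword co-occurrence network."""
    neighbors = defaultdict(set)
    for tokens in docs:
        # Link every pair of distinct keywords that appear in the same article.
        for a, b in combinations(sorted(set(tokens)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    # degree / (n - 1): the fraction of the other keywords a keyword is linked to.
    return {word: len(nbrs) / (n - 1) for word, nbrs in neighbors.items()}

articles = [
    ["startup", "policy", "support"],
    ["startup", "finance"],
    ["startup", "platform", "finance"],
]
centrality = degree_centrality(articles)
```

In this toy network "startup" co-occurs with every other keyword and therefore has the maximum centrality, while "finance" is linked to half of the other keywords.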

A Study on the Perception of Metaverse Fashion Using Big Data Analysis

  • Hosun Lim
    • Fashion & Textile Research Journal / v.25 no.1 / pp.72-81 / 2023
  • As changes in social and economic paradigms are accelerating, and non-contact has become the new normal due to the COVID-19 pandemic, metaverse services that build societies in online activities and virtual reality are spreading rapidly. This study analyzes the perception and trend of metaverse fashion using big data. TEXTOM was used to extract metaverse and fashion-related words from Naver and Google and analyze their frequency and importance. Additionally, structural equivalence analysis based on the derived main words was conducted to identify the perception and trend of metaverse fashion. The following results were obtained: First, term frequency (TF) analysis revealed that the most frequently appearing words were "metaverse," "fashion," "virtual," "brand," "platform," "digital," "world," "Zepeto," "company," and "game." In the TF-inverse document frequency (TF-IDF) analysis, "virtual" was the most important, followed by "brand," "platform," "Zepeto," "digital," "world," "industry," "game," "fashion show," and "industry." "Metaverse" and "fashion" were found to have a high TF but a low TF-IDF. Further, words such as "virtual," "brand," "platform," "Zepeto," and "digital" had a higher TF-IDF ranking than TF ranking, indicating that they had high importance in the text. Second, convergence of iterated correlations (CONCOR) analysis using UCINET revealed four clusters, classified as "virtual world," "metaverse distribution platform," "fashion contents technology investment," and "metaverse fashion week." Fashion brands are hosting virtual fashion shows and stores on metaverse platforms where the virtual and real worlds coexist, and investment in developing metaverse-related technologies is under way.
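
The observation that "metaverse" and "fashion" have a high TF but a low TF-IDF can be reproduced with a minimal sketch. It assumes raw counts for TF and `log(N / df)` for IDF; the weighting actually used by TEXTOM may differ, and the three sample documents are invented.

```python
import math
from collections import Counter

def tf_and_tfidf(docs):
    """Return corpus-level (tf, tfidf) dictionaries for a list of tokenized documents."""
    n_docs = len(docs)
    tf = Counter()  # corpus-wide raw term counts
    df = Counter()  # number of documents containing each term
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))
    # idf = log(N / df): a term present in every document gets idf = 0.
    tfidf = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return dict(tf), tfidf

docs = [
    ["metaverse", "fashion", "virtual", "brand"],
    ["metaverse", "fashion", "virtual", "virtual"],
    ["metaverse", "fashion", "platform"],
]
tf, tfidf = tf_and_tfidf(docs)
top_tf = sorted(tf, key=lambda t: (-tf[t], t))
top_tfidf = sorted(tfidf, key=lambda t: (-tfidf[t], t))
```

Here "metaverse" and "fashion" appear in every document, so their IDF is zero and they fall to the bottom of the TF-IDF ranking even though they are tied for the highest TF, while "virtual" rises to the top.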

Chatbot Design Method Using Hybrid Word Vector Expression Model Based on Real Telemarketing Data

  • Zhang, Jie;Zhang, Jianing;Ma, Shuhao;Yang, Jie;Gui, Guan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.4 / pp.1400-1418 / 2020
  • In commercial promotion, the chatbot is known as one of the significant applications of natural language processing (NLP). Conventional design methods use the bag-of-words (BOW) model alone, based on the Google database and other online corpora. For one thing, in the bag-of-words model the vectors are unrelated to one another. Even though this method is friendly to discrete features, it does not help the machine understand continuous statements, because the connection between words is lost in the encoded word vector. For another, existing methods are tested on state-of-the-art online corpora but are hard to apply to real applications such as telemarketing data. In this paper, we propose an improved chatbot design method using a hybrid of the bag-of-words model and the skip-gram model, based on real telemarketing data. Specifically, we first collect real data in the telemarketing field and perform data cleaning and data classification on the constructed corpus. Second, the word representation adopts the hybrid bag-of-words and skip-gram model. The skip-gram model maps synonyms to nearby points in vector space. Because the correlation between words is expressed, the amount of information contained in the word vector increases, making up for the shortcomings of using the bag-of-words model alone. Third, we use the term frequency-inverse document frequency (TF-IDF) weighting method to increase the weight of keywords, and then output the final word representation. Finally, the answer is produced using a hybrid of a retrieval model and a generative model. The retrieval model can accurately answer in-domain questions. The generative model supplements it by answering open-domain questions, with the final reply produced by long short-term memory (LSTM) training and prediction. Experimental results show that the hybrid word vector expression model can improve the accuracy of the response and that the whole system can communicate with humans.
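
One simple way to combine dense word vectors with TF-IDF weights is a weighted average, sketched below. The abstract does not specify how the two representations are merged, so this is only an assumed illustration; the function name, the 2-dimensional vectors, and the weights are hypothetical values, not trained skip-gram embeddings or real TF-IDF scores.

```python
def weighted_sentence_vector(tokens, embeddings, weights):
    """Average word vectors, weighting each token by its (e.g. TF-IDF) weight."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in embeddings:
            continue  # skip out-of-vocabulary tokens
        w = weights.get(tok, 0.0)
        for i, value in enumerate(embeddings[tok]):
            total[i] += w * value
        weight_sum += w
    if weight_sum == 0.0:
        return total  # all-zero vector when nothing could be weighted
    return [v / weight_sum for v in total]

# Toy 2-d vectors standing in for trained skip-gram embeddings (hypothetical values).
embeddings = {"loan": [1.0, 0.0], "rate": [0.0, 1.0], "the": [0.5, 0.5]}
# Hypothetical TF-IDF weights: the very frequent word "the" gets weight 0.
weights = {"loan": 3.0, "rate": 1.0, "the": 0.0}
vec = weighted_sentence_vector(["the", "loan", "rate"], embeddings, weights)
```

The resulting vector leans toward "loan", the token with the largest weight, which is the effect the abstract describes as increasing the weight of keywords.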

A Technique to Recommend Appropriate Developers for Reported Bugs Based on Term Similarity and Bug Resolution History (개발자 별 버그 해결 유형을 고려한 자동적 개발자 추천 접근법)

  • Park, Seong Hun;Kim, Jung Il;Lee, Eun Joo
    • KIPS Transactions on Software and Data Engineering / v.3 no.12 / pp.511-522 / 2014
  • During software development, a variety of bugs are reported. Several bug tracking systems, such as Bugzilla, MantisBT, Trac, and JIRA, are used to handle reported bug information in many open source development projects. Bug reports in a bug tracking system are triaged to manage bugs and to determine the developer responsible for resolving each report. As software grows in size and bug reports tend to be duplicated, bug triage becomes increasingly complex and difficult. In this paper, we present an approach to assigning bug reports to appropriate developers, which is a main part of the bug triage task. First, the words that appear in resolved bug reports are classified by developer. Second, words in new bug reports are selected. After these two steps, vectors whose items are the selected words are generated. Third, the TF-IDF (term frequency-inverse document frequency) of each selected word is computed and used as the weight of each vector item. Finally, developers are recommended based on the similarity between each developer's word vector and the vector of the new bug report. We conducted an experiment on the Eclipse JDT and CDT projects to show the applicability of the proposed approach. We also compared the proposed approach with an existing study based on machine learning. The experimental results show that the proposed approach is superior to the existing method.
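
The steps in this abstract can be sketched as follows. The abstract does not name the similarity measure, so cosine similarity is assumed; the function names, developer identifiers, and word lists are hypothetical, and treating each developer's resolved reports as one pooled document is a simplification made for this example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document name to a {term: tf-idf} vector (tf = raw count, idf = log(N / df))."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    return {
        name: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for name, tokens in docs.items()
    }

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(resolved_by_dev, new_report):
    """Rank developers by similarity between their word vector and the new report."""
    docs = dict(resolved_by_dev)
    docs["__new__"] = new_report  # reserved key for the incoming bug report
    vecs = tfidf_vectors(docs)
    new_vec = vecs.pop("__new__")
    scores = {dev: cosine(vec, new_vec) for dev, vec in vecs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical words pooled from each developer's resolved bug reports.
resolved_by_dev = {
    "dev_a": ["parser", "syntax", "error", "parser"],
    "dev_b": ["debugger", "breakpoint", "error"],
}
ranking = recommend(resolved_by_dev, ["parser", "crash", "error"])
```

In this toy data the word "error" appears everywhere and contributes nothing, so the ranking is driven by "parser", which only `dev_a` shares with the new report.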