• Title/Summary/Keyword: time mining (시간 마이닝)

Search Result 401, Processing Time 0.027 seconds

A Case Study of a Text Mining Method for Discovering Evolutionary Patterns of Mobile Phone in Korea (국내 휴대폰의 진화패턴 규명을 위한 텍스트 마이닝 방안 제안 및 사례 연구)

  • On, Byung-Won
    • Journal of the Korea Society of Computer and Information / v.20 no.2 / pp.29-45 / 2015
  • Over the past 200 years, systematic theories, concepts, and methodologies of biological evolution have been developed, and the patterns and principles of evolution have been actively studied. These ideas have been applied to fields such as evolutionary economics, evolutionary psychology, and evolutionary linguistics, where they have driven significant progress. Existing studies have also applied the main biological evolutionary models to artifacts, even though such models do not fit them well. These models generalize poorly to the evolutionary patterns of artifacts because they are built on the subjective views of experts who know the artifacts well. Unlike biological organisms, artifacts reflect human imagination and will, so the theory of biological evolution is known not to apply to them directly. This paper aims to go beyond individual subjectivity and present the evolutionary patterns of a given artifact based on the opinions of the public. To this end, we propose a text mining approach: a systematic framework that discovers the evolutionary patterns of a given artifact and visualizes them effectively. As a case study, we focus on the mobile phone, which has emerged as an icon of innovation in recent years. We collect and analyze review posts on mobile phones available in the domestic market over the past decade and discuss the resulting evolutionary patterns in detail. Conventionally, this kind of task is tedious and slow, because a small number of experts must carry out an extensive literature survey and summarize a huge amount of material before they can draw a diagram of the mobile phone's evolutionary patterns. To minimize this human effort, we present a semi-automatic mining algorithm. The results help us understand how human creativity and imagination are realized, and they can help business and industry predict future trends in mobile phones.

The Trends of Artiodactyla Researches in Korea, China and Japan using Text-mining and Co-occurrence Analysis of Words (텍스트마이닝과 동시출현단어분석을 이용한 한국, 중국, 일본의 우제목 연구 동향 분석)

  • Lee, Byeong-Ju;Kim, Baek-Jun;Lee, Jae Min;Eo, Soo Hyung
    • Korean Journal of Environment and Ecology / v.33 no.1 / pp.9-15 / 2019
  • Artiodactyla, the even-toed ungulates, are found worldwide. In recent years, wild Artiodactyla have attracted public attention because of the rapid increase in crop damage and roadkill caused by species such as the water deer and wild boar, and because of the decline of species such as the long-tailed goral and musk deer. Despite this attention, there have been few studies on Artiodactyla in Korea, and none has analyzed research trends on Artiodactyla, which makes it difficult to understand the actual problems. Many recent trend studies use text mining and co-occurrence analysis to classify research subjects more objectively, by extracting keywords that appear in the literature and quantifying the relevance between words. In this study, we analyzed the text of research articles from three countries (Korea, China, and Japan) using text mining and co-occurrence analysis and compared the research subjects in each country. Through text mining, we extracted 199 words from 665 articles on Artiodactyla from the three countries. Co-occurrence analysis of the extracted words produced three word clusters. We determined that cluster 1 was related to "habitat condition and ecology", cluster 2 to "disease", and cluster 3 to "conservation genetics and molecular ecology". Comparing the occurrence rates of the word clusters by country showed that they were relatively even in China and Japan, whereas in Korea cluster 2 ("disease") was predominant (69%). In a regression analysis of the number of words per year in each cluster, the number of words increased evenly across clusters in China and Japan, while in Korea the rate of increase for cluster 2 was five times that of the other clusters. The results indicate that Korean research on Artiodactyla has tended to focus on disease more than research in China and Japan, and that few researchers have considered other subjects such as habitat characteristics, behavior, and molecular ecology. To control the damage caused by Artiodactyla and to establish a reasonable policy for protecting endangered species, more research on wild Artiodactyla is needed to accumulate basic ecological data.
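The co-occurrence step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `cooccurrence_counts` and the keyword lists are hypothetical, and the sketch only counts how many documents contain each pair of keywords, which is the raw input a clustering step would then work from.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count, for each unordered keyword pair, the number of documents containing both."""
    counts = Counter()
    for keywords in documents:
        # Deduplicate and sort so that (a, b) and (b, a) share one key.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts

# Hypothetical keyword lists, one per article (not data from the study).
docs = [
    ["habitat", "ecology", "deer"],
    ["disease", "virus", "deer"],
    ["disease", "virus", "boar"],
]
pairs = cooccurrence_counts(docs)
```

Pairs with high counts, such as `("disease", "virus")` here, are the ones a later clustering step would tend to place in the same word cluster.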

Analyzing the discriminative characteristic of cover letters using text mining focused on Air Force applicants (텍스트 마이닝을 이용한 공군 부사관 지원자 자기소개서의 차별적 특성 분석)

  • Kwon, Hyeok;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.75-94 / 2021
  • The low birth rate and the shortened military service period are raising concerns about the selection of excellent military officers. The Republic of Korea became a low-birth-rate society in 1984 and an aged society in 2018, and is expected to become a super-aged society in 2025. In addition, the military is shifting from a troop-oriented force to one oriented around state-of-the-art weapons, and the service period was reduced in 2018 to ease the burden of military service on young people and let them enter society earlier. Some observe that the application rate for military officers is falling because of shrinking manpower resources and a preference for the shortened mandatory service over an officer career. This calls for further consideration of policies for securing excellent military officers. Most related studies have used social science methodologies, whereas this study applies text mining, which is suited to analyzing large document collections. It extracts words with discriminative characteristics from the cover letters of Republic of Korea Air Force non-commissioned officer applicants and analyzes their pass/fail polarity. The method consists of three steps. First, applications are divided into general and technical fields, and the words in the cover letters are ranked by the difference in their frequency ratios between the two fields. The greater the difference in a word's proportion between the fields, the more discriminative the word is defined to be. On this basis, we extract the top 50 discriminative words for the general field and the top 50 for the technical field. Second, the appropriate number of topics for the full set of cover letters is determined with LDA, using perplexity and coherence scores. With that number of topics, we use LDA to generate topics and probabilities and to estimate which topic each discriminative word belongs to. The keyword indicators of the cover letter questions are then used as candidate labels, and the indicator that best matches each topic's word distribution is set as that topic's label. Third, using L-LDA with the cover letters labeled as pass or fail, we generate topics and probabilities for each field under the pass and fail labels. From these, we keep only the discriminative words that belong to labeled topics. For each such word, we then compute the difference between its probability under the pass label and its probability under the fail label. A positive value is interpreted as pass polarity and a negative value as fail polarity. This study is the first to reflect the characteristics of cover letters written by Republic of Korea Air Force non-commissioned officer applicants rather than by applicants in the private sector. Moreover, the methodology applies text mining to multiple documents instead of survey or interview methods, which reduces analysis time and increases reliability for the entire population. For this reason, the proposed methodology is also applicable to other kinds of document collections in the field of military personnel. The study shows that L-LDA is more suitable than LDA for extracting the discriminative characteristics of these cover letters, and it proposes a methodology that combines LDA and L-LDA. By analyzing the results of non-commissioned officer recruitment in the Republic of Korea Air Force, we aim to provide information useful for recruitment and promotional policies and to propose a methodology for research on military manpower acquisition.
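Two of the computations in this abstract are simple enough to sketch: ranking words by the difference in their frequency ratios between two fields (step one) and taking the difference between a word's pass and fail probabilities as its polarity (step three). The sketch below is an assumption-laden illustration, not the authors' code: the function names, the token lists, and the probability values are all hypothetical, and the LDA and L-LDA stages that would produce the probabilities are omitted.

```python
from collections import Counter

def frequency_ratios(tokens):
    """Share of the token stream taken up by each word."""
    total = len(tokens)
    return {word: count / total for word, count in Counter(tokens).items()}

def discriminative_words(tokens_a, tokens_b, top_n):
    """Words ranked by (ratio in field A) - (ratio in field B), largest first."""
    ratio_a = frequency_ratios(tokens_a)
    ratio_b = frequency_ratios(tokens_b)
    vocab = set(ratio_a) | set(ratio_b)
    diff = {w: ratio_a.get(w, 0.0) - ratio_b.get(w, 0.0) for w in vocab}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(diff, key=lambda w: (-diff[w], w))[:top_n]

def polarity(p_pass, p_fail):
    """Positive value: leans toward pass. Negative value: leans toward fail."""
    vocab = set(p_pass) | set(p_fail)
    return {w: p_pass.get(w, 0.0) - p_fail.get(w, 0.0) for w in vocab}

# Hypothetical tokens and probabilities, for illustration only.
general = ["service", "service", "service", "team"]
technical = ["maintenance", "maintenance", "service", "aircraft"]
top_general = discriminative_words(general, technical, top_n=2)

scores = polarity({"team": 0.5, "late": 0.125}, {"team": 0.25, "late": 0.5})
```

Swapping the two arguments of `discriminative_words` gives the ranking for the other field, which mirrors the abstract's separate top-50 lists for the general and technical fields.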

Mining Frequent Trajectory Patterns in RFID Data Streams (RFID 데이터 스트림에서 이동궤적 패턴의 탐사)

  • Seo, Sung-Bo;Lee, Yong-Mi;Lee, Jun-Wook;Nam, Kwang-Woo;Ryu, Keun-Ho;Park, Jin-Soo
    • Journal of Korea Spatial Information System Society / v.11 no.1 / pp.127-136 / 2009
  • This paper proposes an online algorithm for mining moving trajectory patterns in RFID data streams that accounts for characteristics changing over time and for the constraint of a single-pass data scan. With the rapid development of RFID, sensor, and mobile network technology, many researchers have recently focused on gathering real-time data from the real world and mining useful patterns from it. Previous work on sequential patterns or moving trajectory patterns over stream data is extremely time-consuming because of multi-pass database scans and tree traversal, and it does not consider the time-changing characteristics of stream data. The proposed method preserves the sequential strength of length-2 frequent patterns in a binary relationship table, using a time-evolving graph to reflect changes in the RFID data stream accurately over time. To avoid repeated data scans, the proposed algorithm infers candidate length-k moving trajectory patterns in advance at time point t, and then extracts the patterns by screening the candidates in a single pass at time point t+1. In experiments, the proposed method outperforms an Apriori-like method in time and space complexity, as the reduction ratio of candidate sets is about 7 percent.

Text Mining and Association Rules Analysis to a Self-Introduction Letter of Freshman at Korea National College of Agricultural and Fisheries (1) (한국농수산대학 신입생 자기소개서의 텍스트 마이닝과 연관규칙 분석 (1))

  • Joo, J.S.;Lee, S.Y.;Kim, J.S.;Shin, Y.K.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research / v.22 no.1 / pp.113-129 / 2020
  • In this study, we applied topic analysis and correlation analysis through text mining to extract meaningful information or rules from the self-introduction letters of freshmen at Korea National College of Agriculture and Fisheries in 2020. The items analyzed were those related to 'academic' and 'in-school activities' during high school. In the text mining results, the top keywords for the 'academic' item were 'study', 'thought', 'effort', 'problem', and 'friend', and those for 'in-school activities' were 'activity', 'thought', 'friend', 'club', and 'school', in that order. In the correlation analysis, the keywords 'thinking', 'studying', 'effort', and 'time' played a central role in the 'academic' item, and the central keywords for 'in-school activities' were 'thought', 'activity', 'school', 'time', and 'friend'. The results of the frequency analysis and association analysis were visualized with word clouds and correlation graphs to make them easier to understand. In the next study, TF-IDF (Term Frequency-Inverse Document Frequency) analysis, which uses keyword frequency and inverse document frequency, will be performed as a method of extracting keywords from a large number of documents.
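The TF-IDF analysis mentioned at the end of this abstract can be sketched as follows. This is one common textbook variant (term frequency normalized by document length, idf = ln(N / df)), chosen here as an assumption; the abstract does not say which weighting the authors intend to use, and the function name and tokenized documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: score} dict per document, using tf * ln(N / df)."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return scores

# Hypothetical tokenized letters (not data from the study).
letters = [
    ["study", "effort", "friend"],
    ["activity", "club", "friend"],
    ["study", "friend", "time"],
]
weights = tf_idf(letters)
```

A word that appears in every document, such as 'friend' in this toy input, gets a weight of zero, which is why TF-IDF can surface keywords that a raw frequency count would bury.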

Text Mining and Association Rules Analysis to a Self-Introduction Letter of Freshman at Korea National College of Agricultural and Fisheries (2) (한국농수산대학 신입생 자기소개서의 텍스트 마이닝과 연관규칙 분석 (2))

  • Joo, J.S.;Lee, S.Y.;Kim, J.S.;Shin, Y.K.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research / v.22 no.2 / pp.99-114 / 2020
  • In this study, we applied topic analysis and correlation analysis through text mining to the self-introduction letters of freshmen at Korea National College of Agriculture and Fisheries (KNCAF) in 2020. The 3rd and 4th questions were analyzed; the 4th question covered the motivation for applying to the college, the academic plan, and the career plan. Text mining of the 3rd question showed that the frequency of 'friends' was overwhelmingly high, followed by keywords such as 'thought', 'time', 'opinion', 'activity', and 'club'. In the 4th question, keywords such as 'thought', 'agriculture', 'KNCAF', 'farm', and 'father' had high frequencies. Association rule analysis for each question showed that the relationships with the highest support, which indicates the frequency and importance of a rule, were {friend} <=> {thought} and {thought} <=> {KNCAF}. The confidence of the association between keywords was highest for the rules {teacher} => {friend} and {agriculture, KNCAF} => {thought}. The lift, which indicates the closeness of two words, was highest for the rules {friend} <=> {teacher} and {knowledge} <=> {professional}. These keywords were also found to play very important roles in the betweenness centrality and degree centrality analyses of keywords. The results of the frequency analysis and association analysis were visualized with word clouds and correlation graphs to make them easier to understand.
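The three rule measures named in this abstract (support, confidence, lift) have standard definitions that a short sketch can make concrete. The transactions below are hypothetical keyword sets, one per letter, and the helper names are assumptions; the code is not taken from the study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_both = support(transactions, set(antecedent) | set(consequent))
    confidence = s_both / s_a   # estimate of P(consequent | antecedent)
    lift = confidence / s_c     # values above 1 mean a positive association
    return s_both, confidence, lift

# Hypothetical keyword sets, one per letter (not data from the study).
letters = [
    {"friend", "thought"},
    {"friend", "teacher"},
    {"friend", "thought", "club"},
    {"thought", "time"},
]
sup, conf, lift = rule_metrics(letters, {"teacher"}, {"friend"})
```

In this toy input the rule {teacher} => {friend} has low support but a confidence of 1.0, which illustrates why the abstract reports different top rules for each measure.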

Mining Association Rules in Multiple Databases using Links (복수 데이터베이스에서 링크를 이용한 연관 규칙 탐사)

  • Bae, Jin-Uk;Sin, Hyo-Seop;Lee, Seok-Ho
    • Journal of KIISE:Software and Applications / v.26 no.8 / pp.939-954 / 1999
  • In data mining, much research has addressed finding association rules in a single database, such as a large transaction database. However, with the emergence of retailers that use customer cards, such as warehouse discount stores and department stores, analysis is required not only of transactions but also of the relation between transactions and customer data. That is, association rules need to be found across two databases. This paper proposes an efficient algorithm that constructs links between the two databases to find associated itemsets. Our experiments show that the link-based algorithm is useful for temporal analysis when the customer database is small enough to reside in memory.

New Optimization Algorithm for Data Clustering (최적화에 기반 한 데이터 클러스터링 알고리즘)

  • Kim, Ju-Mi
    • Journal of Intelligence and Information Systems / v.13 no.3 / pp.31-45 / 2007
  • Handling large data is one of the critical issues facing the data mining community. This is particularly true for computationally intensive tasks such as data clustering. Random sampling of instances is one possible way to handle large data, but a pervasive problem with this approach is how to deal with the noise it introduces in the evaluation of the learning algorithm. This paper develops a new optimization-based clustering approach using an algorithm specifically designed for noisy performance. Numerical results show that this algorithm performs better than other algorithms such as PAM and CLARA. In addition, by using partial data, the algorithm achieves substantial savings in computational time without sacrificing solution quality.

Efficient Algorithms for Mining Association Rules Under the Interactive Environments (대화형 환경에서 효율적인 연관 규칙 알고리즘)

  • Lee, Jae-Moon
    • The KIPS Transactions:PartD / v.8D no.4 / pp.339-346 / 2001
  • The problem of mining association rules in an interactive environment is to mine association rules repeatedly with different minimum supports. This problem includes all the subproblems of conventional mining, except that the rules are mined repeatedly from the same database. This paper proposes efficient algorithms that improve performance by reusing information about the candidate large itemsets computed while mining the previous association rules. The proposed algorithms were compared with the conventional algorithm with respect to execution time. The comparisons show that the proposed algorithms achieve a 10 to 30% gain over the conventional algorithm.
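The idea of reusing candidate itemset information across runs with different minimum supports can be sketched with a cache of support counts. The following is an illustrative Apriori-style sketch under that assumption, not the algorithms proposed in the paper; the class name `CachedMiner`, its attributes, and the transactions are hypothetical.

```python
from itertools import combinations

class CachedMiner:
    """Apriori-style miner that keeps candidate support counts between calls."""

    def __init__(self, transactions):
        self.transactions = [frozenset(t) for t in transactions]
        self.counts = {}  # itemset -> support count, reused across calls
        self.scans = 0    # number of database passes actually performed

    def _count(self, candidates):
        # Only candidates never counted before require a pass over the data.
        missing = [c for c in candidates if c not in self.counts]
        if missing:
            self.scans += 1
            for c in missing:
                self.counts[c] = sum(c <= t for t in self.transactions)

    def mine(self, min_count):
        """Return {itemset: count} for itemsets with count >= min_count."""
        items = sorted({i for t in self.transactions for i in t})
        level = [frozenset([i]) for i in items]
        frequent = {}
        while level:
            self._count(level)
            current = {c: self.counts[c] for c in level
                       if self.counts[c] >= min_count}
            frequent.update(current)
            k = len(level[0]) + 1
            # Join: unions of two frequent (k-1)-itemsets that have size k.
            joined = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
            # Prune: every (k-1)-subset of a candidate must be frequent.
            level = [c for c in joined
                     if all(frozenset(s) in current
                            for s in combinations(c, k - 1))]
        return frequent

# Hypothetical transactions for illustration.
miner = CachedMiner([{"a", "b", "c"}, {"a", "b"}, {"a", "c"},
                     {"b", "c"}, {"a", "b", "c"}])
first = miner.mine(3)
scans_after_first = miner.scans
second = miner.mine(4)
```

Raising the threshold only filters counts that are already cached, so the second call performs no additional pass. Lowering it may introduce candidates that were never counted, and only those trigger a new scan.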

Customized Query Recommendation by Agent Based on User's Query Pattern (사용자 질의패턴 기반 에이전트에 의한 맞춤형 질의추천)

  • Lim, Yo-Han;Park, Gun-Woo;Lee, Sang-Hoon
    • Proceedings of the Korean Information Science Society Conference / 2008.06b / pp.200-204 / 2008
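  • A survey report on how far users browse search results after entering a query, until they find the information they want, found that users who look only at the first page of results accounted for 41% of respondents, and that users who look only at the top three pages reached 88%. The top ranks of the search results are therefore an important measure by which users judge whether the information they want exists. In addition, people overwhelmed by the vast amount of information on the Internet are making more demanding requests of information; for example, they want to be given personalized or customized information. When searching, most users enter queries of two words or fewer and rely on those keywords to point the query at a specific topic. This paper proposes a method that applies association rules from data mining to analyze user query patterns in a user profile DB. An 'analysis agent' generates a list of associated queries, and a 'recommendation agent' assigns date-based weights to changes in user preference, that is, to areas of interest or user queries that change over time, and recommends customized queries to the user through interaction with the user.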
