• Title/Abstract/Keywords: Data mining analysis

2,174 search results, processing time 0.028 s

Predicting stock price direction by using data mining methods : Emphasis on comparing single classifiers and ensemble classifiers

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / Vol. 22, No. 11 / pp. 111-116 / 2017
  • This paper proposes a data mining approach to predicting stock price direction. The stock market fluctuates due to many factors, so predicting stock price direction has become an important issue in stock market analysis. However, few studies in the literature apply data mining approaches to predicting stock price direction. To contribute to the literature, this paper compares single classifiers and ensemble classifiers. The single classifiers are logistic regression, decision tree, neural network, and support vector machine. The ensemble classifiers are AdaBoost, random forest, bagging, stacking, and vote. For the experiments, we gathered a dataset from the Korea Stock Exchange (KRX) covering 2008 to 2015. Data mining experiments using WEKA revealed that random forest, one of the ensemble classifiers, shows the best results in terms of metrics such as AUC (area under the ROC curve) and accuracy.
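As a rough illustration of the "vote" ensemble named in this abstract, the sketch below combines the per-sample predictions of several classifiers by majority vote and measures accuracy. It is not the paper's WEKA setup: the function names `majority_vote` and `accuracy` and the "up"/"down" labels are assumptions made for this example.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine label lists (one list per classifier) into one list by majority vote."""
    combined = []
    # zip(*...) groups the predictions that all classifiers made for the same sample.
    for labels in zip(*predictions_per_classifier):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the actual label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical predictions from three single classifiers on five trading days.
actual = ["up", "down", "up", "up", "down"]
clf_a = ["up", "down", "down", "up", "down"]
clf_b = ["up", "up", "up", "up", "down"]
clf_c = ["down", "down", "up", "up", "up"]

voted = majority_vote([clf_a, clf_b, clf_c])
print(accuracy(clf_a, actual), accuracy(voted, actual))
```

Using an odd number of classifiers on a two-class problem avoids ties in the vote; with ties, `Counter.most_common` returns the label that was seen first.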

A Case Study on Data Mining Techniques Using Association Analysis

  • 류귀열;문영수;최승두
    • Korean Data and Information Science Society: Conference Proceedings / Korean Data and Information Science Society, 2006 Proceedings of Joint Conference of KDISS and KDAS / pp. 109-120 / 2006
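  • In this study, we score all customers through RFM analysis, segment them into five groups (best, good, ordinary, low, and lowest), and test the significance of each segment. For these five segments, association analysis and decision trees are used to find patterns between customers' demographic variables and the variables that are significant for each group. This lays the groundwork for effectively analyzing how to retain the best customers, which customers are likely to leave for competitors, and why. The purpose of this study is to present a method for effectively identifying the complex characteristics of the segments, which cannot be explained theoretically, by analyzing them with association rules and decision trees.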
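The RFM scoring and five-group segmentation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the scoring rule, so rank-based 1-5 scores per dimension, an unweighted sum, and the names `Customer`, `quintile_scores`, and `segment_customers` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    recency_days: int  # days since the last purchase (lower is better)
    frequency: int     # number of purchases (higher is better)
    monetary: float    # total amount spent (higher is better)


# Assumed mapping from group number to the five segment names.
SEGMENT_LABELS = {5: "best", 4: "good", 3: "ordinary", 2: "low", 1: "lowest"}


def quintile_scores(values, higher_is_better=True):
    """Return a rank-based score from 1 to 5 for each value (5 = best)."""
    n = len(values)
    # Sort indices from worst to best so that the worst value gets rank 0.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * 5 // n + 1
    return scores


def segment_customers(customers):
    """Score each customer on R, F, and M, then split the totals into five groups."""
    r = quintile_scores([c.recency_days for c in customers], higher_is_better=False)
    f = quintile_scores([c.frequency for c in customers])
    m = quintile_scores([c.monetary for c in customers])
    totals = [ri + fi + mi for ri, fi, mi in zip(r, f, m)]
    groups = quintile_scores(totals)
    return {c.customer_id: SEGMENT_LABELS[g] for c, g in zip(customers, groups)}


# Hypothetical customers, ordered from least to most valuable.
customers = [
    Customer("c1", 50, 1, 10.0),
    Customer("c2", 40, 2, 20.0),
    Customer("c3", 30, 3, 30.0),
    Customer("c4", 20, 4, 40.0),
    Customer("c5", 10, 5, 50.0),
]
print(segment_customers(customers))
```

The resulting segment labels could then serve as the target variable for the association analysis and decision trees mentioned in the abstract.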


A Fuzzy Window Mechanism for Information Differentiation in Mining Data Streams

  • 장중혁
    • Journal of the Korea Academia-Industrial cooperation Society / Vol. 12, No. 9 / pp. 4183-4191 / 2011
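  • Considering that the elements of a data stream are generated continuously and may change over time, many techniques have been proposed to differentiate the importance of data stream elements according to when they occur. Existing methods are effective at providing analysis results focused on recent information, but they are limited in differentiating information importance more flexibly and in diverse forms. Differentiating information importance on the basis of fuzzy concepts can be a good alternative that compensates for this limitation. Fuzzy concepts have been widely applied in many data mining fields as a way to overcome the problems of approaches with crisp boundaries and to provide results that better match real-world requirements. This paper applies fuzzy concepts to propose a fuzzy window mechanism that can be used efficiently to differentiate information importance in data stream mining. It first describes basic fuzzy concepts, including the fuzzy calendar, and then details weighted pattern mining with the fuzzy window mechanism in data stream mining.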
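One way to picture time-based importance weighting with a fuzzy boundary is the sketch below. It is not the paper's mechanism, whose details the abstract does not give: the trapezoidal membership function, the parameters `full_until` and `zero_from`, and the functions `trapezoid_weight` and `weighted_support` are assumptions chosen for illustration.

```python
def trapezoid_weight(age, full_until, zero_from):
    """Fuzzy membership of a transaction in the window, based on its age.

    Ages up to `full_until` get weight 1.0, ages from `zero_from` on get 0.0,
    and ages in between decay linearly instead of being cut off sharply.
    """
    if age <= full_until:
        return 1.0
    if age >= zero_from:
        return 0.0
    return (zero_from - age) / (zero_from - full_until)


def weighted_support(transactions, itemset, now, full_until, zero_from):
    """Weighted support of `itemset` over (timestamp, items) transactions."""
    itemset = set(itemset)
    total = 0.0
    matched = 0.0
    for timestamp, items in transactions:
        weight = trapezoid_weight(now - timestamp, full_until, zero_from)
        total += weight
        if itemset <= set(items):
            matched += weight
    return matched / total if total > 0 else 0.0


# Hypothetical stream: (timestamp, items), evaluated at time 10.
stream = [(0, {"a", "b"}), (6, {"a"}), (10, {"a", "b"})]
print(weighted_support(stream, {"a", "b"}, now=10, full_until=2, zero_from=10))
```

A sliding window with a crisp boundary is the special case where `full_until` and `zero_from` are nearly equal; widening the gap between them makes the transition from "important" to "ignored" gradual.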

Safety Culture: A Retrospective Analysis of Occupational Health and Safety Mining Reports

  • Tetzlaff, Emily J.;Goggins, Katie A.;Pegoraro, Ann L.;Dorman, Sandra C.;Pakalnis, Vic;Eger, Tammy R.
    • Safety and Health at Work / Vol. 12, No. 2 / pp. 201-208 / 2021
  • Background: In the mining industry, various methods of accident analysis have utilized official accident investigations to try and establish broader causation mechanisms. An emerging area of interest is identifying the extent to which cultural influences, such as safety culture, are acting as drivers in the reoccurrence of accidents. Thus, the overall objective of this study was to analyze occupational health and safety (OHS) reports in mining to investigate if/how safety culture has historically been framed in the mining industry, as it relates to accident causation. Methods: Using a computer-assisted qualitative data analysis software, 34 definitions of safety culture were analyzed to highlight key terms. Based on word count and contextual relevance, 26 key terms were captured. Ten OHS reports were then analyzed via an inductive thematic analysis, using the key terms. This analysis provided a concept map representing the 50-year data set and facilitated the use of text framing to highlight safety culture in the selected OHS mining reports. Results: Overall, 954 references and six themes, safety culture, attitude, competence, belief, patterns, and norms, were identified in the data set. Of the 26 key terms originally identified, 24 of them were captured within the text. The results made evident two distinct frames in which to interpret the data: the role of the individual and the role of the organization, in safety culture. Conclusion: Unless efforts are made to understand and alter cultural drivers and share these findings within and across industries, the same accidents are likely to continue to occur.

Twostep Clustering of Environmental Indicator Survey Data

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / Vol. 17, No. 1 / pp. 1-11 / 2006
  • Data mining techniques are used to find hidden knowledge, unexpected patterns, and new rules in massive data. Data mining methods include decision trees, association rules, clustering, neural networks, and so on. Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. It has been widely used in many applications, such as pattern analysis or recognition, data analysis, image processing, and offline or online market research. We analyze the 2001 Gyeongnam social indicator survey data using the twostep clustering technique to obtain environmental information. Twostep clustering is classified as a partitional clustering method. The twostep clustering outputs can be applied to environmental preservation and improvement.


A Data Mining Tool for Massive Trajectory Data

  • 이재길
    • Journal of KIISE: Computing Practices and Letters / Vol. 15, No. 3 / pp. 145-153 / 2009
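  • Trajectory data can easily be found anywhere in the real world. Recently, advances in satellite, sensor, RFID, video, and wireless communication technologies have made it possible to track moving objects systematically and collect large amounts of trajectory data. Accordingly, the need to analyze trajectory data is steadily growing. This paper develops a mining tool for massive trajectory data. The tool provides the most widely used mining operations: clustering, classification, and outlier detection. Trajectory clustering discovers common movement patterns, trajectory classification predicts the category of a moving object based on its trajectory, and trajectory outlier detection finds trajectories that differ greatly from, or are inconsistent with, the remaining trajectories. The greatest advantage of the tool is that it exploits partial trajectory information during data mining. The tool's effectiveness has been demonstrated on a variety of real trajectory data sets. As a result of this work, practical software for trajectory data mining has been developed, and it is expected to be applicable to many real-world applications.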

Finding Naval Ship Maintenance Expertise Through Text Mining and SNA

  • Kim, Jin-Gwang;Yoon, Soung-woong;Lee, Sang-Hoon
    • Journal of the Korea Society of Computer and Information / Vol. 24, No. 7 / pp. 125-133 / 2019
  • Because special-purpose military weapon systems are small and complex, they are not easy to maintain. It is therefore very important to maintain combat strength through quick maintenance in the event of a breakdown. In particular, naval ships are complex weapon systems fitted with a variety of equipment, so when one piece of equipment fails, other equipment must be considered during maintenance, and skilled maintenance personnel therefore have a great influence on rapid maintenance. In this paper, we analyzed maintenance data from the defense equipment maintenance information system through text mining and social network analysis (SNA) and tried to identify naval ship maintenance expertise. The defense equipment maintenance information system is a system that manages military equipment efficiently. In this study, data (2,538 cases) from some naval ship maintenance teams were analyzed. In detail, we examined the main maintenance contents and maintenance personnel through text mining (word cloud, word network). Next, social network analysis (collaboration analysis, centrality analysis) was used to confirm the collaboration relationships between maintenance personnel and their maintenance expertise. Finally, we compare the results of text mining and SNA to identify appropriate methods for finding naval ship maintenance expertise.

Data Mining Technique for Time Series Analysis of Traffic Data

  • 김철;이도헌
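    • The Institute of Electronics Engineers of Korea: Conference Proceedings / Proceedings of the 2001 Summer Conference of the Institute of Electronics Engineers of Korea (3) / pp. 59-62 / 2001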
  • This paper discusses a data mining technique for time series analysis of traffic data, which provides useful knowledge for network configuration management. Commonly, a network designer must employ a combination of heuristic algorithms and analysis in an interactive manner until satisfactory solutions are obtained. The problem with heuristic algorithms is that they have difficulty dealing with large networks, and simplifications or assumptions have to be made to make the problems solvable. Various data mining techniques have been studied to gain valuable knowledge in large and complex telecommunication networks. In this paper, we propose a traffic pattern association technique that produces association rules of traffic fluctuation patterns among network nodes. The discovered rules can be used to improve network topologies and dynamic routing performance.


Artificial Neural Networks for Interest Rate Forecasting based on Structural Change : A Comparative Analysis of Data Mining Classifiers

  • Oh, Kyong-Joo
    • Journal of the Korean Data and Information Science Society / Vol. 14, No. 3 / pp. 641-651 / 2003
  • This study proposes hybrid models for interest rate forecasting that use structural changes (or change points). The basic concept of the proposed model is to obtain significant intervals delimited by change points, to identify them as change-point groups, and to reflect them in interest rate forecasting. The model is composed of three phases. The first phase detects successive structural changes in the U.S. Treasury bill rate dataset. The second phase forecasts the change-point groups with data mining classifiers. The final phase forecasts interest rates with backpropagation neural networks (BPN). Based on this structure, we propose three hybrid models that differ in the data mining classifier: (1) a multivariate discriminant analysis (MDA)-supported model, (2) a case-based reasoning (CBR)-supported model, and (3) a BPN-supported model. Subsequently, we compare these models with a neural network model alone and, in addition, determine which of the three classifiers (MDA, CBR, and BPN) performs better. For interest rate forecasting, this study then examines the ability of the hybrid models to reflect structural change in their predictions.


Twostep Clustering of Environmental Indicator Survey Data

  • Park, Hee-Chang
    • Korean Data and Information Science Society: Conference Proceedings / Korean Data and Information Science Society, 2005 Autumn Conference / pp. 59-69 / 2005
  • Data mining techniques are used to find hidden knowledge, unexpected patterns, and new rules in massive data. Data mining methods include decision trees, association rules, clustering, neural networks, and so on. Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. It has been widely used in many applications, such as pattern analysis or recognition, data analysis, image processing, and offline or online market research. We analyze the 2001 Gyeongnam social indicator survey data using the twostep clustering technique to obtain environmental information. Twostep clustering is classified as a partitional clustering method. The twostep clustering outputs can be applied to environmental preservation and improvement.
