• Title/Abstract/Keywords: Data mining analysis

Search results: 2,174 (processed in 0.047 s)

Analysis of User Requirements Prioritization Using Text Mining: Focused on Online Game

  • 정미연;허선우;백동현
    • 산업경영시스템학회지
    • Vol. 43, No. 3
    • pp. 112-121
    • 2020
  • Recently, as Internet usage has grown, the volume of text data generated online has grown accordingly. Because this online text data includes users' comments, it can help capture users' opinions more efficiently and effectively. Text mining has been actively studied in recent years, but existing work focuses primarily on either content analysis or techniques for improving the performance of specific mining algorithms. The objective of this study is to propose a novel method for analyzing user requirements using text mining. To complement existing survey techniques, this study aims both to extract customer requirements efficiently from text data and to prioritize them. Specifically, it seeks to identify users' requirements, derive their priorities, and identify the detailed causes of high-priority requirements. The implications of this study are as follows. First, it attempts to overcome the limitations of traditional investigation methods such as surveys and VOC (voice of the customer) analysis by applying text mining to online text data. Second, decision makers can derive and prioritize users' requirements without manually analyzing large volumes of text data. Third, user priorities can be derived on a quantitative basis.
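
The abstract does not describe how priorities are scored. The Python sketch below assumes that each requirement is represented by a hypothetical keyword set and that priority is approximated by the number of comments mentioning it; both the categories and the frequency-based score are illustrative assumptions, not the authors' method.

```python
from collections import Counter

# Hypothetical requirement categories and trigger keywords (assumptions for
# illustration; the paper's actual categories and scoring are not given).
REQUIREMENT_KEYWORDS = {
    "server_stability": {"lag", "disconnect", "server"},
    "game_balance": {"balance", "overpowered", "nerf"},
    "payment": {"price", "expensive", "refund"},
}


def prioritize_requirements(comments, keyword_map=REQUIREMENT_KEYWORDS):
    """Rank requirement categories by how many comments mention them.

    Each comment counts at most once per category, so one long complaint does
    not dominate the ranking. Ties are broken alphabetically for determinism.
    """
    counts = Counter()
    for comment in comments:
        tokens = set(comment.lower().split())
        for category, keywords in keyword_map.items():
            if tokens & keywords:
                counts[category] += 1
    # Highest mention count first, then category name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = [
        "constant lag and disconnect issues on the server",
        "the new hero is overpowered please nerf",
        "server lag again tonight",
        "skins are too expensive",
    ]
    for category, count in prioritize_requirements(sample):
        print(category, count)
```

Counting comments rather than raw keyword hits is a design choice that keeps a single verbose user from inflating one requirement's priority.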

Performance Comparison of Clustering Techniques for Spatio-Temporal Data

  • 강나영;강주영;용환승
    • 지능정보연구
    • Vol. 10, No. 2
    • pp. 15-37
    • 2004
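  • With the recent rapid growth in data volume, research on data mining has been actively pursued, and interest is growing in spatio-temporal data mining, which analyzes data collected from application systems such as GPS systems, surveillance systems, and weather observation systems. Existing spatio-temporal data mining studies apply general clustering techniques designed for non-spatio-temporal data as-is. However, little research has examined how well these existing algorithms perform on spatio-temporal data, whose attributes differ, or what criteria should guide the choice of a mining algorithm according to the spatio-temporal attributes of the data. In this paper, we develop a spatio-temporal data mining module based on SOM (Self-Organizing Map), an algorithm widely used in previous spatio-temporal data mining research, and compare the performance of the developed clustering module with K-means and two hierarchical agglomerative algorithms using four evaluation criteria: homogeneity, separation, silhouette width, and accuracy. We also developed a visualization module for spatio-temporal data clustering, to visualize the characteristics of the input data and to support accurate analysis of the clustering results.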

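
Of the four evaluation criteria in this comparison, silhouette width has a standard definition that fits in a few lines. The Python sketch below computes the average silhouette width for any labeled clustering (for example, the output of SOM, K-means, or an agglomerative method) under Euclidean distance; the function name and the choice of distance are assumptions, since the abstract does not show the authors' implementation.

```python
import math


def silhouette_width(points, labels):
    """Average silhouette width of a clustering under Euclidean distance.

    For point i, a(i) is its mean distance to the other members of its own
    cluster and b(i) is its smallest mean distance to the members of any other
    cluster; s(i) = (b(i) - a(i)) / max(a(i), b(i)). Points in singleton
    clusters (or with a(i) = b(i) = 0) score 0. Needs at least two clusters.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(silhouette_width(pts, [0, 0, 1, 1]))  # tight, well-separated groups
    print(silhouette_width(pts, [0, 1, 0, 1]))  # poor split
```

Values near 1 indicate compact, well-separated clusters, and negative values indicate that many points sit closer to another cluster than to their own.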

A Design of SNS Emotional Information Analysis Strategy based on Opinion Mining

  • 정은희;이병관
    • 한국정보전자통신기술학회논문지
    • Vol. 8, No. 6
    • pp. 544-550
    • 2015
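  • As the volume of opinions exchanged through SNS grows, opinion mining, which infers meaningful information from SNS messages, is becoming increasingly important. This paper proposes SEIAS (SNS Emotional Information Analysis Strategy), an opinion-mining-based strategy that accurately extracts emotional information from SNS by assigning different weights according to the positions of antonyms and adverbs. SEIAS first builds the sentiment dictionary required for opinion mining analysis. Second, it collects SNS data in real time and computes an opinion value for the collected data by comparing it against the sentiment dictionary. In particular, by assigning different weights according to the positions of antonyms and adverbs when computing opinion values, SEIAS improves the accuracy of opinion analysis compared with the existing SO-PMI method.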
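
The abstract does not give the SEIAS dictionary or its weights. A minimal lexicon-based sketch in Python, assuming a hypothetical sentiment dictionary, hypothetical intensifier weights for adverbs, and negation words standing in for the position-dependent antonym handling, shows how modifier position can change an opinion value:

```python
# Hypothetical sentiment dictionary and modifier weights (illustrative
# assumptions; not the SEIAS dictionary or its published weights).
SENTIMENT = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
INTENSIFIERS = {"very": 1.5, "really": 1.3, "slightly": 0.5}  # adverbs
NEGATORS = {"not", "never"}  # stand-ins for polarity-reversing words


def opinion_score(text, window=2):
    """Sum sentiment-word scores, adjusting each by modifiers just before it.

    Only modifiers within `window` tokens before a sentiment word affect it,
    so modifier position matters: an intensifier scales the score and a
    negator flips its sign.
    """
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in SENTIMENT:
            continue
        score = SENTIMENT[token]
        for prev in tokens[max(0, i - window):i]:
            if prev in INTENSIFIERS:
                score *= INTENSIFIERS[prev]
            elif prev in NEGATORS:
                score = -score
        total += score
    return total
```

With these assumed weights, "not good" scores -1.0 while "good not" scores 1.0, because the negator only applies when it precedes the sentiment word. This is a generic lexicon approach, not SO-PMI and not the published SEIAS weighting.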

An Investigation on Expanding Traditional Sequential Analysis Method by Considering the Reversion of Purchase Realization Order

  • 김민석;김남규
    • 한국정보시스템학회지:정보시스템연구
    • Vol. 22, No. 3
    • pp. 25-42
    • 2013
  • Recently, various kinds of information technology services have emerged, and the volume of data flowing through them is increasing rapidly. The patterns of the data we deal with are also gradually becoming more diverse. As a result, demand has been steadily increasing for discovering meaningful knowledge and information through mining analyses such as association (linkage) analysis, sequence analysis, and classification and prediction. However, data mining analysis does not always succeed in solving business problems, and one major cause of this limitation is that some analyzed data do not accurately reflect real-world phenomena. For example, even when two products are purchased within a very short time of each other, traditional sequence analysis records a strict precedence relationship between them. In the real world, however, a precedence relationship between two purchases made so close together may not be well defined. Worse, the order in which the intentions to purchase the two products arose and the order in which the purchases were actually made may be reversed. This study therefore proposes an extended sequence analysis methodology that reflects this situation. The proposed methodology identifies purchases made within a very short time interval, whose order may not be meaningful, and extends the analysis to include both the original sequence and the reversed sequence. Because the threshold for a "very short" time interval can be set in various ways, experiments were also carried out on actual data to examine how the results vary with the chosen interval.
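
The core idea, counting both orders for purchases that fall within a short interval, can be sketched in Python. The `short_gap` threshold, the `(timestamp, item)` input format, and support counted once per customer are assumptions for illustration, not details taken from the paper:

```python
from collections import Counter
from itertools import combinations


def sequence_pairs(purchases, short_gap):
    """Return the ordered item pairs found in one customer's history.

    `purchases` is a list of (timestamp, item) tuples. A pair (a, b) means a
    was bought before b. When two purchases are at most `short_gap` apart
    (a hypothetical threshold), their order is treated as unreliable and both
    (a, b) and (b, a) are kept.
    """
    ordered = sorted(purchases)
    pairs = set()
    for (t1, a), (t2, b) in combinations(ordered, 2):
        if a == b:
            continue
        pairs.add((a, b))
        if t2 - t1 <= short_gap:
            pairs.add((b, a))  # also credit the reversed order
    return pairs


def pair_support(customers, short_gap):
    """Count how many customer histories contain each ordered pair."""
    counts = Counter()
    for purchases in customers.values():
        counts.update(sequence_pairs(purchases, short_gap))
    return counts
```

With `short_gap=0` this reduces to traditional order-preserving pair counting (apart from purchases with identical timestamps); raising the threshold lets near-simultaneous purchases support both orders, mirroring the experiment over different time intervals described in the abstract.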

Design and implementation of data mining tool using PHP and WEKA

  • 유영재;박희창
    • Journal of the Korean Data and Information Science Society
    • Vol. 20, No. 2
    • pp. 425-433
    • 2009
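  • Data mining is the process of finding useful information in vast amounts of data, and it requires data mining tools. Many data mining tools and solutions exist, such as E-Miner, Clementine, WEKA, and R, but most were developed with a focus on versatility and generality and neglect ease of use and analysis automation, so they are often difficult for non-experts to use. In this paper, we design and implement a system that runs data mining techniques in an Internet environment using PHP and WEKA and presents the generated results in a more easily interpretable form, so that general users can use it easily. The techniques implemented are among the most widely used: the Apriori algorithm for association rules, the K-means algorithm for cluster analysis, and the J48 algorithm for decision trees.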

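
WEKA's Apriori is a Java implementation that also generates association rules. As an illustration of the level-wise frequent-itemset idea only (not WEKA's API and not the authors' PHP code), a minimal Python sketch follows; expressing `min_support` as an absolute transaction count is an assumption made for this sketch:

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute number of transactions (an assumption made
    for this sketch). Level-wise search: count candidates of size k, keep the
    frequent ones, join them into size-(k+1) candidates, and prune any
    candidate that has an infrequent k-subset.
    """
    baskets = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    size = 1
    while current:
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: unions of two frequent itemsets with exactly `size` items.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step (Apriori property): all (size-1)-subsets must be frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent
```

Rule generation (for example, computing confidence for `{bread} -> {milk}` from these support counts) would be a separate step on top of this output.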

A Comparison of Clustering Algorithm in Data Mining

  • Lee, Yung-Seop;An, Mi-Young
    • Journal of the Korean Data and Information Science Society
    • Vol. 14, No. 4
    • pp. 725-736
    • 2003
  • To provide the information needed for decision making, it is important to know the relationships or patterns among variables in a database. Grouping objects with similar pattern characteristics is called cluster analysis, one of the data mining techniques. This study compares several partitioning clustering algorithms based on the statistical distance or the total variance within each cluster.

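
Comparing partitioning algorithms by total within-cluster variance needs two pieces: a partitioning method and the variance criterion. The Python sketch below uses plain Lloyd's K-means as the example algorithm; initializing with the first k points is an assumption made for reproducibility (k-means++ or random restarts are common in practice), and the specific algorithms compared in the paper are not reproduced here.

```python
import math


def within_cluster_ss(points, labels, centroids):
    """Total within-cluster variance: sum of squared distances to own centroid."""
    return sum(math.dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))


def _nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means; the first k points serve as initial centroids (assumption)."""
    centroids = [tuple(float(x) for x in p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [_nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: assignments no longer move any centroid
        centroids = updated
    return labels, centroids
```

Because K-means minimizes exactly this within-cluster sum of squares, the criterion is a natural yardstick for comparing it with other partitioning methods on the same data and the same k.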

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences

  • Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong;Lee, Byoung-Yup
    • International Journal of Contents
    • Vol. 3, No. 2
    • pp. 18-24
    • 2007
  • Biological sequences such as DNA and amino acid sequences typically contain a large number of items, and they contain contiguous sequences that often consist of hundreds or more frequent items. In biological sequence analysis (BSA), searching for frequent contiguous sequences is one of the most important operations. Many studies have addressed the efficient mining of sequential patterns, and most existing methods are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, because it grows sequential patterns starting from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, building on the idea of prefixSpan to significantly reduce its recursive processing. However, it is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method for mining maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
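
The fixed-length spanning-tree method is not described in enough detail here to reproduce. For reference, the problem itself can be stated as a brute-force Python sketch: support counts how many input sequences contain a substring, and a frequent substring is maximal if no longer frequent substring contains it. This is only suited to toy inputs and is not the proposed method.

```python
def frequent_contiguous(sequences, min_support, min_len=1):
    """Map every frequent contiguous subsequence (substring) to its support.

    Support is the number of input sequences containing the substring at
    least once. Brute force: enumerates all substrings of every sequence.
    """
    support = {}
    for seq in sequences:
        seen = {
            seq[i:j]
            for i in range(len(seq))
            for j in range(i + min_len, len(seq) + 1)
        }
        for sub in seen:
            support[sub] = support.get(sub, 0) + 1
    return {s: n for s, n in support.items() if n >= min_support}


def maximal_frequent_contiguous(sequences, min_support, min_len=1):
    """Keep only frequent substrings not contained in a longer frequent one."""
    frequent = frequent_contiguous(sequences, min_support, min_len)
    return {
        s: n for s, n in frequent.items()
        if not any(s != t and s in t for t in frequent)
    }


if __name__ == "__main__":
    print(maximal_frequent_contiguous(["ACGTAC", "TACGTT", "GACGTA"], 3))
```

Since every substring of a frequent substring is itself frequent, reporting only the maximal ones loses no information while keeping the output short, which is why maximal patterns are attractive for long biological sequences.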

Comparison of Multiway Discretization Algorithms for Data Mining

  • Kim, Jeong-Suk;Jang, Young-Mi;Na, Jong-Hwa
    • Journal of the Korean Data and Information Science Society
    • Vol. 16, No. 4
    • pp. 801-813
    • 2005
  • Discretization algorithms for continuous data have been actively studied in data mining. Discretization is very important in data analysis, especially for efficient model selection in data mining. In this paper, we introduce the principles of several multiway discretization algorithms, including KEX, 1R, and CN4, and investigate their efficiency through a numerical study. For various underlying distributions, we compare these algorithms in terms of misclassification rate.

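
The comparison above scores each discretization by misclassification rate. None of KEX, 1R, or CN4 is reproduced here; the Python sketch below only shows how a given set of cut points can be scored, assuming each interval predicts its majority class and that errors are counted on the data passed in (in a real comparison this would be held-out data):

```python
import bisect
from collections import Counter


def misclassification_rate(values, labels, cut_points):
    """Error rate when every interval predicts its majority class.

    cut_points must be sorted ascending. A value v falls into interval
    bisect_right(cut_points, v), so a value equal to a cut point joins the
    interval on its right.
    """
    intervals = {}
    for value, label in zip(values, labels):
        intervals.setdefault(bisect.bisect_right(cut_points, value), []).append(label)
    errors = 0
    for members in intervals.values():
        majority_count = Counter(members).most_common(1)[0][1]
        errors += len(members) - majority_count  # non-majority labels are errors
    return errors / len(values)
```

For example, with values 1 to 6 labeled a, a, b, b, b, a, the single cut point 2.5 misclassifies one of six samples, while adding a second cut at 5.5 brings the rate to zero. More intervals always fit the training data at least as well, which is why such rates should be measured on data not used to choose the cut points.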

The Study of Chronic Kidney Disease Classification using KHANES data

  • 이홍기;명성민
    • 한국컴퓨터정보학회:학술대회논문집
    • Proceedings of the 61st Winter Conference of the Korean Society of Computer and Information, 2020, Vol. 28, No. 1
    • pp. 271-272
    • 2020
  • Data mining is known to be useful in the medical field when no available evidence favors a particular treatment option. The healthcare field collects huge volumes of structured and unstructured data to find unknown information or knowledge for effective diagnosis and clinical decision making. The 5,179 records analyzed were collected from the Korean National Health and Nutrition Examination Survey (KHANES) over a two-year period. The data were split into training and test sets to fit and evaluate the models. We predicted chronic kidney disease (CKD) using data mining methods such as naive Bayes, logistic regression, CART, and an artificial neural network (ANN). The results help select significant features and data mining techniques for the lifestyle factors related to CKD.

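
The study compared naive Bayes, logistic regression, CART, and an ANN on a train/test split. As a dependency-free sketch of two of those steps, the Python below shows a seeded train/test split and a Gaussian naive Bayes classifier; treating every feature as continuous and normally distributed within each class is an assumption, and no KHANES variables are modeled.

```python
import math
import random
from collections import defaultdict


def train_test_split(rows, labels, test_ratio=0.3, seed=0):
    """Shuffle indices with a fixed seed and split rows/labels into train/test."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    n_test = int(len(order) * test_ratio)
    test_ids, train_ids = order[:n_test], order[n_test:]
    return (
        [rows[i] for i in train_ids],
        [rows[i] for i in test_ids],
        [labels[i] for i in train_ids],
        [labels[i] for i in test_ids],
    )


class GaussianNaiveBayes:
    """Naive Bayes that models each feature as normal within each class."""

    def fit(self, rows, labels, var_smoothing=1e-9):
        grouped = defaultdict(list)
        for row, y in zip(rows, labels):
            grouped[y].append(row)
        self.priors, self.stats = {}, {}
        for y, members in grouped.items():
            self.priors[y] = len(members) / len(rows)
            self.stats[y] = []
            for column in zip(*members):  # one feature at a time
                mean = sum(column) / len(column)
                var = sum((x - mean) ** 2 for x in column) / len(column)
                self.stats[y].append((mean, var + var_smoothing))
        return self

    def _log_joint(self, row, y):
        # log P(y) + sum of log N(x | mean, var); equals the log posterior
        # up to the term log P(x), which is shared by all classes.
        score = math.log(self.priors[y])
        for x, (mean, var) in zip(row, self.stats[y]):
            score -= 0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var)
        return score

    def predict(self, rows):
        return [max(self.priors, key=lambda y: self._log_joint(row, y)) for row in rows]
```

Typical use is `GaussianNaiveBayes().fit(train_x, train_y).predict(test_x)`, comparing the predictions with `test_y` to obtain test accuracy; the other three models would be evaluated on the same split for a fair comparison.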

Data Mining Research on Maehwado Painting Poetry in the Early Joseon Dynasty

  • Haeyoung Park;Younghoon An
    • Journal of Information Processing Systems
    • Vol. 19, No. 4
    • pp. 474-482
    • 2023
  • Data mining is a technique for extracting valuable information from vast amounts of data by analyzing statistical and mathematical operations, rules, and relationships. In this study, we employed data mining technology to analyze painting poetry on Maehwado (plum blossom paintings) from the early Joseon Dynasty. The data were extracted from the Hanguk Munjip Chonggan (Korean Literary Collections in Classical Chinese) in the Hanguk Gojeon Jonghap database (Korea Classics DB). Using computer information processing techniques, we carried out web scraping and classification of the painting poetry in the Hanguk Munjip Chonggan. We then narrowed our focus to painting poetry specifically related to Maehwado in the early Joseon Dynasty. Based on this refined dataset, we conducted an in-depth analysis and interpretation of the text data at the syllable corpus level. As a result, we found a direct correlation between the corpus statistics for each syllable in Maehwado painting poetry and the symbolic meaning of plum blossoms.
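
Syllable-level corpus statistics start from per-character counts, since each character of Classical Chinese is one syllable. A minimal Python sketch, assuming the poems have already been scraped and extracted as plain strings; the sample strings in the usage note are made up, not taken from the Hanguk Munjip Chonggan:

```python
import unicodedata
from collections import Counter


def syllable_frequencies(poems):
    """Count characters (one character = one syllable in Classical Chinese).

    Whitespace and punctuation are skipped using Unicode general categories,
    so only letter characters (category L*, which includes CJK ideographs
    and Hangul) are counted. Returns raw counts and relative frequencies.
    """
    counts = Counter(
        ch
        for poem in poems
        for ch in poem
        if unicodedata.category(ch).startswith("L")
    )
    total = sum(counts.values())
    relative = {ch: n / total for ch, n in counts.items()} if total else {}
    return counts, relative
```

For example, `syllable_frequencies(["梅花 雪中梅。", "月下梅花"])` counts nine syllables, and `counts.most_common(1)` on the result gives `[('梅', 3)]`. Relating such frequencies to symbolic meaning, as the study does, is an interpretive step beyond this counting.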