• Title/Summary/Keyword: Pattern mining


Utilization of Information through Web Log Analysis (웹 로그(Web Log) 분석을 통한 정보의 활용)

  • 김석기;안정용;한경수;한범수
    • Proceedings of the Korean Statistical Society Conference / 2000.11a / pp.123-127 / 2000
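  • The Internet is widely used as a tool for data storage and services, and in the process logs are generated that record information about visitors to a web server. These logs contain information such as the visitor's address, the referring page, and the time of the visit. Applying statistical analyses such as pattern analysis, clustering, and classification to web logs can generate new information, such as the items visitors are interested in and the associations among items, and research on applying this information to web design or business is being actively discussed. This study introduces web log analysis and proposes approaches for carrying it out.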


K-means Clustering using Grid-based Representatives

  • Park, Hee-Chang;Lee, Sun-Myung
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.759-768 / 2005
  • K-means clustering has been widely used in many applications, such as pattern analysis, data analysis, and market research. It can identify dense and sparse regions among data attributes or object attributes. However, the k-means algorithm requires many hours to produce k clusters because it is primitive and exploratory. In this paper we propose a new k-means clustering method that uses grid-based representative values (arithmetic mean and trimmed mean) of the sample. It is faster than any traditional clustering method while maintaining accuracy.

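The abstract above does not spell out the algorithm, so the following is only a minimal sketch of one way to read it: points are bucketed into a uniform grid, each occupied cell is replaced by the arithmetic mean of its points, and a weighted k-means runs on those representatives. The function names, the `cell_size` parameter, and the count-based weighting are assumptions made for illustration, not the authors' method, and the trimmed-mean variant is omitted.

```python
import numpy as np

def grid_representatives(points, cell_size):
    """Replace the points in each occupied grid cell by their arithmetic mean.

    Returns the representatives and the number of points behind each one.
    """
    cells = np.floor(points / cell_size).astype(int)
    # Group points by the integer coordinates of their grid cell.
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)  # accumulate coordinates per cell
    return sums / counts[:, None], counts

def weighted_kmeans(reps, weights, k, iters=50, seed=0):
    """Lloyd's algorithm on representatives, weighting each by its cell count."""
    rng = np.random.default_rng(seed)
    centers = reps[rng.choice(len(reps), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(reps[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():  # leave a center in place if it attracted no points
                centers[j] = np.average(reps[mask], axis=0, weights=weights[mask])
    return centers, labels
```

Because each representative is weighted by how many points it stands for, a cluster's center equals the mean of the original points assigned to it, while the distance computations run over the (usually far fewer) occupied cells.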

Mining Korean-English Terminologies by Pattern Generation in Internet (패턴생성을 통한 인터넷 문서의 한글-영문용어 추출)

  • 강재호;김종성;류광렬
    • Proceedings of the Korean Information Science Society Conference / 2003.10a / pp.148-150 / 2003
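  • In fields where technical terms are numerous and new ones are coined frequently, securing a sizable translation term dictionary is essential for high-quality information retrieval and machine translation. Building such a dictionary by hand in these fields is a heavy burden. This paper proposes a way to build a translation term dictionary automatically by repeating two steps: finding the places in a corpus where an already known term (in the source language) and its translation are written together and turning them into patterns, and then using the generated patterns to extract additional term-translation pairs. Applying the proposed method to Internet documents, we were able to extract a considerable number of valid Korean-English term pairs.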


Web Access Pattern Mining considering Page Visiting Duration Time (페이지 소요 시간을 고려한 웹 액세스 패턴 마이닝)

  • 성현정;용환승
    • Proceedings of the Korean Information Science Society Conference / 2001.10a / pp.55-57 / 2001
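  • Web log mining extracts web access patterns from large volumes of web log data in order to discover users' behavior patterns. This can be used to find and fix problems in website design or to provide users with personalized pages. The aim of this paper is to extract more accurate access patterns that reflect user interest by considering not only the number of times a page is accessed but also the time spent on the page.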

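As a rough illustration of the idea in the abstract above, the sketch below ignores page views shorter than a dwell-time threshold before counting which page-to-page transitions occur in enough sessions. The session layout (lists of `(page, seconds)` tuples), the function name, and both thresholds are hypothetical; the paper's actual algorithm may combine access counts and durations differently.

```python
from collections import Counter

def frequent_pairs(sessions, min_seconds, min_support):
    """Count consecutive page pairs, ignoring views shorter than min_seconds.

    Each session is a list of (page, seconds_on_page) tuples. A pair is
    counted at most once per session, and only pairs seen in at least
    min_support sessions are returned.
    """
    counts = Counter()
    for session in sessions:
        # Keep only pages the visitor actually spent time on.
        pages = [page for page, seconds in session if seconds >= min_seconds]
        counts.update(set(zip(pages, pages[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_support}

sessions = [
    [("home", 30), ("ad", 1), ("news", 40)],
    [("home", 20), ("news", 25), ("sports", 2)],
    [("home", 3), ("news", 50)],
]
patterns = frequent_pairs(sessions, min_seconds=5, min_support=2)
```

In the sample data the one-second visit to `ad` is dropped, so the first session still contributes the transition from `home` to `news`; counting raw accesses alone would have split it into two unrelated transitions.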

Regular Pattern Mining with Multiple Minimum Supports (다중 최소 임계치를 이용한 정규 패턴 마이닝)

  • Choi, Hyong-Gil;Lee, Sang-Jun
    • Annual Conference of KIPS / 2013.11a / pp.1061-1063 / 2013
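  • Much of the existing work on frequent pattern mining applies a single minimum support threshold uniformly to every item in the transaction database. Because the same threshold applies to all items, the rare item problem arises. Meanwhile, a pattern that occurs at regular intervals is called a regular pattern. In the real world, there is a growing need for information not only on frequent items but also on patterns that occur periodically. This paper presents a technique for mining frequent regular patterns that resolves the rare item problem.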
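The two ideas in this abstract can be sketched together as follows. This is a brute-force illustration under stated assumptions, not the paper's algorithm: each item gets its own minimum support (the `mis` dictionary, a hypothetical name), an itemset must reach the lowest minimum support among its items, and its largest gap between consecutive occurrences (counting the gaps to the start and end of the database) must not exceed `max_period`.

```python
from itertools import combinations

def regularity(tids, n_transactions):
    """Largest gap between consecutive occurrences of a pattern.

    The gaps from the start of the database to the first occurrence and
    from the last occurrence to the end are included.
    """
    points = [0] + sorted(tids) + [n_transactions]
    return max(b - a for a, b in zip(points, points[1:]))

def mine_regular_patterns(transactions, mis, max_period, max_size=2):
    """Enumerate itemsets that are frequent (per-item thresholds) and regular."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            # Transaction ids are numbered from 1.
            tids = [tid for tid, t in enumerate(transactions, start=1)
                    if set(itemset) <= t]
            threshold = min(mis[item] for item in itemset)
            if len(tids) >= threshold and regularity(tids, n) <= max_period:
                result[itemset] = len(tids)
    return result

transactions = [
    {"bread", "milk"}, {"bread"}, {"bread", "milk", "caviar"},
    {"bread"}, {"bread", "milk"}, {"bread", "caviar"},
]
mis = {"bread": 4, "milk": 3, "caviar": 2}
patterns = mine_regular_patterns(transactions, mis, max_period=3)
```

With a single threshold of 4 applied to every item, only `bread` would survive; the per-item thresholds keep the rare but regularly occurring `caviar` patterns.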

The Pattern Analysis of Financial Distress for Non-audited Firms using Data Mining (데이터마이닝 기법을 활용한 비외감기업의 부실화 유형 분석)

  • Lee, Su Hyun;Park, Jung Min;Lee, Hyoung Yong
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.111-131 / 2015
  • There are only a handful of studies on pattern analysis of corporate distress, compared with the research on bankruptcy prediction. The few that exist mainly focus on audited firms because financial data collection is easier for these firms. In reality, however, corporate financial distress is a far more common and critical phenomenon for non-audited firms, which are mainly small and medium-sized firms. The purpose of this paper is to classify non-audited firms under distress according to their financial ratios using a data mining technique, the Self-Organizing Map (SOM). SOM is a type of artificial neural network that is trained using unsupervised learning to produce a lower-dimensional, discretized representation of the input space of the training samples, called a map. SOM differs from other artificial neural networks in that it applies competitive learning as opposed to error-correction learning such as backpropagation with gradient descent, and in that it uses a neighborhood function to preserve the topological properties of the input space. It is one of the popular and successful clustering algorithms. In this study, we classify the types of financially distressed firms, specifically non-audited firms. In the empirical test, we collect 10 financial ratios of 100 non-audited firms under distress in 2004 for the previous two years (2002 and 2003). Using these financial ratios and the SOM algorithm, five distinct patterns were distinguished. In pattern 1, financial distress was very serious in almost all financial ratios; 12% of the firms fell into this pattern. In pattern 2, financial distress was weak in almost all financial ratios; 14% of the firms fell into this pattern. In pattern 3, the growth ratio was the worst among all patterns. It is speculated that the firms of this pattern may be under distress due to severe competition in their industries. Approximately 30% of the firms fell into this group. In pattern 4, the growth ratio was higher than in any other pattern, but the cash ratio and profitability ratio were not at the level of the growth ratio. It is concluded that the firms of this pattern became distressed while pursuing business expansion. About 25% of the firms were in this pattern. Finally, pattern 5 encompassed very solvent firms. Perhaps firms of this pattern were distressed due to a bad short-term strategic decision or due to problems with the entrepreneurs of the firms. Approximately 18% of the firms were in this pattern. This study makes both an academic and an empirical contribution. Academically, it applies a data mining technique (the Self-Organizing Map) to non-audited companies, which tend to go bankrupt easily and have unstructured or easily manipulated financial data, rather than to large audited firms with well-prepared and reliable financial data. Empirically, even though the analysis relies on the financial data of non-audited firms, it is useful for finding the first-order symptoms of financial distress, which helps to forecast the bankruptcy of firms and to manage early warning and alert signals. A limitation of this research is that it analyzes only 100 firms because of the difficulty of collecting financial data on non-audited firms, which makes it hard to carry out the analysis by category or firm size. In addition, non-financial qualitative data is crucial for the analysis of bankruptcy, so non-financial qualitative factors will be taken into account in the next study. This study sheds some light on distress prediction for non-audited small and medium-sized firms.

Investigation of Topic Trends in Computer and Information Science by Text Mining Techniques: From the Perspective of Conferences in DBLP (텍스트 마이닝 기법을 이용한 컴퓨터공학 및 정보학 분야 연구동향 조사: DBLP의 학술회의 데이터를 중심으로)

  • Kim, Su Yeon;Song, Sung Jeon;Song, Min
    • Journal of the Korean Society for Information Management / v.32 no.1 / pp.135-152 / 2015
  • The goal of this paper is to explore the field of Computer and Information Science with the aid of text mining techniques by mining Computer and Information Science related conference data available in DBLP (Digital Bibliography & Library Project). Although studies based on bibliometric analysis are the most prevalent way of investigating the dynamics of a research field, we attempt to understand the dynamics of the field by using Latent Dirichlet Allocation (LDA)-based multinomial topic modeling. For this study, we collect 236,170 documents from 353 conferences related to Computer and Information Science in DBLP, aiming to cover conferences in the field as broadly as possible. We analyze the topic modeling results along with the datasets collected over the period from 2000 to 2011, including the top authors and top conferences per topic. We identify four different patterns in topic trends in the field of Computer and Information Science during this period: growing (network-related topics), shrinking (AI and data mining related topics), continuing (web, text mining, information retrieval, and database related topics), and fluctuating (HCI, information system, and multimedia system related topics).

Data Bias Optimization based Association Reasoning Model for Road Risk Detection (도로 위험 탐지를 위한 데이터 편향성 최적화 기반 연관 추론 모델)

  • Ryu, Seong-Eun;Kim, Hyun-Jin;Koo, Byung-Kook;Kwon, Hye-Jeong;Park, Roy C.;Chung, Kyungyong
    • Journal of the Korea Convergence Society / v.11 no.9 / pp.1-6 / 2020
  • In this study, we propose an association inference model based on data bias optimization for road hazard detection. It is a mining model based on association analysis that collects users' personal characteristics and surrounding environment data and provides traffic accident prevention services. The model creates transaction data composed of various context variables. Based on the generated information, meaningful correlations among the variables in each transaction are derived through correlation pattern analysis. Taking the bias of the classified categorical data into account, pruning is performed with optimized support and confidence values. Based on the extracted high-level association rules, a risk detection model for personal characteristics and driving road conditions is provided to users. This enables traffic services that overcome the data bias problem and prevent potential road accidents by considering the associations between data. In the performance evaluation, the proposed method achieves an accuracy of 0.778 and a Kappa coefficient of 0.743.
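For readers unfamiliar with the two pruning measures mentioned in this abstract, the sketch below computes the standard support and confidence of a rule over a list of transactions. The context variables in the sample data and the function name are invented for illustration; the paper's bias-aware choice of thresholds is not reproduced here.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = share of transactions containing both sides of the rule
    confidence = share of antecedent-containing transactions that also
                 contain the consequent
    """
    n = len(transactions)
    with_antecedent = sum(1 for t in transactions if antecedent <= t)
    with_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = with_both / n
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence

# Hypothetical context transactions: weather, time of day, observed risk.
transactions = [
    {"rain", "night", "risk"},
    {"rain", "risk"},
    {"rain", "day"},
    {"clear", "day"},
]
support, confidence = rule_metrics(transactions, {"rain"}, {"risk"})
```

A rule would be kept only if both values clear their thresholds; here the rule from `rain` to `risk` holds in half of all transactions and in two of the three rainy ones.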

WebPR : A Dynamic Web Page Recommendation Algorithm Based on Mining Frequent Traversal Patterns (WebPR :빈발 순회패턴 탐사에 기반한 동적 웹페이지 추천 알고리즘)

  • Yoon, Sun-Hee;Kim, Sam-Keun;Lee, Chang-Hoon
    • The KIPS Transactions:PartB / v.11B no.2 / pp.187-198 / 2004
  • The World-Wide Web is the largest distributed information space and has grown to encompass diverse information resources. However, although the Web is growing exponentially, an individual's capacity to read and digest content is essentially fixed. Web users can be confused by the explosion of Web information, by constantly changing Web environments, and by a lack of understanding of their needs. In these environments, mining traversal patterns is an important problem in Web mining, with a host of application domains including system design and information services. Conventional traversal pattern mining systems use inter-page association in sessions with only a very restricted mechanism (based on vectors or matrices) for generating frequent k-pagesets. We develop a family of novel algorithms (termed WebPR, for Web Page Recommend) for mining frequent traversal patterns and then the pagesets to recommend. Our algorithms provide Web users with new page views that include the recommended pagesets, so that users can traverse the Web site effectively. The main distinguishing features are schemes that consistently apply inter-page association when mining frequent traversal patterns and the proposal of the most efficient tree model. Our experiments with two real data sets, including the Lady Asiana and KBS media server sites, clearly validate that our method outperforms conventional methods.

Multiple SVM Classifier for Pattern Classification in Data Mining (데이터 마이닝에서 패턴 분류를 위한 다중 SVM 분류기)

  • Kim Man-Sun;Lee Sang-Yong
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.3 / pp.289-293 / 2005
  • Pattern classification extracts various types of pattern information that express objects in the real world and decides their class. The top priority of pattern classification technologies is to improve classification performance, and many researchers have tried various approaches to this end over the last 40 years. Classification methods used in pattern classification include the Bayes classifier based on probabilistic inference of patterns, decision trees, methods based on distance functions, neural networks, and clustering, but they are not efficient at analyzing large amounts of multi-dimensional data. Thus, there is active research on multiple classifier systems, which improve classification performance by combining a number of mutually complementary classifiers. The present study identifies problems in previous research on multiple SVM classifiers and proposes BORSE, a model that adopts a 1:M policy to extend SVM to a multi-class classifier, regards each SVM output as a signal with a non-linear pattern, trains a neural network on that pattern, and combines the final classification results.