• Title/Summary/Keyword: FP-Tree

Search Result 48

Mining Technique of Tour Destination by weighted FP-tree (가중치가 부여된 FP-tree를 이용한 여행지 추출 기법)

  • MinJu Kim;EunJu Lee;Eung-Mo Kim
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.233-236 / 2008
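  • With the rapid development of computer and communication technology, every part of society has undergone informatization, a change it had never experienced before. As the level of informatization has advanced, increasingly diverse and massive data have been generated and accumulated in databases. Data mining techniques for obtaining useful information from massive data have therefore emerged as an important issue, and they are becoming essential for making rational choices in more and more fields. This paper applies a mining technique so that a massive database can support the selection of an optimal travel route. It assigns weights within the frequent pattern growth (FP-growth) technique to give travelers a better environment for selecting destinations. Tourism, one of the most important industries of the future, continues to grow, and the data mining technique presented in this paper is expected to contribute to its further development.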

High Utility Pattern Mining using a Prefix-Tree (Prefix-Tree를 이용한 높은 유틸리티 패턴 마이닝 기법)

  • Jeong, Byeong-Soo;Ahmed, Chowdhury Farhan;Lee, In-Gi;Yong, Hwan-Seong
    • Journal of KIISE:Databases / v.36 no.5 / pp.341-351 / 2009
  • Recently, high utility pattern (HUP) mining has become one of the most important research issues in data mining, since it can consider the different weight values of items. However, existing mining algorithms suffer from performance degradation because they cannot easily apply the Apriori principle to pattern mining. In this paper, we introduce a new high utility pattern mining approach that uses a prefix-tree, as in the FP-Growth algorithm. Our approach stores the weight value of each item in a node and uses these values to prune unnecessary patterns. We compare the performance characteristics of three different prefix-tree structures. Through thorough experimentation, we also show that our approach can improve performance to a degree.

Pattern Analysis of Traffic Accident data and Prediction of Victim Injury Severity Using Hybrid Model (교통사고 데이터의 패턴 분석과 Hybrid Model을 이용한 피해자 상해 심각도 예측)

  • Ju, Yeong Ji;Hong, Taek Eun;Shin, Ju Hyun
    • Smart Media Journal / v.5 no.4 / pp.75-82 / 2016
  • Although Korea's economy and domestic automobile market have grown with changes in the road environment, the traffic accident rate has also increased, and the number of casualties is at a serious level. For this reason, the government is establishing and promoting policies to open traffic accident data and solve these problems. This paper describes a method of predicting traffic accidents by eliminating the class imbalance in traffic accident data and constructing a hybrid model. Using the original traffic accident data and the sampled data as training data, the FP-Growth algorithm learns patterns associated with traffic accident injury severity. Accordingly, this paper proposes a method for predicting the injury severity of a traffic accident victim: by analyzing the association patterns of the two training data sets, the same related patterns can be extracted, and when decision tree and multinomial logistic regression analyses are performed, a hybrid model is constructed by assigning weights to the related attributes.

Characterization and Fibrinolytic Activity of Acetobacter sp. FP1 Isolated from Fermented Pine Needle Extract

  • Park, Jae-Young;Yoon, Seo-Hyeon;Kim, Seong-Sim;Lee, Beom-Gi;Cheong, Hyeong-Sook
    • Journal of Microbiology and Biotechnology / v.22 no.2 / pp.215-219 / 2012
  • The strain KCTC 11629BP, isolated from spontaneously fermented pine needle extract (FPE), showed fibrinolytic activity. The isolated strain was analyzed in physiological and biochemical experiments. Based on 16S rDNA sequencing and phylogenetic tree analysis, the strain was identified as a member of the genus Acetobacter, with Acetobacter senegalensis and Acetobacter tropicalis as its closest phylogenetic neighbors. Based on genotypic and phenotypic results, it was proposed that bacterial strain KCTC 11629BP represents a species of the genus Acetobacter. The strain was thus named Acetobacter sp. FP1. In conclusion, Acetobacter sp. FP1 isolated from FPE possesses fibrinolytic activity.

Introduction to Concept in Association Rule Mining (연관규칙 마이닝에서의 Concept 개요)

  • ;;R. S. Famakrishna
    • Proceedings of the Korean Information Science Society Conference / 2002.04b / pp.100-102 / 2002
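  • Various algorithms have been proposed for association rule mining, a representative data mining technique, and each algorithm comes with its own distinctive data structure for fast search over large volumes of data. The performance of an algorithm, which follows from the characteristics of its data structure, depends heavily on the patterns in the data. This paper compares and analyzes three representative data structures that form a Concept (Hash Tree, Lattice, and FP-Tree) and proposes a framework for designing efficient algorithms suited to the data patterns.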

Mining Frequent Pattern from Large Spatial Data (대용량 공간 데이터로 부터 빈발 패턴 마이닝)

  • Lee, Dong-Gyu;Yi, Gyeong-Min;Jung, Suk-Ho;Lee, Seong-Ho;Ryu, Keun-Ho
    • Journal of Korea Spatial Information System Society / v.12 no.1 / pp.49-56 / 2010
  • Frequent pattern mining techniques for detecting unknown patterns in spatial data have been studied actively. Existing data structures are classified into tree structures and array structures, and these structures perform poorly on either dense or sparse data. Since spatial data exhibit both dense and sparse patterns, it is important to mine both kinds of patterns quickly using a single algorithm. In this paper, we propose a novel data structure, the compressed Patricia frequent pattern tree, and a frequent pattern mining algorithm based on it that can detect frequent patterns quickly on both dense and sparse data. In our experiments, the proposed algorithm was about 10 times faster than the existing FP-Growth algorithm on both dense and sparse data.

Advanced Improvement for Frequent Pattern Mining using Bit-Clustering (비트 클러스터링을 이용한 빈발 패턴 탐사의 성능 개선 방안)

  • Kim, Eui-Chan;Kim, Kye-Hyun;Lee, Chul-Yong;Park, Eun-Ji
    • Journal of Korea Spatial Information System Society / v.9 no.1 / pp.105-115 / 2007
  • Data mining extracts interesting knowledge from large databases. Among the numerous data mining techniques, research has concentrated primarily on clustering and association rules. Clustering, an active research topic, mainly deals with analyzing spatial and attribute data, while association rule techniques deal with identifying frequent patterns. An improved Apriori algorithm based on an existing bit-clustering algorithm had been proposed previously. In an effort to identify an alternative that improves on Apriori, we investigated FP-Growth and discussed the possibility of adopting bit-clustering to solve the problems with FP-Growth. FP-Growth using bit-clustering demonstrated better performance than the existing method. We used chess data to evaluate pattern mining and built FP-Trees with different minimum support values. With high minimum support values, the results were similar to those of the existing techniques. In the other cases, however, the proposed technique performed better than the existing technique. We therefore consider that the proposed technique leads to higher performance. In addition, a method for applying bit-clustering to GML data was proposed.

Parallel Data Mining with Distributed Frequent Pattern Trees (분산형 FP트리를 활용한 병렬 데이터 마이닝)

  • 조두산;김동승
    • Proceedings of the IEEK Conference / 2003.07c / pp.2561-2564 / 2003
  • Data mining is an effective method for discovering useful information, such as rules and previously unknown patterns, in large databases. The discovery of association rules is an important data mining problem. We have developed a new parallel mining algorithm, called Distributed Frequent Pattern Tree (DFPT), that detects association rules on a distributed shared-nothing parallel system. The DFPT algorithm is devised for parallel execution of the FP-growth algorithm. It needs only two full scans of the database on disk because it eliminates the need to generate candidate items. We achieved good workload balance throughout the mining process by distributing the work equally to all processors. We implemented the algorithm on a PC cluster system and observed that it outperformed the Improved Count Distribution scheme.
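
The two-scan property mentioned in this abstract comes from how an FP-tree is built: one pass counts item support, and a second pass inserts each transaction's frequent items in descending-support order. The following is a minimal single-machine sketch of that construction, not the DFPT algorithm itself; the names `FPNode` and `build_fp_tree` and the alphabetical tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree (hypothetical helper for this sketch)."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree with exactly two passes over the transactions."""
    # Scan 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Scan 2: insert the frequent items of each transaction, ordered by
    # descending support (ties broken alphabetically, an assumption).
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent


# Toy data, invented for this example.
transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
root, frequent = build_fp_tree(transactions, min_support=2)
b = root.children["b"]
print(b.count, b.children["a"].count, b.children["c"].count)
```

Transactions that share a prefix share a path in the tree, which is why no candidate sets need to be generated; a parallel variant would additionally have to decide how transactions or subtrees are split across processors, which this sketch does not attempt.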

Design and Implementation of PMSL for Information Retrieval (의미있는 정보 검색을 위한 개인화된 다중 전략 학습 모듈의 설계 및 구현)

  • 유수경;김교정
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.208-210 / 2004
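  • Today, the large amount of information on the Internet needs to be delivered as new knowledge tailored to the personal characteristics of diverse users. Previous studies extracted information through a single learning technique; to provide users with more meaningful information, we propose the PMSL (Personalized Multi-Strategy Learning) module system, which is based on multi-strategy learning. The PMSL module filters information from the Internet and extracts related objects centered on the user's personalized keywords. When extracting related objects, we applied FP-Tree and the FP-Growth algorithm, an association search technique that is efficient in both time and space on large data, to improve the efficiency of the results, and we applied the TF*IDF weighting technique to compensate for the shortcomings of association rules. Running the PMSL module extracted more meaningful association knowledge than existing learning techniques.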
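
For readers unfamiliar with the TF*IDF weighting named in this abstract, a minimal sketch follows. The abstract does not say which variant the authors used, so the formula here (term frequency normalized by document length, times the natural log of the number of documents over the document frequency) is an assumption, as is the function name `tf_idf`.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes every document is a non-empty list of terms.
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights


# Toy documents, invented for this example.
docs = [["fp", "tree", "fp"], ["tree", "mining"]]
weights = tf_idf(docs)
print(weights[0])
```

A term that appears in every document receives weight 0 under this variant, which is how such a weighting can down-rank associations built on ubiquitous keywords.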

Change of the Vegetation Due to Soyanggang Dam Construction (소양강댐 건설에 따른 주변 식생의 변화)

  • Choi, Ho;Park, Pil-Sun;Kim, Jae-Geun;Suh, Sim-Eun
    • Journal of Wetlands Research / v.12 no.3 / pp.1-13 / 2010
  • Most investigations of the effects of dam construction on the surrounding environment have focused mainly on changes in climate conditions and crop production. To study the effect of dam construction on the surrounding vegetation, we chose the Soyanggang dam, which has the largest storage capacity in Korea and was built 33 years ago. We surveyed and analyzed the surrounding vegetation using the quadrat method and measured the soil moisture content at the floodplain (FP), 5 m above the floodplain (AFP), and a control group (CG) located 3 km from the lake through the ridge. In the canopy~understory layer, the species with the largest mean importance percentage was Salix koreensis (87.9%) at FP and Quercus mongolica at AFP and CG (38.9% and 40.4%, respectively); in the herb layer, it was Artemisia capillaris (34.2%) at FP and Oplismenus undulatifolius var. undulatifolius at AFP and CG (9.4% and 24.6%, respectively). The Shannon-Wiener diversity index of the shrub~canopy layer at FP (0.26) was lower than at AFP (2.34) and CG (2.23), and there was no significant difference in the herb layer among the three groups. The Sørensen similarity index between FP and AFP and between FP and CG was 0, and that between AFP and CG was relatively high. By DBH class, the highest density of trees and subtrees at FP was S. koreensis of 5~10 cm (240/ha), and at AFP and CG it was Quercus spp. of 15~20 cm (400/ha and 466/ha, respectively). The highest seedling density at FP was Pinus densiflora (7,040/ha), and at AFP and CG it was Quercus spp. (720/ha and 400/ha, respectively). The soil water content at FP (6.28%) was lower than at AFP and CG (11.13% and 10.14%, respectively; p<.01). These results indicate that construction of the Soyanggang dam changed the vegetation of the floodplain without changing that of the upland areas.
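
The two indices reported in this abstract have standard textbook definitions, sketched below: the Shannon-Wiener index H' = -Σ p_i ln p_i over species proportions, and the Sørensen index 2C / (A + B) for two sites with A and B species, of which C are shared. The abstract does not state the logarithm base or the abundance measure the authors used, so the natural log, the function names, and the toy inputs are all assumptions.

```python
import math


def shannon_wiener(abundances):
    """H' = -sum(p_i * ln p_i), skipping species with zero abundance."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total)
                for a in abundances if a > 0)


def sorensen(species_a, species_b):
    """Sørensen similarity 2C / (A + B) on species presence lists."""
    a, b = set(species_a), set(species_b)
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical plots, invented for this example (not the paper's data).
even_plot = [10, 10, 10, 10]     # four equally abundant species
dominated_plot = [97, 1, 1, 1]   # one species dominates
print(shannon_wiener(even_plot), shannon_wiener(dominated_plot))
print(sorensen(["Salix"], ["Quercus", "Pinus"]))
```

A plot dominated by a single species scores a low H', consistent with the low value the abstract reports where S. koreensis dominates, and a Sørensen index of 0 means the two sites share no species.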