• Title/Summary/Keyword: Cluster Tree

Design and implementation of data mining tool using PHP and WEKA (피에이치피와 웨카를 이용한 데이터마이닝 도구의 설계 및 구현)

  • You, Young-Jae;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.20 no.2 / pp.425-433 / 2009
  • Data mining is a method for finding useful information in large amounts of data stored in databases. It is used to discover hidden knowledge, unexpected patterns, and relationships that lead to new rules in massive data. Exploring such large volumes of information requires a data mining tool. There are many data mining tools and solutions, such as E-Miner, Clementine, WEKA, and R. Most of them focus on diversity and general-purpose use, so they are not easy for laypeople to use. In this paper we design and implement a web-based data mining tool using PHP and WEKA. Its results are easy to interpret, so general users are able to handle it. We implement the Apriori algorithm for association rules, the K-means algorithm for cluster analysis, and the J48 algorithm for decision trees.
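
The association-rule step can be illustrated with a minimal Apriori sketch in Python. It is not the PHP/WEKA implementation described in the paper; the function names `apriori` and `confidence`, the use of an absolute `min_support` count, and the toy basket data are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset with count >= min_support.

    Hypothetical helper: min_support is an absolute number of transactions.
    """
    baskets = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # Count how many baskets contain each candidate itemset.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates with an infrequent subset.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def confidence(frequent, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs | rhs) / support(lhs)."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return frequent[lhs | rhs] / frequent[lhs]


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_support=2)
print(confidence(freq, {"bread"}, {"milk"}))  # 2 of the 3 bread baskets contain milk
```

Candidate pruning relies on the Apriori property that every subset of a frequent itemset must itself be frequent, which is what keeps the search from enumerating all item combinations.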

Inferring the Molecular Phylogeny of Chroococcalian Strains (Blue-green algae/Cyanophyta) from the Geumgang River, Based on Partial Sequences of 16S rRNA Gene

  • Lee, Wook-Jae;Bae, Kyung-Sook
    • Journal of Microbiology / v.40 no.4 / pp.335-339 / 2002
  • Partial sequences of the 16S rRNA gene were determined for five chroococcalian blue-green algal strains isolated from water of the Geumgang River (Aphanothece nidulans KCTC AG10041, Aphanothece naegelii KCTC AG10042, Microcystis aeruginosa KCTC AG10159, Microcystis ichthyoblabe KCTC AG10160, and Microcystis viridis KCTC AG10198), and their phylogenetic and taxonomic positions among taxa of the order Chroococcales were inferred. Most taxa of Chroococcales whose partial 16S rRNA gene sequences were aligned in this study clustered with other related taxa. Aphanothece nidulans KCTC AG10041 and Aphanothece naegelii KCTC AG10042 formed a cluster with other European species of these genera, supported by 100% of the bootstrap trees, with very high sequence similarity (97.4-99.4%). Three strains, Microcystis aeruginosa KCTC AG10159, M. ichthyoblabe KCTC AG10160, and M. viridis KCTC AG10198, formed a cluster with other Microcystis spp., supported by 100% of the bootstrap trees, with a similarity of 97.0-99.9% except for two strains. However, this phylogenetic tree did not resolve the species within Microcystis. The tree topology reconfirmed that the three Microcystis species, identified morphologically in this study, correspond to three colonial types of Microcystis aeruginosa com. nov. Otsuka et al. (1999c). The genera of chroococcalian cyanophytes clustered heterogeneously in these sequence analyses. We suggest that more molecular studies of the genera of Chroococcales, using reference strains widely collected from restricted geographic or environmental ranges, are needed to obtain accurate taxonomic or phylogenetic determinations.
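
The similarity figures quoted above (for example 97.4-99.4%) are pairwise sequence similarities. A minimal sketch of one common way to compute them, percent identity over aligned columns with gap columns skipped, is shown below; the abstract does not state how gaps were treated, so that convention and the function name are assumptions.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two aligned sequences, ignoring columns with a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns (an assumption; other conventions exist)
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # one mismatch in ten columns
```

Counting gaps as mismatches instead would lower the reported similarity for gapped alignments, so the convention matters when comparing figures across studies.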

Forest Structure in Relation to Altitude and Part of Slope in a Valley Forest at Chuwangsan Area (주왕산지역 계곡부의 해발고와 사면부위에 따른 산림구조)

  • 박인협;문광선;류석봉
    • Korean Journal of Environment and Ecology / v.8 no.2 / pp.154-159 / 1995
  • The Chuwang valley-Kumunkwangi valley forest in the Chuwangsan area was studied to investigate forest structure in relation to altitude and part of the slope. Forty-eight quadrats were set up in the valley forest along an altitude range of 470 m to 780 m and across parts of the slope. Density and basal area of trees in the tree stratum decreased with increasing elevation. With increasing elevation, the importance values of Quercus mongolica and Fraxinus rhynchophylla increased, while those of Pinus densiflora and Lindera obtusiloba decreased. From the lower to the upper part of the slope, the importance values of Quercus variabilis and Lindera obtusiloba increased, while those of Fraxinus rhynchophylla and Acer mono decreased. Species diversity tended to decrease toward the upper parts of the slope. Similarity indices ranged from 74.4% to 84.2% between elevation belts and from 68.0% to 96.3% between parts of the slope. According to importance values and cluster analysis, the studied valley forest was classified into three forest communities: a Pinus densiflora-deciduous tree species community on the lower part of the slope, a Pinus densiflora-Quercus variabilis community on the middle and upper parts of the slope, and a Pinus densiflora community in the top area.
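
The abstract does not define the importance value. A common definition, assumed here, averages relative density, relative dominance (basal area), and relative frequency, giving an "importance percentage" per species. The sketch below computes that quantity for made-up quadrat totals; the species keys, the numbers, and the function name are hypothetical.

```python
def importance_percentages(stems, basal_area, quadrats_present):
    """Importance percentage per species = mean of relative density,
    relative dominance (basal area) and relative frequency, each in percent.

    All three arguments map species -> total over the sampled quadrats.
    """
    measures = (stems, basal_area, quadrats_present)
    totals = [sum(m.values()) for m in measures]
    return {
        sp: sum(100.0 * m[sp] / total for m, total in zip(measures, totals)) / 3.0
        for sp in stems
    }


# Hypothetical totals for two species over a set of quadrats.
ip = importance_percentages(
    stems={"sp1": 30, "sp2": 10},            # number of stems
    basal_area={"sp1": 2.0, "sp2": 2.0},     # m^2
    quadrats_present={"sp1": 8, "sp2": 8},   # quadrats in which the species occurs
)
print(ip)  # importance percentages sum to 100 across species
```

Some studies report the unaveraged sum (out of 300) as the importance value; dividing by three only rescales it.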

An Incremental Web Document Clustering Based on the Transitive Closure Tree (이행적 폐쇄트리를 기반으로 한 점증적 웹 문서 클러스터링)

  • Youn Sung-Dae;Ko Suc-Bum
    • Journal of Korea Multimedia Society / v.9 no.1 / pp.1-10 / 2006
  • In document clustering, the k-means algorithm and Hierarchical Agglomerative Clustering (HAC) are often used. The k-means algorithm has the advantage of fast processing, while HAC has the advantage of classification precision. However, each method has the other's weakness: k-means yields a lower quality of classification, and HAC has a slow processing time. Both methods also share a serious problem: document similarities must be recomputed whenever a new document is inserted into a cluster. A key property of web resources is that information accumulates as new documents are added frequently. We therefore propose a new transitive closure tree method based on HAC that improves the processing time of document clustering, together with a superior incremental clustering method that handles both the insertion of a new document and the deletion of a document contained in a cluster. The proposed method is compared with existing algorithms in terms of precision, recall, F-measure, and processing time, and we present the experimental results.
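
The transitive closure idea can be illustrated generically: if two documents are linked whenever their similarity reaches a threshold, the transitive closure of that relation groups them into clusters, and a new document only needs to be compared with the existing documents rather than recomputing every pair. The sketch below is not the authors' transitive closure tree; it assumes bag-of-words cosine similarity, a hypothetical `threshold`, and a union-find structure, and it handles insertion only (deletion would require rebuilding the affected component).

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusters:
    """Clusters = connected components (transitive closure) of the
    'similarity >= threshold' relation, maintained with union-find."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.vectors: list[Counter] = []
        self.parent: list[int] = []

    def _find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def add(self, text: str) -> int:
        """Insert a document; only its similarities to existing documents are computed."""
        vec = Counter(text.lower().split())
        new_id = len(self.vectors)
        self.vectors.append(vec)
        self.parent.append(new_id)
        for old_id in range(new_id):
            if cosine(vec, self.vectors[old_id]) >= self.threshold:
                self.parent[self._find(old_id)] = self._find(new_id)
        return new_id

    def clusters(self) -> list[list[int]]:
        groups: dict[int, list[int]] = {}
        for i in range(len(self.vectors)):
            groups.setdefault(self._find(i), []).append(i)
        return sorted(groups.values())


docs = IncrementalClusters(threshold=0.5)
for text in ["web mining cluster", "web mining tree", "cooking pasta recipe"]:
    docs.add(text)
print(docs.clusters())
```

Each insertion costs one similarity computation per existing document, instead of the full pairwise recomputation that a batch HAC run would need.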

Phylogenetic Analysis of Phellinus linteus and Related Species Comparing the Sequences of rDNA Internal Transcribed Spacers

  • Lee, Jae-Dong;Kim, Gi-Young;Park, Joung-Eon;Park, Hyung-Sik;Nam, Byung-Hyouk;An, Won-Gun;Lee, Tae-Ho
    • Journal of Life Science / v.11 no.2 / pp.126-134 / 2001
  • The phylogenetic tree showed five groups in the genus Phellinus, which were distinguished based on their morphology. Most P. linteus strains formed a highly significant cluster, with the exception of P. linteus KACC 500122 and KACC 500411. These two strains formed the sister taxa of P. linteus, in a group where P. baumii, Phellinus sp. MPNU 7003, MPNU 7007, and MPNU 7010 had similar morphological characteristics. In addition, P. nigricans IMSNU 32024 and P. pini var. carniformans IMSNU 32031 were grouped in the same cluster as P. igniarius KCTC 6227 and KCTC 6228 and P. chrysoloma KCTC 6225, whose sequences were obtained from the GenBank database. P. torulosus IMSNU 32028 and Phellinus sp. MPNU 7011 formed a close group; however, these species were distant from the other Phellinus species. The nucleotide sequences of the internal transcribed spacer (ITS) regions of ribosomal DNA (rDNA), including the 5.8S rDNA, were determined from 24 strains of the genus Phellinus in order to analyze their phylogenetic relationships. These fungi were divided into two basic groups based on ITS length; however, this grouping differed from that based on their morphological characteristics. Although various ITS sequences were ambiguously aligned, conserved sites were also identified. Accordingly, a neighbor-joining tree was constructed using the nucleotide sequence data of the conserved sites of the ITS regions and the 5.8S rDNA.
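
The neighbor-joining step can be sketched as follows. This is the textbook Saitou-Nei procedure applied to a precomputed distance matrix, not the authors' pipeline (which also involved selecting conserved sites and computing distances from them); the function name, the returned join list, and the five-taxon toy matrix are assumptions for illustration.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix (list of lists).

    Returns (joins, final_edge): joins lists (child_a, child_b, branch_a, branch_b)
    in the order they were made; final_edge is (node_x, node_y, length).
    """
    if len(names) < 2:
        raise ValueError("need at least two taxa")
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {a: sum(d[(a, b)] for b in nodes) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        _, a, b = min(
            ((n - 2) * d[(x, y)] - r[x] - r[y], x, y)
            for i, x in enumerate(nodes)
            for y in nodes[i + 1:]
        )
        # Branch lengths from a and b to the new internal node u.
        branch_a = 0.5 * d[(a, b)] + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = d[(a, b)] - branch_a
        u = f"node{len(joins) + 1}"
        joins.append((a, b, branch_a, branch_b))
        # Distances from u to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[(u, k)] = d[(k, u)] = 0.5 * (d[(a, k)] + d[(b, k)] - d[(a, b)])
        d[(u, u)] = 0.0
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    x, y = nodes
    return joins, (x, y, d[(x, y)])


toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, final_edge = neighbor_joining(["a", "b", "c", "d", "e"], toy)
for join in joins:
    print(join)
print(final_edge)
```

Unlike UPGMA, neighbor-joining does not assume a constant rate of evolution, which is one reason it is commonly used for ITS and rRNA trees; bootstrap support would come from repeating this on resampled alignment columns.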

The Difference Analysis between Maturity Stages of Venture Firms by Classification Techniques of Big Data (빅데이터 분류 기법에 따른 벤처 기업의 성장 단계별 차이 분석)

  • Jung, Byoungho
    • Journal of Korea Society of Digital Industry and Information Management / v.15 no.4 / pp.197-212 / 2019
  • The purpose of this study is to identify the maturity stages of venture firms through classification analysis, which is widely used as a big data technique. Venture companies must develop a competitive advantage in the market, and the maturity of a company can be classified into five stages. This study analyzes differences in the growth stages of venture firms between survey responses and statistical classification methods. Firm growth was divided into five stages, ranging from the start-up period to decline. Commonly used big data classification methods include k-means cluster analysis, hierarchical cluster analysis, artificial neural networks, and decision tree analysis. The variables used were increases in assets, capital, sales, operating profit, and R&D investment, together with operating period and number of retirements. The results show that the sample sizes of the groups differed greatly across the big data analysis techniques. In particular, the decision tree and neural network methods classified firms into three groups rather than five. Group sizes differed across all classification methods, and results may also differ depending on variable selection and sample size. Each classified group also showed a number of competitive differences. The implication is that analysts need to interpret statistics through management theory in order to interpret big data classification results correctly. In addition, the choice of classification analysis should take into account not only management theory but also practical experience. Finally, the growth of venture firms needs to be examined by time-series analysis and closely monitored at the level of individual firms, and future research will need to include significant variables for a company's maturity stages.
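
Because the variables above (asset increase, operating period, and so on) are on very different scales, clustering them usually starts with standardization. The sketch below shows z-score standardization followed by plain k-means (Lloyd's algorithm) and a count of group sizes, the quantity the study compares across methods. The toy rows, the use of the first k rows as initial centroids, and the function names are simplifying assumptions, not the study's procedure.

```python
import math
import statistics
from collections import Counter


def standardize(rows):
    """z-score each column; a constant column is only centred (divided by 1)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(rows, k, max_iter=100):
    """Lloyd's k-means; initial centroids are simply the first k rows."""
    centroids = [list(r) for r in rows[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(row, centroids[c])) for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


# Hypothetical firms: [sales increase (%), operating period (years)]
firms = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels = kmeans(standardize(firms), k=2)
print(Counter(labels))  # group sizes per cluster
```

k-means is sensitive to initialization, so in practice several random starts are compared; this is one source of the group-size differences across methods that the abstract describes.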

Detection of Entry/Exit Zones for Visual Surveillance System using Graph Theoretic Clustering (그래프 이론 기반의 클러스터링을 이용한 영상 감시 시스템 시야 내의 출입 영역 검출)

  • Woo, Ha-Yong;Kim, Gyeong-Hwan
    • Journal of the Institute of Electronics Engineers of Korea SC / v.46 no.6 / pp.1-8 / 2009
  • Detecting entry and exit zones in a view covered by multiple cameras is an essential step in determining the topology of the camera setup, which is critical for achieving and sustaining the accuracy and efficiency of a multi-camera surveillance system. In this paper, a graph theoretic clustering method is proposed to detect these zones from data points that correspond to entry and exit events of objects in the camera view. A minimum spanning tree (MST) is constructed by connecting the data points. A set of well-formed clusters is then sought by removing inconsistent edges of the MST, based on the concepts of cluster balance and cluster density defined in the paper. Experimental results suggest that the proposed method is effective even for sparse, elongated clusters, which can be problematic for expectation-maximization (EM). In addition, compared with EM-based approaches, relatively few data points are required to obtain a stable outcome, which shortens the learning period.
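
The MST-then-cut idea can be sketched as below. The paper's own criteria (cluster balance and cluster density) are defined there and are not reproduced here; as a stand-in, this sketch uses a simpler global rule that treats an MST edge as inconsistent when it is longer than a hypothetical `factor` times the mean MST edge length. Points are assumed to be 2-D image coordinates of entry/exit events.

```python
import math
from itertools import combinations


def _find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def mst_clusters(points, factor=2.0):
    """Cluster points by building a Kruskal MST and cutting its long edges."""
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))
    mst = []
    for w, i, j in edges:  # Kruskal: keep the shortest edge joining two trees
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[rj] = ri
            mst.append((w, i, j))
    if not mst:
        return [list(range(n))]
    cutoff = factor * sum(w for w, _, _ in mst) / len(mst)
    # Rebuild components using only the "consistent" (short enough) MST edges.
    parent = list(range(n))
    for w, i, j in mst:
        if w <= cutoff:
            parent[_find(parent, j)] = _find(parent, i)
    groups = {}
    for i in range(n):
        groups.setdefault(_find(parent, i), []).append(i)
    return sorted(groups.values())


events = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
print(mst_clusters(events))  # two zones: indices 0-2 and 3-4
```

Because the MST follows chains of nearby points, an elongated cluster stays connected as long as its internal gaps are short, which is the property that makes this family of methods suitable for the sparse, elongated zones mentioned above.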

Forest Structure in Relation to Altitude and Part of Slope in the Mansugol Valley at Woraksan National Park (월악산국립공원 만수골 계곡부의 해발고와 사면부위에 따른 산림구조)

  • Park In-Hyeop;Jang Jeong-Jae;Kim Kye-Seon
    • Korean Journal of Environment and Ecology / v.19 no.2 / pp.99-105 / 2005
  • The Mansugol valley forest in Woraksan National Park was studied to investigate forest structure in relation to altitude and part of the slope. Forty-eight quadrats were set up in the valley forest along an altitude range of 380 m to 915 m and across parts of the slope, and vegetation analysis of the woody species in the tree and subtree layers was carried out. With increasing elevation, tree density and basal area of the tree layer decreased, while basal area of the subtree layer increased. As elevation increased, the importance percentages of Quercus mongolica, Fraxinus rhynchophylla, Lindera obtusiloba, and Acer mono increased, while those of Pinus densiflora, Quercus variabilis, Quercus serrata, and Styrax obassia decreased. Species diversity of the elevation belts, including the top of the valley, ranged from 0.351 to 0.903, and that of the parts of the slope ranged from 0.780 to 1.064. Similarity indices ranged from 36.0% to 67.3% between elevation belts and from 66.8% to 75.1% between parts of the slope. According to importance percentages and cluster analysis, the studied valley forest was classified into three forest communities: a Pinus densiflora-Quercus species community in the low elevation belt and the middle part of the slope at the middle elevation belt; a Quercus mongolica-broad-leaved tree species community in the high elevation belt and the lower and upper parts of the slope at the middle elevation belt; and a Quercus mongolica community in the top area of the valley. The importance percentage of Quercus mongolica was significantly and negatively correlated with those of Pinus densiflora and Quercus serrata. There were significant positive correlations among Pinus densiflora, Quercus serrata, and Rhus trichocarpa.
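
The abstract reports similarity indices between elevation belts and between parts of the slope but does not name the index. One widely used choice for importance-percentage data is the quantitative Sørensen (Czekanowski) index, SI = 200 × Σ min(a_i, b_i) / (Σ a_i + Σ b_i). The sketch below assumes that index; the stand data and the function name are hypothetical.

```python
def sorensen_similarity(stand_a, stand_b):
    """Quantitative Sørensen (Czekanowski) similarity, in percent.

    Each argument maps species -> importance percentage in one stand.
    """
    shared = sum(min(v, stand_b[sp]) for sp, v in stand_a.items() if sp in stand_b)
    total = sum(stand_a.values()) + sum(stand_b.values())
    return 200.0 * shared / total if total else 0.0


low_belt = {"sp1": 50.0, "sp2": 50.0}   # hypothetical importance percentages
high_belt = {"sp1": 50.0, "sp3": 50.0}
print(sorensen_similarity(low_belt, high_belt))  # only sp1 is shared
```

Identical stands score 100% and stands with no species in common score 0%, so a range such as 36.0% to 67.3% indicates partial overlap in species composition and dominance.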

Forest Structure in Relation to Slope Aspect and Altitude in valley Forests at Hambaeksan Area (함백산지역 계곡부의 사면방향과 해발고에 따른 산림구조)

  • 박인협;최윤호;이석면;최영철;유석봉
    • Korean Journal of Environment and Ecology / v.15 no.4 / pp.361-368 / 2002
  • The valley forests on the east-facing and west-facing slopes in the Hambaeksan area were studied to investigate forest structure in relation to slope aspect and altitude. There was little difference in density, mean DBH, and basal area of the tree layer between the east-facing and west-facing slopes. The importance percentages of Tilia amurensis and Betula costata were higher on the west-facing slope than on the east-facing slope, whereas those of Quercus mongolica and Fraxinus rhynchophylla were lower. Species diversity was 1.415 on the west-facing slope and 1.328 on the east-facing slope. Elevation trends were also found in forest structure. As elevation increased, basal area and mean height of the tree layer decreased on both the east-facing and west-facing slopes. Number of species, species diversity, and evenness tended to decrease with increasing elevation. The importance percentage of Quercus mongolica increased with increasing elevation, while those of Betula costata and Maackia amurensis decreased. Cluster analysis of the tree and subtree layers classified the studied forests into a mixed broad-leaved tree species community, on the west-facing slope and in the low and middle elevation belts of the east-facing slope, and a Quercus mongolica community in the high elevation belt of the east-facing slope. Quercus mongolica was significantly and positively correlated with Symplocos chinensis for. pilosa, Acer tschonoskii var. rubripes, and Deutzia glabrata. Betula costata was significantly and negatively correlated with Quercus mongolica and Acer pseudo-sieboldianum.
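
The abstract reports species diversity and evenness without naming the index or logarithm base. The sketch below assumes the Shannon index H' = -Σ p_i log(p_i) with base-10 logarithms and Pielou's evenness J' = H' / log(S), where S is the number of species; the abundances and the function name are hypothetical.

```python
import math


def shannon_diversity(abundances, base=10):
    """Shannon index H' = -sum(p_i * log(p_i)) and evenness J' = H' / log(S).

    `abundances` maps species -> abundance (e.g. importance percentage or stems).
    """
    total = sum(abundances.values())
    props = [v / total for v in abundances.values() if v > 0]
    h = -sum(p * math.log(p, base) for p in props)
    s = len(props)
    evenness = h / math.log(s, base) if s > 1 else 0.0
    return h, evenness


h, j = shannon_diversity({"sp1": 40, "sp2": 40, "sp3": 20})
print(round(h, 3), round(j, 3))
```

The base only rescales H' (natural-log values are about 2.3 times the base-10 values), so figures from different studies are comparable only when they use the same base; evenness is unaffected by the base.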

Analysis of Difference in Growing Stock Volume Estimates by the Changes of Cluster Plot Design and Volume Equation (표본점 설계방법과 적용 단목재적식 변경에 따른 임목축적 차이의 구명)

  • Han, Won-Sung;Kim, Sung-Ho;Kim, Chong-Chan;Shin, Man-Yong
    • Journal of Korean Society of Forest Science / v.99 no.3 / pp.304-311 / 2010
  • The Korea National Forest Inventory System has used a different cluster plot design and new equations for estimating growing stock volume since 2006. These changes have produced volume estimates that differ somewhat from previous ones. This study aims to identify the source of this difference. For this purpose, data were collected from 80 plots in 20 cluster samples according to the cluster plot designs applied in the 4th and 5th National Forest Inventories. Growing stock volumes were then estimated using the current and previous individual tree volume equations, respectively. We examined whether the difference in volume estimates originated from the change in cluster plot design or from the use of different volume equations. T-test results showed that the difference due to the change in cluster plot design was negligible, whereas the change in volume equations had a statistically significant effect on volume estimates. Since volume estimates from the 5th National Forest Inventory would show overestimation because different volume equations were applied, all volume estimates made before 2006 would require modification for international reporting.
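
The abstract does not say which t-test was used. If the previous and current volume equations are applied to the same plots, a paired t-test is a natural choice; the sketch below computes its t statistic with the standard library, assuming paired per-plot estimates. The function name and the numbers are hypothetical. The statistic would then be compared against a t distribution with n - 1 degrees of freedom.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic for two equal-length samples measured on the same plots.

    Returns (t, degrees_of_freedom). Raises ZeroDivisionError if all differences
    are identical (zero standard error).
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / se, len(diffs) - 1


# Hypothetical per-plot volumes (m^3/ha): previous vs. current volume equation.
previous = [9.0, 11.0, 12.0]
current = [10.0, 12.0, 14.0]
t, df = paired_t(previous, current)
print(t, df)
```

Pairing removes plot-to-plot variation from the comparison, so a consistent shift caused by a new volume equation can be significant even when plot volumes vary widely.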