• Title/Summary/Keyword: classification trees


A Study on the Printed French Textiles in the 18th Century - Focus on the Toile de Jouy (18세기 프랑스의 프린트 직물에 관한 연구 -트왈 드 죠이 디자인을 중심으로-)

  • Kim, Hee-Sun;Koo, Hee-Kyung
    • Journal of the Korea Fashion and Costume Design Association / v.8 no.3 / pp.129-143 / 2006
  • This study reviews the printed cotton textile industry of Europe in the 17th-18th centuries and investigates in particular the development of the Toile de Jouy, the printed French fabrics of around the 18th century. The term Toile de Jouy generally has two meanings. The first is the popular printed cotton textiles produced by wood block printing, copper plate printing and roller printing at the Jouy-en-Josas factory in France around the 18th century. The second is the monochromatic upholstery fabrics printed by copper plate. These monochromatic textiles, manufactured at the Jouy-en-Josas factory, were the most popular printed cotton fabrics, with large-scale scenic designs featuring people, trees, birds, buildings, mythical heroes, protagonists of novels, and country scenes of shepherds, sheep and other animals. The main aim of this paper is to describe the features of pattern and color and to classify the types of patterns found on Toile de Jouy fabrics according to printing technique: wood block printing, copper plate printing and copper roller printing. The study also analyzes the origins of the various names given to printed cotton textiles in those days. The results can help in understanding printed cotton textiles in Europe and can be applied to the development of printed fabric designs in the textile industry.


An Empirical Comparison of Bagging, Boosting and Support Vector Machine Classifiers in Data Mining (데이터 마이닝에서 배깅, 부스팅, SVM 분류 알고리즘 비교 분석)

  • Lee Yung-Seop;Oh Hyun-Joung;Kim Mee-Kyung
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.343-354 / 2005
  • The goal of this paper is to compare classification performance and to find the better classifier for given data characteristics. The methods compared are CART with two ensemble algorithms, bagging and boosting, and SVM. In an empirical study of twenty-eight data sets, SVM had a smaller error rate than the other methods on most data sets. When bagging, boosting and SVM are compared by data characteristics, SVM suits data with a small number of observations and no missing values, whereas boosting suits data with a large number of observations and bagging suits data with missing values.
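
The bagging idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: the base learner here is a one-split stump rather than a full CART tree, the data set is a made-up toy example, and the function names (`fit_stump`, `bagging_fit`, `bagging_predict`) are hypothetical. Boosting and SVM are not shown.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Fit a one-split tree (stump) on rows of (x, label) with scalar x."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        if not left or not right:
            continue  # a split must put at least one row on each side
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # No valid split (all x identical): always predict the majority label.
        label = Counter(y for _, y in rows).most_common(1)[0][0]
        return (float("inf"), label, label)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

def bagging_fit(rows, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows])
            for _ in range(n_models)]

def bagging_predict(models, x):
    """Combine the stumps by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Toy data (an assumption for illustration): class "a" at low x, "b" at high x.
data = [(0.1, "a"), (0.4, "a"), (0.5, "a"), (1.2, "b"), (1.5, "b"), (1.9, "b")]
models = bagging_fit(data)
print(bagging_predict(models, 0.05), bagging_predict(models, 2.0))
```

Voting over bootstrap-trained models reduces the variance of an unstable base learner such as a tree, which is the usual motivation for pairing bagging with CART.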

GAM: A Criticality Prediction Model for Large Telecommunication Systems (GAM: 대형 통신 시스템을 위한 위험도 예측 모델)

  • Hong, Euy-Seok
    • The Journal of Korean Association of Computer Education / v.6 no.2 / pp.33-40 / 2003
  • Criticality prediction models, which determine whether a design entity is fault-prone or non-fault-prone, play an important role in reducing system development costs because problems in the early phases largely affect the quality of the later products. Real-time systems such as telecommunication systems are so large that criticality prediction is even more important in real-time system design. Current models are based on techniques such as discriminant analysis, neural networks and classification trees. These models make it difficult to analyze the causes of prediction results and have low extendability. This paper builds a new prediction model, GAM, based on a genetic algorithm. GAM differs from other models in that it produces a criticality function, so it can be used to compare entities by criticality. GAM is implemented and compared with a well-known prediction model, the BackPropagation neural network Model (BPM), in terms of internal characteristics and prediction accuracy.


A study on removal of unnecessary input variables using multiple external association rule (다중외적연관성규칙을 이용한 불필요한 입력변수 제거에 관한 연구)

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.877-884 / 2011
  • The decision tree is a representative data mining algorithm used in many domains, such as retail target marketing, fraud detection, data reduction, variable screening and category merging. The method is most useful in classification problems, where it makes predictions for a target group after dividing it into several small groups. When a decision tree model is built with a large number of input variables, the resulting complex trees make the model difficult to explore and analyze. In addition, we often find associations between input variables that are induced by external variables even though no intrinsic association exists. In this paper, we study a method for removing unnecessary input variables using multiple external association rules, and we apply the method to real data to assess its effectiveness.

Nonstandard Machine Learning Algorithms for Microarray Data Mining

  • Zhang, Byoung-Tak
    • Proceedings of the Korean Society for Bioinformatics Conference / 2001.10a / pp.165-196 / 2001
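  • A DNA chip, or microarray, is a technology in which a large number of genes or gene fragments (typically thousands to tens of thousands) are immobilized on a chip and DNA hybridization reactions are used to analyze gene expression patterns. Such high-throughput technology can not only answer questions in molecular biology that were previously out of reach, but also has virtually unlimited potential applications, including molecular-level disease diagnosis, drug development, and solving environmental pollution problems. For practical application of this technology, in addition to the hardware/wetware technology for fabricating DNA chips, bioinformatics technology for extracting as much useful new knowledge as possible from the resulting data is essential. The problem of data mining gene expression patterns can be broadly divided into clustering, classification, and dependency analysis, and these techniques are grounded in statistics and in machine learning from artificial intelligence. The most commonly used methods include principal component analysis, hierarchical clustering, k-means, self-organizing maps, decision trees, multilayer perceptron neural networks, and association rules. In addition to these basic machine learning techniques, this seminar introduces the probabilistic graphical model (PGM), a learning technique under recent study, and reviews research applying it to DNA chip data analysis. The PGM is a machine learning model formed by combining artificial neural networks, graph theory, and probability theory; it is based on the memory and learning mechanisms of the human brain, and one of its major differences from other machine learning models is that it is a generative model. That is, once a model is built, it can generate new data, so the model can be validated and new facts inferred from it, which makes it well suited to exploratory analysis aimed at discovering new knowledge, as in biological data mining. Moreover, unlike conventional neural network models, probabilistic graphical models perform probability-based soft inference rather than deterministic decision making, and they are well suited to analyzing causal relationships or dependencies among relevant factors from the learned model. As concrete examples of PGMs, we introduce the structure and the learning and inference algorithms of Bayesian networks, nonnegative matrix factorization (NMF), and generative topographic mapping (GTM), and review the results of applying them to the cancer diagnosis and gene-drug dependency analysis problems used in CAMDA-2000 and CAMDA-2001, the DNA chip data analysis evaluation competitions.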


Identification, Characterization and Phylogenic Analysis of Conserved Genes within the p74 Gene Region of Choristoneura fumiferana Granulovirus Genome

  • Rashidan, Kianoush Khajeh;Nassoury, Nasha;Giannopoulos, Paresa N.;Mauffette, Yves;Guertin, Claude
    • BMB Reports / v.37 no.6 / pp.700-708 / 2004
  • The genes located within the p74 gene region of the Choristoneura fumiferana granulovirus (ChfuGV) were identified by sequencing an 8.9 kb BamHI restriction fragment of the ChfuGV genome. The global guanine-cytosine (GC) content of this region of the genome was 33.02%. This paper presents the ORFs within the p74 gene region along with their transcriptional orientations. The region contains a total of 15 open reading frames (ORFs). Among those, 8 ORFs were found to be homologous to baculoviral ORFs: Cf-i-p, Cf-vi, Cf-vii, Cf-viii (ubiquitin), Cf-xi (pp31), Cf-xii (lef-11), Cf-xiii (sod) and Cf-xv-p (p74). To date, no specific function has been assigned to the ORFs Cf-i, Cf-ii, Cf-iii, Cf-iv, Cf-v, Cf-vi, Cf-vii, Cf-ix and Cf-x. The most noticeable ORFs located in this region of the ChfuGV genome were ubiquitin, lef-11, sod, fibrillin and p74. The phylogenetic trees (constructed using conceptual products of major conserved ORFs) and the gene arrangement in this region were used to further examine the classification of the members of the granulovirus genus. Comparative studies demonstrated that ChfuGV, along with the Cydia pomonella granulovirus (CpGV), Phthorimaea operculella granulovirus (PhopGV), Adoxophyes orana granulovirus (AoGV) and Cryptophlebia leucotreta granulovirus (ClGV), shares a high degree of amino acid sequence and gene arrangement preservation within the studied region. These results support a previous report that classified the granuloviruses into 2 distinct groups: Group I: ChfuGV, CpGV, PhopGV and AoGV, and Group II: Xestia c-nigrum granulovirus (XcGV) and Plutella xylostella granulovirus (PxGV). The phylogenetic and gene arrangement studies also placed ClGV as a novel member of the Group I granuloviruses.
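
The GC content figure quoted in this abstract is the share of guanine and cytosine bases in a sequence. A minimal sketch of that calculation follows; the function name `gc_content` and the short example sequence are assumptions for illustration and are not taken from the ChfuGV data.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of a DNA sequence."""
    # Ignore anything that is not an unambiguous base (gaps, N, whitespace).
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)

# Hypothetical 10-base sequence with one G and one C.
print(gc_content("ATGCATTTAA"))
```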

Spam-Filtering by Identifying Automatically Generated Email Accounts (자동 생성 메일계정 인식을 통한 스팸 필터링)

  • Lee Sangho
    • Journal of KIISE:Software and Applications / v.32 no.5 / pp.378-384 / 2005
  • In this paper, we describe a novel spam-filtering method that improves the performance of conventional spam-filtering systems. Conventional systems filter emails by examining word distributions in email headers or bodies. Spammers now create email accounts on web-based email service sites and send emails as if they were not spam. Investigating the email accounts behind such spam, we notice a large difference between automatically generated accounts and ordinary ones. Based on that difference, incoming emails are classified into spam and non-spam classes. To classify emails from account strings alone, we used decision trees, which have been widely used for conventional pattern classification problems. We collected about 2.15 million account strings from email service sites, and our account checker achieved an accuracy of 96.3%. Combining the previous filter system with the checker improved filtering performance.
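
To make the idea of classifying from account strings alone more concrete, the sketch below turns an account name into numeric features that a decision tree could split on. The abstract does not say which features the authors used, so every feature here (length, digit ratio, vowel ratio, longest consonant run) and the name `account_features` are assumptions chosen for illustration.

```python
VOWELS = set("aeiou")

def account_features(account: str) -> dict:
    """Compute simple, hypothetical string features for an email account name."""
    name = account.lower()
    letters = [ch for ch in name if ch.isalpha()]
    digits = sum(ch.isdigit() for ch in name)

    # Longest run of consecutive consonant letters; any other character resets it.
    longest_run = run = 0
    for ch in name:
        if ch.isalpha() and ch not in VOWELS:
            run += 1
            longest_run = max(longest_run, run)
        else:
            run = 0

    return {
        "length": len(name),
        "digit_ratio": digits / len(name) if name else 0.0,
        "vowel_ratio": (sum(ch in VOWELS for ch in letters) / len(letters)
                        if letters else 0.0),
        "longest_consonant_run": longest_run,
    }

# Made-up examples: a random-looking account and an ordinary-looking one.
print(account_features("xkq7zr92"))
print(account_features("maria"))
```

Feature dictionaries like these would then be fed to a tree learner trained on labeled account strings; that training step is not shown here.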

Validation of DEM Derived from ERS Tandem Images Using GPS Techniques

  • Lee, In-Su;Chang, Hsing-Chung;Ge, Linlin
    • Journal of Korean Society for Geospatial Information Science / v.13 no.1 s.31 / pp.63-69 / 2005
  • Interferometric Synthetic Aperture Radar (InSAR) is a rapidly evolving technique. Spectacular results in fields such as the monitoring of earthquakes, volcanoes, land subsidence and glacier dynamics, as well as in the construction of Digital Elevation Models (DEMs) of the Earth's surface and the classification of different land types, have demonstrated its strength. As InSAR is a remote sensing technique, it has various sources of error due to satellite position and attitude, the atmosphere, and other factors. It is therefore important to validate its accuracy, especially for DEMs derived from satellite SAR images. In this study, Real Time Kinematic (RTK) GPS and Kinematic GPS positioning were chosen as tools for validating the InSAR-derived DEM. The results showed that Kinematic GPS positioning had greater coverage of the test area than RTK GPS in terms of the number of measurements. However, tracking satellites near and/or under trees and transmitting data between reference and rover receivers remain pending tasks in GPS techniques.
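
A common way to summarize a DEM validation of this kind is to compare DEM heights with GPS heights at the same points and report the mean error and the root-mean-square error (RMSE). The abstract does not state which statistics the authors used, so the sketch below is an assumption, as are the function name `height_errors` and the sample heights.

```python
import math

def height_errors(dem_heights, gps_heights):
    """Return (mean error, RMSE) of DEM heights against GPS reference heights."""
    if len(dem_heights) != len(gps_heights) or not dem_heights:
        raise ValueError("need two non-empty lists of equal length")
    # Positive difference means the DEM is higher than the GPS measurement.
    diffs = [dem - gps for dem, gps in zip(dem_heights, gps_heights)]
    mean_error = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_error, rmse

# Hypothetical heights in metres at three check points.
mean_error, rmse = height_errors([10.0, 12.0, 14.0], [9.0, 12.0, 16.0])
print(mean_error, rmse)
```

The mean error exposes a systematic bias in the DEM, while the RMSE also reflects the scatter of individual points.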


CHANGE DETECTION ANALYSIS OF FORESTED AREA IN THE TRANSITION ZONE AT HUSTAI NATIONAL PARK, CENTRAL MONGOLIA

  • Bayarsaikhan, Uudus;Boldgiv, Bazartseren;Kim, Kyung-Ryul;Park, Kyeng-Ae
    • Proceedings of the KSRS Conference / 2007.10a / pp.426-429 / 2007
  • Environmental change detection and biodiversity conservation are among the most widely used applications of remote sensing. The study area, Hustai Mountain, is situated in the transition zone between the Siberian taiga forest and the Central Mongolian arid steppe. Hustai National Park carries out one of several reintroduction programs of takhi (wild horse, Equus ferus przewalskii) from various zoos in the world, and it represents one of the few textbook examples of successful reintroduction of an animal extinct in the wild. In this paper we describe the results of an analysis of the change in remaining forest area over the 7-year period since Hustai Mountain was designated as a protected area for the reintroduction of wild horses. Today the forested area covers approximately 5% of Hustai National Park, mostly on the north-facing slopes above 1400 m altitude. Birch (Betula platyphylla) and aspen (Populus tremula) trees are predominant in the forest. We used Landsat ETM+ images from two different years and multitemporal MODIS NDVI data. Land types were determined by supervised classification (Maximum Likelihood algorithm) verified with ground-truthing data, and by the Land Change Modeler (LCM) developed by Clark Labs. The forested area was classified into three land types: forest land, mountain meadow and mountain steppe. The results illustrate that the remaining birch forest has rapidly changed to fragmented forest land and to open areas. The underlying causes of such rapid change during the 15-year period may be manifold. However, the responsible factors appear to be drying and outbreaks of forest pest species (such as the gypsy moth, Lymantria dispar) in the area.
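
The NDVI data mentioned in this abstract rest on the standard index NDVI = (NIR - Red) / (NIR + Red), computed from near-infrared and red reflectance. The sketch below shows that formula and a per-pixel difference between two dates; the function names, the zero-denominator convention and the reflectance values are assumptions for illustration, and the authors' Maximum Likelihood classification and LCM workflow are not reproduced.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat pixels with no signal as NDVI 0
    return (nir - red) / total

def ndvi_change(earlier, later):
    """Per-pixel NDVI difference (later minus earlier) for lists of (nir, red) pairs."""
    if len(earlier) != len(later):
        raise ValueError("both dates must cover the same pixels")
    return [ndvi(*b) - ndvi(*a) for a, b in zip(earlier, later)]

# Hypothetical reflectances for two pixels on two dates.
earlier = [(0.5, 0.1), (0.4, 0.2)]
later = [(0.3, 0.2), (0.4, 0.2)]
print(ndvi(0.5, 0.1))
print(ndvi_change(earlier, later))
```

A negative difference suggests a loss of green vegetation at that pixel, which is the kind of signal a forest change analysis looks for.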


Important SNPs Identification from the Economic Traits for the High Quality Korean Cattle (고품질 한우를 위한 여러 경제형질에서의 주요 SNP 규명)

  • Lee, Jea-Young;Kim, Dong-Chul
    • Communications for Statistical Applications and Methods / v.16 no.1 / pp.67-74 / 2009
  • To produce high-quality Korean cattle, gene markers that influence various economic traits have been identified. In testing the statistical significance of SNP markers, Lee et al. (2008b) identified the SNP(19_1)*SNP(28_2) marker as an important marker for LMA (longissimus muscle dorsi area). In addition, the expanded multifactor dimensionality reduction (expanded MDR) method is applied here to CWT (carcass cold weight) and ADG (average daily gain) from among the comprehensive economic traits. The results showed that the SNP(19_1)*SNP(28_2) interaction marker was a good and highly meaningful marker for economic traits.