• Title/Summary/Keyword: CART (classification and regression tree)

Search Result 92, Processing Time 0.027 seconds

An Improved Text Classification Method for Sentiment Classification

  • Wang, Guangxing;Shin, Seong Yoon
    • Journal of information and communication convergence engineering
    • /
    • v.17 no.1
    • /
    • pp.41-48
    • /
    • 2019
  • In recent years, sentiment analysis research has become popular. The research results of sentiment analysis have achieved remarkable results in practical applications, such as in Amazon's book recommendation system and the North American movie box office evaluation system. Analyzing big data based on user preferences and evaluations and recommending hot-selling books and hot-rated movies to users in a targeted manner greatly improve book sales and attendance rate in movies [1, 2]. However, traditional machine learning-based sentiment analysis methods such as the Classification and Regression Tree (CART), Support Vector Machine (SVM), and k-nearest neighbor classification (kNN) had performed poorly in accuracy. In this paper, an improved kNN classification method is proposed. Through the improved method and normalizing of data, the purpose of improving accuracy is achieved. Subsequently, the three classification algorithms and the improved algorithm were compared based on experimental data. Experiments show that the improved method performs best in the kNN classification method, with an accuracy rate of 11.5% and a precision rate of 20.3%.

Development of medical/electrical convergence software for classification between normal and pathological voices (장애 음성 판별을 위한 의료/전자 융복합 소프트웨어 개발)

  • Moon, Ji-Hye;Lee, JiYeoun
    • Journal of Digital Convergence
    • /
    • v.13 no.12
    • /
    • pp.187-192
    • /
    • 2015
  • If the software is developed to analyze the speech disorder, the application of various converged areas will be very high. This paper implements the user-friendly program based on CART(Classification and regression trees) analysis to distinguish between normal and pathological voices utilizing combination of the acoustical and HOS(Higher-order statistics) parameters. It means convergence between medical information and signal processing. Then the acoustical parameters are Jitter(%) and Shimmer(%). The proposed HOS parameters are means and variances of skewness(MOS and VOS) and kurtosis(MOK and VOK). Database consist of 53 normal and 173 pathological voices distributed by Kay Elemetrics. When the acoustical and proposed parameters together are used to generate the decision tree, the average accuracy is 83.11%. Finally, we developed a program with more user-friendly interface and frameworks.

Analysis of Factors Influencing upon the Metro Wear Using the Classification and Regression Trees (CART 분석을 이용한 지하철 마모 영향인자 분석)

  • Jeong, Min Chul;Lee, Won Woo;Kim, Jung Hoon;Kong, Jung Sik
    • 한국방재학회:학술대회논문집
    • /
    • 2011.02a
    • /
    • pp.38-38
    • /
    • 2011
  • 일반적으로 레일마모는 열차의 주행안전 및 승차감에 미치는 영향이 크고, 소음 진동의 주요원인으로 작용한다. 또한 레일마모가 발생할 경우 궤도구조의 파괴를 촉진시킴으로써 차량 및 궤도유지보수비를 크게 증가시킨다. 따라서 구간 특성 및 환경 영향 인자 등 현장에서 발생하는 마모 원인을 체계적으로 분석함으로써 마모를 저감할 수 있도록 차량운행 조건과 선로선형 및 궤도구조를 설계하는 것은 중요한 과제이다. CART(Classification And Regression Tree; 분류와 회귀나무) 분석은 패키지화된 좋은 분류 및 예측도구 기법으로 나무의 상위 분리수준에서 일반적으로 나타나는 가장 중요한 입력변수들을 사용하는 등의 입력변수를 선정하는 경우 매우 유용하다. 본 연구에서는 다변수 구간특성 및 환경인자를 고려한 검측 자료 상관관계 분석을 위한 회귀 나무기반 모델(TBM: Tree Based Model) 분석 수행을 위해 지하철 2호선 마모 데이터와 마모 데이터에 영향을 미치는 각종 다변수 구간특성 및 환경인자를 사용하였다. 2호선 지하철의 구간특성 인자 및 환경인자는 레일의 종류, 레일의 위치, 도상, 곡률반경, 캔트 슬랙 및 운행 일수 등으로 구분하였다. 레일의 종류는 ks-50kg과 ks-60kg 두 종류의 레일이 있으며, 레일의 위치는 지상과 지하로 크게 구분할 수 있다. 도상은 콘크리트 도상, 자갈 도상과 일부 구간의 방진상 콘크리트 도상으로 구분할 수 있으며, 곡률반경은 직선구간과 완화곡선 구간 및 최소 250m부터 627m까지 분포된 원 곡선 구간으로 구분할 수 있다. 캔트 간격은 최소 96cm 부터 120cm 간격으로 구분하며, 슬랙은 5~9cm에 분포하고, 운행 기간은 해당 기간 동안 유지보수 이력이 없는 구간을 선정하여 2005년부터 2006년까지 4번에 걸쳐 검측된 지하철 2호선 내선 마모데이터를 사용하였다. 총 X1부터 X7까지 총 7개의 구간특성 또는 환경특성을 영향인자로 선정하였으며, 이러한 영향인자에 의해 결정되는 종속 인자로 Y1인 직마모와 Y2인 측마모를 선정하여 이 중 실질적으로 지하철 궤도의 성능 평가에 주요 판단인자로 사용되는 측마모와 구간특성 및 환경영향인자와의 상관관계 분석을 수행하였다. 해당 마모 데이터가 검측되는 기간 동안 유지보수 이력이 없는 12272 point의 데이터를 검출하였고 CART 프로그램을 이용하여 데이터를 분석하였으며, CART 프로그램의 해석을 위해 종속변수인 직마모량은 각 검측 지점의 마모량에 해당하는 등급으로 변환하여 분석을 수행하였다. 레일의 마모에 영향을 미치는 구간특성 및 환경인자와 종속 변수로 사용된 레일의 마모량 사이의 CART를 이용한 상관관계 분석은 실제 구조물에서 영향인자간의 상관 관계와 유사하며, 추후 연구에서는 이를 바탕으로 하여 정량화된 검측 데이터를 종속변수로 하여 구간특성 또는 환경인자 등 외부 영향인자를 고려한 궤도 검측데이터와의 상관관계 분석을 수행할 계획이다.

  • PDF

Estimation of Leaf Wetness Duration Using An Empirical Model

  • Kim, Kwang-Soo;S.Elwynn Taylor;Mark L.Gleason;Kenneth J.Koehler
    • Proceedings of The Korean Society of Agricultural and Forest Meteorology Conference
    • /
    • 2001.06a
    • /
    • pp.93-96
    • /
    • 2001
  • Estimation of leaf wetness duration (LWD) facilitates assessment of the likelihood of outbreaks of many crop diseases. Models that estimate LWD may be more convenient and grower-friendly than measuring it with wetness sensors. Empirical models utilizing statistical procedures such as CART (Classification and Regression Tree; Gleason et al., 1994) have estimated LWD with accuracy comparable to that of electronic sensors.(omitted)

  • PDF

Screening Vital Few Variables and Development of Logistic Regression Model on a Large Data Set (대용량 자료에서 핵심적인 소수의 변수들의 선별과 로지스틱 회귀 모형의 전개)

  • Lim, Yong-B.;Cho, J.;Um, Kyung-A;Lee, Sun-Ah
    • Journal of Korean Society for Quality Management
    • /
    • v.34 no.2
    • /
    • pp.129-135
    • /
    • 2006
  • In the advance of computer technology, it is possible to keep all the related informations for monitoring equipments in control and huge amount of real time manufacturing data in a data base. Thus, the statistical analysis of large data sets with hundreds of thousands observations and hundred of independent variables whose some of values are missing at many observations is needed even though it is a formidable computational task. A tree structured approach to classification is capable of screening important independent variables and their interactions. In a Six Sigma project handling large amount of manufacturing data, one of the goals is to screen vital few variables among trivial many variables. In this paper we have reviewed and summarized CART, C4.5 and CHAID algorithms and proposed a simple method of screening vital few variables by selecting common variables screened by all the three algorithms. Also how to develop a logistics regression model on a large data set is discussed and illustrated through a large finance data set collected by a credit bureau for th purpose of predicting the bankruptcy of the company.

Design of Contact Scheduling System(CSS) for Customer Retention (고객유지를 위한 접촉스케줄링시스템의 설계)

  • Lee, Jee-Sik;Cho, You-Jung
    • Journal of Intelligence and Information Systems
    • /
    • v.11 no.3
    • /
    • pp.83-101
    • /
    • 2005
  • Customer retention is one of the major issues in life insurance industry, in which competition is increasingly fierce. There are many things for the life insurers to do many things to retain the customers. One of those things is to make sure to keep in touch with all customers. When an insurance-planner resigned, his/her customers must be taken care of by some planner-assistants. This article outlines the design of Contact Scheduling System (CSS) that supports planner-assistants for contacting the customers. Planner-assistants are unable to share the resigned insurance-planner's experience and knowledge regarding the customer relationship management. The CSS developed by employing both Classification And Regression Tree (CART) technique and Sequential Pattern Mining (SPM) technique has a two-stage process. In the first stage, it segments the customers into eight groups by CART model. Then it generates contact scheduling information consisting of contact-purpose, contact-interval and contact-channel, according to the segment's typical contact pattern. Contact-purpose is derived by schedule-driven, event-driven, or business-rule-driven. Schedule-driven contact is determined by SPM model. In the operation of CSS in a realistic situation, it shows a practicality in supporting planner-assistants to keep in touch with the customers efficiently and effectively.

  • PDF

Regression Tree based Modeling of Segmental Durations For Text-to-Speech Conversion System (Text-to-Speech 변환 시스템을 위한 회귀 트리 기반의 음소 지속 시간 모델링)

  • Pyo, Kyung-Ran;Kim, Hyung-Soon
    • Annual Conference on Human and Language Technology
    • /
    • 1999.10e
    • /
    • pp.191-195
    • /
    • 1999
  • 자연스럽고 명료한 한국어 Text-to-Speech 변환 시스템을 위해서 음소의 지속 시간을 제어하는 일은 매우 중요하다. 음소의 지속 시간은 여러 가지 문맥 정보에 의해서 변화하므로 제어 규칙에 의존하기 보다 방대한 데이터베이스를 이용하여 통계적인 기법으로 음소의 지속 시간에 변화를 주는 요인을 찾아내려고 하는 것이 지금의 추세이다. 본 연구에서도 트리기반 모델링 방법중의 하나인 CART(classification and regression tree) 방법을 사용하여 회귀 트리를 생성하고, 생성된 트리에 기반하여 음소의 지속 시간 예측 모델과, 자연스러운 끊어 읽기를 위한 휴지 기간 예측 모델을 제안하고 있다. 실험에 사용한 음성코퍼스는 550개의 문장으로 구성되어 있으며, 이 중 428개 문장으로 회귀 트리를 학습시켰고, 나머지 122개의 문장으로 실험하였다. 모델의 평가를 위해서 실제값과 예측값과의 상관관계를 구하였더니 음소의 지속 시간을 예측하는 회귀 트리에서는 상관계수가 0.84로 계산되었고, 끊어 읽는 경계에서의 휴지 기간을 예측하는 회귀 트리에서는 상관계수가 0.63으로 나타났다.

  • PDF

Perceptual Evaluation of Duration Models in Spoken Korean

  • Chung, Hyun-Song
    • Speech Sciences
    • /
    • v.9 no.1
    • /
    • pp.207-215
    • /
    • 2002
  • Perceptual evaluation of duration models of spoken Korean was carried out based on the Classification and Regression Tree (CART) model for text-to-speech conversion. A reference set of durations was produced by a commercial text-to-speech synthesis system for comparison. The duration model which was built in the previous research (Chung & Huckvale, 2001) was applied to a Korean language speech synthesis diphone database, 'Hanmal (HN 1.0)'. The synthetic speech produced by the CART duration model was preferred in the subjective preference test by a small margin and the synthetic speech from the commercial system was superior in the clarity test. In the course of preparing the experiment, a labeled database of spoken Korean with 670 sentences was constructed. As a result of the experiment, a trained duration model for speech synthesis was obtained. The 'Hanmal' diphone database for Korean speech synthesis was also developed as a by-product of the perceptual evaluation.

  • PDF

Context-adaptive Smoothing for Speech Synthesis (음성 합성기를 위한 문맥 적응 스무딩 필터의 구현)

  • 이기승;김정수;이재원
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.3
    • /
    • pp.285-292
    • /
    • 2002
  • One of the problems that should be solved in Text-To-Speech (TTS) is discontinuities at unit-joining points. To cope with this problem, a smoothing method using a low-pass filter is employed in this paper, In the proposed soothing method, a filter coefficient that controls the amount of smoothing is determined according to contort information to be synthesized. This method efficiently reduces both discontinuities at unit-joining points and artifacts caused by undesired smoothing. The amount of smoothing is determined with discontinuities around unit-joins points in the current synthesized speech and discontinuities predicted from context. The discontinuity predictor is implemented by CART that has context feature variables. To evaluate the performance of the proposed method, a corpus-based concatenative TTS was used as a baseline system. More than 6075 of listeners realized that the quality of the synthesized speech through the proposed smoothing is superior to that of non-smoothing synthesized speech in both naturalness and intelligibility.

Impervious Surface Estimation of Jungnangcheon Basin Using Satellite Remote Sensing and Classification and Regression Tree (위성원격탐사와 분류 및 회귀트리를 이용한 중랑천 유역의 불투수층 추정)

  • Kim, Sooyoung;Heo, Jun-Haeng;Heo, Joon;Kim, SungHoon
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.28 no.6D
    • /
    • pp.915-922
    • /
    • 2008
  • Impervious surface is an important index for the estimation of urbanization and the assessment of environmental change. In addition, impervious surface influences on short-term rainfall-runoff model during rainy season in hydrology. Recently, the necessity of impervious surface estimation is increased because the effect of impervious surface is increased by rapid urbanization. In this study, impervious surface estimation is performed by using remote sensing image such as Landsat-7 ETM+image with $30m{\times}30m$ spatial resolution and satellite image with $1m{\times}1m$ spatial resolution based on Jungnangcheon basin. A tasseled cap transformation and NDVI(normalized difference vegetation index) transformation are applied to Landsat-7 ETM+ image to collect various predict variables. Moreover, the training data sets are collected by overlaying between Landsat-7 ETM+ image and satellite image, and CART(classification and regression tree) is applied to the training data sets. As a result, impervious surface prediction model is consisted and the impervious surface map is generated for Jungnangcheon basin.