• Title/Abstract/Keywords: Statistical feature

664 search results (processing time: 0.03 s)

토픽 모형을 이용한 텍스트 데이터의 단어 선택 (Feature selection for text data via topic modeling)

  • 장우솔;김예은;손원
    • 응용통계연구
    • /
    • Vol. 35, No. 6
    • /
    • pp.739-754
    • /
    • 2022
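  • Text data typically contain many variables that are highly correlated with one another, which can compromise the accuracy and efficiency of statistical analysis. To deal with these problems, supervised learning, where a target variable is given, sometimes selects the words that explain the target variable well and uses only those words in the analysis. In unsupervised learning, by contrast, no target variable is given, so the word selection procedures used in supervised learning are difficult to apply. This study proposes a word selection procedure that uses a topic model to generate topics that can stand in for the target variable of supervised learning and then selects the words most strongly associated with each topic. Applying the proposed procedure to real text data showed that word selection removes words that appear frequently across many topics, so topics can be identified more clearly. When applied to cluster analysis, the procedure yielded clusters that were strongly associated with the categories. We also confirmed that words selected with the topic model, without any information on the target variable, achieve classification accuracy similar to that of words selected using the target variable.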
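
One way to picture topic-based word selection is the sketch below. It is not the procedure from the paper: it assumes a topic-word probability matrix has already been fitted by some topic model, and it uses a simple ratio score (a word's probability in one topic divided by its average probability over all topics) that is an assumption made for illustration. The function name `select_words_by_topic` and the toy numbers are hypothetical.

```python
def select_words_by_topic(topic_word_probs, vocabulary, top_n):
    """Pick, for each topic, the top_n words that are most specific to it.

    topic_word_probs[t][j] is the (already fitted) probability of word j
    in topic t. The specificity score is an illustrative assumption:
    probability in the topic divided by the average over all topics.
    """
    n_topics = len(topic_word_probs)
    selected = set()
    for probs in topic_word_probs:
        scores = []
        for j, word in enumerate(vocabulary):
            average = sum(row[j] for row in topic_word_probs) / n_topics
            scores.append((probs[j] / average, word))
        scores.sort(reverse=True)
        selected.update(word for _, word in scores[:top_n])
    return sorted(selected)


# Toy example: "data" is frequent in both topics, so it is not selected.
vocabulary = ["data", "model", "goal", "vote"]
topic_word_probs = [
    [0.40, 0.10, 0.45, 0.05],  # hypothetical "sports" topic
    [0.40, 0.10, 0.05, 0.45],  # hypothetical "politics" topic
]
chosen = select_words_by_topic(topic_word_probs, vocabulary, top_n=1)
```

In this toy case the word shared by both topics drops out and only the topic-specific words remain, which mirrors the effect the abstract reports.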

숨은마코프모형을 이용하는 음성 끝점 검출을 위한 이산 특징벡터 (A Discrete Feature Vector for Endpoint Detection of Speech with Hidden Markov Model)

  • 이재기;오창혁
    • 응용통계연구
    • /
    • Vol. 21, No. 6
    • /
    • pp.959-967
    • /
    • 2008
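  • The aim of this study is to propose a discrete feature vector for detecting the endpoints of speech segments with a hidden Markov model, one that is robust in noisy environments and computationally light, and to establish its properties empirically. The proposed feature vector is the slope representing the rate of change of the energy of a one-dimensional sound signal, discretized into three values to reduce the computational load of the hidden Markov model calculations. In endpoint detection experiments at several noise levels, the proposed feature vector proved robust even in noisy environments.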
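
The three-valued energy-slope feature described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the non-overlapping framing, the frame length, the slope threshold, and the function names `frame_energies` and `discrete_slope_features` are all assumptions made for the example.

```python
def frame_energies(signal, frame_len):
    """Sum of squared samples in each non-overlapping frame."""
    return [
        sum(s * s for s in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def discrete_slope_features(energies, threshold):
    """Map each frame-to-frame energy change to one of three symbols.

    +1: energy rises by more than `threshold`
    -1: energy falls by more than `threshold`
     0: energy is roughly flat
    """
    features = []
    for prev, curr in zip(energies, energies[1:]):
        slope = curr - prev
        if slope > threshold:
            features.append(1)
        elif slope < -threshold:
            features.append(-1)
        else:
            features.append(0)
    return features


# Toy signal: silence, a burst standing in for speech, then silence again.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
energies = frame_energies(signal, frame_len=4)
symbols = discrete_slope_features(energies, threshold=0.5)
```

The resulting symbol sequence is the kind of observation sequence a discrete hidden Markov model could consume; the rising and falling symbols mark candidate start and end points.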

Hepatitis C Stage Classification with hybridization of GA and Chi2 Feature Selection

  • Umar, Rukayya;Adeshina, Steve;Boukar, Moussa Mahamat
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 1
    • /
    • pp.167-174
    • /
    • 2022
  • In metaheuristic algorithms such as the Genetic Algorithm (GA), the initial population has a significant impact because it affects the time the algorithm takes to reach an optimal solution to the given problem. It may also influence the quality of the solution obtained. In machine learning, feature selection is an important step toward a well-performing model, and the Genetic Algorithm has been used for this purpose. However, a characteristic of the Genetic Algorithm, namely random generation of the initial population from a vector of feature elements, may influence both the solution and the execution time. This paper introduces a statistical algorithm (Chi2) for feature relevance checks, in which p-values of conditional independence were considered. Features with low p-values were discarded, and the relevant subset of features was passed to the Genetic Algorithm. This gives a level of certainty about the fitness of the randomly selected features. An ensemble-based learning model was developed for Hepatitis C stage classification, using 1385 samples from the Egyptian dataset obtained from the UCI repository. The comparative evaluation confirms a decrease in execution time and an increase in model accuracy from 56% to 63%.
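
As a rough illustration of the Chi2 relevance check mentioned in this abstract, the sketch below computes the Pearson chi-square statistic and its p-value for one binary feature against a binary class label. It is an assumption-laden example, not the paper's pipeline: it handles only a 2x2 table (one degree of freedom), applies no continuity correction, and leaves the choice of which features to pass on to the Genetic Algorithm to the paper. The function name `chi_square_2x2` and the counts are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    Returns (statistic, p_value). With one degree of freedom the upper tail
    probability has the closed form erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            expected = row_total * col_total / n
            statistic += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows are feature present/absent, columns are two classes.
statistic, p_value = chi_square_2x2([[10, 20], [20, 10]])
```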

Choice of Statistical Calibration Procedures When the Standard Measurement is Also Subject to Error

  • Lee, Seung-Hoon;Yum, Bong-Jin
    • Journal of the Korean Statistical Society
    • /
    • Vol. 14, No. 2
    • /
    • pp.63-75
    • /
    • 1985
  • This paper considers a statistical calibration problem in which the standard as well as the nonstandard measurement is subject to error. Since the classical approach cannot handle this situation properly, a functional relationship model with the additional feature of prediction is proposed. Four different approaches are considered for the analysis: two estimation techniques (ordinary and grouping least squares) combined with two prediction methods (classical and inverse prediction). The performance of each approach is assessed by Monte Carlo simulation in terms of the probability of concentration. The simulation results indicate that ordinary least squares with inverse prediction is generally preferred in interpolation, while grouping least squares with classical prediction turns out to be better in extrapolation.
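
The difference between classical and inverse prediction can be sketched with ordinary least squares alone. This example is a simplification under stated assumptions: it ignores the measurement error in the standard that the paper's functional relationship model addresses, it does not cover grouping least squares, and the data values and function names are hypothetical.

```python
def ols_fit(u, v):
    """Ordinary least squares fit of v = intercept + slope * u."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    slope = s_uv / s_uu
    return mean_v - slope * mean_u, slope


def classical_prediction(x, y, y0):
    """Regress y on x, then solve the fitted line for x at the new reading y0."""
    intercept, slope = ols_fit(x, y)
    return (y0 - intercept) / slope


def inverse_prediction(x, y, y0):
    """Regress x on y directly and evaluate the fitted line at y0."""
    intercept, slope = ols_fit(y, x)
    return intercept + slope * y0


# Hypothetical calibration data: x is the standard, y the nonstandard reading.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y0 = 7.0
x_classical = classical_prediction(x, y, y0)
x_inverse = inverse_prediction(x, y, y0)
```

With ordinary least squares the inverse estimate is always pulled toward the mean of `x` relative to the classical one (by a factor equal to the squared sample correlation), which is one reason the two methods behave differently in interpolation and extrapolation.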

일영 통계기계번역에서 의존문법 문장 구조와 품사 정보를 사용한 클러스터링 기법 (A Clustering Method using Dependency Structure and Part-Of-Speech(POS) for Japanese-English Statistical Machine Translation)

  • 김한경;나휘동;이금희;이종혁
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터
    • /
    • Vol. 15, No. 12
    • /
    • pp.993-997
    • /
    • 2009
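  • Clustering has been used in many fields and is a familiar technique in statistical machine translation as well. However, previous studies often applied machine learning without in-depth grammatical analysis or, even when they used sentence-structure information, went no further than identifying it with regular expressions. This paper proposes a method that uses each sentence's dependency-grammar structure and part-of-speech information, such as particles, to identify sentence structure, classify sentences by type, and obtain a language model specialized for each type. It also proposes a way to use these models as additional information in phrase-based statistical machine translation to improve translation performance.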

데이터 마이닝과 통계적 기법을 통합한 최적화 기법 (Optimization Methodology Integrated Data Mining and Statistical Method)

  • 송서일;신상문;정혜진
    • 품질경영학회지
    • /
    • Vol. 34, No. 4
    • /
    • pp.33-39
    • /
    • 2006
  • Manufacturing technology and the manufacturing environment are changing rapidly. With advances in computing and technology, most manufacturing fields have been computerized. To compete internationally, companies need to extract useful information from vast amounts of data quickly. Statistical process control (SPC) techniques have so far served as a problem-solving tool in manufacturing processes. However, these statistical methods are not applied more widely because they face many restrictions in realistic problems and run into difficulties when large amounts of data and many factors are analyzed. This paper proposes a more practical and efficient statistical design technique that integrates data mining (DM) and statistical methods as an alternative. The first step selects significant factors from manufacturing process data containing many factors, using a DM feature selection algorithm. The second step finds the process optimum after estimating the response function through response surface methodology (RSM), which is a statistical technique.

특징 선택과 융합 방법을 이용한 음성 감정 인식 (Speech Emotion Recognition using Feature Selection and Fusion Method)

  • 김원구
    • 전기학회논문지
    • /
    • Vol. 66, No. 8
    • /
    • pp.1265-1271
    • /
    • 2017
  • This paper studies a speech parameter fusion method to improve the performance of conventional emotion recognition systems. To this end, the best-performing combination of the cepstrum parameters and the various pitch parameters used in conventional emotion recognition systems is selected. Various pitch parameters were generated from the pitch of speech using numerical and statistical methods. Performance was evaluated on an emotion recognition system based on a Gaussian mixture model (GMM) to select the pitch parameters that performed best in combination with the cepstrum parameters. Sequential feature selection was used as the parameter selection method. In an experiment distinguishing four emotions (normal, joy, sadness, and anger), fifteen of the 56 pitch parameters were selected and gave the best recognition performance when fused with cepstrum and delta cepstrum coefficients. This is a 48.9% reduction in error relative to an emotion recognition system using only pitch parameters.
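
Sequential feature selection, the parameter selection method named in this abstract, can be sketched in its forward greedy form as below. The abstract does not say which variant was used, so forward selection is an assumption. In the paper the score would come from the GMM-based recognizer; here a toy scoring function with hypothetical parameter names and weights stands in for it.

```python
def sequential_forward_selection(candidates, score, k):
    """Greedily build a subset of k features.

    At each step, add the remaining feature whose inclusion gives the
    highest value of score(subset).
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical stand-in for recognition accuracy: each parameter adds some
# value, but two nearly redundant parameters are penalized when combined.
weights = {"pitch_mean": 0.5, "pitch_median": 0.45,
           "pitch_range": 0.3, "pitch_slope": 0.1}


def toy_score(subset):
    total = sum(weights[f] for f in subset)
    if "pitch_mean" in subset and "pitch_median" in subset:
        total -= 0.4
    return total


chosen = sequential_forward_selection(list(weights), toy_score, k=2)
```

The toy run skips the second-best individual parameter because it is redundant with the first, which is the behavior that makes subset-level scoring preferable to ranking parameters one by one.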

원심펌프용 메커니컬 씰 결함 검출 신호 특성 (Fault Detection Signal for Mechanical Seal of Centrifugal Pump)

  • 정래혁;이병곤
    • 한국안전학회지
    • /
    • Vol. 27, No. 3
    • /
    • pp.20-27
    • /
    • 2012
  • Mechanical seals are one of the main components of high-speed centrifugal pumps. Detecting seal faults (scratch, notch, indentation, wear) is therefore very important, since seal damage can cause critical failures or accidents in a machinery system. In the past, many researchers detected seal faults mainly from time signals measured by sensors. Recent studies focus on developing on-line, real-time monitoring systems. However, the feature parameters used for fault detection of mechanical seals have received little study. This paper presents feature parameters extracted from acceleration and acoustic signals using the discrete wavelet transform (DWT), the alpha coefficient, and statistical parameters, and verifies their potential for fault detection of mechanical seals.
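
To make the DWT-based feature extraction concrete, the sketch below performs one level of a Haar wavelet decomposition and uses the energy of each band as a feature. The abstract does not state which wavelet or which band features were used, so the Haar wavelet, the energy features, the function names, and the sample values are all assumptions for illustration.

```python
import math


def haar_dwt_level(signal):
    """One level of the orthonormal Haar DWT on an even-length signal.

    Returns (approximation, detail) coefficient lists, each half as long
    as the input.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / scale for i in pairs]
    return approx, detail


def band_energy(coeffs):
    """Sum of squared coefficients in one band."""
    return sum(c * c for c in coeffs)


# Hypothetical sensor samples.
signal = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 0.0]
approx, detail = haar_dwt_level(signal)
features = {
    "approx_energy": band_energy(approx),
    "detail_energy": band_energy(detail),
}
```

Because this transform is orthonormal, the two band energies add up to the energy of the original signal, so the features describe how that energy splits between slow and fast variations.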

에폭시/마이카 커플러를 이용한 고정자권선 결함신호 특징추출에 관한연구 (A Study on Feature Extraction of Fault Signal for Stator Winding using Epoxy/Mica Coupler)

  • 박재준;김희동
    • 한국전기전자재료학회:학술대회논문집
    • /
    • Proceedings of the 한국전기전자재료학회 2005 Summer Conference, Vol. 6
    • /
    • pp.225-226
    • /
    • 2005
  • In this study, we acquired signals for five simulated fault types of high-voltage motor stator windings using an epoxy/mica coupler. To identify the stator winding fault type from the fault signals, we performed feature extraction by applying the wavelet transform. We obtained skewness and kurtosis as statistical parameters of the fault signal pattern from windings in a non-deteriorated state. We found that the five fault signal types follow an exponential-function pattern shape, but the signal waveform of the pattern differs widely from one fault class to another.
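
Skewness and kurtosis, the statistical parameters named in this abstract, can be computed from central moments as in the sketch below. It uses the simple population (biased) moment estimators and reports non-excess kurtosis, which is 3 for a normal distribution. Whether the paper used these exact estimators is not stated, and the function name and sample values are hypothetical.

```python
def skewness_and_kurtosis(samples):
    """Return (skewness, kurtosis) from population central moments.

    skewness = m3 / m2**1.5, kurtosis = m4 / m2**2 (non-excess).
    """
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# A symmetric toy sample has zero skewness.
skew, kurt = skewness_and_kurtosis([-2.0, -1.0, 0.0, 1.0, 2.0])
```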

택배 산업에서의 물류 서비스 품질 측정 (Measuring Logistics Quality in Parcel Delivery Service)

  • 최성운;백봉기
    • 대한안전경영과학회지
    • /
    • Vol. 5, No. 4
    • /
    • pp.219-228
    • /
    • 2003
  • Today, the parcel delivery service market, which is a part of logistics, has expanded rapidly at home and abroad, and its growth rate is expected to increase further. When service is applied strategically in parcel delivery, the features of logistics service quality need to be understood from the viewpoint of customer differentiation. This study constructs a model of logistics service features that combines five features of service quality (Responsiveness, Empathy, Reliability, Accuracy, and Tangibility), based on the SERVQUAL measurement model, with logistics service, and uses statistical tools to identify the features of logistics service in parcel delivery by job.