• Title/Summary/Keyword: Statistical feature

Properties of chi-square statistic and information gain for feature selection of imbalanced text data (불균형 텍스트 데이터의 변수 선택에 있어서의 카이제곱통계량과 정보이득의 특징)

  • Mun, Hye In;Son, Won
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.4
    • /
    • pp.469-484
    • /
    • 2022
  • Because a large text corpus contains hundreds of thousands of unique words, text data is a typical example of high-dimensional data. Various feature selection methods have therefore been proposed for dimension reduction. Feature selection can improve prediction accuracy, and the reduced data size also improves computational efficiency. The chi-square statistic and the information gain are two of the most popular measures for identifying interesting terms in text data. In this paper, we investigate the theoretical properties of the chi-square statistic and the information gain. We show that the two filtering metrics share theoretical properties such as non-negativity and convexity. However, they differ in that the information gain tends to select more negative features than the chi-square statistic in imbalanced text data.
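
As a rough illustration of the two filtering metrics named in this abstract, the sketch below scores one term against one class from a 2x2 document-count table. The function names, the cell layout (a, b, c, d), and the sign test used for "negative feature" are assumptions made for this example and are not taken from the paper.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate margin carries no association
    return n * (a * d - b * c) ** 2 / denom


def information_gain(a, b, c, d):
    """Information gain in bits: mutual information of term presence and class."""
    n = a + b + c + d
    cells = [
        (a, a + b, a + c),  # term present, in class
        (b, a + b, b + d),  # term present, outside class
        (c, c + d, a + c),  # term absent, in class
        (d, c + d, b + d),  # term absent, outside class
    ]
    gain = 0.0
    for joint, row_total, col_total in cells:
        if joint > 0:  # empty cells contribute 0 (limit of p*log p)
            gain += (joint / n) * math.log2(joint * n / (row_total * col_total))
    return gain


def is_negative_feature(a, b, c, d):
    """Assumed definition: the term is rarer in the class than independence predicts."""
    return a * d < b * c
```

Both scores are zero for an independent table such as (10, 10, 10, 10) and grow with association in either direction, so a sign check like `is_negative_feature` is needed to tell which terms signal absence from the class.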

Data abnormal detection using bidirectional long-short neural network combined with artificial experience

  • Yang, Kang;Jiang, Huachen;Ding, Youliang;Wang, Manya;Wan, Chunfeng
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.117-127
    • /
    • 2022
  • Data anomalies seriously threaten the reliability of bridge structural health monitoring systems and may trigger system misjudgment. To overcome this problem, an efficient and accurate data anomaly detection method is needed. Traditional anomaly detection methods extract various abnormal features as key indicators and then set thresholds manually for each feature to identify specific anomalies; this is the artificial experience method. However, because it generalizes poorly across sensors, this method often leads to high labor costs. Another approach to anomaly detection is data-driven and based on machine learning. Among such methods, the bidirectional long short-term memory neural network (BiLSTM), an effective classification method, excels at finding complex relationships in multivariate time series data. However, training on unprocessed original signals often leads to low computational efficiency and poor convergence because appropriate feature selection is lacking. This article therefore combines the advantages of the two approaches by proposing a deep learning method that takes statistical features based on manual experience as input. Experimental comparative studies illustrate that the BiLSTM model with appropriate feature input has an accuracy rate of over 87-94%. This paper also provides basic principles of data cleaning and discusses the typical features of various anomalies. Furthermore, optimization strategies for feature space selection based on artificial experience are highlighted.
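
The idea of feeding hand-crafted statistical features to a sequence model, rather than raw signals, can be sketched as a windowing step. The abstract does not list the features used, so the set below (mean, standard deviation, minimum, maximum, range) and the function name `window_features` are illustrative assumptions only.

```python
import statistics


def window_features(signal, window_size):
    """Summarize a 1-D signal as per-window statistics.

    Splits the signal into non-overlapping windows and returns one feature
    dictionary per full window; a trailing partial window is dropped.
    The chosen statistics are an assumption, not the paper's feature list.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    features = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        window = signal[start:start + window_size]
        low, high = min(window), max(window)
        features.append({
            "mean": statistics.fmean(window),
            "std": statistics.pstdev(window),  # population standard deviation
            "min": low,
            "max": high,
            "range": high - low,
        })
    return features
```

Each dictionary could then be turned into one time step of a much shorter input sequence for a classifier such as a BiLSTM.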

The Study of Failure Mode Data Development and Feature Parameter's Reliability Verification Using LSTM Algorithm for 2-Stroke Low Speed Engine for Ship's Propulsion (선박 추진용 2행정 저속엔진의 고장모드 데이터 개발 및 LSTM 알고리즘을 활용한 특성인자 신뢰성 검증연구)

  • Jae-Cheul Park;Hyuk-Chan Kwon;Chul-Hwan Kim;Hwa-Sup Jang
    • Journal of the Society of Naval Architects of Korea
    • /
    • v.60 no.2
    • /
    • pp.95-109
    • /
    • 2023
  • In the 4th industrial revolution, changes in the technological paradigm have had a direct impact on the maintenance systems of ships. The 2-stroke low speed engine system integrates the core equipment required for propulsive power. Condition Based Management (CBM) is defined as a technology that brings predictive maintenance methods into existing calendar-based or running-time-based maintenance systems by monitoring the condition of machinery and diagnosing and prognosing failures. In this study, we established our own framework for CBM technology development, carried out engineering-based failure analysis, data development and management, and data feature analysis and pre-processing, and verified the reliability of the failure mode DB using LSTM algorithms. We developed various simulated failure mode scenarios for a 2-stroke low speed engine and produced data on onshore test beds. In the analysis and pre-processing of the normal and abnormal status data acquired through the failure mode simulation experiments, various Exploratory Data Analysis (EDA) techniques were used to extract not only data on the performance and efficiency of the 2-stroke low speed engine but also key feature data, using multivariate statistical analysis. In addition, by developing an LSTM classification algorithm, we sought to verify the reliability of various failure mode data with time-series characteristics.

A Study on A Biometric Bits Extraction Method of A Cancelable face Template based on A Helper Data (보조정보에 기반한 가변 얼굴템플릿의 이진화 방법의 연구)

  • Lee, Hyung-Gu;Kim, Jai-Hie
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.47 no.1
    • /
    • pp.83-90
    • /
    • 2010
  • Cancelable biometrics is a robust and secure biometric recognition method that uses a revocable biometric template in order to prevent possible compromise of the original biometric data. In this paper, we present a new cancelable bits extraction method for facial data. We use our previous cancelable feature template for the bits extraction. The adopted cancelable template is generated from two different original face feature vectors extracted with two different appearance-based approaches. The elements of each feature vector are re-ordered, and the scrambled features are added. From the added feature, a biometric bit string is extracted using a helper data based method. In this technique, the helper data is generated from the statistical properties of the added feature vector and can easily be replaced through straightforward revocation. Because the helper data utilizes only partial information about the added feature, our proposed method is more secure than our previous one. The proposed method utilizes the helper data to reduce feature variance within the same individual and to increase the distinctiveness of the bit strings of different individuals, for good recognition performance. For a security evaluation of our proposed method, a scenario in which the system is compromised by an adversary is also considered. In our experiments, we analyze the proposed method with respect to performance and security using the extended YALEB face database.

Process operation improvement methodology based on statistical data analysis (통계적 분석기법을 이용한 공정 운전 향상의 방법)

  • Hwang, Dae-Hee;Ahn, Tae-Jin;Han, Chonghun
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 1997.10a
    • /
    • pp.1516-1519
    • /
    • 1997
  • With the dissemination of Distributed Control Systems (DCS), huge amounts of process operation data have become available, making it possible to understand process behavior better on a statistical basis. Until now, statistical modeling technology has usually been applied to process monitoring and fault diagnosis. However, it has also been thought that the process information extracted from statistical analysis might offer a great opportunity for process operation improvements and process improvements. This paper proposes a general methodology for process operation improvement that includes data analysis, backing up the results of the analysis, and mapping physical phenomena to the Principal Components (PC), which is the feature that most distinguishes the methodology from traditional statistical analyses. The application of the proposed methodology to the Blast Furnace (BF) process is presented in detail. The BF process is a complicated process owing to its highly nonlinear and correlated behavior, so analysis based on mathematical modeling has been very difficult. Statistical analysis has therefore emerged as a useful alternative. Using the proposed methodology, we could interpret the complicated BF process better than with other mathematical methods and find the direction for process operation improvement. In the BF case, the direction of process operation improvement is to increase the fluidization and the permeability while decreasing the effect of the tapping operation. These guiding directions, together with their physical meanings, could save fuel cost and relieve the pressure on process operators to take proper actions, that is, to make better set point changes, in addition to providing better knowledge of the process. Because it is open to set point changes, the BF has a variety of steady state modes. Most chemical processes are in the same situation as the BF with respect to multimode steady states. The proposed methodology, which focuses on multimode steady state processes such as the BF, can consequently be applied to any chemical process whose set points change, whether or not an operator intervenes.
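
To make the role of the Principal Components concrete, the following minimal sketch extracts the first principal component of a small table of process variables by power iteration on the sample covariance matrix. It is a generic PCA illustration under assumed names (`first_principal_component`, `rows`), not the procedure of the paper, and it assumes at least two observations.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loading_vector, variance) of the first principal component.

    rows: list of equal-length observation tuples (one column per variable).
    Uses power iteration on the sample covariance matrix. The start vector
    is arbitrary; if it happened to be orthogonal to the leading
    eigenvector, the iteration would not find it.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    centered = [[r[k] - means[k] for k in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    v = [1.0 / (k + 1) for k in range(p)]  # asymmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero variance along v; nothing more to extract
        v = [x / norm for x in w]

    # Rayleigh quotient: variance explained along the unit vector v
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                   for i in range(p))
    return v, variance
```

The loadings in the returned vector show which measured variables move together along the dominant direction, which is the kind of information that can then be matched to a physical phenomenon.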

Analysis on the Characteristics of Urban Decline Using GIS and Spatial Statistical Method : The Case of Gwangju Metropolitan City (GIS와 공간통계기법을 활용한 도시쇠퇴 특성 분석 - 광주광역시를 중심으로 -)

  • Jang, Mun-Hyun
    • Journal of the Korean association of regional geographers
    • /
    • v.22 no.2
    • /
    • pp.424-438
    • /
    • 2016
  • In an effort to prevent urban decline and the hollowing-out phenomenon and to revitalize stagnant local economies, a new urban regeneration paradigm is on the rise. This study aims to analyze the characteristics of urban decline using GIS and a spatial statistical method, on the basis of the decline standards in the Urban Regeneration Special Act and the spatial autocorrelation technique. Gwangju Metropolitan City was chosen as the research target, and the decline standards in the Urban Regeneration Special Act - population reduction, business decline, and outworn buildings - were applied as indicators to secure objectivity. In particular, this study differs from existing ones in that it applies GIS and the spatial statistical technique to analyze urban decline characteristics by means of spatial autocorrelation. The overall analysis was carried out by applying the standards for designating urban regeneration regions and following the exploratory spatial procedure step by step. The spatial statistical procedure and the urban decline analysis data presented in this study are therefore expected to contribute to the diagnosis of urban decline at the metropolitan city level and to provide useful information for spatial decision making in urban regeneration.
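
The abstract does not say which spatial autocorrelation statistic was used. Global Moran's I is one widely used choice and is assumed here purely for illustration; the function name and the toy adjacency matrix are likewise hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for region values and a spatial weight matrix.

    values:  one indicator value per region (e.g. a decline indicator)
    weights: square matrix, weights[i][j] > 0 when regions i and j are neighbors
    Values near +1 mean similar regions cluster, near -1 mean they alternate.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    spread = sum(d * d for d in dev)
    if weight_total == 0 or spread == 0:
        raise ValueError("Moran's I is undefined for empty weights or constant values")
    return (n / weight_total) * cross / spread


# Four regions in a row; each is a neighbor of the one next to it.
LINE_ADJACENCY = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
```

With this adjacency, values such as [1, 1, 0, 0] (declining regions side by side) give a positive I, while the alternating pattern [1, 0, 1, 0] gives a negative one.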

A license plate area segmentation algorithm using statistical processing on color and edge information (색상과 에지에 대한 통계 처리를 이용한 번호판 영역 분할 알고리즘)

  • Seok Jung-Chul;Kim Ku-Jin;Baek Nak-Hoon
    • The KIPS Transactions:PartB
    • /
    • v.13B no.4 s.107
    • /
    • pp.353-360
    • /
    • 2006
  • This paper presents a robust algorithm for segmenting a vehicle license plate area from a road image. We consider the features of license plates in three aspects: 1) edges due to the characters in the plate, 2) colors in the plate, and 3) geometric properties of the plate. In the preprocessing step, we compute thresholds based on each feature to decide whether a pixel is inside a plate or not. A statistical approach is applied to sample images to compute the thresholds. For a given road image, our algorithm binarizes it using the thresholds. Then, we select three candidate plate regions by searching the binary image with a moving window. The plate area is selected from among the candidates with simple heuristics. This algorithm robustly detects the plate despite transformations of the plate or differences in its color intensity in the input image. Moreover, the preprocessing step requires only a small number of sample images for the statistical processing. The experimental results show that the algorithm successfully segments the plate in 97.8% of 228 input images. Our prototype implementation shows an average processing time of 0.676 seconds per image for a set of $1280{\times}960$ images, executed on a 3GHz Pentium4 PC with 512 MB of memory.

Segmentation and estimation of surfaces from statistical probability of texture features

  • Terauchi, Mutsuhiro;Nagamachi, Mitsuo;Koji-Ito;Tsuji, Toshio
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 1988.10b
    • /
    • pp.826-831
    • /
    • 1988
  • This paper presents an approach to segment an image into surface areas and to compute the surface properties from a gray-scale image, in order to describe the surfaces for reconstruction of the 3-D shape of objects. In general, a rigid body has several surfaces and many edges. If it is not a polyhedron, however, it is necessary not only to describe the relations between surfaces, i.e. its line drawing, but also to represent the equations of the surfaces themselves. To compute the surface equations, we use the probability of the edge distribution. First, as many edges as possible are extracted from the gray-level image. These include not only the points that maximize the change in image intensity but also candidates that appear to be edges. Next, other characteristics of a surface (color, coordinates and image intensity) are extracted. In our study, we refer to all features of a surface as "texture", for example color, intensity level, edge orientation, surface shape and so on. The features of a surface at each pixel of the image plane are mapped to a point in the feature space and segmented into groups by cluster analysis in this space. These groups are considered to represent object surfaces in the image plane. Finally, the states of the object surfaces in 3-D space are computed from the distributional probability of local and overall statistical features of a surface, and from the shape of a surface.

Speaker Recognition Using Dynamic Time Variation of Orthogonal Parameters (직교인자의 동적 특성을 이용한 화자인식)

  • 배철수
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.17 no.9
    • /
    • pp.993-1000
    • /
    • 1992
  • Recently, many researchers have found that the speaker recognition rate is high when recognition is performed by statistical processing of orthogonal parameters, which are derived from the analysis of the speech signal and contain much of the speaker's identity. This method, however, has problems caused by vocalization speed and the time-varying features of speech. To solve these problems, this paper proposes two speaker recognition methods that combine the DTW algorithm with orthogonal parameters extracted by the Karhunen-Loève Transform: one applies the orthogonal parameters as feature vectors to the DTW algorithm, and the other applies the orthogonal parameters to the optimal path. In addition, we compare the speaker recognition rates obtained with the two proposed methods against that of the conventional method based on statistical processing of orthogonal parameters. The orthogonal parameters used in this paper are derived from both the linear prediction coefficients and the partial correlation coefficients of the speech signal.
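
For readers unfamiliar with DTW (dynamic time warping), the sketch below shows the textbook dynamic-programming form, which aligns two sequences of feature vectors that differ in speaking rate. It is a generic version with an assumed Euclidean local distance and no path constraints; how the paper combines DTW with the orthogonal parameters is not reproduced here.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best monotonic alignment of two sequences.

    seq_a, seq_b: non-empty lists of equal-dimension feature vectors
    (tuples or lists). Local cost is the Euclidean distance (an
    assumption); each step may advance in one sequence or in both.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j]: best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch seq_b frame
                                     cost[i][j - 1],      # stretch seq_a frame
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

A slower utterance that repeats a frame, for example [(0,), (1,), (1,), (2,)] against [(0,), (1,), (2,)], still aligns at zero cost, which is why DTW tolerates differences in vocalization speed.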

EPS Gesture Signal Recognition using Deep Learning Model (심층 학습 모델을 이용한 EPS 동작 신호의 인식)

  • Lee, Yu ra;Kim, Soo Hyung;Kim, Young Chul;Na, In Seop
    • Smart Media Journal
    • /
    • v.5 no.3
    • /
    • pp.35-41
    • /
    • 2016
  • In this paper, we propose hand-gesture signal recognition based on an EPS (Electronic Potential Sensor) using a deep learning model. Signals extracted from the EPS, an electric field based sensor, contain much noise, which must be removed in pre-processing. After the noise is removed with a filter based on frequency features, the signals are reconstructed by a dimensional transformation so that the convolution operation can be used, overcoming the limitation of having only a one-dimensional feature, the voltage value. The reconstructed signal data is then classified and recognized using a multi-layer model based on deep learning. Because a probability-based statistical model is sensitive to its initial parameters, its results can change after training in the modeling phase. A deep learning model can overcome this problem because of its several layers in the training phase. In the experiments, we used two different deep learning structures, a convolutional neural network and a recurrent neural network, and compared them with a statistical model algorithm on four kinds of gestures. The convolutional neural network gave better recognition results than the other algorithms for EPS gesture signal recognition.