• Title/Summary/Keyword: Mahalanobis

Development of Predictive Models for Subway Disaster Forecasting (지하철 재난 전조 예측 모델 개발)

  • Park, Mi Yun;Park, Wan Soon;Lee, Jeonghun;Kwon, Se Gon
    • Journal of Korean Society of Disaster and Security / v.10 no.2 / pp.1-6 / 2017
  • Previous research developed an Internet of Things-based subway disaster detection system that detects early warning signs of disasters in subway stations and guides passenger evacuation. As a follow-up study, this paper analyzes data from sensors installed in the station in real time to detect disasters quickly. In particular, we developed a statistical methodology based on the Mahalanobis distance that accounts for environmental conditions varying with the installation location of each sensor during initial system construction.

Visualizing multidimensional data in multiple groups (다그룹 다차원 데이터의 시각화)

  • Huh, Myung-Hoe
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.83-93 / 2017
  • A typical approach to visualizing k ($\geq 2$)-group multidimensional data is to use Fisher's canonical discriminant analysis (CDA). CDA finds the best low-dimensional subspace that accommodates the k group centroids in the Mahalanobis space. This paper proposes an alternative visualization procedure functioning in the Euclidean space, which finds the primary dimension with maximum discrimination of the k group centroids and the secondary dimension with maximum dispersion of all observational units. This hybrid procedure is especially useful when the number of groups k is two.

Classification of Imbalanced Data Based on MTS-CBPSO Method: A Case Study of Financial Distress Prediction

  • Gu, Yuping;Cheng, Longsheng;Chang, Zhipeng
    • Journal of Information Processing Systems / v.15 no.3 / pp.682-693 / 2019
  • Traditional classification methods mostly assume that the class distribution of the data is balanced, yet imbalanced data is widespread in the real world, so solving the problem of classification with imbalanced data is important. In the Mahalanobis-Taguchi system (MTS) algorithm, the classification model is constructed from a reference space and a measurement reference scale that come from a single normal group, which makes it suitable for handling the imbalanced data problem. In this paper, an improved method, MTS-CBPSO, is constructed by introducing chaotic mapping and a binary particle swarm optimization algorithm, instead of the orthogonal array and signal-to-noise ratio (SNR), to select the valid variables, with G-means, F-measure, and dimensionality reduction as the classification optimization targets. The proposed method is applied to the financial distress prediction of Chinese listed companies. Compared with the traditional MTS and common classification methods such as SVM, C4.5, and k-NN, the MTS-CBPSO method is shown to achieve better prediction accuracy and dimensionality reduction.

Assessing the Performance of Pongamia pinnata (L.) Pierre under Ex-situ Condition in Karnataka

  • Divakara, Baragur Neelappa;Nikhitha, Chitradurga Umesh
    • Journal of Forest and Environmental Science / v.38 no.1 / pp.12-20 / 2022
  • Pongamia (Pongamia pinnata L.), a source of non-edible oil, is a potential tree species for biodiesel production. For several reasons, both technical and economic, the potential of P. pinnata is far from being realized. The exploitation of genetic diversity for crop improvement has been the major driving force for the exploration and ex situ/in situ conservation of plant genetic resources. However, improvement of P. pinnata for high oil and seed production has not been achieved because tree improvement has been unsystematic. The performance of P. pinnata planted by the Karnataka Forest Department was assessed based on yield potential by collecting 157 of the 264 clones established by the department's research wing under different research circles/ranges. All the seed and pod traits were significantly different. Further, superior germplasm was selected based on oil and pod/seed parameters by applying Mahalanobis statistics and Tocher's technique. On the basis of $D^2$ values for all possible 253 pairs of populations, the 157 genotypes were grouped into 28 clusters. The clustering pattern showed that geographical diversity is not necessarily related to genetic diversity. Cluster means indicated a wide range of variation for all the pod and seed traits. The best clusters, those with a total oil content of more than 34.9% and a 100-seed weight above 125 g, viz. Clusters I, II, III, IX, XV, XIX, XXI, XXIII, XXVI and XXVII, were selected for clonal propagation.

GNSS/Multiple IMUs Based Navigation Strategy Using the Mahalanobis Distance in Partially GNSS-denied Environments (GNSS 부분 음영 지역에서 마할라노비스 거리를 이용한 GNSS/다중 IMU 센서 기반 측위 알고리즘)

  • Kim, Jiyeon;Song, Moogeun;Kim, Jaehoon;Lee, Dongik
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.4 / pp.239-247 / 2022
  • Existing studies on localization in GNSS (Global Navigation Satellite System)-denied environments usually exploit low-cost MEMS IMU (Micro Electro Mechanical Systems Inertial Measurement Unit) sensors to replace the GNSS signals. However, the navigation system still requires GNSS signals in the normal environment. This paper presents an integrated GNSS/INS (Inertial Navigation System) navigation system that combines GNSS and multiple IMU sensors using an extended Kalman filter in partially GNSS-denied environments. The position and velocity from the INS and GNSS are used as the inputs to the integrated navigation system. The Mahalanobis distance is used for novelty detection to identify outliers in the GNSS measurements. When an abnormality is detected in the GNSS signals, the GNSS data is excluded from the fusion process. The performance of the proposed method is evaluated using MATLAB/Simulink. The simulation results show that the proposed algorithm can achieve a higher degree of positioning accuracy in the partially GNSS-denied environment.
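
The outlier check described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a 2-D position measurement, an innovation covariance supplied by the filter, and a chi-square gate at a chosen confidence level. The function names (`mahalanobis_sq_2d`, `chi2_threshold_2dof`, `accept_gnss`) and the 0.99 default are hypothetical choices made for illustration.

```python
import math

def mahalanobis_sq_2d(residual, cov):
    """Squared Mahalanobis distance of a 2-D residual under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    rx, ry = residual
    # The inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det

def chi2_threshold_2dof(confidence):
    """Chi-square quantile for 2 degrees of freedom (CDF is 1 - exp(-x/2))."""
    return -2.0 * math.log(1.0 - confidence)

def accept_gnss(measured, predicted, innovation_cov, confidence=0.99):
    """Return True if the GNSS fix is consistent with the predicted position.

    Assumption: a fix whose squared Mahalanobis distance exceeds the
    chi-square gate is treated as an outlier and left out of the fusion step.
    """
    residual = (measured[0] - predicted[0], measured[1] - predicted[1])
    return mahalanobis_sq_2d(residual, innovation_cov) <= chi2_threshold_2dof(confidence)
```

With an innovation covariance of `((4.0, 0.0), (0.0, 4.0))` and a predicted position at the origin, `accept_gnss((1.0, 1.0), ...)` passes the gate while `accept_gnss((10.0, 10.0), ...)` is rejected.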

Comparative Study on Similarity Measurement Methods in CBR Cost Estimation

  • Ahn, Joseph;Park, Moonseo;Lee, Hyun-Soo;Ahn, Sung Jin;Ji, Sae-Hyun;Kim, Sooyoung;Song, Kwonsik;Lee, Jeong Hoon
    • International conference on construction engineering and project management / 2015.10a / pp.597-598 / 2015
  • To improve the reliability of cost estimation results using case-based reasoning (CBR), similarity measurement has been a continuing issue: the distances among attributes and cases must be computed accurately to retrieve the most similar case or cases. However, existing similarity measures are limited in taking the covariance among attributes into consideration and in reflecting its effects when computing distances among attributes. To address this issue, this research examines a weighted Mahalanobis distance-based similarity measure applied to CBR cost estimation and carries out a comparative study against the existing distance measurement methods of CBR. To validate the suggested CBR cost model, leave-one-out cross-validation (LOOCV) is carried out using two different sets of simulation data. Consequently, this research is expected to provide an analysis of covariance effects in similarity measurement and a basis for further research on the fundamentals of case retrieval.

An Outlier Detection Algorithm and Data Integration Technique for Prediction of Hypertension (고혈압 예측을 위한 이상치 탐지 알고리즘 및 데이터 통합 기법)

  • Khongorzul Dashdondov;Mi-Hye Kim;Mi-Hwa Song
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.417-419 / 2023
  • Hypertension is one of the leading causes of mortality worldwide. In recent years, the incidence of hypertension has increased dramatically, not only among the elderly but also among young people, and the use of machine-learning methods to diagnose the causes of hypertension has increased accordingly. In this study, we improved hypertension prediction by applying Mahalanobis distance-based multivariate outlier removal to the KNHANES database from the Korean national health data and the COVID-19 dataset from Kaggle. The study was divided into two modules. First, the data preprocessing step used merged datasets and decision-tree classifier-based feature selection. The next module applies a predictive analysis step that removes multivariate outliers from the experimental dataset using the Mahalanobis distance and then predicts hypertension. We compared the accuracy of each classification model. The best results showed that the proposed MAH_RF algorithm had an accuracy of 82.66%. The proposed method can be used not only for hypertension but also for the detection of various diseases such as stroke and cardiovascular disease.
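
The multivariate outlier removal step mentioned in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the MAH_RF algorithm itself: it handles two features only, estimates the mean and sample covariance from the data being filtered, and uses a chi-square cutoff at 95% confidence, which the abstract does not specify. The function names and the sample points are hypothetical.

```python
import math

def mean_and_cov_2d(points):
    """Sample mean and sample covariance (n - 1 divisor) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of a point from the mean."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Closed-form inverse of the symmetric 2x2 covariance matrix.
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def remove_outliers(points, confidence=0.95):
    """Keep points whose squared distance is within the chi-square cutoff.

    Assumption: 2 degrees of freedom, where the quantile is -2 * ln(1 - p).
    """
    mean, cov = mean_and_cov_2d(points)
    cutoff = -2.0 * math.log(1.0 - confidence)
    return [p for p in points if mahalanobis_sq(p, mean, cov) <= cutoff]

# Hypothetical data: eight clustered points plus one distant point.
samples = [(1, 1), (1, -1), (-1, 1), (-1, -1),
           (1, 0), (-1, 0), (0, 1), (0, -1), (10, 10)]
cleaned = remove_outliers(samples)
```

Because the mean and covariance are estimated from the same data, extreme points inflate the covariance and partly mask themselves; with very small samples a stricter cutoff may never be exceeded, so the confidence level is a design choice rather than a fixed rule.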

Fault Detection Method for Multivariate Process using Mahalanobis Distance and ICA (마할라노비스 거리와 독립성분분석을 이용한 다변량 공정 고장탐지 방법에 관한 연구)

  • Jung, Seunghwan;Kim, Sungshin
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.1 / pp.22-28 / 2021
  • Multivariate processes, such as chemical and mechanical processes and power plants, are operated with several facilities connected in complex ways, so a fault in a particular system can have fatal consequences for the entire process. In addition, since process data is measured in an unstable environment, outliers are likely to be included in the data. Therefore, monitoring technology that can remove outliers from measured data and detect failures in advance is essential. In this paper, data obtained from dynamic and multivariate process models was used to detect faults in various types of processes. The dynamic process is a simulation of a process with an autoregressive property, and the multivariate process is a model that describes a situation in which a specific sensor is faulty. The Mahalanobis distance was used to remove outliers contained in the data generated by the dynamic and multivariate process models, and fault detection was performed using independent component analysis (ICA). For comparison, performance was compared with that of a conventional single ICA method. The proposed fault detection method improves performance by 0.84%p for bias data and 6.82%p for drift data in the dynamic process. In the multivariate process, performance improved by 3.78%p; the proposed method therefore showed better fault detection performance.

Optimization of Sensor Location for Real-Time Damage assessment of Cable in the cable-Stayed Bridge (사장교 케이블의 실시간 손상평가를 위한 센서 배치의 최적화)

  • Geon-Hyeok Bang;Gwang-Hee Heo;Jae-Hoon Lee;Yu-Jae Lee
    • Journal of the Korea institute for structural maintenance and inspection / v.27 no.6 / pp.172-181 / 2023
  • In this study, real-time damage evaluation of a cable-stayed bridge was conducted for cable damage. ICP-type acceleration sensors were used for the real-time damage assessment, and Kinetic Energy Optimization Techniques (KEOT) were used to select the optimal location and quantity of the sensors. When a structure vibrates under an external force, KEOT measures the value of the maximum deformation energy to determine the optimal measurement positions and the number of sensors. The damage conditions in this study were limited to cable breakage, and cable damage was induced after dividing the cable-stayed bridge into four sections. For the real-time cable damage evaluation method, a virtual model similar to the actual model was created through FE structural analysis. After random oscillation waves were applied to the generated virtual model and the model structure, cable damage was induced in the model structure. The two data sets were compared by defining the response output from the virtual model as a corruption-free response and the response measured from the real model as corruption-free data. The degree of damage was evaluated by applying the Improved Mahalanobis Distance (IMD) theory to the data of the damaged cable-stayed bridge against the data of the intact cable-stayed bridge. The evaluation showed that IMD theory is a useful damage evaluation technology that can properly locate damage by section in real time and can be applied to real-time monitoring.

On Assessing Inter-observer Agreement Independent of Variables' Measuring Units

  • Um, Yong-Hwan
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.529-536 / 2006
  • Investigators use either the Euclidean distance or the volume of a simplex composed of data points as an agreement index to measure chance-corrected agreement among observers for multivariate interval data. The agreement coefficient proposed by Um (2004) is based on the volume of a simplex and does not depend on the variables' measuring units. We compare Um (2004)'s agreement coefficient with others based on two unit-free distance measures, the Pearson distance and the Mahalanobis distance. The comparison is made using a hypothetical data set.
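
The unit-free property of the Mahalanobis distance referred to in this abstract can be checked numerically. The Python sketch below is an illustration only, with hypothetical observations (height, weight) and hypothetical function names; it does not reproduce the paper's agreement coefficients. It shows that converting one variable from centimetres to metres changes the Euclidean distance between two observations but leaves the Mahalanobis distance unchanged, provided the covariance is re-estimated in the new units.

```python
import math

def covariance_2d(points):
    """Sample covariance (n - 1 divisor) of 2-D points as (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy

def mahalanobis(p, q, cov):
    """Mahalanobis distance between two 2-D observations."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - q[0], p[1] - q[1]
    return math.sqrt((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)

def euclidean(p, q):
    """Euclidean distance between two 2-D observations."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical observations: (height in cm, weight in kg).
data_cm = [(170.0, 65.0), (160.0, 54.0), (180.0, 80.0), (175.0, 68.0), (165.0, 62.0)]
# The same observations with height converted to metres.
data_m = [(h / 100.0, w) for h, w in data_cm]

d_maha_cm = mahalanobis(data_cm[0], data_cm[1], covariance_2d(data_cm))
d_maha_m = mahalanobis(data_m[0], data_m[1], covariance_2d(data_m))
d_eucl_cm = euclidean(data_cm[0], data_cm[1])
d_eucl_m = euclidean(data_m[0], data_m[1])
```

Here `d_maha_cm` and `d_maha_m` agree up to floating-point error, while `d_eucl_cm` and `d_eucl_m` differ because the height axis dominates the Euclidean distance in one unit system and nearly vanishes in the other.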
