• Title/Abstract/Keywords: statistic machine learning

Search results: 11 items, processing time: 0.028 s

Improvement of Self Organizing Maps using Gap Statistic and Probability Distribution

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 8, No. 2 / pp. 116-120 / 2008
  • Clustering is a method for unsupervised learning. General clustering tools have depended on statistical methods and machine learning algorithms. One of the popular clustering algorithms based on machine learning is the self-organizing map (SOM), a neural network model for clustering. SOM and extended SOM have been used in diverse classification and clustering fields such as data mining. However, SOM has difficulty determining the optimal number of clusters. In this paper, we propose an improvement of SOM using the gap statistic and probability distributions. The gap statistic was introduced to estimate the number of clusters in a dataset, and we use it to address this problem of SOM. In addition, the weights of the feature nodes are updated by probability distributions. After updating is completed according to the prior and posterior distributions, the weights of the SOM have probability distributions for optimal clustering. To verify the improved performance of our work, we run experiments comparing it with other learning algorithms on simulated data sets.
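
The gap statistic named in the abstract above compares the log within-cluster dispersion of the data with its average over uniform reference data sets, and picks the smallest k whose gap is within one standard error of the next. The sketch below illustrates that idea only. It swaps in a basic k-means for the SOM used in the paper, and the function names, the farthest-point initialisation, the bounding-box reference distribution, and the sample points are all assumptions made for illustration.

```python
import math
import random

def kmeans_dispersion(points, k, iterations=25):
    """Run a basic k-means and return the within-cluster sum of squares W_k."""
    # Farthest-point initialisation (an assumption) keeps the sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old center if a cluster is empty.
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return sum(math.dist(p, centers[i]) ** 2
               for i, cluster in enumerate(clusters) for p in cluster)

def gap_statistic(points, max_k=5, n_refs=10, seed=0):
    """Return (estimated k, {k: Gap(k)}) using uniform reference sets."""
    rng = random.Random(seed)
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    gaps, errors = {}, {}
    for k in range(1, max_k + 1):
        log_w = math.log(kmeans_dispersion(points, k))
        ref_logs = []
        for _ in range(n_refs):
            # Reference set: uniform draws over the bounding box of the data.
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                   for _ in points]
            ref_logs.append(math.log(kmeans_dispersion(ref, k)))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps[k] = mean_ref - log_w
        errors[k] = sd * math.sqrt(1 + 1 / n_refs)
    # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.
    for k in range(1, max_k):
        if gaps[k] >= gaps[k + 1] - errors[k + 1]:
            return k, gaps
    return max_k, gaps

# Made-up data: three well-separated blobs of 20 points each.
rng = random.Random(1)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
          for cx, cy in blob_centers for _ in range(20)]
best_k, gaps = gap_statistic(points)
```

For data with exact duplicates and a large k the dispersion can reach zero, in which case the logarithm would fail; a production version would need to guard against that.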

Monitoring moisture content of timber structures using PZT-enabled sensing and machine learning

  • Chen, Lin;Xiong, Haibei;He, Yufeng;Li, Xiuquan;Kong, Qingzhao
    • Smart Structures and Systems / Vol. 29, No. 4 / pp. 589-598 / 2022
  • Timber structures are susceptible to structural damage caused by variations in moisture content (MC), which induce severe durability deterioration and safety issues. It is therefore important to detect MC levels in timber structures. Compared with current methods for timber MC detection, which are time-consuming and require bulky equipment, the lead zirconate titanate (PZT)-enabled stress wave sensing combined with statistical machine learning classification proposed in this paper offers the advantages of a portable device and ease of operation. First, stress wave signals from different MC cases are excited and received by PZT sensors through active sensing. Subsequently, two non-baseline features are extracted from these stress wave signals. Finally, these features are fed to a statistical machine learning classifier (i.e., naïve Bayesian classification) to detect the MC of timber structures. Numerical simulations validate the feasibility of PZT-enabled sensing for perceiving MC variations. Tests covering five MC cases are conducted to verify the effectiveness of the proposed method. The results show high accuracy for timber MC detection, indicating great potential for rapid and long-term monitoring of the MC level of timber structures in future field applications.
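
The final classification step described in the abstract above (a few extracted features fed to a naïve Bayesian classifier) can be sketched as follows. The abstract does not give the likelihood model, so the Gaussian per-feature likelihood here is an assumption, and the function names, class labels, and feature values are invented for illustration rather than taken from the paper.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate per-class priors, feature means and feature variances."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # A small floor avoids division by zero for constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-9)
                     for col, m in zip(zip(*rows), means)]
        model[label] = (len(rows) / len(samples), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)

# Hypothetical two-feature samples for three moisture-content levels.
samples = [(0.9, 1.0), (1.1, 1.2), (1.0, 0.9),
           (2.0, 2.1), (2.2, 1.9), (1.9, 2.0),
           (3.1, 3.0), (2.9, 3.2), (3.0, 2.9)]
labels = ["MC-low"] * 3 + ["MC-mid"] * 3 + ["MC-high"] * 3
model = fit_gaussian_nb(samples, labels)
low_guess = predict(model, (1.05, 1.0))
high_guess = predict(model, (3.0, 3.1))
```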

SHM data anomaly classification using machine learning strategies: A comparative study

  • Chou, Jau-Yu;Fu, Yuguang;Huang, Shieh-Kung;Chang, Chia-Ming
    • Smart Structures and Systems / Vol. 29, No. 1 / pp. 77-91 / 2022
  • Various monitoring systems have been implemented in civil infrastructure to ensure structural safety and integrity. In long-term monitoring, these systems generate a large amount of data, in which anomalies are not unusual and can pose unique challenges for structural health monitoring applications such as system identification and damage detection. Efficient techniques for recognizing anomalies in monitoring data are therefore essential. In this study, several machine learning techniques are explored and implemented to detect and classify various types of data anomalies. A field dataset, which consists of one month of acceleration data obtained from a long-span cable-stayed bridge in China, is employed to examine the machine learning techniques for automated data anomaly detection. These techniques include the statistic-based pattern recognition network, spectrogram-based convolutional neural network, image-based time history convolutional neural network, image-based time-frequency hybrid convolutional neural network (GoogLeNet), and the proposed ensemble neural network model. The ensemble model deliberately combines different machine learning models to enhance anomaly classification performance. The results show that all these techniques can successfully detect and classify six types of data anomalies (i.e., missing, minor, outlier, square, trend, drift). Moreover, both the image-based time history convolutional neural network and GoogLeNet are further investigated for their capability of autonomous online anomaly classification and are found to classify anomalies effectively. In terms of accuracy, the proposed ensemble neural network model outperforms the other three machine learning techniques. This study also evaluates the proposed ensemble neural network model on a blind test dataset. The results show that this ensemble model is effective for data anomaly detection and applicable to signals whose characteristics change over time.

Export-Import Value Nowcasting Procedure Using Big Data-AIS and Machine Learning Techniques

  • NICKELSON, Jimmy;NOORAENI, Rani;EFLIZA, EFLIZA
    • Asian Journal of Business Environment / Vol. 12, No. 3 / pp. 1-12 / 2022
  • Purpose: This study aims to investigate whether AIS data can be used as a supporting indicator or as an initial signal to describe Indonesia's export-import conditions in real time. Research design, data, and methodology: This study performs several stages of data selection to obtain indicators from AIS that truly reflect export-import activities in Indonesia. It also investigates the potential of AIS indicators for forecasting the value and volume of Indonesian exports and imports using conventional statistical methods and machine learning techniques. Results: The six preprocessing stages defined in this study filtered the AIS data from 661.8 million messages to 73.5 million messages. Seven predictors were formed from the selected AIS data. The AIS indicator can be used to provide an initial signal about Indonesia's import-export activities. Each export or import activity has its own predictor. Conventional statistical methods and machine learning techniques have the same ability in forecasting both Indonesia's exports and imports. Conclusions: Big data from AIS can be used as a supporting indicator and as a signal of the condition of export-import values in Indonesia. The right method of building indicators can make the data valuable for the performance of the forecasting model.

On the Application of Channel Characteristic-Based Physical Layer Authentication in Industrial Wireless Networks

  • Wang, Qiuhua;Kang, Mingyang;Yuan, Lifeng;Wang, Yunlu;Miao, Gongxun;Choo, Kim-Kwang Raymond
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 6 / pp. 2255-2281 / 2021
  • Channel characteristic-based physical layer authentication is one potential identity authentication scheme in wireless communication, for example in a fog computing environment. While existing channel characteristic-based physical layer authentication schemes may be efficient when deployed in a conventional wireless network environment, they may be less efficient and practical in an industrial wireless communication environment because of its differing requirements. We observe that this topic is understudied, and therefore in this paper we review the construction of several commonly used test statistics and analyze their performance in typical industrial wireless networks using simulation experiments. The findings from the simulations reveal a number of limitations in existing channel characteristic-based physical layer authentication schemes. We therefore propose combining machine learning and multiple test statistics for identity authentication in future industrial wireless network deployments. Experiments with four machine learning methods show that the scheme significantly improves authentication accuracy and solves the challenge of choosing a threshold.

Application of Artificial Neural Networks to Search for Gravitational-Wave Signals Associated with Short Gamma-Ray Bursts

  • Oh, Sang Hoon;Kim, Kyungmin;Harry, Ian W.;Hodge, Kari A.;Kim, Young-Min;Lee, Chang-Hwan;Lee, Hyun Kyu;Oh, John J.;Son, Edwin J.
    • The Bulletin of The Korean Astronomical Society / Vol. 39, No. 2 / pp. 107.1-107.1 / 2014
  • We apply a machine learning algorithm, an artificial neural network, to the search for gravitational-wave signals associated with short gamma-ray bursts. Multi-dimensional samples consisting of the statistical and physical quantities produced by the coherent search pipeline are fed into the artificial neural network to distinguish simulated gravitational-wave signals from background noise artifacts. Our results show that the artificial neural network improves the data classification efficiency at a fixed false alarm probability in comparison with the conventional detection statistic. This algorithm therefore increases the distance at which a gravitational-wave signal could be observed in coincidence with a gamma-ray burst. We also evaluate the gravitational-wave data within a few seconds of the selected short gamma-ray bursts' event times using the trained networks and obtain the false alarm probability. We suggest that an artificial neural network can be a complementary method to the conventional detection statistic for identifying gravitational-wave signals related to short gamma-ray bursts.

Implementation and Experimental Results of Neural Network and Genetic Algorithm based Spam Filtering Technique

  • 김범배;최형기
    • The KIPS Transactions: Part C / Vol. 13C, No. 2 / pp. 259-266 / 2006
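  • As the volume of spam mail has surged, a variety of spam mail filtering techniques have been proposed. Among these, learning-based filtering is currently one of the most widely used. This paper presents a learning-based filtering technique that uses a neural network, a genetic algorithm, and the chi-square statistic. The proposed technique addresses the problems of existing filtering techniques and can provide high accuracy in spam mail filtering. It achieves accuracies of 95.25% and 95.31% in filtering spam mail and legitimate mail, respectively. These results are more than 7% and 12% higher than those of existing rule-based and Bayesian filtering techniques, respectively.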
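
The abstract above does not say how the chi-square statistic is used. One common role in text filtering is scoring how strongly a term is associated with the spam class, and the sketch below shows that calculation for a 2x2 contingency table. The function name, the argument layout, and the counts are assumptions for illustration, not the paper's method.

```python
def chi_square(spam_with, ham_with, spam_without, ham_without):
    """Chi-square statistic for a 2x2 table of term presence versus mail class."""
    a, b, c, d = spam_with, ham_with, spam_without, ham_without
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A row or column is empty, so the term carries no usable signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator

# Made-up counts: a term found in 40 of 50 spam mails and 10 of 50 legitimate mails.
associated = chi_square(40, 10, 10, 40)
# Made-up counts: a term spread evenly across both classes.
independent = chi_square(25, 25, 25, 25)
```

A higher score suggests the term is more useful as a classifier input; terms scoring near zero are candidates for removal.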

A novel window strategy for concept drift detection in seasonal time series

  • 이도운;배수민;김강섭;안순홍
    • Proceedings of the Korea Information Processing Society Conference / Korea Information Processing Society 2023 Spring Conference / pp. 377-379 / 2023
  • Concept drift detection on data streams is a major issue in maintaining the performance of machine learning models. Because an online stream is a function of time, classical statistical methods are hard to apply. In the particular case of seasonal time series, however, a novel window strategy based on Fourier analysis makes it possible to adapt classical methods to the series. We explore an adaptation of the KS test to periodic time series and show that this strategy handles a complicated time series as an ordinary tabular dataset. We verify that detection with this strategy ranks second in time delay and shows the best false alarm rate and detection accuracy compared with detection using arbitrary window sizes.
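
One way to read the window strategy described in the abstract above is to estimate the season length with a Fourier transform, cut the series into windows of exactly one season, and compare each window with a baseline season using the two-sample KS statistic. The sketch below follows that reading only. The function names, the naive DFT, the large-sample 5% critical value (coefficient 1.358, rough for short windows), and the noise-free synthetic series are assumptions rather than the authors' procedure.

```python
import cmath
import math

def dominant_period(reference):
    """Estimate the season length from the strongest non-zero DFT bin."""
    n = len(reference)
    mean = sum(reference) / n
    centered = [x - mean for x in reference]
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return round(n / best_k)

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both CDFs past every value <= x so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def detect_drift(series, reference_length, coefficient=1.358):
    """Return the start index of the first drifting window, or None."""
    period = dominant_period(series[:reference_length])
    baseline = series[:period]
    # Large-sample critical value for two windows of equal size `period`.
    critical = coefficient * math.sqrt(2 / period)
    for start in range(period, len(series) - period + 1, period):
        if ks_statistic(baseline, series[start:start + period]) > critical:
            return start
    return None

# Synthetic series: season of 12 steps, level shift of +3 from index 60 onward.
pattern = [math.sin(2 * math.pi * i / 12) for i in range(12)]
series = [pattern[t % 12] + (3.0 if t >= 60 else 0.0) for t in range(96)]
drift_start = detect_drift(series, reference_length=48)
```

Because each window spans one full season, the seasonal swing appears in both samples and only a change in distribution moves the KS statistic. The season length is estimated on a drift-free reference stretch, since a level shift would otherwise dominate the low-frequency bins.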

Evaluating flexural strength of concrete with steel fibre by using machine learning techniques

  • Sharma, Nitisha;Thakur, Mohindra S.;Upadhya, Ankita;Sihag, Parveen
    • Composite Materials and Engineering / Vol. 3, No. 3 / pp. 201-220 / 2021
  • In this study, the potential of three machine learning techniques, i.e., M5P, support vector machines, and Gaussian processes, was evaluated to find the best algorithm for predicting the flexural strength of concrete mixes with steel fibre. The study compares the results obtained from these techniques for a given dataset. The dataset consists of 124 observations from past research studies and is randomly divided into training and testing subsets in a (70-30)% proportion by weight. Cement, fine aggregates, coarse aggregates, water, superplasticizer/high-range water reducer, steel fibre, fibre length, and curing days were taken as input parameters, whereas the flexural strength of the concrete mix was taken as the output parameter. The performance of the techniques was checked with statistical evaluation parameters. The results show that the Gaussian process technique works better than the other techniques, with the minimum error bandwidth. Statistical analysis shows that the Gaussian process predicts better results, with a higher coefficient of correlation (0.9138), the minimum mean absolute error (1.2954), and a root mean square error of 1.9672. Sensitivity analysis indicates that steel fibre is the most significant parameter for predicting the flexural strength of the concrete mix. With respect to fibre shape, the mixed type performs better for this data than the hooked shape, with a higher CC of 0.9649, which shows that the shape of the fibres does affect the flexural strength of the concrete. However, the intricacy of the mixed fibres needs further investigation. For future mixes, the most favorable range for increasing the flexural strength of the concrete mix was found to be (1-3)%.
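
The three evaluation parameters reported in the abstract above have standard definitions, sketched below under the assumption that CC denotes the Pearson correlation between observed and predicted values. The function name and the sample strengths are illustrative and not taken from the paper.

```python
import math

def evaluation_metrics(observed, predicted):
    """Return (CC, MAE, RMSE) for paired observed and predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    covariance = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # Pearson correlation; undefined (division by zero) if either input is constant.
    cc = covariance / math.sqrt(var_o * var_p)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return cc, mae, rmse

# Made-up flexural strengths, for illustration only.
observed = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 4.5, 6.5, 6.5]
cc, mae, rmse = evaluation_metrics(observed, predicted)
```

MAE and RMSE are in the units of the predicted quantity, while CC is unitless, so a model can rank differently on the three measures.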

Evaluation of soil-concrete interface shear strength based on LS-SVM

  • Zhang, Chunshun;Ji, Jian;Gui, Yilin;Kodikara, Jayantha;Yang, Sheng-Qi;He, Lei
    • Geomechanics and Engineering / Vol. 11, No. 3 / pp. 361-372 / 2016
  • The soil-concrete interface shear strength, although extensively studied, is still difficult to predict because it depends on many factors such as normal stresses, surface roughness, particle sizes, moisture contents, and dilation angles of soils. In this study, a well-known rigorous statistical learning approach, namely the least squares support vector machine (LS-SVM), realized in a ubiquitous spreadsheet platform, is first used to estimate the soil-structure interface shear strength. Instead of studying the complicated mechanism, LS-SVM makes it possible to explore the link between the fundamental factors and the interface shear strengths through a sophisticated statistical approach. As a preliminary investigation, the authors study the expansive soils that are found extensively in most countries. To reduce the complexity, three major influential factors, namely the initial moisture contents, initial dry densities, and normal stresses of soils, are taken into account in developing the LS-SVM models for the soil-concrete interface shear strengths. The results predicted by LS-SVM show reasonably good agreement with experimental data from direct shear tests.