• Title/Summary/Keyword: random data analysis

Study on the Effect of Training Data Sampling Strategy on the Accuracy of the Landslide Susceptibility Analysis Using Random Forest Method (Random Forest 기법을 이용한 산사태 취약성 평가 시 훈련 데이터 선택이 결과 정확도에 미치는 영향)

  • Kang, Kyoung-Hee;Park, Hyuck-Jin
    • Economic and Environmental Geology / v.52 no.2 / pp.199-212 / 2019
  • In machine learning, the sampling strategy for the training data affects the performance of a prediction model, including its generalization ability and prediction accuracy. In landslide susceptibility analysis in particular, data sampling is an essential step in preparing the training data because non-landslide points far outnumber landslide points. However, previous studies did not consider different sampling methods and simply selected the training data at random. This study therefore proposes several sampling methods and assesses the effect of the training data sampling strategy on landslide susceptibility analysis. A total of six scenarios were set up based on the sampling strategies for landslide and non-landslide points. A Random Forest model was trained on each of the six scenarios, and the attribute importance of each input variable was evaluated. Landslide susceptibility maps were then produced using the input variables and their attribute importances. The AUC values of the susceptibility maps obtained from the six sampling strategies showed high prediction rates, ranging from 70% to 80%. This indicates that the Random Forest technique shows appropriate predictive performance and that the attribute importances it yields can be used as weights for landslide conditioning factors in susceptibility analysis. In addition, results obtained with specific sampling strategies show higher prediction accuracy than results obtained with the conventional random sampling method.
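The class imbalance described in this abstract is often handled by drawing a limited number of non-landslide points at random, which is the baseline the authors compare against. The following is a minimal sketch of that baseline only, not of the paper's six scenarios; the function name, the `ratio` parameter, and the toy coordinates are assumptions made for illustration.

```python
import random

def balanced_random_sample(landslide_pts, non_landslide_pts, ratio=1.0, seed=0):
    """Keep every landslide point and draw non-landslide points at random.

    ratio is the number of non-landslide points per landslide point
    (a hypothetical parameter, not taken from the paper).
    """
    rng = random.Random(seed)
    n_neg = min(len(non_landslide_pts), round(len(landslide_pts) * ratio))
    negatives = rng.sample(non_landslide_pts, n_neg)
    # Label 1 = landslide, label 0 = non-landslide.
    samples = [(p, 1) for p in landslide_pts] + [(p, 0) for p in negatives]
    rng.shuffle(samples)
    return samples

# Toy data: 20 landslide cells against 2000 stable cells.
landslides = [(i, i) for i in range(20)]
stable = [(i, -i) for i in range(1, 2001)]
train = balanced_random_sample(landslides, stable)
```

With `ratio=1.0` the training set holds as many non-landslide points as landslide points; other sampling strategies would replace the `rng.sample` call with a rule of their own.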

Random Vibration Analysis of Portable Power Supply Container for Radar With U.S. Military Standards (미 군사규격을 적용한 레이더 전력공급용 이동식 컨테이너의 Random Vibration 해석)

  • Do, Jae-Seok;Hur, Jang-Wook
    • Journal of the Korean Society of Manufacturing Process Engineers / v.21 no.9 / pp.71-77 / 2022
  • In times of war or emergency, weapon systems such as radars must receive stable power. This can be achieved using improved onboard portable power systems built from steel containers. However, random vibration during transportation by vehicle or train can cause a breakdown, and electrical power shortages or restrictions pose a significant threat to security. In this study, Composite Wheeled Vehicle (CWV) data and rail cargo data with Acceleration Spectral Density (ASD), specified in MIL-STD-810H Method 514.8, were used as input data for a three-axis random vibration analysis in ANSYS 19.2. Modal analysis was performed up to 500 Hz, and deformations in modes 1 to 117 were calculated to utilize all ASD data. The maximum equivalent stress in each of the three axis directions was obtained from the random vibration analysis, and the margin of safety was calculated from the derived equivalent stress and the material properties. Overall, the analysis verified that the portable container designed for the power supply system satisfied the vibration requirements.
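The abstract does not state which margin-of-safety formula was used. One common definition divides the allowable stress by the product of a factor of safety and the applied stress, then subtracts one. The sketch below assumes that definition; the function name and the numeric values are hypothetical and are not taken from the paper.

```python
def margin_of_safety(allowable_stress, applied_stress, factor_of_safety=1.0):
    """Margin of safety under an assumed definition:
    MS = allowable / (factor_of_safety * applied) - 1.
    A positive value means the part has reserve strength.
    """
    if applied_stress <= 0 or factor_of_safety <= 0:
        raise ValueError("applied stress and factor of safety must be positive")
    return allowable_stress / (factor_of_safety * applied_stress) - 1.0

# Hypothetical values in MPa: yield strength 250, equivalent stress 100.
ms = margin_of_safety(250.0, 100.0, factor_of_safety=1.25)
```

A result at or above zero would indicate that the design meets the assumed requirement for that load case.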

A HGLM framework for Meta-Analysis of Clinical Trials with Binary Outcomes

  • Ha, Il-Do
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1429-1440 / 2008
  • In a meta-analysis combining the results from different clinical trials, it is important to consider possible heterogeneity in outcomes between trials. Such variation can be regarded as random effects, so random-effect models such as HGLMs (hierarchical generalized linear models) are very useful. In this paper, we propose an HGLM framework for analyzing binomial response data in which the odds ratios may vary between clinical trials. We also present prediction intervals for the random effects, which are useful in practice for investigating the heterogeneity of the trial effects. The proposed method is illustrated with a real data set on 22 trials about respiratory tract infections. We further demonstrate that an appropriate HGLM can be confirmed via model-selection criteria.

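To illustrate what between-trial heterogeneity in odds ratios means, the sketch below pools log odds ratios with the classical DerSimonian-Laird random-effects estimator. This is a simpler stand-in, not the HGLM approach proposed in the paper. The function names, the always-on 0.5 continuity correction, and the trial counts are assumptions chosen for illustration.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its variance for one 2x2 table.

    a, b: events and non-events in the treatment arm;
    c, d: events and non-events in the control arm.
    A 0.5 continuity correction is always added (an assumption).
    """
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def dersimonian_laird(effects, variances):
    """Pooled effect, its standard error, and between-trial variance tau^2.

    Requires at least two trials.
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2

# Hypothetical trials: (events_trt, non_events_trt, events_ctl, non_events_ctl).
trials = [(10, 90, 20, 80), (5, 45, 12, 38), (30, 70, 32, 68)]
stats = [log_odds_ratio(*t) for t in trials]
pooled, se, tau2 = dersimonian_laird([s[0] for s in stats], [s[1] for s in stats])
```

A `tau2` of zero means the trials are consistent with a single common odds ratio; larger values indicate more heterogeneity between trials.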

Performance Analysis of Perturbation-based Privacy Preserving Techniques: An Experimental Perspective

  • Ritu Ratra;Preeti Gulia;Nasib Singh Gill
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.81-88 / 2023
  • Enormous amounts of data are now produced every second. These data include private information from sources such as media platforms, banking, finance, healthcare, and criminal records. Data mining is a method for searching and analyzing massive volumes of data to find usable information. Preserving personal data during data mining has become difficult, and privacy-preserving data mining (PPDM) is used to address this. Data perturbation is one of several techniques used in PPDM: datasets are perturbed in order to protect personal information, and the approach addresses both data accuracy and data privacy. This paper explores and compares several perturbation strategies that may be used to protect data privacy. For the experiment, two perturbation techniques were used, one based on random projection and one on principal component analysis: Improved Random Projection Perturbation (IRPP) and the Enhanced Principal Component Analysis based Technique (EPCAT). The Naive Bayes classification algorithm is used as the data mining method. The techniques are assessed in terms of the precision, run time, and accuracy of the experimental results. The random projection-based technique (IRPP) is found to be the best perturbation method under Naive Bayes classification for both the cardiovascular and hypothyroid datasets.
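As background for the random projection approach named in this abstract, the sketch below shows plain Gaussian random projection, which maps each record to a lower-dimensional vector so the original attribute values are not released directly. It does not reproduce IRPP or EPCAT; the function name, the target dimension `k`, and the toy records are assumptions.

```python
import math
import random

def random_projection(rows, k, seed=0):
    """Project d-dimensional rows to k dimensions with a Gaussian matrix.

    Entries of the projection matrix are drawn from N(0, 1) and scaled by
    1/sqrt(k), which preserves squared distances in expectation.
    """
    rng = random.Random(seed)
    d = len(rows[0])
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)]
              for _ in range(d)]
    return [[sum(row[i] * matrix[i][j] for i in range(d)) for j in range(k)]
            for row in rows]

# Toy records with four numeric attributes each.
data = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]
masked = random_projection(data, k=2)
```

A classifier such as Naive Bayes would then be trained on `masked` instead of `data`, and its accuracy compared with that of a model trained on the original records.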

Prediction of New Confirmed Cases of COVID-19 based on Multiple Linear Regression and Random Forest (다중 선형 회귀와 랜덤 포레스트 기반의 코로나19 신규 확진자 예측)

  • Kim, Jun Su;Choi, Byung-Jae
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.4 / pp.249-255 / 2022
  • The COVID-19 virus appeared in 2019 and is extremely contagious, and as a result it has had a huge impact on people's mobility. In this paper, multiple linear regression and random forest models are used to predict the number of new COVID-19 cases from COVID-19 infection status data (open data provided by the Ministry of Health and Welfare) and Google Mobility Data, which tracks movement across various categories of places. The data were divided into two sets. The first consists of the COVID-19 infection status data and all six variables of Google Mobility Data. The second consists of the COVID-19 infection status data and only two variables of Google Mobility Data: (1) retail stores and leisure facilities and (2) grocery stores and pharmacies. The models' performance was compared using the mean absolute error. We also carried out a correlation analysis for the random forest model and the multiple linear regression model.
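The mean absolute error used to compare the two models is the average of the absolute differences between observed and predicted values. A minimal sketch follows; the case counts are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical daily case counts and model predictions.
observed = [100, 200, 300]
forecast = [110, 190, 330]
mae = mean_absolute_error(observed, forecast)
```

A lower value means the predictions are closer to the observed counts on average, in the same unit as the data (cases per day here).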

A Study on Error of Frequence Rainfall Estimates Using Random Variate (무작위변량을 이용한 강우빈도분석시 내외삽오차에 관한 연구)

  • Chai, Han Kyu;Eam, Ki Ok
    • Journal of Industrial Technology / v.20 no.A / pp.159-167 / 2000
  • In rainfall frequency analysis, the error term and the fitted probability distribution differ depending on the record length of the data. Error analysis methods normally use the actual sample data, but they do not reveal the error reliably when the number of sample data is small. Considering the randomness of daily rainfall data, this study therefore randomized the sample data with the Monte Carlo method, formed data sets of different record lengths, and compared the resulting split frequency analyses with an assumed parent distribution. In conclusion, frequency analysis of rainfall in the Chuncheon region showed a small RMSE for the Gamma II distribution. In the RMSE estimates obtained using random variates, the RMSE increased gradually as the return period increased and decreased as the number of data increased.

Analysis of Output Stream Characteristics Processing in Digital Hardware Random Number Generator (디지털 하드웨어 난수 발생기에서 출력열 특성 처리 분석)

  • Hong, Jin-Keun
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.3 / pp.1147-1152 / 2012
  • This paper addresses the analysis of output stream characteristics of a digital hardware random number generator applied in the medical field. The output stream of a hardware-based binary random number generator is affected by factors such as delay, jitter, and temperature. The paper presents the major factors that affect the hardware random number output stream and analyzes the randomness of the output stream data, in which the output stream is combined with postprocessing such as encryption and encoding algorithms. The results are evaluated using major randomness test items.

An analysis of the gyro random process (자이로 랜덤 프로세스의 분석)

  • 고영웅;김경주;이재철;권태무
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 1996.10b / pp.210-212 / 1996
  • Random drift rate (i.e., random drift in angle rate) of a gyro represents the major error source of inertial navigation systems that are required to operate over long time intervals. It is uncorrectable and leads to an increase in the error with the passage of time. In this paper, a technique for analyzing a random process from experimental data is presented, together with the results. The problem of estimating the a priori statistics of a random process is considered using time averages of experimental data. Time averages are calculated and used in optimal data-processing techniques to determine the statistics of the random process. The contribution of each component to the gyro drift process can therefore be measured quantitatively by its statistics. The techniques are applied to actual gyro drift rate data with satisfactory results.

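Estimating the statistics of a random process from time averages can be sketched as follows, assuming the drift record is stationary and ergodic so that averages over time stand in for ensemble averages. The function name and the sample values are hypothetical; the paper's actual estimators are not given in the abstract.

```python
def time_average_stats(x, max_lag):
    """Sample mean and biased autocovariance estimates for lags 0..max_lag.

    Each autocovariance sum is divided by the full record length n
    (the biased estimator); lag 0 gives the variance.
    """
    n = len(x)
    if n == 0 or max_lag >= n:
        raise ValueError("need a non-empty record longer than max_lag")
    mean = sum(x) / n
    centered = [v - mean for v in x]
    autocov = [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]
    return mean, autocov

# Hypothetical drift-rate samples (units such as deg/h are assumed).
drift = [1.0, 2.0, 3.0, 4.0]
mean, autocov = time_average_stats(drift, max_lag=1)
```

The shape of the autocovariance as a function of lag is what allows separate noise components to be told apart, for example white noise against a slowly varying bias.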

Correlation Analysis of Airline Customer Satisfaction using Random Forest with Deep Neural Network and Support Vector Machine Model

  • Hong, Sang Hoon;Kim, Bumsu;Jung, Yong Gyu
    • International Journal of Internet, Broadcasting and Communication / v.12 no.4 / pp.26-32 / 2020
  • Many airline customer evaluation data are available, but they are insufficient for predicting customer satisfaction in practice. In particular, little has been done to verify the value of the data or to develop a customer satisfaction prediction model based on them. In this paper, airline customer satisfaction is analyzed through a correlation analysis experiment on customer evaluation data provided by Google's Kaggle. Accuracy varied across three variable sets: all variables, the top 4 variables, and the top 8 variables with the highest correlation. To build an airline customer satisfaction prediction model, these sets were applied to three classification algorithms (Random Forest, SVM, and DNN) in a classification experiment. The data were divided into training data and verification data at a ratio of 7:3. As a result, the DNN model showed the lowest accuracy at 86.4%, the SVM model reached 89%, and the Random Forest model showed the highest accuracy and performance at 95.7%.
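The 7:3 division into training and verification data can be sketched as a shuffled split. The abstract does not say how records were assigned, so the shuffling, the seed, and the function name below are assumptions.

```python
import random

def split_7_to_3(records, seed=0):
    """Shuffle a copy of the records and split it 70% / 30%."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

# Hypothetical record identifiers.
rows = list(range(100))
train_rows, verify_rows = split_7_to_3(rows)
```

Fixing the seed keeps the split reproducible, so the three classifiers are compared on the same verification data.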

Construction of an Internet of Things Industry Chain Classification Model Based on IRFA and Text Analysis

  • Zhimin Wang
    • Journal of Information Processing Systems / v.20 no.2 / pp.215-225 / 2024
  • With the rapid development of Internet of Things (IoT) and big data technology, related industries generate large amounts of data during operation. Classifying these data accurately has become central to research on data mining and processing in the IoT industry chain. This study constructs a classification model for the IoT industry chain based on an improved random forest algorithm and text analysis, aiming to classify IoT industry chain big data efficiently and accurately by improving on traditional algorithms. The accuracy, precision, recall, and AUC of the traditional Random Forest algorithm and of the algorithm used in the paper are compared on different datasets. The experimental results show that the proposed model performs better on different datasets: its accuracy and recall on four datasets are better than those of the traditional algorithm, its accuracy on two datasets, P-I Diabetes and Loan Default, is better than that of the random forest model, and its final classification results are better. The model makes it possible to classify the massive data generated in the IoT industry chain accurately, which adds research value for data mining and processing technology in the IoT industry chain.