• Title/Summary/Keyword: Imputation accuracy

Smoothed RSSI-Based Distance Estimation Using Deep Neural Network (심층 인공신경망을 활용한 Smoothed RSSI 기반 거리 추정)

  • Hyeok-Don Kwon;Sol-Bee Lee;Jung-Hyok Kwon;Eui-Jik Kim
    • Journal of Internet of Things and Convergence, v.9 no.2, pp.71-76, 2023
  • This paper proposes a smoothed received signal strength indicator (RSSI)-based distance estimation scheme that uses a deep neural network (DNN) to estimate distance accurately in an environment with a single receiver. The scheme preprocesses the data in three steps (data splitting, missing value imputation, and smoothing) to derive smoothed RSSI values and thereby improve distance estimation accuracy. The smoothed RSSI values are fed to a Multi-Input Single-Output (MISO) DNN model, pass through the input and hidden layers, and are returned as an estimated distance at the output layer. To evaluate the proposed scheme, its performance was compared with that of a linear regression-based distance estimation scheme. The proposed scheme showed 29.09% higher distance estimation accuracy than the linear regression-based scheme.
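
The imputation and smoothing steps named in this abstract can be sketched as follows. The abstract does not say which imputation rule or smoothing filter the authors use, so linear interpolation and a trailing moving average are assumptions chosen for illustration, and the function names and sample values are hypothetical.

```python
from typing import List, Optional


def impute_missing(values: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between known neighbours.

    Gaps at the start or end are filled with the nearest known value.
    Assumption: the abstract does not specify the imputation rule.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one observed value is required")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Smooth with a trailing moving average over at most `window` samples.

    Assumption: the abstract does not specify the smoothing filter.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up RSSI readings in dBm; None marks a missing sample.
raw_rssi = [-60.0, None, -64.0, -70.0, None, None, -61.0]
smoothed_rssi = moving_average(impute_missing(raw_rssi), window=3)
```

In a pipeline like the one described, values such as `smoothed_rssi` would then serve as the model inputs; the model itself is not sketched here.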

Development of Machine Learning Based Precipitation Imputation Method (머신러닝 기반의 강우추정 방법 개발)

  • Heechan Han;Changju Kim;Donghyun Kim
    • Journal of Wetlands Research, v.25 no.3, pp.167-175, 2023
  • Precipitation data are an essential input in fields such as wetland management, hydrological simulation, and water resource management. To manage water resources efficiently with precipitation data, it is essential to secure as much data as possible by minimizing the missing rate. More efficient hydrological simulation is also possible if precipitation data for ungauged areas are secured. However, missing precipitation data have so far been estimated mainly with statistical equations. This study proposes a new method for restoring missing precipitation data using machine learning algorithms, which can predict new data based on correlations between data, and evaluates the applicability of these techniques against existing statistical methods. Two representative machine learning algorithms, Artificial Neural Network (ANN) and Random Forest (RF), were applied. In classifying the occurrence of precipitation, RF was more accurate than ANN: the F1-score and accuracy, the evaluation indicators of the classification model, were 0.80 and 0.77 for RF and 0.76 and 0.71 for ANN. RF also estimated precipitation more accurately than ANN. The RMSE of the RF and ANN algorithms was 2.8 mm/day and 2.9 mm/day, respectively, and the values were calculated as 0.68 and 0.73.
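
For readers unfamiliar with the evaluation indicators quoted in this abstract, the standard definitions of accuracy, F1-score, and RMSE can be written out as below. This is a minimal sketch; the labels and precipitation values are invented for illustration and are not the study's data.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    return 0.0 if denominator == 0 else 2 * tp / denominator


def rmse(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Root mean squared error between observed and estimated values."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Invented example: 1 = rain occurred, 0 = no rain.
rain_true = [1, 0, 1, 1, 0, 1]
rain_pred = [1, 0, 0, 1, 1, 1]
occurrence_accuracy = accuracy(rain_true, rain_pred)
occurrence_f1 = f1_score(rain_true, rain_pred)

# Invented daily precipitation amounts in mm/day.
amount_error = rmse([0.0, 2.0, 5.0], [0.0, 4.0, 5.0])
```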

Estimation of the Percent of the Vote by Adjustment of Voter Turnout in Election Polls (선거여론조사에서 투표율 반영을 통한 득표율 추정)

  • Kim, Jeonghoon;Han, Sang-Tae;Kang, Hyuncheol
    • Journal of the Korean Data Analysis Society, v.20 no.6, pp.2873-2881, 2018
  • Obtaining objective and credible information through election polls is very important, both for informing voters' decisions and for helping candidates and political parties establish appropriate election strategies. Many organizations, including political parties, media organizations, and research institutions, have therefore been working to improve the accuracy of poll results and election predictions. Kim et al. (2017) analyzed the non-response group in election surveys, that is, respondents who named no candidate they support, in order to improve the accuracy of vote share estimates. They confirmed that the accuracy of these estimates can be improved significantly by appropriately classifying the non-response stratum. This study proposes a method for estimating turnout for each stratum (sex, age group) when the total turnout rate for a specific district (region) is given, together with a procedure for predicting vote share that reflects this turnout. Case studies were conducted using data gathered through telephone interviews for the 20th National Assembly elections in 2016.
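
One way to picture a turnout adjustment of this kind is sketched below. The abstract does not give the authors' estimation procedure, so both steps here are assumptions: prior stratum turnout rates are scaled by a single common factor to match the given total turnout, and each stratum's poll support is then weighted by its expected number of voters. All names and numbers are hypothetical.

```python
from typing import List


def scale_turnout(population: List[float], prior_turnout: List[float],
                  total_turnout: float) -> List[float]:
    """Scale prior stratum turnout rates by one common factor so that the
    population-weighted turnout equals `total_turnout`.

    Assumption: proportional scaling is an illustrative choice, not the
    paper's method.
    """
    implied = sum(n * r for n, r in zip(population, prior_turnout)) / sum(population)
    factor = total_turnout / implied
    scaled = [r * factor for r in prior_turnout]
    if any(r > 1.0 for r in scaled):
        raise ValueError("scaled turnout exceeds 100% in at least one stratum")
    return scaled


def turnout_adjusted_share(population: List[float], turnout: List[float],
                           support: List[float]) -> float:
    """Vote share of one candidate, weighting each stratum by expected voters."""
    voters = [n * t for n, t in zip(population, turnout)]
    return sum(v * p for v, p in zip(voters, support)) / sum(voters)


# Hypothetical district with two strata.
population = [1000.0, 1000.0]   # eligible voters per stratum
prior_turnout = [0.4, 0.8]      # assumed prior turnout rates per stratum
support = [0.6, 0.3]            # poll support for one candidate per stratum

turnout = scale_turnout(population, prior_turnout, total_turnout=0.66)
adjusted_share = turnout_adjusted_share(population, turnout, support)
unadjusted_share = sum(n * p for n, p in zip(population, support)) / sum(population)
```

In this made-up example the unadjusted share is 0.45, while weighting by turnout lowers it to about 0.40 because the stratum that favors the candidate votes at a lower rate.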

A Study for Traffic Forecasting Using Traffic Statistic Information (교통 통계 정보를 이용한 속도 패턴 예측에 관한 연구)

  • Choi, Bo-Seung;Kang, Hyun-Cheol;Lee, Seong-Keon;Han, Sang-Tae
    • The Korean Journal of Applied Statistics, v.22 no.6, pp.1177-1190, 2009
  • Traffic operating speed is an important piece of information for measuring road capacity. When a navigation service provides information on heavily trafficked roads, supplying both current and forecasted traffic information is a key function for giving more accurate expected times and intervals. In this study, we propose a traffic speed forecasting model that uses accumulated traffic speed data for roads and highways, and we forecast the average speed for each road and highway interval and each time interval using the Fourier transform and a time series regression model with trigonometric functions. We also propose a suitable method for imputing missing data and treating outliers to raise the accuracy of the traffic speed forecasts, as well as a speed grouping method that groups data with similar traffic speed patterns to make the analysis more efficient.
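
A regression on trigonometric functions of the kind mentioned in this abstract can be illustrated with a small harmonic fit. This sketch assumes evenly spaced samples that cover exactly one period, in which case the least-squares coefficients reduce to discrete Fourier sums; the authors' actual model specification is not given in the abstract, and the hourly speed profile below is synthetic.

```python
import math
from typing import List, Tuple

Harmonic = Tuple[float, float]  # (cosine coefficient, sine coefficient)


def fit_harmonics(y: List[float], n_harmonics: int) -> Tuple[float, List[Harmonic]]:
    """Fit mean + sum_k (a_k cos(2*pi*k*t/n) + b_k sin(2*pi*k*t/n)).

    Valid for evenly spaced samples t = 0..n-1 over one full period, where
    the regressors are orthogonal and least squares equals Fourier sums.
    """
    n = len(y)
    if n == 0 or not 0 <= n_harmonics < n / 2:
        raise ValueError("need 0 <= n_harmonics < len(y) / 2")
    mean = sum(y) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a = 2.0 / n * sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(y))
        b = 2.0 / n * sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(y))
        coeffs.append((a, b))
    return mean, coeffs


def predict(mean: float, coeffs: List[Harmonic], t: float, period: int) -> float:
    """Evaluate the fitted harmonic model at time index t."""
    total = mean
    for k, (a, b) in enumerate(coeffs, start=1):
        angle = 2 * math.pi * k * t / period
        total += a * math.cos(angle) + b * math.sin(angle)
    return total


# Synthetic 24-hour average speed profile in km/h (not real traffic data).
hourly_speed = [
    60.0 - 10.0 * math.cos(2 * math.pi * t / 24) + 5.0 * math.sin(2 * math.pi * 2 * t / 24)
    for t in range(24)
]
mean_speed, harmonics = fit_harmonics(hourly_speed, n_harmonics=2)
speed_at_6 = predict(mean_speed, harmonics, t=6, period=24)
```

With missing or unevenly spaced observations the orthogonality shortcut no longer holds, and an ordinary least-squares solve over the same cosine and sine regressors would be needed instead.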

Store Sales Prediction Using Gradient Boosting Model (그래디언트 부스팅 모델을 활용한 상점 매출 예측)

  • Choi, Jaeyoung;Yang, Heeyoon;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering, v.25 no.2, pp.171-177, 2021
  • With the rapid development of machine learning, its uses have diversified not only in industrial fields but also in daily life. Applications of machine learning to financial data have also attracted interest. Here, we apply machine learning algorithms to store sales data and present future applications for fintech enterprises. We handle missing data with several missing-data processing methods and apply the gradient boosting algorithms XGBoost, LightGBM, and CatBoost to predict the future revenue of individual stores. We found that median imputation of missing data combined with the XGBoost algorithm gives the best accuracy. Both fintech enterprises and their customers can benefit from the proposed method: stores can receive financial assistance in advance from fintech companies, while these companies can offer that support at low risk.
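
Median imputation, the missing-data treatment this abstract reports as working best, is simple enough to show directly. The sketch below covers only the imputation step on a single made-up column; the gradient boosting model is not reproduced, and the function name is hypothetical.

```python
from statistics import median
from typing import List, Optional


def median_impute(column: List[Optional[float]]) -> List[float]:
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to take a median from")
    fill_value = median(observed)
    return [fill_value if v is None else v for v in column]


# Made-up monthly sales for one store; None marks a missing month.
monthly_sales = [120.0, None, 95.0, 300.0, None, 110.0]
imputed_sales = median_impute(monthly_sales)
```

The median is less sensitive than the mean to outliers such as the 300.0 month above, which is one common reason to prefer it for skewed sales figures.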

Personalized Data Restoration Algorithm to Improve Wearable Device Service (웨어러블 디바이스 서비스 향상을 위한 개인 맞춤형 데이터 복원 알고리즘)

  • Kikun Park;Hye-Rim Bae
    • The Journal of Bigdata, v.6 no.2, pp.51-60, 2021
  • The market for wearable devices is growing rapidly every year, and manufacturers around the world are introducing products that exploit the unique characteristics of these devices to keep up with demand. Smart watches account for a very high share of wearable device sales, and they provide a variety of services to users based on information collected in real time. The quality of service depends on the accuracy of the data collected by the smart watch, but in some situations data cannot be measured. This paper introduces a method for restoring data that a smart watch could not collect. It describes how to calculate the similarity of trajectory information measured over time and introduces a procedure for restoring missing sections according to that similarity. To demonstrate the performance of the proposed methodology, a comparative experiment with a machine learning algorithm was conducted. Finally, the expected effects of this study and future research directions are discussed.

Accuracy of genomic-polygenic estimated breeding value for milk yield and fat yield in the Thai multibreed dairy population with five single nucleotide polymorphism sets

  • Wongpom, Bodin;Koonawootrittriron, Skorn;Elzo, Mauricio A.;Suwanasopee, Thanathip;Jattawa, Danai
    • Asian-Australasian Journal of Animal Sciences, v.32 no.9, pp.1340-1348, 2019
  • Objective: The objectives were to compare variance components, genetic parameters, prediction accuracies, and genomic-polygenic estimated breeding value (EBV) rankings for milk yield (MY) and fat yield (FY) in the Thai multibreed dairy population using five single nucleotide polymorphism (SNP) sets from the GeneSeek GGP80K chip. Methods: The dataset contained monthly MY and FY of 8,361 first-lactation cows from 810 farms. Variance components, genetic parameters, and EBV for five SNP sets from the GeneSeek GGP80K chip were obtained using a 2-trait single-step average-information restricted maximum likelihood procedure. The SNP sets were the complete SNP set (all available SNP; SNP100), top 75% set (SNP75), top 50% set (SNP50), top 25% set (SNP25), and top 5% set (SNP5). The 2-trait models included herd-year-season, heterozygosity, and age at first calving as fixed effects, and animal additive genetic and residual as random effects. Results: The estimates of additive genetic variances for MY and FY from SNP subsets were mostly higher than those of the complete set. The SNP25 MY and FY heritability estimates (0.276 and 0.183) were higher than those from SNP75 (0.265 and 0.168), SNP50 (0.275 and 0.179), SNP5 (0.231 and 0.169), and SNP100 (0.251 and 0.159). The SNP25 EBV accuracies for MY and FY (39.76% and 33.82%) were higher than for SNP75 (35.01% and 32.60%), SNP50 (39.64% and 33.38%), SNP5 (38.61% and 29.70%), and SNP100 (34.43% and 31.61%). All rank correlations between SNP100 and SNP subsets were above 0.98 for both traits, except for SNP100 and SNP5 (0.93 for MY; 0.92 for FY). Conclusion: The high SNP25 estimates of genetic variances, heritabilities, EBV accuracies, and rank correlations between SNP100 and SNP25 for MY and FY indicated that genotyping animals with a dedicated SNP25 chip would be suitable for keeping genotyping costs low while speeding up genetic progress for MY and FY in the Thai dairy population.