• Title/Summary/Keyword: Missing data patterns


Some Considerations on the On-site Applicability of PSA(Pulse Sequence Analysis) as a Partial Discharge Analysis Method (부분방전 해석 방법으로 PSA(Pulse Sequence Analysis)의 현장 적용성에 대한 고찰)

  • Kim, Jeong-Tae;Lee, Ho-Keun
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.18 no.5 / pp.484-489 / 2005
  • Because of its effectiveness for PD (partial discharge) pattern recognition, PSA (pulse sequence analysis) has been considered as a new analytical method to replace conventional PRPDA (phase-resolved partial discharge analysis). However, because PSA analyzes the correlation between sequential pulses, it is generally thought that it may misjudge patterns when data are missing owing to poor sensitivity, which has made practitioners hesitant to apply it on-site. Therefore, this paper investigates the problems of PSA in data-missing and noise-adding cases. For this purpose, PD data obtained from various defects, including noise-added data, were analyzed. The results showed that both cases could cause fatal errors in recognizing PD patterns. In the data-missing case, the error depended on the type of defect and the degree of degradation. It was also found that the error caused by added noise was larger than that caused by missing data.
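To see why a lost pulse matters, the sketch below computes the basic PSA quantities for each pair of consecutive pulses: the change in applied voltage and the time between pulses. It assumes the usual PSA convention that each pulse is tagged with the instantaneous applied voltage at its occurrence; the function name `psa_pairs` and the sample values are hypothetical. Dropping the middle pulse merges two intervals into one, so the features derived from that neighbourhood change, which is the misjudgment risk the abstract describes.

```python
from typing import List, Sequence, Tuple


def psa_pairs(times: Sequence[float], voltages: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (delta_U, delta_t) for each pair of consecutive pulses.

    times[i]: occurrence time of pulse i
    voltages[i]: applied voltage at that instant (assumed PSA convention)
    """
    return [
        (voltages[i] - voltages[i - 1], times[i] - times[i - 1])
        for i in range(1, len(times))
    ]


# Hypothetical pulse train of three pulses, then the same train with the middle pulse lost.
times = [0.0, 1.0, 3.0]
volts = [5.0, 2.0, 4.0]
full = psa_pairs(times, volts)                                   # [(-3.0, 1.0), (2.0, 2.0)]
missing = psa_pairs([times[0], times[2]], [volts[0], volts[2]])  # [(-1.0, 3.0)]
print(full, missing)
```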

A Study on Imputation using Adjusted Cohen Method

  • Chung, Sung-Suk;Chun, Young-Min;Lee, Sun-Kyung
    • Journal of the Korean Data and Information Science Society / v.17 no.3 / pp.871-888 / 2006
  • Many studies have developed procedures for dealing with missing values. The most common approach is to impute, that is, to assign substitute values to the missing data. The purpose of our study is to propose adjusted Cohen methods and to compare their efficiency with that of other methods through a simulation study. The adjusted Cohen methods use an auxiliary variable to rank the observations of the variable with missing values. This yields a smaller mean square error (MSE) than the original Cohen method.
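The abstract does not give the formulas, so the following is only a sketch of the general idea under stated assumptions: Cohen-type imputation fills missing values with the respondent mean plus or minus an offset d (rather than the mean alone) so that the completed data keep the respondents' variance, and the adjusted version uses an auxiliary variable x to decide which missing units get the upper value. Here d is chosen so that the completed sample variance equals the respondents' sample variance, and x is assumed to be positively correlated with y; the constant used in the paper may differ, and `adjusted_cohen_impute` is a hypothetical name.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Sequence


def adjusted_cohen_impute(y: Sequence[Optional[float]], x: Sequence[float]) -> List[float]:
    """Impute missing y as mean +/- d, sending units with larger auxiliary x upward.

    Assumption: with m missing units, k = 2 * (m // 2) of them receive +/- d.
    Their squared deviations add k * d**2, and matching the completed sample
    variance to the respondents' s**2 requires k * d**2 = m * s**2, so
    d = s * sqrt(m / k). An odd leftover unit receives the mean itself.
    """
    observed = [v for v in y if v is not None]
    ybar, s = mean(observed), stdev(observed)  # needs at least two respondents
    missing = [i for i, v in enumerate(y) if v is None]
    m = len(missing)
    half = m // 2
    k = 2 * half
    d = s * math.sqrt(m / k) if k else 0.0
    order = sorted(missing, key=lambda i: x[i])  # ascending auxiliary value
    out = list(y)
    for rank, i in enumerate(order):
        if rank < half:
            out[i] = ybar - d  # lowest auxiliary values
        elif rank >= m - half:
            out[i] = ybar + d  # highest auxiliary values
        else:
            out[i] = ybar      # middle unit when m is odd
    return out


y = [1.0, 2.0, 3.0, 4.0, 5.0, None, None]
x = [0.0, 0.0, 0.0, 0.0, 0.0, 10.0, -10.0]
completed = adjusted_cohen_impute(y, x)
print(completed)
```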

Prevalence and patterns of tooth agenesis among patients aged 12-22 years: A retrospective study

  • Eliacik, Basak Kiziltan;Atas, Cafer;Polat, Gunseli Guven
    • The Korean Journal of Orthodontics / v.51 no.5 / pp.355-362 / 2021
  • Objective: This study aimed to establish the prevalence and patterns of nonsyndromic tooth agenesis in patients referred to a tertiary health care facility. Methods: The intraoral records and panoramic radiographs of 9,874 patients aged 12-22 years were evaluated. The study group included 716 patients (371 male, 345 female) with nonsyndromic agenesis of at least one tooth (excluding third molars). The data were assessed using descriptive statistics, the chi-square test, and the Mann-Whitney U test, while patterns were evaluated using the tooth agenesis code (TAC) tool. Results: A total of 1,627 congenitally missing teeth were found in patients with nonsyndromic tooth agenesis, an average of 2.27 missing teeth per patient. The prevalence of tooth agenesis was 7.25%, and the most commonly missing teeth were the left mandibular second premolars (10.17%). The age-group comparison revealed no significant difference in the median number of missing teeth per patient across the age cutoff values between 12 and 22 years. When the missing teeth were examined separately by quadrant, 114 different tooth agenesis patterns were identified (upper right quadrant = 28, upper left quadrant = 27, lower left quadrant = 31, lower right quadrant = 28), and 81 of these patterns appeared only once. Conclusions: This study highlights the benefits of applying the TAC tool to a large sample population. Applying the TAC tool in such studies will enable the development of template treatment plans by identifying homogeneous patterns of tooth agenesis in specific populations.
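The abstract does not describe how the TAC tool encodes a pattern. A minimal sketch, assuming the commonly described binary scheme: tooth positions 1 to 8 of a quadrant (central incisor to third molar) are weighted 2^(position - 1), and a quadrant's code is the sum of the weights of its missing teeth, so every combination maps to a unique number. The helper names are hypothetical. The last lines only recheck the reported prevalence (716 of 9,874 patients) and the mean number of missing teeth (1,627 teeth over 716 patients).

```python
from collections import Counter
from typing import Iterable, List


def tac_code(missing_positions: Iterable[int]) -> int:
    """Sum of 2**(p - 1) over missing tooth positions p (1 = central incisor ... 8 = third molar)."""
    return sum(2 ** (p - 1) for p in set(missing_positions))


def tac_decode(code: int) -> List[int]:
    """Recover the missing tooth positions from a quadrant code."""
    return [p for p in range(1, 9) if code & (1 << (p - 1))]


def singleton_patterns(codes: Iterable[int]) -> int:
    """Count distinct non-zero quadrant codes that occur exactly once."""
    counts = Counter(c for c in codes if c)
    return sum(1 for n in counts.values() if n == 1)


# Lateral incisor (position 2) and second premolar (position 5) missing in one quadrant.
code = tac_code([2, 5])          # 2 + 16 = 18
prevalence = 716 / 9874 * 100    # percent of examined patients with agenesis
mean_missing = 1627 / 716        # missing teeth per affected patient
print(code, tac_decode(code), round(prevalence, 2), round(mean_missing, 2))
```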

Estimate method of missing data using Similarity in AMI system (AMI시스템에서 유사도를 활용한 누락데이터 보정 방법)

  • Kwon, Hyuk-Rok;Hong, Taek-Eun;Kim, Pan-Koo
    • Smart Media Journal / v.8 no.4 / pp.80-84 / 2019
  • As AMI (advanced metering infrastructure) has rapidly expanded and its products have spread, the variety of services that use electricity consumption data is increasing. To make these services more effective, missing metering data must be corrected; to do so, Euclidean similarity is used to find customers with similar usage patterns. Based on this process, we propose a method for correcting missing data and compare it with previous methods.
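A minimal sketch of the similarity step, assuming each customer's data is a list of interval readings with None for a missing reading: the Euclidean distance is computed over the intervals both profiles observe, and a missing reading is filled with the mean of the k most similar customers' readings at that interval. The value of k, the averaging rule, and the function names are assumptions; the abstract does not say how the similar customers' data are combined.

```python
import math
from typing import List, Optional, Sequence

Profile = Sequence[Optional[float]]


def distance_on_common(a: Profile, b: Profile) -> float:
    """Euclidean distance over intervals observed in both profiles (inf if none overlap)."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in common))


def fill_from_similar(target: Profile, others: List[Profile], k: int = 2) -> List[Optional[float]]:
    """Fill each missing reading with the mean of the k most similar customers' readings."""
    ranked = sorted(others, key=lambda o: distance_on_common(target, o))
    filled = list(target)
    for t, value in enumerate(target):
        if value is None:
            donors = [o[t] for o in ranked if o[t] is not None][:k]
            filled[t] = sum(donors) / len(donors) if donors else None
    return filled


target = [1.0, 2.0, None, 4.0]
others = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 3.2, 4.1], [9.0, 9.0, 9.0, 9.0]]
print(fill_from_similar(target, others))
```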

Comparative Evaluation of the Pollutant Load Estimation Method in the Water Quality Data Missing Intervals (수질자료 결측구간의 오염부하 추정기법 비교평가)

  • Cho, Beom-Jun;Cho, Hong-Yeon;Kahng, Sung-Hyun
    • Journal of Korean Society of Coastal and Ocean Engineers / v.19 no.1 / pp.45-56 / 2007
  • The pollutant load (PL) cannot be estimated directly when the flow discharge (water quantity) or water quality (WQ) time series contains missing intervals, so the data in those intervals must first be filled using an appropriate method. In this study, several methods for estimating water quality in the missing periods are proposed, and the resulting WQ and PL change patterns are compared and evaluated according to how well they reproduce the change patterns of the available data. The most appropriate method is then recommended, and a contribution factor that determines the degree of influence and the PL characteristics of the river estuary is also proposed. Based on the PL estimates from the several methods, the interpolation method that accounts for the fluctuation of the available WQ data was the most efficient. The PL pattern of the Han River estuary is classified as discharge-dominated. Data filling is unavoidable, and WQ should be estimated with an efficient and effective method in order to obtain a reasonable PL estimate.
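Because the load is discharge times concentration summed over time, any gap in the concentration series leaves the load undefined. The sketch fills interior WQ gaps by plain linear interpolation and then sums Q * C * dt. Note that this is a simpler stand-in, not the fluctuation-aware interpolation the study found most efficient, and the units (Q in m^3/s, C in mg/L, dt in seconds, giving grams) are assumptions.

```python
from typing import List, Optional, Sequence


def interpolate_linear(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None gaps on a straight line between the nearest observed neighbours.

    Leading or trailing gaps have no neighbour on one side and are left as None.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            frac = (i - a) / (b - a)
            out[i] = values[a] + frac * (values[b] - values[a])
    return out


def pollutant_load(discharge: Sequence[float], concentration: Sequence[float], dt_seconds: float) -> float:
    """Total load = sum(Q_i * C_i * dt); mg/L equals g/m^3, so m^3/s * g/m^3 * s gives grams."""
    return sum(q * c * dt_seconds for q, c in zip(discharge, concentration))


discharge = [10.0, 12.0, 14.0, 16.0]        # m^3/s
concentration = [2.0, None, None, 5.0]      # mg/L with a two-step gap
filled = interpolate_linear(concentration)  # approximately [2.0, 3.0, 4.0, 5.0]
print(filled, pollutant_load(discharge, filled, dt_seconds=1.0))
```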

Comparison of GEE Estimators Using Imputation Methods (대체방법별 GEE추정량 비교)

  • 김동욱;노영화
    • The Korean Journal of Applied Statistics / v.16 no.2 / pp.407-426 / 2003
  • We consider the problem of missing covariates in generalized estimating equations (GEE) models. If a covariate is partially missing, the GEE cannot be computed. In this paper, we study the performance of seven imputation methods for handling missing covariates in GEE models, and we investigate the properties of the GEE estimators after the missing covariates are imputed for repeated-measurement ordinal data. The seven imputation methods are i) Naive Deletion, ii) Sample Average Imputation, iii) Row Average Imputation, iv) Cross-wave Regression Imputation, v) Carry-over Imputation, vi) Bayesian Bootstrap, and vii) Approximate Bayesian Bootstrap. A Monte Carlo simulation is used to compare the performance of these methods. We assume that the mechanism generating the missing data is ignorable nonresponse. Furthermore, we generate missing covariates both with and without wave nonresponse patterns.
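Three of the seven rules listed above (sample average, row average, and carry-over imputation) are simple enough to sketch directly. Rows are subjects, columns are waves, and None marks a missing covariate value. This illustrates the general rules only, not the paper's simulation code, and the function names are hypothetical. For ordinal covariates the averaged values would usually need rounding back to a valid category, which is left out here.

```python
from statistics import mean
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def sample_average(data: Matrix) -> Matrix:
    """Replace a missing value with the mean of the observed values in the same wave (column)."""
    n_waves = len(data[0])
    col_means = [mean(row[j] for row in data if row[j] is not None) for j in range(n_waves)]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def row_average(data: Matrix) -> Matrix:
    """Replace a missing value with the mean of the same subject's observed values (row)."""
    out = []
    for row in data:
        m = mean(v for v in row if v is not None)
        out.append([m if v is None else v for v in row])
    return out


def carry_over(data: Matrix) -> Matrix:
    """Carry the last observed value forward; a missing first wave stays None."""
    out = []
    for row in data:
        filled, last = [], None
        for v in row:
            if v is None:
                v = last
            filled.append(v)
            last = v
        out.append(filled)
    return out


data = [[1, 2, None], [3, None, 5], [None, 4, 6]]
print(sample_average(data), row_average(data), carry_over(data))
```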

A Study on Shape Variability in Canonical Correlation Biplot with Missing Values (결측값이 있는 정준상관 행렬도의 형상변동 연구)

  • Hong, Hyun-Uk;Choi, Yong-Seok;Shin, Sang-Min;Ka, Chang-Wan
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.955-966 / 2010
  • The canonical correlation biplot is a useful tool for graphically describing a data matrix that captures the association between two sets of variables, for detecting patterns, and for displaying results found by more formal methods of analysis. However, when some values in the data are missing, most biplots cannot be applied directly. To solve this problem, we estimate the missing data using median, mean, EM algorithm, and MCMC imputation methods at various missing rates. Even after the missing values are imputed, the resulting biplots differ in shape depending on the imputation method and the missing rate. Therefore, we use the RMS (root mean square) measure proposed by Shin et al. (2007) and the PS (Procrustes statistic) to measure and compare the shape variability between the original biplots and the estimated biplots.
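The Procrustes statistic measures how far one configuration of points (for example, biplot coordinates from imputed data) lies from another (the biplot of the complete data) after removing differences in location, scale, and rotation. A NumPy sketch, assuming both configurations have the same rows in the same order: after centering each and scaling it to unit total sum of squares, the minimum residual sum of squares over orthogonal rotations (reflections included) and a scale factor equals 1 minus the squared sum of the singular values of X^T Y, so 0 means identical shape. The RMS measure of Shin et al. (2007) is not reproduced because the abstract does not define it.

```python
import numpy as np


def procrustes_statistic(X, Y) -> float:
    """Procrustes statistic between two n x p configurations (0 = same shape)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    X = X - X.mean(axis=0)  # remove location
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)  # scale to unit total sum of squares (Frobenius norm)
    Y = Y / np.linalg.norm(Y)
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return float(1.0 - s.sum() ** 2)  # residual after the best rotation and scaling


X = np.array([[0, 0], [1, 0], [1, 2], [0, 1]], dtype=float)
R = np.array([[0.0, -1.0], [1.0, 0.0]])   # 90-degree rotation
Y = 3.0 * X @ R + np.array([5.0, -2.0])   # same shape, shifted, scaled and rotated
Z = np.array([[0, 0], [2, 0], [0, 3], [1, 1]], dtype=float)
print(procrustes_statistic(X, Y), procrustes_statistic(X, Z))
```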

Development of a Machine Learning Model for Imputing Time Series Data with Massive Missing Values (결측치 비율이 높은 시계열 데이터 분석 및 예측을 위한 머신러닝 모델 구축)

  • Bangwon Ko;Yong Hee Han
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.17 no.3 / pp.176-182 / 2024
  • In this study, we compared and analyzed various missing-data handling methods in order to build a machine learning model that can effectively analyze and predict time series data with a high percentage of missing values. For this purpose, Predictive State Model Filtering (PSMF), MissForest, and Imputation By Feature Importance (IBFI) were applied, and their prediction performance was evaluated using LightGBM, XGBoost, and Explainable Boosting Machine (EBM) models. The results showed that MissForest and IBFI, which reflect nonlinear data patterns, performed best among the missing-value handling methods, and that the XGBoost and EBM models outperformed LightGBM. This study emphasizes the importance of combining nonlinear imputation methods with machine learning models when analyzing and predicting time series data with a high percentage of missing values, and it provides a practical methodology.
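MissForest iteratively regresses each incomplete feature on the others using random forests. A comparable setup can be sketched with scikit-learn's `IterativeImputer` using a `RandomForestRegressor` as the per-feature model. This approximates MissForest rather than reproducing the original package, and the synthetic series, the 40% missing rate, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (required to expose IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor


def missforest_like_impute(X: np.ndarray, n_estimators: int = 30, max_iter: int = 5, seed: int = 0) -> np.ndarray:
    """Iteratively impute NaNs, modelling each feature with a random forest on the other features."""
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=n_estimators, random_state=seed),
        max_iter=max_iter,
        random_state=seed,
    )
    return imputer.fit_transform(X)


# Synthetic nonlinear multivariate series with roughly 40% of entries removed.
rng = np.random.default_rng(0)
t = np.arange(200)
X = np.column_stack([np.sin(t / 10), np.cos(t / 10), np.sin(t / 10) ** 2])
mask = rng.random(X.shape) < 0.4
X_missing = X.copy()
X_missing[mask] = np.nan
X_filled = missforest_like_impute(X_missing)
print("RMSE on removed entries:", np.sqrt(np.mean((X_filled[mask] - X[mask]) ** 2)))
```

A random forest can follow the nonlinear relation between the third column and the first, which a linear per-feature regressor would miss; this mirrors the abstract's point about nonlinear imputation.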

Development and Application of Imputation Technique Based on NPR for Missing Traffic Data (NPR기반 누락 교통자료 추정기법 개발 및 적용)

  • Jang, Hyeon-Ho;Han, Dong-Hui;Lee, Tae-Gyeong;Lee, Yeong-In;Won, Je-Mu
    • Journal of Korean Society of Transportation / v.28 no.3 / pp.61-74 / 2010
  • ITS (intelligent transportation systems) collect real-time traffic data and accumulate vast amounts of historical data. However, this enormous body of historical data has not been managed or used efficiently. With the introduction of data management systems such as ADMS (Archived Data Management System), the potential of large historical datasets has grown dramatically. However, traffic data in any data management system inherently contain missing values, and missing data have been one of the major obstacles to using these data because they often render an entire dataset useless. For these reasons, imputation techniques play a key role in data management systems. To address these limitations, this paper presents a promising imputation technique that can be mounted in data management systems and that robustly estimates the missing values in historical data. The developed model, based on an NPR (non-parametric regression) approach, exploits the various traffic data patterns in historical data and is designed to meet practical requirements such as a minimal number of parameters, fast computation, imputation of various types of missing data, and multiple imputation. The model was tested under various types of missing data. The results showed that the model outperforms previously reported approaches in prediction accuracy and meets the computational speed required for deployment in traffic data management systems.
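Nonparametric regression for traffic imputation is often implemented as a k-nearest-neighbour search over historical days. A minimal sketch, assuming complete historical daily profiles and a current day with some intervals missing: the k historical days closest in Euclidean distance over the observed intervals are found, and each missing interval is filled with their inverse-distance-weighted average. The choice of k, the weighting, and the function name are assumptions rather than the model's published design, and the multiple-imputation feature is not shown.

```python
import math
from typing import List, Optional, Sequence


def npr_impute(
    current: Sequence[Optional[float]],
    history: List[Sequence[float]],
    k: int = 3,
    eps: float = 1e-9,
) -> List[Optional[float]]:
    """Fill missing intervals of `current` from the k most similar complete historical days."""
    observed = [t for t, v in enumerate(current) if v is not None]

    def dist(day: Sequence[float]) -> float:
        # Euclidean distance over the intervals observed on the current day.
        return math.sqrt(sum((current[t] - day[t]) ** 2 for t in observed))

    neighbours = sorted(history, key=dist)[:k]
    weights = [1.0 / (dist(day) + eps) for day in neighbours]  # eps avoids division by zero on exact matches
    total = sum(weights)
    out = list(current)
    for t, v in enumerate(current):
        if v is None:
            out[t] = sum(w * day[t] for w, day in zip(weights, neighbours)) / total
    return out


history = [[100.0, 200.0, 300.0], [100.0, 220.0, 300.0], [500.0, 900.0, 700.0]]
print(npr_impute([100.0, None, 300.0], history, k=2))
```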

Sample size calculation for comparing time-averaged responses in K-group repeated binary outcomes

  • Wang, Jijia;Zhang, Song;Ahn, Chul
    • Communications for Statistical Applications and Methods / v.25 no.3 / pp.321-328 / 2018
  • In clinical trials with repeated measurements, the time-averaged difference (TAD) may provide a more powerful evaluation of treatment efficacy than the rate of change over time when the treatment effect has a rapid onset and repeated measurements continue across an extended period after the maximum effect is achieved (Overall and Doyle, Controlled Clinical Trials, 15, 100-123, 1994). Sample size formulas for evaluating TAD in two treatment groups have been investigated by many researchers. For the evaluation of TAD in multi-arm trials, Zhang and Ahn (Computational Statistics & Data Analysis, 58, 283-291, 2013) and Lou et al. (Communications in Statistics-Theory and Methods, 46, 11204-11213, 2017b) developed sample size formulas for continuous outcomes and count outcomes, respectively. In this paper, we derive a sample size formula to evaluate the TAD of repeated binary outcomes in multi-arm trials using the generalized estimating equation approach. The proposed formula accounts for various correlation structures and missing patterns (including a mixture of independent missing and monotone missing patterns) that practitioners frequently encounter in clinical trials. We conduct simulation studies to assess the performance of the proposed formula under a wide range of design parameters. The results show that the empirical powers and empirical Type I errors are close to their nominal levels. We illustrate the proposed method using a clinical trial example.
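The paper's formula covers multiple arms, several correlation structures, and missing patterns, none of which the abstract spells out. As a much simpler illustration of the time-averaged idea, the sketch assumes two arms, complete data, m binary measurements per subject with a constant success probability p in each arm, and an exchangeable correlation rho. The variance of a subject's time-averaged response is then p(1 - p)[1 + (m - 1)rho]/m, and the per-arm size follows the usual normal-approximation formula. This is not the formula derived in the paper, and `tad_sample_size` is a hypothetical name.

```python
import math
from statistics import NormalDist


def tad_sample_size(p1: float, p2: float, m: int, rho: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-arm n for comparing time-averaged binary responses (two arms, complete data, exchangeable rho)."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    inflation = (1 + (m - 1) * rho) / m  # variance factor for the mean of m correlated measurements
    var_sum = (p1 * (1 - p1) + p2 * (1 - p2)) * inflation
    return math.ceil(z_sum ** 2 * var_sum / (p1 - p2) ** 2)


print(tad_sample_size(0.5, 0.3, m=1, rho=0.0))  # single measurement: the classic two-proportion size
print(tad_sample_size(0.5, 0.3, m=4, rho=0.5))
```

With m = 1 the expression reduces to the standard unpooled two-proportion formula; stronger within-subject correlation shrinks the benefit of repeated measurements.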