• Title/Summary/Keyword: missing-value problems

Search Result 27, Processing Time 0.023 seconds

A Study on Automatic Missing Value Imputation Replacement Method for Data Processing in Digital Data (디지털 데이터에서 데이터 전처리를 위한 자동화된 결측 구간 대치 방법에 관한 연구)

  • Kim, Jong-Chan;Sim, Chun-Bo;Jung, Se-Hoon
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.2
    • /
    • pp.245-254
    • /
    • 2021
  • We proposed the research on an analysis and prediction model that allows the identification of outliers or abnormality in the data followed by effective and rapid imputation of missing values was conducted. This model is expected to analyze efficiently the problems in the data based on the calibrated raw data. As a result, a system that can adequately utilize the data was constructed by using the introduced KNN + MLE algorithm. With this algorithm, the problems in some of the existing KNN-based missing data imputation algorithms such as ignoring the missing values in some data sections or discarding normal observations were effectively addressed. A comparative evaluation was performed between the existing imputation approaches such as K-means, KNN, MEI, and MI as well as the data missing mechanisms including MCAR, MAR, and NI to check the effectiveness/efficiency of the proposed algorithm, and its superiority in all aspects was confirmed.

A data extension technique to handle incomplete data (불완전한 데이터를 처리하기 위한 데이터 확장기법)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.2
    • /
    • pp.7-13
    • /
    • 2021
  • This paper introduces an algorithm that compensates for missing values after converting them into a format that can represent the probability for incomplete data including missing values in training data. In the previous method using this data conversion, incomplete data was processed by allocating missing values with an equal probability that missing variables can have. This method applied to many problems and obtained good results, but it was pointed out that there is a loss of information in that all information remaining in the missing variable is ignored and a new value is assigned. On the other hand, in the new proposed method, only complete information not including missing values is input into the well-known classification algorithm (C4.5), and the decision tree is constructed during learning. Then, the probability of the missing value is obtained from this decision tree and assigned as an estimated value of the missing variable. That is, some lost information is recovered using a lot of information that has not been lost from incomplete learning data.

A comparison of imputation methods for the consecutive missing temperature data (연속적 결측이 존재하는 기온 자료에 대한 결측복원 기법의 비교)

  • Kim, Hee-Kyung;Kang, In-Kyeong;Lee, Jae-Won;Lee, Yung-Seop
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.3
    • /
    • pp.549-557
    • /
    • 2016
  • Consecutive missing values are likely to occur in long climate data due to system error or defective equipment. Furthermore, it is difficult to impute missing values. However, these complicated problems can be overcame by imputing missing values with reference time series. Reference time series must be composed of similar time series to time series that include missing values. We performed a simulation to compare three missing imputation methods (the adjusted normal ratio method, the regression method and the IDW method) to complete the missing values of time series. A comparison of the three missing imputation methods for the daily mean temperatures at 14 climatological stations indicated that the IDW method was better thanx others at south seaside stations. We also found the regression method was better than others at most stations (except south seaside stations).

A Survey on the Proportional Reasoning Ability of Fifth, Sixth, and Seventh Graders (5, 6, 7학년 학생들의 비례추론 능력 실태 조사)

  • Ahn, Suk-Hyun;Pang, Jeong-Suk
    • Journal of Educational Research in Mathematics
    • /
    • v.18 no.1
    • /
    • pp.103-121
    • /
    • 2008
  • The primary purpose of this study was to gather knowledge about $5^{th},\;6^{th},\;and\;7^{th}$ graders' proportional reasoning ability by investigating their reactions and use of strategies when encounting proportional or nonproportional problems, and then to raise issues concerning instructional methods related to proportion. A descriptive study through pencil-and-paper tests was conducted. The tests consisted of 12 questions, which included 8 proportional questions and 4 nonproportional questions. The following conclusions were drawn from the results obtained in this study. First, for a deeper understanding of the ratio, textbooks should treat numerical comparison problems and qualitative prediction and comparison problems together with missing-value problems. Second, when solving missing-value problems, students correctly answered direct-proportion questions but failed to correctly answer inverse-proportion questions. This result highlights the need for a more intensive curriculum to handle inverse-proportion. In particular, students need to experience inverse-relationships more often. Third, qualitative reasoning tends to be a more general norm than quantitative reasoning. Moreover, the former could be the cornerstone of proportional reasoning, and for this reason, qualitative reasoning should be emphasized before proportional reasoning. Forth, when dealing with nonproportional problems about 34% of students made proportional errors because they focused on numerical structure instead of comprehending the overall relationship. In order to overcome such errors, qualitative reasoning should be emphasized. Before solving proportional problems, students must be enriched by experiences that include dealing with direct and inverse proportion problems as well as nonproportional situational problems. This will result in the ability to accurately recognize a proportional situation.

  • PDF

A Research on the Energy Data Analysis using Machine Learning (머신러닝 기법을 활용한 에너지 데이터 분석에 관한 연구)

  • Kim, Dongjoo;Kwon, Seongchul;Moon, Jonghui;Sim, Gido;Bae, Moonsung
    • KEPCO Journal on Electric Power and Energy
    • /
    • v.7 no.2
    • /
    • pp.301-307
    • /
    • 2021
  • After the spread of the data collection devices such as smart meters, energy data is increasingly collected in a variety of ways, and its importance continues to grow. However, due to technical or practical limitations, errors such as missing or outliers in the data occur during data collection process. Especially in the case of customer-related data, billing problems may occur, so energy companies are conducting various research to process such data. In addition, efforts are being made to create added value from data, which makes it difficult to provide such services unless reliability of data is guaranteed. In order to solve these challenges, this research analyzes prior research related to bad data processing specifically in the energy field, and propose new missing value processing methods to improve the reliability and field utilization of energy data.

Application Examples Applying Extended Data Expression Technique to Classification Problems (패턴 분류 문제에 확장된 데이터 표현 기법을 적용한 응용 사례)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.12
    • /
    • pp.9-15
    • /
    • 2018
  • The main goal of extended data expression is to develop a data structure suitable for common problems in ubiquitous environments. The greatest feature of this method is that the attribute values can be represented with probability. The next feature is that each event in the training data has a weight value that represents its importance. After this data structure has been developed, an algorithm has been devised that can learn it. In the meantime, this algorithm has been applied to various problems in various fields to obtain good results. This paper first introduces the extended data expression technique, UChoo, and rule refinement method, which are the theoretical basis. Next, this paper introduces some examples of application areas such as rule refinement, missing data processing, BEWS problem, and ensemble system.

A Study on the Index Estimation of Missing Real Estate Transaction Cases Using Machine Learning (머신러닝을 활용한 결측 부동산 매매 지수의 추정에 대한 연구)

  • Kim, Kyung-Min;Kim, Kyuseok;Nam, Daisik
    • Journal of the Economic Geographical Society of Korea
    • /
    • v.25 no.1
    • /
    • pp.171-181
    • /
    • 2022
  • The real estate price index plays key roles as quantitative data in real estate market analysis. International organizations including OECD publish the real estate price indexes by country, and the Korea Real Estate Board announces metropolitan-level and municipal-level indexes. However, when the index is set on the smaller spatial unit level than metropolitan and municipal-level, problems occur: missing values. As the spatial scope is narrowed down, there are cases where there are few or no transactions depending on the unit period, which lead index calculation difficult or even impossible. This study suggests a supervised learning-based machine learning model to compensate for missing values that may occur due to no transaction in a specific range and period. The models proposed in our research verify the accuracy of predicting the existing values and missing values.

Comparison of Data Reconstruction Methods for Missing Value Imputation (결측값 대체를 위한 데이터 재현 기법 비교)

  • Cheongho Kim;Kee-Hoon Kang
    • The Journal of the Convergence on Culture Technology
    • /
    • v.10 no.1
    • /
    • pp.603-608
    • /
    • 2024
  • Nonresponse and missing values are caused by sample dropouts and avoidance of answers to surveys. In this case, problems with the possibility of information loss and biased reasoning arise, and a replacement of missing values with appropriate values is required. In this paper, as an alternative to missing values imputation, we compare several replacement methods, which use mean, linear regression, random forest, K-nearest neighbor, autoencoder and denoising autoencoder based on deep learning. These methods of imputing missing values are explained, and each method is compared by using continuous simulation data and real data. The comparison results confirm that in most cases, the performance of the random forest imputation method and the denoising autoencoder imputation method are better than the others.

SMOOTH SINGULAR VALUE THRESHOLDING ALGORITHM FOR LOW-RANK MATRIX COMPLETION PROBLEM

  • Geunseop Lee
    • Journal of the Korean Mathematical Society
    • /
    • v.61 no.3
    • /
    • pp.427-444
    • /
    • 2024
  • The matrix completion problem is to predict missing entries of a data matrix using the low-rank approximation of the observed entries. Typical approaches to matrix completion problem often rely on thresholding the singular values of the data matrix. However, these approaches have some limitations. In particular, a discontinuity is present near the thresholding value, and the thresholding value must be manually selected. To overcome these difficulties, we propose a shrinkage and thresholding function that smoothly thresholds the singular values to obtain more accurate and robust estimation of the data matrix. Furthermore, the proposed function is differentiable so that the thresholding values can be adaptively calculated during the iterations using Stein unbiased risk estimate. The experimental results demonstrate that the proposed algorithm yields a more accurate estimation with a faster execution than other matrix completion algorithms in image inpainting problems.

Missing Pattern Matching of Rough Set Based on Attribute Variations Minimization in Rough Set (속성 변동 최소화에 의한 러프집합 누락 패턴 부합)

  • Lee, Young-Cheon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.10 no.6
    • /
    • pp.683-690
    • /
    • 2015
  • In Rough set, attribute missing values have several problems such as reduct and core estimation. Further, they do not give some discernable pattern for decision tree construction. Now, there are several methods such as substitutions of typical attribute values, assignment of every possible value, event covering, C4.5 and special LEMS algorithm. However, they are mainly substitutions into frequently appearing values or common attribute ones. Thus, decision rules with high information loss are derived in case that important attribute values are missing in pattern matching. In particular, there is difficult to implement cross validation of the decision rules. In this paper we suggest new method for substituting the missing attribute values into high information gain by using entropy variation among given attributes, and thereby completing the information table. The suggested method is validated by conducting the same rough set analysis on the incomplete information system using the software ROSE.