• Title/Summary/Keyword: outlier data reduction (이상치 데이터 감소)

Search Result 41, Processing Time 0.03 seconds

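Accuracy Comparison between Linear Regression Analysis and Band-Pass Filtering Analysis Using Rainfall Information Generated by a Rain Sensor (강우센서에서 생성된 강우정보를 이용한 선형회귀분석과 대역 통과 필터링 분석간의 정확도 비교)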

  • Kim, Yeong-Gon;Lee, Seok-Ho;Kim, Byeong-Sik
    • Proceedings of the Korea Water Resources Association Conference / 2017.05a / pp.172-172 / 2017
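  • This study aimed to develop a technique for producing rainfall information from the rain sensor installed in vehicles for the AW (AutoWiping) function. The AW function sets the wiper speed according to how much the optical signal reaching the receiver is attenuated by scattering when raindrops form on the windshield. The more raindrops there are, the weaker the optical signal becomes and the faster the wipers run. Rainfall information is produced by exploiting the fact that the rain sensor outputs a reduced optical signal when rainfall is heavy. The rain sensor consists of eight channels and collects 250 optical-signal data points per second, so about 1.2 million data points are produced in 10 minutes. To compute accurate rainfall from this volume of data, the sensor's initial values and the momentary outliers that occur as the wiper moves must be removed. Manually removing every outlier from millions of data points, however, is not feasible. Regression analysis methods that tolerate outliers were therefore studied, and two regression equations that convert the optical signal to rainfall were derived using an artificial rainfall generator. The first is linear regression analysis (model 1), which includes all outliers and expresses how the dependent variable (rainfall) changes with the independent variable (optical signal). The second is band-pass filtering analysis (model 2), which sets thresholds and passes only signals from which outliers beyond a certain level have been removed. The derived regression equations were applied to actual rainfall and their accuracy was analyzed.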
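The contrast between the two models can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the pass band `(50, 110)`, and the sample readings are all hypothetical, and ordinary least squares stands in for whichever fitting procedure the study used.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def band_pass(x, y, low, high):
    """Keep only (signal, rainfall) pairs whose signal lies in [low, high]."""
    kept = [(xi, yi) for xi, yi in zip(x, y) if low <= xi <= high]
    return [p[0] for p in kept], [p[1] for p in kept]

# Hypothetical optical-signal readings and rainfall intensities.
signal = [100.0, 90.0, 80.0, 70.0, 60.0, 5.0]  # 5.0 mimics a wiper-pass spike
rain = [0.0, 10.0, 20.0, 30.0, 40.0, 15.0]

# Model 1: regression on every point, outliers included.
model1 = fit_line(signal, rain)
# Model 2: screen the signal with a (hypothetical) pass band, then regress.
model2 = fit_line(*band_pass(signal, rain, 50.0, 110.0))
```

With these made-up values, the single spike flattens the slope of model 1 considerably, while model 2 recovers the underlying linear trend.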

Anomaly Detection in Livestock Environmental Time Series Data Using LSTM Autoencoders: A Comparison of Performance Based on Threshold Settings (LSTM 오토인코더를 활용한 축산 환경 시계열 데이터의 이상치 탐지: 경계값 설정에 따른 성능 비교)

  • Se Yeon Chung;Sang Cheol Kim
    • Smart Media Journal / v.13 no.4 / pp.48-56 / 2024
  • In the livestock industry, detecting environmental outliers and predicting data are crucial tasks. Outliers in livestock environment data, which are typically collected as time series, can signal rapid environmental changes and potential unexpected epidemics. Prompt detection of and response to these outliers are essential for minimizing stress in livestock and, by catching epidemic conditions early, reducing economic losses for farmers. This study compares two methods for setting the threshold that defines an outlier in livestock environment data. The first detects outliers using the Mean Squared Error (MSE); the second uses a Dynamic Threshold, which analyzes variability against the average of previous data. The MSE-based method achieved 94.98% accuracy, while the Dynamic Threshold method, which uses the standard deviation, performed better with 99.66% accuracy.
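The two thresholding strategies can be illustrated on a sequence of reconstruction errors. This is only a sketch: the errors are assumed to come from an already trained autoencoder, and the fixed threshold `0.3`, the window length, the multiplier `k`, and the function names are hypothetical choices rather than values from the paper.

```python
from statistics import mean, pstdev

def fixed_threshold_flags(errors, threshold):
    """Flag every reconstruction error above one global threshold."""
    return [e > threshold for e in errors]

def dynamic_threshold_flags(errors, window=5, k=3.0):
    """Flag an error if it exceeds mean + k*std of the preceding window."""
    flags = []
    for i, e in enumerate(errors):
        prev = errors[max(0, i - window):i]
        if len(prev) < 2:
            # Not enough history to estimate variability yet.
            flags.append(False)
            continue
        limit = mean(prev) + k * pstdev(prev)
        flags.append(e > limit)
    return flags

# Hypothetical per-timestep reconstruction errors (MSE) with one spike.
errors = [0.10, 0.11, 0.09, 0.10, 0.12, 0.50, 0.11, 0.10]

fixed = fixed_threshold_flags(errors, threshold=0.3)
dynamic = dynamic_threshold_flags(errors)
```

Note that in this simple form a large spike inflates the window statistics for the next few steps, so a production version would likely exclude flagged points from the history.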

Outlier Impact on the Power of Significance Test for Cronbach Alpha Reliability Coefficient

  • Yonghwan Um
    • Journal of the Korea Society of Computer and Information / v.28 no.5 / pp.179-187 / 2023
  • In this paper, we studied the impact of outliers on the power of significance tests for the Cronbach alpha reliability coefficient. Four variables were varied: sample size, the number of items, the number of outliers, and the population Cronbach alpha level. We simulated data from a multivariate normal distribution and sampled outliers from a uniform distribution. To test the significance of Cronbach alpha, a parametric approach (F statistic) and a permutation method were used. We observed that the power of the permutation test is equal to or greater than that of the F test under all conditions, that both tests lose power as the number of outliers increases, and that these effects of outliers on power are stronger at higher population alpha levels.
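For readers unfamiliar with the coefficient itself, the sketch below computes Cronbach's alpha from its standard definition, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), and shows how one inconsistent respondent shifts it. The score matrix and the added outlier row are made up for illustration; the paper's simulation design and significance tests are not reproduced here.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    items = list(zip(*rows))                        # one tuple per item
    item_var = sum(variance(col) for col in items)  # sum of item sample variances
    total_var = variance([sum(r) for r in rows])    # variance of total scores
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical 5 respondents x 3 items with consistent answers.
scores = [[1, 2, 1], [2, 2, 2], [3, 3, 4], [4, 5, 4], [5, 5, 5]]

alpha_clean = cronbach_alpha(scores)
# One respondent whose answers disagree across items (an illustrative outlier).
alpha_outlier = cronbach_alpha(scores + [[5, 1, 1]])
```

In this toy example the single outlying respondent lowers the estimated reliability noticeably.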

Redundant and Abnormal Data Processing Scheme in Large-scale IoT Environment (대규모 IoT 환경에서의 중복 및 비정상 데이터 처리 기법)

  • Kim, Min-Woo;Lee, Tae-Ho;Lee, Byung-Jun;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2019.07a / pp.109-110 / 2019
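  • In recent IoT environments, nodes are deployed at high density. These sensor nodes generate redundant data that cause congestion during transmission and degrade data accuracy. To address the network congestion caused by this data concentration, the proposed scheme measures outlyingness and redundancy using interquartile range (IQR) analysis and a cosine similarity function, and removes redundant data and outliers. Through optimized data transmission, the scheme can improve IoT communication performance and, as a result, raise the data reduction rate, network lifetime, and energy efficiency.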
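The two building blocks named in the abstract, IQR screening and cosine similarity, can be sketched as below. The fence multiplier `k=1.5`, the similarity threshold `0.999`, the function names, and the sample readings are assumptions for illustration; the abstract does not state how the scheme combines the two measures.

```python
from math import sqrt
from statistics import quantiles

def iqr_filter(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def drop_redundant(vectors, threshold=0.999):
    """Keep a vector only if it is not near-parallel to one already kept."""
    kept = []
    for v in vectors:
        if all(cosine_similarity(v, u) < threshold for u in kept):
            kept.append(v)
    return kept

# Hypothetical temperature readings with one abnormal value.
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 35.0]
clean = iqr_filter(readings)

# Hypothetical sensor vectors; the second points in the same direction as the first.
vectors = [[1.0, 2.0], [2.0, 4.0], [2.0, 1.0]]
unique = drop_redundant(vectors)
```

Because cosine similarity ignores magnitude, `[1.0, 2.0]` and `[2.0, 4.0]` count as redundant here; whether that is desirable depends on the sensor data.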

Outlier detection and treatment in industrial sampling survey (경제조사에서의 이상치 탐지와 처리방법)

  • Joo, Young Sun;Cho, Gyo-Young
    • Journal of the Korean Data and Information Science Society / v.27 no.1 / pp.131-142 / 2016
  • Outliers in surveys can have a large effect on estimates of totals. This is especially true in business surveys, where the populations from which samples are drawn are typically skewed. In this paper, we discuss the practical development and implementation of methods to identify and deal with outliers. The detection method is based on the quartile method, and detected outliers are processed in various ways. The study examines two versions of winsorised estimators with three different cut-off thresholds for each. For the simulation study, four types of weight transformation function were considered.
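A quartile-based cut-off followed by winsorisation can be sketched as follows. This shows only the simplest variant, in which values above the cut-off are clamped to it; the multiplier `k=3.0`, the function names, the design weights, and the turnover figures are hypothetical, and the paper's two estimator versions and weight transformations are not reproduced.

```python
from statistics import quantiles

def quartile_cutoff(values, k=3.0):
    """Upper cut-off Q3 + k*IQR from the quartile method (k is an assumption)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 + k * (q3 - q1)

def winsorise(values, cutoff):
    """Replace every value above the cut-off with the cut-off itself."""
    return [min(v, cutoff) for v in values]

def weighted_total(values, weights):
    """Survey-weighted estimate of the population total."""
    return sum(w * v for v, w in zip(values, weights))

# Hypothetical skewed business-survey sample with one extreme unit.
turnover = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 10.0, 400.0]
weights = [10.0] * len(turnover)

cutoff = quartile_cutoff(turnover)
raw_total = weighted_total(turnover, weights)
wins_total = weighted_total(winsorise(turnover, cutoff), weights)
```

Here the single extreme unit dominates the raw total, and clamping it pulls the estimate back toward the bulk of the sample at the cost of some bias.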

Handoff Control Scheme for IP Based Hybrid Mobile Data Network (IP 기반 혼합 무선데이터망에서의 핸드오프 제어방식 연구)

  • 권수근
    • Journal of Korea Multimedia Society / v.7 no.5 / pp.680-688 / 2004
  • In this paper, we propose a new handoff scheme that is efficient in a hybrid mobile data network consisting of a cellular mobile network and a wireless LAN. In this scheme, handoff is delayed until connections with wireless LAN and data rates are smoothly decreased according to the beacon signal strength of the wireless LAN. By doing so, data transfer capacity is increased and the data buffer required during handoff in the mobile and network systems can be reduced. We analyze the new handoff scheme by computer simulation. The results show that an additional 180 Mbytes of data can be transferred during handoff processing and that the required buffer size can be halved, under the conditions that the mobile speed is 1 km/h and the data rate of the original call is 2,048 Kbps.

Study on Outlier Analysis Considering the Spatial Distribution of Intelligent Compaction Measurement Values (지능형 다짐값의 공간적 분포를 고려한 이상치 분석 기법 연구)

  • Chung, Taek-Kyu;Cho, Jin-Woo;Chung, Choong-Ki;Baek, Sung-Ha
    • Journal of the Korean Geotechnical Society / v.40 no.4 / pp.91-103 / 2024
  • In this study, we propose an outlier detection method that considers the spatial distribution of intelligent compaction measurement values (ICMVs) to address the high variability of ICMVs measured continuously across an entire construction area. The proposed method initially identified cases where the CMV at a specific location decreased despite an increase in the number of compaction passes. Among these, values that significantly differed from those measured within a 1.5-m radius were classified as outliers. Applying this method to CMV data obtained from field tests, we found that it effectively excluded the influence of changes in roller operating conditions unrelated to compaction quality while considering the inherent heterogeneity of the soil. However, after removing the outliers, the coefficient of variation of CMV (21.4%-26.3%) remained higher than the 20% suggested by relevant standards. Further field tests are needed to modify the proposed outlier detection method and to establish reasonable criteria for the variability of ICMV.
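The two-stage screening described above can be sketched roughly as follows. The 1.5 m radius comes from the abstract; everything else is an assumption made for illustration, including the tuple layout, the function name, and the relative-difference limit `tol=0.3` used here as a stand-in for "significantly differed".

```python
from math import hypot
from statistics import mean

def spatial_outliers(points, radius=1.5, tol=0.3):
    """Return indices of points flagged as outliers.

    points: list of (x, y, prev_cmv, cmv) tuples with positions in metres.
    tol is a hypothetical relative-difference limit, not a value from the paper.
    """
    flagged = []
    for i, (x, y, prev_cmv, cmv) in enumerate(points):
        # Stage 1: only consider locations where CMV dropped after another pass.
        if cmv >= prev_cmv:
            continue
        # Stage 2: compare against current CMVs measured within the radius.
        neighbours = [q[3] for j, q in enumerate(points)
                      if j != i and hypot(q[0] - x, q[1] - y) <= radius]
        if not neighbours:
            continue  # nothing nearby to compare with
        local = mean(neighbours)
        if abs(cmv - local) / local > tol:
            flagged.append(i)
    return flagged

# Hypothetical measurements: (x, y, CMV at previous pass, CMV at current pass).
points = [
    (0.0, 0.0, 30.0, 32.0),
    (1.0, 0.0, 31.0, 33.0),
    (0.0, 1.0, 30.0, 31.0),
    (1.0, 1.0, 32.0, 18.0),  # dropped and far from its neighbours
    (5.0, 5.0, 30.0, 25.0),  # dropped, but has no neighbour within 1.5 m
]
outliers = spatial_outliers(points)
```

Requiring both conditions keeps genuine local soil variation from being discarded: a drop alone is not enough unless the value also disagrees with its surroundings.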

Heat Transfer in High-Temperature, High-Speed Nozzle Regions (고온 고속 노즐부위에서의 열전달)

  • 장태호
    • Journal of the KSME / v.25 no.3 / pp.236-241 / 1985
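  • This article uses a numerical model to examine whether radiative heat transfer and viscous dissipation, which are generally neglected in thermal analysis of the nozzle region, need to be considered, and reaches the following conclusions. (1) In the combustion chamber and the convergent section, the radiative heat transfer rate is of the same order of magnitude as the convective heat transfer rate, so it is especially important for combustion gases with high emissivity. In particular, because the aluminum oxide content of the combustion gas is increasing with recently used fuels, radiative heat transfer will account for a growing share of nozzle thermal analysis. (2) In the divergent section of the nozzle, the high velocity causes viscous dissipation in the gas itself, which lowers the property correction factor. The heat transfer coefficient is therefore smaller than Bartz's prediction. (3) Accordingly, given experimental results that were generally higher than Bartz's prediction in the convergent section and lower in the divergent section, heat transfer analysis for high-temperature, high-speed nozzles can be made accurate by accounting for radiative heat transfer and viscous heat dissipation. (4) The experimental data and numerical model considered above assume no erosion inside the nozzle, whereas in practice chemical reactions occur at the nozzle wall surface. The net transpiration effect that may arise in that case is slight, however, and the overall cross-sectional thermal analysis should compute the heat of chemical reaction and the temperature distribution on the basis of the heat transfer rates predicted above.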

Big Data Management in Structured Storage Based on Fintech Models for IoMT using Machine Learning Techniques (기계학습법을 이용한 IoMT 핀테크 모델을 기반으로 한 구조화 스토리지에서의 빅데이터 관리 연구)

  • Kim, Kyung-Sil
    • Advanced Industrial SCIence / v.1 no.1 / pp.7-15 / 2022
  • In the medical domain, the IoT has evolved to process large amounts of medical data, an application known as the Internet of Medical Things (IoMT). The vast range of collected medical data is stored in the cloud in a structured manner for processing. However, the huge volume of healthcare data is difficult to handle, so an appropriate scheme for structured healthcare data is needed. In this paper, a machine learning model for processing the structured healthcare data collected from the IoMT is suggested. To process the vast range of healthcare data, this paper proposes an MTGPLSTM model, which integrates a linear regression model for processing healthcare information. With the developed model, an outlier model is implemented based on the FinTech model for the evaluation and prediction of the COVID-19 healthcare dataset collected from the IoMT. The proposed MTGPLSTM model comprises a regression model to predict and evaluate the planning scheme for preventing the spread of infection. The performance of the developed model is evaluated against different models, namely LR, SVR, RFR, and LSTM, and with data sizes of 1GB, 2GB, and 3GB. The comparative analysis shows that the proposed MTGPLSTM model achieves roughly 4% lower MAPE and RMSE values on the worldwide data; for China, it achieves a minimum MAPE of 0.97, roughly 6% lower than the existing classifiers.
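The two error metrics used in the comparison follow their standard definitions, sketched below. The sample values are made up and do not come from the paper's dataset.

```python
from math import sqrt

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the units of the data."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical daily case counts and model predictions.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

mape_value = mape(actual, predicted)
rmse_value = rmse(actual, predicted)
```

MAPE is scale-free, which makes it convenient for comparing regions of very different size, while RMSE penalises large absolute misses more heavily.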

Design of Heuristic Decision Tree (HDT) Using Human Knowledge (인간 지식을 이용한 경험적 의사결정트리의 설계)

  • Yoon, Tae-Tok;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.4 / pp.525-531 / 2009
  • Data mining is the process of extracting hidden patterns from collected data. Because collected data serve as the basic information for prediction and recommendation, incorrect data must be identified in order to improve the quality of the analysis. Existing methods for identifying unexpected data rely mainly on statistics or on simple distances between data points. These methods, however, can exclude even meaningful data because they do not consider the environment and characteristics of the data. This study proposes a method that assigns weights to human heuristic knowledge by comparing it with the collected data, and uses those weights to build a decision tree. Data discrimination by the proposed method is more credible because human knowledge is reflected in the resulting tree. The validity of the proposed method is verified through an experiment.