• Title/Summary/Keyword: imbalanced data processing (불균형데이터 처리)


Balance Control of Drone using Adaptive Two-Track Control (적응적 Two-Track 기술을 이용한 드론의 균형 제어)

  • Kim, Jang-Won
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.12 no.6 / pp.666-671 / 2019
  • The flight controller (FC) of a small drone has a simple structure and cannot perform complex operations because it uses a different MCU from that of a large drone. Balance control for a small drone should therefore be simpler than a Kalman filter, which relies on complex filtering, and even a complementary filter requires relatively many operations. This study proposes a two-track control method suited to these constraints that achieves effective balance control on a small drone. The method maintains balance with a simple structure and few operations by responding adaptively to the drone's imbalance: the acceleration sensor provides accurate correction by processing long-term changes, while the gyroscope sensor maintains balance by processing short-term changes. In more than 100 repeated tests using two-track control, operation was mostly stable, and the drone maintained normal operation in more than 98% of cases, excluding sudden, strong wind, in which maintaining normal operation remains difficult.
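The abstract does not give the two-track update rule, so the following is only a sketch of the complementary-filter baseline it mentions, which uses the same division of labour (gyroscope for short-term change, accelerometer for long-term correction). The function names, the blend factor `alpha`, and the sample values are assumptions for illustration, not taken from the paper.

```python
import math


def accel_tilt_deg(ax: float, az: float) -> float:
    """Tilt angle implied by gravity on the accelerometer (reliable long-term)."""
    return math.degrees(math.atan2(ax, az))


def complementary_step(angle_deg: float, gyro_rate_dps: float,
                       accel_angle_deg: float, dt: float,
                       alpha: float = 0.98) -> float:
    """One filter update.

    The integrated gyroscope rate tracks short-term change; the accelerometer
    angle slowly pulls the estimate back and removes long-term drift.
    alpha is a hypothetical blend factor, not a value from the paper.
    """
    gyro_estimate = angle_deg + gyro_rate_dps * dt
    return alpha * gyro_estimate + (1.0 - alpha) * accel_angle_deg


# Usage: the frame is held at a true tilt of 10 degrees and the gyro reads zero rate.
true_tilt = math.radians(10.0)
angle = 0.0
for _ in range(500):
    measured = accel_tilt_deg(math.sin(true_tilt), math.cos(true_tilt))
    angle = complementary_step(angle, 0.0, measured, dt=0.01)
print(round(angle, 2))
```

With a constant accelerometer reading the estimate converges geometrically toward the measured tilt, which is the long-term correction the abstract attributes to the acceleration sensor.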

A K-Means-Based Clustering Algorithm for Traffic Prediction in a Bike-Sharing System (공유자전거 시스템의 이용 예측을 위한 K-Means 기반의 군집 알고리즘)

  • Kim, Kyoungok;Lee, Chang Hwan
    • KIPS Transactions on Software and Data Engineering / v.10 no.5 / pp.169-178 / 2021
  • Recently, bike-sharing systems (BSS) have become popular as convenient "last mile" transportation. Rebalancing bikes is a critical issue in managing a BSS because bike rentals and returns are not balanced across stations and time periods. For efficient and effective rebalancing, accurate traffic prediction is important. Cluster-based traffic prediction has recently been used to improve station-level prediction accuracy, and the clustering step is very important in this approach. In this paper, we propose a k-means-based clustering algorithm that overcomes two drawbacks of existing clustering methods for BSS: they are non-deterministic and converge poorly. By employing centroid initialization and using the temporal proportion of rentals and returns at each station as the clustering input, the proposed algorithm is deterministic and fast.
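A minimal sketch of the idea described above, assuming a farthest-first centroid initialization (the abstract does not say which initialization the authors use) and hypothetical rental counts for three time periods. Because no random numbers are involved, repeated runs return the same clusters.

```python
def proportions(counts):
    """Convert per-period rental counts into temporal proportions."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic initialization (an assumption, not the paper's rule):
    start from the first point, then repeatedly add the point farthest
    from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Usage: hypothetical rentals per station in (morning, midday, evening).
stations = [[80, 10, 10], [70, 20, 10], [10, 10, 80], [15, 5, 80]]
points = [proportions(s) for s in stations]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Using proportions rather than raw counts groups stations by when they are used, not by how busy they are.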

Prediction of Semiconductor Exposure Process Measurement Results using XGBoost (XGBoost를 사용한 반도체 노광 공정 계측 결과 예측)

  • Shin, Jeong Il;Park, Ji Su;Shon, Jin Gon
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.505-508 / 2021
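  • As semiconductor circuits shrink and the number of unit processes grows, manufacturing costs rise with the increase in TAT (turn-around time). The photo process transfers the mask's circuit pattern onto the wafer, and the accuracy of the circuit is determined by the performance of the exposure equipment that performs the transfer. The metrology process that verifies this accuracy becomes more necessary as circuits shrink, but it is a major cause of TAT increase, so experiments that predict metrology results with various recently developed machine-learning prediction models are under way. To address the poor classification performance of the LFDC (Lithography Fault Detection and Classification) system, which detects and classifies abnormal values from exposure-equipment sensors before the metrology process is run, this paper experiments with a method based on XGBoost: without running the metrology process, a trained learner classifies abnormal sensor values to decide whether to redo the photo process or proceed to the next process. The metrology-result prediction model used in the experiment achieved 89% accuracy, and it achieved the same accuracy on severely imbalanced data, which is characteristic of semiconductor data. In these results, 89% of the abnormal exposure-equipment sensor values were judged normal, and actual metrology of the wafers judged normal gave the same results as the predictions. Using the prediction model makes it possible to judge abnormal sensor values without actual metrology, which can reduce manufacturing cost through shorter TAT, reduce the load on metrology equipment, and improve efficiency. However, because the prediction model in this paper performs at about 90%, actual metrology is still required for the remaining 10%, and further research on this issue is needed.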
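Since the abstract reports accuracy on severely imbalanced data, it may help to see how accuracy relates to per-class metrics. The sketch below uses hypothetical labels (89 normal, 11 abnormal), not data from the paper, and the function name `imbalance_report` is made up for illustration.

```python
def imbalance_report(y_true, y_pred, positive=1):
    """Accuracy alongside metrics that are less sensitive to class imbalance."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "precision": precision,
        "balanced_accuracy": 0.5 * (recall + specificity),
    }


# Usage: a trivial classifier that always predicts the majority class (0).
y_true = [0] * 89 + [1] * 11   # hypothetical 89:11 class split
y_pred = [0] * 100
report = imbalance_report(y_true, y_pred)
print(report)
```

In this hypothetical case accuracy is 0.89 while recall on the minority class is 0, which is why balanced accuracy or per-class recall is usually reported alongside accuracy for imbalanced data.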

The Performance Improvement of U-Net Model for Landcover Semantic Segmentation through Data Augmentation (데이터 확장을 통한 토지피복분류 U-Net 모델의 성능 개선)

  • Baek, Won-Kyung;Lee, Moung-Jin;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing / v.38 no.6_2 / pp.1663-1676 / 2022
  • Recently, a number of deep-learning-based land cover segmentation studies have been introduced. Some studies noted that land cover segmentation performance deteriorated because of insufficient training data. In this study, we verified the improvement in land cover segmentation performance obtained through data augmentation. U-Net was used as the segmentation model, and a 2020 satellite-derived land cover dataset was used as the study data. The pixel accuracies were 0.905 and 0.923 for U-Net trained on the original and augmented data, respectively, and the mean F1 scores of those models were 0.720 and 0.775, respectively, indicating better performance with data augmentation. In addition, the F1 scores for the building, road, paddy field, upland field, forest, and unclassified area classes were 0.770, 0.568, 0.433, 0.455, 0.964, and 0.830 for the U-Net trained on the original data. Data augmentation proved effective in that the F1 scores of every class improved, to 0.838, 0.660, 0.791, 0.530, 0.969, and 0.860, respectively. Although we applied data augmentation without considering class balance, a comparison of the two models shows that data augmentation can mitigate the biased segmentation performance caused by data imbalance. We expect this study to help demonstrate the importance and effectiveness of data augmentation in various image processing fields.
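The per-class scores quoted above can be averaged directly. This short sketch computes the unweighted (macro) mean F1 and the per-class gain; the variable names are illustrative, and the assumption that the reported mean is a plain average over the listed classes is mine.

```python
classes = ["building", "road", "paddy field", "upland field", "forest", "unclassified"]
f1_original = [0.770, 0.568, 0.433, 0.455, 0.964, 0.830]
f1_augmented = [0.838, 0.660, 0.791, 0.530, 0.969, 0.860]


def macro_f1(scores):
    """Unweighted mean of per-class F1 scores."""
    return sum(scores) / len(scores)


# Per-class improvement from data augmentation.
gains = {c: round(a - o, 3) for c, o, a in zip(classes, f1_original, f1_augmented)}
largest_gain = max(gains, key=gains.get)

print(round(macro_f1(f1_original), 3), round(macro_f1(f1_augmented), 3))
print(largest_gain, gains[largest_gain])
```

The six augmented-data scores average to about 0.775, matching the reported mean. The six original-data scores as listed average to about 0.670 rather than the reported 0.720, so the reported mean may rest on values that differ from those reproduced in this listing.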

Experimental Study on Flicker Mitigation in VLC using Pseudo Manchester Coding (VLC에서 Pseudo Manchester Coding을 사용한 Flicker 최소화에 관한 실험 연구)

  • Ifthekhar, Md. Shareef;Le, Nam-Tuan;Jang, Yeong Min
    • Journal of Satellite, Information and Communications / v.9 no.3 / pp.116-120 / 2014
  • Visible Light Communication (VLC) is a promising wireless communication technology because it can use existing LED lighting infrastructure to transmit data. An LED can switch on and off faster than the human eye can perceive, so it can transmit data via visible light while also providing illumination. However, VLC suffers from flicker caused by brightness discrepancies between '1' and '0' bit patterns inside a data frame. Various run-length-limited (RLL) coding schemes such as Manchester code, 4B6B, 8B10B, or VPPM can be used to solve the flicker problem. We propose pseudo Manchester coding, which can transmit data without modifying the LED modulator and demodulator circuits while also solving the flicker problem.
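The abstract does not specify the pseudo Manchester scheme, so the sketch below shows only standard Manchester coding, one of the RLL codes it lists. The bit-to-chip mapping (0 to "01", 1 to "10") is an assumed convention, and the sample bit pattern is hypothetical.

```python
# Assumed convention: 0 -> (0, 1), 1 -> (1, 0). Other references use the opposite mapping.
ENCODE = {0: (0, 1), 1: (1, 0)}
DECODE = {chips: bit for bit, chips in ENCODE.items()}


def manchester_encode(bits):
    """Replace every data bit with two chips, one ON and one OFF."""
    out = []
    for bit in bits:
        out.extend(ENCODE[bit])
    return out


def manchester_decode(chips):
    """Read chips in pairs and map each pair back to its data bit."""
    return [DECODE[pair] for pair in zip(chips[0::2], chips[1::2])]


def duty_cycle(chips):
    """Fraction of time the LED is ON, a rough proxy for perceived brightness."""
    return sum(chips) / len(chips)


# Usage: a frame dominated by '1' bits.
data = [1, 1, 1, 1, 0, 1, 1, 1]
encoded = manchester_encode(data)
print(duty_cycle(data), duty_cycle(encoded))
```

Every encoded bit contains exactly one ON chip and one OFF chip, so average brightness stays at 50% regardless of the data, at the cost of doubling the number of transmitted symbols.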

Comparison of Korean Classification Models' Korean Essay Score Range Prediction Performance (한국어 학습 모델별 한국어 쓰기 답안지 점수 구간 예측 성능 비교)

  • Cho, Heeryon;Im, Hyeonyeol;Yi, Yumi;Cha, Junwoo
    • KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.133-140 / 2022
  • We investigate the performance of deep learning-based Korean language models on the task of predicting the score range of Korean essays written by foreign students. We construct a data set containing a total of 304 essays, which include essays discussing the criteria for choosing a job ('job'), conditions of a happy life ('happ'), relationship between money and happiness ('econ'), and definition of success ('succ'). These essays were labeled according to four letter grades (A, B, C, and D), and a total of eleven essay score range prediction experiments were conducted (i.e., five for predicting the score range of 'job' essays, five for predicting the score range of 'happiness' essays, and one for predicting the score range of mixed topic essays). Three deep learning-based Korean language models, KoBERT, KcBERT, and KR-BERT, were fine-tuned using various training data. In addition, two traditional probabilistic machine learning classifiers, naive Bayes and logistic regression, were evaluated. Experimental results show that the deep learning-based Korean language models performed better than the two traditional classifiers, with KR-BERT performing best at 55.83% overall average prediction accuracy. A close second was KcBERT (55.77%), followed by KoBERT (54.91%). The performances of the naive Bayes and logistic regression classifiers were 52.52% and 50.28%, respectively. Because of the scarcity of training data and the imbalance in class distribution, overall prediction performance was not high for any classifier. Moreover, the classifiers' vocabulary did not explicitly capture the error features that were helpful in correctly grading the Korean essays. By overcoming these two limitations, we expect the score range prediction performance to improve.

Proposal of Augmented Drought Inflow to Search Reliable Operational Policies for Water Supply Infrastructures (물 공급 시설의 신뢰성 있는 운영 계획 수립을 위한 가뭄 유입량 증강 기법의 제안)

  • Ji, Sukwang;Ahn, Kuk-Hyun
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.189-189 / 2022
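  • Establishing and verifying operational plans for the efficient and stable operation of water supply facilities requires long-term inflow data. In practice, however, the available observed data are limited, and records in which insufficient inflow affects dam operation are even scarcer. To address this, operational plans are often established by generating long-term simulated inflows, but because the simulation is based on observed data, drought frequency in the simulated series is also low, which makes stable operation difficult during prolonged droughts or droughts that occur at short intervals. This study proposes an inflow simulation technique with increased drought frequency in order to establish operational plans that can supply water stably even during prolonged droughts. The proposed technique is based on the SMOTE algorithm recently used in machine learning. SMOTE is an oversampling technique for handling data imbalance; rather than simply duplicating the minority group, it generates new synthetic samples, which lowers the risk of overfitting and has the advantage of not losing information from the original data. In this study, high-frequency drought inflows were simulated for Folsom Dam in California, USA, and an operational plan using them was established. Based on historical observed inflow data for Folsom Dam, the study compares the supply shortage and excess release that occur under the operational plan built with high-frequency drought inflows and under the plan built with inflows of ordinary drought frequency, in order to determine how the use of high-frequency drought inflows affects the stable operation of water supply facilities.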
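A minimal sketch of the core SMOTE step described above: each synthetic sample is an interpolation between a minority sample and one of its nearest minority neighbours. This is basic SMOTE only, not the authors' inflow simulation; the feature vectors, `k`, and `seed` are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolation.

    For each new sample: pick a minority point, pick one of its k nearest
    minority neighbours, and place the new point at a random position on
    the line segment between them.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Usage: hypothetical drought-period feature vectors (for example, two monthly inflows).
drought_samples = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
synthetic = smote(drought_samples, n_new=6, k=2, seed=42)
print(len(synthetic))
```

Because the new points lie between existing minority samples rather than on top of them, the minority class is enlarged without exact duplicates and without discarding any original records.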


Efficient Load Balancing Scheme using Resource Information in Web Server System (웹 서버 시스템에서의 자원 정보를 이용한 효율적인 부하분산 기법)

  • Chang Tae-Mu;Myung Won-Shig;Han Jun-Tak
    • The KIPS Transactions:PartA / v.12A no.2 s.92 / pp.151-160 / 2005
  • The exponential growth in Web users requires Web servers with high scalability and reliability, and it leads to excessive transmission traffic and system overload. To solve these problems, cluster systems are widely studied. In conventional cluster systems, when requests are large, as with multimedia and CGI content, the load and response time of a particular server tend to increase even if the overall load is distributed evenly. In this paper, a cluster system is proposed in which each Web server holds different contents and load is distributed efficiently using Web server resource information such as CPU, memory, and disk utilization. Web servers holding different contents are interconnected and managed with a network file system to maintain the information consistency required to support resource information updates, deletions, and additions. Load imbalance among content groups caused by the distribution of contents can be alleviated by reassigning Web servers. Using simulation, we show that our method improves average throughput and processing time by up to 50% compared with systems using the LC method and the RR method.
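The resource-based selection described above can be sketched as a weighted utilization score. The `Server` type, the weights, and the utilization figures are all hypothetical; the abstract does not say how the paper combines CPU, memory, and disk information.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    cpu: float     # utilization in [0, 1]
    memory: float  # utilization in [0, 1]
    disk: float    # utilization in [0, 1]


def load_score(server: Server, w_cpu: float = 0.5,
               w_mem: float = 0.3, w_disk: float = 0.2) -> float:
    """Weighted utilization; the weights are illustrative assumptions."""
    return w_cpu * server.cpu + w_mem * server.memory + w_disk * server.disk


def pick_server(servers):
    """Send the next request to the server with the lowest combined load."""
    return min(servers, key=load_score)


# Usage: three servers within one content group.
group = [
    Server("web1", cpu=0.9, memory=0.4, disk=0.3),
    Server("web2", cpu=0.3, memory=0.5, disk=0.6),
    Server("web3", cpu=0.6, memory=0.6, disk=0.2),
]
print(pick_server(group).name)
```

Unlike round-robin or connection counting, this choice reacts to a server that is busy with a few large requests, which is the case the abstract identifies as problematic.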

A Dynamic Hashing Based Load Balancing for a Scalable Wireless Internet Proxy Server Cluster (확장성 있는 무선 인터넷 프록시 서버 클러스터를 위한 동적 해싱 기반의 부하분산)

  • Kwak, Hu-Keun;Kim, Dong-Seung;Chung, Kyu-Sik
    • The KIPS Transactions:PartA / v.14A no.7 / pp.443-450 / 2007
  • Performance scalability and storage scalability become important in a large-scale cluster of wireless Internet proxy cache servers. Performance scalability means that the overall performance of the cluster increases linearly as servers are added. Storage scalability means that the total size of cache storage in the cluster remains constant regardless of the number of cache servers used, provided that the cache data are partitioned and each partition is stored on a separate server. The round-robin-based load balancing method generally used in large-scale server clusters provides performance scalability but not storage scalability, because all requested URL data must be stored on every server. The hashing-based load balancing method provides storage scalability because the requested URL data are partitioned and each partition is stored on a separate server. However, it does not provide performance scalability when client requests are unevenly distributed or a hot spot occurs. In this paper, we propose a novel dynamic hashing method that provides both performance and storage scalability. In each time interval, the proposed scheme finds some of the requested URLs allocated to overloaded servers and dynamically reallocates them to less-loaded servers. We performed experiments using 16 PCs, and the results show that the proposed method provides both performance and storage scalability, unlike the existing hashing method.
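One way to picture the scheme above is a static hash plus a small override table that is updated at the end of each interval. This is a sketch under assumptions: the class name, the `overload_factor` threshold, the CRC32 hash, and the "move the hottest URL" rule are illustrative choices, not details given in the abstract.

```python
import zlib
from collections import Counter


def crc_hash(url: str) -> int:
    return zlib.crc32(url.encode("utf-8"))


class DynamicHashBalancer:
    def __init__(self, n_servers, overload_factor=1.5, hash_fn=crc_hash):
        self.n_servers = n_servers
        self.overload_factor = overload_factor  # hypothetical threshold
        self.hash_fn = hash_fn
        self.overrides = {}    # url -> server chosen by rebalancing
        self.hits = Counter()  # requests per URL in the current interval

    def server_for(self, url):
        """Static hash unless the URL was reallocated in an earlier interval."""
        return self.overrides.get(url, self.hash_fn(url) % self.n_servers)

    def route(self, url):
        self.hits[url] += 1
        return self.server_for(url)

    def rebalance(self):
        """End of interval: move the hottest URL of each overloaded server
        to the currently least-loaded server, then reset the counters."""
        load = [0] * self.n_servers
        for url, count in self.hits.items():
            load[self.server_for(url)] += count
        mean_load = sum(load) / self.n_servers
        for server in range(self.n_servers):
            if load[server] <= self.overload_factor * mean_load:
                continue
            urls = [u for u in self.hits if self.server_for(u) == server]
            if len(urls) < 2:
                continue  # moving the only URL would just move the hot spot
            hottest = max(urls, key=lambda u: self.hits[u])
            target = load.index(min(load))
            self.overrides[hottest] = target
            load[server] -= self.hits[hottest]
            load[target] += self.hits[hottest]
        self.hits.clear()


# Usage: route requests during an interval, then rebalance.
balancer = DynamicHashBalancer(n_servers=4)
for url in ["/news", "/news", "/news", "/img/logo.png", "/about"]:
    balancer.route(url)
balancer.rebalance()
```

Each URL still maps to exactly one server at any moment, so cached data stays partitioned (storage scalability), while the override table lets hot URLs migrate away from overloaded servers (performance scalability).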

A Study on Leakage Detection Technique Using Transfer Learning-Based Feature Fusion (전이학습 기반 특징융합을 이용한 누출판별 기법 연구)

  • YuJin Han;Tae-Jin Park;Jonghyuk Lee;Ji-Hoon Bae
    • The Transactions of the Korea Information Processing Society / v.13 no.2 / pp.41-47 / 2024
  • When models trained in the time and frequency domains differed in performance, we observed that the performance of their ensemble was compromised by the imbalance between the individual models. This paper therefore proposes a leakage detection technique that improves the accuracy of pipeline leakage detection through a step-wise learning approach that extracts features from both the time and frequency domains and integrates them. The method involves a two-stage learning process. In Stage 1, models are trained independently in the time and frequency domains to extract crucial features from the data in each domain. In Stage 2, the pre-trained models are reused with their classifiers removed; the features from both domains are then fused, and a new classifier is added for retraining. The proposed transfer learning-based feature fusion technique thus trains the model on integrated features extracted from the time and frequency domains. This integration exploits the complementary nature of the features from both domains, allowing the model to leverage diverse information. As a result, it achieved a high accuracy of 99.88%, demonstrating outstanding performance in pipeline leakage detection.