• Title/Summary/Keyword: Data imbalance

Load Balancing for Distributed Processing of Real-time Spatial Big Data Stream (실시간 공간 빅데이터 스트림 분산 처리를 위한 부하 균형화 방법)

  • Yoon, Susik;Lee, Jae-Gil
    • Journal of KIISE / v.44 no.11 / pp.1209-1218 / 2017
  • A variety of sensors are widely used these days, and it has become much easier to acquire spatial big data streams from various sources. Since spatial data streams have inherently skewed and dynamically changing distributions, the system must distribute the load among workers effectively. Previous solutions to this load imbalance problem are not directly applicable to processing spatial data. In this research, we propose Adaptive Spatial Key Grouping (ASKG). The main idea of ASKG is to use the previous distribution of the data streams to adaptively suggest a new grouping scheme that evenly distributes the future load among workers. We evaluate the proposed algorithm in various environments by conducting experiments with real datasets while varying the number of workers, input rate, and processing overhead. Compared with two alternative algorithms, ASKG improves system performance in terms of load imbalance, throughput, and latency.
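
The idea of planning a future grouping from a previously observed distribution can be illustrated with a small sketch. This is not the ASKG algorithm, whose details the abstract does not give; it is a generic greedy heuristic (heaviest key first, assigned to the least-loaded worker). The function names `plan_grouping` and `load_imbalance`, the cell keys, and the counts are all hypothetical.

```python
import heapq

def plan_grouping(previous_counts, num_workers):
    """Greedily map keys to workers using previously observed counts.

    Keys are processed from heaviest to lightest, and each key goes to
    the worker with the smallest planned load so far.
    """
    heap = [(0, w) for w in range(num_workers)]  # (planned load, worker id)
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(previous_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, count in ordered:
        load, worker = heapq.heappop(heap)
        assignment[key] = worker
        heapq.heappush(heap, (load + count, worker))
    return assignment

def load_imbalance(previous_counts, assignment, num_workers):
    """Return max worker load divided by mean worker load (1.0 is perfectly even)."""
    loads = [0] * num_workers
    for key, worker in assignment.items():
        loads[worker] += previous_counts[key]
    mean = sum(loads) / num_workers
    return max(loads) / mean if mean else 0.0

# Hypothetical tuple counts per spatial cell in the previous time window.
counts = {"cell_a": 50, "cell_b": 30, "cell_c": 20, "cell_d": 20}
assignment = plan_grouping(counts, num_workers=2)
imbalance = load_imbalance(counts, assignment, num_workers=2)
```

A real stream processor would recompute such a plan periodically as the distribution drifts; how often, and at what migration cost, is outside this sketch.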

Classification Algorithm-based Prediction Performance of Order Imbalance Information on Short-Term Stock Price (분류 알고리즘 기반 주문 불균형 정보의 단기 주가 예측 성과)

  • Kim, S.W.
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.157-177 / 2022
  • Investors trade stocks while keeping a close watch, in real time, on the orders submitted by domestic and foreign investors through the Limit Order Book, the so-called price current provided by securities firms. Is the order information released in the Limit Order Book useful for stock price prediction? This study analyzes whether order imbalance, which appears when investors' buy and sell orders are concentrated on one side during intraday trading, is a significant predictor of whether the stock price will subsequently rise or fall. Using classification algorithms, this study improved the accuracy with which order imbalance information predicts the short-term price trend, that is, whether the day's closing price rises or falls. Day trading strategies are proposed using the price trends predicted by the classification algorithms, and their trading performance is analyzed empirically. The 5-minute KOSPI200 Index Futures data were analyzed for 4,564 days from January 19, 2004 to June 30, 2022. The results of the empirical analysis are as follows. First, order imbalance information has a significant impact on current stock prices. Second, the order imbalance information observed in the early morning has significant forecasting power for the price trend from the early morning to the market close. Third, using the order imbalance information, the Support Vector Machines algorithm showed the highest prediction accuracy for the day's closing price trend, at 54.1%. Fourth, order imbalance information measured early in the day had higher prediction accuracy than order imbalance information measured later in the day. Fifth, the day trading strategies that used the classification algorithms' predictions of price trends outperformed the benchmark trading strategy. Sixth, except for the K-Nearest Neighbor algorithm, all strategies using the classification algorithms showed higher average total profits than the benchmark strategy. Seventh, the strategies using the predictions of the Logistic Regression, Random Forest, Support Vector Machines, and XGBoost algorithms scored higher than the benchmark strategy on the Sharpe Ratio, which evaluates both profitability and risk. This study differs from existing studies in that it documents the economic value of the total buy and sell order volume information in the Limit Order Book. The empirical results are also valuable to market participants from a trading perspective. Future studies should improve the performance of the trading strategy with more accurate price predictions by extending the approach to deep learning models, which are being actively studied for stock price prediction.
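
For readers unfamiliar with the two quantities this abstract relies on, the sketch below shows one common way to compute them. The abstract does not state the paper's exact definitions, so both formulas are assumptions: order imbalance as the normalized difference between buy and sell volume, and the Sharpe Ratio as mean excess return divided by its sample standard deviation. The function names and all numbers are hypothetical.

```python
import statistics

def order_imbalance(buy_volume, sell_volume):
    """Normalized order imbalance in [-1, 1]; positive means buy pressure.

    Assumed definition: (buy - sell) / (buy + sell).
    """
    total = buy_volume + sell_volume
    if total == 0:
        return 0.0  # no orders on either side, so no imbalance
    return (buy_volume - sell_volume) / total

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)  # needs at least two observations
    if spread == 0:
        raise ValueError("returns have zero variance")
    return statistics.mean(excess) / spread

# Hypothetical values for illustration only.
imbalance = order_imbalance(buy_volume=600, sell_volume=400)
ratio = sharpe_ratio([0.01, 0.03])
```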

Traffic Data Generation Technique for Improving Network Attack Detection Using Deep Learning (네트워크 공격 탐지 성능향상을 위한 딥러닝을 이용한 트래픽 데이터 생성 연구)

  • Lee, Wooho;Hahm, Jaegyoon;Jung, Hyun Mi;Jeong, Kimoon
    • Journal of the Korea Convergence Society / v.10 no.11 / pp.1-7 / 2019
  • Recently, various approaches to detecting network attacks using machine learning have been studied and are being applied to detect new attacks and to increase precision. However, machine learning methods depend on feature extraction, which is time-consuming and complex. Their performance is also limited by imbalance in the training data. In this study, we propose a method to address the degradation of classification performance caused by training data imbalance, one of the limitations of detection systems. To do this, we generate data using Generative Adversarial Networks (GANs) and propose a classification method using Convolutional Neural Networks (CNNs). We confirm that this approach improves accuracy on the NSL-KDD and UNSW-NB15 datasets.

A Study on the Classification of Fault Motors using Sound Data (소리 데이터를 이용한 불량 모터 분류에 관한 연구)

  • Il-Sik, Chang;Gooman, Park
    • Journal of Broadcast Engineering / v.27 no.6 / pp.885-896 / 2022
  • Motor failure in manufacturing has a significant effect on future after-sales service (A/S) and reliability. Motor failure is detected by measuring sound, current, and vibration. The data used in this paper are sounds from the gearbox of a car's side mirror motor. The motor sounds fall into three classes. The sound data are input to the network model after conversion with MelSpectrogram. To improve the performance of fault motor classification, we applied data augmentation as well as several methods for handling class imbalance: resampling, reweighting, changing the loss function, and a two-stage approach that separates representation learning from classification. In addition, curriculum learning and self-paced learning were compared across a total of five network models (Bidirectional LSTM Attention, Convolutional Recurrent Neural Network, Multi-Head Attention, Bidirectional Temporal Convolution Network, and Convolution Neural Network), and the optimal configuration for motor sound classification was found.
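
Reweighting, one of the class-imbalance methods named above, is often done with inverse-frequency class weights. The sketch below shows that common scheme as an illustration; the abstract does not say which weighting the paper used, and the class names and label counts here are hypothetical (only the number of classes, three, comes from the abstract).

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return a weight per class proportional to 1 / class frequency.

    Weights are scaled as n / (k * count), so a perfectly balanced
    dataset gives every class a weight of 1.0.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}

# Hypothetical label distribution over three motor sound classes.
labels = ["normal"] * 8 + ["fault_a"] * 1 + ["fault_b"] * 1
weights = inverse_frequency_weights(labels)
```

Such weights are typically passed to a weighted loss so that errors on rare classes cost more than errors on the majority class.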

Image-Based Skin Cancer Classification System Using Attention Layer (Attention layer를 활용한 이미지 기반 피부암 분류 시스템)

  • GyuWon Lee;SungHee Woo
    • Journal of Practical Engineering Education / v.16 no.1_spc / pp.59-64 / 2024
  • As the aging population grows, the incidence of cancer is increasing. Skin cancer appears externally, but people often do not notice it or simply overlook it. As a result, if the window for early detection is missed, the survival rate for late-stage cancer is only 7.5-11%. However, a drawback of diagnosing serious skin cancer is that it requires considerable time and money for detailed examinations and cell tests, rather than a simple visual diagnosis. To overcome these challenges, we propose a skin cancer classification system based on an Attention-based CNN model. If skin cancer can be detected early, it can be treated quickly, and the proposed system can greatly help the work of a specialist. To mitigate the imbalance of image data across skin cancer types, the classification model applies the Over Sampling technique to data with a high distribution ratio, and adds a pre-trained model without an Attention layer. This model is then compared to the model without the Attention layer. We also plan to address the data imbalance problem by strengthening data augmentation techniques for specific classes.

Local Imbalance of Emergency Medical Services(EMS): Analyses on 119 EMS Activity Reports of Busan (구급서비스의 지역 불균형: 부산시 119 구급활동일지 분석)

  • Lee, Dalbyul
    • Journal of the Korean Association of Geographic Information Studies / v.23 no.3 / pp.161-173 / 2020
  • This study analyzed local imbalances in the supply and demand of emergency medical services in Busan using the 119 emergency activity reports of the Busan Fire & Disaster Headquarters. The data from the 2017 EMS activity reports were converted into Jimgyegu units. The spatial distribution of the indicators representing the local imbalance of emergency demand and supply (number of reports, number of reports relative to the population, average coefficient of variation and outlier of on-site arrival time, and number of dispatches outside the jurisdiction) was analyzed using hot spot analysis from GIS spatial statistics. As a result of the analysis, the hot spot area and the cold spot area where both supply and demand of emergency services are concentrated were clearly distinguished. This means that the supply and demand of emergency services in Busan are locally unbalanced. In particular, the demand and supply of emergency services differed between the original downtown and its surrounding areas and the outskirts of Busan.

Antecedents and consequences of trust and commitment in apparel manufacturer-contractor relationships: The moderating role of length of relationship (국내 패션기업과 협력업체와의 관계에서 신뢰와 몰입에 영향을 미치는 변인: 관계 기간의 조절 효과)

  • Park, Na Ri;Park, Jae-Ok
    • The Research Journal of the Costume Culture / v.21 no.2 / pp.220-233 / 2013
  • This study examined the moderating effect of length of relationship on the relationships among the antecedent variables of trust and commitment (i.e., specific investment, opportunistic behavior, communication, uncertainty, interdependence, power imbalance, shared value, and flexibility), trust and commitment themselves, and firm performance and relationship satisfaction. A total of 128 apparel manufacturers participated in this study. In short-term relationships, flexibility had the strongest positive effect on trust, followed by specific investment, while opportunistic behavior had a negative effect on trust. Commitment was most negatively affected by power imbalance, followed by interdependence. Trust was significantly affected by communication, shared value, and flexibility in short-term relationships. In long-term relationships, commitment was significantly affected by uncertainty, interdependence, power imbalance, and flexibility. Firm performance was positively affected by both trust and commitment. Relationship satisfaction was also affected by both trust and commitment. With respect to the length of relationship, firm performance was affected by both trust and commitment, and relationship satisfaction was likewise affected by both. The results of this research provide valuable data for making concrete suggestions on strategies to improve trust and commitment and so foster a desirable relationship between apparel manufacturers and contractors.

Effort-reward Imbalance at Work, Parental Support, and Suicidal Ideation in Adolescents: A Cross-sectional Study from Chinese Dual-earner Families

  • Li, Jian;Loerbroks, Adrian;Siegrist, Johannes
    • Safety and Health at Work / v.8 no.1 / pp.77-83 / 2017
  • Background: In contemporary China, most parents are dual-earner couples and there is only one child in the family. We aimed to examine the associations of parents' work stress with suicidal ideation among the corresponding adolescent. We further hypothesized that low parental support experienced by adolescents may mediate the associations. Methods: Cross-sectional data from school students and their working parents were used, with 907 families from Kunming City, China. Stress at work was measured by the effort-reward imbalance questionnaire. Perceived parental support was assessed by an item on parental empathy and their willingness to communicate with the adolescent. Suicidal ideation was considered positive if students reported thoughts about suicide every month or more frequently during the previous 6 months. Logistic regression was used to examine the associations. Results: We observed that parents' work stress was positively associated with low parental support, which was in turn associated with adolescent suicidal ideation. The odds ratio for parents' work stress and adolescent suicidal ideation was 2.91 (95% confidence interval: 1.53-5.53), and this association was markedly attenuated to 2.24 (95% confidence interval: 1.15-4.36) after additional adjustment for parental support. Notably, mothers' work stress levels exerted stronger effects on children's suicidal ideation than those of fathers. Conclusion: Parents' work stress (particularly mother's work stress) was strongly associated with adolescent's suicidal ideation, and the association was partially mediated by low parental support. These results need to be replicated and extended in prospective investigations within and beyond China, in order to explore potential causal pathways as a basis of preventive action.

Consensus-Based Distributed Algorithm for Optimal Resource Allocation of Power Network under Supply-Demand Imbalance (수급 불균형을 고려한 전력망의 최적 자원 할당을 위한 일치 기반의 분산 알고리즘)

  • Young-Hun, Lim
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.15 no.6 / pp.440-448 / 2022
  • Recently, with the introduction of distributed energy resources, the optimal resource allocation problem of the power network has become increasingly important, and distributed resource allocation methods are required to process the huge amounts of data in large-scale power networks. Many studies of the optimal resource allocation problem address the case where the supply-demand balance is satisfied within the generation capacity limits of each generator, but few consider the supply-demand imbalance, in which total demand exceeds the maximum generation capacity. In this paper, we propose a consensus-based distributed algorithm for the optimal resource allocation of a power network that considers the supply-demand imbalance condition as well as the supply-demand balance condition. The proposed distributed algorithm is designed to allocate the optimal resources when the supply-demand balance condition is satisfied, and to measure the amount of required resources when supply and demand are imbalanced. Finally, we conduct simulations to verify the performance of the proposed algorithm.
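
The building block behind consensus-based distributed algorithms can be shown with a small sketch of average consensus. This is not the paper's algorithm, which the abstract does not specify; it only illustrates how nodes exchanging values with their neighbors can agree on a network-wide average, here the average supply-demand mismatch. The ring topology, step size, function names, and mismatch values are all hypothetical.

```python
def consensus_step(values, neighbors, step=0.3):
    """One synchronous update: each node moves toward its neighbors' values."""
    return [
        x + step * sum(values[j] - x for j in neighbors[i])
        for i, x in enumerate(values)
    ]

def run_consensus(values, neighbors, iterations=200, step=0.3):
    """Repeat the update; on a connected undirected graph with a small
    enough step, every node converges to the average of the initial values."""
    for _ in range(iterations):
        values = consensus_step(values, neighbors, step)
    return values

# Hypothetical ring of four nodes; each node only talks to two neighbors.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
local_mismatch = [4.0, -1.0, 2.0, -1.0]  # assumed local demand minus supply
agreed = run_consensus(local_mismatch, neighbors)
```

No node ever sees the whole network, yet each ends up holding the global average (1.0 here). The step size must stay below the reciprocal of the largest node degree for this simple update to converge.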

Enabling Efficient Verification of Dynamic Data Possession and Batch Updating in Cloud Storage

  • Qi, Yining;Tang, Xin;Huang, Yongfeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.6 / pp.2429-2449 / 2018
  • Dynamic data possession verification is a common requirement in cloud storage systems. After the client outsources its data to the cloud, it needs not only to check the integrity of its data but also to verify whether updates are executed correctly. Previous research has proposed various schemes based on the Merkle Hash Tree (MHT) and implemented some initial improvements to prevent tree imbalance. This paper takes one step further and asks: are there still problems left to optimize? We study how to raise the efficiency of data dynamics by improving query and rebalancing, using a new data structure called the Rank-Based Merkle AVL Tree (RB-MAT). Furthermore, we fill the gap of verifying multiple update operations at the same time with a novel batch updating scheme. The experimental results show that our scheme is more efficient than existing methods.
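
As background for the Merkle Hash Tree mentioned above, the sketch below computes the root hash of a plain binary Merkle tree over data blocks. It is not RB-MAT and includes no ranks, AVL rebalancing, or batch updates; it only shows why a single root value lets a client detect any change to outsourced blocks. The choice of SHA-256, the leaf and node prefixes, and the duplicate-last-node rule for odd levels are assumptions of this sketch.

```python
import hashlib

def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Return the root hash of a binary Merkle tree built over `blocks`.

    Leaves and internal nodes use different prefixes so that a leaf
    hash cannot be confused with an internal node hash.
    """
    if not blocks:
        raise ValueError("at least one block is required")
    level = [_hash(b"\x00" + block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            _hash(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]

# Hypothetical data blocks held by the cloud server.
blocks = [b"block-0", b"block-1", b"block-2"]
root = merkle_root(blocks)
tampered_root = merkle_root([b"block-0", b"block-X", b"block-2"])
```

The client keeps only `root`; a modified block yields a different root, so the mismatch between `root` and `tampered_root` reveals the change.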