• Title/Abstract/Keywords: Imbalance data

493 search results (processing time: 0.027 seconds)

Implementation of Asymmetric Communication for Asynchronous Iteration by the MPMD Method on Distributed Memory Systems

  • 박필성
    • 인터넷정보학회논문지
    • /
    • Vol. 4, No. 5
    • /
    • pp.51-60
    • /
    • 2003
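  • An asynchronous iteration algorithm is one way to mitigate the performance degradation of parallel algorithms caused by load imbalance and communication delays between computer nodes, but it requires asymmetric data transmission among the nodes. In this paper, we implement asymmetric communication on a distributed memory system with the MPMD method by creating an additional, separate server process on each node. We compare this with the SPMD method, which creates a single process per node, and discuss the advantages and disadvantages of each.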

Enhancing E-commerce Security: A Comprehensive Approach to Real-Time Fraud Detection

  • Sara Alqethami;Badriah Almutanni;Walla Aleidarousr
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 24, No. 4
    • /
    • pp.1-10
    • /
    • 2024
  • In the era of big data, the growth of e-commerce transactions brings forth both opportunities and risks, including the threat of data theft and fraud. To address these challenges, an automated real-time fraud detection system leveraging machine learning was developed. Four algorithms (Decision Tree, Naïve Bayes, XGBoost, and Neural Network) underwent comparison using a dataset from a clothing website that encompassed both legitimate and fraudulent transactions. The dataset exhibited an imbalance, with 9.3% representing fraud and 90.07% legitimate transactions. Performance evaluation metrics, including Recall, Precision, F1 Score, and AUC ROC, were employed to assess the effectiveness of each algorithm. XGBoost emerged as the top-performing model, achieving an impressive accuracy score of 95.85%. The proposed system proves to be a robust defense mechanism against fraudulent activities in e-commerce, thereby enhancing security and instilling trust in online transactions.
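The abstract above evaluates models on an imbalanced dataset with Recall, Precision, and F1 Score in addition to accuracy. The following is a minimal sketch of why those metrics matter under imbalance. The confusion-matrix counts are hypothetical (1,000 transactions, 93 of them fraudulent) and are not taken from the paper; the function name `metrics` is likewise an illustrative assumption.

```python
def metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical classifier that labels every transaction as legitimate:
# it misses all 93 fraud cases yet still scores high accuracy.
always_legit = metrics(tp=0, fp=0, fn=93, tn=907)

# Hypothetical classifier that catches 80 of the 93 fraud cases
# at the cost of 20 false alarms.
detector = metrics(tp=80, fp=20, fn=13, tn=887)

for name, m in (("always legitimate", always_legit), ("detector", detector)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In this sketch the trivial classifier reaches an accuracy above 0.9 with a recall of zero, which is why accuracy alone says little about fraud detection quality on imbalanced data.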

Study on Accelerating Distributed ML Training in Orchestration

  • Su-Yeon Kim;Seok-Jae Moon
    • International journal of advanced smart convergence
    • /
    • Vol. 13, No. 3
    • /
    • pp.143-149
    • /
    • 2024
  • As the size of data and models in machine learning training continues to grow, training on a single server is becoming increasingly challenging. Consequently, the importance of distributed machine learning, which distributes computational loads across multiple machines, is becoming more prominent. However, several unresolved issues remain regarding the performance enhancement of distributed machine learning, including communication overhead, inter-node synchronization challenges, data imbalance and bias, as well as resource management and scheduling. In this paper, we propose ParamHub, which utilizes orchestration to accelerate training speed. This system monitors the performance of each node after the first iteration and reallocates resources to slow nodes, thereby speeding up the training process. This approach ensures that resources are appropriately allocated to nodes in need, maximizing the overall efficiency of resource utilization and enabling all nodes to perform tasks uniformly, resulting in a faster training speed overall. Furthermore, this method enhances the system's scalability and flexibility, allowing for effective application in clusters of various sizes.

A Study on the Improvement of Image Classification Performance in the Defense Field through Cost-Sensitive Learning of Imbalanced Data

  • 정미애;마정목
    • 한국군사과학기술학회지
    • /
    • Vol. 24, No. 3
    • /
    • pp.281-292
    • /
    • 2021
  • With the development of deep learning technology, researchers and technicians keep attempting to apply deep learning in various industrial and academic fields, including defense. Most of these attempts assume that the data are balanced. In reality, much of the data is imbalanced, so the classifier is not properly built and the model's performance can be low. Therefore, this study proposes cost-sensitive learning as a solution to the imbalanced data problem of image classification in the defense field. In the proposed model, cost-sensitive learning assigns a higher weight to the minority class in the cost function. Across 160 experiments using a submarine/non-submarine dataset and a warship/non-warship dataset, the test F1-score is higher when cost-sensitive learning is applied than with general learning. Furthermore, statistical tests are conducted and the results are shown to be statistically significant.
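The abstract above describes cost-sensitive learning as weighting the minority class more heavily in the cost function. A minimal sketch of that idea is a class-weighted cross-entropy. The inverse-frequency weighting rule, the function names, and the toy labels below are assumptions for illustration; the abstract does not state which weighting scheme the authors used.

```python
import math

def class_weights(labels):
    """Inverse-frequency weights (an assumed scheme): n_samples / (n_classes * count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p[y], so minority-class errors cost more."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(p[y])
        weight_sum += w
    return total / weight_sum

# Toy data: 9 majority-class samples (0) and 1 minority-class sample (1).
labels = [0] * 9 + [1]
weights = class_weights(labels)

# A model that always favours the majority class.
probs = [[0.9, 0.1]] * len(labels)

uniform = {0: 1.0, 1: 1.0}
plain_loss = weighted_cross_entropy(probs, labels, uniform)
cost_sensitive_loss = weighted_cross_entropy(probs, labels, weights)
print(weights, round(plain_loss, 4), round(cost_sensitive_loss, 4))
```

With these weights the single misclassified minority sample contributes as much to the loss as all nine majority samples together, so a model biased toward the majority class is penalised more strongly than under the unweighted loss.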

Evaluating AI Models and Predictors for COVID-19 Infection Dependent on Data from Patients with Cancer or Not: A Systematic Review

  • Takdon Kim;Heeyoung Lee
    • 한국임상약학회지
    • /
    • Vol. 34, No. 3
    • /
    • pp.141-154
    • /
    • 2024
  • Background: As preexisting comorbidities are risk factors for Coronavirus Disease 2019 (COVID-19), improved tools are needed for screening or diagnosing COVID-19 in clinical practice. Difficulties of including vulnerable patient data may create data imbalance and hinder the provision of well-performing prediction tools, such as artificial intelligence (AI) models. Thus, we systematically reviewed studies on AI prognosis prediction in patients infected with COVID-19 and existing comorbidities, including cancer, to investigate model performance and predictors dependent on patient data. PubMed and Cochrane Library databases were searched. This study included research meeting the criteria of using AI to predict outcomes in COVID-19 patients, whether they had cancer or not. Preprints, abstracts, reviews, and animal studies were excluded from the analysis. The majority of non-cancer studies (54.55 percent) showed an area under the curve (AUC) of >0.90 for AI models, whereas 30.77 percent of cancer studies showed the same result. For predicting mortality (3.85 percent), severity (8.33 percent), and hospitalization (14.29 percent), only cancer studies showed AUC values between 0.50 and 0.69. The distribution of comorbidity data varied more in non-cancer studies than in cancer studies, but age was indicated as the primary predictor in all studies. Non-cancer studies with more balanced datasets of comorbidities showed higher AUC values than cancer studies. Based on the current findings, dataset balancing is essential for improving AI performance in predicting COVID-19 in patients with comorbidities, especially considering age.

Electricity forecasting model using specific time zone

  • 신이레;윤상후
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 27, No. 2
    • /
    • pp.275-284
    • /
    • 2016
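  • Accurate electricity demand forecasting reduces energy consumption and prevents imbalance between electricity supply and demand. This study builds time series models that use, as a reference line, the daily electricity demand at a specific time of day that is least affected by external factors. The time series models considered are the double seasonal Holt-Winters model with a sliding window and the TBATS model. The model parameters were estimated using data from January 4, 2009 to December 31, 2011, and performance was compared by forecasting electricity demand with each model from January 1, 2012 to December 29, 2012. Comparing forecasting performance by RMSE and MAPE, the TBATS model performed better.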
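The abstract above compares forecasting models by RMSE and MAPE. As a minimal sketch, the two metrics can be computed as follows. The demand values are hypothetical and unrelated to the paper's data, and the function names are illustrative.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error: sqrt(mean((a - f)^2))."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )

def mape(actual, forecast):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100.0 * sum(
        abs((a - f) / a) for a, f in zip(actual, forecast)
    ) / len(actual)

# Hypothetical daily demand values and forecasts (arbitrary units).
actual = [100.0, 200.0]
forecast = [110.0, 180.0]

print(round(rmse(actual, forecast), 3), round(mape(actual, forecast), 3))
```

RMSE is expressed in the units of the demand series and penalises large errors more heavily, while MAPE is scale-free, which makes the pair a common choice when comparing models on the same series.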

The Research of Difference between Public and Private Section : Sort by Region in China

  • 김영길;안진예;김수욱
    • 한국경영과학회지
    • /
    • Vol. 40, No. 1
    • /
    • pp.139-154
    • /
    • 2015
  • This paper uses the Heckman model to evaluate the income difference between the public sector and the private sector based on the CHNS data. The research finds that the public-private income difference between the west area and the east area was about 10% from 1989 to 2000, with a smooth transition, and then increased sharply to 32% from 2000 to 2011. For the income difference between the west area and the central area, and between the central area and the east area, the figure was about 10~15% from 1989 to 1997; from 2000 to 2011 it rose rapidly, reaching 20%. This paper shows that the income difference between the public sector and the private sector is increasing year after year, and that the economy is developing rapidly but with imbalance among different areas in China. It provides a reference for adjusting the income distribution system in the future.

Track servo patterns spacing optimization using least mean square estimation algorithm for holographic data storage

  • 임성용;이종진;이재성;정우영;양현석;박노철;박영필
    • 정보저장시스템학회논문집
    • /
    • Vol. 9, No. 1
    • /
    • pp.5-9
    • /
    • 2013
  • Page-oriented holographic data storage (HDS) is very sensitive to disturbances. However, the vibration effect of disc imbalance can be ignored because data pages are recorded and retrieved with stop-go rotation. Therefore, estimating only the de-track due to disc eccentricity is enough to construct a stable track servo system. In this paper, we propose a method for optimizing the spacing of track servo patterns using the Least Mean Square (LMS) estimation algorithm. Through the pattern spacing optimization, storage density can be maximized.
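The abstract above relies on LMS estimation of the de-track caused by disc eccentricity. The following is a minimal sketch of a generic LMS update, not the authors' method. It assumes, for illustration only, that the eccentricity-induced de-track is a once-per-revolution sinusoid d = a*cos(theta) + b*sin(theta); the step size, sample layout, and coefficient values are hypothetical.

```python
import math

def lms_estimate(samples, mu=0.1):
    """Estimate (a, b) in d = a*cos(theta) + b*sin(theta) with the LMS rule."""
    a_hat, b_hat = 0.0, 0.0
    for theta, d in samples:
        c, s = math.cos(theta), math.sin(theta)
        error = d - (a_hat * c + b_hat * s)  # prediction error for this sample
        a_hat += mu * error * c              # w <- w + mu * error * x
        b_hat += mu * error * s
    return a_hat, b_hat

# Hypothetical noise-free de-track measurements:
# 40 disc revolutions, 50 servo-pattern samples per revolution.
true_a, true_b = 3.0, -1.5
samples = []
for n in range(2000):
    theta = 2 * math.pi * n / 50
    samples.append((theta, true_a * math.cos(theta) + true_b * math.sin(theta)))

a_hat, b_hat = lms_estimate(samples)
print(round(a_hat, 6), round(b_hat, 6))
```

In this sketch the number of samples per revolution stands in for the servo pattern spacing: fewer patterns leave more area for data but give the estimator fewer measurements per revolution, which is the kind of trade-off a spacing optimization has to weigh.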

Epidemiology of trigeminal neuralgia: an electronic population health data study in Korea

  • Lee, Cheol-Hyeong;Jang, Ho-Yeon;Won, Hyung-Sun;Kim, Ja-Sook;Kim, Yeon-Dong
    • The Korean Journal of Pain
    • /
    • Vol. 34, No. 3
    • /
    • pp.332-338
    • /
    • 2021
  • Background: Trigeminal neuralgia (TN) is one of the most painful disorders in the orofacial region, and many patients have suffered from this disease. For the effective management of TN, fundamental epidemiologic data related to the target population group are essential. Thus, this study was performed to clarify the epidemiological characteristics of TN in the Korean population. This is the first national study to investigate the prevalence of TN in Korean patients. Methods: From 2014 to 2018, population-based medical data for 51,276,314 subscribers to the National Health Insurance Service of Korea were used for this study. Results: The incidence of TN was 100.21 per 100,000 person-years in 2018 in Korea, and the male to female ratio was 1:2.14. The age group of 51-59 years had the highest prevalence of TN. Constant increases in medical cost, regional imbalance, and differences in prescription patterns by medical specialty were observed in the management of TN. Conclusions: The results of this study will not only help to study the characteristics of TN, but also serve as an important basis for the effective management of TN in Korea.

Intrusion Detection Approach using Feature Learning and Hierarchical Classification

  • 이한성;정윤희;정세훈
    • 한국전자통신학회논문지
    • /
    • Vol. 19, No. 1
    • /
    • pp.249-256
    • /
    • 2024
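  • Machine learning-based intrusion detection methodologies require a large and balanced amount of training data for each class to be classified, and they have the problem that the entire system must be retrained whenever an attack type to be detected or classified is added. In this paper, we propose an intrusion detection methodology that uses feature learning and hierarchical classification to address both classification with relatively little training data and the data imbalance problem, and that makes it easy to add new attack types. The feasibility of the proposed system was verified through experiments using the KDD intrusion detection data.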