• Title/Summary/Keyword: Imbalance Problem

Planning and Establishment of Sejong City Smart City (세종시 스마트시티 구상 및 수립 방안)

  • Park, Jungsu;Jung, Hanmin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.161-163 / 2021
  • This urban centralization is expected to develop rapidly, with 75% of the population living in the city by 2035. Large cities are becoming unsustainable due to side effects such as environmental pollution, severe traffic jams, excessive energy depletion, and destruction of the natural ecosystem. In addition, the happiness index of citizens of large cities is also falling because of high crime rates and safety accidents, the work-life imbalance caused by inequality and polarization, and overly competitive education. To solve this problem, Smart City, an IT-based future city model, was born. The Korean government is also actively attempting to improve urban competitiveness and promote sustainable development through efficient construction and operation of smart cities as a national focus project. To support the effort, we review the basic directions and strategies of Sejong City's Smart City service infrastructure based on the comprehensive national land plan, Smart City plan, and Smart City strategy plan.

An Improved Coyote Optimization Algorithm-Based Clustering for Extending Network Lifetime in Wireless Sensor Networks

  • Venkatesh Sivaprakasam;Vartika Kulshrestha;Godlin Atlas Lawrence Livingston;Senthilnathan Arumugam
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.7 / pp.1873-1893 / 2023
  • The development of lightweight, low-energy, small-sized sensors integrated with wireless networks has brought about phenomenal growth of Wireless Sensor Networks (WSNs) across their fields of application. Moreover, the routing of data is crucial in a wide range of critical applications, including ecosystem monitoring, military and disaster management. However, time delay, energy imbalance, and reduced network lifetime are considered the key problems faced during data transmission. Furthermore, energy efficiency and network lifetime can be improved only when cluster head selection is available in WSNs. In addition, cluster head selection is regarded as an NP-hard optimization problem that can be effectively modelled using hybrid metaheuristic approaches. For this reason, an Improved Coyote Optimization Algorithm-based Clustering Technique (ICOACT) is proposed to extend network lifetime by making efficient choices of cluster heads while maintaining a consistent balance between exploitation and exploration. The Improved Coyote Optimization Algorithm (ICOA) addresses premature convergence and the tendency to become trapped in local optima by selecting a center solution that replaces the best solution in the search space during clustering. The simulation results of the proposed ICOACT confirmed its efficiency: it increased the number of alive nodes and the total number of clusters formed while achieving the least end-to-end delay and mean packet loss rate.

A Scalability based Energy Model for Sustainability of Blockchain Networks (블록체인 네트워크의 지속 가능성을 위한 확장성 기반 에너지 모델)

  • Seung Hyun Jeon;Bokrae Jung
    • Journal of Industrial Convergence / v.21 no.8 / pp.51-58 / 2023
  • Blockchains have recently struggled to realize ideal distributed trust networks by solving the scalability trilemma. However, local conflicts between some countries lead to an imbalance in energy distribution. Moreover, blockchain networks (e.g., Bitcoin) currently consume enormous amounts of energy for transactions and mining. The existing data-volume-based trust model evaluated an increasing blockchain size better than Lubin's trust model in the scalability trilemma. In this paper, we propose a scalability-based energy model to evaluate the sustainability of blockchain networks, considering the energy consumption for transactions, time duration, and the blockchain size of growing blockchain networks. Through rigorous numerical analysis, we compare the proposed scalability-based energy model with the existing model in terms of satisfaction and optimal blockchain size. Thus, the scalability-based energy model will provide an assessment tool for choosing proper blockchain networks to solve the scalability trilemma problem and prove sustainability.

Research on the Financial Data Fraud Detection of Chinese Listed Enterprises by Integrating Audit Opinions

  • Leiruo Zhou;Yunlong Duan;Wei Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.12 / pp.3218-3241 / 2023
  • Financial fraud undermines the sustainable development of financial markets. Financial statements can be regarded as the key source of information on the operating conditions of listed companies. Current research focuses more on mining numerical financial data than on examining text data. However, text data can reveal emotional information, which is an important basis for detecting financial fraud. The audit opinion on a financial statement, in particular, is the fair opinion of a certified public accountant on the quality of an enterprise's financial reports. Therefore, this research uses the data features of 4,153 listed companies' annual financial reports and audit opinion texts from the past six years, and proposes a financial fraud detection model integrating audit opinions. First, the financial data index database and the audit opinion text database were built. Second, the audit opinions were digitized with the deep learning BERT model. Finally, both the numerical features extracted from the audit opinions and the financial numerical indicators were used as the training data of the LightGBM model. Notably, the imbalanced distribution of sample labels is also one of the focuses of financial fraud research. To solve this problem, data enhancement and Focal Loss feature learning functions were used in data processing and model training, respectively. The experimental results show that, compared with the conventional financial fraud detection model, the performance of the proposed model is greatly improved, with Area Under the Curve (AUC) and Accuracy reaching 81.42% and 78.15%, respectively.
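
The abstract above names Focal Loss as its remedy for imbalanced labels but gives no formula or hyperparameters. The following is a minimal sketch of the standard binary focal loss, not the authors' implementation; the function name `focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for a single example (illustrative sketch).

    p     -- predicted probability that the label is 1
    y     -- true label, 0 or 1
    gamma -- focusing parameter (assumed default; 0 disables focusing)
    alpha -- weight of the positive class (assumed default)
    """
    # Probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Clamp to avoid log(0) on saturated predictions.
    p_t = min(max(p_t, eps), 1.0 - eps)
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified examples,
    # so hard (often minority-class) examples dominate the gradient.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # A confident correct prediction contributes far less than a confident miss.
    print(focal_loss(0.9, 1), focal_loss(0.1, 1))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half of the ordinary cross-entropy, which is a convenient sanity check.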

Generative AI Jeonse Fraud Prevention System (생성형 인공지능 전세 사기 방지 시스템)

  • Yeon-Jae Oh
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.1 / pp.173-180 / 2024
  • Along with its importance, the real estate market poses risks of various fraudulent activities. Recently, a surge in real estate-related scams, such as lease fraud, has caused great financial damage to many ordinary people. These problems are often caused by the complexity of real estate transactions and information imbalance. Therefore, there is an urgent need to secure reliability and improve transparency in the transaction process. In this paper, to solve this real estate fraud problem, we propose a chatbot system using digital technology and artificial intelligence, especially GPT (Generative Pre-Trained Transformer). This system serves to protect users from fraud by providing them with precautions and confirmations in the lease transaction process. In addition, GPT-based chatbots respond to questions from users in time, contributing to reducing uncertainty in the transaction process and increasing reliability.

A Diagnosis system of misalignments of linear motion robots using transfer learning (전이 학습을 이용한 선형 이송 로봇의 정렬 이상진단 시스템)

  • Su-bin Hong;Young-dae Lee;Arum Park;Chanwoo Moon
    • The Journal of the Convergence on Culture Technology / v.10 no.3 / pp.801-807 / 2024
  • Linear motion robots are devices that perform functions such as transferring parts or positioning devices, and require high precision. In companies that develop linear robot application systems, human workers are in charge of quality control and fault diagnosis of linear robots, and the result and accuracy of a fault diagnosis varies depending on the skill level of the person in charge. Recently, there have been many attempts to utilize artificial intelligence to diagnose faults in industrial devices. In this paper, we present a system that automatically diagnoses linear rail and ball screw misalignment of a linear robot using transfer learning. In industrial systems, it is difficult to obtain a lot of learning data, and this causes a data imbalance problem. In this case, a transfer learning model configured by retraining an established model is widely used. The information obtained by using an acceleration sensor and torque sensor was used, and its usefulness was evaluated for each case. After converting the signal obtained from the sensor into a spectrogram image, the type of abnormality was diagnosed using an image recognition artificial intelligence classifier. It is expected that the proposed method can be used not only for linear robots but also for diagnosing other industrial robots.

Decision Tree Induction with Imbalanced Data Set: A Case of Health Insurance Bill Audit in a General Hospital (불균형 데이터 집합에서의 의사결정나무 추론: 종합 병원의 건강 보험료 청구 심사 사례)

  • Hur, Joon;Kim, Jong-Woo
    • Information Systems Review / v.9 no.1 / pp.45-65 / 2007
  • In the medical industry, the health insurance bill audit is a unique and essential process in general hospitals. The process is very important not only for a hospital's profit but also for its reputation. In particular, at large general hospitals, many related workers, including analysts and nurses, are engaged in the health insurance bill audit process. This paper introduces a case of health insurance bill audit for finding reducible health insurance bill cases using decision tree induction techniques at a large general hospital in Korea. When supervised learning methods were applied, one of the major problems was the data imbalance in the health insurance bill audit data. In other words, there were many normal (passing) cases and a relatively small number of reduction cases in the bill audit dataset. To resolve the problem, this study combines well-known methods for imbalanced data sets, including oversampling of rare cases, undersampling of major cases, and adjusting the misclassification cost, in several ways to find appropriate decision trees that satisfy the required conditions of the health insurance bill audit situation.
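
The three remedies listed in the abstract above (oversampling rare cases, undersampling major cases, adjusting misclassification cost) can be sketched in a few lines. This is a generic illustration, not the procedure used in the paper; the function names, the random (rather than informed) sampling, and the inverse-frequency cost rule are all assumptions.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


def inverse_frequency_costs(labels):
    """Misclassification cost per class, inversely proportional to frequency."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy data: 8 passing bills, 2 reduction cases.
    labels = ["pass"] * 8 + ["reduce"] * 2
    rows = list(range(10))
    print(Counter(random_oversample(rows, labels)[1]))
    print(Counter(random_undersample(rows, labels)[1]))
    print(inverse_frequency_costs(labels))
```

Resampling should be applied to the training split only; resampling before the train/test split leaks duplicated rows into the evaluation data.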

An Aerodynamic Modeling and Simulation of a Folding Tandem Wing Type Aerial Launching UAV (접이식 직렬날개형 공중투하 무인비행체의 공력 모델링 및 시뮬레이션)

  • Lee, Seungjin;Lee, Jungmin;Ahn, Jeongwoo;Park, Jinyong
    • Journal of the Korea Society for Simulation / v.27 no.4 / pp.19-26 / 2018
  • An aerial launching UAV (Unmanned Aerial Vehicle) mainly uses a set of folding tandem wings to maximize flight performance and minimize the space required for mounting in a mothership. This folding tandem wing has unique aerodynamic problems that differ from those of general fixed-wing aircraft, such as interference on the rear wing caused by the wake and vortex of the front wing, and the imbalance of the pivot moments applied to the front and rear wings when the wings are deployed. In this paper, we modeled and simulated various cases through computational fluid dynamics based on the finite volume method and analyzed various aerodynamic phenomena of the tandem wing type aircraft. We find that the front wing should be installed higher than the rear wing to minimize the wake influence. Also, considering the pivot moment due to aerodynamic forces, the rear wing can be deployed much faster than the front wing. Therefore, this must be considered when developing the wing deployment mechanism.

A Study on the Prediction Model for Bioactive Components of Cnidium officinale Makino according to Climate Change using Machine Learning (머신러닝을 이용한 기후변화에 따른 천궁 생리 활성 성분 예측 모델 연구)

  • Hyunjo Lee;Hyun Jung Koo;Kyeong Cheol Lee;Won-Kyun Joo;Cheol-Joo Chae
    • Smart Media Journal / v.12 no.10 / pp.93-101 / 2023
  • Climate change has emerged as a global problem, with frequent temperature increases, droughts, and floods, and it is predicted to have a great impact on the characteristics and productivity of crops. Cnidium officinale is used not only as a traditional herbal medicine but also as a raw material for various industrial products such as health functional foods, natural medicines, and living materials, but its productivity is decreasing due to threats such as continuous crop damage and climate change. Therefore, this paper proposes a model that can predict the physiologically active ingredient index of Cnidium officinale, a representative medicinal crop vulnerable to climate change, according to climate change scenarios. In this paper, data were first augmented using the CTGAN algorithm to solve the problem of data imbalance in the collected environmental information, physiological reactions, and physiologically active ingredient information. Column Shape and Column Pair Trends were used to measure the quality of the augmented data, and an overall quality of 88% was achieved on average. In addition, five models (RF, SVR, XGBoost, AdaBoost, and LightGBM) were used to predict phenol and flavonoid content, separately for the aboveground and underground parts, using the augmented data. In the model evaluation, the XGBoost model showed the best performance in predicting the physiologically active ingredients of Cnidium officinale, and it was confirmed to be about twice as accurate as the SVR model.

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they must be based on strict assumptions. Such strict assumptions include linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machines (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically, and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Furthermore, SVM does not require many data samples for training, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can be potential causes of degraded SVM performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve performance in multi-class classification problems as much as SVM does for binary-class classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class computation, but they can deteriorate classification performance. Third, a difficulty in multi-class prediction problems is the data imbalance problem, which can occur when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built due to a skewed boundary, and thus reduce the classification accuracy of the classifier. SVM ensemble learning is one of the machine learning methods for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Thus, boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can consider the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of the classifiers over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
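
The abstract above reports both arithmetic mean-based and geometric mean-based accuracy without defining them. A minimal sketch follows, assuming both are averages of per-class recall, which is a common convention; the paper may define them differently, and the function names are hypothetical. It shows why the geometric mean punishes a classifier that ignores a minority class.

```python
import math
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}


def arithmetic_mean_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recalls (assumed definition)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return sum(recalls) / len(recalls)


def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recalls; zero if any class is never hit."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1.0 / len(recalls))


if __name__ == "__main__":
    # Hypothetical imbalanced ratings: 8 of class "A", 2 of class "B".
    y_true = ["A"] * 8 + ["B"] * 2
    always_a = ["A"] * 10  # a default classifier that ignores class "B"
    print(arithmetic_mean_accuracy(y_true, always_a))
    print(geometric_mean_accuracy(y_true, always_a))
```

Because the geometric mean collapses to zero when any single class is missed entirely, a gap between the two figures such as the one reported above points to uneven per-class performance.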