• Title/Summary/Keyword: machine-learning method


Study on Anomaly Detection Method of Improper Foods using Import Food Big data (수입식품 빅데이터를 이용한 부적합식품 탐지 시스템에 관한 연구)

  • Cho, Sanggoo;Choi, Gyunghyun
    • The Journal of Bigdata / v.3 no.2 / pp.19-33 / 2018
  • Owing to FTAs, growing food trade, and increasingly diverse consumer preferences, food imports have increased at a tremendous rate every year. While inspection covers only about 20% of total food imports, the budget and manpower available for the government's import inspection are reaching their limits, and sudden imported-food incidents can cause enormous social and economic losses. A predictive system that forecasts the compliance of imported food, combined with preemptive measures, would therefore greatly improve the efficiency and effectiveness of import safety control. A huge amount of data has already accumulated from past inspections, and processed foods account for 75% of total food imports. Big data analysis techniques can extract meaningful information from such large volumes of data, yet few studies have analyzed imported food and its implications using this data. In this context, this study applied a variety of classification algorithms from machine learning and proposed a data preprocessing method that generates new derived variables to improve model accuracy. The study also compared the performance of the predictive classification algorithms against general base classifiers. Among the base classifiers, the Gaussian Naïve Bayes model showed the best performance in detecting and predicting nonconforming imported food. The resulting anomaly detection model is expected to reduce the burden of import food inspection and raise the detection rate of nonconformity, improving both the efficiency of food import safety control and the speed of customs clearance.
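
A minimal sketch of the base-classifier comparison this abstract describes, written with scikit-learn; the CSV path, column names, and competitor models are hypothetical stand-ins, not the paper's actual setup:

```python
# Compare Gaussian Naive Bayes against other base classifiers for
# flagging nonconforming import declarations (binary target).
# Dataset path and column names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

df = pd.read_csv("import_declarations.csv")      # hypothetical file
X = df.drop(columns=["nonconforming"])           # derived variables
y = df["nonconforming"]                          # 1 = failed inspection

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [("GaussianNB", GaussianNB()),
                  ("LogisticRegression", LogisticRegression(max_iter=1000)),
                  ("DecisionTree", DecisionTreeClassifier(max_depth=5))]:
    clf.fit(X_tr, y_tr)
    print(name, f1_score(y_te, clf.predict(X_te)))
```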

A study on time series linkage in the Household Income and Expenditure Survey (가계동향조사 지출부문 시계열 연계 방안에 관한 연구)

  • Kim, Sihyeon;Seong, Byeongchan;Choi, Young-Geun;Yeo, In-kwon
    • The Korean Journal of Applied Statistics / v.35 no.4 / pp.553-568 / 2022
  • The Household Income and Expenditure Survey is a representative survey of Statistics Korea that measures and analyzes national income and consumption levels, and their changes, by examining the current state of household balances. Recently, the discontinuity in these time series caused by large-scale reorganizations of the survey methods in 2017 and 2019 has become an issue. In this study, we model the characteristics of the Household Income and Expenditure Survey time series up to 2016 and use the fitted models to compute forecasts for linking the expenditure series of 2017 and 2018. To reflect the characteristics of all expenditure item series evenly and to reduce the impact of any single forecast model, we synthesize a total of eight models, including regression models, time series models, and machine learning techniques. A noteworthy aspect of this study is that it improves the forecasts by using the optimal combination technique, which exactly reflects the hierarchical structure of the survey without the loss of information incurred by top-down or bottom-up methods. Applying the proposed method to forecast the expenditure series from 2017 to 2019 contributed to restoring the time series linkage and improved the forecasts. In addition, the hierarchical time series forecasts produced by the optimal combination method yield linkage results closer to the actual survey series.
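
The optimal combination technique mentioned above reconciles independently produced forecasts of every series so that they respect the hierarchy. A minimal numpy sketch of the OLS variant, with a toy two-level hierarchy (one total, three expenditure items) standing in for the survey's structure:

```python
# OLS optimal-combination reconciliation of hierarchical forecasts.
# The hierarchy and base forecasts here are toy assumptions.
import numpy as np

# Summing matrix S: rows = all series (total + 3 items), cols = bottom items.
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# Independent base forecasts for every series; they need not add up.
y_hat = np.array([102.0, 31.0, 40.0, 27.0])

# Project the base forecasts onto the space of coherent forecasts.
P = np.linalg.inv(S.T @ S) @ S.T
y_tilde = S @ (P @ y_hat)

print(y_tilde)   # the reconciled total now equals the sum of the items
```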

Quantitative Estimation Method for ML Model Performance Change, Due to Concept Drift (Concept Drift에 의한 ML 모델 성능 변화의 정량적 추정 방법)

  • Soon-Hong An;Hoon-Suk Lee;Seung-Hoon Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.6 / pp.259-266 / 2023
  • It is very difficult to measure the performance of a machine learning model at the business service stage, so operational departments cannot manage model performance effectively. Academically, various concept drift detection methods have been studied to determine whether a model's state is still appropriate. Operational departments, however, want to know the performance of the operating model quantitatively; concept drift detection can only characterize the state of the model with respect to the data, not estimate its quantitative performance. In this study, we propose a performance prediction model (PPM) that quantitatively estimates precision from concept drift statistics. The proposed method induces artificial drift in data sampled from the training set, measures the precision on the drifted samples, builds a dataset of drift and precision pairs, and trains on it. The difference between actual and predicted precision is then compared on test data to correct the error of the performance prediction model. The proposed PPM was applied to two models usable in real business, a loan underwriting model and a credit card fraud detection model, and it was confirmed that precision was predicted effectively.
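
A rough sketch of the PPM idea under stated assumptions: artificial drift is induced by shifting a feature, drift is summarized with a Kolmogorov-Smirnov statistic, and a regressor learns the drift-to-precision mapping. The data, drift mechanism, and statistic are illustrative choices, not necessarily the paper's:

```python
# Build a (drift statistic, precision) dataset by injecting artificial
# drift into samples of the training data, then fit a regressor that
# predicts precision from observed drift.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # toy ground truth
clf = RandomForestClassifier(random_state=0).fit(X, y)

drifts, precisions = [], []
for shift in np.linspace(0.0, 2.0, 20):          # increasing drift levels
    idx = rng.choice(len(X), 1000, replace=False)
    Xs = X[idx].copy()
    Xs[:, 0] += shift                            # induced artificial drift
    d = np.mean([ks_2samp(X[:, j], Xs[:, j]).statistic for j in range(5)])
    drifts.append([d])
    precisions.append(precision_score(y[idx], clf.predict(Xs)))

ppm = RandomForestRegressor(random_state=0).fit(drifts, precisions)
print(ppm.predict([[0.15]]))   # estimated precision at an observed drift
```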

Reliability of mortar filling layer void length in in-service ballastless track-bridge system of HSR

  • Binbin He;Sheng Wen;Yulin Feng;Lizhong Jiang;Wangbao Zhou
    • Steel and Composite Structures / v.47 no.1 / pp.91-102 / 2023
  • To study the evaluation standard and control limit of mortar filling layer void length, a train sub-model was developed in MATLAB and a track-bridge sub-model considering mortar filling layer voids was established in ANSYS. The two sub-models were assembled into a train-track-bridge coupled dynamic model through the wheel-rail contact relationship, and its validity was corroborated by comparison with a model from the literature. Considering the randomness of fastening stiffness, mortar elastic modulus, length of the mortar filling layer void, and pier settlement, test points were designed with the Box-Behnken method in the Design-Expert software. The coupled dynamic model was evaluated at these points, and a support vector regression (SVR) nonlinear mapping model of the wheel-rail system was established, trained, used for prediction, and verified. Finally, based on the SVR mapping model and Latin hypercube sampling, the probability that the amplification coefficients of the train and structure response indices fall within different ranges was obtained, yielding the limit on the length of the mortar filling layer void. The results show that the SVR nonlinear mapping model developed in this paper has a high fitting accuracy of 0.993 and improves computational efficiency by 99.86%, so it can be used to calculate the dynamic response of the wheel-rail system. The length of the mortar filling layer void significantly affects the wheel-rail vertical force, wheel load reduction ratio, rail vertical displacement, and track plate vertical displacement. The dynamic response of the track structure has a more significant effect on the void length limit than the dynamic response of the vehicle, with rail vertical displacement being the most influential. At train speeds of 250 km/h - 350 km/h, the grade I, II, and III limits on the length of the mortar filling layer void are 3.932 m, 4.337 m, and 4.766 m, respectively. The results can serve as a reference for the long-term service performance reliability of the ballastless track-bridge system of HSR.
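
The surrogate-plus-sampling workflow can be sketched compactly: fit an SVR to design-point responses, then push a Latin hypercube sample through it to estimate an exceedance probability. The design points, response function, and threshold below are hypothetical stand-ins for the coupled-model outputs:

```python
# SVR surrogate of an expensive dynamic response + Latin hypercube
# sampling for a reliability estimate. All numbers are illustrative.
import numpy as np
from scipy.stats import qmc
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Stand-ins for Box-Behnken design points over 4 normalized inputs
# (fastening stiffness, mortar modulus, void length, pier settlement)
# and the amplification coefficient computed by the coupled model.
X_design = rng.uniform(0, 1, size=(60, 4))
y_design = 1 + 2 * X_design[:, 2] ** 2 + 0.3 * X_design.sum(axis=1)

surrogate = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.01))
surrogate.fit(X_design, y_design)

# Latin hypercube sample of the random inputs, scored by the surrogate.
sample = qmc.LatinHypercube(d=4, seed=0).random(n=100_000)
response = surrogate.predict(sample)
print("P(amplification <= 2.5) =", np.mean(response <= 2.5))
```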

Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information (언어 정보가 반영된 문장 점수를 활용하는 삭제 기반 문장 압축)

  • Lee, Jun-Beom;Kim, So-Eon;Park, Seong-Bae
    • KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.125-132 / 2022
  • Sentence compression is a natural language processing task that generates a concise sentence preserving the important meaning of the original sentence. For grammatically appropriate compression, early studies utilized human-defined linguistic rules. Because sequence-to-sequence models perform well on various natural language processing tasks such as machine translation, later studies applied them to sentence compression as well. However, linguistic rule-based studies require all rules to be defined by humans, and sequence-to-sequence approaches require a large amount of parallel data for model training. To address these challenges, Deleter, a sentence compression model that leverages the pre-trained language model BERT, was proposed. Because Deleter compresses sentences using a perplexity score computed with BERT, it requires neither linguistic rules nor a parallel dataset. However, because Deleter considers only perplexity, it does not reflect the linguistic information of the words in the sentence, and because the corpora used to pre-train BERT are far from compressed sentences, this can lead to incorrect compression. To address these problems, this paper proposes a method to quantify the importance of linguistic information and reflect it in perplexity-based sentence scoring. Furthermore, by fine-tuning BERT with a corpus of news articles, which often contain proper nouns and omit unnecessary modifiers, we allow BERT to measure a perplexity appropriate for sentence compression. Evaluations on English and Korean datasets confirm that the compression performance of sentence-scoring-based models can be improved by the proposed method.
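
The perplexity-driven deletion loop at the heart of Deleter can be sketched with a masked-LM pseudo-perplexity; the greedy search and the keep ratio below are illustrative assumptions rather than the paper's exact procedure:

```python
# Greedy deletion-based compression scored by BERT pseudo-perplexity:
# mask each token in turn, average its negative log-likelihood, and
# repeatedly drop the word whose removal scores best.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def pseudo_perplexity(words):
    ids = tok(" ".join(words), return_tensors="pt")["input_ids"][0]
    nll = []
    for i in range(1, len(ids) - 1):             # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        logits = mlm(masked.unsqueeze(0)).logits[0, i]
        nll.append(-torch.log_softmax(logits, -1)[ids[i]].item())
    return sum(nll) / len(nll)

def compress(sentence, keep_ratio=0.6):
    words = sentence.split()
    target = int(keep_ratio * len(words))
    while len(words) > target:
        candidates = [words[:i] + words[i + 1:] for i in range(len(words))]
        words = min(candidates, key=pseudo_perplexity)   # greedy deletion
    return " ".join(words)

print(compress("the quick brown fox quickly jumps over the very lazy dog"))
```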

Development of a Water Quality Indicator Prediction Model for the Korean Peninsula Seas using Artificial Intelligence (인공지능 기법을 활용한 한반도 해역의 수질평가지수 예측모델 개발)

  • Seong-Su Kim;Kyuhee Son;Doyoun Kim;Jang-Mu Heo;Seongeun Kim
    • Journal of the Korean Society of Marine Environment & Safety / v.29 no.1 / pp.24-35 / 2023
  • Rapid industrialization and urbanization have led to severe marine pollution, and a Water Quality Index (WQI) has been developed to manage it effectively. However, the WQI suffers from loss of information due to the complex calculations involved, changes in standards, calculation errors by practitioners, and statistical errors, so research on using artificial intelligence techniques to predict the marine and coastal WQI is being conducted both locally and internationally. In this study, six techniques (RF, XGBoost, KNN, Ext, SVM, and LR) were evaluated on marine environmental measurement data (2000-2020) to determine the most appropriate artificial intelligence technique for estimating the WQI of five ecoregions in the Korean seas. Our results show that the random forest method offers the best performance of the methods studied. Residual analysis of the WQI scores predicted by the random forest method against the actual scores shows that the temporal and spatial prediction performance was exceptional for all ecoregions. In conclusion, the RF-based WQI prediction model developed in this study is considered applicable to the Korean seas with high accuracy.
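
A minimal sketch of the winning random forest setup, assuming a flat table of monitoring variables; the file name and feature list (typical WQI inputs) are assumptions, not the study's exact schema:

```python
# Random forest regression of the WQI from marine monitoring variables,
# evaluated with cross-validation. Path and columns are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("marine_monitoring_2000_2020.csv")    # hypothetical file
features = ["DO_saturation", "chlorophyll_a", "transparency",
            "DIN", "DIP"]                              # assumed WQI inputs
X, y = df[features], df["WQI"]

rf = RandomForestRegressor(n_estimators=500, random_state=0)
print("mean R^2:", cross_val_score(rf, X, y, cv=5, scoring="r2").mean())
```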

Artificial Neural Network with Firefly Algorithm-Based Collaborative Spectrum Sensing in Cognitive Radio Networks

  • Velmurugan, S.;P. Ezhumalai;E.A. Mary Anita
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.7 / pp.1951-1975 / 2023
  • Recent advances in Cognitive Radio Networks (CRNs) have made them a critical instrument for overcoming spectrum limits and meeting demanding future wireless communication requirements. Because spectrum sensing is an essential part of CRNs, collaborative spectrum sensing is employed for efficient channel selection. This study presents a cooperative spectrum sensing (CSS) model built on the Firefly Algorithm (FA) combined with an artificial neural network (ANN). The system uses user grouping strategies to improve detection performance dramatically while lowering collaboration costs: cognitive radio users are first identified from energy data samples with the ANN model, and cooperative sensing is applied afterwards. The purpose of the proposed method is to choose the best transmission channel. Clustering is used by the ANN-FA model to reduce spectrum sensing error, and the channel with the highest weight is selected, where the channel weight is computed from three input parameters: primary user (PU) utilization, cognitive radio (CR) count, and channel capacity. The key parameters of the ANN-FA scheme are optimized with the firefly evolutionary algorithm to boost the overall efficiency of the CRN channel selection technique. The work focuses primarily on sensing the optimal secondary-user channel and on reducing spectrum handoff delay in wireless networks. The efficacy of the approach was analyzed on several benchmark functions; experimental findings show that ANN-FA is 22.72 percent more robust and effective than the competing metaheuristic algorithm. The proposed model was simulated in the NS2 simulator, and the results were evaluated in terms of average interference ratio, spectrum opportunity utilization, packet delivery ratio (PDR), end-to-end delay, and average throughput for a variety of CRs in the network.
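
The firefly algorithm that drives the ANN-FA tuning is simple to state: dimmer fireflies move toward brighter ones with an attractiveness that decays with distance. A self-contained sketch minimizing a sphere benchmark, standing in for the ANN training error:

```python
# Core firefly algorithm: attractiveness beta0 * exp(-gamma * r^2),
# plus a cooled random walk. Minimizes a sphere benchmark function.
import numpy as np

def firefly_minimize(f, dim=4, n=20, iters=100,
                     alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, size=(n, dim))        # firefly positions
    for _ in range(iters):
        cost = np.array([f(x) for x in X])       # lower cost = brighter
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:            # move i toward brighter j
                    r2 = np.sum((X[i] - X[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)
                    X[i] += beta * (X[j] - X[i]) + alpha * rng.normal(size=dim)
        alpha *= 0.98                            # cool the random walk
    best = min(X, key=f)
    return best, f(best)

sphere = lambda x: float(np.sum(x ** 2))         # benchmark objective
print(firefly_minimize(sphere))
```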

Studying the Comparative Analysis of Highway Traffic Accident Severity Using the Random Forest Method. (Random Forest를 활용한 고속도로 교통사고 심각도 비교분석에 관한 연구)

  • Sun-min Lee;Byoung-Jo Yoon;WutYeeLwin
    • Journal of the Society of Disaster Information / v.20 no.1 / pp.156-168 / 2024
  • Purpose: Highway traffic accidents show a repeating pattern of increase and decrease, and the fatality rate on highways is the highest among all road types, so improvement measures reflecting domestic conditions are needed. Method: We analyzed accident severity with Random Forest using data on accidents that occurred from 2019 to 2021 on 10 national highway routes with high accident rates, and identified the factors influencing severity. Result: Ranking the top 10 variables by importance with the SHAP package revealed that the variables with a significant impact on accident severity are: the driver at fault being aged 20 to under 39, daytime occurrence (06:00-18:00), weekend occurrence (Sat-Sun), summer and winter seasons, traffic violations (failure to comply with safe driving), tunnel sections, and geometric designs with many lanes and high speed limits. In total, 10 independent variables showed a positive correlation with highway traffic accident severity. Conclusion: Because highway accidents arise from the complex interaction of many factors, prediction remains challenging; nevertheless, the factors identified here warrant in-depth analysis, and efficient and rational countermeasures should be established based on these findings.
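
A sketch of the Random Forest plus SHAP importance ranking described in the Result section; the data file and encoding are hypothetical, and the class-handling line hedges across shap's list- and array-returning APIs:

```python
# Fit a severity classifier, then rank the top 10 variables by mean
# absolute SHAP value. Columns and path are hypothetical.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("highway_accidents_2019_2021.csv")    # hypothetical file
X = df.drop(columns=["severity"])                      # encoded factors
y = df["severity"]                                     # e.g. 1 = severe

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

sv = shap.TreeExplainer(rf).shap_values(X)
sv = sv[1] if isinstance(sv, list) else sv[:, :, 1]    # class "severe"
importance = pd.Series(np.abs(sv).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False).head(10))
```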

Domain Knowledge Incorporated Local Rule-based Explanation for ML-based Bankruptcy Prediction Model (머신러닝 기반 부도예측모형에서 로컬영역의 도메인 지식 통합 규칙 기반 설명 방법)

  • Soo Hyun Cho;Kyung-shik Shin
    • Information Systems Review / v.24 no.1 / pp.105-123 / 2022
  • Thanks to the remarkable success of artificial intelligence (AI) techniques, new possibilities for applying them to real-world problems have opened up. One prominent application is the bankruptcy prediction model, which is often used as a basic knowledge base for credit scoring models in the financial industry, and there has consequently been extensive research on improving its prediction accuracy. However, despite their impressive performance, machine learning (ML)-based models are difficult to deploy because of their intrinsic opacity, especially in fields that require or value an explanation of the model's results. Finance is one such domain, where explanations matter to stakeholders such as domain experts and customers. In this paper, we propose a novel approach that incorporates financial domain knowledge into local rule generation to provide instance-level explanations for a bankruptcy prediction model. The results show that the proposed method successfully selects and classifies the extracted rules based on their feasibility and the information they convey to users.
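
One common way to realize instance-level rule explanations, offered here only as a loose sketch of the genre rather than the paper's method: label a perturbation neighborhood with the black-box model, fit a shallow surrogate tree, and read off its rules, which a domain-knowledge filter would then screen:

```python
# Local surrogate-tree explanation around one instance of a black-box
# bankruptcy model. Features, model, and constraint are hypothetical.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))                  # debt_ratio, roa, liquidity
y = (X[:, 0] - X[:, 1] > 0.5).astype(int)       # toy bankruptcy label
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

instance = np.array([1.2, -0.4, 0.1])
neighborhood = instance + rng.normal(scale=0.5, size=(500, 3))
local_labels = black_box.predict(neighborhood)

surrogate = DecisionTreeClassifier(max_depth=2, random_state=0)
surrogate.fit(neighborhood, local_labels)

# A real system would now discard extracted rules that contradict
# domain knowledge, e.g. rules implying higher debt_ratio lowers risk.
print(export_text(surrogate,
                  feature_names=["debt_ratio", "roa", "liquidity"]))
```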

A Study on Risk Parity Asset Allocation Model with XGBoost (XGBoost를 활용한 리스크패리티 자산배분 모형에 관한 연구)

  • Kim, Younghoon;Choi, HeungSik;Kim, SunWoong
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.135-149 / 2020
  • Artificial intelligence is changing the world, and the financial market is no exception. Robo-advisors are actively being developed, making up for the weaknesses of traditional asset allocation methods and replacing the parts that are difficult for those methods. They make automated investment decisions with artificial intelligence algorithms and are used with various asset allocation models such as the mean-variance model, the Black-Litterman model, and the risk parity model. The risk parity model is a typical risk-based asset allocation model focused on the volatility of assets; it avoids investment risk structurally, offers stability in the management of large funds, and has been widely used in finance. XGBoost is a parallel tree-boosting method: an optimized gradient boosting model designed to be highly efficient and flexible. It scales to billions of examples in limited memory environments, learns very fast compared to traditional boosting methods, and is frequently used in many fields of data analysis. In this study, we propose a new asset allocation model that combines the risk parity model with the XGBoost machine learning model. The model uses XGBoost to predict asset risk and applies the predicted risk to the covariance estimation process. Because an optimized asset allocation model estimates investment proportions from historical data, estimation errors arise between the estimation period and the actual investment period, and these errors degrade the optimized portfolio's performance. This study aims to improve the stability and performance of the model by predicting the volatility of the next investment period and thereby reducing the estimation errors of the optimized asset allocation model, narrowing the gap between theory and practice. For the empirical test, we used Korean stock market price data for 17 years, from 2003 to 2019, covering the energy, finance, IT, industrial, material, telecommunication, utility, consumer, health care, and staples sectors. We accumulated predictions with a moving-window method (1,000 in-sample and 20 out-of-sample observations), producing a total of 154 rebalancing back-testing results, and analyzed portfolio performance in terms of cumulative return over this long period. Compared with the traditional risk parity model, the experiment recorded improvements in both cumulative yield and reduction of estimation errors: the total cumulative return is 45.748%, about 5 percentage points higher than that of the risk parity model, and the estimation errors are reduced in 9 out of 10 industry sectors. Reducing estimation errors increases the stability of the model and makes it easier to apply in practical investment. Many financial and asset allocation models are limited in practice by the fundamental question of whether the past characteristics of assets will persist in a changing financial market. This study not only takes advantage of traditional asset allocation models but also supplements their limitations and increases stability by predicting asset risk with a modern algorithm. Various studies exist on parametric estimation methods for reducing estimation errors in portfolio optimization; we suggest a new, machine-learning-based method for the same goal. The study is thus meaningful in proposing an advanced artificial intelligence asset allocation model for fast-developing financial markets.
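
A compressed sketch of the pipeline under stated assumptions: XGBoost predicts each asset's next-period volatility from its lagged realized volatility, the predicted volatilities are combined with the historical correlation matrix, and risk parity weights are solved numerically. The simulated returns and single lag feature are illustrative simplifications of the paper's setup:

```python
# Predict next-period volatility with XGBoost, rebuild the covariance
# from predicted vols and historical correlation, then solve for
# risk parity weights. Data and features are toy assumptions.
import numpy as np
from scipy.optimize import minimize
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
returns = rng.normal(0, 0.01, size=(1020, 10))   # 10 sector return series

def realized_vol(r, w=20):                       # trailing w-day volatility
    return np.array([r[i - w:i].std() for i in range(w, len(r))])

pred_vols = []
for a in range(returns.shape[1]):
    vol = realized_vol(returns[:, a])
    X, y = vol[:-1].reshape(-1, 1), vol[1:]      # lag-1 vol -> next vol
    model = XGBRegressor(n_estimators=200, max_depth=3).fit(X, y)
    pred_vols.append(float(model.predict(vol[-1:].reshape(1, 1))[0]))

corr = np.corrcoef(returns[-1000:].T)
cov = np.outer(pred_vols, pred_vols) * corr      # predicted covariance

def risk_parity_objective(w):
    rc = w * (cov @ w)                           # per-asset risk contribution
    return np.sum((rc - rc.mean()) ** 2)         # equalize contributions

n = cov.shape[0]
res = minimize(risk_parity_objective, np.ones(n) / n,
               bounds=[(0, 1)] * n,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
print(np.round(res.x, 3))                        # risk parity weights
```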