• Title/Summary/Keyword: support vector regression machine


Energy analysis-based core drilling method for the prediction of rock uniaxial compressive strength

  • Qi, Wang;Shuo, Xu;Ke, Gao Hong;Peng, Zhang;Bei, Jiang;Hong, Liu Bo
    • Geomechanics and Engineering
    • /
    • v.23 no.1
    • /
    • pp.61-69
    • /
    • 2020
  • The uniaxial compressive strength (UCS) of rock is a basic parameter in underground engineering design. The commonly employed laboratory testing method has several disadvantages: results are not timely, core testing of broken rock mass is difficult, and onsite testing processes are long and complicated. Therefore, the development of a fast and simple in situ rock UCS testing method for field use is urgent. In this study, a multi-function digital rock drilling and testing system, and a digital core bit dedicated to the system, are independently developed and employed in digital drilling tests on rock specimens with different strengths. Energy analysis is performed during rock cutting to estimate the energy consumed by the drill bit to remove a unit volume of rock. Two quantitative relationship models between energy analysis-based core drilling parameters (ECD) and rock UCS (ECD-UCS models) are established by regression analysis and by support vector machine (SVM), and the predictive abilities of the two models are comparatively analysed. The results show that the mean values of the relative difference between the predicted rock UCS and the UCS measured by the laboratory uniaxial compression test in the prediction set are 3.76 MPa and 4.30 MPa, respectively, and the standard deviations are 2.08 MPa and 4.14 MPa, respectively. The regression analysis-based ECD-UCS model therefore has the more stable predictive ability. On this basis, an energy analysis-based rock drilling method for the prediction of UCS is proposed, enabling quick and convenient in situ testing of rock UCS.
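The comparison above (regression analysis vs. SVM, evaluated by deviation from measured UCS) can be sketched as follows. This is a minimal illustration on synthetic data; the paper's actual drilling measurements, ECD definition, and model hyperparameters are not reproduced here.

```python
# Sketch: fitting a regression-analysis model and an SVM regression model
# that map a drilling energy parameter (ECD) to rock UCS. All data below
# is synthetic and the near-linear ECD-UCS trend is assumed.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
ecd = rng.uniform(50, 400, 40)                 # specific drilling energy (MJ/m^3), hypothetical
ucs = 0.25 * ecd + 10 + rng.normal(0, 3, 40)   # assumed ECD-UCS trend (MPa) plus noise

# Model 1: regression analysis (ordinary least squares on ECD)
slope, intercept = np.polyfit(ecd, ucs, 1)
pred_reg = slope * ecd + intercept

# Model 2: support vector machine regression with an RBF kernel
svm = SVR(kernel="rbf", C=100.0, gamma="scale").fit(ecd.reshape(-1, 1), ucs)
pred_svm = svm.predict(ecd.reshape(-1, 1))

# Compare mean absolute difference from measured UCS, as the paper does
mad_reg = np.mean(np.abs(pred_reg - ucs))
mad_svm = np.mean(np.abs(pred_svm - ucs))
print(f"regression MAD: {mad_reg:.2f} MPa, SVM MAD: {mad_svm:.2f} MPa")
```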

Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games (데이터마이닝을 활용한 한국프로야구 승패예측모형 수립에 관한 연구)

  • Oh, Younhak;Kim, Han;Yun, Jaesub;Lee, Jong-Seok
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.40 no.1
    • /
    • pp.8-17
    • /
    • 2014
  • In this research, we employed various data mining techniques to build predictive models for win-loss prediction in Korean professional baseball games. The historical data containing information about players and teams was obtained from the official materials provided by the KBO website. Using the collected raw data, we additionally prepared two more types of dataset, in ratio and binary format respectively. Dividing the away-team's records by the records of the corresponding home-team generated the ratio dataset, while the binary dataset was obtained by comparing the record values. We applied seven classification techniques to the three (raw, ratio, and binary) datasets: decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets × 7 techniques) prediction scenarios, the most accurate model was obtained from the random forest technique based on the binary dataset, whose prediction accuracy was 84.14%. It was also observed that using the ratio and binary datasets helped to build better prediction models than using the raw data. From the variable selection capability of decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, pitcher's winning percentage, and walks (four balls) are important winning factors of a game. This research is distinct from existing studies in that we used three different types of data and various data mining techniques for win-loss prediction in Korean professional baseball games.
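The ratio and binary dataset constructions described above can be sketched directly. The record names below are illustrative placeholders, not the paper's actual variable list.

```python
# Sketch of the two derived datasets: away-team records divided by the
# corresponding home-team records (ratio), and an elementwise comparison
# of record values (binary). Record columns are hypothetical examples.
import numpy as np

home = np.array([[0.281, 4.12, 110.0],    # e.g. batting avg, ERA, strikeouts
                 [0.265, 3.90, 132.0]])
away = np.array([[0.270, 4.55,  98.0],
                 [0.272, 3.75, 140.0]])

ratio_dataset = away / home                 # away record / home record
binary_dataset = (away > home).astype(int)  # 1 where the away record is larger

print(ratio_dataset.round(3))
print(binary_dataset)
```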

Financial Fraud Detection using Data Mining: A Survey

  • Sudhansu Ranjan Lenka;Bikram Kesari Ratha
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.9
    • /
    • pp.169-185
    • /
    • 2024
  • Due to the rapid growth of E-Commerce, most organizations are moving towards cashless transactions. Unfortunately, cashless transactions are used not only by legitimate users but also by illegitimate users, resulting in the loss of billions of dollars each year worldwide. Fraud prevention and fraud detection are the two methods used by financial institutions to protect against these frauds. Fraud prevention systems (FPSs) alone are not sufficient to fully secure E-Commerce systems; however, the combined effect of fraud detection systems (FDSs) and FPSs may protect against fraud. Still, many issues and challenges degrade the performance of FDSs, such as overlapping data, noisy data, and misclassification of data. This paper presents a comprehensive survey of financial fraud detection systems using data mining techniques. Over seventy research papers, mainly from the period 2002-2015, were reviewed and analyzed in this study. The data mining approaches covered include neural networks, logistic regression, Bayesian belief networks, support vector machine (SVM), self-organizing map (SOM), K-nearest neighbor (K-NN), random forest, and genetic algorithms. The algorithms that have achieved high success rates in detecting credit card fraud are logistic regression (99.2%), SVM (99.6%), and random forests (99.6%), while the most suitable approach reported is SOM, with a perfect accuracy of 100%. The algorithms applied to financial statement fraud, by contrast, show a large spread in accuracy, from CDA at 71.4% to a probabilistic neural network at 98.1%. In this paper, we identify the research gap and report the performance achieved by different algorithms in terms of parameters such as accuracy, sensitivity, and specificity. Some of the key issues and challenges associated with FDSs are also identified.

Estimation of KOSPI200 Index option volatility using Artificial Intelligence (이기종 머신러닝기법을 활용한 KOSPI200 옵션변동성 예측)

  • Shin, Sohee;Oh, Hayoung;Kim, Jang Hyun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.10
    • /
    • pp.1423-1431
    • /
    • 2022
  • Volatility is one of the variables that the Black-Scholes model requires for option pricing. It is unknown at the present time; however, since the option price can be observed in the market, implied volatility can be derived from the price of an option at any given point in time and can represent the market's expectation of future volatility. Although volatility in the Black-Scholes model is constant, when calculating implied volatility it is common to observe a volatility smile, in which the implied volatility differs depending on the strike price. We implement supervised learning to target implied volatility, adding V-KOSPI as a feature to ease the volatility smile. We examine the estimation performance on KOSPI200 index options' implied volatility using various machine learning algorithms such as linear regression, tree models, support vector machine, KNN, and deep neural networks. The training accuracy was highest (99.9%) for the decision tree model, and the test accuracy was highest (96.9%) for the random forest model.
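The step the abstract takes for granted (deriving implied volatility from an observed option price by inverting the Black-Scholes formula) can be sketched with a pure-stdlib bisection. The inputs below are illustrative, not KOSPI200 market data.

```python
# Sketch: implied volatility by bisection on the Black-Scholes call price.
# The call price is monotonically increasing in sigma, so bisection converges.
from math import log, sqrt, exp, erf

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def implied_vol(price, S, K, T, r, lo=1e-4, hi=5.0, tol=1e-8):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

market_price = bs_call(100, 105, 0.5, 0.02, 0.23)   # synthetic "observed" price
iv = implied_vol(market_price, 100, 105, 0.5, 0.02)
print(f"implied volatility: {iv:.4f}")              # recovers sigma = 0.2300
```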

Status of Groundwater Potential Mapping Research Using GIS and Machine Learning (GIS와 기계학습을 이용한 지하수 가능성도 작성 연구 현황)

  • Lee, Saro;Fetemeh, Rezaie
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.6_1
    • /
    • pp.1277-1290
    • /
    • 2020
  • Water resources, consisting of surface water and groundwater, are considered one of the pivotal natural resources worldwide. Since the last century, rapid population growth as well as accelerated industrialization and explosive urbanization have boosted the demand for groundwater for domestic, industrial, and agricultural use. Better management of groundwater can play a crucial role in sustainable development; therefore, determining the accurate location of groundwater through groundwater potential mapping is indispensable. In recent years, the integration of machine learning techniques, Geographical Information System (GIS), and Remote Sensing (RS) has become a popular and effective approach to groundwater potential mapping. To determine the status of this integrated approach, a systematic review of 94 directly relevant papers published over the previous six years (2015-2020) was carried out. According to the literature review, the number of studies published annually increased rapidly over time. The total study area spanned 15 countries, and 85.1% of the studies focused on Iran, India, China, South Korea, and Iraq. Twenty variables were found to be frequently involved in groundwater potential investigations, of which 9 factors are almost always present, namely slope, lithology (geology), land use/land cover (LU/LC), drainage/river density, altitude (elevation), topographic wetness index (TWI), distance from river, rainfall, and aspect. Among the machine learning techniques, data integration was most often carried out with random forest, support vector machine, and boosted regression tree. Our study shows that for optimal results, groundwater potential mapping must be used as a tool to complement field work, rather than as a low-cost substitute. Consequently, more studies should be conducted to enhance the generalization and precision of groundwater potential maps.

Change Analysis of Aboveground Forest Carbon Stocks According to the Land Cover Change Using Multi-Temporal Landsat TM Images and Machine Learning Algorithms (다시기 Landsat TM 영상과 기계학습을 이용한 토지피복변화에 따른 산림탄소저장량 변화 분석)

  • LEE, Jung-Hee;IM, Jung-Ho;KIM, Kyoung-Min;HEO, Joon
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.18 no.4
    • /
    • pp.81-99
    • /
    • 2015
  • The acceleration of global warming has required a better understanding of carbon cycles over local and regional areas such as the Korean peninsula. Since forests serve as a carbon sink, storing a large amount of terrestrial carbon, there has been a demand to accurately estimate forest carbon sequestration. In Korea, the National Forest Inventory (NFI) has been used to estimate forest carbon stocks based on the amount of growing stock per hectare measured at sampled locations. However, as such data are based on point (i.e., plot) measurements, it is difficult to identify the spatial distribution of forest carbon stocks. This study focuses on urban areas, which have a limited number of NFI samples and have shown rapid land cover change, to estimate grid-based forest carbon stocks based on UNFCCC Approach 3 and Tier 3. Land cover change and forest carbon stocks were estimated using Landsat 5 TM data acquired in 1991, 1992, 2010, and 2011, high resolution airborne images, and the 3rd and 5th~6th NFI data. Machine learning techniques (i.e., random forest and support vector machines/regression) were used for land cover change classification and forest carbon stock estimation. Forest carbon stocks were estimated using reflectance, band ratios, vegetation indices, and topographical indices. Results showed that 33.23 tonC/ha of carbon was sequestered on the unchanged forest areas between 1991 and 2010, while 36.83 tonC/ha of carbon was sequestered on the areas changed from other land-use types to forests. A total of 7.35 tonC/ha of carbon was released on the areas changed from forests to other land-use types. This study provides quantitative insight into forest carbon stock change according to land cover change, and its results can contribute to effective forest management.
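Two of the spectral predictors mentioned above (a band ratio and a vegetation index) can be sketched from Landsat TM red and near-infrared reflectance. The NDVI formula is standard; the reflectance values below are synthetic example pixels, not the study's imagery.

```python
# Sketch: band ratio and NDVI from Landsat TM band 3 (red) and band 4 (NIR).
# NDVI = (NIR - red) / (NIR + red): high for dense forest, near 0 for bare land.
import numpy as np

red = np.array([0.08, 0.12, 0.30])   # band 3 reflectance, hypothetical pixels
nir = np.array([0.45, 0.40, 0.32])   # band 4 reflectance, hypothetical pixels

band_ratio = nir / red
ndvi = (nir - red) / (nir + red)

print(band_ratio.round(3))
print(ndvi.round(3))
```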

A Study on Optimization of Perovskite Solar Cell Light Absorption Layer Thin Film Based on Machine Learning (머신러닝 기반 페로브스카이트 태양전지 광흡수층 박막 최적화를 위한 연구)

  • Ha, Jae-jun;Lee, Jun-hyuk;Oh, Ju-young;Lee, Dong-geun
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.7
    • /
    • pp.55-62
    • /
    • 2022
  • The perovskite solar cell is an active area of research in renewable energy fields such as solar, wind, hydroelectric, marine, bio-, and hydrogen energy, which aim to replace fossil fuels such as oil, coal, and natural gas as power demand grows with the expanding use of the Internet of Things and virtual environments in the 4th industrial revolution. The perovskite solar cell is a solar cell device using an organic-inorganic hybrid material with a perovskite structure, and it has the advantages of potentially replacing existing silicon solar cells through high efficiency, low-cost solution processing, and low-temperature processes. To optimize a light absorption layer thin film predicted by the existing empirical method, reliability must be verified through device characteristic evaluation. However, since evaluating the characteristics of light-absorbing layer thin-film devices is costly, the number of tests is limited. To address this problem, machine learning and artificial intelligence models show great promise as clear and valid auxiliary means for optimizing the light absorption layer thin film. In this study, to estimate the light absorption layer thin-film optimization of perovskite solar cells, regression models using the support vector machine's linear, RBF, polynomial, and sigmoid kernels were compared to verify the difference in accuracy for each kernel function.
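The kernel comparison described above can be sketched by fitting SVR models with the four kernel functions to the same data. The data is synthetic; the paper's thin-film process variables and targets are not reproduced here.

```python
# Sketch: comparing SVR kernels (linear, RBF, polynomial, sigmoid) on one
# synthetic regression task, scoring each by R^2 on the training data.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (60, 2))   # e.g. two process parameters, scaled to [0, 1]
y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, 60)

scores = {}
for kernel in ["linear", "rbf", "poly", "sigmoid"]:
    model = SVR(kernel=kernel, C=10.0).fit(X, y)
    scores[kernel] = model.score(X, y)   # coefficient of determination R^2

for kernel, r2 in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{kernel:8s} R^2 = {r2:.3f}")
```

On a nonlinear target like this one the RBF kernel typically scores highest, which is why kernel choice is worth verifying rather than assuming.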

Study on Predicting the Designation of Administrative Issue in the KOSDAQ Market Based on Machine Learning Based on Financial Data (머신러닝 기반 KOSDAQ 시장의 관리종목 지정 예측 연구: 재무적 데이터를 중심으로)

  • Yoon, Yanghyun;Kim, Taekyung;Kim, Suyeong
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.17 no.1
    • /
    • pp.229-249
    • /
    • 2022
  • This paper investigates machine learning models for predicting the designation of administrative issues in the KOSDAQ market through various techniques. When a company in the Korean stock market is designated as an administrative issue, the market recognizes the event itself as negative information, causing losses to the company and investors. The purpose of this study is to evaluate alternative methods for developing an artificial intelligence service that can detect the designation of administrative issues early through companies' financial ratios, and to help investors manage portfolio risks. The independent variables comprise 21 financial ratios representing profitability, stability, activity, and growth. Financial data of administrative-issue and non-administrative-issue stocks are sampled from 2011 to 2020, the period in which K-IFRS was applied. Logistic regression analysis, decision tree, support vector machine, random forest, and LightGBM are used to predict the designation of administrative issues. According to the results of the analysis, LightGBM, with 82.73% classification accuracy, is the best prediction model, and the prediction model with the lowest classification accuracy is the decision tree, with 71.94% accuracy. Checking the top three variables by importance in the decision tree-based learning models shows that the financial variables common to each model are ROE (net profit) and capital stock turnover ratio, which are relatively important variables in designating administrative issues. In general, it is confirmed that the learning models using ensembles had higher predictive performance than the single learning models.

Reliability of mortar filling layer void length in in-service ballastless track-bridge system of HSR

  • Binbin He;Sheng Wen;Yulin Feng;Lizhong Jiang;Wangbao Zhou
    • Steel and Composite Structures
    • /
    • v.47 no.1
    • /
    • pp.91-102
    • /
    • 2023
  • To study the evaluation standard and control limit of mortar filling layer void length, the train sub-model was developed in MATLAB and the track-bridge sub-model considering the mortar filling layer void was established in ANSYS. The two sub-models were assembled into a train-track-bridge coupling dynamic model through the wheel-rail contact relationship, and the validity of the coupling dynamic model was corroborated against a model from the literature. Considering the randomness of fastening stiffness, mortar elastic modulus, length of mortar filling layer void, and pier settlement, the test points were designed by the Box-Behnken method based on Design-Expert software. The coupled dynamic model was calculated, and a support vector regression (SVR) nonlinear mapping model of the wheel-rail system was established; learning, prediction, and verification were carried out. Finally, the reliable probability of the amplification coefficient distribution of the response indices of the train and structure in different ranges was obtained based on the SVR nonlinear mapping model and the Latin hypercube sampling method, and the limit of the length of the mortar filling layer void was thus obtained. The results show that the SVR nonlinear mapping model developed in this paper has a high fitting accuracy of 0.993, and the computational efficiency is improved significantly, by 99.86%; it can be used to calculate the dynamic response of the wheel-rail system. The length of the mortar filling layer void significantly affects the wheel-rail vertical force, wheel weight load reduction ratio, rail vertical displacement, and track plate vertical displacement. The dynamic response of the track structure has a more significant effect on the limit value of the length of the mortar filling layer void than the dynamic response of the vehicle, with the rail vertical displacement being the most obvious. At train running speeds of 250 km/h - 350 km/h, the limit values of grades I, II, and III of the length of the mortar filling layer void are 3.932 m, 4.337 m, and 4.766 m, respectively. The results can provide a reference for the long-term service performance reliability of the ballastless track-bridge system of HSR.

The big data method for flash flood warning (돌발홍수 예보를 위한 빅데이터 분석방법)

  • Park, Dain;Yoon, Sanghoo
    • Journal of Digital Convergence
    • /
    • v.15 no.11
    • /
    • pp.245-250
    • /
    • 2017
  • A flash flood is defined as intense rainfall flooding over a relatively small area that flows through rivers and valleys rapidly, in a short time and with no advance warning, so that it can cause property damage and casualties. This study establishes a flash-flood warning system using 38 accident records reported by the National Disaster Information Center and a Land Surface Model (TOPLATS) between 2009 and 2012. Three variables were used from the Land Surface Model: precipitation, soil moisture, and surface runoff. The three variables for the 6 hours preceding a flash flood were reduced to 3 factors through factor analysis. Decision tree, random forest, naive Bayes, support vector machine, and logistic regression models were considered as big data methods. The prediction performance was evaluated by comparing Accuracy, Kappa, TP Rate, FP Rate, and F-Measure. The best method was suggested based on a reproducibility evaluation at each point of flash flood occurrence, comparing predicted counts against actual counts using 4 years of data.
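The pipeline above (reduce correlated hydrologic variables to 3 factors, then fit one of the listed classifiers) can be sketched as follows. The data is synthetic, not the 38 reported accidents, and logistic regression stands in for whichever of the five methods is chosen.

```python
# Sketch: factor analysis to 3 factors, then a classifier on the factor
# scores. X mimics 6-hourly series of precipitation / soil moisture /
# surface runoff driven by 3 latent factors; y is a synthetic flood label.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 200
latent = rng.normal(size=(n, 3))                    # 3 underlying factors
loading = rng.normal(size=(3, 18))                  # 3 variables x 6 hours -> 18 columns
X = latent @ loading + rng.normal(0, 0.3, (n, 18))  # observed hydrologic variables
y = (latent[:, 0] + 0.5 * latent[:, 1] > 0.5).astype(int)  # flash flood occurrence, synthetic

factors = FactorAnalysis(n_components=3, random_state=0).fit_transform(X)
clf = LogisticRegression().fit(factors, y)
print(f"training accuracy: {clf.score(factors, y):.3f}")
```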