• Title/Summary/Keyword: machine learning techniques


Study on the Effect of Training Data Sampling Strategy on the Accuracy of the Landslide Susceptibility Analysis Using Random Forest Method (Random Forest 기법을 이용한 산사태 취약성 평가 시 훈련 데이터 선택이 결과 정확도에 미치는 영향)

  • Kang, Kyoung-Hee;Park, Hyuck-Jin
    • Economic and Environmental Geology / v.52 no.2 / pp.199-212 / 2019
  • In machine learning, the sampling strategy for training data affects the performance of a prediction model, including its generalization ability as well as its prediction accuracy. In landslide susceptibility analysis in particular, data sampling is an essential step in building the training set because non-landslide points vastly outnumber landslide points. However, previous studies did not consider alternative sampling methods for the training data; they simply selected the training data at random. In this study, the authors therefore proposed several sampling methods and assessed how the sampling strategy for training data affects landslide susceptibility analysis. Six scenarios were set up based on different sampling strategies for landslide and non-landslide points. A Random Forest model was trained under each of the six scenarios, and the attribute importance of each input variable was evaluated. Landslide susceptibility maps were then produced from the input variables and their attribute importances. The AUC values of the susceptibility maps obtained from the six sampling strategies indicated high prediction rates, ranging from 70% to 80%. This shows that the Random Forest technique provides adequate predictive performance and that the attribute importances it yields can serve as weights for landslide conditioning factors in susceptibility analysis. In addition, specific sampling strategies for the training data produced higher prediction accuracy than the conventional random sampling method.
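The sampling and weighting steps above can be illustrated with a minimal Python sketch. It assumes one simple strategy: keep every landslide point and randomly draw non-landslide points at a chosen ratio. AUC is computed in its rank-based form, and a weighted overlay uses attribute importances as factor weights. The function names, the `ratio` parameter and any weight values are hypothetical. The authors' six scenarios are not reproduced here.

```python
import random


def sample_training_points(landslide, non_landslide, ratio=1.0, seed=0):
    """Keep every landslide point and draw ratio * len(landslide) non-landslide
    points uniformly at random, without replacement (hypothetical strategy)."""
    rng = random.Random(seed)
    k = min(len(non_landslide), int(round(ratio * len(landslide))))
    negatives = rng.sample(list(non_landslide), k)
    points = list(landslide) + negatives
    labels = [1] * len(landslide) + [0] * len(negatives)
    return points, labels


def auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen landslide point scores
    higher than a randomly chosen non-landslide point (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def susceptibility_index(factors, weights):
    """Weighted overlay: sum of normalized factor values times their weights
    (the weights stand in for Random Forest attribute importances)."""
    return sum(weights[name] * value for name, value in factors.items())
```

In practice the `weights` argument would hold the importances reported by a trained Random Forest. scikit-learn, for example, exposes them as `RandomForestClassifier.feature_importances_`.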

Investigating the Characteristics of Academia-Industrial Cooperation-based Patents for their Long-term Use (지속적 활용이 가능한 산학협력 특허 특성 분석)

  • Park, Sang-Young;Choi, Youngjae;Lee, Sungjoo
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.3 / pp.568-578 / 2021
  • Patents resulting from industry-university cooperation (IUC) research are a source of innovation and play an important role in economic growth through technology transfer and commercialization. For this reason, there have been many efforts to revitalize IUC. However, companies generally value patents as commercializable assets rather than as research achievements, so not all patents created as outcomes of IUC are used in business. This research therefore aims to support the design of measures that link IUC to successful patent utilization. It does so by identifying the purposes of IUC even after cooperation has been carried out successfully and patents have been filed as a result. To this end, patents registered through industry-academia cooperation in the United States were first collected. A predictive model was then designed to classify unexpired and expired patents using machine learning techniques. From the patents finally identified, factors relevant to utilization were derived in terms of marketability and technicality. This study is expected to help predict the utilization of unexpired and expired patents. It should also help corporate and university officials who plan IUC at an early stage set goals for the results of their technical cooperation.

Machine-Learning Evaluation of Factors Influencing Landslides (머신러닝기법을 이용한 산사태 발생인자의 영향도 분석)

  • Park, Seong-Yong;Moon, Seong-Woo;Choi, Jaewan;Seo, Yong-Seok
    • The Journal of Engineering Geology / v.31 no.4 / pp.701-718 / 2021
  • Geological field surveys and a series of laboratory tests were conducted to obtain landslide-related data in Sancheok-myeon, Chungju-si, Chungcheongbuk-do, South Korea, where many landslides occurred in the summer of 2020. The influence of various factors on landslide occurrence was evaluated using logistic regression analysis and an artificial neural network. During the field surveys, undisturbed specimens were sampled according to landslide occurrence, and dynamic cone penetration tests were used to measure the depth of the soil layer. Laboratory tests followed ASTM International standards. To address multicollinearity, the variance inflation factor (VIF) was calculated for all landslide-related factors. Nine factors (soil depth, shear strength, lithology, saturated water content, specific gravity, hydraulic conductivity, USCS, slope angle, and elevation) were then selected as influential factors for the machine learning analysis. Min-max normalization was applied so that the factors could be compared directly with one another. Logistic regression analysis identified soil depth, slope angle, saturated water content, and shear strength, in that order, as having the greatest influence on landslide occurrence. The artificial neural network ranked the factors in the order of slope angle, soil depth, saturated water content, and shear strength. Arithmetically averaging the results of both analyses placed slope angle, soil depth, saturated water content, and shear strength as the top four factors. Together they accounted for about 70% of the total influence.
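The multicollinearity check and normalization described above can be sketched in plain Python. This minimal version relies on the identity that the VIF of each factor equals the corresponding diagonal entry of the inverse of the factors' correlation matrix (VIF = 1 / (1 − R²)). It also adds a min-max scaler. The function names are hypothetical, and the VIF threshold the authors used is not stated. A commonly cited rule of thumb flags values above 10.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: factors are perfectly collinear")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vif(columns):
    """VIF of each factor = diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[i][i] for i in range(k)]


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant factor cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values]
```

For two factors with correlation 0.8, both VIFs come out as 1 / (1 − 0.64) ≈ 2.78.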

Analysis of ICT Education Trends using Keyword Occurrence Frequency Analysis and CONCOR Technique (키워드 출현 빈도 분석과 CONCOR 기법을 이용한 ICT 교육 동향 분석)

  • Youngseok Lee
    • Journal of Industrial Convergence / v.21 no.1 / pp.187-192 / 2023
  • In this study, trends in ICT education were investigated by analyzing the occurrence frequency of keywords related to machine learning and by applying the convergence of iterated correlations (CONCOR) technique. A Google Scholar search with "ICT education" as the keyword returned 304 papers published in registered journals from 2018 to the present. From these, 60 papers on ICT education were selected through a systematic literature review. Keywords were then extracted from the titles and abstracts of the papers. For word frequency and indicator data, 49 high-frequency keywords were extracted through frequency analysis using the term frequency-inverse document frequency (TF-IDF) technique from natural language processing, together with word co-occurrence frequencies. The degree of relationship between words was examined by analyzing the connection structure and degree centrality, and clusters of similar words were derived through CONCOR analysis. First, "education," "research," "result," "utilization," and "analysis" emerged as the main keywords. Second, an N-gram network graph centered on "education" showed that "curriculum" and "utilization" had the highest correlation. Third, a cluster analysis centered on "education" formed five groups: "curriculum," "programming," "student," "improvement," and "information." These results suggest that analyzing ICT education trends can help identify the practical research needed in ICT education.
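The CONCOR step mentioned above can be sketched as follows. The procedure correlates the columns of a matrix, then repeatedly correlates the resulting correlation matrix until every entry converges to +1 or −1. The items are then split into two blocks by sign. This minimal version performs a single split on a small hypothetical document-keyword count matrix. Published analyses often start from a keyword co-occurrence matrix instead, and they apply the split recursively to obtain more clusters, as in the five groups reported above.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlate_columns(m):
    """Correlation matrix between the columns of a row-major matrix."""
    cols = [[row[c] for row in m] for c in range(len(m[0]))]
    k = len(cols)
    return [[pearson(cols[i], cols[j]) for j in range(k)] for i in range(k)]


def concor_split(matrix, max_iter=50, tol=1e-6):
    """One CONCOR split: iterate column correlations until all entries are
    (nearly) +1 or -1, then group items by their sign relative to item 0."""
    c = correlate_columns(matrix)
    for _ in range(max_iter):
        if all(abs(abs(v) - 1.0) < tol for row in c for v in row):
            break
        c = correlate_columns(c)
    group_a = [j for j, v in enumerate(c[0]) if v > 0]
    group_b = [j for j, v in enumerate(c[0]) if v <= 0]
    return group_a, group_b
```

With rows as papers and columns as keywords, keywords that tend to appear in the same papers end up in the same block.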

Prediction of Safety Grade of Bridges Using the Classification Models of Decision Tree and Random Forest (의사결정나무 및 랜덤포레스트 분류 모델을 이용한 교량 안전등급 예측)

  • Hong, Jisu;Jeon, Se-Jin
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.3 / pp.397-411 / 2023
  • The number of deteriorated bridges with a service period of more than 30 years has been increasing rapidly in Korea. Accordingly, advanced maintenance technologies that predict the age-induced deterioration, condition, and performance of bridges are receiving growing attention. In this study, a method for predicting the safety grade of bridges was proposed using Decision Tree and Random Forest classification models based on machine learning. The models were applied to 8,850 bridges on national roads and evaluated with various indices: the confusion matrix, balanced accuracy, recall, the ROC curve, and AUC. The Random Forest generally showed better predictive performance than the Decision Tree. This was especially true for grade C and D bridges, which need closer maintenance attention because of their significant deterioration. For these grades, the Random Forest with random under-sampling outperformed the other sampling techniques, with a recall of 83.4%. The proposed model can be used to rapidly identify the safety grade of bridges that have not been inspected recently and to establish efficient and economical maintenance plans for them.
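Random under-sampling and the class-balanced metrics named above can be sketched in a few lines of plain Python. The under-sampler down-samples every class to the size of the smallest class. Recall is computed per class, and balanced accuracy is the mean of the per-class recalls. The grade labels "A/B" and "C/D" in the checks are hypothetical groupings chosen for illustration. The paper does not state how its classes were encoded.

```python
import random


def random_under_sample(X, y, seed=0):
    """Down-sample every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(v) for v in by_class.values())
    Xs, ys = [], []
    for label in sorted(by_class):
        for xi in rng.sample(by_class[label], n_min):
            Xs.append(xi)
            ys.append(label)
    return Xs, ys


def recall_per_class(y_true, y_pred):
    """Recall (true positives / actual positives) for each class in y_true."""
    out = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        positives = sum(1 for t in y_true if t == c)
        out[c] = tp / positives
    return out


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, insensitive to class imbalance."""
    r = recall_per_class(y_true, y_pred)
    return sum(r.values()) / len(r)
```

Under-sampling is applied only to the training split. The evaluation set keeps its natural class distribution, which is why balanced accuracy and per-class recall are more informative there than plain accuracy.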

Research on optimal safety ship-route based on artificial intelligence analysis using marine environment prediction (해양환경 예측정보를 활용한 인공지능 분석 기반의 최적 안전항로 연구)

  • Dae-yaoung Eeom;Bang-hee Lee
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2023.05a / pp.100-103 / 2023
  • Maritime autonomous surface ships and eco-friendly ships are being developed, and demand for accurate, detailed real-time marine environment prediction information is growing. As a result, research on producing and evaluating optimal routes that account for various marine environments has become necessary. An algorithm was developed in two stages to calculate the optimal route for smart ships while reducing marine environmental risk and uncertainty in energy consumption. In the first stage, a profile was created by combining marine environmental information with ship position and status information from the Automatic Identification System (AIS). In the second stage, a model was developed that defines a marine environment energy map from the resulting profile. To do so, a regression equation was generated by applying Random Forest, one of the machine learning techniques, to about 600,000 data records. The coefficient of determination (R²) of the Random Forest model was 0.89, indicating high reliability. The Dijkstra shortest path algorithm was then applied to the marine environment prediction for June 1 to 3, 2021, to calculate the optimal safety route and display it on a map. The route calculated with the Random Forest regression model was streamlined and reflected the state of the marine environment prediction information. Route calculation based on real-time marine environment prediction information, as proposed in this study, is expected to produce realistic and safe routes that reflect how ships tend to move. It is also expected to be extended to economic, safety, and eco-friendliness evaluation models in the future.
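The route-calculation step above can be illustrated with Dijkstra's algorithm on a grid. This minimal sketch assumes the energy map has been discretized into a grid, where entering a cell costs the value stored there. The grid values stand in for what a Random Forest regression would predict. The 4-connected grid, the function name and the cost values are all hypothetical. The paper's actual graph construction is not described in the abstract.

```python
import heapq


def dijkstra_grid(cost, start, goal):
    """Least-cost path on a 4-connected grid; entering cell (r, c) costs cost[r][c].
    Returns (total_cost, path) or (inf, []) if the goal is unreachable."""
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]
```

With a high-cost band of cells (for example, rough sea states) between start and goal, the returned path detours around the band instead of crossing it.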


A Study on the Turbidity Estimation Model Using Data Mining Techniques in the Water Supply System (데이터마이닝 기법을 이용한 상수도 시스템 내의 탁도 예측모형 개발에 관한 연구)

  • Park, No-Suk;Kim, Soonho;Lee, Young Joo;Yoon, Sukmin
    • Journal of Korean Society of Environmental Engineers / v.38 no.2 / pp.87-95 / 2016
  • Turbidity is a key indicator to users of the 'discolored water' phenomenon, which is known to be caused by pipe corrosion in water supply systems. 'Discolored water' is defined as water whose turbidity is high enough for users to recognize it visually. This study therefore used data mining techniques to estimate turbidity changes in a water supply system. Decision tree analysis was applied to develop the estimation models, with pH and residual chlorine data used as the input variables. The model using both variables (pH and residual chlorine) produced more reasonable estimates than models using either variable alone. However, the developed models tended to underestimate the observed peak values. To overcome this limitation, a high-pass filter was introduced as a preprocessing step for the estimation model. The modified model with high-pass filtering predicted the observed peak values more accurately and showed improved overall prediction performance compared with the conventional model.
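The abstract does not specify which high-pass filter was used. As an assumption, this sketch shows a generic first-order discrete (RC-style) high-pass filter of the kind that could serve as such a preprocessing step. It suppresses slow drifts and keeps rapid changes, such as the peaks the unfiltered model underestimated. The function name and the smoothing constant `alpha` are hypothetical.

```python
def high_pass(x, alpha):
    """First-order discrete high-pass filter:
        y[0] = x[0]
        y[i] = alpha * (y[i-1] + x[i] - x[i-1])
    alpha lies in (0, 1); values closer to 1 lower the cutoff frequency,
    so more of the slow variation in x is retained."""
    if not x:
        return []
    y = [float(x[0])]
    for i in range(1, len(x)):
        y.append(alpha * (y[-1] + x[i] - x[i - 1]))
    return y
```

A constant input decays toward zero, while a sudden step produces an immediate response that then fades. How the filtered series was combined with the decision tree inputs is not stated in the abstract.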

A Study on Classification of CNN-based Linux Malware using Image Processing Techniques (영상처리기법을 이용한 CNN 기반 리눅스 악성코드 분류 연구)

  • Kim, Se-Jin;Kim, Do-Yeon;Lee, Hoo-Ki;Lee, Tae-Jin
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.9 / pp.634-642 / 2020
  • With the proliferation of Internet of Things (IoT) devices, the use of the Linux operating system across various architectures has increased. At the same time, security threats against Linux-based IoT devices are growing, and malware variants derived from existing malware continue to appear. In this paper, we propose a system that classifies malware with a Convolutional Neural Network (CNN). The binary data of Executable and Linkable Format (ELF) files is first visualized as images, which are then processed with the Local Binary Pattern (LBP) technique and a median filter. The original images yielded the highest accuracy and F1-score (98.77%) and the highest recall (98.55%). The median-filtered images yielded the highest precision (99.19%) and the lowest false positive rate (0.008%). Results with the LBP technique were lower overall than those obtained by applying the median filter to the original ELF images. When the classifications of the original and image-processed files were combined by majority vote, accuracy, precision, F1-score, and false positive rate were better than with the median filter alone. In future work, the proposed system will be used to classify malware families, and other image processing techniques will be added to improve the accuracy of majority-vote classification.
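The image-processing pipeline described above can be sketched in plain Python. This minimal version covers four steps: reshaping raw ELF bytes into a grayscale image of fixed width, computing a basic 8-neighbour LBP code per pixel, applying a 3×3 median filter, and combining per-model predictions by majority vote. The image width, the LBP variant (neighbour ≥ centre sets the bit), the border handling and the family labels used in the checks are all assumptions. The paper's exact parameters are not given in the abstract, and the CNN itself is omitted.

```python
from collections import Counter

# Clockwise 8-neighbourhood offsets starting at the top-left pixel.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def bytes_to_image(data, width):
    """Reshape raw bytes (0-255) into rows of `width` pixels; drop a partial last row."""
    n_rows = len(data) // width
    return [list(data[r * width:(r + 1) * width]) for r in range(n_rows)]


def lbp(img):
    """Basic LBP: bit k of an interior pixel's code is set when neighbour k >= centre.
    Border pixels are left as 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            code = 0
            for k, (dr, dc) in enumerate(NEIGHBORS):
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << k
            out[r][c] = code
    return out


def median_filter(img):
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = sorted(img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            out[r][c] = window[4]
    return out


def majority_vote(predictions):
    """Most frequent label among the predictions of several classifiers."""
    return Counter(predictions).most_common(1)[0][0]
```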

Evaluating the groundwater prediction using LSTM model (LSTM 모형을 이용한 지하수위 예측 평가)

  • Park, Changhui;Chung, Il-Moon
    • Journal of Korea Water Resources Association / v.53 no.4 / pp.273-283 / 2020
  • Quantitative forecasting of groundwater levels is very important for assessing groundwater variation and vulnerability, and various time series analysis and machine learning techniques have been used for this purpose. In this study, we developed a prediction model based on long short-term memory (LSTM), one of the artificial neural network (ANN) algorithms. The model predicts the daily groundwater levels of 11 wells in Hankyung-myeon, Jeju Island. Groundwater levels on Jeju Island are generally strongly correlated with tides and also reflect the effects of precipitation. To construct input and output variables that reflect these characteristics of the data, precipitation data for the corresponding period were added to the groundwater level data. The LSTM network was trained on the first 365 days of data, covering all four seasons. The remaining data were used for verification to evaluate the fit of the model. The model was developed with Keras, a Python-based deep learning framework, and NVIDIA CUDA was used to speed up training. Learning and verifying groundwater level variation with the LSTM network gave an average coefficient of determination (R²) of 0.98, indicating that the developed model was highly accurate.
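The data preparation and evaluation steps described above can be sketched in plain Python. The LSTM itself, which the authors built in Keras, is omitted. The sketch builds supervised windows in which each input is the previous `lookback` days of (groundwater level, precipitation) and the target is the next day's level. It then splits the windows chronologically and computes R² as 1 − SS_res / SS_tot. The window length, the function names and the exact R² definition the paper used are assumptions.

```python
def make_windows(levels, rain, lookback):
    """Inputs: previous `lookback` days of (level, rain) pairs; target: next-day level."""
    X, y = [], []
    for t in range(lookback, len(levels)):
        X.append([(levels[i], rain[i]) for i in range(t - lookback, t)])
        y.append(levels[t])
    return X, y


def split_first_days(X, y, n_train):
    """Chronological split: the first n_train windows train, the rest verify."""
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

The split is chronological rather than random, so no future observations leak into the training windows. With `n_train=365`, the first 365 windows would form the training set.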

Learning a Classifier for Weight Grouping of Export Containers (기계학습을 이용한 수출 컨테이너의 무게그룹 분류)

  • Kang, Jae-Ho;Kang, Byoung-Ho;Ryu, Kwang-Ryel;Kim, Kap-Hwan
    • Journal of Intelligence and Information Systems / v.11 no.2 / pp.59-79 / 2005
  • Export containers in a container terminal are usually classified into a few weight groups, and containers in the same group are placed together on the same stack. Stacking by weight group makes it easy to load heavier containers onto a ship before lighter ones, which is important for the ship's balance. However, the weight information available when a container arrives is only an estimate, so containers belonging to different weight groups are often stored together on the same stack. This causes extra moves, or rehandlings, during loading to fetch out heavier containers placed under lighter ones. In this paper, we use machine learning techniques to derive a classifier that assigns containers to weight groups with improved accuracy. We also show that a more useful classifier can be derived by applying a cost-sensitive learning technique, for which we introduce a scheme for searching for a good cost matrix. Simulation experiments showed that the proposed method reduces rehandlings by about 5 to 7% compared with the traditional weight grouping method.
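The cost-sensitive decision rule that underlies this kind of learning can be sketched in a few lines. Given estimated probabilities for each weight group and a cost matrix, the classifier picks the group with the lowest expected cost instead of the most probable group. The cost matrices in the checks are hypothetical, with group 0 as light and group 1 as heavy. The authors' search scheme for a good cost matrix is not reproduced.

```python
def min_expected_cost_class(probs, cost):
    """probs[j]: estimated P(true group j | container).
    cost[i][j]: cost of predicting group i when the true group is j.
    Returns (chosen group index, list of expected costs per prediction)."""
    expected = [sum(p * c for p, c in zip(probs, row)) for row in cost]
    best = min(range(len(cost)), key=lambda i: expected[i])
    return best, expected
```

With a 0-1 cost matrix the rule reduces to choosing the most probable group. Making one kind of error more expensive shifts borderline containers toward the cheaper mistake. For example, if misclassifying a heavy container as light is three times as costly, a container with only a 40% chance of being heavy is assigned to the heavy group.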
