• Title/Summary/Keyword: Machine Learning

Domain Knowledge Incorporated Local Rule-based Explanation for ML-based Bankruptcy Prediction Model (머신러닝 기반 부도예측모형에서 로컬영역의 도메인 지식 통합 규칙 기반 설명 방법)

  • Soo Hyun Cho;Kyung-shik Shin
    • Information Systems Review / v.24 no.1 / pp.105-123 / 2022
  • Thanks to the remarkable success of artificial intelligence (AI) techniques, new possibilities have opened up for applying them to real-world problems. One prominent application is the bankruptcy prediction model, which is often used as a basic knowledge base for credit scoring models in the financial industry. As a result, there has been extensive research on how to improve the prediction accuracy of such models. However, despite their impressive performance, machine learning (ML)-based models are difficult to deploy because they are inherently opaque, especially in fields that require or value an explanation of the model's output. The financial domain is one such area, where explanations matter to stakeholders such as domain experts and customers. In this paper, we propose a novel approach that incorporates financial domain knowledge into local rule generation to provide instance-level explanations for the bankruptcy prediction model. The results show that the proposed method successfully selects and classifies the extracted rules based on their feasibility and the information they convey to users.

Safety Verification Techniques of Privacy Policy Using GPT (GPT를 활용한 개인정보 처리방침 안전성 검증 기법)

  • Hye-Yeon Shim;MinSeo Kweun;DaYoung Yoon;JiYoung Seo;Il-Gu Lee
    • Journal of the Korea Institute of Information Security & Cryptology / v.34 no.2 / pp.207-216 / 2024
  • As big data accumulated with the 4th Industrial Revolution, personalized services have grown rapidly. As a result, the amount of personal information collected by online services has increased, and so have concerns about leakage of users' personal information and privacy infringement. Online service providers publish privacy policies to address these concerns, but the policies are often misused because they are long and complex, which makes it difficult for users to identify risky items themselves. Therefore, a method is needed that can automatically check whether a privacy policy is safe. However, conventional blacklist-based and machine learning-based safety verification techniques for privacy policies are difficult to extend or have low accessibility. To solve this problem, this paper proposes a safety verification technique for privacy policies that uses the GPT-3.5 API, a generative artificial intelligence service. The technique can perform classification even in a new environment, and it shows that members of the general public without expertise could easily inspect a privacy policy. The experiment measured how accurately the blacklist-based and GPT-based techniques classify safe and unsafe sentences, as well as the time spent on classification. According to the experimental results, the proposed technique showed 10.34% higher accuracy on average than the conventional blacklist-based sentence safety verification technique.
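
The blacklist baseline that the abstract above compares against can be pictured as a keyword match over policy sentences. The sketch below is illustrative only: the term list, the example sentences, and their labels are invented for this example and do not come from the paper, and the GPT-based classifier is not reproduced.

```python
# Hypothetical blacklist terms; the abstract does not give the paper's actual list.
BLACKLIST = ("third party", "third parties", "without consent", "indefinitely")


def is_unsafe(sentence: str, blacklist=BLACKLIST) -> bool:
    """Flag a sentence as unsafe if it contains any blacklisted term."""
    lowered = sentence.lower()
    return any(term in lowered for term in blacklist)


def accuracy(predictions, labels) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Invented sentences with invented labels (True = unsafe).
SAMPLES = [
    ("We share your data with third parties for marketing.", True),
    ("Your records are retained indefinitely.", True),
    ("We delete your data when you close your account.", False),
    # Unsafe, but phrased without any blacklisted term, so the baseline misses it.
    ("Data may be passed to our partners for advertising.", True),
]

predictions = [is_unsafe(text) for text, _ in SAMPLES]
labels = [label for _, label in SAMPLES]
print(accuracy(predictions, labels))
```

The last sample shows why a fixed term list is hard to extend: a rephrased sentence slips through until someone adds a new term.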

Multifaceted Evaluation Methodology for AI Interview Candidates - Integration of Facial Recognition, Voice Analysis, and Natural Language Processing (AI면접 대상자에 대한 다면적 평가방법론 -얼굴인식, 음성분석, 자연어처리 영역의 융합)

  • Hyunwook Ji;Sangjin Lee;Seongmin Mun;Jaeyeol Lee;Dongeun Lee;Kyusang Lim
    • Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.55-58 / 2024
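  • Companies have recently been adopting AI interview systems at an increasing rate, and there is considerable debate about the effectiveness of AI interviews. This paper implements candidate evaluation in the AI interview process across three areas (vision, voice, and natural language processing) in order to assess the adequacy of a methodology that analyzes interview candidates from multiple angles. First, on the visual side, a convolutional neural network (CNN) was used to recognize six emotions from the candidate's face, and whether the candidate was looking at the camera was derived as a time series. This part focused on analyzing the candidate's attitude toward the interview and, in particular, the emotions shown on the face. Second, because visual cues alone are limited in revealing a candidate's attitude, the candidate's voice was converted to the frequency domain to extract features, and a Bidirectional LSTM was trained to extract six emotions from the voice. Third, to understand the candidate's state from the contextual meaning of what was said, speech was converted to text with speech-to-text (STT), and word frequencies were analyzed to identify the candidate's language habits. In addition, the KoBERT model was applied for sentiment analysis of the candidate's statements, and objective evaluation metrics were created and applied to assess the candidate's personality, attitude, and understanding of the job. Regarding the adequacy of a multifaceted evaluation system for AI interviews, the analysis suggests that accuracy in the visual part was, to a large extent, objectively demonstrated. In voice-based emotion analysis, because interviewees do not display every type of emotion within the limited time and tend to speak in a similar tone, the frequencies were somewhat concentrated on particular emotions. Finally, in the natural language processing area, the study concluded that there is a growing need for NLP models that go beyond speaking style and the frequency of particular words to understand the overall context and tone.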

A Study on the Drug Classification Using Machine Learning Techniques (머신러닝 기법을 이용한 약물 분류 방법 연구)

  • Anmol Kumar Singh;Ayush Kumar;Adya Singh;Akashika Anshum;Pradeep Kumar Mallick
    • Advanced Industrial SCIence / v.3 no.2 / pp.8-16 / 2024
  • This paper presents a drug classification system whose goal is to predict the appropriate drug for a patient based on demographic and physiological traits. The dataset consists of attributes such as Age, Sex, BP (Blood Pressure), Cholesterol Level, and Na_to_K (sodium-to-potassium ratio), and the objective is to determine the type of drug given. The models used are K-Nearest Neighbors (KNN), Logistic Regression, and Random Forest. GridSearchCV with 5-fold cross-validation was used to tune hyperparameters, and each model was trained and tested on the dataset. Performance with and without hyperparameter tuning was assessed using accuracy, confusion matrices, and classification reports. Accuracy without GridSearchCV was 0.7, 0.875, and 0.975; with GridSearchCV it was 0.75, 1.0, and 0.975. With GridSearchCV, Logistic Regression is the most suitable of the three models for drug classification, followed by K-Nearest Neighbors. Na_to_K is also an essential feature in predicting the outcome.
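
The tuning step in the abstract above, a grid search scored by 5-fold cross-validation, can be sketched by hand. This is not the paper's code and does not use scikit-learn's GridSearchCV: it is a standard-library approximation of the same idea, applied to a toy k-NN classifier on synthetic data. The data, the candidate grid, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query))
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def k_fold_indices(n, folds):
    """Split range(n) into `folds` contiguous, near-equal test blocks."""
    bounds = [round(i * n / folds) for i in range(folds + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(folds)]


def cv_accuracy(data, k, folds=5):
    """Mean held-out accuracy of k-NN over `folds` cross-validation splits."""
    scores = []
    for test_idx in k_fold_indices(len(data), folds):
        held_out = set(test_idx)
        train = [row for i, row in enumerate(data) if i not in held_out]
        hits = sum(knn_predict(train, data[i][0], k) == data[i][1] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / folds


def grid_search(data, k_grid, folds=5):
    """Return (best_k, best_score) over the candidate values in `k_grid`."""
    results = {k: cv_accuracy(data, k, folds) for k in k_grid}
    best_k = max(results, key=results.get)
    return best_k, results[best_k]


# Synthetic two-class data (NOT the paper's drug dataset).
rng = random.Random(0)
data = [
    ((rng.gauss(c * 2.0, 1.0), rng.gauss(c * 2.0, 1.0)), c)
    for c in (0, 1)
    for _ in range(50)
]
rng.shuffle(data)  # shuffle so every fold contains both classes

best_k, best_score = grid_search(data, k_grid=[1, 3, 5, 7])
print(best_k, best_score)
```

Each candidate is scored only on folds it was not trained on, which is what makes the tuned and untuned accuracies in the abstract comparable.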

Mapping Mammalian Species Richness Using a Machine Learning Algorithm (머신러닝 알고리즘을 이용한 포유류 종 풍부도 매핑 구축 연구)

  • Zhiying Jin;Dongkun Lee;Eunsub Kim;Jiyoung Choi;Yoonho Jeon
    • Journal of Environmental Impact Assessment / v.33 no.2 / pp.53-63 / 2024
  • Biodiversity holds significant importance within the framework of environmental impact assessment, being utilized in site selection for development, understanding the surrounding environment, and assessing the impact on species due to disturbances. The field of environmental impact assessment has seen substantial research exploring new technologies and models to evaluate and predict biodiversity more accurately. While current assessments rely on data from fieldwork and literature surveys to gauge species richness indices, limitations in spatial and temporal coverage underscore the need for high-resolution biodiversity assessments through species richness mapping. In this study, leveraging data from the 4th National Ecosystem Survey and environmental variables, we developed a species distribution model using Random Forest. This model yielded mapping results of 24 mammalian species' distribution, utilizing the species richness index to generate a 100-meter resolution map of species richness. The research findings exhibited a notably high predictive accuracy, with the species distribution model demonstrating an average AUC value of 0.82. In addition, the comparison with National Ecosystem Survey data reveals that the species richness distribution in the high-resolution species richness mapping results conforms to a normal distribution. Hence, it stands as highly reliable foundational data for environmental impact assessment. Such research and analytical outcomes could serve as pivotal new reference materials for future urban development projects, offering insights for biodiversity assessment and habitat preservation endeavors.
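
One common way to turn per-species distribution maps into a richness map is to threshold each map and count the species predicted present in each cell. The abstract above does not say how the stacking was done, so the 0.5 threshold and the tiny 2x2 grids below are assumptions for illustration. The `auc` helper computes the metric the abstract reports, using its pairwise-ranking definition.

```python
def richness_map(probability_maps, threshold=0.5):
    """Count, per grid cell, how many species have probability >= threshold."""
    rows = len(probability_maps[0])
    cols = len(probability_maps[0][0])
    richness = [[0] * cols for _ in range(rows)]
    for species_map in probability_maps:
        for r in range(rows):
            for c in range(cols):
                if species_map[r][c] >= threshold:
                    richness[r][c] += 1
    return richness


def auc(scores, labels):
    """AUC: probability that a presence site outscores an absence site (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented occurrence probabilities for three species on a 2x2 grid.
maps = [
    [[0.9, 0.2], [0.6, 0.1]],
    [[0.8, 0.7], [0.3, 0.1]],
    [[0.5, 0.6], [0.7, 0.2]],
]
print(richness_map(maps))  # [[3, 2], [2, 0]]
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
```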

Analysis of the Effectiveness of Big Data-Based Six Sigma Methodology: Focus on DX SS (빅데이터 기반 6시그마 방법론의 유효성 분석: DX SS를 중심으로)

  • Kim Jung Hyuk;Kim Yoon Ki
    • KIPS Transactions on Software and Data Engineering / v.13 no.1 / pp.1-16 / 2024
  • Over recent years, 6 Sigma has become a key methodology in manufacturing for quality improvement and cost reduction. However, challenges have arisen due to the difficulty in analyzing large-scale data generated by smart factories and its traditional, formal application. To address these limitations, a big data-based 6 Sigma approach has been developed, integrating the strengths of 6 Sigma and big data analysis, including statistical verification, mathematical optimization, interpretability, and machine learning. Despite its potential, the practical impact of this big data-based 6 Sigma on manufacturing processes and management performance has not been adequately verified, leading to its limited reliability and underutilization in practice. This study investigates the efficiency impact of DX SS, a big data-based 6 Sigma, on manufacturing processes, and identifies key success policies for its effective introduction and implementation in enterprises. The study highlights the importance of involving all executives and employees and researching key success policies, as demonstrated by cases where methodology implementation failed due to incorrect policies. This research aims to assist manufacturing companies in achieving successful outcomes by actively adopting and utilizing the methodologies presented.

Performance Evaluation and Analysis on Single and Multi-Network Virtualization Systems with Virtio and SR-IOV (가상화 시스템에서 Virtio와 SR-IOV 적용에 대한 단일 및 다중 네트워크 성능 평가 및 분석)

  • Jaehak Lee;Jongbeom Lim;Heonchang Yu
    • The Transactions of the Korea Information Processing Society / v.13 no.2 / pp.48-59 / 2024
  • As hardware functions that natively support virtualization are developed, user applications with various workloads are running efficiently in virtualization systems. SR-IOV is a virtualization support function that allows direct access to PCI devices, giving high I/O performance by minimizing the need for hypervisor or operating system intervention. With SR-IOV, network I/O acceleration can be realized in virtualization systems, which have relatively long I/O paths compared to bare-metal systems and frequent context switches between the user area and kernel area. To take advantage of SR-IOV's performance, network resource management policies that can derive optimal network performance when SR-IOV is applied to an instance such as a virtual machine (VM) or container are being actively studied. This paper evaluates and analyzes the network performance of SR-IOV, which implements I/O acceleration, in comparison with Virtio in terms of 1) network delay, 2) network throughput, 3) network fairness, 4) performance interference, and 5) multi-network. The contributions of this paper are as follows. First, the network I/O processes of Virtio and SR-IOV in the virtualization system are clearly explained. Second, the evaluation results of the network performance of Virtio and SR-IOV are analyzed based on various performance metrics. Third, the system overhead and the possibility of optimization for the SR-IOV network in a virtualization system with high VM density are experimentally confirmed. The experimental results and analysis are expected to serve as a reference for network resource management policies in virtualization systems that operate network-intensive services such as smart factories, connected cars, deep learning inference models, and crowdsourcing.
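
The abstract above lists network fairness among its metrics but does not say how it is measured. One widely used measure is Jain's fairness index, J = (Σx)² / (n · Σx²), which equals 1 when all instances receive equal throughput and 1/n when a single instance receives everything. Its use here is an assumption, not the paper's stated method, and the throughput figures are invented.

```python
def jain_fairness(throughputs):
    """Jain's fairness index over per-instance throughputs (range 1/n to 1)."""
    n = len(throughputs)
    if n == 0:
        raise ValueError("need at least one throughput value")
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if squares == 0:
        raise ValueError("at least one throughput must be nonzero")
    return (total * total) / (n * squares)


# Invented per-VM throughputs in Gbit/s.
print(jain_fairness([5, 5, 5, 5]))   # 1.0: perfectly even share
print(jain_fairness([10, 0, 0, 0]))  # 0.25: one VM takes all bandwidth
```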

Study on the Seismic Random Noise Attenuation for the Seismic Attribute Analysis (탄성파 속성 분석을 위한 탄성파 자료 무작위 잡음 제거 연구)

  • Jongpil Won;Jungkyun Shin;Jiho Ha;Hyunggu Jun
    • Economic and Environmental Geology / v.57 no.1 / pp.51-71 / 2024
  • Seismic exploration is a widely used geophysical exploration method with applications such as resource development, geotechnical investigation, and subsurface monitoring. It is essential for interpreting the geological characteristics of the subsurface because it provides accurate images of stratum structures. Typically, geological features are interpreted by visually analyzing seismic sections. Recently, however, quantitative analysis of seismic data has been extensively researched as a way to accurately extract and interpret target geological features. Seismic attribute analysis can provide quantitative information for geological interpretation based on seismic data. It is therefore widely used in various fields, including the analysis of oil and gas reservoirs, the investigation of faults and fractures, and the assessment of shallow gas distributions. However, seismic attribute analysis is sensitive to noise in the seismic data, so additional noise attenuation is required to improve its accuracy. In this study, four seismic noise attenuation methods are applied and compared to mitigate random noise in poststack seismic data and improve the attribute analysis results. FX deconvolution, DSMF, Noise2Noise, and DnCNN are applied to the Youngil Bay high-resolution seismic data to remove seismic random noise. Energy, sweetness, and similarity attributes are calculated from the noise-removed seismic data. Subsequently, the characteristics of each noise attenuation method, the noise removal results, and the seismic attribute analysis results are analyzed qualitatively and quantitatively. Based on the advantages and disadvantages of each noise attenuation method and the characteristics of each seismic attribute analysis, we propose a suitable noise attenuation method to improve the results of seismic attribute analysis.
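
Of the three attributes named in the abstract above, energy is the simplest to illustrate. A common definition is the mean squared amplitude inside a sliding time window. The abstract does not give the study's window length or edge handling, so the half-window of one sample and the truncated edges below are assumptions, and the trace values are invented.

```python
def energy_attribute(trace, half_window=1):
    """Mean squared amplitude in a centered window, truncated at the trace edges."""
    n = len(trace)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = trace[lo:hi]
        out.append(sum(a * a for a in window) / len(window))
    return out


# Invented five-sample trace with a reflection centered on sample 2.
trace = [0.0, 1.0, -2.0, 1.0, 0.0]
energy = energy_attribute(trace)
print(energy)
```

Because every sample is squared, random noise adds a positive bias to the attribute rather than averaging out, which is one reason denoising before attribute calculation matters.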

Development and Assessment of LSTM Model for Correcting Underestimation of Water Temperature in Korean Marine Heatwave Prediction System (한반도 고수온 예측 시스템의 수온 과소모의 보정을 위한 LSTM 모델 구축 및 예측성 평가)

  • NA KYOUNG IM;HYUNKEUN JIN;GYUNDO PAK;YOUNG-GYU PARK;KYEONG OK KIM;YONGHAN CHOI;YOUNG HO KIM
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.29 no.2 / pp.101-115 / 2024
  • The ocean heatwave is emerging as a major issue due to global warming, posing a direct threat to marine ecosystems and humanity through decreased food resources and reduced carbon absorption capacity of the oceans. Consequently, the prediction of ocean heatwaves in the vicinity of the Korean Peninsula is becoming increasingly important for marine environmental monitoring and management. In this study, an LSTM model was developed to improve the underestimated prediction of ocean heatwaves caused by the coarse vertical grid system of the Korean Peninsula Ocean Prediction System. Based on the results of ocean heatwave predictions for the Korean Peninsula conducted in 2023, as well as those generated by the LSTM model, the performance of heatwave predictions in the East Sea, Yellow Sea, and South Sea areas surrounding the Korean Peninsula was evaluated. The LSTM model developed in this study significantly improved the prediction performance of sea surface temperatures during periods of temperature increase in all three regions. However, its effectiveness in improving prediction performance during periods of temperature decrease or before temperature rise initiation was limited. This demonstrates the potential of the LSTM model to address the underestimated prediction of ocean heatwaves caused by the coarse vertical grid system during periods of enhanced stratification. It is anticipated that the utility of data-driven artificial intelligence models will expand in the future to improve the prediction performance of dynamical models or even replace them.

Implementation of an Automated Agricultural Frost Observation System (AAFOS) (농업서리 자동관측 시스템(AAFOS)의 구현)

  • Kyu Rang Kim;Eunsu Jo;Myeong Su Ko;Jung Hyuk Kang;Yunjae Hwang;Yong Hee Lee
    • Korean Journal of Agricultural and Forest Meteorology / v.26 no.1 / pp.63-74 / 2024
  • In agriculture, frost can be devastating, which is why observation and forecasting are so important. According to a recent report analyzing frost observation data from the Korea Meteorological Administration, despite global warming due to climate change, the late frost date in spring has not been accelerated, and the frequency of frost has not decreased. Therefore, it is important to automate and continuously operate frost observation in risk areas to prevent agricultural frost damage. In the existing frost observation using leaf wetness sensors, there is a problem that the reference voltage value fluctuates over a long period of time due to contamination of the observation sensor or changes in the humidity of the surrounding environment. In this study, a datalogger program was implemented to automatically solve these problems. The established frost observation system can stably and automatically accumulate time-resolved observation data over a long period of time. This data can be utilized in the future for the development of frost diagnosis models using machine learning methods and the production of frost occurrence prediction information for surrounding areas.