• Title/Abstract/Keywords: missForest

8 search results (processing time: 0.019 s)

Comparison of Missing Value Imputation Methods in Fine Dust Data

  • 김연진;박헌진
    • 한국빅데이터학회지 / Vol. 4, No. 2 / pp. 105-114 / 2019
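  • Missing value imputation is one of the major issues in data analysis. If an analysis proceeds while ignoring missing values, bias arises and the resulting estimates can be misleading. This paper seeks to find and apply an appropriate imputation method for missing values in fine dust data. To that end, for missing values in time series data, existing R-based methods such as MICE and MissForest are compared with time-series-based models through simulations set up under a variety of scenarios. When the results were compared variable by variable, the model that applies a Kalman filter to an auto ARIMA model using the ImputeTS package and the MissForest model were judged to give good results for imputing missing values in fine dust data.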

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods / Vol. 30, No. 3 / pp. 331-341 / 2023
  • Handling missing values in data analysis is essential in constructing a good prediction model. The easiest way to handle missing values is to use complete case data, but this can lead to information loss within the data and invalid conclusions in data analysis. Imputation is a technique that replaces missing data with alternative values obtained from information in a dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares the imputation techniques for datasets with mixed data types in various situations, such as data size, missing ratios, and missing mechanisms. To evaluate the performance of each method in mixed datasets, we propose a new imputation performance measure (IPM) that is a unified measurement applicable to numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results with imputation performances and computational times.
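
As an illustration of the conventional K-nearest-neighbor imputation named in the abstract above, the following is a minimal sketch for numeric data only. It is not the implementation used in the paper and not missForest; the function name `knn_impute`, the distance rescaling, and the toy data are assumptions made for this example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest donor rows.

    Hypothetical helper for illustration; handles numeric columns only.
    """

    def distance(a, b):
        # Euclidean distance over the columns observed in both rows,
        # rescaled so that rows sharing few columns are not favoured.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        squared = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(squared * len(a) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which column j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy data (assumed): two clusters, one missing value in each.
data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [0.9, 2.2, 2.9],
    [8.0, 9.0, None],
    [8.2, 9.1, 10.0],
    [7.9, 8.8, 9.8],
]
filled = knn_impute(data, k=2)
```

Each missing value is filled from rows in its own cluster, which is the behaviour that tree-based methods such as missForest aim to improve on when relationships between variables are nonlinear or the data types are mixed.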

Performance analysis and comparison of various machine learning algorithms for early stroke prediction

  • Vinay Padimi;Venkata Sravan Telu;Devarani Devi Ningombam
    • ETRI Journal / Vol. 45, No. 6 / pp. 1007-1021 / 2023
  • Stroke is the leading cause of permanent disability in adults, and it can cause permanent brain damage. According to the World Health Organization, 795 000 Americans experience a new or recurrent stroke each year. Early detection of medical disorders, for example, strokes, can minimize the disabling effects. Thus, in this paper, we consider various risk factors that contribute to the occurrence of stroke and apply machine learning algorithms, for example, the decision tree, random forest, and naive Bayes algorithms, to patient characteristics survey data to achieve high prediction accuracy. We also consider the semisupervised self-training technique to predict the risk of stroke. We then consider the near-miss undersampling technique, which keeps only those instances of the larger class that lie close to instances of the smaller class. Experimental results demonstrate that the proposed method obtains an accuracy of approximately 98.83% at low cost, which is significantly higher and more reliable than that of the techniques it was compared with.
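
The near-miss undersampling mentioned in the abstract above can be sketched as follows. This is a minimal version of the NearMiss-1 heuristic under stated assumptions, not the authors' pipeline: the function name `near_miss_1`, the choice of `k`, and the sample points are all hypothetical.

```python
import math


def near_miss_1(majority, minority, n_keep, k=3):
    """Keep the n_keep majority points whose mean distance to their
    k nearest minority points is smallest (NearMiss-1 heuristic)."""

    def mean_dist_to_nearest_minority(point):
        dists = sorted(math.dist(point, m) for m in minority)
        nearest = dists[:k]
        return sum(nearest) / len(nearest)

    ranked = sorted(majority, key=mean_dist_to_nearest_minority)
    return ranked[:n_keep]


# Hypothetical 2-D data: 3 minority points and 6 majority points.
minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
majority = [(0.5, 0.5), (5.0, 5.0), (1.0, 1.0),
            (9.0, 9.0), (2.0, 2.0), (-0.5, 0.0)]

# Undersample the majority class down to the minority class size.
kept = near_miss_1(majority, minority, n_keep=len(minority))
```

The majority points far from the minority class are discarded, so the balanced training set concentrates on the region where the two classes are hardest to separate.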

Data-centric XAI-driven Data Imputation of Molecular Structure and QSAR Model for Toxicity Prediction of 3D Printing Chemicals

  • 정찬혁;김상윤;허성구;;신민혁;유창규
    • Korean Chemical Engineering Research / Vol. 61, No. 4 / pp. 523-541 / 2023
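  • As the use of 3D printers grows, so does the frequency of exposure to the chemicals they emit. However, research on the toxicity and hazards of chemicals emitted during 3D printing remains scarce, and because of missing values in molecular structure data, toxicity prediction studies using in silico techniques are limited. In this study, we developed a data-centric QSAR model that predicts the toxicity and hazards of 3D printing by imputing the missing values of the key molecular descriptors that represent the molecular structure of chemicals. First, the MissForest algorithm was used to impute the missing molecular descriptor values of hazardous substances emitted by 3D printing. Data-centric QSAR models based on four different machine learning models (decision tree, random forest, XGBoost, and SVM) were then developed to predict the bioconcentration factor (Log BCF), the octanol-air partition coefficient (Log Koa), and the partition coefficient (Log P). In addition, TreeSHAP (SHapley Additive exPlanations), an explainable artificial intelligence (XAI) technique, was used to demonstrate the reliability of the data-centric QSAR models. MissForest-based imputation yielded about 2.5 times as much molecular structure data as the original dataset. The data-centric QSAR models developed on this basis predicted Log BCF, Log Koa, and Log P with prediction performance of 73%, 76%, and 92%, respectively. Finally, the TreeSHAP analysis showed that the developed models make their predictions through molecular descriptors that are physically highly correlated with each toxicity value, and high prediction performance for toxicity information was achieved. The methodology developed in this study is expected to be applicable to assessing the toxicity and human health risk of pollutants that may arise from other printing materials, chemical processes, and semiconductor/display processes.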

Machine Learning-based landslide susceptibility mapping - Inje area, South Korea

  • Chanul Choi;Le Xuan Hien;Seongcheon Kwon;Giha Lee
    • 한국수자원학회:학술대회논문집 / Korea Water Resources Association 2023 Conference / p. 248 / 2023
  • In recent years, the number of landslides in Korea has been increasing due to extreme weather events such as localized heavy rainfall and typhoons. Landslides often occur together with debris flows, land subsidence, and earthquakes, and they cause significant damage to life and property. Because 64% of Korea's land area is mountainous, the government has sought to predict landslides in order to reduce damage. In response, the Korea Forest Service has established a 'Landslide Information System' to predict the likelihood of landslides. This system selects a total of 13 landslide factors based on past landslide events and uses logistic regression (LR) to predict the possibility of landslide occurrence; its accuracy is known to be 0.75. However, most of the data used for learning in the current system concerns landslides that occurred from 2005 to 2011, and it does not reflect recent typhoons or heavy rain. Therefore, in this study, we apply a total of six machine learning techniques (KNN, LR, SVM, XGB, RF, GNB) to predict the occurrence of landslides based on data for Inje, Gangwon-do, recently produced by the National Institute of Forest. To predict the occurrence of landslides, the landslide event and factor data must first be converted into a form suitable for machine learning techniques using ArcGIS and Python. In addition, there is a large difference between the number of data points for areas where landslides occurred and for areas where they did not. Therefore, the prediction was performed after correcting the unbalanced data using the Tomek Links and Near Miss techniques. Moreover, to control the unbalanced data, a model that reflects soil properties will be used to remove absolutely safe areas.

Systematic Review of Reciprocal Changes after Spinal Reconstruction Surgery : Do Not Miss the Forest for the Trees

  • Kim, Chang-Wook;Hyun, Seung-Jae;Kim, Ki-Jeong
    • Journal of Korean Neurosurgical Society / Vol. 64, No. 6 / pp. 843-852 / 2021
  • The purpose of this review was to synthesize the research on global spinal alignment and reciprocal changes following cervical or thoracolumbar reconstruction surgery. We carried out a search of PubMed, EMBASE, and Cochrane Library for studies through May 2020, and ultimately included 11 articles. The optimal goal of a truly balanced spine is to maintain the head over the femoral heads. When spinal imbalance occurs, the human body reacts through various compensatory mechanisms to maintain the head over the pelvis and to retain a horizontal gaze. Historically, deformity correction has focused on correcting scoliosis and preventing scoliotic curve progression. Following substantial correction of a spinal deformity, reciprocal changes take place in the flexible segments proximal and distal to the area of correction. Restoration of lumbar lordosis following surgery to correct a thoracolumbar deformity induces reciprocal changes in T1 slope, cervical lordosis, pelvic shift, and lower extremity parameters. Patients with cervical kyphosis exhibit different patterns of reciprocal changes depending on whether they have head-balanced or trunk-balanced kyphosis. These reciprocal changes should be considered in order to prevent secondary spine disorders. We emphasize the importance of evaluating the global spinal alignment to assess postoperative changes.

Automatic Detection of Dead Trees Based on Lightweight YOLOv4 and UAV Imagery

  • Yuanhang Jin;Maolin Xu;Jiayuan Zheng
    • Journal of Information Processing Systems / Vol. 19, No. 5 / pp. 614-630 / 2023
  • Dead trees significantly impact forest production and the ecological environment and pose constraints to the sustainable development of forests. A lightweight YOLOv4 dead tree detection algorithm based on unmanned aerial vehicle images is proposed to address current limitations in dead tree detection that rely mainly on inefficient, unsafe and easy-to-miss manual inspections. An improved logarithmic transformation method was developed in data pre-processing to display tree features in the shadows. For the model structure, the original CSPDarkNet-53 backbone feature extraction network was replaced by MobileNetV3. Some of the standard convolutional blocks in the original extraction network were replaced by depthwise separable convolution blocks. The new ReLU6 activation function replaced the original LeakyReLU activation function to make the network more robust for low-precision computations. The K-means++ clustering method was also integrated to generate anchor boxes that are more suitable for the dataset. The experimental results show that the improved algorithm achieved an accuracy of 97.33%, higher than other methods. The detection speed of the proposed approach is higher than that of YOLOv4, improving the efficiency and accuracy of the detection process.
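
Two of the design choices named in the abstract above can be made concrete with a short sketch: the weight savings from replacing a standard convolution with a depthwise separable one, and the ReLU6 activation compared with LeakyReLU. The layer sizes and the LeakyReLU slope of 0.1 are assumptions chosen for illustration, not values reported by the paper, and bias terms are ignored.

```python
def standard_conv_weights(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_weights(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution (bias ignored)."""
    return k * k * c_in + c_in * c_out


def relu6(x):
    """ReLU clipped at 6, which bounds the activation range."""
    return min(max(0.0, x), 6.0)


def leaky_relu(x, negative_slope=0.1):
    """LeakyReLU; the slope 0.1 is an assumed value."""
    return x if x > 0 else negative_slope * x


# Hypothetical layer: 3 x 3 kernel, 256 input and 512 output channels.
standard = standard_conv_weights(3, 256, 512)
separable = depthwise_separable_weights(3, 256, 512)
reduction = standard / separable

# Activations on a few sample inputs.
samples = [-2.0, 0.0, 3.0, 10.0]
relu6_out = [relu6(x) for x in samples]
leaky_out = [leaky_relu(x) for x in samples]
```

For this assumed layer the separable form needs 133,376 weights instead of 1,179,648, and ReLU6 keeps every activation within [0, 6], which is the property that makes it better suited to low-precision arithmetic.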

Performance Evaluation of Snow Detection Using Himawari-8 AHI Data

  • 진동현;이경상;서민지;최성원;성노훈;이은경;한현경;한경수
    • 대한원격탐사학회지 / Vol. 34, No. 6_1 / pp. 1025-1032 / 2018
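  • Snow cover, defined as snow accumulated on the ground surface as a form of precipitation, is the largest single component of the cryosphere. It affects the regulation of the Earth's surface temperature, playing an important role in heat exchange between the surface and the atmosphere and in maintaining the Earth's energy balance at global and regional scales. However, because snow cover is mainly distributed in areas that are difficult for humans to access, satellite-based snow detection is actively carried out, and snow detection in forested areas is the most important step after distinguishing clouds from snow. Accordingly, this study applied the Normalized Difference Snow Index (NDSI) and the Normalized Difference Vegetation Index (NDVI), which existing polar-orbiting satellites use for snow detection in forested areas, to a geostationary satellite. For areas outside forests, snow detection was performed using NDSI together with the $R_{1.61{\mu}m}$ anomaly technique, which exploits the spectral characteristics of snow. Indirect validation using the Snow Cover data produced in this study and the Visible Infrared Imaging Radiometer (VIIRS) Snow Cover data gave a Probability of Detection (POD) of 99.95% and a False Alarm Ratio (FAR) of 16.63%. A qualitative validation was also performed using Himawari-8 Advanced Himawari Imager (AHI) RGB imagery, and the results showed a mixture of areas that VIIRS Snow Cover failed to detect and areas that this study falsely detected.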
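
The indices and validation scores named in the abstract above follow standard definitions, sketched below. The band pairing for NDSI (green and shortwave infrared reflectance), the reflectance values, and the pixel masks are assumptions for illustration; the thresholds and bands actually used in the study are not given in the abstract.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndsi(green, swir):
    """Normalized Difference Snow Index from green and SWIR reflectance."""
    return normalized_difference(green, swir)


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def pod_far(predicted, reference):
    """Probability of Detection and False Alarm Ratio for boolean masks.

    POD = hits / (hits + misses)
    FAR = false alarms / (hits + false alarms)
    """
    pairs = list(zip(predicted, reference))
    hits = sum(1 for p, r in pairs if p and r)
    misses = sum(1 for p, r in pairs if not p and r)
    false_alarms = sum(1 for p, r in pairs if p and not r)
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = (false_alarms / (hits + false_alarms)
           if hits + false_alarms else float("nan"))
    return pod, far


# Hypothetical reflectances: snow is bright in green and dark in SWIR.
snow_index = ndsi(green=0.8, swir=0.1)
vegetation_index = ndvi(nir=0.5, red=0.1)

# Hypothetical per-pixel snow masks: this product vs. a reference product.
predicted = [True, True, True, False, True, False]
reference = [True, True, False, False, True, True]
pod, far = pod_far(predicted, reference)
```

A high POD with a nonzero FAR, as reported in the abstract, means that almost all reference snow pixels were detected while some pixels were flagged as snow that the reference product did not mark.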