• Title/Summary/Keyword: random factor

Prediction and Analysis of PM2.5 Concentration in Seoul Using Ensemble-based Model (앙상블 기반 모델을 이용한 서울시 PM2.5 농도 예측 및 분석)

  • Ryu, Minji;Son, Sanghun;Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.38 no.6_1 / pp.1191-1205 / 2022
  • Particulate matter (PM), an air pollutant with complex and widespread causes, is classified by particle size. PM2.5 is very small and can cause respiratory or cardiovascular disease when inhaled. To prepare for these risks, state-led management and preventive monitoring and forecasting are important. This study predicted PM2.5 in Seoul, where high concentrations of fine dust occur frequently, using two ensemble models, random forest (RF) and extreme gradient boosting (XGB), with 15 weather-related factors from the local data assimilation and prediction system (LDAPS), aerosol optical depth (AOD), and 4 chemical factors as independent variables. The performance and factor importance of the two models were evaluated, and a seasonal model analysis was also performed. RF achieved R² = 0.85 and XGB achieved R² = 0.91, confirming that XGB was the more suitable model for PM2.5 prediction. The seasonal model analysis indicated good prediction performance against the observed high concentrations in spring. In summary, PM2.5 in Seoul was predicted using various factors, and a well-performing ensemble-based PM2.5 prediction model was constructed.

Characteristics of Distribution of Phytoplankton Communities in Three Estuarial Lakes of the Yeongsan River (영산강 하구역에 위치한 세 호수의 식물플랑크톤 군집 분포 특성)

  • Cho, Hyeon Jin;Na, Jeong Eun;Lee, Gun Ju;Lee, Hak Young
    • Korean Journal of Ecology and Environment / v.54 no.4 / pp.291-302 / 2021
  • The phytoplankton community in an estuarine system is easily affected by changes in physicochemical factors. The present study analyzed phytoplankton community distribution and similarity, and explored the factors influencing variations in phytoplankton community structure, in three lakes located in the Yeongsan River estuary from March 2014 to November 2017. We carried out non-metric multidimensional scaling (NMDS) and random forest (RF) analysis to compare the pattern of phytoplankton distribution and the relationship between phytoplankton distribution and environmental variables. Similarity Percentage (SIMPER) and Analysis of Similarity (ANOSIM) were performed to determine the similarity of the phytoplankton community at each site of the three lakes. According to NMDS, phytoplankton community distribution differed between Yeongsan and Gumho lakes, and the factors influencing the distribution of phytoplankton communities across the three lakes were water temperature, dissolved oxygen, total nitrogen (T-N), nitrate-N (NO3-N), and conductivity. Based on RF, NO3-N was a key factor influencing phytoplankton community structure in the three lakes. A total of 24 species were identified as indicator species in the three lakes, with the highest number observed in Yeongsan Lake (13) and the lowest in Yeongam Lake (2). According to the SIMPER and ANOSIM results, the phytoplankton communities in Yeongsan and Yeongam lakes were similar, and they differed from those in Gumho Lake. In addition, the phytoplankton community structure varied across the study sites in the three lakes, indicating that water channels across the lakes have a minor influence on phytoplankton community distribution.

Comparison of Genome-wide Association Study (GWAS) Algorithms for Detecting Genetic Variants Associated with Growth Traits in Olive Flounder Paralichthys olivaceus (넙치(Paralichthys olivaceus)의 성장형질 연관 유전자 변이 탐색을 위한 전장유전체연관분석(GWAS) 알고리즘 비교 분석 연구)

  • Sangwon Yoon;Heegun Lee;Jong-Won Park;Minhwan Jeong;Dain Lee;Hyo Sun Jung;Julan Kim;Hye-Rim Yang;Seung Hwan Lee;Jeong-Ho Lee
    • Korean Journal of Fisheries and Aquatic Sciences / v.56 no.4 / pp.411-418 / 2023
  • Genome-wide association studies (GWAS) identify genetic loci associated with quantitative traits in genomic selection. Although several studies have compared the performance of various algorithms, none has compared them in olive flounder Paralichthys olivaceus. This study compared the GWAS results of four mixed linear model (MLM) algorithms and one Fixed and random model Circulating Probability Unification (FarmCPU) algorithm in olive flounder. With gender and genetic association matrices treated as fixed and random effects, respectively, the MLM showed stable performance without inflation in λGC (genomic inflation factor) of -log10P. The FarmCPU algorithm had a somewhat appropriate λGC of -log10P, and an upward tail was identified in quantile-quantile plots. Therefore, the models were suitable for detecting genetic variants associated with olive flounder growth traits. Moreover, significant genotypes appeared several times on chromosome 22, around which quantitative trait loci are expected to exist. Finally, in both models, some of the most significant genetic variants were found in genes related to growth traits, confirming their reliability. These results will be helpful when applied to the genomic selection of olive flounder growth traits in the future.
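
The genomic inflation factor (λGC) mentioned in this abstract is commonly computed as the ratio of the median observed test statistic to its expected median under the null hypothesis. The following is a minimal sketch of that common definition, assuming two-sided p-values from 1-degree-of-freedom tests; the function name `genomic_inflation` and the sample p-values are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist, median

def genomic_inflation(p_values):
    """Return lambda_GC for GWAS p-values (1-df chi-square tests assumed)."""
    norm = NormalDist()
    # Convert each two-sided p-value to a 1-df chi-square statistic: chi2 = z^2.
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    # Median of the 1-df chi-square distribution under the null (about 0.455).
    expected_median = norm.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median

# Hypothetical, evenly spaced p-values standing in for a null (uninflated) scan.
uniform_p = [(i + 0.5) / 1001 for i in range(1001)]
lambda_gc = genomic_inflation(uniform_p)  # close to 1.0, i.e. no inflation
```

Values well above 1 would suggest inflated test statistics, for example from unaccounted population structure; values near 1 correspond to the "without inflation" behavior the abstract reports for the MLM.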

The Prediction of Survival of Breast Cancer Patients Based on Machine Learning Using Health Insurance Claim Data (건강보험 청구 데이터를 활용한 머신러닝 기반유방암 환자의 생존 여부 예측)

  • Doeggyu Lee;Kyungkeun Byun;Hyungdong Lee;Sunhee Shin
    • Journal of Korea Society of Industrial Information Systems / v.28 no.2 / pp.1-9 / 2023
  • Research using AI and big data is being actively conducted in health and medical fields such as disease diagnosis and treatment. Most existing studies used cohort data from research institutes or data from a limited set of patients. This paper used health insurance review claim data held by the Health Insurance Review and Assessment Service (HIRA) to reveal differences in survival prediction rates, and in the factors affecting survival, between breast cancer patients in their 40s and 50s and those in other age groups. The accuracy of predicting patient survival averaged 0.93 for patients in their 40s and 50s, higher than the 0.86 for those in their 60s to 80s. Among the influencing factors, the number of treatments ranked high for patients in their 40s and 50s, and age ranked high for those in their 60s to 80s. Compared with previous studies, the average precision was 0.90, higher than the 0.81 reported in the existing paper. In the comparison by algorithm, the overall average precision of Decision Tree, Random Forest, and Gradient Boosting was 0.90 with a recall of 1.0, and the precision of the multi-layer perceptron was 0.89 with a recall of 1.0. We hope that more research will be conducted using automated machine learning (AutoML) tools for non-professionals to increase the value obtained from the health insurance review claim data held by the HIRA.

A Study on the Prediction of Mortality Rate after Lung Cancer Diagnosis for Men and Women in 80s, 90s, and 100s Based on Deep Learning (딥러닝 기반 80대·90대·100대 남녀 대상 폐암 진단 후 사망률 예측에 관한 연구)

  • Kyung-Keun Byun;Doeg-Gyu Lee;Se-Young Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.2 / pp.87-96 / 2023
  • Recently, research on predicting disease treatment outcomes using deep learning has become active in the medical community. However, such studies have typically used small patient datasets and specific deep learning algorithms, and have shown meaningful results only under specific conditions. To generalize the findings, this study expanded and subdivided the patient population and predicted mortality after lung cancer diagnosis for men and women in their 80s, 90s, and 100s. Using large-scale medical information from the Health Insurance Review and Assessment Service together with AutoML, which provides various deep learning algorithms, five models (Decision Tree, Random Forest, Gradient Boosting, XGBoost, and Logistic Regression) were built to predict mortality for 84 months after lung cancer diagnosis. Men in their 80s and 90s had a higher mortality prediction rate than women, whereas women in their 100s had a higher mortality prediction rate than men. The factor with the greatest influence on mortality was the treatment period.

Factor Analysis of Seaborne Trade Volume Affecting on The World Economy (품목별 해상 물동량이 세계 경제에 미치는 영향 요인분석)

  • Ahn, Young-Gyun;Lee, Min-Kyu;Park, Ju-Dong
    • Korea Trade Review / v.42 no.2 / pp.277-296 / 2017
  • More than 95% of the world's imports and exports are transported by vessel; in other words, marine transportation accounts for a large share of world trade. The purpose of this study is to analyze, by item, how seaborne trade volume affects the world economy. A linear regression analysis between seaborne trade volume and the world economy (world GDP) was conducted to estimate the correlation between them, and panel data analysis with a random effects model was applied to examine the effect of seaborne trade volume. Seaborne trade volume was categorized into 10 items, and the study estimated how much global GDP is affected when the trade volume changes. In addition, the Granger causality test was conducted to verify the relationship between seaborne trade volume and world GDP. The results show that seaborne trade volume and world GDP influence each other, although seaborne trade volume affects the world economy more significantly. The items affecting world economic growth include petroleum products, crude oil, and chemical products, among others, with estimated coefficients of 1.014, 1.013, and 1.010, respectively. The estimate of 1.014 for petroleum products means that the world GDP growth rate becomes 1.014 times the current rate when the seaborne trade volume of petroleum products increases by one unit. Lastly, this study examines the seaborne trade volume of the 10 categories and verifies whether the growth rate of world GDP increases when seaborne trade volume increases. This study is expected to provide policy-makers with useful information for formulating policies related to international trade.

Assessment of Landslide Susceptibility in Jecheon Using Deep Learning Based on Exploratory Data Analysis (데이터 탐색을 활용한 딥러닝 기반 제천 지역 산사태 취약성 분석)

  • Sang-A Ahn;Jung-Hyun Lee;Hyuck-Jin Park
    • The Journal of Engineering Geology / v.33 no.4 / pp.673-687 / 2023
  • Exploratory data analysis is the process of observing and understanding data collected from various sources to identify their distributions and correlations through their structures and characterization. This process can be used to identify correlations among conditioning factors and select the most effective factors for analysis. This can help the assessment of landslide susceptibility, because landslides are usually triggered by multiple factors, and the impacts of these factors vary by region. This study compared two stages of exploratory data analysis to examine the impact of the data exploration procedure on the landslide prediction model's performance with respect to factor selection. Deep-learning-based landslide susceptibility analysis used either a combination of selected factors or all 23 factors. During the data exploration phase, we used a Pearson correlation coefficient heat map and a histogram of random forest feature importance. We then assessed the accuracy of our deep-learning-based analysis of landslide susceptibility using a confusion matrix. Finally, a landslide susceptibility map was generated using the landslide susceptibility index derived from the proposed analysis. The analysis revealed that using all 23 factors resulted in low accuracy (55.90%), but using the 13 factors selected in one step of exploration improved the accuracy to 81.25%. This was further improved to 92.80% using only the nine conditioning factors selected during both steps of the data exploration. Therefore, exploratory data analysis selected the conditioning factors most suitable for landslide susceptibility analysis, thereby improving the performance of the analysis.
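
The correlation-based screening step described in this abstract can be illustrated with a small sketch. This is not the authors' procedure: it assumes a simple greedy rule that drops any factor whose absolute Pearson correlation with an already-kept factor reaches a threshold. The helper names, the 0.9 threshold, the factor names, and the data are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def drop_correlated(factors, threshold=0.9):
    """Greedily keep factors whose |r| with every kept factor is below threshold."""
    kept = []
    for name in factors:
        if all(abs(pearson(factors[name], factors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical conditioning factors (made-up values, for illustration only).
factors = {
    "slope": [1.0, 2.0, 3.0, 4.0, 5.0],
    "slope_copy": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with slope
    "elevation": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_correlated(factors)  # slope_copy is dropped as redundant
```

In practice a second criterion, such as the random forest feature importance the abstract mentions, would be combined with a filter like this one.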

Single-Channel Seismic Data Processing via Singular Spectrum Analysis (특이 스펙트럼 분석 기반 단일 채널 탄성파 자료처리 연구)

  • Woodon Jeong;Chanhee Lee;Seung-Goo Kang
    • Geophysics and Geophysical Exploration / v.27 no.2 / pp.91-107 / 2024
  • Single-channel seismic exploration has proven effective in delineating subsurface geological structures using small-scale survey systems. The seismic data acquired through zero- or near-offset methods directly capture subsurface features along the vertical axis, facilitating the construction of corresponding seismic sections. However, substantial noise in single-channel seismic data hampers precise interpretation because of the low signal-to-noise ratio. This study introduces a novel approach that integrates noise reduction and signal enhancement via matrix rank optimization to address this issue. Unlike conventional rank-reduction methods, which retain selected singular values to mitigate random noise, our method optimizes the entire singular value spectrum, thus effectively tackling both the random and erratic noise commonly found in environments with a low signal-to-noise ratio. Additionally, to enhance the horizontal continuity of seismic events and mitigate signal loss during noise reduction, we introduced an adaptive weighting factor computed from the eigenimage of the seismic section. To assess the robustness of the proposed method, we conducted numerical experiments using single-channel Sparker seismic data from the Chukchi Plateau in the Arctic Ocean. The results demonstrated that the seismic sections had significantly improved signal-to-noise ratios and minimal signal loss. These advancements hold promise for enhancing single-channel and high-resolution seismic surveys and aiding in the identification of marine development and submarine geological hazards in domestic coastal areas.

Subepithelial neutrophil infiltration as a predictor of the surgical outcome of chronic rhinosinusitis with nasal polyps

  • Dong-Kyu Kim;Hee-Suk Lim;Kyoung Mi Eun;Yuju Seo;Joon Kon Kim;Young Seok Kim;Min-Kyung Kim;Siyeon Jin;Seung Cheol Han;Dae Woo Kim
    • Journal of Rhinology / v.59 no.2 / pp.173-180 / 2021
  • Background: Neutrophils present as major inflammatory cells in refractory chronic rhinosinusitis with nasal polyps (CRSwNP), regardless of the endotype. However, their role in the pathophysiology of CRSwNP remains poorly understood. We investigated factors predicting the surgical outcomes of CRSwNP patients with a focus on neutrophilic localization. Methods: We employed machine-learning methods such as the decision tree and random forest models to predict the surgical outcomes of CRSwNP. Immunofluorescence analysis was conducted to detect human neutrophil elastase (HNE), Bcl-2, and Ki-67 in NP tissues. We counted the immunofluorescence-positive cells and divided them into three groups based on the infiltrated area, namely, epithelial, subepithelial, and perivascular groups. Results: In the machine-learning analysis, the decision tree algorithm demonstrated that the number of subepithelial HNE-positive cells, Lund-Mackay (LM) scores, and endotype (eosinophilic or non-eosinophilic) were the most important predictors of surgical outcomes in CRSwNP patients. Additionally, the random forest algorithm showed that, after ranking the mean decrease in the Gini index or the accuracy of each factor, the top three ranking factors associated with surgical outcomes were the LM score, age, and number of subepithelial HNE-positive cells. In terms of cellular proliferation, immunofluorescence analysis revealed that Ki-67/HNE-double positive and Bcl-2/HNE-double positive cells were significantly increased in the subepithelial area in refractory CRSwNP. Conclusion: Our machine-learning approach and immunofluorescence analysis demonstrated that subepithelial neutrophils in NP tissues had a high expression of Ki-67 and could serve as a cellular biomarker for predicting surgical outcomes in CRSwNP patients.

Comparison of Error Rate and Prediction of Compression Index of Clay to Machine Learning Models using Orange Mining (오렌지마이닝을 활용한 기계학습 모델별 점토 압축지수의 오차율 및 예측 비교)

  • Yoo-Jae Woong;Woo-Young Kim;Tae-Hyung Kim
    • Journal of the Korean Geosynthetics Society / v.23 no.3 / pp.15-22 / 2024
  • Predicting ground settlement during the improvement of soft ground and the construction of a structure is a crucial task. Numerous studies have been conducted, and many prediction equations have been proposed to estimate settlement. Settlement can be calculated using the compression index of clay. In this study, data on water content, void ratio, liquid limit, plastic limit, and compression index from the Busan New Port area were collected to construct a dataset, and correlation analysis was conducted among the collected data. Machine learning algorithms, including Random Forest, Neural Network, Linear Regression, AdaBoost, and Gradient Boosting, were applied using the Orange mining program to propose compression index prediction models. The models were evaluated by comparing RMSE and MAPE values, which indicate error rates, and R² values, which indicate goodness of fit. Water content showed the highest correlation, while the plastic limit showed a somewhat lower correlation than the other characteristics. Among the compared models, AdaBoost performed best, with the lowest error rate and a large coefficient of determination.
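
The three evaluation measures named in this abstract (RMSE, MAPE, and R²) have standard definitions, sketched below in plain Python. The function names and the sample values are hypothetical and are not data from the paper; MAPE as written assumes no observed value is zero.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up observed and predicted compression index values, for illustration only.
observed = [0.40, 0.55, 0.70, 0.85]
predicted = [0.42, 0.50, 0.73, 0.80]
scores = {
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "R2": r2(observed, predicted),
}
```

Lower RMSE and MAPE and an R² closer to 1 correspond to the "lowest error rate and a large coefficient of determination" used in the abstract to rank the models.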