• Title/Summary/Keyword: Decision Tree

Affected Model of Indoor Radon Concentrations Based on Lifestyle, Greenery Ratio, and Radon Levels in Groundwater (생활 습관, 주거지 주변 녹지 비율 및 지하수 내 라돈 농도 따른 실내 라돈 농도 영향 모델)

  • Lee, Hyun Young;Park, Ji Hyun;Lee, Cheol-Min;Kang, Dae Ryong
    • Journal of health informatics and statistics
    • /
    • v.42 no.4
    • /
    • pp.309-316
    • /
    • 2017
  • Objectives: Radon and its progeny are carcinogens that pose an environmental risk, especially to the lungs. Identifying the factors that affect indoor radon concentrations, and modeling them, is needed to prevent exposure to radon and to reduce indoor radon concentrations. The purpose of this study was to identify factors affecting indoor radon concentration and to construct a comprehensive model thereof. Methods: Questionnaires were administered to obtain data on residential environments, including building materials and lifestyle. Decision tree analysis and structural equation modeling were applied to predict residences at risk for higher radon concentrations and to develop the comprehensive model. Results: Greenery ratio, impermeable layer ratio, residence at ground level, daily ventilation, long-term heating, a crack around the measuring device, and bedroom were shown to be significant predictive factors of higher indoor radon concentrations. Daily ventilation reduced the probability of homes having indoor radon concentrations $\geq 200\ \mathrm{Bq/m^3}$ by 11.6%. Meanwhile, a greenery ratio $\geq 65\%$ without daily ventilation increased this probability by 15.3% compared to daily ventilation. The constructed model indicated that greenery ratio and ventilation rate directly affect indoor radon concentrations. Conclusions: Our model highlights the combined influences of geographical properties, groundwater, and lifestyle factors of an individual resident on indoor radon concentrations in Korea.

Factors analysis of the cyanobacterial dominance in the four weirs installed in of Nakdong River (낙동강의 중·하류 4개보에서 남조류 우점 환경 요인 분석)

  • Kim, Sung jin;Chung, Se woong;Park, Hyung seok;Cho, Young cheol;Lee, Hee suk
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2019.05a
    • /
    • pp.413-413
    • /
    • 2019
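  • Excessive proliferation of cyanobacteria in rivers and lakes (hereafter, the algal bloom problem) reduces the biodiversity of freshwater ecosystems and produces taste-and-odor compounds in drinking water, which impairs water use. In addition, mass growth of toxin-producing harmful cyanobacteria can cause fatal harm to the health of livestock and humans. In Korea, algal blooms had previously caused intermittent problems in stagnant waters such as dam reservoirs and estuarine lakes, but since 16 weirs were installed under the Four Major Rivers Project (2010-2011), blooms have occurred widely in large rivers such as the Nakdong, Geum, and Yeongsan Rivers and have emerged as an important social and environmental issue. Various explanations have been offered for the blooms that occur frequently in the weir reaches of large rivers, including climate change driven by rising global temperatures, excessive nutrient inflow from the watershed, reduced flow due to drought, and increased residence time due to weir installation. However, because the causes of blooms differ with the characteristics of the watershed and water body, or because multiple factors act in combination, a universal interpretation is difficult. Accurate cause analysis and effective countermeasures for each river system and each weir therefore require a more scientific and objective approach based on intensive experimental data and data mining techniques. This study aimed to conduct intensive field surveys and experimental analyses at four weirs on the Nakdong River (Gangjeong-Goryeong, Dalseong, Hapcheon-Changnyeong, and Changnyeong-Haman), where cyanobacterial blooms have occurred frequently since the weirs were installed in 2012, and to apply statistical analysis and various data modeling techniques to the collected meteorological, hydrological, water quality, and algal data in order to identify the environmental conditions for cyanobacterial dominance at each weir and the key control variables for managing it. Qualitative and quantitative analyses of water quality and phytoplankton at each weir were carried out over two years, from May 2017 to November 2018. Correlations between cyanobacterial cell density and environmental factors were analyzed, and the environmental factors behind cyanobacterial dominance at each weir were interpreted comprehensively on the basis of various parametric and nonparametric data mining results, including Step-wise Multiple Linear Regressions (SMLR), variable importance evaluation using Random Forests (RF) and Recursive Feature Elimination using Random Forest (RFE-RF), Decision Tree (DT), and Principal Component Analysis (PCA).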

Exploring Feature Selection Methods for Effective Emotion Mining (효과적 이모션마이닝을 위한 속성선택 방법에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence
    • /
    • v.17 no.3
    • /
    • pp.107-117
    • /
    • 2019
  • In the era of SNS, many people rely on it to express their emotions about various kinds of products and services. Therefore, for companies eager to investigate how their products and services are perceived in the market, emotion mining tasks using datasets from SNSs have become more important than ever. Basically, emotion mining is a branch of sentiment analysis based on BOW (bag-of-words) and TF-IDF. However, few emotion mining studies adopt feature selection (FS) methods to look for an optimal set of features that ensures better results. In this sense, this study aims to propose FS methods for conducting emotion mining tasks more effectively with better outcomes. This study uses Twitter and SemEval2007 datasets for the emotion mining experiments. We applied three FS methods: CFS (Correlation based FS), IG (Information Gain), and ReliefF. Emotion mining results were obtained by applying the selected features to nine classifiers. When DT (decision tree) is applied to the Twitter dataset, accuracy increases with the CFS, IG, and ReliefF methods. When LR (logistic regression) is applied to the SemEval2007 dataset, accuracy increases with the ReliefF method.
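
As a rough illustration of the IG (Information Gain) criterion named in this abstract, the sketch below ranks two bag-of-words features by how much they reduce the entropy of the emotion labels. The words, labels, and function names are hypothetical toy values, not data or code from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: 1 means the word occurs in the post, 0 means it does not.
labels = ["joy", "joy", "anger", "anger"]
features = {
    "happy": [1, 1, 0, 0],  # separates the two emotions perfectly
    "today": [1, 0, 1, 0],  # carries no information about the emotion
}

scores = {word: information_gain(values, labels) for word, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

A feature selection step of this kind would keep the top-ranked words and pass only those to a classifier such as a decision tree.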

Convergence Research on Relationships among the inhibiting factors of Dying Well (웰다잉 저해 요인의 관련성에 관한 융합 연구)

  • Lee, Chong Hyung;Ahn, Sang-Yoon;Kim, Yong-Ha;Kim, Kwang-Hwan
    • Journal of the Korea Convergence Society
    • /
    • v.10 no.8
    • /
    • pp.37-44
    • /
    • 2019
  • The purpose of this study is to determine the inhibiting factors of dying well for people who want to have a good death. The final respondents were 1,000 adults aged between 19 and 75 years, selected by stratified random sampling with proportional allocation. The questionnaire consisted of four items on general characteristics and 20 items related to the inhibiting factors of dying well, scored on a 7-point Likert scale. Analysis was conducted using descriptive statistics, correlation analysis, and decision tree analysis. Results showed that, among the inhibiting factors of dying well, "degenerative diseases (such as dementia)" and "loss of control (mental / physical)" scored 5.502 and 5.268 points, respectively; the highest significant positive correlation was found between "bad marital relationship" and "bad relationship with children," followed by "did not receive death education" and "lack of medical policy promotion (dying well)," and then "bad relationship with children" and "indifference of others." These findings suggest that society as a whole will make efforts to improve the perception and practice of a good death, and that life and death education will expand, if death education for dying well is organized and implemented.

A Study on the Development of Readmission Predictive Model (재입원 예측 모형 개발에 관한 연구)

  • Cho, Yun-Jung;Kim, Yoo-Mi;Han, Seung-Woo;Choe, Jun-Yeong;Baek, Seol-Gyeong;Kang, Sung-Hong
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.20 no.4
    • /
    • pp.435-447
    • /
    • 2019
  • To prevent unnecessary re-admission, groups with a high probability of re-admission need to be managed intensively, which requires a re-admission prediction model. Two years of discharge summary data (2016 to 2017) from one university hospital were collected to develop a predictive model of re-admission. Re-admitted patients were defined as those who were discharged more than once during the study period. We conducted descriptive statistics and crosstab analysis to identify the characteristics of re-admitted patients. The re-admission prediction model was developed using logistic regression, neural network, and decision tree. AUC (Area Under Curve) was used for model evaluation. The logistic regression model was selected as the final re-admission predictive model because its AUC was the best at 0.81. The main variables affecting re-admission in the selected logistic regression model were Residential region, Age, CCS, Charlson Index Score, Discharge Dept., Via ER, LOS, Operation, Sex, Total payment, and Insurance. The generalizability of the model developed in this study is limited because it was based on two years of data from one hospital. Future work should collect long-term data from various hospitals and develop a model that can be generalized. Furthermore, it is necessary to develop a model that can predict unplanned re-admission.
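
The AUC used for model evaluation here can be read as the probability that a randomly chosen re-admitted patient gets a higher predicted score than a randomly chosen patient who was not re-admitted. The sketch below computes it by direct pairwise comparison. The scores and outcomes are hypothetical, and `auc` is an illustrative helper, not the study's code.

```python
def auc(scores, labels):
    """Share of (positive, negative) pairs in which the positive case scores higher; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted re-admission probabilities and true outcomes (1 = re-admitted).
predicted = [0.9, 0.8, 0.4, 0.35, 0.1]
readmitted = [1, 0, 1, 0, 0]
result = auc(predicted, readmitted)  # 5 of the 6 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.81 should be read.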

Cost-Utility Analysis of Pegfilgrastim and Pegteograstim in Patients with Breast Cancer using Doxorubicin and Cyclophosphamide (Doxorubicin과 Cyclophosphamide를 투여받는 유방암 환자에서 Pegfilgrastim과 Pegteograstim의 비용-효용 분석)

  • Kwon, Su Ji;Geum, Min Jung;Kim, Jae Song;Son, Eun Sun;Kwon, Kyeng Hee
    • Journal of Korean Society of Health-System Pharmacists
    • /
    • v.35 no.4
    • /
    • pp.409-417
    • /
    • 2018
  • Background: Febrile neutropenia (FN) is one of the side effects in patients treated with chemotherapy, and patients who develop FN generally need immediate treatment with extended-spectrum antibiotics and hospitalization. Pegfilgrastim and pegteograstim, granulocyte-colony stimulating factors (G-CSF) used for the prevention of FN, have been covered by insurance in the Republic of Korea since September 2016 for certain breast cancer patients receiving doxorubicin and cyclophosphamide (AC). Methods: Data on patients with breast cancer who received the AC regimen and G-CSF were collected retrospectively. This study involves a cost-utility analysis of pegfilgrastim and pegteograstim. We constructed a simple decision tree model for short-term observation and calculated quality-adjusted life years (QALY) and the direct medical costs from the medical provider's perspective. Results: From September 2016 to May 2017, 15 patients were treated with pegfilgrastim and 15 patients were treated with pegteograstim. Dividing the average cost by QALY for each treatment group gave 24,923,384 won per QALY for pegfilgrastim and 22,808,336 won per QALY for pegteograstim. Consequently, the incremental cost-effectiveness ratio (ICER) showed that pegfilgrastim cost 2,115,048 won more per QALY than pegteograstim, and the cost per QALY of both drugs was lower than 30,500,000 won, the amount Koreans were willing to pay. Conclusions: This study suggests that pegfilgrastim and pegteograstim can be used to improve the quality of life of breast cancer patients undergoing AC therapy. Of the two drugs, pegteograstim seems to be more cost-effective. However, since this was a small-scale retrospective observational study, it has many limitations. Therefore, a long-term prospective cohort study is needed to supplement the present findings.
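
The arithmetic behind the reported comparison can be checked with the short sketch below, which uses only the figures given in the abstract. The variable names are illustrative. Note that the abstract's figure is a difference between average cost-per-QALY ratios, whereas an ICER is conventionally defined as the difference in cost divided by the difference in QALYs. The abstract does not report the separate totals needed for that calculation, so it is not attempted here.

```python
# Figures reported in the abstract (Korean won per QALY).
cost_per_qaly = {"pegfilgrastim": 24_923_384, "pegteograstim": 22_808_336}
willingness_to_pay = 30_500_000

# Difference between the two average cost-per-QALY values.
difference = cost_per_qaly["pegfilgrastim"] - cost_per_qaly["pegteograstim"]

# Check each drug against the willingness-to-pay threshold.
below_threshold = {drug: cost < willingness_to_pay for drug, cost in cost_per_qaly.items()}

# The drug with the lower cost per QALY.
cheaper = min(cost_per_qaly, key=cost_per_qaly.get)
```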

Evaluation of a Thermal Conductivity Prediction Model for Compacted Clay Based on a Machine Learning Method (기계학습법을 통한 압축 벤토나이트의 열전도도 추정 모델 평가)

  • Yoon, Seok;Bang, Hyun-Tae;Kim, Geon-Young;Jeon, Haemin
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.41 no.2
    • /
    • pp.123-131
    • /
    • 2021
  • The buffer is a key component of an engineered barrier system that safeguards the disposal of high-level radioactive waste. Buffers are located between disposal canisters and host rock, and they can restrain the release of radionuclides and protect canisters from the inflow of groundwater. Since considerable heat is released from a disposal canister to the surrounding buffer, the thermal conductivity of the buffer is a very important parameter for overall disposal safety. For this reason, much research has been conducted on thermal conductivity prediction models that consider various factors. In this study, the thermal conductivity of a buffer is estimated using the following machine learning methods: linear regression, decision tree, support vector machine (SVM), ensemble, Gaussian process regression (GPR), neural network, deep belief network, and genetic programming. Machine learning methods such as ensemble, genetic programming, SVM with cubic parameter, and GPR performed better than the regression model, with the XGBoost ensemble and Gaussian process regression models showing the best performance.

A Study on the Classification of Unstructured Data through Morpheme Analysis

  • Kim, SungJin;Choi, NakJin;Lee, JunDong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.4
    • /
    • pp.105-112
    • /
    • 2021
  • In the era of big data, interest in data is exploding. In particular, the development of the Internet and social media has led to the creation of new data, enabling the era of big data and artificial intelligence and opening a new chapter in convergence technology. There is also growing demand for analysis of data that programs could not handle in the past. In this paper, an analysis model was designed and verified for the classification of unstructured data, which is often required in the era of big data. Thesis summaries, main words, and sub-keywords were crawled from DBPia, a database was created using KoNLP's data dictionary, and words were tokenized through morpheme analysis. In addition, nouns were extracted using KAIST's 9 part-of-speech classification system, TF-IDF values were generated, and an analysis dataset was created by combining training data and Y values. Finally, the adequacy of classification was measured by applying three analysis algorithms (random forest, SVM, decision tree) to the generated analysis dataset. The classification model technique proposed in this paper can be useful in various fields, such as civil complaint classification and text-related analysis, in addition to thesis classification.
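
The TF-IDF step mentioned in this abstract can be sketched as follows. This version assumes one common variant of the weighting (term count divided by document length, times the natural log of the number of documents over the document frequency). The abstract does not say which variant the authors used, and the token lists are hypothetical stand-ins for nouns extracted by morpheme analysis.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, with tf = count / length and idf = log(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical noun tokens standing in for the output of morpheme analysis.
docs = [
    ["data", "classification", "model"],
    ["data", "network", "security"],
    ["data", "classification", "text"],
]
weights = tf_idf(docs)
```

A term that appears in every document, such as "data" here, gets zero weight, while a term unique to one document gets the highest weight. The resulting vectors are what a classifier such as a random forest, SVM, or decision tree would take as input.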

Data-driven Co-Design Process for New Product Development: A Case Study on Smart Heating Jacket (신제품 개발을 위한 데이터 기반 공동 디자인 프로세스: 스마트 난방복 사례 연구)

  • Leem, Sooyeon;Lee, Sang Won
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.1
    • /
    • pp.133-141
    • /
    • 2021
  • This research proposes a design process that complements human-centered design with an objective, data-driven approach. The subjective human-centered design process can lack objectivity, and data-driven approaches can supplement it by effectively uncovering hidden user needs. This research combines data mining analysis with a co-design process and verifies its applicability through a case study on a smart heating jacket. In the data mining process, clustering groups the users, which forms the basis for selecting target groups, and decision tree analysis identifies the important user perception attributes and values. The broad view obtained from the data analysis is refined through the co-design process, a deeper human-centered design process that uses the developed workbook. In the co-design process, the journey maps, needs and pain points, ideas, and values of the target user groups are identified and finalized. These can become the basis for starting new product development.

Improving Efficiency of Food Hygiene Surveillance System by Using Machine Learning-Based Approaches (기계학습을 이용한 식품위생점검 체계의 효율성 개선 연구)

  • Cho, Sanggoo;Cho, Seung Yong
    • The Journal of Bigdata
    • /
    • v.5 no.2
    • /
    • pp.53-67
    • /
    • 2020
  • This study employs a supervised learning prediction model to detect nonconformity in advance among processed food manufacturing and processing businesses. The study followed the standard machine learning procedure: definition of the objective function, data preprocessing and feature engineering, and model selection and evaluation. The dependent variable was set as the number of detections in supervised inspections over the five years from 2014 to 2018, and the objective function was to maximize the probability of detecting nonconforming companies. The data were preprocessed to reflect not only basic attributes such as revenue, operating duration, and number of employees, but also inspection track records and external climate data. After feature extraction, company risk, item risk, environmental risk, and past violation history were derived as feature variables affecting the determination of nonconformity, and the machine learning algorithms were applied to the data. The F1-score of the decision tree, one of the ensemble models, was much higher than those of the other models. Based on the results of this study, official food control for food safety management is expected to be enhanced and geared toward data- and evidence-based management as well as a scientific administrative system.