• Title/Summary/Keyword: Data Quality Model

Evaluation of Multi-classification Model Performance for Algal Bloom Prediction Using CatBoost (머신러닝 CatBoost 다중 분류 알고리즘을 이용한 조류 발생 예측 모형 성능 평가 연구)

  • Juneoh Kim;Jungsu Park
    • Journal of Korean Society on Water Environment / v.39 no.1 / pp.1-8 / 2023
  • Monitoring and prediction of water quality are essential for effective river pollution prevention and water quality management. In this study, a multi-class classification model was developed to predict the chlorophyll-a (Chl-a) level in rivers using CatBoost, a novel ensemble machine learning algorithm. The model was developed using hourly field monitoring data collected from January 1 to December 31, 2015. For model development, Chl-a was classified into class 1 (Chl-a≤10 ㎍/L), class 2 (10<Chl-a≤50 ㎍/L), and class 3 (Chl-a>50 ㎍/L), for which the numbers of observations used for model training were 27,192, 11,031, and 511, respectively. The macro averages of precision, recall, and F1-score for the three classes were 0.58, 0.58, and 0.58, respectively, while the weighted averages were 0.89, 0.90, and 0.89. The model performed relatively poorly for class 3, which had far fewer observations than the other two classes. This class imbalance was addressed with the synthetic minority over-sampling technique (SMOTE), which balanced the training data at 26,868 observations per class. After SMOTE application, model performance improved, with macro averages of precision, recall, and F1-score for the three classes of 0.58, 0.70, and 0.59, respectively, while the weighted averages were 0.88, 0.84, and 0.86.
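The gap between the macro and weighted averages reported above follows from how each one treats a rare class. The sketch below is a minimal illustration in plain Python, not the authors' evaluation code: the function names and the ten toy labels are hypothetical, chosen so that the rare class 3 is always misclassified.

```python
def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1, support)} from two label lists."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1, tp + fn)
    return scores


def macro_and_weighted(scores):
    """Average (precision, recall, f1) per class, unweighted and by support."""
    n_classes = len(scores)
    total = sum(s[3] for s in scores.values())
    macro = tuple(sum(s[i] for s in scores.values()) / n_classes
                  for i in range(3))
    weighted = tuple(sum(s[i] * s[3] for s in scores.values()) / total
                     for i in range(3))
    return macro, weighted


# Hypothetical toy data: six class-1, three class-2, one class-3 observation.
# The single class-3 observation is predicted as class 2.
y_true = [1] * 6 + [2] * 3 + [3]
y_pred = [1] * 6 + [2] * 3 + [2]

macro, weighted = macro_and_weighted(per_class_scores(y_true, y_pred, [1, 2, 3]))
print("macro    (P, R, F1):", macro)
print("weighted (P, R, F1):", weighted)
```

Because the macro average gives each class equal weight, one poorly predicted minority class pulls it down sharply, while the weighted average is dominated by the large classes.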

A Structural Model for Health Promotion and Quality of Life in People with Cancer (건강증진과 삶의 질 구조모형 II-암환자 중심-)

  • 오복자
    • Journal of Korean Academy of Nursing / v.26 no.3 / pp.632-652 / 1996
  • It has been noted that genetic alteration of cells influenced by an unhealthy lifestyle, in addition to a series of other carcinogens, increases the incidence of various neoplastic diseases. Therefore the importance of a lifestyle that minimizes such an impact on health should be emphasized. Since stomach cancer, the most common neoplastic disease in Korea, is related to personal lifestyle and can recur, patients with stomach cancer need to lead a healthy lifestyle. The quality of life that patients experience is also negatively affected by the side effects of treatment and the possibility of recurrence. Therefore an effective nursing intervention to enhance quality of life and encourage a healthy lifestyle is needed. The purpose of this study is to provide a basis for nursing intervention strategies to promote health and thus enhance quality of life. A hypothetical model for this purpose was constructed based on Pender's Health Promotion Model and Becker's Health Belief Model, with the inclusion of some influential factors such as hope for quality of life and health promoting behavior. The aims of the study were to: 1) evaluate the effects of patients' cognitive-perceptual factors on health promoting behaviors and quality of life; 2) examine the causal relationships among perceived benefit, perceived barrier, perceived susceptibility and severity, internal locus of control, perceived health status, hope, health concept, self efficacy, self esteem, health promoting behaviors, and quality of life; 3) build and test a global hypothetical model. The subjects were 164 patients being treated for stomach cancer, who were approached in the outpatient clinic of a university hospital. The data from the completed questionnaires were analyzed using Linear Structural Relationships (LISREL).
The results of the research are as follows: 1) The hypothetical model and the modified model showed a good fit to the empirical data, revealing considerable explanatory power for health promoting behaviors (54.9%) and quality of life (87.6%). 2) Self efficacy and hope had significant effects on health promoting behaviors. Of these, the effect of hope was indirect, operating through self efficacy and self esteem. 3) Perceived health status, hope, and self esteem had significant direct effects on quality of life. Of these variables, perceived health status was the most essential factor affecting general satisfaction in life. 4) Self efficacy, as a mediating variable, was positively affected by perceived benefit and hope. 5) Self esteem, as a mediating variable, was positively affected by perceived health status and hope. 6) Hope was the main variable affecting self efficacy, self esteem, health promoting behaviors, and quality of life. The derived model could be used effectively as a reference model for further study and could suggest a direction for nursing practice.

Ensemble Method for Predicting Particulate Matter and Odor Intensity (미세먼지, 악취 농도 예측을 위한 앙상블 방법)

  • Lee, Jong-Yeong;Choi, Myoung Jin;Joo, Yeongin;Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering / v.42 no.4 / pp.203-210 / 2019
  • Recently, a number of researchers have published studies and reports aimed at forecasting air quality indicators such as particulate matter and odor more accurately. However, such research mainly focuses on the atmospheric diffusion models that have been used for air quality prediction in environmental engineering. Although these models have various merits, they are limited in that they use very few spatial attributes, such as geographical attributes. Thus, we propose a new approach to forecasting air quality using a deep learning based ensemble model that combines a temporal and a spatial predictor. The temporal predictor employs an RNN LSTM, and the spatial predictor is based on the geographically weighted regression model. The ensemble model also uses an RNN LSTM, which combines the two models in a stacking structure. The ensemble model is capable of inferring the air quality of areas without an air quality monitoring station, and even of forecasting future air quality. We installed IoT sensors measuring PM2.5, PM10, H2S, NH3, and VOC at 8 stations in Jeonju to gather air quality data. The numerical results showed that the predictions of the new model agree closely with the measured data. This implies that spatial attributes should be considered for more accurate air quality prediction.
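The idea of inferring air quality at a location without a monitoring station can be illustrated with distance-based kernel weighting, the ingredient that geographically weighted regression uses to weight nearby observations. The sketch below is not the authors' model: it is an intercept-only simplification (a Gaussian-kernel weighted average), and the station coordinates, readings, bandwidth, and function names are all hypothetical.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: nearby stations get weights near 1, distant ones near 0."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def spatial_estimate(target, stations, bandwidth):
    """Kernel-weighted average of station readings at an unmonitored point.

    target   -- (x, y) coordinates of the point to estimate
    stations -- list of ((x, y), reading) pairs
    """
    weights = [
        gaussian_weight(math.hypot(loc[0] - target[0], loc[1] - target[1]),
                        bandwidth)
        for loc, _ in stations
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("bandwidth too small: all kernel weights underflowed")
    return sum(w * reading for w, (_, reading) in zip(weights, stations)) / total


# Hypothetical stations (coordinates in km) with PM2.5-like readings.
stations = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0)]

midpoint = spatial_estimate((0.5, 0.0), stations, bandwidth=1.0)
near_first = spatial_estimate((0.0, 0.0), stations, bandwidth=0.1)
print(midpoint, near_first)
```

A full geographically weighted regression would fit a local regression on covariates with these weights rather than averaging the readings directly.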

Quality Characteristics of Public Open Data (공공개방데이터 품질 특성에 관한 연구)

  • Park, Go-Eun;Kim, Chang-Jae
    • Journal of Digital Convergence / v.13 no.10 / pp.135-146 / 2015
  • Opening public data is one of the important tasks of Korea's Government 3.0. By making open data available to the private sector, the goal is to create jobs, increase innovation, and improve quality of life. Opening public data is a policy whose importance is emphasized worldwide. Open data should be of adequate quality to achieve this public purpose. However, open data suffer from quality problems due to the lack of data quality management and standardization. The purpose of this study is to derive the quality characteristics of public open data from existing research. In addition, the model was modified and verified through a survey of experts on public open data. The study identifies the quality characteristics of public open data as publicity, usability, reliability, and suitability. This study is significant in that it suggests quality characteristics for improving data quality and promoting the utilization of open data.

The software quality measurement based on software reliability model (소프트웨어 신뢰성 모델링 기반 소프트웨어 품질 측정)

  • Jung, Hye-Jung
    • Journal of the Korea Convergence Society / v.10 no.4 / pp.45-50 / 2019
  • This study proposes a method for measuring software reliability based on a software reliability measurement model. The model presented in this study uses the Non-Homogeneous Poisson Process distribution, and a measure of software reliability is presented for that model. To select a suitable software reliability growth model, we studied a method that proposes an appropriate software reliability function by calculating the mean square error between the estimated values of the reliability function and the software failure data. In this study, we propose a reliability function for measuring software quality and suggest a method for selecting the software reliability function that minimizes the estimation error when applied to the failure data.
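Selecting a reliability growth model by mean square error can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the two candidate mean value functions (Goel-Okumoto and delayed S-shaped) are well-known NHPP models chosen here for illustration, the parameter values are assumed to have been estimated already, and the failure data are synthetic.

```python
import math


def goel_okumoto(t, a, b):
    """NHPP mean value function m(t) = a * (1 - exp(-b t))."""
    return a * (1.0 - math.exp(-b * t))


def delayed_s_shaped(t, a, b):
    """NHPP mean value function m(t) = a * (1 - (1 + b t) * exp(-b t))."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


def mean_square_error(model, params, times, cumulative_failures):
    """MSE between the fitted m(t) and the observed cumulative failure counts."""
    errors = [(model(t, *params) - n) ** 2
              for t, n in zip(times, cumulative_failures)]
    return sum(errors) / len(errors)


def reliability(model, params, t, x):
    """P(no failure in (t, t + x]) for an NHPP: exp(-(m(t + x) - m(t)))."""
    return math.exp(-(model(t + x, *params) - model(t, *params)))


# Synthetic failure data, generated from the Goel-Okumoto curve itself so that
# the expected winner is known in advance (hypothetical values throughout).
times = list(range(1, 11))
observed = [goel_okumoto(t, 100.0, 0.1) for t in times]

# Parameter estimation (for example by maximum likelihood) is omitted here.
candidates = {
    "goel_okumoto": (goel_okumoto, (100.0, 0.1)),
    "delayed_s_shaped": (delayed_s_shaped, (100.0, 0.1)),
}
mse = {name: mean_square_error(f, p, times, observed)
       for name, (f, p) in candidates.items()}
best = min(mse, key=mse.get)
print(best, reliability(*candidates[best], t=10.0, x=1.0))
```

With real failure data each candidate would first be fitted to the data, and the comparison would then be made on the fitted curves.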

Development of System for Enhancing the Quality of Power Generation Facilities Failure History Data Based on Explainable AI (XAI) (XAI 기반 발전설비 고장 기록 데이터 품질 향상 시스템 개발)

  • Kim Yu Rim;Park Jeong In;Park Dong Hyun;Kang Sung Woo
    • Journal of Korean Society for Quality Management / v.52 no.3 / pp.479-493 / 2024
  • Purpose: The deterioration in the quality of failure history data, caused by differences in how workers at power plants interpret failures and by inconsistency in how failures are recorded, negatively impacts the efficient operation of power plants. The purpose of this study is to propose a system that classifies failures of power generation facilities consistently, based on the failure history text data created by the workers. Methods: This study utilizes data collected from three coal unloaders operated by Korea Midland Power Co., LTD, from 2012 to 2023. It classifies failures based on the results of Soft Voting, which incorporates the prediction probabilities obtained with predict_proba from four machine learning models (Random Forest, Logistic Regression, XGBoost, and SVM), along with scores obtained by constructing word dictionaries for each type of failure using LIME, one of the XAI (Explainable Artificial Intelligence) methods. On this basis, a failure classification system is proposed to improve the quality of failure history data for power generation facilities. Results: When the failure classification system was applied to the failure history data of the Continuous Ship Unloader, XGBoost showed the best performance, with a Macro_F1 Score of 93%. With the proposed system, the Macro_F1 Score for Logistic Regression increased by up to 0.17 compared to when the model was applied alone. With the system applied, all four models showed Accuracy and Macro_F1 Score values equal to or higher than those of the single model alone. Conclusion: This study proposes a failure classification system for power generation facilities to improve the quality of failure history data. This will contribute to cost reduction and stability of power generation facilities, as well as to further improvement of power plant operation efficiency.
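Soft voting, as named in the Methods above, averages the class probabilities produced by several models and picks the class with the highest average. The sketch below shows only that averaging step in plain Python; the LIME-based dictionary scores that the study adds are not modeled, and the failure-type labels and probability values are hypothetical.

```python
def soft_vote(probabilities_by_model, class_labels):
    """Average per-class probabilities across models and return the top class.

    probabilities_by_model -- one probability list per model, each ordered
                              like class_labels (e.g. rows of predict_proba)
    """
    n_models = len(probabilities_by_model)
    averaged = [
        sum(probs[i] for probs in probabilities_by_model) / n_models
        for i in range(len(class_labels))
    ]
    best_index = max(range(len(class_labels)), key=averaged.__getitem__)
    return class_labels[best_index], averaged


# Hypothetical failure types and probabilities for one failure record,
# one row per model (for example four different classifiers).
class_labels = ["electrical", "mechanical", "hydraulic"]
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
    [0.2, 0.6, 0.2],
]

label, averaged = soft_vote(model_probs, class_labels)
print(label, averaged)
```

Unlike hard (majority) voting, this keeps each model's confidence, so a single very confident model can outweigh several uncertain ones.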

A Study on the Operational Forecasting of the Nakdong River Flow with a Combined Watershed and Waterbody Model (실시간 낙동강 흐름 예측을 위한 유역 및 수체모델 결합 적용 연구)

  • Na, Eun Hye;Shin, Chang Min;Park, Lan Joo;Kim, Duck Gil;Kim, Kyunghyun
    • Journal of Korean Society on Water Environment / v.30 no.1 / pp.16-24 / 2014
  • A combined watershed and receiving waterbody model was developed for operational water flow forecasting of the Nakdong river. The Hydrological Simulation Program-Fortran (HSPF) was used for simulating the flow rates at major tributaries. To simulate the flow dynamics in the main stream, a three-dimensional hydrodynamic model, EFDC, was used with inputs derived from the HSPF simulation. The combined models were calibrated and verified using data measured under different hydrometeorological and hydraulic conditions. The model results were generally in good agreement with the field measurements in both calibration and verification. The 7-day forecasting performance for water flows in the Nakdong river was satisfactory compared with the model calibration results. The forecasting results suggested that the water flow forecasting errors were primarily attributable to uncertainties in the models, the numerical weather prediction, and water release at hydraulic structures such as upstream dams and weirs. From the results, it is concluded that the combined watershed-waterbody model could successfully simulate the water flows in the Nakdong river. It is also suggested that integrating real-time data and information on dam/weir operation plans into the model simulation would be essential to improve forecasting reliability.

Factors Associated With Oral Health Related-quality of Life in Elderly Persons: Applying Andersen's Model (노인의 구강건강 관련 삶의 질 결정 요인에 관한 연구 - 앤더슨 모델(Andersen Model)의 적용 -)

  • Yom, Young-Hee;Han, Jung-Hee
    • Journal of Korean Academy of Fundamentals of Nursing / v.21 no.1 / pp.18-28 / 2014
  • Purpose: This study applied Andersen's behavioral model to identify factors that determine oral health-related quality of life in elderly persons. Methods: Participants were 257 people aged 65 years or older. Data were analyzed using frequency, percentage, mean, and hierarchical multiple regression. Results: The variables in the behavioral model (predisposing factors, enabling factors, and need factors) explained 31% (F=12.7, p<.001) of the variance in oral health-related quality of life. The predisposing factors, enabling factors, need factors, and health behavior collectively explained 35% (F=9.22, p<.001) of the variance in oral health-related quality of life. Factors influencing oral health-related quality of life in older adults were ADL and IADL, self-reported oral health status, xerostomia, and dental care in the last 12 months. Conclusions: The analysis showed that the need factor had the highest relative importance of the three factors. The model used for this study can be used to predict oral health-related quality of life.

Customer Satisfaction Measurement Model Based on QFD

  • Liu, Yumin;Xu, Jichao
    • International Journal of Quality Innovation / v.4 no.2 / pp.101-122 / 2003
  • With the development of the American Customer Satisfaction Index (ACSI), research on customer satisfaction measurement and evaluation methods has become significant in the last decade. Most international customer satisfaction barometers or indices have evolved from the cause and effect relationship model of the ACSI. Of critical importance to the validity of customer satisfaction indices is how to construct a measurement attribute or indicator model and provide an effective implementation method. Quality Function Deployment (QFD) is a very useful tool for translating the customer voice into product design through quality engineering. In fact, it is a methodology for measuring and analyzing evaluation indicators by their relationship matrix. In this paper, we integrate the framework of QFD into the measurement problem of customer satisfaction and develop a new multi-phase QFD model for evaluation of the Customer Satisfaction Index (CSI). From the houses of quality in this model, the evaluation indicators affecting the customer's global satisfaction are identified by means of their relationship matrix. Then the evaluation indicator hierarchy and its measurement method for the customer satisfaction index are presented graphically. Furthermore, survey data from the Chinese automobile maintenance sector and a relevant case study are used to show how the QFD model is implemented to measure and analyze customer satisfaction.
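The role of the relationship matrix in a house of quality can be shown with a small calculation. The sketch below uses the conventional QFD weighting, in which each indicator's importance is the sum of requirement weights multiplied by relationship strengths. It is an illustration, not the authors' multi-phase model: the 0/1/3/9 strength scale is a common QFD convention assumed here, and the weights and matrix values are hypothetical.

```python
def indicator_importance(requirement_weights, relationship_matrix):
    """Normalized importance of each evaluation indicator.

    requirement_weights -- importance weight of each customer requirement
    relationship_matrix -- rows are requirements, columns are indicators;
                           entries are relationship strengths
    """
    n_indicators = len(relationship_matrix[0])
    raw = [
        sum(w * row[j] for w, row in zip(requirement_weights, relationship_matrix))
        for j in range(n_indicators)
    ]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical example: three customer requirements, three indicators.
requirement_weights = [0.5, 0.3, 0.2]
relationship_matrix = [
    [9, 1, 0],  # requirement 1: strong link to indicator 1, weak to 2
    [1, 9, 3],  # requirement 2: strong link to indicator 2
    [0, 3, 9],  # requirement 3: strong link to indicator 3
]

weights = indicator_importance(requirement_weights, relationship_matrix)
print(weights)
```

In a multi-phase model, the indicator weights from one house of quality would serve as the requirement weights of the next.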

Enterprise-wide Production Data Model for Decision Support System and Production Automation (생산 자동화 및 의사결정지원시스템 지원을 위한 전사적 생산데이터 프레임웍 개발)

  • Jang J.D.;Hong S.S.;Kim C.Y.;Bae S.M.
    • Proceedings of the Korean Society of Precision Engineering Conference / 2006.05a / pp.615-616 / 2006
  • Many manufacturing companies manage their production-related data for quality management and production management. Nevertheless, production-related data should be closely related to each other. Stored data is mainly used to monitor process and product errors. In this paper, we provide an enterprise-wide production data model for decision support systems and production automation. Process data, quality-related data, and test data are integrated to identify inter- and intra-process dependencies, forecast yield, and track the trend of process status. In addition, the model helps the manufacturing decision support system make decisions on critical manufacturing problems.
