• Title/Summary/Keyword: Data-driven Research


Data-centric XAI-driven Data Imputation of Molecular Structure and QSAR Model for Toxicity Prediction of 3D Printing Chemicals (3D 프린팅 소재 화학물질의 독성 예측을 위한 Data-centric XAI 기반 분자 구조 Data Imputation과 QSAR 모델 개발)

  • ChanHyeok Jeong;SangYoun Kim;SungKu Heo;Shahzeb Tariq;MinHyeok Shin;ChangKyoo Yoo
    • Korean Chemical Engineering Research / v.61 no.4 / pp.523-541 / 2023
  • As accessibility to 3D printers increases, exposure to the chemicals associated with 3D printing is becoming more frequent. However, research on the toxicity and harmfulness of chemicals generated during 3D printing is insufficient, and the performance of in silico toxicity prediction is limited by missing molecular structure data. In this study, a quantitative structure-activity relationship (QSAR) model based on a data-centric AI approach was developed to predict the toxicity of new 3D printing materials by imputing missing values in their molecular descriptors. First, the MissForest algorithm was used to impute missing values in the molecular descriptors of hazardous 3D printing materials. Then, a machine learning (ML)-based QSAR model built on four different learners (decision tree, random forest, XGBoost, and SVM) was developed to predict the bioconcentration factor (Log BCF), the octanol-air partition coefficient (Log Koa), and the partition coefficient (Log P). Furthermore, the reliability of the data-centric QSAR model was validated through Tree-SHAP (SHapley Additive exPlanations), one of the explainable artificial intelligence (XAI) techniques. The proposed MissForest-based imputation enlarged the molecular structure dataset to approximately 2.5 times the size of the existing data. Based on the imputed molecular descriptor dataset, the data-centric QSAR model achieved prediction performance of approximately 73%, 76%, and 92% for Log BCF, Log Koa, and Log P, respectively. Lastly, Tree-SHAP analysis demonstrated that the data-centric QSAR model achieved high prediction performance by identifying the key molecular descriptors most strongly correlated with the toxicity indices. Therefore, the proposed data-centric XAI-based QSAR model can be extended to predict the toxicity of potential pollutants from emerging 3D printing chemicals as well as from chemical, semiconductor, and display processes.
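
The pipeline described above (tree-based imputation of missing descriptors, an ML regressor per endpoint, and Tree-SHAP for interpretation) can be sketched as follows. This is a minimal illustration, not the authors' code: the descriptor matrix and target are random placeholders, and MissForest is approximated here with scikit-learn's IterativeImputer wrapped around a random forest.

```python
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Placeholder descriptor matrix with ~30% missing values (stands in for real molecular descriptors).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 20)), columns=[f"desc_{i}" for i in range(20)])
y = X["desc_0"] * 2 - X["desc_3"] + rng.normal(scale=0.5, size=500)  # placeholder endpoint, e.g. Log BCF
X = X.mask(rng.uniform(size=X.shape) < 0.3)

# MissForest-style imputation: iterative imputation with a random forest as the per-column regressor.
imputer = IterativeImputer(estimator=RandomForestRegressor(n_estimators=100, random_state=0),
                           max_iter=10, random_state=0)
X_imp = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)

# One of the four learners mentioned in the abstract (XGBoost), fit on the imputed descriptors.
X_tr, X_te, y_tr, y_te = train_test_split(X_imp, y, test_size=0.2, random_state=0)
model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("R^2:", r2_score(y_te, model.predict(X_te)))

# Tree-SHAP: rank descriptors by their contribution to the predicted endpoint.
shap_values = shap.TreeExplainer(model).shap_values(X_te)
importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X_imp.columns).sort_values(ascending=False)
print(importance.head())
```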

Explainable Artificial Intelligence (XAI) Surrogate Models for Chemical Process Design and Analysis (화학 공정 설계 및 분석을 위한 설명 가능한 인공지능 대안 모델)

  • Yuna Ko;Jonggeol Na
    • Korean Chemical Engineering Research / v.61 no.4 / pp.542-549 / 2023
  • With the growing interest in surrogate modeling, there has been continuous research aimed at simulating nonlinear chemical processes using data-driven machine learning. However, the opaque nature of machine learning models, which limits their interpretability, poses a challenge for their practical application in industry. Therefore, this study aims to analyze chemical processes using Explainable Artificial Intelligence (XAI), a concept that improves interpretability while ensuring model accuracy. While conventional sensitivity analysis of chemical processes has been limited to calculating and ranking the sensitivity indices of variables, we propose a methodology that utilizes XAI not only to perform global and local sensitivity analysis, but also to examine the interactions among variables and gain physical insights from the data. For the ammonia synthesis process, the target process of the case study, we set the temperature of the preheater feeding the first reactor and the split ratio of the cold shot to the three reactors as process variables. By integrating Matlab and Aspen Plus, we obtained data on ammonia production and the maximum temperatures of the three reactors while systematically varying the process variables. We then trained tree-based models and performed sensitivity analysis on the most accurate model using the SHAP technique, one of the XAI methods. The global sensitivity analysis showed that the preheater temperature had the greatest effect, and the local sensitivity analysis provided insights for defining the ranges of the process variables needed to improve productivity and prevent overheating. By constructing surrogate models for chemical processes and using XAI for sensitivity analysis, this work contributes to providing both quantitative and qualitative feedback for process optimization.
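
As a rough illustration of the SHAP-based global and local sensitivity analysis described above, the sketch below trains a tree-based surrogate on synthetic process data (a stand-in for the Matlab/Aspen Plus samples; the variable names and the response function are hypothetical) and ranks the process variables globally before explaining a single operating point locally.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the simulation data: preheater temperature and three cold-shot split ratios
# (hypothetical names) mapped to a placeholder "ammonia production" response.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "preheater_T": rng.uniform(200, 350, 1000),   # degC
    "split_1": rng.uniform(0.1, 0.5, 1000),
    "split_2": rng.uniform(0.1, 0.5, 1000),
    "split_3": rng.uniform(0.1, 0.5, 1000),
})
y = 0.02 * X["preheater_T"] - 5 * (X["split_1"] - 0.3) ** 2 + 0.5 * X["split_2"] + rng.normal(0, 0.1, 1000)

# Tree-based surrogate of the process response.
surrogate = GradientBoostingRegressor(random_state=0).fit(X, y)

# Global sensitivity: mean absolute SHAP value per process variable.
explainer = shap.TreeExplainer(surrogate)
shap_values = explainer.shap_values(X)
print(pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns).sort_values(ascending=False))

# Local sensitivity: contribution of each variable at one operating point.
point = X.iloc[[0]]
print(dict(zip(X.columns, explainer.shap_values(point)[0])))
```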

Development of Web-based Construction-Site-Safety-Management Platform Using Artificial Intelligence (인공지능을 이용한 웹기반 건축현장 안전관리 플랫폼 개발)

  • Siuk Kim;Eunseok Kim;Cheekyeong Kim
    • Journal of the Computational Structural Engineering Institute of Korea / v.37 no.2 / pp.77-84 / 2024
  • In the era of the fourth industrial revolution, the construction industry is transitioning from traditional methods to digital processes. This shift has been challenging because the industry relies on diverse processes and extensive human resources, leading to gradual adoption of digital technologies through trial and error. One critical area of focus is safety management at construction sites, which is the subject of significant research and efforts toward digitization and automation. Despite these initiatives, recent statistics indicate a persistent occurrence of accidents and fatalities at construction sites. To address this issue, this study utilizes large language model (LLM)-based artificial intelligence to analyze big data from a construction safety-management information network. The findings are integrated into on-site models, which incorporate real-time updates from detailed design models and are enriched with location information and spatial characteristics, for enhanced safety management. This research aims to develop a big-data-driven safety-management platform that bolsters facility and worker safety by digitizing construction-site safety data. The platform can help prevent construction accidents and support effective safety education.

Analysis of Research Trends in Korean English Education Journals Using Topic Modeling (토픽 모델링을 활용한 한국 영어교육 학술지에 나타난 연구동향 분석)

  • Won, Yongkook;Kim, Youngwoo
    • The Journal of the Korea Contents Association / v.21 no.4 / pp.50-59 / 2021
  • To understand the research trends of English education in Korea over the 20 years from 2000 to 2019, 12 major Korean academic journals in the field of English education were selected, and bibliographic information on 7,329 articles published in these journals was collected and analyzed. The total number of articles increased from the 2000s to the first half of the 2010s, but decreased somewhat in the late 2010s, and the number of publications per journal became similar. These results show that, in terms of quantity, the overall influence of English education journals decreased and then leveled off. Next, 34 topics were extracted by applying latent Dirichlet allocation (LDA) topic modeling to the English abstracts of the articles. Teacher, word, culture/media, and grammar emerged as heavily studied topics. Topics such as word, vocabulary, and testing and evaluation were characterized by distinctive keywords, and various topics related to learner factors emerged as topics of interest in English education research. The topics were then analyzed to determine which were rising or falling in frequency. Qualitative research, vocabulary, learner factors, and testing were found to be rising topics, while falling topics included CALL, language, teaching, and grammar. This change shows that research interests in the field of English education are shifting from static research topics to data-driven and dynamic research topics.
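
A minimal sketch of the LDA workflow described above, using scikit-learn on a placeholder list of abstracts (the study used 7,329 English abstracts and extracted 34 topics; the toy corpus, topic count, and abstract texts below are illustrative only):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder corpus; in the study this would be the English abstracts of 7,329 articles.
abstracts = [
    "vocabulary learning strategies of Korean EFL learners",
    "the effect of corrective feedback on grammar acquisition",
    "teacher beliefs about communicative language teaching",
    "validity of a computer-based English listening test",
]

# Document-term matrix with basic stop-word removal.
vectorizer = CountVectorizer(stop_words="english", max_df=0.95, min_df=1)
dtm = vectorizer.fit_transform(abstracts)

# LDA topic model; the paper extracted 34 topics, a handful suffices for this toy corpus.
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(dtm)

# Top words per topic, the basis for labeling topics such as "teacher", "word", or "grammar".
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```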

A Study on Consumer Value Perception through Social Big Data Analysis: Focus on Smartphone Brands (소셜 빅데이터 분석을 통한 소비자 가치 인식 연구: 신규 스마트폰을 중심으로)

  • Kim, Hyong-Jung;Kim, Jin-Hwa
    • The Journal of Society for e-Business Studies / v.22 no.1 / pp.123-146 / 2017
  • The information that consumers share on SNS (Social Networking Services) has a great influence on consumers' purchases. It is therefore necessary to pay attention to new research methodologies and advertising strategies that use social big data. In this context, the purpose of this study is to quantitatively analyze customer value through social big data. We analyzed the value structure of consumers for three smartphone brands through text mining and positive/negative image analysis. As a result of the analysis, the emotional (sensibility) and rational (rationality) aspects of customer value could be distinguished for each brand. For the Galaxy S7 and iPhone 6S, emotional aspects were important before launch, but rational aspects became important after the release date. For the LG G5, emotional aspects were important both before and after launch. Based on the analyzed consumer values, we propose two core advertising strategies. For the Galaxy S7, the advertising strategy needs to emphasize rational aspects such as product attributes and differentiated functions. For the LG G5, the advertising strategy should consider emotional aspects such as the happiness, excitement, pleasure, and fun felt when using the product. Advertising strategies have typically been driven by intuition or experience; it is therefore important to develop them by analyzing consumer value through social big data analysis, and this study provides a useful standard for actual advertising strategy through such analysis.
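
The positive/negative image analysis described above amounts to polarity scoring of SNS posts per brand and period. The sketch below is a minimal lexicon-based stand-in: the posts, the tiny lexicon, and the brand/period labels are all hypothetical, and the study's actual text-mining pipeline is not specified here.

```python
import pandas as pd

# Hypothetical SNS posts tagged with brand and pre/post-launch period.
posts = pd.DataFrame({
    "brand": ["Galaxy S7", "Galaxy S7", "iPhone 6S", "LG G5"],
    "period": ["pre-launch", "post-launch", "post-launch", "pre-launch"],
    "text": [
        "so excited about the design, looks fun",
        "battery life is solid and the camera spec is great",
        "price is disappointing but performance is great",
        "the modular idea is exciting and fun",
    ],
})

# Tiny illustrative polarity lexicon (a real study would use a full sentiment dictionary).
positive = {"excited", "fun", "great", "solid", "exciting"}
negative = {"disappointing", "slow", "overheats"}

def polarity(text: str) -> int:
    tokens = text.lower().split()
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)

posts["polarity"] = posts["text"].map(polarity)
# Average polarity per brand and period, the kind of aggregate behind a pre/post-launch comparison.
print(posts.groupby(["brand", "period"])["polarity"].mean())
```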

Causal inference from nonrandomized data: key concepts and recent trends (비실험 자료로부터의 인과 추론: 핵심 개념과 최근 동향)

  • Choi, Young-Geun;Yu, Donghyeon
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.173-185 / 2019
  • Causal questions are prevalent in scientific research, for example, how effective a treatment was for preventing an infectious disease, how much a policy increased utility, or which advertisement would give the highest click rate for a given customer. Causal inference theory in statistics interprets such questions as inferring the effect of a given intervention (treatment or policy) in the data generating process. Causal inference has been used in medicine, public health, and economics, and it has recently received attention as a tool for data-driven decision making. Many recent datasets are observational rather than experimental, which makes causal inference more complex. This review introduces key concepts and recent trends of statistical causal inference in observational studies. We first introduce the Neyman-Rubin potential outcome framework to formalize causal questions in terms of average treatment effects, and we discuss popular methods for estimating treatment effects, such as propensity score approaches and regression approaches. For recent trends, we briefly discuss (1) conditional (heterogeneous) treatment effects and machine learning-based approaches, (2) the curse of dimensionality in treatment effect estimation and its remedies, and (3) Pearl's structural causal model, which handles more complex causal relationships, and its connection to the Neyman-Rubin potential outcome model.
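
For reference, the core quantities in the potential-outcome framework reviewed above can be written compactly as follows. These are the standard definitions (under the usual unconfoundedness and overlap assumptions), not formulas specific to this paper.

```latex
% Average treatment effect under the Neyman-Rubin potential outcome framework
\tau_{\mathrm{ATE}} = \mathbb{E}\!\left[ Y(1) - Y(0) \right]

% Propensity score and the inverse-probability-weighting (IPW) estimator
e(x) = \Pr\left( T = 1 \mid X = x \right), \qquad
\hat{\tau}_{\mathrm{IPW}} = \frac{1}{n} \sum_{i=1}^{n}
\left[ \frac{T_i Y_i}{\hat{e}(X_i)} - \frac{(1 - T_i)\, Y_i}{1 - \hat{e}(X_i)} \right]
```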

Long-term runoff simulation using rainfall LSTM-MLP artificial neural network ensemble (LSTM - MLP 인공신경망 앙상블을 이용한 장기 강우유출모의)

  • An, Sungwook;Kang, Dongho;Sung, Janghyun;Kim, Byungsik
    • Journal of Korea Water Resources Association / v.57 no.2 / pp.127-137 / 2024
  • Physical models, which are often used for water resource management, are difficult to build and operate because of the input data they require, and they may reflect the subjective judgment of the user. In recent years, research using data-driven models such as machine learning has been actively conducted to compensate for these problems in the field of water resources, and in this study an artificial neural network was used to simulate long-term rainfall runoff in the Osipcheon watershed in Samcheok-si, Gangwon-do. For this purpose, three input data groups (meteorological observations, daily precipitation and potential evapotranspiration, and daily precipitation - potential evapotranspiration) were constructed from meteorological data, and the results of training an LSTM (Long Short-Term Memory) artificial neural network model were compared and analyzed. The performance of LSTM-Model 1, which used only meteorological observations, was the highest, and six LSTM-MLP ensemble models combining the LSTM with MLP artificial neural networks were built to simulate long-term runoff in the Osipcheon watershed. A comparison between the LSTM and LSTM-MLP models showed generally similar results, but the MAE, MSE, and RMSE of the LSTM-MLP were lower than those of the LSTM, especially in the low-flow range. Since the LSTM-MLP results show an improvement in the low-flow range, it is expected that, beyond the LSTM-MLP model, various ensemble models such as CNN-based ones can be used in place of physical models to simulate runoff and derive flow duration curves in large basins, where building and running physical models takes a long time, and in ungauged basins that lack input data.
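
A minimal sketch of an LSTM-MLP ensemble of the kind described above, written with Keras on placeholder daily precipitation/evapotranspiration sequences. The sequence length, layer sizes, and the way the two networks are combined are not specified in the abstract, so everything below is an assumption rather than the authors' configuration.

```python
import numpy as np
from tensorflow.keras import layers, models

# Placeholder data: 1000 samples of 30-day sequences with 2 features
# (e.g., daily precipitation and potential evapotranspiration), and daily runoff as the target.
rng = np.random.default_rng(0)
X_seq = rng.random((1000, 30, 2)).astype("float32")
y = X_seq[:, :, 0].sum(axis=1, keepdims=True) * 0.1  # synthetic runoff response

# Stage 1: LSTM maps the meteorological sequence to a first runoff estimate.
lstm = models.Sequential([
    layers.Input(shape=(30, 2)),
    layers.LSTM(64),
    layers.Dense(1),
])
lstm.compile(optimizer="adam", loss="mse")
lstm.fit(X_seq, y, epochs=20, batch_size=32, verbose=0)
lstm_pred = lstm.predict(X_seq, verbose=0)

# Stage 2: MLP refines the LSTM estimate (one simple way to form the ensemble),
# combining the LSTM output with a summary feature of the recent forcing.
recent_rain = X_seq[:, -7:, 0].sum(axis=1, keepdims=True)
mlp_in = np.concatenate([lstm_pred, recent_rain], axis=1)
mlp = models.Sequential([
    layers.Input(shape=(2,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),
])
mlp.compile(optimizer="adam", loss="mse")
mlp.fit(mlp_in, y, epochs=20, batch_size=32, verbose=0)
final_pred = mlp.predict(mlp_in, verbose=0)
```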

Behavior-Structure-Evolution Evaluation Model (BSEM) for Open Source Software Service (공개소프트웨어 서비스 평가모델(BSEM)에 관한 개념적 연구)

  • Lee, Seung-Chang;Park, Hoon-Sung;Suh, Eung-Kyo
    • Journal of Distribution Science / v.13 no.1 / pp.57-70 / 2015
  • Purpose - Open source software has high utilization in most of the server market, and its use is a global trend. In particular, open source software development for Internet infrastructure and platform software has increased rapidly. Since 2003, the Korean government has published open source software promotion and supply-promotion policies. Owing to the dynamism of the open source software market, a lack of relevant expertise, and market shifts driven by changes in the underlying technology, however, adoption has proceeded slowly. Therefore, this study proposes an assessment model for the services provided by open source software service companies. In this study, the service level of open source software companies is classified into an enterprise-level assessment area, a service-level assessment area, and a service area. The assessment model is developed from a field-driven evaluation index and a proposed evaluation framework; evaluation procedures and methods are defined to achieve the research objective, and an impartial evaluation model is implemented after pilot testing and validation. Research design, data, and methodology - This study adopted an iterative development model to accommodate various requirements, and it presented and validated an assessment model that addresses the situation of open source software service companies: Phase 1, theoretical background and literature review; Phase 2, research on an evaluation index for open source software service companies; Phase 3, index improvement through expert validation; Phase 4, finalization of an evaluation model reflecting additional requirements. Based on open source software adoption case studies and the latest technology trends, we developed a definition of the open source software service concept and a classification of public service activities for open source software service companies. We also presented service-level measures for open source software service companies by developing a service-level factor-analysis assessment. The Behavior-Structure-Evolution Evaluation Model (BSEM) proposed in this study consists of a rating methodology for calculating the level that can be granted through assessment and evaluation of an enterprise-level data model. An open source software service company's service comprises the service area and service domain, while the technology acceptance model comprises the service area, technical domain, technical sub-domain, and open source software name. Finally, the evaluation index comprises the evaluation group, category, and items. Results - To use the open source software service level evaluation model, common service providers need to standardize service quality so that surveys and expert workshops performed at open source software service companies can establish evaluation criteria according to their qualitative differences. Conclusion - Based on this evaluation model's systematic evaluation process and monitoring, a company adopting open source software services can acquire reliable information for open source software adoption. Inducing the growth of open source software service companies will facilitate the development of the open source software industry.

Advances, Limitations, and Future Applications of Aerospace and Geospatial Technologies for Apple IPM (사과 IPM을 위한 항공 및 지리정보 기술의 진보, 제한 및 미래 응용)

  • Park, Yong-Lak;Cho, Jum Rae;Choi, Kyung-Hee;Kim, Hyun Ran;Kim, Ji Won;Kim, Se Jin;Lee, Dong-Hyuk;Park, Chang-Gyu;Cho, Young Sik
    • Korean journal of applied entomology / v.60 no.1 / pp.135-143 / 2021
  • Aerospace and geospatial technologies have become more accessible to researchers and agricultural practitioners, and they can play a pivotal role in transforming current pest management practices in agriculture and forestry. During the past 20 years, technologies including satellites, manned and unmanned aircraft, spectral sensors, information systems, and autonomous field equipment have been used to detect pests and apply control measures site-specifically. Despite the availability of aerospace and geospatial technologies, along with big-data-driven artificial intelligence, applications of such technologies to apple IPM have not yet been realized. Using a case study conducted at the Korea Apple Research Institute, this article discusses the advances and limitations of current aerospace and geospatial technologies that can be used to improve apple IPM.

A Study on the Application of Suitable Urban Regeneration Project Types Reflecting the Spatial Characteristics of Urban Declining Areas (도시 쇠퇴지역 공간 특성을 반영한 적합 도시재생 사업유형 적용방안 연구)

  • CHO, Don-Cherl;SHIN, Dong-Bin
    • Journal of the Korean Association of Geographic Information Studies / v.24 no.4 / pp.148-163 / 2021
  • The diversification of the New Deal urban regeneration projects, launched in 2017 under the "Special Act on Urban Regeneration Activation and Support", has increased the demand for accurate data-driven diagnosis and project-type forecasting. This research was therefore conducted to develop an application model able to identify the most appropriate New Deal project type for each "eup", "myeon", and "dong" across the country. Data for developing the application model were collected through the Statistical Geographic Information Service (SGIS) and the 'Urban Regeneration Comprehensive Information Open System' of the Urban Regeneration Information System, and the dataset for the analysis model was constructed through data pre-processing. Four models were derived, and simulations were performed using polynomial regression analysis and multinomial logistic regression analysis to assign the appropriate New Deal project type. The applicability and validity of the four models were verified through a comparative analysis of the spatial distribution of previously selected New Deal project sites in Seoul under each model, and the results showed that the DI-54 model had the highest concordance rate.
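
As a rough illustration of the multinomial logistic regression step mentioned above, the sketch below fits a multi-class model that maps decline indicators to a project type on placeholder data. The indicator names, the project-type labels, and anything about the DI-54 model specification are not given in the abstract, so everything in the example is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder decline indicators per eup/myeon/dong (hypothetical names and values).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "population_change": rng.normal(-2, 3, 600),    # % over 5 years
    "business_change": rng.normal(-1, 2, 600),
    "old_building_ratio": rng.uniform(0, 80, 600),  # %
})
# Placeholder New Deal project-type labels (the actual scheme defines several official types).
y = rng.choice(["economy", "central-urban", "neighborhood"], size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Multinomial logistic regression: one set of coefficients per project type.
clf = LogisticRegression(multi_class="multinomial", solver="lbfgs", max_iter=1000)
clf.fit(X_tr, y_tr)

print("test accuracy:", clf.score(X_te, y_te))
print(pd.DataFrame(clf.coef_, index=clf.classes_, columns=X.columns))
```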