• Title/Summary/Keyword: Algorithms


Automated Data Extraction from Unstructured Geotechnical Report based on AI and Text-mining Techniques (AI 및 텍스트 마이닝 기법을 활용한 지반조사보고서 데이터 추출 자동화)

  • Park, Jimin;Seo, Wanhyuk;Seo, Dong-Hee;Yun, Tae-Sup
    • Journal of the Korean Geotechnical Society
    • /
    • v.40 no.4
    • /
    • pp.69-79
    • /
    • 2024
  • Field geotechnical data are obtained from various field and laboratory tests and are documented in geotechnical investigation reports. For efficient design and construction, digitizing these geotechnical parameters is essential. However, current practices involve manual data entry, which is time-consuming, labor-intensive, and prone to errors. Thus, this study proposes an automatic data extraction method from geotechnical investigation reports using image-based deep learning models and text-mining techniques. A deep-learning-based page classification model and a text-searching algorithm were employed to classify geotechnical investigation report pages with 100% accuracy. Computer vision algorithms were utilized to identify valid data regions within report pages, and text analysis was used to match and extract the corresponding geotechnical data. The proposed model was validated using a dataset of 205 geotechnical investigation reports, achieving an average data extraction accuracy of 93.0%. Finally, a user-interface-based program was developed to enhance the practical application of the extraction model. It allowed users to upload PDF files of geotechnical investigation reports, automatically analyze these reports, and extract and edit data. This approach is expected to improve the efficiency and accuracy of digitizing geotechnical investigation reports and building geotechnical databases.
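The text-matching step described above can be sketched with simple regular expressions. This is a minimal illustration only: the field names, page layout, and values below are hypothetical, not from the paper's dataset, and the deep-learning page classification and computer-vision region detection stages are omitted.

```python
import re

# Hypothetical text extracted from one report page (illustrative values).
page_text = """
Boring No. : BH-1
Depth (m) : 12.5
SPT N-value : 23/30
Water content (%) : 31.2
"""

# Map each target geotechnical field to a regex capturing its value.
FIELD_PATTERNS = {
    "boring_id": re.compile(r"Boring No\.\s*:\s*(\S+)"),
    "depth_m": re.compile(r"Depth \(m\)\s*:\s*([\d.]+)"),
    "spt_n": re.compile(r"SPT N-value\s*:\s*(\S+)"),
    "water_content_pct": re.compile(r"Water content \(%\)\s*:\s*([\d.]+)"),
}

def extract_fields(text: str) -> dict:
    """Match each pattern against the page text; None if not found."""
    out = {}
    for name, pat in FIELD_PATTERNS.items():
        m = pat.search(text)
        out[name] = m.group(1) if m else None
    return out

print(extract_fields(page_text))
```

In practice the matching runs only on pages the classifier has already labeled as data pages, and only inside the regions the vision step has identified.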

Ethical and Legal Implications of AI-based Human Resources Management (인공지능(AI) 기반 인사관리의 윤리적·법적 영향)

  • Jungwoo Lee;Jungsoo Lee;Ji Hun kwon;Minyi Cha;Kyu Tae Kim
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.25 no.2
    • /
    • pp.100-112
    • /
    • 2024
  • This study investigates the ethical and legal implications of utilizing artificial intelligence (AI) in human resource management, with a particular focus on AI interviews in the recruitment process. AI, defined as the capability of computer programs to perform tasks associated with human intelligence such as reasoning, learning, and adapting, is increasingly being integrated into HR practices. The deployment of AI in recruitment, specifically through AI-driven interviews, promises efficiency and objectivity but also raises significant ethical and legal concerns. These concerns include potential biases in AI algorithms, transparency in AI decision-making processes, data privacy issues, and compliance with existing labor laws and regulations. By analyzing case studies and reviewing relevant literature, this paper aims to provide a comprehensive understanding of these challenges and propose recommendations for ensuring ethical and legal compliance in AI-based HR practices. The findings suggest that while AI can enhance recruitment efficiency, it is imperative to establish robust ethical guidelines and legal frameworks to mitigate risks and ensure fair and transparent hiring practices.

Improving minority prediction performance of support vector machine for imbalanced text data via feature selection and SMOTE (단어선택과 SMOTE 알고리즘을 이용한 불균형 텍스트 데이터의 소수 범주 예측성능 향상 기법)

  • Jongchan Kim;Seong Jun Chang;Won Son
    • The Korean Journal of Applied Statistics
    • /
    • v.37 no.4
    • /
    • pp.395-410
    • /
    • 2024
  • Text data is usually made up of a wide variety of unique words; even in standard text data it is common to find tens of thousands of distinct words. In text data analysis, each unique word is typically treated as a variable, so text data can be regarded as a dataset with a very large number of variables. At the same time, text classification often suffers from class label imbalance, and under substantial imbalance the performance of conventional classification models can be severely degraded. To improve the classification performance of support vector machines (SVM) on imbalanced data, algorithms such as the Synthetic Minority Over-sampling Technique (SMOTE) can be used. SMOTE synthetically generates new observations for the minority class based on the k-nearest neighbors (kNN) algorithm; however, in datasets with many variables, such as text data, errors may accumulate and degrade the kNN step. In this study, we propose a method for enhancing minority-class prediction performance on imbalanced text data: variable selection is applied first, and new synthetic observations are generated in the reduced space, thereby improving the overall classification performance of SVM.
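The proposed order of operations, selecting variables first and then oversampling in the reduced space, can be sketched as follows. This is a toy reconstruction with synthetic data and a hand-rolled minimal SMOTE, not the paper's implementation; the feature-selection scorer and all parameters are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

# Synthetic high-dimensional, imbalanced data standing in for a
# document-term matrix (real data would be word counts or TF-IDF).
X, y = make_classification(n_samples=500, n_features=200, n_informative=10,
                           weights=[0.95, 0.05], random_state=0)

# Step 1: variable (word) selection BEFORE oversampling, so that SMOTE's
# kNN step runs in the reduced space, as the paper proposes.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)

def smote(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE: interpolate between each sampled minority point and
    one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    base = rng.integers(0, len(X_min), n_new)
    neigh = idx[base, rng.integers(1, k + 1, n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

# Step 2: balance the classes in the reduced space, then fit the SVM.
X_min = X_sel[y == 1]
X_new = smote(X_min, n_new=int((y == 0).sum() - (y == 1).sum()))
X_bal = np.vstack([X_sel, X_new])
y_bal = np.concatenate([y, np.ones(len(X_new), dtype=int)])
clf = SVC(kernel="rbf").fit(X_bal, y_bal)
print("balanced class counts:", np.bincount(y_bal))
```

Running SMOTE after dimensionality reduction keeps the neighbor search meaningful: in the raw tens-of-thousands-of-words space, kNN distances concentrate and interpolated points become less representative of the minority class.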

A Study on Dementia Prediction Models and Commercial Utilization Strategies Using Machine Learning Techniques: Based on Sleep and Activity Data from Wearable Devices (머신러닝 기법을 활용한 치매 예측 모델과 상업적 활용 전략: 웨어러블 기기의 수면 및 활동 데이터를 기반으로)

  • Youngeun Jo;Jongpil Yu;Joongan Kim
    • Information Systems Review
    • /
    • v.26 no.2
    • /
    • pp.137-153
    • /
    • 2024
  • This study addresses early diagnosis and management of dementia, whose prevalence is rising in aging societies, and proposes commercial utilization strategies that leverage digital healthcare technologies, particularly lifelog data collected from wearable devices. By introducing new approaches to dementia prevention and management, it seeks to contribute to the field of dementia prediction and prevention. The research used 12,184 lifelog records (sleep and activity data) together with dementia diagnosis data, based on medical pathological diagnoses, collected from 174 individuals aged 60 to 80. A multidimensional dataset including sleep and activity data was standardized, and various machine learning algorithms were compared; the random forest model achieved the highest ROC-AUC score, indicating the best performance. An ablation test was then conducted to evaluate how excluding sleep- and activity-related variables affected the model's predictive power, confirming that regular sleep and activity have a significant influence on dementia prevention. Lastly, by exploring the commercial potential of the developed model, the study proposes new directions for the commercial spread of dementia prevention systems.
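The model comparison and ablation procedure described above can be sketched as below. The data are synthetic stand-ins for the standardized lifelog features, and the "sleep" column group is an arbitrary illustrative split, not the study's actual variable set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for standardized sleep/activity lifelog features
# (the real study used 12,184 records from 174 individuals).
X, y = make_classification(n_samples=1000, n_features=12, n_informative=6,
                           random_state=42)

rf = RandomForestClassifier(n_estimators=200, random_state=42)
auc_full = cross_val_score(rf, X, y, cv=5, scoring="roc_auc").mean()

# Ablation test: drop a feature group (columns 0-5 standing in for the
# "sleep" variables) and compare cross-validated ROC-AUC.
auc_ablated = cross_val_score(rf, X[:, 6:], y, cv=5, scoring="roc_auc").mean()

print(f"full model ROC-AUC:    {auc_full:.3f}")
print(f"ablated model ROC-AUC: {auc_ablated:.3f}")
```

A drop in ROC-AUC after removing a feature group is the evidence the ablation test looks for: it quantifies how much that group contributes to predictive power.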

Comparison of Error Rate and Prediction of Compression Index of Clay to Machine Learning Models using Orange Mining (오렌지마이닝을 활용한 기계학습 모델별 점토 압축지수의 오차율 및 예측 비교)

  • Yoo-Jae Woong;Woo-Young Kim;Tae-Hyung Kim
    • Journal of the Korean Geosynthetics Society
    • /
    • v.23 no.3
    • /
    • pp.15-22
    • /
    • 2024
  • Predicting ground settlement during soft-ground improvement and structure construction is a crucial task. Numerous studies have been conducted, and many prediction equations have been proposed for estimating settlement, which can be calculated from the compression index of clay. In this study, data on water content, void ratio, liquid limit, plastic limit, and compression index were collected from the Busan New Port area to construct a dataset, and correlation analysis was conducted on the collected data. Machine learning algorithms including Random Forest, Neural Network, Linear Regression, AdaBoost, and Gradient Boosting were applied with the Orange mining program to propose compression index prediction models. The models were evaluated by comparing RMSE and MAPE values, which indicate error rates, and R2 values, which indicate model significance. Water content showed the highest correlation with the compression index, while the plastic limit showed a somewhat lower correlation than the other characteristics. Among the compared models, AdaBoost performed best, with the lowest error rate and the largest coefficient of determination.
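The same model comparison can be reproduced outside Orange with scikit-learn. The sketch below uses synthetic data in place of the Busan New Port measurements and omits the neural network; model settings are defaults, not the paper's configuration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (mean_absolute_percentage_error,
                             mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for (water content, void ratio, liquid limit,
# plastic limit) -> compression index.
X, y = make_regression(n_samples=300, n_features=4, noise=5.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=1),
    "AdaBoost": AdaBoostRegressor(random_state=1),
    "Gradient Boosting": GradientBoostingRegressor(random_state=1),
}

for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    # Note: MAPE is unstable when targets cross zero, as synthetic
    # targets do here; real compression indices are strictly positive.
    mape = mean_absolute_percentage_error(y_te, pred)
    r2 = r2_score(y_te, pred)
    print(f"{name:18s} RMSE={rmse:8.2f} MAPE={mape:8.3f} R2={r2:.3f}")
```

Comparing RMSE/MAPE (lower is better) against R2 (higher is better) across models is exactly the ranking exercise the abstract describes.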

Advancing Process Plant Design: A Framework for Design Automation Using Generative Neural Network Models

  • Minhyuk JUNG;Jaemook CHOI;Seonu JOO;Wonseok CHOI;Hwikyung Chun
    • International conference on construction engineering and project management
    • /
    • 2024.07a
    • /
    • pp.1285-1285
    • /
    • 2024
  • In process plant construction, the implementation of design automation technologies is pivotal in reducing the timeframes associated with the design phase and in enabling the generation and evaluation of a variety of design alternatives, thereby facilitating the identification of optimal solutions. These technologies can play a crucial role in ensuring the successful delivery of projects. Previous research in the domain of design automation has primarily focused on parametric design in architectural contexts and on the automation of equipment layout and pipe routing within plant engineering, predominantly employing rule-based algorithms. Nevertheless, these studies are constrained by the limited flexibility of their models, which narrows the scope for generating alternative solutions and complicates the process of exploring comprehensive solutions using nonlinear optimization techniques as the number of design and engineering parameters increases. This research introduces a framework for automating plant design through the use of generative neural network models to overcome these challenges. The framework is applicable to the layout problems of process plants, covering the equipment necessary for production processes and the facilities for essential resources and their interconnections. The development of the proposed Neural-network (NN) based Generative Design Model unfolds in four stages: (a) Rule-based Model Development: This initial phase involves the development of rule-based models for layout generation and evaluation, where the generation model produces layouts based on predefined parameters, and the evaluation model assesses these layouts using various performance metrics. (b) Neural Network Model Development: This phase transitions towards neural network models, establishing a NN-based layout generation model utilizing Generative Adversarial Network (GAN)-based methods and a NN-based layout evaluation model. 
(c) Model Optimization: The third phase is dedicated to optimizing the models through Bayesian Optimization, aiming to extend the exploration space beyond the limitations of rule-based models. (d) Inverse Design Model Development: The concluding phase employs an inverse design method to merge the generative and evaluative networks, resulting in a model that outputs layout designs to meet specific performance objectives. This study aims to augment the efficiency and effectiveness of the design process in process plant construction, transcending the limitations of conventional rule-based approaches and contributing to the achievement of successful project outcomes.
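Stage (a), the rule-based generation and evaluation pair that seeds the later neural models, can be illustrated with a toy sketch. The unit names, connection list, grid rule, and Manhattan-distance score below are all hypothetical placeholders, not the framework's actual rules or metrics.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical equipment units and required interconnections.
UNITS = ["reactor", "separator", "compressor", "storage"]
CONNECTIONS = [("reactor", "separator"), ("separator", "compressor"),
               ("compressor", "storage")]

def generate_layout(grid=10):
    """Rule-based generator: each unit gets a distinct random cell
    on a grid x grid plot."""
    cells = rng.choice(grid * grid, size=len(UNITS), replace=False)
    return {u: np.array(divmod(int(c), grid)) for u, c in zip(UNITS, cells)}

def evaluate_layout(layout):
    """Evaluator: total Manhattan pipe-run length over required
    connections (lower is better)."""
    return int(sum(abs(layout[a] - layout[b]).sum() for a, b in CONNECTIONS))

# Generate many candidate layouts and keep the best-scoring one.
layouts = [generate_layout() for _ in range(200)]
best = min(layouts, key=evaluate_layout)
print("best score:", evaluate_layout(best))
```

In the paper's framework, pairs of (layout, score) produced this way become training data for the GAN-based generator and the NN-based evaluator in stage (b).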

Development of New 4D Phantom Model in Respiratory Gated Volumetric Modulated Arc Therapy for Lung SBRT (폐암 SBRT에서 호흡동조 VMAT의 정확성 분석을 위한 새로운 4D 팬텀 모델 개발)

  • Yoon, KyoungJun;Kwak, JungWon;Cho, ByungChul;Song, SiYeol;Lee, SangWook;Ahn, SeungDo;Nam, SangHee
    • Progress in Medical Physics
    • /
    • v.25 no.2
    • /
    • pp.100-109
    • /
    • 2014
  • In stereotactic body radiotherapy (SBRT), the treatment site must be accurately located despite the patient's respiratory motion, and many studies have addressed this topic. In this work, a new verification method that simulates the real respiratory motion of heterogeneous treatment regions was proposed to investigate the accuracy of lung SBRT delivered with Volumetric Modulated Arc Therapy (VMAT). Based on CT images of lung cancer patients, lung phantoms were fabricated with a 3D printer to fit into the QUASAR™ respiratory motion phantom. Each phantom was bisected so that 2D dose distributions could be measured with inserted EBT3 film. To verify dose calculation accuracy under heterogeneous conditions, a homogeneous plastic phantom was also used. Two dose calculation algorithms, the Analytical Anisotropic Algorithm (AAA) and AcurosXB (AXB), were applied in the planning process. To evaluate treatment accuracy under respiratory motion, we analyzed the gamma index between the plan dose and the film dose measured under various motion conditions: a static target and a moving target with or without gating. The CT number of the GTV region was 78 HU for the real patient and 92 HU for the homemade lung phantom. With the AAA algorithm, the gamma pass rates (3%/3 mm criteria) in the heterogeneous lung phantom were 88% for gated and 78% for non-gated delivery under respiratory motion, and 95% in the static case; for the homogeneous phantom, all cases exceeded 99%. With the AcurosXB algorithm, gamma pass rates above 98% were achieved for the heterogeneous phantom and above 99% for the homogeneous phantom. Because the respiratory amplitude was relatively small and the breathing pattern had a longer exhale phase than inhale, the 3%/3 mm gamma pass rates did not differ significantly across the motion conditions. In this study, a new 4D dose verification phantom model using patient-specific lung phantoms moving with real breathing patterns was successfully implemented. The model was also shown to enable verification of dose distributions delivered under more realistic conditions, as well as of dose calculation accuracy.
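The 3%/3 mm gamma analysis used throughout the study has a standard definition. Below is a minimal 1D global-gamma sketch on an illustrative Gaussian profile; the paper's film analysis is 2D, and the profiles here are invented for demonstration only.

```python
import numpy as np

def gamma_pass_rate(ref, meas, positions, dose_tol=0.03, dist_tol=3.0):
    """1D global gamma analysis (3%/3 mm by default): for each reference
    point, take the minimum gamma over all measured points, then report
    the fraction of reference points with gamma <= 1."""
    dmax = ref.max()
    gammas = []
    for xr, dr in zip(positions, ref):
        dd = (meas - dr) / (dose_tol * dmax)   # dose-difference term
        dx = (positions - xr) / dist_tol       # distance-to-agreement term (mm)
        gammas.append(np.sqrt(dd**2 + dx**2).min())
    return float(np.mean(np.array(gammas) <= 1.0))

# Illustrative Gaussian dose profile and a 1 mm shifted "measured" copy,
# standing in for plan dose vs. film dose.
x = np.arange(-50.0, 50.0, 1.0)                # positions in mm
plan = np.exp(-x**2 / (2 * 15.0**2))
film = np.exp(-(x - 1.0)**2 / (2 * 15.0**2))

print(f"gamma pass rate (3%/3 mm): {gamma_pass_rate(plan, film, x):.2%}")
```

A 1 mm shift passes comfortably under a 3 mm distance tolerance, which is why small respiratory amplitudes, as in the study, produce little difference in pass rates across motion conditions.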

Usability of Multiple Confocal SPECT SYSTEM in the Myocardial Perfusion SPECT Using $^{99m}Tc$ ($^{99m}Tc$을 이용한 심근 관류 SPECT에서 Multiple Confocal SPECT System의 유용성)

  • Shin, Chae-Ho;Pyo, Sung-Jai;Kim, Bong-Su;Cho, Yong-Gyi;Jo, Jin-Woo;Kim, Chang-Ho
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.15 no.2
    • /
    • pp.65-71
    • /
    • 2011
  • Purpose: The recently adopted multiple confocal SPECT system (hereinafter, IQ SPECT™) differs substantially from conventional myocardial perfusion SPECT in collimator design, image acquisition method, and image reconstruction method. This study compared the new equipment with the conventional one to design a protocol suited to IQ SPECT and to determine the characteristics and usefulness of IQ SPECT. Materials and Methods: 1. For the low-energy high-resolution (LEHR) collimator and the multiple confocal collimator, 37 MBq of 99mTc was placed in an acrylic dish and the sensitivity (cpm/μCi) was measured at distances of 5, 10, 20, 30, and 40 cm. 2. Based on the sensitivity results, an IQ SPECT protocol was designed following the conventional myocardial SPECT protocol; 278 kBq/mL, 7.4 kBq/mL, and 48 kBq/mL of 99mTc were injected into the myocardium, soft tissue, and liver sites of an anthropomorphic torso phantom, and myocardial perfusion SPECT was performed. 3. To compare full widths at half maximum (FWHM) after image reconstruction, FWHM (mm) was measured with only the algorithm changed, using a 99mTc line source: the filtered back projection (FBP) method used in conventional myocardial perfusion SPECT versus the 3D ordered-subsets expectation maximization (OSEM) method of IQ SPECT. Results: 1. The sensitivity values (cpm/μCi) of the IQ SPECT collimator were 302, 382, 655, 816, and 1,178, and those of the LEHR collimator were 204, 204, 202, 201, and 198, at distances of 5, 10, 20, 30, and 40 cm, respectively; the sensitivity difference between IQ SPECT and LEHR increased up to four-fold at 30 cm. 2. The myocardial perfusion SPECT protocol was designed according to the geometric characteristics of IQ SPECT based on the sensitivity results, and a phantom test of this protocol showed that examination time can be reduced to a quarter of the conventional time. 3. In the comparison of FWHMs between the FBP and 3D OSEM reconstructions of the SPECT test using the LEHR collimator, the FWHM was approximately twice as large with the 3D OSEM method. Conclusion: IQ SPECT uses the multiple confocal collimator for myocardial perfusion SPECT to enhance sensitivity, reduce examination time, and improve visual image quality through its myocardium-centered geometric acquisition and reconstruction methods. With these benefits, patients can be expected to receive more comfortable and more accurate examinations, and further study with additional clinical material is warranted.
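The FWHM comparison in Methods step 3 reduces to measuring the full width at half maximum of a line-source profile. A minimal sketch on a synthetic Gaussian profile, with an invented sigma rather than real detector data:

```python
import numpy as np

def fwhm(x, profile):
    """Full width at half maximum via linear interpolation of the two
    half-maximum crossings of a single-peak profile."""
    half = profile.max() / 2.0
    above = np.where(profile >= half)[0]
    i, j = above[0], above[-1]
    # Interpolate on the rising edge (xp must be increasing for np.interp)...
    left = np.interp(half, [profile[i - 1], profile[i]], [x[i - 1], x[i]])
    # ...and on the falling edge (reversed so dose is increasing).
    right = np.interp(half, [profile[j + 1], profile[j]], [x[j + 1], x[j]])
    return right - left

# Illustrative Gaussian line-source profile with sigma = 4 mm:
# expected FWHM = 2*sqrt(2*ln 2)*sigma ≈ 9.42 mm.
x = np.arange(-30.0, 30.0, 0.5)
profile = np.exp(-x**2 / (2 * 4.0**2))
print(f"FWHM ≈ {fwhm(x, profile):.2f} mm")
```

Applying the same measurement to profiles reconstructed with FBP versus 3D OSEM, as the study did, isolates the resolution effect of the reconstruction algorithm alone.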


Comparing Prediction Uncertainty Analysis Techniques of SWAT Simulated Streamflow Applied to Chungju Dam Watershed (충주댐 유역의 유출량에 대한 SWAT 모형의 예측 불확실성 분석 기법 비교)

  • Joh, Hyung-Kyung;Park, Jong-Yoon;Jang, Cheol-Hee;Kim, Seong-Joon
    • Journal of Korea Water Resources Association
    • /
    • v.45 no.9
    • /
    • pp.861-874
    • /
    • 2012
  • To ensure the applicability of the Soil and Water Assessment Tool (SWAT) model, careful calibration and uncertainty analysis are essential, and in recent years researchers have proposed various uncertainty analysis techniques for SWAT. To examine the differences and similarities among typical techniques, we applied three uncertainty analysis procedures included in the SWAT Calibration and Uncertainty Program (SWAT-CUP) to the Chungju Dam watershed (6,581.1 km²) of South Korea: Sequential Uncertainty Fitting algorithm ver. 2 (SUFI2), Generalized Likelihood Uncertainty Estimation (GLUE), and Parameter Solution (ParaSol). There was no significant difference in the objective function values between the SUFI2 and GLUE algorithms, but ParaSol produced the worst objective function values, and the 95PPU bands diverged considerably from one another. The p-factor and r-factor for streamflow differed by 0.02 to 0.79 and 0.03 to 0.52, respectively; in general, ParaSol yielded the lowest p-factor and r-factor and SUFI2 the highest. Therefore, for automatic calibration and uncertainty analysis of the SWAT model, we suggest calibration methods that take the p-factor and r-factor into account. The p-factor is the percentage of observations covered by the 95PPU (95 Percent Prediction Uncertainty) band, and the r-factor is the average thickness of the 95PPU band.
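The two diagnostics compared across SUFI2, GLUE, and ParaSol are simple to compute from a 95PPU band. The sketch below uses invented streamflow and band data, not SWAT-CUP output, and follows the common SWAT-CUP convention of normalizing the average band thickness by the standard deviation of the observations.

```python
import numpy as np

def p_factor(obs, lower, upper):
    """Fraction of observations falling inside the 95PPU band."""
    return float(np.mean((obs >= lower) & (obs <= upper)))

def r_factor(obs, lower, upper):
    """Average 95PPU band width divided by the std. dev. of observations."""
    return float(np.mean(upper - lower) / np.std(obs))

# Illustrative daily streamflow and a synthetic uncertainty band.
rng = np.random.default_rng(3)
obs = 50 + 10 * rng.standard_normal(365)          # "observed" daily flow
lower = obs - 15 + 5 * rng.standard_normal(365)   # synthetic band bounds
upper = obs + 15 + 5 * rng.standard_normal(365)

pf, rf = p_factor(obs, lower, upper), r_factor(obs, lower, upper)
print(f"p-factor={pf:.2f}, r-factor={rf:.2f}")
```

A good calibration seeks a high p-factor (most observations bracketed) with a low r-factor (a narrow band); trading these two off is exactly why the paper recommends considering both.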

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.187-204
    • /
    • 2016
  • Document classification based on emotional polarity has become a welcome emerging task owing to the great explosion of data on the Web. In the big data age, there are too many information sources to consult when making decisions. For example, when considering travel to a city, a person may search reviews through a search engine such as Google or social networking services (SNSs) such as blogs, Twitter, and Facebook; the emotional polarity of positive and negative reviews helps the user decide whether to make the trip. Sentiment analysis of customer reviews has become an important research topic as data-mining technology is widely applied to text mining of the Web. Sentiment analysis classifies documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs), and it is used to determine the attitude, position, and sensibility of people who write articles about various topics published on the Web. Regardless of polarity, emotional reviews are very helpful material for analyzing customers' opinions, and sentiment analysis, aided by automated text mining, makes it possible to understand instantly what customers really want: it extracts subjective information from text on the Web and determines the attitude or position of the writer toward a particular topic. In this study, we developed a model that selects hot topics from user posts on a Chinese online stock forum by using the k-means algorithm and self-organizing maps (SOM). In addition, we developed a detection model that predicts hot topics by using machine learning techniques such as logit, decision trees, and SVM. We employed sentiment analysis in developing the model: a sentiment value is calculated for each document by matching and classifying words against a polarity sentiment dictionary (positive or negative). The online stock forum is an attractive site because of its information about stock investment; users post numerous texts analyzing market movements in response to government policy announcements, market reports, reports from economic research institutes, and even rumors. We divided the forum's topics into 21 categories for sentiment analysis, and 144 topics were initially selected among these categories. The posts were crawled to build positive and negative text databases, and after preprocessing the text from March 2013 to February 2015 we ultimately obtained 21,141 posts on 88 topics. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on these data. We developed decision tree models for detecting hot topics with three algorithms, CHAID, CART, and C4.5; the CHAID results were subpar compared to the others. We also employed SVM, trained with the radial basis function (RBF) kernel using a grid search, to detect hot topics from the negative data. Detecting hot topics with sentiment analysis provides investors with the latest trends and hot topics in the stock forum, so they no longer need to search vast amounts of information on the Web. The proposed model is also helpful for rapidly gauging customers' signals or attitudes toward government policy and firms' products and services.
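The clustering step of the hot-topic selection can be sketched with TF-IDF vectors and k-means. The posts below are tiny made-up English examples standing in for the crawled Korean forum data, and using the larger cluster as the "hot" one is a simplification of the paper's interest index and sentiment scoring.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up forum posts (illustrative only).
posts = [
    "government policy boosts bank stocks",
    "bank shares rally on new policy report",
    "semiconductor exports fall sharply",
    "chip makers warn of weak semiconductor demand",
    "rumor of policy change lifts financial stocks",
    "memory chip prices drop again",
]

# Vectorize the posts and cluster them into candidate topic groups.
X = TfidfVectorizer(stop_words="english").fit_transform(posts)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Simplified stand-in for the interest index: the larger cluster is "hot".
counts = np.bincount(km.labels_)
hot = int(counts.argmax())
print("cluster labels:", km.labels_.tolist())
print("hot cluster:", hot, "with", int(counts[hot]), "posts")
```

In the full pipeline, each cluster would additionally carry a sentiment value from the polarity dictionary, and the supervised detectors (decision trees, SVM) would be trained to predict which clusters become hot.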