• Title/Summary/Keyword: Preprocessed data

Damaged cable detection with statistical analysis, clustering, and deep learning models

  • Son, Hyesook;Yoon, Chanyoung;Kim, Yejin;Jang, Yun;Tran, Linh Viet;Kim, Seung-Eock;Kim, Dong Joo;Park, Jongwoong
    • Smart Structures and Systems / v.29 no.1 / pp.17-28 / 2022
  • The cable component of cable-stayed bridges is gradually affected by weather conditions, vehicle loads, and material corrosion. The stay cable is a critical load-carrying part that closely affects the operational stability of a cable-stayed bridge. Damaged cables can lead to bridge collapse because their tension capacity is reduced. Thus, it is necessary to develop structural health monitoring (SHM) techniques that accurately identify damaged cables. In this work, an identification method combining three techniques (statistical analysis, clustering, and neural network models) is proposed to detect the damaged cable in a cable-stayed bridge. The measured dataset from the bridge was first preprocessed to remove the outlier channels. Then, the theory and application of each technique for damage detection were introduced. In general, the statistical approach extracts the parameters representing the damage within time series, the clustering approach identifies the outliers from the data signals as damaged members, and the deep learning approach uses the nonlinear data dependencies in SHM to train the model. The performance of these approaches in classifying the damaged cable was assessed, and the combined identification method was obtained using a voting ensemble. Finally, the combined method was compared with an existing outlier detection algorithm, support vector machines (SVM). The results demonstrate that the proposed method is robust and provides higher accuracy for damaged cable detection in the cable-stayed bridge.
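
The voting step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which voting rule the authors used, so simple majority (hard) voting is assumed here, and the detector names and labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-cable labels from several detectors by majority vote.

    predictions: dict mapping a detector name to a list of 0/1 labels
    (1 = cable flagged as damaged), one label per cable.
    """
    per_detector = list(predictions.values())
    n_cables = len(per_detector[0])
    if any(len(labels) != n_cables for labels in per_detector):
        raise ValueError("all detectors must label the same number of cables")

    voted = []
    for labels in zip(*per_detector):
        # most_common(1) returns the label that received the most votes
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three detectors for five cables (not data from the paper)
votes = {
    "statistical": [0, 1, 0, 0, 1],
    "clustering": [0, 1, 1, 0, 0],
    "deep_learning": [0, 1, 0, 0, 1],
}
result = majority_vote(votes)
print(result)
```

With an odd number of binary detectors there are no ties, which is one reason three voters is a convenient choice for a sketch like this.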

Determination of the stage and grade of periodontitis according to the current classification of periodontal and peri-implant diseases and conditions (2018) using machine learning algorithms

  • Kubra Ertas;Ihsan Pence;Melike Siseci Cesmeli;Zuhal Yetkin Ay
    • Journal of Periodontal and Implant Science / v.53 no.1 / pp.38-53 / 2023
  • Purpose: The current Classification of Periodontal and Peri-Implant Diseases and Conditions, published and disseminated in 2018, involves some difficulties and causes diagnostic conflicts due to its criteria, especially for inexperienced clinicians. The aim of this study was to design a decision system based on machine learning algorithms by using clinical measurements and radiographic images in order to determine and facilitate the staging and grading of periodontitis. Methods: In the first part of this study, machine learning models were created using the Python programming language based on clinical data from 144 individuals who presented to the Department of Periodontology, Faculty of Dentistry, Süleyman Demirel University. In the second part, panoramic radiographic images were processed and classification was carried out with deep learning algorithms. Results: Using clinical data, the accuracy of staging with the tree algorithm reached 97.2%, while the random forest and k-nearest neighbor algorithms reached 98.6% accuracy. The best staging accuracy for processing panoramic radiographic images was provided by a hybrid network model algorithm combining the proposed ResNet50 architecture and the support vector machine algorithm. For this, the images were preprocessed, and high success was obtained, with a classification accuracy of 88.2% for staging. However, in general, it was observed that the radiographic images provided a low level of success, in terms of accuracy, for modeling the grading of periodontitis. Conclusions: The machine learning-based decision system presented herein can facilitate periodontal diagnoses despite its current limitations. Further studies are planned to optimize the algorithm and improve the results.

Research on BGP dataset analysis and CyCOP visualization methods (BGP 데이터셋 분석 및 CyCOP 가시화 방안 연구)

  • Jae-yeong Jeong;Kook-jin Kim;Han-sol Park;Ji-soo Jang;Dong-il Shin;Dong-kyoo Shin
    • Journal of Internet Computing and Services / v.25 no.1 / pp.177-188 / 2024
  • As technology evolves, Internet usage continues to grow, resulting in a geometric increase in network traffic and communication volumes. As a result, the network path selection process, one of the core elements of the Internet, is becoming more complex and advanced. It is therefore important to manage and analyze it effectively, and there is a need for a representation and visualization method that can be intuitively understood. To this end, this study designs a framework that analyzes network data using BGP, a network path selection protocol, and applies it to the cyber common operating picture (CyCOP) for situational awareness. We then analyze the visualization elements required to present the information and conduct an experiment to implement a simple visualization. Based on the data collected and preprocessed in the experiment, the implemented visualization screens help commanders or security personnel effectively understand the network situation and exercise command and control.

Deep Learning-based Happiness Index Model Considering Social Variables and Individual Emotional Index (사회적 변수와 개개인의 감정지수를 함께 고려한 딥러닝 기반 행복 지수 모델 설계)

  • Sumin Oh;Minseo Park
    • The Journal of the Convergence on Culture Technology / v.10 no.1 / pp.489-493 / 2024
  • The happiness index is a measurement system for understanding collective happiness. As values change, studies have proposed adding the value of behavior to the happiness index. However, there is a lack of studies that analyze the relationship using individual emotions. Using a deep learning model, we predicted the happiness index from social variables and an individual emotional index. First, we collected social and emotional variables from January 2005 to December 2020. Second, we preprocessed the data and identified significant variables. Finally, we trained a deep learning-based regression model. The proposed model was evaluated using 5-fold cross-validation and showed 90.86% accuracy on the test sets. Our model is expected to help analyze the significant factors of country-specific happiness indices.
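
The 5-fold cross-validation mentioned in this abstract can be sketched as an index split. This is a minimal illustration with contiguous, unshuffled folds and a hypothetical sample count; the abstract does not describe how the authors built their folds.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Each sample appears in exactly one test fold. When n_samples is not
    divisible by k, the first folds get one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical dataset of 12 samples split into 5 folds
folds = list(k_fold_indices(12, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

A model would be trained on each `train_idx` and scored on the matching `test_idx`, with the five scores averaged into one estimate.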

Prediction of East Asian Brain Age using Machine Learning Algorithms Trained With Community-based Healthy Brain MRI

  • Chanda Simfukwe;Young Chul Youn
    • Dementia and Neurocognitive Disorders / v.21 no.4 / pp.138-146 / 2022
  • Background and Purpose: Magnetic resonance imaging (MRI) helps with brain development analysis and disease diagnosis. Brain volumes measured at different ages using MRI provide useful information in clinical evaluation and research. Therefore, we trained machine learning models that predict the brain age gap of healthy subjects in the East Asian population using T1 brain MRI volume images. Methods: In total, 154 T1-weighted MRIs of healthy subjects (55-83 years of age) were collected from an East Asian community. Age, gender, and education level were recorded for each participant. The MRIs of the participants were preprocessed using FreeSurfer (https://surfer.nmr.mgh.harvard.edu/) to collect the brain volume data. We trained the models using different supervised machine learning regression algorithms from the scikit-learn (https://scikit-learn.org/) library. Results: The trained models used 19 features that had been reduced from 55 brain volume labels. Among the regression methods compared, the BayesianRidge (BR) algorithm achieved a mean absolute error (MAE) of 3 years and an R-squared (R2) of 0.3 in predicting the age of new subjects. The results of feature importance analysis showed that the right pallidum, white matter hypointensities on T1-MRI scans, and left hippocampus are among the essential features in predicting brain age. Conclusions: The MAE and R2 of the BR model predicting brain age gap in the East Asian population showed that the model could reduce the dimensionality of neuroimaging data to provide a meaningful biomarker for individual brain aging.
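
The two metrics reported in this abstract can be computed as in the sketch below. It uses plain Python rather than the scikit-learn functions the authors presumably used, and the ages are hypothetical values, not data from the study. MAE carries the unit of the target (years), whereas R2 is dimensionless.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical chronological ages and model predictions, in years
ages = [60.0, 65.0, 70.0, 75.0]
predicted = [62.0, 64.0, 73.0, 73.0]

mae = mean_absolute_error(ages, predicted)
r2 = r_squared(ages, predicted)
print(mae, r2)
```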

Exploring Predictive Models for Student Success in National Physical Therapy Examination: Machine Learning Approach

  • Bokyung Kim;Yeonseop Lee;Jang-hoon Shin;Yusung Jang;Wansuk Choi
    • Journal of the Korea Society of Computer and Information / v.29 no.10 / pp.113-120 / 2024
  • This study aims to assess the effectiveness of machine learning models in predicting the pass rates of physical therapy students in national exams. Traditional grade prediction methods primarily rely on past academic performance or demographic data. However, this study employed machine learning and deep learning techniques to analyze mock test scores with the goal of improving prediction accuracy. Data from 1,242 students across five Korean universities were collected and preprocessed, followed by analysis using various models. Models, including those generated and fine-tuned with the assistance of ChatGPT-4, were applied to the dataset. The results showed that H2OAutoML (GBM2) performed the best with an accuracy of 98.4%, while TabNet, LightGBM, and RandomForest also demonstrated high performance. This study demonstrates the exceptional effectiveness of H2OAutoML (GBM2) in predicting national exam pass rates and suggests that these AI-assisted models can significantly contribute to medical education and policy.

Guidelines for big data projects in artificial intelligence mathematics education (인공지능 수학 교육을 위한 빅데이터 프로젝트 과제 가이드라인)

  • Lee, Junghwa;Han, Chaereen;Lim, Woong
    • The Mathematical Education / v.62 no.2 / pp.289-302 / 2023
  • In today's digital information society, student knowledge and skills to analyze big data and make informed decisions have become an important goal of school mathematics. Integrating big data statistical projects with digital technologies in high school <Artificial Intelligence> mathematics courses has the potential to provide students with a learning experience of high impact that can develop these essential skills. This paper proposes a set of guidelines for designing effective big data statistical project-based tasks and evaluates the tasks in the artificial intelligence mathematics textbook against these criteria. The proposed guidelines recommend that projects should: (1) align knowledge and skills with the national school mathematics curriculum; (2) use preprocessed massive datasets; (3) employ data scientists' problem-solving methods; (4) encourage decision-making; (5) leverage technological tools; and (6) promote collaborative learning. The findings indicate that few textbooks fully align with these guidelines, with most failing to incorporate elements corresponding to Guideline 2 in their project tasks. In addition, most tasks in the textbooks overlook or omit data preprocessing, either by using smaller datasets or by using big data without any form of preprocessing. This can potentially result in misconceptions among students regarding the nature of big data. Furthermore, this paper discusses the relevant mathematical knowledge and skills necessary for artificial intelligence, as well as the potential benefits and pedagogical considerations associated with integrating technology into big data tasks. This research sheds light on teaching mathematical concepts with machine learning algorithms and the effective use of technology tools in big data education.

Validation of Sea Surface Wind Estimated from KOMPSAT-5 Backscattering Coefficient Data (KOMPSAT-5 후방산란계수 자료로 산출된 해상풍 검증)

  • Jang, Jae-Cheol;Park, Kyung-Ae;Yang, Dochul
    • Korean Journal of Remote Sensing / v.34 no.6_3 / pp.1383-1398 / 2018
  • Sea surface wind is one of the most fundamental variables for understanding diverse marine phenomena. Although scatterometers have produced global wind field data since the early 1990s, the data have seen limited use in oceanic applications due to their low spatial resolution, especially in coastal regions. Synthetic Aperture Radar (SAR) is capable of producing high-resolution wind field data. KOMPSAT-5 is the first Korean satellite equipped with an X-band SAR instrument and is able to retrieve the sea surface wind. This study presents, for the first time, the validation results of sea surface wind derived from the KOMPSAT-5 backscattering coefficient data. We collected 18 sets of KOMPSAT-5 ES mode data to produce a matchup database collocated with buoy stations. To calculate accurate wind speeds, we preprocessed the SAR data, including land masking, speckle noise reduction, and ship detection, and converted the in-situ wind to 10-m neutral wind as reference wind data using the Liu-Katsaros-Businger (LKB) model. The sea surface winds based on XMOD2 show root-mean-square errors of about $2.41$-$2.74\;\mathrm{m\;s^{-1}}$, depending on the backscattering coefficient conversion equations. In-depth analyses of the wind speed errors derived from KOMPSAT-5 backscattering coefficient data reveal diverse potential error factors, such as image quality related to range ambiguity, discrete and discontinuous distribution of incidence angle, changes in the marine atmospheric environment, impacts of atmospheric gravity waves, the ocean wave spectrum, and internal waves.
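
The root-mean-square error used for this validation can be sketched as follows. The wind speeds are hypothetical matchup pairs, not values from the study, and the conversion of buoy wind to 10-m neutral wind with the LKB model is not reproduced here.

```python
import math


def rmse(estimated, reference):
    """Root-mean-square error between estimated and reference values."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical matchup pairs: SAR-derived vs. buoy reference wind speed, in m/s
sar_wind = [5.0, 7.5, 10.0, 3.0]
buoy_wind = [6.0, 7.5, 8.0, 4.0]

error = rmse(sar_wind, buoy_wind)
print(round(error, 3))
```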

Reliability Analysis of Privacy Policies Using Android Static Analysis (안드로이드 정적 분석을 활용한 개인정보 처리방침의 신뢰성 분석)

  • Yoonkyo, Jung
    • KIPS Transactions on Computer and Communication Systems / v.12 no.1 / pp.17-24 / 2023
  • Mobile apps frequently request permission to access sensitive data for user convenience. However, sensitive and personal data have been leaked during the use of mobile applications even when users did not allow it. To deal with this problem, the Google Play Store requires developers to disclose how a mobile app handles user data in a privacy policy. However, users cannot be certain that the privacy policy describes all of the app's behavior; they have no choice but to rely on the privacy policy to confirm how the app uses data. This study designed a system that checks the reliability of privacy policies by analyzing both the privacy policy texts and the mobile apps. First, the system extracts and analyzes the privacy policy texts to check which personal data the policy discloses that the app can collect. After analyzing which data the apps can access using Android static analysis, we compare both results to assess the reliability of the privacy policies. For the experiment, we collected the APK files and metadata of about 13K Android apps registered in the Google Play Store and preprocessed the apps according to four conditions. The comparison between privacy policies and mobile app behavior shows that many apps can access more personal data than their privacy policies disclose.
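
The comparison step described in this abstract amounts to a set difference between what a policy discloses and what the app can access. The sketch below assumes both analyses have already been reduced to sets of data-type labels; the labels and the function name are hypothetical, and neither the policy text analysis nor the static analysis is shown.

```python
def undisclosed_data(disclosed, accessible):
    """Return data types the app can access but the policy does not disclose."""
    return sorted(set(accessible) - set(disclosed))


# Hypothetical results for one app (not data from the study)
policy_disclosed = {"location", "contacts"}
app_accessible = {"location", "contacts", "camera", "microphone"}

gaps = undisclosed_data(policy_disclosed, app_accessible)
print(gaps)
```

An empty result would mean the policy covers everything the static analysis found; a non-empty one flags the policy as potentially unreliable for that app.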

An Analysis of Causes of Marine Incidents at sea Using Big Data Technique (빅데이터 기법을 활용한 항해 중 준해양사고 발생원인 분석에 관한 연구)

  • Kang, Suk-Young;Kim, Ki-Sun;Kim, Hong-Beom;Rho, Beom-Seok
    • Journal of the Korean Society of Marine Environment & Safety / v.24 no.4 / pp.408-414 / 2018
  • Various studies have been conducted to reduce marine accidents. However, research on marine incidents remains marginal. There are many reports of marine incidents, but existing studies have been mainly qualitative, which makes quantitative analysis difficult. Nevertheless, quantitative analysis of marine accidents is necessary to reduce marine incidents. The purpose of this paper is to analyze marine incident data quantitatively by applying big data techniques in order to predict marine incident trends and reduce marine accidents. To accomplish this, about 10,000 marine incident reports were converted into a unified format through preprocessing. Using this preprocessed data, we first derived major keywords for marine incidents at sea using text mining techniques. Second, time series and cluster analyses were applied to the major keywords, and trends for possible marine incidents were predicted. The results confirmed that it is possible to use quantified data and statistical analysis to address this topic. We also confirmed that big data techniques can provide information on preventive measures by identifying objective tendencies in marine incidents that may occur in the future.
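
The keyword derivation step in this abstract can be illustrated with a simple term-frequency count. This is only a sketch under strong assumptions: the reports and stopword list are hypothetical English examples, and the tokenizer is a plain regular expression, whereas the abstract does not specify the text mining method used on the original reports.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real analysis would use a fuller one
STOPWORDS = {"the", "a", "of", "and", "to", "in", "was", "at", "on"}


def top_keywords(reports, n=3):
    """Count word frequencies across reports and return the n most common."""
    counts = Counter()
    for text in reports:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical incident reports (not taken from the study)
reports = [
    "Near collision with fishing vessel in fog",
    "Fishing vessel crossed the bow at night",
    "Anchor dragged in strong wind off the fishing ground",
]

keywords = top_keywords(reports, n=2)
print(keywords)
```

Counting the same keywords per month or per year would give the kind of series that a time series analysis could then be applied to.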