• Title/Summary/Keyword: preprocessing


Analysis of Research Trends in New Drug Development with Artificial Intelligence Using Text Mining (텍스트 마이닝을 이용한 인공지능 활용 신약 개발 연구 동향 분석)

  • Jae Woo Nam;Young Jun Kim
    • Journal of Life Science
    • /
    • v.33 no.8
    • /
    • pp.663-679
    • /
    • 2023
  • This review analyzes research trends in new drug development using artificial intelligence from 2010 to 2022. The abstracts of 2,421 studies were organized into a corpus, and words with high frequency and high connection centrality were extracted through preprocessing. The analysis revealed that the word frequency trend for 2010-2019 was similar to that for 2020-2022. In terms of research methods, studies using machine learning dominated from 2010 to 2020, and since 2021 research using deep learning has been increasing. Through these studies, we investigated trends in the use of artificial intelligence by field, along with the strengths, problems, and challenges of the related research. We found that since 2021 the application of artificial intelligence has been expanding, for example to drug repurposing, computer-aided development of anticancer drugs, and clinical trials. This article briefly presents the prospects of new drug development research using artificial intelligence. If the reliability and safety of bio and medical data are ensured and artificial intelligence technology continues to develop, new drug development using artificial intelligence is expected to move toward personalized and precision medicine, and we encourage efforts in that direction.
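A rough sketch of the kind of corpus preprocessing and connection-centrality ranking described in this abstract is given below; the two-sentence placeholder corpus, the stopword list, and the whole-abstract co-occurrence window are illustrative assumptions, not the authors' actual pipeline.

```python
import re
from collections import Counter
from itertools import combinations

import networkx as nx

abstracts = [
    "deep learning accelerates drug discovery and target identification",
    "machine learning models predict drug target interactions",
]  # placeholder corpus; the study used 2,421 abstracts

stopwords = {"and", "the", "of", "in", "for", "to", "a", "an"}

def preprocess(text):
    """Lowercase, keep letter-only tokens, drop stopwords and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stopwords and len(t) > 2]

docs = [preprocess(a) for a in abstracts]

# Term frequency across the corpus
freq = Counter(t for doc in docs for t in doc)

# Co-occurrence network: link every pair of terms appearing in the same abstract
G = nx.Graph()
for doc in docs:
    for u, v in combinations(set(doc), 2):
        weight = G.get_edge_data(u, v, {"weight": 0})["weight"] + 1
        G.add_edge(u, v, weight=weight)

centrality = nx.degree_centrality(G)
print(freq.most_common(5))
print(sorted(centrality, key=centrality.get, reverse=True)[:5])
```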

Analysis of Research Trends in Information Literacy Education Using Keyword Network Analysis and Topic Modeling (키워드 네트워크 분석과 토픽모델링을 활용한 정보활용교육 연구 동향 분석)

  • Jeong-Hoon, Lim
    • Journal of the Korean Society for Information Management
    • /
    • v.39 no.4
    • /
    • pp.23-48
    • /
    • 2022
  • The purpose of this study is to investigate the flow of domestic information literacy education research using keyword network analysis and topic modeling, and to explore the future direction of information literacy education. To this end, 306 academic papers on information literacy education published in Korean library and information science journals were selected. Through preprocessing of the paper abstracts, total keyword frequency, keyword frequency by period, and keyword co-occurrence frequency were analyzed. Keyword network analysis then examined the degree centrality, betweenness centrality, and eigenvector centrality of the keywords. Using structural topic modeling, 15 topics were derived: curriculum, information literacy effect, contents of information literacy education, school library education, information media literacy, information literacy ability evaluation index, library anxiety, public library program, health information literacy ability, digital divide, library assisted instruction improvement, research trend, information literacy model, and teacher role. In addition, the trend of topics by year was analyzed to confirm changes in the relative weight of each topic. Based on these results, the direction of information literacy education and suggestions for follow-up research are presented.
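The three centrality measures named in this abstract can be computed as in the following sketch; the tiny keyword co-occurrence graph is a made-up stand-in for the network the study built from 306 abstracts, and structural topic modeling itself is not reproduced here.

```python
import networkx as nx

# Hypothetical co-occurrence edges: (keyword, keyword, co-occurrence count)
edges = [
    ("information literacy", "curriculum", 12),
    ("information literacy", "school library", 9),
    ("school library", "teacher role", 4),
    ("information literacy", "digital divide", 3),
]
G = nx.Graph()
G.add_weighted_edges_from(edges)

degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)            # unweighted shortest paths
eigenvector = nx.eigenvector_centrality(G, weight="weight")

for kw in G.nodes:
    print(f"{kw}: deg={degree[kw]:.2f} btw={betweenness[kw]:.2f} eig={eigenvector[kw]:.2f}")
```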

Comparison of Adversarial Example Restoration Performance of VQ-VAE Model with or without Image Segmentation (이미지 분할 여부에 따른 VQ-VAE 모델의 적대적 예제 복원 성능 비교)

  • Tae-Wook Kim;Seung-Min Hyun;Ellen J. Hong
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.23 no.4
    • /
    • pp.194-199
    • /
    • 2022
  • Preprocessing that produces high-quality data is required for high accuracy and usability in the various complex industries based on image data. However, when a contaminated adversarial example, created by combining noise with existing image or video data, is introduced, it can pose a great risk to a company, so the damage must be restored to ensure reliability, security, and complete results. As a countermeasure, restoration has previously been performed with Defense-GAN, but this has disadvantages such as long training time and low restoration quality. To improve on this, this paper proposes a method that uses adversarial examples created through FGSM together with the VQ-VAE model, with and without image segmentation. First, the generated examples are classified with a general classifier. Next, the unsegmented data is passed through a pre-trained VQ-VAE model, restored, and then classified. Finally, the data divided into quadrants is passed through a 4-split-VQ-VAE model, the reconstructed fragments are combined, and the result is classified. After comparing the restored results and their accuracy, the performance of the two configurations is analyzed according to whether or not the input is split.
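For reference, a minimal FGSM sketch in PyTorch is shown below; the classifier, the perturbation size epsilon, and the [0, 1] pixel range are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.03):
    """Perturb images in the direction of the sign of the loss gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```

In the workflow described above, a batch perturbed this way would then be restored by the pre-trained VQ-VAE (whole, or split into quadrants) before being classified again.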

Ground Subsidence Risk Grade Prediction Model Based on Machine Learning According to the Underground Facility Properties and Density (기계학습 기반 지하매설물 속성 및 밀집도를 활용한 지반함몰 위험도 예측 모델)

  • Sungyeol Lee;Jaemo Kang;Jinyoung Kim
    • Journal of the Korean GEO-environmental Society
    • /
    • v.24 no.4
    • /
    • pp.23-29
    • /
    • 2023
  • Ground subsidence occurs when damage to water supply or sewer pipes creates a flow path that carries away soil particles, forming a cavity in the ground; the ground above the cavity then collapses. Accordingly, ground subsidence occurs frequently in downtown areas where large numbers of underground facilities are buried, and research to predict its risk is ongoing. This study presents a ground subsidence risk prediction model for two districts of ○○ city. A dataset was constructed and preprocessed using the attribute data of underground facilities in the target area (years of service, pipe diameter), the density of underground facilities, and ground subsidence history data. The dataset was applied to machine learning models, the reliability of the selected model was evaluated, and the importance of the influencing factors used by the model to predict ground subsidence risk is presented.
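A hedged sketch of this kind of workflow with scikit-learn follows: fit a classifier on facility attributes and report feature importances. The made-up values, the column names, and the choice of a random forest are illustrative assumptions; the abstract does not fix a specific algorithm.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny illustrative dataset; the real study used facility attributes, density,
# and subsidence history for two districts (the values below are made up).
df = pd.DataFrame({
    "service_years":    [32, 8, 25, 40, 5, 18, 29, 12],
    "pipe_diameter_mm": [300, 150, 450, 200, 100, 250, 350, 150],
    "facility_density": [0.8, 0.2, 0.6, 0.9, 0.1, 0.4, 0.7, 0.3],
    "subsided":         [1, 0, 1, 1, 0, 0, 1, 0],
})
X, y = df.drop(columns="subsided"), df["subsided"]

model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)
for name, importance in zip(X.columns, model.feature_importances_):
    print(f"{name}: {importance:.3f}")   # importance of each influencing factor
```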

Verification of Ground Subsidence Risk Map Based on Underground Cavity Data Using DNN Technique (DNN 기법을 활용한 지하공동 데이터기반의 지반침하 위험 지도 작성)

  • Han Eung Kim;Chang Hun Kim;Tae Geon Kim;Jeong Jun Park
    • Journal of the Society of Disaster Information
    • /
    • v.19 no.2
    • /
    • pp.334-343
    • /
    • 2023
  • Purpose: In this study, cavity data found through ground cavity exploration was combined with underground facility data to derive correlations, and a ground subsidence risk prediction map was verified using an AI algorithm. Method: The study was conducted in three stages: investigation and collection of big data related to risk assessment, data preprocessing for AI analysis, and verification of the ground subsidence risk prediction map using the AI algorithm. Result: Analysis of the resulting ground subsidence risk prediction map confirmed the distribution of three risk grades (emergency, priority, and general) for Busanjin-gu and Saha-gu. In addition, by arranging the predicted risk grades for each section of the road network, it was confirmed that 3 of 61 sections in Busanjin-gu and 7 of 68 sections in Saha-gu include roads with an emergency grade. Conclusion: Based on the verified ground subsidence risk prediction map, a safe road environment can be provided to citizens by setting exploration sections according to risk grade and conducting investigations.
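A minimal sketch of a feed-forward DNN that maps road-section features to the three risk grades (emergency, priority, general) is shown below; the layer sizes and the eight input features are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class RiskGradeDNN(nn.Module):
    """Small fully connected network for 3-class risk-grade prediction."""
    def __init__(self, n_features, n_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.net(x)

model = RiskGradeDNN(n_features=8)
logits = model(torch.randn(4, 8))   # batch of 4 hypothetical road sections
print(logits.argmax(dim=1))         # predicted risk-grade indices
```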

A Review of Seismic Full Waveform Inversion Based on Deep Learning (딥러닝 기반 탄성파 전파형 역산 연구 개관)

  • Sukjoon, Pyun;Yunhui, Park
    • Geophysics and Geophysical Exploration
    • /
    • v.25 no.4
    • /
    • pp.227-241
    • /
    • 2022
  • Full waveform inversion (FWI) in seismic data processing is an inversion technique used to estimate the subsurface velocity model for oil and gas exploration. Recently, deep learning (DL) has been used increasingly in seismic data processing, and its combination with FWI has attracted considerable research effort. For example, DL-based data processing techniques have been used to preprocess input data for FWI, and FWI itself can now be implemented directly with DL. DL-based FWI can be divided into the following methods: pure data-based, physics-based neural network, encoder-decoder, reparameterized FWI, and physics-informed neural network. In this review, we describe the theory and characteristics of these methods, organized in the order of their development. In the early days of DL-based FWI, a large training dataset was prepared and a purely data-driven model predicted the velocity model, faithfully following the basic principles of data science. The current research trend is to supplement the shortcomings of the purely data-driven approach by building the loss function from seismic data or from physical information derived from the wave equation itself within deep neural networks. Through these developments, DL-based FWI has evolved to require less training data, to alleviate the cycle-skipping problem that is an intrinsic limitation of FWI, and to reduce computation time dramatically. The value of DL-based FWI is expected to keep increasing in seismic data processing.
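As a schematic illustration of the physics-informed category mentioned in this abstract, the sketch below combines a data misfit with the residual of the 1-D acoustic wave equation u_tt = c^2 u_xx obtained by automatic differentiation; it illustrates the general idea only and is not any specific published FWI implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Network predicting the wavefield u(x, t); the inverted quantity here is a
# single constant velocity, a deliberate oversimplification.
wavefield_net = nn.Sequential(
    nn.Linear(2, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)
log_velocity = nn.Parameter(torch.zeros(1))

def physics_residual(xt):
    """Residual of u_tt - c^2 u_xx at collocation points (x, t)."""
    xt = xt.clone().requires_grad_(True)
    u = wavefield_net(xt)
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = grads[:, 0:1], grads[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, 0:1]
    u_tt = torch.autograd.grad(u_t.sum(), xt, create_graph=True)[0][:, 1:2]
    c = torch.exp(log_velocity)
    return u_tt - (c ** 2) * u_xx

collocation = torch.rand(128, 2)   # random (x, t) points in the model domain
observed_xt = torch.rand(32, 2)    # receiver locations/times (hypothetical)
observed_u = torch.zeros(32, 1)    # recorded amplitudes (hypothetical)

data_loss = F.mse_loss(wavefield_net(observed_xt), observed_u)
physics_loss = physics_residual(collocation).pow(2).mean()
total_loss = data_loss + 0.1 * physics_loss   # weighting is an assumption
```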

Development of a Real-time Ship Operational Efficiency Analysis Model (선박운항데이터 기반 실시간 선박운항효율 분석 모델 개발)

  • Taemin Hwang;Hyoseon Hwang;Ik-Hyun Youn
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.29 no.1
    • /
    • pp.60-66
    • /
    • 2023
  • The maritime industry is currently focusing on technologies that promote autonomy and intelligence, such as smart ships, autonomous ships, and eco-friendly technologies, to enhance ship operational efficiency, and many countries are researching methods that increase operational efficiency while ensuring ship safety. This study develops a real-time ship operational efficiency analysis model using data analysis methods, addressing the limitation that present technologies cannot evaluate operational efficiency in real time. The model selects ship operational efficiency factors and ship operational condition factors, compares the ship's present operational efficiency against the classified factors, and determines whether the present operational efficiency is appropriate. The study involved selecting a target ship, collecting and preprocessing data, and developing classification models. The resulting models determine the achievable improvement in operational efficiency from the ship operational condition factors, thereby supporting ship operators.
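A hedged sketch of the classification step follows: operating-condition features are used to classify whether the present efficiency is appropriate. The made-up features, labels, and the gradient-boosting choice are assumptions, not the study's actual factors or model.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Made-up operational snapshots; the study's actual factors and labels differ.
voyage = pd.DataFrame({
    "speed_over_ground": [12.1, 14.5, 13.0, 15.2, 11.8, 14.9],
    "wind_speed":        [5.0, 12.0, 7.5, 14.0, 4.0, 13.5],
    "draft":             [8.2, 8.2, 8.5, 8.5, 8.0, 8.4],
    "efficiency_ok":     [1, 0, 1, 0, 1, 0],   # 1 = present efficiency appropriate
})
X, y = voyage.drop(columns="efficiency_ok"), voyage["efficiency_ok"]

clf = GradientBoostingClassifier(random_state=0).fit(X, y)
print(clf.predict(X.head(2)))   # classify the current operating condition
```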

Short-Term Precipitation Forecasting based on Deep Neural Network with Synthetic Weather Radar Data (기상레이더 강수 합성데이터를 활용한 심층신경망 기반 초단기 강수예측 기술 연구)

  • An, Sojung;Choi, Youn;Son, MyoungJae;Kim, Kwang-Ho;Jung, Sung-Hwa;Park, Young-Youn
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.43-45
    • /
    • 2021
  • Short-term quantitative precipitation forecasting (QPF) is socially and economically important for preventing damage from severe weather. Recently, many studies have applied deep neural networks (DNNs) to short-term QPF. These studies require sophisticated preprocessing, because mishandling the varied and vast meteorological datasets lowers QPF performance. In particular, to predict the non-linear behaviour of precipitation more accurately, the dataset needs to be handled carefully based on a physical and dynamical understanding of the data. This paper therefore proposes the following approach: i) refining and combining the major factors related to precipitation development (weather radar, terrain, air temperature, and so on) to construct training data for analysing precipitation patterns; and ii) producing predicted precipitation fields with a convolutional ConvLSTM network. The proposed algorithm was evaluated on rainfall events in 2020; it performed well on the magnitude and intensity of precipitation and clearly predicted its non-linear patterns. The algorithm can be useful as a forecasting tool for preventing severe weather damage.
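A minimal ConvLSTM sketch with tf.keras is given below to illustrate the radar-sequence-to-precipitation-field idea; the input shape (10 past frames of 128 x 128 with several channels) and the layer sizes are assumptions, not the authors' architecture.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # (time, height, width, channels: radar, terrain, air temperature, ...)
    tf.keras.layers.Input(shape=(10, 128, 128, 4)),
    tf.keras.layers.ConvLSTM2D(32, kernel_size=3, padding="same", return_sequences=True),
    tf.keras.layers.ConvLSTM2D(32, kernel_size=3, padding="same", return_sequences=False),
    tf.keras.layers.Conv2D(1, kernel_size=1, activation="relu"),  # predicted precipitation field
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```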


A Classification Model for Customs Clearance Inspection Results of Imported Aquatic Products Using Machine Learning Techniques (머신러닝 기법을 활용한 수입 수산물 통관검사결과 분류 모델)

  • Ji Seong Eom;Lee Kyung Hee;Wan-Sup Cho
    • The Journal of Bigdata
    • /
    • v.8 no.1
    • /
    • pp.157-165
    • /
    • 2023
  • Seafood is a major source of protein in many countries, and its consumption is increasing. In Korea, seafood consumption is rising while the self-sufficiency rate is falling, so the importance of safety management grows as the volume of imported seafood increases. Hundreds of species of aquatic products are imported into Korea from over 110 countries, and relying solely on inspectors' experience for safety management has its limits. Based on the data, this study develops a model that predicts the customs inspection results of imported aquatic products: a machine learning classification model that determines the non-conformity of aquatic products at the time an import declaration is submitted. Because the non-conformity rate in customs inspections of imported marine products is below 1%, the data is highly imbalanced. Therefore, sampling methods that can compensate for this characteristic were compared, and a preprocessing method that keeps the classification results interpretable was applied. Among the various machine-learning classification models, Random Forest and XGBoost performed well. The model that best predicts both conformity and non-conformity is the basic random forest with ADASYN and one-hot encoding applied, achieving an accuracy of 99.88%, precision of 99.87%, recall of 99.89%, and AUC of 99.88%. XGBoost was the most stable model, with all indicators exceeding 90% regardless of the oversampling and encoding type.
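A hedged sketch of the reported best-performing combination (one-hot encoding, ADASYN oversampling, random forest) follows; the column names and the randomly generated declarations are illustrative assumptions, and imbalanced-learn's Pipeline applies the sampler only during fitting.

```python
import numpy as np
import pandas as pd
from imblearn.over_sampling import ADASYN
from imblearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 1000
declarations = pd.DataFrame({
    "country_of_origin": rng.choice(["CN", "VN", "NO", "RU"], n),
    "species":           rng.choice(["shrimp", "squid", "mackerel", "eel"], n),
    "product_form":      rng.choice(["frozen", "chilled", "dried"], n),
})
y = (rng.random(n) < 0.03).astype(int)   # rare non-conforming class, mimicking the imbalance

pipeline = Pipeline([
    ("encode", ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), list(declarations.columns)),
    ])),
    ("oversample", ADASYN(random_state=0)),           # applied only when fitting
    ("classify", RandomForestClassifier(random_state=0)),
])
pipeline.fit(declarations, y)
print(pipeline.predict(declarations.head()))
```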

A Study about Learning Graph Representation on Farmhouse Apple Quality Images with Graph Transformer (그래프 트랜스포머 기반 농가 사과 품질 이미지의 그래프 표현 학습 연구)

  • Ji Hun Bae;Ju Hwan Lee;Gwang Hyun Yu;Gyeong Ju Kwon;Jin Young Kim
    • Smart Media Journal
    • /
    • v.12 no.1
    • /
    • pp.9-16
    • /
    • 2023
  • Recently, a convolutional neural network (CNN) based system is being developed to overcome the limitations of human resources in the apple quality classification of farmhouse. However, since convolutional neural networks receive only images of the same size, preprocessing such as sampling may be required, and in the case of oversampling, information loss of the original image such as image quality degradation and blurring occurs. In this paper, in order to minimize the above problem, to generate a image patch based graph of an original image and propose a random walk-based positional encoding method to apply the graph transformer model. The above method continuously learns the position embedding information of patches which don't have a positional information based on the random walk algorithm, and finds the optimal graph structure by aggregating useful node information through the self-attention technique of graph transformer model. Therefore, it is robust and shows good performance even in a new graph structure of random node order and an arbitrary graph structure according to the location of an object in an image. As a result, when experimented with 5 apple quality datasets, the learning accuracy was higher than other GNN models by a minimum of 1.3% to a maximum of 4.7%, and the number of parameters was 3.59M, which was about 15% less than the 23.52M of the ResNet18 model. Therefore, it shows fast reasoning speed according to the reduction of the amount of computation and proves the effect.