• Title/Summary/Keyword: preprocessing

Search Result 2,093, Processing Time 0.026 seconds

Intellectual Structure Analysis on the Field of Open Data Using Co-word Analysis (동시출현단어 분석을 이용한 오픈 데이터 분야의 지적 구조 분석)

  • HyeKyung Lee;Yong-Gu Lee
    • Journal of the Korean Society for information Management
    • /
    • v.40 no.4
    • /
    • pp.429-450
    • /
    • 2023
  • The purpose of this study is to examine recent trends and intellectual structures in research related to open data. To achieve this, the study conducted a search for the keyword "open data" in Scopus and collected a total of 6,543 papers from 1999 to 2023. After data preprocessing, the study focused on the author keywords of 5,589 papers to perform network analysis and derive centrality in the field of open data research and linked open data research. As a result, the study found that "big data" exhibited the highest centrality in research related to open data. The research in this area mainly focuses on the utilization of open data as a concept of public data, studies on the application of open data in analysis related to big data as an associated concept, and research on topics related to the use of open data, such as the reproduction, utilization, and access of open data. In linked open data research, both triadic centrality and closeness centrality showed that "the semantic web" had the highest centrality. Moreover, it was observed that research emphasizing data linkage and relationship formation, rather than public data policies, was more prevalent in this field.

Improvement of Underground Cavity and Structure Detection Performance Through Machine Learning-based Diffraction Separation of GPR Data (기계학습 기반 회절파 분리 적용을 통한 GPR 탐사 자료의 도로 하부 공동 및 구조물 탐지 성능 향상)

  • Sooyoon Kim;Joongmoo Byun
    • Geophysics and Geophysical Exploration
    • /
    • v.26 no.4
    • /
    • pp.171-184
    • /
    • 2023
  • Machine learning (ML)-based cavity detection using a large amount of survey data obtained from vehicle-mounted ground penetrating radar (GPR) has been actively studied to identify underground cavities. However, only simple image processing techniques have been used for preprocessing the ML input, and many conventional seismic and GPR data processing techniques, which have been used for decades, have not been fully exploited. In this study, based on the idea that a cavity can be identified using diffraction, we applied ML-based diffraction separation to GPR data to increase the accuracy of cavity detection using the YOLO v5 model. The original ML-based seismic diffraction separation technique was modified, and the separated diffraction image was used as the input to train the cavity detection model. The performance of the proposed method was verified using public GPR data released by the Seoul Metropolitan Government. Underground cavities and objects were more accurately detected using separated diffraction images. In the future, the proposed method can be useful in various fields in which GPR surveys are used.

K-Means Clustering Algorithm and CPA based Collinear Multiple Static Obstacle Collision Avoidance for UAVs (K-평균 군집화 알고리즘 및 최근접점 기반 무인항공기용 공선상의 다중 정적 장애물 충돌 회피)

  • Hyeji Kim;Hyeok Kang;Seongbong Lee;Hyeongseok Kim;Dongjin Lee
    • Journal of Advanced Navigation Technology
    • /
    • v.26 no.6
    • /
    • pp.427-433
    • /
    • 2022
  • Obstacle detection, collision recognition, and avoidance technologies are required the collision avoidance technology for UAVs. In this paper, considering collinear multiple static obstacle, we propose an obstacle detection algorithm using LiDAR and a collision recognition and avoidance algorithm based on CPA. Preprocessing is performed to remove the ground from the LiDAR measurement data before obstacle detection. And we detect and classify obstacles in the preprocessed data using the K-means clustering algorithm. Also, we estimate the absolute positions of detected obstacles using relative navigation and correct the estimated positions using a low-pass filter. For collision avoidance with the detected multiple static obstacle, we use a collision recognition and avoidance algorithm based on CPA. Information of obstacles to be avoided is updated using distance between each obstacle, and collision recognition and avoidance are performed through the updated obstacles information. Finally, through obstacle location estimation, collision recognition, and collision avoidance result analysis in the Gazebo simulation environment, we verified that collision avoidance is performed successfully.

Study on Applicability of Cloth Simulation Filtering Algorithm for Segmentation of Ground Points from Drone LiDAR Point Clouds in Mountainous Areas (산악지형 드론 라이다 데이터 점군 분리를 위한 CSF 알고리즘 적용에 관한 연구)

  • Seul Koo ;Eon Taek Lim ;Yong Han Jung ;Jae Wook Suk ;Seong Sam Kim
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_2
    • /
    • pp.827-835
    • /
    • 2023
  • Drone light detection and ranging (LiDAR) is a state-of-the-art surveying technology that enables close investigation of the top of the mountain slope or the inaccessible slope, and is being used for field surveys in mountainous terrain. To build topographic information using Drone LiDAR, a preprocessing process is required to effectively separate ground and non-ground points from the acquired point cloud. Therefore, in this study, the point group data of the mountain topography was acquired using an aerial LiDAR mounted on a commercial drone, and the application and accuracy of the cloth simulation filtering algorithm, one of the ground separation techniques, was verified. As a result of applying the algorithm, the separation accuracy of the ground and the non-ground was 84.3%, and the kappa coefficient was 0.71, and drone LiDAR data could be effectively used for landslide field surveys in mountainous terrain.

Generation of Time-Series Data for Multisource Satellite Imagery through Automated Satellite Image Collection (자동 위성영상 수집을 통한 다종 위성영상의 시계열 데이터 생성)

  • Yunji Nam;Sungwoo Jung;Taejung Kim;Sooahm Rhee
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_4
    • /
    • pp.1085-1095
    • /
    • 2023
  • Time-series data generated from satellite data are crucial resources for change detection and monitoring across various fields. Existing research in time-series data generation primarily relies on single-image analysis to maintain data uniformity, with ongoing efforts to enhance spatial and temporal resolutions by utilizing diverse image sources. Despite the emphasized significance of time-series data, there is a notable absence of automated data collection and preprocessing for research purposes. In this paper, to address this limitation, we propose a system that automates the collection of satellite information in user-specified areas to generate time-series data. This research aims to collect data from various satellite sources in a specific region and convert them into time-series data, developing an automatic satellite image collection system for this purpose. By utilizing this system, users can collect and extract data for their specific regions of interest, making the data immediately usable. Experimental results have shown the feasibility of automatically acquiring freely available Landsat and Sentinel images from the web and incorporating manually inputted high-resolution satellite images. Comparisons between automatically collected and edited images based on high-resolution satellite data demonstrated minimal discrepancies, with no significant errors in the generated output.

An Accelerated Approach to Dose Distribution Calculation in Inverse Treatment Planning for Brachytherapy (근접 치료에서 역방향 치료 계획의 선량분포 계산 가속화 방법)

  • Byungdu Jo
    • Journal of the Korean Society of Radiology
    • /
    • v.17 no.5
    • /
    • pp.633-640
    • /
    • 2023
  • With the recent development of static and dynamic modulated brachytherapy methods in brachytherapy, which use radiation shielding to modulate the dose distribution to deliver the dose, the amount of parameters and data required for dose calculation in inverse treatment planning and treatment plan optimization algorithms suitable for new directional beam intensity modulated brachytherapy is increasing. Although intensity-modulated brachytherapy enables accurate dose delivery of radiation, the increased amount of parameters and data increases the elapsed time required for dose calculation. In this study, a GPU-based CUDA-accelerated dose calculation algorithm was constructed to reduce the increase in dose calculation elapsed time. The acceleration of the calculation process was achieved by parallelizing the calculation of the system matrix of the volume of interest and the dose calculation. The developed algorithms were all performed in the same computing environment with an Intel (3.7 GHz, 6-core) CPU and a single NVIDIA GTX 1080ti graphics card, and the dose calculation time was evaluated by measuring only the dose calculation time, excluding the additional time required for loading data from disk and preprocessing operations. The results showed that the accelerated algorithm reduced the dose calculation time by about 30 times compared to the CPU-only calculation. The accelerated dose calculation algorithm can be expected to speed up treatment planning when new treatment plans need to be created to account for daily variations in applicator movement, such as in adaptive radiotherapy, or when dose calculation needs to account for changing parameters, such as in dynamically modulated brachytherapy.

CNN-LSTM-based Upper Extremity Rehabilitation Exercise Real-time Monitoring System (CNN-LSTM 기반의 상지 재활운동 실시간 모니터링 시스템)

  • Jae-Jung Kim;Jung-Hyun Kim;Sol Lee;Ji-Yun Seo;Do-Un Jeong
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.24 no.3
    • /
    • pp.134-139
    • /
    • 2023
  • Rehabilitators perform outpatient treatment and daily rehabilitation exercises to recover physical function with the aim of quickly returning to society after surgical treatment. Unlike performing exercises in a hospital with the help of a professional therapist, there are many difficulties in performing rehabilitation exercises by the patient on a daily basis. In this paper, we propose a CNN-LSTM-based upper limb rehabilitation real-time monitoring system so that patients can perform rehabilitation efficiently and with correct posture on a daily basis. The proposed system measures biological signals through shoulder-mounted hardware equipped with EMG and IMU, performs preprocessing and normalization for learning, and uses them as a learning dataset. The implemented model consists of three polling layers of three synthetic stacks for feature detection and two LSTM layers for classification, and we were able to confirm a learning result of 97.44% on the validation data. After that, we conducted a comparative evaluation with the Teachable machine, and as a result of the comparative evaluation, we confirmed that the model was implemented at 93.6% and the Teachable machine at 94.4%, and both models showed similar classification performance.

NFT(Non-Fungible Token) Patent Trend Analysis using Topic Modeling

  • Sin-Nyum Choi;Woong Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.12
    • /
    • pp.41-48
    • /
    • 2023
  • In this paper, we propose an analysis of recent trends in the NFT (Non-Fungible Token) industry using topic modeling techniques, focusing on their universal application across various industrial fields. For this study, patent data was utilized to understand industry trends. We collected data on 371 domestic and 454 international NFT-related patents registered in the patent information search service KIPRIS from 2017, when the first NFT standard was introduced, to October 2023. In the preprocessing stage, stopwords and lemmas were removed, and only noun words were extracted. For the analysis, the top 50 words by frequency were listed, and their corresponding TF-IDF values were examined to derive key keywords of the industry trends. Next, Using the LDA algorithm, we identified four major latent topics within the patent data, both domestically and internationally. We analyzed these topics and presented our findings on NFT industry trends, underpinned by real-world industry cases. While previous review presented trends from an academic perspective using paper data, this study is significant as it provides practical trend information based on data rooted in field practice. It is expected to be a useful reference for professionals in the NFT industry for understanding market conditions and generating new items.

Analysis of Research Trends Related to drug Repositioning Based on Machine Learning (머신러닝 기반의 신약 재창출 관련 연구 동향 분석)

  • So Yeon Yoo;Gyoo Gun Lim
    • Information Systems Review
    • /
    • v.24 no.1
    • /
    • pp.21-37
    • /
    • 2022
  • Drug repositioning, one of the methods of developing new drugs, is a useful way to discover new indications by allowing drugs that have already been approved for use in people to be used for other purposes. Recently, with the development of machine learning technology, the case of analyzing vast amounts of biological information and using it to develop new drugs is increasing. The use of machine learning technology to drug repositioning will help quickly find effective treatments. Currently, the world is having a difficult time due to a new disease caused by coronavirus (COVID-19), a severe acute respiratory syndrome. Drug repositioning that repurposes drugsthat have already been clinically approved could be an alternative to therapeutics to treat COVID-19 patients. This study intends to examine research trends in the field of drug repositioning using machine learning techniques. In Pub Med, a total of 4,821 papers were collected with the keyword 'Drug Repositioning'using the web scraping technique. After data preprocessing, frequency analysis, LDA-based topic modeling, random forest classification analysis, and prediction performance evaluation were performed on 4,419 papers. Associated words were analyzed based on the Word2vec model, and after reducing the PCA dimension, K-Means clustered to generate labels, and then the structured organization of the literature was visualized using the t-SNE algorithm. Hierarchical clustering was applied to the LDA results and visualized as a heat map. This study identified the research topics related to drug repositioning, and presented a method to derive and visualize meaningful topics from a large amount of literature using a machine learning algorithm. It is expected that it will help to be used as basic data for establishing research or development strategies in the field of drug repositioning in the future.

A Study on the Drug Classification Using Machine Learning Techniques (머신러닝 기법을 이용한 약물 분류 방법 연구)

  • Anmol Kumar Singh;Ayush Kumar;Adya Singh;Akashika Anshum;Pradeep Kumar Mallick
    • Advanced Industrial SCIence
    • /
    • v.3 no.2
    • /
    • pp.8-16
    • /
    • 2024
  • This paper shows the system of drug classification, the goal of this is to foretell the apt drug for the patients based on their demographic and physiological traits. The dataset consists of various attributes like Age, Sex, BP (Blood Pressure), Cholesterol Level, and Na_to_K (Sodium to Potassium ratio), with the objective to determine the kind of drug being given. The models used in this paper are K-Nearest Neighbors (KNN), Logistic Regression and Random Forest. Further to fine-tune hyper parameters using 5-fold cross-validation, GridSearchCV was used and each model was trained and tested on the dataset. To assess the performance of each model both with and without hyper parameter tuning evaluation metrics like accuracy, confusion matrices, and classification reports were used and the accuracy of the models without GridSearchCV was 0.7, 0.875, 0.975 and with GridSearchCV was 0.75, 1.0, 0.975. According to GridSearchCV Logistic Regression is the most suitable model for drug classification among the three-model used followed by the K-Nearest Neighbors. Also, Na_to_K is an essential feature in predicting the outcome.