• Title/Summary/Keyword: preprocessing


Research of Water-related Disaster Monitoring Using Satellite Bigdata Based on Google Earth Engine Cloud Computing Platform (구글어스엔진 클라우드 컴퓨팅 플랫폼 기반 위성 빅데이터를 활용한 수재해 모니터링 연구)

  • Park, Jongsoo; Kang, Ki-mook
    • Korean Journal of Remote Sensing, v.38 no.6_3, pp.1761-1775, 2022
  • Due to unpredictable climate change, the frequency of water-related disasters and the scale of their damage are continuously increasing. In terms of disaster management, it is essential to identify damaged areas over a wide region and to monitor them for mid- and long-term forecasting. In the water disaster field, remote sensing research using Synthetic Aperture Radar (SAR) satellite imagery for wide-area monitoring is being actively conducted. Time-series analysis for monitoring requires a complex preprocessing chain that collects a large volume of images and accounts for the noisy characteristics of radar, which in turn demands considerable processing time. With the recent development of cloud computing technology, many platforms capable of spatiotemporal analysis using satellite big data have been proposed. Google Earth Engine (GEE) is a representative platform that provides about 600 satellite datasets for free and enables near-real-time spatiotemporal analysis based on analysis-ready satellite imagery. In this study, therefore, immediate detection of water disaster damage and mid- to long-term time-series observation were conducted using GEE. Using the Otsu technique, which is commonly applied to change detection, changes in river width and flooded area caused by river flooding were identified, focusing on the torrential rains of 2020. In addition, from a disaster management perspective, the trend of waterbody change from 2018 to 2022 was examined as a time series. The short processing time achievable through JavaScript-based coding, together with GEE's strengths in spatiotemporal analysis and visualization of results, is expected to make it useful in the water disaster field. The range of applications is also expected to expand through connection with various other satellite big data in the future.
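
A minimal sketch of the Otsu-based water masking step this abstract describes, assuming a Sentinel-1 VV backscatter array (in dB) has already been exported from GEE. The authors work in GEE JavaScript; this is an illustrative offline equivalent using scikit-image's Otsu implementation, not their code.

```python
# Otsu thresholding on SAR backscatter: open water returns little energy,
# so pixels below the threshold are treated as water.
import numpy as np
from skimage.filters import threshold_otsu

def water_mask_from_sar(backscatter_db: np.ndarray) -> np.ndarray:
    """Boolean mask of water pixels (low backscatter) via Otsu thresholding."""
    threshold = threshold_otsu(backscatter_db)
    return backscatter_db < threshold

def flooded_area_km2(mask: np.ndarray, pixel_size_m: float = 10.0) -> float:
    """Convert the water mask to an area estimate (Sentinel-1 GRD pixels are ~10 m)."""
    return mask.sum() * (pixel_size_m ** 2) / 1e6

# Synthetic scene standing in for a real flood image exported from GEE.
rng = np.random.default_rng(0)
scene = np.where(rng.random((500, 500)) < 0.2, -22.0, -8.0) + rng.normal(0, 1, (500, 500))
mask = water_mask_from_sar(scene)
print(f"Estimated water extent: {flooded_area_km2(mask):.1f} km^2")
```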

Application of Dimensional Expansion and Reduction to Earthquake Catalog for Machine Learning Analysis (기계학습 분석을 위한 차원 확장과 차원 축소가 적용된 지진 카탈로그)

  • Jang, Jinsu; So, Byung-Dal
    • The Journal of Engineering Geology, v.32 no.3, pp.377-388, 2022
  • Recently, several studies have utilized machine learning to efficiently and accurately analyze seismic data, which are increasing exponentially. In this study, we expand earthquake information such as occurrence time, hypocentral location, and magnitude to produce a dataset suitable for machine learning, and then reduce the dimension of the expanded data to dominant features through principal component analysis. The dimensionally expanded data comprise statistics of earthquake information from the Global Centroid Moment Tensor catalog, which contains 36,699 seismic events. We preprocess the data using standard and max-min scaling and extract dominant features from the scaled datasets with principal component analysis. The scaling methods significantly reduce the deviation of feature values caused by differing units; among them, standard scaling transforms the median of each feature with a smaller deviation than the other methods. Six principal components extracted from the non-scaled dataset explain 99% of the original data, whereas sixteen principal components are needed to reconstruct 98% of the datasets to which standardization or max-min scaling has been applied. These results indicate that more principal components are needed to preserve the original information when feature values are evenly distributed. We propose a data processing method for building efficient and accurate machine learning models to analyze the relationship between seismic data and seismic behavior.
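
A sketch of the scaling-versus-PCA comparison outlined above, using scikit-learn. The feature matrix here is synthetic with deliberately mixed units; the paper's features come from the Global Centroid Moment Tensor catalog and are not reproduced.

```python
# Compare how many principal components are needed to reach a target explained
# variance under no scaling, standard scaling, and min-max scaling.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20)) * rng.uniform(0.1, 100.0, size=20)  # mixed units

def components_for_variance(X, target=0.99):
    """Number of principal components needed to explain `target` of the variance."""
    pca = PCA().fit(X)
    return int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), target) + 1)

print("no scaling :", components_for_variance(X))
print("standard   :", components_for_variance(StandardScaler().fit_transform(X)))
print("min-max    :", components_for_variance(MinMaxScaler().fit_transform(X)))
```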

Development of Registration Post-Processing Technology to Homogenize the Density of the Scan Data of Earthwork Sites (토공현장 스캔데이터 밀도 균일화를 위한 정합 후처리 기술 개발)

  • Kim, Yonggun; Park, Suyeul; Kim, Seok
    • KSCE Journal of Civil and Environmental Engineering Research, v.42 no.5, pp.689-699, 2022
  • Recently, productivity has improved markedly in various industries thanks to the application of advanced technologies, but improvements in the construction industry have been comparatively small. Research on advanced technology for construction is therefore being pursued rapidly to overcome this low productivity. Among these technologies, 3D scanning is widely used to create 3D digital terrain models of construction sites. In particular, the 3D digital terrain model provides the base data for construction automation processes such as earthwork machine guidance and machine control. The quality of a 3D digital terrain model is strongly influenced not only by the performance of the 3D scanner and the acquisition environment, but also by the denoising, registration, and merging steps, which form the preprocessing stage for creating the model after the terrain scan data are acquired. It is therefore necessary to improve the performance of terrain scan data processing. This study addresses the problem of density inhomogeneity in terrain scan data that arises during this preprocessing stage. It proposes a 'pixel-based point cloud comparison algorithm' and verifies its performance using terrain scan data obtained at an actual earthwork site.
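
One plausible reading of a pixel-based density check for terrain scan data, offered only as an illustrative sketch and not the authors' algorithm: rasterize the point cloud onto an XY grid and inspect per-cell point counts to quantify inhomogeneity.

```python
# Rasterize a point cloud onto a pixel grid and measure density spread per cell.
import numpy as np

def per_pixel_density(points_xyz: np.ndarray, cell_size: float = 0.5) -> np.ndarray:
    """2D histogram of point counts per grid cell over the XY extent of the scan."""
    x, y = points_xyz[:, 0], points_xyz[:, 1]
    x_edges = np.arange(x.min(), x.max() + cell_size, cell_size)
    y_edges = np.arange(y.min(), y.max() + cell_size, cell_size)
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    return counts

def inhomogeneity_ratio(counts: np.ndarray) -> float:
    """Ratio of densest to sparsest non-empty cell; 1.0 means perfectly uniform."""
    nonzero = counts[counts > 0]
    return float(nonzero.max() / nonzero.min())

# Synthetic scan standing in for real earthwork-site data.
points = np.random.default_rng(1).uniform(0, 50, size=(200_000, 3))
grid = per_pixel_density(points)
print(f"density spread: {inhomogeneity_ratio(grid):.2f}x")
```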

Analysis of Research Trends in Tax Compliance using Topic Modeling (토픽모델링을 활용한 조세순응 연구 동향 분석)

  • Kang, Min-Jo; Baek, Pyoung-Gu
    • The Journal of the Korea Contents Association, v.22 no.1, pp.99-115, 2022
  • In this study, domestic academic journal papers on tax compliance, tax consciousness, and faithful tax payment (hereinafter "tax compliance") were comprehensively analyzed from an interdisciplinary perspective, as tax compliance is a representative research topic in the field of tax science. To achieve this purpose, a topic modeling technique was applied as part of text mining. Following a workflow of data collection, keyword preprocessing, and topic model analysis, potential research topics were derived from the tax compliance-related keywords registered by the researchers of a total of 347 papers. The results can be summarized as follows. First, in the keyword analysis, keywords such as tax investigation, tax avoidance, and the honest tax reporting system appeared in the top five both by simple term frequency and by TF-IDF, which weights the relative importance of keywords. By contrast, the keyword tax evasion ranked among the top keywords by TF-IDF but was not prominent in simple term frequency. Second, eight potential research topics were derived through topic modeling: (1) tax fairness and the suppression of tax offenses, (2) the ideology of tax law and the validity of tax policies, (3) the principle of substance over form and the guarantee of tax receivables, (4) tax compliance costs and tax administration services, (5) the self-assessment tax return system and tax experts, (6) tax climate and strategic tax behavior, (7) multifaceted tax behavior and differential compliance intentions, and (8) tax information systems and tax resource management. By examining the various perspectives on tax compliance from an interdisciplinary standpoint, the study comprehensively captures past research trends on tax compliance and suggests directions for future research.
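
A sketch of the TF-IDF ranking and topic-modeling pipeline described above, using scikit-learn. The keyword strings are placeholders for the author keywords of the 347 papers, and n_components=8 simply mirrors the eight topics reported.

```python
# TF-IDF keyword weighting plus LDA topic modeling over per-paper keyword strings.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "tax investigation tax avoidance honest tax reporting",
    "tax evasion tax compliance cost tax administration service",
    "substance over form tax receivable tax fairness",
    # ... one keyword string per paper
]

# TF-IDF highlights keywords that are distinctive to individual papers.
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(docs)

# LDA over raw term counts.
counts = CountVectorizer()
term_counts = counts.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=8, random_state=0).fit(term_counts)

terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top_terms)}")
```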

Generative optical flow based abnormal object detection method using a spatio-temporal translation network

  • Lim, Hyunseok; Gwak, Jeonghwan
    • Journal of the Korea Society of Computer and Information, v.26 no.4, pp.11-19, 2021
  • An abnormal object refers to a person, object, or mechanical device that exhibits abnormal or unusual behavior and therefore requires observation or supervision. To detect such objects with an artificial intelligence algorithm, without continuous human intervention, methods that observe the distinctiveness of temporal features using the optical flow technique are widely used. In this study, an abnormal situation is identified by training an algorithm that translates an input image frame into an optical flow image using a Generative Adversarial Network (GAN). In particular, we propose techniques that improve the preprocessing step, to exclude unnecessary outliers, and the post-processing step, to increase identification accuracy on the test dataset after training, thereby improving the model's ability to identify abnormal behavior. The UCSD Pedestrian and UMN Unusual Crowd Activity datasets were used for training. The proposed method achieved a frame-level AUC of 0.9450 and an EER of 0.1317 on the UCSD Ped2 dataset, an improvement over the models reported in previous studies.
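
A sketch of how frame-level AUC and EER, the metrics reported above, are computed from per-frame anomaly scores. The scores here are synthetic; in the paper they would come from the GAN's optical-flow reconstruction error.

```python
# Frame-level AUC and EER from anomaly scores using scikit-learn's ROC curve.
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(7)
labels = np.concatenate([np.zeros(800), np.ones(200)])   # 0 = normal, 1 = abnormal frame
scores = np.concatenate([rng.normal(0.2, 0.10, 800),
                         rng.normal(0.7, 0.15, 200)])    # stand-in anomaly scores

fpr, tpr, _ = roc_curve(labels, scores)
frame_auc = auc(fpr, tpr)

# EER: operating point where false positive rate equals false negative rate (1 - TPR).
fnr = 1 - tpr
eer_index = np.nanargmin(np.abs(fpr - fnr))
eer = (fpr[eer_index] + fnr[eer_index]) / 2

print(f"frame-level AUC = {frame_auc:.4f}, EER = {eer:.4f}")
```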

A Study on the Development of a Fire Site Risk Prediction Model based on Initial Information using Big Data Analysis (빅데이터 분석을 활용한 초기 정보 기반 화재현장 위험도 예측 모델 개발 연구)

  • Kim, Do Hyoung; Jo, Byung wan
    • Journal of the Society of Disaster Information, v.17 no.2, pp.245-253, 2021
  • Purpose: This study develops a risk prediction model that estimates the risk of a fire site from initial information such as building information and information obtained from the person reporting the fire, supporting effective mobilization of firefighting resources and the establishment of damage-minimization strategies for an appropriate response in the early stage of a disaster. Method: To identify the variables in the fire statistics data related to the scale of fire damage, a correlation analysis between variables was performed and predictability was examined using machine learning, and a training dataset was constructed through preprocessing such as data standardization and discretization. Using this dataset, we tested several machine learning algorithms regarded as having high prediction accuracy and developed a risk prediction model based on the algorithm with the highest accuracy. Result: In the performance tests, the random forest algorithm achieved the highest accuracy, and accuracy was confirmed to be relatively high for the intermediate risk classes. Conclusion: The accuracy of the prediction model was limited by bias in the damage-scale data of the fire statistics, and data refinement through record matching and imputation of missing values is necessary to improve predictive performance.
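
A sketch of a preprocessing-plus-random-forest pipeline of the kind described above (standardization, discretization, model selection). The fire statistics data are not available here, so synthetic features and an illustrative risk label stand in.

```python
# Standardize some features, discretize others, and train a random forest classifier.
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 6))  # stand-ins for building area, floors, report delay, ...
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 1, 2000) > 0.8).astype(int)  # risk class

preprocess = ColumnTransformer([
    ("scaled", StandardScaler(), [0, 1, 2]),                              # continuous features
    ("binned", KBinsDiscretizer(n_bins=5, encode="ordinal"), [3, 4, 5]),  # discretized features
])
model = Pipeline([("prep", preprocess),
                  ("rf", RandomForestClassifier(n_estimators=200, random_state=0))])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.3f}")
```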

Efficient CT Image Denoising Using Deformable Convolutional AutoEncoder Model

  • Eon Seung, Seong; Seong Hyun, Han; Ji Hye, Heo; Dong Hoon, Lim
    • Journal of the Korea Society of Computer and Information, v.28 no.3, pp.25-33, 2023
  • Noise generated during the acquisition and transmission of CT images degrades image quality, so noise removal is an important preprocessing step in image processing. In this paper, we remove noise using a deformable convolutional autoencoder (DeCAE) model, in which deformable convolution operations replace the standard convolution operations of the conventional convolutional autoencoder (CAE) model. The deformable convolution operation can extract image features over a more flexible area than conventional convolution. The proposed DeCAE model has the same encoder-decoder structure as the existing CAE model, but for efficient noise removal the encoder is composed of deformable convolutional layers while the decoder uses conventional convolutional layers. To evaluate the performance of the proposed DeCAE model, experiments were conducted on CT images corrupted by various types of noise: Gaussian, impulse, and Poisson noise. The DeCAE model showed superior results, both qualitatively and quantitatively in terms of MAE (Mean Absolute Error), PSNR (Peak Signal-to-Noise Ratio), and SSIM (Structural Similarity Index Measure), compared with traditional filters (the Mean, Median, Bilateral, and NL-means filters) as well as the existing CAE model.
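
A minimal sketch of the kind of deformable-convolution encoder block a DeCAE encoder could be built from, using torchvision.ops.DeformConv2d with offsets predicted by a regular convolution. Channel widths and layer structure are illustrative assumptions, not the paper's architecture.

```python
# One deformable-convolution encoder block: a plain conv predicts sampling offsets,
# and DeformConv2d samples the input at those deformed positions.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformEncoderBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # Predict 2 offsets (x, y) for each of the k*k kernel sample positions.
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_conv(x)            # learned deformation of the sampling grid
        return self.act(self.deform_conv(x, offsets))

block = DeformEncoderBlock(1, 32)
noisy_ct = torch.randn(1, 1, 128, 128)           # stand-in for a noisy CT slice
print(block(noisy_ct).shape)                     # torch.Size([1, 32, 128, 128])
```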

Analyzing the Phenomena of Hate in Korea by Text Mining Techniques (텍스트마이닝 기법을 이용한 한국 사회의 혐오 양상 분석)

  • Hea-Jin, Kim
    • Journal of the Korean Society for Library and Information Science, v.56 no.4, pp.431-453, 2022
  • Hate is a collective expression of exclusion toward others, fostered and reproduced through false public perception. This study explores the targets and issues of hate discussed in our society using text mining techniques. To this end, we collected 17,867 news articles published from 1990 to 2020, constructed a co-word network, and performed cluster analysis. To derive a co-word network strongly related to hate, in the preprocessing phase we split the articles into sentences and extracted a total of 52,520 sentences containing the words 'hate', 'prejudice', and 'discrimination'. Word frequency analysis of the collected news data showed that the subjects appearing most often in relation to hate in our society were women, race, and sexual minorities, and that the associated issues concerned related laws and crimes. Cluster analysis of the co-word network yielded six hate-related clusters. The largest cluster was 'genderphobic', accounting for 41.4% of the total, followed by 'sexual minority hatred' at 28.7%, 'racial hatred' at 15.1%, 'selective hatred' at 8.5%, 'political hatred' at 5.7%, and 'environmental hatred' at 0.3%. In the discussion, we additionally extracted all specific names of hate targets from the collected news data, which were not explicitly revealed by the cluster analysis.
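
A sketch of the co-word network construction and clustering step described above, using networkx with modularity-based community detection. The tokenized sentences are placeholders for the 52,520 hate-related sentences extracted in the study.

```python
# Build a sentence-level co-occurrence (co-word) network and cluster it.
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

sentences = [
    ["hate", "women", "crime"],
    ["discrimination", "race", "law"],
    ["prejudice", "sexual_minority", "hate"],
    ["hate", "crime", "law"],
]

graph = nx.Graph()
for tokens in sentences:
    for a, b in combinations(sorted(set(tokens)), 2):   # co-occurrence within a sentence
        weight = graph[a][b]["weight"] + 1 if graph.has_edge(a, b) else 1
        graph.add_edge(a, b, weight=weight)

clusters = greedy_modularity_communities(graph, weight="weight")
for i, cluster in enumerate(clusters):
    print(f"cluster {i}: {sorted(cluster)}")
```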

An Analysis Model Study on the Vulnerability in the Infectious Disease Spread of Public-use Facilities neighboring Senior Leisure Welfare Facilities (노인여가복지시설 주변 다중이용시설에서의 감염병 확산 취약성 분석 모델에 관한 연구)

  • Kim, Mijung; Kweon, Jihoon
    • Journal of The Korea Institute of Healthcare Architecture, v.28 no.4, pp.41-50, 2022
  • Purpose: This study proposes an analysis model for finding the relationship between the building-scale characteristics of public-use facilities around senior leisure welfare facilities and infectious disease outbreaks, and for identifying the features, and their spatial scopes, on which quarantine resources should be concentrated. Methods: A review of previous studies identified the user characteristics of senior leisure welfare facilities and the scale characteristics of urban buildings. Building data and infectious disease outbreak data for the analysis area were collected and preprocessed, and from these, attributes of building size and the frequency of infectious disease outbreaks in public-use facilities around senior leisure welfare facilities were derived. A computing algorithm was implemented to analyze the correlation between building-size characteristics and outbreak frequency as the spatial scope changes. Results: First, the proposed model analyzes how the correlation between infection frequency and the number of senior leisure welfare facilities, the number of public-use facilities, building area, total floor area, site area, height, building-to-land ratio, and floor area ratio varies with the spatial scope. Second, the correlation results for these attributes indeed varied with the spatial scope. Third, a negative correlation appeared between the number of senior leisure welfare facilities and infection frequency, while noticeable positive correlations appeared for the number of public-use facilities, building area, total floor area, height, building-to-land ratio, and floor area ratio. Implications: By analyzing the relationship between public-use facilities around senior leisure welfare facilities and the spread of infectious diseases, this study can serve as primary data for allocating limited quarantine resources. It also suggests that infectious disease prevention measures should consider both the spatial scope of the analysis area and the size of buildings.
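
A sketch of the scope-dependent correlation analysis described above: building attributes within a varying radius of a senior leisure welfare facility are correlated with infection frequency. The data, radii, and column names are illustrative stand-ins, not the study's dataset.

```python
# Correlate building-scale attributes with infection frequency while the spatial
# scope (radius around the welfare facility) changes.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
buildings = pd.DataFrame({
    "dist_m": rng.uniform(0, 1000, 500),          # distance to the welfare facility
    "total_floor_area": rng.uniform(100, 20000, 500),
    "height": rng.uniform(3, 60, 500),
    "floor_area_ratio": rng.uniform(50, 900, 500),
    "infection_freq": rng.poisson(2, 500),
})

for radius in (100, 300, 500, 1000):              # the changing spatial scope
    subset = buildings[buildings["dist_m"] <= radius]
    corr = subset.drop(columns="dist_m").corr()["infection_freq"].drop("infection_freq")
    print(f"radius {radius:>4} m:", corr.round(2).to_dict())
```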

Implementation of CNN-based Classification Training Model for Unstructured Fashion Image Retrieval using Preprocessing with MASK R-CNN (비정형 패션 이미지 검색을 위한 MASK R-CNN 선형처리 기반 CNN 분류 학습모델 구현)

  • Seunga, Cho; Hayoung, Lee; Hyelim, Jang; Kyuri, Kim; Hyeon-Ji, Lee; Bong-Ki, Son; Jaeho, Lee
    • Journal of Korea Society of Industrial Information Systems, v.27 no.6, pp.13-23, 2022
  • In this paper, we propose a detailed component image classification algorithm for fashion items, aimed at unstructured data retrieval in the fashion field. With the COVID-19 environment, AI-based online shopping malls have recently been increasing, but there are limits to accurate unstructured data search using existing keyword search and personalized style recommendations based on user browsing behavior. In this study, preprocessing with Mask R-CNN was applied to images crawled from online shopping sites, and the components of each fashion item were then classified with a CNN. We obtained classification accuracies of 93.28% for shirt collar, 98.10% for shirt pattern, and 91.73% for the three-class jeans fit. We further obtained 81.59% for the four-class jeans fit and 93.91% for jeans color. For the decorated attributes, we also obtained accuracies of 91.20% for jeans washing and 92.96% for jeans damage.
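
A sketch of the Mask R-CNN preprocessing step described above: detect garment regions with a pretrained torchvision Mask R-CNN, crop them, and hand the crops to a CNN classifier. The score threshold, crop handling, and four-class classifier head are illustrative assumptions; the paper's trained models are not reproduced here.

```python
# Detect garment regions with a pretrained Mask R-CNN, then classify each crop.
import torch
from torchvision.models import resnet18
from torchvision.models.detection import maskrcnn_resnet50_fpn

detector = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
classifier = resnet18(num_classes=4)              # e.g. 4 collar classes (illustrative)

image = torch.rand(3, 512, 512)                   # stand-in for a crawled product photo

with torch.no_grad():
    detections = detector([image])[0]

crops = []
for box, score in zip(detections["boxes"], detections["scores"]):
    if score < 0.7:                               # keep confident detections only
        continue
    x1, y1, x2, y2 = box.int().tolist()
    crops.append(image[:, y1:y2, x1:x2])

# Each crop would be resized and passed to the component classifier.
if crops:
    resized = torch.nn.functional.interpolate(crops[0].unsqueeze(0), size=(224, 224))
    logits = classifier(resized)
    print(logits.shape)                           # torch.Size([1, 4])
```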