• Title/Summary/Keyword: data pre-processing


Source Code Identification Using Deep Neural Network (심층신경망을 이용한 소스 코드 원작자 식별)

  • Rhim, Jisu;Abuhmed, Tamer
    • KIPS Transactions on Software and Data Engineering / v.8 no.9 / pp.373-378 / 2019
  • Because much program source code is openly available online, reckless plagiarism and copyright problems are occurring. Source code written repeatedly by the same author may carry a unique fingerprint arising from that author's programming habits. This paper identifies authors by training a deep neural network on Google Code Jam program sources. The authors' source code is vectorized using a pre-processing technique such as a prediction-based vector or a frequency-based approach (e.g., TF-IDF), and a deep neural network is then trained to identify the original author of a program. In addition, a language-independent learning system was built using the pre-processing step and compared with other existing learning methods. Among the models compared, those combining TF-IDF with deep neural networks performed better than those using other pre-processing or learning methods.
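
The frequency-based vectorization mentioned in this abstract can be illustrated with a minimal TF-IDF sketch. The tokenizer, the smoothed IDF formula, and the two sample snippets are assumptions chosen for illustration; the abstract does not specify the paper's actual features or settings.

```python
import math
import re
from collections import Counter

def tokenize(source):
    # Assumed tokenizer: identifiers/keywords, numbers, and single punctuation marks.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\w\s]", source)

def tfidf_vectors(sources):
    """Return (vocabulary, one TF-IDF vector per source string)."""
    docs = [Counter(tokenize(s)) for s in sources]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of sources that contain each token.
    df = {tok: sum(1 for doc in docs if tok in doc) for tok in vocab}
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vec = []
        for tok in vocab:
            tf = doc[tok] / total if total else 0.0
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1.0  # smoothed IDF (assumption)
            vec.append(tf * idf)
        vectors.append(vec)
    return vocab, vectors

# Hypothetical code snippets standing in for two authors' submissions.
samples = [
    "for (int i = 0; i < n; i++) sum += a[i];",
    "while (k --> 0) total = total + values[k];",
]
vocab, vectors = tfidf_vectors(samples)
```

Each resulting vector could then be fed to a classifier; tokens that appear in only one author's code receive a higher weight than tokens shared by all sources.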

A System for Thermal Distortion Analysis of Hull Structures by Solar Radiation (선체의 태양복사 열변형 해석을 위한 전처리시스템)

  • Ha, Yunsok;Lee, Donghoon
    • Journal of the Society of Naval Architects of Korea / v.53 no.4 / pp.275-281 / 2016
  • One of the most important factors in meeting a ship-production schedule with good quality is accuracy control. A ship is assembled by welding throughout the production process, so it is important to minimize losses from correction work by using engineering techniques such as reverse design, reverse setting, and margins for thermal shrinkage. These efforts are quite effective in the fabrication stages, but not in the erection stages. If a ship block made of common steel is exposed to directional solar radiation, its dimensions change considerably over time according to its thermal expansion coefficient. Therefore, measurement work is often done at dawn or in the evening, even when a very accurate device is available. In this study, an FE analysis method is developed to solve this problem. It can convert measured data affected by solar thermal distortion into data free of that distortion, even when the ship block is measured at an arbitrary time. It uses the time of measurement, the orientation of the block, and satellite weather records. The method is verified by comparing measured data of a ship block with the results of the proposed analysis. Furthermore, a pre-processing system is developed for fast application of the proposed analysis method.
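
To give a sense of the magnitudes involved, the sketch below estimates uniform linear thermal expansion. It is not the FE analysis described in the abstract; the expansion coefficient (about 12e-6 per kelvin, a typical value for carbon steel), the block length, and the temperature rise are all assumed values.

```python
def linear_expansion(length_m, delta_t_kelvin, alpha_per_kelvin=12e-6):
    """Uniform linear expansion: delta_L = alpha * L * delta_T.

    alpha_per_kelvin defaults to a typical value for carbon steel (assumption).
    """
    return alpha_per_kelvin * length_m * delta_t_kelvin

# Hypothetical case: a 20 m block warmed uniformly by 10 K.
growth_m = linear_expansion(length_m=20.0, delta_t_kelvin=10.0)
print(f"{growth_m * 1000:.2f} mm")
```

Real blocks heat unevenly under directional sunlight, which is why a uniform estimate like this is insufficient and a full analysis is needed.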

A Design and Implementation of Missing Person Identification System using face Recognition

  • Shin, Jong-Hwan;Park, Chan-Mi;Lee, Heon-Ju;Lee, Seoung-Hyeon;Lee, Jae-Kwang
    • Journal of the Korea Society of Computer and Information / v.26 no.2 / pp.19-25 / 2021
  • This paper proposes a method of finding missing persons based on face-recognition technology and deep learning. A real-time face-recognition system was developed that pre-processes images transmitted from a mobile device, performs face verification, and improves the accuracy of face identification through data augmentation and convolutional neural network (CNN)-based image learning. In identifying a missing person's image with the implemented system, the model trained on both original and blur-processed data performed best. Further, a model using the pre-trained Noisy Student outperformed one that did not use it, but it had the limitation of producing high bias and variance.

Effective Pre-rating Method Based on Users' Dichotomous Preferences and Average Ratings Fusion for Recommender Systems

  • Cheng, Shulin;Wang, Wanyan;Yang, Shan;Cheng, Xiufang
    • Journal of Information Processing Systems / v.17 no.3 / pp.462-472 / 2021
  • With an increase in the scale of recommender systems, users' rating data tend to be extremely sparse. Some methods have been utilized to alleviate this problem; nevertheless, it has not been satisfactorily solved yet. Therefore, we propose an effective pre-rating method based on users' dichotomous preferences and average ratings fusion. First, based on a user-item ratings matrix, a new user-item preference matrix was constructed to analyze and model user preferences. The items were then divided into two categories based on a parameterized dynamic threshold. The missing ratings for items that the user was not interested in were directly filled with the lowest user rating; otherwise, fusion ratings were utilized to fill the missing ratings. Further, an optimized parameter λ was introduced to adjust their weights. Finally, we verified our method on a standard dataset. The experimental results show that our method can effectively reduce the prediction error and improve the recommendation quality. As for its application, our method is effective, but not complicated.
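
The filling rule described in this abstract can be sketched as follows. The preference test (comparing the item's average rating with a fixed threshold) and the fusion formula (a λ-weighted mix of the user's and the item's average ratings) are hypothetical stand-ins; the abstract does not give the paper's actual definitions.

```python
def pre_rate(ratings, threshold, lam):
    """Fill missing ratings.

    ratings: dict user -> dict item -> rating; missing entries are absent.
    Assumes every user and every item has at least one rating.
    """
    items = sorted({i for r in ratings.values() for i in r})
    item_avg = {}
    for i in items:
        vals = [r[i] for r in ratings.values() if i in r]
        item_avg[i] = sum(vals) / len(vals)

    filled = {}
    for user, r in ratings.items():
        user_avg = sum(r.values()) / len(r)
        user_min = min(r.values())
        row = dict(r)
        for i in items:
            if i in row:
                continue
            if item_avg[i] < threshold:
                # Hypothetical "not interested" test: fill with the user's lowest rating.
                row[i] = user_min
            else:
                # Hypothetical fusion rating weighted by lam.
                row[i] = lam * user_avg + (1 - lam) * item_avg[i]
        filled[user] = row
    return filled

# Hypothetical toy ratings matrix.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4, "c": 1},
    "u3": {"b": 4, "c": 2},
}
filled = pre_rate(ratings, threshold=3.0, lam=0.5)
```

After filling, the matrix is dense and can be passed to a standard collaborative-filtering predictor.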

Feature selection-based Risk Prediction for Hypertension in Korean men (한국 남성의 고혈압에 대한 특징 선택 기반 위험 예측)

  • Dashdondov, Khongorzul;Kim, Mi-Hye
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.323-325 / 2021
  • In this article, we improve hypertension prediction by applying a feature selection method to Korean national health data from the KNHANES database. The study identified a variety of risk factors associated with chronic hypertension. The approach is divided into two modules. The first is a data pre-processing step that applies a factor analysis (FA) based feature selection method to the dataset. The second applies predictive analysis to detect and predict hypertension risk. We compare the mean standard error (MSE), F1-score, and area under the ROC curve (AUC) of each classification model. The test results show that the proposed FIFA-OE-NB algorithm achieves an MSE, F1-score, and AUC of 0.259, 0.460, and 64.70%, respectively. These results demonstrate that the proposed FIFA-OE method outperforms other models for hypertension risk prediction.

Investigation of light stimulated mouse brain activation in high magnetic field fMRI using image segmentation methods

  • Kim, Wook;Woo, Sang-Keun;Kang, Joo Hyun;Lim, Sang Moo
    • Journal of the Korea Society of Computer and Information / v.21 no.12 / pp.11-18 / 2016
  • Magnetic resonance imaging (MRI) is widely used in brain research and medical imaging. In particular, functional magnetic resonance imaging (fMRI), a non-invasive technique for imaging brain activation, is used in brain studies. In this study, we investigate brain activation evoked by LED light stimulation. To investigate brain activation in a small experimental animal, we used a high-field 9.4T MRI. The experimental animal was a Balb/c mouse, and fMRI was acquired with echo planar imaging (EPI). EPI takes less time than other MRI methods; for this reason, however, EPI data have low contrast. Because of the low contrast, image pre-processing is difficult and inaccurate. We planned the study protocol as what is called a block design in fMRI research. The block design has 8 LED light stimulation sessions and 8 rest sessions. Each block consists of 6 EPI images, and acquiring 1 slice of an EPI image takes 16 seconds. During a light session, LED light stimulation was applied for 1 minute 36 seconds. During a rest session, no light stimulation was applied and the light remained off for 1 minute 36 seconds. These sessions were repeated throughout the EPI scan, so the total EPI scan time was almost 26 minutes. After acquiring the EPI data, we analyzed the image data using statistical parametric mapping (SPM) software and performed image pre-processing such as realignment, co-registration, normalization, and smoothing. The pre-processing of fMRI data requires segmentation using this software. However, segmentation offers 3 different methods: Gaussian nonparametric, warped modulate, and tissue probability map. We applied these 3 methods and compared how they change the results of the fMRI analysis. The results show that LED light stimulation activated the superior colliculus region of the mouse brain. The highest activation value among the segmentation methods was obtained using the tissue probability map. This study may help to improve brain activation studies using EPI and SPM analysis.
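
The timing figures in this abstract can be checked with a short calculation. The stimulus vector at the end assumes that light and rest sessions alternate, starting with light; the abstract does not state the ordering.

```python
IMAGES_PER_BLOCK = 6      # EPI images per session (from the abstract)
SECONDS_PER_IMAGE = 16    # acquisition time per image (from the abstract)
LIGHT_SESSIONS = 8
REST_SESSIONS = 8

block_seconds = IMAGES_PER_BLOCK * SECONDS_PER_IMAGE               # 96 s = 1 min 36 s
total_seconds = (LIGHT_SESSIONS + REST_SESSIONS) * block_seconds   # 1536 s
total_minutes = total_seconds / 60                                 # 25.6, i.e. almost 26 min

# Assumed ordering: light block, then rest block, repeated 8 times.
# One entry per EPI image: 1 = light on, 0 = light off.
boxcar = ([1] * IMAGES_PER_BLOCK + [0] * IMAGES_PER_BLOCK) * LIGHT_SESSIONS

minutes, seconds = divmod(block_seconds, 60)
print(f"session: {minutes} min {seconds} s, scan: {total_minutes:.1f} min, images: {len(boxcar)}")
```

A vector like `boxcar` is the usual starting point for a block-design regressor in a statistical analysis of the EPI time series.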

A Design of RSIDS using Rough Set Theory and Support Vector Machine Algorithm (Rough Set Theory와 Support Vector Machine 알고리즘을 이용한 RSIDS 설계)

  • Lee, Byung-Kwan;Jeong, Eun-Hee
    • Journal of the Korea Society of Computer and Information / v.17 no.12 / pp.179-185 / 2012
  • This paper proposes a design of RSIDS (RST and SVM based Intrusion Detection System) using the RST (Rough Set Theory) and SVM (Support Vector Machine) algorithms. The RSIDS consists of a PrePro (PreProcessing) module, an RRG (RST based Rule Generation) module, and an SAD (SVM based Attack Detection) module. The PrePro module converts the collected information to the data format of RSIDS. The RRG module analyzes attack data, generates attack rules, extracts attack information from the massive data by using these rules, and transfers the extracted attack information to the SAD module. The SAD module detects attacks using this information and notifies a manager. Compared to the existing SVM, the RSIDS improved the average ADR (Attack Detection Ratio) from 77.71% to 85.28% and reduced the average FPR (False Positive Ratio) from 13.25% to 9.87%. Thus, the RSIDS is judged to be an improvement over the existing SVM.
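
The two reported metrics can be computed from confusion-matrix counts as sketched below. The definitions used here (ADR as the fraction of attacks detected, FPR as the fraction of normal records flagged) are the conventional ones and are assumed to match the paper's; the counts in the example are hypothetical.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (attack detection ratio, false positive ratio).

    tp/fn: attacks detected/missed; fp/tn: normal records flagged/passed.
    """
    adr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return adr, fpr

# Hypothetical counts, not taken from the paper.
adr, fpr = detection_metrics(tp=85, fn=15, fp=10, tn=90)

# Changes reported in the abstract, in percentage points.
adr_gain = round(85.28 - 77.71, 2)
fpr_drop = round(13.25 - 9.87, 2)
print(adr, fpr, adr_gain, fpr_drop)
```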

New Medical Image Fusion Approach with Coding Based on SCD in Wireless Sensor Network

  • Zhang, De-gan;Wang, Xiang;Song, Xiao-dong
    • Journal of Electrical Engineering and Technology / v.10 no.6 / pp.2384-2392 / 2015
  • The technical development and practical application of big data for health is a hot topic within the big-data field, and big-data medical image fusion is one of its key problems. This paper proposes a new fusion approach with coding based on the Spherical Coordinate Domain (SCD) in a Wireless Sensor Network (WSN) for big-data medical images. In this approach, the three high-frequency coefficients in the wavelet domain of a medical image are pre-processed. This pre-processing strategy can reduce the redundancy ratio of big-data medical images. First, the high-frequency coefficients are transformed to the spherical coordinate domain to reduce the correlation within the same scale. Then, a multi-scale model product (MSMP) is used to control the shrinkage function so that small wavelet coefficients and some noise are removed. The high-frequency parts in the spherical coordinate domain are coded by an improved SPIHT algorithm. Finally, the images are fused and reconstructed based on the multi-scale edges of the medical image. Experimental results indicate that the novel approach is effective and very useful for the transmission of big-data medical images, especially in a wireless environment.
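
The transform step can be illustrated by mapping a triple of high-frequency coefficients at one pixel to spherical coordinates and back. Treating the three sub-band values as (x, y, z) in that order is an assumption, as is the angle convention; the abstract does not specify either.

```python
import math

def to_spherical(x, y, z):
    """Map a coefficient triple to (r, theta, phi); theta is the polar angle."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # clamp guards against rounding
    phi = math.atan2(y, x)
    return r, theta, phi

def to_cartesian(r, theta, phi):
    """Inverse mapping back to the three coefficients."""
    return (
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )

# Hypothetical coefficient triple at a single pixel and scale.
r, theta, phi = to_spherical(3.0, -4.0, 12.0)
restored = to_cartesian(r, theta, phi)
```

In this representation the radius `r` gathers the joint magnitude of the three coefficients, so a shrinkage rule can act on one value per pixel while the angles preserve direction.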

Automated Geo-registration for Massive Satellite Image Processing

  • Heo, Joon;Park, Wan-Yong;Bang, Soo-Nam
    • 한국공간정보시스템학회:학술대회논문집 / 2005.05a / pp.345-349 / 2005
  • Processing massive amounts of satellite imagery, such as global or continental-level analysis and monitoring, requires automated and speedy geo-registration. There are two major automated approaches: (1) rigid mathematical modeling using a sensor model and ephemeris data; and (2) heuristic co-registration with respect to an existing reference image. In the case of ETM+, the accuracy of the first approach is known to be an RMSE of 250 m, which is far below the accuracy required for most satellite image processing. The second approach, on the other hand, finds identical points between the new image and the reference image and uses a heuristic regression model for registration. It shows better accuracy but is computationally expensive. To improve the efficiency of the co-registration approach, the author proposed a pre-qualified matching algorithm composed of feature extraction with the Canny operator and an area matching algorithm based on the correlation coefficient. With the pre-qualification approach, the computation time was significantly reduced and the registration accuracy was improved. A prototype was implemented and tested with the proposed algorithm. A performance test on 14 TM/ETM+ images in the U.S. showed: (1) the average RMSE of the approach was 0.47, depending on terrain and features; (2) the average number of matching points was over 15,000; (3) the processing time was 12 min per image with a 3.2 GHz Intel Pentium 4 and 1 GB of RAM.
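
The area matching step can be sketched as an exhaustive search for the patch with the highest correlation coefficient. This covers only the correlation part; the Canny-based pre-qualification that the abstract credits for the speed-up is omitted, and the tiny images are hypothetical.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # flat patch: correlation undefined, treated as no match
    return cov / math.sqrt(va * vb)

def patch(image, row, col, size):
    """Flatten the size x size window whose top-left corner is (row, col)."""
    return [image[r][c] for r in range(row, row + size) for c in range(col, col + size)]

def best_match(reference, image, ref_row, ref_col, size):
    """Return (score, (row, col)) of the window in image best matching the template."""
    template = patch(reference, ref_row, ref_col, size)
    best = (-2.0, None)
    for r in range(len(image) - size + 1):
        for c in range(len(image[0]) - size + 1):
            score = correlation(template, patch(image, r, c, size))
            if score > best[0]:
                best = (score, (r, c))
    return best

# Hypothetical data: the same 2x2 feature, shifted and brightened in the new image.
reference = [[0, 0, 0, 0], [0, 9, 3, 0], [0, 2, 8, 0], [0, 0, 0, 0]]
image = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 28, 16],
    [10, 10, 10, 14, 26],
    [10, 10, 10, 10, 10],
]
score, position = best_match(reference, image, ref_row=1, ref_col=1, size=2)
```

Because the correlation coefficient is invariant to linear brightness changes, the feature is still found after the gain and offset applied in the example. Restricting the search to pre-qualified feature locations is what would avoid the full scan shown here.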


Performance analysis on the geometric correction algorithms using GCPs - polynomial warping and full camera modelling algorithm

  • Shin, Dong-Seok;Lee, Young-Ran
    • Proceedings of the KSRS Conference / 1998.09a / pp.252-256 / 1998
  • Accurate mapping of satellite images is one of the most important parts of many remote sensing applications. Since the position and attitude of a satellite during image acquisition cannot be determined accurately enough, ground-mapping errors of several hundred meters are normal in systematically corrected images. Users who require pixel-level or sub-pixel-level mapping accuracy for high-resolution satellite images must use a number of Ground Control Points (GCPs). In this paper, the performance of two geometric correction algorithms is tested and compared. One is the polynomial warping algorithm, which is simple and popular enough to be implemented in most commercial satellite image processing software. The other is the full camera modelling algorithm using physical orbit-sensor-Earth geometry, which is used in satellite image data receiving, pre-processing, and distribution stations. Several criteria were considered for the performance analysis: ultimate correction accuracy, GCP representativeness, number of GCPs required, convergence speed, sensitivity to inaccurate GCPs, and usefulness of the correction results. This paper focuses on the usefulness of the precision correction algorithm for regular image pre-processing operations. This means that not only the final correction accuracy but also the number of GCPs and their spatial distribution required for an image correction are important factors. Both correction algorithms were implemented and will be used for the precision correction of KITSAT-3 images.
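
Polynomial warping can be sketched as a least-squares fit of image coordinates to ground coordinates at the GCPs. The first-order (affine) polynomial, the coordinate names, and the GCP values below are assumptions; the abstract does not state the polynomial order used in the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_first_order(gcps):
    """Fit east = a0 + a1*col + a2*row (and likewise north) by least squares.

    gcps: list of ((col, row), (east, north)); needs >= 3 non-collinear points.
    """
    basis = [[1.0, c, r] for (c, r), _ in gcps]
    # Normal equations: (B^T B) coef = B^T target
    btb = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    coefs = []
    for axis in range(2):
        bty = [sum(b[i] * target[axis] for b, (_, target) in zip(basis, gcps))
               for i in range(3)]
        coefs.append(solve_linear(btb, bty))
    return coefs

def warp(coefs, col, row):
    """Map an image position to ground coordinates with the fitted polynomial."""
    return tuple(c[0] + c[1] * col + c[2] * row for c in coefs)

# Hypothetical GCPs generated from east = 1000 + 30*col, north = 2000 - 30*row.
gcps = [((c, r), (1000.0 + 30.0 * c, 2000.0 - 30.0 * r))
        for c, r in [(0, 0), (100, 0), (0, 100), (100, 100), (50, 20)]]
coefs = fit_first_order(gcps)
east, north = warp(coefs, 10, 10)
```

With more GCPs than unknowns the fit averages out errors in individual points, which relates to the criteria on the number and spatial distribution of GCPs named in the abstract.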
