• Title/Summary/Keyword: Feature engineering

Search Result 5,860, Processing Time 0.032 seconds

Multimodal Sentiment Analysis Using Review Data and Product Information (리뷰 데이터와 제품 정보를 이용한 멀티모달 감성분석)

  • Hwang, Hohyun;Lee, Kyeongchan;Yu, Jinyi;Lee, Younghoon
    • The Journal of Society for e-Business Studies
    • /
    • v.27 no.1
    • /
    • pp.15-28
    • /
    • 2022
  • Due to recent expansion of online market such as clothing, utilizing customer review has become a major marketing measure. User review has been used as a tool of analyzing sentiment of customers. Sentiment analysis can be largely classified with machine learning-based and lexicon-based method. Machine learning-based method is a learning classification model referring review and labels. As research of sentiment analysis has been developed, multi-modal models learned by images and video data in reviews has been studied. Characteristics of words in reviews are differentiated depending on products' and customers' categories. In this paper, sentiment is analyzed via considering review data and metadata of products and users. Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Self Attention-based Multi-head Attention models and Bidirectional Encoder Representation from Transformer (BERT) are used in this study. Same Multi-Layer Perceptron (MLP) model is used upon every products information. This paper suggests a multi-modal sentiment analysis model that simultaneously considers user reviews and product meta-information.

Adversarial Learning-Based Image Correction Methodology for Deep Learning Analysis of Heterogeneous Images (이질적 이미지의 딥러닝 분석을 위한 적대적 학습기반 이미지 보정 방법론)

  • Kim, Junwoo;Kim, Namgyu
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.11
    • /
    • pp.457-464
    • /
    • 2021
  • The advent of the big data era has enabled the rapid development of deep learning that learns rules by itself from data. In particular, the performance of CNN algorithms has reached the level of self-adjusting the source data itself. However, the existing image processing method only deals with the image data itself, and does not sufficiently consider the heterogeneous environment in which the image is generated. Images generated in a heterogeneous environment may have the same information, but their features may be expressed differently depending on the photographing environment. This means that not only the different environmental information of each image but also the same information are represented by different features, which may degrade the performance of the image analysis model. Therefore, in this paper, we propose a method to improve the performance of the image color constancy model based on Adversarial Learning that uses image data generated in a heterogeneous environment simultaneously. Specifically, the proposed methodology operates with the interaction of the 'Domain Discriminator' that predicts the environment in which the image was taken and the 'Illumination Estimator' that predicts the lighting value. As a result of conducting an experiment on 7,022 images taken in heterogeneous environments to evaluate the performance of the proposed methodology, the proposed methodology showed superior performance in terms of Angular Error compared to the existing methods.

A Study on Lightweight CNN-based Interpolation Method for Satellite Images (위성 영상을 위한 경량화된 CNN 기반의 보간 기술 연구)

  • Kim, Hyun-ho;Seo, Doochun;Jung, JaeHeon;Kim, Yongwoo
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.2
    • /
    • pp.167-177
    • /
    • 2022
  • In order to obtain satellite image products using the image transmitted to the ground station after capturing the satellite images, many image pre/post-processing steps are involved. During the pre/post-processing, when converting from level 1R images to level 1G images, geometric correction is essential. An interpolation method necessary for geometric correction is inevitably used, and the quality of the level 1G images is determined according to the accuracy of the interpolation method. Also, it is crucial to speed up the interpolation algorithm by the level processor. In this paper, we proposed a lightweight CNN-based interpolation method required for geometric correction when converting from level 1R to level 1G. The proposed method doubles the resolution of satellite images and constructs a deep learning network with a lightweight deep convolutional neural network for fast processing speed. In addition, a feature map fusion method capable of improving the image quality of multispectral (MS) bands using panchromatic (PAN) band information was proposed. The images obtained through the proposed interpolation method improved by about 0.4 dB for the PAN image and about 4.9 dB for the MS image in the quantitative peak signal-to-noise ratio (PSNR) index compared to the existing deep learning-based interpolation methods. In addition, it was confirmed that the time required to acquire an image that is twice the resolution of the 36,500×36,500 input image based on the PAN image size is improved by about 1.6 times compared to the existing deep learning-based interpolation method.

Code Coverage Measurement in Configurable Software Product Line Testing (구성가능한 소프트웨어 제품라인 시험에서 코드 커버리지 측정)

  • Han, Soobin;Lee, Jihyun;Go, Seoyeon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.7
    • /
    • pp.273-282
    • /
    • 2022
  • Testing approaches for configurable software product lines differs significantly from a single software testing, as it requires consideration of common parts used by all member products of a product line and variable parts shared by some or a single product. Test coverage is a measure of the adequacy of testing performed. Test coverage measurements are important to evaluate the adequacy of testing at the software product line level, as there can be hundreds of member products produced from configurable software product lines. This paper proposes a method for measuring code coverage at the product line level in configurable software product lines. The proposed method tests the member products of a product line after hierarchizing member products based on the inclusion relationship of the selected features, and quantifies SPL(Software Product Line) test coverage by synthesizing the test coverage of each product. As a result of applying the proposed method to 11 configurable software product line cases, we confirmed that the proposed method could quantitatively visualize how thoroughly the SPL testing was performed to help verify the adequacy of the SPL testing. In addition, we could check whether the newly performed testing for a member product covers the newly added code parts of a feature.

A Thoracic Spine Segmentation Technique for Automatic Extraction of VHS and Cobb Angle from X-ray Images (X-ray 영상에서 VHS와 콥 각도 자동 추출을 위한 흉추 분할 기법)

  • Ye-Eun, Lee;Seung-Hwa, Han;Dong-Gyu, Lee;Ho-Joon, Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.1
    • /
    • pp.51-58
    • /
    • 2023
  • In this paper, we propose an organ segmentation technique for the automatic extraction of medical diagnostic indicators from X-ray images. In order to calculate diagnostic indicators of heart disease and spinal disease such as VHS(vertebral heart scale) and Cobb angle, it is necessary to accurately segment the thoracic spine, carina, and heart in a chest X-ray image. A deep neural network model in which the high-resolution representation of the image for each layer and the structure converted into a low-resolution feature map are connected in parallel was adopted. This structure enables the relative position information in the image to be effectively reflected in the segmentation process. It is shown that learning performance can be improved by combining the OCR module, in which pixel information and object information are mutually interacted in a multi-step process, and the channel attention module, which allows each channel of the network to be reflected as different weight values. In addition, a method of augmenting learning data is presented in order to provide robust performance against changes in the position, shape, and size of the subject in the X-ray image. The effectiveness of the proposed theory was evaluated through an experiment using 145 human chest X-ray images and 118 animal X-ray images.

Deep Learning Similarity-based 1:1 Matching Method for Real Product Image and Drawing Image

  • Han, Gi-Tae
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.12
    • /
    • pp.59-68
    • /
    • 2022
  • This paper presents a method for 1:1 verification by comparing the similarity between the given real product image and the drawing image. The proposed method combines two existing CNN-based deep learning models to construct a Siamese Network. After extracting the feature vector of the image through the FC (Fully Connected) Layer of each network and comparing the similarity, if the real product image and the drawing image (front view, left and right side view, top view, etc) are the same product, the similarity is set to 1 for learning and, if it is a different product, the similarity is set to 0. The test (inference) model is a deep learning model that queries the real product image and the drawing image in pairs to determine whether the pair is the same product or not. In the proposed model, through a comparison of the similarity between the real product image and the drawing image, if the similarity is greater than or equal to a threshold value (Threshold: 0.5), it is determined that the product is the same, and if it is less than or equal to, it is determined that the product is a different product. The proposed model showed an accuracy of about 71.8% for a query to a product (positive: positive) with the same drawing as the real product, and an accuracy of about 83.1% for a query to a different product (positive: negative). In the future, we plan to conduct a study to improve the matching accuracy between the real product image and the drawing image by combining the parameter optimization study with the proposed model and adding processes such as data purification.

Comparative Study of Anomaly Detection Accuracy of Intrusion Detection Systems Based on Various Data Preprocessing Techniques (다양한 데이터 전처리 기법 기반 침입탐지 시스템의 이상탐지 정확도 비교 연구)

  • Park, Kyungseon;Kim, Kangseok
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.11
    • /
    • pp.449-456
    • /
    • 2021
  • An intrusion detection system is a technology that detects abnormal behaviors that violate security, and detects abnormal operations and prevents system attacks. Existing intrusion detection systems have been designed using statistical analysis or anomaly detection techniques for traffic patterns, but modern systems generate a variety of traffic different from existing systems due to rapidly growing technologies, so the existing methods have limitations. In order to overcome this limitation, study on intrusion detection methods applying various machine learning techniques is being actively conducted. In this study, a comparative study was conducted on data preprocessing techniques that can improve the accuracy of anomaly detection using NGIDS-DS (Next Generation IDS Database) generated by simulation equipment for traffic in various network environments. Padding and sliding window were used as data preprocessing, and an oversampling technique with Adversarial Auto-Encoder (AAE) was applied to solve the problem of imbalance between the normal data rate and the abnormal data rate. In addition, the performance improvement of detection accuracy was confirmed by using Skip-gram among the Word2Vec techniques that can extract feature vectors of preprocessed sequence data. PCA-SVM and GRU were used as models for comparative experiments, and the experimental results showed better performance when sliding window, skip-gram, AAE, and GRU were applied.

Performance Analysis of Trading Strategy using Gradient Boosting Machine Learning and Genetic Algorithm

  • Jang, Phil-Sik
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.11
    • /
    • pp.147-155
    • /
    • 2022
  • In this study, we developed a system to dynamically balance a daily stock portfolio and performed trading simulations using gradient boosting and genetic algorithms. We collected various stock market data from stocks listed on the KOSPI and KOSDAQ markets, including investor-specific transaction data. Subsequently, we indexed the data as a preprocessing step, and used feature engineering to modify and generate variables for training. First, we experimentally compared the performance of three popular gradient boosting algorithms in terms of accuracy, precision, recall, and F1-score, including XGBoost, LightGBM, and CatBoost. Based on the results, in a second experiment, we used a LightGBM model trained on the collected data along with genetic algorithms to predict and select stocks with a high daily probability of profit. We also conducted simulations of trading during the period of the testing data to analyze the performance of the proposed approach compared with the KOSPI and KOSDAQ indices in terms of the CAGR (Compound Annual Growth Rate), MDD (Maximum Draw Down), Sharpe ratio, and volatility. The results showed that the proposed strategies outperformed those employed by the Korean stock market in terms of all performance metrics. Moreover, our proposed LightGBM model with a genetic algorithm exhibited competitive performance in predicting stock price movements.

A Study on AR Algorithm Modeling for Indoor Furniture Interior Arrangement Using CNN

  • Ko, Jeong-Beom;Kim, Joon-Yong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.10
    • /
    • pp.11-17
    • /
    • 2022
  • In this paper, a model that can increase the efficiency of work in arranging interior furniture by applying augmented reality technology was studied. In the existing system to which augmented reality is currently applied, there is a problem in that information is limitedly provided depending on the size and nature of the company's product when outputting the image of furniture. To solve this problem, this paper presents an AR labeling algorithm. The AR labeling algorithm extracts feature points from the captured images and builds a database including indoor location information. A method of detecting and learning the location data of furniture in an indoor space was adopted using the CNN technique. Through the learned result, it is confirmed that the error between the indoor location and the location shown by learning can be significantly reduced. In addition, a study was conducted to allow users to easily place desired furniture through augmented reality by receiving detailed information about furniture along with accurate image extraction of furniture. As a result of the study, the accuracy and loss rate of the model were found to be 99% and 0.026, indicating the significance of this study by securing reliability. The results of this study are expected to satisfy consumers' satisfaction and purchase desires by accurately arranging desired furniture indoors through the design and implementation of AR labels.

Method of Biological Information Analysis Based-on Object Contextual (대상객체 맥락 기반 생체정보 분석방법)

  • Kim, Kyung-jun;Kim, Ju-yeon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.41-43
    • /
    • 2022
  • In order to prevent and block infectious diseases caused by the recent COVID-19 pandemic, non-contact biometric information acquisition and analysis technology is attracting attention. The invasive and attached biometric information acquisition method accurately has the advantage of measuring biometric information, but has a risk of increasing contagious diseases due to the close contact. To solve these problems, the non-contact method of extracting biometric information such as human fingerprints, faces, iris, veins, voice, and signatures with automated devices is increasing in various industries as data processing speed increases and recognition accuracy increases. However, although the accuracy of the non-contact biometric data acquisition technology is improved, the non-contact method is greatly influenced by the surrounding environment of the object to be measured, which is resulting in distortion of measurement information and poor accuracy. In this paper, we propose a context-based bio-signal modeling technique for the interpretation of personalized information (image, signal, etc.) for bio-information analysis. Context-based biometric information modeling techniques present a model that considers contextual and user information in biometric information measurement in order to improve performance. The proposed model analyzes signal information based on the feature probability distribution through context-based signal analysis that can maximize the predicted value probability.

  • PDF