• Title/Summary/Keyword: vision-based method

A Study on Utilization of Vision Transformer for CTR Prediction (CTR 예측을 위한 비전 트랜스포머 활용에 관한 연구)

  • Kim, Tae-Suk;Kim, Seokhun;Im, Kwang Hyuk
    • Knowledge Management Research / v.22 no.4 / pp.27-40 / 2021
  • Click-through rate (CTR) prediction is a key function of a recommendation system: it determines the ranking of candidate items so that high-ranking items can be recommended, which reduces customer information overload and maximizes profit through sales promotion. The fields of natural language processing and image classification are growing remarkably through the use of deep neural networks. Recently, a transformer model based on an attention mechanism, which differs from the mainstream models in these fields, was proposed and achieved state-of-the-art results. In this study, we present a method for improving the performance of a transformer model for CTR prediction. To analyze how the discrete and categorical characteristics of CTR data, which differ from those of natural language and image data, affect performance, we perform experiments on embedding regularization and transformer normalization. The experimental results confirm that the prediction performance of the transformer improved significantly when L2 regularization was applied in the embedding process for CTR input data and when batch normalization was applied to the transformer model instead of layer normalization, its default normalization method.
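
The two ingredients named in this abstract, an L2 penalty on embedding weights and the choice between layer and batch normalization, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the `weight_decay` value, and the toy inputs are assumptions made for illustration, and the learned scale and shift parameters of real normalization layers are omitted.

```python
import math

def l2_penalty(embedding_table, weight_decay):
    """Sum of squared embedding weights scaled by a regularization strength."""
    return weight_decay * sum(w * w for row in embedding_table for w in row)

def _normalize(values, eps=1e-5):
    """Shift to zero mean and scale to unit variance (no learned scale or shift)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def layer_norm(batch):
    """Normalize each sample across its own features."""
    return [_normalize(sample) for sample in batch]

def batch_norm(batch):
    """Normalize each feature across the samples in the batch."""
    columns = [_normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]

# Toy batch: 2 samples x 3 features (hypothetical values).
batch = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
ln = layer_norm(batch)   # statistics computed per row
bn = batch_norm(batch)   # statistics computed per column
penalty = l2_penalty([[1.0, -2.0], [0.5, 0.0]], weight_decay=0.01)
```

The only difference between the two normalizers is the axis over which the mean and variance are taken, which is the design choice the abstract reports as mattering for categorical CTR inputs.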

Calibration of Thermal Camera with Enhanced Image (개선된 화질의 영상을 이용한 열화상 카메라 캘리브레이션)

  • Kim, Ju O;Lee, Deokwoo
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.4 / pp.621-628 / 2021
  • This paper proposes a method to calibrate a thermal camera with three different perspectives. The intrinsic parameters of the camera and the re-projection errors are provided to quantify the accuracy of the calibration result. The three lenses of the camera capture the same image, but they do not overlap, and the image resolution is lower than that of an RGB camera. In computer vision, camera calibration is one of the most important and fundamental tasks for calculating the distance between the camera(s) and a target object, or the three-dimensional (3D) coordinates of a point on a 3D object. Once calibration is complete, the intrinsic and extrinsic parameters of the camera(s) are available. The intrinsic parameters consist of the focal length, skew factor, and principal point, and the extrinsic parameters consist of the relative rotation and translation of the camera(s). This study estimated the intrinsic parameters of thermal cameras that have three lenses with different perspectives. In particular, image enhancement based on a deep learning algorithm was carried out to improve the quality of the calibration results. Experimental results are provided to substantiate the proposed method.
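
How the intrinsic parameters listed above (focal length, skew factor, principal point) map a 3D point to a pixel, and how a re-projection error is computed from that mapping, can be sketched as follows. This is a generic pinhole-model illustration, not the paper's procedure: the parameter values and point coordinates are hypothetical, lens distortion is ignored, and the points are assumed to be already expressed in the camera frame so that no extrinsic rotation or translation is needed.

```python
import math

def project(point_cam, fx, fy, skew, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    x, y, z = point_cam
    xn, yn = x / z, y / z                 # normalized image coordinates
    u = fx * xn + skew * yn + cx          # focal length, skew, principal point
    v = fy * yn + cy
    return (u, v)

def rms_reprojection_error(points_cam, observed_px, **intrinsics):
    """Root-mean-square pixel distance between projected and observed points."""
    sq = 0.0
    for p, (u_obs, v_obs) in zip(points_cam, observed_px):
        u, v = project(p, **intrinsics)
        sq += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return math.sqrt(sq / len(points_cam))

# Hypothetical intrinsics and measurements, for illustration only.
intrinsics = dict(fx=500.0, fy=500.0, skew=0.0, cx=320.0, cy=240.0)
points = [(0.0, 0.0, 2.0), (0.2, -0.1, 2.0)]
observed = [(320.0, 240.0), (371.0, 215.0)]
error = rms_reprojection_error(points, observed, **intrinsics)
```

A lower RMS value means the estimated parameters reproduce the detected image points more closely, which is why the re-projection error serves as an accuracy measure for a calibration result.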

Mean Teacher Learning Structure Optimization for Semantic Segmentation of Crack Detection (균열 탐지의 의미론적 분할을 위한 Mean Teacher 학습 구조 최적화 )

  • Seungbo Shim
    • Journal of the Korea institute for structural maintenance and inspection / v.27 no.5 / pp.113-119 / 2023
  • Most infrastructure was built during periods of economic growth. The number of structures reaching the end of their service life is increasing, and the proportion of old structures is gradually rising. The functions and performance these structures had at design time may deteriorate, which can even lead to safety accidents. To prevent this, accurate inspection and appropriate repair are required. To this end, demand is increasing for computer vision and deep learning technology that can accurately detect even minute cracks. However, deep learning algorithms require a large amount of training data; in particular, label images indicating the location of cracks in each image are required, and securing a large number of them consumes a great deal of labor and time. To reduce these costs and increase detection accuracy, this study proposes a learning structure based on the mean teacher method. The learning structure was trained on a dataset of 900 labeled images and 3000 unlabeled images. The crack detection network was evaluated on over 300 labeled images and recorded a mean intersection over union of 89.23% and an F1 score of 89.12%. This experiment confirmed that detection performance improved compared with supervised learning. The proposed method is expected to be used in the future to reduce the cost of securing label images.
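
Two pieces of this abstract lend themselves to a short sketch: the mean teacher idea, in which the teacher's weights track an exponential moving average of the student's, and the two reported metrics. The code below is an illustrative assumption rather than the paper's network: weights are plain lists, `alpha` is a hypothetical smoothing value, and the IoU is computed for the crack class only, whereas the abstract reports a mean IoU.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight toward the student weight (exponential moving average)."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]

def iou_and_f1(pred, target):
    """IoU and F1 for flat binary masks (1 = crack pixel, 0 = background)."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    # Both masks empty: treat as a perfect match.
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    return iou, f1

# Toy weights and masks (hypothetical values).
teacher = ema_update([1.0, 0.0], [0.0, 1.0], alpha=0.9)
iou, f1 = iou_and_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

In a mean teacher setup the teacher's predictions on unlabeled images typically act as consistency targets for the student, which is how unlabeled data can contribute to training without additional label images.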

Video Backlight Compensation Algorithm Based on Reliability of Brightness Variation (밝기 변화량의 신뢰도에 기반한 역광 비디오 영상의 보정 알고리듬)

  • Hyun, Dae-Young;Heu, Jun-Hee;Kim, Chang-Su;Lee, Sang-Uk
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.6 / pp.117-126 / 2010
  • For images in which lighting control has failed, such as those with backlighting or excessive frontlighting, a compensation scheme for a specific area of the image is required. In our method, the user first selects the region of interest in order to compensate the first frame. We then define a brightness matching function and propose an energy function that combines the weighted matching function with the relationship among neighbors. Finally, the energy is minimized by the graph-cut algorithm to compensate the brightness of the first frame. The remaining frames are compensated directly using the result of the first frame: the brightness variation of the previous frame is transmitted to the next frame via motion vectors, and the reliability of the brightness variation is calculated from the motion vector reliability. The compensated video is thus obtained by the same process as in the single-image case. Simulations show that the proposed algorithm provides more natural results than conventional algorithms.

Directionally Adaptive Aliasing and Noise Removal Using Dictionary Learning and Space-Frequency Analysis (사전 학습과 공간-주파수 분석을 사용한 방향 적응적 에일리어싱 및 잡음 제거)

  • Chae, Eunjung;Lee, Eunsung;Cheong, Hejin;Paik, Joonki
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.8 / pp.87-96 / 2014
  • In this paper, we propose a directionally adaptive aliasing and noise removal method using dictionary learning based on space-frequency analysis. The proposed algorithm consists of two modules: (i) aliasing and noise detection using dictionary learning and analysis of frequency characteristics from the combined wavelet-Fourier transform, and (ii) aliasing removal with noise suppression based on directional shrinkage in the detected regions. Because the aliasing and noise regions are detected first, the proposed method can preserve high-frequency details. Experimental results show that, compared with conventional methods, the proposed algorithm efficiently reduces aliasing and noise while minimizing the loss of high-frequency details and the generation of artifacts. The proposed algorithm is suitable for various applications such as image resampling, image super-resolution, and robot vision.

Damage estimation for structural safety evaluation using dynamic displace measurement (구조안전도 평가를 위한 동적변위 기반 손상도 추정 기법 개발)

  • Shin, Yoon-Soo;Kim, Junhee
    • Journal of the Korea institute for structural maintenance and inspection / v.23 no.7 / pp.87-94 / 2019
  • Recently, advances in accurate dynamic displacement measurement devices, such as GPS, computer vision, and optical laser sensors, have enhanced structural monitoring technology. In this study, dynamic displacement data were used to verify the applicability of a structural physical parameter estimation method based on subspace system identification. The subspace system identification theory for estimating a state-space model from measured data and the physics-based interpretation for deriving the physical parameters of the estimated system are presented. Three-degree-of-freedom steel structures were fabricated for experimental verification of the theory. A laser displacement sensor and an accelerometer were used to measure the displacement of each floor and the acceleration of the shaking table. The discrete state-space model generated from the measured data was first verified for accuracy, and the story stiffness of the building was then extracted from it. In addition, based on the story stiffness extracted from the state-space model, five column stiffening and damage samples were set up, and the change rate of story stiffness was extracted for each sample. As a result, for reinforcement and damage under the same conditions, the stiffness changes showed a high matching rate.
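
The last step described above, reading story stiffness out of an identified model and comparing it against a baseline, can be sketched under a strong simplifying assumption: that the structure behaves as an ideal shear building, so its stiffness matrix is tridiagonal and determined by one stiffness value per story. The subspace identification step itself is not shown, and all function names and numeric values below are hypothetical.

```python
def shear_building_stiffness(story_k):
    """Assemble the stiffness matrix of a shear building from story stiffnesses.

    story_k[i] is the stiffness of the columns below floor i (ground story first).
    """
    n = len(story_k)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        K[i][i] += story_k[i]
        if i + 1 < n:
            K[i][i] += story_k[i + 1]
            K[i][i + 1] = -story_k[i + 1]
            K[i + 1][i] = -story_k[i + 1]
    return K

def story_stiffness_from_matrix(K):
    """Recover story stiffnesses from a shear-building stiffness matrix."""
    n = len(K)
    story_k = [0.0] * n
    for i in range(n - 1, -1, -1):        # work down from the top story
        upper = story_k[i + 1] if i + 1 < n else 0.0
        story_k[i] = K[i][i] - upper
    return story_k

def change_rate(baseline, modified):
    """Relative change of each story stiffness with respect to the baseline."""
    return [(m - b) / b for b, m in zip(baseline, modified)]

# Hypothetical three-story example: the second story loses 20% of its stiffness.
baseline = [1000.0, 800.0, 600.0]
damaged = [1000.0, 640.0, 600.0]
K = shear_building_stiffness(damaged)
recovered = story_stiffness_from_matrix(K)
rates = change_rate(baseline, recovered)
```

In this toy case the change rate is nonzero only for the story whose columns were modified, which is the kind of localized signal a stiffening or damage sample is meant to produce.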

Visual-Attention Using Corner Feature Based SLAM in Indoor Environment (실내 환경에서 모서리 특징을 이용한 시각 집중 기반의 SLAM)

  • Shin, Yong-Min;Yi, Chu-Ho;Suh, Il-Hong;Choi, Byung-Uk
    • Journal of the Institute of Electronics Engineers of Korea SC / v.49 no.4 / pp.90-101 / 2012
  • Landmark selection is crucial to successful SLAM (Simultaneous Localization and Mapping) with a mono camera. In unknown environments in particular, automatic landmark selection is needed because no advance information about landmarks is available. In this paper, a visual attention system modeled on the human visual system is used to select landmarks automatically. In previous visual attention systems, the edge feature is one of the most important elements for attention. However, when the edge feature is used in a complicated indoor area, the response in complicated regions disappears while the response between flat surfaces becomes higher. In addition, the computational cost increases with the growth in dimensionality, because responses for four directions are used. This paper proposes using a corner feature to solve or prevent these problems. Using a corner feature can also increase the accuracy of data association by concentrating on areas that are more complicated and informative in indoor environments. Finally, experiments show that a visual attention system based on the corner feature can be more effective in SLAM than the previous method.
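
The abstract does not say which corner detector is used. As one common possibility, the sketch below computes a Harris-style corner response, which produces a single response map that is positive at corners, negative along straight edges, and zero on flat regions. The detector choice, the constant `k`, the 3x3 window, and the toy image are all assumptions made for illustration.

```python
def harris_response(image, k=0.04):
    """Harris corner response for each interior pixel of a 2D grayscale image."""
    h, w = len(image), len(image[0])
    # Central-difference gradients (left at zero on the border).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    response = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Structure tensor summed over a 3x3 window.
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            response[y][x] = det - k * trace * trace
    return response

# Toy 10x10 image: a bright square occupying the lower-right quadrant.
image = [[1.0 if (y >= 5 and x >= 5) else 0.0 for x in range(10)] for y in range(10)]
r = harris_response(image)
corner, edge, flat = r[5][5], r[5][7], r[2][2]
```

Here `corner` is positive, `edge` is negative, and `flat` is zero, so thresholding the response keeps the square's corner while discarding its straight borders.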

A modified U-net for crack segmentation by Self-Attention-Self-Adaption neuron and random elastic deformation

  • Zhao, Jin;Hu, Fangqiao;Qiao, Weidong;Zhai, Weida;Xu, Yang;Bao, Yuequan;Li, Hui
    • Smart Structures and Systems / v.29 no.1 / pp.1-16 / 2022
  • Despite recent breakthroughs in deep learning and computer vision, the pixel-wise identification of tiny objects in high-resolution images with complex disturbances remains challenging. This study proposes a modified U-net for tiny crack segmentation in real-world steel-box-girder bridges. The modified U-net adopts the common U-net framework and a novel Self-Attention-Self-Adaption (SASA) neuron as the fundamental computing element. The Self-Attention module applies softmax and gate operations to obtain the attention vector, which enables the neuron to focus on the most significant receptive fields when processing large-scale feature maps. The Self-Adaption module consists of a multilayer perceptron subnet and achieves deeper feature extraction inside a single neuron. For data augmentation, a grid-based crack random elastic deformation (CRED) algorithm is designed to enrich the diversity and irregular shapes of distributed cracks. Grid-based uniform control nodes are first set on both the input images and the binary labels, random offsets are then applied to these control nodes, and bilinear interpolation is performed for the remaining pixels. The proposed SASA neuron and CRED algorithm are deployed together to train the modified U-net. A total of 200 raw images with a high resolution of 4928 × 3264 are collected, 160 for training and the remaining 40 for testing. Patches of 512 × 512 are generated as inputs from the original images by a sliding window with an overlap of 256. Results show that the average IoU between the recognized and ground-truth cracks reaches 0.409, which is 29.8% higher than that of the regular U-net. A five-fold cross-validation study is performed to verify that the proposed method is robust to different training and test images. Ablation experiments further demonstrate the effectiveness of the proposed SASA neuron and CRED algorithm.
Improvements in the average IoU obtained by using the SASA and CRED modules individually add up to the final improvement of the full model, indicating that the SASA and CRED modules contribute to different stages of the training process, namely the model and the data.
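
The patch generation step can be sketched as a sliding window over each image axis. The code assumes that "an overlap of 256" means adjacent 512-pixel windows share 256 pixels, giving a stride of 256, and it keeps only windows that fit entirely inside the image. How the source handles the leftover strip at the right and bottom edges is not stated, so this sketch simply leaves it uncovered.

```python
def window_origins(length, patch=512, overlap=256):
    """Start offsets of sliding windows along one image axis."""
    stride = patch - overlap
    return list(range(0, length - patch + 1, stride))

def patch_boxes(width, height, patch=512, overlap=256):
    """(left, top, right, bottom) boxes for every full patch in the image."""
    return [
        (x, y, x + patch, y + patch)
        for y in window_origins(height, patch, overlap)
        for x in window_origins(width, patch, overlap)
    ]

# Image size taken from the abstract: 4928 x 3264 pixels.
boxes = patch_boxes(4928, 3264)
```

Under these assumptions a single raw image yields 18 window positions horizontally and 11 vertically, so a small set of high-resolution photographs expands into a much larger set of training patches.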

The Mediating Effect of CEO's Innovation Direction on the Impact of Market Environment Favorability on Sales Growth Rates : Focused on Small and Medium-sized Manufacturing Companies (시장환경 호의성이 매출성장률에 미치는 영향에서 최고경영자 혁신지향성의 매개효과 : 중소제조기업을 중심으로)

  • Lee, Jong-chan
    • Journal of Venture Innovation / v.4 no.3 / pp.17-30 / 2021
  • Environmental deterministic perspectives and resource-based perspectives differ on the factors that determine corporate performance. The environmental deterministic viewpoint sees the external environment as having a significant impact on corporate performance, whereas the resource-based viewpoint holds that it is important to obtain the necessary resources through appropriate decision-making in order to overcome environmental uncertainty. Although the impact of the external environment on corporate performance is important, this study takes the position that efforts within the company to cope with environmental uncertainty are also necessary. This study identified the role that internal factors play in the process by which the external environment affects the company's performance. Specifically, it examined whether the CEO's innovation direction plays a mediating role in the effect of market environment favorability on sales growth rate. Data were collected by survey from 138 small and medium-sized manufacturing companies in the Gyeongin area and analyzed using SPSS 22. According to the analysis, market environment favorability positively affects sales growth rate, and the CEO's innovation direction plays a mediating role between market environment favorability and sales growth rate. The results of this study showed that, depending on the market environment, the CEO's interest in and willingness to innovate, presentation of a vision for innovation, and institutionalization of innovation activities increase management performance through innovation.

The Audience Behavior-based Emotion Prediction Model for Personalized Service (고객 맞춤형 서비스를 위한 관객 행동 기반 감정예측모형)

  • Ryoo, Eun Chung;Ahn, Hyunchul;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.73-85 / 2013
  • In today's information society, knowledge services that use information to create value are becoming more important by the day. In addition, advances in IT have made it easy to collect and use information, and many companies in a variety of industries actively use customer information for marketing. Since the start of the 21st century, companies have actively used culture and the arts to manage their corporate image and to support marketing closely linked to their commercial interests. However, it is difficult for companies to attract or retain consumers' interest through technology alone. For that reason, many firms now carry out cultural activities as a means of differentiation, and many have adopted customer experience as a new marketing strategy to respond effectively to competitive markets. Accordingly, the need is rapidly emerging for personalized services that provide people with new experiences based on personal profile information containing the characteristics of the individual. Personalized service that uses a customer's individual profile information, such as language, symbols, behavior, and emotions, is therefore very important today. Through it, we can assess the interaction between people and content and maximize customer experience and satisfaction. Various related works provide customer-centered services, and emotion recognition research in particular has been emerging recently. Existing studies have mostly performed emotion recognition using bio-signals, and most focus on voice and face, which show large emotional changes. However, limitations of equipment and service environments make it difficult to predict people's emotions. In this paper, we therefore develop an emotion prediction model based on a vision-based interface to overcome these limitations. Emotion recognition based on people's gestures and posture has been studied by several researchers.
This paper developed a model that recognizes people's emotional states from body gesture and posture using the difference image method, and we identified the optimal validated model for predicting four kinds of emotions. The proposed model aims to automatically determine and predict four human emotions (sadness, surprise, joy, and disgust). To build the model, an event booth was installed in the KOCCA lobby, and we showed participants suitable stimulus videos to collect their body gestures and postures as their emotions changed. We then extracted body movements using the difference image method and revised the collected data to build the proposed model with a neural network. The proposed emotion prediction model used three time-frame sets (20 frames, 30 frames, and 40 frames), and we adopted the model with the best performance among them. Before building the three models, the full set of 97 data samples was divided into training, test, and validation sets. The proposed emotion prediction model was constructed using an artificial neural network. We used the back-propagation algorithm as the learning method and set the learning rate to 10% and the momentum rate to 10%. The sigmoid function was used as the transfer function, and we designed a three-layer perceptron neural network with one hidden layer and four output nodes. Based on the test data set, training of the model was stopped when it reached 50000 after reaching the minimum error, in order to explore the point at which to stop learning. Finally, we computed each model's accuracy and identified the best model for predicting each emotion. The results showed prediction accuracies of 100% for sadness and 96% for joy with the 20-frame model, and 88% for surprise and 98% for disgust with the 30-frame model. The findings of our research are expected to be useful in providing an effective algorithm for personalized service in various industries such as advertising, exhibitions, and performances.
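
The difference image method mentioned above can be sketched as frame differencing: consecutive frames are subtracted pixel by pixel, and pixels whose brightness changes by more than a threshold are marked as motion. The threshold value, the grayscale list-of-lists frame format, and the "fraction of changed pixels" feature are assumptions for illustration; the abstract does not state which features were passed to the neural network.

```python
def difference_image(prev_frame, next_frame, threshold=15):
    """Binary motion mask: 1 where the absolute pixel change exceeds the threshold."""
    return [
        [1 if abs(b - a) > threshold else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(prev_frame, next_frame)
    ]

def motion_features(frames, threshold=15):
    """Fraction of changed pixels between each pair of consecutive frames."""
    features = []
    for prev_frame, next_frame in zip(frames, frames[1:]):
        mask = difference_image(prev_frame, next_frame, threshold)
        changed = sum(sum(row) for row in mask)
        features.append(changed / (len(mask) * len(mask[0])))
    return features

# Three tiny 2x2 grayscale frames (hypothetical values).
frames = [
    [[10, 10], [10, 10]],
    [[10, 90], [10, 10]],
    [[90, 90], [10, 10]],
]
features = motion_features(frames)
```

A sequence of N frames yields N - 1 difference images, so the 20-, 30-, and 40-frame sets described in the abstract would produce feature sequences of different lengths under this scheme.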