Search | Korea Science

Anomaly Detection Methodology Based on Multimodal Deep Learning (멀티모달 딥 러닝 기반 이상 상황 탐지 방법론)

Lee, DongHoon;Kim, Namgyu
- Journal of Intelligence and Information Systems / v.28 no.2 / pp.101-125 / 2022
With advances in computing technology and cloud environments, deep learning has matured, and attempts to apply it to various fields are increasing. A typical example is anomaly detection, a technique for identifying values or patterns that deviate from normal data. Among the representative types of anomaly, a contextual anomaly, which requires understanding of the overall situation, is especially difficult to detect. Anomalies in image data are generally detected using a model pre-trained on large datasets. However, because such pre-trained models were built for object classification of images, they are of limited use for anomaly detection that requires understanding complex situations formed by multiple objects. Therefore, this study proposes a two-step pre-trained model for detecting abnormal situations. The methodology performs additional learning from image captioning so that it understands not only individual objects but also the complicated situations they create. Specifically, the proposed methodology transfers the knowledge of a pre-trained model that has learned object classification on ImageNet data to an image captioning model, and uses captions that describe the situation represented by each image. The weights obtained by learning situational characteristics from images and captions are then extracted, and fine-tuning is performed to generate an anomaly detection model. To evaluate the proposed methodology, an anomaly detection experiment was performed on 400 situational images; the results showed that the proposed methodology outperformed the existing traditional pre-trained model in anomaly detection accuracy and F1-score.
https://doi.org/10.13088/jiis.2022.28.2.101

Application of deep learning technique for battery lead tab welding error detection (배터리 리드탭 압흔 오류 검출의 딥러닝 기법 적용)

Kim, YunHo;Kim, ByeongMan
- Journal of Korea Society of Industrial Information Systems / v.27 no.2 / pp.71-82 / 2022
Vision inspection machines are currently being developed and used to replace the sampling tensile test of products from the tab welding process, one of the automotive battery manufacturing processes. However, vision inspection suffers from inspection position errors, and correcting them is costly. Deep learning has recently been applied to address these problems. As one such case, this paper examines the usefulness of applying Faster R-CNN, a deep learning technique, to existing product inspection. Images acquired by the existing vision inspection machine are used as training data for the Faster R-CNN ResNet101 V1 1024x1024 model. The results of the conventional vision test and the Faster R-CNN test are compared against the test standards of 0% non-detection and 10% over-detection. The non-detection rate is 34.5% for the conventional vision test and 0% for the Faster R-CNN test. The over-detection rate is 100% for the conventional vision test and 6.9% for the Faster R-CNN test. These results confirm that deep learning is very useful for detecting welding errors in the lead tabs of automotive batteries.
https://doi.org/10.9723/jksiis.2022.27.2.071

A Study on the Restoration of Korean Traditional Palace Image by Adjusting the Receptive Field of Pix2Pix (Pix2Pix의 수용 영역 조절을 통한 전통 고궁 이미지 복원 연구)

Hwang, Won-Yong;Kim, Hyo-Kwan
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.15 no.5 / pp.360-366 / 2022
This paper presents an AI model structure for restoring photographs of Korean traditional palaces, which survive only in black and white, to color using Pix2Pix, a generative adversarial network technique. Pix2Pix combines a generator model that produces synthetic images with a discriminator model that determines whether an image is real or fake. This paper builds a model by adjusting the receptive field of the discriminator and analyzes the results in light of the characteristics of ancient palace photographs. The receptive field of Pix2Pix, when used to restore black-and-white photographs, has commonly been fixed in size, but a fixed receptive field is not suitable for photographs whose content varies widely within a single image. This paper varies the size of the previously fixed receptive field to identify the discriminator size that best reflects the characteristics of ancient palaces. In the experiment, the receptive field of the discriminator was adjusted based on the prepared ancient palace photographs. The paper measures the model loss as the receptive field of the discriminator changes and examines the photographs restored by the well-trained AI model obtained from the experiments.
https://doi.org/10.17661/jkiiect.2022.15.5.360

Development of an Intelligent Illegal Gambling Site Detection Model Based on Tag2Vec (Tag2vec 기반의 지능형 불법 도박 사이트 탐지 모형 개발)

Song, ChanWoo;Ahn, Hyunchul
- Journal of Intelligence and Information Systems / v.28 no.4 / pp.211-227 / 2022
Illegal gambling through online gambling sites has become a significant social problem. The development of Internet technology and the spread of smartphones have led to the proliferation of illegal gambling sites, making illegal online gambling accessible to anyone. To mitigate its negative effects, the Korean government is trying to detect illegal gambling sites using self-monitoring agents or reporting systems such as 'Nuricops.' However, it is difficult to detect all illegal sites because of limitations such as a lack of staffing. Accordingly, several scholars have proposed intelligent techniques for detecting illegal gambling sites. Xu et al. (2019) found that fake or illegal websites generally have unique features in their HTML tag structure, which implies that the HTML tag structure can be important for detecting illegal sites. However, few prior studies have used the HTML tag structure to improve the performance of illegal site detection models. Against this background, our study aims to improve model performance by using the HTML tag structure and proposes Tag2Vec, a modified version of Doc2Vec, as a methodology for properly vectorizing the HTML tag structure. To validate the proposed model, we perform an empirical analysis on a data set consisting of harmful sites listed by 'The Cheat' and normal sites collected through Google search. The results confirmed that the proposed Tag2Vec-based detection model achieved better classification accuracy, recall, and F1-score than the URL-based detection model used for comparison. The proposed model is expected to be used effectively to improve the health of our society through intelligent technology.
https://doi.org/10.13088/jiis.2022.28.4.211

Methods for Quantitative Disassembly and Code Establishment of CBS in BIM for Program and Payment Management (BIM의 공정과 기성 관리 적용을 위한 CBS 수량 분개 및 코드 정립 방안)

Hando Kim;Jeongyong Nam;Yongju Kim;Inhye Ryu
- Journal of the Computational Structural Engineering Institute of Korea / v.36 no.6 / pp.381-389 / 2023
One of the crucial components in building information modeling (BIM) is data. To systematically manage these data, various research studies have focused on the creation of object breakdown structures and property sets. Specifically, crucial data for managing programs and payments involves work breakdown structures (WBSs) and cost breakdown structures (CBSs), which are indispensable for mapping BIM objects. Achieving this requires disassembling CBS quantities based on 3D objects and WBS. However, this task is highly tedious owing to the large volume of CBS and divergent coding practices employed by different organizations. Manual processes, such as those based on Excel, become nearly impossible for such extensive tasks. In response to the challenge of computing quantities that are difficult to derive from BIM objects, this study presents methods for disassembling length-based quantities, incorporating significant portions of the bill of quantities (BOQs). The proposed approach recommends suitable CBS by leveraging the accumulated history of WBS-CBS mapping databases. Additionally, it establishes a unified CBS code, facilitating the effective operation of CBS databases.
https://doi.org/10.7734/COSEIK.2023.36.6.381

Optimization-based Deep Learning Model to Localize L3 Slice in Whole Body Computerized Tomography Images (컴퓨터 단층촬영 영상에서 3번 요추부 슬라이스 검출을 위한 최적화 기반 딥러닝 모델)

Seongwon Chae;Jae-Hyun Jo;Ye-Eun Park;Jin-Hyoung Jeong;Sung Jin Kim;Ahnryul Choi
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.5 / pp.331-337 / 2023
In this paper, we propose a deep learning model that detects the third lumbar vertebra (L3) slice in CT images in order to determine the occurrence and degree of sarcopenia. We also propose an optimization technique that uses the oversampling ratio and class weight as design parameters to address the performance degradation caused by data imbalance between L3-level and non-L3-level portions of CT data. To train and test the model, a total of 150 whole-body CT images were used, from 104 prostate cancer patients and 46 bladder cancer patients who visited Gangneung Asan Medical Center. The deep learning model was ResNet50, and the design parameters of the optimization technique were six types of model hyperparameters, the data augmentation ratio, and the class weight. The proposed optimization-based L3 level extraction model reduced the median L3 error by about 1.0 slices compared with the control model (a model that optimized only 5 types of hyperparameters). These results show that accurate L3 slice detection is possible and suggest that the data imbalance problem can be addressed effectively through augmentation-based oversampling and class weight adjustment.
https://doi.org/10.17661/jkiiect.2023.16.5.331
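
The abstract above names an oversampling ratio and a class weight as the two levers against L3 / non-L3 imbalance but does not say how they are computed. The sketch below is only an illustration under stated assumptions: the inverse-frequency ("balanced") weighting formula, the index-duplication oversampler (a stand-in for the augmentation-based oversampling the abstract mentions), the function names, and the 95/5 label split are all hypothetical and are not taken from the paper.

```python
from collections import Counter
import random


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this common heuristic stands in for the class weight
    that the paper treats as a tunable design parameter.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def oversample_indices(labels, minority, ratio, seed=0):
    """Return sample indices with the minority class grown to `ratio` times its size.

    Assumption: extra minority samples are plain duplicates chosen at random;
    the paper instead generates them through data augmentation.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    extra = int(round((ratio - 1.0) * len(minority_idx)))
    return list(range(len(labels))) + [rng.choice(minority_idx) for _ in range(extra)]


# Hypothetical slice labels: most slices in a whole-body scan are not at the L3 level.
labels = ["non-L3"] * 95 + ["L3"] * 5
weights = balanced_class_weights(labels)
indices = oversample_indices(labels, minority="L3", ratio=4.0)
l3_after = sum(labels[i] == "L3" for i in indices)
print(weights, len(indices), l3_after)
```

In an optimization loop like the one the abstract describes, `ratio` and the minority-class weight would be searched over together with the other hyperparameters rather than fixed by a formula.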

YouTube Video Content Analysis: Focusing on Korean Dance Videos (유튜브(YouTube) 영상 콘텐츠 분석: 국내 무용 영상을 중심으로)

Suejung Chae;Jihae Suh
- Journal of Intelligence and Information Systems / v.29 no.4 / pp.1-13 / 2023
The widespread adoption of smartphones and advancements in internet technology have notably shifted content consumption habits toward video. This research aims to dissect the nature of videos posted on YouTube, the global video-sharing platform, to understand the characteristics of both produced and preferred content. For this study, dance was chosen as a specific subject from a variety of video categories. Data on YouTube videos associated with the term "dance" was compiled over three years, from 2019 to 2021. The investigation revealed a clear distinction between the types of dance videos frequently uploaded to YouTube and those that receive a high number of views. The empirical analysis of this study indicates a viewer preference for vlogs that provide insights into the daily lives of dance students, as well as for purpose-driven videos, such as those highlighting dance exam preparations or school dance events. Notably, the vlogs that attract the most attention are typically created by dance students at the college or secondary school level, rather than by professionals. Although the study was focused on dance, its methodologies can be applied to different subjects. These insights are expected to contribute to the development of a recommendation system that aids content creators in effectively targeting their productions.
https://doi.org/10.13088/jiis.2023.29.4.001

A Study on the Intention to Use ChatGPT Focusing on the Moderating Effect of the MZ Generation (MZ세대의 조절효과를 중심으로 한 ChatGPT의 사용의도에 관한 연구)

Yang-bum Jung;Jungmin Park;Hyoung-Yong Lee
- Journal of Intelligence and Information Systems / v.29 no.4 / pp.111-127 / 2023
This study examines user perception of ChatGPT use. Its goal is to analyze how user policy expectations and user innovativeness relate to ChatGPT technology acceptance and intention to use, using variables from TRA (Theory of Reasoned Action). The impact of policy expectations and user innovativeness on intention to use, mediated by usefulness and hedonic motivation, and the impact of subjective norms on usefulness and intention to use were analyzed separately for the MZ generation and the non-MZ generation. The study also tested whether age differences, in interaction with policy expectations, had a moderating effect on usefulness. An online survey of 300 ChatGPT users was conducted, and statistical analysis was performed using PLS (Partial Least Squares) structural equation modeling and the SPSS package. The analysis confirmed that the higher the initial user's innovativeness, the higher the intention to use ChatGPT. In addition, the moderating effect analysis comparing the MZ generation and the non-MZ generation showed that policy expectations had a negative effect on the usefulness of ChatGPT use.
https://doi.org/10.13088/jiis.2023.29.4.111

Development of a Prediction Model for Fall Patients in the Main Diagnostic S Code Using Artificial Intelligence (인공지능을 이용한 주진단 S코드의 낙상환자 예측모델 개발)

Ye-Ji Park;Eun-Mee Choi;So-Hyeon Bang;Jin-Hyoung Jeong
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.6 / pp.526-532 / 2023
Falls are fatal accidents that occur more than 420,000 times a year worldwide. To study fall patients, we identified the association between extrinsic injury codes and the principal diagnosis S-codes of fall patients, and developed a model that predicts the extrinsic injury code from the principal diagnosis S-code data. We received two years of data (2020 and 2021) from Institution A, located in Gangneung City, Gangwon Special Self-Governing Province, and extracted only the records with fall-related extrinsic injury codes W00 to W19. The prediction model was developed using codes W01, W10, W13, and W18, which had enough principal diagnosis S-codes to support modeling. Of the data, 80% were used for training and 20% for testing. The model is an MLP (Multi-Layer Perceptron) with 6 variables (gender, age, principal diagnosis S-code, surgery, hospitalization, and alcohol consumption) in the input layer, 2 hidden layers with 64 nodes, and an output layer with 4 nodes for the W01, W10, W13, and W18 extrinsic injury codes using the softmax activation function. Accuracy was 31.2% at the first training pass and 87.5% at the 30th, which confirmed the association between the fall extrinsic injury code and the principal diagnosis S-code of fall patients.
https://doi.org/10.17661/jkiiect.2023.16.6.526
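
The network shape given in the abstract above (6 inputs, 2 hidden layers with 64 nodes, a 4-way softmax output) can be sketched as a plain forward pass. This is a minimal, untrained illustration rather than the authors' implementation: the ReLU hidden activation, the reading of "64 nodes" as 64 per hidden layer, the random initial weights, and the encoded sample values are all assumptions.

```python
import math
import random

# 6 inputs, two hidden layers (assumed 64 nodes each), 4 output classes.
LAYER_SIZES = [6, 64, 64, 4]
CLASSES = ["W01", "W10", "W13", "W18"]


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) filled with small random placeholder values."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine transform: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def build_mlp(seed=0):
    rng = random.Random(seed)
    return [init_layer(n_in, n_out, rng)
            for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])]


def predict_proba(model, features):
    """Forward pass: ReLU on hidden layers (assumed), softmax on the output layer."""
    x = list(features)
    for weights, biases in model[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = model[-1]
    return softmax(dense(x, weights, biases))


model = build_mlp()
# Hypothetical pre-encoded record: gender, age (scaled), S-code (encoded),
# surgery, hospitalization, alcohol consumption.
sample = [1.0, 0.63, 0.2, 0.0, 1.0, 0.0]
probs = predict_proba(model, sample)
print(dict(zip(CLASSES, probs)))
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how six encoded variables map to a probability for each of the four fall codes.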

Analysis of Research Trends in Deep Learning-Based Video Captioning (딥러닝 기반 비디오 캡셔닝의 연구동향 분석)

Lyu Zhi;Eunju Lee;Youngsoo Kim
- KIPS Transactions on Software and Data Engineering / v.13 no.1 / pp.35-49 / 2024
Video captioning technology, a significant outcome of the integration of computer vision and natural language processing, has emerged as a key research direction in artificial intelligence. The technology aims to automatically understand video content and express it in language, enabling computers to transform the visual information in videos into text. This paper provides an initial analysis of research trends in deep learning-based video captioning, categorizes the models into four main groups (CNN-RNN-based, RNN-RNN-based, Multimodal-based, and Transformer-based), and explains the concept of each group along with its features, advantages, and disadvantages. The paper also lists the datasets and performance evaluation methods commonly used in the video captioning field. The datasets cover diverse domains and scenarios, offering extensive resources for training and validating video captioning models. The discussion of evaluation methods covers the major evaluation indicators and gives researchers practical references for assessing model performance from various angles. Finally, as future research tasks, the paper identifies major challenges that require continued improvement, such as maintaining temporal consistency and accurately describing dynamic scenes, both of which add complexity in real-world applications, and presents new tasks for study, such as temporal relationship modeling and multimodal data integration.
https://doi.org/10.3745/KTSDE.2024.13.1.35

