Search | Korea Science

Context-Dependent Video Data Augmentation for Human Instance Segmentation (인물 개체 분할을 위한 맥락-의존적 비디오 데이터 보강)

HyunJin Chun;JongHun Lee;InCheol Kim
- KIPS Transactions on Software and Data Engineering, v.12 no.5, pp.217-228, 2023
Video instance segmentation is a highly complex visual task because it requires not only object instance segmentation in each frame of a video but also accurate tracking of instances across the frame sequence. In particular, human instance segmentation in drama videos has the unique requirement of accurately tracking several main characters who interact in various places and at various times. It also suffers from a form of class imbalance, because main characters appear far more frequently than supporting or auxiliary characters in drama videos. In this paper, we introduce a new human instance dataset called MHIS, built from the drama Miseang, and propose a novel video data augmentation method, CDVA, to overcome the data imbalance between character classes. Unlike previous video data augmentation methods, the proposed CDVA generates more realistic augmented videos by using the rich spatio-temporal context embedded in videos to decide the optimal location within a background clip at which to insert a target human instance. The proposed augmentation method can therefore improve the performance of a deep neural network model for video instance segmentation. Through both quantitative and qualitative experiments on the MHIS dataset, we demonstrate the usefulness and effectiveness of the proposed video data augmentation method.
https://doi.org/10.3745/KTSDE.2023.12.5.217

Remote Sensing based Algae Monitoring in Dams using High-resolution Satellite Image and Machine Learning (고해상도 위성영상과 머신러닝을 활용한 녹조 모니터링 기법 연구)

Jung, Jiyoung;Jang, Hyeon June;Kim, Sung Hoon;Choi, Young Don;Yi, Hye-Suk;Choi, Sunghwa
- Proceedings of the Korea Water Resources Association Conference, 2022.05a, pp.42-42, 2022
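To date, algal bloom monitoring in watersheds has relied heavily on point-based monitoring through in-situ water sampling, which has made it difficult to efficiently monitor and respond to blooms that spread widely across a water body depending on climate, flow velocity, and water temperature conditions. In addition, because observation data have been limited, earlier studies mostly computed derived indices closely related to algal blooms, such as NDVI, FGAI, and SEI, and mapped them to remote sensing data rather than using field measurements. This study aims to improve the accuracy and efficiency of algal bloom monitoring. We first secured more than 7,000 algal bloom observations using algae measurement equipment, and then applied various machine learning techniques to map high-resolution satellite data from the same period to the field measurements and examined their effectiveness. The study site is the Yeongju Dam watershed, located in the upper reaches of the Naeseongcheon, a tributary of the Nakdong River. In the data collection stage, 7,291 algae measurements were taken on four occasions between February and September 2020 for area-based in-situ observation, and spectral data for a total of 13 bands (Bands 1 to 12, with Band 8 split into 8 and 8A) were extracted from Sentinel-2 data for the same times and locations. Next, the algae_monitoring Python library was built to apply the machine learning analysis. The library is organized into five stages so that it can be applied generally to various watersheds, not only Yeongju Dam: 1) a data preparation stage that splits the data into a training set and a test set; 2) a model application stage in which one of Random Forest, Gradient Boosting Regression, or XGBoost can be selected; 3) a performance test stage that checks the model results (R², MSE, MAE, RMSE, NSE, KGE, etc.); 4) a visualization stage for the model results; and 5) an application stage that converts satellite data into algae values using the selected model. In this case study, using a total of 14 inputs (12 Sentinel-2 bands plus two meteorological variables, air temperature and cloud ratio), Random Forest showed the best overall fit among the machine learning techniques. On the test set it reached a Nash-Sutcliffe Efficiency (NSE) of about 0.96 (0.99 on the training set), confirming that learning from wide-area satellite data together with sufficiently collected field measurements can dramatically improve the efficiency of algae monitoring analysis.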
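The performance test stage lists NSE and KGE alongside more familiar metrics. As a minimal sketch of how these scores can be computed, the block below uses only the Python standard library. It is not the algae_monitoring library's code: the function names and the sample values are hypothetical, and KGE is written in its commonly used form based on correlation, variability ratio, and bias ratio.

```python
import math

def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency: 1 minus squared error over variance around the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def kge(observed, predicted):
    """Kling-Gupta Efficiency from correlation r, variability ratio alpha, and bias ratio beta."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    r = cov / (std_o * std_p)
    alpha = std_p / std_o
    beta = mean_p / mean_o
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Hypothetical values standing in for measured and model-predicted algae readings.
obs = [2.0, 4.0, 6.0, 8.0]
pred = [2.5, 3.5, 6.5, 7.5]
print(nse(obs, pred), rmse(obs, pred), kge(obs, pred))
```

Both NSE and KGE equal 1 for a perfect prediction, so a test-set NSE near 0.96 indicates that the squared error is small relative to the variance of the observations.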

Development of Intelligent OCR Technology to Utilize Document Image Data (문서 이미지 데이터 활용을 위한 지능형 OCR 기술 개발)

Kim, Sangjun;Yu, Donghui;Hwang, Soyoung;Kim, Minho
- Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2022.05a, pp.212-215, 2022
In today's era of digital transformation, the need to build and use big data has grown in many fields. Today, large amounts of data are produced and stored in forms suited to digital devices and media, but for a long time in the past, data production and storage were dominated by printed books. Accordingly, Optical Character Recognition (OCR) technology is needed to use the vast body of printed books accumulated over that time as big data. In this study, we propose a system for digitizing the structure and content of document objects inside a scanned book image. The proposed system consists of three main steps: 1) recognizing the region of each document object (table, equation, picture, text body) in the scanned book image; 2) performing OCR on each region with the text body, table, or formula module that matches the recognized object type; and 3) gathering the processed document information and returning it in JSON format. The proposed model is based on an open-source project, with additional training and improvements. The intelligent OCR system proposed in this study showed performance on par with commercial OCR software in processing four types of document objects (table, equation, image, text body).
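The third step, collecting per-region results into JSON, can be sketched as follows. The abstract does not give the output schema, so the field names (`page`, `objects`, `type`, `bbox`, `content`), the `ocr_region` placeholder, and the reading-order sort are all assumptions made for illustration.

```python
import json

def ocr_region(kind, bbox):
    """Hypothetical placeholder: a real system would dispatch to a text, table, or formula recognizer."""
    return f"<{kind} content>"

def assemble_page(page_id, regions):
    """Run the placeholder OCR on each detected region and return one JSON string per page."""
    objects = []
    for kind, bbox in regions:
        objects.append({"type": kind, "bbox": list(bbox), "content": ocr_region(kind, bbox)})
    # Assumed reading order: top-to-bottom, then left-to-right, using the box's top-left corner.
    objects.sort(key=lambda obj: (obj["bbox"][1], obj["bbox"][0]))
    return json.dumps({"page": page_id, "objects": objects}, ensure_ascii=False, indent=2)

# Hypothetical detections as (object type, (x_min, y_min, x_max, y_max)).
regions = [
    ("table", (40, 300, 560, 480)),
    ("text_body", (40, 60, 560, 280)),
    ("equation", (120, 500, 480, 540)),
]
page_json = assemble_page(1, regions)
print(page_json)
```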

Unlicensed Band Traffic and Fairness Maximization Approach Based on Rate-Splitting Multiple Access (전송률 분할 다중 접속 기술을 활용한 비면허 대역의 트래픽과 공정성 최대화 기법)

Jeon Zang Woo;Kim Sung Wook
- KIPS Transactions on Computer and Communication Systems, v.12 no.10, pp.299-308, 2023
As the emergence of various services has accelerated the spectrum shortage problem, New Radio-Unlicensed (NR-U) has appeared, allowing users who previously communicated in licensed bands to communicate in unlicensed bands. However, NR-U network users degrade the performance of Wi-Fi network users who communicate in the same unlicensed band. In this paper, we aim to simultaneously maximize the fairness and throughput of an unlicensed band in which NR-U and Wi-Fi network users coexist. First, we propose an optimal power allocation scheme based on the Monte Carlo Policy Gradient method of reinforcement learning to maximize the sum rate of NR-U networks that use rate-splitting multiple access in unlicensed bands. Then, we propose a channel occupancy time division algorithm based on the sequential Raiffa bargaining solution from game theory that can simultaneously maximize system throughput and fairness for NR-U and Wi-Fi networks coexisting in the same unlicensed band. Simulation results, which compare the converged sum rate under the same transmission power, show that rate-splitting multiple access outperforms conventional multiple access technology. In addition, we compare the data transfer amount and fairness of NR-U network users, Wi-Fi network users, and the total system, and show that the proposed channel occupancy time division algorithm based on the sequential Raiffa bargaining solution achieves throughput and fairness together better than other algorithms.
https://doi.org/10.3745/KTCCS.2023.12.10.299
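To make the sequential Raiffa bargaining idea concrete, here is a minimal two-player sketch. It is not the paper's algorithm: the utility model (each network's utility grows with the square root of its share of channel occupancy time) and the scale constants `A` and `B` are assumptions chosen only so that the trade-off curve is concave. The iteration is the discrete Raiffa procedure: starting from the disagreement point, each step moves to the midpoint between the two points at which one player takes everything it can while the other keeps its current utility.

```python
import math

def sequential_raiffa(best_u1_given_u2, best_u2_given_u1, steps=50):
    """Discrete Raiffa solution for two players, starting from the disagreement point (0, 0)."""
    d1, d2 = 0.0, 0.0
    for _ in range(steps):
        m1 = best_u1_given_u2(d2)  # most player 1 can get while player 2 keeps d2
        m2 = best_u2_given_u1(d1)  # most player 2 can get while player 1 keeps d1
        d1, d2 = (d1 + m1) / 2.0, (d2 + m2) / 2.0  # midpoint of (m1, d2) and (d1, m2)
    return d1, d2

# Assumed utilities: u_nru = A * sqrt(t), u_wifi = B * sqrt(1 - t), with t the NR-U time share.
A, B = 10.0, 4.0

def best_nru(u_wifi):
    return A * math.sqrt(max(0.0, 1.0 - (u_wifi / B) ** 2))

def best_wifi(u_nru):
    return B * math.sqrt(max(0.0, 1.0 - (u_nru / A) ** 2))

u_nru, u_wifi = sequential_raiffa(best_nru, best_wifi)
share_nru = (u_nru / A) ** 2  # recover the time share from the assumed utility model
print(u_nru, u_wifi, share_nru)
```

Under these symmetric-shaped utilities the procedure settles on an even split of occupancy time even though the two scales differ, because the Raiffa solution is invariant to rescaling each player's utility.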

Analysis of grout injection distance in single rock joint (단일절리 암반에서 그라우팅 주입거리 분석)

Ji-Yeong Kim;Jo-Hyun Weon;Jong-Won Lee;Tae-Min Oh
- Journal of Korean Tunnelling and Underground Space Association, v.25 no.6, pp.541-554, 2023
The utilization of underground spaces for tunnels and energy/waste storage is on the rise. To ensure the stability of underground spaces, it is crucial to reinforce rock fractures and discontinuities. Discontinuities, such as joints, can weaken the rock and lead to groundwater inflow into underground spaces. Rock grouting techniques are employed to enhance the strength and stability of the area around these discontinuities. However, during rock grouting, it is impossible to visually confirm whether the grouting material is being injected as intended. Without proper injection, the expected increases in strength, durability, and degree of consolidation may not be achieved. Therefore, it is necessary to predict in advance whether the grouting material will be injected as designed. In this study, we assessed injection performance as a function of injection variables such as the water/cement mixture ratio, injection pressure, and injection flow using the UDEC (Universal Distinct Element Code) numerical program. Additionally, the numerical results were validated by laboratory experiments. The results of this study are expected to help optimize variables such as injection material properties, injection time, and pump pressure in field grouting design.
https://doi.org/10.9711/KTAJ.2023.25.6.541

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

Sohee Han;Jisub Um;Hoirin Kim
- Phonetics and Speech Sciences, v.16 no.1, pp.67-76, 2024
Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. In particular, TTS models that offer various voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize personalized voices by generating speech from unseen target speakers' utterances. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach includes not only an English one-shot multi-speaker TTS but also a Korean one-shot multi-speaker TTS. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed prediction MOS (P-MOS) values of 2.54 and 3.74, respectively. These results indicate that the proposed model improves on the baseline models in terms of both naturalness and speaker similarity.
https://doi.org/10.13064/KSSS.2024.16.1.067

Blind Rhythmic Source Separation (블라인드 방식의 리듬 음원 분리)

Kim, Min-Je;Yoo, Ji-Ho;Kang, Kyeong-Ok;Choi, Seung-Jin
- The Journal of the Acoustical Society of Korea, v.28 no.8, pp.697-705, 2009
An unsupervised (blind) method is proposed for extracting rhythmic sources from commercial polyphonic music whose number of channels is limited to one. Commercial music signals are not usually provided with more than two channels, while they often contain multiple instruments, including singing voice. Therefore, instead of using conventional modeling of mixing environments or statistical characteristics, other source-specific characteristics are needed to separate or extract sources in such underdetermined environments. In this paper, we concentrate on extracting rhythmic sources from a mixture that also contains harmonic sources. An extension of nonnegative matrix factorization (NMF), called nonnegative matrix partial co-factorization (NMPCF), is used to analyze multiple relationships between spectral and temporal properties in the given input matrices. Moreover, the temporal repeatability of rhythmic sound sources is exploited as a common rhythmic property among segments of an input mixture signal. The proposed method shows acceptable, though not superior, separation quality compared with the referenced prior-knowledge-based drum source separation systems, but it is more widely applicable because it separates blindly, for example when there is no prior information or the target rhythmic source is irregular.
https://doi.org/10.7776/ASK.2009.28.8.697
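NMPCF extends plain NMF, so a small NMF example may help readers unfamiliar with the base technique. The sketch below implements standard NMF with the well-known multiplicative update rules for the squared reconstruction error, using only the standard library. It is not the NMPCF method from the paper (it factorizes a single matrix and shares nothing across segments), and the toy matrix `V` is a made-up stand-in for a magnitude spectrogram.

```python
import random

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factorize nonnegative V (n x m) into W (n x rank) and H (rank x m) by multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frob_error(v, w, h)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)] for r in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)] for i in range(n)]
        errors.append(frob_error(v, w, h))
    return w, h, errors

# Toy "spectrogram" (rows: frequency bins, columns: time frames).
# The first two rows repeat every two frames, loosely mimicking a rhythmic source.
V = [
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.2, 0.8, 0.5, 0.9],
    [0.1, 0.4, 0.25, 0.45],
]
W, H, errors = nmf(V, rank=2)
print(errors[0], errors[-1])
```

In a source separation setting, the columns of W act as spectral templates and the rows of H as their activations over time. NMPCF adds the constraint that some factors are shared across several input matrices, which this sketch does not attempt.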

A Study on the Drug Classification Using Machine Learning Techniques (머신러닝 기법을 이용한 약물 분류 방법 연구)

Anmol Kumar Singh;Ayush Kumar;Adya Singh;Akashika Anshum;Pradeep Kumar Mallick
- Advanced Industrial SCIence
- v.3 no.2, pp.8-16, 2024
This paper presents a drug classification system whose goal is to predict the appropriate drug for a patient based on demographic and physiological traits. The dataset contains attributes such as Age, Sex, BP (Blood Pressure), Cholesterol Level, and Na_to_K (sodium-to-potassium ratio), and the objective is to determine which drug is given. The models used in this paper are K-Nearest Neighbors (KNN), Logistic Regression, and Random Forest. GridSearchCV with 5-fold cross-validation was used to tune hyperparameters, and each model was trained and tested on the dataset. Performance with and without hyperparameter tuning was assessed using accuracy, confusion matrices, and classification reports. The accuracy of the models was 0.7, 0.875, and 0.975 without GridSearchCV, and 0.75, 1.0, and 0.975 with GridSearchCV. According to the GridSearchCV results, Logistic Regression is the most suitable of the three models for drug classification, followed by K-Nearest Neighbors. Na_to_K is also an essential feature in predicting the outcome.
https://doi.org/10.23153/AI-Science.2024.3.2.008
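The tuning procedure described above, a grid search scored by 5-fold cross-validation, can be illustrated without any external library. The sketch below hand-rolls the idea for a small KNN classifier rather than calling scikit-learn's GridSearchCV. The data are synthetic, the two features only loosely stand in for Age and Na_to_K, and the labels `drug_A` and `drug_B` and the threshold rule are invented for the example.

```python
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to the query (squared Euclidean distance)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def cross_val_accuracy(xs, ys, k, folds=5):
    """Mean accuracy over `folds` folds; fold f holds out every folds-th sample starting at f."""
    n = len(xs)
    correct = 0
    for f in range(folds):
        held_out = set(range(f, n, folds))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        for i in held_out:
            correct += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return correct / n

def grid_search(xs, ys, candidate_ks, folds=5):
    """Score every candidate k by cross-validation and return the best one with all scores."""
    scores = {k: cross_val_accuracy(xs, ys, k, folds) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic data: (age-like value, Na_to_K-like value); the labeling rule is made up.
rng = random.Random(42)
xs = [(rng.uniform(15, 75), rng.uniform(5, 40)) for _ in range(100)]
ys = ["drug_A" if ratio > 15 else "drug_B" for _, ratio in xs]

candidate_ks = [1, 3, 5, 7, 9]
best_k, scores = grid_search(xs, ys, candidate_ks)
print(best_k, scores)
```

The features are left unscaled here for brevity. With real data, distance-based models such as KNN usually benefit from standardizing features before the search.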

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

Kim, Seungsoo;Kim, Jongwoo
- Journal of Intelligence and Information Systems
- v.24 no.2, pp.221-241, 2018
Deep learning has been attracting attention recently. The deep learning technique applied in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and in AlphaGo is the Convolutional Neural Network (CNN). A CNN divides the input image into small sections, recognizes partial features, and combines them to recognize the whole. Deep learning technologies are expected to bring many changes to our lives, but until now their applications have been limited to image recognition and natural language processing. The use of deep learning techniques for business problems is still at an early research stage. If their performance is proven, they can be applied to traditional business problems such as marketing response prediction, fraudulent transaction detection, and bankruptcy prediction. It is therefore a meaningful experiment to assess whether business problems can be solved with deep learning technologies, based on the case of online shopping companies, which have big data, make customer behavior relatively easy to identify, and offer high utilization value. In online shopping in particular, the competitive environment is changing rapidly and becoming more intense, so analysis of customer behavior for maximizing profit is becoming more and more important. In this study, we propose a 'CNN model of Heterogeneous Information Integration' as a way to improve the prediction of customer behavior in online shopping enterprises. 
The proposed model learns from a convolutional neural network combined with a multi-layer perceptron structure, using both structured and unstructured information. To optimize its performance, we design three architecture components, 'heterogeneous information integration', 'unstructured information vector conversion', and 'multi-layer perceptron design', evaluate the performance of each, and confirm the proposed model based on the results. The target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high amount shopper, and high discount shopper. To verify the usefulness of the proposed model, we conducted experiments using actual transaction, customer, and VOC data from a specific online shopping company in Korea. The data cover 47,947 customers who registered at least one VOC in January 2011 (one month). The profiles of these customers, a total of 19 months of trading data from September 2010 to March 2012, and the VOCs posted during that month are used. The experiment is divided into two stages. In the first stage, we evaluate the three architectures that affect the performance of the proposed model and select optimal parameters. In the second stage, we evaluate the performance of the proposed model. Experimental results show that the proposed model, which combines structured and unstructured information, is superior to NBC (Naïve Bayes classification), SVM (Support vector machine), and ANN (Artificial neural network). It is therefore significant that unstructured information contributes to predicting customer behavior, and that CNN can be applied to business problems as well as to image recognition and natural language processing problems. 
The experiments confirm that CNN is more effective at understanding and interpreting the meaning of context in text VOC data. It is also significant that this empirical research, based on the actual data of an e-commerce company, can extract meaningful information for predicting customer behavior from VOC data written directly by customers in text form. Finally, the various experiments show that the proposed model provides useful information for future research on parameter selection and its effect on performance.
https://doi.org/10.13088/jiis.2018.24.2.221

Improvement of the Fishing Gear and Fishing Method of the East-Sea Trawl Fishery (동해구 트롤 어구어법의 개량)

권병국;이주희;이춘우;김형석;김용식;안영일;김정문
- Journal of the Korean Society of Fisheries and Ocean Technology, v.37 no.2, pp.106-116, 2001
A series of studies on the fishing gear and system of the East Sea trawl fishery was carried out to improve fishing efficiency and working conditions. As the first step, the fishing gear and system of the traditional East Sea trawl were examined in order to solve problems such as the poor sheering efficiency of the net mouth and the inconvenient fishing system of the side trawl. The fishing system was then reorganized from a side trawl into a stern trawl by installing a net drum system on the stern deck and introducing two newly designed nets, one mainly for midwater trawling and the other for bottom trawling. The results of the field experiment on the modified system and nets can be summarized as follows: 1. The modified system worked well and reduced labour by about 80%. 2. The sheering efficiency of the improved A-type net increased to 20 m height and 30 m width at the net mouth, and that of the B-type net to 10 m height and 33 m width, compared with 1.5 m height and 15 m width for the traditional net. 3. The catch efficiency for pink shrimp with the A- or B-type net was about 3 or 5 times that of the traditional net, and with the B-type net the catch efficiency for herring and other bottom fishes was about 2 times that of the traditional net.
