• Title/Summary/Keyword: multi-task learning

Search Result 132

Utilization of age information for speaker verification using multi-task learning deep neural networks (멀티태스크 러닝 심층신경망을 이용한 화자인증에서의 나이 정보 활용)

  • Kim, Ju-ho;Heo, Hee-Soo;Jung, Jee-weon;Shim, Hye-jin;Kim, Seung-Bin;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.38 no.5
    • /
    • pp.593-600
    • /
    • 2019
  • Similarity in voice tone between speakers can lower the performance of speaker verification. To improve the performance of speaker verification systems, we propose a multi-task learning technique that uses a deep neural network to learn both speaker information and age information. Multi-task learning can improve generalization performance because it keeps the hidden layers of a deep neural network from overfitting to a single task. However, we found in experiments that age information is not learned well while the deep neural network is being trained. To improve the learning, we propose a method that dynamically changes the objective function weights of speaker identification and age estimation during training. On the RSR2015 evaluation data set, the equal error rate was 6.91% for the speaker verification system without age information, 6.77% using age information only, and 4.73% using age information when the weight change technique was applied.
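
The dynamically weighted objective described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual weighting rule, so the linear schedule, its direction, the start and end values, and the treatment of age estimation as age-group classification are all assumptions made for illustration; `task_weights` and `multitask_loss` are hypothetical names.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class given class probabilities."""
    return -math.log(max(probs[target], 1e-12))


def task_weights(epoch, total_epochs, age_start=0.9, age_end=0.1):
    """Assumed linear schedule: the age weight moves from age_start to age_end.

    The speaker weight is the complement, so the two weights always sum to 1.
    """
    t = epoch / max(total_epochs - 1, 1)
    w_age = age_start + (age_end - age_start) * t
    return 1.0 - w_age, w_age


def multitask_loss(speaker_probs, speaker_id, age_probs, age_class,
                   epoch, total_epochs):
    """Weighted sum of the speaker and age losses for one training example."""
    w_spk, w_age = task_weights(epoch, total_epochs)
    return (w_spk * cross_entropy(speaker_probs, speaker_id)
            + w_age * cross_entropy(age_probs, age_class))


# Usage: the same predictions are weighted differently early and late in training.
early = multitask_loss([0.5, 0.5], 0, [0.25, 0.75], 1, epoch=0, total_epochs=10)
late = multitask_loss([0.5, 0.5], 0, [0.25, 0.75], 1, epoch=9, total_epochs=10)
```

In a real system the two probability vectors would come from two output heads that share hidden layers; only the weighting logic is shown here.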

Effective Multi-label Feature Selection based on Large Offspring Set created by Enhanced Evolutionary Search Process

  • Lim, Hyunki;Seo, Wangduk;Lee, Jaesung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.9
    • /
    • pp.7-13
    • /
    • 2018
  • Recent advances in data-gathering techniques have improved the capability to collect information, enabling learning between gathered data patterns and application sub-tasks. A pattern can be associated with multiple labels, which demands multi-label learning capability; as a result, multi-label feature selection has received significant attention because it can improve multi-label learning accuracy. However, existing evolutionary multi-label feature selection methods suffer from an ineffective search process. In this study, we propose an evolutionary search process for multi-label feature selection. The proposed method creates a large set of offspring, or new feature subsets, and then retains the most promising feature subset. Experimental results demonstrate that the proposed method can identify feature subsets giving good multi-label classification accuracy much faster than conventional methods.
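
The "create a large offspring set, then retain the most promising subset" idea can be sketched as a simple evolutionary loop. This is an illustration under stated assumptions, not the authors' algorithm: the single-bit-flip mutation, the population of one parent, and the toy fitness function (a stand-in for multi-label classification accuracy) are all hypothetical.

```python
import random


def mutate(subset, num_features, rng):
    """Flip the membership of one randomly chosen feature."""
    return subset ^ {rng.randrange(num_features)}


def evolve(num_features, fitness, generations=30, offspring_per_gen=50, seed=0):
    """Generate many offspring per generation and keep only the best subset."""
    rng = random.Random(seed)
    parent = frozenset(i for i in range(num_features) if rng.random() < 0.5)
    parent_score = fitness(parent)
    for _ in range(generations):
        offspring = [mutate(parent, num_features, rng)
                     for _ in range(offspring_per_gen)]
        best_child = max(offspring, key=fitness)
        child_score = fitness(best_child)
        if child_score > parent_score:  # retain the most promising subset
            parent, parent_score = best_child, child_score
    return parent, parent_score


# Toy fitness (assumption): reward relevant features, penalize irrelevant ones.
RELEVANT = frozenset({0, 2, 5})


def toy_fitness(subset):
    return len(subset & RELEVANT) - 0.5 * len(subset - RELEVANT)


best_subset, best_score = evolve(num_features=8, fitness=toy_fitness)
```

In practice the fitness function would train and evaluate a multi-label classifier on the candidate feature subset, which is the expensive step that a more effective search process aims to call fewer times.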

Multi-modal Emotion Recognition using Semi-supervised Learning and Multiple Neural Networks in the Wild (준 지도학습과 여러 개의 딥 뉴럴 네트워크를 사용한 멀티 모달 기반 감정 인식 알고리즘)

  • Kim, Dae Ha;Song, Byung Cheol
    • Journal of Broadcast Engineering
    • /
    • v.23 no.3
    • /
    • pp.351-360
    • /
    • 2018
  • Human emotion recognition is a research topic that is receiving continuous attention in the computer vision and artificial intelligence domains. This paper proposes a method for classifying human emotions in a wild environment through multiple neural networks based on multi-modal signals consisting of image, landmark, and audio. The proposed method has the following features. First, the learning performance of the image-based network is greatly improved by employing both multi-task learning and semi-supervised learning using the spatio-temporal characteristics of videos. Second, a model for converting one-dimensional (1D) facial landmark information into two-dimensional (2D) images is newly proposed, and a CNN-LSTM network based on the model is proposed for better emotion recognition. Third, based on the observation that audio signals are often very effective for specific emotions, we propose an audio deep learning mechanism robust to the specific emotions. Finally, so-called emotion adaptive fusion is applied to enable synergy among the multiple networks. The proposed network improves emotion classification performance by appropriately integrating existing supervised learning and semi-supervised learning networks. In the fifth attempt on the given test set in the EmotiW2017 challenge, the proposed method achieved a classification accuracy of 57.12%.

No-Reference Image Quality Assessment based on Quality Awareness Feature and Multi-task Training

  • Lai, Lijing;Chu, Jun;Leng, Lu
    • Journal of Multimedia Information System
    • /
    • v.9 no.2
    • /
    • pp.75-86
    • /
    • 2022
  • Existing image quality assessment (IQA) datasets have a small number of samples. Some methods based on transfer learning or data augmentation cannot make good use of image quality-related features. A No Reference (NR)-IQA method based on multi-task training and quality awareness is proposed. First, single or multiple distortion types and levels are imposed on the original image, and different strategies are used to augment different types of distortion datasets. Following the idea of weak supervision, we use Full Reference (FR)-IQA methods to obtain the pseudo-score label of each generated image. Then, we combine the classification information of the distortion type and level with the information of the image quality score. The ResNet50 network is trained in the pre-training stage on the augmented dataset to obtain more quality-aware pre-training weights. Finally, fine-tuning is performed on the target IQA dataset using the quality-aware weights to predict the final quality score. Various experiments on synthetic-distortion and authentic-distortion datasets (LIVE, CSIQ, TID2013, LIVEC, KonIQ-10K) show that the proposed method can utilize image quality-related features better than a method using only single-task training. The extracted quality-aware features improve the accuracy of the model.
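
The augmentation step, in which a distortion is imposed and a full-reference metric supplies a pseudo-score label, can be sketched as follows. The abstract does not name the distortions or FR-IQA methods used, so Gaussian noise and PSNR are assumed stand-ins, and the function names, the noise scale of 5.0 per level, and the sample layout are hypothetical.

```python
import math
import random


def add_noise(image, level, rng):
    """Add Gaussian noise; an assumed stand-in for one distortion type."""
    sigma = 5.0 * level  # hypothetical mapping from level to noise strength
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]


def psnr(reference, distorted):
    """Peak signal-to-noise ratio, used here as a simple full-reference metric."""
    n = sum(len(row) for row in reference)
    mse = sum((a - b) ** 2
              for ref_row, dist_row in zip(reference, distorted)
              for a, b in zip(ref_row, dist_row)) / n
    return float("inf") if mse == 0 else 10.0 * math.log10(255.0 ** 2 / mse)


def make_samples(image, levels=(1, 2, 3), seed=0):
    """Produce distorted copies labeled with type, level, and pseudo-score."""
    rng = random.Random(seed)
    samples = []
    for level in levels:
        distorted = add_noise(image, level, rng)
        samples.append({
            "image": distorted,
            "type": "gaussian_noise",                  # classification label
            "level": level,                            # classification label
            "pseudo_score": psnr(image, distorted),    # regression label
        })
    return samples


# Usage: a flat gray 16x16 image as the pristine reference.
reference = [[128.0] * 16 for _ in range(16)]
samples = make_samples(reference)
```

Each sample then carries the three targets a multi-task pre-training stage would need: distortion type, distortion level, and a quality pseudo-score.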

A study on the optimal task-based instructional model: Focused on Korean EFL classroom practice (효율적인 과업중심 교수.학습모형 연구: EFL 교실 상황을 중심으로)

  • Jeon, In-Jae
    • English Language & Literature Teaching
    • /
    • v.11 no.4
    • /
    • pp.365-389
    • /
    • 2005
  • The purpose of this study is to present the most effective task model for English language teaching methodology, based on an investigation of task-based performance in Korean EFL classroom practice. The subjects were 538 high school students and 126 high school teachers, each of whom had used task-based activity materials for more than one year. The data were analyzed with SPSS WIN 11.0 using frequency distribution and chi-square analysis. The results of the questionnaire analysis showed that both teachers and students had a comparatively high level of satisfaction with task rationale, but that they had mixed responses regarding input data, settings, and activity types. To conclude, a few suggestions are made to provide meaningful considerations for EFL teachers and material developers: a) task goals and rationale that encourage the learner's positive motivation; b) authenticity of input data based on the real-world context; c) a collaborative learning environment that enhances communicative interaction; d) proportional representation of creative problem-solving activities related to discussions and decision-making processes; e) systematic introduction of integrated language skills. The study also suggests that the multi-lateral task model, which has some positive assets compared to previous task models, be newly introduced and applied to second language learning classrooms.

Effects of Multi-modal Guidance for the Acquisition of Sight Reading Skills: A Case Study with Simple Drum Sequences (멀티모달 가이던스가 독보 기능 습득에 미치는 영향: 드럼 타격 시퀀스에서의 사례 연구)

  • Lee, In;Choi, Seungmoon
    • The Journal of Korea Robotics Society
    • /
    • v.8 no.3
    • /
    • pp.217-227
    • /
    • 2013
  • We introduce a learning system for the sight reading of simple drum sequences. Sight reading is a cognitive-motor skill that requires reading of music symbols and actions of multiple limbs for playing the music. The system provides knowledge of results (KR) pertaining to the learner's performance by color-coding music symbols, and guides the learner by indicating the corresponding action for a given music symbol using additional auditory or vibrotactile cues. To evaluate the effects of KR and guidance cues, three learning methods were experimentally compared: KR only, KR with auditory cues, and KR with vibrotactile cues. The task was to play a random 16-note-long drum sequence displayed on a screen. Thirty university students learned the task using one of the learning methods in a between-subjects design. The experimental results did not show statistically significant differences between the methods in terms of task accuracy and completion time.

Safety and Efficiency Learning for Multi-Robot Manufacturing Logistics Tasks (다중 로봇 제조 물류 작업을 위한 안전성과 효율성 학습)

  • Minkyo Kang;Incheol Kim
    • The Journal of Korea Robotics Society
    • /
    • v.18 no.2
    • /
    • pp.225-232
    • /
    • 2023
  • With the recent increase in multiple robots cooperating in smart manufacturing logistics environments, it has become very important to predict the safety and efficiency of individual tasks and to dynamically assign each task to the best of the available robots. In this paper, we propose a novel task policy learner based on deep relational reinforcement learning for predicting the safety and efficiency of tasks in a multi-robot manufacturing logistics environment. To reduce learning complexity, the proposed system divides the entire safety/efficiency prediction process into two distinct steps: policy parameter estimation and rule-based policy inference. It also makes full use of domain-specific knowledge for policy rule learning. Through experiments conducted with virtual dynamic manufacturing logistics environments using NVIDIA's Isaac simulator, we show the effectiveness and superiority of the proposed system.

A light-weight Gender/Age Estimation model based on Multi-taking Deep Learning for an Embedded System (임베디드 시스템을 위한 멀티태스킹 딥러닝 학습 기반 경량화 성별/연령별 추정)

  • Bao, Huy-Tran Quoc;Chung, Sun-Tae
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2020.05a
    • /
    • pp.483-486
    • /
    • 2020
  • Age estimation and gender classification for humans are classic problems in computer vision. Most research focuses on only one of the two tasks, and the models are too heavy to run on low-cost systems. In our research, we aim to apply multi-task learning to perform both tasks on a lightweight model that can achieve good precision on an embedded system in real time.

X-ray Image Segmentation using Multi-task Learning

  • Park, Sejin;Jeong, Woojin;Moon, Young Shik
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.3
    • /
    • pp.1104-1120
    • /
    • 2020
  • Chest X-rays are a common way to diagnose lung cancer or pneumonia. In particular, finding lung nodules is the most important problem in the early detection of lung cancer. Recently, many automatic diagnosis algorithms have been studied to find lung nodules missed by doctors. These algorithms are typically based on a segmentation network such as U-Net. However, false positives that resemble lung nodules but lie outside the lungs can severely degrade performance. In this study, we propose a multi-task learning method that simultaneously learns from lung region and nodule-labeled data, based on the prior knowledge that lung nodules exist only in the lung. The proposed method significantly reduces false positives outside the lung and improves the recognition rate of lung nodules to an F1 score of 83.8, compared with 66.6 for single-task learning with the U-Net model. The experimental results on the JSRT public dataset demonstrate the effectiveness of the proposed method compared with other baseline methods.
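
The prior that nodules exist only inside the lung, and its effect on the F1 score, can be shown with a small pixel-level sketch. Note the assumptions: the paper learns the two tasks jointly during training, whereas this example only applies the prior as a post-hoc mask at inference time; the pixel-wise F1 definition, the 0.5 threshold, and the toy arrays are also hypothetical.

```python
def suppress_outside_lung(nodule_prob, lung_mask, threshold=0.5):
    """Keep a nodule prediction only where the lung mask is 1."""
    return [[1 if p >= threshold and m == 1 else 0
             for p, m in zip(prob_row, mask_row)]
            for prob_row, mask_row in zip(nodule_prob, lung_mask)]


def f1_score(pred, truth):
    """Pixel-wise F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if undefined."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy 2x3 example: two confident false positives lie outside the lung.
nodule_prob = [[0.9, 0.8, 0.1],
               [0.2, 0.7, 0.9]]
lung_mask = [[0, 1, 1],
             [1, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 0]]

everywhere = [[1, 1, 1], [1, 1, 1]]  # no anatomical prior applied
f1_without_prior = f1_score(suppress_outside_lung(nodule_prob, everywhere), truth)
f1_with_prior = f1_score(suppress_outside_lung(nodule_prob, lung_mask), truth)
```

In this toy case the mask removes both out-of-lung false positives, which is the failure mode the abstract attributes to single-task segmentation.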

Fast and Robust Face Detection based on CNN in Wild Environment (CNN 기반의 와일드 환경에 강인한 고속 얼굴 검출 방법)

  • Song, Junam;Kim, Hyung-Il;Ro, Yong Man
    • Journal of Korea Multimedia Society
    • /
    • v.19 no.8
    • /
    • pp.1310-1319
    • /
    • 2016
  • Face detection is the first step in a wide range of face applications. However, detecting faces in the wild is still a challenging task due to the wide range of variations in pose, scale, and occlusion. Recently, many deep learning methods have been proposed for face detection, but further improvements are required in the wild. Another important issue in face detection is computational complexity. Current state-of-the-art deep learning methods require a large number of patches to deal with varying scales and arbitrary image sizes, which results in increased computational complexity. To reduce the complexity while achieving better detection accuracy, we propose a fully convolutional network-based face detector that can take arbitrarily sized input and produce feature maps (heat maps) corresponding to the input image size. To deal with the various face scales, we propose a multi-scale network architecture that utilizes facial components when learning the feature maps. On top of this, we design a multi-task learning technique to improve detection performance. Extensive experiments have been conducted on the FDDB dataset. The experimental results show that the proposed method outperforms state-of-the-art methods with an accuracy of 82.33% at 517 false alarms, while significantly improving computational efficiency.