• Title/Summary/Keyword: long short-term memory structure (장단기 기억 구조)

Search results: 11

Performance comparison of various deep neural network architectures using Merlin toolkit for a Korean TTS system (Merlin 툴킷을 이용한 한국어 TTS 시스템의 심층 신경망 구조 성능 비교)

  • Hong, Junyoung; Kwon, Chulhong
    • Phonetics and Speech Sciences / v.11 no.2 / pp.57-64 / 2019
  • In this paper, we construct a Korean text-to-speech (TTS) system using the Merlin toolkit, an open-source system for speech synthesis. HMM-based statistical parametric speech synthesis is widely used in TTS systems, but the quality of the synthesized speech is known to degrade because of limitations in its acoustic modeling scheme, which includes context factors. We therefore propose acoustic modeling architectures based on deep neural networks, which have shown excellent performance in various fields. The architectures compared are a fully connected deep feedforward neural network (DNN), a recurrent neural network (RNN), a gated recurrent unit (GRU) network, long short-term memory (LSTM), and bidirectional LSTM (BLSTM). Experimental results show that including sequence modeling in the architecture improves performance, with the LSTM and BLSTM architectures performing best. Including delta and delta-delta components in the acoustic feature parameters was also found to improve performance.
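To make "delta and delta-delta components" concrete, the sketch below appends them to each frame of a feature track by applying fixed regression windows over neighbouring frames. The window coefficients [-0.5, 0, 0.5] and [1, -2, 1] and the edge handling (repeating the first and last frame) are assumptions, a common choice in statistical parametric synthesis; the abstract does not state the exact windows used.

```python
# Assumed regression windows (not given in the abstract).
DELTA_WIN = [-0.5, 0.0, 0.5]
ACC_WIN = [1.0, -2.0, 1.0]


def apply_window(track, window):
    """Apply a centered window to a 1-D feature track, repeating edge frames."""
    half = len(window) // 2
    n = len(track)
    out = []
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(window):
            idx = min(max(t + k - half, 0), n - 1)  # clamp to a valid frame index
            acc += w * track[idx]
        out.append(acc)
    return out


def add_dynamic_features(frames):
    """frames: list of per-frame feature vectors of equal length.
    Returns one vector per frame laid out as [static, delta, delta-delta]."""
    dims = len(frames[0])
    tracks = [[f[d] for f in frames] for d in range(dims)]
    deltas = [apply_window(tr, DELTA_WIN) for tr in tracks]
    accs = [apply_window(tr, ACC_WIN) for tr in tracks]
    return [
        list(frames[t])
        + [deltas[d][t] for d in range(dims)]
        + [accs[d][t] for d in range(dims)]
        for t in range(len(frames))
    ]


frames = [[0.0], [1.0], [4.0], [9.0]]  # one feature dimension, four frames
print(add_dynamic_features(frames))
# → [[0.0, 0.5, 1.0], [1.0, 2.0, 2.0], [4.0, 4.0, 2.0], [9.0, 2.5, -5.0]]
```

Each static dimension d thus becomes three dimensions, so the acoustic model's output layer triples in size.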

Background subtraction using LSTM and spatial recurrent neural network (장단기 기억 신경망과 공간적 순환 신경망을 이용한 배경차분)

  • Choo, Sungkwon; Cho, Nam Ik
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2016.11a / pp.13-16 / 2016
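  • In this paper, we propose an algorithm that separates the background and foreground of a video using recurrent neural networks. A recurrent neural network is a network built so that, for a sequence of inputs, information from previous inputs can persist through an internal loop. Among the various recurrent architectures, we use long short-term memory networks (LSTM) so that the network can also respond to long-term relationships. Because spatial correlations in a video, and not only its temporal connections, affect whether a region is background or foreground, we apply a spatial recurrent neural network so that information in the hidden layers can propagate spatially. The proposed algorithm achieves results comparable to existing algorithms on standard background subtraction videos.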
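The "internal loop" described above is the recurrence that carries a hidden state h (and, for an LSTM, a cell state c) from one input to the next. A minimal sketch of a single-unit LSTM unrolled over a short sequence, using the standard input/forget/output gate equations; the weights are arbitrary illustrative values, not the authors', and the spatial recurrence they add across the frame is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, p):
    """One step of a single-unit LSTM; p maps weight names to floats."""
    i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])  # candidate cell value
    c_new = f * c + i * g         # additive update lets information persist
    h_new = o * math.tanh(c_new)  # hidden state passed to the next step
    return h_new, c_new


def run_lstm(xs, p):
    """Unroll the cell over a sequence, returning the hidden state per step."""
    h, c = 0.0, 0.0
    hs = []
    for x in xs:  # the loop: state from step t feeds step t + 1
        h, c = lstm_step(x, h, c, p)
        hs.append(h)
    return hs


# Arbitrary illustrative parameters (forget-gate bias 1.0 favours retention).
params = {"wi": 0.5, "ui": 0.1, "bi": 0.0,
          "wf": 0.5, "uf": 0.1, "bf": 1.0,
          "wo": 0.5, "uo": 0.1, "bo": 0.0,
          "wg": 1.0, "ug": 0.2, "bg": 0.0}
print(run_lstm([1.0, 0.0, 0.0, 0.0], params))
```

With a single nonzero input at the first step, the hidden state remains positive at later steps, which is the persistence of earlier information that the abstract attributes to the recurrent loop.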

Vocal and nonvocal separation using combination of kernel model and long-short term memory networks (커널 모델과 장단기 기억 신경망을 결합한 보컬 및 비보컬 분리)

  • Cho, Hye-Seung; Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.36 no.4 / pp.261-266 / 2017
  • In this paper, we propose a vocal and nonvocal separation method that combines a kernel model with LSTM (long short-term memory) networks. Conventional vocal and nonvocal separation methods estimate a vocal component even in sections where only nonvocal components exist, which causes source estimation errors. To overcome this limitation, we combine an existing kernel-based separation method with vocal/nonvocal classification based on LSTM networks. We propose two combination structures: a parallel combined separation algorithm and a series combined separation algorithm. Experimental results verify that the proposed method achieves better separation performance than conventional approaches.
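One plausible reading of the series combination is frame-level gating: an LSTM classifier first labels each frame as vocal or nonvocal, and the kernel-based separator's vocal estimate is suppressed in frames labeled nonvocal, so no vocal component is estimated where none exists. The sketch assumes per-frame vocal probabilities and magnitude spectra are already available; the 0.5 threshold and the rule of returning suppressed energy to the accompaniment are illustrative assumptions, not the authors' algorithm.

```python
def series_combine(vocal_est, accomp_est, vocal_prob, threshold=0.5):
    """Gate a separator's per-frame estimates with a vocal/nonvocal classifier.

    vocal_est, accomp_est: lists of frames, each a list of spectral magnitudes
        (assumed output of a kernel-based separator).
    vocal_prob: per-frame probability that the frame contains vocals
        (assumed output of an LSTM classifier).
    """
    vocal_out, accomp_out = [], []
    for v, a, p in zip(vocal_est, accomp_est, vocal_prob):
        if p >= threshold:
            # Frame classified as vocal: keep the separator's estimates.
            vocal_out.append(list(v))
            accomp_out.append(list(a))
        else:
            # Frame classified as nonvocal: assign all energy to accompaniment.
            vocal_out.append([0.0] * len(v))
            accomp_out.append([vi + ai for vi, ai in zip(v, a)])
    return vocal_out, accomp_out


vocal = [[0.5, 0.25], [0.25, 0.5]]
accomp = [[0.5, 0.75], [0.75, 0.5]]
v_out, a_out = series_combine(vocal, accomp, [0.9, 0.1])
print(v_out, a_out)
```

In this reading the classifier acts as a hard gate after separation; a parallel structure would instead run both components independently and merge their outputs.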

Neural Architecture Search for Korean Text Classification (한국어 문서 분류를 위한 신경망 구조 탐색)

  • Ji, ByoungKyu
    • Annual Conference on Human and Language Technology / 2023.10a / pp.125-130 / 2023
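  • Interest in Korean natural language processing with deep neural networks has recently grown, but there has been no research on searching for neural network architectures suited to Korean natural language processing. In this paper, we used a reinforcement learning algorithm that takes document classification accuracy as its reward to search, with a long short-term memory network, for deep neural network architectures suitable for Korean document classification, and we analyzed both the performance of the Korean embeddings pre-trained for the search and the architectures found. Compared with existing Korean natural language processing models on four Korean document classification tasks, the architectures found through the search generally performed better and were more efficient because of their smaller model size.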
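The reinforcement-learning search described above can be illustrated with a REINFORCE-style policy gradient: a controller samples an architecture, receives its classification accuracy as the reward, and shifts probability toward choices that beat a running baseline. In this sketch the controller is simplified to a table of softmax logits per decision (the abstract indicates an LSTM network is used in the search), and both the search space and `toy_accuracy` are hypothetical stand-ins for actually training and evaluating a classifier.

```python
import math
import random

# Hypothetical search space: one independent decision per key.
SEARCH_SPACE = {
    "layers": [1, 2, 3],
    "hidden": [64, 128, 256],
}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_accuracy(arch):
    """Hypothetical reward; a real search would train and evaluate a model."""
    acc = 0.6
    if arch["layers"] == 2:
        acc += 0.1
    if arch["hidden"] == 128:
        acc += 0.2
    return acc


def search(steps=2000, lr=0.2, seed=0):
    rng = random.Random(seed)
    logits = {k: [0.0] * len(opts) for k, opts in SEARCH_SPACE.items()}
    baseline = None
    for _ in range(steps):
        # Sample one option index per decision from the current policy.
        picks = {k: rng.choices(range(len(opts)), weights=softmax(logits[k]))[0]
                 for k, opts in SEARCH_SPACE.items()}
        reward = toy_accuracy({k: SEARCH_SPACE[k][i] for k, i in picks.items()})
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
        # REINFORCE: d log softmax / d logit_j = 1[j == pick] - p_j.
        for k, i in picks.items():
            probs = softmax(logits[k])
            for j, p in enumerate(probs):
                logits[k][j] += lr * advantage * ((1.0 if j == i else 0.0) - p)
    # Report the most probable option for each decision.
    return {k: SEARCH_SPACE[k][max(range(len(v)), key=v.__getitem__)]
            for k, v in logits.items()}


print(search())
```

Subtracting a baseline does not bias the gradient but reduces its variance, which matters when each reward evaluation means training a full classifier.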

LSTM Hyperparameter Optimization for an EEG-Based Efficient Emotion Classification in BCI (BCI에서 EEG 기반 효율적인 감정 분류를 위한 LSTM 하이퍼파라미터 최적화)

  • Aliyu, Ibrahim; Mahmood, Raja Majid; Lim, Chang-Gyoon
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.14 no.6 / pp.1171-1180 / 2019
  • Emotion is a psycho-physiological process that plays an important role in human interaction. Affective computing centers on developing human-aware artificial intelligence that can understand and regulate emotions. This field is also important because mental disorders such as depression, autism, attention deficit hyperactivity disorder, and game addiction are associated with emotion. Despite efforts in emotion recognition and emotion detection from nonstationary signals, detecting emotions from abnormal EEG signals requires sophisticated learning algorithms because it demands a high level of abstraction. In this paper, we investigate LSTM hyperparameters for optimal classification of emotion from EEG. We present the results of several experiments, from which an optimal LSTM hyperparameter configuration was obtained.

Prediction of Jamming Techniques by Using LSTM (LSTM을 이용한 재밍 기법 예측)

  • Lee, Gyeong-Hoon; Jo, Jeil; Park, Cheong Hee
    • Journal of the Korea Institute of Military Science and Technology / v.22 no.2 / pp.278-286 / 2019
  • Conventional methods for selecting jamming techniques in electronic warfare rely on libraries that record a list of jamming techniques for radar signals. However, library-based selection is limited when modified signals are received. In this paper, we propose a method that predicts the jamming technique for radar signals using deep learning. Long short-term memory (LSTM) is a deep learning method that is effective at learning time-dependent relationships in sequential data. To determine the optimal LSTM model structure for jamming technique prediction, we test values of the learning parameters that must be selected, such as the number of LSTM layers, the number of fully connected layers, the optimization method, the mini-batch size, and the dropout ratio. Experimental results demonstrate that the LSTM model performs competently in predicting the jamming technique for radar signals.
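Testing combinations of the hyperparameters listed above amounts to a grid search. A minimal sketch that enumerates every configuration and keeps the best-scoring one; the candidate values and `fake_evaluate` are hypothetical placeholders (the abstract does not list the values tested), and in practice the evaluation function would train the LSTM and return validation accuracy.

```python
from itertools import product

# Hypothetical candidate values for the hyperparameters named in the abstract.
GRID = {
    "lstm_layers": [1, 2, 3],
    "fc_layers": [1, 2],
    "optimizer": ["sgd", "adam", "rmsprop"],
    "batch_size": [32, 64, 128],
    "dropout": [0.0, 0.2, 0.5],
}


def grid_search(evaluate, grid=GRID):
    """Try every combination; return (best_score, best_config).
    Ties keep the first configuration encountered."""
    keys = list(grid)
    best_score, best_cfg = float("-inf"), None
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = evaluate(cfg)  # e.g. validation accuracy after training
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_score, best_cfg


def fake_evaluate(cfg):
    """Hypothetical scorer standing in for train-and-validate."""
    score = 0.8
    score += 0.05 if cfg["lstm_layers"] == 2 else 0.0
    score += 0.03 if cfg["optimizer"] == "adam" else 0.0
    score -= 0.1 * abs(cfg["dropout"] - 0.2)
    return score


score, cfg = grid_search(fake_evaluate)
print(len(list(product(*GRID.values()))), cfg)
```

The grid grows multiplicatively (162 configurations here), which is why studies like this one often vary a few parameters at a time or switch to random search.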

Machine learning model for residual chlorine prediction in sediment basin to control pre-chlorination in water treatment plant (정수장 전염소 공정제어를 위한 침전지 잔류염소농도 예측 머신러닝 모형)

  • Kim, Juhwan; Lee, Kyunghyuk; Kim, Soojun; Kim, Kyunghun
    • Journal of Korea Water Resources Association / v.55 no.spc1 / pp.1283-1293 / 2022
  • The purpose of this study is to predict residual chlorine with artificial intelligence algorithms in order to maintain a stable residual chlorine concentration in the sedimentation basin of a water treatment process that employs pre-chlorination. Available water quantity and quality data were collected and analyzed statistically, then applied to a multiple regression model and to artificial intelligence models including a multi-layer perceptron neural network, random forest, and long short-term memory (LSTM). Water temperature, turbidity, pH, conductivity, flow rate, alkalinity, and pre-chlorination dosage were used as input parameters for the prediction models. Among the four models (LSTM, multi-layer perceptron, multiple regression, and random forest), the random forest algorithm gave the most suitable prediction results. In particular, the multiple regression model could not represent residual chlorine from input parameters that vary independently with seasonal change and differ in numerical scale and dimension between quantity and quality data. For this reason, the random forest model, a decision-tree-based algorithm, is more appropriate for predicting water quality than the other algorithms. Real-time prediction by artificial intelligence models is also expected to support stable residual chlorine control in water treatment plants that use pre-chlorination.

A Survey on Neural Networks Using Memory Component (메모리 요소를 활용한 신경망 연구 동향)

  • Lee, Jihwan; Park, Jinuk; Kim, Jaehyung; Kim, Jaein; Roh, Hongchan; Park, Sanghyun
    • KIPS Transactions on Software and Data Engineering / v.7 no.8 / pp.307-324 / 2018
  • Recently, recurrent neural networks have attracted attention for solving prediction problems on sequential data, because their structure accounts for time dependency. However, as the number of time steps in the sequential data increases, the vanishing gradient problem occurs. Long short-term memory models were proposed to solve this problem, but they are limited in how much data they can store and how long they can preserve it. Research on memory-augmented neural networks (MANN), learning models that combine recurrent neural networks with memory elements, has therefore been actively conducted. In this paper, we describe the structure and characteristics of MANN models, which have emerged as a hot topic in deep learning, and present the latest techniques that use MANN and directions for future research.
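A common way for a MANN to read its external memory is content-based addressing, as in the Neural Turing Machine: the controller emits a key vector, each memory row is scored by cosine similarity to the key, the scores are sharpened by a key strength β and normalized with a softmax, and the read vector is the weighted sum of rows. A minimal sketch of that read step; the memory contents, key, and β are made-up values, and the other addressing and write mechanisms covered by the survey are not shown.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)  # epsilon guards against zero-norm vectors


def content_read(memory, key, beta):
    """Weights = softmax(beta * cosine(key, row)); read = sum_i w_i * row_i."""
    scores = [beta * cosine(key, row) for row in memory]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(memory[0])
    read = [sum(w * row[i] for w, row in zip(weights, memory))
            for i in range(width)]
    return weights, read


memory = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.7, 0.7, 0.0]]
weights, read = content_read(memory, key=[1.0, 0.1, 0.0], beta=10.0)
print([round(w, 3) for w in weights], [round(r, 3) for r in read])
```

Because the read is a differentiable weighted sum rather than a hard lookup, the whole memory access can be trained end to end with gradient descent.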

Unsupervised Vortex-induced Vibration Detection Using Data Synthesis (합성데이터를 이용한 비지도학습 기반 실시간 와류진동 탐지모델)

  • Lee, Sunho; Kim, Sunjoong
    • Journal of the Computational Structural Engineering Institute of Korea / v.36 no.5 / pp.315-321 / 2023
  • Long-span bridges are flexible structures with low natural frequencies and damping ratios, which makes them susceptible to vibration serviceability problems. However, the current design guideline of South Korea assumes a uniform threshold of wind speed or vibration amplitude to assess whether harmful vibrations occur, potentially overlooking the complex vibration patterns observed in long-span bridges. In this study, we propose a pointwise vortex-induced vibration (VIV) detection method using a deep-learning-based signal segmentation model. Instead of the conventional supervised approach of data acquisition and manual labeling, we synthesize training data by generating sinusoidal waves with an envelope that accurately represent VIV. A Fourier synchrosqueezed transform is used to extract time-frequency features, which serve as input for training a bidirectional long short-term memory model. The effectiveness of the model trained on synthetic VIV data is demonstrated through comparison with a counterpart trained on manually labeled real data from an actual cable-supported bridge.
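The data-synthesis step (sinusoids shaped by an envelope, so that every sample carries a VIV or no-VIV label without manual labeling) can be sketched as follows. The Hann-shaped envelope, sampling rate, frequency, noise level, and the rule of labeling samples where the envelope exceeds 0.1 (10% of its peak of 1) are all assumptions for illustration; the abstract does not give the authors' parameters, and the synchrosqueezed transform and BiLSTM that consume these data are not shown.

```python
import math
import random


def synth_viv(n_samples=1000, fs=50.0, freq=1.2, start=300, length=400,
              amp=1.0, noise=0.05, seed=0):
    """Return (signal, labels): an enveloped sinusoid embedded in noise.

    Samples in [start, start + length) are shaped by a Hann envelope;
    a sample is labeled 1 (VIV) where the envelope exceeds 0.1.
    """
    rng = random.Random(seed)
    signal, labels = [], []
    for n in range(n_samples):
        t = n / fs
        if start <= n < start + length:
            phase = (n - start) / (length - 1)
            env = 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))  # Hann shape
        else:
            env = 0.0
        x = amp * env * math.sin(2.0 * math.pi * freq * t)
        signal.append(x + rng.gauss(0.0, noise))  # additive Gaussian noise
        labels.append(1 if env > 0.1 else 0)      # pointwise VIV label
    return signal, labels


sig, lab = synth_viv()
print(len(sig), sum(lab))
```

Because the labels come directly from the envelope used to generate the signal, varying frequency, amplitude, and envelope position yields arbitrarily many labeled examples at no annotation cost.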

A Deep Learning Based Approach to Recognizing Accompanying Status of Smartphone Users Using Multimodal Data (스마트폰 다종 데이터를 활용한 딥러닝 기반의 사용자 동행 상태 인식)

  • Kim, Kilho; Choi, Sangwoo; Chae, Moon-jung; Park, Heewoong; Lee, Jaehong; Park, Jonghun
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.163-177 / 2019
  • As smartphones have become widely used, human activity recognition (HAR) tasks that recognize the personal activities of smartphone users from multimodal data have recently been actively studied. The research area is expanding from recognizing the simple body movements of an individual user to recognizing low-level and high-level behavior. However, HAR tasks for recognizing interaction with other people, such as whether the user is accompanying or communicating with someone else, have received less attention so far. Previous research on recognizing interaction behavior has usually relied on audio, Bluetooth, and Wi-Fi sensors, which raise privacy concerns and require much time to collect enough data. In contrast, physical sensors such as accelerometers, magnetic field sensors, and gyroscopes raise fewer privacy concerns and can collect a large amount of data in a short time. In this paper, we propose a method for detecting accompanying status with a deep learning model that uses only multimodal physical sensor data from an accelerometer, a magnetic field sensor, and a gyroscope. Accompanying status is defined by redefining part of the user's interaction behavior: whether the user is accompanying an acquaintance at close distance and whether the user is actively communicating with that acquaintance. We propose a framework based on convolutional neural networks (CNN) and long short-term memory (LSTM) recurrent networks for classifying accompanying and conversation. First, we introduce a data preprocessing method consisting of time synchronization of multimodal data from different physical sensors, data normalization, and sequence data generation. We applied nearest-neighbor interpolation to synchronize the timestamps of data collected from different sensors.
Normalization was performed separately for each x, y, and z axis of the sensor data, and sequence data were generated with a sliding window. The sequence data then became the input to the CNN, which extracts feature maps representing local dependencies in the original sequence. The CNN consisted of three convolutional layers and had no pooling layer, in order to preserve the temporal information of the sequence data. Next, the LSTM recurrent networks received the feature maps, learned long-term dependencies from them, and extracted features. The LSTM recurrent networks consisted of two layers, each with 128 cells. Finally, the extracted features were classified by a softmax classifier. The loss function was cross-entropy, and the model weights were randomly initialized from a normal distribution with mean 0 and standard deviation 0.1. The model was trained with the adaptive moment estimation (Adam) optimization algorithm and a mini-batch size of 128. Dropout was applied to the inputs of the LSTM recurrent networks to prevent overfitting. The initial learning rate was 0.001, and it decayed exponentially by a factor of 0.99 at the end of each training epoch. An Android smartphone application was developed and released to collect data, and smartphone data were collected from a total of 18 subjects. Using these data, the model classified accompanying and conversation with 98.74% and 98.83% accuracy, respectively. Both the F1 score and the accuracy of the model were higher than those of a majority vote classifier, a support vector machine, and a deep recurrent neural network. In future research, we will focus on more rigorous multimodal sensor data synchronization methods that minimize timestamp differences. We will also study transfer learning methods that allow models trained on the training data to be transferred to evaluation data that follow a different distribution.
This is expected to yield a model that maintains robust recognition performance under changes in the data that were not considered during training.
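The preprocessing steps and learning-rate schedule described above can be sketched in plain Python: nearest-neighbour resampling to align sensors on a common clock, per-axis normalization, sliding-window sequence generation, and exponential decay from 0.001 by a factor of 0.99 per epoch. The z-score form of normalization, the window size and step, and the integer millisecond timestamps are assumptions; the abstract says only that each axis was normalized and that a sliding window was used.

```python
import bisect


def nearest_resample(times, values, target_times):
    """Nearest-neighbour interpolation onto target_times (times sorted ascending).
    Ties go to the earlier sample."""
    out = []
    for t in target_times:
        i = bisect.bisect_left(times, t)
        if i == 0:
            j = 0
        elif i == len(times):
            j = len(times) - 1
        else:
            j = i if times[i] - t < t - times[i - 1] else i - 1
        out.append(values[j])
    return out


def zscore_per_axis(samples):
    """Normalize each axis (column) of [[x, y, z], ...] to zero mean, unit std."""
    stats = []
    for col in zip(*samples):
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5 or 1.0  # constant axis -> 1
        stats.append((mean, std))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in samples]


def sliding_windows(samples, size, step):
    """Cut a sequence into fixed-length, possibly overlapping windows."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


def learning_rate(epoch, initial=0.001, decay=0.99):
    """Rate after `epoch` completed epochs, multiplied by `decay` each epoch."""
    return initial * decay ** epoch


# Usage with made-up accelerometer readings at integer millisecond timestamps.
times = [0, 20, 40, 60]
xyz = [[0.1, 0.0, 9.8], [0.2, 0.1, 9.7], [0.3, 0.0, 9.9], [0.1, 0.2, 9.8]]
aligned = nearest_resample(times, xyz, [0, 15, 55])
windows = sliding_windows(zscore_per_axis(aligned), size=2, step=1)
print(len(windows), learning_rate(10))
```

Each window would then form one input sequence for the CNN, with the three sensors' aligned axes concatenated per time step.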