• Title/Summary/Keyword: Cross-Entropy

Search Result 113, Processing Time 0.023 seconds

A study on end-to-end speaker diarization system using single-label classification (단일 레이블 분류를 이용한 종단 간 화자 분할 시스템 성능 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.6
    • /
    • pp.536-543
    • /
    • 2023
  • Speaker diarization, which labels for "who spoken when?" in speech with multiple speakers, has been studied on a deep neural network-based end-to-end method for labeling on speech overlap and optimization of speaker diarization models. Most deep neural network-based end-to-end speaker diarization systems perform multi-label classification problem that predicts the labels of all speakers spoken in each frame of speech. However, the performance of the multi-label-based model varies greatly depending on what the threshold is set to. In this paper, it is studied a speaker diarization system using single-label classification so that speaker diarization can be performed without thresholds. The proposed model estimate labels from the output of the model by converting speaker labels into a single label. To consider speaker label permutations in the training, the proposed model is used a combination of Permutation Invariant Training (PIT) loss and cross-entropy loss. In addition, how to add the residual connection structures to model is studied for effective learning of speaker diarization models with deep structures. The experiment used the Librispech database to generate and use simulated noise data for two speakers. When compared with the proposed method and baseline model using the Diarization Error Rate (DER) performance the proposed method can be labeling without threshold, and it has improved performance by about 20.7 %.

A Management Plan According to the Estimation of Nutria (Myocastorcoypus) Distribution Density and Potential Suitable Habitat (뉴트리아(Myocastor coypus) 분포밀도 및 잠재적 서식가능지역 예측에 따른 관리방향)

  • Kim, Areum;Kim, Young-Chae;Lee, Do-Hun
    • Journal of Environmental Impact Assessment
    • /
    • v.27 no.2
    • /
    • pp.203-214
    • /
    • 2018
  • The purpose of this study is to estimate the concentrated distribution area of nutria (Myocastor coypus) and potential suitable habitat and to provide useful data for the effective management direction setting. Based on the nationwide distribution data of nutria, the cross-validation value was applied to analyze the distribution density. As a result, the concentrated distribution areas thatrequired preferential elimination is found in 14 administrative areas including Busan Metropolitan City, Daegu Metropolitan City, 11 cities and counties in Gyeongsangnam-do and 1 county in Gyeongsangbuk-do. In the potential suitable habitat estimation using a MaxEnt (Maximum Entropy) model, the possibility of emergency was found in the Nakdong River middle and lower stream area and the Seomjin riverlower stream area and Gahwacheon River area. As for the contribution by variables of a model, it showed DEM, precipitation of driest month, min temperature of coldest month and distance from river had contribution from the highest order. In terms of the relation with the probability of appearance, the probability of emergence was higher than the threshold value in areas with less than 34m of altitude, with $-5.7^{\circ}C{\sim}-0.6^{\circ}C$ of min temperature of the coldest month, with 15-30mm of precipitation of the driest month and with less than 1,373m away from the river. Variables that Altitude, existence of water and wintertemperature affected settlement and expansion of nutria, considering the research results and the physiological and ecological characteristics of nutria. Therefore, it is necessary to reflect them as important variables in the future habitable area detection and expansion estimation modeling. It must be essential to distinguish the concentrated distribution area and the management area of invasive alien species such as nutria and to establish and apply a suitable management strategy to the management site for the permanent control. The results in this study can be used as useful data for a strategic management such as rapid management on the preferential management area and preemptive and preventive management on the possible spreading area.

A Deep Learning Based Approach to Recognizing Accompanying Status of Smartphone Users Using Multimodal Data (스마트폰 다종 데이터를 활용한 딥러닝 기반의 사용자 동행 상태 인식)

  • Kim, Kilho;Choi, Sangwoo;Chae, Moon-jung;Park, Heewoong;Lee, Jaehong;Park, Jonghun
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.163-177
    • /
    • 2019
  • As smartphones are getting widely used, human activity recognition (HAR) tasks for recognizing personal activities of smartphone users with multimodal data have been actively studied recently. The research area is expanding from the recognition of the simple body movement of an individual user to the recognition of low-level behavior and high-level behavior. However, HAR tasks for recognizing interaction behavior with other people, such as whether the user is accompanying or communicating with someone else, have gotten less attention so far. And previous research for recognizing interaction behavior has usually depended on audio, Bluetooth, and Wi-Fi sensors, which are vulnerable to privacy issues and require much time to collect enough data. Whereas physical sensors including accelerometer, magnetic field and gyroscope sensors are less vulnerable to privacy issues and can collect a large amount of data within a short time. In this paper, a method for detecting accompanying status based on deep learning model by only using multimodal physical sensor data, such as an accelerometer, magnetic field and gyroscope, was proposed. The accompanying status was defined as a redefinition of a part of the user interaction behavior, including whether the user is accompanying with an acquaintance at a close distance and the user is actively communicating with the acquaintance. A framework based on convolutional neural networks (CNN) and long short-term memory (LSTM) recurrent networks for classifying accompanying and conversation was proposed. First, a data preprocessing method which consists of time synchronization of multimodal data from different physical sensors, data normalization and sequence data generation was introduced. We applied the nearest interpolation to synchronize the time of collected data from different sensors. Normalization was performed for each x, y, z axis value of the sensor data, and the sequence data was generated according to the sliding window method. Then, the sequence data became the input for CNN, where feature maps representing local dependencies of the original sequence are extracted. The CNN consisted of 3 convolutional layers and did not have a pooling layer to maintain the temporal information of the sequence data. Next, LSTM recurrent networks received the feature maps, learned long-term dependencies from them and extracted features. The LSTM recurrent networks consisted of two layers, each with 128 cells. Finally, the extracted features were used for classification by softmax classifier. The loss function of the model was cross entropy function and the weights of the model were randomly initialized on a normal distribution with an average of 0 and a standard deviation of 0.1. The model was trained using adaptive moment estimation (ADAM) optimization algorithm and the mini batch size was set to 128. We applied dropout to input values of the LSTM recurrent networks to prevent overfitting. The initial learning rate was set to 0.001, and it decreased exponentially by 0.99 at the end of each epoch training. An Android smartphone application was developed and released to collect data. We collected smartphone data for a total of 18 subjects. Using the data, the model classified accompanying and conversation by 98.74% and 98.83% accuracy each. Both the F1 score and accuracy of the model were higher than the F1 score and accuracy of the majority vote classifier, support vector machine, and deep recurrent neural network. In the future research, we will focus on more rigorous multimodal sensor data synchronization methods that minimize the time stamp differences. In addition, we will further study transfer learning method that enables transfer of trained models tailored to the training data to the evaluation data that follows a different distribution. It is expected that a model capable of exhibiting robust recognition performance against changes in data that is not considered in the model learning stage will be obtained.