• Title/Summary/Keyword: 음성망

Search Result 874, Processing Time 0.028 seconds

Query Normalization Using P-tuning of Large Pre-trained Language Model (Large Pre-trained Language Model의 P-tuning을 이용한 질의 정규화)

  • Suh, Soo-Bin;In, Soo-Kyo;Park, Jin-Seong;Nam, Kyeong-Min;Kim, Hyeon-Wook;Moon, Ki-Yoon;Hwang, Won-Yo;Kim, Kyung-Duk;Kang, In-Ho
    • Annual Conference on Human and Language Technology
    • /
    • 2021.10a
    • /
    • pp.396-401
    • /
    • 2021
  • 초거대 언어모델를 활용한 퓨샷(few shot) 학습법은 여러 자연어 처리 문제에서 좋은 성능을 보였다. 하지만 데이터를 활용한 추가 학습으로 문제를 추론하는 것이 아니라, 이산적인 공간에서 퓨샷 구성을 통해 문제를 정의하는 방식은 성능 향상에 한계가 존재한다. 이를 해결하기 위해 초거대 언어모델의 모수 전체가 아닌 일부를 추가 학습하거나 다른 신경망을 덧붙여 연속적인 공간에서 추론하는 P-tuning과 같은 데이터 기반 추가 학습 방법들이 등장하였다. 본 논문에서는 문맥에 따른 질의 정규화 문제를 대화형 음성 검색 서비스에 맞게 직접 정의하였고, 초거대 언어모델을 P-tuning으로 추가 학습한 경우 퓨샷 학습법 대비 정확도가 상승함을 보였다.

  • PDF

Compression of CNN Using Local Nonlinear Quantization in MPEG-NNR (MPEG-NNR 의 지역 비선형 양자화를 이용한 CNN 압축)

  • Lee, Jeong-Yeon;Moon, Hyeon-Cheol;Kim, Sue-Jeong;Kim, Jae-Gon
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2020.07a
    • /
    • pp.662-663
    • /
    • 2020
  • 최근 MPEG 에서는 인공신경망 모델을 다양한 딥러닝 프레임워크에서 상호운용 가능한 포맷으로 압축 표현할 수 있는 NNR(Compression of Neural Network for Multimedia Content Description and Analysis) 표준화를 진행하고 있다. 본 논문에서는 MPEG-NNR 에서 CNN 모델을 압축하기 위한 지역 비선형 양자화(Local Non-linear Quantization: LNQ) 기법을 제시한다. 제안하는 LNQ 는 균일 양자화된 CNN 모델의 각 계층의 가중치 행렬 블록 단위로 추가적인 비선형 양자화를 적용한다. 또한, 제안된 LNQ 는 가지치기(pruning)된 모델의 경우 블록내의 영(zero) 값의 가중치들은 그대로 전송하고 영이 아닌 가중치만을 이진 군집화를 적용한다. 제안 기법은 음성 분류를 위한 CNN 모델(DCASE Task)의 압축 실험에서 기존 균일 양자화를 대비 동일한 분류 성능에서 약 1.78 배 압축 성능 향상이 있음을 확인하였다.

  • PDF

An Integrated E-model Implementation for Speech Quality Measurement in VoIP and VoLTE (VoIP와 VoLTE 음성 품질 측정을 위한 통합 E-model 구현)

  • Kim, Bog-Soon;Baek, Kwang-Hyun;Cho, Gi-Hwan
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.7
    • /
    • pp.10-18
    • /
    • 2013
  • With advancing of mobile communication services and commercializing of VoLTE (Voice of LTE), it is getting to pay attention on QoS of VoLTE. This paper proposes an integrated E-model in which some factors influenced to service quality of VoIP and VoLTE based voice communication system are considered in calculating the voice quality of Wideband Codec. The model aims to calculate R value which reflects the situations of access network, network characteristics, terminals' usage and mobility. We mainly deal with the integrated E-model's structure, related algorithms and optimal parameters for VoLTE. Some experiments show that the voice quality difference between VoIP and VoiceChecker, and VoLTE and POLQA, is below 10%. With the proposed model, we can calculate the voice quality by making use of the factors directly affected to service quality and the environment of VoLTE terminal and network. As a result, we can estimate the service quality in advance, without measuring it in real wireless environment.

Performance Analysis of a Packet Voice Multiplexer Using the Overload Control Strategy by Bit Dropping (Bit-dropping에 의한 Overload Control 방식을 채용한 Packet Voice Multiplexer의 성능 분석에 관한 연구)

  • 우준석;은종관
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.1
    • /
    • pp.110-122
    • /
    • 1993
  • When voice is transmitted through packet switching network, there needs a overload control, that is, a control for the congestion which lasts short periods and occurrs in local extents. In this thesis, we analyzed the performance of the statistical packet voice multiplexer using the overload control strategy by bit dropping. We assume that the voice is coded accordng to (4,2) embedded ADPCM and that the voice packet is generated and transmitted according to the procedures in the CCITT recomendation G. 764. For the performance analysis, we must model the superposed packet arrival process to the multiplexer as exactly as possible. It is well known that interarrival times of the packets are highly correlated and for this reason MMPP is more suited for the modelling in the viewpoint of accuracy. Hence the packet arrival process in modeled as MMPP and the matrix geometric method is used for the performance analysis. Performance analysis is similar to the MMPP IG II queueing system. But the overload control makes the service time distribution G dependent on system status or queue length in the multiplexer. Through the performance analysis we derived the probability generating function for the queue length and using this we derived the mean and standard deviation of the queue length and waiting time. The numerical results are verified through the simulation and the results show that the values embedded in the departure times and that in the arbitrary times are almost the same. Results also show bit dropping reduces the mean and the variation of the queue length and those of the waiting time.

  • PDF

The Implementation of Sign Board Receiving DARC for Vehicle (차량용 FM 부가 방송 수신 전광판의 구현)

  • 김남두;최재석;김영길
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2002.11a
    • /
    • pp.560-565
    • /
    • 2002
  • In this paper, we implemented the sign board system that displays user's image, user's sentence, the information from DARC an[1 information based position by GPS module for vehicle. The existing sign board is displaying only user's image and sentence. Or other existing sign board is displaying the information via CDMA network. However, our system is also able to display the user's message like other system and gain the information more cheap by DARC. This system consists of 6 parts. The DARC control part classes the DARC information - news, weather, stock and time. The GPS control part gains moment and item to display with calculating the information of global position, direction, speed and satellite. The LED control part has two buffers to store and handle the image. The buffers help the system display various effected images on LED board. An external memory card includes the location based data, the option file and the displayed data files. The data files are stored by FAT 16 with the folder structure on external memory card. The USB controls the communication with PC. PC programs can control and monitor this system. This system is using G72l voice file format, for casting the information. This system was established at the vehicle and we monitored this system. The system displayed the DARC data , user's data and the location based data on the LED board, successfully.

  • PDF

Development an Artificial Neural Network to Predict Infectious Bronchitis Virus Infection in Laying Hen Flocks (산란계의 전염성 기관지염을 예측하기 위한 인공신경망 모형의 개발)

  • Pak Son-Il;Kwon Hyuk-Moo
    • Journal of Veterinary Clinics
    • /
    • v.23 no.2
    • /
    • pp.105-110
    • /
    • 2006
  • A three-layer, feed-forward artificial neural network (ANN) with sixteen input neurons, three hidden neurons, and one output neuron was developed to identify the presence of infectious bronchitis (IB) infection as early as possible in laying hen flocks. Retrospective data from flocks that enrolled IB surveillance program between May 2003 and November 2005 were used to build the ANN. Data set of 86 flocks was divided randomly into two sets: 77 cases for training set and 9 cases for testing set. Input factors were 16 epidemiological findings including characteristics of the layer house, management practice, flock size, and the output was either presence or absence of IB. ANN was trained using training set with a back-propagation algorithm and test set was used to determine the network's capability to predict outcomes that it has never seen. Diagnostic performance of the trained network was evaluated by constructing receiver operating characteristic (ROC) curve with the area under the curve (AUC), which were also used to determine the best positivity criterion for the model. Several different ANNs with different structures were created. The best-fitted trained network, IBV_D1, was able to predict IB in 73 cases out of 77 (diagnostic accuracy 94.8%) in the training set. Sensitivity and specificity of the trained neural network was 95.5% (42/44, 95% CI, 84.5-99.4) and 93.9% (31/33, 95% CI, 79.8-99.3), respectively. For testing set, AVC of the ROC curve for the IBV_D1 network was 0.948 (SE=0.086, 95% CI 0.592-0.961) in recognizing IB infection status accurately. At a criterion of 0.7149, the diagnostic accuracy was the highest with a 88.9% with the highest sensitivity of 100%. With this value of sensitivity and specificity together with assumed 44% of IB prevalence, IBV_D1 network showed a PPV of 80% and an NPV of 100%. Based on these findings, the authors conclude that neural network can be successfully applied to the development of a screening model for identifying IB infection in laying hen flocks.

A study on end-to-end speaker diarization system using single-label classification (단일 레이블 분류를 이용한 종단 간 화자 분할 시스템 성능 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.6
    • /
    • pp.536-543
    • /
    • 2023
  • Speaker diarization, which labels for "who spoken when?" in speech with multiple speakers, has been studied on a deep neural network-based end-to-end method for labeling on speech overlap and optimization of speaker diarization models. Most deep neural network-based end-to-end speaker diarization systems perform multi-label classification problem that predicts the labels of all speakers spoken in each frame of speech. However, the performance of the multi-label-based model varies greatly depending on what the threshold is set to. In this paper, it is studied a speaker diarization system using single-label classification so that speaker diarization can be performed without thresholds. The proposed model estimate labels from the output of the model by converting speaker labels into a single label. To consider speaker label permutations in the training, the proposed model is used a combination of Permutation Invariant Training (PIT) loss and cross-entropy loss. In addition, how to add the residual connection structures to model is studied for effective learning of speaker diarization models with deep structures. The experiment used the Librispech database to generate and use simulated noise data for two speakers. When compared with the proposed method and baseline model using the Diarization Error Rate (DER) performance the proposed method can be labeling without threshold, and it has improved performance by about 20.7 %.

A Study on the Diphone Recognition of Korean Connected Words and Eojeol Reconstruction (한국어 연결단어의 이음소 인식과 어절 형성에 관한 연구)

  • ;Jeong, Hong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.4
    • /
    • pp.46-63
    • /
    • 1995
  • This thesis described an unlimited vocabulary connected speech recognition system using Time Delay Neural Network(TDNN). The recognition unit is the diphone unit which includes the transition section of two phonemes, and the number of diphone unit is 329. The recognition processing of korean connected speech is composed by three part; the feature extraction section of the input speech signal, the diphone recognition processing and post-processing. In the feature extraction section, the extraction of diphone interval in input speech signal is carried and then the feature vectors of 16th filter-bank coefficients are calculated for each frame in the diphone interval. The diphone recognition processing is comprised by the three stage hierachical structure and is carried using 30 Time Delay Neural Networks. particularly, the structure of TDNN is changed so as to increase the recognition rate. The post-processing section, mis-recognized diphone strings are corrected using the probability of phoneme transition and the probability o phoneme confusion and then the eojeols (Korean word or phrase) are formed by combining the recognized diphones.

  • PDF

A Study on the Pitch Contour Generator with Neural Network in the Isolated Words (신경망을 이용한 고립단어에서의 피치변화곡선 발생기에 관한 연구)

  • Lim Unchun;Kwak Jingu;Chang Sokwang
    • Proceedings of the KSPS conference
    • /
    • 1996.02a
    • /
    • pp.137-155
    • /
    • 1996
  • The purpose of this paper is to generate a pitch contour which is affected by tile phonetic environment and the number of syllables in each Korean isolated word using a neural network. To do this, we analyzed a set of 513 Korean isolated words, consisting of 1-4 syllables and extracted the pitch contour and the duration of each phoneme in all the words. The total number of phonemes we analyzed is about 3800. After that we approximated the pitch contour with a 1st order polynominal by a regression analysis. We could get the slope, the initial pitch and the duration of each phoneme. We used these 3 parameters as the target pattern of the neural network and let the neural network learn the rule of the variation of the pitch and duration, which was affected by the phonetic environment of each phoneme. We used 7 consecutive phoneme strings as an input pattern for a neural network to make the network learn the effect of phonetic environment around the center phoneme. In the learning phase, we used 3545 items(463 words) as target patterns which contained the phonetic environment of front and rear 3 phonemes and the neural network showed the correctness rate of 98.43%, 98.59%, 97.7% in the estimation of the duration, the slope, the initial pitch. In the recall phase, we tested the performance of tile neural network with 251 items(50 words) which weren't need as learning data and we could get the good correctness rate of 97.34%, 95.45%, 96.3% in the generation of the duration, the slope, and the initial pitch of each phoneme.

  • PDF

A Study on a Elevator Emergency Call Device System and Performance Evaluation based on ICT for Efficient Handling in Emergency Situation (위급상황 시 효율적인 대처를 위한 ICT 기반의 엘리베이터 비상통화장치 시스템 및 성능 비교 연구)

  • Jung, Se-Hoon;Park, Sung-Kyun;Park, Hong-Jun;So, Won-Ho;Park, Dong-Kook;Sim, Chun-Bo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.10 no.4
    • /
    • pp.449-459
    • /
    • 2015
  • A lot of people were trapped in elevators without power supply when BLACK-OUT situation occurred in 2011. The telephone network of control room connected to the elevators had problem operating poorly. In this paper we propose an ICT based elevator emergency call device prototype system and evaluate the performance of the system. The proposed system quickly responds in emergency situation to guarantee passenger safety. For the goal, firstly the system tries to connect to a control room. If it fails the system attempts to call numbers for emergency contact and a rescue team sequentially. The system is designed to quickly support emergency contact as well. Finally, the information of elevator failure is rapidly transferred to the failure process device by the proposed system.