Integrated Search | Korea Science

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments

이길호;윤재삼;오유리;김홍국
- 대한음성학회지:말소리, No. 54, pp. 27-43, 2005
Existing standard speech coders provide high-quality speech communication, but they degrade the performance of speech recognition systems that operate on the reconstructed speech. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for recognition performance. For example, mel-frequency cepstral coefficients (MFCCs) are generally known to give better recognition performance than linear prediction coefficients (LPCs), the typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCCs instead of LPCs to improve the performance of a server-based speech recognition system in network environments. The main difficulty in using MFCCs is developing an efficient MFCC quantizer at a low bit rate. First, we exploit the interframe correlation of MFCCs, which leads to predictive quantization of the MFCCs. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed coder has speech quality comparable to that of the 8 kbps G.729 coder, while speech recognition using the proposed coder performs better than recognition using G.729.
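The interplay between predictive quantization and a safety net can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the scalar uniform quantizer, the prediction coefficient `a`, the step sizes, and the function names are hypothetical and are not taken from the paper, which does not describe its quantizer design in this abstract. The idea shown is only that each frame is coded either as a residual against the previous reconstruction (efficient when frames are correlated) or directly without memory (which stops channel errors from propagating), whichever gives the lower error.

```python
def quantize(value, step, levels):
    """Uniform scalar quantizer: round to the nearest step, clip to +/- levels."""
    index = max(-levels, min(levels, round(value / step)))
    return index * step


def encode_frame(frame, prev_recon, a=0.7, step_pred=0.05, step_safe=0.2, levels=8):
    """Return (mode, reconstruction) for one frame of coefficients.

    All parameter values are illustrative assumptions.
    """
    # Predictive path: quantize the residual against the previous reconstruction.
    pred = [a * p + quantize(x - a * p, step_pred, levels)
            for x, p in zip(frame, prev_recon)]
    # Safety-net path: quantize the frame directly, with no inter-frame memory.
    safe = [quantize(x, step_safe, levels) for x in frame]
    err_pred = sum((x - y) ** 2 for x, y in zip(frame, pred))
    err_safe = sum((x - y) ** 2 for x, y in zip(frame, safe))
    # A one-bit mode flag would tell the decoder which path was used.
    return ("pred", pred) if err_pred <= err_safe else ("safe", safe)


# A frame close to its prediction versus a frame that changes abruptly.
smooth_mode, _ = encode_frame([0.72, 0.33], [1.0, 0.5])
abrupt_mode, _ = encode_frame([-1.0, 1.2], [1.0, 0.5])
```

In this toy setting the smooth frame is coded predictively and the abrupt frame falls back to the memoryless path, because the clipped residual quantizer cannot follow the jump.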

A Multi-Stage Convolution Machine with Scaling and Dilation for Human Pose Estimation

Nie, Yali;Lee, Jaehwan;Yoon, Sook;Park, Dong Sun
- KSII Transactions on Internet and Information Systems (TIIS), Vol. 13, No. 6, pp. 3182-3198, 2019
Vision-based human pose estimation is considered a challenging research subject because of problems such as confounding background clutter, the diversity of human appearances, and illumination changes in scenes. To tackle these problems, we propose a new multi-stage convolution machine for estimating human pose. To provide better heatmap predictions of body joints, the proposed machine produces a prediction at each stage, with a receptive field large enough to learn long-range spatial relationships. The stages are composed of various modules according to their strategic purposes. A pyramid stacking module and a dilation module are used to handle human poses at multiple scales. Their multi-scale information from different receptive fields is fused by concatenation, which captures more contextual information from different features. In addition, the spatial and channel information of a given input is converted into gating factors by squeezing each feature map to a single numeric value based on its importance, so that each network channel receives a different weight. In experiments on the standard LSP and MPII pose benchmarks, the proposed architecture achieved higher accuracy than other ConvNet-based architectures.
https://doi.org/10.3837/tiis.2019.06.023

Dimmable Spatial Intensity Modulation for Visible-light Communication: Capacity Analysis and Practical Design

Kim, Byung Wook;Jung, Sung-Yoon
- Current Optics and Photonics, Vol. 2, No. 6, pp. 532-539, 2018
Multiple LED arrays can be utilized in visible-light communication (VLC) to improve communication efficiency while maintaining smart illumination functionality through dimming control. This paper proposes a modulation scheme called "Spatial Intensity Modulation" (SIM), in which the effective number of turned-on LEDs is employed for data modulation and dimming control in VLC systems. Unlike conventional pulse-amplitude modulation (PAM), symbol intensity levels are determined not by the amplitude levels of the VLC signal from each LED but by the number of turned-on LEDs, each illuminating at a single amplitude level. Because the intensity of a SIM symbol and the target dimming level are determined solely in the spatial domain, the problems of conventional PAM-based VLC and related MIMO VLC schemes, such as unstable dimming control, non-uniform illumination, and the burden of channel prediction, can be solved. By varying the number and formation of turned-on LEDs around the target dimming level over time, the proposed SIM scheme guarantees homogeneous illumination over a target area. An analysis of the dimming capacity, which is the achievable communication rate under the target dimming level in VLC, is provided by deriving the turn-on probability that maximizes the entropy of the SIM-based VLC system. In addition, a practical design of a dimmable SIM scheme applying the multilevel inverse source coding (MISC) method is proposed. The simulation results under a range of parameters provide baseline data for verifying the performance of the proposed dimmable SIM scheme and its applications in real systems.
https://doi.org/10.3807/COPP.2018.2.6.532
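The entropy-maximization step mentioned in the abstract can be sketched under a simplifying assumption that is not taken from the paper: a symbol is just the number k of lit LEDs out of `n_leds`, and the dimming target fixes the mean of k. Under that assumption the maximum-entropy distribution has the exponential form p_k proportional to exp(lam * k), and lam can be found by bisection. The function names and the symbol model are hypothetical; the paper's actual derivation may differ.

```python
import math


def max_entropy_levels(n_leds, dimming):
    """Max-entropy distribution over k = 0..n_leds lit LEDs with
    mean(k) = dimming * n_leds, for 0 < dimming < 1 (illustrative model)."""
    target = dimming * n_leds

    def dist(lam):
        weights = [math.exp(lam * k) for k in range(n_leds + 1)]
        total = sum(weights)
        return [w / total for w in weights]

    lo, hi = -50.0, 50.0
    for _ in range(200):  # bisection on lam; mean(k) increases with lam
        mid = (lo + hi) / 2
        mean = sum(k * p for k, p in enumerate(dist(mid)))
        if mean < target:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)


def entropy_bits(p):
    """Shannon entropy in bits per symbol."""
    return -sum(x * math.log2(x) for x in p if x > 0)


half = max_entropy_levels(4, 0.5)      # 50% dimming with 4 LEDs
quarter = max_entropy_levels(4, 0.25)  # 25% dimming with 4 LEDs
```

With this toy model the entropy, and hence the achievable rate per symbol, is largest at 50% dimming (where the distribution is uniform) and shrinks as the target moves toward fully dark or fully lit.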

A QoS-aware Adaptive Coloring Scheduling Algorithm for Co-located WBANs

Wang, Jingxian;Sun, Yongmei;Luo, Shuyun;Ji, Yuefeng
- KSII Transactions on Internet and Information Systems (TIIS), Vol. 12, No. 12, pp. 5800-5818, 2018
Interference may occur when several co-located wireless body area networks (WBANs) share the same channel simultaneously; it is generally suppressed by resource scheduling. In this paper, a QoS-aware Adaptive Coloring (QAC) scheduling algorithm is proposed, which contains two components: interference set determination and time slot assignment. The highlight of QAC is that it determines the interference graph based on the relay scheme and adapts to the network QoS through a multi-coloring approach. However, frequent resource assignment brings extra energy consumption and packet loss. We therefore introduce a launch condition for the QAC scheduling algorithm: time slot rescheduling is activated if the interference duration is longer than a predetermined threshold. Furthermore, a prediction model for the interference duration is proposed, based on the relative distance and moving speed between WBANs. The simulation results show that, compared with state-of-the-art approaches, the QAC scheduling algorithm performs better in terms of network capacity, average delay, and resource utility.
https://doi.org/10.3837/tiis.2018.12.011
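The time slot assignment and launch condition described in the abstract can be sketched as a greedy multi-coloring of the interference graph. This is a generic illustration, not the QAC algorithm itself: the highest-degree-first ordering, the `demand` map (slots required per WBAN, standing in for QoS), and the function names are assumptions, and the predicted interference duration is taken as a given input rather than computed from distance and speed.

```python
def multi_color(interference, demand):
    """Greedy multi-coloring: give each WBAN demand[w] time slots that no
    interfering neighbour uses. `interference` maps a WBAN to its neighbours."""
    slots = {}
    # Serve the most constrained WBANs (highest degree) first.
    for w in sorted(interference, key=lambda n: -len(interference[n])):
        taken = set()
        for neighbour in interference[w]:
            taken |= slots.get(neighbour, set())
        mine, slot = set(), 0
        while len(mine) < demand[w]:
            if slot not in taken:
                mine.add(slot)
            slot += 1
        slots[w] = mine
    return slots


def should_reschedule(predicted_duration_s, threshold_s):
    """Launch condition: reschedule only for sufficiently long interference."""
    return predicted_duration_s > threshold_s


# WBAN A interferes with B and C; B and C do not interfere with each other.
graph = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
assignment = multi_color(graph, {"A": 2, "B": 1, "C": 1})
```

Here B and C can reuse the same slot because they share no edge, which is the spatial reuse that coloring-based scheduling aims for.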

Correcting Misclassified Image Features with Convolutional Coding

문예지;김나영;이지은;강제원
- 한국방송∙미디어공학회: Conference Proceedings, 2018 Fall Conference, pp. 11-14, 2018
The aim of this study is to rectify misclassified image features and enhance the performance of image classification tasks by incorporating a channel-coding technique widely used in telecommunication. Specifically, the proposed algorithm combines the error-correcting mechanism of convolutional coding with convolutional neural networks (CNNs), which are state-of-the-art image classifiers. We develop an encoder and a decoder to exploit the error-correcting capability of convolutional coding. In the encoder, the label values of the image data are converted to convolutional codes that are used as target outputs of the CNN, and the network is trained to minimize the Euclidean distance between the target output codes and the actual output codes. To correct misclassified features, the outputs of the network are decoded through the trellis structure with the Viterbi algorithm before the final prediction is determined. This paper demonstrates that the proposed architecture improves the performance of neural networks compared with the traditional one-hot encoding method.
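The encode-then-decode idea can be sketched with a small convolutional code. The specific choices below are assumptions for illustration and are not stated in the abstract: a rate-1/2 code with generator polynomials (7, 5) in octal, label bits fed in directly, and a soft-decision Viterbi decoder that uses squared Euclidean distance on real-valued outputs (standing in for CNN outputs).

```python
G = (0b111, 0b101)  # assumed generator polynomials (7, 5) octal, constraint length 3


def parity(x):
    return bin(x).count("1") % 2


def step(state, bit):
    """Shift `bit` into the 2-bit state; return (next_state, output_pair)."""
    reg = (bit << 2) | state
    out = tuple(parity(reg & g) for g in G)
    return reg >> 1, out


def encode(bits):
    """Convolutionally encode a list of label bits (encoder starts in state 0)."""
    state, code = 0, []
    for b in bits:
        state, out = step(state, b)
        code.extend(out)
    return code


def viterbi(received, n_bits):
    """Soft-decision Viterbi: `received` holds 2 * n_bits real values near 0 or 1."""
    inf = float("inf")
    cost = [0.0, inf, inf, inf]
    paths = [[], [], [], []]
    for t in range(n_bits):
        r = received[2 * t: 2 * t + 2]
        new_cost, new_paths = [inf] * 4, [None] * 4
        for s in range(4):
            if cost[s] == inf:
                continue
            for b in (0, 1):
                ns, out = step(s, b)
                c = cost[s] + sum((x - o) ** 2 for x, o in zip(r, out))
                if c < new_cost[ns]:
                    new_cost[ns], new_paths[ns] = c, paths[s] + [b]
        cost, paths = new_cost, new_paths
    best = min(range(4), key=lambda s: cost[s])
    return paths[best]


label_bits = [1, 0, 1, 1, 0]
target = encode(label_bits)   # would serve as the network's target output
noisy = list(target)
noisy[2] = 1 - noisy[2]       # simulate one wrong network output
decoded = viterbi(noisy, len(label_bits))
```

In this sketch the single corrupted output is corrected by the trellis search, which is the property the abstract relies on in place of a plain one-hot argmax.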

Numerical analysis of the temperature distribution of the EM pump for the sodium thermo-hydraulic test loop of the GenIV PGSFR

Kwak, Jaesik;Kim, Hee Reyoung
- Nuclear Engineering and Technology, Vol. 53, No. 5, pp. 1429-1435, 2021
The temperature distribution was analyzed for an electromagnetic pump with a flow rate of 1380 L/min and a pressure of 4 bar, designed for the sodium thermo-hydraulic test in the Sodium Test Loop for Safety Simulation and Assessment-Phase 1 (STELLA-1). The electromagnetic pump was used to circulate the liquid sodium coolant in the Intermediate Heat Transport System (IHTS) of the Prototype Gen-IV Sodium-cooled Fast Reactor (PGSFR), which has an electric power of 150 MWe. The temperature distribution of the pump components was numerically analyzed to prevent functional degradation in the high-temperature environment during pump operation. The heat transfer was calculated numerically using ANSYS Fluent to predict the temperature distribution in the excited coils, the electromagnet core, and the liquid sodium flow channel of the electromagnetic pump. The temperature distribution of the operating pump was compared under cooling by natural and by forced air circulation, specifically the temperatures in the coil, the core, and the flow gap. The electromagnetic pump cooled by forced circulation had better efficiency than the one cooled by natural circulation, even when the input power consumed by the air blower was taken into account. Accordingly, this study concludes that forced cooling benefits both the maintenance and the efficiency of the electromagnetic pump.
https://doi.org/10.1016/j.net.2020.11.015

Electroencephalography-based imagined speech recognition using deep long short-term memory network

Agarwal, Prabhakar;Kumar, Sandeep
- ETRI Journal, Vol. 44, No. 4, pp. 672-685, 2022
This article proposes a subject-independent application of brain-computer interfacing (BCI). A 32-channel electroencephalography (EEG) device is used to measure imagined speech (SI) of four words (sos, stop, medicine, washroom) and one phrase (come-here) across 13 subjects. A deep long short-term memory (LSTM) network is adopted to recognize these signals in seven EEG frequency bands individually in nine major regions of the brain. The results show a maximum accuracy of 73.56% and a network prediction time (NPT) of 0.14 s, which are superior to other state-of-the-art techniques in the literature. Our analysis reveals that the alpha band can recognize SI better than other EEG frequencies. To reinforce our findings, the work is compared with models based on the gated recurrent unit (GRU), the convolutional neural network (CNN), and six conventional classifiers. The results show that the LSTM model has 46.86% higher average accuracy in the alpha band and 74.54% lower average NPT than the CNN. The maximum accuracy of the GRU was 8.34% lower than that of the LSTM network. Deep networks performed better than traditional classifiers.
https://doi.org/10.4218/etrij.2021-0118

Prediction of a hit drama with a pattern analysis on early viewing ratings

남기환;성노윤
- 지능정보연구 (Journal of Intelligence and Information Systems), Vol. 24, No. 4, pp. 33-49, 2018
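TV dramas have much higher ratings and a much greater channel-promotion effect than other genres, and the Korean Wave has confirmed their industrial effect and cultural influence. Predicting whether such a drama will be a hit is therefore well known to be a very important part of broadcasting-related industries. To this end, this study analyzed a total of 280 TV miniseries dramas broadcast on terrestrial channels over the 10 years from 2003 to 2012. From these dramas, the 45 with the highest and the 45 with the lowest average ratings were selected to build viewing-time distribution models (5%~100%, 11 steps) for hit dramas. Using similarity, measured as the Euclidean/correlation distance between these reference models and the viewing-time distribution of a new drama, we built a model that predicts the success or failure of a new drama from viewers' early (episodes 1~5) viewing-time distribution. In addition, viewers who watched 70% or more of the total broadcast time were classified as the avid viewer group, and this group was compared with the averages of the top and bottom dramas so that the success of a new drama could be determined. The results show that early viewer loyalty (viewing time) is an important factor in predicting whether a drama will become a big hit, and the emergence of a big-hit drama could be predicted with a probability of up to 75.47%.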
https://doi.org/10.13088/jiis.2018.24.4.033
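The similarity comparison described in this abstract can be sketched as follows. The 11-bin distributions below are made-up numbers for illustration only, and the decision rule (closer reference model wins, by Euclidean distance) is an assumption about how the two reference models might be used; the paper's actual rule and data are not given in the abstract.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two viewing-time distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def correlation(p, q):
    """Pearson correlation between two viewing-time distributions."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    sp = math.sqrt(sum((a - mp) ** 2 for a in p))
    sq = math.sqrt(sum((b - mq) ** 2 for b in q))
    return cov / (sp * sq)


def predict_hit(new, hit_model, flop_model):
    """Hypothetical rule: a new drama is a predicted hit if its early
    distribution is closer to the hit reference than to the flop reference."""
    return euclidean(new, hit_model) < euclidean(new, flop_model)


# Made-up shares of viewers per viewing-time bin (11 bins, each list sums to 1).
hit_model = [0.20, 0.10, 0.08, 0.07, 0.06, 0.06, 0.06, 0.07, 0.08, 0.10, 0.12]
flop_model = [0.40, 0.18, 0.12, 0.08, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.01]
new_drama = [0.22, 0.11, 0.08, 0.07, 0.06, 0.06, 0.06, 0.06, 0.08, 0.09, 0.11]

is_hit = predict_hit(new_drama, hit_model, flop_model)
```

The correlation measure can be swapped in for the distance by predicting a hit when `correlation(new, hit_model)` exceeds `correlation(new, flop_model)`.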

Comparison of Wind Vectors Derived from GK2A with Aeolus/ALADIN

신혜민;안명환;김지수;이시혜;이병일
- 대한원격탐사학회지 (Korean Journal of Remote Sensing), Vol. 37, No. 6_1, pp. 1631-1645, 2021
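We analyzed the characteristics of the wind data from two satellites by comparing wind data from the Atmospheric Laser Doppler Instrument (ALADIN), the world's first active lidar sensor, with the atmospheric motion vectors (AMVs) of the Geostationary Korea Multi Purpose Satellite 2A (GK2A), which are used as wind data in the Korean numerical weather prediction model. One year of data, from September 2019 to August 2020, was compared for the ALADIN Mie channel and the GK2A infrared channel; 177,681 data points were collected, with a root mean square error (RMSE) of 3.73 m/s and a correlation coefficient of 0.98. For a more detailed analysis, the comparison was repeated taking latitude and altitude into account. At most latitudes the normalized root mean squared error (NRMSE) is 0.2~0.3, indicating that the two wind data sets agree, but it takes large values of 0.4 or more at low latitudes for the upper and middle layers and at specific Southern Hemisphere latitudes (30°S-15°S) for the lower layer. These results appear equally in the water vapor and visible channels regardless of season, and no pronounced channel-specific or seasonal characteristics are found. An examination of the cloud distribution in the latitude regions where the two wind data sets differ greatly shows that cirrus and cumulus clouds, which can lower the accuracy of AMV height assignment, are more widely distributed there than at other latitudes. Given these characteristics, we suggest that in the Southern Hemisphere and low-latitude regions, where accurate height assignment is difficult and AMV errors are large, ALADIN wind data can complement the wind information of existing AMVs and thereby have a positive impact on numerical weather prediction models.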
https://doi.org/10.7780/kjrs.2021.37.6.1.12
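The two error metrics named in the abstract can be computed as in the sketch below. The normalization used for NRMSE is an assumption (division by the mean of the reference wind speeds); the abstract does not state which normalization the authors used, and the sample values are invented for illustration.

```python
import math


def rmse(values, reference):
    """Root mean square error between paired wind speeds (m/s)."""
    n = len(values)
    return math.sqrt(sum((v - r) ** 2 for v, r in zip(values, reference)) / n)


def nrmse(values, reference):
    """RMSE divided by the mean reference wind speed (assumed normalization)."""
    return rmse(values, reference) / (sum(reference) / len(reference))


# Invented paired samples: AMV wind speeds against a lidar reference.
amv = [13.0, 17.0, 33.0, 37.0]
lidar = [10.0, 20.0, 30.0, 40.0]

error = rmse(amv, lidar)         # every pair differs by 3 m/s
relative = nrmse(amv, lidar)     # 3 m/s relative to a 25 m/s mean
```

A normalized metric makes latitude or altitude bins with very different typical wind speeds comparable, which is presumably why the abstract reports NRMSE for the stratified comparison.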

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning: Application to the Prediction of Stock Market

이모세;안현철
- 지능정보연구 (Journal of Intelligence and Information Systems), Vol. 24, No. 1, pp. 167-181, 2018
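Over the past decade or so, deep learning has attracted much attention among machine learning algorithms. In particular, the convolutional neural network (CNN), known as an effective algorithm for recognizing and classifying images, has been widely applied to classification and prediction problems in many fields. In this study, we apply a CNN to stock market prediction, one of the most difficult prediction problems in machine learning research. Specifically, we apply a CNN as a binary classifier that takes graphs as input and predicts the direction (up or down) of the stock market. This amounts to developing a machine learning algorithm that mimics so-called technical analysts, who look at a graph and predict whether the stock index will rise or fall. The study proceeds in four steps. In the first step, the data set is divided into 5-day units. In the second step, a graph is created for each 5-day unit. In the third step, the graphs generated in the previous step are split into training and validation data sets, and the CNN classifier is trained. In the fourth step, the validation data set is used to compare performance with other classification models. To verify the validity of the proposed model, we used 2,026 observations of KOSPI200 data covering about eight years, from January 2009 to February 2017. The experimental data set consisted of 12 representative technical indicators used in the Korean stock market, such as CCI, momentum, and ROC. As a result, when the CNN algorithm was applied to the experimental data set, the proposed CNN model showed prediction accuracy at a statistically significant level compared with logistic regression, a single-layer neural network, and SVM.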
https://doi.org/10.13088/jiis.2018.24.1.167
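The first two steps of this pipeline (5-day windows, then a graph per window) can be sketched in plain Python. Several details are assumptions rather than facts from the abstract: the windows are taken as non-overlapping, a single series is drawn instead of the 12 technical indicators, and the "graph" is a tiny binary raster rather than a rendered chart image. The function names are hypothetical.

```python
def windows(series, size=5):
    """Split a series into consecutive non-overlapping windows of `size` days."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]


def rasterize(window, height=5):
    """Draw one window as a height x len(window) binary image (row 0 is the top).

    Each day contributes one lit pixel whose row encodes the scaled value.
    """
    lo, hi = min(window), max(window)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat window
    image = [[0] * len(window) for _ in range(height)]
    for col, value in enumerate(window):
        row = round((hi - value) / span * (height - 1))
        image[row][col] = 1
    return image


series = [float(v) for v in range(12)]   # toy index values, steadily rising
chunks = windows(series)                 # two full 5-day windows
image = rasterize(chunks[0])             # rising line: bottom-left to top-right
```

Images produced this way, each paired with an up/down label, would form the training and validation sets for the CNN classifier in steps three and four.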
