Search | Korea Science

DNN based Robust Speech Feature Extraction and Signal Noise Removal Method Using Improved Average Prediction LMS Filter for Speech Recognition (음성 인식을 위한 개선된 평균 예측 LMS 필터를 이용한 DNN 기반의 강인한 음성 특징 추출 및 신호 잡음 제거 기법)

Oh, SangYeob
- Journal of Convergence for Information Technology / v.11 no.6 / pp.1-6 / 2021
In speech recognition, the adoption of DNNs has increased the use of speech recognition, but parallel training requires more computation than the conventional GMM, and overfitting occurs when the amount of data is small. To solve this problem, we propose an efficient method for robust speech feature extraction and speech signal noise removal that works even when the amount of data is small. Speech feature extraction efficiently extracts speech energy by applying the difference in frame energy for speech together with the zero-crossing ratio and level-crossing ratio, which are affected by the speech signal. Noise is removed with an improved average prediction LMS filter that loses little speech information while maintaining the intrinsic characteristics of speech during detection of the speech signal. The improved LMS filter processes noise in the input speech signal by adjusting the active parameter threshold for the input signal. Comparing the proposed method with the conventional frame energy method confirmed that the error rate at the start point of speech is 7% and the error rate at the end point is improved by 11%.
https://doi.org/10.22156/CS4SMB.2021.11.06.001
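
The frame-level features named in the abstract above can be illustrated with a minimal sketch. It computes per-frame energy and zero-crossing ratio and picks speech endpoints with a plain energy threshold. The function names, the non-overlapping framing, and the fixed threshold are assumptions for illustration; the paper's frame energy difference, level-crossing ratio, and improved LMS filter are not reproduced here.

```python
def frame_features(signal, frame_len):
    """Return (energy, zero_crossing_ratio) for each non-overlapping frame."""
    features = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Short-time energy: sum of squared samples in the frame.
        energy = sum(x * x for x in frame)
        # Count sign changes between consecutive samples.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((energy, crossings / (frame_len - 1)))
    return features


def detect_endpoints(signal, frame_len, energy_threshold):
    """Return (first, last) index of frames above the threshold, or None.

    Hypothetical helper: a fixed energy threshold stands in for the
    paper's detection rule, which the abstract does not specify.
    """
    active = [
        i
        for i, (energy, _) in enumerate(frame_features(signal, frame_len))
        if energy > energy_threshold
    ]
    return (active[0], active[-1]) if active else None
```

For a signal that is silent, then alternates between 1.0 and -1.0, then is silent again, the alternating frames show high energy and a zero-crossing ratio of 1.0, while the silent frames show zero for both.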

Extending StarGAN-VC to Unseen Speakers Using RawNet3 Speaker Representation (RawNet3 화자 표현을 활용한 임의의 화자 간 음성 변환을 위한 StarGAN의 확장)

Bogyung Park;Somin Park;Hyunki Hong
- KIPS Transactions on Software and Data Engineering / v.12 no.7 / pp.303-314 / 2023
Voice conversion, a technology that allows an individual's speech data to be regenerated with the acoustic properties (tone, cadence, gender) of another, has countless applications in education, communication, and entertainment. This paper proposes an approach based on the StarGAN-VC model that generates realistic-sounding speech without requiring parallel utterances. To overcome the constraints of the existing StarGAN-VC model, which uses one-hot vectors of original and target speaker information, this paper extracts feature vectors of target speakers using a pre-trained version of RawNet3. This results in a latent space where voice conversion can be performed without direct speaker-to-speaker mappings, enabling an any-to-any structure. In addition to the loss terms used in the original StarGAN-VC model, Wasserstein distance is used as a loss term to ensure that generated voice segments match the acoustic properties of the target voice. The Two Time-Scale Update Rule (TTUR) is also used to facilitate stable training. Experimental results show that the proposed method outperforms previous methods, including the StarGAN-VC network on which it was based.
https://doi.org/10.3745/KTSDE.2023.12.7.303

Reconstruction of Stereo MR Angiography Optimized to View Position and Distance using MIP (최대강도투사를 이용한 관찰 위치와 거리에 최적화 된 입체 자기공명 뇌 혈관영상 재구성)

Shin, Seok-Hyun;Hwang, Do-Sik
- Investigative Magnetic Resonance Imaging / v.16 no.1 / pp.67-75 / 2012
Purpose: We studied an enhanced method for viewing the vessels in the brain using Magnetic Resonance Angiography (MRA). Noting that Maximum Intensity Projection (MIP) images are often used to evaluate the arteries of the neck and brain, we propose a new method for viewing brain vessels as a stereo image in 3D space that is superior to and more accurate than the conventional method. Materials and Methods: We used a 3T Siemens Tim Trio MRI scanner with a 4-channel head coil and acquired 3D MRA brain data by fixing the volunteer's head and applying a Phase Contrast pulse sequence. The MRA brain data are rotated in 3D according to the view angle of each eye. The optimal view angle (projection angle) is determined by the distance between the eye and the center of the data. The newly obtained MRA data are projected along the projection line, and only the highest values are displayed. The left and right view MIP images are integrated through the anaglyph imaging method to obtain the optimal stereoscopic MIP image. Results: The resulting images show that the proposed method makes it possible to view MIP images from any direction of the MRA data, which is impossible with the conventional method. Moreover, by considering disparity and the distance from the viewer to the center of the MRA data in spherical coordinates, we can obtain a more realistic stereo image. In short, we can obtain optimal stereoscopic images according to the position from which viewers want to look and the distance between the viewer and the MRA data. Conclusion: The proposed method overcomes the problem of the conventional method, which shows only a specific projected image (z-axis projection), and gives optimal depth information by converting a mono MIP image to a stereoscopic image that accounts for the viewer's position. It can also display any view of the MRA data in spherical coordinates. If an optimization algorithm and parallel processing are applied, it may provide useful medical information for diagnosis and treatment planning in real time.
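
The projection step described above (keep only the highest value along each projection line) can be sketched for the conventional z-axis case. The nested-list volume layout `volume[z][y][x]` and both function names are assumptions for illustration. The per-eye angle formula is ordinary viewing geometry, not a formula taken from the paper, and the 3D rotation and anaglyph merge are omitted.

```python
import math


def mip_along_z(volume):
    """Maximum intensity projection of volume[z][y][x] onto the x-y plane."""
    depth = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])
    # For every (y, x) pixel keep the largest voxel value along z.
    return [
        [max(volume[z][y][x] for z in range(depth)) for x in range(width)]
        for y in range(height)
    ]


def eye_view_angle(eye_separation, distance):
    """Angle (radians) by which each eye's view is rotated off the center line.

    Assumed geometry: the eyes sit eye_separation apart and look at the
    data center from the given distance, so the angle shrinks as the
    viewer moves away.
    """
    return math.atan2(eye_separation / 2.0, distance)
```

In a stereo pipeline of the kind the abstract outlines, the volume would be rotated by plus and minus this angle before projection, giving one MIP image per eye.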

Template-Based Object-Order Volume Rendering with Perspective Projection (원형기반 객체순서의 원근 투영 볼륨 렌더링)

Koo, Yun-Mo;Lee, Cheol-Hi;Shin, Yeong-Gil
- Journal of KIISE:Computer Systems and Theory / v.27 no.7 / pp.619-628 / 2000
Perspective views provide a powerful depth cue and thus aid the interpretation of complicated images. The main drawback of current perspective volume rendering is its long execution time. In this paper, we present an efficient perspective volume rendering algorithm based on coherency between rays. Two sets of templates are built for the rays cast from horizontal and vertical scanlines in the intermediate image, which is parallel to one of the volume faces. Each sample along a ray is calculated by interpolating neighboring voxels with the pre-computed weights in the templates. We also solve the problem of the uneven sampling rate caused by perspective ray divergence by building more templates for the regions far from the viewpoint. Since our algorithm operates in object order, it can avoid redundant access to each voxel and exploit spatial data coherency by using a run-length encoded volume. Experimental results show that the use of templates and object-order processing with a run-length encoded volume provide speedups compared to other approaches. Additionally, the image quality of our algorithm improves because it resolves the uneven sampling rate caused by perspective ray divergence.

A Frequency Domain DV-to-MPEG-2 Transcoding (DV에서 MPEG-2로의 주파수 영역 변환 부호화)

Kim, Do-Nyeon;Yun, Beom-Sik;Choe, Yun-Sik
- Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.2 / pp.138-148 / 2001
Digital Video (DV) coding standards for digital video cassette recorders are based mainly on DCT and variable length coding. DV has low hardware complexity but a high compressed bit rate of about 26 Mb/s. Thus, it is necessary to encode video with low-complexity video coding at the studio and then transcode the compressed video into MPEG-2 for video-on-demand systems. Because both coding methods use the DCT, transcoding in the DCT domain can reduce computational complexity by eliminating duplicated procedures. In transcoding DV into MPEG-2 intra coding, the 4:1:1-to-4:2:2 chroma format conversion and the conversion from the 2-4-8 to the 8-8 DCT mode are performed by multiplying the transformed data by a matrix, which enables parallel processing. The variance of each sub block for MPEG-2 rate control is computed entirely in the DCT domain. These are verified through experiments. For transcoding into MPEG-2 inter coding, we estimate motion hierarchically using DCT coefficients. First, we estimate the motion of a macro block (MB) using only the 4 DC values of its 4 sub blocks, then estimate motion with a 16-point MB obtained from the IDCT of the $2{\times}2$ low frequencies in each sub block, and finish estimation at a sub pixel as the fifth step. ME with an overlapped search range shows better PSNR performance than ME without overlapping.

Study of perception of the visual depth caused by the color correction (입체영상 제작에서 색 보정 결과가 입체감 인지에 미치는 영향 연구)

Han, Myung-Hee;Kim, Chee-Yong
- Journal of Digital Contents Society / v.11 no.2 / pp.177-184 / 2010
These days, as digital production techniques have developed, 3D imaging is being used in high-tech computers and TVs, and research on 3D production techniques is actively in progress. Moreover, since James Cameron's movie 'Avatar', released in 2009, was a box office hit, 3D imaging has come to the fore again. At this point, I decided to study the effect of color correction during the post-production stage on visual depth. The purpose of this study is to offer information for producing effective images, based on data about how color correction applied during the post-production stage affects visual depth. I hypothesized that color and contrast would affect the depth of a 3D image. In the experiment, I observed changes in visual depth, space perception, and sense of depth. Applying this result, I produced a 15-minute 3D advertisement movie and found that color correction during the post-production stage was very effective for 3D depth. Left and right images captured with a beam-splitter-based rig and a parallel rig were used for this study. After correcting convergence and visual depth during editing, I also adjusted for strong contrast through color correction during the post-production stage. As a result, I could produce images with a strong sense of space and depth.

Complexity-based Sample Adaptive Offset Parallelism (복잡도 기반 적응적 샘플 오프셋 병렬화)

Ryu, Eun-Kyung;Jo, Hyun-Ho;Seo, Jung-Han;Sim, Dong-Gyu;Kim, Doo-Hyun;Song, Joon-Ho
- Journal of Broadcast Engineering / v.17 no.3 / pp.503-518 / 2012
In this paper, we propose a complexity-based parallelization method for the sample adaptive offset (SAO) algorithm, which is one of the HEVC in-loop filters. The SAO algorithm can be regarded as a region-based process, and the regions are obtained and represented with a quad-tree scheme. An offset that minimizes the reconstruction error is sent for each partitioned region. The SAO of HEVC can be parallelized at the data level. However, because the sizes and complexities of the SAO regions are not regular, workload imbalance occurs on multi-core platforms. In this paper, we propose an LCU-based SAO algorithm and a complexity prediction algorithm for each LCU. With the proposed complexity-based LCU processing, we found that the proposed algorithm is faster than the sequential implementation by a factor of 2.38. In addition, the proposed algorithm is 21% faster than a regular parallel SAO implementation.
https://doi.org/10.5909/JBE.2012.17.3.503
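
The workload-imbalance problem in the abstract above can be illustrated with a minimal sketch that distributes LCUs across cores using a predicted cost per LCU. The abstract does not state the scheduling policy, so the greedy "largest cost first, least-loaded core" rule is an assumption, as are the names `assign_lcus` and `core_loads` and the cost values.

```python
import heapq


def assign_lcus(predicted_costs, num_cores):
    """Assign LCU indices to cores, balancing the predicted cost per core.

    Greedy heuristic (an assumption, not the paper's algorithm): visit
    LCUs in decreasing predicted cost and give each one to the core
    with the smallest accumulated load so far.
    """
    heap = [(0, core) for core in range(num_cores)]  # (load, core id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_cores)]
    order = sorted(range(len(predicted_costs)), key=lambda i: -predicted_costs[i])
    for lcu in order:
        load, core = heapq.heappop(heap)
        assignment[core].append(lcu)
        heapq.heappush(heap, (load + predicted_costs[lcu], core))
    return assignment


def core_loads(predicted_costs, assignment):
    """Total predicted cost carried by each core."""
    return [sum(predicted_costs[i] for i in lcus) for lcus in assignment]
```

Splitting the same LCUs into equal-sized contiguous groups ignores their cost and can leave one core with most of the work, which is the imbalance the abstract describes.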

A practical design of direct digital frequency synthesizer with multi-ROM configuration (병렬 구조의 직접 디지털 주파수 합성기의 설계)

이종선;김대용;유영갑
- The Journal of Korean Institute of Communications and Information Sciences / v.21 no.12 / pp.3235-3245 / 1996
A DDFS (Direct Digital Frequency Synthesizer) used in spread spectrum communication systems requires fast switching speed, high resolution (the step size of the synthesizer), small size, and low power. The chip is designed with four parallel sine look-up tables to achieve four times the throughput of a single DDFS. To achieve a high processing speed, a 24-bit pipelined CMOS technique is applied to the phase accumulator design. To reduce the size of the ROM, each sine ROM of the DDFS stores sine wave data only for $0$ to $\pi/2$, taking advantage of the fact that only one quadrant of the sine needs to be stored because the sine is symmetric. Eight bits of the phase accumulator's output are used as ROM addresses, and the 2 MSBs control the quadrants to synthesize the sine wave. To compensate for the loss of spectral purity caused by phase truncation, the DDFS uses a noise shaper whose structure resembles a phase accumulator. The system input clock is divided into clock, 1/2 clock, and 1/4 clock, and the system uses the low frequency (1/4 clock) for every block except the MUX, which reduces power consumption. A 107 MHz DDFS implemented using $0.8{\mu}m$ CMOS gate array technology is presented. The synthesizer covers a bandwidth from DC to 26.5 MHz in steps of 1.48 Hz with a switching speed of $0.5{\mu}s$ and a tuning latency of 55 clock cycles. The DDFS synthesizes 10-bit sine waveforms with a spectral purity of -65 dBc. Power consumption is 276.5 mW at 40 MHz and 5 V.
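
The quarter-wave ROM scheme in the abstract above can be sketched in software as a single-channel model. The bit widths (24-bit accumulator, 8 address bits, 2 quadrant MSBs, 10-bit output) come from the abstract. The half-step ROM sampling offset, the rounding, and all names are assumptions, and the four parallel ROMs, pipelining, and noise shaper are omitted.

```python
import math

ACC_BITS = 24   # phase accumulator width (from the abstract)
ADDR_BITS = 8   # ROM address bits (from the abstract)
AMP_BITS = 10   # output sample width (from the abstract)

ROM_SIZE = 1 << ADDR_BITS
AMP_MAX = (1 << (AMP_BITS - 1)) - 1  # largest signed 10-bit value, 511

# Quarter-wave ROM: one quadrant (0 to pi/2) sampled at half-step offsets
# so that reading it backwards gives the mirrored quadrant exactly.
QUARTER_ROM = [
    round(AMP_MAX * math.sin((i + 0.5) * math.pi / (2 * ROM_SIZE)))
    for i in range(ROM_SIZE)
]


def sine_from_phase(phase):
    """Map a 24-bit phase to a signed sample using only the quarter ROM."""
    top = phase >> (ACC_BITS - ADDR_BITS - 2)  # keep the 10 most significant bits
    quadrant = top >> ADDR_BITS                # 2 MSBs select the quadrant
    addr = top & (ROM_SIZE - 1)                # next 8 bits address the ROM
    if quadrant & 1:                           # quadrants 1 and 3: mirror the address
        addr = ROM_SIZE - 1 - addr
    value = QUARTER_ROM[addr]
    return -value if quadrant & 2 else value   # quadrants 2 and 3: negate


def ddfs(tuning_word, num_samples):
    """Generate samples by accumulating the tuning word modulo 2**24."""
    phase, out = 0, []
    for _ in range(num_samples):
        out.append(sine_from_phase(phase))
        phase = (phase + tuning_word) & ((1 << ACC_BITS) - 1)
    return out
```

In a standard DDFS the output frequency is the tuning word times the clock frequency divided by $2^{24}$, so a tuning word of `1 << 22` yields one output period every four clock cycles.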

A Design of AXI hybrid on-chip Bus Architecture for the Interconnection of MPSoC (MPSoC 인터커넥션을 위한 AXI 하이브리드 온-칩 버스구조 설계)

Lee, Kyung-Ho;Kong, Jin-Hyeung
- Journal of the Institute of Electronics Engineers of Korea SD / v.48 no.8 / pp.33-44 / 2011
In this paper, we present a hybrid on-chip bus architecture based on the AMBA 3.0 AXI protocol for high-performance, low-power MPSoC. Among the AXI channels, the data channels, which carry a lot of traffic, are designed with a crossbar-switch architecture for massively parallel processing. The addressing and write-response channels, which carry little traffic, are handled by a shared-bus architecture to reduce overheads (area, interconnection wires, and power consumption). In experiments, comparisons are carried out in the time, space, and power domains to verify the proposed hybrid on-chip bus architecture. For a $16{\times}16$ bus configuration, the hybrid on-chip bus architecture performs almost as well in the time domain as the crossbar on-chip bus architecture: the masters' latency differs by about 9% and the total execution time by only about 4%. Furthermore, the hybrid on-chip bus architecture is very effective at reducing overhead, cutting area by about 47%, interconnection wires by about 52%, and dynamic power consumption by about 66%. Thus, the presented hybrid on-chip bus architecture is shown to be very effective for MPSoC interconnection design aiming at high performance and low power.

Area-efficient Interpolation Architecture for Soft-Decision List Decoding of Reed-Solomon Codes (연판정 Reed-Solomon 리스트 디코딩을 위한 저복잡도 Interpolation 구조)

Lee, Sungman;Park, Taegeun
- Journal of the Institute of Electronics and Information Engineers / v.50 no.3 / pp.59-67 / 2013
Reed-Solomon (RS) codes are powerful error-correcting codes used in diverse applications. Recently, an algebraic soft-decision decoding algorithm for RS codes that can correct errors beyond the error-correcting bound has been proposed. The algorithm requires very intensive computation for interpolation, so an efficient VLSI architecture that can be realized with moderate hardware complexity is essential for various applications. In this paper, we propose an efficient architecture with low hardware complexity for interpolation in soft-decision list decoding of Reed-Solomon codes. The proposed architecture processes the candidate polynomial in such a way that the terms of X degrees are processed serially and the terms of Y degrees are processed in parallel. The processing order of candidate polynomials changes adaptively to increase the efficiency of memory access for coefficients; this minimizes the internal registers and the number of memory accesses and simplifies the memory structure by combining and storing data in memory. The proposed architecture also shows high hardware efficiency, since each module is balanced in terms of latency and the modules are maximally overlapped in the schedule. The proposed interpolation architecture for the (255, 239) RS list decoder is designed and synthesized using the DongbuHitek $0.18{\mu}m$ standard cell library; the gate count is 25.1K and the maximum operating frequency is 200 MHz.
https://doi.org/10.5573/ieek.2013.50.3.059
