• Title/Abstract/Keywords: perceptual loss

Search results: 59 items, processing time 0.022 s

Complex nested U-Net-based speech enhancement model using a dual-branch decoder

  • 황서림;박성욱;박영철
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 43, No. 2
    • /
    • pp.253-259
    • /
    • 2024
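  • This paper proposes a new speech enhancement model based on a complex nested U-Net with a dual-branch decoder. The model is built on a complex nested U-Net so that the magnitude and phase components of the speech signal can be estimated simultaneously, and its decoder has a dual-branch structure in which spectral mapping and time-frequency masking are performed in separate branches. Compared with a single-decoder structure, the dual-branch decoder removes noise effectively while minimizing the loss of speech information. Experiments were conducted on the VoiceBank + DEMAND database, which is commonly used for training speech enhancement models, and the model was evaluated with various objective metrics. The proposed model improved the Perceptual Evaluation of Speech Quality (PESQ) score by about 0.13 over the baseline and also achieved higher objective scores than recently proposed speech enhancement models.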

The Analysis of Information Transfer Efficiency in Medical Image Display

  • 김종효;민병구;한만청;이충웅
    • The Korean Society of Medical and Biological Engineering: Conference Proceedings
    • /
    • The Korean Society of Medical and Biological Engineering 1992 Spring Conference
    • /
    • pp.55-57
    • /
    • 1992
  • Image display is the last step of the imaging chain, in which diagnostic information is transformed into perceivable intensities and transferred to the observer's eye-brain system. In this process, some of the information may be transferred efficiently while other parts are transferred inefficiently, leading to information loss. In this study, the visual perceptual properties of image display on a CRT monitor have been investigated. A psychophysical experiment on target image detection was performed using a CRT monitor at various background grey levels, and the threshold grey-level differences required for visual discrimination were predicted by computer simulation with a visual model.

An Enhancement of the MPEG-2 Audio Encoder Using General DSPs

  • 오현오;김성윤;윤대희;차일환;이준용
    • The Korean Institute of Broadcast and Media Engineers: Conference Proceedings
    • /
    • The Korean Society of Broadcast Engineers 1997 Conference
    • /
    • pp.63-67
    • /
    • 1997
  • The ISO (International Organization for Standardization) has standardized MPEG-2 audio. The MPEG-2 audio compression algorithm is based on subband analysis and exploits human auditory characteristics to achieve a low bit rate with minimal perceptual loss of audio signal quality. This paper presents an enhanced MPEG-2 audio encoder using multiple TMS320C30 general-purpose DSPs. The developed system is made up of five slave boards and one master board. Each slave board performs subband analysis and psychoacoustic parameter calculation for one channel, and the master board manages bit allocation, quantization, and bit-stream formatting for all channels. Parallel processing and pipelining techniques are used in the hardware structure, and fast algorithms are applied in each subroutine to achieve real-time processing. The implemented system supports up to 5.1 channels and various bit rates.

Perceptual Fusion of Infrared and Visible Image through Variational Multiscale with Guide Filtering

  • Feng, Xin;Hu, Kaiqun
    • Journal of Information Processing Systems
    • /
    • Vol. 15, No. 6
    • /
    • pp.1296-1305
    • /
    • 2019
  • To address the poor noise suppression and the frequent loss of edge contours and detail in current fusion methods, an infrared and visible light image fusion method based on variational multiscale decomposition is proposed. First, the source images are separately processed through variational multiscale decomposition to obtain texture components and structural components. A guided filter is used to fuse the texture components. For the structural components, a method is proposed that determines the fusion weights from combined phase consistency, sharpness, and brightness information. Finally, the fused texture components and the fused structural components are added to obtain the final fused image. The experimental results show that the proposed method is highly robust to noise and also achieves better fusion quality.

Classification and Analysis of Human Error Accidents of Helicopter Pilots in Korea

  • 유태정;권영국;송병흠
    • Journal of the Korean Society for Aviation and Aeronautics
    • /
    • Vol. 28, No. 4
    • /
    • pp.21-31
    • /
    • 2020
  • There are two to three helicopter accidents every year in Korea, representing 5.7 deaths per 100,000 flights. In this study, an analysis was conducted on helicopter accidents that occurred in Korea from 2005 to 2017. The analysis was based on the aircraft accident and incident reports published by the Aircraft and Railway Accident Investigation Board. This research analyzed the characteristics of accidents in Korea caused by pilots' human error. The accidents were classified by organization, flight mission, aircraft class, flight stage, accident cause, and so on. Pilots' human error was classified as skill-based error, decision error, and perceptual error in accordance with the HFACS taxonomy. The accidents caused by pilots' human error were classified into five categories: powerline collision, loss of control, fuel exhaustion, unstable approach to reservoir, and elimination of tail rotor.

Future 2nd generation face prediction web service using StyleGAN

  • 김황;김민정;이지현;정진아;김동욱;곽호영
    • The Korean Society of Computer and Information: Conference Proceedings
    • /
    • The Korean Society of Computer and Information, Proceedings of the 69th Winter Conference 2024, Vol. 32, No. 1
    • /
    • pp.329-330
    • /
    • 2024
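  • Demand for generative AI has been rising recently, and the MZ generation's tendency toward self-love has heightened curiosity about media content that uses one's own face. Accordingly, to encourage the creativity and media consumption of the MZ generation, this paper designs and implements a web service, centered on StyleGAN technology, that generates a virtual image of a second-generation child resembling the user.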

Time-Scale Modification of Polyphonic Audio Signals Using Sinusoidal Modeling

  • 장호근;박주성
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 20, No. 2
    • /
    • pp.77-85
    • /
    • 2001
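  • This paper proposes a method that models audio signals with complex spectra, such as polyphonic sounds, as sinusoidal components and uses this model to obtain high-quality time-scale-modified audio. The input signal passes through a multiresolution filter bank with an octave-band structure, and sinusoidal components are extracted from each resulting subband signal. In the sinusoidal analysis of the subband signals, a dynamic segmentation method is applied that varies the length of the interval from which sinusoidal components are extracted according to local signal characteristics. This reduces the smearing that conventional sinusoidal modeling produces in transient regions and yields sound quality close to the original even after time-scale modification. Matching pursuit with a psychoacoustic model is used as the spectral analysis tool for the sinusoidal analysis, which reduces the number of sinusoidal components and provides a reasonable stopping condition for the matching pursuit iterations. The noise component of the signal, which is difficult to represent with sinusoids, is obtained by subtracting the signal synthesized from the sinusoidal components from the original signal, and is modeled by spectral envelope approximation. Experiments on various polyphonic sounds confirmed that the proposed sinusoidal modeling method restores the quality of the original signal well and represents transient regions well even at large time-scale modification ratios.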
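
The residual step in this abstract (noise component = original signal minus the signal synthesized from the sinusoidal components) can be illustrated with a minimal sketch. The function names `synthesize` and `residual`, the `(amplitude, frequency_hz, phase)` tuple layout, and all sample values are assumptions made for illustration and are not taken from the paper. The sketch uses stationary sinusoids only and omits the filter bank, dynamic segmentation, and matching pursuit stages.

```python
import math

def synthesize(partials, num_samples, sample_rate):
    """Sum stationary sinusoids given as (amplitude, frequency_hz, phase) tuples."""
    out = [0.0] * num_samples
    for amp, freq, phase in partials:
        for n in range(num_samples):
            out[n] += amp * math.cos(2.0 * math.pi * freq * n / sample_rate + phase)
    return out

def residual(signal, partials, sample_rate):
    """Noise estimate: original signal minus the sinusoidal resynthesis."""
    model = synthesize(partials, len(signal), sample_rate)
    return [x - m for x, m in zip(signal, model)]

# Hypothetical example: two partials plus a small, known disturbance.
sample_rate = 8000
partials = [(1.0, 440.0, 0.0), (0.5, 660.0, 0.3)]
disturbance = [0.01 * ((n % 7) - 3) for n in range(256)]
clean = synthesize(partials, 256, sample_rate)
signal = [s + d for s, d in zip(clean, disturbance)]

# With exactly known partials, the residual recovers the disturbance.
noise = residual(signal, partials, sample_rate)
```

In practice the partials would be estimates from the analysis stage, so the residual would also contain modeling error in addition to the noise component.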

A Study on Development of a Hearing Impairment Simulator considering Frequency Selectivity and Asymmetrical Auditory Filter of the Hearing Impaired

  • 주상익;강현덕;송영록;이상민
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • Vol. 59, No. 4
    • /
    • pp.831-840
    • /
    • 2010
  • In this paper, we propose a hearing impairment simulator that accounts for the reduced frequency selectivity and asymmetrical auditory filters of the hearing impaired, and we verify through experiments how these two factors affect speech perception. Reduced frequency selectivity is implemented by spectral smearing using LPC (linear prediction coding). The shapes of the auditory filters are asymmetrical and differ with each center frequency. The auditory filters of a person with hearing loss differ from those of normal-hearing people, which leads to different speech quality values for speech passed through the auditory filter. The evaluation comprised subjective and objective tests. The subjective experiments consisted of four tests: a pure tone test, an SRT (speech reception threshold) test, a WRS (word recognition score) test without spectral smearing, and a WRS test with spectral smearing. The experiments with the hearing impairment simulator were performed on 9 subjects with normal hearing. The amount of spectral smearing was controlled by the LPC order. The asymmetrical auditory filter of the proposed simulator was simulated, and tests were then performed to evaluate the filter's performance objectively. The objective evaluation measured the speech quality and distortion of speech processed by the auditory filter using PESQ (perceptual evaluation of speech quality) and LLR (log likelihood ratio). When hearing loss was simulated, the PESQ and LLR values differed considerably depending on the asymmetrical auditory filter in the simulator.

The Assessment on the Sound Quality of Reduced Frequency Selectivity of Hearing Impaired People

  • 안홍섭;박규석;전유용;송영록;이상민
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • Vol. 60, No. 6
    • /
    • pp.1196-1203
    • /
    • 2011
  • Reduced frequency selectivity is a typical phenomenon of sensorineural hearing loss. In this paper, we compare two methods for modeling the reduced frequency selectivity of hearing-impaired people. The two models were built using an LPC (linear prediction coding) algorithm and a bandwidth control algorithm based on the ERB (equivalent rectangular bandwidth) of the auditory filter, respectively. To compare the effectiveness of the two models, we compared their PESQ (perceptual evaluation of speech quality) and LLR (log likelihood ratio) results on 36 two-syllable Korean words. To verify the effect under noisy conditions, we mixed white and babble noise with the speech words at SNRs of 0 dB and -3 dB. The results confirm that the PESQ score of the bandwidth control algorithm is higher than that of the LPC algorithm, whereas the LLR score of the LPC algorithm is lower than that of the bandwidth control algorithm. This suggests that both the non-linearity and the widened auditory filter characteristics caused by reduced frequency selectivity are reflected better by the bandwidth control algorithm than by the LPC algorithm.
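
Mixing noise with speech at a fixed SNR, as in the 0 dB and -3 dB conditions mentioned in this abstract, can be sketched as follows. This is a generic illustration and not the authors' procedure: the helper names `power`, `mix_at_snr`, and `measured_snr_db` are hypothetical, the "speech" is a synthetic sine tone, and only white noise is generated (babble noise would need recorded material).

```python
import math
import random

def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so that speech power / noise power matches the target SNR, then add."""
    # Solve power(speech) / (gain^2 * power(noise)) = 10^(target_snr_db / 10) for gain.
    gain = math.sqrt(power(speech) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    scaled = [gain * v for v in noise]
    mixed = [s + v for s, v in zip(speech, scaled)]
    return mixed, scaled

def measured_snr_db(speech, noise):
    """SNR in dB computed from the separate speech and noise signals."""
    return 10.0 * math.log10(power(speech) / power(noise))

# Assumed stand-ins: a 200 Hz tone as "speech" and seeded Gaussian white noise.
rng = random.Random(0)
speech = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(8000)]
white = [rng.gauss(0.0, 1.0) for _ in range(8000)]

mixed_0db, noise_0db = mix_at_snr(speech, white, 0.0)
mixed_m3db, noise_m3db = mix_at_snr(speech, white, -3.0)
```

At -3 dB the scaled noise carries about twice the power of the speech, which is why that condition is the harder of the two.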

A Study of the Characteristics of Cybernetics Exhibited in Fashion as a Media of Digital Environment

  • 김현수;김민자
    • Journal of the Korean Society of Costume
    • /
    • Vol. 55, No. 4
    • /
    • pp.79-94
    • /
    • 2005
  • The goal of this research, conducted from the perspective of media aesthetics, is to uncover how mechanical/cybersensual fashion products express the aesthetic characteristics of cybernetics by comparing them with digital products designed through the application of cybernetics. A second goal is to provide a cultural and design framework of cybernetics as a digital-environmental medium for fashion in which high-tech and human sensibilities are fused. The results point to two contrasting perceptual possibilities for understanding the application of digital technology: negative and positive feedback. Cybernetic optimism, centered on technological dimensions, focuses on a concept of fashion that emphasizes instrumental aspects such as efficiency and convenience. In contrast, cybernetic pessimism focuses on digital fashion that expresses environmental destruction and the loss of human identity. A comparative analysis of the aesthetics of expression in digital fashion design and digital industrial products from a cybernetic perspective showed that, in a digital-environment society, the combination of negative and positive feedback resulted in design products in which internal and external aspects of beauty complemented each other.