Search | Korea Science

Voice Synthesis Detection Using Language Model-Based Speech Feature Extraction (언어 모델 기반 음성 특징 추출을 활용한 생성 음성 탐지)

Seung-min Kim;So-hee Park;Dae-seon Choi
- Journal of the Korea Institute of Information Security & Cryptology / v.34 no.3 / pp.439-449 / 2024
Recent rapid advances in voice generation technology have made it possible to synthesize natural-sounding voices from text alone. However, this progress has led to an increase in malicious activities such as voice phishing (vishing), where generated voices are exploited for criminal purposes. Numerous models have been developed to detect synthesized voices, typically by extracting features from the voice and using them to estimate the likelihood that the voice was generated. This paper proposes a new voice feature extraction model to address misuse of generated voices. It uses a deep learning-based audio codec model and the pre-trained natural language processing model BERT to extract novel voice features. To assess the suitability of the proposed feature extraction model for detection, four generated-voice detection models were built on the extracted features and evaluated. For comparison, three voice detection models based on Deepfeature, proposed in previous studies, were evaluated against the other models in terms of accuracy and equal error rate (EER). The proposed model achieved an accuracy of 88.08% and a low EER of 11.79%, outperforming the existing models. These results confirm that the proposed voice feature extraction method can be an effective tool for distinguishing generated voices from real ones.
https://doi.org/10.13089/JKIISC.2024.34.3.439
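The abstract above reports EER alongside accuracy. As a rough illustration of what that metric measures, the sketch below computes an equal error rate from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "generated", and the simple threshold sweep are all assumptions made for this example.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Return (eer, threshold) for a binary detector.

    Assumed convention (not from the paper): higher score = more likely
    generated, label 1 = generated voice, label 0 = real voice.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.unique(scores)
    best_gap, eer, best_t = np.inf, 1.0, thresholds[0]
    for t in thresholds:
        predicted = scores >= t
        far = np.mean(predicted[~labels])   # real voices flagged as generated
        frr = np.mean(~predicted[labels])   # generated voices that were missed
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER is read off where the two error rates are closest.
            best_gap, eer, best_t = gap, (far + frr) / 2, t
    return eer, best_t
```

With a finite test set the two error rates rarely cross exactly, so this sketch averages them at the closest threshold; evaluation toolkits often interpolate instead.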

A Study on Contents Analysis for Descriptive Video Services (화면해설방송 콘텐츠 분석기술 연구)

AHN, Chung Hyun;Jang, Inseon
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2017.06a / pp.209-210 / 2017
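This paper proposes a method for quantitatively analyzing the proportion of descriptive audio within actual broadcast programs in descriptive video services, which are provided so that visually impaired people can follow broadcasts. To this end, we listened to about 100 descriptive video broadcasts to extract the proportion manually and compared it with the results of software-based analysis.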

Digital Watermarking for Text Document Using Diagonal Profile (Diagonal 프로파일을 이용한 텍스트 문서의 디지털 워터마킹)

정숙이;김은실;박지환
- Proceedings of the Korea Multimedia Society Conference / 2000.04a / pp.165-169 / 2000
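With the growth of open computer networks such as the Internet, multimedia data such as audio, images, video, or text documents can be used illegally, in violation of intellectual property rights, without any degradation. This paper proposes a watermarking scheme for protecting the copyright of text documents that are illegally distributed or copied. The scheme uses the diagonal profile of the text document image to embed the original owner's secret information, that is, copyright information, in the document, and introduces a new watermark embedding and extraction method intended to deter illegal copying. Owing to the properties of the diagonal profile, removal of the watermark or alteration of the document's layout by an attacker can be detected easily.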
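The abstract does not define the diagonal profile, so the following is only a generic sketch of the idea, assuming it means the count of ink pixels along each diagonal of a binarized document image. The function name `diagonal_profile` and the choice of top-left to bottom-right diagonals are assumptions for illustration, not the authors' scheme.

```python
import numpy as np

def diagonal_profile(binary_image):
    """Count ink pixels (value 1) along every top-left to bottom-right diagonal.

    Assumed input: a 2-D array where 1 marks ink and 0 marks background.
    """
    img = np.asarray(binary_image, dtype=int)
    rows, cols = img.shape
    # Offsets run from the bottom-left corner diagonal to the top-right one.
    offsets = range(-(rows - 1), cols)
    return np.array([np.trace(img, offset=k) for k in offsets])

# Tiny 2x3 example image, made up for this sketch.
page = [[1, 0, 1],
        [0, 1, 0]]
profile = diagonal_profile(page)
```

Because every pixel lies on exactly one diagonal, shifting or erasing text changes the profile, which is consistent with the abstract's claim that tampering is easy to detect.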

Copyright Protection of Multimedia Contents Using Watermark (워터마크를 이용한 멀티미디어 컨텐츠의 저작권 보호)

Seok, J.W.;Hong, J.W.
- Electronics and Telecommunications Trends / v.14 no.6 s.60 / pp.64-73 / 1999
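A digital watermark is a signal added to an original signal so that it can be detected or extracted after being embedded in digital data. Sometimes also called a digital signature, a watermark is a kind of pattern embedded in digital data, and it has recently become an active research area for protecting the copyright of digital multimedia works. This article reviews the history, definition, and applications of watermarking technology for protecting ownership of multimedia data, the requirements a watermark must satisfy, and watermarking techniques for text, image, and audio data.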
https://doi.org/10.22648/ETRI.1999.J.140607

The Research On the improvements of Speaker's Frequency Characteristic using DSP Audio Processor (DSP 오디오 프로세서를 이용한 스피커 주파수 특성 개선에 관한 연구)

Lee, Soon-Reyo;Choi, Hong-Sub
- Journal of Digital Contents Society / v.8 no.3 / pp.341-346 / 2007
This paper proposes the design of a VADSM (Value-Added Digital Speaker Module), which tunes a speaker unit by measuring the speaker's frequency response and controlling EQ bands. The module can reduce audible distortion in particular frequency bands and improve the flatness of the speaker's frequency response. The VADSM consists of a DSP amplifier and a speaker unit. When a speaker converts an electrical signal into sound, the magnitude response at some frequencies is above or below the normal level, so the DSP amplifier can be used to raise or lower those magnitudes by controlling its EQ bands.

Multi-Modal based ViT Model for Video Data Emotion Classification (영상 데이터 감정 분류를 위한 멀티 모달 기반의 ViT 모델)

Yerim Kim;Dong-Gyu Lee;Seo-Yeong Ahn;Jee-Hyun Kim
- Proceedings of the Korean Society of Computer Information Conference / 2023.01a / pp.9-12 / 2023
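In video content, not only the message itself but also the emotion conveyed through the form of the message now affects the psychological state of viewers. Accordingly, research on classifying the emotions of video content is active. This paper proposes a multimodal video emotion classification model. It takes videos from YouTube, one of the popular video streaming platforms, that are classified into seven emotion categories, and extracts audio and image data separately from each video for training. Pre-trained VGG (Visual Geometry Group) and ViT (Vision Transformer) models were used to train the audio and image classification models, which were then merged using the merging method proposed in this paper and compared. Unlike existing video emotion classification approaches, the proposed approach classifies emotion without recognizing the speaker in the video, and it achieved an accuracy of up to 48%.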

Audio-Visual Scene Aware Dialogue System Utilizing Action From Vision and Language Features (이미지-텍스트 자질을 이용한 행동 포착 비디오 기반 대화시스템)

Jungwoo Lim;Yoonna Jang;Junyoung Son;Seungyoon Lee;Kinam Park;Heuiseok Lim
- Annual Conference on Human and Language Technology / 2023.10a / pp.253-257 / 2023
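Dialogue systems have recently been applied to real-world human-machine interfaces such as smartphone assistants, car navigation, voice-controlled speakers, and human-centered robots. However, most dialogue systems operate on text and cannot process multimodal input. Solving this problem requires a dialogue system that integrates multimodal scene understanding, such as video. Existing video-based dialogue systems have focused mainly on fusing diverse features such as vision, image, and audio, or on aligning images and text through pre-training, and as a result they miss important action cues and sound cues. This paper improves video-based dialogue systems by using pre-trained image-text alignment embeddings together with action cues and sound cues. The proposed model encodes text, image, and audio embeddings, and from them extracts relevant frames and action cues to generate utterances. In experiments on the AVSD dataset, the proposed model outperformed existing models, and representative image-text features were compared and analyzed in a video-based dialogue system.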

Music Transcription Using Non-Negative Matrix Factorization (비음수 행렬 분해 (NMF)를 이용한 악보 전사)

Park, Sang-Ha;Lee, Seok-Jin;Sung, Koeng-Mo
- The Journal of the Acoustical Society of Korea / v.29 no.2 / pp.102-110 / 2010
Music transcription is the task of extracting pitch (the height of a musical note) and rhythm (the length of a musical note) information from an audio file and producing a music score. In this paper, we decompose a waveform into frequency and rhythm components using Non-Negative Matrix Factorization (NMF) and Non-Negative Sparse Coding (NNSC), which are often used for source separation and data clustering. The fundamental frequency is then calculated from the decomposed frequency components using the subharmonic summation method, so the pitch of each note can be estimated accurately. The proposed method performed music transcription successfully, with results superior to those of conventional methods that use either NMF or NNSC alone.
https://doi.org/10.7776/ASK.2010.29.2.102
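To make the decomposition step in the abstract above more concrete, here is a minimal sketch of plain NMF with the standard multiplicative updates for the squared-error objective. It covers only the NMF part; the paper's NNSC variant and subharmonic summation are not shown. The function name `nmf`, the toy matrix, and all parameter values are assumptions for illustration, not the authors' settings.

```python
import numpy as np

def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (freq x time) into W (freq x rank)
    and H (rank x time) using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iterations):
        # Each update rescales entries by a non-negative ratio,
        # so W and H stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram": two notes with distinct spectra, sounding at different times.
spectra = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
activations = np.array([[1.0, 1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0, 1.0]])
V = spectra @ activations
W, H = nmf(V, rank=2)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In a transcription setting, the columns of `W` play the role of note spectra (frequency components) and the rows of `H` indicate when each one is active (rhythm components).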

A Study on Auditory Data Visualization Design for Multimedia Contents (멀티미디어 컨텐츠를 위한 청각데이터의 시각화 디자인에 관한 연구)

Hong, Sung-Dae;Park, Jin-Wan
- Archives of design research / v.18 no.1 s.59 / pp.195-204 / 2005
With the evolution of digital technology, trends are moving toward personalization and customization in design (art), media, and science. Because of technical and economic limitations, existing mass media has broadcast to the general public, and artworks have likewise communicated one-sidedly with spectators in the gallery or on stage. Nowadays, however, spectators can participate directly, and different products can be made to suit the tastes of the individuals who consume media or art. The essential technology that makes this possible is 'interactive technology'. The goal of this research is to identify the true nature of interactive design in multimedia contents and to set a direction for interactive communication design research. The paper proceeds in two stages. First, we studied the concept of multimedia contents from the perspective of the information revolution. Next, we chose 'visuals reacting to audio' as our research topic and, as graphic designers, created an audio-visual artwork. Through this research we found that appropriate interactive design has the potential to promote 'communication' in a broad sense.

Hand-held Multimedia Device Identification Based on Audio Source (음원을 이용한 멀티미디어 휴대용 단말장치 판별)

Lee, Myung Hwan;Jang, Tae Ung;Moon, Chang Bae;Kim, Byeong Man;Oh, Duk-Hwan
- Journal of Korea Society of Industrial Information Systems / v.19 no.2 / pp.73-83 / 2014
Thanks to the development of diverse audio editing technology, audio files can be easily modified. As a result, social problems such as forgery may arise, and digital forensic technology is being actively studied to address them. This paper proposes a hand-held device identification method, which is one area of digital forensic technology. It uses the noise features of devices, which arise from each device's design and integrated circuits but cannot be perceived by listeners. A Wiener filter is used to obtain the device noise, acoustic features are extracted from it with MIRtoolbox, and a multi-layer neural network is trained on those features. To evaluate the proposed method, we use 5-fold cross-validation on recorded data collected from 6 mobile devices. The experiments show a performance of 99.9%. We also ran experiments to check whether the noise features of mobile devices remain useful after the data are uploaded to UCC. These experiments show a performance of 99.8% on UCC data.
https://doi.org/10.9723/jksiis.2014.19.2.073

