• Title/Summary/Keyword: voice recording (음성 기록)

Study on the Vulnerabilities of Automatic Speech Recognition Models in Military Environments (군사적 환경에서 음성인식 모델의 취약성에 관한 연구)

  • Elim Won;Seongjung Na;Youngjin Ko
    • Convergence Security Journal
    • /
    • v.24 no.2
    • /
    • pp.201-207
    • /
    • 2024
  • Voice is a critical element of human communication, and the development of speech recognition models is one of the significant achievements in artificial intelligence, which has recently been applied in various aspects of human life. The application of speech recognition models in the military field is also inevitable. However, before artificial intelligence models can be applied in the military, their vulnerabilities must be studied. In this study, we evaluate the military applicability of the multilingual speech recognition model "Whisper" by examining its vulnerabilities to battlefield noise, white noise, and adversarial attacks. In experiments involving battlefield noise, Whisper showed significant performance degradation, with an average Character Error Rate (CER) of 72.4%, indicating difficulties in military applications. In experiments with white noise, Whisper was robust to low-intensity noise but degraded under high-intensity noise. Adversarial attack experiments revealed vulnerabilities at specific epsilon values. Therefore, the Whisper model requires improvements through fine-tuning, adversarial training, and other methods.
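
The CER figure quoted in the abstract above can be illustrated with a minimal sketch. It assumes the common definition, character-level edit distance divided by the reference length. The abstract does not say how the authors normalized text (for example, whether spaces or punctuation were stripped), so the function names and the absence of any normalization here are illustrative assumptions, not the paper's procedure.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition the CER can exceed 1.0 when the hypothesis contains many inserted characters, so it is not strictly a percentage of the reference.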

Design and Implementation of PDA-Based Busan Culture and Tourism Guide System (PDA 기반의 부산경남 문화 관광 안내 시스템의 설계 및 구현)

  • Cha, Jong-Woo;Kim, Hyun-Soo;Ann, Chul-Jun;Cho, Mi-Gyung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2003.11b
    • /
    • pp.837-840
    • /
    • 2003
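  • Portable computers such as PDAs have the advantage that they can be used anywhere, regardless of location. Building on this advantage, this paper presents a portable culture and tourism guide system that lets travelers look up cultural and tourism information, accommodations, and related information for the Busan-Gyeongnam region anywhere during their trip. The system uses ADOCE (Microsoft ActiveX Data Objects for Windows CE) to connect to a database and provides information on Busan-Gyeongnam culture and tourism, themed trips, accommodations, transportation, and food not only as plain text but also as video, audio, and maps. It gives travelers approximate geographic information and travel distances when they want to move from their current location to another. It also lets travelers record travel notes about tourist sites by pen and voice and transfer them to a PC.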

Research on Implementing Digital Diary Minting Application By Using DALL-E2 and Blockchain (DALL-E2와 블록체인을 활용한 일기 작성 및 NFT 민팅 애플리케이션 구현)

  • Ha-Yoon Kim;Woo-Jung Park;You-Jeen Lee;So-Young Kim;Min-Jae Kim
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.888-889
    • /
    • 2023
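  • The traditional practice of keeping records in documents has shifted in modern times toward using blogs, Instagram, and other social networking services (SNS). With their development and popularization, SNS have become a common diary-writing format for people today. To replace existing passive diary applications, which contrast with growing demand and digital innovation, this paper proposes implementing a diary-writing and minting application that uses DALL-E2 and blockchain. Through the proposed application, users are offered several ways to write a diary, including speech recognition and optical character recognition, and can preserve the finished diary image as a digital asset.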

Relationship of Somatic Cell Score and Udder Type Traits of Holstein Cattle (체세포점수와 홀스타인 유방형질간의 관계)

  • Choi, Tae Jeong;Seo, Kang Seok;Kim, Sidong;Park, Byung Ho;Choi, Je Kwan;Yoon, Ho Paek;Na, Seung Hwan;Son, Sam Kyu;Kwon, Oh Sub;Cho, Kwang Hyun
    • Journal of Animal Science and Technology
    • /
    • v.50 no.3
    • /
    • pp.285-292
    • /
    • 2008
  • Data were taken from the dairy herd improvement program from the year 2000, comprising 10,929 first-lactation cows with 290,144 test-day records and 37,723 udder type records. The objective of the study was to estimate genetic and phenotypic correlations between fore udder attachment, rear udder height, rear udder width, udder cleft, udder depth, and somatic cell score (SCS), and to calculate the heritability of udder depth, front teat length, and SCS in Holstein cattle in Korea. Variance components were estimated under a test-day model using derivative-free restricted maximum likelihood (DF-REML). Phenotypic correlations between udder traits and lactation SCS were generally very low, ranging from -0.03 to -0.06. Heritability of all type traits and SCS was below 0.12. The results of this study would be applicable to SCS using linear genetic evaluation for future studies.

A Method of Recognizing and Validating Road Name Address from Speech-oriented Text (음성 기반 도로명 주소 인식 및 주소 검증 기법)

  • Lee, Keonsoo;Kim, Jung-Yeon;Kang, Byeong-Gwon
    • Journal of Internet Computing and Services
    • /
    • v.22 no.1
    • /
    • pp.31-39
    • /
    • 2021
  • Obtaining delivery addresses from calls is one of the most important processes in the TV home shopping business. Automating this process can increase the operational efficiency of TV home shopping. This paper proposes a method of recognizing and validating road name addresses, the address system of South Korea, from speech-oriented text. Speech-oriented text poses three challenges. The first is that numbers are written out as they are pronounced. The second is that the recorded address contains noise caused by repeated pronunciation of the same address or by address elements given out of order. The third is the readability of the resulting address. To resolve these problems, the proposed method enhances the existing address databases provided by Korea Post and the Ministry of the Interior and Safety. Various ways of pronouncing addresses are added, and heuristic rules for dividing ambiguous pronunciations are employed. The processed address is then validated by checking that it exists in the official address database. Although the proposed method targets the STT result of spoken addresses, it can also be used by any third-party service that needs to validate road name addresses. The proposed method is robust to noise such as reordered or omitted address elements.

The fundamental frequency (f0) distribution of Korean speakers in a dialogue corpus using Praat and R (Praat과 R로 분석한 한국인 대화 음성 말뭉치의 fundamental frequency(f0)값 분포)

  • Byunggon Yang
    • Phonetics and Speech Sciences
    • /
    • v.15 no.3
    • /
    • pp.17-25
    • /
    • 2023
  • This study examines the fundamental frequency(f0) distribution of 2,740 Korean speakers in a dialogue speech corpus. Praat and R were used for the collection and analysis of acoustical f0 data after removing extreme values considering the interquartile f0 range of the intonational phrases produced by each individual speaker. Results showed that the average f0 value of all speakers was 185 Hz and the median value was 187 Hz. The f0 data showed a positively skewed distribution of 0.11, and the kurtosis was -0.09, which is close to the normal distribution. The pitch values of daily conversations varied in the range of 238 Hz. Further examination of the male and female groups showed distinct median f0 values: 114 Hz for males and 199 Hz for females. A t-test between the two groups yielded a significant difference. The skewness representing the distribution shape was 1.24 for the male group and 0.58 for the female group. The kurtosis was 5.21 and 3.88 for the male and female groups, and the male group values appeared leptokurtic. A regression analysis between the median f0 and age yielded a slope of 0.15 for the male group and -0.586 for the female group, which indicated a divergent relationship. In conclusion, a normative f0 distribution of different Korean age and sex groups can be examined in the conversational speech corpus recorded by a massive number of participants. However, more rigorous data might be required to define a relation between age and f0 values.
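
The interquartile-range filtering mentioned in the abstract above can be sketched as follows. The abstract does not give the exact rule, so this sketch assumes Tukey's conventional fences (1.5 times the IQR beyond the first and third quartiles) applied to one speaker's f0 values. The function name and the factor `k` are illustrative assumptions, and the study itself used Praat and R rather than Python.

```python
from statistics import quantiles


def remove_f0_outliers(f0_values, k=1.5):
    """Drop f0 values outside [Q1 - k*IQR, Q3 + k*IQR] for one speaker.

    k=1.5 is Tukey's conventional fence, assumed here for illustration.
    """
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(f0_values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in f0_values if low <= v <= high]
```

Applying the filter per speaker rather than over the pooled data keeps a naturally high-pitched or low-pitched speaker from being treated as an outlier relative to the whole corpus.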

Value and Prospect of individual diary as research materials : Based on the "The 12th May Diaries Collection" (개인 일기의 연구 자료로서의 가치와 전망 "5월12일 일기컬렉션"을 중심으로)

  • Choi, Hyo Jin;Yim, Jin Hee
    • The Korean Journal of Archival Studies
    • /
    • no.46
    • /
    • pp.95-152
    • /
    • 2015
  • "Archives of Everyday Life" refers to an organization or facility that collects, appraises, selects, and preserves documents drawn from the memory of individuals, groups, or a society by categorizing and classifying the lives and cultures of ordinary people. Such documents include diaries, autobiographies, letters, and notes, as well as digital files and hypertext such as posts on blogs and online communities or photos uploaded to social network services. Many research fields, including records management studies, have continuously argued for the necessity of collecting and preserving the records of daily life that ordinary people produce every moment. A diary in particular is a written record reflecting the facts experienced by an individual and his or her self-examination. Its originality, individuality, and uniqueness are considered truly valuable in a document regardless of the era. Lately many diaries have been discovered and presented to the historical research community, and researchers across the humanities and social sciences have embarked on more in-depth research on diaries, their authors, and the social background of the time. Furthermore, researchers in linguistics, educational studies, and psychology analyze the linguistic behaviors, status of cultural assimilation, and emotional or psychological changes of an author. In this study, we conduct a meta-study of various research on diaries in order to reaffirm the value of "The 12th May Diaries Collection" as an everyday life archive. "The 12th May Diaries Collection" consists of diaries produced and donated directly by citizens on the 12th of May every year. The Digital Archiving Institute at Univ. of Myungji organized the first "Annual call for the 12th May" only in 2013. More than 2,000 items have now been collected, including handwritten diaries, digital documents, photos, and audio and video files. The participants range in age from children to senior citizens. In this study, the collected diaries will be analyzed quantitatively, and the detailed contents of each item will be examined in more depth. It is not difficult to find stories about family and friends, school life, concerns over career paths, and the daily lives and feelings of citizens across different generations, regions, and professions. A more comprehensive examination will be made based on the keywords and descriptors of each item. This study will also suggest future research opportunities for these diaries in fields such as linguistics, educational studies, historical studies, and the humanities, considering the diverse formats and contents of the diaries. Finally, this study will discuss the tasks and challenges that must be addressed for "The 12th May Diaries Collection" to continue to be collected and preserved as an everyday life archive.

An Electromyographic Study of the Levator Palatini Activity in the Production of Korea Sentences Containing Three Types of Initial Stops Placed at the Postnasal Position (한국 구개열 환자의 농음(Fortis) 산출곤란 원인규명을 위한 실험음성학적 연구 -정상인에 관한 구개범거근 근전도 소견을 중심으로-)

  • Park, Hea-Suk;Hirose, Hajime;Kumada, Masanobu;Choi, Hong-Shik;Imagawa, Hiroshi;Umeda, Hiroyuki
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.15 no.2
    • /
    • pp.118-121
    • /
    • 2004
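  • Background and Objectives: It is clinically well known that Korean patients with cleft palate have great difficulty articulating fortis (tense) consonants. This study therefore aimed to contribute to the search for new speech therapy methods by identifying the basic elements of the production mechanism of fortis consonants, which are highly difficult for Korean cleft palate patients. Methods: To examine the production characteristics of fortis consonants, we compared differences in the muscle activity patterns of the levator palatini during the production of three types of word-initial stop consonants following a nasal consonant. Electromyographic signals were recorded with hooked wire electrodes inserted transmucosally from within the oral cavity. Results: Higher levator palatini activity was observed for aspirated and fortis stops than for lenis stops, but no significant difference was found between aspirated and fortis stops. Conclusion: Future work should reconfirm these findings with more subjects and further examine the factors that distinguish fortis from aspirated consonants.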

A New Temporal Filtering Method for Improved Automatic Lipreading (향상된 자동 독순을 위한 새로운 시간영역 필터링 기법)

  • Lee, Jong-Seok;Park, Cheol-Hoon
    • The KIPS Transactions:PartB
    • /
    • v.15B no.2
    • /
    • pp.123-130
    • /
    • 2008
  • Automatic lipreading recognizes speech by observing the movement of a speaker's lips. It has recently received attention as a way to compensate for the performance degradation of acoustic speech recognition in acoustically noisy environments. One of the important issues in automatic lipreading is how to define and extract salient features from the recorded images. In this paper, we propose a feature extraction method that uses a new filtering technique to obtain improved recognition performance. The proposed method applies a band-pass filter to the temporal trajectory of each pixel in the images containing the lip region, eliminating frequency components that are too slow or too fast compared to the relevant speech information, and then extracts features by principal component analysis. Speaker-independent recognition experiments show that the proposed method improves performance in both clean and visually noisy conditions.
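
The two-stage pipeline described in the abstract above (temporal band-pass filtering of each pixel trajectory, then principal component analysis) can be sketched with NumPy. The abstract does not specify the filter design or cut-off frequencies, so this sketch substitutes a simple FFT-mask band-pass and an SVD-based PCA. The function names, the `(T, H, W)` array layout, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def bandpass_temporal(frames, fps, low_hz, high_hz):
    """Band-pass each pixel's trajectory over time.

    frames: array of shape (T, H, W), one grayscale lip image per time step.
    Components outside [low_hz, high_hz] are zeroed in the frequency domain
    (an FFT mask, assumed here in place of the paper's unspecified filter).
    """
    spectrum = np.fft.rfft(frames, axis=0)
    freqs = np.fft.rfftfreq(frames.shape[0], d=1.0 / fps)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[~keep] = 0  # removes both the slow (incl. DC) and fast components
    return np.fft.irfft(spectrum, n=frames.shape[0], axis=0)


def pca_features(frames, n_components):
    """Project flattened frames onto their top principal components."""
    X = frames.reshape(frames.shape[0], -1)
    X = X - X.mean(axis=0)  # center each pixel over time
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_components].T  # shape (T, n_components)
```

Removing the lowest temporal frequencies also discards each pixel's constant brightness, which is one plausible reason such filtering would help under lighting changes and other slowly varying visual noise.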

Performance Comparison of Korean Dialect Classification Models Based on Acoustic Features

  • Kim, Young Kook;Kim, Myung Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.10
    • /
    • pp.37-43
    • /
    • 2021
  • Using the acoustic features of speech, important social and linguistic information about the speaker can be obtained, and one of the key features is the dialect. A speaker's use of a dialect is a major barrier to interaction with a computer. Dialects can be distinguished at various levels such as phonemes, syllables, words, phrases, and sentences, but it is difficult to distinguish dialects by identifying them one by one. Therefore, in this paper, we propose a lightweight Korean dialect classification model using only MFCC among the features of speech data. We study the optimal method to utilize MFCC features through Korean conversational voice data, and compare the classification performance of five Korean dialects in Gyeonggi/Seoul, Gangwon, Chungcheong, Jeolla, and Gyeongsang in eight machine learning and deep learning classification models. The performance of most classification models was improved by normalizing the MFCC, and the accuracy was improved by 1.07% and F1-score by 2.04% compared to the best performance of the classification model before normalizing the MFCC.
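
The abstract above reports gains from normalizing the MFCC but does not say which normalization was used. One common choice, assumed here purely for illustration, is per-utterance mean and variance normalization of each coefficient. The function name and the `(frames, coefficients)` layout are hypothetical.

```python
import numpy as np


def normalize_mfcc(mfcc, eps=1e-8):
    """Per-utterance mean/variance normalization of MFCC features.

    mfcc: array of shape (n_frames, n_coefficients).
    Each coefficient is shifted to zero mean and scaled to unit variance
    across the frames of the utterance; eps guards against division by zero.
    """
    mean = mfcc.mean(axis=0, keepdims=True)
    std = mfcc.std(axis=0, keepdims=True)
    return (mfcc - mean) / (std + eps)
```

Normalizing per utterance removes constant offsets such as channel or microphone coloration, which tends to make features from different recordings more comparable for a classifier.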