Search | Korea Science

Creation of Speech Corpora for STiLL at SiTEC (SiTEC의 STiLL관련 음성 코퍼스의 구축 현황)

Kim, Young-Il;Kim, Bong-Wan;Choi, Dae-Lim;Lee, Kwang-Hyun;Jeong, Eun-Soon;Lee, Yong-Ju
Proceedings of the KSPS conference / 2005.11a / pp.13-16 / 2005
As language learning that uses speech and information processing technology grows in popularity, the Speech Information Technology & Promotion Center (SiTEC) has created and is distributing speech corpora for STiLL to support basic research and product development. We introduce the Korean corpus and the English corpora that we have created and are distributing.

A Validity Study on Measurement of Mental Fatigue Using Speech Technology (음성기술을 이용한 정신피로 측정에 관한 타당성 연구)

Song, Seungkyu;Kim, Jongyeol;Jang, Junsu;Kwon, Chulhong
Phonetics and Speech Sciences / v.5 no.1 / pp.3-10 / 2013
This study proposes a method to measure mental fatigue using speech technology, which has not been used in previous research and is easier to apply than existing, more complex methods. It aims to establish a relationship between the human voice and mental fatigue through experiments that measure the influence of mental fatigue on the voice. Two monotonous tasks of simple calculation, such as finding the sum of three one-digit numbers, were used to induce a feeling of monotony, and two sets of subjective questionnaires were used to measure mental fatigue. While thirty subjects performed the experiment, questionnaire responses and speech data were collected. Speech features related to the speech source and the vocal tract filter were extracted from the speech data. According to the results, the speech parameters most closely related to mental fatigue are the mean and standard deviation of the fundamental frequency, jitter, and shimmer. This study shows that speech technology is a useful method for measuring mental fatigue.
https://doi.org/10.13064/KSSS.2013.5.1.003
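
The four parameters named in the abstract above can be illustrated with a minimal sketch. It assumes the common "local" definitions of jitter and shimmer (mean absolute difference between consecutive cycles, divided by the mean); the abstract does not say which variant the authors used, and the function names and input format here are hypothetical.

```python
from statistics import mean, stdev


def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to the mean.

    Applied to pitch periods this gives local jitter; applied to
    per-cycle peak amplitudes it gives local shimmer (assumed definitions).
    """
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def f0_stats(periods):
    """Mean and sample standard deviation of F0, given pitch periods in seconds."""
    f0 = [1.0 / t for t in periods]
    return mean(f0), stdev(f0)
```

Here `local_perturbation(periods)` would stand in for jitter and `local_perturbation(amplitudes)` for shimmer, with both lists produced by some pitch-marking step that is outside the scope of this sketch.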

Applying Mobile Agent for Internet-based Distributed Speech Recognition

Saaim, Emrul Hamide Md;Alias, Mohamad Ashari;Ahmad, Abdul Manan;Ahmad, Jamal Nasir
Institute of Control, Robotics and Systems: Conference Proceedings / 2005.06a / pp.134-138 / 2005
Several applications have been developed for internet-based speech recognition. Internet-based speech recognition is a distributed application, and various techniques and methods have been used for that purpose. Currently, the client-server paradigm is one of the popular techniques used for communication in web applications. However, there is a new paradigm with the same purpose: mobile agent technology. Mobile agent technology has several advantages in distributed internet-based systems. This paper presents the application of mobile agent technology to internet-based speech recognition built on a client-server processing architecture.

Common Speech Database Collection and Validation for Communications (한국어 공통 음성 DB구축 및 오류 검증)

Lee Soo-jong;Kim Sanghun;Lee Youngjik
MALSORI / no.46 / pp.145-157 / 2003
In this paper, we briefly introduce the Korean common speech database, a project begun in 2002 to construct a large-scale speech database. The project aims to support the R&D environment for speech technology in industry, encouraging the domestic speech industry and stimulating the domestic speech technology market. In its first year, the resulting common speech database consisted of 25 kinds of databases covering various recording conditions such as telephone, PC, and VoIP. The speech database will be widely used for speech recognition, speech synthesis, and speaker identification. Although the database was originally corrected manually, it still contains unknown errors and human errors. To minimize these, we tried to find errors based on recognition errors and classified them into several kinds. To be more effective than the typical recognition technique, we will develop an automatic error detection method. In the future, we will try to construct new databases reflecting the needs of companies and universities.

A Low-Cost Speech to Sign Language Converter

Le, Minh;Le, Thanh Minh;Bui, Vu Duc;Truong, Son Ngoc
International Journal of Computer Science & Network Security / v.21 no.3 / pp.37-40 / 2021
This paper presents the design of a speech-to-sign-language converter for deaf and hard-of-hearing people. The device is low-cost, consumes little power, and can work entirely offline. Speech recognition is implemented using the open-source Pocketsphinx library. We propose a context-oriented language model, which measures the similarity between the recognized speech and predefined speech to decide the output. The output speech is the best match to the recognized speech among the recommended speech stored in the database. The proposed context-oriented language model improves the speech recognition rate by 21% when working entirely offline. A decision module that determines the similarity between two texts using Levenshtein distance decides the output sign language. The output sign language corresponding to the recognized speech is generated as a sequence of images. The converter is deployed on a Raspberry Pi Zero board for low-cost deaf assistive devices.
https://doi.org/10.22937/IJCSNS.2021.21.3.5
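
The decision step described in the abstract above, choosing the stored phrase closest to the recognized text by Levenshtein distance, might look roughly like the following. This is only a sketch under assumptions: the function names, the lowercase normalization, and the use of plain character-level distance are illustrative and not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b; insert, delete, substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(recognized: str, phrases: list[str]) -> str:
    """Return the predefined phrase closest to the recognized text (hypothetical helper)."""
    return min(phrases, key=lambda p: levenshtein(recognized.lower(), p.lower()))
```

For example, `best_match("helo", ["hello", "goodbye"])` selects `"hello"`, whose sign-language image sequence would then be displayed.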

Implementation of Speaker Verification Security System Using DSP Processor(TMS320C32) (DSP Processor(TMS320C32)를 이용한 화자인증 보안시스템의 구현)

Haam, Young-Jun;Kwon, Hyuk-Jae;Choi, Soo-Young;Jeong, Ik-Joo
Journal of Industrial Technology / v.21 no.B / pp.107-116 / 2001
When a person communicates with others, speech carries various kinds of information: linguistic content, speaker identity, emotion, health condition, utterance environment, and so on. The technologies that process speech for use in real life are collectively called speech technology. Among them, speaker recognition makes use of the speaker information contained in speech. DTW (Dynamic Time Warping) is a speaker recognition technique that uses dynamic programming to measure the similarity between a reference speech pattern and an input speech signal. In this study, we implement DTW on a TMS320C32 DSP processor and build a security system with it.
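
As a rough illustration of the DTW comparison mentioned above, the sketch below computes the classic dynamic-programming alignment cost between two one-dimensional feature sequences. It is written in Python for readability, not for the TMS320C32; the absolute-difference local cost, the scalar features, and the `verify` threshold rule are assumptions rather than details from the paper.

```python
def dtw_distance(x, y):
    """Accumulated cost of the best monotonic alignment between sequences x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i items of x with the first j of y
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # local distance (assumed: absolute difference)
            cost[i][j] = d + min(cost[i - 1][j],      # x advances, y repeats
                                 cost[i][j - 1],      # y advances, x repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def verify(template, utterance, threshold):
    """Accept the speaker if the warped distance is within a threshold (hypothetical rule)."""
    return dtw_distance(template, utterance) <= threshold
```

A real system would compare sequences of feature vectors per frame rather than scalars, with a vector distance in place of `abs`.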

A Speech Homomorphic Encryption Scheme with Less Data Expansion in Cloud Computing

Shi, Canghong;Wang, Hongxia;Hu, Yi;Qian, Qing;Zhao, Hong
KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.5 / pp.2588-2609 / 2019
Speech homomorphic encryption has become one of the key components of secure speech storage in public cloud computing. The major problem of speech homomorphic encryption is the huge data expansion of the speech cipher-text. To address this issue, this paper presents a speech homomorphic encryption scheme with less data expansion, which is a probabilistic statistics and addition homomorphic cryptosystem. In the proposed scheme, the original digital speech, together with some selected random numbers, is first grouped to form a series of speech matrices. Then, a proposed matrix encryption method is employed to encrypt each speech matrix. After that, mutual information in sample speech cipher-texts is reduced to limit the data expansion. Performance analysis and experimental results show that the proposed scheme is addition homomorphic, and that it not only resists statistical analysis attacks but also eliminates some signal characteristics of the original speech. In addition, compared with the Paillier homomorphic cryptosystem, the proposed scheme has less data expansion and lower computational complexity. Furthermore, the time consumption of the proposed scheme is almost the same on a smartphone and a PC. Thus, the proposed scheme is well suited to secure speech storage in public cloud computing.
https://doi.org/10.3837/tiis.2019.05.020

Common Speech Database Collection for Telecommunications (통신망환경 한국어 공통음성 DB 구축)

Kim Sanghun;Park Moonwhan;Kim Hyunsuk
Proceedings of the KSPS conference / 2003.05a / pp.23-26 / 2003
This paper presents the collection of a common speech database for telecommunication applications. Over a three-year project, we will construct very large-scale speech and text databases for speech recognition, speech synthesis, and speaker identification. The common speech database takes into account various communication environments and the distribution of speakers' sex, age, and region. It consists of Korean continuous digits, isolated words, and sentences that reflect Korean phonetic coverage. In addition, it covers various speaking styles such as read speech, dialogue speech, and semi-spontaneous speech. Thanks to the common speech databases, duplicated investment of resources across the Korean speech industry can be avoided. The project encourages the domestic speech industry and stimulates the domestic speech technology market.

Speech Database Design and Structuring for High Quality TTS (고품질 음성합성을 위한 합성 DB 구축)

Kang Dong-Gyu;Yi Sionghun;Ryu Won-Ho
Proceedings of the KSPS conference / 2002.11a / pp.33-36 / 2002
As telematics services, which integrate information technologies, approach commercialization, the necessity and importance of speech technology are growing rapidly. Speech technology occupies an important position in telematics services because it announces the start of a service and the retrieved results. Such a service must provide highly accurate speech recognition and natural speech synthesis in a driving environment, and this is especially true for fee-based services. High-quality TTS requires a speech synthesis technique that builds an optimal synthesis database and uses it efficiently. In this paper, we describe the design of the phonetically balanced sentences used for the speech database, the selection of a speaker suited to the service, methods for extracting accurate phoneme boundaries, and the factors taken into consideration in the prosody extraction stage. Finally, we present a real case that has been commercially implemented.

A CART-based diagnostic model using speech technology for evaluating mental fatigue caused by monotonous work (단순작업으로 인한 정신피로도 측정을 위한 음성기술을 이용한 CART 기반 진단모델)

Kwon, Chul Hong
Phonetics and Speech Sciences / v.8 no.4 / pp.97-101 / 2016
This paper presents a CART (Classification and Regression Tree)-based model to diagnose mental fatigue using speech technology. The parameters used in the model are the significant speech parameters that are highly correlated with fatigue and the questionnaire responses obtained before and after inducing fatigue. Experiments show that the proposed model achieves classification accuracies of 96.67% and 98.33% using the speech parameters and the questionnaire responses, respectively. This implies that the proposed model can be used as a tool to diagnose mental fatigue, and that speech technology is useful for diagnosing fatigue.
https://doi.org/10.13064/KSSS.2016.8.4.097

Search Result 1,895, Processing Time 0.032 seconds
