• Title/Summary/Keyword: Speech-to-text services

A Method of Automated Quality Evaluation for Voice-Based Consultation (음성 기반 상담의 품질 평가를 위한 자동화 기법)

  • Lee, Keonsoo;Kim, Jung-Yeon
    • Journal of Internet Computing and Services / v.22 no.2 / pp.69-75 / 2021
  • In a contact-free society, online services are becoming more important than traditional offline services. At the same time, the role of the contact center, which carries out customer relationship management (CRM), is increasingly essential. To support CRM tasks and improve their effectiveness, process automation techniques need to be applied. Quality assurance (QA) is a time- and resource-consuming process and a typical candidate for automation. In this paper, a method of automatic quality evaluation for voice-based consultations is proposed. First, the speech in a consultation is transcribed into text by speech recognition. Then a quantitative evaluation is carried out based on QA metrics, including whether the required elements appear in the opening and closing remarks, whether mandatory information was asked for, and the attitude of listening and speaking. The automated evaluations matched those of human experts in 92.7% of cases. The mismatches were mainly caused by errors in the speech-to-text (STT) output. Provided the STT result is reliable, the proposed method can be used to improve the efficiency of the QA process in contact centers.
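The rule-based part of such an evaluation can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the phrase lists, the `QAResult` type, and the `evaluate_transcript` function are all hypothetical, and the attitude metrics mentioned in the abstract are not modeled.

```python
from dataclasses import dataclass

# Hypothetical QA rules; the phrases are illustrative and not taken from the paper.
OPENING_PHRASES = ("hello", "thank you for calling")
CLOSING_PHRASES = ("have a nice day", "thank you")
MANDATORY_QUESTIONS = ("your name", "phone number")


@dataclass
class QAResult:
    opening_ok: bool
    closing_ok: bool
    missing_questions: list


def evaluate_transcript(agent_turns):
    """Check a list of agent utterances (STT output) against simple QA rules."""
    turns = [turn.lower() for turn in agent_turns]
    if not turns:
        return QAResult(False, False, list(MANDATORY_QUESTIONS))
    # The opening is checked in the first turn, the closing in the last turn.
    opening_ok = any(phrase in turns[0] for phrase in OPENING_PHRASES)
    closing_ok = any(phrase in turns[-1] for phrase in CLOSING_PHRASES)
    # Mandatory questions may appear anywhere in the conversation.
    full_text = " ".join(turns)
    missing = [q for q in MANDATORY_QUESTIONS if q not in full_text]
    return QAResult(opening_ok, closing_ok, missing)
```

Because the checks are plain substring matches on the transcript, any STT error inside a required phrase produces a false negative, which is consistent with the abstract's observation that mismatches stem mainly from STT errors.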

A Study on Quantitative Evaluation Method for STT Engine Accuracy based on Korean Characteristics (한국어 특성 기반의 STT 엔진 정확도를 위한 정량적 평가방법 연구)

  • Min, So-Yeon;Lee, Kwang-Hyong;Lee, Dong-Seon;Ryu, Dong-Yeop
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.7 / pp.699-707 / 2020
  • With the development of deep learning, voice processing technology is being applied in various areas, such as STT (speech-to-text), TTS (text-to-speech), chatbots, and intelligent personal assistants. In particular, STT is a voice-based service that converts human speech to text, so it can be applied to various IT services. Recently, many organizations, including private enterprises and public institutions, have been attempting to introduce the technology. However, in contrast to general IT solutions, which can be evaluated quantitatively, the standards and methods for evaluating the accuracy of an STT engine are ambiguous and do not consider the characteristics of the Korean language, so it is difficult to apply a quantitative evaluation standard. This study aims to provide a guide for evaluating STT engine conversion performance based on the characteristics of the Korean language, so that engine manufacturers can perform STT conversion that reflects those characteristics and the market can evaluate engines more accurately. In the experiment, the evaluation was 35% more accurate than with existing methods.
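For context, a widely used baseline for STT accuracy is the character error rate (CER), computed from the edit distance between a reference transcript and the engine output. The sketch below shows that generic baseline only; it is not the Korean-specific method the abstract proposes, and ignoring whitespace is an assumption made here for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / reference length, with spaces ignored (assumption)."""
    ref = reference.replace(" ", "")
    hyp = hypothesis.replace(" ", "")
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

A metric like this treats every character mismatch equally, which is one reason a language-aware evaluation of the kind the abstract describes may rank engines differently.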

A Design and Implementation of The Deep Learning-Based Senior Care Service Application Using AI Speaker

  • Mun Seop Yun;Sang Hyuk Yoon;Ki Won Lee;Se Hoon Kim;Min Woo Lee;Ho-Young Kwak;Won Joo Lee
    • Journal of the Korea Society of Computer and Information / v.29 no.4 / pp.23-30 / 2024
  • In this paper, we propose a deep learning-based personalized senior care service application. The proposed application uses speech-to-text technology to convert the user's speech into text and passes it as input to Autogen, an interactive multi-agent large language model developed by Microsoft, for user convenience. Autogen uses data from previous conversations between the senior and the chatbot to understand the other user's intent and generate a response, and then uses a back-end agent to create a wish list, a shared calendar, and a greeting message in the other user's voice through a deep learning model for voice cloning. Additionally, the application can perform home IoT services with SKT's AI speaker (NUGU). The proposed application is expected to contribute to future AI-based senior care technology.

Developing a New Algorithm for Conversational Agent to Detect Recognition Error and Neologism Meaning: Utilizing Korean Syllable-based Word Similarity (대화형 에이전트 인식오류 및 신조어 탐지를 위한 알고리즘 개발: 한글 음절 분리 기반의 단어 유사도 활용)

  • Jung-Won Lee;Il Im
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.267-286 / 2023
  • Conversational agents such as AI speakers use voice conversation for human-computer interaction, and voice recognition errors often occur in conversational situations. Recognition errors in user utterance records can be categorized into two types. The first type is misrecognition errors, where the agent fails to recognize the user's speech entirely. The second type is misinterpretation errors, where the user's speech is recognized and a service is provided, but the interpretation differs from the user's intention. Of these, misinterpretation errors require separate error detection because they are recorded as successful service interactions. In this study, various text separation methods were applied to detect misinterpretation. For each text separation method, the similarity of consecutive speech pairs was calculated using word embedding and document embedding techniques, which convert words and documents into vectors. This approach goes beyond simple word-based similarity calculation to explore a new method for detecting misinterpretation errors. The research method involved using real user utterance records to train and develop a detection model by applying patterns of misinterpretation error causes. The results showed that initial consonant extraction gave the most significant result for detecting misinterpretation errors caused by the use of unregistered neologisms. Comparison with other separation methods revealed different error types. This study has two main implications. First, for misinterpretation errors that are difficult to detect due to lack of recognition, the study proposed diverse text separation methods and found a novel method that improved performance remarkably. Second, if this method is applied to conversational agents or voice recognition services that require neologism detection, patterns of errors arising from the voice recognition stage can be specified. The study proposed and verified that, even when such cases are not categorized as errors, services can be provided according to the results the user wants.
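Initial consonant extraction, the separation method the abstract singles out, can be sketched with the standard Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3, ordered by 19 initial consonants, 21 vowels, and 28 final positions). This is a minimal sketch of the extraction step only; the function name is hypothetical, and the embedding-based similarity model from the paper is not reproduced.

```python
# The 19 initial consonants (choseong) in Unicode syllable order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
HANGUL_BASE = 0xAC00   # first precomposed syllable
HANGUL_LAST = 0xD7A3   # last precomposed syllable
SYLLABLES_PER_INITIAL = 21 * 28  # 21 vowels x 28 final positions


def initial_consonants(text):
    """Replace each Hangul syllable with its initial consonant; keep other characters."""
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            out.append(CHOSEONG[(code - HANGUL_BASE) // SYLLABLES_PER_INITIAL])
        else:
            out.append(ch)
    return "".join(out)
```

For example, `initial_consonants("한글")` returns `"ㅎㄱ"`. Two utterances can then be compared on these reduced strings, which tolerates vowel or final-consonant differences between a misheard word and the intended one.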

Ubiquitous Car Maintenance Services Using Augmented Reality and Context Awareness (증강현실을 활용한 상황인지기반의 편재형 자동차 정비 서비스)

  • Rhee, Gue-Won;Seo, Dong-Woo;Lee, Jae-Yeol
    • Korean Journal of Computational Design and Engineering / v.12 no.3 / pp.171-181 / 2007
  • Ubiquitous computing is a vision of our future computing lifestyle in which computer systems seamlessly integrate into our everyday lives, providing services and information in an anywhere, anytime fashion. Augmented reality (AR) can naturally complement ubiquitous computing by providing an intuitive and collaborative visualization and simulation interface to a three-dimensional information space embedded within physical reality. This paper presents a service framework and its applications for providing context-aware u-car maintenance services using augmented reality, which can support a rich set of ubiquitous services and collaboration. It realizes bi-augmentation between physical and virtual spaces using augmented reality. It also offers a context processing module to acquire, interpret, and disseminate context information. In particular, the context processing module considers the user's preferences and security profile to provide private and customer-oriented services. A prototype system has been implemented to support 3D animation, TTS (text-to-speech), augmented manual, annotation, and pre- and post-augmentation services in ubiquitous car service environments.

Design and Implementation of a Navigation System for Visually Impaired Persons (시각장애인을 위한 네비게이션 시스템 설계 및 구현)

  • Jang, Su-Min;Hwang, Dong-Gyo;Kang, Soo;Kim, Eun-Ju;Park, Jun-Ho;Jang, Ki-Hun;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.12 no.1 / pp.38-47 / 2012
  • To extend the activity range of visually impaired persons, we design and implement a navigation system that supports road information services and points of interest. The proposed navigation system consists of route creation modules and storage modules for visually impaired persons. In particular, the main interface of the navigation system is implemented using a TTS (text-to-speech) program for audio output and a braille module for tactile output. We also use the Google Maps APIs, which can provide the latest map information for the navigation system.

A Method of Recognizing and Validating Road Name Address from Speech-oriented Text (음성 기반 도로명 주소 인식 및 주소 검증 기법)

  • Lee, Keonsoo;Kim, Jung-Yeon;Kang, Byeong-Gwon
    • Journal of Internet Computing and Services / v.22 no.1 / pp.31-39 / 2021
  • Obtaining delivery addresses from calls is one of the most important processes in the TV home shopping business. Automating this process can increase the operational efficiency of TV home shopping. In this paper, a method of recognizing and validating road name addresses, the address system of South Korea, from speech-oriented text is proposed. Speech-oriented text poses three challenges. The first is that numbers are written out as they are pronounced. The second is that the recorded address contains noise caused by repeated pronunciation of the same address or by address elements spoken out of order. The third is the readability of the resulting address. To resolve these problems, the proposed method enhances the existing address databases provided by Korea Post and the Ministry of the Interior and Safety. Various ways of pronouncing an address are added, and heuristic rules for dividing ambiguous pronunciations are employed. The processed address is then validated by checking that it exists in the official address database. Although the proposed method targets the STT result of a spoken address, it can also be used by any third-party service that needs to validate road name addresses. The proposed method is robust to noise such as reordered or omitted address elements.
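The first challenge, numbers written out as pronounced, can be illustrated with a small converter from Sino-Korean numerals to integers. This is a simplified sketch under stated assumptions, not the paper's heuristic rules: the function name is hypothetical, only values below 10,000 are handled, and digit-by-digit readings or ambiguous syllables (for example 이 as a particle) are not resolved.

```python
# Sino-Korean digits and place-value units (values below 10,000 only).
DIGITS = {"일": 1, "이": 2, "삼": 3, "사": 4, "오": 5,
          "육": 6, "칠": 7, "팔": 8, "구": 9}
UNITS = {"십": 10, "백": 100, "천": 1000}


def sino_korean_to_int(word):
    """Convert a spelled-out Sino-Korean number such as '삼십이' to an int."""
    total = 0
    pending = 0  # digit waiting for its unit, or the final ones digit
    for ch in word:
        if ch in DIGITS:
            pending = DIGITS[ch]
        elif ch in UNITS:
            # A unit with no preceding digit means one of that unit (십 = 10).
            total += (pending or 1) * UNITS[ch]
            pending = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + pending
```

For example, `sino_korean_to_int("삼십이")` returns `32`. In a real pipeline the converted number would still need to be checked against an address database, as the abstract describes, because a syntactically valid number can still be a misrecognition.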

Intelligent Records and Archives Management That Applies Artificial Intelligence (인공지능을 활용한 지능형 기록관리 방안)

  • Kim, Intaek;An, Dae-Jin;Rieh, Hae-young
    • Journal of Korean Society of Archives and Records Management / v.17 no.4 / pp.225-250 / 2017
  • The Fourth Industrial Revolution has become a focus of attention, and artificial intelligence (AI) is the key technology that will lead it. AI is also used to facilitate efficient workflows in records and archives management, particularly abroad. In this study, we introduce the concept of AI and examine the background of its rise. We then review various applications of AI with prominent examples and examine how AI is used in areas such as text analysis and image and speech recognition. In each of these areas, we review the application of AI from the viewpoint of records and archives management and suggest further uses of the methods, including a module and interface for intelligent records and archives information services.

Multi-Emotion Regression Model for Recognizing Inherent Emotions in Speech Data (음성 데이터의 내재된 감정인식을 위한 다중 감정 회귀 모델)

  • Moung Ho Yi;Myung Jin Lim;Ju Hyun Shin
    • Smart Media Journal / v.12 no.9 / pp.81-88 / 2023
  • Recently, online communication has been increasing as non-face-to-face services have spread due to COVID-19. In non-face-to-face situations, the other person's opinions and emotions are recognized through modalities such as text, speech, and images. Research on multimodal emotion recognition that combines various modalities is actively underway. Within this research, emotion recognition using speech data is attracting attention as a means of understanding emotions through sound and language information, but in most cases emotions are recognized using a single speech feature value. However, because a variety of emotions coexist in a complex manner in a conversation, a method for recognizing multiple emotions is needed. Therefore, in this paper, we propose a multi-emotion regression model that extracts feature vectors after preprocessing speech data in order to recognize complex, inherent emotions, and that takes the passage of time into account.

A Study on the Syntagma & Paradigm by Repetition, Variation and Contrast in Ads

  • Choi, Seong-hoon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.9 / pp.1-12 / 2017
  • This study explores the potential meanings of print advertisements. Linguistic features such as repetition, variation, contrast, and phonological structure in the verbal texts of ads can give rise to shades of meaning or slight variations in advertising. The language of advertising is not only a language of words. It is also a language of images, colors, and pictures. Pictures and words combine to form the advertisement's visual text. While the words are very important in delivering the sales message, the visual text cannot be ignored in advertisements. Part of the visual text is the paralanguage of the ad. Paralanguage is the meaningful behaviour accompanying language, such as voice quality, gestures, facial expressions, and touch in speech, and the choice of typeface and letter sizes in writing. Foregrounding is the throwing into relief of the linguistic sign against the background of the norms of ordinary language. This paper focuses its discussion of the advertisements on the framework of paradigmatic and syntagmatic relationships. The sources of the ads are confined to Marlboro. The ads were selected using purposive sampling.