• Title/Summary/Keyword: 텍스트/음성변환

Search Result 75, Processing Time 0.023 seconds

Design of CNN-based Braille Conversion and Voice Output Device for the Blind (시각장애인을 위한 CNN 기반의 점자 변환 및 음성 출력 장치 설계)

  • Seung-Bin Park;Bong-Hyun Kim
    • Journal of Internet of Things and Convergence
    • /
    • v.9 no.3
    • /
    • pp.87-92
    • /
    • 2023
  • As times develop, information becomes more diverse and methods of obtaining it become more diverse. About 80% of the amount of information gained in life is acquired through the visual sense. However, visually impaired people have limited ability to interpret visual materials. That's why Braille, a text for the blind, appeared. However, the Braille decoding rate of the blind is only 5%, and as the demand of the blind who want various forms of platforms or materials increases over time, development and product production for the blind are taking place. An example of product production is braille books, which seem to have more disadvantages than advantages, and unlike non-disabled people, it is true that access to information is still very difficult. In this paper, we designed a CNN-based Braille conversion and voice output device to make it easier for visually impaired people to obtain information than conventional methods. The device aims to improve the quality of life by allowing books, text images, or handwritten images that are not made in Braille to be converted into Braille through camera recognition, and designing a function that can be converted into voice according to the needs of the blind.

User certification module development of Gallery-Auction for NFC-based 2 Factor mobile electronic payment (NFC 기반 2 Factor 모바일 전자결제를 위한 갤러리-옥션의 사용자인증 모듈 개발)

  • Jo, Won Oh;Cha, Yoon Seok;Oh, Soo Hee;Choi, Myeong Soo;Kim, Hyung Jong
    • Smart Media Journal
    • /
    • v.6 no.3
    • /
    • pp.29-40
    • /
    • 2017
  • Lately weight for smartphone mounted to function for NFC is increasing, rapidly. Because of this, NFC related technology is made by many companies. We developed Gallery-Auction for security enhancements and new services of NFC-based 2 factor electronic payment system. Enhanced security features development of user authentication module through fingerprint recognition to apply FIDO authentication technology and developed electronic contract voice service of Gallery-Auction using TTS(Text to Speech). Therefore we enhanced convenient and simple authentication method and security through NFC mobile electronic payment.

A Study on the Impact of Speech Data Quality on Speech Recognition Models

  • Yeong-Jin Kim;Hyun-Jong Cha;Ah Reum Kang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.1
    • /
    • pp.41-49
    • /
    • 2024
  • Speech recognition technology is continuously advancing and widely used in various fields. In this study, we aimed to investigate the impact of speech data quality on speech recognition models by dividing the dataset into the entire dataset and the top 70% based on Signal-to-Noise Ratio (SNR). Utilizing Seamless M4T and Google Cloud Speech-to-Text, we examined the text transformation results for each model and evaluated them using the Levenshtein Distance. Experimental results revealed that Seamless M4T scored 13.6 in models using data with high SNR, which is lower than the score of 16.6 for the entire dataset. However, Google Cloud Speech-to-Text scored 8.3 on the entire dataset, indicating lower performance than data with high SNR. This suggests that using data with high SNR during the training of a new speech recognition model can have an impact, and Levenshtein Distance can serve as a metric for evaluating speech recognition models.

Voice Recognition Chatbot System for an Aging Society: Technology Development and Customized UI/UX Design (고령화 사회를 위한 음성 인식 챗봇 시스템 : 기술 개발과 맞춤형 UI/UX 설계)

  • Yun-Ji Jeong;Min-Seong Yu;Joo-Young Oh;Hyeon-Seok Hwang;Won-Whoi Hun
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.24 no.4
    • /
    • pp.9-14
    • /
    • 2024
  • This study developed a voice recognition chatbot system to address depression and loneliness among the elderly in an aging society. The system utilizes the Whisper model, GPT 2.5, and XTTS2 to provide high-performance voice recognition, natural language processing, and text-to-speech conversion. Users can express their emotions and states and receive appropriate responses, with voice recognition functionality using familiar voices for comfort and reassurance. The UX/UI design considers the cognitive responses, visual impairments, and physical limitations of the smart senior generation, using high contrast colors and readable fonts for enhanced usability. This research is expected to improve the quality of life for the elderly through voice-based interfaces.

Design of a Mirror for Fragrance Recommendation based on Personal Emotion Analysis (개인의 감성 분석 기반 향 추천 미러 설계)

  • Hyeonji Kim;Yoosoo Oh
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.28 no.4
    • /
    • pp.11-19
    • /
    • 2023
  • The paper proposes a smart mirror system that recommends fragrances based on user emotion analysis. This paper combines natural language processing techniques such as embedding techniques (CounterVectorizer and TF-IDF) and machine learning classification models (DecisionTree, SVM, RandomForest, SGD Classifier) to build a model and compares the results. After the comparison, the paper constructs a personal emotion-based fragrance recommendation mirror model based on the SVM and word embedding pipeline-based emotion classifier model with the highest performance. The proposed system implements a personalized fragrance recommendation mirror based on emotion analysis, providing web services using the Flask web framework. This paper uses the Google Speech Cloud API to recognize users' voices and use speech-to-text (STT) to convert voice-transcribed text data. The proposed system provides users with information about weather, humidity, location, quotes, time, and schedule management.

Automatic Electronic Medical Record Generation System using Speech Recognition and Natural Language Processing Deep Learning (음성인식과 자연어 처리 딥러닝을 통한 전자의무기록자동 생성 시스템)

  • Hyeon-kon Son;Gi-hwan Ryu
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.3
    • /
    • pp.731-736
    • /
    • 2023
  • Recently, the medical field has been applying mandatory Electronic Medical Records (EMRs) and Electronic Health Records (EHRs) systems that computerize and manage medical records, and distributing them throughout the entire medical industry to utilize patients' past medical records for additional medical procedures. However, the conversations between medical professionals and patients that occur during general medical consultations and counseling sessions are not separately recorded or stored, so additional important patient information cannot be efficiently utilized. Therefore, we propose an electronic medical record system that uses speech recognition and natural language processing deep learning to store conversations between medical professionals and patients in text form, automatically extracts and summarizes important medical consultation information, and generates electronic medical records. The system acquires text information through the recognition process of medical professionals and patients' medical consultation content. The acquired text is then divided into multiple sentences, and the importance of multiple keywords included in the generated sentences is calculated. Based on the calculated importance, the system ranks multiple sentences and summarizes them to create the final electronic medical record data. The proposed system's performance is verified to be excellent through quantitative analysis.

Automatic sentence segmentation of subtitles generated by STT (STT로 생성된 자막의 자동 문장 분할)

  • Kim, Ki-Hyun;Kim, Hong-Ki;Oh, Byoung-Doo;Kim, Yu-Seop
    • Annual Conference on Human and Language Technology
    • /
    • 2018.10a
    • /
    • pp.559-560
    • /
    • 2018
  • 순환 신경망(RNN) 기반의 Long Short-Term Memory(LSTM)는 자연어처리 분야에서 우수한 성능을 보이는 모델이다. 음성을 문자로 변환해주는 Speech to Text (STT)를 이용해 자막을 생성하고, 생성된 자막을 다른 언어로 동시에 번역을 해주는 서비스가 활발히 진행되고 있다. STT를 사용하여 자막을 추출하는 경우에는 마침표가 없이 전부 연결된 문장이 생성되기 때문에 정확한 번역이 불가능하다. 본 논문에서는 영어자막의 자동 번역 시, 정확도를 높이기 위해 텍스트를 문장으로 분할하여 마침표를 생성해주는 방법을 제안한다. 이 때, LSTM을 이용하여 데이터를 학습시킨 후 테스트한 결과 62.3%의 정확도로 마침표의 위치를 예측했다.

  • PDF

Development of an English Study Application using Deep Learning-based lmage Recognition techniques (딥러닝 기반 이미지 인식 기술을 활용한 영어 학습 애플리케이션 개발)

  • Kim, Yoo Jung;Kim, Ju Yeon;Lee, Yu bin;Lee, Ki Yong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2017.11a
    • /
    • pp.151-154
    • /
    • 2017
  • 본 논문에서는 사용자의 주변사물을 인식하여 영단어로 알려줌으로써 사용자가 실생활에서 영단어를 능동적으로 학습할 수 있도록 돕는 애플리케이션을 개발한다. 본 애플리케이션은 사용자가 카메라로 촬영하거나 사진첩에서 선택한 이미지를 인식하여 사진 속 물체의 영어 단어와 한국어 뜻을 알려주며, 단어의 발음 또한 확인할 수 있고, 직접 단어장에 저장하여 다시 학습할 수 있도록 한다. 이를 위해 TensorFolw를 활용한 딥러닝 기반 이미지 인식 기술을 사용하였으며, 추후 TensorFolw를 통하여 모델을 추가적으로 훈련시킴으로써 이미지 인식의 정확도를 높일 수 있다. 그 외 영어-한국어 번역, 텍스트-음성 변환 등 부가 기능을 통해 사용자가 다양한 방식으로 영단어를 학습할 수 있도록 한다.

A SOUND-QR Authentication Method for an Electronic Attendance System (전자출결시스템을 위한 SOUND-QR 인증 방안)

  • Kang, Jung-Hun;Sim, Hyeon-Seok;Seo, Seung-Hyun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.05a
    • /
    • pp.326-327
    • /
    • 2022
  • 현행 전자출결 시스템은 강의실이 아닌 곳에서 지정된 숫자만 제공되면, 정당하지 않은 방법으로도 출석을 진행할 수 있다. 이런 맹점을 개선하기 위해 본 논문에서는 인증용 데이터를 텍스트가 아닌 음성 데이터를 이용한다. 본 논문에서 제안하는 SOUND-QR 인증 시스템은 음원을 전달할 때 특정 데이터가 포함된 음원을 잡음과 왜곡 환경에 강한 오디오로 변환하고 이러한 음원을 각 기기에서 입출력시 발생하는 특성 차이를 인증에 활용함으로써 현장에 있는 기기에서만 출결이 인정되도록 한다.

Design of Real-Time Voice Phishing Detection Techniques using KoBERT (KoBERT를 활용한 실시간 보이스피싱 탐지기법 개념설계)

  • Yeong Jin Kim;Byoung-Yup Lee;Ah Reum Kang
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2024.01a
    • /
    • pp.95-96
    • /
    • 2024
  • 본 논문은 금융 범죄 중 하나인 보이스피싱을 실시간으로 예방하기 위한 탐지 기법을 제안한다. 제안된 모델은 수화기에 출력되는 음성을 녹음하고 네이버 CSR(Cloud Speech Recognition)을 통해 텍스트 파일로 변환한 후 딥러닝 기반의 KoBERT를 바탕으로 다양한 보이스피싱 패턴을 학습하여 실시간 환경에서의 신속하고 정확한 탐지를 위해 실제 통화 데이터를 적절하게 처리하여, 이를 통해 효과적인 보이스피싱 예방에 도움을 줄 것으로 예상된다.

  • PDF