• Title/Summary/Keyword: text-to-speech

Expiration Date Notification System Based on YOLO and OCR algorithms for Visually Impaired Person (YOLO와 OCR 알고리즘에 기반한 시각 장애우를 위한 유통기한 알림 시스템)

  • Kim, Min-Soo;Moon, Mi-Kyung;Han, Chang-Hee
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.6 / pp.1329-1338 / 2021
  • Apart from Braille, there are few effective ways for visually impaired people to find out the expiration date of a product. In this study, we developed an expiration date notification system based on YOLO and OCR for visually impaired people. With our system, a visually impaired user can learn the expiration date of a specific product quickly and accurately, without the help of a caregiver. The proposed system works in four steps: (1) identification of a target product by scanning its barcode; (2) segmentation of the image area containing the expiration date using YOLO; (3) classification of the expiration date by OCR; (4) notification of the expiration date by TTS. The system achieved an average classification accuracy of about 86.00% when blindfolded subjects used it in real time. This result suggests that the proposed system could be used by visually impaired people.
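
The four steps listed in this abstract can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the stage functions (`read_barcode`, `locate_date_region`, `run_ocr`, `speak`) are hypothetical callables supplied by the caller, and the date pattern is an assumed `YYYY.MM.DD`-style format that the abstract does not specify.

```python
import re
from datetime import date
from typing import Any, Callable, Optional

# Assumed date format (not given in the abstract): 2021.12.31, 2021-12-31, 2021/12/31.
DATE_PATTERN = re.compile(r"(\d{4})[./-](\d{1,2})[./-](\d{1,2})")


def parse_expiration_date(ocr_text: str) -> Optional[date]:
    """Return the first calendar-valid date found in raw OCR text, or None."""
    for match in DATE_PATTERN.finditer(ocr_text):
        year, month, day = (int(group) for group in match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            continue  # impossible date such as month 13: keep scanning
    return None


def notify_expiration(
    image: Any,
    read_barcode: Callable[[Any], str],        # step 1: hypothetical barcode reader
    locate_date_region: Callable[[Any], Any],  # step 2: hypothetical detector (YOLO in the paper)
    run_ocr: Callable[[Any], str],             # step 3: hypothetical OCR engine
    speak: Callable[[str], None],              # step 4: hypothetical TTS output
) -> Optional[date]:
    """Run the four stages in order and announce the result."""
    product = read_barcode(image)
    region = locate_date_region(image)
    expiry = parse_expiration_date(run_ocr(region))
    if expiry is None:
        speak(f"Could not read the expiration date of {product}.")
    else:
        speak(f"{product} expires on {expiry.isoformat()}.")
    return expiry
```

Passing the stages in as callables keeps the sketch independent of any particular barcode, detection, OCR, or speech library.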

Emotion-based Real-time Facial Expression Matching Dialogue System for Virtual Human (감정에 기반한 가상인간의 대화 및 표정 실시간 생성 시스템 구현)

  • Kim, Kirak;Yeon, Heeyeon;Eun, Taeyoung;Jung, Moonryul
    • Journal of the Korea Computer Graphics Society / v.28 no.3 / pp.23-29 / 2022
  • Virtual humans are implemented with dedicated modeling tools such as the Unity 3D Engine in virtual spaces (virtual reality, mixed reality, metaverse, etc.). Various human modeling tools have been introduced to give virtual humans an appearance, voice, expressions, and behavior similar to those of real people, and virtual humans implemented with these tools can communicate with users to some extent. However, most virtual humans so far have remained unimodal, using only text or speech. As AI technologies advance, the outdated machine-centered dialogue system is giving way to a human-centered, natural multi-modal system. Using several pre-trained networks, we implemented an emotion-based multi-modal dialogue system that generates human-like utterances and displays appropriate facial expressions in real time.

Expansion of Word Representation for Named Entity Recognition Based on Bidirectional LSTM CRFs (Bidirectional LSTM CRF 기반의 개체명 인식을 위한 단어 표상의 확장)

  • Yu, Hongyeon;Ko, Youngjoong
    • Journal of KIISE / v.44 no.3 / pp.306-313 / 2017
  • Named entity recognition (NER) seeks to locate named entities in text and classify them into pre-defined categories such as names of persons, organizations, locations, and time expressions. Recently, many state-of-the-art NER systems have been implemented with bidirectional LSTM CRFs. Deep learning models based on long short-term memory (LSTM) generally depend on word representations as input. In this paper, we propose an approach that expands the word representation by using pre-trained word embeddings, part-of-speech (POS) tag embeddings, syllable embeddings, and named entity dictionary feature vectors. Our experiments show that the proposed approach creates useful word representations as input to bidirectional LSTM CRFs. Our final representation scores 8.05%p higher than baseline NER systems that use only the pre-trained word embedding vector.
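
One way to picture the expanded word representation is as a concatenation of the four feature vectors named in this abstract. The sketch below is an assumption-laden illustration, not the paper's method: averaging syllable vectors, using zero vectors for unknown keys, and encoding dictionary membership as a multi-hot vector are all choices made here for brevity, and every name and dimension is hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def lookup(table: Dict[str, List[float]], key: str, dim: int) -> List[float]:
    """Return the embedding for key, or a zero vector when the key is unknown."""
    return list(table.get(key, [0.0] * dim))


def mean_vector(vectors: Sequence[List[float]], dim: int) -> List[float]:
    """Average vectors element-wise (assumed way to pool syllable embeddings)."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def expand_word_representation(
    word: str,
    pos_tag: str,
    word_emb: Dict[str, List[float]],
    pos_emb: Dict[str, List[float]],
    syllable_emb: Dict[str, List[float]],
    ne_dictionary: Dict[str, Set[str]],
    ne_types: Sequence[str],
    dims: Tuple[int, int, int],
) -> List[float]:
    """Concatenate word, POS, pooled syllable, and dictionary feature vectors."""
    word_dim, pos_dim, syllable_dim = dims
    word_vec = lookup(word_emb, word, word_dim)
    pos_vec = lookup(pos_emb, pos_tag, pos_dim)
    syllable_vec = mean_vector(
        [lookup(syllable_emb, ch, syllable_dim) for ch in word], syllable_dim
    )
    # Multi-hot vector: 1.0 for each entity type the dictionary lists for this word.
    listed_types = ne_dictionary.get(word, set())
    dict_vec = [1.0 if t in listed_types else 0.0 for t in ne_types]
    return word_vec + pos_vec + syllable_vec + dict_vec
```

The resulting vector would be the per-token input to a sequence model such as a bidirectional LSTM CRF; its length is the sum of the four component sizes.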

Noise Robust Text-Independent Speaker Identification for Ubiquitous Robot Companion (지능형 서비스 로봇을 위한 잡음에 강인한 문맥독립 화자식별 시스템)

  • Kim, Sung-Tak;Ji, Mi-Kyoung;Kim, Hoi-Rin;Kim, Hye-Jin;Yoon, Ho-Sub
    • 한국HCI학회:학술대회논문집 / 2008.02a / pp.190-194 / 2008
  • This paper presents a speaker identification technique, one of the basic techniques of the ubiquitous robot companion. Although conventional mel-frequency cepstral coefficients give high speaker identification performance in clean conditions, performance degrades dramatically in noisy conditions. To overcome this problem, we employ the relative autocorrelation sequence mel-frequency cepstral coefficient, which is one of the noise-robust features. However, this feature has two problems: (1) the limited information problem and (2) the residual noise problem. To deal with these drawbacks, we propose a multi-streaming method for the limited information problem and a hybrid method for the residual noise problem. To evaluate the proposed methods, we use noisy speech to which air conditioner noise, classical music, and vacuum cleaner noise have been artificially added. In experiments, the proposed methods provide better speaker identification performance than the conventional methods.

Prediction of Prosodic Break Using Syntactic Relations and Prosodic Features (구문 관계와 운율 특성을 이용한 한국어 운율구 경계 예측)

  • Jung, Young-Im;Cho, Sun-Ho;Yoon, Ae-Sun;Kwon, Hyuk-Chul
    • Korean Journal of Cognitive Science / v.19 no.1 / pp.89-105 / 2008
  • In this paper, we propose a rule-based system for predicting natural prosodic phrase breaks from Korean text. To implement the rule-based system, (1) sentence constituents are sub-categorized according to their syntactic functions, (2) syntactic phrases are recognized using the dependency relations among the sub-categorized constituents, and (3) rules for predicting prosodic phrase breaks are created. In addition, (4) the length of syntactic phrases and sentences, the position of syntactic phrases in a sentence, and sense information of contextual words are considered in order to determine variable prosodic phrase breaks. Based on these rules and features, we obtained an accuracy of over 90% in predicting the positions of major breaks and no breaks, which correlate highly with the syntactic structure of the sentence. As for the overall accuracy in predicting all prosodic phrase breaks, the proposed system shows a Break_Correct of 87.18% and a Juncture Correct of 89.27%, which are higher than those of other models.

Artificial Intelligence Art : A Case study on the Artwork An Evolving GAIA (대화형 인공지능 아트 작품의 제작 연구 :진화하는 신, 가이아(An Evolving GAIA)사례를 중심으로)

  • Roh, Jinah
    • The Journal of the Korea Contents Association / v.18 no.5 / pp.311-318 / 2018
  • This paper presents the artistic background and implementation structure of a conversational artificial intelligence interactive artwork, "An Evolving GAIA". Recent artworks based on artificial intelligence technology are introduced. The development of biomimetics and artificial life technology has blurred the distinction between machine and human. In this paper, artworks presenting a machine-life metaphor are shown, and the distinctive implementation of the conversation system is described in detail. The artwork recognizes and follows the movement of the audience with its eyes for natural interaction. It listens to questions from the audience and replies with appropriate answers in a text-to-speech voice, using a conversation system implemented with an Android client in the artwork and a web server based on a question-answering dictionary. The interaction invites the audience to discuss the meaning of life on a large scale and draws sympathy for the artwork itself. The paper describes the mechanical structure of the artwork, the implementation of its conversation system, and the reaction of the audience, which can help in directing and making future artificial intelligence interactive artworks.

Development a Meal Support System for the Visually Impaired Using YOLO Algorithm (YOLO알고리즘을 활용한 시각장애인용 식사보조 시스템 개발)

  • Lee, Gun-Ho;Moon, Mi-Kyeong
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.5 / pp.1001-1010 / 2021
  • Sighted people are not deeply aware of how much they depend on sight when eating. However, since visually impaired people do not know what kind of food is on the table, an assistant next to them holds the visually impaired person's spoon and explains the position of the food in terms of clock directions, front and back, left and right, and so on. In this paper, we describe the development of a meal assistance system that recognizes each food image and announces the name of the food by voice when a visually impaired person points a smartphone camera at their table. The system uses a YOLO model trained on images of food and tableware (spoons) to extract the food on which the spoon is placed, recognizes what the food is, and announces it by voice. With this system, visually impaired people are expected to be able to eat without the help of a meal assistant, thereby increasing their self-reliance and satisfaction.

Case Study : Cinematography using Digital Human in Tiny Virtual Production (초소형 버추얼 프로덕션 환경에서 디지털 휴먼을 이용한 촬영 사례)

  • Jaeho Im;Minjung Jang;Sang Wook Chun;Subin Lee;Minsoo Park;Yujin Kim
    • Journal of the Korea Computer Graphics Society / v.29 no.3 / pp.21-31 / 2023
  • In this paper, we introduce a case study of cinematography using a digital human in virtual production. The case study covers the system overview of virtual production using LEDs and an efficient filming pipeline using a digital human. Unlike typical virtual production using LEDs, which mainly projects the background on the LEDs, in this case we use a digital human as a virtual actor to film scenes in which it communicates with a real actor. In addition, to film the dialogue scene between the real actor and the digital human using a real-time engine, we automatically generated the speech animation of the digital human in advance by applying our Korean lip-sync technology based on audio and text. We verified this filming case by using a real-time engine to produce short drama content with a real actor and a digital human in an LED-based virtual production environment.

The Narrative Discourse of the Novel and the Film L'Espoir (소설과 영화 『희망 L'Espoir』의 서사담론)

  • Oh, Se-Jung
    • Cross-Cultural Studies / v.48 / pp.289-323 / 2017
  • L'Espoir, a novel by André Malraux, has traits of literary reportage: it gives a full account of the Spanish Civil War as non-fiction based on the author's personal experience of taking part in the war. The novel was adapted into a semi-documentary film, a form that corresponds to reportage literature. A semi-documentary film is a genre that pursues realistic depiction of social incidents or phenomena. Despite the difference in genre between the novel and the film L'Espoir, the two works are closely related and show considerable narrative continuity. Therefore, examining the narrative discourse of the novel and the film on the basis of Gérard Genette's narrative theory has research value. Every story, as a narrative message, has a dual temporality in which story time and discourse time differ. This is because, in a narrative message, one event may occur earlier or later than another, may be told at length or concisely, and may be evoked once or repeatedly. Accordingly, the gap between the time at which an event actually occurs and the time at which it is recounted is analyzed in terms of order, duration, and frequency. Since order, duration, and frequency determine the dramatic pace that controls the progress of a story, they appear as an editorial notion in the novel and the film L'Espoir. The correspondence of time in arranging, summarizing, and deleting parts of the story is an aesthetic discourse that raises curiosity and shock. In addition, Genette uses the notions of speech and voice to clearly distinguish the position and focalization of a narrator or speaker in a text. The need to distinguish 'who speaks' from 'who sees' arises from the difference between the views of the narrator of the text and of the text itself. The question of 'who speaks' concerns who acts as the narrator of the story, whereas 'who sees' concerns from whose standpoint the story is narrated. In the novel L'Espoir, shifts of focalization are introduced through zero focalization and internal focalization, which correspond to the use of multiple cameras in the film. Also, the frame story is adopted as a metadiegetic type of voice in both the film and the novel. In sum, narrative discourse in the novel and the film L'Espoir is the dimension of story communication among the text, the narrator, and the recipient.

Detecting Errors in POS-Tagged Corpus on XGBoost and Cross Validation (XGBoost와 교차검증을 이용한 품사부착말뭉치에서의 오류 탐지)

  • Choi, Min-Seok;Kim, Chang-Hyun;Park, Ho-Min;Cheon, Min-Ah;Yoon, Ho;Namgoong, Young;Kim, Jae-Kyun;Kim, Jae-Hoon
    • KIPS Transactions on Software and Data Engineering / v.9 no.7 / pp.221-228 / 2020
  • A part-of-speech (POS) tagged corpus is a collection of electronic text in which each word is annotated with its corresponding POS tag, and it is widely used as training data for natural language processing. Training data is generally assumed to contain no errors, but in reality it includes various types of errors, which degrade the performance of systems trained on it. To alleviate this problem, we propose a novel method for detecting errors in an existing POS-tagged corpus using an XGBoost classifier and cross-validation as evaluation techniques. We first train a POS tagger classifier on the POS-tagged corpus, which contains some errors, and then detect errors in the corpus using cross-validation; however, the classifier cannot detect errors directly because there is no training data for detecting POS tagging errors. We therefore detect errors by comparing the outputs (POS probabilities) of the classifier while adjusting hyperparameters. The hyperparameters are estimated from a small-scale error-tagged corpus, whose text is sampled from the POS-tagged corpus and in which POS errors are marked up by experts. In this paper, we use recall and precision, which are widely used in information retrieval, as evaluation metrics. Because not all detected errors can be checked, we show that the proposed method is valid by comparing the distributions of the sample (the error-tagged corpus) and the population (the POS-tagged corpus). In the near future, we will apply the proposed method to a dependency tree-tagged corpus and a semantic role-tagged corpus.
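
The idea of combining cross-validation with classifier probabilities can be illustrated in a few lines. The sketch below is a simplified stand-in under loud assumptions: it replaces the XGBoost classifier with per-word tag frequencies, works on isolated `(word, tag)` pairs without context, and flags a token when the held-out probability of its annotated tag falls below a hypothetical `threshold`. The abstract does not state the actual decision rule or hyperparameters.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (word, annotated POS tag)


def train_tag_frequencies(tokens: List[Token]) -> Dict[str, Counter]:
    """Stand-in 'classifier': count how often each word carries each tag."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, tag in tokens:
        counts[word][tag] += 1
    return counts


def tag_probability(model: Dict[str, Counter], word: str, tag: str) -> Optional[float]:
    """Estimated P(tag | word), or None when the word was never seen in training."""
    word_counts = model.get(word)
    if not word_counts:
        return None
    return word_counts[tag] / sum(word_counts.values())


def detect_suspect_tags(corpus: List[Token], k: int = 5, threshold: float = 0.2) -> List[int]:
    """Return indices whose annotated tag looks unlikely under k-fold cross-validation."""
    suspects = []
    for fold in range(k):
        # Token i belongs to fold i % k; train on every other fold.
        training = [tok for i, tok in enumerate(corpus) if i % k != fold]
        model = train_tag_frequencies(training)
        for i, (word, tag) in enumerate(corpus):
            if i % k != fold:
                continue
            prob = tag_probability(model, word, tag)
            if prob is not None and prob < threshold:
                suspects.append(i)
    return sorted(suspects)
```

Because each token is scored by a model that never saw it, a rare, inconsistent annotation receives a low probability. In practice the threshold would need tuning on a small expert-checked sample, as the abstract describes for its own hyperparameters.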