• Title/Summary/Keyword: Speech-Recognition

Search Result 2,045, Processing Time 0.029 seconds

Language Variation and World Englishes (언어변이와 세계영어들)

  • Kim, Yangsoon
    • The Journal of the Convergence on Culture Technology
    • /
    • v.7 no.1
    • /
    • pp.234-239
    • /
    • 2021
  • The purpose of this paper is to find out the nature of language variation by exploring the ways of the progress of the language variation that produces all English-lects, i.e., the World Englishes. The study of language variation in linguistics is a hybrid enterprise, so the study of World Englishes has led to the recognition of a highly diverse set of all English-lects, encompassing regional dialects, sociolects, ethnolects and (post-)colonial dialects of World Englishes. In this paper, we propose a hybrid language variation model with three interacting factors of social distancing, on/off-contact, and linguistic diversity to examine the characteristics of language variation. In the context of World Englishes, the social distance is typically low in terms of their local location (country/speech) for local purposes. The social distance also varies based on online/offline communication modes and other social factors like gender, age and ethnic groups, resulting in all English-lects. To clarify the nature of World Englishes, the core Englishes, BrE, AmE and CanE are discussed here.

A Study on the Linkage Model Between Institutions Related to Lifelong Education for People with Developmental Disabilities Based on the K-PACE Center of Daegu University: A Perspective on the Whole Life Cycle for People with Developmental Disabilities

  • Kim, Young-Jun;Kim, Wha-Soo;Rhee, Kun-Yong
    • International Journal of Advanced Culture Technology
    • /
    • v.10 no.1
    • /
    • pp.24-35
    • /
    • 2022
  • The purpose of this study was to form a linked model in which local institutions related to lifelong education for the disabled can cooperate based on the Daegu University K-PACE Center. The contents of the study started with recognizing the problem that the adult-centered lifelong education support system does not effectively cope with these factors, even though the independent life of people with developmental disabilities is a major factor determining the quality of life. Regarding this problem recognition, this study primarily emphasized the view that educational support for independent life of people with developmental disabilities should establish the context of the school foundation. The context of the school foundation is established for lifelong education centered on adulthood for people with developmental disabilities because the curriculum is embodied through the standards of subject matter education. In this regard, the Daegu University K-PACE Center, which established a curriculum that supports the independent life of people with developmental disabilities in terms of linking higher and lifelong education, actually reflects the context of the school foundation. As a result, this study prepared a strategy that could be considered as a transition to advance the curriculum organized by the Daegu University K-PACE Center, and the strategy was secondarily reflected as a procedure that could be linked to local lifelong education-related institutions for the disabled. Finally, this study presented a form of transition in which people with developmental disabilities can access the curriculum of lifelong education through the connection of local lifelong education-related institutions for the disabled, centering on the entire life of adulthood.

An analysis study on the quality of article to improve the performance of hate comments discrimination (악성댓글 판별의 성능 향상을 위한 품사 자질에 대한 분석 연구)

  • Kim, Hyoung Ju;Min, Moon Jong;Kim, Pan Koo
    • Smart Media Journal
    • /
    • v.10 no.4
    • /
    • pp.71-79
    • /
    • 2021
  • One of the social aspects that changes as the use of the Internet becomes widespread is communication in online space. In the past, only one-on-one conversations were possible remotely, except when they were physically in the same space, but nowadays, technology has been developed to enable communication with a large number of people remotely through bulletin boards, communities, and social network services. Due to the development of such information and communication networks, life becomes more convenient, and at the same time, the damage caused by rapid information exchange is also constantly increasing. Recently, cyber crimes such as sending sexual messages or personal attacks to certain people with recognition on the Internet, such as not only entertainers but also influencers, have occurred, and some of those exposed to these cybercrime have committed suicide. In this paper, in order to reduce the damage caused by malicious comments, research a method for improving the performance of discriminate malicious comments through feature extraction based on parts-of-speech.

Application of Information Technologies to Improve the Quality of Services Provided to the Tourism Industry Under the COVID-19 Restrictions

  • Iudina, Elena Vladimirovna;Balova, Suzana L.;Maksimov, Dmitrij Vasilievich;Skoromets, Elena Klimentinovna;Ponyaeva, Tatyana Anatolyevna;Ksenofontova, Ekaterina Andreevna
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.6
    • /
    • pp.7-12
    • /
    • 2022
  • The modern stage of society's development is characterized by the rapid penetration of information technologies into all spheres of life. Their use contributes to improving the quality of tourism services, as well as the competitiveness of tourism industry enterprises. The role of information technology in tourism is growing more and more every year, which determines the relevance of the study of modern trends in the use of information technology in the tourism sector. The purpose of the study is to determine the possibilities of using information technologies to improve the quality of services provided to the tourism industry under the COVID-19 restrictions. The article systematizes the main approaches to the "cluster" category and provides an original definition of the "regional tourist cluster" concept. Based on an expert survey, the main trends in the introduction of information technologies in the tourism industry under the COVID-19 restrictions have been identified, which include virtual reality and augmented reality, speech recognition technologies, photo, video, audio (contactless control technologies), mobile IT applications and Big Data technologies. It has been concluded that the vast majority of improvements in the organization of tourism services under restrictions will be based on the organization of virtual solutions and online activities. The types of tourism services will also change, and information technology will help their development and dissemination.

Efficient Thread Allocation Method of Convolutional Neural Network based on GPGPU (GPGPU 기반 Convolutional Neural Network의 효율적인 스레드 할당 기법)

  • Kim, Mincheol;Lee, Kwangyeob
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.10
    • /
    • pp.935-943
    • /
    • 2017
  • CNN (Convolution neural network), which is used for image classification and speech recognition among neural networks learning based on positive data, has been continuously developed to have a high performance structure to date. There are many difficulties to utilize in an embedded system with limited resources. Therefore, we use GPU (General-Purpose Computing on Graphics Processing Units), which is used for general-purpose operation of GPU to solve the problem because we use pre-learned weights but there are still limitations. Since CNN performs simple and iterative operations, the computation speed varies greatly depending on the thread allocation and utilization method in the Single Instruction Multiple Thread (SIMT) based GPGPU. To solve this problem, there is a thread that needs to be relaxed when performing Convolution and Pooling operations with threads. The remaining threads have increased the operation speed by using the method used in the following feature maps and kernel calculations.

Research on Developing a Conversational AI Callbot Solution for Medical Counselling

  • Won Ro LEE;Jeong Hyon CHOI;Min Soo KANG
    • Korean Journal of Artificial Intelligence
    • /
    • v.11 no.4
    • /
    • pp.9-13
    • /
    • 2023
  • In this study, we explored the potential of integrating interactive AI callbot technology into the medical consultation domain as part of a broader service development initiative. Aimed at enhancing patient satisfaction, the AI callbot was designed to efficiently address queries from hospitals' primary users, especially the elderly and those using phone services. By incorporating an AI-driven callbot into the hospital's customer service center, routine tasks such as appointment modifications and cancellations were efficiently managed by the AI Callbot Agent. On the other hand, tasks requiring more detailed attention or specialization were addressed by Human Agents, ensuring a balanced and collaborative approach. The deep learning model for voice recognition for this study was based on the Transformer model and fine-tuned to fit the medical field using a pre-trained model. Existing recording files were converted into learning data to perform SSL(self-supervised learning) Model was implemented. The ANN (Artificial neural network) neural network model was used to analyze voice signals and interpret them as text, and after actual application, the intent was enriched through reinforcement learning to continuously improve accuracy. In the case of TTS(Text To Speech), the Transformer model was applied to Text Analysis, Acoustic model, and Vocoder, and Google's Natural Language API was applied to recognize intent. As the research progresses, there are challenges to solve, such as interconnection issues between various EMR providers, problems with doctor's time slots, problems with two or more hospital appointments, and problems with patient use. However, there are specialized problems that are easy to make reservations. Implementation of the callbot service in hospitals appears to be applicable immediately.

Robust Real-time Pose Estimation to Dynamic Environments for Modeling Mirror Neuron System (거울 신경 체계 모델링을 위한 동적 환경에 강인한 실시간 자세추정)

  • Jun-Ho Choi;Seung-Min Park
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.3
    • /
    • pp.583-588
    • /
    • 2024
  • With the emergence of Brain-Computer Interface (BCI) technology, analyzing mirror neurons has become more feasible. However, evaluating the accuracy of BCI systems that rely on human thoughts poses challenges due to their qualitative nature. To harness the potential of BCI, we propose a new approach to measure accuracy based on the characteristics of mirror neurons in the human brain that are influenced by speech speed, depending on the ultimate goal of movement. In Chapter 2 of this paper, we introduce mirror neurons and provide an explanation of human posture estimation for mirror neurons. In Chapter 3, we present a powerful pose estimation method suitable for real-time dynamic environments using the technique of human posture estimation. Furthermore, we propose a method to analyze the accuracy of BCI using this robotic environment.

A Study of Psychometric Function Curve for Korean Standard Monosyllabic Word Lists for Preschoolers (KS-MWL-P) (한국표준 학령전기용 단음절어표 (Korean Standard Monosyllabic Word Lists for Preschoolers, KS-MWL-P)의 심리음향기능곡선 연구)

  • Shin, Hyun-Wook;Kim, Jin-Sook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.6
    • /
    • pp.534-541
    • /
    • 2009
  • Word recognition test (WRT) for the children can be useful for diagnosing the degree of communication disability, prescribing hearing instruments, planning aural rehabilitation and speech therapy, and determination of site of lesions. The Korean standard monosyllabic word lists for preschoolers (KS-MWL-P) were developed considering the criteria given by the literatures. However, the authors of KS-MWL-P suggested more children should be included to verify homogeneity of the lists using psychometric function curve since only 8 children participated in the developing process. The purpose of this study was to explore the homogeneity of KS-MWL-P for supplementing the limitations of the lists employing psychometric analysis. To 23 preschoolers who have normal-hearing, 100 monosyllabic KS-MWL-P words were examined with the pictures. Psychometric function curve with linear slopes of 20% and 80%'s correct rates through accounting recognition scores of each monosyllabic word at variable intensities from -10 to 40 dBHL was obtained and analyzed. As a result, s-shaped psychometric function curve was presented with increasing correct rate depending on intensity and showed no statistical significant differences among each word and list. The congruous graph shapes among lists also indicated good homogeneity and the list 1,2,3,4's average slopes were 4.48, 3.86, 4.65, 4.50. It was verified that the homogeneity was suitable because the analysis of variance showed no statistical significance among lists (p>0.05). However, KS-MWL-P's order of slope according to the order of the number of items, $1{\sim}10$, $1{\sim}20$, $1{\sim}25$ showed no difference with the p-value of 0.93, 0.59, 0.91, 0.70 for the lists 1,2,3, and 4, respectively. Although KS-MWL-P was assumed that the lower-numbered items were easy for testing younger ages, this study's results could not agree with the author's conclusion. Considering this matter, rearranging of the number of items should be performed according to the analysis of slope suggested by this study for testing younger children with easier items. Other than this, in conclusion, KS-MWL-P was proved to be useful for clinical and rehabilitative evaluating and training tools for preschoolers.

Deep Learning Architectures and Applications (딥러닝의 모형과 응용사례)

  • Ahn, SungMahn
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.2
    • /
    • pp.127-142
    • /
    • 2016
  • Deep learning model is a kind of neural networks that allows multiple hidden layers. There are various deep learning architectures such as convolutional neural networks, deep belief networks and recurrent neural networks. Those have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition and bioinformatics where they have been shown to produce state-of-the-art results on various tasks. Among those architectures, convolutional neural networks and recurrent neural networks are classified as the supervised learning model. And in recent years, those supervised learning models have gained more popularity than unsupervised learning models such as deep belief networks, because supervised learning models have shown fashionable applications in such fields mentioned above. Deep learning models can be trained with backpropagation algorithm. Backpropagation is an abbreviation for "backward propagation of errors" and a common method of training artificial neural networks used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of an error function with respect to all the weights in the network. The gradient is fed to the optimization method which in turn uses it to update the weights, in an attempt to minimize the error function. Convolutional neural networks use a special architecture which is particularly well-adapted to classify images. Using this architecture makes convolutional networks fast to train. This, in turn, helps us train deep, muti-layer networks, which are very good at classifying images. These days, deep convolutional networks are used in most neural networks for image recognition. Convolutional neural networks use three basic ideas: local receptive fields, shared weights, and pooling. By local receptive fields, we mean that each neuron in the first(or any) hidden layer will be connected to a small region of the input(or previous layer's) neurons. Shared weights mean that we're going to use the same weights and bias for each of the local receptive field. This means that all the neurons in the hidden layer detect exactly the same feature, just at different locations in the input image. In addition to the convolutional layers just described, convolutional neural networks also contain pooling layers. Pooling layers are usually used immediately after convolutional layers. What the pooling layers do is to simplify the information in the output from the convolutional layer. Recent convolutional network architectures have 10 to 20 hidden layers and billions of connections between units. Training deep learning networks has taken weeks several years ago, but thanks to progress in GPU and algorithm enhancement, training time has reduced to several hours. Neural networks with time-varying behavior are known as recurrent neural networks or RNNs. A recurrent neural network is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. Early RNN models turned out to be very difficult to train, harder even than deep feedforward networks. The reason is the unstable gradient problem such as vanishing gradient and exploding gradient. The gradient can get smaller and smaller as it is propagated back through layers. This makes learning in early layers extremely slow. The problem actually gets worse in RNNs, since gradients aren't just propagated backward through layers, they're propagated backward through time. If the network runs for a long time, that can make the gradient extremely unstable and hard to learn from. It has been possible to incorporate an idea known as long short-term memory units (LSTMs) into RNNs. LSTMs make it much easier to get good results when training RNNs, and many recent papers make use of LSTMs or related ideas.

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character based on sequential input data. N-gram models have been widely used but this model cannot model the correlation between the input units efficiently since it is a probabilistic model which are based on the frequency of each unit in the training set. Recently, as the deep learning algorithm has been developed, a recurrent neural network (RNN) model and a long short-term memory (LSTM) model have been widely used for the neural language model (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependency between the objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). In order to learning the neural language model, texts need to be decomposed into words or morphemes. Since, however, a training set of sentences includes a huge number of words or morphemes in general, the size of dictionary is very large and so it increases model complexity. In addition, word-level or morpheme-level models are able to generate vocabularies only which are contained in the training set. Furthermore, with highly morphological languages such as Turkish, Hungarian, Russian, Finnish or Korean, morpheme analyzers have more chance to cause errors in decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean language based on LSTM models. A phoneme such as a vowel or a consonant is the smallest unit that comprises Korean texts. We construct the language model using three or four LSTM layers. Each model was trained using Stochastic Gradient Algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. Simulation study was done with Old Testament texts using a deep learning package Keras based the Theano. After pre-processing the texts, the dataset included 74 of unique characters including vowels, consonants, and punctuation marks. Then we constructed an input vector with 20 consecutive characters and an output with a following 21st character. Finally, total 1,023,411 sets of input-output vectors were included in the dataset and we divided them into training, validation, testsets with proportion 70:15:15. All the simulation were conducted on a system equipped with an Intel Xeon CPU (16 cores) and a NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated for the validation set, the perplexity evaluated for the test set, and the time to be taken for training each model. As a result, all the optimization algorithms but the stochastic gradient algorithm showed similar validation loss and perplexity, which are clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm took the longest time to be trained for both 3- and 4-LSTM models. On average, the 4-LSTM layer model took 69% longer training time than the 3-LSTM layer model. However, the validation loss and perplexity were not improved significantly or became even worse for specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM layer model tended to generate the sentences which are closer to the natural language than the 3-LSTM model. Although there were slight differences in the completeness of the generated sentences between the models, the sentence generation performance was quite satisfactory in any simulation conditions: they generated only legitimate Korean letters and the use of postposition and the conjugation of verbs were almost perfect in the sense of grammar. The results of this study are expected to be widely used for the processing of Korean language in the field of language processing and speech recognition, which are the basis of artificial intelligence systems.