• Title/Summary/Keyword: speech technology

CNN based dual-channel sound enhancement in the MAV environment (MAV 환경에서의 CNN 기반 듀얼 채널 음향 향상 기법)

  • Kim, Young-Jin;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.12 / pp.1506-1513 / 2019
  • As the industrial use of multi-rotor unmanned aerial vehicles (UAVs) expands rapidly, demand for data collection, processing, and analysis using UAVs is also increasing. However, acoustic data collected with a UAV is heavily corrupted by the UAV's motor noise and wind noise, which makes the data difficult to process and analyze. We therefore studied a method for enhancing the target sound in the acoustic signal received through microphones connected to the UAV. In this paper, we extend the densely connected dilated convolutional network, an existing single-channel acoustic enhancement technique, to take the inter-channel characteristics of the acoustic signal into account. The extended model outperformed the existing model on all evaluation measures, including SDR, PESQ, and STOI.
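
As a rough illustration of what using inter-channel information in a dilated convolution can look like, the sketch below applies one dilated 1-D convolution to two microphone channels and sums the per-channel results. It is not the authors' network: the function name `dilated_conv1d`, the kernel values, and the toy signals are assumptions made only for this example.

```python
def dilated_conv1d(channels, kernels, dilation):
    """Apply one dilated 1-D convolution across several input channels.

    channels: list of equal-length sequences, one per microphone.
    kernels:  one list of filter taps per input channel (same length each).
    Returns the 'valid' output, summed over channels.
    """
    taps = len(kernels[0])
    span = (taps - 1) * dilation          # distance covered by the dilated kernel
    n_out = len(channels[0]) - span
    out = []
    for t in range(n_out):
        acc = 0.0
        for signal, kernel in zip(channels, kernels):
            for i in range(taps):
                # Taps are spaced `dilation` samples apart.
                acc += kernel[i] * signal[t + i * dilation]
        out.append(acc)
    return out


# Toy two-channel input (hypothetical values).
left = [1, 2, 3, 4, 5]
right = [10, 20, 30, 40, 50]
print(dilated_conv1d([left, right], [[1.0, 1.0], [0.5, -0.5]], dilation=2))
# [-6.0, -4.0, -2.0]
```

Because each output sample mixes taps from both channels, a filter of this kind can respond to differences between microphones, which a single-channel filter cannot see.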

A Study on the Linkage Model Between Institutions Related to Lifelong Education for People with Developmental Disabilities Based on the K-PACE Center of Daegu University: A Perspective on the Whole Life Cycle for People with Developmental Disabilities

  • Kim, Young-Jun;Kim, Wha-Soo;Rhee, Kun-Yong
    • International Journal of Advanced Culture Technology / v.10 no.1 / pp.24-35 / 2022
  • The purpose of this study was to form a linkage model in which local institutions related to lifelong education for the disabled can cooperate, based on the Daegu University K-PACE Center. The study started from the problem that the adult-centered lifelong education support system does not effectively address the independent life of people with developmental disabilities, even though independent life is a major factor determining their quality of life. In response, this study primarily emphasized the view that educational support for the independent life of people with developmental disabilities should establish the context of the school foundation. This context is established for adulthood-centered lifelong education for people with developmental disabilities because the curriculum is embodied through the standards of subject matter education. The Daegu University K-PACE Center, which established a curriculum supporting the independent life of people with developmental disabilities by linking higher and lifelong education, reflects the context of the school foundation in practice. Accordingly, this study prepared a strategy for advancing the curriculum organized by the Daegu University K-PACE Center, and then applied that strategy as a procedure that can be linked to local institutions related to lifelong education for the disabled. Finally, this study presented a form of transition in which people with developmental disabilities can access the lifelong education curriculum through the connection of these local institutions, covering the whole of adult life.

Emotion-based Real-time Facial Expression Matching Dialogue System for Virtual Human (감정에 기반한 가상인간의 대화 및 표정 실시간 생성 시스템 구현)

  • Kim, Kirak;Yeon, Heeyeon;Eun, Taeyoung;Jung, Moonryul
    • Journal of the Korea Computer Graphics Society / v.28 no.3 / pp.23-29 / 2022
  • Virtual humans are implemented with dedicated modeling tools like Unity 3D Engine in virtual space (virtual reality, mixed reality, metaverse, etc.). Various human modeling tools have been introduced to implement virtual human-like appearance, voice, expression, and behavior similar to real people, and virtual humans implemented via these tools can communicate with users to some extent. However, most of the virtual humans so far have stayed unimodal using only text or speech. As AI technologies advance, the outdated machine-centered dialogue system is now changing to a human-centered, natural multi-modal system. By using several pre-trained networks, we implemented an emotion-based multi-modal dialogue system, which generates human-like utterances and displays appropriate facial expressions in real-time.

COMPOSITION OF A UNIFIED MODEL ACCORDING TO THE STRUCTURE OF QUALIFICATION TYPES OF LIFELONG EDUCATION PROFESSIONALS FOR THE DISABLED: A BASIC STUDY ON THE ESTABLISHMENT OF A CONVERGENCE MAJOR IN DAEGU UNIVERSITY

  • Kim, Young-Jun;Kim, Wha-Soo;Rhee, Kun-Yong
    • International Journal of Advanced Culture Technology / v.9 no.4 / pp.40-51 / 2021
  • This study aimed to construct a unified model according to the structure of qualification types of lifelong education professionals for the disabled. The research method combined literature analysis with expert meetings. The study first classified qualification types into a professional teacher type and a coordinator type, focusing on special education and rehabilitation, the related convergence fields that affect the qualification training of lifelong education professionals for the disabled. These two convergence fields, special education and rehabilitation welfare, lead to separate application bases from the perspectives of education and welfare for the qualification of lifelong education professionals for the disabled, and ultimately to confusion and conflict in the nature and contents of the curriculum and related services. Several samples were used to describe the dichotomous structure in which this phenomenon results in divided types of qualification training for lifelong education professionals for the disabled. In this regard, the curriculum and related services that can build convergence fields related to lifelong education for the disabled were prioritized according to criteria that should be emphasized from the standpoint of the disabled within the overall establishment of a lifelong education support system for the disabled. In addition, four qualification criteria were formed on this basis, and special education was set as the common convergence field, thereby strengthening inclusion of the rehabilitation welfare field and specific convergence into lifelong education for the disabled. As a result, the two qualification types were unified.

Meta-Analysis of Self-Advocacy of People with Developmental Disabilities : Focusing on Research from 2000 to 2023 (발달장애인의 자기옹호에 관련 메타분석 2000년부터 2023년까지 -)

  • Su-Mi Jin;Wha-Soo Kim;Ji-Woo Lee
    • The Journal of the Convergence on Culture Technology / v.9 no.4 / pp.201-210 / 2023
  • The purpose of this study is to analyze the general characteristics, effect sizes, and qualitative indicators of self-advocacy studies of people with developmental disabilities published in domestic academic journals and theses. For this purpose, from a total of 2,153 papers related to self-advocacy published from 2000 to 2023, 41 studies with developmental disabilities as a keyword were selected. Based on the results of this study, when developing a language intervention program related to self-advocacy for people with developmental disabilities, it is recommended to develop a program of 10-19 sessions in a learning situation with 20-30 people, for adolescents and adults or during the transition period. Many existing studies are limited to educational aspects such as special education and integrated education; by applying these findings, it is hoped that a self-advocacy language intervention program will be developed at the level of language rehabilitation, so that people who have experienced communication difficulties can assert themselves and their rights effectively and with sophistication.

Syllabus Design and Pronunciation Teaching

  • Amakawa, Yukiko
    • Proceedings of the KSPS conference / 2000.07a / pp.235-240 / 2000
  • In the age of global communication, more human exchange takes place at the grass-roots level. In the past, language policy and language planning were based on one nation-state with one language. But high waves of globalization have allowed human exchange to flow beyond national borders on a daily basis. Under such circumstances, Japan's homogeneity may no longer allow Japanese people to speak and communicate only in Japanese and only with other Japanese people. In Japan, an advisory report was submitted to the Ministry of Education in June 1996 about what education should be like in the 21st century. This report proposed, for the first time, the introduction of English at public elementary schools, and revealed a basic policy for English instruction at the elementary school level. Under this policy, English instruction is not required at the elementary school level, but each school may choose to introduce English into its curriculum starting in April 2002. Baker (1996) identifies the age of three as the threshold dividing children who become bilingual naturally from those who become bilingual through formal instruction. There is a movement toward making second language acquisition more naturalistic in educational settings, developing communicative competence in a more or less formal way. Drawing on the lesson of the Canadian immersion success, Genesee (1987) stresses the importance of early language instruction. From a psycholinguistic perspective, it is clear that most children acquire basic communication skills in their first language apparently effortlessly, without systematic and formal instruction, during the first six or seven years of life. This innate capacity diminishes with age, making language learning increasingly difficult. The author, being a returnee, experienced considerable difficulty acquiring L2, and especially achieving native-like competence.
There will be many hurdles to overcome before Japanese students are able to reach at least a communicative level in English. It has been said that English is taught not to clear the college entrance examination, but to communicate. However, the Japanese college entrance examination still makes students focus on the grammar-translation method. This is expected to shift to a more communication-oriented approach. Japan does not have to aim at becoming an officially bilingual country, but communicative English should at least be taught at every level in school. Mito College is a small two-year co-ed college in Japan. Students at Mito College are generally not good at English. The college has only one department, for business and economics, and English is required for all freshmen. It is necessary for me to make my classes enjoyable and attractive so that students can at least become motivated to learn English. My major target is communicative English, so that students may be prepared to use English in various business settings. As an experiment in introducing more communicative English, the author designed the following syllabus. The program aims at training students to speak and enjoy English. Each 90-minute class (a single 90-minute session per week is most common in Japanese colleges) is divided into two halves: the first half trains students orally using the Graded Direct Method, and the second half uses different materials each time so that students can learn and enjoy English culture and language simultaneously. There are no quizzes or examinations in my one-academic-year program. However, all students are required to write an original English poem by the end of the spring semester, with 2-6 students working together in a group on one poem. Students coming to Mito College have some of the lowest English levels in all of Japan. However, an attached example of a poem written by one group shows that students can improve their creativity as long as they are kept encouraged.
At the end of the fall semester, each student is then required to give a 3-minute original English speech. An example from that speech contest will be presented at the Convention in Seoul.

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.119-138 / 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has also evolved with the development of information and communication technology, and illegal advertising is distributed on SNS in large quantities. As a result, personal information is lost and monetary damage occurs more and more frequently. In this study, we propose a method for determining which sentences and documents posted to SNS are related to financial fraud. First, as a conceptual framework, we developed a matrix of the conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested an emergency management process consisting of Pre-Cybercriminality (e.g. risk identification) and Post-Cybercriminality steps. Of these, this paper focuses on risk identification. The main process consists of data collection, preprocessing, and analysis. First, we selected two words, 'daechul (loan)' and 'sachae (private loan)', as seed words and collected data containing these words from SNS such as Twitter. The collected data were given to two researchers, who decided whether or not they were related to cybercriminality, particularly financial fraud. We then selected some of the vocabulary as keywords if it consisted of nominals or symbols. With the selected keywords, we searched web materials such as Twitter, news, and blogs, and collected more than 820,000 articles. The collected articles were refined through preprocessing and made into learning data. Preprocessing is divided into three steps: morphological analysis, stop-word removal, and selection of valid parts of speech. In the morphological analysis step, a complex sentence is split into morpheme units to enable mechanical analysis. In the stop-word removal step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text.
In the step of selecting valid parts of speech, only two kinds, nouns and symbols, are considered. Since nouns refer to things, they express the intent of a message better than other parts of speech. Moreover, the more illegal a text is, the more frequently symbols are used. To turn the preprocessed data into learning data, each item must be classified as legitimate or not, so each selected item is labeled 'legal' or 'illegal'. The processed data are then converted into a corpus and a document-term matrix. Finally, the 'legal' and 'illegal' files were mixed and randomly divided into a learning data set and a test data set; in this study, 70% of the data were used for learning and 30% for testing. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as its main parameters, we set gamma to 0.5 and cost to 10, based on the optimal value function. The cost is set higher than in general cases. To show the feasibility of the proposed idea, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and Collective Intelligence methods, using overall accuracy as the metric. The overall accuracy of the proposed method was 92.41% for illegal loan advertisements and 77.75% for illegal visit sales, which is apparently superior to that of Term Frequency, MLE, etc. Hence, the results suggest that the proposed method is valid and practically usable. In this paper, we propose a framework for managing crises caused by abnormalities in unstructured data sources such as SNS. We hope this study will contribute to academia by identifying what to consider when applying an SVM-like discrimination algorithm to text analysis. Moreover, the study will also contribute to practitioners in the fields of brand management and opinion mining.
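
The data preparation steps in this abstract (removing non-lexical elements, building a document-term matrix, and a random 70/30 split) can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: whitespace tokenization stands in for Korean morphological analysis and part-of-speech selection, and the function names and sample texts are hypothetical.

```python
import random
import re


def tokenize(text):
    """Lowercase the text and drop digits and punctuation (a stand-in for
    the morphological analysis and stop-element removal steps)."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    return cleaned.split()


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[d][t] counts term t in doc d."""
    tokens = [tokenize(doc) for doc in docs]
    vocab = sorted({term for doc in tokens for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = []
    for doc in tokens:
        row = [0] * len(vocab)
        for term in doc:
            row[index[term]] += 1
        matrix.append(row)
    return vocab, matrix


def split_70_30(rows, labels, seed=0):
    """Shuffle the labeled rows and split them 70% learning / 30% test."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(round(0.7 * len(order)))
    train, test = order[:cut], order[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Hypothetical labeled messages.
docs = ["Loan loan now!!", "Visit now 24"]
vocab, dtm = document_term_matrix(docs)
print(vocab)  # ['loan', 'now', 'visit']
print(dtm)    # [[2, 1, 0], [0, 1, 1]]
```

The learning rows would then be passed to a classifier. In scikit-learn, for example, `SVC(kernel="rbf", gamma=0.5, C=10)` would correspond to the gamma and cost values quoted above, on the assumption of an RBF kernel, which the abstract does not state.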

A Study on the Development Trend of Artificial Intelligence Using Text Mining Technique: Focused on Open Source Software Projects on Github (텍스트 마이닝 기법을 활용한 인공지능 기술개발 동향 분석 연구: 깃허브 상의 오픈 소스 소프트웨어 프로젝트를 대상으로)

  • Chong, JiSeon;Kim, Dongsung;Lee, Hong Joo;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.1-19 / 2019
  • Artificial intelligence (AI) is one of the main driving forces of the Fourth Industrial Revolution. AI technologies have already shown abilities equal to or better than those of people in many fields, including image and speech recognition. Because AI technologies can be used in a wide range of fields, including medicine, finance, manufacturing, services, and education, many efforts have been made to identify current technology trends and analyze their development directions. Major platforms for developing complex AI algorithms for learning, reasoning, and recognition have been opened to the public as open source projects. As a result, technologies and services that use them have increased rapidly, and this has been confirmed as one of the major reasons for the fast development of AI technologies. Additionally, the spread of the technology is greatly indebted to open source software, developed by major global companies, that supports natural language recognition, speech recognition, and image recognition. Therefore, this study aimed to identify the practical trend of AI technology development by analyzing AI-related OSS projects, which have been developed through the online collaboration of many parties. This study searched for and collected a list of major AI-related projects created on GitHub from 2000 to July 2018. It then examined the development trends of major technologies in detail by applying text mining techniques to topic information, which indicates the characteristics and technical fields of the collected projects. The analysis showed that fewer than 100 software development projects were created per year until 2013. However, the number increased to 229 projects in 2014 and 597 projects in 2015. In particular, the number of AI-related open source projects increased rapidly in 2016 (2,559 OSS projects).
The number of projects initiated in 2017 was 14,213, almost four times the total number of projects created from 2009 to 2016 (3,555 projects). The number of projects initiated from January to July 2018 was 8,737. The development trend of AI-related technologies was evaluated by dividing the study period into three phases. The appearance frequency of topics indicates the technology trends of AI-related OSS projects. The results showed that natural language processing remained at the top in all years. This implies that OSS had been developed continuously. Until 2015, the programming languages Python, C++, and Java were among the ten most frequent topics. After 2016, however, programming languages other than Python disappeared from the top ten. In their place, platforms supporting the development of AI algorithms, such as TensorFlow and Keras, show high appearance frequency. Additionally, reinforcement learning algorithms and convolutional neural networks, which have been used in various fields, were frequent topics. The results of topic network analysis showed that the most important topics by degree centrality were similar to those by appearance frequency. The main difference was that visualization and medical imaging topics were found at the top of the list, although they were not at the top from 2009 to 2012. This indicates that OSS was developed in the medical field in order to use AI technology. Moreover, although computer vision was in the top 10 of the appearance frequency list from 2013 to 2015, it was not in the top 10 by degree centrality. The topics at the top of the degree centrality list were similar to those at the top of the appearance frequency list. The ranks of the convolutional neural network and reinforcement learning changed slightly.
The trend of technology development was examined using the appearance frequency and degree centrality of topics. The results showed that machine learning had the highest frequency and the highest degree centrality in all years. Moreover, it is noteworthy that, although the deep learning topic showed low frequency and low degree centrality between 2009 and 2012, its rank rose abruptly between 2013 and 2015. In recent years, both technologies have had high appearance frequency and degree centrality. TensorFlow first appeared during the 2013-2015 phase, and its appearance frequency and degree centrality soared between 2016 and 2018, placing it near the top of the lists, after deep learning and Python. Computer vision and reinforcement learning did not show an abrupt increase or decrease, and they had relatively low appearance frequency and degree centrality compared with the above-mentioned topics. Based on these results, it is possible to identify the fields in which AI technologies are actively developed. The results of this study can be used as a baseline dataset for more empirical analysis of future technology trends that can be converged.
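
The two measures used in this abstract, topic appearance frequency and degree centrality, can be illustrated on a toy set of projects. The sketch assumes that two topics are linked when they are attached to the same project, which the abstract does not spell out, and it uses the common normalization degree / (n - 1). The project data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def topic_frequency(projects):
    """Count in how many projects each topic appears."""
    return Counter(topic for topics in projects for topic in set(topics))


def degree_centrality(projects):
    """Degree centrality in the topic co-occurrence network:
    number of distinct neighbouring topics divided by (n - 1)."""
    neighbors = {}
    for topics in projects:
        unique = sorted(set(topics))
        for topic in unique:
            neighbors.setdefault(topic, set())
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {topic: 0.0 for topic in neighbors}
    return {topic: len(adj) / (n - 1) for topic, adj in neighbors.items()}


# Hypothetical topic lists for three projects.
projects = [
    ["machine-learning", "python", "tensorflow"],
    ["machine-learning", "deep-learning"],
    ["python", "nlp"],
]
freq = topic_frequency(projects)
cent = degree_centrality(projects)
print(freq["machine-learning"], cent["machine-learning"])  # 2 0.75
print(freq["tensorflow"], cent["tensorflow"])              # 1 0.5
```

A topic can rank differently on the two measures: frequency counts how often it appears, while degree centrality counts how many other topics it appears alongside.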

Implementation of a Learning Support System that Facilitates Teacher-Student Interaction Utilizing a Digital Human (디지털 휴먼을 활용하여 교수-학생 상호작용을 촉진시키는 학습지원 시스템 구현)

  • Gyu-Sung Jung;Chan-Hyeong Im;Hae-Chan Lee;Ra Yun Boo;Soonuk Seol
    • Journal of Practical Engineering Education / v.14 no.3 / pp.523-533 / 2022
  • During the COVID-19 pandemic, the use of video classes and real-time online education has increased, but the lack of interaction between instructors and learners remains a challenging problem to be resolved. This paper designs and implements a learning support system that utilizes a digital human to improve faculty-student interaction, which plays an important role in increasing the educational effect and satisfaction of real-time online classes. In this paper, a digital human participates in a class as a virtual learner and asks questions raised by other learners through an anonymous chat system to the instructor on behalf of the learners. In addition, as a class facilitator, the digital human analyzes the lecturer's speech in real time and provides it to the learner in the form of a summary of the class, thereby facilitating faculty-student interaction. In order to confirm that the proposed system can be used in actual online real-time classes, we apply our system to Zoom classes. Experimental results show that facilitated Q&A and real-time class summaries are successfully provided through our digital human-based learning support system.

Performance Improvement of Speaker Recognition by MCE-based Score Combination of Multiple Feature Parameters (MCE기반의 다중 특징 파라미터 스코어의 결합을 통한 화자인식 성능 향상)

  • Kang, Ji Hoon;Kim, Bo Ram;Kim, Kyu Young;Lee, Sang Hoon
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.6 / pp.679-686 / 2020
  • In this paper, an enhanced method for extracting features from vocal source signals and a score combination method using MCE-based weight estimation for the scores of multiple feature vectors are proposed to improve the performance of speaker recognition systems. The proposed feature vector is composed of perceptual linear predictive cepstral coefficients, skewness, and kurtosis extracted from lowpass-filtered glottal flow signals, which eliminates the flat spectrum region, a section carrying no meaningful information. The proposed feature was used to improve a conventional speaker recognition system that uses mel-frequency cepstral coefficients and perceptual linear predictive cepstral coefficients extracted from speech signals, together with Gaussian mixture models. In addition, to increase the reliability of the estimated scores, instead of estimating the weights from the probability distribution of the conventional scores, the scores from the conventional vocal tract features and the proposed feature are fused by the MCE-based score combination method to find the optimal speaker. The experimental results showed that the proposed feature vectors contain valid information for recognizing the speaker. In addition, when speaker recognition was performed by combining the MCE-based multiple feature parameter scores, the recognition system outperformed the conventional one, particularly with a small number of Gaussian mixtures.
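
The abstract does not give the exact MCE formulation, so the sketch below uses a textbook minimum classification error setup as an assumption: a weighted sum of per-feature scores, a misclassification measure comparing the best rival with the true speaker, a sigmoid loss, and gradient descent on the weights. The function names, hyperparameters, and toy scores are hypothetical.

```python
import math


def fuse(weights, scores):
    """Weighted sum of scores; scores[m][j] is feature m's score for speaker j."""
    n_speakers = len(scores[0])
    return [sum(w * s[j] for w, s in zip(weights, scores))
            for j in range(n_speakers)]


def mce_train(samples, n_features, gamma=1.0, lr=0.5, epochs=50):
    """Estimate fusion weights by descending a sigmoid MCE loss."""
    weights = [1.0 / n_features] * n_features
    for _ in range(epochs):
        for scores, true_id in samples:
            fused = fuse(weights, scores)
            # Strongest competing speaker under the current weights.
            rival = max((j for j in range(len(fused)) if j != true_id),
                        key=lambda j: fused[j])
            d = fused[rival] - fused[true_id]   # > 0 means misclassified
            loss = 1.0 / (1.0 + math.exp(-gamma * d))
            slope = gamma * loss * (1.0 - loss)
            for m in range(n_features):
                grad = slope * (scores[m][rival] - scores[m][true_id])
                weights[m] = max(weights[m] - lr * grad, 0.0)
            total = sum(weights) or 1.0
            weights = [w / total for w in weights]  # keep weights summing to 1
    return weights


# Toy data: feature 0 favours the true speaker, feature 1 is misleading.
samples = [
    ([[2.0, 1.0], [0.5, 1.5]], 0),
    ([[0.5, 2.0], [1.2, 0.4]], 1),
]
weights = mce_train(samples, n_features=2)
for scores, true_id in samples:
    fused = fuse(weights, scores)
    print(fused.index(max(fused)) == true_id)
```

On this toy data the weight of the reliable feature grows while the misleading one shrinks, which is the behaviour a discriminatively trained score combination is meant to produce.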