• Title/Summary/Keyword: speech technology


Understanding the semantic change of Hangeul using word embedding (단어 임베딩 기법을 이용한 한글의 의미 변화 파악)

  • Sun, Hyunseok;Lee, Yung-Seop;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.295-308 / 2021
  • In recent years, the development of the internet and computer technologies has led many people to post their interests on social media and to store documents in digital form, and the amount of text data generated has exploded. Accordingly, demand is growing for technology that extracts valuable information from large document collections. In this study, we use statistical techniques to investigate how the meanings of Korean words change over time, drawing on public data from presidential speech records and newspaper articles. Based on this, we present a strategy that can be used to study the diachronic change of Hangeul. The purpose of this study is to move beyond theoretical accounts of Hangeul that rely on the intuition of linguists or native speakers, to derive numerical values from public documents that anyone can use, and to explain changes in word meaning with them.
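The abstract does not say which statistic the authors derive from the embeddings. As a minimal sketch of one common way to quantify meaning change, the code below compares a word's nearest neighbours in two separately trained embedding tables. The toy vectors, the English placeholder words, and the function names are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(word, embeddings, k):
    """Return the set of the k words most similar to `word`."""
    target = embeddings[word]
    scored = [(cosine(target, vec), w) for w, vec in embeddings.items() if w != word]
    scored.sort(reverse=True)
    return {w for _, w in scored[:k]}

def neighbor_shift(word, emb_old, emb_new, k=2):
    """Jaccard distance between neighbour sets: 0 = unchanged, 1 = fully changed."""
    old = nearest_neighbors(word, emb_old, k)
    new = nearest_neighbors(word, emb_new, k)
    return 1.0 - len(old & new) / len(old | new)

# Made-up 2-D vectors standing in for embeddings trained on two periods.
emb_old = {"mouse": [1.0, 0.0], "rat": [0.9, 0.1], "cheese": [0.8, 0.3],
           "keyboard": [0.0, 1.0], "screen": [0.1, 0.9]}
emb_new = dict(emb_old, mouse=[0.1, 1.0])  # "mouse" drifts toward computing terms

print(neighbor_shift("mouse", emb_old, emb_new))
```

Comparing neighbour sets avoids having to align the two embedding spaces, which is why it is used here; other measures are equally possible.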

An analysis study on the quality of article to improve the performance of hate comments discrimination (악성댓글 판별의 성능 향상을 위한 품사 자질에 대한 분석 연구)

  • Kim, Hyoung Ju;Min, Moon Jong;Kim, Pan Koo
    • Smart Media Journal / v.10 no.4 / pp.71-79 / 2021
  • One of the social aspects that has changed as Internet use has become widespread is communication in online spaces. In the past, remote conversation was possible only one-on-one unless the participants were physically in the same place, but today bulletin boards, communities, and social network services enable remote communication with large numbers of people. The development of such information and communication networks has made life more convenient, but the damage caused by rapid information exchange is also steadily increasing. Recently, cybercrimes such as sending sexual messages or personal attacks to people who are well known on the Internet, including influencers as well as entertainers, have occurred, and some victims of these cybercrimes have committed suicide. In this paper, to reduce the damage caused by malicious comments, we study a method for improving the performance of malicious-comment discrimination through feature extraction based on parts of speech.
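To make "feature extraction based on parts of speech" concrete, here is a minimal sketch that turns an already-tagged comment into a vector of tag proportions. The tagset, the example tokens, and the function name are hypothetical. A real pipeline would first need a Korean morphological analyzer and then a classifier, and the abstract does not specify either.

```python
from collections import Counter

# Hypothetical coarse tagset; the paper's actual tags are not given in the abstract.
TAGSET = ("NOUN", "VERB", "ADJ", "ADV", "INTJ")

def pos_features(tagged_tokens, tagset=TAGSET):
    """Map a list of (token, tag) pairs to the proportion of each tag in `tagset`."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts[tag] for tag in tagset)
    if total == 0:
        return [0.0] * len(tagset)
    return [counts[tag] / total for tag in tagset]

# Illustrative, hand-tagged comment (English placeholders).
comment = [("you", "NOUN"), ("idiot", "NOUN"), ("go", "VERB"), ("away", "ADV")]
print(pos_features(comment))  # [0.5, 0.25, 0.0, 0.25, 0.0]
```

The resulting vectors could be fed to any standard classifier.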

Generating Audio Adversarial Examples Using a Query-Efficient Decision-Based Attack (질의 효율적인 의사 결정 공격을 통한 오디오 적대적 예제 생성 연구)

  • Seo, Seong-gwan;Mun, Hyunjun;Son, Baehoon;Yun, Joobeom
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.1 / pp.89-98 / 2022
  • As deep learning technology has been applied to various fields, adversarial attack techniques, which are a security problem of deep learning models, have been actively studied. Adversarial attacks have mainly been studied in the image domain, where fully decision-based attack techniques that need only the model's classification results have recently been developed. In the audio domain, however, research has been relatively slow. In this paper, we apply several decision-based attack techniques to the audio domain and improve on state-of-the-art attack techniques. State-of-the-art decision-based attacks have the disadvantage of requiring many queries for gradient approximation. We improve query efficiency by proposing a method that reduces the vector search space required for gradient approximation. Experimental results show that the attack success rate increased by 50% and the difference between the original audio and the adversarial examples decreased by 75%, demonstrating that our method can generate adversarial examples with smaller noise.
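The abstract does not describe how the search space is reduced. The sketch below illustrates the general idea under a simple assumption: random probes are drawn only over a chosen subset of coordinates, and the hard-label answers are averaged into a direction estimate. The toy decision function, the coordinate subset `dims`, and all names are hypothetical.

```python
import math
import random

def estimate_direction(is_adversarial, x, dims, num_queries=50, delta=0.1, seed=0):
    """Monte Carlo estimate of a boundary-crossing direction from hard-label queries.

    Only the coordinates in `dims` are perturbed, so the random search runs in a
    smaller space than the full input dimension.
    """
    rng = random.Random(seed)
    direction = [0.0] * len(x)
    for _ in range(num_queries):
        u = [0.0] * len(x)
        for d in dims:
            u[d] = rng.gauss(0.0, 1.0)
        probe = [xi + delta * ui for xi, ui in zip(x, u)]
        sign = 1.0 if is_adversarial(probe) else -1.0  # one model query per probe
        for d in dims:
            direction[d] += sign * u[d]
    norm = math.sqrt(sum(v * v for v in direction))
    return [v / norm for v in direction] if norm else direction

# Toy "model": an input counts as adversarial when its first coordinate is positive.
toy_decision = lambda sample: sample[0] > 0
boundary_point = [0.0, 0.0, 0.0, 0.0]
estimate = estimate_direction(toy_decision, boundary_point, dims=[0, 1])
print(estimate)
```

In this toy case the estimate points toward the adversarial side along the first coordinate, and the coordinates outside `dims` are never touched.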

Style Synthesis of Speech Videos Through Generative Adversarial Neural Networks (적대적 생성 신경망을 통한 얼굴 비디오 스타일 합성 연구)

  • Choi, Hee Jo;Park, Goo Man
    • KIPS Transactions on Software and Data Engineering / v.11 no.11 / pp.465-472 / 2022
  • In this paper, a style synthesis network based on StyleGAN and a video synthesis network are trained to generate style-synthesized video. To address the problem that gaze and expression do not transfer stably, 3D face restoration technology is applied so that important features such as head pose, gaze, and expression are controlled using 3D face information. In addition, by training the Head2head network's discriminators for dynamics, mouth shape, image, and gaze, it is possible to create stable style-synthesized video that is more plausible and consistent. Using the FaceForensic dataset and the MetFace dataset, it was confirmed that performance improved: one video is converted into another while the consistent movement of the target face is maintained, and natural data is generated through video synthesis using 3D face information from the face in the source video.

Human Rights in The Context of Digitalization. International-Legal Analysis

  • Panova, Liydmyla;Gramatskyy, Ernest;Kryvosheyina, Inha;Makoda, Volodymyr
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.320-326 / 2022
  • The use of the Internet has become commonplace for billions of people on the planet. The rapid development of technology, in particular mobile gadgets, has provided access to communication anywhere, anytime. At the same time, there are growing concerns about the behavior of people on the Internet, in particular towards each other and towards social groups in general. This raises the issue of human rights in today's information society. In this study, we focus on human rights such as the right to privacy, confidentiality, freedom of expression, the right to be forgotten, etc. We point to some differences in this regard, in particular between the EU, etc. In addition, we describe the latest legal regulation in this area in European countries. Systemic, factual, and formal-legal methods were used to show the factors in the formation and development of human rights in the context of digitalization. The authors indicate which of these rights deserve the most attention because of their prevalence and relevance. We conclude that the technological development of social communications has laid the groundwork for a legal settlement of privacy and opinion issues on the Internet. At the same time, jurisdictions address every aspect of human rights on the Internet on the basis of previous norms, case law, and principles of law. Human rights legislation on the Internet will continue to be actively developed to ensure a balance of private and public interests, safe online access, and unimpeded access to the Internet.

PATTERNS OF ASSIMILATION OF IGBO VOWELS : AN ACOUSTIC ACCOUNT

  • Clara I. Ikekeonwu
    • Proceedings of the KSPS conference / 1996.10a / pp.514-514 / 1996
  • Igbo, a New Benue-Congo language, has a vowel harmony system which, like that of Akan, is based on pharynx size or tongue root position. In this study we examine Igbo vowel harmony with particular reference to the assimilatory patterns of vowels in different harmony sets. The aim is to gain some insight into the factors involved in Igbo vowel assimilation, and to establish to what extent reports on Akan vowel assimilation are validated in Igbo. Tokens of the eight phonemic vowels of Standard Igbo were recorded from three native speakers of Igbo. The vowels were investigated acoustically (using the LPC analysis of CSL) in individual lexical items and within carefully designed carrier phrases. The F1 and F2 values of the vowels were obtained, as these formant values are generally useful in establishing the salient characteristics of vowels. Vowels from the harmony sets were juxtaposed in the carrier phrases to ascertain the extent of assimilation. Results of the investigation show that the F1 values are, to a large extent, enough to characterize these vowels. The (-Expanded) vowels have higher F1 values than their (+Expanded) counterparts. Where F1 values overlap for some vowels, the F1 bandwidth values serve to distinguish between them. The overlap often reported in Akan for /ɪ/ and /e/ on the one hand and /ʊ/ and /o/ on the other is not validated in Igbo. While the F1 values for these pairs of vowels are quite similar for one of our speakers, there is an appreciable difference between the F1 values of these vowels for the other two speakers. There is, however, an overlap for /e/ and /o/ for one of the speakers. Assimilations are generally regressive across word boundaries. It is, however, necessary to point out that the general perceptual impression that one of the vowels completely assimilates to the other is not borne out by our investigation. Most of our F1 and F2 values for the vowels in individual lexical items are altered in assimilations. This suggests that assimilation involving these vowels is partial rather than complete. When vowels from different harmony sets are involved, the emerging 'allophones' are acoustically similar to the (+Expanded) vowel involved in the assimilation. We conclude that while assimilation of Igbo vowels involves some phonological considerations, phonetic factors appear to be paramount in deciding the final form of the vowels.
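The study obtained F1 and F2 with the LPC analysis in CSL. As a rough illustration of how LPC yields formant estimates in general, and not a reproduction of that tool, the sketch below fits an autocorrelation-method LPC model to a synthetic two-resonance signal and reads formants off the peaks of the LPC spectral envelope. The synthetic signal, the 8000 Hz sample rate, the model order, and all function names are assumptions.

```python
import cmath
import math

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def synth_vowel(f1, f2, sample_rate, n=800, radius=0.97):
    """Impulse response of an all-pole filter with resonances at f1 and f2 (Hz)."""
    a = [1.0]
    for f in (f1, f2):
        theta = 2 * math.pi * f / sample_rate
        a = poly_mul(a, [1.0, -2 * radius * math.cos(theta), radius ** 2])
    y = []
    for t in range(n):
        x = 1.0 if t == 0 else 0.0
        y.append(x - sum(a[k] * y[t - k] for k in range(1, len(a)) if t - k >= 0))
    return y

def lpc(signal, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns [1, a1, ..., a_order]."""
    n = len(signal)
    r = [sum(signal[i] * signal[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def formant_peaks(a, sample_rate, step_hz=10):
    """Frequencies (Hz) of local maxima of the LPC envelope 1 / |A(e^jw)|^2."""
    freqs = list(range(0, sample_rate // 2 + 1, step_hz))
    power = []
    for f in freqs:
        w = 2 * math.pi * f / sample_rate
        denom = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        power.append(1.0 / abs(denom) ** 2)
    return [freqs[i] for i in range(1, len(freqs) - 1) if power[i - 1] < power[i] > power[i + 1]]

SAMPLE_RATE = 8000
signal = synth_vowel(500, 1500, SAMPLE_RATE)  # synthetic stand-in for a vowel token
formants = formant_peaks(lpc(signal, order=4), SAMPLE_RATE)
print(formants)  # two peaks, close to the 500 Hz and 1500 Hz resonances
```

Real speech would need windowing, pre-emphasis, and a higher model order; the synthetic signal keeps the example short.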


Efficient Thread Allocation Method of Convolutional Neural Network based on GPGPU (GPGPU 기반 Convolutional Neural Network의 효율적인 스레드 할당 기법)

  • Kim, Mincheol;Lee, Kwangyeob
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.10 / pp.935-943 / 2017
  • CNN (convolutional neural network), which among neural networks learning based on positive data is used for image classification and speech recognition, has been continuously developed toward high-performance structures. However, it is difficult to use in an embedded system with limited resources. Using pre-learned weights helps, but limitations remain, so we use GPGPU (General-Purpose Computing on Graphics Processing Units), in which the GPU is used for general-purpose operations, to address the problem. Since a CNN performs simple, iterative operations, the computation speed varies greatly depending on how threads are allocated and utilized in a Single Instruction Multiple Thread (SIMT) based GPGPU. When Convolution and Pooling operations are performed with threads, some threads are left idle. The proposed method increases the operation speed by using these remaining threads for the computation of the following feature maps and kernels.
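A small worked calculation shows why thread allocation matters. It assumes one thread per output element and a fixed block size, neither of which is stated in the abstract, and the layer sizes are illustrative. It counts only the threads left idle by a naive mapping and does not implement the paper's allocation method.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1):
    """Side length of a 'valid' (unpadded) convolution or pooling output."""
    return (input_size - kernel_size) // stride + 1

def idle_threads(work_items, threads_per_block):
    """Threads launched but left without work when blocks are rounded up."""
    blocks = math.ceil(work_items / threads_per_block)
    return blocks * threads_per_block - work_items

# Illustrative layer: 28x28 input, 5x5 convolution, then 2x2 pooling with stride 2.
conv_side = conv_output_size(28, 5)                   # 24
pool_side = conv_output_size(conv_side, 2, stride=2)  # 12

print(idle_threads(conv_side ** 2, 256))  # 192 idle threads for 576 outputs
print(idle_threads(pool_side ** 2, 256))  # 112 idle threads for 144 outputs
```

Under these assumptions a quarter of the launched threads idle in the convolution layer and almost half in the pooling layer, which is the kind of slack that assigning idle threads to the next feature map would recover.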

A Needs Analysis of Educational Content for Overseas Job Applicants in the Digital Bio-health Industry

  • Soobok Lee;Wootaek Lim
    • Physical Therapy Korea / v.30 no.3 / pp.230-236 / 2023
  • Background: The globalization of the healthcare industry and the increasing demand for skilled professionals in the global healthcare industry have opened up opportunities for specialized biotech healthcare professionals to seek overseas employment and career advancement. Objectives: This study aimed to develop educational content essential for the overseas employment of digital bio-health professionals. Methods: A survey was conducted among 196 participants. Google Forms (Google) was used to create and administer the survey, employing purposive sampling, a non-probability sampling method. Data analysis was performed using IBM SPSS 25.0 (IBM Co.), including Cronbach's α and independent sample t-tests to assess significant differences. Results: About half of the college students were interested in overseas employment and international careers, while the other half were not. The most common reason for wanting to work or go overseas was "foreign experience will be useful for future activities in Korea." Students who had taken courses from the Bio-health Convergence Open Sharing University preferred overseas programs more than those who had not. In terms of the degree of desire for overseas education courses provided by universities, content related to human health ranked highest, followed by bio-health big data. Conclusion: Many students wanted to work or go overseas if there was sufficient support from the university. The findings of this study suggest that universities need to play an important role in supporting students' aspirations to work or go overseas by providing language education, education and training programs, information on overseas jobs, and mentoring programs.
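For readers unfamiliar with the reliability statistic named in the Methods, Cronbach's α for k items is k/(k-1) × (1 − Σ item variances / variance of total scores). The sketch below computes it with the standard library. The scores are made up for illustration and are not the study's data, which was analyzed in SPSS.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha; `item_scores` holds one list of respondent scores per item."""
    k = len(item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Made-up 5-point responses: 3 items answered by 4 respondents.
example_items = [
    [4, 5, 2, 3],
    [4, 4, 2, 3],
    [5, 5, 1, 3],
]
print(round(cronbach_alpha(example_items), 3))
```

Values near 1 indicate that the items move together across respondents.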

Case Study : Cinematography using Digital Human in Tiny Virtual Production (초소형 버추얼 프로덕션 환경에서 디지털 휴먼을 이용한 촬영 사례)

  • Jaeho Im;Minjung Jang;Sang Wook Chun;Subin Lee;Minsoo Park;Yujin Kim
    • Journal of the Korea Computer Graphics Society / v.29 no.3 / pp.21-31 / 2023
  • In this paper, we introduce a case study of cinematography using a digital human in virtual production. The case study covers the system overview of virtual production using LEDs and an efficient filming pipeline using a digital human. Unlike typical virtual production using LEDs, which mainly projects the background on the LEDs, in this case we use a digital human as a virtual actor to film scenes in which it communicates with a real actor. In addition, to film the dialogue scene between the real actor and the digital human using a real-time engine, we automatically generated the speech animation of the digital human in advance by applying our Korean lip-sync technology based on audio and text. We verified this filming case by using a real-time engine to produce short drama content with a real actor and a digital human in an LED-based virtual production environment.

Research on Developing a Conversational AI Callbot Solution for Medical Counselling

  • Won Ro LEE;Jeong Hyon CHOI;Min Soo KANG
    • Korean Journal of Artificial Intelligence / v.11 no.4 / pp.9-13 / 2023
  • In this study, we explored the potential of integrating interactive AI callbot technology into the medical consultation domain as part of a broader service development initiative. Aimed at enhancing patient satisfaction, the AI callbot was designed to efficiently address queries from hospitals' primary users, especially the elderly and those using phone services. By incorporating an AI-driven callbot into the hospital's customer service center, routine tasks such as appointment modifications and cancellations were efficiently managed by the AI Callbot Agent. Tasks requiring more detailed attention or specialization were addressed by Human Agents, ensuring a balanced and collaborative approach. The deep learning model for voice recognition in this study was based on the Transformer model and was fine-tuned from a pre-trained model to fit the medical field. Existing recording files were converted into training data, and an SSL (self-supervised learning) model was implemented. An ANN (artificial neural network) model was used to analyze voice signals and interpret them as text, and after actual deployment, intent recognition was enriched through reinforcement learning to continuously improve accuracy. For TTS (Text To Speech), the Transformer model was applied to text analysis, the acoustic model, and the vocoder, and Google's Natural Language API was applied to recognize intent. As the research progresses, there are challenges to solve, such as interconnection issues between various EMR providers, problems with doctors' time slots, problems with appointments at two or more hospitals, and problems with patient use. However, for specialized tasks in which making reservations is straightforward, implementation of the callbot service in hospitals appears to be applicable immediately.
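The division of labour between the AI Callbot Agent and Human Agents can be sketched as a simple routing rule. The intent labels and function name below are hypothetical. In the described system the intent would come from the speech recognition and intent recognition stages, which are not reproduced here.

```python
# Hypothetical intent labels for the routine tasks named in the abstract.
ROUTINE_INTENTS = {"modify_appointment", "cancel_appointment"}

def route_call(intent):
    """Send routine intents to the callbot and everything else to a human agent."""
    return "callbot" if intent in ROUTINE_INTENTS else "human_agent"

for intent in ("cancel_appointment", "medication_question"):
    print(intent, "->", route_call(intent))
```

Keeping the routine set explicit makes it easy to widen the callbot's scope gradually as recognition accuracy improves.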