• Title/Summary/Keyword: Speech Processing

Search Result 956, Processing Time 0.029 seconds

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.1-13
    • /
    • 2015
  • As opinion mining in big data applications has been highlighted, a lot of research on unstructured data has made. Lots of social media on the Internet generate unstructured or semi-structured data every second and they are often made by natural or human languages we use in daily life. Many words in human languages have multiple meanings or senses. In this result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, resulting in incorrect search results which are far from users' intentions. Even though a lot of progress in enhancing the performance of search engines has made over the last years in order to provide users with appropriate results, there is still so much to improve it. Word sense disambiguation can play a very important role in dealing with natural language processing and is considered as one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-base, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method which automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries and avoids expensive sense tagging processes. It experiments the effectiveness of the method based on Naïve Bayes Model, which is one of supervised learning algorithms, by using Korean standard unabridged dictionary and Sejong Corpus. Korean standard unabridged dictionary has approximately 57,000 sentences. Sejong Corpus has about 790,000 sentences tagged with part-of-speech and senses all together. For the experiment of this study, Korean standard unabridged dictionary and Sejong Corpus were experimented as a combination and separate entities using cross validation. Only nouns, target subjects in word sense disambiguation, were selected. 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. Sejong Corpus was easily merged with Korean standard unabridged dictionary because Sejong Corpus was tagged based on sense indices defined by Korean standard unabridged dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating sense vectors were added in the named entity dictionary of Korean morphological analyzer. By using the extended named entity dictionary, term vectors were extracted from the input sentences and then term vectors for the sentences were created. Given the extracted term vector and the sense vector model made during the pre-processing stage, the sense-tagged terms were determined by the vector space model based word sense disambiguation. In addition, this study shows the effectiveness of merged corpus from examples in Korean standard unabridged dictionary and Sejong Corpus. The experiment shows the better results in precision and recall are found with the merged corpus. This study suggests it can practically enhance the performance of internet search engines and help us to understand more accurate meaning of a sentence in natural language processing pertinent to search engines, opinion mining, and text mining. Naïve Bayes classifier used in this study represents a supervised learning algorithm and uses Bayes theorem. Naïve Bayes classifier has an assumption that all senses are independent. Even though the assumption of Naïve Bayes classifier is not realistic and ignores the correlation between attributes, Naïve Bayes classifier is widely used because of its simplicity and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research need to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.

Cortical Network Activated by Korean Traditional Opera (Pansori): A Functional MR Study

  • Kim, Yun-Hee;Kim, Hyun-Gi;Kim, Seong-Yong;Kim, Hyoung-Ihl;Todd. B. Parrish;Hong, In-Ki;Sohn, Jin-Hun
    • Proceedings of the Korean Society for Emotion and Sensibility Conference
    • /
    • 2000.04a
    • /
    • pp.113-119
    • /
    • 2000
  • The Pansori is a Korean traditional vocal music that has a unique story and melody which converts deep emotion into art. It has both verbal and emotional components. which can be coordinated by large-scale neural network. The purpose of this study is to illustrate the cortical network activated by a Korean traditional opera, Pansori, with different emotional valence using functional MRI (fMRI).Nine right-handed volunteers participated. Their mean age was 25.3 and the mean modified Edinburgh score was +90.1. Activation tasks were designed for the subjects to passively listen to the two parts of Pansories with sad or hilarious emotional valence. White noise was introduced during the control periods. Imaging was conducted on a 1.5T Siemens Vision Vision scanner. Single-shot echoplanar fMRI scans (TR/TE 3840/40 ms, flip angle 90, FOV 220, 64 x 64 matrix, 6mm thickness) were acquired in 20 contiguous slices. Imaging data were motion-corrected, coregistered, normalized, and smoothed using SPM-96 software.Bilateral posterior temporal regions were activated in both of Pansori tasks, but different asymmetry between the tasks was found. The Pansori with sad emotion showed more activation in the light superior temporal regions as well as the right inferior frontal and the orbitofrontal areas than in the right superior temporal regions as well as the right inferior frontal and the orbitofrontal areas than in the left side. In the Pansori with hilarious emotion, there was a remarkable activation in the left hemisphere especially at the posterior temporal and the temporooccipital regions as well as in the left inferior and the prefrontal areas. After subtraction between two tasks, the sad Pansori showed more activation in the right temporoparietal and the orbitofrontal areas, in contrast, the one with hilarious emotion showed more activation in the left temporal and the prefrontal areas. These results suggested that different hemispheric asymmetry and cortical areas are subserved for the processing of different emotional valences carried by the Pansories.

  • PDF

Pivot Discrimination Approach for Paraphrase Extraction from Bilingual Corpus (이중 언어 기반 패러프레이즈 추출을 위한 피봇 차별화 방법)

  • Park, Esther;Lee, Hyoung-Gyu;Kim, Min-Jeong;Rim, Hae-Chang
    • Korean Journal of Cognitive Science
    • /
    • v.22 no.1
    • /
    • pp.57-78
    • /
    • 2011
  • Paraphrasing is the act of writing a text using other words without altering the meaning. Paraphrases can be used in many fields of natural language processing. In particular, paraphrases can be incorporated in machine translation in order to improve the coverage and the quality of translation. Recently, the approaches on paraphrase extraction utilize bilingual parallel corpora, which consist of aligned sentence pairs. In these approaches, paraphrases are identified, from the word alignment result, by pivot phrases which are the phrases in one language to which two or more phrases are connected in the other language. However, the word alignment is itself a very difficult task, so there can be many alignment errors. Moreover, the alignment errors can lead to the problem of selecting incorrect pivot phrases. In this study, we propose a method in paraphrase extraction that discriminates good pivot phrases from bad pivot phrases. Each pivot phrase is weighted according to its reliability, which is scored by considering the lexical and part-of-speech information. The experimental result shows that the proposed method achieves higher precision and recall of the paraphrase extraction than the baseline. Also, we show that the extracted paraphrases can increase the coverage of the Korean-English machine translation.

  • PDF

A Study about the Users's Preferred Playing Speeds on Categorized Video Content using WSOLA method (WSOLA를 이용한 동영상 미세배속 재생 서비스에 대한 콘텐츠별 배속 선호도 분석 연구)

  • Kim, I-Gil
    • Journal of Digital Contents Society
    • /
    • v.16 no.2
    • /
    • pp.291-298
    • /
    • 2015
  • In a fast-paced information technology environment, consumption of video content is changing from one-way television viewing to VOD (Video on Demand) playing anywhere, anytime, on any device. This video-watching trend gives additional importance to videos with fine-speed-control, in addition to the strength of the digital video signal. Currently, many video players provide a fine-speed-control function which can speed up the video to skip a boring part, or slow it down to focus on an exciting scene. The audio information is just as important as the visual information for understanding the content of the speed-controlled video. Thus, a number of algorithms for fine-speed-control video-playing technologies have been proposed to solve the pitch distortion in the audio-processing area. In this study, well-known techniques for prosodic modification of speech signals, WSOLA (Waveform-Similarity-Based Overlap-Add), have been applied to analyze users' needs for fine-speed-control video playing. By surveying the users' preferred speeds on categorized video content and analyzing the results, this paper proposes that various fine-speed adjustments are needed to accommodate users' preferred video consumption.

A Classification Method of Delirium Patients Using Local Covering-Based Rule Acquisition Approach with Rough Lower Approximation (러프 하한 근사를 갖는 로컬 커버링 기반 규칙 획득 기법을 이용한 섬망 환자의 분류 방법)

  • Son, Chang Sik;Kang, Won Seok;Lee, Jong Ha;Moon, Kyoung Ja
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.4
    • /
    • pp.137-144
    • /
    • 2020
  • Delirium is among the most common mental disorders encountered in patients with a temporary cognitive impairment such as consciousness disorder, attention disorder, and poor speech, particularly among those who are older. Delirium is distressing for patients and families, can interfere with the management of symptoms such as pain, and is associated with increased elderly mortality. The purpose of this paper is to generate useful clinical knowledge that can be used to distinguish the outcomes of patients with delirium in long-term care facilities. For this purpose, we extracted the clinical classification knowledge associated with delirium using a local covering rule acquisition approach with the rough lower approximation region. The clinical applicability of the proposed method was verified using data collected from a prospective cohort study. From the results of this study, we found six useful clinical pieces of evidence that the duration of delirium could more than 12 days. Also, we confirmed eight factors such as BMI, Charlson Comorbidity Index, hospitalization path, nutrition deficiency, infection, sleep disturbance, bed scores, and diaper use are important in distinguishing the outcomes of delirium patients. The classification performance of the proposed method was verified by comparison with three benchmarking models, ANN, SVM with RBF kernel, and Random Forest, using a statistical five-fold cross-validation method. The proposed method showed an improved average performance of 0.6% and 2.7% in both accuracy and AUC criteria when compared with the SVM model with the highest classification performance of the three models respectively.

Voice Interactions with A. I. Agent : Analysis of Domestic and Overseas IT Companies (A.I.에이전트와의 보이스 인터랙션 : 국내외 IT회사 사례연구)

  • Lee, Seo-Young
    • Journal of Korea Entertainment Industry Association
    • /
    • v.15 no.4
    • /
    • pp.15-29
    • /
    • 2021
  • Many countries and companies are pursuing and developing Artificial intelligence as it is the core technology of the 4th industrial revolution. Global IT companies such as Apple, Microsoft, Amazon, Google and Samsung have all released their own AI assistant hardware products, hoping to increase customer loyalty and capture market share. Competition within the industry for AI agent is intense. AI assistant products that command the biggest market shares and customer loyalty have a higher chance of becoming the industry standard. This study analyzed the current status of major overseas and domestic IT companies in the field of artificial intelligence, and suggested future strategic directions for voice UI technology development and user satisfaction. In terms of B2B technology, it is recommended that IT companies use cloud computing to store big data, innovative artificial intelligence technologies and natural language technologies. Offering voice recognition technologies on the cloud enables smaller companies to take advantage of such technologies at considerably less expense. Companies also consider using GPT-3(Generative Pre-trained Transformer 3) an open source artificial intelligence language processing software that can generate very natural human-like interactions and high levels of user satisfaction. There is a need to increase usefulness and usability to enhance user satisfaction. This study has practical and theoretical implications for industry and academia.

Development of a Web-based Presentation Attitude Correction Program Centered on Analyzing Facial Features of Videos through Coordinate Calculation (좌표계산을 통해 동영상의 안면 특징점 분석을 중심으로 한 웹 기반 발표 태도 교정 프로그램 개발)

  • Kwon, Kihyeon;An, Suho;Park, Chan Jung
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.2
    • /
    • pp.10-21
    • /
    • 2022
  • In order to improve formal presentation attitudes such as presentation of job interviews and presentation of project results at the company, there are few automated methods other than observation by colleagues or professors. In previous studies, it was reported that the speaker's stable speech and gaze processing affect the delivery power in the presentation. Also, there are studies that show that proper feedback on one's presentation has the effect of increasing the presenter's ability to present. In this paper, considering the positive aspects of correction, we developed a program that intelligently corrects the wrong presentation habits and attitudes of college students through facial analysis of videos and analyzed the proposed program's performance. The proposed program was developed through web-based verification of the use of redundant words and facial recognition and textualization of the presentation contents. To this end, an artificial intelligence model for classification was developed, and after extracting the video object, facial feature points were recognized based on the coordinates. Then, using 4000 facial data, the performance of the algorithm in this paper was compared and analyzed with the case of facial recognition using a Teachable Machine. Use the program to help presenters by correcting their presentation attitude.

The Role of Fundamentalization of Education in Improving the Future Specialists Professional Training with Usage of Multimedia Technologies

  • Palshkov, Kostiantyn;Kochubei, Olena;Tsokur, Olga;Tiahur, Vasyl;Tiahur, Liubomyra;Filimonova, Tetiana;Kuzminskyi, Anatolii
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.9
    • /
    • pp.95-102
    • /
    • 2022
  • The article considers the fundamentalization of education in improving the future specialists professional training with usage of multimedia technologies by various scientists. Various points of view and approaches to defining the concepts of fundamentalization of education and multimedia technologies are identified. The concept of fundamentalization of professional training of a future specialist is based on the goals and functions of fundamentalization and - on the ways and means of achieving it, etc. Most authors agree only in their views that the fundamentalization of education is aimed at improving the quality of education and the education of the individual. Others involve the formation of a culture and worldview, increasing the creative and intellectual potential, forming the professional competence of a specialist and the potential for further education, and so on. The term multimedia refers to interactive systems that provide processing of moving and still video images, animated graphics, high-quality audio and speech. It is found out that professional training of a specialist by means of multimedia technologies includes not only the activities of the teacher and student, which form the learning process, but also the independent activity of the subject, self-development, assimilation of experience by the subject through analysis, comprehension and transformation of the field of activity in which he is included. It is revealed through the implementation of which approaches to the fundamentalization of higher professional education, it becomes possible to fully present theoretical training courses and effectively pass practical training by students, which contributes to improving the quality of training of future specialists in higher education institutions. Theoretical analysis of scientific views indicates a fairly serious attention of scientists to the problem of professional readiness of specialists and the possibility of higher educational institutions in preparing for it. At the same time, professional readiness is considered from different positions: as an active state of a person, which manifests itself in activity; as a result of activity; as goals of activity; as a quality that characterizes the attitude to solving professional problems and social situations; as a prerequisite for purposeful activity; as a form of activity of the subject; as an integral formation of personality; as a component of socio-professional culture; as a complex professionally significant neoplasm of the individual.

Multifaceted Evaluation Methodology for AI Interview Candidates - Integration of Facial Recognition, Voice Analysis, and Natural Language Processing (AI면접 대상자에 대한 다면적 평가방법론 -얼굴인식, 음성분석, 자연어처리 영역의 융합)

  • Hyunwook Ji;Sangjin Lee;Seongmin Mun;Jaeyeol Lee;Dongeun Lee;kyusang Lim
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2024.01a
    • /
    • pp.55-58
    • /
    • 2024
  • 최근 각 기업의 AI 면접시스템 도입이 증가하고 있으며, AI 면접에 대한 실효성 논란 또한 많은 상황이다. 본 논문에서는 AI 면접 과정에서 지원자를 평가하는 방식을 시각, 음성, 자연어처리 3영역에서 구현함으로써, 면접 지원자를 다방면으로 분석 방법론의 적절성에 대해 평가하고자 한다. 첫째, 시각적 측면에서, 면접 지원자의 감정을 인식하기 위해, 합성곱 신경망(CNN) 기법을 활용해, 지원자 얼굴에서 6가지 감정을 인식했으며, 지원자가 카메라를 응시하고 있는지를 시계열로 도출하였다. 이를 통해 지원자가 면접에 임하는 태도와 특히 얼굴에서 드러나는 감정을 분석하는 데 주력했다. 둘째, 시각적 효과만으로 면접자의 태도를 파악하는 데 한계가 있기 때문에, 지원자 음성을 주파수로 환산해 특성을 추출하고, Bidirectional LSTM을 활용해 훈련해 지원자 음성에 따른 6가지 감정을 추출했다. 셋째, 지원자의 발언 내용과 관련해 맥락적 의미를 파악해 지원자의 상태를 파악하기 위해, 음성을 STT(Speech-to-Text) 기법을 이용하여 텍스트로 변환하고, 사용 단어의 빈도를 분석하여 지원자의 언어 습관을 파악했다. 이와 함께, 지원자의 발언 내용에 대한 감정 분석을 위해 KoBERT 모델을 적용했으며, 지원자의 성격, 태도, 직무에 대한 이해도를 파악하기 위해 객관적인 평가지표를 제작하여 적용했다. 논문의 분석 결과 AI 면접의 다면적 평가시스템의 적절성과 관련해, 시각화 부분에서는 상당 부분 정확도가 객관적으로 입증되었다고 판단된다. 음성에서 감정분석 분야는 면접자가 제한된 시간에 모든 유형의 감정을 드러내지 않고, 또 유사한 톤의 말이 진행되다 보니 특정 감정을 나타내는 주파수가 다소 집중되는 현상이 나타났다. 마지막으로 자연어처리 영역은 면접자의 발언에서 나오는 말투, 특정 단어의 빈도수를 넘어, 전체적인 맥락과 느낌을 이해할 수 있는 자연어처리 분석모델의 필요성이 더욱 커졌음을 판단했다.

  • PDF

Modeling of Sensorineural Hearing Loss for the Evaluation of Digital Hearing Aid Algorithms (디지털 보청기 알고리즘 평가를 위한 감음신경성 난청의 모델링)

  • 김동욱;박영철
    • Journal of Biomedical Engineering Research
    • /
    • v.19 no.1
    • /
    • pp.59-68
    • /
    • 1998
  • Digital hearing aids offer many advantages over conventional analog hearing aids. With the advent of high speed digital signal processing chips, new digital techniques have been introduced to digital hearing aids. In addition, the evaluation of new ideas in hearing aids is necessarily accompanied by intensive subject-based clinical tests which requires much time and cost. In this paper, we present an objective method to evaluate and predict the performance of hearing aid systems without the help of such subject-based tests. In the hearing impairment simulation(HIS) algorithm, a sensorineural hearing impairment medel is established from auditory test data of the impaired subject being simulated. Also, the nonlinear behavior of the loudness recruitment is defined using hearing loss functions generated from the measurements. To transform the natural input sound into the impaired one, a frequency sampling filter is designed. The filter is continuously refreshed with the level-dependent frequency response function provided by the impairment model. To assess the performance, the HIS algorithm was implemented in real-time using a floating-point DSP. Signals processed with the real-time system were presented to normal subjects and their auditory data modified by the system was measured. The sensorineural hearing impairment was simulated and tested. The threshold of hearing and the speech discrimination tests exhibited the efficiency of the system in its use for the hearing impairment simulation. Using the HIS system we evaluated three typical hearing aid algorithms.

  • PDF