• Title/Summary/Keyword: Speech Processing

Search Result 960, Processing Time 0.039 seconds

Bi-directional LSTM-CNN-CRF for Korean Named Entity Recognition System with Feature Augmentation (자질 보강과 양방향 LSTM-CNN-CRF 기반의 한국어 개체명 인식 모델)

  • Lee, DongYub;Yu, Wonhee;Lim, HeuiSeok
    • Journal of the Korea Convergence Society
    • /
    • v.8 no.12
    • /
    • pp.55-62
    • /
    • 2017
  • The Named Entity Recognition system is a system that recognizes words or phrases with object names such as personal name (PS), place name (LC), and group name (OG) in the document as corresponding object names. Traditional approaches to named entity recognition include statistical-based models that learn models based on hand-crafted features. Recently, it has been proposed to construct the qualities expressing the sentence using models such as deep-learning based Recurrent Neural Networks (RNN) and long-short term memory (LSTM) to solve the problem of sequence labeling. In this research, to improve the performance of the Korean named entity recognition system, we used a hand-crafted feature, part-of-speech tagging information, and pre-built lexicon information to augment features for representing sentence. Experimental results show that the proposed method improves the performance of Korean named entity recognition system. The results of this study are presented through github for future collaborative research with researchers studying Korean Natural Language Processing (NLP) and named entity recognition system.

Hanja Information in the Entries of Korean Unabridged Dictionary (국어대사전의 표제어에 나타나는 한자 정보)

  • Kim, Cheol-Su
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.4
    • /
    • pp.438-446
    • /
    • 2010
  • For language information processing that includes both Hangul and Hanja, an electronic dictionary supporting Hangul and Hanja simultaneously is necessary. This paper examined statistical information on Hanja entries of Korean Unabridged Dictionary such as the number of entries that include Hanja based on the KSC-5601 character set, the frequency of the pronunciation and meaning of each character of Hanja included in the entries, the frequency per part of speech of Hanja in entries and the average number of Hanja characters per entry. At least one or more of Hanja characters appear in 303,951 entries out of 440,594, accounting for 68.99% of the total. 858,595 characters of Hanja are included in the 440,594 entries, which is 1.95 Hanja characters per entry. As the average syllable length of the entries is 3.56 and the average count of the Hanja characters per entry is 1.96, it can be said that 54.7% of all the characters of the entries are in Hanja. Among 4,888 Hanja character codes, 4,660 are used once or more, whereas 228 Hanja codes never appear in any entry. There were 5 characters which appear more than 4,000 times. A total of 858,595 Hanja characters used in all the entries correspond to 471 Hangeul codes.

Detecting Errors in POS-Tagged Corpus on XGBoost and Cross Validation (XGBoost와 교차검증을 이용한 품사부착말뭉치에서의 오류 탐지)

  • Choi, Min-Seok;Kim, Chang-Hyun;Park, Ho-Min;Cheon, Min-Ah;Yoon, Ho;Namgoong, Young;Kim, Jae-Kyun;Kim, Jae-Hoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.7
    • /
    • pp.221-228
    • /
    • 2020
  • Part-of-Speech (POS) tagged corpus is a collection of electronic text in which each word is annotated with a tag as the corresponding POS and is widely used for various training data for natural language processing. The training data generally assumes that there are no errors, but in reality they include various types of errors, which cause performance degradation of systems trained using the data. To alleviate this problem, we propose a novel method for detecting errors in the existing POS tagged corpus using the classifier of XGBoost and cross-validation as evaluation techniques. We first train a classifier of a POS tagger using the POS-tagged corpus with some errors and then detect errors from the POS-tagged corpus using cross-validation, but the classifier cannot detect errors because there is no training data for detecting POS tagged errors. We thus detect errors by comparing the outputs (probabilities of POS) of the classifier, adjusting hyperparameters. The hyperparameters is estimated by a small scale error-tagged corpus, in which text is sampled from a POS-tagged corpus and which is marked up POS errors by experts. In this paper, we use recall and precision as evaluation metrics which are widely used in information retrieval. We have shown that the proposed method is valid by comparing two distributions of the sample (the error-tagged corpus) and the population (the POS-tagged corpus) because all detected errors cannot be checked. In the near future, we will apply the proposed method to a dependency tree-tagged corpus and a semantic role tagged corpus.

Artificial Intelligence for Assistance of Facial Expression Practice Using Emotion Classification (감정 분류를 이용한 표정 연습 보조 인공지능)

  • Dong-Kyu, Kim;So Hwa, Lee;Jae Hwan, Bong
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.17 no.6
    • /
    • pp.1137-1144
    • /
    • 2022
  • In this study, an artificial intelligence(AI) was developed to help with facial expression practice in order to express emotions. The developed AI used multimodal inputs consisting of sentences and facial images for deep neural networks (DNNs). The DNNs calculated similarities between the emotions predicted by the sentences and the emotions predicted by facial images. The user practiced facial expressions based on the situation given by sentences, and the AI provided the user with numerical feedback based on the similarity between the emotion predicted by sentence and the emotion predicted by facial expression. ResNet34 structure was trained on FER2013 public data to predict emotions from facial images. To predict emotions in sentences, KoBERT model was trained in transfer learning manner using the conversational speech dataset for emotion classification opened to the public by AIHub. The DNN that predicts emotions from the facial images demonstrated 65% accuracy, which is comparable to human emotional classification ability. The DNN that predicts emotions from the sentences achieved 90% accuracy. The performance of the developed AI was evaluated through experiments with changing facial expressions in which an ordinary person was participated.

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.1-13
    • /
    • 2015
  • As opinion mining in big data applications has been highlighted, a lot of research on unstructured data has made. Lots of social media on the Internet generate unstructured or semi-structured data every second and they are often made by natural or human languages we use in daily life. Many words in human languages have multiple meanings or senses. In this result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, resulting in incorrect search results which are far from users' intentions. Even though a lot of progress in enhancing the performance of search engines has made over the last years in order to provide users with appropriate results, there is still so much to improve it. Word sense disambiguation can play a very important role in dealing with natural language processing and is considered as one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-base, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method which automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries and avoids expensive sense tagging processes. It experiments the effectiveness of the method based on Naïve Bayes Model, which is one of supervised learning algorithms, by using Korean standard unabridged dictionary and Sejong Corpus. Korean standard unabridged dictionary has approximately 57,000 sentences. Sejong Corpus has about 790,000 sentences tagged with part-of-speech and senses all together. For the experiment of this study, Korean standard unabridged dictionary and Sejong Corpus were experimented as a combination and separate entities using cross validation. Only nouns, target subjects in word sense disambiguation, were selected. 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. Sejong Corpus was easily merged with Korean standard unabridged dictionary because Sejong Corpus was tagged based on sense indices defined by Korean standard unabridged dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating sense vectors were added in the named entity dictionary of Korean morphological analyzer. By using the extended named entity dictionary, term vectors were extracted from the input sentences and then term vectors for the sentences were created. Given the extracted term vector and the sense vector model made during the pre-processing stage, the sense-tagged terms were determined by the vector space model based word sense disambiguation. In addition, this study shows the effectiveness of merged corpus from examples in Korean standard unabridged dictionary and Sejong Corpus. The experiment shows the better results in precision and recall are found with the merged corpus. This study suggests it can practically enhance the performance of internet search engines and help us to understand more accurate meaning of a sentence in natural language processing pertinent to search engines, opinion mining, and text mining. Naïve Bayes classifier used in this study represents a supervised learning algorithm and uses Bayes theorem. Naïve Bayes classifier has an assumption that all senses are independent. Even though the assumption of Naïve Bayes classifier is not realistic and ignores the correlation between attributes, Naïve Bayes classifier is widely used because of its simplicity and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research need to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.

Cortical Network Activated by Korean Traditional Opera (Pansori): A Functional MR Study

  • Kim, Yun-Hee;Kim, Hyun-Gi;Kim, Seong-Yong;Kim, Hyoung-Ihl;Todd. B. Parrish;Hong, In-Ki;Sohn, Jin-Hun
    • Proceedings of the Korean Society for Emotion and Sensibility Conference
    • /
    • 2000.04a
    • /
    • pp.113-119
    • /
    • 2000
  • The Pansori is a Korean traditional vocal music that has a unique story and melody which converts deep emotion into art. It has both verbal and emotional components. which can be coordinated by large-scale neural network. The purpose of this study is to illustrate the cortical network activated by a Korean traditional opera, Pansori, with different emotional valence using functional MRI (fMRI).Nine right-handed volunteers participated. Their mean age was 25.3 and the mean modified Edinburgh score was +90.1. Activation tasks were designed for the subjects to passively listen to the two parts of Pansories with sad or hilarious emotional valence. White noise was introduced during the control periods. Imaging was conducted on a 1.5T Siemens Vision Vision scanner. Single-shot echoplanar fMRI scans (TR/TE 3840/40 ms, flip angle 90, FOV 220, 64 x 64 matrix, 6mm thickness) were acquired in 20 contiguous slices. Imaging data were motion-corrected, coregistered, normalized, and smoothed using SPM-96 software.Bilateral posterior temporal regions were activated in both of Pansori tasks, but different asymmetry between the tasks was found. The Pansori with sad emotion showed more activation in the light superior temporal regions as well as the right inferior frontal and the orbitofrontal areas than in the right superior temporal regions as well as the right inferior frontal and the orbitofrontal areas than in the left side. In the Pansori with hilarious emotion, there was a remarkable activation in the left hemisphere especially at the posterior temporal and the temporooccipital regions as well as in the left inferior and the prefrontal areas. After subtraction between two tasks, the sad Pansori showed more activation in the right temporoparietal and the orbitofrontal areas, in contrast, the one with hilarious emotion showed more activation in the left temporal and the prefrontal areas. These results suggested that different hemispheric asymmetry and cortical areas are subserved for the processing of different emotional valences carried by the Pansories.

  • PDF

Pivot Discrimination Approach for Paraphrase Extraction from Bilingual Corpus (이중 언어 기반 패러프레이즈 추출을 위한 피봇 차별화 방법)

  • Park, Esther;Lee, Hyoung-Gyu;Kim, Min-Jeong;Rim, Hae-Chang
    • Korean Journal of Cognitive Science
    • /
    • v.22 no.1
    • /
    • pp.57-78
    • /
    • 2011
  • Paraphrasing is the act of writing a text using other words without altering the meaning. Paraphrases can be used in many fields of natural language processing. In particular, paraphrases can be incorporated in machine translation in order to improve the coverage and the quality of translation. Recently, the approaches on paraphrase extraction utilize bilingual parallel corpora, which consist of aligned sentence pairs. In these approaches, paraphrases are identified, from the word alignment result, by pivot phrases which are the phrases in one language to which two or more phrases are connected in the other language. However, the word alignment is itself a very difficult task, so there can be many alignment errors. Moreover, the alignment errors can lead to the problem of selecting incorrect pivot phrases. In this study, we propose a method in paraphrase extraction that discriminates good pivot phrases from bad pivot phrases. Each pivot phrase is weighted according to its reliability, which is scored by considering the lexical and part-of-speech information. The experimental result shows that the proposed method achieves higher precision and recall of the paraphrase extraction than the baseline. Also, we show that the extracted paraphrases can increase the coverage of the Korean-English machine translation.

  • PDF

A Study about the Users's Preferred Playing Speeds on Categorized Video Content using WSOLA method (WSOLA를 이용한 동영상 미세배속 재생 서비스에 대한 콘텐츠별 배속 선호도 분석 연구)

  • Kim, I-Gil
    • Journal of Digital Contents Society
    • /
    • v.16 no.2
    • /
    • pp.291-298
    • /
    • 2015
  • In a fast-paced information technology environment, consumption of video content is changing from one-way television viewing to VOD (Video on Demand) playing anywhere, anytime, on any device. This video-watching trend gives additional importance to videos with fine-speed-control, in addition to the strength of the digital video signal. Currently, many video players provide a fine-speed-control function which can speed up the video to skip a boring part, or slow it down to focus on an exciting scene. The audio information is just as important as the visual information for understanding the content of the speed-controlled video. Thus, a number of algorithms for fine-speed-control video-playing technologies have been proposed to solve the pitch distortion in the audio-processing area. In this study, well-known techniques for prosodic modification of speech signals, WSOLA (Waveform-Similarity-Based Overlap-Add), have been applied to analyze users' needs for fine-speed-control video playing. By surveying the users' preferred speeds on categorized video content and analyzing the results, this paper proposes that various fine-speed adjustments are needed to accommodate users' preferred video consumption.

A Classification Method of Delirium Patients Using Local Covering-Based Rule Acquisition Approach with Rough Lower Approximation (러프 하한 근사를 갖는 로컬 커버링 기반 규칙 획득 기법을 이용한 섬망 환자의 분류 방법)

  • Son, Chang Sik;Kang, Won Seok;Lee, Jong Ha;Moon, Kyoung Ja
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.4
    • /
    • pp.137-144
    • /
    • 2020
  • Delirium is among the most common mental disorders encountered in patients with a temporary cognitive impairment such as consciousness disorder, attention disorder, and poor speech, particularly among those who are older. Delirium is distressing for patients and families, can interfere with the management of symptoms such as pain, and is associated with increased elderly mortality. The purpose of this paper is to generate useful clinical knowledge that can be used to distinguish the outcomes of patients with delirium in long-term care facilities. For this purpose, we extracted the clinical classification knowledge associated with delirium using a local covering rule acquisition approach with the rough lower approximation region. The clinical applicability of the proposed method was verified using data collected from a prospective cohort study. From the results of this study, we found six useful clinical pieces of evidence that the duration of delirium could more than 12 days. Also, we confirmed eight factors such as BMI, Charlson Comorbidity Index, hospitalization path, nutrition deficiency, infection, sleep disturbance, bed scores, and diaper use are important in distinguishing the outcomes of delirium patients. The classification performance of the proposed method was verified by comparison with three benchmarking models, ANN, SVM with RBF kernel, and Random Forest, using a statistical five-fold cross-validation method. The proposed method showed an improved average performance of 0.6% and 2.7% in both accuracy and AUC criteria when compared with the SVM model with the highest classification performance of the three models respectively.

Voice Interactions with A. I. Agent : Analysis of Domestic and Overseas IT Companies (A.I.에이전트와의 보이스 인터랙션 : 국내외 IT회사 사례연구)

  • Lee, Seo-Young
    • Journal of Korea Entertainment Industry Association
    • /
    • v.15 no.4
    • /
    • pp.15-29
    • /
    • 2021
  • Many countries and companies are pursuing and developing Artificial intelligence as it is the core technology of the 4th industrial revolution. Global IT companies such as Apple, Microsoft, Amazon, Google and Samsung have all released their own AI assistant hardware products, hoping to increase customer loyalty and capture market share. Competition within the industry for AI agent is intense. AI assistant products that command the biggest market shares and customer loyalty have a higher chance of becoming the industry standard. This study analyzed the current status of major overseas and domestic IT companies in the field of artificial intelligence, and suggested future strategic directions for voice UI technology development and user satisfaction. In terms of B2B technology, it is recommended that IT companies use cloud computing to store big data, innovative artificial intelligence technologies and natural language technologies. Offering voice recognition technologies on the cloud enables smaller companies to take advantage of such technologies at considerably less expense. Companies also consider using GPT-3(Generative Pre-trained Transformer 3) an open source artificial intelligence language processing software that can generate very natural human-like interactions and high levels of user satisfaction. There is a need to increase usefulness and usability to enhance user satisfaction. This study has practical and theoretical implications for industry and academia.