• Title/Summary/Keyword: word recognition


ManiFL : A Better Natural-Language-Processing Tool Based On Shallow-Learning (ManiFL : 얕은 학습 기반의 더 나은 자연어처리 도구)

  • Shin, Joon-Choul;Kim, Wan-Su;Lee, Ju-Sang;Ock, Cheol-Young
    • Annual Conference on Human and Language Technology / 2021.10a / pp.311-315 / 2021
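  • In recent natural language processing, well-built tools (libraries) have enabled highly productive development and research. Most of these tools are based on deep learning, but such models are slow to train, expensive, and slow at run time. Deep learning is also highly inefficient for word sense disambiguation (tagging homograph and polysemy numbers), where the number of labels is very large or the label set can differ from word to word. These problems did not exist in earlier shallow-learning models, but as the share of deep learning in recent research has grown sharply, no new shallow-learning language models supporting advanced features such as multithreading have been developed. This paper introduces ManiFL (Manifold Feature Labelling), a mixed-feature variable labeler: a natural language processing tool that supports multithreading in both training and tagging and implements the dropout technique studied in deep learning. Experiments show that ManiFL can perform polysemy tagging and that it achieves comparable performance on named entity recognition, a task on which deep learning and CRFsuite perform well.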


Korean Idiom Classification Using Word Embedding (워드 임베딩을 활용한 관용표현 인식 연구)

  • Park, Seo-Yoon;Kang, Ye-Jee;Kang, Hye-Rin;Jang, Yeon-Ji;Kim, Han-Saem
    • Annual Conference on Human and Language Technology / 2020.10a / pp.548-553 / 2020
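  • Everyday language contains idiomatic expressions whose meaning is hard to grasp for people without linguistic intuition, because understanding an idiom requires both a formal and a semantic understanding of the expression. Machines likewise lack linguistic intuition, so natural language processing of idiomatic expressions is difficult. In particular, there is a high risk that an idiom that is ambiguous with a literal expression will be analyzed only literally, without regard to this property. Starting from the assumption that "idiomatic expressions are less related to their surrounding context", this study attempts to distinguish idiomatic from literal expressions using word embeddings. Experiments were carried out on four expressions, and methods using Skip-gram and fastText confirmed that idiomatic expressions have lower similarity to their surrounding words.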
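
The assumption in this abstract (an idiom is less similar to its surrounding words than a literal use) can be illustrated with a minimal sketch. The function names, the toy 3-dimensional vectors, and the 0.3 threshold are all hypothetical; the study itself used Skip-gram and fastText embeddings, and its actual scoring procedure is not given here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def context_relatedness(expr_vec, context_vecs):
    """Mean cosine similarity between an expression and its context words."""
    if not context_vecs:
        return 0.0
    return sum(cosine(expr_vec, c) for c in context_vecs) / len(context_vecs)


def looks_idiomatic(expr_vec, context_vecs, threshold=0.3):
    """Flag a use as idiomatic when it is weakly related to its context.

    The threshold is an illustrative assumption, not a value from the study.
    """
    return context_relatedness(expr_vec, context_vecs) < threshold


# Toy vectors standing in for trained word embeddings (hypothetical data).
expr = [1.0, 0.0, 0.0]
literal_context = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]]
idiom_context = [[0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]

print(looks_idiomatic(expr, literal_context))  # context is similar to the expression
print(looks_idiomatic(expr, idiom_context))    # context is dissimilar
```

In practice the vectors would come from a trained embedding model, and the threshold would have to be tuned per expression.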


A study on QR code-based backup methods to strengthen the security of Cold wallet Purse (콜드월렛 지갑 보안 강화를 위한 QR코드 기반 백업 방안에 대한 연구)

  • Byoung Hoon Choi;JinYong Lee;Nam Hyun Koh;Sam Hyun Chun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.6 / pp.21-26 / 2023
  • Cryptocurrencies such as Ethereum and Bitcoin, known as digital assets, have completely different characteristics from real assets and must be handled carefully and safely. The disadvantage of digital assets is that anyone who knows a wallet's private key can easily steal them. If a seed card is lost, stolen, or exposed while in use, whoever acquires it can recover the private key and use the wallet. In this paper, we aim to protect crypto assets safely by using QR codes when providing the mnemonic words needed to create seed cards.

The Study on the Recognition and the Class Practice Rate of Environmental Education-Relevant Contents in the Unit of 'Clothing Life' of the 7th 'Technology-Home Economics' Curriculum of Middle School (제7차 중학교 '기술.가정' 의생활 단원의 환경교육관련 내용에 관한 학생 인식과 수업실행인식도 조사)

  • Lee, Jong-Soon;Bae, Hyun-Young;Lee, Hye-Ja
    • Journal of Korean Home Economics Education Association / v.21 no.2 / pp.171-185 / 2009
  • We investigated the extent of recognition and the class practice rate of environmental education-relevant contents in the 'Clothing Life' unit of the 7th 'Technology-Home Economics' curriculum among students in Korea. The study enrolled 550 students in the second and third grades of middle school and the first grade of high school who had taken the 'Clothing Life' course and responded to the questionnaires. Questionnaires were sent and collected by mail from December 2007 to January 2008. Most students recognized that the environmental problems where they lived were serious enough to affect their own lives. However, only 36.4% of the students expressed the intention to join environmental groups and change their clothing habits in order to improve those problems. They also considered that the 'Clothing Life' unit, which contains only a few statements on environmental pollution, had little relevance to environmental education. According to the results on the class practice rate, the quality of the classes on environmental education, except in the 'Planning and Purchasing Clothes' unit, and the degree of practice of environmental preservation both proved to be low. Subgroup analysis showed that second-grade middle school students and female students had higher interest and a higher class practice rate. Teachers need to raise the recognition and the class practice rate of environmental education-relevant contents in the 'Clothing Life' unit of the 7th middle school 'Technology-Home Economics' curriculum.


A Study on Parents' Mental Model of Media Environment and Children's Media Use (미디어 환경과 사용에 대한 부모의 심성모형 연구)

  • Lee, Ran;Hong, Jimin
    • The Journal of the Korea Contents Association / v.14 no.12 / pp.818-834 / 2014
  • The purpose of this study is to examine parents' mental model of the media environment and of children's media use, and to provide some educational suggestions. To this end, twelve parents of second- to fourth-graders sampled from elementary schools took part in three activities: a word-association experiment, a sentence completion task, and an in-depth interview. The results were categorized into 8 elements, such as interaction, source of supply, and adverse effects. The analysis of the mental model of media use shows, first, that the parents understand modern media as reflecting competence while feeling both fear and novelty toward media themselves. Second, the parents show an ambivalent understanding of media use, seeing both negative and positive effects, and tend to control it. Third, the parents understand digital media as representing both connection and disconnection. Fourth, the parents see media both as a cause of conflict and as a place for reconciliation. Finally, in the parents' understanding, media is not only a personal territory but also part of the social system. Based on these findings, some interpretations and educational applications for parents are provided in terms of Meyrowitz's (1998; 1999) three perspectives on media.

Improved Decision Tree-Based State Tying In Continuous Speech Recognition System (연속 음성 인식 시스템을 위한 향상된 결정 트리 기반 상태 공유)

  • ;Xintian Wu;Chaojun Liu
    • The Journal of the Acoustical Society of Korea / v.18 no.6 / pp.49-56 / 1999
  • In many continuous speech recognition systems based on HMMs, decision tree-based state tying has been used not only to improve the robustness and accuracy of context-dependent acoustic modeling but also to synthesize unseen models. To construct the phonetic decision tree, the standard method performs one-level pruning using only single-Gaussian triphone models. In this paper, two novel approaches, the two-level decision tree and the multi-mixture decision tree, are proposed to achieve better performance through more accurate acoustic modeling. The two-level decision tree performs two levels of pruning, one for state tying and one for mixture weight tying. With the second level, the tied states can have different mixture weights based on the similarities in their phonetic contexts. In the second approach, the phonetic decision tree continues to be updated through a training sequence of mixture splitting and re-estimation. Multi-mixture Gaussian models as well as single-Gaussian models are used to construct the multi-mixture decision tree. Continuous speech recognition experiments using these approaches on BN-96 and WSJ5k data showed a reduction in word error rate compared with the standard decision tree-based system, given a similar number of tied states.
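
The single-Gaussian baseline that this abstract calls the standard method is commonly implemented by choosing, at each tree node, the phonetic question with the largest log-likelihood gain. The following is a minimal sketch of that selection step under simplifying assumptions: features are 1-dimensional, the toy states and the two questions are hypothetical, and stopping thresholds are omitted. It does not implement the paper's two-level or multi-mixture trees.

```python
import math


def summarize(xs):
    """Sufficient statistics (count, sum, sum of squares) for 1-D samples."""
    return (len(xs), sum(xs), sum(x * x for x in xs))


def pooled_loglik(stats):
    """Log-likelihood of pooled data under its maximum-likelihood Gaussian."""
    n = sum(s[0] for s in stats)
    if n == 0:
        return 0.0
    mean = sum(s[1] for s in stats) / n
    var = max(sum(s[2] for s in stats) / n - mean * mean, 1e-6)  # variance floor
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def split_gain(states, question):
    """Likelihood gain from splitting a pool of states by a yes/no question."""
    yes = [st["stats"] for st in states if question(st["context"])]
    no = [st["stats"] for st in states if not question(st["context"])]
    if not yes or not no:
        return float("-inf")  # the question does not split the pool
    parent = [st["stats"] for st in states]
    return pooled_loglik(yes) + pooled_loglik(no) - pooled_loglik(parent)


def best_question(states, questions):
    """Name of the question with the largest likelihood gain."""
    return max(questions, key=lambda name: split_gain(states, questions[name]))


# Hypothetical triphone states: context is (left phone, right phone).
states = [
    {"context": ("m", "a"), "stats": summarize([1.0, 1.2, 0.8])},
    {"context": ("n", "a"), "stats": summarize([1.1, 0.9, 1.0])},
    {"context": ("s", "a"), "stats": summarize([5.0, 5.2, 4.8])},
    {"context": ("t", "i"), "stats": summarize([5.1, 4.9, 5.0])},
]
questions = {
    "left_is_nasal": lambda c: c[0] in {"m", "n"},
    "right_is_a": lambda c: c[1] == "a",
}
best = best_question(states, questions)
print(best)
```

Because only counts, sums, and sums of squares are needed, each candidate split can be scored without revisiting the training frames, which is one reason single-Gaussian statistics are used for tree building.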


EEG based Vowel Feature Extraction for Speech Recognition System using International Phonetic Alphabet (EEG기반 언어 인식 시스템을 위한 국제음성기호를 이용한 모음 특징 추출 연구)

  • Lee, Tae-Ju;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.1 / pp.90-95 / 2014
  • Research on brain-computer interfaces, a new kind of interface that connects humans to machines, has been carried out to implement user-assistance devices for controlling wheelchairs or entering characters. Recent studies include several attempts to implement speech recognition systems based on brain waves and to achieve silent communication. In this paper, we studied how to extract vowel features based on the International Phonetic Alphabet (IPA), as a foundational step toward a speech recognition system based on electroencephalography (EEG). We conducted a two-step experiment with three healthy male subjects: the first step was speech imagery of a single vowel, and the second was imagery of two successive vowels. Of the 64 channels acquired, we selected 32, covering the frontal lobe, which is related to thinking, and the temporal lobe, which is related to speech function. Eigenvalues of the signal were used as the feature vector, and a support vector machine (SVM) was used for classification. In the first step, we found that a feature vector of order higher than 10 is needed to analyze the EEG speech signal; with an 11th-order feature vector, the highest average classification rate was 95.63% (between /a/ and /o/) and the lowest was 86.85% (between /a/ and /u/). In the second step, we studied the difference in speech imagery signals between single vowels and two successive vowels.

An Investigation on the Future Recognition of Career Counselors and their Future Competency and Future Adaptability change by using the Future Workshop (미래워크숍을 활용한 진로직업상담가의 미래인식과 미래역량 및 미래적응력 변화 탐색)

  • Yeom, In-Sook;Lim, Geum-Hui
    • Journal of Digital Convergence / v.17 no.11 / pp.557-567 / 2019
  • This investigation was conducted to derive the future recognition and future competency of career counselors using future workshops, and to verify the workshops' effectiveness in improving future adaptability. For this purpose, a future workshop was conducted for 25 career counselors, and the written data and discussion contents of the workshop were analyzed. For the analysis, word frequency analysis and a paired-sample t-test were conducted, and the main words were derived through consensus. The results were as follows. First, the keywords of future recognition that appeared with high frequency were robot, artificial intelligence, leisure, education, convenience, and the disabled. Second, future workplaces were projected to change the most because of advanced technology. Third, in career counseling, professional career counselors and robot counselors related to the fourth industrial revolution are expected to appear. Fourth, the future competencies derived for career counselors were information processing ability, professional counseling ability, communication ability, and ethical consciousness. Finally, it was confirmed that the future adaptability of career counselors increased after participating in the future workshop, and the future competencies derived from this study are expected to be used for job training of career counselors.

A Study on Analysis of consumer perception of YouTube advertising using text mining (텍스트 마이닝을 활용한 Youtube 광고에 대한 소비자 인식 분석)

  • Eum, Seong-Won
    • Management & Information Systems Review / v.39 no.2 / pp.181-193 / 2020
  • This study analyzes consumer perception using text mining, a topic of recent interest. We analyzed consumers' perception of the Samsung Galaxy by analyzing consumer reviews of Samsung Galaxy YouTube ads. For the analysis, 1,819 consumer reviews of YouTube ads were extracted. Through data pre-processing, keywords for the advertisements were classified and extracted as nouns, adjectives, and adverbs. Frequency analysis and sentiment analysis were then performed, and finally clustering was performed through CONCOR. The findings are summarized as follows. First, the most frequently mentioned words were Galaxy Note (n = 217), Good (n = 135), Pen (n = 40), and Function (n = 29). The keywords "Galaxy Note", "Good", "Pen", and "Features" suggest that consumers regard Samsung mobile phone products as having good functional aspects and perceive the Note Pen positively. In addition, the appearance of "Samsung Pay", "Innovation", "Design", and "iPhone" shows that Samsung's mobile phones are highly regarded for their innovative design and for the functional aspects of Samsung Pay. Second, in the sentiment analysis of the YouTube advertising, the share of positive sentiment (75.95%) was higher than that of negative sentiment (24.05%), which means that consumers perceive Samsung Galaxy mobile phones positively. In the sentiment keyword analysis, the positive keywords extracted included "good", "good", "innovative", "highest", "fast", and "pretty", and the negative keywords included "frightening", "I want to cry", "discomfort", "sorry", and "no". The implication of this study is as follows. Most existing studies of consumer perception of advertising have relied on quantitative analysis methods; this study departed from quantitative methods and attempted to analyze consumer perception through qualitative research. This is expected to have a great influence on future research, and we believe it will be a starting point for consumer perception research through qualitative research.
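
The frequency analysis and the positive/negative ratio described above can be sketched in a few lines. This is an illustration only: the toy reviews, the whitespace tokenizer, and the two small sentiment word lists are hypothetical, and the study's part-of-speech filtering and CONCOR clustering are not reproduced.

```python
from collections import Counter


def keyword_frequencies(reviews):
    """Count lower-cased, whitespace-separated tokens across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(review.lower().split())
    return counts


def sentiment_ratio(counts, positive, negative):
    """Percentage of positive and negative keyword occurrences."""
    pos = sum(counts[w] for w in positive)
    neg = sum(counts[w] for w in negative)
    total = pos + neg
    if total == 0:
        return 0.0, 0.0
    return 100.0 * pos / total, 100.0 * neg / total


# Hypothetical reviews and lexicons, not data from the study.
reviews = ["good pen good design", "innovative pen", "discomfort no"]
counts = keyword_frequencies(reviews)
ratio = sentiment_ratio(counts, {"good", "innovative"}, {"discomfort", "no"})

print(counts.most_common(2))
print(ratio)
```

The two percentages always sum to 100 when any sentiment keyword is present, which matches the form of the 75.95% / 24.05% figures reported in the abstract.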

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.71-88 / 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently because they are probabilistic models based on the frequency of each unit in the training set. Recently, with the development of deep learning algorithms, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used for neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary is very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only vocabulary contained in the training set. Furthermore, for morphologically rich languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that makes up Korean text. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. The simulation study was done with Old Testament texts using the deep learning package Keras, based on Theano.
After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed an input vector from 20 consecutive characters and an output from the following 21st character. In total, 1,023,411 input-output pairs were included in the dataset, and we divided them into training, validation, and test sets in the proportion 70:15:15. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the time taken to train each model. As a result, all the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, which were clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm also took the longest to train for both the 3- and 4-layer LSTM models. On average, the 4-layer LSTM model took 69% longer to train than the 3-layer LSTM model. However, the validation loss and perplexity did not improve significantly, and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-layer LSTM model tended to generate sentences closer to natural language than the 3-layer LSTM model. Although there were slight differences in the completeness of the generated sentences between the models, the sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfect grammatically. The results of this study are expected to be widely used for Korean language processing in the fields of language processing and speech recognition, which are the basis of artificial intelligence systems.
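
The data preparation described in this abstract (phoneme-level units, 20-character inputs with the 21st character as the target, a 70:15:15 split, and perplexity as a metric) can be sketched as follows. The jamo decomposition uses the standard Unicode arithmetic for precomposed Hangul syllables; whether the paper used exactly this decomposition, shuffled before splitting, or computed perplexity this way is not stated, so the function names and those choices are assumptions.

```python
import math

# Jamo tables in Unicode order for precomposed Hangul syllables (U+AC00..U+D7A3).
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def to_phonemes(text):
    """Decompose Hangul syllables into jamo; pass other characters through."""
    out = []
    for ch in text:
        offset = ord(ch) - 0xAC00
        if 0 <= offset < 19 * 21 * 28:
            out.append(LEADS[offset // (21 * 28)])
            out.append(VOWELS[(offset % (21 * 28)) // 28])
            out.append(TAILS[offset % 28])  # empty string when there is no final
        else:
            out.append(ch)
    return "".join(out)


def make_windows(chars, window=20):
    """Pair each run of `window` characters with the character that follows."""
    return [(chars[i:i + window], chars[i + window])
            for i in range(len(chars) - window)]


def split_70_15_15(pairs):
    """Split in order into train, validation, and test sets (no shuffling)."""
    n_train = len(pairs) * 70 // 100
    n_val = len(pairs) * 15 // 100
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])


def perplexity(target_probs):
    """exp of the mean negative log-probability assigned to the true targets."""
    return math.exp(-sum(math.log(p) for p in target_probs) / len(target_probs))


phonemes = to_phonemes("한글")
pairs = make_windows("abcdefghijklmnopqrstuvwxy")  # 25 characters
train, val, test = split_70_15_15(list(range(100)))
print(phonemes, len(pairs), len(train), len(val), len(test))
```

Each `(input, target)` pair would then be encoded (for example, one-hot over the 74 unique characters) before being fed to the LSTM layers.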