• Title/Summary/Keyword: Word


SSF: Sentence Similar Function Based on word2vector Similar Elements

  • Yuan, Xinpan;Wang, Songlin;Wan, Lanjun;Zhang, Chengyuan
    • Journal of Information Processing Systems
    • /
    • v.15 no.6
    • /
    • pp.1503-1516
    • /
    • 2019
  • In this paper, to improve the accuracy of long-sentence similarity calculation, we propose a sentence similarity calculation method based on a system similarity function. The algorithm uses word2vector representations as the system elements for calculating sentence similarity. Its higher accuracy derives from two characteristics: the negative effect of a penalty item, and the fact that the sentence similar function (SSF) based on word2vector similar elements does not satisfy the exchange rule. In later studies, we found that the time complexity of the algorithm depends on the process of calculating similar elements, so we build an index of potentially similar elements while training the word vectors. Finally, the experimental results show that our algorithm is more accurate than the word mover's distance (WMD) and has the lowest query time among the three SSF calculation methods.
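
The abstract above does not give the SSF formula, so the following is only a minimal sketch of the three ideas it names: a precomputed index of potentially similar words, a penalty for unmatched words, and a score that changes when the two sentences are swapped. The function names, the 0.8 threshold, the 0.5 penalty, and the toy vectors are all assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_similar_index(vectors, threshold=0.8):
    """For every word, store the other words whose cosine similarity is >= threshold.

    Building this once up front means a query only needs dictionary lookups.
    """
    words = list(vectors)
    index = {w: {} for w in words}
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            s = cosine(vectors[w1], vectors[w2])
            if s >= threshold:
                index[w1][w2] = s
                index[w2][w1] = s
    return index


def directed_similarity(query, target, index, penalty=0.5):
    """Average best-match score of each query word against the target sentence.

    Exact matches score 1.0, indexed neighbours score their cosine similarity,
    and words with no match subtract `penalty` (a hypothetical penalty item).
    """
    if not query:
        return 0.0
    target_set = set(target)
    total = 0.0
    for w in query:
        if w in target_set:
            total += 1.0
            continue
        best = max(
            (s for t, s in index.get(w, {}).items() if t in target_set),
            default=None,
        )
        total += best if best is not None else -penalty
    return total / len(query)


# Toy 2-D vectors, invented for the example.
vectors = {"cat": [1.0, 0.0], "kitten": [0.9, 0.1], "car": [0.0, 1.0]}
index = build_similar_index(vectors)
forward = directed_similarity(["cat", "car"], ["kitten"], index)
backward = directed_similarity(["kitten"], ["cat", "car"], index)
print(forward, backward)  # the two directions give different scores
```

Because the score is averaged over the query side only, swapping the arguments changes the result, which is one simple way a similarity function can fail to be symmetric.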

Construction of Korean WordNet (한국어 워드넷의 구축)

  • Lim, Sung-Shin;Lee, Eun-Ryoung;Kwon, Hyuk-Chul
    • Annual Conference on Human and Language Technology
    • /
    • 2004.10d
    • /
    • pp.106-111
    • /
    • 2004
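  • Developing a natural language processing system that understands human language requires a knowledge base for semantic processing. Much effort has gone into bringing human knowledge into computers, and ontologies and thesauri have been built as a result. Researchers abroad recognize the importance of knowledge bases and have carried out extensive research; representative examples include Roget's Thesaurus, WordNet, the EDR concept dictionary, CYC, and EuroWordNet. Among these, the most representative and widely used is Princeton University's WordNet. WordNet is a lexical database of English, the product of psycholinguistic research into human lexical knowledge, that psychologists and linguists have been building for more than ten years. This paper introduces a Korean WordNet constructed for nouns on the basis of WordNet, using an English-Korean dictionary and a Korean dictionary, and presents the basic guidelines considered during its construction.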


Word Embedding Analysis for Biomedical Articles (생의학 문헌에 대한 워드 임베딩 적용 및 분석)

  • Choi, Yunsoo;Jeon, Sunhee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2016.04a
    • /
    • pp.394-395
    • /
    • 2016
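  • Word embedding is a method for solving the problems of the conventional one-hot vector representation of words used in information retrieval and machine learning: a sparse space and the inability to preserve relational information between words. As one word embedding method, word2vec has recently attracted attention as a model that offers fast training and strong results. The quality of word2vec's output varies with the options given at run time, namely the vector dimension and the context size. Mikolov experimented with word2vec on the Google News corpus and proposed suitable options. In this paper, we experiment with various word2vec options on literature specialized in the biomedical domain, rather than general documents such as the Google News corpus, and analyze the optimal conditions for biomedical literature.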
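
To make the "context size" option concrete, here is a minimal sketch of how a symmetric context window determines which (target, context) pairs a skip-gram style model would see. The helper name `context_pairs` and the toy sentence are assumptions for illustration; real word2vec implementations add further details (such as subsampling) that are omitted here.

```python
def context_pairs(tokens, window):
    """Return every (target, context) pair within `window` positions of each token."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Toy sentence, invented for the example.
tokens = ["protein", "binds", "to", "receptor"]
narrow = context_pairs(tokens, window=1)
wide = context_pairs(tokens, window=3)
print(len(narrow), len(wide))
```

With a window of 1, "protein" and "receptor" never appear as a training pair; with a window of 3 they do. This is why the best context size can differ between corpora whose related terms sit at different distances.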

A Study on Categorization of Korean News Article based on CNN using Doc2Vec (Doc2Vec을 활용한 CNN기반 한국어 신문기사 분류에 관한 연구)

  • Kim, Do-Woo;Koo, Myoung-Wan
    • Annual Conference on Human and Language Technology
    • /
    • 2016.10a
    • /
    • pp.67-71
    • /
    • 2016
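  • In this paper, we propose a document classification method that applies word2vec and doc2vec together to a CNN. First, we represented documents as vectors using doc2vec with tokens generated by eojeol (word-phrase), morpheme, and WPM (Word Piece Model) tokenization, respectively, and applied them to a preliminary document classification task; WPM achieved a classification rate of 79.5%, the best of the three methods. Next, we conducted an experiment that used word2vec built from WPM-generated tokens as the CNN input features to classify documents into 10 categories, and an experiment that additionally used doc2vec. With word2vec alone the classification rate was 86.89%, and with doc2vec added it was 89.51%. We thus confirmed that the proposed model improves the classification rate by 2.62 percentage points.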


A Study on the Correlation between English Word-final Stop and Vowel Duration Produced by Speakers of Korean (한국인 영어 학습자의 어말 폐쇄음과 선행 모음 길이의 상관관계 연구)

  • Kim, Ji-Eun
    • Phonetics and Speech Sciences
    • /
    • v.3 no.1
    • /
    • pp.15-22
    • /
    • 2011
  • The purposes of this study are (1) to investigate the correlation between English word-final stops and the duration of the vowels preceding them and (2) to suggest a way to detect pronunciation errors and teach the pronunciation of English word-final stops. To this end, the productions of 18 Korean speakers were recorded, analysed using Speech Analyzer, and compared with those of native English speakers. In addition, two native English speakers evaluated the subjects' pronunciation. The major findings are that the voicing-dependent effect on English vowels produced by native Korean speakers is smaller than that of native English speakers; that Korean speakers release English word-final stops less often than native English speakers; and that the pronunciation of English word-final stops and the duration of adjacent vowels are closely related, in that the pronunciation score for final stops correlates with the ratio of vowel duration before voiced stops to that before voiceless stops. The study concludes with pedagogical suggestions that may be useful for teaching English pronunciation.


On Characteristics of Word Embeddings by the Word2vec Model (Word2vec 모델의 단어 임베딩 특성 연구)

  • Kang, Hyungsuc;Yang, Janghoon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.05a
    • /
    • pp.263-266
    • /
    • 2019
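  • The word2vec model, currently among the most widely used word embedding models, is known to reflect the semantic similarity of language well. This paper aims to check how well word vectors trained with word2vec actually reflect semantic similarity; that is, whether words of similar categories are embedded close together in the vector space and whether words of distinct categories are embedded clearly apart. Verification with a simple clustering algorithm showed that, contrary to common-sense linguistic knowledge, words of certain categories are not clearly separated in the embedded vector space. In conclusion, similarity between word vectors does not always imply semantic similarity between the corresponding words. Future studies that apply word2vec results should take this limitation into account.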
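
The abstract does not say which clustering algorithm was used. As a simpler stand-in (not the authors' procedure), the sketch below checks the same question by comparing the mean cosine similarity of word pairs within a category against pairs across categories. The function name `category_separation`, the category labels, and the toy vectors are assumptions for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def category_separation(vectors, categories):
    """Return (mean within-category similarity, mean across-category similarity)."""
    within, across = [], []
    for (w1, c1), (w2, c2) in combinations(categories.items(), 2):
        s = cosine(vectors[w1], vectors[w2])
        (within if c1 == c2 else across).append(s)

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(within), mean(across)


# Toy 2-D vectors and labels, invented for the example.
vectors = {
    "apple": [1.0, 0.1],
    "pear": [0.9, 0.2],
    "bus": [0.1, 1.0],
    "train": [0.2, 0.9],
}
categories = {"apple": "fruit", "pear": "fruit", "bus": "vehicle", "train": "vehicle"}
within, across = category_separation(vectors, categories)
print(within, across)
```

If the within-category mean is not clearly larger than the across-category mean for real embeddings, the categories are not well separated in the vector space, which is the kind of limitation the abstract reports.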

"Word of Mouth" in the Chain Restaurant Industry (체인 레스토랑 산업에서 고객의 '구전 효과' 형성에 관한 연구)

  • Hyun, Sung-Hyup;Heo, Cindy Yoon-Joung
    • Journal of the East Asian Society of Dietary Life
    • /
    • v.20 no.4
    • /
    • pp.606-618
    • /
    • 2010
  • The study investigated how 'word of mouth' originates in the chain restaurant industry. It has long been acknowledged that 'word of mouth' is a critical factor in the success of a restaurant business because of its targetability and cost effectiveness. A review of the literature revealed four antecedents of 'word of mouth': service quality, perceived value, satisfaction, and relationship quality. Based on the theoretical and empirical relationships between those constructs, a structural model composed of the hypotheses was proposed and tested with data collected from 471 chain restaurant patrons. The structural equation modeling analysis revealed that the five constructs in the proposed model are interrelated and that word of mouth is formed through this process. Positive relationships between service quality and satisfaction (0.265, p<0.05), service quality and perceived value (0.831, p<0.05), service quality and relationship quality (0.465, p<0.05), and service quality and WOM (0.263, p<0.05) were found, indicating that service quality is a key prerequisite for word of mouth and the other constructs in the model. Perceived value does not have a direct impact on WOM formation (t=1.275, p=0.202), but positive relationships between perceived value and satisfaction (0.293, p<0.05) and between satisfaction and WOM (0.627, p<0.05) were found. It was therefore concluded that patrons' perceived value influences word of mouth formation, but that the impact is mediated by satisfaction. Relationship quality also plays a mediating role in generating word of mouth. Based on the data analysis, theoretical and managerial implications are discussed.

The characteristics of eye-movement in Korean sentence reading: cluster length, word frequency, and landing position effects (우리 문장 읽기에서 안구 운동의 특성: 어절 길이, 단어 빈도 및 착지점 관련 효과)

  • Koh, Sung-Ryong;Yoon, Nak-Yeong
    • Korean Journal of Cognitive Science
    • /
    • v.18 no.4
    • /
    • pp.325-350
    • /
    • 2007
  • This study investigated global and local characteristics of eye movement while 16 college students read 48 easy Korean sentences. Readers fixated for about 225 ms on a word cluster (eojeol), made forward saccades of about 3.6 characters to the next word, skipped short, high-frequency words about 25% of the time during first-pass reading, and regressed backward about 19% of the time. There were also individual differences in readers' patterns of fixations and saccades. In addition, the effects of word cluster length and word frequency and the effects related to landing position were examined. The eyes landed on the center of a word cluster more frequently than on its boundaries. When the eyes landed at the boundaries, they refixated the word cluster more frequently. Word clusters containing high-frequency words were read faster than those containing low-frequency words.


Study on the Meaning of Nasal discharge(涕) in Five fluids (오액(五液) 중(中) '체(涕)'의 의미에 대한 고찰)

  • Jang, Heewon;Song, Jichung;Eom, Dongmyung
    • Journal of Korean Medical classics
    • /
    • v.29 no.3
    • /
    • pp.75-80
    • /
    • 2016
  • Objectives: The paper objects to the use of the word '涕' to refer to nasal discharge and, after studying a set of medical books, proposes a word for nasal discharge. Methods: The authors confirm the dictionary definition of '涕' and study how it is used differently in medical books. Through this study, the authors show that the word '涕' is used incorrectly and infer the reason. They examine the old form of the word '涕' and its etymological origin, conjecture the word that should have been used to refer to nasal discharge, and find examples where this word is more appropriate. Results & Conclusions: In medical books such as Huangdineijing Suwen, '涕' is used to mean nasal discharge, but the word's dictionary definition does not validate such usage. Yugunryeombu (劉君廉夫), in its commentary on Somun, used '?' and '鼻夷' for '涕'; '?' means nasal discharge, and is used in the same way as '涕' when it is used to mean tear. This phenomenon originated from '弟' and '夷' being used interchangeably, which led to the incorrect usage of '?'. To refer to nasal discharge, one should use '?'. '鼻夷' is believed to be the same word as '弟鼻', which is the old form of '?', and it means both tear (pronounced 'Che') and nasal discharge (pronounced 'Je'). However, the pronunciation difference between 'Che' and 'Je', and the definition as tear, were separated out in later periods into '涕', following the shape of '弟'. Following the shape of '夷', the meaning of nasal discharge remains in '?', which retains the pronunciation 'yi'. Therefore, the word '涕' used to mean nasal discharge is an incorrect form of '?', and all such instances should be rewritten as '?'.

Categorization of Korean News Articles Based on Convolutional Neural Network Using Doc2Vec and Word2Vec (Doc2Vec과 Word2Vec을 활용한 Convolutional Neural Network 기반 한국어 신문 기사 분류)

  • Kim, Dowoo;Koo, Myoung-Wan
    • Journal of KIISE
    • /
    • v.44 no.7
    • /
    • pp.742-747
    • /
    • 2017
  • In this paper, we propose a novel approach that improves the performance of a Convolutional Neural Network (CNN) built on word2vec word embeddings by also using doc2vec in a document classification task. The Word Piece Model (WPM) is shown empirically to outperform other tokenization methods, such as phrase-unit tokenization and a part-of-speech tagger (classification rate: 79.5%). Further, we conducted an experiment classifying news articles written in Korean into ten categories by feeding word and document vectors generated with WPM to the baseline and the proposed model. The proposed model showed a higher classification rate (89.88%) than its counterpart (86.89%), achieving a 22.80% improvement. This research demonstrates that applying doc2vec in the document classification task yields more effective results because doc2vec generates similar document vector representations for documents belonging to the same category.
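
The abstract does not state how the 22.80% figure was computed. The worked arithmetic below shows one reading that is consistent with it: the gain expressed as a share of the baseline's classification error rather than as a difference in classification rates. The function name `improvement` is hypothetical, and this interpretation is an assumption.

```python
def improvement(baseline_rate, proposed_rate):
    """Return (absolute gain in percentage points, relative error reduction in %)."""
    absolute = proposed_rate - baseline_rate
    baseline_error = 100.0 - baseline_rate
    relative_error_reduction = absolute / baseline_error * 100.0
    return absolute, relative_error_reduction


# Classification rates quoted in the abstract above.
abs_gain, rel_err = improvement(86.89, 89.88)
print(f"{abs_gain:.2f} percentage points; {rel_err:.1f}% of the baseline error removed")
```

Under this reading the absolute gain is 2.99 percentage points, and the baseline error of 13.11% falls to 10.12%, a relative reduction of roughly 22.8%.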