• Title/Abstract/Keywords: Similar Word

415 search results; processing time 0.027 seconds

An Analysis of the Vowel Formants of the Young Females in the Buckeye Corpus

  • 윤규철
    • Phonetics and Speech Sciences
    • /
    • Vol. 4, No. 4
    • /
    • pp.45-52
    • /
    • 2012
  • The purpose of this paper is to measure the first two vowel formants of the ten young female speakers from the Buckeye Corpus of Conversational Speech [1] automatically and then to analyze various potential factors that may affect the formant distribution of the eight peripheral vowels of English. The factors that were analyzed included the place of articulation, the content versus function word information, the syllabic stress information, the location in a word, the location in an utterance, the speech rate of the three consecutive words, and the word frequency in the corpus. The results indicate that the overall formant patterns of the female speakers were similar to those of earlier works. The effects of the factors on the realization of the two formants were also similar to those from the male speakers with minor differences.

Extracting Alternative Word Candidates for Patent Information Search

  • 백종범;김성민;이수원
    • Journal of KIISE: Computing Practices and Letters
    • /
    • Vol. 15, No. 4
    • /
    • pp.299-303
    • /
    • 2009
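  • Patent information search is mainly used as a preliminary survey to check whether prior work exists before research and technology development begin. Such patent searches fail to return the desired information for various reasons. Among these, this study addresses keyword mismatch and proposes a method for extracting alternative word candidates that minimizes the resulting information loss. The method rests on the intuitive hypothesis that two words whose co-occurring words within sentences are similar will have similar meanings. To extract alternative words that satisfy this hypothesis, the study proposes category-wise concentration, associated word bundles built using confidence, cosine similarity between associated word bundles, and a rank correction technique. The method was evaluated by measuring recall against evaluation sets prepared for each type of alternative word, and it outperformed the document vector space model.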
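The hypothesis above (words that share sentence neighbors tend to share meaning) can be illustrated with a minimal sketch. It covers only the cosine similarity step over raw co-occurrence counts; the paper's category-wise concentration, confidence-based bundles, and rank correction are not reproduced. The function names and the toy sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences):
    """Map each word to a Counter of the words that share a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = set(sentence)
        for word in words:
            for other in words:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_alternatives(word, vectors):
    """Rank every other word by the similarity of its co-occurrence vector."""
    target = vectors[word]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy, invented sentences: "battery" and "accumulator" never co-occur,
# but they appear with the same neighbors.
sentences = [
    ["battery", "cell", "stores", "energy"],
    ["accumulator", "cell", "stores", "energy"],
    ["display", "panel", "emits", "light"],
]
vectors = cooccurrence_vectors(sentences)
ranking = rank_alternatives("battery", vectors)
```

In this toy data the top-ranked candidate for "battery" is "accumulator", because the two words have identical neighbor sets even though they never appear in the same sentence.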

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity

  • 이민석;양석우;이홍주
    • Journal of Intelligence and Information Systems
    • /
    • Vol. 25, No. 4
    • /
    • pp.105-122
    • /
    • 2019
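  • In sentence classification, which determines whether text data belongs to a particular category, how the features of a sentence are represented and which features are selected strongly affect classifier performance. The goal of feature selection is to find a way to describe the data well even after its dimensionality is reduced. Various methods have been proposed: selecting features with algorithms such as Fisher Score or Information Gain, and reducing dimensionality by representing words as vectors learned with the Word2Vec model, which captures contextual meaning and syntactic information, have both been actively studied. Modifying word embeddings according to predefined positive and negative word scores has also been attempted. This study proposes improving sentence classification accuracy by selectively removing words and then applying embeddings. It considers two approaches: one removes words with low information gain from the text data and applies word embedding; the other additionally selects neighboring words that have high cosine similarity to the low-information-gain words, removes them from the text data as well, and rebuilds the word embedding. The data consisted of customer reviews of the 'Kindle' product on Amazon.com, movie reviews from IMDB, and user reviews from Yelp. An Amazon.com review was judged helpful if it had at least 5 helpful votes and helpful votes made up at least 70% of all its votes. For Yelp, 100,000 reviews were randomly sampled from about 750,000 reviews with at least 5 helpful votes. The deep learning models used for training were a CNN and an Attention-Based Bidirectional LSTM, and the word embeddings were Word2Vec and GloVe. The case of applying Word2Vec and GloVe embeddings without word removal was compared with the proposed case of selectively removing words and applying Word2Vec embedding, and statistical significance was tested.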
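The first approach above, removing words with low information gain before embedding, can be sketched as follows. This is a minimal illustration under stated assumptions: information gain is computed for the binary presence of a word in a document, the threshold value is arbitrary, and the toy reviews are invented. The abstract does not give the authors' exact formulation, and the second approach (also removing neighbors with high cosine similarity) is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, word):
    """Reduction in label entropy from knowing whether `word` is present."""
    with_word = [y for doc, y in zip(docs, labels) if word in doc]
    without_word = [y for doc, y in zip(docs, labels) if word not in doc]
    n = len(labels)
    conditional = ((len(with_word) / n) * entropy(with_word)
                   + (len(without_word) / n) * entropy(without_word))
    return entropy(labels) - conditional


def remove_low_gain_words(docs, labels, threshold):
    """Drop every word whose information gain is below `threshold`."""
    vocabulary = {word for doc in docs for word in doc}
    gains = {word: information_gain(docs, labels, word) for word in vocabulary}
    filtered = [[word for word in doc if gains[word] >= threshold]
                for doc in docs]
    return filtered, gains


# Toy, invented reviews: label 1 = helpful/positive, 0 = not.
docs = [
    ["great", "the", "book"],
    ["awful", "the", "book"],
    ["great", "the", "screen"],
    ["awful", "the", "screen"],
]
labels = [1, 0, 1, 0]
filtered, gains = remove_low_gain_words(docs, labels, threshold=0.5)
```

Here "the", "book", and "screen" appear equally often in both classes, so their gain is zero and they are removed; only "great" and "awful" remain as input for the embedding step.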

The Phonetic Difference Between the Korean Stop Series /p,t,k/ and the English /b,d,g/ Based on the VOT Value

  • Kang, Insun
    • Korean Journal of English Language and Linguistics
    • /
    • Vol. 3, No. 3
    • /
    • pp.427-452
    • /
    • 2003
  • Korean is well known for having only voiceless stops. Korean does have voiced stops, but they are considered to exist only as allophones of word-initial /p, t, k/. My experiment shows that the English word-initial stops [b, d, g] and the Korean lax stop series /p, t, k/ in word-initial position are similar in their range of voice onset time (VOT). If English word-initial [b, d, g] are posited as voiced, then Korean word-initial /p, t, k/ should also be classified as voiced. Phonetically, the English /b, d, g/ phonemes and the Korean /p, t, k/ phonemes are very similar, except that Korean word-initial [p, t, k] are devoiced slightly more than English word-initial [b, d, g], though not enough to be classified as voiceless. Positing /b, d, g/ as Korean phonemes explains why the Korean /p, t, k/ series has the allophones [b, d, g] rather than the fortis stops /p', t', k'/, even though /p', t', k'/ have a less positive VOT value than /p, t, k/. Positing /b, d, g/ as Korean phonemes also causes no spelling or pronunciation confusion, whether Koreans learn English or English speakers learn Korean.

A Study on the Costume Terminologies of the Chosun Period (I: Focusing on Clothing-Related Terms)

  • 김진구
    • The Research Journal of the Costume Culture
    • /
    • Vol. 9, No. 3
    • /
    • pp.523-531
    • /
    • 2001
  • The objective of this study was to trace the origins of costume terminologies and to identify the meanings of the names of costumes of the Chosun Period. The terms examined were dukgai 得盖, murot gai 무롯지 or murukai 무루깨, bal 발, bigya 비갸, bium 비음, samachi 사마치, chiene 처네, and chienui 薦衣. The results can be summarized as follows. Words similar to dukgai were found in the languages of the arctic regions and in Mongolian, English, Sumerian, and Latin. Dukgai of Chosun is considered to be related to Latin toga. The word murot gai or murukai, a kind of head covering, had its origins in a Korean word meaning to cover or to wear. The word bal was derived from Latin palla, meaning a robe, cloak, or mantle. Korean bal 발 meant a dang jugori 당저고리 or dang go ui, a kind of women's formal outer dress. The word bium or biim, a garment of Yi Chosun, was similar to Assyrian birmu, a garment. The word samachi of Yi Chosun was derived from the Manchurian word samachi, meaning a kind of military skirt. The word chiene 처네 or chienui was derived from the Chinese chien (Equations. See Full-text), which means a skirt, a child's covering, a sheet, and women's underwear.

A Study on the Textile Terminologies of the Chosun Period (Costume Terminologies of the Chosun Period II: Focusing on Textile-Related Terms)

  • 김진구
    • The Research Journal of the Costume Culture
    • /
    • Vol. 9, No. 3
    • /
    • pp.532-536
    • /
    • 2001
  • This study is concerned with the textile-related terminologies of the Chosun period. Its purpose was to trace and examine textile-related terms such as goro, mooruwi, modan, shiok, jal, gaam, and chien. These words were analyzed in terms of their origins, meanings, and neighbouring languages. The results can be summarized as follows. The word goro of the Chosun period was derived from the Chinese ku lo 羅 or (Equations. See Full-text). Korean goro or goroi is a transliteration of the Chinese moolo 霧羅. The word modan 帽緞 referred to a kind of rich silk fabric. Manchurian kamku 帽緞 was derived from the Arabic word kamkha. The word shiok, shiok, shiuk, shiurk, or shiu 시으 means felt in Korean. Words similar to Korean shiok were found in the Afro-Asiatic family, in Egyptian, Hebrew, and Assyrian. Egyptian shiu means a sheep or a goat. The word jal, meaning black sable, originated in the Chinese tzuerl 子兒皮, black sable. The Korean word gaam 가암, 가음 was similar to Mongolian k∂m, meaning a material. Iraqi Arabic xaam, meaning raw, unworked, or unprocessed, also had the same meaning as the Korean gaam. Xaam and gaam sound almost the same. The Korean gaam was derived from the Iraqi Arabic xaam. Korean chien, meaning cloth, was derived from the Chinese chyan or chien (Equations. See Full-text).

A Comparative Study of Word Embedding Models for Arabic Text Processing

  • Assiri, Fatmah;Alghamdi, Nuha
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 8
    • /
    • pp.399-403
    • /
    • 2022
  • Natural-language texts are analyzed to obtain their intended meaning so that they can be classified according to the problem under study. One way to represent words is to generate vectors of real values that encode their meaning; this is called word embedding. Similarities between word representations are measured to identify the text class. Word embeddings can be created using the word2vec technique. More recently, however, fastText was implemented to provide better results when used with classifiers. In this paper, we study the performance of well-known classifiers when using both word embedding techniques on an Arabic dataset. We applied them to real data collected from Wikipedia and found that word2vec and fastText had similar accuracy with all the classifiers used.

Structuring Risk Factors of Industrial Incidents Using Natural Language Process

  • 강성식;장성록;이종빈;서용윤
    • Journal of the Korean Society of Safety
    • /
    • Vol. 36, No. 1
    • /
    • pp.56-63
    • /
    • 2021
  • The narrative texts of industrial accident reports help to identify accident risk factors. They relate the accident triggers to the sequence of events and the outcomes of an accident. In particular, a set of related keywords in the context of the narrative can represent how the accident proceeded. Previous studies on text analytics for structuring accident reports have been limited to extracting individual keywords without context. To remedy this shortcoming, we propose a context-based analysis using a Natural Language Processing (NLP) algorithm. This study applies Word2Vec, an NLP algorithm, to extract adjacent keywords; this technique, known as word embedding, is carried out by a neural network algorithm based on supervised learning. During processing, Word2Vec takes adjacent keywords in the narrative texts as inputs for its supervised learning; keyword weights emerge as vectors representing the degree of neighboring among keywords. Similar keyword weights mean that the keywords are closely arranged within sentences in the narrative text. Consequently, a set of keywords with similar weights represents similar accidents. We extracted ten accident processes containing related keywords and used them to understand the risk factors that determine how an accident proceeds. This information helps identify how a checklist for an accident report should be structured.
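The idea of feeding adjacent keywords to Word2Vec can be made concrete with a minimal sketch of how (center, context) training pairs are drawn from a tokenized narrative. This assumes a skip-gram style setup with a fixed window; the abstract does not say which Word2Vec variant or window size the authors used, and the function name and the sample report tokens are invented for illustration. The neural network training itself is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for words within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Invented keyword sequence standing in for one accident narrative.
report = ["worker", "ladder", "slip", "fall", "fracture"]
pairs = skipgram_pairs(report, window=1)
```

With a window of 1, only directly neighboring keywords are paired, so ("slip", "fall") is a training pair while ("worker", "fall") is not. Keywords that repeatedly share such pairs across reports end up with similar vectors after training.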

A Study on nam ja mok kai (南子木蓋)

  • 김진구
    • The Research Journal of the Costume Culture
    • /
    • Vol. 6, No. 3
    • /
    • pp.1-5
    • /
    • 1998
  • The objective of this study was to identify and interpret the word nam ja mok kai (南子木蓋) in Keirim Yusa (鷄林類事). Comparative linguistic analytical approaches were employed for this research. The results and findings of this study can be summarized as follows. Words similar to jamok kai (子木蓋) of Koryo were found in Mongolic and Manchuric languages as well as in Hebrew. Thus, the word nam ja mok kai (南子木蓋) is not a reversed form of nam mok ja kai (南子木蓋). The word jamokkai and its meaning were derived from Hebrew.

Twitter Hashtags Clustering with Word Embedding

  • 티엔윙안;양형정
    • Proceedings of the Korea Contents Association Conference
    • /
    • Korea Contents Association 2019 Spring Conference
    • /
    • pp.179-180
    • /
    • 2019
  • Clustering algorithms are now considered a promising solution for numerous machine learning tasks on social media data, which is massive and lacks human labels. Many researchers have proposed disaster event detection systems that can identify special local events, such as missing people or public transport damage, by clustering similar tweets and hashtags together. In this paper, we try to extend the tweet hashtag feature definition by applying word embedding. The experimental results show that word embedding achieves better performance than the reference method.