• Title/Summary/Keyword: Similar Word

Speech Verification using Similar Word Information in Isolated Word Recognition (고립단어 인식에 유사단어 정보를 이용한 단어의 검증)

  • 백창흠;이기정;홍재근
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.1255-1258
    • /
    • 1998
  • The Hidden Markov Model (HMM) is the most widely used method in speech recognition. In general, HMM parameters are trained for maximum likelihood (ML) on the training data, which does not take discrimination against other words into account. To address this problem, this paper proposes a word verification method that re-recognizes the recognized word and its similar word using a discriminative function between the two words. The similar word is selected by calculating the probability of the other words for each HMM. A recognizer that discriminates between the words is realized by weighting each state, and the weights are computed with a genetic algorithm. (An illustrative sketch follows this entry.)

  • PDF
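
A minimal sketch of the verification step described above, assuming per-state log-likelihoods are already available for the recognized word and its similar competitor. The array values, weight values, and function names are illustrative placeholders; the paper obtains the state weights with a genetic algorithm rather than fixing them by hand.

```python
# Hypothetical sketch: verify a recognized word against its most similar
# competitor by re-scoring both with per-state weights.
import numpy as np

def weighted_word_score(state_log_likelihoods, state_weights):
    """Weighted sum of per-state log-likelihoods for one word's HMM."""
    return float(np.dot(state_log_likelihoods, state_weights))

def verify(recognized_ll, similar_ll, recognized_w, similar_w):
    """Accept the recognized word only if its weighted score is at least
    as high as the weighted score of its most similar competitor."""
    return weighted_word_score(recognized_ll, recognized_w) >= \
           weighted_word_score(similar_ll, similar_w)

# Toy usage with made-up numbers for 3-state HMMs:
rec_ll, sim_ll = np.array([-4.1, -3.7, -5.0]), np.array([-4.3, -3.9, -4.8])
rec_w,  sim_w  = np.array([1.2, 0.9, 1.0]),   np.array([1.0, 1.0, 1.0])
print("accept recognized word:", verify(rec_ll, sim_ll, rec_w, sim_w))
```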

SSF: Sentence Similar Function Based on word2vector Similar Elements

  • Yuan, Xinpan;Wang, Songlin;Wan, Lanjun;Zhang, Chengyuan
    • Journal of Information Processing Systems
    • /
    • v.15 no.6
    • /
    • pp.1503-1516
    • /
    • 2019
  • In this paper, to improve the accuracy of long-sentence similarity calculation, we propose a sentence similarity calculation method based on a system similarity function. The algorithm uses word2vector vectors as the system elements for calculating sentence similarity. The higher accuracy of our algorithm derives from two characteristics: the negative effect of the penalty term, and the fact that the sentence similar function (SSF) based on word2vector similar elements does not satisfy the exchange rule. In later studies, we found that the time complexity of our algorithm depends on the process of calculating similar elements, so we build an index of potentially similar elements while training the word vectors. Finally, the experimental results show that our algorithm has higher accuracy than the word mover's distance (WMD) and the shortest query time of the three SSF calculation methods.
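
The abstract does not give the SSF formula, so the following is only a rough, hypothetical sketch of a word-vector sentence similarity with a penalty for unmatched words and an asymmetric (non-commutative) matching direction, in the spirit described. The `threshold` and `penalty` parameters and the function names are my own illustration, not the paper's definitions.

```python
# Rough sketch: match each source word to its best target word by cosine
# similarity; poorly matched words contribute a penalty instead of a score.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sentence_similarity(src_vecs, tgt_vecs, threshold=0.5, penalty=0.1):
    """Asymmetric similarity: matching runs from source words to target words,
    so swapping the arguments can change the result (no exchange rule)."""
    total = 0.0
    for u in src_vecs:
        best = max(cosine(u, v) for v in tgt_vecs)
        total += best if best >= threshold else -penalty
    return total / len(src_vecs)

# Toy usage with 2-dimensional stand-in word vectors:
src = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
tgt = [np.array([0.9, 0.1])]
print(sentence_similarity(src, tgt))
```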

Exclusion of Non-similar Candidates using Positional Accuracy based on Levenstein Distance from N-best Recognition Results of Isolated Word Recognition (레벤스타인 거리에 기초한 위치 정확도를 이용한 고립 단어 인식 결과의 비유사 후보 단어 제외)

  • Yun, Young-Sun;Kang, Jeom-Ja
    • Phonetics and Speech Sciences
    • /
    • v.1 no.3
    • /
    • pp.109-115
    • /
    • 2009
  • Many isolated word recognition systems may generate non-similar words as recognition candidates because they use only acoustic information. In this paper, we investigate several techniques that can exclude non-similar words from the N-best candidate words by applying the Levenshtein distance measure. First, word distance methods based on phone and syllable distances are considered; these methods use the plain Levenshtein distance on phones or a double Levenshtein distance algorithm on the syllables of the candidates. Next, word similarity approaches that use the positional information of the candidates' characters are presented. Each character is labeled as inserted, deleted, or correct after alignment between the source and target strings, and word similarities are obtained from the characters' positional probabilities, i.e., the frequency ratio of the same character being observed at that position. Experimental results show that the proposed methods are effective for removing non-similar words from the N-best recognition candidates without loss of system performance. (An illustrative sketch follows this entry.)

  • PDF
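
A minimal sketch of the basic filtering idea, assuming the N-best list is simply a list of candidate strings: compute the Levenshtein distance of each candidate to the top hypothesis and drop candidates that are too far away. The distance threshold and the toy candidates are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def filter_nbest(nbest, max_distance=2):
    """Keep only candidates within `max_distance` edits of the 1-best result."""
    top = nbest[0]
    return [w for w in nbest if levenshtein(top, w) <= max_distance]

print(filter_nbest(["감자", "감좌", "바나나"], max_distance=1))  # drops "바나나"
```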

Word Verification using Similar Word Information and State-Weights of HMM using Genetic Algorithm (유사단어 정보와 유전자 알고리듬을 이용한 HMM의 상태하중값을 사용한 단어의 검증)

  • Kim, Gwang-Tae;Baek, Chang-Heum;Hong, Jae-Geun
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.38 no.1
    • /
    • pp.97-103
    • /
    • 2001
  • The Hidden Markov Model (HMM) is the most widely used method in speech recognition. In general, HMM parameters are trained for maximum likelihood (ML) on the training data. Although the ML method performs well, it does not take discrimination against other words into account. To address this problem, a word verification method is proposed that re-recognizes the recognized word and its similar word using a discriminative function between the two words. To find the similar word, the probability of the other words for the HMM is calculated, and the word showing the highest probability is selected as the similar word of the model. To achieve discrimination between words, a weight for each state is appended to the HMM parameters; the weights are calculated by a genetic algorithm. The verifier improves discrimination between words and reduces the errors caused by similar words, decreasing the total error by about 22%. (An illustrative sketch of the weight search follows this entry.)

  • PDF
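
A minimal, generic genetic-algorithm sketch of the kind of weight search the paper describes. The fitness function here is a made-up stand-in that rewards separating a correct word's per-state log-likelihoods from its similar word's; the population size, crossover, and mutation choices are mine, not the paper's.

```python
# Hypothetical GA sketch for finding per-state weights that separate a word
# from its most similar competitor.  All numeric values are placeholders.
import random
import numpy as np

rng = np.random.default_rng(0)
CORRECT = np.array([-4.1, -3.7, -5.0])   # stand-in per-state log-likelihoods
SIMILAR = np.array([-4.3, -3.9, -4.8])

def fitness(weights):
    # Reward weights that widen the score gap between the two words.
    return float(np.dot(CORRECT - SIMILAR, weights))

def evolve(pop_size=20, n_states=3, generations=50, mutation=0.1):
    pop = [rng.uniform(0.5, 1.5, n_states) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                 # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_states)        # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child += rng.normal(0.0, mutation, n_states)  # Gaussian mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print("best state weights:", evolve())
```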

Korean Language Clustering using Word2Vec (Word2Vec를 이용한 한국어 단어 군집화 기법)

  • Heu, Jee-Uk
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.18 no.5
    • /
    • pp.25-30
    • /
    • 2018
  • Recently, with the development of Internet technology, research areas such as information retrieval and data extraction have become important for providing information efficiently and quickly. In particular, techniques for finding semantically similar words for a given Korean word, such as compound words or newly coined words, are necessary because their meanings are not easy to determine. To handle this problem, word clustering is a technique that groups words similar to a given word. In this paper, we propose a Korean word clustering technique that embeds the words of the given documents using Word2Vec and then clusters similar words.
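
A minimal sketch of such a pipeline, assuming pre-tokenized Korean sentences. gensim and scikit-learn are used here as illustrative tool choices, and the toy corpus and cluster count are placeholders; the paper's actual preprocessing and clustering settings are not given in the abstract.

```python
# Embed words with Word2Vec, then group them with k-means clustering.
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

sentences = [["자연어", "처리", "기술"],
             ["단어", "임베딩", "기술"],
             ["단어", "군집화", "방법"]]          # toy tokenized corpus

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

words = model.wv.index_to_key                      # learned vocabulary
vectors = model.wv[words]                          # embedding matrix
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for word, label in zip(words, labels):
    print(label, word)                             # cluster id per word
```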

Various Approaches to Improve Exclusion Performance of Non-similar Candidates from N-best Recognition Results on Isolated Word Recognition (고립 단어 인식 결과의 비유사 후보 단어 제외 성능을 개선하기 위한 다양한 접근 방법 연구)

  • Yun, Young-Sun
    • Phonetics and Speech Sciences
    • /
    • v.2 no.4
    • /
    • pp.153-161
    • /
    • 2010
  • Many isolated word recognition systems may generate non-similar words as recognition candidates because they use only acoustic information. The previous studies [1,2] investigated several techniques that can exclude non-similar words from the N-best candidate words by applying the Levenshtein distance measure. This paper discusses various techniques for improving the removal of non-similar recognition results. The methods include comparison penalties or weights, phone accuracy based on confusion information, weighting candidates by ranking order, and partial comparisons. Experimental results show that some of the proposed methods retain more accurate recognition results than the previous methods. (An illustrative sketch follows this entry.)

  • PDF
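
A hypothetical sketch of one of the listed ideas: an edit distance whose substitution cost comes from a phone-confusion table, so acoustically confusable phones are penalized less than arbitrary substitutions. The confusion pairs, costs, and example inputs are invented for illustration and are not taken from the paper.

```python
# Weighted edit distance over phone sequences with confusion-based
# substitution costs (toy values).
CONFUSION_COST = {("ㅂ", "ㅍ"): 0.3, ("ㄱ", "ㅋ"): 0.3}   # made-up confusions

def sub_cost(x, y):
    if x == y:
        return 0.0
    return CONFUSION_COST.get((x, y), CONFUSION_COST.get((y, x), 1.0))

def weighted_edit_distance(a, b, ins=1.0, dele=1.0):
    prev = [j * ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        curr = [i * dele]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + dele,               # deletion
                            curr[j - 1] + ins,            # insertion
                            prev[j - 1] + sub_cost(ca, cb)))
        prev = curr
    return prev[-1]

# ㅂ/ㅍ are confusable, so the distance is only 0.3 instead of 1.0:
print(weighted_edit_distance(list("ㅂㅏㄷㅏ"), list("ㅍㅏㄷㅏ")))
```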

A Study on the Integration of Similar Sentences in Automatic Summarizing of Document (자동초록 작성시에 발생하는 유사의미 문장요소들의 통합에 관한 연구)

  • Lee, Tae-Young
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.34 no.2
    • /
    • pp.87-115
    • /
    • 2000
  • The effects of case, part of speech, word and clause location, word frequency, etc. were studied for discriminating similar sentences in Korean text. Word frequency was strongly related to the discrimination of similarity, title words and functional clauses were weakly related, and the other features were not. The cosine coefficient and Salton's similarity measure are used to measure the similarity between sentences. The change of clauses between sentences is also used to unify similar sentences into a representative sentence. (An illustrative sketch of the cosine comparison follows this entry.)

  • PDF
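
A minimal sketch of the cosine-coefficient comparison mentioned above, computed over simple term-frequency vectors of two tokenized sentences; the tokenization and example sentences are illustrative placeholders.

```python
# Cosine coefficient between the term-frequency vectors of two sentences.
import math
from collections import Counter

def cosine_similarity(sent_a, sent_b):
    tf_a, tf_b = Counter(sent_a), Counter(sent_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a)
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine_similarity("정보 검색 시스템".split(), "정보 검색 모델".split()))
```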

The Analysis of a Causal Relationship of Traditional Korean Restaurant's Well-Being Attribute Selection on Customers' Re-Visitation and Word-of-Mouth

  • Baek, Hang-Sun;Shin, Chung-Sub;Lee, Sang-Youn
    • East Asian Journal of Business Economics (EAJBE)
    • /
    • v.4 no.2
    • /
    • pp.48-60
    • /
    • 2016
  • This study analyzes what effects a restaurant's well-being attribute selection has on word-of-mouth intention. Based on the results, it aims to provide basic data for establishing service and marketing strategies for Korean restaurants. The researchers surveyed 350 customers who visited a Korean restaurant located in Kangbook, Seoul; the gathered data were encoded and analyzed with the SPSS 17.0 statistics package. The results are as follows. First, under hypothesis 1 (a Korean restaurant's well-being attribute selection will have a positive influence on re-visitation intention), sufficiency, healthiness, and steadiness have a similar influence on re-visitation intention. Second, under hypothesis 2 (a Korean restaurant's well-being attribute selection will have a positive influence on word-of-mouth intention), sufficiency, healthiness, environment, and steadiness have a similar influence on word-of-mouth intention. Third, under hypothesis 3 (a Korean restaurant's re-visitation intention will have a positive influence on word-of-mouth intention), eliciting a customer's re-visitation intention also influences word-of-mouth intention. It will be necessary to consider how to elicit customers' re-visitation or word-of-mouth intention in light of the factors that customers of traditional Korean restaurants value.

A Word Embedding using Word Sense and Feature Mirror Model (단어 의미와 자질 거울 모델을 이용한 단어 임베딩)

  • Lee, JuSang;Shin, JoonChoul;Ock, CheolYoung
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.4
    • /
    • pp.226-231
    • /
    • 2017
  • Word representation, an important area of natural language processing (NLP) that uses machine learning, is a method of representing a word not as text but as a distinguishable symbol. Existing word embedding methods employ large corpora and position words that occur near each other in text close to each other in vector space. However, corpus-based word embedding requires many corpora because of word occurrence frequencies and the growing number of words. In this paper, word embedding is performed using dictionary definitions and semantic relationship information (hypernyms and antonyms). Words are trained using the feature mirror model (FMM), a modified Skip-Gram (Word2Vec). Semantically similar words obtain similar vectors, and it is also possible to distinguish the vectors of antonyms.
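
The abstract does not specify how the feature mirror model is trained, so the following is only a heavily simplified, hypothetical sketch of the general dictionary-definition idea: each headword is trained together with the words of its gloss using an ordinary gensim Skip-Gram. The toy dictionary, its glosses, and all parameters are my own placeholders, and the actual FMM differs from plain Skip-Gram.

```python
# Hypothetical sketch: treat a headword and its dictionary-definition words
# as one training "sentence" and embed them with a standard Skip-Gram.
from gensim.models import Word2Vec

dictionary = {
    "바다": ["지구", "표면", "짠", "물"],            # toy glosses
    "호수": ["육지", "둘러싸인", "물"],
    "산":   ["평지", "보다", "높이", "솟은", "땅"],
}

sentences = [[head] + gloss for head, gloss in dictionary.items()]
model = Word2Vec(sentences, vector_size=20, window=5, min_count=1,
                 sg=1, epochs=200)                   # sg=1: Skip-Gram

print(model.wv.most_similar("바다", topn=3))
```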

Phonological Error Patterns of Korean Children With Specific Phonological Disorders (정상 아동과 기능적 음운장애 아동의 음운 오류 비교)

  • Kim, Min-Jung;Pae, So-Yeong
    • Speech Sciences
    • /
    • v.7 no.2
    • /
    • pp.7-18
    • /
    • 2000
  • The purpose of this study was to investigate the phonological error patterns of Korean children with and without specific phonological disorders (SPD). Twenty-nine normally developing children and 10 children with SPD were involved, matched on the percentage of consonants correct (PCC). Twenty-two picture cards were used to elicit Korean consonants in word-initial syllable-initial, word-medial syllable-initial, word-medial syllable-final, and word-final syllable-final positions. The findings were as follows. First, the phonological error patterns of the SPD children were 1) similar to those of normal children with the same PCC, 2) similar to those of normal children with a lower PCC, or 3) unusual compared to those of normal children. Second, Korean children showed phonological processes reflecting Korean phonological characteristics: tensification and reduction of the word-medial syllable-final consonant. This study suggests that both the PCC and error patterns should be considered when assessing children's phonological abilities.

  • PDF