• Title/Summary/Keyword: 음소 유사도 (phoneme similarity)

Recognition Method of Korean Abnormal Language for Spam Mail Filtering (스팸메일 필터링을 위한 한글 변칙어 인식 방법)

  • Ahn, Hee-Kook;Han, Uk-Pyo;Shin, Seung-Ho;Yang, Dong-Il;Roh, Hee-Young
    • Journal of Advanced Navigation Technology
    • /
    • v.15 no.2
    • /
    • pp.287-297
    • /
    • 2011
  • As electronic mail has become widely used for its convenience and speed, the volume of malicious and advertising spam has also grown, causing many social and economic problems. A number of approaches have been proposed to alleviate the impact of spam. These approaches can be categorized into pre-acceptance and post-acceptance methods. Post-acceptance methods include Bayesian filters, collaborative filtering, and e-mail prioritization, all of which operate on words or sentences. Spammers, however, alter those features to evade filtering systems. In Korean, such abnormal usages can be far more varied than in other languages because each syllable is composed of a chosung, a jungsung, and a jongsung. Existing formal expressions and learning algorithms are limited in how promptly and efficiently they can keep up with these changes. We therefore present a method for recognizing Korean abnormal language (Koral) to improve the accuracy and efficiency of filtering systems. The method operates on syllables rather than words and is based on the Smith-Waterman algorithm. In experiments on filter keywords and e-mails extracted from a mail server, we confirmed that Koral is recognized accurately according to the similarity level. The required time and space costs are within the permitted limits.
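
The abstract above names the Smith-Waterman algorithm but does not give the scoring scheme or the exact alignment unit. The following is a minimal sketch, not the authors' implementation: it assumes that syllables are decomposed into jamo (chosung, jungsung, jongsung) through Unicode NFD normalization, and that the match, mismatch, and gap scores and the normalization in `similarity` are hypothetical choices made only for illustration.

```python
import unicodedata


def to_jamo(text):
    """Split Hangul syllables into conjoining jamo using Unicode NFD."""
    return list(unicodedata.normalize("NFD", text))


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a new local alignment
                score[i - 1][j - 1] + step,  # match or mismatch
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )
            best = max(best, score[i][j])
    return best


def similarity(keyword, candidate, match=2):
    """Local alignment score divided by the keyword's perfect-match score."""
    key = to_jamo(keyword)
    if not key:
        return 0.0
    return smith_waterman(key, to_jamo(candidate), match=match) / (match * len(key))
```

A filter built on this sketch would compare `similarity(keyword, candidate)` against a chosen similarity level; a keyword written with an inserted space or symbol scores slightly below an exact match, while unrelated text scores near zero.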

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently because they are probabilistic models based on the frequency of each unit in the training set. Recently, with the development of deep learning algorithms, recurrent neural network (RNN) models and long short-term memory (LSTM) models have been widely used for neural language modeling (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, because a training set of sentences generally includes a huge number of words or morphemes, the dictionary is very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for morphologically rich languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to make errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that makes up Korean text. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was carried out on Old Testament texts using the deep learning package Keras on top of Theano.
After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed each input vector from 20 consecutive characters, with the following 21st character as the output. In total, 1,023,411 input-output pairs were included in the dataset, and we divided them into training, validation, and test sets in the proportion 70:15:15. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the time taken to train each model. All the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, which were clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm also took the longest to train for both the 3- and 4-LSTM-layer models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model. However, its validation loss and perplexity did not improve significantly and even became worse under some conditions. On the other hand, when the automatically generated sentences were compared, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM-layer model. Although there were slight differences between the models in the completeness of the generated sentences, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfectly grammatical. The results of this study are expected to be widely used for Korean language processing in the fields of language processing and speech recognition, which are the basis of artificial intelligence systems.
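
The data preparation described above (20 consecutive characters as input, the 21st as output, and a 70:15:15 split) can be illustrated with a short sketch. It is written under assumptions: the function names are hypothetical, the split is a plain contiguous split because the abstract does not say how samples were assigned, and `perplexity` uses the standard definition (the exponential of the mean negative log-probability), which may differ in detail from the paper's computation.

```python
import math


def make_windows(chars, window=20):
    """Pair each run of `window` characters with the character that follows it."""
    inputs, targets = [], []
    for start in range(len(chars) - window):
        inputs.append(chars[start:start + window])
        targets.append(chars[start + window])
    return inputs, targets


def split_sizes(n, train_pct=70, val_pct=15):
    """Return (train, validation, test) sample counts for n samples.

    Integer arithmetic avoids floating-point rounding; the remainder
    goes to the test set.
    """
    train = n * train_pct // 100
    val = n * val_pct // 100
    return train, val, n - train - val


def perplexity(true_char_probs):
    """Perplexity from the probabilities a model assigned to the true next characters."""
    mean_nll = -sum(math.log(p) for p in true_char_probs) / len(true_char_probs)
    return math.exp(mean_nll)
```

A model that spreads its probability evenly over four candidate characters has a perplexity of 4, which gives a rough intuition for comparing values across models.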

A Study on Spoken Digits Analysis and Recognition (숫자음 분석과 인식에 관한 연구)

  • 김득수;황철준
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.6 no.3
    • /
    • pp.107-114
    • /
    • 2001
  • This paper describes connected digit recognition for Korean that takes acoustic features into account. The recognition rate for connected digits is usually lower than that for word recognition. Therefore, speech feature parameters and acoustic features are employed to build a robust digit model, and we confirmed the effect of considering acoustic features through recognition experiments. We used the KLE 4 connected digit database and 19 continuous-distribution HMMs as PLUs (Phoneme Like Units) based on phonetic rules. We tested two cases in the recognition experiments. In the first case, we used the usual method of constructing the phoneme model from Mel-Cepstrum and regressive coefficients. In the second case, we constructed the phoneme model from expanded feature parameters and acoustic features. In both cases, we employed OPDP (One Pass Dynamic Programming) and FSA (Finite State Automata) for the recognition tests. When applying FSN for recognition, we applied various acoustic features. As a result, we obtained a recognition rate of 55.4% for Mel-Cepstrum and 67.4% for Mel-Cepstrum with regressive coefficients. We also obtained a recognition rate of 74.3% for the expanded feature parameters and 75.4% when applying acoustic features. Since applying acoustic features gave better results than the former method, we could confirm that the suggested method is effective for connected digit recognition in Korean.

Improvement of Recognition Speed for Real-time Address Speech Recognition (실시간 주소 음성인식을 위한 인식 시스템의 인식속도 개선)

  • Hwang Cheol-Jun;Oh Se-Jin;Kim Bum-Koog;Jung Ho-Youl;Chung Hyun-Yeol
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.74-77
    • /
    • 1999
  • This paper proposes a method that applies a new variable pruning threshold to improve the recognition speed of the address speech recognition system developed in our laboratory, and confirms its effectiveness experimentally. The conventional variable pruning threshold repeatedly lowers the threshold by a fixed value once a fixed number of frames has elapsed, so it ends up exploring unnecessary parts of the search space. The newly proposed method lowers the threshold by a fixed amount after an initial fixed interval, but in the following frames it varies the threshold according to the candidates to be searched before pruning, so it can reduce the search space effectively. To confirm its effectiveness, the proposed method was applied to the Korean address input system developed in our laboratory. The system uses 48 continuous HMM phoneme-like units (PLUs) as the basic recognition unit, uses maximum a posteriori probability estimation (MAP) to minimize the degradation in recognition performance caused by changes in the usage environment, and uses the One Pass Dynamic Programming (OPDP) method as the recognition algorithm. In recognition experiments on 75 connected address names spoken by three male speakers, the fixed pruning threshold gave an average recognition rate of 96.0% and a recognition time of 5.26 seconds, and the conventional variable pruning threshold gave an average recognition rate of 96.0% and a recognition time of 5.1 seconds. With the new variable pruning threshold, the recognition time fell to 4.34 seconds with no loss in recognition rate, which is 0.92 seconds and 0.76 seconds less than the two existing methods respectively, confirming the effectiveness of the proposed method.

A Frame Unit Based Adaptive Pruning Algorithm for the Fast Speech Recognition (음성인식의 고속화를 위한 프레임 단위 적응 프루닝 알고리즘)

  • Hwang Cheol-Jun;Oh Se-Jin;Kim Bum-Koog;Jung Ho-Youl;Chung Hyun-Yeol
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.183-186
    • /
    • 2000
  • This paper proposes a new frame-unit adaptive pruning algorithm that speeds up speech recognition by effectively reducing the search space while recognition is in progress, and confirms its effectiveness experimentally. The algorithm is based on the fact that the maximum probabilities of adjacent frames are highly correlated, so the pruning threshold can be obtained effectively from the maximum probability of the preceding frame. In this method, the pruning threshold of the current frame is updated from a combination of the maximum likelihood probability of the preceding frame and the candidate probabilities. Because the threshold is obtained during recognition, no preliminary experiment is needed to determine it when the recognition task changes. In addition, a threshold obtained adaptively frame by frame can also improve recognition speed in other environments. To confirm its effectiveness, the proposed algorithm was applied to a Korean address recognition system. The system uses 48 phoneme-like units (PLUs) as the basic recognition unit, maximum a posteriori probability estimation (MAP) as the adaptation algorithm, and One Pass Dynamic Programming (OPDP) as the recognition algorithm. In recognition experiments in which three male speakers spoke 25 connected address names, the proposed frame-unit adaptive pruning threshold reduced the search space by 14.4% and 9.14% relative to the conventional fixed and variable pruning thresholds, respectively, with no change in recognition rate, confirming the effectiveness of the proposed frame-unit adaptive pruning algorithm.
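
The abstract does not give the formula that combines the preceding frame's maximum probability with the candidate probabilities, so the following is only a toy sketch of the general idea of re-estimating a pruning beam every frame from the previous frame's scores. Everything in it is an assumption made for illustration: the update rule (beam width proportional to the gap between the previous best and the previous mean log score), the parameters `alpha` and `min_width`, and the representation of each frame as a dictionary of hypothesis log scores.

```python
def beam_width(prev_scores, alpha=1.0, min_width=1.0):
    """Assumed rule: width grows with the spread of the previous frame's scores."""
    best = max(prev_scores)
    mean = sum(prev_scores) / len(prev_scores)
    return max(min_width, alpha * (best - mean))


def adaptive_prune(frames, alpha=1.0, min_width=1.0):
    """Prune hypotheses frame by frame with a beam re-estimated each frame.

    frames: list of dicts mapping a hypothesis label to its log score
            at that frame (a toy stand-in for a decoder's active paths).
    Returns the list of surviving {hypothesis: score} dicts per frame.
    """
    survivors = []
    prev = None
    for scores in frames:
        if prev is None:
            kept = dict(scores)  # nothing to adapt from on the first frame
        else:
            # Only hypotheses that survived the previous frame are extended.
            active = {h: s for h, s in scores.items() if h in prev}
            width = beam_width(list(prev.values()), alpha, min_width)
            best = max(active.values())
            kept = {h: s for h, s in active.items() if s >= best - width}
        survivors.append(kept)
        prev = kept
    return survivors
```

Because the width is derived from scores observed during decoding, no task-specific threshold has to be tuned beforehand, which mirrors the motivation stated in the abstract even though the actual update rule in the paper may differ.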

A Study on Applicability of Neem Resin as a Fixative on the Painting Layer of Mural Paintings from Payathonzu Temple in Bagan, Myanmar (미얀마 바간유적 파야톤주 사원벽화의 채색층 고착처리를 위한 님(Neem) 수지 적용 가능성 연구)

  • Eum, Sojeong;Lee, Hwasoo
    • Conservation Science in Museum
    • /
    • v.24
    • /
    • pp.117-132
    • /
    • 2020
  • The painting layer of the Payathonzu temple mural paintings in the ruins of Bagan, Myanmar, has been damaged for various reasons. In this study, the applicability of Neem resin, a traditional Myanmar adhesive, as a fixative on the painting layer was examined. Cow glue and Paraloid B-72 were selected as fixatives for the comparison group, and pseudo-specimens with conditions similar to the original mural paintings were produced to examine the changes before and after applying the fixatives and after the deterioration experiments. The experiments and the comparison of fixatives showed that surface changes such as smudging, yellowing, and gloss are greater with higher concentrations of Neem resin than with the other fixatives. However, such changes were relatively small at a concentration of 4%. It was also confirmed that chromaticity and glossiness vary greatly before and after application of the fixatives, but that these discrepancies tend to be smaller at 4% than at other concentrations. As for fixation strength, the fixation capacity of Neem resin on the base and painting layers was found to be higher overall than that of the other fixatives as the concentration increased. Therefore, the applicability of Neem resin at 4% concentration as a fixative on the painting layer was confirmed, considering its small surface changes under environmental factors, low color discrepancy and glossiness, and excellent fixation strength. It is believed that the findings of this study could be used as basic data for the future preservation of the Payathonzu temple mural paintings.

Mantle Ultrastructure of the Spiny Top Shell, Batillus cornutus (Gastropoda: Turbinidae) (소라(Batillus cornutus) 외투막의 미세구조)

  • Jung, Gui-Kwon;Park, Jung-Jun;Jin, Young-Guk;Ju, Sun-Mi;Lee, Jae-Woo;Jung, Ae-Jin;Lee, Jung-Sick
    • The Korean Journal of Malacology
    • /
    • v.24 no.1
    • /
    • pp.41-50
    • /
    • 2008
  • The histochemical characteristics and ultrastructure of the mantle of the spiny top shell, Batillus cornutus, were described using light and electron microscopy. A simple epidermal layer covered the top and bottom of the centrally located connective tissue. The epidermal layer was divided into an outer epidermal layer near the shell and an inner epidermal layer close to the visceral mass. The connective tissue layer was composed of collagen fibers, muscular fiber bundles, and hemolymph sinuses. Mucous cells in the apical mantle contained acid and neutral mucopolysaccharides, whereas those in the mid and marginal mantle contained acidic carboxylated mucopolysaccharides. Mantle thickness, epidermal layer thickness, and hemolymph sinus area tended to decrease from the marginal zone to the apical zone. TEM observation made it possible to distinguish epithelial cells, ciliated cells, absorptive cells, and secretory cells in the epidermal layer. The epithelial cells were columnar with elliptical nuclei, and their free surfaces were covered with microvilli. The lateral membranes of the epithelial cells were connected to neighboring cells by zonula occludens, zonula adherens, and membrane interdigitation. Ciliated cells had cilia and microvilli on the free surface and numerous mitochondria in the apical cytoplasm. Two types of cells with an absorptive function were observed in the epidermal layer. The absorptive cells were columnar in shape and contained microvilli, pinocytotic vesicles, mitochondria, and lysosomes of various electron densities. Secretory cells could be divided into four types (A, B, C, D) depending on cell shape and the characteristics of their secretory granules. These cells were unicellular glands and had characteristics similar to those previously reported for the mantles of gastropods and bivalves.
