• Title/Summary/Keyword: Word Input

Input Dimension Reduction based on Continuous Word Vector for Deep Neural Network Language Model (Deep Neural Network 언어모델을 위한 Continuous Word Vector 기반의 입력 차원 감소)

  • Kim, Kwang-Ho;Lee, Donghyun;Lim, Minkyu;Kim, Ji-Hwan
    • Phonetics and Speech Sciences / v.7 no.4 / pp.3-8 / 2015
  • In this paper, we investigate an input dimension reduction method that uses continuous word vectors in a deep neural network language model. In the proposed method, continuous word vectors were generated from a large training corpus with Google's Word2Vec so that they satisfy the distributional hypothesis. The discrete 1-of-$|V|$ coded word vectors were then replaced with their corresponding continuous word vectors. In our implementation, the input dimension was reduced from 20,000 to 600 when a tri-gram language model was used with a vocabulary of 20,000 words. The total training time was reduced from 30 days to 14 days for the Wall Street Journal training corpus (corpus length: 37M words).
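
A minimal sketch of the replacement step may help make the dimension reduction concrete. It assumes a per-word vector size of 300 and two context words (neither figure is given in the abstract; they are chosen only so that the concatenated input has 600 dimensions), and it uses seeded random vectors as a stand-in for a trained Word2Vec table. The names `one_hot`, `embed`, and `dense_input` are hypothetical.

```python
import random

VOCAB_SIZE = 20_000  # vocabulary size stated in the abstract
EMBED_DIM = 300      # assumed per-word vector size; not given in the abstract


def one_hot(index, size=VOCAB_SIZE):
    """Return the discrete 1-of-|V| vector for a word index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def embed(index, dim=EMBED_DIM):
    """Stand-in for a trained Word2Vec lookup: a fixed dense vector per index."""
    rng = random.Random(index)  # seeded so a word always maps to the same vector
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def dense_input(context):
    """Concatenate the continuous vectors of the context words."""
    vec = []
    for index in context:
        vec.extend(embed(index))
    return vec


context = [17, 4242]  # hypothetical indices of the two preceding words
print(len(one_hot(context[0])))   # size of one discrete word vector: 20000
print(len(dense_input(context)))  # size of the concatenated continuous input: 600
```

In a real system the `embed` lookup would read rows from the trained embedding matrix rather than generate random values.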

A Study on Validation Testing for Input Files of MS Word-Processor (MS 워드프로세서의 입력 파일에 대한 유효성 테스팅 방법에 관한 연구)

  • Yun, Young-Min;Choi, Jong-Cheon;Yoo, Hae-Young;Cho, Seong-Je
    • The KIPS Transactions:PartC / v.14C no.4 / pp.313-320 / 2007
  • In this paper, we propose a method for analyzing security vulnerabilities of the MS word processor by validating its input files. Specifically, the study detects vulnerabilities related to the word processor's input file by analyzing the file's header information. This validation test cannot be conducted with existing software fault injection tools, including Holodeck and CANVAS. The proposed method can also be applied to identify input file vulnerabilities in Hangul and Microsoft Excel, which likewise take a data file with a header as input. Moreover, our method provides a means of assessing the fault tolerance and trustworthiness of the target software.
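
The abstract does not describe the test procedure in detail. One plausible reading of header-based input validation testing is to generate test files that differ from a valid file in exactly one header field, as sketched below. The field names and offsets in `HEADER_FIELDS`, the fill values, and the function names are assumptions made for illustration and are not taken from the paper; only the 8-byte OLE compound file signature is a known constant.

```python
# First 8 bytes of an OLE compound file, the container used by legacy .doc files.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

# Illustrative header fields to mutate: (name, byte offset, size in bytes).
# These offsets are assumptions for the sketch, not taken from the paper.
HEADER_FIELDS = [
    ("sector_shift", 30, 2),
    ("fat_sector_count", 44, 4),
]

BOUNDARY_FILLS = [0x00, 0xFF]  # fill bytes producing all-zero and all-one fields


def has_valid_signature(data):
    """Check that the data still starts with the expected magic bytes."""
    return data[:8] == CFB_SIGNATURE


def header_variants(data):
    """Yield (label, mutated bytes), changing exactly one header field per variant."""
    for name, offset, size in HEADER_FIELDS:
        for fill in BOUNDARY_FILLS:
            mutated = bytearray(data)
            mutated[offset:offset + size] = bytes([fill]) * size
            yield f"{name}=0x{fill:02X}x{size}", bytes(mutated)


sample = CFB_SIGNATURE + bytes(504)  # synthetic 512-byte header, not a real document
variants = list(header_variants(sample))
for label, data in variants:
    print(label, has_valid_signature(data), len(data))
```

Each variant would then be opened in the target application while it is monitored for crashes or hangs; that step depends on the test environment and is left out here.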

Japanese Dictionary Input System in Korean Traditional Reading Rule of Chinese Character (한자음으로 일본어 사전을 검색하는 방법(독음입력법))

  • Jeong, Cheol
    • Annual Conference on Human and Language Technology / 2005.10a / pp.139-144 / 2005
  • When a Japanese learner in Korea looks up a word in a Japanese dictionary, he or she must know the pronunciation of the target word. However, the pronunciation is not easy to determine from a Japanese sentence, because most Japanese sentences show only Hanja (Chinese characters) rather than Kana (the Japanese syllabary). If the learner knows the traditional Korean pronunciation of the target word, he or she can enter the word into an electronic Japanese dictionary using that Korean pronunciation. For this to work, the dictionary service provider must convert Japanese words to their Korean pronunciations in advance. Once these conversions are set up as an additional search step, the target word can be found through the Korean pronunciation of the Japanese Hanja. This process is possible for three reasons: 1. Korean, Japanese, and Chinese use nearly the same Hanja; the differences are small. 2. Most Japanese learners in Korea know the Korean pronunciation of the Hanja. 3. The Korean pronunciation of a Hanja is nearly unique; a Hanja generally has a single Korean pronunciation.
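
The pre-conversion step can be sketched as a reverse index from Korean readings to dictionary entries. The tiny character table and word list below are illustrative samples, and the names `HANJA_READING`, `korean_reading`, and `build_index` are hypothetical. The sketch assumes one Korean reading per character and handles only entries written entirely in Hanja.

```python
from collections import defaultdict

# Sample table: Korean reading of each character (one reading per character assumed).
HANJA_READING = {"学": "학", "校": "교", "先": "선", "生": "생", "大": "대"}

# Sample dictionary entries: Japanese headword -> Kana pronunciation.
JAPANESE_ENTRIES = {
    "学校": "がっこう",
    "先生": "せんせい",
    "大学": "だいがく",
    "学生": "がくせい",
}


def korean_reading(word):
    """Convert a Hanja-only Japanese headword to its Korean reading."""
    return "".join(HANJA_READING[ch] for ch in word)


def build_index(entries):
    """Map each Korean reading to the Japanese entries that share it."""
    index = defaultdict(list)
    for word, kana in entries.items():
        index[korean_reading(word)].append((word, kana))
    return index


index = build_index(JAPANESE_ENTRIES)
print(index["학교"])  # [('学校', 'がっこう')]
print(index["학생"])  # [('学生', 'がくせい')]
```

Because the index stores a list per reading, entries that happen to share a Korean reading would all be returned for the user to choose from.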

Data Input and Output of Unstructured Data of Large Capacity (대용량 비정형 데이터 자료 입력 및 출력)

  • Sim, Kyu-Cheol;Kang, Byung-Jun;Kim, Kyung-Hwan;Jung, Hoe-Kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2013.05a / pp.613-615 / 2013
  • Requests for services that handle word-processor files as XML have been increasing recently. In this paper, we provide a system that converts word-processor files (HWP, MS-Office) into XML files and, using an XML mapping file created by the user, extracts the data entered in the word processor and stores it in a database. The system can also retrieve the required data from the database, insert it into previously created word-processor forms, and generate a word-processing document from the application program.

A New Rejection Algorithm Using Word-Dependent Garbage Models

  • Lee, Gang-Sung
    • The Journal of the Acoustical Society of Korea / v.16 no.2E / pp.27-31 / 1997
  • This paper proposes a new rejection algorithm that distinguishes unregistered spoken words (non-keywords) from the registered vocabulary. Two kinds of garbage models are employed in this design: the original garbage model and a new word-dependent garbage model. The original garbage model collects all non-keyword patterns, whereas the new word-dependent garbage model collects patterns classified by recognizing each non-keyword pattern against the registered vocabulary. The two types of garbage models work together to make a robust rejection decision. The first stage of processing classifies an input pattern using the original garbage model. If the result of the first stage is ambiguous, the new word-dependent garbage model is used to classify the input pattern as either a registered or a non-registered word. This paper demonstrates the effectiveness of the new word-dependent garbage model. A Dynamic Multisection method is used to test the performance of the algorithm. The experimental results show that the proposed algorithm outperforms the original garbage model.
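
The two-stage decision can be sketched as follows. The abstract does not say how ambiguity is measured, so this sketch assumes log-likelihood scores and treats a small score margin between the best keyword model and the original garbage model as ambiguous. The function name, the margin thresholds, and the comparison rule in the second stage are all assumptions.

```python
def reject_decision(keyword_score, garbage_score, word_garbage_score,
                    accept_margin=2.0, reject_margin=-2.0):
    """Return "accept" or "reject" for one input pattern.

    keyword_score:      log-likelihood under the best-matching keyword model
    garbage_score:      log-likelihood under the original garbage model
    word_garbage_score: log-likelihood under that keyword's word-dependent garbage model
    """
    # Stage 1: compare the keyword model with the original garbage model.
    margin = keyword_score - garbage_score
    if margin >= accept_margin:
        return "accept"
    if margin <= reject_margin:
        return "reject"
    # Stage 2: the first stage was ambiguous, so consult the
    # word-dependent garbage model for the recognized keyword.
    return "accept" if keyword_score > word_garbage_score else "reject"


print(reject_decision(-10.0, -15.0, -9.0))   # clear accept in stage 1
print(reject_decision(-10.0, -7.0, -20.0))   # clear reject in stage 1
print(reject_decision(-10.0, -10.5, -9.5))   # ambiguous, rejected in stage 2
print(reject_decision(-10.0, -10.5, -12.0))  # ambiguous, accepted in stage 2
```

The word-dependent model is only consulted for borderline inputs, so its cost is paid only where the first stage cannot decide.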

Predictive Morphological Analysis of Korean with Dynamic Programming (동적 프로그래밍기법에 근거한 예측중심의 한국어 형태소 분석)

  • 김덕봉;최기선
    • Korean Journal of Cognitive Science / v.4 no.2 / pp.145-180 / 1994
  • In this paper, we present an efficient morphological analysis model for Korean that produces, from an input word, all feasible sequences of morphemes in the word. The model is deterministic in applying spelling rules and performs few redundant computations when processing complex and ambiguous words. This results from three new techniques: first, a new method for interpreting spelling rules; second, predictive rule application, which restricts the rules considered to those suitable for the input word; third, dynamic programming, which lets the analyzer avoid recomputing already-analyzed substrings when the input word is morphologically ambiguous. Our model was tested on 413,975 words randomly selected from a corpus of Korean elementary school textbooks. Experimental results show that our model guarantees fast and reliable processing.
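
The dynamic programming idea, reusing the analysis of a substring instead of recomputing it, can be illustrated with a much simpler segmenter than the one in the paper. The sketch below ignores spelling rules entirely and only splits a word into lexicon entries, caching the analyses of each suffix. The toy lexicon and the function name `analyze` are hypothetical.

```python
from functools import lru_cache


def analyze(word, lexicon):
    """Return every way to split `word` into entries of `lexicon`."""

    @lru_cache(maxsize=None)
    def from_pos(start):
        # All analyses of word[start:]; cached so each suffix is analyzed once.
        if start == len(word):
            return ((),)
        results = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in lexicon:
                for rest in from_pos(end):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(seq) for seq in from_pos(0)]


LEXICON = frozenset({"학교", "에", "에서", "서"})  # toy lexicon for illustration
print(analyze("학교에서", LEXICON))
# [['학교', '에', '서'], ['학교', '에서']]
```

For an ambiguous word, several analyses share the same tail, and the cache ensures that tail is analyzed only once.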

A Usability Testing of the Word-Prediction Function of the AAC Keyboard for the People with Cerebral Palsy (보완대체의사소통(AAC) 글자판의 단어예측기능에 대한 뇌병변장애인 대상의 사용성 평가)

  • Lee, H.Y.;Hong, K-H.
    • Journal of rehabilitation welfare engineering & assistive technology / v.9 no.3 / pp.209-214 / 2015
  • The purpose of this study was to examine (1) the influence of the word-prediction function on sentence generation speed and (2) the necessity, convenience, and satisfaction of the word-prediction function of the AAC keyboard. A total of 10 adults with cerebral palsy participated, and the word-prediction function of the keyboard of a Korean high-tech AAC device called "MyTalkie Smart" was used. Participants were asked to generate sentences as voice output using the word-prediction function and the direct letter-input function in turn, and then to rate necessity, convenience, and satisfaction on a five-point Likert scale. Other user requirements were collected through free-form feedback. The results showed that sentence generation was faster with the word-prediction function than with the direct letter-input function. However, the difference between the two input methods was not statistically significant, possibly because participants lacked time to practice with the new device. Participants responded positively regarding the necessity, convenience, and satisfaction of the word-prediction function.

Modeling of Convolutional Neural Network-based Recommendation System

  • Kim, Tae-Yeun
    • Journal of Integrative Natural Science / v.14 no.4 / pp.183-188 / 2021
  • Collaborative filtering is one of the most commonly used methods in web recommendation systems, and numerous studies on collaborative filtering have proposed measures for enhancing its accuracy. This study proposes a movie recommendation system that applies Word2Vec and ensemble convolutional neural networks. First, user sentences and movie sentences are constructed from the user, movie, and rating information. The user sentences and movie sentences are then input into Word2Vec to obtain the user vector and the movie vector. The user vector is input into the user convolutional model, and the movie vector is input into the movie convolutional model. These user and movie convolutional models are connected to a fully-connected neural network model, whose output layer produces the forecasts for user, movie, and rating. The test results showed that the proposed system achieved higher accuracy than the conventional collaborative filtering system and the Word2Vec and deep neural network-based system proposed in similar studies. The Word2Vec and deep neural network-based recommendation system is expected to help enhance user satisfaction while taking the characteristics of users into account.

Visual and Phonological Neighborhood Effects in Computational Visual Word Recognition Model (계산주의적 시각단어재인 모델에서의 시각이웃과 음운이웃 효과)

  • Lim, Heui-Seok;Park, Ki-Nam;Nam, Ki-Chun
    • Journal of the Korea Academia-Industrial cooperation Society / v.8 no.4 / pp.803-809 / 2007
  • This study proposes a computational model for investigating the roles of phonological and orthographic information in visual word recognition, one of the stages of language information processing, and the representation type of the mental lexicon. The model was designed as a feed-forward network consisting of an input layer that takes two Korean syllables as its input, a hidden layer, and an output layer that expresses meanings. The computational model reproduced the phonological and orthographic neighborhood effects observed in Korean word recognition, and it provided evidence suggesting that the mental lexicon is represented as phonological information in the process of Korean word recognition.

Expansion of Word Representation for Named Entity Recognition Based on Bidirectional LSTM CRFs (Bidirectional LSTM CRF 기반의 개체명 인식을 위한 단어 표상의 확장)

  • Yu, Hongyeon;Ko, Youngjoong
    • Journal of KIISE / v.44 no.3 / pp.306-313 / 2017
  • Named entity recognition (NER) seeks to locate named entities in text and classify them into pre-defined categories such as names of persons, organizations, and locations, and time expressions. Recently, many state-of-the-art NER systems have been implemented with bidirectional LSTM CRFs. Deep learning models based on long short-term memory (LSTM) generally depend on word representations as input. In this paper, we propose an approach that expands the word representation using pre-trained word embeddings, part-of-speech (POS) tag embeddings, syllable embeddings, and named entity dictionary feature vectors. Our experiments show that the proposed approach creates useful word representations as input to bidirectional LSTM CRFs. The final expanded representation scores 8.05%p higher than baseline NER systems that use only the pre-trained word embedding vector.
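
One simple way to expand a word representation from these four sources is to concatenate them, as sketched below. The abstract does not state how the vectors are combined, so concatenation, the averaging of syllable vectors, the toy dimensions, and all table contents and names (`WORD_EMB`, `POS_EMB`, `SYL_EMB`, `NE_DICT`, `expand`) are assumptions for illustration.

```python
# Toy lookup tables; real systems would load trained embeddings.
WORD_EMB = {"서울": [0.1, 0.2, 0.3, 0.4]}                  # pre-trained word embedding
POS_EMB = {"NNP": [1.0, 0.0]}                              # POS tag embedding
SYL_EMB = {"서": [0.5, 0.1, 0.0], "울": [0.1, 0.3, 0.2]}   # syllable embeddings
NE_DICT = {"서울": {"LOC"}}                                # named entity dictionary
NE_TYPES = ["PER", "ORG", "LOC"]                           # one dictionary flag per type


def mean_vectors(vectors):
    """Average equal-length vectors element by element."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def expand(word, pos):
    """Concatenate word, POS, syllable, and dictionary features for one token."""
    word_vec = WORD_EMB[word]
    pos_vec = POS_EMB[pos]
    # Averaging is a stand-in for however the syllable vectors are combined.
    syl_vec = mean_vectors([SYL_EMB[syllable] for syllable in word])
    dict_vec = [1.0 if t in NE_DICT.get(word, set()) else 0.0 for t in NE_TYPES]
    return word_vec + pos_vec + syl_vec + dict_vec


features = expand("서울", "NNP")
print(len(features))   # 4 + 2 + 3 + 3 = 12
print(features[-3:])   # dictionary flags: [0.0, 0.0, 1.0]
```

The expanded vector for each token would then be the input to the bidirectional LSTM CRF at that position.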