• Title/Summary/Keyword: Word order


Ontology Construction and Its Application to Disambiguate Word Senses (온톨로지 구축 및 단어 의미 중의성 해소에의 활용)

  • Kang, Sin-Jae
    • The KIPS Transactions:PartB / v.11B no.4 / pp.491-500 / 2004
  • This paper presents an ontology construction method that uses various computational language resources, together with an ontology-based word sense disambiguation method. To acquire a reasonably practical ontology, the Kadokawa thesaurus is extended by inserting additional semantic relations into its hierarchy; these relations are classified as case relations and other semantic relations. To disambiguate word senses, we first apply previously secured dictionary information to select the correct senses of some ambiguous words with high precision, and then use the ontology to disambiguate the remaining ambiguous words. The mutual information between concepts in the ontology is calculated before the ontology is used as knowledge for word sense disambiguation. If mutual information is regarded as a weight between ontology concepts, the ontology can be treated as a graph with weighted edges, in which we locate the weighted path from one concept to another. In our practical machine translation system, the word sense disambiguation method achieved a 9% improvement in Korean translation over methods that do not use an ontology.
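
The idea of using mutual information as an edge weight between concepts can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses pointwise mutual information estimated from sentence-level co-occurrence counts, and the function name `pmi_weights` and the toy concept labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_weights(sentences):
    """Return {(a, b): PMI} for concept pairs that co-occur in a sentence.

    `sentences` is a list of concept lists. Pair keys are sorted tuples,
    so (a, b) and (b, a) map to the same edge.
    """
    n = len(sentences)
    single = Counter()
    pair = Counter()
    for concepts in sentences:
        unique = sorted(set(concepts))
        single.update(unique)
        pair.update(combinations(unique, 2))

    weights = {}
    for (a, b), joint in pair.items():
        # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
        p_joint = joint / n
        p_a = single[a] / n
        p_b = single[b] / n
        weights[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return weights


# Toy data: each inner list holds the concepts observed in one sentence.
toy = [["bank", "money"], ["bank", "money"], ["bank", "river"], ["river", "tree"]]
edge_weights = pmi_weights(toy)
```

The resulting dictionary can be read as the weighted edge list of a concept graph; how paths through that graph are scored is not specified in the abstract and is left out here.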

Weighted Bayesian Automatic Document Categorization Based on Association Word Knowledge Base by Apriori Algorithm (Apriori알고리즘에 의한 연관 단어 지식 베이스에 기반한 가중치가 부여된 베이지만 자동 문서 분류)

  • 고수정;이정현
    • Journal of Korea Multimedia Society / v.4 no.2 / pp.171-181 / 2001
  • The previous Bayesian document categorization method requires considerable time and effort for word clustering and hardly reflects the semantic information between words. In this paper, we propose a weighted Bayesian document categorization method based on an association word knowledge base acquired by a mining technique. The proposed method constructs a weighted association word knowledge base from the documents in the training set. A classifier using Bayesian probability then categorizes documents based on the constructed knowledge base. To evaluate the performance of the proposed method, we compare our experimental results with those of a weighted Bayesian document categorization method using a vocabulary dictionary built by mutual information, a weighted Bayesian document categorization method, and a simple Bayesian document categorization method. The experimental results show that the weighted Bayesian categorization method using the association word knowledge base improves performance by 0.87%, 2.77%, and 5.09% over the weighted Bayesian method using the mutual-information vocabulary dictionary, the weighted Bayesian method, and the simple Bayesian method, respectively.


Context-sensitive Word Error Detection and Correction for Automatic Scoring System of English Writing (영작문 자동 채점 시스템을 위한 문맥 고려 단어 오류 검사기)

  • Choi, Yong Seok;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering / v.4 no.1 / pp.45-56 / 2015
  • In this paper, we present a method that can detect context-sensitive word errors and generate correction candidates. Spelling error detection is one of the most widespread research topics; however, the approach proposed in this paper is adjusted for an automated English scoring system. A common strategy in context-sensitive word error detection is to use a pre-defined confusion set to generate correction candidates. We automatically generate a confusion set in order to reflect the characteristics of sentences written by second-language learners. We define a word error that cannot be detected by a conventional grammar checker because of part-of-speech ambiguity, and propose how to detect this kind of error and generate correction candidates for it. An experiment is performed on English writings composed by junior high school students whose mother tongue is Korean. The F1 score of the proposed method is 70.48%, which shows that our method is promising compared to the current state of the art.
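
The confusion-set strategy mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's system: the confusion set here is a small hand-written, hypothetical dictionary (the paper generates its set automatically), and the function name `correction_candidates` is an assumption. Ranking the candidates would need a context model, which is not shown.

```python
# Hypothetical, hand-written confusion set for illustration only.
CONFUSION_SETS = {
    "their": {"there"},
    "there": {"their"},
    "quite": {"quiet"},
    "quiet": {"quite"},
}


def correction_candidates(tokens, confusion_sets=CONFUSION_SETS):
    """Return (position, replacement, new_tokens) for every confusable token.

    Each candidate is the input sentence with exactly one token swapped
    for a member of that token's confusion set.
    """
    candidates = []
    for i, tok in enumerate(tokens):
        for alt in sorted(confusion_sets.get(tok.lower(), ())):
            candidates.append((i, alt, tokens[:i] + [alt] + tokens[i + 1:]))
    return candidates


for position, replacement, sentence in correction_candidates(["I", "went", "their"]):
    print(position, replacement, " ".join(sentence))
```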

A Micro-Payment Protocol based on PayWord for Multiple Payments (다중 지불이 가능한 PayWord 기반의 소액 지불 프로토콜)

  • 김선형;김태윤
    • Journal of KIISE:Information Networking / v.30 no.2 / pp.199-206 / 2003
  • PayWord is one of the representative micropayment protocols. The original PayWord system is designed for a user who generates paywords by performing a hash chain operation in order to pay a single designated vendor. In other words, a user has to create new hash chain values for each different vendor in order to conduct commercial transactions on the Internet. To remedy this drawback, we suggest an efficient scheme that can handle business with different vendors using only one hash chain operation. In the proposed system, a broker creates a new series of hash chain values along with a certificate in response to the user's certificate request. This certificate is signed by the broker to give the user the authority to generate hash chain values. The new hash chain values generated by the broker give the user a means of doing business with multiple vendors.
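
The hash chain operation that the abstract attributes to the original PayWord system can be sketched as follows. This is a minimal illustration, not the protocol proposed in the paper: SHA-256, the chain length, and the names `make_chain` and `verify_payword` are assumptions, and certificates, signatures, and the broker are omitted.

```python
import hashlib
import os
from typing import List, Optional


def H(data: bytes) -> bytes:
    """One hash step (SHA-256 is an assumption; any one-way hash would do)."""
    return hashlib.sha256(data).digest()


def make_chain(n: int, seed: Optional[bytes] = None) -> List[bytes]:
    """Return [w_0, w_1, ..., w_n] where w_{i-1} = H(w_i).

    w_n is the user's secret seed; w_0 is the root the user commits to.
    """
    w = seed if seed is not None else os.urandom(32)
    chain = [w]
    for _ in range(n):
        w = H(w)
        chain.append(w)
    chain.reverse()  # chain[0] is now the root w_0, chain[n] the seed w_n
    return chain


def verify_payword(root: bytes, index: int, payword: bytes) -> bool:
    """Vendor-side check: hashing w_i exactly i times must yield the root w_0."""
    value = payword
    for _ in range(index):
        value = H(value)
    return value == root
```

For example, `chain = make_chain(100)` gives a user 100 paywords; the user commits to `chain[0]` and pays the i-th unit by revealing `chain[i]`, which the vendor checks with `verify_payword(chain[0], i, chain[i])`. Because the committed root is tied to one vendor in the original scheme, a second vendor requires a second chain, which is the drawback the abstract describes.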

Two Statistical Models for Automatic Word Spacing of Korean Sentences (한글 문장의 자동 띄어쓰기를 위한 두 가지 통계적 모델)

  • 이도길;이상주;임희석;임해창
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.358-371 / 2003
  • Automatic word spacing is the process of deciding the correct boundaries between words in a sentence that contains spacing errors. It is very important for increasing readability and for communicating the accurate meaning of a text to the reader. Previous statistical approaches to automatic word spacing do not consider the previous spacing state, and thus cannot avoid estimating inaccurate probabilities. In this paper, we propose two statistical word spacing models that solve this problem. The proposed models are based on the observation that automatic word spacing can be regarded as a classification problem, like POS tagging. By generalizing hidden Markov models, the models can consider a broader context and estimate more accurate probabilities. We evaluated the proposed models under a wide range of experimental conditions in order to compare them with the current state of the art, and also provide a detailed error analysis of our models. The experimental results show that the proposed models achieve a syllable-unit accuracy of 98.33% and an Eojeol-unit precision of 93.06% under an evaluation method that takes compound nouns into account.
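
Treating word spacing as a tagging problem, as the abstract describes, can be sketched as follows. This is a minimal illustration of the data representation only, not the paper's statistical models: the binary tag scheme (1 = a space follows the syllable, 0 = no space), the function names, and the example sentence are assumptions.

```python
def to_tags(spaced):
    """Split a correctly spaced sentence into syllables and spacing tags.

    Tag 1 means "a space follows this syllable"; tag 0 means "no space".
    """
    syllables, tags = [], []
    for ch in spaced.strip():
        if ch == " ":
            if tags:
                tags[-1] = 1  # mark the preceding syllable as a word boundary
        else:
            syllables.append(ch)
            tags.append(0)
    return syllables, tags


def apply_tags(syllables, tags):
    """Rebuild a spaced sentence from syllables and (predicted) spacing tags."""
    out = []
    for syl, tag in zip(syllables, tags):
        out.append(syl)
        if tag == 1:
            out.append(" ")
    return "".join(out).rstrip()


syllables, tags = to_tags("나는 학교에 간다")
print(syllables, tags)
print(apply_tags(syllables, tags))
```

A tagger such as a hidden Markov model would be trained to predict the tag sequence from the syllable sequence; `apply_tags` then turns its predictions back into spaced text.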

The Effects of e-WOM's Information Characteristics and Reliability of e-WOM's Information on e-WOM's Perceived Usefulness and Acceptance (온라인 구전정보특성과 정보신뢰성이 지각된 정보유용성과 정보수용성에 미치는 영향)

  • Kim, Young Hun
    • Culinary science and hospitality research / v.24 no.1 / pp.151-163 / 2018
  • Today, the development of the internet brings many changes in information exploration and acceptance. Customers not only come into contact with much information about a firm and its products through quick and easy searches, but also produce information themselves or spread it via the internet. Customers are now active explorers and producers of information online. In this sense, this study examined the effects of e-word-of-mouth information characteristics on consumers' perceived usefulness and perceived acceptance of e-word-of-mouth information in the food service industry, in order to suggest directions for enhancing marketers' marketing strategies. The research model was designed based on the hypothesis that the characteristics of e-word-of-mouth information and the credibility of the information influence both the user's perceived usefulness and acceptance. Based on data from a total of 277 customers obtained through empirical research, this study reviewed the validity, reliability, and fitness of the research model. The analysis results are as follows. First, the characteristics of e-word-of-mouth information (vividness, consensus, and direction) had an influence on the customer's perceived usefulness. Second, the same characteristics (vividness, consensus, and direction) had an influence on the customer's perceived acceptance. Third, the reliability of information had an influence on the customer's perceived usefulness, and the credibility of e-word-of-mouth information had an influence on perceived acceptance. Fourth, the customer's perceived usefulness had an influence on the customer's perceived acceptance.

The Effect of Standard Keyboard and Fixed-Split Keyboard on Wrist Posture During Word Processing (문서입력 작업 시 컴퓨터 키보드 유형이 손목관절의 운동학적 특성에 미치는 영향)

  • Kwon, Hyuk-Cheol;Jeong, Dong-Hoon;Kong, Jin-Yong
    • Physical Therapy Korea / v.11 no.1 / pp.35-43 / 2004
  • This study had two purposes. The first was to investigate the effects of standard and fixed-split keyboards on wrist posture and movements during word processing. The second was to select optimal computer input devices in order to prevent cumulative trauma disorder in the wrist region. The subjects were thirteen healthy men and women who all agreed to participate in this study. Kinematic data were measured for wrist flexion and extension and for wrist radial and ulnar deviation during a 20-minute period of word processing work. The measuring tool was an electrical goniometer produced by Biometrics Cooperation. The results were as follows: 1. Wrist flexion and extension at the resting starting position were not significantly different (p>.05); however, the angle of radial and ulnar deviation was significantly different between standard and split keyboard use during word processing (p<.05). 2. In the initial 10 minutes, the dynamic angle of wrist flexion and extension was not significantly different (p>.05); however, the dynamic angle of radial and ulnar deviation was significantly different between standard and split keyboard use during word processing (p<.05). These results suggest that the split keyboard is preferable to the standard keyboard, because it prevented excessive ulnar deviation during word processing.


Parting Lyrics Emotion Classification using Word2Vec and LSTM (Word2Vec과 LSTM을 활용한 이별 가사 감정 분류)

  • Lim, Myung Jin;Park, Won Ho;Shin, Ju Hyun
    • Smart Media Journal / v.9 no.3 / pp.90-97 / 2020
  • With the development of the Internet and smartphones, digital sound sources are easily accessible, and accordingly, interest in music search and recommendation is increasing. As a method of recommending music, research is being conducted that uses melodic features such as pitch, tempo, and beat to classify genres or emotions. However, since lyrics are becoming one of the means of expressing human emotions in music, the role of lyrics is growing, and a study of emotion classification based on lyrics is needed. Therefore, in this paper, we analyze the emotions of farewell lyrics in order to subdivide farewell emotions based on the lyrics. After constructing an emotion dictionary by vectorizing the similarity between words appearing in parting lyrics through Word2Vec learning, we propose a method of classifying the emotions of parting lyrics using Word2Vec and LSTM, in which an LSTM learns the lyrics and groups them by similar emotions.

Document Classification using Recurrent Neural Network with Word Sense and Contexts (단어의 의미와 문맥을 고려한 순환신경망 기반의 문서 분류)

  • Joo, Jong-Min;Kim, Nam-Hun;Yang, Hyung-Jeong;Park, Hyuck-Ro
    • KIPS Transactions on Software and Data Engineering / v.7 no.7 / pp.259-266 / 2018
  • In this paper, we propose a method to classify a document using a Recurrent Neural Network (RNN) by extracting features that consider word sense and context. The Word2vec method is adopted to represent each word in the document as a vector that reflects word order and meaning. Doc2vec is applied to extract document features that take context into account. An RNN classifier, which feeds the output of the previous node into the next node as input, is used as the document classification method. The RNN classifier performs well for document classification because, among neural network classifiers, it is well suited to sequence data. We applied the GRU (Gated Recurrent Unit) model, which solves the vanishing gradient problem of RNNs and also reduces computation time. We used one Hangul document set and two English document sets for the experiments, and the GRU-based document classifier improves performance by about 3.5% compared to a CNN-based document classifier.

Study on History Tracking Technique of the Document File through RSID Analysis in MS Word (MS 워드의 RSID 분석을 통한 문서파일 이력 추적 기법 연구)

  • Joun, Jihun;Han, Jaehyeok;Jung, Doowon;Lee, Sangjin
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.6 / pp.1439-1448 / 2018
  • Many electronic document files, including Microsoft Office Word (MS Word) files, have become a major issue in various legal disputes such as privacy, contract forgery, and trade secret leakage. The internal metadata of the OOXML (Office Open XML) format, which has been used since MS Word 2007, stores the unique Revision Identifier (RSID). The RSID is a distinct value assigned to a corresponding word, sentence, or paragraph that has been created, modified, or deleted after a document is saved. Document history, such as the addition, correction, or deletion of contents or the order of creation, can also be tracked using the RSID. In this paper, we propose a methodology for distinguishing an original document from a copy and for investigating possible document file leakage by utilizing the changes in the RSID according to the user's behavior.
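
Reading RSID values out of an OOXML document can be sketched as follows. This is a minimal illustration, not the methodology proposed in the paper: it assumes the main document part is stored at `word/document.xml` inside the .docx ZIP package and simply counts the values of attributes whose names start with `w:rsid` (for example `w:rsidR`); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter

# WordprocessingML namespace bound to the "w" prefix in OOXML documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def rsids_in_xml(document_xml):
    """Count every w:rsid* attribute value in a WordprocessingML part.

    `document_xml` may be a str or bytes containing the XML.
    """
    counts = Counter()
    prefix = "{" + W_NS + "}rsid"
    for elem in ET.fromstring(document_xml).iter():
        for name, value in elem.attrib.items():
            if name.startswith(prefix):
                counts[value] += 1
    return counts


def rsids_in_docx(path):
    """Open a .docx file as a ZIP archive and count RSIDs in its main part."""
    with zipfile.ZipFile(path) as zf:
        return rsids_in_xml(zf.read("word/document.xml"))
```

Comparing the RSID sets of two files, for instance with `set(rsids_in_docx(a)) & set(rsids_in_docx(b))`, is one simple way to look for shared editing history; how the paper interprets such overlaps is not detailed in the abstract.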