• Title/Summary/Keyword: unknown word

Search Result 70, Processing Time 0.023 seconds

Swear Word Detection and Unknown Word Classification for Automatic English Writing Assessment (영작문 자동평가를 위한 비속어 검출과 미등록어 분류)

  • Lee, Gyoung;Kim, Sung Gwon;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.9
    • /
    • pp.381-388
    • /
    • 2014
  • In this paper, we deal with implementation issues of an unknown word classifier for middle-school level English writing test. We define the type of unknown words occurred in English text and discuss the detection process for unknown words. Also, we define the type of swear words occurred in students's English writings, and suggest how to handle this type of words. We implement an unknown word classifier with a swear detection module for developing an automatic English writing scoring system. By experiments with actual test data, we evaluate the accuracy of the unknown word classifier as well as the swear detection module.

Probabilistic Segmentation and Tagging of Unknown Words (확률 기반 미등록 단어 분리 및 태깅)

  • Kim, Bogyum;Lee, Jae Sung
    • Journal of KIISE
    • /
    • v.43 no.4
    • /
    • pp.430-436
    • /
    • 2016
  • Processing of unknown words such as proper nouns and newly coined words is important for a morphological analyzer to process documents in various domains. In this study, a segmentation and tagging method for unknown Korean words is proposed for the 3-step probabilistic morphological analysis. For guessing unknown word, it uses rich suffixes that are attached to open class words, such as general nouns and proper nouns. We propose a method to learn the suffix patterns from a morpheme tagged corpus, and calculate their probabilities for unknown open word segmentation and tagging in the probabilistic morphological analysis model. Results of the experiment showed that the performance of unknown word processing is greatly improved in the documents containing many unregistered words.

Automatic Construction Method of Unknown Word Lexical Dictionary (Unknown Word Lexical Dictionary의 자동 생성 방법)

  • Hwang, Myung-Gwon;Youn, Byung-Su;Jeong, Il-Yong;Kim, Pan-Koo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.05a
    • /
    • pp.3-6
    • /
    • 2008
  • 본 연구는 의미적 정보 검색을 위한 연구 중의 하나로, 현재까지의 의미적 문서 검색에서 큰 걸림돌이었던 사전에 정의되지 않은 단어(Unknown Word)들의 어휘 사전(Lexical Dictionary)을 자동으로 생성하기 위한 것이다. 이를 위해 UW를 기존의 영어 어휘 사전인 워드넷(WordNet)에 정의되지 않은 단어로 간주하고, 웹 문서의 입력을 통하여 UW와 관련된 단어들을 추출하여 의미적 관련 정도를 확률적, 의미적 방법으로 측정한다. 본 논문에서는 UW Lexical Dictionary를 자동으로 구축하기 위한 방법에 대해서만 기술하였고, 정량적이고 객관적인 평가는 포함하지 않고 있다. 하지만 본 연구의 효용성을 확인하기 위한 몇 가지 문서로부터 추출된 결과는 본 연구가 상당히 의미적이며 가치가 높을 것으로 기대되고 있다.

An Analysis on the Elementary 2nd·3rd Students' Problem Solving Ability in Addition and Subtraction Problems with Natural Numbers (초등학교 2·3학년 학생들의 자연수의 덧셈과 뺄셈에 대한 문제해결 능력 분석)

  • Jeong, So Yun;Lee, Dae Hyun
    • Education of Primary School Mathematics
    • /
    • v.19 no.2
    • /
    • pp.127-142
    • /
    • 2016
  • The purpose of this study was to examine the students' problem solving ability according to numeric expression and the semantic types of addition and subtraction word problems. For this, a research was to analyze the addition and subtraction calculation ability, word problem solving ability of the selected $2^{nd}$ grade(118) and 3rd grade(109) students. We got the conclusion as follows: When the students took the survey to assess their ability to solve the numerical expression and the word problems, the correct answer rates of the result unknown problems was larger than those of the change unknown problems or the start unknown problems. the correct answer rates of the change add-into situation was larger than those of the part-part-whole situation in the result unknown addition word problems: they often presented in text books. And, in the cases of the result unknown subtraction word problems that often presented in text books, the correct answer rates of the change take-away situation was the largest. It seemed probably because the students frequently experienced similar situations in the textbooks. We know that the formal calculation ability of the students was a precondition for successful word problem solving, but that it was not a sufficient condition for that.

A Methodology for Urdu Word Segmentation using Ligature and Word Probabilities

  • Khan, Yunus;Nagar, Chetan;Kaushal, Devendra S.
    • International Journal of Ocean System Engineering
    • /
    • v.2 no.1
    • /
    • pp.24-31
    • /
    • 2012
  • This paper introduce a technique for Word segmentation for the handwritten recognition of Urdu script. Word segmentation or word tokenization is a primary technique for understanding the sentences written in Urdu language. Several techniques are available for word segmentation in other languages but not much work has been done for word segmentation of Urdu Optical Character Recognition (OCR) System. A method is proposed for word segmentation in this paper. It finds the boundaries of words in a sequence of ligatures using probabilistic formulas, by utilizing the knowledge of collocation of ligatures and words in the corpus. The word identification rate using this technique is 97.10% with 66.63% unknown words identification rate.

Step-by-step Approach for Effective Korean Unknown Word Recognition (한국어 미등록어 인식을 위한 단계별 접근방법)

  • Park, So-Young
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2009.05a
    • /
    • pp.369-372
    • /
    • 2009
  • Recently, newspapers as well as web documents include many newly coined words such as "mid"(meaning "American drama" since "mi" means "America" in Korean and "d" refers to the "d" of drama) and "anseup"(meaning "pathetic" since "an" and "seup" literally mean eyeballs and moist respectively). However, these words cause a Korean analyzing system's performance to decrease. In order to recognize these unknown word automatically, this paper propose a step-by-step approach consisting of an unknown noun recognition phase based on full text analysis, an unknown verb recognition phase based on web document frequency, and an unknown noun recognition phase based on web document frequency. The proposed approach includes the phase based on full text analysis to recognize accurately the unknown words occurred once and again in a document. Also, the proposed approach includes two phases based on web document frequency to recognize broadly the unknown words occurred once in the document. Besides, the proposed model divides between an unknown noun recognition phase and an unknown verb recognition phase to recognize various unknown words. Experimental results shows that the proposed approach improves precision 1.01% and recall 8.50% as compared with a previous approach.

  • PDF

Automatic Construction of Korean Unknown Word Dictionary using Occurrence Frequency in Web Documents (웹문서에서의 출현빈도를 이용한 한국어 미등록어 사전 자동 구축)

  • Park, So-Young
    • Journal of the Korea Society of Computer and Information
    • /
    • v.13 no.3
    • /
    • pp.27-33
    • /
    • 2008
  • In this paper, we propose a method of automatically constructing a dictionary by extracting unknown words from given eojeols in order to improve the performance of a Korean morphological analyzer. The proposed method is composed of a dictionary construction phase based on full text analysis and a dictionary construction phase based on web document frequency. The first phase recognizes unknown words from strings repeatedly occurred in a given full text while the second phase recognizes unknown words based on frequency of retrieving each string, once occurred in the text, from web documents. Experimental results show that the proposed method improves 32.39% recall by utilizing web document frequency compared with a previous method.

  • PDF

KNE: An Automatic Dictionary Expansion Method Using Use-cases for Morphological Analysis

  • Nam, Chung-Hyeon;Jang, Kyung-Sik
    • Journal of information and communication convergence engineering
    • /
    • v.17 no.3
    • /
    • pp.191-197
    • /
    • 2019
  • Morphological analysis is used for searching sentences and understanding context. As most morpheme analysis methods are based on predefined dictionaries, the problem of a target word not being registered in the given morpheme dictionary, the so-called unregistered word problem, can be a major cause of reduced performance. The current practical solution of such unregistered word problem is to add them by hand-write into the given dictionary. This method is a limitation that restricts the scalability and expandability of dictionaries. In order to overcome this limitation, we propose a novel method to automatically expand a dictionary by means of use-case analysis, which checks the validity of the unregistered word by exploring the use-cases through web crawling. The results show that the proposed method is a feasible one in terms of the accuracy of the validation process, the expandability of the dictionary and, after registration, the fast extraction time of morphemes.

Vocabulary Generation Method by Optical Character Recognition (광학 문자 인식을 통한 단어 정리 방법)

  • Kim, Nam-Gyu;Kim, Dong-Eon;Kim, Seong-Woo;Kwon, Soon-Kak
    • Journal of Korea Multimedia Society
    • /
    • v.18 no.8
    • /
    • pp.943-949
    • /
    • 2015
  • A reader usually spends a lot of time browsing and searching word meaning in a dictionary, internet or smart applications in order to find the unknown words. In this paper, we propose a method to compensate this drawback. The proposed method introduces a vocabulary upon recognizing a word or group of words that was captured by a smart phone camera. Through this proposed method, organizing and editing words that were captured by smart phone, searching the dictionary data using bisection method, listening pronunciation with the use of speech synthesizer, building and editing of vocabulary stored in database are given as the features. A smart phone application for organizing English words was established. The proposed method significantly reduces the organizing time for unknown English words and increases the English learning efficiency.

Detection of System Abnormal State by Cyber Attack (사이버 공격에 의한 시스템 이상상태 탐지 기법)

  • Yoon, Yeo-jeong;Jung, You-jin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.29 no.5
    • /
    • pp.1027-1037
    • /
    • 2019
  • Conventional cyber-attack detection solutions are generally based on signature-based or malicious behavior analysis so that have had difficulty in detecting unknown method-based attacks. Since the various information occurring all the time reflects the state of the system, by modeling it in a steady state and detecting an abnormal state, an unknown attack can be detected. Since a variety of system information occurs in a string form, word embedding, ie, techniques for converting strings into vectors preserving their order and semantics, can be used for modeling and detection. Novelty Detection, which is a technique for detecting a small number of abnormal data in a plurality of normal data, can be performed in order to detect an abnormal condition. This paper proposes a method to detect system anomaly by cyber attack using embedding and novelty detection.