• Title/Summary/Keyword: Text Error Correction

Search Result 18, Processing Time 0.029 seconds

Features of an Error Correction Memory to Enhance Technical Texts Authoring in LELIE

  • SAINT-DIZIER, Patrick
    • International Journal of Knowledge Content Development & Technology
    • /
    • v.5 no.2
    • /
    • pp.75-101
    • /
    • 2015
  • In this paper, we investigate the notion of error correction memory applied to technical texts. The main purpose is to introduce flexibility and context sensitivity in the detection and the correction of errors related to Constrained Natural Language (CNL) principles. This is realized by enhancing error detection paired with relatively generic correction patterns and contextual correction recommendations. Patterns are induced from previous corrections made by technical writers for a given type of text. The impact of such an error correction memory is also investigated from the point of view of the technical writer's cognitive activity. The notion of error correction memory is developed within the framework of the LELIE project an experiment is carried out on the case of fuzzy lexical items and negation, which are both major problems in technical writing. Language processing and knowledge representation aspects are developed together with evaluation directions.

Improved Statistical Language Model for Context-sensitive Spelling Error Candidates (문맥의존 철자오류 후보 생성을 위한 통계적 언어모형 개선)

  • Lee, Jung-Hun;Kim, Minho;Kwon, Hyuk-Chul
    • Journal of Korea Multimedia Society
    • /
    • v.20 no.2
    • /
    • pp.371-381
    • /
    • 2017
  • The performance of the statistical context-sensitive spelling error correction depends on the quality and quantity of the data for statistical language model. In general, the size and quality of data in a statistical language model are proportional. However, as the amount of data increases, the processing speed becomes slower and storage space also takes up a lot. We suggest the improved statistical language model to solve this problem. And we propose an effective spelling error candidate generation method based on a new statistical language model. The proposed statistical model and the correction method based on it improve the performance of the spelling error correction and processing speed.

A Spelling Error Correction Model in Korean Using a Correction Dictionary and a Newspaper Corpus (교정사전과 신문기사 말뭉치를 이용한 한국어 철자 오류 교정 모델)

  • Lee, Se-Hee;Kim, Hark-Soo
    • The KIPS Transactions:PartB
    • /
    • v.16B no.5
    • /
    • pp.427-434
    • /
    • 2009
  • With the rapid evolution of the Internet and mobile environments, text including spelling errors such as newly-coined words and abbreviated words are widely used. These spelling errors make it difficult to develop NLP (natural language processing) applications because they decrease the readability of texts. To resolve this problem, we propose a spelling error correction model using a spelling error correction dictionary and a newspaper corpus. The proposed model has the advantage that the cost of data construction are not high because it uses a newspaper corpus, which we can easily obtain, as a training corpus. In addition, the proposed model has an advantage that additional external modules such as a morphological analyzer and a word-spacing error correction system are not required because it uses a simple string matching method based on a correction dictionary. In the experiments with a newspaper corpus and a short message corpus collected from real mobile phones, the proposed model has been shown good performances (a miss-correction rate of 7.3%, a F1-measure of 97.3%, and a false positive rate of 1.1%) in the various evaluation measures.

The Utilization of Local Document Information to Improve Statistical Context-Sensitive Spelling Error Correction (통계적 문맥의존 철자오류 교정 기법의 향상을 위한 지역적 문서 정보의 활용)

  • Lee, Jung-Hun;Kim, Minho;Kwon, Hyuk-Chul
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.7
    • /
    • pp.446-451
    • /
    • 2017
  • The statistical context-sensitive spelling correction technique in this thesis is based upon Shannon's noisy channel model. The interpolation method is used for the improvement of the correction method proposed in the paper, and the general interpolation method is to fill the middle value of the probability by (N-1)-gram and (N-2)-gram. This method is based upon the same statistical corpus. In the proposed method, interpolation is performed using the frequency information between the statistical corpus and the correction document. The advantages of using frequency of correction documents are twofold. First, the probability of the coined word existing only in the correction document can be obtained. Second, even if there are two correction candidates with ambiguous probability values, the ambiguity is solved by correcting them by referring to the correction document. The method proposed in this thesis showed better precision and recall than the existing correction model.

Rule-based Speech Recognition Error Correction for Mobile Environment (모바일 환경을 고려한 규칙기반 음성인식 오류교정)

  • Kim, Jin-Hyung;Park, So-Young
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.10
    • /
    • pp.25-33
    • /
    • 2012
  • In this paper, we propose a rule-based model to correct errors in a speech recognition result in the mobile device environment. The proposed model considers the mobile device environment with limited resources such as processing time and memory, as follows. In order to minimize the error correction processing time, the proposed model removes some processing steps such as morphological analysis and the composition and decomposition of syllable. Also, the proposed model utilizes the longest match rule selection method to generate one error correction candidate per point, assumed that an error occurs. For the purpose of deploying memory resource, the proposed model uses neither the Eojeol dictionary nor the morphological analyzer, and stores a combined rule list without any classification. Considering the modification and maintenance of the proposed model, the error correction rules are automatically extracted from a training corpus. Experimental results show that the proposed model improves 5.27% on the precision and 5.60% on the recall based on Eojoel unit for the speech recognition result.

A Joint Statistical Model for Word Spacing and Spelling Error Correction Simultaneously (띄어쓰기 및 철자 오류 동시교정을 위한 통계적 모델)

  • Noh, Hyung-Jong;Cha, Jeong-Won;Lee, GaryGeun-Bae
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.2
    • /
    • pp.131-139
    • /
    • 2007
  • In this paper, we present a preprocessor which corrects word spacing errors and spelling correction errors simultaneously. The proposed expands noisy-channel model so that it corrects both errors in colloquial style sentences effectively, while preprocessing algorithms have limitations because they correct each error separately. Using Eojeol transition pattern dictionary and statistical data such as n-gram and Jaso transition probabilities, it minimizes the usage of dictionaries and produces the corrected candidates effectively. In experiments we did not get satisfactory results at current stage, we noticed that the proposed methodology has the utility by analyzing the errors. So we expect that the preprocessor will function as an effective error corrector for general colloquial style sentence by doing more improvements.

Discrete Wavelet Transform for Watermarking Three-Dimensional Triangular Meshes from a Kinect Sensor

  • Wibowo, Suryo Adhi;Kim, Eun Kyeong;Kim, Sungshin
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.14 no.4
    • /
    • pp.249-255
    • /
    • 2014
  • We present a simple method to watermark three-dimensional (3D) triangular meshes that have been generated from the depth data of the Kinect sensor. In contrast to previous methods, which maintain the shape of 3D triangular meshes and decide the embedding place, requiring calculations of vertices and their neighbors, our method is based on selecting one of the coordinate axes. To maintain shape, we use discrete wavelet transform and constant regularization. We know that the watermarking system needs the information to be embedded; we used a text to provide that information. We used geometry attacks such as rotation, scales, and translation, to test the performance of this watermarking system. Performance parameters in this paper include the vertices error rate (VER) and bit error rate (BER). The results from the VER and BER indicate that using a correction term before the extraction process makes our system robust to geometry attacks.

Automatic Error Correction System for Erroneous SMS Strings (SMS 변형된 문자열의 자동 오류 교정 시스템)

  • Kang, Seung-Shik;Chang, Du-Seong
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.6
    • /
    • pp.386-391
    • /
    • 2008
  • Some spoken word errors that violate grammatical or writing rules occurs frequently in communication environments like mobile phone and messenger. These unexpected errors cause a problem in a language processing system for many applications like speech recognition, text-to-speech translation, and so on. In this paper, we proposed and implemented an automatic correction system of ill-formed words and word spacing errors in SMS sentences that has been the major errors of poor accuracy. We experimented three methods of constructing the word correction dictionary and evaluated the results of those methods. They are (1) manual construction of error words from the vocabulary list of ill-formed communication languages, (2) automatic construction of error dictionary from the manually constructed corpus, and (3) context-dependent method of automatic construction of error dictionary.

Sequence-to-sequence Autoencoder based Korean Text Error Correction using Syllable-level Multi-hot Vector Representation (음절 단위 Multi-hot 벡터 표현을 활용한 Sequence-to-sequence Autoencoder 기반 한글 오류 보정기)

  • Song, Chisung;Han, Myungsoo;Cho, Hoonyoung;Lee, Kyong-Nim
    • Annual Conference on Human and Language Technology
    • /
    • 2018.10a
    • /
    • pp.661-664
    • /
    • 2018
  • 온라인 게시판 글과 채팅창에서 주고받는 대화는 실제 사용되고 있는 구어체 특성이 잘 반영된 텍스트 코퍼스로 음성인식의 언어 모델 재료로 활용하기 좋은 학습 데이터이다. 하지만 온라인 특성상 노이즈가 많이 포함되어 있기 때문에 학습에 직접 활용하기가 어렵다. 본 논문에서는 사용자 입력오류가 다수 포함된 문장에서의 한글 오류 보정을 위한 sequence-to-sequence Denoising Autoencoder 모델을 제안한다.

  • PDF

Adaptive Conversion of Web Content for Mobile Terminals (이동단말을 위한 적응적 웹 문서 변환)

  • Kang, Sueng-Chun;Chung, Kwang-Sue
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.6 no.6
    • /
    • pp.635-642
    • /
    • 2000
  • In this paper, we proposed an efficient document conversion mechanism to provide a adaptive web document to mobile terminals. We also proposed a RHTML(Reduced HTML) to archive the adaptive tag reduction. Markup error correction process in the proposed adaptive document conversion mechanism converts a HTML(HyperText Markup Language) document into a XML(Extensible Markup Language) application document. This. process makes web document easy to handle with a DOM (Document Object Mode)) as the tree model and removes the hardware overhead in mobile terminals. Also, tag reduction process provides the adaptive web document with three DTD(Document Type Definition)s in the RHTML.

  • PDF