• Title/Summary/Keyword: Korean Spacing-error Correction


A Joint Statistical Model for Word Spacing and Spelling Error Correction Simultaneously (띄어쓰기 및 철자 오류 동시교정을 위한 통계적 모델)

  • Noh, Hyung-Jong;Cha, Jeong-Won;Lee, Gary Geun-Bae
    • Journal of KIISE:Software and Applications / v.34 no.2 / pp.131-139 / 2007
  • In this paper, we present a preprocessor that corrects word spacing errors and spelling errors simultaneously. The proposed method extends the noisy-channel model so that it corrects both kinds of error effectively in colloquial-style sentences, whereas existing preprocessing algorithms are limited because they correct each kind of error separately. Using an Eojeol transition pattern dictionary and statistical data such as n-gram and Jaso transition probabilities, it minimizes the use of dictionaries and generates correction candidates effectively. Although the experiments did not yield satisfactory results at the current stage, an analysis of the errors indicated that the proposed methodology is useful. We therefore expect that, with further improvements, the preprocessor will function as an effective error corrector for general colloquial-style sentences.
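The noisy-channel idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the probability tables below are hypothetical toy values, and the Eojeol transition dictionary and Jaso transition probabilities from the paper are not modeled. The sketch only shows the general decision rule, which picks the candidate c that maximizes P(c) · P(observed | c).

```python
import math

# Hypothetical toy probabilities, for illustration only.
# LM_PROB approximates P(candidate); a real system would use n-gram statistics.
LM_PROB = {
    "나는 학교에 간다": 0.5,
    "나는학교에 간다": 0.01,
}
# CHANNEL_PROB approximates P(observed | candidate), keyed by (observed, candidate).
CHANNEL_PROB = {
    ("나는학교에 간다", "나는 학교에 간다"): 0.2,
    ("나는학교에 간다", "나는학교에 간다"): 0.7,
}


def score(observed, candidate):
    """Log of P(candidate) * P(observed | candidate)."""
    return math.log(LM_PROB[candidate]) + math.log(CHANNEL_PROB[(observed, candidate)])


def best_correction(observed, candidates):
    """Return the candidate with the highest noisy-channel score."""
    return max(candidates, key=lambda c: score(observed, c))


observed = "나는학교에 간다"
print(best_correction(observed, list(LM_PROB)))
```

With these toy numbers the spaced candidate wins, because its higher language-model probability outweighs the lower channel probability.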

An Implementation of a Lightweight Spacing-Error Correction System for Korean (한국어 경량형 띄어쓰기 교정 시스템의 구현)

  • Song, Yeong-Kil;Kim, Hark-Soo
    • The Journal of Korean Association of Computer Education / v.12 no.2 / pp.87-96 / 2009
  • We propose a Korean spacing-error correction system that requires little memory even though it combines rule-based and statistical methods. In addition, to make the model robust on mobile colloquial sentences, in which spelling errors and omissions of function words occur frequently, we propose a method that automatically transforms a typical colloquial corpus into a mobile colloquial corpus. The system first uses statistical information on syllable unigrams to increase coverage of new syllable patterns. It then applies error correction rules over syllable n-grams (n ≥ 2) to increase accuracy. In experiments on automatically generated mobile colloquial sentences, the system achieved a relatively high accuracy of 92.10% (93.80% on a typical colloquial corpus and 94.07% on a typical balanced corpus) despite using only about 1 MB of memory.
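The two-stage approach described in this abstract (syllable unigram statistics first, then n-gram correction rules) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the published system: the function names, the 0.5 threshold, the rule format (a bigram mapped to a forced space/no-space decision), and the tiny training corpus are all hypothetical.

```python
from collections import Counter


def train_unigram(corpus):
    """For each syllable, estimate how often a space follows it in the corpus."""
    total, spaced = Counter(), Counter()
    for sentence in corpus:
        for i, ch in enumerate(sentence):
            if ch == " ":
                continue
            total[ch] += 1
            if i + 1 < len(sentence) and sentence[i + 1] == " ":
                spaced[ch] += 1
    return {ch: spaced[ch] / total[ch] for ch in total}


def insert_spaces(text, space_prob, rules, threshold=0.5):
    """Re-space text: unigram decision first, overridden by bigram rules if present."""
    chars = [c for c in text if c != " "]
    out = []
    for i, ch in enumerate(chars):
        out.append(ch)
        if i == len(chars) - 1:
            break
        bigram = ch + chars[i + 1]
        # Hypothetical rule format: bigram -> True (force space) / False (forbid space).
        decision = rules.get(bigram, space_prob.get(ch, 0.0) > threshold)
        if decision:
            out.append(" ")
    return "".join(out)


corpus = ["나는 학교에 간다", "너는 집에 간다"]  # toy training data
probs = train_unigram(corpus)
print(insert_spaces("나는집에간다", probs, rules={}))
print(insert_spaces("나는집에간다", probs, rules={"에간": False}))
```

The unigram table keeps the model small, which matches the low-memory motivation in the abstract, while the rule table corrects cases where single-syllable statistics are misleading.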


A Spelling Error Correction Model in Korean Using a Correction Dictionary and a Newspaper Corpus (교정사전과 신문기사 말뭉치를 이용한 한국어 철자 오류 교정 모델)

  • Lee, Se-Hee;Kim, Hark-Soo
    • The KIPS Transactions:PartB / v.16B no.5 / pp.427-434 / 2009
  • With the rapid evolution of the Internet and mobile environments, text containing spelling errors such as newly coined words and abbreviated words is widely used. These spelling errors make it difficult to develop NLP (natural language processing) applications because they decrease the readability of texts. To resolve this problem, we propose a spelling error correction model that uses a spelling error correction dictionary and a newspaper corpus. The proposed model has the advantage that the cost of data construction is low, because it uses a newspaper corpus, which is easy to obtain, as its training corpus. In addition, the model does not require external modules such as a morphological analyzer or a word-spacing error correction system, because it uses a simple string matching method based on a correction dictionary. In experiments with a newspaper corpus and a short message corpus collected from real mobile phones, the proposed model showed good performance on various evaluation measures (a miscorrection rate of 7.3%, an F1-measure of 97.3%, and a false positive rate of 1.1%).
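The dictionary-based string matching mentioned in this abstract might look like the following minimal Python sketch. It assumes a greedy longest-match replacement strategy and uses hypothetical dictionary entries; the paper's use of newspaper-corpus statistics to decide when a replacement is appropriate is not modeled here.

```python
def correct(text, correction_dict):
    """Replace known error strings with their corrections, preferring longer matches."""
    keys = sorted(correction_dict, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(correction_dict[key])
                i += len(key)
                break
        else:
            # No dictionary entry starts here; copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical correction dictionary: error form -> standard form.
corrections = {"안뇽": "안녕", "추카": "축하"}
print(correct("안뇽 생일 추카", corrections))
```

Because matching works directly on character strings, no morphological analysis or prior spacing correction is needed, which is the advantage the abstract points to.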

Classification and analysis of error types for deep learning-based Korean spelling correction (딥러닝 기반 한국어 맞춤법 교정을 위한 오류 유형 분류 및 분석)

  • Koo, Seonmin;Park, Chanjun;So, Aram;Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.12 no.12 / pp.65-74 / 2021
  • Recently, studies on Korean spelling correction based on machine translation and automatic noise generation have been actively conducted. These methods generate noise and use the resulting data as training and test sets. This approach is limited in that performance is difficult to measure accurately, because the test set is unlikely to contain noise types other than those used for training. In addition, there is no practical standard for error types, so the error types used differ from study to study, making qualitative analysis difficult. This paper proposes a new error type classification for deep learning-based Korean spelling correction research and performs error analysis on existing commercial Korean spelling correctors (Systems A, B, and C). The analysis found that the three correction systems performed poorly on the error types presented in this paper other than spacing, and hardly recognized errors in word order or tense.

Automatic Error Correction System for Erroneous SMS Strings (SMS 변형된 문자열의 자동 오류 교정 시스템)

  • Kang, Seung-Shik;Chang, Du-Seong
    • Journal of KIISE:Software and Applications / v.35 no.6 / pp.386-391 / 2008
  • Spoken-style word errors that violate grammatical or writing rules occur frequently in communication environments such as mobile phones and messengers. These unexpected errors cause problems for language processing systems in applications such as speech recognition and text-to-speech translation. In this paper, we propose and implement a system that automatically corrects ill-formed words and word spacing errors in SMS sentences, which are the major causes of poor accuracy. We experimented with three methods of constructing the word correction dictionary and evaluated the results: (1) manual construction of error words from the vocabulary list of ill-formed communication language, (2) automatic construction of an error dictionary from a manually constructed corpus, and (3) context-dependent automatic construction of an error dictionary.
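Method (2) in this abstract, automatic construction of an error dictionary from a manually constructed corpus, could be sketched as below. This is an illustration under strong assumptions rather than the authors' procedure: the corpus is assumed to be a list of (noisy, corrected) sentence pairs with the same number of space-separated words, so spacing errors themselves are out of scope, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def build_error_dictionary(aligned_pairs):
    """Map each error word to its most frequent correction in the aligned corpus."""
    counts = defaultdict(Counter)
    for noisy, clean in aligned_pairs:
        noisy_words, clean_words = noisy.split(), clean.split()
        if len(noisy_words) != len(clean_words):
            continue  # skip pairs this simple word-by-word alignment cannot handle
        for n_word, c_word in zip(noisy_words, clean_words):
            if n_word != c_word:
                counts[n_word][c_word] += 1
    return {word: corr.most_common(1)[0][0] for word, corr in counts.items()}


# Hypothetical manually corrected corpus: (SMS sentence, corrected sentence).
pairs = [
    ("안뇽 친구", "안녕 친구"),
    ("안뇽 하세요", "안녕 하세요"),
]
print(build_error_dictionary(pairs))
```

A context-dependent variant, as in method (3), would key the counts on the error word together with its neighboring words instead of the word alone.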

State space disturbance observer based controller design for self servo writing (셀프 서보 라이팅을 위한 상태공간 외란 관측기 기반의 제어기 설계)

  • Jung, Youn-Sung;Kang, Hyun-Jae;Lee, Choong-Woo;Chung, Chung-Choo;Cho, Kyu-Nam;Suh, Sang-Min;Oh, Dong-Ho
    • Proceedings of the KIEE Conference / 2007.10a / pp.129-130 / 2007
  • Self servo track writing (SSTW) is a method of recording servo tracks using the internal VCM of a hard disk drive, without a servo track writer (STW). Because SSTW writes each track using the previous servo track as a relative reference, the error grows rapidly under the influence of initial errors and external disturbances. This is called radial error propagation. In this paper, we design a correction signal to suppress radial error propagation and, to remove the effect of disturbances arising during servo writing, design a tracking controller with a disturbance observer (DOB) configured as an add-on. We also design a controller with a gain margin, phase margin, and sensitivity function similar to those of the DOB-based design and compare their performance. Simulations verify that the proposed method not only suppresses radial error propagation but also minimizes the disturbance, improving the DC track spacing and AC track squeeze of the written tracks.


The Evaluation of Quantitative Accuracy According to Detection Distance in SPECT/CT Applied to Collimator Detector Response(CDR) Recovery (Collimator Detector Response(CDR) 회복이 적용된 SPECT/CT에서 검출거리에 따른 정량적 정확성 평가)

  • Kim, Ji-Hyeon;Son, Hyeon-Soo;Lee, Juyoung;Park, Hoon-Hee
    • The Korean Journal of Nuclear Medicine Technology / v.21 no.2 / pp.55-64 / 2017
  • Purpose: With the recent spread of SPECT/CT, various image correction methods can be applied quickly and accurately, so improvements in quantitative accuracy as well as image quality can be expected. Among them, Collimator Detector Response (CDR) recovery is a correction method that aims at resolution recovery by compensating for the blurring caused by the distance between the detector and the object. The purpose of this study is to determine how quantitative accuracy changes with detection distance in SPECT/CT images with CDR recovery applied. Materials and Methods: To determine the error in acquired counts as a function of detection distance, we set the detection distance by orbit type: X- and Y-axis radii of 30 cm for circular; X- and Y-axis radii of 21 cm and 10 cm for non-circular; and non-circular auto (auto body contouring, ABC, spacing limit 1 cm). Images were reconstructed with Astonish (3D-OSEM with CDR recovery) and OSEM (without CDR recovery) to determine the difference in activity recovery attributable to CDR recovery. Attenuation correction, scatter correction, and decay correction were applied to all images. For quantitative evaluation, a calibration scan (cylindrical phantom, $^{99m}TcO_4$ 123.3 MBq, water 9293 ml) was acquired to calculate the calibration factor (CF). For the phantom scan, a 50 cc syringe was filled with 31 ml of water and a phantom image was acquired with $^{99m}TcO_4$ 123.3 MBq. We drew a volume of interest (VOI) covering the entire syringe in the phantom image, measured the total counts for each condition, and used the CF to compute the error of the measured value relative to the true value. Results: The calculated CF was 154.28 (Bq/ml/cps/ml). The measured values relative to the true values were 87.5% (circular), 90.1% (non-circular), and 91.3% (ABC) with OSEM, and 93.6% (circular), 93.6% (non-circular), and 93.9% (ABC) with Astonish. With OSEM, accuracy increased as the detection distance decreased, whereas Astonish showed nearly the same values regardless of distance. The error was largest for OSEM circular (-13.5%) and smallest for Astonish ABC (-6.1%). Conclusion: When distance compensation is applied through CDR recovery, SPECT/CT images acquired at a large detection distance show almost the same quantitative accuracy as those acquired in close proximity, so accurate correction is possible without being affected by changes in detection distance.


Word Spacing Error Correction for the Postprocessing of Speech Recognition (음성 인식 후처리를 위한 띄어쓰기 오류의 교정)

  • Lim, Dong-Hee;Kang, Seung-Shik;Chang, Du-Seong
    • Proceedings of the Korean Information Science Society Conference / 2006.06b / pp.25-27 / 2006
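  • Speech recognition output contains word spacing errors, which make subsequent information processing of the output difficult. To correct spacing errors in speech recognition output, this paper proposes a correction method that uses an Eojeol recombination technique based on part-of-speech information as its base algorithm and additionally uses syllable bigram and 4-gram information. We also conducted comparative experiments on recognizer output with and without part-of-speech tags. When tags were absent, part-of-speech information was restored using a dictionary. We confirmed that applying N-gram statistics improves spacing accuracy compared with using the basic Eojeol recombination algorithm alone.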


Implementing the Urban Effect in an Interpolation Scheme for Monthly Normals of Daily Minimum Temperature (도시효과를 고려한 일 최저기온의 월별 평년값 분포 추정)

  • Choi, Jae-Yeon;Yun, Jin-Il
    • Korean Journal of Agricultural and Forest Meteorology / v.4 no.4 / pp.203-212 / 2002
  • This study was conducted to remove the urban heat island effects embedded in the interpolated surfaces of daily minimum temperature in the Korean Peninsula. Fifty-six standard weather stations are usually used to generate the gridded temperature surface in South Korea. Since most of the weather stations are located in heavily populated and urbanized areas, the observed minimum temperature data are contaminated with the so-called urban heat island effect. Without an appropriate correction, temperature estimates over rural areas or forests might deviate significantly from the actual values. We simulated the spatial pattern of population distribution within any single population reporting district (city or county) by allocating the reported population to the "urban" pixels of a land cover map with a 30 by 30 m spacing. By using this "digital population model" (DPM), we can simulate the horizontal diffusion of the urban effect, which is not possible with the spatially discontinuous population statistics for each city or county. The temperature estimation error from the existing interpolation scheme, which considers both the distance and the altitude effects, was regressed on the DPMs smoothed at 5 different scales, i.e., radial extents of 0.5, 1.5, 2.5, 3.5 and 5.0 km. Optimum regression models were used in conjunction with the distance-altitude interpolation to predict monthly normals of daily minimum temperature in South Korea for the 1971-2000 period. Cross validation showed around a 50% reduction in RMSE and MAE over all months compared with the conventional method.

A Spatial Interpolation Model for Daily Minimum Temperature over Mountainous Regions (산악지대의 일 최저기온 공간내삽모형)

  • Yun, Jin-Il;Choi, Jae-Yeon;Yoon, Young-Kwan;Chung, Uran
    • Korean Journal of Agricultural and Forest Meteorology / v.2 no.4 / pp.175-182 / 2000
  • Spatial interpolation of daily temperature forecasts and observations issued by public weather services is frequently required to make them applicable to agricultural activities and modeling tasks. In contrast to long-term averages such as monthly normals, terrain effects are not considered in most spatial interpolations of short-term temperatures. This may cause erroneous results in mountainous regions, where the observation network hardly covers the full features of the complicated terrain. We developed a spatial interpolation model for daily minimum temperature that combines inverse distance squared weighting and elevation difference correction. This model uses a time-dependent function for the 'mountain slope lapse rate', which can be derived from regression analyses of the station observations with respect to the geographical and topographical features of the surroundings, including the station elevation. We applied this model to the interpolation of daily minimum temperature over the mountainous Korean Peninsula using data from 63 standard weather stations. As the first step, a primitive temperature surface was interpolated by inverse distance squared weighting of the 63 point data. Next, a virtual elevation surface was reconstructed by spatially interpolating the 63 station elevations and was subtracted from the elevation surface of a digital elevation model with 1 km grid spacing to obtain the elevation difference at each grid cell. Final estimates of daily minimum temperature at all grid cells were obtained by applying the calculated daily lapse rate to the elevation difference and adjusting the inverse distance weighted estimates. Independent measured data sets from 267 automated weather station locations were used to calculate the estimation errors on 12 dates, one randomly selected for each month of 1999. Analysis of three measures of estimation error (mean error, mean absolute error, and root mean squared error) indicates a substantial improvement over inverse distance squared weighting alone.
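The interpolation steps described in this abstract can be sketched for a single grid cell as follows. This is a minimal illustration, not the published model: the function names, the station tuple layout, and the sign convention (lapse rate in °C per metre, negative when temperature falls with height) are assumptions, and the lapse rate is passed in as a constant rather than derived from the paper's time-dependent regression.

```python
def idw_squared(x, y, points):
    """Inverse distance squared weighting; points is a list of (px, py, value)."""
    num = den = 0.0
    for px, py, value in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        if d2 == 0.0:
            return value  # the target coincides with an observation point
        weight = 1.0 / d2
        num += weight * value
        den += weight
    return num / den


def estimate_tmin(x, y, dem_elevation, stations, lapse_rate):
    """Estimate daily minimum temperature at one grid cell.

    stations: list of (sx, sy, elevation_m, tmin_c); lapse_rate in deg C per metre
    (assumed sign convention: negative when temperature decreases with height).
    """
    # Step 1: primitive temperature surface from station temperatures.
    t_idw = idw_squared(x, y, [(sx, sy, t) for sx, sy, _, t in stations])
    # Step 2: virtual elevation surface from station elevations.
    z_virtual = idw_squared(x, y, [(sx, sy, z) for sx, sy, z, _ in stations])
    # Step 3: adjust by the lapse rate times the DEM-minus-virtual elevation difference.
    return t_idw + lapse_rate * (dem_elevation - z_virtual)


# Hypothetical stations: (x_km, y_km, elevation_m, tmin_c).
stations = [(0.0, 0.0, 100.0, 10.0), (2.0, 0.0, 300.0, 6.0)]
print(estimate_tmin(1.0, 0.0, dem_elevation=400.0, stations=stations, lapse_rate=-0.005))
```

In this toy case the grid cell lies midway between the two stations, so the weighted temperature is 8 °C and the virtual elevation is 200 m; the cell sits 200 m above that, so the estimate is lowered by about 1 °C.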
