• Title/Summary/Keyword: Edit distance


Context Based Real-time Korean Writing Correction for Foreigners (외국인 학습자를 위한 문맥 기반 실시간 국어 문장 교정)

  • Park, Young-Keun;Kim, Jae-Min;Lee, Seong-Dong;Lee, Hyun Ah
    • Journal of KIISE / v.44 no.10 / pp.1087-1093 / 2017
  • Korean language education for foreigners is attracting increasing attention as more foreigners want to learn Korean or reside in Korea. Existing spell checkers mostly target native Korean speakers, so they are poorly suited to foreign learners. In this paper, we propose a Korean correction method that reflects the contextual characteristics of Korean and the writing characteristics of foreigners. Our method builds a syllable reverse index over eojeol bigrams extracted from a corpus, uses it to retrieve expressions frequently used by Koreans as correction candidates, and ranks the candidates with an improved edit distance calculation. The system provides a user interface based on keyboard hooking, so a user can easily use it alongside other applications. In foreign-language writing environments, it improves the detection rate for foreign users by about 45% compared to other systems. This helps foreign users judge and correct their own writing errors.
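
To make the ranking step concrete, here is a minimal sketch of ranking correction candidates by plain edit distance. It uses the standard Levenshtein algorithm, not the modified calculation the abstract refers to, whose details are not given here. The function names `edit_distance` and `rank_candidates` and the sample words are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance (unit-cost insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def rank_candidates(typed: str, candidates: list[str]) -> list[str]:
    """Order candidates by distance to the typed text; ties break alphabetically."""
    return sorted(candidates, key=lambda c: (edit_distance(typed, c), c))


if __name__ == "__main__":
    # Hypothetical candidate list, for illustration only.
    print(rank_candidates("helo", ["world", "help", "hello"]))
```

A real corrector for Korean would likely operate on decomposed syllables (jamo) and use non-uniform costs; that is an assumption about typical practice, not something the abstract states.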

Sentence Similarity Measurement Method Using a Set-based POI Data Search (집합 기반 POI 검색을 이용한 문장 유사도 측정 기법)

  • Ko, EunByul;Lee, JongWoo
    • KIISE Transactions on Computing Practices / v.20 no.12 / pp.711-716 / 2014
  • With growing interest in plagiarism and intelligent file content search, the demand for measuring the similarity between two sentences is increasing. Sentence similarity measurement has been studied from various directions, such as n-gram, edit distance, and LSA, but each of these methods has its own advantages and disadvantages. In this paper, we propose a new sentence similarity measurement method that approaches the problem from another direction. The proposed method uses the set-based POI data search, which improves search performance over the existing hard matching method when data includes inversion, omission, insertion, and revision of characters. Using this method, we can measure the similarity between two sentences more accurately and more quickly. We modified the data loading and text search algorithms of the set-based POI data search, and added a word operation algorithm and a similarity measure between two sentences expressed as a percentage. Experimental results show that our sentence similarity measurement method performs better than n-gram and the set-based POI data search.
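
As a point of comparison, the n-gram baseline named in the abstract can be sketched as a percentage score. This is a generic character-bigram Jaccard similarity, assumed here for illustration. It is not the proposed set-based POI method, and the abstract does not say which n-gram variant was used. The function names are hypothetical.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Set of character n-grams, ignoring case and whitespace."""
    s = "".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity_percent(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of the two n-gram sets, scaled to 0-100."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 100.0  # both inputs too short to yield n-grams; treat as identical
    return 100.0 * len(ga & gb) / len(ga | gb)


if __name__ == "__main__":
    print(ngram_similarity_percent("night", "nacht"))
```

Because the score is computed on sets, repeated n-grams count once; a multiset (bag) variant would weight repetitions differently.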

Classification of Porcine Wasting Diseases Using Sound Analysis

  • Gutierrez, W.M.;Kim, S.;Kim, D.H.;Yeon, S.C.;Chang, H.H.
    • Asian-Australasian Journal of Animal Sciences / v.23 no.8 / pp.1096-1104 / 2010
  • This bio-acoustic study aimed to classify porcine wasting diseases through sound analysis, with emphasis on how the acoustic footprints of coughs in pigs infected with porcine circovirus type 2 (PCV2), porcine reproductive and respiratory syndrome (PRRS) virus, and Mycoplasma hyopneumoniae (MH) differ from a normal cough. A total of 36 pigs (Yorkshire × Landrace × Duroc) with average weights of 25-30 kg were studied, and blood samples of the suspected infected pigs were collected and subjected to serological analysis to determine PCV2, PRRS and MH. Sounds emitted by coughing pigs were recorded individually for 30 minutes, depending on cough attacks, by a digital camcorder placed within one meter of the animal. Recorded signals were digitized on a PC using the Cool Edit program, classified through a labeling method, and analyzed by one-way analysis of variance and discriminant analysis. Input features after classification showed that the normal cough had the highest pitch level compared to the other infectious diseases (p<0.002), but it was not statistically different from PRRS and MH. PCV2 differed statistically (p<0.002) from the normal cough and PRRS but not from MH. MH had the highest intensity, and all coughs differed statistically from each other (p<0.0001). PCV2 was statistically different from the others (p<0.0001) in formants 1, 2, 3 and 4. There was no statistical difference in duration between the different porcine diseases and the normal cough (p>0.6863). Mechanisms of cough sound creation in the airway could be used to explain these observed acoustic differences, and these findings indicate that acoustically different cough patterns depend on the cause of the cough or the condition of the animal's respiratory system. In conclusion, differences in lung status result in different cough sounds. Finally, this study could be useful in supporting an early detection method based on the on-line cough counter algorithm for the initial diagnosis of sick animals in breeding farms.

A Comparative Analysis of Content-based Music Retrieval Systems (내용기반 음악검색 시스템의 비교 분석)

  • Ro, Jung-Soon
    • Journal of the Korean Society for Information Management / v.30 no.3 / pp.23-48 / 2013
  • This study compared and analyzed 15 CBMR (Content-based Music Retrieval) systems accessible on the web in terms of DB size and type, query type, access point, input and output type, and search functions. It also reviewed the features of music information and the techniques used in CBMR systems for transforming or transcribing music sources, extracting and segmenting melodies, extracting and indexing music features, and matching. The analysis found the following: text information retrieval techniques such as inverted indexing, N-gram indexing, Boolean search, truncation, keyword and phrase search, normalization, filtering, browsing, exact matching, similarity measures using edit distance, and sorting were applied to enhance CBMR; efforts were made to increase DB size and usability; and problems remained in extracting melodies, deleting stop notes in queries, and using solfege as pitch information.

Construction of Linearly Aligned Corpus Using Unsupervised Learning (자율 학습을 이용한 선형 정렬 말뭉치 구축)

  • Lee, Kong-Joo;Kim, Jae-Hoon
    • The KIPS Transactions: Part B / v.11B no.3 / pp.387-394 / 2004
  • In this paper, we propose a modified unsupervised linear alignment algorithm for building an aligned corpus. The original algorithm inserts null characters into both of the aligned strings (the source string and the target string) because the two strings differ in length. For applications that use the aligned corpus, these null characters can cause difficulties such as search space explosion and the inability to apply several machine learning algorithms. To alleviate these difficulties, we modify the algorithm so that the aligned source strings contain no null characters. We show the usability of our approach by applying it to different areas: Korean-English back-transliteration, English grapheme-phoneme conversion, and Korean morphological analysis.

Topic Similarity-based Event Routing Algorithm for Wireless Ad-Hoc Publish/Subscribe Systems (Ad-Hoc 무선 환경의 발행/구독 시스템을 위한 구독주제 유사도 기반의 이벤트 라우팅 알고리즘)

  • Nguyen, Hieu Trung;Oh, Sang-Yoon
    • Journal of the Korea Society of Computer and Information / v.14 no.10 / pp.11-22 / 2009
  • In a wireless ad-hoc network, the event routing algorithm of a publish/subscribe system is especially important for system performance because of the network's dynamic and constrained nature. In this paper, we propose a new hybrid event routing algorithm, TopSim, for efficient publish/subscribe systems on wireless ad-hoc networks. TopSim extends the ShopParent algorithm by considering not only network overhead when choosing a Parent in the publish/subscribe tree, but also topic similarity, that is, the closeness of subscriptions. Our evaluation shows that TopSim performs better when a newly joining node subscribes to multiple topics and some node among the Parent candidates subscribes to topics in that list (related topics).

A study on the Construction of Remote Status Display Software for Soft-RAID system of Linux based Server (리눅스 기반 서버의 소프트-RAID 시스템용 원격 상태 표시 소프트웨어의 구성에 관한 연구)

  • Na, Won-Shik;Lee, Hyun-Chang
    • Journal of Software Assessment and Valuation / v.15 no.1 / pp.97-102 / 2019
  • In this paper, we propose a method for remotely and intuitively identifying faults in the storage devices of a soft-RAID used in a Linux-based server system. To do this, we analyze the principle and problems of the fault reporting method in the soft-RAID system of the Linux OS and propose displaying the state of the storage devices on a remote Internet home page. The proposed method displays images on the home page, so they can be arranged freely when the page is created, and the image data consists of external files, so it is very convenient to edit and replace images. To verify the effectiveness of the proposed method, we confirmed that the state of each storage device can be checked remotely without any major addition to the home page configuration.

Verification of the Usefulness of the Mock TOEIC Test using Corpus Indices : Focusing on the Analysis of Difficulty and Discrimination (코퍼스 지표를 활용한 모의 토익시험의 유용성 검증 : 난이도와 변별도 분석을 중심으로)

  • Lee, Yena
    • The Journal of the Korea Contents Association / v.21 no.10 / pp.576-593 / 2021
  • To investigate the factors that affect the percentage of correct answers and the degree of discrimination of the TOEIC test, this study performed a regression analysis using corpus indicators that influence the correct answer rate and the degree of discrimination for each part, as derived from the item analysis. The basic calculation word_length, consistency index LSA_overlap_adjacent_sentences, lexical diversity MTLD_VOCD, conjunction All_logical_causal_connectives_incidence, situational model casual_particles_causal_verbs_Ratio, syntactic complexity Left_embeddedness, and syntactic pattern density Infinitive_density were found to have negative effects. These factors, which lower the correct answer rate, can be utilized when setting learning goals. Vocabulary diversity index MTLD_VOCD, conjunction Additive_connectives_incidence, syntactic pattern density Infinitive_density, and lexical information person1_2_pronoun_incidence were found to have a positive effect. These factors, which increase discrimination, may provide important information for developing a learning program.