• Title/Summary/Keyword: edit distance algorithm

Sentence Similarity Measurement Method Using a Set-based POI Data Search (집합 기반 POI 검색을 이용한 문장 유사도 측정 기법)

  • Ko, EunByul; Lee, JongWoo
    • KIISE Transactions on Computing Practices / v.20 no.12 / pp.711-716 / 2014
  • With growing interest in plagiarism detection and intelligent file content search, the demand for measuring the similarity between two sentences is increasing. Sentence similarity measurement has been studied from various directions, such as n-gram, edit distance, and LSA, and each of these methods has its own advantages and disadvantages. In this paper, we propose a new sentence similarity measurement method that approaches the problem from a different direction. The proposed method uses the set-based POI data search, which improves search performance over the existing hard matching method when the data includes inversion, omission, insertion, and revision of characters. Using this method, we are able to measure the similarity between two sentences more accurately and more quickly. We modified the data loading and text search algorithms of the set-based POI data search, and added a word operation algorithm and a similarity measure between two sentences expressed as a percentage. The experimental results show that our sentence similarity measurement method performs better than n-gram and the set-based POI data search.
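
For orientation, the two baseline families the abstract names (edit distance and n-gram) can be sketched as percentage similarity scores. This is a minimal illustration only, not the set-based POI method the paper proposes: the function names, the normalization by the longer string, and the Jaccard overlap on character bigrams are all assumptions chosen for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to a percentage (assumed normalization:
    divide by the length of the longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of character n-gram sets, as a percentage."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        return 100.0
    return 100.0 * len(grams_a & grams_b) / len(grams_a | grams_b)
```

Both scores work on raw characters, so a pair of sentences with reordered words can score low under either one, which is the kind of trade-off the abstract alludes to.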

Construction of Linearly Aligned Corpus Using Unsupervised Learning (자율 학습을 이용한 선형 정렬 말뭉치 구축)

  • Lee, Kong-Joo; Kim, Jae-Hoon
    • The KIPS Transactions:PartB / v.11B no.3 / pp.387-394 / 2004
  • In this paper, we propose a modified unsupervised linear alignment algorithm for building an aligned corpus. The original algorithm inserts null characters into both of the aligned strings (the source string and the target string), because the two strings differ in length. This causes difficulties for applications that use the aligned corpus: null characters can make the search space explode, and they prevent several machine learning algorithms from being applied. To alleviate these difficulties, we modify the algorithm so that the aligned source strings contain no null characters. We show the usability of our approach by applying it to different areas such as Korean-English back-transliteration, English grapheme-phoneme conversion, and Korean morphological analysis.
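
To make the role of null characters concrete, the sketch below pads two strings to equal length with a null symbol, using a plain edit-distance traceback. It is not the authors' unsupervised algorithm, which the abstract does not specify; the `NULL` symbol, the function name, and the unit costs are assumptions for illustration.

```python
NULL = "_"  # hypothetical null symbol; assumed not to occur in the inputs


def align_with_nulls(source: str, target: str) -> tuple[str, str]:
    """Align two strings position by position, inserting NULL into
    either side so that both aligned strings have the same length."""
    m, n = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)

    # Trace back from the bottom-right corner, preferring diagonal moves.
    src, tgt = [], []
    i, j = m, n
    while i > 0 or j > 0:
        diag_cost = 1
        if i > 0 and j > 0 and source[i - 1] == target[j - 1]:
            diag_cost = 0
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + diag_cost:
            src.append(source[i - 1])
            tgt.append(target[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            src.append(source[i - 1])
            tgt.append(NULL)  # null on the target side
            i -= 1
        else:
            src.append(NULL)  # null on the source side
            tgt.append(target[j - 1])
            j -= 1
    return "".join(reversed(src)), "".join(reversed(tgt))
```

Under these assumptions, `align_with_nulls("abc", "ac")` yields `("abc", "a_c")`. When the target is longer than the source, the null lands in the source string instead, which is the case the modified algorithm in the abstract is designed to avoid.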

Classification of Porcine Wasting Diseases Using Sound Analysis

  • Gutierrez, W.M.; Kim, S.; Kim, D.H.; Yeon, S.C.; Chang, H.H.
    • Asian-Australasian Journal of Animal Sciences / v.23 no.8 / pp.1096-1104 / 2010
  • This bio-acoustic study aimed to classify different porcine wasting diseases through sound analysis, with emphasis on how the acoustic footprints of coughs in pigs infected with porcine circovirus type 2 (PCV2), porcine reproductive and respiratory syndrome (PRRS) virus, and Mycoplasma hyopneumoniae (MH) differ from a normal cough. A total of 36 pigs (Yorkshire × Landrace × Duroc) with average weights ranging from 25 to 30 kg were studied, and blood samples of the suspected infected pigs were collected and subjected to serological analysis to test for PCV2, PRRS and MH. Sounds emitted by coughing pigs were recorded individually for 30 minutes, depending on cough attacks, by a digital camcorder placed within one meter of the animal. Recorded signals were digitized on a PC using the Cool Edit Program, classified through a labeling method, and analyzed by one-way analysis of variance and discriminant analysis. After classification, the input features showed that the normal cough had the highest pitch level among the groups (p<0.002), although it was not statistically different from PRRS and MH. PCV2 differed statistically (p<0.002) from the normal cough and PRRS but not from MH. MH had the highest intensity, and all coughs differed statistically from each other (p<0.0001). PCV2 was statistically different from the others (p<0.0001) in formants 1, 2, 3 and 4. There was no statistical difference in duration between the different porcine diseases and the normal cough (p>0.6863). Mechanisms of cough sound creation in the airway could explain these observed acoustic differences, and the findings indicate that acoustically different cough patterns depend on the cause of the cough or on the condition of the animal's respiratory system. In conclusion, differences in the status of the lungs result in different cough sounds. Finally, this study could be useful in supporting an early detection method based on the on-line cough counter algorithm for the initial diagnosis of sick animals in breeding farms.