• Title/Abstract/Keywords: n-gram similarity

Search results: 32 items (processing time: 0.023 s)

Score Image Retrieval to Inaccurate OMR performance

  • Kim, Haekwang
    • 방송공학회논문지
    • /
    • Vol. 26, No. 7
    • /
    • pp.838-843
    • /
    • 2021
  • This paper presents an algorithm for effectively retrieving score information for an input score image. The originality of the proposed algorithm is that it is designed to be robust to recognition errors made by OMR (Optical Music Recognition), whereas existing methods such as the pitch histogram require the error-laden OMR result to be corrected before the retrieval process. This approach lets people retrieve scores without the music-score training needed for error correction. OMR takes a score image as input, recognizes musical symbols, and produces a structured symbolic notation of the score as output, for example in MusicXML format. Among the musical symbols on a score, filled noteheads are observed to be rarely misdetected, because their simple black-filled round shape is easy for OMR to process. Barlines, which separate measures, are also robust to OMR errors because they are long vertical lines of uniform length. The proposed algorithm consists of a descriptor for a score and a similarity measure between a query score and a reference score. The descriptor is based on the note-count, the number of filled noteheads in a measure. Each part of a score is represented by a sequence of note-count numbers, and the descriptor is the n-gram sequence of that note-count sequence. Simulation results show that the proposed algorithm works successfully to a certain degree in score-image-based retrieval from erroneous OMR output.
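
The note-count descriptor described in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure the paper uses, so the multiset Jaccard overlap below is an assumption, as are the function names `ngrams` and `ngram_similarity` and the choice of n = 3.

```python
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_similarity(query_counts, ref_counts, n=3):
    """Multiset Jaccard overlap between two note-count sequences.

    Each input lists the number of filled noteheads per measure.
    Assumption: the paper's actual similarity measure may differ.
    """
    q = Counter(ngrams(query_counts, n))
    r = Counter(ngrams(ref_counts, n))
    if not q or not r:
        return 0.0
    shared = sum((q & r).values())  # n-grams present in both descriptors
    total = sum((q | r).values())   # n-grams present in either descriptor
    return shared / total


# Hypothetical note-counts per measure for a reference score and a short query.
reference = [4, 3, 5, 2, 4, 4, 3]
query = [3, 5, 2, 4]
score = ngram_similarity(query, reference)
```

Because one misread measure only disturbs the n-grams that contain it, the remaining n-grams still match, which is the kind of tolerance to OMR errors the abstract argues for.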

Modern Methods of Text Analysis as an Effective Way to Combat Plagiarism

  • Myronenko, Serhii;Myronenko, Yelyzaveta
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 8
    • /
    • pp.242-248
    • /
    • 2022
  • The article analyzes modern methods for automatically comparing original and unoriginal text to detect textual plagiarism. The study covers two types of plagiarism: literal, in which plagiarists copy the text exactly without changing anything, and intelligent, which uses more sophisticated techniques that are harder to detect because the text is manipulated, for example by replacing words and symbols. Standard techniques for extrinsic detection are string-based, vector-space and semantic-based. The first, most common and most successful target models for detecting literal plagiarism, N-gram and Vector Space, are analyzed, and their advantages and disadvantages are evaluated. The most effective target models for detecting intelligent plagiarism, particularly for identifying paraphrases by measuring the semantic similarity of short text components, are investigated. Models that use neural network architectures and are based on natural language sentence matching approaches, such as the Densely Interactive Inference Network (DIIN), Bilateral Multi-Perspective Matching (BiMPM) and Bidirectional Encoder Representations from Transformers (BERT) and its family of models, are considered. Progress in improving plagiarism detection systems, techniques and related models is summarized. Relevant and urgent problems that remain unresolved in detecting intelligent plagiarism, namely effective recognition of unoriginal ideas and of well-paraphrased text, are outlined.
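
The two literal-plagiarism models named in this abstract can be contrasted with a small sketch. It is an illustration only: the tokenizer, the word-trigram containment score, and the raw term-frequency cosine are assumptions, not the specific variants the article evaluates.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_ngrams(tokens, n):
    """Return the set of contiguous word n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspect, source, n=3):
    """Share of the suspect's word n-grams that also occur in the source."""
    s = word_ngrams(tokenize(suspect), n)
    o = word_ngrams(tokenize(source), n)
    return len(s & o) / len(s) if s else 0.0


def cosine(a, b):
    """Cosine similarity of raw term-frequency vectors (vector space model)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


source = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps"
reworded = "a fast brown fox leaps over a lazy dog"
```

Here `containment(copied, source)` is high because every trigram is reused, while the reworded sentence shares no trigram with the source and only the looser `cosine` score still registers the overlap in vocabulary. This is the weakness of string-based models against word replacement that the abstract points to.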

ISOLATION, IDENTIFICATION AND CHARACTERIZATION OF AN IMMOBILIZED BACTERIUM PRODUCING N2 FROM NH4+ UNDER AN AEROBIC CONDITION

  • Park, Kyoung-Joo;Cho, Kyoung-Sook;Kim, Jeong-Bo;Lee, Min-Gyu;Lee, Byung-Hun;Hong, Young-Ki;Kim, Joong-Kyun
    • Environmental Engineering Research
    • /
    • Vol. 10, No. 5
    • /
    • pp.213-226
    • /
    • 2005
  • To treat wastewater efficiently by a one-step process of nitrogen removal, a new bacterial strain producing $N_2$ gas from ${NH_4}^+$ under an aerobic condition was isolated and identified. The cell was a motile, Gram-negative rod and usually occurred in pairs. By 16S-rDNA analysis, the isolated strain was identified as Enterobacter asburiae with 96% similarity. The capacity of the isolate for $N_2$ production under an oxic condition was approximately three times higher than that under an anoxic condition. Thus, the consumption of ${NH_4}^+$ by the isolate in the metabolism of $N_2$ production differed significantly between the two environmental conditions. The optimal conditions of the immobilized isolate for $N_2$ production were found to be pH 7.0, $30^{\circ}C$ and a C/N ratio of 5. Under all the optimum reaction conditions, $N_2$ production by the immobilized isolate resulted in a reduction of ORP together with the consumption of DO and a drop in pH. The removal efficiencies of $COD_{Cr}$ and TN were 56.1 and 60.9%, respectively. The removal rates of $COD_{Cr}$ and TN were highest for the first 2.5 hrs, with a removed $COD_{Cr}/TN$ ratio of 32.1, and afterwards the rates decreased as the reaction proceeded. To apply the immobilized isolate to a practical process of ammonium removal, a continuous operation was carried out with a synthetic medium of a low C/N ratio. The continuous bioreactor system exhibited satisfactory performance at 12.1 hrs of HRT, at which the effluent concentration of ${NH_4}^+$-N was measured to be 15.4 mg/L, with a removal efficiency of 56.0%. The maximum removal rate of ${NH_4}^+$-N reached 1.6 mg ${NH_4}^+$-N/L/hr at 12.1 hrs of HRT (with an N loading rate of $0.08\;kg-N/m^3$-carrier/d). As a result, the application of the immobilized isolate appears to be a viable alternative to nitrification-denitrification processes.

허밍 질의 처리 시스템의 성능 향상을 위한 효율적인 빈번 멜로디 인덱싱 방법 (An Efficient Frequent Melody Indexing Method to Improve Performance of Query-By-Humming System)

  • 유진희;박상현
    • 한국정보과학회논문지:데이타베이스
    • /
    • Vol. 34, No. 4
    • /
    • pp.283-303
    • /
    • 2007
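  • The need for methods to store and retrieve large volumes of music data efficiently has recently been growing. The most common approach to music data retrieval today is text-based search. However, this approach makes retrieval difficult when the user cannot remember the keywords, and because it returns only information that exactly matches the keywords, it is unsuitable for retrieving information with similar content. To address these problems, this paper designs a Query-By-Humming System that uses a Content-Based Indexing Method to efficiently find the desired music even when the user queries with an inaccurate melody (humming). To this end, we propose a method that extracts, from a large music database, the meaningful melodies that represent each piece of music and indexes them. We define these meaningful melodies as those users are likely to query often: frequent melodies, which appear several times within a piece of music, and rest-unit melodies, which begin after a long rest. Experiments confirmed the assumption that users frequently query these melodies. This paper proposes three methods for improving performance. First, to increase search speed, the melodies stored in the index are converted into strings. The character conversion method used tolerates the errors contained in humming and can therefore increase the accuracy of the search results. Second, the meaningful melodies that users are likely to query often are indexed to increase search speed. For this purpose, we propose a frequent melody extraction algorithm that generates highly reliable meaningful melodies, and a rest-unit melody extraction method. Third, we propose a three-step search method to improve accuracy. It is designed to obtain accurate search results while minimizing database accesses. In addition, a performance comparison with the N-gram method, previously proposed as a representative indexing method for query-by-humming systems, verified that the proposed method performs better.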
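
Two ideas from this abstract, converting melodies into error-tolerant strings and picking out melodies that recur within a piece, can be sketched as follows. The abstract does not describe the paper's actual conversion or extraction algorithms, so this sketch substitutes a simple up/down/same contour encoding and a fixed-length substring count; the function names, the `tolerance` parameter, and the MIDI-style pitch numbers are all assumptions.

```python
from collections import Counter


def contour_string(pitches, tolerance=1):
    """Encode a pitch sequence as U (up), D (down), or S (same) per step.

    Steps no larger than `tolerance` semitones count as S, so small
    pitch wobbles in a hummed query do not change the string.
    """
    symbols = []
    for prev, cur in zip(pitches, pitches[1:]):
        diff = cur - prev
        if abs(diff) <= tolerance:
            symbols.append("S")
        elif diff > 0:
            symbols.append("U")
        else:
            symbols.append("D")
    return "".join(symbols)


def frequent_melodies(contour, length, min_count=2):
    """Return substrings of the given length that occur at least min_count times."""
    counts = Counter(contour[i:i + length]
                     for i in range(len(contour) - length + 1))
    return {m: c for m, c in counts.items() if c >= min_count}


# Hypothetical piece that repeats a three-note rising figure,
# and an off-key hummed rendition of the same tune.
piece = [60, 62, 64, 60, 62, 64, 60]
hummed = [61, 63, 66, 61, 63, 65, 60]
index_keys = frequent_melodies(contour_string(piece), length=3)
```

Both `piece` and `hummed` map to the same contour string, so an index keyed on the repeated substring would match the inaccurate query, which is the effect the abstract attributes to its error-tolerant conversion.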

키워드 출현 빈도 분석과 CONCOR 기법을 이용한 ICT 교육 동향 분석 (Analysis of ICT Education Trends using Keyword Occurrence Frequency Analysis and CONCOR Technique)

  • 이영석
    • 산업융합연구
    • /
    • Vol. 21, No. 1
    • /
    • pp.187-192
    • /
    • 2023
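  • This study explores trends in ICT education through machine-learning-based keyword occurrence frequency analysis and the CONCOR (CONvergence of iterated CORrelations) technique. Using the keyword 'ICT education', 304 papers published from 2018 to the present in accredited journals or higher were retrieved from Google Scholar; following a systematic literature review procedure, 60 papers closely related to ICT education were selected, and keywords were extracted mainly from their titles and abstracts. For word frequency and indicator data, frequency analysis with TF-IDF from natural language processing and analysis of co-occurring words were used to extract 49 high-frequency central words. The strength of relationships was verified by analyzing the connection structure among words and their degree centrality, and clusters of similar words were derived through CONCOR analysis. The analysis showed, first, that 'education', 'research', 'results', 'utilization', and 'analysis' were the main keywords. Second, an N-gram network graph built with 'education' as the keyword showed 'curriculum' and 'utilization' to be the most strongly related words. Third, cluster analysis with 'education' as the keyword formed five clusters: 'curriculum', 'programming', 'student', 'improvement', and 'information'. Based on these findings and on this analysis of ICT education trends, practical research needed for ICT education can be carried out.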
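
The TF-IDF weighting mentioned in this abstract can be sketched in a few lines. The abstract does not state which TF-IDF variant the study used, so the plain formulation below (relative term frequency times the natural log of the inverse document frequency) is an assumption, and the three token lists are hypothetical stand-ins for keywords taken from titles and abstracts.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF per document for a list of token lists.

    tf  = term count / document length
    idf = ln(number of documents / documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores


# Hypothetical keyword lists standing in for three papers.
docs = [
    ["ict", "education", "curriculum"],
    ["ict", "education", "programming"],
    ["ict", "student", "programming"],
]
weights = tf_idf(docs)
```

A term that appears in every document, such as `ict` here, gets a weight of zero under this variant, which is why raw occurrence counts and TF-IDF rankings can disagree about which keywords are central.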

Isolation of Novel Alkalophilic Bacillus alcalophilus subsp. YB380 and the Characteristics of Its Yeast Cell Wall Hydrolase

  • Yeo, Ik-Hyun;Han, Suk-Kyun;Yu, Ju-Hyun;Bai, Dong-Hoon
    • Journal of Microbiology and Biotechnology
    • /
    • Vol. 8, No. 5
    • /
    • pp.501-508
    • /
    • 1998
  • An alkalophilic microorganism (strain YB380), which produces yeast cell wall hydrolase extracellularly, was isolated from Korean soil. The rod-shaped cells were 0.3~0.4 by 2~4 ${\mu}m$, motile, aerobic, Gram-positive, and spore-forming. The color of the colony was light yellow. The temperature range for growth at pH 9.0 was 25 to $45^{\circ}C$, with optimum growth at $35^{\circ}C$. The pH range for growth at $35^{\circ}C$ was 8 to 11, with an optimum pH of 9.0. Therefore, strain YB380 is an obligate alkalophile. The 16S rRNA of strain YB380 has 99% sequence similarity with that of Bacillus alcalophilus. On the basis of physiological properties, cell wall fatty acid composition, and phylogenetic analysis, we propose that the isolated strain is Bacillus alcalophilus. The yeast cell wall hydrolase from Bacillus alcalophilus subsp. YB380 has been purified and partially characterized. The molecular weight was estimated to be 27,000 daltons, with an optimum temperature and pH of $60^{\circ}C$ and 9.0, respectively. The N-terminal amino acid sequence of the enzyme was determined to be Gln-Thr-Val-Pro-Trp-Gly-Ile-Asn-Arg-Val.

Sphingobacterium composti sp. nov., a Novel DNase-Producing Bacterium Isolated from Compost

  • Ten, Leonid N.;Liu, Qing-Mei;Im, Wan-Taek;Aslam, Zubair;Lee, Sung-Taik
    • Journal of Microbiology and Biotechnology
    • /
    • Vol. 16, No. 11
    • /
    • pp.1728-1733
    • /
    • 2006
  • A Gram-negative, strictly aerobic, nonmotile, and nonspore-forming bacterial strain, designated $T5-12^T$, was isolated from compost and characterized using a polyphasic taxonomic approach. The isolate was positive in catalase and oxidase tests. It could degrade DNA, but was negative for degradation of macromolecules such as casein, collagen, starch, chitin, cellulose, and xylan. The DNA G+C content was 36.0 mol%. The predominant isoprenoid quinone was menaquinone 7 (MK-7). The major fatty acids were $iso-C_{15:0}$ (45.6%), $iso-C_{17:0}$ 3OH (17.2%), and summed feature 4 ($C_{16:0}\;{\omega}7c$ and/or $iso-C_{15:0}$ 2OH, 14.9%). Comparative 16S rRNA gene sequence analysis showed that strain $T5-12^T$ fell within the radiation of the cluster comprising members of the genus Sphingobacterium. Strain $T5-12^T$ exhibited less than 94% 16S rRNA gene sequence similarity to the type strains of recognized Sphingobacterium species. On the basis of its phenotypic properties and phylogenetic distinctiveness, strain $T5-12^T$ ($=KCTC\;12578^T=LMG\;23401^T=CCUG\;52467^T$) should be classified in the genus Sphingobacterium as the type strain of a novel species, for which the name Sphingobacterium composti sp. nov. is proposed.

A report of 42 unrecorded actinobacterial species in Korea

  • Lee, Na-Young;Cha, Chang-Jun;Im, Wan-Taek;Kim, Seung-Bum;Seong, Chi-Nam;Bae, Jin-Woo;Jahng, Kwang Yeop;Cho, Jang-Cheon;Joh, Kiseong;Jeon, Che Ok;Yi, Hana;Lee, Soon Dong
    • Journal of Species Research
    • /
    • Vol. 7, No. 1
    • /
    • pp.36-49
    • /
    • 2018
  • During a study to discover indigenous prokaryotic species in Korea in 2016, a total of 42 actinobacterial isolates were recovered from various environmental samples collected from a natural cave, squid, sewage, sea water, trees, bird droppings, freshwater, eelgrass, mud flats, sediment and soil. On the basis of a tight phylogenetic clade with the closest species and a high level of 16S rRNA gene sequence similarity, each isolate was assigned to an independent, previously described bacterial species belonging to the phylum Actinobacteria. The following 42 species have not been reported in Korea: eight species of two genera in the order Corynebacteriales, 26 species of 16 genera in the Micrococcales, one species of one genus in the Micromonosporales, one species of one genus in the Propionibacteriales, four species of two genera in the Streptomycetales and two species of two genera in the Streptosporangiales. Cell morphology, Gram staining reaction, colony colors and features, the media and conditions of incubation, physiological and biochemical characteristics, origins of isolation and strain IDs of the 42 unrecorded actinobacterial species are presented in the species descriptions.

Novosphingobium ginsenosidimutans sp. nov., with the Ability to Convert Ginsenoside

  • Kim, Jin-Kwang;He, Dan;Liu, Qing-Mei;Park, Hye-Yoon;Jung, Mi-Sun;Yoon, Min-Ho;Kim, Sun-Chang;Im, Wan-Taek
    • Journal of Microbiology and Biotechnology
    • /
    • Vol. 23, No. 4
    • /
    • pp.444-450
    • /
    • 2013
  • A Gram-negative, strictly aerobic, non-motile, non-spore-forming, and rod-shaped bacterial strain designated FW-$6^T$ was isolated from a freshwater sample, and its taxonomic position was investigated using a polyphasic approach. Strain FW-$6^T$ grew optimally at $10-42^{\circ}C$ and at pH 7.0 on nutrient and R2A agar. Strain FW-$6^T$ displayed ${\beta}$-glucosidase activity that was responsible for its ability to transform ginsenoside $Rb_1$ (one of the dominant active components of ginseng) to Rd. On the basis of 16S rRNA gene sequence similarity, strain FW-$6^T$ was shown to belong to the family Sphingomonadaceae and was related to Novosphingobium aromaticivorans DSM $12444^T$ (98.1% sequence similarity) and N. subterraneum IFO $16086^T$ (98.0%). The G+C content of the genomic DNA was 64.4%. The major menaquinone was Q-10 and the major fatty acids were summed feature 7 (comprising $C_{18:1}{\omega}9c/{\omega}12t/{\omega}7c$), summed feature 4 (comprising $C_{16:1}{\omega}7c/iso-C_{15:0}2OH$), $C_{16:0}$, and $C_{14:0}$ 2OH. DNA and chemotaxonomic data supported the affiliation of strain FW-$6^T$ with the genus Novosphingobium. Strain FW-$6^T$ could be differentiated genotypically and phenotypically from the recognized species of the genus Novosphingobium. The isolate, which has ginsenoside-converting ability, therefore represents a novel species, for which the name Novosphingobium ginsenosidimutans sp. nov. is proposed, with the type strain FW-$6^T$ (= KACC $16615^T$ = JCM $18202^T$).

참담치(Mytilus coruscus) 혈구(hemocyte)에서 분리한 McSSP-31의 항균 특성 분석 (The Antimicrobial Characteristics of McSSP-31 Purified from the Hemocyte of the Hard-shelled Mussel, Mytilus coruscus)

  • 오륜경;이민정;김영옥;남보혜;공희정;김주원;박중연;서정길;김동균
    • 생명과학회지
    • /
    • Vol. 27, No. 11
    • /
    • pp.1276-1289
    • /
    • 2017
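  • An antimicrobial peptide present in the hemocytes of the hard-shelled mussel (Mytilus coruscus) was isolated and purified using a reversed-phase HPLC column. Matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF/MS) analysis showed the purified peptide to have a molecular weight of 3330.549 Da, and a 14-residue N-terminal amino acid sequence was obtained by Edman degradation. The N-terminal sequence showed 93% and 87% similarity to the sperm-specific protein Phi-1 and the protamine-like PL-III protein of M. californianus, respectively, and 87% identity with the sperm-specific protein Phi-1 of M. edulis. The open reading frame (ORF) was found to be 306 bp long and to encode 101 amino acids, with 93.5% similarity to the sperm-specific protein Phi-1 of M. californianus. Based on the molecular weight and amino acid sequence, a peptide of 31 amino acids was synthesized; it showed antimicrobial activity against the Gram-positive bacteria B. subtilis, S. mutans, and S. aureus, the Gram-negative bacteria E. coli, K. pneumoniae, P. mirabilis, and P. aeruginosa, and the fungus C. albicans. The synthesized peptide was also active against the antibiotic-resistant strains S. aureus CCARM 0203 and S. aureus CCARM 0204. The synthetic antimicrobial peptide showed no hemolytic activity against flounder plasma, and in cytotoxicity tests it showed no toxicity at all toward the HUVEC cell line. These results show that the antimicrobial peptide derived from a sperm-specific protein, isolated and purified from the hemocytes of the hard-shelled mussel, is active against a variety of strains and has low cytotoxicity, suggesting that it could be developed as an alternative to antibiotics.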