• Title/Summary/Keyword: N-gram

Text Mining Analysis Technique on ECDIS Accident Report (텍스트 마이닝 기법을 활용한 ECDIS 사고보고서 분석)

  • Lee, Jeong-Seok;Lee, Bo-Kyeong;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety / v.25 no.4 / pp.405-412 / 2019
  • SOLAS requires ships of more than 500 gross tonnage engaged in international navigation to have ECDIS installed by the first inspection after July 1, 2018. Several accidents related to the use of ECDIS have occurred since its installation as a new major navigation instrument. The 12 incident reports issued by MAIB, BSU, BEAmer, DMAIB, and DSB were analyzed, and the causes of the accidents were found to be related to the operation by the navigator and to the ECDIS system. The report text was analyzed with the R program to quantify the words related to the causes of the accidents. We used text mining techniques such as Wordcloud, Wordnetwork, and Wordweight to represent the importance of words according to their frequency. Wordcloud uses the N-gram model to express word frequency in cloud form. In the uni-gram analysis of the N-gram model, the word "ECDIS" occurred most frequently, and in the bi-gram analysis the term "Safety Contour" was used most frequently. Based on the bi-gram analysis, the causative words were classified into those related to the officer and those related to the ECDIS system, and the related words were represented with Wordnetwork. Finally, the words related to the officer and to the ECDIS system were each assembled into a word corpus, and Wordweight was applied to analyze the change in corpus frequency by year. Trend-line analysis of the corpus variation showed that, in more recent years, the frequency of the officer corpus has decreased, whereas that of the ECDIS system corpus has gradually increased.
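
The uni-gram and bi-gram frequency counts described in this abstract can be illustrated with a minimal sketch. The study itself used R; the Python below is only an illustration, and the function name `ngram_counts` and the sample sentence are hypothetical, not taken from the accident reports.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count word n-grams in `text` (lowercased alphanumeric tokens)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Slide a window of n tokens over the token list.
    # Note: windows may span sentence boundaries in this simple sketch.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Hypothetical sample text, written only for this illustration.
sample = ("The safety contour was not set. The officer did not check "
          "the safety contour on the ECDIS display.")

unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

A word cloud would then scale each word or word pair by these counts; stop words such as "the" would normally be filtered out first.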

Biophysical Studies Reveal Key Interactions between Papiliocin-Derived PapN and Lipopolysaccharide in Gram-Negative Bacteria

  • Durai, Prasannavenkatesh;Lee, Yeongjoon;Kim, Jieun;Jeon, Dasom;Kim, Yangmee
    • Journal of Microbiology and Biotechnology / v.28 no.5 / pp.671-678 / 2018
  • Papiliocin, isolated from the swallowtail butterfly (Papilio xuthus), is an antimicrobial peptide with high selectivity against gram-negative bacteria. We previously showed that the N-terminal helix of papiliocin (PapN) plays a key role in the antibacterial and anti-inflammatory activity of papiliocin. In this study, we measured the selectivity of PapN against multidrug-resistant gram-negative bacteria, as well as its anti-inflammatory activity. Interactions between Trp2 of PapN and lipopolysaccharide (LPS), which is a major component of the outer membrane of gram-negative bacteria, were studied using the Trp fluorescence blue shift and quenching in LPS micelles. Furthermore, using circular dichroism, we investigated the interactions between PapN and LPS, showing that LPS plays critical roles in peptide folding. Our results demonstrated that Trp2 in PapN was buried deep in the negatively charged LPS, and Trp2 induced the ${\alpha}$-helical structure of PapN. Importantly, docking studies determined that predominant electrostatic interactions of positively charged arginine residues in PapN with phosphate head groups of LPS were key factors for binding. Similarly, hydrophobic interactions by aromatic residues of PapN with fatty acid chains in LPS were also significant for binding. These results may facilitate the development of peptide antibiotics with anti-inflammatory activity.

Structure-Activity Relationship of the N-terminal Helix Analog of Papiliocin, PapN

  • Jeon, Dasom;Jeong, Min-Cheol;Kim, Jin-Kyoung;Jeong, Ki-Woong;Ko, Yoon-Joo;Kim, Yangmee
    • Journal of the Korean Magnetic Resonance Society / v.19 no.2 / pp.54-60 / 2015
  • Papiliocin, from the swallowtail butterfly Papilio xuthus, shows high bacterial cell selectivity against Gram-negative bacteria. Recently, we designed PapN, a 22-mer analog with the N-terminal helix from Lys$^3$ to Ala$^{22}$. It shows outstanding antimicrobial activity against Gram-negative bacteria with low toxicity against mammalian cells. In this study, we determined the 3-D structure of PapN in 300 mM DPC micelles using NMR spectroscopy and investigated the interactions between PapN and DPC micelles. The results showed that PapN has an amphipathic $\alpha$-helical structure from Lys$^3$ to Lys$^{21}$. STD-NMR and DOSY experiments showed that this helix is important in binding to the bacterial cell membrane. Furthermore, we tested the antibacterial activity of PapN in the presence of salt with a view to therapeutic application. PapN was calcium- and magnesium-resistant under physiological conditions, especially against Gram-negative bacteria, implying that it can be a potent candidate peptide antibiotic.

N-terminal GNBP homology domain of Gram-negative binding protein 3 functions as a beta-1,3-glucan binding motif in Tenebrio molitor

  • Lee, Han-Na;Kwon, Hyun-Mi;Park, Ji-Won;Kurokawa, Kenji;Lee, Bok-Luel
    • BMB Reports / v.42 no.8 / pp.506-510 / 2009
  • The Toll signalling pathway in invertebrates is responsible for defense against Gram-positive bacteria and fungi, leading to the expression of antimicrobial peptides via NF-$\kappa$B-like transcription factors. Gram-negative binding protein 3 (GNBP3) detects beta-1,3-glucan, a fungal cell wall component, and activates a three-step serine protease cascade that activates the Toll signalling pathway. Here, we showed that the recombinant N-terminal domain of Tenebrio molitor GNBP3 bound to beta-1,3-glucan but did not activate the downstream serine protease cascade in vitro. Conversely, the N-terminal domain blocked GNBP3-mediated serine protease cascade activation in vitro and also inhibited beta-1,3-glucan-mediated antimicrobial peptide induction in Tenebrio molitor larvae. These results suggest that the N-terminal GNBP homology domain of GNBP3 functions as a beta-1,3-glucan binding domain and that the C-terminal domain of GNBP3 may be required for the recruitment of the immediate downstream serine protease zymogen during Toll signalling pathway activation.

Detecting Local Text Reuse in the Texts of East Asian Traditional Medicine (한의학 고문헌 텍스트에서의 인용문 추정과 탐색)

  • Oh, Junho
    • Journal of Korean Medical classics / v.34 no.1 / pp.37-45 / 2021
  • Objectives : The purpose of this paper was to examine quantitative methods for estimating and detecting local text reuse in the texts of East Asian Traditional Medicine. Methods : We introduce techniques that estimate the volume of local text reuse with n-grams and techniques that directly detect the reuse with the Smith-Waterman algorithm (SW algorithm). Based on this, the estimation and detection of local text reuse were carried out for 『Donguibogam』 and 『Huangdineijing·Suwen』. Results : Estimates with n-grams had more errors than the SW algorithm. The SW algorithm directly detected strings suspected of local text reuse, giving more accurate results. Conclusions : Although the n-gram method does not accurately find local text reuse, its high speed makes it preferable for certain purposes, such as screening similar documents. The SW algorithm, on the other hand, is relatively good at finding similar phrases suspected of local text reuse even when the strings do not match completely. However, because it consumes excessive time and computing resources, its benefits are limited to cases where precise results are required.
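
The contrast drawn in this abstract, a fast n-gram estimate versus a slower but more precise Smith-Waterman detection, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Jaccard measure, the scoring parameters (`match=2`, `mismatch=-1`, `gap=-1`), and the sample strings are all assumptions chosen for the example.

```python
def char_ngrams(text, n=2):
    """Set of character n-grams in `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def ngram_overlap(a, b, n=2):
    """Jaccard similarity of character n-gram sets (a fast, rough estimate)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment: returns (score, segment of a, segment of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_i], b[j:end_j]

# Hypothetical strings: a shared passage embedded in unrelated context.
exact = smith_waterman("xxabcdeyy", "zzabcdeww")
fuzzy = smith_waterman("abcXde", "abcYde")
print(ngram_overlap("xxabcdeyy", "zzabcdeww"), exact, fuzzy)
```

The n-gram overlap costs time roughly linear in the text length, while the alignment fills a table of size `len(a) * len(b)`, which is consistent with the speed and resource trade-off the abstract reports.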

Content-based Image Retrieval System Using Color $N{\times}M$-grams & CCV (Color $N{\times}M$-grams과 CCV를 이용한 내용기반 영상 검색 시스템)

  • 이은주;이상미;정성환
    • Proceedings of the Korea Multimedia Society Conference / 1998.04a / pp.40-45 / 1998
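  • The Color Coherence Vector (CCV) method was introduced to compensate for the shortcomings of color histograms. CCV is easy to implement and, unlike a color histogram, can distinguish between different images that have the same color distribution. However, CCV is computationally expensive and requires a long processing time. In this paper, we present a retrieval method that reduces processing time by using $N{\times}M$-grams and a hierarchical search for efficient computation. First, $N{\times}M$-grams, which reflect the structural features of an image well, are used to find all images that belong to the same category as the given query image. Then, CCV is computed and retrieval is performed only on the images found. In an experiment with 200 images, the retrieval rate was about 79%, and the time was about 37% less than with the method using CCV alone.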

The Utilization of Local Document Information to Improve Statistical Context-Sensitive Spelling Error Correction (통계적 문맥의존 철자오류 교정 기법의 향상을 위한 지역적 문서 정보의 활용)

  • Lee, Jung-Hun;Kim, Minho;Kwon, Hyuk-Chul
    • KIISE Transactions on Computing Practices / v.23 no.7 / pp.446-451 / 2017
  • The statistical context-sensitive spelling correction technique in this paper is based on Shannon's noisy channel model. Interpolation is used to improve the correction method proposed in the paper. The general interpolation method supplements the N-gram probability with (N-1)-gram and (N-2)-gram probabilities, all drawn from the same statistical corpus. In the proposed method, interpolation is performed using frequency information from both the statistical corpus and the document being corrected. Using the frequencies of the document being corrected has two advantages. First, a probability can be obtained for a coined word that exists only in that document. Second, even when two correction candidates have ambiguous probability values, the ambiguity is resolved by referring to the document being corrected. The proposed method showed better precision and recall than the existing correction model.
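
The idea of interpolating N-gram probabilities and then drawing on the document being corrected can be sketched as below. This is an illustrative assumption, not the paper's formula: the linear interpolation weights `lambdas`, the mixing weight `alpha`, the function names, and the toy token lists (including the made-up coined word "blorp") are all hypothetical.

```python
from collections import Counter

def ngram_counter(tokens, n):
    """Counts of n-gram tuples in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def interpolated_prob(w1, w2, w3, tokens, lambdas=(0.6, 0.3, 0.1)):
    """Linear interpolation of tri-gram, bi-gram, and uni-gram estimates of
    P(w3 | w1, w2), all taken from the same token list."""
    uni, bi, tri = (ngram_counter(tokens, n) for n in (1, 2, 3))
    p3 = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    p2 = bi[(w2, w3)] / uni[(w2,)] if uni[(w2,)] else 0.0
    p1 = uni[(w3,)] / len(tokens) if tokens else 0.0
    l3, l2, l1 = lambdas
    return l3 * p3 + l2 * p2 + l1 * p1

def mixed_prob(w1, w2, w3, corpus_tokens, doc_tokens, alpha=0.7):
    """Assumed mixing scheme: weight the corpus estimate by alpha and the
    estimate from the document being corrected by (1 - alpha)."""
    return (alpha * interpolated_prob(w1, w2, w3, corpus_tokens)
            + (1 - alpha) * interpolated_prob(w1, w2, w3, doc_tokens))

# Toy data: "blorp" is a made-up coined word that occurs only in the document.
corpus = "the cat sat on the mat".split()
document = "the blorp sat on the blorp".split()

corpus_only = interpolated_prob("on", "the", "blorp", corpus)
mixed = mixed_prob("on", "the", "blorp", corpus, document)
print(corpus_only, mixed)
```

With the corpus alone the coined word receives probability zero, while the mixed estimate is positive, which mirrors the first advantage the abstract claims for using document frequencies.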

A study on the Filtering of Spam E-mail using n-Gram indexing and Support Vector Machine (n-Gram 색인화와 Support Vector Machine을 사용한 스팸메일 필터링에 대한 연구)

  • 서정우;손태식;서정택;문종섭
    • Journal of the Korea Institute of Information Security & Cryptology / v.14 no.2 / pp.23-33 / 2004
  • With the rapid growth of the Internet, the exchange of messages by e-mail is also increasing quickly. Despite the convenience of e-mail, however, the time and cost that individuals and enterprises waste on spam mail has become a big issue. Many solutions have been studied to counter the harmful effects of spam mail. Typical methods include pattern matching with keywords and probabilistic methods such as Naive Bayesian. In this paper, we propose a method that classifies spam mail from normal mail using a Support Vector Machine, which performs excellently on pattern classification problems, to compensate for the problems of existing research. In particular, the proposed method efficiently carries out the learning procedure with a word dictionary that includes an index generated by the n-Gram. Finally, we verified the proposed method by comparing the accuracy of spam mail separation between existing research and the proposed scheme.

A Study on Machine Learning Based Anti-Analysis Technique Detection Using N-gram Opcode (N-gram Opcode를 활용한 머신러닝 기반의 분석 방지 보호 기법 탐지 방안 연구)

  • Kim, Hee Yeon;Lee, Dong Hoon
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.2 / pp.181-192 / 2022
  • The emergence of new malware is incapacitating existing signature-based malware detection techniques, and the application of various anti-analysis techniques makes malware difficult to analyze. Recent studies on signature-based malware detection are limited in that malware creators can easily bypass them. Therefore, in this study, we build a machine learning model that detects and classifies the anti-analysis techniques of packers applied to malware, rather than using the characteristics of the malware itself. N-gram opcodes are extracted from malicious binaries to which various anti-analysis techniques of commercial packers have been applied, features are extracted with TF-IDF, and each anti-analysis technique is then detected and classified. Real-world malware samples packed with Themida and VMProtect using multiple anti-analysis techniques were used to train and test 6 machine learning models, and the optimal models showed 81.25% accuracy for Themida and 95.65% accuracy for VMProtect.
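
The feature pipeline described in this abstract, n-gram opcodes weighted by TF-IDF, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the opcode sequences are invented, the TF-IDF variant (relative term frequency times `log(N / df)`, no smoothing) is one common definition and may differ from the one the authors used, and the helper names are hypothetical.

```python
import math
from collections import Counter

def opcode_ngrams(opcodes, n=2):
    """Join each window of n consecutive opcodes into one feature string."""
    return [" ".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]

def tfidf(docs):
    """TF-IDF weights per document; `docs` is a list of term lists.
    tf = count / document length, idf = log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: (count / len(doc)) * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return vectors

# Invented opcode sequences standing in for disassembled samples.
sample_a = ["push", "mov", "call", "pop", "ret"]
sample_b = ["push", "mov", "rdtsc", "cmp", "jne"]

features = [opcode_ngrams(s, n=2) for s in (sample_a, sample_b)]
vectors = tfidf(features)
print(vectors[1])
```

N-grams shared by every sample (here "push mov") get weight zero, while n-grams specific to one sample keep a positive weight; vectors of this kind would then be passed to a classifier, which the abstract does not specify beyond "6 machine learning models".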

Language Modeling based on Inter-Word Dependency Relation (단어간 의존관계에 기반한 언어모델링)

  • Lee, Seung-Mi;Choi, Key-Sun
    • Annual Conference on Human and Language Technology / 1998.10c / pp.239-246 / 1998
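  • Probabilistic language modeling, which assigns a sentence probability to a sequence of words, is an important component of many natural language processing applications such as speech recognition and statistical machine translation. Existing approaches fall broadly into two kinds: n-gram-based and grammar-based. In general, n-gram approaches cannot represent long-distance dependencies well, while grammar-based approaches have difficulty acquiring a grammar with wide coverage. This paper presents a language modeling technique based on a simple form of dependency grammar. The dependency grammar consists of head-dependent relations between words and is learned automatically from a raw corpus using the dependency grammar re-estimation algorithm introduced in this paper. In experiments, the proposed dependency-based model reduced entropy on the test corpus by about 11% to 11.5% compared with the tri-gram and bi-gram models, indicating improved performance.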