• Title/Summary/Keyword: string algorithms


An Index Data Structure for String Search in External Memory (외부 메모리에서 문자열을 효율적으로 탐색하기 위한 인덱스 자료 구조)

  • Na, Joong-Chae;Park, Kun-Soo
    • Journal of KIISE:Computer Systems and Theory / v.32 no.11_12 / pp.598-607 / 2005
  • We propose a new external-memory index data structure, the Suffix B-tree. Like the String B-tree, the Suffix B-tree is a B-tree whose keys are strings. Whereas each node of the String B-tree is implemented with a Patricia trie, each node of the Suffix B-tree is implemented with an array, so the Suffix B-tree is simpler and easier to implement than the String B-tree. Nevertheless, the branching algorithm of the Suffix B-tree is as efficient as that of the String B-tree. Consequently, the Suffix B-tree requires the same number of worst-case disk accesses as the String B-tree to solve the string matching problem, which is fundamental in the area of string algorithms.
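
As a rough illustration of searching string keys held in a sorted array, the following in-memory sketch binary-searches a sorted array of suffix positions. It is not the Suffix B-tree or its branching algorithm: there is no B-tree and no disk model, and the names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
def build_suffix_array(text):
    """Return suffix start positions sorted lexicographically (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return all start positions of pattern, via two binary searches."""
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix with the given rank.
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(find_occurrences(text, sa, "ana"))
```

Binary search works here because truncating lexicographically sorted suffixes to their first m characters keeps them in non-decreasing order.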

Constant Time RMESH Algorithm for Computing Longest Common Substring and Maximal Repeat of String (문자열의 최장 공통 부분문자열과 최대 반복자를 구하기 위한 상수시간 RMESH 알고리즘)

  • Han, Seon-Mi;Woo, Jin-Woon
    • The KIPS Transactions:PartA / v.16A no.5 / pp.319-326 / 2009
  • Since string operations were first applied in computational biology, various data structures and algorithms for efficient string operations have been studied. The longest common substring problem is to find the longest substring shared by two or more strings, and the maximal repeat problem is to find substrings that occur more than once in a given string. These operations are widely used in string processing tasks such as pattern matching and likelihood measurement. In this paper, we present algorithms that compute the longest common substring of two strings and find the maximal repeats of a string using a three-dimensional $n \times n \times n$ array of processors on an RMESH (Reconfigurable MESH). Our algorithms have O(1) time complexity.
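
To make the longest common substring problem concrete, here is a minimal sequential sketch using the textbook dynamic-programming approach. It is not the constant-time RMESH algorithm from the abstract; it runs in O(|a|·|b|) time on a single processor, and the function name is an assumption.

```python
def longest_common_substring(a, b):
    """Return one longest substring that occurs in both a and b."""
    best_len = 0   # length of the best match found so far
    best_end = 0   # end index (exclusive) of that match in a
    # prev[j] = length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix that ended one step earlier.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    print(longest_common_substring("xabcdy", "zabcdw"))
```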

Efficient External Memory Algorithm for Finding the Maximum Suffix of a String (스트링의 최대 서픽스를 계산하는 효율적인 외부 메모리 알고리즘)

  • Kim, Sung-Kwon;Kim, Soo-Cheol;Cho, Jung-Sik
    • The KIPS Transactions:PartA / v.15A no.4 / pp.239-242 / 2008
  • We study the problem of finding the maximum suffix of a string on the external memory model of computation with one disk. In this model, we are primarily interested in designing algorithms that reduce the number of I/Os between the disk and the internal memory. A string of length N has N suffixes and among these, the lexicographically largest one is called the maximum suffix of the string. Finding the maximum suffix of a string plays a crucial role in solving some string problems. In this paper, we present an external memory algorithm for computing the maximum suffix of a string of length N. The algorithm uses four blocks in the internal memory and performs at most 4(N/L) disk I/Os, where L is the size of a block.
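
The definition of the maximum suffix can be illustrated with a short in-memory sketch. It uses a well-known linear-time two-pointer scan, not the external-memory algorithm from the abstract, and it does not model blocks or disk I/Os; the function name is hypothetical.

```python
def maximum_suffix_start(s):
    """Return the start index of the lexicographically largest suffix of s."""
    n = len(s)
    i = 0  # start of the current best candidate suffix
    j = 1  # start of the challenger suffix
    k = 0  # number of characters on which the two suffixes already agree
    while j + k < n:
        if s[i + k] == s[j + k]:
            k += 1
        elif s[i + k] > s[j + k]:
            # The challenger loses; starts j..j+k cannot win either.
            j = j + k + 1
            k = 0
        else:
            # The candidate loses; starts i..i+k cannot win either.
            i = max(i + k + 1, j)
            j = i + 1
            k = 0
    return i


if __name__ == "__main__":
    text = "banana"
    print(text[maximum_suffix_start(text):])
```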

Developing JSequitur to Study the Hierarchical Structure of Biological Sequences in a Grammatical Inference Framework of String Compression Algorithms

  • Galbadrakh, Bulgan;Lee, Kyung-Eun;Park, Hyun-Seok
    • Genomics & Informatics / v.10 no.4 / pp.266-270 / 2012
  • Grammatical inference methods are expected to find grammatical structures hidden in biological sequences. One hopes that studies of grammar serve as an appropriate tool for theory formation. Thus, we have developed JSequitur for automatically generating the grammatical structure of biological sequences in an inference framework of string compression algorithms. Our original motivation was to find grammatical traits of several cancer genes that can be detected by string compression algorithms. So far, this research has not revealed any meaningful traits unique to the cancer genes, but we observed some interesting traits in the relationships among gene length, sequence similarity, the patterns of the generated grammar, and compression rate.

Photovoltaic Multi-string PCS with a Grid-connection (계통연계형 멀티스트링 태양광 발전 시스템)

  • Kwon, Jung-Min;Kim, Eung-Ho;Nam, Kwang-Hee;Kwon, Bong-Hwan
    • New & Renewable Energy / v.3 no.4 / pp.69-76 / 2007
  • In this paper, a grid-connected PV multi-string PCS is proposed, together with an improved MPPT algorithm for it. Each PV string has its own MPP tracker, and the proposed MPPT algorithm prevents LMPP tracking caused by power ripple. A PV PCS with a single-phase inverter has a large current ripple at twice the grid frequency, so a current ripple reduction algorithm that requires no external components is suggested. This paper also proposes a simple control method that shares the PV string voltage and current among the interleaved parallel boost converters. All algorithms and controllers are implemented on a single-chip microcontroller. Experimental results obtained on a 3 kW prototype show the high performance of the proposed PV multi-string PCS.


Photovoltaic Multi-string PCS with a Grid-connection (계통연계형 멀티스트링 태양광 발전 시스템)

  • Kwon, Jung-Min;Kim, Eung-Ho;Kwon, Bong-Hwan
    • Proceedings of the Korean Society for New and Renewable Energy Conference / 2007.11a / pp.255-258 / 2007
  • In this paper, a grid-connected PV multi-string PCS is proposed, together with an improved MPPT algorithm for it. Each PV string has its own MPP tracker, and the proposed MPPT algorithm prevents LMPP tracking caused by power ripple. A PV PCS with a single-phase inverter has a large current ripple at twice the grid frequency, so a current ripple reduction algorithm that requires no external components is suggested. This paper also proposes a simple control method that shares the PV string voltage and current among the interleaved parallel boost converters. All algorithms and controllers are implemented on a single-chip microcontroller. Experimental results obtained on a 3 kW prototype show the high performance of the proposed PV multi-string PCS.


String Matching Algorithms for Real-time Intrusion Detection and Response (실시간 침입 탐지 및 대응을 위한 String Matching 알고리즘 개발)

  • 김주엽;김준기;한나래;강성훈;이상후;예홍진
    • Proceedings of the Korean Information Science Society Conference / 2004.04a / pp.970-972 / 2004
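  • Recently, with the emergence of worm viruses, damage from denial-of-service attacks such as the Internet crisis has increased sharply. Network security is therefore receiving a great deal of attention, and among its many areas, research on intrusion detection and response is especially active. Tools for automating these tasks are being developed, but their accuracy has not yet reached a trustworthy level. In this paper, we propose a string matching algorithm that analyzes event logs to predict intrusion patterns and, on that basis, enables automated intrusion detection and response.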


Robust Quick String Matching Algorithm for Network Security (네트워크 보안을 위한 강력한 문자열 매칭 알고리즘)

  • Lee, Jong Woock;Park, Chan Kil
    • Journal of Korea Society of Digital Industry and Information Management / v.9 no.4 / pp.135-141 / 2013
  • String matching is one of the key algorithms in network security, and many areas could benefit from a faster string matching algorithm. Based on the Boyer-Moore (BM) algorithm, the most efficient string matching algorithm in usual applications, a novel algorithm called RQS is proposed. RQS uses an improved bad character heuristic to achieve larger shift values and an enhanced good suffix heuristic to dramatically improve worst-case performance. The two heuristics, combined with a novel determinant condition for switching between them, enable RQS to achieve higher performance than BM in both normal and worst-case situations. The experimental results show that RQS is many times more efficient than BM in the worst case, and that the longer the pattern, the bigger the performance improvement. The performance of RQS is 7.57~36.34% higher than BM in English text searching, 16.26~26.18% higher than BM in uniformly random text searching, and 9.77% higher than BM in searching the real-world Snort pattern set.
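
For readers unfamiliar with the bad character heuristic, the sketch below shows the Boyer-Moore-Horspool variant, a common simplification of BM that uses only a bad-character shift table. It is not RQS and omits the good suffix heuristic and the switching condition described in the abstract; the function name is an assumption.

```python
def horspool_search(text, pattern):
    """Return all start positions of pattern in text (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Shift table: distance from the last occurrence of each character
    # (ignoring the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - idx for idx, ch in enumerate(pattern[:-1])}

    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift by the text character aligned with the pattern's last
        # position; characters absent from the table allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return matches


if __name__ == "__main__":
    print(horspool_search("abracadabra", "abra"))
```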

Inverted Index based Modified Version of KNN for Text Categorization

  • Jo, Tae-Ho
    • Journal of Information Processing Systems / v.4 no.1 / pp.17-26 / 2008
  • This research proposes a new strategy for text categorization in which documents are encoded into string vectors and a modified version of KNN is adapted to string vectors. Traditionally, when KNN is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area of pattern classification. For example, in text categorization, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and modify the supervised learning algorithm so that it is adaptable to string vectors for text categorization.

Comparing String Similarity Algorithms for Recognizing Task Names Found in Construction Documents (문자열 유사도 알고리즘을 이용한 공종명 인식의 자연어처리 연구 - 공종명 문자열 유사도 알고리즘의 비교 -)

  • Jeong, Sangwon;Jeong, Kichang
    • Korean Journal of Construction Engineering and Management / v.21 no.6 / pp.125-134 / 2020
  • Natural language encountered in construction documents deviates largely from the forms recommended by the authorities. Such incoherent practice discourages integrated research with automation and will hurt the industry's productivity in the long run. This research compares multiple string similarity (string matching) algorithms on their performance in recognizing the same task name written in multiple different ways. We also aim to start a debate on how prevalent this deviation is. Finally, we composed a small dataset that associates construction task names found in practice with corresponding task names whose formatting is less cluttered. We expect that this dataset can be used to validate future natural language processing approaches.
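
A comparison of this kind might be set up as in the sketch below, which scores a task name against candidates with two similarity measures: normalized Levenshtein distance and the ratio from Python's `difflib.SequenceMatcher`. Which algorithms the paper actually compares is not stated in the abstract, so both measures are assumptions, and the task names are invented for illustration.

```python
from difflib import SequenceMatcher


def levenshtein(a, b):
    """Return the edit distance (insert, delete, substitute) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Scale the edit distance into a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def sequence_similarity(a, b):
    """Similarity in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(name, candidates, similarity):
    """Return the candidate most similar to name, ignoring case."""
    return max(candidates, key=lambda c: similarity(name.lower(), c.lower()))


if __name__ == "__main__":
    # Hypothetical task names, not taken from the paper's dataset.
    standard_names = ["concrete pouring", "rebar placement"]
    for measure in (levenshtein_similarity, sequence_similarity):
        print(measure.__name__, best_match("Conc. pouring", standard_names, measure))
```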