• Title/Summary/Keyword: string search

The Fuzzy Modeling by Virus-messy Genetic Algorithm (바이러스-메시 유전 알고리즘에 의한 퍼지 모델링)

  • 최종일;이연우;주영훈;박진배
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2000.11a / pp.157-160 / 2000
  • This paper deals with fuzzy modeling for complex and uncertain systems, for which conventional mathematical models may fail to give satisfactory results. The mGA (messy Genetic Algorithm) has a more effective and adaptive structure than the sGA (simple Genetic Algorithm) because it uses variable-length strings, and the VEGA (Virus Evolution Genetic Algorithm) can search for global and local optimal solutions simultaneously using its reverse transcription and transduction operators. Therefore, in this paper, the optimal fuzzy model is obtained using the Virus-messy Genetic Algorithm (Virus-mGA). In this method, local information is exchanged within the population so that the population can sustain genetic diversity. To demonstrate the superiority of the proposed approach, we provide a numerical example.

Analysis of Partnering Strategies in Symbiotic Evolutionary Algorithms (공생진화 알고리듬에서의 공생파트너 선택전략 분석)

  • 김재윤;김여근;신태호
    • Journal of the Korean Operations Research and Management Science Society / v.25 no.4 / pp.67-80 / 2000
  • Symbiotic evolutionary algorithms, also called cooperative coevolutionary algorithms, are stochastic search algorithms that imitate the biological coevolution process through symbiotic interactions. In these algorithms, evaluating the fitness of an individual requires first selecting symbiotic partners for that individual, and several partner selection strategies are available. The goal of this study is to analyze how much partnering strategies influence the performance of the algorithms. Using two types of test-bed problems, the NKC model and the binary string covering problem, extensive experiments are carried out to compare the performance of the partnering strategies by analysis of variance. The experimental results indicate that there is no statistically significant difference in their performance.

A Fast Algorithm for the k-Keyword Ordered Proximity Problem (순서를 고려하는 k-키워드 근접도 문제를 위한 빠른 알고리즘)

  • Kim, Jin-Wook
    • Journal of KIISE:Computing Practices and Letters / v.16 no.3 / pp.281-288 / 2010
  • In web search engines, proximity is used to compute the relevance of a document to a given query. Various results exist for the proximity problems and the ordered proximity problems. In this paper, we present O(n) time algorithms for the k-keyword ordered proximity problems, where n is the total number of occurrences of the k keywords in a document. Experimental results show that the proposed algorithms are about 1.2 times and over 3 times faster than the previous algorithms when k=2 and k=5, respectively.
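
The abstract does not describe the paper's O(n) algorithms, so the following is only a minimal sketch of one version of the problem, under stated assumptions: each keyword comes with a sorted list of its positions in the document, the keywords are distinct, and the goal is the shortest span that contains one occurrence of each keyword in query order. The names `min_ordered_window` and `position_lists` are hypothetical. The scan itself is linear in n, although `heapq.merge` adds a log k factor, so this is not a reproduction of the published method.

```python
import heapq


def min_ordered_window(position_lists):
    """Return (start, end) of the shortest span that contains keyword 0,
    then keyword 1, ..., then keyword k-1 in that order, or None.

    position_lists[i] is the sorted list of positions of keyword i
    (an assumed input format, not taken from the paper).
    """
    k = len(position_lists)
    # Tag every occurrence with its keyword index and merge by position.
    streams = [[(p, i) for p in plist] for i, plist in enumerate(position_lists)]
    # latest_start[i]: latest possible start of an ordered chain covering
    # keywords 0..i that ends at or before the current position.
    latest_start = [None] * k
    best = None
    for pos, kw in heapq.merge(*streams):
        if kw == 0:
            latest_start[0] = pos
        elif latest_start[kw - 1] is not None:
            latest_start[kw] = latest_start[kw - 1]
        if kw == k - 1 and latest_start[kw] is not None:
            candidate = (latest_start[kw], pos)
            if best is None or candidate[1] - candidate[0] < best[1] - best[0]:
                best = candidate
    return best


# Keyword 0 at positions 1, 10, 20; keyword 1 at 5, 12; keyword 2 at 7, 30.
print(min_ordered_window([[1, 10, 20], [5, 12], [7, 30]]))
```

Keeping only the latest chain start per keyword is enough because those starts never decrease as the scan moves right, so the most recent value always gives the tightest window.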

Efficient Approximate String Searches with Inverted Lists through Search Range Reduction (효율적인 유사문자열 검색을 위한 역리스트 탐색 기법)

  • Lee, Eun-Seok;Kim, Jong-Ik
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.1310-1313 / 2011
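  • Approximate string search finds, within a collection of strings, the strings that are similar to a given query string, and it is used in areas such as information retrieval and data cleaning. For efficient approximate string search, inverted lists are built over the string collection in advance; when a query string is given, the inverted lists related to the query are merged to find the strings that satisfy the similarity criterion. To reduce the cost, one approach merges only some of the inverted lists and performs binary search on the remaining ones. In this paper, we propose a method that reduces the cost of searching the inverted lists by eliminating unnecessary search ranges during the binary search.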
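
The merge-then-binary-search scheme described in the abstract can be sketched as follows. This is an illustration under assumptions, not the paper's method: the abstract does not say how the search range is narrowed, so the sketch uses one plausible reduction, namely that candidates are probed in increasing ID order and each binary search resumes from where the previous one ended. The names `t_occurrence`, `short_lists`, `long_lists`, and `threshold` are hypothetical, and every inverted list is assumed to be a sorted list of unique string IDs.

```python
from bisect import bisect_left
from collections import Counter


def t_occurrence(short_lists, long_lists, threshold):
    """Return the sorted IDs that occur in at least `threshold` lists.

    Short lists are merged by counting; long lists are only probed by
    binary search. Requires threshold > len(long_lists) so that every
    answer appears in at least one short list.
    """
    if threshold <= len(long_lists):
        raise ValueError("threshold must exceed the number of long lists")
    counts = Counter()
    for lst in short_lists:
        counts.update(lst)
    # Assumed range reduction: per-list lower bound for the next search.
    lower = [0] * len(long_lists)
    result = []
    for sid in sorted(counts):
        hits = counts[sid]
        # Skip candidates that cannot reach the threshold even if they
        # occur in every long list.
        if hits + len(long_lists) < threshold:
            continue
        for j, lst in enumerate(long_lists):
            pos = bisect_left(lst, sid, lower[j])
            lower[j] = pos  # later candidates are larger, so search from here
            if pos < len(lst) and lst[pos] == sid:
                hits += 1
        if hits >= threshold:
            result.append(sid)
    return result


print(t_occurrence([[1, 3, 5, 7], [3, 5, 9]], [list(range(1, 11))], 3))
```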

XML Schema Model of Great Staff Music Score using the Integration Method (통합 방식을 이용한 대보표 악보의 XML 스키마 모델)

  • 김정희;곽호영
    • Journal of the Korea Institute of Information and Communication Engineering / v.7 no.2 / pp.302-313 / 2003
  • Currently, DTD (Document Type Definition) definitions of music scores are widely studied for various applications, and methods for automatically transforming a defined DTD into an XML Schema are being developed. In addition, studies on the structure of DTD definitions focus on expressing music information in individual formats. In this paper, a method of expressing music information as continuous string values is suggested, based on the fact that the measure is a basic component of a score, and a corresponding XML Schema is modeled. A mechanism for extracting the music information from an XML instance expressed with the proposed method is also presented. As a result, an XML Schema that takes continuous string values could be defined, and instances obtained by the proposed method are more efficient than those of the previous method because the XPath expressions are simpler and the number of search steps is reduced. In addition, people can write the expressions directly, and the instance size decreases.

A Genetic Algorithm for the Chinese Postman Problem on the Mixed Networks (유전자 알고리즘을 이용한 혼합 네트워크에서의 Chinese Postman Problem 해법)

  • Jun Byung Hyun;Kang Myung Ju;Han Chi Geun
    • Journal of the Korea Society of Computer and Information / v.10 no.1 s.33 / pp.181-188 / 2005
  • The Chinese Postman Problem (CPP) is the problem of finding a shortest tour that traverses every edge or arc of a given network at least once. The Chinese Postman Problem on Mixed networks (MCPP) is a practical generalization of the classical CPP and has many real-world applications. The MCPP has been shown to be NP-complete. In this paper, we transform a mixed network into a symmetric network using virtual arcs, which are shortest paths computed by Floyd's algorithm. On the transformed network, we propose a Genetic Algorithm (GA) that converges quickly to a near-optimal solution by means of a multi-directional search technique. The chromosome used in the GA consists of a path string and an encoding string. We study an encoding method, a decoding method, and the genetic operators needed to solve the MCPP with the proposed GA. In addition, two scaling methods are used in the proposed GA. We compare the performance of the GA with the existing Modified MIXED2 algorithm (Pearn et al., 1995). The simulation results show that the proposed method outperforms the existing methods when the network has many edges, and that the power law scaling method performs better than the logarithmic scaling method.

Construction of Linearly Aligned Corpus Using Unsupervised Learning (자율 학습을 이용한 선형 정렬 말뭉치 구축)

  • Lee, Kong-Joo;Kim, Jae-Hoon
    • The KIPS Transactions:PartB / v.11B no.3 / pp.387-394 / 2004
  • In this paper, we propose a modified unsupervised linear alignment algorithm for building an aligned corpus. The original algorithm inserts null characters into both of the aligned strings (the source string and the target string) because the two strings differ in length. For applications that use the aligned corpus, these null characters can cause difficulties such as an explosion of the search space, and they make it impossible to apply several machine learning algorithms. To alleviate these difficulties, we modify the algorithm so that the aligned source strings contain no null characters. We show the usability of our approach by applying it to different areas: Korean-English back-transliteration, English grapheme-phoneme conversion, and Korean morphological analysis.

Tabu Search-Genetic Process Mining Algorithm for Discovering Stochastic Process Tree (확률적 프로세스 트리 생성을 위한 타부 검색 -유전자 프로세스 마이닝 알고리즘)

  • Joo, Woo-Min;Choi, Jin Young
    • Journal of Korean Society of Industrial and Systems Engineering / v.42 no.4 / pp.183-193 / 2019
  • Process mining is an analytical technique aimed at obtaining useful information about a process by extracting a process model from event logs. However, most existing process models are deterministic because they do not include stochastic elements such as the occurrence probabilities or execution times of activities. The available information is therefore limited, which restricts how well the process can be analyzed and understood. It is also important to develop an efficient methodology for discovering the process model. Although the genetic process mining algorithm is one of the methods that can handle noisy data, it requires a large computation time when applied to large data sets. To resolve these issues, in this paper, we define a stochastic process tree and propose a tabu search-genetic process mining (TS-GPM) algorithm for discovering it. Specifically, we define a two-dimensional array as a chromosome to represent a stochastic process tree, a fitness function, a procedure for generating a stochastic process tree, and a model trace as a string of activities generated from the process tree. Furthermore, by storing model traces with low fitness values in the tabu list and comparing against them, we can prevent duplicated searches for process trees with low fitness values. To verify the performance of the proposed algorithm, we performed a numerical experiment using two kinds of event log data used in previous research. The results showed that the suggested TS-GPM algorithm outperformed the GPM algorithm in terms of fitness and computation time.

Efficient Construction of Generalized Suffix Arrays by Merging Suffix Arrays (써픽스 배열 합병을 이용한 일반화된 써픽스 배열의 효율적인 구축 알고리즘)

  • Jeon, Jeong-Eun;Park, Heejin;Kim, Dong-Kyue
    • Journal of KIISE:Computer Systems and Theory / v.32 no.6 / pp.268-278 / 2005
  • We consider constructing the generalized suffix array of strings A and B when the suffix arrays of A and B are given, i.e., merging the two suffix arrays of A and B. There are efficient algorithms for merging some special suffix arrays, such as the odd array and the even array. However, for the general case in which A and B are arbitrary strings, no efficient merging algorithms have been developed. Thus, one had to construct the generalized suffix array of A and B by constructing the suffix array of $A\#B\$$ from scratch, even though the suffix arrays of A and B are given. In this paper, we present efficient merging algorithms for the suffix arrays of two arbitrary strings A and B drawn from constant and integer alphabets. The experimental results show that merging the two suffix arrays of A and B is about 5 times faster than constructing the suffix array of $A\#B\$$ from scratch for constant alphabets. Our algorithms include searching for all suffixes of string B in the suffix array of A. To do this, we use suffix links in suffix arrays, and we developed efficient algorithms for computing the suffix links. Efficient computation of suffix links is another contribution of this paper, because it can be used to solve other problems in bioinformatics that require searching for all suffixes of a given string in the suffix array of another string, such as computing matching statistics and finding longest common substrings. The experimental results show that our methods for computing suffix links are about 3-4 times faster than the previous fastest methods.
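
To make the merging task concrete, here is a naive sketch that merges two given suffix arrays by comparing suffixes directly. It is not the paper's algorithm: it does not use suffix links, and each comparison can take time linear in the suffix length, so it only illustrates what the output of a merge looks like. The helper names `suffix_array` and `merge_suffix_arrays`, the `(string_id, offset)` output format, and the tie-breaking rule (equal suffixes place A before B) are all assumptions made for the example.

```python
def suffix_array(s):
    """Naive suffix array: start offsets of the suffixes of s in sorted order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def merge_suffix_arrays(a, sa_a, b, sa_b):
    """Merge the suffix arrays of a and b into one generalized suffix array.

    Each entry is (string_id, offset), with string_id 0 for a and 1 for b.
    Suffixes are compared character by character; on a tie the suffix of a
    comes first (an assumed convention).
    """
    merged = []
    i = j = 0
    while i < len(sa_a) and j < len(sa_b):
        if a[sa_a[i]:] <= b[sa_b[j]:]:
            merged.append((0, sa_a[i]))
            i += 1
        else:
            merged.append((1, sa_b[j]))
            j += 1
    # One input is exhausted; the rest of the other is already in order.
    merged.extend((0, p) for p in sa_a[i:])
    merged.extend((1, p) for p in sa_b[j:])
    return merged


A, B = "banana", "ana"
print(merge_suffix_arrays(A, suffix_array(A), B, suffix_array(B)))
```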

Evaluating Reverse Logistics Networks with Centralized Centers : Hybrid Genetic Algorithm Approach (집중형센터를 가진 역물류네트워크 평가 : 혼합형 유전알고리즘 접근법)

  • Yun, YoungSu
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.55-79 / 2013
  • In this paper, we propose a hybrid genetic algorithm (HGA) approach to effectively solve the reverse logistics network with centralized centers (RLNCC). In the proposed HGA approach, a genetic algorithm (GA) is used as the main algorithm. To implement the GA, a new bit-string representation scheme using 0 and 1 values is suggested, which makes it easy to generate the initial population of the GA. As genetic operators, the elitist strategy in enlarged sampling space developed by Gen and Chang (1997), a new two-point crossover operator, and a new random mutation operator are used for selection, crossover, and mutation, respectively. To hybridize the GA, an iterative hill climbing method (IHCM) developed by Michalewicz (1994) is inserted into the HGA search loop. The IHCM is a local search technique that precisely explores the region to which the GA search has converged. The RLNCC is composed of collection centers, remanufacturing centers, redistribution centers, and secondary markets in reverse logistics networks. Of these centers and secondary markets, only one collection center, one remanufacturing center, one redistribution center, and one secondary market should be opened. Some assumptions are made to implement the RLNCC effectively. The RLNCC is represented by a mixed integer programming (MIP) model using indexes, parameters, and decision variables. The objective function of the MIP model minimizes the total cost, which consists of the transportation cost, the fixed cost, and the handling cost. The transportation cost is incurred by transporting the returned products between the centers and secondary markets. The fixed cost is determined by the opening or closing decision at each center and secondary market. For example, if there are three collection centers (with opening costs of 10.5, 12.1, and 8.9 for collection centers 1, 2, and 3, respectively), and collection center 1 is opened while the others are closed, then the fixed cost is 10.5.
The handling cost is the cost of treating the products returned from customers at each center and secondary market that is opened at each RLNCC stage. The RLNCC is solved by the proposed HGA approach. In the numerical experiment, the proposed HGA and a conventional competing approach are compared using various measures of performance. The GA approach by Yun (2013) is used as the conventional competing approach. This GA approach has no local search technique such as the IHCM used in the HGA approach. CPU time, optimal solution, and optimal setting are used as measures of performance. Two types of the RLNCC with different numbers of customers, collection centers, remanufacturing centers, redistribution centers, and secondary markets are presented to compare the performance of the HGA and GA approaches. The MIP models for the two types of the RLNCC are programmed in Visual Basic Version 6.0, and the computing environment is an IBM-compatible PC with a 3.06 GHz CPU and 1 GB of RAM running Windows XP. The parameters used in the HGA and GA approaches are as follows: the total number of generations is 10,000, the population size is 20, the crossover rate is 0.5, the mutation rate is 0.1, and the search range for the IHCM is 2.0. A total of 20 iterations are performed to eliminate the randomness of the searches of the HGA and GA approaches. Based on performance comparisons, network representations of the opening/closing decisions, and convergence processes for the two types of the RLNCC, the experimental results show that the HGA performs significantly better than the GA in terms of the optimal solution, though the GA is slightly quicker than the HGA in terms of CPU time. Finally, the results show that the proposed HGA approach is more efficient than the conventional GA approach for the two types of the RLNCC, since the former has a local search process in addition to the GA search process, while the latter has the GA search process alone.
In future work, much larger RLNCCs will be tested to assess the robustness of our approach.
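
The fixed-cost example in the abstract above can be worked through in a few lines. This is a minimal sketch under assumptions: the abstract only says that a 0/1 bit string is used and that exactly one facility per stage is opened, so the one-bit-per-facility layout and the helper names `random_one_hot` and `fixed_cost` are hypothetical rather than the author's encoding.

```python
import random


def random_one_hot(n, rng):
    """Random bit string of length n with exactly one facility opened (bit 1)."""
    bits = [0] * n
    bits[rng.randrange(n)] = 1
    return bits


def fixed_cost(bits, opening_costs):
    """Sum the opening costs of the facilities whose bit is 1."""
    return sum(cost for bit, cost in zip(bits, opening_costs) if bit == 1)


# Opening costs of collection centers 1, 2, and 3, as given in the abstract.
collection_costs = [10.5, 12.1, 8.9]

# Collection center 1 opened, the others closed.
print(fixed_cost([1, 0, 0], collection_costs))

# A random individual for the collection-center stage of an initial population.
print(random_one_hot(len(collection_costs), random.Random(0)))
```

With center 1 opened and the others closed, `fixed_cost` returns 10.5, matching the value stated in the abstract.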