• Title/Summary/Keyword: Similarity Threshold

System Trading using Case-based Reasoning based on Absolute Similarity Threshold and Genetic Algorithm (절대 유사 임계값 기반 사례기반추론과 유전자 알고리즘을 활용한 시스템 트레이딩)

  • Han, Hyun-Woong;Ahn, Hyun-Chul
    • The Journal of Information Systems / v.26 no.3 / pp.63-90 / 2017
  • Purpose: This study proposes a novel system trading model using case-based reasoning (CBR) based on an absolute similarity threshold. The proposed model uses a genetic algorithm (GA) to optimize the absolute similarity threshold, feature selection, and instance selection of CBR. These mechanisms are intended to yield higher returns from stock market trading. Design/Methodology/Approach: The proposed CBR model uses an absolute similarity threshold varying from 0 to 1, which serves as the criterion for selecting appropriate neighbors in the nearest neighbor (NN) algorithm. Because it determines the nearest neighbors on an absolute basis, it sometimes finds no appropriate neighbors. In system trading, this outcome is interpreted as a 'hold' signal. That is, the proposed model makes a 'buy' or 'sell' decision only when it produces a clear signal for stock market prediction. In addition, to improve prediction accuracy and the rate of return, the proposed model adopts optimal feature selection and instance selection, which are known to be very effective in enhancing the performance of CBR. To validate the usefulness of the proposed model, we applied it to index trading of the KOSPI200 from 2009 to 2016. Findings: Experimental results showed that the proposed model with optimal feature or instance selection could yield higher returns than the benchmark and the various comparison models (logistic regression, multiple discriminant analysis, artificial neural network, support vector machine, and traditional CBR). In particular, the proposed model with optimal instance selection showed the best rate of return among all the models. This implies that applying CBR with the absolute similarity threshold together with optimal instance selection may be effective in system trading from the perspective of returns.
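
The 'hold' behavior described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the similarity measure (one minus the root-mean-square difference of features scaled to [0, 1]), the majority vote, the tie rule, and the names `similarity` and `trade_signal` are all assumptions made for illustration, and the GA optimization step is omitted.

```python
import numpy as np

def similarity(a, b):
    """Assumed similarity in [0, 1] for feature vectors already scaled to [0, 1]."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - np.sqrt(np.mean((a - b) ** 2))

def trade_signal(query, cases, labels, threshold):
    """Return 'buy', 'sell', or 'hold' for one query case.

    labels: 1 means the stored case was followed by a rise, 0 by a fall.
    Only stored cases whose similarity is at or above the absolute
    threshold may vote; if none qualifies, the signal is 'hold'.
    """
    sims = np.array([similarity(query, c) for c in cases])
    mask = sims >= threshold
    if not mask.any():
        return "hold"  # no neighbor is similar enough on an absolute basis
    votes = np.asarray(labels)[mask]
    ups = int(np.count_nonzero(votes == 1))
    downs = votes.size - ups
    if ups == downs:
        return "hold"  # assumed tie rule, not taken from the abstract
    return "buy" if ups > downs else "sell"
```

With a fixed k, the k closest cases always vote, however distant they are; with an absolute threshold, a query that resembles no stored case produces no trade.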

ART1 Algorithm by Using Enhanced Similarity Test and Dynamical Vigilance Threshold (개선된 유사성 측정 방법과 동적인 경계 변수를 이용한 ART1 알고리즘)

  • 문정욱;김광백
    • Journal of the Korea Institute of Information and Communication Engineering / v.7 no.6 / pp.1318-1324 / 2003
  • The conventional ART1 algorithm has two problems. One lies in its method for testing similarity between input patterns and stored patterns. The other is that its vigilance threshold influences the number of clusters and the recognition rate. In this paper, a new similarity testing method and a dynamic vigilance threshold method are proposed to solve these problems. The former tests similarity using the rate of the norm of the exclusive-NOR between input patterns and stored patterns and the rate of nodes that have equivalent values; the latter dynamically adjusts the vigilance threshold according to the similarity, using fuzzy operations and Yager's sum operation. To evaluate the new methods, we used the 26 alphabet characters and noisy characters. In the experimental results, the proposed methods outperform the conventional ART1 methods: they are less sensitive to the initial vigilance value, and their recognition rate is higher.
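
To make the contrast concrete, the following Python sketch compares the standard ART1 vigilance test, |x AND t| / |x|, with one possible reading of an exclusive-NOR based ratio. The abstract does not give the exact formula, so `xnor_match` and its normalization by pattern length are assumptions, as are both function names; the dynamic vigilance adjustment is not shown.

```python
import numpy as np

def art1_match(x, t):
    """Standard ART1 match ratio: |x AND t| / |x| for binary patterns."""
    x, t = np.asarray(x, dtype=bool), np.asarray(t, dtype=bool)
    ones = np.count_nonzero(x)
    return np.count_nonzero(x & t) / ones if ones else 0.0

def xnor_match(x, t):
    """Assumed XNOR-based ratio: fraction of positions where x and t agree."""
    x, t = np.asarray(x, dtype=bool), np.asarray(t, dtype=bool)
    return np.count_nonzero(~(x ^ t)) / x.size

x = [1, 1, 0, 0]      # input pattern
t = [1, 0, 0, 0]      # stored pattern
print(art1_match(x, t), xnor_match(x, t))
```

The standard test only counts shared 1-bits, while the XNOR-style ratio also credits positions where both patterns are 0.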

A Sampling-based Algorithm for Top-${\kappa}$ Similarity Joins (Top-${\kappa}$ 유사도 조인을 위한 샘플링 기반 알고리즘)

  • Park, Jong Soo
    • Journal of KIISE:Databases / v.41 no.4 / pp.256-261 / 2014
  • The top-${\kappa}$ set similarity join problem is to find the top-${\kappa}$ pairs of records, ranked by similarity, between two sets of input records. We propose an efficient algorithm that returns the top-${\kappa}$ similarity join pairs using a sampling technique. From a sample of the input records, we construct a histogram of set similarity joins, and then use statistical inference to compute an estimated similarity threshold for the top-${\kappa}$ join pairs within the error bound of a 95% confidence level. Finally, the estimated threshold is applied to the traditional similarity join algorithm, which uses a min-heap structure to obtain the top-${\kappa}$ similarity joins. The experimental results show that the proposed algorithm performs well on large real datasets.
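
The final step, a threshold-pruned join that keeps the best pairs in a min-heap, might look like the brute-force Python sketch below. Jaccard similarity, the function names, and the `threshold` parameter are assumptions; the sampling and histogram-based estimation of the threshold described in the abstract is not reproduced, so the threshold is simply passed in.

```python
import heapq
from itertools import product

def jaccard(a, b):
    """Jaccard similarity of two sets (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def top_k_join(left, right, k, threshold=0.0):
    """Return up to k (similarity, i, j) tuples, best first.

    Pairs below `threshold` are skipped; the heap root is always the
    weakest pair currently kept, so it is the one replaced.
    """
    heap = []
    for (i, r), (j, s) in product(enumerate(left), enumerate(right)):
        sim = jaccard(r, s)
        if sim < threshold:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (sim, i, j))
        elif sim > heap[0][0]:
            heapq.heapreplace(heap, (sim, i, j))
    return sorted(heap, reverse=True)
```

A well-estimated threshold lets a real join algorithm discard most candidate pairs before they ever reach the heap; this sketch still enumerates every pair and only shows where the threshold applies.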

A Steganography based on Bit Plane using Similarity (유사도를 이용한 비트플레인 기반의 스테가노그라피)

  • Moon, Il-Nam;Lee, Sin-Joo;Kim, Jang-Hyung;Lee, Kwang-Man
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.4 / pp.684-690 / 2009
  • In this paper, we propose a new bit-plane-based steganography method that uses similarity. When a fixed threshold is applied, inserting information into all bit planes yields differing image quality. Therefore, we first define the similarity of bit-plane blocks to solve the fixed-threshold problem. We then propose a new method that uses bit-plane complexity and similarity to insert information into the bit planes of each block. In the experiments, we inserted information into standard images under the same image quality and the same insertion capacity, and then analyzed the resulting insertion capacity and image quality. As a result, the proposed method increased insertion capacity by about 6% and improved image quality by about 3.3 dB compared with the fixed-threshold method.

Plagiarism Detection among Source Codes using Adaptive Methods

  • Lee, Yun-Jung;Lim, Jin-Su;Ji, Jeong-Hoon;Cho, Hwaun-Gue;Woo, Gyun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.6 / pp.1627-1648 / 2012
  • We propose an adaptive method for detecting plagiarized pairs in a large set of source code. The method is adaptive in two respects: it uses an adaptive algorithm, and it provides an adaptive threshold for determining plagiarism. Conventional algorithms are based on greedy string tiling or on local alignments of two code strings. However, most of them are not adaptive; they do not consider the characteristics of the program set, which causes a problem for a program set in which all the programs are inherently similar. We propose adaptive local alignment, a variant of local alignment that uses an adaptive similarity matrix. Each entry of this matrix is the logarithm of the probabilities of the keywords based on their frequency in a given program set. We also propose an adaptive threshold based on the local outlier factor (LOF), which represents the likelihood of an entity being an outlier. Experimental results indicate that our method is more sensitive than JPlag, which uses greedy string tiling, in detecting plagiarism-suspected code pairs. Further, the adaptive threshold based on the LOF is shown to be effective: compared with a fixed threshold, detection shows high sensitivity with negligible loss of specificity.
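
As a rough illustration of frequency-based scoring, the Python sketch below derives a per-keyword match score from the log of each keyword's relative frequency in a program set. The abstract does not state the sign or the exact form of the matrix entries, so the negative-log form (rare keywords score higher), the tokenized input format, and the name `keyword_match_scores` are assumptions; the alignment itself and the LOF-based threshold are not shown.

```python
import math
from collections import Counter

def keyword_match_scores(programs):
    """Map each keyword to -log(relative frequency) over the program set.

    programs: list of token lists, one list per program (assumed format).
    Keywords that are common across the set earn a small match score,
    so boilerplate shared by every submission contributes little.
    """
    counts = Counter(tok for prog in programs for tok in prog)
    total = sum(counts.values())
    return {tok: -math.log(n / total) for tok, n in counts.items()}

scores = keyword_match_scores([["for", "if", "if"], ["if", "while"]])
print(scores)
```

Under this assumed form, a match on a keyword that appears everywhere in the program set is weaker evidence of copying than a match on a rare one.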

Practical Datasets for Similarity Measures and Their Threshold Values (유사도 측정 데이터 셋과 쓰레숄드)

  • Yang, Byoungju;Shim, Junho
    • The Journal of Society for e-Business Studies / v.18 no.1 / pp.97-105 / 2013
  • In the e-business domain, where data objects are numerous, measuring similarity to find identical or similar objects is important. It requires comparing the features of objects in pairs, and therefore takes longer as the amount of data grows. Recent studies have presented various algorithms for performing this efficiently. Most of them demonstrate their superior performance through empirical tests on particular data sets. In this paper, we introduce those data sets and present their characteristics and the meaningful threshold values inherent in each of them. This analysis of practical data sets with respect to their threshold values may serve as a reference baseline for future experiments on newly developed algorithms.

GORank: Semantic Similarity Search for Gene Products using Gene Ontology (GORank: Gene Ontology를 이용한 유전자 산물의 의미적 유사성 검색)

  • Kim, Ki-Sung;Yoo, Sang-Won;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.33 no.7 / pp.682-692 / 2006
  • Searching for gene products that have similar biological functions is crucial in bioinformatics. Modern biological databases provide functional descriptions of gene products using Gene Ontology (GO). In this paper, we propose a technique for semantic similarity search over gene products using GO annotation information. For this purpose, we define an information-theoretic measure of semantic similarity between gene products and propose an algorithm for semantic similarity search using this measure. We adapt Fagin's Threshold Algorithm to process the semantic similarity query as follows. First, we redefine the threshold for our measure, because our similarity function is not monotonic. Then, cluster-skipping and access ordering of the inverted index lists are proposed to reduce the number of disk accesses. Experiments with real GO and annotation data show that GORank is efficient and scalable.
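
For orientation, here is a minimal Python sketch of the classic Threshold Algorithm that the abstract takes as its starting point. It assumes a monotonic aggregate (a plain sum of non-negative scores, with missing scores treated as 0), which is exactly the assumption GORank has to relax; the redefined threshold, cluster-skipping, and access ordering are not reproduced, and the function name and input format are hypothetical.

```python
import heapq

def threshold_algorithm(lists, k):
    """Classic top-k over score lists sorted in descending order.

    lists: one list of (object, score) pairs per attribute.
    Stops once the k-th best aggregate seen so far is at least the sum
    of the scores at the current depth, which bounds any unseen object.
    """
    lookup = [dict(lst) for lst in lists]  # random access by object
    seen = {}
    depth = max(len(lst) for lst in lists)
    for pos in range(depth):
        last = []
        for lst in lists:
            if pos >= len(lst):
                last.append(0.0)
                continue
            obj, score = lst[pos]
            last.append(score)
            if obj not in seen:
                seen[obj] = sum(d.get(obj, 0.0) for d in lookup)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= sum(last):
            return top  # early termination
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
```

The early stop is only safe because a sum is monotonic: no unseen object can score above the current threshold. A non-monotonic similarity function breaks that bound, which is why the abstract redefines the threshold.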

Statistical Fingerprint Recognition Matching Method with an Optimal Threshold and Confidence Interval

  • Hong, C.S.;Kim, C.H.
    • The Korean Journal of Applied Statistics / v.25 no.6 / pp.1027-1036 / 2012
  • Among various biometric recognition systems, we consider statistical fingerprint matching methods that use the minutiae of fingerprints. We define similarity distance measures based on the coordinates and angles of the minutiae, and suggest a fingerprint recognition model that follows statistical distributions. We obtain confidence intervals of the similarity distance for the same person and for different persons, as well as optimal thresholds that minimize the two kinds of error rates for the distance distributions. We find that the two confidence intervals do not overlap and that the optimal threshold lies between them. Hence, an alternative statistical matching method can be suggested that uses the non-overlapping confidence intervals and the optimal thresholds obtained from the distributions of similarity distances.
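
The idea of a threshold that balances the two error rates can be sketched empirically in Python. This is not the paper's distribution-based derivation: the sketch scans observed distances, accepts a match when the distance is at or below the threshold, and minimizes the sum of the false rejection and false acceptance rates, which is an assumed objective. The function names are hypothetical.

```python
def error_rates(same, diff, t):
    """Return (false rejection rate, false acceptance rate) at threshold t.

    same: distances between prints of the same person.
    diff: distances between prints of different persons.
    A pair is accepted as a match when its distance is <= t.
    """
    frr = sum(d > t for d in same) / len(same)
    far = sum(d <= t for d in diff) / len(diff)
    return frr, far

def best_threshold(same, diff):
    """Pick the observed distance that minimizes FRR + FAR (assumed objective)."""
    candidates = sorted(set(same) | set(diff))
    return min(candidates, key=lambda t: sum(error_rates(same, diff, t)))
```

When the same-person and different-person distances do not overlap, as the abstract reports for the confidence intervals, any threshold between the two groups drives both error rates to zero on the observed data.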

Tuning the Parameters for the Decision Making System in Order to Define Athlete's Aerobic and Anaerobic Thresholds

  • Ketola, Jaakko;Saastamoinen, Kalle;Turunen, Esko
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2004.08a / pp.317-320 / 2004
  • In this work, we have found parameters for defining an athlete's aerobic and anaerobic thresholds, which are of vital importance for top athletes. We show how differential evolution and different similarity measures have been used to tune a computational model for threshold definition. Our results show that using the right parameter values is of vital importance for this kind of expert system.

Optimizing Similarity Threshold and Coverage of CBR (사례기반추론의 유사 임계치 및 커버리지 최적화)

  • Ahn, Hyunchul
    • KIPS Transactions on Software and Data Engineering / v.2 no.8 / pp.535-542 / 2013
  • Because case-based reasoning (CBR) has many advantages, it has been used to support decision making in various areas, including medical checkups, production planning, and customer classification. However, several factors must be set heuristically when designing effective CBR systems. Among these factors, this study addresses the selection of appropriate neighbors in the case retrieval step. As the criterion for selecting appropriate neighbors, conventional studies have used a preset number of neighbors to combine (i.e., the k of k-nearest neighbor) or a relative portion of the maximum similarity. In contrast, this study proposes using an absolute similarity threshold, varying from 0 to 1, as the criterion for selecting the neighbors to combine. In this case, an overly strict similarity threshold may cause the model to rarely produce a solution. To avoid this, we propose adopting the coverage, defined as the ratio of the cases for which solutions are produced to the total number of training cases, and setting it as a constraint when optimizing the similarity threshold. To validate the usefulness of the proposed model, we applied it to a real-world target marketing case of an online shopping mall in Korea. As a result, we found that the proposed model might significantly improve the performance of CBR.
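
The coverage constraint can be sketched in a few lines of Python. The sketch assumes that a case receives a solution only when its most similar stored case reaches the threshold, and it only filters candidate thresholds by a minimum coverage; the performance objective that the study optimizes is left out. The names `coverage` and `feasible_thresholds` and the sample values are hypothetical.

```python
def coverage(max_sims, threshold):
    """Fraction of cases whose best neighbor similarity reaches the threshold.

    max_sims: for each training case, the similarity of its nearest neighbor.
    """
    return sum(s >= threshold for s in max_sims) / len(max_sims)

def feasible_thresholds(max_sims, min_coverage, grid):
    """Keep only the candidate thresholds that satisfy the coverage constraint."""
    return [t for t in grid if coverage(max_sims, t) >= min_coverage]

max_sims = [0.95, 0.9, 0.7, 0.5]           # illustrative values only
grid = [0.5, 0.6, 0.7, 0.8, 0.9]
print(feasible_thresholds(max_sims, 0.75, grid))
```

An optimizer would then search only among the feasible thresholds, so it cannot improve accuracy merely by declining to answer most cases.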