• Title/Summary/Keyword: graph similarity

Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model (굼벨 분포 모델을 이용한 표절 프로그램 자동 탐색 및 추적)

  • Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The KIPS Transactions:PartA
    • /
    • v.16A no.6
    • /
    • pp.453-462
    • /
    • 2009
  • Studies on software plagiarism detection, prevention, and judgement have become widespread because of growing interest in, and the importance of, protecting and authenticating software intellectual property. Many previous studies focused on comparing all pairs of submitted codes by using attribute counting, token patterns, program parse trees, and similarity-measuring algorithms. It is important to provide a clear-cut model for distinguishing plagiarism from collaboration. This paper proposes a source code clustering algorithm using a probability model based on an extreme value distribution. First, we propose an asymmetric distance measure pdist($P_a$, $P_b$) to measure the similarity of $P_a$ and $P_b$. Then, we construct the Plagiarism Direction Graph (PDG) for a given program set using pdist($P_a$, $P_b$) as edge weights, and we transform the PDG into a Gumbel Distance Graph (GDG) model, since we found that the pdist($P_a$, $P_b$) score distribution is similar to the well-known Gumbel distribution. Second, we newly define pseudo-plagiarism, a sort of virtual plagiarism forced by a very strong functional requirement in the specification. We conducted experiments with 18 groups of programs (more than 700 source codes) collected from the ICPC (International Collegiate Programming Contest) and KOI (Korean Olympiad for Informatics) programming contests. The experiments showed that most plagiarized codes could be detected with high sensitivity and that our algorithm successfully separated real plagiarism from pseudo-plagiarism.
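
The abstract above observes that pdist scores roughly follow a Gumbel distribution. As a minimal sketch of how such an observation could be used (not the authors' algorithm), the code below fits a Gumbel distribution to a list of scores by the method of moments and flags scores that fall in the extreme upper tail. The function names, the `tail` threshold, and the sample scores are all assumptions made for illustration.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(scores):
    """Method-of-moments fit: returns (mu, beta) of a Gumbel distribution."""
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    beta = std * math.sqrt(6) / math.pi   # scale parameter
    mu = mean - EULER_GAMMA * beta        # location parameter
    return mu, beta


def gumbel_cdf(x, mu, beta):
    """Cumulative distribution function of the Gumbel distribution."""
    return math.exp(-math.exp(-(x - mu) / beta))


def flag_extreme(scores, tail=0.01):
    """Return the scores whose upper-tail probability is below `tail`."""
    mu, beta = fit_gumbel(scores)
    return [s for s in scores if 1.0 - gumbel_cdf(s, mu, beta) < tail]


if __name__ == "__main__":
    # Hypothetical similarity scores; one pair stands out from the rest.
    sample = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 5.0]
    print(flag_extreme(sample, tail=0.05))
```

A real detector would fit the distribution on scores from programs known to be independent, since an outlier in the fitting sample inflates the estimated scale.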

A Study on the Visual Representation of TREC Text Documents in the Construction of Digital Library (디지털도서관 구축과정에서 TREC 텍스트 문서의 시각적 표현에 관한 연구)

  • Jeong, Ki-Tai;Park, Il-Jong
    • Journal of the Korean Society for Information Management
    • /
    • v.21 no.3
    • /
    • pp.1-14
    • /
    • 2004
  • Visualization of documents helps users when they search for similar documents. All research in information retrieval addresses the problem of a user with an information need facing a data source containing an acceptable solution to that need. In various contexts, adequate solutions to this problem have included alphabetized cubbyholes housing papyrus rolls, microfilm registers, card catalogs, and inverted files coded onto discs. Many information retrieval systems rely on the use of a document surrogate. Though they might be surprised to discover it, nearly every information seeker uses an array of document surrogates. Summaries, tables of contents, abstracts, reviews, and MARC records are all document surrogates. That is, they stand in for a document, allowing a user to make some decision regarding it: whether to retrieve a book from the stacks, whether to read an entire article, etc. In this paper, another type of document surrogate is investigated using a term-list grouping method. Using the Multidimensional Scaling (MDS) method, those surrogates are visualized on a two-dimensional graph. The distances between dots on the two-dimensional graph represent the similarity of the documents: the closer the dots, the more similar the documents.

Query Optimization Algorithm for Image Retrieval by Spatial Similarity (위치 관계에 의한 영상 검색을 위한 질의 및 검색 기법)

  • Cho, Sue-Jin;Yoo, Suk-In
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.5
    • /
    • pp.551-562
    • /
    • 2000
  • A content-based image retrieval system retrieves an image from a database using visual features. Among the approaches to expressing visual aspects in queries, 'query by sketch' is the most convenient and expressive. However, every 'query by sketch' system has the query imperfectness problem: generally, the query image produced by a user differs from the intended target image. To overcome this problem, many image retrieval systems use the spatial relationships of the objects instead of their pixel coordinates. In this paper, a query-converting algorithm is proposed for an image retrieval system that uses the spatial relationship of every two objects as an image feature. The proposed algorithm converts the query image into a graph that has the minimum number of edges by eliminating every transitive edge. Since each edge in the graph represents the spatial relationship of two objects, the elimination of unnecessary edges makes the retrieval process more efficient. Experimental results show that the proposed algorithm requires fewer comparisons in the search process than other algorithms that do not guarantee the minimum number of edges.
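
Eliminating every transitive edge corresponds to what graph theory calls a transitive reduction. The sketch below illustrates the idea on a directed acyclic graph. It is not the paper's query-converting algorithm, and the `left_of` relation and object names are hypothetical. An edge (u, v) is dropped when v is still reachable from u through some other path.

```python
def transitive_reduction(edges):
    """Return the edges of a DAG that are not implied by longer paths.

    `edges` is a set of (u, v) pairs. The result is only guaranteed to be
    the unique minimum edge set when the input graph is acyclic.
    """
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
        successors.setdefault(v, set())

    def reachable_without(src, dst, skipped_edge):
        # Depth-first search from src to dst that ignores skipped_edge.
        stack, seen = [src], {src}
        while stack:
            node = stack.pop()
            for nxt in successors[node]:
                if (node, nxt) == skipped_edge:
                    continue
                if nxt == dst:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {(u, v) for (u, v) in edges
            if not reachable_without(u, v, (u, v))}


if __name__ == "__main__":
    # Hypothetical "left_of" relations among three sketched objects.
    left_of = {("tree", "house"), ("house", "car"), ("tree", "car")}
    print(sorted(transitive_reduction(left_of)))
    # [('house', 'car'), ('tree', 'house')]
```

Here the edge from "tree" to "car" is removed because it follows from the other two, so a matcher has one fewer relation to compare.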

Exploiting Query Proximity and Graph Profiling Method for Tag-based Personalized Search in Folksonomy (질의어의 근접성 정보 및 그래프 프로파일링 기법을 이용한 태그 기반 개인화 검색)

  • Han, Keejun;Jang, Jincheul;Yi, Mun Yong
    • Journal of KIISE
    • /
    • v.41 no.12
    • /
    • pp.1117-1125
    • /
    • 2014
  • Folksonomy data, which is derived from social tagging systems, is a useful source for understanding a user's intention and interest. Using the folksonomy data, it is possible to create an accurate user profile which can be utilized to build a personalized search system. However, some traditional methods, such as the Vector Space Model (VSM), have limitations for user profiling and similarity computation. This paper suggests a novel method with graph-based user and document profiles that uses the proximity information of query terms to improve personalized search. We demonstrate the performance of the suggested method by comparing it with several state-of-the-art VSM-based personalization models on two different folksonomy datasets. The results show that the proposed model consistently outperforms the other state-of-the-art personalization models. Furthermore, the parameter sensitivity results show that the proposed model is parameter-free in that it is not affected by the idiosyncratic nature of datasets.

Fuzzy Relevance-based Transcoding for Differentiated Streaming Media Service in the Proxy System (프록시 시스템에서 차별화된 스트리밍 미디어 서비스를 위한 퍼지 적합도 기반 트랜스 코딩)

  • Lee, Chong-Deuk
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.6
    • /
    • pp.2785-2792
    • /
    • 2011
  • Problems such as delay, congestion, and crosstalk in the proxy system degrade not only QoS (Quality of Service) but also the responsiveness and reliability of the streaming media service. To solve these problems, this paper proposes an FRTP (Fuzzy Relevance-based Transcoding Proxy) mechanism. The proposed FRTP mechanism analyzes the fuzzy similarity of partitioned segment versions of media objects to create an FRTG (Fuzzy Relevance-based Transcoding Graph). The FRTG determines the transcoding for the partitioned media object segment versions, and this transcoding improves DSR (Delay Saving Ratio), CHPR (Cache Hit Precision Ratio), and CHRR (Cache Hit Recall Ratio). The proposed mechanism is simulated to evaluate these performance parameters. Simulation results show that the proposed mechanism outperforms existing mechanisms in DSR, CHPR, and CHRR.

Analysis of Intention in Spoken Dialogue based on Classifying Sentence Patterns (문형구조의 분류에 따른 대화음성의 의도분석에 관한 연구)

  • Choi, Hwan-Jin;Song, Chang-Hwan;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.1
    • /
    • pp.61-70
    • /
    • 1996
  • Depending on the topic or the speaker's intention in a dialogue, an utterance has a different sentence structure of word combinations. Based on this fact, we propose a statistical approach, the IDT (intention decision table), which models the correlations between sentence patterns and intentions. In an IDT, a sentence is split into 5 different factors, and the intention of the sentence is determined by the similarity between an intention and the 5 factors that represent the sentence. The experimental results indicate that the IDT improves the intention prediction rate by 10~18% over word-intention correlations and by 3~12% over the MIG (Markov intention graph), which models the intention with a transition graph over the word categories in a sentence. Based on these results, we find that the IDT is an effective method for predicting intentions.

Photo Clustering using Maximal Clique Finding Algorithm and Its Visualized Interface (최대 클리크 찾기 알고리즘을 이용한 사진 클러스터링 방법과 사진 시각화 인터페이스)

  • Ryu, Dong-Sung;Cho, Hwan-Gue
    • Journal of the Korea Computer Graphics Society
    • /
    • v.16 no.4
    • /
    • pp.35-40
    • /
    • 2010
  • With the spread of digital cameras, much work on photo management has been done. However, most of this work uses a sequential grid layout that arranges photos according to a single criterion. This interface forces users to scroll a lot and to concentrate hard when they manage their photos. In this paper, we propose a clustering method based on temporal sequence that also considers color similarity in detail. First, we cluster photos using Cooper's event clustering method. Second, we make more detailed clusters from each temporally clustered photo set using a maximal clique finding algorithm for interval graphs. Finally, we arrange the detailed clusters on the user's screen with overlap while keeping their temporal sequence. To evaluate the proposed system, we conducted user studies based on a simple questionnaire.
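
Maximal cliques of an interval graph can be listed with a single sweep over the sorted interval endpoints. The sketch below shows that standard technique in isolation. How the paper derives an interval for each photo is not stated in the abstract, so the closed `(start, end)` intervals used here are purely hypothetical.

```python
def maximal_cliques(intervals):
    """List the maximal cliques of the interval graph of closed intervals.

    `intervals` is a list of (start, end) pairs; each clique is returned
    as a sorted list of indices into that list.
    """
    events = []
    for idx, (start, end) in enumerate(intervals):
        events.append((start, 0, idx))  # 0 sorts starts before ends on ties
        events.append((end, 1, idx))
    events.sort()

    active = set()       # intervals currently open at the sweep position
    cliques = []
    grew = False         # True if an interval opened since the last report
    for _, kind, idx in events:
        if kind == 0:
            active.add(idx)
            grew = True
        else:
            if grew:
                # The active set cannot grow further before idx closes,
                # so it is a maximal clique.
                cliques.append(sorted(active))
                grew = False
            active.remove(idx)
    return cliques


if __name__ == "__main__":
    # Hypothetical intervals, one per photo.
    photos = [(0, 3), (2, 5), (4, 7), (10, 11)]
    print(maximal_cliques(photos))  # [[0, 1], [1, 2], [3]]
```

Photos 0 and 2 end up in different cliques because their intervals do not overlap, even though both overlap photo 1.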

Efficient graph-based two-stage superpixel generation method (효율적인 그래프 기반 2단계 슈퍼픽셀 생성 방법)

  • Park, Sanghyun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.12
    • /
    • pp.1520-1527
    • /
    • 2019
  • In computer vision, superpixel methods are widely used in the preprocessing stage to reduce computational complexity by simplifying images while maintaining their characteristics. It is common to generate superpixels with a regular size and form based on pixel values rather than considering the characteristics of the image. In this paper, we propose a method to generate superpixels considering the characteristics of an image according to the application. The proposed method consists of two steps. The first step oversegments an image so that the boundary information of the image is well preserved. In the second step, superpixels are merged based on similarity to produce the desired number of superpixels, where the form of the superpixels is controlled by limiting the maximum superpixel size. Experimental results show that the proposed method preserves the boundaries of an image more accurately than the existing method.
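
The second step described above, merging similar regions until a target count is reached while capping region size, can be sketched as a greedy loop over a region adjacency graph. This is an illustrative simplification and not the paper's method. The similarity measure (absolute difference of mean intensity), the function name, and all parameters are assumptions.

```python
def merge_superpixels(means, sizes, adjacency, target, max_size):
    """Greedily merge adjacent regions with the most similar mean values.

    means, sizes: dicts keyed by region id; adjacency: iterable of
    (a, b) pairs. Stops when `target` regions remain or when no merge
    satisfies the `max_size` limit.
    """
    means, sizes = dict(means), dict(sizes)
    neighbours = {region: set() for region in means}
    for a, b in adjacency:
        neighbours[a].add(b)
        neighbours[b].add(a)

    while len(means) > target:
        best = None  # (difference, a, b) of the most similar valid pair
        for a in neighbours:
            for b in neighbours[a]:
                if a < b and sizes[a] + sizes[b] <= max_size:
                    diff = abs(means[a] - means[b])
                    if best is None or diff < best[0]:
                        best = (diff, a, b)
        if best is None:
            break  # every remaining merge would exceed max_size
        _, a, b = best
        total = sizes[a] + sizes[b]
        # Region a absorbs region b; the mean is the size-weighted average.
        means[a] = (means[a] * sizes[a] + means[b] * sizes[b]) / total
        sizes[a] = total
        for n in neighbours.pop(b):
            neighbours[n].discard(b)
            if n != a:
                neighbours[n].add(a)
                neighbours[a].add(n)
        del means[b], sizes[b]
    return means, sizes


if __name__ == "__main__":
    # Four hypothetical oversegmented regions in a row.
    region_means = {0: 10.0, 1: 12.0, 2: 50.0, 3: 52.0}
    region_sizes = {0: 4, 1: 4, 2: 4, 3: 4}
    print(merge_superpixels(region_means, region_sizes,
                            [(0, 1), (1, 2), (2, 3)],
                            target=2, max_size=8))
```

The size cap is what keeps a merged region from swallowing its neighbours: once a region reaches `max_size`, it is no longer a merge candidate.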

Authorship Attribution Framework Using Survival Network Concept : Semantic Features and Tolerances (서바이벌 네트워크 개념을 이용한 저자 식별 프레임워크: 의미론적 특징과 특징 허용 범위)

  • Hwang, Cheol-Hun;Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.30 no.6
    • /
    • pp.1013-1021
    • /
    • 2020
  • Malware Authorship Attribution is a research field for identifying malware by comparing the author characteristics of unknown malware with the characteristics of known malware authors. The authorship attribution method using binaries has the advantage that it is easy to collect and analyze targeted malicious codes, but the range of usable features is limited compared to the method using source code. This limitation has the disadvantage that accuracy decreases for a large number of authors. This study proposes a method of 'Defining semantic features from binaries' and 'Defining allowable ranges for redundant features using the concept of survival network' to complement the limitations in the identification of binary authors. The proposed method defines Opcode-based graph features from binary information, and defines the allowable range for selecting unique features for each author using the concept of a survival network. In this way, feature definition and feature selection for each author could be defined as a single technique, and experiments confirmed that the method achieves the same level of accuracy as source code-based analysis, with a 5.0% accuracy improvement over the previous study.

A Study on the Intelligent Service Selection Reasoning for Enhanced User Satisfaction : Appliance to Cloud Computing Service (사용자 만족도 향상을 위한 지능형 서비스 선정 방안에 관한 연구 : 클라우드 컴퓨팅 서비스에의 적용)

  • Shin, Dong Cheon
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.35-51
    • /
    • 2012
  • Cloud computing is Internet-based computing where computing resources are offered over the Internet as scalable, on-demand services. In particular, as a variety of cloud services emerge with the development of Internet and mobile technology, selecting and providing services that satisfy service users is one of the important issues. Most previous works are limited in the degree of user satisfaction they achieve because they are based on so-called concept similarity in relation to user requirements or lack versatility with respect to user preferences. This paper presents cloud service selection reasoning that can be applied to general cloud service environments including a variety of computing resource services, not limited to web services. In these service environments, there are two kinds of services: atomic services and composite services. An atomic service consists of service attributes that represent the characteristics of the service, such as functionality, performance, or specification. A composite service can be created by composing atomic services and other composite services. Therefore, a composite service inherits the attributes of its component services. The main participants in providing cloud services are service users, service suppliers, and service operators. Service suppliers can register services autonomously or through strategic collaboration with service operators. Service users submit request queries including the service name and requirements to the service management system. The service management system consists of a query processor for processing user queries, a registration manager for service registration, and a selection engine for service selection reasoning.
In order to enhance the degree of user satisfaction, our reasoning is based on the degree of conformance of service attributes to user requirements in terms of functionality, performance, and specification, instead of concept similarity as in ontology-based reasoning. For this, we introduce a so-called service attribute graph (SAG), which is generated by considering the inclusion relationships among instances of a service attribute from several perspectives such as functionality, performance, and specification. Hence, a SAG is a directed graph that shows the inclusion relationships among attribute instances. Since the degree of conformance is very close to the inclusion relationship, the acceptability of services depends on the closeness of the inclusion relationship among corresponding attribute instances. That is, high closeness implies high acceptability because the degree of closeness reflects the degree of conformance among attribute instances. The degree of closeness is determined by the path length between two vertices in the SAG: a shorter path length means a closer inclusion relationship than a longer one, which implies a higher degree of conformance. In addition to acceptability, other user preferences such as attribute priorities and mandatory options are reflected to accommodate the variety of user requirements. Furthermore, considering various attribute types such as character, number, and boolean also helps to support the variety of user requirements. Finally, cloud services are rated according to their service value relative to price and recommended to users. One significant contribution of this paper is that it is the first attempt to present graph-based selection reasoning, unlike other works, while considering various user preferences in relation to service attributes.
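
The path-length idea in this abstract can be illustrated with a breadth-first search over a small directed graph. This is a minimal sketch under stated assumptions: the abstract gives no formula, so the `1 / (1 + d)` closeness score, the function names, and the memory-size attribute instances are all hypothetical.

```python
from collections import deque


def path_length(graph, src, dst):
    """Shortest directed path length from src to dst, or None if none."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def closeness(graph, offered, required):
    """Assumed score: 1.0 for an exact match, decreasing with path length,
    and 0.0 when the offered instance does not include the required one."""
    dist = path_length(graph, offered, required)
    return 0.0 if dist is None else 1.0 / (1 + dist)


if __name__ == "__main__":
    # Hypothetical SAG for a memory attribute: an edge u -> v means
    # "instance u includes instance v".
    sag = {"16GB": ["8GB"], "8GB": ["4GB"], "4GB": []}
    print(closeness(sag, "16GB", "16GB"))  # 1.0
    print(closeness(sag, "16GB", "8GB"))   # 0.5
    print(closeness(sag, "4GB", "16GB"))   # 0.0
```

Any monotonically decreasing function of path length would express the same ordering. The reciprocal form is chosen here only because it keeps scores between 0 and 1.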