• Title/Summary/Keyword: string similarity

Search Result 47

A Program-Plagiarism Checker using Abstract Syntax Tree (구문트리 비고를 통한 프로그램 유형 복제 검사)

  • 김영철;김성근;염세훈;최종명;유재우
    • Journal of KIISE:Software and Applications / v.30 no.7_8 / pp.792-802 / 2003
  • Earlier program plagiarism checkers rely on matching techniques based on simple text, attributes, or token strings. They have difficulty with program style elements that are unrelated to program syntax, such as indentation, spacing, and comments. This paper introduces a plagiarism check model that compares the syntax trees of the given programs. By using syntax trees, the system overcomes the weakness in filtering out program style and gains the advantage of comparing program structure through syntactic and semantic analysis. Our study presents algorithms for syntax tree creation, unparsing, and similarity checking for C/C++ program plagiarism detection in Internet-based cyber education, and estimates plagiarism patterns.

Ontology Alignment based on Parse Tree Kernel using Structural and Semantic Information (구조 및 의미 정보를 활용한 파스 트리 커널 기반의 온톨로지 정렬 방법)

  • Son, Jeong-Woo;Park, Seong-Bae
    • Journal of KIISE:Software and Applications / v.36 no.4 / pp.329-334 / 2009
  • Ontology alignment has two major problems. First, the features used for ontology alignment are usually defined by experts, so some critical features may well be excluded from the feature set. Second, the semantic and structural similarities are usually computed independently and then combined in an ad hoc way, with weights determined heuristically. This paper proposes the modified parse tree kernel (MPTK) for ontology alignment. To compute the similarity between entities in the ontologies, a tree is adopted as the representation of an ontology. After an ontology is transformed into a set of trees, their similarity is computed using MPTK without explicit enumeration of features. In computing the similarity between trees, approximate string matching is adopted so that both structural and semantic information are reflected naturally. In a series of experiments on a standard data set, the kernel method outperforms other structural similarities such as GMO. In addition, the proposed method shows state-of-the-art performance in ontology alignment.

Phonetic Similarity Measure for the Korean Transliterations of Foreign Words (외국어 음차 표기의 음성적 유사도 비교 알고리즘)

  • Gang, Byeong-Ju;Lee, Jae-Seong;Choe, Gi-Seon
    • Journal of KIISE:Software and Applications / v.26 no.10 / pp.1237-1246 / 1999
  • As exchange with other countries grows in every field, more foreign-word transliterations are being used in Korean documents than ever before. Transliterations of the same foreign word vary widely among individuals, which makes it difficult to retrieve documents that contain them. One solution is to group transliterations that derive from the same foreign word into equivalence classes at indexing time and to expand queries with those classes at query time. In this paper we propose Kodex, a method for measuring the phonetic similarity among foreign-word transliterations, which is needed to build these equivalence classes. Compared with existing non-phonetic methods based on string similarity measures, Kodex clusters transliterations into equivalence classes more accurately, giving higher precision at a similar recall level, and it is simple to compute, so it can be implemented much more efficiently.

A Novel Cryptosystem Based on Steganography and Automata Technique for Searchable Encryption

  • Truong, Nguyen Huy
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.5 / pp.2258-2274 / 2020
  • In this paper we first propose a new, highly secure cryptosystem based on our data hiding scheme (2,9,8) introduced in 2019, in which encrypting and hiding are done at once and the ciphertext does not depend on the input image size, as it does in existing hybrid techniques of cryptography and steganography. We then exploit our automata approach presented in 2019 to design two algorithms for exact and approximate pattern matching on secret data encrypted by our cryptosystem. Theoretical analysis shows that both algorithms have O(n) time complexity in the worst case, where for the approximate algorithm we assume that it uses ⌈(1-ε)m⌉ processors, and ε, m, and n are the error of our string similarity measure, the length of the pattern, and the length of the secret data, respectively. In searchable encryption, our cryptosystem is used by users and our pattern matching algorithms are run by cloud providers.

The Method of Searching Unified Medical Language System Using Automatic Modified a Query (자동 질의수정을 통한 통합의학언어 시스템 검색)

  • 김종광;하원식;이정현
    • Proceedings of the IEEK Conference / 2003.11b / pp.129-132 / 2003
  • The Metathesaurus (UMLS, 2003AA edition) supports multiple languages and includes 875,233 concepts and 2,146,897 concept names. The Metathesaurus search services provided by PubMed or NLM cannot retrieve results for a query that is not proper text, has a faulty sentence structure, or is only part of a concept name. This means users must know the exact medical terms to get a correct answer; otherwise they cannot find the information they want. To address this problem, I propose a method of searching the Unified Medical Language System using automatic query modification. The method uses a standard dictionary to automate query modification, gauging the similarity between the query and dictionary entries with a string comparison algorithm. The tested term is then converted into the Metathesaurus form to optimize the result. To evaluate the method, I select some queries and compare the results against the NLM method as updated in August 2003.
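
The dictionary-based query modification described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say which string comparison algorithm is used, so Levenshtein edit distance is an assumption here, and the function names, the normalization to a 0-1 score, and the 0.7 threshold are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to a score in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def correct_query(query: str, dictionary: list[str], threshold: float = 0.7):
    """Return the dictionary term closest to the query, or None if none is close enough.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    q = query.lower()
    best = max(dictionary, key=lambda term: similarity(q, term.lower()), default=None)
    if best is None or similarity(q, best.lower()) < threshold:
        return None
    return best
```

For example, `correct_query("hypertenson", ["hypertension", "hypotension", "diabetes"])` selects `"hypertension"`, since it is one insertion away from the misspelled query.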


A Proposal of a Shape Matching and Geo-referencing method for Building Features in Construction CAD Data to Digital Map using a Vertex Attributed String Matching algorithm (VASM 알고리즘을 이용한 건축물 CAD 자료의 수치지도 건물 객체와의 형상 정합 및 지도좌표 부여 방법의 제안)

  • Huh, Yong;Yu, Ki-Yun;Kim, Hyung-Tae
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.26 no.4 / pp.387-396 / 2008
  • Integrating construction CAD data with GIS data requires geo-referencing construction CAD data whose coordinate systems are native to the data or even unknown. Generally, this process is based on manually detected conjugate vertices. In this study, we propose a semi-automated conjugate-vertex detection method for building features between construction CAD data and a digital map, using a vertex attributed string matching algorithm. A geo-referencing function for construction CAD data, based on the similarity transform, can then be derived from those conjugate vertices. Using the proposed method, we overlaid geo-referenced CAD data on a digital map of the College of Engineering, Seoul National University, and evaluated the method.

Video Index Generation and Search using Trie Structure (Trie 구조를 이용한 비디오 인덱스 생성 및 검색)

  • 현기호;김정엽;박상현
    • Journal of KIISE:Software and Applications / v.30 no.7_8 / pp.610-617 / 2003
  • Similarity matching in video databases is of growing importance in many new applications such as video clustering and digital video libraries. To provide efficient access to relevant data in large databases, there have been many research efforts in video indexing with diverse spatial and temporal features. However, most previous work relied on sequential matching methods or memory-based inverted file techniques, making it unsuitable for large-volume video databases. To resolve this problem, this paper proposes an effective and scalable indexing technique that uses a trie, originally proposed for string matching, as the index structure. To build an index, we convert each frame into a symbol sequence using a window order heuristic and build a disk-resident trie from the set of symbol sequences. For query processing, we perform a depth-first search on the trie and execute a temporal segmentation. To verify the superiority of our approach, we perform several experiments with real and synthetic data sets. The results show that our approach consistently outperforms the sequential scan method, and that the performance gain is maintained even for large-volume video databases.
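
The idea of indexing symbol sequences in a trie and answering queries by depth-first search can be sketched as follows. This is an in-memory illustration only, not the paper's method: the symbols are assumed to be already extracted, the mismatch budget is a hypothetical stand-in for the paper's similarity criterion, and the window order heuristic, disk residency, and temporal segmentation are all omitted.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # symbol -> TrieNode
    ids: list = field(default_factory=list)       # ids of sequences ending here


class SequenceTrie:
    """In-memory trie over symbol sequences (hypothetical class name)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, symbols, seq_id):
        """Add one symbol sequence under the given identifier."""
        node = self.root
        for s in symbols:
            node = node.children.setdefault(s, TrieNode())
        node.ids.append(seq_id)

    def search(self, query, max_mismatches=0):
        """Depth-first search for stored sequences with the same length as
        the query that differ from it in at most max_mismatches positions."""
        results = []
        stack = [(self.root, 0, 0)]  # (node, depth, mismatches so far)
        while stack:
            node, depth, mismatches = stack.pop()
            if depth == len(query):
                results.extend((i, mismatches) for i in node.ids)
                continue
            for symbol, child in node.children.items():
                cost = mismatches + (symbol != query[depth])
                if cost <= max_mismatches:  # prune branches over budget
                    stack.append((child, depth + 1, cost))
        return sorted(results, key=lambda r: (r[1], r[0]))
```

Because a branch is abandoned as soon as it exceeds the mismatch budget, sequences sharing a dissimilar prefix are skipped together, which is the property that makes a trie attractive compared with a sequential scan.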

Effect of Korean and Western Attire of Elderly Women and Perceiver's Age on Impression Formation (노년여성의 한복 및 양장 착용과 관찰자의 연령이 인상형성에 미치는 영향)

  • 이명희
    • Journal of the Korean Society of Costume / v.43 / pp.187-202 / 1999
  • The objective of this study was to analyze the effect of the dress of elderly women (Korean traditional dress and suit) and of the situation on impression formation. The experimental design was a $10 \times 2 \times 2$ (dress $\times$ perceiver's age $\times$ situation) factorial design with 3 independent variables. The stimuli were color photographs of a female model in her 60s, rated on a semantic differential scale. Six variables of impression formation were used: preference, elegance, potency, activity, femininity, and modernity. The sample consisted of 400 women: 200 in their twenties and 200 in their forties and fifties. The data were analyzed by $\alpha$-reliability, t-test, ANOVA, and Duncan's multiple range test. The Korean traditional dress in a combination of Korean traditional colors (light blue upper dress with dark red-purple collar and string, dark blue skirt) had the most positive effect on the impression of elegance. The pink traditional dress and the light blue traditional dress had a negative effect on the impressions of potency, activity, and modernity. The red-purple suit had a positive effect on potency and modernity. The interaction between dress, perceiver's age, and situation was significant for the impression of activity. Women in their 40s and 50s perceived the activity of the red-purple suit more positively in the situation of an alumnae meeting than at a wedding ceremony. The perceived age of the stimulus person differed according to dress: she was perceived as older in traditional dresses than in suits. Women in their 40s and 50s evaluated their preference for the dresses more positively than women in their 20s did. This suggests that women in their 40s and 50s feel more similarity with the stimulus person than women in their 20s do, since the model was in her 60s. The result supports the theory that similarity is a basic factor in interpersonal attraction.


Efficient Handwritten Character Verification Using an Improved Dynamic Time Warping Algorithm (개선된 동적 타임 워핑 알고리즘을 이용한 효율적인 필기문자 감정)

  • Jang, Seok-Woo;Park, Young-Jae;Kim, Gye-Young
    • Journal of the Korea Society of Computer and Information / v.15 no.7 / pp.19-26 / 2010
  • In this paper, we suggest an efficient handwritten character verification method for on-line environments that automatically analyzes two input character strings and computes their degree of similarity. The proposed algorithm first applies the circular projection method to the input handwritten strings and extracts their representative features, including shape and direction. It then calculates the similarity between the two character strings using an improved dynamic time warping (DTW) algorithm. We improve the conventional DTW algorithm by applying to it the branch-and-bound policy, which is well known to produce good results in various optimization problems. Experimental results verifying the performance of the proposed system show that the suggested handwritten character verification method runs faster than the existing DTW and DDTW algorithms.
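
A rough Python sketch of DTW with bound-based pruning is given below. The abstract does not specify how branch-and-bound is applied, so the early-abandoning rule used here (stop as soon as every cell in the current row exceeds a known bound) is an assumption, as are the function names and the use of plain numeric sequences in place of circular-projection features.

```python
import math


def dtw_distance(a, b, bound=math.inf):
    """DTW distance between two numeric sequences.

    Returns math.inf as soon as the distance is certain to exceed `bound`.
    Every warping path crosses each row and local costs are non-negative,
    so the final distance can never be smaller than the row minimum.
    """
    if not a or not b:
        return math.inf
    prev = [math.inf] * (len(b) + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for x in a:
        curr = [math.inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            best_prev = min(prev[j], prev[j - 1], curr[j - 1])
            curr[j] = abs(x - y) + best_prev
        if min(curr) > bound:  # prune: this candidate cannot beat the bound
            return math.inf
        prev = curr
    return prev[-1]


def nearest_template(query, templates):
    """Find the closest template, using the best distance so far as the bound."""
    best_name, best_dist = None, math.inf
    for name, seq in templates.items():
        d = dtw_distance(query, seq, bound=best_dist)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name, best_dist
```

Passing the best distance found so far as the bound lets poor candidates be rejected after only a few rows, which is where the speed-up over plain DTW comes from in this sketch.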

Topic Similarity-based Event Routing Algorithm for Wireless Ad-Hoc Publish/Subscribe Systems (Ad-Hoc 무선 환경의 발행/구독 시스템을 위한 구독주제 유사도 기반의 이벤트 라우팅 알고리즘)

  • Nguyen, Hieu Trung;Oh, Sang-Yoon
    • Journal of the Korea Society of Computer and Information / v.14 no.10 / pp.11-22 / 2009
  • In a wireless ad-hoc network, the event routing algorithm of a publish/subscribe system is especially important for system performance because of the network's dynamic nature and constraints. In this paper, we propose TopSim, a new hybrid event routing algorithm for efficient publish/subscribe systems on wireless ad-hoc networks. TopSim extends the ShopParent algorithm by considering, when choosing a parent in the publish/subscribe tree, not only network overhead but also topic similarity, that is, the closeness of subscriptions. Our evaluation shows that TopSim performs better when a newly joining node subscribes to multiple topics and some parent candidate node subscribes to topics in that list (related topics).