• Title/Summary/Keyword: Post Clustering

A Post-Verification Method of Near-Duplicate Image Detection using SIFT Descriptor Binarization (SIFT 기술자 이진화를 이용한 근-복사 이미지 검출 후-검증 방법)

  • Lee, Yu Jin;Nang, Jongho
    • Journal of KIISE / v.42 no.6 / pp.699-706 / 2015
  • In recent years, as near-duplicate images have increased explosively with the spread of the Internet and of image-editing technology that allows easy access to image content, related research has been actively pursued. However, BoF (Bag-of-Features), the most frequently used method for near-duplicate image detection, can mistake identical features for different ones, or different features for identical ones, during the quantization process that approximates high-level local features with low-level ones. A post-verification method for BoF is therefore required to overcome the limitations of vector quantization. In this paper, we propose and analyze the performance of a post-verification method for BoF that converts SIFT (Scale Invariant Feature Transform) descriptors into 128-bit binary codes and uses these codes to compare binary distances over the short ranked list produced by BoF. An experiment using 1,500 original images showed that near-duplicate detection accuracy improved by approximately 4% over the previous BoF method.
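
The abstract does not say how each SIFT dimension is binarized, so the following Python sketch assumes a simple per-descriptor median threshold to produce the 128-bit code, measures binary distance as the Hamming distance, and re-ranks a BoF shortlist by counting close code matches. The function names and the `max_dist` threshold are hypothetical, not taken from the paper.

```python
from statistics import median


def binarize_sift(descriptor):
    """Map a 128-D SIFT descriptor to a 128-bit integer code.

    Assumed rule (not from the paper): bit i is 1 when dimension i is above
    the descriptor's own median value.
    """
    if len(descriptor) != 128:
        raise ValueError("expected a 128-dimensional SIFT descriptor")
    threshold = median(descriptor)
    code = 0
    for i, value in enumerate(descriptor):
        if value > threshold:
            code |= 1 << i
    return code


def hamming(a, b):
    """Binary (Hamming) distance: number of differing bits."""
    return bin(a ^ b).count("1")


def post_verify(query_codes, shortlist, max_dist=20):
    """Re-rank a BoF shortlist by how many query codes find a close binary match.

    shortlist: list of (image_id, codes) pairs in BoF rank order.
    max_dist: hypothetical Hamming threshold for counting a match.
    """
    scored = []
    for image_id, codes in shortlist:
        matches = sum(
            1 for q in query_codes if any(hamming(q, c) <= max_dist for c in codes)
        )
        scored.append((matches, image_id))
    # list.sort() is stable, so ties keep their original BoF order
    scored.sort(key=lambda s: -s[0])
    return [image_id for _, image_id in scored]
```

Because the codes are plain integers, each comparison in this sketch reduces to one XOR and a bit count, which keeps a verification pass over a short ranked list inexpensive.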

Unsupervised Document Clustering for Constructing User Profile of Web Agent (웹 에이전트 사용자 특성모델 구축을 위한 비감독 문서 분류)

  • 오재준;박영택
    • Proceedings of the Korean Information Science Society Conference / 1998.10c / pp.105-107 / 1998
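  • This study aims to improve the method of constructing user profiles, arguably the most essential part of a web agent. To extract a user profile automatically through inductive machine learning, it is crucial to classify documents automatically according to the fields the user is interested in. Until now, documents have been classified manually according to whether a person is interested in them, but when the number of documents grows exponentially, there is inevitably a limit to how many documents can be processed. Moreover, if manual document classification were applied directly to a web agent, the burden of requiring users to classify every document themselves would greatly diminish the agent's usefulness. Therefore, this study provides a concrete method for obtaining more concise and accurate document classification results through an unsupervised document classification algorithm and post-processing of the classification information it produces.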

Unsupervised Document Clustering for Constructing User Profile of Web Agent (웹 에이전트 사용자 특성모델 구축을 위한 비감독 문서 분류)

  • 오재준;박영택
    • Journal of Intelligence and Information Systems / v.4 no.2 / pp.61-83 / 1998
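  • This study aims to improve the method of constructing user profiles, arguably the most essential part of a web agent. To extract a user profile automatically through inductive machine learning, it is crucial to classify documents automatically according to the fields the user is interested in. Until now, documents have been classified manually according to whether a person is interested in them, but when the number of documents grows exponentially, there is inevitably a limit to how many documents can be processed. Moreover, if manual document classification were applied directly to a web agent, the burden of requiring users to classify every document themselves would greatly diminish the agent's usefulness. Therefore, this study provides a concrete method for obtaining more concise and accurate document classification results through an unsupervised document classification algorithm and post-processing of the classification information it produces.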
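
The abstract names neither the clustering algorithm nor the post-processing rules. As an illustration only, the Python sketch below substitutes single-pass (leader) clustering over bag-of-words vectors with cosine similarity, followed by a post-processing step that merges undersized clusters into their most similar larger cluster. The whitespace tokenization, `threshold`, `min_size`, and all function names are assumptions, not the authors' method.

```python
from collections import Counter
from math import sqrt


def tf_vector(text):
    """Bag-of-words term frequencies (whitespace tokenization is an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Summed term counts; cosine similarity ignores the overall scale."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def leader_cluster(docs, threshold=0.5):
    """Single pass: each document joins its most similar cluster or starts a new one."""
    vecs = [tf_vector(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, v in enumerate(vecs):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(v, centroid(vecs[j] for j in c))
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters, vecs


def post_process(clusters, vecs, min_size=2):
    """Hypothetical post-processing: fold clusters smaller than min_size into
    the large cluster whose centroid is most similar. Input is not mutated."""
    large = [list(c) for c in clusters if len(c) >= min_size]
    small = [c for c in clusters if len(c) < min_size]
    if not large:
        return [list(c) for c in clusters]
    cents = [centroid(vecs[j] for j in c) for c in large]
    for c in small:
        v = centroid(vecs[j] for j in c)
        k = max(range(len(large)), key=lambda idx: cosine(v, cents[idx]))
        large[k].extend(c)
    return large
```

In this sketch the post-processing step is what yields a "more concise" result: singleton clusters that fell just below the similarity threshold are absorbed rather than reported as separate interest categories.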

Analysis of Land Uses in the Nakdong River Floodplain Using RapidEye Imagery and LiDAR DEM (RapidEye 영상과 LiDAR DEM을 이용한 낙동강 범람원 내 토지 이용 현황 분석)

  • Choung, Yun-Jae
    • Journal of the Korean Association of Geographic Information Studies / v.17 no.4 / pp.189-199 / 2014
  • A floodplain is the flat land between a river and its levees. This paper suggests a methodology for analyzing land uses in the Nakdong River floodplain using RapidEye imagery and a given LiDAR (LIght Detection And Ranging) DEM (Digital Elevation Model). First, the levee boundaries are generated from the LiDAR DEM, and the floodplain area is extracted from the RapidEye imagery. Land uses in the floodplain are then identified in the extracted imagery by ISODATA (Iterative Self-Organizing Data Analysis Technique) clustering, with an overall accuracy of 91%. The identified land uses are analyzed by counting the pixels that make up each land-cover cluster. The results show that, within the identified floodplain, the river occupies 46% of the area, bare soil 36%, marsh 11%, and grass 7%.
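
The area analysis described above amounts to counting pixels per land-cover cluster inside the floodplain mask. The Python sketch below assumes the ISODATA labels are already available as a 2-D grid, with pixels outside the floodplain marked `None`, and that an analyst has mapped each cluster label to a land-cover name; `cover_shares` is a hypothetical helper, and the ISODATA step itself is not shown.

```python
from collections import Counter


def cover_shares(label_image, class_names):
    """Percentage of floodplain pixels assigned to each land-cover class.

    label_image: 2-D list of cluster labels; None marks pixels outside the floodplain.
    class_names: mapping from cluster label to land-cover name (analyst-assigned).
    """
    counts = Counter(
        class_names[label] for row in label_image for label in row if label is not None
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Since every RapidEye pixel covers the same ground area, pixel-count shares equal area shares, which is why counting alone is enough for the percentages reported in the abstract.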

Density-Based Estimation of POI Boundaries Using Geo-Tagged Tweets (공간 태그된 트윗을 사용한 밀도 기반 관심지점 경계선 추정)

  • Shin, Won-Yong;Vu, Dung D.
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.2 / pp.453-459 / 2017
  • Users tend to check in and post their statuses in location-based social networks (LBSNs) to indicate that their interests relate to a point of interest (POI). While previous studies on discovering areas of interest (AOIs) relied mostly on density-based clustering of geo-tagged photos collected from LBSNs, we focus on estimating a POI boundary, which corresponds to a single cluster containing the POI center. Using geo-tagged tweets from Twitter users, this paper introduces a low-complexity, two-phase density-based method that estimates a POI boundary by finding a suitable radius reachable from the POI center. We estimate the POI boundary as the convex hull of the geo-tags selected through this two-phase estimation, where each phase uses a different radius increment. The method is shown to outperform the conventional density-based clustering method in terms of computational complexity.
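
The abstract does not define the density criterion or the radius increments, so the Python sketch below rests on explicit assumptions: coordinates are already projected to a plane (e.g., meters); the radius grows while the density of geo-tags in each newly covered ring stays at or above `min_density`; phase 1 uses a coarse step and phase 2 continues from that radius with a fine step; and the boundary is the convex hull (Andrew's monotone chain) of the geo-tags inside the final radius. All names and default values are hypothetical.

```python
from math import hypot, pi


def ring_density(points, center, r_in, r_out):
    """Geo-tags per unit area in the ring r_in < d <= r_out around the POI center."""
    cx, cy = center
    n = sum(1 for x, y in points if r_in < hypot(x - cx, y - cy) <= r_out)
    return n / (pi * (r_out ** 2 - r_in ** 2))


def grow_radius(points, center, start, step, min_density, max_radius):
    """Enlarge the radius by `step` while each newly covered ring stays dense enough."""
    if step <= 0:
        raise ValueError("step must be positive")
    r = start
    while r + step <= max_radius and ring_density(points, center, r, r + step) >= min_density:
        r += step
    return r


def estimate_radius(points, center, coarse, fine, min_density, max_radius):
    r = grow_radius(points, center, 0.0, coarse, min_density, max_radius)  # phase 1: coarse steps
    return grow_radius(points, center, r, fine, min_density, max_radius)  # phase 2: fine steps


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def poi_boundary(points, center, coarse=50.0, fine=10.0, min_density=1e-4, max_radius=2000.0):
    """Convex hull of the geo-tags lying within the estimated radius."""
    r = estimate_radius(points, center, coarse, fine, min_density, max_radius)
    cx, cy = center
    inside = [(x, y) for x, y in points if hypot(x - cx, y - cy) <= r]
    return convex_hull(inside)
```

In this sketch each step only counts the points in one thin ring instead of running a full clustering pass, and only a single cluster (the one around the POI center) is ever grown.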

The syllable recovery rule-based system and the application of a morphological analysis method for the post-processing of a continuous speech recognition (연속음성인식 후처리를 위한 음절 복원 rule-based 시스템과 형태소분석기법의 적용)

  • 박미성;김미진;김계성;최재혁;이상조
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.3 / pp.47-56 / 1999
  • Various phonological alterations occur when Korean is spoken continuously, and these alterations are one of the major reasons Korean speech recognition is difficult. This paper presents a rule-based system that converts a speech-recognition character string into a text-based character string. The recovery results are morphologically analyzed so that only a correct text string is generated. Recovery is performed according to four kinds of rules: a syllable-boundary final-consonant/initial-consonant recovery rule, a vowel-process recovery rule, a last-syllable final-consonant recovery rule, and a monosyllable-process rule. We use x-clustering information for efficient recovery and postfix-syllable frequency information to restrict the recovery candidates passed to the morphological analyzer. Because the system is rule-based, it does not require a large pronunciation dictionary or a phoneme dictionary, and it has the advantage that an existing text-based morphological analyzer can be used.

A Fast Way for Alignment Marker Detection and Position Calibration (Alignment Marker 고속 인식 및 위치 보정 방법)

  • Moon, Chang Bae;Kim, HyunSoo;Kim, HyunYong;Lee, Dongwon;Kim, Tae-Hoon;Chung, Hae;Kim, Byeong Man
    • KIPS Transactions on Software and Data Engineering / v.5 no.1 / pp.35-42 / 2016
  • Marker alignment is the core machine vision technology frequently used at the pre- and post-production stages. This paper proposes a method that detects the angle and position of a product at high speed, using a unique pattern in the marker stamped on the product, and then calibrates them. To determine the angle and position of a marker, the proposed method extracts marker candidates using a variation of the integral histogram and then applies clustering to reduce the number of candidates. The experimental results showed an improvement of about 5 s 719 ms in processing time and better precision in detecting the rotation angle of a product.
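
An integral histogram stores, for every pixel position and intensity bin, the count of pixels above and to the left, so the histogram of any rectangular window can be read with four lookups per bin. The paper uses an unspecified variation of this structure; the Python sketch below shows only the plain version, with an L1 histogram match for candidate extraction and a simple greedy grouping standing in for the clustering step. The function names, the bin layout, and the `max_dist`/`radius` parameters are assumptions, and angle estimation is not covered.

```python
def integral_histogram(image, bins, max_value=256):
    """H[k][y][x] = number of pixels in image[0:y][0:x] whose intensity falls in bin k."""
    h, w = len(image), len(image[0])
    H = [[[0] * (w + 1) for _ in range(h + 1)] for _ in range(bins)]
    for y in range(h):
        for x in range(w):
            b = min(image[y][x] * bins // max_value, bins - 1)
            for k in range(bins):
                H[k][y + 1][x + 1] = (
                    H[k][y][x + 1] + H[k][y + 1][x] - H[k][y][x] + (1 if k == b else 0)
                )
    return H


def region_histogram(H, x0, y0, x1, y1):
    """Histogram of the window [x0, x1) x [y0, y1) using four lookups per bin."""
    return [Hk[y1][x1] - Hk[y0][x1] - Hk[y1][x0] + Hk[y0][x0] for Hk in H]


def find_candidates(H, template_hist, win_w, win_h, max_dist):
    """Top-left corners of windows whose histogram is within max_dist (L1) of the template."""
    img_h, img_w = len(H[0]) - 1, len(H[0][0]) - 1
    found = []
    for y in range(img_h - win_h + 1):
        for x in range(img_w - win_w + 1):
            hist = region_histogram(H, x, y, x + win_w, y + win_h)
            if sum(abs(a - b) for a, b in zip(hist, template_hist)) <= max_dist:
                found.append((x, y))
    return found


def cluster_candidates(candidates, radius):
    """Greedy grouping: a candidate joins the first cluster whose seed lies within
    `radius` on both axes; each cluster is reduced to its members' mean position."""
    clusters = []
    for x, y in candidates:
        for c in clusters:
            sx, sy = c[0]
            if abs(x - sx) <= radius and abs(y - sy) <= radius:
                c.append((x, y))
                break
        else:
            clusters.append([(x, y)])
    return [
        (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) for c in clusters
    ]
```

Overlapping windows around a true marker all pass the histogram test, so grouping them collapses several near-identical candidates into one position before any costlier verification.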

Fast k-NN based Malware Analysis in a Massive Malware Environment

  • Hwang, Jun-ho;Kwak, Jin;Lee, Tae-jin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.12 / pp.6145-6158 / 2019
  • Responding to the large number of malicious codes distributed indiscriminately, as well as to intelligent APT attacks, is a challenge for the current security industry. As a result, studies using machine learning algorithms are being conducted as proactive prevention rather than post-processing. The k-NN algorithm is widely used because it is intuitive and suitable for handling malicious code as unstructured data. In the malicious code analysis domain in particular, k-NN makes it easy to classify malicious codes based on previously analyzed ones; for example, malicious code families can be classified, or variants analyzed, through similarity analysis against existing malicious codes. However, the main disadvantage of the k-NN algorithm is that search time increases as the training data grows. We propose a fast k-NN algorithm that mitigates this computation-speed problem while retaining the advantages of k-NN. In the test environment, the algorithm required on average only 19.71 similarity comparisons for 6.25 million malicious codes. Given how it works, the Fast k-NN algorithm can be used to search not only malware and SSDEEP hashes but any data that can be vectorized. In the future, where a k-NN approach is needed and central nodes can be selected effectively for clustering large amounts of data in various environments, it is expected that sophisticated machine-learning-based systems can be designed.
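
The abstract does not describe how central nodes are chosen or how SSDEEP similarity is vectorized. The Python sketch below only illustrates the general idea of center-based pruning under those gaps: items are bucketed under their nearest center, and a query compares against the centers first and then scans only the closest bucket(s). It uses Euclidean distance on numeric vectors as a stand-in for malware similarity, returns approximate neighbors (true neighbors in unprobed buckets can be missed), and requires Python 3.8+ for `math.dist`. `CenterIndex` and `probe` are hypothetical names.

```python
import heapq
from math import dist


class CenterIndex:
    """Hypothetical center-based index: each item is stored under its nearest center."""

    def __init__(self, items, centers):
        self.centers = list(centers)
        self.buckets = [[] for _ in self.centers]
        for item in items:
            nearest = min(range(len(self.centers)), key=lambda i: dist(item, self.centers[i]))
            self.buckets[nearest].append(item)

    def query(self, q, k, probe=1):
        """Approximate k-NN: scan only the `probe` buckets whose centers are closest to q.

        Returns (neighbors, number of distance computations performed).
        """
        order = sorted(range(len(self.centers)), key=lambda i: dist(q, self.centers[i]))
        candidates = [x for i in order[:probe] for x in self.buckets[i]]
        comparisons = len(self.centers) + len(candidates)
        return heapq.nsmallest(k, candidates, key=lambda x: dist(q, x)), comparisons
```

The returned comparison count makes the trade-off visible: the cost per query is the number of centers plus the size of the probed buckets, rather than the size of the whole collection.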

Hair Classification and Region Segmentation by Location Distribution and Graph Cutting (위치 분포 및 그래프 절단에 의한 모발 분류와 영역 분할)

  • Kim, Yong-Gil;Moon, Kyung-Il
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.3 / pp.1-8 / 2022
  • Recently, Google MediaPipe presented a novel neural network-based approach to hair segmentation from a single camera input, designed specifically for real-time mobile applications. Although the network is relatively small, it produces a high-quality hair segmentation mask that is well suited to AR effects such as realistic hair recoloring. However, it produces undesirable segmentation results for some hair styles or when the input contains noise and holes. In this study, an energy function for the test image is constructed from the estimated prior distribution of hair location and a hair color likelihood function. This function is then optimized with the graph cuts algorithm to obtain an initial hair region. Finally, a clustering algorithm and image post-processing techniques are applied to the initial hair region so that the final hair region can be segmented precisely. The proposed method is applied to the MediaPipe hair segmentation pipeline.
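
The abstract does not give the energy function itself. Graph-cut segmentation typically minimizes an energy of the following form; assuming the data term combines the location prior and the color likelihood as negative log-probabilities, it could be written as below. The symbols are illustrative, not taken from the paper: $\mathcal{P}$ is the set of pixels, $\mathcal{N}$ the set of neighboring pixel pairs, $I_p$ the color at pixel $p$, and $\lambda$, $\sigma$ are weighting parameters.

```latex
E(L) = \sum_{p \in \mathcal{P}} D_p(l_p) + \lambda \sum_{(p,q) \in \mathcal{N}} V_{pq}(l_p, l_q),
\qquad l_p \in \{\text{hair}, \text{background}\}

D_p(l) = -\log P_{\mathrm{loc}}(l \mid p) - \log P_{\mathrm{col}}(I_p \mid l)

V_{pq}(l_p, l_q) = [\, l_p \neq l_q \,] \, \exp\!\left( -\frac{\lVert I_p - I_q \rVert^2}{2\sigma^2} \right)
```

Under this form, the data term pulls each pixel toward the label favored by both the location prior and the color model, while the smoothness term penalizes label changes between similar-colored neighbors; a minimum s-t cut yields the exact minimum for this two-label case.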

A Study on the Reading Program Improvement Plan of a Public Library Based on the Reading Culture Promotion Policy (독서문화진흥 정책에 기반한 공공도서관의 독서 프로그램 개선 방안 연구)

  • Miah Cho;Seung-Jin Kwak
    • Journal of the Korean Society for Library and Information Science / v.57 no.3 / pp.191-210 / 2023
  • The purpose of this study is to draw implications from domestic and international best-practice case studies of library programs and to suggest ways to improve a public library's reading programs through an analysis based on the 3rd Reading Culture Promotion Basic Plan, in line with the changing role of future libraries. To this end, prior studies were first analyzed from various angles to derive clustering criteria for library programs. Based on these criteria, the programs of various domestic and foreign libraries were analyzed. Then, based on the clustering criteria and the 13 key tasks under the 4 strategies of the 3rd Reading Culture Promotion Basic Plan, the status of a specific public library's reading programs was analyzed. From this, a development plan was presented that expands participatory reading promotion programs in consideration of user demand in the era of the 4th Industrial Revolution and, in response to the post-COVID-19 era, considers non-face-to-face (contactless) library services in addition to face-to-face services. It is expected that this analysis and application will ultimately go beyond a single library and help improve public library services in Korea into programs closely related to users' lives.