• Title/Summary/Keyword: Retrieval Algorithm

Development of an Image Tagging System Based on Crowdsourcing (크라우드소싱 기반 이미지 태깅 시스템 구축 연구)

  • Lee, Hyeyoung;Chang, Yunkeum
    • Journal of the Korean BIBLIA Society for library and Information Science / v.29 no.3 / pp.297-320 / 2018
  • This study aims to improve access to and retrieval of images and to find an effective way to generate tags that describe them. To this end, it investigated the features of human tagging and machine tagging and compared and analyzed them. Machine tags consisted mostly of general attributes, with some specific attributes and visual elements and few abstract attributes. Human tags were also dominated by general attributes, but specific attributes were frequent for objects and scenes whose names the human taggers could recognize. In addition, sentiments and emotions, as well as subjects such as abstract concepts, events, places, times, and relationships, were represented by various tags. The tag set generated in this study can serve as basic data for constructing training data sets to improve machine learning algorithms.

Collection and Extraction Algorithm of Field-Associated Terms (분야연상어의 수집과 추출 알고리즘)

  • Lee, Sang-Kon;Lee, Wan-Kwon
    • The KIPS Transactions:PartB / v.10B no.3 / pp.347-358 / 2003
  • A field-associated term is a single or compound word whose terms can occur in any document and which makes it possible to recognize the field of a text using common human knowledge. For example, a reader recognizes the field of a document, that is, the field name of the text, on encountering a word such as 'pitcher' or 'election'. We propose an efficient method for constructing field-associated terms (FTs) for specialized fields in order to determine the field of a text. The document classification scheme can be fixed from a well-classified document database or corpus. Considering the focus field, we discuss the levels and stability ranks of field-associated terms. To construct a balanced FT collection, we construct single FTs. From the collections, the levels and stability ranks of FTs can be constructed automatically. We propose a new algorithm for extracting FTs for document classification that uses each FT's concentration rate and occurrence frequency.

An Efficient Computation Method of Zernike Moments Using Symmetric Properties of the Basis Function (기저 함수의 대칭성을 이용한 저니키 모멘트의 효율적인 계산 방법)

  • Hwang, Sun-Kyoo;Kim, Whoi-Yul
    • Journal of KIISE:Software and Applications / v.31 no.5 / pp.563-569 / 2004
  • Sets of Zernike moments have been used successfully in object recognition and content-based image retrieval systems. Real-time applications of Zernike moments, however, have been limited by their complicated definition. Conventional methods for computing Zernike moments quickly have focused mainly on the radial components of the moments. In this paper, we propose a fast and efficient method for computing Zernike moments that exploits the symmetric and anti-symmetric properties of the Zernike basis functions. The proposed method reduces the number of operations to one quarter of that of conventional methods, and the computation time to generate the Zernike basis functions is reduced to about 20% of that of conventional methods. The amount of memory required for efficient computation of the moments is also reduced to a quarter. We also show that the algorithm can be extended in the same manner to similar classes of rotational moments, such as pseudo-Zernike moments and ART descriptors.

Content-based Shot Boundary Detection from MPEG Data using Region Flow and Color Information (영역 흐름 및 칼라 정보를 이용한 MPEG 데이타의 내용 기반 셧 경계 검출)

  • Kang, Hang-Bong
    • Journal of KIISE:Software and Applications / v.27 no.4 / pp.402-411 / 2000
  • Detecting shot boundaries in video data is an important step in video indexing and retrieval. Some approaches have been proposed that detect shot changes by computing color histogram differences or the variances of DCT coefficients. However, these approaches do not consider the content or meaningful features of the image data, which are useful in high-level video processing. In particular, it is desirable to detect these features from compressed video data because this requires less processing overhead. In this paper, we propose a new method for detecting shot boundaries from MPEG data using region flow and color information. First, we reconstruct DC images and compute region flow information and color histogram differences from HSV-quantized images. Then, we find the points at which the region flow has discontinuities or the color histogram differences are high. Finally, we classify those points as shot boundaries according to our proposed algorithm.
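
The color histogram part of the approach described above can be illustrated with a minimal Python sketch. It covers only the histogram-difference cue, not region flow or DC-image reconstruction, and it is not the authors' algorithm. The function names, the 8x4x4 HSV quantization, and the 0.5 threshold are all assumptions chosen for illustration.

```python
def hsv_histogram(pixels, bins=(8, 4, 4)):
    """Normalized histogram of HSV pixels; h, s, v are each assumed to lie in [0, 1]."""
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    for h, s, v in pixels:
        # Quantize each channel and clamp the top edge into the last bin.
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1.0
    total = len(pixels) or 1
    return [count / total for count in hist]


def histogram_difference(h1, h2):
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def candidate_boundaries(frames, threshold=0.5):
    """Indices i where frame i differs strongly from frame i - 1 (hypothetical threshold)."""
    hists = [hsv_histogram(frame) for frame in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_difference(hists[i - 1], hists[i]) > threshold
    ]
```

Each frame here is a plain list of `(h, s, v)` tuples; in a real pipeline these would come from the reconstructed DC images, and the candidates would be combined with the region flow cue before a final decision.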

A Scheduling Algorithm for Parsing of MPEG Video on the Heterogeneous Distributed Environment (이질적인 분산 환경에서의 MPEG비디오의 파싱을 위한 스케줄링 알고리즘)

  • Nam Yunyoung;Hwang Eenjun
    • Journal of KIISE:Computer Systems and Theory / v.31 no.12 / pp.673-681 / 2004
  • As digital video becomes more popular, there is an increasing demand for efficient browsing and retrieval of video. To support such operations, effective video indexing must be incorporated. One of the most fundamental steps in video indexing is to parse a video stream into shots and scenes. In a traditional single-computer environment, parsing a video generally takes a long time because of the huge amount of computation involved. Previous studies have widely used Round Robin scheduling, which allocates tasks to each slave for a time interval of one quantum. This scheduling is difficult to adapt to a heterogeneous environment. In this paper, we propose two parallel parsing algorithms for heterogeneous distributed computing environments: Size-Adaptive Round Robin and Dynamic Size-Adaptive Round Robin. To demonstrate their performance, we perform several experiments and present some of the results.
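
The abstract does not define how Size-Adaptive Round Robin sizes its tasks, so the following Python sketch shows only one plausible reading: slaves are visited in round-robin order, and each receives a chunk of frames proportional to its relative speed. The function name, the `speeds` list, and `base_chunk` are hypothetical and are not taken from the paper.

```python
def size_adaptive_round_robin(total_frames, speeds, base_chunk=30):
    """Assign frame ranges to slaves in round-robin order.

    Assumption: speeds[i] is a positive relative throughput for slave i, and the
    slowest slave gets base_chunk frames per turn while faster ones get more.
    Returns a list of (slave_index, start, end) with end exclusive.
    """
    slowest = min(speeds)
    sizes = [max(1, round(base_chunk * s / slowest)) for s in speeds]

    assignments = []
    start = 0
    slave = 0
    while start < total_frames:
        end = min(start + sizes[slave], total_frames)
        assignments.append((slave, start, end))
        start = end
        slave = (slave + 1) % len(speeds)
    return assignments
```

A dynamic variant would presumably re-measure each slave's throughput after every completed chunk and recompute `sizes`, but that behavior is likewise an assumption here.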

Developing of Text Plagiarism Detection Model using Korean Corpus Data (한글 말뭉치를 이용한 한글 표절 탐색 모델 개발)

  • Ryu, Chang-Keon;Kim, Hyong-Jun;Cho, Hwan-Gue
    • Journal of KIISE:Computing Practices and Letters / v.14 no.2 / pp.231-235 / 2008
  • Recently, several plagiarism scandals involving academic papers and novels have come to light, and plagiarism of documents is becoming more frequent and more serious. Although plagiarism in English text has been studied for a long time, systematic and complete studies of plagiarism in Korean documents are hard to find. Because the linguistic features of Korean are quite different from those of English, English-based methods cannot be applied directly to Korean documents. In this paper, we propose a new plagiarism detection method for Korean and thoroughly test our algorithm on a benchmark Korean text corpus. The proposed method is based on "k-mer" and "local alignment" techniques, which locate the plagiarized regions of document pairs quickly and accurately. Using a Korean corpus containing more than 10 million words, we establish a probability model for the local alignment score (random similarity by chance). The experiments show that our system detects plagiarized documents quite successfully.
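
As a rough illustration of the two building blocks named above, the Python sketch below computes a k-mer overlap between two strings and a Smith-Waterman local alignment score. It is a generic textbook version, not the authors' system: treating each character (one Hangul syllable per character in Python strings) as the alignment unit, the scoring values, and the function names are all assumptions.

```python
def kmers(text, k=3):
    """Set of all length-k substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def kmer_similarity(a, b, k=3):
    """Jaccard overlap of the k-mer sets; a cheap filter before alignment."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score, computed row by row."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best
```

A score would then be judged against a model of scores that arise by chance, such as the probability model the abstract describes; that model is not reproduced here.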

An Indexing System for Retrieving Similar Paths in XML Documents (XML 문서의 유사 경로 검색을 위한 인덱싱 시스템)

  • Lee, Bum-Suk;Hwang, Byung-Yeon
    • The KIPS Transactions:PartD / v.15D no.2 / pp.171-178 / 2008
  • Since the W3C introduced the XML standard in 1998, the number of documents written in XML has gradually increased. Accordingly, several systems have been developed to manage and retrieve massive XML document collections efficiently. BitCube, a bitmap indexing system, is a representative system in this field of research. Building on the bitmap indexing technique, the path bitmap indexing system (LH06), which clusters similar paths, addressed a problem that the existing BitCube system could not solve, namely, determining similar paths. The path bitmap indexing system has the advantage of a higher retrieval speed not only in exact path matching but also in similar path searching. However, the similarity calculation algorithm of this system has a few particular problems. It sometimes cannot calculate the similarity even when two paths are extremely similar, and it increases the number of meaningless clusters. In this paper, we propose a novel method for calculating the similarity between paths for clustering in order to solve these problems. The proposed system yields stable clustering results and obtains a high clustering precision score in a performance evaluation against LH06.

Latent Semantic Indexing Analysis of K-Means Document Clustering for Changing Index Terms Weighting (색인어 가중치 부여 방법에 따른 K-Means 문서 클러스터링의 LSI 분석)

  • Oh, Hyung-Jin;Go, Ji-Hyun;An, Dong-Un;Park, Soon-Chul
    • The KIPS Transactions:PartB / v.10B no.7 / pp.735-742 / 2003
  • In an information retrieval system, document clustering provides user convenience and visual effects by rearranging the retrieved documents according to specific topics. In this paper, we cluster documents using the K-Means algorithm and present the effect of the index term weighting scheme on document clustering. To verify the experiment, we apply the Latent Semantic Indexing approach to illustrate the clustering results and analyze them in 2-dimensional space. The experimental results show that when local weighting, global weighting, and a normalization factor are all applied, the density of the clusters in 2-dimensional space is higher than with similar or identical weighting schemes. In particular, logarithmic local and global weighting gives notable results.
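
To make the combination of local weighting, global weighting, and normalization concrete, here is a minimal Python sketch of one common scheme: logarithmic term frequency, logarithmic inverse document frequency, and cosine normalization. The abstract does not give the exact formulas the authors compared, so this particular choice and the function name are assumptions.

```python
import math
from collections import Counter


def weight_documents(docs):
    """Weight tokenized documents; returns one {term: weight} dict per document.

    Local weight:  1 + ln(tf)   (logarithmic term frequency)
    Global weight: ln(N / df)   (logarithmic inverse document frequency)
    Normalization: each vector is scaled to unit Euclidean length.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (1.0 + math.log(c)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors
```

The resulting vectors are the kind of input that K-Means would cluster. A term that occurs in every document gets weight 0 under this scheme, which is why the choice of global weighting can change cluster density.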

A Cell-based Clustering Method for Large High-dimensional Data in Data Mining (데이타마이닝에서 고차원 대용량 데이타를 위한 셀-기반 클러스터 링 방법)

  • Jin, Du-Seok;Chang, Jae-Woo
    • Journal of KIISE:Databases / v.28 no.4 / pp.558-567 / 2001
  • Recently, data mining applications have come to require large amounts of high-dimensional data. Most algorithms for data mining applications, however, do not work efficiently on large high-dimensional data because of the so-called curse of dimensionality[1] and the limits of available memory. To overcome these problems, this paper proposes a new cell-based clustering method that is more efficient than existing algorithms for large high-dimensional data. Our clustering method provides a cell construction algorithm for handling large high-dimensional data and an index structure based on filtering. We compare the performance of our cell-based clustering method with that of the CLIQUE method in terms of clustering time, precision, and retrieval time. The experimental results show that our cell-based clustering method outperforms the CLIQUE method.

The Characteristics of Visible Reflectance and Infra Red Band over Snow Cover Area (적설역에서 나타나는 적외 휘도온도와 반사도 특성)

  • Yeom, Jong-Min;Han, Kyung-Soo;Lee, Ga-Lam
    • Korean Journal of Remote Sensing / v.25 no.2 / pp.193-203 / 2009
  • Snow cover is an important parameter because it determines the surface energy balance and its variation. Distinguishing snow from cloud in satellite data is a very important step when inferring land surface information. In general, misclassified cloud and snow pixels lead directly to errors in the retrieval of surface products from satellite data. In this study, we therefore apply an algorithm for detecting snow-covered areas from remote sensing data. We use only visible reflectance and infrared channels rather than the NDSI (Normalized Difference Snow Index), which is one of the optimized methods for detecting snow cover, because the COMS MI (Meteorological Imager) channels do not include the near-infrared channel used to produce the NDSI. Snow detection with the visible channel performs well over clear-sky areas, but it is difficult to discriminate snow cover from mixed cloudy pixels. To improve detection, the brightness temperature difference (BTD) between the 11 μm and 3.7 μm channels is used. The BTD method gives better results than using the visible channel alone.
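
The decision logic described above can be sketched as a simple per-pixel test in Python. This is an illustration only: the abstract gives neither thresholds nor the sign convention of the BTD, so the convention used here (3.7 μm minus 11 μm brightness temperature), the threshold values, and the function name are all assumptions, not results from the paper.

```python
def classify_pixel(vis_reflectance, tb_11, tb_37, vis_min=0.4, btd_max=10.0):
    """Label a daytime pixel as 'snow', 'cloud', or 'clear'.

    vis_reflectance: visible reflectance in [0, 1]
    tb_11, tb_37:    brightness temperatures in kelvin at 11 and 3.7 micrometers
    vis_min, btd_max: hypothetical thresholds chosen for illustration only
    """
    if vis_reflectance < vis_min:
        # Dark in the visible channel: neither snow nor thick cloud.
        return "clear"
    # Assumed convention: BTD = TB(3.7) - TB(11).
    btd = tb_37 - tb_11
    # Bright pixels with a small BTD are treated as snow, a large BTD as cloud.
    return "snow" if btd <= btd_max else "cloud"
```

The visible test alone cannot separate the two bright classes, which is the gap the BTD test is meant to close; real thresholds would have to be tuned against labeled scenes.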