• Title/Summary/Keyword: Statistics Matching

Feature-Based Image Retrieval using SOM-Based R*-Tree

  • Shin, Min-Hwa;Kwon, Chang-Hee;Bae, Sang-Hyun
    • Proceedings of the KAIS Fall Conference / 2003.11a / pp.223-230 / 2003
  • Feature-based similarity retrieval has become an important research issue in multimedia database systems. The features of multimedia data are useful for discriminating between multimedia objects (e.g., documents, images, video, music scores, etc.). For example, images are represented by their color histograms, texture vectors, and shape descriptors, and are usually high-dimensional data. The performance of conventional multidimensional data structures (e.g., R-tree family, K-D-B tree, grid file, TV-tree) tends to deteriorate as the number of dimensions of feature vectors increases. The R*-tree is the most successful variant of the R-tree. In this paper, we propose a SOM-based R*-tree as a new indexing method for high-dimensional feature vectors. The SOM-based R*-tree combines SOM and R*-tree to achieve search performance that scales better to high dimensionalities. Self-Organizing Maps (SOMs) provide a mapping from high-dimensional feature vectors onto a two-dimensional space. The mapping preserves the topology of the feature vectors. The map is called a topological feature map, and preserves the mutual relationship (similarity) in the feature spaces of input data, clustering mutually similar feature vectors in neighboring nodes. Each node of the topological feature map holds a codebook vector. A best-matching-image-list (BMIL) holds similar images that are closest to each codebook vector. In a topological feature map, there are empty nodes in which no image is classified. When we build an R*-tree, we use the codebook vectors of the topological feature map, eliminating the empty nodes that cause unnecessary disk access and degrade retrieval performance. We experimentally compare the retrieval time cost of a SOM-based R*-tree with that of a SOM and an R*-tree using color feature vectors extracted from 40,000 images. The results show that the SOM-based R*-tree outperforms both the SOM and the R*-tree due to the reduction in the number of nodes required to build the R*-tree and in retrieval time cost.
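
The BMIL construction and the removal of empty nodes described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the codebook vectors are already available from a trained SOM, uses Euclidean distance, and the names `build_bmil` and `non_empty_nodes` are hypothetical.

```python
from math import dist

def build_bmil(codebook, features):
    """Assign each image to the node whose codebook vector is closest (assumed Euclidean)."""
    bmil = {node: [] for node in range(len(codebook))}
    for image_id, vec in features.items():
        best = min(range(len(codebook)), key=lambda n: dist(codebook[n], vec))
        bmil[best].append(image_id)
    return bmil

def non_empty_nodes(codebook, bmil):
    """Keep only the codebook vectors whose image list is non-empty.

    Only these vectors would be handed to the R*-tree builder,
    so empty map nodes never cause an index lookup.
    """
    return {n: codebook[n] for n, images in bmil.items() if images}

# Toy data: three map nodes, three images; node 2 attracts no image.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = {"img_a": (0.1, 0.2), "img_b": (0.9, 1.1), "img_c": (0.0, 0.1)}

bmil = build_bmil(codebook, features)
indexed = non_empty_nodes(codebook, bmil)
```

In this toy case node 2 is dropped before indexing, which mirrors the reduction in node count that the abstract credits for the improved retrieval time.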

Spatial Gap-Filling of Hourly AOD Data from Himawari-8 Satellite Using DCT (Discrete Cosine Transform) and FMM (Fast Marching Method)

  • Youn, Youjeong;Kim, Seoyeon;Jeong, Yemin;Cho, Subin;Kang, Jonggu;Kim, Geunah;Lee, Yangwon
    • Korean Journal of Remote Sensing / v.37 no.4 / pp.777-788 / 2021
  • Since aerosol has a relatively short duration and significant spatial variation, satellite observations become more important for the spatially and temporally continuous quantification of aerosol. However, optical remote sensing has the disadvantage that it cannot detect AOD (Aerosol Optical Depth) for regions covered by clouds or regions with extremely high concentrations. Such missing values can increase the data uncertainty in analyses of the Earth's environment. This paper presents a spatial gap-filling framework using univariate statistical methods, namely DCT-PLS (Discrete Cosine Transform-based Penalized Least Square Regression) and FMM (Fast Marching Method) inpainting. We conducted a feasibility test for the hourly AOD product from AHI (Advanced Himawari Imager) between January 1 and December 31, 2019, and compared the accuracy statistics of the two spatial gap-filling methods. When the null-pixel area is not very large (null-pixel ratio < 0.6), the validation statistics of the DCT-PLS and FMM techniques showed high accuracy of CC=0.988 (MAE=0.020) and CC=0.980 (MAE=0.028), respectively. Together with the AI-based gap-filling method using extra explanatory variables, the DCT-PLS and FMM techniques can soon be tested for the low-resolution images from the AMI (Advanced Meteorological Imager) of GK2A (Geostationary Korea Multi-purpose Satellite 2A), GEMS (Geostationary Environment Monitoring Spectrometer) and GOCI2 (Geostationary Ocean Color Imager) of GK2B (Geostationary Korea Multi-purpose Satellite 2B) and the high-resolution images from the CAS500 (Compact Advanced Satellite) series.
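
The two accuracy statistics quoted in this abstract can be computed as in the sketch below. It assumes that CC denotes the Pearson correlation coefficient and that validation compares gap-filled values against reference values at pixels that were masked for testing; the abstract does not spell out either point, and the function names are illustrative.

```python
from math import sqrt

def mae(filled, reference):
    """Mean absolute error between gap-filled and reference values."""
    return sum(abs(f - r) for f, r in zip(filled, reference)) / len(reference)

def pearson_cc(filled, reference):
    """Pearson correlation coefficient (assumed meaning of CC)."""
    n = len(reference)
    mean_f = sum(filled) / n
    mean_r = sum(reference) / n
    cov = sum((f - mean_f) * (r - mean_r) for f, r in zip(filled, reference))
    var_f = sum((f - mean_f) ** 2 for f in filled)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / sqrt(var_f * var_r)

# Made-up values at three held-out pixels, for illustration only.
filled = [0.1, 0.2, 0.3]
reference = [0.1, 0.2, 0.4]
print(mae(filled, reference), pearson_cc(filled, reference))
```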

Overview of estimating the average treatment effect using dimension reduction methods (차원축소 방법을 이용한 평균처리효과 추정에 대한 개요)

  • Mijeong Kim
    • The Korean Journal of Applied Statistics / v.36 no.4 / pp.323-335 / 2023
  • In causal analysis of high-dimensional data, it is important to reduce the dimension of covariates and transform them appropriately to control confounders that affect treatment and potential outcomes. The augmented inverse probability weighting (AIPW) method is mainly used for estimation of the average treatment effect (ATE). The AIPW estimator can be obtained by using an estimated propensity score and outcome model. The ATE estimator can be inconsistent or have large asymptotic variance when the estimated propensity score and outcome model are obtained by parametric methods that include all covariates, especially for high-dimensional data. For this reason, ATE estimation using an appropriate dimension reduction method and a semiparametric model for high-dimensional data is attracting attention. A semiparametric method or a sparse sufficient dimension reduction method can be used for dimension reduction in the estimation of the propensity score and outcome model. Recently, another method has been proposed that does not use the propensity score or outcome regression: after reducing the dimension of covariates, ATE estimation can be performed using matching. Among the studies on ATE estimation methods for high-dimensional data, four recently proposed studies will be introduced, and how to interpret the estimated ATE will be discussed.
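
For orientation, the standard AIPW estimator combines the two fitted models as in the sketch below. It takes the estimated propensity scores and outcome-model predictions as given inputs; how they are estimated (including any dimension reduction step) is outside this sketch, and the function and argument names are assumptions rather than anything taken from the paper.

```python
def aipw_ate(y, t, e, m1, m0):
    """Standard AIPW estimate of the average treatment effect.

    y  : observed outcomes
    t  : treatment indicators (1 = treated, 0 = control)
    e  : estimated propensity scores P(T=1 | X), strictly between 0 and 1
    m1 : outcome-model predictions under treatment
    m0 : outcome-model predictions under control
    """
    total = 0.0
    for yi, ti, ei, m1i, m0i in zip(y, t, e, m1, m0):
        total += (
            m1i - m0i                              # outcome-regression part
            + ti * (yi - m1i) / ei                 # weighted residual, treated
            - (1 - ti) * (yi - m0i) / (1 - ei)     # weighted residual, control
        )
    return total / len(y)

# Toy inputs in which the outcome model fits perfectly, so the residual terms vanish.
y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]
m1 = [3.0, 3.0, 4.0, 4.0]
m0 = [1.0, 1.0, 2.0, 2.0]
ate = aipw_ate(y, t, e, m1, m0)
```

With a perfect outcome model the estimate reduces to the mean of `m1 - m0`; when the outcome model is off, the inverse-probability-weighted residuals correct it.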

A Study on the Data Fusion Method using Decision Rule for Data Enrichment (의사결정 규칙을 이용한 데이터 통합에 관한 연구)

  • Kim S.Y.;Chung S.S.
    • The Korean Journal of Applied Statistics / v.19 no.2 / pp.291-303 / 2006
  • Data mining is the work of extracting information from existing data files, so one of the most important factors in the data mining process is the quality of the data used. In this thesis, we propose a data fusion technique using a decision rule for data enrichment, which is one phase for improving data quality in the KDD process. Simulations were performed to compare the proposed data fusion technique with existing techniques. The results show that our data fusion technique using a decision rule is characterized by a low MSE or misclassification rate in the fusion variables.

Factors Influencing Somatization in Adolescents (청소년의 신체화 증상에 영향을 미치는 요인)

  • Lee, Han-Ju;Seo, Mi-A
    • Journal of the Korean Society of School Health / v.23 no.1 / pp.79-87 / 2010
  • Purpose: The purpose of this study was to explore the relationship between depression, alexithymia, social support, and somatic symptoms in adolescents. Methods: The subjects were 1,519 adolescents in Seoul. Radloff's CES-D (The Center for Epidemiological Studies-Depression scale) for depression, Bagby, Parker and Taylor's TAS (Toronto Alexithymia Scale) for alexithymia, Park's social support scale, and Derogatis's SCL-90 (Brief Symptom Inventory & Matching Clinical Rating Scale) were used. The data were analyzed using descriptive statistics, Pearson's correlation coefficients, t or F tests, and stepwise multiple regression. Results: Depression and somatic symptoms were lower, but social support was higher, when compared to the mean score. Somatic symptoms were significantly positively correlated with age, depression, and alexithymia, but were not correlated with social support. Stepwise multiple regression analysis showed that 21.8% of the variance in somatic symptoms was significantly accounted for by depression, alexithymia, social support, gender, economic status, living alone, and living with a parent. Conclusion: These results suggest that depression, alexithymia, and living alone can be potential risk factors for somatic symptoms in adolescents. Therefore, these findings will provide useful information for developing a promotion program focused on social support for adolescents.

Application of Market Basket Analysis to Personalized advertisements on Internet Storefront (인터넷 상점에서 개인화 광고를 위한 장바구니 분석 기법의 활용)

  • 김종우;이경미
    • Korean Management Science Review / v.17 no.3 / pp.19-30 / 2000
  • Customization and personalization services are considered a critical success factor for an Internet store or web service provider. As a representative personalization technique, personalized recommendation techniques have been studied and commercialized to suggest products or services to a customer of an Internet storefront based on the customer's demographics or on an analysis of the customer's past purchasing behavior. The underlying theories of recommendation techniques are statistics, data mining, artificial intelligence, and/or rule-based matching. In the rule-based approach to personalized recommendation, marketing rules for personalization are usually collected from marketing experts and are used for inference over customer data. However, it is difficult to extract marketing rules from marketing experts, and also difficult to validate and maintain the constructed knowledge base. In this paper, we propose a marketing rule extraction technique for personalized recommendation on Internet storefronts using market basket analysis, a well-known data mining technique. Using market basket analysis, marketing rules for cross-sales are extracted and used to select personalized advertisements when a customer visits an Internet store. An experiment was performed to evaluate the effectiveness of the proposed approach in comparison with a preference scoring approach and random selection.
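
Extracting cross-sales rules from purchase baskets can be sketched with the usual support and confidence measures. This is a generic illustration of market basket analysis restricted to single-item rules, not the procedure used in the paper; the thresholds, item names, and the function `pair_rules` are assumptions.

```python
from collections import Counter
from itertools import permutations

def pair_rules(baskets, min_support, min_confidence):
    """Return rules (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = set(basket)
        item_count.update(items)
        # Count each ordered pair once per basket, so (a, b) and (b, a) are separate rules.
        pair_count.update(permutations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n               # share of baskets containing both items
        confidence = count / item_count[a]  # share of a-baskets that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Highest confidence first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

baskets = [
    ["printer", "ink"],
    ["printer", "ink", "paper"],
    ["printer", "paper"],
    ["ink"],
]
rules = pair_rules(baskets, min_support=0.5, min_confidence=0.6)
```

A rule such as `paper -> printer` could then drive advertisement selection: when a visitor's history contains the antecedent, the consequent is a candidate item to advertise.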

Precise Detection of Car License Plates by Locating Main Characters

  • Lee, Dae-Ho;Choi, Jin-Hyuk
    • Journal of the Optical Society of Korea / v.14 no.4 / pp.376-382 / 2010
  • We propose a novel method to precisely detect car license plates by locating main characters, which are printed with large font size. The regions of the main characters are directly detected without detecting the plate region boundaries, so that license regions can be detected more precisely than by other existing methods. To generate a binary image, multiple thresholds are applied, and segmented regions are selected from multiple binarized images by a criterion of size and compactness. We do not employ any character matching methods, so that many candidates for main character groups are detected; thus, we use a neural network to reject non-main character groups from the candidates. The relation of the character regions and the intensity statistics are used as the input to the neural network for classification. The detection performance has been investigated on real images captured under various illumination conditions for 1000 vehicles. 980 plates were correctly detected, and almost all non-detected plates were so stained that their characters could not be isolated for character recognition. In addition, the processing time is fast enough for a commercial automatic license plate recognition system. Therefore, the proposed method can be used for recognition systems with high performance and fast processing.

A Study on the Data Fusion for Data Enrichment (데이터 보강을 위한 데이터 통합기법에 관한 연구)

  • 정성석;김순영;김현진
    • The Korean Journal of Applied Statistics / v.17 no.3 / pp.605-617 / 2004
  • One of the most important factors in the data mining process is the quality of the data used. When mining is performed on data of excellent quality, the potential value of data mining can be improved. In this paper, we propose a data fusion technique for data enrichment, which is one phase that can improve data quality in the KDD process. We add the k-NN technique to the regression technique to improve the performance of the fusion technique by reducing the loss of information. Simulations were performed to compare the proposed data fusion technique with the regression technique. The results show that the newly proposed data fusion technique is characterized by a low MSE in continuous fusion variables.
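
The k-NN side of such a fusion can be illustrated as below. The abstract does not say how k-NN is combined with regression, so this sketch shows only a plain k-NN fusion step under stated assumptions: donor records carry both the common variables and the fusion variable, recipient records carry only the common variables, distance is Euclidean, and `knn_fuse` is a hypothetical name.

```python
from math import dist

def knn_fuse(donors, recipients, k=3):
    """Fill a continuous fusion variable for each recipient from its k nearest donors.

    donors     : list of (common_variables, fusion_value) pairs
    recipients : list of common_variables tuples
    """
    fused = []
    for x in recipients:
        # Rank donors by distance on the common variables and keep the k closest.
        nearest = sorted(donors, key=lambda d: dist(d[0], x))[:k]
        fused.append(sum(value for _, value in nearest) / len(nearest))
    return fused

donors = [((0.0,), 1.0), ((1.0,), 2.0), ((10.0,), 20.0)]
recipients = [(0.5,)]
fused = knn_fuse(donors, recipients, k=2)
```

Performance in a simulation would then be judged as in the abstract, by the MSE between fused and true values of the fusion variable.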

An Ontology Matching Method based on ISO/IEC 11179 (ISO/IEC 11179 기반의 온톨로지 매칭 방법)

  • Lee, Ji-Yoon;Lee, Suk-Hoon;Kim, Jang-Won;Jeong, Dong-Won;Baik, Doo-Kwon
    • Proceedings of the Korean Information Science Society Conference / 2012.06c / pp.95-97 / 2012
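  • As various ontologies have been built and the number of systems applying them has grown, interoperability problems between systems have arisen. To address this, ontologies that can be regarded as common concepts were registered in a metadata registry, and interoperability between systems improved as systems based on the registry were developed. However, interoperability problems still exist between systems based on different metadata registries, so a need has emerged for a method of matching the ontologies registered in metadata registries. Existing ontology matching methods have the problem that they cannot provide accurate matching results when the ontology is small. Therefore, to match ontologies registered in a metadata registry, this paper extends the ontologies by reflecting the structural characteristics of the metadata registry. We then propose an ontology matching method that achieves accurate matching by matching ontologies using the extended ontologies. To show the advantages of the proposed method, we also perform a comparative evaluation against existing ontology matching methods. The proposed method ensures matching accuracy, increases efficiency, and improves interoperability between metadata registries.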

RECENT RESEARCH AND DEVELOPING TREND OF ENGINEERING MANAGEMENT IN CHINA BASED ON TEXT MINING

  • Shaohua Jiang;Wenling Zhang;Zhaohong Qiu;Shaojun Wang
    • International conference on construction engineering and project management / 2009.05a / pp.814-820 / 2009
  • With the rapid development of China's economy, many engineering projects of large scale and investment have been constructed in China, and some are the biggest in the world. With the development of engineering practice, great progress has been made in engineering management research in China, and a large number of research findings are embodied in the content of research papers and represented by technical words. To understand the state of the art in the research field of engineering management in China, three major parts of research papers, namely the title, abstract, and keywords, from the last five years of three representative Chinese journals on engineering management were chosen as research materials. Unlike Western languages, Chinese has no delimiters between words, so the maximum matching and frequency statistics (MMFS) method, a text segmentation technique for Chinese text mining, was presented to extract features consisting of technical words, phrases, and words from the research materials. Recent research and developing trends of engineering management in China were found by comparing and analyzing the differences in technical words in the research materials of the last five years.
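
The general idea behind dictionary-based maximum matching followed by frequency counting can be sketched as follows. This is a generic forward maximum matching segmenter, not the MMFS method as defined in the paper, whose details the abstract does not give; the small dictionary and sample text are made up for illustration.

```python
from collections import Counter

def forward_max_match(text, dictionary):
    """Segment text greedily, always taking the longest dictionary word at each position."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

dictionary = {"工程", "管理", "工程管理", "研究"}
tokens = forward_max_match("工程管理研究工程", dictionary)
frequencies = Counter(tokens)
```

Comparing such frequency tables across publication years is one simple way to surface the shifts in technical vocabulary that the abstract describes.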
