• Title/Abstract/Keywords: Statistics Matching

185 search results (processing time 0.027 s)

Feature-Based Image Retrieval using SOM-Based R*-Tree

  • Shin, Min-Hwa;Kwon, Chang-Hee;Bae, Sang-Hyun
    • 한국산학기술학회:학술대회논문집 / 한국산학기술학회 2003 Proceeding / pp.223-230 / 2003
  • Feature-based similarity retrieval has become an important research issue in multimedia database systems. The features of multimedia data are useful for discriminating between multimedia objects (e.g., documents, images, video, music scores, etc.). For example, images are represented by their color histograms, texture vectors, and shape descriptors, and are usually high-dimensional data. The performance of conventional multidimensional data structures (e.g., R-tree family, K-D-B tree, grid file, TV-tree) tends to deteriorate as the number of dimensions of feature vectors increases. The R*-tree is the most successful variant of the R-tree. In this paper, we propose a SOM-based R*-tree as a new indexing method for high-dimensional feature vectors. The SOM-based R*-tree combines SOM and R*-tree to achieve search performance that scales better to high dimensionalities. Self-Organizing Maps (SOMs) provide a mapping from high-dimensional feature vectors onto a two-dimensional space. The mapping preserves the topology of the feature vectors. The map is called a topological feature map, and preserves the mutual relationship (similarity) in the feature spaces of input data, clustering mutually similar feature vectors in neighboring nodes. Each node of the topological feature map holds a codebook vector. A best-matching-image-list (BMIL) holds similar images that are closest to each codebook vector. In a topological feature map, there are empty nodes in which no image is classified. When we build an R*-tree, we use the codebook vectors of the topological feature map, which eliminates the empty nodes that cause unnecessary disk access and degrade retrieval performance. We experimentally compare the retrieval time cost of a SOM-based R*-tree with that of a SOM and an R*-tree using color feature vectors extracted from 40,000 images. The results show that the SOM-based R*-tree outperforms both the SOM and the R*-tree due to the reduction in the number of nodes required to build the R*-tree and in retrieval time cost.
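The BMIL step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes an already-trained codebook, uses Euclidean distance, and the names `nearest_node` and `build_bmil` as well as the toy data are hypothetical. Nodes that receive no image never appear in the result, which mirrors the idea of leaving empty nodes out of the R*-tree.

```python
import math

def nearest_node(vector, codebook):
    """Return the index of the codebook vector closest to `vector` (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))

def build_bmil(features, codebook):
    """Group image ids by best-matching node: {node_index: [image ids]}."""
    bmil = {}
    for image_id, vector in features.items():
        bmil.setdefault(nearest_node(vector, codebook), []).append(image_id)
    return bmil

# Hypothetical, already-trained 4-node codebook and three toy 3-D colour features.
codebook = [(0.9, 0.1, 0.1), (0.1, 0.9, 0.1), (0.1, 0.1, 0.9), (0.5, 0.5, 0.5)]
features = {
    "img_a": (1.0, 0.0, 0.0),
    "img_b": (0.8, 0.2, 0.1),
    "img_c": (0.0, 0.1, 1.0),
}

bmil = build_bmil(features, codebook)
# Only nodes that own at least one image would be passed on to the R*-tree.
indexed_nodes = sorted(bmil)
```

In this toy run nodes 1 and 3 stay empty, so only the codebook vectors of the remaining nodes would be indexed.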


Spatial Gap-Filling of Hourly AOD Data from Himawari-8 Satellite Using DCT (Discrete Cosine Transform) and FMM (Fast Marching Method)

  • Youn, Youjeong;Kim, Seoyeon;Jeong, Yemin;Cho, Subin;Kang, Jonggu;Kim, Geunah;Lee, Yangwon
    • 대한원격탐사학회지 / Vol. 37, No. 4 / pp.777-788 / 2021
  • Since aerosol has a relatively short duration and significant spatial variation, satellite observations become more important for the spatially and temporally continuous quantification of aerosol. However, optical remote sensing has the disadvantage that it cannot detect AOD (Aerosol Optical Depth) for the regions covered by clouds or the regions with extremely high concentrations. Such missing values can increase the data uncertainty in the analyses of the Earth's environment. This paper presents a spatial gap-filling framework using univariate statistical methods, namely DCT-PLS (Discrete Cosine Transform-based Penalized Least Square Regression) and FMM (Fast Marching Method) inpainting. We conducted a feasibility test for the hourly AOD product from AHI (Advanced Himawari Imager) between January 1 and December 31, 2019, and compared the accuracy statistics of the two spatial gap-filling methods. When the null-pixel area is not very large (null-pixel ratio < 0.6), the validation statistics of the DCT-PLS and FMM techniques showed high accuracy of CC=0.988 (MAE=0.020) and CC=0.980 (MAE=0.028), respectively. Together with the AI-based gap-filling method using extra explanatory variables, the DCT-PLS and FMM techniques can soon be tested on the low-resolution images from the AMI (Advanced Meteorological Imager) of GK2A (Geostationary Korea Multi-purpose Satellite 2A), GEMS (Geostationary Environment Monitoring Spectrometer) and GOCI2 (Geostationary Ocean Color Imager) of GK2B (Geostationary Korea Multi-purpose Satellite 2B), and on the high-resolution images from the CAS500 (Compact Advanced Satellite) series.

Overview of estimating the average treatment effect using dimension reduction methods

  • 김미정
    • 응용통계연구 / Vol. 36, No. 4 / pp.323-335 / 2023
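  • In causal inference with high-dimensional data, an important problem is to reduce the dimension of the high-dimensional covariates and transform them appropriately so as to control for confounding that can affect both the treatment and the potential outcomes. For estimating the average treatment effect (ATE), the augmented inverse probability weighting method, which uses estimates of the propensity score and the outcome model, is commonly used. When high-dimensional data are analyzed with parametric models that include all covariates for the propensity score and the outcome model, the ATE estimator may be inconsistent or may have a large variance. For this reason, ATE estimation methods that combine appropriate dimension reduction for high-dimensional data with semiparametric models have been attracting attention. Related work includes studies that apply semiparametric models and sparse sufficient dimension reduction to the dimension reduction step. Recently, ATE estimation methods that use matching after dimension reduction, without estimating the propensity score and the outcome model, have also been proposed. We introduce four recently proposed studies on ATE estimation for high-dimensional data and discuss points to keep in mind when interpreting the estimates.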
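For orientation, the augmented inverse probability weighting estimator mentioned in this abstract is commonly written in the following textbook form. The notation is an assumption made here, not taken from the paper: $T_i$ is the binary treatment indicator, $Y_i$ the observed outcome, $X_i$ the covariates, $\hat{e}(X_i)$ the estimated propensity score, and $\hat{m}_t(X_i)$ the estimated outcome model under treatment $t$.

```latex
\hat{\tau}_{\mathrm{AIPW}}
  = \frac{1}{n} \sum_{i=1}^{n} \left[
      \hat{m}_1(X_i) - \hat{m}_0(X_i)
      + \frac{T_i \,\bigl(Y_i - \hat{m}_1(X_i)\bigr)}{\hat{e}(X_i)}
      - \frac{(1 - T_i)\,\bigl(Y_i - \hat{m}_0(X_i)\bigr)}{1 - \hat{e}(X_i)}
    \right]
```

The estimator stays consistent if either the propensity score model or the outcome model is correctly specified, which is why misspecifying both in a high-dimensional parametric fit is the concern raised above.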

A Study on the Data Fusion Method using Decision Rule for Data Enrichment

  • 김순영;정성석
    • 응용통계연구 / Vol. 19, No. 2 / pp.291-303 / 2006
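  • In the process of discovering meaningful knowledge from large volumes of data, data quality matters more than anything else. In this study, as a way to enrich data, we propose a data fusion technique that uses decision rules, a data mining algorithm, to exploit information in data collected through multiple channels, and we compare the efficiency of the proposed algorithm through simulation experiments on real data. The experimental results show that the proposed algorithm improves data fusion performance.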

Factors Influencing Somatization in Adolescents

  • 이한주;서미아
    • 한국학교보건학회지 / Vol. 23, No. 1 / pp.79-87 / 2010
  • Purpose: The purpose of this study was to explore the relationship between depression, alexithymia, social support, and somatic symptoms in adolescents. Methods: The subjects were 1,519 adolescents in Seoul. Radloff's CES-D (The Center for Epidemiological Studies-Depression scale) for depression, Bagby, Parker and Taylor's TAS (Toronto Alexithymia Scale) for alexithymia, Park's social support scale, and Derogatis's SCL-90 (Brief Symptom Inventory & Matching Clinical Rating Scale) were used. The data were analyzed using descriptive statistics, Pearson's correlation coefficients, t or F tests, and stepwise multiple regression. Results: Depression and somatic symptoms were lower, but social support was higher, when compared with the mean score. Somatic symptoms were significantly positively correlated with age, depression, and alexithymia but were not correlated with social support. Stepwise multiple regression analysis showed that 21.8% of the variance in somatic symptoms was significantly accounted for by depression, alexithymia, social support, gender, economic status, living alone, and living with a parent. Conclusion: These results suggest that depression, alexithymia, and living alone can be potential risk factors for somatic symptoms in adolescents. These findings therefore provide useful information for developing a promotion program focused on social support for adolescents.

Application of Market Basket Analysis to Personalized Advertisements on Internet Storefront

  • 김종우;이경미
    • 경영과학 / Vol. 17, No. 3 / pp.19-30 / 2000
  • Customization and personalization services are considered a critical success factor for an Internet store or web service provider. As a representative personalization technique, personalized recommendation has been studied and commercialized to suggest products or services to a customer of an Internet storefront based on the customer's demographics or on an analysis of the customer's past purchasing behavior. The underlying theories of recommendation techniques are statistics, data mining, artificial intelligence, and/or rule-based matching. In the rule-based approach to personalized recommendation, marketing rules for personalization are usually collected from marketing experts and are used for inference on customer data. However, it is difficult to extract marketing rules from marketing experts, and also difficult to validate and maintain the constructed knowledge base. In this paper, we propose a marketing rule extraction technique for personalized recommendation on Internet storefronts using market basket analysis, a well-known data mining technique. Using market basket analysis, marketing rules for cross-selling are extracted and are used to select personalized advertisements when a customer visits an Internet store. An experiment was performed to evaluate the effectiveness of the proposed approach in comparison with a preference scoring approach and random selection.
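The rule extraction idea in this abstract can be illustrated with a small sketch. It assumes the common support/confidence formulation of market basket analysis restricted to item pairs; the thresholds, the function names `pair_rules` and `recommend`, and the toy baskets are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Extract rules lhs -> rhs between single items from a list of baskets."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

def recommend(cart, rules):
    """Suggest items implied by rules whose antecedent is already in the cart."""
    return sorted({rhs for lhs, rhs, _, _ in rules if lhs in cart and rhs not in cart})

# Toy purchase histories (assumed data).
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk"],
]
rules = pair_rules(baskets)
suggestions = recommend({"butter"}, rules)
```

A visitor whose cart holds only butter would be shown a bread advertisement here, because every basket with butter also contains bread.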


Precise Detection of Car License Plates by Locating Main Characters

  • Lee, Dae-Ho;Choi, Jin-Hyuk
    • Journal of the Optical Society of Korea / Vol. 14, No. 4 / pp.376-382 / 2010
  • We propose a novel method to precisely detect car license plates by locating main characters, which are printed with large font size. The regions of the main characters are directly detected without detecting the plate region boundaries, so that license regions can be detected more precisely than by other existing methods. To generate a binary image, multiple thresholds are applied, and segmented regions are selected from multiple binarized images by a criterion of size and compactness. We do not employ any character matching methods, so that many candidates for main character groups are detected; thus, we use a neural network to reject non-main character groups from the candidates. The relation of the character regions and the intensity statistics are used as the input to the neural network for classification. The detection performance has been investigated on real images captured under various illumination conditions for 1000 vehicles. 980 plates were correctly detected, and almost all non-detected plates were so stained that their characters could not be isolated for character recognition. In addition, the processing time is fast enough for a commercial automatic license plate recognition system. Therefore, the proposed method can be used for recognition systems with high performance and fast processing.

A Study on the Data Fusion for Data Enrichment

  • 정성석;김순영;김현진
    • 응용통계연구 / Vol. 17, No. 3 / pp.605-617 / 2004
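  • One of the most important factors in data mining is the quality of the data to be mined. When mining is performed on high-quality data, the potential value of data mining increases. In this paper, we propose a data fusion technique for data enrichment, a step in the knowledge discovery process that improves data quality, and we compare the efficiency of the proposed algorithm through simulation experiments. The experimental results show that the proposed algorithm improves data fusion performance.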

An Ontology Matching Method based on ISO/IEC 11179

  • 이지윤;이석훈;김장원;정동원;백두권
    • 한국정보과학회:학술대회논문집 / 한국정보과학회 2012 한국컴퓨터종합학술대회논문집 Vol.39 No.1(C) / pp.95-97 / 2012
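  • As various ontologies have been built and more systems have adopted them, interoperability problems between systems have arisen. To address this, ontologies that can be regarded as common concepts were registered in metadata registries, and interoperability improved as systems were developed on top of them. However, interoperability problems still exist between systems based on different metadata registries, which has created a need for a method of matching ontologies registered in metadata registries. Existing ontology matching methods fail to provide accurate matching results when the ontologies are small. In this paper, to match ontologies registered in metadata registries, we therefore extend the ontologies to reflect the structural characteristics of the metadata registry. We then propose an ontology matching method that achieves accurate matching by matching the extended ontologies. To show the advantages of the proposed method, we also carry out a comparative evaluation against existing ontology matching methods. The proposed method ensures matching accuracy, improves efficiency, and increases interoperability between metadata registries.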

RECENT RESEARCH AND DEVELOPING TREND OF ENGINEERING MANAGEMENT IN CHINA BASED ON TEXT MINING

  • Shaohua Jiang;Wenling Zhang;Zhaohong Qiu;Shaojun Wang
    • 국제학술발표논문집 / The 3rd International Conference on Construction Engineering and Project Management / pp.814-820 / 2009
  • With the rapid development of China's economy, many engineering projects of large scale and investment have been constructed in China, some of them the biggest in the world. With the development of engineering practice, great progress has been made in engineering management research in China, and a large number of research findings are embodied in the content of research papers and represented by technical words. To understand the state of the art in the research field of engineering management in China, three major parts, namely the title, abstract, and keywords, of research papers from the last five years in three representative Chinese journals on engineering management were chosen as research materials. Unlike Western languages, Chinese has no delimiters between words, so the maximum matching and frequency statistics (MMFS) method, a text segmentation technique for Chinese text mining, was presented to extract features consisting of technical words, phrases, and words from the research materials. Recent research and developing trends of engineering management in China were found by comparing and analyzing the differences in technical words in the research materials of the last five years.
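The abstract does not spell out the MMFS procedure, so the following is only a sketch of the general idea it names: greedy forward maximum matching against a dictionary, followed by frequency counting. The function name `max_match`, the `max_len` limit, and the tiny vocabulary are assumptions for illustration.

```python
from collections import Counter

def max_match(text, vocabulary, max_len=4):
    """Greedy forward maximum matching: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical technical-word dictionary.
vocabulary = {"工程", "管理", "工程管理", "研究"}
tokens = max_match("工程管理研究", vocabulary)
# Frequency statistics over the segmented tokens.
frequencies = Counter(tokens)
```

Because the longest match wins, "工程管理" is kept as one technical term rather than being split into "工程" and "管理".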
