• Title/Summary/Keyword: high-dimensional data (고차원 데이터)

A Content based Web Image Retrieval System using MPEG-7 Visual Descriptors and Textual Information (MPEG-7 시각 정보 기술자와 텍스트 정보를 이용한 내용 기반 웹 이미지 검색 시스템)

  • Park Joo-Hyoun;Nang Jong-Ho
    • Proceedings of the Korean Information Science Society Conference / 2006.06a / pp.232-234 / 2006
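  • With advances in Internet technology and in digital media production devices such as digital cameras, the volume of image data on the WWW has grown rapidly, and demand for efficient retrieval of web images is increasing. In this paper, we design and implement a web image retrieval system that combines conventional text-based search with visual-information-based search in order to satisfy users' diverse search needs. The proposed system consists of a web image collection and search-information extraction tool, a search server, and a search client. The collection and extraction tool gathers images from the web, selects appropriate keywords using the structure of the web document that contains each image, and extracts MPEG-7 visual descriptors (1) to support visual-information-based search. For fast retrieval, the extracted textual information is stored in a commercial database, while the MPEG-7 visual descriptors are indexed with HBI (Hierarchical Bitmap Index) (2), an indexing method for high-dimensional data. The search client lets users assign a weight to each search element and includes a relevance feedback process so that users can repeat the search until they obtain the desired results.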

Test for reliability of MS Excel statistical analysis output and modification of macros (Focused on an Analysis of Variance menu) (MS 엑셀 프로그램의 통계분석결과 신뢰성 검증 및 매크로 보완 (분산분석 메뉴를 중심으로))

  • Kim, Sook-Young
    • Journal of the Korea Computer Industry Society / v.9 no.5 / pp.207-216 / 2008
  • The statistical analysis menus of MS Excel, a program with powerful spreadsheet functions, have not been modified since the Excel 2000 edition, and their utilization is very low. To improve utilization of the Excel statistical analysis menus, this research compared the outputs of the menus with independently computed test statistics and developed high-level macros. On real data, the outputs of the Excel menus for both one-way and two-way layouts are exactly the same as the computed test statistics; therefore, the Excel statistical analysis menus are reliable. Macros that provide results for analysis of variance with a block and for multiple comparisons of means were developed using Excel functions.
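
The check described in this abstract, comparing a menu's output against a test statistic computed by hand, can be illustrated for the one-way layout. The following is a minimal sketch using only the Python standard library; the function name `one_way_anova_f` and the sample data are hypothetical and are not taken from the paper.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way layout.

    `groups` is a list of lists, one list of observations per treatment.
    """
    k = len(groups)                          # number of treatments
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: three treatments with three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```

The resulting F value and degrees of freedom are the quantities one would compare against the corresponding cells of a spreadsheet's single-factor ANOVA output.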

Design and Implementation of OLAP/DataMining integration Tool using XMLA (XMLA를 이용한 OLAP/데이터마이닝 통합 툴의 설계 및 구현)

  • Kim, Seong-Ju;Choi, Ji-Woong;Kim, Myung-Ho
    • Annual Conference of KIPS / 2006.11a / pp.409-412 / 2006
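  • In a rapidly changing market and a competitive environment among companies, corporate decision makers must make decisions more quickly and now also bear the heavy responsibility of minimizing decision risk. In response, business intelligence has provided a variety of BI tools, mainly for market analysts who need high-level analysis and for a small number of members of IT organizations. In the past, the price of business intelligence products and the cost of building solutions were considerable even though there were few users. Recently, with changes in the environment and the growing diversity of user requirements, many users within a company want to analyze data. In addition, to keep business running smoothly, many decisions are made in lower-level organizational units, and responsibility for those decisions is accordingly placed on field staff. Furthermore, reports have been produced on platforms with poor compatibility, owing to differences in the data storage technologies of BI products. This paper therefore proposes an OLAP/data mining business intelligence tool that uses XMLA web services and can interoperate with a Java-based reporting tool that supports multiple platforms. The implemented system offers a front-end tool that can present results in various forms, providing convenience to end users while supporting BI functions.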

Gene Expression Data Analysis Using Parallel Processor based Pattern Classification Method (병렬 프로세서 기반의 패턴 분류 기법을 이용한 유전자 발현 데이터 분석)

  • Choi, Sun-Wook;Lee, Chong-Ho
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.6 / pp.44-55 / 2009
  • Diagnosing diseases using gene expression data obtained from microarray chips has recently become an active research area. Because such data are difficult to analyze directly, the analysis has generally relied on general-purpose machine learning algorithms. However, recent research results indicate that analysis based on the interactions between genes is essential for gene expression analysis, which means that analysis using traditional machine learning algorithms has limitations. In this paper, we classify gene expression data using the hypernetwork model, which considers higher-order correlations between features, and compare the classification accuracies. We also present a new hypernetwork model that remedies a disadvantage of the existing model, and compare the processing performance of the existing hypernetwork model on a general sequential processor with that of the improved hypernetwork model implemented on parallel processors. The experimental results show that our model achieves improved and competitive classification performance compared with traditional machine learning methods as well as the existing hypernetwork model. We also show that performance is maximized when the hypernetwork model is implemented on our parallel processors.

Automatic Generation of Fashion Image Dataset by Using Progressive Growing GAN (PG-GAN을 이용한 패션이미지 데이터 자동 생성)

  • Kim, Yanghee;Lee, Chanhee;Whang, Taesun;Kim, Gyeongmin;Lim, Heuiseok
    • Journal of Internet of Things and Convergence / v.4 no.2 / pp.1-6 / 2018
  • Techniques for generating new sample data from high-dimensional data such as images have been used in various ways for speech synthesis, image conversion, and image restoration. This paper adopts Progressive Growing of Generative Adversarial Networks (PG-GANs) as an implementation model to generate high-resolution images and to enhance the variation of the generated images, and applies it to fashion image data. PG-GANs let the generator and discriminator learn progressively at the same time, continuously adding new layers so that training moves from low-resolution images to high-resolution images. We also propose a mini-batch discrimination method to increase the diversity of the generated data, and a Sliced Wasserstein Distance (SWD) evaluation method, instead of the existing MS-SSIM, to evaluate the GAN model.
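
As an illustration of the SWD idea mentioned in this abstract, the sketch below approximates a sliced 1-Wasserstein distance between two equally sized point sets by projecting them onto random directions and comparing the sorted projections. It is a generic, standard-library-only sketch: the function name, the number of projections, and the toy data are assumptions, and it does not reproduce the patch-based evaluation pipeline typically used with PG-GANs.

```python
import math
import random

def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Approximate sliced 1-Wasserstein distance between two point sets.

    Assumes xs and ys hold the same number of points of equal dimension.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction and normalise it to unit length.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # Project both sets onto the direction and sort the 1-D values.
        px = sorted(sum(a * b for a, b in zip(p, d)) for p in xs)
        py = sorted(sum(a * b for a, b in zip(p, d)) for p in ys)
        # In 1-D, the Wasserstein-1 distance is the mean gap between sorted samples.
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)
    return total / n_projections

# Toy data standing in for feature vectors of real and generated images.
real = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shifted = [[x + 5.0, y + 5.0] for x, y in real]
same = sliced_wasserstein(real, real)
apart = sliced_wasserstein(real, shifted)
print(same, apart)
```

Identical sets give a distance of zero, and the distance grows as the two sets move apart, which is the property that makes the measure usable for comparing generated and real samples.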

Design and Implementation of a Directory System for Disease Retrieval Services (질병 검색 서비스를 위한 디렉토리 시스템 설계 및 구현)

  • Yeo, Myung-ho;Lee, Yoon-kyeong;Rho, Kyu-jong;Park, Hyoung-soon;Kim, Hak-sin;Park, Jun-ho;Kang, Tae-ho;Kim, Hak-yong;Yoo, Jae-soo
    • Proceedings of the Korea Contents Association Conference / 2009.05a / pp.709-714 / 2009
  • Biological research increasingly has to deal with large-scale data. While scientists relied on classical experimental approaches in the past, the convergence of information technology and biology now makes it possible to obtain more sophisticated observations easily. The study of diseases is one of the most important issues in the life sciences. Conventional services and databases provide users with information such as disease classifications, symptoms, and medical treatments through the web. However, it is hard to connect them or to build new services on top of them because they use independent and differing criteria. This may be a factor that interferes with the development of biology. In this paper, we propose an integrated data structure for a disease database, and design and implement a novel directory system for diseases as an infrastructure for developing new services.

An Implementation of Efficient M-tree based Indexing on Flash-Memory Storage System (플래시 메모리 저장장치에서 효율적인 M-트리 기반의 인덱싱 구현)

  • Yu, Jeong-Soo;Nang, Jong-Ho
    • Journal of KIISE:Computing Practices and Letters / v.16 no.1 / pp.70-74 / 2010
  • As the storage capacity of flash memory has increased, portable devices have begun to store large amounts of multimedia data on flash memory. This has created a need for an effective data management scheme based on an indexing structure. Among the many indexing schemes, the M-tree is well known for its suitability for multimedia data in high-dimensional metric spaces. Since flash memory restricts write operations, indexing schemes that write frequently suffer a performance limitation. In this paper, a new node split method with fewer write operations is proposed for the M-tree indexing scheme on flash memory. According to experiments, the proposed method reduced the number of write operations to about 7% of that of the original method. The proposed method can effectively construct an indexing structure for multimedia data on flash memory.

Replacement Condition Detection of Railway Point Machines Using Data Cube and SVM (데이터 큐브 모델과 SVM을 이용한 철도 선로전환기의 교체시기 탐지)

  • Choi, Yongju;Oh, Jeeyoung;Park, Daihee;Chung, Yongwha;Kim, Hee-Young
    • Smart Media Journal / v.6 no.2 / pp.33-41 / 2017
  • Railway point machines act as actuators that provide different routes to trains by driving switchblades from the current position to the opposite one. Since point failures caused by aging can significantly affect railway operations, with potentially disastrous consequences, it is critical to detect at an appropriate time when a point machine needs to be replaced. In this paper, we propose a method for detecting the replacement condition of point machines in railway condition monitoring systems using electrical current signals. We first analyze domestic in-field replacement data and relabel it as "does-not-need-to-be-replaced" or "needs-to-be-replaced" by means of OLAP (On-Line Analytical Processing) operations on a multidimensional data cube. The system extracts suitable feature vectors from the incoming electrical current signals by DWT (Discrete Wavelet Transform), reduces the feature dimensions using PCA (Principal Component Analysis), and employs an SVM (Support Vector Machine) for real-time replacement detection. Experimental results with in-field replacement data, including point anomalies, show that the system can detect the replacement conditions of railway point machines with accuracy exceeding 98%.
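
To make the DWT feature-extraction step in this abstract concrete, the sketch below computes one level of a Haar wavelet transform on a short signal. The abstract does not state which wavelet the authors used, so the Haar wavelet, the function name `haar_dwt`, and the sample values are assumptions chosen for simplicity; the PCA and SVM stages are not shown.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("this sketch assumes an even-length signal")
    s = 1 / math.sqrt(2)  # orthonormal scaling keeps the signal energy unchanged
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) * s for a, b in pairs]   # low-pass: local averages
    detail = [(a - b) * s for a, b in pairs]   # high-pass: local differences
    return approx, detail

# Hypothetical current samples; a real signal would be much longer.
current = [4.0, 4.0, 2.0, 2.0]
approx, detail = haar_dwt(current)
print(approx, detail)
```

The approximation coefficients summarise the coarse shape of the signal and the detail coefficients capture abrupt changes; either set, or statistics derived from them, could serve as the feature vector passed on to dimensionality reduction and classification.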

API Feature Based Ensemble Model for Malware Family Classification (악성코드 패밀리 분류를 위한 API 특징 기반 앙상블 모델 학습)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.3 / pp.531-539 / 2019
  • This paper proposes training features for malware family analysis and analyzes the multi-class classification performance of ensemble models. We construct training data by extracting API and DLL information from malware executables and use the Random Forest and XGBoost algorithms, which are based on decision trees. API, API-DLL, and DLL-CM features for malware detection and family classification are proposed by analyzing frequently used API and DLL information from malware and converting high-dimensional features to low-dimensional features. The proposed feature selection method offers the advantages of data dimension reduction and fast learning. In the performance comparison, the malware detection rate is 93.0% for Random Forest, the accuracy on the malware family dataset is 92.0% for XGBoost, and the false positive rate on the malware family dataset including benign samples is about 3.5% for both Random Forest and XGBoost.

A Distributed Activity Recognition Algorithm based on the Hidden Markov Model for u-Lifecare Applications (u-라이프케어를 위한 HMM 기반의 분산 행위 인지 알고리즘)

  • Kim, Hong-Sop;Yim, Geo-Su
    • Journal of the Korea Society of Computer and Information / v.14 no.5 / pp.157-165 / 2009
  • In this paper, we propose a distributed model that recognizes human activities of daily living (ADLs) that can occur in everyday living spaces. We collect and analyze the user's environmental, location, and activity information through simple sensors attached to home devices and utensils. Based on this information, we provide lifecare services by inferring the user's life patterns and health condition. However, providing lifecare services requires well-refined activity recognition data, and without sufficient inferred information it is very hard to build an ADL recognition model for high-level situation awareness. Because the sequences generated by the sensors are very helpful for inferring activities, we use them to analyze activity patterns and propose a distributed linear-time inference algorithm. This algorithm is suited to recognizing activities in small areas such as a home, office, or hospital. For performance evaluation, we test with open data from the MIT Media Lab, and the recognition results show over 75% accuracy.
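
The abstract does not describe the authors' distributed algorithm in detail. As background on how an HMM turns a sensor sequence into an activity sequence, the sketch below shows standard Viterbi decoding for a single HMM, whose running time grows linearly with the length of the observation sequence. The states, sensor names, and probabilities are hypothetical and are not taken from the paper or its dataset.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            # Choose the predecessor state that maximises the path probability.
            p, arg = max((prev[r][0] * trans_p[r][s], r) for r in states)
            layer[s] = (p * emit_p[s][o], arg)
        best.append(layer)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for layer in reversed(best[1:]):
        state = layer[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical two-activity model driven by two binary sensors.
states = ["cooking", "sleeping"]
start_p = {"cooking": 0.5, "sleeping": 0.5}
trans_p = {
    "cooking": {"cooking": 0.8, "sleeping": 0.2},
    "sleeping": {"cooking": 0.2, "sleeping": 0.8},
}
emit_p = {
    "cooking": {"stove": 0.9, "bed": 0.1},
    "sleeping": {"stove": 0.1, "bed": 0.9},
}
path = viterbi(["stove", "stove", "bed"], states, start_p, trans_p, emit_p)
print(path)
```

For long sequences, a practical implementation would work with log-probabilities to avoid numerical underflow; plain products are used here only to keep the sketch short.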