• Title/Abstract/Keywords: Large Data Set

Discovery of Association Rules Using Latent Variables

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society / Vol. 17, No. 1 / pp.149-160 / 2006
  • Association rule mining searches for interesting relationships among items in a given large data set. Association rules are frequently used by retail stores to assist in marketing, advertising, floor placement, and inventory control. There are three primary threshold measures for association rules: support, confidence, and lift. When association rules are applied to real-world data, interpretation is difficult because many rules are obtained. In this paper, we develop an association rule model using latent variables for environmental survey data.
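
The three threshold measures named in this abstract have standard textbook definitions, which the following minimal Python sketch illustrates. The toy transactions and the function names are assumptions made for illustration; the sketch does not reproduce the latent-variable model of the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by the support of the consequent.

    A value above 1 suggests positive association, below 1 negative.
    """
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))


# Hypothetical toy data: four market baskets.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

s = support(transactions, {"bread", "milk"})       # 2 of 4 baskets
c = confidence(transactions, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
l = lift(transactions, {"bread"}, {"milk"})        # c / support({"milk"})
```

In practice a rule is kept only when its support, confidence, and lift each exceed a user-chosen threshold, which is why a large data set can still yield many rules.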

Large Solvent and Noise Peak Suppression by Combined SVD-Harr Wavelet Transform

  • Kim, Dae-Sung;Kim, Dai-Gyoung;Lee, Yong-Woo;Won, Ho-Shik
    • Bulletin of the Korean Chemical Society / Vol. 24, No. 7 / pp.971-974 / 2003
  • By utilizing singular value decomposition (SVD) and the shift-averaged Haar wavelet transform (WT) with a set of Daubechies wavelet coefficients (1/2, -1/2), a method that can simultaneously eliminate an unwanted large solvent peak and noise peaks from NMR data has been developed. Noise elimination was accomplished by shift-averaging the time domain NMR data after a large solvent peak was suppressed by SVD. The algorithms took advantage of the WT, giving excellent results for noise elimination in the Gaussian-type NMR spectral lines of NMR data pretreated with SVD and providing superb results in the adjustment of the phase and magnitude of the spectrum. The SVD and shift-averaged Haar wavelet methods were quantitatively evaluated in terms of threshold values and signal-to-noise (S/N) ratio values.
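
As a rough illustration of the wavelet step only, the sketch below applies a one-level Haar transform with the coefficients (1/2, -1/2), zeroes small detail coefficients, and averages the result over two circular shifts. The function names, the hard-threshold rule, and the use of exactly two shifts are assumptions; the abstract does not specify these details, and the SVD solvent suppression is not shown.

```python
def haar_forward(x):
    """One-level Haar transform of an even-length sequence.

    Averages use coefficients (1/2, 1/2); details use (1/2, -1/2).
    """
    approx = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward: each (a, d) pair yields a + d and a - d."""
    x = []
    for a, d in zip(approx, detail):
        x.extend((a + d, a - d))
    return x


def denoise_once(x, threshold):
    """Zero every detail coefficient whose magnitude is at most threshold."""
    approx, detail = haar_forward(x)
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_inverse(approx, detail)


def shift_averaged_denoise(x, threshold):
    """Average the denoised signal over circular shifts 0 and 1."""
    plain = denoise_once(x, threshold)
    shifted = x[1:] + x[:1]                  # circular left shift by one
    denoised = denoise_once(shifted, threshold)
    unshifted = denoised[-1:] + denoised[:-1]  # undo the shift
    return [(p + u) / 2 for p, u in zip(plain, unshifted)]


signal = [4.0, 2.0, 5.0, 5.0]  # hypothetical time-domain samples
smoothed = shift_averaged_denoise(signal, threshold=10.0)
```

Averaging over shifts reduces the dependence of the result on where the sample pairs happen to fall, which is the usual motivation for shift-averaged wavelet denoising.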

Data Partitioning on MapReduce by Leveraging Data Utility

  • 김종욱
    • Journal of Korea Multimedia Society / Vol. 16, No. 5 / pp.657-666 / 2013
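  • Modern society is characterized by a rapid influx of massive data produced continuously by diverse applications such as social media, business, and bioinformatics. Accordingly, methods that can analyze and process this explosively growing large-scale data more efficiently are emphasized more than ever. Over the past several years, MapReduce, which effectively supports parallel processing in batch-oriented system environments, has been actively studied in academia and used successfully in many fields. However, because MapReduce does not consider the relative utility of data (data utility), it does not perform effectively given the characteristics of multimedia application users (that is, users who are interested in only the few results with the highest or lowest scores). To address this problem, this paper proposes a data partitioning scheme on MapReduce. Performance experiments on the proposed scheme show that it can improve performance over existing approaches.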

Clustering Algorithm Using Hashing in Classification of Multispectral Satellite Images

  • Park, Sung-Hee;Kim, Hwang-Soo;Kim, Young-Sup
    • Korean Journal of Remote Sensing / Vol. 16, No. 2 / pp.145-156 / 2000
  • Clustering is the process of partitioning a data set into meaningful clusters. As the amount of data to process increases, a faster algorithm is required more than ever. In this paper, we propose a clustering algorithm that partitions a multispectral remotely sensed image data set into several clusters using a hash search algorithm. The processing time of our algorithm is compared with that of clustering algorithms using other speed-up concepts. The experimental results are compared with respect to the number of bands, the number of clusters, and the size of the data. It is also shown that the processing time of our algorithm is shorter than that of clustering algorithms using other speed-up concepts when the size of the data is relatively large.

Discovery of Association Rules Using Latent Variables

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Proceedings of the Korean Data and Information Science Society Conference / 2005 Autumn Conference of the Korean Data and Information Science Society / pp.177-188 / 2005
  • Association rule mining searches for interesting relationships among items in a given large data set. Association rules are frequently used by retail stores to assist in marketing, advertising, floor placement, and inventory control. There are three primary threshold measures for association rules: support, confidence, and lift. When association rules are applied to real-world data, interpretation is difficult because many rules are obtained. In this paper, we develop an association rule model using latent variables for environmental survey data.

Turbulent Combustion Characteristics of a Swirl Injector in a Gas Turbine Annular Combustor Using LES and Level-set Flamelet

  • 김리나;홍지석;정원철;유광희;김종찬;성홍계
    • Journal of the Korean Society of Propulsion Engineers / Vol. 18, No. 2 / pp.1-9 / 2014
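  • A three-dimensional large-eddy simulation (LES) was performed to analyze the turbulent combustion flow in an annular combustor and to derive its flow characteristics. A level-set flamelet technique was applied to model the flame in the complex reacting flow within the combustion chamber. A single annular combustor of the GEAE LM6000 was used as the computational model, and the operating conditions were based on experimental results. Vortex breakdown, an important feature of the turbulent flow in the combustion chamber, was observed together with the central and corner recirculation zones caused by the expansion of the combustion gas injected from the swirl injector, and the turbulent flame structure was analyzed.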

The development of integrated information system for the large scale cooperative R & D project

  • 이원중;김의준
    • Aerospace Engineering and Technology / Vol. 7, No. 2 / pp.38-45 / 2008
  • It is challenging to build an integrated information system for a large-scale cooperative R & D project. In an aircraft development program, which typically has several leading agencies and is supported by many domestic and foreign participating companies, a harmonized common data flow is the core factor in achieving the development goal. To this end, the development is carried out while maintaining the existing management systems of the agencies and companies. As a first step, the standard for common data information and the classification categories of technical data are defined. Second, the workflow standards are set. On this foundation, an efficient technical data management system is built, including functions for storage, inquiry, revision, linking, approval, submission, etc.

AN APPROACH TO THE TRAINING OF A SUPPORT VECTOR MACHINE (SVM) CLASSIFIER USING SMALL MIXED PIXELS

  • Yu, Byeong-Hyeok;Chi, Kwang-Hoon
    • Proceedings of the KSRS Conference / Korean Society of Remote Sensing 2008 International Symposium on Remote Sensing / pp.386-389 / 2008
  • It is important that the training stage of a supervised classification is designed to provide the spectral information. Guidance on the design of the training stage typically calls for the use of a large sample of randomly selected pure pixels in order to characterize the classes. Such guidance is generally given without regard to the specific nature of the application in hand, including the classifier to be used. An approach to the training of a support vector machine (SVM) classifier that is the opposite of that generally promoted for training set design is suggested. This approach uses a small sample of mixed spectral responses drawn from purposefully selected locations (geographical boundaries) in training. A sample of such data should be easier and cheaper to acquire than that suggested by traditional approaches. In this research, we evaluated this approach against traditional approaches with high-resolution satellite data. The results showed that small mixed pixels can be used to derive a classification with accuracy similar to that obtained using a large number of pure pixels. The approach can also substantially reduce the cost of training data acquisition because the sampling locations used are commonly easy to observe.

Price Forecasting on a Large Scale Data Set using Time Series and Neural Network Models

  • Preetha, KG;Remesh Babu, KR;Sangeetha, U;Thomas, Rinta Susan;Saigopika, Saigopika;Walter, Shalon;Thomas, Swapna
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 12 / pp.3923-3942 / 2022
  • Environment, price, regulation, and other factors influence the price of agricultural products, which is a social signal of product supply and demand. The price of many agricultural products fluctuates greatly due to the asymmetry between production and marketing details. Horticultural goods are particularly price sensitive because they cannot be stored for long periods of time. Forecasting the price of horticultural products is therefore crucial in designing a cropping plan. The proposed method guides farmers in agricultural product production and harvesting plans. Farmers can benefit from long-term forecasting since it helps them plan their planting and harvesting schedules. Customers can also profit from daily average price estimates for the short term. This paper studies time series models such as ARIMA and SARIMA and neural network models such as BPN and LSTM for wheat cost prediction in India. A large-scale available data set is collected and tested. The results show that, while ARIMA and SARIMA models are well suited to small-scale, continuous, and periodic data, BPN and LSTM provide more accurate and faster results for predicting weekly and monthly trends of price fluctuation.

Efficient Computation of a Skyline under Location Restrictions

  • 김지현;김명
    • The KIPS Transactions: Part D / Vol. 18D, No. 5 / pp.313-316 / 2011
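  • The skyline of a multidimensional data set is the subset of data points that are not dominated by any other point. Skyline computation is a useful operation for decision making over multidimensional data. However, when the skyline is too large, it can be difficult to use for decision making. In this study, we sought a method for efficiently computing a part of the skyline that reflects a user-specified movement of the origin together with angle and distance information from the origin. The proposed algorithm quickly eliminates data that do not belong to the skyline and can progressively reflect the user's requirements. The efficiency of the algorithm was verified through experiments.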
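
The dominance relation behind the skyline definition can be made concrete with a small sketch. It assumes that smaller values are preferred in every dimension and uses a naive quadratic scan; the location restrictions (origin shift, angle, distance) and the pruning strategy of the paper are not implemented, and the function names and sample points are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and better in one.

    Assumes smaller values are preferred in all dimensions.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points):
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-dimensional data, e.g. (price, distance) pairs.
points = [(1, 9), (3, 3), (4, 4), (9, 1), (5, 8)]
result = skyline(points)
# (4, 4) and (5, 8) are both dominated by (3, 3), so they are excluded.
```

Because every point is compared against every other, this scan becomes slow for large data sets, which is the cost that faster skyline algorithms aim to avoid by discarding dominated points early.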