• Title/Summary/Keyword: Same data

Sort-Based Distributed Parallel Data Cube Computation Algorithm using MapReduce (맵리듀스를 이용한 정렬 기반의 데이터 큐브 분산 병렬 계산 알고리즘)

  • Lee, Suan;Kim, Jinho
    • Journal of the Institute of Electronics and Information Engineers / v.49 no.9 / pp.196-204 / 2012
  • Recently, many applications perform OLAP (On-Line Analytical Processing) over very large volumes of data. The multidimensional data cube is regarded as a core tool in OLAP analysis. This paper focuses on how to compute data cubes efficiently in parallel using a popular parallel processing tool, MapReduce. We investigate efficient ways to implement the PipeSort algorithm, a well-known data cube computation method, on the MapReduce framework. PipeSort computes several (descendant) cuboids that share the same sort order at the same time, as a pipeline, by scanning one (ancestor) cuboid once. This paper proposes four ways of implementing the PipeSort pipeline on a MapReduce framework running across 20 servers. Our experiments show that the PipeMap-NoReduce algorithm outperforms the other algorithms for high-dimensional data, whereas Post-Pipe stands out for low-dimensional data.
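
The pipelining idea in this abstract can be illustrated with a minimal single-machine sketch. It assumes rows already sorted by their dimension columns and a sum aggregate, and it is not one of the paper's four MapReduce variants; the function name `pipeline_cuboids` is hypothetical. Because every prefix of the sort order (for example ABC, AB, A, and the apex) forms contiguous groups, all of these cuboids can be produced in one scan.

```python
def pipeline_cuboids(sorted_rows, num_dims):
    """Compute every prefix cuboid in one scan of rows sorted by their dimensions.

    Each row is (dim_1, ..., dim_n, measure). results[k] maps a k-dimension
    prefix key to the sum of the measure; results[0] is the apex cuboid.
    """
    results = [dict() for _ in range(num_dims + 1)]
    current = [None] * (num_dims + 1)  # key of the open group per prefix length
    totals = [0] * (num_dims + 1)      # running sum of the open group

    for *dims, measure in sorted_rows:
        for k in range(num_dims + 1):
            key = tuple(dims[:k])
            if current[k] is not None and key != current[k]:
                # The sort order guarantees this group will not reappear.
                results[k][current[k]] = totals[k]
                totals[k] = 0
            current[k] = key
            totals[k] += measure

    # Flush the groups that are still open after the last row.
    for k in range(num_dims + 1):
        if current[k] is not None:
            results[k][current[k]] = totals[k]
    return results


rows = sorted([("a", "x", 1, 10), ("a", "x", 2, 5), ("a", "y", 1, 3), ("b", "x", 1, 7)])
cuboids = pipeline_cuboids(rows, num_dims=3)
```

Here `cuboids[1]` holds the one-dimensional cuboid on the first column and `cuboids[0]` the grand total, all derived from a single pass over `rows`.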

Structural Change Detection Technique for RDF Data in MapReduce (맵리듀스에서의 구조적 RDF 데이터 변경 탐지 기법)

  • Lee, Taewhi;Im, Dong-Hyuk
    • KIPS Transactions on Software and Data Engineering / v.3 no.8 / pp.293-298 / 2014
  • Detecting and understanding the changes between RDF data sets is crucial in evolutionary processes, synchronization systems, and versioning systems on the web of data. However, current research on change detection remains unsatisfactory in that it neither considers the large scale of RDF data nor accurately produces RDF deltas. In this paper, we propose a scalable and effective change detection technique using the MapReduce framework, which has been used in many fields to process and analyze large volumes of data. In particular, we focus on structure-based change detection, which adopts a strategy for comparing blank nodes in RDF data. To achieve this, we employ a method composed of two MapReduce jobs. The first job partitions the triples with blank nodes by grouping the triples that share a blank node ID and then computes the incoming path to each blank node. The second job partitions the triples with the same path and matches blank nodes with the Hungarian method. In experiments, we show that our approach is more accurate and effective than the previous approach.
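
The two steps described above can be sketched in memory as follows. This is an illustration under stated assumptions, not the authors' implementation: triples are plain `(subject, predicate, object)` string tuples, blank nodes carry the `_:` prefix, the matching cost is the number of `(predicate, object)` pairs that differ, and a brute-force search over permutations stands in for the Hungarian method (it finds the same minimum-cost assignment, but only scales to a handful of nodes). The incoming-path computation is omitted, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def group_by_blank_node(triples):
    """Group step: collect the (predicate, object) pairs of each blank-node subject."""
    groups = defaultdict(set)
    for subject, predicate, obj in triples:
        if subject.startswith("_:"):
            groups[subject].add((predicate, obj))
    return dict(groups)


def match_blank_nodes(old_groups, new_groups):
    """Return the minimum-cost one-to-one mapping from old to new blank nodes."""
    old_ids = sorted(old_groups)
    new_ids = sorted(new_groups)
    if len(old_ids) != len(new_ids):
        raise ValueError("this sketch assumes the same number of blank nodes")

    def cost(old_id, new_id):
        # Assumed cost: size of the symmetric difference of the two pair sets.
        return len(old_groups[old_id] ^ new_groups[new_id])

    best = min(
        permutations(new_ids),
        key=lambda perm: sum(cost(a, b) for a, b in zip(old_ids, perm)),
    )
    return dict(zip(old_ids, best))


old = [("_:b1", "name", "Alice"), ("_:b1", "age", "30"), ("_:b2", "name", "Bob")]
new = [("_:x", "name", "Bob"), ("_:y", "name", "Alice"), ("_:y", "age", "31")]
mapping = match_blank_nodes(group_by_blank_node(old), group_by_blank_node(new))
```

With the sample data, `_:b1` is paired with `_:y` and `_:b2` with `_:x`, so only the changed `age` value would be reported as a delta.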

A study on the ordering of PIM family similarity measures without marginal probability (주변 확률을 고려하지 않는 확률적 흥미도 측도 계열 유사성 측도의 서열화)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.367-376 / 2015
  • Today, big data has become a hot keyword; it may be defined as a collection of data sets so huge and complex that they become difficult to process by traditional methods. Clustering identifies the information in a big database by assigning a set of objects to clusters so that the objects in the same cluster are more similar to each other than to objects in other clusters. The similarity measures used in cluster analysis may be classified into various types depending on the nature of the data. In this paper, we compute upper and lower limits for probabilistic interestingness measure (PIM) based similarity measures without marginal probability, such as the Yule I and II, Michael, Digby, Baulieu, and Dispersion measures. We also compare these measures using real data and a simulation experiment. According to Warrens (2008), coefficients that have the same quantities in the numerator and denominator, that are bounded, and that are close to each other in the ordering are likely to be more similar. Thus, results on bounds provide a means of classifying various measures. Also, knowing which coefficients are similar provides insight into the stability of a given algorithm.
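
As a small illustration of such measures, the sketch below computes two coefficients from a 2x2 contingency table with cell counts `a`, `b`, `c`, `d`. It assumes that "Yule I and II" correspond to the standard Yule's Q and Yule's Y, which the abstract does not state explicitly. Both use only the cross products `ad` and `bc`, with no marginal totals, and both lie in the range from -1 to 1.

```python
import math


def yule_q(a, b, c, d):
    """Yule's Q: (ad - bc) / (ad + bc). Undefined when ad + bc == 0."""
    return (a * d - b * c) / (a * d + b * c)


def yule_y(a, b, c, d):
    """Yule's Y: (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc))."""
    root_ad = math.sqrt(a * d)
    root_bc = math.sqrt(b * c)
    return (root_ad - root_bc) / (root_ad + root_bc)


q = yule_q(40, 10, 10, 40)  # 1500 / 1700
y = yule_y(40, 10, 10, 40)  # (40 - 10) / (40 + 10)
```

For the same table, Q is never smaller in magnitude than Y, which is one simple example of the kind of ordering between bounded coefficients that the abstract studies.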

A Study on the Synthetic ECG Generation for User Recognition (사용자 인식을 위한 가상 심전도 신호 생성 기술에 관한 연구)

  • Kim, Min Gu;Kim, Jin Su;Pan, Sung Bum
    • Smart Media Journal / v.8 no.4 / pp.33-37 / 2019
  • Because ECG signals are time-series data acquired as time elapses, it is important to obtain comparison data of the same size as the enrolled data every time. This paper proposes a GAN (Generative Adversarial Network) model based on an auxiliary classifier to generate synthetic ECG signals, which may address the problem of differing data sizes. Cosine similarity and cross-correlation are used to examine the similarity of the synthetic ECG signals. The analysis shows that the average cosine similarity was 0.991 and the average Euclidean distance similarity based on cross-correlation was 0.25. These results indicate that the data size difference can be resolved, because synthetic ECG signals similar to real ECG signals can be generated even when the registered data and the comparison data differ in size.
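
The two similarity tools named in the abstract can be sketched in plain Python as follows. This is a generic illustration, not the paper's evaluation code: signals are assumed to be lists of floats, `cosine_similarity` requires equal lengths, and `best_lag` (a hypothetical helper) returns the shift at which the raw cross-correlation sum is largest. How the paper derives a Euclidean distance from the cross-correlation is not specified in the abstract and is not reproduced here.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length, non-zero signals."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] maximising sum(x[i] * y[i + lag])."""
    def xcorr(lag):
        return sum(
            x[i] * y[i + lag] for i in range(len(x)) if 0 <= i + lag < len(y)
        )
    return max(range(-max_lag, max_lag + 1), key=xcorr)


similarity = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
lag = best_lag([0, 1, 0, 0], [0, 0, 1, 0], max_lag=2)
```

Scaling a signal does not change its cosine similarity, so the first call compares shape only; the second call reports that the peak of the second signal sits one sample later than that of the first.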

Weighted Finite State Transducer-Based Endpoint Detection Using Probabilistic Decision Logic

  • Chung, Hoon;Lee, Sung Joo;Lee, Yun Keun
    • ETRI Journal / v.36 no.5 / pp.714-720 / 2014
  • In this paper, we propose the use of data-driven probabilistic utterance-level decision logic to improve Weighted Finite State Transducer (WFST)-based endpoint detection. In general, endpoint detection is dealt with using two cascaded decision processes. The first process is frame-level speech/non-speech classification based on statistical hypothesis testing, and the second process is a heuristic-knowledge-based utterance-level speech boundary decision. To handle these two processes within a unified framework, we propose a WFST-based approach. However, a WFST-based approach has the same limitations as conventional approaches in that the utterance-level decision is based on heuristic knowledge and the decision parameters are tuned sequentially. Therefore, to obtain decision knowledge from a speech corpus and optimize the parameters at the same time, we propose the use of data-driven probabilistic utterance-level decision logic. The proposed method reduces the average detection failure rate by about 14% for various noisy-speech corpora collected for an endpoint detection evaluation.
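
For context, the conventional second stage that the abstract contrasts with (a heuristic utterance-level decision applied to frame-level labels) might look like the sketch below. It is neither the WFST formulation nor the proposed probabilistic logic; the thresholds `min_speech` and `min_silence` and the function name are assumptions chosen for illustration, and they are exactly the kind of hand-tuned parameters the paper aims to learn from data instead.

```python
def detect_endpoints(frame_labels, min_speech=3, min_silence=5):
    """Heuristic utterance boundary decision over frame-level labels.

    frame_labels: iterable of booleans (True = frame classified as speech).
    Returns None if no utterance starts, (start, end) frame indices if an
    utterance starts and ends, or (start, None) if it never ends.
    """
    start = None
    run = 0  # length of the current run of speech (before start) or silence (after)
    for i, is_speech in enumerate(frame_labels):
        if start is None:
            run = run + 1 if is_speech else 0
            if run >= min_speech:
                start = i - min_speech + 1  # first frame of the speech run
                run = 0
        else:
            run = 0 if is_speech else run + 1
            if run >= min_silence:
                return (start, i - min_silence)  # last frame before the silence run
    return None if start is None else (start, None)


labels = [False, False, True, True, True, True, False, False, True] + [False] * 6
endpoints = detect_endpoints(labels)
```

The short pause at frames 6 and 7 is shorter than `min_silence`, so it stays inside the utterance, which ends at frame 8.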

A Study on the Characteristic of Driving Sound Noise for Various Optical Disk Drives (광디스크 드라이브의 종류별 구동소음 특성에 관한 연구)

  • Oh, Se-Won;Kim, Yu-Sung;Kim, Dong-Hyun
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2005.05a / pp.580-586 / 2005
  • In this study, experimental tests of the driving noise of various optical disk drives (ODD) have been performed using a 1/2-inch microphone noise measurement system. Several new and old ODD models by different manufacturers are considered and compared for realistic driving conditions. A sound insulation case with absorbing material for the present experimental tests is designed and constructed using the CATIA system. It is found that the average data transfer rate, operating RPM, and sound noise level seem to differ among ODDs of the same denoted speed from different manufacturers. Moreover, the driving sound noise level can be largely affected by both tray shape and driving speed, even for the same apparent data transfer rate.

Development of Machining Simulation System using Enhanced Z Map Model (Enhanced Z map을 이용한 절삭 공정 시뮬레이션 시스템의 개발)

  • 이상규;고성림
    • Proceedings of the Korean Society of Precision Engineering Conference / 2002.05a / pp.551-554 / 2002
  • The paper discusses a new approach to machining operation simulation using an enhanced Z map algorithm. To extract the required geometric information from NC code, the suggested algorithm uses a supersampling method to enhance the efficiency of the simulation process. By executing redundant Boolean operations in a grid cell and averaging down the calculated data, the presented algorithm can accurately represent the material removal volume even when the tool swept volume is negligibly small. Supersampling is the most common form of antialiasing and is usually used with polygon mesh rendering in computer graphics. The key advantage of the enhanced Z map model is that its data structure is the same as that of the conventional Z map model, yet it can achieve higher accuracy and reliability with the same or lower computation time. By simulating machining operations efficiently, this system can be used to improve the reliability and efficiency of the NC machining process as well as the quality of the final product.
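
A minimal sketch of the supersampling idea follows, under assumptions not taken from the paper: the Z map is a list of rows of cell heights, the tool is a flat-end cutter modelled as a vertical cylinder, and each cell is sampled on an `n` by `n` subgrid whose results are averaged back into a single height. The function name and parameters are hypothetical.

```python
def cut_supersampled(zmap, cell, cx, cy, radius, tool_z, n=4):
    """Apply one flat-end tool position to a Z map using n*n subsamples per cell.

    zmap: list of rows of heights, modified in place; cell: cell edge length.
    (cx, cy): tool axis position; radius: tool radius; tool_z: tool bottom height.
    """
    for j, row in enumerate(zmap):
        for i, z in enumerate(row):
            total = 0.0
            for sj in range(n):
                for si in range(n):
                    # Centre of this subsample in world coordinates.
                    x = (i + (si + 0.5) / n) * cell
                    y = (j + (sj + 0.5) / n) * cell
                    inside = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
                    # Boolean cut on the subsample: lower it to the tool bottom.
                    total += min(z, tool_z) if inside else z
            # Average down so the result is still one height per cell.
            row[i] = total / (n * n)
    return zmap


zmap = cut_supersampled([[10.0]], cell=1.0, cx=0.25, cy=0.25, radius=0.1, tool_z=5.0, n=2)
```

In this example the tool covers only one of the four subsamples, so the cell height drops from 10.0 to 8.75. A conventional Z map that samples only the cell centre at (0.5, 0.5) would register no material removal for the same tool position, while the stored structure here remains a plain grid of heights.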

Chip Load Control Using A NC Verification Model Based on Z-Map (Z-map 기반 NC 검증모델을 이용한 칩부하 제어)

  • 백대균;고태조;김희술
    • Proceedings of the Korean Society of Precision Engineering Conference / 2000.11a / pp.801-805 / 2000
  • This paper presents a new method of tool path optimization. An NC verification model based on a Z-map was used to obtain the chip load as feed per tooth. The developed software can regenerate an NC program from the cutting conditions and the NC program generated in CAM. The regenerated NC program contains not only all the data of the original NC program but also new feed rates in every block. The new NC data can reduce the cutting time and produce precision dies with the same chip load in feed per tooth. This method can also prevent tool chipping and keep tool wear constant. The effects of acceleration and deceleration during feed rate changes are also considered.
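
The relation between feed rate and feed per tooth that underlies this kind of adjustment is the standard milling formula: feed rate equals feed per tooth times the number of teeth times the spindle speed. The sketch below shows only that relation, with assumed units of mm/tooth, rev/min, and mm/min. How the paper derives the chip load of each block from the Z-map model is not described in the abstract and is not reproduced here.

```python
def feed_rate(fz, teeth, spindle_rpm):
    """Feed rate in mm/min from feed per tooth fz (mm/tooth)."""
    return fz * teeth * spindle_rpm


def chip_load(feed_mm_min, teeth, spindle_rpm):
    """Feed per tooth (mm/tooth) implied by a programmed feed rate."""
    return feed_mm_min / (teeth * spindle_rpm)


# Example: a 4-tooth cutter at 3000 rev/min with a target of 0.05 mm/tooth.
target_feed = feed_rate(0.05, teeth=4, spindle_rpm=3000)
```

Holding `fz` fixed while the other cutting conditions change from block to block is what yields a different feed rate for each NC block.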

The Effect of Health Education on the Performance of Health Promoting Behavior in E.M.T. Students (건강관련 과목이수가 대학생의 건강증진 생활양식에 미치는 변화에 관한 연구)

  • Lee, In-Soo;Choi, Eun-Sook
    • The Korean Journal of Emergency Medical Services / v.4 no.1 / pp.7-16 / 2000
  • The purpose of this study is to test the effect of health education on the performance of health promoting behavior in E.M.T. students. The data were collected from 77 EMT students by questionnaire. The first survey was conducted from March 20 to April 2. The second survey was conducted from August 8 to September 5 on the same group. The data were analyzed by percentage, mean, and t-test using the SAS program. The results of this study were as follows: 1. The average item score for health promoting behavior was 2.35 in the freshman year. 2. The average item score for health promoting behavior was 2.59 after one year for the same group. Among the subcategories, the highest degrees of performance were in personal relationship support, self-actualization, stress management, nutrition, and health responsibility, and the lowest was in sports. 3. The hypothesis that EMT students who receive health education will show a higher degree of health promoting behavior than freshman EMT students was accepted.

A Study on the Characteristic of Driving Sound Noise for Various Optical Disk Drives (광디스크 드라이브의 종류별 구동소음 특성에 관한 연구)

  • Oh, Se-Won;Kim, Yu-Sung;Kim, Dong-Hyun
    • Transactions of the Korean Society for Noise and Vibration Engineering / v.15 no.10 s.103 / pp.1169-1176 / 2005
  • In this study, experimental tests of the driving noise of various optical disk drives (ODD) have been performed using a 1/2-inch microphone noise measurement system. Several new and old ODD models by different manufacturers are considered and compared for realistic driving conditions. A sound insulation case with absorbing material for the present experimental tests is designed and constructed using the CATIA system. It is found that the average data transfer rate, operating RPM, and sound noise level seem to differ among ODD models with the same denoted speed from different manufacturers. Moreover, the driving sound noise level can be largely affected by both tray shape and driving speed, even under the same apparent data transfer rate.