• Title/Summary/Keyword: map-reduce

Search Result 853, Processing Time 0.036 seconds

An Efficient Implementation of Mobile Raspberry Pi Hadoop Clusters for Robust and Augmented Computing Performance

  • Srinivasan, Kathiravan;Chang, Chuan-Yu;Huang, Chao-Hsi;Chang, Min-Hao;Sharma, Anant;Ankur, Avinash
    • Journal of Information Processing Systems
    • /
    • v.14 no.4
    • /
    • pp.989-1009
    • /
    • 2018
  • Rapid advances in science and technology with exponential development of smart mobile devices, workstations, supercomputers, smart gadgets and network servers has been witnessed over the past few years. The sudden increase in the Internet population and manifold growth in internet speeds has occasioned the generation of an enormous amount of data, now termed 'big data'. Given this scenario, storage of data on local servers or a personal computer is an issue, which can be resolved by utilizing cloud computing. At present, there are several cloud computing service providers available to resolve the big data issues. This paper establishes a framework that builds Hadoop clusters on the new single-board computer (SBC) Mobile Raspberry Pi. Moreover, these clusters offer facilities for storage as well as computing. Besides the fact that the regular data centers require large amounts of energy for operation, they also need cooling equipment and occupy prime real estate. However, this energy consumption scenario and the physical space constraints can be solved by employing a Mobile Raspberry Pi with Hadoop clusters that provides a cost-effective, low-power, high-speed solution along with micro-data center support for big data. Hadoop provides the required modules for the distributed processing of big data by deploying map-reduce programming approaches. In this work, the performance of SBC clusters and a single computer were compared. It can be observed from the experimental data that the SBC clusters exemplify superior performance to a single computer, by around 20%. Furthermore, the cluster processing speed for large volumes of data can be enhanced by escalating the number of SBC nodes. Data storage is accomplished by using a Hadoop Distributed File System (HDFS), which offers more flexibility and greater scalability than a single computer system.

External Merge Sorting in Tajo with Variable Server Configuration (매개변수 환경설정에 따른 타조의 외부합병정렬 성능 연구)

  • Lee, Jongbaeg;Kang, Woon-hak;Lee, Sang-won
    • Journal of KIISE
    • /
    • v.43 no.7
    • /
    • pp.820-826
    • /
    • 2016
  • There is a growing requirement for big data processing which extracts valuable information from a large amount of data. The Hadoop system employs the MapReduce framework to process big data. However, MapReduce has limitations such as inflexible and slow data processing. To overcome these drawbacks, SQL query processing techniques known as SQL-on-Hadoop were developed. Apache Tajo, one of the SQL-on-Hadoop techniques, was developed by a Korean development group. External merge sort is one of the heavily used algorithms in Tajo for query processing. The performance of external merge sort in Tajo is influenced by two parameters, sort buffer size and fanout. In this paper, we analyzed the performance of external merge sort in Tajo with various sort buffer sizes and fanouts. In addition, we figured out that there are two major causes of differences in the performance of external merge sort: CPU cache misses which increase as the sort buffer size grows; and the number of merge passes determined by fanout.

Handling Streaming Data by Using Open Source Framework Storm in IoT Environment (오픈소스 프레임워크 Storm을 활용한 IoT 환경 스트리밍 데이터 처리)

  • Kang, Yunhee
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.7
    • /
    • pp.313-318
    • /
    • 2016
  • To utilize sensory data, it is necessary to design architecture for processing and handling data generated from sensors in an IoT environment. Especially in the IoT environment, a thing connects to the Internet and efficiently enables to communicate a device with diverse sensors. But Hadoop and Twister based on MapReduce are good at handling data in a batch processing. It has a limitation for processing stream data from a sensor in a motion. Traditional streaming data processing has been mainly applied a MoM based message queuing system. It has maintainability and scalability problems because a programmer should consider details related with complex messaging flow. In this paper architecture is designed to handle sensory data aggregated The designed software architecture is used to operate an application on the open source framework Storm. The application is conceptually used to transform streaming data which aggregated via sensor gateway by pipe-filter style.

Development of Application to Deal with Large Data Using Hadoop for 3D Printer (하둡을 이용한 3D 프린터용 대용량 데이터 처리 응용 개발)

  • Lee, Kang Eun;Kim, Sungsuk
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.1
    • /
    • pp.11-16
    • /
    • 2020
  • 3D printing is one of the emerging technologies and getting a lot of attention. To do 3D printing, 3D model is first generated, and then converted to G-code which is 3D printer's operations. Facet, which is a small triangle, represents a small surface of 3D model. Depending on the height or precision of the 3D model, the number of facets becomes very large and so the conversion time from 3D model to G-code takes longer. Apach Hadoop is a software framework to support distributed processing for large data set and its application range gets widening. In this paper, Hadoop is used to do the conversion works time-efficient way. 2-phase distributed algorithm is developed first. In the algorithm, all facets are sorted according to its lowest Z-value, divided into N parts, and converted on several nodes independently. The algorithm is implemented in four steps; preprocessing - Map - Shuffling - Reduce of Hadoop. Finally, to show the performance evaluation, Hadoop systems are set up and converts testing 3D model while changing the height or precision.

MissingFound: An Assistant System for Finding Missing Companions via Mobile Crowdsourcing

  • Liu, Weiqing;Li, Jing;Zhou, Zhiqiang;He, Jiling
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.10
    • /
    • pp.4766-4786
    • /
    • 2016
  • Looking for missing companions who are out of touch in public places might suffer a long and painful process. With the help of mobile crowdsourcing, the missing person's location may be reported in a short time. In this paper, we propose MissingFound, an assistant system that applies mobile crowdsourcing for finding missing companions. Discovering valuable users who have chances to see the missing person is the most important task of MissingFound but also a big challenge with the requirements of saving battery and protecting users' location privacy. A customized metric is designed to measure the probability of seeing, according to users' movement traces represented by WiFi RSSI fingerprints. Since WiFi RSSI fingerprints provide no knowledge of users' physical locations, the computation of probability is too complex for practical use. By parallelizing the original sequential algorithms under MapReduce framework, the selecting process can be accomplished within a few minutes for 10 thousand users with records of several days. Experimental evaluation with 23 volunteers shows that MissingFound can select out the potential witnesses in reality and achieves a high accuracy (76.75% on average). We believe that MissingFound can help not only find missing companions, but other public services (e.g., controlling communicable diseases).

Correspondence Strategy for Big Data's New Customer Value and Creation of Business (빅 데이터의 새로운 고객 가치와 비즈니스 창출을 위한 대응 전략)

  • Koh, Joon-Cheol;Lee, Hae-Uk;Jeong, Jee-Youn;Kim, Kyung-Sik
    • Journal of the Korea Safety Management & Science
    • /
    • v.14 no.4
    • /
    • pp.229-238
    • /
    • 2012
  • Within last 10 years, internet has become a daily activity, and humankind had to face the Data Deluge, a dramatic increase of digital data (Economist 2012). Due to exponential increase in amount of digital data, large scale data has become a big issue and hence the term 'big data' appeared. There is no official agreement in quantitative and detailed definition of the 'big data', but the meaning is expanding to its value and efficacy. Big data not only has the standardized personal information (internal) like customer information, but also has complex data of external, atypical, social, and real time data. Big data's technology has the concept that covers wide range technology, including 'data achievement, save/manage, analysis, and application'. To define the connected technology of 'big data', there are Big Table, Cassandra, Hadoop, MapReduce, Hbase, and NoSQL, and for the sub-techniques, Text Mining, Opinion Mining, Social Network Analysis, Cluster Analysis are gaining attention. The three features that 'bid data' needs to have is about creating large amounts of individual elements (high-resolution) to variety of high-frequency data. Big data has three defining features of volume, variety, and velocity, which is called the '3V'. There is increase in complexity as the 4th feature, and as all 4features are satisfied, it becomes more suitable to a 'big data'. In this study, we have looked at various reasons why companies need to impose 'big data', ways of application, and advanced cases of domestic and foreign applications. To correspond effectively to 'big data' revolution, paradigm shift in areas of data production, distribution, and consumption is needed, and insight of unfolding and preparing future business by considering the unpredictable market of technology, industry environment, and flow of social demand is desperately needed.

The outcome of surfactant replacement therapy in above nearterm neonates with severe pulmonary disease (준 만삭 이상아에서 폐표면 활성제 보충요법의 성적)

  • Shon, Su-Min;Lee, Bo-Young;Kim, Chun-Soo;Lee, Sang-Lak;Kwon, Tae-Chan
    • Clinical and Experimental Pediatrics
    • /
    • v.50 no.12
    • /
    • pp.1200-1205
    • /
    • 2007
  • Purpose : We performed this study to investigate the outcome of surfactant replacement therapy (SRT) in above nearterm neonates who were required mechanical ventilatory care due to meconium aspiration pneumonia (MAP), respiratory distress syndrome (RDS) or other severe pneumonia (PN). Methods : 48 patients, gestational period ${\geq}36weeks$, who were admitted in NICU of Dongsan Medical Center, Keimyung University between July 1999 and June 2004 were enrolled. They were divided into three groups, MAP group (15 cases), RDS group (27 cases) and PN group (6 cases). All patients were received SRT and evaluated several clinical data (gestational age, oxygen index, duration of ventilator care) and outcome (complications and mortality rate) between pre-SRT and post-SRT. The mean dose of surfactant (modified bovine surfactant, Newfacten, Yuhan Co., Seoul, Korea) was 120 mg/kg. Results : Among each groups, mean pre-SRT OI was higher in MAP group ($21{\pm}3.2$) than other groups, mean duration (days) of ventilatory care and oxygen therapy were similar distributions. Compared with pre-SRT values, significant improvements (P<0.05) in mean values for FiO2 and oxygenation index were documented at 12 hours after SRT. Early complications (persistent pulmonary hypertension of newborm, pneumothorax) and survival rate were lower in MAP group. Within RDS group, earlier SRT (given before 12 hours of life) revealed significantly lower early complication rate than later SRT (given after 12 hours of life) (13.3% vs 58.3%, P<0.05) Conclusion : Our study suggested that SRT seems to be an effective therapy in above nearterm neonates with severe pulmonary disease, and earlier SRT tends to reduce complications in RDS group than later therapy.

Evaluation of the Usefulness of MapPHAN for the Verification of Volumetric Modulated Arc Therapy Planning (용적세기조절회전치료 치료계획 확인에 사용되는 MapPHAN의 유용성 평가)

  • Woo, Heon;Park, Jang Pil;Min, Jae Soon;Lee, Jae Hee;Yoo, Suk Hyun
    • The Journal of Korean Society for Radiation Therapy
    • /
    • v.25 no.2
    • /
    • pp.115-121
    • /
    • 2013
  • Purpose: Latest linear accelerator and the introduction of new measurement equipment to the agency that the introduction of this equipment in the future, by analyzing the process of confirming the usefulness of the preparation process for applying it in the clinical causes some problems, should be helpful. Materials and Methods: All measurements TrueBEAM STX (Varian, USA) was used, and a file specific to each energy, irradiation conditions, the dose distribution was calculated using a computerized treatment planning equipment (Eclipse ver 10.0.39, Varian, USA). Measuring performance and cause errors in MapCHECK 2 were analyzed and measured against. In order to verify the performance of the MapCHECK 2, 6X, 6X-FFF, 10X, 10X-FFF, 15X field size $10{\times}10$ cm, gantry $0^{\circ}$, $180^{\circ}$ direction was measured by the energy. IGRT couch of the CT values affect the measurements in order to confirm, CT number values : -800 (Carbon) & -950 (COUCH in the air), -100 & 6X-950 in the state for FFF, 15X of the energy field sizes $10{\times}10$, gantry $180^{\circ}$, $135^{\circ}$, $275^{\circ}$ directionwas measured at, MapPHAN allocated to confirm the value of HU were compared, using the treatment planning computer for, Measurement error problem by the sharp edges MapPHAN Learn gantry direction MapPHAN of dependence was measured in three ways. GANTRY $90^{\circ}$, $270^{\circ}$ in the direction of the vertically erected settings 6X-FFF, 15X respectively, and Setting the state established as a horizontal field sizes $10{\times}10$, $90^{\circ}$, $45^{\circ}$, $315^{\circ}$, $270^{\circ}$ of in the direction of the energy-6X-FFF, 15X, respectively, were measured. Without intensity modulated beam of the third open arc were investigated. Results: Of basic performance MapCHECK confirm the attenuation measured by Couch, measured from the measured HU values that are assigned to the MAP-PHAN, check for calculation accuracy for the angled edge of the MapPHAN all come in a range of valid measurement errors do not affect the could see. three ways for the Gantry direction dependence, the first of the meter built into the value of the Gantry $270^{\circ}$ (relative $0^{\circ}$), $90^{\circ}$ (relative $180^{\circ}$), 6X-FFF, 15X from each -1.51, 0.83% and -0.63, -0.22% was not affected by the AP/PA direction represented. Setting the meter horizontally Gantry $90^{\circ}$, $270^{\circ}$ from the couch, Energy 6X-FFF 4.37, 2.84%, 15X, -9.63, -13.32% the difference. By-side direction measurements MapPHAN in value is not within the valid range can not, because that could be confirmed as gamma pass rate 3% of the value is greater than the value shown. You can check the Open Arc 6X-FFF, 15X energy, field size $10{\times}10$ cm $360^{\circ}$ rotation of the dose distribution in the state to look at nearly 90% pass rate to emerge. Conclusion: Based on the above results, the MapPHAN gantry direction dependence by side in the direction of the beam relative dose distribution suitable for measuring the gamma value, but accurate measurement of the absolute dose can not be considered is. this paper, a more accurate treatment plan in order to confirm, Reduce the tolerance for VMAT, such as lateral rotation investigation in order to measure accurate absolute isodose using a combination of IMF (Isocentric Mounting Fixture) MapCHEK 2, will be able to minimize the impact due to the angular dependence.

  • PDF

Speech Enhancement Using Phase-Dependent A Priori SNR Estimator in Log-Mel Spectral Domain

  • Lee, Yun-Kyung;Park, Jeon Gue;Lee, Yun Keun;Kwon, Oh-Wook
    • ETRI Journal
    • /
    • v.36 no.5
    • /
    • pp.721-729
    • /
    • 2014
  • We propose a novel phase-based method for single-channel speech enhancement to extract and enhance the desired signals in noisy environments by utilizing the phase information. In the method, a phase-dependent a priori signal-to-noise ratio (SNR) is estimated in the log-mel spectral domain to utilize both the magnitude and phase information of input speech signals. The phase-dependent estimator is incorporated into the conventional magnitude-based decision-directed approach that recursively computes the a priori SNR from noisy speech. Additionally, we reduce the performance degradation owing to the one-frame delay of the estimated phase-dependent a priori SNR by using a minimum mean square error (MMSE)-based and maximum a posteriori (MAP)-based estimator. In our speech enhancement experiments, the proposed phase-dependent a priori SNR estimator is shown to improve the output SNR by 2.6 dB for both the MMSE-based and MAP-based estimator cases as compared to a conventional magnitude-based estimator.

An Adaptive Bit-reduced Mean Absolute Difference Criterion for Block-Matching Algorithm and Its VlSI Implementation (블럭 정합 알고리즘을 위한 적응적 비트 축소 MAD 정합 기준과 VLSI 구현)

  • Oh, Hwang-Seok;Baek, Yun-Ju;Lee, Heung-Kyu
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.5
    • /
    • pp.543-550
    • /
    • 2000
  • An adaptive bit-reduced mean absolute difference (ABRMAD) is presented as a criterion for the block-matching algorithm (BMA) to reduce the complexity of the VLSI Implementation and to improve the processing time. The ABRMAD uses the lower pixel resolution of the significant bits instead of full resolution pixel values to estimate the motion vector (MV) by examining the pixels Ina block. Simulation results show that the 4-bit ABRMAD has competitive mean square error (MSE)results and a half less hardware complexity than the MAD criterion, It has also better characteristics in terms of both MSE performance and hardware complexity than the Minimax criterion and has better MSE performance than the difference pixel counting(DPC), binary block-matching with edge-map(BBME), and bit-plane matching(BPM) with the same number of bits.

  • PDF