• Title/Summary/Keyword: Parallel/Distributed Computing (병렬/분산 컴퓨팅)


Processing large-scale data with Apache Spark (Apache Spark를 활용한 대용량 데이터의 처리)

  • Ko, Seyoon;Won, Joong-Ho
    • The Korean Journal of Applied Statistics, v.29 no.6, pp.1077-1094, 2016
  • Apache Spark is a fast, general-purpose cluster computing package. It provides a new abstraction called the resilient distributed dataset (RDD), which supports fault tolerance while keeping data in memory. This abstraction yields a significant speedup over the legacy large-scale data framework, MapReduce. In particular, the Spark framework is well suited to iterative machine learning applications such as logistic regression and K-means clustering, and to interactive data querying. Thanks to its versatility, Spark also provides high-level libraries for machine learning, streaming data processing, database querying, and graph data mining. In this work, we introduce the concept and programming model of Spark and show implementations of simple statistical computing applications. We also review the machine learning package MLlib and the R language interface SparkR.
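The RDD programming model and the MLlib library mentioned in this abstract can be sketched in a few lines of PySpark. This is an illustrative example, not code from the paper; the data, application name, and parameters are made up.

```python
# Minimal PySpark sketch (not from the paper): an in-memory RDD statistic
# and an iterative K-means fit via MLlib, assuming a local Spark installation.
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("spark-intro-sketch").getOrCreate()

# RDDs are cached in memory, so repeated passes avoid re-reading the data.
rdd = spark.sparkContext.parallelize([(i % 3, float(i)) for i in range(1000)]).cache()
group_means = (rdd.mapValues(lambda v: (v, 1))
                  .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
                  .mapValues(lambda s: s[0] / s[1]))
print(group_means.collect())

# Iterative machine learning via the high-level MLlib API.
df = spark.createDataFrame([(float(i % 3), float(i % 7)) for i in range(1000)], ["x", "y"])
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)
model = KMeans(k=3, seed=1).fit(features)
print(model.clusterCenters())

spark.stop()
```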

Enhancing the performance of taxi application based on in-memory data grid technology (In-memory data grid 기술을 활용한 택시 애플리케이션 성능 향상 기법 연구)

  • Choi, Chi-Hwan;Kim, Jin-Hyuk;Park, Min-Kyu;Kwon, Kaaen;Jung, Seung-Hyun;Nazareno, Franco;Cho, Wan-Sup
    • Journal of the Korean Data and Information Science Society, v.26 no.5, pp.1035-1045, 2015
  • Recent studies in big data analysis show promising results when main memory is used for rapid data processing. In-memory computing technology is highly advantageous on high-performance servers with tens of gigabytes of RAM and multi-core processors, and its network constraints can be lessened by combining it with distributed parallel processing. This paper applies this concept to a test taxi-hailing application while retaining its underlying RDBMS structure. Introducing IMDG technology in the application's backend API, without restructuring the database schema, yields a 6- to 9-fold increase in data processing performance and throughput. In particular, throughput degrades only slightly as the data load increases.
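The abstract does not give implementation details; as a rough illustration of the pattern it describes (an in-memory grid in front of an unchanged RDBMS), the sketch below shows a read-through cache layer. The class names and the sqlite3 backing store are hypothetical stand-ins.

```python
# Hypothetical read-through cache sketch: an in-memory store in front of an
# unchanged relational database, in the spirit of the IMDG pattern described above.
import sqlite3

class InMemoryGrid:
    """A toy stand-in for an IMDG node: dict-backed, keyed by primary key."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value

class TaxiRideDAO:
    """Backend API layer: serves reads from the grid, falls back to the RDBMS."""
    def __init__(self, grid, conn):
        self.grid, self.conn = grid, conn

    def find_ride(self, ride_id):
        cached = self.grid.get(ride_id)
        if cached is not None:
            return cached                      # served from memory, no DB round trip
        row = self.conn.execute(
            "SELECT id, driver, fare FROM rides WHERE id = ?", (ride_id,)).fetchone()
        if row is not None:
            self.grid.put(ride_id, row)        # populate the grid on first access
        return row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (id INTEGER PRIMARY KEY, driver TEXT, fare REAL)")
conn.execute("INSERT INTO rides VALUES (1, 'kim', 5400.0)")
dao = TaxiRideDAO(InMemoryGrid(), conn)
print(dao.find_ride(1))   # first call hits the RDBMS
print(dao.find_ride(1))   # second call is served from the in-memory grid
```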

Resource Availability-based Multi Auction Model for Cloud Service Reservation and Resource Brokering System (자원 가용성 기반 다중 경매 모델을 이용한 서비스 예약형 클라우드 자원 거래 시스템)

  • Lee, Seok Woo;Kim, Tae Young;Lee, Jong Sik
    • Journal of the Korea Society for Simulation, v.23 no.1, pp.1-10, 2014
  • Cloud computing is a form of parallel and distributed computing that provides services to users through virtual resources. However, users' service requests do not follow a regular time pattern, so each resource shows a different availability at any given time. This difference affects quality of service (QoS) and resource selection for users. We therefore propose a resource availability-based multi-auction model for a cloud service reservation and resource brokering system. The proposed system selects an appropriate resource provider based on users' requests. It adopts a multi-phase auction to trade resources: the system evaluates the availability factor of each resource during the auction phase and finally reserves the service on an adaptive queue. The proposed model shows better performance than existing methods.
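The paper's exact auction mechanics are not spelled out in the abstract; the following sketch only illustrates the general idea of weighting a provider's availability factor against its price when selecting a winner. The scoring rule, class names, and numbers are assumptions.

```python
# Hypothetical sketch of availability-weighted provider selection, loosely in the
# spirit of the multi-auction model described above (the scoring rule is assumed).
from dataclasses import dataclass

@dataclass
class Bid:
    provider: str
    price: float          # asking price per unit of service
    availability: float   # estimated availability factor in [0, 1]

def select_provider(bids, alpha=0.5):
    """Rank bids by a weighted combination of low price and high availability."""
    max_price = max(b.price for b in bids)
    def score(b):
        return alpha * (1.0 - b.price / max_price) + (1.0 - alpha) * b.availability
    return max(bids, key=score)

bids = [Bid("provider-a", 10.0, 0.60),
        Bid("provider-b", 12.0, 0.95),
        Bid("provider-c", 9.0, 0.40)]
winner = select_provider(bids)
print(f"reserve service on {winner.provider}")
```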

Fast Hologram Generating of 3D Object with Super Multi-Light Source using Parallel Distributed Computing (병렬 분산 컴퓨팅을 이용한 초다광원 3차원 물체의 홀로그램 고속 생성)

  • Song, Joongseok;Kim, Changseob;Park, Jong-Il
    • Journal of Broadcast Engineering, v.20 no.5, pp.706-717, 2015
  • The computer-generated hologram (CGH) method can generate a hologram using only a commodity personal computer (PC). However, the CGH method requires an enormous amount of computation time for a 3D object with a super multi-light source or for a high-definition hologram, so either the computational complexity of the CGH algorithm must be reduced or the computing performance of the hardware must be increased. In this paper, we propose a method that generates a digital hologram of a 3D object with a super multi-light source using parallel distributed computing. Traditional methods are limited by the computing power of a single PC, whereas the proposed method, in which a server PC efficiently exploits the computing power of client PCs, can quickly compute the CGH for such objects. In our experiments, the proposed method generated a digital hologram of 1,536×1,536 resolution for a 3D object with 157,771 light sources in 121 ms. In addition, we verify that the proposed method reduces the hologram generation time in proportion to the number of client PCs.
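The parallelization strategy described above, splitting the light sources among workers and summing their partial holograms on the server, can be sketched as follows. Local processes stand in for the client PCs, and the Fresnel-style phase term is a textbook approximation rather than the paper's exact CGH formulation.

```python
# Sketch of splitting CGH work over workers: each worker accumulates the fringe
# pattern of its share of light sources and the "server" sums the partial results.
import numpy as np
from multiprocessing import Pool

RES = 256                 # hologram resolution (RES x RES), small for the sketch
PITCH = 10e-6             # pixel pitch [m]
WAVELEN = 532e-9          # wavelength [m]

def partial_hologram(points):
    """Accumulate the contribution of a chunk of (x, y, z) light sources."""
    ys, xs = np.meshgrid(np.arange(RES) * PITCH, np.arange(RES) * PITCH, indexing="ij")
    acc = np.zeros((RES, RES))
    k = 2.0 * np.pi / WAVELEN
    for (px, py, pz) in points:
        r = ((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * pz)   # Fresnel approximation
        acc += np.cos(k * r)
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = np.column_stack([rng.uniform(0, RES * PITCH, 1000),
                              rng.uniform(0, RES * PITCH, 1000),
                              rng.uniform(0.1, 0.2, 1000)])
    chunks = np.array_split(points, 4)                       # one chunk per "client"
    with Pool(4) as pool:
        hologram = sum(pool.map(partial_hologram, chunks))   # server-side reduction
    print(hologram.shape)
```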

Design and Implementation of a Large-Scale Spatial Reasoner Using MapReduce Framework (맵리듀스 프레임워크를 이용한 대용량 공간 추론기의 설계 및 구현)

  • Nam, Sang Ha;Kim, In Cheol
    • KIPS Transactions on Software and Data Engineering, v.3 no.10, pp.397-406, 2014
  • In order to answer questions successfully on behalf of humans in DeepQA environments such as the American quiz show Jeopardy!, a computer must be capable of fast temporal and spatial reasoning over a large-scale commonsense knowledge base. In this paper, we present a scalable spatial reasoning algorithm that efficiently derives new directional and topological relations using the MapReduce framework, a well-known parallel and distributed computing environment. The proposed algorithm takes as input a large-scale spatial knowledge base containing CSD-9 directional relations and RCC-8 topological relations. To infer new directional and topological relations from the given knowledge base, it performs cross-consistency checks as well as path-consistency checks. To maximize the parallelism of the reasoning computation according to the MapReduce principle, the algorithm partitions the large knowledge base into smaller ones and distributes them over multiple computing nodes in the map phase; in the reduce phase, it infers new knowledge from the distributed spatial knowledge bases. Experiments on a sample knowledge base with a MapReduce-based implementation of the algorithm demonstrate the high performance of our large-scale spatial reasoner.
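To make the map/reduce split concrete, here is a toy pass that groups relation triples by a shared pivot region (map) and composes the pairs inside each group (reduce). The tiny composition table covers only a few RCC-8-style entries and is purely illustrative; it is not the paper's CSD-9/RCC-8 reasoner.

```python
# Toy map/reduce-style composition pass: partition facts by pivot region (map),
# then compose incoming/outgoing relations within each group (reduce).
from collections import defaultdict
from itertools import product

# Very small, simplified subset of a composition table: (r1, r2) -> composed relation.
COMPOSE = {
    ("TPP", "TPP"): "PP",     # tangential proper part chained twice is a proper part
    ("NTPP", "NTPP"): "NTPP",
    ("EQ", "TPP"): "TPP",
}

facts = [("a", "TPP", "b"), ("b", "TPP", "c"), ("c", "NTPP", "d")]

# Map: key each fact by both endpoints so pairs sharing a pivot meet in one group.
groups = defaultdict(lambda: ([], []))
for (x, rel, y) in facts:
    groups[y][0].append((x, rel, y))   # facts arriving at the pivot
    groups[x][1].append((x, rel, y))   # facts leaving the pivot

# Reduce: within each pivot group, compose incoming with outgoing relations.
inferred = set()
for pivot, (incoming, outgoing) in groups.items():
    for (x, r1, _), (_, r2, z) in product(incoming, outgoing):
        if (r1, r2) in COMPOSE and x != z:
            inferred.add((x, COMPOSE[(r1, r2)], z))

print(sorted(inferred))   # here: [('a', 'PP', 'c')]
```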

Implementation of Massive FDTD Simulation Computing Model Based on MPI Cluster for Semi-conductor Process (반도체 검증을 위한 MPI 기반 클러스터에서의 대용량 FDTD 시뮬레이션 연산환경 구축)

  • Lee, Seung-Il;Kim, Yeon-Il;Lee, Sang-Gil;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association, v.15 no.9, pp.21-28, 2015
  • In the semiconductor process, simulation is performed to detect defects by analyzing the behavior of impurities through physical-quantity calculations of the inner elements, using the Finite-Difference Time-Domain (FDTD) algorithm. As semiconductors built from nanoscale elements continue to improve, the simulation size keeps growing, so a single processor such as a CPU or GPU may be unable to perform the simulation because of the massive matrices, and even a computer with multiple processors may be unable to handle a massive FDTD problem. Parallel and distributed computing has been studied for such problems, but previous work used only a single type of processor: GPUs are fast but have limited memory, while CPUs are slower than GPUs. To solve this, we implemented a computing model that can handle FDTD simulations of any size on a cluster of heterogeneous processors. We tested the simulation using MPI libraries based on point-to-point communication and verified that it operates correctly regardless of the number and type of nodes. We also analyzed the performance by measuring the total execution time and the time spent in specific phases of the simulation in each test.
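A minimal sketch of the point-to-point pattern the abstract refers to: a 1D FDTD grid is split across MPI ranks and each time step exchanges one halo cell with its neighbours via Sendrecv. The grid size, source, and coefficients are illustrative, not the paper's setup; run with e.g. `mpiexec -n 4 python fdtd1d.py`.

```python
# 1D FDTD with domain decomposition and point-to-point halo exchange (mpi4py).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

N = 200                       # cells owned by this rank
SC = 0.5                      # Courant number in normalized units
ez = np.zeros(N + 1)          # ez[0:N] owned, ez[N] is the right halo
hy = np.zeros(N + 1)          # hy[1:N+1] owned, hy[0] is the left halo

for step in range(1000):
    # Point-to-point halo exchange for the H update: neighbour's first Ez cell.
    comm.Sendrecv(ez[0:1], dest=left, recvbuf=ez[N:N + 1], source=right)
    hy[1:N + 1] += SC * (ez[1:N + 1] - ez[0:N])

    # Point-to-point halo exchange for the E update: neighbour's last Hy cell.
    comm.Sendrecv(hy[N:N + 1], dest=right, recvbuf=hy[0:1], source=left)
    ez[0:N] += SC * (hy[1:N + 1] - hy[0:N])

    if rank == 0:
        ez[50] += np.exp(-((step - 30) / 10.0) ** 2)   # soft Gaussian source

local_energy = float(np.sum(ez[0:N] ** 2))
total = comm.reduce(local_energy, op=MPI.SUM, root=0)
if rank == 0:
    print(f"total field energy across {size} ranks: {total:.4f}")
```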

Enhanced NOW-Sort on a PC Cluster with a Low-Speed Network (저속 네트웍 PC 클러스터상에서 NOW-Sort의 성능향상)

  • Kim, Ji-Hyoung;Kim, Dong-Seung
    • Journal of KIISE: Computer Systems and Theory, v.29 no.10, pp.550-560, 2002
  • External sort on cluster computers requires not only fast internal sorting but also careful scheduling of disk input/output and interprocessor communication over the network, because the overall execution time reflects all of these jobs, and the portion spent on interprocessor communication and disk I/O is significant. In this paper, we improve the sorting throughput on a cluster of PCs with a low-speed network by developing a new algorithm that distributes the load evenly among processors and overlaps the disk read and write operations with other computation and communication activities during the sort. Experimental results support the effectiveness of the algorithm: we observe that it reduces the sort time by 45% compared to the previous NOW-Sort [1] and scales better as computing nodes are added to the cluster.
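The key distribution step in NOW-Sort-style external sorts can be sketched on a single machine: records are bucketed by key range (one bucket per node), each bucket is sorted locally, and the concatenation of buckets is globally sorted. Splitter selection from a sample and the disk/communication overlap that the paper optimizes are simplified away here.

```python
# Single-machine sketch of range partitioning + local sort for a cluster sort.
import random

NUM_NODES = 4

def make_splitters(sample, num_nodes):
    """Pick key-range splitters from a sample so buckets are roughly even."""
    s = sorted(sample)
    return [s[(i + 1) * len(s) // num_nodes - 1] for i in range(num_nodes - 1)]

def bucket_of(key, splitters):
    for i, sp in enumerate(splitters):
        if key <= sp:
            return i
    return len(splitters)

records = [random.randint(0, 10**6) for _ in range(100_000)]
splitters = make_splitters(random.sample(records, 1000), NUM_NODES)

# "Communication" phase: every record is sent to the node owning its key range.
buckets = [[] for _ in range(NUM_NODES)]
for key in records:
    buckets[bucket_of(key, splitters)].append(key)

# Local sort on each node; concatenating the buckets yields a globally sorted run.
result = [key for b in buckets for key in sorted(b)]
assert result == sorted(records)
print([len(b) for b in buckets])   # bucket sizes show how even the load is
```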

A Design and Implementation of a Grid Job Monitoring Service Based on the OGSA(Open Grid Service Architecture) (OGSA(Open Grid Service Architecture)에 기반한 그리드 작업 모니터링 서비스 설계 및 구현)

  • Hahm, Jae-Gyoon;Kwon, Ok-Kyoung;Kim, Sang-Wan;Park, Hyoung-Woo
    • Proceedings of the Korea Information Processing Society Conference, 2003.11a, pp.213-216, 2003
  • Grid middleware, which plays a key role in grid computing, must be convenient for users: even when a user has no knowledge of the location or availability of the resources needed for a computation, resources should be allocated autonomously. In particular, most grid jobs are parallel jobs that use multiple distributed resources simultaneously, so job monitoring in this environment should provide an integrated service designed around user convenience. OGSA (Open Grid Service Architecture) brings the web-service concept to the grid and greatly improves the extensibility of grid services and the ease of implementing them. Developing grid services with OGSA makes it easier both for users to use the middleware directly and for developers to build user applications. In this paper, we therefore implemented a grid job monitoring service that provides users with an integrated monitoring service based on OGSA.


Design and Implementation of Precision Time Synchronization in Wireless Networks Using ZigBee (ZigBee를 이용한 무선 네트워크 환경에서의 정밀 시각 동기 기법 설계 및 구현)

  • Cho, Hyun-Tae;Son, Sang-Hyun;Baek, Yun-Ju
    • The Journal of Korean Institute of Communications and Information Sciences, v.33 no.5A, pp.561-570, 2008
  • Time synchronization is essential for many network applications such as high-speed communication and parallel/distributed processing systems. As the era of ubiquitous computing arrives, highly precise time synchronization in wireless networks has become necessary. This paper presents the design and implementation of high-precision time synchronization in wireless networks using ZigBee. To achieve the precision requirements, we analyze and reduce error sources such as the latency and jitter of the protocol stack in wireless environments. The paper also includes experiments and performance evaluations of our system: the nodes in a network maintain their clocks to within a 50-nanosecond offset from the reference clock.
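The paper's ZigBee-level mechanism is hardware-specific; for background, the textbook two-way timestamp exchange below shows how a clock offset and path delay are estimated from four timestamps (an NTP/PTP-style formula, not necessarily the paper's method, which additionally compensates for protocol-stack latency and jitter).

```python
# Generic two-way timestamp exchange: estimate clock offset and round-trip delay.
def estimate_offset_and_delay(t1, t2, t3, t4):
    """t1: request sent (client clock), t2: request received (server clock),
    t3: reply sent (server clock),   t4: reply received (client clock)."""
    offset = ((t2 - t1) + (t3 - t4)) / 2.0   # how far the client clock lags the server
    delay = (t4 - t1) - (t3 - t2)            # round-trip path delay
    return offset, delay

# Example: the client clock runs 5 us behind; the one-way path delay is 2 us.
t1 = 100.0            # client sends at 100 us (client clock)
t2 = 107.0            # server receives at 107 us (server clock)
t3 = 110.0            # server replies at 110 us (server clock)
t4 = 107.0            # client receives at 107 us (client clock)
offset, delay = estimate_offset_and_delay(t1, t2, t3, t4)
print(f"offset = {offset} us, round-trip delay = {delay} us")   # 5.0 us, 4.0 us
```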

Visualization System for Natural Disaster Data (자연재난 데이터 실감 가시화 시스템)

  • Kim, Jongyong;Jeong, Seokcheol;Lee, Gyeweon;Cho, Joonyoung;Kim, Dongwook;Park, Sanghun
    • Journal of the Korea Computer Graphics Society, v.24 no.3, pp.21-31, 2018
  • We introduce a system that visualizes natural disaster data such as typhoons, tsunamis, floods, and inundation quickly and effectively, to support informed decision-making in disaster situations. Data containing disaster information ranges from a few hundred megabytes to tens or hundreds of gigabytes, which a single PC cannot handle, so the system is implemented as a client-server service in which results are generated on high-performance servers. The server, running on a high-performance cluster, handles client requests and sends the visualization results back to the client. Clients specify a desired time frame and receive the results as images, videos, or 3D graphics models, and can view them effectively through a user-friendly GUI.
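The abstract does not specify the service interface; the sketch below only illustrates the client-server shape it describes, with the client requesting a time frame and output format and a stub server answering in place of the rendering cluster. The endpoint, query parameters, and port are assumptions.

```python
# Hypothetical client-server sketch for a time-frame visualization request.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class VisHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        q = parse_qs(urlparse(self.path).query)
        # In the real system the cluster would render typhoon/flood data here;
        # this stub just echoes the requested time frame and output format.
        body = json.dumps({"start": q["start"][0], "end": q["end"][0],
                           "format": q.get("format", ["image"])[0],
                           "status": "rendered"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):      # keep the sketch quiet
        pass

server = HTTPServer(("127.0.0.1", 8099), VisHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: request a visualization for a specific time frame.
url = ("http://127.0.0.1:8099/visualize"
       "?start=2018-08-01T00:00&end=2018-08-02T00:00&format=video")
with urllib.request.urlopen(url) as resp:
    print(json.loads(resp.read()))
server.shutdown()
```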