Search | Korea Science

Analysis of Scalable Triple Repository Architecture for Big Data (대용량 데이터 기반 트리플 저장소 아키텍처 분석)

Kim, Tae-Hong;Um, Jung-Ho;Cho, Min-Hee;Choi, Sung-Pil;Jung, Han-Min
- Proceedings of the Korean Information Science Society Conference
- 2012.06b
- pp.423-425
- 2012
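As research on analyzing unstructured data has progressed, the volume of triple data has grown explosively. This growth is causing bottlenecks in service infrastructure, and distributed parallel architectures are drawing attention as a solution. To select the system architecture best suited to the characteristics of a triple repository that can store, load, query, and reason over large-scale Semantic Web resources, this paper focuses on analyzing federated DBMS and MapReduce in terms of large-scale processing capacity, data processing speed, and stability. The analysis results were used as factors for judging the characteristics of a big-data triple repository, the flexibility of the architecture, and the potential for future performance improvement, and on that basis the MapReduce approach was selected as the one suited to a large-scale triple repository. This study is valuable as foundational research for setting the direction of big-data triple repository development.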

User-based Collaborative Filtering Recommender Technique using MapReduce (맵리듀스를 이용한 사용자 기반 협업 필터링 추천 기법)

Yun, So-young;Youn, Sung-dae
- Proceedings of the Korean Institute of Information and Communication Sciences Conference
- 2015.10a
- pp.331-333
- 2015
Data is increasing explosively with the spread of networks and mobile devices, and existing recommendation techniques have trouble processing the rapidly growing data effectively. Research is therefore being conducted on how to solve the scalability problem of collaborative filtering. This paper applies MapReduce, a distributed parallel processing framework, to collaborative filtering to reduce the scalability problem and improve accuracy. The proposed technique applies MapReduce and an index technique to user-based collaborative filtering; because it improves both the number of neighbors used in similarity calculations and the suitability of those neighbors, improvements in scalability and accuracy can be expected.
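As a rough illustration of how user-to-user similarity can be phrased as map and reduce steps, the following Python sketch computes cosine similarity between users. It is not the authors' implementation: the index technique from the abstract is not modeled, the ratings are made-up toy data, and the function names (`map_by_item`, `shuffle`, `reduce_to_user_pairs`, `user_similarities`) are hypothetical. The `shuffle` function simply stands in for the grouping a MapReduce framework would perform between the two phases.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy ratings as (user, item, rating); not data from the paper.
RATINGS = [
    ("alice", "m1", 5.0), ("alice", "m2", 3.0),
    ("bob", "m1", 4.0), ("bob", "m2", 2.0), ("bob", "m3", 5.0),
    ("carol", "m2", 5.0), ("carol", "m3", 1.0),
]


def map_by_item(ratings):
    """Map phase: key each rating by item so co-raters land in one group."""
    for user, item, rating in ratings:
        yield item, (user, rating)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_to_user_pairs(groups):
    """Reduce phase: per item, emit the rating product for every user pair."""
    for raters in groups.values():
        # Sorting by user name gives each pair a single canonical order.
        for (u, ru), (v, rv) in combinations(sorted(raters), 2):
            yield (u, v), ru * rv


def user_similarities(ratings):
    """Cosine similarity between every pair of users sharing an item."""
    norms = defaultdict(float)
    for user, _, rating in ratings:
        norms[user] += rating * rating
    dots = defaultdict(float)
    for pair, product in reduce_to_user_pairs(shuffle(map_by_item(ratings))):
        dots[pair] += product
    return {
        (u, v): dot / (sqrt(norms[u]) * sqrt(norms[v]))
        for (u, v), dot in dots.items()
    }
```

Calling `user_similarities(RATINGS)` returns a dictionary keyed by user pairs; the highest-scoring pairs for a given user would then serve as that user's neighbors when predicting ratings.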

Sim-Hadoop : Leveraging Hadoop Distributed File System and Parallel I/O for Reliable and Efficient N-body Simulations (Sim-Hadoop : 신뢰성 있고 효율적인 N-body 시뮬레이션을 위한 Hadoop 분산 파일 시스템과 병렬 I / O)

Awan, Ammar Ahmad;Lee, Sungyoung;Chung, Tae Choong
- Proceedings of the Korea Information Processing Society Conference
- 2013.05a
- pp.476-477
- 2013
Gadget-2 is a scientific simulation code that has been used for many different types of simulations, such as colliding galaxies, cluster formation, and the popular Millennium Simulation. The code is written in C and parallelized with the Message Passing Interface (MPI). There is also a Java adaptation of the original code, written using MPJ Express, called Java Gadget. Java Gadget writes a lot of checkpoint data, which may or may not use the HDF-5 file format. Since HDF-5 is MPI-IO compliant, we can use our MPJ-IO library to read and write the checkpoint files in parallel and improve I/O performance. Additionally, to make code execution more reliable, we propose using the Hadoop Distributed File System (HDFS) for writing the intermediate data (checkpoint files) and final data (output files). The current code writes and reads the input, output, and checkpoint files sequentially, which can easily become a bottleneck for large-scale simulations. In this paper, we propose Sim-Hadoop, a framework that leverages HDFS and MPJ-IO to improve the I/O performance of the Java Gadget code.
https://doi.org/10.3745/PKIPS.y2013m05a.476

Real-Time Stock Price Prediction using Apache Spark (Apache Spark를 활용한 실시간 주가 예측)

Dong-Jin Shin;Seung-Yeon Hwang;Jeong-Joon Kim
- The Journal of the Institute of Internet, Broadcasting and Communication
- v.23 no.4
- pp.79-84
- 2023
Apache Spark, which provides the fastest processing speed among recent distributed and parallel processing technologies, offers both real-time (streaming) functions and machine learning functions. Although the official documentation covers each of these functions, it does not describe how to combine them to predict a specific value in real time. In this paper, we therefore study how to predict data values in real time by combining these functions. In the overall configuration, stock price data are first collected by downloading them with the Python programming language. A regression model is then built with the machine learning function, and the adjusted closing price in the stock price data is predicted in real time by combining the real-time streaming function with the machine learning function.
https://doi.org/10.7236/JIIBC.2023.23.4.79

An Improved Depth-Based TDMA Scheduling Algorithm for Industrial WSNs to Reduce End-to-end Delay (산업 무선 센서 네트워크에서 종단 간 지연시간 감소를 위한 향상된 깊이 기반 TDMA 스케줄링 개선 기법)

Lee, Hwakyung;Chung, Sang-Hwa;Jung, Ik-Joo
- Journal of KIISE
- v.42 no.4
- pp.530-540
- 2015
Industrial WSNs need great performance and reliable communication. In industrial WSNs, cluster structure reduces the cost to form a network, and the reservation-based MAC is a more powerful and reliable protocol than the contention-based MAC. Depth-based TDMA assigns time slots to each sensor node in a cluster-based network and it works in a distributed manner. DB-TDMA is a type of depth-based TDMA and guarantees scalability and energy efficiency. However, it cannot allocate time slots in parallel and cannot perfectly avoid a collision because each node does not know the total network information. In this paper, we suggest an improved distributed algorithm to reduce the end-to-end delay of DB-TDMA, and the proposed algorithm is compared with DRAND and DB-TDMA.
https://doi.org/10.5626/JOK.2015.42.4.530

A Study on Distributed Parallel SWRL Inference in an In-Memory-Based Cluster Environment (인메모리 기반의 클러스터 환경에서 분산 병렬 SWRL 추론에 대한 연구)

Lee, Wan-Gon;Bae, Seok-Hyun;Park, Young-Tack
- Journal of KIISE
- v.45 no.3
- pp.224-233
- 2018
Recently, there have been many studies on SWRL reasoning engines that apply user-defined rules to large-scale ontologies in a distributed environment. Unlike schema-based axiom rules, SWRL rules do not allow an efficient inference order to be defined. Unnecessary iterative processes also produce a large volume of data shuffled over the network. To solve these problems, we propose a method that uses a Map-Reduce algorithm and a distributed in-memory framework to deduce multiple rules simultaneously and to minimize the volume of data shuffled between the machines in the cluster. For the experiment, we use the WiseKB ontology, composed of 200 million triples, and 36 user-defined rules. We found that the proposed reasoner completes inference in 16 minutes and is 2.7 times faster than previous reasoning systems that used the LUBM benchmark dataset.
https://doi.org/10.5626/JOK.2018.45.3.224
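To make the idea of rule inference as a map-and-reduce join concrete, here is a minimal Python sketch that applies a single rule of the form hasParent(?x, ?y) ∧ hasBrother(?y, ?z) → hasUncle(?x, ?z). It is an assumption-laden illustration, not the paper's reasoner: the triples and the rule are made up, the function names are hypothetical, and it runs in one process rather than on a distributed in-memory framework. It also handles only one rule in one pass, whereas the paper deduces multiple rules simultaneously.

```python
from collections import defaultdict

# Hypothetical triples; the WiseKB ontology and the paper's 36 rules
# are not reproduced here.
TRIPLES = {
    ("mary", "hasParent", "john"),
    ("john", "hasBrother", "paul"),
    ("ann", "hasParent", "john"),
}


def map_triple(triple):
    """Map: key each relevant triple by the value bound to the shared ?y."""
    s, p, o = triple
    if p == "hasParent":
        yield o, ("left", s)   # ?y is the object of hasParent
    elif p == "hasBrother":
        yield s, ("right", o)  # ?y is the subject of hasBrother


def reduce_join(values):
    """Reduce: pair every ?x with every ?z that share the same ?y."""
    xs = [v for side, v in values if side == "left"]
    zs = [v for side, v in values if side == "right"]
    for x in xs:
        for z in zs:
            yield (x, "hasUncle", z)


def apply_rule(triples):
    """Run one map/shuffle/reduce pass and return only the new triples."""
    groups = defaultdict(list)
    for triple in triples:
        for key, value in map_triple(triple):
            groups[key].append(value)
    inferred = set()
    for values in groups.values():
        inferred.update(reduce_join(values))
    return inferred - set(triples)
```

Because both rule atoms are keyed by the shared variable, only triples that can actually join end up in the same group, which is the property that limits how much data has to be shuffled between machines.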

Implementation of A Multiple-agent System for Conference Calling (회의 소집을 위한 다중 에이전트 시스템의 구현)

유재홍;노승진;성미영
- Journal of Intelligence and Information Systems
- v.8 no.2
- pp.205-227
- 2002
Our study focuses on a multiple-agent system that supports efficient collaborative work by automating the conference calling process with the help of intelligent agents. Automating meeting scheduling requires careful consideration of each individual's official schedule as well as privacy and personal preferences. The automation of conference calling therefore needs distributed processing in which a separate calendar management process is associated with each participant, increasing reliability and exploiting inherent parallelism. This paper describes in detail the design and implementation issues of a multiple-agent system for conference calling that allows the convener and participants to minimize their effort in creating a meeting. Our system is based on the client-server model. On the server side, a scheduling agent, a negotiating agent, a personal information managing agent, a group information managing agent, a session managing agent, and a coordinating agent operate. On the client side, an interface agent, a media agent, and a collaborating agent operate. The agents communicate with one another in a standardized knowledge manipulation language, which allows the system to overcome heterogeneity, one of the most important problems in communication among agents for distributed collaborative computing. Using a forward chaining algorithm and a back-propagation network algorithm, the agents propose the dates on which as many participants as possible are available to attend the conference.
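The goal stated in the last sentence, proposing dates on which as many participants as possible are available, can be sketched with plain counting. The following Python example is a deliberately simplified stand-in: it does not use the forward chaining or back-propagation network algorithms named in the abstract, and the function name `propose_dates` and the calendar layout (participant name mapped to a list of free dates) are assumptions made for illustration.

```python
from collections import Counter


def propose_dates(calendars, top_n=3):
    """Rank candidate dates by how many participants are free on each."""
    counts = Counter()
    for free_dates in calendars.values():
        # Use a set so a date listed twice by one person counts once.
        counts.update(set(free_dates))
    # Sort by attendance (descending), then by date so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical calendars with ISO-formatted free dates.
calendars = {
    "kim": ["2002-06-03", "2002-06-04"],
    "lee": ["2002-06-04", "2002-06-05"],
    "park": ["2002-06-04", "2002-06-03"],
}
best = propose_dates(calendars, top_n=2)
```

A real scheduling agent would also weigh personal preferences and privacy, which this counting approach ignores.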

Design and implementation of a Shared-Concurrent File System in distributed UNIX environment (분산 UNIX 환경에서 Shared-Concurrent File System의 설계 및 구현)

Jang, Si-Ung;Jeong, Gi-Dong
- The Transactions of the Korea Information Processing Society
- v.3 no.3
- pp.617-630
- 1996
In this paper, a shared-concurrent file system (S-CFS) is designed and implemented using conventional disks as disk arrays on a workstation cluster that can be used as a small-scale server. Because it is implemented on UNIX operating systems, S-CFS is not only portable and flexible but also efficient in resource usage, since it does not require additional I/O nodes. The results show that on small-scale systems with enough disks, the performance of the concurrent file system for transaction processing applications is bounded by CPU computing power, while its performance for massive data I/O is bounded by the time required to copy data between buffers. The concurrent file system, implemented on a workstation cluster with 8 disks, shows a throughput of 388 tps for transaction processing applications and provides a bandwidth of 15.8 Mbytes/sec for massive data processing applications. Moreover, the concurrent file system has been designed to enhance the throughput of applications requiring high-performance I/O by letting the user control the parallelism of the file system.

A Study on implementation model for security log analysis system using Big Data platform (빅데이터 플랫폼을 이용한 보안로그 분석 시스템 구현 모델 연구)

Han, Ki-Hyoung;Jeong, Hyung-Jong;Lee, Doog-Sik;Chae, Myung-Hui;Yoon, Cheol-Hee;Noh, Kyoo-Sung
- Journal of Digital Convergence
- v.12 no.8
- pp.351-359
- 2014
Log data generated by security equipment have so far been analyzed comprehensively on an ESM (Enterprise Security Management) basis, but because of its limited capacity and processing performance, ESM is not suited to big data processing. A different technical approach based on a big data platform is therefore necessary. A big data platform can collect, store, process, retrieve, analyze, and visualize large amounts of data by using the Hadoop Ecosystem. ESM technology is currently evolving toward SIEM (Security Information & Event Management), and implementing security technology in the SIEM way requires big data platform technology that can handle the large volume of log data produced by current security devices. This paper studies how to implement a system model for analyzing security logs using the Hadoop Ecosystem technology of a big data platform.
https://doi.org/10.14400/JDC.2014.12.8.351

Enhancing the performance of taxi application based on in-memory data grid technology (In-memory data grid 기술을 활용한 택시 애플리케이션 성능 향상 기법 연구)

Choi, Chi-Hwan;Kim, Jin-Hyuk;Park, Min-Kyu;Kwon, Kaaen;Jung, Seung-Hyun;Nazareno, Franco;Cho, Wan-Sup
- Journal of the Korean Data and Information Science Society
- v.26 no.5
- pp.1035-1045
- 2015
Recent studies in big data analysis are showing promising results by using main memory for rapid data processing. In-memory computing technology can be highly advantageous when used with high-performance servers that have tens of gigabytes of RAM and multi-core processors. The network constraints of such infrastructure can be lessened by combining in-memory technology with distributed parallel processing. This paper applies this concept to a test taxi-hailing application while retaining its underlying RDBMS structure. Applying IMDG technology in the application's backend API, without restructuring the database schema, yields a 6- to 9-fold increase in data processing performance and throughput. In particular, throughput changes very little even as the data processing load increases.
https://doi.org/10.7465/jkdi.2015.26.5.1035
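One common way to place an in-memory layer in front of an unchanged relational database is a read-through cache in the backend API. The Python sketch below illustrates that general pattern only; it is an assumption about how such a layer might look, not the design evaluated in the paper. A plain dictionary stands in for a distributed data grid, and the class name `ReadThroughCache` and the loader `load_driver` are hypothetical.

```python
class ReadThroughCache:
    """Serve reads from memory, falling back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # callable that queries the unchanged RDBMS
        self._store = {}           # stand-in for a distributed in-memory grid
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Miss: query the database once and keep the result in memory.
        self.misses += 1
        value = self._load(key)
        self._store[key] = value
        return value


# Hypothetical loader that records how often the database is queried.
db_calls = []


def load_driver(driver_id):
    db_calls.append(driver_id)
    return {"id": driver_id, "status": "available"}


cache = ReadThroughCache(load_driver)
first = cache.get(7)   # miss: goes to the database
second = cache.get(7)  # hit: served from memory
```

The sketch omits invalidation on writes and expiry, both of which a real deployment would need so that cached rows do not drift from the database.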

