• Title/Summary/Keyword: Spark Problem

Search Result 54, Processing Time 0.027 seconds

Implementation of Parallel Local Alignment Method for DNA Sequence using Apache Spark (Apache Spark을 이용한 병렬 DNA 시퀀스 지역 정렬 기법 구현)

  • Kim, Bosung;Kim, Jinsu;Choi, Dojin;Kim, Sangsoo;Song, Seokil
    • The Journal of the Korea Contents Association
    • /
    • v.16 no.10
    • /
    • pp.608-616
    • /
    • 2016
  • The Smith-Waterman (SW) algorithm is a local alignment algorithm, one of the important operations in DNA sequence analysis. The SW algorithm finds the optimal local alignment with respect to the scoring system in use, but it suffers from long execution times. To address this, several methods for performing SW in a distributed and parallel manner have been proposed. ADAM, a distributed and parallel processing framework for DNA sequences, includes a parallel SW. However, ADAM's parallel SW does not take into account that SW is a dynamic programming method, which limits its performance. In this paper, we propose a method to enhance ADAM's parallel SW. The proposed parallel SW (PSW) runs in two phases. In the first phase, the PSW splits a DNA sequence into a number of partitions and assigns them to multiple nodes; the original Smith-Waterman algorithm is then performed in parallel at each node. In the second phase, the PSW estimates the portions of the sequence that should be recalculated, and the recalculation is performed on those portions in parallel at each node. In our experiments, we compare the proposed PSW to ADAM's parallel SW to show the superiority of the PSW.
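The dynamic-programming recurrence that the PSW parallelizes can be sketched on a single machine as follows. This is a minimal pure-Python illustration of the classic Smith-Waterman score computation, not the authors' Spark implementation; the scoring parameters (match +2, mismatch -1, gap -1) are assumed for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score of sequences a and b.

    Classic O(len(a)*len(b)) dynamic program: H[i][j] is the best score
    of an alignment ending at a[i-1] and b[j-1]. The 0 in the max lets
    an alignment restart anywhere, which is what makes it *local*; the
    answer is the maximum cell in the whole matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best
```

Because each cell depends on its left, upper, and diagonal neighbors, naive partitioning breaks these dependencies at partition borders; that is exactly why the PSW's second recalculation phase is needed.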

A Comparative Analysis of Recursive Query Algorithm Implementations based on High Performance Distributed In-Memory Big Data Processing Platforms (대용량 데이터 처리를 위한 고속 분산 인메모리 플랫폼 기반 재귀적 질의 알고리즘들의 구현 및 비교분석)

  • Kang, Minseo;Kim, Jaesung;Lee, Jaegil
    • Journal of KIISE
    • /
    • v.43 no.6
    • /
    • pp.621-626
    • /
    • 2016
  • Recursive query algorithms are used in many social network services, e.g., for reachability queries in social networks. Recently, the size of social network data has increased as social network services have evolved. As a result, it is almost impossible to run a recursive query algorithm on a single machine. In this paper, we implement recursive queries on two popular in-memory distributed platforms, Spark and Twister, to solve this problem. We evaluate the performance of the two implementations using 50 machines on Amazon EC2 and real-world data sets: LiveJournal and ClueWeb. The results show that the recursive query algorithm performs better on Spark for the LiveJournal input data set, which has a relatively high average degree but fewer vertices. However, the recursive query on Twister is superior to Spark for the ClueWeb input data set, which has a relatively low average degree but many vertices.
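A reachability query of the kind described above is an iterated frontier expansion: each round joins the current frontier with the edge list and keeps only newly discovered vertices. A minimal single-machine sketch of that loop (the data and names are illustrative, not taken from the paper; a distributed engine such as Spark or Twister runs the same loop as one join per iteration):

```python
def reachable(edges, source):
    """Return all vertices reachable from source.

    Semi-naive evaluation: each pass expands only the frontier
    (vertices discovered in the previous pass), subtracting what is
    already known, until no new vertices appear.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    seen = {source}
    frontier = {source}
    while frontier:
        frontier = {v for u in frontier for v in adj.get(u, ())} - seen
        seen |= frontier
    return seen
```

The number of iterations equals the graph's depth from the source, which is why average degree and vertex count, the properties the paper varies across LiveJournal and ClueWeb, drive the relative performance of the two platforms.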

Mechanical Property Evaluation of WC-Co-Mo2C Hard Materials by a Spark Plasma Sintering Process (방전플라즈마 소결 공정을 이용한 WC-Co-Mo2C 소재의 기계적 특성평가)

  • Kim, Ju-Hun;Park, Hyun-Kuk
    • Korean Journal of Materials Research
    • /
    • v.31 no.7
    • /
    • pp.392-396
    • /
    • 2021
  • Expensive PCBN or ceramic cutting tools are used for machining difficult-to-cut materials such as Ti and Ni alloys. These tools break easily because of their high hardness but low fracture toughness. To solve these problems, low-cost WC-Co hard-material tools with various coating layers are used, and research on various tool materials is being conducted. In this study, binderless-WC, WC-6 wt%Co, WC-6 wt%Co-1 wt% Mo2C, and WC-6 wt%Co-2.5 wt% Mo2C hard materials are densified from horizontally ball-milled WC-Co and WC-Co-Mo2C powders by the spark plasma sintering (SPS) process. Each SPSed hard material is almost completely dense, with a relative density of up to 99.5 % after the simultaneous application of a pressure of 60 MPa, and shows almost no significant change in grain size. The average WC grain sizes of the binderless-WC, WC-6 wt%Co, WC-6 wt%Co-1 wt% Mo2C, and WC-6 wt%Co-2.5 wt% Mo2C hard materials are about 0.37, 0.6, 0.54, and 0.43 µm, respectively. The mechanical properties, microstructure, and phases of the SPSed hard materials are investigated.

Fabrication and Characterization of the Ti-TCP Composite Biomaterials by Spark Plasma Sintering

  • Mondal, Dibakar;Park, Hyun-Kuk;Oh, Ik-Hyun;Lee, Byong-Taek
    • Proceedings of the Materials Research Society of Korea Conference
    • /
    • 2011.05a
    • /
    • pp.53.2-53.2
    • /
    • 2011
  • Ti metal has superior mechanical properties along with biocompatibility, but it is bio-inert, forming a weak bond at the bone/implant interface, and its long-term clinical performance in orthopaedic and dental devices is restricted by the stress-shielding effect. On the other hand, despite its excellent biodegradable behavior as an integral constituent of natural bone, the mechanical properties of β-tricalcium phosphate (Ca3(PO4)2; β-TCP) ceramics are not reliable enough for post-operative load-bearing application at human hard-tissue defect sites. One reasonable approach is to mediate the features of the two by making a composite. In this study, β-TCP/Ti ceramic-metal composites were fabricated by spark plasma sintering in an inert atmosphere to inhibit the formation of TiO2. Composites of 30 vol%, 50 vol%, and 70 vol% β-TCP with Ti were fabricated. Detailed microstructural and phase characteristics were investigated by FE-SEM, EDS, and XRD. Material properties such as relative density, hardness, compressive strength, and elastic modulus were characterized. Cell viability and biocompatibility were investigated using the MTT assay and by examining cell proliferation behavior.

The Research about Free Piston Linear Engine Fueled with Hydrogen using Numerical Analysis (수소를 연료로 사용한 프리피스톤 리니어 엔진의 수치해석에 관한 연구)

  • Nguyen, Ba Hung;Oh, Yong-Il;Lim, Ock-Taeck
    • Transactions of the Korean hydrogen and new energy society
    • /
    • v.23 no.2
    • /
    • pp.162-172
    • /
    • 2012
  • This paper presents research on a free-piston linear engine (FPLE) fueled with hydrogen, in which numerical models are built to simulate operation over the full stroke of the engine. A dynamic model, a linear alternator model, and a thermodynamic model are used to predict the piston velocity, in-cylinder pressure, and electric power of the FPLE. The spark timing and air-gap length are varied to provide information for the prediction. The heat transfer problem is also investigated. The results are divided into two parts, the motoring mode and the firing mode. The motoring-mode results show good agreement between simulation and experiment for in-cylinder volume and pressure. In the firing mode, as the spark timing is increased, the piston velocity, peak pressure, and electric power all increase. In addition, when the air-gap length is increased, the electric power increases accordingly while the piston motion becomes asymmetric. The effect of heat transfer is also clearly observed in the reduction of the peak pressure, piston velocity, and electric power.
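The dynamic model of a free-piston engine couples Newton's second law with the gas pressures acting on the piston faces. A toy sketch of that idea, not the paper's model: the piston oscillates between two closed gas springs treated as adiabatic (p·V^γ = const), integrated with semi-implicit Euler. All parameter values are assumed for illustration.

```python
def simulate_piston(m=2.0, area=2e-3, stroke=0.06, gamma=1.4,
                    p0=1.0e5, dt=1e-5, steps=20000):
    """Free-piston motion between two closed gas springs (toy model).

    Each cylinder is adiabatic: pressure scales as (L / length)**gamma,
    where L is the clearance length at mid-position. Newton's second
    law is integrated with semi-implicit Euler (update velocity first,
    then position), which keeps the oscillation energy bounded.
    Returns the piston position history.
    """
    L = stroke
    x, v = 0.02, 0.0          # start displaced from centre, at rest
    xs = []
    for _ in range(steps):
        p_left = p0 * (L / (L + x)) ** gamma    # expands as x grows
        p_right = p0 * (L / (L - x)) ** gamma   # compresses as x grows
        a = (p_left - p_right) * area / m       # net restoring force / m
        v += a * dt
        x += v * dt
        xs.append(x)
    return xs
```

Even this stripped-down model shows the defining FPLE behavior: with no crankshaft, the stroke is set only by the force balance, so changing the gas forces (as spark timing or air-gap length does in the paper) changes the piston trajectory itself.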

A Study on the In-Cylinder Injection Type Hydrogen Fueled S.I. Engine (연소실내 분사식 수소연료기관의 특성에 관한 연구)

  • 조우흠;이형승;김응서
    • Transactions of the Korean Society of Mechanical Engineers
    • /
    • v.19 no.7
    • /
    • pp.1702-1708
    • /
    • 1995
  • Owing to the serious problems of hydrocarbon fuels, such as environmental pollution, the development of alternative fuels is very urgent. To adopt hydrogen for the internal combustion engine, a solenoid-driven in-cylinder injection system was constructed. The injection system was installed on a single-cylinder research engine, and the engine performance and nitric oxide emissions were tested with respect to the fuel-air equivalence ratio and the spark timing. With the in-cylinder injection system, hydrogen is injected after the intake valve is closed, so it is possible to operate the engine without backfire or a drop in volumetric efficiency. In the region of fuel-air equivalence ratios below 0.5, hydrogen and air are not well mixed and the thermal efficiency is lowered, so the nozzle should be designed to inject hydrogen uniformly into the combustion chamber. In the region of fuel-air equivalence ratios above 0.7, the fuel-air mixture burns very fast and the nitric oxide emissions increase rapidly, so the spark timing should be retarded compared with MBT.

Development of Big-data Management Platform Considering Docker Based Real Time Data Connecting and Processing Environments (도커 기반의 실시간 데이터 연계 및 처리 환경을 고려한 빅데이터 관리 플랫폼 개발)

  • Kim, Dong Gil;Park, Yong-Soon;Chung, Tae-Yun
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.16 no.4
    • /
    • pp.153-161
    • /
    • 2021
  • Real-time access is required to handle continuous, unstructured data, and management should remain flexible under dynamic conditions. A platform can be built to collect, store, and process data on a single server or across multiple servers. Although the former, centralized method is easy to control, it creates an overload problem because all processing is done in one unit; the latter, distributed method performs parallel processing, so it responds quickly and can easily scale system capacity, but its design is complex. This paper provides data collection and processing on one platform, built in the latter manner, to derive significant insights from the various data held by an enterprise or agency; results are intuitively available on dashboards, and Spark is used to improve distributed processing performance. All services use Docker for deployment and management. The data used in this study were collected entirely through Kafka; with a file size of 4.4 gigabytes, the data processing time in Spark cluster mode was 2 minutes 15 seconds, about 3 minutes 19 seconds faster than in local mode.
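The centralized-versus-distributed trade-off the abstract describes comes down to the map/reduce pattern: partition the data, aggregate each partition independently, then combine the partial results. A toy single-process sketch of that pattern (illustrative only; the platform's actual stack of Kafka, Spark, and Docker does not appear here):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Worker task: aggregate one partition independently.
    return sum(chunk)

def partitioned_sum(data, workers=4):
    """Split data into partitions, aggregate each in parallel, then
    combine the partial results -- the same map/reduce shape a Spark
    cluster applies at far larger scale.
    """
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))
```

Because each partition is processed independently, adding workers scales capacity, which is the property the paper's measurements (cluster mode vs. local mode) quantify.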

Confidence Value based Large Scale OWL Horst Ontology Reasoning (신뢰 값 기반의 대용량 OWL Horst 온톨로지 추론)

  • Lee, Wan-Gon;Park, Hyun-Kyu;Jagvaral, Batselem;Park, Young-Tack
    • Journal of KIISE
    • /
    • v.43 no.5
    • /
    • pp.553-561
    • /
    • 2016
  • Several machine learning techniques can automatically populate ontology data from web sources, and interest in large-scale ontology reasoning is increasing. However, reasoning over such data can produce speculative results that carry uncertainty, so the reliability of the various data obtained from the web must be considered. Large-scale ontology reasoning methods based on trust values are therefore required, because the reliability of inference over quantitative ontologies is insufficient. In this study, we propose a large-scale OWL Horst reasoning method based on confidence values using Spark, a distributed in-memory framework. We describe a method for integrating the confidence values of duplicated data, and explain a distributed parallel heuristic algorithm that addresses the resulting degradation of inference performance. To evaluate the performance of reasoning based on confidence values, experiments were conducted using LUBM3000. The results show that our approach performs reasoning twice as fast as existing reasoning systems such as WebPIE.
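Integrating the confidence values of duplicated facts, as the abstract describes, can be sketched as folding duplicates of each (subject, predicate, object) triple into one confidence. The combination rule below, noisy-OR, is an assumption chosen for illustration; the paper's exact rule may differ.

```python
def merge_confidences(triples):
    """Merge duplicated (subject, predicate, object) facts.

    Each input is (s, p, o, confidence). Duplicates are combined with
    the noisy-OR rule, 1 - prod(1 - c_i): an assumed rule that is
    monotone (more agreeing sources raise confidence) and bounded by 1.
    Returns a dict mapping each distinct triple to its merged value.
    """
    merged = {}
    for s, p, o, c in triples:
        key = (s, p, o)
        prev = merged.get(key, 0.0)
        merged[key] = 1.0 - (1.0 - prev) * (1.0 - c)
    return merged
```

In a distributed setting this is a single reduce-by-key over the triple as key, which is why the integration step parallelizes naturally on Spark.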

Design of a Platform for Collecting and Analyzing Agricultural Big Data (농업 빅데이터 수집 및 분석을 위한 플랫폼 설계)

  • Nguyen, Van-Quyet;Nguyen, Sinh Ngoc;Kim, Kyungbaek
    • Journal of Digital Contents Society
    • /
    • v.18 no.1
    • /
    • pp.149-158
    • /
    • 2017
  • Big data present us with exciting opportunities and challenges in economic development. For instance, in the agricultural sector, mixing various agricultural data (e.g., weather data, soil data, etc.) and subsequently analyzing them delivers valuable and helpful information to farmers and agribusinesses. However, massive amounts of agricultural data are generated every minute through many kinds of devices and services, such as sensors and agricultural web markets. This leads to the challenges of the big data problem, including data collection, data storage, and data analysis. Although some systems have been proposed to address this problem, they are still restricted in the type of data, the type of storage, or the size of data they can handle. In this paper, we propose a novel design for a platform for collecting and analyzing agricultural big data. The proposed platform supports (1) multiple methods of collecting data from various data sources using Flume and MapReduce; (2) multiple choices of data storage, including HDFS, HBase, and Hive; and (3) big data analysis modules with Spark and Hadoop.
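The "mixing up of various agricultural data" the abstract mentions is, at its core, a join of heterogeneous records on a shared key. A toy sketch of that step (field names and data are invented for illustration; the platform performs this at scale with Spark over HDFS/HBase/Hive):

```python
def join_by_key(weather, soil):
    """Combine weather and soil readings that share (region, date).

    Builds a hash index over the soil records, then probes it with
    each weather record -- the same hash-join shape a distributed
    engine applies per partition. Matched records are merged into
    one combined record.
    """
    idx = {(r["region"], r["date"]): r for r in soil}
    joined = []
    for w in weather:
        s = idx.get((w["region"], w["date"]))
        if s is not None:
            joined.append({**w, **s})
    return joined
```

The same join key discipline is what lets data from different sources (sensors, web markets) be analyzed together downstream.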

Design of Client-Server Model For Effective Processing and Utilization of Bigdata (빅데이터의 효과적인 처리 및 활용을 위한 클라이언트-서버 모델 설계)

  • Park, Dae Seo;Kim, Hwa Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.4
    • /
    • pp.109-122
    • /
    • 2016
  • Recently, big data analysis has developed into a field of interest to individuals and non-experts as well as companies and professionals. Accordingly, it is used for marketing and for solving social problems by analyzing data that are openly available or collected directly. In Korea, various companies and individuals are taking on big data analysis, but limited disclosure of big data and difficulties in collection make even the initial stage of analysis hard. System improvements for big data activation and big data disclosure services are being carried out in Korea and abroad, chiefly services for opening public data such as the Korean Government 3.0 portal (data.go.kr). In addition to the government's efforts, services that share data held by corporations or individuals are running, but it is difficult to find useful data because so little is shared. Moreover, big data traffic problems can occur, because the entire data set must be downloaded and examined just to grasp the attributes of and basic information about the shared data. Therefore, a new system for big data processing and utilization is needed. First, big data pre-analysis technology is needed as a way to solve the big data sharing problem. Pre-analysis, a concept proposed in this paper to solve that problem, means providing users with results generated by analyzing the data in advance. Through pre-analysis, the usability of big data can be improved by providing information that reveals a data set's properties and characteristics when a data user searches for it. In addition, by sharing the summary data or sample data generated through pre-analysis, the security problems that may arise when the original data are disclosed can be avoided, enabling big data sharing between the data provider and the data user.
Second, it is necessary to quickly generate appropriate preprocessing results according to the level of disclosure or the network status of the raw data, and to provide the results to users through distributed big data processing using Spark. Third, to solve the big traffic problem, the system monitors network traffic in real time; when preprocessing the data requested by a user, it reduces the data to a size transferable on the current network before transmission, so that no heavy traffic occurs. In this paper, we present various data sizes according to the level of disclosure, determined through pre-analysis. This method is expected to generate far less traffic than the conventional approach of sharing only raw data across many systems. We describe how to solve the problems that occur when big data are released and used, and how to facilitate sharing and analysis. The client-server model uses Spark for fast analysis and processing of user requests, and consists of a Server Agent and a Client Agent, deployed on the server and client sides respectively. The Server Agent, required by the data provider, performs pre-analysis of big data to generate a Data Descriptor containing information on the Sample Data, Summary Data, and Raw Data; it also performs fast and efficient big data preprocessing through distributed processing and continuously monitors network traffic. The Client Agent is placed on the data user's side. It searches big data through the Data Descriptor produced by the pre-analysis, quickly locates data, and requests the desired data from the server for download according to the level of disclosure. The Server Agent and Client Agent are thus separated when a data provider publishes data for users.
In particular, we focus on big data sharing, distributed big data processing, and the big traffic problem; we construct the detailed modules of the client-server model and present the design of each module. In a system designed on the basis of the proposed model, a user who acquires data analyzes it in the desired direction or preprocesses it into new data. By analyzing the newly processed data through the Server Agent, the data user takes on the role of data provider. Likewise, a data provider can obtain useful statistical information from the Data Descriptor of the data it discloses and become a data user, performing new analyses with the sample data. In this way, raw data are processed and the processed big data are used by others, naturally forming a shared environment. The roles of data provider and data user are not fixed, yielding an ideal sharing service in which everyone can be both a provider and a user. The client-server model solves the big data sharing problem, provides a free sharing environment for secure big data disclosure, and makes big data easy to find.
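The pre-analysis step at the heart of this model can be sketched as generating a Data Descriptor, a small sample plus summary statistics, from the raw records. The field names and statistics below are illustrative choices, not the paper's schema:

```python
import statistics

def make_descriptor(records, field, sample_size=3):
    """Build a Data Descriptor for one numeric field of a data set.

    Sharing this descriptor (summary statistics plus a few sample
    records) instead of the raw data lets a data user judge the data
    set's properties without downloading it, avoiding both the
    disclosure-security and the big-traffic problems described above.
    """
    values = [r[field] for r in records]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sample": records[:sample_size],
    }
```

A Client Agent would search over descriptors like this one and request the raw data only when the summary looks promising, which is exactly the traffic saving the model claims.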