• Title/Summary/Keyword: Big Node

Search Results: 127

Large Scale Incremental Reasoning using SWRL Rules in a Distributed Framework (분산 처리 환경에서 SWRL 규칙을 이용한 대용량 점증적 추론 방법)

  • Lee, Wan-Gon;Bang, Sung-Hyuk;Park, Young-Tack
    • Journal of KIISE / v.44 no.4 / pp.383-391 / 2017
  • As we enter a new era of Big Data, the amount of semantic data has rapidly increased. To derive meaningful information from such large-scale semantic data, studies that utilize SWRL (Semantic Web Rule Language) are being actively conducted. SWRL rules encode data extracted from a user's empirical knowledge. However, conventional reasoning systems developed on single machines cannot process large-scale data, while multi-node reasoning systems suffer performance degradation due to network shuffling. This paper therefore overcomes the limitations of existing systems and proposes more efficient distributed inference methods. It introduces data partitioning strategies that minimize network shuffling, and describes a method for optimizing the incremental reasoning process through data selection and rule ordering. To evaluate the proposed methods, experiments were conducted using WiseKB, consisting of 200 million triples, with 83 user-defined rules; the overall reasoning task was completed in 32.7 minutes. Experiments on the LUBM benchmark datasets also showed that our approach performs reasoning twice as fast as MapReduce-based reasoning systems.
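The partitioning idea in this abstract can be illustrated with a small sketch. This is not the paper's implementation (which targets a distributed framework), only a minimal single-process Python illustration of the principle: hash-partitioning triples by subject keeps all triples sharing a subject on one partition, so subject-subject joins in rule bodies need no cross-node shuffling. All function names are illustrative.

```python
# Illustrative sketch: subject-based hash partitioning so that joins on a
# shared subject variable stay local to one partition (no network shuffle).

def partition_triples(triples, num_partitions):
    """Group (subject, predicate, object) triples by hash of the subject."""
    partitions = [[] for _ in range(num_partitions)]
    for s, p, o in triples:
        partitions[hash(s) % num_partitions].append((s, p, o))
    return partitions

def local_join(partition, pred_a, pred_b):
    """Join patterns (?s pred_a ?x) and (?s pred_b ?y) inside one partition."""
    by_subject = {}
    for s, p, o in partition:
        if p == pred_a:
            by_subject.setdefault(s, []).append(o)
    results = []
    for s, p, o in partition:
        if p == pred_b:
            for x in by_subject.get(s, []):
                results.append((s, x, o))
    return results
```

Because every triple of a given subject lands in the same partition, each node can evaluate such a join independently.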

A Topic Analysis of SW Education Textdata Using R (R을 활용한 SW교육 텍스트데이터 토픽분석)

  • Park, Sunju
    • Journal of The Korean Association of Information Education / v.19 no.4 / pp.517-524 / 2015
  • In this paper, to identify directions of interest related to SW education, SW education news data were gathered and their contents analyzed. Topic analysis of SW education news was performed on data collected from July 23, 2013 to October 19, 2015. Analyzing the relationships among the 20 most frequently mentioned words, obtained by web crawling with R, showed that these words are closely related: in the co-occurrence matrix graph centered on the word 'SW education', the node sizes of the 20 words were balanced against each other. Moreover, our analysis revealed that the data were mainly composed of topics about SW talent, SW support programs, the SW education mandate, SW camps, the SW industry, and job creation. These results could be used in big data analysis to uncover people's thoughts and interests regarding SW education.
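The co-occurrence matrix behind this kind of network graph is simple to construct. The study used R; the sketch below shows the same idea in Python (assumed tokenized documents, hypothetical function name): count how often each pair of top words appears in the same document.

```python
# Sketch of building a word co-occurrence matrix from tokenized documents,
# the structure underlying a co-occurrence network graph of top words.
from collections import Counter
from itertools import combinations

def cooccurrence(docs, top_n=20):
    # Document frequency of each word (count a word once per document).
    counts = Counter(w for doc in docs for w in set(doc))
    top = [w for w, _ in counts.most_common(top_n)]
    matrix = {w: Counter() for w in top}
    for doc in docs:
        present = [w for w in top if w in set(doc)]
        for a, b in combinations(sorted(present), 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return top, matrix
```

The resulting symmetric matrix can be fed to any graph layout tool, with node size proportional to word frequency and edge weight to the co-occurrence count.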

Characteristics of Panicle Traits for 178 Rice Varieties Bred in Korea (국내에서 육성된 벼 품종들의 이삭형질 특성)

  • Park, Hyun-Su;Kim, Ki-Young;Mo, Young-Jun;Choung, Jin-Il;Kang, Hyun-Jung;Kim, Bo-Kyung;Shin, Mun-Sik;Ko, Jae-Kwon;Kim, Sun-Hyung;Lee, Bu-Young
    • Korean Journal of Breeding Science / v.42 no.2 / pp.169-180 / 2010
  • This study was conducted to investigate panicle traits, which are important factors affecting the yield and grain quality of rice. Twelve panicle traits were investigated in 178 Korean rice varieties, comprising 160 Japonica-type and 18 Tongil-type varieties. Tongil-type varieties had longer panicles and thicker neck nodes than Japonica-type varieties. Other traits, such as the number of total spikelets, total rachis-branches, and secondary rachis-branches (SRBs) per panicle, total spikelets on SRBs per panicle, mean number of spikelets per SRB, and mean number of SRBs per primary rachis-branch (PRB), were also higher in Tongil-type varieties. On the other hand, Japonica-type varieties showed better panicle exsertion and a slightly higher mean number of spikelets per PRB than Tongil-type varieties. Cluster analysis based on the 12 panicle traits divided the 178 varieties into four main groups. Group I contained 133 Japonica-type varieties and was characterized by relatively well-exserted short panicles, thin neck nodes, few rachis-branches, and smaller sink size than the other groups. Group II comprised 24 Japonica-type and 6 Tongil-type varieties, with values intermediate between Groups I and III. Group III included 11 Tongil-type varieties and one Japonica-type variety, 'Baegjinju1', characterized by relatively poorly exserted long panicles, thick neck nodes, many rachis-branches, and large sink size. Group IV consisted solely of 'Nongan', which had well-exserted long panicles, thick neck nodes, many rachis-branches, and large sink size. In correlation analysis, the number of total spikelets per panicle was very highly correlated with the number of total rachis-branches per panicle (r=0.975), number of spikelets on SRBs per panicle (0.962), number of SRBs per panicle (0.959), mean number of SRBs per PRB (0.746), and mean number of spikelets per SRB (0.738).

PPFP(Push and Pop Frequent Pattern Mining): A Novel Frequent Pattern Mining Method for Bigdata Frequent Pattern Mining (PPFP(Push and Pop Frequent Pattern Mining): 빅데이터 패턴 분석을 위한 새로운 빈발 패턴 마이닝 방법)

  • Lee, Jung-Hun;Min, Youn-A
    • KIPS Transactions on Software and Data Engineering / v.5 no.12 / pp.623-634 / 2016
  • Most existing frequent pattern mining methods address time efficiency and rely heavily on primary memory. However, in the era of big data, the size of real-world databases to be mined is increasing exponentially, so primary memory is no longer sufficient for mining frequent patterns from large real-world data sets. Some disk-based frequent pattern mining methods have been studied to improve scalability, but their processing is very time-consuming compared to memory-based methods. In this paper, we present PPFP, a novel disk-based approach for mining frequent itemsets from big data that relieves the main-memory bottleneck. The PPFP algorithm is based on FP-growth, one of the most popular and efficient frequent pattern mining approaches. Mining with PPFP consists of two steps. (1) Constructing an IFP-tree: after constructing the FP-tree, we assign an index number to each node with a novel index-numbering method, then store the indexed FP-tree (IFP-tree) on disk as an IFP-table. (2) Mining frequent patterns with PPFP: frequent patterns are mined by extending patterns with a stack-based PUSH-POP method (the PPFP method). By using only a very small amount of memory for the recursive and time-consuming operations of the mining process, this new approach improves both the scalability and the time efficiency of frequent pattern mining, and the reported test results demonstrate this.
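The core PUSH-POP idea, replacing recursive conditional-tree mining with an explicit stack of pattern extensions, can be shown with a toy enumerator. This is a hedged sketch, not the paper's IFP-table algorithm: it mines directly from in-memory transactions rather than from an on-disk indexed FP-tree, but the stack discipline is the same.

```python
# Toy stack-based frequent itemset enumeration: push candidate extensions,
# pop them for further extension, keeping memory use bounded and avoiding
# recursion. Extensions use increasing item indices to avoid duplicates.

def frequent_patterns(transactions, min_support):
    items = sorted({i for t in transactions for i in t})
    tsets = [frozenset(t) for t in transactions]

    def support(pattern):
        return sum(1 for t in tsets if pattern <= t)

    stack = [(frozenset(), 0)]  # PUSH the empty pattern
    results = {}
    while stack:
        pattern, start = stack.pop()  # POP a pattern to extend
        for idx in range(start, len(items)):
            cand = pattern | {items[idx]}
            s = support(cand)
            if s >= min_support:
                results[cand] = s
                stack.append((cand, idx + 1))  # PUSH the frequent extension
    return results
```

The stack never holds more than one pending branch per item position, which is the memory saving the abstract points to; the real system additionally spills the tree itself to disk.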

Development of Safety Performance Functions and Level of Service of Safety on National Roads Using Traffic Big Data (교통 빅데이터를 이용한 전국 도로 안전성능함수 및 안전등급 개발 연구)

  • Kwon, Kenan;Park, Sangmin;Jeong, Harim;Kwon, Cheolwoo;Yun, Ilsoo
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.18 no.5 / pp.34-48 / 2019
  • The purpose of this study was two-fold: first, to develop safety performance functions (SPFs) for all types of roads in Korea using transportation-related big data; second, to provide basic information for developing countermeasures for relatively dangerous roads by evaluating their safety grades with these functions. The coordinates of traffic accident data were matched to roads across the country based on the national standard node and link system. As independent variables, this study used link length, traffic volume data from ViewT, established by the Korea Transport Research Institute, and the number of dangerous driving behaviors recorded by the digital tachographs installed on commercial vehicles. Based on the methodology and analysis results of this study, transportation safety improvement projects can be properly selected, and their effects clearly monitored and quantified.
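Safety performance functions of this kind are commonly fit as log-linear crash prediction models. The sketch below is only illustrative of that general form; the coefficients and the grading thresholds are invented placeholders, not values from this study.

```python
# Illustrative log-linear SPF: E[crashes] depends on traffic volume (AADT),
# link length, and a count of risky driving events. Coefficients are made up.
import math

def spf(aadt, link_length_km, risky_driving_events,
        b0=-7.0, b1=0.8, b2=1.0, b3=0.001):
    return (math.exp(b0) * aadt**b1 * link_length_km**b2
            * math.exp(b3 * risky_driving_events))

def level_of_service(observed, predicted):
    """Grade a link by the ratio of observed to predicted crashes (hypothetical cutoffs)."""
    ratio = observed / predicted
    if ratio < 0.5:
        return "A"
    if ratio < 1.0:
        return "B"
    if ratio < 1.5:
        return "C"
    return "D"
```

Comparing observed crashes against the SPF prediction, rather than raw crash counts, is what lets roads of very different volumes be graded on a common scale.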

Analysis of the Impact Relationship for Risk Factors on Big Data Projects Using SNA (SNA를 활용한 빅데이터 프로젝트의 위험요인 영향 관계 분석)

  • Park, Dae-Gwi;Kim, Seung-Hee
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.1 / pp.79-86 / 2021
  • To increase the probability of success in big data projects, quantified techniques are required to analyze the root causes of risks arising from complex sources and to establish optimal countermeasures. To this end, this study measures risk factors and their relationships through social network analysis (SNA) and presents a way to respond to risks based on the results. Specifically, it derives a dependency network matrix from the correlations between risk groups in big data projects presented in a preliminary study, and performs SNA on it. To derive the dependency network matrix, partial correlations are obtained from the correlations between risk nodes, and activity dependencies are derived node by node by calculating correlation influence and correlation dependency, producing the causal relationships between risk nodes and the degree of influence among all correlated nodes. Recognizing the root causes of risks from the network of risk factors derived through SNA enables more optimized and efficient risk management. This study is the first to apply SNA techniques to risk management response, and its results are significant in that they not only optimize the sequence of risk management for major risks in IT projects but also present a new risk analysis technique for risk control.
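The partial correlation step mentioned above has a standard closed form for the first-order case. The sketch below shows that textbook formula, which is the usual starting point for a dependency network matrix; it is not the study's full influence/dependency computation.

```python
# First-order partial correlation: the correlation between risk factors i
# and j after controlling for a third factor k, computed from the plain
# pairwise correlations r_ij, r_ik, r_jk.
import math

def partial_corr(r_ij, r_ik, r_jk):
    denom = math.sqrt((1 - r_ik**2) * (1 - r_jk**2))
    return (r_ij - r_ik * r_jk) / denom
```

Filling a matrix with these values, one entry per pair of risk nodes, yields the weighted network on which centrality and influence measures can then be computed.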

Design of the Structure for Scaling-Wavelet Neural Network Using Genetic Algorithm (유전 알고리즘을 이용한 스케일링-웨이블릿 복합 신경회로망 구조 설계)

  • 김성주;서재용;연정흠;김성현;전홍태
    • Proceedings of the IEEK Conference / 2001.06c / pp.25-28 / 2001
  • RBF networks have the problem that, because their basis functions are not orthogonal to each other, the number of basis functions required grows large. For this reason, wavelet neural networks, which use orthogonal basis functions in the hidden nodes, have appeared. In this paper, we propose a method of composing the activation functions of the hidden layer with a scaling function that can represent the region covered by several wavelets. With this method, we can decrease the size of the network compared to using wavelet functions alone. In addition, once the parameters of the scaling function are determined, a rough approximation can be performed first, which makes the network more stable. The remaining wavelets are determined as global solutions suited to the given problem using a genetic algorithm, and the weights are learned with the back-propagation algorithm, approximating the target function at a fine-tuning level. The hybrid neural network proposed in this paper is a new structure, and it is notable for handling the determination problem in wavelet initialization.
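The hidden-layer composition described above can be sketched as a forward pass. The Gaussian scaling function and Mexican-hat mother wavelet below are common choices used only for illustration; the paper does not specify which functions it uses.

```python
# Sketch of a hybrid scaling-wavelet hidden layer: a scaling node gives a
# coarse approximation over a region, and wavelet nodes refine it; the
# output is a weighted sum of all hidden activations.
import math

def scaling(x, center, width):
    # Gaussian scaling function (assumed choice).
    return math.exp(-((x - center) / width) ** 2)

def wavelet(x, center, width):
    # Mexican-hat mother wavelet (assumed choice).
    u = (x - center) / width
    return (1 - u * u) * math.exp(-u * u / 2)

def network_output(x, scale_params, wavelet_params, weights):
    hidden = [scaling(x, c, w) for c, w in scale_params]
    hidden += [wavelet(x, c, w) for c, w in wavelet_params]
    return sum(wt * h for wt, h in zip(weights, hidden))
```

In the paper's scheme, the (center, width) pairs would come from the genetic algorithm and the weights from back-propagation.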


A Self-Calibrated Localization System using Chirp Spread Spectrum in a Wireless Sensor Network

  • Kim, Seong-Joong;Park, Dong-Joo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.2 / pp.253-270 / 2013
  • To achieve accurate localization, complex algorithms with high computational cost are usually implemented. Many of these algorithms have been developed to overcome limitations such as obstruction interference in multi-path and non-line-of-sight (NLOS) environments. However, localization systems with such complex designs experience latency when operating multiple mobile nodes occupying various channels and trying to compensate for inaccurate distance values. To operate multiple mobile nodes concurrently, we propose a localization system with both low complexity and high accuracy, based on a chirp spread spectrum (CSS) radio. The proposed system combines accurate ranging values, analyzed by simple linear regression with Big-$O(n^2)$ complexity over only a few data points, with a self-calibration algorithm. Its performance is verified by actual experiments, which show a mean error of about 1 m and multiple-mobile-node operation in a $100{\times}35m^2$ environment under NLOS conditions.
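The self-calibration idea, fitting a simple linear model between measured and true distances over a few points and then inverting it to correct raw ranging values, can be sketched in a few lines. This is an interpretation of the abstract, not the paper's exact procedure, and the function names are illustrative.

```python
# Ordinary least-squares line fit over a few calibration points, then
# inversion of the fit to correct raw CSS ranging measurements.

def fit_line(xs, ys):
    """Fit ys = slope * xs + intercept; xs are true distances, ys measured."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def calibrate(raw_measurement, slope, intercept):
    """Invert the fitted line to recover a corrected distance."""
    return (raw_measurement - intercept) / slope
```

Because only a handful of calibration points are needed, the fit stays cheap enough to rerun per node, consistent with the low-complexity goal stated in the abstract.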

RDP: A storage-tier-aware Robust Data Placement strategy for Hadoop in a Cloud-based Heterogeneous Environment

  • Muhammad Faseeh Qureshi, Nawab;Shin, Dong Ryeol
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.9 / pp.4063-4086 / 2016
  • Cloud computing is a robust technology that helps resolve many parallel and distributed computing issues in the modern Big Data environment. Hadoop is an ecosystem that processes large data sets in a distributed computing environment, and HDFS, its file system, distributes data blocks across cluster nodes. Data block placement has become a bottleneck for overall performance in a Hadoop cluster. The current placement policy assumes that all Datanodes have equal computing capacity to process data blocks, where computing capacity includes the availability of the same storage media and the same processing performance on a node. As a result, Hadoop cluster performance suffers from unbalanced workloads, an inefficient storage tier, network traffic congestion, and HDFS integrity issues. This paper proposes a storage-tier-aware Robust Data Placement (RDP) scheme that systematically resolves unbalanced workloads, reduces network congestion to an optimal state, utilizes the storage tier effectively, and minimizes HDFS integrity issues. The experimental results show that the proposed approach reduced the unbalanced workload issue by 72%. Moreover, it resolved the storage-tier compatibility problem by 81% by predicting storage for block jobs, and improved overall data block placement by 78% through pre-calculated computing capacity allocations and execution of map files over the respective Namenode and Datanodes.
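As a rough illustration of what "storage-tier-aware placement" means in practice, the sketch below scores candidate datanodes by tier suitability and free capacity and picks the best one. This is a simplification invented for illustration, not the RDP algorithm itself; node fields and weights are hypothetical.

```python
# Hypothetical storage-tier-aware block placement: prefer nodes whose
# storage tier matches the job's preference, breaking ties by free space.

def place_block(nodes, prefer_tier):
    """nodes: list of dicts with 'name', 'tier' ('ssd'/'hdd'), 'free_gb'."""
    def score(node):
        tier_bonus = 1000.0 if node["tier"] == prefer_tier else 0.0
        return tier_bonus + node["free_gb"]
    return max(nodes, key=score)["name"]
```

The real scheme additionally predicts storage needs per block job and pre-calculates capacity allocations, which this toy scoring omits.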

CTaG: An Innovative Approach for Optimizing Recovery Time in Cloud Environment

  • Hung, Pham Phuoc;Aazam, Mohammad;Huh, Eui-Nam
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.4 / pp.1282-1301 / 2015
  • Traditional infrastructure has been superseded by cloud computing, due to its cost-effective and ubiquitous computing model. Cloud computing brings a multitude of opportunities, but it also bears some challenges. One of the key challenges is the recovery of computing nodes when an Information Technology (IT) failure occurs. Since cloud computing depends mainly on its nodes (physical servers), it is crucial to recover a failed node promptly and seamlessly, so that the customer receives the expected level of service. Work has already been done in this regard, but it has so far proved insufficient. In this study, we present a Cost-Time-aware Genetic scheduling algorithm, referred to as CTaG, which not only globally optimizes the performance of the cloud system but also recovers failed nodes efficiently. In modeling our work, we have particularly taken into account network bandwidth and the customer's monetary cost. We have implemented our algorithm and justified it through extensive simulations and comparison with similar existing studies. The results show that our work outperforms the others in some particular scenarios.
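A cost-time-aware genetic scheduler of the kind the abstract describes can be sketched compactly: chromosomes assign tasks to nodes, fitness blends completion time (makespan) and monetary cost, and selection plus crossover and mutation search the assignment space. All parameters and the fitness weighting below are invented for illustration; this is not the CTaG algorithm itself.

```python
# Toy genetic scheduler: minimize a weighted sum of makespan and cost when
# assigning tasks to heterogeneous nodes.
import random

def fitness(assign, task_len, node_speed, node_cost, w=0.5):
    loads, cost = {}, 0.0
    for task, node in enumerate(assign):
        t = task_len[task] / node_speed[node]   # execution time on that node
        loads[node] = loads.get(node, 0.0) + t
        cost += t * node_cost[node]             # monetary cost of that time
    return w * max(loads.values()) + (1 - w) * cost

def evolve(task_len, node_speed, node_cost, pop=20, gens=50, seed=0):
    rng = random.Random(seed)
    n_tasks, n_nodes = len(task_len), len(node_speed)
    population = [[rng.randrange(n_nodes) for _ in range(n_tasks)]
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda a: fitness(a, task_len, node_speed, node_cost))
        survivors = population[:pop // 2]       # elitist selection
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_tasks)     # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:              # mutation
                child[rng.randrange(n_tasks)] = rng.randrange(n_nodes)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda a: fitness(a, task_len, node_speed, node_cost))
```

In a recovery scenario, the failed node would simply be removed from the node lists and the remaining tasks rescheduled with the same loop.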