• Title/Summary/Keyword: Large Scale Data

Design and Implementation of a Hadoop-based Efficient Security Log Analysis System (하둡 기반의 효율적인 보안로그 분석시스템 설계 및 구현)

  • Ahn, Kwang-Min;Lee, Jong-Yoon;Yang, Dong-Min;Lee, Bong-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.8 / pp.1797-1804 / 2015
  • An integrated log management system helps predict security risks, raises an organization's security level, and supports the preparation of an appropriate security policy. In this paper, we design and implement a Hadoop-based log analysis system that uses a distributed database model to store large amounts of data and reduces analysis time by automating the log collection procedure. The proposed system uses HBase to store large amounts of data efficiently in a scale-out fashion, and it proposes a simple data storing scheme for analysis based on Hadoop-based regular expressions, which improves data processing speed compared with the existing system.
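
The regular-expression-based storing scheme mentioned in this abstract can be illustrated with a minimal Python sketch. The log line format, the field names, and the row-key layout below are assumptions made for illustration only; the abstract does not give the paper's actual schema. A real deployment would write the resulting columns to HBase instead of returning them.

```python
import re

# Hypothetical firewall-style log format (an assumption, not from the paper):
# "2015-03-01 12:00:05 DENY TCP 10.0.0.5:51234 -> 192.168.1.10:22"
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<action>[A-Z]+) (?P<proto>[A-Z]+) "
    r"(?P<src_ip>[\d.]+):(?P<src_port>\d+) -> "
    r"(?P<dst_ip>[\d.]+):(?P<dst_port>\d+)"
)


def parse_log_line(line):
    """Return a (row_key, columns) pair, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Illustrative row key: source IP plus timestamp, so that rows for one
    # host sort together in a key-ordered store.
    row_key = f"{fields['src_ip']}|{fields['date']}T{fields['time']}"
    return row_key, fields
```

Parsing each line into named fields at collection time means later analysis jobs can filter on a column such as `action` without re-scanning the raw text.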

Janus - Multi Source Event Detection and Collection System for Effective Surveillance of Criminal Activity

  • Shahabi, Cyrus;Kim, Seon Ho;Nocera, Luciano;Constantinou, Giorgos;Lu, Ying;Cai, Yinghao;Medioni, Gerard;Nevatia, Ramakant;Banaei-Kashani, Farnoush
    • Journal of Information Processing Systems / v.10 no.1 / pp.1-22 / 2014
  • Recent technological advances provide the opportunity to use large amounts of multimedia data from a multitude of sensors with different modalities (e.g., video, text) for the detection and characterization of criminal activity. Their integration can compensate for sensor and modality deficiencies by using data from other available sensors and modalities. However, building such an integrated system at the scale of neighborhoods and cities is challenging due to the large amount of data to be considered and the need to ensure a short response time to potential criminal activity. In this paper, we present a system that enables multi-modal data collection at scale and automates the detection of events of interest for the surveillance and reconnaissance of criminal activity. The proposed system showcases novel analytical tools that fuse multimedia data streams to automatically detect and identify specific criminal events and activities. More specifically, the system detects and analyzes series of incidents (an incident is an occurrence or artifact relevant to a criminal activity extracted from a single media stream) in the spatiotemporal domain to extract events (actual instances of criminal events) while cross-referencing multimodal media streams and incidents in time and space to provide a comprehensive view to a human operator while avoiding information overload. We present several case studies that demonstrate how the proposed system can provide law enforcement personnel with forensic and real-time tools to identify and track potential criminal activity.

A Scalable and Effective DDS Participant Discovery Mechanism (확장성과 효율성 고려한 DDS 참여자 디스커버리 기법)

  • Kwon, Ki-Jung;You, Yong-Duck;Choi, Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.7 / pp.1344-1356 / 2009
  • The DDS (Data Distribution Service) is a data-centric communication technology that supports dynamic plug and play by using the DDS discovery technique to automatically configure participants' location information for each data item (Topic). This paper compares and analyzes the existing DDS discovery techniques in terms of performance and problem areas, and proposes a hierarchically structured DDS discovery technique (SPDP-TBF) suited to large-scale distributed systems. In the proposed SPDP-TBF, separate hierarchical managers take charge of participant registration and search, so periodic discovery involves only the relevant participants: a participant sends its information only to related participants, which makes message transfer more efficient. Moreover, performing discovery hierarchically through the manager nodes improves scalability, so SPDP-TBF can be applied to large-scale distributed systems.

Research on Deep Learning Performance Improvement for Similar Image Classification (유사 이미지 분류를 위한 딥 러닝 성능 향상 기법 연구)

  • Lim, Dong-Jin;Kim, Taehong
    • The Journal of the Korea Contents Association / v.21 no.8 / pp.1-9 / 2021
  • Deep learning in computer vision has improved rapidly over a short period, but large-scale training data and computing power remain essential, and deriving an optimal network model involves time-consuming trial and error. In this study, we propose a CR (Confusion Rate)-based method for improving similar image classification performance that considers only the characteristics of the data itself, independent of network optimization or data augmentation. The proposed method improves the performance of a deep learning model by calculating the CRs for images in a dataset with similar characteristics and reflecting them in the weights of the loss function. The CR-based recognition method is also advantageous for identifying highly similar images because it takes the similarity between classes into account. Applied to the ResNet18 model, the proposed method improved performance by 0.22% on HanDB and 3.38% on Animal-10N. The proposed method is expected to serve as a basis for artificial intelligence research using the noisy labeled data that accompanies large-scale training data.
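
The idea of reflecting confusion rates in the loss weights can be sketched as follows. This is not the paper's formulation, which the abstract does not give: the definition of CR used here (the share of a class's samples predicted as another class), the weighting rule `1 + CR`, and the confusion-matrix counts are all assumptions for illustration.

```python
import math


def confusion_rates(confusion_matrix):
    """Per-class rate: share of a class's samples predicted as another class."""
    rates = []
    for true_class, row in enumerate(confusion_matrix):
        total = sum(row)
        wrong = total - row[true_class]
        rates.append(wrong / total if total else 0.0)
    return rates


def weighted_cross_entropy(probs, label, class_weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -class_weights[label] * math.log(probs[label])


# Rows are true classes, columns are predicted classes (made-up counts).
matrix = [[90, 10, 0],
          [20, 70, 10],
          [0, 5, 95]]
rates = confusion_rates(matrix)
weights = [1.0 + r for r in rates]  # assumed weighting rule, not the paper's
loss = weighted_cross_entropy([0.2, 0.7, 0.1], label=1, class_weights=weights)
```

Under this assumed rule, classes that are confused more often contribute a larger loss, so training pays more attention to them.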

Dependence of Weibull parameters on the diameter and the internal defects of Tyranno ZMI fiber in the strength analysis

  • Morimoto, Tetsuya;Yamamoto, Koji;Ogihara, Shinji
    • Advanced Composite Materials / v.16 no.3 / pp.245-258 / 2007
  • The single-modal Weibull model has been assessed on Tyranno ZMI Si-Zr-C-O fiber to determine whether a single set of shape and scale parameters accurately reproduces the effect of diameter on strength. The tensile data of single fibers were divided into two expedient groups, a 'small diameter' group and a 'large diameter' group, for deriving the parameters, which should be consistent if the Weibull model accurately reproduces the size effect. However, the derived Weibull parameters were inconsistent between the two groups. The authors therefore concluded that the parameters of the single-modal Weibull model depend on the fiber diameter, so the model is inadequate to reproduce the strength size effect. On the other hand, the Weibull parameters were found to be consistent between the two groups when the data of 'large mirror zone' samples, defined as samples with a mirror zone of around 10% of the fracture surface area, were excluded. What is more, the exclusion reduced the strength variance more drastically in the 'large diameter' group than in the 'small diameter' group, even though the percentage of 'large mirror zone' samples was identical in the two groups. The authors therefore conclude that limiting the diameter to the 'small diameter' group level can lead to drastically less scattered strength values than those estimated through Weibull scaling for the present Tyranno ZMI Si-Zr-C-O fiber.
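
For readers unfamiliar with the size effect being tested, the following Python sketch shows the textbook single-modal (two-parameter) Weibull model and the strength scaling it predicts. It assumes volume-distributed flaws and equal gauge length, so that the stressed volume scales with the square of the diameter; these assumptions and the function names are illustrative and are not taken from the paper.

```python
import math


def failure_probability(stress, shape_m, scale_sigma0, volume_ratio=1.0):
    """Single-modal Weibull failure probability at a given stress.

    volume_ratio is V / V0, the stressed volume relative to the reference volume.
    """
    return 1.0 - math.exp(-volume_ratio * (stress / scale_sigma0) ** shape_m)


def strength_ratio_by_diameter(d_small, d_large, shape_m):
    """Predicted strength ratio (small / large) at equal gauge length.

    Assumes volume-distributed flaws, so the stressed volume scales with d**2.
    """
    return (d_large / d_small) ** (2.0 / shape_m)
```

If one set of shape and scale parameters held for every diameter, parameters fitted separately to the two diameter groups would agree, which is the consistency check the abstract describes.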

Large eddy simulation of turbulent flow using the parallel computational fluid dynamics code GASFLOW-MPI

  • Zhang, Han;Li, Yabing;Xiao, Jianjun;Jordan, Thomas
    • Nuclear Engineering and Technology / v.49 no.6 / pp.1310-1317 / 2017
  • GASFLOW-MPI is a widely used, scalable computational fluid dynamics tool for simulating turbulent flow, combustion dynamics, and other related thermal-hydraulic phenomena in nuclear power plant containments. An efficient, scalable linear solver for the large-scale pressure equation is key to the computational efficiency of GASFLOW-MPI. Several advanced Krylov subspace methods and scalable preconditioning methods are compared and analyzed to improve the computational performance. With this computational capability, the large eddy simulation turbulence model is used to resolve more detailed turbulent behavior. A backward-facing step flow is simulated to study the free shear layer, the recirculation region, and the boundary layer, which are widespread in many scientific and engineering applications. Numerical results are compared with experimental data from the literature and with direct numerical simulation results obtained by GASFLOW-MPI. Both the time-averaged velocity profile and the turbulent intensity agree well with the experimental data and the direct numerical simulation results. Furthermore, the frequency spectrum is presented, and a -5/3 energy decay is observed over a wide range of frequencies, consistent with turbulent energy spectrum theory. Parallel scaling tests are also performed on the KIT/IKET cluster, and linear scaling is achieved for GASFLOW-MPI.

ANALYSIS OF VORTEX SHEDDING PHENOMENA AROUND PANTOGRAPH PANHEAD FOR TRAIN USING LARGE EDDY SIMULATION (LES를 이용한 판토그라프 팬헤드의 와 흘림 현상 해석)

  • Jang, Yong-Jun
    • Journal of computational fluids engineering / v.16 no.2 / pp.17-23 / 2011
  • The turbulent flow and vortex shedding around the pantograph panhead of a high-speed train were investigated and compared with available experimental data and other simulations. The pantograph head was simplified to a pillar with a square cross-section and assumed to be free of interference from other bodies. The Reynolds number (Re) was 22,000. The LES (large eddy simulation) of the FDS code was applied to solve the momentum equations, and the Werner-Wengle wall model was employed to solve the near-wall turbulent flow. The Smagorinsky model ($C_s = 0.2$) was used as the SGS (subgrid scale) model. The total grid count was about 9 million, and the analyzed domain was divided into 12 blocks that communicated with each other via MPI. The time-averaged mainstream flows were calculated and compared well with experimental data. The phase-averaged quantities also agreed well with experimental data. For LES methods to be applied successfully, the near-wall turbulence should be treated carefully, either by a wall function or by direct resolution.
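
As a reminder of what the Smagorinsky SGS model computes, the sketch below evaluates the standard textbook eddy viscosity, nu_t = (C_s * delta)^2 * |S|, with |S| = sqrt(2 S_ij S_ij). Only the constant C_s = 0.2 comes from the abstract. The filter width taken as the cube root of the cell volume and the function names are assumptions, and the FDS implementation may differ in detail.

```python
import math

C_S = 0.2  # Smagorinsky constant quoted in the abstract


def strain_rate_magnitude(grad_u):
    """|S| = sqrt(2 * S_ij * S_ij) for a 3x3 gradient grad_u[i][j] = du_i/dx_j."""
    total = 0.0
    for i in range(3):
        for j in range(3):
            s_ij = 0.5 * (grad_u[i][j] + grad_u[j][i])
            total += s_ij * s_ij
    return math.sqrt(2.0 * total)


def eddy_viscosity(grad_u, dx, dy, dz, c_s=C_S):
    """Subgrid eddy viscosity nu_t = (c_s * delta)**2 * |S|.

    delta is taken as the cube root of the cell volume (an assumed filter width).
    """
    delta = (dx * dy * dz) ** (1.0 / 3.0)
    return (c_s * delta) ** 2 * strain_rate_magnitude(grad_u)
```

For a pure shear of du/dy = 10 1/s on a uniform 1 cm grid, `eddy_viscosity([[0, 10, 0], [0, 0, 0], [0, 0, 0]], 0.01, 0.01, 0.01)` gives about 4e-5 m^2/s.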

Learning Discriminative Fisher Kernel for Image Retrieval

  • Wang, Bin;Li, Xiong;Liu, Yuncai
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.3 / pp.522-538 / 2013
  • Content-based image retrieval has become an increasingly important research topic because of its wide application. It is highly challenging when applied to large-scale databases with large variance. Retrieval systems rely on a key component: predefined or learned similarity measures over images. We note that similarity measures can potentially be improved if information about the data distribution is exploited in a more sophisticated way. In this paper, we propose a similarity measure learning approach for image retrieval. The similarity measure, the so-called Fisher kernel, is derived from the probabilistic distribution of images and is a function of the observed data, hidden variables, and model parameters, where the hidden variables encode high-level information that is powerful for discrimination and was not exploited by previous methods. We further propose a discriminative learning method for the similarity measure, which encourages the learned similarity to take a large value for a pair of images with the same label and a small value for a pair of images with distinct labels. The learned similarity measure, which fully exploits the data distribution, adapts well to the dataset and would improve the retrieval system. We evaluate the proposed method on the Corel-1000, Corel5k, Caltech101, and MIRFlickr 25,000 databases. The results show that the proposed method performs competitively.

Design and Performance Test of Large-Area Susceptor for the Improvement of Temperature Uniformity (온도 균일도 향상을 위한 대면적 서셉터의 설계 및 성능 시험)

  • Yang, Hac Jin;Kim, Seong Kun;Cho, Jung Kun
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.6 / pp.3714-3721 / 2015
  • Although sheath-type heating lines are generally used for susceptor heaters, temperature uniformity deteriorates at large scale and under high-temperature conditions. We developed a new design and prototype of a susceptor using sheet metal to improve temperature uniformity. Temperature uniformity below 1.4% at a surface temperature of $450^{\circ}C$ was verified in the susceptor prototype. We also developed a kernel regression algorithm to estimate the measured temperature from temperature learning data. The reliability of the measured temperature uniformity was confirmed by a comparative analysis of predicted and measured data.
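
Kernel regression, as mentioned in this abstract, can be sketched with a Nadaraya-Watson estimator. The abstract does not say which kernel or bandwidth the authors used, so the Gaussian kernel, the one-dimensional position input, and the function name are assumptions for illustration.

```python
import math


def kernel_regression(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate at x_query using a Gaussian kernel.

    Each training value is weighted by how close its position is to x_query.
    Queries very far from every training point can underflow the weights to
    zero; this sketch does not guard against that case.
    """
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_train]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, y_train)) / total
```

With two hypothetical readings of 440 and 460 at positions 0 and 2, a query at position 1 is equally close to both and returns their average, 450.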

Text Mining in Online Social Networks: A Systematic Review

  • Alhazmi, Huda N
    • International Journal of Computer Science & Network Security / v.22 no.3 / pp.396-404 / 2022
  • Online social networks contain a large amount of data that can be converted into valuable and insightful information. Text mining approaches allow large-scale data to be explored efficiently. This study therefore reviews the recent literature on text mining in online social networks in a way that produces valid and valuable knowledge for further research. The review identifies the text mining techniques used in social networking, the data used, the tools, and the challenges. Research questions were formulated, then the search strategy and selection criteria were defined, followed by an analysis of each paper to extract the data relevant to the research questions. The results show that the social media platforms most often used as data sources are Twitter and Facebook. The most common text mining techniques were sentiment analysis and topic modeling. Classification and clustering were the most common approaches applied in the studies. The challenges include the need to process huge volumes of data, the noise, and the dynamic nature of the data. The study explores recent developments in text mining approaches in social networking by providing a general view of the state of work done in this research area.