• Title/Summary/Keyword: Large-scale Analysis Data


Design of Data Generating for Fast Searching and Customized Service for Underground Utility Facilities (지하공동구 관리를 위한 고속 검색 데이터 생성 및 사용자 맞춤형 서비스 방안 설계)

  • Park, Jonghwa;Jeon, Jihye;Park, Gooman
    • Journal of Broadcast Engineering / v.26 no.4 / pp.390-397 / 2021
  • As digital twin technology is applied across industrial fields, technologies for processing large amounts of data effectively are required. In this paper, we discuss a customized service method for fast search and effective delivery of large-scale data for the management of underground utility facilities. The proposed scheme has two parts, a fast-search data generation method and a customized information service segmentation method, which together allow vast amounts of data to be searched and abbreviated efficiently. For fast-search data generation, we discuss how to configure the synchronization process for time-series analysis of sensor data collected in the underground facility and how to attach additional information as the data are reduced. For the user-customized service, we define the types of users in normal and disaster situations and discuss how to serve each type. We expect this study to support the development of a systematic data generation and service model for underground utility management in which large-scale data can be searched and received effectively in a disaster situation.

A Study Of EIS Build Method by ISP Base for Large Scale Enterprise Associate Company (대기업 협력 업체를 위한 ISP 기반의 EIS 구축 방법에 관한 연구)

  • Kim, Soo-Kyum;Ha, Soo-Cheol
    • Journal of the Korea Society of Computer and Information / v.15 no.1 / pp.159-166 / 2010
  • This paper examines an ISP-based EIS consulting method of the ERP-linkage type for partner companies of large enterprises. We propose an IT strategy plan and methodology to address several problems, including standardization, that arise when an EIS is introduced and operated between large and small enterprises. Integration of business and information technology is achieved through the management group's continued attention to IT and its future information strategy. The paper also proposes an ISP planning method (environment analysis, current-state analysis, IT analysis, target IT plan, etc.) and EIS construction based on real-time ERP data. In addition, we suggest applying the system to an electronics industry model and the LCD/LED field.

Generation of Large-scale Map of Surface Sedimentary Facies in Intertidal Zone by Using UAV Data and Object-based Image Analysis (OBIA) (UAV 자료와 객체기반영상분석을 활용한 대축척 갯벌 표층 퇴적상 분류도 작성)

  • Kim, Kye-Lim;Ryu, Joo-Hyung
    • Korean Journal of Remote Sensing / v.36 no.2_2 / pp.277-292 / 2020
  • The purpose of this study is to demonstrate the feasibility of precise surface sedimentary facies classification and to propose a more accurate classification method by generating a large-scale map of surface sedimentary facies from UAV data and object-based image analysis (OBIA) for the Hwang-do tidal flat in Cheonsu Bay. From the very-high-resolution UAV data, we extracted factors that affect the classification of surface sedimentary facies, such as RGB ortho imagery, a digital elevation model (DEM), and tidal channel density, and analyzed the principal components of surface sedimentary facies using statistical methods. Based on the principal components, the input data for classification were divided into three cases: (1) visible band spectrum; (2) topographic elevation and tidal channel density; and (3) visible band spectrum, topographic elevation, and tidal channel density. The object-based image analysis classification method was then applied to map surface sedimentary facies for each input case. The surface sedimentary facies could be classified into six facies following the Folk classification criteria. Combining visible band spectrum, topographic elevation, and tidal channel density gave the most effective classification, with an overall accuracy of 63.04% and a Kappa coefficient of 0.54.
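
The overall accuracy and Kappa coefficient reported above are both derived from a confusion matrix. The following is a minimal sketch of that calculation, assuming rows hold reference classes and columns hold assigned classes; the function name and the 3-class matrix are hypothetical and are not data from the paper.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] = number of samples of reference class i assigned to class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical 3-class confusion matrix (illustration only).
example = [
    [40, 5, 5],
    [10, 30, 10],
    [5, 5, 40],
]
accuracy, kappa = overall_accuracy_and_kappa(example)
print(f"overall accuracy = {accuracy:.4f}, kappa = {kappa:.4f}")
```

Kappa is lower than overall accuracy because it discounts the agreement that the class marginals alone would produce by chance.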

Design and Implementation of a Hadoop-based Efficient Security Log Analysis System (하둡 기반의 효율적인 보안로그 분석시스템 설계 및 구현)

  • Ahn, Kwang-Min;Lee, Jong-Yoon;Yang, Dong-Min;Lee, Bong-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.8 / pp.1797-1804 / 2015
  • An integrated log management system can help predict security risks, improve an organization's security level, and support the preparation of an appropriate security policy. In this paper, we design and implement a Hadoop-based log analysis system that uses a distributed database model to store large amounts of data and reduces analysis time by automating the log collection procedure. The proposed system uses HBase to store large amounts of data efficiently in a scale-out fashion, and we propose a simple data storage scheme for analyzing data with Hadoop-based regular expressions, which improves data processing speed compared with the existing system.
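
As a rough illustration of pattern-based log parsing of the kind this abstract mentions, the sketch below extracts fields from log lines with a regular expression and counts warnings per source address. The log format, field names, and sample lines are all hypothetical; the paper's actual HBase schema and Hadoop jobs are not described in the abstract and are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical log format: "<date> <time> <LEVEL> src=<ipv4> msg=<text>"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) src=(?P<src>\d{1,3}(?:\.\d{1,3}){3}) msg=(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# Made-up sample lines (illustration only).
sample_lines = [
    "2015-03-01 10:15:02 WARN src=10.0.0.7 msg=failed login",
    "2015-03-01 10:15:09 INFO src=10.0.0.8 msg=session opened",
    "not a log line",
    "2015-03-01 10:16:44 WARN src=10.0.0.7 msg=failed login",
]

# Keep only lines that match, then count warnings per source address.
records = [r for r in map(parse_line, sample_lines) if r is not None]
warnings_by_source = Counter(r["src"] for r in records if r["level"] == "WARN")
print(warnings_by_source)
```

In a distributed setting, the parsing step would typically run per input split and the counting step would be merged afterwards; that wiring is omitted here.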

A guideline for the statistical analysis of compositional data in immunology

  • Yoo, Jinkyung;Sun, Zequn;Greenacre, Michael;Ma, Qin;Chung, Dongjun;Kim, Young Min
    • Communications for Statistical Applications and Methods / v.29 no.4 / pp.453-469 / 2022
  • The study of immune cellular composition has been of great scientific interest in immunology because multiple large-scale datasets have been generated. From a statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive, and all the elements sum to a constant, which can generally be set to one. Standard statistical methods are not directly applicable to the analysis of compositional data because they do not appropriately handle correlations between the compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and on the alternative approach of Dirichlet regression, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
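
To make the idea of a log-ratio transformation concrete, here is a minimal sketch of the centered log-ratio (CLR) transform, one common member of that family. The abstract does not say which log-ratio transformations the paper uses, so CLR is an assumption, and the four-part composition below is made up.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to one."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centered log-ratio: log of each part relative to the geometric mean."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


# Hypothetical cell-type counts (illustration only).
fractions = closure([30.0, 50.0, 15.0, 5.0])
transformed = clr(fractions)
print(fractions)
print(transformed)
```

The transformed values sum to zero and do not depend on the constant the composition was closed to, which is why standard regression tools can be applied to them more safely than to the raw fractions.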

Comparison analysis of big data integration models (빅데이터 통합모형 비교분석)

  • Jung, Byung Ho;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society / v.28 no.4 / pp.755-768 / 2017
  • As big data becomes the core of the fourth industrial revolution, big data processing and analysis capabilities are expected to influence companies' future competitiveness. Although RHadoop and RHIPE, which integrate R with the Hadoop environment, have each been discussed separately, few studies have compared them. In this paper, we built the RHadoop and RHIPE big data platforms for large-scale data and implemented machine learning algorithms, namely multiple regression and logistic regression, on the MapReduce framework. We studied the performance and scalability of these implementations for various sample sizes of actual and simulated data. The experiments demonstrated that our RHadoop and RHIPE implementations scale well and process large data sets efficiently on commodity hardware. RHIPE was faster than RHadoop on almost all of the data sets.
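
Least-squares regression fits MapReduce naturally because each data chunk can be reduced to a few sums that are then added together. The sketch below shows this for a single predictor in plain Python rather than R, RHadoop, or RHIPE; it illustrates the general pattern only and is not the authors' implementation. All function names and the data are hypothetical.

```python
from functools import reduce


def map_chunk(chunk):
    """Map step: summarize one chunk of (x, y) pairs as sufficient statistics."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: sufficient statistics from two chunks simply add."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats):
    """Solve the normal equations for y = intercept + slope * x."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Made-up data split into three chunks, as if stored on separate nodes.
chunks = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]
intercept, slope = fit_line(reduce(combine, map(map_chunk, chunks)))
print(intercept, slope)
```

With several predictors the same idea applies, with each chunk contributing its share of the cross-product matrices instead of scalar sums.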

F_MixBERT: Sentiment Analysis Model using Focal Loss for Imbalanced E-commerce Reviews

  • Fengqian Pang;Xi Chen;Letong Li;Xin Xu;Zhiqiang Xing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.2 / pp.263-283 / 2024
  • Users' comments after online shopping are critical to product reputation and business improvement. These comments, sometimes known as e-commerce reviews, influence other customers' purchasing decisions. To cope with large volumes of e-commerce reviews, automatic analysis based on machine learning and deep learning is drawing increasing attention. A core task therein is sentiment analysis. However, e-commerce reviews exhibit the following characteristics: (1) inconsistency between comment content and the star rating; (2) a large amount of unlabeled data, i.e., comments without a star rating; and (3) data imbalance caused by sparse negative comments. This paper employs Bidirectional Encoder Representations from Transformers (BERT), one of the best natural language processing models, as the base model. Given these data characteristics, we propose the F_MixBERT framework to make more effective use of inconsistent, low-quality, and unlabeled data and to resolve the problem of data imbalance. In the framework, the proposed MixBERT incorporates the MixMatch approach into BERT's high-dimensional vectors to train on the unlabeled and low-quality data with generated pseudo labels. Meanwhile, data imbalance is resolved by focal loss, which down-weights the contribution of large-scale and easily identifiable data to the total loss. Comparative experiments demonstrate that the proposed framework outperforms BERT and MixBERT for sentiment analysis of e-commerce comments.
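
The effect of focal loss on easy versus hard examples can be seen in a few lines. This is a minimal per-sample sketch of the standard focal loss formula, FL(p) = -alpha * (1 - p)^gamma * log(p), where p is the predicted probability of the true class; the values of `gamma`, `alpha`, and the two example probabilities are assumptions, not settings from the paper.

```python
import math


def cross_entropy(p_true):
    """Cross-entropy for one sample, given the probability of its true class."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample; gamma=0 and alpha=1 recover cross-entropy."""
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


# Hypothetical predictions: one confidently correct, one poorly classified.
easy, hard = 0.95, 0.30
easy_ratio = focal_loss(easy) / cross_entropy(easy)  # equals (1 - 0.95) ** 2
hard_ratio = focal_loss(hard) / cross_entropy(hard)  # equals (1 - 0.30) ** 2
print(easy_ratio, hard_ratio)
```

The easy example keeps only a small fraction of its cross-entropy loss while the hard example keeps about half, so training attention shifts toward the rare, difficult samples.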

DNA Pooling as a Tool for Case-Control Association Studies of Complex Traits

  • Ahn, Chul;King, Terri M.;Lee, Kyusang;Kang, Seung-Ho
    • Genomics & Informatics / v.3 no.1 / pp.1-7 / 2005
  • Case-control studies are widely used for disease gene mapping using individual genotyping data. However, analyses of large samples are often impractical because of the expense of individual genotyping. DNA pooling can significantly reduce the number of genotyping reactions required, thereby reducing the cost of large-scale case-control association studies. Here, we discuss the design and analysis of DNA pooling genetic association studies.

Text Mining in Online Social Networks: A Systematic Review

  • Alhazmi, Huda N
    • International Journal of Computer Science & Network Security / v.22 no.3 / pp.396-404 / 2022
  • Online social networks contain a large amount of data that can be converted into valuable and insightful information. Text mining approaches allow large-scale data to be explored efficiently. This study therefore reviews the recent literature on text mining in online social networks in a way that produces valid and valuable knowledge for further research. The review identifies the text mining techniques used in social networking, the data used, the tools, and the challenges. Research questions were formulated, then the search strategy and selection criteria were defined, followed by an analysis of each paper to extract the data relevant to the research questions. The results show that the social media platforms most often used as data sources are Twitter and Facebook. The most common text mining techniques were sentiment analysis and topic modeling. Classification and clustering were the approaches most commonly applied by the studies. The challenges include the need to process huge volumes of data, noise, and the dynamic nature of the data. The study explores recent developments in text mining approaches in social networking by providing a general overview of the state of work in this research area.

Application of a Non-stationary Frequency Analysis Method for Estimating Probable Precipitation in Korea (전국 확률강수량 산정을 위한 비정상성 빈도해석 기법의 적용)

  • Kim, Gwang-Seob;Lee, Gi-Chun
    • Journal of The Korean Society of Agricultural Engineers / v.54 no.5 / pp.141-153 / 2012
  • In this study, we estimated probable precipitation amounts for the target years (2020, 2030, and 2040) at 55 weather stations in Korea using 24-hour annual maximum precipitation data from 1973 through 2009; the estimates should be useful for managing agricultural reservoirs. Both trend tests and non-stationarity tests were performed, and non-stationary frequency analysis was conducted for all 55 sites. The Gumbel distribution was chosen, and the probability weighted moment method was used to estimate the model parameters. The behavior of the mean of the extreme precipitation data, the scale parameter, and the location parameter was analyzed. The probable precipitation amount at each target year was estimated by non-stationary frequency analysis, using linear regression for the mean of the extreme precipitation data, the scale parameter, and the location parameter. Overall, the results showed that the probable precipitation amounts obtained from the non-stationary frequency analysis were overestimated. Probable precipitation increased greatly in the central part of Korea and decreased at several sites in the southern part. Non-stationary frequency analysis with a linear model should be applicable to relatively short projection periods.
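
The stationary building block of this analysis, fitting a Gumbel distribution by probability weighted moments and reading off the precipitation for a given return period, can be sketched as follows. The sketch uses the standard unbiased sample moments b0 and b1 and the Gumbel relations scale = (2*b1 - b0) / ln 2 and location = b0 - 0.5772 * scale. The annual maxima are made up, and the non-stationary step (regressing the parameters on time) is deliberately left out.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_pwm(sample):
    """Estimate Gumbel (location, scale) by probability weighted moments."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    # Unbiased b1: weight the value of 0-based rank i by i / (n - 1).
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    scale = (2.0 * b1 - b0) / math.log(2.0)
    location = b0 - EULER_GAMMA * scale
    return location, scale


def gumbel_quantile(location, scale, return_period):
    """Value exceeded on average once every `return_period` years."""
    p = 1.0 - 1.0 / return_period  # non-exceedance probability
    return location - scale * math.log(-math.log(p))


# Hypothetical 24-hour annual maxima in mm (illustration only).
annual_maxima = [112.0, 95.5, 143.2, 130.1, 88.4, 176.9, 101.3, 121.7, 158.0, 109.6]
location, scale = gumbel_pwm(annual_maxima)
for years in (10, 50, 100):
    print(years, round(gumbel_quantile(location, scale, years), 1))
```

A non-stationary variant would replace the constant location and scale with values projected to the target year, for example from a linear trend fitted to parameters estimated over moving windows.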