• Title/Abstract/Keyword: Data-based analysis

Search results: 30,387 items (processing time: 0.057 sec)

Classification via principal differential analysis

  • Jang, Eunseong;Lim, Yaeji
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 28, No. 2
    • /
    • pp.135-150
    • /
    • 2021
  • We propose classification methods based on principal differential analysis (PDA). We review the computation of the squared multiple correlation function (RSQ) and PDA scores, and combine the PDA results with logistic regression for binary classification. In a simulation study covering various scenarios, we compare the PDA-based classification methods with classification based on functional principal component analysis, and the PDA-based methods classify the functional data well. For real data analysis, we consider gene expression data and observe that the PDA score-based method also performs well.
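
As a rough sketch of the classification pipeline (not the paper's exact PDA computation), the toy example below projects the first differences of simulated curves onto principal components as a stand-in for PDA scores, then fits a logistic regression for the binary label. All data and parameters are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
# two hypothetical classes of curves whose dynamics (derivatives) differ
X0 = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((40, 50))
X1 = np.sin(3 * np.pi * t) + 0.1 * rng.standard_normal((40, 50))
X = np.vstack([X0, X1])
y = np.array([0] * 40 + [1] * 40)

# crude stand-in for PDA scores: principal components of the differenced curves
scores = PCA(n_components=3).fit_transform(np.diff(X, axis=1))
clf = LogisticRegression().fit(scores, y)
print(clf.score(scores, y))
```

Because the two classes differ in their derivative structure, even this crude derivative-based score separates them almost perfectly.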

Structural health monitoring data reconstruction of a concrete cable-stayed bridge based on wavelet multi-resolution analysis and support vector machine

  • Ye, X.W.;Su, Y.H.;Xi, P.S.;Liu, H.
    • Computers and Concrete
    • /
    • Vol. 20, No. 5
    • /
    • pp.555-562
    • /
    • 2017
  • The accuracy and integrity of stress data acquired by a bridge health monitoring system are of significant importance for bridge safety assessment. However, missing and abnormal data inevitably exist in a realistic monitoring system. This paper presents a data reconstruction approach for bridge health monitoring based on wavelet multi-resolution analysis and the support vector machine (SVM). The proposed method has been applied for data imputation using data recorded by the structural health monitoring (SHM) system instrumented on a prestressed concrete cable-stayed bridge. The effectiveness and accuracy of the proposed wavelet-based SVM prediction method are examined by comparing its prediction errors with those of the traditional autoregressive moving average (ARMA) method and an SVM prediction method without wavelet multi-resolution analysis. Data reconstruction based on 5-day and 1-day continuous stress histories with obvious abnormal signals is performed to examine the effect of sample size on reconstruction accuracy. The results indicate that the proposed approach is an effective tool for missing data imputation and abnormal signal replacement, and can serve as a solid foundation for accurately evaluating the safety of bridge structures.
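
A minimal sketch of the prediction idea, with a hand-rolled one-level Haar split standing in for the full wavelet multi-resolution analysis and sklearn's SVR as the sub-band predictor. The stress history is synthetic and all parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
t = np.arange(256)
stress = np.sin(2 * np.pi * t / 64) + 0.05 * rng.standard_normal(256)

# one-level Haar split: approximation (trend) and detail (fluctuation) sub-bands
a = (stress[0::2] + stress[1::2]) / 2
d = (stress[0::2] - stress[1::2]) / 2

def svr_extrapolate(coeffs, n_ahead):
    # fit an SVR on coefficient index -> value, then predict ahead
    idx = np.arange(len(coeffs)).reshape(-1, 1)
    model = SVR(kernel="rbf", C=10.0).fit(idx, coeffs)
    future = np.arange(len(coeffs), len(coeffs) + n_ahead).reshape(-1, 1)
    return model.predict(future)

# predict 8 coefficients per sub-band, then invert the Haar split: 16 samples
a_hat = svr_extrapolate(a, 8)
d_hat = svr_extrapolate(d, 8)
recon = np.empty(16)
recon[0::2] = a_hat + d_hat
recon[1::2] = a_hat - d_hat
print(recon.shape)
```

Predicting each sub-band separately and then inverting the split is the core of the wavelet-SVM scheme; a real implementation would use a deeper multi-resolution decomposition and tuned SVM hyperparameters.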

Accurate Metabolic Flux Analysis through Data Reconciliation of Isotope Balance-Based Data

  • Kim Tae-Yong;Lee Sang-Yup
    • Journal of Microbiology and Biotechnology
    • /
    • Vol. 16, No. 7
    • /
    • pp.1139-1143
    • /
    • 2006
  • Various techniques and strategies have been developed for the identification of intracellular metabolic conditions, and among them, isotope balance-based flux analysis with gas chromatography/mass spectrometry (GC/MS) has recently become popular. Even though isotope balance-based flux analysis allows a more accurate estimation of intracellular fluxes, its application has been restricted to relatively small metabolic systems because of the limited number of measurable metabolites. In this paper, a strategy for incorporating isotope balance-based flux data obtained for a small network into metabolic flux analysis was examined as a feasible alternative that allows more accurate quantification of the intracellular flux distribution in a large metabolic system. To impose the GC/MS-based data on a large metabolic network and obtain an optimal flux distribution profile, a data reconciliation procedure was applied. As a result, metabolic flux values for 308 intracellular reactions could be estimated from 29 GC/MS-based fluxes with higher accuracy.
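
The reconciliation step can be illustrated with a toy weighted least-squares adjustment: measured fluxes are corrected as little as possible (weighted by measurement precision) so that a linear balance constraint A v = 0 holds exactly. The three-flux node and all numbers below are hypothetical, not from the paper.

```python
import numpy as np

# hypothetical three-flux node with balance v1 = v2 + v3, i.e. A v = 0
A = np.array([[1.0, -1.0, -1.0]])
m = np.array([10.4, 6.1, 3.9])                    # measured fluxes (A m = 0.4 != 0)
W = np.diag(1.0 / np.array([0.2, 0.1, 0.1])**2)   # weights = 1 / measurement variance

# minimize (v - m)' W (v - m) subject to A v = 0; closed-form solution:
Winv = np.linalg.inv(W)
K = Winv @ A.T @ np.linalg.inv(A @ Winv @ A.T)
v = m - K @ (A @ m)
print(v, A @ v)   # reconciled fluxes satisfy the balance to machine precision
```

The less precise measurement (v1, with the larger variance) absorbs most of the correction, which is exactly the behavior reconciliation is meant to provide.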

공공데이터 융합역량 수준에 따른 데이터 기반 조직 역량의 연구 (A Study on the Data-Based Organizational Capabilities by Convergence Capabilities Level of Public Data)

  • 정병호;주형근
    • Journal of the Korea Society of Digital Industry and Information Management
    • /
    • Vol. 18, No. 4
    • /
    • pp.97-110
    • /
    • 2022
  • The purpose of this study is to analyze the level of public data convergence capability in administrative organizations and to identify the important variables for data-based organizational capability. The theoretical background covers public data and its utilization, joint use, convergence, administrative organizations, and convergence constraints, as framed by the Public Data Act, the Electronic Government Act, and the Data-Based Administrative Act. The research model examines the effect of data-based administrative capability, public data operation capabilities, and public data operation constraints on data-based organizational capability, and tests whether organizational capability differs by level of data convergence capability. The analysis was conducted with hierarchical cluster analysis and multiple regression analysis. First, the hierarchical cluster analysis classified the organizations into three groups: a group that uses only structured public data, a group that uses both structured and unstructured public data, and a group that uses both public and private data. Second, the critical variables for data-based organizational operation capability were data-based administrative planning and administrative technology, supervisory organizations and technical systems for public data convergence, and constraints on data sharing and market transactions. Finally, the essential independent variables for data-based organizational capability differ by group. As a theoretical implication, this research updates management information systems research by explaining the Public Data Act, the Electronic Government Act, and the Data-Based Administrative Act. As a practical implication, reinforcing the use of public data requires promoting data standardization, improving search convenience, and eliminating lukewarm attitudes and selfish behavior toward data sharing.
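
The two-stage analysis can be sketched on synthetic survey-style data: Ward hierarchical clustering forms the capability groups, and an ordinary regression relates the predictors to the capability outcome. The variables, score ranges, and coefficients below are invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
# hypothetical 1-5 survey scores: [structured public data use,
# unstructured public data use, private data use], 20 organizations per group
X = np.vstack([
    rng.normal([4, 1, 1], 0.3, (20, 3)),   # structured public data only
    rng.normal([4, 4, 1], 0.3, (20, 3)),   # structured + unstructured public data
    rng.normal([4, 4, 4], 0.3, (20, 3)),   # public + private data
])
groups = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")

# hypothetical outcome: data-based organizational capability score
y = X @ np.array([0.2, 0.3, 0.5]) + rng.normal(0, 0.1, 60)
reg = LinearRegression().fit(X, y)
print(len(set(groups)), round(reg.score(X, y), 2))
```

In the paper the regression is then re-run per cluster, which is what reveals that the important independent variables differ by group.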

A Study on Design of Real-time Big Data Collection and Analysis System based on OPC-UA for Smart Manufacturing of Machine Working

  • Kim, Jaepyo;Kim, Youngjoo;Kim, Seungcheon
    • International Journal of Internet, Broadcasting and Communication
    • /
    • Vol. 13, No. 4
    • /
    • pp.121-128
    • /
    • 2021
  • To design a real-time big data collection and analysis system for manufacturing data in a smart factory, it is important to establish an appropriate wired/wireless communication system and protocol. This paper introduces a client/server configuration based on the latest communication protocol, OPC-UA (Open Platform Communications Unified Architecture), with an applied user-interface technology to configure a network for real-time data collection through IoT integration. A database is then designed in the MES (Manufacturing Execution System) based on an analysis table reflecting the user's requirements, using data extracted from a new cutting-process automation line, a bush inner-diameter indentation measurement system, and a tool monitoring/inspection system. In summary, the big data analysis system introduced in this paper performs SPC (Statistical Process Control) analysis and visualization analysis over an OPC-UA-based wired/wireless communication interface. Through AI learning models built with the XGBoost (eXtreme Gradient Boosting) and LR (Linear Regression) algorithms, quality and visualization analysis is carried out, with storage of the results and connection to the cloud.
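
The SPC step can be illustrated with a classical X-bar control chart. The subgroup structure and the chart constant A2 = 0.577 (standard for subgroups of 5) are textbook values; the part dimension and the measurements themselves are simulated, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
# simulated bush inner-diameter measurements: 25 subgroups of 5 parts (mm)
d = rng.normal(12.000, 0.003, (25, 5))

xbar = d.mean(axis=1)                           # subgroup means
rbar = (d.max(axis=1) - d.min(axis=1)).mean()   # mean subgroup range
A2 = 0.577                                      # X-bar chart constant for n = 5
ucl = xbar.mean() + A2 * rbar                   # upper control limit
lcl = xbar.mean() - A2 * rbar                   # lower control limit
out_of_control = (xbar > ucl) | (xbar < lcl)
print(round(ucl, 4), round(lcl, 4), out_of_control.any())
```

Subgroup means falling outside [lcl, ucl] would be flagged to the operator; in the paper this logic sits behind the OPC-UA data feed and the visualization layer.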

Matlab을 활용한 빅데이터 기반 분석 시스템 연구 (Research on the Analysis System based on the Big Data for Matlab)

  • 주문일;김희철
    • The Korea Institute of Information and Communication Engineering: Conference Proceedings
    • /
    • 2016 Fall Conference of the Korea Institute of Information and Communication Engineering
    • /
    • pp.96-98
    • /
    • 2016
  • Recently, big data technology has been advancing due to the rapid generation of data, and various tools for analyzing big data have been developed. Representative big data analysis tools include the R program, Hive, and Tajo. However, data analysis and algorithm development using Matlab remain common, and Matlab is also widely used for big data analysis. This paper studies a big data-based analysis system using Matlab for analyzing biosignals.


Data anomaly detection and Data fusion based on Incremental Principal Component Analysis in Fog Computing

  • Yu, Xue-Yong;Guo, Xin-Hui
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 14, No. 10
    • /
    • pp.3989-4006
    • /
    • 2020
  • Intelligent agriculture monitoring is based on the perception and analysis of environmental data, which enables monitoring of the production environment and control of environmental regulation equipment. As the scale of such applications continues to expand, a large amount of data is generated at the perception layer and uploaded to cloud services, which brings challenges of insufficient bandwidth and processing capacity. A fog-based offline and real-time hybrid data analysis architecture is proposed in this paper, which combines offline and real-time analysis to enable real-time data processing on resource-constrained IoT devices. Furthermore, we propose a data processing algorithm based on incremental principal component analysis, which achieves data dimensionality reduction and updating of the principal components. We also introduce the concept of the Squared Prediction Error (SPE) value and realize anomaly detection through the combination of the SPE value and a data fusion algorithm. To ensure the accuracy and effectiveness of the algorithm, we design a regular-SPE hybrid model update strategy, which enables the principal components to be updated on demand when data anomalies are found. In addition, this strategy significantly reduces the growth in resource consumption due to the data analysis architecture. Simulations based on practical datasets confirm that the proposed algorithm can perform data fusion and exception processing in real time on resource-constrained devices, and that our model update strategy reduces overall system resource consumption while ensuring the accuracy of the algorithm.
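
The SPE-based detection can be sketched with sklearn's IncrementalPCA: fit principal components in mini-batches, define SPE as the squared reconstruction residual, and flag samples whose SPE exceeds a percentile threshold. The sensor model, fault injection, and threshold choice are hypothetical stand-ins for the paper's scheme.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(3)
# synthetic 4-channel sensor vectors lying near a 2-D plane plus noise
latent = rng.standard_normal((500, 2))
X = latent @ rng.standard_normal((2, 4)) + 0.05 * rng.standard_normal((500, 4))

ipca = IncrementalPCA(n_components=2)
for batch in np.array_split(X, 5):          # stream the data in mini-batches
    ipca.partial_fit(batch)

def spe(samples):
    # Squared Prediction Error: squared residual after PCA reconstruction
    resid = samples - ipca.inverse_transform(ipca.transform(samples))
    return np.sum(resid**2, axis=1)

threshold = np.percentile(spe(X), 99)       # empirical control limit
anomaly = X[:1] + np.array([0.0, 5.0, 0.0, 0.0])  # injected single-channel fault
print(spe(anomaly)[0] > threshold)
```

Because `partial_fit` consumes one batch at a time, the same loop works on a fog node that never holds the full dataset; the paper's regular-SPE strategy additionally triggers extra `partial_fit` calls when anomalies accumulate.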

A Study on the Sentiment Analysis of City Tour Using Big Data

  • Se-won Jeon;Gi-Hwan Ryu
    • International Journal of Internet, Broadcasting and Communication
    • /
    • Vol. 15, No. 2
    • /
    • pp.112-117
    • /
    • 2023
  • This study aims to identify tourists' interests and perceptions through online big data. Data for a total of five years, from 2018 to 2022, were collected using the Textom program, and sentiment analysis was performed on the collected data. The analysis captures the perceived necessity of city tours and the emotions expressed in online reviews written by city tour participants, and extracts and analyzes keywords representing satisfaction. The sentiment analysis module of the big data analysis platform TEXTOM was used to study positive and negative sentiment in tourists' online reviews of city tours, measuring the degree of positive and negative emotion and the emotional words associated with each item. The results show 93.8% positive and 6.2% negative sentiment, indicating that the large majority of tourists perceive city tours positively. Based on this analysis, the study collects tourists' opinions, characterizes the quality attributes of city tours, and provides important information to regional city tour platforms.
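
The headline split is a simple label ratio. A hypothetical miniature of 15 positive and 1 negative labeled reviews (invented here, not the paper's corpus) shows the arithmetic behind percentages of this form:

```python
# hypothetical lexicon labels for 16 city-tour reviews: 15 positive, 1 negative
labels = ["pos"] * 15 + ["neg"] * 1
pos_pct = 100 * labels.count("pos") / len(labels)
neg_pct = 100 * labels.count("neg") / len(labels)
print(round(pos_pct, 1), round(neg_pct, 1))  # 93.8 6.2
```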

Road Surface Data Collection and Analysis using A2B Communication in Vehicles from Bearings and Deep Learning Research

  • Young-Min KIM;Jae-Yong HWANG;Sun-Kyoung KANG
    • Korean Journal of Artificial Intelligence
    • /
    • Vol. 11, No. 4
    • /
    • pp.21-27
    • /
    • 2023
  • This paper discusses a deep learning-based road surface analysis system that collects data from vibration sensors installed on a vehicle's four wheel bearings, analyzes the data, and classifies the characteristics of the current driving surface for use in the vehicle's control system. The data used for road surface analysis is real-time, high-volume data at 48K samples per second, and the A2B protocol, used for high-capacity real-time data communication in modern vehicles, was used to collect it. CAN and CAN-FD, commonly used in vehicle communication, cannot support real-time road surface analysis, which requires a minimum of 24K samples/sec, due to bandwidth limitations; by using A2B communication, data was collected at the maximum bandwidth for real-time analysis. Based on the collected data, performance was assessed using deep learning models such as LSTM, GRU, and RNN. The results showed similar road surface classification performance across all models. It was also observed that the quality of the data used during training affected the performance of each model.
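
The bandwidth argument can be checked with back-of-the-envelope arithmetic. The channel count, sample width, and bus rates below are assumptions chosen for illustration (A2B's nominal rate and CAN-FD's peak data-phase rate), not figures from the paper:

```python
# raw payload of four 48 kS/s vibration channels at an assumed 16 bits/sample
samples_per_sec = 48_000
channels = 4
bits_per_sample = 16
payload_bps = samples_per_sec * channels * bits_per_sample  # 3,072,000 bit/s

# assumed bus capacities: CAN-FD peak data-phase rate vs. A2B nominal rate
can_fd_bps = 5_000_000     # framing overhead roughly halves effective throughput
a2b_bps = 50_000_000
print(payload_bps <= a2b_bps)             # A2B carries the stream comfortably
print(payload_bps <= can_fd_bps * 0.5)    # CAN-FD's effective rate falls short
```

Under these assumptions the raw payload alone exceeds CAN-FD's effective throughput while using only a few percent of the A2B bus, which matches the paper's motivation for choosing A2B.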

An Adequacy Based Test Data Generation Technique Using Genetic Algorithms

  • Malhotra, Ruchika;Garg, Mohit
    • Journal of Information Processing Systems
    • /
    • Vol. 7, No. 2
    • /
    • pp.363-384
    • /
    • 2011
  • As the complexity of software is increasing, generating an effective test data has become a necessity. This necessity has increased the demand for techniques that can generate test data effectively. This paper proposes a test data generation technique based on adequacy based testing criteria. Adequacy based testing criteria uses the concept of mutation analysis to check the adequacy of test data. In general, mutation analysis is applied after the test data is generated. But, in this work, we propose a technique that applies mutation analysis at the time of test data generation only, rather than applying it after the test data has been generated. This saves significant amount of time (required to generate adequate test cases) as compared to the latter case as the total time in the latter case is the sum of the time to generate test data and the time to apply mutation analysis to the generated test data. We also use genetic algorithms that explore the complete domain of the program to provide near-global optimum solution. In this paper, we first define and explain the proposed technique. Then we validate the proposed technique using ten real time programs. The proposed technique is compared with path testing technique (that use reliability based testing criteria) for these ten programs. The results show that the adequacy based proposed technique is better than the reliability based path testing technique and there is a significant reduce in number of generated test cases and time taken to generate test cases.