• Title/Summary/Keyword: Data-based analysis


An Adequacy Based Test Data Generation Technique Using Genetic Algorithms

  • Malhotra, Ruchika; Garg, Mohit
    • Journal of Information Processing Systems / v.7 no.2 / pp.363-384 / 2011
  • As software complexity increases, generating effective test data has become a necessity, which has increased the demand for techniques that can do so efficiently. This paper proposes a test data generation technique based on adequacy-based testing criteria, which use mutation analysis to check the adequacy of the test data. In general, mutation analysis is applied after the test data have been generated; in this work, we instead apply mutation analysis during test data generation. This saves a significant amount of time compared with the conventional approach, where the total time is the sum of the time to generate the test data and the time to apply mutation analysis to it afterwards. We also use genetic algorithms, which explore the complete input domain of the program to provide a near-global optimum solution. We first define and explain the proposed technique, then validate it on ten real-time programs and compare it with a path testing technique that uses reliability-based testing criteria. The results show that the proposed adequacy-based technique outperforms the reliability-based path testing technique, with a significant reduction in both the number of generated test cases and the time taken to generate them.
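
The technique above folds mutation analysis into the generation loop itself, using the mutation score as the genetic algorithm's fitness function. A minimal sketch of that idea is given below; the program under test, its hand-made mutants, and the GA settings are illustrative assumptions, not the programs or mutation operators used in the paper.

```python
# Sketch: mutation adequacy as the GA fitness, evaluated while test data is generated.
# The program under test, the mutants, and the GA settings are illustrative only.
import random

def original(a, b):                     # program under test
    return a + b if a > b else a - b

MUTANTS = [                             # hand-made mutants (operator replacements)
    lambda a, b: a - b if a > b else a - b,
    lambda a, b: a + b if a >= b else a - b,
    lambda a, b: a + b if a > b else a + b,
]

def fitness(test_suite):
    """Mutation score: fraction of mutants killed by at least one test case."""
    killed = sum(1 for m in MUTANTS
                 if any(m(a, b) != original(a, b) for a, b in test_suite))
    return killed / len(MUTANTS)

def random_suite(size=4, lo=-10, hi=10):
    return [(random.randint(lo, hi), random.randint(lo, hi)) for _ in range(size)]

def crossover(p1, p2):
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(suite, rate=0.2):
    return [((random.randint(-10, 10), random.randint(-10, 10))
             if random.random() < rate else tc) for tc in suite]

def evolve(pop_size=20, generations=50):
    population = [random_suite() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == 1.0:        # adequate: all mutants killed
            break
        parents = population[: pop_size // 2]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

print(evolve())
```

Because fitness is the fraction of mutants killed, the search can stop as soon as an adequate suite is found, which is where the claimed saving over generate-then-check comes from.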

Development of Realtime GRID Analysis Method based on the High Precision Streaming Data

  • Lee, HyeonSoo; Suh, YongCheol
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.34 no.6 / pp.569-578 / 2016
  • With recent advances in surveying technology, the acquisition rate and precision of spatial data have continually improved. Because spatial data are updated rapidly and grow in size as the technology advances, the LOD (Level of Detail) algorithm has been adopted to express data in real time in a streaming format, with the spatial data divided precisely into separate levels. Existing GRID analysis uses a single DEM as-is, examining and analyzing all data including those outside the analysis area, so the analysis time grows in proportion to the amount of data. This study therefore proposes a method that reduces analysis time and data throughput by acquiring and analyzing, in real time, only the DEM data needed for the GRID analysis, selected according to the analysis area and the required level of precision, targeting the streaming DEM data most commonly used for 3D geographic information services.
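
The core of the method is to stream only the DEM tiles that intersect the analysis extent, at a level of detail matched to the required precision, rather than loading a single full-coverage DEM. A rough sketch of that tile-selection step follows; the tile sizes per LOD level and the fetch_tile() callback are assumptions for illustration, not the service interface used in the paper.

```python
# Sketch: pick only the streaming DEM tiles needed for a GRID analysis,
# selected by analysis extent and LOD level. The tile size per level and the
# fetch_tile() callback are illustrative assumptions.
from math import floor

TILE_SIZE_BY_LEVEL = {0: 1000.0, 1: 500.0, 2: 250.0, 3: 125.0}  # metres per tile edge

def tiles_for_extent(xmin, ymin, xmax, ymax, level):
    """Return (level, col, row) keys for every tile intersecting the extent."""
    size = TILE_SIZE_BY_LEVEL[level]
    cols = range(floor(xmin / size), floor(xmax / size) + 1)
    rows = range(floor(ymin / size), floor(ymax / size) + 1)
    return [(level, c, r) for c in cols for r in rows]

def load_dem_subset(extent, level, fetch_tile):
    """fetch_tile(key) -> 2D elevation array; only the requested tiles are streamed."""
    return {key: fetch_tile(key) for key in tiles_for_extent(*extent, level)}

# Example: a 2 km x 1 km analysis window at LOD level 2
keys = tiles_for_extent(305000, 170000, 307000, 171000, level=2)
print(len(keys), "tiles requested instead of the full DEM")
```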

BaSDAS: a web-based pooled CRISPR-Cas9 knockout screening data analysis system

  • Park, Young-Kyu; Yoon, Byoung-Ha; Park, Seung-Jin; Kim, Byung Kwon; Kim, Seon-Young
    • Genomics & Informatics / v.18 no.4 / pp.46.1-46.4 / 2020
  • We developed BaSDAS (Barcode-Seq Data Analysis System), a GUI-based pooled knockout screening data analysis system, to let researchers with limited bioinformatics skills analyze pooled knockout screen data easily and effectively. BaSDAS supports the analysis of various pooled screening libraries, including yeast, human, and mouse libraries, and provides many useful statistical and visualization functions through a convenient, user-friendly web interface. We expect BaSDAS to be a useful tool for the analysis of genome-wide screening data and to support the development of novel drugs based on functional genomics information.

Categorical Data Analysis by Means of Echelon Analysis with Spatial Scan Statistics

  • Moon, Sung-Ho
    • Journal of the Korean Data and Information Science Society / v.15 no.1 / pp.83-94 / 2004
  • In this study we analyze categorical data by means of spatial scan statistics and echelon analysis. We first determine the hierarchical structure of a given contingency table using an echelon dendrogram and then detect hotspot candidates, given as the top echelons in the dendrogram. Next, we evaluate spatial scan statistics for zones with significantly high or low rates based on the likelihood ratio. Finally, we detect hotspots of any size and shape based on the spatial scan statistics.
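
The scan step evaluates, for each candidate zone from the echelon stage, a likelihood ratio comparing the rate inside the zone with the rate outside. A minimal sketch using the Poisson-based form commonly used for spatial scan statistics is shown below; the zone counts and expectations are made-up numbers, not data from the paper.

```python
# Sketch: Poisson likelihood-ratio scan statistic evaluated over candidate zones,
# e.g. the top-echelon zones from an echelon dendrogram. Counts are illustrative.
from math import log

def poisson_llr(c, e, C, E):
    """Log likelihood ratio for a zone with c cases / e expected,
    out of C total cases / E total expected (Kulldorff-style form)."""
    if c == 0 or c / e <= (C - c) / (E - e):      # only flag elevated zones
        return 0.0
    return c * log(c / e) + (C - c) * log((C - c) / (E - e)) - C * log(C / E)

# Candidate zones (observed cases, expected cases) from the echelon stage
zones = {"A": (40, 25.0), "B": (12, 14.0), "C": (30, 18.0)}
C = sum(c for c, _ in zones.values())
E = sum(e for _, e in zones.values())

scores = {name: poisson_llr(c, e, C, E) for name, (c, e) in zones.items()}
print(scores, "-> most likely hotspot:", max(scores, key=scores.get))
```

In practice the maximum statistic would be compared against Monte Carlo replications under the null hypothesis to assess significance.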


A Novel Data Prediction Model using Data Weights and Neural Network based on R for Meaning Analysis between Data (데이터간 의미 분석을 위한 R기반의 데이터 가중치 및 신경망기반의 데이터 예측 모형에 관한 연구)

  • Jung, Se Hoon; Kim, Jong Chan; Sim, Chun Bo
    • Journal of Korea Multimedia Society / v.18 no.4 / pp.524-532 / 2015
  • All data created in the big data era potentially contain meaning and correlations with other data. Every day, a wide variety of data is created and stored across all sectors of society, and research on analyzing and understanding the meaning shared between data is progressing rapidly. In particular, the accuracy of predicting meaning between data and the problem of data imbalance are important issues in the data analysis field. In this paper, we propose an R-based data prediction model that uses data weights and a neural network for meaning analysis between data. The proposed model consists of a classification model and an analysis model: the classification model applies weights based on the normal distribution and selects optimal independent variables through multiple regression analysis, while the analysis model increases the prediction accuracy of the output variable through a neural network. In the performance evaluation, the proposed model achieved a prediction accuracy of 87.475% on the raw data, confirming its superiority.
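
The original model is implemented in R; the sketch below restates the two-stage idea in Python with synthetic data: normal-distribution weights and regression-based variable selection first, then a small neural network for prediction. The weighting scheme, selection threshold, and network size are illustrative assumptions, not the paper's specification.

```python
# Sketch of the two-stage idea: (1) weight observations with a normal-density weight
# and keep the regression variables that matter, (2) train a small neural network
# on the selected predictors. Synthetic data, illustrative thresholds.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # 5 candidate predictors
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.3, size=200)

# Stage 1a: normal-distribution weights (down-weight extreme observations)
z = (y - y.mean()) / y.std()
w = np.exp(-0.5 * z ** 2)                          # Gaussian weight per row

# Stage 1b: weighted least squares, keep predictors with large coefficients
beta = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
keep = np.abs(beta) > 0.5                          # crude variable-selection threshold
X_sel = X[:, keep]

# Stage 2: one-hidden-layer network trained by plain gradient descent
W1 = rng.normal(scale=0.1, size=(X_sel.shape[1], 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 1));              b2 = np.zeros(1)
for _ in range(2000):
    h = np.tanh(X_sel @ W1 + b1)
    err = (h @ W2 + b2).ravel() - y
    gW2 = h.T @ err[:, None] / len(y);  gb2 = err.mean(keepdims=True)
    gh = err[:, None] @ W2.T * (1 - h ** 2)
    gW1 = X_sel.T @ gh / len(y);        gb1 = gh.mean(axis=0)
    W2 -= 0.1 * gW2; b2 -= 0.1 * gb2; W1 -= 0.1 * gW1; b1 -= 0.1 * gb1

print("kept predictors:", np.flatnonzero(keep), "RMSE:", np.sqrt((err ** 2).mean()))
```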

A Study on the Meaning of The First Slam Dunk Based on Text Mining and Semantic Network Analysis

  • Kyung-Won Byun
    • International journal of advanced smart convergence / v.12 no.1 / pp.164-172 / 2023
  • In this study, we examine the public perception of 'The First Slam Dunk', a sports-based cartoon gaining popularity, through big data analysis of social media channels, and provide basic data for developing various kinds of content in the sports industry. Social big data were collected from news provided on the Naver and Google sites, from January 1, 2023 to February 15, 2023, a period chosen with reference to the Korean release date of 'The First Slam Dunk'. The collected data comprised 2,106 Naver news items and 1,019 Google news items. TF and TF-IDF were computed through text mining of these data, and semantic network analysis was then conducted on 60 keywords. Big data analysis programs such as Textom and UCINET were used for the social big data analysis, and NetDraw was used for visualization. Considering TF and TF-IDF, the most frequent keyword related to the subject was 'The First Slam Dunk', which appeared 4,079 times, followed by 'Slam Dunk', 'Movie', 'Premiere', 'Animation', 'Audience', and 'Box-Office'. Based on these results, the 60 most frequently appearing keywords were extracted, and semantic network and centrality analyses were conducted. Finally, a total of six clusters (competing movie, cartoon, passion, premiere, attention, box office) were formed through CONCOR analysis. Based on this semantic network analysis of 'The First Slam Dunk', basic data for planning the development of sports content were provided.
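
The TF / TF-IDF ranking that precedes the semantic network (CONCOR) stage can be reproduced in a few lines; the three "articles" below are placeholder strings, not the Naver and Google news corpus collected in the study.

```python
# Sketch: term frequency and TF-IDF scoring of the kind used to rank keywords
# before the semantic network stage. Placeholder documents only.
from collections import Counter
from math import log

docs = [
    "the first slam dunk premiere animation audience",
    "slam dunk movie box office audience record",
    "the first slam dunk box office animation",
]
tokenized = [d.split() for d in docs]

tf = Counter(w for doc in tokenized for w in doc)              # corpus-level TF
df = Counter(w for doc in tokenized for w in set(doc))         # document frequency
N = len(tokenized)
tfidf = {w: tf[w] * log(N / df[w]) for w in tf}

top = sorted(tfidf, key=tfidf.get, reverse=True)[:5]
print("TF ranking    :", tf.most_common(5))
print("TF-IDF ranking:", [(w, round(tfidf[w], 3)) for w in top])
```

Terms that appear in every document get a TF-IDF of zero, which is why the two rankings can differ even when the raw frequencies are similar.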

Performance Analysis of Perturbation-based Privacy Preserving Techniques: An Experimental Perspective

  • Ritu Ratra; Preeti Gulia; Nasib Singh Gill
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.81-88 / 2023
  • In the present scenario, enormous amounts of data are produced every second, and these data contain private information from sources including media platforms, the banking sector, finance, healthcare, and criminal records. Data mining is a method for searching and analyzing massive volumes of data to find usable information, but preserving personal data during data mining has become difficult, so privacy-preserving data mining (PPDM) is used to do so. Data perturbation is one of several tactics used by the PPDM data-privacy protection mechanism: datasets are perturbed in order to preserve personal information, addressing both data accuracy and data privacy. This paper explores and compares several perturbation strategies that may be used to protect data privacy. For this experiment, two perturbation techniques based on random projection and principal component analysis were used: Improved Random Projection Perturbation (IRPP) and the Enhanced Principal Component Analysis based Technique (EPCAT). The Naive Bayes classification algorithm is used for the data mining approach, and these methods are employed to assess the precision, run time, and accuracy of the experimental results. The random projection-based technique (IRPP) is found to be the best perturbation method under Naive Bayes classification for both the cardiovascular and hypothyroid datasets.
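
A generic illustration of the perturb-then-classify evaluation loop is sketched below using a random projection and Gaussian Naive Bayes in scikit-learn; it is not the IRPP or EPCAT implementation, and the synthetic data stand in for the cardiovascular and hypothyroid datasets.

```python
# Sketch: random-projection perturbation followed by Naive Bayes, comparing
# classifier accuracy on original vs. perturbed data. Generic illustration only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.random_projection import GaussianRandomProjection

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)

def nb_accuracy(features, labels):
    Xtr, Xte, ytr, yte = train_test_split(features, labels, test_size=0.3,
                                          random_state=0)
    return GaussianNB().fit(Xtr, ytr).score(Xte, yte)

# Perturb: project the records into a lower-dimensional random subspace,
# which hides the original attribute values while roughly preserving distances.
projector = GaussianRandomProjection(n_components=10, random_state=0)
X_perturbed = projector.fit_transform(X)

print("accuracy on original data :", round(nb_accuracy(X, y), 3))
print("accuracy on perturbed data:", round(nb_accuracy(X_perturbed, y), 3))
```

The gap between the two accuracies is the utility cost paid for hiding the original attribute values.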

Network-based Microarray Data Analysis Tool

  • Park, Hee-Chang; Ryu, Ki-Hyun
    • Journal of the Korean Data and Information Science Society / v.17 no.1 / pp.53-62 / 2006
  • DNA microarray analysis is a new technology for investigating the expression levels of thousands of genes simultaneously. Since DNA microarray data structures are varied and complicated, the data are generally stored in databases so that they can be accessed and managed effectively. However, it is difficult to analyze and manage the data when they are spread across several database management systems or stored in plain file formats. Existing analysis tools for DNA microarray data also suffer from complicated instructions and from dependency on data types and operating systems. In this paper, we design and implement a network-based analysis tool for obtaining useful information from DNA microarray data. With this tool, DNA microarray data can be analyzed effectively without specialized knowledge of, or training in, data formats and analytical methods.


Development of Web-based Off-site Consequence Analysis Program and its Application for ILRT Extension (격납건물종합누설률시험 주기연장을 위한 웹기반 소외결말분석 프로그램 개발 및 적용)

  • Na, Jang-Hwan; Hwang, Seok-Won; Oh, Ji-Yong
    • Journal of the Korean Society of Safety / v.27 no.5 / pp.219-223 / 2012
  • For off-site consequence analysis at nuclear power plants, the MELCOR Accident Consequence Code System (MACCS) II code is widely used. In this study, a web-based off-site consequence analysis program (OSCAP) using the MACCS II code was developed for Integrated Leak Rate Test (ILRT) interval extension and Level 3 probabilistic safety assessment (PSA), and verification and validation (V&V) of the program was performed. The main input data for the MACCS II code are meteorological data, population distribution data, and source term information, but generating these inputs requires considerable time and effort. For example, the meteorological data are collected from each nuclear power site in real time, yet the formats of the raw data differ from site to site. To reduce the effort and time required for risk assessments, the web-based OSCAP includes an automatic processing module that converts the raw data collected from each site into the MACCS II input format. The program also automatically converts the latest population data from Statistics Korea, the national statistical office, into the population distribution input format of the MACCS II code. For the source term data, the program includes the release fraction of each source term category obtained from modular accident analysis program (MAAP) code analysis and the core inventory data from ORIGEN. These analysis results for each plant in Korea are stored in a database module of the web-based OSCAP, so the user can select the default source term data for each plant without preparing source term input data manually.

Reliability analysis of piles based on proof vertical static load test

  • Dong, Xiaole; Tan, Xiaohui; Lin, Xin; Zhang, Xuejuan; Hou, Xiaoliang; Wu, Daoxiang
    • Geomechanics and Engineering / v.29 no.5 / pp.487-496 / 2022
  • Most vertical static load tests of piles on construction sites are proof load tests, from which it is difficult to accurately estimate the ultimate bearing capacity and analyze the reliability of piles. Therefore, a reliability analysis method based on proof load-settlement (Q-s) data is proposed in this study. In the proposed method, a simple ultimate limit state function based on the hyperbolic model is established, in which the random variables are the model factor of the ultimate bearing capacity and the fitting parameters of the hyperbolic model. The model factor M = RuR / RuP is calculated from available destructive Q-s data: the real value of the ultimate bearing capacity (RuR) is obtained from the complete destructive Q-s data, while the predicted value (RuP) is obtained from the proof Q-s data, i.e. the portion of the destructive Q-s data up to the predetermined load specified in the pile test report. The results demonstrate that the proposed method can easily and effectively perform reliability analysis based on proof Q-s data.
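
A simplified sketch of the workflow follows: fit the hyperbolic model to proof Q-s data, predict RuP, and run a Monte Carlo reliability check on a capacity-minus-demand limit state. The Q-s points, the load distribution, and the model-factor statistics are illustrative assumptions, and only the model factor and the applied load are treated as random, a simplification of the paper's formulation (which also randomizes the hyperbolic fitting parameters).

```python
# Sketch: fit the hyperbolic Q-s model to proof-load data, predict the ultimate
# capacity RuP, apply a model factor M = RuR/RuP, and estimate a failure
# probability by Monte Carlo on a simple limit state. Illustrative numbers only.
import numpy as np

# Proof-load portion of a static load test: load Q (kN) vs settlement s (mm)
Q = np.array([400., 800., 1200., 1600., 2000.])
s = np.array([1.2, 2.9, 5.1, 8.0, 12.0])

# Hyperbolic model: s/Q = a + b*s, so Q(s) = s/(a + b*s) and RuP = 1/b
b_fit, a_fit = np.polyfit(s, s / Q, 1)           # slope b, intercept a
RuP = 1.0 / b_fit                                # predicted ultimate capacity

# Model factor statistics would come from piles with complete destructive
# Q-s data; the lognormal parameters here are placeholders.
rng = np.random.default_rng(0)
n = 100_000
M = rng.lognormal(mean=np.log(1.05), sigma=0.15, size=n)    # model factor RuR/RuP
load = rng.normal(loc=3000.0, scale=300.0, size=n)           # applied load (kN)

g = M * RuP - load                               # limit state: capacity - demand
pf = np.mean(g < 0)                              # failure probability
print(f"RuP = {RuP:.0f} kN, Pf = {pf:.4f}")
```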