Title/Summary/Keyword: Large-scale Analysis Data


Bioinformatics and Genomic Medicine (생명정보학과 유전체의학)

  • Kim, Ju-Han
    • Journal of Preventive Medicine and Public Health / v.35 no.2 / pp.83-91 / 2002
  • Bioinformatics is a rapidly emerging field of biomedical research. A flood of large-scale genomic and postgenomic data means that many of the challenges in biomedical research are now challenges in the computational sciences. Clinical informatics has long developed methodologies to improve biomedical research and clinical care by integrating experimental and clinical information systems. The informatics revolutions in both bioinformatics and clinical informatics will eventually change the current practice of medicine, including diagnostics, therapeutics, and prognostics. Postgenome informatics, powered by high-throughput technologies and genomic-scale databases, is likely to transform our biomedical understanding in much the same way that biochemistry did a generation ago. The paper describes how these technologies will impact biomedical research and clinical care, emphasizing recent advances in biochip-based functional genomics and proteomics. Basic data preprocessing with normalization, primary pattern analysis, and machine learning algorithms will be presented. The use of integrated biochip informatics technologies, text mining of factual and literature databases, and integrated management of biomolecular databases will be discussed. Each step is illustrated with real examples in the context of clinical relevance. Issues of linking molecular genotype and clinical phenotype information will also be discussed.
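
To make the preprocessing step concrete, the following is a minimal sketch, assuming a toy gene-by-array expression matrix rather than the paper's own pipeline: a log transform, quantile normalization, and a simple hierarchical clustering pass for primary pattern analysis.

```python
# Minimal sketch of biochip preprocessing and primary pattern analysis.
# Toy data and parameter choices only; not the paper's pipeline.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
expr = rng.lognormal(mean=3.0, sigma=1.0, size=(100, 6))  # 100 genes x 6 arrays

log_expr = np.log2(expr + 1.0)  # variance-stabilizing log transform

# Quantile normalization: force every array to share the same distribution.
ranks = np.argsort(np.argsort(log_expr, axis=0), axis=0)
mean_quantiles = np.mean(np.sort(log_expr, axis=0), axis=1)
normalized = mean_quantiles[ranks]

# Primary pattern analysis: cluster genes by expression profile.
Z = linkage(normalized, method="average", metric="correlation")
labels = fcluster(Z, t=4, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])
```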

Evaluating Perceived Smartness of Product from Consumer's Point of View: The Concept and Measurement

  • Lee, Won-Jun
    • The Journal of Asian Finance, Economics and Business / v.6 no.1 / pp.149-158 / 2019
  • Due to the rapid development of IT (information technology) and the internet, products are becoming smart: they can collect, process, and produce information, and can think for themselves to provide better service to consumers. However, research on the characteristics of smart products is still sparse. In this paper, we report the systematic development of a scale to measure perceived product smartness. To develop the product smartness scale, this study follows a systematic scale development process of item generation, item reduction, scale validation, and reliability and validity testing. After acquiring a large amount of qualitative interview data on the definition of a smart product, we add a unique step that reduces the initial items using both a text mining method implemented in the R software and traditional reliability and validity tests, including factor analysis. Based on an initial qualitative inquiry and a subsequent quantitative survey, an eight-factor scale of product smartness is developed. The eight factors are multi-functionality, human-like touch, ability to cooperate, autonomy, situatedness, network connectivity, integrity, and learning capability. Results from Korean samples support the proposed measures of product smartness in terms of reliability, validity, and dimensionality. Implications and directions for further study are discussed. The developed scale offers important theoretical and pragmatic implications for researchers and practitioners.
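
A hedged illustration of the reliability side of such item reduction: Cronbach's alpha computed with and without each candidate item. The data, item count, and cutoff behavior below are invented for demonstration and are not the authors' survey.

```python
# One common item-reduction criterion in scale development: flag items
# whose removal raises Cronbach's alpha. Toy Likert data; illustrative only.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, n_items) matrix of Likert scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))  # one underlying "smartness" factor
scores = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(200, 8))), 1, 5)

print(f"alpha (all items): {cronbach_alpha(scores):.3f}")
for j in range(scores.shape[1]):
    print(f"alpha without item {j}: {cronbach_alpha(np.delete(scores, j, axis=1)):.3f}")
```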

SVM-Based Incremental Learning Algorithm for Large-Scale Data Stream in Cloud Computing

  • Wang, Ning;Yang, Yang;Feng, Liyuan;Mi, Zhenqiang;Meng, Kun;Ji, Qing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.10 / pp.3378-3393 / 2014
  • We have witnessed the rapid development of information technology in recent years. One of the key phenomena is the fast, near-exponential increase of data. Consequently, most traditional data classification methods fail to meet the dynamic and real-time demands of today's data processing and analysis needs, especially for continuous data streams. This paper proposes an improved incremental learning algorithm for large-scale data streams, which is based on SVM (Support Vector Machine) and is named DS-IILS. DS-IILS takes the load condition of the entire system and the node performance into consideration to improve efficiency. A threshold on the distance to the optimal separating hyperplane is specified in the DS-IILS algorithm. The samples of the history sample set and the incremental sample set that fall within this threshold are all reserved, and these reserved samples are treated as the training sample set. To design a more accurate classifier, the effects of the data volumes of the history sample set and the incremental sample set are handled by weighted processing. Finally, the algorithm is implemented in a cloud computing system and is applied to study user behaviors. The results of the experiment are provided and compared with those of other incremental learning algorithms. The results show that DS-IILS can improve training efficiency while guaranteeing relatively high classification accuracy, which is consistent with the theoretical analysis.
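
A single-node sketch of the core reservation idea, assuming a linear kernel and an illustrative threshold; the paper's distributed scheduling and weighted processing are not reproduced here.

```python
# Sketch of the DS-IILS reservation step: keep only samples whose geometric
# distance to the separating hyperplane is within a threshold, then treat
# them as the next training set. Threshold and kernel are assumptions.
import numpy as np
from sklearn.svm import SVC

def incremental_step(hist_X, hist_y, new_X, new_y, threshold=1.5):
    X = np.vstack([hist_X, new_X])
    y = np.concatenate([hist_y, new_y])
    clf = SVC(kernel="linear").fit(X, y)
    # |decision_function| / ||w|| is the geometric distance to the hyperplane.
    dist = np.abs(clf.decision_function(X)) / np.linalg.norm(clf.coef_)
    keep = dist <= threshold  # reserve samples near the boundary
    return clf, X[keep], y[keep]

rng = np.random.default_rng(2)
hX = rng.normal(size=(200, 2)); hy = (hX[:, 0] + hX[:, 1] > 0).astype(int)
nX = rng.normal(size=(50, 2));  ny = (nX[:, 0] + nX[:, 1] > 0).astype(int)
clf, rX, ry = incremental_step(hX, hy, nX, ny)
print(f"reserved {len(ry)} of {len(hy) + len(ny)} samples for the next increment")
```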

Anomalous Pattern Analysis of Large-Scale Logs with Spark Cluster Environment

  • Sion Min;Youyang Kim;Byungchul Tak
    • Journal of the Korea Society of Computer and Information / v.29 no.3 / pp.127-136 / 2024
  • This study explores the correlation between system anomalies and large-scale logs within a Spark cluster environment. While research on log-based anomaly detection is growing, existing work makes limited use of logs from the various components of the cluster and of the relationship between anomalies and the system. This paper therefore analyzes the distribution of normal and abnormal logs and explores the potential for anomaly detection based on the occurrence of log templates. Using Hadoop and Spark, normal and abnormal log data are generated, and through t-SNE and K-means clustering, the templates of abnormal logs produced in anomalous situations are identified. Ultimately, unique log templates that occur only during abnormal situations are identified, demonstrating the potential for anomaly detection.
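
A minimal sketch of the clustering step, assuming per-window log-template count vectors as input; the toy counts and parameters below are illustrative, not the paper's Spark data.

```python
# Embed log-template count vectors with t-SNE, then separate them with
# K-means; windows dominated by failure-only templates form their own cluster.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
normal = rng.poisson(lam=5.0, size=(80, 20))   # 80 windows x 20 templates
abnormal = rng.poisson(lam=5.0, size=(20, 20))
abnormal[:, -3:] += rng.poisson(lam=15.0, size=(20, 3))  # failure-only templates
counts = np.vstack([normal, abnormal]).astype(float)

embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(counts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedded)
print("windows per cluster:", np.bincount(labels))
```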

Complex sample design effects and inference for Korea National Health and Nutrition Examination Survey data (국민건강영양조사 자료의 복합표본설계효과와 통계적 추론)

  • Chung, Chin-Eun
    • Journal of Nutrition and Health / v.45 no.6 / pp.600-612 / 2012
  • Nutritional researchers worldwide are using large-scale sample survey methods to study nutritional health epidemiology and services utilization in general, non-clinical populations. This article reviews important statistical methods and software that apply to descriptive and multivariate analysis of data collected in sample surveys, such as national health and nutrition examination surveys. A comparative data analysis of the Korea National Health and Nutrition Examination Survey (KNHANES) is used to illustrate analytical procedures and design effects for survey estimates of population statistics, model parameters, and test statistics. The article focuses on how to approach the analysis of complex sample survey data, the software tools available to perform these analyses, and the survey analysis methods that are important to the correct interpretation of survey data. The latest developments in software tools for the analysis of complex sample survey data are covered, and empirical examples illustrate the impact of survey sample design effects on parameter estimates, test statistics, and significance probabilities (p values) for univariate and multivariate analyses.
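
For intuition, here is a minimal sketch of a design effect (DEFF) calculation on a toy clustered sample; an actual KNHANES analysis should use survey software that handles strata, clusters, and weights together.

```python
# DEFF = variance of the estimator under the complex (here, clustered) design
# divided by its variance under simple random sampling of the same size.
import numpy as np

rng = np.random.default_rng(4)
n_clusters, m = 30, 20  # 30 PSUs, 20 persons each (toy design)
cluster_effects = rng.normal(scale=1.0, size=n_clusters)
y = cluster_effects[:, None] + rng.normal(scale=2.0, size=(n_clusters, m))

# Variance of the mean under the cluster design (ultimate-cluster estimator).
var_cluster = y.mean(axis=1).var(ddof=1) / n_clusters
# Variance of the mean under SRS of the same total size.
var_srs = y.var(ddof=1) / (n_clusters * m)

print(f"mean={y.mean():.2f}, DEFF={var_cluster / var_srs:.2f}")  # DEFF > 1 here
```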

Power Investigation of the Entropy-Based Test of Fit for Inverse Gaussian Distribution by the Information Discrimination Index

  • Choi, Byungjin
    • Communications for Statistical Applications and Methods / v.19 no.6 / pp.837-847 / 2012
  • The inverse Gaussian distribution is widely used to analyze and model right-skewed data. To assess the appropriateness of the distribution prior to data analysis, Mudholkar and Tian (2002) proposed an entropy-based test of fit. The test is based on the entropy power fraction (EPF) index suggested by Gokhale (1983). Simulation results report that the power of the entropy-based test is superior to that of other goodness-of-fit tests; however, this observation rests on small-scale simulations against the standard exponential, Weibull W(1, 2), and lognormal LN(0.5, 1) distributions. A large-scale simulation against various alternative distributions would be needed to evaluate the power of the entropy-based test; however, a theoretical method is more effective for investigating the power. In this paper, utilizing the information discrimination (ID) index defined by Ehsan et al. (1995) as a mathematical tool, we scrutinize the power of the entropy-based test. The selected alternative distributions are the gamma, Weibull, and lognormal distributions, which are widely used in data analysis as alternatives to the inverse Gaussian distribution. The study results are provided and an illustrative example is analyzed.
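
For context, entropy-based tests of this family typically compare a nonparametric entropy estimate with the maximum entropy attainable under the hypothesized model; a standard choice is Vasicek's spacing estimator, shown below. The exact construction in Mudholkar and Tian (2002) may differ in detail.

```latex
% Vasicek's spacing estimator of entropy, from the order statistics
% X_{(1)} \le \dots \le X_{(n)} with window size m < n/2:
H_{mn} = \frac{1}{n} \sum_{i=1}^{n}
  \log\!\left( \frac{n}{2m} \bigl( X_{(i+m)} - X_{(i-m)} \bigr) \right),
\quad \text{where } X_{(j)} = X_{(1)} \text{ for } j < 1
  \text{ and } X_{(j)} = X_{(n)} \text{ for } j > n.
```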

Trend Evaluation of Self-sustaining, High-efficiency Corrosion Control Technology for Large-scale Pipelines Delivering Natural Gas by Analyzing Patent Data (특허데이터 분석을 통한 천연가스 공급용 대규모 파이프라인을 위한 자립형 고효율 부식 방지 기술의 동향평가)

  • Lee, Jong-Won;Ji, Sanghoon
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.12 / pp.730-736 / 2019
  • The demand for natural gas, which is considered an environmentally friendly energy source, is increasing, and at the same time the market share of large pipelines for natural gas supply continues to grow. However, corrosion of such large pipelines reduces the efficiency of natural gas transportation. This study therefore aims to establish a strategy for securing patent rights for related technologies through a quantitative analysis of patents on energy-independent, high-efficiency corrosion prevention technology for large-scale natural gas supply pipelines. In this patent trend study, Korean, US, Japanese, and European patents filed, published, and registered by June 2018 were analyzed, and a technical classification system and classification criteria were prepared through expert discussion. For the use of fuel cells as an external power source to prevent corrosion of large-scale natural gas pipelines, it is believed that rights can be claimed for energy control systems and methods featuring 1) branch structures of pipeline and facility designs (decompressor/compressor/heat exchanger) and 2) decompression/preheating and pressurization/cooling technology for high-pressure natural gas.

Application of wavelet transform in electromagnetics (Wavelet 변환의 전자기학적 응용)

  • Hyeongdong Kim
    • Journal of the Korean Institute of Telematics and Electronics A / v.32A no.9 / pp.1244-1249 / 1995
  • The wavelet transform technique is applied to two important electromagnetic problems: 1) the analysis of frequency-domain radar echoes from finite-size targets, and 2) the integral-equation solution of two-dimensional electromagnetic scattering problems. Since the frequency-domain radar echo consists of both small-scale natural resonances and large-scale scattering-center information, the multiresolution property of the wavelet transform is well suited to analyzing such multi-scale signals. Wavelet analysis examples of backscattered data from an open-ended waveguide cavity are presented, and the different scattering mechanisms are clearly resolved in the wavelet-domain representation. In the wavelet transform domain, the moment-method impedance matrix becomes sparse, and sparse matrix algorithms can be utilized to solve the resulting matrix equation. Using the fast wavelet transform in conjunction with the conjugate gradient method, we present the time performance for the solution of a dihedral corner reflector. The total computational time is found to be reduced.
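
A minimal sketch of the multiresolution idea on a synthetic signal; PyWavelets and the db4 wavelet are assumptions here, and the paper's radar-echo data are not reproduced.

```python
# A wavelet decomposition separates a signal containing both large-scale
# and small-scale features into per-scale coefficient bands.
import numpy as np
import pywt

t = np.linspace(0.0, 1.0, 1024)
# Large-scale trend plus a short small-scale burst (a crude stand-in for a
# scattering center plus a localized resonance).
signal = np.sin(2 * np.pi * 3 * t)
signal[500:520] += 0.5 * np.sin(2 * np.pi * 200 * t[500:520])

coeffs = pywt.wavedec(signal, "db4", level=5)  # [approx, detail5, ..., detail1]
for i, c in enumerate(coeffs):
    band = "approx" if i == 0 else f"detail {len(coeffs) - i}"
    print(f"{band}: {len(c)} coeffs, max |c| = {np.abs(c).max():.3f}")
```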


DYNAMICAL AND STATISTICAL ASPECTS OF GRAVITATIONAL CLUSTERING IN THE UNIVERSE

  • SAHNI V.
    • Journal of The Korean Astronomical Society / v.29 no.spc1 / pp.19-21 / 1996
  • We apply topological measures of clustering, such as percolation and genus curves (PC and GC), together with shape statistics, to a set of scale-free N-body simulations of large-scale structure. Both genus and percolation curves evolve with time, reflecting the growth of non-Gaussianity in the N-body density field. The amplitude of the genus curve decreases with epoch due to non-linear mode coupling, the decrease being more noticeable for spectra with small-scale power. Plotted against the filling factor, the GC shows very little evolution, a surprising result given that the percolation curve shows significant evolution for the same data. Our results indicate that both the PC and GC could be used to discriminate between rival models of structure formation and in the analysis of CMB maps. Using shape-sensitive statistics, we find a strong tendency for objects in our simulations to be filament-like, with the degree of filamentarity increasing with epoch.
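
A hedged sketch of a percolation-style measurement on a toy Gaussian random field; the paper's N-body density fields and exact statistic definitions are not reproduced.

```python
# Percolation curve idea: threshold the density field at decreasing cuts
# and track how the largest connected overdense region grows with the
# filling factor.
import numpy as np
from scipy.ndimage import gaussian_filter, label

rng = np.random.default_rng(5)
density = gaussian_filter(rng.normal(size=(64, 64, 64)), sigma=2.0)

for q in (0.95, 0.90, 0.80, 0.60):
    mask = density > np.quantile(density, q)  # overdense excursion set
    labeled, n_regions = label(mask)
    largest = np.bincount(labeled.ravel())[1:].max()
    print(f"filling factor {1 - q:.2f}: {n_regions} regions, "
          f"largest holds {largest / mask.sum():.2f} of overdense cells")
```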


Data anomaly detection for structural health monitoring using a combination network of GANomaly and CNN

  • Liu, Gaoyang;Niu, Yanbo;Zhao, Weijian;Duan, Yuanfeng;Shu, Jiangpeng
    • Smart Structures and Systems / v.29 no.1 / pp.53-62 / 2022
  • The deployment of advanced structural health monitoring (SHM) systems in large-scale civil structures collects large amounts of data. These data may contain multiple types of anomalies (e.g., missing, minor, outlier, etc.) caused by harsh environments, sensor faults, transfer omissions, and other factors. Such anomalies seriously affect the evaluation of structural performance, so the effective analysis and mining of SHM data is an extremely important task. Inspired by the deep learning paradigm, this study develops a novel generative adversarial network (GAN) and convolutional neural network (CNN)-based data anomaly detection approach for SHM. The framework of the proposed approach includes three modules: (a) a three-channel input is established based on the fast Fourier transform (FFT) and the Gramian angular field (GAF) method; (b) a GANomaly is introduced and trained to extract features from normal samples alone, addressing class-imbalance problems; (c) based on the output of the GANomaly, a CNN is employed to distinguish the types of anomalies. In addition, a dataset-oriented method (i.e., multistage sampling) is adopted to obtain the optimal sampling ratios between the different sample types. The proposed approach is tested with acceleration data from the SHM system of a long-span bridge. The results show that the proposed approach has higher accuracy in detecting the multi-pattern anomalies of SHM data.
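
A hedged sketch of the three-channel input construction in module (a), with the GAF computed directly in NumPy; the channel layout and image size are illustrative assumptions, not the paper's exact preprocessing.

```python
# Build a 3-channel image from one window of sensor data: an FFT-magnitude
# channel plus summation- and difference-type Gramian angular fields.
import numpy as np

def gaf(x: np.ndarray, method: str = "summation") -> np.ndarray:
    """Gramian angular field of a 1-D series rescaled to [-1, 1]."""
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    if method == "summation":
        return np.cos(phi[:, None] + phi[None, :])
    return np.sin(phi[:, None] - phi[None, :])

rng = np.random.default_rng(6)
accel = rng.normal(size=64)  # one window of acceleration data

spectrum = np.abs(np.fft.rfft(accel, n=126))  # zero-padded: 64 magnitude bins
fft_channel = np.tile(spectrum / spectrum.max(), (64, 1))

channels = np.stack([fft_channel, gaf(accel, "summation"), gaf(accel, "difference")])
print(channels.shape)  # (3, 64, 64) image, ready for a CNN classifier
```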