• Title/Summary/Keyword: Data Sets

Design of Fuzzy Model for Data Mining

  • Kim, Do-Wan;Joo, Young-Hoon;Park, Jin-Bae
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.1 / pp.107-113 / 2003
  • A new GA-based methodology using information granules is proposed for constructing fuzzy classifiers. The scheme consists of three steps: selection of information granules, construction of the associated fuzzy sets, and tuning of the fuzzy rules. First, a genetic algorithm (GA) is applied to develop adequate information granules. The fuzzy sets are then constructed from an analysis of these granules, and an interpretable fuzzy classifier is designed using the constructed fuzzy sets. Finally, the GA is used to tune the fuzzy rules, which can improve classification performance on misclassified data (e.g., data with unusual patterns or data on class boundaries). The effectiveness of the proposed method is demonstrated on the classification of the Iris data.
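
The fuzzy-set and classification steps described above can be illustrated with a minimal sketch. It assumes triangular membership functions and a minimum t-norm for rule firing, neither of which is stated in the abstract, and the rule parameters below are made-up illustrative values for two Iris features (petal length, petal width), not values from the paper. The GA-based granule selection and rule tuning are not shown.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set (assumes a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def classify(sample, rules):
    """Return the label whose rule fires most strongly (min t-norm over features)."""
    best_label, best_strength = None, -1.0
    for label, fuzzy_sets in rules.items():
        strength = min(triangular(x, *abc) for x, abc in zip(sample, fuzzy_sets))
        if strength > best_strength:
            best_label, best_strength = label, strength
    return best_label


# Hypothetical rule base: one (a, b, c) triangle per feature and class.
rules = {
    "setosa": [(0.5, 1.5, 2.5), (0.0, 0.25, 0.8)],
    "versicolor": [(2.5, 4.3, 5.5), (0.8, 1.3, 1.9)],
    "virginica": [(4.3, 5.6, 7.5), (1.3, 2.0, 2.8)],
}

label_small = classify((1.4, 0.2), rules)  # short, narrow petal
label_large = classify((5.8, 2.1), rules)  # long, wide petal
```

In a scheme like the one described, a GA would search over the triangle parameters; here they are fixed by hand purely to show how a rule base of fuzzy sets yields a class label.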

Modeling sulfuric acid induced swell in carbonate clays using artificial neural networks

  • Sivapullaiah, P.V.;Guru Prasad, B.;Allam, M.M.
    • Geomechanics and Engineering / v.1 no.4 / pp.307-321 / 2009
  • The paper employs a feed-forward neural network trained with the back-propagation algorithm to model time-dependent swell in carbonate-bearing clays in the presence of sulfuric acid. The oedometer swell percent is estimated at a nominal surcharge pressure of 6.25 kPa to develop 612 data sets for modeling. The input parameters include time, sulfuric acid concentration, carbonate percentage, and liquid limit. Of the total data sets, 280 (46%) were assigned to training, 175 (29%) to testing, and the remaining 157 (25%) to cross-validation. The network was programmed to process this information and predict the percent swell at any time, given the variables involved. The study demonstrates that it is possible to develop a general back-propagation neural network (BPNN) model that predicts time-dependent swell in close agreement with observed data ($R^2$ = 0.9986). The results are also compared with those of a non-linear regression model.
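
The 280/175/157 partition and the goodness-of-fit figure quoted above can be sketched as follows. This is only an illustration: the abstract does not say how records were assigned to each subset (a seeded random shuffle is assumed here), and $R^2$ is assumed to be the usual coefficient of determination. The function names are hypothetical.

```python
import random


def split_indices(n, n_train, n_test, seed=0):
    """Shuffle indices 0..n-1 and cut them into train / test / validation parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # assumed: random assignment
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    validation = idx[n_train + n_test:]
    return train, test, validation


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Counts taken from the abstract: 612 records, 280 training, 175 testing.
train, test, validation = split_indices(612, 280, 175)
```

With these counts the validation subset receives the remaining 157 records, matching the figures in the abstract.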

Quick Evaluations of the KOMPSAT-1 Orbit Maneuvers Using Small Sets of Real-time GPS Navigation Solutions

  • Lee, Byoung-Sun;Lee, Jeong-Sook;Kim, Jae-Hoon
    • Transactions on Control, Automation and Systems Engineering / v.3 no.3 / pp.196-202 / 2001
  • Quick evaluations of two in-plane orbit maneuvers were performed for KOMPSAT-1 spacecraft operations using small sets of real-time GPS navigation solutions. The real-time GPS navigation solutions of KOMPSAT-1 were collected during passes over the Korean Ground Station (KGS). Only a few sets of position and velocity data recorded after completion of the thruster firing were used for the quick maneuver evaluations. The results were used to predict antenna pointing data for the next station contact. Normal orbit maneuver evaluations using large sets of playback GPS navigation solutions were also performed, and the results were compared with those of the quick evaluations.

Deep Meta Learning Based Classification Problem Learning Method for Skeletal Maturity Indication (골 성숙도 판별을 위한 심층 메타 학습 기반의 분류 문제 학습 방법)

  • Min, Jeong Won;Kang, Dong Joong
    • Journal of Korea Multimedia Society / v.21 no.2 / pp.98-107 / 2018
  • In this paper, we propose a method for classifying skeletal maturity from a small number of hand-wrist X-ray images using deep learning-based meta-learning. General deep learning techniques require large amounts of data, but in many practical applications such data sets are not available. The lack of training data is usually addressed through transfer learning with models pre-trained on large data sets. However, transfer learning may overfit when the new, unknown task has little data, which results in poor generalization. In addition, obtaining labeled medical images is costly, requiring professional manpower and much time. Therefore, in this paper we use meta-learning, in which a model pre-trained on a variety of learning tasks can classify using only a small amount of new data. First, we train the meta-model on a separate data set composed of various learning tasks. The network then learns to classify bone maturity using bone maturity data composed of wrist radiographs. Finally, we compare the classification results of a conventional learning algorithm with those of meta-learning, using the same number of training data sets.

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology / v.40 no.8 / pp.726-732 / 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis that applies dimensionality reduction techniques has also been proposed. PCA (principal component analysis) is a popular dimensionality reduction method for clustering problems. However, previous studies analyzed the performance of PCA only on full data sets. In this paper, to evaluate the performance of PCA for clustering analysis specifically and robustly, we use an improved FCBF (fast correlation-based filter) feature selection method on supervised clustering data sets and employ two well-known clustering algorithms: k-means and k-medoids. Computational results on supervised data sets show that the performance of PCA is very poor for large-scale features.

Neural Network Applications to Determining Suitable Tree Species for Site-Specific Conditions (적지적수(適地適樹) 판정(判定)을 위한 Neural Network 기법(技法)의 응용(應用))

  • Kim, Hyungho;Chung, Joosang
    • Journal of Korean Society of Forest Science / v.90 no.4 / pp.437-444 / 2001
  • This paper discusses applications of neural networks to processing forest stand field data and to determining suitable tree species for site-specific stand characteristics. For site-specific species selection, five major coniferous species were considered: P. densiflora for. erecta, L. leptolepis, P. koraiensis, P. densiflora, and P. thunbergii. Among 1,320 sample plot data sets, the 200 data sets with the highest site index (40 for each species) were chosen as the test sets for the investigation. Each data set includes 13 factors describing the site characteristics of the corresponding sample plot. The results indicate that the neural network performs well in data processing procedures for extracting data sets or measurement parameters without any recognizable pattern. These are data sets or parameters that either have little effect on site-specific species suitability or disturb the network's pattern classification because of the unrecognizable patterns involved. The results also show the high potential of neural networks for determining the most suitable tree species for given site characteristics. The accuracy of the neural network model in this task ranges from 77.6% to 91.8%, depending on the combination of site factors.

Design of Service Provision Framework using Medical Big Data (의료 빅 데이터를 활용한 서비스 제공 프레임워크 설계)

  • Shin, Bong-Hi;Jeon, Hye-Kyoung
    • Journal of Convergence for Information Technology / v.9 no.2 / pp.1-6 / 2019
  • In this article, we present a framework designed to create new services for businesses that use large sets of medical data. Rather than a simple data analysis step, the framework clarifies the purpose of data utilization, analyzes the data, extracts value from it, and designs a process leading from an actual business or service to its operation. The designed framework covers the basic architecture and a social system model. It was designed around basic data, with a focus on large sets of medical data, and is intended to be applied to a social system. We expect that applying the designed framework to the available sets of basic medical data will lead to various medical business alliances and services.

VARIABILITY OF THE LATENT HEAT FLUX DURING 1988-2005

  • Iwasaki, Shinsuke;Kubota, Masahisa
    • Proceedings of the KSRS Conference / 2008.10a / pp.289-292 / 2008
  • Recently, several satellite data analysis projects and numerical weather prediction (NWP) reanalysis projects have produced global ocean surface latent heat flux (LHF) data sets. Comparisons of these data sets have shown substantial discrepancies in the LHF values. An increase in LHF over the global ocean during the 1970s-1990s has recently been shown in the LHF data developed by the Objectively Analyzed Air-Sea Fluxes (OAFlux) project. It is therefore of interest to investigate whether this increase also appears in the other LHF products. In this study, we assessed the consistencies and discrepancies in inter-annual variability and decadal trend for the period 1988-2005 among six LHF products (J-OFURO2, HOAPS3, IFREMER, NCEP1, NCEP2, and OAFlux) over the global ocean. All LHF products showed a positive trend. In particular, the positive trend in the satellite-based products (J-OFURO2, HOAPS3, IFREMER) is larger than that in the reanalysis products (NCEP1/2). Consistencies and discrepancies also appear in the spatial patterns of the LHF trends across the six data sets. The positive trend is remarkable in western boundary current regions such as the Kuroshio and the Gulf Stream in all LHF data sets, whereas the spatial patterns of the trends differ in the tropics and subtropics. These discrepancies are primarily caused by differences in the input meteorological state variables used to calculate LHF, particularly the air specific humidity.
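
As a rough illustration of what a trend over 1988-2005 means numerically, the sketch below fits an ordinary least-squares slope to an annual series. The abstract does not state how the trends were estimated, so the least-squares choice is an assumption, and the series used here is synthetic (a made-up line with a known slope), not LHF data from any of the six products.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope of values against years (units per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


# Study period from the abstract: 1988 through 2005 inclusive (18 years).
years = list(range(1988, 2006))

# Synthetic annual means with a known slope of 0.5 per year (illustrative only).
synthetic_lhf = [100.0 + 0.5 * (y - 1988) for y in years]

slope = linear_trend(years, synthetic_lhf)
```

Applying the same function to each product's annual means would give one slope per product, which is the kind of quantity the comparison above is about.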

A Study on the Description of Archival Datasets (데이터세트 기록물의 기술요소에 관한 연구)

  • Kim, Po-Ok;Yun, Soo-Young
    • Journal of the Korean BIBLIA Society for library and Information Science / v.18 no.2 / pp.39-59 / 2007
  • With the rapid spread of collecting and processing data through database systems, it is increasingly critical to treat data sets in the same manner as general records throughout collection, evaluation, preservation, and utilization. Despite this importance, interest in data sets within records management in Korea is very low. To suggest the basic items needed to regard data sets as records and manage them systematically, this study examined the descriptive elements of data set records. Descriptive elements were suggested by comparing and analyzing the elements adopted by agencies that treat data sets as records and provide related services, as well as the descriptive rules for electronic records set by nations advanced in records management, based on the descriptive areas of ISAD(G).

Tests for Equality of Two Distributions with Life-Table Model

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society / v.12 no.2 / pp.71-82 / 2001
  • There are several ways to test the equality of two survival distributions under a variety of situations. Tests for the equality of two distributions under the life-table model for univariate independent response times are reviewed and introduced. A methodology is then developed to test equality for correlated response times where treatments are applied to different independent sets of cohorts. Data from an angioplasty study in which more than one procedure is performed on some patients, and which can be separated into two independent sets, are used to illustrate this methodology.
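
For readers unfamiliar with the life-table model mentioned above, the following minimal sketch computes the standard actuarial survival estimate for one group. It assumes the usual convention that withdrawals (censored subjects) count as half at risk within an interval. The interval counts are hypothetical, and the sketch covers only the survival estimate, not the equality tests or the correlated-response extension the paper develops.

```python
def life_table_survival(intervals):
    """Cumulative survival after each interval.

    intervals: list of (n_entering, deaths, withdrawals) tuples, one per interval.
    """
    survival = []
    s = 1.0
    for n_entering, deaths, withdrawals in intervals:
        # Actuarial adjustment: withdrawals are at risk for half the interval.
        at_risk = n_entering - withdrawals / 2.0
        q = deaths / at_risk if at_risk > 0 else 0.0  # conditional death probability
        s *= 1.0 - q
        survival.append(s)
    return survival


# Hypothetical cohort: 100 subjects, no withdrawals, 10% dying in each interval.
curve = life_table_survival([(100, 10, 0), (90, 9, 0)])
```

Comparing two such curves, one per treatment group, is the setting in which the equality tests reviewed in the abstract apply.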
