• Title/Summary/Keyword: data-sets

Search results: 3,782

Ambiguity Analysis of Defectiveness in NASA MDP Data Sets (NASA MDP 데이터 집합의 결함도 모호성 분석)

  • Hong, Euyseok
    • Journal of Information Technology Services
    • /
    • v.12 no.2
    • /
    • pp.361-371
    • /
    • 2013
  • Public domain defect data sets, such as the NASA data sets available from the NASA MDP and PROMISE repositories, make it possible to compare the results of different defect prediction models on the same data, so that repeatable and general prediction models can be built. However, some recent studies have raised questions about the quality of two versions of the NASA data sets and have produced new cleaned data sets by applying their own data cleaning processes. We find that the NASA MDP versions provide two ways to determine the defectiveness of a module (0 or 1), and that the two results differ in some cases. To our knowledge, this serious problem has not been addressed in previous studies. To handle this ambiguity problem, we define two kinds of module defectiveness and two conditions that can be used to identify the ambiguous cases. We analyze 5 of the 13 NASA projects in detail using our ambiguity analysis method. The results show that JM1 and PC4 are the best projects, with few ambiguous cases.
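
A minimal sketch of the kind of ambiguity check described above, assuming a hypothetical module table with an error-count field and a separately stored binary defect label (the column names are illustrative, not the actual NASA MDP schema):

```python
# Minimal sketch: flag modules whose two defectiveness determinations disagree.
# Column names (error_count, defective_label) are illustrative assumptions,
# not the actual NASA MDP schema.
import pandas as pd

modules = pd.DataFrame({
    "module_id":       [1, 2, 3, 4],
    "error_count":     [0, 3, 0, 2],   # way 1: derive defectiveness from error counts
    "defective_label": [0, 1, 1, 0],   # way 2: the 0/1 label stored in the data set
})

# Way 1: a module is defective if it has at least one reported error.
derived = (modules["error_count"] > 0).astype(int)

# A module is ambiguous when the two determinations disagree.
modules["ambiguous"] = derived != modules["defective_label"]

print(modules[modules["ambiguous"]])   # modules 3 and 4 are ambiguous here
```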

An SVD-Based Approach for Generating High-Dimensional Data and Query Sets (SVD를 기반으로 한 고차원 데이터 및 질의 집합의 생성)

  • Kim, Sang-Wook
    • The Journal of Information Technology and Database
    • /
    • v.8 no.2
    • /
    • pp.91-101
    • /
    • 2001
  • Previous research efforts on the performance evaluation of multidimensional indexes have typically used synthetic data sets distributed uniformly or normally over multidimensional space. However, recent research results have shown that these kinds of data sets hardly reflect the characteristics of multimedia database applications. In this paper, we discuss issues in generating high-dimensional data and query sets to resolve this problem. We first identify the features of data and query sets that are appropriate for fairly evaluating the performance of multidimensional indexes, and then propose HDDQ_Gen (High-Dimensional Data and Query Generator), which satisfies these features. HDDQ_Gen supports the following features: (1) clustered distributions, (2) various object distributions in each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) query distributions depending on data distributions. Using these features, users are able to control the distribution characteristics of data and query sets. Our contribution is important in that HDDQ_Gen provides a benchmark environment for correctly evaluating multidimensional indexes.
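
A minimal sketch of the idea behind such a generator (clustered high-dimensional data with correlations among dimensions, and queries drawn from the data distribution); the parameters and structure below are illustrative assumptions, not HDDQ_Gen's actual interface:

```python
# Sketch: clustered high-dimensional data plus queries that follow the data
# distribution. Parameters are illustrative, not HDDQ_Gen's actual interface.
import numpy as np

rng = np.random.default_rng(0)
dim, n_clusters, points_per_cluster, n_queries = 16, 5, 200, 50

data = []
for _ in range(n_clusters):
    center = rng.uniform(0.0, 1.0, size=dim)
    # A random covariance matrix introduces correlations among dimensions.
    a = rng.normal(scale=0.05, size=(dim, dim))
    cov = a @ a.T
    data.append(rng.multivariate_normal(center, cov, size=points_per_cluster))
data = np.vstack(data)

# Queries drawn from the data distribution: jitter randomly chosen data points.
idx = rng.integers(0, len(data), size=n_queries)
queries = data[idx] + rng.normal(scale=0.01, size=(n_queries, dim))

print(data.shape, queries.shape)  # (1000, 16) (50, 16)
```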

Identifying Minimum Data Sets of Oral Mucous Integrity Assessment for Documentation Systematization (구강점막의 통합성 사정기록 체계화를 위한 최소자료세트(Minimum Data Set) 규명)

  • Kim, Myoung Soo;Jung, Hyun Kyeong;Kang, Myung Ja;Park, Nam Jung;Kim, Hyun Hee;Ryu, Jeong Mi
    • Journal of Korean Critical Care Nursing
    • /
    • v.12 no.1
    • /
    • pp.46-56
    • /
    • 2019
  • Purpose: The purpose of this study was to identify minimum data sets for oral mucous integrity-related documentation and to analyze nursing records for oral care. Methods: To identify minimum data sets for oral status, the authors reviewed 26 assessment tools and a practical guideline for oral care. The content validity of the minimum data sets was assessed by three nurse specialists. To map the minimum data sets to nursing records, the authors examined 107 nursing records of 44 patients who received chemotherapy or hematopoietic stem cell transplantation at one tertiary hospital. Results: The minimum data sets comprised 10 elements, such as location, mucositis grade, pain, hygiene, dysphagia, exudate, inflammation, difficulty speaking, and moisture. Inflammation contained two value sets: type and color. Mucositis grade, pain, dysphagia, and inflammation were recorded well, with a complete mapping rate of 100%. Hygiene was incompletely mapped (100%), and there were no records for exudate (83.2%), difficulty speaking (99.1%), or moisture (88.8%). Conclusion: This study found that nursing records on oral mucous integrity were not sufficient and could be improved by adopting the minimum data sets identified in this study.
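
A minimal sketch of how such mapping rates could be tallied; the element names follow the abstract, while the record structure is an illustrative assumption:

```python
# Sketch: tally how often each minimum-data-set element appears in nursing
# records. The record structure here is an illustrative assumption.
minimum_data_set = ["location", "mucositis grade", "pain", "hygiene", "dysphagia",
                    "exudate", "inflammation", "difficulty speaking", "moisture"]

# Each record is the set of elements actually documented.
records = [
    {"location", "mucositis grade", "pain", "inflammation"},
    {"mucositis grade", "pain", "dysphagia", "inflammation", "hygiene"},
]

for element in minimum_data_set:
    documented = sum(element in record for record in records)
    rate = 100.0 * documented / len(records)
    print(f"{element:20s} documented in {rate:5.1f}% of records")
```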

Development and Application of Fracture Toughness Database (파괴인성 데이터베이스 구축 및 응용)

  • Kang, Jae-Youn;Song, Ji-Ho;Choi, Byung-Ick
    • Proceedings of the KSME Conference
    • /
    • 2004.11a
    • /
    • pp.61-66
    • /
    • 2004
  • A fracture toughness database system was developed with Visual FoxPro 6.0 and operates in the MS Windows environment. The database contains 10,278 sets of $K_{IC}$ data, 7,046 sets of $K_{C}$ data, 784 sets of $J_{IC}$ data, 571 sets of CTOD data, 62 sets of $K_{a}$ data, and 26 sets of $K_{Id}$ data. The data were collected from the JSMS (Society of Materials Science, Japan) fracture toughness data book and the USAF (United States Air Force) crack growth database. In addition, the database was applied to predicting $K_{IC}$ from tensile material properties using artificial neural networks.
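
A minimal sketch of predicting $K_{IC}$ from tensile properties with a small neural network; the chosen input features (yield strength, tensile strength, elongation) and the synthetic training data are assumptions for illustration, not values from the database described above:

```python
# Sketch: predict K_IC from tensile properties with a small neural network.
# The input features and synthetic training data are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Features: [yield strength (MPa), tensile strength (MPa), elongation (%)]
X = rng.uniform([200, 300, 5], [1200, 1500, 40], size=(300, 3))
# Synthetic target, roughly decreasing with strength, just so the sketch runs.
y = 150 - 0.05 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(0, 5, size=300)

scaler = StandardScaler().fit(X)
model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(scaler.transform(X), y)

sample = scaler.transform([[800.0, 950.0, 15.0]])
print(f"Predicted K_IC (illustrative units): {model.predict(sample)[0]:.1f}")
```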

Assessment of the Reliability of Protein-Protein Interactions Using Protein Localization and Gene Expression Data

  • Lee, Hyun-Ju;Deng, Minghua;Sun, Fengzhu;Chen, Ting
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2005.09a
    • /
    • pp.313-318
    • /
    • 2005
  • Estimating the reliability of protein-protein interaction data sets obtained by high-throughput technologies, such as yeast two-hybrid assays and mass spectrometry, is of great importance. We develop a maximum likelihood estimation method that uses both protein localization and gene expression data to estimate the reliability of protein interaction data sets. By integrating protein localization data and gene expression data, we can obtain more accurate estimates of the reliability of various interaction data sets. We apply the method to protein physical interaction data sets and protein complex data sets. The reliability of the yeast two-hybrid interactions by Ito et al. (2001) is 27%, and that by Uetz et al. (2000) is 68%. The reliability of the protein complex data sets obtained by tandem affinity purification-mass spectrometry (TAP) by Gavin et al. (2002) is 45%, and that obtained by high-throughput mass spectrometric protein complex identification (HMS-PCI) by Ho et al. (2002) is 20%. The method is general and can be applied to analyze any protein interaction data set.
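
A simplified sketch of estimating a data set's reliability from supporting evidence such as protein co-localization; this is an illustrative moment-style estimate under assumed rates, not the paper's full maximum likelihood formulation:

```python
# Sketch: a simplified reliability estimate for an interaction data set, using
# protein co-localization as the supporting evidence. Illustrative only; not
# the paper's full maximum likelihood formulation.
def estimate_reliability(p_observed, p_true, p_random):
    """Fraction of true interactions alpha satisfying
       p_observed = alpha * p_true + (1 - alpha) * p_random."""
    alpha = (p_observed - p_random) / (p_true - p_random)
    return min(max(alpha, 0.0), 1.0)   # clamp to [0, 1]

# Assumed numbers: co-localization rate in the tested data set, in a
# gold-standard (true) set, and among random protein pairs.
print(estimate_reliability(p_observed=0.45, p_true=0.80, p_random=0.30))  # 0.3
```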

Generalization of Quantification for PLS Correlation

  • Yi, Seong-Keun;Huh, Myung-Hoe
    • The Korean Journal of Applied Statistics
    • /
    • v.25 no.1
    • /
    • pp.225-237
    • /
    • 2012
  • This study proposes a quantification algorithm for the PLS method with several sets of variables. We call the quantification method for PLS with more than two sets of data a generalization. The basis of the quantification for the PLS method is singular value decomposition. To derive the form of the singular value decomposition more easily for data with more than two sets, we used the constraint $a^{t}a+b^{t}b+c^{t}c=3$ rather than $a^{t}a=1$, $b^{t}b=1$, and $c^{t}c=1$ in the case of three data sets, for instance. To show that this makes no difference, we prove it for the two-set case, because the proof with three data sets is very complicated. The keys of the study are how to form the singular value decomposition and how to obtain the coordinates for the plots of variables and observations.
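
A minimal sketch of the two-set case, where the PLS correlation weight vectors come from the SVD of the cross-product matrix $X^{t}Y$; the data matrices below are random placeholders:

```python
# Sketch: PLS correlation for two data sets via SVD of the cross-product
# matrix X^t Y. The data here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))   # first set of variables
Y = rng.normal(size=(50, 3))   # second set of variables
X -= X.mean(axis=0)            # column-center both sets
Y -= Y.mean(axis=0)

U, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)

a, b = U[:, 0], Vt[0, :]        # first pair of weight vectors, each of unit norm
t, u = X @ a, Y @ b             # scores used to plot the observations
print(np.allclose(a @ a, 1.0), np.allclose(b @ b, 1.0))  # True True
```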

A Comparison of the Land Cover Data Sets over Asian Region: USGS, IGBP, and UMd (아시아 지역 지면피복자료 비교 연구: USGS, IGBP, 그리고 UMd)

  • Kang, Jeon-Ho;Suh, Myoung-Seok;Kwak, Chong-Heum
    • Atmosphere
    • /
    • v.17 no.2
    • /
    • pp.159-169
    • /
    • 2007
  • A comparison of the three land cover data sets (United States Geological Survey: USGS, International Geosphere Biosphere Programme: IGBP, and University of Maryland: UMd), derived from 1992-1993 Advanced Very High Resolution Radiometer (AVHRR) data, was performed over the Asian continent. Preprocessing steps, such as the unification of map projection and land cover definitions, were applied before comparing the three different land cover data sets. Overall, the agreement among the three land cover data sets was relatively high (>45%) for land covers that have a distinct phenology, such as urban, open shrubland, mixed forest, and bare ground. The ratios of triple agreement (TA), couple agreement (CA), and total disagreement (TD) among the three land cover data sets are 30.99%, 57.89%, and 8.91%, respectively. The agreement ratio between USGS and IGBP is much greater (about 80%) than that between USGS and UMd or between IGBP and UMd (about 32%). The main reasons for the relatively low agreement among the three land cover data sets are differences in 1) the number of land cover categories, 2) the basic input data sets used for the classification, 3) the classification (or clustering) methodologies, and 4) the level of preprocessing. The numbers of categories for USGS, IGBP, and UMd are 24, 17, and 14, respectively. USGS and IGBP used only the 12 monthly normalized difference vegetation index (NDVI) data, whereas UMd used the 12 monthly NDVI data and 29 other auxiliary data sets derived from the 5 AVHRR channels. USGS and IGBP used unsupervised clustering methods, whereas UMd used a supervised decision-tree technique with ground truth derived from high-resolution Landsat data. The insufficient preprocessing in USGS and IGBP compared to UMd resulted in spatial discontinuity and misclassification.
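
A minimal sketch of how TA, CA, and TD ratios can be computed for three co-registered land cover maps; the tiny example grids are illustrative, and real maps would first be unified to a common legend as described above:

```python
# Sketch: triple agreement (TA), couple agreement (CA) and total disagreement
# (TD) ratios for three co-registered land cover maps. The example grids are
# illustrative placeholders.
import numpy as np

usgs = np.array([1, 1, 2, 3, 4, 4])
igbp = np.array([1, 1, 2, 3, 4, 2])
umd  = np.array([1, 2, 2, 1, 4, 3])

ta = np.mean((usgs == igbp) & (igbp == umd))                  # all three agree
td = np.mean((usgs != igbp) & (igbp != umd) & (usgs != umd))  # all three differ
ca = 1.0 - ta - td                                            # exactly two agree

print(f"TA={ta:.2%}  CA={ca:.2%}  TD={td:.2%}")
```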

Calculating Attribute Values using Interval-valued Fuzzy Sets in Fuzzy Object-oriented Data Models (퍼지객체지향자료모형에서 구간값 퍼지집합을 이용한 속성값 계산)

  • Cho, Sang-Yeop;Lee, Jong-Chan
    • Journal of Internet Computing and Services
    • /
    • v.4 no.4
    • /
    • pp.45-51
    • /
    • 2003
  • In general, the attribute values appearing in fuzzy object-oriented data models are represented by fuzzy sets. If the attribute values in fuzzy object-oriented data models can be represented by interval-valued fuzzy sets, the models can represent attribute values in a more flexible manner. The attribute values of frames appearing in the inheritance structure of the fuzzy object-oriented data models are calculated by a prioritized conjunction operation using interval-valued fuzzy sets. This approach can be applied to knowledge and information processing in which the degree of membership is represented not by conventional fuzzy sets but by interval-valued fuzzy sets.
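
A minimal sketch of conjunction over interval-valued fuzzy membership degrees; this uses a common min-based interval extension of fuzzy AND and is not necessarily the paper's exact prioritized conjunction operator:

```python
# Sketch: conjunction of interval-valued fuzzy membership degrees, taking the
# minimum of the lower bounds and of the upper bounds. A common interval
# extension of fuzzy AND, not necessarily the paper's exact prioritized operator.
def interval_conjunction(*intervals):
    lowers, uppers = zip(*intervals)
    return (min(lowers), min(uppers))

# Inherited attribute value: membership degrees contributed by two frames in
# the inheritance structure, each given as an interval [lower, upper].
print(interval_conjunction((0.6, 0.8), (0.5, 0.9)))   # (0.5, 0.8)
```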

Knowledge Extraction from Affective Data using Rough Sets Model and Comparison between Rough Sets Theory and Statistical Method (러프집합이론을 중심으로 한 감성 지식 추출 및 통계분석과의 비교 연구)

  • Hong, Seung-Woo;Park, Jae-Kyu;Park, Sung-Joon;Jung, Eui-S.
    • Journal of the Ergonomics Society of Korea
    • /
    • v.29 no.4
    • /
    • pp.631-637
    • /
    • 2010
  • The aim of affective engineering is to develop new products by translating customer affections into design factors. Affective data have so far been analyzed using multivariate statistical analyses, but affective data do not always have the linear features assumed under a normal distribution. The rough sets model is an effective method for knowledge discovery under uncertainty, imprecision, and fuzziness, and it can deal with any type of data regardless of linearity. Therefore, this study utilizes the rough sets model to extract affective knowledge from affective data. Four scent alternatives and four sounds were designed, and an experiment was performed to examine affective differences in subjects' preferences for an air conditioner. The purpose of this study is also to extract knowledge from affective data using the rough sets model and to examine the relationship between the rough-sets-based affective engineering method and the statistical one. The results of a case study show that the proposed approach can effectively extract affective knowledge from affective data and is able to discover the relationships between customer affections and design factors. The rough sets model and the statistical method gave similar results, but the work could be made more valuable by further comparison with fuzzy theory, neural networks, and multivariate statistical methods.
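
A minimal sketch of the core rough-set operation behind such rule extraction: lower and upper approximations of a decision class from a small decision table; the attribute values and preference labels are illustrative assumptions:

```python
# Sketch: lower and upper approximations of a decision class from a small
# decision table, the core operation behind rough-set rule extraction.
# The attribute values and preference labels are illustrative.
from collections import defaultdict

# (scent, sound) condition attributes -> preference decision
table = [
    (("floral", "soft"), "like"),
    (("floral", "soft"), "like"),
    (("citrus", "soft"), "like"),
    (("citrus", "loud"), "dislike"),
    (("floral", "loud"), "like"),
    (("floral", "loud"), "dislike"),   # same condition, different decision
]

# Indiscernibility classes: objects with identical condition attributes.
classes = defaultdict(set)
for i, (cond, _) in enumerate(table):
    classes[cond].add(i)

target = {i for i, (_, d) in enumerate(table) if d == "like"}
lower = set().union(*(c for c in classes.values() if c <= target))
upper = set().union(*(c for c in classes.values() if c & target))

print("lower approximation:", sorted(lower))   # certainly 'like'
print("upper approximation:", sorted(upper))   # possibly 'like'
```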

Wavenumber Correlation Analysis of Satellite Geopotential Anomalies

  • Kim, Jeong-Woo;Kim, Won-Kyun;Kim, Hye-Yun
    • Economic and Environmental Geology
    • /
    • v.33 no.2
    • /
    • pp.111-116
    • /
    • 2000
  • Identifying anomaly correlations between data sets is the basis for rationalizing geopotential interpretation and theory. A procedure is presented that constitutes an effective process for identifying correlative features between two or more geopotential data sets. Anomaly features that show direct, inverse, or no correlation between the data may be separated by applying filters in the frequency domains of the data sets. The correlation filter passes or rejects wavenumbers between co-registered data sets based on the correlation coefficient at common wavenumbers, given by the cosine of their phase difference. This study includes an example of a Magsat magnetic anomaly profile that illustrates the usefulness of the procedure for extracting correlative features between the data sets.
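
A minimal sketch of such a wavenumber correlation filter on two co-registered profiles: each common wavenumber is kept or rejected according to the cosine of the phase difference; the signals and threshold below are synthetic placeholders:

```python
# Sketch: wavenumber correlation filter for two co-registered profiles.
# Wavenumbers whose phase difference gives cos(dphi) above a threshold are
# passed; the rest are rejected. Signals and threshold are placeholders.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 2 * np.pi, 512)
shared = np.sin(5 * x) + 0.5 * np.cos(13 * x)          # correlated content
profile_a = shared + 0.2 * rng.normal(size=x.size)
profile_b = shared + np.sin(29 * x) + 0.2 * rng.normal(size=x.size)

A, B = np.fft.rfft(profile_a), np.fft.rfft(profile_b)
cos_dphi = np.cos(np.angle(A) - np.angle(B))           # correlation per wavenumber

keep = cos_dphi > 0.7                                  # pass directly correlated wavenumbers
a_corr = np.fft.irfft(np.where(keep, A, 0), n=x.size)  # correlated part of profile A
print(f"{keep.sum()} of {keep.size} wavenumbers passed")
```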
