• Title/Abstract/Keyword: Data set

Search results: 10,916 items (processing time: 0.037 seconds)

러프집합이론과 SOM을 이용한 연속형 속성의 이산화 (Discretization of Continuous Attributes based on Rough Set Theory and SOM)

  • 서완석;김재련
    • 산업경영시스템학회지
    • /
    • Vol. 28, No. 1
    • /
    • pp.1-7
    • /
    • 2005
  • Data mining has been widely used in recent years to turn huge amounts of data into useful information and knowledge in the information industry. When analyzing a data set with continuous values in order to gain knowledge through data mining, we often undergo a process called discretization, which divides an attribute's values into intervals. Such intervals form new values for the attribute and allow the size of the data set to be reduced. In addition, discretization based on rough set theory has the advantage of being easily applied. In this paper, we suggest a discretization algorithm based on rough set theory and SOM (Self-Organizing Map) as a means of extracting valuable information from large data sets, one that can be employed even when professional knowledge of the field is lacking.
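
The rough-set-based cut-point selection of the paper is not reproduced here, but the general idea of SOM-driven discretization — training a small one-dimensional map on an attribute and turning the sorted prototypes into interval boundaries — can be sketched as follows; the attribute values, map size, and learning schedule are illustrative assumptions only.

```python
import numpy as np

def som_discretize(values, n_intervals=4, n_iter=500, lr=0.5, seed=0):
    """Discretize a 1-D continuous attribute with a tiny 1-D SOM:
    the sorted prototype midpoints become the interval cut points.
    Illustrative sketch only; not the rough-set algorithm of the paper."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # initialise prototypes evenly over the attribute's range
    protos = np.linspace(values.min(), values.max(), n_intervals)
    for t in range(n_iter):
        x = rng.choice(values)
        bmu = int(np.argmin(np.abs(protos - x)))      # best matching unit
        sigma = max(n_intervals / 2 * (1 - t / n_iter), 0.5)
        alpha = lr * (1 - t / n_iter)
        dist = np.abs(np.arange(n_intervals) - bmu)   # grid distance to BMU
        h = np.exp(-(dist ** 2) / (2 * sigma ** 2))   # Gaussian neighbourhood
        protos += alpha * h * (x - protos)
    protos.sort()
    cuts = (protos[:-1] + protos[1:]) / 2             # interval boundaries
    return np.digitize(values, cuts), cuts            # interval code per value

ages = [21, 23, 25, 37, 39, 41, 58, 60, 62, 80]
codes, cuts = som_discretize(ages, n_intervals=3)
print("cut points:", cuts)
print("interval codes:", codes)
```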

인공신경망 이론을 이용한 위성영상의 카테고리분류 (Multi-temporal Remote-Sensing Image Classification Using Artificial Neural Networks)

  • 강문성;박승우;임재천
    • 한국농공학회:학술대회논문집
    • /
    • 한국농공학회 2001년도 학술발표회 발표논문집
    • /
    • pp.59-64
    • /
    • 2001
  • The objective of this thesis is to propose a pattern classification method for remote sensing data using artificial neural networks. First, we apply the error back-propagation algorithm to classify the remote sensing data. In this case, the classification performance depends on the training data set. Using the training data set and the error back-propagation algorithm, a layered neural network is trained so that the training patterns are classified with a specified accuracy. After training the neural network, pixels that are incorrectly classified are deleted from the original training data set and a new training data set is built up. Once training is complete, a testing data set is classified using the trained neural network. The classification results for Landsat TM data show that this approach produces excellent results, which are more realistic and less noisy than those of a conventional Bayesian method.

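A rough sketch of the pruning-and-retraining loop described in the abstract above, with a generic multilayer perceptron standing in for the authors' back-propagation network and synthetic samples standing in for Landsat TM pixels; all parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for multispectral pixels: band values -> land-cover class.
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# First pass: train a layered network with error back-propagation.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
net.fit(X_train, y_train)

# "Delete incorrectly classified pixels" step: keep only the training samples
# the first network gets right, then build and train a new network on them.
keep = net.predict(X_train) == y_train
net2 = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
net2.fit(X_train[keep], y_train[keep])

# Finally, the held-out test set is classified by the retrained network.
print("test accuracy after pruning:", net2.score(X_test, y_test))
```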

An Efficient Grid Method for Continuous Skyline Computation over Dynamic Data Set

  • Li, He;Jang, Su-Min;Yoo, Kwan-Hee;Yoo, Jae-Soo
    • International Journal of Contents
    • /
    • Vol. 6, No. 1
    • /
    • pp.47-52
    • /
    • 2010
  • Skyline queries are an important new search capability for multi-dimensional databases. Most previous work has focused on processing skyline queries over static data sets. However, most real applications deal with dynamic data sets. Since a dynamic data set constantly changes as time passes, continuous skyline computation over it becomes ever more complicated. In this paper, we propose a multiple-layer grid method for continuous skyline computation (MLGCS) that maintains multiple layers of grids to manage the dynamic data set. The proposed method divides the work space into multiple layer grids and creates the skyline influence region in the grid of each layer. In the continuous environment, continuous skyline queries are handled only when the updated data points fall within the skyline influence region of each layer grid. Experiments based on various data distributions show that the proposed method outperforms the existing methods.
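
The MLGCS grid structure itself is not reconstructed here, but the dominance relation that any skyline method, this one included, is built on can be shown with a minimal sketch; the example points and the "smaller is better" convention are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, ...]

def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (smaller values assumed preferable)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points: List[Point]) -> List[Point]:
    """Naive skyline by pairwise dominance checks; MLGCS avoids this full
    rescan by pruning with layered grids, but relies on the same test."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# e.g. hotels as (distance_km, price): keep only the non-dominated trade-offs
hotels = [(1.0, 200.0), (2.0, 150.0), (3.0, 120.0), (5.0, 180.0), (4.0, 160.0)]
print(skyline(hotels))   # -> [(1.0, 200.0), (2.0, 150.0), (3.0, 120.0)]
```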

전류에 관한 학생들의 오인 유형변화의 종단적 연구 (A Longitudinal Study on Students' Misconception Patterns of Electric Current)

  • 문충식;권재술
    • 한국과학교육학회지
    • /
    • Vol. 11, No. 1
    • /
    • pp.1-14
    • /
    • 1991
  • The objective of the study is to examine students' conceptual changes through a longitudinal design. The study compared two data sets collected in 1989 and 1990 using the same instrument and subjects; the first data set was collected by Ahn (1989). Students' patterns of misconceptions were examined in the following aspects: 1) a comparison between the two data sets of students' misconceptions before observation of the actual phenomenon, and 2) an analysis of the patterns of students' misconceptions in the second data set in terms of their patterns of conceptual change before and after observation in the first data set. Overall, the patterns of students' misconceptions appearing in the second data set were similar to those of the first data set; however, about 40% of individual students' patterns of misconceptions changed. Even students who had moved from a misconception to the scientific conception by observing the given phenomenon in the previous study (the first data set) returned to their original misconceptions after one year. The researcher interpreted this phenomenon in terms of the characteristics of the three kinds of cognitive conflict suggested by Kwon (1989).


Finding Unexpected Test Accuracy by Cross Validation in Machine Learning

  • Yoon, Hoijin
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 21, No. 12spc
    • /
    • pp.549-555
    • /
    • 2021
  • Machine learning (ML) splits data into three parts, usually 60% for training, 20% for validation, and 20% for testing. The split is purely quantitative rather than selecting each set of data by a criterion, which is a very important concept for the adequacy of test data. ML measures a model's accuracy by applying the set of validation data and revises the model until the validation accuracy reaches a certain level. After the validation process, the completed model is tested with the set of test data, which the model has not yet seen. If the set of test data covers the model's attributes well, the test accuracy will be close to the validation accuracy of the model. To make sure that ML's set of test data works adequately, we design an experiment and check whether the test accuracy of a model is always close to its validation accuracy, as expected. The experiment builds 100 different SVM models for each of six data sets published in the UCI ML repository. From the test accuracy and validation accuracy of the 600 cases, we find some unexpected cases in which the test accuracy is very different from the validation accuracy. Consequently, it is not always true that ML's set of test data is adequate to assure a model's quality.
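
A minimal sketch of the kind of experiment the abstract describes — repeated 60/20/20 splits, one SVM per split, and a comparison of validation and test accuracy — using scikit-learn, with its built-in wine data as a stand-in for the six UCI data sets; the kernel and the data set choice are assumptions.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# One UCI-style data set (sklearn's wine data) as a stand-in; the paper
# builds 100 models on each of six UCI data sets.
X, y = load_wine(return_X_y=True)

gaps = []
for seed in range(100):
    # quantitative 60 / 20 / 20 split, with no selection criterion
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=seed)

    model = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    test_acc = model.score(X_test, y_test)
    gaps.append(abs(val_acc - test_acc))

# Large gaps flag the "unexpected" cases, where the quantitatively split
# test set does not represent the model's attributes well.
print("mean |val - test| gap:", np.mean(gaps), "max gap:", np.max(gaps))
```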

준지도 학습 기반 객체 탐지 모델에서 데이터셋 변화에 따른 성능 변화 (Performance Change according to Data Set Size Change in Semi-Supervised Learning based Object Detection)

  • 유승수;황원준
    • 한국방송∙미디어공학회:학술대회논문집
    • /
    • 한국방송∙미디어공학회 2022년도 추계학술대회
    • /
    • pp.88-90
    • /
    • 2022
  • Semi-supervised learning is a training method in which only part of the data is labeled and the rest is left unlabeled. Object detection is a computer vision task that locates multiple objects in an image by marking each with a bounding box. Naturally, the larger the data set used in the model training stage and the more objects it contains, the better the model's performance will generally be. However, depending on the experimental environment, it may be difficult to secure a sufficient data set, or the experimental equipment may not be able to handle it. Therefore, this paper examines a semi-supervised learning based object detection model, trains the model while varying the size of the data set, and investigates how the performance changes with the size of the data set.

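The detector and its training loop are outside the scope of this listing, but the data-side protocol the abstract describes — fixing a pool of images and varying how many of them carry labels — can be sketched as follows; the fractions and the id-based representation are illustrative assumptions.

```python
import random

def make_ssl_splits(image_ids, labeled_fractions=(0.05, 0.1, 0.2, 0.5), seed=0):
    """For each labeled fraction, split the image ids into a labeled subset
    and an unlabeled remainder, the form in which a semi-supervised detector
    consumes its data. The detector and training loop are out of scope."""
    rng = random.Random(seed)
    ids = list(image_ids)
    splits = {}
    for frac in labeled_fractions:
        rng.shuffle(ids)
        n_labeled = max(1, int(len(ids) * frac))
        splits[frac] = {"labeled": ids[:n_labeled], "unlabeled": ids[n_labeled:]}
    return splits

splits = make_ssl_splits(range(1000))
for frac, s in splits.items():
    print(f"{frac}: {len(s['labeled'])} labeled / {len(s['unlabeled'])} unlabeled images")
```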

퍼지 데이터를 이용한 불량률(p) 관리도의 설계 (A Design of Control Chart for Fraction Nonconforming Using Fuzzy Data)

  • 김계완;서현수;윤덕균
    • 품질경영학회지
    • /
    • Vol. 32, No. 2
    • /
    • pp.191-200
    • /
    • 2004
  • Using the p chart is not adequate when there is a large amount of data and it is difficult to classify products as conforming or nonconforming because of the obscurity of binary classification. So we need to design a new control chart that represents such obscure situations efficiently. This study deals with a method for performing arithmetic operations that represent fuzzy data as fuzzy sets by applying fuzzy set theory, and designs a new control chart that takes into account the classification of the term set and the membership functions associated with it.
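
A minimal sketch of a p chart fed with fuzzy data, where each inspected item carries a membership degree for "nonconforming" instead of a crisp 0/1 label; reducing each sample to its mean membership is a simplification of the term-set and membership-function construction used in the paper, and the simulated data below are illustrative.

```python
import numpy as np

def fuzzy_p_chart(membership_by_sample, n):
    """p chart limits computed from fuzzy inspection data.

    Each item has a membership degree in [0, 1] for "nonconforming";
    a sample's fraction nonconforming is taken as the mean membership
    (a shortcut, not the paper's term-set construction)."""
    p_hat = np.array([np.mean(m) for m in membership_by_sample])
    p_bar = p_hat.mean()
    sigma = np.sqrt(p_bar * (1 - p_bar) / n)
    ucl = p_bar + 3 * sigma
    lcl = max(p_bar - 3 * sigma, 0.0)
    out_of_control = np.where((p_hat > ucl) | (p_hat < lcl))[0]
    return p_bar, lcl, ucl, out_of_control

rng = np.random.default_rng(1)
samples = [rng.beta(1, 9, size=50) for _ in range(20)]  # 20 samples of 50 items
print(fuzzy_p_chart(samples, n=50))
```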

퍼지 속성 집합을 이용한 데이터 분석 모델 (Data Analysis Model using the Fuzzy Property Set)

  • 이진호;이전영
    • 한국지능시스템학회:학술대회논문집
    • /
    • 한국퍼지및지능시스템학회 1997년도 춘계학술대회 학술발표 논문집
    • /
    • pp.252-255
    • /
    • 1997
  • In this paper, we propose a methodology for data analysis using the fuzzy property set model. In the real world, data can be represented with objects, $\theta$, properties, $\pi$, and a has-property relation, P. The conceptual space can then be defined with the chosen properties, and each object has a unique location in that conceptual space. In the fuzzy model, the fuzzy property and the fuzzy conceptual space can be redefined accordingly. To analyze data using the fuzzy property set model, the rough set needs to be defined in the fuzzy conceptual space.

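A toy sketch of the objects/properties picture in this abstract: fuzzy degrees play the role of the has-property relation P, an alpha-cut turns them into crisp property sets, and a rough lower/upper approximation is computed over the resulting indiscernibility classes. All object and property names and the threshold are invented for illustration.

```python
# Each object gets a fuzzy degree (0..1) for every chosen property; the chosen
# properties span the conceptual space and the degree vector is the object's
# location in it.
objects = {
    "o1": {"heavy": 0.9, "fast": 0.2},
    "o2": {"heavy": 0.8, "fast": 0.3},
    "o3": {"heavy": 0.1, "fast": 0.9},
    "o4": {"heavy": 0.2, "fast": 0.7},
}

def alpha_cut(objs, alpha=0.5):
    """Crisp has-property relation obtained from the fuzzy degrees."""
    return {o: frozenset(p for p, mu in props.items() if mu >= alpha)
            for o, props in objs.items()}

def rough_approximation(objs, target, alpha=0.5):
    """Rough lower/upper approximation of a target set of objects, treating
    objects with the same alpha-cut property set as indiscernible."""
    classes = {}
    for o, props in alpha_cut(objs, alpha).items():
        classes.setdefault(props, set()).add(o)
    lower = set().union(*[c for c in classes.values() if c <= target])
    upper = set().union(*[c for c in classes.values() if c & target])
    return lower, upper

print(rough_approximation(objects, target={"o1", "o2", "o3"}))
# -> lower {'o1', 'o2'}, upper {'o1', 'o2', 'o3', 'o4'}
```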

4분위수에 대한 메모 (A Note on Quartile)

  • 박동준;황현미
    • 품질경영학회지
    • /
    • Vol. 26, No. 3
    • /
    • pp.150-155
    • /
    • 1998
  • It is necessary to describe a data set after data collection in an elementary statistics course. Two major numerical summaries of a data set are measures of central location and dispersion. There are various numerical summary methods for presenting how data are dispersed, and each method has its own advantages and disadvantages. Among several methods for describing the dispersion of a data set, quartiles are discussed. When the data type is discrete, exact quartile values are sometimes ambiguous to find, whereas exact quartile values are obtained for continuous data. Examples of both data types are given. The programs listed in the paper may be used to obtain quartiles in MINITAB and SAS.

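The MINITAB and SAS programs referred to in the abstract are not reproduced here, but the ambiguity the note discusses is easy to demonstrate: different conventions return different quartiles for the same small discrete data set (the data below are illustrative).

```python
import statistics
import numpy as np

data = [1, 2, 3, 4, 5, 6, 7, 8]   # a small discrete data set

# Different conventions return different quartiles for the same values.
print(statistics.quantiles(data, n=4, method="exclusive"))  # (n+1)p rule -> [2.25, 4.5, 6.75]
print(statistics.quantiles(data, n=4, method="inclusive"))  # closed-range rule -> [2.75, 4.5, 6.25]
print(np.percentile(data, [25, 50, 75]))                    # numpy default (linear) -> [2.75 4.5 6.25]
```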

On the clustering of huge categorical data

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 21, No. 6
    • /
    • pp.1353-1359
    • /
    • 2010
  • The basic objective of cluster analysis is to discover natural groupings of items. In general, clustering is conducted based on a similarity (or dissimilarity) matrix or on the original input data, and various measures of similarity between objects have been developed. In this paper, we consider the clustering of a huge categorical real data set that reflects aspects of the time-location-activity of Korean people. Some useful similarity measures for this data set are developed and adopted for the categorical variables. Hierarchical and nonhierarchical clustering methods are applied to the considered data set, which is huge and consists of many categorical variables.
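
The survey data and the tailored similarity measures developed in the paper are not reproduced here; the sketch below only shows the generic pattern of clustering categorical records from a simple-matching dissimilarity matrix with hierarchical (average-linkage) clustering, on invented toy records.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy categorical records in a time-location-activity style.
records = np.array([
    ["home",   "rest",    "evening"],
    ["home",   "rest",    "night"],
    ["office", "work",    "morning"],
    ["office", "work",    "afternoon"],
    ["park",   "leisure", "afternoon"],
])

# Simple-matching dissimilarity: share of attributes on which two records differ.
n = len(records)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = np.mean(records[i] != records[j])

Z = linkage(squareform(D), method="average")   # hierarchical, average linkage
print(fcluster(Z, t=3, criterion="maxclust"))  # cut the dendrogram into 3 clusters
```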