• Title/Summary/Keyword: Data Generalization

Developing a Neural-Based Credit Evaluation System with Noisy Data (불량 데이타를 포함한 신경망 신용 평가 시스템의 개발)

  • Kim, Jeong-Won;Choi, Jong-Uk;Choi, Hong-Yun;Chuong, Yoon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.1 no.2
    • /
    • pp.225-236
    • /
    • 1994
  • Many studies by neural network researchers have claimed that the degree of generalization of neural network systems is higher than, or at least equal to, that of statistical methods. However, such results can be obtained only if the network is trained on appropriately sound data, that is, data that contain little noise and are large enough to control for the noise that remains. Real data in many fields, especially business, are often not sound enough, so networks have frequently failed to achieve satisfactory prediction accuracy, that is, a satisfactory degree of generalization. This study discusses how to enhance the degree of generalization with noisy data. The suggestion, obtained through a series of experiments, is to remove inconsistent data by checking for overlaps and inconsistencies. The study also confirms the earlier conclusion of other reports that the learning mechanism of a neural network takes the average value of two inconsistent data items included in the training set [2]. The paper reports interim results of an ongoing research project: the architecture of the neural network adopted in the project, and the overall idea of developing an online credit evaluation system that integrates an expert (reasoning) system with a neural network (learning) system. The study further corroborates that quickprop, adopted as the learning algorithm, learns faster than backpropagation even in a very noisy environment.
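The data-cleaning step suggested in this abstract can be illustrated with a minimal sketch. The record layout (a feature tuple plus a label), the function name `clean_training_set`, and the policy of discarding every conflicting group are assumptions made for illustration; the abstract does not describe the authors' actual procedure.

```python
from collections import defaultdict


def clean_training_set(records):
    """Drop exact duplicates and feature vectors that carry conflicting labels.

    `records` is a list of (features, label) pairs, where `features` is a
    hashable tuple. Both the layout and the policy of discarding every
    conflicting group are illustrative assumptions.
    """
    labels_by_features = defaultdict(set)
    for features, label in records:
        labels_by_features[features].add(label)

    cleaned = []
    seen = set()
    for features, label in records:
        if len(labels_by_features[features]) > 1:
            continue  # inconsistent: same inputs, different outcomes
        if features in seen:
            continue  # overlapping: exact duplicate of an earlier record
        seen.add(features)
        cleaned.append((features, label))
    return cleaned


# Hypothetical applicants: (income band, years employed) -> credit label
applicants = [
    ((3, 10), "good"),
    ((3, 10), "good"),  # duplicate of the first record
    ((1, 2), "bad"),
    ((2, 5), "good"),
    ((2, 5), "bad"),    # conflicts with the previous record
]
cleaned = clean_training_set(applicants)
```

Dropping the whole conflicting group, rather than keeping one side, avoids the averaging effect the abstract mentions for inconsistent pairs.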

Predicting movie audience with stacked generalization by combining machine learning algorithms

  • Park, Junghoon;Lim, Changwon
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.3
    • /
    • pp.217-232
    • /
    • 2021
  • The Korean film industry has matured, and the number of movies watched per capita has reached the highest level in the world. Since then, the growth rate of the movie industry has been declining, and total annual movie sales even decreased slightly in 2018. The number of moviegoers is the primary driver of sales in the movie industry and also an important factor influencing additional sales. Thus it is important to predict the size of movie audiences. In this study, we predict the cumulative audience of films using stacking, an ensemble method that combines all the algorithms used in the prediction. We use box office data from the Korea Film Council and web comment data from Daum Movie (www.movie.daum.net). This paper describes the collection and preprocessing of the explanatory variables and explains the regression models used in stacking. The final stacking model shows superior prediction performance on the test set in terms of RMSE.
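To make the idea of stacking concrete, here is a minimal pure-Python sketch. It is not the authors' model: the two base learners (a mean predictor and a least-squares line), the restricted meta-learner (a single blend weight chosen by grid search on out-of-fold predictions), and the toy data are all assumptions chosen to keep the example short.

```python
import math


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean(xs, ys):
    """Base model 1: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Base model 2: least-squares line through the (x, y) pairs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def out_of_fold(fit, xs, ys, k):
    """Predict each point with a model trained on the other folds."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds


def fit_stack(xs, ys, base_fits, k=3):
    """Meta-learner: pick the blend weight that minimizes out-of-fold RMSE."""
    oof_a, oof_b = (out_of_fold(fit, xs, ys, k) for fit in base_fits)

    def blended_error(w):
        return rmse(ys, [w * a + (1 - w) * b for a, b in zip(oof_a, oof_b)])

    best_w = min((w / 20 for w in range(21)), key=blended_error)
    # Refit both base models on all the data before combining them.
    model_a, model_b = (fit(xs, ys) for fit in base_fits)
    return lambda x: best_w * model_a(x) + (1 - best_w) * model_b(x)


# Toy data (hypothetical): the target is an exact linear function of x.
xs = list(range(1, 10))
ys = [2 * x + 1 for x in xs]
stacked = fit_stack(xs, ys, [fit_mean, fit_line])
```

Training the meta-learner on out-of-fold predictions, rather than on in-sample fits, is what keeps it from simply rewarding the base model that overfits most.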

Effect of Building Generalization in a Lattice Cell Form on the Spatial Connectivity of Overland Storm Waterways in an Urban Residential Area (격자형 건물 일반화가 도시 주거지 빗물 유출경로의 연속성에 미치는 영향)

  • JEON, Ka-Young;HA, Sung-Ryong
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.20 no.1
    • /
    • pp.137-151
    • /
    • 2017
  • During rain events, the space between urban buildings becomes a waterway, so numerical calculations on grids require a boundary condition that separates overland storm flows from building areas. Minimizing the distortion of the building data used as that boundary condition is a necessary step toward accurate results. Building generalization is used to reduce the distortion of building shapes and areas during raster conversion. The objective of this study was to provide appropriate values for the building generalization threshold and the grid size in a numerical calculation. The impact of building generalization on the connectivity of urban storm waterways was analyzed for a general residential area. The building generalization threshold and the grid size were selected as the independent variables, and the number and area of sinks were used as the dependent variables. The optimal threshold and grid size were taken to be those that maximize the building area and minimize the sink area. With a 3 m generalization threshold, grid sizes from $5 \times 5$ m to $10 \times 10$ m resulted in 5% less building area and 94.4% more sink area than the original values. Two sites representing general residential area types 2 and 3 were used to verify the building generalization thresholds for improving the connectivity of storm waterways. The recommended values proved effective in reducing the distortion of both building and sink areas.

Spatiotemporal Moving Pattern Discovery using Location Generalization of Moving Objects (이동객체 위치 일반화를 이용한 시공간 이동 패턴 탐사)

  • Lee, Jun-Wook;Nam, Kwang-Woo
    • The KIPS Transactions:PartD
    • /
    • v.10D no.7
    • /
    • pp.1103-1114
    • /
    • 2003
  • Currently, one of the most critical issues in developing service support systems for various spatiotemporal applications is the discovery of meaningful knowledge from large volumes of moving object data. This kind of knowledge is referred to as a spatiotemporal moving pattern. To discover such knowledge, various relationships between moving objects, such as temporal, spatial, and spatiotemporal topological relationships, need to be considered. In this paper, we propose an efficient method, MPMine, for discovering spatiotemporal moving patterns. The method not only considers both temporal and spatial constraints but also performs spatial generalization using a spatial topological operation, contain(). Unlike previous temporal pattern methods, the proposed method reduces the search space by using location summarization and generalization of the moving object data. Efficient discovery of useful moving patterns is therefore possible.

Segment unit shuffling layer in deep neural networks for text-independent speaker verification (문장 독립 화자 인증을 위한 세그멘트 단위 혼합 계층 심층신경망)

  • Heo, Jungwoo;Shim, Hye-jin;Kim, Ju-ho;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.2
    • /
    • pp.148-154
    • /
    • 2021
  • Text-independent speaker verification needs to extract text-independent speaker embeddings to improve generalization performance. However, deep neural networks that depend on training data can overfit to text information instead of learning speaker information when they repeatedly learn from identical time series. In this paper, to prevent this overfitting, we propose a segment unit shuffling layer that divides and rearranges the input layer or a hidden layer along the time axis, thereby mixing the time series information. Since the segment unit shuffling layer can be applied not only to the input layer but also to the hidden layers, it can be used as a generalization technique in the hidden layer, which is known to be more effective than generalization techniques in the input layer, and it can be applied together with data augmentation. In addition, the degree of distortion can be controlled by adjusting the unit size of the segment. We observe that the performance of text-independent speaker verification improves over the baseline when the proposed segment unit shuffling layer is applied.
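The core operation of such a layer, dividing a sequence along the time axis into fixed-size segments and rearranging them, can be sketched on a plain Python list. This is an illustration only: the function name `segment_shuffle`, the handling of a short trailing segment, and the use of a list instead of a network tensor are assumptions, and the abstract does not specify how the layer is wired into training.

```python
import random


def segment_shuffle(frames, segment_size, rng=None):
    """Split `frames` along the time axis into fixed-size segments and
    return them in a random order. Frame order inside a segment is kept.
    A trailing segment shorter than `segment_size` is shuffled with the rest.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    rng = rng or random.Random()
    segments = [frames[i:i + segment_size]
                for i in range(0, len(frames), segment_size)]
    rng.shuffle(segments)
    return [frame for segment in segments for frame in segment]


# Twelve hypothetical frame indices, shuffled in segments of four frames.
frames = list(range(12))
shuffled = segment_shuffle(frames, segment_size=4, rng=random.Random(0))
```

A smaller `segment_size` breaks up more of the temporal context, which corresponds to the adjustable degree of distortion mentioned in the abstract.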

Comparative Analysis of Generalization and Justification of the Mathematically Gifted 6th Graders by Learning Styles (초등학교 6학년 수학영재학생들의 학습유형에 따른 일반화 및 정당화 비교 분석)

  • Yu, Migyoung;Chang, Hyewon
    • Journal of Educational Research in Mathematics
    • /
    • v.27 no.3
    • /
    • pp.391-410
    • /
    • 2017
  • This study aims to analyze how mathematically gifted students generalize and justify in a given mathematical task, and to derive didactical implications for individualized teaching according to students' learning styles. To do this, we identified the learning styles of three mathematically gifted 6th graders and observed their processes in solving a given problem. A paper-and-pencil environment and a dynamic geometry environment using GeoGebra were provided to each of the three students. We collected and qualitatively analyzed research data such as the students' activity sheets, their records in GeoGebra, our observation reports on the processes of generalization and justification, and interview records. The results show that the students' types of generalization vary while the level of their justifications is identical. Furthermore, their preferences for learning environment also differ. Based on these results, we derived some implications for individualized teaching of mathematically gifted students according to learning styles.

A Study on the Consecutive Renewal of Road and Building Information in the Multi-scale Digital Maps (다축척 수치지도의 도로 및 건물정보 일괄갱신 연구)

  • Park, Kyeong-Sik
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.29 no.1
    • /
    • pp.21-28
    • /
    • 2011
  • With the existing Ver. 1.0 digital map, it is impossible to produce a smaller-scale digital map, that is, one at 1/5000 or below, from the 1/1000 digital map, which is the largest-scale one. For this reason, the existing 1/1000 and 1/5000 digital maps are produced separately from aerial photos at two different scales. The next-generation digital map should allow smaller-scale maps to be derived successively from the largest-scale one, which matters greatly for data sharing and consecutive renewal. Since the development of the Ver. 2.0 digital map, the possibility of making a multi-scale consecutive digital map has been raised and related research has resumed. The most basic task for multi-scale digital maps is to decide the generalization criteria between the two scales. In this study, I try to formulate the generalization criteria required to make the 1/5000 digital map from the 1/1000 one. In addition, I try to explore the applicability of consecutive renewal by carrying out auto-generalization.

Efficient Incremental Learning using the Preordered Training Data (미리 순서가 매겨진 학습 데이타를 이용한 효과적인 증가학습)

  • Lee, Sun-Young;Bang, Sung-Yang
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.2
    • /
    • pp.97-107
    • /
    • 2000
  • Incremental learning generally reduces training time and increases the generalization of a neural network by selecting training data incrementally during training. However, existing incremental learning methods re-evaluate the importance of the training data every time they select additional data. In this paper, an incremental learning algorithm is proposed for pattern classification problems. It evaluates the importance of each piece of data only once, before training starts. The importance of a data point depends on how close it is to the decision boundary. The paper presents an algorithm that uses clustering to order the data according to their distance to the decision boundary. Experimental results on two artificial and real-world classification problems show that the proposed incremental learning method significantly reduces the size of the training set without decreasing generalization performance.
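The one-time ordering step can be sketched as follows. The abstract does not describe the clustering procedure, so this example substitutes the simplest possible stand-in, one centroid per class, and scores each sample by a hypothetical margin between its own-class centroid and the nearest other-class centroid. It assumes at least two classes are present.

```python
import math


def centroid(points):
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def order_by_boundary_closeness(samples):
    """Sort (point, label) samples so those nearest the boundary come first.

    Margin = |distance to the nearest other-class centroid minus distance
    to the own-class centroid|; a small margin means the point lies near
    the boundary between centroids. Requires at least two classes.
    """
    by_label = {}
    for point, label in samples:
        by_label.setdefault(label, []).append(point)
    centers = {label: centroid(pts) for label, pts in by_label.items()}

    def margin(sample):
        point, label = sample
        own = math.dist(point, centers[label])
        other = min(math.dist(point, center)
                    for other_label, center in centers.items()
                    if other_label != label)
        return abs(other - own)

    # Computed once, before training; training can then consume the
    # list from the front, most boundary-critical samples first.
    return sorted(samples, key=margin)


# Hypothetical one-dimensional samples from two classes.
samples = [((0.0,), "A"), ((1.0,), "A"), ((4.0,), "A"),
           ((6.0,), "B"), ((9.0,), "B"), ((10.0,), "B")]
ordered = order_by_boundary_closeness(samples)
```

Because the ordering is fixed up front, the cost of ranking is paid once instead of at every data-selection step.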

Learning Domain Invariant Representation via Self-Regularization (자기 정규화를 통한 도메인 불변 특징 학습)

  • Hyun, Jaeguk;Lee, ChanYong;Kim, Hoseong;Yoo, Hyunjung;Koh, Eunjin
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.24 no.4
    • /
    • pp.382-391
    • /
    • 2021
  • Unsupervised domain adaptation often gives impressive solutions for handling domain shift in data. Most current approaches assume that unlabeled target data for training are abundant. This assumption does not always hold in practice. To tackle this issue, we propose a general solution to the domain gap minimization problem that uses no target data. Our method consists of two regularization steps. The first step is pixel regularization by arbitrary style transfer. Recently, some methods have brought style transfer algorithms into the domain adaptation and domain generalization process, using them to remove texture bias in source domain data. We also use style transfer algorithms to remove texture bias, but our method depends on neither the domain adaptation nor the domain generalization paradigm. The second step is feature regularization by feature alignment. By adding a feature alignment loss term to the model loss, the model learns domain invariant representations more efficiently. We evaluate our regularization methods in several experiments on both small and large datasets. The experiments show that our model can learn domain invariant representations as well as unsupervised domain adaptation methods do.

A Study on the Development of DGA based on Deep Learning (Deep Learning 기반의 DGA 개발에 대한 연구)

  • Park, Jae-Gyun;Choi, Eun-Soo;Kim, Byung-June;Zhang, Pan
    • Korean Journal of Artificial Intelligence
    • /
    • v.5 no.1
    • /
    • pp.18-28
    • /
    • 2017
  • Recently, many companies have come to use systems based on artificial intelligence. The accuracy of artificial intelligence depends on the amount of training data and on an appropriate algorithm. However, it is not easy to obtain training data with a large number of entities. Small datasets lead to large generalization errors due to overfitting. To minimize this generalization error, this study proposes DGA, which applies a machine-learning-based genetic algorithm to deep-learning-based dropout and can be expected to give relatively high accuracy even with a small dataset. The idea of this paper is to determine the active state of the nodes. A new fitness function is defined using the gradient of the loss function. The proposed DGA compensates for the stochastic inconsistency of dropout. DGA also addresses the problems that genetic algorithms have with the complexity of the fitness function and the expressive range of the model. In experiments using MNIST data, the proposed algorithm reached an accuracy of 75.3%, whereas dropout alone reached 41.4%. This shows that DGA performs better than dropout alone.