• Title/Summary/Keyword: feature models

Search Results: 1,103

Analysis of Interactions in Multiple Genes using IFSA(Independent Feature Subspace Analysis) (IFSA 알고리즘을 이용한 유전자 상호 관계 분석)

  • Kim, Hye-Jin;Choi, Seung-Jin;Bang, Sung-Yang
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.3
    • /
    • pp.157-165
    • /
    • 2006
  • Changes in the external and internal factors of the cell require specific biological functions to maintain life. Such functions encourage particular genes to interact with and regulate each other in multiple ways. Accordingly, we applied a linear decomposition model, IFSA, which derives hidden variables, called 'expression modes', that correspond to these functions. To interpret gene interaction/regulation, we used a cross-correlation method given an expression mode. Linear decomposition models such as principal component analysis (PCA) and independent component analysis (ICA) have been shown to be useful in analyzing high-dimensional DNA microarray data, compared to clustering methods. These methods assume that gene expression is controlled by a linear combination of uncorrelated/independent latent variables. However, they have difficulty grouping similar patterns that are slightly time-delayed or asymmetric, since only exactly matched patterns are considered. To overcome this, we employ the IFSA method of [1] to locate phase- and shift-invariant features. Membership scoring functions play an important role in classifying genes, since linear decomposition models basically aim at data reduction, not at grouping data. We introduce a new scoring function essential to the IFSA method. In this paper we stress that IFSA is useful in grouping functionally related genes in the presence of time shift and expression phase variance. Ultimately, we propose a new approach to investigate the multiple interaction information of genes.
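The cross-correlation step described in the abstract can be illustrated with a minimal sketch (numpy only, toy data; this is not the authors' implementation): scanning a small range of time lags lets two profiles with the same shape but a delay still score highly.

```python
import numpy as np

def max_cross_correlation(x, y, max_lag=3):
    """Peak normalized cross-correlation between two expression
    profiles over a small window of time lags, so that similar but
    time-shifted patterns still score highly."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    best = -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag]
        else:
            a, b = x[:lag], y[-lag:]
        best = max(best, np.mean(a * b))
    return best

# Two toy profiles: same shape, one delayed by two time points.
t = np.arange(20)
g1 = np.sin(t / 3.0)
g2 = np.sin((t - 2) / 3.0)
print(max_cross_correlation(g1, g2))  # high despite the shift
```

A plain (lag-0) correlation would penalize the two-point delay; the lag scan is what makes the grouping shift-tolerant.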

Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

  • Kim, Eunmi;Hong, Taeho
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.29-45
    • /
    • 2015
  • Response modeling is a well-known research issue for those who seek better performance in predicting customers' response to marketing promotions. A response model can reduce marketing cost by identifying prospective customers in a very large customer database and predicting the purchasing intention of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy incurs unnecessary cost. In addition, the big data environment has accelerated the development of response models with data mining techniques such as CBR, neural networks, and support vector machines. CBR is one of the major tools in business because it is simple and robust to apply to response modeling, and it remains attractive for business data mining applications even though it has not shown high performance compared to other machine learning techniques. Thus many studies have tried to improve CBR for business data mining with enhanced algorithms or the support of other techniques such as genetic algorithms, decision trees, and AHP (Analytic Hierarchy Process). Ahn and Kim (2008) used logit, neural networks, and CBR to predict which customers would purchase the items promoted by the marketing department, and tried to optimize the number k for the k-nearest neighbors with a genetic algorithm to improve the performance of the integrated model. Hong and Park (2009) noted that an integrated approach combining CBR with logit, neural networks, and Support Vector Machines (SVM) predicted customers' response to marketing promotion better than each data mining model (logit, neural networks, SVM) alone. This paper presents an approach to predicting customers' response to marketing promotion with Case Based Reasoning. The proposed model was developed by applying different weights to each feature.
We built a logit model on a database including the promotion and purchasing data of bath soap, and the resulting coefficients were used as the feature weights of CBR. We empirically compared the proposed weighted CBR model against neural networks and a pure CBR model, and found that the weighted CBR model outperformed the pure CBR model. Imbalanced data is a common problem when building classification models on real data, as in bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably small or large compared to the number of instances in the other classes. A classification model such as a response model has trouble learning the pattern from such data, because the model tends to ignore the minority class while classifying the majority class correctly. To resolve the problem caused by an imbalanced data distribution, sampling is one of the most representative approaches; sampling methods can be categorized into under-sampling and over-sampling. CBR, however, is not sensitive to the data distribution, because unlike machine learning algorithms it does not learn from the data. In this study, we investigated the robustness of the proposed model while changing the ratio of response customers to nonresponse customers in the promotion program, because in the real world the customers who respond to a promotion are always a small fraction of those who do not. We simulated the proposed model 100 times to validate its robustness with different ratios of response to nonresponse customers under an imbalanced data distribution. We found that our proposed CBR-based model showed superior performance to the compared models on the imbalanced data sets. Our study is expected to improve the performance of response models for promotion programs with CBR under the imbalanced data distributions of the real world.
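The feature-weighting idea in the abstract (logit coefficients reused as distance weights for CBR retrieval) can be sketched roughly as follows. This is numpy-only toy data, not the paper's bath-soap dataset; the weight vector stands in for logit coefficient magnitudes.

```python
import numpy as np

def weighted_cbr_predict(query, cases, labels, weights, k=3):
    """k-nearest-neighbour CBR where each feature's contribution to
    the distance is scaled by a weight (here: a stand-in for logit
    coefficient magnitudes), so more predictive features dominate
    case retrieval."""
    d = np.sqrt(((cases - query) ** 2 * weights).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return labels[nearest].mean()  # fraction of responders among neighbours

# Toy promotion data: feature 0 is informative, feature 1 is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)          # response driven by feature 0 only
w = np.array([1.0, 0.01])              # stand-in for |logit coefficients|
score = weighted_cbr_predict(np.array([1.5, 0.0]), X, y, w, k=5)
print(score)
```

With the informative feature up-weighted, retrieval is driven by feature 0 and the neighbours of a clearly-responding query are almost all responders.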

Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM (딥러닝 시계열 알고리즘 적용한 기업부도예측모형 유용성 검증)

  • Cha, Sungjae;Kang, Jungseok
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.1-32
    • /
    • 2018
  • Beyond stakeholders such as the managers, employees, creditors, and investors of bankrupt companies, corporate defaults have a ripple effect on the local and national economy. Before the Asian financial crisis, the Korean government analyzed only SMEs and tried to improve the forecasting power of a single default prediction model rather than developing various corporate default models. As a result, even large corporations, the so-called 'chaebol enterprises', went bankrupt. Even after that, the analysis of past corporate defaults focused on specific variables, and when the government restructured companies immediately after the global financial crisis, it concentrated only on certain main variables such as the debt ratio. A multifaceted study of corporate default prediction models is essential to serve diverse interests and to avoid a sudden total collapse such as the 'Lehman Brothers case' of the global financial crisis. The key variables used in corporate default prediction vary over time: Deakin's (1972) study, revisiting the analyses of Beaver (1967, 1968) and Altman (1968), shows that the major factors affecting corporate failure have changed, and Grice (2001) likewise found shifts in the importance of the predictive variables of Zmijewski's (1984) and Ohlson's (1980) models. However, past studies use static models, and most do not consider changes that occur over time. Therefore, to construct consistent prediction models, it is necessary to compensate for this time-dependent bias with a time series algorithm that reflects dynamic change. Centered on the global financial crisis, which had a significant impact on Korea, this study uses 10 years of annual corporate data from 2000 to 2009. The data are divided into training, validation, and test sets covering 7, 2, and 1 years, respectively.
To construct a consistent bankruptcy model in the flow of time, we first train a deep learning time series model on the data before the financial crisis (2000~2006). Parameter tuning of the existing models and the deep learning time series algorithm is conducted on validation data including the financial crisis period (2007~2008). As a result, we construct a model that shows a pattern similar to the training results and excellent prediction power. After that, each bankruptcy prediction model is retrained on the merged training and validation data (2000~2008), applying the optimal parameters found in validation. Finally, each corporate default prediction model is evaluated and compared on test data (2009) using the models trained over the nine years, and the usefulness of the corporate default prediction model based on the deep learning time series algorithm is demonstrated. In addition, by adding Lasso regression to the existing variable-selection methods (multiple discriminant analysis, logit model), we show that the deep learning time series model based on the three bundles of variables is useful for robust corporate default prediction. The definition of bankruptcy used is the same as that of Lee (2015). Independent variables include financial information such as the financial ratios used in previous studies. Multivariate discriminant analysis, the logit model, and the Lasso regression model are used to select the optimal variable groups. The multivariate discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning algorithms, and deep learning time series algorithms are compared. Corporate data suffer from limitations of nonlinear variables, multi-collinearity among variables, and lack of data.
The logit model addresses the nonlinearity, the Lasso regression model solves the multi-collinearity problem, and the deep learning time series algorithm, with a variable data generation method, compensates for the lack of data. Big data technology is moving from simple human analysis toward automated AI analysis and, ultimately, intertwined AI applications. Although the study of corporate default prediction models using time series algorithms is still in its early stages, the deep learning algorithm is much faster than regression analysis at building corporate default prediction models and more effective in prediction power. Amid the Fourth Industrial Revolution, the Korean government and governments overseas are working hard to integrate such systems into the everyday life of their nations and societies, yet deep learning time series research for the financial industry is still insufficient. This is an initial study on deep learning time series analysis of corporate defaults; it is therefore hoped that it will serve as comparative material for non-specialists starting a study that combines financial data with deep learning time series algorithms.
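As a hedged illustration only (not the paper's code or data), the 7/2/1-year split and the windowing of yearly financial ratios into fixed-length sequences for a time series model such as an RNN/LSTM might look like:

```python
import numpy as np

def make_sequences(panel, seq_len=3):
    """Slide a seq_len-year window over each firm's yearly ratios;
    each window becomes one input sequence for a time series model."""
    firms, years, feats = panel.shape
    seqs = [panel[:, t:t + seq_len, :] for t in range(years - seq_len + 1)]
    return np.concatenate(seqs, axis=0)

# Hypothetical panel: 50 firms x 10 yearly observations (2000-2009)
# of 4 financial ratios each. Values are random placeholders.
rng = np.random.default_rng(1)
panel = rng.normal(size=(50, 10, 4))
train, val, test = panel[:, :7], panel[:, 7:9], panel[:, 9:]  # 7/2/1 years
X_train = make_sequences(train, seq_len=3)
print(X_train.shape)  # (250, 3, 4): 5 windows per firm x 50 firms
```

The point of the windowing is that, unlike a static model fed one year's ratios, each training example carries a short history, which is what lets an RNN/LSTM exploit dynamic change.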

A Comparative Analysis of News Frame on U. S. Beef Imports and Candlelight Vigils (미국산 수입쇠고기와 촛불시위 보도에 나타난 뉴스 프레임 비교 연구)

  • Im, Yang-June
    • Korean journal of communication and information
    • /
    • v.46
    • /
    • pp.108-147
    • /
    • 2009
  • This study explores the news frames on U.S. beef imports and the candlelight vigils as covered by two national dailies, the ChosunIlbo and the Hankyoreh Shinmun, and a local daily, the KwangwonIlbo. The news frames, extracted based on the models of Iyengar (1987), Semetko & Valkenburg (2000) and other researchers, are attribution of responsibility, economic consequences, protest against the authorities, national health, and governmental public relations. The results show that the news reports consist of straight news (75.9%), feature stories (11.7%) and editorials (6.3%). More specifically, there is a comparatively high ratio of editorials (11.0%) for the ChosunIlbo, feature stories (20.9%) for the Hankyoreh, and straight news (89.7%) for the KwangwonIlbo. In terms of the news frames stressed by the three dailies, the ChosunIlbo stresses national health (17.8%) and attribution of responsibility (10.6%); the Hankyoreh tends to stress protest against the authorities (31.3%) and attribution of responsibility (38.4%); and the KwangwonIlbo focuses on protest against the authorities (38.4%) and economic consequences (17.9%). Finally, as for the main characteristics of the dailies, the governmental public relations frame is found only in the ChosunIlbo, at a comparatively high ratio, and the Hankyoreh has a high ratio of feature stories on U.S. beef imports. Even though the KwangwonIlbo has a high ratio of the economic consequences frame, in its opinion pages, such as editorials and columns, the local newspaper has not spoken up about the potential economic crisis of the local Kwangwon province beef industry that U.S. beef imports could cause.


A Study on Training Dataset Configuration for Deep Learning Based Image Matching of Multi-sensor VHR Satellite Images (다중센서 고해상도 위성영상의 딥러닝 기반 영상매칭을 위한 학습자료 구성에 관한 연구)

  • Kang, Wonbin;Jung, Minyoung;Kim, Yongil
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_1
    • /
    • pp.1505-1514
    • /
    • 2022
  • Image matching is a crucial preprocessing step for the effective utilization of multi-temporal and multi-sensor very high resolution (VHR) satellite images. Deep learning (DL) methods, which are attracting widespread interest, have proven to be an efficient way to measure the similarity between image pairs quickly and accurately by extracting complex and detailed features from satellite images. However, image matching of VHR satellite images remains challenging because DL results depend on the quantity and quality of the training dataset, and creating a training dataset from VHR satellite images is difficult. Therefore, this study examines the feasibility of a DL-based method for matching pair extraction, the most time-consuming process during image registration. This paper also analyzes the factors that affect accuracy depending on the configuration of the training dataset, when developing a training dataset for DL-based image matching from an existing, biased multi-sensor VHR image database. For this purpose, the generated training dataset was composed of correct and incorrect matching pairs by assigning true and false labels to image pairs extracted with a grid-based Scale Invariant Feature Transform (SIFT) algorithm on a total of 12 multi-temporal and multi-sensor VHR images. The Siamese convolutional neural network (SCNN), proposed for matching pair extraction, was trained on the constructed dataset; it measures similarity by passing the two images in parallel through two identical convolutional neural network branches. The results confirm that data acquired from a VHR satellite image database can be used as a DL training dataset, and indicate the potential to improve the efficiency of the matching process through appropriate configuration of multi-sensor images. DL-based image matching using multi-sensor VHR satellite images is expected to replace existing manual feature extraction methods, given its stable performance, and to develop further into an integrated DL-based image registration framework.
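The defining property of the SCNN, namely that both images of a pair pass through identical weight-shared branches before their features are compared, can be caricatured in a few lines. The linear-plus-tanh "encoder" below is a numpy stand-in for the convolutional branches and is entirely illustrative:

```python
import numpy as np

def encode(patch, w):
    """Shared 'encoder': both patches of a pair pass through the
    SAME weights, which is the defining property of a Siamese
    network. (A stand-in for the SCNN's convolutional branches.)"""
    return np.tanh(patch.ravel() @ w)

def pair_similarity(p1, p2, w):
    f1, f2 = encode(p1, w), encode(p2, w)
    d = np.linalg.norm(f1 - f2)          # distance in feature space
    return np.exp(-d)                    # map to a (0, 1] similarity

rng = np.random.default_rng(2)
w = rng.normal(size=(64, 8))             # shared weights: 8x8 patch -> 8-d
patch = rng.normal(size=(8, 8))
same = pair_similarity(patch, patch + 0.01 * rng.normal(size=(8, 8)), w)
diff = pair_similarity(patch, rng.normal(size=(8, 8)), w)
print(same > diff)  # a correct match scores higher than a false one
```

Training the real SCNN then amounts to adjusting the shared weights so that true pairs from the SIFT-labeled dataset score high and false pairs score low.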

Understanding the protox inhibition activity of novel 1-(5-methyl-3-phenylisoxazolin-5-yl)methoxy-2-chloro-4-fluorobenzene derivatives using comparative molecular field analysis (CoMFA) methodology (비교 분자장 분석 (CoMFA) 방법에 따른 1-(5-methyl-3-phenylisoxazolin-5-yl)methoxy-2-chloro-4-fluoro-benzene 유도체들의 Protox 저해 활성에 관한 이해)

  • Sung, Nack-Do;Song, Jong-Hwan;Yang, Sook-Young;Park, Kyeng-Yong
    • The Korean Journal of Pesticide Science
    • /
    • v.8 no.3
    • /
    • pp.151-161
    • /
    • 2004
  • Three-dimensional quantitative structure-activity relationship (3D-QSAR) studies of the protox inhibition activities against the root and shoot of rice plant (Oryza sativa L.) and barnyardgrass (Echinochloa crus-galli) by a series of new A = 3,4,5,6-tetrahydrophthalimino, B = 3-chloro-4,5,6,7-tetrahydro-2H-indazolyl and C = 3,4-dimethylmaleimino groups, with R-group substitution on the phenyl ring of 1-(5-methyl-3-phenylisoxazolin-5-yl)methoxy-2-chloro-4-fluorobenzene derivatives, were performed using comparative molecular field analysis (CoMFA) methodology with Gasteiger-Huckel charges. Four CoMFA models for the protox inhibition activities against root and shoot of the two plants were generated using 46 molecules as the training set, and the predictive ability of each model was evaluated against a test set of 8 molecules. The statistical results of the models combining standard, indicator and H-bond fields (SIH) showed the best predictability of the protox inhibition activities, based on the cross-validated coefficient $(q^2=0.635\sim0.924)$, the conventional coefficient $(r^2_{ncv.}=0.928\sim0.977)$ and the PRESS value $(0.091\sim0.156)$, respectively. The activities exhibited a strong correlation with the steric $(74.3\sim87.4%)$, electrostatic $(10.10\sim18.5%)$ and hydrophobic $(1.10\sim8.30%)$ factors of the molecules; the steric features of the molecules may be the most important factor for the activities. We found that novel inhibitors, selective between the two plants and with higher protox inhibition, may be designed by modifying the X-substituents for barnyardgrass based on the results of the CoMFA analyses.
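The cross-validated criterion the CoMFA models are ranked by, q² = 1 - PRESS/SS, can be computed generically as below. This is a leave-one-out sketch on toy linear data, not the actual CoMFA field descriptors:

```python
import numpy as np

def q2_loo(X, y, fit, predict):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS:
    PRESS sums squared prediction errors on each held-out sample,
    SS is the total sum of squares about the mean. q^2 > 0.5 is
    the usual threshold for a predictive 3D-QSAR model."""
    press = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        model = fit(X[mask], y[mask])
        press += (y[i] - predict(model, X[i:i + 1])[0]) ** 2
    ss = ((y - y.mean()) ** 2).sum()
    return 1.0 - press / ss

# Toy linear surrogate for the field/activity relationship.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + 0.05 * rng.normal(size=30)
fit = lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]
predict = lambda coef, A: A @ coef
q2 = q2_loo(X, y, fit, predict)
print(q2)  # near 1 for this nearly noiseless toy data
```

The real models use PLS regression on the CoMFA fields rather than least squares, but the q² bookkeeping is the same.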

Analysis of Reform Model to Records Management System in Public Institution -from Reform to Records Management System in 2006- (행정기관의 기록관리시스템 개선모델 분석 -2006년 기록관리시스템 혁신을 중심으로-)

  • Kwag, Jeong
    • The Korean Journal of Archival Studies
    • /
    • no.14
    • /
    • pp.153-190
    • /
    • 2006
  • Externally, the business environment in public institutions has been changing as the government business reference model (BRM) appeared and business management systems were introduced for transparency of the policy decision process. After the Records Automation System started its operation, dissatisfaction grew because of inadequate system functions and problems with the authenticity of electronic records. Against this background, the National Archives and Records Service carried out 'Information Strategy Planning for Reform of the Records Management System' for five months from September 2005. As a result, this project reengineered current records management processes and presented a world-class system model. After the Records and Archives Management Act was enacted, records management in public institutions followed the concept that paper records are handled by means of electronic data management. The reformed model, however, concentrates on electronic records, which have gradually replaced paper records, and investigates a management methodology that considers the attributes of electronic records. Under this new paradigm, electronic records management raises new issues for the records management field. This paper analyzes the major contents of the models connected with electronic records management, closely reviews their significance and limits, and aims to understand the future direction of the management system. Before analyzing the reformed models, issues in the new business environments and their records management were reviewed. The government's BRM and business management system prepared the general basis on which the government's entire output can be managed online and classified by function; in this respect the model is innovative.
However, from the records management perspective, problems were identified such as the division of the records classification scheme, the definitions and capturing methods of records management objects, and the limitations of the Records Automation System. To solve these problems, a reformed model was proposed with a records classification system based on the business classification, an extended electronic records filing system, and added functions for strengthening electronic records management. By dramatically improving the role of the records center in public institutions, establishing a basic management methodology for records management objects from various agencies, and introducing a detailed design to preserve documents' authenticity, this model forms the basis of the electronic records management system. In spite of these innovations, however, the proposed system for a true electronic records management era is still in its beginning. In the near future, when studies concentrate on refining the classifications, on plans for capturing records from structures such as administrative information systems, and on further development of preservation technology, the prospects for the electronic records management system will be very bright.

Comparative Study on the Methodology of Motor Vehicle Emission Calculation by Using Real-Time Traffic Volume in the Kangnam-Gu (자동차 대기오염물질 산정 방법론 설정에 관한 비교 연구 (강남구의 실시간 교통량 자료를 이용하여))

  • 박성규;김신도;이영인
    • Journal of Korean Society of Transportation
    • /
    • v.19 no.4
    • /
    • pp.35-47
    • /
    • 2001
  • Traffic represents one of the largest sources of primary air pollutants in urban areas. As a consequence, numerous abatement strategies are being pursued to decrease the ambient concentration of pollutants. A characteristic of most of these strategies is a requirement for accurate data on both the quantity and spatial distribution of emissions to air, in the form of an atmospheric emission inventory database. In the case of traffic pollution, such an inventory must be compiled using activity statistics and emission factors for vehicle types. The majority of inventories are compiled using passive data from either surveys or transportation models, and by their very nature tend to be out of date by the time they are compiled. Current trends are towards integrating urban traffic control systems with assessments of the environmental effects of motor vehicles. In this study, a methodology for calculating motor vehicle emissions from real-time traffic data was examined and applied to estimate CO emissions in a test area in Seoul. Traffic data, required on a street-by-street basis, were obtained from the induction loops of the traffic control system. The speed-related mass of CO emitted from vehicle tailpipes was calculated from the traffic-system data, considering parameters such as volume, composition, average velocity, and link length. The result was compared with that of an emission calculation method based on VKT (Vehicle Kilometers Travelled) by vehicle category.
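In rough outline, a link-level calculation of the kind described (flow, link length, and a speed-dependent emission factor) might look like the following sketch. The emission factor function and the link values are invented for illustration, not a measured factor set:

```python
import numpy as np

def link_co_emission(volume, link_km, mean_speed_kmh, ef):
    """CO mass for one road link: vehicles/h x link length (km) x a
    speed-dependent emission factor (g/km), giving g/h."""
    return volume * link_km * ef(mean_speed_kmh)

# Hypothetical speed-dependent emission factor: slow, congested
# traffic emits more CO per kilometre than free-flowing traffic.
ef = lambda v: 60.0 / max(v, 5.0) + 2.0          # g/km, illustrative only
links = [  # (veh/h, km, km/h) as read from traffic control system loops
    (1200, 0.8, 25.0),
    (800, 1.2, 45.0),
]
total = sum(link_co_emission(*lk, ef) for lk in links)
print(total)  # total g CO per hour over the two links
```

Summing such link totals over the network, hour by hour, is what turns the real-time loop data into a spatially resolved emission inventory, as opposed to the aggregate VKT-by-category approach it is compared against.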


Impedance Spectroscopy Models for X5R Multilayer Ceramic Capacitors

  • Lee, Jong-Sook;Shin, Eui-Chol;Shin, Dong-Kyu;Kim, Yong;Ahn, Pyung-An;Seo, Hyun-Ho;Jo, Jung-Mo;Kim, Jee-Hoon;Kim, Gye-Rok;Kim, Young-Hun;Park, Ji-Young;Kim, Chang-Hoon;Hong, Jeong-Oh;Hur, Kang-Heon
    • Journal of the Korean Ceramic Society
    • /
    • v.49 no.5
    • /
    • pp.475-483
    • /
    • 2012
  • High-capacitance X5R MLCCs based on $BaTiO_3$ ceramic dielectric layers exhibit a single broad, asymmetric arc-shaped impedance and modulus response over the wide frequency range from 1 MHz down to 0.01 Hz. Analysis according to the conventional brick-layer model for polycrystalline conductors, employing a series connection of multiple RC parallel circuits, leads to parameters with large errors and little physical significance. A new parametric impedance model is shown to describe the experimental spectra satisfactorily: a parallel network of one resistor R representing the DC conductivity, thermally activated by 1.32 eV; one ideal capacitor C exactly representing the bulk capacitance; and a constant phase element (CPE) Q with complex capacitance $A(i{\omega})^{{\alpha}-1}$, with ${\alpha}$ close to 2/3 and A thermally activated by 0.45 eV, i.e. about one third of the activation energy of the DC conductivity. These features strongly indicate the CK1 model of J. R. Macdonald, in which the CPE with 2/3 power-law exponent represents polarization effects originating from mobile charge carriers. The CPE term is suggested to be directly related to the trapping of electronic charge carriers and indirectly related to the ionic defects responsible for insulation resistance degradation.
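The proposed parallel R || C || CPE network has a closed-form impedance: a complex capacitance $A(i\omega)^{\alpha-1}$ corresponds to a CPE admittance $i\omega\cdot A(i\omega)^{\alpha-1} = A(i\omega)^{\alpha}$, so $Z(\omega) = [1/R + i\omega C + A(i\omega)^{\alpha}]^{-1}$. A short numerical sketch (parameter values illustrative, not the fitted ones):

```python
import numpy as np

def z_parallel_r_c_cpe(freq_hz, R, C, A, alpha):
    """Impedance of the parallel R || C || CPE network: admittances
    of parallel elements add, and the CPE's complex capacitance
    A(iw)^(alpha-1) contributes an admittance A(iw)^alpha."""
    w = 2 * np.pi * np.asarray(freq_hz, dtype=float)
    Y = 1 / R + 1j * w * C + A * (1j * w) ** alpha
    return 1 / Y

# Illustrative (not fitted) parameters; alpha close to 2/3 as reported.
f = np.logspace(-2, 6, 9)                 # 0.01 Hz .. 1 MHz
Z = z_parallel_r_c_cpe(f, R=1e9, C=1e-6, A=1e-7, alpha=2/3)
print(np.abs(Z))  # |Z| falls monotonically with frequency
```

Since all three admittance terms have non-negative real and imaginary parts that grow with frequency (for 0 < alpha < 1), |Z| is bounded above by R and decreases monotonically, consistent with a single broad arc rather than multiple resolved RC arcs.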

Estimation and Weighting of Sub-band Reliability for Multi-band Speech Recognition (다중대역 음성인식을 위한 부대역 신뢰도의 추정 및 가중)

  • 조훈영;지상문;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.6
    • /
    • pp.552-558
    • /
    • 2002
  • Recently, based on Fletcher's model of human speech recognition (HSR), multi-band speech recognition has been intensively studied by many researchers. As a new automatic speech recognition (ASR) technique, multi-band speech recognition splits the frequency domain into several sub-bands and recognizes each sub-band independently. The likelihood scores of the sub-bands are weighted according to their reliabilities and recombined to make a final decision. This approach is known to be robust under noisy environments. When the noise is stationary, a sub-band SNR can be estimated using the noise information in non-speech intervals. However, if the noise is non-stationary, it is not feasible to obtain the sub-band SNR. This paper proposes inverse sub-band distance (ISD) weighting, in which the distance of each sub-band is calculated by stochastic matching of input feature vectors and hidden Markov models, and the inverse distance is used as the sub-band weight. Experiments on 1500~1800 Hz band-limited white noise and classical guitar sound revealed that the proposed method represents sub-band reliability effectively and improves performance under both stationary and non-stationary band-limited noise environments.
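The recombination step described here can be sketched simply: normalize the reciprocals of the per-band match distances and use them to weight the sub-band log-likelihood scores. The numbers below are illustrative, not the paper's experimental values:

```python
import numpy as np

def isd_weights(distances):
    """Inverse sub-band distance (ISD) weighting: each sub-band's
    weight is the normalized reciprocal of its acoustic-match
    distance, so poorly matching (noisy) bands are discounted."""
    inv = 1.0 / np.asarray(distances, dtype=float)
    return inv / inv.sum()

def combine_scores(log_likelihoods, weights):
    # Weighted recombination of the per-sub-band likelihood scores.
    return float(np.dot(weights, log_likelihoods))

d = np.array([0.5, 0.5, 4.0, 0.5])   # band 3 corrupted by band-limited noise
w = isd_weights(d)
print(w)                              # the corrupted band gets the smallest weight
score = combine_scores(np.array([-10.0, -11.0, -40.0, -9.0]), w)
```

Because the corrupted band's weight shrinks, its badly degraded likelihood contributes little to the final decision, which is what makes the scheme usable when the noise is non-stationary and no sub-band SNR is available.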