• Title/Summary/Keyword: 데이터 군집화

Search Result 560, Processing Time 0.039 seconds

A Hybrid Forecasting Framework based on Case-based Reasoning and Artificial Neural Network (사례기반 추론기법과 인공신경망을 이용한 서비스 수요예측 프레임워크)

  • Hwang, Yousub
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.43-57
    • /
    • 2012
  • To enhance the competitive advantage in a constantly changing business environment, an enterprise management must make the right decision in many business activities based on both internal and external information. Thus, providing accurate information plays a prominent role in management's decision making. Intuitively, historical data can provide a feasible estimate through the forecasting models. Therefore, if the service department can estimate the service quantity for the next period, the service department can then effectively control the inventory of service related resources such as human, parts, and other facilities. In addition, the production department can make load map for improving its product quality. Therefore, obtaining an accurate service forecast most likely appears to be critical to manufacturing companies. Numerous investigations addressing this problem have generally employed statistical methods, such as regression or autoregressive and moving average simulation. However, these methods are only efficient for data with are seasonal or cyclical. If the data are influenced by the special characteristics of product, they are not feasible. In our research, we propose a forecasting framework that predicts service demand of manufacturing organization by combining Case-based reasoning (CBR) and leveraging an unsupervised artificial neural network based clustering analysis (i.e., Self-Organizing Maps; SOM). We believe that this is one of the first attempts at applying unsupervised artificial neural network-based machine-learning techniques in the service forecasting domain. Our proposed approach has several appealing features : (1) We applied CBR and SOM in a new forecasting domain such as service demand forecasting. (2) We proposed our combined approach between CBR and SOM in order to overcome limitations of traditional statistical forecasting methods and We have developed a service forecasting tool based on the proposed approach using an unsupervised artificial neural network and Case-based reasoning. In this research, we conducted an empirical study on a real digital TV manufacturer (i.e., Company A). In addition, we have empirically evaluated the proposed approach and tool using real sales and service related data from digital TV manufacturer. In our empirical experiments, we intend to explore the performance of our proposed service forecasting framework when compared to the performances predicted by other two service forecasting methods; one is traditional CBR based forecasting model and the other is the existing service forecasting model used by Company A. We ran each service forecasting 144 times; each time, input data were randomly sampled for each service forecasting framework. To evaluate accuracy of forecasting results, we used Mean Absolute Percentage Error (MAPE) as primary performance measure in our experiments. We conducted one-way ANOVA test with the 144 measurements of MAPE for three different service forecasting approaches. For example, the F-ratio of MAPE for three different service forecasting approaches is 67.25 and the p-value is 0.000. This means that the difference between the MAPE of the three different service forecasting approaches is significant at the level of 0.000. Since there is a significant difference among the different service forecasting approaches, we conducted Tukey's HSD post hoc test to determine exactly which means of MAPE are significantly different from which other ones. In terms of MAPE, Tukey's HSD post hoc test grouped the three different service forecasting approaches into three different subsets in the following order: our proposed approach > traditional CBR-based service forecasting approach > the existing forecasting approach used by Company A. Consequently, our empirical experiments show that our proposed approach outperformed the traditional CBR based forecasting model and the existing service forecasting model used by Company A. The rest of this paper is organized as follows. Section 2 provides some research background information such as summary of CBR and SOM. Section 3 presents a hybrid service forecasting framework based on Case-based Reasoning and Self-Organizing Maps, while the empirical evaluation results are summarized in Section 4. Conclusion and future research directions are finally discussed in Section 5.

A Two-Stage Learning Method of CNN and K-means RGB Cluster for Sentiment Classification of Images (이미지 감성분류를 위한 CNN과 K-means RGB Cluster 이-단계 학습 방안)

  • Kim, Jeongtae;Park, Eunbi;Han, Kiwoong;Lee, Junghyun;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.139-156
    • /
    • 2021
  • The biggest reason for using a deep learning model in image classification is that it is possible to consider the relationship between each region by extracting each region's features from the overall information of the image. However, the CNN model may not be suitable for emotional image data without the image's regional features. To solve the difficulty of classifying emotion images, many researchers each year propose a CNN-based architecture suitable for emotion images. Studies on the relationship between color and human emotion were also conducted, and results were derived that different emotions are induced according to color. In studies using deep learning, there have been studies that apply color information to image subtraction classification. The case where the image's color information is additionally used than the case where the classification model is trained with only the image improves the accuracy of classifying image emotions. This study proposes two ways to increase the accuracy by incorporating the result value after the model classifies an image's emotion. Both methods improve accuracy by modifying the result value based on statistics using the color of the picture. When performing the test by finding the two-color combinations most distributed for all training data, the two-color combinations most distributed for each test data image were found. The result values were corrected according to the color combination distribution. This method weights the result value obtained after the model classifies an image's emotion by creating an expression based on the log function and the exponential function. Emotion6, classified into six emotions, and Artphoto classified into eight categories were used for the image data. Densenet169, Mnasnet, Resnet101, Resnet152, and Vgg19 architectures were used for the CNN model, and the performance evaluation was compared before and after applying the two-stage learning to the CNN model. Inspired by color psychology, which deals with the relationship between colors and emotions, when creating a model that classifies an image's sentiment, we studied how to improve accuracy by modifying the result values based on color. Sixteen colors were used: red, orange, yellow, green, blue, indigo, purple, turquoise, pink, magenta, brown, gray, silver, gold, white, and black. It has meaning. Using Scikit-learn's Clustering, the seven colors that are primarily distributed in the image are checked. Then, the RGB coordinate values of the colors from the image are compared with the RGB coordinate values of the 16 colors presented in the above data. That is, it was converted to the closest color. Suppose three or more color combinations are selected. In that case, too many color combinations occur, resulting in a problem in which the distribution is scattered, so a situation fewer influences the result value. Therefore, to solve this problem, two-color combinations were found and weighted to the model. Before training, the most distributed color combinations were found for all training data images. The distribution of color combinations for each class was stored in a Python dictionary format to be used during testing. During the test, the two-color combinations that are most distributed for each test data image are found. After that, we checked how the color combinations were distributed in the training data and corrected the result. We devised several equations to weight the result value from the model based on the extracted color as described above. The data set was randomly divided by 80:20, and the model was verified using 20% of the data as a test set. After splitting the remaining 80% of the data into five divisions to perform 5-fold cross-validation, the model was trained five times using different verification datasets. Finally, the performance was checked using the test dataset that was previously separated. Adam was used as the activation function, and the learning rate was set to 0.01. The training was performed as much as 20 epochs, and if the validation loss value did not decrease during five epochs of learning, the experiment was stopped. Early tapping was set to load the model with the best validation loss value. The classification accuracy was better when the extracted information using color properties was used together than the case using only the CNN architecture.

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.143-163
    • /
    • 2016
  • The demographics of Internet users are the most basic and important sources for target marketing or personalized advertisements on the digital marketing channels which include email, mobile, and social media. However, it gradually has become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although the marketing department is able to get the demographics using online or offline surveys, these approaches are very expensive, long processes, and likely to include false statements. Clickstream data is the recording an Internet user leaves behind while visiting websites. As the user clicks anywhere in the webpage, the activity is logged in semi-structured website log files. Such data allows us to see what pages users visited, how long they stayed there, how often they visited, when they usually visited, which site they prefer, what keywords they used to find the site, whether they purchased any, and so forth. For such a reason, some researchers tried to guess the demographics of Internet users by using their clickstream data. They derived various independent variables likely to be correlated to the demographics. The variables include search keyword, frequency and intensity for time, day and month, variety of websites visited, text information for web pages visited, etc. The demographic attributes to predict are also diverse according to the paper, and cover gender, age, job, location, income, education, marital status, presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, were used for prediction model building. However, this research has not yet identified which data mining method is appropriate to predict each demographic variable. Moreover, it is required to review independent variables studied so far and combine them as needed, and evaluate them for building the best prediction model. The objective of this study is to choose clickstream attributes mostly likely to be correlated to the demographics from the results of previous research, and then to identify which data mining method is fitting to predict each demographic attribute. Among the demographic attributes, this paper focus on predicting gender, age, marital status, residence, and job. And from the results of previous research, 64 clickstream attributes are applied to predict the demographic attributes. The overall process of predictive model building is compose of 4 steps. In the first step, we create user profiles which include 64 clickstream attributes and 5 demographic attributes. The second step performs the dimension reduction of clickstream variables to solve the curse of dimensionality and overfitting problem. We utilize three approaches which are based on decision tree, PCA, and cluster analysis. We build alternative predictive models for each demographic variable in the third step. SVM, neural network, and logistic regression are used for modeling. The last step evaluates the alternative models in view of model accuracy and selects the best model. For the experiments, we used clickstream data which represents 5 demographics and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for our prediction process, and the 5-fold cross validation was conducted to enhance the reliability of our experiments. As the experimental results, we can verify that there are a specific data mining method well-suited for each demographic variable. For example, age prediction is best performed when using the decision tree based dimension reduction and neural network whereas the prediction of gender and marital status is the most accurate by applying SVM without dimension reduction. We conclude that the online behaviors of the Internet users, captured from the clickstream data analysis, could be well used to predict their demographics, thereby being utilized to the digital marketing.

A Study on the Exploratory Spatial Data Analysis of the Distribution of Longevity Population and the Scale Effect of the Modifiable Areal Unit Problem(MAUP) (장수 인구의 분포 패턴에 관한 탐색적 공간 데이터 분석과 수정 가능한 공간단위 문제(MAUP)의 Scale Effect에 관한 연구)

  • Choi, Don-Jeong;Suh, Yong-Cheol
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.16 no.3
    • /
    • pp.40-53
    • /
    • 2013
  • Most of the existing domestic studies to identify the distribution of longevity population and influencing factors oriented confirmatory approach. Furthermore, most of the studies in this research topic simply have used their own definition of spatial unit of analysis or employed arbitrary spatial units of analysis according to data availability. These research approaches can not sufficiently reflect the spatial characteristic of longevity phenomenon and exposed to the Modifiable Aerial Unit Problem(MAUP). This research performed the Exploratory Spatial Data Analysis(ESDA) to identify the spatial autocorrelation of the distribution of longevity population and investigated whether the modifiable areal unit problem in the aspect of scale effect using spatial population data in Korea. We used Si_Gun_Gu and Eup_Myeon_Dong as two different spatial units of regional longevity indicators measured. Then, we applied Getis-Ord Gi* to investigate the existence of spatial hot spots and cold spots. The results from our analysis show that there exist statistically significant spatial autocorrelation and spatial hot spots and cold spots of regional longevity at both Si_Gun_Gu and Eup_Myeon_Dong levels. This result implies that the modifiable areal unit problem does exist in the studies of spatial patterns of longevity population distribution. The demand for longevity researches would be increased inevitably. In addition, there were apparent differences for the global spatial autocorrelation and local spatial cluster which calculated different spatial units such as Si_Gun_Gu and Eup_Myeon_Dong and this can be seen as scale effect of MAUP. The findings from our analysis show that any study in this topic can mislead results when the modifiable areal unit problem and spatial autocorrelation are not explicitly considered.

Hierarchical Overlapping Clustering to Detect Complex Concepts (중복을 허용한 계층적 클러스터링에 의한 복합 개념 탐지 방법)

  • Hong, Su-Jeong;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.111-125
    • /
    • 2011
  • Clustering is a process of grouping similar or relevant documents into a cluster and assigning a meaningful concept to the cluster. By this process, clustering facilitates fast and correct search for the relevant documents by narrowing down the range of searching only to the collection of documents belonging to related clusters. For effective clustering, techniques are required for identifying similar documents and grouping them into a cluster, and discovering a concept that is most relevant to the cluster. One of the problems often appearing in this context is the detection of a complex concept that overlaps with several simple concepts at the same hierarchical level. Previous clustering methods were unable to identify and represent a complex concept that belongs to several different clusters at the same level in the concept hierarchy, and also could not validate the semantic hierarchical relationship between a complex concept and each of simple concepts. In order to solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm that modified the traditional Agglomerative Hierarchical Clustering algorithm to allow overlapped clusters at the same level in the concept hierarchy. The HOC algorithm represents the clustering result not by a tree but by a lattice to detect complex concepts. We developed a system that employs the HOC algorithm to carry out the goal of complex concept detection. This system operates in three phases; 1) the preprocessing of documents, 2) the clustering using the HOC algorithm, and 3) the validation of semantic hierarchical relationships among the concepts in the lattice obtained as a result of clustering. The preprocessing phase represents the documents as x-y coordinate values in a 2-dimensional space by considering the weights of terms appearing in the documents. First, it goes through some refinement process by applying stopwords removal and stemming to extract index terms. Then, each index term is assigned a TF-IDF weight value and the x-y coordinate value for each document is determined by combining the TF-IDF values of the terms in it. The clustering phase uses the HOC algorithm in which the similarity between the documents is calculated by applying the Euclidean distance method. Initially, a cluster is generated for each document by grouping those documents that are closest to it. Then, the distance between any two clusters is measured, grouping the closest clusters as a new cluster. This process is repeated until the root cluster is generated. In the validation phase, the feature selection method is applied to validate the appropriateness of the cluster concepts built by the HOC algorithm to see if they have meaningful hierarchical relationships. Feature selection is a method of extracting key features from a document by identifying and assigning weight values to important and representative terms in the document. In order to correctly select key features, a method is needed to determine how each term contributes to the class of the document. Among several methods achieving this goal, this paper adopted the $x^2$�� statistics, which measures the dependency degree of a term t to a class c, and represents the relationship between t and c by a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluation is carried out by using a well-known Reuter-21578 news collection. The result of performance evaluation showed that the HOC algorithm greatly contributes to detecting and producing complex concepts by generating the concept hierarchy in a lattice structure.

Analyzing Different Contexts for Energy Terms through Text Mining of Online Science News Articles (온라인 과학 기사 텍스트 마이닝을 통해 분석한 에너지 용어 사용의 맥락)

  • Oh, Chi Yeong;Kang, Nam-Hwa
    • Journal of Science Education
    • /
    • v.45 no.3
    • /
    • pp.292-303
    • /
    • 2021
  • This study identifies the terms frequently used together with energy in online science news articles and topics of the news reports to find out how the term energy is used in everyday life and to draw implications for science curriculum and instruction about energy. A total of 2,171 online news articles in science category published by 11 major newspaper companies in Korea for one year from March 1, 2018 were selected by using energy as a search term. As a result of natural language processing, a total of 51,224 sentences consisting of 507,901 words were compiled for analysis. Using the R program, term frequency analysis, semantic network analysis, and structural topic modeling were performed. The results show that the terms with exceptionally high frequencies were technology, research, and development, which reflected the characteristics of news articles that report new findings. On the other hand, terms used more than once per two articles were industry-related terms (industry, product, system, production, market) and terms that were sufficiently expected as energy-related terms such as 'electricity' and 'environment.' Meanwhile, 'sun', 'heat', 'temperature', and 'power generation', which are frequently used in energy-related science classes, also appeared as terms belonging to the highest frequency. From a network analysis, two clusters were found including terms related to industry and technology and terms related to basic science and research. From the analysis of terms paired with energy, it was also found that terms related to the use of energy such as 'energy efficiency,' 'energy saving,' and 'energy consumption' were the most frequently used. Out of 16 topics found, four contexts of energy were drawn including 'high-tech industry,' 'industry,' 'basic science,' and 'environment and health.' The results suggest that the introduction of the concept of energy degradation as a starting point for energy classes can be effective. It also shows the need to introduce high-tech industries or the context of environment and health into energy learning.

Quantitative Assessment Technology of Small Animal Myocardial Infarction PET Image Using Gaussian Mixture Model (다중가우시안혼합모델을 이용한 소동물 심근경색 PET 영상의 정량적 평가 기술)

  • Woo, Sang-Keun;Lee, Yong-Jin;Lee, Won-Ho;Kim, Min-Hwan;Park, Ji-Ae;Kim, Jin-Su;Kim, Jong-Guk;Kang, Joo-Hyun;Ji, Young-Hoon;Choi, Chang-Woon;Lim, Sang-Moo;Kim, Kyeong-Min
    • Progress in Medical Physics
    • /
    • v.22 no.1
    • /
    • pp.42-51
    • /
    • 2011
  • Nuclear medicine images (SPECT, PET) were widely used tool for assessment of myocardial viability and perfusion. However it had difficult to define accurate myocardial infarct region. The purpose of this study was to investigate methodological approach for automatic measurement of rat myocardial infarct size using polar map with adaptive threshold. Rat myocardial infarction model was induced by ligation of the left circumflex artery. PET images were obtained after intravenous injection of 37 MBq $^{18}F$-FDG. After 60 min uptake, each animal was scanned for 20 min with ECG gating. PET data were reconstructed using ordered subset expectation maximization (OSEM) 2D. To automatically make the myocardial contour and generate polar map, we used QGS software (Cedars-Sinai Medical Center). The reference infarct size was defined by infarction area percentage of the total left myocardium using TTC staining. We used three threshold methods (predefined threshold, Otsu and Multi Gaussian mixture model; MGMM). Predefined threshold method was commonly used in other studies. We applied threshold value form 10% to 90% in step of 10%. Otsu algorithm calculated threshold with the maximum between class variance. MGMM method estimated the distribution of image intensity using multiple Gaussian mixture models (MGMM2, ${\cdots}$ MGMM5) and calculated adaptive threshold. The infarct size in polar map was calculated as the percentage of lower threshold area in polar map from the total polar map area. The measured infarct size using different threshold methods was evaluated by comparison with reference infarct size. The mean difference between with polar map defect size by predefined thresholds (20%, 30%, and 40%) and reference infarct size were $7.04{\pm}3.44%$, $3.87{\pm}2.09%$ and $2.15{\pm}2.07%$, respectively. Otsu verse reference infarct size was $3.56{\pm}4.16%$. MGMM methods verse reference infarct size was $2.29{\pm}1.94%$. The predefined threshold (30%) showed the smallest mean difference with reference infarct size. However, MGMM was more accurate than predefined threshold in under 10% reference infarct size case (MGMM: 0.006%, predefined threshold: 0.59%). In this study, we was to evaluate myocardial infarct size in polar map using multiple Gaussian mixture model. MGMM method was provide adaptive threshold in each subject and will be a useful for automatic measurement of infarct size.

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (Coronavirus-2, a fatal respiratory syndrome) have been published. The rapid increase in the number of papers related to COVID-19 is putting time and technical constraints on healthcare professionals and policy makers to quickly find important research. Therefore, in this study, we propose a method of extracting useful information from text data of extensive literature using LDA and Word2vec algorithm. Papers related to keywords to be searched were extracted from papers related to COVID-19, and detailed topics were identified. The data used the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic, updated weekly. The research methods are divided into two main categories. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. For this purpose, the number of publications related to COVID-19 by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals under active research were identified. LDA and Word2vec algorithm were used to derive research topics related to COVID-19, and after analyzing related words, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, and a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment' were extracted. did For each collected paper, detailed topics were analyzed using LDA and Word2vec algorithms, and a clustering method through PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy point from the results of this study is that the topics that were not derived from the topics derived for all papers being researched in relation to COVID-19 (

    ) were the topic modeling results for each research topic (
    ) was found to be derived from For example, as a result of topic modeling for papers related to 'vaccine', a new topic titled Topic 05 'neutralizing antibodies' was extracted. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and is said to play an important role in the production of therapeutic agents and vaccine development. In addition, as a result of extracting topics from papers related to 'treatment', a new topic called Topic 05 'cytokine' was discovered. A cytokine storm is when the immune cells of our body do not defend against attacks, but attack normal cells. Hidden topics that could not be found for the entire thesis were classified according to keywords, and topic modeling was performed to find detailed topics. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using the Skip-gram method that predicts the similar words as the central word among the Word2vec models. The combination of the LDA model and the Word2vec model tried to show better performance by identifying the relationship between the document and the LDA subject and the relationship between the Word2vec document. In addition, as a clustering method through PCA dimension reduction, a method for intuitively classifying documents by using the t-SNE technique to classify documents with similar themes and forming groups into a structured organization of documents was presented. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of academic papers related to COVID-19, it will reduce the precious time and effort of healthcare professionals and policy makers, and rapidly gain new insights. We hope to help you get It is also expected to be used as basic data for researchers to explore new research directions.

  • Comparison of quality characteristics between seasonal cultivar of salted-Kimchi cabbage (Brassica rapa L. ssp. Pekinesis) (계절별 절임배추의 품질 특성 비교)

    • Ku, Kyung Hyung;Choi, Eun Jeong;Jeong, Moon Cheol
      • Food Science and Preservation
      • /
      • v.21 no.4
      • /
      • pp.512-519
      • /
      • 2014
    • This study was carried to investigate the physicochemical and microbiological characteristics of seasonal salted-Kimchi cabbage order to provide basic data for optimal salting and storage condition of seasonal Kimchi cabbage. Generally, fall season samples had slightly higher pH and acidity value than the other seasonal salted Kimchi cabbage. The soluble solids content of spring, summer, fall and winter samples were 5.95%, 6.18%, 6.29% and 7.76%, respectively. The salt content of all the seasonal salted Kimchi cabbage samples were insignificant. The number of microbial bacteria in the summer samples were generally much more significant than spring and winter samples. There was no significant difference in the color of seasonal salted Kimchi cabbage. As for the texture properties, the firmest samples in the surface rupture test were the spring samples (force: 4.92 kg), and the hardest samples in the puncture test were the summer samples (force: 11.71 kg). In the correlation analysis of the quality characteristics of seasonal samples, the soluble solids content and hardness of the seasonal salted Kimchi cabbage was significantly correlated at 1% significance level. Also, in the principal component analysis, F1 and F2 were shown to explain 27.28% and 35.59% of the total variance (62.87%), respectively. The hierarchical cluster analysis of the quality characteristics of seasonal samples, the samples were divided into three groups: spring cabbage group, summer cabbage group and fall and winter cabbage group.

    The Characteristics and Performances of Manufacturing SMEs that Utilize Public Information Support Infrastructure (공공 정보지원 인프라 활용한 제조 중소기업의 특징과 성과에 관한 연구)

    • Kim, Keun-Hwan;Kwon, Taehoon;Jun, Seung-pyo
      • Journal of Intelligence and Information Systems
      • /
      • v.25 no.4
      • /
      • pp.1-33
      • /
      • 2019
    • The small and medium sized enterprises (hereinafter SMEs) are already at a competitive disadvantaged when compared to large companies with more abundant resources. Manufacturing SMEs not only need a lot of information needed for new product development for sustainable growth and survival, but also seek networking to overcome the limitations of resources, but they are faced with limitations due to their size limitations. In a new era in which connectivity increases the complexity and uncertainty of the business environment, SMEs are increasingly urged to find information and solve networking problems. In order to solve these problems, the government funded research institutes plays an important role and duty to solve the information asymmetry problem of SMEs. The purpose of this study is to identify the differentiating characteristics of SMEs that utilize the public information support infrastructure provided by SMEs to enhance the innovation capacity of SMEs, and how they contribute to corporate performance. We argue that we need an infrastructure for providing information support to SMEs as part of this effort to strengthen of the role of government funded institutions; in this study, we specifically identify the target of such a policy and furthermore empirically demonstrate the effects of such policy-based efforts. Our goal is to help establish the strategies for building the information supporting infrastructure. To achieve this purpose, we first classified the characteristics of SMEs that have been found to utilize the information supporting infrastructure provided by government funded institutions. This allows us to verify whether selection bias appears in the analyzed group, which helps us clarify the interpretative limits of our study results. Next, we performed mediator and moderator effect analysis for multiple variables to analyze the process through which the use of information supporting infrastructure led to an improvement in external networking capabilities and resulted in enhancing product competitiveness. This analysis helps identify the key factors we should focus on when offering indirect support to SMEs through the information supporting infrastructure, which in turn helps us more efficiently manage research related to SME supporting policies implemented by government funded institutions. The results of this study showed the following. First, SMEs that used the information supporting infrastructure were found to have a significant difference in size in comparison to domestic R&D SMEs, but on the other hand, there was no significant difference in the cluster analysis that considered various variables. Based on these findings, we confirmed that SMEs that use the information supporting infrastructure are superior in size, and had a relatively higher distribution of companies that transact to a greater degree with large companies, when compared to the SMEs composing the general group of SMEs. Also, we found that companies that already receive support from the information infrastructure have a high concentration of companies that need collaboration with government funded institution. Secondly, among the SMEs that use the information supporting infrastructure, we found that increasing external networking capabilities contributed to enhancing product competitiveness, and while this was no the effect of direct assistance, we also found that indirect contributions were made by increasing the open marketing capabilities: in other words, this was the result of an indirect-only mediator effect. Also, the number of times the company received additional support in this process through mentoring related to information utilization was found to have a mediated moderator effect on improving external networking capabilities and in turn strengthening product competitiveness. The results of this study provide several insights that will help establish policies. KISTI's information support infrastructure may lead to the conclusion that marketing is already well underway, but it intentionally supports groups that enable to achieve good performance. As a result, the government should provide clear priorities whether to support the companies in the underdevelopment or to aid better performance. Through our research, we have identified how public information infrastructure contributes to product competitiveness. Here, we can draw some policy implications. First, the public information support infrastructure should have the capability to enhance the ability to interact with or to find the expert that provides required information. Second, if the utilization of public information support (online) infrastructure is effective, it is not necessary to continuously provide informational mentoring, which is a parallel offline support. Rather, offline support such as mentoring should be used as an appropriate device for abnormal symptom monitoring. Third, it is required that SMEs should improve their ability to utilize, because the effect of enhancing networking capacity through public information support infrastructure and enhancing product competitiveness through such infrastructure appears in most types of companies rather than in specific SMEs.