• Title/Summary/Keyword: Information System Development

A Study of Factors Associated with Software Developers Job Turnover (데이터마이닝을 활용한 소프트웨어 개발인력의 업무 지속수행의도 결정요인 분석)

  • Jeon, In-Ho;Park, Sun W.;Park, Yoon-Joo
    • Journal of Intelligence and Information Systems, v.21 no.2, pp.191-204, 2015
  • According to the '2013 Performance Assessment Report on the Financial Program' from the National Assembly Budget Office, the unfilled recruitment ratio of software (SW) developers in South Korea was 25% in the 2012 fiscal year. Moreover, the unfilled recruitment ratio of highly qualified SW developers reached almost 80%. This phenomenon is more pronounced in small and medium enterprises with fewer than 300 employees. Young job-seekers in South Korea increasingly avoid becoming SW developers, and even current SW developers want to change careers, which hinders the national development of the IT industry. The Korean government has recently recognized the problem and implemented policies to foster young SW developers. Thanks to this effort, it has become easier to find entry-level SW developers. However, many IT companies still find it hard to recruit highly qualified SW developers, because long-term experience is essential to becoming an expert SW developer. Thus, improving the job continuity intention of current SW developers is more important than fostering new ones. Therefore, this study surveyed the job continuity intentions of SW developers and analyzed the factors associated with them. We carried out a survey from September 2014 to October 2014 targeting 130 SW developers working in the IT industry in South Korea. We gathered the demographic information and characteristics of the respondents, the work environment of the SW industry, and the social position of SW developers. Afterward, a regression analysis and a decision tree method were performed to analyze the data. These two methods are widely used data mining techniques that offer explanatory ability and are mutually complementary. We first performed a linear regression to find the important factors associated with the job continuity intention of SW developers. 
The results showed that the 'expected age' to which one expects to work as a SW developer was the most significant factor associated with the job continuity intention. We suppose that the major cause of this phenomenon is a structural problem of the IT industry in South Korea, which requires SW developers to move from development to management as they are promoted. In addition, the 'motivation' to become a SW developer and the 'personality (introverted tendency)' of a SW developer are highly important factors associated with the job continuity intention. Next, the decision tree method was performed to extract the characteristics of highly motivated and less motivated developers. We used the well-known C4.5 algorithm for the decision tree analysis. The results showed that 'motivation', 'personality', and 'expected age' were also important factors influencing the job continuity intention, which was similar to the results of the regression analysis. In addition, the 'ability to learn' new technology was a crucial factor in the decision rules for job continuity. In other words, a person with a high ability to learn new technology tends to work as a SW developer for a longer period of time. The decision rules also showed that the 'social position' of SW developers and the 'prospect' of the SW industry were minor factors influencing job continuity intentions. On the other hand, 'type of employment (regular position/non-regular position)' and 'type of company (ordering company/service providing company)' did not affect the job continuity intention in either method. In this research, we examined the job continuity intentions of SW developers who were actually working at IT companies in South Korea, and we analyzed the factors associated with them. These results can be used for human resource management in many IT companies when recruiting or fostering highly qualified SW experts. 
They can also help shape policies for fostering SW developers and address the problem of unfilled SW developer positions in South Korea.
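The abstract names C4.5 but does not show how the algorithm ranks candidate factors. C4.5 chooses splits by gain ratio (information gain divided by split information), and the following minimal sketch computes it. The toy survey rows and the field names `motivation`, `employment`, and `stays` are hypothetical and are not the study's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """C4.5-style gain ratio of a categorical attribute with respect to the target."""
    total = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy of the target after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy responses (not the study's survey data).
rows = [
    {"motivation": "high", "employment": "regular", "stays": "yes"},
    {"motivation": "high", "employment": "non-regular", "stays": "yes"},
    {"motivation": "low", "employment": "regular", "stays": "no"},
    {"motivation": "low", "employment": "non-regular", "stays": "no"},
]

for attribute in ("motivation", "employment"):
    print(attribute, gain_ratio(rows, attribute, "stays"))
```

In this toy data `motivation` separates the classes perfectly while `employment` carries no information, which loosely mirrors the abstract's finding that employment type did not affect job continuity intention.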

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems, v.18 no.4, pp.59-77, 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. This study approaches the building of the predictive models from two different perspectives. The first is the analysis period. We divide the analysis period into before and after the IMF financial crisis, and examine whether there is a difference between the two periods. The second is the prediction time. In order to predict when firms increase capital by issuing new stocks, the prediction time is categorized as one year, two years and three years later. Therefore, a total of six prediction models are developed and analyzed. In this paper, we employ the decision tree technique to build the prediction models for rights issues. The decision tree is the most widely used prediction method, which builds decision trees to label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanation capabilities. There are well-known decision tree induction algorithms such as CHAID, CART, QUEST, C5.0, etc. Among them, we use the C5.0 algorithm, which is the most recently developed and yields better performance than the other algorithms. We obtained the rights issue and financial analysis data from TS2000 of the Korea Listed Companies Association. A record of financial analysis data consists of 89 variables, which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices and 8 productivity indices. 
For model building and testing, we used 10,925 financial analysis records from a total of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 variables from the financial analysis data were selected as the input variables of each model, and the rights issue status (issued or not issued) was defined as the output variable. To develop prediction models using the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The results of the experimental analysis show that the prediction accuracies for data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those before the IMF financial crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intentions regarding rights issues have become more evident. The experimental results also show that the stability-related indices have a major impact on conducting rights issues in the case of short-term prediction. On the other hand, the long-term prediction of rights issues is affected by financial analysis indices on profitability, stability, activity and productivity. All the prediction models include the industry code as one of the significant variables. This means that companies in different industries show different patterns of rights issues. We conclude that it is desirable for stakeholders to take into account stability-related indices for short-term prediction and a wider variety of financial analysis indices for long-term prediction. The current study has several limitations. First, we need to compare the differences in accuracy by using different data mining techniques such as neural networks, logistic regression and SVM. 
Second, new prediction models need to be developed and evaluated that include variables which capital structure theory identifies as relevant to rights issues.
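The 60%/40% build/test protocol described above can be sketched as follows. This is only an illustration of a holdout split and an accuracy measurement, not the PASW Modeler C5.0 workflow; the `debt_ratio` field, the threshold rule, and the synthetic records are hypothetical.

```python
import random


def holdout_split(records, train_fraction=0.6, seed=0):
    """Shuffle a copy of the records and split it into build and test partitions."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, records):
    """Fraction of (features, label) records for which predict(features) equals label."""
    hits = sum(1 for features, label in records if predict(features) == label)
    return hits / len(records)


# Synthetic stand-in for the 10,925 financial records (hypothetical values).
records = [({"debt_ratio": i % 100}, i % 100 > 50) for i in range(10925)]
train, test = holdout_split(records)


def rule(features):
    """Hypothetical one-condition rule standing in for a learned rule set."""
    return features["debt_ratio"] > 50


print(len(train), len(test), accuracy(rule, test))
```

Because the synthetic labels are generated by the same threshold the rule uses, the reported accuracy here is trivially perfect; with real data the test partition is what exposes the gap between build-time and out-of-sample performance.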

A Study on the Technical and Administrative Innovation of Library Organization in the Perspective of the Contingency Theory (도서관조직의 기술혁신 및 행정혁신에 관한 조직상황론적 연구)

  • Hong Hyun-Jin
    • Journal of the Korean Society for Library and Information Science, v.25, pp.343-388, 1993
  • An organization's ability to innovate amid rapid environmental change determines its survival. Innovative activity is achieved in different ways according to the objectives of the organization, the characteristics of external environmental factors, and various attributes of the organization. In the present study, existing approaches to organizational innovativeness were comprehensively compared and evaluated; then, for a more rational approach, a research model was built and proposed by establishing inclusive variables of the innovativeness of library organizations and categorizing its types. Additionally, an empirical, analytical study of the model was carried out. That is, given that innovation is closely related to the circumstantial factors of an organization, the overall circumstantial relations were clarified, considering both the external environmental factors and the internal characteristics of the organization. In the study, the innovation of library organizations was viewed in two parts, i.e., the feasible degree of technical innovation and the feasible degree of administrative innovation. Regarding the types of innovative implementation, four types were classified according to the feasible degree of innovation: a stationary type, a technic-oriented type, an organization-oriented type, and a technical-socio systematic type. There were nine independent variables, i.e., the scale of the organization, available resources of the organization, formalization, differentiation, specialization, decentralization, the degree of recognition of technical attributes, the degree of response to changes in the technical environment, and professional activities. There were three dependent variables, i.e., technical innovation, administrative innovation, and the performance of the organization. 
By establishing these variables, the factors that might influence the innovation of library organizations were identified, and with the types of innovative implementation classified according to the feasible degree of innovation, the characteristics of library organizations were reviewed for each type. Also, the performance of library organizations was analyzed according to the types of innovative implementation, and the relations between the types of innovative implementation according to circumstantial variables and the performance of library organizations were clarified. To verify the adequacy of the research model empirically, data were collected from 72 university libraries and 38 special libraries, and to test the hypotheses of the research model, correlation analysis, stepwise regression analysis, and one-way ANOVA were used. The major findings of the study are as follows. 1) There appeared to be a trend that the bigger the scale and available resources of the organization, the more active the professional activity of the managerial class, and the higher the degree of recognition of the technical environment (the degree of recognition of technical attributes and the degree of response to changes in the technical environment), the higher the feasible degree of innovation becomes. 2) Among the variables influencing the feasible degree of technical innovation, the most influential was the degree of recognition of technical innovation; second, the available resources of the organization; and third, professional activity. The variables influencing the feasible degree of administrative innovation, from the most influential, were the available resources of the organization, the differentiation of the organization, 
and the degree of response to changes in the technical environment. 3) It appeared that the higher the educational level of the managerial class, the more active the professional activity becomes. There seemed to be a trend that library managers whose experience as a librarian was at the middle level (three to six years) were more active in research than library managers with a higher level of experience (more than ten years). Also, there appeared to be a trend that the younger the library managers, the higher their degree of recognition of technical attributes, and library managers with middle-level experience as a librarian (three to six years) viewed the technical aspect more favorably than those with a higher level of experience (more than ten years). Also, it appeared that when professional association activity and research activity are vigorous, the degree of recognition of technology becomes higher and, as a result, influences the innovativeness of the organization (the feasible degree of technical innovation and the feasible degree of administrative innovation). 4) The comparison and analysis of the characteristics of library organizations by type of innovative implementation indicated a trend that the larger the available resources of the library organization, the higher the organic nature of the organization such as differentiation, decentralization, etc., and the higher the level of system development operations, the more likely the library's type of innovative implementation is the technical-socio systematic type, which is higher in the practical degrees of both technical innovation and administrative innovation. 
5) The comparison and analysis of the relations between the types of innovative implementation and the performance of the organization showed that organizational performance is highest for the technical-socio systematic type, followed by the technic-oriented type, the organization-oriented type, and finally the stationary type, which is lowest. That is, since the performance of the library organization is highest in libraries of the technical-socio systematic type and lowest in libraries whose practical degrees of both technical innovation and administrative innovation are low, the performance of library organizations differs significantly according to the type of innovative implementation. The present study has extracted the factors influencing innovation, systematically classified the types of innovative implementation, inferred the overall circumstantial correlations between the types and the performance of the organization, and empirically examined those factors. However, due to the restrictions of the present study and the limits of the research design, its results should be interpreted with caution. Also, the present study, as an exploratory study of the types of innovative implementation with few preceding studies, calls for more complete hypothesis development based on its results. In other words, more systematic studies aimed at understanding these relations will contribute to proposing and demonstrating a more useful theory.

Bankruptcy Forecasting Model using AdaBoost: A Focus on Construction Companies (적응형 부스팅을 이용한 파산 예측 모형: 건설업을 중심으로)

  • Heo, Junyoung;Yang, Jin Yong
    • Journal of Intelligence and Information Systems, v.20 no.1, pp.35-48, 2014
  • According to the 2013 construction market outlook report, the liquidation of construction companies is expected to continue due to the ongoing residential construction recession. Bankruptcies of construction companies have a greater social impact than those in other industries. However, due to the different nature of their capital structure and debt-to-equity ratio, it is more difficult to forecast construction companies' bankruptcies than those of companies in other industries. The construction industry operates on greater leverage, with high debt-to-equity ratios and project cash flows concentrated in the second half of a project. The economic cycle greatly influences construction companies. Therefore, downturns tend to rapidly increase the bankruptcy rates of construction companies. High leverage, coupled with increased bankruptcy rates, could lead to greater burdens on banks providing loans to construction companies. Nevertheless, bankruptcy prediction models have concentrated mainly on financial institutions, and construction-specific studies are rare. Bankruptcy prediction models based on corporate finance data have been studied for some time in various ways. However, these models are intended for companies in general, and they may not be appropriate for forecasting bankruptcies of construction companies, which typically have high liquidity risks. The construction industry is capital-intensive, operates on long timelines with large-scale investment projects, and has comparatively longer payback periods than other industries. With its unique capital structure, it can be difficult to apply a model used to judge the financial risk of companies in general to those in the construction industry. Diverse studies of bankruptcy forecasting models based on a company's financial statements have been conducted for many years. 
The subjects of these models, however, were general firms, and the models may not be suitable for accurately forecasting companies with disproportionately large liquidity risks, such as construction companies. The construction industry is capital-intensive and requires significant investments in long-term projects, so it takes a long time to realize returns from the investment. The unique capital structure means that the same criteria used for other industries cannot be applied to effectively evaluate financial risk for construction firms. The Altman Z-score was first published in 1968 and is commonly used as a bankruptcy forecasting model. It forecasts the likelihood of a company going bankrupt by using a simple formula, classifying the results into three categories, and evaluating the corporate status as dangerous, moderate, or safe. When a company falls into the "dangerous" category, it has a high likelihood of bankruptcy within two years, while those in the "safe" category have a low likelihood of bankruptcy. For companies in the "moderate" category, it is difficult to forecast the risk. Many of the construction firm cases in this study fell in the "moderate" category, which made it difficult to forecast their risk. With the development of machine learning, recent studies of corporate bankruptcy forecasting have adopted this technology. Pattern recognition, a representative application area of machine learning, is applied to forecasting corporate bankruptcy: patterns are analyzed based on a company's financial information and then judged as belonging to either the bankruptcy risk group or the safe group. The representative machine learning models previously used in bankruptcy forecasting are Artificial Neural Networks, Adaptive Boosting (AdaBoost), and the Support Vector Machine (SVM). There are also many hybrid studies combining these models. 
Existing studies using the traditional Z-score technique or machine-learning-based bankruptcy prediction focus on companies in non-specific industries. Therefore, the industry-specific characteristics of companies are not considered. In this paper, we confirm, by company size, that adaptive boosting (AdaBoost) is the most appropriate forecasting model for construction companies. We classified construction companies into three groups (large, medium, and small) based on the company's capital. We analyzed the predictive ability of AdaBoost for each group of companies. The experimental results showed that AdaBoost has more predictive ability than the other models, especially for the group of large companies with capital of more than 50 billion won.
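The abstract refers to the Altman Z-score's "simple formula" and its three categories without stating them. The sketch below uses the commonly cited 1968 coefficients and cutoffs (1.81 and 2.99) for publicly traded manufacturing firms; mapping the zones to the abstract's "dangerous", "moderate", and "safe" labels is an assumption, and the input figures are hypothetical.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    """Altman (1968) Z-score in its commonly cited form for public manufacturers."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z):
    """Map a Z-score to the three categories using the usual 1.81 / 2.99 cutoffs."""
    if z < 1.81:
        return "dangerous"
    if z <= 2.99:
        return "moderate"
    return "safe"


# Hypothetical balance-sheet figures, all in the same currency unit.
z = altman_z(working_capital=10, retained_earnings=20, ebit=15,
             market_value_equity=50, sales=100,
             total_assets=100, total_liabilities=50)
print(round(z, 3), zone(z))
```

A score that lands in the middle band gives no clear signal, which is the situation the abstract reports for many construction firms and the motivation for trying a learned classifier such as AdaBoost instead.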

Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining (데이터 마이닝과 텍스트 마이닝의 통합적 접근을 통한 병사 사고예측 모델 개발)

  • Yoon, Seungjin;Kim, Suhwan;Shin, Kyungshik
    • Journal of Intelligence and Information Systems, v.21 no.3, pp.1-17, 2015
  • In this paper, we report what we have observed with regard to a prediction model for the military based on enlisted men's internal data (cumulative records) and external data (SNS data). This work is significant for the military's efforts to supervise enlisted men. In spite of their efforts, many commanders have failed to prevent accidents by their subordinates. One of the important duties of officers is to take care of their subordinates and prevent unexpected accidents. However, accidents are hard to prevent, so we must attempt to determine a proper method. Our motivation for presenting this paper is to make it possible to predict accidents using enlisted men's internal and external data. The biggest issue facing the military is the occurrence of accidents by enlisted men related to maladjustment and the relaxation of military discipline. The core method of preventing accidents by soldiers is to identify problems and manage them quickly. Commanders predict accidents by interviewing their soldiers and observing their surroundings. This requires considerable time and effort, and the results differ significantly depending on the capabilities of the commanders. In this paper, we seek to predict accidents with objective data that can easily be obtained. Recently, records of enlisted men, as well as SNS communication between commanders and soldiers, have made it possible to predict and prevent accidents. This paper concerns the application of data mining to identify soldiers' interests and predict accidents using internal and external (SNS) data. We propose combining topic analysis with a decision tree method. The study is conducted in two steps. First, topic analysis is conducted on the SNS data of enlisted men. Second, the decision tree method is used to analyze the internal data together with the results of the first analysis. The dependent variable for these analyses is the presence of any accidents. 
In order to analyze their SNS data, we require tools for text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module. Our approach for finding their interests is composed of three main phases: collection, topic analysis, and conversion of the topic analysis results into points for use as independent variables. In the first phase, we collect enlisted men's SNS data by commander's ID. After the unstructured SNS data are gathered, the topic analysis phase extracts issues from them. For simplicity, 5 topics (vacation, friends, stress, training, and sports) are extracted from 20,000 articles. In the third phase, using these 5 topics, we quantify them as personal points. After quantifying the topics, we add these results to the independent variables, which comprise 15 internal data items. Then, we build two decision trees. The first tree is built from the internal data only. The second tree is built from the external data (SNS) as well as the internal data. After that, we compare the misclassification rates reported by SAS E-miner. The first model's misclassification rate is 12.1%, whereas the second model's is 7.8%, so the second model predicts accidents with an accuracy of approximately 92%. The gap between the two models is 4.3 percentage points. Finally, we test whether the difference between them is meaningful using the McNemar test. The difference is statistically significant (p-value: 0.0003). This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of enlisted men's data. Additionally, various independent variables in the decision tree model are used as categorical variables instead of continuous variables, which causes a loss of information. In spite of extensive efforts to provide prediction models for the military, commanders' predictions are accurate only when they have sufficient data about their subordinates. 
Our proposed methodology can provide support to decision-making in the military. This study is expected to contribute to the prevention of accidents in the military based on scientific analysis of enlisted men and proper management of them.
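The McNemar test mentioned above compares two classifiers on the same cases by looking only at the cases where they disagree. A minimal sketch of the continuity-corrected version follows; the abstract does not report the discordant counts, so the counts `b` and `c` used here are hypothetical.

```python
import math


def mcnemar(b, c):
    """Continuity-corrected McNemar test.

    b: cases the first model classified correctly and the second did not.
    c: cases the second model classified correctly and the first did not.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    if b + c == 0:
        return 0.0, 1.0
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical discordant counts (not taken from the study).
statistic, p_value = mcnemar(b=2, c=10)
print(round(statistic, 4), round(p_value, 4))
```

Cases that both trees classify correctly, or both misclassify, do not enter the statistic at all, which is why the test suits paired comparisons such as the internal-only tree versus the internal-plus-SNS tree.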

Implementation Strategy for the Elderly Care Solution Based on Usage Log Analysis: Focusing on the Case of Hyodol Product (사용자 로그 분석에 기반한 노인 돌봄 솔루션 구축 전략: 효돌 제품의 사례를 중심으로)

  • Lee, Junsik;Yoo, In-Jin;Park, Do-Hyung
    • Journal of Intelligence and Information Systems, v.25 no.3, pp.117-140, 2019
  • As population aging accelerates and various social problems related to the vulnerable elderly are raised, the need for effective elderly care solutions to protect the health and safety of the elderly generation is growing. Recently, more and more people are using smart toys equipped with ICT for elderly care. In particular, log data collected through smart toys are highly valuable as a quantitative and objective indicator in areas such as policy-making and service planning. However, research related to smart toys has been limited to topics such as the development of smart toys and the validation of their effectiveness. In other words, there is a dearth of research that derives insights from log data collected through smart toys and uses them for decision making. This study analyzes log data collected from a smart toy and derives insights for improving the quality of life of elderly users. Specifically, a user-profiling analysis was performed and a behavior-based mechanism of change in quality of life was elicited. First, in the user profiling analysis, two important dimensions for classifying types of elderly groups were derived from five factors of elderly users' living management: 'Routine Activities' and 'Work-out Activities'. Based on these dimensions, hierarchical cluster analysis and K-means clustering were performed to classify all elderly users into three groups. Through the profiling analysis, the demographic characteristics of each group and their smart toy usage behavior were identified. Second, stepwise regression was performed to elicit the mechanism of change in quality of life. The effects of interaction, content usage, and indoor activity on the improvement of depression and lifestyle among the elderly were identified. 
In addition, the study identified the role of users' performance evaluation of, and satisfaction with, the smart toy as variables that mediate the relationship between usage behavior and change in quality of life. The specific mechanisms are as follows. First, the interaction between the smart toy and the elderly was found to improve depression through the mediation of attitudes toward the smart toy. 'Satisfaction toward Smart Toy', a variable that affects the improvement of the elderly's depression, changes with how users evaluate the smart toy's performance. Here, it is the interaction with the smart toy that positively affects how the smart toy is evaluated. These results can be interpreted as follows: elderly users who desire emotional stability interact actively with the smart toy, evaluate it positively, and greatly appreciate its effectiveness. Second, content usage was confirmed to have a direct effect on improving lifestyle without going through other variables. Elderly users who use a lot of the content provided by the smart toy improved their lifestyle. However, this effect occurred regardless of the user's attitude toward the smart toy. Third, the log data show that a high degree of indoor activity improves both the lifestyle and the depression of the elderly. The more indoor activity, the better the lifestyle of the elderly, and this effect occurs regardless of the user's attitude toward the smart toy. In addition, elderly users with a high degree of indoor activity are satisfied with the smart toy, which leads to improvement in their depression. However, elderly users who prefer outdoor activities to indoor activities, or who are less active due to health problems, are hard to satisfy with smart toys and do not obtain the effect of improved depression. In summary, based on the activities of the elderly, three groups of elderly users were identified, along with the important characteristics of each type. 
In addition, this study sought to identify the mechanism by which elderly users' behavior with the smart toy affects their actual lives, and to derive user needs and insights.
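The grouping step described above (three groups on two derived dimensions) can be illustrated with a plain K-means loop. This sketch covers only the K-means part, not the hierarchical clustering the study also used; the activity scores and the starting centroids are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=20):
    """Plain K-means (Lloyd's algorithm) with caller-supplied starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical (routine activity, work-out activity) scores per user.
points = [(1, 1), (1, 2), (2, 1), (8, 1), (9, 2), (8, 2), (8, 8), (9, 9), (9, 8)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 1), (8, 8)])
print([len(c) for c in clusters])
```

One common way to combine the two methods the abstract names is to use the hierarchical result to pick the number of groups and the starting centroids, and then refine with K-means; whether the study did exactly this is not stated.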

Analysis of shopping website visit types and shopping pattern (쇼핑 웹사이트 탐색 유형과 방문 패턴 분석)

  • Choi, Kyungbin;Nam, Kihwan
    • Journal of Intelligence and Information Systems, v.25 no.1, pp.85-107, 2019
  • Online consumers browse products belonging to a particular product line or brand with the intent to purchase, or simply navigate widely and leave without making a purchase. Research on the behavior and purchases of online consumers has progressed steadily, and related services and applications based on consumer behavior data have been developed in practice. In recent years, customization strategies and recommendation systems have been adopted thanks to the development of big data technology, and attempts are being made to optimize users' shopping experience. However, even with such attempts, it is very unlikely that online consumers who visit a website will actually move on to the purchase stage. This is because online consumers do not visit a website only to purchase products but use and browse websites differently according to their shopping motives and purposes. Therefore, to understand the behavior of online consumers, it is important to analyze various types of visits as well as visits that lead to purchase. In this study, we performed a session-level clustering analysis on the clickstream data of an e-commerce company in order to explain the diversity and complexity of online consumers' search behavior and to classify that behavior into types. For the analysis, we converted more than 8 million page-level data points into visit-level sessions, resulting in a total of over 500,000 website visit sessions. For each visit session, 12 characteristics such as page views, duration, search diversity, and page type concentration were extracted for the clustering analysis. Considering the size of the data set, we performed the analysis using the Mini-Batch K-means algorithm, which has advantages in learning speed and efficiency while maintaining clustering performance similar to that of the standard K-means algorithm. 
The optimal number of clusters was found to be four, and differences in session characteristics and purchase rates were identified for each cluster. An online consumer typically visits a website several times, learns about a product, and then decides whether to purchase it. To analyze this purchasing process over several visits, we constructed visit sequence data for each consumer based on the navigation patterns derived from the clustering analysis. Each visit sequence consists of the series of visits leading up to one purchase, and the items in a sequence are the cluster labels derived above. We built separate sequence data for consumers who made purchases and for consumers who only explored products without purchasing during the same period. Sequential pattern mining was then applied to extract frequent patterns from each set of sequence data. The minimum support was set to 10%, and each frequent pattern is a sequence of cluster labels. While some patterns were derived from both sets of sequence data, other frequent patterns were derived from only one of them. A comparative analysis of the extracted frequent patterns showed that consumers who made purchases tended to search repeatedly for a specific product before deciding to purchase it. The implication of this study is that we analyzed the search types of online consumers using large-scale click stream data and analyzed their patterns to explain the purchasing process from a data-driven point of view. Most studies on the typology of online consumers have focused on the characteristics of each type and on the key factors that distinguish the types. 
In this study, we classified the behavior of online consumers into types and further analyzed the order in which these types follow one another to form a series of search patterns. In addition, online retailers will be able to try to improve purchase conversion through marketing strategies and recommendations tailored to the various types of visits, and to evaluate the effect of those strategies through changes in consumers' visit patterns.
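
The sequential pattern mining step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sequences, and the `max_len` cap are assumptions made for illustration, and a simple level-wise search stands in for whichever mining algorithm the study actually used. Each sequence is a list of cluster labels (one label per visit session), and a pattern is frequent when the share of sequences containing it, in order and with gaps allowed, reaches the minimum support.

```python
def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched label.
    return all(label in remaining for label in pattern)


def frequent_patterns(sequences, min_support=0.10, max_len=3):
    """Level-wise search for label patterns whose support >= min_support.

    sequences: list of lists of cluster labels (hypothetical input format).
    max_len:   illustrative cap on pattern length, not from the abstract.
    """
    labels = sorted({label for seq in sequences for label in seq})
    total = len(sequences)
    frequent = {}
    candidates = [(label,) for label in labels]
    for _ in range(max_len):
        survivors = []
        for pattern in candidates:
            hits = sum(is_subsequence(pattern, seq) for seq in sequences)
            support = hits / total
            if support >= min_support:
                frequent[pattern] = support
                survivors.append(pattern)
        # Only frequent patterns are extended: a longer pattern can never
        # have higher support than its own prefix.
        candidates = [p + (label,) for p in survivors for label in labels]
        if not candidates:
            break
    return frequent


# Toy visit sequences of cluster labels 0..3 (made-up data).
toy_sequences = [[0, 2, 2, 3], [1, 2, 3], [2, 2, 3], [0, 1]]
patterns = frequent_patterns(toy_sequences, min_support=0.5)
```

Running the same function separately on purchaser and non-purchaser sequences, then comparing the two result dictionaries, mirrors the comparison between the two sets of sequence data described in the abstract.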

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (caused by severe acute respiratory syndrome coronavirus 2) were published. The rapid increase in the number of COVID-19 papers imposes time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. Therefore, in this study, we propose a method of extracting useful information from the text of an extensive body of literature using the LDA and Word2vec algorithms. Papers related to the keywords of interest were extracted from the COVID-19 papers, and detailed topics were identified. The data came from the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic and updated weekly. The research method consists of two main parts. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers that included full text. For this purpose, the number of COVID-19 publications by year was analyzed through exploratory data analysis in Python, and the top 10 journals with the most active research were identified. The LDA and Word2vec algorithms were used to derive research topics related to COVID-19, and after related words were analyzed, their similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collected paper, detailed topics were analyzed using the LDA and Word2vec algorithms, and clustering with PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy result of this study is that topics that did not appear in the topic modeling results for all COVID-19 papers were derived from the topic modeling results for each research topic. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and it is said to play an important role in the production of therapeutic agents and in vaccine development. Likewise, extracting topics from the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells attack normal cells instead of defending against attacks. Hidden topics that could not be found in the full set of papers were uncovered by classifying papers according to keywords and performing topic modeling to find detailed topics. In this study, we proposed a method of extracting topics from a large body of literature using the LDA algorithm and extracting similar words using Skip-gram, the Word2vec model that predicts surrounding words from a central word. Combining the LDA model with the Word2vec model was intended to improve performance by capturing both the relationship between documents and LDA topics and the word-level relationships learned by Word2vec. In addition, we presented a clustering method based on PCA dimension reduction that uses the t-SNE technique to group documents with similar themes, so that documents can be classified intuitively and organized into a structured collection. In a situation where researchers' efforts to overcome COVID-19 cannot keep pace with the rapid publication of related academic papers, we hope this method will save healthcare professionals and policy makers precious time and effort and help them gain new insights quickly. It is also expected to serve as basic data for researchers exploring new research directions.
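
To make the Skip-gram idea mentioned in this abstract concrete, the sketch below generates the (central word, surrounding word) training pairs that a Skip-gram Word2vec model learns from. It is only an illustration under stated assumptions: the function name `skipgram_pairs`, the window size, and the example tokens are hypothetical, and the study itself presumably relied on an existing Word2vec implementation rather than hand-built pairs.

```python
def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs for Skip-gram training.

    For every position, the center word is paired with each word that lies
    at most `window` positions to its left or right.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Made-up tokens from a hypothetical pre-processed abstract.
example_tokens = ["neutralizing", "antibodies", "block", "infection"]
example_pairs = skipgram_pairs(example_tokens, window=1)
```

Words that appear with similar surrounding words end up with similar vectors, which is what makes it possible to look up words related to a topic keyword such as 'vaccine' or 'treatment'.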

  • Policy Direction for The Farmland Sizing Suitable to Regional Trait (지역특성을 반영한 영농규모화사업의 발전방향-충남지역을 중심으로-)

    • Shim, Jae-Sung
      • The Journal of Natural Sciences
      • /
      • v.14 no.1
      • /
      • pp.83-121
      • /
      • 2004
    • This study was carried out to examine how solid the production foundation of rice in Chung-Nam Province is and, where it is not solid, to probe alternative measures based on the size of farms specializing in rice, a direction that could become a pivot of rice-industry policy. The results can be summarized as follows: 1. Rice production in Chung-Nam Province is the highest in Korea, and its paddy field area is the second largest. This implies that rice production in Chung-Nam Province is likely to be severely influenced by global market trends. Farms specializing in rice, which form the core group of rice farming, account for 7.7 percent of all farm households in Korea. The government's financial support to farm households, averaged by field area, had a noticeable effect on improving the farm-size program. 2. The farm-size program carried out in Chung-Nam Province from 1980 to 2002 increased the cultivated paddy field area to 19,484 hectares. The program also stimulated the buying and selling of farmland: in 1995-2002, farmland transactions involved 6,431 households and 16,517 hectares. Meanwhile, long-term letting and hiring of farmland was so active that it covered 6,970 hectares and involved 7,059 farm households. However, the farm-exchange-and-unity program did not meet expectations, because retiring farm operators were reluctant to sell their farms. Transactions were further delayed by the social complications that accompany the exchange and consolidation of scattered farms. Such difficulties would work against the goals of the farm-size program in general. 3. The following measures were proposed to advance the farm-size promotion program: a. 
An occupation shift project, followed by a social security program for retiring and elderly farm operators, should be established promptly, and several types of incentives should be set up to promote the letting and hiring of farmland and the farm-exchange-and-unity program. b. To establish an effective core system of rice production, all farm operators should increase the yield of rice per unit area and lower production costs. To do so, many rice production teams equipped with managerial techniques and capabilities need to be organized, and appropriate facilities, including an information system, should be provided. This plan should be aligned with the various structural measures for regional integration based on farm system building. c. To extend farm size and improve farm management, we have to devise ways to enlarge individual farms for the most effective management and to make use of the farm-size grouping method. In conclusion, the farm-size project in Chung-Nam Province, which has continued since the 1980s, can be said to have been carried out satisfactorily. However, many problems remain to be solved before the desired farm-size operation can be attained. The farm-size project is closely related to farm specialization in rice. Positive support for farm households, including an integrated program for both retiring farmers and off-farm operators, should therefore be considered in order to advance the farm-size program, which is a key means of strengthening rice farming in Chung-Nam Province.


    Structure of Export Competition between Asian NIEs and Japan in the U.S. Import Market and Exchange Rate Effects (한국(韓國)의 아시아신흥공업국(新興工業國) 및 일본(日本)과의 대미수출경쟁(對美輸出競爭) : 환율효과(換率效果)를 중심(中心)으로)

    • Jwa, Sung-hee
      • KDI Journal of Economic Policy
      • /
      • v.12 no.2
      • /
      • pp.3-49
      • /
      • 1990
    • This paper analyzes U.S. demand for imports from Asian NIEs and Japan, utilizing the Almost Ideal Demand System (AIDS) developed by Deaton and Muellbauer, with an emphasis on the effect of changes in the exchange rate. The empirical model assumes a two-stage budgeting process in which the first stage represents the allocation of total U.S. demand among three groups: the Asian NIEs and Japan, six Western developed countries, and the U.S. domestic non-tradables and import-competing sector. The second stage represents the allocation of total U.S. imports from the Asian NIEs and Japan among them, by country. According to the AIDS model, the share equation for the Asian NIEs and Japan in U.S. nominal GNP is estimated as a single equation for the first stage. The share equations for those five countries in total U.S. imports are estimated as a system with the general demand restrictions of homogeneity, symmetry and adding-up, together with polynomially distributed lag restrictions. The negativity condition is also satisfied in all cases. The overall results of these complicated estimations, using quarterly data from the first quarter of 1972 to the fourth quarter of 1989, are quite promising in terms of the significance of individual estimators and other statistics. The conclusions drawn from the estimation results and the derived demand elasticities can be summarized as follows: First, the exports of each Asian NIE to the U.S. are competitive with (substitutes for) Japan's exports, while complementary to the exports of fellow NIEs, with the exception of the competitive relation between Hong Kong and Singapore. Second, the exports of each Asian NIE and of Japan to the U.S. are competitive with those of Western developed countries to the U.S., while they are complementary to the U.S. non-tradables and import-competing sector. 
Third, when both the first and second stages of budgeting are considered, the imports from each Asian NIE and Japan are luxuries in total U.S. consumption. However, when only the second budgeting stage is considered, the imports from Japan and Singapore are luxuries in U.S. imports from the NIEs and Japan, while those of Korea, Taiwan and Hong Kong are necessities. Fourth, the above results may be evidenced more concretely in their implied exchange rate effects. It appears that, in general, a change in the yen-dollar exchange rate will have at least as great an impact on an NIE's share and volume of exports to the U.S., though in the opposite direction, as a change in the exchange rate of the NIE's own currency vis-à-vis the dollar. Asian NIEs, therefore, should counteract yen-dollar movements in order to stabilize their exports to the U.S. More specifically, Korea should depreciate the value of the won relative to the dollar by approximately the same proportion as the depreciation rate of the yen vis-à-vis the dollar, in order to maintain the volume of Korean exports to the U.S. In the worst case scenario, Korea should devalue the won by three times the magnitude of the yen's depreciation rate, in order to keep its market share in the aforementioned five countries' total exports to the U.S. Finally, this study provides additional information which may support empirical findings on the competitive relations among the Asian NIEs and Japan. The correlation matrices among the structures of those five countries' exports to the U.S. during the 1970s and 1980s were estimated, with the export structure constructed as the shares of each of the 29 industrial sectors' exports, as defined by the 3-digit KSIC, in total exports to the U.S. from each individual country. 
In general, the correlations between each of the four Asian NIEs and Japan, and that between Hong Kong and Singapore, are all far below 0.5, while the ones among the Asian NIEs themselves (except for the one between Hong Kong and Singapore) all greatly exceed 0.5. If there exists a tendency on the part of the U.S. to import goods in each specific sector from different countries in a relatively constant proportion, the export structures of those countries will probably exhibit a high correlation. To take this hypothesis to the extreme, if the U.S. maintained an absolutely fixed ratio between its imports from any two countries for each of the 29 sectors, the correlation between the export structures of these two countries would be perfect. Therefore, since any two goods purchased in a fixed proportion could be classified as close complements, a high correlation between export structures will imply a complementary relationship between them. Conversely, a low correlation would imply a competitive relationship. According to this interpretation, the pattern formed by the correlation coefficients among the five countries' export structures to the U.S. is consistent with the empirical findings of the regression analysis.
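
The fixed-ratio argument in this abstract can be checked with a small numerical sketch. It assumes that an export structure is the vector of sector shares in a country's total exports to the U.S., and that the fixed import ratio between two countries is the same in every sector. The three-sector figures and the function names are made up for illustration; the paper itself uses 29 sectors.

```python
from math import sqrt


def sector_shares(exports_by_sector):
    """Convert sector export values into shares of the country's total."""
    total = sum(exports_by_sector)
    return [value / total for value in exports_by_sector]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical export values in three sectors (not data from the paper).
country_a = [10.0, 30.0, 60.0]
country_b = [5.0, 15.0, 30.0]   # always half of country A: a fixed 2:1 ratio
country_c = [60.0, 30.0, 10.0]  # concentrated in the opposite sectors

same_structure = pearson(sector_shares(country_a), sector_shares(country_b))
opposite_structure = pearson(sector_shares(country_a), sector_shares(country_c))
```

With a fixed 2:1 ratio in every sector the two share vectors coincide, so `same_structure` is 1 up to floating-point rounding, while the reversed structure of the third country gives a negative coefficient.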

