• Title/Summary/Keyword: Classification Tree

VKOSPI Forecasting and Option Trading Application Using SVM (Korean title: Forecasting Intraday Changes in VKOSPI Using SVM and Applying Them to Real Option Trading)

  • Ra, Yun Seon;Choi, Heung Sik;Kim, Sun Woong
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.177-192 / 2016
  • Machine learning is a field of artificial intelligence: the area of computer science concerned with giving machines the ability to perform their own data analysis, decision making, and forecasting. For example, one of the representative machine learning models is the artificial neural network, a statistical learning algorithm inspired by the structure of biological neural networks. Other machine learning models include the decision tree, naive Bayes, and SVM (support vector machine) models. Among them, we use the SVM model in this study because it is mainly used for classification and regression analysis, which suits our study. The core principle of SVM is to find a hyperplane that reasonably separates different groups in the data space. Given data from two groups, the SVM model decides which group new data belong to based on the hyperplane obtained from the given data set. Thus, the more meaningful data are available, the better the model learns. In recent years, many financial experts have focused on machine learning, seeing the possibility of combining it with the financial field, where vast amounts of financial data exist. Machine learning techniques have proved powerful in describing non-stationary and chaotic stock price dynamics, and many studies have successfully forecast stock prices with machine learning algorithms. Recently, financial companies have begun to offer Robo-Advisor services (a compound of "robot" and "advisor"), which perform various financial tasks with advanced algorithms that use huge, rapidly changing amounts of data. A Robo-Advisor's main tasks are to advise investors according to their personal investment propensity and to manage their portfolios automatically.
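The hyperplane idea described above can be illustrated with a minimal sketch: a linear soft-margin SVM trained by batch subgradient descent on the hinge loss, in plain Python. This is an illustrative assumption, not the authors' implementation; the abstract does not state the kernel, solver, or features used. The function names, the regularization strength `lam`, the learning rate, and the two-group toy data are all hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=500):
    """Fit a separating hyperplane w.x + b = 0 by batch subgradient descent on
    lam/2 * ||w||^2 + (1/n) * sum(max(0, 1 - y * (w.x + b))), labels in {+1, -1}."""
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the L2 regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Assign x to group +1 or -1 according to its side of the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical two-group toy data in a 2-D feature space.
group_a = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (3.0, 2.0)]
group_b = [(-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0), (-3.0, -2.0)]
X = group_a + group_b
y = [1] * len(group_a) + [-1] * len(group_b)

w, b = train_linear_svm(X, y)
print("w =", w, "b =", b)
print(predict(w, b, (2.5, 2.5)), predict(w, b, (-2.5, -2.5)))  # → 1 -1
```

Library solvers (for example, scikit-learn's `SVC`) use more efficient optimization and support nonlinear kernels. The sketch only shows how a separating hyperplane $w \cdot x + b = 0$ is found from labeled data and then used to assign new points to a group.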
In this study, we propose a method of forecasting the Korean volatility index, VKOSPI, with the SVM model, one of the machine learning methods, and of applying the forecasts to real option trading to improve trading performance. VKOSPI measures the expected future volatility of the KOSPI 200 index based on KOSPI 200 option prices, similar to the VIX index, which is based on S&P 500 option prices in the United States. The Korea Exchange (KRX) calculates and announces VKOSPI in real time. Like volatility in general, VKOSPI affects option prices: VKOSPI and option prices move in the same direction regardless of option type (calls and puts with various strike prices). When volatility increases, all call and put premiums increase because the probability that the options will be exercised increases. Investors can track in real time how much an option's price rises per unit rise in volatility through Vega, the Black-Scholes measure of an option's sensitivity to changes in volatility. Therefore, accurately forecasting VKOSPI movements is an important factor in generating profit in option trading. Using real option data, we verified that accurate VKOSPI forecasts can produce large profits in real option trading. To the best of our knowledge, no previous study has predicted the direction of VKOSPI with machine learning and applied the prediction to actual option trading. We predicted daily VKOSPI changes with the SVM model and then sold an intraday option strangle, a position that profits as option prices fall, only when VKOSPI was expected to decline during the day. We analyzed the results and tested whether SVM-based predictions are applicable to real option trading.
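To make the Vega argument and the strangle payoff concrete, here is a minimal Black-Scholes sketch in plain Python. It assumes European options with no dividends and holds the spot level and time to maturity fixed so that only volatility changes, which simplifies an intraday trade. The spot, strikes, rate, and volatility levels are hypothetical, and the abstract does not describe the paper's actual strike selection or position sizing.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _d1_d2(spot, strike, t, rate, sigma):
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    return d1, d1 - sigma * math.sqrt(t)


def bs_price(spot, strike, t, rate, sigma, kind):
    """Black-Scholes price of a European call or put (no dividends)."""
    d1, d2 = _d1_d2(spot, strike, t, rate, sigma)
    discount = math.exp(-rate * t)
    if kind == "call":
        return spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    return strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)


def bs_vega(spot, strike, t, rate, sigma):
    """Vega: price change per 1.0 (100 percentage points) of sigma; equal for calls and puts."""
    d1, _ = _d1_d2(spot, strike, t, rate, sigma)
    return spot * norm_pdf(d1) * math.sqrt(t)


def short_strangle_pnl(spot, put_strike, call_strike, t, rate, sigma_open, sigma_close):
    """P&L of selling an OTM put and an OTM call at sigma_open and buying them back
    at sigma_close, with spot and time to maturity held fixed."""
    def premium(sigma):
        return (bs_price(spot, put_strike, t, rate, sigma, "put")
                + bs_price(spot, call_strike, t, rate, sigma, "call"))
    return premium(sigma_open) - premium(sigma_close)


# Hypothetical inputs: index level 300, strikes 290/310, 20 calendar days to expiry.
S, T, R = 300.0, 20 / 365, 0.015
for vol in (0.16, 0.18, 0.20):
    print(vol, round(bs_price(S, 310.0, T, R, vol, "call"), 3),
          round(bs_price(S, 290.0, T, R, vol, "put"), 3))
print("vega of 310 call:", round(bs_vega(S, 310.0, T, R, 0.18), 3))
print("strangle P&L if vol falls 18% -> 16%:",
      round(short_strangle_pnl(S, 290.0, 310.0, T, R, 0.18, 0.16), 3))
```

Both premiums rise with volatility, so the short strangle gains when volatility falls and loses when it rises. This is why a correct forecast of an intraday VKOSPI decline is what the position depends on.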
The results showed that the prediction accuracy for VKOSPI was 57.83% on average, and positions were entered 43.2 times, fewer than half the benchmark's 100 entries. A small number of trades is an indicator of trading efficiency. In addition, the experiment showed that trading performance was significantly higher than the benchmark's.

Vegetation Structure of Abies holophylla Forest near Woljeong Temple in Odaesan National Park (Korean title: Analysis of the Vegetation Structure of the Abies holophylla Forest at Woljeong Temple, Odaesan National Park)

  • Lee, Kyong-Jae;Kim, Ji-Seok;Choi, Jin-Woo;Han, Bong-Ho
    • Korean Journal of Environment and Ecology / v.22 no.2 / pp.173-183 / 2008
  • This research examined the vegetation structure of the Abies holophylla forest between Iljumun of Woljeong Temple and Keumgang bridge in Odaesan National Park. A total of 977 Abies holophylla trees with a DBH of 20 cm or more were found within the target site; at the time of the survey in 2006, about 96 of them (9.8%) were fallen or growing poorly. The age of Abies holophylla ranged from 41 to 135 years (DBH $11\sim82$ cm). Eight Abies holophylla trees exceeded 100 cm in DBH, and the largest was 175 cm in DBH and 31 m in height. The density was 5.9 individuals per $400m^2$. TWINSPAN classification of the plant community structure divided the Abies holophylla forest into four community types. First, the Pinus densiflora-Abies holophylla community was predicted to change into an Abies holophylla community. In the other three communities, Abies holophylla was predicted to compete with deciduous broadleaf trees such as Tilia amurensis and Acer pictum subsp. mono. The Abies holophylla forest adjacent to Woljeong Temple in Odaesan National Park has high value as a sustainable resource for culture, landscape, and tourism. Thus, for the preservation and management of the forest, it is necessary to clarify why poorly growing and fallen trees occur among the Abies holophylla and to take countermeasures. In addition, more aggressive management is needed, such as removing the deciduous broadleaf trees (e.g., Tilia amurensis and Acer pictum subsp. mono) that appear mostly in the understory and shrub layers of the forest, together with continuous management of young Abies holophylla, which risk being suppressed by neighboring trees because their initial growth after germination is very slow.

Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (Korean title: Customer Response Prediction Based on Case-Based Reasoning with Feature Weights in an Imbalanced Data Environment)

  • Kim, Eunmi;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.29-45 / 2015
  • Response modeling is a well-known research issue for those seeking better performance in predicting customers' responses to marketing promotions. A customer response model can reduce marketing costs by identifying prospective customers in a very large customer database and predicting the purchase intention of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy incurs unnecessary costs. In addition, the big data environment has accelerated the development of response models using data mining techniques such as CBR, neural networks, and support vector machines. CBR is one of the major tools in business because it is known to be simple and robust when applied to response modeling. CBR remains an attractive technique for data mining applications in business even though it has not performed as well as other machine learning techniques. Thus, many studies have tried to improve CBR and apply it to business data mining with enhanced algorithms or with the support of other techniques such as genetic algorithms, decision trees, and AHP (Analytic Hierarchy Process). Ahn and Kim (2008) used logit, neural networks, and CBR to predict which customers would purchase the items promoted by the marketing department, and optimized k for k-nearest neighbor with a genetic algorithm to improve the performance of the integrated model. Hong and Park (2009) noted that integrating CBR with logit, neural networks, and Support Vector Machine (SVM) predicted customer responses to marketing promotions better than each individual model (logit, neural networks, and SVM). This paper presents an approach to predicting customers' responses to marketing promotions with Case Based Reasoning, in which the proposed model applies different weights to each feature.
We fitted a logit model on a database containing promotion and purchase data for bath soap, and then used its coefficients to assign different feature weights in CBR. We empirically compared the proposed weighted CBR model with neural networks and a pure CBR model, and found that the weighted CBR model outperformed the pure CBR model. Imbalanced data are a common problem when building data mining models to classify real data, as in bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably small or large compared with the number in other classes. Classification models such as response models have difficulty learning patterns from such data because they tend to ignore the minority class while classifying the majority class correctly. Sampling is one of the most representative approaches to resolving problems caused by imbalanced data distributions, and sampling methods can be categorized into undersampling and oversampling. However, CBR is not sensitive to data distribution because, unlike machine learning algorithms, it does not learn from data. In this study, we investigated the robustness of the proposed model while changing the ratio of responding to nonresponding customers, because in the real world, responders to a promotion are always a small fraction of nonresponders. We simulated the proposed model 100 times with different ratios of response to nonresponse customers to validate its robustness under imbalanced data distributions. Finally, we found that the proposed CBR-based model outperformed the compared models on the imbalanced data sets.
Our study is expected to improve the performance of CBR-based response models for promotion programs under real-world imbalanced data distributions.
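The feature-weighting idea can be sketched as a weighted k-nearest-neighbor classifier, the usual retrieval step of CBR. In this sketch the weights are normalized absolute logit coefficients, which is one plausible reading of how the coefficients "give different weights"; the exact transformation, the value of k, the distance metric, the function names, and all data below are assumptions.

```python
import math
from collections import Counter


def weights_from_logit(coefficients):
    """Hypothetical mapping: normalized absolute logit coefficients as feature weights."""
    total = sum(abs(c) for c in coefficients)
    return [abs(c) / total for c in coefficients]


def weighted_distance(a, b, weights):
    """Feature-weighted Euclidean distance between two cases."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def retrieve_and_vote(cases, outcomes, query, weights, k=3):
    """CBR retrieval step: find the k most similar past cases and take a majority vote."""
    ranked = sorted(range(len(cases)),
                    key=lambda i: weighted_distance(cases[i], query, weights))
    votes = Counter(outcomes[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical scaled features: (past purchase frequency, weakly related feature).
cases = [(0.9, 0.2), (0.8, 0.7), (0.85, 0.5),   # responded (1)
         (0.1, 0.2), (0.2, 0.8), (0.15, 0.5)]   # did not respond (0)
outcomes = [1, 1, 1, 0, 0, 0]
weights = weights_from_logit([2.0, -0.1])  # hypothetical logit coefficients

print(retrieve_and_vote(cases, outcomes, (0.75, 0.3), weights))  # → 1
print(retrieve_and_vote(cases, outcomes, (0.2, 0.3), weights))   # → 0
```

Because the second coefficient is small, its weight is near zero and retrieval is driven almost entirely by the first feature.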

Product Recommender Systems using Multi-Model Ensemble Techniques (Korean title: A Product Recommender System Using Multi-Model Combination Techniques)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.39-54 / 2013
  • The recent explosive growth of electronic commerce gives customers many advantageous purchase opportunities. In this situation, customers who lack sufficient knowledge about their purchases may accept product recommendations. Product recommender systems automatically reflect users' preferences and provide them with recommendation lists. Thus, product recommender systems in online shopping stores are known as one of the most popular tools for one-to-one marketing. However, recommender systems that do not properly reflect users' preferences cause disappointment and waste users' time. In this study, we propose a novel recommender system that uses data mining and multi-model ensemble techniques to enhance recommendation performance by precisely reflecting users' preferences. The research data were collected from a real-world online shopping store that sells products from famous art galleries and museums in Korea. The data initially contained 5,759 transactions, of which 3,167 remained after null data were deleted. We transformed categorical variables into dummy variables and excluded outliers. The proposed model consists of two steps. The first step predicts customers who are highly likely to purchase products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks to predict such customers for each product group, performing these data mining techniques with the SAS E-Miner software. We partition the data into modeling and validation sets for logistic regression and decision trees, and into training, test, and validation sets for the artificial neural network model. The validation set is the same for all experiments.
Then we combine the results of each predictor using multi-model ensemble techniques such as bagging and bumping. Bagging, short for "Bootstrap Aggregation," combines the outputs of several predictors to raise the performance and stability of prediction or classification; it is a special form of averaging. Bumping, short for "Bootstrap Umbrella of Model Parameters," keeps only the model with the lowest error. The results show that bumping outperforms bagging and the other predictors in all product groups except "Poster," for which the artificial neural network model performs best. In the second step, we use market basket analysis to extract association rules for co-purchased products. Thirty-one association rules were extracted based on lift, support, and confidence. We set the minimum support (transaction frequency) to 5%, the maximum number of items in an association to 4, and the minimum confidence for rule generation to 10%, and we excluded rules with a lift below 1. After removing duplicate rules, fifteen association rules remained. Of these, eleven rules contain associations between products in the "Office Supplies" group, one rule covers the "Office Supplies" and "Fashion" groups, and the other three cover the "Office Supplies" and "Home Decoration" groups. Finally, the proposed product recommender system provides recommendation lists to the appropriate customers. We tested the usability of the proposed system with a prototype and real-world transaction and profile data, building the prototype with ASP, JavaScript, and Microsoft Access.
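The rule-filtering thresholds above (minimum support 5%, at most 4 items per association, minimum confidence 10%, lift of at least 1) can be sketched with a brute-force miner in plain Python. It enumerates every candidate itemset, so it suits only small illustrative data; production tools typically use Apriori-style candidate pruning instead. The function names, transactions, and product names are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Share of transactions containing every item in itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)


def mine_rules(transactions, min_support=0.05, min_confidence=0.10,
               max_items=4, min_lift=1.0):
    """Brute-force association rule mining with support/confidence/lift filters."""
    transactions = [set(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, max_items + 1):
        for itemset in combinations(items, size):
            sup = support(itemset, transactions)
            if sup < min_support:
                continue
            for r in range(1, size):
                for lhs in combinations(itemset, r):
                    rhs = tuple(i for i in itemset if i not in lhs)
                    confidence = sup / support(lhs, transactions)
                    lift = confidence / support(rhs, transactions)
                    if confidence >= min_confidence and lift >= min_lift:
                        rules.append((lhs, rhs, sup, confidence, lift))
    return rules


baskets = [{"pen", "notebook"}, {"pen", "notebook", "poster"}, {"pen"},
           {"mug", "poster"}, {"pen", "notebook"}]
for lhs, rhs, sup, conf, lift in mine_rules(baskets):
    print(lhs, "->", rhs, f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift below 1 means the consequent is bought less often with the antecedent than on its own, which is why such rules are dropped. For example, notebook -> poster is filtered out here despite passing the support and confidence thresholds.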
In addition, we surveyed user satisfaction with the product lists recommended by the proposed system and with randomly selected product lists. The 173 survey participants were users of MSN Messenger, Daum Café, and P2P services. We evaluated user satisfaction on a five-point Likert scale and performed a paired-sample t-test on the survey results. The results show that the proposed model outperforms the random selection model at the 1% significance level, meaning that users were significantly more satisfied with the recommended product lists. The results also suggest that the proposed system may be useful in real-world online shopping stores.
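The paired-sample t-test compares each participant's rating of the recommended list with the same participant's rating of the random list. A minimal sketch of the statistic, computed on the per-participant differences, is below. The ratings are hypothetical five-point Likert scores, and converting t into a p-value requires the t distribution (for example, `scipy.stats.ttest_rel` returns both).

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Paired-sample t statistic and degrees of freedom for matched ratings."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)  # sample std of differences
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical 5-point Likert ratings from the same five participants.
recommended = [4, 5, 3, 4, 5]
random_list = [3, 4, 3, 2, 4]
t_stat, df = paired_t(recommended, random_list)
print(round(t_stat, 3), df)  # → 3.162 4
```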

Vegetation Characteristics of Ridge in the Seonunsan Provincial Park (Korean title: Vegetation Characteristics of the Ridge Areas of Seonunsan Provincial Park)

  • Kang, Hyun-Mi;Park, Seok-Gon;Kim, Ji-Suk;Lee, Sang-Cheol;Choi, Song-Hyun
    • Korean Journal of Environment and Ecology / v.33 no.1 / pp.75-85 / 2019
  • The purpose of this study is to understand the vegetation characteristics of the ridges (Gyeongsusan-Seonunsan-Gaeipalsan) in Seonunsan Provincial Park and to establish reference information for the park's future management. We installed 62 plots of $100m^2$ each and analyzed them to investigate the vegetation characteristics. Community classification based on TWINSPAN identified seven vegetation communities in the surveyed region: Quercus dentata-Deciduous broad-leaved Community, Quercus variabilis-Pinus thunbergii-Quercus serrata Community, Pinus densiflora Community, Deciduous broad-leaved Community-I, Carpinus tschonoskii-Castanea crenata-Quercus aliena Community, Deciduous broad-leaved Community-II, and Carpinus tschonoskii-Carpinus laxiflora Community. In the vegetation of Seonunsan Provincial Park, coniferous trees such as Pinus thunbergii and Pinus densiflora have been gradually declining as ecological succession proceeds toward deciduous broad-leaved trees such as Quercus spp., Carpinus tschonoskii, and Carpinus laxiflora. Moreover, Carpinus turczaninowii, Mallotus japonicus, and other species were identified as vegetation reflecting the geographical characteristics of a region adjacent to the west coast. The estimated stand age is 30-60 years, and the oldest tree, a Pinus densiflora, is 63 years old. The species diversity index (per $100m^2$), in ascending order, was 0.7942 for the Carpinus tschonoskii-Carpinus laxiflora Community, 0.8406 for the Carpinus tschonoskii-Castanea crenata-Quercus aliena Community, 0.8543 for the Quercus dentata-Deciduous broad-leaved Community, 0.9434 for the Quercus variabilis-Pinus thunbergii-Quercus serrata Community, 0.9520 for Deciduous broad-leaved Community-I, 0.9633 for the Pinus densiflora Community, and 1.0340 for Deciduous broad-leaved Community-II.
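The abstract does not name the diversity index it reports. Assuming it is Shannon's index $H' = -\sum_i p_i \log p_i$, which is widely used in vegetation studies, computed with base-10 logarithms (both are assumptions), a plot-level value could be obtained as sketched below, where $p_i$ is species $i$'s share of total abundance in the plot. The function name and abundance figures are hypothetical.

```python
import math


def shannon_diversity(abundances, base=10.0):
    """Shannon's H' = -sum(p_i * log(p_i)); the base-10 log is an assumption here."""
    total = sum(abundances)
    h = 0.0
    for count in abundances:
        if count > 0:  # absent species contribute nothing
            p = count / total
            h -= p * math.log(p, base)
    return h


# Hypothetical abundances of the species recorded in one 100 m^2 plot.
plot = [12, 8, 5, 3, 2, 1]
print(round(shannon_diversity(plot), 4))
```

With base-10 logarithms, ten equally abundant species give $H' = 1$, so values near 1 indicate roughly that level of even diversity.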

Development of 1ST-Model for 1 hour-heavy rain damage scale prediction based on AI models (Korean title: Development of an AI-Based 1ST-Model for Predicting the Scale of Heavy Rain Damage One Hour Ahead)

  • Lee, Joonhak;Lee, Haneul;Kang, Narae;Hwang, Seokhwan;Kim, Hung Soo;Kim, Soojun
    • Journal of Korea Water Resources Association / v.56 no.5 / pp.311-323 / 2023
  • To reduce damage from localized heavy rain, floods, and urban inundation, it is important to know in advance whether natural disasters will occur. Currently, heavy rain watches and warnings are issued in Korea according to the criteria of the Korea Meteorological Administration. However, since a single criterion is applied to the whole country, heavy rain damage in a specific region cannot be clearly anticipated. Therefore, in this paper, we attempted to reset the criteria for special weather reports to reflect regional characteristics and to predict rainfall-induced damage 1 hour ahead. Gyeonggi Province was selected as the study area because it experiences heavy rain damage more frequently than other regions. The hazard-triggering rainfall (rainfall inducing disaster) was then set using hourly rainfall and heavy rain damage data, considering local characteristics. The heavy rain damage prediction model was developed with decision tree and random forest models, both machine learning techniques, using the hazard-triggering rainfall and rainfall data. In addition, long short-term memory (LSTM) and deep neural network (DNN) models were used to predict rainfall 1 hour ahead. The rainfall predicted by these models was fed into the trained classification model to predict whether rain damage would occur 1 hour later; we call this the 1ST-Model. The 1ST-Model can be used to prevent and prepare for heavy rain disasters and is expected to contribute greatly to reducing heavy rain damage.
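A one-split decision tree (a stump) shows, in simplified form, how a classification tree can turn a rainfall threshold into a damage or no-damage prediction, and how a forecast rainfall value is then passed through the trained classifier. This is a hypothetical illustration: the actual 1ST-Model uses decision tree and random forest models with region-specific inputs, and its rainfall forecast comes from LSTM or DNN models, represented here by a plain number. The function names, rainfall values, and labels are made up.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def fit_stump(rainfall, damage):
    """Pick the rainfall threshold (midpoint between sorted values) that minimizes
    the size-weighted Gini impurity of the two resulting groups."""
    pairs = sorted(zip(rainfall, damage))
    best_score, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split possible between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [d for r, d in pairs if r <= threshold]
        right = [d for r, d in pairs if r > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_score, best_threshold = score, threshold
    return best_threshold


def predict_damage(threshold, forecast_rainfall):
    """1 = damage expected, 0 = no damage expected."""
    return 1 if forecast_rainfall > threshold else 0


# Hypothetical hourly rainfall (mm) and whether damage was reported (1) or not (0).
hourly_rain = [5, 12, 18, 25, 33, 41, 52, 60]
damage = [0, 0, 0, 0, 1, 1, 1, 1]
threshold = fit_stump(hourly_rain, damage)

forecast_next_hour = 45.0  # stand-in for an LSTM/DNN rainfall forecast
print(threshold, predict_damage(threshold, forecast_next_hour))  # → 29.0 1
```

A full decision tree repeats this split search recursively over several input variables, and a random forest averages many such trees grown on bootstrap samples.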

Comparative Study on the Carbon Stock Changes Measurement Methodologies of Perennial Woody Crops-focusing on Overseas Cases (Korean title: A Comparative Study of Methodologies for Estimating Carbon Stock Changes in Perennial Woody Crops: Focusing on Overseas Cases)

  • Hae-In Lee;Yong-Ju Lee;Kyeong-Hak Lee;Chang-Bae Lee
    • Korean Journal of Agricultural and Forest Meteorology / v.25 no.4 / pp.258-266 / 2023
  • This study analyzed methodologies for estimating carbon stock changes in perennial woody crops, along with research cases in other countries. We found that Australia, Bulgaria, Canada, and Japan use the stock-difference method, while Austria, Denmark, and Germany estimate carbon stock changes with the gain-loss method. Some countries have gone beyond developing country-specific factors (Tier 2) and estimate carbon stock changes from image data (Tier 3). In South Korea, Tier 3-level convergence studies have been conducted in the forestry field, but advanced research in the agricultural field is at an early stage. Based on these results, we suggest four directions for future research: 1) securing national-specific factors for emissions and removals in the agricultural field by developing allometric equations and carbon conversion factors for perennial woody crops, to improve the completeness of emissions and removals statistics; 2) conducting policy studies to refine the calculation of cultivation area using fruit tree biomass-based maturity; 3) developing more advanced estimation techniques for perennial woody crops in the agricultural sector using allometric equations and remote sensing based on the agricultural and forestry satellite scheduled for launch in 2025, and establishing a matrix and monitoring system for perennial woody crop cultivation areas in the agricultural sector; and 4) estimating soil carbon stock changes, currently estimated by treating all agricultural land as one category, by land sub-classification in order to implement a dynamic carbon cycle model.
This study suggests detailed guidelines and advanced methods for calculating carbon stock changes in perennial woody crops, supporting the 2050 Carbon Neutral Strategy of the Ministry of Agriculture, Food and Rural Affairs and activating related research in the agricultural sector.
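The two estimation approaches compared in this study can be written in the general form used in the 2006 IPCC Guidelines for National Greenhouse Gas Inventories (AFOLU volume). Here $\Delta C$ is the annual carbon stock change, $\Delta C_G$ and $\Delta C_L$ are the annual gains and losses, and $C_{t_1}$ and $C_{t_2}$ are the carbon stocks at times $t_1$ and $t_2$:

$$\Delta C = \Delta C_G - \Delta C_L \quad \text{(gain-loss method)}$$

$$\Delta C = \frac{C_{t_2} - C_{t_1}}{t_2 - t_1} \quad \text{(stock-difference method)}$$

As a hypothetical worked example, an orchard whose biomass carbon rises from 10 to 14 t C/ha between two inventories four years apart gives $\Delta C = (14 - 10)/4 = 1$ t C/ha per year under the stock-difference method. The gain-loss method would instead estimate annual growth and annual removals separately and take their difference.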

Improved Social Network Analysis Method in SNS (Korean title: An Improved Social Network Analysis Method in SNS)

  • Sohn, Jong-Soo;Cho, Soo-Whan;Kwon, Kyung-Lag;Chung, In-Jeong
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.117-127 / 2012
  • Owing to the recent expansion of Web 2.0-based services and the spread of smartphones, online social network services have become popular. Online social network services are online community services that enable users to communicate with each other, share information, and expand their relationships. In these services, the relationships between users are represented as a graph of nodes and links. As the number of users grows rapidly, SNS are actively used in enterprise marketing, the analysis of social phenomena, and other areas. Social Network Analysis (SNA) is a systematic way of analyzing social relationships among the members of a social network using network theory. In general, a social network consists of nodes and arcs and is often depicted as a social network diagram, in which nodes represent individual actors within the network and arcs represent relationships between them. With SNA, we can measure relationships among people, such as the degree of intimacy, the intensity of connections, and the classification of groups. Since Social Networking Services (SNS) have drawn attention from millions of users, numerous studies have analyzed their user relationships and messages. Typical SNA methods include degree centrality, betweenness centrality, and closeness centrality. Degree centrality does not consider the shortest paths between nodes, but shortest paths are a crucial factor in betweenness centrality, closeness centrality, and other SNA methods. In previous SNA research, computation time was not a major concern because the social networks were small. Unfortunately, most SNA methods require significant time to process the relevant data, which makes it difficult to apply them to ever-growing SNS data.
For instance, if an online social network has n nodes, it can have up to n(n-1)/2 links, which makes analysis expensive: with 10,000 nodes, there can be 49,995,000 links. Therefore, we propose a heuristic-based method for finding the shortest path between users in the SNS user graph. Using this shortest-path method, we show the efficiency of our approach by conducting betweenness centrality and closeness centrality analyses, both of which are widely used in social network studies. Moreover, we devised an enhanced method that adds best-first search and a preprocessing step to reduce computation time and quickly find shortest paths in huge online social networks. Best-first search finds the shortest path heuristically, generalizing human experience. In online social networks, a large share of the links belongs to only a few nodes, while most nodes have relatively few connections. As a result, a node with many connections functions as a hub. When searching for a particular node, looking first at users with many links, instead of searching all users indiscriminately, gives a better chance of finding the desired node quickly. In this paper, we use the degree of user node $v_n$ as the heuristic evaluation function in a graph $G = (N, E)$, where N is the set of vertices and E is the set of links between pairs of distinct nodes. Because of this heuristic, the worst case occurs when the target node lies at the bottom of a skewed tree; the preprocessing step is conducted to remove such target nodes. Next, we efficiently find the shortest path between two nodes in the social network and then analyze the network. To verify the proposed method, we crawled 160,000 people online and constructed a social network from them.
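The degree heuristic can be sketched as a greedy best-first search that always expands the frontier user with the most links, compared against breadth-first search. This is a simplified illustration under stated assumptions: the paper's preprocessing step is omitted, the function names and the tiny adjacency list are hypothetical, and greedy best-first search, unlike breadth-first search on an unweighted graph, does not in general guarantee a shortest path.

```python
import heapq
from collections import deque


def _rebuild(parent, goal):
    """Follow parent links back from goal to reconstruct the path."""
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def bfs_path(graph, start, goal):
    """Breadth-first search: fewest-hop path plus the number of expanded nodes."""
    parent, queue, expanded = {start: None}, deque([start]), 0
    while queue:
        node = queue.popleft()
        expanded += 1
        if node == goal:
            return _rebuild(parent, goal), expanded
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None, expanded


def degree_best_first_path(graph, start, goal):
    """Greedy best-first search that always expands the frontier node with the highest degree."""
    parent, expanded = {start: None}, 0
    frontier = [(-len(graph[start]), start)]  # heapq is a min-heap, so negate the degree
    while frontier:
        _, node = heapq.heappop(frontier)
        expanded += 1
        if node == goal:
            return _rebuild(parent, goal), expanded
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                heapq.heappush(frontier, (-len(graph[neighbor]), neighbor))
    return None, expanded


# Hypothetical undirected user graph with one hub "H".
graph = {
    "a": ["x", "H"], "x": ["a", "y"], "y": ["x"],
    "H": ["a", "t", "b", "c"], "t": ["H", "u", "v"],
    "b": ["H"], "c": ["H"], "u": ["t"], "v": ["t"],
}
print(bfs_path(graph, "a", "t"))                # → (['a', 'H', 't'], 5)
print(degree_best_first_path(graph, "a", "t"))  # → (['a', 'H', 't'], 3)
```

Both searches find the same path here, but the degree heuristic reaches the target after expanding fewer nodes because it jumps straight to the hub. That is the effect the hub argument above relies on in large, skewed social graphs.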
We then compared search and analysis times with those of previous methods, best-first search and breadth-first search. The suggested method took 240 seconds to search nodes, whereas the breadth-first-search-based method took 1,781 seconds (7.4 times faster). Moreover, for social network analysis, the suggested method was 6.8 times faster for betweenness centrality analysis and 1.8 times faster for closeness centrality analysis. The proposed method shows that large social networks can be analyzed with better time performance. As a result, our method can improve the efficiency of social network analysis, making it particularly useful for studying social trends and phenomena.

The Clinical Features of Endobronchial Tuberculosis - A Retrospective Study on 201 Patients for 6 years (Korean title: Clinical Features of Endobronchial Tuberculosis: A Retrospective Review of 201 Cases)

  • Lee, Jae Young;Kim, Chung Mi;Moon, Doo Seop;Lee, Chang Wha;Lee, Kyung Sang;Yang, Suck Chul;Yoon, Ho Joo;Shin, Dong Ho;Park, Sung Soo;Lee, Jung Hee
    • Tuberculosis and Respiratory Diseases / v.43 no.5 / pp.671-682 / 1996
  • Background: Endobronchial tuberculosis is defined as tuberculous infection of the tracheobronchial tree with microbiological and histopathological evidence. It is clinically significant because of its sequela, cicatricial stenosis, which causes atelectasis, dyspnea, and secondary pneumonia, and because it may mimic bronchial asthma and pulmonary malignancy. Method: The authors retrospectively studied 201 patients with confirmed endobronchial tuberculosis who visited the Department of Pulmonary Medicine at Hanyang University Hospital from January 1990 to April 1996. The following results were obtained. Results: 1) A total of 201 patients (19.5%) were confirmed as having endobronchial tuberculosis among 1,031 patients who had undergone flexible bronchofiberscopic examination. There were 55 male and 146 female patients, a male-to-female ratio of 1:2.7. 2) The age distribution, in decreasing order, was as follows: 61 cases (30.3%) in the third decade, 40 cases (19.9%) in the fourth decade, 27 cases (13.4%) in the sixth decade, 21 cases (10.4%) in the fifth decade, 19 cases (9.5%) aged 15 to 19 years, 19 cases (9.5%) in the seventh decade, and 14 cases (7.0%) over 70 years. 3) Among 192 cases, the most common symptom was cough (74.5%), followed by sputum (55.2%), dyspnea (28.6%), chest discomfort (19.8%), fever (17.2%), and hemoptysis (11.5%); localized wheezing was heard in 15.6%. 4) On the chest X-rays of 189 cases, consolidation was the most frequent finding (67.7%), followed by collapse (43.9%), cavitary lesions (11.6%), and pleural effusion (7.4%); 3.2% showed no abnormal findings. 5) Among 76 pulmonary function tests, a normal pattern was found in 44.7%, a restrictive pattern in 39.5%, an obstructive pattern in 11.8%, and a combined pattern in 3.9%.
6) Among all 201 patients, bronchoscopy showed caseous pseudomembrane in 70 cases (34.8%), mucosal erythema and edema in 54 cases (26.9%), hyperplastic lesions in 52 cases (25.9%), fibrous stenosis in 22 cases (10.9%), and erosion or ulcer in 3 cases (1.5%). 7) Of the 201 cases, bronchial washing AFB stain was positive in 103 (51.2%), and bronchial washing culture for tuberculous bacilli was positive in 55 (27.4%). Of the 99 bronchoscopic biopsies, 36.4% were AFB stain positive, 13.1% showed granuloma without a positive AFB stain, 36.4% showed chronic inflammation only, and 14.1% were nondiagnostic. Conclusions: Young female patients whose cough resists general antitussive agents should be evaluated for endobronchial tuberculosis, even with a clear chest roentgenogram and a negative sputum AFB stain. Furthermore, we emphasize that bronchoscopy is a substantially useful means of differential diagnosis of atelectasis in older patients of cancer age. A standard endoscopic classification of endobronchial tuberculosis needs to be established, and well-designed prospective studies are required to elucidate the effect of combining antituberculous chemotherapy with steroids on bronchial stenosis in patients with endobronchial tuberculosis.

Suggestion of Urban Regeneration Type Recommendation System Based on Local Characteristics Using Text Mining (Korean title: A Proposed Urban Regeneration Type Recommendation System Based on Regional Characteristics Using Text Mining)

  • Kim, Ikjun;Lee, Junho;Kim, Hyomin;Kang, Juyoung
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.149-169 / 2020
  • "The Urban Renewal New Deal project", one of the government's major national projects, aims to develop underdeveloped areas by investing 50 trillion won in 100 locations in the first year and 500 over the next four years. The project is drawing keen attention from the media and local governments. However, the project model fails to reflect the original characteristics of each area, because it divides project areas into five categories: "Our Neighborhood Restoration, Housing Maintenance Support Type, General Neighborhood Type, Central Urban Type, and Economic Base Type." The keywords for successful urban regeneration in Korea, "resident participation," "regional specialization," "ministerial cooperation," and "public-private cooperation," show that when local governments propose urban regeneration projects to the government, it is most important to understand the characteristics of the city accurately and to pursue projects that suit those characteristics with the help of local residents and private companies. In addition, given gentrification, one of the side effects of urban regeneration projects, it is important to select and implement urban regeneration types suited to the characteristics of the area. To address the limitations of the 'Urban Regeneration New Deal Project' methodology, this study proposes a system that recommends urban regeneration types suitable for urban regeneration sites using various machine learning algorithms, referring to the urban regeneration types of the '2025 Seoul Metropolitan Government Urban Regeneration Strategy Plan', which is promoted based on regional characteristics. There are four types of urban regeneration in Seoul: "Low-use Low-Level Development, Abandonment, Deteriorated Housing, and Specialization of Historical and Cultural Resources" (Shon and Park, 2017).
To identify regional characteristics, approximately 100,000 text records were collected for 22 regions where projects of the four urban regeneration types were carried out. Using the collected data, we extracted key keywords for each region by urban regeneration type and conducted topic modeling to explore whether the types differed. Many topics related to real estate and the economy appeared in old residential areas, while declining and underdeveloped areas showed topics reflecting past industrial activity. Historical and cultural resource areas, which contain traces of the past, showed many government-related keywords, revealing political topics and cultural topics arising from various events. Finally, low-use and underdeveloped areas showed many topics on real estate and accessibility, indicating good accessibility; these areas mainly had the characteristics of regions where development is planned or likely. Furthermore, we implemented a model that recommends urban regeneration types tailored to regional characteristics for regions outside Seoul. The model uses machine learning, with training and test data randomly split at an 8:2 ratio. To compare performance across models, the input variables were built in two ways, Count Vector and TF-IDF Vector, and five classifiers were applied: SVM (Support Vector Machine), Decision Tree, Random Forest, Logistic Regression, and Gradient Boosting, giving 10 models in total. The best-performing model was Gradient Boosting with TF-IDF Vector input, with an accuracy of 97%.
Therefore, the recommendation system proposed in this study is expected to recommend urban regeneration types based on the regional characteristics of new project sites when urban regeneration projects are carried out.
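The two input representations compared in this study, Count Vector and TF-IDF Vector, can be sketched in plain Python as follows. The weighting here is the basic tf × ln(N/df) form; library implementations (for example, scikit-learn's `TfidfVectorizer`) apply smoothing and length normalization by default, and the abstract does not state which variant was used. The function names and the three short documents are hypothetical stand-ins for the collected regional texts.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Bag-of-words counts over a sorted vocabulary (whitespace tokenization)."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.split())
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def tfidf_vectors(docs):
    """Weight each count by idf = ln(N / df); words found in every document get weight 0."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[c * w for c, w in zip(vec, idf)] for vec in counts]


# Hypothetical, already tokenized regional texts.
docs = [
    "area housing redevelopment housing",
    "area industry factory housing",
    "area heritage culture festival",
]
vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
for word, c, w in zip(vocab, counts[0], tfidf[0]):
    print(f"{word:14s} count={c} tfidf={w:.3f}")
```

Either matrix can then be split 8:2 and passed to any of the five classifiers. The word "area", which appears in every document, shows how TF-IDF suppresses terms that do not distinguish one region from another.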