• Title/Summary/Keyword: 벡터 (vector)

Search Results: 7,453

Prediction of multipurpose dam inflow utilizing catchment attributes with LSTM and transformer models (유역정보 기반 Transformer및 LSTM을 활용한 다목적댐 일 단위 유입량 예측)

  • Kim, Hyung Ju;Song, Young Hoon;Chung, Eun Sung
    • Journal of Korea Water Resources Association / v.57 no.7 / pp.437-449 / 2024
  • Rainfall-runoff prediction studies using deep learning while considering catchment attributes have been gaining attention. In this study, we selected two models: the Transformer model, which is suitable for large-scale data training through the self-attention mechanism, and the LSTM-based multi-state-vector sequence-to-sequence (LSTM-MSV-S2S) model with an encoder-decoder structure. These models were constructed to incorporate catchment attributes and predict the inflow of 10 multi-purpose dam watersheds in South Korea. The experimental design consisted of three training methods: Single-basin Training (ST), Pretraining (PT), and Pretraining-Finetuning (PT-FT). The input data for the models included 10 selected watershed attributes along with meteorological data. The inflow prediction performance was compared based on the training methods. The results showed that the Transformer model outperformed the LSTM-MSV-S2S model when using the PT and PT-FT methods, with the PT-FT method yielding the highest performance. The LSTM-MSV-S2S model showed better performance than the Transformer when using the ST method; however, it showed lower performance when using the PT and PT-FT methods. Additionally, the embedding layer activation vectors and raw catchment attributes were used to cluster watersheds and analyze whether the models learned the similarities between them. The Transformer model demonstrated improved performance among watersheds with similar activation vectors, proving that utilizing information from other pre-trained watersheds enhances the prediction performance. This study compared the suitable models and training methods for each multi-purpose dam and highlighted the necessity of constructing deep learning models using PT and PT-FT methods for domestic watersheds. Furthermore, the results confirmed that the Transformer model outperforms the LSTM-MSV-S2S model when applying PT and PT-FT methods.
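
A minimal sketch of how static catchment attributes can be fed to a sequence model, and how the PT and PT-FT regimes described above differ, is given below in PyTorch. The network size, attribute count, and learning rates are illustrative assumptions, not the authors' configuration.

```python
# Illustrative sketch (not the authors' code): an LSTM that receives
# meteorological forcings concatenated with static catchment attributes.
import torch
import torch.nn as nn

class InflowLSTM(nn.Module):
    def __init__(self, n_met=5, n_attr=10, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_met + n_attr, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)          # daily inflow estimate

    def forward(self, met, attr):
        # met: (batch, time, n_met); attr: (batch, n_attr), broadcast over time
        attr_seq = attr.unsqueeze(1).expand(-1, met.size(1), -1)
        out, _ = self.lstm(torch.cat([met, attr_seq], dim=-1))
        return self.head(out[:, -1])

model = InflowLSTM()
# PT: train one model on data pooled from all 10 dam basins.
pretrain_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# PT-FT: then continue training on a single target basin,
# typically with a smaller learning rate.
finetune_opt = torch.optim.Adam(model.parameters(), lr=1e-4)
```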

Comparison of Electrocardiographic Time Intervals, Amplitudes and Vectors in 7 Different Athletic Groups (운동종목별(運動種目別) 선수(選手)의 심전도시간간격(心電圖時間間隔), 파고(波高) 및 벡터의 비교(比較))

  • Kwon, Ki-Young;Lee, Won-Jung;Hwang, Soo-Kwan;Choo, Young-Eun
    • The Korean Journal of Physiology / v.19 no.1 / pp.61-72 / 1985
  • In order to compare the cardiac function of various groups of athletes, the resting electrocardiographic time intervals, amplitudes and vectors were analyzed in high school athletes of throwing (n=7), jumping (n=11), short track (n=8), long track (n=14), boxing (n=7), volleyball (n=8) and baseball (n=9), and in nonathletic control students (n=19). All athletic groups showed a significantly longer R-R interval (0.96-1.09 sec) than the controls (0.78 sec). Therefore, the heart rate was significantly slower in athletes than in the controls, but did not differ among the athletic groups. The R-R interval is the sum of the P-R, Q-T and T-P intervals: the P-R and Q-T intervals showed no difference among the control and athletic groups, but the T-P interval in the jumping, short track, long track and boxing groups was significantly longer than in the controls. The R-R interval showed a significant correlation with the T-P and Q-T intervals but no correlation with the P-R interval or QRS complex. Comparing the amplitudes of the electrocardiographic waves, the athletic groups showed a trend toward lower P waves than the controls. The T wave in lead $V_5\;(Tv_5)$ was similar in the athletic and control groups. The long track group showed significantly higher $Rv_5$, $Sv_1$, and sum of $Rv_5$ and $Sv_1$ than not only the controls but also the other athletic groups. The angles of the P, QRS, and T vectors in the frontal and horizontal planes did not differ among the control and athletic groups. Each athletic group showed a trend toward lower P-vector amplitude in the frontal plane, and in the horizontal plane the throwing, jumping, short track and baseball groups were significantly lower than the controls. The amplitudes of the QRS and T vectors were similar in the athletic and control groups, except that the baseball group showed a significantly higher QRS vector in the frontal plane. Taken together, all the athletic groups showed a slower heart rate than the controls, mainly because of an elongated T-P interval. Comparing the electrocardiographic waves and vectors, the athletic groups showed lower P-wave and P-vector amplitudes than the controls. The values of $Rv_5$ and $Sv_1$ strongly suggest that, among the various athletic groups, only the long distance runners developed left ventricular hypertrophy.

Expression of TIMP1, TIMP2 Genes by Ionizing Radiation (이온화 방사선에 의한 TIMP1, TIMP2 유전자 발현 측정)

  • Park Kun-Koo;Jin Jung Sun;Park Ki Yong;Lee Yun Hee;Kim Sang Yoon;Noh Young Ju;Ahn Seung Do;Kim Jong Hoon;Choi Eun Kyung;Chang Hyesook
    • Radiation Oncology Journal / v.19 no.2 / pp.171-180 / 2001
  • Purpose : Expression of TIMP, an intrinsic inhibitor of MMP, is regulated by signal transduction in response to genotoxins and is likely to be an important step in metastasis, angiogenesis and wound healing after ionizing radiation. Therefore, we studied radiation-mediated TIMP expression and its mechanism in head and neck cancer cell lines. Materials and Methods : Human head and neck cancer cell lines established at Asan Medical Center were used, and radiosensitivity $(D_0)$, radiation cytotoxicity and metastatic potential were measured by clonogenic assay, viability assay and invasion assay, respectively. Conditioned medium was prepared at 24 hours and 48 hours after 2 Gy and 10 Gy irradiation, and expression of TIMP protein was measured by ELISA with specific antibodies against human TIMP. The hTIMP1 promoter region was cloned and a TIMP1 luciferase reporter vector was constructed. The reporter vector was transfected into AMC-HN-1 and -HN-9 cells with or without a Ras expression vector, and the cells were then exposed to radiation or PMA, a PKC activator. EMSA was performed with oligonucleotides (-59/-53 element and SP1) of the TIMP1 promoter. Results : $D_0$ of the HN-1, -2, -3, -5 and -9 cell lines was 1.55 Gy, 1.8 Gy, 1.5 Gy, 1.55 Gy and 2.45 Gy, respectively. The viability assay confirmed cell viability of over $94\%$ at 24 and 48 hours after 2 Gy irradiation and over 73% after 10 Gy irradiation. ELISA confirmed that the cells secreted TIMP1 and TIMP2 proteins continuously. After 2 Gy irradiation, TIMP2 secretion was decreased at 24 hours in the HN-1 and HN-9 cell lines, but after 10 Gy irradiation it was increased in all cell lines. At 48 hours after irradiation, it was increased in HN-1 but decreased in HN-9 cells. However, the change in TIMP secretion by radiation was mild. Transcription of the TIMP1 gene was induced by PMA in HN-1 but suppressed in HN-9 cells. Wild-type Ras induced TIMP1 transcription 20-fold and 4-fold in HN-1 and HN-9, respectively. The binding activity to the -59/-53 (AP1) motif was increased by radiation, but not to the SP1 motif, in both cell lines. Conclusions : We observed differences in the expression and activity of TIMPs between radiosensitive and radioresistant cell lines, and the different signal transduction pathways in these cell lines may contribute to the different radiosensitivity. Further research to investigate the radiation response of TIMPs and its signal pathway is needed.

An Intelligence Support System Research on KTX Rolling Stock Failure Using Case-based Reasoning and Text Mining (사례기반추론과 텍스트마이닝 기법을 활용한 KTX 차량고장 지능형 조치지원시스템 연구)

  • Lee, Hyung Il;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.47-73 / 2020
  • KTX rolling stock is a system consisting of several machines, electrical devices, and components, and its maintenance requires considerable expertise and experience from maintenance workers. In the event of a rolling stock failure, the maintainer's knowledge and experience determine how quickly and how well the problem is solved, so the resulting availability of the vehicle varies. Although problem solving is generally based on fault manuals, experienced and skilled professionals can quickly diagnose failures and take action by applying personal know-how. Since this knowledge exists in a tacit form, it is difficult to pass on completely to a successor, and previous studies have developed case-based rolling stock expert systems to turn it into a data-driven form. Nonetheless, research on the KTX rolling stock most commonly used on the main line, and on systems that extract the meaning of text and search for similar cases, is still lacking. Therefore, this study proposes an intelligent support system that provides an action guide for new failures by using the know-how of rolling stock maintenance experts as problem-solving examples. For this purpose, a case base was constructed by collecting rolling stock failure data generated from 2015 to 2017, and an integrated dictionary covering the essential terminology and failure codes of the railway rolling stock sector was built separately from the case base. Based on the deployed case base, a new failure is matched against past cases, the top three most similar failure cases are retrieved, and the actual actions taken in those cases are proposed as a diagnostic guide. To compensate for the keyword-matching case search used in earlier case-based rolling stock expert system studies, this study applied various dimensionality reduction techniques that account for the semantic relationships among failure descriptions when calculating similarity, and verified their usefulness through experiments. Three algorithms, Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Doc2Vec, were applied to extract failure characteristics and retrieve similar cases by measuring the cosine distance between the resulting vectors. Precision, recall, and F-measure were used to assess the quality of the proposed actions. For comparison, an algorithm that randomly extracts failure cases with identical failure codes and an algorithm that applies cosine similarity directly to word vectors were also included, and analysis of variance confirmed that the performance differences among the five algorithms were statistically significant. In addition, optimal settings for practical application were derived by verifying how performance changes with the number of dimensions used for dimensionality reduction. The analysis showed that direct word-based cosine similarity outperformed the NMF and LSA dimensionality reduction approaches, and that the algorithm using Doc2Vec performed best. Furthermore, for the dimensionality reduction techniques, performance improved as the number of dimensions increased up to an appropriate level. Through this study, we confirmed the usefulness of effective feature extraction and unstructured data conversion methods when applying case-based reasoning in the specialized KTX rolling stock domain, where most attributes are recorded as text. Text mining is being studied for use in many areas, but studies using such text data remain scarce in environments with many specialized terms and limited data access, such as the one addressed here. In this regard, it is significant that this study is the first to present an intelligent diagnostic system that suggests actions by retrieving cases with text mining techniques that extract failure characteristics, complementing keyword-based case search. It is expected to serve as a basic study for developing diagnostic systems that can be used immediately in the field.
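
To make the retrieval step concrete, the sketch below pairs gensim's Doc2Vec with cosine similarity to rank past failure cases against a new failure description; the toy case texts and parameters are invented for illustration and are not the study's data or code.

```python
# Illustrative Doc2Vec-based similar-case retrieval (gensim + scikit-learn).
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.metrics.pairwise import cosine_similarity

case_texts = [
    "traction motor over-current fault during departure",
    "brake pressure sensor reading unstable at high speed",
    "pantograph arcing reported on main line section",
]
corpus = [TaggedDocument(t.split(), [i]) for i, t in enumerate(case_texts)]
model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)

new_failure = "over-current alarm on traction motor while accelerating"
query_vec = model.infer_vector(new_failure.split()).reshape(1, -1)

# Rank past cases by cosine similarity and propose the top-3 actions.
scores = [cosine_similarity(query_vec, model.dv[i].reshape(1, -1))[0, 0]
          for i in range(len(case_texts))]
top3 = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:3]
print(top3)
```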

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.221-241 / 2018
  • Deep learning has been getting attention recently. The deep learning technique applied in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and in AlphaGo is the Convolutional Neural Network (CNN). CNN is characterized by dividing the input image into small sections, recognizing partial features, and combining them to recognize the image as a whole. Deep learning technologies are expected to bring many changes to our lives, but until now their applications have been limited to image recognition and natural language processing. The use of deep learning techniques for business problems is still at an early research stage. If their performance is proved, they can be applied to traditional business problems such as marketing response prediction, fraudulent transaction detection, bankruptcy prediction, and so on. It is therefore a meaningful experiment to examine the possibility of solving business problems with deep learning technologies, based on the case of online shopping companies, which have big data, relatively observable customer behavior, and high potential utilization value. In online shopping companies in particular, the competitive environment is changing rapidly and becoming more intense, so analyzing customer behavior to maximize profit is becoming increasingly important. In this study, we propose a 'CNN model of Heterogeneous Information Integration' as a way to improve the prediction of customer behavior in online shopping enterprises. The proposed model combines structured and unstructured information and learns from a convolutional neural network with a multi-layer perceptron structure; to optimize its performance, we examine three design aspects, heterogeneous information integration, unstructured information vector conversion, and multi-layer perceptron design, evaluate the performance of each architecture, and confirm the proposed model based on the results. The target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high amount shopper, and high discount shopper. To verify the usefulness of the proposed model, we conducted experiments using actual transaction, customer, and VOC data of a specific online shopping company in Korea. Data extraction criteria were defined for 47,947 customers who registered at least one VOC in January 2011 (one month). The customer profiles of these customers, a total of 19 months of trading data from September 2010 to March 2012, and the VOCs posted during that month were used. The experiment is divided into two stages. In the first stage, we evaluate the three architectural choices that affect the performance of the proposed model and select optimal parameters; in the second stage, we evaluate the performance of the resulting model. Experimental results show that the proposed model, which combines structured and unstructured information, is superior to NBC (Naïve Bayes classification), SVM (Support Vector Machine), and ANN (Artificial Neural Network). It is therefore significant that the use of unstructured information contributes to predicting customer behavior, and that CNN can be applied to business problems as well as image recognition and natural language processing problems. The experiments also confirm that CNN is effective in understanding and interpreting contextual meaning in text VOC data, and that this empirical research based on actual e-commerce data can extract very meaningful information for predicting customer behavior from VOC data written directly by customers in text form. Finally, the various experiments provide useful information for future research on parameter selection and model performance.
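
As a rough sketch of the kind of two-branch network described above, the Keras snippet below joins a CNN over tokenized VOC text with structured customer features for one binary target (e.g. churn); the vocabulary size, sequence length, and layer widths are assumed values, not the paper's architecture.

```python
# Two-branch model: CNN over text tokens + dense layer over structured features.
from tensorflow import keras
from tensorflow.keras import layers

text_in = keras.Input(shape=(200,), name="voc_tokens")        # token ids
x = layers.Embedding(input_dim=20000, output_dim=64)(text_in)
x = layers.Conv1D(128, 5, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)

struct_in = keras.Input(shape=(30,), name="structured")        # profile/transactions
y = layers.Dense(64, activation="relu")(struct_in)

merged = layers.concatenate([x, y])
merged = layers.Dense(64, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid")(merged)            # e.g. churn yes/no

model = keras.Model([text_in, struct_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```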

Therapeutic Angiogenesis by Intramyocardial Injection of pCK-VEGF165 in Pigs (돼지에서 pCK-VEGF165의 심근내 주입에 의한 치료적 혈관조성)

  • Choi Jae-Sung;Han Woong;Kim Dong Sik;Park Jin Sik;Lee Jong Jin;Lee Dong Soo;Kim Ki-Bong
    • Journal of Chest Surgery / v.38 no.5 s.250 / pp.323-334 / 2005
  • Background: Gene therapy is a new and promising option for the treatment of severe myocardial ischemia by therapeutic angiogenesis. The goal of this study was to elucidate the efficacy of therapeutic angiogenesis using VEGF165 in large animals. Material and Method: Twenty-one pigs that underwent ligation of the distal left anterior descending coronary artery were randomly allocated to one of two treatments: intramyocardial injection of pCK-VEGF (VEGF) or intramyocardial injection of pCK-Null (Control). Injections were administered 30 days after ligation. Seven pigs died during the trial; eight pigs from the VEGF group and six from the Control group survived. Echocardiography was performed on day 0 (preoperative) and on days 30 and 60 following coronary ligation. Gated myocardial single photon emission computed tomography (SPECT) imaging with $^{99m}Tc-labeled$ sestamibi was performed on days 30 and 60. Myocardial perfusion was assessed from the uptake of $^{99m}Tc-labeled$ sestamibi at rest. Global and regional myocardial function as well as post-infarction left ventricular remodeling were assessed from segmental wall thickening, left ventricular ejection fraction (EF), end systolic volume (ESV), and end diastolic volume (EDV) using gated SPECT and echocardiography. Myocardium of the ischemic border zone into which the pCK plasmid vector had been injected was also sampled to assess micro-capillary density. Result: Micro-capillary density was significantly higher in the VEGF group than in the Control group ($386\pm110/mm^{2}\;vs.\;291\pm127/mm^{2};\;p<0.001$). Segmental perfusion increased significantly from day 30 to day 60 after intramyocardial injection of the plasmid vector in the VEGF group ($48.4\pm15.2\%\;vs.\;53.8\pm19.6\%;\;p<0.001$), while no significant change was observed in the Control group ($45.1\pm17.0\%\;vs.\;43.4\pm17.7\%;\;p=0.186$). This resulted in a significant difference in the percentage changes between the two groups ($11.4\pm27.0\%\;increase\;vs.\;2.7\pm19.0\%\;decrease;\;p=0.003$). Segmental wall thickening increased significantly from day 30 to day 60 in both groups; the increments did not differ between groups. ESV measured by echocardiography increased significantly from day 0 to day 30 in the VEGF group ($22.9\pm9.9\;mL\;vs.\;32.3\pm9.1\;mL;\; p=0.006$) and in the Control group ($26.3\pm12.0\;mL\;vs.\;36.8\pm9.7\;mL;\;p=0.046$). EF decreased significantly in the VEGF group ($52.0\pm7.7\%\;vs.\;46.5\pm7.4\%;\;p=0.004$) and in the Control group ($48.2\pm9.2\%\;vs.\;41.6\pm10.0\%;\;p=0.028$). There was no significant change in EDV. The interval changes (days $30\~60$) of EF, ESV, and EDV did not differ significantly between groups by either gated SPECT or echocardiography. Conclusion: Intramyocardial injection of pCK-VEGF165 induced therapeutic angiogenesis and improved myocardial perfusion. However, post-infarction remodeling and global myocardial function were not improved.

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors in an effort to guide users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents, including Web pages, email messages, news reports, magazine articles, and business papers, do not yet benefit from the use of keywords. Although the potential benefit is large, the implementation itself is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task in that it is extremely tedious, time-consuming, and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are mainly two approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts; in other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, the aim is to extract keywords with respect to their relevance in the text without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques; keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as its keywords, and as a result keyword extraction is limited to terms that appear in the document. Therefore, keyword extraction cannot generate implicit keywords that are not included in a document. According to Turney's experimental results, about 64% to 90% of the keywords assigned by authors can be found in the full text of an article; inversely, this means that 10% to 36% of author-assigned keywords do not appear in the article and cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of author-assigned keywords are not included in the full text. This is why we adopted the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The former is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiments, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability, and IVSM shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As electronic documents increase, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
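
A compact sketch of the five IVSM steps listed above, using scikit-learn term-frequency vectors and cosine similarity; the sample keyword sets and target document are invented for illustration.

```python
# Sketch of IVSM-style keyword assignment: represent each keyword set as a
# pseudo-document, vectorize, and rank keyword sets by cosine similarity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# (1) each candidate keyword set is represented by its weighted terms
keyword_sets = {
    "information retrieval": "retrieval ranking query index relevance",
    "text mining":           "mining clustering classification corpus terms",
    "logistics":             "shipping port distribution supply chain",
}

# (2)-(3) preprocess the target document and build term-frequency vectors
target_doc = "a ranking model for query relevance in document retrieval"
vec = CountVectorizer()
matrix = vec.fit_transform(list(keyword_sets.values()) + [target_doc])
kw_vecs, doc_vec = matrix[:-1], matrix[-1]

# (4)-(5) measure cosine similarity and keep the highest-scoring keywords
scores = cosine_similarity(doc_vec, kw_vecs).ravel()
ranked = sorted(zip(keyword_sets, scores), key=lambda p: p[1], reverse=True)
print(ranked[:2])   # top keyword candidates for the target document
```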

Triptolide-induced Transrepression of IL-8 NF-${\kappa}B$ in Lung Epithelial Cells (폐상피세포에서 Triptolide에 의한 NF-${\kappa}B$ 의존성 IL-8 유전자 전사활성 억제기전)

  • Jee, Young-Koo;Kim, Yoon-Seup;Yun, Se-Young;Kim, Yong-Ho;Choi, Eun-Kyoung;Park, Jae-Seuk;Kim, Keu-Youl;Chea, Gi-Nam;Kwak, Sahng-June;Lee, Kye-Young
    • Tuberculosis and Respiratory Diseases / v.50 no.1 / pp.52-66 / 2001
  • Background : NF-${\kappa}B$ is the most important transcription factor in IL-8 gene expression. Triptolide is a new compound that has recently been shown to inhibit NF-${\kappa}B$ activation. The purpose of this study was to investigate how triptolide inhibits NF-${\kappa}B$-dependent IL-8 gene transcription in lung epithelial cells and to explore the potential for clinical application of triptolide in inflammatory lung diseases. Methods : A549 cells were used and triptolide was provided by Pharmagenesis Company (Palo Alto, CA). In order to examine NF-${\kappa}B$-dependent IL-8 transcriptional activity, we established stable A549 IL-8-NF-${\kappa}B$-luc. cells and performed luciferase assays. IL-8 gene expression was measured by RT-PCR and ELISA. Western blotting was performed to study $I{\kappa}B{\alpha}$ degradation, and an electrophoretic mobility shift assay was performed to analyze NF-${\kappa}B$ DNA binding. p65-specific transactivation was analyzed by a cotransfection study using a Gal4-p65 fusion protein expression system. To investigate the involvement of transcriptional coactivators, we performed a transfection study with CBP and SRC-1 expression vectors. Results : We observed that triptolide significantly suppresses NF-${\kappa}B$-dependent IL-8 transcriptional activity induced by IL-$1{\beta}$ and PMA. RT-PCR showed that triptolide represses both IL-$1{\beta}$- and PMA-induced IL-8 mRNA expression, and ELISA confirmed this triptolide-mediated IL-8 suppression at the protein level. However, triptolide did not affect $I{\kappa}B{\alpha}$ degradation or NF-${\kappa}B$ DNA binding. In a p65-specific transactivation study, triptolide significantly suppressed Gal4-p65TA1 and Gal4-p65TA2 activity, suggesting that triptolide inhibits NF-${\kappa}B$ activation by inhibiting p65 transactivation. However, this triptolide-mediated inhibition of p65 transactivation was not rescued by overexpression of CBP or SRC-1, thereby excluding a role for these transcriptional coactivators. Conclusions : Triptolide is a new compound that inhibits NF-${\kappa}B$-dependent IL-8 transcriptional activation by inhibiting p65 transactivation, but not through an $I{\kappa}B{\alpha}$-dependent mechanism. This suggests that triptolide may have therapeutic potential for inflammatory lung diseases.

Suggestion of Urban Regeneration Type Recommendation System Based on Local Characteristics Using Text Mining (텍스트 마이닝을 활용한 지역 특성 기반 도시재생 유형 추천 시스템 제안)

  • Kim, Ikjun;Lee, Junho;Kim, Hyomin;Kang, Juyoung
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.149-169 / 2020
  • "The Urban Renewal New Deal project", one of the government's major national projects, is about developing underdeveloped areas by investing 50 trillion won in 100 locations on the first year and 500 over the next four years. This project is drawing keen attention from the media and local governments. However, the project model which fails to reflect the original characteristics of the area as it divides project area into five categories: "Our Neighborhood Restoration, Housing Maintenance Support Type, General Neighborhood Type, Central Urban Type, and Economic Base Type," According to keywords for successful urban regeneration in Korea, "resident participation," "regional specialization," "ministerial cooperation" and "public-private cooperation", when local governments propose urban regeneration projects to the government, they can see that it is most important to accurately understand the characteristics of the city and push ahead with the projects in a way that suits the characteristics of the city with the help of local residents and private companies. In addition, considering the gentrification problem, which is one of the side effects of urban regeneration projects, it is important to select and implement urban regeneration types suitable for the characteristics of the area. In order to supplement the limitations of the 'Urban Regeneration New Deal Project' methodology, this study aims to propose a system that recommends urban regeneration types suitable for urban regeneration sites by utilizing various machine learning algorithms, referring to the urban regeneration types of the '2025 Seoul Metropolitan Government Urban Regeneration Strategy Plan' promoted based on regional characteristics. There are four types of urban regeneration in Seoul: "Low-use Low-Level Development, Abandonment, Deteriorated Housing, and Specialization of Historical and Cultural Resources" (Shon and Park, 2017). In order to identify regional characteristics, approximately 100,000 text data were collected for 22 regions where the project was carried out for a total of four types of urban regeneration. Using the collected data, we drew key keywords for each region according to the type of urban regeneration and conducted topic modeling to explore whether there were differences between types. As a result, it was confirmed that a number of topics related to real estate and economy appeared in old residential areas, and in the case of declining and underdeveloped areas, topics reflecting the characteristics of areas where industrial activities were active in the past appeared. In the case of the historical and cultural resource area, since it is an area that contains traces of the past, many keywords related to the government appeared. Therefore, it was possible to confirm political topics and cultural topics resulting from various events. Finally, in the case of low-use and under-developed areas, many topics on real estate and accessibility are emerging, so accessibility is good. It mainly had the characteristics of a region where development is planned or is likely to be developed. Furthermore, a model was implemented that proposes urban regeneration types tailored to regional characteristics for regions other than Seoul. Machine learning technology was used to implement the model, and training data and test data were randomly extracted at an 8:2 ratio and used. 
In order to compare the performance between various models, the input variables are set in two ways: Count Vector and TF-IDF Vector, and as Classifier, there are 5 types of SVM (Support Vector Machine), Decision Tree, Random Forest, Logistic Regression, and Gradient Boosting. By applying it, performance comparison for a total of 10 models was conducted. The model with the highest performance was the Gradient Boosting method using TF-IDF Vector input data, and the accuracy was 97%. Therefore, the recommendation system proposed in this study is expected to recommend urban regeneration types based on the regional characteristics of new business sites in the process of carrying out urban regeneration projects."
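
The model comparison described above can be sketched roughly as follows with scikit-learn (TF-IDF features plus the best-performing Gradient Boosting classifier); the toy region texts, labels, and the 8:2 split are placeholders, not the study's data.

```python
# TF-IDF vectorization + Gradient Boosting classification of regions by type.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

texts = [
    "old housing redevelopment rising real estate prices",
    "aging apartments repair housing welfare residents",
    "historic district cultural heritage festival tourism",
    "traditional market heritage preservation cultural events",
    "industrial decline factory closures lost jobs",
    "shuttered plants vacant warehouses past manufacturing",
    "transit hub good accessibility planned development",
    "subway station nearby land awaiting development",
]
labels = [
    "deteriorated_housing", "deteriorated_housing",
    "historical_cultural", "historical_cultural",
    "abandonment", "abandonment",
    "low_use_low_level", "low_use_low_level",
]

X = TfidfVectorizer().fit_transform(texts)            # TF-IDF input variables
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = GradientBoostingClassifier().fit(X_tr, y_tr)     # best classifier in the study
print(accuracy_score(y_te, clf.predict(X_te)))
```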

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk;Kim, Taeyeon;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.1-25 / 2020
  • In this paper, we suggest an application system architecture which provides an accurate, fast and efficient automatic gasometer reading function. The system captures a gasometer image with a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the device ID and gas usage amount by selective optical character recognition based on deep learning technology. In general, an image contains many types of characters and optical character recognition extracts all of them, but some applications need to ignore characters that are not of interest and focus only on specific types. For example, an automatic gasometer reading system only needs to extract the device ID and gas usage amount from gasometer images in order to bill users. Character strings that are not of interest, such as device type, manufacturer, manufacturing date, and specifications, are not valuable to the application. Thus, the application has to analyze the point-of-interest region and specific types of characters to extract only valuable information. We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition, which analyzes only the point-of-interest region for selective character information extraction. We built three neural networks for the application system. The first is a convolutional neural network that detects the point-of-interest regions of the gas usage amount and device ID character strings; the second is another convolutional neural network that transforms the spatial information of the point-of-interest region into sequential feature vectors; and the third is a bidirectional long short-term memory network that converts the sequential feature vectors into character strings using time-series analysis. In this research, the point-of-interest character strings are the device ID and gas usage amount; the device ID consists of 12 Arabic numerals and the gas usage amount consists of 4-5 Arabic numerals. All system components are implemented in the Amazon Web Services cloud with an Intel Xeon E5-2686 v4 CPU and an NVIDIA Tesla V100 GPU. The system architecture adopts a master-slave processing structure for efficient and fast parallel processing, coping with about 700,000 requests per day. The mobile device captures the gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes the reading request from the mobile device into an input queue with a FIFO (First In First Out) structure. The slave process consists of the three deep neural networks that perform character recognition and runs on the NVIDIA GPU. The slave process continuously polls the input queue for recognition requests. When a request from the master process is present in the input queue, the slave process converts the image into the device ID character string, the gas usage amount character string, and the position information of the strings, returns this information to an output queue, and switches back to idle mode to poll the input queue. The master process gets the final information from the output queue and delivers it to the mobile device. A total of 27,120 gasometer images were used for training, validation, and testing of the three deep neural networks: 22,985 images for training and validation and 4,135 images for testing. The 22,985 images were randomly split at an 8:2 ratio into training and validation sets for each training epoch. The 4,135 test images were categorized into five types (normal, noise, reflex, scale, and slant): normal denotes clean images, noise denotes images with noise signals, reflex denotes images with light reflection in the gasometer region, scale denotes images with small object size due to long-distance capture, and slant denotes images that are not horizontally level. The final character string recognition accuracies for the device ID and gas usage amount on normal data are 0.960 and 0.864, respectively.
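
A toy sketch of the master/slave FIFO queue pattern described above, using only the Python standard library in place of the AWS components; recognize() stands in for the three-network OCR pipeline and returns placeholder strings.

```python
# Master pushes reading requests into a FIFO input queue; slave polls it,
# runs recognition, and returns results through an output queue.
import queue, threading

input_q, output_q = queue.Queue(), queue.Queue()   # FIFO request/response queues

def recognize(image):
    # placeholder for: region detection -> feature extraction -> BiLSTM decoding
    return {"device_id": "000000000000", "usage": "0000"}

def slave_worker():
    while True:                                    # slave keeps polling for requests
        req_id, image = input_q.get()
        output_q.put((req_id, recognize(image)))
        input_q.task_done()

threading.Thread(target=slave_worker, daemon=True).start()

# master process: push a reading request, then collect the result
input_q.put(("req-1", b"jpeg-bytes"))
print(output_q.get())
```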