• Title/Summary/Keyword: Information System Types


Analysis and Improvement Strategies for Korea's Cyber Security Systems Regulations and Policies

  • Park, Dong-Kyun; Cho, Sung-Je; Soung, Jea-Hyen
    • Korean Security Journal / no.18 / pp.169-190 / 2009
  • Today, the rapid advance of scientific technologies has brought about fundamental changes to the types and levels of terrorism, while the war against more than one thousand terrorist and criminal organizations around the world has already begun. A method highly likely to be employed by terrorist groups using 21st-century state-of-the-art technology is cyber terrorism. In many instances, things that could only be imagined in reality can be made possible in cyberspace. A simple example would be randomly altering a letter in the blood type of a targeted person in a health care data system, which could inflict harm on that person and help overturn an opponent's system or regime. The CIH virus crisis that occurred on April 26, 1999 had significant implications in various respects. A virus program of just a few lines, written by a Taiwanese college student with no specific objective, ended up spreading widely across the Internet, damaging 30,000 PCs in Korea and causing over 2 billion won in repair and data-recovery costs. Despite such risks of cyber terrorism, a great number of Korean sites employ loose security measures; in fact, there are many cases where companies with millions of subscribers run very lax security systems. A nationwide preparation for cyber terrorism is called for. In this context, this research analyzes the current status of Korea's cyber security systems and laws from a policy perspective and proposes improvement strategies. This research suggests the following solutions. First, the National Cyber Security Management Act should be passed to serve as the national cyber security management regulation. With the Act's establishment, a more efficient and proactive response to cyber security management will become possible within a nationwide cyber security framework, and the Act's relationship with other related laws will be defined. The newly passed National Cyber Security Management Act would eliminate inefficiencies caused by functional redundancies dispersed across individual sectors in current legislation. Second, to ensure efficient nationwide cyber security management, national cyber security standards and models should be proposed, and at the same time a national cyber security management organizational structure should be established to implement national cyber security policies across government agencies and social components. The National Cyber Security Center must serve as the comprehensive collection, analysis, and processing point for national cyber-crisis-related information, oversee each government agency, and build collaborative relations with the private sector. Also, a comprehensive national response system in which both the private and public sectors participate should be set up, for advance detection and prevention of cyber crisis risks and for a consolidated and timely response using national resources in times of crisis.


A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin; Lee, Jonghoon; Han, Sangjin; Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Maintenance and failure prevention through anomaly detection of ICT infrastructure is becoming important. System monitoring data is multidimensional time series data, and when dealing with such data it is difficult to consider both its multidimensional characteristics and its time series characteristics. When dealing with multidimensional data, correlation between variables should be considered. Existing approaches such as probability-based, linear, and distance-based methods degrade because of a limitation known as the curse of dimensionality. In addition, time series data is usually preprocessed by applying a sliding-window technique and time series decomposition for autocorrelation analysis; these techniques further increase the dimensionality of the data, so they need to be supplemented. Anomaly detection is an old research field: statistical methods and regression analysis were used in the early days, and there are currently active studies applying machine learning and artificial neural network technology. Statistically based methods are difficult to apply when data is non-homogeneous and do not detect local outliers well. The regression-based method learns a regression formula based on parametric statistics and then detects abnormality by comparing predicted and actual values. Anomaly detection using regression analysis has the disadvantage that performance drops when the model is not solid or when the data contains noise or outliers, and it carries the restriction that training data containing noise or outliers must be used. An autoencoder built on artificial neural networks is trained to produce output as similar as possible to its input. It has many advantages over existing probabilistic and linear models, cluster analysis, and supervised learning: it can be applied to data that does not satisfy a probability distribution or linearity assumption, and it can be trained without labeled data in an unsupervised manner. However, it still has limitations in identifying local outliers in multidimensional data, and the dimensionality of the data greatly increases due to the characteristics of time series data. In this study, we propose a Conditional Multimodal Autoencoder (CMAE) that enhances anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to mitigate the limitation in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn different types of inputs, such as voice and image; the different modalities share the autoencoder's bottleneck and thereby learn correlations. In addition, a Conditional Autoencoder (CAE) was used to learn the characteristics of time series data effectively without increasing the dimensionality of the data. In general, conditional inputs are mainly categorical variables, but in this study time was used as the condition to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The restoration performance of the autoencoders for 41 variables was confirmed for the proposed model and the comparison models. Restoration performance differs by variable; restoration works well for the Memory, Disk, and Network modalities, whose loss values are small in all three autoencoder models. The Process modality did not show a significant difference across the three models, and the CPU modality showed excellent performance in CMAE. ROC curves were prepared to evaluate anomaly detection performance in the proposed and comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance followed the order CMAE, MAE, UAE. In particular, the recall was 0.9828 for CMAE, confirming that it detects almost all anomalies. The accuracy of the model also improved, to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. In practical terms, the proposed model has an additional advantage beyond the performance improvement: techniques such as time series decomposition and sliding windows require managing extra procedures, and the resulting dimensional increase can slow down inference. The proposed model is therefore easy to apply to practical tasks in terms of inference speed and model management.
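For readers who want a concrete picture of the architecture described in this abstract, the following is a minimal, hypothetical PyTorch sketch of a conditional multimodal autoencoder: per-modality encoders feed a shared bottleneck, and a time condition is supplied to both encoders and decoders. The modality split, layer sizes, and sine/cosine hour encoding are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of a conditional multimodal autoencoder (CMAE); all sizes are illustrative.
import torch
import torch.nn as nn

class CMAE(nn.Module):
    def __init__(self, modal_dims, cond_dim=2, hidden=32, bottleneck=16):
        super().__init__()
        # One encoder per modality (e.g. CPU, memory, disk, network metrics), conditioned on time.
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d + cond_dim, hidden), nn.ReLU()) for d in modal_dims
        )
        # Shared bottleneck forces the modalities to learn a joint (correlated) representation.
        self.bottleneck = nn.Sequential(nn.Linear(hidden * len(modal_dims), bottleneck), nn.ReLU())
        # One decoder per modality, also conditioned on time.
        self.decoders = nn.ModuleList(
            nn.Sequential(nn.Linear(bottleneck + cond_dim, hidden), nn.ReLU(), nn.Linear(hidden, d))
            for d in modal_dims
        )

    def forward(self, modal_inputs, cond):
        encoded = [enc(torch.cat([x, cond], dim=1)) for enc, x in zip(self.encoders, modal_inputs)]
        z = self.bottleneck(torch.cat(encoded, dim=1))
        return [dec(torch.cat([z, cond], dim=1)) for dec in self.decoders]

def time_condition(hour_of_day):
    # Encode the hour as sine/cosine so the model can learn daily periodicity.
    angle = 2 * torch.pi * hour_of_day / 24.0
    return torch.stack([torch.sin(angle), torch.cos(angle)], dim=1)

# Toy usage: four modalities totalling 41 variables; reconstruction error serves as anomaly score.
model = CMAE(modal_dims=[10, 10, 11, 10])
xs = [torch.randn(8, d) for d in [10, 10, 11, 10]]
cond = time_condition(torch.arange(8).float())
recons = model(xs, cond)
score = sum(((x - r) ** 2).mean(dim=1) for x, r in zip(xs, recons))  # higher = more anomalous
```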

Comparison of Deep Learning Frameworks: About Theano, Tensorflow, and Cognitive Toolkit (딥러닝 프레임워크의 비교: 티아노, 텐서플로, CNTK를 중심으로)

  • Chung, Yeojin; Ahn, SungMahn; Yang, Jiheon; Lee, Jaejoon
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.1-17 / 2017
  • A deep learning framework is software designed to help develop deep learning models; its important functions include automatic differentiation and GPU utilization. The list of popular deep learning frameworks includes Caffe (BVLC) and Theano (University of Montreal), and recently Microsoft's deep learning framework, the Microsoft Cognitive Toolkit, was released under an open-source license, following Google's Tensorflow a year earlier. The early deep learning frameworks were developed mainly for research at universities. Since the release of Tensorflow, however, companies such as Microsoft and Facebook have joined the competition in framework development. Given this trend, Google and other companies are expected to continue investing in deep learning frameworks to take the initiative in the artificial intelligence business. From this point of view, we think it is a good time to compare deep learning frameworks, so we compare three that can be used as Python libraries: Google's Tensorflow, Microsoft's CNTK, and Theano, which is in some sense a predecessor of the other two. The most common and important function of deep learning frameworks is the ability to perform automatic differentiation. Basically, all the mathematical expressions of deep learning models can be represented as computational graphs consisting of nodes and edges. Partial derivatives on each edge of a computational graph can then be obtained, and with these partial derivatives the software can compute the derivative of any node with respect to any variable using the chain rule of calculus. First of all, the convenience of coding is, in order, CNTK, Tensorflow, and Theano. This criterion is based simply on code length; the learning curve and ease of coding were not the main concern. By this criterion, Theano was the most difficult to implement with, and CNTK and Tensorflow were somewhat easier. With Tensorflow, weight variables and biases must be defined explicitly. The reason CNTK and Tensorflow are easier to implement with is that those frameworks provide more abstraction than Theano. We should mention, however, that low-level coding is not always bad: it gives flexibility. With low-level coding such as in Theano, we can implement and test any new deep learning model or any new search method we can think of. Our assessment of the execution speed of each framework is that there is no meaningful difference. According to the experiment, the execution speeds of Theano and Tensorflow are very similar, although the experiment was limited to a CNN model. In the case of CNTK, the experimental environment could not be kept the same: the CNTK code had to be run in a PC environment without a GPU, where code executes as much as 50 times slower than with a GPU. We nevertheless concluded that the difference in execution speed was within the range of variation caused by the different hardware setup. In this study, we compared three deep learning frameworks: Theano, Tensorflow, and CNTK. According to Wikipedia, there are 12 available deep learning frameworks, and 15 different attributes differentiate them. Some of the important attributes include the interface language (Python, C++, Java, etc.) and the availability of libraries for various deep learning models such as CNN, RNN, and DBN. For a user implementing a large-scale deep learning model, support for multiple GPUs or multiple servers is also important, and for someone learning deep learning, the availability of sufficient examples and references matters as well.
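As a concrete illustration of the automatic differentiation the abstract describes, here is a small Python example using TensorFlow's GradientTape. Note that this is the modern TensorFlow 2.x API, which postdates the frameworks compared in the paper, and the toy model is our own.

```python
# Automatic differentiation on a tiny computational graph via the chain rule.
import tensorflow as tf

w = tf.Variable([[0.5, -0.2]])   # weights (a node in the graph)
b = tf.Variable([0.1])           # bias
x = tf.constant([[1.0], [2.0]])  # input
y_true = tf.constant([[0.3]])

with tf.GradientTape() as tape:
    y_pred = tf.matmul(w, x) + b                     # nodes and edges of the graph
    loss = tf.reduce_mean((y_pred - y_true) ** 2)

# Partial derivatives of the loss with respect to any variable, obtained automatically.
dw, db = tape.gradient(loss, [w, b])
print(dw.numpy(), db.numpy())
```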

Comparison of Landcover Map Accuracy Using High Resolution Satellite Imagery (고해상도 위성영상의 토지피복분류와 정확도 비교 연구)

  • Oh, Che-Young; Park, So-Young; Kim, Hyung-Seok; Lee, Yanng-Won; Choi, Chul-Uong
    • Journal of the Korean Association of Geographic Information Studies / v.13 no.1 / pp.89-100 / 2010
  • The aim of this study is to produce land cover maps using satellite imagery of various high resolutions and then compare accuracy across image types and classification categories. For the land cover map produced at the small-scale (detailed) classification level, the estuary area around the Nakdong river, including an urban area, farmland, and waters, was selected. The images, consisting of aerial photographs and imagery from the KOMPSAT-2, QuickBird, and IKONOS satellites, all with a spatial resolution of about 1 m or finer, were classified by visual interpretation. Once all of the land cover maps with different images and land cover categories had been produced, they were compared with each other. The results show that classification accuracy from the aerial photos and QuickBird was relatively higher than from KOMPSAT-2 and IKONOS. The agreement ratio for the large-scale classification across the classification methods ranged between 0.934 and 0.956 in most cases, with Kappa values between 0.905 and 0.937; the agreement ratio for the middle-scale classification was 0.888~0.913 with Kappa values of 0.872~0.901; and the agreement ratio for the small-scale classification was 0.833~0.901 with Kappa values of 0.813~0.888. In terms of where confusion occurred across the images, the large-scale classification showed confusion between urbanized and dry areas and bare land. For the middle-scale classification, confusion mainly occurred among rice paddies, fields, greenhouse cultivation areas, and artificial grassland. For the small-scale classification, confusion mainly occurred among natural grassland, facility cultivation land, tidal flats, and the sea surface. The findings indicate that visual classification of the high-resolution images achieved an agreement ratio of over 80%, which means it can be used in practice. They also suggest that using higher-resolution images can increase classification accuracy, and that the acquisition time of the images is important in producing land cover maps.
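The accuracy measures reported above (overall agreement ratio and Cohen's kappa) can be computed as in the following generic scikit-learn sketch; the class labels and the tiny sample are made up for illustration and are not the authors' data.

```python
# Agreement ratio and kappa from paired reference/classified labels.
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

reference  = ["urban", "urban", "water", "crop", "crop", "forest", "water", "urban"]
classified = ["urban", "crop",  "water", "crop", "crop", "forest", "water", "urban"]

agreement = accuracy_score(reference, classified)   # proportion of matching samples
kappa = cohen_kappa_score(reference, classified)    # agreement corrected for chance
cm = confusion_matrix(reference, classified)        # rows: reference, columns: classified
print(f"agreement={agreement:.3f}, kappa={kappa:.3f}")
print(cm)
```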

A Survey for Source Reduction and Recycling of Household Waste in Seoul Metropolitan area (도시생활쓰레기의 발생억제 및 재활용에 대한 수도권주민의식 조사분석)

  • Namkoong, Wan; Sohn, Tai-Ik
    • Journal of the Korea Organic Resources Recycling Association / v.2 no.2 / pp.89-98 / 1994
  • A survey was carried out in the Seoul Metropolitan area during December 1993 and January 1994. The objective of the survey was to provide useful information for the development and improvement of recycling policies, regulations, and systems in Korea. Of the 782 individuals contacted, 473 completed and returned surveys, of which 437 were usable. The results were analyzed using the statistical package SAS (Statistical Analysis System). The results indicated that 86% of apartment areas have recycling bins, while only 33% of detached-dwelling (individual house) areas have them. About half of the respondents felt that food waste is the major source of household waste. The most serious obstacle to recycling more household waste is providing space to store recyclables at the source. The majority of Seoul Metropolitan residents (78.5%) are willing to participate in recycling programs, while 14.4% want to participate only when there are economic incentives or benefits; respondents who wanted economic incentives tended to be low-income. 66.1% of total respondents said that they do not use disposables, but only 53.0% of respondents under 30 years old answered that they do not. People who graduated only from middle or high school and are under 30 years old tend to dispose of used milk cartons without rinsing and drying, while those who are over 40 years old and graduated from university preferred to rinse and dry used milk cartons before disposal. Regarding disposal of newspapers, only 43.9% of the total respondents separated newspaper from other types of used paper. In the case of aluminum cans, 22.5% of the total respondents answered that used aluminum cans are not recyclable, and a much higher portion (30.4%) of the respondents who graduated only from middle or high school felt that aluminum cans have no recycling value. The results indicated that education and information regarding recycling are highly desirable.


Analysis of Knowledge Community for Knowledge Creation and Use (지식 생성 및 활용을 위한 지식 커뮤니티 효과 분석)

  • Huh, Jun-Hyuk; Lee, Jung-Seung
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.85-97 / 2010
  • Internet communities are a typical space for knowledge creation and use on the Internet, as people discuss their common interests within them. When we define 'knowledge communities' as internet communities related to knowledge creation and use, they can be categorized into four types: 'Search Engine,' 'Open Communities,' 'Specialty Communities,' and 'Activity Communities.' No type of knowledge community remains static; rather, each changes with time and is also affected by the external business environment. Therefore, it is critical to develop processes for the practical use of such changeable knowledge communities, yet there is little research on a strategic framework for knowledge communities as a source of knowledge creation and use. The purposes of this study are (1) to find factors that can affect knowledge creation and use for each type of knowledge community and (2) to develop a strategic framework for the practical use of knowledge communities. Based on previous research, we identified 7 factors that have considerable impact on knowledge creation and use: 'Fitness,' 'Reliability,' 'Systemicity,' 'Richness,' 'Similarity,' 'Feedback,' and 'Understanding.' We created 30 questions covering common sense, IT, business, and hobbies, selected uniformly across the types of knowledge community. Instead of a survey, we posed these questions to users of 4 representative web sites: Google for Search Engine, NAVER Knowledge iN for Open Communities, SLRClub for Specialty Communities, and Wikipedia for Activity Communities. These 4 sites were selected based on popularity (i.e., the 4 most popular sites in Korea) and were also among the sites most frequently mentioned in previous research. The answers to the 30 knowledge questions were collected and evaluated by 11 IT experts who had been working for IT companies for more than 3 years, using the above 7 knowledge factors as criteria. Using stepwise linear regression on the evaluations of the 7 knowledge factors, we found that each factor affects knowledge creation and use differently for each type of knowledge community. The stepwise regression results showed the relationship between 'Understanding' and the other knowledge factors, and this relationship differed by type of knowledge community: 'Understanding' was significantly related to 'Reliability' for the Search Engine type, to 'Fitness' for the Open Community type, to 'Reliability' and 'Similarity' for the Specialty Community type, and to 'Richness' and 'Similarity' for the Activity Community type. A strategic framework was created from these results, and such a framework can be useful for knowledge communities that are not stable over time. For the success of a knowledge community, the results suggest that it is essential to secure the factors that influence that community and to reinforce the factor that has a unique influence on it. Changeable knowledge communities should thus be transformed into an adequate type with proper business strategies and objectives, or progress into a type that covers various types of knowledge communities. For example, DCInside started as a small specialty community focusing on digital camera hardware and camera work and was then transformed into an open community focusing on social issues through its well-known photo galleries. NAVER started as a typical search engine and now covers an open community and a specialty community through additional web services such as NAVER Knowledge iN, NAVER Cafe, and NAVER Blog. NAVER is currently competing with an activity community such as Wikipedia through the NAVER encyclopedia, which provides its users with services similar to Wikipedia's. Finally, the results of this study provide practical guidance for practitioners on which type of knowledge community is most appropriate for a fluctuating business environment as knowledge communities themselves evolve with time.
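The stepwise linear regression step described in this abstract could be sketched in Python as follows; the forward-selection procedure, the p-value threshold, and the toy data are our assumptions, since the abstract does not specify the exact variant used.

```python
# Forward stepwise selection of knowledge factors explaining 'Understanding' (illustrative).
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(df, target, candidates, p_enter=0.05):
    selected = []
    while True:
        best_p, best_var = 1.0, None
        for var in candidates:
            if var in selected:
                continue
            X = sm.add_constant(df[selected + [var]])
            p = sm.OLS(df[target], X).fit().pvalues[var]
            if p < best_p:
                best_p, best_var = p, var
        if best_var is None or best_p >= p_enter:
            break
        selected.append(best_var)
    return selected

factors = ["Fitness", "Reliability", "Systemicity", "Richness", "Similarity", "Feedback"]
# Toy stand-in for the 11 experts' evaluations (the real data is not public here).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(120, 7)), columns=factors + ["Understanding"])
print(forward_stepwise(df, "Understanding", factors))
```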

Prediction of commitment and persistence in heterosexual involvements according to the styles of loving using a datamining technique (데이터마이닝을 활용한 사랑의 형태에 따른 연인관계 몰입수준 및 관계 지속여부 예측)

  • Park, Yoon-Joo
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.69-85 / 2016
  • A successful relationship with a loving partner is one of the most important factors in life. In psychology, there has been previous research studying the factors that influence romantic relationships, but most of it was based on statistical analysis and thus has limitations in analyzing complex non-linear relationships or rule-based reasoning. This research analyzes commitment and persistence in heterosexual involvement according to styles of loving, using a data mining technique as well as statistical methods. We consider six styles of loving, 'eros', 'ludus', 'storge', 'pragma', 'mania', and 'agape', which influence romantic relationships between lovers, in addition to the factors suggested by previous research. These six types of love are defined by Lee (1977) as follows: 'eros' is romantic, passionate love; 'ludus' is game-playing or uncommitted love; 'storge' is slowly developing, friendship-based love; 'pragma' is a pragmatic, practical, mutually beneficial relationship; 'mania' is obsessive or possessive love; and 'agape' is a gentle, caring, giving type of love, brotherly love, not concerned with the self. Data from 105 heterosexual couples were collected for this research. Using the data, a linear regression was first performed to find the important factors associated with commitment to a partner. The results show that 'satisfaction', 'eros', and 'agape' are significant factors associated with the commitment level for both males and females. Interestingly, for males 'agape' has a greater effect on commitment than 'eros', while for females 'eros' is a more significant factor than 'agape'; in addition, the male's 'investment' is also a crucial factor for male commitment. Next, decision tree analysis was performed to characterize high-commitment and low-commitment couples, using the 'decision tree' operator in the data mining tool RapidMiner. The experimental results show that males with a high satisfaction level in the relationship show a high commitment level; and even when a male does not have a high satisfaction level, if he has made a lot of financial or emotional investment in the relationship and his partner shows him a certain amount of 'agape', he also shows a high commitment level. In the case of females, a woman with high 'eros' and 'satisfaction' levels shows a high commitment level; otherwise, even when a female does not have a high satisfaction level, if her partner shows a certain amount of 'mania' then she also shows a high commitment level. Finally, this research built a prediction model, using a decision tree, of whether the relationship will persist or break up. The results show that the most important factor influencing a break-up is the male's 'narcissistic tendency'; in addition, the 'satisfaction', 'investment', and 'mania' of both partners also affect a break-up. Interestingly, while the male's 'mania' level works positively to maintain the relationship, the female's has a negative influence. The contribution of this research is the adoption of a new data mining approach to analysis in psychology, and the results can provide useful advice to couples for building a harmonious relationship. This research has several limitations. First, the experimental data were sampled using an oversampling technique to balance the class sizes, which limits an objective evaluation of the predictive models' performance. Second, the outcome data, whether the relationship persisted or not, were collected over a relatively short period, 6 months after the initial data collection. Lastly, most of the survey respondents are in their 20s; to obtain more general results, we would like to extend this research to general populations.
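The decision-tree step might be approximated as follows; the paper used RapidMiner's 'decision tree' operator, so the scikit-learn code below is only an analogous sketch, and the data, the label definition, and the tree depth are illustrative assumptions.

```python
# Illustrative decision tree for "high commitment" from love-style and relationship features.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

features = ["satisfaction", "investment", "eros", "ludus", "storge",
            "pragma", "mania", "agape", "narcissistic_tendency"]

# Toy stand-in for the 105-couple dataset; real responses are not reproduced here.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.uniform(1, 5, size=(105, len(features))), columns=features)
df["high_commitment"] = (df["satisfaction"] + df["eros"] > 6).astype(int)  # toy label

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["high_commitment"], test_size=0.3, random_state=42)
tree = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_train)
print(export_text(tree, feature_names=features))   # human-readable decision rules
print("accuracy:", tree.score(X_test, y_test))
```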

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction, but these methods have not produced superior performance. In recent years, machine learning techniques have been widely used in stock market prediction, including artificial neural networks, SVM, and genetic algorithms. In particular, a case-based reasoning method known as the k-nearest neighbor method is also widely used for stock price prediction. Case-based reasoning retrieves several similar cases from previous cases when a new problem occurs, and combines the class labels of the similar cases to create a classification for the new problem. However, case-based reasoning has some problems. First, it tends to search for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the best similar neighbors for the target case; thus it may take more cases into account even when fewer cases are actually applicable. Second, it may select neighbors that are far away from the target case. Case-based reasoning therefore does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability using the k-nearest neighbor method, and compares its predictability with a random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted by dividing the learning dataset into two types. For the prediction of the next day's closing price, we used four variables: opening price, daily high, daily low, and daily close. In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process; in the second experiment, data from January 1, 2015 to December 31, 2017 were used. The test data were from January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning datasets. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN in the first experiment, when the learning data was small. However, the MAPE for the random walk model was 1.3497 and for k-NN was 1.2928 in the second experiment, when the learning data was large. These results show that predictive power is higher when more learning data are used than when less learning data are used. This paper also shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets and does not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. Also, to produce better results, it is recommended that the k-nearest neighbor method find nearest neighbors using a second-step filtering method that considers fundamental economic variables as well as a sufficient amount of learning data.
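The k-NN versus random-walk comparison the abstract describes can be sketched as follows; the synthetic price series, the value of k, and the train/test split are illustrative assumptions, not the paper's data or settings.

```python
# k-NN regression on [open, high, low, close] versus a random-walk baseline, scored with MAPE.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
close = 50000 + np.cumsum(rng.normal(0, 500, size=600))             # toy closing prices
ohlc = np.column_stack([close, close * 1.01, close * 0.99, close])  # toy open/high/low/close

X, y = ohlc[:-1], close[1:]          # predict the next day's close from today's OHLC
split = 500
knn = KNeighborsRegressor(n_neighbors=5).fit(X[:split], y[:split])

pred_knn = knn.predict(X[split:])
pred_rw = close[split:-1]            # random walk: tomorrow's close = today's close

mape = lambda actual, pred: np.mean(np.abs((actual - pred) / actual)) * 100
print("k-NN MAPE:", mape(y[split:], pred_knn))
print("random walk MAPE:", mape(y[split:], pred_rw))
```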

Nonlinear Vector Alignment Methodology for Mapping Domain-Specific Terminology into General Space (전문어의 범용 공간 매핑을 위한 비선형 벡터 정렬 방법론)

  • Kim, Junwoo; Yoon, Byungho; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.127-146 / 2022
  • Recently, as word embedding has shown excellent performance in various deep-learning-based natural language processing tasks, research on the advancement and application of word, sentence, and document embedding is being actively conducted. Among these topics, cross-language transfer, which enables semantic exchange between different languages, is growing together with the development of embedding models. Academia's interest in vector alignment is growing with the expectation that it can be applied to various embedding-based analyses. In particular, vector alignment is expected to be applied to mapping between specialized domains and general domains. In other words, it is expected to make it possible to map the vocabulary of specialized fields such as R&D, medicine, and law into the space of a pre-trained language model learned from a huge volume of general-purpose documents, or to provide a clue for mapping vocabulary between different specialized fields. However, the linear vector alignment that has mainly been studied in academia assumes statistical linearity and therefore tends to oversimplify the vector space; it essentially assumes that the different vector spaces are geometrically similar, which inevitably introduces distortion in the alignment process. To overcome this limitation, we propose a deep-learning-based vector alignment methodology that effectively learns the nonlinearity of the data. The proposed methodology consists of the sequential learning of a skip-connected autoencoder and a regression model to align the specialized word embeddings, expressed in their own space, to the general embedding space. Finally, through inference with the two trained models, the specialized vocabulary can be aligned into the general space. To verify the performance of the proposed methodology, an experiment was performed on a total of 77,578 documents in the field of health care among national R&D tasks performed from 2011 to 2020. As a result, it was confirmed that the proposed methodology showed superior performance in terms of cosine similarity compared to existing linear vector alignment.
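The two-stage alignment described above, a skip-connected autoencoder followed by a regression model trained sequentially, might look roughly like the following PyTorch sketch; the embedding dimension, layer sizes, and training details are assumptions rather than the authors' configuration.

```python
# Sequential training: skip-connected autoencoder, then regression into the general space.
import torch
import torch.nn as nn

class SkipAutoencoder(nn.Module):
    def __init__(self, dim=300, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x):
        # Skip connection: the decoder output is added back to the input.
        return x + self.dec(self.enc(x))

specialized = torch.randn(1000, 300)   # toy domain-specific word vectors
general     = torch.randn(1000, 300)   # toy general-purpose vectors (paired with the above)

autoencoder = SkipAutoencoder()
regressor = nn.Sequential(nn.Linear(300, 300), nn.ReLU(), nn.Linear(300, 300))
loss_fn = nn.MSELoss()

# Stage 1: train the skip-connected autoencoder on the specialized embeddings.
opt1 = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
for _ in range(100):
    opt1.zero_grad()
    loss = loss_fn(autoencoder(specialized), specialized)
    loss.backward()
    opt1.step()

# Stage 2: regress the autoencoder output onto the paired general-space vectors.
opt2 = torch.optim.Adam(regressor.parameters(), lr=1e-3)
for _ in range(100):
    opt2.zero_grad()
    loss = loss_fn(regressor(autoencoder(specialized).detach()), general)
    loss.backward()
    opt2.step()

# Inference: align a new specialized vector into the general space via both trained models.
aligned = regressor(autoencoder(torch.randn(1, 300)))
```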

Utilizing the Idle Railway Sites: A Proposal for the Location of Solar Power Plants Using Cluster Analysis (철도 유휴부지 활용방안: 군집분석을 활용한 태양광발전 입지 제안)

  • Eunkyung Kang; Seonuk Yang; Jiyoon Kwon; Sung-Byung Yang
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.79-105 / 2023
  • Due to unprecedented extreme weather events driven by global warming and climate change, many parts of the world are suffering severely, and economic losses are also snowballing. To address these problems, the Paris Agreement was signed in 2016 and an intergovernmental consultative body was formed to keep the rise in the Earth's average temperature below 1.5℃. Korea also declared 'Carbon Neutrality by 2050' to prevent a climate catastrophe. In particular, it was found that the increase in temperature caused by greenhouse gas emissions hurts the environment and society as a whole, as well as Korea's export-dependent economy. In addition, as the diversification of transportation types accelerates, changes in the means of transportation people choose are also increasing. As the development paradigm of the low-growth era shifts to urban regeneration, interest in idle railway sites is rising due to reduced demand for routes, realignment of lines, and the relocation of urban railways. Meanwhile, by utilizing railway sites that are already developed but idle, the solar power generation goal of 'Renewable Energy 3020' can be partially achieved while avoiding the environmental damage and resident-acceptance issues that usually surround siting decisions; however, actual use of and planning for such solar power facilities are still lacking. Therefore, in this study, using big data provided by the Korea National Railway and the Renewable Energy Cloud Platform, we develop an algorithm to discover and analyze suitable idle sites where solar power generation facilities can be installed, and we identify potentially applicable areas considering the conditions desired by users. By searching for and deriving these relevant idle sites, the study aims to devise a plan that saves enormous facility and expansion costs in the early stages of development. This study uses various cluster analyses to develop an optimal algorithm that can derive solar power plant locations on idle railway sites and, as a result, suggests 202 'actively recommended areas.' These results should help decision-makers make rational decisions that simultaneously consider the economy and the environment.
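One plausible reading of the cluster-analysis step is sketched below using k-means in scikit-learn; the site attributes, the choice of k-means, and the favourability scoring are illustrative assumptions, since the abstract does not detail the algorithm or the variables used.

```python
# Grouping candidate idle railway sites by attributes and ranking clusters for solar suitability.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
sites = pd.DataFrame({
    "area_m2": rng.uniform(500, 20000, 300),          # usable site area (toy values)
    "irradiance_kwh_m2": rng.uniform(3.0, 4.5, 300),  # mean daily solar irradiance
    "grid_distance_km": rng.uniform(0.1, 15.0, 300),  # distance to the nearest substation
    "slope_deg": rng.uniform(0, 20, 300),             # terrain slope
})

X = StandardScaler().fit_transform(sites)
sites["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Rank clusters by a simple favourability profile: large, sunny, flat, close to the grid.
profile = sites.groupby("cluster").mean()
profile["score"] = (profile["area_m2"].rank() + profile["irradiance_kwh_m2"].rank()
                    + profile["grid_distance_km"].rank(ascending=False)
                    + profile["slope_deg"].rank(ascending=False))
print(profile.sort_values("score", ascending=False))
```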