• Title/Summary/Keyword: Supervised classification

A Detection Model using Labeling based on Inference and Unsupervised Learning Method (추론 및 비교사학습 기법 기반 레이블링을 적용한 탐지 모델)

  • Hong, Sung-Sam;Kim, Dong-Wook;Kim, Byungik;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.18 no.1 / pp.65-75 / 2017
  • A detection model finds results for a specific purpose using artificial intelligence, data mining, and intelligent algorithms. In cyber security, such models are typically used to detect intrusions, malware, cyber incidents, and attacks. Large amounts of unlabeled data are collected in real environments, security data being one example. Because most of these data have no class labels, it is difficult to know the type of each record. A label determination process is therefore required for accurate detection and analysis. In this paper, we propose KDFL (K-means and D-S Fusion based Labeling), a method that fuses D-S inference with the k-means (unsupervised) algorithm to decide the label of each data record, together with a detection model architecture that uses the proposed labeling method. The proposed method showed a better detection rate, accuracy, and F1-measure than other methods. It also showed an improved error rate, which confirms the good performance of the proposed method.
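
The abstract does not say how KDFL fuses its evidence, so the following is only a minimal sketch of Dempster's rule of combination, the standard fusion step in D-S inference. The two evidence sources, their mass values, and the names `cluster_evidence` and `rule_evidence` are hypothetical and are not taken from the paper.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its belief mass.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Mass that falls on contradictory hypotheses.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {h: v / (1.0 - conflict) for h, v in combined.items()}


ATTACK = frozenset({"attack"})
NORMAL = frozenset({"normal"})
EITHER = ATTACK | NORMAL  # ignorance: the source cannot decide

# Hypothetical masses, e.g. one source derived from a k-means cluster
# and one from some other inference step.
cluster_evidence = {ATTACK: 0.6, NORMAL: 0.1, EITHER: 0.3}
rule_evidence = {ATTACK: 0.5, NORMAL: 0.2, EITHER: 0.3}

fused = dempster_combine(cluster_evidence, rule_evidence)
# Pick the single hypothesis with the largest fused mass as the label.
label = max((h for h in fused if len(h) == 1), key=fused.get)
print(sorted(label), round(fused[label], 3))
```

Keeping an explicit "either" hypothesis lets a weak source express ignorance instead of forcing a label, which is the usual reason for preferring D-S fusion over simple voting.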

Detection of Small Green Space in an Urban Area Using Airborne Hyperspectral Imagery and Spectral Angle Mapper (분광각매퍼 기법을 적용한 항공기 탑재 초분광영상의 소규모 녹지공간 탐지)

  • Kim, Tae-Woo;Choi, Don-Jeong;We, Gwang-Jae;Suh, Yong-Cheol
    • Journal of the Korean Association of Geographic Information Studies / v.16 no.2 / pp.88-100 / 2013
  • Urban green space is one of the most important aspects of urban infrastructure for improving the quality of life of city dwellers, as it reduces the heat island effect and is used for recreation and relaxation. However, no systematic management of urban green space has been introduced in Korea, as past practices focused on efficient development. A way to calculate the amount of green space needed to complement an urban area must be developed to preserve urban green space and to set 'regulations determining the total amount of greenery'. In recent years, various studies have quantified urban green space and infrastructure using remotely sensed data. However, given the spatial resolution of the data used in existing research, it is difficult to detect the many small green spaces in a city effectively. In this paper, we quantified small urban green spaces using CASI-1500 hyperspectral imagery. We calculated MCARI, a vegetation index for hyperspectral imagery, to evaluate the greenness of small green spaces. In addition, we applied image-classification methods, including the ISODATA algorithm and Spectral Angle Mapper, to detect small green spaces using supervised and unsupervised classifications. This could be used to categorize land cover into four classes: unclassified, impervious, suspected green, and vegetation green.
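
For readers unfamiliar with Spectral Angle Mapper, the sketch below shows the core idea: each pixel spectrum is compared with reference spectra by the angle between them, and the pixel takes the class of the closest reference unless the angle exceeds a threshold. The four-band reference spectra, the class names used as keys, and the 0.1 rad threshold are illustrative assumptions, not values from the paper.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cosine)


def classify(pixel, references, max_angle=0.1):
    """Return the reference class with the smallest angle, or 'unclassified'."""
    best = min(references, key=lambda name: spectral_angle(pixel, references[name]))
    if spectral_angle(pixel, references[best]) <= max_angle:
        return best
    return "unclassified"


# Hypothetical four-band reference spectra (reflectance values).
references = {
    "vegetation": [0.05, 0.08, 0.04, 0.50],
    "impervious": [0.20, 0.22, 0.24, 0.26],
}

bright_vegetation = [0.10, 0.16, 0.08, 1.00]  # same shape, twice as bright
odd_pixel = [0.50, 0.10, 0.50, 0.10]          # resembles neither reference

print(classify(bright_vegetation, references))
print(classify(odd_pixel, references))
```

Because the angle ignores vector length, a brighter or darker pixel with the same spectral shape still matches its reference, which makes the method fairly robust to illumination differences.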

A Novel of Data Clustering Architecture for Outlier Detection to Electric Power Data Analysis (전력데이터 분석에서 이상점 추출을 위한 데이터 클러스터링 아키텍처에 관한 연구)

  • Jung, Se Hoon;Shin, Chang Sun;Cho, Young Yun;Park, Jang Woo;Park, Myung Hye;Kim, Young Hyun;Lee, Seung Bae;Sim, Chun Bo
    • KIPS Transactions on Software and Data Engineering / v.6 no.10 / pp.465-472 / 2017
  • In the past, researchers mainly used supervised machine learning techniques to analyze power data and investigated pattern identification through data mining. Today, however, these older classification and analysis techniques are reaching their limits, as the volume of electric power data has grown now that data can be provided in real time. This study therefore proposes a clustering architecture for analyzing large-scale electric power data. The proposed clustering process addresses problems of the K-means algorithm, an unsupervised learning technique, and can automate the entire process from the collection of electric power data to their analysis. Power data were categorized and analyzed at three levels: the raw data level, the clustering level, and the user interface level. In addition, the study identified K, the ideal number of clusters, based on principal component analysis and the normal distribution, and proposed an altered K-means algorithm that reduces the data categorized as ideal points in order to increase the efficiency of clustering.
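
As background for the K-means step mentioned above, here is a minimal sketch of the standard (Lloyd's) K-means loop on one-dimensional data. It is not the altered algorithm proposed in the paper, and it does not choose K from principal component analysis. The readings, the evenly spaced initial centroids, and the function name `kmeans_1d` are assumptions made for illustration.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain Lloyd's K-means on a list of numbers.

    Returns the final centroids and the cluster index of each value.
    """
    lo, hi = min(values), max(values)
    # Deterministic start: centroids spread evenly over the data range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, labels


# Hypothetical power readings forming two obvious groups.
readings = [1.0, 1.2, 0.8, 10.0, 10.2, 9.8]
centroids, labels = kmeans_1d(readings, k=2)
print(centroids, labels)
```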

GIS.RS-based Estimation of Carbon Dioxide Absorption and Bioenergy Supply Potential of Forest - Focused on Muju County, Jeonbuk - (GIS.RS기반 산림의 이산화탄소 흡수량 및 바이오에너지 공급 잠재량 추정 - 전북 무주군을 중심으로 -)

  • Kim, Hyun;Kim, Hyun-Jun;Choi, Soo-Min;Kang, Hag-Mo;Lee, Sang-Hyun
    • Journal of agriculture & life science / v.45 no.1 / pp.21-32 / 2011
  • This study estimated the carbon dioxide $(CO_{2})$ absorption and bioenergy supply potential of forests in Muju county based on GIS and RS. The results indicate that 7,800,130 $tCO_{2}$ was absorbed and that a total bioenergy supply potential of 11,868,202,837 Mcal was available. Furthermore, a bioenergy supply potential of 314,876,637 Mcal was available each year, enough to supply winter heating to 11,241 households. This exceeds the 10,902 households in Muju county. The study suggests a methodology for estimating the $CO_{2}$ absorption and bioenergy supply potential of forests on a national scale, and reliability is expected to increase when national-scale estimation uses detailed forest information based on the latest techniques such as GIS and RS.

Predicting Program Code Changes Using a CNN Model (CNN 모델을 이용한 프로그램 코드 변경 예측)

  • Kim, Dong Kwan
    • Journal of the Korea Convergence Society / v.12 no.9 / pp.11-19 / 2021
  • A software system must change during its life cycle to meet various requirements such as adding functionality, fixing bugs, and adjusting to new computing environments. Such program code modification should be considered as carefully as new system development because unexpected software errors could be introduced. In addition, when reusing open source programs, we can expect higher quality software if code changes of the open source program are predicted in advance. This paper proposes a Convolutional Neural Network (CNN)-based deep learning model to predict source code changes. The prediction of code changes is treated as a binary classification problem in deep learning, and labeled datasets are used for supervised learning. Java projects and code change logs are collected from GitHub for the training and testing datasets. Software metrics are computed from the collected Java source code and used as input data for the proposed model to detect code changes. The performance of the proposed model was measured using evaluation metrics such as precision, recall, F1-score, and accuracy. The experimental results show that the proposed CNN model achieved an F1-score of 95% and outperformed the multilayer perceptron-based DNN model, whose F1-score is 92%.
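
The evaluation metrics named in this abstract follow directly from the confusion-matrix counts of a binary classifier. The sketch below computes them for a small made-up set of labels, where 1 stands for "changed" and 0 for "unchanged". The data and the function name `binary_metrics` are illustrative and unrelated to the paper's experiments.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1-score, and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Made-up labels: 1 = code changed, 0 = code unchanged.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```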

Analysis of Tidal Channel Variations Using High Spatial Resolution Multispectral Satellite Image in Sihwa Reclaimed Land, South Korea (고해상도 다분광 인공위성영상자료 기반 시화 간척지 갯골 변화 양상 분석)

  • Jeong, Yongsik;Lee, Kwang-Jae;Chae, Tae-Byeong;Yu, Jaehyung
    • Korean Journal of Remote Sensing / v.36 no.6_2 / pp.1605-1613 / 2020
  • The tidal channel is a coastal sedimentary landform that plays the most important role in the formation and development of tidal flats, and it is considered a very important index for understanding tidal flat sedimentation/erosion terrain and its distribution. The purpose of this study is to understand the changes in tidal channels by period after the opening of the floodgate of the seawall in the reclaimed land of Sihwa Lake using KOMPSAT high-resolution multispectral satellite image data, and to evaluate the applicability and efficiency of high-resolution satellite images. KOMPSAT 2 and 3 images were used to extract tidal channel lineaments for 2009, 2014, and 2019, applying supervised classification methods based on Principal Component Analysis (PCA), Artificial Neural Net (ANN), Matched Filtering (MF), and Spectral Angle Mapper (SAM), as well as band ratio techniques using the Normalized Difference Water Index (NDWI) and MF/SAM. For verification, a digital map from the National Geographic Information Service and Landsat 7 ETM+ image data were used. KOMPSAT data agreed with the verification data much better than the Landsat 7 images did when detecting the direction and distribution pattern of the tidal channels. However, the data have limitations in identifying the distribution of tidal channel density and in providing meaningful information about the development of the sedimentary process. This research is expected to show the potential of KOMPSAT image-based high-resolution remote sensing as a way of responding to domestic intertidal environmental issues, and to serve as basic research for providing multi-platform-image-based convergent thematic maps and topics.
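
The abstract names NDWI without giving its formula or bands. The sketch below assumes the common McFeeters definition, (Green - NIR) / (Green + NIR), in which open water tends toward positive values. The reflectance values and the threshold of 0 are hypothetical, and the paper may use different bands or thresholds.

```python
def ndwi(green, nir):
    """Normalized Difference Water Index, assuming the (Green - NIR) / (Green + NIR) form."""
    total = green + nir
    if total == 0:
        return 0.0  # no signal in either band; treat as neutral
    return (green - nir) / total


def is_water(green, nir, threshold=0.0):
    """Flag a pixel as water when its NDWI exceeds the (assumed) threshold."""
    return ndwi(green, nir) > threshold


# Hypothetical reflectances: water absorbs NIR strongly, exposed flat does not.
channel_pixel = (0.30, 0.10)
flat_pixel = (0.10, 0.40)

print(ndwi(*channel_pixel), is_water(*channel_pixel))
print(ndwi(*flat_pixel), is_water(*flat_pixel))
```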

Bankruptcy Type Prediction Using A Hybrid Artificial Neural Networks Model (하이브리드 인공신경망 모형을 이용한 부도 유형 예측)

  • Jo, Nam-ok;Kim, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.79-99 / 2015
  • The prediction of bankruptcy has been extensively studied in the accounting and finance field. It can have an important impact on lending decisions and the profitability of financial institutions in terms of risk management. Many researchers have focused on constructing a more robust bankruptcy prediction model. Early studies primarily used statistical techniques such as multiple discriminant analysis (MDA) and logit analysis for bankruptcy prediction. However, many studies have demonstrated that artificial intelligence (AI) approaches, such as artificial neural networks (ANN), decision trees, case-based reasoning (CBR), and support vector machines (SVM), have outperformed statistical techniques since the 1990s for business classification problems because statistical methods rest on some rigid assumptions. In previous studies on corporate bankruptcy, many researchers have focused on developing a bankruptcy prediction model using financial ratios. However, few studies suggest the specific types of bankruptcy. Previous bankruptcy prediction models have generally been interested in predicting whether or not firms will become bankrupt, and most of the studies on bankruptcy types have focused on reviewing the previous literature or performing a case study. This study therefore develops a model using data mining techniques to predict the specific types of bankruptcy, as well as the occurrence of bankruptcy, in Korean small- and medium-sized construction firms in terms of profitability, stability, and activity indices, so that firms can take steps to prevent bankruptcy in advance. We propose a hybrid approach using two artificial neural networks (ANNs) for the prediction of bankruptcy types. The first is a back-propagation neural network (BPN) model using supervised learning for bankruptcy prediction, and the second is a self-organizing map (SOM) model using unsupervised learning to classify bankruptcy data into several types. Based on the constructed model, we predict the bankruptcy of companies by applying the BPN model to a validation set that was not used in the development of the model. This allows the specific types of bankruptcy to be identified from the bankruptcy data predicted by the BPN model. To interpret the characteristics of the clusters derived by the SOM model, we calculated the average of selected input variables for each cluster through statistical tests. Each cluster represents a bankruptcy type classified from the data of bankrupt firms, and the input variables are financial ratios used to interpret the meaning of each cluster. The experimental results show that each of the five bankruptcy types has different characteristics according to financial ratios. Type 1 (severe bankruptcy) has inferior financial statements except for EBITDA (earnings before interest, taxes, depreciation, and amortization) to sales, based on the clustering results. Type 2 (lack of stability) has a low quick ratio, low stockholder's equity to total assets, and high total borrowings to total assets. Type 3 (lack of activity) has a slightly low total asset turnover and fixed asset turnover. Type 4 (lack of profitability) has low retained earnings to total assets and EBITDA to sales, which are the indices of profitability. Type 5 (recoverable bankruptcy) includes firms that are in relatively good financial condition compared to other bankruptcy types even though they are bankrupt. Based on the findings, researchers and practitioners engaged in the credit evaluation field can obtain more useful information about the types of corporate bankruptcy. In this paper, we used the financial ratios of firms to classify bankruptcy types. It is important to select input variables that correctly predict bankruptcy and meaningfully classify the type of bankruptcy. In a further study, we will include non-financial factors such as the size, industry, and age of the firms, so that more realistic clustering results for bankruptcy types can be obtained by combining qualitative factors and reflecting the domain knowledge of experts.

Analysis on the Spatial Characteristics Caused by the Cropland Increase Using Multitemporal Landsat Images in Lower Reach of Duman River, Northeast Korea (다시기 위성영상을 이용한 두만강 하류지역의 농경지 개간의 공간적 특성분석)

  • Lee, Min-Boo;Han, Uk;Kim, Nam-Shin;Han, Ju-Youn;Shin, Keun-Ha;Kang, Chul-Sung
    • Journal of the Korean Geographical Society / v.38 no.4 / pp.630-639 / 2003
  • This study analyzes the distribution and change of cropland and forest in Onseong, Saebyeol, and Eundeok counties on the lower reach of the Duman (Tumen) River, northeast Korea, using 1992 Landsat TM data, 2000 Landsat ETM data, and digital terrain elevation data (DTED). Land cover and land use in the study areas are classified into cropland, forest, village, and water body using the supervised classification method, together with 1:50,000 DTED analysis, image band composition, and principal component analysis (PCA). Quantitative analysis shows that cropland in Onseong and Eundeok grew by 22.8% and 14.7%, with corresponding forest decreases of 8% and 13.6%, over the 8 years from 1992 to 2000. In Onseong, Saebyeol, and Eundeok, the mean elevations increased from 157m, 85m, and 78m to 192m, 95m, and 91m, and the mean slope gradients increased from 5.2$^{\circ}$, 2.5$^{\circ}$, and 3.0$^{\circ}$ to 6.6$^{\circ}$, 3.0$^{\circ}$, and 4.4$^{\circ}$. For newly developed cropland in particular, the mean elevations are 225m, 122m, and 127m, and the mean gradients are 9.4$^{\circ}$, 5.1$^{\circ}$, and 8.0$^{\circ}$ in the three regions. These new croplands extended along deeper valleys and up lower hill and mountain slopes as far as the knickpoint zone of gradient change. Land deforested for cropland forms an irregular, patch-type pattern and has become a source of sheet erosion, rilling, and gullying on mountain slopes and of sedimentation in local river channels. Although no field checking was carried out, analysis using Landsat images and GIS mapping can help in understanding the actual environmental problems related to cropland development on mountain slopes in North Korea.

Estimation of Soil Loss Due to Cropland Increase in Hoeryeung, Northeast Korea (북한 회령지역의 농경지 변화에 따른 토양침식 추정)

  • Lee, Min-Boo;Kim, Nam-Shin;Kang, Chul-Sung;Shin, Keun-Ha;Choe, Han-Sung;Han, Uk
    • Journal of the Korean association of regional geographers / v.9 no.3 / pp.373-384 / 2003
  • This study analyses soil loss due to cropland increase in the Hoeryeung area of northeast Korea, using Landsat TM (1987) and ETM (2001) images together with DTED, soil and geological maps, and 20 years of rainfall data. Land cover and land use were categorized as cropland, settlement, forest, river zone, and sand deposit by supervised classification with spectral bands 1, 2, and 3. The RUSLE model is used to estimate soil loss, and the AML language is used to calculate soil loss volumes. A Fourier transformation method is used to unify the geographical grids of the Landsat images and DTED. GTD was selected from a 1:50,000 topographic map. The main sources of soil loss over 100 ton/year appear to be the river zone and settlements in both 1987 and 2001, but the 2001 image shows that source areas have spread up to higher mountain slopes. Averaged over cropland, height and gradient increased by 24m and $0.8^{\circ}$ from 1987 to 2001. For newly developed cropland, the average increases are 75m and $2.5^{\circ}$, and the highest soil loss occurred at elevations between 300 and 500m. Soil loss increased from 57 tons in 1987 to 85 tons in 2001. Soil loss is highest in the $30{\sim}50^{\circ}$ slope zones in both years, but in 2001 soil loss also increased in zones under $30^{\circ}$. The area with soil loss over 200 ton/year, which indicates a higher risk of landslides, increased from $28.6km^2$ in 1987 to $48.8km^2$ in 2001.
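
RUSLE estimates average annual soil loss as the product of five factors, A = R · K · LS · C · P. The sketch below applies that product to a single grid cell under two land covers to show why converting forest to cropland raises the estimate. All factor values here are hypothetical and are not taken from the study.

```python
def rusle_soil_loss(r, k, ls, c, p):
    """Average annual soil loss A = R * K * LS * C * P.

    r:  rainfall-runoff erosivity factor
    k:  soil erodibility factor
    ls: combined slope length and steepness factor
    c:  cover-management factor
    p:  support practice factor
    """
    return r * k * ls * c * p


# Hypothetical factor values for one grid cell; only the cover factor C
# differs between the two land covers.
cell = {"r": 400.0, "k": 0.03, "ls": 2.0, "p": 1.0}
forest_loss = rusle_soil_loss(c=0.01, **cell)
cropland_loss = rusle_soil_loss(c=0.30, **cell)

print(forest_loss, cropland_loss)
```

Because the model is a plain product, the slope factor LS and the cover factor C compound: clearing forest on steeper slopes raises both at once.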

Performance of Investment Strategy using Investor-specific Transaction Information and Machine Learning (투자자별 거래정보와 머신러닝을 활용한 투자전략의 성과)

  • Kim, Kyung Mock;Kim, Sun Woong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.65-82 / 2021
  • Stock market investors are generally split into foreign investors, institutional investors, and individual investors. Compared to individual investors, professional investor groups such as foreign investors have an advantage in information and financial power and, as a result, foreign investors are known to show good investment performance among market participants. The purpose of this study is to propose an investment strategy that combines investor-specific transaction information and machine learning, and to analyze the portfolio investment performance of the proposed model using actual stock price and investor-specific transaction data. The Korea Exchange provides securities firms with daily information on the purchase and sale volume of each investor group. We developed a data collection program in C# using an API provided by Daishin Securities Cybosplus, and collected daily opening price, closing price, and investor-specific net purchase data for 151 out of 200 KOSPI stocks from January 2, 2007 to July 31, 2017. The self-organizing map model is an artificial neural network that performs clustering by unsupervised learning and was introduced by Teuvo Kohonen in 1984. Artificial neurons within a layer compete with one another, and all connections are feed-forward (non-recurrent), running from bottom to top. The model can be expanded to multiple layers, although a single layer is commonly used. The artificial neurons use linear activation functions, and learning uses the Instar rule as well as general competitive learning. The backpropagation model is an artificial neural network that performs classification by supervised learning. We grouped and transformed investor-specific transaction volume data with the self-organizing map model and used the result to train the backpropagation model. Based on the trained model's estimates for the verification data, the portfolios were rebalanced monthly. For performance analysis, a passive portfolio was designated, and KOSPI 200 and KOSPI index returns were obtained as proxies for market returns. Performance analysis was conducted using the equally-weighted portfolio return, compound interest rate, annual return, Maximum Draw Down (MDD), standard deviation, and Sharpe Ratio. Buy-and-hold returns of the top 10 market capitalization stocks are designated as the benchmark, since buy and hold is the best strategy under the efficient market hypothesis. The backpropagation model's prediction rate was very high on the learning data, at 96.61%, and relatively high on the verification data, at 57.1%. The quality of the self-organizing map grouping can be judged from the results of the backpropagation model, because poor grouping would have led to poor learning by the backpropagation model. On this basis, the machine learning performance is judged to be better than in previous studies. Our portfolio earned twice the benchmark return and performed better than the market returns on the KOSPI and KOSPI 200 indexes. Compared with the benchmark, the portfolio risk indicators, MDD and standard deviation, were also better. The Sharpe Ratio was higher than those of the benchmark and the stock market indexes. Through this, we presented a direction for a portfolio composition program using machine learning and investor-specific transaction information and showed that it can be used to develop programs for real stock investment. The reported return results from composing the portfolio monthly and rebalancing assets to equal proportions. Better outcomes are expected if, when the monthly portfolio is formed, stocks that remain recommended are held and rebalanced rather than sold and re-bought. The strategy therefore appears applicable to real transactions.
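
Two of the performance measures named above, Maximum Draw Down and the Sharpe Ratio, can be computed from a series of periodic portfolio returns. The following is a minimal sketch under stated assumptions: the monthly returns are made up, the risk-free rate is taken as zero, and the Sharpe Ratio is annualized by the square root of 12. The abstract does not say which conventions the authors used.

```python
import math


def max_drawdown(returns):
    """Largest peak-to-trough loss of cumulative wealth, as a fraction of the peak."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, (peak - wealth) / peak)
    return worst


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=12):
    """Annualized Sharpe Ratio from periodic returns (sample standard deviation)."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


# Made-up monthly portfolio returns.
monthly_returns = [0.10, -0.20, 0.05, 0.10]
mdd = max_drawdown(monthly_returns)
sharpe = sharpe_ratio(monthly_returns)
print(round(mdd, 4), round(sharpe, 4))
```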