• Title/Summary/Keyword: dataset visualization

A Comparative Analysis of Ensemble Learning-Based Classification Models for Explainable Term Deposit Subscription Forecasting (설명 가능한 정기예금 가입 여부 예측을 위한 앙상블 학습 기반 분류 모델들의 비교 분석)

  • Shin, Zian;Moon, Jihoon;Rho, Seungmin
    • The Journal of Society for e-Business Studies / v.26 no.3 / pp.97-117 / 2021
  • Predicting term deposit subscriptions is one of the representative financial marketing tasks in banks, and banks can build a prediction model using various kinds of customer information. To improve the classification accuracy for term deposit subscriptions, many studies have been conducted based on machine learning techniques. However, even if these models achieve satisfactory performance, they are difficult to use in industry when their decision-making process is not adequately explained. To address this issue, this paper proposes an explainable scheme for term deposit subscription forecasting. We first construct several classification models using decision tree-based ensemble learning methods that perform well on tabular data, such as random forest, gradient boosting machine (GBM), extreme gradient boosting (XGB), and light gradient boosting machine (LightGBM). We then analyze their classification performance in depth through 10-fold cross-validation. After that, we provide a rationale for interpreting the influence of customer information and the decision-making process by applying Shapley additive explanations (SHAP), an explainable artificial intelligence technique, to the best classification model. To verify the practicality and validity of our scheme, experiments were conducted with the bank marketing dataset provided by Kaggle; we applied SHAP to the GBM and LightGBM models under different dataset configurations and then performed analysis and visualization for explainable term deposit subscription forecasting.
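
As a rough illustration of the pipeline described above, the following sketch (not the authors' code) trains a LightGBM classifier on the Kaggle bank marketing data, scores it with 10-fold cross-validation, and explains it with SHAP; the file path, the `deposit` target column, and the hyperparameters are assumptions.

```python
import pandas as pd
import lightgbm as lgb
import shap
from sklearn.model_selection import cross_val_score

df = pd.read_csv("bank_marketing.csv")            # hypothetical path to the Kaggle dataset
X = pd.get_dummies(df.drop(columns=["deposit"]))  # one-hot encode categorical customer features
y = (df["deposit"] == "yes").astype(int)          # binary subscription label

model = lgb.LGBMClassifier(n_estimators=300, learning_rate=0.05)
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")  # 10-fold cross-validation
print(f"mean CV accuracy: {scores.mean():.3f}")

model.fit(X, y)
explainer = shap.TreeExplainer(model)             # SHAP for tree-based ensembles
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)                 # visualize the influence of customer features
```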

A Study on the Classification of Unstructured Data through Morpheme Analysis

  • Kim, SungJin;Choi, NakJin;Lee, JunDong
    • Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.105-112 / 2021
  • In the era of big data, interest in data is exploding. In particular, the development of the Internet and social media has led to the creation of new kinds of data, realizing the era of big data and artificial intelligence and opening a new chapter in convergence technology. There is also growing demand for analyzing data that conventional programs could not handle in the past. In this paper, an analysis model was designed and verified for the classification of unstructured data, which is frequently required in the era of big data. We crawled thesis abstracts, main keywords, and sub-keywords from DBpia, built a database using KoNLP's data dictionary, and tokenized words through morpheme analysis. In addition, nouns were extracted using KAIST's 9 part-of-speech classification system, TF-IDF values were generated, and an analysis dataset was created by combining the training data with the target (Y) values. Finally, the adequacy of the classification was measured by applying three analysis algorithms (random forest, SVM, decision tree) to the generated analysis dataset. The classification model technique proposed in this paper can be usefully applied in various fields, such as civil complaint classification and other text-related analyses, in addition to thesis classification.
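
The classification step described above can be illustrated with a short sketch (not the paper's code): TF-IDF features are built from documents that are assumed to have already passed through Korean morpheme analysis and noun extraction, and the three algorithms are compared. The documents and labels are hypothetical placeholders, and the sketch scores on the training set only for brevity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Placeholder documents: space-joined nouns left after morpheme analysis.
docs = ["데이터 시각화 기법 연구", "기계 학습 분류 모델", "지리 정보 시스템 구축", "심층 학습 예측 모델"]
labels = ["정보학", "인공지능", "정보학", "인공지능"]

X = TfidfVectorizer().fit_transform(docs)          # TF-IDF weights per extracted noun
for clf in (RandomForestClassifier(), SVC(), DecisionTreeClassifier()):
    clf.fit(X, labels)                             # the paper evaluates on held-out data instead
    print(type(clf).__name__, accuracy_score(labels, clf.predict(X)))
```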

Detecting Common Weakness Enumeration(CWE) Based on the Transfer Learning of CodeBERT Model (CodeBERT 모델의 전이 학습 기반 코드 공통 취약점 탐색)

  • Chansol Park;So Young Moon;R. Young Chul Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.10 / pp.431-436 / 2023
  • Recently, the incorporation of artificial intelligence approaches into the field of software engineering has become one of the major topics. Worldwide, research is actively pursued in two directions: 1) software engineering for artificial intelligence and 2) artificial intelligence for software engineering. We attempt to apply artificial intelligence to software engineering to identify and refactor bad code module areas. To learn the patterns of bad code elements well, artificial intelligence for this task requires many datasets in which bad code elements are labeled correctly. The current problems are that datasets for learning are insufficient and that the accuracy of the datasets we collect cannot be guaranteed. To mitigate this, when collecting code data, bad code data is collected only for high-complexity code module areas, not for the entire code. We propose a method for detecting common weakness enumeration entries by training on the collected dataset through transfer learning of the CodeBERT model, so that the model learns common weakness patterns in code. With this approach, we expect to identify common weakness patterns more accurately than traditional software engineering approaches.
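
A minimal sketch of the transfer-learning setup named above, under stated assumptions: the pretrained CodeBERT encoder is loaded with a classification head and fine-tuned on labeled code snippets. The snippet, its label, and the number of CWE classes are hypothetical, and only a single optimization step is shown.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_CWE_CLASSES = 10                                   # hypothetical number of CWE categories
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=NUM_CWE_CLASSES)

snippets = ["strcpy(buf, user_input);"]                # high-complexity code fragment (placeholder)
cwe_labels = torch.tensor([3])                         # index of the matching CWE class (placeholder)

inputs = tokenizer(snippets, return_tensors="pt", padding=True, truncation=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
loss = model(**inputs, labels=cwe_labels).loss         # cross-entropy over CWE classes
loss.backward()
optimizer.step()                                       # one fine-tuning step, for illustration only
```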

Implementation of Saemangeum Coastal Environmental Information System Using GIS (지리정보시스템을 이용한 새만금 해양환경정보시스템 구축)

  • Kim, Jin-Ah;Kim, Chang-Sik;Park, Jin-Ah
    • Journal of the Korean Association of Geographic Information Studies / v.14 no.4 / pp.128-136 / 2011
  • To monitor and predict changes in the coastal environment caused by the construction of the Saemangeum sea dyke and the development of land reclamation, real-time and periodic ocean observations and numerical simulations have been carried out since 2002. Saemangeum coastal environmental data can be largely classified into marine meteorology, ocean physics and circulation, water quality, marine geology, and marine ecosystem, and each kind of data has been generated continuously and accumulated over about 10 years. The collected coastal environmental data form a huge, heterogeneous dataset that is multi-dimensional, multivariate, and spatio-temporally distributed, so an information system capable of data collection, processing, management, and service is necessary. In this study, the Saemangeum coastal environmental information system was implemented using a geographic information system. It enables integrated data collection and management, as well as querying and analysis of enormous, high-complexity data, through an intuitive and effective web user interface and scientific data visualization based on statistical graphs and thematic cartography. Furthermore, through quantitative analysis of long-term trends using geospatial analysis and geoprocessing, the system serves as a tool that provides a scientific basis for sustainable development and decision support on the Saemangeum coast. For effective web-based information services, a multi-level map cache, a multi-layer architecture, and a geospatial database were also implemented.
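
The system itself is a GIS-based web application; purely as an illustration of the thematic-cartography style of output it describes, the following GeoPandas sketch plots hypothetical water-quality observations on a map. The file name and attribute column are assumptions.

```python
import geopandas as gpd
import matplotlib.pyplot as plt

stations = gpd.read_file("saemangeum_stations.geojson")   # hypothetical observation stations
ax = stations.plot(column="chlorophyll", cmap="viridis",   # thematic map colored by a measured variable
                   legend=True, markersize=40)
ax.set_title("Saemangeum coastal water quality (illustrative)")
plt.savefig("thematic_map.png", dpi=150)
```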

Development of the KnowledgeMatrix as an Informetric Analysis System (계량정보분석시스템으로서의 KnowledgeMatrix 개발)

  • Lee, Bang-Rae;Yeo, Woon-Dong;Lee, June-Young;Lee, Chang-Hoan;Kwon, Oh-Jin;Moon, Yeong-Ho
    • The Journal of the Korea Contents Association / v.8 no.1 / pp.68-74 / 2008
  • The application areas of Knowledge Discovery in Databases (KDD) have expanded to many R&D management processes, including technology trend analysis, forecasting, and evaluation. Established research fields such as informetrics (or scientometrics) have utilized KDD techniques and methods. Various systems have been developed by a few researchers and institutions to support the analysis of large-scale R&D-related databases such as patent databases or bibliographic databases. However, extant systems pose problems for Korean users: their prices are high, Korean language processing is not supported, and users' demands are not reflected. To solve these problems, the Korea Institute of Science and Technology Information (KISTI) developed a stand-alone information analysis system named KnowledgeMatrix. The KnowledgeMatrix system offers various functions for analyzing datasets retrieved from databases. Its main operation units are user-defined lists and matrix generation, cluster analysis, visualization, and data pre-processing. The matrix generation unit helps extract the information items to be analyzed and calculates the occurrence, co-occurrence, and proximity of those items. The cluster analysis unit enables matrix data to be clustered by hierarchical or non-hierarchical clustering methods and presents a tree-type structure of the clustered data. The visualization unit offers various methods such as charts, FDP, strategic diagrams, and PFNet. The data pre-processing unit consists of a data import editor, a string editor, a thesaurus editor, grouping methods, field-refining methods, and sub-dataset generation methods. KnowledgeMatrix shows better performance and offers a wider range of functions than extant systems.
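
KnowledgeMatrix is a stand-alone GUI tool, but two of its core operations, co-occurrence matrix generation and hierarchical clustering of that matrix, can be sketched as follows. The keyword lists per bibliographic record are hypothetical placeholders.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

records = [["visualization", "dataset", "GIS"],
           ["visualization", "clustering"],
           ["dataset", "clustering", "GIS"]]           # keywords per record (placeholder)

terms = sorted({t for rec in records for t in rec})
idx = {t: i for i, t in enumerate(terms)}
occ = np.zeros((len(records), len(terms)))             # record-by-term occurrence matrix
for r, rec in enumerate(records):
    for t in rec:
        occ[r, idx[t]] = 1

cooc = occ.T @ occ                                     # term-by-term co-occurrence matrix
Z = linkage(cooc, method="average")                    # hierarchical clustering of the matrix rows
dendrogram(Z, labels=terms)                            # tree-type structure of the clustered terms
```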

SuperDepthTransfer: Depth Extraction from Image Using Instance-Based Learning with Superpixels

  • Zhu, Yuesheng;Jiang, Yifeng;Huang, Zhuandi;Luo, Guibo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.10 / pp.4968-4986 / 2017
  • In this paper, we primarily address the difficulty of automatically generating a plausible depth map from a single image in an unstructured environment. The aim is to extrapolate a depth map with a more correct, rich, and distinct depth order that is both quantitatively accurate and visually pleasing. Our technique, which is fundamentally based on the pre-existing DepthTransfer algorithm, transfers depth information at the level of superpixels, within a framework that replaces a pixel basis with instance-based learning. A vital superpixel feature that enhances matching precision is the posterior incorporation of predicted semantic labels into the depth extraction procedure. Finally, a modified cross bilateral filter is leveraged to refine the final depth field. For training and evaluation, experiments were conducted using the Make3D Range Image Dataset and demonstrate that this depth estimation method outperforms state-of-the-art methods on the correlation coefficient, mean log10 error, and root mean squared error metrics, and achieves comparable performance on the average relative error metric, in both efficacy and computational efficiency. This approach can be used to automatically convert 2D images into stereo for 3D visualization, producing anaglyph images that are more realistic and immersive.
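
A heavily simplified sketch of the superpixel-level, instance-based transfer idea (not the SuperDepthTransfer implementation, which also incorporates semantic labels and a modified cross bilateral filter): segment the query image with SLIC, describe each superpixel by its mean color, and copy the depth of the nearest training superpixel. The training arrays and query image here are random placeholders.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.neighbors import NearestNeighbors

def superpixel_features(image, segments):
    """Mean RGB per superpixel."""
    return np.array([image[segments == s].mean(axis=0) for s in np.unique(segments)])

# Hypothetical training data: superpixel features and their depths.
train_feats = np.random.rand(500, 3)
train_depths = np.random.rand(500)

query = np.random.rand(240, 320, 3)                      # placeholder RGB image
segments = slic(query, n_segments=200, start_label=0)    # superpixel segmentation

feats = superpixel_features(query, segments)
nn = NearestNeighbors(n_neighbors=1).fit(train_feats)    # instance-based (nearest-neighbor) matching
_, match = nn.kneighbors(feats)

depth = np.zeros(query.shape[:2])
for s, m in zip(np.unique(segments), match.ravel()):
    depth[segments == s] = train_depths[m]               # transfer depth per superpixel
```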

CONSTRUCTION OF DATABASE FOR THE DIGITIZED SKY SURVEY I DATA (DIGITIZED SKY SURVEY I 자료의 검색 DB 구축)

  • Sung, Hyun-Il;Sang, Jian;Kim, Sang-Chul;Kim, Bong-Gyu;Yim, In-Sung;Ahn, Young-Suk;Sohn, Sang-Mo-Tony;Yang, Hong-Jin
    • Publications of The Korean Astronomical Society / v.20 no.1 s.24 / pp.55-62 / 2005
  • The First Generation Digitized Sky Survey (DSS-I) is a collection of digitized photographic atlases of the night sky taken from the Palomar Observatory (northern sky) and the Anglo-Australian Observatory (southern sky). DSS-I is widely used by the astronomical community for a number of applications, including object cross-identification and astrometry. However, accessing and retrieving the actual images is nontrivial owing to the huge size (> 60 GB) of the dataset. To facilitate the retrieval of DSS-I data for the public, the Korean Astronomical Data Center (KADC) developed a web application that provides not only data retrieval but also visualization functions. The web application consists of several modules developed using Java Applet, Java Servlet, and JavaServer Pages (JSP) technologies. It allows users to retrieve images efficiently in various formats such as FITS, JPEG, GIF, and TIFF, and also offers an interactive visualization tool, ImgViewer, for displaying and analyzing FITS images. To use the web application, users need a Java-enabled web browser.
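
The service itself is built on Java technologies; as a language-neutral illustration of the kind of operation it performs, the following Python/Astropy sketch reads a retrieved DSS-I FITS cutout and exports it as a browser-friendly image. The file name is hypothetical.

```python
import numpy as np
from astropy.io import fits
import matplotlib.pyplot as plt

with fits.open("dss1_cutout.fits") as hdul:       # hypothetical retrieved FITS cutout
    image = hdul[0].data.astype(float)

# Simple contrast stretch before exporting to a web-friendly format.
lo, hi = np.percentile(image, (1, 99))
stretched = np.clip((image - lo) / (hi - lo), 0, 1)
plt.imsave("dss1_cutout.jpg", stretched, cmap="gray")
```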

Acceleration of GPU-based Volume Rendering Using Vertex Splitting (정점분할을 이용한 GPU 기반 볼륨 렌더링의 가속 기법)

  • Yoo, Seong-Yeol;Lee, Eun-Seok;Shin, Byeong-Seok
    • Journal of Korea Game Society / v.12 no.2 / pp.53-62 / 2012
  • Ray-casting, one of the volume visualization methods, produces high-quality images when visualizing a volume dataset. However, it takes too much time to render because volume data are huge. Recently, various methods have been proposed to accelerate GPU-based volume rendering and solve this problem. In this paper, we propose an efficient GPU-based empty-space-skipping method that accelerates volume ray-casting using octree traversal. The method creates a min-max octree and searches for empty space using vertex splitting. It minimizes the bounding polyhedron by eliminating the empty space found in the octree traversal step. The rendering results of our method are identical to those of previous GPU-based volume ray-casting, with the advantage of a faster run time thanks to the minimized bounding polyhedron.
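
A CPU-side, single-level simplification of the min-max empty-space-skipping idea (the actual method builds a hierarchical min-max octree and performs vertex splitting on the GPU): store a maximum per brick and skip bricks whose maximum falls below the opacity threshold. The synthetic sphere volume is a placeholder.

```python
import numpy as np

# Placeholder volume: a solid sphere inside a 128^3 grid; everything outside is empty.
coords = np.indices((128, 128, 128))
volume = (np.linalg.norm(coords - 64, axis=0) < 40).astype(np.float32)

BRICK, THRESHOLD = 16, 0.5                     # brick edge length and opacity threshold
nonempty = []
for x in range(0, 128, BRICK):
    for y in range(0, 128, BRICK):
        for z in range(0, 128, BRICK):
            brick = volume[x:x+BRICK, y:y+BRICK, z:z+BRICK]
            if brick.max() >= THRESHOLD:       # min-max test: keep only non-empty bricks
                nonempty.append((x, y, z))

print(f"{len(nonempty)} of {(128 // BRICK) ** 3} bricks need to be rendered")
```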

Adaptive Load Balancing Scheme using a Combination of Hierarchical Data Structures and 3D Clustering for Parallel Volume Rendering on GPU Clusters (계층 자료구조의 결합과 3차원 클러스터링을 이용하여 적응적으로 부하 균형된 GPU-클러스터 기반 병렬 볼륨 렌더링)

  • Lee Won-Jong;Park Woo-Chan;Han Tack-Don
    • Journal of KIISE:Computer Systems and Theory / v.33 no.1_2 / pp.1-14 / 2006
  • Sort-last parallel rendering using a cluster of GPUs has been widely used as an efficient method for visualizing large-scale volume datasets. The performance of this method is constrained by load balancing when data parallelism is involved; in previous work, static partitioning could remain balanced only when task-level parallelism alone was involved. In this paper, we present a load-balancing scheme that adapts to the characteristics of the volume dataset when data parallelism is also employed. We effectively combine hierarchical data structures (an octree and a BSP tree) to skip empty regions and distribute the workload to the corresponding rendering nodes. Moreover, we exploit a 3D clustering method to determine the visibility order and save AGP bandwidth on each rendering node. Experimental results show that our scheme achieves significant performance gains compared with traditional static load-distribution schemes.
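
A minimal sketch of the adaptive load-distribution idea, under assumptions (the paper combines an octree and a BSP tree; here the per-block workload estimates are just random placeholders): blocks are assigned greedily to the rendering node with the smallest accumulated load.

```python
import heapq
import numpy as np

rng = np.random.default_rng(0)
block_workloads = rng.integers(0, 1000, size=64)      # estimated samples per non-empty block (placeholder)
NUM_NODES = 4                                         # GPU rendering nodes in the cluster

heap = [(0, node) for node in range(NUM_NODES)]       # (accumulated load, node id)
assignment = {}
for block, work in sorted(enumerate(block_workloads), key=lambda kv: -kv[1]):
    load, node = heapq.heappop(heap)                  # node with the smallest current load
    assignment[block] = node
    heapq.heappush(heap, (load + work, node))

for load, node in sorted(heap, key=lambda x: x[1]):
    print(f"node {node}: total estimated workload {load}")
```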

Estimation of Aerosol Vertical Profile from the MODIS Aerosol Optical Thickness and Surface Visibility Data (MODIS 에어러솔 광학두께와 지상에서 관측된 시정거리를 이용한 대기 에어러솔 연직분포 산출)

  • Lee, Kwon-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.16 no.2 / pp.141-151 / 2013
  • This study presents a model of aerosol extinction vertical profiles in Korea using the Moderate Resolution Imaging Spectroradiometer (MODIS)-derived aerosol optical thickness (AOT) and ground-based visibility observation data. The method uses a series of physical equations to derive the aerosol scale height and vertical profile from MODIS AOT and surface visibility data. The modeled results under standard atmospheric conditions showed only small differences from the standard aerosol vertical profile used in the radiative transfer model. The model-derived aerosol scale heights for the two cases of a clean atmosphere ($\tau_{MODIS}=0.12\pm0.07$, visibility $=21.13\pm3.31$ km) and a hazy atmosphere ($\tau_{MODIS}=1.71\pm0.85$, visibility $=13.33\pm5.66$ km) are $0.63\pm0.33$ km and $1.71\pm0.84$ km, respectively. Based on these results, aerosol extinction profiles can be estimated, and the results are transformed into KML code for visualization of the dataset. This has implications for atmospheric environmental monitoring and future environmental policies.
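
A minimal worked sketch of the kind of physical relations typically used for such a retrieval, stated as an assumption rather than the paper's exact equations: the Koschmieder formula converts surface visibility into a surface extinction coefficient, the scale height follows from the column AOT, and an exponential profile is assumed. With the clean-case means quoted above, this gives a scale height of about 0.65 km, which falls within the reported clean-atmosphere range.

```python
import numpy as np

aot = 0.12             # MODIS aerosol optical thickness (clean case from the abstract)
visibility_km = 21.13  # surface visibility in km

sigma_0 = 3.912 / visibility_km                 # Koschmieder relation: surface extinction [1/km]
scale_height = aot / sigma_0                    # H such that the column integral of the profile equals the AOT
z = np.linspace(0, 10, 101)                     # altitude grid [km]
sigma_z = sigma_0 * np.exp(-z / scale_height)   # assumed exponential extinction profile [1/km]

print(f"scale height ≈ {scale_height:.2f} km")
```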