• Title/Summary/Keyword: database systems

Search Result 2,864, Processing Time 0.028 seconds

Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

  • Kim, Eunmi;Hong, Taeho
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.29-45
    • /
    • 2015
  • Response modeling is a well-known research issue for those who have tried to get more superior performance in the capability of predicting the customers' response for the marketing promotion. The response model for customers would reduce the marketing cost by identifying prospective customers from very large customer database and predicting the purchasing intention of the selected customers while the promotion which is derived from an undifferentiated marketing strategy results in unnecessary cost. In addition, the big data environment has accelerated developing the response model with data mining techniques such as CBR, neural networks and support vector machines. And CBR is one of the most major tools in business because it is known as simple and robust to apply to the response model. However, CBR is an attractive data mining technique for data mining applications in business even though it hasn't shown high performance compared to other machine learning techniques. Thus many studies have tried to improve CBR and utilized in business data mining with the enhanced algorithms or the support of other techniques such as genetic algorithm, decision tree and AHP (Analytic Process Hierarchy). Ahn and Kim(2008) utilized logit, neural networks, CBR to predict that which customers would purchase the items promoted by marketing department and tried to optimized the number of k for k-nearest neighbor with genetic algorithm for the purpose of improving the performance of the integrated model. Hong and Park(2009) noted that the integrated approach with CBR for logit, neural networks, and Support Vector Machine (SVM) showed more improved prediction ability for response of customers to marketing promotion than each data mining models such as logit, neural networks, and SVM. This paper presented an approach to predict customers' response of marketing promotion with Case Based Reasoning. The proposed model was developed by applying different weights to each feature. We deployed logit model with a database including the promotion and the purchasing data of bath soap. After that, the coefficients were used to give different weights of CBR. We analyzed the performance of proposed weighted CBR based model compared to neural networks and pure CBR based model empirically and found that the proposed weighted CBR based model showed more superior performance than pure CBR model. Imbalanced data is a common problem to build data mining model to classify a class with real data such as bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instance in one class is remarkably small or large compared to the number of instance in other classes. The classification model such as response modeling has a lot of trouble to recognize the pattern from data through learning because the model tends to ignore a small number of classes while classifying a large number of classes correctly. To resolve the problem caused from imbalanced data distribution, sampling method is one of the most representative approach. The sampling method could be categorized to under sampling and over sampling. However, CBR is not sensitive to data distribution because it doesn't learn from data unlike machine learning algorithm. In this study, we investigated the robustness of our proposed model while changing the ratio of response customers and nonresponse customers to the promotion program because the response customers for the suggested promotion is always a small part of nonresponse customers in the real world. We simulated the proposed model 100 times to validate the robustness with different ratio of response customers to response customers under the imbalanced data distribution. Finally, we found that our proposed CBR based model showed superior performance than compared models under the imbalanced data sets. Our study is expected to improve the performance of response model for the promotion program with CBR under imbalanced data distribution in the real world.

Finding Weighted Sequential Patterns over Data Streams via a Gap-based Weighting Approach (발생 간격 기반 가중치 부여 기법을 활용한 데이터 스트림에서 가중치 순차패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.3
    • /
    • pp.55-75
    • /
    • 2010
  • Sequential pattern mining aims to discover interesting sequential patterns in a sequence database, and it is one of the essential data mining tasks widely used in various application fields such as Web access pattern analysis, customer purchase pattern analysis, and DNA sequence analysis. In general sequential pattern mining, only the generation order of data element in a sequence is considered, so that it can easily find simple sequential patterns, but has a limit to find more interesting sequential patterns being widely used in real world applications. One of the essential research topics to compensate the limit is a topic of weighted sequential pattern mining. In weighted sequential pattern mining, not only the generation order of data element but also its weight is considered to get more interesting sequential patterns. In recent, data has been increasingly taking the form of continuous data streams rather than finite stored data sets in various application fields, the database research community has begun focusing its attention on processing over data streams. The data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. In data stream processing, each data element should be examined at most once to analyze the data stream, and the memory usage for data stream analysis should be restricted finitely although new data elements are continuously generated in a data stream. Moreover, newly generated data elements should be processed as fast as possible to produce the up-to-date analysis result of a data stream, so that it can be instantly utilized upon request. To satisfy these requirements, data stream processing sacrifices the correctness of its analysis result by allowing some error. Considering the changes in the form of data generated in real world application fields, many researches have been actively performed to find various kinds of knowledge embedded in data streams. They mainly focus on efficient mining of frequent itemsets and sequential patterns over data streams, which have been proven to be useful in conventional data mining for a finite data set. In addition, mining algorithms have also been proposed to efficiently reflect the changes of data streams over time into their mining results. However, they have been targeting on finding naively interesting patterns such as frequent patterns and simple sequential patterns, which are found intuitively, taking no interest in mining novel interesting patterns that express the characteristics of target data streams better. Therefore, it can be a valuable research topic in the field of mining data streams to define novel interesting patterns and develop a mining method finding the novel patterns, which will be effectively used to analyze recent data streams. This paper proposes a gap-based weighting approach for a sequential pattern and amining method of weighted sequential patterns over sequence data streams via the weighting approach. A gap-based weight of a sequential pattern can be computed from the gaps of data elements in the sequential pattern without any pre-defined weight information. That is, in the approach, the gaps of data elements in each sequential pattern as well as their generation orders are used to get the weight of the sequential pattern, therefore it can help to get more interesting and useful sequential patterns. Recently most of computer application fields generate data as a form of data streams rather than a finite data set. Considering the change of data, the proposed method is mainly focus on sequence data streams.

Prefetching based on the Type-Level Access Pattern in Object-Relational DBMSs (객체관계형 DBMS에서 타입수준 액세스 패턴을 이용한 선인출 전략)

  • Han, Wook-Shin;Moon, Yang-Sae;Whang, Kyu-Young
    • Journal of KIISE:Databases
    • /
    • v.28 no.4
    • /
    • pp.529-544
    • /
    • 2001
  • Prefetching is an effective method to minimize the number of roundtrips between the client and the server in database management systems. In this paper we propose new notions of the type-level access pattern and the type-level access locality and developed an efficient prefetchin policy based on the notions. The type-level access patterns is a sequence of attributes that are referenced in accessing the objects: the type-level access locality a phenomenon that regular and repetitive type-level access patterns exist. Existing prefetching methods are based on object-level or page-level access patterns, which consist of object0ids of page-ids of the objects accessed. However, the drawback of these methods is that they work only when exactly the same objects or pages are accessed repeatedly. In contrast, even though the same objects are not accessed repeatedly, our technique effectively prefetches objects if the same attributes are referenced repeatedly, i,e of there is type-level access locality. Many navigational applications in Object-Relational Database Management System(ORDBMs) have type-level access locality. Therefore our technique can be employed in ORDBMs to effectively reduce the number of roundtrips thereby significantly enhancing the performance. We have conducted extensive experiments in a prototype ORDBMS to show the effectiveness of our algorithm. Experimental results using the 007 benchmark and a real GIS application show that our technique provides orders of magnitude improvements in the roundtrips and several factors of improvements in overall performance over on-demand fetching and context-based prefetching, which a state-of the art prefetching method. These results indicate that our approach significantly and is a practical method that can be implemented in commercial ORDMSs.

  • PDF

A Study on the Online Newspaper Archive : Focusing on Domestic and International Case Studies (온라인 신문 아카이브 연구 국내외 구축 사례를 중심으로)

  • Song, Zoo Hyung
    • The Korean Journal of Archival Studies
    • /
    • no.48
    • /
    • pp.93-139
    • /
    • 2016
  • Aside from serving as a body that monitors and criticizes the government through reviews and comments on public issues, newspapers can also form and spread public opinion. Metadata contains certain picture records and, in the case of local newspapers, the former is an important means of obtaining locality. Furthermore, advertising in newspapers and the way of editing in newspapers can be viewed as a representation of the times. For the value of archiving in newspapers when a documentation strategy is established, the newspaper is considered as a top priority that should be collected. A newspaper archive that will handle preservation and management carries huge significance in many ways. Journalists use them to write articles while scholars can use a newspaper archive for academic purposes. Also, the NIE is a type of a practical usage of such an archive. In the digital age, the newspaper archive has an important position because it is located in the core of MAM, which integrates and manages the media asset. With this, there are prospects that an online archive will perform a new role in the production of newspapers and the management of publishing companies. Korea Integrated News Database System (KINDS), an integrated article database, began its service in 1991, whereas Naver operates an online newspaper archive called "News Library." Initially, KINDS received an enthusiastic response, but nowadays, the utilization ratio continues to decrease because of the omission of some major newspapers, such as Chosun Ilbo and JoongAng Ilbo, and the numerous user interface problems it poses. Despite these, however, the system still presents several advantages. For example, it is easy to access freely because there is a set budget for the public, and accessibility to local papers is simple. A national library consistently carries out the digitalization of time-honored newspapers. In addition, individual newspaper companies have also started the service, but it is not enough for such to be labeled an archive. In the United States (US), "Chronicling America"-led by the Library of Congress with funding from the National Endowment for the Humanities-is in the process of digitalizing historic newspapers. The universities of each state and historical association provide funds to their public library for the digitalization of local papers. In the United Kingdom, the British Library is constructing an online newspaper archive called "The British Newspaper Archive," but unlike the one in the US, this service charges a usage fee. The Joint Information Systems Committee has also invested in "The British Newspaper Archive," and its construction is still ongoing. ProQuest Archiver and Gale NewsVault are the representative platforms because of their efficiency and how they have established the standardization of newspapers. Now, it is time to change the way we understand things, and a drastic investment is required to improve the domestic and international online newspaper archive.

Rainfall image DB construction for rainfall intensity estimation from CCTV videos: focusing on experimental data in a climatic environment chamber (CCTV 영상 기반 강우강도 산정을 위한 실환경 실험 자료 중심 적정 강우 이미지 DB 구축 방법론 개발)

  • Byun, Jongyun;Jun, Changhyun;Kim, Hyeon-Joon;Lee, Jae Joon;Park, Hunil;Lee, Jinwook
    • Journal of Korea Water Resources Association
    • /
    • v.56 no.6
    • /
    • pp.403-417
    • /
    • 2023
  • In this research, a methodology was developed for constructing an appropriate rainfall image database for estimating rainfall intensity based on CCTV video. The database was constructed in the Large-Scale Climate Environment Chamber of the Korea Conformity Laboratories, which can control variables with high irregularity and variability in real environments. 1,728 scenarios were designed under five different experimental conditions. 36 scenarios and a total of 97,200 frames were selected. Rain streaks were extracted using the k-nearest neighbor algorithm by calculating the difference between each image and the background. To prevent overfitting, data with pixel values greater than set threshold, compared to the average pixel value for each image, were selected. The area with maximum pixel variability was determined by shifting with every 10 pixels and set as a representative area (180×180) for the original image. After re-transforming to 120×120 size as an input data for convolutional neural networks model, image augmentation was progressed under unified shooting conditions. 92% of the data showed within the 10% absolute range of PBIAS. It is clear that the final results in this study have the potential to enhance the accuracy and efficacy of existing real-world CCTV systems with transfer learning.

Assessing forest net primary productivity based on a process-based model: Focusing on pine and oak forest stands in South and North Korea (과정기반 모형을 활용한 산림의 순일차생산성 평가: 남북한 소나무 및 참나무 임분을 중심으로)

  • Cholho Song;Hyun-Ah Choi;Jiwon Son;Youngjin Ko;Stephan A. Pietsch;Woo-Kyun Lee
    • Korean Journal of Environmental Biology
    • /
    • v.41 no.4
    • /
    • pp.400-412
    • /
    • 2023
  • In this study, the biogeochemistry management (BGC-MAN) model was applied to North and South Korea pine and oak forest stands to evaluate the Net Primary Productivity (NPP), an indicator of forest ecosystem productivity. For meteorological information, historical records and East Asian climate scenario data of Shared Socioeconomic Pathways (SSPs) were used. For vegetation information, pine (Pinus densiflora) and oak(Quercus spp.) forest stands were selected at the Gwangneung and Seolmacheon in South Korea and Sariwon, Sohung, Haeju, Jongju, and Wonsan, which are known to have tree nurseries in North Korea. Among the biophysical information, we used the elevation model for topographic data such as longitude, altitude, and slope direction, and the global soil database for soil data. For management factors, we considered the destruction of forests in North and South Korea due to the Korean War in 1950 and the subsequent reforestation process. The overall mean value of simulated NPP from 1991 to 2100 was 5.17 Mg C ha-1, with a range of 3.30-8.19 Mg C ha-1. In addition, increased variability in climate scenarios resulted in variations in forest productivity, with a notable decline in the growth of pine forests. The applicability of the BGC-MAN model to the Korean Peninsula was examined at a time when the ecosystem process-based models were becoming increasingly important due to climate change. In this study, the data on the effects of climate change disturbances on forest ecosystems that was analyzed was limited; therefore, future modeling methods should be improved to simulate more precise ecosystem changes across the Korean Peninsula through process-based models.

A Study on Web-based Technology Valuation System (웹기반 지능형 기술가치평가 시스템에 관한 연구)

  • Sung, Tae-Eung;Jun, Seung-Pyo;Kim, Sang-Gook;Park, Hyun-Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.23-46
    • /
    • 2017
  • Although there have been cases of evaluating the value of specific companies or projects which have centralized on developed countries in North America and Europe from the early 2000s, the system and methodology for estimating the economic value of individual technologies or patents has been activated on and on. Of course, there exist several online systems that qualitatively evaluate the technology's grade or the patent rating of the technology to be evaluated, as in 'KTRS' of the KIBO and 'SMART 3.1' of the Korea Invention Promotion Association. However, a web-based technology valuation system, referred to as 'STAR-Value system' that calculates the quantitative values of the subject technology for various purposes such as business feasibility analysis, investment attraction, tax/litigation, etc., has been officially opened and recently spreading. In this study, we introduce the type of methodology and evaluation model, reference information supporting these theories, and how database associated are utilized, focusing various modules and frameworks embedded in STAR-Value system. In particular, there are six valuation methods, including the discounted cash flow method (DCF), which is a representative one based on the income approach that anticipates future economic income to be valued at present, and the relief-from-royalty method, which calculates the present value of royalties' where we consider the contribution of the subject technology towards the business value created as the royalty rate. We look at how models and related support information (technology life, corporate (business) financial information, discount rate, industrial technology factors, etc.) can be used and linked in a intelligent manner. Based on the classification of information such as International Patent Classification (IPC) or Korea Standard Industry Classification (KSIC) for technology to be evaluated, the STAR-Value system automatically returns meta data such as technology cycle time (TCT), sales growth rate and profitability data of similar company or industry sector, weighted average cost of capital (WACC), indices of industrial technology factors, etc., and apply adjustment factors to them, so that the result of technology value calculation has high reliability and objectivity. Furthermore, if the information on the potential market size of the target technology and the market share of the commercialization subject refers to data-driven information, or if the estimated value range of similar technologies by industry sector is provided from the evaluation cases which are already completed and accumulated in database, the STAR-Value is anticipated that it will enable to present highly accurate value range in real time by intelligently linking various support modules. Including the explanation of the various valuation models and relevant primary variables as presented in this paper, the STAR-Value system intends to utilize more systematically and in a data-driven way by supporting the optimal model selection guideline module, intelligent technology value range reasoning module, and similar company selection based market share prediction module, etc. In addition, the research on the development and intelligence of the web-based STAR-Value system is significant in that it widely spread the web-based system that can be used in the validation and application to practices of the theoretical feasibility of the technology valuation field, and it is expected that it could be utilized in various fields of technology commercialization.

Development of Local Animal BLAST Search System Using Bioinformatics Tools (생물정보시스템을 이용한 Local Animal BLAST Search System 구축)

  • Kim, Byeong-Woo;Lee, Geun-Woo;Kim, Hyo-Seon;No, Seung-Hui;Lee, Yun-Ho;Kim, Si-Dong;Jeon, Jin-Tae;Lee, Ji-Ung;Jo, Yong-Min;Jeong, Il-Jeong;Lee, Jeong-Gyu
    • Bioinformatics and Biosystems
    • /
    • v.1 no.2
    • /
    • pp.99-102
    • /
    • 2006
  • The Basic Local Alignment Search Tool (BLAST) is one of the most established software in bioinformatics research and it compares a query sequence against the libraries of known sequences in order to investigate sequence similarity. Expressed Sequence Tags (ESTs) are single-pass sequence reads from mRNA (or cDNA) and represent the expression for a given cDNA library and the snapshot of genes expressed in a given tissue and/or at a given developmental stage. Therefore, ESTs can be very valuable information for functional genomics and bioinformatics researches. Although major bio database (DB) websites including NCBI are providing BLAST services and EST data, local DB and search system is demanding for better performance and security issue. Here we present animal EST DBs and local BLAST search system. The animal ESTs DB in NCBI Genbank were divided by animal species using the Perl script we developed. and we also built the new extended DB search systems fur the new data (Local Animal BLAST Search System: http://bioinfo.kohost.net), which was constructed on the high-capacity PC Cluster system fur the best performance. The new local DB contains 650,046 sequences for Bos taurus(cattle), 368,120 sequences for Sus scrofa (pig), 693,005 sequences for Gallus gallus (fowl), respectively.

  • PDF

Development of an Object-Oriented Framework Data Update System (객체 기반의 기본지리정보 갱신시스템 개발)

  • Lee, Jin-Soo;Choi, Yun-Soo;Seo, Chang-Wan;Jeon, Chang-Dong
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.11 no.1
    • /
    • pp.31-44
    • /
    • 2008
  • The 1st phase framework data implementation of National Geographic Information Systems (NGIS) used 1:5,000 digital map with 5 years updating period which is lacking in the latest information. This is a significant factor which hinders the use of framework data. This study proposed the efficient technical method of a location based object data management and system implementation for updating framework data. First, we did an object-oriented data modeling and database design using a location based features identifier(UFID: Unique Feature IDentifier). The second, we developed the system with various functions such as a location based UFID creation, input and output, a spatial and attribute data editing, an object based data processing using UML(Unified Modeling Language). Finally, we applied the system to the study area and got high quality data of 99% accuracy and 35% benefit effect of personnel expenses compare to the previous method. We expect that this study can contribute to the maintenance of national framework data as well as the revitalization of various GIS markets by providing user the latest framework data and that we can develop the methods of a feature-change modeling and monitoring using an object based data management.

  • PDF

Questionnaire on Marine Safety and Vessel Traffic Services in Philippine Coastal Waters (Part 1) (필리핀 연안수역의 선박교통관제서비스와 해양안전에 관한 설문조사 (Part 1))

  • Dimailig, Orlando S.;Jeong, Jae-Yong;Kim, Chol-Seong
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.19 no.2
    • /
    • pp.171-178
    • /
    • 2013
  • This paper presents the Part 1 of the Questionnaire Survey on Marine Safety and VTS in the Philippine Coastal Waters. This part deals with respondents profiles; experiences onboard and ashore; familiar areas; and their subjective perception of marine risks- by factors and by areas. The subjects are chosen from different regions nationwide with connection and/or with maritime background. There are 202 responses returned and these are put into a database for analysis made through Excel programs and statistics references. The result of the nationwide responses show that 97 % of respondents have shipboard experiences onboard of different ships' types and sizes; and 88 % are directly involved in the navigation of ships. Risk Perception levels - by factors and by familiar areas - show a higher risk degree in the 3rd level ('Sometimes Increases Risks') and 4th level ('Often Increases Risk') in each respondents' response indices. The study finds that the most risky factor is the "Violation of Rules and Regulations" which has a high risk at 5th level (Very Often Increases Risk), and for the over-all familiar areas, the Manila Bay area (NCR region) garners the most risky perception, also, at the 5th level. It is, therefore, recommended by this paper to conduct a comprehensive review of the rules and regulations viable in each locality; strengthening the maritime traffic systems, structures and educating the stake-holders specifically in Manila Bay area and other busy waterways of the country. The ultimate goal of this paper is to gather information, analyze these data and develop a set of tools and techniques to be utilized as a guide in the improvement and development of maritime traffic safety in the country.