• Title/Summary/Keyword: Database Mining

Performance Comparison of Clustering using Discritization Algorithm (이산화 알고리즘을 이용한 계층적 클러스터링의 실험적 성능 평가)

  • Won, Jae Kang; Lee, Jeong Chan; Jung, Yong Gyu; Lee, Young Ho
    • Journal of Service Research and Studies / v.3 no.2 / pp.53-60 / 2013
  • Various data mining techniques have been developed for obtaining information from large data sets. In pattern recognition and machine learning, among the most actively pursued areas in recent years, most existing learning algorithms build rules or decision models based on categorical attributes. Real-world data, however, often consist of numeric attributes, or mix numeric attributes with ordinary categorical ones. In such cases, a preprocessing step is required to convert the data into attribute values of a type suitable for learning. In this paper, the domain of each numeric attribute is divided into several segments using a discretization algorithm. Clustering is described alongside other data mining techniques. Clustering splits a large number of database records into smaller groups of records with similar characteristics, dividing a finite set of given patterns in the pattern space. A cluster is a set of patterns that lie close to one another. Patterns are extracted from the given data without specifying any particular category in advance. The paper describes clustering as a technique that classifies data by grouping similar items.
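
The abstract does not say which discretization algorithm the authors use. As a minimal sketch of the general idea, assuming simple equal-width binning (an assumption, not the paper's method), a numeric attribute can be divided into segments like this. The function name `equal_width_discretize` and the sample `ages` data are hypothetical.

```python
def equal_width_discretize(values, n_bins):
    """Map each numeric value to a segment index in 0..n_bins-1.

    Assumes equal-width segments over [min, max]; the paper's actual
    discretization algorithm is not specified in the abstract.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single segment suffices.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land on index n_bins, so clamp it
    # into the last segment.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical numeric attribute turned into 3 categorical segments.
ages = [18, 22, 25, 31, 40, 47, 52, 60]
segments = equal_width_discretize(ages, 3)
```

The resulting segment indices can then be fed to a learner that expects categorical attributes.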

Spacio-temporal Analysis of Urban Population Exposure to Traffic-Related air Pollution (교통흐름에 기인하는 미세먼지 노출 도시인구에 대한 시.공간적 분석)

  • Lee, Keum-Sook
    • Journal of the Economic Geographical Society of Korea / v.11 no.1 / pp.59-77 / 2008
  • The purpose of this study is to investigate the impact of traffic-related air pollution on the urban population in the Metropolitan Seoul area. In particular, this study analyzes urban population exposure to traffic-related particulate matter (PM). To this end, it examines the relationship between traffic flows and PM concentration levels over the last fifteen years. Traffic volumes in Seoul have decreased significantly in recent years; PM levels, however, have declined less than traffic volumes. This may be related to the rapid growth in population and vehicle numbers in Gyeonggi, on the outskirts of Seoul, where several New Towns were developed in the mid-1990s. The spatial pattern of commuting has changed, and travel distances and traffic volumes have therefore increased along the main roads connecting the CBDs in Seoul with the New Towns, which consist of large residential apartment complexes. These changes in traffic flows and travel behavior increase exposure to traffic-related air pollution for the urban population across the Metropolitan Seoul area. GIS techniques are applied to analyze the spatial patterns of traffic flows, population distributions, PM distributions, and passenger flows comprehensively. This study also applies data mining techniques to real-time traffic flow data and to passenger flow data obtained from the T-card transaction database. It further attempts to develop a space-time model for assessing journey-time exposure to traffic-related air pollutants based on the frequency distribution function of passenger travel. The results can inform sustainable transport systems, public health, and transportation policy aimed at reducing urban air pollution and road traffic in the Metropolitan Seoul area.

A Review of Magnetic Exploration in Korea (한국의 육상 자력탐사)

  • Park, Yeong-Sue
    • Economic and Environmental Geology / v.39 no.4 s.179 / pp.403-416 / 2006
  • The magnetic method is a rapid, cheap, and simple geophysical exploration technique with a wide range of applications, including resource prospecting, geological structure investigation, and even geotechnical and environmental problems. In particular, aeromagnetics provides fundamental and useful geoscientific data not only for assessing potential resources but also for national land planning. The magnetic method, perhaps the oldest geophysical technique, was introduced into Korea relatively early. Documents from the Japanese occupation period indicate that the magnetic method was used to explore metallic ore deposits and hot springs, and that a geomagnetic observatory was in operation. From the mid-1950s, after the Korean War, magnetic exploration for natural resources such as metallic ore, uranium, coal, and groundwater was carried out intensively to support industrialization. The Apache aeromagnetic survey project of 1958-1959 and its ground follow-up surveys are typical and important cases from that period. Magnetic survey techniques advanced rapidly during the 1970s and 1980s with improvements in instruments, growth in geophysical manpower, and the availability of computers. The national aeromagnetic mapping project by KIGAM in 1981 demonstrated the improved technical capability of the time. The decline of the mining industry since the mid-1980s shifted exploration targets from traditional resources to new ones such as groundwater and geothermal resources, and applications to geological structure investigation were revived. Recent applications such as natural hazard assessment and engineering and environmental studies have increased the utility of the magnetic method in exploration.

A Single Index Approach for Subsequence Matching that Supports Normalization Transform in Time-Series Databases (시계열 데이터베이스에서 단일 색인을 사용한 정규화 변환 지원 서브시퀀스 매칭)

  • Moon Yang-Sae; Kim Jin-Ho; Loh Woong-Kee
    • The KIPS Transactions:PartD / v.13D no.4 s.107 / pp.513-524 / 2006
  • The normalization transform is very useful for finding the overall trend of time-series data, since it enables finding sequences with similar fluctuation patterns. The previous subsequence matching method with normalization transform, however, incurs index overhead in both storage space and update maintenance, since it must build multiple indexes to support query sequences of arbitrary length. To solve this problem, we propose a single index approach for normalization-transformed subsequence matching that supports query sequences of arbitrary length. For the single index approach, we first introduce the notion of the inclusion-normalization transform by generalizing the original definition of the normalization transform. The inclusion-normalization transform normalizes a window by using the mean and the standard deviation of a subsequence that includes the window. Next, we formally prove the correctness of the proposed method, which uses the inclusion-normalization transform for normalization-transformed subsequence matching. We then propose subsequence matching and index building algorithms to implement the proposed method. Experimental results on real stock data show that our method improves performance by 2.5 to 2.8 times over the previous method. Our approach has the additional advantage that it can be generalized to support many other transforms besides the normalization transform. Therefore, we believe our work can be widely used in many kinds of transform-based subsequence matching methods.
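
The inclusion-normalization transform described in this abstract can be sketched as follows. This is an illustration of the stated definition only, not the authors' implementation: the function name, the index-based parameters, and the choice of the population standard deviation are assumptions.

```python
import statistics


def inclusion_normalize(series, win_start, win_len, sub_start, sub_len):
    """Normalize a window using the mean and standard deviation of an
    enclosing subsequence (the abstract's "inclusion-normalization").

    Assumption: the population standard deviation is used; the abstract
    does not say which estimator the paper adopts.
    """
    win_end = win_start + win_len
    sub_end = sub_start + sub_len
    if not (sub_start <= win_start and win_end <= sub_end):
        raise ValueError("the subsequence must include the window")
    subsequence = series[sub_start:sub_end]
    mean = statistics.fmean(subsequence)
    std = statistics.pstdev(subsequence)
    if std == 0:
        raise ValueError("constant subsequence cannot be normalized")
    return [(x - mean) / std for x in series[win_start:win_end]]


# Hypothetical series: the enclosing subsequence has mean 5 and std 2.
prices = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
window = inclusion_normalize(prices, win_start=6, win_len=2,
                             sub_start=0, sub_len=8)
```

When the window and the enclosing subsequence coincide, this reduces to the ordinary normalization transform, which is the sense in which the abstract calls it a generalization.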

Building an SNS Crawling System Using Python (Python을 이용한 SNS 크롤링 시스템 구축)

  • Lee, Jong-Hwa
    • Journal of Korea Society of Industrial Information Systems / v.23 no.5 / pp.61-76 / 2018
  • Everything in the world where modern people live is becoming networked. The Internet of Things, which attaches sensors to objects, allows real-time data transfer to and from the network. Mobile devices, essential to modern life, play an important role in recording the traces of everyday life in real time. Through social network services (SNS), information acquisition and communication activities are recorded on a huge network in real time. From a business point of view, customer needs analysis begins with SNS data. In this research, we build a system in Python that automatically collects SNS content from the web in real time. We aim to support customer needs analysis through a data collection system for Instagram, Twitter, and YouTube, which have large numbers of users worldwide. The content is collected with a virtual web browser in a Python web server environment and stored in a database after an exploitation process and an NLP process. Based on the results of this study, we intend to offer a service through a website where the desired data is collected automatically by a search function and netizens' responses can be checked in real time through time-series data analysis. In addition, since searches completed within 5 seconds in the execution results, the advantage of the proposed algorithm is confirmed.

Utilization of similarity measures by PIM with AMP as association rule thresholds (모든 주변 비율을 고려한 확률적 흥미도 측도 기반 유사성 측도의 연관성 평가 기준 활용 방안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.117-124 / 2013
  • Association rule mining, one of the data mining techniques, quantifies the relationship between sets of items in a huge database and has been applied in various fields such as internet shopping malls, healthcare, insurance, and education. There are three primary interestingness measures for association rules: support, confidence, and lift. Confidence is the most important of these, and association rules are generated using it. However, confidence is an asymmetric measure and cannot take negative values, which can cause difficulties when generating association rules. In this paper we apply similarity measures based on the probabilistic interestingness measure (PIM) with all marginal proportions (AMP) to solve this problem. A numerical example compares support, confidence, lift, the chi-square statistic, and several similarity measures by PIM with AMP. The results show that the similarity measures by PIM with AMP indicate the degree of association in the same way as confidence. Because their values carry a sign, they also reveal the direction of association, and the best similarity measure by PIM with AMP can be selected.
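
The three standard measures named in this abstract can be computed as in the sketch below, which uses their textbook definitions. The PIM-with-AMP similarity measures themselves are not defined in the abstract and are not reproduced here. The function name `rule_measures` and the `baskets` data are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    support    = P(A and B)
    confidence = P(A and B) / P(A)
    lift       = confidence / P(B)
    """
    a, b = set(antecedent), set(consequent)
    rows = [set(t) for t in transactions]
    n = len(rows)
    n_a = sum(1 for t in rows if a <= t)
    n_b = sum(1 for t in rows if b <= t)
    n_ab = sum(1 for t in rows if (a | b) <= t)
    if n_a == 0 or n_b == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_ab / n
    confidence = n_ab / n_a
    lift = confidence / (n_b / n)
    return support, confidence, lift


# Hypothetical market-basket data.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread"},
    {"milk", "eggs"},
]
forward = rule_measures(baskets, {"milk"}, {"bread"})
backward = rule_measures(baskets, {"bread"}, {"milk"})
```

Swapping antecedent and consequent leaves support and lift unchanged but changes confidence, which is the asymmetry the abstract refers to. None of the three values can be negative, so they cannot by themselves signal a negative direction of association.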

The Mitochondrial Warburg Effect: A Cancer Enigma

  • Kim, Hans H.; Joo, Hyun; Kim, Tae-Ho; Kim, Eui-Yong; Park, Seok-Ju; Park, Ji-Kyoung; Kim, Han-Jip
    • Interdisciplinary Bio Central / v.1 no.2 / pp.7.1-7.7 / 2009
  • "To be, or not to be?" This question is not only Hamlet's agony but also the dilemma of mitochondria in a cancer cell. Cancer cells have a high glycolysis rate even in the presence of oxygen. This feature of cancer cells is known as the Warburg effect, named for the first scientist to observe it, Otto Warburg, who assumed that because of mitochondrial malfunction, cancer cells had to depend on anaerobic glycolysis to generate ATP. It was demonstrated, however, that cancer cells with intact mitochondria also showed evidence of the Warburg effect. Thus, an alternative explanation was proposed: the Warburg effect helps cancer cells harness additional ATP to meet the high energy demand required for their extraordinary growth while providing a basic building block of metabolites for their proliferation. A third view suggests that the Warburg effect is a defense mechanism, protecting cancer cells from the higher than usual oxidative environment in which they survive. Interestingly, the latter view does not conflict with the high-energy production view, as increased glucose metabolism enables cancer cells to produce larger amounts of both antioxidants to fight oxidative stress and ATP and metabolites for growth. The combination of these two different hypotheses may explain the Warburg effect, but critical questions at the mechanistic level remain to be explored. Cancer shows complex and multi-faceted behaviors. Previously, there has been no overall plan or systematic approach to integrate and interpret the complex signaling in cancer cells. A new paradigm of collaboration and a well-designed systemic approach will supply answers to fill the gaps in current cancer knowledge and will accelerate the discovery of the connections behind the Warburg mystery. An integrated understanding of cancer complexity and tumorigenesis is necessary to expand the frontiers of cancer cell biology.

Incremental Generation of A Decision Tree Using Global Discretization For Large Data (대용량 데이터를 위한 전역적 범주화를 이용한 결정 트리의 순차적 생성)

  • Han, Kyong-Sik; Lee, Soo-Won
    • The KIPS Transactions:PartB / v.12B no.4 s.100 / pp.487-498 / 2005
  • Recently, attention has focused on decision tree algorithms that can handle large datasets. However, because most of these algorithms process data in batch mode, they have to rebuild the tree from scratch whenever new data is added. A more efficient way to reduce this rebuilding cost is to build the tree incrementally. Representative incremental tree construction algorithms are BOAT and ITI, and most such algorithms use a local discretization method to handle numeric data. However, because discretization requires sorted numeric data, a global discretization method that sorts all data only once is more suitable for large datasets than a local discretization method that sorts at every node. This paper proposes an incremental tree construction method that efficiently rebuilds a tree using a global discretization method to handle numeric data. When new data is added, the categories affected by the data must be recreated, and the tree structure must then be changed to reflect the category changes. The proposed method extracts sample points and performs discretization on them to recreate categories efficiently, and uses confidence intervals and a tree restructuring method to adjust the tree structure to category changes. An experiment on a people database compares the proposed method with an existing method that uses local discretization.
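
To illustrate why a global discretization needs only one sort, the sketch below computes cut points once for the whole attribute and then assigns categories by binary search, so no tree node has to re-sort its records. It assumes equal-frequency cut points, which the abstract does not specify. It also leaves out the paper's sample-point extraction, confidence intervals, and tree restructuring. All names and data are hypothetical.

```python
from bisect import bisect_right


def global_cut_points(values, n_bins):
    """Sort the attribute once and return equal-frequency cut points.

    Assumption: equal-frequency binning stands in for the paper's
    unspecified global discretization method.
    """
    if len(values) < n_bins:
        raise ValueError("need at least n_bins values")
    ordered = sorted(values)  # the single global sort
    n = len(ordered)
    cuts = []
    for i in range(1, n_bins):
        idx = i * n // n_bins
        boundary = (ordered[idx - 1] + ordered[idx]) / 2
        # Skip duplicate boundaries caused by repeated values.
        if not cuts or boundary > cuts[-1]:
            cuts.append(boundary)
    return cuts


def to_category(value, cuts):
    """Look up the category of a value by binary search (no sorting)."""
    return bisect_right(cuts, value)


# Hypothetical numeric attribute discretized into 4 categories.
incomes = [12, 95, 40, 33, 70, 18, 55, 61]
cuts = global_cut_points(incomes, 4)
categories = [to_category(v, cuts) for v in incomes]

# A newly arriving record is categorized with the existing cut points.
new_record_category = to_category(50, cuts)
```

In this sketch the cut points stay fixed when new data arrives. The method in the abstract goes further and recreates the affected categories, then restructures the tree to match.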

Trend of Research and Industry-Related Analysis in Data Quality Using Time Series Network Analysis (시계열 네트워크분석을 통한 데이터품질 연구경향 및 산업연관 분석)

  • Jang, Kyoung-Ae; Lee, Kwang-Suk; Kim, Woo-Je
    • KIPS Transactions on Software and Data Engineering / v.5 no.6 / pp.295-306 / 2016
  • The purpose of this paper is both to analyze research trends and to predict industrial flows using the metadata of previous studies on data quality. There have been many recent attempts to analyze research trends in various fields. However, analyses of previous studies on data quality have produced poor results because of the field's vast scope and volume of data. Therefore, in this paper, we use text mining and social network analysis to carry out a time-series network analysis of data quality research, based on papers published in international data quality journals over 10 years and collected from the Web of Science index database. The analysis results are as follows. Research decreased in Mathematical & Computational Biology, Chemistry, Health Care Sciences & Services, Biochemistry & Molecular Biology, and Medical Information Science. In contrast, it increased in Environmental Sciences, Water Resources, Geology, and Instruments & Instrumentation. In addition, the social network analysis shows that the subjects with high centrality are analysis, algorithm, and network, and that image, model, sensor, and optimization are growing subjects in the data quality field. Furthermore, the industrial connection analysis on data quality shows a high correlation between technique, industry, health, infrastructure, and customer service. It also predicts that Environmental Sciences, Biotechnology, and the Health Industry will continue to develop. This paper will be useful not only to people working in the data quality industry but also to researchers who analyze research patterns and investigate industry connections in data quality.

A Study on the Research Trends on Domestic Platform Government using Topic Modeling (토픽 모델링을 활용한 한국의 플랫폼정부 연구동향 분석)

  • Suh, Byung-Jo; Shin, Sun-Young
    • Informatization Policy / v.24 no.3 / pp.3-26 / 2017
  • The amount of unstructured data generated online is increasing exponentially, and text data is being analyzed in various fields. To identify research trends on the platform government, the title, year, academic society, and abstract of academic papers on the subject were collected from the domestic paper database DBPIA (www.dbpia.co.kr). The results of existing research on the platform government and related fields were analyzed according to each stage of national informatization promotion. Technology, service, and governance topics were extracted from papers on the platform government, and the trends of core topics were analyzed by year. As we enter the era of the intelligent information society, this study is significant in providing a basis for defining a new role of government: the platform government, which sets the stage for the private sector to lead innovation and plays the role of an 'enabler' and 'facilitator' instead. The purpose of this study is to understand platform government research through an objective analysis of its trends. Looking toward future directions, this study will contribute to future research by providing reference materials.