• Title/Summary/Keyword: Spatial-Temporal Data Mining


A Study on the CBR Pattern using Similarity and the Euclidean Calculation Pattern (유사도와 유클리디안 계산패턴을 이용한 CBR 패턴연구)

  • Yun, Jong-Chan;Kim, Hak-Chul;Kim, Jong-Jin;Youn, Sung-Dae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.4
    • /
    • pp.875-885
    • /
    • 2010
  • CBR (Case-Based Reasoning) is a technique for inferring relationships between existing data and new case data, and similarity and Euclidean-distance calculations are the methods most frequently used for it. However, because those methods compare every existing case against the new case, they have the drawback of requiring considerable time for data search and filtering. Various studies have been conducted to solve this problem. This paper proposes an SE (Speed Euclidean-distance) calculation method that exploits patterns discovered in the existing process of computing similarity and Euclidean distance. Because SE calculation applies the patterns and weights found while new cases are being entered, it enables fast data extraction and short operation times, improving computing speed under temporal or spatial constraints and eliminating unnecessary computation. The experiments show that the proposed method improves performance and processing rate in various computing environments more efficiently than the existing methods that extract data using similarity or Euclidean distance.
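The baseline retrieval step that the paper sets out to accelerate can be illustrated with a minimal sketch; the case base, feature weights, and helper function below are hypothetical stand-ins, not the authors' SE implementation.

```python
import numpy as np

def weighted_euclidean(query, cases, weights):
    """Rank stored cases by weighted Euclidean distance to a query case.

    query:   (d,) feature vector for the new case
    cases:   (n, d) matrix of stored cases
    weights: (d,) per-feature weights (all ones reduces to plain Euclidean distance)
    """
    diffs = cases - query                              # broadcast over all stored cases
    dists = np.sqrt((weights * diffs ** 2).sum(axis=1))
    return np.argsort(dists), dists

# Toy case base: each row is a stored case, each column a numeric feature.
cases = np.array([[1.0, 2.0], [3.0, 4.0], [0.5, 1.5]])
order, dists = weighted_euclidean(np.array([1.0, 1.0]), cases, np.ones(2))
print(order[0], dists[order[0]])  # nearest stored case and its distance
```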

Kriging Analysis for Spatio-temporal Variations of Ground Level Ozone Concentration

  • Gorai, Amit Kumar;Jain, Kumar Gourav;Shaw, Neha;Tuluri, Francis;Tchounwou, Paul B.
    • Asian Journal of Atmospheric Environment
    • /
    • v.9 no.4
    • /
    • pp.247-258
    • /
    • 2015
  • Exposure to high concentrations of ground-level ozone (GLO) can trigger a variety of health problems, including chest pain, coughing, throat irritation, asthma, bronchitis, and congestion. Substantial human and animal toxicological data support health effects associated with ozone exposure, and associations with a wide range of outcomes have been observed in epidemiological studies. The aim of the present study is to estimate the spatial distribution of GLO using a geostatistical method (ordinary kriging) in order to assess ozone exposure levels in the eastern part of Texas, U.S.A. GLO data were obtained from 63 U.S. EPA monitoring stations distributed across the study region for the period January 2012 to December 2012. Descriptive statistics indicate that the spatial monthly mean of the daily maximum 8-hour ozone concentration ranged from 30.33 ppb (in January) to 48.05 ppb (in June). The monthly means were relatively low during the winter months (December, January, and February), and higher values were observed during the summer months (April, May, and June). Higher levels of spatial variation were observed in July (standard deviation: 10.33) and August (standard deviation: 10.02), indicating regional variations in climatic conditions within the study area. The range of the semivariogram models varied from 0.372 (in November) to 15.59 (in April); the range value represents the spatial pattern of ozone concentrations. Kriging maps revealed that the spatial patterns of ozone concentration were not uniform from month to month, which may be due to uneven fluctuations in local climatic conditions from one region to another; thus, the formation and dispersion processes of ozone also change unevenly across regions. The ozone maps clearly indicate that concentrations were highest in the north-east of the study area in most months, and part of the coastal area also showed maximum concentrations during October, November, December, and January.
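The ordinary-kriging interpolation behind such maps can be sketched from first principles; the spherical variogram parameters, station coordinates, and ozone values below are made-up illustrations, not the models fitted in the paper.

```python
import numpy as np

def spherical(h, nugget=0.0, psill=1.0, rng=1.0):
    """Spherical semivariogram model gamma(h)."""
    h = np.asarray(h, dtype=float)
    g = np.where(h <= rng,
                 nugget + psill * (1.5 * h / rng - 0.5 * (h / rng) ** 3),
                 nugget + psill)
    return np.where(h == 0.0, 0.0, g)

def ordinary_kriging(xy, z, x0, **vario):
    """Estimate the value at location x0 from samples (xy, z) by ordinary kriging."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)  # pairwise distances
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical(d, **vario)
    A[n, n] = 0.0                                                 # unbiasedness constraint row/col
    b = np.ones(n + 1)
    b[:n] = spherical(np.linalg.norm(xy - x0, axis=1), **vario)
    w = np.linalg.solve(A, b)                                     # kriging weights + Lagrange multiplier
    return w[:n] @ z

# Toy example: monthly-mean ozone (ppb) at four hypothetical stations.
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([35.0, 42.0, 38.0, 47.0])
print(ordinary_kriging(xy, z, np.array([0.5, 0.5]), psill=25.0, rng=2.0))
```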

Spatio-temporal Analysis of Urban Population Exposure to Traffic-Related Air Pollution (교통흐름에 기인하는 미세먼지 노출 도시인구에 대한 시.공간적 분석)

  • Lee, Keum-Sook
    • Journal of the Economic Geographical Society of Korea
    • /
    • v.11 no.1
    • /
    • pp.59-77
    • /
    • 2008
  • The purpose of this study is to investigate the impact of traffic-related air pollution on the urban population of the Metropolitan Seoul area. In particular, this study analyzes urban population exposure to traffic-related particulate matter (PM). For this purpose, the study examines the relationships between traffic flows and PM concentration levels over the last fifteen years. Traffic volumes have decreased significantly in Seoul in recent years; however, PM levels have declined less than traffic volumes. This may be related to the rapid growth in population and vehicle numbers in Gyeonggi, on the outskirts of Seoul, where several New Towns were developed in the mid-1990s. The spatial pattern of commuting has changed, and travel distances and traffic volumes have increased along the main roads connecting the CBDs of Seoul and the New Towns, which consist of large residential apartment complexes. These changes in traffic flows and travel behavior increase the exposure of the urban population to traffic-related air pollution across the Metropolitan Seoul area. GIS techniques are applied to comprehensively analyze the spatial patterns of traffic flows, population distributions, PM distributions, and passenger flows. This study also analyzes real-time traffic flow data and passenger flow data obtained from the T-card transaction database using data mining techniques, and attempts to develop a space-time model for assessing journey-time exposure to traffic-related air pollutants based on a travel passenger frequency distribution function. The results of this study have implications for sustainable transport systems, public health, and transportation policy aimed at reducing urban air pollution and road traffic in the Metropolitan Seoul area.
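Journey-time exposure of the kind described can be sketched as a time-weighted sum of concentrations along a trip; the segment concentrations, durations, and passenger counts below are hypothetical illustrations, not values or the model from the study.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    pm_ugm3: float    # PM concentration along the segment (micrograms per cubic meter)
    minutes: float    # time spent on the segment
    passengers: int   # passengers travelling the segment (e.g. from T-card counts)

def journey_exposure(segments):
    """Per-trip exposure: concentration times duration, summed over segments."""
    return sum(s.pm_ugm3 * s.minutes for s in segments)

def population_exposure(segments):
    """Population-weighted exposure: per-segment exposure scaled by passenger counts."""
    return sum(s.pm_ugm3 * s.minutes * s.passengers for s in segments)

# Hypothetical commute: two road segments and one transfer area.
trip = [Segment(85.0, 20, 1200), Segment(60.0, 15, 900), Segment(40.0, 5, 300)]
print(journey_exposure(trip), population_exposure(trip))
```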


Digital Gravity Anomaly Map of KIGAM (한국지질자원연구원 디지털 중력 이상도)

  • Lim, Mutaek;Shin, Younghong;Park, Yeong-Sue;Rim, Hyoungrea;Ko, In Se;Park, Changseok
    • Geophysics and Geophysical Exploration
    • /
    • v.22 no.1
    • /
    • pp.37-43
    • /
    • 2019
  • We present gravity anomaly maps based on KIGAM's gravity data measured from 2000 to 2018. Until 2016, we acquired gravity data at about 6,400 points for the purpose of regional mapping covering the whole country, with a data density of at least one point per 4 km × 4 km to reduce acquisition time. In addition, we performed local gravity surveys for mining development in and around the NMC Moland Mine at Jecheon in 2013 and in the Taebaeksan mineralized zone from 2015 to 2018, with data intervals of several hundred meters to 2 km. Meanwhile, we carried out precise gravity surveys with a data interval of about 250 m on and around the epicenter areas of the relatively large Gyeongju and Pohang earthquakes, which occurred in 2016 and 2017, respectively. As a result, we acquired about 9,600 data points in total. We also used additional data acquired by Pusan National University for some local areas. Finally, more than 16,000 gravity data points, excluding repeated measurements and temporal control points, were available to calculate free-air, Bouguer, and isostatic gravity anomalies. The presented anomaly maps are therefore the most advanced to date in Korea in terms of spatial coverage and the number of data points used.
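The anomaly types named here follow standard gravity reductions; the sketch below uses the GRS80 normal-gravity formula and textbook free-air and Bouguer slab corrections with a made-up station, and is not KIGAM's processing chain.

```python
import numpy as np

def normal_gravity_mgal(lat_deg):
    """GRS80 normal gravity on the ellipsoid (Somigliana formula), in mGal."""
    s2 = np.sin(np.radians(lat_deg)) ** 2
    return 978032.67715 * (1 + 0.001931851353 * s2) / np.sqrt(1 - 0.00669438002290 * s2)

def free_air_anomaly(g_obs_mgal, lat_deg, h_m):
    """Observed gravity minus normal gravity, with the free-air correction (+0.3086 mGal/m)."""
    return g_obs_mgal - normal_gravity_mgal(lat_deg) + 0.3086 * h_m

def simple_bouguer_anomaly(g_obs_mgal, lat_deg, h_m, density=2670.0):
    """Free-air anomaly minus the Bouguer slab correction 2*pi*G*rho*h (density in kg/m^3)."""
    slab = 2 * np.pi * 6.674e-11 * density * h_m * 1e5   # m/s^2 converted to mGal
    return free_air_anomaly(g_obs_mgal, lat_deg, h_m) - slab

# Hypothetical station: latitude 36.5 N, elevation 120 m, observed gravity 979825 mGal.
print(free_air_anomaly(979825.0, 36.5, 120.0), simple_bouguer_anomaly(979825.0, 36.5, 120.0))
```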

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.141-154
    • /
    • 2019
  • Internet technology and social media are growing rapidly, and data mining technology has evolved to enable unstructured document representation in a variety of applications. Sentiment analysis is an important technology that can distinguish poor from high-quality content through the text data of products, and it has proliferated along with text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined categories such as positive and negative. It has been studied from many angles with respect to accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels, and it is one of the most active research topics in natural language processing and text mining. Online reviews are openly available and easy to collect, and they directly affect business: in marketing, real-world information from customers is gathered from websites rather than surveys, and whether a website's posts are positive or negative is reflected in sales. However, many reviews on a website are not well written and are difficult to classify. Earlier studies in this area used review data from the Amazon.com shopping mall, while recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. Accuracy remains a recognized problem, because sentiment scores change with the subject, the paragraph, the orientation of the sentiment lexicon, and sentence strength. This study classifies the polarity of sentiment analysis into positive and negative categories and aims to increase the prediction accuracy of polarity analysis using the IMDB review data set. First, for the text classification algorithms related to sentiment analysis, popular machine learning algorithms such as NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and gradient boosting are adopted as comparative models. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data; representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). CNN can be used similarly to BoW when processing a sentence in vector form, but it does not consider the sequential nature of the data. RNN handles ordered data well because it takes temporal information into account, but it suffers from the long-term dependency problem; LSTM is used to solve this problem. For comparison, CNN and LSTM were chosen as simple deep learning models, and in addition to the classical machine learning algorithms, CNN, LSTM, and the integrated model were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination, and tried to understand how well these models work for sentiment analysis and why. This study proposes an integrated CNN-LSTM algorithm to extract the positive and negative features of text. The reasons for combining these two algorithms are as follows. CNN can automatically extract features for classification by applying convolution layers with massively parallel processing, whereas LSTM is not capable of highly parallel processing.
Like faucets, the LSTM's input, output, and forget gates can be opened and closed at desired times, and these gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the RNN's long-term dependency problem. Furthermore, when LSTM is placed after CNN's pooling layer, the model has an end-to-end structure in which spatial and temporal features can be learned simultaneously. The combined CNN-LSTM model achieved 90.33% accuracy; it is slower than CNN but faster than LSTM, and the presented model was more accurate than the other models. In addition, the word embedding layer can be improved as the kernels are trained step by step. CNN-LSTM compensates for the weaknesses of each individual model, and the end-to-end structure offers the advantage of layer-by-layer learning. For these reasons, this study seeks to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
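A minimal version of the CNN-followed-by-LSTM architecture described can be sketched in Keras on the IMDB review data set; the vocabulary size, sequence length, filter counts, and training settings below are illustrative choices, not the paper's tuned configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, MAXLEN = 10000, 300

# Keras ships the IMDB movie-review dataset with binary sentiment labels.
(x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.imdb.load_data(num_words=VOCAB)
x_tr = tf.keras.preprocessing.sequence.pad_sequences(x_tr, maxlen=MAXLEN)
x_te = tf.keras.preprocessing.sequence.pad_sequences(x_te, maxlen=MAXLEN)

# Conv1D extracts local n-gram features; LSTM then models their order after pooling.
model = tf.keras.Sequential([
    layers.Embedding(VOCAB, 128),
    layers.Conv1D(64, 5, activation="relu"),
    layers.MaxPooling1D(4),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),   # positive vs. negative polarity
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_tr, y_tr, epochs=2, batch_size=128, validation_data=(x_te, y_te))
```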

Index-based Searching on Timestamped Event Sequences (타임스탬프를 갖는 이벤트 시퀀스의 인덱스 기반 검색)

  • 박상현;원정임;윤지희;김상욱
    • Journal of KIISE: Databases
    • /
    • v.31 no.5
    • /
    • pp.468-478
    • /
    • 2004
  • It is essential in various application areas of data mining and bioinformatics to effectively retrieve the occurrences of interesting patterns from sequence databases. For example, consider a network event management system that records the types and timestamp values of events occurring in a specific network component (e.g., a router). A typical query to find temporal causal relationships among network events is as follows: 'Find all occurrences of CiscoDCDLinkUp that are followed by MLMStatusUP and subsequently followed by TCPConnectionClose, under the constraint that the interval between the first two events is not larger than 20 seconds and the interval between the first and third events is not larger than 40 seconds.' This paper proposes an indexing method that enables such queries to be answered efficiently. Unlike previous methods, which rely on inefficient sequential scans or on data structures not easily supported by DBMSs, the proposed method uses a multi-dimensional spatial index, proven to be efficient in both storage and search, to find the answers quickly without false dismissals. Given a sliding window W, the input to the multi-dimensional spatial index is an n-dimensional vector whose i-th element is the interval between the first event of W and the first occurrence of event type Ei in W. Here, n is the number of event types that can occur in the system of interest. The 'curse of dimensionality' problem may arise when n is large; therefore, we use dimension selection or event-type grouping to avoid it. The experimental results reveal that our proposed technique can be a few orders of magnitude faster than the sequential scan and ISO-Depth index methods.
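The window-to-vector mapping described can be sketched directly; the event log, event types, and window length below are hypothetical, and the sketch omits the paper's full indexing structure (no spatial index, dimension selection, or event-type grouping is shown).

```python
def window_vector(events, event_types, window_sec):
    """Map a sliding window of (timestamp, type) events to an n-dimensional vector.

    The i-th element is the offset (seconds) from the window's first event to the
    first occurrence of event_types[i]; types absent from the window get None
    (a real index would substitute a sentinel such as the window length).
    """
    if not events:
        return None
    t0 = events[0][0]
    window = [(t, e) for t, e in events if t - t0 <= window_sec]
    vec = []
    for etype in event_types:
        first = next((t for t, e in window if e == etype), None)
        vec.append(None if first is None else first - t0)
    return vec

# Hypothetical event log: (timestamp in seconds, event type).
log = [(0, "CiscoDCDLinkUp"), (12, "MLMStatusUP"), (35, "TCPConnectionClose")]
types = ["CiscoDCDLinkUp", "MLMStatusUP", "TCPConnectionClose"]
print(window_vector(log, types, window_sec=40))   # [0, 12, 35]
```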