• Title/Summary/Keyword: temporal mining

Digital Gravity Anomaly Map of KIGAM (한국지질자원연구원 디지털 중력 이상도)

  • Lim, Mutaek;Shin, Younghong;Park, Yeong-Sue;Rim, Hyoungrea;Ko, In Se;Park, Changseok
    • Geophysics and Geophysical Exploration
    • /
    • v.22 no.1
    • /
    • pp.37-43
    • /
    • 2019
  • We present gravity anomaly maps based on KIGAM's gravity data measured from 2000 to 2018. Until 2016, we acquired gravity data at about 6,400 points for regional mapping of the whole country, with a data density of at least one point per $4\,\mathrm{km} \times 4\,\mathrm{km}$ to reduce data acquisition time. In addition, we performed local gravity surveys for mining development in and around the NMC Moland Mine at Jecheon in 2013 and in the Taebaeksan mineralized zone from 2015 to 2018, with data intervals of several hundred meters to 2 km. We also carried out precise gravity surveys, with a data interval of about 250 m, in and around the epicentral areas of the relatively large Gyeongju and Pohang earthquakes, which occurred in 2016 and 2017, respectively. In total, we thus acquired data at about 9,600 points. We also used additional data acquired by Pusan National University for some local areas. Finally, more than 16,000 gravity data points, excluding repeated measurements and temporal control points, were available for calculating free-air, Bouguer, and isostatic gravity anomalies. The presented anomaly maps are therefore the most advanced so far in Korea in terms of spatial distribution and the number of data points used.

Online Privacy Protection: An Analysis of Social Media Reactions to Data Breaches (온라인 정보 보호: 소셜 미디어 내 정보 유출 반응 분석)

  • Seungwoo Seo;Youngjoon Go;Hong Joo Lee
    • Knowledge Management Research
    • /
    • v.25 no.1
    • /
    • pp.1-19
    • /
    • 2024
  • This study analyzed changes in the social media reactions of data subjects to major personal data breach incidents in South Korea from January 2014 to October 2022. We collected a total of 1,317 posts written on Naver Blogs within the week immediately following each incident. Applying the LDA topic modeling technique to these posts, we identified five main topics, including personal data breaches, hacking, and information technology. Analyzing the temporal changes in topic distribution, we found that immediately after a data breach incident, the proportion of topics directly mentioning the incident was the highest. As time passed, however, the proportion of mentions indirectly related to the personal data breach increased. This suggests that the attention of data subjects shifts from the specific incident to related topics over time, and that interest in personal data protection also decreases. The findings imply a need for future research on changes in the privacy awareness of data subjects following personal data breach incidents.

International Research Trend on Mountainous Sediment-related Disasters Induced by Earthquakes (지진 유발 산지토사재해 관련 국외 연구동향 분석)

  • Lee, Sang-In;Seo, Jung-Il;Kim, Jin-Hak;Ryu, Dong-Seop;Seo, Jun-Pyo;Kim, Dong-Yeob;Lee, Chang-Woo
    • Journal of Korean Society of Forest Science
    • /
    • v.106 no.4
    • /
    • pp.431-440
    • /
    • 2017
  • The 2016 Gyeongju Earthquake ($M_L$ 5.8, September 12, 2016) and the 2017 Pohang Earthquake ($M_L$ 5.4, November 15, 2017) caused unprecedented damage in South Korea. It is therefore necessary to establish basic data on earthquake-induced mountainous sediment-related disasters worldwide. In this study, we analyzed previous international studies on earthquake-induced mountainous sediment-related disasters, classified research areas by research theme using text mining and co-word analysis in the VOSviewer program, and finally examined spatio-temporal research trends by research area. The results showed that related research has increased rapidly since 2005, which seems to have been influenced by recent large-scale earthquakes that occurred in China, Taiwan and Japan. In addition, the research on mountainous sediment-related disasters induced by earthquakes was classified into four subjects: (i) mechanisms of disaster occurrence; (ii) rainfall parameters controlling disaster occurrence; (iii) prediction of potential disaster areas using aerial and satellite photographs; and (iv) disaster risk mapping through the modeling of disaster occurrence. These research areas are considered to be strongly correlated with each other. After the threshold year (i.e., 2012-2013), when the cumulative number of research papers reached 50% of all research papers published since 1987, the proportions per unit year of all research areas increased. In particular, the proportion of research on the prediction of potential disaster areas using aerial and satellite photographs increased markedly compared with the other three research areas. These trends are driven largely by the rapidly increasing number of research papers with study sites in China, and research papers on Taiwan, Japan, and the United States have also contributed to the increases in all research areas. The results could be used as basic data for setting future research directions on mountainous sediment-related disasters induced by earthquakes in South Korea.
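The co-word analysis mentioned in this abstract rests on counting how often two terms appear in the same paper. The following is only a minimal sketch of that counting step in Python. It is not VOSviewer's own procedure (VOSviewer additionally normalizes and clusters the counts), and the function name `co_word_counts` and the toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_word_counts(keyword_lists):
    """Count how many papers contain each unordered pair of keywords."""
    pair_counts = Counter()
    for keywords in keyword_lists:
        # Lowercase and de-duplicate so a pair is counted once per paper.
        unique = sorted({k.lower() for k in keywords})
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical keyword lists, one per paper (illustration only).
papers = [
    ["earthquake", "landslide", "rainfall"],
    ["Earthquake", "landslide", "remote sensing"],
    ["landslide", "risk mapping"],
]
counts = co_word_counts(papers)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Pairs with high counts are the ones a co-word map would draw as strongly linked terms.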

Analysis of Research Trends on Mountain Streams in the Republic of Korea: Comparison to International Research Trends (산지하천을 대상으로 한 국내 연구동향 분석: 국제 연구동향과의 비교)

  • Lee, Sang In;Seo, Jung Il;Lee, Yohan;Kim, Suk Woo;Chun, Kun Woo
    • Korean Journal of Environment and Ecology
    • /
    • v.33 no.2
    • /
    • pp.216-227
    • /
    • 2019
  • The purpose of this study is to propose a rational mountain stream management strategy that considers the natural conditions and social needs of the Republic of Korea. We reviewed domestic and overseas studies related to mountain streams, identified the study areas by text mining and co-word analysis using the VOSviewer program, and then analyzed the spatial and temporal study trends and topics of each study area. The results showed that domestic studies on mountain streams are still at an initial stage compared to overseas studies. Overseas studies on mountain streams can be classified into four groups: (i) habitat and species composition of fish and invertebrates, (ii) hydrological phenomena and nutrient migration, (iii) transport of sediment and organic materials and the associated morphological changes caused by runoff flows, and (iv) plant species composition in mountain streams. Among these subjects, domestic studies in group (i) mainly focused on macroinvertebrates, while domestic studies in group (iii) regarded the transport of sediment and organic materials not as an ecological disturbance but as a source of sediment-related disasters. We then analyzed the proportion of each research group among all papers by period and country. The results showed that overseas studies in groups (iii) and (iv) have increased with time, and that the increase was mostly due to studies in the United States, Brazil, Canada, and China. Domestic studies in groups (i) and (iii), on the other hand, increased somewhat with time, but there was little connection between the two subjects. Therefore, hybrid studies that address this gap will be needed in the future.

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Mode (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.141-154
    • /
    • 2019
  • Internet technology and social media are growing rapidly. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish low-quality from high-quality content through text data about products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning them to predefined categories such as positive and negative. It has been studied in various directions in terms of accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. Because real online reviews are openly available to others, they are not only easy to collect but also affect business. In marketing, real-world information from customers is gathered from websites rather than surveys. Whether a website's posts are positive or negative is reflected in sales, so companies try to identify this information. However, many reviews on a website are not always good, and they are difficult to identify. Earlier studies in this research area used review data from the Amazon.com shopping mall, but recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, etc. However, a lack of accuracy is recognized because sentiment calculations change according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity of sentiment into positive and negative categories and to increase the prediction accuracy of polarity analysis using the pretrained IMDB review data set.
First, popular machine learning algorithms such as NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boost are adopted as comparative text classification models for sentiment analysis. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but it does not consider the sequential attributes of data. RNN handles ordered data well because it takes the time information of the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve this problem. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination. We also tried to figure out how well the models work for sentiment analysis and how they work. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features of text. The reasons for combining these two algorithms are as follows. CNN can extract features for classification automatically by applying convolution layers and massively parallel processing, whereas LSTM is not capable of highly parallel processing. Like faucets, the input, output, and forget gates of LSTM can be opened and closed at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem.
Furthermore, when LSTM is used in CNN's pooling layer, the model has an end-to-end structure, so that spatial and temporal features can be designed simultaneously. The combined CNN-LSTM model achieved 90.33% accuracy. It is slower than CNN but faster than LSTM, and it was more accurate than the other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can compensate for the weaknesses of each model, and it has the advantage of improving learning layer by layer through the end-to-end structure of LSTM. For these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.

Spatio-temporal Distribution of Macrozoobenthos in the Three Estuaries of South Korea (우리나라 3개 하구역 대형저서동물 군집 시공간 분포)

  • LIM, HYUN-SIG;LEE, JIN-YOUNG;LEE, JUNG-HO;SHIN, HYUN-CHUL;RYU, JONGSEONG
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.24 no.1
    • /
    • pp.106-127
    • /
    • 2019
  • This study aims to understand spatio-temporal variations of the macrozoobenthos community in the Han River (HRE), Geum River (GRE), and Nakdong River (NRE) estuaries of Korea, sampled by the National Survey of Marine Ecosystem. The survey was performed seasonally at a total of 20 stations for three years (2015-2017). Sediment samples were taken three times with a van Veen grab with a sampling area of $0.1m^2$ and sieved through a 1 mm pore size mesh on site. A total of 1,008 species were identified, with 602 species in HRE, 612 in GRE, and 619 in NRE, showing a similar number of species among estuaries. Mean density was $1,357ind./m^2$, being highest in NRE ($1,357ind./m^2$), intermediate in GRE ($1,357ind./m^2$), and lowest in HRE ($1,127ind./m^2$). Mean biomass was $116.8g/m^2$, showing a pattern similar to that of density ($174.2g/m^2$ in NRE, $129.0g/m^2$ in GRE, $49.0g/m^2$ in HRE). Polychaeta dominated in number of species and density in all three estuaries. The biomass-dominant taxon was Mollusca in HRE and GRE, and Echinodermata in NRE. Polychaete species, each accounting for over 4% of density, dominated all three estuaries: Dispio oculata, Heteromastus filiformis and Aonides oxycephala in HRE, Heteromastus filiformis and Scoletoma longifolia in GRE, and Pseudopolydora sp. and Aphelochaeta sp. in NRE, with densities varying among estuaries. Community structure was determined by different environmental variables in each estuary: mean grain size and sorting (HRE), salinity and mean grain size (GRE), and salinity, dissolved oxygen, loss on ignition and mud content (NRE). Our study suggests that different measures should be applied to manage the ecosystems of the three estuaries. HRE needs alleviation of sedimentary stressors such as sand mining, land-filling, and dike construction. Management of GRE should focus on fresh water control and sedimentary stressors. In NRE, monitoring of dominant benthos and process studies on hypoxia occurrence in inner Masan Bay are necessary.

Index-based Searching on Timestamped Event Sequences (타임스탬프를 갖는 이벤트 시퀀스의 인덱스 기반 검색)

  • 박상현;원정임;윤지희;김상욱
    • Journal of KIISE:Databases
    • /
    • v.31 no.5
    • /
    • pp.468-478
    • /
    • 2004
  • In various application areas of data mining and bioinformatics, it is essential to effectively retrieve the occurrences of interesting patterns from sequence databases. For example, consider a network event management system that records the types and timestamp values of events that occurred in a specific network component (e.g., a router). A typical query to find temporal causal relationships among the network events is as follows: 'Find all occurrences of CiscoDCDLinkUp that are followed by MLMStatusUP that are subsequently followed by TCPConnectionClose, under the constraint that the interval between the first two events is not larger than 20 seconds, and the interval between the first and third events is not larger than 40 seconds.' This paper proposes an indexing method that answers such queries efficiently. Unlike previous methods that rely on inefficient sequential scans or on data structures not easily supported by DBMSs, the proposed method uses a multi-dimensional spatial index, which is proven to be efficient in both storage and search, to find the answers quickly without false dismissals. Given a sliding window W, the input to the multi-dimensional spatial index is an n-dimensional vector whose i-th element is the interval between the first event of W and the first occurrence of the event type Ei in W. Here, n is the number of event types that can occur in the system of interest. The 'curse of dimensionality' problem may arise when n is large. Therefore, we use dimension selection or event type grouping to avoid this problem. The experimental results reveal that the proposed technique can be a few orders of magnitude faster than the sequential scan and ISO-Depth index methods.
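The window vector described in this abstract can be illustrated with a small Python sketch. It is a sketch under assumptions rather than the authors' implementation: a window is taken to start at every event and to span a fixed number of seconds, an event type absent from the window is encoded as infinity, and a linear filter stands in for the range query that a multi-dimensional spatial index would answer. The function names and the sample event log are hypothetical.

```python
import math


def window_vectors(events, event_types, window_span):
    """Build one vector per window; a window starts at each event.

    events: list of (timestamp, event_type), sorted by timestamp.
    Element i of a vector is the interval from the window's first event to
    the first occurrence of event_types[i] in the window. Types that do not
    occur are encoded as math.inf (an assumption of this sketch).
    """
    type_index = {t: i for i, t in enumerate(event_types)}
    vectors = []
    for start, (t0, _) in enumerate(events):
        vec = [math.inf] * len(event_types)
        for ts, etype in events[start:]:
            if ts - t0 > window_span:
                break
            i = type_index[etype]
            if vec[i] == math.inf:
                vec[i] = ts - t0
        vectors.append((start, vec))
    return vectors


def candidate_windows(vectors, event_types, first_type, max_intervals):
    """Linear stand-in for a range query on the spatial index.

    Returns start positions of windows that begin with first_type and whose
    intervals satisfy every limit. The event order among the later types
    still has to be verified afterwards.
    """
    type_index = {t: i for i, t in enumerate(event_types)}
    hits = []
    for start, vec in vectors:
        if vec[type_index[first_type]] != 0:
            continue
        if all(vec[type_index[t]] <= limit for t, limit in max_intervals.items()):
            hits.append(start)
    return hits


TYPES = ["CiscoDCDLinkUp", "MLMStatusUP", "TCPConnectionClose"]
# Hypothetical event log: (seconds, event type).
log = [
    (0, "CiscoDCDLinkUp"), (5, "MLMStatusUP"), (30, "TCPConnectionClose"),
    (100, "CiscoDCDLinkUp"), (130, "MLMStatusUP"), (135, "TCPConnectionClose"),
]
vectors = window_vectors(log, TYPES, window_span=40)
hits = candidate_windows(
    vectors, TYPES, "CiscoDCDLinkUp",
    {"MLMStatusUP": 20, "TCPConnectionClose": 40},
)
print(hits)
```

In this toy log only the window starting at the first event satisfies both interval limits; the second CiscoDCDLinkUp is followed by MLMStatusUP after 30 seconds, which exceeds the 20-second limit. Because each element records the first occurrence of a type, any true match passes the filter as long as the window span is at least the largest limit.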

Building a Korean Sentiment Lexicon Using Collective Intelligence (집단지성을 이용한 한글 감성어 사전 구축)

  • An, Jungkook;Kim, Hee-Woong
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.49-67
    • /
    • 2015
  • Recently, the emergence of big data and social media has led us into a data "big bang". Social networking services are widely used by people around the world, and they have become a major communication tool for all ages. Over the last decade, as online social networking sites have become increasingly popular, companies have tended to focus on advanced social media analysis for their marketing strategies. In addition to social media analysis, companies are mainly concerned about the propagation of negative opinions on social networking sites such as Facebook and Twitter, as well as on e-commerce sites. Online word of mouth (WOM), such as product ratings, product reviews, and product recommendations, is very influential, and negative opinions have a significant impact on product sales. This trend has increased researchers' attention to natural language processing tasks such as sentiment analysis. Sentiment analysis, also referred to as opinion mining, is a process of identifying the polarity of subjective information and has been applied to various research and practical fields. However, obstacles arise when the Korean language (Hangul) is used in natural language processing, because it is an agglutinative language with rich morphology. Therefore, there is a lack of Korean natural language processing resources such as a sentiment lexicon, and this has resulted in significant limitations for researchers and practitioners who are considering sentiment analysis. Our study builds a Korean sentiment lexicon with collective intelligence and provides an API (Application Programming Interface) service to open and share the sentiment lexicon data with the public (www.openhangul.com). For the pre-processing, we created a Korean lexicon database with over 517,178 words and classified them into sentiment and non-sentiment words.
In order to classify them, we first identified stop words, which are quite likely to play a negative role in sentiment analysis, and excluded them from our sentiment scoring. In general, sentiment words are nouns, adjectives, verbs, and adverbs, as they carry sentimental expressions such as positive, neutral, and negative. On the other hand, non-sentiment words are interjections, determiners, numerals, postpositions, etc., as they generally carry no sentimental expressions. To build a reliable sentiment lexicon, we adopted the concept of collective intelligence as a model for crowdsourcing. In addition, the concept of folksonomy was implemented in the taxonomy process to support collective intelligence. To make up for an inherent weakness of folksonomy, we adopted majority rule by building a voting system. Participants, as voters, were offered three voting options (positive, negative, and neutral), and the voting was conducted on one of the largest social networking sites for college students in Korea. More than 35,000 votes have been cast by college students in Korea, and we keep this voting system open by maintaining the project as a perpetual study. Moreover, any change in the sentiment score of a word can be an important observation because it enables us to keep track of temporal changes in Korean as a natural language. In addition, our study offers a RESTful, JSON-based API service through a web platform to make it easier to support users such as researchers, companies, and developers. Finally, our study makes important contributions to both research and practice. In terms of research, our Korean sentiment lexicon plays an important role as a resource for Korean natural language processing. In terms of practice, practitioners such as managers and marketers can implement sentiment analysis effectively by using the Korean sentiment lexicon we built.
Moreover, our study sheds new light on the value of folksonomy by combining it with collective intelligence, and we expect it to give a new direction and a new start to the development of Korean natural language processing.
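The majority-rule voting described in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual scoring logic of the authors' service: the minimum vote count, the handling of ties, the function names, and the sample votes are all assumptions.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent label, or None when the top labels tie."""
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


def build_lexicon(votes_by_word, min_votes=3):
    """Keep words that have enough votes and a clear majority label."""
    lexicon = {}
    for word, votes in votes_by_word.items():
        if len(votes) < min_votes:
            continue  # too few votes to trust (threshold is an assumption)
        label = majority_label(votes)
        if label is not None:
            lexicon[word] = label
    return lexicon


# Hypothetical votes (positive / negative / neutral) for illustration.
votes_by_word = {
    "좋다": ["positive", "positive", "neutral"],
    "나쁘다": ["negative", "negative", "negative", "neutral"],
    "그냥": ["neutral", "positive"],
    "대박": ["positive", "negative", "positive", "negative"],
}
lexicon = build_lexicon(votes_by_word)
print(lexicon)
```

Re-running the aggregation as new votes arrive is one simple way to observe the temporal changes in word sentiment that the abstract mentions.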