• Title/Summary/Keyword: Extracting characteristics


Intrusion Detection Method Using Unsupervised Learning-Based Embedding and Autoencoder (비지도 학습 기반의 임베딩과 오토인코더를 사용한 침입 탐지 방법)

  • Junwoo Lee;Kangseok Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.8
    • /
    • pp.355-364
    • /
    • 2023
  • As advanced cyber threats continue to increase in recent years, it is difficult to detect new types of cyber attacks with existing pattern- or signature-based intrusion detection methods. Therefore, research on anomaly detection methods using data-learning-based artificial intelligence technology is increasing. In addition, supervised learning-based anomaly detection methods are difficult to use in real environments because they require sufficient labeled data for learning. Research on unsupervised learning-based methods that learn from normal data and detect anomalies by finding patterns in the data itself has therefore been actively conducted. This study aims to extract a latent vector that preserves useful sequence information from sequence log data and to develop an anomaly detection learning model using the extracted latent vector. Word2Vec was used to create a dense vector representation corresponding to the characteristics of each sequence, and an unsupervised autoencoder was developed to extract latent vectors from sequence data expressed as dense vectors. Three autoencoder models were developed: a denoising autoencoder based on the recurrent GRU (Gated Recurrent Unit), which is suitable for sequence data; a one-dimensional convolutional autoencoder, which addresses the limited short-term memory problem a GRU can have; and an autoencoder combining the GRU and one-dimensional convolution. The data used in the experiments are the time-series-based NGIDS (Next Generation IDS Dataset) data. The experiments showed that the autoencoder combining the GRU and one-dimensional convolution outperformed both the GRU-based and the one-dimensional convolution-based models: it was more efficient in terms of training time for extracting useful latent patterns from the training data, and it showed stable anomaly detection performance with smaller fluctuations.
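The detection criterion underlying all three models is the same: an autoencoder trained only on normal data reconstructs normal inputs well and anomalous inputs poorly, so a threshold on reconstruction error flags anomalies. As a minimal sketch of that criterion only (the paper's GRU/Conv1D models and Word2Vec embeddings are not reproduced here), a linear autoencoder fitted via SVD on synthetic data:

```python
import numpy as np

# Minimal sketch of the reconstruction-error criterion behind autoencoder-based
# anomaly detection. A linear autoencoder (top principal axis) stands in for the
# paper's GRU/Conv1D models, purely to illustrate the detection principle.

rng = np.random.default_rng(0)

# "Normal" training data: points near a 1-D subspace of 3-D space.
normal = rng.normal(size=(200, 1)) @ np.array([[1.0, 2.0, 0.5]])
normal += rng.normal(scale=0.05, size=normal.shape)

# Fit a 1-D linear autoencoder: encoder/decoder given by the top singular vector.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
axis = vt[:1]                      # (1, 3) latent basis

def reconstruction_error(x):
    """Encode onto the latent axis, decode, and return squared error."""
    z = (x - mean) @ axis.T        # encode to the 1-D latent space
    x_hat = z @ axis + mean        # decode back to 3-D
    return np.sum((x - x_hat) ** 2, axis=-1)

# Threshold taken from the training-data error distribution (normal data only).
threshold = np.quantile(reconstruction_error(normal), 0.99)

test_normal = np.array([[1.0, 2.0, 0.5]])     # lies on the learned subspace
test_anomaly = np.array([[2.0, -1.0, 4.0]])   # far from it
print(reconstruction_error(test_normal) <= threshold)   # [ True]
print(reconstruction_error(test_anomaly) > threshold)   # [ True]
```

The paper's sequence models replace the linear encoder/decoder with recurrent or convolutional layers, but the threshold-on-error decision rule is the same.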

Extraction of Snowmelt Parameters using NOAA AVHRR and GIS Technique for 7 Major Dam Watersheds in South Korea (NOAA AVHRR 영상 및 GIS 기법을 이용한 국내 주요 7개 댐 유역의 융설 매개변수 추출)

  • Shin, Hyung Jin;Kim, Seong Joon
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.28 no.2B
    • /
    • pp.177-185
    • /
    • 2008
  • Accurate monitoring of snow cover is a key component for studying climate and global change as well as for daily weather forecasting and snowmelt runoff modelling. The scarcity of observed data related to snowmelt was the major cause of difficulty in extracting snowmelt factors such as snow cover area, snow depth, and the depletion curve. Remote sensing technology is very effective for observing wide areas. Although many researchers have used remote sensing for snow observation, there have been few discussions of the characteristics of its spatial and temporal variation. Snow cover maps were derived from NOAA AVHRR images for the winter seasons from 1997 to 2006. Distributed snow depth was mapped by overlaying the snow cover maps with snowfall maps interpolated from 69 meteorological observation stations. Model parameters (Snow Cover Area: SCA; snow depth; Snow Cover Depletion Curve: SDC) were built for 7 major dam watersheds in South Korea. The decrease of SCA over time (days) was expressed as an exponential decay function, and the coefficient of determination ranged from 0.46 to 0.88. The SCA decreased by 70% to 100% from the maximum SCA within 10 days.
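The exponential depletion relationship SCA(t) = SCA_max · exp(-k·t) can be fitted by log-linear least squares. A sketch on synthetic data (the per-watershed coefficients and R² values reported above are not reproduced here):

```python
import numpy as np

# Sketch of fitting an exponential decay SCA(t) = SCA_max * exp(-k t) to a
# snow-cover depletion series via log-linear least squares. The series below is
# synthetic and noiseless; it is not the paper's watershed data.

t = np.arange(0, 11, dtype=float)          # days since maximum SCA
true_k = 0.25
sca = 100.0 * np.exp(-true_k * t)          # snow cover area, percent of watershed

# Linearize: ln SCA = ln SCA_max - k t, then solve by least squares.
A = np.vstack([np.ones_like(t), -t]).T
coef, *_ = np.linalg.lstsq(A, np.log(sca), rcond=None)
sca_max_fit, k_fit = np.exp(coef[0]), coef[1]

# Coefficient of determination of the fit in log space.
pred = A @ coef
ss_res = np.sum((np.log(sca) - pred) ** 2)
ss_tot = np.sum((np.log(sca) - np.log(sca).mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(round(k_fit, 3), round(sca_max_fit, 1), round(r2, 3))  # 0.25 100.0 1.0
```

With real depletion series the fit is noisy, which is why the reported coefficients of determination range from 0.46 to 0.88 rather than 1.0.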

Automatic Extraction of Tree Information in Forest Areas Using Local Maxima Based on Aerial LiDAR (항공 LiDAR 기반 Local Maxima를 이용한 산림지역 수목정보 추출 자동화)

  • In-Ha Choi;Sang-Kwan Nam;Seung-Yub Kim;Dong-Gook Lee
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_4
    • /
    • pp.1155-1164
    • /
    • 2023
  • Currently, the National Forest Inventory (NFI) collects tree information manually, so the range and timing of the survey are limited. Research on extracting tree information over large areas using aerial Light Detection And Ranging (LiDAR) and aerial photographs is being actively conducted, but it does not reflect the characteristics of forest areas in Korea because it is conducted in areas with wide or evenly spaced trees. Therefore, this study proposed a methodology for generating Digital Surface Model (DSM), Digital Elevation Model (DEM), and Canopy Height Model (CHM) images using aerial LiDAR, extracting tree heights through local maxima, and calculating the Diameter at Breast Height (DBH) through a DBH-tree height formula. The detection accuracy of trees extracted through the proposed methodology was 88.46%, 86.14%, and 84.31%, respectively, and the Root Mean Squared Error (RMSE) of the DBH calculated from the tree height formula was around 5 cm, confirming the applicability of the proposed methodology. If standardized research on various types of forests is conducted in the future, the scope of automation of the manual national forest resource survey can be expanded.
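The local-maxima step treats each CHM cell that is the highest value in its moving window as a tree apex, and the apex height then feeds an allometric DBH formula. A sketch on a toy CHM grid; the window test is a straightforward implementation and the allometric coefficients (a, b) are illustrative assumptions, not the paper's calibrated formula:

```python
import numpy as np

# Sketch of tree-top detection by local maxima on a Canopy Height Model (CHM)
# grid, followed by a DBH estimate from tree height. The coefficients in
# dbh_from_height are hypothetical, not the paper's fitted values.

def local_maxima(chm, window=1, min_height=2.0):
    """Return (row, col) cells that are the strict maximum of their window."""
    peaks = []
    r, c = chm.shape
    for i in range(r):
        for j in range(c):
            if chm[i, j] < min_height:          # ignore ground/shrub cells
                continue
            i0, i1 = max(0, i - window), min(r, i + window + 1)
            j0, j1 = max(0, j - window), min(c, j + window + 1)
            patch = chm[i0:i1, j0:j1]
            if chm[i, j] >= patch.max() and (patch == patch.max()).sum() == 1:
                peaks.append((i, j))
    return peaks

def dbh_from_height(h, a=1.2, b=1.1):
    """Hypothetical allometric DBH (cm) from tree height (m): DBH = a * h**b."""
    return a * h ** b

# Toy CHM (m): two crowns with apexes at (1, 1) and (3, 4).
chm = np.array([
    [0.0, 0.5, 0.4, 0.0, 0.0, 0.0],
    [0.5, 12.0, 0.8, 0.0, 0.6, 0.5],
    [0.4, 0.9, 0.7, 0.6, 0.9, 0.8],
    [0.0, 0.0, 0.5, 0.9, 15.0, 1.0],
])

peaks = local_maxima(chm)
print(peaks)                                   # [(1, 1), (3, 4)]
heights = [chm[p] for p in peaks]
print([round(dbh_from_height(h), 1) for h in heights])
```

On real LiDAR-derived CHMs the window size must be matched to expected crown diameter, which is one reason dense Korean forests are harder than the evenly spaced stands of earlier studies.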

Service Quality Evaluation based on Social Media Analytics: Focused on Airline Industry (소셜미디어 어낼리틱스 기반 서비스품질 평가: 항공산업을 중심으로)

  • Myoung-Ki Han;Byounggu Choi
    • Information Systems Review
    • /
    • v.24 no.1
    • /
    • pp.157-181
    • /
    • 2022
  • As competition in the airline industry intensifies, effective evaluation of airline service quality has become one of the main challenges. In particular, as big data analytics has been touted as a new research paradigm, new research on service quality measurement using online review analysis has been attempted. However, these studies do not use review titles in the analysis, rely on supervised learning that requires substantial human intervention, and do not consider airline characteristics when classifying service quality dimensions. To overcome the limitations of existing studies, this study attempts to measure airline service quality and to classify it into the AIRQUAL service quality dimensions using online review text as well as titles, based on self-training and sentiment analysis. The results show how to effectively extract the AIRQUAL service quality dimensions from online reviews and find that each service quality dimension has a significant effect on service satisfaction. Furthermore, the effect of review titles on service satisfaction is also found to be significant. This study sheds new light on service quality measurement in the airline industry by using an advanced analytical approach to analyze the effects of service quality on customer satisfaction. It also helps managers who want to improve customer satisfaction by providing high-quality service in the airline industry.
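Self-training reduces the human labeling burden the abstract mentions: a model trained on a small labeled seed pseudo-labels the unlabeled reviews it is confident about, then retrains on them. A deliberately toy sketch of that loop (the keyword-vote scorer and seed words are illustrative stand-ins, not the paper's classifier or the AIRQUAL dimensions):

```python
# Sketch of the self-training idea: a scorer built from a small labeled seed
# pseudo-labels unlabeled reviews it is confident about, then absorbs their
# vocabulary. The keyword voting below is a toy stand-in for a real classifier.

POSITIVE_SEED = {"fantastic", "friendly", "comfortable"}
NEGATIVE_SEED = {"delayed", "rude", "cramped"}

def classify(text, pos, neg):
    """Return (label, confidence) from keyword votes; (None, 0) if no evidence."""
    words = set(text.lower().split())
    p, n = len(words & pos), len(words & neg)
    if p == n:
        return None, 0.0
    label = "positive" if p > n else "negative"
    return label, abs(p - n) / (p + n)

def self_train(unlabeled, pos, neg, threshold=1.0):
    """One self-training round: absorb the words of confidently labeled reviews."""
    pos, neg = set(pos), set(neg)
    for text in unlabeled:
        label, conf = classify(text, pos, neg)
        if conf >= threshold:                  # only trust confident pseudo-labels
            target = pos if label == "positive" else neg
            target |= set(text.lower().split()) - pos - neg
    return pos, neg

reviews = ["fantastic crew tasty meals", "rude staff lost luggage"]
pos, neg = self_train(reviews, POSITIVE_SEED, NEGATIVE_SEED)
print(classify("tasty meals on board", pos, neg)[0])   # positive
```

After one round the model can label "tasty meals on board" even though no seed word appears in it; the confidence threshold is what keeps noisy pseudo-labels from polluting the training set.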

Vision-based Method for Estimating Cable Tension Using the Stay Cable Shape (사장재 케이블 형태를 이용하여 케이블 장력을 추정하는 영상기반 방법)

  • Jin-Soo Kim;Jae-Bong Park;Deok-Keun Lee;Dong-Uk Park;Sung-Wan Kim
    • Journal of the Korea institute for structural maintenance and inspection
    • /
    • v.28 no.1
    • /
    • pp.98-106
    • /
    • 2024
  • Due to advancements in construction technology and analytical tools, an increasing number of cable-stayed bridges have been designed and constructed in recent years. A cable is a structural element that transmits the main loads of a cable-stayed bridge and plays the most crucial role in reflecting the condition of the entire bridge system. In this study, a vision-based method was applied to estimate the tension of stay cables located at a long distance. To measure the response of a cable using a vision-based method, it is usually necessary to install feature points or targets on the cable. However, depending on the location of the point to be measured, there may be no feature points on the cable, and there may also be limitations in installing targets on it. Hence, a way of measuring cable response that overcomes these limitations of existing vision-based methods is needed. This study proposes a method for measuring cable responses by utilizing the characteristics of the cable shape. The proposed method extracts the cable shape from the acquired image and determines the center of the extracted shape to measure the cable response. The natural frequencies of the vibration modes were extracted from the measured responses, and the tension was estimated by applying them to the vibration method. To verify the reliability of the vision-based method, cable images were obtained from the Hwatae Bridge in service under ambient vibration conditions. The reliability of the proposed method was confirmed by applying it to the vibration method, resulting in estimated tensions with an error of less than 1% compared to tensions estimated using an accelerometer.
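The vibration method in its simplest (taut-string) form relates the n-th natural frequency to tension by f_n = (n / 2L)·√(T/m), so T = 4·m·L²·(f_n/n)². A sketch of that back-calculation; the cable length, mass, and frequency below are illustrative values, not measurements from the Hwatae Bridge:

```python
# Sketch of the vibration (taut-string) method for turning a measured natural
# frequency into cable tension: f_n = (n / (2 L)) * sqrt(T / m), hence
# T = 4 * m * L**2 * (f_n / n)**2. All cable properties below are assumed.

def tension_from_frequency(f_n, n, length, mass_per_length):
    """Tension (N) from the n-th natural frequency (Hz) of a taut cable."""
    return 4.0 * mass_per_length * length ** 2 * (f_n / n) ** 2

L = 100.0        # cable length, m (assumed)
m = 80.0         # mass per unit length, kg/m (assumed)
f1 = 1.1         # first-mode natural frequency, Hz (assumed measurement)

T = tension_from_frequency(f1, 1, L, m)
print(round(T / 1000.0, 1), "kN")   # 3872.0 kN
```

Practical vibration methods refine this formula with corrections for sag and bending stiffness, but the frequency-to-tension relationship above is the core of the estimate.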

Developing the Process and Characteristics of Preservation of Area-Based Heritage Sites in Japan (일본 면형 유산 보존제도의 확산과정과 특성)

  • Sung, Wonseok;Kang, Dongjin
    • Korean Journal of Heritage: History & Science
    • /
    • v.53 no.4
    • /
    • pp.32-59
    • /
    • 2020
  • South Korea's area-based heritage preservation system originates from the "Preservation of Traditional Buildings Act" enacted in 1984. However, this system was abolished in 1996. As there was a need for protection of ancient cities in the 1960s, Japan enacted the Historic City Preservation Act in 1966, and 'Preservation Areas for Historic Landscapes' and 'Special Preservation Districts for Historic Landscapes' were introduced. For the preservation of area-based heritage sites, the 'Important Preservation Districts for Groups of Traditional Buildings' system introduced as part of the revision of the Cultural Heritage Protection Act in 1975 was the beginning. Then, in the early-2000s, discussions on the preservation of area-based heritage sites began in earnest, and the 'Important Cultural Landscape' system was introduced for protection of the space and context between heritage sites. Also, '33 Groups of Modernization Industry Heritage Sites' were designated in 2007, covering various material and immaterial resources related to the modernization of Japan, and '100 Beautiful Historic Landscapes of Japan' were selected for protection of local landscapes with historic value in the same year. In 2015, the "Japanese Heritage" system was established for the integrated preservation and management of tangible and intangible heritage aspects located in specific areas; in 2016, the "Japanese Agricultural Heritage" system was established for the succession and fostering of the disappearing agriculture and fishery industries; and in 2017, "the 20th Century Heritage," was established, representing evidence of modern and contemporary Japanese technologies in the 20th century. 
As a result, at present (September 2020), 30 'Historic Landscape Preservation Areas,' 60 'Historic Landscape Special Districts,' 120 'Important Preservation Districts for Groups of Traditional Buildings,' 65 'Important Cultural Landscapes,' 66 'Groups of Modernization Industry Heritage Sites,' 264 '100 Beautiful Historic Landscapes of Japan,' 104 'Japanese Heritage Sites,' and 15 'Japanese Agricultural Heritage Sites' have been designated. Against this background, this study, whose basic purpose is to extract the general characteristics of Japan's area-based heritage preservation system as it has sequentially spread since 1976, proceeded as follows. First, it investigates Japan's area-based heritage site preservation system and sets the scope of research through a review of the literature and preceding studies. Second, it traces the process by which the area-based heritage site preservation system spread and analyzes the relationships between the systems as they developed, in order to draw out their characteristics. Third, to make the content related to these relationships and characteristics concrete, it analyzes three representative examples in depth and sums them up to identify the characteristics of Japan's area-based heritage system. A noticeable characteristic of Japan's area-based heritage site preservation system drawn from this analysis is that new types of heritage sites are created each year. Consequently, an overlapping phenomenon takes place between heritage sites, and this occurs alongside the revitalization of related industries, traditional industry, and cultural tourism and the improvement of localities, as well as the preservation of area-based heritage. These characteristics can serve as suggestions for the revitalization of the 'modern historical and cultural space' system implemented by South Korea.

Analysis and Performance Evaluation of Pattern Condensing Techniques used in Representative Pattern Mining (대표 패턴 마이닝에 활용되는 패턴 압축 기법들에 대한 분석 및 성능 평가)

  • Lee, Gang-In;Yun, Un-Il
    • Journal of Internet Computing and Services
    • /
    • v.16 no.2
    • /
    • pp.77-83
    • /
    • 2015
  • Frequent pattern mining, which is one of the major areas actively studied in data mining, is a method for extracting useful pattern information hidden from large data sets or databases. Moreover, frequent pattern mining approaches have been actively employed in a variety of application fields because the results obtained from them can allow us to analyze various, important characteristics within databases more easily and automatically. However, traditional frequent pattern mining methods, which simply extract all of the possible frequent patterns such that each of their support values is not smaller than a user-given minimum support threshold, have the following problems. First, traditional approaches have to generate a numerous number of patterns according to the features of a given database and the degree of threshold settings, and the number can also increase in geometrical progression. In addition, such works also cause waste of runtime and memory resources. Furthermore, the pattern results excessively generated from the methods also lead to troubles of pattern analysis for the mining results. In order to solve such issues of previous traditional frequent pattern mining approaches, the concept of representative pattern mining and its various related works have been proposed. In contrast to the traditional ones that find all the possible frequent patterns from databases, representative pattern mining approaches selectively extract a smaller number of patterns that represent general frequent patterns. In this paper, we describe details and characteristics of pattern condensing techniques that consider the maximality or closure property of generated frequent patterns, and conduct comparison and analysis for the techniques. 
Given a frequent pattern, satisfying the maximality for the pattern signifies that all of the possible super sets of the pattern must have smaller support values than a user-specific minimum support threshold; meanwhile, satisfying the closure property for the pattern means that there is no superset of which the support is equal to that of the pattern with respect to all the possible super sets. By mining maximal frequent patterns or closed frequent ones, we can achieve effective pattern compression and also perform mining operations with much smaller time and space resources. In addition, compressed patterns can be converted into the original frequent pattern forms again if necessary; especially, the closed frequent pattern notation has the ability to convert representative patterns into the original ones again without any information loss. That is, we can obtain a complete set of original frequent patterns from closed frequent ones. Although the maximal frequent pattern notation does not guarantee a complete recovery rate in the process of pattern conversion, it has an advantage that can extract a smaller number of representative patterns more quickly compared to the closed frequent pattern notation. In this paper, we show the performance results and characteristics of the aforementioned techniques in terms of pattern generation, runtime, and memory usage by conducting performance evaluation with respect to various real data sets collected from the real world. For more exact comparison, we also employ the algorithms implementing these techniques on the same platform and Implementation level.
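The maximality and closure definitions can be made concrete on a toy transaction database. The brute-force enumeration below is only for illustrating the two filters; real miners avoid enumerating all frequent itemsets:

```python
from itertools import combinations

# Sketch of the maximal vs. closed frequent-pattern definitions on a toy
# transaction database: a frequent itemset is maximal if no frequent superset
# exists, and closed if no superset has the same support.

transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a"},
]
min_support = 2
items = sorted(set().union(*transactions))

def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

# Enumerate all frequent itemsets (fine for a toy database).
frequent = [
    frozenset(c)
    for r in range(1, len(items) + 1)
    for c in combinations(items, r)
    if support(set(c)) >= min_support
]

closed = [p for p in frequent
          if not any(p < q and support(q) == support(p) for q in frequent)]
maximal = [p for p in frequent if not any(p < q for q in frequent)]

print(sorted(sorted(p) for p in frequent))   # 5 frequent itemsets
print(sorted(sorted(p) for p in closed))     # 3 closed itemsets
print(sorted(sorted(p) for p in maximal))    # 2 maximal itemsets
```

Here {b} and {c} are pruned from the closed set because {a,b} and {a,c} have the same supports, while the maximal set keeps only {a,b} and {a,c}: smaller than the closed set, but supports of subsets can no longer be recovered, matching the lossless-vs-compact trade-off described above.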

A study on characteristics of palace wallpaper in the Joseon Dynasty - Focusing on Gyeongbokgung Palace, Changdeokgung Palace and Chilgung Palace - (조선시대 궁궐 도배지 특성 연구 - 경복궁, 창덕궁, 칠궁을 중심으로 -)

  • KIM Jiwon;KIM Jisun;KIM, Myoungnam;JEONG Seonhwa
    • Korean Journal of Heritage: History & Science
    • /
    • v.56 no.1
    • /
    • pp.80-97
    • /
    • 2023
  • By taking wallpaper specimens from Gyeongbokgung Palace, Changdeokgung Palace, and Chilgung Palace preserved from the late Joseon Dynasty to the present, this study aimed to determine the types and characteristics of the paper used as wallpaper by the Joseon royal family. First, we confirmed the features of paper hanging in the palaces from old literature on the wallpaper used by the royal family, based on archival research. Second, we conducted a field survey of the royal palaces whose construction periods are relatively clear, and analyzed the first layer of wallpaper directly attached to the wall structure after sampling the specimens. We thereby confirmed that the main raw material of the wallpaper used by the royal family was hanji, and identified the types of substances (dyes and pigments) used to produce a blue color in spaces requiring formality by analyzing the blue-colored paper. Based on the results of the analysis, we compared the existing wallpaper with the old literature related to wallpaper records of the Joseon Dynasty palaces. We also built a database for the restoration of cultural properties to support conservation of the wallpaper in the royal palaces. We examined the changes in wallpaper types by century, and their content according to place of use, by extracting wallpaper-related contents recorded in 36 cases of Uigwe from the 17th to 20th centuries. As a result, it was found that the names used for document paper and wallpaper did not differ; thus, document paper and wallpaper were used without distinction during the Joseon Dynasty. 
Although the types of wallpaper differ by period, it was confirmed that the foundation of wallpaper continued until the late Joseon Dynasty, with Baekji (white hanji), Hubaekji (thick white paper), jeojuji (common hanji used to write documents), chojuji (hanji used as a draft for writing documents), and Gakjang (a wide, thick hanji used as a pad). As a result of fiber identification based on the morphological characteristics of the fibers and the normal color reaction (KS M ISO 9184-4: Graph "C" staining test) for the first layer of paper directly attached to the palace walls, the main materials of the hanji used by the royal family were confirmed, and the raw materials used to make the hanji in palace buildings of each construction period were determined. Also, from analysis of the coloring materials of the blue decorative paper with an optical microscope, ultraviolet-visible spectroscopy (UV-Vis), and X-ray diffraction (XRD), we determined the types of dyes and pigments used for the blue decorative paper in palace spaces requiring formality, and identified the raw materials used to produce the blue color as natural indigo, lazurite, and cobalt blue.

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.111-136
    • /
    • 2018
  • In this paper, we propose a methodology to extract answer information for queries from various types of unstructured documents collected from multiple web sources in order to expand a knowledge base. The proposed methodology consists of the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news sources for "subject-predicate" separated queries and classify the proper documents. 2) Determine whether each sentence is suitable for extracting information and derive a confidence score. 3) Based on the predicate feature, extract the information from the suitable sentences and derive the overall confidence of the extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from SK Telecom's artificial intelligence speaker. Compared with the baseline model, the proposed system showed higher performance. The contribution of this study is that we developed a sequence tagging model based on bi-directional LSTM-CRF using the predicate feature of the query, and with it a robust model that maintains high recall even on the various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types. The proposed methodology proved to extract information effectively from various types of unstructured documents compared to the baseline model, whereas previous research suffered poor performance when extracting information from document types different from the training data. 
In addition, this study prevents unnecessary information extraction attempts on documents that do not include the answer, through the step that predicts the suitability of documents and sentences for information extraction before extraction is attempted. It is meaningful that we provide a method by which precision can be maintained even in a real web environment. Information extraction for knowledge base expansion targets unstructured documents on the real web, so it cannot be guaranteed that a document includes the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies show a low level of precision because they frequently attempt to extract an answer even from documents that contain no correct answer. The policy of predicting the suitability of documents and sentences for extraction is meaningful in that it contributes to maintaining extraction performance in a real web environment. The limitations of this study and future research directions are as follows. First, there is a problem related to data preprocessing. In this study, the units of knowledge extraction are obtained through morphological analysis based on the open-source KoNLPy Python package, and the extraction result can be degraded when the morphological analysis is not performed properly. To enhance the performance of the extraction results, it is necessary to develop an advanced morphological analyzer. Second, there is the problem of entity ambiguity. The information extraction system of this study cannot distinguish entities that share the same name but have different referents. If several people with the same name appear in the news, the system may not extract information about the intended query. 
In future research, it is necessary to take measures to disambiguate entities with the same name. Third, there is the problem of the evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system, and developed an evaluation data set of 2,800 documents (400 questions * 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether each includes a correct answer. To ensure the external validity of the study, it is desirable to use more queries to assess the performance of the system; this is a costly activity that must be done manually, so future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction over queries on multi-source web documents, to build an environment in which results can be evaluated more objectively.
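The suitability-before-extraction policy described above can be sketched as a three-step pipeline: score each candidate sentence for a (subject, predicate) query, and attempt extraction only when the score clears a threshold. The rule-based scorer and span extractor below are toy stand-ins for the paper's classifiers and BiLSTM-CRF tagger; the query, cue phrases, and documents are invented for illustration:

```python
# Sketch of the pipeline: (1) gather candidate sentences, (2) predict whether a
# sentence is suitable for extraction, (3) extract only from suitable sentences.
# Rule-based scoring stands in for the paper's learned classifiers and tagger.

def sentence_suitability(sentence, subject, predicate_cues):
    """Confidence that the sentence can answer the (subject, predicate) query."""
    has_subject = subject in sentence
    has_cue = any(cue in sentence for cue in predicate_cues)
    return (has_subject + has_cue) / 2.0

def extract(sentence, subject, predicate_cues):
    """Toy span extraction: the text following the first matched predicate cue."""
    for cue in predicate_cues:
        if cue in sentence:
            return sentence.split(cue, 1)[1].strip().rstrip(".")
    return None

query = ("Ada Lovelace", "born in")          # "subject-predicate" query
cues = ["was born in", "born in"]

documents = [
    "Ada Lovelace was an English mathematician.",        # no predicate cue
    "Ada Lovelace was born in London.",                  # suitable
]

answers = []
for sentence in documents:
    if sentence_suitability(sentence, query[0], cues) >= 1.0:
        answers.append(extract(sentence, query[0], cues))  # only then extract
    # unsuitable sentences are skipped, avoiding spurious extractions

print(answers)   # ['London']
```

The first document mentions the subject but not the predicate, so no extraction is attempted on it; that skip is exactly what keeps precision up when most retrieved web documents do not contain the answer.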

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.1-25
    • /
    • 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes the sentiments consumers or the public feel about an arbitrary object from written texts. Aspect-Based Sentiment Analysis (ABSA) is a fine-grained analysis of the sentiments towards each aspect of an object. Since it has more practical value in business terms, ABSA is drawing attention from both academic and industrial organizations. For example, given a review that says "The restaurant is expensive but the food is really fantastic", general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. ABSA thus enables more specific and effective marketing strategies. To perform ABSA, it is necessary to identify which aspect terms or aspect categories are included in the text and judge the sentiments towards them. Accordingly, there are four main areas in ABSA: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). ABSA is usually conducted by extracting aspect terms and then performing ATSC to analyze sentiments for the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze sentiments for the given aspect categories. Here, an aspect category is expressed in one or more aspect terms, or indirectly inferred from other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' included in the review. If the review sentence included 'pasta', 'steak', or 'grilled chicken special', these could all be aspect terms for the aspect category 'food'. An aspect category referred to by one or more specific aspect terms is called an explicit aspect. 
On the other hand, an aspect category like 'price', which has no specific aspect term but can be indirectly inferred from an emotional word such as 'expensive', is called an implicit aspect. So far, the term 'aspect category' has been used to avoid confusion with 'aspect term'; from now on, we treat 'aspect category' and 'aspect' as the same concept and use the word 'aspect' for convenience. Note that ATSC analyzes the sentiment towards given aspect terms, so it deals only with explicit aspects, while ACSC treats implicit as well as explicit aspects. This study seeks answers to the following issues, ignored in previous studies applying the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the aspect category tokens than to use only the final output vector of the [CLS] token as the classification vector? Second, is there any performance difference between the QA (Question Answering) and NLI (Natural Language Inference) types of sentence-pair input configuration? Third, is there any performance difference according to the position of the sentence containing the aspect category in the QA or NLI type sentence-pair input? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, ACSC models that outperform existing studies without expanding the training dataset were derived. In addition, it was found that it is more effective to reflect the output vectors of the aspect category tokens than to use only the output vector of the [CLS] token as the classification vector. It was also found that QA-type input generally provides better performance than NLI, and the order of the sentence with the aspect category in the QA type is irrelevant to performance. 
There may be some differences depending on the characteristics of the dataset, but when using NLI-type sentence-pair input, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing ACSC models used in this study could be similarly applied to other tasks such as ATSC.
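The QA and NLI sentence-pair configurations compared above differ only in how the aspect is turned into the auxiliary sentence that BERT sees alongside the review. A sketch of the two input layouts; the special-token placement follows the usual BERT convention, and the question/hypothesis templates are illustrative, not the paper's exact wording:

```python
# Sketch of the two sentence-pair input formats for Aspect Category Sentiment
# Classification (ACSC). Templates are illustrative; [CLS]/[SEP] placement
# follows the standard BERT sentence-pair convention.

def qa_pair(review, aspect):
    """QA-style: the aspect is phrased as a question paired with the review."""
    question = f"what do you think of the {aspect} ?"
    return f"[CLS] {question} [SEP] {review} [SEP]"

def nli_pair(review, aspect, aspect_second=True):
    """NLI-style: the aspect appears as a short pseudo-hypothesis phrase."""
    hypothesis = aspect
    if aspect_second:                       # review first, aspect second
        return f"[CLS] {review} [SEP] {hypothesis} [SEP]"
    return f"[CLS] {hypothesis} [SEP] {review} [SEP]"

review = "The restaurant is expensive but the food is really fantastic"
print(qa_pair(review, "price"))
print(nli_pair(review, "price"))
```

In the actual models, a tokenizer converts these strings to token IDs, and the classification head reads either the [CLS] output vector or, per this study's finding, the output vectors of the aspect tokens ('price' here).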