• Title/Abstract/Keywords: Data Extraction Techniques

337 search results (processing time: 0.022 s)

A Study on Extraction of Non-metallic Ore Deposits from Remote Sensing Data of the Haenam Area

  • 박인석;박종남
    • 대한원격탐사학회지 / Vol. 8, No. 2 / pp. 105-123 / 1992
  • This study examined feature extraction for non-metallic ore deposits and their related geology using remote sensing and airborne radiometric data. The study area is around Haenam, where dickite and quartzite mines are distributed. The geology of the area consists mainly of Cretaceous volcanics and Precambrian metamorphics. The methods applied were a study of the reflectance characteristics of minerals and rocks sampled in the study area, feature extraction from histogram-normalized images of Landsat TM and airborne radiometric data, and finally an evaluation of the applicability of some useful pattern recognition techniques for regional lithological mapping. The reflectances of non-metallic minerals were found to be much higher than those of rock samples in the area. However, the reflectances of low-grade dickites are only slightly higher than those of rock samples, probably because of their greyish colour and textural features that may scatter the reflectance and may be capable of capturing many hydroxyl ions. The reflectances of rock samples may depend on the degree of whiteness of the samples. The outcrops and mine dumps in the study area were most effectively extracted on the histogram-normalized images of TM Bands 1, 2 and 3, owing to their high reflectivity. The masking technique using these bands may be the most effective, and the natural colour composite may provide some success as well. The colour composite image of PCA may also be effective in extracting geological features, and airborne radiometric data may be useful to some degree as a complementary tool.

Fine-tuning BERT Models for Keyphrase Extraction in Scientific Articles

  • Lim, Yeonsoo;Seo, Deokjin;Jung, Yuchul
    • 한국정보기술학회 영문논문지 / Vol. 10, No. 1 / pp. 45-56 / 2020
  • Despite extensive research, performance enhancement of keyphrase (KP) extraction remains a challenging problem in modern informatics. Recently, deep learning-based supervised approaches have exhibited state-of-the-art accuracies with respect to this problem, and several of the previously proposed methods utilize Bidirectional Encoder Representations from Transformers (BERT)-based language models. However, few studies have investigated the effective application of BERT-based fine-tuning techniques to the problem of KP extraction. In this paper, we consider the aforementioned problem in the context of scientific articles by investigating the fine-tuning characteristics of two distinct BERT models - BERT (i.e., base BERT model by Google) and SciBERT (i.e., a BERT model trained on scientific text). Three different datasets (WWW, KDD, and Inspec) comprising data obtained from the computer science domain are used to compare the results obtained by fine-tuning BERT and SciBERT in terms of KP extraction.

Speaker Verification with the Constraint of Limited Data

  • Kumari, Thyamagondlu Renukamurthy Jayanthi;Jayanna, Haradagere Siddaramaiah
    • Journal of Information Processing Systems / Vol. 14, No. 4 / pp. 807-823 / 2018
  • The performance of a speaker verification system depends on the utterance of each speaker. To verify the speaker, important information has to be captured from the utterance. Under the constraint of limited data, speaker verification becomes a challenging task, because the training and testing data are only a few seconds long. The feature vectors extracted by single frame size and rate (SFSR) analysis are not sufficient for training and testing speakers in speaker verification. This leads to poor speaker modeling during training and may not yield good decisions during testing. The problem is addressed by increasing the number of feature vectors obtained from training and testing data of the same duration. To that end, multiple frame size (MFS), multiple frame rate (MFR), and multiple frame size and rate (MFSR) analysis techniques are used for speaker verification under limited data conditions. These analysis techniques extract relatively more feature vectors during training and testing and lead to improved modeling and testing for limited data. To demonstrate this, mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) are used as features. The Gaussian mixture model (GMM) and the GMM-universal background model (GMM-UBM) are used for modeling the speaker. The database used is NIST-2003. The experimental results indicate that MFS, MFR, and MFSR analysis perform markedly better than SFSR analysis, and that LPCC-based MFSR analysis performs better than the other analysis techniques and feature extraction techniques.
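
The idea of obtaining more feature vectors from the same short utterance can be illustrated with a minimal framing sketch. It only shows how pooling frames over several frame sizes and shifts yields more frames than a single size and shift; the function names, frame sizes, and shifts are assumptions for illustration, not values from the paper, and the MFCC/LPCC computation per frame is omitted.

```python
def frame_signal(signal, frame_size, frame_shift):
    """Split a 1-D sequence into overlapping frames of length frame_size."""
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += frame_shift
    return frames


def mfsr_frames(signal, frame_sizes, frame_shifts):
    """Pool the frames from every (frame size, frame shift) combination."""
    pooled = []
    for size in frame_sizes:
        for shift in frame_shifts:
            pooled.extend(frame_signal(signal, size, shift))
    return pooled


# Stand-in for 0.2 s of speech sampled at 8 kHz (values are placeholders).
signal = list(range(1600))

# Single frame size and rate: one size (160 samples) and one shift (80 samples).
sfsr = frame_signal(signal, 160, 80)

# Multiple frame size and rate: hypothetical sizes and shifts.
mfsr = mfsr_frames(signal, [120, 160, 200], [40, 80])

print(len(sfsr), len(mfsr))
```

Each frame would normally be turned into one feature vector, so the pooled set gives the speaker model more training vectors from the same amount of speech.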

A Study on GIS Data Transform for Update the Digital Map with Construction drawings

  • 박승용;박우진;유기윤
    • 한국측량학회: Conference Proceedings / Proceedings of the 한국측량학회 2009 Spring Conference / pp. 11-13 / 2009
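  • As an update scheme for keeping digital maps current, this study presents a technique for converting as-built CAD drawings, which are used in building and various SOC construction projects, into GIS data so that they can be used to update digital maps. The conversion process consists of layer extraction, object conversion, coordinate transformation, and format conversion; GIS data converted from CAD data through these steps can be used to update the digital map.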
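
The four conversion steps named in the abstract can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the CAD entities are hypothetical in-memory dictionaries rather than a parsed drawing file, the coordinate transformation is reduced to a plain translation (a real workflow would apply a datum and projection transform), and GeoJSON is chosen here as the output format only for convenience.

```python
import json

# Hypothetical CAD entities: layer name, entity type, vertices in drawing coordinates.
cad_entities = [
    {"layer": "ROAD", "type": "LWPOLYLINE",
     "points": [(0.0, 0.0), (10.0, 0.0)], "closed": False},
    {"layer": "BUILDING", "type": "LWPOLYLINE",
     "points": [(2.0, 2.0), (4.0, 2.0), (4.0, 5.0)], "closed": True},
    {"layer": "DIMENSION", "type": "TEXT",
     "points": [(1.0, 1.0)], "closed": False},
]


def extract_layers(entities, wanted):
    """Step 1, layer extraction: keep only entities on the wanted layers."""
    return [e for e in entities if e["layer"] in wanted]


def to_geometry(entity):
    """Step 2, object conversion: closed polylines become polygons, open ones lines."""
    pts = [list(p) for p in entity["points"]]
    if entity["closed"]:
        return {"type": "Polygon", "coordinates": [pts + [pts[0]]]}
    return {"type": "LineString", "coordinates": pts}


def transform(geometry, dx, dy):
    """Step 3, coordinate transformation: a simple translation as a placeholder."""
    def shift(p):
        return [p[0] + dx, p[1] + dy]

    if geometry["type"] == "Polygon":
        coords = [[shift(p) for p in ring] for ring in geometry["coordinates"]]
    else:
        coords = [shift(p) for p in geometry["coordinates"]]
    return {"type": geometry["type"], "coordinates": coords}


def cad_to_geojson(entities, wanted, dx, dy):
    """Step 4, format conversion: wrap the geometries in a GeoJSON FeatureCollection."""
    features = []
    for e in extract_layers(entities, wanted):
        features.append({
            "type": "Feature",
            "properties": {"layer": e["layer"]},
            "geometry": transform(to_geometry(e), dx, dy),
        })
    return {"type": "FeatureCollection", "features": features}


collection = cad_to_geojson(cad_entities, {"ROAD", "BUILDING"},
                            dx=200000.0, dy=500000.0)
print(json.dumps(collection, indent=2))
```

The `DIMENSION` layer is dropped in the first step because annotation layers of that kind would not normally be carried into a digital map; which layers to keep is an assumption of this sketch.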


Automated Data Extraction from Unstructured Geotechnical Report based on AI and Text-mining Techniques

  • 박지민;서완혁;서동희;윤태섭
    • 한국지반공학회논문집 / Vol. 40, No. 4 / pp. 69-79 / 2024
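  • Field geotechnical parameter data are obtained through various field and laboratory tests and are then compiled and distributed as geotechnical investigation reports. Building a digital database of geotechnical parameters is essential for efficient design and construction, but data from geotechnical investigation reports are currently entered manually, which takes considerable time and labor and is prone to errors. This study proposes a method for automatically extracting data from geotechnical investigation reports using an image-based deep learning model and text-mining techniques. Using a deep learning-based page classification model and a text-searching algorithm, the detailed geotechnical test result reports in the report appendices were classified with 100% accuracy. A computer vision algorithm determines the valid data regions within a report page, and text analysis pairs each target data item with its corresponding geotechnical data to extract the data. The proposed model was validated on a dataset of 205 geotechnical investigation reports and achieved an average data extraction accuracy of 93.0%. Finally, a user-interface-based program was developed to make the extraction model applicable in practice. Through user interaction within the program, a geotechnical investigation report PDF file can be uploaded, after which the report is automatically analyzed and its data can be extracted and edited. This is expected to make the digitization of geotechnical investigation reports and the construction of geotechnical databases more efficient and accurate.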
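
The step of pairing data items with their values through text analysis can be illustrated with a small regular-expression sketch. It assumes the page text has already been obtained (for example by OCR); the item names, patterns, and sample text are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical data items and the text patterns that locate their values.
NUMBER = r"(\d+(?:\.\d+)?)"
PATTERNS = {
    "water_content_percent": re.compile(
        r"Water\s+content\s*[:=]?\s*" + NUMBER + r"\s*%", re.IGNORECASE),
    "specific_gravity": re.compile(
        r"Specific\s+gravity\s*[:=]?\s*" + NUMBER, re.IGNORECASE),
    "n_value": re.compile(
        r"N[- ]value\s*[:=]?\s*" + NUMBER, re.IGNORECASE),
}


def extract_items(page_text):
    """Return a dict mapping each item name found on the page to its numeric value."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(page_text)
        if match:
            result[name] = float(match.group(1))
    return result


# Placeholder text standing in for one recognized report page.
page = "Sample BH-1  Water content: 23.5 %  Specific gravity = 2.65  N-value 17"
print(extract_items(page))
```

Items that are not found on a page are simply left out of the result, which lets a caller decide whether a missing value needs manual review.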

Comparative evaluation of deep learning-based building extraction techniques using aerial images

  • 모준상;성선경;최재완
    • 한국측량학회지 / Vol. 39, No. 3 / pp. 157-165 / 2021
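  • As the resolution of satellite imagery and aerial photographs has improved, a variety of studies using high-resolution remote sensing data have been carried out. In particular, extracting building objects across the entire national territory is essential for producing digital map layers and thematic maps, so high accuracy is required. In this study, building extraction models were generated using SegNet, U-Net, FC-DenseNet, and HRNetV2, which are representative models for semantic segmentation among deep learning image processing techniques, and the models were then evaluated. The training data were generated from images containing a variety of buildings, and the evaluation was conducted over three regions. Model performance was first evaluated on a region adjacent to the training data, and model applicability was then evaluated on regions different from the training data. The HRNetV2 model showed the best results in terms of both performance and applicability of building extraction. This study confirmed the feasibility of generating and revising building layers in digital maps.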
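
The abstract does not state which metric was used to compare the models. As an illustration only, the sketch below computes intersection over union (IoU), a measure commonly used for building masks in semantic segmentation; the choice of metric and the toy masks are assumptions.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly, so define their IoU as 1.0.
    return intersection / union if union else 1.0


# Toy 2x3 masks: 1 marks a building pixel, 0 marks background.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]

print(iou(predicted, reference))
```

Computing the same score for each model on the adjacent region and on the differing regions would give one simple way to compare performance and applicability side by side.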

CutPaste-Based Anomaly Detection Model using Multi Scale Feature Extraction in Time Series Streaming Data

  • Jeon, Byeong-Uk;Chung, Kyungyong
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 8 / pp. 2787-2800 / 2022
  • An aging society brings more emergency situations among elderly people living alone, as well as a variety of social crimes. To prevent them, techniques for detecting emergency situations through voice are being actively researched. This study proposes a CutPaste-based anomaly detection model using multi-scale feature extraction in time series streaming data. In the proposed method, an audio file is converted into a spectrogram, which makes it possible to use algorithms designed for image data, such as a CNN. Multi-scale feature extraction is then applied: three images drawn from Adaptive Pooling layers that have different-sized kernels are merged. By considering various types of anomaly, including point, contextual, and collective anomalies, the limitations of a conventional anomaly model are addressed. Finally, CutPaste-based anomaly detection is conducted. Since the model is trained through self-supervised learning, it can detect a diversity of emergency situations as anomalies without labeling. The proposed model therefore overcomes the limitation of a conventional model that classifies only labelled emergency situations. The proposed model is also evaluated to perform better than a conventional anomaly detection model.
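
The CutPaste step can be sketched as a simple augmentation: a rectangular patch is cut from a normal sample and pasted at another location, and the model is then trained without labels to tell original samples from augmented ones. The sketch below shows only the augmentation on a toy 2-D array standing in for a spectrogram; the function name, patch size, and array values are assumptions, and the paper's network and pooling layers are not reproduced.

```python
import random


def cutpaste(image, patch_h, patch_w, rng):
    """Return a copy of image with one patch copied to another random location.

    image is a list of equal-length rows; the patch must fit inside it.
    """
    height, width = len(image), len(image[0])
    augmented = [row[:] for row in image]  # leave the input untouched

    # Random source (cut) and destination (paste) top-left corners.
    src_y = rng.randrange(height - patch_h + 1)
    src_x = rng.randrange(width - patch_w + 1)
    dst_y = rng.randrange(height - patch_h + 1)
    dst_x = rng.randrange(width - patch_w + 1)

    for i in range(patch_h):
        for j in range(patch_w):
            augmented[dst_y + i][dst_x + j] = image[src_y + i][src_x + j]
    return augmented


# Toy 4x4 "spectrogram" with distinct values so the pasted patch is easy to see.
spectrogram = [[row * 4 + col for col in range(4)] for row in range(4)]
augmented = cutpaste(spectrogram, 2, 2, random.Random(0))

for row in augmented:
    print(row)
```

In a training loop, the original would be given one pseudo-label and the augmented copy another, which is what allows learning without manually labelled emergencies.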

A Study of 3D Design Data Extraction for Thermal Forming Information

  • Kim, Jung;Park, Jung-Seo;Jo, Ye-Hyan;Shin, Jong-Gye;Kim, Won-Don;Ko, Kwang-Hee
    • Journal of Ship and Ocean Technology / Vol. 12, No. 3 / pp. 1-13 / 2008
  • In shipbuilding, diverse manufacturing techniques for automation have been developed and used in practice. Among them, however, hull forming automation has received less attention than others such as welding and cutting. The basis for developing this process is finding out how to extract thermal forming information. Various methods exist for obtaining such information, and the 3D design shape that needs to be formed should be extracted first in order to obtain the necessary thermal forming information. Except for well-established shipyards that operate 3D design systems, most shipyards rely only on 2.5D design systems and do not have an easy way to obtain 3D surface design data. In this study, various shipbuilding design systems used by shipyards are therefore investigated, and a method for extracting 3D design surface data from those systems is proposed. An example is then presented to show the extraction of real 3D surface data using the proposed method and the computation of thermal forming information from the data.

A Comparison of Deep Reinforcement Learning and Deep learning for Complex Image Analysis

  • Khajuria, Rishi;Quyoom, Abdul;Sarwar, Abid
    • Journal of Multimedia Information System / Vol. 7, No. 1 / pp. 1-10 / 2020
  • Image analysis is an important and predominant task for classifying the different parts of an image. The analysis of complex images, such as histopathological images, is a crucial factor in oncology because it helps pathologists interpret images, and various feature extraction techniques have therefore evolved over time for such analysis. Although deep reinforcement learning is a new and emerging technique, very little effort has been made to compare deep learning and deep reinforcement learning for image analysis. The paper highlights how the two techniques differ in feature extraction from complex images and discusses their potential pros and cons. The use of Convolutional Neural Networks (CNN) in image segmentation, tumour detection and diagnosis, and feature extraction is important, but several challenges need to be overcome before deep learning can be applied to digital pathology. These include the availability of sufficient training examples for medical image datasets, feature extraction from the whole area of the image, ground-truth localized annotations, adversarial effects of input representations, and the extremely large size of digital pathological slides (in gigabytes). Formulating Histopathological Image Analysis (HIA) as a Multi Instance Learning (MIL) problem, in which a histopathological image is divided into high-resolution patches, predictions are made for each patch, and the patch predictions are combined into an overall slide prediction, is a remarkable step, but it suffers from a loss of contextual and spatial information. In such cases, deep reinforcement learning techniques can be used to learn features from limited data without losing contextual and spatial information.
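
The patch-based Multi Instance Learning formulation described above can be sketched in a few lines: split the image into patches, score each patch, and combine the patch scores into one slide-level decision. The patch scorer here is a hypothetical stand-in (mean intensity) for a trained classifier, and max pooling is used as the aggregation rule because it matches the usual MIL assumption that a slide is positive if any patch is; neither detail comes from the paper.

```python
def split_into_patches(image, size):
    """Tile a 2-D list into non-overlapping size x size patches (edges are dropped)."""
    patches = []
    for top in range(0, len(image) - size + 1, size):
        for left in range(0, len(image[0]) - size + 1, size):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def patch_score(patch):
    """Hypothetical stand-in for a trained patch classifier: mean pixel value."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


def slide_prediction(image, size, threshold=0.5):
    """Aggregate patch scores by max pooling into a slide-level decision."""
    scores = [patch_score(p) for p in split_into_patches(image, size)]
    return max(scores) >= threshold, scores


# Toy 4x4 "slide": the top-right 2x2 patch is fully positive.
slide = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 0],
         [0, 1, 0, 0]]

is_positive, scores = slide_prediction(slide, 2)
print(is_positive, scores)
```

Because each patch is scored in isolation, the sketch also makes the limitation named in the abstract visible: nothing in the aggregation knows where a patch sat or what surrounded it.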

Pretreatment For The Problem Solution Of Contents-Based Music Retrieval

  • 정명범;성보경;고일주
    • 한국컴퓨터정보학회논문지 / Vol. 12, No. 6 / pp. 97-104 / 2007
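  • This paper points out problems with the feature extraction techniques that have been used to analyze, classify, and retrieve audio by content, and proposes a preprocessing step for a new retrieval method. In existing audio data analysis, feature values vary with how the audio is sampled, so the same piece of music can be recognized as different music. This paper therefore proposes a method for extracting waveform information from PCM data in order to retrieve audio data of various formats by content. With this method, audio data sampled in various formats can be identified as the same data, and the method can be applied to content-based music retrieval. To demonstrate its validity, experiments were conducted with feature extraction using STFT and with extraction using the waveform information of PCM data; the results showed that the PCM waveform information extraction method is effective.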
