• Title/Summary/Keyword: point dataset

Search Result 195, Processing Time 0.023 seconds

Multi-Object Goal Visual Navigation Based on Multimodal Context Fusion (멀티모달 맥락정보 융합에 기초한 다중 물체 목표 시각적 탐색 이동)

  • Jeong Hyun Choi;In Cheol Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.9
    • /
    • pp.407-418
    • /
    • 2023
  • The Multi-Object Goal Visual Navigation(MultiOn) is a visual navigation task in which an agent must visit to multiple object goals in an unknown indoor environment in a given order. Existing models for the MultiOn task suffer from the limitation that they cannot utilize an integrated view of multimodal context because use only a unimodal context map. To overcome this limitation, in this paper, we propose a novel deep neural network-based agent model for MultiOn task. The proposed model, MCFMO, uses a multimodal context map, containing visual appearance features, semantic features of environmental objects, and goal object features. Moreover, the proposed model effectively fuses these three heterogeneous features into a global multimodal context map by using a point-wise convolutional neural network module. Lastly, the proposed model adopts an auxiliary task learning module to predict the observation status, goal direction and the goal distance, which can guide to learn the navigational policy efficiently. Conducting various quantitative and qualitative experiments using the Habitat-Matterport3D simulation environment and scene dataset, we demonstrate the superiority of the proposed model.

Classification of Seismic Stations Based on the Simultaneous Inversion Result of the Ground-motion Model Parameters (지진동모델 파라미터 동시역산을 이용한 지진관측소 분류)

  • Yun, Kwan-Hee;Suh, Jung-Hee
    • Geophysics and Geophysical Exploration
    • /
    • v.10 no.3
    • /
    • pp.183-190
    • /
    • 2007
  • The site effects of seismic stations were evaluated by conducting a simultaneous inversion of the stochastic point-source ground-motion model (STGM model; Boore, 2003) parameters based on the accumulated dataset of horizontal shear-wave Fourier spectra. A model parameter $K_0$ and frequency-dependent site amplification function A(f) were used to express the site effects. Once after a H/V ratio of the Fourier spectra was used as an initial estimate of A(f) for the inversion, the final A(f) which is considered to be the result of combined effect of the crustal amplification and loca lsite effects was calculated by averaging the log residuals at the site from the inversion and adding the mean log residual to the H/V ratio. The seismic stations were classified into five classes according to $logA_{1-10}^{max}$(f), the maximum level of the site amplification function in the range of 1 Hz < f < 10 Hz, i.e., A: $logA_{1-10}^{max}$(f) < 0.2, B: 0.2 $\leq$ $logA_{1-10}^{max}$(f) < 0.4, C: 0.4 $\leq$ $logA_{1-10}^{max}$(f) < 0.6, D: 0.6 $\leq$ $logA_{1-10}^{max}$(f) < 0.8, E: 0.8 $\leq$ $logA_{1-10}^{max}$(f). Implication of the classified result was supported by observing a shift of the dominant frequency of average A(f) for each classified stations as the class changes. Change of site classes after moving seismic stations to a better site condition was successfully described by the result of the station classification. In addition, the observed PGA (Peak Ground Acceleration)-values for two recent moderate earthquakes were well classified according to the proposed station classes.

Enhancing the performance of the facial keypoint detection model by improving the quality of low-resolution facial images (저화질 안면 이미지의 화질 개선를 통한 안면 특징점 검출 모델의 성능 향상)

  • KyoungOok Lee;Yejin Lee;Jonghyuk Park
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.2
    • /
    • pp.171-187
    • /
    • 2023
  • When a person's face is recognized through a recording device such as a low-pixel surveillance camera, it is difficult to capture the face due to low image quality. In situations where it is difficult to recognize a person's face, problems such as not being able to identify a criminal suspect or a missing person may occur. Existing studies on face recognition used refined datasets, so the performance could not be measured in various environments. Therefore, to solve the problem of poor face recognition performance in low-quality images, this paper proposes a method to generate high-quality images by performing image quality improvement on low-quality facial images considering various environments, and then improve the performance of facial feature point detection. To confirm the practical applicability of the proposed architecture, an experiment was conducted by selecting a data set in which people appear relatively small in the entire image. In addition, by choosing a facial image dataset considering the mask-wearing situation, the possibility of expanding to real problems was explored. As a result of measuring the performance of the feature point detection model by improving the image quality of the face image, it was confirmed that the face detection after improvement was enhanced by an average of 3.47 times in the case of images without a mask and 9.92 times in the case of wearing a mask. It was confirmed that the RMSE for facial feature points decreased by an average of 8.49 times when wearing a mask and by an average of 2.02 times when not wearing a mask. Therefore, it was possible to verify the applicability of the proposed method by increasing the recognition rate for facial images captured in low quality through image quality improvement.

Interactive analysis tools for the wide-angle seismic data for crustal structure study (Technical Report) (지각 구조 연구에서 광각 탄성파 자료를 위한 대화식 분석 방법들)

  • Fujie, Gou;Kasahara, Junzo;Murase, Kei;Mochizuki, Kimihiro;Kaneda, Yoshiyuki
    • Geophysics and Geophysical Exploration
    • /
    • v.11 no.1
    • /
    • pp.26-33
    • /
    • 2008
  • The analysis of wide-angle seismic reflection and refraction data plays an important role in lithospheric-scale crustal structure study. However, it is extremely difficult to develop an appropriate velocity structure model directly from the observed data, and we have to improve the structure model step by step, because the crustal structure analysis is an intrinsically non-linear problem. There are several subjective processes in wide-angle crustal structure modelling, such as phase identification and trial-and-error forward modelling. Because these subjective processes in wide-angle data analysis reduce the uniqueness and credibility of the resultant models, it is important to reduce subjectivity in the analysis procedure. From this point of view, we describe two software tools, PASTEUP and MODELING, to be used for developing crustal structure models. PASTEUP is an interactive application that facilitates the plotting of record sections, analysis of wide-angle seismic data, and picking of phases. PASTEUP is equipped with various filters and analysis functions to enhance signal-to-noise ratio and to help phase identification. MODELING is an interactive application for editing velocity models, and ray-tracing. Synthetic traveltimes computed by the MODELING application can be directly compared with the observed waveforms in the PASTEUP application. This reduces subjectivity in crustal structure modelling because traveltime picking, which is one of the most subjective process in the crustal structure analysis, is not required. MODELING can convert an editable layered structure model into two-way traveltimes which can be compared with time-sections of Multi Channel Seismic (MCS) reflection data. Direct comparison between the structure model of wide-angle data with the reflection data will give the model more credibility. In addition, both PASTEUP and MODELING are efficient tools for handling a large dataset. These software tools help us develop more plausible lithospheric-scale structure models using wide-angle seismic data.

Deep Learning-based Professional Image Interpretation Using Expertise Transplant (전문성 이식을 통한 딥러닝 기반 전문 이미지 해석 방법론)

  • Kim, Taejin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.79-104
    • /
    • 2020
  • Recently, as deep learning has attracted attention, the use of deep learning is being considered as a method for solving problems in various fields. In particular, deep learning is known to have excellent performance when applied to applying unstructured data such as text, sound and images, and many studies have proven its effectiveness. Owing to the remarkable development of text and image deep learning technology, interests in image captioning technology and its application is rapidly increasing. Image captioning is a technique that automatically generates relevant captions for a given image by handling both image comprehension and text generation simultaneously. In spite of the high entry barrier of image captioning that analysts should be able to process both image and text data, image captioning has established itself as one of the key fields in the A.I. research owing to its various applicability. In addition, many researches have been conducted to improve the performance of image captioning in various aspects. Recent researches attempt to create advanced captions that can not only describe an image accurately, but also convey the information contained in the image more sophisticatedly. Despite many recent efforts to improve the performance of image captioning, it is difficult to find any researches to interpret images from the perspective of domain experts in each field not from the perspective of the general public. Even for the same image, the part of interests may differ according to the professional field of the person who has encountered the image. Moreover, the way of interpreting and expressing the image also differs according to the level of expertise. The public tends to recognize the image from a holistic and general perspective, that is, from the perspective of identifying the image's constituent objects and their relationships. On the contrary, the domain experts tend to recognize the image by focusing on some specific elements necessary to interpret the given image based on their expertise. It implies that meaningful parts of an image are mutually different depending on viewers' perspective even for the same image. So, image captioning needs to implement this phenomenon. Therefore, in this study, we propose a method to generate captions specialized in each domain for the image by utilizing the expertise of experts in the corresponding domain. Specifically, after performing pre-training on a large amount of general data, the expertise in the field is transplanted through transfer-learning with a small amount of expertise data. However, simple adaption of transfer learning using expertise data may invoke another type of problems. Simultaneous learning with captions of various characteristics may invoke so-called 'inter-observation interference' problem, which make it difficult to perform pure learning of each characteristic point of view. For learning with vast amount of data, most of this interference is self-purified and has little impact on learning results. On the contrary, in the case of fine-tuning where learning is performed on a small amount of data, the impact of such interference on learning can be relatively large. To solve this problem, therefore, we propose a novel 'Character-Independent Transfer-learning' that performs transfer learning independently for each character. In order to confirm the feasibility of the proposed methodology, we performed experiments utilizing the results of pre-training on MSCOCO dataset which is comprised of 120,000 images and about 600,000 general captions. Additionally, according to the advice of an art therapist, about 300 pairs of 'image / expertise captions' were created, and the data was used for the experiments of expertise transplantation. As a result of the experiment, it was confirmed that the caption generated according to the proposed methodology generates captions from the perspective of implanted expertise whereas the caption generated through learning on general data contains a number of contents irrelevant to expertise interpretation. In this paper, we propose a novel approach of specialized image interpretation. To achieve this goal, we present a method to use transfer learning and generate captions specialized in the specific domain. In the future, by applying the proposed methodology to expertise transplant in various fields, we expected that many researches will be actively conducted to solve the problem of lack of expertise data and to improve performance of image captioning.