• Title/Summary/Keyword: HDBSCAN

Discriminant analysis for unbalanced data using HDBSCAN (불균형자료를 위한 판별분석에서 HDBSCAN의 활용)

  • Lee, Bo-Hui;Kim, Tae-Heon;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics, v.34 no.4, pp.599-609, 2021
  • Data in which the number of objects differs greatly between categories are called unbalanced data. In discriminant analysis of unbalanced data, correctly classifying objects in the minority categories matters more than classifying objects in the majority categories well; however, minority-category objects are often misclassified into majority categories. In this study, we propose a method that combines hierarchical DBSCAN (HDBSCAN) with the synthetic minority oversampling technique (SMOTE) to address this problem. HDBSCAN is first used to remove noise from both the minority and the majority categories, and SMOTE is then applied to generate new data. Performance was compared with existing methods using the area under the ROC curve (AUC) and the F1 score. In most cases, the method combining HDBSCAN and SMOTE achieved high performance indices, indicating that it is well suited to classifying unbalanced data.
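The two-step pipeline in this abstract (drop HDBSCAN noise points, then oversample the minority class with SMOTE) can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation: it assumes the cluster labels were already produced by an HDBSCAN run, following the common convention that noise points receive the label -1, and it uses a brute-force nearest-neighbour search. The function names `drop_noise` and `smote_oversample` and the toy data are hypothetical.

```python
import numpy as np


def drop_noise(X, y, cluster_labels):
    """Keep only rows that HDBSCAN did not flag as noise (label -1)."""
    keep = np.asarray(cluster_labels) != -1
    return X[keep], y[keep]


def smote_oversample(X_min, n_new, k=5, seed=None):
    """Generate n_new synthetic minority samples with basic SMOTE.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # Brute-force pairwise squared Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                  # seed samples
    nn = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour per seed
    gap = rng.random((n_new, 1))                           # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nn] - X_min[base])


# Toy usage: class 1 is the minority; the last row is flagged as noise.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
y = np.array([1, 1, 1, 0, 1])
labels = np.array([0, 0, 0, 1, -1])  # hypothetical HDBSCAN output
X_clean, y_clean = drop_noise(X, y, labels)
synthetic = smote_oversample(X_clean[y_clean == 1], n_new=4, seed=0)
```

Removing noise before interpolating matters because SMOTE draws segments between minority points; an isolated outlier left in the minority set would spawn synthetic points deep inside majority territory.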

Method of Deriving Activity Relationship and Location Information from BIM Model for Construction Schedule Management (공정관리 활용을 위한 BIM모델의 공정별 수순 및 위치정보 추출방안)

  • Yoon, Hyeongseok;Lee, Jaehee;Hwang, Jaeyeong;Kang, Hyojeong;Park, Sangmi;Kang, Leenseok
    • Korean Journal of Construction Engineering and Management, v.23 no.2, pp.33-44, 2022
  • The 4D simulation function is a representative BIM function in the construction stage. For 4D simulation, schedule information must be created for each activity and then linked to the 3D model. Because the 3D model created in the design stage does not contain schedule information, creating schedule information for the construction stage and linking it to the 3D model is difficult in practice. In this study, the authors extract construction-stage schedule information from the design-stage 3D model using the HDBSCAN algorithm, and then propose a methodology that automatically generates schedule information by identifying precedence and sequencing relationships with a topological sorting algorithm. Because the generated schedule information is derived from the 3D model, the 4D system can link it automatically through parameters shared by the schedule and the 3D model, which increases the practical utility of the 4D system. The proposed methodology was applied to four bridge projects to verify the generated schedule information, and to a 4D system to confirm that it simplifies linking the schedule to the 3D model.
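Ordering activities so that every predecessor comes before its successor is a standard topological sort. The sketch below uses Kahn's algorithm in plain Python; it assumes the precedence pairs (predecessor, successor) have already been derived from the 3D model, and the bridge activity names are hypothetical examples rather than data from the paper. The paper's exact ordering procedure may differ.

```python
from collections import deque


def topological_order(activities, precedences):
    """Return activities in an order that respects every (before, after) pair.

    Uses Kahn's algorithm; raises ValueError if the precedences contain a cycle.
    """
    successors = {a: [] for a in activities}
    indegree = {a: 0 for a in activities}
    for before, after in precedences:
        successors[before].append(after)
        indegree[after] += 1

    # Start with activities that have no predecessors, in input order.
    ready = deque(a for a in activities if indegree[a] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(activities):
        raise ValueError("precedence relationships contain a cycle")
    return order


# Hypothetical bridge activities and precedence pairs.
activities = ["deck", "girder", "foundation", "pier", "abutment"]
precedences = [
    ("foundation", "pier"),
    ("foundation", "abutment"),
    ("pier", "girder"),
    ("abutment", "girder"),
    ("girder", "deck"),
]
order = topological_order(activities, precedences)
# → ['foundation', 'pier', 'abutment', 'girder', 'deck']
```

The cycle check is useful in practice: if relationships extracted from a model are inconsistent, no valid schedule order exists, and the function reports that instead of returning a partial list.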

Design of environmental technology search system using synonym dictionary (유의어 사전 기반 환경기술 검색 시스템 설계)

  • XIANGHUA, PIAO;HELIN, YIN;Gu, Yeong Hyeon;Yoo, Seong Joon
    • Proceedings of the Korean Society of Broadcast Engineers Conference, 2020.07a, pp.582-586, 2020
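  • The National Climate Technology Information System is a search system that provides information on domestic environmental technologies and on technologies in demand abroad. However, because the existing system cannot recognize single words and multi-word terms that have similar meanings, entering a synonym produces different search results. To solve this problem, this study proposes an environmental technology search system based on a synonym dictionary. The system builds the synonym dictionary using a Word2vec model and the HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) algorithm. After morphological analysis of Korean and English Wikipedia corpora, words including both single words and multi-word terms are extracted and vectorized with the Word2vec model. The HDBSCAN algorithm then clusters the vectorized words, and synonyms are extracted from the clusters. Compared with the conventional Word2vec approach, which computes the distances between all pairs of words to extract synonyms, this shortens processing time. The extracted synonyms are merged to build the synonym dictionary. Finally, the domestic and international technology information and technology keywords provided by the National Climate Technology Information System, together with the constructed synonym dictionary, are applied to Elasticsearch, which provides multi-filter support, yielding an environmental technology search system that can recognize synonyms.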
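Turning HDBSCAN cluster assignments into a synonym dictionary can be sketched as grouping words by cluster label and writing one comma-separated line per cluster, which is the Solr-style equivalent-synonyms format that Elasticsearch's `synonym` and `synonym_graph` token filters accept. This is a minimal sketch under stated assumptions: the labels are assumed to come from an HDBSCAN run over Word2vec vectors (noise labelled -1, as in common HDBSCAN implementations), and the function name, example words, and labels are hypothetical.

```python
from collections import defaultdict


def build_synonym_lines(words, cluster_labels):
    """Group words by HDBSCAN cluster label into Solr-style synonym lines.

    Noise points (label -1) and singleton clusters are skipped, because a
    line with a single term defines no synonyms.
    """
    groups = defaultdict(list)
    for word, label in zip(words, cluster_labels):
        if label != -1:
            groups[label].append(word)
    return [
        ", ".join(members)
        for _, members in sorted(groups.items())
        if len(members) > 1
    ]


words = ["greenhouse gas", "GHG", "carbon capture", "CCS", "bridge"]
labels = [0, 0, 1, 1, -1]  # hypothetical HDBSCAN output
lines = build_synonym_lines(words, labels)
# → ['greenhouse gas, GHG', 'carbon capture, CCS']
```

Because each line lists equivalent terms, a query for any one of them can match documents containing the others once the file is loaded into the analyzer. Multi-word entries such as "carbon capture" are generally better served by `synonym_graph` at query time.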

Dynamic Pricing Based on Reinforcement Learning Reflecting the Relationship between Driver and Passenger Using Matching Matrix (Matching Matrix를 사용하여 운전자와 승객의 관계를 반영한 강화학습 기반 유동적인 가격 책정 체계)

  • Park, Jun Hyung;Lee, Chan Jae;Yoon, Young
    • The Journal of The Korea Institute of Intelligent Transport Systems, v.19 no.6, pp.118-133, 2020
  • Research interest in the Mobility-as-a-Service (MaaS) concept for enhancing users' mobility experience is growing. In particular, dynamic pricing techniques based on reinforcement learning have emerged, because adjusting prices according to demand is expected to help mobility services such as taxi and car-sharing services earn more profit. This paper provides a simulation framework that considers more practical factors that critically affect the probability of a match between users and mobility service providers (e.g., drivers): demand density per location, preferred prices, the distance between users and drivers, and the distance to the destination. These factors are reflected in a data structure called the Matching Matrix. Using an efficient algorithm for computing the probability of matching between users and drivers, together with a set of high-demand locations precisely identified with HDBSCAN, this study developed an improved reward function that steers the reinforcement learning process toward more realistic dynamic pricing policies.
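One plausible way to turn HDBSCAN output into a list of high-demand locations is to summarise each non-noise cluster of ride requests by its centroid and size. The sketch below is an assumption about how such a list might be built, not the paper's procedure: it takes planar (x, y) pickup coordinates with precomputed cluster labels (noise = -1), and the function name and data are hypothetical. For latitude/longitude over large areas, a plain arithmetic mean is only an approximation.

```python
import numpy as np


def high_demand_locations(points, cluster_labels, top_n=None):
    """Return (centroid, request_count) per HDBSCAN cluster, largest first.

    Noise points (label -1) are ignored.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(cluster_labels)
    hotspots = []
    for label in np.unique(labels):
        if label == -1:
            continue
        members = points[labels == label]
        hotspots.append((members.mean(axis=0), len(members)))
    # Busiest clusters first; stable sort keeps label order on ties.
    hotspots.sort(key=lambda item: item[1], reverse=True)
    return hotspots if top_n is None else hotspots[:top_n]


pickups = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [5.0, 5.0], [5.2, 5.0], [9.0, 0.0]]
labels = [0, 0, 0, 1, 1, -1]  # hypothetical HDBSCAN output
spots = high_demand_locations(pickups, labels)
```

Density-based clustering suits this step because it does not require fixing the number of hotspots in advance, and scattered one-off requests fall out as noise instead of being forced into a cluster.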