• Title/Summary/Keyword: jaccard

An Effect of Semantic Relatedness on Entity Disambiguation: Using Korean Wikipedia (개체중의성해소에서 의미관련도 활용 효과 분석: 한국어 위키피디아를 사용하여)

  • Kang, In-Su
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.2 / pp.111-118 / 2015
  • Entity linking is the task of linking name mentions of entities in text to the corresponding entities in a knowledge base. Because the same mention may refer to different entities depending on its context, entity linking must resolve entity ambiguity. Most recent work on entity disambiguation focuses on semantic relatedness between entities and tries to combine it with entity prior probabilities and term co-occurrence. To the best of my knowledge, however, few studies analyze and report the isolated effect of semantic relatedness on entity disambiguation. Using experiments on a Korean Wikipedia data set, this article empirically evaluates entity disambiguation approaches based on semantic relatedness with respect to (1) differences among semantic relatedness measures such as NGD, PMI, Jaccard, Dice, and Simpson, (2) the influence of ambiguity in the set of co-occurring entity mentions, and (3) the difference between individual and collective disambiguation approaches.
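All the relatedness measures compared in (1) can be computed from co-occurrence counts. The sketch below assumes that each entity is represented by the set of Wikipedia articles linking to it and that `total_articles` is the size of the collection. This is common in Wikipedia-based work, but the abstract does not confirm it. The function name is hypothetical.

```python
import math

def relatedness_scores(links_a, links_b, total_articles):
    """Relatedness of two entities from the sets of articles that link to them.

    Assumes inlink sets as the co-occurrence signal (hypothetical setup).
    """
    a, b = set(links_a), set(links_b)
    common = len(a & b)
    if common == 0:
        # No co-occurrence: overlap measures are 0; PMI and NGD are undefined.
        return {"jaccard": 0.0, "dice": 0.0, "simpson": 0.0, "pmi": None, "ngd": None}
    size_a, size_b = len(a), len(b)
    return {
        # Shared links over all links of either entity.
        "jaccard": common / len(a | b),
        # Shared links over the mean of the two set sizes.
        "dice": 2 * common / (size_a + size_b),
        # Shared links over the smaller set (overlap coefficient).
        "simpson": common / min(size_a, size_b),
        # Pointwise mutual information of the two link events.
        "pmi": math.log(common * total_articles / (size_a * size_b)),
        # Normalized Google Distance: 0 for identical sets, larger means less related.
        # Assumes total_articles is larger than the smaller set.
        "ngd": (math.log(max(size_a, size_b)) - math.log(common))
        / (math.log(total_articles) - math.log(min(size_a, size_b))),
    }
```

Jaccard, Dice, Simpson and PMI are similarities, but NGD is a distance. Approaches that need a relatedness score often use something like `1 - ngd` instead.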

A Sliding Window-based Multivariate Stream Data Classification (슬라이딩 윈도우 기반 다변량 스트림 데이타 분류 기법)

  • Seo, Sung-Bo;Kang, Jae-Woo;Nam, Kwang-Woo;Ryu, Keun-Ho
    • Journal of KIISE:Databases / v.33 no.2 / pp.163-174 / 2006
  • In distributed wireless sensor networks, transmitting and analyzing the entire data stream is difficult because network bandwidth, power and processing capacity are limited. It is therefore practical to classify the continuous stream data first and then process the classified results. We propose a classification framework for continuous multivariate stream data that works in two steps. In the preprocessing step, it takes a sliding window of multivariate stream data as input and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it applies a standard text classification algorithm to the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms. For supervised classification we tested a Bayesian classifier and SVM. For unsupervised classification we tested Jaccard, TFIDF, Jaro and Jaro-Winkler. In our experiments, SVM and TFIDF outperformed the other methods. In particular, classification accuracy improved when the correlation between attributes was considered along with the n-gram tokens of symbols.
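The two steps (discretize a sliding window into symbols, then classify the symbol n-grams) can be sketched as follows. This is a hypothetical illustration, not the authors' code. Several choices are assumptions:

- the up/down/same alphabet;
- prefixing each token with its attribute index;
- nearest-prototype classification by Jaccard similarity.

```python
def sliding_windows(stream, size, step=1):
    """Yield consecutive windows (lists of rows) over a list of readings."""
    for start in range(0, len(stream) - size + 1, step):
        yield stream[start:start + size]

def discretize(values, eps=0.0):
    """Map each change between readings to U (up), D (down) or S (same)."""
    symbols = []
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if delta > eps:
            symbols.append("U")
        elif delta < -eps:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)

def window_tokens(window, n=2, eps=0.0):
    """Symbol n-grams for every attribute (column) of a multivariate window."""
    tokens = set()
    for attr, column in enumerate(zip(*window)):
        symbols = discretize(list(column), eps)
        for i in range(len(symbols) - n + 1):
            tokens.add(f"{attr}:{symbols[i:i + n]}")
    return tokens

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 1.0 for two empty sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def classify(window, prototypes, n=2):
    """Label whose prototype token set is most Jaccard-similar to the window."""
    tokens = window_tokens(window, n)
    return max(prototypes, key=lambda label: jaccard(tokens, prototypes[label]))
```

The abstract reports that accuracy improved when attribute correlation was also encoded. One way to do this would be to add tokens that pair the symbols of two attributes at the same time step. This sketch omits that.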

Vegetation Types and Ecological Characteristics of Larix kaempferi Plantations in Baekdudaegan Protected Area, South Korea (백두대간 보호지역 일본잎갈나무림의 현존식생 유형과 생태적 특성)

  • Oh, Seung-Hwan;Kim, Jun-Soo;Cho, Joon-Hee;Cho, Hyun-Je
    • Journal of Korean Society of Forest Science / v.110 no.4 / pp.530-542 / 2021
  • To establish a basic unit for the ecological management of the Larix kaempferi plantations in the Baekdudaegan protected area, we classified the vegetation types with TWINSPAN and DCA ordination using vegetation data from 119 plots, and analyzed their spatial arrangement. We identified seven vegetation types: Quercus mongolica-Rhododendron schlippenbachii, Q. mongolica-Lespedeza maximowiczii, Cornus controversa-Morus australis, Q. mongolica-Carpinus cordata, Lindera erythrocarpa-Rosa multiflora, Q. serrata-Zanthoxylum schinifolium, and Q. serrata-Sasa borealis. These types generally reflected differences in floristic composition related to latitude, elevation, establishment period, management history, the characteristics of the surrounding stands, and the degree of disturbance. We also used the Jaccard coefficient to measure the floristic similarity between the Larix kaempferi plantations and the surrounding potential natural vegetation (Q. mongolica and Q. serrata forests). The coefficient varied somewhat by vegetation type but averaged 0.21 with Q. mongolica forest and 0.32 with Q. serrata forest. This indicates that the floristic composition of the plantations was still heterogeneous.
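For two species lists, the Jaccard coefficient is J = a / (a + b + c). Here a counts the species found in both lists, b those only in the first list, and c those only in the second. A minimal sketch follows. The species lists in the usage check are invented. Averaging over plots is one plausible way to summarize, not necessarily how the 0.21 and 0.32 figures were computed.

```python
def jaccard_species(list_1, list_2):
    """Jaccard coefficient J = a / (a + b + c) for two species lists.

    a: species in both lists; b: only in the first; c: only in the second.
    """
    first, second = set(list_1), set(list_2)
    a = len(first & second)
    b = len(first - second)
    c = len(second - first)
    return a / (a + b + c) if (a + b + c) else 1.0

def mean_similarity(plots, reference):
    """Average J between each plot's species list and one reference list."""
    return sum(jaccard_species(plot, reference) for plot in plots) / len(plots)
```

The coefficient ignores species absent from both lists, so two stands that merely lack the same rare species are not counted as similar.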

Comparison of User-generated Tags with Subject Descriptors, Author Keywords, and Title Terms of Scholarly Journal Articles: A Case Study of Marine Science

  • Vaidya, Praveenkumar;Harinarayana, N.S.
    • Journal of Information Science Theory and Practice / v.7 no.1 / pp.29-38 / 2019
  • Information retrieval is a central challenge of the Web 2.0 world. Organising knowledge amid the abundant information available from various sources is a major hurdle to achieving retrieval with greater precision and recall. The fast-changing landscape of personal information organisation through social networking sites creates many opportunities for data scientists and library professionals to integrate social data with expert-created data. Folksonomies, or social tags, therefore play a vital role in information organisation and retrieval. Comparing these user-created tags with expert-assigned index terms, author keywords and title words sheds light on how these sets of terms differ. Such comparisons reveal new terms that can enhance subject access and show how similar user-generated tags are to the other term sets. CiteULike tags extracted from 5,150 scholarly journal articles in marine science were compared with the corresponding Aquatic Science and Fisheries Abstracts descriptors, author keywords and title terms. The Jaccard similarity coefficient was used to compare the social tags with each of these term sets. The results showed that user-generated keywords are present in Aquatic Science and Fisheries Abstracts descriptors, author keywords and title words. When information retrieval techniques such as stemming and lemmatization were applied, the keywords were found to enhance subject access.

Deep Learning based Skin Lesion Segmentation Using Transformer Block and Edge Decoder (트랜스포머 블록과 윤곽선 디코더를 활용한 딥러닝 기반의 피부 병변 분할 방법)

  • Kim, Ji Hoon;Park, Kyung Ri;Kim, Hae Moon;Moon, Young Shik
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.4 / pp.533-540 / 2022
  • Specialists use dermoscopy to detect skin cancer as early as possible. However, accurately delineating skin lesions is difficult because lesions vary widely in shape. Recent deep learning methods for skin lesion segmentation perform well, but they still struggle because the boundary between healthy skin and the lesion is often unclear. To address these issues, the proposed method uses a transformer block to segment the skin lesion effectively. It also adds an edge decoder at each layer of the network to segment the lesion in detail. Experimental results show that, compared with the previous method, the proposed method improves the Dice coefficient by 0.041 to 0.071 and the Jaccard index by 0.062 to 0.112.
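The two segmentation metrics reported above can be computed from a predicted mask and a ground-truth mask. The sketch below uses plain nested lists of 0/1 values so that it needs no image library. It is an illustration, not the authors' evaluation code.

```python
def overlap_scores(pred, truth):
    """Dice coefficient and Jaccard index for two binary masks of equal shape."""
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same shape")
    inter = sum(a and b for a, b in zip(p, t))  # pixels marked in both masks
    union = sum(a or b for a, b in zip(p, t))   # pixels marked in either mask
    total = sum(p) + sum(t)
    dice = 2 * inter / total if total else 1.0
    jaccard = inter / union if union else 1.0
    return dice, jaccard
```

On a single mask, J = D / (2 - D). At high D, a given Dice gain therefore corresponds to a larger Jaccard gain, which matches the direction of the figures above. Dataset averages need not follow this identity exactly.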

3-Step Security Vulnerability Risk Scoring considering CVE Trends (CVE 동향을 반영한 3-Step 보안 취약점 위험도 스코어링)

  • Lim, Jihye;Lee, Jaewoo
    • Journal of the Korea Institute of Information and Communication Engineering / v.27 no.1 / pp.87-96 / 2023
  • As the number of security vulnerabilities grows each year, security threats keep occurring, which makes assessing vulnerability risk important. We devise a trend-aware security threat score to determine the risk of security vulnerabilities. Its three steps consider key elements such as attack type, supplier, vulnerability trends, and current attack methods and techniques. First, the score reflects the relevance among the attack type, the supplier and the CVE. Second, it uses Jaccard similarity to relate each CVE to the characteristics of the topic groups identified by the LDA algorithm. Third, it considers the relevance between the CVE and the attack methods and technique trends in the latest version of the MITRE ATT&CK framework. To review the usability of the proposed final formula, CTRS, we used data from overseas sites that provide reliable security information. The scoring formula enables quick patching and response by identifying highly relevant, high-risk vulnerabilities from only a few specific phrases.
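The second step scores a CVE against each LDA topic group with Jaccard similarity. A minimal sketch follows, under stated assumptions:

- In practice, the topic keyword sets would be the top-weighted words of each LDA topic. Here they are hand-written and hypothetical.
- The regex tokenizer simplifies real CVE text preprocessing.
- `best_topic` is an illustrative helper, not part of CTRS.

```python
import re

def tokenize(text):
    """Lowercase alphanumeric word tokens (simplified preprocessing)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 1.0 for two empty sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def best_topic(cve_description, topic_keywords):
    """Return (topic, score) for the topic whose keywords best match the CVE text."""
    tokens = tokenize(cve_description)
    scores = {topic: jaccard(tokens, set(words)) for topic, words in topic_keywords.items()}
    topic = max(scores, key=scores.get)
    return topic, scores[topic]
```

For example, the hypothetical description "Heap buffer overflow in parser allows code execution" shares three of nine distinct terms with a topic `{"buffer", "overflow", "heap", "memory"}`. This gives a score of 1/3.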

Genetic Stability of the Plant-materials Induced in the Process of in vitro Organogenesis of Japanese Blood Grass (화본과 식물의 기내 기관분화 단계별 기관분화체의 유전적 안전성)

  • Ye-Jin Lee;In-Jin Kang;Chang-Hyu Bae
    • Proceedings of the Plant Resources Society of Korea Conference / 2023.04a / pp.35-35 / 2023
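  • Securing stable seedlings is important in plug seedling production for smart crop cultivation. Mass propagation of seedlings with high genetic stability during in vitro culture is a key process in both seedling and plug seedling production. In in vitro culture, it is important to remove the barrier posed by somaclonal variation, which arises during the culture process. In this study, we produced regenerants at each stage of organogenesis from Japanese blood grass (Imperata cylindrica ‘Rubra’), a member of the grass family (Poaceae). We then examined the genetic stability of the in vitro regenerants during organogenesis. To examine genetic variability with ISSR markers, we analyzed 21 individuals of 7 types, comprising stage-specific regenerants and regenerated plants. Genetic polymorphism in the stage-specific regenerants and the acclimatized regenerants was equal to or higher than that of the control mother plant (1.4%). This indicates that genetic stability was somewhat lower in the regenerants. The genetic similarity index among the 21 individuals, evaluated with the Jaccard coefficient, ranged from 0.747 to 1.0, with a mean of 0.868. A cluster analysis of the ISSR marker bands using the average linkage method placed all individuals within a similarity index range of 0.809 to 1.000. At a similarity index of 0.809, the individuals formed two groups. The mother plant was grouped with the green regenerated plants grown indoors and in the open field. These results provide baseline information on the somaclonal variation that arises during organogenesis in the in vitro culture of grasses. These data on the stability of in vitro regenerants through organogenesis should also provide a useful background for the stable mass propagation of in vitro plants in the future.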
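For ISSR band presence/absence data, Jaccard similarity counts only shared bands and ignores shared absences. The sketch below computes this similarity and groups individuals by average linkage (UPGMA) down to a cut-off such as 0.809. The band profiles are invented for illustration. The merge loop is a simple hypothetical implementation, not the software the authors used.

```python
from itertools import combinations

def jaccard_bands(x, y):
    """Jaccard similarity for two 0/1 band profiles; shared absences are ignored."""
    shared = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return shared / either if either else 1.0

def average_linkage(profiles, cutoff):
    """Merge clusters (UPGMA) while the best average similarity is >= cutoff."""
    names = list(profiles)
    sim = {}
    for a, b in combinations(names, 2):
        sim[(a, b)] = sim[(b, a)] = jaccard_bands(profiles[a], profiles[b])
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average similarity over all cross-cluster pairs of individuals.
            pairs = [sim[(a, b)] for a in clusters[i] for b in clusters[j]]
            average = sum(pairs) / len(pairs)
            if best is None or average > best[0]:
                best = (average, i, j)
        average, i, j = best
        if average < cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

Stopping at the cut-off corresponds to cutting the dendrogram at that similarity level. This is how a threshold like 0.809 yields a fixed number of groups.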

Cluster Analysis with Balancing Weight on Mixed-type Data

  • Chae, Seong-San;Kim, Jong-Min;Yang, Wan-Youn
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.719-732 / 2006
  • We present a set of clustering algorithms that place an appropriate weight on the distance formulation, extending it to data with mixed numeric and multiple binary attributes. The simple matching and Jaccard coefficients are used to measure the similarity between objects on the multiple binary attributes. These similarities are then converted to dissimilarities between the ith and jth objects. We demonstrate the performance of clustering algorithms that apply a balancing weight under the different similarity measures. Our experiments show that, when clustering data with mixed numeric and multiple binary attributes, algorithms that apply a proper weight achieve competitive recovery levels.
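The sketch below shows one way to combine a numeric distance with a Jaccard- or simple-matching-based binary dissimilarity under a balancing weight. The abstract does not give the formula. The convex combination `weight * numeric + (1 - weight) * binary` and the range-normalized numeric distance are assumptions, and all function names are hypothetical.

```python
def numeric_distance(x, y, ranges):
    """Mean range-normalized absolute difference over numeric attributes."""
    return sum(abs(a - b) / r for a, b, r in zip(x, y, ranges)) / len(ranges)

def binary_dissimilarity(x, y, measure="jaccard"):
    """1 - similarity for binary vectors, using Jaccard or simple matching."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    n = len(x)
    if measure == "jaccard":
        denom = n - n00  # Jaccard ignores shared absences (0-0 matches)
        similarity = n11 / denom if denom else 1.0
    elif measure == "simple_matching":
        similarity = (n11 + n00) / n  # counts 1-1 and 0-0 matches alike
    else:
        raise ValueError(f"unknown measure: {measure}")
    return 1.0 - similarity

def mixed_dissimilarity(obj_i, obj_j, ranges, weight, measure="jaccard"):
    """Hypothetical balancing weight between the numeric and binary parts."""
    (num_i, bin_i), (num_j, bin_j) = obj_i, obj_j
    numeric_part = numeric_distance(num_i, num_j, ranges)
    binary_part = binary_dissimilarity(bin_i, bin_j, measure)
    return weight * numeric_part + (1 - weight) * binary_part
```

The choice of binary measure matters when many attributes are absent for both objects. Simple matching treats those shared zeros as agreement, while Jaccard does not.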

Phytoplankton Flora and Community Dynamics in the Seonakdong River (서낙동강의 식물플랑크톤상과 군집동태)

  • Choe, Cheol-Man;Mun, Seong-Gi
    • Proceedings of the Korean Environmental Sciences Society Conference / 2005.05a / pp.257-259 / 2005
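  • The phytoplankton surveyed in the Seonakdong River comprised 128 taxa in 31 families and 6 classes. Of these, 49 taxa (38.3%) were green algae (Chlorophyceae) and 44 taxa (34.4%) were diatoms (Bacillariophyceae). By season, the number of taxa peaked at 80 in summer and was lowest at 47 in winter, which differs from the usual pattern. By station, the highest count was 63 taxa at station 1 in summer and the lowest was 18 taxa at station 4 in autumn and winter. The number of taxa therefore varied widely across seasons and stations. There were 63 ecologically important taxa in total:
    • 33 widely distributed taxa, including Actinastrum hantzschii var. fluviatile;
    • 28 pollution-indicator taxa, including Ankistrodesmus falcatus;
    • 23 red-tide-causing taxa, including Aulacoseira granulata var. angustissima for. spiralis;
    • 8 dominant taxa, including Aphanizomenon flos-aquae;
    • 7 frequently occurring taxa, including Asterionella formosa.
  • Cluster analysis based on Jaccard's coefficient showed that, in almost all seasons, the stations formed two groups. The dividing line was either Seonakdong Bridge, separating the upstream area (st. 1 to st. 3) from the downstream area (st. 4 to st. 6), or the extent of seawater, separating the freshwater area (st. 1 to st. 4) from the area expected to be influenced by seawater (st. 5 to st. 6).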
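A cluster analysis like the one above starts from a matrix of pairwise Jaccard similarities between stations, based on which taxa occur at each station. A minimal sketch follows. The station taxa lists are placeholders, not survey data.

```python
from itertools import combinations

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 1.0 for two empty sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def station_similarity(taxa_by_station):
    """Symmetric matrix (as a dict) of Jaccard similarity between stations."""
    stations = sorted(taxa_by_station)
    matrix = {(s, s): 1.0 for s in stations}
    for s, t in combinations(stations, 2):
        value = jaccard(set(taxa_by_station[s]), set(taxa_by_station[t]))
        matrix[(s, t)] = matrix[(t, s)] = value
    return matrix

def most_similar_pair(matrix):
    """Most similar station pair, i.e. the first merge in hierarchical clustering."""
    return max(((s, t) for s, t in matrix if s < t), key=matrix.get)
```

Any standard linkage (single, complete or average) then builds the dendrogram from this matrix. The first merge is always the most similar pair.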

Red Tide Detection Based on Two Stage Filtering with MODIS Chlorophyll Information (MODIS 클로로필 정보를 이용한 2단계 필터링 기반 적조 탐지)

  • Kim, Yong-Min;Byun, Young-Gi;Kim, Yong-Il;Yu, Ki-Yun
    • Proceedings of the KSRS Conference / 2008.03a / pp.170-175 / 2008
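  • This study presents an algorithm that detects the large-scale Cochlodinium polykrikoides red tide along the East Sea and South Sea coasts of Korea. The algorithm applies two-stage filtering to chlorophyll information provided by MODIS. Typical red tide detection studies exploit the correlation between chlorophyll and red tide occurrence, flagging waters with high chlorophyll concentrations as red tide areas. However, this approach produces commission errors by flagging waters where no red tide occurred. To overcome this problem, we extracted candidate red tide areas from the MODIS chlorophyll information and applied a two-stage filtering process. This removed the commission errors in the waters near Jinhae, Yeosu and Namhae Island. We verified the usefulness of the proposed algorithm by visually comparing the results with the red tide bulletins of the National Fisheries Research and Development Institute. For quantitative evaluation, we plan to assess accuracy using the F-measure, JC (Jaccard coefficient), YC (Yule coefficient) and overall accuracy as detection accuracy measures.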
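The planned accuracy measures can all be computed from pixel counts of a detection map against reference data. In these counts, commission errors are the false positives (`fp`). A minimal sketch follows. For YC, the sketch assumes the classical Yule's Q, (TP·TN − FP·FN) / (TP·TN + FP·FN). Other formulas are also called the Yule coefficient in the detection literature, so this choice is an assumption.

```python
def detection_accuracy(tp, fp, fn, tn):
    """Accuracy measures for a binary detection map from pixel counts.

    Assumes all denominators are nonzero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        # Harmonic mean of precision and recall.
        "f_measure": 2 * precision * recall / (precision + recall),
        # Jaccard coefficient: correct detections over everything flagged or missed.
        "jaccard": tp / (tp + fp + fn),
        # Yule's Q (assumed form of the Yule coefficient).
        "yule_q": (tp * tn - fp * fn) / (tp * tn + fp * fn),
        # Share of all pixels classified correctly.
        "overall_accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```

Unlike overall accuracy, the Jaccard coefficient ignores true negatives. This keeps it from being inflated by the large area of clear water in a scene.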