• Title/Summary/Keyword: jaccard

Multi-Modal Based Malware Similarity Estimation Method (멀티모달 기반 악성코드 유사도 계산 기법)

  • Yoo, Jeong Do;Kim, Taekyu;Kim, In-sung;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.2 / pp.347-363 / 2019
  • Malware has its own unique behavioral characteristics, like DNA in living things. To respond to APT (Advanced Persistent Threat) attacks in advance, behavioral characteristics must be extracted from malware, and each sample must then be classified by behavioral similarity. In this paper, several similarity measures are estimated for Windows malware, and each sample's malware family is predicted from these similarity values. The similarity measures used are 'TF-IDF cosine similarity', 'Nilsimsa similarity', 'malware function cosine similarity', and 'Jaccard similarity'. The prediction rate differs widely across similarity measures. Although no single similarity measure classifies malware with high accuracy, this result can help in selecting a similarity measure for classifying a specific malware family.
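As a minimal sketch of the Jaccard similarity named in this abstract, the function below compares two samples as sets of behavioral features. The feature sets (imported API names) and the empty-set convention are illustrative assumptions; the abstract does not say which features the authors compared.

```python
def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two iterables treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty feature sets are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical behavioral features (imported API names) for two samples.
sample_a = {"CreateFileW", "WriteFile", "RegSetValueExW", "connect"}
sample_b = {"CreateFileW", "WriteFile", "connect", "send", "recv"}

# 3 shared features out of 6 distinct features
print(jaccard_similarity(sample_a, sample_b))  # 0.5
```

A family prediction could then assign a sample to the family of its most similar known sample, although the abstract does not describe the prediction rule.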

Proposal of Content Recommend System on Insurance Company Web Site Using Collaborative Filtering (협업필터링을 활용한 보험사 웹 사이트 내의 콘텐츠 추천 시스템 제안)

  • Kang, Jiyoung;Lim, Heuiseok
    • Journal of Digital Convergence / v.17 no.11 / pp.201-206 / 2019
  • Although many users search for insurance information online, there has been little research on content recommendation for insurance companies' websites. This study therefore proposes a system that recommends pages a user is likely to prefer, using the page visit history of an insurance company's website. Data were collected from the client-side storage generated during web browsing. Collaborative filtering was applied as the recommendation technique. In the experiment, item-based collaborative filtering (IBCF) using the Jaccard index on binary data (visited or not visited) performed well. In future work, studying recommendation techniques that weight items could make it possible to implement a content recommendation system aligned with a company's marketing strategy.
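The item-based approach on binary visit data can be sketched as follows. Each page is represented by the set of users who visited it, page-to-page similarity is the Jaccard index of those sets, and an unvisited page is scored by its summed similarity to the pages a user has already seen. The visit log, page names, and the summed-similarity scoring rule are assumptions for illustration, not details taken from the study.

```python
from collections import defaultdict

# Hypothetical visit log: user -> set of pages visited (binary data).
visits = {
    "u1": {"auto", "life", "claims"},
    "u2": {"auto", "claims"},
    "u3": {"life", "pension"},
    "u4": {"auto", "life"},
}


def item_user_sets(visits):
    """Invert user -> pages into page -> set of users who visited it."""
    users_by_item = defaultdict(set)
    for user, pages in visits.items():
        for page in pages:
            users_by_item[page].add(user)
    return users_by_item


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, visits, top_n=2):
    """Rank unvisited pages by summed Jaccard similarity to visited pages."""
    users_by_item = item_user_sets(visits)
    seen = visits[user]
    scores = {}
    for candidate in users_by_item:
        if candidate in seen:
            continue
        scores[candidate] = sum(
            jaccard(users_by_item[candidate], users_by_item[page]) for page in seen
        )
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


print(recommend("u2", visits))  # [('life', 0.75), ('pension', 0.0)]
```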

Predicting link of R&D network to stimulate collaboration among education, industry, and research (산학연 협업 활성화를 위한 R&D 네트워크 연결 예측 연구)

  • Park, Mi-yeon;Lee, Sangheon;Jin, Guocheng;Shen, Hongme;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.37-52 / 2015
  • Recent global trends show industry-academia-research collaboration and R&D networks both expanding and becoming more firmly established. Greater support for networks and cooperative research would open further possibilities for new academic and industrial fields and for new theories arising from the synergy of collaborative research. Likewise, there is a growing national need for a strategy that can efficiently and effectively support the R&D networks formed through government R&D projects. Despite this urgency, network policies remain inadequate because they habitually rely on simple personal information about R&D participants and on generalized statistics. Accordingly, this study analyzed the relationships among the participants in R&D projects and, on the basis of the industry-academia-research network, predicted possible changes in the network. To predict these changes, the Common Neighbor and Jaccard's Coefficient models were taken as the baseline models, a new prediction model was proposed to address their limitations and increase link prediction accuracy, and a comparative analysis was carried out against the two models. By effectively predicting changes in the R&D network, the results serve as a stepping stone toward a strategy that supports a desirable industry-academia-research network and toward national policy that efficiently and effectively supports integrated R&D.
Although the weighted versions of both the Common Neighbor and Jaccard's Coefficient models gave positive results, the improvement in accuracy was greater for the weighted Common Neighbor model. The unweighted Common Neighbor model predicted 650 out of 4,136, whereas the weighted Common Neighbor model predicted 700, 50 more. The Jaccard's Coefficient model showed a slight numerical improvement, but the difference was insignificant.
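The two baseline indices can be illustrated on a toy network. The sketch below scores every unconnected pair by its number of common neighbors and by the Jaccard coefficient of the two neighbor sets. The graph and node names are hypothetical, and only the unweighted forms are shown, because the abstract does not specify how the weighted variants or the proposed model are defined.

```python
from itertools import combinations

# Hypothetical undirected collaboration network: node -> set of neighbors.
graph = {
    "univ_A": {"firm_X", "lab_P"},
    "univ_B": {"firm_X", "lab_P", "firm_Y"},
    "firm_X": {"univ_A", "univ_B"},
    "firm_Y": {"univ_B"},
    "lab_P": {"univ_A", "univ_B"},
}


def common_neighbors(graph, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(graph[u] & graph[v])


def jaccard_coefficient(graph, u, v):
    """Shared neighbors divided by all neighbors of u or v."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def score_missing_links(graph):
    """Score every currently unconnected pair with both indices."""
    rows = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already linked, nothing to predict
        rows.append(
            (u, v, common_neighbors(graph, u, v), jaccard_coefficient(graph, u, v))
        )
    # Most common neighbors first, then highest Jaccard coefficient.
    return sorted(rows, key=lambda r: (-r[2], -r[3], r[0], r[1]))


for row in score_missing_links(graph):
    print(row)
```

In this toy graph both indices rank the pair `firm_X` and `lab_P` first, since the two nodes share all of their neighbors.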

A Study on the Cloud Detection Technique of Heterogeneous Sensors Using Modified DeepLabV3+ (DeepLabV3+를 이용한 이종 센서의 구름탐지 기법 연구)

  • Kim, Mi-Jeong;Ko, Yun-Ho
    • Korean Journal of Remote Sensing / v.38 no.5_1 / pp.511-521 / 2022
  • Detecting and removing clouds from satellite images is an essential step for topographic observation and analysis. Threshold-based cloud detection techniques perform stably because they rely on the physical characteristics of clouds, but they require images from all channels and long computation times. Recently studied deep learning techniques for cloud detection offer short computation times and excellent performance even with images of four or fewer channels (RGB, NIR). In this paper, we examine how the performance of a deep learning network depends on heterogeneous training datasets of different resolutions. The DeepLabV3+ network was modified to extract channel features for cloud detection and was trained separately on two published heterogeneous datasets and on mixed data. In the experiments, the Jaccard index for clouds was low when the network was trained on images of a different kind from the test images, but high when it was trained on mixed data that included some images of the same kind as the test data. Because clouds have no fixed shape, reflecting channel features in training is more effective for cloud detection than reflecting spatial features, and the channel features of each satellite sensor need to be learned. Cloud detection across heterogeneous sensors with different resolutions is therefore highly dependent on the training dataset.
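For segmentation tasks such as this one, the Jaccard index is the ratio of pixels labeled cloud in both the predicted and reference masks to pixels labeled cloud in either. The sketch below uses small hand-made masks as nested lists. The masks and the convention for two empty masks are assumptions for illustration only.

```python
def mask_jaccard(pred, truth):
    """Jaccard index of two equally sized binary masks given as nested lists."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumption: if neither mask contains cloud pixels, score the pair as 1.0.
    return intersection / union if union else 1.0


# Hypothetical 3x4 masks: 1 = cloud, 0 = clear.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
reference = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

# 3 pixels are cloud in both masks, 5 pixels are cloud in at least one
print(mask_jaccard(predicted, reference))  # 0.6
```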

Comparison of the Cluster Validation Methods for High-dimensional (Gene Expression) Data (고차원 (유전자 발현) 자료에 대한 군집 타당성분석 기법의 성능 비교)

  • Jeong, Yun-Kyoung;Baek, Jang-Sun
    • The Korean Journal of Applied Statistics / v.20 no.1 / pp.167-181 / 2007
  • Many clustering algorithms and cluster validation techniques for high-dimensional gene expression data have been suggested. These cluster validation techniques have, however, seldom been evaluated. In this paper we compare various cluster validity indices on low-dimensional simulation data and real gene expression data. Among the internal measures, Dunn's index is the most effective and robust, the Silhouette index is next, and the Davies-Bouldin index is the least effective. Among the external measures, the Jaccard index is much more effective than the Goodman-Kruskal index and the adjusted Rand index.
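As an external validity measure, the Jaccard index is usually computed over pairs of objects: it counts pairs placed together in both the reference partition and the clustering, divided by pairs placed together in at least one of them. The sketch below assumes this common pair-counting definition, which the abstract does not spell out, and uses made-up labels.

```python
from itertools import combinations


def pair_jaccard(true_labels, cluster_labels):
    """Pair-counting Jaccard index between a reference partition and a clustering."""
    both = only_true = only_cluster = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_true and same_cluster:
            both += 1
        elif same_true:
            only_true += 1
        elif same_cluster:
            only_cluster += 1
    denom = both + only_true + only_cluster
    # Assumption: if no pair is co-clustered in either partition, return 1.0.
    return both / denom if denom else 1.0


# Hypothetical labels: one object (index 2) is assigned to the wrong cluster.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]

print(pair_jaccard(truth, found))  # 4 agreeing pairs out of 9, about 0.444
```

Unlike the Rand index, this measure ignores pairs that are separated in both partitions, so it does not reward a clustering merely for splitting objects apart.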

Longitudinal Variation of Fish Communities in the Geum River, Korea: Application of the Concept of Beta Diversity and Local Uniqueness

  • Kim, Jeong-Hui;Park, Sang-Hyeon;Baek, Seung-Ho;Hong, Donghyun;Jo, Hyunbin
    • Proceedings of the National Institute of Ecology of the Republic of Korea / v.3 no.2 / pp.122-128 / 2022
  • To describe the spatial variation of fish assemblages in the Geum River in Korea, we applied beta diversity (β-diversity) estimates based on the variance of the community data table. Fish community data and environmental variables were collected from 13 sampling sites along the middle and lower reaches of the river. We calculated β-diversity and the local contribution to beta diversity (LCBD) at each site for two types of data: 'occurrence', using the Jaccard and Sørensen dissimilarity coefficients, and 'abundance', using the Hellinger distance. Multivariate and correlation analyses were also performed to determine the relationships between LCBD and other variables, such as community indices and physicochemical and hydrological factors. The β-diversity of the fish communities was estimated as 0.218 and 0.145 for the occurrence data with Jaccard and Sørensen, respectively, and 0.268 for the abundance data. The two dissimilarity measures for occurrence data produced similar patterns of LCBD across the sampling sites, whereas LCBD values from abundance data differed slightly. The LCBD values were strongly correlated with community indices and are suitable for indicating the uniqueness of fish assemblages. However, further research is needed to establish the LCBD value as an indicator of environmental variability.
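The occurrence-based calculation can be sketched for a handful of sites. The abstract does not give its formulas, so the code below assumes the variance-based formulation of Legendre and De Cáceres (2013) with square-rooted dissimilarities: the total sum of squares is the sum of pairwise dissimilarities divided by the number of sites, total β-diversity divides that by n - 1, and LCBD is each site's share of the diagonal of the Gower-centred matrix. The species lists are invented.

```python
def jaccard_dissimilarity(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two species sets."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def sorensen_dissimilarity(a, b):
    """1 - 2|A ∩ B| / (|A| + |B|) for two species sets."""
    total = len(a) + len(b)
    return 1 - 2 * len(a & b) / total if total else 0.0


def beta_diversity(sites, dissimilarity):
    """Return (BD_total, LCBD per site) for a list of at least two species sets.

    Assumption: distances are square roots of the dissimilarities, so the
    squared distance used below equals the dissimilarity itself.
    """
    n = len(sites)
    # a[h][i] = -0.5 * squared distance between sites h and i
    a = [[-0.5 * dissimilarity(sites[h], sites[i]) for i in range(n)] for h in range(n)]
    row_means = [sum(row) / n for row in a]
    grand_mean = sum(row_means) / n
    # Diagonal of the Gower-centred matrix; its sum is the total sum of squares.
    g_diag = [a[i][i] - 2 * row_means[i] + grand_mean for i in range(n)]
    ss_total = sum(g_diag)
    if ss_total == 0:
        return 0.0, [0.0] * n  # all sites identical
    return ss_total / (n - 1), [g / ss_total for g in g_diag]


# Hypothetical presence lists for three sampling sites.
sites = [
    {"carp", "minnow", "bass"},
    {"carp", "minnow"},
    {"carp", "goby", "bass", "catfish"},
]

bd_jaccard, lcbd_jaccard = beta_diversity(sites, jaccard_dissimilarity)
bd_sorensen, lcbd_sorensen = beta_diversity(sites, sorensen_dissimilarity)
print(round(bd_jaccard, 3), [round(v, 3) for v in lcbd_jaccard])
print(round(bd_sorensen, 3), [round(v, 3) for v in lcbd_sorensen])
```

Because the Jaccard dissimilarity of any pair is at least as large as its Sørensen dissimilarity, the Jaccard-based total is never smaller, which matches the ordering of the two occurrence values reported in the abstract.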

Attention-based deep learning framework for skin lesion segmentation (피부 병변 분할을 위한 어텐션 기반 딥러닝 프레임워크)

  • Afnan Ghafoor;Bumshik Lee
    • Smart Media Journal / v.13 no.3 / pp.53-61 / 2024
  • This paper presents a novel M-shaped encoder-decoder architecture for skin lesion segmentation, achieving better performance than existing approaches. The proposed architecture utilizes the left and right legs to enable multi-scale feature extraction and is further enhanced by integrating an attention module within the skip connection. The image is partitioned into four distinct patches, facilitating enhanced processing within the encoder-decoder framework. A pivotal aspect of the proposed method is to focus more on critical image features through an attention mechanism, leading to refined segmentation. Experimental results highlight the effectiveness of the proposed approach, demonstrating superior accuracy, precision, and Jaccard Index compared to existing methods.

Malicious Trojan Horse Application Discrimination Mechanism using Realtime Event Similarity on Android Mobile Devices (안드로이드 모바일 단말에서의 실시간 이벤트 유사도 기반 트로이 목마 형태의 악성 앱 판별 메커니즘)

  • Ham, You Joung;Lee, Hyung-Woo
    • Journal of Internet Computing and Services / v.15 no.3 / pp.31-43 / 2014
  • A large number of Android mobile applications have been developed and distributed through the Android open market as the number of users of Android-based smart work devices has grown. However, security vulnerabilities have been found in the form of malicious applications developed and distributed through the open market or third-party markets. Most malicious applications with embedded Trojan horse code leak the user's personal and financial information from the mobile device to an external server without the user's knowledge. To minimize the damage caused by the steadily increasing number of malicious applications, a proactive detection mechanism is therefore required. In this paper, we analyze the pros and cons of existing techniques for detecting malicious applications, and we present discrimination and detection results obtained with a mechanism based on Jaccard similarity, applied to events collected in real time while applications run on Android mobile devices.

Extraction of Building Boundary on Aerial Image Using Segmentation and Overlaying Algorithm (분할과 중첩 기법을 이용한 항공 사진 상의 빌딩 경계 추출)

  • Kim, Yong-Min;Chang, An-Jin;Kim, Yong-Il
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.30 no.1 / pp.49-58 / 2012
  • Buildings become more complex and diverse over time. It is difficult to extract individual buildings using only an optical image, because they have spectral characteristics similar to those of objects such as vegetation and roads. In this study, we propose a method to extract building areas and boundaries by integrating airborne Light Detection and Ranging (LiDAR) data and aerial images. First, a binary edge map was generated using the Edison edge detector after applying the adaptive dynamic range linear stretching radiometric enhancement algorithm to the aerial image. Second, building objects were extracted from the airborne LiDAR data using the normalized Digital Surface Model and the aerial image. Then, temporary building areas were extracted by overlaying the binary edge map and the building objects extracted from the LiDAR data. Finally, some building boundaries were further refined by considering the positional accuracy between the LiDAR data and the aerial image. The proposed method was applied to two experimental sites for validation. The F-measure, Jaccard coefficient, Yule coefficient, and overall accuracy were calculated from the error matrix, and all values were higher than 0.85.
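Three of the accuracy measures listed in this abstract follow directly from the four cells of a two-class error matrix. The sketch below uses the standard definitions with hypothetical pixel counts. The Yule coefficient is left out because the abstract does not state which of its published forms the authors used.

```python
def extraction_scores(tp, fp, fn, tn):
    """Accuracy measures from a two-class error matrix (building vs. background).

    tp: building pixels correctly extracted
    fp: background pixels extracted as building
    fn: building pixels missed
    tn: background pixels correctly rejected
    """
    if tp + fp + fn == 0:
        raise ValueError("no building pixels in either map; scores are undefined")
    return {
        "f_measure": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
        "overall_accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical pixel counts for one test site.
scores = extraction_scores(tp=900, fp=50, fn=50, tn=9000)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

The Jaccard coefficient is never larger than the F-measure for the same error matrix, and overall accuracy can look high simply because background pixels dominate the scene, which is one reason to report several measures together.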

RAPD Analysis of Three Deer Species in Malaysia

  • El-Jaafari, Habiba A.A.;Panandam, Jothi M.;Idris, Ismail;Siraj, Siti Shapor
    • Asian-Australasian Journal of Animal Sciences / v.21 no.9 / pp.1233-1237 / 2008
  • The genetic variability within and among three deer species in Malaysia, namely Cervus nippon (sika), Cervus timorensis (rusa) and Cervus unicolor (sambar), was evaluated using the RAPD technique. DNA extracted from the buffy coat of 34 sika, 38 rusa and 9 sambar was analysed using ten primers that gave bands with good resolution. The primers generated 164 RAPD markers in total, ranging in size from 150 to 900 bp. The percentage of polymorphic bands generated per primer ranged from 66.66 to 93.33% for rusa, 36.84 to 61.14% for sambar and 52.38 to 100% for sika. The overall percent polymorphism observed for the 164 RAPD markers was 99.39%. The results revealed five exclusive, monomorphic markers for sambar and one exclusive, monomorphic marker for sika; none was observed for rusa. However, these cannot be declared as markers for species identification without analysis of more samples, populations and species. The mean within-population genetic distances, based on Dice's and Jaccard's similarity indices, were similar for the rusa (0.383 and 0.542, respectively) and sika (0.397 and 0.558, respectively) populations, with the sambar population being the least variable (0.194 and 0.323, respectively). The Dice-based genetic distances within the species ranged from 0.194 to 0.397, and the genetic distances among the species were 0.791-0.911. The genetic distances based on Dice's and Jaccard's similarity indices were 0.556 and 0.713 between rusa and sambar, 0.552 and 0.710 between rusa and sika, and 0.622 and 0.766 between sambar and sika, respectively.
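The paired Dice and Jaccard values in this abstract are linked by a fixed relationship, which the following derivation makes explicit. It assumes that each genetic distance is one minus the corresponding similarity index, which the abstract does not state, and uses a for the number of bands shared by two individuals and b, c for the bands unique to each.

```latex
\begin{aligned}
S_D &= \frac{2a}{2a + b + c}, \qquad S_J = \frac{a}{a + b + c}
      \quad\Longrightarrow\quad S_J = \frac{S_D}{2 - S_D}, \\[4pt]
d_D &= 1 - S_D, \qquad d_J = 1 - S_J
      \quad\Longrightarrow\quad d_J = \frac{2\,d_D}{1 + d_D}, \\[4pt]
d_D &= 0.194 \;\Rightarrow\; d_J = \frac{0.388}{1.194} \approx 0.325, \\
d_D &= 0.556 \;\Rightarrow\; d_J = \frac{1.112}{1.556} \approx 0.715 .
\end{aligned}
```

These results are close to the reported Jaccard-based values of 0.323 and 0.713. An exact match is not expected, because the reported figures are means over many pairwise distances and the conversion is nonlinear.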