• Title/Summary/Keyword: Jaccard's similarity index


Comparison of Plant Diversity of Natural Forest and Plantations of Rema-Kalenga Wildlife Sanctuary of Bangladesh

  • Sobuj, Norul-Alam;Rahman, Mizanur
    • Journal of Forest and Environmental Science
    • /
    • v.27 no.3
    • /
    • pp.127-134
    • /
    • 2011
  • The purpose of the study was to assess and compare the diversity of plant species (trees, shrubs, herbs) in natural forest and plantations. A total of 52 plant species were recorded in the natural forest, of which 16 were trees, 15 were shrubs and 21 were herbs. In contrast, 31 plant species, including 11 trees, 8 shrubs and 12 herbs, were identified in the plantation forest. The Shannon-Wiener diversity index was 2.70, 2.72 and 3.12 for trees, shrubs and herbs respectively in the natural forest, compared with 2.35 for tree species, 2.31 for shrub species and 2.81 for herb species in the plantation forest. Jaccard's similarity index showed that 71% of tree species, 44% of shrub species and 43% of herb species were common to the plantations and the natural forest.
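As a rough illustration of how a Jaccard similarity index of this kind is usually computed from two species lists, here is a minimal Python sketch. The function name `jaccard_index` and the species labels are hypothetical and are not taken from the study.

```python
def jaccard_index(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| for two collections of species names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both lists are empty
    return len(a & b) / len(union)


# Hypothetical species lists, not data from the study.
natural_forest = {"sp1", "sp2", "sp3", "sp4", "sp5"}
plantation = {"sp3", "sp4", "sp5", "sp6"}

# 3 shared species out of 6 distinct species overall.
print(jaccard_index(natural_forest, plantation))
```

Multiplying the result by 100 gives a percentage of shared species of the kind quoted in the abstract.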

Development of a Personalized Similarity Measure using Genetic Algorithms for Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.12
    • /
    • pp.219-226
    • /
    • 2018
  • Collaborative filtering has been the most popular approach for recommending items in online recommender systems. However, collaborative filtering is known to suffer from the data sparsity problem. As a simple way to overcome this problem, the Jaccard index has been combined with existing similarity measures in the literature. We analyze the performance of such combinations in various data environments. We also find the optimal weights of the factors in the combination using a genetic algorithm to formulate a similarity measure. Furthermore, optimal weights are searched for each user independently, in order to reflect each user's different rating behavior. The performance of the resulting personalized similarity measure is examined using two datasets with different data characteristics. It shows overall superiority to previous measures in terms of recommendation and prediction quality, regardless of the characteristics of the data environment.
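The abstract above does not give the exact form of the combined measure, so the following is only a sketch of one plausible reading: a weighted sum of the Jaccard index over rated items and a cosine similarity over co-rated items. The function names, the linear weighting and the default `weight` value are assumptions; the genetic-algorithm search for per-user weights is not shown.

```python
import math


def jaccard(items_u, items_v):
    """Share of rated items that two users have in common."""
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0


def cosine_on_common(ratings_u, ratings_v):
    """Cosine similarity restricted to items rated by both users."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(ratings_u, ratings_v, weight=0.5):
    """Assumed linear combination; `weight` is what an optimizer would tune per user."""
    j = jaccard(set(ratings_u), set(ratings_v))
    c = cosine_on_common(ratings_u, ratings_v)
    return weight * j + (1.0 - weight) * c


# Hypothetical ratings keyed by item id.
user_u = {"a": 4, "b": 2}
user_v = {"a": 4, "b": 2, "c": 5}
print(combined_similarity(user_u, user_v, weight=0.5))
```

The Jaccard term rewards users who have rated many of the same items, which is why it is often used to offset sparse rating data.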

Comparison of User-generated Tags with Subject Descriptors, Author Keywords, and Title Terms of Scholarly Journal Articles: A Case Study of Marine Science

  • Vaidya, Praveenkumar;Harinarayana, N.S.
    • Journal of Information Science Theory and Practice
    • /
    • v.7 no.1
    • /
    • pp.29-38
    • /
    • 2019
  • Information retrieval is the challenge of the Web 2.0 world. Organising knowledge in the context of abundant information available from various sources proves a major hurdle to achieving information retrieval with greater precision and recall. The fast-changing landscape of information organisation through social networking sites at a personal level creates a world of opportunities for data scientists and library professionals to assimilate social data with expert-created data. Thus, folksonomies or social tags play a vital role in information organisation and retrieval. Comparing these user-created tags with expert-created index terms, author keywords and title words will throw light on the differences between these sets of data. Such comparative studies reveal new sets of terms that enhance subject access and reflect the extent of similarity between user-generated tags and other sets of terms. The CiteULike tags extracted from 5,150 scholarly journal articles in marine science were compared with the corresponding Aquatic Science and Fisheries Abstracts descriptors, author keywords, and title terms. The Jaccard similarity coefficient was employed to compare the social tags with the above-mentioned word sets, and the results confirmed the presence of user-generated keywords in Aquatic Science and Fisheries Abstracts descriptors, author keywords, and title words. When information retrieval techniques such as stemming and lemmatization were applied, the results were found to enhance keyword-based subject access.

The benefit of one cannot replace the other: seagrass and mangrove ecosystems at Santa Fe, Bantayan Island

  • Mendoza, Ayana Rose R.;Patalinghug, Jenny Marie R.;Divinagracia, Joshua Ybanez
    • Journal of Ecology and Environment
    • /
    • v.43 no.2
    • /
    • pp.183-190
    • /
    • 2019
  • Background: In the Philippines, mangroves have been planted over seagrass to protect the coastline from storm damage. Despite the added protection to the coastline, the addition of an artificial ecosystem gradually damages the ecosystem already established. In this study, seagrass communities with no history of mangrove planting were compared with those that had mangrove planting. The percent substrate cover of seagrass in the sampling areas was determined, and the macroinvertebrates present in the sampling areas were also observed. The study was conducted based on reports of mangrove planting activity that disrupted seagrass functions on Santa Fe, Bantayan Island, Cebu. Transect-quadrat sampling was used to assess the chosen sites. Results: Six species of seagrass were found at the site without mangrove planting, barangay Ocoy (Cymodocea sp., Thalassia sp., Halodule sp., Enhalus sp., Halophila sp., and Syringodium sp.), which also had a higher percent cover, while only four were found at the site with mangrove planting (barangay Marikaban). Barangay Marikaban also had lower Shannon-Wiener and Simpson's indices than barangay Ocoy. Jaccard's index of similarity between the two sites was low. Conclusion: Based on the results of the assessment, we recommend proper monitoring of future mangrove planting activities, and that these activities should not disrupt another ecosystem, as all ecosystems are important.

Complimentary Assessment for Conserving Vegetation on Protected Areas in South Korea (보호지역의 식물종 보전 상보성 평가)

  • Park, Jin-Han;Choe, Hyeyeong;Mo, Yongwon
    • Korean Journal of Environment and Ecology
    • /
    • v.34 no.5
    • /
    • pp.436-445
    • /
    • 2020
  • The number of protected areas has steadily increased in Korea to achieve Aichi Target 11, and there are studies on potential protected areas that require additional designation. However, there has been insufficient assessment of the complementarity of protected areas for conserving biodiversity effectively. This study identified potential habitat areas using a species distribution model for plant species from the 3rd National Ecosystem Survey and compared the plant species abundance in the existing protected areas and the potential protected areas using similarity indices such as the Jaccard index, Sørensen index, and Bray-Curtis index. As a result, we found that the complementarity of the existing protected areas and most potential protected areas was low, leading to the preservation of similar plant species. Only the buffer zone for the Korea National Arboretum had high complementarity and is thus important for conserving some species together with the other protected areas. This study confirmed that it is necessary to select additional protected areas outside the existing or potential protected areas to protect plant species with a low inclusion ratio of potential habitats within the protected areas. This study is significant because it identified the ecological representativeness of each protected area, to examine whether individual protected areas can conserve unique and varied species, and proposed a method for spatially finding candidate areas for additional conservation. The findings of this study can be a valuable reference for the qualitative improvement of protected areas through complementarity assessments that include animals, and for future studies assessing the effectiveness of protected areas using National Ecosystem Survey data.
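For readers unfamiliar with the other two indices named above, here is a minimal sketch using their standard textbook definitions: the Sørensen index on presence/absence sets and the Bray-Curtis dissimilarity on abundance counts. The function names and sample data are hypothetical, and the study's own processing of species distribution model outputs is not reproduced.

```python
def sorensen_index(a, b):
    """Sørensen similarity 2|A ∩ B| / (|A| + |B|) on species presence sets."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def bray_curtis_dissimilarity(counts_x, counts_y):
    """Bray-Curtis dissimilarity: sum |x_i - y_i| / sum (x_i + y_i) over all species."""
    species = counts_x.keys() | counts_y.keys()
    diff = sum(abs(counts_x.get(s, 0) - counts_y.get(s, 0)) for s in species)
    total = sum(counts_x.get(s, 0) + counts_y.get(s, 0) for s in species)
    return diff / total if total else 0.0


# Hypothetical abundance counts for two areas, not data from the study.
area_x = {"sp1": 3, "sp2": 1}
area_y = {"sp1": 1, "sp3": 2}

print(sorensen_index(area_x, area_y))            # presence/absence only
print(bray_curtis_dissimilarity(area_x, area_y))  # uses abundances
```

Under this reading, two areas with high similarity protect largely the same species and therefore have low complementarity.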

Comparison of Occurrences of Coleoptera by Three Sampling Methods in Mt. Yeonyeop Area, Korea (채집법에 따른 연엽산 일대 딱정벌레목의 출현상 비교 분석)

  • Jeong Jong-Kook;Lee Seung-Il;Choi Jae-Seok;Kwon Oh-Kil
    • Korean Journal of Environmental Biology
    • /
    • v.23 no.3 s.59
    • /
    • pp.228-237
    • /
    • 2005
  • To compare the occurrence of Coleoptera across different sampling methods (light trap, pitfall trap and sweeping), we collected samples every month from April to September 2004 at Mt. Yeonyeop, Gangwon-do, Korea. The species composition, abundance and dry weight differed completely between sampling methods. We collected 151 species in 35 families (690 individuals) by sweeping, 148 species in 30 families (689 individuals) by light trap, and 112 species in 18 families (1,674 individuals) by pitfall trap. The dry weight of the collected samples was about 181.46 g for the pitfall trap, 39.85 g for the light trap, and 10.89 g for sweeping. Strong-flying, small-sized beetles such as Coccinellidae, Nitidulidae and Scarabaeidae were collected in the light trap, where species diversity was high in July. Unlike the light trap samples, the pitfall trap samples were large saprophagous or carnivorous beetles such as Carabidae, Silphidae and Staphylinidae. The pitfall trap showed a relatively higher number of individuals and lower species diversity than the other methods. The major samples collected by sweeping were small carnivorous or herbivorous beetles such as Chrysomelidae, Curculionidae and Coccinellidae, with species diversity peaking in May. Similarity calculated with Jaccard's index was 0.07 for light trap versus pitfall trap, 0.10 for light trap versus sweeping, and 0.01 for pitfall trap versus sweeping. Consequently, the similarity between sampling methods was relatively low. In conclusion, the sampling methods differed significantly in the species composition of Coleoptera they captured. This study emphasizes the necessity of using all three sampling methods in diversity research.

Brain Tumor Detection Based on Amended Convolution Neural Network Using MRI Images

  • Mohanasundari M;Chandrasekaran V;Anitha S
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.10
    • /
    • pp.2788-2808
    • /
    • 2023
  • Brain tumors are one of the most threatening malignancies for humans. Misdiagnosis of brain tumors can result in false medical intervention, which ultimately reduces a patient's chance of survival. Manual identification and segmentation of brain tumors from Magnetic Resonance Imaging (MRI) scans can be difficult and error-prone because of the great range of tumor tissues that exist in various individuals and the similarity of normal tissues. To overcome this limitation, the Amended Convolutional Neural Network (ACNN) model has been introduced, a unique combination of three techniques that have not been previously explored for brain tumor detection. The three techniques integrated into the ACNN model are image tissue preprocessing using the Kalman Bucy Smoothing Filter to remove noisy pixels from the input, image tissue segmentation using the Isotonic Regressive Image Tissue Segmentation Process, and feature extraction using the Marr Wavelet Transformation. The extracted features are compared with the testing features using a sigmoid activation function in the output layer. The experimental findings show that the suggested model outperforms existing techniques concerning accuracy, precision, sensitivity, dice score, Jaccard index, specificity, Positive Predictive Value, Hausdorff distance, recall, and F1 score. The proposed ACNN model achieved a maximum accuracy of 98.8%, which is higher than other existing models, according to the experimental results.
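Two of the segmentation metrics listed above, the Jaccard index and the Dice score, are both overlap measures on binary masks. The sketch below uses their standard definitions on flattened masks; the function name `overlap_scores` and the toy masks are hypothetical, and nothing here reflects the ACNN model itself.

```python
def overlap_scores(pred, truth):
    """Return (jaccard, dice) for two equal-length binary masks (flattened)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # tumor in both
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # predicted only
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # missed
    denom = tp + fp + fn
    if denom == 0:
        return 1.0, 1.0  # assumed convention: two empty masks agree perfectly
    jaccard = tp / denom
    dice = 2 * tp / (2 * tp + fp + fn)
    return jaccard, dice


# Hypothetical flattened masks: 1 = tumor pixel, 0 = background.
predicted = [1, 1, 0, 0, 1]
reference = [1, 0, 0, 1, 1]
print(overlap_scores(predicted, reference))
```

The two scores are monotonically related by Dice = 2J / (1 + J), so they rank segmentations identically even though their values differ.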

Genetic diversity of grapevine (Vitis vinifera L.) as revealed by ISSR markers

  • Basheer-Salimia, Rezq;Mujahed, Arwa
    • Journal of Plant Biotechnology
    • /
    • v.46 no.1
    • /
    • pp.1-8
    • /
    • 2019
  • The main goal of this study was to determine the genetic diversity among 36 grape cultivars grown in Palestine using ISSR-polymerase chain reaction (PCR) fingerprints. Among the tested primers, 17 produced reasonable amplification products with high intensity and pattern stability. A total of 57 DNA fragments (loci) separated by electrophoresis on agarose gels were detected, ranging in size from 150 to 900 bp. Of these fragments, 55 (88%) were polymorphic and 2 (3.5%) monomorphic. Our results also revealed an average of 3.1 loci per primer. A minimum of 1 and a maximum of 10 DNA fragments were obtained with the (S-17, #820 and #841) and (S-31) primers, respectively. Therefore, the latter primer (S-31) is considered the most powerful among those tested. The genetic distance matrix showed distances ranging between 0.05 and 0.76. The maximum genetic distance of 0.76 (24% similarity) was exhibited between the (Shami and Marawi.Hamadani.Adi) as well as the (Bairuti and Marawi.Hamadani.Adi) genotypes. On the other hand, the lowest genetic distance of 0.05 (95% similarity) was exhibited between the (Jandali.Tawel.Mofarad and Jandali. Kurawi.Mlzlz) along with the (Shami.Aswad and Shami.mtartash. mlwn) genotypes. Furthermore, the UPGMA dendrogram generally groups the grape cultivars into eight major clusters in addition to an isolated genotype. Based on these figures, the cultivars tested in this study can be characterized by large divergence at the DNA level. This supports the assumption that our region has a very rich and varied clonal grape genetic structure.
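The distance and similarity values quoted in the abstract above are consistent with the usual complement relationship between a similarity coefficient s and a distance d. The abstract does not name the coefficient used, so the relation below is an assumption that happens to reproduce both reported pairs:

```latex
d = 1 - s, \qquad
d = 0.76 \;\Rightarrow\; s = 1 - 0.76 = 0.24, \qquad
d = 0.05 \;\Rightarrow\; s = 1 - 0.05 = 0.95
```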

Electrophoretic Variation of Seed Proteins in Black Locust (Robinia pseudo-acacia) (아까시나무(Robinia pseudo-acacia)종자 단백질의 전기 영동 변이)

  • 김창호;이호준;김용옥
    • The Korean Journal of Ecology
    • /
    • v.16 no.4
    • /
    • pp.515-526
    • /
    • 1993
  • In order to study the ecotypic variation of Robinia pseudo-acacia L. distributed in the southern area of the Korean peninsula, 15 local populations (Daejin, Sokcho, Kangneung, Mt. Surak, Hongcheon, Kwangneung, Namhansanseong, Chungju, Yesan, Andong, Jeonju, Dalseong, Changweon, Mokpo and Wando), located from 34°18'N to 38°36'N, were selected based on latitude and geographical distance. Seeds of these populations were collected, and the protein contents of the seeds and their band patterns were investigated. The seed proteins of all populations were electrophoresed on SDS-polyacrylamide gel. The total number of protein bands was 35, with molecular weights ranging from 17,258 daltons to 142,232 daltons. The number of seed protein bands was 23 in Dalseong and Hongcheon and 32 in Daejin and Sokcho, showing a tendency for the number of bands to increase with latitude. The local populations were classified into 3 local types based on protein analysis: the middle north east coastal type (Daejin, Sokcho, Kangneung), the central type (Mt. Surak, Hongcheon, Kwangneung, Namhansanseong, Chungju) and the southern type (Yesan, Andong, Jeonju, Dalseong, Changweon, Mokpo, Wando). According to the results of cluster analysis by UPGMA based on the similarity index (coefficient of Jaccard) of the patterns, the 3 local types were subdivided further into 6 types: the middle north east coastal type (Sokcho, Kangneung), the north central type I (Mt. Surak, Hongcheon), the north central type II (Namhansanseong, Chungju, Daejin), the north central type III (Kwangneung), the south central type (Yesan, Dalseong, Jeonju) and the southern type (Andong, Changweon, Mokpo, Dalseong, Wando). The No. 12 band of the separated seed proteins showed the highest color density in the preparations from all the populations. The No. 11~13 and No. 23~28 bands also showed high densities. As a whole, southern type populations (Changweon, Mokpo, Wando) showed high protein contents and high color density. Total protein contents of the seeds in each population varied from 9.68 mg/g (Mt. Surak) to 17.30 mg/g (Jeonju), showing an increasing trend toward lower latitudes.


A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for market information at the level of specific products. However, the information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec, and the product groups are then derived by extracting similar product names based on cosine similarity. Finally, the sales data on the extracted products is summed to estimate the market size of the product groups. As experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped into multidimensional vector space by Word2Vec training. We optimized the training parameters and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. The product names similar to KSIC indexes were extracted based on cosine similarity. The market size of the extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied for the first time to market size estimation, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing the cosine similarity threshold. Furthermore, it has high potential for practical application since it can resolve unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order to the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. Also, the method of product group clustering can be changed to other types of unsupervised machine learning algorithms. Our group is currently working on subsequent studies, and we expect that they can further improve the performance of the basic model conceptually proposed in this study.
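The grouping-and-summing step described in the abstract above can be sketched as follows, assuming product embeddings are already available as plain vectors. The function names, toy vectors, sales figures and thresholds are hypothetical; training the Word2Vec model itself is outside the scope of this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def estimate_market_size(index_vector, products, threshold):
    """Sum the sales of products whose embedding is close to an index-word vector.

    products: list of (name, embedding, sales) tuples.
    Returns (member_names, total_sales).
    """
    members = [
        (name, sales)
        for name, vector, sales in products
        if cosine_similarity(index_vector, vector) >= threshold
    ]
    return [name for name, _ in members], sum(sales for _, sales in members)


# Toy 2-dimensional embeddings and sales, for illustration only.
index_word = [1.0, 0.0]
catalog = [
    ("product_a", [1.0, 0.0], 100),
    ("product_b", [1.0, 1.0], 50),
    ("product_c", [0.0, 1.0], 70),
]

print(estimate_market_size(index_word, catalog, threshold=0.7))
print(estimate_market_size(index_word, catalog, threshold=0.9))
```

Raising the threshold narrows the product group and lowering it broadens the group, which is the adjustment of market category level that the abstract mentions.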