• Title/Summary/Keyword: jaccard

Search results: 88

Use of Microsatellite Markers to Identify Commercial Melon Cultivars and for Hybrid Seed Purity Testing (Microsatellite Marker를 이용한 멜론 시판품종의 품종식별과 F1 순도검정)

  • Kwon, Yong-Sham; Hong, Jee-Hwa
    • Horticultural Science & Technology / v.32 no.4 / pp.525-534 / 2014
  • Microsatellite markers were used to identify 58 major commercial melon cultivars and to assess the hybrid seed purity of a melon breeding line, '10H08'. A set of 412 microsatellite primer pairs was used to fingerprint the cultivars. Twenty-nine markers were hyper-variable, captured the genetic variation within varietal groups, and discriminated all cultivars on the basis of their marker genotypes. Cluster analysis with the UPGMA algorithm, based on Jaccard's distance coefficients, separated the cultivars into 2 major groups that were consistent with their morphological traits. To evaluate purity testing of $F_1$ hybrid seeds, DNA bulks of the female and male parents of breeding line '10H08' were tested with the 29 microsatellite primer pairs, 5 of which revealed polymorphism. One primer pair (CMGAN12) produced unambiguous polymorphic bands between the two parents. Among 192 seeds tested with CMGAN12, progeny likely produced by self-pollination of the female parent were clearly distinguished from hybrid progeny. These markers will be useful for fingerprinting melon cultivars and can help private seed companies improve melon seed purity.
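Jaccard's distance and UPGMA clustering, as used in the melon study above, can be sketched in plain Python. This is an illustrative sketch, not the authors' software: it assumes each cultivar is scored as a binary band-presence vector (1 = band present), and the cultivar names and profiles are hypothetical. UPGMA repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted mean of the old distances, which equals the mean pairwise distance between their members.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two binary band profiles (1 = band present)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Two profiles with no bands at all are treated as identical.
    return 0.0 if union == 0 else 1.0 - shared / union


def distance_matrix(profiles):
    """Square matrix of pairwise Jaccard distances between band profiles."""
    return [[jaccard_distance(p, q) for q in profiles] for p in profiles]


def upgma(names, dist):
    """UPGMA (average-linkage) clustering of a distance matrix.

    Returns a nested tuple tree: leaves are names, internal nodes are
    (left, right, merge_height).
    """
    n = len(names)
    # cluster id -> (subtree, number of cultivars in the cluster)
    clusters = {i: (names[i], 1) for i in range(n)}
    # Distances between current clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        height = d[(i, j)]
        # Size-weighted mean of the old distances = mean pairwise distance (UPGMA rule).
        merged = {
            k: (n_i * d[(min(i, k), max(i, k))] + n_j * d[(min(j, k), max(j, k))])
            / (n_i + n_j)
            for k in clusters
        }
        # Drop every distance that involves one of the merged clusters.
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k, v in merged.items():
            d[(k, next_id)] = v  # k < next_id, so keys stay ordered
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical band-presence profiles for four cultivars (not data from the study).
names = ["A", "B", "C", "D"]
profiles = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
]
tree = upgma(names, distance_matrix(profiles))
print(tree)
```

With these toy profiles, A and B merge first at distance 1/3, then C and D at the same height, and the two pairs join at 0.95, the mean of the four cross-pair distances. For real data sets, SciPy provides the same steps through `scipy.spatial.distance.pdist(X, metric="jaccard")` and `scipy.cluster.hierarchy.linkage(..., method="average")`.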

Construction of a Microsatellite Marker Database of Commercial Pepper Cultivars (유통 중인 고추 품종에 대한 Microsatellite 마커 Data Base 구축)

  • Kwon, Yong-Sham; Hong, Jee-Hwa; Choi, Keun-Jin
    • Horticultural Science & Technology / v.31 no.5 / pp.580-589 / 2013
  • This study evaluated the suitability of microsatellite markers for varietal identification and for analyzing the genetic relationships among 170 commercial pepper cultivars. The relationship between marker genotypes and 11 pepper cultivars with different morphological traits was also analyzed. Of the 302 microsatellite primer pairs screened against the 11 cultivars, 24 pairs were highly polymorphic in terms of the number of alleles. These markers were then used to construct a DNA profile database for the 170 commercial pepper cultivars. The 24 primer pairs produced a total of 164 polymorphic amplified fragments, and the average polymorphism information content (PIC) was 0.673 (range 0.324 to 0.824). The 164 microsatellite alleles were used to calculate Jaccard's distance coefficients, and the cultivars were clustered with the unweighted pair group method with arithmetic mean (UPGMA). Cluster analysis based on the microsatellite data separated the varieties into 3 major groups corresponding to their morphological traits, and the phenogram discriminated all varieties by their marker genotypes. These microsatellite markers will be useful for protecting plant breeders' intellectual property rights through variety identification in distinctness, uniformity and stability (DUS) testing.
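The abstract reports polymorphism information content (PIC) values without giving the formula. A minimal sketch of the two forms commonly used for microsatellites follows; which one the authors applied is not stated, and the allele calls below are hypothetical.

```python
from collections import Counter


def allele_frequencies(allele_calls):
    """Relative frequency of each allele observed at one microsatellite locus."""
    counts = Counter(allele_calls)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def pic_simple(freqs):
    """PIC in the simplified form 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs):
    """PIC after Botstein et al. (1980): 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    correction = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return pic_simple(freqs) - correction


# Hypothetical allele calls (fragment sizes in bp) at one locus across cultivars.
calls = [152, 152, 158, 160, 160, 160, 164, 152]
freqs = allele_frequencies(calls)
print(round(pic_simple(freqs), 3), round(pic_botstein(freqs), 3))
```

For a locus with two equally frequent alleles, the simplified form gives 0.5 and the Botstein form gives 0.375. The extra Botstein term removes cases in which both parents and the offspring share the same heterozygous genotype, so the parental origin of an allele cannot be traced.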

Use of Microsatellite Markers Derived from Genomic and Expressed Sequence Tag (EST) Data to Identify Commercial Watermelon Cultivars (수박 시판 품종의 식별을 위한 Genomic과 Expressed Sequence Tag (EST)에서 유래된 Microsatellite Marker의 이용)

  • Kwon, Yong-Sham; Hong, Jee-Hwa; Kim, Du-Hyun; Kim, Do-Hoon
    • Horticultural Science & Technology / v.33 no.5 / pp.737-750 / 2015
  • This study constructed a DNA profile database for 102 watermelon cultivars by comparing the polymorphism level and genetic relatedness revealed by genomic microsatellite (gMS) and expressed sequence tag (EST)-derived microsatellite (eMS) markers. Sixteen gMS and 10 eMS primers were hyper-variable and captured the genetic variation among the 102 cultivars. The gMS markers detected an average of 3.63 alleles per marker with a polymorphism information content (PIC) of 0.479, whereas the eMS markers detected an average of 2.50 alleles per marker with a PIC of 0.425, indicating that eMS markers reveal less polymorphism than gMS markers. Cluster analysis of Jaccard's genetic distance coefficients using the unweighted pair group method with arithmetic average (UPGMA), applied to the gMS, eMS, and combined data sets, grouped the 102 commercial watermelon cultivars into 6 to 8 major groups corresponding to phenotypic traits. This method was sufficient to identify 78 of the 102 cultivars. Mantel tests comparing the clusterings from the 3 data sets showed high correlation ($r \geq 0.80$). The microsatellite markers used in this study may therefore serve as a useful tool for germplasm evaluation, genetic purity assessment, and fingerprinting of watermelon cultivars.
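The Mantel test in the watermelon study compares two distance matrices. A minimal sketch, assuming the inputs are symmetric distance matrices over the same cultivars (for example, Jaccard matrices from the gMS and eMS data, or cophenetic matrices derived from the dendrograms; the abstract does not say which). The statistic is the Pearson correlation of the off-diagonal entries, and the p-value comes from randomly relabelling the objects of one matrix. The example matrices are hypothetical.

```python
import random
from itertools import combinations


def upper_triangle(matrix, order):
    """Off-diagonal entries of a distance matrix with rows/columns relabelled by `order`."""
    return [matrix[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """Simple Mantel test: Pearson r and a one-sided permutation p-value."""
    n = len(d1)
    identity = list(range(n))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel objects in d2 only; rows and columns move together
        if pearson(x, upper_triangle(d2, order)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical example: points on a line; d2 is a rescaled copy of d1.
pos = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
d1 = [[abs(a - b) for b in pos] for a in pos]
d2 = [[2 * v for v in row] for row in d1]
r, p = mantel(d1, d2)
print(r, p)
```

Because `d2` is `d1` scaled by 2, r is 1 and almost no random relabelling reaches it, so p is small. Dedicated packages such as scikit-bio (`skbio.stats.distance.mantel`) offer tested implementations with more options.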

Development of An Automatic Classification System for Game Reviews Based on Word Embedding and Vector Similarity (단어 임베딩 및 벡터 유사도 기반 게임 리뷰 자동 분류 시스템 개발)

  • Yang, Yu-Jeong; Lee, Bo-Hyun; Kim, Jin-Sil; Lee, Ki Yong
    • The Journal of Society for e-Business Studies / v.24 no.2 / pp.1-14 / 2019
  • Because of the nature of game software, it is important to quickly identify users' needs after a game's launch and reflect them in the software. However, most sites where users can download games and post reviews, such as the Google Play Store, provide only very limited and ambiguous classification categories for game reviews. In this paper, we therefore develop an automatic classification system that sorts game reviews into categories that are clearer and more useful for game providers. The system converts the words in each review into vectors using word2vec, a representative word embedding model, and classifies the review into the most relevant categories by measuring the similarity between those vectors and each category. In particular, because the choice of similarity measure directly affects classification performance, we compared three representative measures (Euclidean similarity, cosine similarity, and extended Jaccard similarity) in a real environment. To allow a review to be classified into multiple categories, we use a threshold-based multi-category classification method. Experiments on real reviews collected from the Google Play Store confirmed that the system achieves up to 95% accuracy.
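The three similarity measures compared in the game-review study, together with a threshold-based multi-category rule, can be sketched as follows. The sketch assumes that a review is represented by the mean of its word vectors and each category by a single vector, and it takes Euclidean similarity to be 1 / (1 + distance). The abstract does not specify these details, and the category names, vectors, and threshold are hypothetical.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine_similarity(x, y):
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def euclidean_similarity(x, y):
    # One common convention (an assumption here): 1 / (1 + Euclidean distance).
    return 1.0 / (1.0 + math.dist(x, y))


def extended_jaccard_similarity(x, y):
    # Extended Jaccard (Tanimoto): x.y / (|x|^2 + |y|^2 - x.y)
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)


def classify(word_vectors, category_vectors, similarity, threshold):
    """Return every category whose similarity to the mean word vector reaches the threshold."""
    n = len(word_vectors)
    review = [sum(col) / n for col in zip(*word_vectors)]
    return sorted(c for c, v in category_vectors.items() if similarity(review, v) >= threshold)


# Hypothetical 3-d "embeddings"; real word2vec vectors have many more dimensions.
categories = {"bug": [1.0, 0.0, 0.0], "price": [0.0, 1.0, 0.0], "graphics": [0.0, 0.0, 1.0]}
review_words = [[1.0, 0.2, 0.0], [0.8, 0.8, 0.0]]
for measure in (cosine_similarity, extended_jaccard_similarity, euclidean_similarity):
    print(measure.__name__, classify(review_words, categories, measure, 0.4))
```

With the same threshold of 0.4, cosine similarity assigns the toy review to two categories, extended Jaccard to one, and Euclidean similarity to all three. A threshold tuned for one measure therefore does not carry over to another.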

Research on the Evaluation and Utilization of Constitutional Diagnosis by Korean Doctors using AI-based Evaluation Tool (인공지능 기반 평가 도구를 이용한 한의사의 체질 진단 평가 및 활용 방안에 대한 연구)

  • Park, Musun; Hwang, Minwoo; Lee, Jeongyun; Kim, Chang-Eop; Kwon, Young-Kyu
    • Journal of Physiology & Pathology in Korean Medicine / v.36 no.2 / pp.73-78 / 2022
  • Because Traditional Korean medicine (TKM) doctors draw on various knowledge systems during treatment, diagnoses may differ from one TKM doctor to another. It is also difficult to explain all the reasons behind a diagnosis, because TKM doctors use both explicit and implicit knowledge. In this study, an upgraded random forest (RF)-based evaluation tool was proposed to extract the clinical knowledge of TKM doctors, and the tool was used to examine the extent to which a professor's clinical knowledge had been passed on to trainees. The data used to construct the tool were collected from 106 people who visited the Sasang Constitutional Department at Kyung Hee University Korean Medicine Hospital at Gangdong. To extract explicit knowledge, four TKM doctors were asked to rate the importance of symptoms as scores. To extract implicit knowledge, importance scores were obtained from an RF model trained on the patients' symptoms and the TKM doctors' constitution diagnoses. To assess the transfer of clinical knowledge, the similarity between the symptoms that professors and trainees consider important when determining constitution was calculated with the Jaccard coefficient. The proposed tool successfully evaluated the clinical knowledge of TKM doctors and confirmed that the professor's clinical knowledge had been passed on to the trainees. The tool can be used in various areas, such as providing feedback on treatment, educating TKM doctors in training, and developing AI for TKM.
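A minimal sketch of the Jaccard comparison above follows. It assumes that a doctor's "important symptoms" are the k symptoms with the highest importance scores. The abstract does not say how that set is chosen, and the symptom names and scores are hypothetical.

```python
def top_k(importance, k):
    """The k symptoms with the highest importance scores (ties broken alphabetically)."""
    ranked = sorted(importance, key=lambda s: (-importance[s], s))
    return set(ranked[:k])


def jaccard(a, b):
    """Size of the intersection divided by size of the union; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical importance scores (e.g. RF feature importances); not data from the study.
professor = {"symptom_a": 0.30, "symptom_b": 0.25, "symptom_c": 0.20, "symptom_d": 0.15, "symptom_e": 0.10}
trainee = {"symptom_a": 0.28, "symptom_c": 0.26, "symptom_e": 0.22, "symptom_b": 0.14, "symptom_d": 0.10}
print(jaccard(top_k(professor, 3), top_k(trainee, 3)))
```

Here the two top-3 sets share two of four distinct symptoms, which gives a Jaccard coefficient of 0.5.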

A Tracking Method of Same Drug Sales Accounts through Similarity Analysis of Instagram Profiles and Posts

  • Eun-Young Park; Jiyeon Kim; Chang-Hoon Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.2 / pp.109-118 / 2024
  • As the number of social media users worldwide grows, so do cases of social media being abused to commit crimes. In particular, drug distribution through social media is emerging as a serious social problem. Clever marketing on social media channels stimulates teenagers' curiosity about drugs, and because drug sellers and consumers can easily reach one another, social media makes drug purchases easy. Among social media platforms, we focused on Instagram, the platform most used by young adults aged 19 to 24 in South Korea. We collected four types of information from Instagram accounts (profile photos, introductions, image posts, and text posts) and analyzed the similarity among accounts for each type. Profile photos and image posts were compared using SSIM (Structural Similarity Index Measure), while introductions and text posts were compared using Jaccard and cosine similarity. Accounts whose similarity exceeded the significance level were judged to be the same drug sales account. Logistic regression analysis on these information types confirmed that profile photos, introductions, and text posts, but not image posts, were valid information for tracking the same drug sales account.
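Jaccard and cosine similarity of two text fields, such as account introductions, can be sketched as below. The tokenizer (lowercased runs of word characters) is an assumption; a real pipeline for Korean text might use a morphological analyzer instead. Python's `\w` matches Hangul, so Korean text is at least split on spaces and punctuation. Jaccard compares the sets of distinct tokens, while cosine compares term-frequency vectors. The sample strings are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased runs of word characters (Unicode-aware, so Hangul is kept)."""
    return re.findall(r"\w+", text.lower())


def jaccard_text(a, b):
    """Jaccard similarity of the sets of distinct tokens."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def cosine_text(a, b):
    """Cosine similarity of term-frequency vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count * cb[t] for t, count in ca.items())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


# Hypothetical introductions from two accounts.
intro_1 = "Fast shipping, message me"
intro_2 = "message me for fast shipping"
print(jaccard_text(intro_1, intro_2), cosine_text(intro_1, intro_2))
```

For the two sample strings, Jaccard is 4/5 = 0.8 and cosine is 4 / sqrt(20) ≈ 0.894. A pair of accounts would then be flagged when a score exceeds a chosen cutoff.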

Application of diversity of recommender system according to user preference change (사용자 선호도 변화에 따른 추천시스템의 다양성 적용)

  • Na, Hyeyeon; Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.67-86 / 2020
  • Recommender systems have an increasingly large influence on users and businesses, and the importance of e-commerce has grown rapidly during the worldwide COVID-19 pandemic. Recommender systems are at the center of e-commerce: managers at leading e-commerce companies have stated that recommender systems strongly influence customer purchases, with about 50% of Netflix and Amazon sales coming from their recommender systems. Most algorithms have focused on improving the accuracy of recommender systems without regard to novelty, diversity, serendipity, and similar values. Recommender systems optimized only for accuracy cannot sustain long-term business profit because they cause sales polarization. In addition, because customer preferences change constantly, accuracy-focused recommender systems alone do not make shopping enjoyable for customers. Recommender systems that pursue a range of values therefore need to be developed to achieve high user satisfaction. Reranking is the most useful methodology for achieving diversity in recommender systems. In this paper, diversity is introduced by assigning high similarity to users with different preferences, based on the categories of the items each user has purchased. This differs from previous approaches, which modify the recommendation algorithm without considering each user's diversity preference level. We sought to identify each user's diversity preference level and observed how the effect differed according to that level. In addition, a graph-based recommender system, rather than collaborative filtering, was used to express diversity through the user network. We used the Amazon Grocery and Gourmet Food data because low-involvement products, such as habitually purchased products, foods, and low-priced goods, are highly likely to reveal customers' diversity. First, a bipartite graph containing both users and items is constructed to build the graph-based recommender system. Separate unipartite graphs for users and for items must also be constructed to express diversity. The edge weights of each unipartite graph, computed from the Jaccard distance of item categories, play a crucial role. Two important results emerge from the user unipartite network: first, each user's diversity preference level can be observed from the network; second, dissimilar users can be discovered in it. As a result, the recommender system achieves high diversity with only a small loss of accuracy, and accuracy can be optimized further by controlling the diversity ratio. This paper makes three theoretical contributions. First, it extends recommender system research toward user satisfaction across a range of values. Second, it develops a new graph-based recommender system. Third, it proposes an evaluation indicator for diversity. In practice, recommender systems contribute to corporate profit, and this paper is closely relevant to business. Above all, a recommender system with diversity can improve long-term business profit and can provide appropriate service according to each user's diversity level. Finally, the results suggest that companies selling low-involvement products stand to benefit the most.
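One possible reading of the user unipartite graph described above is sketched below. Each user is reduced to the set of categories of the items they purchased, and the edge weight between two users is the Jaccard distance between those sets, so large weights mark users with different preferences. This weighting is an assumption, because the abstract does not give the exact formula. The user IDs, items, and categories are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - (shared categories / all categories); two empty sets count as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def user_graph(purchases, item_category):
    """User-user edges weighted by the Jaccard distance of purchased-item categories."""
    cats = {u: {item_category[i] for i in items} for u, items in purchases.items()}
    return {
        (u, v): jaccard_distance(cats[u], cats[v])
        for u, v in combinations(sorted(cats), 2)
    }


def most_dissimilar(graph, user):
    """Neighbour joined to `user` by the heaviest edge (most different categories)."""
    edges = [(w, v if u == user else u) for (u, v), w in graph.items() if user in (u, v)]
    return max(edges)[1]


# Hypothetical purchase history (projection of the user-item bipartite graph).
purchases = {"u1": ["i1", "i2"], "u2": ["i2", "i3"], "u3": ["i4"]}
item_category = {"i1": "tea", "i2": "coffee", "i3": "coffee", "i4": "snacks"}
graph = user_graph(purchases, item_category)
print(graph)
print(most_dissimilar(graph, "u1"))
```

In the toy data, u1 and u2 share one of two categories (distance 0.5), while u3 shares none with either (distance 1.0), so u3 is returned as u1's most dissimilar neighbour.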

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim; Kim, Ji Hui; Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been used in various industries for practical applications. In business intelligence, they have been employed to discover new market and technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies, and there has been continuous demand in various fields for market information at the level of specific products. However, this information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific, appropriate information. We therefore propose a new methodology that estimates the market sizes of product groups at more detailed levels than were previously available. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to estimate market size automatically and bottom-up from individual companies' product information. The overall process is as follows. First, data related to product information are collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data are embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales of the extracted products are summed to estimate the market size of each product group. As experimental data, the product-name text from Statistics Korea's microdata (345,103 cases) was mapped into a multidimensional vector space by Word2Vec training. We optimized the training parameters and used a vector dimension of 300 and a window size of 15 in subsequent experiments. We used the index words of the Korean Standard Industry Classification (KSIC) as a product-name dataset to cluster product groups more efficiently. Product names similar to the KSIC index words were extracted based on cosine similarity, and the market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The proposed model automatically estimated the market sizes of 11,654 specific product lines. For performance verification, the results were compared with the actual market sizes of some items, yielding a Pearson correlation coefficient of 0.513. Our approach differs from previous studies in several ways. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or on multiple assumptions. In addition, the level of the market category can be adjusted easily and efficiently to suit the purpose of the information by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application because it can meet the unmet need for detailed market size information in the public and private sectors. Specifically, it can be used in technology evaluation and technology commercialization support programs run by government institutions, as well as in business strategy consulting and market analysis reports published by private firms. A limitation of our study is that the model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be improved by ordering the preprocessed dataset appropriately or by combining Word2Vec with another algorithm such as Jaccard similarity. The product group clustering method could also be replaced with other types of unsupervised machine learning algorithms. Our group is currently working on follow-up studies, which we expect will further improve the performance of the basic model proposed conceptually in this study.
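The grouping step of the market-size method collects the product names whose embeddings lie within a cosine-similarity threshold of a KSIC index word, then sums their sales. It can be sketched as below. The sketch assumes the Word2Vec vectors are already trained (in gensim 4, the reported settings would correspond to `vector_size=300, window=15`). The 2-d vectors, product names, and sales figures are hypothetical.

```python
import math


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def estimate_market_size(index_vector, product_vectors, sales, threshold):
    """Group products similar to a KSIC index word and sum their sales.

    Returns (sorted product names in the group, total sales of the group).
    """
    group = sorted(
        name for name, vec in product_vectors.items()
        if cosine_similarity(index_vector, vec) >= threshold
    )
    return group, sum(sales[name] for name in group)


# Hypothetical 2-d embeddings and sales figures (not data from the study).
index_word = [1.0, 0.0]
products = {"p1": [0.9, 0.1], "p2": [0.7, 0.7], "p3": [0.0, 1.0]}
sales = {"p1": 100, "p2": 50, "p3": 30}
print(estimate_market_size(index_word, products, sales, 0.9))
print(estimate_market_size(index_word, products, sales, 0.7))
```

Lowering the threshold from 0.9 to 0.7 adds p2 to the group and raises the estimate from 100 to 150. This shows how the threshold controls the granularity of the market category.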