• Title/Summary/Keyword: Jaccard index

Analysis of Memory Pool Jaccard Similarity between Bitcoin and Ethereum in the Same Environment (동일한 환경에서 구성된 비트코인과 이더리움의 메모리 풀 자카드 유사도 분석)

  • Maeng, SooHoon;Shin, Hye-yeong;Kim, Daeyong;Ju, Hongtaek
    • KNOM Review / v.22 no.3 / pp.20-24 / 2019
  • Blockchain is a distributed-ledger technology in which all nodes participating in the blockchain network are connected through a P2P network. When a transaction is created in the blockchain network, it is propagated and validated by the blockchain nodes. Each node sends a verified transaction to its connected peers over the P2P network, and the peers keep the transaction in their memory pools. Because of the nature of P2P networks, the number and types of transactions that reach each node differ from node to node, so the nodes do not all hold the same memory pool. Research is needed to address problems such as attack detection. In this paper, we analyze the transactions in memory pools as a preliminary step toward solving problems such as transaction fee manipulation, double spending, and DDoS attack detection. Accordingly, this study collects the transactions stored in the memory pool of each node of Bitcoin and Ethereum, two cryptocurrency systems based on blockchain technology, and uses Jaccard similarity to analyze how many transactions the pools have in common.
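
The Jaccard similarity of two memory pools is the number of shared transactions divided by the number of distinct transactions in either pool, $J(A,B) = |A \cap B| / |A \cup B|$. The abstract does not describe how the snapshots were taken, so the sketch below assumes each node's memory pool has already been reduced to a set of transaction IDs captured at the same moment; the node names and transaction IDs are hypothetical, and treating two empty pools as identical (score 1.0) is an illustrative convention, not something the paper specifies.

```python
def jaccard_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; assumed to be 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def pairwise_mempool_similarity(mempools: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Compute the Jaccard index for every unordered pair of nodes."""
    names = sorted(mempools)
    result = {}
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            result[(x, y)] = jaccard_similarity(mempools[x], mempools[y])
    return result


# Hypothetical snapshots: each node's memory pool as a set of transaction IDs.
snapshots = {
    "node_a": {"tx1", "tx2", "tx3", "tx4"},
    "node_b": {"tx2", "tx3", "tx4", "tx5"},
    "node_c": {"tx1", "tx6"},
}

for (x, y), score in pairwise_mempool_similarity(snapshots).items():
    print(f"{x} vs {y}: {score:.3f}")
```

Here node_a and node_b share three of five distinct transactions (0.6), while node_b and node_c share none (0.0); a low score flags nodes whose pools have diverged.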

A Tracking Method of Same Drug Sales Accounts through Similarity Analysis of Instagram Profiles and Posts

  • Eun-Young Park;Jiyeon Kim;Chang-Hoon Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.2 / pp.109-118 / 2024
  • As the number of social media users grows worldwide, cases of social media being abused to commit various crimes are also increasing. In particular, drug distribution through social media is emerging as a serious social problem. On social media channels, clever marketing stimulates teenagers' curiosity about drugs, and the easy access between drug sellers and consumers makes drug purchases simple. Among the various social media platforms, we focused on Instagram, the platform most used by young adults aged 19 to 24 in South Korea. We collected four types of information from Instagram: profile photos, introductions, image posts, and text posts, and then analyzed the similarity within each type of collected information. Profile photos and image posts were compared using SSIM (Structural Similarity Index Measure), while introductions and text posts were compared using Jaccard and cosine similarity. Through this analysis, the similarity among accounts was measured for each information type, and accounts whose similarity exceeded the significance level were judged to be the same drug sales account. Logistic regression analysis on these information types confirmed that profile photos, introductions, and text posts, but not image posts, were valid information for tracking the same drug sales account.
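
The abstract names the similarity measures but not the preprocessing, so the following is only a sketch under stated assumptions: texts are split into lowercase word tokens with a simple regular expression (a stand-in for proper Korean morphological analysis), Jaccard similarity is taken over token sets, and cosine similarity over term-frequency vectors. For images it uses a hypothetical `global_ssim` helper that evaluates the SSIM formula once over the whole image with the usual constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$; standard SSIM averages this statistic over local windows, so this global version is a simplification. All sample texts and pixel values are invented.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; a simplistic stand-in for real preprocessing."""
    return re.findall(r"\w+", text.lower())


def jaccard_text(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' token sets."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def cosine_text(a: str, b: str) -> float:
    """Cosine similarity of the two texts' term-frequency (bag-of-words) vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def global_ssim(x: list[float], y: list[float], data_range: float = 255.0) -> float:
    """Single-window SSIM of two equally sized grayscale images (flattened pixels)."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) ** 2 for p in x) / n
    vy = sum((q - my) ** 2 for q in y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))


# Hypothetical introductions of two accounts.
bio_a = "fast reply via DM, daily updates"
bio_b = "Daily updates. Fast reply!"
print(f"Jaccard: {jaccard_text(bio_a, bio_b):.3f}, cosine: {cosine_text(bio_a, bio_b):.3f}")

# Hypothetical 2x2 grayscale images, flattened row by row.
photo_a = [10.0, 50.0, 200.0, 90.0]
photo_b = [12.0, 48.0, 198.0, 95.0]
print(f"SSIM: {global_ssim(photo_a, photo_b):.3f}")
```

Jaccard ignores how often a word repeats, whereas cosine similarity weights repeated words, which is one reason to compute both on the same texts. Comparing each score against a chosen cut-off then decides whether two accounts are treated as the same seller.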

Electrophoretic Variation in Seed Proteins of Robinia pseudo-acacia (아까시나무(Robinia pseudo-acacia)종자 단백질의 전기 영동 변이)

  • 김창호;이호준;김용옥
    • The Korean Journal of Ecology / v.16 no.4 / pp.515-526 / 1993
  • To study the ecotypic variation of Robinia pseudo-acacia L. distributed in the southern part of the Korean peninsula, 15 local populations (Daejin, Sokcho, Kangneung, Mt. Surak, Hongcheon, Kwangneung, Namhansanseong, Chungju, Yesan, Andong, Jeonju, Dalseong, Changweon, Mokpo, and Wando) located between $34^{\circ}18'$N and $38^{\circ}36'$N were selected on the basis of latitude and geographical distance. Seeds were collected from these populations, and the protein content of the seeds and their band patterns were investigated. The seed proteins of all populations were electrophoresed on SDS-polyacrylamide gel. A total of 35 protein bands were detected, with molecular weights ranging from 17,258 to 142,232 daltons. The number of seed-protein bands was 23 in Dalseong and Hongcheon and 32 in Daejin and Sokcho, showing a tendency for the number of bands to increase with latitude. Based on the protein analysis, the local populations were classified into 3 local types: the middle north east coastal type (Daejin, Sokcho, Kangneung), the central type (Mt. Surak, Hongcheon, Kwangneung, Namhansanseong, Chungju), and the southern type (Yesan, Andong, Jeonju, Dalseong, Changweon, Mokpo, Wando). Cluster analysis by UPGMA based on the similarity index (Jaccard coefficient) of the band patterns further subdivided the 3 local types into 6 types: the middle north east coastal type (Sokcho, Kangneung), north central type I (Mt. Surak, Hongcheon), north central type II (Namhansanseong, Chungju, Daejin), north central type III (Kwangneung), the south central type (Yesan, Dalseong, Jeonju), and the southern type (Andong, Changweon, Mokpo, Dalseong, Wando). Band No. 12 of the separated seed proteins showed the highest color density in the preparations from all populations, and bands No. 11–13 and No. 23–28 also showed high densities.
As a whole, the southern type populations (Changweon, Mokpo, Wando) showed high protein contents and high color density. Total seed protein content varied among populations from 9.68 mg/g (Mt. Surak) to 17.30 mg/g (Jeonju), showing an increasing trend toward lower latitudes.
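
The clustering step above, UPGMA on the Jaccard coefficient of band patterns, can be sketched as follows, assuming each population's gel is scored as a binary presence/absence vector over the same bands. For binary data the Jaccard coefficient is $a/(a+b+c)$, where $a$ counts bands present in both samples and $b$, $c$ count bands present in only one, so bands absent from both are ignored. Converting similarity to a distance as $1 - J$ is an assumption made here for illustration; the population labels and band scores are hypothetical, not the study's data.

```python
from itertools import combinations


def jaccard_coefficient(p: list[int], q: list[int]) -> float:
    """Jaccard coefficient a / (a + b + c) for two binary band-presence vectors."""
    if len(p) != len(q):
        raise ValueError("band vectors must have the same length")
    shared = sum(1 for x, y in zip(p, q) if x and y)           # a: present in both
    unique = sum(1 for x, y in zip(p, q) if bool(x) != bool(y))  # b + c: present in one
    total = shared + unique
    return shared / total if total else 1.0


def upgma(profiles: dict[str, list[int]]) -> list[tuple[tuple[str, ...], tuple[str, ...], float]]:
    """Cluster populations by UPGMA on the distance 1 - Jaccard.

    Returns the merge history as (left cluster, right cluster, similarity at merge).
    """
    sizes = {(name,): 1 for name in profiles}  # cluster (tuple of members) -> member count
    dist = {
        frozenset((x, y)): 1.0 - jaccard_coefficient(profiles[x[0]], profiles[y[0]])
        for x, y in combinations(sizes, 2)
    }
    merges = []
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest pair; ties go to the earliest-inserted pair
        x, y = sorted(pair)
        d_xy = dist.pop(pair)
        size_x, size_y = sizes.pop(x), sizes.pop(y)
        merged = x + y
        # UPGMA update: the size-weighted mean of the distances to the two parts,
        # which equals the average distance over all member pairs.
        for other in list(sizes):
            d_x = dist.pop(frozenset((x, other)))
            d_y = dist.pop(frozenset((y, other)))
            dist[frozenset((merged, other))] = (size_x * d_x + size_y * d_y) / (size_x + size_y)
        sizes[merged] = size_x + size_y
        merges.append((x, y, 1.0 - d_xy))
    return merges


# Hypothetical presence (1) / absence (0) scores for six bands in four populations.
profiles = {
    "pop_A": [1, 1, 1, 0, 1, 0],
    "pop_B": [1, 1, 1, 0, 0, 0],
    "pop_C": [0, 1, 0, 1, 1, 1],
    "pop_D": [0, 1, 0, 1, 0, 1],
}

for left, right, sim in upgma(profiles):
    print(f"{'+'.join(left)}  <->  {'+'.join(right)}  at Jaccard similarity {sim:.3f}")
```

The merge history is what a dendrogram draws: pop_A joins pop_B and pop_C joins pop_D at similarity 0.75 before the two groups meet at a much lower level, which is how groupings like the six types above emerge from the band data.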

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been put to practical use in various industries. In the field of business intelligence, they have been employed to discover new market or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies, and there has been continuous demand in various fields for market information at the specific product level. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and appropriate information. We therefore propose a new methodology that estimates the market sizes of product groups at a more detailed level than previously available. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to enable automatic, bottom-up market size estimation from individual companies' product information. The overall process is as follows. First, data related to product information is collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data of the extracted products are summed to estimate the market size of each product group. As experimental data, the product-name text from Statistics Korea's microdata (345,103 cases) was mapped into a multidimensional vector space through Word2Vec training.
We optimized the training parameters and then used a vector dimension of 300 and a window size of 15 in further experiments. To cluster product groups more efficiently, we used the index words of the Korean Standard Industry Classification (KSIC) as the product name dataset, and product names similar to the KSIC index words were extracted based on cosine similarity. The market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The proposed model automatically estimated the market sizes of 11,654 specific product lines. For performance verification, the results were compared with the actual market sizes of some items, yielding a Pearson correlation coefficient of 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of the market category can be adjusted easily and efficiently to suit the intended use of the information by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application because it can meet the unmet need for detailed market size information in the public and private sectors. Specifically, it can be used in technology evaluation and technology commercialization support programs run by governmental institutions, as well as in business strategy consulting and market analysis report publishing by private firms. The limitation of our study is that the model's accuracy and reliability need to be improved. The semantic word embedding module could be improved by ordering the preprocessed dataset appropriately or by combining Word2Vec with another algorithm such as Jaccard similarity.
Also, the product group clustering method could be replaced with other unsupervised machine learning algorithms. Our group is currently working on follow-up studies, and we expect them to further improve the performance of the basic model conceptually proposed in this study.
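
The grouping and summation steps described above can be sketched as follows, assuming the word vectors have already been produced by a trained Word2Vec model (training itself is not shown). A product joins an index word's group when the cosine similarity of their vectors meets a threshold, and the group's sales are summed. The hypothetical `pearson` helper mirrors the verification step against actual market sizes. The 3-dimensional vectors, product names, sales figures, and thresholds are all invented for illustration; the study itself used 300-dimensional vectors.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def estimate_market_size(
    index_vec: list[float],
    product_vecs: dict[str, list[float]],
    sales: dict[str, float],
    threshold: float,
) -> tuple[list[str], float]:
    """Group products whose similarity to the index word meets `threshold`, then sum their sales."""
    group = [name for name, vec in product_vecs.items() if cosine_similarity(index_vec, vec) >= threshold]
    return group, sum(sales.get(name, 0.0) for name in group)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between estimated and actual market sizes."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Hypothetical vectors and sales figures.
index_vec = [1.0, 0.0, 0.0]  # stands in for a KSIC index word such as "bicycle"
product_vecs = {
    "road bike": [0.9, 0.1, 0.0],
    "electric bicycle": [0.8, 0.3, 0.1],
    "bike helmet": [0.5, 0.8, 0.0],
    "rice cooker": [0.0, 0.1, 0.9],
}
sales = {"road bike": 120.0, "electric bicycle": 80.0, "bike helmet": 30.0, "rice cooker": 500.0}

# A stricter threshold gives a narrower product group; a looser one widens it.
for threshold in (0.9, 0.5):
    group, total = estimate_market_size(index_vec, product_vecs, sales, threshold)
    print(threshold, sorted(group), total)
```

Lowering the threshold from 0.9 to 0.5 pulls "bike helmet" into the group and raises the estimate, which illustrates how the threshold controls the granularity of the market category.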