• Title/Summary/Keyword: Large tag data


Discovering News Keyword Associations Using Association Rule Mining (연관규칙 마이닝을 활용한 뉴스기사 키워드의 연관성 탐사)

  • Kim, Han-Joon;Chang, Jae-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.11 no.6 / pp.63-71 / 2011
  • Current Web portal sites surface popular or important keywords through user-friendly services such as tag clouds and associated-word search. However, because news articles are generally classified only by date and category, it is not easy for users reading articles in one category to find other related articles. Conventional associated-keyword services have also failed to satisfy users because they depend only on user queries. This paper proposes a way of searching news articles using keywords tightly associated with users' queries. The proposed method discovers a set of keyword association patterns with association rule mining, extracting the patterns from sentences that contain the keywords. The method enables users to navigate the space of associated keywords hidden in large collections of news articles.
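
The sentence-level mining step described in this abstract can be illustrated with a minimal sketch. It assumes each sentence has already been reduced to a list of keywords and mines only one-to-one rules. The function name `keyword_rules`, the thresholds, and the toy sentences are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def keyword_rules(sentences, min_support=0.4, min_confidence=0.6):
    """Mine one-to-one keyword rules (lhs -> rhs) from keyword lists.

    Each sentence is treated as one transaction (a set of keywords).
    support(a, b)     = fraction of sentences containing both a and b
    confidence(a->b)  = count(a, b) / count(a)
    """
    n = len(sentences)
    item_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        keywords = sorted(set(sentence))  # deduplicate, fix pair order
        item_counts.update(keywords)
        pair_counts.update(combinations(keywords, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical toy data: sentences already reduced to keywords.
sentences = [
    ["election", "candidate", "poll"],
    ["election", "candidate"],
    ["election", "poll"],
    ["storm", "flood"],
    ["election", "candidate", "debate"],
]
rules = keyword_rules(sentences)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Given a query keyword, a search service could then suggest the right-hand sides of the rules whose left-hand side matches the query. A full miner such as Apriori would also handle rules with more than two keywords.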

LSTM Model Design to Improve the Association of Keywords and Documents for Healthcare Services (의료서비스를 위한 키워드와 문서의 연관성 향상을 위한 LSTM모델 설계)

  • Kim, June-gyeom;Seo, Jin-beom;Cho, Young-bok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.75-77 / 2021
  • A variety of search engines are currently in use. A search engine retrieves the data users need in three stages: crawling, index generation, and output of search results based on meta-tag information. However, keyword searches often return either a large number of unrelated documents or too few documents. As a result, it takes time and effort to grasp the content of the search results and judge their accuracy. Search engine indexes are updated periodically, but the weighting criteria and update periods differ from one search engine to another. Therefore, in place of an existing search engine, this paper uses an LSTM model that extracts the relationship between documents and the keywords the user enters, improving the association between keywords and documents.


Prediction of Wave Breaking Using Machine Learning Open Source Platform (머신러닝 오픈소스 플랫폼을 활용한 쇄파 예측)

  • Lee, Kwang-Ho;Kim, Tag-Gyeom;Kim, Do-Sam
    • Journal of Korean Society of Coastal and Ocean Engineers / v.32 no.4 / pp.262-272 / 2020
  • A large number of studies on wave breaking have been carried out, and many experimental data have been documented. On the basis of these experimental data sets, many empirical or semi-empirical formulas, based primarily on regression analysis, have been proposed to quantitatively estimate wave breaking for engineering applications. However, wave breaking has an inherent variability, which implies that a linear statistical approach such as linear regression analysis might be inadequate. This study presents an alternative nonlinear method that uses a neural network, one of the machine learning methods, to estimate breaking wave height and breaking depth. The neural network is modeled using TensorFlow, an open-source machine learning platform distributed by Google. The network is trained on randomly selected samples of the collected experimental data and evaluated on data not used in the learning process. The breaking wave height and depth predicted by the fully trained neural network are more accurate than those obtained from existing empirical formulas. These results show that a neural network is a useful tool for the prediction of wave breaking.

Efficient Indirect Branch Predictor Based on Data Dependence (효율적인 데이터 종속 기반의 간접 분기 예측기)

  • Paik Kyoung-Ho;Kim Eun-Sung
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.4 s.310 / pp.1-14 / 2006
  • The indirect branch instruction is one of the most substantial obstacles to exploiting the ILP of modern high-performance processors. The target address of an indirect branch is polymorphic and varies dynamically, so it is very difficult to predict accurately. The performance of a speculative processor is therefore reduced significantly by the many execution cycles lost when a misprediction occurs. We proposed a novel and very accurate indirect branch prediction scheme called data-dependence-based prediction. The predictor achieves a prediction accuracy of 98.92% using 1K entries and 99.95% using 8K. However, all proposed indirect predictors, including ours, carry a large hardware overhead for storing expected target addresses as well as the tags that alleviate aliasing. Hence, we propose a scheme that minimizes the hardware overhead without sacrificing prediction accuracy. Our experimental results show that the tag overhead is reduced by about 60% without performance loss, and by about 80% at a performance loss of only 0.1%. Likewise, the overhead of storing target addresses is reduced by about 35% without performance loss, and by about 45% at a performance loss of only 1.11%.
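
To make the tag and target-address overhead concrete, here is a minimal sketch of a generic tagged, direct-mapped target cache. It is not the data-dependence-based predictor from the paper, whose details the abstract does not give. The class name, the bit widths, and the example addresses are all assumptions chosen for illustration.

```python
class TaggedTargetCache:
    """Generic direct-mapped table of predicted indirect-branch targets.

    Illustrative only: indexing by branch PC and the bit widths below
    are assumptions, not the scheme proposed in the paper.
    """

    def __init__(self, index_bits=10, tag_bits=8, target_bits=32):
        self.index_bits = index_bits
        self.tag_bits = tag_bits
        self.target_bits = target_bits
        self.entries = [None] * (1 << index_bits)  # each entry: (tag, target)

    def _split(self, pc):
        index = pc & ((1 << self.index_bits) - 1)
        tag = (pc >> self.index_bits) & ((1 << self.tag_bits) - 1)
        return index, tag

    def predict(self, pc):
        """Return the stored target, or None on an empty slot or tag mismatch."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, pc, target):
        """Record the resolved target for the branch at pc."""
        index, tag = self._split(pc)
        self.entries[index] = (tag, target)

    def storage_bits(self):
        """Total storage: every entry holds a tag plus a target address."""
        return len(self.entries) * (self.tag_bits + self.target_bits)


# Two branches whose PCs map to the same index (hypothetical addresses).
tagged = TaggedTargetCache(index_bits=10, tag_bits=8)
untagged = TaggedTargetCache(index_bits=10, tag_bits=0)
for cache in (tagged, untagged):
    cache.update(0x1000, 0xAAAA)

print(tagged.predict(0x2000))    # None: the tag detects the alias
print(untagged.predict(0x2000))  # 43690 (0xAAAA): aliased, wrong branch's target
print(tagged.storage_bits(), untagged.storage_bits())  # 40960 32768
```

In this configuration the tags account for 8 of every 40 stored bits. That cost is what tag reduction trades against protection from aliasing.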

Copy Number Deletion Has Little Impact on Gene Expression Levels in Racehorses

  • Park, Kyung-Do;Kim, Hyeongmin;Hwang, Jae Yeon;Lee, Chang-Kyu;Do, Kyoung-Tag;Kim, Heui-Soo;Yang, Young-Mok;Kwon, Young-Jun;Kim, Jaemin;Kim, Hyeon Jeong;Song, Ki-Duk;Oh, Jae-Don;Kim, Heebal;Cho, Byung-Wook;Cho, Seoae;Lee, Hak-Kyo
    • Asian-Australasian Journal of Animal Sciences / v.27 no.9 / pp.1345-1354 / 2014
  • Copy number variations (CNVs), important genetic factors in the study of human diseases, may have as large an effect on phenotype as single nucleotide polymorphisms do. Indeed, it is widely accepted that CNVs are associated with differential disease susceptibility. However, the relationships between CNVs and gene expression have not been characterized in the horse. In this study, we investigated the effects of copy number deletion in the blood and muscle transcriptomes of Thoroughbred racing horses. We identified a total of 1,246 CNVs of deletion polymorphisms using DNA re-sequencing data from 18 Thoroughbred racing horses. To discover tendencies between CNV status and gene expression levels, we extracted the CNVs of four Thoroughbred racing horses for which RNA sequencing data were available. We found that 252 pairs of CNVs and genes were associated in the four horse samples. We did not observe a clear and consistent relationship between the deletion status of CNVs and gene expression levels before and after exercise in blood and muscle. However, we found some pairs of CNVs and associated genes that indicated relationships with gene expression levels: a positive relationship with genes responsible for membrane structure or cytoskeleton and a negative relationship with genes involved in disease. This study will lead to conceptual advances in understanding the relationship between CNVs and global gene expression in the horse.

A Folksonomy Ranking Framework: A Semantic Graph-based Approach (폭소노미 사이트를 위한 랭킹 프레임워크 설계: 시맨틱 그래프기반 접근)

  • Park, Hyun-Jung;Rho, Sang-Kyu
    • Asia Pacific Journal of Information Systems / v.21 no.2 / pp.89-116 / 2011
  • In collaborative tagging systems such as Delicious.com and Flickr.com, users assign keywords or tags to their uploaded resources, such as bookmarks and pictures, for future use or sharing. The collection of resources and tags generated by a user is called a personomy, and the collection of all personomies constitutes the folksonomy. The most significant need of folksonomy users is to efficiently find useful resources or experts on specific topics. An excellent ranking algorithm would assign higher rankings to more useful resources or experts. What resources are considered useful in a folksonomic system? Does a standard superior to frequency or freshness exist? A resource recommended by more users with more expertise should be worthy of attention. This ranking paradigm can be implemented through a graph-based ranking algorithm. Two well-known representatives of such a paradigm are PageRank by Google and HITS (Hypertext Induced Topic Selection) by Kleinberg. Both PageRank and HITS assign a higher evaluation score to pages linked to by more higher-scored pages. HITS differs from PageRank in that it uses two kinds of scores: authority and hub scores. The ranking objects of these algorithms are limited to Web pages, whereas the ranking objects of a folksonomic system are somewhat heterogeneous (i.e., users, resources, and tags). Therefore, uniform application of the link-based voting notion of PageRank and HITS to a folksonomy would be unreasonable. In a folksonomic system, each link corresponding to a property can have the opposite direction, depending on whether the property is expressed in the active or the passive voice. The current research stems from the idea that a graph-based ranking algorithm could be applied to the folksonomic system using the concept of mutual interactions between entities, rather than the voting notion of PageRank or HITS.
The concept of mutual interactions, proposed for ranking Semantic Web resources, enables the calculation of importance scores for various resources unaffected by link directions. The weights of a property representing the mutual interaction between classes are assigned according to the relative significance of the property to the resource importance of each class. This class-oriented approach is based on the fact that the Semantic Web contains many heterogeneous classes, so applying a different appraisal standard to each class is more reasonable. This is similar to human evaluation, where different items are assigned specific weights that are then combined into a weighted average. We can check for missing properties more easily with this approach than with predicate-oriented approaches. A user of a tagging system usually assigns more than one tag to the same resource, and there can be more than one tag with the same subject and object. When many users assign similar tags to the same resource, it becomes necessary to grade the users differently depending on the assignment order. This idea comes from studies in psychology in which expertise involves the ability to select the most relevant information for achieving a goal. An expert should be someone who not only has a large collection of documents annotated with a particular tag, but also tends to add documents of high quality to his/her collections. Such documents are identified by the number, as well as the expertise, of users who have the same documents in their collections. In other words, there is a relationship of mutual reinforcement between the expertise of a user and the quality of a document. In addition, there is a need to rank entities related more closely to a certain entity. Because the popularity of a topic on social media is temporary, recent data should carry more weight than old data.
We propose a comprehensive folksonomy ranking framework that addresses all these considerations and can be easily customized to each folksonomy site for ranking purposes. To examine the validity of our ranking algorithm and show the mechanism of adjusting property, time, and expertise weights, we first use a dataset designed for analyzing the effect of each ranking factor independently. We then show the ranking results of a real folksonomy site, with the ranking factors combined. Because the ground truth of a given dataset is not known when it comes to ranking, we inject simulated data whose ranking results can be predicted into the real dataset and compare the ranking results of our algorithm with those of a previous HITS-based algorithm. Our semantic ranking algorithm based on the concept of mutual interaction seems to be preferable to the HITS-based algorithm as a flexible folksonomy ranking framework. Some concrete points of difference are as follows. First, with the time concept applied to the property weights, our algorithm shows superior performance in lowering the scores of older data and raising the scores of newer data. Second, by applying the time concept to the expertise weights as well as to the property weights, our algorithm controls the conflicting influence of expertise weights and enhances the overall consistency of time-valued ranking. The expertise weights of the previous study can act as an obstacle to time-valued ranking because the number of followers increases as time goes on. Third, many new properties and classes can be included in our framework. The previous HITS-based algorithm, based on the voting notion, loses ground when the domain consists of more than two classes, or when other important properties, such as "sent through twitter" or "registered as a friend," are added to the domain. Fourth, there is a big difference in calculation time and memory use between the two kinds of algorithms.
While the previous HITS-based algorithm has to execute the multiplication of two matrices twice, this is unnecessary with our algorithm. In our ranking framework, various folksonomy ranking policies can be expressed by combining the ranking factors, and our approach can work even if the folksonomy site is not implemented with Semantic Web languages. Above all, the time weight proposed in this paper will be applicable to various domains, including social media, where time value is considered important.
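
Two ideas from this abstract, mutual reinforcement between user expertise and document quality and heavier weighting of recent data, can be illustrated with a minimal sketch. It uses a HITS-style iteration with exponential time decay purely for illustration and is not the authors' mutual-interaction algorithm. The function names, the 30-day half-life, and the toy bookmarks are assumptions.

```python
import math


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def rank_users_and_docs(bookmarks, now, half_life_days=30.0, iterations=20):
    """Rank users and documents from (user, doc, day) bookmark records.

    A bookmark's weight halves every `half_life_days` (assumed value),
    so recent activity counts more than old activity.
    """
    users = sorted({u for u, _, _ in bookmarks})
    docs = sorted({d for _, d, _ in bookmarks})
    weighted = [(u, d, 0.5 ** ((now - t) / half_life_days))
                for u, d, t in bookmarks]

    expertise = {u: 1.0 for u in users}
    quality = {d: 1.0 for d in docs}
    for _ in range(iterations):
        # A document is good if expert users bookmarked it recently.
        new_quality = {d: 0.0 for d in docs}
        for u, d, w in weighted:
            new_quality[d] += w * expertise[u]
        # A user is an expert if they bookmarked good documents recently.
        new_expertise = {u: 0.0 for u in users}
        for u, d, w in weighted:
            new_expertise[u] += w * new_quality[d]
        quality = normalize(new_quality)
        expertise = normalize(new_expertise)
    return expertise, quality


# Hypothetical data: (user, document, day on which the bookmark was made).
bookmarks = [
    ("alice", "d1", 100),
    ("bob", "d1", 100),
    ("alice", "d2", 100),
    ("carol", "d3", 10),   # 90 days old, so its weight is 0.5 ** 3 = 0.125
]
expertise, quality = rank_users_and_docs(bookmarks, now=100)
print(sorted(quality, key=quality.get, reverse=True))      # ['d1', 'd2', 'd3']
print(sorted(expertise, key=expertise.get, reverse=True))  # ['alice', 'bob', 'carol']
```

The old bookmark drags both `carol` and `d3` to the bottom. Shortening the assumed half-life would make the ranking favor recent activity even more strongly.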

A Study on Flammability Risk of Flammable Liquid Mixture (가연성 액체 혼합물의 인화 위험성에 관한 연구)

  • Kim, Ju Suk;Koh, Jae Sun
    • Journal of the Society of Disaster Information / v.16 no.4 / pp.701-711 / 2020
  • Purpose: This study experimentally examined the flammability risk of liquid mixtures, in order to confirm whether the risk increases or decreases when two substances (combustible + combustible) are mixed and to present the risk of the mixture. Method: The flash point test and the processing of results followed KS M 2010-2008, the Tag closed-cup test method used as a flash point test for crude oil and petroleum products. The test equipment, manufactured by TANAKA of Japan, satisfies the test standards of KS M 2010; LP gas was used as the ignition source and water as the coolant. Cooling water at about 2℃ was used when measuring the flash point. Results: For flammable + combustible mixtures, there was little change in flash point when the flash point difference between the two substances was not large, and when the difference was low, the flash point tended to increase as the amount of the substance with the higher flash point increased. However, for toluene and methanol, the flash point of the mixture was lower than that of the component with the lower flash point. For paint thinner, the flash point was not easy to predict because the material is itself a mixture, but the measured values fell between -24℃ and 7℃.
Conclusion: By determining the risk of mixtures through experimental studies on flammable mixtures, this study presents criteria aimed at securing the effectiveness of the detailed criteria for determining dangerous goods in the existing dangerous goods safety management method, as well as the reliability and reproducibility of that determination. It can also provide reference data on experimental criteria for flammable liquids regulated at firefighting sites. In addition, if know-how on differences between test methods is accumulated, this work is expected to serve as a basis for research on the risk assessment and determination of dangerous goods.