• Title/Summary/Keyword: Web Page Ranking

Link ranking-based hierarchical structuring of web site (링크 중요도에 기반한 웹사이트의 계층 구조화)

  • Lim, Tae-Soo;Park, Bum-Hwan;Lee, Woo-Key
    • Proceedings of the Korean Information Science Society Conference / 2005.11b / pp.745-747 / 2005
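  • Hierarchically structuring a website, whose many web pages are connected through hyperlinks into a complex graph, is very useful when searching the site because it reorganizes the information and reduces the number of alternatives that must be considered. To optimize the semantic hierarchy of a website, this paper generates a tree structure that maximizes the summed importance of the user's traversal paths, that is, of the web arcs. Specifically, it first derives web-arc importance based on PageRank, and second generates the optimal tree structure by using the Minimum-Cost Arborescence problem. The tree structure, which is generated independently of user queries, serves as a meaningful hierarchy of the website and should help users search the site more effectively.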

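The two steps named in the abstract above (PageRank-based arc importance, then an optimal tree) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: arc importance is taken to be the rank a page passes along each out-link, the function names are hypothetical, and an exhaustive search over parent choices stands in for an efficient arborescence solver such as Chu-Liu/Edmonds, so it only suits very small sites.

```python
from itertools import product

def pagerank(links, damping=0.85, iters=100):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:  # dangling page: spread its rank evenly over all pages
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

def arc_weights(links, rank):
    """Assumed arc importance: the rank a page passes along each out-link."""
    return {(u, v): rank[u] / len(outs)
            for u, outs in links.items() for v in outs}

def reaches_root(node, parent, root):
    """Follow parent pointers; fail if a cycle is met before the root."""
    seen = set()
    while node != root:
        if node in seen:
            return False
        seen.add(node)
        node = parent[node]
    return True

def best_tree(root, weights):
    """Try every choice of one incoming arc per non-root page and keep
    the valid tree with the largest total arc weight (brute force)."""
    nodes = {u for arc in weights for u in arc}
    others = sorted(nodes - {root})
    incoming = [[(u, v) for (u, v) in weights if v == child and u != child]
                for child in others]
    best, best_score = None, float("-inf")
    for choice in product(*incoming):
        parent = {v: u for u, v in choice}
        if all(reaches_root(v, parent, root) for v in others):
            score = sum(weights[arc] for arc in choice)
            if score > best_score:
                best, best_score = parent, score
    return best  # {page: parent page}, or None if no spanning tree exists

# Hypothetical four-page site used only for illustration.
site = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["about", "item"],
    "item": ["home", "products"],
}
tree = best_tree("home", arc_weights(site, pagerank(site)))
print(tree)
```

Maximizing total arc importance is equivalent to the minimum-cost formulation with negated weights, which is why a maximum search is used here.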

PageRanking of Newly Crawled Web Documents (추가 수집 웹 문서를 위한 페이지랭크 할당 모델)

  • Oh, Eun-Jung;Kang, In-Ho;Kim, Gil-Chang
    • Annual Conference on Human and Language Technology / 2002.10e / pp.228-234 / 2002
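  • It is important to retrieve the information a user wants from the Internet quickly and accurately. PageRank, which expresses the relative importance of web documents, can raise search quality and is therefore widely used in information retrieval. Because the number of web documents on the Internet grows rapidly in a short time, recomputing PageRank for the entire collection whenever new documents appear consumes a great deal of time and cost. If PageRank could be computed for the added web documents alone, without changing the PageRank of existing documents, efficiency in time and cost would improve. Exploiting the observation that added documents have little effect on the PageRank of earlier documents, this paper proposes and evaluates a PageRank assignment model for added documents.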

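The idea in the abstract above, ranking only the added documents while leaving existing ranks untouched, might be sketched as follows. The paper's actual assignment model is not given in the abstract; this version simply freezes the old ranks and iterates the standard PageRank update on the new pages only. The function name, data layout, and damping value are assumptions for illustration.

```python
def assign_new_ranks(links, old_rank, new_pages, damping=0.85, iters=50):
    """Estimate ranks for new_pages while keeping old_rank frozen.

    links:     {page: [pages it links to]}, covering old and new pages
    old_rank:  {page: rank} computed earlier; never modified here
    new_pages: newly crawled pages that still need a rank
    """
    total = len(old_rank) + len(new_pages)
    rank = dict(old_rank)
    for page in new_pages:
        rank[page] = 1.0 / total  # neutral starting value
    for _ in range(iters):
        updated = {}
        for page in new_pages:
            # Rank flowing in from every page that links to this one.
            inflow = sum(rank[src] / len(outs)
                         for src, outs in links.items() if page in outs)
            updated[page] = (1.0 - damping) / total + damping * inflow
        rank.update(updated)  # only new pages change
    return rank

# Hypothetical example: pages "a" and "b" were ranked before the new crawl.
links = {"a": ["b", "new1"], "b": ["a"], "new1": ["new2"], "new2": ["a"]}
ranks = assign_new_ranks(links, {"a": 0.6, "b": 0.4}, ["new1", "new2"])
print(ranks)
```

Because the old ranks are not renormalized, the combined values no longer sum to one; whether that matters depends on how the ranks are used downstream.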

A study on real-time internet comment system through sentiment analysis and deep learning application

  • Hae-Jong Joo;Ho-Bin Song
    • Journal of Platform Technology / v.11 no.2 / pp.3-14 / 2023
  • This paper proposes a big data sentiment analysis method and a deep learning implementation that provide a webtoon comment analysis web page, so that webtoon writers can conveniently check comments and receive feedback, in support of the cartoon industry in the video animation field. To overcome the difficulty of automatically analyzing Internet comments and to provide varied sentiment analysis information, the LSTM (Long Short-Term Memory) algorithm, a ranking algorithm, and the word2vec algorithm are applied in parallel, and actual popular works are used to verify validity. The proposed analysis method is easy to extend to other domestic and overseas platforms, and it is expected to be usable in various video animation content fields beyond webtoons.


Ranking Methods of Web Search using Genetic Algorithm (유전자 알고리즘을 이용한 웹 검색 랭킹방법)

  • Jung, Yong-Gyu;Han, Song-Yi
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.3 / pp.91-95 / 2010
  • An artificial neural network that learns search preferences from user information can improve the ranking of search results and enable more flexible searching. Trained on queries issued by other users in the past, the network learns to reflect actual search results more closely. The backpropagation algorithm adjusts the weights by propagating changes backward through the network. However, as the number of training iterations increases, the network can overfit the initial training data. This paper therefore proposes applying genetic algorithms, which are strong at optimizing over many objects, to rank relevant pages flexibly, treating the URL list as the object and using a random selection method.

Knowledge-based Semantic Meta-Search Engine (지식기반 의미 메타 검색엔진)

  • Lee, In-K.;Son, Seo-H.;Kwon, Soon-H.
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.6 / pp.737-744 / 2004
  • Retrieving information from the web that corresponds well to the user's request is a crucial task of search engines. However, most conventional search engines, which rely on pattern matching against queries, have difficulty providing results that correspond to the user's request because of the uncertainty of queries. To overcome this limitation, we propose a framework for knowledge-based semantic meta-search engines with the following five processes: (i) Query formation, (ii) Query expansion, (iii) Searching, (iv) Ranking recreation, and (v) Knowledge base. Simulation results on English-language web documents show that the proposed knowledge-based semantic meta-search engine provides more correct and better search results than those obtained using Google.

The Effective Blog Search Algorithm based on the Structural Features in the Blogspace (블로그의 구조적 특성을 고려한 효율적인 블로그 검색 알고리즘)

  • Kim, Jung-Hoon;Yoon, Tae-Bok;Lee, Jee-Hyong
    • Journal of KIISE:Software and Applications / v.36 no.7 / pp.580-589 / 2009
  • Today, most web pages are being created in the blogspace or are evolving into it. A blog entry (blog page) includes features not found in traditional web pages, such as trackback links, bloggers' authority, tags, and comments. Traditional ranking algorithms are therefore not suitable for evaluating blog entries because they do not consider these blog-specific features. In this paper, a new algorithm called "Blog-Rank" is proposed. This algorithm ranks blog entries by calculating bloggers' reputation scores, trackback scores, and comment scores based on the features of the blog entries. The algorithm is also applied to searching the blogspace for information related to users' queries. The experiment shows that it finds much more relevant information than traditional ranking algorithms.
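As a rough illustration of combining the three scores named in the abstract above, the sketch below orders entries by a weighted sum. The abstract does not say how Blog-Rank actually combines the scores, so the linear form, the weights, the field names, and the assumption that each score is already normalized to the range 0 to 1 are all hypothetical.

```python
def blog_rank(entries, w_reputation=0.5, w_trackback=0.3, w_comment=0.2):
    """Order blog entries by a weighted sum of three feature scores.

    Each entry is a dict with 'reputation', 'trackbacks', and 'comments'
    scores, assumed to be normalized to [0, 1]. The weights are
    placeholders, not values taken from the paper.
    """
    def score(entry):
        return (w_reputation * entry["reputation"]
                + w_trackback * entry["trackbacks"]
                + w_comment * entry["comments"])
    return sorted(entries, key=score, reverse=True)

# Hypothetical entries used only for illustration.
entries = [
    {"url": "blog.example/a", "reputation": 0.2, "trackbacks": 0.9, "comments": 0.1},
    {"url": "blog.example/b", "reputation": 0.8, "trackbacks": 0.4, "comments": 0.6},
]
for entry in blog_rank(entries):
    print(entry["url"])
```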

A Regression-Model-based Method for Combining Interestingness Measures of Association Rule Mining (연관상품 추천을 위한 회귀분석모형 기반 연관 규칙 척도 결합기법)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.23 no.1 / pp.127-141 / 2017
  • Advances in Internet technologies and the proliferation of mobile devices have given consumers access to a wide range of goods and services, but with the adverse effect that consumers have a hard time finding items that suit them even when they devote much time to searching. Accordingly, businesses use recommender systems to help consumers find desired items more easily. Association Rule Mining (ARM) is advantageous for recommender systems in that it provides an intuitive rule form with interestingness measures (support, confidence, and lift) describing the relationship between items. Given an item, its relevant items can be distinguished with the help of these measures, which show the strength of the relationship between items. Based on that strength, the most pertinent items can be chosen from among the others and shown on a given item's web page. However, the diversity of the measures can make it unclear which items are more recommendable. Given two rules, for example, one rule's support and confidence may not both be superior to the other rule's. Such disagreement among the measures may make it difficult to select proper items for recommendation. In addition, in an online environment where a web page or mobile screen can show only a limited number of recommendations that attract consumer interest, prudent selection of the items to include in the recommendation list is very important. Exposing items of little interest may lead consumers to ignore the recommendations, and such consumers may then pay no attention to other forms of marketing activity. Therefore, the measures should be aligned with the probability that a consumer accepts a recommendation. For this reason, this study proposes a model-based approach that combines those measures into one unified measure that can consistently determine the ranking of recommended items.
A regression model was designed to describe how well the measures (independent variables: support, confidence, and lift) explain consumers' acceptance of recommendations (dependent variable: hit rate of recommended items). The model is intuitive to understand and easy to use in that the equation consists of the commonly used ARM measures and can be used to estimate hit rates. An experiment using transaction data from one of Korea's largest online shopping malls was conducted to show that the proposed model can improve the hit rates of recommendations. From the top of the list to 13th place, recommended items ranked higher by the proposed model show higher hit rates than those ranked by the competitive model. The result shows that the proposed model's performance is superior to the competitive model's in an online recommendation environment. On a web page, consumers are shown around ten recommendations, a range in which the proposed model outperforms the competitive model. Moreover, a mobile device cannot show many items simultaneously because of its limited screen size. Therefore, the result shows that the newly devised recommendation technique is suitable for mobile recommender systems. While this study covers cross-selling in online shopping malls that handle merchandise, the proposed method can be expected to apply in various situations where association rules apply. For example, the model can be applied to medical diagnostic systems that predict candidate diseases from a patient's symptoms. To increase the efficiency of the model, additional variables will need to be considered in future studies. For example, price can be a good candidate for an explanatory variable because it has a major impact on consumer purchase decisions. If the prices of recommended items are much higher than those of the items in which a consumer is interested, the consumer may hesitate to accept the recommendations.
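To make the three interestingness measures and their combination concrete, the following sketch computes support, confidence, and lift for a single rule from a list of transactions and then feeds them into a linear predictor of hit rate. The measure definitions are the standard ones; the linear form mirrors the regression idea described above, but the coefficient values are placeholders rather than the fitted values from the study, and the function names are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions is a list of sets of item names.
    """
    n = len(transactions)
    with_a = sum(1 for t in transactions if antecedent in t)
    with_c = sum(1 for t in transactions if consequent in t)
    with_both = sum(1 for t in transactions
                    if antecedent in t and consequent in t)
    support = with_both / n
    confidence = with_both / with_a if with_a else 0.0
    lift = confidence / (with_c / n) if with_c else 0.0
    return support, confidence, lift

def unified_score(measures, coef=(0.0, 1.0, 1.0, 1.0)):
    """Linear predictor b0 + b1*support + b2*confidence + b3*lift.

    The coefficients are illustrative placeholders; in practice they
    would be estimated by regressing observed hit rates on the measures.
    """
    b0, b1, b2, b3 = coef
    support, confidence, lift = measures
    return b0 + b1 * support + b2 * confidence + b3 * lift

# Hypothetical baskets used only for illustration.
baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk"}, {"bread"}]
measures = rule_measures(baskets, "milk", "bread")
print(measures, unified_score(measures))
```

Ranking candidate items by a single predicted score removes the ambiguity that arises when one rule wins on support and another on confidence.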

Finding Influential Users in the SNS Using Interaction Concept : Focusing on the Blogosphere with Continuous Referencing Relationships (상호작용성에 의한 SNS 영향유저 선정에 관한 연구 : 연속적인 참조관계가 있는 블로고스피어를 중심으로)

  • Park, Hyunjung;Rho, Sangkyu
    • The Journal of Society for e-Business Studies / v.17 no.4 / pp.69-93 / 2012
  • Various influence-related relationships in Social Network Services (SNS) among users, among posts, and between users and posts can be expressed using links. The current research evaluates the influence of specific users or posts by analyzing the link structure of relevant social network graphs to identify influential users. We applied the concept of mutual interactions proposed for ranking semantic web resources, rather than the voting notion of PageRank or HITS (Hypertext Induced Topic Selection), to the blogosphere, one of the early SNS. Through many experiments with network models, in which the performance and validity of each alternative approach can be analyzed, we showed the applicability and strengths of our approach. The weight tuning processes for the links of these network models enabled us to control the experimental errors arising from link weight differences and to compare how easily the alternatives can be implemented. An additional example of how to enter the content scores of commercial or spam posts into the graph-based method is also given on a small network model. This research, as a starting point for the study of identifying influential users in SNS, differs from previous research in the following points. First, various influence-related properties that are deemed important but are often disregarded, such as scraping, commenting, subscribing to RSS feeds, and trusting friends, can be considered simultaneously. Second, the framework reflects the general phenomenon in which objects interacting with more influential objects increase their own influence. Third, regarding the extent to which a blogger causes other bloggers to act after him or her as the most important factor of influence, we treated sequential referencing relationships from a viewpoint different from that of PageRank or HITS.

An Exploratory Study on the Competition Patterns Between Internet Sites in Korea (한국 인터넷사이트들의 산업별 경쟁유형에 대한 탐색적 연구)

  • Park, Yoonseo;Kim, Yongsik
    • Asia Marketing Journal / v.12 no.4 / pp.79-111 / 2011
  • The digital economy has grown rapidly, and the new business area called 'Internet business' has expanded dramatically over time. However, in Internet business, the market shares of individual companies seem to fluctuate sharply. Marketing managers who operate Internet sites have therefore closely observed the competition structure of the Internet business market and carefully analyzed competitors' behavior in order to achieve their own business goals. Newly created Internet businesses may differ from offline ones in management style because their business circumstances are entirely different from those of existing offline businesses. Much research is therefore needed on what the features of Internet business are and how the management style of Internet business companies should change. Most marketing literature related to Internet business has focused on individual business markets. Specifically, many researchers have studied Internet portal sites and Internet shopping mall sites, which are the most general forms of Internet business. This study, by contrast, focuses on the entire Internet business industry to understand the competitive circumstances of the online market. This approach makes it possible not only to take a broader view of the overall e-business industry, but also to understand the differences in competition structure among Internet business markets. We used time-series data on Internet connection rates by consumers as the basic data for identifying competition patterns in the Internet business markets. Specifically, the data for this research was obtained from one of the Internet ranking sites, 'Fian'. The Internet business ranking data is based on the web surfing records of a pre-selected sample group, in which double counting of page views is controlled by checking for identical IP addresses.
The ranking site offers several kinds of data that are very useful for comparing and analyzing competing sites. The Fian site divides Internet business into 34 areas and reports daily the market shares of the top five sites in each category. We collected the daily market share data for the sites in each area from April 22, 2008 to August 5, 2008. Some errors were found in the data, and data for 30 business areas were finally used for our research after data cleaning. This study performed several empirical analyses focusing on the market shares of each site to understand the competition among sites in the Internet business of Korea. We applied cluster analysis to the data to identify, with greater statistical precision, business fields with similar competitive structures. The research results are as follows. First, the leading sites in each area were classified into three groups based on the averages and standard deviations of daily market shares. The first group includes the sites with the lowest market shares, which offer consumers greater convenience by providing Internet sites as complementary services for existing offline services. The second group includes sites with a medium level of market share, where the site users are limited to a specific small group. The third group includes sites with the highest market shares, which usually require online registration in advance and make it difficult to switch to another site. Second, we analyzed the second place sites in each business area because doing so may help us understand the competitive power of the strongest competitor against the leading site. The second place sites in each business area were classified into four groups based on the averages and standard deviations of daily market shares.
The four groups are the sites showing consistent inferiority compared to the leading sites, the sites with relatively high volatility and a medium level of shares, the sites with relatively low volatility and a medium level of shares, and the sites with relatively low volatility and a high level of shares whose gaps to the leading sites are not large. Except in the 'web agency' area, these second place sites show relatively stable shares, with standard deviations below 0.1 point. Third, we also classified the types of relative strength between the leading sites and the second place sites by applying cluster analysis to the gaps in market share between the two sites. These were also classified into four groups: the sites with the relatively lowest gaps, although their standard deviations vary; the sites with below-average gaps; the sites with above-average gaps; and the sites with relatively higher gaps and lower volatility. We also found that while the areas with relatively larger gaps usually have smaller standard deviations, the areas with very small differences between the first and second sites have a wider range of standard deviations. The practical and theoretical implications of this study are as follows. First, the results may provide current market participants with useful information for understanding the competitive circumstances of the market and building effective new business strategies for market success. They may also help potential new companies find a new business area and set up successful competitive strategies. Second, the study may help Internet marketing researchers take a macro view of the overall Internet market, making it possible to begin new studies of the overall Internet market that go beyond studies of individual Internet markets.


A Methodology for Extracting Shopping-Related Keywords by Analyzing Internet Navigation Patterns (인터넷 검색기록 분석을 통한 쇼핑의도 포함 키워드 자동 추출 기법)

  • Kim, Mingyu;Kim, Namgyu;Jung, Inhwan
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.123-136 / 2014
  • Recently, online shopping has further developed as the use of the Internet and a variety of smart mobile devices becomes more prevalent. The increase in the scale of such shopping has led to the creation of many Internet shopping malls. Consequently, there is a tendency for increasingly fierce competition among online retailers, and as a result, many Internet shopping malls are making significant attempts to attract online users to their sites. One such attempt is keyword marketing, whereby a retail site pays a fee to expose its link to potential customers when they insert a specific keyword on an Internet portal site. The price related to each keyword is generally estimated by the keyword's frequency of appearance. However, it is widely accepted that the price of keywords cannot be based solely on their frequency because many keywords may appear frequently but have little relationship to shopping. This implies that it is unreasonable for an online shopping mall to spend a great deal on some keywords simply because people frequently use them. Therefore, from the perspective of shopping malls, a specialized process is required to extract meaningful keywords. Further, the demand for automating this extraction process is increasing because of the drive to improve online sales performance. In this study, we propose a methodology that can automatically extract only shopping-related keywords from the entire set of search keywords used on portal sites. We define a shopping-related keyword as a keyword that is used directly before shopping behaviors. In other words, only search keywords that direct the search results page to shopping-related pages are extracted from among the entire set of search keywords. A comparison is then made between the extracted keywords' rankings and the rankings of the entire set of search keywords. Two types of data are used in our study's experiment: web browsing history from July 1, 2012 to June 30, 2013, and site information. 
The experimental dataset came from a web site ranking site and from the biggest portal site in Korea. The original sample dataset contains 150 million transaction logs. First, portal sites are selected, and the search keywords used on those sites are extracted. Search keywords can be easily extracted by simple parsing. The extracted keywords are ranked according to their frequency. The experiment uses approximately 3.9 million search results from Korea's largest search portal site. As a result, a total of 344,822 search keywords were extracted. Next, by using web browsing history and site information, the shopping-related keywords were taken from the entire set of search keywords. As a result, we obtained 4,709 shopping-related keywords. For performance evaluation, we compared the hit ratios of all the search keywords with those of the shopping-related keywords. To do this, we extracted 80,298 search keywords from several Internet shopping malls and then chose the top 1,000 keywords as a set of true shopping keywords. We measured the precision, recall, and F-score of the entire set of keywords and of the shopping-related keywords. The F-score was calculated as the harmonic mean of precision and recall. The precision, recall, and F-score of the shopping-related keywords derived by the proposed methodology were higher than those of the entire set of keywords. This study proposes a scheme that can obtain shopping-related keywords in a relatively simple manner: shopping-related keywords can be extracted simply by examining transactions whose next visit is a shopping mall. The resulting shopping-related keyword set is expected to be a useful asset for the many shopping malls that participate in keyword marketing. Moreover, the proposed methodology can be easily applied to constructing keyword sets for other special areas as well as for shopping.
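The extraction rule and the evaluation described above can be sketched in a few lines. The log layout here is hypothetical (each session is a list of `(kind, value)` tuples, where kind is `"search"` or `"visit"`), as are the function names and example domains; only the rule itself (keep a keyword when the very next visit is a shopping site) and the harmonic-mean F-score come from the abstract.

```python
def shopping_keywords(sessions, shopping_sites):
    """Collect keywords whose search is immediately followed by a
    visit to a shopping site."""
    found = set()
    for events in sessions:
        for current, following in zip(events, events[1:]):
            if (current[0] == "search" and following[0] == "visit"
                    and following[1] in shopping_sites):
                found.add(current[1])
    return found

def precision_recall_f(predicted, truth):
    """Precision, recall, and F-score (harmonic mean of the two)."""
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

# Hypothetical browsing sessions used only for illustration.
sessions = [
    [("search", "sneakers"), ("visit", "mall.example"),
     ("search", "weather"), ("visit", "news.example")],
    [("search", "laptop"), ("visit", "mall.example")],
]
keywords = shopping_keywords(sessions, {"mall.example"})
print(sorted(keywords))
print(precision_recall_f(keywords, {"sneakers", "tv"}))
```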