• Title/Summary/Keyword: web pages


AN EFFICIENT DENSITY BASED ANT COLONY APPROACH ON WEB DOCUMENT CLUSTERING

  • M. REKA
    • Journal of applied mathematics & informatics
    • /
    • v.41 no.6
    • /
    • pp.1327-1339
    • /
    • 2023
  • Use of the World Wide Web (WWW) has been increasing as users need more information, and the volume of document information available to end users through the internet keeps growing. The web's document search process is essential for finding documents relevant to user queries. As the number of web pages increases, it becomes increasingly challenging for users to find records appropriate to their interests, and existing Document Information Retrieval (DIR) approaches are time-consuming on large document collections. To alleviate this problem, this paper presents Spatial Clustering Ranking Pattern (SCRP) based Density Ant Colony Information Retrieval (DACIR) for query-based DIR. The first stage applies the proposed Term Frequency Weight (TFW) technique to weight query terms by their frequency. Based on the weight scores, documents are grouped and ranked using the proposed SCRP technique. Finally, based on the ranking, the most relevant documents are retrieved using the DACIR algorithm. The proposed method outperforms traditional information retrieval methods in the quality of returned objects while performing significantly better in run time.
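
The abstract names TFW, SCRP, and DACIR only at a high level, so as a minimal sketch of the first stage, assuming plain relative term frequency as the weight (the paper's exact formula, the SCRP grouping, and the DACIR pheromone rules are not given), query terms can be weighted and documents ranked roughly like this:

```python
# Minimal sketch of term-frequency query weighting; the real TFW
# formula in the paper may differ from this relative-frequency stand-in.
from collections import Counter

def tf_weights(query_terms, document):
    """Weight each query term by its relative frequency in a document."""
    tokens = document.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {term: counts[term.lower()] / total for term in query_terms}

def rank_documents(query_terms, documents):
    """Rank documents by the summed weights of the query terms."""
    scored = []
    for doc_id, text in documents.items():
        weights = tf_weights(query_terms, text)
        scored.append((sum(weights.values()), doc_id))
    return [doc_id for score, doc_id in sorted(scored, reverse=True)]

docs = {
    "d1": "ant colony optimization for web document clustering",
    "d2": "web pages and information retrieval on the web",
}
print(rank_documents(["web", "clustering"], docs))
```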

Reengineering Template-Based Web Applications to Single Page AJAX Applications (단일 페이지 AJAX 애플리케이션을 위한 템플릿 기반 웹 애플리케이션 재공학 기법)

  • Oh, Jaewon;Choi, Hyeon Cheol;Lim, Seung Ho;Ahn, Woo Hyun
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.1 no.1
    • /
    • pp.1-6
    • /
    • 2012
  • Web pages in a template-based web application (TWA) are automatically populated by combining a template shared by the pages with contents specific to each page. Users can thus easily obtain information guided by the consistent structure of the template, and the reduced code duplication also helps increase maintainability. However, TWA still has the interaction problem of classic web applications: each time a user clicks a hyperlink, a whole new page is loaded, even when a partial update of the page would suffice. This paper proposes a reengineering technique to transform the multi-page structure of legacy Java-based TWA into a single-page structure with partial page refresh. In this approach, hyperlinks in HTML code are refactored into AJAX-enabled event handlers to achieve the single-page structure. In addition, JSP and Servlet code is transformed so as not to send data unnecessary for the partial update. The new single page consists of individual components that can be updated independently when interacting with a user. Therefore, our approach can improve interactivity and responsiveness while reducing CPU and network usage. Measurements of our technique applied to a typical TWA show that it improves the response time of user requests over the original TWA by 1% to 87%.
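
As a rough illustration of the hyperlink refactoring step, here is a minimal sketch that rewrites plain anchors into elements whose click handler would fetch the target and patch one page region in place; the handler name loadPartial and the target region id are illustrative assumptions, not the paper's actual transformation rules:

```python
# Sketch: rewrite <a href="..."> anchors into AJAX-style click handlers.
# loadPartial(url, regionId) is a hypothetical client-side function that
# would fetch the URL and replace only the named region of the page.
import re

ANCHOR = re.compile(r'<a\s+href="([^"]+)">([^<]*)</a>')

def to_ajax(html: str) -> str:
    """Replace each anchor with a handler that updates #content in place."""
    def repl(m):
        url, label = m.group(1), m.group(2)
        return (f'<span class="link" '
                f'onclick="loadPartial(\'{url}\', \'content\')">{label}</span>')
    return ANCHOR.sub(repl, html)

page = '<p>See <a href="news.jsp">the news</a> for details.</p>'
print(to_ajax(page))
```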

An Experimental Study of Cocitation Analysis on Web Information (웹 정보원의 동시인용분석에 관한 실험적 연구)

  • 정동열;최윤미
    • Journal of the Korean Society for information Management
    • /
    • v.16 no.2
    • /
    • pp.7-26
    • /
    • 1999
  • This experimental study applies informetric analysis to the World Wide Web, based on cocitation analysis of Web pages and the features of Web resources in the field of communication studies. Cocitation analysis is performed to examine the intellectual structure of communication studies as reflected in link counts on the Web. The selected Web resources in the field are mapped in two dimensions based on cocitation frequencies, a correlation matrix, multidimensional scaling, and cluster analysis. Cocitation analysis using organizational homepages, personal homepages, or Web indexes produced clusters of Web resources with topical similarities. Although informetric analysis of Web resources is still at a preliminary stage, the results show that the Web can be a new tool for revealing the intellectual structure of a specific research field. In addition, this study analyzes the characteristics of print resources versus Web resources and the differences in research methods when applying cocitation analysis.
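
A minimal sketch of the cocitation counting that underlies such an analysis, assuming simple outlink lists (invented here for illustration); the correlation matrix, multidimensional scaling, and clustering steps would then be applied on top of this pair matrix:

```python
# Two pages are cocited when a third page links to both; count how
# often each pair is cocited across all source pages.
from itertools import combinations
from collections import Counter

def cocitation_counts(outlinks):
    """Count, for every pair of pages, how many pages cite both."""
    pairs = Counter()
    for source, targets in outlinks.items():
        for a, b in combinations(sorted(set(targets)), 2):
            pairs[(a, b)] += 1
    return pairs

links = {
    "p1": ["siteA", "siteB", "siteC"],
    "p2": ["siteA", "siteB"],
    "p3": ["siteB", "siteC"],
}
for (a, b), n in cocitation_counts(links).most_common():
    print(a, b, n)   # e.g. siteA siteB 2
```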


Implementation of big web logs analyzer in estimating preferences for web contents (웹 컨텐츠 선호도 측정을 위한 대용량 웹로그 분석기 구현)

  • Choi, Eun Jung;Kim, Myuhng Joo
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.8 no.4
    • /
    • pp.83-90
    • /
    • 2012
  • With the rapid growth of internet infrastructure, the World Wide Web has recently evolved beyond simple information sharing into services such as E-business, remote control and management, and virtual services, and more recently into cloud computing and social network services. Communication through the World Wide Web has shifted toward user-centric customized services rather than provider-centric information delivery. In this environment, it is very important to check and analyze user requests to a website, and estimating user preferences is most important of all. Web logs are analyzed for this purpose, but most existing analyses are limited to page-unit statistics, which is not enough to evaluate user preferences, because the main contents of recent web pages are composed of media files such as images and of dynamic pages built with techniques such as CSS, Div, and iFrame. In this paper, a large-scale log analyzer was designed and implemented to analyze web server logs and estimate users' preferences for web contents. Using MapReduce on Hadoop, large logs were analyzed and preferences for media contents such as images, sounds, and videos were estimated.
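
A minimal Hadoop Streaming-style sketch of the counting step, assuming common-log-format lines and a fixed list of media extensions (both assumptions, since the paper's exact log schema and preference measure are not given in the abstract):

```python
# Sketch: mapper emits one count per media-file request found in a web
# server log line; reducer sums per-file counts as a rough preference
# signal. Field position 6 assumes the common log format request path.
import sys

MEDIA = (".jpg", ".png", ".gif", ".mp3", ".mp4", ".avi")

def mapper(lines):
    for line in lines:
        fields = line.split()
        if len(fields) > 6:
            path = fields[6].lower()
            if path.endswith(MEDIA):
                print(f"{path}\t1")

def reducer(lines):
    counts = {}
    for line in lines:
        path, n = line.rsplit("\t", 1)
        counts[path] = counts.get(path, 0) + int(n)
    for path, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{path}\t{n}")

if __name__ == "__main__":
    mapper(sys.stdin) if sys.argv[1:] == ["map"] else reducer(sys.stdin)
```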

A Study on the Persistence of Web Pages (웹페이지의 지속성에 관한 연구)

  • Chang, Woo-Kwon
    • Proceedings of the Korean Society for Information Management Conference
    • /
    • 2013.08a
    • /
    • pp.3-6
    • /
    • 2013
  • This study examines the persistence of websites among cited web resources. The data used are the references of the research papers published in the Journal of the Korean Society for Information Management, covering papers from 2001 to 2002. The results indicate how useful the contents of the cited web pages remain over time.
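
A minimal sketch of the kind of persistence check such a study implies, assuming plain HTTP HEAD requests against cited URLs (the study's actual verification procedure is not described in the abstract, and the sample URLs are placeholders):

```python
# Sketch: request each cited URL and record whether it still resolves.
import urllib.request
import urllib.error

def is_alive(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL still answers with an HTTP success code."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, ValueError):
        return False

cited = ["http://example.org/paper.html", "http://example.org/gone.html"]
alive = [u for u in cited if is_alive(u)]
print(f"persistence: {len(alive)}/{len(cited)} still reachable")
```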


Execution-based System and Its Performance Analysis for Detecting Malicious Web Pages using High Interaction Client Honeypot (고 상호작용 클라이언트 허니팟을 이용한 실행 기반의 악성 웹 페이지 탐지 시스템 및 성능 분석)

  • Kim, Min-Jae;Chang, Hye-Young;Cho, Seong-Je
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.12
    • /
    • pp.1003-1007
    • /
    • 2009
  • Client-side attacks, including drive-by downloads, target vulnerabilities in client applications that interact with a malicious server or process malicious data. A typical client-side attack is a web-based one in which a malicious web page exploits a specific browser vulnerability to execute malware on the client system (PC) or give complete control of it to the malicious server. To defend against those attacks, this paper constructs a high interaction client honeypot system using Capture-HPC, which adopts execution-based detection in virtual machines. We have detected and classified malicious web pages using the system. We have also analyzed the system's performance in terms of the number of virtual machine images and the number of browsers executed simultaneously in each virtual machine. Experimental results show that the system with one virtual machine image obtains better performance because of lower reverting overhead. The system also shows good performance when the number of browsers executed simultaneously in a virtual machine is 50.
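
Real high interaction honeypots such as Capture-HPC hook the guest OS to observe state changes; as a much simplified stand-in, a minimal sketch of execution-based detection by diffing filesystem snapshots taken before and after a browser visit might look like this (the paths and the snapshot granularity are assumptions):

```python
# Sketch: flag a page as suspicious if files were created or modified
# in the monitored tree while the page was being rendered in the VM.
import hashlib
import pathlib

def snapshot(root: str) -> dict:
    """Map every file under root to a hash of its contents."""
    state = {}
    for p in pathlib.Path(root).rglob("*"):
        if p.is_file():
            state[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest()
    return state

def detect_changes(before: dict, after: dict) -> list:
    """Report files created or modified while the page was rendered."""
    return [path for path, digest in after.items()
            if before.get(path) != digest]

before = snapshot("/tmp/guest")   # taken before the browser visit
# ... drive the browser to the suspect URL inside the VM here ...
after = snapshot("/tmp/guest")    # taken after rendering finishes
print("suspicious files:", detect_changes(before, after))
```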

A Structural Complexity Metric for Web Application based on Similarity (유사도 기반의 웹 어플리케이션 구조 복잡도)

  • Jung, Woo-Sung;Lee, Eun-Joo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.8
    • /
    • pp.117-126
    • /
    • 2010
  • Software complexity is used to evaluate a target system's maintainability. The existing complexity metrics for web applications are count-based, so it is hard for them to incorporate the understandability of developers or maintainers. To make up for this shortcoming, entropy theory can be applied to define complexity; however, that approach assumes the information quantity of each page is identical. In this paper, the structural complexity of a web application is defined based on information theory and similarity. In detail, the proposed complexity is defined using entropy, as in the previous approach, but the information quantity of individual pages is defined using similarity: a page that is similar to many other pages has a smaller information quantity than a page that is dissimilar to the others. Furthermore, different similarity measures can be used for different views, yielding many-sided complexity measures. Finally, several complexity properties are applied to verify the proposed metric, and case studies show its applicability.
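
A minimal sketch of this idea, assuming the information quantity of a page is -log2 of its share of the total pairwise similarity mass, and the complexity is the entropy over those shares (the paper's exact definitions may differ):

```python
# Pages similar to many others get a larger probability share and thus
# a smaller information quantity, matching the abstract's description.
import math

def info_quantity(sim, i):
    """-log2 of page i's share of the total similarity mass."""
    totals = [sum(row) for row in sim]
    return -math.log2(totals[i] / sum(totals))

def complexity(sim):
    """Entropy over similarity-derived page probabilities."""
    totals = [sum(row) for row in sim]
    grand = sum(totals)
    return sum(-(t / grand) * math.log2(t / grand) for t in totals)

# three pages: the first two near-duplicates, the third distinct
sim = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.1],
       [0.1, 0.1, 1.0]]
print(info_quantity(sim, 0) < info_quantity(sim, 2))  # True
print(round(complexity(sim), 3))
```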

Ranking Quality Evaluation of PageRank Variations (PageRank 변형 알고리즘들 간의 순위 품질 평가)

  • Pham, Minh-Duc;Heo, Jun-Seok;Lee, Jeong-Hoon;Whang, Kyu-Young
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.5
    • /
    • pp.14-28
    • /
    • 2009
  • The PageRank algorithm is an important component for ranking Web pages in Google and other search engines. While many improvements for the original PageRank algorithm have been proposed, it is unclear which variations (and their combinations) provide the "best" ranked results. In this paper, we evaluate the ranking quality of the well-known variations of the original PageRank algorithm and their combinations. In order to do this, we first classify the variations into link-based approaches, which exploit the link structure of the Web, and knowledge-based approaches, which exploit the semantics of the Web. We then propose algorithms that combine the ranking algorithms in these two approaches and implement both the variations and their combinations. For our evaluation, we perform extensive experiments using a real data set of one million Web pages. Through the experiments, we find the algorithms that provide the best ranked results from either the variations or their combinations.
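
For reference, a minimal sketch of the original link-based PageRank iteration that these variations start from, using the standard damping-factor formulation:

```python
# Power-iteration PageRank over an outlink map; dangling pages spread
# their rank evenly, and d is the usual damping factor.
def pagerank(outlinks, d=0.85, iters=50):
    """outlinks: page -> list of pages it links to."""
    pages = list(outlinks)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        nxt = {p: (1.0 - d) / n for p in pages}
        for p, targets in outlinks.items():
            if targets:
                share = rank[p] / len(targets)
                for t in targets:
                    nxt[t] = nxt.get(t, 0.0) + d * share
            else:                       # dangling page: spread evenly
                for t in pages:
                    nxt[t] += d * rank[p] / n
        rank = nxt
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(sorted(pagerank(web).items(), key=lambda kv: -kv[1]))
```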

Design and Implementation of a Speech Synthesis Engine and a Plug-in for Internet Web Page (인터넷 웹페이지의 음성합성을 위한 엔진 및 플러그-인 설계 및 구현)

  • Lee, Hee-Man;Kim, Ji-Yeong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.2
    • /
    • pp.461-469
    • /
    • 2000
  • This paper describes the design and implementation of a Netscape plug-in and a speech synthesis engine that generate speech from the text of web pages. Speech is generated from a web page as follows: the speech synthesis plug-in is activated when Netscape finds the audio/xesp MIME data type embedded in the browsed web page; the HTML file referenced in the EMBED tag is downloaded from the referenced URL and passed to the commander object in the plug-in; the commander object extracts the speech synthesis engine control tags and the text characters from the downloaded HTML document; and the speech synthesis engine then generates the synthesized speech. The engine interprets the command streams from the commander object and calls member functions that process the speech segment data in its data banks. The commander object and the speech synthesis engine are designed as independent objects to enhance flexibility and portability.
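
A minimal sketch of the commander step, assuming a simplified split of a downloaded document into control tags and text to speak (the tag name here is invented; only the audio/xesp MIME type and the EMBED tag come from the abstract):

```python
# Sketch: separate engine control tags from the plain text that a
# hypothetical TTS engine would synthesize.
from html.parser import HTMLParser

class CommandExtractor(HTMLParser):
    """Collect control and speak commands for a hypothetical TTS engine."""
    def __init__(self):
        super().__init__()
        self.commands = []
    def handle_starttag(self, tag, attrs):
        self.commands.append(("control", tag, dict(attrs)))
    def handle_data(self, data):
        text = data.strip()
        if text:
            self.commands.append(("speak", text, {}))

parser = CommandExtractor()
parser.feed('<speed value="fast"/>Hello <b>web</b> page.')
for cmd in parser.commands:
    print(cmd)
```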


Design and Implementation of Web Directory Engine Using Dynamic Category Hierarchy (동적분류에 의한 주제별 웹 검색엔진의 설계 및 구현)

  • Choi Bum-Ghi;Park Sun;Park Tae-Su;Song Jae-Won;Lee Ju-Hong
    • Journal of Internet Computing and Services
    • /
    • v.7 no.2
    • /
    • pp.71-80
    • /
    • 2006
  • Web search engines offer two main methods: directory searching and keyword searching. Keyword searching shows high recall but tends to return too many results for users to find the pages they want. Directory searching, in contrast, shows high precision but low recall: users have difficulty finding the pages they want when they select an improper category without knowing the exact one. We designed and implemented a new web search engine to resolve this problem of the directory search method. It regards a category as a fuzzy set of keywords and calculates the degree of inclusion between categories. The merit of this method is that it enhances the recall of directory searching by expanding subcategories on the basis of similarity.
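
A minimal sketch of the fuzzy-set view, assuming categories are keyword-to-membership maps and the degree of inclusion of A in B is A's min-intersection mass with B divided by A's own mass (the paper's exact inclusion measure may differ):

```python
# Degree of inclusion between categories treated as fuzzy keyword sets.
def inclusion(a: dict, b: dict) -> float:
    """Degree to which fuzzy keyword set a is included in b."""
    mass_a = sum(a.values())
    if mass_a == 0:
        return 0.0
    overlap = sum(min(w, b.get(k, 0.0)) for k, w in a.items())
    return overlap / mass_a

sports = {"soccer": 0.9, "league": 0.6, "score": 0.4}
football = {"soccer": 0.8, "league": 0.7}
print(inclusion(football, sports))  # high: football largely inside sports
print(inclusion(sports, football))  # lower: sports is broader
```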
