• Title/Summary/Keyword: web site

Search Results: 1,608

Intelligent Web Crawler for Supporting Big Data Analysis Services (빅데이터 분석 서비스 지원을 위한 지능형 웹 크롤러)

  • Seo, Dongmin;Jung, Hanmin
    • The Journal of the Korea Contents Association / v.13 no.12 / pp.575-584 / 2013
  • The data types used for big-data analysis vary widely, including news, blogs, SNS posts, papers, patents, and sensor data. In particular, the use of web documents, which offer reliable data in real time, is growing steadily. Web crawlers that collect web documents automatically have therefore gained importance, as big data is applied in many different fields and the volume of web data grows exponentially every year. However, existing web crawlers cannot collect all the documents of a web site, because they follow only the URLs embedded in documents they have already collected from a few sites. They also re-collect documents that other crawlers have already gathered, since information about collected documents is not efficiently shared among crawlers. This paper therefore proposes a distributed web crawler. To resolve these problems, the proposed crawler collects web documents through each site's RSS feed and the Google search API. It achieves fast crawling through a client-server model based on RMI and NIO that minimizes network traffic, and it extracts the core content of a web document by comparing keyword similarity across the tags contained in the document. Finally, to verify its superiority, we compare the proposed crawler with existing web crawlers in various experiments.
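
The RSS-based collection step can be illustrated with a short sketch. The following Python fragment is a minimal illustration, not the paper's RMI/NIO implementation, and the feed URL is hypothetical; it pulls item links from a site's RSS 2.0 feed so that crawling is not limited to URLs embedded in previously collected pages.

```python
# Minimal sketch of RSS-based URL discovery (illustrative only, not the
# paper's distributed crawler): fetch a site's RSS feed and return the
# item links as crawl candidates.
import urllib.request
import xml.etree.ElementTree as ET

def discover_urls_from_rss(feed_url: str) -> list[str]:
    """Return the <item><link> URLs listed in an RSS 2.0 feed."""
    with urllib.request.urlopen(feed_url, timeout=10) as resp:
        tree = ET.parse(resp)
    # RSS 2.0 layout: <rss><channel><item><link>...</link></item></channel></rss>
    return [link.text.strip()
            for link in tree.getroot().findall("./channel/item/link")
            if link.text]

if __name__ == "__main__":
    # Hypothetical feed URL; any RSS 2.0 feed works the same way.
    for url in discover_urls_from_rss("https://example.com/rss.xml"):
        print(url)
```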

Relevance of the Cyclomatic Complexity Threshold for the Web Programming (웹 프로그래밍을 위한 복잡도 한계값의 적정성)

  • Kim, Jee-Hyun
    • Journal of the Korea Society of Computer and Information / v.17 no.6 / pp.153-161 / 2012
  • This empirical study analyzes, in the Web environment, the relevance of the cyclomatic complexity threshold based on the frequency distribution of complexity values across applications, starting from two reference points: the upper bound of 10 established by McCabe for procedural programming and the upper bound of 5 established by Lopez for Java programming. Which value is appropriate for Web applications? To answer this, 10 web site projects were collected and a sample of more than 4,000 ASP files was measured. The frequency distribution shows that more than 90% of the Web applications have a complexity below 50, so 50 is proposed as the threshold for Web applications. A Web application has a complex architecture consisting of server, client, and HTML parts, and the HTML part shows high complexity values of 35~40. This is because HTML programs are usually menu-style pages such as home pages or site maps, and the relevance of this observation is explained. Future work should investigate whether hidden properties of the Web application architecture are related to complexity.
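
For reference, McCabe's number for a single routine can be approximated as one plus the count of branch-creating constructs. The sketch below is an illustrative approximation, not the tooling used in the study, and the ASP/VBScript keyword list is an assumption; it computes a per-file value so the values can be binned into a frequency distribution like the one analyzed in the paper.

```python
# Rough approximation of McCabe cyclomatic complexity for ASP/VBScript
# sources: 1 + number of decision points (keyword list is an assumption).
import re
from pathlib import Path

DECISION_KEYWORDS = re.compile(
    r"\b(If|ElseIf|For|For Each|While|Do While|Case)\b", re.IGNORECASE)

def cyclomatic_complexity(source: str) -> int:
    """McCabe's metric: one plus the number of branch-creating constructs."""
    return 1 + len(DECISION_KEYWORDS.findall(source))

def distribution(project_dir: str) -> dict[str, int]:
    """Complexity per .asp file: the raw data for a frequency distribution."""
    return {str(p): cyclomatic_complexity(p.read_text(errors="ignore"))
            for p in Path(project_dir).rglob("*.asp")}
```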

Differentiation Strategies for a Women Portal Site: An Empirical Study (여성 포탈 사이트의 차별화 전략에 관한 실증 연구)

  • Kim, Hyun-Soo;Kim, Na-Rang;Hong, Soon-Goo
    • Information Systems Review / v.4 no.2 / pp.169-189 / 2002
  • A recent survey shows that female web surfers in Korea account for 45.3% of all netizens, and this share is expected to keep growing. Although the market for women's portal sites has recently been in the spotlight, the brand power of a women's portal site is still too weak to compete with general portal sites. To be competitive, a women's portal site should implement differentiation strategies. The primary objective of this research is to suggest differentiation strategies for a women's portal site based on the 6C framework (contents, community, commerce, connection, communication, and customizing). To achieve this goal, a questionnaire was posted on the web site and 348 responses were analyzed. As a result, 15 differentiation strategies validated through the web survey are presented. The results of this study are expected to help companies develop appropriate strategies for female Internet users.

A Study on Development of Remote Site Monitoring System in Public Road Construction Projects (공공 도로건설사업에서의 원격 현장모니터링 체계 구축에 관한 연구)

  • Ok, Hyun
    • International Journal of Highway Engineering / v.14 no.6 / pp.57-65 / 2012
  • PURPOSES: To improve the efficiency of public road construction project management by developing a real-time remote site monitoring system. METHODS: In this study, we developed a remote site monitoring system that uses web cameras for road construction projects managed by the RCMA (Regional Construction Management Administration). With the system, construction progress and weak points of a site can be monitored in real time. The system was piloted on about 10 road construction projects ordered by the RCMA, and its applicability for future site monitoring was verified. RESULTS: The remote site monitoring system is built on the Construction CALS System, one of the business systems used by the MLTM (Ministry of Land, Transport and Maritime Affairs) and its affiliated agencies, and is served through the "Construction Management System (Contractors)" and the "Construction CALS Portal System" of the Construction CALS System. Based on the pilot application to the 10 road construction sites, a benefit analysis, development considerations, and a "Guide for installing and operating visual information processing equipment at construction sites" are presented. CONCLUSIONS: Establishing the remote site monitoring system can improve the efficiency of construction management services. It is also expected to prevent disasters, accidents, and illegal construction in advance, which should further improve the quality of the facilities.

Page Logging System for Web Mining Systems (웹마이닝 시스템을 위한 페이지 로깅 시스템)

  • Yun, Seon-Hui;O, Hae-Seok
    • The KIPS Transactions: Part C / v.8C no.6 / pp.847-854 / 2001
  • The Web continues to grow at a fast rate, both in the volume of traffic and in the size and complexity of Web sites. With this growth, tasks such as Web site design, Web server design, and even simply navigating a Web site have become more complex. An important input to these design tasks is an analysis of how a Web site is actually used. This paper proposes a Page Logging System (PLS) that reliably identifies the user sessions required by a Web mining system. PLS consists of a Page Logger that captures all of a user's page accesses, a Log Processor that produces user sessions from these data, and statements that embed a call to the Page Logger applet in each page. The proposed PLS removes several time-consuming preprocessing tasks that would otherwise have to be performed in Web mining systems. In particular, it simplifies the transaction identification phase by directly measuring the amount of time a user stays on a page. PLS also resolves the local cache hits and proxy IPs that make it difficult to identify user sessions from Web server logs.
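
The session-building step can be sketched briefly. The fragment below is a minimal illustration, not the PLS implementation; the record fields and the 30-minute idle timeout are assumptions. Given per-user page access records of the kind a client-side logger would report, it splits each user's stream into sessions at idle gaps.

```python
# Minimal sketch of session identification from page-access records
# (illustrative; field names and the 30-minute timeout are assumptions).
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class PageAccess:
    user_id: str          # identifier assigned by the logger, not an IP address
    url: str
    entered: datetime
    seconds_on_page: float

def build_sessions(accesses: list[PageAccess],
                   timeout: timedelta = timedelta(minutes=30)) -> list[list[PageAccess]]:
    """Group one user's accesses into sessions, splitting when the time
    between consecutive page entries exceeds the idle timeout."""
    accesses = sorted(accesses, key=lambda a: a.entered)
    sessions: list[list[PageAccess]] = []
    for access in accesses:
        if sessions and access.entered - sessions[-1][-1].entered <= timeout:
            sessions[-1].append(access)
        else:
            sessions.append([access])
    return sessions
```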


An Integrated Data Mining Model for Customer Relationship Management (고객관계관리를 위한 통합 데이터마이닝 모형 연구)

  • Song, In-Young;Yi, Tae-Seok;Shin, Ki-Jeong;Kim, Kyung-Chang
    • Journal of Intelligence and Information Systems / v.13 no.3 / pp.83-99 / 2007
  • The advancement of digital information technology has increased interest in the management and use of information and has stimulated research on the topic. In this paper, we propose an integrated data mining model that provides the appropriate information and interface to users of a scientific information portal service according to their classification group. The model classifies users from log files automatically collected by the web server, based on the users' behavioral patterns. By classifying the existing users of an information-service web site and analyzing their patterns, we propose a site utilization methodology that supports a dynamic interface and a user-oriented site operating policy. We believe this research enables continuous support for web site users as well as information services tailored to each user classification group.
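
The first step of such a model is turning raw server logs into per-user behavioral profiles. The sketch below is a stdlib-only illustration, not the paper's model; the Common Log Format pattern and profiling by top-level path are assumptions. A classifier or clustering step could then group the resulting profiles.

```python
# Minimal sketch: derive per-client behavioural profiles from a web server
# access log in Common Log Format (illustrative; not the paper's model).
import re
from collections import Counter, defaultdict

LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+) [^"]*" \d{3} \S+')

def behaviour_profiles(log_lines):
    """Map each client (here identified by IP) to a count of the top-level
    site sections it visited."""
    profiles: dict[str, Counter] = defaultdict(Counter)
    for line in log_lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        client, path = m.group(1), m.group(2)
        section = "/" + path.lstrip("/").split("/", 1)[0]  # e.g. /search, /paper
        profiles[client][section] += 1
    return profiles
```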


Development of Risk Communication Strategy and Educational Homepage on Food Additives (식품첨가물 Risk Communication 전략 모형 개발 및 교육용 홈페이지 구축)

  • Kim, Sang-Mi;Kim, Jeong-Weon
    • Korean Journal of Community Nutrition / v.15 no.2 / pp.240-252 / 2010
  • The purpose of this research was to develop a risk communication (RC) strategy and an educational web site on food additives for elementary students and their parents, in order to improve their perception of food additives and their dietary life. First, a survey of 1,200 elementary school children and their parents was conducted to diagnose their perceptions and information needs regarding food additives. The survey revealed that most children and parents did not have sufficient knowledge of food additives and wanted safety information about them. Second, previous research on food risk communication was analyzed to develop a risk communication model, which was then applied directly in this study. Third, a web site (www.foodnara.go.kr/foodaddy) was developed to host the educational materials, along with up-to-date information on food additives and classroom activities for teachers. Fourth, the homepage was evaluated with about 100 children and 100 parents; the majority showed high levels of understanding (children 85.7%, parents 79%) and satisfaction (children 77.2%, parents 64%), and a reduction in prejudice against food additives was observed. The RC model developed in this study can be applied to any food risk communication, and the content and materials on the web site, including booklets, animations, and quizzes, can be used effectively to promote communication on food additives. In the future, the web site should be publicized so that it is used by various consumer groups, and its contents should be updated continuously with consumer-friendly communication materials.

An Experimental Study on Topic Distillation Using Web Site Structure (웹 사이트 구조를 이용한 토픽 검색 연구)

  • Lee, Jee-Suk;Chung, Yung-Mee
    • Journal of the Korean Society for Information Management / v.24 no.3 / pp.201-218 / 2007
  • This study proposes a topic distillation algorithm that ranks relevant sites selected from retrieved web pages, and evaluates its performance. The algorithm calculates a site's topic score using the site's hierarchical structure. The TREC .GOV test collection and the set of TREC-2004 topic distillation queries were used for the experiment. The results show that the algorithm returns at least two relevant sites among the top ten retrieval results. An in-depth analysis of the relevant-site list provided by TREC-2004 revealed that the definition of topic distillation was not strictly applied when relevant sites were selected. When the retrieved sites and sub-sites were re-evaluated using a revised list of relevant sites, the performance of the proposed algorithm improved significantly.
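
One plausible way to use a site's hierarchy for topic distillation, sketched below, is to aggregate retrieved-page relevance scores up to each site root with a depth discount, so that sites whose relevant pages sit near the top of the hierarchy rank higher. This is a hedged illustration under assumed scoring; the paper's actual topic-score calculation is not reproduced here, and the decay factor is an assumption.

```python
# Hedged sketch: aggregate page-level retrieval scores into per-site topic
# scores using URL path depth as a proxy for position in the site hierarchy.
# (Illustrative only; not the paper's scoring formula.)
from collections import defaultdict
from urllib.parse import urlparse

def site_topic_scores(page_scores: dict[str, float],
                      decay: float = 0.5) -> dict[str, float]:
    """page_scores maps page URLs to retrieval scores; returns per-site scores,
    discounting each page's contribution by its depth in the site tree."""
    scores: dict[str, float] = defaultdict(float)
    for url, score in page_scores.items():
        parts = urlparse(url)
        depth = len([p for p in parts.path.split("/") if p])  # directory depth
        scores[parts.netloc] += score * (decay ** depth)
    return dict(scores)
```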