• Title/Summary/Keyword: web crawling

An Analysis on Anti-Drone Technology Trends of Domestic Companies Using News Crawling on the Web (뉴스 기사의 크롤링을 통한 국내 기업의 안티 드론에 사용되는 기술 현황 분석)

  • Kim, Kyuseok
    • Journal of Advanced Navigation Technology / v.24 no.6 / pp.458-464 / 2020
  • Drones are spreading into uses such as construction, logistics, scientific research, recording, and toys. However, because some drones are used for crime or terror, anti-drone technologies that neutralize hostile drones are also being widely researched and developed. Anti-drone technologies can be divided into detection, identification, and neutralization. Neutralization methods are further divided into soft-kill, which blocks detected drones by jamming, and hard-kill, which physically destroys them. In this paper, domestic news articles related to anti-drone technology were gathered from Google and Naver. Analysis of these articles identified eight related technologies using RF, GNSS, radar, and so on. The general features and usage status of those technologies are described, and the anti-drone technologies of each company and agency are compiled and analyzed.

URL Signatures for Improving URL Normalization (URL 정규화 향상을 위한 URL 서명)

  • Soon, Lay-Ki;Lee, Sang-Ho
    • Journal of KIISE:Databases / v.36 no.2 / pp.139-149 / 2009
  • In the standard URL normalization mechanism, URLs are normalized syntactically by a set of predefined steps. In this paper, we propose to complement standard URL normalization by incorporating semantically meaningful metadata of the web pages. The metadata considered are the body text and the page size, both of which can be extracted during HTML parsing. The results of our first exploratory experiment indicate that body text is effective in identifying equivalent URLs. Hence, in the second experiment, given a URL that has undergone standard normalization, we construct its URL signature by hashing the body text of the associated web page with Message-Digest algorithm 5 (MD5). URLs that share identical signatures are considered equivalent in our scheme. The results of the second experiment show that the proposed URL signatures reduced redundant URLs by a further 32.94% compared with standard URL normalization alone.
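
The signature idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the whitespace collapsing step, and the example URLs and body texts are all hypothetical, and the URLs are assumed to have already passed standard syntactic normalization.

```python
import hashlib
from collections import defaultdict


def url_signature(body_text: str) -> str:
    """Return the MD5 hex digest of a page's body text."""
    # Collapsing whitespace is an assumption of this sketch,
    # not a step stated in the abstract.
    collapsed = " ".join(body_text.split())
    return hashlib.md5(collapsed.encode("utf-8")).hexdigest()


def group_equivalent_urls(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose pages share an identical signature."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[url_signature(body)].append(url)
    return [sorted(urls) for urls in groups.values()]


# Hypothetical URLs mapped to the body text extracted during HTML parsing.
pages = {
    "http://example.com/index.html": "Welcome to the example page.",
    "http://example.com/": "Welcome to   the example page.",
    "http://example.com/about": "About us.",
}
equivalent_groups = group_equivalent_urls(pages)
```

URLs that land in the same group would be treated as equivalent, so a crawler could keep one representative per group and skip the rest.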

A Study of Comparison between Cruise Tours in China and U.S.A through Big Data Analytics

  • Shuting, Tao;Kim, Hak-Seon
    • Culinary science and hospitality research / v.23 no.6 / pp.1-11 / 2017
  • The purpose of this study was to compare cruise tours in China and the U.S.A. through semantic network analysis of big data, collecting online data with SCTM (Smart crawling & Text mining), a data collection and processing program. The data analysis period ran from January 1, 2015 to August 15, 2017. "cruise tour, china" and "cruise tour, usa" were used as keywords to collect related data, and NetDraw, packaged with UCINET 6.0, was used for data analysis. Currently, Chinese cruisers are concerned with cruising destinations, while American cruisers pay more attention to the onboard experience and cruising expenditure. CONCOR (convergence of iterated correlation) analysis produced three clusters for Chinese cruise tours: domestic destinations, international destinations, and hospitality tourism. For American cruise tours, four groups were identified: cruise expenditure, onboard experience, cruise brand, and destinations. Since cruise tourism in America is highly developed, this study is also intended to provide significant, social-network-oriented suggestions for Chinese cruise tourism.

A Study on Big Data Processing Technology Based on Open Source for Expansion of LIMS (실험실정보관리시스템의 확장을 위한 오픈 소스 기반의 빅데이터 처리 기술에 관한 연구)

  • Kim, Soon-Gohn
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.2 / pp.161-167 / 2021
  • A Laboratory Information Management System (LIMS) is a centralized database for storing, processing, retrieving, and analyzing laboratory data; the term refers to a computer system specially designed for laboratories that perform inspection, analysis, and testing tasks. In particular, a LIMS provides functions that support laboratory operation, and it requires workflow management or data tracking support. In this paper, we collect data from websites and various channels using crawling, one of the automated big data collection technologies, to support laboratory operation. From the collected test methods and contents, those that are useful to the tester are recommended. In addition, we implement a complementary LIMS platform that can verify the collection channels by managing feedback.

Big Data Analysis of the Annals of the Joseon Dynasty Using Jsoup (Jsoup를 이용한 조선왕조실록의 빅 데이터 분석)

  • Bong, Young-Il;Lee, Choong-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.131-133 / 2021
  • The Annals of the Joseon Dynasty are important records registered with UNESCO. This paper proposes a method for big data analysis that examines word frequencies in the Annals of the Joseon Dynasty as translated into Korean. If the Annals are accessed on a website and the page source is analyzed directly, the keywords required by HTML syntax are mixed in, which makes it difficult to analyze word frequencies in the relevant text alone. In this paper, we propose a method to analyze the text of the Annals of the Joseon Dynasty using the crawling functions of Jsoup, a Java library. In the experiment, only the Taejo portion of the Annals was extracted to verify the validity of the method.
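
The problem described here, HTML keywords polluting word counts, can be illustrated with a small sketch. The paper uses Jsoup in Java; the version below swaps in Python's standard `html.parser` module to show the same idea of separating visible text from markup before counting. The class and function names, the sample HTML, and the whitespace tokenization are assumptions of this sketch; Korean text would likely need a proper morphological tokenizer.

```python
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_frequencies(html: str) -> Counter:
    """Strip markup, then count whitespace-separated words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return Counter(" ".join(parser.chunks).split())


# Hypothetical page fragment standing in for a crawled page.
sample_html = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>king Taejo</p><div>king court</div></body></html>"
)
freq = word_frequencies(sample_html)
```

Counting on the extracted text rather than the raw source keeps tag names and style rules out of the frequency table.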

A Web application vulnerability scoring framework by categorizing vulnerabilities according to privilege acquisition (취약점의 권한 획득 정도에 따른 웹 애플리케이션 취약성 수치화 프레임워크)

  • Cho, Sung-Young;Yoo, Su-Yeon;Jeon, Sang-Hun;Lim, Chae-Ho;Kim, Se-Hun
    • Journal of the Korea Institute of Information Security & Cryptology / v.22 no.3 / pp.601-613 / 2012
  • Secure web applications must be designed and implemented to provide safe web services. For this reason, several scoring frameworks exist to measure vulnerabilities in web applications. However, these frameworks do not classify vulnerabilities by seriousness, because they simply accumulate the scores of individual factors in a vulnerability. We rate and score vulnerabilities according to the probability of privilege acquisition so that vulnerabilities found in web applications can be prioritized. Our proposed framework also provides a method to score all web applications provided by an organization, so that the organization can determine which web application is the least secure and should be treated first. We apply our scoring framework to data listing vulnerabilities found in web applications by a crawling-based web scanner, and we show the importance of categorizing vulnerabilities according to privilege acquisition.

Design and implementation of a music recommendation model through social media analytics (소셜 미디어 분석을 통한 음악 추천 모델의 설계 및 구현)

  • Chung, Kyoung-Rock;Park, Koo-Rack;Park, Sang-Hyock
    • Journal of Convergence for Information Technology / v.11 no.9 / pp.214-220 / 2021
  • With the rapid spread of smartphones, it has become common to listen to music everywhere, like background music in daily life, so a music database that can make recommendations according to individual circumstances and conditions is needed. This paper proposes a music recommendation model based on social media. Since hashtags include emotions, situations, time of day, weather, and so on, it is possible to build a social media-based database that reflects the opinions of various people through collective intelligence. We use web crawling to collect and categorize the different hashtags in posts that carry music title hashtags, so that real listeners' opinions about music can be used in a database. Data from social media is used to create the music database, and music is classified in a different way from collaborative filtering, which is mainly used by existing music platforms.
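
The hashtag collection step in this abstract can be sketched as follows. This is a hypothetical illustration, not the paper's model: the function name, the regular expression, the example posts, and the title tag `bluesky` are all made up, and the posts are assumed to have been crawled already.

```python
import re
from collections import Counter, defaultdict

# Matches a '#' followed by word characters (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#(\w+)")


def build_tag_index(posts, music_titles):
    """Map each music-title hashtag to counts of the other hashtags it appears with."""
    index = defaultdict(Counter)
    for post in posts:
        tags = {tag.lower() for tag in HASHTAG_RE.findall(post)}
        titles_in_post = tags & music_titles
        context_tags = tags - music_titles
        for title in titles_in_post:
            index[title].update(context_tags)
    return index


# Hypothetical crawled posts and a hypothetical set of known title hashtags.
posts = [
    "Perfect for the commute #bluesky #morning #rainy",
    "#BlueSky on repeat #workout #morning",
    "No music today #rainy",
]
music_titles = {"bluesky"}
index = build_tag_index(posts, music_titles)
```

Each title ends up with a tally of the situation, mood, or weather tags that listeners attached to it, which is the kind of record a recommendation database could be built from.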

A Study on the Development of Product Planning Prediction Model Using Logistic Regression Algorithm (로지스틱 회귀 알고리즘을 활용한 상품 기획 예측 모형 개발에 관한 연구)

  • Ahn, Yeong-Hwil;Park, Koo-Rack;Kim, Dong-Hyun;Kim, Do-Yeon
    • Journal of the Korea Convergence Society / v.12 no.9 / pp.39-47 / 2021
  • This study proposes a product planning prediction model that uses a logistic regression algorithm to predict seasonal factors and rapidly changing product trends. First, we collected unstructured consumer data from portal sites and online markets using web crawling, and extracted meaningful information about products through preprocessing that transforms the data into a standardized form. The 11,200 collected data items were analyzed with logistic regression to assess consumer satisfaction, perform frequency analysis, and identify the advantages and disadvantages of products. The analysis showed a consumer satisfaction of 92%, and product defect issues were identified through frequency analysis. An evaluation of the developed product planning prediction program on use satisfaction, system efficiency, and system effectiveness showed high satisfaction. Defect issues are highly meaningful data because they provide the information needed to quickly recognize current product problems and establish improvement strategies.

Development of A Uniform And Casual Clothing Recognition System For Patient Care In Nursing Hospitals

  • Yun, Ye-Chan;Kwak, Young-Tae
    • Journal of the Korea Society of Computer and Information / v.25 no.12 / pp.45-53 / 2020
  • The purpose of this paper is to reduce the rate of patient accidents that may occur in nursing hospitals. Specifically, the system determines whether a person approaching a dangerous area belongs to the elderly (patient uniform) group or the practitioner (casual clothing) group, based on the clothing shown on CCTV. We collected the basic training data through web crawling and from nursing hospitals. Model training data was then created with Image Generator and Labeling program. Due to the limited performance of CCTV, it is difficult to create a model with both high accuracy and high speed. Therefore, we implemented a ResNet model with relatively high accuracy and a YOLO3 model with relatively high speed, so that nursing hospitals can choose the model they want. As a result, we implemented a model that can distinguish patient uniforms from casual clothes with appropriate accuracy. It is therefore expected to contribute to reducing safety accidents in nursing hospitals by preventing the elderly from entering danger zones.

Image Super-Resolution for Improving Object Recognition Accuracy (객체 인식 정확도 개선을 위한 이미지 초해상도 기술)

  • Lee, Sung-Jin;Kim, Tae-Jun;Lee, Chung-Heon;Yoo, Seok Bong
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.6 / pp.774-784 / 2021
  • Object detection and recognition is a very important task in computer vision, and related research is being actively conducted. In practice, however, recognition accuracy is often degraded by the resolution mismatch between the training image data and the test image data. To solve this problem, we propose an image super-resolution technique that improves object recognition accuracy, and we design and develop an integrated object recognition and super-resolution framework. Specifically, we built 11,231 license plate training images through web crawling and artificial data generation, and trained the image super-resolution neural network with an objective function defined to be robust to image flips. To verify the performance of the proposed algorithm, we ran the trained super-resolution and recognition models on 1,999 test images and confirmed that the proposed super-resolution technique improves the accuracy of character recognition.