• Title/Summary/Keyword: Jsoup

Search Result 6

Big Data Analysis of the Annals of the Joseon Dynasty Using Jsoup (Jsoup를 이용한 조선왕조실록의 빅 데이터 분석)

  • Bong, Young-Il; Lee, Choong-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.131-133 / 2021
  • The Annals of the Joseon Dynasty are important records registered with UNESCO. This paper proposes a method for big data analysis based on word frequencies in the Korean translation of the Annals. When the Annals are accessed from a website and the page source is read directly, the source contains HTML markup keywords, which makes it difficult to analyze word frequencies in the text that actually matters. We therefore propose extracting the text of the Annals with the crawling functions of Jsoup, a Java library. In the experiment, only the Taejo portion of the Annals was extracted to verify the validity of the method.

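The word-frequency step described in this abstract can be sketched in Java roughly as follows. This is a minimal illustration, not the authors' code: it assumes the body text has already been separated from the HTML markup (with Jsoup this is typically done with something like `Jsoup.parse(html).text()`), and the class name `WordFrequency`, the whitespace tokenization, and the English sample sentence are assumptions made here. Korean text would normally need morphological analysis rather than a plain whitespace split.

```java
import java.util.HashMap;
import java.util.Map;

class WordFrequency {
    // Counts how often each whitespace-separated token occurs in plain text.
    // Assumption: the input is body text with HTML markup already removed.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            // Strip leading and trailing punctuation; letters and digits of any script are kept.
            String word = token.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (!word.isEmpty()) {
                freq.merge(word, 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, not taken from the Annals.
        Map<String, Integer> freq = count("The king spoke. The king ordered.");
        System.out.println(freq.get("king")); // prints 2
    }
}
```

Counting on extracted text rather than on the raw page source is what keeps tag and attribute names out of the frequency table.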

Web Crawler Service Implementation for Information Retrieval based on Big Data Analysis (빅데이터 분석 기반의 정보 검색을 위한 웹 크롤러 서비스 구현)

  • Kim, Hye-Suk; Han, Na; Lim, Suk-Ja
    • Journal of Digital Contents Society / v.18 no.5 / pp.933-942 / 2017
  • In this paper, we propose a web crawler service for efficiently collecting information about external activities, competitions, and scholarships for college students and job seekers. The proposed service uses Jsoup tree analysis and JSON-format data transmission to avoid duplicate crawling while crawling at high speed. After collecting relevant information for 24 hours, we confirmed that the web crawler service ran with an accuracy of 100%. We expect that applying the service to a wider range of websites in the future will further improve it.
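One common way to avoid crawling the same page twice is to keep a set of normalized URLs that have already been offered to the crawler. The sketch below illustrates that general idea only; the abstract does not say how the authors detect duplicates, and the class name `CrawlFrontier`, its methods, and the example URLs are hypothetical.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CrawlFrontier {
    private final Set<String> seen = new HashSet<>();

    // Normalizes a URL so that trivially different spellings map to one key:
    // lower-cases scheme and host, resolves "." and ".." segments, drops the fragment.
    static String normalize(String url) {
        URI u = URI.create(url).normalize();
        String scheme = u.getScheme() == null ? "" : u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost() == null ? "" : u.getHost().toLowerCase(Locale.ROOT);
        String port = u.getPort() == -1 ? "" : ":" + u.getPort();
        String rawPath = u.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        String query = u.getRawQuery() == null ? "" : "?" + u.getRawQuery();
        return scheme + "://" + host + port + path + query;
    }

    // Returns true only the first time a (normalized) URL is offered.
    boolean offer(String url) {
        return seen.add(normalize(url));
    }

    public static void main(String[] args) {
        CrawlFrontier frontier = new CrawlFrontier();
        System.out.println(frontier.offer("http://example.com/notice?id=1"));     // prints true
        System.out.println(frontier.offer("HTTP://Example.com/notice?id=1#top")); // prints false
    }
}
```

A crawler would call `offer` before fetching a link and skip the fetch whenever it returns false.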

Information-providing Application Based on Web Crawling (웹 크롤링을 통한 개인 맞춤형 정보제공 애플리케이션)

  • Ju-Hyeon Kim; Jeong-Eun Choi; U-Gyeong Shin; Min-Jun Piao; Tae-Kook Kim
    • Journal of Internet of Things and Convergence / v.10 no.1 / pp.21-27 / 2024
  • This paper presents the implementation of a personalized, real-time information-providing application that uses filtering and web crawling. The application uses the Jsoup library to crawl web pages and selects content according to keywords set by the user. The crawled data is stored in a MySQL database and presented to the user through an application implemented with Flutter. Mobile push notifications are delivered using Firebase Cloud Messaging (FCM). In this way, users can obtain the information they want quickly and efficiently. The approach is also expected to be applicable to the Internet of Things (IoT), where big data is generated, so that users receive only the information they need.
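The keyword-filtering step can be illustrated with a short Java sketch. It assumes the crawled items are available as a list of titles and that a title matches when it contains any user keyword, ignoring case. The class name `KeywordFilter`, the matching rule, and the sample data are assumptions made for illustration, not details taken from the paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class KeywordFilter {
    // Returns the crawled titles that contain at least one user keyword (case-insensitive).
    static List<String> filter(List<String> titles, List<String> keywords) {
        List<String> matches = new ArrayList<>();
        for (String title : titles) {
            String lower = title.toLowerCase(Locale.ROOT);
            for (String keyword : keywords) {
                String k = keyword.trim().toLowerCase(Locale.ROOT);
                if (!k.isEmpty() && lower.contains(k)) {
                    matches.add(title);
                    break; // one matching keyword is enough; avoids adding the title twice
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical crawled titles and user keywords.
        List<String> titles = Arrays.asList(
                "Scholarship application opens", "Cafeteria menu", "Spring internship fair");
        System.out.println(filter(titles, Arrays.asList("scholarship", "internship")));
    }
}
```

In the described system the filtered items would then be read by the app from the database; that storage layer is left out here.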

Information-providing Application Based on Web Crawling (웹 크롤링을 통한 개인 맞춤형 정보제공 애플리케이션)

  • Ju-Hyeon Kim; Jeong-Eun Choi; U-Gyeong Shin; Min-Jun Piao; Tae-Kook Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.295-296 / 2023
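  • In this paper, we study a personalized information-providing application based on web crawling. The service uses Java's Jsoup library to crawl web data and stores the data in MySQL. It then filters the data by user-specified keywords and provides the matching information to the user. For example, when a notice related to a user-specified keyword is updated, the user can check it within the implemented app, and we also implemented a service that delivers the updated information in real time through KakaoTalk AlimTalk notifications.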
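Detecting that a notice has been updated amounts to comparing each crawl against what is already stored. The Java sketch below shows that comparison with an in-memory set standing in for the MySQL table mentioned in the abstract. The class name `NoticeTracker`, the method name, and the use of string notice IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NoticeTracker {
    // In-memory stand-in for the stored notices (the paper stores crawled data in MySQL).
    private final Set<String> storedIds = new HashSet<>();

    // Records the crawled notice IDs and returns those not seen in earlier crawls, in crawl order.
    List<String> recordAndFindNew(List<String> crawledIds) {
        List<String> fresh = new ArrayList<>();
        for (String id : crawledIds) {
            if (storedIds.add(id)) { // add() returns false when the ID was already stored
                fresh.add(id);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        NoticeTracker tracker = new NoticeTracker();
        tracker.recordAndFindNew(Arrays.asList("n1", "n2"));
        System.out.println(tracker.recordAndFindNew(Arrays.asList("n2", "n3"))); // prints [n3]
    }
}
```

Only the IDs returned by the second call would need to trigger a notification.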

A Study on Leadership Typology in Sports Leaders Based on Big Data Analysis (빅데이터 분석을 활용한 스포츠 지도자들의 리더십 유형에 관한 연구)

  • Park, Eun-Mi; Seo, Joung-Hae
    • Journal of the Korea Convergence Society / v.10 no.7 / pp.191-198 / 2019
  • This paper investigates the types of leadership shown by foreign coaches in charge of the Korean national soccer team. To that end, news articles published during each coach's tenure were crawled and analyzed. The analysis yielded the following results. First, successful sports leaders showed their own specific types of leadership. Second, unsuccessful sports leaders also showed specific types of leadership. The leadership types identified through the analysis have practical implications in that they suggest the kinds of effective leadership required of sports leaders who manage and lead athletes while producing tangible results and performance.

Word Frequency-Based Big Data Analysis for the Annals of the Joseon Dynasty (조선왕조실록 분석을 위한 단어 빈도수 기반 빅 데이터 분석)

  • Bong, Young-Il; Lee, Choong-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.707-709 / 2022
  • The Annals of the Joseon Dynasty are a chronicle that compiles 472 years of Joseon history, from Taejo to Cheoljong. The Annals, National Treasure No. 151, are an important documentary heritage, but their vast content makes them difficult to analyze. Rather than analyzing the entire content, it is therefore necessary to extract and analyze important words. In this paper, we propose a method that extracts words from the main body of the Annals through web crawling and analyzes the translated text based on data sorted by word frequency. In this study, only the King Sejong portion of the Annals was extracted, and importance was analyzed according to word frequency.

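Sorting words by frequency, as this abstract describes, can be sketched in Java as below. The sketch assumes a word-to-count map has already been built from the crawled text. The class name `TopWords`, the alphabetical tie-breaking rule, and the sample counts are assumptions for illustration rather than details from the paper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopWords {
    // Returns up to n words ordered by descending count; ties are broken alphabetically.
    static List<String> top(Map<String, Integer> freq, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // higher count first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> words = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < n; i++) {
            words.add(entries.get(i).getKey());
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical counts, not results from the paper.
        Map<String, Integer> freq = new HashMap<>();
        freq.put("king", 3);
        freq.put("minister", 2);
        freq.put("army", 2);
        freq.put("rice", 1);
        System.out.println(top(freq, 3)); // prints [king, army, minister]
    }
}
```

A fixed tie-breaking rule keeps the ranking reproducible between runs, since `HashMap` iteration order is not guaranteed.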