• Title/Summary/Keyword: Web-Crawling

A Study on Usage Frequency of Translated English Phrase Using Google Crawling

  • Kim, Kyuseok;Lee, Hyunno;Lim, Jisoo;Lee, Sungmin
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.689-692 / 2020
  • People have used online English dictionaries to look up the meanings of English words and example sentences. With the development of AI technologies such as machine learning, documents can now be translated in real time by Kakao, Papago, Google and other translators. However, problems with translation accuracy remain. AI assistants can also interpret in real time, so such systems are being used to translate web pages and papers into Korean. In this paper, we studied the usage frequency of English phrases combined from dictionary entries by analyzing the number of search results returned by Google. We expect the results to help people use English more fluently.
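
The core measurement here is comparing how often alternative English phrasings occur, using Google's reported result counts as a proxy. A minimal sketch of that comparison follows, assuming the result-count strings have already been fetched; the "About N results" format and the sample phrases are hypothetical, and no Google API is called.

```python
import re


def parse_hit_count(text: str) -> int:
    """Extract the first integer (thousands separators allowed) from a result-count string."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return 0
    return int(match.group().replace(",", ""))


def rank_phrases(hit_texts: dict[str, str]) -> list[tuple[str, int, float]]:
    """Rank candidate phrases by hit count and report each one's share of the total."""
    counts = {phrase: parse_hit_count(text) for phrase, text in hit_texts.items()}
    total = sum(counts.values()) or 1  # avoid division by zero when every count is 0
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(phrase, n, n / total) for phrase, n in ranked]


if __name__ == "__main__":
    # Hypothetical result-count strings for two candidate phrasings.
    samples = {
        "make a decision": "About 900,000 results",
        "do a decision": "About 100,000 results",
    }
    for phrase, n, share in rank_phrases(samples):
        print(f"{phrase}: {n} hits ({share:.1%})")
```

Raw counts vary with search settings and over time, so the share of the total is usually a more stable basis for comparing phrasings than the absolute numbers.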

Crawling Analysis Implementation of Cyber Crime Information in Deep Web Environment (딥웹 환경에서 사이버범죄 정보 수집분석 구현)

  • Hwang, Deok-Hyun;Park, So-Young;Bae, Ji-Seon;Jeong, Song-Ju;Hong, Jin-Keun;Park, Hyun-Joo
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.390-392 / 2020
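  • This paper analyzes information about cybercrime activity in the deep web environment. The study was designed so that the analyzed information can serve as supplementary information for crime analysis by cyber investigation agencies, and can be used in education that makes adolescents aware of the seriousness and illegality of cybercrime. To this end, we implemented a solution environment that collects and analyzes, by keyword, information active in the deep web, based on three processes: crawling, parsing, and visualization. The analysis summarizes, among the many criminal activities that occur online, the crime types most likely to occur and the crimes that require careful investigation, and it includes functions that help investigators determine the direction of an investigation.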
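
The crawling, parsing and visualization stages can be sketched for a single fetched page with Python's standard library. This is a minimal sketch under stated assumptions: the page HTML is already downloaded (actual deep-web crawling, for example through a Tor proxy, is out of scope), and the keyword list is hypothetical rather than the one used in the paper.

```python
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Parsing step: collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_keywords(html: str, keywords: list[str]) -> Counter:
    """Analysis step: count case-insensitive substring occurrences of each keyword."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return Counter({kw: text.count(kw.lower()) for kw in keywords})


def text_bar_chart(counts: Counter) -> str:
    """Visualization step: a plain-text bar chart, most frequent keyword first."""
    return "\n".join(f"{kw:<10} {'#' * n} ({n})" for kw, n in counts.most_common())


if __name__ == "__main__":
    page = (
        "<html><body><p>Selling stolen cards. Cards and accounts.</p>"
        "<script>var cards;</script></body></html>"
    )
    hypothetical_keywords = ["cards", "accounts", "drugs"]  # illustrative only
    print(text_bar_chart(count_keywords(page, hypothetical_keywords)))
```

Substring counting is deliberately crude (it also matches "cards" inside "scorecards"); a real pipeline would tokenize first.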

A Study for Conflict in Public Construction Projects Based on Online News (온라인 뉴스 기반 공공건설사업 갈등지수 산정에 관한 기초연구)

  • Baek, Seungwon;Han, Seung Heon;Yun, Sungmin;Lim, Jonglok;Nam, Jihyun
    • Proceedings of the Korean Institute of Building Construction Conference / 2021.05a / pp.277-278 / 2021
  • Conflict in public construction projects has increased over the last decades. It not only entails enormous social and economic costs but also burdens stakeholders with unnecessary expense and wasted time. This study defines a conflict index for public construction projects based on news data and calculates it for representative past and current public construction projects whose conflicts have escalated to the national level. The results indicate that the major conflict issues of the 2nd Jeju Airport Project are environment and location, whereas those of the Gaduk New Airport Project are safety, location, and necessity. This approach is expected to enable construction project managers to manage conflicts quantitatively by comparison with past cases.
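
The abstract does not give the formula for the conflict index. As a purely illustrative assumption, not the authors' definition, one simple news-based measure is the share of a project's articles that mention each issue category, with categories detected by keyword lists. The category names below mirror issues named in the results, but the keyword lists and article snippets are hypothetical.

```python
def issue_shares(articles: list[str], issue_keywords: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of articles that mention at least one keyword of each issue category."""
    if not articles:
        return {issue: 0.0 for issue in issue_keywords}
    shares = {}
    for issue, keywords in issue_keywords.items():
        hits = sum(
            1 for article in articles
            if any(kw.lower() in article.lower() for kw in keywords)
        )
        shares[issue] = hits / len(articles)
    return shares


def top_issues(shares: dict[str, float], k: int = 2) -> list[str]:
    """Issue categories with the highest shares; ties are broken alphabetically."""
    ordered = sorted(shares.items(), key=lambda item: (-item[1], item[0]))
    return [issue for issue, _ in ordered[:k]]


if __name__ == "__main__":
    hypothetical_issues = {
        "environment": ["habitat", "pollution"],
        "location": ["site", "relocation"],
        "safety": ["accident", "safety"],
    }
    hypothetical_articles = [
        "Residents protest the airport site over habitat loss.",
        "Debate continues on the proposed site.",
        "Pollution concerns raised by activists.",
    ]
    shares = issue_shares(hypothetical_articles, hypothetical_issues)
    print(shares, top_issues(shares))
```

Because shares are normalized by article count, projects with very different amounts of news coverage can be compared on the same scale, which is what a comparison with past cases needs.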

Development of Dataset Collection RPA for Machine Learning (머신러닝을 위한 데이터셋 수집 RPA 개발)

  • Kim, Ki-Tae;Seo, Bo-in;Yun, Sang-Hyeok;Lee, Sei-Hoon
    • Proceedings of the Korean Society of Computer Information Conference / 2020.07a / pp.295-296 / 2020
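  • This paper describes, as part of developing an RPA (Robotic Process Automation) tool, how processed datasets are produced using the image crawling and preprocessing functions needed for machine learning and deep learning. The developed RPA tool provides functions for acquiring data for machine learning and deep learning and, more specifically, provides frequently repeated functions such as image preprocessing (Convert Gray, Histogram Equalization, Binary, Resize). By combining RPA's automation with these preprocessing functions, the tool improves work efficiency.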
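
The four preprocessing operations named above (Convert Gray, Histogram Equalization, Binary, Resize) are standard image operations. Below is a dependency-free sketch of each on images stored as nested lists; in practice a library such as OpenCV would be used, and the BT.601 luma weights, the fixed threshold of 128, and nearest-neighbour resizing are assumptions, not details taken from the paper.

```python
Pixel = tuple[int, int, int]  # (R, G, B), each 0-255
GrayImage = list[list[int]]


def to_gray(rgb: list[list[Pixel]]) -> GrayImage:
    """Convert Gray: weighted sum with ITU-R BT.601 luma coefficients."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]


def equalize(gray: GrayImage) -> GrayImage:
    """Histogram Equalization: remap intensities through the normalized cumulative histogram."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    if total == 0:
        return [row[:] for row in gray]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # only one intensity present: nothing to spread out
        return [row[:] for row in gray]
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * 255)) for c in cdf]
    return [[lut[v] for v in row] for row in gray]


def binarize(gray: GrayImage, threshold: int = 128) -> GrayImage:
    """Binary: 255 where intensity >= threshold, else 0."""
    return [[255 if v >= threshold else 0 for v in row] for row in gray]


def resize_nearest(img: GrayImage, new_w: int, new_h: int) -> GrayImage:
    """Resize: nearest-neighbour sampling of a non-empty image."""
    old_h, old_w = len(img), len(img[0])
    return [
        [img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


if __name__ == "__main__":
    image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
    processed = resize_nearest(binarize(equalize(to_gray(image))), 4, 4)
    for row in processed:
        print(row)
```

Each function takes and returns a plain image, so an automation tool can chain them in whatever order a given dataset needs, as the usage line does.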

Automated Image Classification Model Using Web Crawling (웹 크롤링을 사용한 자동화된 이미지 분류 모델)

  • Lee, Ju-Hyeok;Kim, Mi-Hui
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.719-722 / 2021
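  • Deep learning has recently been adopted in many fields, such as image recognition and speech recognition. However, using deep learning requires large datasets, which are difficult and time-consuming to build. This paper therefore proposes an image classification model that collects image datasets for user-specified categories through web crawling, automates the construction of datasets that can be fed to a deep learning model by preprocessing the collected data, and achieves short training time and high accuracy through transfer learning.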
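
One part of automated dataset construction that can be shown without a network or a deep learning framework is turning crawled images into clean training and validation splits. The sketch assumes the crawler has already returned `(category, image_bytes)` pairs. It drops byte-identical duplicates, which crawled image sets often contain, using SHA-256 hashes, then makes a reproducible per-category split. The 80/20 ratio and the seed are arbitrary choices, and the transfer-learning step is not shown.

```python
import hashlib
import random
from collections import defaultdict


def build_splits(samples, val_ratio=0.2, seed=0):
    """Deduplicate crawled images by content hash, then split each category into train/val."""
    seen = set()
    by_category = defaultdict(list)
    for category, data in samples:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # skip exact duplicates, even across categories
        seen.add(digest)
        by_category[category].append(data)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for category in sorted(by_category):
        items = by_category[category]
        rng.shuffle(items)
        n_val = int(len(items) * val_ratio)
        val.extend((category, d) for d in items[:n_val])
        train.extend((category, d) for d in items[n_val:])
    return train, val


if __name__ == "__main__":
    crawled = [("cat", b"img-1"), ("cat", b"img-2"), ("cat", b"img-1"), ("dog", b"img-3")]
    train, val = build_splits(crawled, val_ratio=0.5)
    print(len(train), len(val))
```

Splitting per category keeps every class represented in the validation set, which matters when crawled categories have very different sizes.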

Implementation of perfume recommendation service using web crawling and image color extraction artificial intelligence (웹 크롤링과 이미지 색상 추출 인공지능을 이용한 향수 추천 서비스 구현)

  • Kim, Yu-jin;Lee, Ye-lim;Jung, Sung-Yoon;Jo, Yu-jin
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.758-759 / 2023
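  • This paper implements a service that recommends perfumes suited to each user by using web crawling and an AI-based color extraction function. The service was built with Java, which is convenient for website development, and Python, which is convenient for implementing web crawling and AI.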
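
The abstract does not say which color extraction model is used. As an assumed, much simpler stand-in, an image's dominant colors can be estimated by coarse color quantization. Each RGB channel is snapped into a few buckets, the buckets are counted, and the most frequent bucket centres are reported. Pixels are given as a plain list of RGB tuples. Loading them from an image file (for example with Pillow) and mapping colors to perfumes are not shown.

```python
from collections import Counter


def dominant_colors(pixels, levels=4, top_k=3):
    """Return the top_k most frequent quantized colors as ((r, g, b) bucket centre, pixel count)."""
    step = 256 // levels  # width of each channel bucket

    def bucket(v):
        # Clamp so values near 255 stay in the last bucket when levels does not divide 256.
        return min(v // step, levels - 1)

    counts = Counter((bucket(r), bucket(g), bucket(b)) for r, g, b in pixels)
    result = []
    for (qr, qg, qb), n in counts.most_common(top_k):
        centre = (qr * step + step // 2, qg * step + step // 2, qb * step + step // 2)
        result.append((centre, n))
    return result


if __name__ == "__main__":
    pixels = [(250, 10, 10)] * 5 + [(10, 10, 250)] * 3 + [(128, 128, 128)]
    print(dominant_colors(pixels, levels=4, top_k=2))
```

Fewer levels merge similar shades into one dominant color. More levels preserve finer distinctions but scatter counts across more buckets.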

An Implementation of System for Detecting and Filtering Malicious URLs (악성 URL 탐지 및 필터링 시스템 구현)

  • Chang, Hye-Young;Kim, Min-Jae;Kim, Dong-Jin;Lee, Jin-Young;Kim, Hong-Kun;Cho, Seong-Je
    • Journal of KIISE:Computing Practices and Letters / v.16 no.4 / pp.405-414 / 2010
  • According to SecurityFocus statistics from 2008, client-side attacks through Microsoft Internet Explorer increased by more than 50%. In this paper, we implemented a behavior-based malicious web page detection system and a blacklist-based malicious web page filtering system. To do this, we first built a crawling system to collect target URLs efficiently. The malicious URL detection system, which runs on a designated server, actively visits and renders the collected web pages in a virtual machine environment. To determine whether each web page is malicious, the system checks the state changes of the virtual machine after rendering the page. If abnormal state changes are detected, the rendered page is judged malicious and added to the blacklist of malicious web pages. The malicious URL filtering system, which runs on the web client machine, filters malicious web pages against the blacklist when a user visits websites. We improved the performance of the detection system by automatically handling message boxes during URL analysis. Experimental results show that game sites contain up to three times more malicious pages than other sites, and that many attacks create files and modify registry keys.
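
Both halves of the system described above can be sketched in a few lines under simplifying assumptions. A VM "state snapshot" is modelled here as a set of file paths plus a mapping of registry keys to values; taking real snapshots from a virtual machine is out of scope. A page is flagged when rendering it creates files or adds or changes registry keys. The client-side filter normalizes URLs before checking them against the blacklist. All class names, paths and URLs are hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit, urlunsplit


@dataclass
class Snapshot:
    files: set[str] = field(default_factory=set)
    registry: dict[str, str] = field(default_factory=dict)


def state_changes(before: Snapshot, after: Snapshot) -> list[str]:
    """Detection side: list new files and added or modified registry keys after rendering."""
    changes = [f"file created: {p}" for p in sorted(after.files - before.files)]
    for key in sorted(after.registry):
        if before.registry.get(key) != after.registry[key]:
            changes.append(f"registry changed: {key}")
    return changes


def normalize_url(url: str) -> str:
    """Lower-case scheme and network location and drop the fragment, so trivial variants match."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class Blacklist:
    def __init__(self) -> None:
        self._urls: set[str] = set()

    def add(self, url: str) -> None:
        self._urls.add(normalize_url(url))

    def is_blocked(self, url: str) -> bool:
        """Filtering side: checked on the client before a page is loaded."""
        return normalize_url(url) in self._urls


def analyze(url: str, before: Snapshot, after: Snapshot, blacklist: Blacklist) -> list[str]:
    """Blacklist the URL if rendering it produced any abnormal state change."""
    changes = state_changes(before, after)
    if changes:
        blacklist.add(url)
    return changes


if __name__ == "__main__":
    bl = Blacklist()
    clean = Snapshot({"C:/a.txt"}, {"HKCU/Run": ""})
    dirty = Snapshot({"C:/a.txt", "C:/evil.exe"}, {"HKCU/Run": "evil.exe"})
    print(analyze("http://games.example.com/play", clean, dirty, bl))
    print(bl.is_blocked("HTTP://Games.Example.com/play#top"))
```

Keeping detection (server side) and filtering (client side) connected only through the blacklist mirrors the split described in the abstract. The client never needs to render a suspicious page itself.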

Text Data Analysis Model Based on Web Application (웹 애플리케이션 기반의 텍스트 데이터 분석 모델)

  • Jin, Go-Whan
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.785-792 / 2021
  • Since the Fourth Industrial Revolution, advances in technologies such as artificial intelligence and big data have brought various changes to society as a whole, and the amount of data that can be collected while applying these technologies is growing rapidly. In academia in particular, existing literature is analyzed to grasp research trends. Such analysis organizes the flow of research and its methodologies and themes, and by identifying the subjects currently under discussion it contributes substantially to setting the direction of future research. However, without programming expertise it is difficult to access and collect the data needed to analyze documents. This paper proposes a web application model for text-mining-based topic modeling. With the proposed model, users who lack specialized knowledge of data analysis methods can still collect, store, and text-analyze research papers, and researchers can analyze previous research and research trends. The model is expected to reduce the time and effort required for data analysis.
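
Topic modeling itself (for example LDA) needs a dedicated library. The text-mining step it builds on, turning each collected paper into weighted terms, can be sketched with the standard library. This TF-IDF sketch is an assumed illustration, not the proposed model. The tokenizer and stop-word list are deliberately minimal and hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "on", "with", "is", "are"}


def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Per-document TF-IDF: relative term frequency times log(N / document frequency)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        weights.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def top_terms(weights: list[dict[str, float]], k: int = 3) -> list[list[str]]:
    """Highest-weighted terms per document; ties broken alphabetically."""
    return [sorted(w, key=lambda t: (-w[t], t))[:k] for w in weights]


if __name__ == "__main__":
    abstracts = [
        "web crawling of news",
        "topic modeling of papers",
        "web crawling for topic trends",
    ]
    print(top_terms(tfidf(abstracts), k=2))
```

Terms that occur in every document get weight zero, which is why TF-IDF highlights what distinguishes one paper from the rest of the collection.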

A Study on the Identification of fake Estimate Service using DID (분산신원증명 기술을 활용한 허위 부동산 매물정보 검출에 관한 연구)

  • Moon, Jeong-Kyung;Kim, Jin-Mook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.649-651 / 2021
  • In recent years, O2O services for real estate sales have become widespread on web platforms and apps. They allow sellers, buyers, and real estate brokers to conclude sales and charter contracts quickly and conveniently. However, in O2O-based real estate listing systems, buyers waste time and money because of fake listings, partially altered listing information, and listing information that is intentionally left unposted. We therefore propose a method for detecting whether real estate listing information on a web platform is false, and we design and implement a system based on it. To this end, we propose a method for verifying personal identity and property information based on DID, a distributed identity authentication protocol. The proposed false-listing detection system works in three steps: it determines whether the listing exists, whether the listing information has been partially altered, and whether information has been intentionally left unposted.
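
The three checks described above can be illustrated with content hashes. This is a hypothetical simplification, not the proposed DID protocol. It assumes a trusted registry, standing in for the DID-based verification of owner and property, that stores a SHA-256 digest of each listing as registered by its verified owner. The platform's posted listings are then compared against it. Listing IDs and fields are invented for the example.

```python
import hashlib
import json


def digest(listing: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(listing, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit(registry: dict[str, str], posted: dict[str, dict]) -> dict[str, str]:
    """Classify each listing ID as 'ok', 'not registered', 'modified', or 'unposted'."""
    report = {}
    for listing_id, listing in posted.items():
        if listing_id not in registry:
            report[listing_id] = "not registered"  # step 1: does the listing exist?
        elif registry[listing_id] != digest(listing):
            report[listing_id] = "modified"  # step 2: was it partially altered?
        else:
            report[listing_id] = "ok"
    for listing_id in registry.keys() - posted.keys():
        report[listing_id] = "unposted"  # step 3: was it intentionally withheld?
    return report


if __name__ == "__main__":
    registry = {"L1": digest({"address": "Unit 101", "price": 300})}
    posted = {"L1": {"address": "Unit 101", "price": 250}}
    print(audit(registry, posted))
```

Canonicalizing the JSON before hashing matters: without `sort_keys=True`, the same listing serialized with a different key order would be misreported as modified.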

Korean Collective Intelligence in Sharing Economy Using R Programming: A Text Mining and Time Series Analysis Approach (R프로그래밍을 활용한 공유경제의 한국인 집단지성: 텍스트 마이닝 및 시계열 분석)

  • Kim, Jae Won;Yun, You Dong;Jung, Yu Jin;Kim, Ki Youn
    • Journal of Internet Computing and Services / v.17 no.5 / pp.151-160 / 2016
  • The purpose of this research is to investigate current Korean popular attitudes toward, and social perceptions of, the term 'sharing economy' from a creative or socio-economic point of view. By applying text mining within a big data analysis approach, this study discovers and interprets the objective, tangible annual changes and patterns in Korean sociocultural collective intelligence over the last five years. Using crawling and Google searches, the study collected a substantial amount of time-series web metadata on the theme of the sharing economy from the World Wide Web for 2010 to 2014. The large volume of raw data on the sharing economy was then processed into value-added, meaningful word-cloud graphs and figures using the word-cloud function in R. Although accumulated data and collective intelligence about the sharing economy are still lacking, it is worth noting that this study carried out preliminary research on time-series big data analysis from the perspective of knowledge management and processing. The results can therefore serve as fundamental data for understanding the academic and industrial aspects of future sharing-economy-related markets and consumer behavior.
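
The study used R's word-cloud function. To keep all examples in one language, the sketch below shows the same underlying idea in Python. It counts term frequencies per year from time-series records and scales each term's font size linearly with its frequency, as word clouds do. The records, stop words, and the 10-40 pt size range are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "of", "and", "in", "is"}


def yearly_frequencies(records):
    """records: iterable of (year, text). Returns {year: Counter of terms}."""
    freq = defaultdict(Counter)
    for year, text in records:
        words = re.findall(r"[a-z]+", text.lower())
        freq[year].update(w for w in words if w not in STOP_WORDS)
    return dict(freq)


def font_sizes(counts, top_n=5, min_pt=10, max_pt=40):
    """Linearly map the top_n term counts to font sizes between min_pt and max_pt."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    hi, lo = top[0][1], top[-1][1]
    span = hi - lo or 1  # all counts equal: avoid division by zero
    return {term: round(min_pt + (n - lo) / span * (max_pt - min_pt)) for term, n in top}


if __name__ == "__main__":
    records = [
        (2010, "sharing economy"),
        (2014, "sharing economy sharing platform"),
        (2014, "the platform"),
    ]
    for year, counts in sorted(yearly_frequencies(records).items()):
        print(year, font_sizes(counts, top_n=3))
```

Computing sizes separately for each year is what turns a single word cloud into a time series whose year-to-year shifts can be compared.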