• Title/Summary/Keyword: Crawler

Database metadata standardization processing model using web dictionary crawling (웹 사전 크롤링을 이용한 데이터베이스 메타데이터 표준화 처리 모델)

  • Jeong, Hana;Park, Koo-Rack;Chung, Young-suk
    • Journal of Digital Convergence / v.19 no.9 / pp.209-215 / 2021
  • Data quality management has become an important issue, and providing consistent metadata is one way to improve data quality. This study presents algorithms that simplify management of the standard word dictionary used for consistent metadata management. The algorithms automate synonym management for database metadata through web dictionary crawling, and they improve data accuracy by resolving the homonym ambiguity that can arise during crawling. Compared with existing manual management, the proposed algorithm increases the reliability of metadata quality and reduces the time spent registering and managing synonym data. Further research is needed on a partially automated data standardization model, based on a detailed understanding of which data standardization tasks can be automated.
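
The synonym and homonym handling described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes the synonym candidates have already been crawled from a web dictionary (no crawling happens here), and the class name `StandardWordDictionary`, its methods, and the use of a domain tag to tell homonyms apart are all hypothetical.

```python
from collections import defaultdict


class StandardWordDictionary:
    """Hypothetical store mapping (term, domain) pairs to a standard word."""

    def __init__(self):
        # (lowercased term, domain) -> standard word
        self._standard = {}
        # lowercased term -> set of domains in which the term is registered
        self._domains = defaultdict(set)

    def register(self, standard, synonyms, domain):
        """Register a standard word and its (already crawled) synonyms for one domain."""
        for term in {standard, *synonyms}:
            key = term.lower()
            self._standard[(key, domain)] = standard
            self._domains[key].add(domain)

    def is_homonym(self, term):
        """A term registered in more than one domain is treated as a homonym."""
        return len(self._domains.get(term.lower(), ())) > 1

    def resolve(self, term, domain=None):
        """Return the standard word, or None if the term is unknown or ambiguous."""
        key = term.lower()
        domains = self._domains.get(key, set())
        if domain is None:
            if len(domains) != 1:
                return None  # unknown term, or homonym without a domain hint
            domain = next(iter(domains))
        return self._standard.get((key, domain))
```

With "client" registered under both a sales domain and a software domain, `resolve("client")` returns `None` until a domain is supplied, which is one simple way to keep homonyms from being merged silently.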

Design and Analysis of Technical Management System of Personal Information Security using Web Crawer (웹 크롤러를 이용한 개인정보보호의 기술적 관리 체계 설계와 해석)

  • Park, In-pyo;Jeon, Sang-june;Kim, Jeong-ho
    • Journal of Platform Technology / v.6 no.4 / pp.69-77 / 2018
  • Awareness of personal information protection is insufficient in end-point areas such as personal computers, smart terminals, and personal storage devices, where files containing personal information are kept. In this study, we use the Diffie-Hellman method to securely retrieve personal information files collected by a web crawler. We designed SEED and ARIA encryption using hybrid slicing to protect personal information files against attack. The encryption performance for personal information files collected by web crawling is compared in terms of the encryption and decryption rate according to key generation and the encryption and decryption sharing according to user key level. The simulation was performed on personal information files delivered in the transmission process to an external agency. Compared with existing methods, the detection rate improved by a factor of 4.64 and the information protection rate improved by 18.3%.
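
The Diffie-Hellman exchange mentioned in this abstract can be illustrated with a toy sketch. The abstract gives no parameters, so the prime `P`, the generator `G`, the function names, and the SHA-256 key derivation below are all assumptions chosen for illustration. The parameters are far too small for real use, and the SEED and ARIA ciphers from the paper are not implemented here.

```python
import hashlib
import secrets

# Toy parameters (assumptions, NOT secure): a 127-bit Mersenne prime and a small base.
P = 2**127 - 1
G = 3


def make_keypair():
    """Return (private, public) where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_key(own_private, peer_public):
    """Derive a 32-byte key from the shared secret peer_public^own_private mod P."""
    secret = pow(peer_public, own_private, P)
    # The secret is below 2**127, so it always fits in 16 bytes.
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


# Both sides exchange only their public values and arrive at the same key.
crawler_private, crawler_public = make_keypair()
server_private, server_public = make_keypair()
crawler_key = shared_key(crawler_private, server_public)
server_key = shared_key(server_private, crawler_public)
```

In a real deployment the derived key would feed a vetted cipher implementation and the group parameters would come from a standard, rather than from hand-picked values like these.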

Changes in public recognition of parabens on twitter and the research status of parabens related to toothpaste (트위터(twitter)에서의 파라벤(parabens) 관련 대중의 인식 변화와 치약내 파라벤에 대한 연구 현황)

  • Oh, Hyo-Jung;Jeon, Jae-Gyu
    • Journal of Korean Academy of Oral Health / v.41 no.2 / pp.154-161 / 2017
  • Objectives: The purpose of this study was to investigate changes in public recognition of parabens on Twitter and the research status of parabens related to toothpaste. Methods: Tweet information between 2010 and October 2016 was collected by an automatic web crawler and examined according to tweet frequency, key words (2012-October 2016), and issue tweet detection analyses to reveal changes in public recognition of parabens on Twitter. To investigate the research status of parabens related to toothpaste, queries such as "paraben," "paraben and toxicity," "paraben and (toothpastes or dentifrices)," and "paraben and (toothpastes or dentifrices) and toxicity" were used. Results: The number of tweets concerning parabens sharply increased when parabens in toothpaste emerged as a social issue (October 2014), and decreased from 2015 onward. However, toothpaste and its related terms were continuously included in the core key words extracted from tweets from 2015. They were not included in key words before 2014, indicating that the emergence of parabens in toothpaste as a social issue plays an important role in public recognition of parabens in toothpaste. The issue tweet analysis also confirmed the change in public recognition of parabens in toothpaste. Despite the expansion of public recognition of parabens in toothpaste, there are only seven research articles on the topic in PubMed. Conclusions: The general public clearly recognized parabens in toothpaste after emergence of parabens in toothpaste as a social issue. Nevertheless, the scientific information on parabens in toothpaste is very limited, suggesting that the efforts of dental scientists are required to expand scientific knowledge related to parabens in oral hygiene measures.
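
The tweet frequency analysis described in this abstract amounts to counting matching tweets per time bucket. The sketch below shows one way to do that, assuming tweets are available as `(date, text)` pairs. The function names and the monthly bucket size are hypothetical, and the study's own collection and keyword extraction methods are not reproduced.

```python
from collections import Counter
from datetime import date


def monthly_frequency(tweets, keyword):
    """Count tweets whose text contains `keyword`, grouped by (year, month)."""
    counts = Counter()
    needle = keyword.lower()
    for posted_on, text in tweets:
        if needle in text.lower():
            counts[(posted_on.year, posted_on.month)] += 1
    return counts


def peak_month(counts):
    """Return the (year, month) bucket with the most tweets, or None if empty."""
    return max(counts, key=counts.get) if counts else None
```

Calling `peak_month(monthly_frequency(tweets, "paraben"))` on a collected tweet list would point to the month in which the topic drew the most attention.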

Deep Learning Frameworks for Cervical Mobilization Based on Website Images

  • Choi, Wansuk;Heo, Seoyoon
    • Journal of International Academy of Physical Therapy Research / v.12 no.1 / pp.2261-2266 / 2021
  • Background: Deep learning research on medical images from websites has been actively conducted in health care; however, few articles address the musculoskeletal system, and deep learning-based studies on classifying orthopedic manual therapy images are only just beginning. Objectives: To create a deep learning model that categorizes cervical mobilization images and to establish a web application to determine its clinical utility. Design: Research and development. Methods: Three types of cervical mobilization images (central posteroanterior (CPA) mobilization, unilateral posteroanterior (UPA) mobilization, and anteroposterior (AP) mobilization) were obtained using the 'Download All Images' function and a web crawler. Unnecessary images were filtered out using 'Auslogics Duplicate File Finder' to obtain the final 144 images (CPA=62, UPA=46, AP=36). Training on the 3 classes was conducted in Teachable Machine. Next, the trained model source was uploaded to the web application cloud integrated development environment (https://ide.goorm.io/) and the frame was built. The trained model was tested in three environments: Teachable Machine File Upload (TMFU), Teachable Machine Webcam (TMW), and Web Service webcam (WSW). Results: In the three environments (TMFU, TMW, WSW), the accuracy for CPA mobilization images was 81-96%. The accuracy for UPA mobilization images was 43-94%, with a greater deviation than for CPA. The accuracy for AP mobilization images was 65-75%, and the deviation was not large compared to the other groups. In the three environments, the average accuracy of CPA was 92%, and the accuracy of UPA and AP was similar, up to 70%. Conclusion: This study suggests that training on images of orthopedic manual therapy using open machine learning software is possible, and that web applications built on this trained model can be used clinically.

A Study for Used Transaction Analysis System using Big Data (빅데이터를 이용한 중고 거래 분석 시스템 연구)

  • Ahn, Byeongtae
    • Journal of Digital Convergence / v.19 no.6 / pp.259-264 / 2021
  • Recently, as the number of sites supporting trade in used goods has increased, users want to search for a variety of information in real time. This change has enabled a new type of C2C (consumer-to-consumer) transaction in e-commerce. However, since each used-goods trading site has its own characteristics, it is difficult to standardize them as a whole. Therefore, in this paper, we studied a system that provides used-goods transaction data to the user in real time and delivers the desired information quickly. We researched the crawler system necessary for developing an integrated trading system for used goods in Internet e-commerce, and used a defined morpheme analyzer to provide information in the web environment desired by the user. As a result, we designed a system that provides the information users want without requiring them to access multiple used-goods sites.

Underwater Drone Development for Ship Inspection Part 1: Design, Production and Testing (선박 검사용 수중 드론 개발 Part 1: 설계·제작 및 시험)

  • Ha, Yeon-Chul;Kim, Jin-Woo;Kim, Goo;Jeong, Kyeong-Teak;Choi, Hyun-Deuk
    • Journal of the Institute of Convergence Signal Processing / v.21 no.1 / pp.38-48 / 2020
  • To inspect the hull of an existing or newly constructed ship, a professional diver directly inspects the bottom of the ship under water. However, since this work is done by people, there are many dangers such as human casualties and crashes. To solve this problem, underwater drones for visual ship inspection need to be developed. This paper describes the technology applied to the underwater drone, the use and manufacturing process of each component, and the production method, including firmware development. The drone's own driving ability and its driving ability using the crawler were measured under water and compared, and a test of the location tracking device confirmed its error relative to the actual location. The underwater drone produced through this research is expected to prevent human casualties and to provide economic benefits and stability.

Mask Wearing Detection System using Deep Learning (딥러닝을 이용한 마스크 착용 여부 검사 시스템)

  • Nam, Chung-hyeon;Nam, Eun-jeong;Jang, Kyung-Sik
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.1 / pp.44-49 / 2021
  • Recently, due to COVID-19, many studies have applied neural networks to automatic mask-wearing detection systems. Neural network approaches use either 1-stage or 2-stage detection methods, and when insufficient data are available, pretrained neural network models are fine-tuned. In this paper, the system uses a 2-stage detection method consisting of an MTCNN model for face recognition and a ResNet model for mask detection. Five ResNet models were tested as the mask detector to improve accuracy and fps in various environments. The training data consisted of 17,217 images collected using a web crawler, and 1,913 images and two one-minute videos were used for inference. The experiment showed a high accuracy of 96.39% for images and 92.98% for video, and the inference speed for video was 10.78 fps.
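
The 2-stage structure described in this abstract (find faces first, then classify each face crop) can be sketched independently of any particular model. In the sketch below the two stages are plain callables passed in by the caller. The function names, the `(top, left, bottom, right)` box format, and the nested-list image are assumptions for illustration, and no MTCNN or ResNet model is loaded.

```python
def crop(image, box):
    """Cut a (top, left, bottom, right) region out of a row-major 2D image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def detect_masks(image, face_detector, mask_classifier):
    """Run stage 1 (face boxes), then stage 2 (mask / no mask) on each crop.

    face_detector(image) -> iterable of boxes   (stand-in for a face model)
    mask_classifier(crop) -> bool               (stand-in for a mask model)
    """
    results = []
    for box in face_detector(image):
        face = crop(image, box)
        results.append((box, mask_classifier(face)))
    return results
```

Keeping the stages as injected callables means a different face detector or mask classifier can be swapped in without touching the pipeline, which matches the abstract's comparison of several classifier variants in the second stage.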

Developing and Pre-Processing a Dataset using a Rhetorical Relation to Build a Question-Answering System based on an Unsupervised Learning Approach

  • Dutta, Ashit Kumar;Wahab sait, Abdul Rahaman;Keshta, Ismail Mohamed;Elhalles, Abheer
    • International Journal of Computer Science & Network Security / v.21 no.11 / pp.199-206 / 2021
  • Rhetorical relations between two text fragments are essential information that helps natural language processing applications, such as question-answering (QA) systems and automatic text summarization, produce effective output. A QA system helps users retrieve a meaningful response. Developing such a system to interpret and respond to user requests requires datasets based on rhetorical relations. Only a limited number of datasets exist for developing an Arabic QA system, so effective QA systems for the Arabic language are lacking. Recent research shows that unsupervised learning can help a QA system reply to user queries. In this study, the researchers develop a rhetorical relation-based dataset for implementing unsupervised learning applications. A web crawler is developed to crawl Arabic content from the web. A discourse-annotated corpus is generated using rhetorical structure theory. A Naïve Bayes-based QA system is developed to evaluate the performance of the datasets. The outcome shows that the performance of the QA system improves with the proposed dataset and that the system is able to answer user queries with an appropriate response. In addition, the results on fine-grained and coarse-grained relations reveal that the dataset is highly reliable.
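
The Naïve Bayes component mentioned in this abstract can be illustrated with a small multinomial classifier using add-one smoothing. This is a generic textbook sketch, not the authors' QA system: the class name `NaiveBayesText`, the whitespace tokenization, and the idea of mapping a question to an answer category are assumptions, and no Arabic corpus or rhetorical relation annotation is involved.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # label -> number of training texts
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()

    def fit(self, examples):
        """Train on an iterable of (text, label) pairs."""
        for text, label in examples:
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        """Return the label with the highest log posterior, or None if untrained."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                # Add-one smoothing keeps unseen tokens from zeroing the product.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on question texts labelled with an answer category, `predict` returns the most probable category for a new question. A full QA system would still need a step that selects the actual answer text.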

Sentence Filtering Dataset Construction Method about Web Corpus (웹 말뭉치에 대한 문장 필터링 데이터 셋 구축 방법)

  • Nam, Chung-Hyeon;Jang, Kyung-Sik
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.11 / pp.1505-1511 / 2021
  • Pretrained models that perform well on various natural language processing tasks have the advantage of learning the linguistic patterns of sentences from a large corpus during training, allowing each token in the input sentence to be represented by an appropriate feature vector. One way to construct the corpus required to train a pretrained model is to collect it using a web crawler. However, because sentences on the web follow various patterns, part or all of a sentence may consist of unnecessary words. In this paper, we propose a dataset construction method for filtering sentences containing unnecessary words, using neural network models, for a corpus collected from the web. As a result, we construct a dataset containing a total of 2,330 sentences. We also evaluated the performance of neural network models on the constructed dataset, and the BERT model showed the highest performance with an accuracy of 93.75%.

Analysis of Text Mining of Consumer's Personality Implication Words in Review of Used Transaction Application (중고거래 어플리케이션 <당근마켓> 리뷰텍스트에 나타난 소비자의 인성 함축단어 텍스트마이닝 분석)

  • Jung, Yea-Rin;Ju, Young-Ae
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.1-10 / 2021
  • This study analyzes the use and meaning of words implying consumer personality in the review text of the Used Transaction Application <당근마켓>. As of May 2021, data for the preceding six months were collected by our web crawler for Seoul and Gyeonggi Province; a total of 1,368 cases were first collected by random sampling, and finally 570 cases were preprocessed. The results are as follows. First, 48.2% of the review texts were related to the personality of consumers, even though the platform is a commercial one for trading products. Second, the review text is mainly positive and forms a text network structure centered on the keyword 'gratitude'. Third, the review text implying consumer character was divided into two groups, 'extrovert personality' and 'introvert personality', and the individuality of the two groups worked together on the platform. In conclusion, we suggest that consumer personality plays an important role in the platform transaction process, that it will play a role in the platform's services in the future, and that it should be studied from various perspectives.