• Title/Summary/Keyword: Crawling


Tax Judgment Analysis and Prediction using NLP and BiLSTM (NLP와 BiLSTM을 적용한 조세 결정문의 분석과 예측)

  • Lee, Yeong-Keun; Park, Koo-Rack; Lee, Hoo-Young
    • Journal of Digital Convergence / v.19 no.9 / pp.181-188 / 2021
  • Research on AI-based legal services that make difficult legal fields easier to understand and more predictable is growing in importance. Based on Tax Tribunal decisions in the field of tax law, this study built a model through self-learning on collected and processed information; the model answers user queries with predicted results, and its accuracy was verified. The proposed model collects tax decisions through web crawling, extracts useful data, and generates word vectors by applying Word2Vec's FastText algorithm to the output optimized through NLP. A total of 11,103 cases from 2017 to 2019 were collected and classified, and the model was verified at 70% accuracy. It can support more efficient application in various legal systems and in prior research.
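The abstract does not say how the word vectors are configured. As a minimal sketch of the subword idea behind FastText, the function below lists the character n-grams of one word, using `<` and `>` as word-boundary markers. The function name `char_ngrams` is hypothetical, and the 3 to 6 n-gram range is an assumption taken from the fastText library defaults, not from the paper.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """List the character n-grams FastText-style models associate with a word.

    The n-gram range is an assumed default, not a setting reported in the paper.
    """
    marked = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    # The whole marked word is kept as one extra unit.
    if marked not in grams:
        grams.append(marked)
    return grams


trigrams = char_ngrams("where", n_min=3, n_max=3)
```

A word vector is then the sum of the vectors of these units, which lets the model produce vectors for rare or unseen terms in legal text.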

A Design of Estimate-information Filtering System using Artificial Intelligent Technology (인공지능 기술을 활용한 부동산 허위매물 필터링 시스템)

  • Moon, Jeong-Kyung
    • Convergence Security Journal / v.21 no.1 / pp.115-120 / 2021
  • O2O-based real estate brokerage websites and apps are increasing explosively. As a result, real estate brokerage has shifted from an offline-based to an online-based environment, and consumers benefit in terms of time, cost, and convenience. Behind this convenience, however, users often lose time and money because of false or maliciously false information. To reduce the consumer damage that may occur in O2O-based real estate brokerage services, this study designed a false property information filtering system that uses artificial intelligence technology to determine the authenticity of registered property information. The proposed method was shown both to determine the authenticity of property information registered in online real estate services and to reduce consumers' loss of time and money.

HTML Text Extraction Using Frequency Analysis (빈도 분석을 이용한 HTML 텍스트 추출)

  • Kim, Jin-Hwan; Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.9 / pp.1135-1143 / 2021
  • Text collection with web crawlers for big data analysis has recently become common. However, collecting only the necessary text from a web page composed of numerous tags and texts requires the cumbersome step of specifying, in the crawler, the HTML tags and style attributes that contain the required text. This paper proposes a method that extracts text using the frequency with which text appears in web pages, without specifying HTML tags or style attributes. The method extracts text from the DOM tree of every collected web page, analyzes how frequently each text appears, and obtains the main text by excluding text with a high frequency of appearance. The study verified the superiority of the proposed method.
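The frequency-based idea in this abstract can be sketched with the Python standard library. This is an illustration under stated assumptions, not the authors' implementation: frequency is measured as the number of pages a text fragment appears on, and the names `TextCollector`, `extract_main_text`, and the `max_page_ratio` threshold are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text fragments, skipping script and style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if text and not self._skip_depth:
            self.fragments.append(text)


def text_fragments(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.fragments


def extract_main_text(pages, max_page_ratio=0.5):
    """Keep, per page, the fragments found on at most max_page_ratio of all pages."""
    per_page = [text_fragments(html) for html in pages]
    # Count on how many pages each fragment appears (assumed frequency measure).
    page_freq = Counter()
    for fragments in per_page:
        page_freq.update(set(fragments))
    limit = max_page_ratio * len(pages)
    return [[f for f in fragments if page_freq[f] <= limit] for fragments in per_page]


# Made-up pages sharing a menu and a footer.
pages = [
    "<html><body><div>Home | News | Login</div><p>Crawler collects tax rulings.</p><div>Copyright Example</div></body></html>",
    "<html><body><div>Home | News | Login</div><p>Coins are classified with a CNN.</p><div>Copyright Example</div></body></html>",
    "<html><body><div>Home | News | Login</div><p>Dust policy changed media topics.</p><div>Copyright Example</div></body></html>",
]
main_text = extract_main_text(pages)
```

Menus and footers repeat on every page and exceed the threshold, so only the page-specific paragraphs remain, and no tag or style selector has to be written for the site.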

YouTube Channel Ranking Scheme based on Hidden Qualitative Information Analysis (유튜브 은닉 질적 정보 분석 기반 유튜브 채널 랭킹 기법)

  • Lee, Ji Hyeon; Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.7 / pp.757-763 / 2019
  • YouTube has become so popular that the present is called the age of YouTube. As the number of users and the amount of content grow, so does the range of information to choose from, yet it remains difficult to select information that meets users' needs. YouTube provides recommendations based on users' watch lists. This study therefore analyzes channels on a user's topic from various angles and proposes a scheme that, based on crawled channels, measures the perception of channels and channel videos through quantitative data and hidden qualitative data analysis. Combining the two analyses shows how a channel and its videos are perceived, which yields a ranking of the channels that deal with the topic. Finally, as a case study, we recommend English learning channels to users based on numerical data statistics and sentiment analysis results, to maximize the flipped learning effect regardless of time and place.

Female Middle-Aged Householders' Experiences in Preparation for Old Age: With Focus on Career Female Householders (중년 여성가구주의 노후준비 경험: 직업이 있는 여성가구주를 중심으로)

Design and Analysis of Technical Management System of Personal Information Security using Web Crawler (웹 크롤러를 이용한 개인정보보호의 기술적 관리 체계 설계와 해석)

  • Park, In-pyo; Jeon, Sang-june; Kim, Jeong-ho
    • Journal of Platform Technology / v.6 no.4 / pp.69-77 / 2018
  • Awareness of personal information protection is insufficient for personal information files in end-point areas such as personal computers, smart terminals, and personal storage devices. In this study, we use the Diffie-Hellman method to securely retrieve personal information files generated by a web crawler. We designed SEED and ARIA with hybrid slicing to protect personal information files against attack. The encryption performance for personal information files collected by web crawling is compared in terms of the encryption and decryption rate according to key generation and the encryption and decryption sharing according to user key level. The simulation was performed on personal information files delivered in the transmission process to an external agency. Compared with existing methods, the detection rate improved by 4.64 times and the information protection rate improved by 18.3%.
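The abstract names the Diffie-Hellman method but gives no parameters. The sketch below shows only the general key agreement step, with toy values chosen for illustration: the prime `P`, the base `G`, and the function names are assumptions, and the paper's SEED/ARIA hybrid slicing design is not reproduced here.

```python
import hashlib
import secrets

# Toy parameters (assumptions for illustration). Real deployments use
# standardized groups of 2048 bits or more.
P = 2**127 - 1  # a Mersenne prime, far too small for real security
G = 3           # arbitrary base chosen for the sketch


def keypair():
    """Return a (private, public) Diffie-Hellman pair."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_key(own_private, peer_public):
    """Derive a 32-byte symmetric key from the shared Diffie-Hellman secret."""
    secret = pow(peer_public, own_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


# The crawler side and the receiving side exchange only public values.
crawler_private, crawler_public = keypair()
server_private, server_public = keypair()

crawler_key = shared_key(crawler_private, server_public)
server_key = shared_key(server_private, crawler_public)
```

Both sides arrive at the same key without sending it over the network. In a design like the one described, that key could then drive a block cipher such as SEED or ARIA to encrypt the collected files.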

Goal Gradient Effect in Reward-based Crowdfunding; Difference in Project Category (후원형 크라우드 펀딩에서의 목표 구배 효과; 프로젝트 카테고리 별 차이를 중심으로)

  • Hwang, Ji Hyeon; Choi, Kang Jun; Lee, Jae Young; Soh, Seung Bum
    • Knowledge Management Research / v.20 no.3 / pp.173-193 / 2019
  • Reward-based crowdfunding is a funding platform that lets early-stage operators who lack funds raise money, and it is seen as an outstanding infrastructure for leading the fourth industrial revolution because start-ups use it to realize new technologies and creative ideas. It has grown in line with the fourth industrial revolution, and funding successes are occurring in industries ranging from culture/art to technology/IT, including as a new means of knowledge management in a rapidly changing industrial environment. This study focuses on the possibility that consumers' donation purposes vary with the category of a reward-based crowdfunding project. Because consumer payment decisions and purchasing motivation are classified according to the purpose of purchase, the goal gradient effect, which previous papers introduced as the main motivation for consumer donation in reward-based crowdfunding, may vary between utilitarian and hedonic project categories. Daily consumer donation data were collected by web crawling from Indiegogo, a leading reward-based crowdfunding company, and the model was defined with propensity score matching (PSM) and a random effect model. The results showed a goal gradient effect in the utilitarian project category but none in the hedonic project category. This paper extends the study of consumer donation motivation and contributes a theoretical foundation by showing that consumer donation may vary with project category; it also has practical implications for project creators, suggesting an effective marketing strategy for each project category.

Coin Classification using CNN (CNN 을 이용한 동전 분류)

  • Lee, Jaehyun; Shin, Donggyu; Park, Leejun; Song, Hyunjoo; Gu, Bongen
    • Journal of Platform Technology / v.9 no.3 / pp.63-69 / 2021
  • Because countries have limited materials for making coins and choose designs suitable for carrying by hand, coins are similar in shape, size, and color. This similarity makes it difficult for visitors to identify each country's coins. To solve this problem, we propose a coin classification method using a CNN, which is effective for image processing. In our method, we collect the training data by web crawling and use OpenCV for preprocessing. After preprocessing, we extract features from an image using three CNN layers and classify coins using two fully connected layers. To show that the model is effective for coin classification, we evaluate it on eight different coin types. In our experiments, the coin classification accuracy is about 99.5%.

A Study on Self-medication for Health Promotion of the Silver Generation

  • Oh, Soonhwan; Ryu, Gihwan
    • International Journal of Advanced Culture Technology / v.8 no.4 / pp.82-88 / 2020
  • With the development of medical care in the 21st century and the rapid development of the 4th industry, electronic devices and household goods that take into account the physical and mental aging of the silver generation have been developed, and health-related apps are widely developed and operated. The apps the silver generation currently uses, such as safety management apps for elderly people living alone and apps on disease prevention methods, provide disease information with a focus on prevention rather than treatment. Few apps provide information on foods with a direct effect and on the nutrients in those foods, and research on apps that give information about individual foods is insufficient. In this paper, we propose an app that analyzes food factors and provides self-medication for health promotion of the silver generation. The app lets the silver generation conveniently obtain information such as the nutrients, calories, and efficacy of the food they need. It collects and categorizes healthy food information through a TEXTOM solution-based crawling agent and stores highly relevant words in a data resource. Wide deep learning was applied to enable self-medication recommendations for food. With this technique, the most appropriate healthy food is suggested to people with similar eating patterns and tastes in the same age group, and users can receive recommendations for the customized healthy foods they need before eating. As a result, elderly users can conveniently obtain healthy food information on a smartphone through a customized interface.

An Exploratory Study on the Policy for Facilitating of Health Behaviors Related to Particulate Matter: Using Topic and Semantic Network Analysis of Media Text (미세먼지 관련 건강행위 강화를 위한 정책의 탐색적 연구: 미디어 정보의 토픽 및 의미연결망 분석을 활용하여)

  • Byun, Hye Min; Park, You Jin; Yun, Eun Kyoung
    • Journal of Korean Academy of Nursing / v.51 no.1 / pp.68-79 / 2021
  • Purpose: This study aimed to analyze the content and structure of mass and social media related to particulate matter before and after enforcement of the comprehensive countermeasures for particulate matter, derive nursing implications, and provide a basis for designing health policies. Methods: After crawling online news articles and social networking site posts from before and after policy enforcement using particulate matter as the keyword, we conducted topic and semantic network analysis using TEXTOM, R, and UCINET 6. Results: In the topic analysis, behavior tips was the common main topic in both media before and after policy enforcement. After enforcement, influence on health disappeared from the main topics in mass media because of increased reports about reduction measures and government, whereas it appeared as a main topic in social media. However, semantic network analysis confirmed that social media had a much larger number of nodes and links and lower centrality than mass media, leaving substantial information that was unstructured and not organically connected. Conclusion: An understanding of particulate matter policy and its implications for health, as well as of gaps in the needs for and use of health information, should be integrated with leadership and support in nurses' care of vulnerable patients and in public health promotion.
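The study ran its semantic network analysis in TEXTOM, R, and UCINET 6. As a stand-in for readers unfamiliar with the technique, the following Python sketch builds a keyword co-occurrence network and computes degree centrality. The tokenized documents are made-up examples, the function names are hypothetical, and degree centrality is only one of several centrality measures such a study might report.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count how many documents each unordered keyword pair shares."""
    edges = Counter()
    for tokens in documents:
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Share of the other nodes each node is directly linked to."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Made-up tokenized posts, not data from the study.
documents = [
    ["dust", "mask", "health"],
    ["dust", "policy"],
    ["mask", "health"],
]
edges = cooccurrence_edges(documents)
centrality = degree_centrality(edges)
```

A network with many nodes and links but low centrality, as reported for social media, has keywords that are numerous yet weakly tied to any central topic.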