• Title/Summary/Keyword: Crawling


Prototype Design and Development of Online Recruitment System Based on Social Media and Video Interview Analysis (소셜미디어 및 면접 영상 분석 기반 온라인 채용지원시스템 프로토타입 설계 및 구현)

  • Cho, Jinhyung;Kang, Hwansoo;Yoo, Woochang;Park, Kyutae
    • Journal of Digital Convergence / v.19 no.3 / pp.203-209 / 2021
  • In this study, we propose a prototype design model for an online recruitment system that uses multi-dimensional data crawling and social media analysis to validate text information and video interviews in the job application process. The study includes a comparative text-mining analysis to verify the authenticity of job application documents and to hire and allocate workers effectively based on their potential job capability. Using the prototype, we conducted performance tests and analyzed the results for key performance indicators such as text-mining accuracy and the recognition rate of the interview STT (speech-to-text) function. If the system is commercialized based on the design specifications and prototype results derived from this study, it is expected to serve as intelligent online recruitment technology for the public and private recruitment markets.
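The abstract does not say how the text-mining comparison is computed. A minimal sketch, assuming a bag-of-words cosine similarity between an application and the applicant's crawled social media posts; the function names and the 0.2 threshold are hypothetical, not the authors' method.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text, split on non-word characters, and count terms."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between the two term-frequency vectors (0.0 to 1.0)."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def flag_inconsistent(application, social_posts, threshold=0.2):
    """Hypothetical rule: flag the application if no post shares enough vocabulary with it."""
    best = max((cosine_similarity(application, p) for p in social_posts), default=0.0)
    return best < threshold


print(flag_inconsistent("python data engineer", ["data engineer using python daily"]))  # False
```

A weighted scheme such as TF-IDF, or sentence embeddings, would be natural replacements for raw counts; the threshold would need calibration on labeled applications.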

Determination of Fire Risk Assessment Indicators for Building using Big Data (빅데이터를 활용한 건축물 화재위험도 평가 지표 결정)

  • Joo, Hong-Jun;Choi, Yun-Jeong;Ok, Chi-Yeol;An, Jae-Hong
    • Journal of the Korea Institute of Building Construction / v.22 no.3 / pp.281-291 / 2022
  • This study uses big data to determine the indicators needed for a fire risk assessment of buildings. Because most factors affecting building fire risk have been fixed as indicators that consider only the building itself, previous assessments have been limited and subjective. If various internal and external indicators can be considered using big data, effective measures can be taken to reduce the fire risk of buildings. To collect the data needed to determine the indicators, search queries were first selected, and professional literature was collected as unstructured data using web crawling. To extract words from the literature, pre-processing was performed, including user-dictionary registration, removal of duplicate literature, and stopword removal. Then, based on a review of previous research, the words were classified into four components, and representative risk-related keywords were selected from each component. Risk-related indicators were collected by analyzing words related to these representative keywords. Screening the candidates against selection criteria yielded 20 indicators. This methodology demonstrates the applicability of big data analysis to establishing measures that reduce building fire risk, and the determined indicators can serve as reference material for assessment.
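The abstract lists the pre-processing steps and a related-word analysis without details. A minimal sketch under stated assumptions: whitespace-split English tokens stand in for Korean morphological analysis, the user dictionary is approximated by replacing registered phrases with single tokens, duplicates are detected by exact lower-cased text, and "related words" are counted as same-document co-occurrences. The dictionary, stopwords, and keyword are hypothetical.

```python
from collections import Counter

# Hypothetical user dictionary: multi-word terms that must survive as one token.
USER_DICTIONARY = {"fire door": "fire_door", "fire load": "fire_load"}
# Hypothetical stopword list.
STOPWORDS = {"the", "of", "and", "a", "in", "is", "for"}


def preprocess(docs):
    """Drop duplicate documents, apply the user dictionary, and remove stopwords."""
    seen = set()
    cleaned = []
    for doc in docs:
        text = doc.strip().lower()
        if text in seen:  # exact duplicate after lower-casing: skip it
            continue
        seen.add(text)
        for phrase, token in USER_DICTIONARY.items():
            text = text.replace(phrase, token)  # keep registered compounds as one token
        cleaned.append([w for w in text.split() if w not in STOPWORDS])
    return cleaned


def related_words(token_docs, keyword, top_n=5):
    """Count words that appear in the same document as the keyword."""
    counts = Counter()
    for tokens in token_docs:
        if keyword in tokens:
            counts.update(set(tokens) - {keyword})
    return counts.most_common(top_n)


docs = [
    "The fire load of the building",
    "the fire load of the building",  # duplicate of the first document
    "Fire door and sprinkler in the building",
]
print(related_words(preprocess(docs), "building"))
```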

Structural Assets of Local Broadcasting Networks and Regional Gap: Focusing on Local MBC stations in South Korea (지역 방송국 네트워크의 구조적 자산(asset)과 지역 간 격차: 지역MBC를 중심으로)

  • Son, Ji-Hoon;Lee, Jung-Min;Kim, Jae-Hun;Park, Han-Woo
    • The Journal of the Korea Contents Association / v.22 no.9 / pp.194-204 / 2022
  • This study examined the social capital and geographical gaps of local television stations using web data gathered through website crawling. URLs for 16 local MBC websites were collected. MBC is an abbreviation for Munhwa Broadcasting Corporation, one of South Korea's largest television and radio broadcasters; Munhwa is a Sino-Korean term meaning "culture." A Web Impact Report was first used to determine which institutions the local stations were linked to. To investigate the specific types of connection, the URL information was classified using the n-tuple helix model, followed by 2-mode network analysis. The n-tuple helix model extends the standard university-business-government triple-helix model by adding a new network innovation originator. The results show that local broadcasting stations relied heavily on activities such as festivals, performances, and exhibitions to engage the local community. Local stations in the Daegu-Gyeongbuk and Busan-Ulsan-Gyeongnam areas had the most diverse connections to the local population of all regions.
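The abstract does not describe how URLs were coded into helix sectors. A minimal sketch, assuming sectors can be read off Korean second-level domains (`.ac.kr`, `.go.kr`, `.co.kr`, `.or.kr`) and that a station's connection diversity is its degree in the station-sector 2-mode network. The mapping, sector labels, and example URLs are hypothetical simplifications of the n-tuple helix coding.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical suffix-to-sector mapping; the study's own coding scheme is not given.
SECTOR_BY_SUFFIX = {
    ".ac.kr": "university",
    ".go.kr": "government",
    ".co.kr": "business",
    ".or.kr": "civil society",
}


def classify(url):
    """Assign a linked URL to a sector by its host-name suffix."""
    host = urlparse(url).hostname or ""
    for suffix, sector in SECTOR_BY_SUFFIX.items():
        if host.endswith(suffix):
            return sector
    return "other"


def two_mode_network(links):
    """links: (station, linked_url) pairs -> {station: {sector: number of links}}."""
    net = defaultdict(lambda: defaultdict(int))
    for station, url in links:
        net[station][classify(url)] += 1
    return net


def diversity(net):
    """Distinct sectors per station, i.e. the station's degree in the 2-mode graph."""
    return {station: len(sectors) for station, sectors in net.items()}


links = [
    ("daegu", "https://www.knu.ac.kr/x"),
    ("daegu", "http://www.daegu.go.kr"),
    ("daegu", "https://shop.example.co.kr"),
    ("jeju", "https://www.jeju.go.kr"),
]
print(diversity(two_mode_network(links)))  # {'daegu': 3, 'jeju': 1}
```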

Analysis of Rana coreana Behavior According to the Slope Angle Degree of Escape Ramp (콘크리트 수로 탈출로 경사각에 따른 한국산개구리 행동 분석)

  • Lee, Taeho;Kim, Jungkwon;Seo, Jihye;Jang, Moonjeong;Choi, Taeyoung;Chang, Minho
    • Journal of Environmental Impact Assessment / v.31 no.1 / pp.75-81 / 2022
  • The purpose of this study is to propose an angle limit for escape ramps by analyzing frog behavior according to the inclination angle of escape ramps installed in concrete U-shaped bench-flume channels. For the experiment, an escape test device was manufactured with the same shape and materials as those applied in the field, and Rana coreana individuals living in paddy wetlands were selected. The main behaviors of frogs on the slope were 'jumping', 'crawling', and 'slipping'. After the behavioral results were recorded for each inclination angle, statistical analysis was conducted using the chi-square test. The analysis showed no statistically significant difference between 30° and 40°. This result supports expanding the 30° inclination standard suggested in the 'Guidelines for Installation and Management of Ecological Pathways' to a maximum of 40°. However, because escape ramps are meant to serve not only Korean frogs but also the various small wild animals affected by artificial canals, additional studies with other target species are needed.
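The abstract names a chi-square test but does not give the contingency table. A minimal sketch of Pearson's chi-square test of independence for angle × behavior counts, written without external libraries; the counts are invented for illustration and are not the study's data.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is non-zero (otherwise an expected count is 0).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = 30°, 40°; columns = jumping, crawling, slipping.
counts = [[12, 20, 8],
          [10, 19, 11]]
stat, dof = chi_square(counts)
CRITICAL_0_05_DF2 = 5.991  # chi-square critical value at alpha = 0.05 with df = 2
print(f"chi2 = {stat:.3f}, df = {dof}, significant = {stat > CRITICAL_0_05_DF2}")
# chi2 = 0.681, df = 2, significant = False
```

In practice `scipy.stats.chi2_contingency` returns the statistic and p-value directly; the hand-rolled version only makes the expected-count arithmetic visible.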

Industrial Technology Leak Detection System on the Dark Web (다크웹 환경에서 산업기술 유출 탐지 시스템)

  • Young Jae, Kong;Hang Bae, Chang
    • Smart Media Journal / v.11 no.10 / pp.46-53 / 2022
  • Today, owing to the Fourth Industrial Revolution and extensive R&D funding, domestic companies have come to possess world-class industrial technologies, which have grown into important assets. To protect companies' critical industrial technologies, the national government designates them as "national core technologies." Technology leaks in the shipbuilding, display, and semiconductor industries in particular can cause a significant loss of competitiveness not only at the company level but also at the national level. Every year there are more insider leaks, ransomware attacks, and attempts to steal industrial technology through industrial espionage, and the stolen technology is then traded covertly on the dark web. In this paper, we propose a system for detecting industrial technology leaks in the dark web environment. The proposed model first builds a database through dark web crawling using information collected from the OSINT environment. Keywords related to industrial technology leakage are then extracted using the KeyBERT model, and signs of industrial technology leakage in the dark web environment are expressed as quantitative figures. Finally, starting from the identified leakage sites in the dark web environment, the possibility of secondary leakage is detected with the PageRank algorithm. With the proposed method, 27,317 unique dark web domains were collected, and 15,028 nuclear energy-related keywords were extracted from 100 nuclear power patents. Secondary-leak detection based on the dark web sites with the most nuclear-related leakage identified 12 dark web sites.
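The abstract does not specify the PageRank variant or its parameters. A minimal power-iteration sketch, assuming a directed link graph among crawled dark web sites and the commonly used damping factor of 0.85; the site names are hypothetical, and the KeyBERT keyword-extraction step is not reproduced here.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank.

    links: dict mapping each site to the list of sites it links to.
    Returns a dict of scores that sums to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling sites (no outgoing links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical graph: two identified leakage sites both link to a third site.
scores = pagerank({"leakA": ["siteC"], "leakB": ["siteC"], "siteC": []})
print(max(scores, key=scores.get))  # siteC
```

Under this assumption, sites that many known leakage sites point to accumulate rank, which is one way to prioritize candidates for secondary leakage; a personalized PageRank that teleports only to the known leakage sites would be a natural variant.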

Apartment Price Prediction Using Deep Learning and Machine Learning (딥러닝과 머신러닝을 이용한 아파트 실거래가 예측)

  • Hakhyun Kim;Hwankyu Yoo;Hayoung Oh
    • KIPS Transactions on Software and Data Engineering / v.12 no.2 / pp.59-76 / 2023
  • Since the COVID-19 era, apartment prices have risen unusually. In this uncertain real estate market, price prediction research is very important. In this paper, we build a large dataset of 870,000 records from 2015 to 2020 by collecting and crawling data from various real estate sites and gathering as many variables as possible, and then create a model to predict future apartment transaction prices. This study first solved the multicollinearity problem by removing and combining variables. Five variable selection methods were then used to extract meaningful independent variables: Forward Selection, Backward Elimination, Stepwise Selection, L1 Regularization, and Principal Component Analysis (PCA). In addition, four machine learning and deep learning algorithms, a deep neural network (DNN), XGBoost, CatBoost, and Linear Regression, were trained after hyperparameter optimization, and their predictive power was compared. In an additional experiment, the number of nodes and layers of the DNN was varied to find the most appropriate configuration. Finally, the best-performing model was used to predict apartment transaction prices for 2021, and the predictions were compared with the actual 2021 data. Based on these results, we are confident that machine learning and deep learning can help investors make the right decisions when purchasing homes in various economic situations.
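The abstract says multicollinearity was handled by removing and combining variables but not how. A minimal sketch of one common approach, swapped in here as an assumption: a greedy filter that drops any variable whose Pearson correlation with an already kept variable reaches a threshold (0.9 is a hypothetical choice). A variance inflation factor (VIF) check would be a stricter alternative. Column names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes both columns are non-constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_collinear(columns, threshold=0.9):
    """Keep a variable only if |r| with every already kept variable is below threshold."""
    kept = []
    for name in columns:  # dict order decides which of a correlated pair survives
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "area_m2": [59, 84, 84, 114, 135],
    "area_pyeong": [17.8, 25.4, 25.4, 34.5, 40.8],  # roughly area_m2 / 3.3058
    "floor": [3, 12, 7, 20, 5],
}
print(drop_collinear(columns))  # ['area_m2', 'floor']
```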

A Study on the Perception of Quality of Care Services by Care Workers using Big Data (빅데이터를 활용한 요양보호사의 서비스질 인식에 관한 연구)

  • Han-A Cho
    • Journal of Korean Dental Hygiene Science / v.6 no.1 / pp.13-25 / 2023
  • Background: This study used unstructured big data to examine the service quality management of care workers, who are the direct service personnel of long-term care insurance for the elderly. Methods: Using TEXTOM, this study collected and analyzed unstructured social data related to care workers' service quality. Frequency, TF-IDF, centrality, semantic network, and CONCOR analyses were conducted on the top 50 keywords collected by crawling. Results: In the frequency analysis, the top-ranked keywords were 'Long-term care services,' 'Care workers,' 'Quality of care services,' 'Long term care,' 'Long term care facilities,' 'Enhancement,' 'Elderly,' 'Treatment,' 'Improvement,' and 'Necessity.' The degree centrality and eigenvector centrality results were almost the same as those of the frequency analysis. The CONCOR analysis showed high concern with improving the quality of long-term care services, the operation of long-term care services, the long-term care service system, and perceptions of the psychological aspects of care workers. Conclusion: By grouping perceptions of care workers' service quality into meaningful clusters, this study helps set various directions for improving the service quality of care workers.
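The exact TF-IDF formula used by the text-mining tool is not given in the abstract. A minimal sketch of the textbook variant (term frequency normalized by document length, times the natural log of the number of documents over the document frequency); the token lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # documents containing each term
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # A term found in every document gets idf = log(1) = 0.
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores


docs = [
    ["care", "worker", "quality"],
    ["care", "facility"],
    ["care", "worker", "training"],
]
for weights in tf_idf(docs):
    print(sorted(weights.items(), key=lambda kv: -kv[1]))
```

Because "care" appears in every document it receives zero weight, which is why TF-IDF rankings can differ from raw frequency rankings.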

Suitable clothing recommendation system by size and skin color (의류 사이즈별 및 피부톤에 기반을 둔 의류 추천 시스템)

  • Park, Chang-Young;Lim, Byeong-Chan;Lee, Won-Joon;Lee, Chang-Su;Kim, Min-Su;Lee, Sang-Yong
    • Journal of Digital Convergence / v.20 no.3 / pp.407-413 / 2022
  • Existing clothing recommendation systems only show suitable photos when users select a type of clothing they like after entering their body size. When users purchase clothing through such systems, the clothing often does not fit their body size or does not suit them. To solve these problems, this study implemented a system that takes the user's skin tone as well as size and recommends clothing suited to both the user's body size and skin tone. For eight men's tops, clothing size information obtained through web crawling was periodically stored in a database, and all pixels of each clothing image were analyzed to extract color values as text. To evaluate the system, a survey of 100 male college students was conducted, and the satisfaction rate was 70%. Most dissatisfied respondents cited the limited range of recommended clothing, so the target clothing should be expanded in the future.
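The abstract says every pixel of the clothing image is analyzed to extract a color value as text, without giving the method. A minimal sketch, assuming the color is the mean RGB over all pixels mapped to the nearest entry of a small named palette; the palette is hypothetical, and in practice the pixel list would come from an imaging library such as Pillow. Averaging every pixel also includes any background, which a real system would likely need to mask.

```python
# Hypothetical palette; the study's color vocabulary is not specified.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (200, 30, 40),
    "navy": (20, 30, 90),
    "beige": (220, 200, 160),
    "gray": (128, 128, 128),
}


def average_color(pixels):
    """Mean (r, g, b) over all pixels, rounded to integers."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    return tuple(round(sum(p[i] for p in pixels) / len(pixels)) for i in range(3))


def color_name(rgb):
    """Nearest palette entry by squared Euclidean distance in RGB space."""
    return min(PALETTE, key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name])))


print(color_name(average_color([(30, 40, 100), (20, 30, 90)])))  # navy
```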

A Comparison of Image Classification System for Building Waste Data based on Deep Learning (딥러닝기반 건축폐기물 이미지 분류 시스템 비교)

  • Jae-Kyung Sung;Mincheol Yang;Kyungnam Moon;Yong-Guk Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.3 / pp.199-206 / 2023
  • This study uses deep learning algorithms to automatically classify construction waste into three categories: wood waste, plastic waste, and concrete waste. Two models were compared for their performance in classifying construction waste: VGG-16, a convolutional neural network image classification algorithm, and ViT (Vision Transformer), an NLP-derived model that processes an image as a sequence of patches. Image data for construction waste were collected by crawling images from search engines worldwide, and 3,000 images (1,000 per category) were obtained after excluding duplicates and images that were difficult to distinguish with the naked eye and would interfere with the experiment. To improve model accuracy, data augmentation was applied during training, for a total of 30,000 images. Despite the unstructured nature of the collected image data, VGG-16 achieved an accuracy of 91.5% and ViT an accuracy of 92.7%. This suggests the possibility of practical application in actual construction waste data management. If object detection or semantic segmentation techniques are applied building on this study, more precise classification will be possible even within a single image, resulting in more accurate waste classification.
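The abstract states that augmentation brought training to 30,000 images (ten times the 3,000 originals) but not which transforms were used. A minimal sketch, assuming grayscale images stored as lists of pixel rows and hypothetical transforms (quarter-turn rotation, horizontal flip, brightness shift) that produce nine augmented copies per original.

```python
import random


def hflip(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate clockwise: the first column, read bottom to top, becomes the first row."""
    return [list(row) for row in zip(*img[::-1])]


def adjust_brightness(img, delta):
    """Add delta to every pixel, clamped to the 0-255 range."""
    return [[min(255, max(0, px + delta)) for px in row] for row in img]


def augment(img, factor=10, seed=0):
    """Return the original plus factor - 1 randomly transformed copies."""
    rng = random.Random(seed)
    out = [img]
    for _ in range(factor - 1):
        aug = img
        for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
            aug = rotate90(aug)
        if rng.random() < 0.5:
            aug = hflip(aug)
        aug = adjust_brightness(aug, rng.randint(-30, 30))
        out.append(aug)
    return out


images = [[[0, 128], [255, 64]]] * 3  # stand-in for 3 originals
print(len([a for img in images for a in augment(img)]))  # 30, i.e. 10x as with 3,000 -> 30,000
```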

Media exposure analysis of official sponsors and general companies of mega sport event (메가 스포츠이벤트의 공식스폰서와 일반기업의 미디어 노출 분석)

  • Kim, Joo-Hak;Cho, Sun-Mi
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.8 no.4 / pp.171-181 / 2018
  • As the proportion of sports events in the sports industry grows, the official sponsor market for sports events is also increasing. However, because official sponsorships are limited and expensive, some companies approach sporting events through ambush marketing. This study analyzes the differences in media exposure between official sponsors and general companies of mega sport events. To accomplish this, we collected and analyzed articles from the period of the 2016 Rio Olympics, as well as from one year before and one year after the Games. Web crawling was performed in Python to collect the articles. Morphological and frequency analyses were performed using the KoNLP and tm packages of the statistical program R. In addition, the opinions of a group of related experts were gathered to classify the companies or organizations appearing in the media as the Organizing Committees for the Olympic Games (OCOGs), official sponsors, or general companies. The analysis found 5,220 mentions related to the OCOGs, 7,845 related to official sponsors, and 7,028 related to general companies. There was little difference in exposure frequency between official sponsors and general companies, which implies that ambush marketing is recognized as a strategic marketing technique. The International Olympic Committee (IOC) should recognize this social phenomenon and establish reasonable standards for the marketing activities of official sponsors and general companies. This study can serve as a basis for fair sponsorship and marketing activities at sports events.
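The abstract reports raw frequencies only. A minimal sketch, assuming a hypothetical expert-assigned lookup from names to categories and simple substring counting in place of KoNLP morphological analysis; it also converts the three reported frequencies into shares of the 20,093 total mentions. The names in the lookup are placeholders, not companies from the study.

```python
from collections import Counter

# Hypothetical lookup; in the study, experts assigned each name to a category.
CATEGORY = {
    "Organizing Committee": "OCOG",
    "Sponsor A": "official sponsor",
    "Sponsor B": "official sponsor",
    "Company C": "general company",
}


def tally(articles):
    """Count category mentions across articles by plain substring matching."""
    counts = Counter()
    for text in articles:
        for name, category in CATEGORY.items():
            counts[category] += text.count(name)
    return counts


def shares(counts):
    """Percentage share of each category, rounded to one decimal place."""
    total = sum(counts.values())
    if not total:
        return {}
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


# Frequencies reported in the abstract.
reported = {"OCOG": 5220, "official sponsor": 7845, "general company": 7028}
print(shares(reported))  # {'OCOG': 26.0, 'official sponsor': 39.0, 'general company': 35.0}
```

The roughly four-point gap between official sponsors (39.0%) and general companies (35.0%) is the "little difference" the abstract interprets as evidence of ambush marketing.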