• Title/Summary/Keyword: data crawling

A Comparison of Image Classification System for Building Waste Data based on Deep Learning (딥러닝기반 건축폐기물 이미지 분류 시스템 비교)

  • Jae-Kyung Sung;Mincheol Yang;Kyungnam Moon;Yong-Guk Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.3 / pp.199-206 / 2023
  • This study uses deep learning algorithms to automatically classify construction waste into three categories: wood waste, plastic waste, and concrete waste. Two models were compared on this task: VGG-16, a convolutional neural network image classifier, and ViT (Vision Transformer), a model derived from NLP that processes an image as a sequence. Image data for construction waste was collected by crawling images from search engines worldwide. After excluding images that were difficult to distinguish with the naked eye, and duplicates that would interfere with the experiment, 3,000 images remained, 1,000 for each category. In addition, to improve the accuracy of the models, data augmentation was performed during training with a total of 30,000 images. Despite the unstructured nature of the collected image data, VGG-16 achieved an accuracy of 91.5% and ViT achieved an accuracy of 92.7%. This suggests the possibility of practical application in actual construction waste data management work. If object detection or semantic segmentation techniques are applied on the basis of this study, more precise classification will be possible even within a single image, resulting in more accurate waste classification.
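
The abstract above mentions dropping duplicated images from the crawled set but does not say how duplicates were detected. As a minimal sketch, assuming that only byte-identical files count as duplicates, a content hash is enough. The function name `drop_exact_duplicates` and the `(name, raw_bytes)` input shape are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import hashlib


def drop_exact_duplicates(images):
    """Keep the first occurrence of each distinct image payload.

    `images` is an iterable of (name, raw_bytes) pairs. This input shape is
    an assumption made for illustration, not the paper's pipeline.
    """
    seen = set()
    kept = []
    for name, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # byte-identical to an image that was already kept
        seen.add(digest)
        kept.append(name)
    return kept


# Toy stand-ins for crawled files: "c.jpg" repeats the bytes of "a.jpg".
crawled = [("a.jpg", b"\x01\x02"), ("b.jpg", b"\x03"), ("c.jpg", b"\x01\x02")]
print(drop_exact_duplicates(crawled))
```

Exact hashing misses resized or re-encoded copies. Catching those would need a perceptual hash or manual review, which this sketch does not attempt.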

Building an SNS Crawling System Using Python (Python을 이용한 SNS 크롤링 시스템 구축)

  • Lee, Jong-Hwa
    • Journal of Korea Society of Industrial Information Systems / v.23 no.5 / pp.61-76 / 2018
  • Modern life is increasingly lived on the network. The Internet of Things, which attaches sensors to objects, allows real-time data transfer to and from the network. Mobile devices, now essential, record the traces of everyday life in real time. Through social network services (SNS), information acquisition and communication activities are left on a huge network in real time. From a business point of view, customer needs analysis begins with SNS data. In this research, we build a system that uses Python to automatically collect SNS content from the web in real time. The system supports customer needs analysis by collecting data from Instagram, Twitter, and YouTube, which have large numbers of users worldwide. Using a virtual web browser in a Python web server environment, the collected content passes through an exploitation process and an NLP process and is then stored in a database. Based on the results of this study, we intend to offer the service through a website where the desired data is collected automatically via a search function and netizens' responses can be checked in real time through time series data analysis. In addition, execution results show that searches completed within 5 seconds, which confirms the advantage of the proposed algorithm.

Formulating Strategies from Consumer Opinion Analysis on AI Kids Phone using Text Mining (AI 키즈폰의 소비자리뷰 분석을 통한 제품개선 전략에 대한 연구)

  • Kim, Dohun;Cha, Kyungjin
    • The Journal of Society for e-Business Studies / v.24 no.2 / pp.71-89 / 2019
  • To develop satisfying products and improvements, firms use traditional marketing research methods to obtain consumers' opinions and try to reflect them. Recently, gathering data from consumer communication platforms such as the internet and SNS has become a popular method. Meanwhile, with the development of information technology, mobile companies are launching new digital products for children to protect them from harmful content and provide them with necessary functions and information. Among these digital products is the Kids Phone, a wearable device with safety functions that let parents learn their children's location. The Kids Phone is cheaper and simpler than a smartphone, but several problems have been noted, such as useless functions and frequent breakdowns. This study analyzes reviews of Kids Phones from domestic mobile companies, identifies the characteristics, strengths, and weaknesses of the products, and proposes improvement strategies for devices and services through SNS consumer analysis. To that end, customer review data was gathered by crawling online shopping malls and Naver Blog/Café, and was analyzed with text mining methods such as TF-IDF, sentiment analysis, and network analysis. Data analysis and visualization were done using 'R', 'Textom', and 'Python'. This analysis allowed us to identify the main issues and recent trends regarding Kids Phones and to suggest possible service improvement strategies based on sentiment analysis.

A Study on the Perception of Quality of Care Services by Care Workers using Big Data (빅데이터를 활용한 요양보호사의 서비스질 인식에 관한 연구)

  • Han-A Cho
    • Journal of Korean Dental Hygiene Science / v.6 no.1 / pp.13-25 / 2023
  • Background: This study was conducted to examine the service quality management of care workers, who are direct service personnel of long-term care insurance for the elderly, using unstructured big data. Methods: Using Textom, this study collected and analyzed unstructured social data related to care workers' service quality. Frequency, TF-IDF, centrality, semantic network, and CONCOR analyses were conducted on the top 50 keywords collected by crawling the data. Results: In the frequency analysis, the top-ranked keywords were 'Long-term care services,' 'Care workers,' 'Quality of care services,' 'Long term care,' 'Long term care facilities,' 'Enhancement,' 'Elderly,' 'Treatment,' 'Improvement,' and 'Necessity.' The results of degree centrality and eigenvector centrality were almost the same as those of the frequency analysis. The CONCOR analysis found that improvement in the quality of long-term care services, the operation of long-term care services, the long-term care services system, and the perception of the psychological aspects of care workers were of high concern. Conclusion: This study contributes to setting various directions for improving the service quality of care workers by presenting perceptions related to the service quality of care workers as a meaningful group.

State Information Based Recommendation Algorithm for Minimizing the Malicious User's Influence (상태 정보를 활용하여 악의적 사용자의 영향력을 최소화 하는 추천 알고리즘)

  • Noh, Taewan;Oh, Hayoung;Noh, Giseop;Kim, Chong-Kwon
    • Journal of the Korea Institute of Information Security & Cryptology / v.25 no.6 / pp.1353-1360 / 2015
  • With the rapid development of the Internet, most users now consult sites with Recommendation Systems (RSs) when they want to buy goods, movies, or music. However, Sybils with malicious behavior may exist on these RS sites and intentionally increase or decrease rating values, so the RSs cannot provide proper recommendations to normal users. In this paper, we divide the given rating values into stable and unstable states and propose a state information based recommendation algorithm that minimizes the malicious user's influence. To evaluate the performance of the proposed scheme, we directly crawl real trace data from a famous movie site and analyze the performance. The results show that the proposed scheme performs well compared to existing algorithms.

A Study on the Vitalization Strategy Based on Current Status Analysis of National Archives (국내외 국립기록관의 트위터 운용 현황 분석 및 활성화 방안)

  • Gang, JuYeon;Kim, TaeYoung;Choi, JungWon;Oh, Hyo-Jung
    • Journal of the Korean Society for information Management / v.33 no.3 / pp.263-285 / 2016
  • Nowadays, Social Network Service (SNS), which has been in the spotlight as a means of communication, has become a highly effective tool for improving ease of information use and accessibility for users. In this paper, we chose Twitter as the most representative SNS because it allows automatic crawling, and investigated tweet data gathered from domestic and foreign National Archives: NARA of the U.S.A., TNA of the U.K., NAA of Australia, and the National Archives of Korea. We also conducted information genre analysis and trend analysis by timeline. Information genre analysis shows how archives satisfied users' information needs, while trend analysis of tweets helps to understand how users' interests changed. Based on the comparison results, we distilled four characteristics of National Archives and suggested vitalization strategies for the National Archives of Korea.

Customized Recipe Recommendation System Implemented in the form of a Chatbot (챗봇 형태로 구현한 사용자 맞춤형 레시피 추천 시스템)

  • Ahn, Ye-Jin;Cho, Ha-Young;Kang, Shin-Jae
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.5 / pp.543-550 / 2020
  • Interest in food recipe retrieval systems has been increasing recently. Most computer-based recipe retrieval systems search by dish name or ingredient name. Since each recipe provides information in different weighing units, recalculating to the desired amount is necessary and inconvenient. This paper introduces a computer system that addresses these inconveniences. The system is a chatbot, based on web-based recipe recommendations, for users familiar with messenger conversation systems. After selecting the most popular recipes by name and pre-processing them to extract only the information the recipes require, the system recommends recipes from a pool of 100,000 data items. Recipes are then searched by the names of food ingredients (included and excluded). Quantities are recalculated based on the number of servings entered by the user. The satisfaction rate for the system's recommendations was 90.5%.
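
The serving-based recalculation described in the abstract above amounts to multiplying each quantity by the ratio of requested to original servings. The following is a rough sketch of that arithmetic only. The function name `scale_ingredients`, the dictionary layout, and the sample recipe are hypothetical, and the unit handling mentioned in the abstract is not modeled here.

```python
def scale_ingredients(ingredients, base_servings, target_servings):
    """Scale every quantity by target_servings / base_servings.

    `ingredients` maps an ingredient name to a (quantity, unit) pair. This
    layout is an assumption for illustration, not the paper's data model.
    """
    if base_servings <= 0 or target_servings <= 0:
        raise ValueError("servings must be positive")
    factor = target_servings / base_servings
    return {
        name: (quantity * factor, unit)
        for name, (quantity, unit) in ingredients.items()
    }


# Hypothetical recipe written for 2 servings, rescaled to 3 servings.
recipe = {"flour": (200, "g"), "egg": (2, "pcs")}
print(scale_ingredients(recipe, base_servings=2, target_servings=3))
```

Units are passed through unchanged. Converting between weighing units would need a conversion table that the abstract does not describe.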

Research on Designing Korean Emotional Dictionary using Intelligent Natural Language Crawling System in SNS (SNS대상의 지능형 자연어 수집, 처리 시스템 구현을 통한 한국형 감성사전 구축에 관한 연구)

  • Lee, Jong-Hwa
    • The Journal of Information Systems / v.29 no.3 / pp.237-251 / 2020
  • Purpose: This research studied a hierarchical Hangul emotion index by organizing the emotions expressed by SNS users. In a preliminary study by the researcher, Plutchik's (1980) English-based emotional standard was reinterpreted in Korean, and hashtags with implicit meaning on SNS were studied. To build a multidimensional emotion dictionary and classify three-dimensional emotions, emotion seeds were selected to compose seven emotion sets, and an emotion word dictionary was constructed by collecting SNS hashtags derived from each emotion seed. We also explore the priority of each Hangul emotion index. Design/methodology/approach: The words constituting each sentence were vectorized into a matrix, with weights extracted using TF-IDF (Term Frequency-Inverse Document Frequency), and the NMF (Nonnegative Matrix Factorization) algorithm was used for dimension reduction of the matrix within each emotion set. The emotional dimension was resolved using the characteristic values of the emotion words. The cosine distance algorithm was used to measure the distance between vectors, and thus the similarity of emotion words within an emotion set. Findings: Customer needs analysis is the ability to read changes in emotions, and research on Korean emotion words serves those customer needs. In addition, the ranking of emotion words within an emotion set will be a useful criterion for reading the depth of an emotion. We believe that the sentiment index developed in this research will, by providing companies with effective information for emotional marketing, expand and add value to new business opportunities. In addition, if the emotion dictionary is eventually connected to the emotional DNA of a product, it will be possible to define the "emotional DNA", the set of emotions that the product should have.
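
To make the TF-IDF weighting and cosine distance named in the abstract above concrete, here is a minimal pure-Python sketch. It is not the author's implementation: the smoothed IDF form `log(N / df) + 1`, the function names, and the toy English hashtag lists are all assumptions for illustration, and the NMF dimension-reduction step is left out.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return (vocab, vectors) for a list of token lists.

    Uses tf = count / document length and idf = log(N / df) + 1; this
    particular weighting variant is an assumption, not taken from the paper.
    """
    vocab = sorted({token for doc in docs for token in doc})
    n_docs = len(docs)
    doc_freq = Counter(token for doc in docs for token in set(doc))
    idf = {token: math.log(n_docs / doc_freq[token]) + 1.0 for token in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero on an empty document
        vectors.append([counts[token] / length * idf[token] for token in vocab])
    return vocab, vectors


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cosine_distance(u, v):
    return 1.0 - cosine_similarity(u, v)


# Toy hashtag sets standing in for posts collected from emotion seeds.
docs = [
    ["joy", "happy", "smile"],
    ["joy", "happy", "laugh"],
    ["sad", "tears", "lonely"],
]
vocab, vectors = tf_idf_vectors(docs)
print(cosine_distance(vectors[0], vectors[1]))  # shares two tags: smaller distance
print(cosine_distance(vectors[0], vectors[2]))  # shares no tags: distance 1.0
```

In practice a library such as scikit-learn would typically handle both the weighting and the factorization. The hand-rolled version is shown only to keep each step visible.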

Developing a Freeway Flow Management Scheme Under Ubiquitous System Environments (유비쿼터스 환경에서의 연속류 적정속도 관리 기술 개발)

  • Park, Eun-Mi;Seo, Ui-Hyeon;Go, Myeong-Seok;O, Hyeon-Seon
    • Journal of Korean Society of Transportation / v.28 no.4 / pp.167-175 / 2010
  • Ubiquitous transportation system environments make it possible to collect each vehicle's position and velocity data and to perform more sophisticated traffic flow management at the individual vehicle or platoon level through vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication. A traffic flow management scheme is needed to take advantage of these environments. This paper proposes an algorithm that advises the optimal speed for each vehicle according to the traffic flow condition. The algorithm aims to stabilize traffic flow by advising the equilibrium speed to vehicles that are speeding or crawling under free-flow conditions. It also aims to prevent, or at least alleviate, shockwave propagation by advising an optimal speed that dampens the speed drop under critical flow conditions. A simulation testbed is built and simulation experiments are performed for the proposed algorithm. The proposed algorithm shows the expected results in terms of travel time reduction and congestion alleviation.

A Technique for Product Effect Analysis Using Online Customer Reviews (온라인 고객 리뷰를 활용한 제품 효과 분석 기법)

  • Lim, Young Seo;Lee, So Yeong;Lee, Ji Na;Ryu, Bo Kyung;Kim, Hyon Hee
    • KIPS Transactions on Software and Data Engineering / v.9 no.9 / pp.259-266 / 2020
  • In this paper, we propose a novel scheme for product effect analysis, termed PEM, which uses online customer reviews to determine the effectiveness of products intended to improve a current condition, such as health supplements and cosmetics. The proposed technique preprocesses online customer reviews to remove advertisements automatically, constructs a word dictionary composed of symptoms, effects, increases, and decreases, and measures products' effects from online customer reviews. Using Naver Shopping Review datasets collected through crawling, we evaluated the performance of PEM against two methods, one using a traditional sentiment dictionary and one using an RNN model. Our experimental results show that the proposed technique outperforms the other two methods. In addition, by applying the proposed technique to online customer reviews on atopic dermatitis and acne, we identified effective treatments for them that appeared on online social media. The proposed product effect analysis technique can be applied to various products and social media because it can score the effect of products from reviews in various media, including blogs.