• Title/Abstract/Keyword: data crawling

Search results: 195 items (processing time: 0.022 seconds)

Product Recommendation System based on User Purchase Priority

  • Bang, Jinsuk;Hwang, Doyeun;Jung, Hoekyung
    • Journal of information and communication convergence engineering / Vol. 18, No. 1 / pp.55-60 / 2020
  • As personalized customer services create a society that emphasizes the personality of an individual, the number of product reviews and quantity of user data generated by users on the internet in mobile shopping apps and sites are increasing. Such product review data are classified as unstructured data. Unstructured data have the potential to be transformed into information that companies and users can employ, using appropriate processing and analyses. However, existing systems do not reflect the detailed information they collect, such as user characteristics, purchase preference, or purchase priority while analyzing review data. Thus, it is challenging to provide customized recommendations for various users. Therefore, in this study, we have developed a product recommendation system that takes into account the user's priority, which they select, when searching for and purchasing a product. The recommendation system then displays the results to the user by processing and analyzing their preferences. Since the user's preference is considered, the user can obtain results that are more relevant.

Development of A Uniform And Casual Clothing Recognition System For Patient Care In Nursing Hospitals

  • Yun, Ye-Chan;Kwak, Young-Tae
    • 한국컴퓨터정보학회논문지 / Vol. 25, No. 12 / pp.45-53 / 2020
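  • The purpose of this study is to reduce the incidence of safety accidents among the elderly that can occur in nursing hospitals. Specifically, it distinguishes whether a person approaching a hazardous area belongs to the elderly (patient clothing) group or the staff (casual clothing) group, based on the clothing visible on CCTV. Base data were collected using a Web Crawling technique and with support from a nursing hospital. Model training data were then created using an Image Generator and Labeling. Because of the limited performance of CCTV, it was difficult to build a model with both high accuracy and high speed. Therefore, a ResNet model, which is relatively better in accuracy, and a YOLO3 model, which is relatively better in speed, were each implemented, so that a nursing hospital can choose the model that suits its circumstances. As a result, models were implemented that can distinguish patient clothing from casual clothing with adequate accuracy. In actual use, they are therefore expected to contribute to reducing safety accidents in nursing hospitals by keeping the elderly from approaching hazardous areas.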

A Scientific Quantitative Analysis on Vegetables of Joseon Dynasty using the Joseonwangjoshilrok based Data

  • 김미혜
    • 한국식생활문화학회지 / Vol. 36, No. 2 / pp.143-157 / 2021
  • This study aimed to analyze the periodic prevalence of vegetables during the Joseon era with the JoseonWangjoSilrok as a reference. The JoseonWangjoSilrok articles were collected from the Guksapyeonchanwewonhwe site, using web-crawling techniques to extract the relevant information. Out of 384,582 search results, 9,560 articles with vegetable-related keywords were found. According to the annual average number of vegetable records during the reigns of the various kings, there were two peaks, in 15th- and 18th-century Joseon. The counts found were: 2,750 in the 18th century, 2,529 in the 15th century, 1,424 in the 16th century, and 1,018 in the 19th century. A Variable Interest Index was designed to ascertain the interest in vegetables of the 27 Joseon kings. The king most interested in vegetables was the 19th king, Sookjong. The second most interested king was Youngjo. There were 5,105 vegetable-related findings within the JoseonWangjoSilrok related to specific species and categories of vegetables. Among the words found: 1,194 were stem-leaves vegetables (23.39%), 1,017 were root vegetables (19.92%), 1,148 were flower-fruit vegetables (22.49%), 1,144 were spice vegetables (22.41%), 95 were mushrooms (1.86%), and 507 were seaweeds (9.93%). Statistical analysis using ANOVA revealed the chronological factors that affected the vegetables' prevalence index.
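
    The category shares reported in this abstract can be recomputed from the raw counts. The snippet below is only an arithmetic check on the figures quoted above, not part of the study's own analysis; the variable names are illustrative.

```python
# Category counts as quoted in the abstract (5,105 findings in total).
counts = {
    "stem-leaves vegetables": 1194,
    "root vegetables": 1017,
    "flower-fruit vegetables": 1148,
    "spice vegetables": 1144,
    "mushrooms": 95,
    "seaweeds": 507,
}

total = sum(counts.values())

# Share of each category, in percent, rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} ({pct:.2f}%)")
print(f"total: {total}")
```

    The counts sum to 5,105 and each share matches the percentage given in the abstract.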

A Study on Usage Frequency of Translated English Phrase Using Google Crawling

  • Kim, Kyuseok;Lee, Hyunno;Lim, Jisoo;Lee, Sungmin
    • 한국정보처리학회:학술대회논문집 / 2020 Fall Conference of the Korea Information Processing Society / pp.689-692 / 2020
  • People have long studied English with online English dictionaries, looking up the meanings of English words or example sentences. These days, as AI technologies such as machine learning have developed, documents can be translated in real time with Kakao, Papago, Google, and other translators. However, there are still some problems with translation accuracy. AI assistants can be used for real-time interpreting, so such systems are being used to translate web pages and papers into Korean. In this paper, we investigated the usage frequency of combined English phrases from dictionaries by analyzing the number of search results on Google. With the results of this paper, we expect to help people use English more fluently.

A Keyword-Based Big Data Analysis for Individualized Health Activity: Focusing on Methodological Approach

  • 김한별;배근표;허준호
    • 한국정보처리학회:학술대회논문집 / 2017 Spring Conference of the Korea Information Processing Society / pp.540-543 / 2017
  • The emerging Big Data used across the 21st-century global digital economy will make it possible to solve some of the major issues in our society and economy. One of the main areas where Big Data can be quite useful is medicine and health. Information technology is used extensively in this area, and its range of applications is expected to expand further. However, there is still room for improvement in the use of Big Data, because it is difficult to search the unstructured data it contains and to collect statistics on them. This limits wider application of Big Data. The results obtained from Big Data can vary depending on the data collection and analysis methods. Because such results can be positive or negative, it is essential that Big Data be handled appropriately for the purpose at hand. Therefore, in this study a Big Data set was constructed by applying a crawling technique for data mining and was analyzed with R. The data were also visualized for easier recognition, which was effective in developing an individualized health plan from different angles.

Korean Collective Intelligence in Sharing Economy Using R Programming: A Text Mining and Time Series Analysis Approach

  • 김재원;윤유동;정유진;김기연
    • 인터넷정보학회논문지 / Vol. 17, No. 5 / pp.151-160 / 2016
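  • The purpose of this study is to investigate trends in the popular cultural and social perception, that is, the collective intelligence, that contemporary Koreans hold regarding the keyword "sharing economy," which has recently drawn attention from the perspective of the creative economy or social economy. To this end, this study applies text mining techniques from a big data analysis perspective to discover and understand objective, visible yearly changes and patterns in socio-cultural collective intelligence over the most recent five years. Using crawling techniques and googling on the World Wide Web, we collected the time-series web metadata needed for the analysis from a considerable amount of existing literature on the sharing economy accumulated from 2010 to 2014. As a result, a large amount of unprocessed raw data related to the sharing economy keyword was analyzed and processed through R programming into more meaningful and valuable graphs or figures in the form of "word clouding." Although the data and collective intelligence accumulated on the sharing economy are still quantitatively insufficient at this point, this study is meaningful as an early study that performed time-series big data analysis from a knowledge processing perspective. The results of this study can therefore provide academic implications as primary data for follow-up research on sharing economy market analysis and consumer behavior in industry and academia.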

A Comparison of Starbucks between South Korea and U.S.A. through Big Data Analysis

  • 조아라;김학선
    • 한국조리학회지 / Vol. 23, No. 8 / pp.195-205 / 2017
  • The purpose of this study was to compare Starbucks in South Korea with Starbucks in the U.S.A. through semantic network analysis of big data. Online data were collected with SCTM (Smart Crawling & Text Mining), a data collecting and processing program developed by the big data research institute at Kyungsung University. The data collection period was from January 1st 2014 to December 7th 2017, and UCINET 6.0 with its packaged NetDraw was used for data analysis and visualization. After performing CONCOR (convergence of iterated correlation) analysis and centrality analysis, this study illustrated the current characteristics of Starbucks in Korea and the U.S.A. as reflected by the social network, and the differences between the two countries. Since Starbucks has grown greatly, especially in Korea, this study also aimed to provide significant, social-network-oriented suggestions for Starbucks USA, Starbucks Korea, and the coffee industry as a whole. This study also revealed that big data analytics can generate new insights into variables that have been extensively studied in existing hospitality literature. In addition, implications for theory and practice as well as directions for future research are discussed.

An Automatic Urban Function District Division Method Based on Big Data Analysis of POI

  • Guo, Hao;Liu, Haiqing;Wang, Shengli;Zhang, Yu
    • Journal of Information Processing Systems / Vol. 17, No. 3 / pp.645-657 / 2021
  • Along with the rapid development of the economy, the urban scale has extended rapidly, leading to the formation of different types of urban function districts (UFDs), such as central business, residential and industrial districts. Recognizing the spatial distributions of these districts is of great significance to manage the evolving role of urban planning and further help in developing reliable urban planning programs. In this paper, we propose an automatic UFD division method based on big data analysis of point of interest (POI) data. Considering that the distribution of POI data is unbalanced in a geographic space, a dichotomy-based data retrieval method was used to improve the efficiency of the data crawling process. Further, a POI spatial feature analysis method based on the mean shift algorithm is proposed, where data points with similar attributive characteristics are clustered to form the function districts. The proposed method was thoroughly tested in an actual urban case scenario and the results show its superior performance. Further, the suitability of fit to practical situations reaches 88.4%, demonstrating a reasonable UFD division result.
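
    To illustrate the kind of clustering this abstract refers to, here is a minimal flat-kernel mean shift sketch on 2-D points. It is not the authors' implementation: the coordinates in `pois`, the `bandwidth` value, the mode-merging rule, and the function name `mean_shift` are all assumptions made for illustration, and the paper's feature analysis of POI attributes is not reproduced.

```python
import math

def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift: shift each point to the mean of the data
    points within `bandwidth` until it stops moving, then merge the
    resulting modes that lie close together."""
    modes = []
    for x, y in points:
        for _ in range(max_iter):
            # Data points within the bandwidth radius of the current position.
            near = [q for q in points if math.dist((x, y), q) <= bandwidth]
            if not near:  # defensive guard against floating-point edge cases
                break
            nx = sum(q[0] for q in near) / len(near)
            ny = sum(q[1] for q in near) / len(near)
            shift = math.dist((x, y), (nx, ny))
            x, y = nx, ny
            if shift < tol:
                break
        modes.append((x, y))

    # Merge modes closer than half the bandwidth into one cluster center
    # (an illustrative merging rule) and label each input point.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels

# Hypothetical POI coordinates forming two well-separated groups.
pois = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, labels = mean_shift(pois, bandwidth=1.0)
print(labels)
print(centers)
```

    Mean shift does not need the number of clusters in advance; the bandwidth controls how coarse the resulting districts are.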

Water leakage accident analysis of water supply networks using big data analysis technique

  • 홍성진;유도근
    • 한국수자원학회논문집 / Vol. 55, No. spc1 / pp.1261-1270 / 2022
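  • The purpose of this study is to collect and analyze information on water supply leakage, which is not easy to access or use, by drawing on news search results from portals that people can easily reach. A web crawling technique was applied to extract big data news on leakage accidents in water supply systems, and a step-by-step algorithm was presented for obtaining accurate leakage accident news. In addition, a data analysis technique suited to water supply leakage accident information was developed so that additional information, such as the date and time of occurrence, damage impact, location, cause of damage, and damaged facility, can be obtained from the extracted leakage accident articles, and the results of applying it were presented. The primary goal of the big-data-based leakage analysis proposed in this study is to extract meaningful value through comparison with existing water supply statistics. The analysis results can be used in future leakage accident response to react effectively to consumer responses or to determine service levels. In other words, presenting these analysis results shows the need to make information such as accidents better known to the public, and the results can be used in establishing a dissemination and response system that enables rapid action when an accident occurs.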

Proposal of Brand Evaluation Map through Big Data: Focus on The Hyundai Motor's Product Evaluation

  • 윤대명;이용혁;이봉규
    • 한국IT서비스학회지 / Vol. 19, No. 4 / pp.1-11 / 2020
  • Through text mining, sentiment analysis, and semiotic analysis, this study aims to reinterpret the meaning of users' emotional words and related words in order to derive strategic elements of brand and design. After selecting a local car manufacturer whose brand is a clear topic of user opinion, we web-crawl the comments about the manufacturer's cars that users have written online. We then analyze the extracted morphemes and their associated words and convert them to fit the marketing mix theory. Through this process, we propose a methodology that lets brand elements associated with negative sentiment be supplemented and improved, elements associated with positive sentiment be carried forward, and brands be managed reasonably. In particular, the Map presented in this study is considered fully usable as information for overall brand management.