• Title/Abstract/Keywords: Web-Crawling

Search results: 175 items; processing time: 0.033 seconds

키워드의 유사도와 가중치를 적용한 연관 문서 추천 방법 (Method of Related Document Recommendation with Similarity and Weight of Keyword)

  • 임명진;김재현;신주현
    • 한국멀티미디어학회논문지 / Vol. 22, No. 11 / pp.1313-1323 / 2019
  • With the development of the Internet and the spread of smartphones, services designed around user convenience are increasing, and users can check news in real time anytime and anywhere. However, online news is organized by media outlet and category and offers only a few related search terms, which makes it difficult to find news related to a given keyword. To solve this problem, we propose a method that recommends related documents more accurately by applying Doc2Vec similarity to the specific keywords of news articles and weighting the title and contents of news articles. We collect news articles from the Naver politics category by web crawling in a Java environment, preprocess them, extract topics using LDA modeling, and find similarities using Doc2Vec. To supplement Doc2Vec, we apply TF-IDF to obtain TC (Title Contents) weights for the title and contents of news articles. We then combine the Doc2Vec similarity and the TC weight to generate a TC weight-similarity, and we evaluate the similarity between words using the PMI technique to confirm keyword association.
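
The PMI step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes document-level co-occurrence (two words co-occur when they appear in the same article) and log base 2; the abstract does not state which window or base the authors used, and the function name `pmi_scores` and the toy token lists are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(docs):
    """Document-level PMI (log base 2) for every word pair that co-occurs."""
    n = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    scores = {}
    for (a, b), joint in pair_df.items():
        p_ab = joint / n
        p_a = word_df[a] / n
        p_b = word_df[b] / n
        # PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) )
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Toy tokenized articles (illustrative only, not data from the paper).
docs = [
    ["election", "party", "vote"],
    ["election", "vote"],
    ["party", "budget"],
    ["budget", "tax"],
]
scores = pmi_scores(docs)
```

Pairs are keyed in alphabetical order, so `scores[("election", "vote")]` is positive when the two words appear together more often than independence would predict, and a pair that never co-occurs is simply absent from the result.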

모바일 환경에서 실시간 악성코드 URL 탐지 및 차단 연구 (A Study of Realtime Malware URL Detection & Prevention in Mobile Environment)

  • 박재경
    • 한국컴퓨터정보학회논문지 / Vol. 20, No. 6 / pp.37-42 / 2015
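  • To detect and block malware damage in real time, this paper aims to provide a safer mobile environment by storing a database of malicious links on the mobile device and controlling web services through a malicious-link detection engine. Malware in mobile environments has recently become as rampant as in PC environments and poses a new threat. This matters all the more because, given the nature of mobile devices, malware damage leads directly to financial loss for users. Much research is under way on how to prevent such cybercrime and block it in real time, but it remains at a rudimentary level. In addition, we propose a way to detect and block smishing delivered via SMS or MMS. Going forward, mobile carriers should establish fundamental countermeasures based on this study to build a safe mobile environment.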

온라인 소셜네트워크를 통한 한국인의 정치성향 예측 기법의 연구 (A Study on Political Attitude Estimation of Korean OSN Users)

  • 무하마드 에카 위자야;안희준
    • 한국산업정보학회논문지 / Vol. 21, No. 4 / pp.1-11 / 2016
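  • This study developed an analysis model and program for predicting political orientation from Facebook users' Like activity. We developed a Facebook crawler that accounts for Facebook's use of Ajax, and a category-level filtering technique that effectively reduces the correlation matrix of the sparse, massive data collected with it. An LCA (latent class analysis) of South Korean users confirmed that users' political polarity can be predicted fairly accurately (AUC of 0.82) from 28 criteria (less than 3% of all target pages).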
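
For readers unfamiliar with the AUC figure reported above, the sketch below computes AUC by its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The labels and scores are made-up toy values, and the function name `auc` is hypothetical; this is not the authors' evaluation code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: 1 = one polarity, 0 = the other (illustrative values only).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]
result = auc(labels, scores)
```

Under this reading, an AUC of 0.82 means that a randomly chosen user of one polarity is scored above a randomly chosen user of the other polarity about 82% of the time.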

한국치위생학회지 게재논문의 피인용수에 영향을 미친 요인 (Factors affecting the number of citations in papers published in the Journal of Korean Society of Dental Hygiene)

  • 전세정
    • 한국치위생학회지 / Vol. 21, No. 5 / pp.639-644 / 2021
  • Objectives: The purpose of this study was to analyze the factors that affected the number of citations for articles published in the Journal of Korean Society of Dental Hygiene based on previous studies. Methods: Information on papers, including the number of citations, was collected using a web crawling technique. The effects of the number of author keywords, the number of Medical Subject Headings (MeSH) keywords, the MeSH match rate, the abstract word count, and the keyword-abstract ratio on the number of citations were analyzed by multiple regression analysis. Results: The use of MeSH keywords did not have a significant effect on the number of citations. Among the other factors, only the keyword-abstract ratio was statistically significant. Conclusions: Select a topic of constant interest in the field, write the title in detail using colons or asterisks if necessary, and do not repeat the words used in the title in the keywords. Select specific keywords deeply related to the topic. In particular, choose words or phrases that are frequently used in the abstract. If the MeSH keyword selection contradicts the previous strategies, boldly give up the MeSH keyword.

시간에 따라 변화하는 빗줄기 장면을 이용한 딥러닝 기반 비지도 학습 빗줄기 제거 기법 (Deep Unsupervised Learning for Rain Streak Removal using Time-varying Rain Streak Scene)

  • 조재훈;장현성;하남구;이승하;박성순;손광훈
    • 한국멀티미디어학회논문지 / Vol. 22, No. 1 / pp.1-9 / 2019
  • Single image rain removal is a typical inverse problem that decomposes an image into a background scene and a rain streak layer. Recent works have made substantial progress on the task thanks to the development of convolutional neural networks (CNNs). However, existing CNN-based approaches train the network with synthetically generated training examples, which tend to bias the network toward synthetic scenes. In this paper, we present an unsupervised framework for removing rain streaks from real-world rainy images. We focus on the natural phenomenon that static rainy scenes capture a common background but different rain streaks. From this observation, we train a Siamese network with real rain image pairs so that it outputs identical backgrounds from each pair. To train our network, a real rainy dataset is constructed via web crawling. We show that our unsupervised framework outperforms recent CNN-based approaches that are trained in a supervised manner. Experimental results demonstrate the effectiveness of our framework on both synthetic and real-world datasets, showing improved performance over previous approaches.

빅데이터 분석을 통한 조선시대 과실류 특성 연구 (A Study on Fruits Characteristics of the Chosen Dynasty through the Analysis of Chosenwangjoeshirok Big Data)

  • 김미혜
    • 한국식생활문화학회지 / Vol. 36, No. 2 / pp.168-183 / 2021
  • Using big data analysis of the Choseonwangjosilrok, this research aimed to identify the types of fruits, their prevalence, and their seasonal appearances, as well as the royalty's perspective on fruits during the Choseon period. The Choseonwangjosilrok included nineteen kinds of fruits and five kinds of nuts, totaling 1,601 cases (72.8%) and 533 cases (24.2%), respectively. The text recorded fruits being used as tributes for kings, gifts from kings to palace officials, tomb offerings, county specialties, trade goods or gifts to foreign ambassadors, and medicinal ingredients in oriental pharmacy. The fruits were distributed evenly across the seasons. Chronologically, the number of mentions decreased over time: from the fifteenth to the nineteenth century, fruits appeared 804 times (36.6%), 578 times (26.3%), 490 times (22.3%), 248 times (11.3%), and 78 times (3.5%), respectively. In the fifteenth century, citrons, quinces, pomegranates, cherries, persimmons, watermelons, Korean melons, omija, walnuts, chestnuts, and pine nuts appeared most frequently. In the sixteenth century, pears, grapes, apricots, peaches, and hazelnuts appeared most frequently. In the seventeenth century, tangerines and dates appeared most frequently. In the eighteenth century, trifoliate orange was the most frequently mentioned fruit.
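
The percentages quoted above can be checked with simple arithmetic. The sketch below assumes the denominator is the sum of the five century counts (2,198); the abstract does not state the total explicitly, so this is an inference, though the fruit and nut shares (72.8% and 24.2%) turn out to be consistent with the same total.

```python
# Mentions per century as quoted in the abstract.
counts = {"15th": 804, "16th": 578, "17th": 490, "18th": 248, "19th": 78}

# Assumption: the total is the sum of the per-century counts.
total = sum(counts.values())

# Share of each century, rounded to one decimal place as in the abstract.
shares = {century: round(100 * n / total, 1) for century, n in counts.items()}

# The same total also reproduces the fruit and nut shares.
fruit_share = round(100 * 1601 / total, 1)
nut_share = round(100 * 533 / total, 1)
```

Fruits and nuts together account for 2,134 of the 2,198 cases, so about 2.9% of cases fall outside both groups; the abstract does not say what those remaining cases are.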

데이터 마이닝을 활용한 외식업체의 평점에 영향을 미치는 선행 요인 (A Study on Key Factors Influencing Customers' Ratings of Restaurants by Using Data Mining Method)

  • 김선주;김병수
    • 한국정보시스템학회지:정보시스템연구 / Vol. 31, No. 2 / pp.1-18 / 2022
  • Purpose: Customer reviews are a major factor in choosing a restaurant. This study investigates the key factors affecting customers' evaluations of restaurants. With the recent intensification of competition among restaurants in the service industry, the analysis results are expected to provide in-depth insights for enhancing customer experiences. Design/methodology/approach: We collected the information and reviews provided for restaurants on the Kakao Map platform. The data cover 3,785 restaurants in Daegu registered on Kakao Map. From the collected information, seven independent variables were used: number of ratings registered, number of reviews, presence or absence of safe restaurants, presence or absence of a posting about holding facilities, presence or absence of a posting about business hours, presence or absence of a posting about hashtags, and presence or absence of break times. The dependent variable is the restaurant rating. Multiple regression between the independent variables and the restaurant rating was carried out. Findings: The results confirmed that the number of ratings registered, the presence or absence of a posting about business hours, and the presence or absence of a posting about hashtags have a positive effect on the restaurant rating. The number of reviews had a negative effect on the restaurant rating. In addition, to confirm the role of customer reviews, we carried out LDA topic modeling and divided the topics into positive reviews and negative reviews.
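
As a rough illustration of the multiple regression described above, the sketch below fits ordinary least squares by solving the normal equations with Gauss-Jordan elimination, using only the standard library. The two predictors (a rating count and a 0/1 flag for posted business hours) and all numbers are made-up toy values, and the function name `ols` is hypothetical; the abstract does not say which software the authors used.

```python
def ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Toy data: (number of ratings, business hours posted 0/1) -> rating.
X = [(10, 0), (20, 1), (30, 0), (40, 1), (50, 1)]
y = [3.1, 3.7, 3.3, 3.9, 4.0]
intercept, b_ratings, b_hours = ols(X, y)
```

A positive coefficient means the rating tends to rise with that variable when the others are held fixed, which is how the abstract's "positive effect" and "negative effect" statements are read. For real data, a statistics package that also reports standard errors and p-values would be the usual choice.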

조현병 관련 주요 일간지 기사에 대한 텍스트 마이닝 분석 (Text-Mining Analyses of News Articles on Schizophrenia)

  • 남희정;류승형
    • 대한조현병학회지 / Vol. 23, No. 2 / pp.58-64 / 2020
  • Objectives: In this study, we conducted an exploratory analysis of current media trends on schizophrenia using text-mining methods. Methods: First, web-crawling techniques were used to extract text data from 575 news articles published in 10 major newspapers between 2018 and 2019, which were selected by searching "schizophrenia" in Naver News. We built a document-term matrix (DTM) and/or term-document matrix (TDM) through pre-processing techniques. Using the DTM and TDM, we conducted frequency analysis, co-occurrence network analysis, and topic model analysis. Results: Frequency analysis showed that keywords such as "police," "mental illness," "admission," "patient," "crime," "apartment," "lethal weapon," "treatment," "Jinju," and "residents" were frequently mentioned in news articles on schizophrenia. Within the article text, many of these keywords were highly correlated with the term "schizophrenia" and were also interconnected with each other in the co-occurrence network. The latent Dirichlet allocation model presented 10 topics comprising a combination of keywords: "police-Jinju," "hospital-admission," "research-finding," "care-center," "schizophrenia-symptom," "society-issue," "family-mind," "woman-school," and "disabled-facilities." Conclusion: The results of the present study highlight that in recent years, the media has been reporting violence in patients with schizophrenia, thereby raising an important issue of hospitalization and community management of patients with schizophrenia.
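
The DTM, frequency, and co-occurrence steps named above can be sketched with the standard library alone. This is a minimal illustration under assumptions: documents are already tokenized, co-occurrence is counted at the document level, and the function names and toy documents are hypothetical rather than taken from the study.

```python
from collections import Counter
from itertools import combinations


def build_dtm(docs):
    """Return (vocab, rows), where rows[i][j] counts term j in document i."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc:
            row[index[tok]] += 1
        rows.append(row)
    return vocab, rows


def term_frequencies(vocab, rows):
    """Column sums of the DTM: total occurrences of each term."""
    return {term: sum(row[j] for row in rows) for j, term in enumerate(vocab)}


def cooccurrence(vocab, rows):
    """Count the documents in which each pair of terms appears together."""
    pairs = Counter()
    for row in rows:
        present = [vocab[j] for j, count in enumerate(row) if count > 0]
        pairs.update(combinations(present, 2))
    return pairs


# Toy tokenized articles (illustrative only).
docs = [
    ["police", "patient", "crime"],
    ["patient", "treatment", "patient"],
    ["police", "crime"],
]
vocab, dtm = build_dtm(docs)
tdm = [list(col) for col in zip(*dtm)]  # the TDM is the transpose of the DTM
freq = term_frequencies(vocab, dtm)
pairs = cooccurrence(vocab, dtm)
```

The pair counts can serve as edge weights in a co-occurrence network, with terms as nodes.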

Research on the Drinking Culture of the Choseon dynasty's Ruling Class using Semantic Network Analysis

  • Mi-Hye, Kim;Yeon-Hee, Kim
    • 셀메드 / Vol. 13, No. 2 / pp.3.1-3.21 / 2023
  • In this study, the drinking culture of the Choseon dynasty is examined by applying text frequency analysis to the entire 『Choseonwangjosilok (朝鮮王朝實錄)』. A total of 1,968 volumes and 948 books covering the 27 kings of Choseon, spanning 518 years, were examined through web crawling of the National Institute of Korean History website. Python 3.8 was used to extract sentences related to alcohol, Rhino 1.4.5 was used for morphological analysis to extract nouns, and Gephi 0.9.2 was used for semantic network analysis. The results of the analysis of alcohol culture in the 『Choseonwangjosilok (朝鮮王朝實錄)』 are as follows. The alcoholic beverages recorded were more often those used at court or in ritual ceremonies than those defined by specific ingredients or manufacturing methods common among the general public. Semantic network analysis of the ruling class in the 『Choseonwangjosilok (朝鮮王朝實錄)』 found that drinking in the Choseon dynasty was highly associated with political issues related to maintaining power relations within the royal court system. At times, alcohol was used to maintain personal relationships, while at other times it was seen as an essential item in state ceremonies. It was also used as a highly political means to maintain and strengthen national power.

Development of Dataset Items for Commercial Space Design Applying AI

  • Jung Hwa SEO;Segeun CHUN;Ki-Pyeong, KIM
    • 한국인공지능학회지 / Vol. 11, No. 1 / pp.25-29 / 2023
  • The purpose of this paper is to create a standard for AI training dataset types for commercial space design. As the market for space design continues to grow and people spend more time indoors after COVID-19, interest in space is expanding throughout society. In addition, more and more consumers are getting used to the digital environment. Therefore, if you identify trends and preemptively propose the atmosphere and specifications that customers require quickly and easily, you can increase customer trust and conduct effective sales. As for the dataset type, commercial districts were divided into a total of 8 categories, and images that could be processed were derived by refining 4,009,30MB JPG format images collected through web crawling. Then, by performing bounding and labeling operations, we developed a 'Dataset for AI Training' of 3,356 commercial space image data in CSV format with a size of 2.08MB. Through this study, elements of spatial images such as place type, space classification, and furniture can be extracted and used when developing AI algorithms, and it is expected that images requested by clients can be collected easily and quickly through spatial image input information.