• Title/Summary/Keyword: Crawler

Search Results: 199

Reviews Analysis of Korean Clinics Using LDA Topic Modeling (토픽 모델링을 활용한 한의원 리뷰 분석과 마케팅 제언)

  • Kim, Cho-Myong;Jo, A-Ram;Kim, Yang-Kyun
    • The Journal of Korean Medicine
    • /
    • v.43 no.1
    • /
    • pp.73-86
    • /
    • 2022
  • Objectives: In the health care industry, the influence of online reviews is growing. Because medical services are delivered mainly by providers, those services have been managed by hospitals and clinics; however, direct promotion of medical services by providers is legally forbidden. For this reason, consumers such as patients and clients search through many reviews on the Internet to obtain information about hospitals, treatments, prices, and so on. Online reviews can thus be seen as an indicator of hospital quality, and they should be analyzed for sustainable hospital marketing. Method: Using a Python-based crawler, we collected more than 14,000 reviews written by real patients who had experienced Korean medicine. To extract the most representative words, the reviews were divided into positive and negative sets and then pre-processed to retain only nouns and adjectives, from which TF (Term Frequency), DF (Document Frequency), and TF-IDF (Term Frequency-Inverse Document Frequency) were computed. Finally, the aggregated words were analyzed with LDA (Latent Dirichlet Allocation) to extract topics; to avoid overlap, the number of topics was set using the LDAvis visualization. Results and Conclusions: The LDA topic model extracted six topics from the positive reviews and three from the negative reviews. The main factors composing the topics were 1) responses to patients and customers, 2) customized treatment (consultation) and management, and 3) the hospital/clinic's environment.
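The TF / DF / TF-IDF stage this abstract describes can be sketched with Python's standard library alone; the sample reviews and whitespace tokenization below are hypothetical stand-ins for the paper's 14,000+ crawled Korean reviews and its noun/adjective extraction:

```python
import math
from collections import Counter

# Hypothetical stand-in reviews; the paper used >14,000 crawled Korean reviews.
reviews = [
    "kind doctor detailed consultation kind staff",
    "clean clinic comfortable waiting room",
    "detailed consultation effective treatment",
]

docs = [r.split() for r in reviews]  # naive tokenization stands in for POS filtering
N = len(docs)

# DF: in how many documents each term appears
df = Counter()
for d in docs:
    df.update(set(d))

# TF-IDF for one document: TF * log(N / DF)
def tf_idf(doc):
    tf = Counter(doc)
    return {term: tf[term] * math.log(N / df[term]) for term in tf}

scores = tf_idf(docs[0])
# "kind" occurs twice and only in this document, so it gets the top score
print(max(scores, key=scores.get))  # kind
```

In practice the resulting term weights would feed an LDA model (e.g. via gensim or scikit-learn), with the topic count tuned visually as the abstract describes.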

HTML Text Extraction Using Tag Path and Text Appearance Frequency (태그 경로 및 텍스트 출현 빈도를 이용한 HTML 본문 추출)

  • Kim, Jin-Hwan;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.12
    • /
    • pp.1709-1715
    • /
    • 2021
  • In order to accurately extract the necessary text from a web page, the conventional approach of telling the web crawler which tag and style attributes hold the main content has a problem: the extraction logic must be modified whenever the web page layout changes. To solve this problem, a previous study proposed extracting the text by analyzing the appearance frequency of the text, but its performance varied widely depending on the channel from which the web pages were collected. Therefore, in this paper, we propose a method that extracts text with high accuracy from various collection channels by analyzing not only the appearance frequency of the text but also the parent tag paths of the text nodes extracted from the DOM tree of the web page.
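The tag-path idea can be illustrated with a small standard-library sketch: group text nodes by the chain of parent tags leading to them, then keep the path whose nodes carry the most text. The scoring rule here is a simplification, not the paper's exact algorithm, which also weighs text appearance frequency:

```python
from html.parser import HTMLParser
from collections import defaultdict

class TagPathCollector(HTMLParser):
    """Collects text nodes keyed by their parent tag path in the DOM."""
    def __init__(self):
        super().__init__()
        self.path = []                    # current stack of open tags
        self.by_path = defaultdict(list)  # tag path -> list of text nodes

    def handle_starttag(self, tag, attrs):
        self.path.append(tag)

    def handle_endtag(self, tag):
        if self.path and self.path[-1] == tag:
            self.path.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.by_path["/".join(self.path)].append(text)

page = """
<html><body>
  <div><span>menu</span></div>
  <div><p>First body paragraph.</p><p>Second body paragraph.</p></div>
</body></html>
"""

c = TagPathCollector()
c.feed(page)
# Pick the tag path whose text nodes carry the most characters in total.
best = max(c.by_path, key=lambda p: sum(len(t) for t in c.by_path[p]))
print(best)  # html/body/div/p
```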

Detection Models and Response Techniques of Fake Advertising Phishing Websites (가짜 광고성 피싱 사이트 탐지 모델 및 대응 기술)

  • Eunbeen Lee;Jeongeun Cho;Wonhyung Park
    • Convergence Security Journal
    • /
    • v.23 no.3
    • /
    • pp.29-36
    • /
    • 2023
  • With the recent surge of fake advertising phishing sites exposed in search engines, the damage caused by degraded search quality and personal information leakage is increasing. In particular, the problem is worsening faster as tools such as ChatGPT make it increasingly possible to automate the creation of advertising phishing sites. In this paper, the source code of fake advertising phishing sites was statically analyzed to derive structural commonalities, and a detection crawler that filters sites step by step based on foreign domains and redirection was developed, confirming that fake advertising posts are ultimately detected. In addition, we demonstrate the need for new guidelines by verifying that the redirection pages of fake advertising sites fall into three types and return different sites depending on the situation. Furthermore, we propose new detection guidelines for fake advertising phishing sites that existing detection methods cannot detect.
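The step-by-step filtering the abstract mentions can be sketched as follows; the foreign-domain heuristic (non-.kr hosts) and the redirection check are illustrative assumptions, not the paper's exact rules:

```python
from urllib.parse import urlparse

def is_foreign_domain(url, home_tlds=(".kr",)):
    """Assumed stage 1: flag hosts outside the home country's TLDs."""
    host = urlparse(url).hostname or ""
    return not host.endswith(home_tlds)

def looks_like_ad_phishing(url, redirect_target=None):
    """Assumed stage 2: foreign-hosted pages that redirect to another host."""
    if not is_foreign_domain(url):
        return False
    if redirect_target is None:
        return False
    return urlparse(redirect_target).hostname != urlparse(url).hostname

print(looks_like_ad_phishing("http://cheap-ads.xyz/post",
                             redirect_target="http://gamble-site.top/"))  # True
print(looks_like_ad_phishing("http://news.co.kr/article"))                # False
```

A real detection crawler would fetch each candidate page and follow its redirect chain (recording `Location` headers) before applying checks of this kind.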

Scale Effects and Field Applications for Continuous Intrusion Miniature Cone Penetrometer (연속관입형 소형콘관입시험기에 대한 크기효과 및 현장적용)

  • Yoon, Sungsoo;Kim, Kyu-Sun;Lee, Jin Hyung;Shin, Dong-Hyun
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.33 no.6
    • /
    • pp.2359-2368
    • /
    • 2013
  • Cone penetration tests (CPTs) have been increasingly used for site characterization. However, site investigations using CPTs are often limited by soil conditions, depending on the cone size and the capacity of the CPT system. The small sectional area of a miniature cone improves the applicability of the CPT system by effectively increasing its capacity, and a continuous intrusion system using a coiled rod allows fast and cost-effective site investigation. In this study, the performance of the continuous intrusion miniature cone penetration test (CIMCPT) system was evaluated by comparison tests with the standard CPT system at several construction sites in Korea. The results show that the CIMCPT system performs the same as the CPT system and has advantages in mobility and applicability. According to field verification tests for scale-effect evaluation, the cone tip resistance measured by the CIMCPT overestimates that of standard CPTs by 10%. A crawler-mounted CIMCPT system has been implemented to improve accessibility to soft ground and has shown improvement over the truck-type CIMCPT system. Therefore, the improved CIMCPT system can be used as a cost-effective and highly reliable soil investigation method to detect the depth of soft ground and to evaluate soil classification.

A Study on the Necessity of Open Source Software Intermediaries in the Software Distribution Channel (소프트웨어 유통에 있어 공개소프트웨어 중개자의 필요성에 대한 연구)

  • Lee, Seung-Chang;Suh, Eung-Kyo;Ahn, Sung-Hyuck;Park, Hoon-Sung
    • Journal of Distribution Science
    • /
    • v.11 no.2
    • /
    • pp.45-55
    • /
    • 2013
  • Purpose - The development and implementation of OSS (Open Source Software) has led to a dramatic change in corporate IT infrastructure, from system servers to smartphones, because the performance, reliability, and security of OSS are comparable to those of commercial software. Today, OSS has become an indispensable tool for coping with the competitive business environment and the constantly evolving IT environment. However, the use of OSS remains insufficient in small and medium-sized companies and software houses. This study examines the need for OSS intermediaries in the software distribution channel. It is expected that the role of the OSS intermediary will be reduced as the distribution process improves. The purpose of this research is to show that OSS intermediaries increase the efficiency of the software distribution market. Research design, data, and methodology - This study presents an analysis of data gathered online to determine the extent of the intermediaries' impact on the OSS market. Data were collected online by building a personal search robot (web crawler). The collection period lasted 9 days, during which a total of 233,021 data points were gathered from sourceforge.net and Apple's App Store, two of the most popular software intermediaries in the world. The collected data were analyzed using Google's Motion Chart. Results - The study found that, beginning in 2006, the production of OSS on Sourceforge.net increased rapidly across the board, but dropped sharply in the second half of 2009. Many events could explain this change; among them, we found one that fits: during the same period, the monthly production of software in the App Store was increasing quickly, a trend contrasting with the OSS production. Our follow-up analysis suggests that appropriate intermediaries like the App Store can enlarge the OSS market.
The increase was driven by the appearance of B2C software intermediaries like the App Store. The results imply that OSS intermediaries can accelerate OSS distribution, and that developing a better online market is critical for corporate users. Conclusion - In this study, we analyzed 233,021 data points on the online software marketplace at Sourceforge.net. The analysis indicates that OSS intermediaries are needed to keep the software distribution market vital. It is also critical that OSS intermediaries satisfy certain qualifications to play a key role as market makers. This study has several interesting implications. The first is that the OSS intermediary should strive to create a complementary relationship between OSS and proprietary software. The second is that the OSS intermediary must have a business model that shares the benefits with all participants (developer, intermediary, and users). The third is that the intermediary should provide OSS whose quality matches proprietary software, even at a high level of complexity. It is therefore worthwhile to examine this study, which shows that open source software intermediaries are essential in the software distribution channel.


Development of $^{192}Ir$ Small-Focal Source for Non-Destructive Testing Application by Using Enriched Target Material (고농축 표적을 이용한 비파괴검사용 $^{192}Ir$ 미세초점선원 개발)

  • Son, K.J;Hong, S.B.;Jang, K.D.;Han, H.S.;Park, U.J.;Lee, J.S.;Kim, D.H.;Han, K.D.;Park, C.D.
    • Journal of the Korean Society for Nondestructive Testing
    • /
    • v.27 no.1
    • /
    • pp.31-37
    • /
    • 2007
  • A $^{192}Ir$ small-focal source has been developed using the HANARO reactor and the radioisotope production facility at the Korea Atomic Energy Research Institute (KAERI). The small-focal source, 0.5 mm in diameter and 0.5 mm in length, was fabricated in an aluminum-encapsulated form with specially designed pressing equipment. For the estimation of the radioactivity, neutron self-shielding and ${\gamma}-ray$ self-absorption effects on the measured activity were considered. From this estimation, it was found that $^{192}Ir$ small-focal sources with activities over 3 Ci can be produced in HANARO. Field performance tests were carried out with a conventional source and the developed source to take images of a computer CPU and a piece of carbon steel. The small-focal source showed better penetration sensitivity and geometrical sharpness than the conventional source. It is concluded from the tests that the focal dimension of this source is small enough to maximize geometrical sharpness when taking images for close-proximity shots, pipeline crawler applications, and contact radiography.

Building an SNS Crawling System Using Python (Python을 이용한 SNS 크롤링 시스템 구축)

  • Lee, Jong-Hwa
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.23 no.5
    • /
    • pp.61-76
    • /
    • 2018
  • Everything in the world where modern people live is being drawn into the network. The Internet of Things, which attaches sensors to objects, allows real-time data transfer to and from the network. Mobile devices, essential for modern humans, play an important role in keeping all traces of everyday life in real time. Through social network services, information-acquisition and communication activities are left in a huge network in real time. From a business point of view, customer-needs analysis begins with SNS data. In this research, we build a system that automatically collects SNS content from the web environment in real time using Python. To help with customer-needs analysis, the system collects data from Instagram, Twitter, and YouTube, which have large numbers of users worldwide. Using a virtual web browser in a Python web-server environment, the collected content goes through extraction and NLP processing and is stored in a database. The service is provided through a website: the desired data are collected automatically by the search function, and netizens' responses can be checked in real time through time-series data analysis. Moreover, since each search finished within 5 seconds of execution, the advantage of the proposed algorithm is confirmed.
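The collect-process-store pipeline described above can be outlined as a minimal offline sketch; the fetch step is stubbed with canned records in place of the paper's virtual (headless) browser, and the table layout is invented for illustration:

```python
import sqlite3

def fetch_posts(keyword):
    # Stub: the real system would drive a virtual web browser against
    # Instagram, Twitter, or YouTube; canned records keep the sketch offline.
    return [
        {"source": "instagram", "keyword": keyword, "text": "great cafe #seoul"},
        {"source": "twitter",   "keyword": keyword, "text": "seoul travel tips"},
    ]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (source TEXT, keyword TEXT, body TEXT)")
for p in fetch_posts("seoul"):
    conn.execute("INSERT INTO posts VALUES (?, ?, ?)",
                 (p["source"], p["keyword"], p["text"]))
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(count)  # 2
```

Rows accumulated this way can then be queried by keyword and timestamp for the kind of time-series analysis the paper performs.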

Development of a Belt Pick-up Type Two-row Sesame Reaper

  • Jun, Hyeon-Jong;Choi, Il-su;Kang, Tae-Gyoung;Kim, Young-Keun;Lee, Sang-Hee;Kim, Sung-Woo;Choi, Yong;Choi, Duck-Kyu;Lee, Choung-Keun
    • Journal of Biosystems Engineering
    • /
    • v.41 no.4
    • /
    • pp.281-287
    • /
    • 2016
  • Purpose: The purpose of this study was to develop a walking-type two-row sesame reaper that can simultaneously cut and collect sesame plants and other crops such as perilla and soybean. Methods: The factors involved in reaping sesame were determined experimentally in order to design a prototype of the sesame reaper. The prototype is made up of four parts for cutting, conveying, collecting, and running. The height of the two disc-plate saw blades on the cutting part is adjusted by an adjusting wheel, and their peripheral speed is adjusted in accordance with the running speed. The conveying belt of the conveying part can be tilted from $0^{\circ}$ to $90^{\circ}$. The collecting part extracts a predetermined amount of the transferred sesame plants. The prototype was used to evaluate performance at different working speeds so that the work efficiency could be calculated. Results: The center of gravity of the sesame plants was 900 mm, measured from the end of the cut stem. The diameter of the disc-plate saw blade was determined to be 355 mm, the peripheral speed 20.4-32.7 m/s, and the picking height of the conveying belt for sesame 130 mm. The transfer and collection performance proved excellent when the insertion angles were $60^{\circ}$ and $90^{\circ}$; however, when the angle exceeded $120^{\circ}$, the performance was only 75-80%. The performance reached 100% efficiency when the ratio between the running speed and the conveying-belt speed of the prototype was 1:2, which appears to be the ideal ratio for the sesame reaper. Conclusions: A sesame reaper that integrates the processes of cutting, conveying, and collecting was developed by investigating the various factors involved in the reaping process. The sesame reaper can reduce the costs of harvesting and producing sesame thanks to its highly efficient performance.

Agricultural Hydraulic Facilities and the Development of the Trencher Type V for Small-Channel Excavation (농업수리시설과 소수로굴착용 Trencher V형의 개발에 대하여)

  • 영목청
    • Magazine of the Korean Society of Agricultural Engineers
    • /
    • v.21 no.2
    • /
    • pp.28-36
    • /
    • 1979
  • One of the most important problems in Monsoon Asia today is producing enough rice paddy to meet the needs of an ever-increasing population. Diverse means are being employed to meet this demand, both by increasing the productivity of existing farmland and by bringing further areas into cultivation. The primary step in either case is to ensure that there is sufficient moisture in the soil to suit the paddy; at the same time, this means that excess moisture has to be drained off the land, while in other areas irrigation has to be employed to bring sufficient water. Since such a project comprises a huge amount of earthwork, it can be carried out with extensive use of construction machinery in order to shorten the construction period. Because a farm ditch has a comparatively small section with a shallow cutting depth, and because access roads are lacking in the field, excavation equipment of the bulldozer or tractor-shovel (backhoe) type is not applicable, as it is mostly adapted to excavating deep and wide sections. A mini-backhoe with a bucket width of no more than 0.3 m and a blade width of no more than 1.00 m seems more adaptable. About 80% of the excavation of the ditch section would be done by the machinery, while the other 20%, together with the finishing of the section, would be done by manpower. The embankment of the ditch section can be compacted by the crawler of the backhoe as it moves along the ditch for excavation. However, lowland paddy fields in Monsoon Asia are worked mainly in the rainy season, so ditch excavation with heavy machinery is not easy. It is very important to know the exact ground bearing capacity of the working site and to select machines with a corresponding ground pressure. Ground bearing capacity varies with the quality and water content of the soil, and therefore the selection of machines should duly consider the ground conditions of the site at the time of construction.
Farm ditches dug and compacted by manual labor are of poor quality and subject to destruction after one or two years of operation. On the other hand, excavation and compaction by bulldozer is not practical for ditches. A backhoe is suitable for sloping land, but it requires a cycle time for bucket excavation and dumping. If a small-scale farm-ditch trencher adaptable to lowland paddy fields were invented, such a machine could greatly accelerate the massive construction work envisaged in many countries and thus significantly speed up the most difficult part of irrigation development and management in Monsoon Asia.


Development of an Intelligent Product Agent to Solve the Supplier-Search Problem for Products (상품에 대한 공급자 검색 문제 해결하기 위한 지능형 상품 에이전트 개발)

  • Chae, Sang-Yong;Kim, Gyeong-Pil;Kim, U-Ju;Kim, Chang-Uk
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 2005.11a
    • /
    • pp.475-480
    • /
    • 2005
  • Countless web pages on the Internet hold all kinds of unstructured information scattered in heterogeneous forms. Finding the necessary information with current search technology is an inefficient process that consumes much time and money. In this situation, it is very important to retrieve, extract, and structure the information users want. Despite the explosive growth of e-commerce, e-commerce standards are so weakly adopted and applied that consumers cannot easily obtain the product information they want from e-Procurement, e-Marketplace, and on-line shopping malls. As a result, suppliers miss sales opportunities and buyers miss the chance to source better materials and products at lower prices. The intelligent product agent proposed in this study addresses the supplier-search problem for a specific product a consumer wants to buy: it not only expands and organizes the system's internal information into knowledge, but also automatically collects, processes, and stores diverse product information from the web. The technologies used in this research are a DB schema reader that can read database schemas; a Meta Search Engine and a Focused Crawler that visit web pages and collect the URLs of various pieces of information; and a Wrapper that converts heterogeneous data structures into a standardized form for a specific purpose. The goal of this research is to combine these technologies to extract the necessary information and solve the supplier-search problem. Information extraction refers to the task of accurately extracting specific facts or relations from documents relevant to the user's interests.
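The Wrapper component described in this abstract, which converts heterogeneous supplier data into one standard form, can be sketched minimally; the field names and records below are invented for illustration:

```python
def wrap(record, mapping):
    """Map a site-specific record onto a standard schema (standard -> source keys)."""
    return {std: record.get(src) for std, src in mapping.items()}

# Two suppliers exposing the same product under different structures
site_a = {"prodName": "USB cable", "cost": 1200}
site_b = {"title": "USB cable", "price_won": 1100}

standard_a = wrap(site_a, {"name": "prodName", "price": "cost"})
standard_b = wrap(site_b, {"name": "title", "price": "price_won"})

# Once normalized, supplier offers become directly comparable
cheapest = min([standard_a, standard_b], key=lambda r: r["price"])
print(cheapest["price"])  # 1100
```

In the proposed system, records like these would come from the Meta Search Engine and Focused Crawler, with the wrapped results stored for supplier search.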
