Search | Korea Science

The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce (온라인 쇼핑몰에서 상품 설명 이미지 내의 키워드 인식을 위한 딥러닝 훈련 데이터 자동 생성 방안)

Kim, Kitae;Oh, Wonseok;Lim, Geunwon;Cha, Eunwoo;Shin, Minyoung;Kim, Jongwoo
- Journal of Intelligence and Information Systems / v.24 no.1 / pp.1-23 / 2018
Since the start of the 21st century, a variety of high-quality services have emerged with the growth of the internet and information and communication technologies. In particular, the e-commerce industry, in which Amazon and eBay stand out, is expanding rapidly. As e-commerce grows and more products are registered at online shopping malls, customers can easily compare products and buy what they want. However, this growth has created a problem: with so many products registered, it has become difficult for customers to find what they really need. When customers search with a general keyword, too many products are returned. Conversely, few products are returned when customers type in product details, because concrete product attributes are rarely registered. In this situation, automatically recognizing text in images can be a solution. Because the bulk of product details is written in catalogs in image format, most product information cannot be found with text input in current text-based search systems. If the information in images can be converted to text, customers can search for products by their details, which makes shopping more convenient. Various existing OCR (Optical Character Recognition) programs can recognize text in images, but they are hard to apply to catalogs because they fail under certain conditions, such as when the text is not big enough or the fonts are inconsistent. Therefore, this research proposes a way to recognize keywords in catalogs with deep learning, which has been the state of the art in image recognition since the 2010s.
The Single Shot Multibox Detector (SSD), a well-regarded object-detection model, can be used with its structure redesigned to account for the differences between text and objects. However, the SSD model needs a large amount of labeled training data, because deep learning models of this kind are trained by supervised learning. To collect data, one can manually label the location and class of text in catalogs, but manual collection causes many problems. Some keywords will be missed because humans make mistakes while labeling. Collecting the data is either too time-consuming, given the scale required, or too costly if many workers are hired to shorten the time. Furthermore, if specific keywords need to be trained, finding images that contain those words is also difficult. To solve the data issue, this research developed a program that creates training data automatically. The program generates catalog-like images containing various keywords and pictures and saves the location information of the keywords at the same time. With this program, data can be collected efficiently and the performance of the SSD model also improves: the SSD model recorded a recognition rate of 81.99% with 20,000 samples created by the program. Moreover, this research tested the SSD model on different variants of the data to analyze which features of the data influence text-recognition performance. The results show that the number of labeled keywords, the addition of overlapped keyword labels, the existence of unlabeled keywords, the spacing among keywords, and the differences in background images are all related to the performance of the SSD model. These findings can help improve the performance of the SSD model, or of other deep-learning-based text recognizers, through high-quality data.
The SSD model redesigned to recognize text in images and the program developed for creating training data are expected to contribute to improving search systems in e-commerce. Suppliers can spend less time registering keywords for products, and customers can search for products using the product details written in the catalog.
https://doi.org/10.13088/jiis.2018.24.1.001
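The abstract above describes a program that generates catalog-like images and saves keyword locations at the same time. The following is only a rough sketch of the labeling side of that idea, not the authors' program: it places keywords on a blank canvas without overlap and records each bounding box. The function names, the canvas size, and the fixed per-character width used to estimate box size are all assumptions made for illustration; a real generator would render text and background pictures with an imaging library.

```python
import random


def overlaps(a, b):
    """Return True if two (x, y, w, h) boxes intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def make_sample(keywords, width=300, height=300, char_w=12, line_h=24,
                seed=None, max_tries=100):
    """Place keywords at random, non-overlapping positions on a canvas.

    Returns a list of {"label": keyword, "box": (x, y, w, h)} records,
    which is the kind of location label a detector needs for training.
    Box size is a hypothetical estimate: char_w pixels per character.
    """
    rng = random.Random(seed)
    placed = []
    for word in keywords:
        w, h = char_w * len(word), line_h
        if w > width or h > height:
            continue  # this keyword cannot fit on the canvas at all
        for _ in range(max_tries):
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            candidate = (x, y, w, h)
            if not any(overlaps(candidate, p["box"]) for p in placed):
                placed.append({"label": word, "box": candidate})
                break  # placed successfully; move to the next keyword
    return placed


sample = make_sample(["cotton", "waterproof", "USB"], seed=0)
for record in sample:
    print(record["label"], record["box"])
```

Because the labels are produced by the same code that decides the layout, no keyword on the canvas goes unlabeled, which is one of the data properties the abstract reports as relevant to performance.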

Finding the time sensitive frequent itemsets based on data mining technique in data streams (데이터 스트림에서 데이터 마이닝 기법 기반의 시간을 고려한 상대적인 빈발항목 탐색)

Park, Tae-Su;Chun, Seok-Ju;Lee, Ju-Hong;Kang, Yun-Hee;Choi, Bum-Ghi
- Journal of The Korean Association of Information Education / v.9 no.3 / pp.453-462 / 2005
Recently, due to technical improvements in storage devices and networks, the amount of data has been increasing rapidly. In addition, knowledge embedded in a data stream must be found as quickly as possible. Huge volumes of data in a data stream are created continuously and change fast. Various algorithms for finding frequent itemsets in a data stream are being actively proposed. However, current research does not offer an appropriate method for finding frequent itemsets that reflect the flow of time; it provides only frequent items based on total aggregate values. In this paper, we propose a novel algorithm for finding relative frequent itemsets according to time in a data stream. We also propose a method for storing frequent items and sub-frequent items to take limited memory into account, and a method for updating time-variant frequent items. The performance of the proposed method is analyzed through a series of experiments. The proposed method can find both frequent itemsets and relative frequent itemsets using only the action patterns of students in each time slot. Thus, our method can enhance the effectiveness of learning and help make the best plan for individual learning.
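To make the contrast between total-aggregate counts and per-time-slot counts concrete, here is a minimal sketch. It is not the algorithm proposed in the paper (which also handles limited memory and sub-frequent items); it simply counts itemsets separately in each time slot and keeps those whose support within that slot meets a threshold. The function name, the `(slot, items)` stream format, and the parameters are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_by_slot(stream, min_support=0.5, max_size=2):
    """Return, per time slot, the itemsets whose support in that slot
    is at least min_support.

    stream: iterable of (slot, items) pairs (an assumed input format).
    max_size: largest itemset size to enumerate.
    """
    counts = defaultdict(Counter)  # slot -> itemset -> occurrences
    totals = Counter()             # slot -> number of transactions
    for slot, items in stream:
        totals[slot] += 1
        unique = sorted(set(items))
        for k in range(1, max_size + 1):
            for combo in combinations(unique, k):
                counts[slot][combo] += 1
    return {
        slot: {c for c, n in counts[slot].items()
               if n / totals[slot] >= min_support}
        for slot in totals
    }


stream = [
    ("am", ["quiz", "video"]),
    ("am", ["quiz"]),
    ("pm", ["forum", "video"]),
    ("pm", ["forum"]),
]
print(frequent_by_slot(stream, min_support=1.0))
```

In this toy stream no single item appears in every transaction overall, yet "quiz" is present in every morning transaction and "forum" in every afternoon one, which is the kind of time-relative pattern an aggregate count would hide.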

Social graph visualization techniques for public data (공공데이터에 적합한 다양한 소셜 그래프 비주얼라이제이션 알고리즘 제안)

Lee, Manjai;On, Byung-Won
- Journal of the HCI Society of Korea / v.10 no.1 / pp.5-17 / 2015
Nowadays, a variety of public data is being made available to the public. Opening public data increases the transparency and effectiveness of public policy developed by governments, and users can drive the growth of industries related to public data. Since the end users of public data are citizens, it is very important that everyone can grasp the meaning of public data through proper visualization techniques. In this work, to illustrate the significance of widely available public data, we consider the UN voting record as public data in which many people may be interested. It is publicly available and has high value for diplomatic and educational purposes. With proper data mining and visualization algorithms, we can gain insight into the voting patterns of UN members. To visualize the data, we first measure the voting similarity among UN members and then create a social graph from the similarity values. Next, the social graph is rendered on the screen using a graph layout algorithm. With the existing method for visualizing the social graph, the meaning of the graph is hard to understand because the graph is usually dense. To address this weakness, we propose the Friend-Matching, Friend-Rival Matching, and Bubble Heap algorithms in this paper. We also validate that the proposed algorithms improve the quality of the social graphs displayed by the existing method. Finally, our prototype system has been released at http://datalab.kunsan.ac.kr/politiz/un/. Please see whether it is useful from the standpoint of public data utilization.
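The first two steps described above, measuring voting similarity and building a social graph from it, can be sketched as follows. This is an illustration only: the abstract does not state which similarity measure is used, so the fraction of matching votes is an assumption, as are the function names and the threshold. It does not implement the Friend-Matching, Friend-Rival Matching, or Bubble Heap algorithms.

```python
from itertools import combinations


def agreement(votes_a, votes_b):
    """Fraction of resolutions on which two members cast the same vote.

    None marks an absence; such resolutions are skipped.
    """
    shared = [(a, b) for a, b in zip(votes_a, votes_b)
              if a is not None and b is not None]
    if not shared:
        return 0.0
    return sum(a == b for a, b in shared) / len(shared)


def build_graph(votes, threshold=0.75):
    """Return {(member1, member2): similarity} for pairs at or above
    the threshold. Dropping weak edges is one simple way to thin out
    an otherwise dense graph before layout."""
    edges = {}
    for m1, m2 in combinations(sorted(votes), 2):
        s = agreement(votes[m1], votes[m2])
        if s >= threshold:
            edges[(m1, m2)] = s
    return edges


votes = {
    "A": ["Y", "Y", "N", "Y"],
    "B": ["Y", "Y", "N", "N"],
    "C": ["N", "N", "Y", "N"],
}
print(build_graph(votes))  # {('A', 'B'): 0.75}
```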

Analysis method of patent document to Forecast Patent Registration (특허 등록 예측을 위한 특허 문서 분석 방법)

Koo, Jung-Min;Park, Sang-Sung;Shin, Young-Geun;Jung, Won-Kyo;Jang, Dong-Sik
- Journal of the Korea Academia-Industrial cooperation Society / v.11 no.4 / pp.1458-1467 / 2010
Recently, imitation and infringement of intellectual property rights have been recognized as impediments to a nation's industrial growth. To prevent the huge losses that come from these impediments, many researchers are studying the protection and efficient management of intellectual property in various ways. In particular, predicting patent registration is a very important part of protecting and asserting intellectual property rights. In this study, we propose a patent document analysis method that uses text mining to predict whether a patent will be registered or rejected. First, the proposed method builds a database from the word frequencies of rejected patent documents. Comparing this database with other patent documents then yields a similarity value between each patent document and the database. We used k-means, a partitioning clustering algorithm, to select the criterion value for patent rejection. As a result, we conclude that patents similar to rejected patents have a strong possibility of rejection. We used U.S. patent documents on Bluetooth technology, solar battery technology, and display technology as experimental data.
https://doi.org/10.5762/KAIS.2010.11.4.1458
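The word-frequency database and similarity comparison described in this abstract can be sketched as below. The abstract does not say which similarity measure the authors use, so cosine similarity is an assumption here, as are the whitespace tokenizer and the function names. The k-means step for choosing the rejection criterion is not shown.

```python
import math
from collections import Counter


def word_freq(text):
    """Word-frequency vector of a document (naive whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rejection_similarity(rejected_docs, candidate):
    """Similarity between a candidate document and a database built
    from the pooled word frequencies of rejected documents."""
    database = Counter()
    for doc in rejected_docs:
        database.update(word_freq(doc))
    return cosine(database, word_freq(candidate))


rejected = ["antenna signal module", "antenna power module"]
print(rejection_similarity(rejected, "antenna signal circuit"))
print(rejection_similarity(rejected, "solar cell coating"))
```

A candidate that shares vocabulary with the rejected set scores higher than one that shares none; the scores for many candidates could then be clustered to pick a cut-off.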

A spectrum based evaluation algorithm for micro scale weather analysis module with application to time series cluster analysis (스펙트럼분석 기반의 미기상해석모듈 평가알고리즘 제안 및 시계열 군집분석에의 응용)

Kim, Hea-Jung;Kwak, Hwa-Ryun;Kim, Yu-Na;Choi, Young-Jean
- Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.41-53 / 2015
In the meteorological field, many researchers have tried to develop micro scale weather analysis modules to provide real-time weather information services in metropolitan areas. This effort enables us to cope with the various economic and social harms arising from serious changes in the micrometeorology of a metropolitan area due to rapid urbanization, such as the quantitative expansion of urban activity, population growth, and building concentration. The accuracy of the micro scale weather analysis modules (MSWAM) is directly related to the usefulness and quality of the real-time weather information service in the metropolitan area. This paper designs an evaluation system, along with verification tools, that sufficiently accommodates the spatio-temporal characteristics of the MSWAM outputs. For this, we propose a test for the equality of the mean vectors of the MSWAM output series and the corresponding observed time series using a spectral analysis technique. As a byproduct, a time series cluster analysis method, using a function of the test statistic as the distance measure, is developed. A real data application is given to demonstrate the utility of the method.
https://doi.org/10.7465/jkdi.2015.26.1.41

Algorithm for Primary Full-thickness Skin Grafting in Pediatric Hand Burns

Park, Yang Seo;Lee, Jong Wook;Huh, Gi Yeun;Koh, Jang Hyu;Seo, Dong Kook;Choi, Jai Koo;Jang, Young Chul
- Archives of Plastic Surgery / v.39 no.5 / pp.483-488 / 2012
Background: Pediatric hand burns are a difficult problem because they lead to serious hand deformities with functional impairment due to rapid growth during childhood. Therefore, adequate management is required beginning in the acute stage. Our study aims to establish surgical guidelines for a primary full-thickness skin graft (FTSG) in pediatric hand burns, based on long-term observation periods and existing studies. Methods: From January 2000 to May 2011, 210 patients underwent primary FTSG. We retrospectively studied the clinical course and treatment outcomes based on the patients' medical records. The patients' demographics, age, sex, injury site of the fingers, presence of web space involvement, the incidence of postoperative late deformities, and the duration of revision were critically analyzed. Results: The mean age of the patients was 24.4 months (range, 8 to 94 months), consisting of 141 males and 69 females. The overall observation period was 6.9 years (range, 1 to 11 years) on average. At the time of the burn, 56 cases were to a single finger, 73 to two fingers, 45 to three fingers, and 22 to more than three. Among these cases, 70 were burns that included a web space (33.3%). During the observation, 25 cases underwent corrective operations with an average period of 40.6 months. Conclusions: In the volar area, primary full-thickness skin grafting can be a good indication for an isolated injured finger, excluding the web spaces, and injuries of less than three fingers including the web spaces. Also, in the dorsal area, full-thickness skin grafting can be a good indication. However, if the donor site is insufficient and the wound is large, split-thickness skin grafting can be considered.
https://doi.org/10.5999/aps.2012.39.5.483

Molecular Analysis of Pathogenic Molds Isolated from Clinical Specimen (임상검체에서 분리된 병원성 사상균의 분자생물학적 분석)

Lee, Jang Ho;Kwon, Kye Chul;Koo, Sun Hoe
- Korean Journal of Clinical Laboratory Science / v.52 no.3 / pp.229-236 / 2020
Sixty-five molds isolated from clinical specimens were included in this study. The isolates comprised molds that could be identified morphologically, strains that are difficult to identify because of morphological similarities, and strains that require species-level identification. PCR and direct sequencing were performed to target the internal transcribed spacer (ITS) region, the D1/D2 region, and the β-tubulin gene. Comparative sequence analysis against the GenBank database was performed using the basic local alignment search tool (BLAST) algorithm. Of the fungi, 67% were identified morphologically to the genus level. Sequencing analysis yielded genus- and species-level identification for 62 of the 65 strains. The results of phenotypic and molecular identification disagreed for 14 (21.5%) of the 65 strains. B. dermatitidis, T. marneffei, and G. argillacea were identified for the first time in Korea using the DNA sequencing method. Morphological identification is a very useful method in terms of reporting time and cost for frequently isolated, fast-growing molds such as Aspergillus. When molecular methods are employed, the cost and clinical significance should be considered. On the other hand, the molecular identification of molds can provide fast and accurate results.
https://doi.org/10.15324/kjcls.2020.52.3.229

Clinicopathological Characteristics of Triple Negative Breast Cancer at a Tertiary Care Hospital in India

Dogra, Atika;Doval, Dinesh Chandra;Sardana, Manjula;Chedi, Subhash Kumar;Mehta, Anurag
- Asian Pacific Journal of Cancer Prevention / v.15 no.24 / pp.10577-10583 / 2015
Background: Triple-negative breast cancer (TNBC), characterized by the lack of expression of estrogen receptor, progesterone receptor and human epidermal growth factor receptor-2, is typically associated with a poor prognosis. The majority of TNBCs show the expression of basal markers on gene expression profiling and most authors accept TNBC as basal-like (BL) breast cancer. However, a smaller fraction lacks a BL phenotype despite being TNBC. The literature is silent on non-basal-like (NBL) type of TNBC. The present study was aimed at defining behavioral differences between BL and NBL phenotypes. Objectives: i) Identify the TNBCs and categorize them into BL and NBL breast cancer. ii) Examine the behavioral differences between two subtypes. iii) Observe the pattern of treatment failure among TNBCs. Materials and Methods: All TNBC cases during January 2009-December 2010 were retrieved. The subjects fitting the inclusion criteria of study were differentiated into BL and NBL phenotypes using surrogate immunohistochemistry with three basal markers 34βE12, c-Kit and EGFR as per the algorithm defined by Nielsen et al. The detailed data of subjects were collated from clinical records. The comparison of clinicopathological features between two subgroups was done using statistical analyses. The pattern of treatment failure along with its association with prognostic factors was assessed. Results: TNBC constituted 18% of breast cancer cases considered in the study. The BL and NBL subtypes accounted for 81% and 19% respectively of the TNBC group. No statistically significant association was seen between prognostic parameters and two phenotypes. Among patients with treatment failure, 19% were with BL and 15% were with NBL phenotype. The mean disease free survival (DFS) in groups BL and NBL was 30.0 and 37.9 months respectively, while mean overall survival (OS) was 31.93 and 38.5 months respectively. Treatment failure was significantly associated with stage (p=0.023) among prognostic factors. Conclusions: Disease stage at presentation is an important prognostic factor influencing the treatment failure and survival among TNBCs. Increasing tumor size is related to lymph node positivity. BL tumors have a more aggressive clinical course than that of NBL as shown by shorter DFS and OS, despite having no statistically significant difference between prognostic parameters. New therapeutic alternatives should be explored for patients with this subtype of breast cancer.
https://doi.org/10.7314/APJCP.2014.15.24.10577

Effective Prioritized HRW Mapping in Heterogeneous Web Server Cluster (이질적 웹 서버 클러스터 환경에서 효율적인 우선순위 가중치 맵핑)

김진영;김성천
- Journal of KIISE:Computer Systems and Theory / v.30 no.12 / pp.708-713 / 2003
For many years, clustered heterogeneous web server architectures have formed on the internet because of the explosive growth of internet services and the varying quality of requests. The critical point in a cluster environment is the scheme for mapping requests to servers, which has recently become a main issue in internet architecture. Previous mapping methods aimed to assign equal loads to the servers in a cluster based on the number of requests. However, the recent growth of diverse services makes it hard to rely on simple load balancing to achieve acceptable latency. Therefore, mapping based on the requested content, so-called "content-based" mapping, which decreases response time and increases cache hit rates across all servers, has recently become highly valued on the internet. This paper proposes Prioritized Highest Random Weight mapping (PHRW mapping), which improves content-based mapping to fit the heterogeneous environment properly. This mapping scheme, which assigns requests to servers according to priority, is very effective on heterogeneous web server clusters and especially effective at decreasing the latency of reactive data services, which have latency limits. This paper shows, through the algorithm and simulation, that the proposed PHRW mapping achieves higher performance by decreasing latency.

The impacts of high speed train on the regional economy of Korea (고속철도(KTX) 개통이 지역경제에 미치는 영향 분석과 시사점)

Park, Mi Suk;Kim, Yongku
- The Korean Journal of Applied Statistics / v.29 no.1 / pp.13-25 / 2016
High-speed railway (Korea Train Express) has had a deep impact on the regional economy of Korea. Current high-speed rail research is mostly theoretical; there is a lack of quantitative research using a precise algorithm to study the effect of high-speed railway on the regional economy. This paper analyzes the influence of high-speed rail on the regional economy, with a focus on the Daegu area. Quantitative analysis using department store indexes and regional medical records is performed to calculate the economic influence of high-speed rail. The results show that high-speed railway affects the regional economy through regional consumption growth and medical care trends.
https://doi.org/10.5351/KJAS.2016.29.1.013
