• Title/Summary/Keyword: search attributes

Search Results: 230

Mapping Categories of Heterogeneous Sources Using Text Analytics (텍스트 분석을 통한 이종 매체 카테고리 다중 매핑 방법론)

  • Kim, Dasom;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.4
    • /
    • pp.193-215
    • /
    • 2016
  • In recent years, the proliferation of diverse social networking services has led users to use multiple media simultaneously, depending on their individual purposes and tastes. Moreover, when collecting information on a particular theme, they typically draw on several media at once, such as social networking services, Internet news, and blogs. From a management perspective, however, each document circulated through these media is placed in a different category according to each source's own policies and standards, which hinders research on a specific category across different kinds of sources. For example, documents about "applying for foreign travel" may be classified under "Information Technology," "Travel," or "Life and Culture" depending on each source's particular standard. Likewise, because each source defines and specifies categories differently, similar categories can be named and structured differently from source to source. To overcome these limitations, this study proposes a method for mapping categories between sources across various media while leaving each medium's existing category system intact. Specifically, by re-classifying individual documents from the viewpoints of other sources and storing the classification results as extra attributes, the study proposes a logical layer through which users can search for a specific document across multiple heterogeneous sources with different category names as if the documents belonged to a single source. In experiments on 6,000 news articles collected from two Internet news portals, classification accuracy was compared across sources, between supervised and semi-supervised learning, and between homogeneous and heterogeneous learning data. Notably, for some categories, the classification accuracy of semi-supervised learning with heterogeneous learning data proved higher than that of both supervised and semi-supervised learning with homogeneous learning data. This study is significant in two respects. First, it proposes a logical scheme for integrating and managing heterogeneous media with different classification systems while leaving the existing physical classification systems intact. The results show markedly different classification accuracies depending on the heterogeneity of the learning data, which is expected to spur further studies that improve the proposed methodology through per-category analysis. Second, as demand grows for searching, collecting, and analyzing documents from diverse media, Internet search is no longer restricted to a single medium; yet because each medium has its own category structure and names, searching a specific category across heterogeneous media is very difficult in practice. The proposed methodology is therefore also significant in that, when users select a desired site, it retrieves all relevant documents according to that site's categorical classification standards while preserving each site's existing characteristics and structure.
The proposed methodology needs to be complemented in the following respects. First, its performance was evaluated only indirectly; future studies should test its accuracy more directly. That is, after documents from a target source are re-classified according to an existing source's category system, the accuracy of that classification should be verified through evaluation by actual users. The classification accuracy also needs to be raised by refining the methodology. Furthermore, the finding that some categories showed higher classification accuracy under heterogeneous semi-supervised learning than under supervised learning deserves closer study, since understanding those categories' characteristics may help in collecting heterogeneous documents from diverse media and in devising ways to improve document classification accuracy.
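
To make the mapping idea concrete, here is a minimal self-training sketch in Python with scikit-learn: a classifier trained on one source's labeled categories assigns pseudo-labels to another source's documents, which could then be stored as the extra attributes the abstract describes. The documents, categories, and classifier choice are illustrative assumptions, not the paper's actual data or method.

```python
# Minimal self-training sketch: classify documents from a target source
# using labels from a different (heterogeneous) source, then report the
# pseudo-labels with their confidence. All data here is toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled_docs = ["visa application for overseas travel",
                "new smartphone app release",
                "city festival food guide",
                "cloud computing service outage"]
labels = ["Travel", "IT", "Life and Culture", "IT"]   # source A's categories
unlabeled_docs = ["passport renewal and travel insurance tips",
                  "open source machine learning library update"]  # source B

vec = TfidfVectorizer()
X_l = vec.fit_transform(labeled_docs)
X_u = vec.transform(unlabeled_docs)

clf = LogisticRegression(max_iter=1000).fit(X_l, labels)

# Self-training step: each source-B document receives a source-A category,
# which could be stored as an extra attribute for cross-source search.
proba = clf.predict_proba(X_u)
pseudo_labels = clf.classes_[proba.argmax(axis=1)]
for doc, lab, p in zip(unlabeled_docs, pseudo_labels, proba.max(axis=1)):
    print(f"{lab!r} (confidence {p:.2f}): {doc}")
```

In a fuller semi-supervised loop, the most confident pseudo-labeled documents would be folded back into the training set and the classifier retrained.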

Comparison of food involvement scale (FIS) and use intention for block type sauce between US and Japanese consumers (미국과 일본 소비자의 음식관여도와 블록형 소스에 대한 이용의도 비교 분석)

  • Lee, Hojin;Kim, Su Jin;Lee, Min A
    • Journal of Nutrition and Health
    • /
    • v.51 no.6
    • /
    • pp.590-598
    • /
    • 2018
  • Purpose: This study compared the food involvement scale (FIS) scores of American and Japanese consumers and evaluated the effects of familiarity, likability, and expectation on willingness to use block-type sauce by nationality. Methods: A total of 149 American and 112 Japanese consumers completed the survey, which asked about familiarity, likability, expectation, willingness to use, and usage frequency of block-type sauce, the food involvement scale (FIS), and demographic information. Results: Usage frequency of block-type sauce differed by nationality, with consumers in Japan using it significantly more often than those in the United States (US) (p < 0.001). According to the FIS, US consumers were more involved in how food is provided, such as cooking, table setting, and food shopping, than in the food itself, compared with Japanese consumers. In addition, 'expectation' and 'likability' for US consumers, and 'expectation' and 'familiarity' for Japanese consumers, were attributes that positively affected willingness to use (p < 0.01). Conclusion: For US consumers, 'familiarity' was not significant because their usage frequency of block-type sauce was lower than that of Japanese consumers. For Japanese consumers, 'likability' was not significant because, according to the FIS, they enjoy cooking itself. Therefore, it is necessary to treat these positive attributes as key factors for block-type sauce and to seek marketing strategies based on the attributes relevant to each nationality.
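
The attribute effects reported above (expectation, likability, and familiarity on willingness to use) are the kind of result a multiple regression over survey responses produces; the sketch below illustrates that analysis shape with randomly generated Likert-style data, not the study's survey.

```python
# Sketch of the attribute-to-intention analysis: regress willingness to
# use on familiarity, likability, and expectation. The survey data here
# is randomly generated for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 150
familiarity = rng.integers(1, 8, n)   # 7-point Likert items (assumption)
likability  = rng.integers(1, 8, n)
expectation = rng.integers(1, 8, n)
intention = 0.3 * likability + 0.5 * expectation + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([familiarity, likability, expectation]))
model = sm.OLS(intention, X).fit()
print(model.summary())   # per-attribute coefficients and p-values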

The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce (온라인 쇼핑몰에서 상품 설명 이미지 내의 키워드 인식을 위한 딥러닝 훈련 데이터 자동 생성 방안)

  • Kim, Kitae;Oh, Wonseok;Lim, Geunwon;Cha, Eunwoo;Shin, Minyoung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.1-23
    • /
    • 2018
  • Since the beginning of the 21st century, the growth of the Internet and of information and communication technologies has brought a variety of high-quality services. In particular, the e-commerce industry, in which Amazon and eBay stand out, has grown explosively. As e-commerce grows and more products are registered at online shopping malls, customers can compare products easily and find what they want to buy. However, this growth has also created a problem: with so many products registered, it has become difficult for customers to find what they actually need. When customers search with a general keyword, too many products are returned; conversely, searches using specific product details return few results, because concrete product attributes are rarely registered as text. Automatically recognizing text in images can be a solution. Because the bulk of product details is published in catalogs in image format, most product information cannot be found by current text-based search systems; if the information in these images can be converted to text, customers can search by product details, making shopping more convenient. Various OCR (Optical Character Recognition) programs can recognize text in images, but existing OCR programs are hard to apply to catalogs because they fail under common catalog conditions such as small text or inconsistent fonts. This research therefore proposes a way to recognize keywords in catalogs with deep learning, the state of the art in image recognition since the early 2010s. The Single Shot MultiBox Detector (SSD), a model with a strong record in object detection, can be used with its structure redesigned to account for the differences between text and objects. However, because deep learning models are trained by supervised learning, the SSD model requires a large amount of labeled training data. Locations and classes of text in catalogs could be labeled manually, but manual collection raises several problems: some keywords would be missed through human error; collecting data at the required scale would be too time-consuming, or too costly if many workers were hired; and if specific keywords must be trained, finding images that contain those words would also be difficult. To solve this data problem, this research developed a program that creates training data automatically. The program generates catalog-like images containing various keywords and pictures while saving the location information of each keyword. With this program, data can be collected efficiently, and the performance of the SSD model improves: the model recorded a recognition rate of 81.99% with 20,000 images created by the program. Moreover, this research tested the SSD model's efficiency under different data conditions to analyze which features of the data influence text-recognition performance.
The tests showed that the number of labeled keywords, the addition of overlapping keyword labels, the presence of unlabeled keywords, the spacing between keywords, and differences in background images all affect the SSD model's performance. These findings can guide performance improvements for the SSD model, or for other deep learning text recognizers, through higher-quality data. The SSD model redesigned to recognize text in images and the program developed for creating training data are expected to improve search systems in e-commerce: suppliers can spend less time registering product keywords, and customers can search for products using the details written in catalogs.
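
The training-data generator described above renders keywords onto catalog-like images while saving each keyword's location label. A minimal sketch of that idea using Pillow follows; the keywords, canvas size, and output format are assumptions for illustration, not the authors' actual program.

```python
# Sketch of automatic training-data generation for text detection:
# render keywords onto a background image and record each keyword's
# bounding box as a label. Keywords and canvas size are assumptions.
from PIL import Image, ImageDraw, ImageFont
import json, random

keywords = ["cotton", "waterproof", "free shipping", "handmade"]
canvas = Image.new("RGB", (600, 400), "white")
draw = ImageDraw.Draw(canvas)
font = ImageFont.load_default()

labels = []
for word in random.sample(keywords, 3):
    x, y = random.randint(0, 450), random.randint(0, 360)
    draw.text((x, y), word, fill="black", font=font)
    x0, y0, x1, y1 = draw.textbbox((x, y), word, font=font)
    labels.append({"text": word, "bbox": [x0, y0, x1, y1]})

canvas.save("synthetic_catalog.png")
with open("synthetic_catalog.json", "w") as f:
    json.dump(labels, f)   # location labels for SSD-style training
```

Repeating this over varied fonts, sizes, backgrounds, and keyword spacings would produce the kinds of data variation the efficiency test above examines.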

Comparison of Deep Learning Frameworks: About Theano, Tensorflow, and Cognitive Toolkit (딥러닝 프레임워크의 비교: 티아노, 텐서플로, CNTK를 중심으로)

  • Chung, Yeojin;Ahn, SungMahn;Yang, Jiheon;Lee, Jaejoon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.1-17
    • /
    • 2017
  • A deep learning framework is software designed to help develop deep learning models; two of its most important functions are automatic differentiation and GPU utilization. Popular deep learning frameworks include Caffe (BVLC) and Theano (University of Montreal), and recently Microsoft released its deep learning framework, Microsoft Cognitive Toolkit (CNTK), under an open-source license, following Google's TensorFlow a year earlier. Early deep learning frameworks were developed mainly for university research, but since the release of TensorFlow, companies such as Microsoft and Facebook have joined the competition in framework development. Given this trend, Google and other companies are expected to keep investing in deep learning frameworks to take the initiative in the artificial intelligence business. From this point of view, we think it is a good time to compare deep learning frameworks, so we compare three that can be used as Python libraries: Google's TensorFlow, Microsoft's CNTK, and Theano, which is in a sense the predecessor of the other two. The most common and important function of deep learning frameworks is automatic differentiation. Essentially all the mathematical expressions of deep learning models can be represented as computational graphs consisting of nodes and edges; the partial derivative on each edge of the graph can be obtained, and with these partial derivatives the software can compute the derivative of any node with respect to any variable by applying the chain rule of calculus. Regarding convenience of coding, the order is CNTK, TensorFlow, then Theano. This criterion is based simply on code length; the learning curve and ease of coding were not the main concern. By this criterion, Theano was the most difficult to implement with, while CNTK and TensorFlow were somewhat easier; with TensorFlow, weight variables and biases must be defined explicitly. CNTK and TensorFlow are easier to implement with because they provide more abstraction than Theano. We should note, however, that low-level coding is not always bad: it gives flexibility, and with low-level coding such as Theano's, we can implement and test any new deep learning model or search method we can think of. On execution speed, our assessment is that there is no meaningful difference among the frameworks. In our experiment, the execution speeds of Theano and TensorFlow were very similar, although the experiment was limited to a CNN model. In the case of CNTK, the experimental environment could not be kept identical: the CNTK code had to be run on a PC without a GPU, where code executes up to 50 times more slowly than with a GPU. We nevertheless concluded that the difference in execution speed was within the range of variation caused by the different hardware setup. In this study, we compared three deep learning frameworks: Theano, TensorFlow, and CNTK. According to Wikipedia, there are 12 available deep learning frameworks, differentiated by 15 attributes. Among the important attributes are the interface language (Python, C++, Java, etc.) and the availability of libraries for various deep learning models such as CNNs, RNNs, and DBNs.
For users implementing a large-scale deep learning model, support for multiple GPUs or multiple servers is also important, and for those learning deep learning, the availability of examples and references matters as well.
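
The automatic-differentiation mechanism the abstract describes, partial derivatives on the edges of a computational graph combined by the chain rule, can be illustrated with a tiny reverse-mode sketch in plain Python. This is a toy model of the idea, not any framework's actual internals.

```python
# Tiny reverse-mode automatic differentiation: each node stores its
# value plus (parent, local derivative) pairs for its incoming edges;
# backward() pushes gradients along those edges via the chain rule.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent node, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, seed=1.0):
        # accumulate the gradient arriving at this node, then propagate
        # it to each parent, scaled by the edge's partial derivative
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x, w, b = Var(2.0), Var(3.0), Var(1.0)
y = x * w + b        # computational graph for y = x*w + b
y.backward()
print(w.grad)        # dy/dw = x = 2.0
print(x.grad)        # dy/dx = w = 3.0
```

Frameworks like Theano, TensorFlow, and CNTK apply the same principle to large graphs of tensor operations, which is what makes gradient-based training practical.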

Feature Analysis of Metadata Schemas for Records Management and Archives from the Viewpoint of Records Lifecycle (기록 생애주기 관점에서 본 기록관리 메타데이터 표준의 특징 분석)

  • Baek, Jae-Eun;Sugimoto, Shigeo
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.10 no.2
    • /
    • pp.75-99
    • /
    • 2010
  • Digital resources are widely used in modern society. However, we face fundamental problems in maintaining and preserving digital resources over time. Several standard methods for preserving digital resources have been developed and are in use, and it is widely recognized that metadata is one of the most important components of digital archiving and preservation. There are many metadata standards for the archiving and preservation of digital resources, each with its own features in accordance with its primary application. This means that each schema has to be appropriately selected and tailored for a particular application, and in some cases schemas are combined in a larger framework or container metadata, such as the DCMI application framework and METS. For the feature analysis in this study, we used the following metadata standards: AGLS Metadata, which is defined to improve search of both digital and non-digital resources; ISAD(G), a commonly used standard for archives; EAD, widely used for digital archives; OAIS, which defines a metadata framework for preserving digital objects; and PREMIS, which is designed primarily for the preservation of digital resources. In addition, we extracted attributes from the decision tree defined for the digital preservation process by the Digital Preservation Coalition (DPC) and compared that attribute set with these metadata standards. This paper presents the features of these metadata standards obtained through a feature analysis based on the records lifecycle model. The features are shown in a single framework, which makes it easy to relate tasks in the lifecycle to the metadata elements of these standards. Through a detailed analysis of the metadata elements, we clarify the features of the standards from the viewpoint of the relationships between elements and lifecycle stages. Mapping between metadata schemas is often required in the long-term preservation process because different schemas are used across the records lifecycle; it is therefore crucial to build a unified framework to enhance the interoperability of these schemas. This study presents a basis for the interoperability of the different metadata schemas used in digital archiving and preservation.
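
A schema mapping of the kind this analysis supports can be pictured as a crosswalk table keyed by lifecycle stage; the sketch below shows the idea with a few roughly equivalent elements. The pairings are illustrative, not a normative mapping from the paper.

```python
# Sketch of a schema crosswalk: map a few roughly equivalent elements
# across archival metadata standards, keyed by records-lifecycle stage.
# The element pairings below are illustrative, not a normative mapping.
CROSSWALK = {
    "creation":     {"ISAD(G)": "3.1.3 Date(s)",
                     "EAD": "<unitdate>",
                     "PREMIS": "dateCreatedByApplication"},
    "description":  {"ISAD(G)": "3.1.2 Title",
                     "EAD": "<unittitle>",
                     "PREMIS": "originalName"},
    "preservation": {"ISAD(G)": "3.4.4 Physical characteristics",
                     "EAD": "<phystech>",
                     "PREMIS": "objectCharacteristics"},
}

def translate(stage, from_schema, to_schema):
    """Look up the rough equivalent of an element in another schema."""
    row = CROSSWALK.get(stage, {})
    return row.get(from_schema), row.get(to_schema)

src, dst = translate("description", "EAD", "PREMIS")
print(f"EAD {src} ~ PREMIS {dst}")
```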

Development of Simulation Technology Based on 3D Indoor Map for Analyzing Pedestrian Convenience (보행 편의성 분석을 위한 3차원 실내지도 기반의 시뮬레이션 기술 개발)

  • KIM, Byung-Ju;KANG, Byoung-Ju;YOU, So-Young;KWON, Jay-Hyoun
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.20 no.3
    • /
    • pp.67-79
    • /
    • 2017
  • Increasing dependence on the metro system has made passenger convenience as important as transportation capacity. In this study, a pedestrian simulator was developed that can quantitatively assess the pedestrian environment in terms of attributes such as speed and distance. The simulator consists of a 3D indoor map authoring module and a pedestrian modeling algorithm module. The 3D indoor map authoring module provides 3D spatial modeling, network generation, and evaluation of results; the pedestrian modeling algorithm performs path search, assignment of users, and evaluation of level of service (LOS). The primary objective of these functions is to apply and analyze various scenarios repeatedly, such as before and after an improvement to the pedestrian environment, and to integrate the spatial information database with the dynamic information database. Furthermore, to demonstrate the simulator's practical applicability, a test bed was constructed for a currently operational metro station, and a quantitative index of the proposed improvement effect was calculated by analyzing pedestrians' walking speed before and after the improvement of a passage. The possibility of extending the database for further analysis is also discussed.
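
The path-search function mentioned above can be illustrated with a standard Dijkstra search over a toy indoor network whose edge weights are walking times; the network nodes and times below are made-up values, not the simulator's actual data model.

```python
# Sketch of the path-search step: Dijkstra over a toy indoor network
# whose edge weights are walking times in seconds (values are made up).
import heapq

graph = {
    "entrance": [("corridor", 20), ("stairs", 30)],
    "corridor": [("platform", 40)],
    "stairs":   [("platform", 25)],
    "platform": [],
}

def shortest_path(start, goal):
    queue, seen = [(0, start, [start])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return None

print(shortest_path("entrance", "platform"))
# (55, ['entrance', 'stairs', 'platform'])
```

Running the same search on networks edited to reflect a proposed passage improvement is one way a simulator like this can compare before-and-after scenarios.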

A Study on Strategy for success of tourism e-marketplace (관광 e-마켓플레이스의 성공전략에 관한 연구)

  • Hong, Ji-Whan;Kim, Keun-Hyung
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2006.11a
    • /
    • pp.333-336
    • /
    • 2006
  • An e-marketplace is a kind of B2B e-business system that supports transactions among companies. If e-marketplaces are revitalized, we can expect not only the development of related industries but also a decrease in transaction costs among companies. From this point of view, it is necessary to introduce and revitalize e-marketplaces in the tourism industry. The participants in a tourism e-marketplace are tourism-related companies (travel agencies, lodging enterprises, shipping enterprises, etc.), and tourists also want to search a variety of tour products and contents, so a tourism e-marketplace has the characteristics of both B2B and B2C e-business systems at once. The purpose of this study is to identify the success factors that characterize tourism e-marketplaces through a survey of e-marketplace factors in tourism-related websites. First, we analyze the success factors of B2B and B2C e-marketplaces; we then establish influence factors for tourism e-marketplaces and conduct a survey on their success factors. From the survey data, we expect to identify the attributes associated with tourism e-marketplace success through logistic regression and decision tree analysis.
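
The analysis plan named at the end of the abstract, logistic regression and decision tree analysis over survey factors, might look like the following scikit-learn sketch; the factor names and responses are synthetic stand-ins, not the study's survey data.

```python
# Sketch of the stated analysis plan: fit a logistic regression and a
# decision tree to (synthetic) survey responses predicting perceived
# marketplace success from candidate factors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# columns: trust, content_variety, transaction_cost (Likert 1-5, made up)
X = rng.integers(1, 6, size=(200, 3))
y = (X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(0, 1, 200) > 3).astype(int)

logit = LogisticRegression(max_iter=1000).fit(X, y)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print("logit coefficients:", logit.coef_[0])           # factor influence
print("tree importances:", tree.feature_importances_)  # factor importance
```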


A Dynamic Shortest Path Finding Model using Hierarchical Road Networks (도로 위계 구조를 고려한 동적 최적경로 탐색 기법개발)

  • Kim, Beom-Il;Lee, Seung-Jae
    • Journal of Korean Society of Transportation
    • /
    • v.23 no.6 s.84
    • /
    • pp.91-102
    • /
    • 2005
  • In the process of storing information, people tend to organize individual pieces of information into groups rather than keep them as independent attributes. Likewise, for finding the shortest path, this study suggests that a Hierarchical Road Network (HRN) model should be used to find the most desirable route, since the HRN model takes this grouping process into account. Moreover, most drivers select a route from origin to destination according to road hierarchy: drivers perceive a difference between the link travel time they experience while driving and the theoretical link travel time. One existing solution to this problem predicts link travel times from link conditions over time; the predicted link travel times are then used to search for the shortest path, and a stochastic process model uses historical patterns of travel time on links. The HRN model compared favorably with the conventional shortest-path model in terms of calculation speed. Moreover, the shortest paths produced by the HRN model were closer to the results of a survey of taxi drivers, who have strong knowledge of road conditions on the network and tend to select the shortest path according to practical common sense.
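
The hierarchy effect described above, drivers preferring higher-class roads even at some cost in raw travel time, can be sketched as a perceived-cost comparison between candidate routes; the route values and the 1.5x local-road penalty below are illustrative assumptions, not the HRN model's actual formulation.

```python
# Sketch of the hierarchy-aware cost idea: drivers perceive time on
# local roads as longer than the same time on arterials, so route
# choice can deviate from the raw-time shortest path.

# candidate routes as lists of (minutes, road_class); 1 = arterial, 2 = local
route_local    = [(5, 2), (4, 2)]           # raw 9 min, all local streets
route_arterial = [(3, 2), (5, 1), (2, 2)]   # raw 10 min, mostly arterial

def perceived_cost(route, local_penalty=1.5):
    # assumption: local-road minutes feel 1.5x longer to drivers
    return sum(w * (local_penalty if c == 2 else 1.0) for w, c in route)

for name, r in [("local", route_local), ("arterial", route_arterial)]:
    print(name, "raw:", sum(w for w, _ in r), "perceived:", perceived_cost(r))
# Raw time favors the local route (9 < 10), but perceived cost favors
# the arterial route (13.5 > 12.5), matching taxi drivers' choices.
```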

An Index-Based Approach for Subsequence Matching Under Time Warping in Sequence Databases (시퀀스 데이터베이스에서 타임 워핑을 지원하는 효과적인 인덱스 기반 서브시퀀스 매칭)

  • Park, Sang-Hyeon;Kim, Sang-Uk;Jo, Jun-Seo;Lee, Heon-Gil
    • The KIPS Transactions:PartD
    • /
    • v.9D no.2
    • /
    • pp.173-184
    • /
    • 2002
  • This paper discusses index-based subsequence matching that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. In earlier work, Kim et al. suggested an efficient method for whole matching under time warping. This method constructs a multidimensional index on a set of feature vectors, extracted from data sequences, that are invariant to time warping. For filtering in feature space, it also applies a lower-bound function that consistently underestimates the time-warping distance and satisfies the triangle inequality. In this paper, we incorporate a prefix-querying approach based on sliding windows into the earlier approach. For indexing, we extract a feature vector from every subsequence inside a sliding window and construct a multidimensional index using the feature vectors as indexing attributes. For query processing, we perform a series of index searches using the feature vectors of qualifying query prefixes. Our approach provides effective and scalable subsequence matching even for large databases, and we prove that it incurs no false dismissals. To verify the superiority of our approach, we performed extensive experiments; the results show that it achieves significant speedup on real-world S&P 500 stock data and on very large synthetic data.
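
The lower-bound filtering idea is central here: a cheap function that never exceeds the true time-warping distance can prune index candidates without false dismissal. The sketch below pairs a textbook DTW computation with a simple lower bound in the style of Kim et al.'s (comparing first, last, greatest, and smallest elements); the sequences are illustrative.

```python
# Sketch of time-warping distance plus a cheap lower bound of the kind
# used for index-level filtering. The bound compares the first, last,
# greatest, and smallest elements of the two sequences.
def dtw(a, b):
    INF = float("inf")
    d = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

def lower_bound(a, b):
    # never exceeds dtw(a, b), so it can prune candidates without
    # false dismissal: if lower_bound > epsilon, then dtw > epsilon too
    return max(abs(a[0] - b[0]), abs(a[-1] - b[-1]),
               abs(max(a) - max(b)), abs(min(a) - min(b)))

query, candidate = [1.0, 2.0, 3.0, 2.0], [1.0, 2.0, 2.5, 3.5, 2.0]
print(lower_bound(query, candidate), dtw(query, candidate))  # 0.5 1.0
```

Because the four compared element pairs must each be matched (or dominated) in any warping path, the bound underestimates the true distance, which is exactly the property that guarantees no false dismissals during filtering.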

Information System Evaluation using IPA Method (IPA 기법을 활용한 정보시스템 평가)

  • Park, Minsoo
    • The Journal of the Convergence on Culture Technology
    • /
    • v.6 no.3
    • /
    • pp.431-436
    • /
    • 2020
  • Information service organizations that provide science and technology information, which has a relatively short life cycle, for free or for a fee need to reflect rapidly changing user needs and behaviors and to adopt the latest technologies. The purpose of this study is to derive improvements for each system by comparing and analyzing users' general perceptions of domestic and foreign science and technology information sites and the importance of individual science and technology information attributes. A total of 816 users of science and technology information participated in an online survey, and the collected data were analyzed with quantitative methods including the IPA (Importance-Performance Analysis) technique; importance was evaluated using impact values calculated through regression analysis. The analysis showed that users' general perception was relatively favorable toward the national science and technology information services, and Google Scholar and ScienceDirect also rated highly; Google Scholar showed more strengths than areas for improvement. A better understanding of users' preferred systems is a good driving force for remedying the shortcomings of existing systems. The science and technology information service systems need improvements in information retrieval, that is, in search speed and search functions, as well as a user interface with better convenience and usability.
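
The IPA step places each attribute in a quadrant by comparing its importance and performance against the grand means; a minimal sketch follows, using made-up scores rather than the study's survey results (and plain mean importance rather than the regression-based impact values the study used).

```python
# Sketch of the IPA step: place each service attribute into a quadrant
# by comparing its importance and performance to the grand means.
# Scores below are illustrative, not the study's survey results.
attributes = {
    # name: (importance, performance) on a 5-point scale
    "search speed":     (4.6, 3.2),
    "search functions": (4.4, 3.4),
    "usability":        (4.3, 3.5),
    "content coverage": (3.6, 4.2),
}

imp_mean = sum(i for i, _ in attributes.values()) / len(attributes)
perf_mean = sum(p for _, p in attributes.values()) / len(attributes)

for name, (imp, perf) in attributes.items():
    if imp >= imp_mean and perf < perf_mean:
        quadrant = "concentrate here"      # important but underperforming
    elif imp >= imp_mean:
        quadrant = "keep up the good work"
    elif perf >= perf_mean:
        quadrant = "possible overkill"
    else:
        quadrant = "low priority"
    print(f"{name}: {quadrant}")
```

Attributes landing in the "concentrate here" quadrant, such as search speed and functions in this toy example, are the ones the abstract's conclusion flags for improvement.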