• Title/Summary/Keyword: information retrieval system

Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.43-61
    • /
    • 2019
  • Artificial intelligence technology has been advancing rapidly with the Fourth Industrial Revolution, and AI research is being actively conducted in a variety of fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s, this research has focused on solving cognitive problems related to human intelligence, such as learning and problem solving. Thanks to recent interest in the technology and research on various algorithms, the field of artificial intelligence has achieved more technological progress than ever. Knowledge-based systems are a sub-domain of artificial intelligence that aims to enable AI agents to make decisions using machine-readable, processable knowledge constructed from complex and informal human knowledge and rules in various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and it is now also used together with statistical AI techniques such as machine learning. More recently, knowledge bases have been used to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. These knowledge bases support intelligent processing in various AI applications, such as the question answering systems of smart speakers. However, building a useful knowledge base is time-consuming and still requires a great deal of expert effort. In recent years, many knowledge-based AI studies and technologies have used DBpedia, one of the largest knowledge bases, which aims to extract structured content from the information in Wikipedia. DBpedia contains various information extracted from Wikipedia, such as titles, categories, and links, but its most useful knowledge comes from Wikipedia infoboxes, which present a user-created summary of some unifying aspect of an article.
This knowledge is generated by the mapping rules between infobox structures and the DBpedia ontology schema defined in the DBpedia Extraction Framework. Because it generates knowledge from semi-structured infobox data created by users, DBpedia can be expected to provide highly accurate knowledge. However, since only about 50% of all pages in the Korean Wikipedia contain an infobox, DBpedia is limited in terms of knowledge scalability. This paper proposes a method that uses machine learning to extract knowledge from text documents according to the ontology schema. To demonstrate the appropriateness of this method, we describe a knowledge extraction model, trained on Wikipedia infoboxes, that follows the DBpedia ontology schema. Our knowledge extraction model consists of three steps: classifying documents into ontology classes, classifying the sentences appropriate for extracting triples, and selecting values and transforming them into RDF triples. The structure of a Wikipedia infobox is defined by infobox templates that provide standardized information across related articles, and the DBpedia ontology schema can be mapped to these templates. Based on these mapping relations, we classify an input document into an infobox category, which corresponds to an ontology class. After determining the class of the input document, we classify the appropriate sentences according to the attributes belonging to that class. Finally, we extract knowledge from the sentences classified as appropriate and convert it into triples. To train the models, we generated a training data set from a Wikipedia dump by adding BIO tags to sentences, and trained models covering about 200 classes and about 2,500 relations for knowledge extraction. Furthermore, we conducted comparative experiments with CRF and Bi-LSTM-CRF for the knowledge extraction process.
Through the proposed process, structured knowledge can be obtained by extracting knowledge from text documents according to the ontology schema. In addition, this methodology can significantly reduce the effort experts must spend constructing instances according to the ontology schema.
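The BIO tagging step used to build the training data can be illustrated with a minimal sketch. It assumes that a sentence is already tokenized and that the infobox value for an attribute appears verbatim in it; the tag format `B-<attr>`/`I-<attr>`/`O`, the helper names, and the example sentence are illustrative assumptions rather than the authors' implementation. The second function turns a tagged span back into a (subject, predicate, object) triple, mirroring the final conversion step.

```python
from typing import List, Optional, Tuple


def bio_tag(tokens: List[str], value_tokens: List[str], attr: str) -> List[str]:
    """Label the first occurrence of value_tokens with B-/I- tags for attr; every other token is O."""
    tags = ["O"] * len(tokens)
    n = len(value_tokens)
    if n == 0:
        return tags  # nothing to align
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == value_tokens:
            tags[start] = f"B-{attr}"
            for i in range(start + 1, start + n):
                tags[i] = f"I-{attr}"
            break  # only the first match is labeled
    return tags


def span_to_triple(subject: str, tokens: List[str],
                   tags: List[str]) -> Optional[Tuple[str, str, str]]:
    """Turn the first tagged span into a (subject, predicate, object) triple, or None if untagged."""
    attr: Optional[str] = None
    value: List[str] = []
    for token, tag in zip(tokens, tags):
        if attr is None:
            if tag.startswith("B-"):
                attr = tag[2:]
                value.append(token)
        elif tag == f"I-{attr}":
            value.append(token)
        else:
            break  # the span has ended
    if attr is None:
        return None
    return (subject, attr, " ".join(value))
```

For instance, tagging the tokens of "Alan Turing was born in London ." with the value "London" for a `birthPlace` attribute marks only that token as `B-birthPlace`, and `span_to_triple("Alan_Turing", tokens, tags)` then yields `("Alan_Turing", "birthPlace", "London")`. In the described approach, such tagged sentences would train a CRF or Bi-LSTM-CRF tagger, so exact string matching is needed only when generating training data.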

The Effects of e-Business on Business Performance - In the home-shopping industry - (e-비즈니스가 경영성과에 미치는 영향 -홈쇼핑을 중심으로-)

  • Kim, Sae-Jung;Ahn, Seon-Sook
    • Management & Information Systems Review
    • /
    • v.22
    • /
    • pp.137-165
    • /
    • 2007
  • It seems high time to increase productivity by adopting e-business, in order to overcome challenges posed both by external factors, including the appreciation of the Korean won, rising oil prices, and fierce global competition, and by domestic issues, represented by disparities between large corporations and small and medium enterprises (SMEs), between the Seoul metropolitan area and local cities, and between exports and domestic demand, all of which weaken the future growth engines of the Korean economy. The globalization era demands innovative changes in business processes and industrial structure aimed at creating new value. To this end, e-business is expected to play a core role in advancing the Korean economy through new value and innovation. To examine business performance in industries that have adopted e-business, this study analyzed the home shopping industry by closely examining financial ratios, including the ratio of net profit to sales, the ratio of operating income to sales, the ratio of gross cost to cost of sales, the ratio of gross cost to selling, general and administrative (SG&A) expense, and return on investment (ROI). To calculate these ratios, this study used corporate financial statements as its main source, obtained through the Data Analysis, Retrieval and Transfer System (DART) of the Financial Supervisory Service, one of Korea's financial supervisory authorities. First, the results of the trend analysis of the ratio of net profit to sales are as follows. CJ Home Shopping has recorded a remarkable increase in its ratio of net profit to sales since 2002, while its competitors have found it hard to catch up with CJ's strong performance. This is partly due to its efficient management relative to its capital. If the current trend continues, the front-runner is likely to secure the largest market share.
On the other hand, GS Home Shopping, despite having the best-organized system and the largest capital among its competitors, lacks management efficiency. Second, the results of the trend analysis of the ratio of operating income to sales are as follows. CJ Home Shopping and GS Home Shopping recorded similar growth trends until 2004. However, while CJ Home Shopping's operating income continued to increase in 2005, GS Home Shopping's operating income declined, widening the gap with CJ Home Shopping. While CJ Home Shopping, which has the largest market share in the home shopping industry, is engaged in aggressive marketing, GS Home Shopping, owing to its stability-driven management strategy, again falls behind CJ in the ratio of operating income to sales despite a favorable management environment that includes its large capital. The companies in Group B were all established in 2001. NS Home Shopping was the first in Group B to turn its losses into profits. Woori Home Shopping posted operating losses for three consecutive years and was finally sold to Lotte Group in 2007, but it has since registered a continuing increase in the ratio of net income to sales. Third, the results of the trend analysis of the ratio of gross cost to cost of sales are as follows. Since home shopping is a sales business, its cost of sales is much lower than that of other types of business, such as manufacturing. Within gross costs, which include cost of sales, SG&A expense, and non-operating expense, cost of sales has decreased remarkably since 2002. Group B has also posted a notable decline in this item since 2002. Fourth, the results of the trend analysis of the ratio of gross cost to SG&A expense are as follows. Owing to its unique characteristics, the home shopping industry usually posts a high ratio of SG&A expense.
However, an SG&A expense ratio of more than 80% reflects lax management and, at the same time, results in a sharply lower ratio of net income to sales than in other industries. Last but not least, the results of the trend analysis of ROI are as follows. For CJ Home Shopping, the ROI curve looks similar to that of its investment in fixed assets; the company's ratio of fixed assets to operating income skyrocketed in 2004 and 2005. GS Home Shopping's fixed assets, by contrast, are smaller than those of CJ Home Shopping. In sum, competition in the home shopping industry is currently among CJ, GS, Hyundai, NS, and Woori Home Shopping, and all of them need to manage their costs more thoroughly. For the late-comers in Group B and other home shopping companies to advance further, their current lax management should be reformed, particularly with regard to SG&A expense. Given that total sales in the Internet shopping sector are projected to exceed 20 trillion won by 2010, it is concluded that all participants in the home shopping industry should make efficient management of costs and expenses a higher priority than revenue growth if they hope to keep growing after 2007.
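The ratios in this trend analysis are simple quotients of income-statement and balance-sheet items. The sketch below assumes common textbook definitions (ROI as net income over invested capital, and the cost-of-sales and SG&A figures expressed as shares of gross costs, where gross costs are cost of sales plus SG&A plus non-operating expense as the abstract lists them); the paper's exact formulas are not reproduced here, and the field names and figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Financials:
    """Hypothetical annual figures for one company, in any consistent currency unit."""
    sales: float
    cost_of_sales: float
    sga_expense: float            # selling, general and administrative expense
    non_operating_expense: float
    operating_income: float
    net_income: float
    invested_capital: float       # assumption: denominator used for ROI


def ratios(f: Financials) -> Dict[str, float]:
    """Compute the five ratios under the assumed definitions."""
    # Gross costs as listed in the abstract: cost of sales + SG&A + non-operating expense.
    gross_cost = f.cost_of_sales + f.sga_expense + f.non_operating_expense
    return {
        "net_profit_to_sales": f.net_income / f.sales,
        "operating_income_to_sales": f.operating_income / f.sales,
        "cost_of_sales_share_of_gross_cost": f.cost_of_sales / gross_cost,
        "sga_share_of_gross_cost": f.sga_expense / gross_cost,
        "roi": f.net_income / f.invested_capital,
    }
```

With hypothetical figures of sales 1,000, cost of sales 100, SG&A 700, and non-operating expense 50, the SG&A share of gross costs is about 0.82, which is above the 80% level the abstract associates with lax management.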

Discussions on the Accessibility of School Library DLS Catalogue Records - Focused on Literary Collections - (학교도서관 DLS 목록의 자료 접근성에 대한 논의 - 문학 분야 장서를 중심으로 -)

  • Kang, Bong-Suk;Jung, Youngmi
    • Journal of Korean Library and Information Science Society
    • /
    • v.50 no.4
    • /
    • pp.539-559
    • /
    • 2019
  • One of the fundamental roles of libraries is to provide users with efficient and easy retrieval of materials. Various discussions have taken place in Korea and abroad on improving the accessibility of materials by category, user, and collection, and at the center of these is the issue of improving classification and cataloging systems. However, few studies in this area deal with the data accessibility of the DLS catalog, which is the central tool for accessing materials in Korean school libraries. This study originated from school library users' complaints about the difficulty of searching for and accessing books, especially literature. It is an exploratory study that attempts to identify problems by examining the causes of these difficulties from various angles. To this end, we surveyed and analyzed the current status of school library collections, the data registration of the school library support system DLS, the subject accessibility of the catalog records produced through it, and the perceptions and opinions of school library professionals. As a result, school library collections were found to be highly concentrated in the literature field, while the catalog bibliographic records were insufficient to provide efficient access to these collections. In addition, the DLS search function was found to be somewhat insufficient to compensate for this. Surveys of teacher-librarians and librarians also identified this problem, and most respondents named richer subject indexing and the assignment of search keywords as ways to improve access to materials in school library catalogs. As a continuation of this discussion, plans for improving access to school library materials should become more concrete through future user studies and new approaches to bookshelf classification.
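Why richer subject indexing and keyword assignment improve access can be seen in a toy inverted index: a record is found by a query term only if that term appears among its indexed terms. The sketch below is an illustration, not a model of the DLS itself; the record IDs, titles, and subject keywords are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_index(records: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each lower-cased term to the set of record IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record_id, terms in records.items():
        for term in terms:
            index[term.lower()].add(record_id)
    return index


def search(index: Dict[str, Set[str]], query: Iterable[str]) -> Set[str]:
    """Return record IDs that contain every query term (Boolean AND)."""
    result: Set[str] = set()
    for i, term in enumerate(query):
        hits = index.get(term.lower(), set())
        result = set(hits) if i == 0 else result & hits
    return result
```

If two novels are indexed by title words alone, a topical query such as "friendship" retrieves nothing; once a subject keyword like "friendship" is assigned to both records, the same query retrieves both.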

Korean Abbreviation Generation using Sequence to Sequence Learning (Sequence-to-sequence 학습을 이용한 한국어 약어 생성)

  • Choi, Su Jeong;Park, Seong-Bae;Kim, Kweon-Yang
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.3
    • /
    • pp.183-187
    • /
    • 2017
  • Smartphone users prefer to read and type quickly, so they frequently use abbreviated words and phrases. Abbreviations are now widely used, from chat terms to technical terms. Collecting abbreviations would therefore benefit many services, including information retrieval and recommendation systems. However, gathering abbreviations manually requires considerable effort and cost, because new abbreviations are continually coined whenever something new, such as a TV program or a phenomenon, emerges. Abbreviations therefore need to be generated automatically. Existing methods for generating Korean abbreviations use a rule-based approach. The rule-based approach is limited in that it cannot generate irregular abbreviations. Another problem is choosing the correct abbreviation among the candidates generated by the rules. To address these limitations, this paper proposes a method of automatically generating Korean abbreviations using sequence-to-sequence learning. Sequence-to-sequence learning can generate irregular abbreviations and avoids the problem of choosing the correct abbreviation among candidates, so it is well suited to generating Korean abbreviations. To evaluate the proposed method, we use two types of datasets. The experimental results show that our method is effective for irregular abbreviations.
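To contrast the two approaches, the sketch below shows a simple rule-based baseline (take the first syllable of each word) and how phrase/abbreviation pairs might be turned into character-level ID sequences for a sequence-to-sequence model. The rule, the `<s>`/`</s>` tokens, and the vocabulary layout are assumptions for illustration; the model itself is not shown, and this is not the authors' implementation. It relies on Hangul text being in precomposed form, where each syllable is one character.

```python
from typing import Dict, List, Tuple

SOS, EOS = "<s>", "</s>"  # assumed start/end-of-sequence tokens


def first_syllable_rule(phrase: str) -> str:
    """Rule-based baseline: join the first syllable of each space-separated word."""
    return "".join(word[0] for word in phrase.split())


def build_vocab(pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Assign an integer ID to every character (syllable or space) seen in sources or targets."""
    vocab = {SOS: 0, EOS: 1}
    for source, target in pairs:
        for ch in source + target:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Encode a string as <s> + character IDs + </s>."""
    return [vocab[SOS]] + [vocab[ch] for ch in text] + [vocab[EOS]]
```

The rule handles a regular case such as "남자 친구" → "남친", but it can only ever emit first syllables; a sequence-to-sequence model trained on encoded pairs learns the mapping from data instead, which is what allows irregular abbreviations.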

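An Analysis of the Synthetic Classification System for Literature in the DDC: Focusing on the 20th Edition (DDC문학류의 조합식 분류시스템 분석 - 20판을 중심으로)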

  • 윤희윤
    • Journal of Korean Library and Information Science Society
    • /
    • v.20
    • /
    • pp.351-381
    • /
    • 1993
  • The purpose of this study is to analyze the various processes and patterns for building, or synthesizing, class numbers in the 800 class of the Dewey Decimal Classification, Edition 20 (1989). The results of the analysis are as follows:
    1. The 800 class (Literature and rhetoric) of the DDC is a main class that actively adds the analytico-synthetic principle to an enumerative scheme.
    2. The facets applied in literature are language; literary form; literary period; kind, scope, or medium; notation 08 (collections) or 09 (criticism); and literary feature, subject, author, etc.
    3. The 800 class has five tables of precedence: for literary forms; for aspects; for specific kinds of persons; for literary periods in relation to aspects in works treating more than one literary form; and for subforms, aspects, and literary periods in works treating a specific literary form.
    4. The basic number synthesis for literary works proceeds through the facets in the following sequence, as far as necessary for the item: base no. + literary form + literary time or period + kind, scope, or medium + notation 08 or 09 + subform + additional notation from T3C and other tables.
    5. Given the multiplicity of facets, the synthesis formulas take the following forms:
       (1) Works about the literature: base no. (schedule) + language (T6) or form (T3B)
       (2) Works by or about an individual author: base no. (schedule) + form (T3A) + period (schedule) + subform (T3A)
       (3) Works by or about more than one author, not restricted by language facet: base no. (schedule) + period (T1); base no. (schedule) + kind, scope, medium (T3B), feature (T3C), or person (T5)
       (4) Works by or about more than one author, restricted by language facet: base no. (schedule) + form (T3B) + period (schedule) + subform (T3B) + notation 08 or 09 (T3B); base no. (schedule) + notation 08 or 09 (T3B) + 9 (T3C) + area notation (T2); base no. (schedule) + form (T3B) + notation 008 or 009 (T3B); base no. (schedule) + form (T3B) + kind, scope, medium (T3B) + notation 08 or 09 (T3B) + period (schedule)
       (5) Affiliated literatures for which period numbers are not used: base no. (schedule) + form (T3A or T3B), or notation 08 or 09 (T3B); base no. (schedule) + kind, scope, medium (T3B), feature (T3C), or person (T5)
    6. The problems in number building in the 800 class are the complexity and difficulty of number synthesis, the intrinsic weakness of form distinction, and the inconvenience of retrieval inherent in a form class. To solve these problems, the citation orders and methods of the DDC should be improved and the synthesis patterns simplified from the standpoint of applicability and usefulness in the literature class.
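The basic synthesis sequence for an individual author (base no. + form + period) can be mimicked mechanically: notations are concatenated after the base number, and the decimal point falls after the third digit. The sketch below is a deliberate simplification that ignores the tables of precedence and the many instructions in the schedules; the function name is hypothetical. The examples use two familiar DDC numbers, 811.54 (American poetry in English, later twentieth century) and 822.33 (Shakespeare).

```python
def synthesize(base: str, *facets: str) -> str:
    """Concatenate a base number and facet notations, then place the decimal point after digit 3.

    Simplified sketch: no precedence tables, no schedule instructions, no validation of facets.
    """
    digits = base + "".join(facets)
    if not digits.isdigit():
        raise ValueError(f"notation must contain digits only: {digits!r}")
    if len(digits) <= 3:
        return digits.ljust(3, "0")  # e.g. base no. 81 alone stands for 810
    return digits[:3] + "." + digits[3:]
```

For example, `synthesize("81", "1", "54")` combines the base number 81 (American literature in English), form 1 (poetry, from T3A), and period 54 to give 811.54, and `synthesize("82", "2", "3", "3")` gives 822.33 for English drama of the Elizabethan period, Shakespeare.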

VRML Model Retrieval System Based on XML (XML 기반 VRML 모델 검색 시스템)

  • Im, Min-San;Gwun, O-Bong;Song, Ju-Whan
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.07a
    • /
    • pp.709-711
    • /
    • 2005
  • With advances in computer graphics, the number of 3D models is growing exponentially. Existing systems that search only text or 2D images have difficulty retrieving 3D models accurately. Consequently, the need for 3D model retrieval systems has emerged, and research is under way in many fields to develop 3D model descriptors and retrieval algorithms that improve accuracy and speed. The main goal of this paper is to convert VRML models into XML data and use them for 3D model retrieval. There are two main retrieval methods: retrieval of basic shapes through classification of VRML nodes, and retrieval using the mass center computed during conversion to XML. In other words, building a 3D model database lets the system exploit classification via VRML nodes and support for a labeled 3D model database. By generating key values (descriptors) for the 3D models and storing and processing them as classified XML data, the models can be used for retrieval directly from the database, which improves retrieval speed and performance all the more as the number of comparison targets and similarity comparisons grows. For similarity comparison of more complex 3D models, we use the LFD (Light Field Descriptor) [6], the method rated most accurate on the Princeton Shape Benchmark (PSB) [1]. This method retrieves 3D models using 2D images obtained from them, and its drawback is the heavy computation needed to compare the many 2D viewpoints and the fitness of the observed 2D images. We therefore propose obtaining viewpoints along the x, y, and z axes when observing 2D images for 3D model retrieval, which reduces the number of viewpoints and greatly decreases the amount of computation.
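The mass-center descriptor and its storage as XML can be sketched as follows. This assumes the model's vertices have already been parsed from a VRML `IndexedFaceSet` (the parser is not shown) and uses an unweighted vertex centroid; the paper may compute the mass center differently (for example, weighting by surface area), and the XML element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mass_center(points: Sequence[Point]) -> Point:
    """Unweighted centroid of the vertex coordinates (assumption: every vertex has equal mass)."""
    if not points:
        raise ValueError("model has no vertices")
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def model_to_xml(name: str, node_type: str, points: Sequence[Point]) -> str:
    """Serialize a model's VRML node type and mass center as a small XML record."""
    cx, cy, cz = mass_center(points)
    root = ET.Element("model", name=name)
    ET.SubElement(root, "node", type=node_type)  # used for node-based classification
    ET.SubElement(root, "massCenter", x=f"{cx:.3f}", y=f"{cy:.3f}", z=f"{cz:.3f}")
    return ET.tostring(root, encoding="unicode")
```

For the eight corners of a unit cube, the centroid is (0.5, 0.5, 0.5). Storing such precomputed values is what allows candidate models to be filtered directly from the database before any expensive LFD comparison.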

A Study on Consumer Recall Competency and Recall Experience (우리나라 소비자의 리콜 역량과 리콜 경험에 관한 연구)

  • Koo, Hye-Gyoung
    • Journal of Digital Convergence
    • /
    • v.16 no.4
    • /
    • pp.1-10
    • /
    • 2018
  • In Korea, the number of recalled products is steadily increasing every year, but consumers' recall participation rate is very low. This study examined recall competency as a necessary factor for active recall participation by consumers, identified the components of recall competency, and identified the competency factors that influence recall experience. To this end, we examined the recall experience and recall competency of 1,626 adult consumers in Korea. Five factors were derived: willingness to participate in recalls, recall-related skills, awareness of recall policy, subjective knowledge, and objective knowledge. Comparing recall competencies between consumers with and without recall experience showed statistically significant differences in all competency factors. Recall-related skills and subjective knowledge were significant factors for recall experience. To improve the effectiveness of the recall system, it is important to raise the recall participation rate by strengthening consumers' recall competency, which requires improving recall information and making it easier to access and retrieve.

Research on the Development of Facets for Improvement in Searching Records: Focusing on Presidential Records (기록물의 검색 향상을 위한 패싯 개발에 관한 연구 - 대통령기록물을 중심으로 -)

  • Seong, Hyoju;Rieh, Hae-young
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.17 no.2
    • /
    • pp.165-188
    • /
    • 2017
  • As recognition of the importance of user-oriented services grows, finding aids that could improve the effectiveness of searching have received heightened attention. Considering both the importance of facets in finding aids for improving search effectiveness and the importance of presidential records in Korea, this study attempted to derive various facet elements that can be applied to a presidential records retrieval system by analyzing various resources, using presidential records as cases. To derive facet elements based on the characteristics of presidential records, the websites of the National Archives (NARA) and of presidential (and prime ministers') archives, as well as their search options, were examined as cases. In addition, the study analyzed the morphemes in the titles of presidential records, the terms entered by users of the Presidential Archives Portal of Korea, the terms used in requests for information disclosure to the Presidential Archives of Korea, the search options of the Presidential Archives Portal, and the elements of description and metadata standards. The significance of this study lies in suggesting a methodology for developing various facets as main elements of finding aids, using presidential records as cases.
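Facets in a finding aid typically let users narrow a result set by selecting facet values while showing how many records fall under each value. The sketch below illustrates that drill-down behavior; the facet names (`president`, `record_type`, `year`) and the records are hypothetical and are not the facet elements derived in the study.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]  # facet name -> facet value (hypothetical schema)


def facet_counts(records: List[Record], facet: str) -> Counter:
    """Count how many records carry each value of the given facet."""
    return Counter(r[facet] for r in records if facet in r)


def filter_by_facets(records: List[Record], selected: Dict[str, str]) -> List[Record]:
    """Keep records that match every selected facet value (drill-down)."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]
```

After a user selects a president, `facet_counts` on the filtered list tells the interface how many speeches, photographs, and so on remain, so each further selection is guided by the counts.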

Development of Collaborative Environment for Community-driven Scientific Data Curation (커뮤니티 주도적 과학 데이터 큐레이션 협업 환경의 개발)

  • Choi, Dong-Hoon;Park, Jae-Won;Kim, ByungKyu;Shin, Jin-Sup
    • The Journal of the Korea Contents Association
    • /
    • v.17 no.9
    • /
    • pp.1-11
    • /
    • 2017
  • The importance of data curation is increasingly recognized as the need for data reuse grows drastically. Due to the recent data explosion, scientists invest almost 90% of their effort in retrieving and collecting the data needed for their studies. In this paper, we describe the development and application of a collaborative environment for community-driven data curation, which is essential to enhance the reusability and citability of scientific data. The collaborative scientific data curation environment focuses on cross-linking data (or data collections) with their associated literature in order to capture and organize the inter-relations among research results in a specific domain. In addition, plenty of contextual information is provided as metadata to help users understand the data. The cross-linking is realized with the DOI system, which guarantees global accessibility to the data and their relationships to the literature. The curation environment has been adopted by a globally well-known intrinsically disordered protein research group to build a community-driven curated DB. The curated DB will drastically reduce researchers' effort to retrieve and collect the data required for scientific discovery.
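DOI-based cross-linking between data and literature amounts to maintaining links in both directions, where each endpoint resolves globally through the DOI resolver at `https://doi.org/`. The minimal sketch below assumes an in-memory index with hypothetical class and method names; the DOIs used with it should be placeholders, and the actual environment's storage and metadata model are not described here.

```python
from collections import defaultdict
from typing import Dict, Set

DOI_RESOLVER = "https://doi.org/"


class CrossLinkIndex:
    """Bidirectional links between dataset DOIs and literature DOIs (hypothetical design)."""

    def __init__(self) -> None:
        self._papers_by_data: Dict[str, Set[str]] = defaultdict(set)
        self._data_by_paper: Dict[str, Set[str]] = defaultdict(set)

    def link(self, data_doi: str, paper_doi: str) -> None:
        """Record that a dataset and an article are related, in both directions."""
        self._papers_by_data[data_doi].add(paper_doi)
        self._data_by_paper[paper_doi].add(data_doi)

    def papers_for(self, data_doi: str) -> Set[str]:
        """Articles linked to a dataset (a copy, so callers cannot mutate the index)."""
        return set(self._papers_by_data.get(data_doi, set()))

    def data_for(self, paper_doi: str) -> Set[str]:
        """Datasets linked to an article."""
        return set(self._data_by_paper.get(paper_doi, set()))


def resolver_url(doi: str) -> str:
    """Build the global resolver URL for a DOI."""
    return DOI_RESOLVER + doi
```

Keeping both directions lets a user start from either an article or a dataset and reach the related items in one lookup, which is the kind of navigation between research results the curation environment is meant to support.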

A MPEG Audio-Visual Conversational Communication Terminal on the B-ISDN Environment (광대역 ISDN용 MPEG 오디오-비쥬열 대화형 통신단말의 설계 및 구현)

  • Hwang, Dae-Hwan;Cho, Kyu-Seob
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.8
    • /
    • pp.1960-1971
    • /
    • 1998
  • Research and development to provide multimedia communication services such as Video on Demand (VoD), real-time video phone, and multipoint video conferencing in broadband ISDN environments have been actively pursued. The Digital Audio-Visual Council (DAVIC) has already finished specifications for VoD services that support detailed technologies, including a total service system consisting of a VoD server, a delivery network, and a Set-Top Box (STB), and ITU-T SG16 has also recommended the H.300-series standards for the terminal aspects of conversational multimedia services. However, the multimedia terminal architectures recommended and specified by these organizations do not have an efficient structure for providing retrieval, distribution, and conversational services together, because the organizations take different points of view on multimedia terminals and services. In this paper, we analyzed the recommendations and specifications of international public and private organizations such as ITU-T, DAVIC, and the ATM Forum. Based on this analysis, we propose an efficient terminal architecture, and we designed and implemented a multimedia communication terminal offering VoD and real-time conversational services. We then performed functional module tests for each communication service session and confirmed the validity of the implemented terminal for use in broadband ISDN environments.
