• Title/Summary/Keyword: 검색어 추출

Search Result 329, Processing Time 0.034 seconds

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan;Jeon, Ho-Cheol;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.3
    • /
    • pp.63-77
    • /
    • 2011
  • The World Wide Web is a very large distributed digital information space. From its origins in 1991, the web has grown to encompass diverse information resources as personal home pasges, online digital libraries and virtual museums. Some estimates suggest that the web currently includes over 500 billion pages in the deep web. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck. In fact, some existing search tools sift through gigabyte.syze precompiled web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Furthermore, the most relevant documents do not nessarily appear at the top of the query output order. Also, current search tools can not retrieve the documents related with retrieved document from gigantic amount of documents. The most important problem for lots of current searching systems is to increase the quality of search. It means to provide related documents or decrease the number of unrelated documents as low as possible in the results of search. For this problem, CiteSeer proposed the ACI (Autonomous Citation Indexing) of the articles on the World Wide Web. A "citation index" indexes the links between articles that researchers make when they cite other articles. Citation indexes are very useful for a number of purposes, including literature search and analysis of the academic literature. For details of this work, references contained in academic articles are used to give credit to previous work in the literature and provide a link between the "citing" and "cited" articles. A citation index indexes the citations that an article makes, linking the articleswith the cited works. Citation indexes were originally designed mainly for information retrieval. The citation links allow navigating the literature in unique ways. Papers can be located independent of language, and words in thetitle, keywords or document. A citation index allows navigation backward in time (the list of cited articles) and forwardin time (which subsequent articles cite the current article?) But CiteSeer can not indexes the links between articles that researchers doesn't make. Because it indexes the links between articles that only researchers make when they cite other articles. Also, CiteSeer is not easy to scalability. Because CiteSeer can not indexes the links between articles that researchers doesn't make. All these problems make us orient for designing more effective search system. This paper shows a method that extracts subject and predicate per each sentence in documents. A document will be changed into the tabular form that extracted predicate checked value of possible subject and object. We make a hierarchical graph of a document using the table and then integrate graphs of documents. The graph of entire documents calculates the area of document as compared with integrated documents. We mark relation among the documents as compared with the area of documents. Also it proposes a method for structural integration of documents that retrieves documents from the graph. It makes that the user can find information easier. We compared the performance of the proposed approaches with lucene search engine using the formulas for ranking. As a result, the F.measure is about 60% and it is better as about 15%.

Activity Screening of the Proteolytic Enzymes Responsible for Post-mortem Degradation of Fish Tissues (어류의 사후 변화에 관여하는 단백질분해효소의 검색)

  • PYEUN Jae-Hyeung;LEE Dong-Soo;KIM Doo-Sang;HEU Min-Soo
    • Korean Journal of Fisheries and Aquatic Sciences
    • /
    • v.29 no.3
    • /
    • pp.296-308
    • /
    • 1996
  • Proteolytic enzymes responsible for post-mortem degradation of the fish tissues have been studied in regard with screening the proteases distributed in the fish body by reacting with the specific synthesized substrates. Activities of cathepsin L, B, H, G, and D like enzymes were detected in the muscle crude protease from the both kind of fish, dark fleshed fish (anchovy, Engraulis japonica, and gizzard-shad, Clupanodo punctatus) and white fleshed fish (seabass, Lateolabrax japonicus, and sole, Pleuronichthys cornutus), however, those of chymotrypsin, trypsin, pepsin, and peptidase like enzymes were observed 3n the viscera crude pretense from the fish. Proteolytic activities of the muscle crude protease at pH 6.0 were similar to those of the viscera crude protease at pH 8.0, but, those of the viscera crude protease at pH 8.0 were about 2 times higher than those at pH 6.0. The muscle and viscera crude protease from anchovy showed the strongest proteolytic activity among the four fish crude proteases and the proteolytic activity of the viscera crude protease was approximately 100 times higher than that of the muscle crude protease, which suggest that viscera proteases were more contributed on the development of post-mortem changes than muscle proteases. With the degradation patterns on SDS-polyacrylamide gel electrophoresis against yellowtail myofibrillar proteins, the muscle and viscera crude protease of the four fishes were primary responsible for the degradation of myosin heavy chain, and myosin light chain and actin, respectively.

  • PDF

Analysis of Research Trends of 'Word of Mouth (WoM)' through Main Path and Word Co-occurrence Network (주경로 분석과 연관어 네트워크 분석을 통한 '구전(WoM)' 관련 연구동향 분석)

  • Shin, Hyunbo;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.179-200
    • /
    • 2019
  • Word-of-mouth (WoM) is defined by consumer activities that share information concerning consumption. WoM activities have long been recognized as important in corporate marketing processes and have received much attention, especially in the marketing field. Recently, according to the development of the Internet, the way in which people exchange information in online news and online communities has been expanded, and WoM is diversified in terms of word of mouth, score, rating, and liking. Social media makes online users easy access to information and online WoM is considered a key source of information. Although various studies on WoM have been preceded by this phenomenon, there is no meta-analysis study that comprehensively analyzes them. This study proposed a method to extract major researches by applying text mining techniques and to grasp the main issues of researches in order to find the trend of WoM research using scholarly big data. To this end, a total of 4389 documents were collected by the keyword 'Word-of-mouth' from 1941 to 2018 in Scopus (www.scopus.com), a citation database, and the data were refined through preprocessing such as English morphological analysis, stopwords removal, and noun extraction. To carry out this study, we adopted main path analysis (MPA) and word co-occurrence network analysis. MPA detects key researches and is used to track the development trajectory of academic field, and presents the research trend from a macro perspective. For this, we constructed a citation network based on the collected data. The node means a document and the link means a citation relation in citation network. We then detected the key-route main path by applying SPC (Search Path Count) weights. As a result, the main path composed of 30 documents extracted from a citation network. The main path was able to confirm the change of the academic area which was developing along with the change of the times reflecting the industrial change such as various industrial groups. The results of MPA revealed that WoM research was distinguished by five periods: (1) establishment of aspects and critical elements of WoM, (2) relationship analysis between WoM variables, (3) beginning of researches of online WoM, (4) relationship analysis between WoM and purchase, and (5) broadening of topics. It was found that changes within the industry was reflected in the results such as online development and social media. Very recent studies showed that the topics and approaches related WoM were being diversified to circumstantial changes. However, the results showed that even though WoM was used in diverse fields, the main stream of the researches of WoM from the start to the end, was related to marketing and figuring out the influential factors that proliferate WoM. By applying word co-occurrence network analysis, the research trend is presented from a microscopic point of view. Word co-occurrence network was constructed to analyze the relationship between keywords and social network analysis (SNA) was utilized. We divided the data into three periods to investigate the periodic changes and trends in discussion of WoM. SNA showed that Period 1 (1941~2008) consisted of clusters regarding relationship, source, and consumers. Period 2 (2009~2013) contained clusters of satisfaction, community, social networks, review, and internet. Clusters of period 3 (2014~2018) involved satisfaction, medium, review, and interview. The periodic changes of clusters showed transition from offline to online WoM. Media of WoM have become an important factor in spreading the words. This study conducted a quantitative meta-analysis based on scholarly big data regarding WoM. The main contribution of this study is that it provides a micro perspective on the research trend of WoM as well as the macro perspective. The limitation of this study is that the citation network constructed in this study is a network based on the direct citation relation of the collected documents for MPA.

An Automatic Schema Generation System based on the Contents for Integrating Web Information Sources (웹 정보원 통합을 위한 내용 기반의 스키마 자동생성시스템)

  • Kwak, Jun-Young;Bae, Jong-Min
    • Journal of the Korea Society of Computer and Information
    • /
    • v.13 no.6
    • /
    • pp.77-86
    • /
    • 2008
  • The Web information sources can be regarded as the largest distributed database to the users. By virtually integrating the distributed information sources and regarding them as a single huge database, we can query the database to extract information. This capability is important to develop Web application programs. We have to infer a database schema from browsing-oriented Web documents in order to integrate databases. This paper presents a heuristic algorithm to infer the XML Schema fully automatically from semi-structured Web documents. The algorithm first extracts candidate pattern regions based on predefined structure-making tags, and determines a target pattern region using a few heuristic factors, and then derives XML Schema extraction rules from the target pattern region. The schema extraction rule is represented in XQuery, which makes development of various application systems possible using open standard XML tools. We also present the experimental results for several public web sources to show the effectiveness of the algorithm.

  • PDF

Semi-automatic Ontology Modeling for VOD Annotation for IPTV (IPTV의 VOD 어노테이션을 위한 반자동 온톨로지 모델링)

  • Choi, Jung-Hwa;Heo, Gil;Park, Young-Tack
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.7
    • /
    • pp.548-557
    • /
    • 2010
  • In this paper, we propose a semi-automatic modeling approach of ontology to annotate VOD to realize the IPTV's intelligent searching. The ontology is made by combining partial tree that extracts hypernym, hyponym, and synonym of keywords related to a service domain from WordNet. Further, we add to the partial tree new keywords that are undefined in WordNet, such as foreign words and words written in Chinese characters. The ontology consists of two parts: generic hierarchy and specific hierarchy. The former is the semantic model of vocabularies such as keywords and contents of keywords. They are defined as classes including property restrictions in the ontology. The latter is generated using the reasoning technique by inferring contents of keywords based on the generic hierarchy. An annotation generates metadata (i.e., contents and genre) of VOD based on the specific hierarchy. The generic hierarchy can be applied to other domains, and the specific hierarchy helps modeling the ontology to fit the service domain. This approach is proved as good to generate metadata independent of any specific domain. As a result, the proposed method produced around 82% precision with 2,400 VOD annotation test data.

The Blog Polarity Classification Technique using Opinion Mining (오피니언 마이닝을 활용한 블로그의 극성 분류 기법)

  • Lee, Jong-Hyuk;Lee, Won-Sang;Park, Jea-Won;Choi, Jae-Hyun
    • Journal of Digital Contents Society
    • /
    • v.15 no.4
    • /
    • pp.559-568
    • /
    • 2014
  • Previous polarity classification using sentiment analysis utilizes a sentence rule by product reviews based rating points. It is difficult to be applied to blogs which have not rating of product reviews and is possible to fabricate product reviews by comment part-timers and managers who use web site so it is not easy to understand a product and store reviews which are reliability. Considering to these problems, if we analyze blogs which have personal and frank opinions and classify polarity, it is possible to understand rightly opinions for the product, store. This paper suggests that we extract high frequency vocabularies in blogs by several domains and choose topic words. Then we apply a technique of sentiment analysis and classify polarity about contents of blogs. To evaluate performances of sentiment analysis, we utilize the measurement index that use Precision, Recall, F-Score in an information retrieval field. In a result of evaluation, using suggested sentiment analysis is the better performances to classify polarity than previous techniques of using the sentence rule based product reviews.

Opinion Mining of Product Reviews using Sentiment Phrase Patterns considered the Endings of Declinable Words (어미변화를 고려한 감성 구문 패턴을 이용한 상품평 의견 분류)

  • Kim, Jung-Ho;Cha, Myung-Hoon;Kim, Myung-Kyu;Chae, Soo-Hoan
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2010.06c
    • /
    • pp.285-290
    • /
    • 2010
  • 인터넷이 대중화됨에 따라 누구나 쉽게 자신의 의견을 온라인상에 표현할 수 있게 되었다. 그 결과 생각이나 느낌을 나타내는 의견 데이터들의 양이 급속도로 방대해졌으며, 이러한 데이터들을 이용한 여러 응용 사례들의 등장으로, 효율적인 검색 및 자동 분류 기술이 요구되고 있다. 이런 기술적 흐름에 맞추어 의견 데이터 분류에 관한 여러 연구들이 이루어져 왔다. 이러한 의견 분류에 대한 연구들을 살펴보면, 분류를 위해 자질(Feature)로서 사용한 단일어(Single word)가 아닌 2개 이상의 N-gram 단어, 어휘 구문 패턴 및 통사 구문 패턴 등을 사용한다. 특히, 패턴은 단일어나 N-gram 단어에 비해 유연하고, 언어학적으로 풍부한 정보를 표현할 수 있기 때문에 이를 주요 연구 주제로 사용되었다. 그럼에도 불구하고, 이러한 연구들은 주로 영어에 대한 연구들이었으며, 한국어에 패턴을 적용하여 주관성을 갖는 문장을 분류하거나, 극성을 분류하는 연구들은 아직 미비하다. 한국어의 특색으로 한국어는 용언의 활용이 발달되어 있어, 어미의 변화가 다양하며, 그 변화에 따라 의미가 미묘하게 변화한다. 그러나 기존 한국어에 대한 의견 분류 연구들은 단어의 핵심 의미만을 파악하기 위해 어미 부분을 제거하고 어간만을 취해서 처리하여 어미에 대한 의미변화를 고려하지 못하므로 분류 정확도가 영어권에 연구 결과에 비해 떨어진다. 그래서 본 연구는 영어에 적용된 패턴을 이용한 기존 방법들을 정리하고, 그 방법들 중에서 극성을 지닌 문장성분 패턴을 한국어에 적용하였다. 그리고 어미의 변화에 대한 패턴을 추출하여 이 변화가 의견 분류의 성능에 미치는 영향을 분석하였다.

  • PDF

Analysis of Major Changes in Press Articles Related to 'High School Credit System'

  • Kwon, Choong-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.7
    • /
    • pp.183-191
    • /
    • 2020
  • The purpose of this study is to objectively analyze the trend of media articles related to the 'high school credit system' (2017~2019: 3 years), which has become the biggest concern among Korean education policies, through BIGKinds, a news data big data analysis service for media companies. The main research methodologies were BIGKinds system's specific search term news search, news trend analysis, keyword extraction and wordcloud implementation, network analysis and network picture presentation. The research results are as follows; First, the number of articles related to the high school credit system that appeared in major media outlets in Korea for 3 years from 2017 to 2019 was 3,649. The number of articles was sharply increased at a certain point about 4 times, based on the government's announcement of related policies. It showed an increasing news trend. Second, the top 20 keywords that emerged from the press articles related to the high school credit system for 3 years of analysis were presented, and it was confirmed that the keyword change by year appeared. Third, the network of media articles related to the high school credit system was visualized and presented in different ways by person, institution, and keyword. The results of this study confirmed that the high school credit system education policy was adopted as the representative education policy of the Moon Jae-in government, and is proceeding in the policy decision stage and policy implementation stage.

Biological characterization of Tenacibaculum maritimum isolated from cultured olive flounder in Korea and sensitivity against native plant extracts (한국의 양식넙치에서 분리한 Tenacibaculum maritimum의 특성과 자생식물 추출물에 대한 감수성)

  • Jang, Yeoung-Hwan;Jeong, Joon-Bum;Yeo, In-Kyu;Kim, Ki-Young;Harikrishnan, Ramasamy;Heo, Moon-Soo
    • Journal of fish pathology
    • /
    • v.22 no.1
    • /
    • pp.53-65
    • /
    • 2009
  • Tenacibaculum maritimum (formerly Flexibacter maritimus) is the aetiological agent of an ulcerative and necrotic disease commonly called tenacibaculosis in marine fish. Tenacibaculosis is an economically important disease in a great variety species in Jeju Island cultured fish and leading to this pathogen initially affected by skin, mouth, fins, tail causing severe necrotic and ulcerative lesions on the body surface. In the present study, A-7 strain was isolated from Paralichthys olivaceus showing symptoms of tenacibaculosis and identified as T. maritimum by morphological, biochemical and molecular biological analysis. T. maritimum A-7 is experimentally infected through immersion route in Paralichthys olivaceus which the disease outbreaks in land-based fish tanks of Jeju Island. Up to data a number of treatments proposed for the tenacibaculosis outbreaks are based on the immersion administration of drugs in tank. Oxytetracycline is the most widely used disinfectants in fish farms. However, most of fish farms manager and consumers have expressed concern as bioaccumulation in tissue and its environmental. In addition, this antimicrobial compounds is expensive in fish farmers. The overcome of this problem is desired the application of natural plant derived products. To obtain as 70% EtOH extract antimicrobial compounds against tenacibaculosis from 35 species of Jeju Island native plants were screened for antimicrobial activity against T. maritimum. In the present study were identified most of the plant extracts were better antimicrobial activity against T. maritimum.

Schema Element Matching System using WordNet (워드넷을 이용한 스키마 엘리먼트 매칭 시스템)

  • Lee, Min-Ho;Lee, Won-Goo;Choi, Yun-Soo;Yun, Hwa-Muk;Choi, Dong-Hoon;Cho, Min-Hee;Jung, Han-Min
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2012.06c
    • /
    • pp.122-124
    • /
    • 2012
  • 정보의 상호운용성 확보를 위해서 여러 형태로 정의되어 있는 스키마들을 매칭하는 것은 반드시 필요한 작업이다. 워드넷은 영어의 의미 어휘목록으로 유의어 집단과 어휘 목록사이의 다양한 의미관계를 기록하여 자동화된 본문 분석과 인공지능 응용에 활용할 수 있다. 본 논문에서는 워드넷을 이용하여 스키마 엘리먼트 이름의 의미 집합을 추출하고 대응하는 엘리먼트 의미 집합과의 유사도를 측정함으로써 스키마 엘리먼트를 매칭하는 시스템을 제안한다. 본 시스템은 다중매칭된 복잡한 관계를 간단한 방법으로 단일매칭화함으로써 사용자가 직관적이고 용이하게 사용할 수 있다. 이를 통하여 데이터 통합, 변환, 분산 검색 등 정보의 상호운용이 필요한 다양한 분야에서 활용될 수 있을 것으로 기대한다.