• Title/Abstract/Keywords: Implicit Extraction

Search results: 34 items

한의 임상 정보의 효율적 통합을 위한 한의임상 데이터베이스 및 E-CRF 입력 시스템 구축 (Implementation of database and E-CRF for efficient integration of Korean clinical data)

  • 소지호;전영주;이범주
    • 한국인터넷방송통신학회논문지, Vol. 16, No. 5, pp.205-212, 2016
  • Alongside recent advances in medical technology, research on the integration and standardization of clinical data is being actively pursued not only in Western medicine but also in Korean medicine. If data from similar, and even entirely different, clinical trials are integrated under a single standard, the consolidated medical data can be used in studies that derive implicit Korean medicine knowledge. In this paper, we therefore built a Korean medicine clinical database based on the CDISC standard, which is widely used internationally, to store Korean medicine clinical information efficiently, and implemented an E-CRF for convenient data entry in clinical settings. We also illustrated the integration of Korean medicine clinical data by storing the data of four actual clinical studies (a minimal record-layout sketch follows below). Our results lay a foundation for deriving implicit medical knowledge from the integrated data and, beyond efficient management through data integration, can prevent redundant or unnecessary clinical trials and promote research convenience and collaboration through the redistribution of curated data.
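A minimal sketch of how a single clinical observation might be laid out in a CDISC SDTM-style findings record, given only that the paper bases its database on the CDISC standard; the domain code, variable names, and values below are illustrative assumptions, not the schema actually built in the study.

```python
from dataclasses import dataclass

@dataclass
class FindingsRecord:
    """One observation in a CDISC SDTM-style findings domain (illustrative)."""
    studyid: str   # study identifier
    domain: str    # two-letter domain code
    usubjid: str   # unique subject identifier
    seq: int       # sequence number within the subject
    testcd: str    # short test code
    test: str      # test name
    orres: str     # result in original units
    orresu: str    # original units
    dtc: str       # ISO 8601 collection date/time

# Hypothetical vital-sign measurement from one of the four integrated studies
rec = FindingsRecord(
    studyid="KM-TRIAL-01", domain="VS", usubjid="KM-TRIAL-01-0007",
    seq=1, testcd="PULSE", test="Pulse Rate",
    orres="72", orresu="beats/min", dtc="2016-03-14T09:30",
)
print(rec)
```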

키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법 (A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model)

  • 조원진;노상규;윤지영;박진수
    • Asia Pacific Journal of Information Systems, Vol. 21, No. 1, pp.103-122, 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors in an effort to guide users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle; manually assigning keywords to all documents is a daunting task, or even impractical, in that it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are mainly two approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given set of vocabulary, and the aim is to match it to the texts; in other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords with respect to their relevance in the text without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques; keyword extraction algorithms thus classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as its keywords, and as a result keyword extraction is limited to terms that appear in the document. Therefore, keyword extraction cannot generate implicit keywords that are not included in a document. According to the experimental results of Turney, about 64% to 90% of the keywords assigned by authors can be found in the full text of an article. Inversely, this also means that 10% to 36% of author-assigned keywords do not appear in the article and therefore cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of author-assigned keywords are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows (see the sketch below): (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. First, the IVSM system was implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As electronic documents increase, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
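A minimal sketch of the similarity-scoring step (steps 1-5 above), assuming keyword sets and the target document are both represented as simple term-frequency vectors; the exact keyword weighting used by IVSM in the paper may differ.

```python
import math
from collections import Counter

def tf_vector(terms):
    """Term-frequency vector for a tokenized text or a keyword set."""
    return Counter(terms)

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign_keywords(doc_terms, keyword_sets, top_k=5):
    """Rank candidate keywords by the similarity of their sets to the document."""
    doc_vec = tf_vector(doc_terms)
    scores = {kw: cosine(tf_vector(terms), doc_vec)
              for kw, terms in keyword_sets.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical keyword sets accumulated from previously keyworded papers
keyword_sets = {
    "logistics": ["port", "shipping", "cargo", "logistics", "freight"],
    "retail":    ["store", "customer", "price", "distribution", "sales"],
}
doc = "container cargo throughput and shipping routes of a port".split()
print(assign_keywords(doc, keyword_sets, top_k=1))   # ['logistics']
```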

Arabic Words Extraction and Character Recognition from Picturesque Image Macros with Enhanced VGG-16 based Model Functionality Using Neural Networks

  • Ayed Ahmad Hamdan Al-Radaideh;Mohd Shafry bin Mohd Rahim;Wad Ghaban;Majdi Bsoul;Shahid Kamal;Naveed Abbas
    • KSII Transactions on Internet and Information Systems (TIIS), Vol. 17, No. 7, pp.1807-1822, 2023
  • Innovation and the rapidly increasing functionality of user-friendly smartphones have encouraged shutterbugs to capture picturesque image macros in the work environment or during travel. Formal signboards are placed with marketing objectives and are enriched with text for attracting people. Extracting and recognizing text from natural images is an emerging research issue and needs consideration. Compared to conventional optical character recognition (OCR), the complex background, implicit noise, lighting, and orientation of these scene-text photos make the problem more difficult, and Arabic scene-text extraction and recognition adds a number of further complications. The method described in this paper uses a two-phase methodology to extract Arabic text, with word-boundary awareness, from scenic images with varying text orientations. The first stage uses a convolutional autoencoder, and the second uses Arabic Character Segmentation (ACS) followed by a traditional two-layer neural network for recognition (a minimal sketch of this recognition stage follows below). This study also presents how an Arabic training and synthetic dataset can be created to exemplify the text superimposed on different scene images. For this purpose, a dataset of 10K cropped images in which Arabic text was found was created for the detection phase, along with a 127K Arabic character dataset for the recognition phase. The phase-1 labels were generated from an Arabic corpus consisting of 15K quotes and sentences. The Arabic Word Awareness Region Detection (AWARD) approach, which is highly flexible in identifying complex Arabic scene-text images such as texts that are arbitrarily oriented, curved, or deformed, is used to detect these texts. Our experiments show that the system achieves 91.8% word segmentation accuracy and 94.2% character recognition accuracy. We believe that future researchers will advance the processing of text images in any language, improving noise handling in scene images, by enhancing the functionality of the VGG-16 based model using neural networks.
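A minimal sketch of the recognition stage only, written in PyTorch under the assumption that segmented character crops are 32×32 grayscale images; the hidden-layer size and the number of character classes are illustrative, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 28          # illustrative: one class per base Arabic letter
IMG_PIXELS = 32 * 32      # assumed 32x32 grayscale character crops

# Traditional two-layer (one hidden layer) network for the recognition phase
recognizer = nn.Sequential(
    nn.Flatten(),                    # (N, 1, 32, 32) -> (N, 1024)
    nn.Linear(IMG_PIXELS, 256),      # hidden layer
    nn.ReLU(),
    nn.Linear(256, NUM_CLASSES),     # output logits, one per character class
)

# Forward pass on a dummy batch of 8 segmented character crops
crops = torch.rand(8, 1, 32, 32)
logits = recognizer(crops)
print(logits.argmax(dim=1))          # predicted class index for each crop
```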

Adaptive Cross-Device Gait Recognition Using a Mobile Accelerometer

  • Hoang, Thang;Nguyen, Thuc;Luong, Chuyen;Do, Son;Choi, Deokjai
    • Journal of Information Processing Systems, Vol. 9, No. 2, pp.333-348, 2013
  • Mobile authentication/identification has grown into a priority issue because existing mechanisms, such as PINs and passwords, are outdated. In this paper, we introduce gait recognition using a mobile accelerometer as not only an effective but also an implicit identification model. Unlike previous works, in which gait recognition only performs well with a particular mobile specification (e.g., a fixed sampling rate), our work focuses on constructing a unique adaptive mechanism that can be deployed independently of the specification of the mobile device. To do this, the impact of the sampling rate on the preprocessing steps, such as noise elimination, data segmentation, and feature extraction, is examined in depth (a resampling sketch follows below). Moreover, the degree of agreement between the gait features extracted from two different mobiles, measured by both the Average Error Rate (AER) and Intra-class Correlation Coefficients (ICC), is assessed to evaluate the possibility of constructing a device-independent mechanism. We achieved a classification accuracy of approximately 91.33 ± 0.67% for both devices, which shows that it is feasible and reliable to construct adaptive cross-device gait recognition on a mobile phone.
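A minimal sketch of the device-independence idea, assuming the per-axis signals have already been combined into a magnitude series: accelerometer data recorded at different device-specific rates is brought to a common rate by linear interpolation before segmentation and feature extraction. The rates and the synthetic walking signal are placeholders, not data from the paper.

```python
import numpy as np

def resample(signal, src_rate, dst_rate):
    """Linearly interpolate an accelerometer signal to a target sampling rate."""
    t_src = np.arange(len(signal)) / src_rate
    t_dst = np.arange(0, t_src[-1], 1.0 / dst_rate)
    return np.interp(t_dst, t_src, signal)

def simulated_walk(rate, seconds=10, step_hz=2.0, seed=0):
    """Synthetic accelerometer magnitude for a walking pattern (placeholder)."""
    rng = np.random.default_rng(seed)
    t = np.arange(0, seconds, 1.0 / rate)
    return np.sin(2 * np.pi * step_hz * t) + 0.1 * rng.standard_normal(t.size)

# Two devices recording the same gait at different, device-specific rates
device_a = simulated_walk(rate=100)   # e.g. a 100 Hz handset
device_b = simulated_walk(rate=27)    # e.g. a 27 Hz handset

# Resample both to a common 50 Hz rate before segmentation / feature extraction
common_a = resample(device_a, 100, 50)
common_b = resample(device_b, 27, 50)
print(len(common_a), len(common_b))   # comparable lengths at the shared rate
```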

Discovery of CPA's Tacit Decision Knowledge Using Fuzzy Modeling

  • Li, Sheng-Tun;Shue, Li-Yen
    • 한국지능정보시스템학회:학술대회논문집, 한국지능정보시스템학회 2001년도 The Pacific Asian Conference on Intelligent Systems 2001, pp.278-282, 2001
  • The discovery of tacit knowledge from domain experts is one of the most exciting challenges in today's knowledge management. The decision knowledge used in determining the quality of a firm's short-term liquidity is full of abstraction, ambiguity, and incompleteness, and presents a typical tacit knowledge extraction problem. In dealing with knowledge discovery of this nature, we propose a scheme that integrates both knowledge elicitation and knowledge discovery in the knowledge engineering process. The knowledge elicitation component applies Verbal Protocol Analysis to establish industrial cases as the basic knowledge data set. The knowledge discovery component then applies fuzzy clustering to the data set to build a fuzzy knowledge-based system, which consists of a set of fuzzy rules representing the decision knowledge and the membership functions of each decision factor for verifying the linguistic expressions in the rules (a clustering sketch follows below). The experimental results confirm that the proposed scheme can effectively discover the expert's tacit knowledge and works as a feedback mechanism for human experts to fine-tune the process of converting tacit knowledge into implicit knowledge.
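A minimal sketch of fuzzy clustering over elicited cases, assuming each verbal-protocol case has been coded as a numeric vector of decision factors (two hypothetical liquidity ratios here). This is plain fuzzy c-means, a common choice for deriving fuzzy rules and membership functions, not necessarily the exact procedure used in the paper.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means: returns cluster centers and a membership matrix U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))          # random fuzzy memberships
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]  # weighted cluster prototypes
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        U = 1.0 / (dist ** (2 / (m - 1)))               # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Hypothetical cases: [current ratio, quick ratio] coded from verbal protocols
cases = np.array([[2.1, 1.5], [1.9, 1.4], [0.8, 0.5], [0.7, 0.4], [1.2, 0.9]])
centers, U = fuzzy_cmeans(cases, c=2)
print(np.round(centers, 2))   # prototype "good" / "poor" liquidity profiles
print(np.round(U, 2))         # graded membership of each case in each cluster
```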

Performance analysis of Savonius Rotor for Wave Energy Conversion using CFD

  • ;최영도;김규한;이영호
    • 한국신재생에너지학회:학술대회논문집, 한국신재생에너지학회 2009년도 추계학술대회 논문집, pp.600-605, 2009
  • The general-purpose viscous flow solver Ansys CFX is used to study a Savonius-type wave energy converter in a 3D numerical viscous wave tank. This paper presents the results of a computational fluid dynamics (CFD) analysis of the effect of blade configuration on the performance of 3-bladed Savonius rotors for wave energy extraction. A piston-type wave generator was incorporated in the computational domain to generate the desired incident waves. A complete OWC system with a 3-bladed Savonius rotor was modeled in a three-dimensional numerical wave tank, and the hydrodynamic conversion efficiency was estimated (see the arithmetic sketch below). The flow over the rotors is assumed to be two-dimensional (2D), viscous, turbulent, and unsteady. The CFX code is used with a solver of the coupled conservation equations of mass, momentum, and energy, with an implicit time scheme and with the adoption of hexahedral meshes and moving-mesh techniques in areas of moving surfaces. Turbulence is modeled with the k-ε model. Simulations were carried out simultaneously for the rotor angle and the helical twist. The results indicate that the developed models are suitable for analyzing the water flows both in the chamber and in the turbine. For the turbine, the numerical results of torque were compared for all the cases.
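A rough arithmetic sketch of how a hydrodynamic conversion efficiency can be estimated from such a simulation: the time-averaged rotor power (torque × angular speed) is divided by the incident wave power over the chamber width, using the deep-water regular-wave energy flux formula. All numbers below are placeholders, not results from the paper.

```python
import math

RHO = 1025.0   # sea water density [kg/m^3]
G = 9.81       # gravitational acceleration [m/s^2]

def incident_wave_power(height, period, width):
    """Deep-water regular-wave energy flux [W] over a given chamber width."""
    return RHO * G**2 * height**2 * period / (32 * math.pi) * width

def rotor_power(torque, omega):
    """Mechanical power [W] extracted by the rotor (time-averaged torque x speed)."""
    return torque * omega

# Placeholder values: 1 m high, 5 s incident wave on a 2 m wide chamber,
# rotor averaging 40 N*m of torque at 2 rad/s (illustrative only)
p_wave = incident_wave_power(height=1.0, period=5.0, width=2.0)
p_rotor = rotor_power(torque=40.0, omega=2.0)
print(f"conversion efficiency ~ {p_rotor / p_wave:.2%}")
```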

상품 평가 텍스트에 암시된 사용자 관점 추출 (Extracting Implicit Customer Viewpoints from Product Review Text)

  • 장경록;이강욱;맹성현
    • 한국정보과학회 언어공학연구회:학술대회논문집(한글 및 한국어 정보처리), 한국정보과학회언어공학연구회 2013년도 제25회 한글 및 한국어 정보처리 학술대회, pp.53-58, 2013
  • Online consumers express opinions about products by leaving product reviews on online store platforms such as amazon.com. Such product reviews are a very important information source, given that they strongly influence the purchase decisions of other consumers. Most previous work in sentiment analysis, the field that seeks to automatically extract and analyze the opinion information people leave behind, has aimed to determine whether the opinions expressed at the document level, or more finely at the level of product aspects, carry positive or negative sentiment. While knowing whether consumers judged a product or its aspects positively or negatively is certainly useful, this study instead focuses on automatically extracting from 'which viewpoint' consumers evaluated a product or its aspects. Noting that one of the characteristic properties of an adjective is to assign a value to an attribute of the noun it modifies, we extract the attribute of the modified noun, using WordNet for this purpose (a minimal sketch follows below). To verify the effectiveness of the proposed method, we conducted an experiment with three human judges, and the results show that this research direction is promising enough to open new possibilities for sentiment analysis.
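A minimal sketch of the core idea, assuming English review text and NLTK's WordNet interface: for an adjective modifying a product noun, WordNet's adjective-to-attribute links suggest which attribute (viewpoint) the reviewer is evaluating. The adjectives and output are illustrative; the paper's exact lookup procedure may differ.

```python
# pip install nltk; then run nltk.download('wordnet') once
from nltk.corpus import wordnet as wn

def viewpoints(adjective):
    """Noun attributes that WordNet links to the senses of an adjective."""
    attrs = set()
    for syn in wn.synsets(adjective, pos=wn.ADJ):
        for attr in syn.attributes():        # adjective -> attribute-noun links
            attrs.update(lemma.name() for lemma in attr.lemmas())
    return sorted(attrs)

# "The camera is heavy but cheap" -> which viewpoints are being evaluated?
for adj in ["heavy", "cheap"]:
    print(adj, "->", viewpoints(adj))
# e.g. heavy -> ['weight', ...], cheap -> ['expensiveness', ...] (WordNet-dependent)
```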

텍스트 내 사건-공간 표현 간 참조 관계 분석을 위한 말뭉치 주석 (Corpus Annotation for the Linguistic Analysis of Reference Relations between Event and Spatial Expressions in Text)

  • 정진우;이희진;박종철
    • 한국언어정보학회지:언어와정보, Vol. 18, No. 2, pp.141-168, 2014
  • Recognizing the spatial information associated with events expressed in natural language text is essential not only for the interpretation of such events but also for the understanding of the relations among them. However, spatial information is rarely mentioned compared to events, and the association between event and spatial expressions is also highly implicit in a text, which makes it difficult to automate the extraction of the spatial information associated with events. In this paper, we give a linguistic analysis of how spatial expressions are associated with event expressions in a text. We first present issues in annotating narrative texts with reference relations between event and spatial expressions, and then discuss surface-level linguistic characteristics of such relations based on the annotated corpus to provide helpful insight into developing an automated recognition method.

Semantic Trajectory Based Behavior Generation for Groups Identification

  • Cao, Yang;Cai, Zhi;Xue, Fei;Li, Tong;Ding, Zhiming
    • KSII Transactions on Internet and Information Systems (TIIS), Vol. 12, No. 12, pp.5782-5799, 2018
  • With the development of GPS and the popularity of mobile devices with positioning capability, collecting massive amounts of trajectory data has become feasible and easy. The daily trajectories of moving objects convey a concise overview of their behaviors, and different social roles exhibit different trajectory patterns. Therefore, we can identify users or groups with similar trajectory patterns by mining implicit life patterns. However, most existing studies of daily trajectory mining focus mainly on the spatial and temporal analysis of raw trajectory data and miss the essential semantic information or behaviors. In this paper, we propose a novel trajectory semantics calculation method to identify groups that have similar behaviors. In our model, we first propose a fast and efficient approach for extracting stay regions from daily trajectories (a minimal sketch follows below), then generate semantic trajectories by enriching the stay regions with semantic labels. To measure the similarity between semantic trajectories, we design a semantic similarity measure model based on spatial and temporal similarity factors. Furthermore, a pruning strategy is proposed to reduce tedious calculations and comparisons. We have conducted extensive experiments on the real trajectory dataset of the GeoLife project, and the experimental results show that our proposed method is both effective and efficient.
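A minimal sketch of stay-region extraction as it is commonly done for GPS trajectories: a stay point is declared when the user lingers within a distance threshold for longer than a time threshold, and the stay is summarized by the mean coordinates. The thresholds and the haversine-based distance are assumptions; the paper's exact procedure may differ.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon, ...) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p[:2], *q[:2]))
    a = math.sin((lat2 - lat1) / 2) ** 2 + \
        math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * math.asin(math.sqrt(a))

def stay_points(track, dist_m=200, min_stay_s=20 * 60):
    """Extract stay points from a track of (lat, lon, unix_time) tuples."""
    stays, i = [], 0
    while i < len(track):
        j = i + 1
        while j < len(track) and haversine_m(track[i], track[j]) <= dist_m:
            j += 1
        if track[j - 1][2] - track[i][2] >= min_stay_s:
            pts = track[i:j]
            stays.append((sum(p[0] for p in pts) / len(pts),   # mean latitude
                          sum(p[1] for p in pts) / len(pts),   # mean longitude
                          track[i][2], track[j - 1][2]))       # arrival, departure
        i = j
    return stays

# Three GPS fixes lingering near one spot for 30 minutes, then moving away
track = [(39.9841, 116.3180, 0), (39.9843, 116.3182, 900),
         (39.9842, 116.3181, 1800), (39.9990, 116.3300, 2100)]
print(stay_points(track))   # one stay region, ready for semantic labeling
```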

도로 네트워크 환경에서 사용자의 이동 성향을 고려한 경로 생성 기법 (A Path Planning Scheme Using the Moving Tendencies of Users in Road Network Environments)

  • 황동교;박혁;김동주;리하;박준호;박용훈;복경수;유재수
    • 한국콘텐츠학회논문지, Vol. 12, No. 9, pp.16-26
    • 2012
  • 사용자들은 시간, 거리, 도로 혼잡도와 같은 속성들에 의해 선호하는 경로가 있기 때문에 사용자의 이동 성향에 맞는 경로를 생성하는 기법들이 필요하다. 기존의 기법들은 이동 성향을 고려하여 경로를 생성하기 위해서는 사용자의 이동 성향 정보를 추가적으로 입력하여야 한다. 그러나 네비게이션 및 모바일 장치의 불편한 인터페이스 특성상 이러한 정보 입력은 거의 하지 않고 출발지와 목적지만을 입력하여 경로를 추천받는 경향이 있다. 본 논문에서는 추가적인 이동 성향에 대한 정보 입력 없이 이동 성향에 맞는 경로를 생성하는 기법을 제안한다. 성능평가를 통해 최소 시간 경로나 최단 거리 경로와 비교하여 제안하는 기법이 사용자의 이동 성향을 고려한 경로가 생성됨을 입증한다.