• Title/Summary/Keyword: String matching

Search Result 102, Processing Time 0.023 seconds

The Reference Identifier Matching System for Developing Reference Linking Service (참조연계 서비스 구현을 위한 참고문헌 식별자 매칭 시스템)

  • Lee, Yong-Sik;Lee, Sang-Gi
    • Journal of Information Management
    • /
    • v.41 no.3
    • /
    • pp.191-209
    • /
    • 2010
  • A reference linking service, which connects different information resources to one another, requires building a reference database and matching identifiers. Many overseas agencies, such as CrossRef, PubMed, and Web of Science, have developed reference linking services using automatic tools such as Inera eXtyles and Parity Computing's Reference Extractor, built on identifiers such as DOI and PMID. In Korea, various agencies such as KISTI (Korea Institute of Science and Technology Information) and KRF (Korea Research Foundation) are constructing reference databases. However, each research community adopts its own reference format, which makes building such databases difficult. In this paper, we developed the Citation Matcher System. This system automatically parses reference strings into metadata and matches them to DOI, PMID, and KOI identifiers. It improves the effectiveness of reference database construction.

Efficient Regular Expression Matching Using FPGA (FPGA를 이용한 효율적 정규표현매칭)

  • Lee, Jang-Haeng;Lee, Seong-Won;Park, Neung-Soo
    • The KIPS Transactions:PartC
    • /
    • v.16C no.5
    • /
    • pp.583-588
    • /
    • 2009
  • A network intrusion detection system (NIDS) monitors all incoming packets in the network and detects packets that are malicious to the internal system. The NIDS should also be able to update its detection rules because new attack patterns are unpredictable. Incorporating FPGAs into the NIDS is one of the best solutions, providing both high performance and high flexibility compared with other approaches such as software solutions. In this paper we propose and design a novel approach, a prefix-sharing parallel pattern matcher, that not only minimizes additional resources but also maximizes processing performance. Experimental results showed that the throughput for 16-bit input is twice that for 8-bit input, while the LEs/Char used in the FPGA increases by a factor of only 1.07.

Intracorporeal Esophagojejunostomy Using a Circular or a Linear Stapler in Totally Laparoscopic Total Gastrectomy: a Propensity-Matched Analysis

  • Kang, So Hyun;Cho, Yo-Seok;Min, Sa-Hong;Park, Young Suk;Ahn, Sang-Hoon;Park, Do Joong;Kim, Hyung-Ho
    • Journal of Gastric Cancer
    • /
    • v.19 no.2
    • /
    • pp.193-201
    • /
    • 2019
  • Purpose: There is no consensus on the optimal method for intracorporeal esophagojejunostomy (EJ) in laparoscopic total gastrectomy (LTG). This study aims to compare 2 established methods of EJ anastomosis in LTG. Materials and Methods: A total of 314 patients diagnosed with gastric cancer who underwent LTG in the period from January 2013 to October 2016 were enrolled in the study. In 254 patients, the circular stapler with purse-string "Lap-Jack" method was used, and in the other 60 patients the linear stapling method was used for EJ anastomosis. After propensity score matching, 58 patients from each group were matched 1:1, and retrospective data for patient characteristics, surgical outcome, and post-operative complications were reviewed. Results: The 2 groups showed no significant difference in age, body mass index, or other clinicopathological characteristics. After propensity score matching analysis, the linear group had a shorter operating time than the circular group ($200.3{\pm}62.0$ vs. $244.0{\pm}65.5$, $P{\leq}0.001$). Early postoperative complications in the circular and linear groups occurred in 12 (20.7%) and 15 (25.9%, P=0.660) patients, respectively. EJ leakage occurred in 3 (5.2%) patients from each group, with 1 patient from each group needing intervention of Clavien-Dindo grade III or more. Late complications were observed in 3 (5.1%) patients from the linear group only, including 1 EJ anastomosis stricture, but the difference was not statistically significant. Conclusions: Both circular and linear stapling techniques are feasible and safe in performing intracorporeal EJ anastomosis during LTG. The linear group had a shorter operative time, but there was no difference in anastomosis complications.

Analysis of Molecular Pathways in Pancreatic Ductal Adenocarcinomas with a Bioinformatics Approach

  • Wang, Yan;Li, Yan
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.16 no.6
    • /
    • pp.2561-2567
    • /
    • 2015
  • Pancreatic ductal adenocarcinoma (PDAC) is a leading cause of cancer death worldwide. Our study aimed to reveal its molecular mechanisms. Microarray data of GSE15471 (including 39 matching pairs of pancreatic tumor tissues and patient-matched normal tissues) were downloaded from the Gene Expression Omnibus (GEO) database. We identified differentially expressed genes (DEGs) in PDAC tissues compared with normal tissues using the limma package in R. GO and KEGG pathway enrichment analyses were then conducted with the online tool DAVID. In addition, principal component analysis was performed and a protein-protein interaction (PPI) network was constructed with the STRING database to study relationships between the DEGs. A total of 532 DEGs were identified in the 38 PDAC tissues compared with 33 normal tissues. Principal component analysis of the top 20 DEGs could directly differentiate the PDAC tissues from normal tissues. In the PPI network, 8 of the 20 DEGs were key genes of the collagen family. Additionally, FN1 (fibronectin 1) was a hub node in the network. The genes of the collagen family as well as FN1 were significantly enriched in the complement and coagulation cascades, ECM-receptor interaction, and focal adhesion pathways. Our results suggest that genes of the collagen family and FN1 may play an important role in PDAC progression. Meanwhile, these DEGs and enriched pathways, such as complement and coagulation cascades, ECM-receptor interaction, and focal adhesion, may be important molecular mechanisms involved in the development and progression of PDAC.

Fast, Flexible Text Search Using Genomic Short-Read Mapping Model

  • Kim, Sung-Hwan;Cho, Hwan-Gue
    • ETRI Journal
    • /
    • v.38 no.3
    • /
    • pp.518-528
    • /
    • 2016
  • Searching an extensive document database for documents that are locally similar to a given query document, and subsequently detecting the similar regions between such documents, is considered an essential task in the fields of information retrieval and data management. In this paper, we present a framework for this task. The proposed framework employs the method of short-read mapping, which is used in bioinformatics to reveal similarities between genomic sequences. Documents are treated as biological objects; consequently, edit operations between locally similar documents are viewed as an evolutionary process. Accordingly, we are able to apply the method of evolution tracing to the detection of similar regions between documents. In addition, we propose heuristic methods to address issues associated with the different stages of the proposed framework, for example, a frequency-based fragment ordering method and a locality-aware interval aggregation method. Extensive experiments covering various search scenarios were conducted, and the results indicate that the proposed framework outperforms existing methods.

Similarity Measure based on XML Document's Structure and Contents (XML 문서의 구조와 내용을 고려한 유사도 측정)

  • Kim, Woo-Saeng
    • Journal of Korea Multimedia Society
    • /
    • v.11 no.8
    • /
    • pp.1043-1050
    • /
    • 2008
  • XML has become a standard for data representation and exchange on the Internet. With a large number of XML documents on the Web, there is an increasing need to automatically process these structurally rich documents for information retrieval, document management, and data mining applications. In this paper, we propose a new method to measure the similarity between XML documents by considering their structures and contents. The structural similarity of two documents is found by a simple string matching technique, and the content similarity is found using weights that take into account the names and positions of elements. The overall algorithm runs in time linear in the combined size of the two documents being compared.

Parallel Computation for Extended Edit Distances Using the Shared Memory on GPU (GPU의 공유메모리를 활용한 확장편집거리 병렬계산)

  • Kim, Youngho;Na, Joong Chae;Sim, Jeong Seop
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.4 no.7
    • /
    • pp.213-218
    • /
    • 2015
  • Given two strings X and Y (|X|=m, |Y|=n) over an alphabet ${\Sigma}$, the extended edit distance between X and Y can be computed using dynamic programming in O(mn) time and space. Recently, a parallel algorithm was presented that computes the extended edit distance between X and Y in O(m+n) time and O(mn) space using m threads. In this paper, we present an improved parallel algorithm that uses shared memory on the GPU. The experimental results show that our parallel algorithm runs about 19 to 25 times faster than the previous parallel algorithm.
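
The O(mn) dynamic program mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not define the operations of the extended edit distance, so the code below assumes the plain Levenshtein distance (insertion, deletion, substitution, each of cost 1); the function name `edit_distance` is hypothetical and this is not the paper's GPU algorithm.

```python
def edit_distance(x: str, y: str) -> int:
    """Plain Levenshtein distance via an (m+1) x (n+1) table.

    Assumption: unit-cost insert/delete/substitute only; the paper's
    "extended" operations are not specified in the abstract.
    """
    m, n = len(x), len(y)
    # dp[i][j] = distance between the prefixes x[:i] and y[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters of x[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[m][n]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

In this table, each cell depends only on its left, upper, and upper-left neighbors, so all cells on one anti-diagonal (constant i+j) are independent of each other. A wavefront schedule over the m+n-1 anti-diagonals is a common way to obtain O(m+n) parallel time with m threads; whether the cited algorithm uses exactly this schedule is not stated in the abstract.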

Ontology-Based Information Retrieval for Cultural Assets Information (문화재 정보의 온톨로지 기반 검색시스템)

  • Baek Seung-Jae;Cheon Hyeon-Jae;Lee Hong-Chul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.3 s.35
    • /
    • pp.229-236
    • /
    • 2005
  • The Semantic Web enables machines to achieve effective retrieval, integration, and reuse of web resources. The keyword search method currently in use is limited in the accuracy of its results because it relies on simple string matching in the web environment. This paper proposes an ontology-based information retrieval system that addresses this problem and retrieves better search results through semantic relations. In this system, we implemented the Cultural Assets Ontology based on OWL, using RDQL and the Jena API. We also suggest a method to handle properties stored in a database.

Memory-Efficient High Performance Parallelization of Aho-Corasick Algorithm on Intel Xeon Phi (Intel Xeon Phi 에서의 Aho-Corasick 알고리즘을 위한 메모리 친화적인 고성능 병렬화)

  • Tran, Nhat-Phuong;Jeong, Yosang;Lee, Myungho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2014.04a
    • /
    • pp.87-89
    • /
    • 2014
  • The Aho-Corasick (AC) algorithm is a multi-pattern string matching algorithm commonly used in many applications with real-time performance requirements. In this paper, we parallelize the AC algorithm on Intel's Many Integrated Core (MIC) Architecture, the Xeon Phi Coprocessor. We propose a new technique to compress the Deterministic Finite Automaton structure that represents the set of pattern strings against which the input data is inspected for possible matches. The new technique reduces cache misses and leads to significantly improved performance on Xeon Phi.
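
For readers unfamiliar with the automaton this abstract refers to, the following is a minimal serial sketch of Aho-Corasick matching. It uses the textbook goto/failure-link formulation with plain Python dictionaries, not the compressed DFA or the Xeon Phi parallelization proposed in the paper; the names `build_automaton` and `search` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output lists."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognized on entering each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first traversal: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state.
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    s = 0
    matches = []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            matches.append((i - len(p) + 1, p))
    return matches


if __name__ == "__main__":
    print(search("ushers", ["he", "she", "his", "hers"]))
```

The dictionary-per-state layout is chosen for readability. A performance-oriented implementation such as the one the abstract describes would typically store the automaton in flat arrays, which is where compression and cache behavior become relevant.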

A Study on the Prediction of Drug Efficacy by Using Molecular Structure (분자구조 유사도를 활용한 약물 효능 예측 알고리즘 연구)

  • Jeong, Hwayoung;Song, Changhyeon;Cho, Hyeyoun;Key, Jaehong
    • Journal of Biomedical Engineering Research
    • /
    • v.43 no.4
    • /
    • pp.230-240
    • /
    • 2022
  • Drug regeneration technology, which uses drugs that have already been proven safe, is a more efficient strategy than the existing new drug development process, which requires large costs and time. In this study, recognizing the importance of drug regeneration in new drug development, we investigate predicting functional similarities from the basic molecular structure of drugs. We test four string-based algorithms on SMILES data to measure structural similarity. Using ATC codes, we then pair the drugs by functional similarity, which we compare and validate to select the optimal model. We confirmed that the higher the molecular structure similarity, the higher the ATC code matching rate. We suggest that additional potency of random drugs can be predicted from data on drugs with high molecular similarity. This model combines well with additional data, so we look forward to using it in future research.
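
The abstract does not name its four string-based algorithms, so the sketch below only illustrates the general idea of scoring two SMILES strings by character-level similarity. It assumes Python's standard `difflib.SequenceMatcher` ratio as a stand-in measure; the function name `smiles_similarity` and the example molecules are illustrative choices, and ATC code matching is not modeled.

```python
from difflib import SequenceMatcher


def smiles_similarity(a: str, b: str) -> float:
    """Character-level similarity of two SMILES strings in [0, 1].

    Assumption: a generic sequence-matching ratio, not one of the
    four algorithms evaluated in the paper.
    """
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    ethanol = "CCO"
    propanol = "CCCO"
    benzene = "c1ccccc1"
    print(smiles_similarity(ethanol, propanol))
    print(smiles_similarity(ethanol, benzene))
```

A purely textual comparison like this depends on how each SMILES string is written, since the same molecule can have several valid SMILES forms. Canonicalizing the strings first would be a reasonable precaution, though the abstract does not say whether the authors did so.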