• Title/Abstract/Keywords: Text Databases

192 search results (processing time: 0.025 seconds)

Bioinformatics and Genomic Medicine

  • 김주한
    • Journal of Preventive Medicine and Public Health / Vol. 35, No. 2 / pp.83-91 / 2002
  • Bioinformatics is a rapidly emerging field of biomedical research. A flood of large-scale genomic and postgenomic data means that many of the challenges in biomedical research are now challenges in computational sciences. Clinical informatics has long developed methodologies to improve biomedical research and clinical care by integrating experimental and clinical information systems. The informatics revolutions both in bioinformatics and clinical informatics will eventually change the current practice of medicine, including diagnostics, therapeutics, and prognostics. Postgenome informatics, powered by high throughput technologies and genomic-scale databases, is likely to transform our biomedical understanding forever much the same way that biochemistry did a generation ago. The paper describes how these technologies will impact biomedical research and clinical care, emphasizing recent advances in biochip-based functional genomics and proteomics. Basic data preprocessing with normalization, primary pattern analysis, and machine learning algorithms will be presented. Use of integrated biochip informatics technologies, text mining of factual and literature databases, and integrated management of biomolecular databases will be discussed. Each step will be given with real examples in the context of clinical relevance. Issues of linking molecular genotype and clinical phenotype information will be discussed.

Information Searching on STN Web (STN Easy & ChemPort)

  • 유선희
    • Journal of Information Management / Vol. 30, No. 1 / pp.11-28 / 1999
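  • STN (The Scientific & Technical Information Network) is a commercial online databank for searching some 200 databases covering science and technology, industry, and patents. This paper examines the features and functions of STN Easy (http://stneasy.cas.org), which allows convenient web-based searching of the 59 most heavily used STN databases and can output not only bibliographic data and abstracts but also patent drawings and full-text business articles, as well as display chemical substances in three dimensions. It also introduces ChemPort (http://www.chemfort.org), which links the search results to the websites of nine publishers in total, including the ACS (American Chemical Society), so that the electronic full text can be obtained.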

Implementation and Evaluation of an Integrated Viewer for Displaying Text and TIFF Image Materials on the Internet

  • 최흥식
    • Journal of the Korean Society for Information Management / Vol. 17, No. 1 / pp.67-87 / 2000
  • The purpose of the study is to develop an integrated viewer that can display both text and image files on the Internet. Until now, most viewers for full-text databases could display documents only as images or graphics. The newly developed system converts document files from commercial word processors and other applications (e.g., 한글™, Word™, Excel™, PowerPoint™, HunminJungum™, Arirang™, CAD™), as well as conventional TIFF image files, into the DVI (DeVice Independent) file format, compresses them to a smaller size, and displays them on the computer screen. The IDoc Viewer was evaluated by a user group consisting of 5 system developers, 5 librarians, and 10 end users. It was rated good or excellent on 20 of the 26 checklist items.

Design and Implementation of Web Crawler with Real-Time Keyword Extraction based on the RAKE Algorithm

  • Zhang, Fei;Jang, Sunggyun;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference / Korea Information Processing Society 2017 Fall Conference / pp.395-398 / 2017
  • We propose a web crawler system with a keyword extraction function. Existing research on keyword extraction in text mining mostly relies on databases of documents or corpora that have already been collected, whereas this paper aims to build a real-time keyword extraction system that extracts keywords from a web page's text while crawling it and stores them in the database together with the text. We design and implement a crawler that incorporates the RAKE keyword extraction algorithm. The performance of the RAKE algorithm is improved by increasing the weight of important features (such as nouns appearing in the title). The experimental results show that this method outperforms the existing method and extracts keywords satisfactorily.
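To make the RAKE idea in the abstract above concrete, the following is a minimal sketch, not the authors' implementation. It splits text into candidate phrases at stopwords and punctuation, scores each word by degree/frequency, and sums word scores per phrase. The stopword list, the `title_boost` parameter, and the choice to boost every title word (rather than only nouns, which would need a part-of-speech tagger) are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real RAKE uses a larger one).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is",
             "are", "with", "by", "that", "this", "it", "as", "from", "we"}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, title="", title_boost=2.0, top_k=5):
    """Return the top_k (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs
    degree = defaultdict(int)  # co-occurrence degree, counting the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    # Hypothetical title weighting: multiply the score of words seen in the title.
    title_words = {w for p in candidate_phrases(title) for w in p}

    def word_score(word):
        score = degree[word] / freq[word]
        return score * title_boost if word in title_words else score

    scores = {" ".join(p): sum(word_score(w) for w in p) for p in set(phrases)}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    page = "Web crawler extracts keywords. The crawler stores keywords in a database."
    for phrase, score in rake(page, title="Web crawler"):
        print(f"{score:6.2f}  {phrase}")
```

In a crawler, `rake` would be called on each page as it is fetched, so that the keywords can be stored alongside the page text in the same write.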

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Genomics & Informatics
    • /
    • Vol. 2, No. 2
    • /
    • pp.99-106
    • /
    • 2004
  • In this paper we introduce PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes and extracts the interactions described in the document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity, collected from the related public databases, to infer more interactions from the original ones. Both the inferred interactions and the native interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The inference was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.

Research Trends on Literature Reviews in Scopus Journals by Authors from Indonesia, Japan, South Korea, Vietnam, Singapore, and Malaysia: A Bibliometric Analysis from 2003 to 2022

  • Prakoso Bhairawa Putera;Amelya Gustina
    • Asian Journal of Innovation and Policy
    • /
    • Vol. 12, No. 3
    • /
    • pp.304-322
    • /
    • 2023
  • Text data mining ('big data methods') was one of the most widely used approaches during the COVID-19 pandemic, in particular text data mining on the Scopus and Web of Science (WoS) databases. Text data mining is widely used to collect literature for later bibliometric analysis, which in the end becomes a literature review article. In this article, we therefore reveal the publication trend of literature reviews in Scopus journals by authors from Indonesia, Japan, South Korea, Vietnam, Singapore, and Malaysia. The article has two main parts: 1) a comparison of international publication trends and subject areas of literature review publications, and 2) a comparison of the top five authors, affiliations, source titles, and collaborating countries.

Enhancing the Text Mining Process by Implementation of Average-Stochastic Gradient Descent Weight Dropped Long-Short Memory

  • Annaluri, Sreenivasa Rao;Attili, Venkata Ramana
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 7
    • /
    • pp.352-358
    • /
    • 2022
  • Text mining is an important process for analyzing data collected from different sources such as videos, audio, and social media. Tools such as Natural Language Processing (NLP) are mostly used in real-time applications. In earlier research, text mining approaches were implemented using long short-term memory (LSTM) networks. In this paper, text mining is performed using average-stochastic gradient descent weight-dropped (AWD)-LSTM techniques to obtain better accuracy and performance. The proposed model is demonstrated on Internet Movie Database (IMDB) reviews. The model was implemented in Python because of its adaptability and flexibility when dealing with massive data sets and databases. The proposed model (LSTM plus weight dropping plus embedding) achieved an accuracy of 88.36%, compared with 85.64% for the previous AWD-LSTM model and 85.16% for the plain LSTM model. Finally, the loss decreased from 0.341 to 0.299 with the proposed model.

Inferring Disease-related Genes using Title and Body in Biomedical Text

  • 김정우;김현진;여윤구;신민철;박상현
    • KIISE Transactions on Computing Practices / Vol. 23, No. 1 / pp.28-36 / 2017
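  • Since the Genome Project of the 1990s, a great deal of gene-related research has been carried out. With advances in data storage technology, the results of this research are recorded in a large body of literature, which serves as useful data for inferring new biological relationships. For this reason, this study proposes a methodology for inferring disease-related genes from biomedical literature. Each document is divided into title and body, and the genes that appear in each region are extracted. Genes extracted from the title are classified as central genes, and genes extracted from the body are classified as peripheral genes that are related to the genes extracted from the title. Applying this process to each document yields a local gene network. All of the local gene networks are then connected to build a global gene network. Disease-related genes were inferred by analyzing the constructed network, and comparative experiments demonstrated that the proposed methodology is useful for inferring disease-related genes.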
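The title/body construction described in this abstract can be sketched as follows. This is an illustrative reading, not the authors' code: gene mentions are found by simple lookup in a hypothetical `gene_lexicon` set, each document contributes a star-shaped local network (title genes linked to body genes), and the global network is ranked by node degree. The abstract does not say which network analysis was used, so degree ranking is an assumption.

```python
import re
from collections import defaultdict


def local_network(title, body, gene_lexicon):
    """Edges (central, peripheral) for one document.

    Genes found in the title are central; genes found only in the body are
    peripheral and are linked to every central gene of that document.
    """
    def genes_in(text):
        tokens = set(re.findall(r"[A-Za-z0-9]+", text.upper()))
        return tokens & gene_lexicon

    central = genes_in(title)
    peripheral = genes_in(body) - central
    return {(c, p) for c in central for p in peripheral}


def global_network(documents, gene_lexicon):
    """Merge all local networks into one undirected adjacency map."""
    adjacency = defaultdict(set)
    for title, body in documents:
        for central, peripheral in local_network(title, body, gene_lexicon):
            adjacency[central].add(peripheral)
            adjacency[peripheral].add(central)
    return adjacency


def rank_genes(adjacency):
    """Rank genes by degree (an assumed stand-in for the paper's analysis)."""
    return sorted(adjacency, key=lambda g: (-len(adjacency[g]), g))


if __name__ == "__main__":
    lexicon = {"BRCA1", "TP53", "ATM", "MDM2"}  # toy lexicon, upper-case symbols
    docs = [
        ("BRCA1 mutations in breast cancer", "BRCA1 interacts with TP53 and ATM."),
        ("TP53 and tumor suppression", "TP53 regulates MDM2; BRCA1 is also involved."),
    ]
    print(rank_genes(global_network(docs, lexicon)))
```

Keeping the local networks as plain edge sets makes the merge step a simple union, which is why the global network needs no separate deduplication pass.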

A Study on Radiological Image Retrieval System

  • 박병래;신용원
    • Journal of Radiological Science and Technology / Vol. 28, No. 1 / pp.19-24 / 2005
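  • We design and implement an annotation-based radiological image retrieval system that is useful for training radiologic technologists and for making accurate judgments about image information, and we propose an indexing scheme that uses a $B^+$-tree and an inverted file for efficient retrieval on the simple attribute information of radiological images and on important keywords extracted from the supplementary textual descriptions. The system was implemented in Delphi on Windows XP. Radiologic technologists store the attribute information, supplementary descriptions, and image data for radiological images, and the resulting image database can be searched using attribute information and text keywords. By retrieving simple attribute information and textual descriptions, clinical radiologic technologists can benefit from systematic on-site training and from structured knowledge, which shortens training time and supports accurate judgments about radiological images. The implemented system should later be extended into an integrated imaging system that includes general radiography and special contrast studies, and, with the addition of a web-based service, it is expected to serve as a base technology that can develop into a decision support system.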
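The two-index design mentioned in this abstract can be illustrated with a small sketch. It is not the paper's Delphi implementation: the class name `ImageIndex`, the use of an exam date as the indexed attribute, and the record layout are all hypothetical. The inverted file maps each annotation keyword to the set of image IDs containing it. In place of a real $B^+$-tree, the sketch keeps a sorted list searched with `bisect`, which gives the same ordered range lookups in memory but none of the disk-page behavior of a $B^+$-tree.

```python
import bisect
import re
from collections import defaultdict


class ImageIndex:
    """Toy annotation index: inverted file for keywords, sorted list for an attribute."""

    def __init__(self):
        self.records = {}                  # image_id -> (exam_date, description)
        self.inverted = defaultdict(set)   # keyword -> set of image ids
        self.attr_keys = []                # sorted (exam_date, image_id) pairs

    def add(self, image_id, exam_date, description):
        """Store one image annotation; exam_date is an ISO 'YYYY-MM-DD' string."""
        self.records[image_id] = (exam_date, description)
        bisect.insort(self.attr_keys, (exam_date, image_id))
        for keyword in set(re.findall(r"\w+", description.lower())):
            self.inverted[keyword].add(image_id)

    def search_keywords(self, *keywords):
        """Image ids whose description contains every given keyword."""
        postings = [self.inverted.get(k.lower(), set()) for k in keywords]
        return sorted(set.intersection(*postings)) if postings else []

    def search_date_range(self, start, end):
        """Image ids with start <= exam_date <= end, in date order."""
        lo = bisect.bisect_left(self.attr_keys, (start, ""))
        hi = bisect.bisect_right(self.attr_keys, (end, "\uffff"))
        return [image_id for _, image_id in self.attr_keys[lo:hi]]


if __name__ == "__main__":
    index = ImageIndex()
    index.add("img1", "2005-01-10", "Chest PA view, left lung nodule")
    index.add("img2", "2005-02-03", "Chest lateral view")
    index.add("img3", "2005-03-15", "Skull AP view")
    print(index.search_keywords("chest", "view"))
    print(index.search_date_range("2005-02-01", "2005-03-31"))
```

Separating the two indexes lets attribute queries and keyword queries be answered independently and then intersected when a search uses both.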

Analysis of Adverse Drug Reaction Reports using Text Mining

  • 김현희;유기연
    • Korean Journal of Clinical Pharmacy / Vol. 27, No. 4 / pp.221-227 / 2017
  • Background: As the personalized healthcare industry attracts growing attention, big data analysis of healthcare data has become essential. Because much healthcare data, such as product labeling, biomedical literature, and social media data, is unstructured, extracting meaningful information from unstructured text is becoming important. In particular, text mining of adverse drug reaction (ADR) reports can provide signal information for predicting and detecting ADRs. No study had analyzed the expert opinion text in the Korea Adverse Event Reporting System (KAERS) database. Methods: Expert opinion text from the KAERS database provided by the Korea Institute of Drug Safety & Risk Management (KIDS-KD) was analyzed. Word frequency analysis was performed to understand the text as a whole, and TF-IDF weight analysis was performed to find important keywords. Keywords related to the important keywords were identified by calculating correlation coefficients. Results: Of 90,522 reports in total, 120 insulin ADR reports and 858 tramadol ADR reports were analyzed. The most frequently used keywords, in rank order, were dizziness, headache, vomiting, dyspepsia, and shock in the insulin data, and vomiting, 어지러움 (Korean for "dizziness"), dizziness, dyspepsia, and constipation in the tramadol data. Conclusion: Using text mining of the expert opinion in KIDS-KD, frequently mentioned ADRs and medications can be readily identified. Text mining in ADR research can play an important role in detecting signal information and predicting ADRs.
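The TF-IDF weighting named in the Methods can be sketched in a few lines. This is a generic illustration, not the study's analysis pipeline: it assumes term frequency normalized by document length and an inverse document frequency of ln(N/df), and the sample reports are invented toy text, not KAERS data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphabetic tokens (a simplification; no stemming or stopwords)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(reports):
    """Return one {term: weight} dict per report.

    weight = (count / report length) * ln(N / number of reports containing term)
    """
    docs = [Counter(tokenize(report)) for report in reports]
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())

    weights = []
    for doc in docs:
        total = sum(doc.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in doc.items()
        })
    return weights


def top_terms(reports, k=3):
    """The k highest-weighted terms of each report, ties broken alphabetically."""
    return [
        [term for term, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for w in tf_idf(reports)
    ]


if __name__ == "__main__":
    toy_reports = [  # invented examples for illustration only
        "dizziness after insulin injection",
        "insulin caused headache and dizziness",
        "vomiting after tramadol",
    ]
    for terms in top_terms(toy_reports):
        print(terms)
```

A term that appears in every report gets a weight of zero under this formula, which is what lets TF-IDF surface distinctive ADR terms that a plain word-frequency count would rank below common words.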