Automatic Text Categorization using the Importance of Sentences (문장 중요도를 이용한 자동 문서 범주화)

  • Ko, Young-Joong;Park, Jin-Woo;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.29 no.6 / pp.417-424 / 2002
  • Automatic text categorization is the problem of assigning predefined categories to free-text documents. To classify text documents, we have to extract good features from them. In previous research, a text document is commonly represented by the frequency of each feature. However, sentences in a document differ in importance, and this difference affects the importance of the features they contain. In this paper, we measure the importance of sentences in a text document using text summarization techniques. A text document is then represented by features weighted according to the importance of the sentences in which they appear. To verify the new method, we constructed a Korean newsgroup data set and evaluated our method on it. We found that the new method gave a significant improvement over a baseline system on our data sets.
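
The idea of weighting features by sentence importance can be illustrated with a minimal sketch. The abstract does not say which summarization technique scores the sentences, so the score used here (overlap with the document title) is only an assumption, and the names `sentence_importance` and `weighted_features` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_importance(sentence_tokens, title_tokens):
    """Assumed score: 1 plus the fraction of title words found in the sentence."""
    title_set = set(title_tokens)
    if not title_set:
        return 1.0
    overlap = len(title_set & set(sentence_tokens))
    return 1.0 + overlap / len(title_set)


def weighted_features(title, body):
    """Sum, for each term, the importance of every sentence it occurs in."""
    title_tokens = tokenize(title)
    weights = Counter()
    for sentence in split_sentences(body):
        tokens = tokenize(sentence)
        score = sentence_importance(tokens, title_tokens)
        for token in tokens:
            weights[token] += score
    return dict(weights)


features = weighted_features(
    "Stock market",
    "The stock market fell today. Weather was mild.",
)
```

With plain term frequency, every term in this toy document would count once; here the terms from the sentence that echoes the title receive a larger weight than the terms from the unrelated sentence.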

Extending TextAE for annotation of non-contiguous entities

  • Lever, Jake;Altman, Russ;Kim, Jin-Dong
    • Genomics & Informatics / v.18 no.2 / pp.15.1-15.6 / 2020
  • Named entity recognition tools are used to identify mentions of biomedical entities in free text and are essential components of high-quality information retrieval and extraction systems. Without good entity recognition, methods will mislabel searched text and will miss important information or identify spurious text that will frustrate users. Most tools do not capture non-contiguous entities, which are separate spans of text that together refer to an entity, e.g., the entity "type 1 diabetes" in the phrase "type 1 and type 2 diabetes." This type is commonly found in biomedical texts, especially in lists, where multiple biomedical entities are named in shortened form to avoid repeating words. Most text annotation systems that enable users to view and edit entity annotations do not support non-contiguous entities. Therefore, experts cannot even visualize non-contiguous entities, let alone annotate them to build valuable datasets for machine learning methods. To combat this problem and as part of the BLAH6 hackathon, we extended the TextAE platform to allow visualization and annotation of non-contiguous entities. This enables users to add new subspans to existing entities by selecting additional text. We integrate this new functionality with TextAE's existing editing functionality to allow easy changes to entity annotation and editing of relation annotations involving non-contiguous entities, with importing and exporting to the PubAnnotation format. Finally, we roughly quantify the problem across the entire accessible biomedical literature to highlight that there are a substantial number of non-contiguous entities that appear in lists that would be missed by most text mining systems.
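
A non-contiguous entity can be modelled as one annotation that owns several character spans. The sketch below is a hypothetical in-memory representation, not TextAE's internal model or the PubAnnotation schema; the class `Entity` and its methods are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Entity:
    """One entity annotation made of one or more (begin, end) character spans."""
    label: str
    spans: List[Tuple[int, int]] = field(default_factory=list)  # end is exclusive

    def add_subspan(self, begin: int, end: int) -> None:
        """Attach another span of text to the same entity."""
        self.spans.append((begin, end))

    def surface_text(self, text: str) -> str:
        """Join the covered fragments in document order."""
        return " ".join(text[b:e] for b, e in sorted(self.spans))


text = "type 1 and type 2 diabetes"

# "type 1" is separated from "diabetes" by "and type 2".
head = text.find("diabetes")
type1 = Entity("Disease", [(0, len("type 1"))])
type1.add_subspan(head, head + len("diabetes"))

# "type 2 diabetes" is an ordinary contiguous entity with a single span.
start2 = text.find("type 2")
type2 = Entity("Disease", [(start2, len(text))])
```

Here `type1.surface_text(text)` recovers "type 1 diabetes" even though those words are not adjacent in the sentence, which is what a single-span annotation cannot express.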

An International Comparative Study on Home Economics Text Books of Middle School (중학교 가정교과서의 국제비교 연구)

  • 차미경;윤인경
    • Journal of Korean Home Economics Education Association / v.3 no.1 / pp.113-129 / 1991
  • This study compared the outward aspects, objectives, and contents of middle school Home Economics textbooks in Korea, Japan, the U.S.A., and England. The results are summarized as follows. 1. Outward aspects of the textbooks: The Korean textbooks were small, and their paper quality was inferior to that of the foreign textbooks. The Japanese textbooks were written by many authors and contained many lab works and data. The U.S. textbooks were large, printed on good-quality paper, and contained many colour pictures. The English textbooks contained many problems and lab works. 2. Objectives of Home Economics and unit objectives: The objective of the subject of Home Economics was stated only in the Korean textbooks. The unit objectives were described most concretely and in most detail in the Korean textbooks compared with those of the other countries. 3. Contents: The Korean textbooks covered all six areas (foods, clothing, housing, home management, family, and occupation), and theoretical explanations prevailed. The Japanese textbooks contained numerous lab works and lacked the two areas of home management and occupation; the contents included a few practical lab works. The U.S. textbooks covered all six areas of Home Economics, with special emphasis on self-discovery and self-development; vocational guidance was also stressed. The English textbooks covered only three areas of Home Economics (clothing, foods, and housing); the number of areas was limited, but the basic theories of the covered areas were intended to lead to self-comprehension through questions and lab works.

Effects of Dopants Introduced into the Poly-Si on the Formation of Ti-Silicides (Poly-Si에 첨가한 도펀트가 Titanium Silicides 형성에 미치는 영향 Ⅱ)

  • Ryu, Yeon-Soo;Choi, Jin-Seog;Paek, Su-Hyon
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.2 / pp.73-80 / 1990
  • The formation of Ti-silicides as a function of the type of substrate, the species and concentration of dopant, and the annealing temperature was investigated using sheet resistance and thickness measurements, elemental depth profiling, and microstructure analysis. Silicide formation was directly affected by the type of substrate, the species and concentration of dopant, and the annealing temperature. For the amorphous Si substrate, the smoothness of the $TiSi_2/Si$ interface increased. Above a concentration of $1{\times}10^{16}$ ions/cm$^2$, the rate of $TiSi_2/Si$ formation decreased and the sheet resistance increased. The initial dopant profile, which depends on the implantation energy, was one of the factors influencing the out-diffusion of dopant. In the $POCl_3$ process, this was less than in the ion implantation process.

HTML Text Extraction Using Frequency Analysis (빈도 분석을 이용한 HTML 텍스트 추출)

  • Kim, Jin-Hwan;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.9 / pp.1135-1143 / 2021
  • Recently, text collection using a web crawler for big data analysis has been frequently performed. However, in order to collect only the necessary text from a web page that is complexly composed of numerous tags and texts, there is a cumbersome requirement to specify HTML tags and style attributes that contain the text required for big data analysis in the web crawler. In this paper, we proposed a method of extracting text using the frequency of text appearing in web pages without specifying HTML tags and style attributes. In the proposed method, the text was extracted from the DOM tree of all collected web pages, the frequency of appearance of the text was analyzed, and the main text was extracted by excluding the text with high frequency of appearance. Through this study, the superiority of the proposed method was verified.
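
The frequency-based approach described above can be sketched with the Python standard library. The abstract gives no thresholds or exact frequency definition, so the cut-off `max_page_ratio`, the choice to count frequency per page, and the helper names are assumptions; the sketch walks the parsed tag stream rather than a full DOM tree.

```python
from collections import Counter
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text blocks, ignoring script and style content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if text and self._skip_depth == 0:
            self.blocks.append(text)


def text_blocks(html):
    """Return the text blocks of one page in document order."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks


def extract_main_text(pages, max_page_ratio=0.5):
    """Drop blocks that appear on more than max_page_ratio of the pages."""
    per_page = [text_blocks(page) for page in pages]
    page_counts = Counter()
    for blocks in per_page:
        page_counts.update(set(blocks))  # count each block once per page
    limit = max_page_ratio * len(pages)
    return [[b for b in blocks if page_counts[b] <= limit] for blocks in per_page]


template = "<html><body><nav>Home | About</nav><p>{}</p><footer>Copyright</footer></body></html>"
pages = [template.format(f"Article {n} body.") for n in ("one", "two", "three")]
main_text = extract_main_text(pages)
```

In this toy crawl the navigation bar and footer repeat on every page and are filtered out, while each article paragraph occurs only once and is kept, with no tag names or style attributes specified.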

Enzymatic Synthesis of β-Glucosylglycerol and Its Unnatural Glycosides Via β-Glycosidase and Amylosucrase

  • Jung, Dong-Hyun;Seo, Dong-Ho;Park, Ji-Hae;Kim, Myo-Jung;Baek, Nam-In;Park, Cheon-Seok
    • Journal of Microbiology and Biotechnology / v.29 no.4 / pp.562-570 / 2019
  • ${\beta}$-Glucosylglycerol (${\beta}-GG$) and its derivatives have potential applications in food, cosmetics and the healthcare industry, including antitumor medications. In this study, ${\beta}-GG$ and its unnatural glycosides were synthesized through the transglycosylation reactions of two enzymes, Sulfolobus shibatae ${\beta}$-glycosidase (SSG) and Deinococcus geothermalis amylosucrase (DGAS). SSG catalyzed a transglycosylation reaction with glycerol as an acceptor and cellobiose as a donor to produce 56% of ${\beta}-GGs$ [${\beta}$-D-glucopyranosyl-($1{\rightarrow}1/3$)-D-glycerol and ${\beta}$-D-glucopyranosyl-($1{\rightarrow}2$)-D-glycerol]. In the second transglycosylation reaction, ${\beta}$-D-glucopyranosyl-($1{\rightarrow}1/3$)-D-glycerol was used as the acceptor molecule of the DGAS reaction. As a result, 61% of ${\alpha}$-D-glucopyranosyl-($1{\rightarrow}4$)-${\beta}$-D-glucopyranosyl-($1{\rightarrow}1/3$)-D-glycerol and 28% of ${\alpha}$-D-maltopyranosyl-($1{\rightarrow}4$)-${\beta}$-D-glucopyranosyl-($1{\rightarrow}1/3$)-D-glycerol were synthesized as unnatural glucosylglycerols. In conclusion, the combined enzymatic synthesis of the unnatural glycosides of ${\beta}-GG$ was established. The synthesis of these unnatural glycosides may provide an opportunity to discover new applications in the biotechnological industry.

A Tensor Space Model based Deep Neural Network for Automated Text Classification (자동문서분류를 위한 텐서공간모델 기반 심층 신경망)

  • Lim, Pu-reum;Kim, Han-joon
    • Database Research / v.34 no.3 / pp.3-13 / 2018
  • Text classification is a text mining technology that assigns a given textual document to its appropriate categories; it is used in various fields such as spam email detection, news classification, question answering, emotional analysis, and chatbots. In general, text classification systems use machine learning algorithms; among these, naïve Bayes and support vector machines, which are well suited to text data, are known to perform reasonably well. Recently, with the development of deep learning, several studies have applied deep neural networks such as recurrent neural networks (RNN) and convolutional neural networks (CNN) to improve the performance of text classification systems. However, current text classification techniques have not yet achieved perfect classification. This paper focuses on the fact that text data is expressed as a vector with only word dimensions, which impairs the semantic information inherent in the text, and proposes a neural network architecture based on the semantic tensor space model.
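
The contrast between a word-only vector and a higher-order representation can be made concrete with a small sketch. The abstract does not define the semantic tensor space model, so the term-by-concept matrix below is only one assumed way of adding a semantic dimension; the concept list and the `term_concepts` weights are invented for illustration.

```python
from collections import Counter


def bag_of_words(tokens, vocab):
    """First-order representation: one count per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def term_concept_matrix(tokens, vocab, concepts, term_concepts):
    """Second-order representation: term counts spread over concept columns."""
    counts = Counter(tokens)
    return [
        [counts[term] * term_concepts.get(term, {}).get(concept, 0.0)
         for concept in concepts]
        for term in vocab
    ]


vocab = ["bank", "loan", "river"]
concepts = ["finance", "geography"]
# Hypothetical term-to-concept association weights.
term_concepts = {
    "bank": {"finance": 0.5, "geography": 0.5},
    "loan": {"finance": 1.0},
    "river": {"geography": 1.0},
}

tokens = ["bank", "loan", "bank"]
vector = bag_of_words(tokens, vocab)
matrix = term_concept_matrix(tokens, vocab, concepts, term_concepts)
```

The vector records only how often each word occurs, whereas each row of the matrix also records which concepts the word is associated with, which is the kind of semantic information a word-only vector discards.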

Case Analysis of Bible Visualization based on Text Data Traits -Focused on Content, Structure, Quotation of Text- (텍스트 데이터의 특성에 따른 성경 시각화 사례 분석 -텍스트의 내용적, 구조적 특성 및 인용 정보를 중심으로-)

  • Kim, Hyoyoung;Park, Jin Wan
    • The Journal of the Korea Contents Association / v.13 no.8 / pp.83-92 / 2013
  • Text visualization begins with understanding the text itself, which is the material of visual expression. To visualize any text data, the characteristics of the text must first be sufficiently understood; the expressive approaches can then be decided according to the unique characteristics derived from the text. In this research, we aimed to establish a theoretical foundation for approaches to text visualization by examining diverse examples of text visualization derived from various characteristics of the text. To do this, we chose the text of the Bible, which is well known globally, whose digital data is easily accessible, and for which diverse visualization examples therefore exist, and we analyzed examples of Bible text visualization. We derived the unique characteristics of the text (content, structure, and quotation) as criteria for analysis, and supported the validity of the analysis by adopting at least two to three examples for each criterion. As a result, we found that the goals and expressive approaches are decided depending on the unique characteristics of the Bible text. On the basis of this research, we expect to build a theoretical method for choosing materials and approaches by analyzing more diverse examples from various points of view.

The Forming Mechanism of Brain Text and Brain Concept in the Theory of Ethical Literary Criticism (뇌텍스트(Brain Text) 및 뇌개념(Brain Concept)의 형성원리와 문학윤리학비평)

  • Nie, Zhenzhao;Yoon, Seokmin
    • Journal of Popular Narrative / v.25 no.1 / pp.193-215 / 2019
  • According to ethical literary criticism, every type of literature has its text. The original definition of oral literature refers to the literature disseminated orally. Before the dissemination, the text of oral literature is stored in the human brain, which is termed as "brain text". Brain text is the textual form used before the formation of writing symbols and its application to a recording of information, and it still exists after the creation of writing symbols. Other types of texts are written text and electronic text. Brain text consists of brain concepts, which, according to different sources, can be divided into objective concepts and abstractive concepts. Brain concepts are tools for thinking while thought comes from thinking with understanding and an application of brain concepts. Brain text is the carrier of thought. The termination of the synthesis of brain concepts signifies the completion of thinking, which produces thoughts to form brain text. Brain text determines thinking and behavioral patterns that not only communicate and spread information, but also decide our ideas, thoughts, judgments, choices, actions and emotions. Brain text is also a deciding factor for our lifestyle and moral behaviors. The nature of a person's brain text determines his thoughts and actions, and most importantly determines who he is.

Study of Analyzing Outcome of Building and Introducing System for Preserving Full-Text of e-Journal

  • Kim, Kwang-Young;Kim, Soon-Young;Kim, Hwan-Min
    • International Journal of Knowledge Content Development & Technology / v.2 no.2 / pp.5-16 / 2012
  • Today, most researchers conduct their studies through the full-text of e-journals. Therefore, an important base for domestic development of science and technology is to obtain the full-text of quality e-journals by overseas researchers and to provide it to Korea's researchers. This study aims to build a system based on the National Archiving Center for the full-text of e-journals and to make a service system for providing them to the public by acquiring the full-text of quality overseas e-journals. To do this, an analysis was made of the outcome of introducing such a system for full-text of e-journals in comparison with the investment. As a result, 112 more institutions, that is, from 47 institutions to 159 institutions, have introduced the system as of 2012, and the number of downloaded full-texts increased at least 2.17 times.