• Title/Summary/Keyword: Science text

An Automatic Text Categorization Theories and Techniques for Text Management (문서관리를 위한 자동문서범주화에 대한 이론 및 기법)

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of Information Management / v.33 no.2 / pp.19-32 / 2002
  • With the growth of digital libraries and the use of the Internet, the amount of online text information has increased rapidly, and the need for efficient data management and retrieval techniques has grown with it. An automatic text categorization system assigns text documents to predefined categories, reducing the manual labor of text categorization. To classify text documents, good features must be selected from the documents, and the documents are then indexed with those features. This paper introduces each step of text categorization and several techniques used in each step.
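
The feature selection and indexing steps mentioned in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and document-frequency ranking as the selection criterion; neither choice is taken from the paper, and the function names and sample documents are hypothetical.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a real system would also stem and drop stop words)."""
    return text.lower().split()

def select_features(docs, k):
    """Keep the k terms that occur in the most documents (document-frequency selection)."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    # Sort by descending document frequency, then alphabetically for a stable order.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

def index_document(doc, features):
    """Represent a document as a vector of term counts over the selected features."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in features]

# Hypothetical sample documents, for illustration only.
docs = [
    "library search and retrieval",
    "digital library text retrieval",
    "football match report",
]
features = select_features(docs, 2)
vectors = [index_document(d, features) for d in docs]
```

The resulting vectors are what a classifier would then consume; documents that share no selected feature are indexed as all-zero vectors.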

A Novel Video Image Text Detection Method

  • Zhou, Lin;Ping, Xijian;Gao, Haolin;Xu, Sen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.3 / pp.941-953 / 2012
  • A novel and universal method of video image text detection is proposed, implemented as a coarse-to-fine procedure. First, the spectral clustering (SC) method is adopted to coarsely detect text regions based on the stationary wavelet transform (SWT). To make full use of the information, a multi-parameter kernel function that combines feature similarity information and spatial adjacency information is employed in the SC method. Second, 28-dimensional classification features are proposed, and a support vector machine (SVM) is used to separate text regions from non-text regions. Experimental results on video images show encouraging performance for the proposed algorithm and classification features.

A Novel Video Image Text Detection Method

  • Zhou, Lin;Ping, Xijian;Gao, Haolin;Xu, Sen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.4 / pp.1140-1152 / 2012
  • A novel and universal method of video image text detection is proposed, implemented as a coarse-to-fine procedure. First, the spectral clustering (SC) method is adopted to coarsely detect text regions based on the stationary wavelet transform (SWT). To make full use of the information, a multi-parameter kernel function that combines feature similarity information and spatial adjacency information is employed in the SC method. Second, 28-dimensional classification features are proposed, and a support vector machine (SVM) is used to separate text regions from non-text regions. Experimental results on video images show encouraging performance for the proposed algorithm and classification features.

A Text Similarity Measurement Method Based on Singular Value Decomposition and Semantic Relevance

  • Li, Xu;Yao, Chunlong;Fan, Fenglong;Yu, Xiaoqiang
    • Journal of Information Processing Systems / v.13 no.4 / pp.863-875 / 2017
  • Traditional text similarity measurement methods based on word frequency vectors ignore the semantic relationships between words, which, together with the high dimensionality and sparsity of document vectors, has become an obstacle to text similarity calculation. To address these problems, an improved singular value decomposition is used to reduce the dimensionality and remove noise from the text representation model. The optimal number of singular values is analyzed, and the semantic relevance between words is calculated in the constructed semantic space. An inverted index construction algorithm and similarity definitions between vectors are proposed to calculate the similarity between two documents at the semantic level. Experimental results on a benchmark corpus demonstrate that the proposed method improves the F-measure.
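
The limitation that motivates this paper can be seen in a small sketch of the word-frequency baseline. This is not the authors' SVD-based method; it only shows, under the assumption of whitespace tokenization and cosine similarity, why raw term counts miss semantic relationships. The function name and sample strings are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-frequency vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Identical wording scores at the maximum.
same_words = cosine_similarity("the car is fast", "the car is fast")
# Synonyms share no surface terms, so the baseline sees them as unrelated.
synonyms = cosine_similarity("car", "automobile")
```

Because the two synonyms have no term in common, their similarity is zero, which is the gap a semantic space built by dimensionality reduction is meant to close.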

A Comparative Study of Word Embedding Models for Arabic Text Processing

  • Assiri, Fatmah;Alghamdi, Nuha
    • International Journal of Computer Science & Network Security / v.22 no.8 / pp.399-403 / 2022
  • Natural texts are analyzed to obtain their intended meaning so that they can be classified according to the problem under study. One way to represent words is to generate vectors of real values that encode their meaning; this is called word embedding. Similarities between word representations are measured to identify the text class. Word embeddings can be created using the word2vec technique. More recently, however, fastText was introduced to provide better results when used with classifiers. In this paper, we study the performance of well-known classifiers when using both word embedding techniques with an Arabic dataset. We applied them to real data collected from Wikipedia and found that word2vec and fastText had similar accuracy with all classifiers used.

- For the Development of Inquiring, integrated Science Curricular Materials - The Comparison and Analysis of Inquiry Activity between "The FAST Program" and "The Secondary Science Books" (탐구적 통합 과학 교재 개발을 위한, "FAST program"과 "중등 과학 교과서"의 탐구 활동 비교 분석)

  • Son, Yeon-A;Lee, Hack-Dong
    • Journal of The Korean Association For Science Education / v.14 no.1 / pp.45-57 / 1994
  • The purpose of this study is to verify whether the FAST program qualifies as inquiry science curricular material by comparing and analyzing the inquiry activities in the FAST program and in our secondary science textbooks. The results are as follows: 1. FAST contains 226 inquiry activity tasks, more than twice as many as our textbooks. 2. At level one, FAST includes Synthesizing Results and Evaluation, Hypothesizing, and Designing an Experiment, but these are not found in our textbooks. 3. At level two, our textbooks were rated No Discussion in 72.2% of tasks and Demonstrating or Verifying the Content of the Text in 82%, whereas FAST was rated Discussion Guided in 81.8% and contained no task of Demonstrating or Verifying the Content of the Text. 4. At level three, our textbooks showed a typical type I pattern with an Inquiry Index of 15-25 (Middle), whereas FAST showed type IV, except for Manipulating Apparatus and Observation, with an Inquiry Index over 35 (Very High). FAST is therefore judged to be desirable inquiry science curricular material. Future work will address: 1. Verification of the FAST program as integrated science curricular material. 2. Development of inquiry-based, integrated science curricular materials based on the results of the preceding study.

An Enhanced Text Mining Approach using Ensemble Algorithm for Detecting Cyber Bullying

  • Z.Sunitha Bai;Sreelatha Malempati
    • International Journal of Computer Science & Network Security / v.23 no.5 / pp.1-6 / 2023
  • Text mining (TM) is widely used to process unstructured text documents across many domains. Text mining is also referred to as text classification. It is popular in applications such as movie reviews, product reviews on e-commerce websites, sentiment analysis, topic modeling, and detecting cyber-bullying in social media messages. Cyber-bullying is the abuse of someone with insulting language; it includes personal abuse, sexual harassment, and other types of abuse. Several existing systems detect bullying words based on their context in social networking sites (SNS), which have become a platform for bullying. In this paper, an enhanced text mining approach using an ensemble algorithm (ETMA) is developed to solve several problems in traditional algorithms and to improve accuracy, processing time, and result quality. ETMA analyzes bullying text on SNS such as Facebook and Twitter. It is applied to a synthetic dataset collected from various data sources, consisting of 5k messages labeled as bullying or non-bullying. Performance is reported in terms of precision, recall, F1-score, and accuracy.
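
The four evaluation measures named in this abstract follow from the counts of true and false positives and negatives. The sketch below uses the standard definitions; the function name, the label strings, and the eight sample predictions are hypothetical and are not results from the paper.

```python
def binary_metrics(y_true, y_pred, positive="bullying"):
    """Precision, recall, F1 and accuracy for a two-class labeling task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical gold labels and predictions, for illustration only.
y_true = ["bullying"] * 3 + ["non-bullying"] * 5
y_pred = ["bullying", "bullying", "non-bullying",
          "bullying", "non-bullying", "non-bullying", "non-bullying", "non-bullying"]
metrics = binary_metrics(y_true, y_pred)
```

With two true positives, one false positive, one false negative and four true negatives, precision, recall and F1 are each 2/3 and accuracy is 0.75.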

Decrease of Protease-Resistant PrPSc Level in ScN2a Cells by Polyornithine and Polyhistidine

  • Waqas, Muhammad;Trinh, Huyen Trang;Lee, Sungeun;Kim, Dae-hwan;Lee, Sang Yeol;Choe, Kevin K.;Ryou, Chongsuk
    • Journal of Microbiology and Biotechnology / v.28 no.12 / pp.2141-2144 / 2018
  • Based on previous studies reporting the anti-prion activity of poly-L-lysine and poly-L-arginine, we investigated cationic poly-L-ornithine (PLO), poly-L-histidine (PLH), anionic poly-L-glutamic acid (PLE) and uncharged poly-L-threonine (PLT) in cultured cells chronically infected by prions to determine their anti-prion efficacy. While PLE and PLT did not alter the level of $\mathrm{PrP^{Sc}}$, PLO and PLH exhibited potent $\mathrm{PrP^{Sc}}$ inhibition in ScN2a cells. These results suggest that the anti-prion activity of poly-basic amino acids is correlated with the cationicity of their functional groups. A comparison of the anti-prion activity of PLO and PLH suggests that the anti-prion activity of poly-basic amino acids is associated with their acidic cellular compartments.

Academic Registration Text Classification Using Machine Learning

  • Alhawas, Mohammed S;Almurayziq, Tariq S
    • International Journal of Computer Science & Network Security / v.22 no.1 / pp.93-96 / 2022
  • Natural language processing (NLP) is used to understand natural text. Text analysis systems use natural language algorithms to find the meaning of large amounts of text. Text classification is a basic NLP task with a wide range of applications such as topic labeling, sentiment analysis, spam detection, and intent detection. Such an algorithm can transform a user's unstructured thoughts into more structured data. In this work, a text classifier has been developed that takes academic admission and registration texts as input, analyzes their content, and automatically assigns relevant tags such as admission, graduate school, and registration. The well-known support vector machine (SVM) and k-nearest neighbor (kNN) algorithms are used to develop this classifier. The results showed that the SVM classifier outperformed the kNN classifier, with an overall accuracy of 98.9%. In addition, the mean absolute error was 0.0064 for SVM and 0.0098 for kNN. Based on these results, the SVM is used to implement the academic text classification in this work.
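
Of the two classifiers compared in this abstract, kNN is simple enough to sketch in a few lines. The version below is an illustration only: it assumes Jaccard overlap of word sets as the similarity measure, which the abstract does not specify, and the training sentences are invented. The tags are the ones the abstract lists.

```python
from collections import Counter

def jaccard(text_a, text_b):
    """Share of distinct words that two texts have in common."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def knn_classify(text, training, k=3):
    """Label a text by majority vote among its k most similar training examples."""
    ranked = sorted(training, key=lambda pair: jaccard(text, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled examples, for illustration only.
training = [
    ("apply for admission to the university", "admission"),
    ("admission application deadline", "admission"),
    ("course registration deadline", "registration"),
    ("course registration opens next week", "registration"),
    ("graduate school thesis submission", "graduate school"),
]
predicted = knn_classify("when does course registration open", training, k=3)
```

The two registration examples are the nearest neighbors of the query, so the vote assigns the registration tag. An SVM, which the paper found more accurate, would instead learn a decision boundary from the same indexed texts.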

Automatic Text Categorization Using Term Information of Anchor Text (Anchor Text의 단어 정보를 이용한 자동 문서 범주화)

  • Heo, Hee-keun;Han, Gi-deok;Jung, Sung-won;Lim, Sung-shin;Kwon, Hyuk-chul
    • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.665-668 / 2004
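  • Recent web documents are expressed not only as text but also in other forms such as images and sound, so the proportion of text in them is decreasing. As a result, for documents from which it is difficult to extract more than a certain number of words, existing text categorization methods that rely only on term information cannot be expected to perform well. This paper therefore proposes a new automatic text categorization model based on judging the feature suitability of anchor text term information. A Bayesian probabilistic model is used as the categorization model, and features are selected using the chi-square statistic. When the term features extracted from a document are judged insufficient to classify that document, the document's link information is used to incorporate the term features of linked documents and of the anchor text, thereby improving performance.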
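
The chi-square statistic used here for feature selection can be computed from a 2x2 contingency table of term presence against category membership. The sketch below uses the standard formula for that table; the function name and the document counts are hypothetical and do not come from the paper.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category, from a 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - c * b) ** 2 / denominator

# Hypothetical counts: a term concentrated in one category scores high...
strong = chi_square(a=8, b=2, c=2, d=8)
# ...while a term spread evenly across categories scores zero.
independent = chi_square(a=5, b=5, c=5, d=5)
```

Terms are ranked by this score and the highest-scoring ones are kept as features, which is one common way to apply the statistic before training a Bayesian classifier.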