• Title/Summary/Keyword: Text comparing

Search Result 270, Processing Time 0.026 seconds

SMS Text Messages Filtering using Word Embedding and Deep Learning Techniques (워드 임베딩과 딥러닝 기법을 이용한 SMS 문자 메시지 필터링)

  • Lee, Hyun Young;Kang, Seung Shik
    • Smart Media Journal
    • /
    • v.7 no.4
    • /
    • pp.24-29
    • /
    • 2018
  • Text analysis technique for natural language processing in deep learning represents words in vector form through word embedding. In this paper, we propose a method of constructing a document vector and classifying it into spam and normal text message, using word embedding and deep learning method. Automatic spacing applied in the preprocessing process ensures that words with similar context are adjacently represented in vector space. Additionally, the intentional word formation errors with non-alphabetic or extraordinary characters are designed to avoid being blocked by spam message filter. Two embedding algorithms, CBOW and skip grams, are used to produce the sentence vector and the performance and the accuracy of deep learning based spam filter model are measured by comparing to those of SVM Light.

RECENT RESEARCH AND DEVELOPING TREND OF ENGINEERING MANAGEMENT IN CHINA BASED ON TEXT MINING

  • Shaohua Jiang;Wenling Zhang;Zhaohong Qiu;Shaojun Wang
    • International conference on construction engineering and project management
    • /
    • 2009.05a
    • /
    • pp.814-820
    • /
    • 2009
  • With the rapid development of China economy, many engineering projects with large scale and investment were constructed in China and some were the biggest ones in the world. With the development of engineering practice, great progress in the research of engineering management of China was made and a large number of research findings were embodied in content of research papers and were represented by technical words. To know the state of arts in the research field of engineering management in China, three major parts, namely title, abstract and keywords of research papers in last five years from three representative Chinese journals about engineering management were chose as research materials. Unlike western languages, there are no delimiters between the words of Chinese, so the maximum matching and frequency statistics (MMFS) method, a text segmentation technique of text mining Chinese, was presented to extract the features consisting of technical words, phrases and words from the research materials. Recent research and developing trend of engineering management in China were found by comparing and analyzing the difference of technical words in the research materials of last five years.

  • PDF

Korean Middle School Students' Epistemic Ideas of Claim, Data, Evidence, and Argument When Evaluating and Critiquing Arguments (한국 중학생들의 주장, 자료, 근거와 과학 논의에 대한 인식론적 이해조사)

  • Ryu, Suna
    • Journal of The Korean Association For Science Education
    • /
    • v.35 no.2
    • /
    • pp.199-208
    • /
    • 2015
  • An enhanced understanding of the nature of scientific knowledge-what counts as a scientific argument and how scientists justify their claims with evidence-has been central in Korean science instruction. However, despite its importance, scholars are generally concerned about the difficulty of both addressing and improving students' epistemic understanding, especially for students of a young age. This study investigated Korean middle school students' epistemic ideas about claim, data, evidence, and argument when they engage in reading both text-based and data-inscription arguments. Compared to previous studies, Korean middle school students show a sophisticated understanding of the role of claim and evidence. Yet, these students think that there is only a single way of interpreting data. When comparing students' ideas from text-based and data-inscription arguments, the majority of Korean students barely perceive text description as evidence and recognize only measured data as evidence.

A Study Comparing the Han Period Bamboo Slats of the Beijing University Collection with the Laoguanshan Collection (북경대학 소장 한대의간(漢代醫簡)과 노관산 의간(老官山醫簡)의 비교 연구)

  • Kim, Beomsu;Kim, Kiwang
    • Journal of Korean Medical classics
    • /
    • v.36 no.1
    • /
    • pp.33-43
    • /
    • 2023
  • Objectives : Overlapping contents between two recently discovered Han period bamboo slats, the so-called "Beidahanjian" and the "Liushibingfang" have been identified. This study aims to present new knowledge that could be inferred from the concordance of these two texts. Methods : The most recent original texts of the medical part of the Beidahanjian and medical texts excavated from the Laoguanshan in addition to the Liushibingfang were compared with each other to determine identical parts. The meaning of these concordances was explored. Results : Identical sentences in two verses in the Beidahanjian and the Laoguanshan were identified. Conclusions : The Beidahanjian is a credible Western Han period text, of which the medical bamboo slats are likely to comprise an independent text that is a combination of ancient folk prescriptions and those of doctors.

Real-time Printed Text Detection System using Deep Learning Model (딥러닝 모델을 활용한 실시간 인쇄물 문자 탐지 시스템)

  • Ye-Jun Choi;Song-Won Kim;Mi-Kyeong Moon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.3
    • /
    • pp.523-530
    • /
    • 2024
  • Online, such as web pages and digital documents, have the ability to search for specific words or specific phrases that users want to search in real time. Printed materials such as printed books and reference books often have difficulty finding specific words or specific phrases in real time. This paper describes the development of a deep learning model for detecting text and a real-time character detection system using OCR for recognizing text. This study proposes a method of detecting text using the EAST model, a method of recognizing the detected text using EasyOCR, and a method of expressing the recognized text as a bounding box by comparing a specific word or specific phrase that the user wants to search for. Through this system, users expect to find specific words or phrases they want to search in real time in print, such as books and reference books, and find necessary information easily and quickly.

Color Recommendation for Text Based on Colors Associated with Words

  • Liba, Saki;Nakamura, Tetsuaki;Sakamoto, Maki
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.17 no.1
    • /
    • pp.21-29
    • /
    • 2012
  • In this paper, we propose a new method to select colors representing the meaning of text contents based on the cognitive relation between words and colors, Our method is designed on the previous study revealing the existence of crucial words to estimate the colors associated with the meaning of text contents, Using the associative probability of each color with a given word and the strength of color association of the word, we estimate the probability of colors associated with a given text. The goal of this study is to propose a system to recommend the cognitively plausible colors for the meaning of the input text. To build a versatile and efficient database used by our system, two psychological experiments were conducted by using news site articles. In experiment 1, we collected 498 words which were chosen by the participants as having the strong association with color. Subsequently, we investigated which color was associated with each word in experiment 2. In addition to those data, we employed the estimated values of the strength of color association and the colors associated with the words included in a very large corpus of newspapers (approximately 130,000 words) based on the similarity between the words obtained by Latent Semantic Analysis (LSA). Therefore our method allows us to select colors for a large variety of words or sentences. Finally, we verified that our system cognitively succeeded in proposing the colors associated with the meaning of the input text, comparing the correct colors answered by participants with the estimated colors by our method. Our system is expected to be of use in various types of situations such as the data visualization, the information retrieval, the art or web pages design, and so on.

On supporting full-text retrievals in XML query

  • Hong, Dong-Kweon
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.7 no.4
    • /
    • pp.274-278
    • /
    • 2007
  • As XML becomes the standard of digital data exchange format we need to manage a lot of XML data effectively. Unlike tables in relational model XML documents are not structural. That makes it difficult to store XML documents as tables in relational model. To solve these problems there have been significant researches in relational database systems. There are two kinds of approaches: 1) One way is to decompose XML documents so that elements of XML match fields of relational tables. 2) The other one stores a whole XML document as a field of relational table. In this paper we adopted the second approach to store XML documents because sometimes it is not easy for us to decompose XML documents and in some cases their element order in documents are very meaningful. We suggest an efficient table schema to store only inverted index as tables to retrieve required data from XML data fields of relational tables and shows SQL translations that correspond to XML full-text retrievals. The functionalities of XML retrieval are based on the W3C XQuery which includes full-text retrievals. In this paper we show the superiority of our method by comparing the performances in terms of a response time and a space to store inverted index. Experiments show our approach uses less space and shows faster response times.

Consideration of a Robust Search Methodology that could be used in Full-Text Information Retrieval Systems (퍼지 논리를 이용한 사용자 중심적인 Full-Text 검색방법에 관한 연구)

  • Lee, Won-Bu
    • Asia pacific journal of information systems
    • /
    • v.1 no.1
    • /
    • pp.87-101
    • /
    • 1991
  • The primary purpose of this study was to investigate a robust search methodology that could be used in full-text information retrieval systems. A robust search methodology is one that can be easily used by a variety of users (particularly naive users) and it will give them comparable search performance regardless of their different expertise or interests In order to develop a possibly robust search methodology, a fully functional prototype of a fuzzy knowledge based information retrieval system was developed. Also, an experiment that used this prototype information retreival system was designed to investigate the performance of that search methodology over a small exploratory sample of user queries To probe the relatonships between the possibly robust search performance and the query organization using fuzzy inference logic, the search performance of a shallow query structure was analyzes. Consequently the following several noteworthy findings were obtained: 1) the hierachical(tree type) query structure might be a better query organization than the linear type query structure 2) comparing with the complex tree query structure, the simple tree query structure that has at most three levels of query might provide better search performance 3) the fuzzy search methodology that employs a proper levels of cut-off value might provide more efficient search performance than the boolean search methodology. Even though findings could not be statistically verified because the experiments were done using a single replication, it is worth noting however, that the research findings provided valuable information for developing a possibly robust search methodology in full-text information retrieval.

  • PDF

Building Hybrid Stop-Words Technique with Normalization for Pre-Processing Arabic Text

  • Atwan, Jaffar
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.7
    • /
    • pp.65-74
    • /
    • 2022
  • In natural language processing, commonly used words such as prepositions are referred to as stop-words; they have no inherent meaning and are therefore ignored in indexing and retrieval tasks. The removal of stop-words from Arabic text has a significant impact in terms of reducing the size of a cor- pus text, which leads to an improvement in the effectiveness and performance of Arabic-language processing systems. This study investigated the effectiveness of applying a stop-word lists elimination with normalization as a preprocessing step. The idea was to merge statistical method with the linguistic method to attain the best efficacy, and comparing the effects of this two-pronged approach in reducing corpus size for Ara- bic natural language processing systems. Three stop-word lists were considered: an Arabic Text Lookup Stop-list, Frequency- based Stop-list using Zipf's law, and Combined Stop-list. An experiment was conducted using a selected file from the Arabic Newswire data set. In the experiment, the size of the cor- pus was compared after removing the words contained in each list. The results showed that the best reduction in size was achieved by using the Combined Stop-list with normalization, with a word count reduction of 452930 and a compression rate of 30%.

A comparative study on the mathematics curriculum of Korea and Japan in the last of 20 century (1) - focusing on 7he elementary school Mathematics curriculum mainly - (20세기 말 개정된 한국과 일본의 수학과 교육과정 비교(1) - 초등학교 수학과 교육과정을 중심으로 -)

  • 임문규
    • Journal of Educational Research in Mathematics
    • /
    • v.11 no.2
    • /
    • pp.257-271
    • /
    • 2001
  • This study investigated the new revised Mathematics curriculums of elementary schools in Korea and Japan at the end of the 20th century. The comparison was made especially with revising direction, purposes, and contents of elementary school mathematics curriculum in both countries. I began by comparing and confirming the ratio of instruction hours of Mathematics to the total instruction hours of all the subjects at as whole. This comparison was done of the elementary and middle school mathematics. The next part of the study was to compare in detail the purposes of revised mathematics in elementary and middle schools of both countries. Particular attentions was paid to the important revised points of Japanese elementary school Mathematics. Finally, I concluded by comparing the contents of elementary school Mathematics of the two countries. New mathematic text books in both countries having been published by revised curriculum, puts the future task in comparing, in detail, the concrete contents of each textbook.

  • PDF