Guiding Practical Text Classification Framework to Optimal State in Multiple Domains

  • Choi, Sung-Pil;Myaeng, Sung-Hyon;Cho, Hyun-Yang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.3 no.3 / pp.285-307 / 2009
  • This paper introduces DICE, a Domain-Independent text Classification Engine. DICE is robust, efficient, and domain-independent in terms of software and architecture. Each module of the system is clearly separated and encapsulated for extensibility. The clear modular architecture allows for simple and continuous verification and facilitates changes over multiple cycles, even after the major development period is complete. Those who want to use DICE can easily implement their ideas on this test bed and optimize it for a particular domain by simply adjusting the configuration file. Unlike other publicly available toolkits or development environments targeted at general-purpose classification models, DICE specializes in text classification and offers a number of functions specific to it. This paper focuses on ways to locate the optimal state of a practical text classification framework by using the adaptation methods the system provides, such as feature selection, lemmatization, and classification models.

Identification of the Minimum Legible Text Size for Group-View Display of the Main Control Room in Radioactive Waste Facility

  • Jung, Kihyo;Lee, Baekhee;Chang, Yoon;Jung, Ilho;You, Heecheon
    • Journal of the Ergonomics Society of Korea / v.36 no.3 / pp.213-219 / 2017
  • Objective: The present study experimentally identified the minimum legible text size for eight combinations of background and text colors, for use in designing visual information on a group-view display (GVD). Background: Information on minimum legible text size is needed to design the visual information presented on the GVD in a radioactive waste control room. Method: The experiment was conducted with 22 male participants (age: mean = 37, SD = 6.7; visual acuity: over 0.8), recruited to reflect the demographic characteristics of current control room operators. Eight combinations of background and text colors were considered, and the minimum legible text size was determined for each combination by applying the method of limits, a psychophysical method. Results: The minimum legible text size differed significantly across combinations of background and text colors. Statistical analysis showed that luminance contrast and color contrast between background and text influenced the minimum legible text size. Conclusion: This study concluded that the minimum legible text size is 8 minutes of arc for various combinations of background and text colors. Application: The minimum legible text size identified in the present study can be used in designing visual information on the GVD in the main control room of a radioactive waste facility.
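An angular size such as 8 minutes of arc only becomes a physical character height once a viewing distance is fixed. The sketch below applies the standard visual-angle relation h = 2·d·tan(θ/2). It is an illustration only: the 5 m viewing distance is a hypothetical value that does not come from the study, and the function name `text_height_mm` is made up for this example.

```python
import math


def text_height_mm(viewing_distance_mm: float, arc_minutes: float) -> float:
    """Height (mm) of an object that subtends `arc_minutes` at the given distance."""
    angle_rad = math.radians(arc_minutes / 60.0)  # arc minutes -> degrees -> radians
    return 2.0 * viewing_distance_mm * math.tan(angle_rad / 2.0)


# Hypothetical viewing distance of 5 m (assumption, not reported in the abstract).
height = text_height_mm(5000.0, 8.0)
print(f"{height:.1f} mm")  # prints "11.6 mm"
```

Because the relation is nearly linear at such small angles, doubling the viewing distance roughly doubles the required character height.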

The Effect of Text Information Frame Ratio and Font Size on the Text Readability of Circle Smartwatch

  • Park, Seungtaek;Park, Jaekyu;Choe, Jaeho;Jung, Eui S.
    • Journal of the Ergonomics Society of Korea / v.33 no.6 / pp.499-513 / 2014
  • Objective: The objective of this study was to examine the frame ratio of text information and the font size on a circular smartwatch. Background: Recently, electronics manufacturers have been trying to carry the original metaphor of the traditional (circular) wristwatch over to the smartwatch. They are moving away from the square display in order to improve customers' emotional satisfaction. Method: The experiments examined twenty levels of text information design, combining four frame ratios (1:1, 4:3, 16:9, 21:9) and five font sizes (6pt, 7pt, 8pt, 9pt, 10pt). Nineteen participants volunteered for the experiment. The dependent variables were WPM (words per minute), reading preference, design preference, and total preference. The small circular display was built from the data of a 1.3-inch circular display exhibited at IFA (Internationale Funkausstellung) 2014. Results: ANOVA (Analysis of Variance) revealed effects of frame ratio and font size on WPM and task time preference. The ANOVA results for reading preference, design preference, and total preference were grouped by post-hoc LSD (Least Significant Difference) analysis. Users preferred the 16:9 and 21:9 display ratios and the 9pt font size. Conclusion: The study shows that the 16:9 display ratio and the 9pt font size are more suitable than the other levels for text information on a 1.3-inch circular display. This is mainly because the order of frame ratio and font size may affect the usability of reading long text on a small circular display. Therefore, when developers design a circular display, the square frame ratio and font size need to be considered according to the circle size. Application: The 16:9 display ratio and 9pt font size may be used as the text information frame in circular display design guidelines for smartwatches.

Representation of Texts into String Vectors for Text Categorization

  • Jo, Tae-Ho
    • Journal of Computing Science and Engineering / v.4 no.2 / pp.110-127 / 2010
  • In this study, we propose a method for encoding documents into string vectors instead of numerical vectors. A traditional approach to text categorization usually requires encoding documents into numerical vectors. This usual encoding, however, causes two main problems: huge dimensionality and sparse distribution. We therefore modify or create machine learning-based approaches to text categorization that receive string vectors, rather than numerical vectors, as input. As a result, we can improve text categorization performance by avoiding these two problems.
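The contrast between the two encodings can be sketched in a few lines. The abstract does not say how the string vectors are built, so this example assumes one plausible definition: a fixed-length list of a document's most frequent words. The function names and the two sample sentences are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def numerical_vector(text, vocabulary):
    """One count per vocabulary word: length grows with the corpus, mostly zeros."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


def string_vector(text, size):
    """Assumed encoding: the `size` most frequent words, padded with empty strings."""
    counts = Counter(tokenize(text))
    # Rank by descending frequency, then alphabetically for a deterministic order.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:size] + [""] * max(0, size - len(ranked))


docs = ["text mining finds patterns in text", "speech synthesis needs prosody"]
vocabulary = sorted({w for d in docs for w in tokenize(d)})

print(numerical_vector(docs[0], vocabulary))  # [1, 1, 1, 0, 1, 0, 0, 0, 2]
print(string_vector(docs[0], 3))              # ['text', 'finds', 'in']
```

The numerical vector has one slot per vocabulary word, so its length and sparsity grow with the corpus, while the string vector keeps a fixed, small length regardless of vocabulary size.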

Design and Implementation of a Text-to-Speech System using the Prosody and Duration Information (운율 및 길이 정보를 이용한 무제한 음성 합성기의 설계 및 구현)

  • Yang, Jin-Seok;Kim, Jae-Beom;Lee, Jeong-Hyeon
    • The Transactions of the Korea Information Processing Society / v.3 no.5 / pp.1121-1129 / 1996
  • To produce more natural speech in a Text-to-Speech system, prosody and duration must be processed in advance; the prosody and duration information was extracted by means of trial-and-error experiments. In this paper, a method is proposed to improve the naturalness of a Text-to-Speech system using this information. As a result, the Text-to-Speech system proposed and implemented in this paper produced more natural synthesized speech than systems that do not use this information.


Arabic Handwritten Manuscripts Text Recognition: A Systematic Review

  • Alghamdi, Arwa;Alluhaybi, Dareen;Almehmadi, Doaa;Alameer, Khadijah;Siddeq, Sundos Bin;Alsubait, Tahani
    • International Journal of Computer Science & Network Security / v.22 no.11 / pp.319-323 / 2022
  • Handwritten text recognition is an active research area, and progress in the field differs from language to language. For example, progress in Arabic handwritten text recognition is still insignificant and needs more attention and effort. One of the most important subfields is Arabic handwritten manuscript text recognition, which focuses on extracting text from historical manuscripts. For ages, people used manuscripts to write everything, and today there are millions of manuscripts all around the world. There are two main challenges in dealing with these manuscripts. The first is that they are at risk of damage, since they are written on primitive materials. The second is the variation in writing styles, which leaves most people unable to read these manuscripts easily. Therefore, in this study we discuss different papers related to this important research field.

An Automatic Text Categorization Theories and Techniques for Text Management (문서관리를 위한 자동문서범주화에 대한 이론 및 기법)

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of Information Management / v.33 no.2 / pp.19-32 / 2002
  • With the growth of digital libraries and the use of the Internet, the amount of online text information has increased rapidly, and the need for efficient data management and retrieval techniques has grown with it. An automatic text categorization system assigns text documents to predefined categories, which reduces the manual labor of text categorization. To classify text documents, good features should be selected from the documents, and the documents are then indexed with those features. This paper introduces each step of text categorization and several techniques used in each step.
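The steps named in this abstract (feature selection, indexing, assignment to predefined categories) can be sketched as a minimal pipeline. Every concrete choice here is an assumption made for illustration, not something taken from the paper: document frequency as the selection criterion, term counts as the index, and cosine similarity to per-category centroids as the classifier. The four training sentences are invented.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def select_features(docs, top_k):
    """Feature selection: keep the top_k terms by document frequency."""
    df = Counter()
    for text in docs:
        df.update(set(tokenize(text)))
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return set(ranked[:top_k])


def index(text, features):
    """Indexing: represent a document as counts over the selected features."""
    return Counter(t for t in tokenize(text) if t in features)


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train(labeled_docs, top_k=20):
    """Build one summed count vector (centroid) per category."""
    features = select_features([text for text, _ in labeled_docs], top_k)
    centroids = defaultdict(Counter)
    for text, label in labeled_docs:
        centroids[label].update(index(text, features))
    return features, dict(centroids)


def classify(text, features, centroids):
    """Assign the category whose centroid is most similar to the document."""
    vec = index(text, features)
    return max(sorted(centroids), key=lambda c: cosine(vec, centroids[c]))


# Invented toy training set, for illustration only.
training = [
    ("stock market prices fell", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sports"),
    ("player scores in final match", "sports"),
]
features, centroids = train(training)
print(classify("interest rates and stock prices", features, centroids))  # finance
print(classify("football match final", features, centroids))             # sports
```

Swapping any single step, for example a different selection criterion or classifier, leaves the other steps unchanged.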

Automatic In-Text Keyword Tagging based on Information Retrieval

  • Kim, Jin-Suk;Jin, Du-Seok;Kim, Kwang-Young;Choe, Ho-Seop
    • Journal of Information Processing Systems / v.5 no.3 / pp.159-166 / 2009
  • As shown in Wikipedia, tagging or cross-linking through major keywords in a document collection improves not only the readability of documents but also responsive and adaptive navigation among related documents. In recent years, the Semantic Web has increased the importance of social tagging as a key feature of Web 2.0, and the Tag Cloud, its crucial phenotype, has emerged into public view. In this paper we provide an efficient method of automated in-text keyword tagging based on a large-scale controlled term collection or keyword dictionary. The computational complexity of O(mN) with a pattern matching algorithm can be reduced to O(m log N) with an Information Retrieval technique, where m is the length of the target document and N is the total number of candidate terms to be tagged. The results show that automatic in-text tagging with keywords filtered by Information Retrieval is about 6 to 40 times faster than the fastest pattern matching algorithm.
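The gap between O(mN) and O(m log N) can be illustrated with a small sketch. This is not the authors' method, since the abstract does not describe their retrieval technique. As a stand-in, binary search over a sorted term list gives an O(log N) lookup for each candidate phrase in the document. The function names, the `max_len` phrase-length limit, and the sample data are hypothetical, and terms are assumed to be non-empty.

```python
from bisect import bisect_left


def tag_naive(tokens, terms):
    """Try every dictionary term at every position: work grows linearly with N."""
    hits = []
    for i in range(len(tokens)):
        for term in terms:
            words = term.split()
            if tokens[i:i + len(words)] == words:
                hits.append((i, term))
    return sorted(hits)


def tag_indexed(tokens, sorted_terms, max_len=3):
    """Look up each phrase of up to max_len words by binary search: O(log N) each."""
    hits = []
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break
            candidate = " ".join(tokens[i:i + n])
            pos = bisect_left(sorted_terms, candidate)
            if pos < len(sorted_terms) and sorted_terms[pos] == candidate:
                hits.append((i, candidate))
    return sorted(hits)


terms = ["semantic web", "tag cloud", "social tagging", "wikipedia"]
sorted_terms = sorted(terms)  # built once, reused for every document
tokens = "the semantic web uses tag cloud and social tagging".split()

print(tag_naive(tokens, terms))
print(tag_indexed(tokens, sorted_terms))
# both print [(1, 'semantic web'), (4, 'tag cloud'), (7, 'social tagging')]
```

The indexed variant misses terms longer than `max_len` words, which is the price of bounding the number of lookups per position.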

Applying Academic Theory with Text Mining to Offer Business Insight: Illustration of Evaluating Hotel Service Quality

  • Choong C. Lee;Kun Kim;Haejung Yun
    • Asia Pacific Journal of Information Systems / v.29 no.4 / pp.615-643 / 2019
  • Now is the time for IS scholars to demonstrate the added value of academic theory through its integration with text mining, to outline clearly how text mining experts outside academia can implement it, and to move towards establishing this integration as standard practice. Therefore, in this study we develop a systematic theory-based text-mining framework (TTMF) and illustrate its use and benefits by conducting a text-mining project in an actual business case: evaluating and improving hotel service quality using a large volume of actual user-generated reviews. A total of 61,304 sentences extracted from actual customer reviews were successfully allocated to SERVQUAL dimensions. The pragmatic validity of our model was tested with an OLS regression between the sentiment scores of each SERVQUAL dimension and customer satisfaction (star ratings), which showed significant relationships. As a post-hoc analysis, we report the results of a co-occurrence analysis used to identify the root causes of positive and negative service quality perceptions and to provide action plans for improvement.

A Novel Text Sample Selection Model for Scene Text Detection via Bootstrap Learning

  • Kong, Jun;Sun, Jinhua;Jiang, Min;Hou, Jian
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.771-789 / 2019
  • Text detection has been a popular research topic in the field of computer vision. It is difficult for prevalent text detection algorithms to avoid dependence on datasets. To overcome this problem, we propose a novel unsupervised text detection algorithm inspired by bootstrap learning. First, text candidates in a novel superpixel form are proposed to improve the text recall rate through image segmentation. Second, we propose a unique text sample selection model (TSSM) to extract text samples from the current image and eliminate database dependency. Specifically, to improve the precision of the samples, we combine maximally stable extremal regions (MSERs) and the saliency map to generate sample reference maps with a double threshold scheme. Finally, a multiple kernel boosting method is developed to generate a strong text classifier by combining multiple single-kernel SVMs based on the samples selected by TSSM. Experimental results on standard datasets demonstrate that our text detection method is robust to complex backgrounds and multilingual text and performs stably across datasets.