• Title/Summary/Keyword: text complexity

Improving Lookup Time Complexity of Compressed Suffix Arrays using Multi-ary Wavelet Tree

  • Wu, Zheng;Na, Joong-Chae;Kim, Min-Hwan;Kim, Dong-Kyue
    • Journal of Computing Science and Engineering
    • /
    • v.3 no.1
    • /
    • pp.1-4
    • /
    • 2009
  • Given a text T of size n, we need to search for the information we are interested in. To support fast searching, an index must be constructed by preprocessing the text. The suffix array is one such index data structure. The compressed suffix array (CSA) is a compressed index that exploits the regularity of the suffix array and can be compressed to the $k$th-order empirical entropy. In this paper, we improve the lookup time complexity of the compressed suffix array by using a multi-ary wavelet tree, at the cost of more space. In our implementation, the lookup time complexity of the compressed suffix array is $O(\log_{\sigma}^{\varepsilon/(1-\varepsilon)} n \, \log_r \sigma)$, and its space is $\varepsilon^{-1} n H_k(T) + O(n \log\log n / \log_{\sigma}^{\varepsilon} n)$ bits, where $\sigma$ is the size of the alphabet, $H_k$ is the $k$th-order empirical entropy, $r$ is the branching factor of the multi-ary wavelet tree such that $2 \leq r \leq \sqrt{n}$ and $r \leq O(\log_{\sigma}^{1-\varepsilon} n)$, and $0 < \varepsilon < 1/2$ is a constant.
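
As a point of reference for the abstract above, the following is a minimal sketch of a plain (uncompressed) suffix array and a binary-search lookup over it. It is not the paper's compressed suffix array or its multi-ary wavelet tree; the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive construction is chosen only for brevity.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction (sorting whole suffixes); fine for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of pattern in text, using binary search on sa."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])
```

Each lookup performs O(log n) string comparisons of length at most the pattern length. The compressed variants discussed in the abstract trade some of this lookup speed for space close to the text's empirical entropy.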

The Effects of Task Complexity for Text Summarization by Korean Adult EFL Learners

  • Lee, Haemoon;Park, Heesoo
    • Journal of English Language & Literature
    • /
    • v.57 no.6
    • /
    • pp.911-938
    • /
    • 2011
  • The present study examined the effect of two variables of task complexity, reasoning demand and time pressure, drawn respectively from the resource-directing and resource-dispersing dimensions in Robinson's (2001) framework of task classification. Reasoning demand was operationalized as the two types of texts to read and summarize, expository and argumentative. Time pressure was operationalized as the two modes of performance, oral and written. Six university students summarized the two types of text orally, and twenty-four students from the same school summarized them in written form. Results from t-tests and ANCOVA showed that in the oral mode, reasoning demand tends to heighten the complexity of the language used in the summary, in competition with accuracy, but this effect disappeared in the written mode. It was interpreted that the degree of time pressure is not the only difference between the oral and written modes but that the two modes may be fundamentally different cognitive tasks, and that Robinson's (2001) and Skehan's (1998) models were differentially supported by the oral mode of the tasks but not by the written mode.

Improving Program Complexity Using Knot Counts (노트수에 의한 프로그램 복잡성 개선)

  • No, Cheol-U
    • ETRI Journal
    • /
    • v.5 no.3
    • /
    • pp.16-25
    • /
    • 1983
  • Increasing importance is being attached to measuring software characteristics. This paper addresses three topics. First, the relation between a program and its flow graph is discussed; a theoretical complexity measure is described, with an illustration of how it can be used to manage and control program complexity. Second, the cyclomatic complexity measure is discussed; this complexity is independent of physical size and depends only on the decision structure of a program. Third, the knot, which is defined as a crossing point, is considered, and an ordering of the nodes is provided to make the transition from a two-dimensional graph to a one-dimensional program. A program module that can improve FORTRAN IV program text is tested by knot counting, and its control complexity is improved.
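
The two measures named in this abstract can be sketched in a few lines. The cyclomatic number uses McCabe's standard formula V(G) = E - N + 2P. The knot count below assumes the usual reading of a knot as a pair of control-transfer arrows that cross when drawn beside the program listing; the abstract does not spell out its exact definition, so treat that part, and the function names, as assumptions.

```python
def cyclomatic_complexity(num_edges: int, num_nodes: int, num_components: int = 1) -> int:
    """McCabe's measure V(G) = E - N + 2P for a control-flow graph."""
    return num_edges - num_nodes + 2 * num_components


def count_knots(jumps: list[tuple[int, int]]) -> int:
    """Count pairs of jumps whose arrows cross in the one-dimensional listing.

    Each jump is (source_line, target_line); direction does not matter for crossing.
    """
    spans = [(min(a, b), max(a, b)) for a, b in jumps]
    knots = 0
    for i in range(len(spans)):
        lo1, hi1 = spans[i]
        for lo2, hi2 in spans[i + 1:]:
            # Two arrows cross when the spans overlap without one nesting in the other.
            if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
                knots += 1
    return knots
```

For example, a single if/else has four nodes and four edges, giving V(G) = 2, while two GOTO-style jumps from line 1 to 5 and from line 3 to 8 produce one knot. Reordering the statements so that the jumps nest or no longer overlap removes the knot without changing the cyclomatic number.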

Speaker Identification using Phonetic GMM (음소별 GMM을 이용한 화자식별)

  • Kwon Sukbong;Kim Hoi-Rin
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.185-188
    • /
    • 2003
  • In this paper, we construct a phonetic GMM for a text-independent speaker identification system. The basic idea is to combine the advantages of the baseline GMM and the HMM. The GMM is better suited to text-independent speaker identification, whereas the HMM works better in text-dependent systems. The phonetic GMM represents a more sophisticated text-dependent speaker model built on a text-independent speaker model. In a speaker identification system, the phonetic GMM using HMM-based speaker-independent phoneme recognition yields better performance than the baseline GMM. In addition, an N-best recognition algorithm is used to decrease the computational complexity and to make the method applicable to new speakers.

The Prefix Array for Multimedia Information Retrieval in the Real-Time Stenograph (실시간 속기 자막 환경에서 멀티미디어 정보 검색을 위한 Prefix Array)

  • Kim, Dong-Joo;Kim, Han-Woo
    • Proceedings of the KIEE Conference
    • /
    • 2006.10c
    • /
    • pp.521-523
    • /
    • 2006
  • This paper proposes an algorithm and a data structure to support real-time full-text search over streamed or broadcast multimedia data containing real-time stenograph text. Because traditional indexing methods in information retrieval rely on linguistic information, they are costly. We therefore propose an algorithm and data structure based on the suffix array, a simple data structure with low space complexity. The suffix array is often used to search huge texts. However, the subtitle text of multimedia data grows over time, so the suffix array must be reconstructed as the subtitle text continually changes. We propose a data structure called the prefix array and a search algorithm that uses it.

Text Extraction Algorithm in Complex Images using Adaptive Edge detection (복잡한 영상에서 적응적 에지검출을 이용한 텍스트 추출 알고리즘 연구)

  • Shin, Seong;Kim, Sung-Dong;Baek, Young-Hyun;Moon, Sung-Ryong
    • Proceedings of the IEEK Conference
    • /
    • 2007.07a
    • /
    • pp.251-252
    • /
    • 2007
  • This paper proposes a text extraction algorithm that uses the Coiflet wavelet, the YCbCr color model, and the closed-curve edge feature of an adaptive LoG operator to compensate for the weaknesses of existing methods, which are sensitive to background complexity, lighting variation, disordered lines, and similarity between text and background colors. The algorithm is simulated on natural images that contain text regions, regardless of image size, resolution, slant, and so on. Compared with an existing extraction algorithm on the same images, the proposed algorithm is confirmed to perform excellently.

Arabic Text Clustering Methods and Suggested Solutions for Theme-Based Quran Clustering: Analysis of Literature

  • Bsoul, Qusay;Abdul Salam, Rosalina;Atwan, Jaffar;Jawarneh, Malik
    • Journal of Information Science Theory and Practice
    • /
    • v.9 no.4
    • /
    • pp.15-34
    • /
    • 2021
  • Text clustering is one of the most commonly used methods for detecting themes or types of documents. Text clustering is used in many fields, but its effectiveness is still not sufficient for understanding Arabic text, especially with respect to term extraction, unsupervised feature selection, and clustering algorithms. In most cases, term extraction focuses on nouns. Clustering simplifies the understanding of an Arabic text like the text of the Quran; it is important not only for Muslims but for all people who want to know more about Islam. This paper discusses the complexity and limitations of Arabic text clustering in the Quran based on themes. Unsupervised feature selection does not consider the relationships between the selected features. One weakness of clustering algorithms is that the selection of the optimal initial centroid still depends on chance and manual settings. Consequently, this paper reviews the literature on the three major stages of Arabic clustering: term extraction, unsupervised feature selection, and clustering. Six experiments were conducted to demonstrate previously undiscussed problems related to the metrics used for feature selection and clustering. Suggestions to improve theme-based clustering of the Quran are presented and discussed.

Linguistic and Cognitive Factors that Affect Word Problem Solving (수학 문장제 해결에 영향을 주는 언어적.인지적 요인 -혼합물 문제를 중심으로-)

  • 김선희
    • Journal of Educational Research in Mathematics
    • /
    • v.14 no.3
    • /
    • pp.267-281
    • /
    • 2004
  • Many students find word problems very difficult. This study analyzes the linguistic and cognitive factors that affect word problem solving in order to help students overcome this difficulty. The linguistic aspects comprise a text base, a situation model, and the real world. Students have difficulty with the transition from the text base to the situation model (equation) and make many errors at the situation model. For the cognitive aspects, I investigated problem-solving schemes, strategies, and complexity levels. Students tend to choose a strategy according to what the teacher taught rather than according to a low complexity level, and they confuse the amount of sugar, the amount of sugar water, and the concentration. The results show how complex the types of word problems are to solve, which strategies students mostly choose, and what errors students make in problem solving.

Text Summarization on Large-scale Vietnamese Datasets

  • Ti-Hon, Nguyen;Thanh-Nghi, Do
    • Journal of information and communication convergence engineering
    • /
    • v.20 no.4
    • /
    • pp.309-316
    • /
    • 2022
  • This investigation addresses automatic text summarization on large-scale Vietnamese datasets. Vietnamese articles were collected from newspaper websites, and plain text was extracted to build a dataset of 1,101,101 documents. Next, a new single-document extractive text summarization model was proposed to evaluate this dataset. In this model, the k-means algorithm is used to cluster the sentences of the input document using different text representations, such as BoW (bag-of-words), TF-IDF (term frequency-inverse document frequency), Word2Vec (word-to-vector), GloVe, and FastText. The summary algorithm then uses the trained k-means model to rank the candidate sentences and create a summary from the highest-ranked sentences. In terms of F1-score, the model achieved 51.91% ROUGE-1, 18.77% ROUGE-2, and 29.72% ROUGE-L, compared with 52.33% ROUGE-1, 16.17% ROUGE-2, and 33.09% ROUGE-L for a competitive abstractive model. The advantage of the proposed model is that it performs well with $O(n,k,p) = O(n(k+2/p)) + O(n\log_2 n) + O(np) + O(nk^2) + O(k)$ time complexity.
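
The clustering-then-ranking pipeline described above can be sketched roughly as follows, using only a bag-of-words representation. The abstract does not state how the trained k-means model ranks sentences, so picking the sentence nearest each centroid is an assumption here, as are the function names and the choice to seed centroids with the first k sentences.

```python
import math
import re
from collections import Counter


def bow_vectors(sentences):
    """Map each sentence to a bag-of-words count vector over a shared vocabulary."""
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    vocab = sorted({w for t in tokens for w in t})
    return [[float(Counter(t)[w]) for w in vocab] for t in tokens]


def kmeans(vectors, k, iterations=10):
    """Plain Lloyd's k-means; the first k vectors seed the centroids (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def summarize(sentences, k):
    """Return one representative sentence per cluster, in document order."""
    k = min(k, len(sentences))
    vectors = bow_vectors(sentences)
    centroids, assignment = kmeans(vectors, k)
    chosen = set()
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            # Assumed ranking rule: the sentence closest to the centroid wins.
            chosen.add(min(members, key=lambda i: math.dist(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]
```

Swapping `bow_vectors` for TF-IDF or embedding vectors would leave the rest of the sketch unchanged, which mirrors how the abstract compares several text representations under the same k-means step.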

Text Region Detection using Adaptive Character-Edge Map From Natural Image (자연영상에서 적응적 문자-에지 맵을 이용한 텍스트 영역 검출)

  • Park, Jong-Cheon;Hwang, Dong-Guk;Jun, Byoung-Min
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.8 no.5
    • /
    • pp.1135-1140
    • /
    • 2007
  • This paper proposes an edge-based text region detection algorithm using adaptive character-edge maps that are independent of character size and character string orientation in natural images. First, labeled images are obtained from edge images, and, to search for characters, adaptive character-edge maps by way grammar are applied to the labeled images. Next, the selected label images are clustered according to the distance to their neighbors, yielding text region candidates. Finally, the text region candidates are verified using empirical rules and horizontal/vertical projection profiles based on the orientation of the text region. Experiments show that the text region detection algorithm is robust to variation in character size, orientation, and background complexity.
