• Title/Summary/Keyword: text complexity


A Novel VLSI Architecture for Parallel Adaptive Dictionary-Based Text Compression (가변 적응형 사전을 이용한 텍스트 압축방식의 병렬 처리를 위한 VLSI 구조)

  • Lee, Yong-Doo;Kim, Hie-Cheol;Kim, Jung-Gyu
    • The Transactions of the Korea Information Processing Society / v.4 no.6 / pp.1495-1507 / 1997
  • Among the many approaches to text compression, adaptive dictionary schemes based on a sliding window are used very frequently because of their high performance. The LZ77 algorithm is the most efficient practical implementation of such adaptive schemes. This paper presents a VLSI architecture designed to process the LZ77 algorithm in parallel. Compared with other VLSI architectures developed so far, the proposed architecture offers a more viable path to high performance in terms of throughput, efficient implementation of VLSI systolic arrays, and hardware scalability. Independently of the sliding-window size, the system has O(N) complexity for both compression and decompression, where N is the size of the input text, and it requires only a small wafer area.

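To make the sliding-window mechanism concrete, here is a minimal sequential LZ77-style sketch in Python. It is an illustrative reference only: the token format, window size, and function names are assumptions, and it does not reflect the paper's systolic-array parallelization.

```python
def lz77_compress(data: bytes, window: int = 4096, max_match: int = 18):
    """Emit (offset, length, next_bytes) tokens using a sliding window.
    Sequential reference; the paper parallelizes the match search in hardware."""
    i, tokens = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # Search the window [i - window, i) for the longest match of the lookahead.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        nxt = data[i + best_len : i + best_len + 1]   # b'' at end of input
        tokens.append((best_off, best_len, nxt))
        i += best_len + len(nxt)
    return tokens

def lz77_decompress(tokens) -> bytes:
    out = bytearray()
    for off, length, nxt in tokens:
        for _ in range(length):
            out.append(out[-off])      # copy from the already-decoded window
        out += nxt
    return bytes(out)

text = b"abracadabra abracadabra abracadabra"
assert lz77_decompress(lz77_compress(text)) == text
```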

An Interactive Hangul Text Entry Method Using The Numeric Phone Keypad (전화기 숫자 자판을 이용한 대화형 한글 문자 입력 방법)

  • Park, Jae-Hwa
    • The KIPS Transactions: Part B / v.14B no.5 / pp.391-400 / 2007
  • An interactive Hangul input method using the numeric phone keypad, applicable to mobile devices, is introduced. In the proposed method, the user selects, with a single tap, the key corresponding to each alphabet (jamo) of the Korean letter to be entered. The interface then generates the subset of eligible letters for that key sequence, and the user selects the desired letter from the set. This interactive approach transforms the text entry interface from the existing passive, single-level, alphabet-oriented style into a multi-level, interactive, letter-oriented one. The burden of key operations, the major disadvantage of previous methods, caused by multi-tapping to resolve the ambiguity of multiply assigned alphabets in the Hangul automata, is eliminated entirely, although an additional selection step is required to finalize the desired letter. The complexity of Hangul text entry is also reduced, since every letter can be composed from basic alphabet selections in writing-sequence order. The advantages and disadvantages of the proposed method are analyzed experimentally by comparison with existing methods.
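
As a rough illustration of the interactive idea (one tap per jamo, then picking from the eligible candidates), here is a small Python sketch. The key-to-jamo assignment below is a made-up example, not the paper's keypad layout, and syllable composition is omitted.

```python
from itertools import product

# Hypothetical key-to-jamo assignment, for illustration only.
KEY_JAMO = {
    "1": ["ㄱ", "ㅋ", "ㄲ"],
    "2": ["ㄴ", "ㄹ"],
    "4": ["ㅏ", "ㅑ"],
    "5": ["ㅓ", "ㅕ"],
}

def candidates(key_sequence: str):
    """All jamo sequences compatible with the tapped keys; the interactive
    step is the user choosing one of these instead of multi-tapping."""
    pools = [KEY_JAMO[k] for k in key_sequence]
    return ["".join(combo) for combo in product(*pools)]

# Two taps ("1", "4") yield six candidates such as ㄱㅏ, ㅋㅏ, ㄲㅑ ...
print(candidates("14"))
```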

A MVC Framework for Visualizing Text Data (텍스트 데이터 시각화를 위한 MVC 프레임워크)

  • Choi, Kwang Sun;Jeong, Kyo Sung;Kim, Soo Dong
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.39-58 / 2014
  • As the importance of big data and related technologies continues to grow in industry, visualizing the results of big data processing and analysis has come to the fore. Visualization gives people an effective and clear understanding of analysis results, and it also serves as the GUI (Graphical User Interface) that supports communication between people and analysis systems. To make development and maintenance easier, these GUI parts should be loosely coupled from the parts that process and analyze data, and implementing such a loosely coupled architecture calls for design patterns such as MVC (Model-View-Controller), which minimizes coupling between the UI part and the data-processing part. Big data can be classified into structured and unstructured data, and structured data is comparatively easy to visualize. Even so, as the use and analysis of unstructured data has spread, visualization systems are usually developed anew for each project to overcome the limitations of traditional visualization systems built for structured data. For text data, which makes up a huge share of unstructured data, visualization is even more difficult. This stems from the complexity of the technologies for analyzing text, such as linguistic analysis, text mining, and social network analysis, and from the fact that these technologies are not standardized. This situation makes it harder to reuse the visualization system of one project in another, and we assume the reason is a lack of commonality in visualization-system design with expansion to other systems in mind. In this research, we suggest a common information model for visualizing text data and propose a comprehensive, reusable framework, TexVizu, for text visualization. We first survey representative research in text visualization and identify common elements and common patterns across its various cases. We then review and analyze these elements and patterns from three viewpoints, structural, interactive, and semantic, and design an integrated model of text data that represents the elements needed for visualization. The structural viewpoint identifies structural elements of text documents such as title, author, and body. The interactive viewpoint identifies the types of relations and interactions between text documents, such as post, comment, and reply. The semantic viewpoint identifies semantic elements extracted through linguistic analysis of the text and represented as tags classifying entity types such as people, place or location, time, and event. We then extract and select common requirements for visualizing text data, categorized into four types: structure information, content information, relation information, and trend information. Each type of requirement comprises the required visualization techniques, the data, and the goal (what to know). These are the common, key requirements for designing a framework in which the visualization system stays loosely coupled from the data processing or analysis system. Finally, we designed a common text visualization framework, TexVizu, which is reusable and extensible across visualization projects by collaborating with various Text Data Loaders and Analytical Text Data Visualizers via common interfaces such as ITextDataLoader and IATDProvider. TexVizu comprises the Analytical Text Data Model, Analytical Text Data Storage, and Analytical Text Data Controller; the external components are specifications of the interfaces required to collaborate with the framework. As an experiment, we adopted this framework in two text visualization systems: a social opinion mining system and an online news analysis system.
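
The interface names ITextDataLoader and IATDProvider and the controller component come from the abstract above; the Python sketch below only illustrates how such plug-in points might look, with method names and signatures that are assumptions rather than the framework's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

class ITextDataLoader(ABC):
    """External component: loads raw text documents into the framework."""
    @abstractmethod
    def load(self, source: str) -> List[Dict[str, Any]]: ...

class IATDProvider(ABC):
    """External component: produces analytical text data (structural,
    interactive, and semantic elements) from the loaded documents."""
    @abstractmethod
    def analyze(self, documents: List[Dict[str, Any]]) -> Dict[str, Any]: ...

class AnalyticalTextDataController:
    """Controller in the MVC sense: mediates between the loader/provider
    (model side) and whatever visualizer renders the result (view side)."""
    def __init__(self, loader: ITextDataLoader, provider: IATDProvider):
        self.loader, self.provider = loader, provider

    def build_model(self, source: str) -> Dict[str, Any]:
        documents = self.loader.load(source)
        return self.provider.analyze(documents)
```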

Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification (공격 메일 식별을 위한 비정형 데이터를 사용한 유전자 알고리즘 기반의 특징선택 알고리즘)

  • Hong, Sung-Sam;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.20 no.1 / pp.1-10 / 2019
  • Because big-data text mining extracts many features and much data, clustering and classification can suffer from high computational complexity and low reliability of the analysis results. In particular, the term-document matrix obtained through text mining represents term-document features but is a sparse matrix. We designed an advanced genetic algorithm (GA) to select features in text mining for a detection model. Term frequency-inverse document frequency (TF-IDF) is used to reflect document-term relationships in feature extraction, and a predetermined number of features is selected through an iterative process. We also used a sparsity score to improve the performance of the detection model: if a spam mail data set is highly sparse, the detection model performs poorly and it is difficult to find an optimal detection model. In addition, we find a low-sparsity model that also has a high TF-IDF score by using s(F) as the numerator of the fitness function. We verified the performance of the proposed algorithm by applying it to text classification and found that it achieves higher performance (speed and accuracy) in attack mail classification.
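
A minimal sketch of the general idea (GA search over TF-IDF feature subsets with a fitness that rewards TF-IDF weight and penalizes sparsity) is shown below in Python; the fitness weighting is an illustrative assumption and does not reproduce the paper's s(F) definition. The tfidf argument is assumed to be a dense document-term matrix, e.g. TfidfVectorizer(...).fit_transform(docs).toarray().

```python
import numpy as np

def ga_select_features(tfidf: np.ndarray, k=20, pop=30, gens=50, seed=0):
    """Select k term columns of a TF-IDF matrix with a simple genetic algorithm."""
    rng = np.random.default_rng(seed)
    n_terms = tfidf.shape[1]

    def fitness(mask):
        cols = tfidf[:, mask]
        sparsity = (cols == 0).mean()           # fraction of zero entries
        return cols.sum() * (1.0 - sparsity)    # favor dense, high-weight features

    population = [rng.choice(n_terms, size=k, replace=False) for _ in range(pop)]
    for _ in range(gens):
        parents = sorted(population, key=fitness, reverse=True)[: pop // 2]
        children = []
        for _ in range(pop - len(parents)):
            a, b = rng.choice(len(parents), size=2, replace=False)
            pool = np.unique(np.concatenate([parents[a], parents[b]]))  # crossover
            child = rng.choice(pool, size=min(k, len(pool)), replace=False)
            if rng.random() < 0.2:                                      # mutation
                child[rng.integers(len(child))] = rng.integers(n_terms)
            children.append(np.unique(child))
        population = parents + children
    return max(population, key=fitness)
```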

Text Mining of Online News, Social Media, and Consumer Review on Artificial Intelligence Service (인공지능 서비스에 대한 온라인뉴스, 소셜미디어, 소비자리뷰 텍스트마이닝)

  • Li, Xu;Lim, Hyewon;Yeo, Harim;Hwang, Hyesun
    • Human Ecology Research / v.59 no.1 / pp.23-43 / 2021
  • This study used text mining analysis to examine the status of virtual assistant services, explore consumer needs, and suggest consumer-oriented directions. Trendup 4.0 was used to analyze keywords for AI services in online news and social media from 2016 to 2020, and the R program was used to collect consumer comment data and run a topic modeling analysis. The analysis shows that mentions of AI services in mass media and social media have increased steadily. The sentiment analysis showed that consumers feel positive about AI services, both in functional terms such as usefulness and convenience and in emotional terms such as pleasure and interest. However, consumers also experienced complexity and difficulty with AI services and had concerns and fears about using them in the early stages of their introduction. The consumer review analysis revealed topics related to the technology and the access process required for AI services to be provided (Technical Requirements), topics expressing negative feelings about AI services (Consumer Request), and topics about specific functions in the use of AI services (Consumer Life Support Area). Text mining thus enabled this study to confirm consumer expectations and concerns about AI services and to examine the areas of service support that consumers experienced. The review data on each platform also revealed that the potential needs of consumers could be met by expanding the scope of support services and applying platform-specific strengths to provide differentiated services.
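
The study itself used Trendup 4.0 and R; purely as an illustration of the topic-modeling step, here is an equivalent Python sketch with scikit-learn (the review texts are placeholders, not the study's data).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "the assistant is convenient and fun to use for music and reminders",
    "setup was complex and the voice recognition kept failing",
    "helpful support for smart home control and daily schedules",
]  # placeholder consumer comments

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(reviews)              # document-term matrix

lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(dtm)
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_terms = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"Topic {i}: {', '.join(top_terms)}")
```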

Character Region Detection in Natural Image Using Edge and Connected Component by Morphological Reconstruction (에지 및 형태학적 재구성에 의한 연결요소를 이용한 자연영상의 문자영역 검출)

  • Gwon, Gyo-Hyeon;Park, Jong-Cheon;Jun, Byoung-Min
    • Journal of Korea Entertainment Industry Association / v.5 no.1 / pp.127-133 / 2011
  • Characters in natural images carry important information in a variety of contexts. Previous character region detection algorithms fail to detect character regions when the image is complex, the surrounding lighting varies, or the background is similar to the characters, so this paper proposes a method for detecting character regions in natural images using edges and connected components obtained by morphological reconstruction. First, we detect edges with the Canny edge detector, obtain connected components from the local minima/maxima produced by morphological reconstruction of the gray-scale image, and label each detected connected component. Candidate text regions are then merged into a single candidate text region, and the final text region is detected by checking the similarity and adjacency of neighboring candidate characters. Experimental results show that the proposed algorithm improves the accuracy of character region detection using edges and connected components.
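
A rough Python/scikit-image sketch of this kind of pipeline (Canny edges plus connected components from morphological reconstruction, followed by size filtering) is shown below; the thresholds and the merging step are simplified assumptions, not the paper's exact procedure.

```python
import numpy as np
from skimage import io, color, feature, morphology, measure

def candidate_text_regions(image_path: str, min_area: int = 30):
    """Return bounding boxes of candidate character regions in an RGB image."""
    gray = color.rgb2gray(io.imread(image_path))

    # 1. Edge map from the Canny detector.
    edges = feature.canny(gray, sigma=2.0)

    # 2. Morphological reconstruction by dilation to expose bright local structures.
    seed = np.clip(gray - 0.1, 0, 1)
    reconstructed = morphology.reconstruction(seed, gray, method="dilation")
    blobs = (gray - reconstructed) > 0.02

    # 3. Label connected components and keep the sufficiently large ones.
    labels = measure.label(blobs | edges)
    return [r.bbox for r in measure.regionprops(labels) if r.area >= min_area]
```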

The Task-Based Approach to Website Complexity and The Role of e-Tutor in e-Learning Process (e-러닝 학습자 만족을 이끄는 것은 무엇인가? 지각된 웹사이트 복잡성(Perceived Website Complexity)과 e-튜터(e-Tutor)의 역할)

  • Lee, Jae-Beom;Rho, Mi-Jung
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.8 / pp.2780-2792 / 2010
  • In this study, we examine which components of the e-learning environment affect e-learners' satisfaction, focusing on a task-based approach to perceived website complexity (PWC) and on the role of the e-tutor, who works through the internet, telephone, text messages, e-mail, and so on. To test our model, we collected survey responses from 235 online learners of the Korea Culture & Content Agency and analyzed them with SPSS 15.0. Our results show that the relationship between PWC and e-learner satisfaction is negative, and that the roles of the e-tutor are to support the e-learning service and to facilitate recommendation intention. This study provides implications for designing future e-learning services, understanding users' herd behavior, and evaluating the learning process.

Visualization Techniques for Massive Source Code (대용량 소스코드 시각화기법 연구)

  • Seo, Dong-Su
    • The Journal of Korean Association of Computer Education / v.18 no.4 / pp.63-70 / 2015
  • Program source code is a body of complex syntactic information expressed in text form and containing complex logical structures. The structural and logical complexity inside source code becomes a barrier to applying the visualization techniques of traditional big-data approaches once the code grows beyond ten thousand lines. This paper suggests a procedure for visualizing the structural characteristics of source code. For this purpose, it defines internal data structures as well as inter-procedural relationships among functions, and it suggests a way of outlining the structural characteristics of source code by visualizing it in network form. The result of this work can be used as a means of controlling and understanding massive volumes of source code.
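
As one concrete way to obtain the inter-procedural network described above, the sketch below builds a function-level call graph for Python source with the standard ast module and networkx; the paper does not specify this tooling, so treat it as an illustrative stand-in. The resulting graph can then be rendered with nx.draw.

```python
import ast
import networkx as nx

def call_graph(source: str) -> nx.DiGraph:
    """Nodes are defined functions; an edge u -> v means u calls v."""
    tree = ast.parse(source)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    graph = nx.DiGraph()
    graph.add_nodes_from(defined)
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                    and node.func.id in defined):
                graph.add_edge(func.name, node.func.id)
    return graph

sample = """
def load(): return []
def clean(rows): return rows
def main():
    clean(load())
"""
print(call_graph(sample).edges())   # edges from 'main' to 'clean' and 'load'
```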

Using Small Corpora of Critiques to Set Pedagogical Goals in First Year ESP Business English

  • Wang, Yu-Chi;Davis, Richard Hill
    • Asia Pacific Journal of Corpus Research / v.2 no.2 / pp.17-29 / 2021
  • The current study explores small corpora of critiques written by Chinese and non-Chinese university students and how the strategies used by these writers compare with those of high-rated L1 students. Data collection comprises three small corpora of student writing: 20 student critiques from 2017, 23 student critiques from 2018, and 23 critiques from the online MICUSP collection at the University of Michigan. The researchers employ Text Inspector and Lexical Complexity to identify the students' vocabulary knowledge and awareness of syntactic complexity, and WMatrix4® is used to identify and compare lexical and semantic differences among the three corpora. The findings indicate that gaps exist between Chinese and non-Chinese writers in the same university classes in their knowledge of grammatical features and interactional metadiscourse. Chinese writers are more likely to produce shorter clauses and sentences, and the mean number of complex nominals and coordinate phrases is smaller for Chinese students than for non-Chinese and MICUSP writers. Finally, in terms of lexical bundles, Chinese student writers prefer clausal bundles over phrasal bundles, which, according to previous studies, are more often found in the texts of skilled writers. The findings suggest incorporating implicit and explicit instruction through the use of corpora in language classrooms to advance the skills and strategies of all writers, but particularly of Chinese writers of English.
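
Purely to illustrate the kinds of measures discussed (sentence length, lexical variety, recurring lexical bundles), here is a simplified Python sketch; it is not the output of Text Inspector, Lexical Complexity, or WMatrix4®, and the metrics are rough stand-ins.

```python
import re
from collections import Counter

def complexity_profile(text: str, bundle_len: int = 4):
    """Very rough stand-ins for corpus complexity measures."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    bundles = Counter(
        " ".join(tokens[i:i + bundle_len])
        for i in range(len(tokens) - bundle_len + 1)
    )
    return {
        "mean_sentence_length": len(tokens) / max(len(sentences), 1),
        "type_token_ratio": len(set(tokens)) / max(len(tokens), 1),
        "recurring_bundles": [b for b, n in bundles.items() if n > 1],
    }

sample = "The data were analyzed in detail. The data were analyzed again and reported."
print(complexity_profile(sample))
```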

The Impact of Online Reviews on Hotel Ratings through the Lens of Elaboration Likelihood Model: A Text Mining Approach

  • Qiannan Guo;Jinzhe Yan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.10 / pp.2609-2626 / 2023
  • The hotel industry is an example of experiential services: because consumers cannot fully evaluate a service before booking, they rely on online reviews to reduce their perceived risk, yet the explosion of online reviews creates information overload. Consumer cognitive fluency is an individual's subjective experience of the difficulty of processing information, and the complexity of information influences the receiver's attitude, behavior, and purchase decisions. Individuals who cannot process complex information rely on the peripheral route, whereas those who can process more information prefer the central route. This study examines the influence of the complexity of review information on hotel ratings using online review data retrieved from TripAdvisor.com, through a two-level empirical analysis of the factors that affect review value. In the peripheral-route model, we introduce a negative binomial regression model to examine the impact of intuitive, straightforward information on hotel ratings; in the central-route model, we use a Tobit regression model with expert reviews as a moderator variable to analyze the impact of complex information on hotel ratings. The analysis shows that the effects differ between five-star and budget hotels. These findings have immediate implications for hotel managers in better identifying potentially valuable reviews.
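
As a sketch of the peripheral-route step only (the negative binomial regression; the Tobit model is not shown), here is an illustrative Python example with statsmodels on synthetic data; the variable names and the count-valued outcome are assumptions standing in for the paper's actual covariates and dependent variable.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
reviews = pd.DataFrame({
    "helpful_votes": rng.poisson(3, size=200),          # count-valued outcome
    "review_length": rng.integers(20, 400, size=200),   # simple, low-effort cue
    "has_photo": rng.integers(0, 2, size=200),          # another peripheral cue
})

X = sm.add_constant(reviews[["review_length", "has_photo"]])
nb_model = sm.GLM(reviews["helpful_votes"], X,
                  family=sm.families.NegativeBinomial())
print(nb_model.fit().summary())
```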