• Title/Summary/Keyword: Text processing


Neural Text Categorizer for Exclusive Text Categorization

  • Jo, Tae-Ho
    • Journal of Information Processing Systems / v.4 no.2 / pp.77-86 / 2008
  • This research proposes a new neural network for text categorization that uses an alternative to representing documents as numerical vectors. Since the proposed neural network is designed specifically for text categorization, it is called the NTC (Neural Text Categorizer). Numerical vectors representing documents in text mining tasks have two inherent problems: huge dimensionality and sparse distribution. Although various feature selection methods have been developed to address the first problem, the reduced dimension still remains large, and if the dimension is reduced excessively, the robustness of text categorization degrades. Even though the SVM (Support Vector Machine) tolerates huge dimensionality, it does not tolerate sparse distribution. The goal of this research is to address both problems at the same time by proposing a new representation of documents and a new neural network that takes this representation as its input.
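
The abstract leaves the alternative document representation unspecified; as a minimal illustrative sketch (the "string vector" of top-k terms and the overlap score below are this sketch's assumptions, not the paper's actual method):

```python
from collections import Counter
import re

def string_vector(text: str, k: int = 10) -> list[str]:
    """Represent a document by its k most frequent terms rather than
    a sparse, high-dimensional numerical vector."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [term for term, _ in Counter(tokens).most_common(k)]

def overlap(doc_a: str, doc_b: str, k: int = 10) -> float:
    """Similarity in [0, 1] as the fraction of shared top-k terms."""
    a, b = set(string_vector(doc_a, k)), set(string_vector(doc_b, k))
    return len(a & b) / max(len(a), len(b), 1)

print(overlap("neural networks classify text documents",
              "neural networks categorize text documents"))
```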

Research on Keyword-Overlap Similarity Algorithm Optimization in Short English Text Based on Lexical Chunk Theory

  • Na Li;Cheng Li;Honglie Zhang
    • Journal of Information Processing Systems / v.19 no.5 / pp.631-640 / 2023
  • Short-text similarity calculation is one of the hot issues in natural language processing research. Conventional keyword-overlap similarity algorithms consider only lexical item information and neglect the effect of word order, and the optimized variants that do incorporate word order leave the weights hard to determine. In this paper, building on the keyword-overlap similarity algorithm, a short English text similarity algorithm based on lexical chunk theory (LC-SETSA) is proposed, which introduces lexical chunk theory from cognitive psychology into short English text similarity calculation for the first time. Lexical chunks are used to segment short English texts, the segmentation results capture the semantic connotation and fixed word order of the chunks, and the overlap similarity of the lexical chunks is then calculated accordingly. Finally, comparative experiments are carried out, and the results show that the proposed algorithm is feasible, stable, and effective to a large extent.
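
A rough sketch of chunk-based overlap similarity (the tiny chunk lexicon, greedy longest-match segmentation, and Dice-style score are illustrative assumptions, not the paper's exact formulation):

```python
# Hypothetical chunk lexicon; the paper derives chunks from lexical chunk theory.
CHUNKS = {"as a result", "in terms of", "a number of", "take part in"}

def segment(text: str, lexicon: set[str] = CHUNKS) -> list[str]:
    """Greedy longest-match segmentation: known multi-word chunks are kept
    intact (preserving their fixed word order); everything else is a word."""
    words, units, i = text.lower().split(), [], 0
    while i < len(words):
        for n in (4, 3, 2):  # try longer chunks first
            candidate = " ".join(words[i:i + n])
            if candidate in lexicon:
                units.append(candidate)
                i += n
                break
        else:
            units.append(words[i])
            i += 1
    return units

def chunk_overlap_similarity(t1: str, t2: str) -> float:
    """Dice-style overlap over the segmented units."""
    u1, u2 = set(segment(t1)), set(segment(t2))
    return 2 * len(u1 & u2) / ((len(u1) + len(u2)) or 1)

print(chunk_overlap_similarity("as a result the test passed",
                               "the test passed as a result"))
```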

Effects of Medium Experience on Medium Perception and Communication Process

  • Yang, Jae-Ho;Lee, Hyun-Kyu;Suh, Kil-Soo
    • Asia Pacific Journal of Information Systems / v.9 no.3 / pp.1-23 / 1999
  • The objective of this study is to examine media richness theory and the social information processing model by analyzing the effect of media experience on media perception and communication process. To accomplish this objective, a laboratory experiment was conducted. The independent variable was text-medium experience, and a face-to-face medium was added as a control group. The dependent variables were medium perception and communication process, where medium perception includes perceived richness, medium feeling, task satisfaction, and communication satisfaction; communication processes were also analyzed to compare the treatment groups. The results can be summarized in two findings. First, the face-to-face group showed higher perceived richness than the text-medium groups, and the experienced text-medium group perceived the text medium as richer than the inexperienced group did. Second, the experienced text-medium group showed more interaction between subjects than the inexperienced group, as well as more agreement and meta-communication of the kind found in the face-to-face group. These results support media richness theory, in that the face-to-face medium was perceived as richer than the text medium, and they also support the social information processing model through the comparison of the experienced and inexperienced text-medium groups: the text medium, although considered the leanest, can be perceived as richer if users have extensive experience with it.


Patent Document Similarity Based on Image Analysis Using the SIFT-Algorithm and OCR-Text

  • Park, Jeong Beom;Mandl, Thomas;Kim, Do Wan
    • International Journal of Contents / v.13 no.4 / pp.70-79 / 2017
  • Images are an important element in patents, and many experts use images to analyze a patent or to check differences between patents. However, there is little research on image analysis for patents, partly because image processing is an advanced technology and patent images typically consist of visual parts as well as text and numbers. This study suggests two methods that use image processing: the Scale Invariant Feature Transform (SIFT) algorithm and Optical Character Recognition (OCR). The first method, based on SIFT, uses image feature points; through feature matching, it can be applied to calculate the similarity between documents containing these images. In the second method, OCR is used to extract text from the images; using the numbers extracted from an image, the corresponding related text within the text passages can be identified, and document similarity can then be calculated from that text. Comparing the suggested methods with an existing text-only similarity method demonstrates their feasibility. Additionally, the correlation between the two similarity measures is low, which shows that they capture different aspects of the patent content.
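
A hedged OpenCV sketch of the SIFT branch (the file paths, the 0.75 ratio threshold, and the normalization by keypoint count are placeholder choices; the paper's exact matching and scoring may differ):

```python
import cv2  # pip install opencv-python

def sift_similarity(path_a: str, path_b: str) -> float:
    """Fraction of SIFT keypoints in image A with a good match in B
    (Lowe's ratio test) as a crude image-similarity score."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0.0
    pairs = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
    good = [p for p in pairs
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    return len(good) / max(len(kp_a), 1)

# The OCR branch would extract text/numbers from the drawing, e.g. with
# pytesseract: text = pytesseract.image_to_string(cv2.imread("fig1.png"))
print(sift_similarity("patent_a_fig1.png", "patent_b_fig1.png"))
```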

Impact of Instance Selection on kNN-Based Text Categorization

  • Barigou, Fatiha
    • Journal of Information Processing Systems / v.14 no.2 / pp.418-434 / 2018
  • With the increasing use of the Internet and electronic documents, automatic text categorization becomes imperative. Several machine learning algorithms have been proposed for text categorization, and the k-nearest neighbor algorithm (kNN) is known to be one of the best state-of-the-art classifiers for this task. However, kNN suffers from limitations such as its high computational cost when classifying new instances. Instance selection techniques have emerged as highly competitive methods for improving kNN through data reduction. However, previous works have evaluated those approaches only on structured datasets; their performance has not been examined in the text categorization domain, where the dimensionality and size of the dataset are very high. Motivated by these observations, this paper investigates and analyzes the impact of instance selection on kNN-based text categorization in terms of classification accuracy, classification efficiency, and data reduction.
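
The paper evaluates several instance selection techniques; as one classic representative, here is a sketch of Hart's Condensed Nearest Neighbor rule feeding a reduced training set to kNN (the toy Gaussian data stands in for a real text collection):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def condensed_nn(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hart's CNN rule: keep only instances that a 1-NN classifier
    built on the kept set would misclassify."""
    keep, changed = [0], True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            nn = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
            if nn.predict(X[i:i + 1])[0] != y[i]:
                keep.append(i)
                changed = True
    return np.array(keep)

# Toy two-class data; kNN is then trained on the reduced set only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(3, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)
kept = condensed_nn(X, y)
clf = KNeighborsClassifier(n_neighbors=3).fit(X[kept], y[kept])
print(f"kept {len(kept)}/{len(X)} instances, accuracy {clf.score(X, y):.2f}")
```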

Real Scene Text Image Super-Resolution Based on Multi-Scale and Attention Fusion

  • Xinhua Lu;Haihai Wei;Li Ma;Qingji Xue;Yonghui Fu
    • Journal of Information Processing Systems / v.19 no.4 / pp.427-438 / 2023
  • Plenty of work has indicated that single image super-resolution (SISR) models that rely on synthetic datasets are difficult to apply to real scene text image super-resolution (STISR) because of its more complex degradation. The most recent dataset for realistic STISR is TextZoom, but the current methods trained on it have not considered the effect of multi-scale features of text images. In this paper, a multi-scale and attention fusion model for realistic STISR is proposed. A multi-scale learning mechanism is introduced to acquire sophisticated feature representations of text images; spatial and channel attention are introduced to capture local information and inter-channel interaction information; and finally a multi-scale residual attention module is designed by fusing multi-scale learning with the attention mechanisms. Experiments on TextZoom demonstrate that the proposed model increases the average recognition accuracy of the scene text recognizer ASTER by 1.2% compared to the text super-resolution network.
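
A PyTorch sketch of the kind of module described, fusing parallel multi-scale convolutions with channel and spatial attention in a residual branch (channel counts, kernel sizes, and the CBAM-style attention layout are this sketch's assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class MultiScaleAttentionBlock(nn.Module):
    """Residual block: 3x3/5x5/7x7 branches extract multi-scale features,
    which are fused and then reweighted by channel and spatial attention."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, k, padding=k // 2) for k in (3, 5, 7))
        self.fuse = nn.Conv2d(3 * ch, ch, 1)
        self.channel_att = nn.Sequential(      # squeeze-and-excite style
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // 4, 1), nn.ReLU(),
            nn.Conv2d(ch // 4, ch, 1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(      # attention over H x W
            nn.Conv2d(1, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        out = self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
        out = out * self.channel_att(out)
        out = out * self.spatial_att(out.mean(dim=1, keepdim=True))
        return x + out                         # residual connection

print(MultiScaleAttentionBlock()(torch.randn(1, 64, 16, 64)).shape)
```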

Building Hybrid Stop-Words Technique with Normalization for Pre-Processing Arabic Text

  • Atwan, Jaffar
    • International Journal of Computer Science & Network Security / v.22 no.7 / pp.65-74 / 2022
  • In natural language processing, commonly used words such as prepositions are referred to as stop-words; they have no inherent meaning and are therefore ignored in indexing and retrieval tasks. The removal of stop-words from Arabic text has a significant impact in terms of reducing the size of a corpus, which leads to an improvement in the effectiveness and performance of Arabic-language processing systems. This study investigated the effectiveness of applying stop-word list elimination with normalization as a preprocessing step. The idea was to merge a statistical method with a linguistic method to attain the best efficacy, and to compare the effects of this two-pronged approach in reducing corpus size for Arabic natural language processing systems. Three stop-word lists were considered: an Arabic Text Lookup Stop-list, a Frequency-based Stop-list using Zipf's law, and a Combined Stop-list. An experiment was conducted using a selected file from the Arabic Newswire dataset, in which the size of the corpus was compared after removing the words contained in each list. The results showed that the best reduction in size was achieved by the Combined Stop-list with normalization, with a word count reduction of 452,930 and a compression rate of 30%.
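
A minimal sketch of the two-pronged approach: normalize tokens, then remove words found in either a linguistic lookup list or a Zipf-style frequency list (the normalization rules below are common choices for Arabic; the paper's actual lists and thresholds are not reproduced):

```python
import re
from collections import Counter

def normalize(token: str) -> str:
    """Common Arabic normalization: strip diacritics, unify letter variants."""
    token = re.sub(r"[\u064B-\u0652]", "", token)  # remove short-vowel marks
    token = re.sub(r"[أإآ]", "ا", token)            # unify alef variants
    return token.replace("ى", "ي").replace("ة", "ه")

def combined_stoplist(tokens: list[str], lookup: set[str],
                      top_n: int = 50) -> set[str]:
    """Combined list = linguistic lookup list + frequency-based (Zipf) list."""
    zipf_list = {w for w, _ in Counter(tokens).most_common(top_n)}
    return {normalize(w) for w in lookup} | zipf_list

def remove_stopwords(tokens: list[str], stoplist: set[str]) -> list[str]:
    normalized = [normalize(t) for t in tokens]
    kept = [t for t in normalized if t not in stoplist]
    print(f"compression rate: {1 - len(kept) / len(normalized):.0%}")
    return kept

corpus = "في البيت من الكتاب في المدرسة".split()
stop = combined_stoplist([normalize(t) for t in corpus], {"في", "من"}, top_n=1)
print(remove_stopwords(corpus, stop))
```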

A Study of Korean Soft-keyboard Layout for One Finger Text Entry

  • Kong, Byung-Don;Hong, Seung-Kweon;Jo, Seong-Sik;Myung, Ro-Hae
    • IE interfaces / v.22 no.4 / pp.329-335 / 2009
  • Recently, the use of soft keyboards has become widespread as various handheld devices with capable touchscreens, such as PDAs, navigation systems, and mobile phones, have been developed. Soft keyboards differ from traditional hard keyboards like the QWERTY keyboard in several respects: there is no standard character layout, entry is made with one finger, and cognitive processing time matters. In this study, therefore, the optimal soft-keyboard layout for one-finger text entry in a touchscreen environment was investigated among six keyboard layouts developed from the traditional ordering of Korean characters and the usage frequency of vowels and consonants. As a result, the interface following the traditional Korean character order, such as 'ㄱㄴㄷㄹ' or 'ㅏㅑㅓㅕ', was found to be better than the interface arranged by usage frequency. The vowels were most efficient when separated into two parts, located at the right-hand side and right below the consonants. In conclusion, the keyboard layout based on the characteristics and traditional order of Korean characters was the more effective layout, as it minimized cognitive processing time.

A Comparative Study of Word Embedding Models for Arabic Text Processing

  • Assiri, Fatmah;Alghamdi, Nuha
    • International Journal of Computer Science & Network Security / v.22 no.8 / pp.399-403 / 2022
  • Natural-language texts are analyzed to obtain their intended meaning so that they can be classified according to the problem under study. One way to represent words is to generate vectors of real values that encode meaning; this is called word embedding. Similarities between word representations are then measured to identify the class of a text. Word embeddings can be created with the word2vec technique; more recently, fastText was introduced and reportedly provides better results when used with classifiers. In this paper, we study the performance of well-known classifiers when using both embedding techniques on an Arabic dataset. We applied them to real data collected from Wikipedia and found that word2vec and fastText yielded similar accuracy with all of the classifiers used.
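
A brief gensim sketch contrasting the two embedding techniques (the toy tokenized corpus and hyperparameters are illustrative stand-ins for the paper's Wikipedia data and setup):

```python
import numpy as np
from gensim.models import FastText, Word2Vec  # pip install gensim

# Toy tokenized corpus standing in for the collected Arabic Wikipedia data.
sentences = [["النص", "العربي", "تصنيف"],
             ["تحليل", "النص", "العربي"]]

w2v = Word2Vec(sentences, vector_size=50, min_count=1, epochs=20)
ft = FastText(sentences, vector_size=50, min_count=1, epochs=20)

def doc_vector(model, tokens):
    """Average word vectors into a fixed-size document vector,
    a common input representation for downstream classifiers."""
    return np.mean([model.wv[t] for t in tokens if t in model.wv], axis=0)

print(doc_vector(w2v, sentences[0]).shape)  # (50,)
print(ft.wv["العربية"][:3])  # fastText embeds unseen words via character n-grams
```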

Analysis of LinkedIn Jobs for Finding High Demand Job Trends Using Text Processing Techniques

  • Kazi, Abdul Karim;Farooq, Muhammad Umer;Fatima, Zainab;Hina, Saman;Abid, Hasan
    • International Journal of Computer Science & Network Security / v.22 no.10 / pp.223-229 / 2022
  • LinkedIn is one of the most widely used job-hunting and career-development applications in the world, offering a great many opportunities and jobs: according to its statistics, LinkedIn has 738M+ members, 14M+ open jobs, and 55M+ listed companies, with many vacancies posted daily. LinkedIn data has been used for the research work carried out in this paper, which can in turn help LinkedIn and other job-posting applications address the challenge of surfacing the jobs available in the industry. This research applies text processing from natural language processing to LinkedIn datasets with the aim of finding the jobs that appear most often in a given month and/or year, turning the large raw data into a usable source of information. The study therefore uses the Multinomial Naïve Bayes and Linear Support Vector Machine learning algorithms for text classification, trained on a multilingual dataset. The results indicate the most-needed job vacancies in each field, which will help students, job seekers, and entrepreneurs with their career decisions.
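
A compact scikit-learn sketch of the classification step named in the abstract, TF-IDF features fed to Multinomial Naïve Bayes and a linear SVM (the toy job titles and labels are invented stand-ins for the LinkedIn data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented stand-in for the LinkedIn postings dataset (title -> job field).
titles = ["senior python developer", "data scientist machine learning",
          "marketing manager", "digital marketing specialist"]
fields = ["engineering", "engineering", "marketing", "marketing"]

for clf in (MultinomialNB(), LinearSVC()):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(titles, fields)
    print(type(clf).__name__, model.predict(["java backend developer"]))
```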