Search | Korea Science

A Comparative Study of Word Embedding Models for Arabic Text Processing

Assiri, Fatmah;Alghamdi, Nuha
- International Journal of Computer Science & Network Security, v.22 no.8, pp.399-403, 2022
Natural-language texts are analyzed to recover their intended meaning so that they can be classified according to the problem under study. One way to represent words is to generate real-valued vectors that encode their meaning; this is called word embedding. Similarities between word representations are then measured to identify the class of a text. Word embeddings can be created with the word2vec technique. More recently, fastText was introduced and is reported to give better results when used with classifiers. In this paper, we study the performance of well-known classifiers using both word-embedding techniques on an Arabic dataset. We applied them to real data collected from Wikipedia and found that word2vec and fastText achieved similar accuracy with all of the classifiers used.
https://doi.org/10.22937/IJCSNS.2022.22.8.50
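
The abstract above says that similarities between word representations are measured, without naming the measure. A minimal sketch, assuming cosine similarity (a common choice, not confirmed by the abstract) and using made-up 3-dimensional vectors in place of real word2vec or fastText output; the names `toy_embeddings` and `most_similar` are hypothetical:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, invented for illustration only;
# real embeddings would come from a trained word2vec or fastText model.
toy_embeddings = {
    "kitab": [0.9, 0.1, 0.0],
    "maktaba": [0.8, 0.3, 0.1],
    "bahr": [0.0, 0.2, 0.9],
}


def most_similar(word, embeddings):
    """Return the other word whose vector is closest by cosine similarity."""
    target = embeddings[word]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    return max(scored)[1]
```

In this toy setup, `most_similar("kitab", toy_embeddings)` picks the vector pointing in nearly the same direction; a classifier would typically consume the vectors themselves rather than pairwise similarities.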

An End-to-End Sequence Learning Approach for Text Extraction and Recognition from Scene Image

Lalitha, G.;Lavanya, B.
- International Journal of Computer Science & Network Security, v.22 no.7, pp.220-228, 2022
Images often carry useful information, so detecting text in scene images is important. The purpose of the proposed work is to recognize text in scene images, for example signboards placed along highways. Detecting text on highway signboards plays a vital role in road safety. In the initial stage, preprocessing techniques are applied to sharpen the image and enhance its features. Morphological operators are likewise applied to close small gaps between objects. We propose a two-phase algorithm for extracting and recognizing text in scene images. In phase I, text is extracted by applying image preprocessing techniques such as blurring, erosion, and top-hat, followed by thresholding and morphological gradient with fixed kernel sizes; a Canny edge detector is then applied to detect the text in the scene image. In phase II, the text is recognized using MSER (Maximally Stable Extremal Regions) and OCR. The work targets text in scene images from the popular dataset repositories SVT, ICDAR 2003, and MSRA-TD 500, whose images were captured under various illumination conditions and angles. The proposed algorithm produces higher accuracy in minimal execution time compared with state-of-the-art methodologies.
https://doi.org/10.22937/IJCSNS.2022.22.7.27

Bibliometric Network Analysis on Low Cost Carrier Research (저가항공 관련 국내학술지 네트워크 텍스트 분석)

Rha, Jin-Sung;Choi, Dong-Hyun
- Journal of the Korean Society for Aviation and Aeronautics, v.23 no.1, pp.14-23, 2015
This study applied network text analysis to reveal the scope and trends of low cost carrier studies. We analyzed low cost carrier research published in Korean journals and news articles. The results showed that the research topics form three clusters. The first dimension consists of articles investigating growth in the low cost carrier industry. The second dimension is associated with service characteristics. The last dimension has strong ties to the organizational and human resource dimension. We used Krkwic, Krtitle, Netdraw, and Ucinet 6.0 to conduct the network text analysis. This study suggests directions for future low cost carrier research.
https://doi.org/10.12985/ksaa.2015.23.1.014

Hot Topic Discovery across Social Networks Based on Improved LDA Model

Liu, Chang;Hu, RuiLin
- KSII Transactions on Internet and Information Systems (TIIS), v.15 no.11, pp.3935-3949, 2021
With the rapid development of the Internet and big data technology, various online social network platforms have been established, producing massive amounts of information every day. Hot topic discovery aims to dig out meaningful content that users are commonly concerned about from the massive information on the Internet. Most existing hot topic discovery methods focus on a single network data source; they can hardly grasp hot spots as a whole, nor meet the challenges of text sparsity and topic hotness evaluation in cross-network scenarios. This paper proposes a novel hot topic discovery method across social networks based on an improved LDA model, which first integrates the text information from multiple social network platforms into a unified data set, then obtains the potential topic distribution in the text through the improved LDA model. Finally, it adopts a heat evaluation method based on the word frequency of topic label words to take the latent topic with the highest heat value as a hot topic. This paper obtains data from online social networks and constructs a cross-network topic discovery data set. The experimental results demonstrate the superiority of the proposed method compared to baseline methods.
https://doi.org/10.3837/tiis.2021.11.004
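
The final step described above, choosing the latent topic with the highest heat value from the word frequency of its label words, can be sketched roughly as follows. The abstract does not give the heat formula, so this sketch assumes heat is simply the summed corpus frequency of a topic's label words; the function names and the toy data are hypothetical, and the topics are supplied by hand rather than produced by an LDA model:

```python
from collections import Counter


def topic_heat(label_words, word_counts):
    """Assumed heat score: total corpus frequency of the topic's label words."""
    return sum(word_counts[word] for word in label_words)


def hottest_topic(topics, documents):
    """Return (topic name with the highest heat, all heat scores).

    topics: dict mapping a topic name to its list of label words.
    documents: list of tokenized documents merged from all platforms.
    """
    word_counts = Counter(token for doc in documents for token in doc)
    heats = {
        name: topic_heat(words, word_counts) for name, words in topics.items()
    }
    best = max(heats, key=heats.get)
    return best, heats


# Hypothetical toy data standing in for a unified cross-network data set.
documents = [
    ["flood", "rain", "city"],
    ["rain", "storm"],
    ["match", "goal"],
]
topics = {"weather": ["rain", "storm"], "sports": ["goal", "match"]}
best, heats = hottest_topic(topics, documents)
```

A real pipeline would likely normalize for topic size or document length; the unweighted sum here is only the simplest reading of the abstract.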

An Enhanced Text Mining Approach using Ensemble Algorithm for Detecting Cyber Bullying

Z.Sunitha Bai;Sreelatha Malempati
- International Journal of Computer Science & Network Security, v.23 no.5, pp.1-6, 2023
Text mining (TM) is widely used to process unstructured text documents and the data they contain across various domains. Text mining is also known as text classification. It is popular in domains such as movie reviews, product reviews on e-commerce websites, sentiment analysis, topic modeling, and cyberbullying detection in social media messages. Cyberbullying is the abuse of a person with insulting language; personal abuse, sexual harassment, and other types of abuse all come under cyberbullying. Several existing systems have been developed to detect bullying words based on their context in social networking sites (SNS), which have become a platform for bullying. In this paper, an Enhanced Text Mining Approach using an ensemble algorithm (ETMA) is developed to solve several problems in traditional algorithms and to improve the accuracy, processing time, and quality of the result. ETMA is used to analyze bullying text within social networking sites such as Facebook and Twitter. ETMA is applied to a synthetic dataset collected from various data sources, consisting of 5k messages belonging to the bullying and non-bullying classes. The performance is analyzed in terms of precision, recall, F1-score, and accuracy.
https://doi.org/10.22937/IJCSNS.2023.23.5.1
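
The four measures named at the end of the abstract have standard definitions for a binary task such as bullying versus non-bullying. A minimal sketch of how they are computed from true and predicted labels; the function name and the toy labels are hypothetical, and nothing here reproduces the paper's data or results:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1-score, and accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators; 0.0 is the convention chosen here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical labels: 1 = bullying, 0 = non-bullying.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Reporting all four together matters when classes are imbalanced, since accuracy alone can look high while recall on the bullying class is poor.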

YOLO, EAST : Comparison of Scene Text Detection Performance, Using a Neural Network Model (YOLO, EAST: 신경망 모델을 이용한 문자열 위치 검출 성능 비교)

Park, Chan Yong;Lim, Young Min;Jeong, Seung Dae;Cho, Young Heuk;Lee, Byeong Chul;Lee, Gyu Hyun;Kim, Jin Wook
- KIPS Transactions on Software and Data Engineering, v.11 no.3, pp.115-124, 2022
In this paper, YOLO and EAST models are tested to analyze their performance in detecting text areas in real-world and normal text images. The earlier YOLO models, including YOLOv3, have been known to underperform in detecting text areas in given images, but the recently released YOLOv4 and YOLOv5 achieved promising performance in detecting text areas in various images. Experimental results suggest that both the YOLOv4 and YOLOv5 models can be expected to be widely used for text detection in the field of scene text recognition in the future.
https://doi.org/10.3745/KTSDE.2022.11.3.115

Analysis of Nursing Start-up Trends Using Text Network Analysis (텍스트 네트워크를 활용한 간호창업 연구동향 고찰)

Kim, Juhang
- Journal of the Korea Convergence Society, v.11 no.1, pp.359-367, 2020
The purpose of this study is to explore text data on nursing start-ups. Fifty-five articles were extracted from the MEDLINE, Embase, and Cochrane Library databases. Text network analysis was applied using a Python network program. Keywords with the highest frequency and degree centrality were 'business', 'care', 'nursing', 'healthcare', 'service'. Keywords with the highest degree centrality were 'mission', 'vision', 'team'. Based on the results, nursing entrepreneurship support should be provided to develop competitive nursing services reflecting the specificity and science of nursing, to strengthen business competencies essential for nursing entrepreneurship, to expand nursing expertise, and to present role models. The results will serve as a basis for developing systematic educational programs and theory for nursing start-ups.
https://doi.org/10.15207/JKCS.2020.11.1.359
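
Degree centrality, the measure reported above, counts how many other keywords a keyword is linked to, normalized by the maximum possible number of links. A minimal sketch, assuming that two keywords are linked when they co-occur in the same article (the abstract does not specify how edges were built); the function names and toy keyword lists are hypothetical:

```python
from itertools import combinations


def cooccurrence_graph(documents):
    """Build an undirected graph linking keywords that share a document."""
    graph = {}
    for keywords in documents:
        unique = sorted(set(keywords))
        for keyword in unique:
            graph.setdefault(keyword, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by (number of nodes - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Hypothetical keyword lists, one per article.
documents = [
    ["nursing", "business", "care"],
    ["nursing", "service"],
    ["business", "service"],
]
centrality = degree_centrality(cooccurrence_graph(documents))
```

Note that a frequent keyword need not have high degree centrality: frequency counts occurrences, while centrality counts distinct neighbors.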

Systematic network analysis of herb formula in Traditional East Asian Medicine discloses synergistic operation of medicinal herb pairs with statistical significance

Lee, Jungsul;Jeon, Jongwook;Choi, Chulhee
- CELLMED, v.5 no.2, pp.11.1-11.5, 2015
Traditional East Asian Medicine (TEAM) prescriptions typically consist of several herbs based on the assumption that the herbs operate synergistically and/or cooperate on several related pathways simultaneously. This is a general concept that is widely accepted in TEAM, but it has not been tested systematically. To check this assumption statistically, we text-mined the Inje-ji (仁濟志, Collections of benevolent savings), a traditional Korean medicine text that contains more than 5000 herb-cocktail prescriptions. We created a herb-pairing network based on herb-herb pairing specificity and performed a systematic network analysis. Herbs were shown to be used selectively with other herbs and not randomly. Moreover, herb pairs were more specifically associated with symptoms than were single herbs. Single herbs and combinations of herbs specifically used for diabetes mellitus were successfully identified. In conclusion, herb pairings in TEAM are not randomly constructed; instead, each herb was selectively used with other herbs. In terms of statistical significance, herb pairs were more specifically associated with symptoms than were single herbs alone. Collectively, these results suggest that it may be important to understand the interactions among multiple ingredients contained in herb pairs rather than trying to identify a single compound to resolve symptoms.
https://doi.org/10.5667/tang.2014.0034

The Impact of Transforming Unstructured Data into Structured Data on a Churn Prediction Model for Loan Customers

Jung, Hoon;Lee, Bong Gyou
- KSII Transactions on Internet and Information Systems (TIIS), v.14 no.12, pp.4706-4724, 2020
In this study, the voice of customer (VOC), which is text data containing contact history and counseling details, was analyzed together with various structured data such as company size, loan balance, and savings accounts. To analyze the unstructured data, term frequency-inverse document frequency (TF-IDF) analysis, semantic network analysis, sentiment analysis, and a convolutional neural network (CNN) were implemented. A performance comparison of the models revealed that the predictive model using the CNN provided the best predictive power, followed by the model using TF-IDF, and then the model using semantic network analysis. In particular, a character-level CNN and a word-level CNN were developed separately, and the character-level CNN exhibited better performance, according to an analysis for the Korean language. Moreover, a systematic selection model for optimal text mining techniques was proposed, suggesting which analytical technique is appropriate for analyzing text data depending on the context. This study also provides evidence that the results of previous studies, indicating that individual customers leave when their loyalty and switching cost are low, are also applicable to corporate customers, and suggests that VOC data indicating customers' needs are very effective for predicting their behavior.
https://doi.org/10.3837/tiis.2020.12.005
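
TF-IDF, one of the techniques listed above, weights a term by how often it appears in a document and how rare it is across the collection. A minimal sketch, assuming the plain variant tf × ln(N / df) with relative term frequency; the abstract does not say which TF-IDF variant the authors used, and the function name and toy records are hypothetical:

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# Hypothetical tokenized counseling records, invented for illustration.
records = [
    ["loan", "rate", "loan"],
    ["loan", "branch"],
    ["rate", "complaint"],
]
weights = tf_idf(records)
```

Under this variant a term that appears in every document gets weight zero, which is why smoothed variants are often preferred in practice.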

Using Highly Secure Data Encryption Method for Text File Cryptography

Abu-Faraj, Mua'ad M.;Alqadi, Ziad A.
- International Journal of Computer Science & Network Security, v.21 no.12, pp.53-60, 2021
Many standard methods are used for the cryptography of secret text files and secret short messages. These methods are efficient when the text to be encrypted is small, but their efficiency decreases rapidly as the text size increases. These methods also sometimes have a low level of security; this level depends on the PK length, and the key may sometimes be hacked. In this paper, a new method is introduced to improve the data protection level by using a changeable secret speech file to generate the PK. The Highly Secure Data Encryption (HSDE) method is implemented and tested for data quality levels to ensure that HSDE destroys the data in the encryption phase and recovers the original data in the decryption phase. Some standard methods of data cryptography are also implemented, and comparisons are made to justify the enhancements provided by the proposed method.
https://doi.org/10.22937/IJCSNS.2021.21.12.8
