Search | Korea Science

Building Hybrid Stop-Words Technique with Normalization for Pre-Processing Arabic Text

Atwan, Jaffar
- International Journal of Computer Science & Network Security / v.22 no.7 / pp.65-74 / 2022
In natural language processing, commonly used words such as prepositions are referred to as stop-words; they have no inherent meaning and are therefore ignored in indexing and retrieval tasks. Removing stop-words from Arabic text significantly reduces the size of a text corpus, which improves the effectiveness and performance of Arabic-language processing systems. This study investigated the effectiveness of stop-word list elimination combined with normalization as a preprocessing step. The idea was to merge a statistical method with a linguistic method to attain the best efficacy, and to compare the effects of this two-pronged approach on reducing corpus size for Arabic natural language processing systems. Three stop-word lists were considered: an Arabic Text Lookup Stop-list, a Frequency-based Stop-list using Zipf's law, and a Combined Stop-list. An experiment was conducted using a selected file from the Arabic Newswire data set. In the experiment, the size of the corpus was compared after removing the words contained in each list. The results showed that the best reduction in size was achieved by using the Combined Stop-list with normalization, with a word count reduction of 452930 and a compression rate of 30%.
https://doi.org/10.22937/IJCSNS.2022.22.7.9
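
The combination of a lookup list with a frequency-based list described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy English tokens, the `lookup_list` contents, and the `top_k` cutoff are all assumptions made for illustration, and the normalization step is omitted.

```python
from collections import Counter

def frequency_stoplist(tokens, top_k):
    """Return the top_k most frequent tokens (the head of a Zipf-like distribution)."""
    return {word for word, _ in Counter(tokens).most_common(top_k)}

def remove_stopwords(tokens, stoplist):
    """Drop every token that appears in the stop-list."""
    return [t for t in tokens if t not in stoplist]

def compression_rate(original, reduced):
    """Fraction of tokens removed from the corpus."""
    return 1 - len(reduced) / len(original)

# Toy corpus (hypothetical; the paper uses Arabic Newswire text).
tokens = "the cat sat on the mat and the dog sat on the rug".split()

lookup_list = {"on", "and"}                      # hypothetical hand-built lookup list
freq_list = frequency_stoplist(tokens, top_k=1)  # statistical list from corpus counts
combined = lookup_list | freq_list               # union of both lists

reduced = remove_stopwords(tokens, combined)
print(len(tokens), len(reduced), round(compression_rate(tokens, reduced), 3))
```

Taking the union of the two lists means a word is removed if either the linguistic or the statistical criterion flags it, which is one plausible reading of a "combined" stop-list.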

Analysis of LinkedIn Jobs for Finding High Demand Job Trends Using Text Processing Techniques

Kazi, Abdul Karim;Farooq, Muhammad Umer;Fatima, Zainab;Hina, Saman;Abid, Hasan
- International Journal of Computer Science & Network Security / v.22 no.10 / pp.223-229 / 2022
LinkedIn is one of the leading job-hunting and career-growth applications in the world, offering many opportunities and jobs. According to statistics, LinkedIn has 738M+ members, 14M+ open jobs, and 55M+ companies listed, and many vacancies are posted daily. LinkedIn data was used for the research carried out in this paper, which in turn can help tackle the challenges faced by LinkedIn and other job-posting applications in improving the levels of jobs available in the industry. This research applies text processing, a natural language processing technique, to LinkedIn datasets in order to find the jobs that appear most often in a given month and/or year, thereby turning a large volume of data into a usable source. The study uses Multinomial Naïve Bayes and Linear Support Vector Machine learning algorithms for text classification and developed a trained multilingual dataset. The results indicate the most needed job vacancies in any field. This will help students, job seekers, and entrepreneurs with their career decisions.
https://doi.org/10.22937/IJCSNS.2022.22.10.29

Evaluating the Characteristics of Subversive Basic Fashion Utilizing Text Mining Techniques (텍스트 마이닝(text mining) 기법을 활용한 서브버시브 베이식(subversive basics) 패션의 특성)

Minjung Im
- Journal of Fashion Business / v.27 no.5 / pp.78-92 / 2023
Fashion trends are actively disseminated through social media, which influences both their propagation and consumption. This study explored how users perceive subversive basic fashion in social media videos, by examining the associated concepts and characteristics. In addition, the factors contributing to the style's social media dissemination were identified and its distinctive features were analyzed. Through text mining analysis, 80 keywords were selected for semantic network and CONCOR analysis. TF-IDF and N-gram results indicate that subversive basic fashion involves transformative design techniques such as cutting or layering garments, emphasizing the body with thin fabrics, and creating bold visual effects. Topic modeling suggests that this fashion forms a subculture that resists mainstream norms, seeking individuality by creatively transforming the existing garments. CONCOR analysis categorized the style into six groups: forward-thinking unconventional fashion, bold and unique style, creative reworking, item utilization and combination, pursuit of easy and convenient fashion, and contemporary sensibility. Consumer actions, linked to social media, were shown to involve easily transforming and pursuing personalized styles. Furthermore, creating new styles through the existing clothing is seen as an economic and creative activity that fosters network formation and interaction. This study is significant as it addresses language expression limitations and subjectivity issues in fashion image analysis, revealing factors contributing to content reproduction through user-perceived design concepts and social media-conveyed fashion characteristics.
https://doi.org/10.12940/jfb.2023.27.5.78
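
For readers unfamiliar with the TF-IDF and N-gram measures mentioned in this abstract, the following is a minimal sketch. The weighting variant (term frequency divided by document length, times `log(N / df)`) is an assumption, since text mining tools differ in the exact formula, and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights

def ngrams(tokens, n=2):
    """Return all contiguous n-token tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical keyword lists extracted from three video descriptions.
docs = [["cut", "layer", "sheer"], ["layer", "bold"], ["layer", "cut"]]
w = tf_idf(docs)
print(w[0])
print(ngrams(docs[0]))
```

A term that appears in every document ("layer" here) receives zero weight under this variant, which is why TF-IDF surfaces distinctive keywords rather than merely frequent ones.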

Analysis of Xiaomi Trends Using Big Data - Based on Customer Perception at Domestic and Global - (빅데이터를 활용한 샤오미 동향분석 - 국내외 고객인식을 바탕으로 -)

Eunji Lee;Jaeyoung Moon
- Journal of Korean Society for Quality Management / v.52 no.2 / pp.323-340 / 2024
Purpose: This study aims to offer useful suggestions by analyzing Xiaomi through big data analysis of customer-perception data collected with Textom. Methods: Data were collected by scraping social media through the Textom site. Preprocessing removed and organized text that was duplicated, irrelevant, or meaningless. The resulting data were analyzed using Textom and Ucinet 6.0 with Text Analysis, WordCloud, TF-IDF, Network Analysis, and Emotional Analysis. Results: Although the text analysis results for Xiaomi were similar at home and abroad, perceptions in Korea centered on Xiaomi-related smart home products and cost-effectiveness, while perceptions in foreign countries centered on smartphone functions and performance. Both domestically and globally, the perception of Xiaomi was found to be positive, and implications were presented based on these results. Conclusion: Based on the results, if the products' performance or competitiveness is regarded as meaningful in the market, there is expected to be an opportunity to change the overall image of Chinese products.
https://doi.org/10.7469/JKSQM.2024.52.2.323

Topic Model Analysis of Research Trend on Spatial Big Data (공간빅데이터 연구 동향 파악을 위한 토픽모형 분석)

Lee, Won Sang;Sohn, So Young
- Journal of Korean Institute of Industrial Engineers / v.41 no.1 / pp.64-73 / 2015
The recent emergence of spatial big data has attracted the attention of various research groups. This paper analyzes research trends on spatial big data by text mining the related Scopus DB. We apply topic modeling and network analysis to the extracted abstracts of articles related to spatial big data. Optics, astronomy, and computer science were observed to be the major areas of spatial big data analysis. The major topics discovered in the articles relate to mobile, cloud, and smart services for spatial big data in urban settings. Trends of the discovered topics over time are provided along with the results of the topic network. We expect that the uncovered areas of spatial big data research can be explored further.
https://doi.org/10.7232/JKIIE.2015.41.1.064

Dysarthric speaker identification with different degrees of dysarthria severity using deep belief networks

Farhadipour, Aref;Veisi, Hadi;Asgari, Mohammad;Keyvanrad, Mohammad Ali
- ETRI Journal / v.40 no.5 / pp.643-652 / 2018
Dysarthria is a degenerative disorder of the central nervous system that affects the control of articulation and pitch; therefore, it affects the uniqueness of sound produced by the speaker. Hence, dysarthric speaker recognition is a challenging task. In this paper, a feature-extraction method based on deep belief networks is presented for the task of identifying a speaker suffering from dysarthria. The effectiveness of the proposed method is demonstrated and compared with well-known Mel-frequency cepstral coefficient features. For classification purposes, the use of a multi-layer perceptron neural network is proposed with two structures. Our evaluations using the universal access speech database produced promising results and outperformed other baseline methods. In addition, speaker identification under both text-dependent and text-independent conditions is explored. The highest accuracy achieved using the proposed system is 97.3%.
https://doi.org/10.4218/etrij.2017-0260

Rapid and Brief Communication GPU implementation of neural networks

Oh, Kyoung-Su;Jung, Kee-Chul
- 한국HCI학회:학술대회논문집 (HCI Society of Korea: Conference Proceedings) / 2007.02c / pp.322-325 / 2007
A graphics processing unit (GPU) is used to speed up an artificial neural network. It implements the matrix multiplication of a neural network to improve the time performance of a text detection system. Preliminary results showed a 20-fold performance improvement using an ATI RADEON 9700 PRO board. The parallelism of a GPU is fully utilized by accumulating many input feature vectors and weight vectors, then converting the many inner-product operations into one matrix operation. Further research areas include benchmarking the performance with various hardware and GPU-aware learning algorithms. (c) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
https://doi.org/10.1016/j.patcog.2004.01.013
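
The restructuring this abstract describes, turning many inner products into one matrix operation, can be shown with a small CPU-only sketch. It illustrates only the algebra, not the GPU implementation; the input and weight values are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matmul(a, b):
    """Plain matrix product of a (n x k) and b (k x m), both lists of rows."""
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]

# Hypothetical data: 3 input feature vectors and 2 neurons (one weight vector each).
inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = [[0.5, -1.0], [2.0, 0.0]]

# One inner product per (input, neuron) pair.
one_by_one = [[dot(x, w) for w in weights] for x in inputs]

# The same values from a single product X * W^T over the stacked inputs.
weights_t = [list(col) for col in zip(*weights)]
batched = matmul(inputs, weights_t)

print(batched)
```

Stacking the inputs into one matrix exposes all the inner products to a single operation, which is the form a parallel device can execute efficiently.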

Robust Quick String Matching Algorithm for Network Security (네트워크 보안을 위한 강력한 문자열 매칭 알고리즘)

Lee, Jong Woock;Park, Chan Kil
- Journal of Korea Society of Digital Industry and Information Management / v.9 no.4 / pp.135-141 / 2013
String matching is one of the key algorithms in network security, and many areas could benefit from a faster string matching algorithm. Based on the Boyer-Moore (BM) algorithm, the most efficient string matching algorithm in usual applications, a novel algorithm called RQS is proposed. RQS uses an improved bad character heuristic to achieve larger shift values and an enhanced good suffix heuristic to dramatically improve worst-case performance. The two heuristics, combined with a novel determinant condition for switching between them, enable RQS to achieve higher performance than BM in both normal and worst-case situations. The experimental results show that RQS is many times more efficient than BM in the worst case, and that the longer the pattern, the larger the performance improvement. The performance of RQS is 7.57~36.34% higher than BM in English text searching, 16.26~26.18% higher than BM in uniformly random text searching, and 9.77% higher than BM in searching the real-world Snort pattern set.
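
As background for the bad character heuristic named in this abstract, here is a minimal sketch of the classic Horspool simplification of Boyer-Moore. It is not RQS, whose improved heuristics and switching condition are not specified in the abstract; the function name and test strings are illustrative assumptions.

```python
def horspool_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character table: for each character in pattern[:-1], the distance
    # from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the table entry for the text character aligned with the
        # last pattern position; characters absent from the table skip m.
        pos += shift.get(text[pos + m - 1], m)
    return hits

print(horspool_search("abracadabra", "abra"))
```

The larger the shift the table allows, the fewer alignments are examined, which is the quantity the abstract's "bigger shift value" refers to.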

Analysis of Laughter Therapy Trend Using Text Network Analysis and Topic Modeling

LEE, Do-Young
- Journal of Wellbeing Management and Applied Psychology / v.5 no.4 / pp.33-37 / 2022
Purpose: This study aims to understand the trends and central concepts of domestic research on laughter therapy. The analysis used a total of 72 theses retrieved with the keyword 'laughter therapy' and published from 2007 to 2021. Research design, data and methodology: This study built and analyzed a keyword co-occurrence network, analyzed the types of research through topic modeling, and examined the resulting word cloud and sociogram. The keyword data, cleaned through preprocessing, were converted to a 1-mode matrix and analyzed by centrality analysis and topic modeling using the NetMiner (version 4.4) program. Results: The keywords that appeared most often over the last 14 years were laughter therapy, depression, the elderly, and stress. The five topics identified in the thesis data from 2007 to 2021 were therapy, cognitive behavior, quality of life, stress, and the elderly. Conclusions: This study identified the flow and trends of domestic laughter therapy research topics over the last 14 years; research on laughter therapy that reflects changes over time should continue in the future.
https://doi.org/10.13106/jwmap.2022.Vol5.no4.33
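
The keyword co-occurrence network and 1-mode conversion mentioned in this abstract can be sketched generically as follows. This is not NetMiner's procedure, which the abstract does not detail; the keyword lists are hypothetical and degree is used as one simple centrality measure.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count, for each unordered keyword pair, the documents containing both."""
    pairs = Counter()
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs

def degree(pairs):
    """Number of distinct co-occurring neighbors per keyword."""
    deg = Counter()
    for a, b in pairs:
        deg[a] += 1
        deg[b] += 1
    return deg

# Hypothetical keyword lists, one per thesis (a 2-mode thesis-by-keyword view).
docs = [
    ["laughter therapy", "depression", "elderly"],
    ["laughter therapy", "stress"],
    ["laughter therapy", "depression"],
]

pairs = cooccurrence(docs)  # 1-mode keyword-by-keyword edge weights
deg = degree(pairs)
print(pairs.most_common(1), deg.most_common(1))
```

Collapsing the thesis-by-keyword data into keyword-by-keyword counts is what makes centrality measures applicable to the keywords themselves.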

Improving on Matrix Factorization for Recommendation Systems by Using a Character-Level Convolutional Neural Network (문자 수준 컨볼루션 뉴럴 네트워크를 이용한 추천시스템에서의 행렬 분해법 개선)

Son, Donghee;Shim, Kyuseok
- KIISE Transactions on Computing Practices / v.24 no.2 / pp.93-98 / 2018
Recommendation systems are used to provide items of interest to users in order to maximize a company's profit. Matrix factorization, based on an incomplete user-item rating matrix, is frequently used by recommendation systems. However, as the number of items and users increases, it becomes difficult to make accurate recommendations due to the sparsity of data. To overcome this drawback, the use of text data related to items was recently suggested for matrix factorization algorithms. Furthermore, among these kinds of matrix factorization algorithms, a word-level convolutional neural network was shown to be effective in extracting word-level features from the text data. However, the word-level convolutional neural network requires learning a large number of parameters. Thus, we propose a matrix factorization algorithm that uses a character-level convolutional neural network to extract character-level features from the text data. We also conducted a performance study with real-life datasets to show the effectiveness of the proposed matrix factorization algorithm.
https://doi.org/10.5626/KTCP.2018.24.2.93
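
The baseline that this abstract builds on, matrix factorization over an incomplete rating matrix, can be sketched as follows. This covers only plain matrix factorization trained with stochastic gradient descent; the character-level convolutional network the paper proposes is not included. The ratings, the hyperparameters, and all function names are assumptions for illustration.

```python
import random

def predict(p, q, u, i):
    """Predicted rating: inner product of user and item factor vectors."""
    return sum(a * b for a, b in zip(p[u], q[i]))

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Fit user/item factors to observed (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    p = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(p, q, u, i)
            for f in range(k):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on squared error with L2 regularization.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q

def rmse(ratings, p, q):
    """Root-mean-square error over the observed ratings."""
    return (sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

# Hypothetical sparse ratings: only 6 of the 9 user-item cells are observed.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

p0, q0 = factorize(ratings, 3, 3, epochs=0)  # untrained factors for comparison
p, q = factorize(ratings, 3, 3)

print(rmse(ratings, p0, q0), rmse(ratings, p, q))
print(predict(p, q, 0, 2))  # estimate for an unobserved cell
```

Only observed cells contribute to the updates, which is why sparsity hurts: a user or item with few ratings has poorly constrained factors, and that is the gap that item text features are meant to fill.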
