• Title/Summary/Keyword: Text data

Lightweight Named Entity Extraction for Korean Short Message Service Text

  • Seon, Choong-Nyoung;Yoo, Jin-Hwan;Kim, Hark-Soo;Kim, Ji-Hwan;Seo, Jung-Yun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.5 no.3 / pp.560-574 / 2011
  • In this paper, we propose a hybrid method that combines a Machine Learning (ML) algorithm with a rule-based algorithm to implement a lightweight Named Entity (NE) extraction system for Korean SMS text. NE extraction from Korean SMS text is a challenging task because of the limited resources of a mobile phone, corruption in the input text, the need to extend the system with personal information stored on the phone, and the sparsity of training data. The proposed hybrid method, which retains the advantages of both statistical ML and rule-based algorithms, provides fully automated procedures for combining ML approaches with their correction rules using a threshold-based soft decision function. The method is applied to Korean SMS texts to extract person names and location names, which are key information in a personal appointment management system. Our proposed system achieved an F-measure of 80.53% in this domain, superior to that of conventional ML approaches.
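The abstract does not give the form of the threshold-based soft decision function, so the following is only a minimal sketch of one way such a combination could work: a confident ML label is kept, and a correction rule may override it when the confidence falls below a threshold. The names `soft_decision` and `location_suffix_rule`, the threshold value, and the suffix rule itself are assumptions made for illustration, not the authors' method.

```python
from typing import Callable, Optional


def soft_decision(
    token: str,
    ml_label: str,
    ml_confidence: float,
    rule: Callable[[str], Optional[str]],
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> str:
    """Keep a confident ML label; otherwise let a correction rule override it."""
    if ml_confidence >= threshold:
        return ml_label
    corrected = rule(token)
    # Fall back to the ML label when the rule has nothing to say.
    return corrected if corrected is not None else ml_label


def location_suffix_rule(token: str) -> Optional[str]:
    """Hypothetical rule: a token ending in "역" (station) is a location."""
    return "LOCATION" if token.endswith("역") else None


if __name__ == "__main__":
    print(soft_decision("강남역", "O", 0.55, location_suffix_rule))
    print(soft_decision("강남역", "PERSON", 0.95, location_suffix_rule))
```

With this design, the rules only act on low-confidence predictions, so a well-trained model is not degraded by overly broad rules.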

Competitive intelligence in Korean Ramen Market using Text Mining and Sentiment Analysis

  • Kim, Yoosin;Jeong, Seung Ryul
    • Journal of Internet Computing and Services / v.19 no.1 / pp.155-166 / 2018
  • These days, online media such as blogospheres, online communities, and social networking sites provide a vast amount of user-generated content (UGC) from which market intelligence and business insight can be discovered. Businesses have long been interested in consumers and constantly need approaches for identifying consumers' opinions and competitive advantages in a competitive market. Analyzing consumers' opinions about a company and its rivals can help decision makers gain an in-depth, fine-grained understanding of the human and social behavioral dynamics underlying the competition. To compare rival products and companies, we conducted a competitive analysis using text mining on online UGC for two popular, competing ramen products, a market leader and a market follower, in the Korean instant noodle market. Furthermore, to overcome the lack of a Korean sentiment lexicon, we developed a domain-specific sentiment dictionary for Korean text. We gathered 19,386 blog posts and forum messages, developed the Korean sentiment dictionary, and defined a taxonomy for categorization. We employed sentiment analysis to present consumers' opinions and statistical analysis to demonstrate the differences between the competitors. Our results show that the sentiment revealed by text mining clearly differentiates the two rival noodles and convincingly confirms that one is a market leader and the other a follower. We expect this comparison to help business decision makers understand the rich, in-depth competitive intelligence hidden in social media.

Analysis and Localization of freeWAIS-sf (FreeWAIS-sf의 분석 및 한글화)

  • O, Jeong-Seok;Kim, Ji-Seung;Lee, Jun-Ho;Lee, Sang-Ho
    • Journal of KIISE: Computing Practices and Letters / v.5 no.5 / pp.611-618 / 1999
  • Efficient and effective access to needed information has become an important factor in the modern information society. Many information retrieval (IR) systems have been developed to retrieve needed information from large amounts of data. However, most freely available IR systems have been developed for English text rather than Korean text. In this research, we analyzed the IR system freeWAIS-sf and localized it using the Korean morphological analyzer HAM. The localized freeWAIS-sf can handle English and Korean text simultaneously. We also modified the weighting scheme of freeWAIS-sf. The experimental results show that the modified weighting scheme outperforms the original one in terms of retrieval effectiveness.

A Text Mining Approach to the Comparative Analysis of the Blockchain Issues : South Korea and the United States (텍스트 마이닝을 활용한 블록체인 이슈 분석 : 한국과 미국)

  • Shon, Saeah;Jeon, Byeong-Jin;Kim, Hee-Woong
    • Journal of Information Technology Services / v.18 no.1 / pp.45-61 / 2019
  • Blockchain technology, which enables transparent transactions among individuals without central control, opens up diverse business possibilities. Blockchain is also expected to have a ripple effect across society, including finance, manufacturing, distribution, and the public sector. Previous studies on blockchain have dealt with its functional features and its application to industrial and public fields. For a new technology such as blockchain, it is necessary to understand social perception in order to create an environment for technological development, but research on this is lacking. Therefore, this study aims to identify implications for industrial and policy direction by analyzing blockchain-related issues in South Korea and the US through text mining. For each country, we collected blockchain-related text data from online communities and internet articles, and then performed co-occurrence analysis and topic modeling on each data set. As a result, we found commonalities and differences in the keywords and topics extracted from social media in the two countries. Based on them, we offer suggestions for building a sound blockchain ecosystem and directions for future research.
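As a rough illustration of the co-occurrence analysis mentioned in this abstract, the sketch below counts how many documents contain each unordered pair of terms. It assumes the documents are already tokenized; the function name `cooccurrence_counts` and the toy documents are hypothetical, and the study's actual window size and preprocessing are not stated in the abstract.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered term pair, the documents containing both terms."""
    counts = Counter()
    for tokens in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    docs = [
        ["blockchain", "bitcoin", "regulation"],
        ["blockchain", "bitcoin"],
        ["blockchain", "finance"],
    ]
    for pair, n in cooccurrence_counts(docs).most_common(3):
        print(pair, n)
```

The resulting pair counts can serve as edge weights in a keyword network, which is a common way to visualize such results.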

Building Hybrid Stop-Words Technique with Normalization for Pre-Processing Arabic Text

  • Atwan, Jaffar
    • International Journal of Computer Science & Network Security / v.22 no.7 / pp.65-74 / 2022
  • In natural language processing, commonly used words such as prepositions are referred to as stop-words; they have no inherent meaning and are therefore ignored in indexing and retrieval tasks. Removing stop-words from Arabic text significantly reduces the size of a text corpus, which improves the effectiveness and performance of Arabic-language processing systems. This study investigated the effectiveness of applying stop-word list elimination with normalization as a preprocessing step. The idea was to merge a statistical method with a linguistic method to attain the best efficacy, and to compare the effects of this two-pronged approach in reducing corpus size for Arabic natural language processing systems. Three stop-word lists were considered: an Arabic Text Lookup Stop-list, a Frequency-based Stop-list using Zipf's law, and a Combined Stop-list. An experiment was conducted using a selected file from the Arabic Newswire data set. In the experiment, the size of the corpus was compared after removing the words contained in each list. The results showed that the best reduction in size was achieved by using the Combined Stop-list with normalization, with a word count reduction of 452930 and a compression rate of 30%.
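The combination of a lookup stop-list with a frequency-based one can be sketched as follows. This is a simplified illustration, not the paper's procedure: taking the `top_k` most frequent tokens stands in for a Zipf's-law cutoff, the normalization step is omitted, and the function names and the English toy tokens are assumptions.

```python
from collections import Counter


def frequency_stoplist(tokens, top_k):
    """Treat the top_k most frequent tokens as stop-words (a crude Zipf-style cutoff)."""
    return {word for word, _ in Counter(tokens).most_common(top_k)}


def remove_stopwords(tokens, stoplist):
    """Drop every token that appears in the stop-list."""
    return [t for t in tokens if t not in stoplist]


def reduction_rate(original, filtered):
    """Fraction of tokens removed from the corpus."""
    return 1 - len(filtered) / len(original)


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the dog sat".split()
    lookup_list = {"on", "and"}  # hypothetical hand-built lookup list
    combined = lookup_list | frequency_stoplist(corpus, top_k=2)
    filtered = remove_stopwords(corpus, combined)
    print(filtered, round(reduction_rate(corpus, filtered), 2))
```

Taking the union of the two lists means a word is removed if either the linguistic list or the corpus statistics flag it, which is why a combined list can shrink the corpus more than either list alone.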

On the Analysis of Natural Language Processing Morphology for the Specialized Corpus in the Railway Domain

  • Won, Jong Un;Jeon, Hong Kyu;Kim, Min Joong;Kim, Beak Hyun;Kim, Young Min
    • International Journal of Internet, Broadcasting and Communication / v.14 no.4 / pp.189-197 / 2022
  • Today, we are exposed to various text-based media such as newspapers, Internet articles, and SNS, and the amount of text data we encounter has increased exponentially with the recent availability of Internet access on mobile devices such as smartphones. Collecting useful information from large amounts of text is called text analysis, and with the recent development of artificial intelligence it is performed using technologies such as Natural Language Processing (NLP). For this purpose, morpheme analyzers based on everyday language have been released and are in use. Pre-trained language models, which acquire natural language knowledge through unsupervised learning on large corpora, have recently become a very common component of natural language processing, but conventional morpheme analyzers are of limited use in specialized fields. In this paper, as preliminary work toward developing a language model for natural language analysis specialized in the railway field, we present a procedure for constructing a corpus specialized in the railway field.

Violation Pattern Analysis for Good Manufacturing Practice for Medicine using t-SNE Based on Association Rule and Text Mining (우수 의약품 제조 기준 위반 패턴 인식을 위한 연관규칙과 텍스트 마이닝 기반 t-SNE분석)

  • Jun-O, Lee;So Young, Sohn
    • Journal of Korean Society for Quality Management / v.50 no.4 / pp.717-734 / 2022
  • Purpose: The purpose of this study is to effectively detect simultaneously occurring violations of Good Manufacturing Practice that were concealed by drug manufacturers. Methods: We present a framework for analyzing regulatory violation patterns using Association Rule Mining (ARM), Text Mining, and t-distributed Stochastic Neighbor Embedding (t-SNE) to increase the effectiveness of on-site inspection. Results: A number of simultaneous violation patterns were discovered by applying Association Rule Mining to the FDA's inspection data collected from October 2008 to February 2022. Among them were 'concurrent violation patterns' derived from the similar regulatory ranges of two or more regulations. These patterns do not help to predict violations that appear simultaneously but belong to different regulations. Those unnecessary patterns were excluded by applying t-SNE based on text mining. Conclusion: Our proposed approach enables the recognition of simultaneous violation patterns during on-site inspection. It is expected to decrease detection time by increasing the likelihood of finding intentionally concealed violations.
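For readers unfamiliar with Association Rule Mining, the two standard measures behind a rule "A implies B" are support and confidence. The sketch below computes both over a toy set of inspections; the regulation codes and function names are hypothetical, and the abstract does not state which thresholds or algorithm the study used.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


if __name__ == "__main__":
    # Each set is one inspection; items are hypothetical violated-regulation codes.
    inspections = [
        {"REG_A", "REG_B"},
        {"REG_A", "REG_B", "REG_C"},
        {"REG_A"},
        {"REG_C"},
    ]
    print(support(inspections, {"REG_A", "REG_B"}))
    print(confidence(inspections, {"REG_A"}, {"REG_B"}))
```

A rule with high confidence suggests that finding the antecedent violation during an inspection makes the consequent violation worth checking for.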

The Effects of Consumers' Mask Selection Criteria on Mask Brand Awareness and Purchase Intention for Fashion Masks (마스크 선택기준이 브랜드 인지와 패션 마스크 구매의도에 미치는 영향)

  • Kim, Min Su;Lee, Ha Kyung;Kim, Hanna
    • Journal of the Korean Society of Clothing and Textiles / v.46 no.1 / pp.116-131 / 2022
  • This study used text mining to analyze big data to understand consumers' demand for and perceptions of fashion masks. Based on the text-mining analysis results, a survey was conducted with those living in Korea to investigate the influence of consumers' mask selection criteria on mask brand awareness and purchase intention for fashion masks. "Fashion mask" and "functional mask" were used as the keywords in a text-mining analysis, and an online survey of 242 respondents was conducted. The analysis results were as follows: First, the text-mining analysis extracted commonly appearing words that had a high frequency and TF-IDF, such as "COVID-19," "fashion," "celebrity," "antibacterial," and "filter." This confirmed that during the COVID-19 pandemic, consumers have demanded masks that are both functional and fashionable. Second, among consumers' mask selection criteria, trend and design had positive effects on face-mask brand awareness. Third, face-mask brand awareness had a positive effect on the purchase intention for both brand and fashion masks, and the purchase intention for brand masks had a positive effect on the purchase intention for fashion masks.
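The TF-IDF weighting referred to in this abstract can be illustrated with a small sketch. It uses one common variant (relative term frequency multiplied by the natural log of N over document frequency); the study's exact formula is not given in the abstract, and the function name and toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document, with score = tf * ln(N / df)."""
    n_docs = len(documents)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return scores


if __name__ == "__main__":
    docs = [
        ["mask", "fashion", "mask"],
        ["mask", "filter"],
        ["mask", "celebrity"],
    ]
    for row in tf_idf(docs):
        print(row)
```

Under this variant, a term that occurs in every document scores zero, so TF-IDF highlights words that distinguish a document rather than words that are merely frequent.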

Validity of Language-Based Algorithms Trained on Supervisor Feedback Language for Predicting Interpersonal Fairness in Performance Feedback

  • Jisoo Ock;Joyce S. Pang
    • Asia Pacific Journal of Information Systems / v.33 no.4 / pp.1118-1134 / 2023
  • Previous research has shown that employees tend to react more positively to corrective feedback from supervisors to the extent that they perceive they were treated with empathy, respect, and concern for fair interpersonal treatment when receiving the feedback. Thus, to facilitate effective supervisory feedback and coaching, it would be useful for organizations to monitor the content of feedback exchanges between supervisors and employees to make sure that supervisors are providing performance feedback using language that is more likely to be perceived as interpersonally fair. Computer-aided text analysis holds potential as a tool that organizations can use to efficiently monitor the quality of the feedback messages that supervisors provide to their employees. In the current study, we applied computer-aided text analysis (using closed-vocabulary text analysis) and machine learning to examine the validity of language-based algorithms trained on supervisor language in performance feedback situations for predicting human ratings of feedback interpersonal fairness. Results showed that language-based algorithms predicted feedback interpersonal fairness with a reasonable level of accuracy. Our findings provide supportive evidence for the promise of using employee language data for managing (and improving) performance management in organizations.

Text Network Analysis of Korean Trade Stakeholder's Interactions - A Focus on the Trade Ministry and the Legislature (통상 이해관계자 간 상호작용 관련 텍스트 네트워크 분석(TNA) - 한국 통상부처와 입법부 관계를 중심으로)

  • Bomin Ko
    • Korea Trade Review / v.45 no.6 / pp.23-43 / 2020
  • This study aims to analyze the interactions between two of the most significant trade stakeholders in Korea, the Trade Ministry and the Legislature, using text network analysis. Examining seven Action and Plan Reports for Requests from Parliamentary Inspection released by the National Assembly, this paper conducts a topic modelling analysis, focusing particularly on the reports for three trade-related institutions: the MOTIE headquarters, Korea Trade Insurance Corporation, and Korea Trade and Investment Promotion Agency. According to the analysis, traditional MOTIE topics such as enterprise, industry, business, management, and development frequently appeared in the reports. Trade-related topics including export, trade, commerce, investment, overseas, domestic, dispute, cooperation, efficiency, negotiation, service, and promotion also appeared repeatedly. Lastly, a case study of the 2019 Parliamentary Inspection Report showed the specific trade-related topics and relevant content that raised issues in that year. This analysis implies that the text data derived from the Parliamentary Inspection Reports exchanged between the MOTIE and the National Assembly can be established as a so-called 'trade policy information system', which is valuable not only for the two institutions but also for the other trade stakeholders in Korea.