• Title/Summary/Keyword: Text Mining Method

Search Result 451, Processing Time 0.029 seconds

A Study on the Failure Experiences of Online Fashion Shopping Mall Startups -Applying Text Mining and Grounded Theory- (온라인 패션 쇼핑몰 창업의 실패 경험에 관한 연구 -텍스트 마이닝과 근거이론을 적용하여-)

  • Min Jeong Seo
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • v.47 no.6
    • /
    • pp.1096-1112
    • /
    • 2023
  • Many entrepreneurs who launched online fashion shopping malls faced failure compared to those who achieved success. Recognizing the importance of research that reflects reality, this study explores entrepreneurs' experiences during the failure process of online fashion shopping malls. Two studies utilized YouTube videos documenting such online fashion shopping malls' failure. Study 1 employed text mining techniques, including high-frequency analysis and topic modeling, while Study 2 used a qualitative research method, specifically grounded theory. Study 1 identified the prominent experiences of operating online fashion shopping malls, while Study 2 provided a holistic perspective on the failure processes. The integrated findings from both studies highlight that entrepreneurs' passion for fashion motivates them to establish online fashion shopping malls, yet they encounter numerous challenges during the operational process. Insufficient business preparation and operational capabilities contribute to their failure to achieve financial goals. Despite efforts to boost sales and profit, entrepreneurs often close their businesses due to inadequate funds and waning motivation. The outcomes of this study can inform us about the operational challenges faced by online fashion shopping malls and offer valuable insights for developing new strategies to sustain and improve them.

Visualization of University Curriculum for Multidisciplinary Learning: A Case Study of Yonsei University, South Korea

  • Geonsik Yu;Sunju Park
    • Journal of Information Science Theory and Practice
    • /
    • v.12 no.1
    • /
    • pp.77-86
    • /
    • 2024
  • As the significance of knowledge convergence continues to grow, universities are making efforts to develop methods that promote multidisciplinary learning. To address this educational challenge, our paper applies network theory and text mining techniques to analyze university curricula and introduces a graphical syllabus rendering method. Visualizing the course curriculum provides a macro and structured perspective for individuals seeking alternative educational pathways within the existing system. By visualizing the relationships among courses, students can explore different combinations of courses with comprehensive search support. To illustrate our approach, we conduct a detailed demonstration using the syllabus database of Yonsei University. Through the application of our methods, we create visual course networks that reveal the underlying structure of the university curriculum. Our results yield insights into the interconnectedness of courses across various academic majors at Yonsei University. We present both macro visualizations, covering 18 academic majors, and visualizations for a few selected majors. Our analysis using Yonsei University's database not only showcases the value of our methodology but also serves as a practical example of how our approach can facilitate multidisciplinary learning.

100 Article Paper Text Minning Data Analysis and Visualization in Web Environment (웹 환경에서 100 논문에 대한 텍스트 마이닝, 데이터 분석과 시각화)

  • Li, Xiaomeng;Li, Jiapei;Lee, HyunChang;Shin, SeongYoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2017.10a
    • /
    • pp.157-158
    • /
    • 2017
  • There is a method to analyze the big data of the article and text mining by using Python language. And Python is a kind of programming language and it is easy to operating. Reaserch and use Python to creat a Web environment that the research result of the analysis can show directly on the browser. In this thesis, there are 100 article paper frrom Altmetric, Altmetric tracks a range of sources to capture. It is necessary to collect and analyze the big data use an effictive method, After the result coming out, Use Python wordcloud to make a directive image that can show the highest frequency of words.

  • PDF

Automated Data Extraction from Unstructured Geotechnical Report based on AI and Text-mining Techniques (AI 및 텍스트 마이닝 기법을 활용한 지반조사보고서 데이터 추출 자동화)

  • Park, Jimin;Seo, Wanhyuk;Seo, Dong-Hee;Yun, Tae-Sup
    • Journal of the Korean Geotechnical Society
    • /
    • v.40 no.4
    • /
    • pp.69-79
    • /
    • 2024
  • Field geotechnical data are obtained from various field and laboratory tests and are documented in geotechnical investigation reports. For efficient design and construction, digitizing these geotechnical parameters is essential. However, current practices involve manual data entry, which is time-consuming, labor-intensive, and prone to errors. Thus, this study proposes an automatic data extraction method from geotechnical investigation reports using image-based deep learning models and text-mining techniques. A deep-learning-based page classification model and a text-searching algorithm were employed to classify geotechnical investigation report pages with 100% accuracy. Computer vision algorithms were utilized to identify valid data regions within report pages, and text analysis was used to match and extract the corresponding geotechnical data. The proposed model was validated using a dataset of 205 geotechnical investigation reports, achieving an average data extraction accuracy of 93.0%. Finally, a user-interface-based program was developed to enhance the practical application of the extraction model. It allowed users to upload PDF files of geotechnical investigation reports, automatically analyze these reports, and extract and edit data. This approach is expected to improve the efficiency and accuracy of digitizing geotechnical investigation reports and building geotechnical databases.

A Study on Text Pattern Analysis Applying Discrete Fourier Transform - Focusing on Sentence Plagiarism Detection - (이산 푸리에 변환을 적용한 텍스트 패턴 분석에 관한 연구 - 표절 문장 탐색 중심으로 -)

  • Lee, Jung-Song;Park, Soon-Cheol
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.22 no.2
    • /
    • pp.43-52
    • /
    • 2017
  • Pattern Analysis is One of the Most Important Techniques in the Signal and Image Processing and Text Mining Fields. Discrete Fourier Transform (DFT) is Generally Used to Analyzing the Pattern of Signals and Images. We thought DFT could also be used on the Analysis of Text Patterns. In this Paper, DFT is Firstly Adapted in the World to the Sentence Plagiarism Detection Which Detects if Text Patterns of a Document Exist in Other Documents. We Signalize the Texts Converting Texts to ASCII Codes and Apply the Cross-Correlation Method to Detect the Simple Text Plagiarisms such as Cut-and-paste, term Relocations and etc. WordNet is using to find Similarities to Detect the Plagiarism that uses Synonyms, Translations, Summarizations and etc. The Data set, 2013 Corpus, Provided by PAN Which is the One of Well-known Workshops for Text Plagiarism is used in our Experiments. Our Method are Fourth Ranked Among the Eleven most Outstanding Plagiarism Detection Methods.

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.69-92
    • /
    • 2015
  • The explosion of social media data has led to apply text-mining techniques to analyze big social media data in a more rigorous manner. Even if social media text analysis algorithms were improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have been conducted by adding grammatical factors to feature sets for training classification model. The other approach adopts the semantic analysis method to sentiment analysis, but this approach is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm which is an extension of the neural network algorithms to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result from adopting the Word2Vec algorithm is compared to the result from co-occurrence analysis to identify the difference between two approaches. The results show that the distribution related word extracted by Word2Vec algorithm in that the words represent some emotion about the keyword used are three times more than extracted by co-occurrence analysis. The reason of the difference between two results comes from Word2Vec's semantic features vectorization. Therefore, it is possible to say that Word2Vec algorithm is able to catch the hidden related words which have not been found in traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjective as "emotional word" in Korean. In addition, the emotion words extracted from the text are converted into word vector by the Word2Vec algorithm to find related words. Among these related words, noun words are selected because each word of them would have causal relationship with "emotional word" in the sentence. The process of extracting these trigger factor of emotional word is named "Emotion Trigger" in this study. As a case study, the datasets used in the study are collected by searching using three keywords: professor, prosecutor, and doctor in that these keywords contain rich public emotion and opinion. Advanced data collecting was conducted to select secondary keywords for data gathering. The secondary keywords for each keyword used to gather the data to be used in actual analysis are followed: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate) Prosecutor (lewd behavior, sponsor). The size of the text data is about to 100,000(Professor: 25720, Doctor: 35110, Prosecutor: 43225) and the data are gathered from news, blog, and twitter to reflect various level of public emotion into text data analysis. As a visualization method, Gephi (http://gephi.github.io) was used and every program used in text processing and analysis are java coding. The contributions of this study are as follows: First, different approaches for sentiment analysis are integrated to overcome the limitations of existing approaches. Secondly, finding Emotion Trigger can detect the hidden connections to public emotion which existing method cannot detect. Finally, the approach used in this study could be generalized regardless of types of text data. The limitation of this study is that it is hard to say the word extracted by Emotion Trigger processing has significantly causal relationship with emotional word in a sentence. The future study will be conducted to clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing with the relationships manually tagged. Furthermore, the text data used in Emotion Trigger are twitter, so the data have a number of distinct features which we did not deal with in this study. These features will be considered in further study.

Research on text mining based malware analysis technology using string information (문자열 정보를 활용한 텍스트 마이닝 기반 악성코드 분석 기술 연구)

  • Ha, Ji-hee;Lee, Tae-jin
    • Journal of Internet Computing and Services
    • /
    • v.21 no.1
    • /
    • pp.45-55
    • /
    • 2020
  • Due to the development of information and communication technology, the number of new / variant malicious codes is increasing rapidly every year, and various types of malicious codes are spreading due to the development of Internet of things and cloud computing technology. In this paper, we propose a malware analysis method based on string information that can be used regardless of operating system environment and represents library call information related to malicious behavior. Attackers can easily create malware using existing code or by using automated authoring tools, and the generated malware operates in a similar way to existing malware. Since most of the strings that can be extracted from malicious code are composed of information closely related to malicious behavior, it is processed by weighting data features using text mining based method to extract them as effective features for malware analysis. Based on the processed data, a model is constructed using various machine learning algorithms to perform experiments on detection of malicious status and classification of malicious groups. Data has been compared and verified against all files used on Windows and Linux operating systems. The accuracy of malicious detection is about 93.5%, the accuracy of group classification is about 90%. The proposed technique has a wide range of applications because it is relatively simple, fast, and operating system independent as a single model because it is not necessary to build a model for each group when classifying malicious groups. In addition, since the string information is extracted through static analysis, it can be processed faster than the analysis method that directly executes the code.

Improving methods for normalizing biomedical text entities with concepts from an ontology with (almost) no training data at BLAH5 the CONTES

  • Ferre, Arnaud;Ba, Mouhamadou;Bossy, Robert
    • Genomics & Informatics
    • /
    • v.17 no.2
    • /
    • pp.20.1-20.5
    • /
    • 2019
  • Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations, such as it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for the out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality in the representation of the ontology. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words, with a specific method.

Prediction of Housing Price Index using Data Mining and Learning Techniques (데이터마이닝과 학습기법을 이용한 부동산가격지수 예측)

  • Lee, Jiyoung;Ryu, Jae Pil
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.8
    • /
    • pp.47-53
    • /
    • 2021
  • With increasing interest in the 4th industrial revolution, data-driven scientific methodologies have developed. However, there are limitations of data collection in the real estate field of research. In addition, as the public becomes more knowledgeable about the real estate market, the qualitative sentiment comes to play a bigger role in the real estate market. Therefore, we propose a method to collect quantitative data that reflects sentiment using text mining and k-means algorithms, rather than the existing source data, and to predict the direction of housing index through artificial neural network learning based on the collected data. Data from 2012 to 2019 is set as the training period and 2020 as the prediction period. It is expected that this study will contribute to the utilization of scientific methods such as artificial neural networks rather than the use of the classical methodology for real estate market participants in their decision making process.

Terminology Recognition System based on Machine Learning for Scientific Document Analysis (과학 기술 문헌 분석을 위한 기계학습 기반 범용 전문용어 인식 시스템)

  • Choi, Yun-Soo;Song, Sa-Kwang;Chun, Hong-Woo;Jeong, Chang-Hoo;Choi, Sung-Pil
    • The KIPS Transactions:PartD
    • /
    • v.18D no.5
    • /
    • pp.329-338
    • /
    • 2011
  • Terminology recognition system which is a preceding research for text mining, information extraction, information retrieval, semantic web, and question-answering has been intensively studied in limited range of domains, especially in bio-medical domain. We propose a domain independent terminology recognition system based on machine learning method using dictionary, syntactic features, and Web search results, since the previous works revealed limitation on applying their approaches to general domain because their resources were domain specific. We achieved F-score 80.8 and 6.5% improvement after comparing the proposed approach with the related approach, C-value, which has been widely used and is based on local domain frequencies. In the second experiment with various combinations of unithood features, the method combined with NGD(Normalized Google Distance) showed the best performance of 81.8 on F-score. We applied three machine learning methods such as Logistic regression, C4.5, and SVMs, and got the best score from the decision tree method, C4.5.