• Title/Abstract/Keywords: Text analysis

HTML Text Extraction Using Tag Path and Text Appearance Frequency

  • 김진환;김은경
    • 한국정보통신학회논문지 / Vol. 25, No. 12 / pp.1709-1715 / 2021
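  • To extract the required text from a web page accurately, one approach specifies in the web crawler the tags and style attributes of the location that holds the main text. Its drawback is that the extraction logic must be modified every time the page layout changes. To address this, earlier work proposed extracting the main text by analyzing text appearance frequency, but the performance of that method varied widely with the collection channel of the web pages. This paper therefore proposes a method that analyzes not only text appearance frequency but also the parent tag paths of the text nodes extracted from a page's DOM tree, so that the main text is extracted with high accuracy across a variety of collection channels.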
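The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that "appearance frequency" means how many pages of the same channel contain a given text string, and that the main text sits under the parent tag path carrying the most page-unique text. The names `TextNodeCollector`, `collect`, and `extract_main_text` are hypothetical.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # tags with no closing tag
SKIP_TAGS = {"script", "style"}                           # non-visible text


class TextNodeCollector(HTMLParser):
    """Collects (parent tag path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.nodes = []   # collected (path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unmatched end tags are ignored.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not (self.stack and self.stack[-1] in SKIP_TAGS):
            self.nodes.append(("/".join(self.stack), text))


def collect(html):
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def extract_main_text(pages, target_index=0):
    """Returns the page-unique text under the best-scoring tag path (assumed heuristic)."""
    all_nodes = [collect(page) for page in pages]
    # Appearance frequency: number of pages in which each text string occurs.
    page_freq = Counter()
    for nodes in all_nodes:
        page_freq.update({text for _, text in nodes})
    # Score each tag path of the target page by its amount of page-unique text.
    score = defaultdict(int)
    for path, text in all_nodes[target_index]:
        if page_freq[text] == 1:
            score[path] += len(text)
    if not score:
        return ""
    best = max(score, key=score.get)
    return "\n".join(
        text for path, text in all_nodes[target_index]
        if path == best and page_freq[text] == 1
    )
```

Because nothing here names a specific tag or style attribute, a layout change that moves the article body to a different path does not require editing the extraction logic, which is the problem the abstract describes.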

Methodology for Applying Text Mining Techniques to Analyzing Online Customer Reviews for Market Segmentation

  • 김근형;오성열
    • 한국콘텐츠학회논문지 / Vol. 9, No. 8 / pp.272-284 / 2009
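  • This paper proposes a methodology for analyzing online customer reviews with text mining techniques. The concept of market segmentation is introduced so that online customer reviews can be analyzed more efficiently and effectively. Specifically, the proposed methodology uses categorization and information extraction, the text mining techniques that correspond to the concept of market segmentation. In addition, cross-tabulation analysis, a traditional statistical method, is included in the methodology so that statistically more robust results can be derived. To confirm the validity of the methodology, a website with high-quality online customer reviews was selected and its reviews were analyzed.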

The Effects of Consumers' Mask Selection Criteria on Mask Brand Awareness and Purchase Intention for Fashion Masks

  • 김민수;이하경;김한나
    • 한국의류학회지 / Vol. 46, No. 1 / pp.116-131 / 2022
  • This study used text mining to analyze big data to understand consumers' demand for and perceptions of fashion masks. Based on the text-mining analysis results, a survey was conducted with those living in Korea to investigate the influence of consumers' mask selection criteria on mask brand awareness and purchase intention for fashion masks. "Fashion mask" and "functional mask" were used as the keywords in a text-mining analysis, and an online survey of 242 respondents was conducted. The analysis results were as follows: First, the text-mining analysis extracted commonly appearing words that had a high frequency and TF-IDF, such as "COVID-19," "fashion," "celebrity," "antibacterial," and "filter." This confirmed that during the COVID-19 pandemic, consumers have demanded masks that are both functional and fashionable. Second, among consumers' mask selection criteria, trend and design had positive effects on face-mask brand awareness. Third, face-mask brand awareness had a positive effect on the purchase intention for both brand and fashion masks, and the purchase intention for brand masks had a positive effect on the purchase intention for fashion masks.
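As a rough illustration of the TF-IDF score mentioned in the abstract, the sketch below uses one common textbook variant (relative term frequency times the natural log of inverse document frequency). The abstract does not say which variant the study used, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

Under this variant a word that occurs in every document scores zero, so a high raw frequency and a high TF-IDF are separate signals, which is why the abstract reports both.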

Evaluating the Characteristics of Subversive Basic Fashion Utilizing Text Mining Techniques

  • 임민정
    • 패션비즈니스 / Vol. 27, No. 5 / pp.78-92 / 2023
  • Fashion trends are actively disseminated through social media, which influences both their propagation and consumption. This study explored how users perceive subversive basic fashion in social media videos, by examining the associated concepts and characteristics. In addition, the factors contributing to the style's social media dissemination were identified and its distinctive features were analyzed. Through text mining analysis, 80 keywords were selected for semantic network and CONCOR analysis. TF-IDF and N-gram results indicate that subversive basic fashion involves transformative design techniques such as cutting or layering garments, emphasizing the body with thin fabrics, and creating bold visual effects. Topic modeling suggests that this fashion forms a subculture that resists mainstream norms, seeking individuality by creatively transforming the existing garments. CONCOR analysis categorized the style into six groups: forward-thinking unconventional fashion, bold and unique style, creative reworking, item utilization and combination, pursuit of easy and convenient fashion, and contemporary sensibility. Consumer actions, linked to social media, were shown to involve easily transforming and pursuing personalized styles. Furthermore, creating new styles through the existing clothing is seen as an economic and creative activity that fosters network formation and interaction. This study is significant as it addresses language expression limitations and subjectivity issues in fashion image analysis, revealing factors contributing to content reproduction through user-perceived design concepts and social media-conveyed fashion characteristics.

Validity of Language-Based Algorithms Trained on Supervisor Feedback Language for Predicting Interpersonal Fairness in Performance Feedback

  • Jisoo Ock;Joyce S. Pang
    • Asia Pacific Journal of Information Systems / Vol. 33, No. 4 / pp.1118-1134 / 2023
  • Previous research has shown that employees react more positively to corrective feedback from supervisors to the extent that they perceive they were treated with empathy, respect, and concern for fair interpersonal treatment when receiving the feedback. To facilitate effective supervisory feedback and coaching, it would therefore be useful for organizations to monitor the contents of feedback exchanges between supervisors and employees, to make sure that supervisors provide performance feedback in language that is likely to be perceived as interpersonally fair. Computer-aided text analysis holds potential as a tool that organizations can use to monitor efficiently the quality of the feedback messages that supervisors provide to their employees. In the current study, we applied computer-aided text analysis (using closed-vocabulary text analysis) and machine learning to examine the validity of language-based algorithms trained on supervisor language in performance feedback situations for predicting human ratings of feedback interpersonal fairness. Results showed that the language-based algorithms predicted feedback interpersonal fairness with a reasonable level of accuracy. Our findings provide supportive evidence for the promise of using employee language data for managing (and improving) performance management in organizations.
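Closed-vocabulary text analysis scores a text against predefined word categories. The sketch below shows the basic computation under stated assumptions: the two-category dictionary `CATEGORIES` and the function `category_rates` are hypothetical stand-ins, and the abstract does not say which lexicon or tokenization the study used.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; real closed-vocabulary
# lexicons are far larger and validated.
CATEGORIES = {
    "respect": {"appreciate", "thank", "respect", "value"},
    "blame": {"fault", "careless", "failed", "lazy"},
}


def category_rates(text, categories=CATEGORIES):
    """Returns, per category, the share of tokens that belong to that category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in categories}
    counts = Counter(tokens)
    return {
        name: sum(counts[word] for word in words) / total
        for name, words in categories.items()
    }
```

Rates like these can then serve as input features for a supervised model trained on human fairness ratings, which is the role the abstract assigns to machine learning.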

Text Mining of Wood Science Research Published in Korean and Japanese Journals

  • Eun-Suk JANG
    • Journal of the Korean Wood Science and Technology / Vol. 51, No. 6 / pp.458-469 / 2023
  • Text mining techniques provide valuable insights into research information across various fields. In this study, text mining was used to identify research trends in wood science from 2012 to 2022, with a focus on representative journals published in Korea and Japan. Abstracts from Journal of the Korean Wood Science and Technology (JKWST, 785 articles) and Journal of Wood Science (JWS, 812 articles) obtained from the SCOPUS database were analyzed in terms of the word frequency (specifically, term frequency-inverse document frequency) and co-occurrence network analysis. Both journals showed a significant occurrence of words related to the physical and mechanical properties of wood. Furthermore, words related to wood species native to each country and their respective timber industries frequently appeared in both journals. CLT was a common keyword in engineering wood materials in Korea and Japan. In addition, the keywords "MDF," "MUF," and "GFRP" were ranked in the top 50 in Korea. Research on wood anatomy was inferred to be more active in Japan than in Korea. Co-occurrence network analysis showed that words related to the physical and structural characteristics of wood were organically related to wood materials.
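The edge weights of a co-occurrence network can be obtained by counting how often two words appear in the same abstract. The following is a minimal sketch that assumes document-level co-occurrence; the abstract does not state the window the study used, and the function name `cooccurrence` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Counts, for each unordered word pair, the documents that contain both words."""
    pairs = Counter()
    for tokens in docs:
        # Sorting makes each pair key canonical: (a, b) with a < b.
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs
```

Each key is an edge and each count its weight, so the result can be loaded directly into a graph library for network analysis.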

A Study on the Realization of Magneto-Optic Current Transformer by Optical Fiber

  • 이상효;김은수
    • 한국통신학회논문지 / Vol. 8, No. 4 / pp.139-144 / 1983
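  • This paper analyzes and experimentally examines the polarization characteristics and Faraday rotation of single-mode optical fiber used to construct an MCT (magneto-optic current transformer). In the analysis, the single-mode fiber was modeled as a linear retarder. Measurements showed an intrinsic birefringence of 2.57 /m, and the bend-induced birefringence was inversely proportional to the square of the bend radius. The theoretical analysis of Faraday rotation gave a polarization rotation sensitivity (F) with respect to magnetic field strength (H) of $F/H = 1.4 \times 10^{-5}$ rad/[Am].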

The Informative Support and Emotional Support Classification Model for Medical Web Forums using Text Analysis

  • 우지영;이민정
    • 한국IT서비스학회지 / Vol. 11, Supplement issue / pp.139-152 / 2012
  • In medical web forums, patients and patients' families share medical experience and information. Some people search for medical information written in non-expert language, and some offer words of comfort to those who are suffering from diseases. Medical web forums thus provide both informative support and emotional support. We propose a model that automatically classifies articles in a medical web forum into informative support and emotional support. We extract text features of forum articles using text mining techniques from a linguistic perspective and then perform supervised learning to classify texts into the informative support and emotional support types. We adopt the Support Vector Machine (SVM), Naive Bayes, and decision tree classifiers for automatic classification. We apply the proposed model to the HealthBoards forum, one of the largest and most dynamic medical web forums.

RESEARCH ON SENTIMENT ANALYSIS METHOD BASED ON WEIBO COMMENTS

  • Li, Zhong-Shi;He, Lin;Guo, Wei-Jie;Jin, Zhe-Zhi
    • East Asian Mathematical Journal / Vol. 37, No. 5 / pp.599-612 / 2021
  • In China, Weibo is one of the social platforms with the most users. It is characterized by fast information transmission and wide coverage. People can comment on an event on Weibo to express their emotions and attitudes. Judging the emotional tendency of users' comments not only helps the management department with monitoring but also has high application value for rumor suppression, public opinion guidance, and marketing. This paper proposes a two-input Adaboost model based on TextCNN and BiLSTM. The TextCNN model, which performs local feature extraction, and the BiLSTM model, which performs global feature extraction, process the comment data in parallel. Finally, the classification results of the two models are fused through an improved Adaboost algorithm to improve the accuracy of text classification.
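The abstract does not describe the improved Adaboost algorithm, so the sketch below is not the paper's fusion method. It only illustrates the general idea of weighting two models' outputs, using the standard AdaBoost classifier weight $\alpha = \tfrac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}$ computed from each model's error rate $\varepsilon$. The functions `model_weight` and `fuse` are hypothetical, and both error rates are assumed to lie strictly between 0 and 0.5.

```python
import math


def model_weight(error_rate):
    """Standard AdaBoost classifier weight for a model with the given error rate."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


def fuse(prob_a, prob_b, err_a, err_b):
    """Weighted average of two class-probability vectors; returns (label, fused)."""
    weight_a = model_weight(err_a)
    weight_b = model_weight(err_b)
    fused = [
        (weight_a * pa + weight_b * pb) / (weight_a + weight_b)
        for pa, pb in zip(prob_a, prob_b)
    ]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused
```

A model with a lower error rate receives a larger weight, so when the two models disagree the fused label leans toward the more accurate one.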

Bibliometric Network Analysis on Low Cost Carrier Research

  • 나진성;최동현
    • 한국항공운항학회지 / Vol. 23, No. 1 / pp.14-23 / 2015
  • This study applied network text analysis to reveal the scope and trends of low cost carrier studies. We analyzed low cost carrier research published in Korean journals and news articles. The results showed three clusters of research topics. The first dimension consists of articles investigating growth in the low cost carrier industry. The second dimension is associated with service characteristics. The last dimension has strong ties to organizational and human resource dimensions. We ran Krkwic, Krtitle, Netdraw, and Ucinet 6.0 to conduct the network text analysis. This study suggests directions for future low cost carrier research.