• Title/Summary/Keyword: N-GRAM

A Study on the Characteristics of Amekaji Fashion Trends Using Big Data Text Mining Analysis (빅데이터 텍스트 마이닝 분석을 활용한 아메카지 패션 트렌드 특징 고찰)

  • Kim, Gihyung
    • Journal of Fashion Business
    • /
    • v.26 no.3
    • /
    • pp.138-154
    • /
    • 2022
  • The purpose of this study is to identify the characteristics of domestic American casual fashion trends using big data text mining analysis. A total of 108,524 posts and 2,038,999 extracted keywords related to American casual fashion from the past five years were collected from Naver and Daum and refined with the Textom program, and frequency analysis, word cloud, N-gram, centrality, and CONCOR analyses were performed. In the frequency analysis, 'vintage', 'style', 'daily look', 'coordination', 'workwear', and 'men's wear' appeared as the main keywords. The representative brands were mainly Japanese, followed by American, Korean, and others. The CONCOR analysis yielded four clusters: "general American casual trend", "vintage taste", "direct sales mania", and "American styling". The results showed that Japanese American casual clothing is influenced by American casual clothing, and that American casual fashion in Korea, which has been reinterpreted, is completed through varied coordination and creative styles such as workwear, street, military, and classic, with a focus on items and brands. Looks were worn and shared on social networks, and the study also confirmed the existence of an active consumer group and market potential for obtaining genuine products, ranging from second-hand transactions for limited-edition vintage items to individual transactions. The significance of this study is that it presents the characteristics of American casual fashion trends academically, based on online text data that the public actually uses, since the trend has been spread by the public.
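The N-gram step named in this abstract can be illustrated with a minimal sketch. It assumes the posts are already tokenized into keyword lists. The study used the Textom program, so the function names and the sample posts below are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous window of n tokens from one token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(docs: Iterable[list[str]], n: int = 2) -> Counter:
    """Count n-grams over all documents without crossing document boundaries."""
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical, already-tokenized posts (illustrative only, not study data).
posts = [
    ["vintage", "workwear", "daily", "look"],
    ["vintage", "workwear", "style"],
    ["daily", "look", "coordination"],
]

bigram_counts = ngram_counts(posts, n=2)
top_bigrams = bigram_counts.most_common(2)
```

Counting per document rather than over one concatenated stream avoids inventing pairs that span the end of one post and the start of the next.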

Korean sentence spacing correction model using syllable and morpheme information (음절과 형태소 정보를 이용한 한국어 문장 띄어쓰기 교정 모델)

  • Choi, Jeong-Myeong;Oh, Byoung-Doo;Heo, Tak-Sung;Jeong, Yeong-Seok;Kim, Yu-Seop
    • Annual Conference on Human and Language Technology
    • /
    • 2020.10a
    • /
    • pp.141-144
    • /
    • 2020
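  • In Korean, spacing is very important for sentence readability and for understanding context. In addition, using sentences with spacing errors in natural language processing can affect performance, because the errors change the structure of the sentence. Previous studies corrected spacing with N-gram-based statistical methods and morphological analyzers. Recently, many studies have applied deep neural networks to spacing correction. Those studies built correction models that process sentences at either the syllable level or the morpheme level. This study uses both syllable and morpheme units as model inputs and combines the two kinds of information to address the spacing correction problem. The model uses a Convolutional Neural Network, which can learn local information from the syllable and morpheme sequences of a sentence, and a Bidirectional Long Short-Term Memory structure, which can learn order information in both the forward and backward directions. Model performance was evaluated using syllable accuracy and eojeol (spacing-unit) precision, recall, and F1 score. In the evaluation, the proposed model achieved a strong eojeol F1 score of 96.06%.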
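The eojeol-level metrics mentioned above can be sketched as follows. The abstract does not spell out how they were computed, so this assumes one common definition: a predicted eojeol counts as correct only when both of its boundaries match the gold spacing. The function names and the example sentence are hypothetical.

```python
def eojeol_spans(sentence: str) -> set[tuple[int, int]]:
    """Map a spaced sentence to the (start, end) syllable offsets of each eojeol."""
    spans = set()
    pos = 0
    for word in sentence.split():
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def eojeol_prf(gold: str, pred: str) -> tuple[float, float, float]:
    """Eojeol-level precision, recall, and F1 for one gold/predicted pair."""
    # The two sentences must contain the same syllables and differ only in spacing.
    if "".join(gold.split()) != "".join(pred.split()):
        raise ValueError("gold and pred must differ only in spacing")
    gold_spans, pred_spans = eojeol_spans(gold), eojeol_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical example: the prediction misses one space.
precision, recall, f1 = eojeol_prf("나는 학교에 간다", "나는 학교에간다")
```

In this example only the first eojeol has matching boundaries, so one of two predicted eojeols and one of three gold eojeols are correct.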

Developing and Evaluating Damage Information Classifier of High Impact Weather by Using News Big Data (재해기상 언론기사 빅데이터를 활용한 피해정보 자동 분류기 개발)

  • Su-Ji, Cho;Ki-Kwang Lee
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.46 no.3
    • /
    • pp.7-14
    • /
    • 2023
  • Recently, the importance of impact-based forecasting has increased as the socio-economic impact of severe weather has become more apparent. Because news articles contain unstructured information closely related to people's lives, this study developed and evaluated a binary classification algorithm for snowfall damage information by text mining media articles. We collected news articles from 2009 to 2021 that contained 'heavy snow' in the body text and labelled whether each article corresponded to specific damage fields such as car accidents. To develop the classifier, we proposed a probability-based classifier built on the ratio of two conditional probabilities, which is defined as the I/O Ratio in this study. During construction, we also adopted the n-gram approach to account for the contextual meaning of each keyword. The accuracy of the classifier was 75%, supporting the possibility of applying news big data to impact-based forecasting. We expect the performance of the classifier to improve in further research as more varied training data are accumulated. The results of this study can be readily extended by applying the same methodology to other disasters in the future. Furthermore, the results can help reduce the social and economic damage of high-impact weather by supporting the establishment of an integrated meteorological decision support system.
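The idea of classifying by a ratio of two conditional probabilities over n-grams can be sketched generically. The abstract does not give the exact definition of the I/O Ratio, so the code below is not the authors' method. It is a plain smoothed log-likelihood-ratio classifier, and the smoothing constant, the decision threshold of zero, and the sample articles are all assumptions.

```python
import math
from collections import Counter


def ngram_set(tokens: list[str], n: int = 2) -> set[tuple[str, ...]]:
    """Distinct n-grams present in one tokenized article."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def train_log_ratios(docs, labels, n=2, alpha=1.0):
    """Per n-gram: log P(gram | damage) - log P(gram | no damage).

    Probabilities are document-level presence rates with add-alpha
    smoothing (an assumption; the paper's I/O Ratio may differ).
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, is_damage in zip(docs, labels):
        grams = ngram_set(tokens, n)
        if is_damage:
            pos_df.update(grams)
            n_pos += 1
        else:
            neg_df.update(grams)
            n_neg += 1
    vocab = set(pos_df) | set(neg_df)
    return {
        g: math.log((pos_df[g] + alpha) / (n_pos + 2 * alpha))
        - math.log((neg_df[g] + alpha) / (n_neg + 2 * alpha))
        for g in vocab
    }


def predict(tokens, log_ratios, n=2):
    """Label an article as damage-related if its summed log ratio is positive."""
    score = sum(log_ratios.get(g, 0.0) for g in ngram_set(tokens, n))
    return score > 0.0


# Hypothetical training articles (illustrative only, not study data).
docs = [
    ["heavy", "snow", "car", "accident"],
    ["heavy", "snow", "road", "closed"],
    ["heavy", "snow", "festival", "opens"],
    ["heavy", "snow", "ski", "season"],
]
labels = [True, True, False, False]
ratios = train_log_ratios(docs, labels)
```

Because 'heavy snow' appears in every article, its log ratio is zero and it contributes nothing to the decision, while bigrams such as ('car', 'accident') carry the signal. This is one way n-grams can capture context that single keywords miss.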

Evaluating the Characteristics of Subversive Basic Fashion Utilizing Text Mining Techniques (텍스트 마이닝(text mining) 기법을 활용한 서브버시브 베이식(subversive basics) 패션의 특성)

  • Minjung Im
    • Journal of Fashion Business
    • /
    • v.27 no.5
    • /
    • pp.78-92
    • /
    • 2023
  • Fashion trends are actively disseminated through social media, which influences both their propagation and consumption. This study explored how users perceive subversive basic fashion in social media videos, by examining the associated concepts and characteristics. In addition, the factors contributing to the style's social media dissemination were identified and its distinctive features were analyzed. Through text mining analysis, 80 keywords were selected for semantic network and CONCOR analysis. TF-IDF and N-gram results indicate that subversive basic fashion involves transformative design techniques such as cutting or layering garments, emphasizing the body with thin fabrics, and creating bold visual effects. Topic modeling suggests that this fashion forms a subculture that resists mainstream norms, seeking individuality by creatively transforming the existing garments. CONCOR analysis categorized the style into six groups: forward-thinking unconventional fashion, bold and unique style, creative reworking, item utilization and combination, pursuit of easy and convenient fashion, and contemporary sensibility. Consumer actions, linked to social media, were shown to involve easily transforming and pursuing personalized styles. Furthermore, creating new styles through the existing clothing is seen as an economic and creative activity that fosters network formation and interaction. This study is significant as it addresses language expression limitations and subjectivity issues in fashion image analysis, revealing factors contributing to content reproduction through user-perceived design concepts and social media-conveyed fashion characteristics.

Text-Mining Analysis of Korea Government R&D Trends in Construction Machinery Domains (텍스트 마이닝을 통한 건설기계분야 국내 정부 R&D 연구동향 분석)

  • Bom Yun;Joonsoo Bae
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.46 no.spc
    • /
    • pp.1-8
    • /
    • 2023
  • To investigate the national science and technology policy direction in the field of construction machinery, an analysis was conducted on projects selected as national research and development (R&D) initiatives by the government. Assuming that the project titles contain key keywords, text mining was employed to substantiate this assumption. Project information data spanning nine years from 2014 to 2022 was collected through the National Science & Technology Information Service (NTIS). To observe changes over time, the years were divided into three-year sections. To analyze research trends efficiently, keywords were categorized into groups: 'equipment,' 'smart,' and 'eco-friendly.' Based on the collected data, keyword frequency analysis, N-gram analysis, and topic modeling were performed. The research findings indicate that domestic government R&D in the construction machinery field primarily focuses on smart-related research and development. Specifically, investments in monitoring systems and autonomous operation technologies are increasing. This study holds significance in analyzing objective research trends through the utilization of big data analysis techniques and is expected to contribute to future research and development planning, strategic formulation, and project management.

Applications of Machine Learning Models on Yelp Data

  • Ruchi Singh;Jongwook Woo
    • Asia pacific journal of information systems
    • /
    • v.29 no.1
    • /
    • pp.35-49
    • /
    • 2019
  • This paper documents the application of relevant Machine Learning (ML) models to the dataset of Yelp (a crowd-sourced local business review and social networking site) to analyze, predict, and recommend businesses. Two cloud platforms are used strategically to minimize the effort and time required for the project. Seven machine learning algorithms are implemented in Azure ML, four of which are also implemented in Databricks Spark ML. The analyzed Yelp business dataset contains 70 business attributes for more than 350,000 registered businesses. Additionally, review tips and likes from 500,000 users were processed for the project. A recommendation model is built to provide Yelp users with recommendations for business categories based on their previous business ratings as well as the business ratings of other users. A classification model is implemented to predict the popularity of a business, defining a popular business as one with more than 3 stars and an unpopular business as one with fewer than 3 stars. A text analysis model is developed by comparing two algorithms, uni-gram feature extraction and n-feature extraction, in Azure ML Studio and a logistic regression model in Spark. Comparative conclusions are drawn about the efficiency of Spark ML and Azure ML for these models.

Exploring the research trends of elderly oral health through language network analysis (언어 네트워크 분석을 통한 노인 구강 건강 연구 동향 탐구)

  • Yun-Jeong Kim
    • Journal of Korean society of Dental Hygiene
    • /
    • v.23 no.6
    • /
    • pp.451-458
    • /
    • 2023
  • Objectives: The purpose of this study is to explore the research trends of elderly oral health through a language network analysis. Methods: A total of 354 published studies with 668 keywords were collected from the Research Information Sharing Service (RISS) between 2000 and 2022. Language network analysis was performed using Textom 6.0, Ucinet 6.774, and NetDraw 2.183. Results: The most frequent keywords were 'elderly', 'oral health', 'quality of life', and 'OHIP-14'. The term frequency-inverse document frequency (TF-IDF) keywords were similar to the most frequent keywords. The N-gram analysis of keywords showed the pairs 'elderly' and 'oral health' (18 times) and 'elderly' and 'depression' (7 times). In the analysis of degree centrality and betweenness centrality, 'elderly', 'oral health', and 'quality of life' were found to be high. The CONCOR analysis identified the main clusters of 'quality of life', 'oral health behavior', 'health', and 'oral function disorder'. Conclusions: The results of the current study can be used to understand research trends in elderly oral health, and more comprehensive follow-up studies are needed.

JarBot: Automated Java Libraries Suggestion in JAR Archives Format for a given Software Architecture

  • P. Pirapuraj;Indika Perera
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.5
    • /
    • pp.191-197
    • /
    • 2024
  • Software reuse supports rapid software development and improves software quality. Most open-source Java components and libraries are available only in Java Archive (JAR) file format. When a software design enters the development process, the developer needs to select the necessary JAR files manually by analyzing the given software architecture and related JAR files. This paper proposes an automated approach, JarBot, to suggest all the necessary JAR files for a given software architecture in the development process. All related JAR files are downloaded from the internet based on information extracted from the given software architecture (class diagram). Class names, method names, and attribute names are extracted from the downloaded JAR files and matched against the information extracted from the given software architecture to identify the most relevant JAR files. To evaluate the proposed system, five software designs were developed for five well-completed software projects from GitHub. The proposed system suggested more than 95% of the expected JAR files for the five software designs. The results indicate that the proposed system suggests almost all the necessary JAR files.

Effects of Maturity Stages on the Nutritive Composition and Silage Quality of Whole Crop Wheat

  • Xie, Z.L.;Zhang, T.F.;Chen, X.Z.;Li, G.D.;Zhang, J.G.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.25 no.10
    • /
    • pp.1374-1380
    • /
    • 2012
  • The changes in yields and nutritive composition of whole crop wheat (Triticum aestivum L.) during maturation and the effects of maturity stage and lactic acid bacteria (LAB) inoculants on fermentation quality and aerobic stability were investigated under laboratory conditions. Whole crop wheat was harvested at three maturation stages: flowering stage, milk stage, and dough stage. Two strains of LAB (Lactobacillus plantarum: LAB1, Lactobacillus parafarraginis: LAB2) were inoculated for wheat ensiling at $1.0 \times 10^5$ colony forming units per gram of fresh forage. The results indicated that wheat had higher dry matter yields at the milk and dough stages. The highest water-soluble carbohydrate content, crude protein yield, and relative feed value of wheat were obtained at the milk stage, while the contents of crude fiber, neutral detergent fiber, and acid detergent fiber were the lowest at that stage compared with the flowering and dough stages. Lactic acid contents of wheat silage decreased significantly with maturity. Inoculating homofermentative LAB1 markedly reduced pH values and ammonia-nitrogen ($\mathrm{NH_3}$-N) content (p<0.05) of silages at all three maturity stages compared with their corresponding controls. Inoculating heterofermentative LAB2 did not significantly influence pH values, whereas it notably lowered lactic acid and $\mathrm{NH_3}$-N content (p<0.05) and effectively improved the aerobic stability of silages. In conclusion, considering both yields and nutritive value, whole crop wheat as forage should be harvested at the milk stage. Inoculating LAB1 improved the fermentation quality, while inoculating LAB2 enhanced the aerobic stability of wheat silages at different maturity stages.

Isolation and Characterization of Acetobacter Species from a Traditionally Prepared Vinegar (전통방식으로제조한식초로부터 Acetobacter 종들분리및특성조사)

  • Lee, Kang Wook;Shim, Jae Min;Kim, Gyeong Min;Shin, Jung-Hye;Kim, Jeong Hwan
    • Microbiology and Biotechnology Letters
    • /
    • v.43 no.3
    • /
    • pp.219-226
    • /
    • 2015
  • Acetic acid bacteria (AAB) were isolated from vinegar fermented through traditional methods in Namhae county, Gyeongnam, Republic of Korea. The isolated strains were Gram-negative, non-motile short rods. Three selected strains were identified as either Acetobacter pasteurianus or Acetobacter aceti by 16S rRNA gene sequencing. A. pasteurianus NH2 and A. pasteurianus NH6 utilized ethanol, glycerol, D-fructose, D-glucose, D-mannitol, D-sorbitol, L-glutamic acid, and Na-acetate. A. aceti NH12 utilized ethanol, n-propanol, glycerol, D-mannitol, and Na-acetate. These strains grew best at 30℃ and an initial pH of 3.4. They tolerated acetic acid at initial concentrations of up to 3% (v/v). The optimum conditions for acetic acid production were 30℃ and pH 3.4, with an initial ethanol concentration of 5%, resulting in an acetic acid concentration of 7.3−7.7%.