• Title/Summary/Keyword: Word2Vec

Implementation of Recipe Recommendation System Using Ingredients Combination Analysis based on Recipe Data (레시피 데이터 기반의 식재료 궁합 분석을 이용한 레시피 추천 시스템 구현)

  • Min, Seonghee;Oh, Yoosoo
    • Journal of Korea Multimedia Society / v.24 no.8 / pp.1114-1121 / 2021
  • In this paper, we implement a recipe recommendation system that uses ingredient combination analysis based on recipe data. The proposed system takes an image of a grocery purchase receipt as input and recommends ingredients and recipes to the user. It preprocesses the receipt image and extracts text using an OCR algorithm. To calculate how well ingredients combine, the system collects recipe data, extracts the ingredients of each collected recipe as training data, and learns vector representations with a natural language processing algorithm. It then recommends recipes based on ingredients with high similarity. The system can also recommend recipes using replaceable ingredients, improving the accuracy of the results through preprocessing and postprocessing. For evaluation, we created a random input dataset and calculated the accuracy of each algorithm. The Word2Vec algorithm achieved the highest accuracy.

Research Trends of Ergonomics in Occupational Safety and Health through MEDLINE Search: Focus on Abstract Word Modeling using Word Embedding (MEDLINE 검색을 통한 산업안전보건 분야에서의 인간공학 연구동향 : 워드임베딩을 활용한 초록 단어 모델링을 중심으로)

  • Kim, Jun Hee;Hwang, Ui Jae;Ahn, Sun Hee;Gwak, Gyeong Tae;Jung, Sung Hoon
    • Journal of the Korean Society of Safety / v.36 no.5 / pp.61-70 / 2021
  • This study aimed to analyze research trends in the abstracts of ergonomic studies registered in MEDLINE, a medical bibliographic database, using word embedding. Medical ergonomic studies mainly focus on work-related musculoskeletal disorders, and no studies have analyzed their words as data using natural language processing techniques such as word embedding. In this study, the abstracts of ergonomic studies were extracted with a Python program that uses the Selenium and BeautifulSoup modules. The abstract data were embedded using the word2vec model, which vectorized the words found in the abstracts. The vectorized data were visualized in two dimensions using t-Distributed Stochastic Neighbor Embedding (t-SNE). The word "ergonomics" and ten of the most frequently used words in the abstracts were selected as keywords. The results revealed that the most frequently used words in the abstracts of ergonomics studies include "use", "work", and "task". In addition, the t-SNE technique revealed that words such as "workplace", "design", and "engineering" exhibited the highest relevance to ergonomics. The keywords observed in the abstracts using t-SNE were classified into four groups. Ergonomics studies registered with MEDLINE have investigated the risk factors associated with workers performing an operation or task using tools, and this study characterized ergonomics studies by the relationships between keywords using word embedding. The results of this study will provide useful and diverse insights into future research directions for ergonomic studies.

A Comparative Study on the Performance of Korean Sentence Embedding (Word2Vec, GloVe 및 RoBERTa 등의 모델을 활용한 한국어 문장 임베딩 성능 비교 연구)

  • Seok, Juree;Lim, Heuiseok
    • Annual Conference on Human and Language Technology / 2021.10a / pp.444-449 / 2021
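  • In natural language processing, an embedding is a conversion of human language into vectors that a computer can understand, and it is one of the essential components of natural language processing. In this paper, we build Korean sentence embeddings using the word-based embeddings Word2Vec, GloVe, and fastText and the sentence-based embedding methods BERT, M-USE, and RoBERTa, and evaluate their performance on three tasks: NSMC, KorNLI, and KorSTS. The results show that the most suitable Korean sentence embedding method varies by task, and that for some tasks word-based embeddings such as averaged GloVe embeddings can outperform averaged BERT embeddings.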
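
The averaged word-embedding approach compared in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the three-dimensional vectors in `TOY_VECTORS` and the helper names `sentence_embedding` and `cosine` are hypothetical, and real Word2Vec or GloVe vectors would come from a trained model.

```python
import math

# Hypothetical 3-dimensional word vectors (assumption for illustration only).
# Vectors from a trained Word2Vec or GloVe model would have far more dimensions.
TOY_VECTORS = {
    "movie": [0.9, 0.1, 0.0],
    "film": [0.8, 0.2, 0.1],
    "great": [0.1, 0.9, 0.2],
    "good": [0.2, 0.8, 0.1],
    "boring": [0.1, -0.7, 0.3],
}


def sentence_embedding(tokens, vectors):
    """Average the vectors of all in-vocabulary tokens (mean pooling)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None  # no token of the sentence is in the vocabulary
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


s_pos_1 = sentence_embedding(["great", "movie"], TOY_VECTORS)
s_pos_2 = sentence_embedding(["good", "film"], TOY_VECTORS)
s_neg = sentence_embedding(["boring", "movie"], TOY_VECTORS)

# With these toy vectors the two positive sentences end up closer to each
# other than to the negative one.
print(cosine(s_pos_1, s_pos_2), cosine(s_pos_1, s_neg))
```

Mean pooling ignores word order, which is one reason its quality relative to sentence encoders can depend on the task.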

A Visitor Study of The Exhibition of Using Big Data Analysis which reflects viewing experiences

  • Kang, Ji-Su;Rhee, Bo-A
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.81-89 / 2022
  • This study aims to analyze the images of Instagram posts and to draw implications regarding the exhibition of . This study crawls and collects 24,295 images from Instagram posts as a dataset. We use the Google Cloud Vision API to label the images, and a total of 212,567 clusters of labels are ultimately classified into 9 categories using Word2Vec. The museum space, photo zone, and architecture categories are dominant, along with the people category. In conclusion, visitors curate their experiences and memories of physical places and spaces while experiencing the exhibition. This result reaffirms previous studies that emphasize a sense of social presence and place making. The convergent approach of art management and art technology used in this study helps museum professionals gain insight into big-data-based visitor research at a practical level.

Course recommendation system using deep learning (딥러닝을 이용한 강좌 추천시스템)

  • Min-Ah Lim;Seung-Yeon Hwang;Dong-Jin Shin;Jae-Kon Oh;Jeong-Joon Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.3 / pp.193-198 / 2023
  • We study a learner-customized lecture recommendation project using deep learning. Recommendation systems are common on the web and in apps; examples include recommending videos based on users' clicks and advertising items in users' areas of interest on social networking services. In this study, courses were filtered twice, mainly using Word2Vec sentence similarity, and then recommended through the Surprise library. The system conveniently provides users with course data in the desired category. Surprise is a Python scikit library that is convenient for building recommendation systems. By analyzing the data, the system runs at high speed, and deep learning is used to produce more precise results through the course steps. When a user enters a keyword of interest, the similarity between the keyword and each course title is computed, followed by the similarity with the extracted video data and voice text, and the highest-ranking video data is recommended through the Surprise library.

A Global-Interdependence Pairwise Approach to Entity Linking Using RDF Knowledge Graph (개체 링킹을 위한 RDF 지식그래프 기반의 포괄적 상호의존성 짝 연결 접근법)

  • Shim, Yongsun;Yang, Sungkwon;Kim, Hong-Gee
    • KIPS Transactions on Software and Data Engineering / v.8 no.3 / pp.129-136 / 2019
  • Natural language contains a variety of entities, such as people, organizations, places, and products, and a single entity mention can have many different meanings. Resolving this entity ambiguity is a very challenging task in natural language processing. Entity Linking (EL) is the task of linking an entity in the text to the appropriate entity in a knowledge base. The pairwise approach, a representative method for EL, solves the task by using the association between two entities in a sentence. This method considers only the interdependence between entities appearing in the same sentence, and thus cannot capture global interdependence. In this paper, we developed an Entity2vec model that applies Word2vec to an RDF-type knowledge base in order to solve EL, then applied the algorithms using the generated model and ranked each entity. To overcome the limitations of the pairwise approach, we devised a pairwise approach based on comprehensive interdependency and compared the two.

A medical history taking system using Symptom2Vec (Symptom2Vec 을 활용한 병력 청취 시스템)

  • Kim, Min-Ji;Jee, In-whee
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.411-413 / 2022
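  • In clinical settings, most of the consultation time is spent listening to the patient's symptoms and eliciting additional symptoms. This is called medical history taking, and it is the most basic and essential activity in medical care. However, although research on history taking has continued since the 1940s, no standard has yet been established, and deep learning, which is being applied in many fields, has seen little research related to history taking. In this paper, we newly propose Symptom2Vec and use it to establish a criterion for history taking based on the average cosine similarity score (0.962) of symptoms by disease. By checking the top-5 most similar words, we also confirmed that history taking that asks about similar symptoms based on the patient's symptoms is possible. On this basis, we propose an automated history-taking system for real clinical settings.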
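
The two measurements this abstract reports, an average cosine similarity over the symptoms of a disease and a top-5 most-similar lookup, can be sketched as below. This is only an illustration under stated assumptions: the vectors in `SYMPTOM_VECTORS` are made up, the helper names `most_similar` and `mean_pairwise_similarity` are hypothetical, and the sketch does not reproduce Symptom2Vec or its 0.962 score.

```python
import math
from itertools import combinations

# Hypothetical symptom vectors (assumption for illustration only);
# a trained embedding model would supply these in practice.
SYMPTOM_VECTORS = {
    "cough": [0.9, 0.2, 0.1],
    "sore_throat": [0.8, 0.3, 0.1],
    "fever": [0.7, 0.4, 0.2],
    "nausea": [0.1, 0.9, 0.3],
    "vomiting": [0.2, 0.8, 0.4],
    "rash": [0.1, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(word, vectors, topn=5):
    """Return up to topn (symptom, similarity) pairs, most similar first."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def mean_pairwise_similarity(words, vectors):
    """Average cosine similarity over all unordered pairs of the given words."""
    pairs = list(combinations(words, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Candidate follow-up questions for a patient who reports a cough.
follow_ups = most_similar("cough", SYMPTOM_VECTORS, topn=5)

# Cohesion of a hypothetical symptom group for one disease.
cohesion = mean_pairwise_similarity(["cough", "sore_throat", "fever"], SYMPTOM_VECTORS)
print(follow_ups, cohesion)
```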

A Machine Learning-Based Vocational Training Dropout Prediction Model Considering Structured and Unstructured Data (정형 데이터와 비정형 데이터를 동시에 고려하는 기계학습 기반의 직업훈련 중도탈락 예측 모형)

  • Ha, Manseok;Ahn, Hyunchul
    • The Journal of the Korea Contents Association / v.19 no.1 / pp.1-15 / 2019
  • One of the biggest difficulties in the vocational training field is the dropout problem. A large number of students drop out during the training process, which wastes the state budget and hampers improvement of the youth employment rate. Previous studies have mainly analyzed the causes of dropout. The purpose of this study is to propose a machine learning-based model that predicts dropout in advance using various information about learners. In particular, this study aimed to improve the accuracy of the prediction model by considering not only structured data but also unstructured data. Unstructured data were analyzed using Word2vec and a Convolutional Neural Network (CNN), which are among the most popular text analysis technologies. Applying the proposed model to actual data from a domestic vocational training institute improved prediction accuracy by up to 20%. In addition, the support vector machine-based prediction model using both structured and unstructured data showed high prediction accuracy in the upper 90% range.

Recommendation System for Research Field of R&D Project Using Machine Learning (머신러닝을 이용한 R&D과제의 연구분야 추천 서비스)

  • Kim, Yunjeong;Shin, Donggu;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.12 / pp.1809-1816 / 2021
  • To identify the latest research trends from data on national R&D projects and to produce and use meaningful information, the national R&D information service also needed automatic classification technology, so we conducted research to automatically classify and recommend research fields. About 450,000 national R&D project records from 2013 to 2020 were collected and used for training and evaluation. A model was selected after data preprocessing, analysis, and performance analysis on the valid data among the collected data. The performance of Word2vec, GloVe, and fastText was compared to derive the optimal model combination. In the experiment, the accuracy for only the subcategories used as essential items of task information was 90.11%. This model is expected to be applicable to automatic classification studies of other classification systems with a hierarchical structure similar to that of the national science and technology standard classification of research fields.

Metaverse Platform Customer Review Analysis Using Text Mining Techniques (텍스트 마이닝 기법을 활용한 메타버스 플랫폼 고객 리뷰 분석)

  • Hye Jin Kim;Jung Seung Lee;Soo Kyung Kim
    • Journal of Information Technology Applications and Management / v.31 no.1 / pp.113-122 / 2024
  • This comprehensive study delves into the analysis of user review data across various metaverse platforms, employing advanced text mining techniques such as TF-IDF and Word2Vec to gain insights into user perceptions. The primary objective is to uncover the factors that contribute to user satisfaction and dissatisfaction, thereby providing a nuanced understanding of user experiences in the metaverse. Through TF-IDF analysis, the research identifies key words and phrases frequently mentioned in user reviews, highlighting aspects that resonate positively with users, such as the ability to engage in creative activities and social interactions within these virtual environments. Word2Vec analysis further enriches this understanding by revealing the contextual relationships between words, offering a deeper insight into user sentiments and the specific features that enhance their engagement with the platforms. A significant finding of this study is the identification of common grievances among users, particularly related to the processes of refunds and login, which point to broader issues within payment systems and user interface designs across platforms. These insights are critical for developers and operators of metaverse platforms, suggesting a focused approach towards enhancing user experiences by amplifying positive aspects. The research underscores the importance of continuous improvement in user interface design and the transparency of payment systems to foster a loyal user base. By providing a comprehensive analysis of user reviews, this study offers valuable guidance for the strategic development and optimization of metaverse platforms, ensuring they remain responsive to user needs and continue to evolve as vibrant, engaging virtual environments.
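
The TF-IDF keyword step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the tokenized reviews in `REVIEWS` are invented, the helper name `tf_idf` is hypothetical, and the weighting shown (length-normalized term frequency times unsmoothed log inverse document frequency) is only one common variant.

```python
import math
from collections import Counter

# Hypothetical tokenized reviews (assumption for illustration only).
REVIEWS = [
    ["app", "refund", "request", "ignored", "refund", "slow"],
    ["app", "login", "error", "again", "login", "failed"],
    ["app", "fun", "building", "with", "friends"],
]


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


scores = tf_idf(REVIEWS)

# Highest-scoring term of each review; a term that occurs in every review,
# such as "app" here, gets an idf of log(1) = 0 and never ranks first.
top_terms = [max(doc_scores, key=doc_scores.get) for doc_scores in scores]
print(top_terms)
```

Terms that are frequent in one review but rare across the corpus score highest, which is how complaint topics such as refunds or login problems would surface in this kind of analysis.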