• Title/Summary/Keyword: Bag of words

The MeSH-Term Query Expansion Models using LDA Topic Models in Health Information Retrieval (MeSH 기반의 LDA 토픽 모델을 이용한 검색어 확장)

  • You, Sukjin
    • Journal of Korean Library and Information Science Society
    • /
    • v.52 no.1
    • /
    • pp.79-108
    • /
    • 2021
  • Information retrieval in the health field has several challenges. Health information terminology is difficult for consumers (laypeople) to understand. Formulating a query with professional terms is not easy for consumers because health-related terms are more familiar to health professionals. If health terms related to a query were added automatically, consumers could find relevant information more easily. The proposed query expansion (QE) models show how to expand a query using MeSH terms. The documents were represented by the MeSH terms found in the full-text articles (i.e., Bag-of-MeSH), and these MeSH terms were then used to generate LDA (Latent Dirichlet Allocation) topic models. A query and the top k retrieved documents were used to find MeSH terms as topic words related to the query. LDA topic words were filtered by threshold values of topic probability (TP) and word probability (WP). Threshold values were effective in an LDA model with a specific number of topics to increase IR performance in terms of infAP (inferred Average Precision) and infNDCG (inferred Normalized Discounted Cumulative Gain), which are common IR metrics for large data collections with incomplete judgments. The top k words were chosen by a word score based on TP × WP and on retrieved document ranking in an LDA model with specific thresholds. The QE model with specific thresholds for TP and WP showed improved mean infAP and infNDCG scores in an LDA model compared with the baseline result.
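
The filtering and scoring step described in this abstract can be illustrated with a minimal Python sketch. The abstract only states that the score is based on TP × WP and on document ranking, so dividing by the document rank is an assumption made here, as are the function name `expand_query`, the input layout, and the toy probabilities.

```python
from typing import Dict, List, Tuple


def expand_query(
    doc_topics: List[Dict[int, float]],        # one dict per retrieved doc, in rank order: topic -> TP
    topic_words: Dict[int, Dict[str, float]],  # topic -> {MeSH term -> WP}
    tp_min: float,
    wp_min: float,
    k: int,
) -> List[Tuple[str, float]]:
    """Return the top-k expansion terms after TP/WP threshold filtering."""
    scores: Dict[str, float] = {}
    for rank, topics in enumerate(doc_topics, start=1):
        for topic, tp in topics.items():
            if tp < tp_min:
                continue  # topic is too weak for this document
            for term, wp in topic_words[topic].items():
                if wp < wp_min:
                    continue  # term is too weak within this topic
                # ASSUMPTION: higher-ranked documents count more (1/rank discount).
                score = tp * wp / rank
                scores[term] = max(scores.get(term, 0.0), score)
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy, hypothetical inputs: two retrieved documents and two topics.
doc_topics = [{0: 0.7, 1: 0.2}, {1: 0.9}]
topic_words = {
    0: {"Hypertension": 0.5, "Diet": 0.05},
    1: {"Insulin": 0.4, "Hypertension": 0.2},
}
expansion = expand_query(doc_topics, topic_words, tp_min=0.3, wp_min=0.1, k=2)
```

With these toy inputs, "Diet" is dropped by the WP threshold and topic 1 is dropped for the first document by the TP threshold, so only "Hypertension" and "Insulin" survive as expansion terms.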

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.59-83
    • /
    • 2018
  • With the increasing importance of sentiment analysis to grasp the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In the sentiment analysis of English texts by deep learning, natural language sentences included in training and test datasets are usually converted into sequences of word vectors before being entered into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained through splitting a sentence by space characters. There are several ways to derive word vectors, one of which is Word2Vec used for producing the 300 dimensional Google word vectors from about 100 billion words of Google News data. They have been widely used in the studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, cameras, etc. Unlike English, morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for a word '예쁘고', the morphemes are '예쁘(= adjective)' and '고(=connective ending)'. Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morphemes as a basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vector' as an input to a deep learning model rather than 'word vector' which is mainly used in English text. The morpheme vector refers to a vector representation for the morpheme and can be derived by applying an existent word vector derivation mechanism to the sentences divided into constituent morphemes. By the way, here come some questions as follows. What is the desirable range of POS(Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? 
Is it proper to apply a typical word vector model which primarily relies on the form of words to Korean with a high homonym ratio? Will the text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when drawing morpheme vectors from Korean product reviews with a lot of grammatical mistakes and variations? We seek to find empirical answers to these fundamental issues, which may be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarized these issues as three central research questions as follows. First, which is better effective, to use morpheme vectors from grammatically correct texts of other domain than the analysis target, or to use morpheme vectors from considerably ungrammatical texts of the same domain, as the initial input of a deep learning model? Second, what is an appropriate morpheme vector derivation method for Korean regarding the range of POS tags, homonym, text preprocessing, minimum frequency? Third, can we get a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? As an approach to these research questions, we generate various types of morpheme vectors reflecting the research questions and then compare the classification accuracy through a non-static CNN(Convolutional Neural Network) model taking in the morpheme vectors. As for training and test datasets, Naver Shopping's 17,260 cosmetics product reviews are used. To derive morpheme vectors, we use data from the same domain as the target one and data from other domain; Naver shopping's about 2 million cosmetics product reviews and 520,000 Naver News data arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. 
First, they come from two types of data source; Naver news of high grammatical correctness and Naver shopping's cosmetics product reviews of low grammatical correctness. Second, they are distinguished in the degree of data preprocessing, namely, only splitting sentences or up to additional spelling and spacing corrections after sentence separation. Third, they vary concerning the form of input fed into a word vector model; whether the morphemes themselves are entered into a word vector model or with their POS tags attached. The morpheme vectors further vary depending on the consideration range of POS tags, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived through CBOW(Continuous Bag-Of-Words) model with the context window 5 and the vector dimension 300. It seems that utilizing the same domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of any POS tags including incomprehensible category lead to the better classification accuracy. The POS tag attachment, which is devised for the high proportion of homonyms in Korean, and the minimum frequency standard for the morpheme to be included seem not to have any definite influence on the classification accuracy.
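
As a rough illustration of the input forms compared in this abstract, the Python sketch below builds morpheme tokens with or without POS tags attached and then forms the (context, target) pairs that a CBOW model with a context window of 5 would train on. It does not train any vectors. The tag labels `VA` and `EC`, the `form/TAG` token format, and the function names are assumptions for illustration only.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (surface form, POS tag)


def to_tokens(morphemes: List[Morpheme], attach_pos: bool) -> List[str]:
    """Turn analyzed morphemes into tokens, optionally as 'form/TAG'."""
    return [f"{form}/{tag}" if attach_pos else form for form, tag in morphemes]


def cbow_pairs(tokens: List[str], window: int = 5) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs; CBOW predicts the target from its context."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


# The word '예쁘고' from the abstract, split into its two morphemes.
# ASSUMPTION: 'VA' (adjective) and 'EC' (connective ending) are illustrative tag names.
word = [("예쁘", "VA"), ("고", "EC")]
plain = to_tokens(word, attach_pos=False)
tagged = to_tokens(word, attach_pos=True)
pairs = cbow_pairs(tagged, window=5)
```

Attaching the tag makes homonymous morphemes with different parts of speech into distinct tokens, which is the motivation the abstract gives for that variant.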

Updates of Nursing Evidence-Based Practice Guideline for Indwelling Urinary Catheterization (근거기반 유치도뇨간호 실무지침 개정)

  • Park, Kyung Hee;Choo, Hee Jung;Seo, Hyun Ju;Hong, Hae Kyung;Lee, Joohyun;Lim, Kyung Choon
    • Journal of Korean Clinical Nursing Research
    • /
    • v.29 no.3
    • /
    • pp.211-222
    • /
    • 2023
  • Purpose: This study was conducted to update the existing evidence-based nursing clinical practice guideline for indwelling urinary catheterization (IUC). Methods: The guideline was revised in 22 steps based on international standards. The quality of the practice guidelines used for the revision was evaluated using the Appraisal of Guidelines for Research and Evaluation II. The content appropriateness and applicability of the draft recommendations were evaluated using the RAND/UCLA Appropriateness Method, a decision-making method developed by the RAND Corporation. Four guidelines were used for the revision. Results: The updated nursing practice guideline for IUC consisted of 9 domains and 134 recommendations. The numbers of recommendations in each domain were: 4 Assessment, 20 Equipment, 11 Catheter insertion, 52 Catheter maintenance, 4 Catheter and drainage bag change, 9 Catheter removal, 22 Complications management, 5 Education and consult, and 7 Hospital support. The recommendation grades were A for 8.2%, B for 38.1%, and C for 53.7%. Among these, major revisions were made to 11 recommendations (8.2%). A total of 29 recommendations (21.6%) were newly added, 30 recommendations (22.4%) had minor revisions such as changes or additions to some words or sentences, and 13 recommendations (9.7%) were deleted. Conclusion: The revised nursing practice guideline is expected to serve as an evidence-based practice guideline for IUC in Korea. This guideline will provide health care providers, patients, and caregivers with information to help manage IUC, leading to improved patient outcomes.

A Hybrid Proposed Framework for Object Detection and Classification

  • Aamir, Muhammad;Pu, Yi-Fei;Rahman, Ziaur;Abro, Waheed Ahmed;Naeem, Hamad;Ullah, Farhan;Badr, Aymen Mudheher
    • Journal of Information Processing Systems
    • /
    • v.14 no.5
    • /
    • pp.1176-1194
    • /
    • 2018
  • Object classification using image content is a major challenge in computer vision. Superpixel information can be used to detect and classify objects in an image based on their locations. In this paper, we propose a methodology to detect and classify the locations of an image's pixels using an enhanced bag of words (BOW). It calculates the initial position of each segment of an image using superpixels and then ranks the segments according to their region score. This information is then used to extract local and global features using a hybrid approach of Scale Invariant Feature Transform (SIFT) and GIST, respectively. To enhance the classification accuracy, a feature fusion technique is applied to combine the local and global feature vectors through a weight parameter. A support vector machine, a supervised algorithm, is used as the classifier to evaluate the proposed methodology. The Pascal Visual Object Classes Challenge 2007 (VOC2007) dataset is used in the experiments. The proposed approach produced high-quality, class-independent object locations, with a mean average best overlap (MABO) of 0.833 at 1,500 locations, resulting in a better detection rate. Compared with previous approaches, it gave better classification results for the non-rigid classes.
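
The abstract says local (SIFT-based) and global (GIST) feature vectors are combined "through weight parameter" but does not give the formula. The Python sketch below shows one common reading, a weighted concatenation of L2-normalized vectors. The formula, the weight `w`, and the function names are assumptions, not the authors' method.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(local: Sequence[float], global_: Sequence[float], w: float) -> List[float]:
    """ASSUMED fusion: concatenate w * local with (1 - w) * global."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * x for x in l2_normalize(local)] + [(1.0 - w) * x for x in l2_normalize(global_)]


# Toy vectors standing in for a BOW-of-SIFT histogram and a GIST descriptor.
fused = fuse([3.0, 4.0], [0.0, 2.0], w=0.5)
```

Normalizing each part first keeps the weight meaningful when the two descriptors have very different scales; the fused vector would then be passed to a classifier such as an SVM.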

A Historical Consideration on the External Treatment theories and diseases for which medicine is efficacious (외치료법(外治療法)의 이론(理論)과 적응증(適應症)에 대한 사적(史的) 고찰(考察))

  • Moon, Woo-Sang;Lee, Byung-Wook;Ahn, Sang-Woo;Kim, Eun-Ha
    • Korean Journal of Oriental Medicine
    • /
    • v.10 no.2
    • /
    • pp.1-21
    • /
    • 2004
  • 1) Objective: External treatments have various curative effects and have therefore been used to treat a wide range of patients. In present-day South Korea, however, their sphere of application is limited. We therefore set out to identify their sphere of application and detailed methods in the classics of oriental medicine. 2) Methodologies: We researched the history of external treatment according to the following procedure. (1) Making a list of related words: We used existing technical books on external treatment to make a list of terms connected with external treatment. It includes not only technical terms but also general terms. (2) Searching sentences: We searched for sentences that contain terms related to external treatment. (3) Analysis of related sentences: We classified the retrieved sentences by disease. (4) Analysis of external treatment methods. 3) Conclusions: People have long used external treatment to cure various diseases. According to the 「Nei-Jing」, hot compress therapy, fumigation therapy, and bathing therapy were used to cure blockage syndrome, muscle disease, carbuncle, and cellulitis. Thereafter, the sphere of external treatment gradually enlarged. (1) Its sphere eventually came to include dermatologic, psychologic, internal, ophthalmic, otolaryngologic, obstetric, gynecologic, pediatric, and surgical diseases. (2) External treatment methods have included hot compress therapy, fumigation therapy, bathing therapy, application therapy, medication bag therapy, medication plug therapy, medication massotherapy, aroma therapy, and so on. (3) Medication types used in external treatment have included ointment, juice, infusion, powder, suppository, and so on.

Consumers' perceptions of professional laundry shops using semantic network analysis (의미 네트워크 분석을 활용한 세탁전문점에 대한 소비자 인식 연구)

  • Kim, Ji-Yeon;Lee, Kyu-Hye
    • The Research Journal of the Costume Culture
    • /
    • v.27 no.6
    • /
    • pp.645-653
    • /
    • 2019
  • Laundry services are becoming more specialized and diversified. Therefore, this study investigated consumers' perceptions of professional laundry shops by analyzing social media data. For this purpose, text data from blogs, cafés, and Q&A sections ('Ji-Sik-In') on the portal site, naver.com, was collected. Sixty-four keywords were extracted from 2,213 social texts and transformed into a one-mode matrix using KrKwic, a program for the analysis of Korean text. Semantic network analysis was conducted to understand the network structure and the results were visualized using NodeXL. Keywords included fashion items and materials that require specialized professional laundry services, words related to the establishment of laundry shops, and laundry shop brands. Essential keywords of professional laundry shops included 'luxury,' 'footwear,' 'removal,' 'bag,' 'leather,' 'sneakers,' 'padding,' 'premium,' 'dyeing,' and 'franchise.' These results could be used to deduce that consumers perceive a professional laundry shop as a franchise shop offering specialized professional laundry services. A cluster analysis was conducted to identify the types of consumer perceptions of professional laundry shops. The network was divided into three groups: 'specialized professional laundry service,' 'laundry and repair of winter coats and jackets,' and 'the establishment of a professional laundry shop.' According to the results, consumers perceive professional laundry shops as franchises that offer specialized professional laundry services rather than general laundry services. Therefore, professional laundry shops need a strategy to develop special laundry services that differentiate them from other companies and communicate with consumers about these services.

Effective Korean sentiment classification method using word2vec and ensemble classifier (Word2vec과 앙상블 분류기를 사용한 효율적 한국어 감성 분류 방안)

  • Park, Sung Soo;Lee, Kun Chang
    • Journal of Digital Contents Society
    • /
    • v.19 no.1
    • /
    • pp.133-140
    • /
    • 2018
  • Accurate sentiment classification is an important research topic in sentiment analysis. This study proposes an efficient method for Korean sentiment classification using word2vec and ensemble methods, both of which have been widely studied recently. For 200,000 Korean movie review texts, we generated a POS-based BOW feature, a word2vec feature, and an integrated feature combining the two representations. For sentiment classification, we used the single classifiers Logistic Regression, Decision Tree, Naive Bayes, and Support Vector Machine, and the ensemble classifiers Adaptive Boost, Bagging, Gradient Boosting, and Random Forest. The integrated feature representation, composed of a BOW feature including adjectives and adverbs and a word2vec feature, showed the highest sentiment classification accuracy. Empirical results show that SVM, a single classifier, has the highest performance, while the ensemble classifiers show similar or slightly lower performance than the single classifier.
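
A minimal Python sketch of an integrated feature of the kind this abstract describes: a BOW count vector restricted to adjectives and adverbs, concatenated with a word2vec-style document vector. Averaging the word vectors, the tag names `ADJ`/`ADV`/`NOUN`, the toy embeddings, and the function names are all assumptions; the abstract does not say how the word2vec feature was pooled.

```python
from typing import Dict, List, Sequence, Set, Tuple

Token = Tuple[str, str]  # (word, POS tag)


def pos_bow(tokens: List[Token], vocab: Sequence[str], keep_pos: Set[str]) -> List[float]:
    """Count vocabulary words, keeping only tokens with the selected POS tags."""
    counts = {w: 0 for w in vocab}
    for word, pos in tokens:
        if pos in keep_pos and word in counts:
            counts[word] += 1
    return [float(counts[w]) for w in vocab]


def mean_embedding(tokens: List[Token], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """ASSUMED pooling: average the vectors of all tokens that have an embedding."""
    total = [0.0] * dim
    n = 0
    for word, _ in tokens:
        vec = embeddings.get(word)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
            n += 1
    return [t / n for t in total] if n else total


def integrated_features(tokens, vocab, keep_pos, embeddings, dim) -> List[float]:
    """Concatenate the POS-filtered BOW vector with the mean embedding."""
    return pos_bow(tokens, vocab, keep_pos) + mean_embedding(tokens, embeddings, dim)


# Toy review and hypothetical 2-dimensional embeddings.
review = [("good", "ADJ"), ("movie", "NOUN"), ("really", "ADV"), ("good", "ADJ")]
vocab = ["good", "really", "bad"]
embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
features = integrated_features(review, vocab, {"ADJ", "ADV"}, embeddings, dim=2)
```

The resulting vector has one slot per vocabulary word followed by the embedding dimensions, and could be fed to any of the single or ensemble classifiers listed in the abstract.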

Mobile Camera-Based Positioning Method by Applying Landmark Corner Extraction (랜드마크 코너 추출을 적용한 모바일 카메라 기반 위치결정 기법)

  • Yoo Jin Lee;Wansang Yoon;Sooahm Rhee
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.6_1
    • /
    • pp.1309-1320
    • /
    • 2023
  • The technological development and popularization of mobile devices have developed so that users can check their location anywhere and use the Internet. However, in the case of indoors, the Internet can be used smoothly, but the global positioning system (GPS) function is difficult to use. There is an increasing need to provide real-time location information in shaded areas where GPS is not received, such as department stores, museums, conference halls, schools, and tunnels, which are indoor public places. Accordingly, research on the recent indoor positioning technology based on light detection and ranging (LiDAR) equipment is increasing to build a landmark database. Focusing on the accessibility of building a landmark database, this study attempted to develop a technique for estimating the user's location by using a single image taken of a landmark based on a mobile device and the landmark database information constructed in advance. First, a landmark database was constructed. In order to estimate the user's location only with the mobile image photographing the landmark, it is essential to detect the landmark from the mobile image, and to acquire the ground coordinates of the points with fixed characteristics from the detected landmark. In the second step, by applying the bag of words (BoW) image search technology, the landmark photographed by the mobile image among the landmark database was searched up to a similar 4th place. In the third step, one of the four candidate landmarks searched through the scale invariant feature transform (SIFT) feature point extraction technique and Homography random sample consensus(RANSAC) was selected, and at this time, filtering was performed once more based on the number of matching points through threshold setting. In the fourth step, the landmark image was projected onto the mobile image through the Homography matrix between the corresponding landmark and the mobile image to detect the area of the landmark and the corner. 
Finally, the user's location was estimated through the location estimation technique. As a result of analyzing the performance of the technology, the landmark search performance was measured to be about 86%. As a result of comparing the location estimation result with the user's actual ground coordinate, it was confirmed that it had a horizontal location accuracy of about 0.56 m, and it was confirmed that the user's location could be estimated with a mobile image by constructing a landmark database without separate expensive equipment.
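
The second and third steps of this abstract (BoW search for the four most similar landmarks, then selection by number of matching points with a threshold) can be sketched in Python as follows. Cosine similarity over BoW histograms is an assumed similarity measure, and `match_counts` stands in for the SIFT and homography RANSAC matching results, which are not computed here. All names and numbers are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two BoW histograms (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def top_candidates(query: Sequence[float], db: Dict[str, Sequence[float]], n: int = 4) -> List[str]:
    """Return the n database landmarks most similar to the query histogram."""
    ranked = sorted(db, key=lambda name: (-cosine(query, db[name]), name))
    return ranked[:n]


def select_landmark(candidates: List[str], match_counts: Dict[str, int], min_matches: int) -> Optional[str]:
    """Pick the candidate with the most matching points, if it passes the threshold."""
    best = max(candidates, key=lambda name: match_counts.get(name, 0), default=None)
    if best is None or match_counts.get(best, 0) < min_matches:
        return None
    return best


# Toy visual-word histograms for five hypothetical landmarks.
db = {"A": [1, 0, 0], "B": [0, 1, 0], "C": [1, 1, 0], "D": [0, 0, 1], "E": [1, 0, 1]}
candidates = top_candidates([1, 0, 0], db, n=4)
# ASSUMPTION: these counts would come from SIFT matching with homography RANSAC.
chosen = select_landmark(candidates, {"A": 12, "C": 40, "E": 3, "B": 0}, min_matches=10)
```

Returning `None` when no candidate reaches the threshold mirrors the extra filtering the abstract mentions, so a poor match is rejected rather than passed on to location estimation.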

A Study on the Empathy of the Teenage Audience at the Cheong Kong Festival - Focusing on the 3rd Performing Arts Festival for Youth - (청공축제의 청소년 관객 공감 양상 연구 - '제3회 청소년을 위한 공연예술축제'를 중심으로)

  • Oh, Pan-Jin
    • (The) Research of the performance art and culture
    • /
    • no.39
    • /
    • pp.609-635
    • /
    • 2019
  • This study analyzed five official entries in the 3rd Cheong Kong Festival contest and analyzed the patterns of teen audience empathy. The tools used for this analysis were 'characters, acting, background and theme'. Firstly, characters were mostly teenagers and out-of-school teenagers, but there were other performances that focused on the relationship between teenagers and adults or focused on the youth, which the teen audience preferred. And they preferred realism acting to emotional acting and preferred musical acting to realism acting. In addition, the background of the events covered in the performance was evaluated to be like this: the closer the audience was to the youth, the higher the audience sympathized with the performance, and the closer the subject matter was to the youth's interest, the more positive it received. In summing up the opinions of the youth evaluation team, the first audience-participating Sinpa Theater, "Mr. X" was evaluated to expand the scope of teenagers to 20s and to show the negative and heavy reality as fun and beneficial one. Secondly, when it comes to non-prejudiced youth theatre "The Turtle", which have a high level of empathy, it was evaluated to shape the prejudice about others through the symbol of 'bag'. Thirdly, regarding the time-traveling retro-style youth theatre of the 'a jam-packed Bus', it was evaluated to be a well-made retro-style youth theatre. Regarding the 'Lunar Eclipse', which showed the aesthetic of the relationship, scenes were evaluated to be built with omission and restraint. Regarding "B Officer on and Love Letter", it was evaluated to be adapted to a musical from Hyun Jingun's novel, which was released 100 years ago. Lastly, the performance desired by the youth evaluation team was a performance with a high level of 'sympathy' and 'education'. 
In other words, they preferred performances that empathize with the emotions and thoughts of teenagers, and on the other hand, they wanted to see performances that allowed them to see the world broadly outside their own worlds. If youth theater is created by referring to the evaluation of youth as it is in this study, the audience will be more sympathetic to performances.

A Study on the Palsapum (八賜品, Eight-Bestowed Things), Treasure No. 440, in Tong-Yong Shrine to the Loyal Dead in Korea (보물 제440호 통영 충렬사 팔사품(八賜品) 연구)

  • Jang, Kyung-hee
    • Journal of Korean Historical Folklife
    • /
    • no.46
    • /
    • pp.195-237
    • /
    • 2014
  • Palsapum are ornaments that reveal the purpose of the commander of the three naval forces as well as symbols that commemorate the greatness of Admiral Yi, Sun-Shin. In 1966, they were designated as Treasure No. 440 based on their value; however, they have not received attention from academia because they are relics from China. This study compares and analyzes documents, paintings, and relevant references from Korea and China focusing on Palsapum, identifies their formal characteristics, and examines their historical value, such as the years and location of creation. As a result, the study determines that five of them are original, but three were newly created by later generations. The five, Dodogin (都督印, Commander's seal), Yeongpae (令牌, Commander's tablet), Gwido (鬼刀, Replica of the devil sword), Chamdo (斬刀, Replica of the decapitation sword), and Gognapal (bugle), were created in the Ming Dynasty before 1598 and delivered by the hands of General Chen Lin. The other three, Dokjeongi (督戰旗, Battle flag), Hongsoryeonggi (紅小令旗, Commander's flag), and Namsoryeonggi (藍小令旗, Commander's flag), were created in the 19th century in the Joseon Dynasty. After analysis of the former relics, the study determines that they are not official relics with the dignity of the Ming Dynasty but personal relics with regional characteristics; in other words, Palsapum are not royal gifts from Emperor Shenzong to Admiral Yi, Sun-Shin, but personal mementos left by General Chen Lin in the Tongjeyoung to celebrate the admiral. The names, variety, numbers, and appurtenances of Palsapum have changed over time as follows. First, the scholars of Joseon in the 17th century focused only on Dodogin. It was certainly created in the Ming Dynasty; however, it was a personal stamp, so it was considered to be not from the emperor but from General Chen Lin. 
Second, Palsapum was called Palsamul and consisted of 14 pieces of 8 kinds in the 18th century; this is confirmed in the 「Dosul (圖說, stories with pictures)」 of 『Yi Chungmugong Literary Collection』. The sizes of five relics including Dodogin are similar to the records, but their patterns and shapes are exotic or cannot be found in Joseon. Thus, they reflect the regional characteristics of Guangdong province. Third, they were called Palsapum and consisted of 15 pieces of 8 kinds in the 19th century; this is confirmed on a sixteen-fold folding screen drawn by Shin, Gwan-Ho in 1861. The stamp box, tablet bag, and three flags were newly created to engrave Joseon-style letters and patterns on damageable materials such as leather and cloth. The relics that are easily destroyed have been renewed even after the 19th century. Last, there are many misunderstandings about Palsapum due to governmental indifference and improper management of records, even though they were designated as a treasure very early. Thus, authorities should take an interest in Palsapum and provide measures for the stable maintenance of the relics; this will let people remember not only the history of cooperation between Korea and China to stop the Japanese ambition, but also Admiral Yi, Sun-Shin and General Chen Lin, who brought victory in the Japanese invasions of Korea.