• Title/Abstract/Keywords: semantic topic

Search results: 189 items (processing time: 0.023 s)

Bag of Visual Words Method based on PLSA and Chi-Square Model for Object Category

  • Zhao, Yongwei;Peng, Tianqiang;Li, Bicheng;Ke, Shengcai
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 9, No. 7 / pp. 2633-2648 / 2015
  • The problems of visual-word synonymy and ambiguity persist in conventional object categorization methods based on the bag of visual words (BoVW) model. In addition, noisy visual words, so-called "visual stop-words", degrade the semantic resolution of the visual dictionary. To address this, a novel bag of visual words method based on PLSA and a chi-square model is proposed for object categorization. First, Probabilistic Latent Semantic Analysis (PLSA) is used to analyze the semantic co-occurrence probability of visual words, infer the latent semantic topics in images, and obtain the latent topic distributions induced by the words. Second, the KL divergence is adopted to measure the semantic distance between visual words, which yields semantically related homoionyms. Then, an adaptive soft-assignment strategy is used to realize a soft mapping between SIFT features and some of these homoionyms. Finally, the chi-square model is introduced to eliminate the "visual stop-words" and reconstruct the visual vocabulary histograms, and an SVM (Support Vector Machine) is applied to perform object classification. Experimental results indicate that the synonymy and ambiguity problems of visual words can be overcome effectively. Both the discriminative ability of the visual semantic resolution and the object classification performance are substantially improved compared with traditional methods.
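The KL-divergence step in the abstract above can be illustrated with a minimal sketch. It assumes each visual word already has a topic distribution P(z | w) from PLSA; the word labels, the toy probabilities, the symmetrised form of the divergence, and the helper names `kl_divergence`, `symmetric_kl`, and `nearest_words` are all illustrative assumptions, not details taken from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same topics.
    eps guards against log(0); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p), so the distance is symmetric.
    (Assumption: the abstract does not say which KL variant is used.)"""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def nearest_words(topic_given_word, word, k=2):
    """Return the k words whose topic distribution is closest to `word`."""
    target = topic_given_word[word]
    scored = sorted(
        (symmetric_kl(target, dist), other)
        for other, dist in topic_given_word.items()
        if other != word
    )
    return [other for _, other in scored[:k]]

# Hypothetical P(z | w) for four visual words over three latent topics.
topic_given_word = {
    "w0": [0.70, 0.20, 0.10],
    "w1": [0.65, 0.25, 0.10],
    "w2": [0.10, 0.10, 0.80],
    "w3": [0.30, 0.60, 0.10],
}

neighbours = nearest_words(topic_given_word, "w0", k=2)
print(neighbours)
```

In this toy data, `w1` has nearly the same topic profile as `w0`, so it is ranked as the closest candidate; a soft-assignment step could then spread a feature's weight over such neighbours.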

Semi Automatic Ontology Generation about XML Documents

  • Gu Mi Sug;Hwang Jeong Hee;Ryu Keun Ho;Jung Doo Yeong;Lee Keum Woo
    • 대한원격탐사학회 (Korean Society of Remote Sensing): Conference Proceedings / Proceedings of ISRS 2004 / pp. 730-733 / 2004
  • XML (eXtensible Markup Language) is becoming the standard for exchanging documents on the web. As the amount of information grows with the development of Internet technology, the semantic web is emerging to provide more exact information retrieval results than the existing web. Ontology, the basis of the semantic web, provides the basic knowledge system for expressing particular knowledge, and can therefore yield exact retrieval results. An ontology defines the concepts of a specific domain and the relationships between them, and has a hierarchy similar to a taxonomy. In this paper, we propose semi-automatic ontology generation based on XML documents, which interest many researchers as a means of knowledge expression. To construct an ontology for a particular domain, we suggest an algorithm for determining the domain; we chose as the domain the extraction of movie information from the web. We use generalized association rules, a data mining method, to generate the ontology from the tags and contents of XML documents, and XTM (XML Topic Maps), an ISO standard, as the ontology language. The advantage of this method is that, because the ontology is built from the terms frequently used in documents related to the domain, it is useful for querying and retrieving within that domain.

간호교육기관의 교육목적 및 교육목표에 대한 토픽 모델링 (Educational goals and objectives of nursing education programs: Topic modeling)

  • 박은준;옥종선;박찬숙
    • 한국간호교육학회지 / Vol. 28, No. 4 / pp. 400-410 / 2022
  • Purpose: This study aimed to understand the keywords and major topics of the educational goals and objectives of nursing educational institutions in South Korea. Methods: From May 10 to May 20, 2022, the educational goals and objectives of all 201 nursing educational institutions in South Korea were collected. Using the NetMiner program, degree and degree centrality, semantic structure, and topic modeling were analyzed. Results: The top keywords and semantic structures of educational goals included 'respect for human (life)-spirit-science-based on, global-competency-professional nurse-nursing personnel-training, professional-science-knowledge-skills, and patients-therapeutic care-relationship.' The major topics of the educational goals were clients' well-being based on science and respect for human life, a practicing nurse with capabilities and spirit, fostering nursing personnel with creativity and professionalism, and training of global nurses. The top keywords and semantic structures of the educational objectives included 'holistic care-nursing-research-action-capability, critical thinking-health-problem solving-capability, and efficiency-communication-collaboration-capability.' The major topics of the educational objectives were 'nursing professionalism, communication and problem-solving capability; changes in healthcare environments and progress in nursing practice; fostering professional nurses with creativity and global capability; and clients' health and nursing practice.' Conclusion: Educational goals in nursing presented specific nursing values and concepts, such as respect for human life, therapeutic care relationships, and the promotion of well-being. Educational objectives in nursing presented the competencies of nurses as defined by the Korean Accreditation Board of Nursing Education (KABONE). Recently, the KABONE announced new program outcomes and competencies, which will require the revision of educational goals. To achieve those educational objectives, it is suggested that the expected level of competencies be clearly defined for nursing graduates.

온톨로지 인스턴스 구축을 위한 주제 중심 웹문서 수집에 관한 연구 (A Study on Focused Crawling of Web Document for Building of Ontology Instances)

  • 장문수
    • 한국지능시스템학회논문지 / Vol. 18, No. 1 / pp. 86-93 / 2008
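  • Building an ontology that defines complex semantic relationships is highly precise, specialized work. To use a well-built ontology in an application system, a large amount of instance information must be built for the ontology classes. To extract ontology instance information, this paper proposes a focused web document crawling algorithm that extracts only the documents relevant to a given topic from a vast number of web documents, and develops a document collection system based on this algorithm. The proposed algorithm uses URL patterns to extract only topic-relevant links, which enables fast document collection. In addition, a topic relevance measure expressed as a fuzzy set over link block text intelligently judges the topical relevance of a document, improving the accuracy of focused crawling.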

Analysis of International Research Trends on Metaverse

  • Mina, Shim
    • International Journal of Advanced Culture Technology / Vol. 10, No. 4 / pp. 453-459 / 2022
  • This study explored how a successful metaverse environment might be realized, and which research directions could support it, by analyzing international metaverse research trends using topic modeling. A total of 208 papers with metaverse as a keyword were selected from WoS and ScienceDirect, and quantitative frequency analysis and topic modeling were performed. The analysis confirmed that research has rapidly increased after 2022. The main keywords of the research topics were 'second', 'life', 'learning', 'reality', 'metaverse', 'virtual', 'blockchain', 'nft', 'medical', 'avatar', etc. The topics 'Second life & Education' and 'Virtual Reality & Medical' accounted for a large proportion, 57%, followed by 'Blockchain & Cryptocurrency', 'Avatar & Interaction', and 'Sensing and Device'. Semantic analysis showed that current metaverse research is focused on application and utilization, and that research on underlying technologies and devices is also active. It is therefore necessary to identify the commonalities and differences between domestic and foreign studies, and to study application methods that consider the domestic environment. In addition, new jurisprudence research is increasingly necessary, along with the prediction of new problems. The results of this study are expected to provide the right research direction for domestic researchers in the era of digital transformation and to contribute to the realization of a digital society.

A Development Method of Framework for Collecting, Extracting, and Classifying Social Contents

  • Cho, Eun-Sook
    • 한국컴퓨터정보학회논문지 / Vol. 26, No. 1 / pp. 163-170 / 2021
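  • As big data is applied in diverse ways across many fields, the big data market is expanding from hardware into the service software sector. In particular, it is growing into a huge platform market that provides applications for holistic, intuitive visualization of big data, including the ability to grasp and understand its meaning and the results of analysis. Within this market, demand for big data extraction and analysis using social media such as SNS (Social Network Service) is very active, from companies to individuals. However, despite the great demand for collecting and analyzing social media data for user trend analysis and marketing, research remains insufficient on resolving the difficulty of dynamic integration caused by the heterogeneity of the various social media service interfaces and the complexity of building and operating software platforms. This paper therefore presents a method for developing a framework that integrates and operates, as a single process, the collection, extraction, and classification of social media data. The proposed framework resolves the problem of heterogeneous social media data collection channels through the adapter pattern, and improves the accuracy of social topic extraction and classification through a semantic-association-based extraction technique and a topic-association-based classification technique.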

Non-Simultaneous Sampling Deactivation during the Parameter Approximation of a Topic Model

  • Jeong, Young-Seob;Jin, Sou-Young;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 7, No. 1 / pp. 81-98 / 2013
  • Since Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) were introduced, many revised or extended topic models have appeared. Because the likelihood of these models is intractable, training any topic model requires an approximation algorithm such as variational approximation, Laplace approximation, or Markov chain Monte Carlo (MCMC). Although these approximation algorithms perform well, training a topic model is still computationally expensive given the large amount of data it requires. In this paper, we propose a new method, called non-simultaneous sampling deactivation, for efficient approximation of the parameters of a topic model. In traditional approximation algorithms, each random variable is sampled or obtained over a single predefined burn-in period; our method is instead based on the observation that the random variable nodes in a topic model converge over different periods. During the iterative approximation process, the proposed method allows each random variable node to be terminated or deactivated once it has converged. Compared with traditional approaches, in which every node is usually deactivated concurrently, the proposed method therefore makes inference more efficient in terms of time and memory. We do not propose a new approximation algorithm, but a new process applicable to existing approximation algorithms. Through experiments, we show the time and memory efficiency of the method, and discuss the tradeoff between the efficiency of the approximation process and parameter consistency.

청소년 임신에 대한 연구 동향 분석: 텍스트 네트워크 분석과 토픽 모델링 (A study on research trends for pregnancy in adolescence: Focusing on text network analysis and topic modeling)

  • 박승미;곽은주;박혜옥;홍정은
    • 한국간호교육학회지 / Vol. 30, No. 2 / pp. 149-159 / 2024
  • Purpose: The aim of this study was to identify core keywords and topic groups in the "adolescent pregnancy" field of research for a better understanding of research trends in the past 10 years. Methods: Topics related to adolescent pregnancy were extracted from 3,819 articles that were published in journals between January 2013 and July 2023. Abstracts were retrieved from five databases (MEDLINE, CINAHL, Embase, RISS, and KISS). Keywords were extracted from the abstracts and cleaned using semantic morphemes. Text network analysis and topic modeling were performed using NetMiner 4.3.3. Results: The most important keywords were "health," "woman," "risk," "group," "girl," "school," "service," "family," "program," and "contraception." Topic modeling identified five topic groups: "health service," "community program for school girls," "risks for adult women," "relationship risks," and "sexual contraceptive knowledge." Conclusion: This study utilized text network analysis and topic modeling to analyze keywords from abstracts of research conducted over the past decade on adolescent pregnancy. Given that adolescent pregnancy leads to physical, mental, social, and economic issues, it is imperative to provide integrated intervention programs, including prenatal/postnatal care, psychological services, proper contraception methods, and sex education, through school and community partnerships, as well as related research studies. Nurses can play a vital role by actively engaging in prevention efforts and directly supporting and educating socially disadvantaged adolescent mothers, which could significantly contribute to improving their quality of life.

언어 네트워크 분석을 통한 IFLA의 학교도서관 가이드라인 비교·분석에 관한 연구 (A Comparative Analysis Study of IFLA School Library Guidelines Using Semantic Network Analysis)

  • 이병기
    • 한국도서관정보학회지 / Vol. 51, No. 2 / pp. 1-21 / 2020
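  • This study aims to identify the linguistic meaning of the IFLA School Library Guidelines through semantic network analysis. The guidelines exist as a first edition published in 2002 and a second edition revised in 2015. This study analyzed the 2002 and 2015 editions from a semantic network perspective and compared them. Keywords were extracted from the target texts, and a semantic network was constructed from their co-occurrence relations. Centrality (degree centrality, closeness centrality, and betweenness centrality) was analyzed on the co-occurrence network. Topic modeling was also performed using the LDA function of NetMiner 4.0. The main results are as follows. First, in terms of centrality, keywords such as 'Program, Teaching, Reading, Inquiry, Literacy, Media' rank higher in the 2015 edition than in the 2002 edition. Second, the keywords 'Inquiry' and 'Achievement', which did not appear in the top centrality list of the 2002 edition, newly appear in the degree centrality and closeness centrality rankings of the 2015 edition. Third, the topic modeling results show that, compared with the 2002 edition, the 2015 edition gives greater weight to topics on school library services, teacher librarians' teaching and learning activities, media and information literacy instruction, and participation in the curriculum.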

PC-SAN: Pretraining-Based Contextual Self-Attention Model for Topic Essay Generation

  • Lin, Fuqiang;Ma, Xingkong;Chen, Yaofeng;Zhou, Jiajun;Liu, Bo
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 8 / pp. 3168-3186 / 2020
  • Automatic topic essay generation (TEG) is a controllable text generation task that aims to generate informative, diverse, and topic-consistent essays based on multiple topics. To produce high-quality essays, a reasonable method should consider both diversity and topic consistency. Another essential issue is the intrinsic link between the topics, which helps keep the essays close to the semantics of the provided topics. However, it remains challenging for TEG to bridge the semantic gap between source topic words and target output, and a more powerful model is needed to capture the semantics of the given topics. To this end, we propose a pretraining-based contextual self-attention (PC-SAN) model built upon the seq2seq framework. For the encoder of our model, we employ a dynamically weighted sum of layers from BERT to fully utilize the semantics of the topics, which helps bridge the gap and improve the quality of the generated essays. In the decoding phase, we also transform the target-side contextual history information into the query layers to alleviate the lack of context in typical self-attention networks (SANs). Experimental results on large-scale paragraph-level Chinese corpora verify that our model is capable of generating diverse, topic-consistent text and improves on strong baselines. Furthermore, extensive analysis validates the effectiveness of contextual embeddings from BERT and contextual history information in SANs.