• Title/Summary/Keyword: 주제 중심 문서 수집 (topic-focused document collection)

Search Result 16, Processing Time 0.028 seconds

A Study on Focused Crawling of Web Document for Building of Ontology Instances (온톨로지 인스턴스 구축을 위한 주제 중심 웹문서 수집에 관한 연구)

  • Chang, Moon-Soo
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.1 / pp.86-93 / 2008
  • Constructing an ontology, which is defined by complicated semantic relations, requires precise expert skill. For a well-defined ontology to be useful in real applications, abundant instance information for the ontology classes is critical. This study develops a focused crawling algorithm that extracts the documents best fitting a topic from the Web, which overflows with documents. The proposed algorithm gathers documents at high speed by extracting topic-specific links using URL patterns. The topic fitness of link block text is represented by fuzzy sets, which improves the precision of the focused crawler.
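
The two ideas in this abstract, filtering links by URL pattern and grading link block text with a fuzzy membership value, can be sketched roughly as follows. This is only an illustration under stated assumptions: the URL patterns, the term weights, the ramp-shaped membership function, and the names `Link`, `fuzzy_fitness`, and `select_links` are all hypothetical, since the abstract does not give the paper's actual patterns or fuzzy sets.

```python
import re
from dataclasses import dataclass

# Hypothetical URL patterns for topic-specific links (not taken from the paper).
TOPIC_URL_PATTERNS = [re.compile(r"/article/\d+"), re.compile(r"[?&]doc_id=\d+")]

# Hypothetical topic terms and weights used to grade link block text.
TOPIC_TERMS = {"ontology": 1.0, "semantic": 0.8, "instance": 0.6}

# Assumed score at which the fuzzy membership value saturates at 1.0.
SATURATION = 2.0


@dataclass
class Link:
    url: str
    block_text: str  # text of the page block that contains the link


def matches_topic_pattern(url: str) -> bool:
    """Return True if the URL matches any of the assumed topic patterns."""
    return any(p.search(url) for p in TOPIC_URL_PATTERNS)


def fuzzy_fitness(block_text: str) -> float:
    """Membership degree in [0, 1] of the block text in the 'on-topic' fuzzy set.

    Uses a simple ramp: the summed term weights divided by SATURATION,
    clipped at 1.0. The real membership function is not given in the abstract.
    """
    words = re.findall(r"[a-z]+", block_text.lower())
    score = sum(TOPIC_TERMS.get(w, 0.0) for w in words)
    return min(1.0, score / SATURATION)


def select_links(links, threshold=0.5):
    """Keep links that match a URL pattern and whose block text is fit enough."""
    return [
        link for link in links
        if matches_topic_pattern(link.url) and fuzzy_fitness(link.block_text) >= threshold
    ]
```

Checking the URL pattern first is cheap, so in this sketch the fuzzy grading only decides among links that already look like document pages.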

A Focused Crawler by Segmentation of Context Information (주변정보 분할을 이용한 주제 중심 웹 문서 수집기)

  • Cho, Chang-Hee;Lee, Nam-Yong;Kang, Jin-Bum;Yang, Jae-Young;Choi, Joong-Min
    • The KIPS Transactions:PartB / v.12B no.6 s.102 / pp.697-702 / 2005
  • The focused crawler is a topic-driven document-collecting crawler that was suggested as a promising alternative for maintaining up-to-date web document indices in search engines. A major problem inherent in previous focused crawlers is the risk of missing highly relevant documents that are linked from off-topic documents. This problem mainly originates from a lack of consideration of structural information in a document. Traditional weighting methods such as TF-IDF, as employed in document classification, can lead to this problem. To improve the performance of focused crawlers, this paper proposes a scheme of locality-based document segmentation to determine the relevance of a document to a specific topic. We segment a document into a set of sub-documents using contextual features around the hyperlinks. This information is used to determine whether the crawler should fetch the documents that are linked from hyperlinks in an off-topic document.
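
A rough sketch of the segmentation idea: take the words around each hyperlink as a sub-document and score that segment on its own, instead of scoring the whole page. The fixed word window, the term-overlap score, and the names `LinkContextParser`, `link_segments`, and `segment_relevance` are assumptions made for illustration; the paper's actual contextual features are not described in the abstract.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collect visible words in document order and the position of each link."""

    def __init__(self):
        super().__init__()
        self.tokens = []  # all words seen so far, lowercased
        self.links = []   # (href, index into tokens where the anchor starts)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append((href, len(self.tokens)))

    def handle_data(self, data):
        self.tokens.extend(data.lower().split())


def link_segments(html, window=5):
    """Map each href to the words within `window` tokens of its anchor start."""
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    segments = {}
    for href, pos in parser.links:
        start = max(0, pos - window)
        segments[href] = parser.tokens[start:pos + window]
    return segments


def segment_relevance(tokens, topic_terms):
    """Fraction of tokens in the segment that are topic terms (assumed score)."""
    if not tokens:
        return 0.0
    hits = sum(t.strip(".,;:!?") in topic_terms for t in tokens)
    return hits / len(tokens)
```

With a page that is mostly off-topic but contains one on-topic link, the segment around that link scores higher than the page as a whole, which is the situation in which a whole-document weighting would discard the link.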

Efficient Document Classification for Web Document Collection (웹 문서 수집을 위한 효율적인 문서 분류)

  • Lee, Jung-Hun;Cheon, Suh-Hyun;Kim, Sun-Hee
    • Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.397-401 / 2006
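  • Classifying web documents by topic as they are collected and managed has become essential for retrieving only the information users want from web documents of diverse formats. That is, for accurate and fast information retrieval, web documents must be classified and collected according to document format. Accordingly, if the formats that make up documents in the web environment are distinguished as text or image data and a classification technique suited to each format is used, accurate information retrieval becomes possible. This paper proposes a topic-centered hybrid web document classification method that uses text and URLs. For text-format content, the method uses topic-centered document classification; when the text information is of low utility, it classifies and collects documents using the topic distribution of URLs. This makes it possible to classify web documents of various formats and raises the accuracy of topic-based document classification.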
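
The hybrid scheme (classify by text when the text is informative, otherwise fall back to the topic distribution observed for similar URLs) might be sketched as below. Everything specific here is an assumption for illustration: the keyword sets, the `MIN_TOKENS` cutoff for "low utility" text, the choice of host plus first path segment as the URL key, and the class name `HybridClassifier`.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical topic keyword sets; a real system would use a trained classifier.
TOPIC_KEYWORDS = {
    "sports": {"match", "league", "score"},
    "finance": {"stock", "market", "bank"},
}

# Assumed cutoff: pages with fewer tokens are treated as low-utility text.
MIN_TOKENS = 5


def classify_text(tokens):
    """Return the topic with the most keyword hits, or None if there are none."""
    counts = {topic: sum(t in kws for t in tokens) for topic, kws in TOPIC_KEYWORDS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def url_key(url):
    """Group URLs by host and first path segment (an assumed granularity)."""
    parsed = urlparse(url)
    first_segment = parsed.path.strip("/").split("/")[0]
    return f"{parsed.netloc}/{first_segment}"


class HybridClassifier:
    def __init__(self):
        self.url_topics = {}  # url_key -> Counter of topics seen under that key

    def classify(self, url, text):
        tokens = re.findall(r"[a-z]+", text.lower())
        topic = classify_text(tokens) if len(tokens) >= MIN_TOKENS else None
        key = url_key(url)
        if topic is not None:
            # Text was informative: record the topic for this URL group.
            self.url_topics.setdefault(key, Counter())[topic] += 1
            return topic
        # Text was not informative: use the most common topic for this URL group.
        distribution = self.url_topics.get(key)
        return distribution.most_common(1)[0][0] if distribution else None
```

In this sketch an image-only page under a path that has mostly produced sports articles would be collected as sports, while a page under an unseen path stays unclassified.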


Graph Learning System for Analyzing Bias among News Using Keyword Distance Model (주제어 문장거리를 이용한 뉴스 편향성 분석 그래프 학습)

  • Cho Chanwoo;Cho Chanhyung
    • Annual Conference on Human and Language Technology / 2023.10a / pp.533-538 / 2023
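  • Analyzing the author's intent, the topic, and the sentiment contained in a document is a core subject of natural language research. Similarly, analyzing the political and cultural bias contained in a particular text is also a very meaningful research topic. We present a method for quantifying the political bias of articles about a recent event, focusing on several newspapers and the articles they produced. The method generates graphs based on the distances between selected keywords in sentence space and analyzes bias and characteristics through machine learning on the generated graphs. We also developed a graph animation system that tracks how these graphs change over time and dynamically shows how a particular newspaper's position on a particular event changed over time. For the experiments, we collected about 2,000 articles on a recent issue from 12 newspapers. As a result, we could predict the commonly known political bias with about 82% accuracy. Newspaper articles not used in the training data yielded a similar level of accuracy. We thereby showed that political bias in newspaper articles can be characterized not by the traits of the writer or the newspaper but by the distance relationships among keywords in sentence space.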
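
One possible reading of "distance between keywords in sentence space" is the number of sentences separating two keywords in an article. The sketch below builds a weighted keyword graph on that assumption; the sentence splitter, the use of the minimum distance per pair, and the function names are illustrative choices, not the authors' published method, and the graph learning step is not shown.

```python
import re
from itertools import combinations


def sentence_positions(text, keywords):
    """Map each keyword to the indices of the sentences that contain it."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    positions = {k: [] for k in keywords}
    for index, sentence in enumerate(sentences):
        words = set(re.findall(r"\w+", sentence.lower()))
        for k in keywords:
            if k in words:
                positions[k].append(index)
    return positions


def keyword_distance_graph(text, keywords):
    """Return edges {(a, b): distance} for keyword pairs that both occur.

    The edge weight is the smallest number of sentences separating any
    occurrence of a from any occurrence of b (0 means same sentence).
    """
    positions = sentence_positions(text, keywords)
    edges = {}
    for a, b in combinations(sorted(keywords), 2):
        if positions[a] and positions[b]:
            edges[(a, b)] = min(abs(i - j) for i in positions[a] for j in positions[b])
    return edges
```

One graph per article (or per newspaper and time window) could then serve as input to a graph classifier, which is the part the abstract attributes to machine learning.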


RSS Channel Recommendation System using Focused Crawler (주제 중심 수집기를 이용한 RSS 채널 추천 시스템)

  • Lee, Young-Seok;Cho, Jung-Woo;Kim, Jun-Il;Choi, Byung-Uk
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.6 s.312 / pp.52-59 / 2006
  • Recently, the internet has grown tremendously and become rich in information, owing to an increasing number of specialized personal interests and the popularization of private cyberspaces called blogs. Many of today's blogs provide RSS, which is also known as a syndication technology. It enables blog users to receive updates automatically by registering an RSS channel address with an RSS aggregator. In other words, it keeps internet users from wasting time checking back on a web site for updates. This paper proposes ways to manage an RSS channel searching crawler and the collected RSS channels so that internet users can search for the specific RSS channel they want without obstacles. At the same time, this paper proposes RSS channel ranking based on user popularity. We thus focus on adding an index to information and web updates so that users receive appropriate information according to their user properties.

A Study on the Direction of Art Policy through Semantic Network Analysis in New Normal Era (뉴노멀(New Normal) 시대 언어네트워크 분석에 의한 예술정책 방향 연구)

  • Kim, Mi Yeon;Kwon, Byeong Woong
    • Korean Association of Arts Management / no.58 / pp.153-177 / 2021
  • This study analyzed language networks based on the theory of art policy in the New Normal era triggered by COVID-19 and on domestic and foreign policy trends. For the analysis, data containing the keywords "Corona" and "Art" were collected from Google News and web documents from March to September 2020, and 227 refined subject words were extracted. The extracted subject words were analyzed in terms of frequency and centrality using the NetMiner program. In addition, the semantic networks were visualized to analyze the relationships between the subject words. In the semantic network analysis, the most frequent subject word was "Corona," and "Culture and Art," "Art," "Performance," "Online," and "Support" were in the group with the highest frequencies. In the centrality analysis, "Corona" ranked highest, followed by "the era," "after," "post," "art," and "cultural arts"; the high-frequency words "Corona," "art," and "cultural arts" also dominated most centrality measures. In particular, the top-ranked keywords in both the frequency and the centrality analysis are "online," "support," and "policy." This can be read as indicating that, with social distancing becoming part of daily life due to COVID-19, non-face-to-face and online content is rising rapidly and support policies for the artistic community are needed.
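
The two indicators named in this abstract, word frequency and centrality in a word co-occurrence network, can be illustrated with a small standalone sketch. The study itself used a dedicated network analysis program; the code below is not that tool. Degree centrality is an assumed choice (the abstract does not say which centrality measures were used), and the function name and sample word lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequency_and_degree_centrality(documents):
    """Compute word frequency and normalized degree centrality.

    `documents` is a list of word lists. Two words are linked if they
    appear in the same document. Degree centrality of a word is its
    number of distinct neighbors divided by (number of words - 1).
    """
    frequency = Counter()
    neighbors = {}
    for words in documents:
        frequency.update(words)
        for a, b in combinations(sorted(set(words)), 2):
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    n = len(frequency)
    centrality = {
        word: (len(neighbors.get(word, ())) / (n - 1) if n > 1 else 0.0)
        for word in frequency
    }
    return frequency, centrality
```

A word can rank high on one indicator and not the other: frequency counts occurrences, while degree centrality counts how many different words it co-occurs with.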

The Qualitative study about parent's in law of multiculture family - Hermeneutical grounded theory methology - (다문화 가족 시부모들에 대한 연구 -해석학적 근거이론 방법 접근-)

  • Kim, Young Sook;Lee, Keun Moo
    • Korean Journal of Social Welfare Studies / v.42 no.2 / pp.41-70 / 2011
  • This research studies the interaction between parents-in-law and daughters-in-law in multicultural families. We took a hermeneutical grounded theory approach, and six parents-in-law from multicultural families participated in this research. Data were collected through in-depth interviews and various written records, and we constructed 9 hermeneutical themes by analyzing and categorizing 83 meaning units and 22 categories. We described the process of acculturation between parents-in-law and daughters-in-law. The results are as follows: ① sticking to self dimensions → ② strategic acceptance → ③ making a shade of co-existence → ④ self dismantling and reconstruction. Finally, we proposed a 「joint program of parents-in-law and daughters-in-law for reinforcing cultural competence」.

Chinese Communist Party's Management of Records & Archives during the Chinese Revolution Period (혁명시기 중국공산당의 문서당안관리)

  • Lee, Won-Kyu
    • The Korean Journal of Archival Studies / no.22 / pp.157-199 / 2009
  • The organization for managing records and archives did not emerge together with the founding of the Chinese Communist Party. Such management became active with the establishment of the Department of Documents (文書科) and its affiliated offices overseeing reading and safekeeping of official papers, after the formation of the Central Secretariat (中央秘書處) in 1926. Improving the work of the Secretariat's organization became the focus of critical discussions in the early 1930s. The main criticism was that the Secretariat had failed to be cognizant of its political role and degenerated into a mere "functional organization." The solution to this was the "politicization of the Secretariat's work." Moreover, influenced by the "Rectification Movement" in the 1940s, the party emphasized the responsibility of the Resources Department (材料科) that extended beyond managing documents to collecting, organizing and providing various kinds of important information data. In the meantime, maintaining security with regard to composing documents continued to be emphasized through such methods as using different names for figures and organizations or employing special inks for document production. In addition, communications between the central political organs and regional offices were emphasized through regular reports on work activities and situations of the local areas. The General Secretary not only composed the drafts of the major official documents but also handled the reading and examination of all documents, and thus played a central role in record processing. The records, called archives after undergoing document processing, were placed in safekeeping. This function was handled by the "Document Safekeeping Office (文件保管處)" of the Central Secretariat's Department of Documents.
Although the Document Safekeeping Office, also called the "Central Repository(中央文庫)", could no longer accept, beginning in the early 1930s, additional archive transfers, the Resources Department continued to strengthen throughout the 1940s its role of safekeeping and providing documents and publication materials. In particular, collections of materials for research and study were carried out, and with the recovery of regions which had been under the Japanese rule, massive amounts of archive and document materials were collected. After being stipulated by rules in 1931, the archive classification and cataloguing methods became actively systematized, especially in the 1940s. Basically, "subject" classification methods and fundamental cataloguing techniques were adopted. The principle of assuming "importance" and "confidentiality" as the criteria of management emerged from a relatively early period, but the concept or process of evaluation that differentiated preservation and discarding of documents was not clear. While implementing a system of secure management and restricted access for confidential information, the critical view on providing use of archive materials was very strong, as can be seen in the slogan, "the unification of preservation and use." Even during the revolutionary movement and wars, the Chinese Communist Party continued their efforts to strengthen management and preservation of records & archives. The results were not always desirable nor were there any reasons for such experiences to lead to stable development. The historical conditions in which the Chinese Communist Party found itself probably made it inevitable. 
The most pronounced characteristics of this process can be found in the fact that they not only pursued efficiency of records & archives management at the functional level but, while strengthening their self-awareness of the political significance impacting the Chinese Communist Party's revolution movement, they also paid attention to the value possessed by archive materials as actual evidence for revolutionary policy research and as historical evidence of the Chinese Communist Party.

An Analysis of the Status of National Research and Development Projects in Records Management (기록관리 분야 국가연구개발사업 현황 분석)

  • Hoemyeong Jeong;Soonhee Kim
    • Journal of Korean Society of Archives and Records Management / v.23 no.4 / pp.137-157 / 2023
  • The scale of research and development (R&D) investment is increasing to strengthen national competitiveness through technological innovation, leading to an increased interest in investment efficiency. In records management, the National Archives of Korea has been leading the national research and development project since 2008. Accordingly, this study analyzed R&D projects in records management regarding implementing organization, performance or outcomes, and subjects, targeting 111 National Archives of Korea contract research projects from 2008 to 2022. The analysis showed that small and medium-sized enterprises (SMEs) were the most likely to conduct research, the majority of the research outcomes were academic publications, and there were some discrepancies between the reported performance in research and the actual performance. In terms of research subjects, the most common record type was paper or print documents, and the most common area of the National Archives' work was establishing an electronic management system. In terms of the frequency of keywords in the records management process and research projects, it was found that research was mainly conducted on "preservation." Meanwhile, only 10 cases, or 9% of the 111 projects, were found to be relevant in terms of utilizing big data and developing intelligent technologies related to digital transformation. Therefore, the effectiveness of the R&D project must be improved through follow-up management of the results even after the research project is completed. In addition, in terms of research topics, it was identified that aside from "preservation," studies focusing on "transfer," "classification," "evaluation," and "collection," as well as research that responds to digital transformation, are needed.

The Case Study of Using Technology in Education of Pre-service Mathematics Teachers. - Developing Materials Assisting Teaching-Learning for 7th-9th Grade Mathematics Classroom - (예비수학교사교육에서의 공학적 도구 활용 사례연구 - 7~9단계 수학수업과 연계된 교수·학습보조자료 개발을 중심으로 -)

  • Kim, Nam-Hee
    • School Mathematics / v.7 no.4 / pp.337-352 / 2005
  • In this study, we carried out a case study with 38 pre-service mathematics teachers. The theoretical basis of this study is the 'technology principle' of NCTM (2000) and the teaching-learning methods of the 7th curriculum. Using a mathematics program (Grafeq), we conducted classroom activities for developing materials that assist teaching and learning in 7th-9th grade mathematics. Pre-service mathematics teachers constructed mathematical designs for each grade with the Grafeq program. We sought results for three research problems. On the basis of observation data, interview data, and document materials, we analysed our results as follows. First, our activities help pre-service mathematics teachers examine and understand the mathematics of each grade. Second, we can develop a mathematical design in each grade's mathematics; therefore, the mathematical designs developed in this study can be used in middle school mathematics classrooms. Third, pre-service mathematics teachers gained the belief that the activities using the mathematics program in this study can be applied effectively to teaching and learning school mathematics.
