X-tree Diff: An Efficient Change Detection Algorithm for Tree-structured Data (X-tree Diff: 트리 기반 데이터를 위한 효율적인 변화 탐지 알고리즘)

  • Lee, Suk-Kyoon;Kim, Dong-Ah
    • The KIPS Transactions:PartC / v.10C no.6 / pp.683-694 / 2003
  • We present X-tree Diff, a change detection algorithm for tree-structured data. Our work is motivated by the need to monitor a massive volume of web documents and detect suspicious changes, known as defacement attacks on web sites. In this context, the algorithm must be very efficient in both speed and memory use. X-tree Diff uses a special ordered labeled tree, the X-tree, to represent XML/HTML documents. X-tree nodes have a special field, tMD, which stores a 128-bit hash value representing the structure and data of a subtree, so that identical subtrees from the old and new versions can be matched. During this process, X-tree Diff applies the Rule of Delaying Ambiguous Matchings: it performs exact matching only where a node in the old version has a one-to-one correspondence with a node in the new version, and delays all other matchings. This drastically reduces the possibility of wrong matchings. X-tree Diff propagates such exact matchings upwards in Step 2 and obtains more matchings downwards from the roots in Step 3. In Step 4, the nodes to be inserted or deleted are decided. We also show that X-tree Diff runs in O(n) time, where n is the number of nodes in the X-trees, in the worst case as well as in the average case. This result is even better than that of the BULD Diff algorithm, which is O(n log(n)) in the worst case. We tested X-tree Diff on real data, about 11,000 home pages from about 20 web sites, instead of synthetic documents manipulated for experimentation. Currently, the X-tree Diff algorithm is being used in a commercial hacking detection system called WIDS (Web-Document Intrusion Detection System), which finds changes that have occurred in registered web sites and reports suspicious changes to users.
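The hash-based first step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the use of MD5 as the 128-bit hash, and the function names are assumptions made for illustration, and only the initial matching of identical subtrees is shown (the upward and downward propagation of Steps 2 and 3 is omitted). A subtree pair is matched only when its hash occurs exactly once in each version, in the spirit of the Rule of Delaying Ambiguous Matchings.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical X-tree node: element label, text data, ordered children."""
    label: str
    text: str = ""
    children: list = field(default_factory=list)
    tmd: bytes = b""  # 128-bit digest of the whole subtree (MD5 is an assumption)


def compute_tmd(node):
    """Fill in node.tmd bottom-up from the label, text, and child digests."""
    h = hashlib.md5()
    h.update(node.label.encode("utf-8") + b"\x00")
    h.update(node.text.encode("utf-8") + b"\x00")
    for child in node.children:  # child order matters: the tree is ordered
        h.update(compute_tmd(child))
    node.tmd = h.digest()
    return node.tmd


def walk(node):
    """Yield every node of the subtree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def match_unique_subtrees(old_root, new_root):
    """Match subtrees whose digest is unique in both versions; delay the rest."""
    compute_tmd(old_root)
    compute_tmd(new_root)
    old_by_hash, new_by_hash = defaultdict(list), defaultdict(list)
    for n in walk(old_root):
        old_by_hash[n.tmd].append(n)
    for n in walk(new_root):
        new_by_hash[n.tmd].append(n)
    matches = []
    for digest, olds in old_by_hash.items():
        news = new_by_hash.get(digest, [])
        if len(olds) == 1 and len(news) == 1:  # one-to-one correspondence only
            matches.append((olds[0], news[0]))
    return matches


if __name__ == "__main__":
    old = Node("html", children=[Node("p", "hello"), Node("p", "dup"), Node("p", "dup")])
    new = Node("html", children=[Node("p", "hello"), Node("p", "dup"), Node("p", "new")])
    for o, n in match_unique_subtrees(old, new):
        print(o.label, repr(o.text), "<->", n.label, repr(n.text))
```

Each node is hashed once and visited once per pass, so this step is linear in the number of nodes, which is consistent with the O(n) bound claimed in the abstract. The duplicated "dup" paragraphs are left unmatched here because their digest is ambiguous in the old version.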

A Study on the Online Newspaper Archive : Focusing on Domestic and International Case Studies (온라인 신문 아카이브 연구 국내외 구축 사례를 중심으로)

  • Song, Zoo Hyung
    • The Korean Journal of Archival Studies / no.48 / pp.93-139 / 2016
  • Aside from monitoring and criticizing the government through reviews and comments on public issues, newspapers also form and spread public opinion. Metadata contains certain picture records and, in the case of local newspapers, the former is an important means of obtaining locality. Furthermore, newspaper advertising and editing styles can be viewed as a representation of the times. Because of this archival value, newspapers are considered a top collection priority when a documentation strategy is established. A newspaper archive that handles preservation and management therefore carries great significance in many ways. Journalists use such archives to write articles, while scholars use them for academic purposes. NIE is another practical use of such an archive. In the digital age, the newspaper archive holds an important position because it sits at the core of MAM, which integrates and manages media assets. Accordingly, an online archive is expected to take on a new role in the production of newspapers and the management of publishing companies. The Korea Integrated News Database System (KINDS), an integrated article database, began its service in 1991, and Naver operates an online newspaper archive called "News Library." Initially, KINDS received an enthusiastic response, but its utilization continues to decrease because it omits some major newspapers, such as Chosun Ilbo and JoongAng Ilbo, and has numerous user interface problems. Nevertheless, the system still offers several advantages. For example, it is free to access because it is supported by a public budget, and local papers are easy to reach. A national library consistently carries out the digitalization of time-honored newspapers. 
In addition, individual newspaper companies have also started such services, but these are not sufficient to be called archives. In the United States (US), "Chronicling America," led by the Library of Congress with funding from the National Endowment for the Humanities, is in the process of digitalizing historic newspapers. The universities of each state and historical associations provide funds to their public libraries for the digitalization of local papers. In the United Kingdom, the British Library is constructing an online newspaper archive called "The British Newspaper Archive," but unlike the one in the US, this service charges a usage fee. The Joint Information Systems Committee has also invested in "The British Newspaper Archive," and its construction is still ongoing. ProQuest Archiver and Gale NewsVault are the representative platforms because of their efficiency and the way they have standardized newspaper content. It is now time to change how we understand these archives, and substantial investment is required to improve online newspaper archives at home and abroad.

Studies on Glycolipids in Bacteria -Part II. On the Structure of Glycolipid of Selenomonas ruminantium- (세균(細菌)의 당지질(糖脂質)에 관(關)한 연구(硏究) -제2보(第二報) Selenomonas ruminantium의 당지질(糖脂質)의 구조(構造)-)

  • Kim, Kyo-Chang
    • Applied Biological Chemistry / v.17 no.2 / pp.125-137 / 1974
  • The chemical structure of the glycolipid of the Selenomonas ruminantium cell wall was investigated. The bacterial cells were treated with hot TCA, and the glycolipid fractions were extracted with the solvent $\mathrm{CHCl_3 : CH_3OH}$ (1 : 3). The extracted glycolipid fraction was further separated by acetone extraction. The acetone-soluble fraction was named the spot A compound. The acetone-insoluble but ether-soluble fraction was named the spot B compound. These two compounds were examined to elucidate their chemical structures. The results were as follows: 1. IR spectral analysis showed that O-acyl and N-acyl fatty acids were linked to the glucosamine moiety in the spot A compound. In the spot B compound, however, phosphorus was shown to be attached to glucosamine in addition to the O-acyl and N-acyl fatty acids. 2. Gas-liquid chromatography showed that the spot A compound contained predominantly $\beta$-OH $C_{13:0}$ fatty acid in addition to $\beta$-OH $C_{9:0}$ fatty acid, whereas the spot B compound was composed predominantly of $\beta$-OH $C_{13:0}$ fatty acid with a small amount of $\beta$-OH $C_{9:0}$. 3. Paper chromatographic analysis of the hydrazinolysis products of the spot A compound revealed a compound with an Rf value similar to that of chitobiose, which indicated a structure of two condensed glucosamine molecules. The low Rf value of the hydrazinolysis product of the spot B compound confirmed the presence of phosphorus attached to glucosamine. 4. The appearance of arabinose resulting from ninhydrin decomposition of the acid hydrolyzate of the spot A compound indicated that the amino group is attached to $C_2$ of glucosamine. 5. The amount of glucosamine in the N-acetylated spot A compound decreased to half of the original content after treatment with $\mathrm{NaBH_4}$, indicating that there are two glucosamine molecules in the spot A compound. 
The presence of a 1,6-linkage between the two glucosamine molecules was suggested by the Morgan-Elson reaction and confirmed by the periodate decomposition test. 6. $\beta$-N-acetylglucosaminidase completely decomposed the N-acetylated spot A compound into N-acetylglucosamine, but did not decompose the spot B compound. This indicated that the spot A compound has a $\beta$-linkage. 7. When phosphodiesterase or phosphomonoesterase acted on the $^{32}$P-labeled spot B compound, $^{32}$P was not released by phosphodiesterase but was completely released by phosphomonoesterase. This indicated that one phosphorus is linked to the glucosamine moiety. 8. The spot A compound is assumed to have the following chemical structure: glucosaminyl-$\beta$-1,6-glucosamine to which O-acyl and N-acyl fatty acids are linked, of which the predominant fatty acid is $\beta$-OH $C_{13:0}$ fatty acid, in addition to $\beta$-OH $C_{9:0}$ fatty acid. 9. The spot B compound is likely to have a glucosaminyl-$\beta$-1,6-glucosamine structure to which phosphorus is attached by a monoester linkage. Furthermore, both the O-acyl and N-acyl fatty acids consisted predominantly of $\beta$-OH $C_{13:0}$ fatty acid, in addition to $\beta$-OH $C_{9:0}$ fatty acid.

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
  • To discover significant social issues such as unemployment, economic crisis, and social welfare that urgently need to be solved in modern society, researchers in the existing approach usually collect opinions from professional experts and scholars through online or offline surveys. However, such a method is not always effective. Because of the cost, a large number of survey replies is seldom gathered. In some cases, it is also hard to find experts who deal with specific social issues. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about the same social issue because each expert has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which of them are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles published by about 10 major domestic news outlets in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting and extracting texts from the collected news articles, (2) identifying only news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic. 
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem." In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. Specifically, given a set of text documents, we segment each text document into paragraphs. In the meantime, using LDA, we extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to the topic it best matches. As a result, each topic has several best-matched paragraphs. For example, suppose there is a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., Up to 300 workers lost their jobs in XXX company at Seoul). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly. 
Through this prototype system, we have detected various social issues appearing in our society and have also shown the effectiveness of our proposed methods in our experimental results. Note that our proof-of-concept system is also available at http://dslab.snu.ac.kr/demo.html.
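The paragraph-to-topic matching step in this abstract can be illustrated with a short sketch. The abstract does not give the authors' exact formula, so the scoring below is an assumption: each paragraph is scored by its length-normalized log-likelihood under a topic's term probabilities, with a small hypothetical `floor` probability for terms the topic does not list. The "Unemployment Problem" terms come from the Topic1 example in the abstract; the second topic, the function names, and the pre-tokenized input are made up for illustration.

```python
import math


def paragraph_log_likelihood(tokens, topic_terms, floor=1e-6):
    """Average log-probability of the paragraph's tokens under one topic.

    topic_terms maps a term to its probability in the topic; terms the
    topic does not list receive the small `floor` probability (an assumption).
    """
    if not tokens:
        return float("-inf")
    total = sum(math.log(topic_terms.get(tok, floor)) for tok in tokens)
    return total / len(tokens)


def best_topic(tokens, topics):
    """Return the label of the topic under which the paragraph scores highest."""
    return max(topics, key=lambda label: paragraph_log_likelihood(tokens, topics[label]))


if __name__ == "__main__":
    topics = {
        # Terms and probabilities taken from the Topic1 example in the abstract.
        "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
        # Hypothetical second topic, for contrast only.
        "Social Welfare": {"welfare": 0.5, "pension": 0.3, "care": 0.2},
    }
    paragraph = ["workers", "layoff", "business", "seoul"]  # assumed pre-tokenized
    print(best_topic(paragraph, topics))
```

Normalizing by paragraph length keeps long paragraphs from being penalized simply for containing more tokens. In the system described above, the tokens would come from the Korean lexical analysis in step (3) rather than from a hand-written list.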