
A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model

  • 조원진;노상규;윤지영;박진수
    • Asia pacific journal of information systems
    • /
    • Vol. 21, No. 1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to guide users and facilitate the filtering process. A set of keywords is thus often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document.
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords according to their relevance in the text, without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as keywords for that document; as a result, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to the experimental results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Conversely, this means that 10% to 36% of the keywords assigned by the authors do not appear in the article and cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of keywords assigned by the authors are not included in the full text. This is why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented using IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. In our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to that of Extractor, a representative keyword extraction system developed by Turney. As the number of electronic documents increases, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
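
The five steps listed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how a keyword set is represented or weighted, so the sketch assumes each candidate keyword comes with a hypothetical term-weight vector (`KEYWORD_VECTORS`), uses raw term frequencies for the document, and applies a simple regular-expression tokenizer. The function names and the `top_k` cutoff are likewise illustrative assumptions.

```python
import math
import re
from collections import Counter


def vector_length(weights):
    """Euclidean length of a sparse term-weight vector (steps 1 and 3)."""
    return math.sqrt(sum(w * w for w in weights.values()))


def term_frequencies(text):
    """Step 2: naive preprocessing, lowercase and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Step 4: cosine of the angle between two sparse vectors."""
    len_a, len_b = vector_length(a), vector_length(b)
    if len_a == 0 or len_b == 0:
        return 0.0
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    return dot / (len_a * len_b)


def assign_keywords(text, keyword_vectors, top_k=2):
    """Step 5: return up to top_k keywords with the highest non-zero similarity."""
    doc = term_frequencies(text)
    scored = [(cosine_similarity(vec, doc), kw) for kw, vec in keyword_vectors.items()]
    scored = [(score, kw) for score, kw in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by name
    return [kw for _, kw in scored[:top_k]]


def precision(generated, author_assigned):
    """Share of generated keywords that match the author-assigned ones."""
    if not generated:
        return 0.0
    return len(set(generated) & set(author_assigned)) / len(generated)


# Hypothetical controlled vocabulary: keyword -> term weights (made-up values).
KEYWORD_VECTORS = {
    "information retrieval": {"search": 2.0, "query": 2.0, "documents": 1.0},
    "logistics": {"port": 2.0, "shipping": 2.0, "cargo": 1.0},
    "text mining": {"documents": 2.0, "keywords": 2.0, "text": 1.0},
}

if __name__ == "__main__":
    sample = "Search engines answer a query with documents"
    print(assign_keywords(sample, KEYWORD_VECTORS))
```

In this toy example the phrase "information retrieval" never occurs in the sample sentence, yet it is the top-ranked keyword, which is the kind of implicit keyword that an assignment approach can produce and an extraction approach cannot.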

Field Studies of In-situ Aerobic Cometabolism of Chlorinated Aliphatic Hydrocarbons

  • Semprini, Lewis
    • Korean Society of Soil and Groundwater Environment: Conference Proceedings
    • /
    • Korean Society of Soil and Groundwater Environment 2004 General Meeting and Spring Conference
    • /
    • pp.3-4
    • /
    • 2004
  • Results will be presented from two field studies that evaluated the in-situ treatment of chlorinated aliphatic hydrocarbons (CAHs) using aerobic cometabolism. In the first study, a cometabolic air sparging (CAS) demonstration was conducted at McClellan Air Force Base (AFB), California, to treat CAHs in groundwater using propane as the cometabolic substrate. A propane-biostimulated zone was sparged with a propane/air mixture, and a control zone was sparged with air alone. Propane-utilizers were effectively stimulated in the saturated zone with repeated intermittent sparging of propane and air. Propane delivery, however, was not uniform, with propane mainly observed in down-gradient observation wells. Trichloroethene (TCE), cis-1,2-dichloroethene (c-DCE), and dissolved oxygen (DO) concentrations decreased in proportion to propane usage, with c-DCE decreasing more rapidly than TCE. The more rapid removal of c-DCE indicated biotransformation and not just physical removal by stripping. Propane utilization rates and rates of CAH removal slowed after three to four months of repeated propane additions, which coincided with the depletion of nitrogen (as nitrate). Ammonia was then added to the propane/air mixture as a nitrogen source. After a six-month period between propane additions, rapid propane utilization was observed. Nitrate was present due to groundwater flow into the treatment zone and/or the oxidation of the previously injected ammonia. In the propane-stimulated zone, c-DCE concentrations decreased below the detection limit (1 $\mu$g/L), and TCE concentrations ranged from less than 5 $\mu$g/L to 30 $\mu$g/L, representing removals of 90 to 97%. In the air-sparged control zone, TCE was removed at only the two monitoring locations nearest the sparge well, to concentrations of 15 $\mu$g/L and 60 $\mu$g/L.
The responses indicate that stripping as well as biological treatment was responsible for the removal of contaminants in the biostimulated zone, with biostimulation enhancing removals to lower contaminant levels. As part of that study, bacterial population shifts that occurred in the groundwater during CAS and the air sparging control were evaluated by length heterogeneity polymerase chain reaction (LH-PCR) fragment analysis. The results showed that an organism(s) with a fragment size of 385 base pairs (385 bp) was positively correlated with propane removal rates. The 385 bp fragment made up as much as 83% of the total fragments in the analysis when propane removal rates peaked. A 16S rRNA clone library made from the bacteria sampled in propane-sparged groundwater included clones of a TM7 division bacterium that had a 385 bp LH-PCR fragment; no other bacterial species with this fragment size were detected. Both propane removal rates and the 385 bp LH-PCR fragment decreased as nitrate levels in the groundwater decreased. In the second study, the potential for bioaugmentation of a butane culture was evaluated in a series of field tests conducted at the Moffett Field Air Station in California. A butane-utilizing mixed culture that was effective in transforming 1,1-dichloroethene (1,1-DCE), 1,1,1-trichloroethane (1,1,1-TCA), and 1,1-dichloroethane (1,1-DCA) was added to the saturated zone at the test site. This mixture of contaminants was evaluated because they are often present together as the result of 1,1,1-TCA contamination and the abiotic and biotic transformation of 1,1,1-TCA to 1,1-DCE and 1,1-DCA. Model simulations were performed prior to the initiation of the field study. The simulations were performed with a transport code that included processes for in-situ cometabolism, including microbial growth and decay, substrate and oxygen utilization, and the cometabolism of dual contaminants (1,1-DCE and 1,1,1-TCA).
Based on the results of detailed kinetic studies with the culture, cometabolic transformation kinetics were incorporated that included mixed inhibition by butane of 1,1-DCE and 1,1,1-TCA transformation, and competitive inhibition by 1,1-DCE and 1,1,1-TCA of butane utilization. A transformation capacity term, which results in cell loss due to contaminant transformation, was also included in the model formulation. Parameters for the model simulations were determined independently in kinetic studies with the butane-utilizing culture and through batch microcosm tests with groundwater and aquifer solids from the field test zone with the butane-utilizing culture added. In microcosm tests, the model simulated well the repetitive utilization of butane and cometabolism of 1,1,1-TCA and 1,1-DCE, as well as the transformation of 1,1-DCE as it was repeatedly transformed at increased aqueous concentrations. Model simulations were then performed under the transport conditions of the field test to explore the effects of the bioaugmentation dose and the response of the system to the biostimulation with alternating pulses of dissolved butane and oxygen in the presence of 1,1-DCE (50 $\mu$g/L) and 1,1,1-TCA (250 $\mu$g/L). A uniform aquifer bioaugmentation dose of 0.5 mg/L of cells resulted in complete utilization of the butane 2 m downgradient of the injection well within 200 h of bioaugmentation and butane addition. 1,1-DCE was transformed much more rapidly than 1,1,1-TCA, and efficient 1,1,1-TCA removal occurred only after 1,1-DCE and butane concentrations had decreased. The simulations demonstrated the strong inhibition of 1,1,1-TCA transformation by both 1,1-DCE and butane, and the more rapid 1,1-DCE transformation kinetics. Results of the field demonstration indicated that bioaugmentation was successfully implemented; however, it was difficult to maintain effective treatment for long periods of time (50 days or more).
The demonstration showed that the bioaugmented experimental leg effectively transformed 1,1-DCE and 1,1-DCA, and was somewhat effective in transforming 1,1,1-TCA. The indigenous experimental leg, treated in the same way as the bioaugmented leg, was much less effective in treating the contaminant mixture. The best operating performance was achieved in the bioaugmented leg, with removals of approximately 90% or more, 80%, and 60% for 1,1-DCE, 1,1-DCA, and 1,1,1-TCA, respectively. Molecular methods were used to track the bioaugmented culture in the test zone, and real-time PCR analysis was used to enumerate it. The results show that higher numbers of the bioaugmented microorganisms were present in the treatment zone groundwater when the contaminants were being effectively transformed. A decrease in these numbers was associated with a reduction in treatment performance. The results of the field tests indicated that although bioaugmentation can be successfully implemented, competition for the growth substrate (butane) by the indigenous microorganisms likely led to the decrease in long-term performance.
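
The kinetic processes named in the abstract (competitive inhibition of butane utilization, mixed inhibition of contaminant transformation, and a transformation capacity term) are commonly written in a Monod-type form. The equations below are a generic single-contaminant sketch of that form, not the equations or parameter values of the study's transport code, which the abstract does not give. All symbols are assumptions introduced here: $S$ is the butane concentration, $C$ the contaminant concentration, $X$ the cell concentration, $k_S$ and $k_C$ maximum specific rates, $K_S$ and $K_C$ half-saturation constants, $K_{I,C}$, $K_{ic}$ and $K_{iu}$ inhibition constants, $Y$ the cell yield, $b$ the decay rate, and $T_c$ the transformation capacity.

```latex
\begin{align}
  % Butane utilization, competitively inhibited by the contaminant
  \frac{dS}{dt} &= -\,\frac{k_S\, X\, S}{K_S\left(1 + \dfrac{C}{K_{I,C}}\right) + S} \\[6pt]
  % Contaminant transformation, with mixed inhibition by butane
  \frac{dC}{dt} &= -\,\frac{k_C\, X\, C}{K_C\left(1 + \dfrac{S}{K_{ic}}\right) + C\left(1 + \dfrac{S}{K_{iu}}\right)} \\[6pt]
  % Cell growth on butane, decay, and cell loss through the transformation capacity
  \frac{dX}{dt} &= -\,Y\,\frac{dS}{dt} \;-\; b\,X \;+\; \frac{1}{T_c}\,\frac{dC}{dt}
\end{align}
```

Because $dC/dt$ is negative, the last term removes cells in proportion to the amount of contaminant transformed, which is how a finite transformation capacity leads to cell loss. A dual-contaminant model such as the one described would carry one transformation equation per contaminant plus cross-inhibition terms between them.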
