Search | Korea Science

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
- Asia pacific journal of information systems
- /
- v.21 no.1
- /
- pp.103-122
- /
- 2011
Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful for them. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide the users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role for document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask the authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could not benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle; manually assigning keywords to all documents is a daunting task, or even impractical in that it is extremely tedious and time-consuming requiring a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are mainly two approaches to achieving this aim: keyword assignment approach and keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given set of vocabulary, and the aim is to match them to the texts. In other words, the keywords assignment approach seeks to select the words from a controlled vocabulary that best describes a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. On the other hand, in the latter approach, the aim is to extract keywords with respect to their relevance in the text without prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems such as Extractor and Kea were developed using keyword extraction approach. Most indicative words in a document are selected as keywords for that document and as a result, keywords extraction is limited to terms that appear in the document. Therefore, keywords extraction cannot generate implicit keywords that are not included in a document. According to the experiment results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Inversely, it also means that 10% to 36% of the keywords assigned by the authors do not appear in the article, which cannot be generated through keyword extraction algorithms. Our preliminary experiment result also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment namely IVSM(Inverse Vector Space Model). The model is based on a vector space model. which is a conventional information retrieval model that represents documents and queries by vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: IVSM system for Web-based community service and stand-alone IVSM system. Firstly, the IVSM system is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and, indeed, it has been tested through a number of academic papers including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to Web-based community service and academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. Also, IVSM shows comparable performance to Extractor that is a representative system of keyword extraction approach developed by Turney. As electronic documents increase, we expect that IVSM proposed in this paper can be applied to many electronic documents in Web-based community and digital library.
PDF KSCI

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

Lee, Minchul;Kim, Hea-Jin
- Journal of Intelligence and Information Systems
- /
- v.24 no.1
- /
- pp.183-203
- /
- 2018
News articles are the most suitable medium for examining the events occurring at home and abroad. Especially, as the development of information and communication technology has brought various kinds of online news media, the news about the events occurring in society has increased greatly. So automatically summarizing key events from massive amounts of news data will help users to look at many of the events at a glance. In addition, if we build and provide an event network based on the relevance of events, it will be able to greatly help the reader in understanding the current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, and integrated the synonyms by leaving only meaningful words through preprocessing using NPMI and Word2Vec. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the subject distribution by date and to find the peak of the subject distribution and to detect the event. A total of 32 topics were extracted from the topic modeling, and the point of occurrence of the event was deduced by looking at the point at which each subject distribution surged. As a result, a total of 85 events were detected, but the final 16 events were filtered and presented using the Gaussian smoothing technique. We also calculated the relevance score between events detected to construct the event network. Using the cosine coefficient between the co-occurred events, we calculated the relevance between the events and connected the events to construct the event network. Finally, we set up the event network by setting each event to each vertex and the relevance score between events to the vertices connecting the vertices. The event network constructed in our methods helped us to sort out major events in the political and social fields in Korea that occurred in the last one year in chronological order and at the same time identify which events are related to certain events. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to easily analyze large amounts of data and to identify the relevance of events that were difficult to detect in existing event detection. We applied various text mining techniques and Word2vec technique in the text preprocessing to improve the accuracy of the extraction of proper nouns and synthetic nouns, which have been difficult in analyzing existing Korean texts, can be found. In this study, the detection and network configuration techniques of the event have the following advantages in practical application. First, LDA topic modeling, which is unsupervised learning, can easily analyze subject and topic words and distribution from huge amount of data. Also, by using the date information of the collected news articles, it is possible to express the distribution by topic in a time series. Second, we can find out the connection of events in the form of present and summarized form by calculating relevance score and constructing event network by using simultaneous occurrence of topics that are difficult to grasp in existing event detection. It can be seen from the fact that the inter-event relevance-based event network proposed in this study was actually constructed in order of occurrence time. It is also possible to identify what happened as a starting point for a series of events through the event network. The limitation of this study is that the characteristics of LDA topic modeling have different results according to the initial parameters and the number of subjects, and the subject and event name of the analysis result should be given by the subjective judgment of the researcher. Also, since each topic is assumed to be exclusive and independent, it does not take into account the relevance between themes. Subsequent studies need to calculate the relevance between events that are not covered in this study or those that belong to the same subject.
https://doi.org/10.13088/jiis.2018.24.1.183 인용 PDF KSCI

The Relations between Financial Constraints and Dividend Smoothing of Innovative Small and Medium Sized Enterprises (혁신형 중소기업의 재무적 제약과 배당스무딩간의 관계)

Shin, Min-Shik;Kim, Soo-Eun
- Korean small business review
- /
- v.31 no.4
- /
- pp.67-93
- /
- 2009
The purpose of this paper is to explore the relations between financial constraints and dividend smoothing of innovative small and medium sized enterprises(SMEs) listed on Korea Securities Market and Kosdaq Market of Korea Exchange. The innovative SMEs is defined as the firms with high level of R&D intensity which is measured by (R&D investment/total sales) ratio, according to Chauvin and Hirschey (1993). The R&D investment plays an important role as the innovative driver that can increase the future growth opportunity and profitability of the firms. Therefore, the R&D investment have large, positive, and consistent influences on the market value of the firm. In this point of view, we expect that the innovative SMEs can adjust dividend payment faster than the noninnovative SMEs, on the ground of their future growth opportunity and profitability. And also, we expect that the financial unconstrained firms can adjust dividend payment faster than the financial constrained firms, on the ground of their financing ability of investment funds through the market accessibility. Aivazian et al.(2006) exert that the financial unconstrained firms with the high accessibility to capital market can adjust dividend payment faster than the financial constrained firms. We collect the sample firms among the total SMEs listed on Korea Securities Market and Kosdaq Market of Korea Exchange during the periods from January 1999 to December 2007 from the KIS Value Library database. The total number of firm-year observations of the total sample firms throughout the entire period is 5,544, the number of firm-year observations of the dividend firms is 2,919, and the number of firm-year observations of the non-dividend firms is 2,625. About 53%(or 2,919) of these total 5,544 observations involve firms that make a dividend payment. The dividend firms are divided into two groups according to the R&D intensity, such as the innovative SMEs with larger than median of R&D intensity and the noninnovative SMEs with smaller than median of R&D intensity. The number of firm-year observations of the innovative SMEs is 1,506, and the number of firm-year observations of the noninnovative SMEs is 1,413. Furthermore, the innovative SMEs are divided into two groups according to level of financial constraints, such as the financial unconstrained firms and the financial constrained firms. The number of firm-year observations of the former is 894, and the number of firm-year observations of the latter is 612. Although all available firm-year observations of the dividend firms are collected, deletions are made in the case of financial industries such as banks, securities company, insurance company, and other financial services company, because their capital structure and business style are widely different from the general manufacturing firms. The stock repurchase was involved in dividend payment because Grullon and Michaely (2002) examined the substitution hypothesis between dividends and stock repurchases. However, our data structure is an unbalanced panel data since there is no requirement that the firm-year observations data are all available for each firms during the entire periods from January 1999 to December 2007 from the KIS Value Library database. We firstly estimate the classic Lintner(1956) dividend adjustment model, where the decision to smooth dividend or to adopt a residual dividend policy depends on financial constraints measured by market accessibility. Lintner model indicates that firms maintain stable and long run target payout ratio, and that firms adjust partially the gap between current payout rato and target payout ratio each year. In the Lintner model, dependent variable is the current dividend per share(DPS_t), and independent variables are the past dividend per share(DPS_t-1) and the current earnings per share(EPS_t). We hypothesized that firms adjust partially the gap between the current dividend per share(DPS_t) and the target payout ratio(Ω) each year, when the past dividend per share(DPS_t-1) deviate from the target payout ratio(Ω). We secondly estimate the expansion model that extend the Lintner model by including the determinants suggested by the major theories of dividend, namely, residual dividend theory, dividend signaling theory, agency theory, catering theory, and transactions cost theory. In the expansion model, dependent variable is the current dividend per share(DPS_t), explanatory variables are the past dividend per share(DPS_t-1) and the current earnings per share(EPS_t), and control variables are the current capital expenditure ratio(CEA_t), the current leverage ratio(LEV_t), the current operating return on assets(ROA_t), the current business risk(RISK_t), the current trading volume turnover ratio(TURN_t), and the current dividend premium(DPREM_t). In these control variables, CEA_t, LEV_t, and ROA_t are the determinants suggested by the residual dividend theory and the agency theory, ROA_t and RISK_t are the determinants suggested by the dividend signaling theory, TURN_t is the determinant suggested by the transactions cost theory, and DPREM_t is the determinant suggested by the catering theory. Furthermore, we thirdly estimate the Lintner model and the expansion model by using the panel data of the financial unconstrained firms and the financial constrained firms, that are divided into two groups according to level of financial constraints. We expect that the financial unconstrained firms can adjust dividend payment faster than the financial constrained firms, because the former can finance more easily the investment funds through the market accessibility than the latter. We analyzed descriptive statistics such as mean, standard deviation, and median to delete the outliers from the panel data, conducted one way analysis of variance to check up the industry-specfic effects, and conducted difference test of firms characteristic variables between innovative SMEs and noninnovative SMEs as well as difference test of firms characteristic variables between financial unconstrained firms and financial constrained firms. We also conducted the correlation analysis and the variance inflation factors analysis to detect any multicollinearity among the independent variables. Both of the correlation coefficients and the variance inflation factors are roughly low to the extent that may be ignored the multicollinearity among the independent variables. Furthermore, we estimate both of the Lintner model and the expansion model using the panel regression analysis. We firstly test the time-specific effects and the firm-specific effects may be involved in our panel data through the Lagrange multiplier test that was proposed by Breusch and Pagan(1980), and secondly conduct Hausman test to prove that fixed effect model is fitter with our panel data than the random effect model. The main results of this study can be summarized as follows. The determinants suggested by the major theories of dividend, namely, residual dividend theory, dividend signaling theory, agency theory, catering theory, and transactions cost theory explain significantly the dividend policy of the innovative SMEs. Lintner model indicates that firms maintain stable and long run target payout ratio, and that firms adjust partially the gap between the current payout ratio and the target payout ratio each year. In the core variables of Lintner model, the past dividend per share has more effects to dividend smoothing than the current earnings per share. These results suggest that the innovative SMEs maintain stable and long run dividend policy which sustains the past dividend per share level without corporate special reasons. The main results show that dividend adjustment speed of the innovative SMEs is faster than that of the noninnovative SMEs. This means that the innovative SMEs with high level of R&D intensity can adjust dividend payment faster than the noninnovative SMEs, on the ground of their future growth opportunity and profitability. The other main results show that dividend adjustment speed of the financial unconstrained SMEs is faster than that of the financial constrained SMEs. This means that the financial unconstrained firms with high accessibility to capital market can adjust dividend payment faster than the financial constrained firms, on the ground of their financing ability of investment funds through the market accessibility. Futhermore, the other additional results show that dividend adjustment speed of the innovative SMEs classified by the Small and Medium Business Administration is faster than that of the unclassified SMEs. They are linked with various financial policies and services such as credit guaranteed service, policy fund for SMEs, venture investment fund, insurance program, and so on. In conclusion, the past dividend per share and the current earnings per share suggested by the Lintner model explain mainly dividend adjustment speed of the innovative SMEs, and also the financial constraints explain partially. Therefore, if managers can properly understand of the relations between financial constraints and dividend smoothing of innovative SMEs, they can maintain stable and long run dividend policy of the innovative SMEs through dividend smoothing. These are encouraging results for Korea government, that is, the Small and Medium Business Administration as it has implemented many policies to commit to the innovative SMEs. This paper may have a few limitations because it may be only early study about the relations between financial constraints and dividend smoothing of the innovative SMEs. Specifically, this paper may not adequately capture all of the subtle features of the innovative SMEs and the financial unconstrained SMEs. Therefore, we think that it is necessary to expand sample firms and control variables, and use more elaborate analysis methods in the future studies.

Field Studios of In-situ Aerobic Cometabolism of Chlorinated Aliphatic Hydrocarbons

Semprini, Lewts
- Proceedings of the Korean Society of Soil and Groundwater Environment Conference
- /
- 2004.04a
- /
- pp.3-4
- /
- 2004
Results will be presented from two field studies that evaluated the in-situ treatment of chlorinated aliphatic hydrocarbons (CAHs) using aerobic cometabolism. In the first study, a cometabolic air sparging (CAS) demonstration was conducted at McClellan Air Force Base (AFB), California, to treat chlorinated aliphatic hydrocarbons (CAHs) in groundwater using propane as the cometabolic substrate. A propane-biostimulated zone was sparged with a propane/air mixture and a control zone was sparged with air alone. Propane-utilizers were effectively stimulated in the saturated zone with repeated intermediate sparging of propane and air. Propane delivery, however, was not uniform, with propane mainly observed in down-gradient observation wells. Trichloroethene (TCE), cis-1, 2-dichloroethene (c-DCE), and dissolved oxygen (DO) concentration levels decreased in proportion with propane usage, with c-DCE decreasing more rapidly than TCE. The more rapid removal of c-DCE indicated biotransformation and not just physical removal by stripping. Propane utilization rates and rates of CAH removal slowed after three to four months of repeated propane additions, which coincided with tile depletion of nitrogen (as nitrate). Ammonia was then added to the propane/air mixture as a nitrogen source. After a six-month period between propane additions, rapid propane-utilization was observed. Nitrate was present due to groundwater flow into the treatment zone and/or by the oxidation of tile previously injected ammonia. In the propane-stimulated zone, c-DCE concentrations decreased below tile detection limit (1 $\mu$g/L), and TCE concentrations ranged from less than 5 $\mu$g/L to 30 $\mu$g/L, representing removals of 90 to 97%. In the air sparged control zone, TCE was removed at only two monitoring locations nearest the sparge-well, to concentrations of 15 $\mu$g/L and 60 $\mu$g/L. The responses indicate that stripping as well as biological treatment were responsible for the removal of contaminants in the biostimulated zone, with biostimulation enhancing removals to lower contaminant levels. As part of that study bacterial population shifts that occurred in the groundwater during CAS and air sparging control were evaluated by length heterogeneity polymerase chain reaction (LH-PCR) fragment analysis. The results showed that an organism(5) that had a fragment size of 385 base pairs (385 bp) was positively correlated with propane removal rates. The 385 bp fragment consisted of up to 83% of the total fragments in the analysis when propane removal rates peaked. A 16S rRNA clone library made from the bacteria sampled in propane sparged groundwater included clones of a TM7 division bacterium that had a 385bp LH-PCR fragment; no other bacterial species with this fragment size were detected. Both propane removal rates and the 385bp LH-PCR fragment decreased as nitrate levels in the groundwater decreased. In the second study the potential for bioaugmentation of a butane culture was evaluated in a series of field tests conducted at the Moffett Field Air Station in California. A butane-utilizing mixed culture that was effective in transforming 1, 1-dichloroethene (1, 1-DCE), 1, 1, 1-trichloroethane (1, 1, 1-TCA), and 1, 1-dichloroethane (1, 1-DCA) was added to the saturated zone at the test site. This mixture of contaminants was evaluated since they are often present as together as the result of 1, 1, 1-TCA contamination and the abiotic and biotic transformation of 1, 1, 1-TCA to 1, 1-DCE and 1, 1-DCA. Model simulations were performed prior to the initiation of the field study. The simulations were performed with a transport code that included processes for in-situ cometabolism, including microbial growth and decay, substrate and oxygen utilization, and the cometabolism of dual contaminants (1, 1-DCE and 1, 1, 1-TCA). Based on the results of detailed kinetic studies with the culture, cometabolic transformation kinetics were incorporated that butane mixed-inhibition on 1, 1-DCE and 1, 1, 1-TCA transformation, and competitive inhibition of 1, 1-DCE and 1, 1, 1-TCA on butane utilization. A transformation capacity term was also included in the model formation that results in cell loss due to contaminant transformation. Parameters for the model simulations were determined independently in kinetic studies with the butane-utilizing culture and through batch microcosm tests with groundwater and aquifer solids from the field test zone with the butane-utilizing culture added. In microcosm tests, the model simulated well the repetitive utilization of butane and cometabolism of 1.1, 1-TCA and 1, 1-DCE, as well as the transformation of 1, 1-DCE as it was repeatedly transformed at increased aqueous concentrations. Model simulations were then performed under the transport conditions of the field test to explore the effects of the bioaugmentation dose and the response of the system to tile biostimulation with alternating pulses of dissolved butane and oxygen in the presence of 1, 1-DCE (50 $\mu$g/L) and 1, 1, 1-TCA (250 $\mu$g/L). A uniform aquifer bioaugmentation dose of 0.5 mg/L of cells resulted in complete utilization of the butane 2-meters downgradient of the injection well within 200-hrs of bioaugmentation and butane addition. 1, 1-DCE was much more rapidly transformed than 1, 1, 1-TCA, and efficient 1, 1, 1-TCA removal occurred only after 1, 1-DCE and butane were decreased in concentration. The simulations demonstrated the strong inhibition of both 1, 1-DCE and butane on 1, 1, 1-TCA transformation, and the more rapid 1, 1-DCE transformation kinetics. Results of tile field demonstration indicated that bioaugmentation was successfully implemented; however it was difficult to maintain effective treatment for long periods of time (50 days or more). The demonstration showed that the bioaugmented experimental leg effectively transformed 1, 1-DCE and 1, 1-DCA, and was somewhat effective in transforming 1, 1, 1-TCA. The indigenous experimental leg treated in the same way as the bioaugmented leg was much less effective in treating the contaminant mixture. The best operating performance was achieved in the bioaugmented leg with about over 90%, 80%, 60 % removal for 1, 1-DCE, 1, 1-DCA, and 1, 1, 1-TCA, respectively. Molecular methods were used to track and enumerate the bioaugmented culture in the test zone. Real Time PCR analysis was used to on enumerate the bioaugmented culture. The results show higher numbers of the bioaugmented microorganisms were present in the treatment zone groundwater when the contaminants were being effective transformed. A decrease in these numbers was associated with a reduction in treatment performance. The results of the field tests indicated that although bioaugmentation can be successfully implemented, competition for the growth substrate (butane) by the indigenous microorganisms likely lead to the decrease in long-term performance.
PDF

Search Result 4,804, Processing Time 0.04 seconds

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

The Relations between Financial Constraints and Dividend Smoothing of Innovative Small and Medium Sized Enterprises (혁신형 중소기업의 재무적 제약과 배당스무딩간의 관계)

Field Studios of In-situ Aerobic Cometabolism of Chlorinated Aliphatic Hydrocarbons

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)