Search | Korea Science

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

Yun, Yeoil; Ko, Eunjung; Kim, Namgyu
- Journal of Intelligence and Information Systems, v.25 no.2, pp.141-166, 2019
Channels such as social media and social networking services (SNS) now generate enormous amounts of data, and the share of unstructured data, above all text, has grown geometrically. Because it is impractical to read all of this text, it is important to access it rapidly and grasp its key points. To meet this need, many studies on text summarization for handling and using large volumes of text have been proposed. In particular, many recent methods use machine learning and artificial intelligence algorithms to generate summaries objectively and efficiently, an approach known as "automatic summarization." However, most text summarization methods proposed to date build summaries around the frequency of content in the original documents. Such summaries tend to omit low-weight subjects that are mentioned less often in the original text. If a summary covers only the major subjects, it becomes biased and loses information, making it hard to grasp every subject the documents contain. One way to reduce this bias is to summarize with attention to the balance among a document's topics so that every subject is represented, but an imbalance in how those subjects are distributed can still remain. To keep subjects balanced in a summary, it is necessary both to consider the proportion of each subject in the original documents and to allocate subjects evenly, so that sentences on minor subjects are also sufficiently included. In this study, we propose a "subject-balanced" text summarization method that maintains balance among all subjects and minimizes the omission of low-frequency subjects. To produce subject-balanced summaries, we use two summary evaluation concepts: "completeness" and "succinctness."
Completeness means that a summary should fully cover the contents of the original documents, and succinctness means that the summary contains minimal duplication within itself. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate how strongly each term is related to each topic. From these weights, the terms most strongly related to every topic can be identified, and the subjects of the documents can be found from topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; in this method they are called "seed terms." Because the seed terms alone are too few to describe each subject adequately, a well-constructed subject dictionary needs enough terms similar to them. Word2Vec is therefore used for term expansion: after Word2Vec modeling produces word vectors, the similarity between any two terms is derived as the cosine similarity of their vectors, and a higher cosine similarity indicates a stronger relationship between the two terms. Terms with high similarity to each subject's seed terms are selected, and after these expanded terms are filtered, the subject dictionary is finalized. The next phase allocates a subject to every sentence in the original documents. To capture the content of all sentences, a frequency analysis is first conducted on the terms that make up the subject dictionaries. A TF-IDF weight for each subject is then calculated, showing how much each sentence is about each subject. Because TF-IDF weights are unbounded, the subject weights of each sentence are normalized so that all values lie between 0 and 1.
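The dictionary-construction and subject-weighting steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, and several details are assumptions: the word vectors are toy values standing in for a trained Word2Vec model, the 0.8 similarity cut-off is hypothetical, a sentence's weight for a subject is taken as the sum of tf × log(N/df) over that subject's dictionary terms, and normalization is assumed to be min-max scaling of each subject's weights across sentences. All function names are illustrative.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for a trained Word2Vec model.
TOY_VECTORS = {
    "room": [0.9, 0.1, 0.0],
    "bed": [0.8, 0.2, 0.1],
    "breakfast": [0.1, 0.9, 0.2],
    "buffet": [0.0, 0.8, 0.3],
    "staff": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seed_terms(seeds, vectors, threshold=0.8):
    """Return the seed terms plus every term whose cosine similarity to at least
    one seed reaches `threshold` (a hypothetical cut-off)."""
    expanded = set(seeds)
    for term, vec in vectors.items():
        if term not in expanded and any(
            cosine(vec, vectors[s]) >= threshold for s in seeds if s in vectors
        ):
            expanded.add(term)
    return expanded


def subject_tfidf(sentences, dictionaries):
    """Weight of each subject in each sentence: sum of tf * log(N / df) over the
    subject's dictionary terms (one common TF-IDF variant, assumed here)."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    df = {}
    for toks in tokens:
        for t in set(toks):
            df[t] = df.get(t, 0) + 1
    return [
        {
            subject: sum(toks.count(t) * math.log(n / df[t]) for t in terms if t in df)
            for subject, terms in dictionaries.items()
        }
        for toks in tokens
    ]


def normalize_subject_weights(weights):
    """Min-max scale each subject's weights across sentences into [0, 1] (assumed scheme)."""
    if not weights:
        return []
    scaled = [dict(row) for row in weights]
    for subject in weights[0]:
        column = [row[subject] for row in weights]
        lo, hi = min(column), max(column)
        for row in scaled:
            row[subject] = (row[subject] - lo) / (hi - lo) if hi > lo else 0.0
    return scaled


if __name__ == "__main__":
    print(sorted(expand_seed_terms(["room"], TOY_VECTORS)))  # ['bed', 'room']
```

With real data the vectors would come from a trained model (for instance, gensim's `KeyedVectors`), and the expanded terms would still pass through the filtering step the abstract describes.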
Each sentence is then assigned the subject with the highest normalized TF-IDF weight, so that a sentence group is constructed for each subject. The last phase generates the summary. Sen2Vec is used to compute the similarity between the sentences of each subject, forming a similarity matrix. By selecting sentences iteratively, a summary can be generated that fully covers the content of the original documents while minimizing duplication within the summary itself. To evaluate the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. Summaries from the proposed method were also compared with frequency-based summaries, and the comparison verified that the proposed method better preserves the balance among all the subjects present in the original documents.
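Subject allocation and iterative sentence selection can be sketched in the same way. The abstract does not give the selection criterion, so the greedy rule below is an assumption in the style of maximal marginal relevance (MMR): a candidate is rewarded for its mean cosine similarity to its own subject group (a proxy for completeness) and penalized for its highest similarity to sentences already chosen (a proxy for succinctness), and subjects are visited round-robin so that each contributes equally. The sentence vectors are toy inputs standing in for Sen2Vec embeddings, pairwise similarities are computed on demand instead of being stored as a matrix, and the weight `lam` and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def allocate_subjects(norm_weights):
    """Group sentence indices by the subject with the highest normalized weight.
    Sentences whose best weight is 0 stay unassigned (an assumption)."""
    groups = {}
    for idx, row in enumerate(norm_weights):
        subject, weight = max(row.items(), key=lambda kv: kv[1])
        if weight > 0:
            groups.setdefault(subject, []).append(idx)
    return groups


def mmr_score(i, group, selected, vectors, lam):
    """Coverage of the subject group (completeness proxy) minus redundancy
    with already-selected sentences (succinctness proxy)."""
    coverage = sum(cosine(vectors[i], vectors[j]) for j in group) / len(group)
    redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
    return lam * coverage - (1 - lam) * redundancy


def select_summary(groups, vectors, per_subject=1, lam=0.7):
    """Pick up to `per_subject` sentences per subject, visiting subjects round-robin."""
    selected = []
    for _ in range(per_subject):
        for subject in sorted(groups):
            candidates = [i for i in groups[subject] if i not in selected]
            if candidates:
                best = max(
                    candidates,
                    key=lambda i: mmr_score(i, groups[subject], selected, vectors, lam),
                )
                selected.append(best)
    return selected
```

The round-robin loop is what enforces balance here: a minor subject gets the same number of slots as a major one, regardless of how many sentences it owns.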
https://doi.org/10.13088/jiis.2019.25.2.141

Ecological Changes of Insect-damaged Pinus densiflora Stands in the Southern Temperate Forest Zone of Korea (I) (솔잎혹파리 피해적송림(被害赤松林)의 생태학적(生態学的) 연구(研究) (I))

Yim, Kyong Bin; Lee, Kyong Jae; Kim, Yong Shik
- Journal of Korean Society of Forest Science, v.52 no.1, pp.58-71, 1981
Thecodiplosis japonensis (the pine needle gall midge) is sweeping through the Pinus densiflora forests from the south-west toward the north-east, destroying almost all of the old large trees and even young ones. The front line of infestation is moving slowly but ceaselessly northward, like a long battle front. More than 40 percent of the P. densiflora forest area is estimated to have been damaged already; however, some individual trees escaped damage and could contribute to restoring the site to its previous vegetation composition. When stands are attacked by this insect, the upper canopy, formed exclusively by P. densiflora, is usually opened up drastically, and environmental factors such as light, temperature, litter accumulation and soil moisture are naturally modified. Following these changes, phytosociological changes in the vegetation proceed gradually over time. If forests are classified by the history of the insect outbreak, namely non-attacked (healthy forest), recently damaged (outbreak about 1-2 years earlier), severely damaged (5-6 years earlier), damage prolonged (about 10 years earlier) and restored (about 20 years earlier), directional changes in vegetation composition can be traced through these progressive stages. To elucidate these changes, three survey districts were established (see Tab. 1): (1) Gongju, where the damage was severe and the outbreak began in 1977; (2) Buyeo, where the damage was prolonged; and (3) Gochang, where the forest had been restored. All three are located in the southern temperate forest zone, which is delimited mainly on the basis of temperature and is generally accepted at present. Considering temperature, the amount and distribution of precipitation, and various soil factors, the environmental conditions of the survey districts can be regarded as broadly homogeneous.
However, this does not rule out that small differences in edaphic and topographic conditions and in microclimate may alter vegetation patterns. Four survey plots were established in each district, 3 to 4 km apart, and four subplots were set within each plot. Following the nested quadrat method, subplots measured $10\,\mathrm{m}\times10\,\mathrm{m}$ for woody vegetation and $5\,\mathrm{m}\times5\,\mathrm{m}$ for ground cover vegetation less than 2 m high. Survey plots were selected according to the following criteria: (1) natural stands with an upper-canopy crown density above 80 percent and an area of more than 5 hectares; (2) no natural or artificial disturbance, such as fire or thinning, during the past three decades; (3) altitude below 500 m; (4) slope less than 20 degrees; and (5) a northerly aspect. An intensive vegetation survey was undertaken in the summer of 1980. For sampling, the vegetation was divided into three layers: the upper layer, dominated mainly by pine; the middle layer, composed of oak species and other broad-leaved trees as well as pine; and the ground (lower) layer of shrubby woody plants. The survey covered woody species only. For the vegetation analysis, values of density, frequency, cover, relative importance, species diversity, dominance, and similarity and dissimilarity indices were calculated. When importance values were calculated, each layer was given an arbitrary relative weight: 3 points for the upper layer, 2 for the middle layer and 1 for the ground layer. The formula then becomes:

$$\mathrm{R.I.V.}=\frac{3\,\mathrm{IV}_{\mathrm{upper}}+2\,\mathrm{IV}_{\mathrm{middle}}+1\,\mathrm{IV}_{\mathrm{ground}}}{6}$$

where $\mathrm{IV}_{\mathrm{upper}}$, $\mathrm{IV}_{\mathrm{middle}}$ and $\mathrm{IV}_{\mathrm{ground}}$ are a species' importance values in the upper, middle and ground layers. The values of the Similarity Index were calculated from the Relative Importance Value of trees (the sum of relative density, frequency and cover).
The formula used is:

$$\mathrm{S.I.}=\frac{2C}{S_1+S_2}\times 100=\frac{2C}{100+100}\times 100=C\ (\%)$$

where $C$ is the sum of the lower of the two quantitative values for species shared by the two communities, $S_1$ is the sum of all values for the first community, and $S_2$ is the sum of all values for the second community (each totals 100 when relative values are used).

Tab. 3 presents the species composition of each plot by layer and by district. Without exception, the upper layer of every stand was formed by Pinus densiflora. The table also gives relative cover (%), density (number of trees per $500\,\mathrm{m}^2$), ranges of height and diameter at breast height, and cone-bearing tendency. In the middle layer, Quercus spp. (Q. aliena, Q. serrata, Q. mongolica, Q. acutissima and Q. variabilis) and Pinus densiflora were dominant. The genera Rhododendron and Lespedeza were abundant in the ground vegetation, together with some oaks.

(1) Gongju district. A total of 26 woody species occurred in this district. The relative importance value of Pinus densiflora in the upper layer was 79.1%. In the middle layer, the R.I.V. values of Quercus acutissima, Pinus densiflora and Quercus aliena were 22.8%, 18.7% and 10.0%, respectively. In the ground vegetation, the highest values were Q. mongolica (17.0%), Q. serrata (16.8%), Corylus heterophylla (11.8%) and Q. dentata (11.3%), in that order.

(2) Buyeo district. The number of species recorded in this district was 36, and the R.I.V. of Pinus densiflora in the upper layer was 100%. In the middle layer, the R.I.V. values of Q. variabilis and Q. serrata were 8.6% and 8.5%, respectively. In the ground vegetation, 24 species were counted, none with an R.I.V. above 5%. The mean R.I.V. of P. densiflora (combining the three layers and averaging over the four plots) was 57.7%, compared with 46.9% for the Gongju district.

(3) Gochang district. The total number of woody species was 23, and the mean R.I.V. of Pinus densiflora was 66.0%, higher than in the two other districts. The next highest value was 6.5%, for Q. serrata.
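The layer-weighted R.I.V. formula given earlier, and the plot averaging used for the district means, can be checked with a short Python sketch. The importance values in the example are invented for illustration and are not taken from the survey; treating a layer in which the species is absent as 0 is an assumption, and the function names are hypothetical.

```python
# Layer weights given in the text: 3 (upper), 2 (middle), 1 (ground).
LAYER_WEIGHTS = {"upper": 3, "middle": 2, "ground": 1}


def relative_importance_value(iv_by_layer):
    """R.I.V. = (3*IV_upper + 2*IV_middle + 1*IV_ground) / 6.
    A layer missing from `iv_by_layer` (species absent) counts as 0 (assumption)."""
    weighted = sum(w * iv_by_layer.get(layer, 0.0) for layer, w in LAYER_WEIGHTS.items())
    return weighted / sum(LAYER_WEIGHTS.values())


def mean_riv(plots):
    """Average one species' R.I.V. over several plots (e.g. the four plots of a district)."""
    return sum(relative_importance_value(p) for p in plots) / len(plots)


if __name__ == "__main__":
    # Hypothetical importance values (%) for one species in one plot.
    print(relative_importance_value({"upper": 60.0, "middle": 30.0, "ground": 12.0}))  # 42.0
```

Because the weights sum to 6, a species that reaches 100% only in the upper layer scores 50%, which is why an upper-layer value of 100% can coexist with a much lower combined mean.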
As time since the insect outbreak increased, the mean R.I.V. of P. densiflora rose in the order 46.9% (Gongju), 57.7% (Buyeo) and 66.0% (Gochang). This implies that P. densiflora was regaining its original dominant state. The pooled importance of the genus Quercus decreased as that of Pinus densiflora increased. This trend contradicted the findings of a survey in the Kyonggi-do area (the central temperate forest zone) reported previously (Yim et al., 1980). Within the genus Quercus, Quercus acutissima, a warm-loving species, was more abundant in the southern temperate zone studied here than in the central temperate zone, whereas the reverse was true for Q. mongolica, a cold-loving species. The species not shared between the present survey and the previous report are Carpinus cordata, Betula davurica, Wisteria floribunda, Weigela subsessilis, Gleditsia japonica var. koraiensis, Acer pseudosieboldianum, Euonymus japonica var. macrophylla, Ribes mandshuricum, Pyrus calleryana var. fauriei, Tilia amurensis and Pyrus pyrifolia. Maximum species diversity (maximum H'), species diversity (H') and evenness (J') are presented in Figure 4 and Table 5. The similarity indices between districts are shown in Tab. 5. Fig. 6 shows a two-dimensional ordination of the plots on X and Y coordinates: the $A_i$ plots aggregate on the left, the $B_i$ plots at the bottom, and the $C_i$ plots at the upper right. Patterns of increase and decrease in Relative Density and Relative Importance Value by genus or species are given in Fig. 7. Some of these patterns are not consistent with those reported previously (Yim et al., 1980).
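The text reports H', maximum H' and J' without giving their formulas. Assuming the conventional Shannon index $H' = -\sum_i p_i \ln p_i$, with $H'_{max} = \ln S$ for $S$ species and Pielou's evenness $J' = H'/H'_{max}$, they can be computed as below. The natural logarithm and the choice of abundance measure (counts or importance values) are assumptions, since neither is stated here, and the function name is hypothetical.

```python
import math


def diversity_indices(abundances):
    """Return (H', H'max, J') for a list of per-species abundances.

    H'    = -sum(p_i * ln p_i) over species with non-zero abundance
    H'max = ln S, where S is the number of such species
    J'    = H' / H'max (Pielou's evenness; 0.0 when only one species is present)
    """
    values = [a for a in abundances if a > 0]
    if not values:
        return 0.0, 0.0, 0.0
    total = sum(values)
    proportions = [a / total for a in values]
    h = sum(-p * math.log(p) for p in proportions)
    h_max = math.log(len(values))
    evenness = h / h_max if h_max > 0 else 0.0
    return h, h_max, evenness
```

When all species are equally abundant, H' equals H'max and J' is 1; the more one species (such as P. densiflora in a restored stand) dominates, the further J' falls below 1.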
The authors attribute this discrepancy to two distinct types of insect attack: a "short war" type occurring in the southern temperate forest zone, in which the attack lasted only a few years, and a "long-drawn war" type observed in the central temperate forest zone, in which damage continued for several years. These different patterns of infestation may have led to different courses of vegetation change. Analysis of the similarity indices between districts gave clear results: the dissimilarity index was 30% between A and B, 27% between B and C, and 35% between A and C (Table 6). The range of the similarity index was obtained by calculating it for every possible combination of plots between two districts. The longer the time separating two communities, the higher their dissimilarity index. Ten to 20 years after the insect outbreak, the ground vegetation consisted mainly of the genera Lespedeza and Rhododendron. The genus Quercus, which was dominant for a while after the insect attack, was giving way to Pinus densiflora. This implies that where soil fertility, soil moisture and soil depth are adequate, the genus Quercus would not be so easily displaced by a resistant species such as Pinus densiflora, which forms the edaphic climax over vast areas of forest land. Quercus is usually regarded as the representative component of undisturbed natural forest in the central part of the country.
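The similarity index defined earlier, and the plot-by-plot comparison used to obtain its range, can be sketched as follows. Two assumptions are made: the dissimilarity index is taken as 100 - S.I. (the usual convention, not stated explicitly in the text), and each plot is represented as a mapping from species to its relative importance value. The function names and the sample communities in the usage check are hypothetical.

```python
from itertools import product


def similarity_index(plot_a, plot_b):
    """S.I. = 2C / (S1 + S2) * 100, where C sums, over species present in both
    plots, the lower of the two values, and S1, S2 are each plot's total."""
    shared = plot_a.keys() & plot_b.keys()
    c = sum(min(plot_a[sp], plot_b[sp]) for sp in shared)
    s1, s2 = sum(plot_a.values()), sum(plot_b.values())
    return 200.0 * c / (s1 + s2)


def dissimilarity_range(district_x, district_y):
    """Dissimilarity (assumed to be 100 - S.I.) for every plot pair across two
    districts; returns the (min, max) range."""
    values = [100.0 - similarity_index(a, b) for a, b in product(district_x, district_y)]
    return min(values), max(values)
```

With relative values summing to 100 in each plot, `similarity_index` reduces to C itself, matching the simplification in the formula; with four plots per district, `dissimilarity_range` evaluates 16 plot pairs.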