• Title/Summary/Keyword: decomposition


Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • According to the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of the analysis. Until recently, text mining studies focused on the second step. However, with the finding that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to preserve the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be directly fed into a variety of operations and traditional analysis techniques, unstructured text must first be structured into a form the computer can understand. Mapping arbitrary objects into a space of a specific dimension while maintaining their algebraic properties is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec to embed each document into a single vector, is the most widely used. However, traditional document embedding methods, represented by doc2Vec, generate a vector for each document using all the words in the document. As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector, so it is difficult to accurately represent a complex document covering multiple subjects. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords; for a document without keywords, the method can be applied after extracting keywords through various analysis techniques. Since keyword extraction is not the core subject of the proposed method, we describe the process of applying the method to documents with predefined keywords. The proposed method consists of (1) parsing, (2) word embedding, (3) keyword vector extraction, (4) keyword clustering, and (5) multiple-vector generation. The specific process is as follows. All text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to avoid the influence of miscellaneous words, the vectors corresponding to each document's keywords are extracted to form a set of keyword vectors per document. Next, clustering is conducted on each document's keyword set to identify the multiple subjects included in the document. Finally, a vector is generated from the keyword vectors constituting each cluster. Experiments on 3,147 academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects within each vector. With the proposed multi-vector method, we ascertained that complex documents can be vectorized more accurately by eliminating this interference.
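The five-step pipeline described above can be sketched in code. The following is an illustrative reconstruction, not the authors' implementation: it assumes the word embedding step (1)-(3) has already produced one vector per keyword, and covers steps (4)-(5) with a simple k-means clustering.

```python
import numpy as np

def multi_vector_embedding(keyword_vecs, k=2, iters=20):
    """Steps (4)-(5) of the proposed method, sketched: cluster a document's
    keyword vectors and return one representative vector per cluster (subject).
    keyword_vecs: (n, d) array of word embeddings for the document's keywords."""
    X = np.asarray(keyword_vecs, dtype=float)
    # Deterministic farthest-point initialization of k centroids
    centroids = [X[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[dists.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assign each keyword vector to its nearest centroid
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        # Each cluster mean becomes one of the document's multiple vectors
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids
```

A complex document whose keywords fall into two well-separated regions of the embedding space thus yields two vectors instead of one averaged vector, which is the interference-avoidance idea of the abstract.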

Ecological Characteristics of Termite(Reticulitermes speratus kyushuensis) for Preservation of Wooden Cultural Heritage (목조문화재의 보존을 위한 한국산 흰개미의 생태적 특성 연구)

  • Lee, Kyu-Shik;Jeong, So-Young
    • Korean Journal of Heritage: History & Science / v.37 / pp.327-348 / 2004
  • In this study, after analyzing several local climate characteristics of South Korea, I examined the distribution, invasion, foraging, underground activity, and attack season of the Korean termite (Reticulitermes speratus kyushuensis Morimoto) as ecological characteristics, along with its preferences for temperature, relative humidity, and tree species. The southern part of the Korean Peninsula, in particular, is suitable for the inhabitation and activity of termites with the same ecological characteristics as R. speratus kyushuensis. Busan is a district neighboring the northern field-distribution temperature limit of Coptotermes formosanus Shiraki, and Chuncheon lies where the northern field-distribution temperature limit of Reticulitermes speratus Kolbe passes through the Korean Peninsula. Termite attack on wood devices was about 34.5% over 3 years in the forest of Jongmyo. Although the attack rate increased each year, the detection rate decreased and the missing rate gradually increased. The experiments confirmed a foraging habit in which one part of the termite colony continues decomposing existing food while another part searches for new food. Foraging termites were found underground at Jongmyo in Seoul from April to November 2001, with peak activity in July and August. The termite invasion rate of the bait stations increased with every monitoring. From the increasing attack rate between the 2nd monitoring (November 2000) and the 3rd monitoring (March 2001), I confirmed that termites moved deep underground in winter and continued to forage. R. speratus kyushuensis inhabiting the Korean Peninsula is a species whose food consumption rate increases with temperature. The termite consumed the greatest amount of food (filter paper) at 30 ℃ (90% RH) but showed an increasing death rate above 32 ℃. The survival rate was 97% at 84% RH (30 ℃), but mortality was 100% at 52% RH and 70% RH (30 ℃). For wood feeding, a preference for pine (Pinus densiflora) above all other species was observed: survival was high (87%) on pine but low (13.5%) on paulownia (Paulownia coreana). In this study, I presented the biological characteristics of the termite (R. speratus kyushuensis Morimoto) and confirmed the degree of termite deterioration of wooden cultural heritage in Korea. Depending on climate and soil temperature, each area in the southern part of the Korean Peninsula has a somewhat different active period and distribution of R. speratus kyushuensis. I expect this report to help prepare integrated pest management (IPM) of the termite for wooden cultural heritage in Korea and to reduce the economic loss from termite damage.

Monitoring Soil Characteristics and Growth of Pinus densiflora Five Years after Restoration in the Baekdudaegan Ridge (백두대간 마루금 복원사업지에서의 5년 경과 후 토양특성 및 소나무 생장 모니터링)

  • Han, Seung Hyun;Kim, Jung Hwan;Kang, Won Seok;Hwang, Jae Hong;Park, Ki Hyung;Kim, Chan-Beom
    • Korean Journal of Environment and Ecology / v.33 no.4 / pp.453-461 / 2019
  • This study was conducted to monitor the soil characteristics and growth of Pinus densiflora, and to determine the effect of soil characteristics on growth rate, five years after an ecological restoration project on the Baekdudaegan ridge covering the Ihwaryeong, Yuksimnyeong, and Beoljae sites. The ecological restoration project established P. densiflora forest in 2012-2013. In April 2018, we collected soil samples from each site and measured the height and diameter at breast height (DBH) of P. densiflora. Although there was no significant change in soil pH compared to the early stage of restoration (one year after the project), it was high in Ihwaryeong and Beoljae, with values of 7.7 and 6.4, respectively. The organic matter decreased by 70-80%, and the available phosphorus (P) was unchanged at the three restoration sites. The decreased organic matter can be attributed to restricted inflow, and thus decomposition, of litter in the early stage after restoration. The tree height growth rate (m/yr) of P. densiflora was highest in Yuksimnyeong at 1.02, followed by Beoljae at 0.75 and Ihwaryeong at 0.17. The height growth rate showed negative relationships with soil pH and cation (Na and Ca) concentrations and a positive relationship with available phosphate. The low growth rate at the Ihwaryeong site, in particular, might result from poor nutrient availability due to high soil pH and from reduced water absorption by the roots due to high Na and Ca concentrations. The substantial reduction of organic matter after five years indicates the need for soil improvement using chemical fertilizer and biochar.

Coarse Woody Debris (CWD) Respiration Rates of Larix kaempferi and Pinus rigida: Effects of Decay Class and Physicochemical Properties of CWD (일본잎갈나무와 리기다소나무 고사목의 호흡속도: 고사목의 부후등급과 이화학적 특성의 영향)

  • Lee, Minkyu;Kwon, Boram;Kim, Sung-geun;Yoon, Tae Kyung;Son, Yowhan;Yi, Myong Jong
    • Journal of Korean Society of Forest Science / v.108 no.1 / pp.40-49 / 2019
  • Coarse woody debris (CWD), a component of the forest ecosystem, plays a major role in forest energy flow and nutrient cycling. In particular, CWD sequesters carbon for a long time and is important for slowing the rate at which carbon is released from the forest to the atmosphere. This study therefore measured the physicochemical characteristics and respiration rate (R_CWD) of CWD of Larix kaempferi and Pinus rigida in temperate forests in central Korea. In summer 2018, CWD samples from decay class (DC) I to IV were collected in 14 forest stands, and R_CWD was measured in the laboratory using a closed chamber with a portable carbon dioxide sensor, together with the physicochemical characteristics. In both species, as CWD decomposition progressed, CWD density (D_CWD) decreased while water content (WC_CWD) increased. Carbon concentration did not differ significantly by DC, whereas nitrogen concentration increased significantly and the C/N ratio decreased. The respiration rate of L. kaempferi CWD increased significantly up to DC IV, but for P. rigida it increased up to DC II and then remained unchanged for DC II-IV. Except for carbon concentration, all measured characteristics correlated significantly with R_CWD. Multiple linear regression showed that WC_CWD was the most influential factor on R_CWD. WC_CWD affects R_CWD by increasing microbial activity and is closely related to complex environmental factors such as temperature and light conditions. Therefore, it is necessary to study their correlation and estimate the time-series pattern of CWD moisture.
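As a hedged illustration of the kind of multiple linear regression reported here (not the authors' data or code, and the predictor names are only stand-ins), fitting on z-scored predictors makes coefficient magnitudes directly comparable, which is how a "most influential factor" can be identified:

```python
import numpy as np

def standardized_coefs(X, y):
    """OLS fit of y on z-scored columns of X. Because every predictor is on
    the same scale, the coefficient with the largest magnitude marks the
    most influential factor (e.g. WC_CWD vs. D_CWD, hypothetically)."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    A = np.column_stack([np.ones(len(Xz)), Xz])
    beta, *_ = np.linalg.lstsq(A, yz, rcond=None)
    return beta[1:]  # drop the intercept

# Synthetic example: respiration driven mostly by the first predictor
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))  # columns: e.g. [WC_CWD, D_CWD]
y = 2.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=50)
coefs = standardized_coefs(X, y)
```

On such data the first standardized coefficient dominates, mirroring the abstract's conclusion that water content dominates the regression.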

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.175-197 / 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted, showing remarkable results in fields such as classification, summarization, and generation. Among text analysis tasks, text classification is the most widely used in academia and industry. Text classification includes binary classification with one label from two classes, multi-class classification with one label from several classes, and multi-label classification with multiple labels from several classes. Multi-label classification in particular requires a different training method because each instance carries multiple labels. Moreover, as the number of labels and classes grows, prediction becomes harder and performance improvement becomes difficult. To overcome these limitations, label embedding is being actively studied: (i) the initially given high-dimensional label space is compressed into a low-dimensional latent label space, (ii) training is performed to predict the compressed labels, and (iii) the predicted labels are restored to the original high-dimensional label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, these techniques consider only linear relationships between labels or compress the labels by random transformation, so they cannot capture non-linear relationships between labels and thus cannot create a latent label space that sufficiently preserves the information of the original labels. Recently, there have been increasing attempts to improve performance by applying deep learning to label embedding. Label embedding using an autoencoder, a deep learning model effective for data compression and restoration, is representative. However, traditional autoencoder-based label embedding loses a large amount of information when compressing a high-dimensional label space with a myriad of classes into a low-dimensional latent label space. This is related to the vanishing gradient problem that occurs during backpropagation. To solve this problem, skip connections were devised: by adding a layer's input to its output, gradients are preserved during backpropagation, enabling efficient learning even when the network is deep. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies applying them to autoencoders or to the label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. The proposed methodology was applied to actual paper keywords to derive a high-dimensional keyword label space and a low-dimensional latent label space. We then conducted an experiment that predicts the compressed keyword vector in the latent label space from the paper abstract and evaluates multi-label classification by restoring the predicted keyword vector to the original label space. As a result, the accuracy, precision, recall, and F1 score of multi-label classification based on the proposed methodology were far superior to those of traditional multi-label classification methods. This suggests that the low-dimensional latent label space derived by the proposed methodology reflects the information of the high-dimensional label space well, which ultimately improves the performance of multi-label classification itself. In addition, the utility of the proposed methodology was assessed by comparing its performance across domain characteristics and across numbers of latent label space dimensions.
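A minimal numerical sketch of the skip connection idea (my illustration, not the paper's architecture): the layer input is projected and added back to the layer output, so the mapping can fall back to a projection of the identity and gradients have a bypass path around the non-linearity.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def encoder_layer_with_skip(x, W1, W2, W_skip):
    """One encoder layer of a skip-connected autoencoder (illustrative).
    The input x, linearly projected by W_skip, is added to the layer's
    non-linear output, giving gradients a path that bypasses relu."""
    hidden = relu(x @ W1)
    return hidden @ W2 + x @ W_skip  # skip connection

# Sanity check of the bypass: with a zeroed non-linear branch and an
# identity skip projection, the layer passes its input through unchanged.
x = np.array([[1.0, -2.0, 3.0]])
W1 = np.zeros((3, 4))
W2 = np.zeros((4, 3))
out = encoder_layer_with_skip(x, W1, W2, np.eye(3))
```

This pass-through behavior is exactly what prevents information (and gradient signal) from being lost as layers are stacked in the encoder and decoder.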

Production of Medium-chain-length Poly (3-hydroxyalkanoates) by Pseudomonas sp. EML8 from Waste Frying Oil (Pseudomonas sp. EML8 균주를 이용한 폐식용류로부터 medium-chain-length poly(3-hydroxyalkanoates) 생합성)

  • Kim, Tae-Gyeong;Kim, Jong-Sik;Chung, Chung-Wook
    • Journal of Life Science / v.31 no.1 / pp.90-99 / 2021
  • In this study, to reduce the production cost of poly(3-hydroxyalkanoates) (PHA), optimal cell growth and PHA biosynthesis conditions for the isolated strain Pseudomonas sp. EML8 were established using waste frying oil (WFO) as a cheap carbon source. Gas chromatography (GC) and GC mass spectrometry analysis of the medium-chain-length PHA (mcl-PHAWFO) produced by Pseudomonas sp. EML8 from WFO indicated that it was composed of 7.28 mol% 3-hydroxyhexanoate, 39.04 mol% 3-hydroxyoctanoate, 37.11 mol% 3-hydroxydecanoate, and 16.58 mol% 3-hydroxydodecanoate monomers. When Pseudomonas sp. EML8 was cultured in flasks, the maximum dry cell weight (DCW) and mcl-PHAWFO yield (g/l) were obtained under WFO (20 g/l), (NH4)2SO4 (0.5 g/l), pH 7, and 25 ℃ culture conditions. Based on this, the highest DCW, mcl-PHAWFO content, and mcl-PHAWFO yield from 3-l-jar fermentation were obtained after 48 hr. Similar results were obtained using 20 g/l of fresh frying oil (FFO) as a control carbon source; in this case, the DCW, mcl-PHAFFO content, and mcl-PHAFFO yield were 2.7 g/l, 62 wt%, and 1.6 g/l, respectively. Gel permeation chromatography analysis confirmed the average molecular weights of mcl-PHAWFO and mcl-PHAFFO to be 165-175 kDa. Thermogravimetric analysis showed decomposition temperatures of 260 ℃ and 274.7 ℃ for mcl-PHAWFO and mcl-PHAFFO, respectively. In conclusion, Pseudomonas sp. EML8 and WFO could be suggested as a new candidate strain and substrate for the industrial production of PHA.

A Study on the Development of High Sensitivity Collision Simulation with Digital Twin (디지털 트윈을 적용한 고감도 충돌 시뮬레이션 개발을 위한 연구)

  • Ki, Jae-Sug;Hwang, Kyo-Chan;Choi, Ju-Ho
    • Journal of the Society of Disaster Information / v.16 no.4 / pp.813-823 / 2020
  • Purpose: In order to maximize the stability and productivity of high-risk, high-cost work such as dismantling facilities inside a reactor through prior simulation, we intend to use digital twin technology that can closely mirror the specifications of the actual control equipment. Motion control errors caused by the time gap between the precision control equipment and the simulation when applying digital twin technology can lead to hazards such as collisions between hazardous facilities and control equipment. Prior research is needed to eliminate and control these situations. Method: Unity 3D is currently the most popular engine used to develop simulations. However, control errors can be caused by time correction within the Unity 3D engine; such errors are expected in many environments and may vary depending on the development environment, such as system specifications. To demonstrate this, we developed a collision simulation using the Unity 3D engine, conducted collision experiments under various conditions, organized and analyzed the results, and derived tolerances for the precision control equipment based on them. Result: In the collision simulation experiments, a 1/1000-second time correction in an internal engine function call produced a per-step distance error in the movement control of the colliding objects, and the distance error was proportional to the collision velocity. Conclusion: Remote decomposition simulators using digital twin technology should limit the speed of movement according to the precision required of the control devices in the given hardware, software, and manual-control environment. In addition, the system development environment, the hardware specifications, the size of the modeling data for the control equipment and facilities imitated in the simulation, the allowable error of the operational control equipment, and the required working speed must all be taken into account.
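The reported proportionality between timestep correction and distance error invites a back-of-the-envelope check (my sketch; the 1/1000 s correction is the figure quoted in the abstract, while the tolerance values below are hypothetical):

```python
def position_error(velocity, dt_correction=1.0 / 1000):
    """Per-step distance error when an internal engine time correction
    shifts the effective timestep by dt_correction: error = v * dt,
    i.e. the error grows linearly with the collision velocity."""
    return velocity * dt_correction

def max_safe_velocity(tolerance, dt_correction=1.0 / 1000):
    """Speed limit keeping the per-step error within a given tolerance,
    the kind of movement-speed restriction the conclusion calls for."""
    return tolerance / dt_correction
```

For example, a control arm moving at 2 m/s would accumulate about 2 mm of error per corrected step, so a hypothetical 1 mm tolerance would imply limiting its speed to about 1 m/s.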

Effect of Dry Surface Treatment with Ozone and Ammonia on Physico-chemical Characteristics of Dried Low Rank Coal (건조된 저등급 석탄에 대한 건식 표면처리가 물리화학적 특성에 미치는 영향)

  • Choi, Changsik;Han, Gi Bo;Jang, Jung Hee;Park, Jaehyeon;Bae, Dal Hee;Shun, Dowon
    • Applied Chemistry for Engineering / v.22 no.5 / pp.532-539 / 2011
  • The physical and chemical properties of dried low rank coals (LRCs) before and after surface treatment with ozone and ammonia were characterized in this study. The contents of moisture, volatiles, fixed carbon, and ash in the dried LRCs before surface treatment were about 2.0, 44.8, 44.9, and 8.9%, respectively. Elementally, they were composed of 62.66% carbon, 4.33% hydrogen, 0.94% nitrogen, 27.01% oxygen, and 0.09% sulfur. The dried LRCs were surface-treated by dry methods using ozone at room temperature or ammonia at 200 ℃, and the LRCs before and after treatment were characterized by FT-IR, TGA, proximate and elemental analysis, calorific value, ignition testing, H2O adsorption, and NH3-TPD. After ozone treatment, the oxygen content increased while the calorific value, ignition temperature, and carbon and hydrogen contents relatively decreased, because additional oxygen-containing functional groups were generated by surface oxidation with ozone, which acts as an oxidant. The H2O adsorption capacity also increased because the added oxygen-containing functional groups are hydrophilic. On the other hand, the dried LRCs surface-treated with NH3 at 200 ℃ showed decreased oxygen content but increased calorific value, ignition temperature, and carbon and hydrogen contents because of the decomposition of oxygen-containing functional groups on the surface. In addition, their H2O adsorption capacity was lowered because the surface of the dried LRCs was rendered hydrophobic by the loss of the hydrophilic oxygen-containing functional groups. It was concluded that the various physico-chemical properties of dried LRCs can be changed by surface treatment.

Production of Poly (3-Hydroxybutyrate-co-3-Hydroxyvalerate) by Bacillus sp. EMK-5020 Using Makgeolli Lees Enzymatic Hydrolysate and Propionic Acid as Carbon Sources (막걸리 주박 가수분해 산물과 propionic acid를 탄소원으로 이용한 Bacillus sp. EML-5020 균주로부터 poly (3-hydroxybutyrate-co-3-hydroxyvalerate) 생합성)

  • Kwon, Kyungjin;Kim, Jong-Sik;Chung, Chung-Wook
    • Journal of Life Science / v.32 no.7 / pp.510-522 / 2022
  • In this study, to biosynthesize PHA with properties more similar to polypropylene, a Bacillus sp. EMK-5020 strain that biosynthesizes poly(3-hydroxybutyrate-co-3-hydroxyvalerate) (PHBV) was isolated from soil. The Bacillus sp. EMK-5020 strain biosynthesized PHBV containing 1.3% 3-hydroxyvalerate (3HV) using the reducing sugar contained in Makgeolli lees enzymatic hydrolysate (MLEH) as a single carbon source. As the amount of propionic acid added as a second carbon source increased, the 3HV content also increased; PHBV containing up to 48.6% 3HV was synthesized when 1.0 g/l of propionic acid was added. Based on these results, the strain was cultured for 72 hr in a 3 l fermenter using the reducing sugar in MLEH (20 g/l) and propionic acid (1 g/l) as the main and secondary carbon sources, respectively. As a result, 6.4 g/l DCW and 50 wt% of PHBV (MLEH-PHBV) containing 8.9% 3HV were biosynthesized. Gel permeation chromatography and thermogravimetric analysis confirmed that the average molecular weight and decomposition temperature of MLEH-PHBV were 152 kDa and 273 ℃, respectively. In conclusion, the Bacillus sp. EMK-5020 strain could biosynthesize PHBV containing various 3HV fractions when MLEH and propionic acid were used as carbon sources, and MLEH-PHBV containing 8.9% 3HV was confirmed to have higher thermal stability than standard PHBV (8% 3HV).

Development of simultaneous analytical method for investigation of ketamine and dexmedetomidine in feed (사료 내 케타민과 덱스메데토미딘의 잔류조사를 위한 동시분석법 개발)

  • Chae, Hyun-young;Park, Hyejin;Seo, Hyung-Ju;Jang, Su-nyeong;Lee, Seung Hwa;Jeong, Min-Hee;Cho, Hyunjeong;Hong, Seong-Hee;Na, Tae Woong
    • Analytical Science and Technology / v.35 no.3 / pp.136-142 / 2022
  • According to media reports, the carcasses of euthanized abandoned dogs were processed at high temperature and pressure into powder and then used as feed material (meat and bone meal), raising the possibility that the anesthetics ketamine and dexmedetomidine used for euthanasia remain in feed. Therefore, a simultaneous analysis method using QuEChERS combined with high-performance liquid chromatography coupled with electrospray ionization tandem mass spectrometry was developed for rapid residue analysis. The method developed in this study exhibited linearity of 0.999 or higher. Selectivity was evaluated by analyzing blank samples and samples spiked at the limit of quantitation: the MRM chromatograms of blank samples were compared with those of spiked samples, and there were no interferences at the respective retention times of ketamine and dexmedetomidine. The instrumental detection and quantitation limits were 0.6 ㎍/L and 2 ㎍/L, respectively, and the limit of quantitation of the method was 10 ㎍/kg. Recovery tests on meat and bone meal, meat meal, and pet food gave recoveries of 80.48-98.63% for ketamine with less than 5.00% RSD and 72.75-93.00% for dexmedetomidine with less than 4.83% RSD. When six feeds, such as meat and bone meal, prepared at the time the raw material was distributed were collected and analyzed, 10.8 ㎍/kg of ketamine was detected in one meat and bone meal sample, while dexmedetomidine was below the limit of quantitation. It was confirmed that the detected sample had been distributed before the safety issue became known; thereafter, all meat and bone meal made with the carcasses of euthanized abandoned dogs was recalled and completely discarded. To ensure the safety of meat and bone meal, 32 samples of meat and bone meal and compound feed were additionally collected and investigated for ketamine and dexmedetomidine residues; neither compound was detected. However, this investigation confirmed that some animal drugs, such as anesthetics, can remain without decomposition even at high temperature and pressure, so further investigation of other potentially hazardous substances not controlled in feed is needed.
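The recovery and RSD figures reported in the validation follow standard definitions; a small helper (illustrative, with made-up replicate values rather than the study's data) shows the arithmetic:

```python
import statistics

def recovery_and_rsd(measured, spiked):
    """Mean recovery (%) of spiked replicates and the relative standard
    deviation (RSD, %) across them, as used in method validation."""
    recoveries = [100.0 * m / spiked for m in measured]
    mean_rec = statistics.mean(recoveries)
    rsd = 100.0 * statistics.stdev(recoveries) / mean_rec
    return mean_rec, rsd
```

For replicates measured at 9, 10, and 11 ㎍/kg against a 10 ㎍/kg spike, `recovery_and_rsd([9.0, 10.0, 11.0], 10.0)` gives a mean recovery of 100.0% with an RSD of 10.0%.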