• Title/Summary/Keyword: Bi-gram Analysis

Text Mining Analysis Technique on ECDIS Accident Report (텍스트 마이닝 기법을 활용한 ECDIS 사고보고서 분석)

  • Lee, Jeong-Seok;Lee, Bo-Kyeong;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety / v.25 no.4 / pp.405-412 / 2019
  • SOLAS requires that ECDIS be installed on ships of more than 500 gross tonnage engaged in international navigation by the first inspection after July 1, 2018. Since ECDIS was introduced as a major new navigation instrument, several accidents related to its use have occurred. Twelve incident reports issued by MAIB, BSU, BEAmer, DMAIB, and DSB were analyzed, and the causes of the accidents were found to relate to the navigator's operation and to the ECDIS system. The report text was processed in R to quantify the words related to the causes of the accidents. We used the text mining techniques Wordcloud, Wordnetwork, and Wordweight to represent the importance of words according to how often they occur. Wordcloud uses the N-gram model to express word frequency in cloud form. In the uni-gram analysis of the N-gram model, the word "ECDIS" occurred most often, and the bi-gram analysis showed that "Safety Contour" was the most frequent word pair. Based on the bi-gram analysis, the causative words were classified as relating to either the officer or the ECDIS system, and the related words were represented with Wordnetwork. Finally, the words related to the officer and to the ECDIS system were each assembled into a word corpus, and Wordweight was applied to analyze the change in corpus frequency by year. A trend-line analysis of the corpus variation shows that, in more recent years, the officer corpus has decreased while the ECDIS system corpus has gradually increased.
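
The uni-gram and bi-gram frequency counts described in this abstract can be illustrated with a minimal sketch. The study itself used R; the Python below is only an illustration, and the sample sentence, the tokenizer, and the function names `ngrams` and `ngram_counts` are assumptions that do not come from the paper.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token list."""
    return list(zip(*(tokens[i:] for i in range(n))))


def ngram_counts(text, n):
    """Lowercase the text, split it into alphanumeric tokens, count n-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(ngrams(tokens, n))


# Made-up sample text, not taken from any accident report.
sample = (
    "The safety contour alarm was off. "
    "The safety contour was set too shallow on the ECDIS."
)

unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)

# The most frequent word pairs are the candidates a word cloud would enlarge.
print(bigrams.most_common(2))
```

This sketch lets bi-grams span sentence boundaries (for example `("off", "the")`); a real analysis would likely split sentences and remove stop words first.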

Analyzing the Trend of Wearable Keywords using Text-mining Methodology (텍스트마이닝 방법론을 활용한 웨어러블 관련 키워드의 트렌드 분석)

  • Kim, Min-Jeong
    • Journal of Digital Convergence / v.18 no.9 / pp.181-190 / 2020
  • The purpose of this study is to analyze trends in wearable keywords using text mining methodology. To this end, 11,952 newspaper articles published from 1992 to 2019 were collected, and frequency analysis and bi-gram analysis were applied. The frequency analysis showed that Samsung Electronics, LG Electronics, and Apple were the highest-frequency words, and that, among devices, smart watches and smart bands consistently appeared with high frequency. The bi-gram analysis confirmed that sequences of two adjacent words such as world-first and world-largest appeared continuously, and that new related bi-grams emerged whenever issues or events occurred. These keyword trends will be useful for understanding the wearable trend and its future direction.

A Language Model Approach to "The Vegetarian" (채식주의자: 랭귀지 모델 접근)

  • Kim, Jaejun;Kwon, Junhyeok;Kim, Yoolae;Park, Myung-Kwan;Song, Sanghoun
    • Annual Conference on Human and Language Technology / 2017.10a / pp.260-263 / 2017
  • This paper aims to broaden the range of possible analyses of the Korean-language novel "The Vegetarian" by using a computational linguistics program. A language model of the kind commonly used for bi-gram analysis in corpus linguistics is applied to the Man Booker International Prize-winning novel, and the characteristics of "The Vegetarian" are investigated by comparing it with the English-language novel "A Little Life".

Analyzing Female College Student's Recognition of Health Monitoring and Wearable Device Using Topic Modeling and Bi-gram Network Analysis (토픽 모델링 및 바이그램 네트워크 분석 기법을 통한 여대생의 건강관리 및 웨어러블 디바이스 인식에 관한 연구)

  • Jeong, Wookyoung;Shin, Donghee
    • Journal of the Korean Society for Information Management / v.38 no.4 / pp.129-152 / 2021
  • This study proposed a plan to develop wearable devices suitable for female college students by analyzing female college students' perceptions and preferences for wearable devices and their needs for health care using topic modeling and network analysis techniques. To this end, 2,457 posts related to health care and wearable devices were collected from the community used by S Women's University students. After preprocessing the collected posts and comment data, LDA-based topic modeling was performed. Through topic modeling techniques, major issues of female college students related to health care and wearable devices are derived, and bi-gram analysis and network analysis are performed on posts containing related keywords to understand female college students' views on wearable devices.

Modern Methods of Text Analysis as an Effective Way to Combat Plagiarism

  • Myronenko, Serhii;Myronenko, Yelyzaveta
    • International Journal of Computer Science & Network Security / v.22 no.8 / pp.242-248 / 2022
  • The article presents an analysis of modern methods for automatically comparing original and unoriginal text to detect textual plagiarism. The study covers two types of plagiarism: literal, in which plagiarists copy the text exactly without changing anything, and intelligent, which uses more sophisticated techniques that are harder to detect because the text is manipulated, for example by replacing words and signs. Standard techniques for extrinsic detection are string-based, vector space, and semantic-based. The first, most common, and most successful target models for detecting literal plagiarism, N-gram and Vector Space, are analyzed, and their advantages and disadvantages are evaluated. The most effective target models for detecting intelligent plagiarism, particularly for identifying paraphrases by measuring the semantic similarity of short components of the text, are investigated. Models that use neural network architectures and are based on natural language sentence matching approaches, such as Densely Interactive Inference Network (DIIN), Bilateral Multi-Perspective Matching (BiMPM), and Bidirectional Encoder Representations from Transformers (BERT) and its family of models, are considered. The progress in improving plagiarism detection systems, techniques, and related models is summarized. Relevant and urgent problems that remain unresolved in detecting intelligent plagiarism, namely effective recognition of unoriginal ideas and of well-paraphrased text, are outlined.
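
As a rough illustration of why N-gram models suit literal plagiarism but struggle with paraphrase, the sketch below compares two texts by the overlap of their word tri-grams. The Jaccard measure, the tri-gram size, the sample sentences, and the names `word_ngrams` and `jaccard_similarity` are all assumptions chosen for this example; the article does not specify them.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of word n-grams (as tuples) found in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(a, b, n=3):
    """Share of n-grams common to both texts (0.0 = none, 1.0 = identical sets)."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0  # both texts are shorter than n words
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Made-up example texts.
original = "the quick brown fox jumps over the lazy dog"
literal_copy = "The quick brown fox jumps over the lazy dog."
paraphrase = "a fast brown fox leaps over a sleepy dog"

print(jaccard_similarity(original, literal_copy))  # exact copy
print(jaccard_similarity(original, paraphrase))    # reworded copy
```

The exact copy shares every tri-gram with the original, while the paraphrase shares none, which is the gap that the semantic and neural models discussed in the abstract are meant to close.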

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.219-240 / 2018
  • Sentiment analysis, one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Recently, sentiment analysis methods have been widely used in many fields. For example, data-driven surveys analyze the subjectivity of text data posted by users, and market research analyzes users' review posts to quantify users' opinions of a target product. The basic method of sentiment analysis uses a sentiment dictionary (or lexicon), a list of sentiment vocabulary with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to differ across domains. For example, the sentiment word 'sad' carries a negative meaning in many fields but not for a movie. To perform accurate sentiment analysis, we need to build a sentiment dictionary for the given domain. However, building a sentiment lexicon in this way is time-consuming, and many sentiment words are missed unless a general-purpose sentiment lexicon is used. To address this problem, several studies have constructed sentiment lexicons for specific domains based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences that arise when Korean words are converted into English words. These limitations restrict the use of such general-purpose sentiment lexicons as seed data for building a sentiment lexicon for a specific domain. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. Specifically, it constructs sentiment vocabulary by analyzing the glosses contained in the Standard Korean Language Dictionary (SKLD) with the following procedure: First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as having either positive or negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is further extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that appear mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment entries, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, and the importance of developing sentiment dictionaries has gradually declined. However, a recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, resulting in sentiment analysis with higher accuracy (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis itself but also as a source of features that improve the accuracy of deep learning models. The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
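
To make the idea of a lexicon holding both 1-gram and 2-gram entries concrete, here is a minimal lookup sketch. The tiny `LEXICON`, its polarity values, the longest-match rule, and the function name `sentiment_score` are hypothetical choices for illustration; they are not the KNU-KSL data or the authors' method.

```python
import re

# Hypothetical mini-lexicon keyed by 1-gram or 2-gram tuples.
# Entries and polarity values are invented for this example.
LEXICON = {
    ("impressed",): 1,
    ("worthy",): 1,
    ("thank", "you"): 1,
    ("sad",): -1,
    ("not", "worthy"): -1,
}


def sentiment_score(text, lexicon=LEXICON):
    """Sum polarities, preferring a 2-gram match over a 1-gram match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total, i = 0, 0
    while i < len(tokens):
        bigram = tuple(tokens[i:i + 2])
        if len(bigram) == 2 and bigram in lexicon:
            total += lexicon[bigram]
            i += 2  # consume both words of the matched 2-gram
        else:
            total += lexicon.get((tokens[i],), 0)
            i += 1
    return total


print(sentiment_score("Thank you, I was impressed"))
print(sentiment_score("Not worthy and sad"))
```

Checking 2-grams first keeps an entry such as `("not", "worthy")` from being scored as the positive 1-gram `("worthy",)`.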

Effect of Fire on Microbial Community Structure and Enzyme Activities in Forest Soil (산불이 토양 미생물 군집과 효소 활성 변화에 미치는 영향)

  • Oh, Ju-Hwan;Lee, Seul-Bi;Park, Sung-Eun;Lee, Yong-Bok;Kim, Pil-Joo
    • Korean Journal of Environmental Agriculture / v.27 no.2 / pp.133-138 / 2008
  • Fire can affect the microbial community structure of soil through altered environmental conditions, nutrient availability, and biotic sources for microbial re-colonization. We examined the influence of fire on the chemical properties and enzyme activities of soil for 10 months. We also characterized the soil microbial community structure through ester-linked fatty acid methyl ester (EL-FAME) analysis. For this study, we established five burned plots (1 × 1 m) and five unburned plots outside the margin of the fire. Three soil cores were sampled in each plot and composited for analysis at 1, 3, 5, 8, and 10 months after the fire. The fire caused an increase in soil pH, exchangeable Ca and Mg, organic matter, and available $P_2O_5$ compared to unburned sites. The content of $NH_4-N$ in the burned site was significantly higher than that in the unburned site, and this effect continued for 8 months after the fire. There was no difference in soil $NO_3-N$ content between the burned and unburned sites. Fire caused no change in acid phosphatase and arylsulfatase activities, but $\beta$-glucosidase and alkaline phosphatase activities in the burned site increased compared to the unburned site. Microbial biomass, estimated as the total concentration of EL-FAMEs, was significantly higher in burned sites than in unburned sites one month after the fire. In the burned site, the EL-FAMEs indicative of gram-positive bacteria decreased and the fatty acids associated with gram-negative bacteria tended to increase at one and three months after the fire. The sum of the EL-FAME compounds $18:2{\omega}6,9c$ and $18:1{\omega}9c$, which serve as fungal biomarkers, decreased in the burned site compared to the unburned site.

Sesquiterpenoids Bioconversion Analysis by Wood Rot Fungi

  • Lee, Su-Yeon;Ryu, Sun-Hwa;Choi, In-Gyu;Kim, Myungkil
    • Korean Society of Mycology News: Conference Proceedings (한국균학회소식:학술대회논문집) / 2016.05a / pp.19-20 / 2016
  • Sesquiterpenoids are defined as $C_{15}$ compounds derived from farnesyl pyrophosphate (FPP), and their complex structures are found in the tissue of many diverse plants (Degenhardt et al. 2009). FPP's long chain length and additional double bond enable its conversion to a huge range of mono-, di-, and tri-cyclic structures. A number of cyclic sesquiterpenes with alcohol, aldehyde, and ketone derivatives have key biological and medicinal properties (Fraga 1999). Fungi, such as the wood-rotting Polyporus brumalis, are excellent sources of pharmaceutically interesting natural products such as sesquiterpenoids. In this study, we investigated the biosynthesis of P. brumalis sesquiterpenoids on modified medium. Fungal suspensions of 11 white rot species were inoculated in modified medium containing $C_6H_{12}O_6$, $C_4H_{12}N_2O_6$, $KH_2PO_4$, $MgSO_4$, and $CaCl_2$ for 20 days. Cultivation was stopped by solvent extraction via separation of the mycelium. The metabolites were identified as follows: propionic acid (1), mevalonic acid lactone (2), ${\beta}$-eudesmane (3), and ${\beta}$-eudesmol (4) (Figure 1). The main peaks of ${\beta}$-eudesmane and ${\beta}$-eudesmol, which were indicative of sesquiterpene structures, were consistently detected at 5, 7, 12, and 15 days. These results demonstrated the existence of terpene metabolism in the mycelium of P. brumalis. Polyporus spp. are known to generate flavor components such as methyl 2,4-dihydroxy-3,6-dimethyl benzoate; 2-hydroxy-4-methoxy-6-methyl benzoic acid; 3-hydroxy-5-methyl phenol; and 3-methoxy-2,5-dimethyl phenol in submerged cultures (Hoffmann and Esser 1978). Drimanes of sesquiterpenes were reported as metabolites from P. arcularius and shown to exhibit antimicrobial activity against Gram-positive bacteria such as Staphylococcus aureus (Fleck et al. 1996). The main metabolites of P. brumalis, ${\beta}$-eudesmol and ${\beta}$-eudesmane, were categorized as eudesmane-type sesquiterpene structures. The eudesmane skeleton could be biosynthesized from FPP-derived IPP, and approximately 1,000 structures have been identified in plants as essential oils. The biosynthesis of eudesmol from P. brumalis may thus be an important tool for the production of useful natural compounds, as presumed from its identified potent bioactivity in plants. Essential oils comprising eudesmane-type sesquiterpenoids have been extensively researched (Wu et al. 2006). ${\beta}$-Eudesmol is a well-known and important eudesmane alcohol with an anticholinergic effect in the vascular endothelium (Tsuneki et al. 2005). Additionally, recent studies demonstrated that ${\beta}$-eudesmol acts as a channel blocker for nicotinic acetylcholine receptors at the neuromuscular junction, and it can inhibit angiogenesis in vitro and in vivo by blocking the mitogen-activated protein kinase (MAPK) signaling pathway (Seo et al. 2011). Nutrients were varied to determine the optimum condition for the biosynthesis of sesquiterpenes by P. brumalis. Genes encoding terpene synthases, which are crucial to the terpene synthesis pathway, generally respond to environmental factors such as pH, temperature, and available nutrients (Hoffmeister and Keller 2007, Yu and Keller 2005). Calvo et al. described the effect of major nutrients, carbon and nitrogen, on the synthesis of secondary metabolites (Calvo et al. 2002). P. brumalis did not synthesize sesquiterpenes under all growth conditions. Differences in the metabolites observed in P. brumalis grown in PDB and in modified medium highlighted the potential effect of inorganic sources such as $C_4H_{12}N_2O_6$, $KH_2PO_4$, $MgSO_4$, and $CaCl_2$ on sesquiterpene synthesis. ${\beta}$-Eudesmol was apparent during cultivation except when P. brumalis was grown on $MgSO_4$-free medium. 
These results demonstrated that $MgSO_4$ can specifically control the biosynthesis of ${\beta}$-eudesmol. Magnesium has been reported as a cofactor that binds to sesquiterpene synthase (Agger et al. 2008). Specifically, the $Mg^{2+}$ ions bind to two conserved metal-binding motifs. These metal ions complex to the substrate pyrophosphate, thereby promoting the ionization of the leaving groups of FPP and resulting in the generation of a highly reactive allylic cation. The effect of the magnesium source on sesquiterpene biosynthesis was also identified via analysis of the concentration of total carbohydrates. Our current study offered further insight that fungal sesquiterpene biosynthesis can be controlled by nutrients. To profile the metabolites of P. brumalis, the cultures were extracted based on the growth curve. Although metabolites were produced during mycelial growth, it was difficult to detect significant changes in metabolite production, especially for those at low concentrations. These compounds may be of interest in understanding their synthetic mechanisms in P. brumalis. The synthesis of terpene compounds began during the growth phase at day 9. Sesquiterpene synthesis occurred after growth was complete. At day 9, drimenol, farnesol, and mevalonic lactone (or mevalonic acid lactone) were identified. Mevalonic acid lactone is the precursor of the mevalonic pathway, and particularly, it is a precursor for a number of biologically important lipids, including cholesterol hormones (Buckley et al. 2002). Farnesol is the precursor of sesquiterpenoids. Drimenol compounds, bi-cyclic-sesquiterpene alcohols, can be synthesized from trans-trans farnesol via cyclization and rearrangement (Polovinka et al. 1994). They have also been identified in the basidiomycota Lentinus lepideus as secondary metabolites. After 12 days in the growth phase, ${\beta}$-elemene, caryophyllene, ${\delta}$-cadinene, and eudesmane were detected with ${\beta}$-eudesmol. 
The data showed the synthesis of sesquiterpene hydrocarbons with bi-cyclic structures. These compounds can be synthesized from FPP by cyclization. Cyclic terpenoids are synthesized through the formation of a carbon skeleton from linear precursors by terpene cyclase, which is followed by chemical modification by oxidation, reduction, methylation, etc. Sesquiterpene cyclase is a key branch-point enzyme that catalyzes the complex intermolecular cyclization of the linear prenyl diphosphate into cyclic hydrocarbons (Toyomasu et al. 2007). After 20 days in stationary phase, the oxygenated structures eudesmol, elemol, and caryophyllene oxide were detected. Thus, after growth, sesquiterpenes were identified. Per these results, we showed that terpene metabolism in wood-rotting fungi occurs in the stationary phase. We also showed that such metabolism can be controlled by magnesium supplementation in the growth medium. In conclusion, we identified P. brumalis as a wood-rotting fungus that can produce sesquiterpenes. To mechanistically understand eudesmane-type sesquiterpene biosynthesis in P. brumalis, further research into the genes regulating the dynamics of such biosynthesis is warranted.
