• Title/Summary/Keyword: Extraction


Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People are nowadays creating a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS data satisfies the three conditions of Big Data: the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety). If someone intends to discover the trend of an issue in SNS Big Data, this information can be used as an important new source for the creation of new values because it covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and established to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set that corresponds to the daily ranking; (2) visualize the daily time-series graph of a topic for the duration of a month; (3) present the importance of a topic through a treemap based on the score system and frequency; (4) visualize the daily time-series graph of keywords retrieved by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, for processing various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to rapidly process a large amount of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single-node computing to thousands of machines. 
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to arbitrary data; the interaction between data is easy and useful for managing a real-time data stream with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the effectiveness of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS); based on this, we confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.

Sesquiterpenoids Bioconversion Analysis by Wood Rot Fungi

  • Lee, Su-Yeon;Ryu, Sun-Hwa;Choi, In-Gyu;Kim, Myungkil
    • Proceedings of the Korean Society of Mycology Conference (한국균학회소식:학술대회논문집)
    • /
    • 2016.05a
    • /
    • pp.19-20
    • /
    • 2016
  • Sesquiterpenoids are defined as $C_{15}$ compounds derived from farnesyl pyrophosphate (FPP), and their complex structures are found in the tissue of many diverse plants (Degenhardt et al. 2009). FPP's long chain length and additional double bond enable its conversion to a huge range of mono-, di-, and tri-cyclic structures. A number of cyclic sesquiterpenes with alcohol, aldehyde, and ketone derivatives have key biological and medicinal properties (Fraga 1999). Fungi, such as the wood-rotting Polyporus brumalis, are excellent sources of pharmaceutically interesting natural products such as sesquiterpenoids. In this study, we investigated the biosynthesis of P. brumalis sesquiterpenoids on modified medium. Fungal suspensions of 11 white rot species were inoculated in modified medium containing $C_6H_{12}O_6$, $C_4H_{12}N_2O_6$, $KH_2PO_4$, $MgSO_4$, and $CaCl_2$ for 20 days. Cultivation was stopped by solvent extraction via separation of the mycelium. The metabolites were identified as propionic acid (1), mevalonic acid lactone (2), $\beta$-eudesmane (3), and $\beta$-eudesmol (4) (Figure 1). The main peaks of $\beta$-eudesmane and $\beta$-eudesmol, which were indicative of sesquiterpene structures, were consistently detected at 5, 7, 12, and 15 days. These results demonstrated the existence of terpene metabolism in the mycelium of P. brumalis. Polyporus spp. are known to generate flavor components such as methyl 2,4-dihydroxy-3,6-dimethyl benzoate; 2-hydroxy-4-methoxy-6-methyl benzoic acid; 3-hydroxy-5-methyl phenol; and 3-methoxy-2,5-dimethyl phenol in submerged cultures (Hoffmann and Esser 1978). Drimanes of sesquiterpenes were reported as metabolites from P. arcularius and shown to exhibit antimicrobial activity against Gram-positive bacteria such as Staphylococcus aureus (Fleck et al. 1996). The main metabolites of P. 
brumalis, $\beta$-eudesmol and $\beta$-eudesmane, were categorized as eudesmane-type sesquiterpene structures. The eudesmane skeleton could be biosynthesized from FPP-derived IPP, and approximately 1,000 such structures have been identified in plants as essential oils. The biosynthesis of eudesmol by P. brumalis may thus be an important tool for the production of useful natural compounds, as presumed from its identified potent bioactivity in plants. Essential oils comprising eudesmane-type sesquiterpenoids have been previously and extensively researched (Wu et al. 2006). $\beta$-Eudesmol is a well-known and important eudesmane alcohol with an anticholinergic effect in the vascular endothelium (Tsuneki et al. 2005). Additionally, recent studies demonstrated that $\beta$-eudesmol acts as a channel blocker for nicotinic acetylcholine receptors at the neuromuscular junction, and it can inhibit angiogenesis in vitro and in vivo by blocking the mitogen-activated protein kinase (MAPK) signaling pathway (Seo et al. 2011). Variation of nutrients was conducted to determine an optimal condition for the biosynthesis of sesquiterpenes by P. brumalis. Genes encoding terpene synthases, which are crucial to the terpene synthesis pathway, generally respond to environmental factors such as pH, temperature, and available nutrients (Hoffmeister and Keller 2007, Yu and Keller 2005). Calvo et al. described the effect of the major nutrients, carbon and nitrogen, on the synthesis of secondary metabolites (Calvo et al. 2002). P. brumalis did not synthesize sesquiterpenes under all growth conditions. Differences in the metabolites observed in P. brumalis grown in PDB and in modified medium highlighted the potential effect of inorganic sources such as $C_4H_{12}N_2O_6$, $KH_2PO_4$, $MgSO_4$, and $CaCl_2$ on sesquiterpene synthesis. $\beta$-Eudesmol was apparent throughout cultivation except when P. brumalis was grown on $MgSO_4$-free medium. 
These results demonstrated that $MgSO_4$ can specifically control the biosynthesis of $\beta$-eudesmol. Magnesium has been reported as a cofactor that binds to sesquiterpene synthase (Agger et al. 2008). Specifically, the $Mg^{2+}$ ions bind to two conserved metal-binding motifs. These metal ions complex with the substrate pyrophosphate, thereby promoting the ionization of the leaving group of FPP and resulting in the generation of a highly reactive allylic cation. The effect of the magnesium source on sesquiterpene biosynthesis was also identified via analysis of the concentration of total carbohydrates. Our current study offered further insight that fungal sesquiterpene biosynthesis can be controlled by nutrients. To profile the metabolites of P. brumalis, the cultures were extracted based on the growth curve. Although metabolites were produced during mycelial growth, it was difficult to detect significant changes in metabolite production, especially for metabolites at low concentrations. These compounds may be of interest for understanding their synthetic mechanisms in P. brumalis. The synthesis of terpene compounds began during the growth phase at day 9. Sesquiterpene synthesis occurred after growth was complete. At day 9, drimenol, farnesol, and mevalonic acid lactone were identified. Mevalonic acid lactone is the precursor of the mevalonate pathway, and in particular, it is a precursor for a number of biologically important lipids, including cholesterol hormones (Buckley et al. 2002). Farnesol is the precursor of sesquiterpenoids. Drimenol compounds, bicyclic sesquiterpene alcohols, can be synthesized from trans,trans-farnesol via cyclization and rearrangement (Polovinka et al. 1994). They have also been identified in the basidiomycete Lentinus lepideus as secondary metabolites. After 12 days in the growth phase, $\beta$-elemene, caryophyllene, $\delta$-cadinene, and eudesmane were detected together with $\beta$-eudesmol. 
The data showed the synthesis of sesquiterpene hydrocarbons with bi-cyclic structures. These compounds can be synthesized from FPP by cyclization. Cyclic terpenoids are synthesized through the formation of a carbon skeleton from linear precursors by terpene cyclase, which is followed by chemical modification by oxidation, reduction, methylation, etc. Sesquiterpene cyclase is a key branch-point enzyme that catalyzes the complex intermolecular cyclization of the linear prenyl diphosphate into cyclic hydrocarbons (Toyomasu et al. 2007). After 20 days in stationary phase, the oxygenated structures eudesmol, elemol, and caryophyllene oxide were detected. Thus, after growth, sesquiterpenes were identified. Per these results, we showed that terpene metabolism in wood-rotting fungi occurs in the stationary phase. We also showed that such metabolism can be controlled by magnesium supplementation in the growth medium. In conclusion, we identified P. brumalis as a wood-rotting fungus that can produce sesquiterpenes. To mechanistically understand eudesmane-type sesquiterpene biosynthesis in P. brumalis, further research into the genes regulating the dynamics of such biosynthesis is warranted.


Construction of Consumer Confidence index based on Sentiment analysis using News articles (뉴스기사를 이용한 소비자의 경기심리지수 생성)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.1-27
    • /
    • 2017
  • It is known that the economic sentiment index and macroeconomic indicators are closely related because economic agents' judgments and forecasts of business conditions affect economic fluctuations. For this reason, consumer sentiment or confidence provides steady fodder for business and is treated as an important piece of economic information. In Korea, private consumption accounts for a large share of GDP, and the consumer sentiment index is closely linked to it, making the index a very important economic indicator for evaluating and forecasting the domestic economic situation. However, despite offering relevant insights into private consumption and GDP, the traditional approach to measuring consumer confidence through surveys has several limits. One possible weakness is that it takes considerable time to research, collect, and aggregate the data. If certain urgent issues arise, timely information will not be announced until the end of each month. In addition, the survey only contains information derived from questionnaire items, which means it can be difficult to catch up with the direct effects of newly arising issues. The survey also faces potential declines in response rates and erroneous responses. Therefore, it is necessary to find a way to complement it. For this purpose, we construct and assess an index designed to measure consumer economic sentiment using sentiment analysis. Unlike the survey-based measures, our index relies on textual analysis to extract sentiment from economic and financial news articles. In particular, text data such as news articles and SNS are timely and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. There are two main approaches to the automatic extraction of sentiment from a text; we apply the lexicon-based approach, using sentiment lexicon dictionaries of words annotated with their semantic orientations. 
In creating the sentiment lexicon dictionaries, we enter the semantic orientation of individual words manually, though we do not attempt a full linguistic analysis (one that involves analysis of word senses or argument structure); this is a limitation of our research, and further work in that direction remains possible. In this study, we generate a time-series index of economic sentiment in the news. The construction of the index consists of three broad steps: (1) collecting a large corpus of economic news articles on the web; (2) applying lexicon-based methods for sentiment analysis of each article to score the article in terms of sentiment orientation (positive, negative, or neutral); and (3) constructing an economic sentiment index of consumers by aggregating the monthly time series for each sentiment word. In line with existing scholarly assessments of the relationship between the consumer confidence index and macroeconomic indicators, any new index should be assessed for its usefulness. We examine the new index's usefulness by comparing it with the CSI and other economic indicators: trend and cross-correlation analyses are carried out to analyze the relations and lagged structure. Finally, we analyze the forecasting power using one-step-ahead out-of-sample prediction. As a result, in almost all experiments, the news sentiment index correlates strongly with related contemporaneous key indicators, and in most cases news sentiment shocks predict future economic activity; in head-to-head comparisons, the news sentiment measures outperform the survey-based sentiment index (CSI). Policy makers want to understand consumer or public opinions about existing or proposed policies. 
Monitoring various web media, SNS, and news articles enables relevant government decision-makers to respond quickly to such opinions. Textual data, such as news articles and social networks (Twitter, Facebook, and blogs), are generated at high speed and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. Although research using unstructured data in economic analysis is in its early stages, the utilization of such data is expected to increase greatly once its usefulness is confirmed.
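The three-step construction above (collect articles, score each with a lexicon, aggregate into a monthly series) can be sketched in a few lines. The lexicon entries, articles, and scoring rule below are hypothetical toys, not the authors' manually annotated dictionary:

```python
# Minimal sketch of lexicon-based scoring and monthly aggregation.
# LEXICON maps words to assumed semantic orientations (+1 / -1).
from collections import defaultdict

LEXICON = {"growth": 1, "recovery": 1, "strong": 1,
           "recession": -1, "decline": -1, "weak": -1}

articles = [  # (month, text) pairs standing in for a news corpus
    ("2017-01", "strong growth in exports amid recovery"),
    ("2017-01", "weak demand and decline in retail"),
    ("2017-02", "recession fears ease as growth returns"),
]

def article_score(text):
    # Net orientation of one article: sum of word polarities.
    return sum(LEXICON.get(w, 0) for w in text.lower().split())

monthly = defaultdict(list)
for month, text in articles:
    monthly[month].append(article_score(text))

# Monthly index: average article sentiment -- the time series that
# would be compared against survey-based measures such as the CSI.
index = {m: sum(s) / len(s) for m, s in monthly.items()}
print(index)
```

A real index would normalize for corpus size per month and could aggregate per sentiment word, as the abstract describes, rather than per article.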

Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.163-176
    • /
    • 2014
  • Social media is becoming the platform for users to communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs, such as Twitter, have gained popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' effort and investment in content generation by encouraging shorter posts. There has been a lot of research into capturing social phenomena and analyzing the chatter of microblogs. However, measuring television ratings has been given little attention so far. Currently, the most common method of measuring TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch. In a similar way, microblog users interact with each other while watching television or movies, or visiting a new place. In measuring TV ratings, some features are significant during certain hours of the day, or days of the week, whereas these same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model capturing this time-sensitive relevance is required to estimate TV ratings. Therefore, modeling the time-related characteristics of features should be key when measuring TV ratings through microblogs. We show that capturing the time-dependency of features in measuring TV ratings is vitally necessary for improving their accuracy. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set for the experiment. After excluding data such as advertising or promoted tweets, we selected 149 thousand tweets for analysis. 
The number of tweets reaches its maximum level on the broadcasting day and increases rapidly around the broadcasting time. This result stems from the characteristics of the public channel, which broadcasts the program at a predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings. This result implies that a simple tweet rate does not reflect the satisfaction with or response to the TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons or newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We find that there is a time-dependency in the correlation of features between the periods before and after broadcasting time. Since the TV program is broadcast regularly at the predetermined time, users post tweets expressing their expectation for the program or disappointment over not being able to watch it. The highly correlated features before the broadcast differ from the features after broadcasting. This result shows that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have their highest correlation before the broadcasting time, whereas 68 words reach their highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show a high relevance, despite carrying a negative meaning. Understanding the time-dependency of features can be helpful in improving the accuracy of TV ratings measurement. This research contributes a basis for estimating the response to or satisfaction with broadcast programs using the time dependency of words in Twitter chatter. 
More research is needed to refine the methodology for predicting or measuring TV ratings.

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk;Kim, Taeyeon;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.1-25
    • /
    • 2020
  • In this paper, we suggest an application system architecture which provides an accurate, fast, and efficient automatic gasometer reading function. The system captures a gasometer image using a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the character information of the device ID and gas usage amount by selective optical character recognition based on deep learning technology. In general, there are many types of characters in an image, and optical character recognition technology extracts all character information in an image. But some applications need to ignore character types that are not of interest and focus only on specific types of characters. As an example of such an application, an automatic gasometer reading system only needs to extract the device ID and gas usage amount character information from gasometer images to send bills to users. Character strings that are not of interest, such as device type, manufacturer, manufacturing date, specification, etc., are not valuable information to the application. Thus, the application has to analyze the point-of-interest region and specific types of characters to extract valuable information only. We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition, which analyzes only the point-of-interest region for selective character information extraction. We built three neural networks for the application system. 
The first is a convolutional neural network which detects the point-of-interest regions of the gas usage amount and device ID character strings, the second is another convolutional neural network which transforms the spatial information of a point-of-interest region into spatial sequential feature vectors, and the third is a bi-directional long short-term memory network which converts the spatial sequential information into character strings using time-series analysis, mapping feature vectors to character strings. In this research, the point-of-interest character strings are the device ID and gas usage amount. The device ID consists of 12 Arabic numerals and the gas usage amount consists of 4~5 Arabic numerals. All system components are implemented in the Amazon Web Services Cloud with an Intel Xeon E5-2686 v4 CPU and an NVIDIA Tesla V100 GPU. The system architecture adopts a master-slave processing structure for efficient and fast parallel processing, coping with about 700,000 requests per day. The mobile device captures a gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes reading requests from mobile devices into an input queue with a FIFO (First In First Out) structure. The slave process consists of the three types of deep neural networks which conduct the character recognition process and runs on the NVIDIA GPU module. The slave process continually polls the input queue for recognition requests. If there are requests from the master process in the input queue, the slave process converts the image in the input queue into the device ID character string, the gas usage amount character string, and the position information of the strings, returns the information to an output queue, and switches to idle mode to poll the input queue. The master process gets the final information from the output queue and delivers it to the mobile device. We used a total of 27,120 gasometer images for training, validation, and testing of the three types of deep neural networks. 
22,985 images were used for training and validation, and 4,135 images were used for testing. We randomly split the 22,985 images with an 8:2 ratio for training and validation, respectively, for each training epoch. The 4,135 test images were categorized into 5 types (normal, noise, reflex, scale, and slant). Normal data are clean images, noise means images with a noise signal, reflex means images with light reflection in the gasometer region, scale means images with a small object size due to long-distance capturing, and slant means images which are not horizontally flat. The final character string recognition accuracies for the device ID and gas usage amount on normal data are 0.960 and 0.864, respectively.
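The master-slave queue structure described above can be sketched with in-process queues. This is a structural illustration only: the recognition pipeline is simulated by a stub, and in the actual system the master and slave are separate CPU and GPU processes communicating through cloud queues, not threads:

```python
# Sketch of the master-slave FIFO flow: master enqueues reading
# requests; a slave worker polls the input queue, runs the (here
# simulated) three-network recognition pipeline, and posts results
# to the output queue. All names and values are illustrative.
import queue
import threading

input_q, output_q = queue.Queue(), queue.Queue()

def recognize(image):
    # Stand-in for detection CNN + feature-extraction CNN + bi-LSTM
    # decoding; returns the device ID and usage-amount strings.
    return {"device_id": "123456789012", "usage": "0042"}

def slave():
    while True:
        image = input_q.get()   # poll the FIFO input queue
        if image is None:       # shutdown signal
            break
        output_q.put(recognize(image))

worker = threading.Thread(target=slave)
worker.start()

# Master: enqueue a captured image, then collect the result
# to deliver back to the mobile device.
input_q.put(b"fake-gasometer-jpeg-bytes")
result = output_q.get()
input_q.put(None)
worker.join()
print(result)
```

Decoupling capture from recognition this way is what lets one GPU slave absorb bursty load from many mobile clients.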

Effects of Y Chromosome Microdeletion on the Outcome of in vitro Fertilization (남성 불임 환자에서 Y 염색체 미세 결손이 체외 수정 결과에 미치는 영향)

  • Choi, Noh-Mi;Yang, Kwang-Moon;Kang, Inn-Soo;Seo, Ju-Tae;Song, In-Ok;Park, Chan-Woo;Lee, Hyoung-Song;Lee, Hyun-Joo;Ahn, Ka-Young;Hahn, Ho-Suap;Lee, Hee-Jung;Kim, Na-Young;Yu, Seung-Youn
    • Clinical and Experimental Reproductive Medicine
    • /
    • v.34 no.1
    • /
    • pp.41-48
    • /
    • 2007
  • Objective: To determine whether the presence of a Y-chromosome microdeletion affects the outcome of in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI) programs. Methods: Fourteen couples with a microdeletion in the azoospermic factor (AZF)c region who attempted IVF/ICSI or cryopreserved-thawed embryo transfer cycles were enrolled. All of the men showed severe oligoasthenoteratozoospermia (OATS) or azoospermia. As a control, 12 couples with OATS or azoospermia and a normal Y chromosome were included. Both groups were divided into two subgroups by the sperm source used in ICSI: those who underwent testicular sperm extraction (TESE) and those who used ejaculated sperm. We retrospectively analyzed our database with respect to the IVF outcomes. The outcome measures were the mean number of good-quality embryos, fertilization rates, implantation rates, $\beta$-hCG positive rates, early pregnancy loss, and live birth rates. Results: The mean number of good-quality embryos, implantation rates, $\beta$-hCG positive rates, early pregnancy loss rates, and live birth rates were not significantly different between the Y-chromosome microdeletion and control groups. However, the fertilization rate in the Y-chromosome microdeletion group (61.1%) was significantly lower than that of the control group (79.8%, p=0.003). Also, the subgroup that underwent TESE and had an AZFc microdeletion showed a significantly lower fertilization rate (52.9%) than the subgroup that underwent TESE and had a normal Y chromosome (79.5%, p=0.008). In the subgroups using ejaculated sperm, fertilization rates tended to be lower in couples with a Y-chromosome microdeletion than in couples with a normal Y chromosome (65.5% versus 79.9%, p=0.082), but the difference was not statistically significant. Conclusions: In IVF/ICSI cycles using TESE sperm, the presence of a Y-chromosome microdeletion may adversely affect the fertilization ability of the injected sperm. 
However, in cases where ejaculated sperm were available for ICSI, the IVF outcome was not affected by the presence of a Y-chromosome AZFc microdeletion. A larger-scale prospective study is needed to support our results.
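The fertilization-rate comparisons above (e.g. 61.1% vs 79.8%, p=0.003) are two-proportion tests. A sketch of how such a p-value is obtained, using hypothetical oocyte counts since the abstract does not report the denominators:

```python
# Hypothetical contingency table reproducing the reported rates:
# rows = groups (microdeletion vs control),
# columns = [fertilized, not fertilized] oocytes.
from scipy.stats import chi2_contingency

table = [[110, 70],   # hypothetical: 110/180 = 61.1% fertilized
         [146, 37]]   # hypothetical: 146/183 = 79.8% fertilized

chi2, p, dof, expected = chi2_contingency(table)
print(round(p, 4))  # small p -> rates differ significantly
```

With denominators of this size, the 18-point gap in fertilization rates yields a p-value well below 0.05, consistent in spirit with the reported p=0.003.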

Development and Validation of the Analytical Method for Oxytetracycline in Agricultural Products using QuEChERS and LC-MS/MS (QuEChERS법 및 LC-MS/MS를 이용한 농산물 중 Oxytetracycline의 잔류시험법 개발 및 검증)

  • Cho, Sung Min;Do, Jung-Ah;Lee, Han Sol;Park, Ji-Su;Shin, Hye-Sun;Jang, Dong Eun;Cho, Myong-Shik;Jung, Yong-hyun;Lee, Kangbong
    • Journal of Food Hygiene and Safety
    • /
    • v.34 no.3
    • /
    • pp.227-234
    • /
    • 2019
  • An analytical method was developed for the determination of oxytetracycline in agricultural products using the QuEChERS (Quick, Easy, Cheap, Effective, Rugged and Safe) method by liquid chromatography-tandem mass spectrometry (LC-MS/MS). After the samples were extracted with methanol, the extracts were adjusted to pH 4 with formic acid, and sodium chloride was added to remove water. Dispersive solid phase extraction (d-SPE) cleanup was carried out using $MgSO_4$ (anhydrous magnesium sulfate), PSA (primary secondary amine), $C_{18}$ (octadecyl) and GCB (graphitized carbon black). The analytes were quantified and confirmed with LC-MS/MS using ESI (electrospray ionization) in positive ion MRM (multiple reaction monitoring) mode. The matrix-matched calibration curves were constructed using six levels ($0.001{\sim}0.25{\mu}g/mL$) and the coefficient of determination ($r^2$) was above 0.99. Recovery results at three concentrations (LOQ, $10{\times}LOQ$, and $50{\times}LOQ$, n=5) ranged from 80.0 to 108.2% with relative standard deviations (RSDs) less than 11.4%. For inter-laboratory validation, the average recovery was in the range of 83.5~103.2% and the coefficient of variation (CV) was below 14.1%. All results satisfied the criteria ranges requested in the Codex guidelines (CAC/GL 40-1993, 2003) and the Food Safety Evaluation Department guidelines (2016). The proposed analytical method was accurate, effective and sensitive for oxytetracycline determination in agricultural commodities. This study could be useful for the safety management of oxytetracycline residues in agricultural products.
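The matrix-matched calibration step above amounts to a linear fit over the six levels, checked via $r^2$ and then inverted to quantify unknowns. The peak areas below are hypothetical instrument responses; only the concentration levels follow the stated range:

```python
# Sketch of matrix-matched calibration: fit area = slope*conc + intercept,
# verify r^2 > 0.99, then invert the curve for an unknown sample.
import numpy as np

conc = np.array([0.001, 0.005, 0.01, 0.05, 0.1, 0.25])   # ug/mL, six levels
area = np.array([210, 1050, 2090, 10400, 20900, 52100])  # hypothetical areas

slope, intercept = np.polyfit(conc, area, 1)
pred = slope * conc + intercept
r2 = 1 - np.sum((area - pred) ** 2) / np.sum((area - area.mean()) ** 2)

# Quantify a hypothetical unknown sample from its measured area:
unknown_conc = (31000 - intercept) / slope
print(round(r2, 4), round(unknown_conc, 3))
```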

Development of a Simultaneous Analytical Method for Determination of Insecticide Broflanilide and Its Metabolite Residues in Agricultural Products Using LC-MS/MS (LC-MS/MS를 이용한 농산물 중 살충제 Broflanilide 및 대사물질 동시시험법 개발)

  • Park, Ji-Su;Do, Jung-Ah;Lee, Han Sol;Park, Shin-min;Cho, Sung Min;Kim, Ji-Young;Shin, Hye-Sun;Jang, Dong Eun;Jung, Yong-hyun;Lee, Kangbong
    • Journal of Food Hygiene and Safety
    • /
    • v.34 no.2
    • /
    • pp.124-134
    • /
    • 2019
  • An analytical method was developed for the determination of broflanilide and its metabolites in agricultural products. Sample preparation was conducted using the QuEChERS (Quick, Easy, Cheap, Effective, Rugged and Safe) method and LC-MS/MS (liquid chromatography-tandem mass spectrometry). The analytes were extracted with acetonitrile and cleaned up using d-SPE (dispersive solid phase extraction) sorbents such as anhydrous magnesium sulfate, primary secondary amine (PSA) and octadecyl ($C_{18}$). The limit of detection (LOD) and quantification (LOQ) were 0.004 and 0.01 mg/kg, respectively. The recovery results for broflanilide, DM-8007 and S(PFP-OH)-8007 ranged from 90.7 to 113.7%, 88.2 to 109.7% and 79.8 to 97.8%, respectively, at different concentration levels (LOQ, 10×LOQ, 50×LOQ) with relative standard deviations (RSD) less than 8.8%. The inter-laboratory study recovery results for broflanilide, DM-8007 and S(PFP-OH)-8007 ranged from 86.3 to 109.1%, 87.8 to 109.7% and 78.8 to 102.1%, and RSD values were also below 21%. All values were consistent with the criteria ranges requested in the Codex guidelines (CAC/GL 40-1993, 2003) and the Food and Drug Safety Evaluation guidelines (2016). Therefore, the proposed analytical method is accurate, effective and sensitive for broflanilide determination in agricultural commodities.
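The recovery and precision figures reported in both method validations above come from the same basic computation: percent recovery and relative standard deviation over replicate spiked samples. A sketch with hypothetical measured values at one fortification level:

```python
# Sketch of a recovery/precision check over n=5 spiked replicates.
import statistics

spiked = 0.01  # mg/kg, fortification at the LOQ
measured = [0.0092, 0.0098, 0.0101, 0.0095, 0.0099]  # hypothetical n=5

recoveries = [m / spiked * 100 for m in measured]
mean_rec = statistics.mean(recoveries)
rsd = statistics.stdev(recoveries) / mean_rec * 100

# Typical Codex (CAC/GL 40) acceptance at this level: mean recovery
# roughly within 70-120% and RSD below ~20%.
print(round(mean_rec, 1), round(rsd, 1))
```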

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.1-25
    • /
    • 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes the sentiments consumers or the public feel about an arbitrary object from written texts. Furthermore, Aspect-Based Sentiment Analysis (ABSA) is a fine-grained analysis of the sentiments towards each aspect of an object. Since it has more practical value in terms of business, ABSA is drawing attention from both academic and industrial organizations. When there is a review that says "The restaurant is expensive but the food is really fantastic", for example, general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. Thus, ABSA enables a more specific and effective marketing strategy. In order to perform ABSA, it is necessary to identify the aspect terms or aspect categories included in the text and judge the sentiments towards them. Accordingly, there are four main areas in ABSA: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). ABSA is usually conducted by extracting aspect terms and then performing ATSC to analyze sentiments for the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze sentiments for the given aspect categories. Here, an aspect category is expressed in one or more aspect terms, or indirectly inferred from other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' included in the review. If the review sentence includes 'pasta', 'steak', or 'grilled chicken special', these can all be aspect terms for the aspect category 'food'. As such, an aspect category referred to by one or more specific aspect terms is called an explicit aspect. 
On the other hand, an aspect category like 'price', which has no specific aspect term but can be inferred indirectly from a sentiment word such as 'expensive', is called an implicit aspect. Up to this point, the term 'aspect category' has been used to avoid confusion with 'aspect term'; from now on, we treat 'aspect category' and 'aspect' as the same concept and use 'aspect' for convenience. Note that ATSC analyzes the sentiment towards given aspect terms and therefore deals only with explicit aspects, whereas ACSC handles implicit aspects as well as explicit ones. This study seeks answers to the following questions, ignored in previous studies that applied the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the aspect category tokens than to use only the final output vector of the [CLS] token as the classification vector? Second, is there a performance difference between the QA (Question Answering) and NLI (Natural Language Inference) types of sentence-pair input? Third, does performance depend on the position of the sentence containing the aspect category within the QA- or NLI-type sentence pair? To answer these questions, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, we derived ACSC models that outperform existing studies without expanding the training dataset. We also found that reflecting the output vector of the aspect category token is more effective than using only the output vector of the [CLS] token as the classification vector, that QA-type input generally performs better than NLI-type input, and that with QA-type input the position of the sentence containing the aspect category is irrelevant to performance.
There may be some differences depending on the characteristics of the dataset, but with NLI-type sentence-pair input, placing the sentence containing the aspect category second appears to yield better performance. The methodology used here to design the ACSC models could be applied similarly to related tasks such as ATSC.
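As a concrete illustration, the QA- and NLI-type sentence-pair inputs examined in the study can be sketched as plain string templates. The exact template wording below is an assumption (the study's templates may differ); only the pair structure and the sentence-order option matter:

```python
# Sketch of the two sentence-pair input styles for BERT-based ACSC.
# BERT encodes each pair as: [CLS] sentence_A [SEP] sentence_B [SEP]

def make_qa_pair(review, aspect, aspect_first=False):
    """QA-style pair: a pseudo-question about the aspect plus the review."""
    question = f"what do you think of the {aspect} of it ?"
    return (question, review) if aspect_first else (review, question)

def make_nli_pair(review, aspect, aspect_first=False):
    """NLI-style pair: the review plus a pseudo-hypothesis naming the aspect."""
    return (aspect, review) if aspect_first else (review, aspect)

review = "The restaurant is expensive but the food is really fantastic"
print(make_qa_pair(review, "price"))
print(make_nli_pair(review, "price", aspect_first=True))
```

The `aspect_first` flag corresponds to the study's third question, the order of the sentence containing the aspect category within the pair.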

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the fatal respiratory disease caused by the SARS-CoV-2 coronavirus) were published. This rapid increase in the number of COVID-19 papers places time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. Therefore, in this study, we propose a method of extracting useful information from the text of an extensive literature using the LDA and Word2vec algorithms. Papers matching the search keywords were extracted from the COVID-19 papers, and their detailed topics were identified. We used the CORD-19 dataset on Kaggle, a free academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic and updated weekly. The research method has two main parts. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers that include full text. For this purpose, the number of COVID-19 publications per year was examined through exploratory data analysis in Python, and the ten journals with the most active research were identified. The LDA and Word2vec algorithms were used to derive COVID-19 research topics, and after analyzing related words, their similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from the derived topics: 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collected set of papers, detailed topics were analyzed with the LDA and Word2vec algorithms, and clustering after PCA dimensionality reduction was combined with the t-SNE algorithm to visualize groups of papers with similar themes. A noteworthy result is that topics which did not appear when topic modeling was applied to all COVID-19 papers together did emerge when topic modeling was applied separately to each research topic. For example, topic modeling of the 'vaccine' papers extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and is said to play an important role in the development of therapeutics and vaccines. Likewise, topic extraction from the 'treatment' papers discovered a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells, instead of defending against an attack, attack normal cells. Hidden topics that could not be found across the full corpus were thus uncovered by classifying papers according to keywords and performing topic modeling on each subset. In this study, we proposed a method of extracting topics from a large body of literature with the LDA algorithm and extracting similar words with the Skip-gram variant of Word2vec, which predicts surrounding context words from a central word. Combining the LDA and Word2vec models aims at better performance by capturing both the relationship between documents and LDA topics and the word-level similarities learned by Word2vec. In addition, as a clustering approach, PCA dimensionality reduction followed by the t-SNE technique was presented as an intuitive way to group documents with similar themes into a structured organization. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of COVID-19 papers, we hope this work reduces the precious time and effort of healthcare professionals and policy makers and helps them rapidly gain new insights. It is also expected to serve as basic data for researchers exploring new research directions.
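The filtering and pre-processing step described in the abstract can be sketched in pure Python. The tokenizer and stop-word list below are simplified assumptions; a real pipeline would pass the cleaned token lists on to LDA and Word2vec implementations:

```python
import re
from collections import Counter

# Illustrative stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "with"}

def preprocess(abstract):
    """Lowercase, tokenize, and drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", abstract.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

# Two stand-in abstracts echoing the 'neutralizing antibodies' topic.
abstracts = [
    "Neutralizing antibodies protect cells from infection by the virus",
    "The vaccine induces neutralizing antibodies in the body",
]
corpus = [preprocess(a) for a in abstracts]

# Term frequencies hint at candidate topic words before running LDA.
vocab = Counter(t for doc in corpus for t in doc)
print(vocab.most_common(2))
```

After this step, `corpus` is the bag-of-words input that topic-modeling and word-embedding tools expect, one token list per document.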

