• Title/Summary/Keyword: library performance

Topic Model Augmentation and Extension Method using LDA and BERTopic (LDA와 BERTopic을 이용한 토픽모델링의 증강과 확장 기법 연구)

  • Kim, SeonWook;Yang, Kiduk
    • Journal of the Korean Society for Information Management / v.39 no.3 / pp.99-132 / 2022
  • The purpose of this study is to propose AET (Augmented and Extended Topics), a novel method of synthesizing LDA and BERTopic results, and to apply it experimentally to recently published LIS articles. To this end, 55,442 abstracts from 85 LIS journals in the WoS database, published between January 2001 and October 2021, were analyzed. AET first constructs a Word2Vec-based cosine similarity matrix between LDA and BERTopic results, extracts AT (Augmented Topics) by repeating the matrix reordering and segmentation procedures as long as their semantic relations remain valid, and finally determines ET (Extended Topics) by removing any LDA-related residual subtopics from the matrix and ordering the rest by F1 (BERTopic topic size rank, inverse cosine similarity rank). Comparison with the baseline LDA result shows that AT effectively concretized the original LDA topic model and that ET discovered new meaningful topics that LDA did not. In the qualitative performance evaluation, AT performs better than LDA, while ET shows similar performance except in a few cases.
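
The first AET step described above, a cosine similarity matrix between LDA and BERTopic topics, can be illustrated with a minimal sketch. It assumes that each topic is represented by the mean embedding of its top words, which the abstract does not state; the toy two-dimensional embeddings and the function names are hypothetical stand-ins for a trained Word2Vec model, not the authors' implementation.

```python
import math

def mean_vector(words, embeddings):
    """Average the embedding vectors of the words that have one (assumed topic representation)."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        raise ValueError("no known words in topic")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_matrix(lda_topics, bertopic_topics, embeddings):
    """Rows are LDA topics, columns are BERTopic topics."""
    lda_vecs = [mean_vector(t, embeddings) for t in lda_topics]
    bt_vecs = [mean_vector(t, embeddings) for t in bertopic_topics]
    return [[cosine(a, b) for b in bt_vecs] for a in lda_vecs]

# Hypothetical toy embeddings standing in for Word2Vec vectors.
TOY_EMBEDDINGS = {
    "library": [1.0, 0.0], "catalog": [0.9, 0.1],
    "network": [0.0, 1.0], "citation": [0.1, 0.9],
}
lda_topics = [["library", "catalog"], ["network", "citation"]]
bertopic_topics = [["catalog"], ["citation", "network"], ["library"]]

matrix = similarity_matrix(lda_topics, bertopic_topics, TOY_EMBEDDINGS)
for row in matrix:
    print([round(x, 3) for x in row])
```

The reordering, segmentation, and F1-based ranking steps are not sketched here because the abstract does not specify them in enough detail.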

A Machine Learning-Based Encryption Behavior Cognitive Technique for Ransomware Detection (랜섬웨어 탐지를 위한 머신러닝 기반 암호화 행위 감지 기법)

  • Yoon-Cheol Hwang
    • Journal of Industrial Convergence / v.21 no.12 / pp.55-62 / 2023
  • Recent ransomware attacks employ various techniques and pathways, which makes early detection and defense difficult, and the scale of damage continues to grow. This paper introduces a machine learning-based approach to ransomware detection that focuses on file encryption and encryption patterns, which are core functions of ransomware. Ransomware is identified by analyzing encryption behavior and encryption patterns, making it possible to detect both specific known variants and new types of ransomware. The proposed technique extracts encryption and encryption-pattern characteristics and uses them to train machine learning classifiers. The final outcome is an ensemble of the results from two classifiers, which determines the presence or absence of ransomware and improves accuracy. The technique is implemented in Python using the NumPy, pandas, and scikit-learn libraries. Evaluation shows an average accuracy of 94%, precision of 95%, recall of 93%, and an F1 score of 95%. These results support the feasibility of ransomware detection through encryption behavior analysis, and further research is encouraged to extend the technique toward proactive ransomware detection.
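
For readers unfamiliar with the evaluation indicators named above, the following sketch shows the standard definitions of accuracy, precision, recall, and F1 for a binary detector. The confusion-matrix counts are hypothetical and are not taken from the paper; the function name is likewise illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts: "positive" means a sample flagged as ransomware.
metrics = classification_metrics(tp=90, fp=10, fn=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```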

Exploring ESG Activities Using Text Analysis of ESG Reports -A Case of Chinese Listed Manufacturing Companies- (ESG 보고서의 텍스트 분석을 이용한 ESG 활동 탐색 -중국 상장 제조 기업을 대상으로-)

  • Wung Chul Jin;Seung Ik Baek;Yu Feng Sun;Xiang Dan Jin
    • Journal of Service Research and Studies / v.14 no.2 / pp.18-36 / 2024
  • As interest in ESG has increased, it has become easy to find empirical studies showing that a company's ESG activities have a positive impact on its performance. However, research on which ESG activities companies should actually engage in is relatively lacking. Accordingly, this study systematically classifies companies' ESG activities and seeks to provide insight to companies planning new ESG activities. It analyzes how Chinese manufacturing companies perform ESG activities based on their dynamic capabilities in the global economy and how their activities differ. The data consist of the ESG annual reports of 151 Chinese manufacturing companies listed on the Shanghai and Shenzhen Stock Exchanges and the ESG indicators of China Securities Index Company (CSI). The study addresses three research questions. The first is whether ESG activities differ between companies with high ESG scores (TOP-25) and companies with low ESG scores (BOT-25). The second is whether ESG activities changed over a 10-year period (2010-2019), focusing only on companies with high ESG scores. The results showed a significant difference in ESG activities between high and low ESG scorers, whereas tracking the year-to-year activities of the TOP-25 companies did not reveal any change. For the third question, social network analysis was conducted on the E/S/G keywords. Using a co-occurrence matrix, we visualized companies' ESG activities in a four-quadrant graph and, based on this, suggested directions for ESG activities.

A Narrative Literature Review on the Neural Substrates of Cognitive Reserve: Focusing on the Resting-state Functional Magnetic Resonance Imaging Studies (인지예비능의 신경적 기질에 대한 서술적 문헌고찰 연구 : 휴지기 기능적 자기공명영상 연구를 중심으로)

  • Hyeonsang Shin;Woohyun Seong;Bo-in Kwon;Yeonju Woo;Joo-Hee Kim;Dong Hyuk Lee
    • Journal of Physiology & Pathology in Korean Medicine / v.38 no.1 / pp.1-9 / 2024
  • Cognitive reserve (CR) is a concept that can explain the discrepancies between the pathologic burden of a disease and its clinical manifestations. It refers to the individual susceptibility to age-related brain changes and pathologies related to Alzheimer's disease, and is thus recognized as a factor affecting the trajectory of the disease. The purpose of this study was to explore the current state of clinical studies on the neural substrates of CR in Alzheimer's disease using functional magnetic resonance imaging (fMRI). We searched PubMed, the Cochrane Library, RISS, KISS, and ScienceON on August 14, 2023, for clinical studies on CR using fMRI. After the online search, studies were selected manually according to the inclusion criteria. Finally, we analyzed the characteristics of the selected articles and reviewed the neural substrates of CR. A total of thirty-four studies were included. As surrogate markers of CR, not only education and occupational complexity but also composite scores and questionnaire-based methods, which cover various areas of life, were mainly used. The most frequently used methods in resting-state fMRI were independent component analysis, seed-based analysis, and graph theory analysis. The analysis showed that neuroimaging techniques can capture the neural substrates associated with cognitive reserve. Moreover, functional connectivity of brain regions centered on the prefrontal and parietal cortex and of networks such as the default mode network showed a significant correlation with CR, which indicated a significant association with cognitive performance. CR may induce differential effects according to disease status. We hope that this perspective on cognitive reserve will be helpful for future clinical research on the mechanisms of traditional Korean medicine for Alzheimer's disease.

A Study on Analysis of Research Trends and Intellectual Structure in the Overseas Cataloging Research (해외 목록학 연구동향 및 지적구조 분석)

  • Ji Won Lee;Sung Sook Lee
    • Journal of the Korean Society for Information Management / v.41 no.1 / pp.367-387 / 2024
  • This study aims to identify recent trends and the intellectual structure of international research in the field of cataloging, which is undergoing major change due to the enactment of new standards and rules and anticipated future developments. For this purpose, we collected 680 articles published in the 14 years since 2010 and analyzed 1,942 author keywords extracted from them after preprocessing. The main findings are as follows. First, overseas cataloging research has grown notably since 2017. Second, the most frequent research topics were cataloging, metadata, RDA, university libraries, authority control, linked data, FRBR, catalog, LCSH, libraries, and online cataloging. Third, the research themes were divided into two clusters, one related to the traditional aspects of library cataloging and the other related to the more recently discussed topics of authority control, cooperative cataloging, RDA, and linked data, which were further subdivided into 14 subclusters. Fourth, we examined the growth index and standard performance index of the 14 keyword clusters and found that all but one cluster showed growth. This study is significant in that it can serve as a basis for predicting the future development of cataloging for Korean academia and practice, and for related education.

Tea Leaf Disease Classification Using Artificial Intelligence (AI) Models (인공지능(AI) 모델을 사용한 차나무 잎의 병해 분류)

  • K.P.S. Kumaratenna;Young-Yeol Cho
    • Journal of Bio-Environment Control / v.33 no.1 / pp.1-11 / 2024
  • In this study, five artificial intelligence (AI) models, Inception v3, SqueezeNet (local), VGG-16, Painters, and DeepLoc, were used to classify tea leaf diseases. Eight image categories were used: healthy, algal leaf spot, anthracnose, bird's eye spot, brown blight, gray blight, red leaf spot, and white spot. The software used was Orange 3, which functions as a Python library for visual programming and operates through an interface that generates workflows to visually manipulate and analyze data. The precision of each AI model was recorded to select the best model. All models were trained using the Adam solver, the rectified linear unit activation function, 100 neurons in the hidden layers, a maximum of 200 iterations in the neural network, and a regularization value of 0.0001. The functionality of Orange 3 can be extended by installing add-ons; in this study, the image analytics add-on, which is required for image analysis, was added. For model training, the import images, image embedding, neural network, test and score, and confusion matrix widgets were used, whereas the import images, image embedding, predictions, and image viewer widgets were used for prediction. The precisions of the neural networks for the five AI models (Inception v3, SqueezeNet (local), VGG-16, Painters, and DeepLoc) were 0.807, 0.901, 0.780, 0.800, and 0.771, respectively. Finally, the SqueezeNet (local) model was selected as the optimal AI model for detecting tea diseases from tea leaf images owing to its high precision and good performance across the confusion matrix.

Field Studies of In-situ Aerobic Cometabolism of Chlorinated Aliphatic Hydrocarbons

  • Semprini, Lewis
    • Proceedings of the Korean Society of Soil and Groundwater Environment Conference / 2004.04a / pp.3-4 / 2004
  • Results will be presented from two field studies that evaluated the in-situ treatment of chlorinated aliphatic hydrocarbons (CAHs) using aerobic cometabolism. In the first study, a cometabolic air sparging (CAS) demonstration was conducted at McClellan Air Force Base (AFB), California, to treat CAHs in groundwater using propane as the cometabolic substrate. A propane-biostimulated zone was sparged with a propane/air mixture and a control zone was sparged with air alone. Propane-utilizers were effectively stimulated in the saturated zone with repeated intermittent sparging of propane and air. Propane delivery, however, was not uniform, with propane mainly observed in down-gradient observation wells. Trichloroethene (TCE), cis-1,2-dichloroethene (c-DCE), and dissolved oxygen (DO) concentration levels decreased in proportion with propane usage, with c-DCE decreasing more rapidly than TCE. The more rapid removal of c-DCE indicated biotransformation and not just physical removal by stripping. Propane utilization rates and rates of CAH removal slowed after three to four months of repeated propane additions, which coincided with the depletion of nitrogen (as nitrate). Ammonia was then added to the propane/air mixture as a nitrogen source. After a six-month period between propane additions, rapid propane utilization was observed. Nitrate was present due to groundwater flow into the treatment zone and/or the oxidation of the previously injected ammonia. In the propane-stimulated zone, c-DCE concentrations decreased below the detection limit (1 µg/L), and TCE concentrations ranged from less than 5 µg/L to 30 µg/L, representing removals of 90 to 97%. In the air-sparged control zone, TCE was removed at only the two monitoring locations nearest the sparge well, to concentrations of 15 µg/L and 60 µg/L.
The responses indicate that stripping as well as biological treatment were responsible for the removal of contaminants in the biostimulated zone, with biostimulation enhancing removals to lower contaminant levels. As part of that study, bacterial population shifts that occurred in the groundwater during CAS and the air sparging control were evaluated by length heterogeneity polymerase chain reaction (LH-PCR) fragment analysis. The results showed that an organism(s) that had a fragment size of 385 base pairs (385 bp) was positively correlated with propane removal rates. The 385 bp fragment made up as much as 83% of the total fragments in the analysis when propane removal rates peaked. A 16S rRNA clone library made from the bacteria sampled in propane-sparged groundwater included clones of a TM7 division bacterium that had a 385 bp LH-PCR fragment; no other bacterial species with this fragment size were detected. Both propane removal rates and the 385 bp LH-PCR fragment decreased as nitrate levels in the groundwater decreased. In the second study, the potential for bioaugmentation of a butane culture was evaluated in a series of field tests conducted at the Moffett Field Air Station in California. A butane-utilizing mixed culture that was effective in transforming 1,1-dichloroethene (1,1-DCE), 1,1,1-trichloroethane (1,1,1-TCA), and 1,1-dichloroethane (1,1-DCA) was added to the saturated zone at the test site. This mixture of contaminants was evaluated because they are often present together as the result of 1,1,1-TCA contamination and the abiotic and biotic transformation of 1,1,1-TCA to 1,1-DCE and 1,1-DCA. Model simulations were performed prior to the initiation of the field study. The simulations were performed with a transport code that included processes for in-situ cometabolism, including microbial growth and decay, substrate and oxygen utilization, and the cometabolism of dual contaminants (1,1-DCE and 1,1,1-TCA).
Based on the results of detailed kinetic studies with the culture, cometabolic transformation kinetics were incorporated that included butane mixed inhibition of 1,1-DCE and 1,1,1-TCA transformation and competitive inhibition of butane utilization by 1,1-DCE and 1,1,1-TCA. A transformation capacity term was also included in the model formulation, which results in cell loss due to contaminant transformation. Parameters for the model simulations were determined independently in kinetic studies with the butane-utilizing culture and through batch microcosm tests with groundwater and aquifer solids from the field test zone with the butane-utilizing culture added. In microcosm tests, the model simulated well the repetitive utilization of butane and cometabolism of 1,1,1-TCA and 1,1-DCE, as well as the transformation of 1,1-DCE as it was repeatedly transformed at increased aqueous concentrations. Model simulations were then performed under the transport conditions of the field test to explore the effects of the bioaugmentation dose and the response of the system to the biostimulation with alternating pulses of dissolved butane and oxygen in the presence of 1,1-DCE (50 µg/L) and 1,1,1-TCA (250 µg/L). A uniform aquifer bioaugmentation dose of 0.5 mg/L of cells resulted in complete utilization of the butane 2 meters downgradient of the injection well within 200 hours of bioaugmentation and butane addition. 1,1-DCE was much more rapidly transformed than 1,1,1-TCA, and efficient 1,1,1-TCA removal occurred only after 1,1-DCE and butane were decreased in concentration. The simulations demonstrated the strong inhibition of both 1,1-DCE and butane on 1,1,1-TCA transformation, and the more rapid 1,1-DCE transformation kinetics. Results of the field demonstration indicated that bioaugmentation was successfully implemented; however, it was difficult to maintain effective treatment for long periods of time (50 days or more).
The demonstration showed that the bioaugmented experimental leg effectively transformed 1,1-DCE and 1,1-DCA, and was somewhat effective in transforming 1,1,1-TCA. The indigenous experimental leg, treated in the same way as the bioaugmented leg, was much less effective in treating the contaminant mixture. The best operating performance was achieved in the bioaugmented leg, with over 90%, 80%, and 60% removal of 1,1-DCE, 1,1-DCA, and 1,1,1-TCA, respectively. Molecular methods were used to track the bioaugmented culture in the test zone, and real-time PCR analysis was used to enumerate it. The results show that higher numbers of the bioaugmented microorganisms were present in the treatment zone groundwater when the contaminants were being effectively transformed. A decrease in these numbers was associated with a reduction in treatment performance. The results of the field tests indicated that although bioaugmentation can be successfully implemented, competition for the growth substrate (butane) by the indigenous microorganisms likely led to the decrease in long-term performance.

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia Pacific Journal of Information Systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to guide users and facilitate the filtering process. A set of keywords is thus often considered a condensed version of the whole document and plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, implementation is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document.
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, by contrast, the aim is to extract keywords according to their relevance in the text, without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as keywords for that document; as a result, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to the experimental results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Conversely, 10% to 36% of the keywords assigned by the authors do not appear in the article and therefore cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of keywords assigned by the authors are not included in the full text. This is why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to the Web-based community service and academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to that of Extractor, a representative system of the keyword extraction approach developed by Turney. As the number of electronic documents increases, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
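
The five-step assignment process listed in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: each candidate keyword set is assumed to be stored as a dictionary of term weights, tokenization is a simple lowercase word split, and the names `KEYWORD_SETS`, `assign_keywords`, and `top_n` are hypothetical.

```python
import math
import re
from collections import Counter

def vector_length(weights):
    """Euclidean length of a sparse term-weight vector (steps 1 and 3)."""
    return math.sqrt(sum(w * w for w in weights.values()))

def tokenize(text):
    """Step 2: a deliberately simple preprocessing and parsing stand-in."""
    return re.findall(r"[a-z]+", text.lower())

def cosine_similarity(a, b):
    """Step 4: cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    len_a, len_b = vector_length(a), vector_length(b)
    return dot / (len_a * len_b) if len_a and len_b else 0.0

def assign_keywords(document, keyword_sets, top_n=2):
    """Step 5: return the labels of the most similar keyword sets."""
    doc_vector = Counter(tokenize(document))  # term frequencies
    scored = [(cosine_similarity(weights, doc_vector), label)
              for label, weights in keyword_sets.items()]
    scored.sort(reverse=True)
    return [label for score, label in scored[:top_n] if score > 0]

# Hypothetical keyword sets with illustrative term weights.
KEYWORD_SETS = {
    "logistics": {"port": 2.0, "shipping": 1.5, "cargo": 1.0},
    "fashion": {"dress": 2.0, "style": 1.0},
    "health": {"diet": 1.5, "exercise": 1.5},
}
doc = "Port congestion delays shipping; cargo volume at the port keeps rising."
print(assign_keywords(doc, KEYWORD_SETS))
```

Because the keyword sets carry their own vocabulary, a label such as "logistics" can be assigned even when that word never appears in the document, which is the advantage over keyword extraction that the abstract describes.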

Comparison of Flavonoid Characteristics between Blueberry (Vaccinium corymbosum) and Black Raspberry (Rubus coreanus) Cultivated in Korea using UPLC-DAD-QTOF/MS (UPLC-DAD-QTOF/MS를 이용한 국내 재배 블루베리(Vaccinium corymbosum)와 복분자(Rubus coreanus)의 플라보노이드 특성 비교)

  • Jin, Young;Kim, Heon-Woong;Lee, Min-Ki;Lee, Seon-Hye;Jang, Hwan-Hee;Hwang, Yu-Jin;Choe, Jeong-Sook;Lee, Sung-Hyun;Cha, Youn-Soo;Kim, Jung-Bon
    • Korean Journal of Environmental Agriculture / v.36 no.2 / pp.87-96 / 2017
  • BACKGROUND: The objective of this study was to identify and compare the main phenolic compounds (anthocyanins, flavonoids, phenolic acids) in blueberry and black raspberry cultivated in Korea using ultra-performance liquid chromatography diode array detection-quadrupole time-of-flight mass spectrometry (UPLC-DAD-QTOF/MS). METHODS AND RESULTS: Twenty-nine flavonoids were identified by comparison of ultraviolet and mass spectra with data in a chemical library and published data. Blueberry contained flavonols including kaempferol, quercetin, isorhamnetin, myricetin, and syringetin aglycones. Isorhamnetin 3-O-robinobioside, kaempferol 3-O-(6''-O-acetyl)glucoside, quercetin, quercetin 3-O-arabinofuranoside (avicularin), quercetin 3-O-(6''-O-malonyl)glucoside, and quercetin 3-O-robinobioside were detected for the first time in blueberry. The flavonoids in raspberry consisted of quercetin aglycone and its glycosides. The mean total flavonoid content in blueberry [143.0 mg/100 g dry weight (DW)] was 1.5 times that in raspberry (95.4 mg/100 g DW). The most abundant flavonoid in blueberry was quercetin 3-O-galactoside (hyperoside, up to 76.1 mg/100 g DW) and that in raspberry was quercetin 3-O-glucuronide (miquelianin, up to 55.5 mg/100 g DW). Miquelianin was not detected in blueberry. CONCLUSION: Flavonol glycosides were the main flavonoids in blueberry and black raspberry cultivated in Korea. The composition and contents of flavonoids differed between blueberry and black raspberry, and may be affected by the cultivar and cultivation conditions.

Data Mining Approaches for DDoS Attack Detection (분산 서비스거부 공격 탐지를 위한 데이터 마이닝 기법)

  • Kim, Mi-Hui;Na, Hyun-Jung;Chae, Ki-Joon;Bang, Hyo-Chan;Na, Jung-Chan
    • Journal of KIISE:Information Networking / v.32 no.3 / pp.279-290 / 2005
  • As the damage caused by DDoS attacks grows more serious, rapid detection and proper response mechanisms are urgently needed. However, existing security mechanisms do not defend effectively against these attacks, or their defense capability is limited to specific DDoS attacks. In this paper, we propose a DDoS attack detection architecture based on data mining technology that can classify the latest types of DDoS attack and can detect modifications of existing attacks as well as novel attacks. The architecture consists of a Misuse Detection Module, which models and classifies existing attacks, and an Anomaly Detection Module, which models and detects novel attacks. It uses models generated off-line to detect DDoS attacks in real-time traffic. We gathered NetFlow data generated at an access router of our network to model and test real network traffic. NetFlow provides useful flow-based statistical information without heavy preprocessing. We also ran well-known DDoS attack tools to gather attack traffic. Our experimental results show that the approach performs very well against existing attacks and can potentially detect novel attacks.