• Title/Summary/Keyword: Co-occurrence feature

GCNXSS: An Attack Detection Approach for Cross-Site Scripting Based on Graph Convolutional Networks

  • Pan, Hongyu;Fang, Yong;Huang, Cheng;Guo, Wenbo;Wan, Xuelin
    • KSII Transactions on Internet and Information Systems (TIIS), v.16 no.12, pp.4008-4023, 2022
  • Since machine learning was introduced into cross-site scripting (XSS) attack detection, many researchers have conducted related studies and achieved significant results, such as saving time and labor by removing the need to maintain the rule database that traditional XSS detection methods require. However, these approaches still suffer from problems such as poor generalization ability and high false negative rates (FNR) and false positive rates (FPR). Meanwhile, the automatic clustering property of graph convolutional networks (GCN) has attracted the attention of researchers. In the field of natural language processing (NLP), the results of graph embedding based on GCN are automatically clustered in space without any training, which means that text data can be classified by the GCN-based embedding process alone. Other methods, by contrast, require training with labeled data after embedding to complete data classification. Using the GCN auto-clustering feature together with labeled data, this research proposes an approach to detecting XSS attacks, called GCNXSS, that mines the dependencies between the units that constitute an XSS payload. First, GCNXSS transforms a URL into a homogeneous word graph based on word co-occurrence relationships. Then, GCNXSS feeds the graph into the GCN model for graph embedding and obtains the classification results. Experimental results show that GCNXSS achieved accuracy, precision, recall, F1-score, FNR, and FPR of 99.97%, 99.75%, 99.97%, 99.86%, 0.03%, and 0.03%, respectively, with a prediction time of 0.0461 ms. Compared with existing methods, GCNXSS has a lower FNR and FPR and stronger generalization ability.
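
The graph-construction step in this abstract (URL to word co-occurrence graph) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the tokenizer `tokenize_url`, the window size of 3, and the example URL are all hypothetical, and the GCN embedding step is not shown.

```python
import re
from collections import Counter
from urllib.parse import unquote


def tokenize_url(url):
    """Decode a URL and split it into lowercase word and symbol tokens.

    Hypothetical tokenizer: the abstract does not say how URLs are split.
    """
    return re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", unquote(url).lower())


def cooccurrence_edges(tokens, window=3):
    """Count unordered pairs of distinct tokens that fall inside the same
    sliding window of `window` consecutive tokens (window size is assumed)."""
    edges = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                edges[tuple(sorted((left, right)))] += 1
    return edges


# Made-up example URL carrying an encoded script tag.
tokens = tokenize_url("http://example.com/search?q=%3Cscript%3Ealert(1)%3C/script%3E")
graph = cooccurrence_edges(tokens)
for (a, b), weight in graph.most_common(5):
    print(a, b, weight)
```

Each distinct token becomes a node and each counted pair becomes a weighted edge; a GCN would then operate on the adjacency matrix built from these edges.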

Improving Field Crop Classification Accuracy Using GLCM and SVM with UAV-Acquired Images

  • Seung-Hwan Go;Jong-Hwa Park
    • Korean Journal of Remote Sensing, v.40 no.1, pp.93-101, 2024
  • Accurate field crop classification is essential for various agricultural applications, yet existing methods face challenges due to diverse crop types and complex field conditions. This study aimed to address these issues by combining support vector machine (SVM) models with multi-seasonal unmanned aerial vehicle (UAV) images, texture information extracted from Gray Level Co-occurrence Matrix (GLCM), and RGB spectral data. Twelve high-resolution UAV image captures spanned March-October 2021, while field surveys on three dates provided ground truth data. We focused on data from August (-A), September (-S), and October (-O) images and trained four support vector classifier (SVC) models (SVC-A, SVC-S, SVC-O, SVC-AS) using visual bands and eight GLCM features. Farm maps provided by the Ministry of Agriculture, Food and Rural Affairs proved efficient for open-field crop identification and served as a reference for accuracy comparison. Our analysis showcased the significant impact of hyperparameter tuning (C and gamma) on SVM model performance, requiring careful optimization for each scenario. Importantly, we identified models exhibiting distinct high-accuracy zones, with SVC-O trained on October data achieving the highest overall and individual crop classification accuracy. This success likely stems from its ability to capture distinct texture information from mature crops. Incorporating GLCM features proved highly effective for all models, significantly boosting classification accuracy. Among these features, homogeneity, entropy, and correlation consistently demonstrated the most impactful contribution. However, balancing accuracy with computational efficiency and feature selection remains crucial for practical application. Performance analysis revealed that SVC-O achieved exceptional results in overall and individual crop classification, while soybeans and rice were consistently classified well by all models.
Challenges were encountered with cabbage due to its early growth stage and low field cover density. The study demonstrates the potential of utilizing farm maps and GLCM features in conjunction with SVM models for accurate field crop classification. Careful parameter tuning and model selection based on specific scenarios are key for optimizing performance in real-world applications.
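
The three GLCM features singled out in this abstract (homogeneity, entropy, correlation) can be illustrated with a small sketch. It assumes a single horizontal pixel offset, a symmetric normalized matrix, and a made-up 4x4 image with four gray levels; the study's actual offsets, quantization, and the remaining five features are not given in the abstract, and the SVM step (with its C and gamma tuning) is not shown.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one
    non-negative pixel offset (dx, dy). `image` is a list of rows of
    integer gray levels in range(levels)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows - dy):
        for c in range(cols - dx):
            a, b = image[r][c], image[r + dy][c + dx]
            counts[a][b] += 1.0
            counts[b][a] += 1.0  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Return (homogeneity, entropy, correlation) of a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    homogeneity = sum(v / (1.0 + (i - j) ** 2) for i, j, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    mean_i = sum(i * v for i, _, v in cells)
    mean_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mean_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mean_j) ** 2 * v for _, j, v in cells)
    if var_i == 0 or var_j == 0:
        correlation = 1.0  # assumed convention for a constant image
    else:
        cov = sum((i - mean_i) * (j - mean_j) * v for i, j, v in cells)
        correlation = cov / math.sqrt(var_i * var_j)
    return homogeneity, entropy, correlation


# Made-up 4x4 patch with gray levels 0..3.
patch = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
print(texture_features(glcm(patch, levels=4)))
```

In a pipeline like the one described, such per-patch features would be concatenated with the RGB band values to form the input vector for the classifier.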

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems, v.25 no.4, pp.105-122, 2019
  • Dimensionality reduction is one of the methods used to handle big data in text mining. When reducing dimensionality, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data require more computation, which can lead to high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed, ranging from merely reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. In addition, the representation and selection of text features affect the performance of the classifier in sentence classification, which is one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods use various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words and can capture semantic and syntactic information from data, are also used. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations. Once the feature selection algorithm identifies words that are not important, we assume that words similar to those words also have no impact on sentence classification. This study proposes two ways to achieve more accurate classification, both of which perform selective word elimination under specific rules and construct word embeddings based on Word2Vec.
To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words that have comparatively low information gain values from the raw text and form the word embedding. Second, we additionally select words that are similar to the words with low information gain values and form the word embedding. Finally, the filtered text and word embeddings are applied to two deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle on Amazon.com, IMDB, and Yelp as datasets and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes exceeded 70% were classified as helpful reviews. Yelp shows only the number of helpful votes. We randomly sampled 100,000 reviews that received more than five helpful votes from among 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters from the text, was applied to each dataset. To evaluate the proposed methods, we compared their performance with that of Word2Vec and GloVe word embeddings that used all the words. We showed that one of the proposed methods performs better than the embeddings with all the words. Removing unimportant words improves performance; however, removing too many words lowers it. Future research should consider diverse preprocessing methods and an in-depth analysis of word co-occurrence for measuring similarity among words. Also, we applied the proposed method only with Word2Vec. Other embedding methods such as GloVe, fastText, and ELMo can be applied with the proposed methods, making it possible to identify effective combinations of word embedding methods and elimination methods.
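
The two selection steps in this abstract (drop low information gain words, then also drop words whose vectors are close to them) can be sketched as follows. The thresholds, the toy documents, and the two-dimensional vectors are made-up assumptions; the study uses Word2Vec embeddings and its own cut-offs, which the abstract does not specify.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)


def information_gain(word, docs, labels):
    """Information gain of the presence/absence of `word`.
    `docs` is a list of token sets aligned with `labels`."""
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    conditional = sum(
        len(part) / len(labels) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - conditional


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def words_to_remove(docs, labels, vectors, ig_threshold, sim_threshold):
    """Return (low-IG words, remaining words similar to a low-IG word).
    Both thresholds are hypothetical parameters."""
    vocab = set().union(*docs)
    low = {w for w in vocab if information_gain(w, docs, labels) < ig_threshold}
    similar = {
        w
        for w in vocab - low
        if any(cosine(vectors[w], vectors[s]) >= sim_threshold for s in low)
    }
    return low, similar


# Toy data: four labeled documents and hand-made 2-D "embeddings".
docs = [{"good", "the", "an"}, {"bad", "the"}, {"good", "a"}, {"bad", "a"}]
labels = [1, 0, 1, 0]
vectors = {"the": [1, 0], "a": [0.9, 0.1], "an": [0.95, 0.05], "good": [0, 1], "bad": [0, -1]}
print(words_to_remove(docs, labels, vectors, ig_threshold=0.2, sim_threshold=0.9))
```

The first returned set corresponds to the first proposed method and the union of both sets to the second; the filtered text would then be used to train the embedding and the classifiers.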

Temporal Variations of Ore Mineralogy and Sulfur Isotope Data from the Boguk Cobalt Mine, Korea: Implication for Genesis and Geochemistry of Co-bearing Hydrothermal System (보국 코발트 광상의 산출 광물종 및 황동위원소 조성의 시간적 변화: 함코발트 열수계의 성인과 지화학적 특성 고찰)

  • Yun, Seong-Taek;Youm, Seung-Jun
    • Economic and Environmental Geology, v.30 no.4, pp.289-301, 1997
  • The Boguk cobalt mine is located within the Cretaceous Gyeongsang Sedimentary Basin. Major ore minerals, including cobalt-bearing minerals (loellingite, cobaltite, and glaucodot) and Co-bearing arsenopyrite, occur together with base-metal sulfides (pyrrhotite, chalcopyrite, pyrite, sphalerite, etc.) and minor amounts of oxides (magnetite and hematite) within fracture-filling quartz $\pm$ actinolite $\pm$ carbonate veins. These veins are developed within an epicrustal micrographic granite stock which intrudes the Konchonri Formation (mainly shale). The radiometric date of the granite (85.98 Ma) indicates a Late Cretaceous age for granite emplacement and associated cobalt mineralization. The vein mineralogy is relatively complex and changes with time: cobalt-bearing minerals with actinolite, carbonates, and quartz gangues (stages I and II) $\rightarrow$ base-metal sulfides, gold, and Fe oxides with quartz gangues (stage III) $\rightarrow$ barren carbonates (stages IV and V). The common occurrence of high-temperature minerals (cobalt-bearing minerals, molybdenite and actinolite) with low-temperature minerals (base-metal sulfides, gold and carbonates) in veins indicates a xenothermal condition of the hydrothermal mineralization. High enrichment of Co in the granite (avg. 50.90 ppm) indicates the magmatic hydrothermal derivation of cobalt from this cooling granite stock, whereas higher amounts of Cu and Zn in the Konchonri Formation shale suggest that they were derived largely from the shale.
The decrease in temperature of hydrothermal fluids with a concomitant increase in fugacity of oxygen with time (for cobalt deposition in stages I and II, $T = 560^{\circ}\mathrm{C}$ to $390^{\circ}\mathrm{C}$ and $\log f_{O_2}$ from $> -32.7$ to $-30.7$ atm at $350^{\circ}\mathrm{C}$; for base-metal sulfide deposition in stage III, $T = 380^{\circ}\mathrm{C}$ to $345^{\circ}\mathrm{C}$ and $\log f_{O_2} \geq -30.7$ atm at $350^{\circ}\mathrm{C}$) indicates a transition of the hydrothermal system from a magmatic-water domination toward a less-evolved meteoric-water domination. Sulfur isotope data of stage II sulfide minerals indicate that early, Co-bearing hydrothermal fluids derived originally from an igneous source with a $\delta^{34}S_{\Sigma S}$ value near 3 to 5‰. The remarkable increase in $\delta^{34}S_{H_2S}$ values of hydrothermal fluids with time, from cobalt deposition in stage II (3-5‰) to base-metal sulfide deposition in stage III (up to about 20‰), also indicates the change of the hydrothermal system toward meteoric water domination, which resulted in the leaching-out and concentration of isotopically heavier sulfur (sedimentary sulfates), base metals (Cu, Zn, etc.) and gold from surrounding sedimentary rocks during the large-scale meteoric water circulation. We suggest that, without the later, extensive circulation of meteoric water through the surrounding sedimentary rocks, the Boguk cobalt deposits would be simple veins containing only actinolite + quartz + cobalt-bearing minerals. Furthermore, the formation of the meteoric water circulation after the culmination of a magmatic hydrothermal system resulted in the common occurrence of high-temperature minerals with later, lower-temperature minerals, resulting in a xenothermal feature of the mineralization.

A Study on High-Resolution Seasonal Variations of Major Ionic Species in Recent Snow Near the Antarctic Jang Bogo Station (남극 장보고과학기지 인근에서 채취한 눈시료 내의 주요 이온성분들의 고해상도 계절변동성 연구)

  • Kwak, Hoje;Kang, Jung-Ho;Hong, Sang-Bum;Lee, Jeonghoon;Chang, Chaewon;Hur, Soon Do;Hong, Sungmin
    • Ocean and Polar Research, v.37 no.2, pp.127-140, 2015
  • A continuous series of 60 snow samples was collected at a 2.5-cm interval from a 1.5-m snow pit at a site on the Styx Glacier Plateau in Victoria Land, Antarctica, during the 2011/2012 austral summer season. Various chemical components ($\delta D$, $\delta^{18}O$, $Na^+$, $K^+$, $Mg^{2+}$, $Ca^{2+}$, $Cl^-$, $SO_4^{2-}$, $NO_3^-$, $F^-$, $CH_3SO_3^-$, $CH_3CO_2^-$ and $HCO_2^-$) were determined to understand the highly resolved seasonal variations of these species in the coastal atmosphere near the Antarctic Jang Bogo station. Based on vertical profiles of $\delta^{18}O$, $NO_3^-$ and MSA, which showed prominent seasonal changes in concentrations, the snow samples were dated to cover the time period from 2009 austral winter to 2012 austral summer with a mean accumulation rate of $226\ \mathrm{kg\ H_2O\ m^{-2}\ yr^{-1}}$. Our snow profiles show pronounced seasonal variations for all the measured chemical species with a different pattern between different species. The distinctive feature of the occurrence patterns of the seasonal variations is clearly linked to changes in the relative strength of contributions from various natural sources (sea salt spray, volcanoes, crust-derived dust, and marine biogenic activities) during different short-term periods. The results allow us to understand the transport pathways and input mechanisms for each species and provide valuable information that will be useful for investigating long-term (decades to century scale periods) climate and environmental changes that can be deduced from an ice core to be retrieved from the Styx Glacier Plateau in the near future.

Performance Evaluations for Leaf Classification Using Combined Features of Shape and Texture (형태와 텍스쳐 특징을 조합한 나뭇잎 분류 시스템의 성능 평가)

  • Kim, Seon-Jong;Kim, Dong-Pil
    • Journal of Intelligence and Information Systems, v.18 no.3, pp.1-12, 2012
  • Many trees grow along roadsides and in parks and landscaped facilities. Although we see trees everywhere, it is difficult to classify a tree and to obtain information about it, such as its name, species, and surroundings. To find this information, one has to consult illustrated plant books or search the internet. The important components of a tree include its leaves, flowers, and bark. Generally, a tree can be classified by its leaves. A leaf has inherited features such as its shape and veins. The shape plays an important role in deciding what the tree is, and the texture contained in the veins is also an effective feature for classification. This paper evaluates the performance of a leaf classification system using both shape and texture features. We use Fourier descriptors for shape features and both gray-level co-occurrence matrices and wavelets for texture features, and we evaluated combinations of these features on images from the Flavia dataset. We compared the recognition rates and the precision-recall performance of these features. Various experiments showed that combining shape and texture gave better performance. The best result came from combining shape and texture features with a flipped contour for the Fourier descriptor.
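
The shape features named in this abstract, Fourier descriptors, can be illustrated with a minimal sketch. It treats a closed contour as a sequence of complex numbers, applies a plain discrete Fourier transform, and normalizes the coefficient magnitudes; the normalization choices, the descriptor count, and the square test contour are assumptions, and the paper's flipped-contour variant is not reproduced here.

```python
import cmath


def fourier_descriptors(contour, count=8):
    """Normalized Fourier descriptors of a closed contour.

    `contour` is a list of (x, y) points in traversal order and must have
    more than count + 1 points. The zero-frequency term is skipped
    (translation invariance), magnitudes are divided by the first
    harmonic's magnitude (scale invariance), and phases are discarded
    (rotation and start-point invariance).
    """
    points = [complex(x, y) for x, y in contour]
    n = len(points)
    coeffs = [
        sum(z * cmath.exp(-2j * cmath.pi * k * t / n) for t, z in enumerate(points)) / n
        for k in range(n)
    ]
    scale = abs(coeffs[1])  # assumed non-zero for a non-degenerate contour
    return [abs(c) / scale for c in coeffs[2 : 2 + count]]


# Made-up contour: eight points around a square, traversed counterclockwise.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(fourier_descriptors(square, count=4))
```

A real leaf outline would have hundreds of boundary points, where a fast Fourier transform would replace the quadratic-time sum used here for clarity.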

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems, v.19 no.3, pp.141-156, 2013
  • Social media is a representative form of Web 2.0 that shapes changes in users' information behavior by allowing users to produce their own content without any expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to communicate their opinions and thoughts to the masses and to acquaintances. Social media data plays a significant role in the emerging Big Data arena. A variety of research areas, such as social network analysis and opinion mining, have therefore paid attention to discovering meaningful information from the vast amounts of data buried in social media. Social media has recently become a main focus of the fields of Information Retrieval and Text Mining, not only because it produces massive unstructured textual data in real time but also because it serves as an influential channel for opinion leading. However, most previous studies have adopted broad-brush and limited approaches, which have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing big stream datasets from Twitter in real time. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets that mention the candidates' names and the election from Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects societal trends effectively. The system also retrieves the list of terms that co-occur with given query terms.
We compare the results of term co-occurrence retrieval using the names of influential candidates, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that differentiate each candidate, such as 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park', 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon', and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows the topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic shows one of two types of patterns, a rising tendency or a falling tendency, depending on the change in its probability distribution. To determine the relationship between topic trends on Twitter and social issues in the real world, we compare topic trends with related news articles. We find that Twitter can track issues faster than other media such as newspapers. The user network on Twitter differs from those of other social media because of the distinctive way relationships are formed on Twitter: users form relationships by exchanging mentions. We visualize and analyze mention-based networks of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all candidates' names regardless of their political tendencies. This case study shows that Twitter could be an effective tool for detecting and predicting dynamic changes in social issues, and that mention-based user networks could reveal different aspects of user behavior as a network unique to Twitter.
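
Term co-occurrence retrieval of the kind described in this abstract can be sketched in a few lines. The function name, the tokenized toy tweets, and the choice to count each term once per tweet are assumptions made for illustration; the abstract does not describe how the actual system tokenizes or ranks terms.

```python
from collections import Counter


def cooccurring_terms(tweets, query, top_n=5):
    """Rank the terms that appear in the same tweet as `query`.

    `tweets` is a list of token lists. Each term is counted at most once
    per tweet (an assumed design choice).
    """
    counts = Counter()
    for tokens in tweets:
        unique = set(tokens)
        if query in unique:
            counts.update(unique - {query})
    return counts.most_common(top_n)


# Made-up, already tokenized tweets with placeholder candidate names.
tweets = [
    ["candidate_a", "election", "poll"],
    ["candidate_a", "election"],
    ["candidate_b", "election"],
]
print(cooccurring_terms(tweets, "candidate_a"))
```

Running the same query per candidate and comparing the ranked lists gives the kind of contrast between general election terms and candidate-specific terms that the case study reports.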

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems, v.21 no.2, pp.69-92, 2015
  • The explosion of social media data has led researchers to apply text-mining techniques to analyze big social media data in a more rigorous manner. Although social media text analysis algorithms have improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common. Some studies have added grammatical factors to the feature sets used to train the classification model. The other approach applies semantic analysis to sentiment analysis, but it has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result of adopting the Word2Vec algorithm is compared with the result of co-occurrence analysis to identify the difference between the two approaches. The results show that the Word2Vec algorithm extracts about three times as many related words that express some emotion about the keyword as co-occurrence analysis does. The difference between the two results comes from Word2Vec's vectorization of semantic features. Therefore, it is possible to say that the Word2Vec algorithm is able to catch hidden related words that have not been found by traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjectives as "emotional words" in Korean. The emotion words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence.
The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets used in the study were collected by searching with three keywords (professor, prosecutor, and doctor) because these keywords carry rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords used with each keyword to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate), and Prosecutor (lewd behavior, sponsor). The text data comprise about 100,000 documents (Professor: 25,720; Doctor: 35,110; Prosecutor: 43,225), gathered from news, blogs, and Twitter to reflect various levels of public emotion in the text data analysis. Gephi (http://gephi.github.io) was used for visualization, and all programs used in text processing and analysis were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that a word extracted by Emotion Trigger processing has a significant causal relationship with the emotional word in a sentence. Future work will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger come from Twitter, so the data have a number of distinct features that we did not deal with in this study.
These features will be considered in further study.

Ultrastructural Differentiation of the Vacuole in Mesophyll Tissues of Orostachys (바위솔속 엽육조직 세포 내 액포의 미세구조 분화 양상)

  • Kim, In-Sun
    • Applied Microscopy, v.39 no.4, pp.333-340, 2009
  • In the present study, ultrastructural features of the mesophyll tissue have been investigated in Crassulacean acid metabolism (CAM)-performing succulent Orostachys. A large central vacuole and numerous small vacuoles in the peripheral cytoplasm were characterized at the subcellular level in both developing and mature mesophyll cells. The most notable feature was the invagination of vacuolar membranes into the secondary vacuoles or multivesicular bodies. In many cases, tens of single, membrane-bound secondary vacuoles of various sizes were found to be formed within the central vacuole. Multivesicular bodies containing numerous small vesicles were also distributed in the cytoplasm but were better developed within the central vacuole. Occasionally, electron-dense prevacuolar compartments, directly attached to structures appearing to be small vacuoles, were also detected in the cytoplasm. One or more huge central vacuoles were frequently observed in cells undergoing differentiation and maturation. Consistent with the known occurrence of morphologically distinct vacuoles within different tissues, two types of vacuoles, one representing lytic vacuoles and the other most likely representing protein storage vacuoles, were noted frequently within Orostachys mesophyll. The two types coexisted in mature vegetative cells but did not merge during the study. Nevertheless, the coexistence of two distinct vacuole types in maturing cells implies the presence of more than one mechanism for vacuolar solute sorting in these species. The vacuolar membrane is known to be unique among the intracellular compartments for having different channels and/or pumps to maintain its function. In CAM plants, the vacuole is a very important organelle that regulates malic acid diurnal fluctuation to a large extent.
The membrane invagination seen in Orostachys mesophyll likely plays a significant role in survival under the physiological drought conditions in which these Orostachys occur; by increasing to such a large vacuolar volume, the mesophyll cells are able to retain enormous amounts of acid when needed. Furthermore, the mesophyll cells are able to attain their large sizes with less energy expenditure in order to regulate the large degree of diurnal fluctuation of organic acid that occurs within the vacuoles of Orostachys.