• Title/Summary/Keyword: New Words Detection


A Method to Collect Trusted Processes for Application Whitelisting in macOS (macOS 운영체제에서 화이트리스트 구축을 위한 신뢰 프로세스 수집 연구)

  • Youn, Jung-moo;Ryu, Jae-cheol
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.2 / pp.397-405 / 2018
  • Blacklist-based tools are the most commonly used means of detecting suspected malicious processes. A blacklist-based tool compares features extracted from known malicious code against the process under inspection, so it is most effective at detecting known malicious code but limited in detecting variants. To overcome this limitation, the whitelist-based tool, the opposite of the blacklist approach, has emerged. Rather than extracting the features of malicious processes, whitelist-based tools collect reliable processes and verify that a process being checked is one of those trusted processes. In other words, if malicious code is created using a new vulnerability, or if a variant appears, it will not be in the list of trusted processes and can therefore be detected effectively. In this paper, we propose a method for effectively building a whitelist by collecting reliable processes on the macOS operating system.
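The whitelist principle above can be sketched in a few lines of Python (an illustrative sketch, not the paper's collection method; the fingerprinting scheme and sample binaries are hypothetical):

```python
import hashlib

def fingerprint(binary: bytes) -> str:
    # Hash the executable image; a real collector would hash the on-disk
    # Mach-O file or its code signature rather than raw bytes.
    return hashlib.sha256(binary).hexdigest()

def is_trusted(binary: bytes, whitelist: set) -> bool:
    # A process passes only if its fingerprint is in the collected whitelist;
    # anything unknown (new exploit, variant) is flagged by default.
    return fingerprint(binary) in whitelist

# Whitelist built from binaries collected on a known-clean macOS system.
trusted = {fingerprint(b"Finder image"), fingerprint(b"/usr/bin/ls image")}

assert is_trusted(b"Finder image", trusted)
assert not is_trusted(b"variant malware image", trusted)
```

Because membership is checked against trusted fingerprints rather than malicious signatures, a never-before-seen variant fails the check automatically.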

The Design of Blog Network Analysis System using Map/Reduce Programming Model (Map/Reduce를 이용한 블로그 연결망 분석 시스템 설계)

  • Joe, In-Whee;Park, Jae-Kyun
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.9B / pp.1259-1265 / 2010
  • Recently, online social networking has been growing with the development of the Internet, and its most representative service is the blog. A blog is a type of personal web site, usually maintained by an individual with regular entries of commentary. Blogs are related to each other, and we call this structure a blog network in this paper. In a blog network, posts in one blog can be diffused to other blogs. Analyzing information diffusion in the blog world is a useful research issue, applicable to predicting information diffusion, anomaly detection, marketing, and revitalizing the blog world. Existing studies on network analysis do not consider the passage of time and can only measure the activity of a node by the number of direct connections it has. As one solution, this paper suggests a new method of measuring blog network activity using a logistic curve model and the cosine similarity of keywords, implemented with the Map/Reduce programming model.
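The keyword-similarity step can be illustrated with a toy map/reduce pipeline (a minimal sketch; the paper's system runs on a real Map/Reduce cluster, and the sample posts are invented):

```python
import math
from collections import Counter
from functools import reduce

def mapper(post: str):
    # Map step: emit (word, 1) pairs for every word in a post.
    return [(word, 1) for word in post.lower().split()]

def reducer(acc: Counter, pair) -> Counter:
    # Reduce step: sum the counts per word.
    acc[pair[0]] += pair[1]
    return acc

def keyword_vector(posts) -> Counter:
    pairs = [p for post in posts for p in mapper(post)]
    return reduce(reducer, pairs, Counter())

def cosine(u: Counter, v: Counter) -> float:
    # Cosine similarity between two keyword-frequency vectors.
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

blog_a = keyword_vector(["big data hadoop", "hadoop cluster"])
blog_b = keyword_vector(["hadoop big data"])
assert cosine(blog_a, blog_b) > 0.8   # highly similar keyword profiles
```

In the full system, the mapper and reducer run distributed over the blog corpus, and the resulting similarities feed the logistic-curve activity model.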

Developing of Text Plagiarism Detection Model using Korean Corpus Data (한글 말뭉치를 이용한 한글 표절 탐색 모델 개발)

  • Ryu, Chang-Keon;Kim, Hyong-Jun;Cho, Hwan-Gue
    • Journal of KIISE:Computing Practices and Letters / v.14 no.2 / pp.231-235 / 2008
  • Recently we have witnessed several plagiarism scandals involving academic papers and novels, and document plagiarism is becoming more frequent. Although plagiarism in English documents has been studied for a long time, there are hardly any systematic and complete studies on plagiarism in Korean documents. Since the linguistic features of Korean are quite different from those of English, English-based methods cannot be applied to Korean documents directly. In this paper, we propose a new plagiarism detection method for Korean and thoroughly test our algorithm with a benchmark Korean text corpus. The proposed method is based on k-mers and local alignment, which locate the regions of plagiarized document pairs quickly and accurately. Using a Korean corpus containing more than 10 million words, we establish a probability model for local alignment scores (random similarity by chance). The experiments show that our system is quite successful at detecting plagiarized documents.
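The k-mer seeding idea can be sketched as follows (an illustrative simplification: real detection follows the shared k-mers with local alignment and the corpus-derived score model):

```python
def kmers(text: str, k: int = 4) -> set:
    # All length-k substrings of the text (spaces removed); shared k-mers
    # between two documents seed candidate plagiarized regions.
    s = text.replace(" ", "")
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def kmer_similarity(a: str, b: str, k: int = 4) -> float:
    # Fraction of a's k-mers also found in b; high values trigger the
    # (omitted) local alignment step to locate the exact copied span.
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka) if ka else 0.0

assert kmer_similarity("the quick brown fox", "the quick brown fox") == 1.0
assert kmer_similarity("the quick brown fox", "completely different words") < 0.2
```

The probability model then tells us how large a similarity score could arise by chance in a 10-million-word corpus, separating genuine plagiarism from random overlap.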

Research Trend Analysis Using Bibliographic Information and Citations of Cloud Computing Articles: Application of Social Network Analysis (클라우드 컴퓨팅 관련 논문의 서지정보 및 인용정보를 활용한 연구 동향 분석: 사회 네트워크 분석의 활용)

  • Kim, Dongsung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.195-211 / 2014
  • Cloud computing services provide IT resources as services on demand. This is considered a key concept that will lead a shift from an ownership-based paradigm to a new pay-for-use paradigm, which can reduce the fixed cost of IT resources and improve flexibility and scalability. As IT services, cloud services have evolved from earlier, similar computing concepts such as network computing, utility computing, server-based computing, and grid computing, so research into cloud computing is highly related to and combined with various relevant computing research areas. To seek promising research issues and topics in cloud computing, it is necessary to understand the research trends in cloud computing more comprehensively. In this study, we collect bibliographic and citation information for cloud computing related research papers published in major international journals from 1994 to 2012, and analyze macroscopic trends and network changes in the citation relationships among papers and the co-occurrence relationships of keywords, using social network analysis measures. Through the analysis, we identify the relationships and connections among research topics in cloud computing related areas and highlight new potential research topics. In addition, we visualize dynamic changes of research topics relating to cloud computing using a proposed cloud computing "research trend map." A research trend map visualizes the positions of research topics in two-dimensional space. The frequency of a keyword (X-axis) and the rate of increase in the degree centrality of the keyword (Y-axis) are the two dimensions of the map. Based on the values of the two dimensions, the two-dimensional space of a research map is divided into four areas: maturation, growth, promising, and decline.
An area with high keyword frequency but a low rate of increase in degree centrality is defined as a mature technology area; the area where both keyword frequency and the increase rate of degree centrality are high is a growth technology area; the area where keyword frequency is low but the rate of increase in degree centrality is high is a promising technology area; and the area where both are low is a declining technology area. Based on this method, cloud computing research trend maps make it possible to easily grasp the main research trends in cloud computing and to explain the evolution of research topics. According to the analysis of citation relationships, research papers on security, distributed processing, and optical networking for cloud computing rank highest by the PageRank measure. From the analysis of keywords in research papers, cloud computing and grid computing showed high centrality in 2009, and keywords dealing with main elemental technologies such as data outsourcing, error detection methods, and infrastructure construction showed high centrality in 2010~2011. In 2012, security, virtualization, and resource management showed high centrality. Moreover, interest in the technical issues of cloud computing has gradually increased. From the annual cloud computing research trend maps, it was verified that security is located in the promising area, virtualization has moved from the promising area to the growth area, and grid computing and distributed systems have moved to the declining area. The results indicate that distributed systems and grid computing received a lot of attention as similar computing paradigms in the early stage of cloud computing research.
The early stage of cloud computing was a period focused on understanding and investigating cloud computing as an emergent technology, linking it to relevant established computing concepts. After the early stage, security and virtualization technologies became the main issues in cloud computing, which is reflected in the movement of security and virtualization technologies from the promising area to the growth area in the research trend maps. Moreover, this study revealed that current research in cloud computing has rapidly shifted from a focus on technical issues to a focus on application issues, such as SLAs (Service Level Agreements).
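The four-quadrant classification of the research trend map reduces to two threshold comparisons, sketched below (the threshold values and example keywords are illustrative, not the paper's data):

```python
def trend_quadrant(freq: float, centrality_growth: float,
                   freq_cut: float, growth_cut: float) -> str:
    # Classify a keyword on the research trend map.
    # X axis: keyword frequency; Y axis: rate of increase in degree centrality.
    if freq >= freq_cut:
        return "growth" if centrality_growth >= growth_cut else "maturation"
    return "promising" if centrality_growth >= growth_cut else "decline"

# Illustrative values only (not from the study's measurements).
assert trend_quadrant(80, 0.9, 50, 0.5) == "growth"       # e.g. virtualization
assert trend_quadrant(80, 0.1, 50, 0.5) == "maturation"
assert trend_quadrant(10, 0.9, 50, 0.5) == "promising"    # e.g. early security
assert trend_quadrant(10, 0.1, 50, 0.5) == "decline"      # e.g. grid computing
```

Recomputing the quadrants year by year is what lets the map show topics migrating, e.g. virtualization moving from "promising" to "growth".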

Development of Selection Model of Interchange Influence Area in Seoul Belt Expressway Using Chi-square Automatic Interaction Detection (CHAID) (CHAID분석을 이용한 나들목 주변 지가의 공간분포 영향모형 개발 - 서울외곽순환고속도로를 중심으로 -)

  • Kim, Tae Ho;Park, Je Jin;Kim, Young Il;Rho, Jeong Hyun
    • KSCE Journal of Civil and Environmental Engineering Research / v.29 no.6D / pp.711-717 / 2009
  • This study develops a model for analyzing the relationship between a major traffic node (an expressway interchange) and the land prices of apartments along the Seoul Belt Expressway, using CHAID analysis. The results show that, first, the regions along the Seoul Belt Expressway (outer side: Gyeonggi-do, inner side: Seoul) differ, and although a linear relationship between land price and distance to a traffic node would generally be expected, the graphs do not show one; second, the CHAID analysis reveals two distinct spatial distributions split at 2.6 km on the outer side, but three distinct spatial distributions split at 1.4 km and 3.8 km on the inner side. In other words, traffic access does not necessarily guarantee high housing prices, since the graphs show land prices following a composite spatial distribution. This implies that residential environments (highway noise and regional discontinuity) and traffic accessibility interact to generate this phenomenon. Therefore, the highway IC land price model will be beneficial for calculating land prices in the new towns constantly being built along the highway.
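CHAID chooses splits by the chi-square statistic of a contingency table; a minimal sketch of that statistic (the counts and the 2.6 km split point are hypothetical, not the study's data):

```python
def chi_square(table) -> float:
    # Pearson chi-square statistic for a contingency table
    # (rows: distance bins around the IC, columns: price classes).
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical counts: apartments within / beyond 2.6 km of an IC, by price class.
near = [30, 10]   # high-price / low-price
far  = [12, 28]
stat = chi_square([near, far])
# CHAID keeps the candidate split with the largest significant statistic;
# 3.84 is the 5% critical value for 1 degree of freedom.
assert stat > 3.84
```

Repeating this test over candidate distance cut points is how CHAID arrives at split locations such as 2.6 km (outer side) or 1.4 km and 3.8 km (inner side).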

Strategic Behavioral Characteristics of Co-opetition in the Display Industry (디스플레이 산업에서의 협력-경쟁(co-opetition) 전략적 행동 특성)

  • Jung, Hyo-jung;Cho, Yong-rae
    • Journal of Korea Technology Innovation Society / v.20 no.3 / pp.576-606 / 2017
  • In the high-tech industry it is especially salient that firms cooperate even with competitors in order to respond promptly to changes in product architecture. In this sense, 'co-opetition,' a portmanteau of 'cooperation' and 'competition,' is a business term in strategic management representing the simultaneous co-existence of the two concepts. From this view, this study sets up the following research purposes: 1) investigating the corporate managerial and technological behavioral characteristics of co-opetition in the global display industry; 2) identifying the factors emerging from co-opetition behavior over time; 3) suggesting a strategic direction based on the observed co-opetition behavior. To this end, this study uses co-word network analysis to understand the contextual structure of co-opetition. To identify the topics in each network, we clustered the keywords with a community detection algorithm based on modularity and labeled each cluster. The results show increasing patterns of competition rather than cooperation. In particular, litigation aimed at restraining Korean firms occurred more severely and increased as time passed. Examining these network structures from a technological evolution perspective, there were already active cooperation and competition among firms in the early 2000s around OLED-related technology development. From the middle of the 2000s, firm behavior focused on accelerating existing technologies and developing futuristic displays. In other words, there has been competition for leadership in innovation at the level of final products, such as TVs and smartphones, by applying display panel products.
This study provides not only a better understanding of the context of the display industry, but also an analytical framework for anticipating the direction of innovation through analysis of managerial and technological factors. The methods can also support CTOs and practitioners in technology planning who must consider these factors in decision making related to strategic technology management and product development.
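A co-word network and its clusters can be sketched as below, using connected components as a crude stand-in for the modularity-based community detection the study applies (the keywords are illustrative):

```python
from itertools import combinations
from collections import defaultdict

def coword_edges(documents):
    # Co-word network: link every pair of keywords appearing in the same document.
    graph = defaultdict(set)
    for keywords in documents:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def clusters(graph):
    # Connected components; real community detection would further split
    # dense components by optimizing modularity.
    seen, out = set(), []
    for node in graph:
        if node in seen:
            continue
        comp, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n] - comp)
        seen |= comp
        out.append(comp)
    return out

docs = [["oled", "panel", "litigation"], ["oled", "patent"], ["tv", "smartphone"]]
groups = clusters(coword_edges(docs))
assert len(groups) == 2   # a technology/litigation cluster vs. a final-product cluster
```

Labeling each resulting cluster by its dominant keywords yields the topic structure the study interprets over time.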

Improved Original Entry Point Detection Method Based on PinDemonium (PinDemonium 기반 Original Entry Point 탐지 방법 개선)

  • Kim, Gyeong Min;Park, Yong Su
    • KIPS Transactions on Computer and Communication Systems / v.7 no.6 / pp.155-164 / 2018
  • Many malicious programs are compressed or encrypted with various commercial packers to hinder reverse engineering, so malicious code analysts must first decompress or decrypt them. The OEP (Original Entry Point) is the address of the first instruction executed after the encrypted or compressed executable file is restored to its original binary state. Several unpackers, including PinDemonium, execute the packed file, keep track of the addresses executed until the OEP appears, and search for the OEP among those addresses. However, instead of finding exactly one OEP, unpackers produce a relatively large set of OEP candidates, and sometimes the OEP is missing from the candidates; in other words, existing unpackers have difficulty finding the correct OEP. We have developed a new tool that produces a smaller OEP candidate set by adding two methods based on properties of the OEP. In this paper, we propose two methods that narrow the OEP candidate set by using the property that the function call sequence and parameters are the same in the packed program and the original program. The first method is based on function calls. Programs written in C/C++ are compiled into binary code, and compiler-specific system functions are added to the compiled program. After examining these functions, we added a method to PinDemonium that detects the unpacking work by matching the patterns of system functions called in packed and unpacked programs. The second method is based on parameters, which include not only user-entered inputs but also system inputs. We added a method to PinDemonium that finds the OEP using the system parameters of a particular function in stack memory. OEP detection experiments were performed on sample programs packed by 16 commercial packers.
On average, our tool reduces the OEP candidate set by more than 40% compared to PinDemonium, excluding two commercial packers whose samples could not be executed due to anti-debugging techniques.
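The first proposed filter, matching system-function call sequences between the packed and original program, can be sketched like this (the addresses and function names are hypothetical):

```python
def filter_oep_candidates(candidates, packed_calls, original_calls):
    # Keep only the OEP candidates whose subsequent system-function call
    # sequence matches the sequence observed in the original program.
    return [addr for addr in candidates
            if packed_calls.get(addr) == original_calls]

# Hypothetical traces: system functions observed after each candidate address.
packed = {
    0x401000: ["GetModuleHandleA", "__security_init_cookie", "main"],
    0x402A10: ["VirtualAlloc", "LoadLibraryA"],   # unpacking stub, not the OEP
}
original = ["GetModuleHandleA", "__security_init_cookie", "main"]
assert filter_oep_candidates([0x401000, 0x402A10], packed, original) == [0x401000]
```

Candidates belonging to the packer's own unpacking stub call different system functions and are discarded, shrinking the candidate set.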

HPLC-MS/MS Detection and Sonodegradation of Bisphenol A in Water (HPLC-MS/MS를 이용한 Bisphenol A 분석 및 초음파에 의한 분해 특성 조사)

  • Park, Jong-Sung;Yoon, Yeo-Min;Her, Nam-Guk
    • Journal of Korean Society of Environmental Engineers / v.32 no.6 / pp.639-648 / 2010
  • The optimal conditions for the analysis of BPA by HPLC-MS/MS were investigated, along with the ultrasonic degradation capacity of BPA, with the goal of establishing proper procedures for analyzing trace quantities of BPA by HPLC-MS/MS. The MDL and LOQ of BPA analyzed by HPLC-MS/MS were measured at 0.13 nM and 1.3 nM respectively; this sensitivity is about 620 and 32 times greater than that of HPLC-UV (MDL: 81.1 nM, LOQ: 811 nM) and FLD (MDL: 4.6 nM, LOQ: 46 nM). In other words, the new method enables the analysis of BPA down to one 1,180th of the amount specified in the U.S. EPA guideline for drinking water. The degradation rate of BPA by ultrasound exceeded 95% at the 580 kHz and 1000 kHz frequencies within 30 minutes of treatment, whereas the rate decreased somewhat at 28 kHz. Ultrasound at 580 kHz proved the most effective in both degradation rate and $k_1$ value, so we conclude that this frequency creates favorable conditions for combined degradation by pyrolysis and oxidation. With the addition of 0.01 mM of $CCl_4$, BPA with an initial concentration of 1 ${\mu}M$ was degraded by more than 98% within 30 minutes, and the $k_1$ values measured 5 minutes and 30 minutes into the experiment increased by 1.4 and 1.1 times, respectively, compared with BPA without $CCl_4$. It was also found that the main degradation mechanism of BPA by ultrasound is oxidation by the OH radical, based on the fact that the addition of 10 mM of t-BuOH decreased the BPA degradation rate by around 60%. However, the 33% BPA degradation rate obtained even with t-BuOH added implies further degradation by pyrolysis or by radicals other than the OH radical.
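The $k_1$ values above are first-order rate constants, following the standard kinetics relation $k_1 = \ln(C_0/C)/t$; a quick illustrative check against the reported 98% degradation in 30 minutes (the numbers are used for arithmetic only, not taken from the paper's tables):

```python
import math

def first_order_k(c0: float, c: float, t_min: float) -> float:
    # First-order rate constant: k1 = ln(C0/C) / t
    return math.log(c0 / c) / t_min

# 1 uM BPA degraded by ~98% within 30 min (with 0.01 mM CCl4 added)
k1 = first_order_k(1.0, 0.02, 30.0)
assert 0.12 < k1 < 0.14   # roughly 0.13 per minute
```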

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS data satisfies the defining conditions of Big Data: the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety). If one can discover the trend of an issue in SNS Big Data, this information can serve as an important new source for the creation of value, because it covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set corresponding to the daily ranking; (2) visualize the daily time-series graph of a topic over a month; (3) present the importance of a topic through a treemap based on a scoring system and frequency; (4) visualize the daily time-series graph of keywords found by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process the many unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to rapidly process large amounts of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize big data processing, because Hadoop is designed to scale from single-node computing to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine data easily and clearly, so TITS uses the d3.js library as its visualization tool. This library is designed for creating Data Driven Documents that bind the document object model (DOM) to arbitrary data; the resulting interaction is useful for managing a real-time data stream with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed with these libraries and is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS), and confirm its utility for storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
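Function (1) of TITS, the daily topic keyword ranking, can be sketched with plain term counting (a toy stand-in: the real system applies noun extraction and topic modeling over the Hadoop/MongoDB stack, and the tweets and stop-word list here are invented):

```python
from collections import Counter

STOP_WORDS = {"the", "a", "is", "to", "and", "of", "rt"}

def daily_topic_keywords(tweets, top_n=3):
    # Rank the day's topic keywords after stop-word removal; the full
    # pipeline would extract nouns and run topic modeling instead.
    counts = Counter(word for tweet in tweets
                     for word in tweet.lower().split()
                     if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]

tweets = ["the subway fare is rising",
          "subway fare protest",
          "a new fare policy"]
assert daily_topic_keywords(tweets)[0] == "fare"
```

In TITS, these daily keyword sets are stored in MongoDB and rendered on the web via d3.js as rankings, time-series graphs, and treemaps.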

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.1-25 / 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes the sentiments consumers or the public feel about an arbitrary object from written texts. Aspect-Based Sentiment Analysis (ABSA) is a fine-grained analysis of the sentiments towards each aspect of an object. Since it has more practical value in business terms, ABSA is drawing attention from both academic and industrial organizations. When a review says "The restaurant is expensive but the food is really fantastic", for example, general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. Thus, ABSA enables a more specific and effective marketing strategy. To perform ABSA, it is necessary to identify the aspect terms or aspect categories included in the text and judge the sentiments towards them. Accordingly, ABSA comprises four main subtasks: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). It is usually conducted by extracting aspect terms and then performing ATSC to analyze the sentiments for the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze the sentiments for the given aspect categories. Here, an aspect category is expressed in one or more aspect terms, or indirectly inferred from other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' included in the review. If the review includes 'pasta', 'steak', or 'grilled chicken special', these can all be aspect terms for the aspect category 'food'. An aspect category referred to by one or more specific aspect terms is called an explicit aspect.
On the other hand, an aspect category like 'price', which has no specific aspect term but can be indirectly inferred from an emotional word such as 'expensive', is called an implicit aspect. So far, the term 'aspect category' has been used to avoid confusion with 'aspect term'; from here on, we treat 'aspect category' and 'aspect' as the same concept and use the word 'aspect' for convenience. Note that ATSC analyzes the sentiment towards given aspect terms, so it deals only with explicit aspects, while ACSC treats both explicit and implicit aspects. This study seeks answers to the following issues ignored in previous studies when applying the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the tokens for aspect categories than to use only the final output vector of the [CLS] token as the classification vector? Second, is there any performance difference between the QA (Question Answering) and NLI (Natural Language Inference) types of sentence-pair input configuration? Third, is there any performance difference according to the order of the sentence containing the aspect category in the QA or NLI type sentence-pair input? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, ACSC models that outperform existing studies without expanding the training dataset were derived. In addition, it was found that reflecting the output vector of the aspect category token is more effective than using only the output vector of the [CLS] token as the classification vector. It was also found that QA type input generally provides better performance than NLI, and that the order of the sentence with the aspect category in the QA type is irrelevant to performance.
There may be some differences depending on the characteristics of the dataset, but when using NLI type sentence-pair input, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing the ACSC model used in this study could be similarly applied to other studies such as ATSC.
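The QA and NLI sentence-pair configurations compared in the study can be illustrated as follows (the templates are hypothetical examples of the two styles; the actual model inputs are produced by a BERT tokenizer):

```python
def make_pair(review: str, aspect: str, style: str, aspect_first: bool = False) -> str:
    # Build a BERT-style sentence-pair input for ACSC.
    # QA phrases the aspect as a question; NLI states it as a pseudo-sentence.
    if style == "qa":
        second = f"what do you think of the {aspect}?"
    elif style == "nli":
        second = aspect
    else:
        raise ValueError(style)
    # aspect_first controls the sentence order the study's third question examines.
    a, b = (second, review) if aspect_first else (review, second)
    return f"[CLS] {a} [SEP] {b} [SEP]"

review = "The restaurant is expensive but the food is really fantastic"
assert "[SEP] price" in make_pair(review, "price", "nli")
assert make_pair(review, "food", "qa").startswith("[CLS] The restaurant")
```

Feeding such pairs to BERT, the classifier can then read either the [CLS] output vector or the output vectors of the aspect tokens, the choice the study's first research question evaluates.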