• Title/Summary/Keyword: Parsing Method


An Integrated Processing Method for Image and Sensing Data Based on Location in Mobile Sensor Networks (이동 센서 네트워크에서 위치 기반의 동영상 및 센싱 데이터 통합 처리 방안)

  • Ko, Minjung; Jung, Juyoung; Boo, Junpil; Kim, Dohyun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.8 no.5 / pp.65-71 / 2008
  • Research is under way on the SWE (Sensor Web Enablement) platform of the OGC (Open Geospatial Consortium), which provides sensing data and video collected in a sensor network through the Web. However, existing research does not deal with moving objects such as cars, trains, ships, and people. We therefore present a method for handling integrated sensing data collected by GPS devices, sensor networks, and imaging devices, and propose an integrated, location-based processing method for image and sensing data in mobile sensor networks. Following the proposed method, we design and implement a combine adapter. The adapter receives context data and provides a common interface that includes parsing, queueing, and unified-message creation functions. We verify that the proposed method, built on the combine adapter, handles the integrated sensing data efficiently. This research is expected to support the development of various location-based context information services in the future.
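The abstract names three functions of the combine adapter: parsing, queueing, and unified-message creation. Below is a minimal Python sketch of that pipeline. It is not the paper's design. The comma-separated record formats (`GPS,...`, `SEN,...`, `IMG,...`), the class names, and the rule of tagging each reading with the node's most recent GPS fix are all hypothetical assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UnifiedMessage:
    node: str
    time: float
    lat: Optional[float] = None
    lon: Optional[float] = None
    readings: dict = field(default_factory=dict)
    image_uri: Optional[str] = None


def parse(line: str) -> dict:
    """Parsing step. Record layouts are hypothetical: KIND,node,time,payload..."""
    kind, node, t, *rest = line.strip().split(",")
    rec = {"kind": kind, "node": node, "time": float(t)}
    if kind == "GPS":
        rec["lat"], rec["lon"] = float(rest[0]), float(rest[1])
    elif kind == "SEN":
        rec["type"], rec["value"] = rest[0], float(rest[1])
    elif kind == "IMG":
        rec["uri"] = rest[0]
    else:
        raise ValueError(f"unknown record kind: {kind}")
    return rec


class CombineAdapter:
    def __init__(self):
        self.queue = deque()   # queueing step: FIFO of parsed records
        self.last_fix = {}     # node -> (lat, lon) from its most recent GPS record

    def receive(self, line: str) -> None:
        self.queue.append(parse(line))

    def flush(self) -> list:
        """Unified-message step: attach the latest known location to sensing/image records."""
        messages = []
        while self.queue:
            rec = self.queue.popleft()
            if rec["kind"] == "GPS":
                self.last_fix[rec["node"]] = (rec["lat"], rec["lon"])
                continue
            lat, lon = self.last_fix.get(rec["node"], (None, None))
            msg = UnifiedMessage(rec["node"], rec["time"], lat, lon)
            if rec["kind"] == "SEN":
                msg.readings[rec["type"]] = rec["value"]
            else:
                msg.image_uri = rec["uri"]
            messages.append(msg)
        return messages


adapter = CombineAdapter()
for raw in ["GPS,car1,0.0,37.56,126.97", "SEN,car1,0.5,temp,21.5", "IMG,car1,0.6,cam/0001.jpg"]:
    adapter.receive(raw)
msgs = adapter.flush()  # two unified messages, both located at (37.56, 126.97)
```

Location is joined at flush time, so each reading from a moving node carries the position of that node's most recent fix. Interpolating between fixes would be one possible refinement.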

Development of Geocoding and Reverse Geocoding Method Implemented for Street-based Addresses in Korea (우리나라 도로명주소를 활용한 지오코딩 및 역 지오코딩 기법 개발)

  • Seok, Sangmuk; Lee, Jiyeong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.34 no.1 / pp.33-42 / 2016
  • In Korea, geocoding services have been provided using the address-point matching technique. This technique yields high positional accuracy, but the quality of its results can be limited because it depends heavily on data quality. It also cannot be used for 3D address geocoding or for reverse geocoding. To address these issues, this paper implements geocoding methods for street-based addresses in Korea, based on the street-based address matching technique developed by the US Census Bureau. The proposed methods are twofold: (1) a street address-matching method, which can be used not only for 2D addresses representing a single building but also for 3D addresses representing indoor spaces or underground buildings, and (2) a reverse geocoding method, which converts a location point into a readable address. Street-based address geocoding achieved an 82.63% match rate, while reverse geocoding achieved a 98.5% match rate with an average position error of approximately 1.7 m. From these results, we conclude that the proposed geocoding techniques can support LBS (Location Based Services). In developing these methods, the study did not consider parsing algorithms for address standardization, nor areas with unusual addresses such as suburban areas or areas subordinate to roads. In the future, we plan to improve the geocoding methods to handle these cases.
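The Census-style approach that the abstract builds on places an address by interpolating along a street segment. Reverse geocoding does the opposite: it snaps a point to the nearest segment and reads off a number. The following is a minimal sketch of that classic address-range interpolation, not the authors' implementation. The `Segment` layout, the road name `Hypothetical-ro`, and the coordinates are hypothetical. The sketch ignores odd/even street sides and the floor component of 3D addresses.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    road: str
    start_no: int   # building number at the (x0, y0) end
    end_no: int     # building number at the (x1, y1) end
    x0: float
    y0: float
    x1: float
    y1: float


def geocode(seg: Segment, number: int) -> tuple:
    """Forward geocoding: place a building number by linear interpolation along the segment."""
    lo, hi = sorted((seg.start_no, seg.end_no))
    if lo == hi:
        raise ValueError("degenerate address range")
    if not lo <= number <= hi:
        raise ValueError("number outside the segment's address range")
    frac = (number - seg.start_no) / (seg.end_no - seg.start_no)
    return (seg.x0 + frac * (seg.x1 - seg.x0), seg.y0 + frac * (seg.y1 - seg.y0))


def reverse_geocode(segments: list, x: float, y: float) -> str:
    """Reverse geocoding: snap the point to the nearest segment, then interpolate a number."""
    best_dist, best_addr = math.inf, None
    for seg in segments:
        dx, dy = seg.x1 - seg.x0, seg.y1 - seg.y0
        length2 = dx * dx + dy * dy
        if length2 == 0:
            continue  # skip degenerate segments
        # Parameter t of the orthogonal projection onto the segment, clamped to [0, 1].
        t = max(0.0, min(1.0, ((x - seg.x0) * dx + (y - seg.y0) * dy) / length2))
        dist = math.hypot(x - (seg.x0 + t * dx), y - (seg.y0 + t * dy))
        if dist < best_dist:
            number = round(seg.start_no + t * (seg.end_no - seg.start_no))
            best_dist, best_addr = dist, f"{seg.road} {number}"
    return best_addr


road = Segment("Hypothetical-ro", 1, 101, 0.0, 0.0, 1000.0, 0.0)
print(geocode(road, 51))                      # (500.0, 0.0)
print(reverse_geocode([road], 250.0, 12.0))   # Hypothetical-ro 26
```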

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.133-148 / 2014
  • Recently, with the advent of various information channels, the amount of available data has continued to grow. The main cause is the significant increase in unstructured data, as smart devices let users create data in the form of text, audio, images, and video. Among the various types of unstructured data, users' opinions and a wide variety of information are expressed most clearly in text data such as news, reports, papers, and articles. Active attempts have therefore been made to create new value by analyzing these texts. The representative techniques for text analysis are text mining and opinion mining. They share important characteristics: both use text documents as input data, and both rely on many natural language processing techniques such as filtering and parsing. Opinion mining is therefore usually regarded as a sub-concept of text mining, and in many cases the literature uses the two terms interchangeably. Suppose that a classification analysis aims to predict the positive or negative opinion contained in a set of documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. If we note instead that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (text mining and opinion mining) are available for opinion classification, and distinguishing between them requires a precise definition of each. In this paper, we found it very difficult to distinguish the two methods clearly by the purpose of the analysis or the type of results. We conclude that the most definitive criterion for distinguishing text mining from opinion mining is whether an analysis uses any kind of sentiment lexicon.
We first built two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two models. Finally, we compared their prediction accuracy by analyzing 2,000 movie reviews. The model based on opinion mining achieved higher average prediction accuracy than the text mining model. Moreover, in the lift chart of the opinion mining model, prediction accuracy was higher for documents with strong certainty than for documents with weak certainty. Most importantly, opinion mining has a meaningful advantage: it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. The sentiment lexicon also makes the classification results clearly explainable. This study has two limitations. First, the results cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Second, various parameters in the parsing and filtering steps of the text mining model may have affected the accuracy of the prediction models. Nevertheless, this research contributes a performance comparison of text mining and opinion mining for opinion classification. Future research should evaluate the two methods more precisely through intensive experiments.
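The defining feature of the opinion-mining model is the sentiment lexicon. A document's polarity is derived from lexicon entries rather than learned from scratch. Below is a minimal sketch of lexicon-based scoring. The lexicon and its weights are toy values, not the lexicon built in the study. Using |score| as a certainty measure is an assumption made for illustration.

```python
import re

# Toy sentiment lexicon with hypothetical polarity weights (not the study's lexicon).
LEXICON = {"great": 2, "good": 1, "fun": 1, "boring": -1, "bad": -1, "awful": -2}


def score(review: str) -> int:
    """Sum the lexicon weights of the review's tokens; unknown tokens count as 0."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)


def classify(review: str) -> tuple:
    """Return (label, certainty), using |score| as a crude certainty measure."""
    s = score(review)
    label = "positive" if s > 0 else "negative" if s < 0 else "neutral"
    return label, abs(s)


print(classify("A great, fun movie"))   # ('positive', 3)
print(classify("Boring and bad"))       # ('negative', 2)
```

Each decision is a sum over lexicon entries, so the words behind it can be listed directly. This is the kind of explainability the abstract attributes to lexicon-based classification. Sorting documents by certainty gives the ordering that a lift chart is built on.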

Model Proposal for Detection Method of Cyber Attack using SIEM (SIEM을 이용한 침해사고 탐지방법 모델 제안)

  • Um, Jin-Guk; Kwon, Hun-Yeong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.16 no.6 / pp.43-54 / 2016
  • Cyber crime increases every year. Security control centers play a crucial role in monitoring and early response to cyber attacks on various information systems, and their importance has grown accordingly. Information security personnel are making every effort to prevent cyber attacks in government and financial-sector security control centers, threat response centers, cyber terror response centers, CERT teams, SOCs (Security Operations Centers), and similar organizations. The usual way to monitor cyber attacks relies on security systems or network security devices. However, this approach is expected to be insufficient, because it is a simple, one-dimensional, signature-based form of monitoring. Security control systems have improved considerably, and researchers have conducted many studies on monitoring methods for preventing security threats. With the shift from ESM to SIEM, the security control system can ingest more input data. It can also perform correlation analysis, which integrates the data produced by extraction and parsing into potential attack or threat scenarios. This article presents case studies of how to detect security threats effectively, from the early phase of security control systems to the current SIEM environment. It also introduces scenario-based security control in place of simple monitoring, and presents methods for producing correlation analyses and verifying them. This work is expected to contribute to the development of cyber attack monitoring systems in other security centers.
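A signature match inspects one event at a time, whereas SIEM correlation looks for a sequence of events that matches an attack scenario. The sketch below parses normalized log lines and applies a single scenario rule: a burst of failed logins followed by a success from the same source. The log format, the event names, the threshold of 3, and the 60-second window are hypothetical choices for illustration. They are not the paper's scenarios or any SIEM product's rule syntax.

```python
import re
from collections import defaultdict, deque

# Hypothetical normalized log format: "<epoch-seconds> <EVENT> src=<address>"
LINE = re.compile(r"^(?P<ts>\d+) (?P<event>LOGIN_FAIL|LOGIN_OK) src=(?P<src>\S+)$")


def parse(line):
    """Extraction/parsing step: return (ts, event, src), or None for lines the rule ignores."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return int(m["ts"]), m["event"], m["src"]


def correlate(lines, threshold=3, window=60):
    """Scenario rule: at least `threshold` failed logins from one source within `window`
    seconds, followed by a successful login from that source. Assumes time-ordered input."""
    recent_fails = defaultdict(deque)
    alerts = []
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        ts, event, src = rec
        fails = recent_fails[src]
        while fails and ts - fails[0] > window:
            fails.popleft()  # drop failures that fell out of the time window
        if event == "LOGIN_FAIL":
            fails.append(ts)
        elif len(fails) >= threshold:
            alerts.append((ts, src, len(fails)))
            fails.clear()
    return alerts


logs = [
    "100 LOGIN_FAIL src=10.0.0.5",
    "110 LOGIN_FAIL src=10.0.0.5",
    "120 LOGIN_FAIL src=10.0.0.5",
    "125 LOGIN_OK src=10.0.0.5",
    "130 LOGIN_OK src=10.0.0.9",
]
alerts = correlate(logs)
print(alerts)  # [(125, '10.0.0.5', 3)]
```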

Component Analysis for Constructing an Emotion Ontology (감정 온톨로지의 구축을 위한 구성요소 분석)

  • Yoon, Ae-Sun; Kwon, Hyuk-Chul
    • Korean Journal of Cognitive Science / v.21 no.1 / pp.157-175 / 2010
  • In human communication, understanding the emotions of dialogue participants is as important as decoding the explicit message. Non-verbal elements are well known to convey a speaker's emotions better than verbal elements. Written texts, however, contain a variety of linguistic units that express emotions. This study analyzes the components needed to construct an emotion ontology, which can support numerous applications in Human Language Technology. Most previous work on text-based emotion processing focused on classifying emotions, building dictionaries that describe emotions, and retrieving those lexical items in texts through keyword spotting and/or syntactic parsing. The emotions retrieved or computed in this way were not very accurate. This study therefore proposes a more sophisticated component analysis and introduces linguistic factors. (1) Five linguistic types of emotion expression are distinguished by target (verbal/non-verbal) and method (expressive/descriptive/iconic). The study also determines the correlations among these types, and their correlation with the non-verbal expressive type. This characteristic is expected to make our ontology more adaptable to multi-modal environments. (2) As emotion-related components, this study proposes 24 emotion types, a 5-point intensity scale (-2 to +2), and a 3-point polarity scale (positive/negative/neutral). Together these can describe a variety of emotions in more detail and in a standardized way. (3) We introduce verbal expression-related components, such as 'experiencer', 'description target', 'description method', and 'linguistic features', which can appropriately classify and tag verbal expressions of emotion.
(4) Our ontology adopts the linguistic tag sets proposed by ISO and TEI and provides a mapping table between our classification of emotions and Plutchik's, so it can easily be employed for multilingual processing.
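The components listed above amount to an annotation schema. The following is a minimal sketch of one annotation record with the stated value ranges enforced. The field names, the emotion-type names, and the Plutchik mapping excerpt are hypothetical. The abstract does not list the 24 types or the exact mapping table.

```python
from dataclasses import dataclass

EXPR_TARGETS = {"verbal", "non-verbal"}
EXPR_METHODS = {"expressive", "descriptive", "iconic"}
POLARITIES = {"positive", "negative", "neutral"}

# Hypothetical excerpt of a mapping table to Plutchik's basic emotions.
TO_PLUTCHIK = {"happiness": "joy", "fright": "fear", "rage": "anger"}


@dataclass(frozen=True)
class EmotionAnnotation:
    text: str
    emotion_type: str      # one of the 24 emotion types (names used here are hypothetical)
    intensity: int         # 5-point scale from -2 to +2
    polarity: str          # positive / negative / neutral
    expr_target: str       # verbal / non-verbal
    expr_method: str       # expressive / descriptive / iconic
    experiencer: str = ""  # a verbal expression-related component

    def __post_init__(self):
        if not -2 <= self.intensity <= 2:
            raise ValueError("intensity must be in -2..+2")
        if self.polarity not in POLARITIES:
            raise ValueError(f"bad polarity: {self.polarity}")
        if self.expr_target not in EXPR_TARGETS or self.expr_method not in EXPR_METHODS:
            raise ValueError("bad expression type")

    def plutchik(self):
        """Map to a Plutchik category via the (hypothetical) mapping table, or None."""
        return TO_PLUTCHIK.get(self.emotion_type)


a = EmotionAnnotation("I'm so happy today!", "happiness", 2, "positive",
                      "verbal", "expressive", experiencer="speaker")
print(a.plutchik())  # joy
```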

Performance Comparison of State-of-the-Art Vocoder Technology Based on Deep Learning in a Korean TTS System (한국어 TTS 시스템에서 딥러닝 기반 최첨단 보코더 기술 성능 비교)

  • Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.6 no.2 / pp.509-514 / 2020
  • A conventional TTS system consists of several modules: text preprocessing, parsing, grapheme-to-phoneme conversion, boundary analysis, prosody control, acoustic feature generation by an acoustic model, and synthesized speech generation. A deep learning-based TTS system, in contrast, consists of a Text2Mel process that generates a spectrogram from text and a vocoder that synthesizes speech signals from the spectrogram. To construct an optimal Korean TTS system, this paper applies Tacotron2 to the Text2Mel process. As vocoders, it introduces WaveNet, WaveRNN, and WaveGlow, and implements them to verify and compare their performance. WaveNet has the highest MOS and a trained model hundreds of megabytes in size, but its synthesis time is about 50 times real time. WaveRNN achieves MOS similar to WaveNet with a model size of several tens of megabytes, but it also cannot run in real time. WaveGlow can run in real time, but its model is several GB in size and its MOS is the worst of the three vocoders. Based on these results, this paper presents reference criteria for selecting an appropriate vocoder according to the hardware environment in which the TTS system is deployed.
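"About 50 times the real time" describes a real-time factor (RTF): synthesis time divided by the duration of the audio produced. A system runs in real time only when RTF ≤ 1. The arithmetic below uses the abstract's approximate WaveNet figure. The 10-second utterance length is an illustrative assumption.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = synthesis time / duration of the generated audio; RTF <= 1 means real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_seconds(rtf: float, audio_seconds: float) -> float:
    """Expected wall-clock time to synthesize `audio_seconds` of speech at a given RTF."""
    return rtf * audio_seconds


# WaveNet at roughly 50x real time (from the abstract); the 10 s utterance is illustrative.
wavenet_time = synthesis_seconds(50, 10)
print(wavenet_time)                             # 500
print(real_time_factor(wavenet_time, 10) <= 1)  # False -> not real time
```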

Protocol Monitor System Between Cortex M7 Based PLC And HMI

  • Kim, Ki-Su; Lee, Jong-Chan; Ha, Heon-Seong
    • Journal of the Korea Society of Computer and Information / v.25 no.6 / pp.17-23 / 2020
  • This paper proposes a method in which an MCU sniffs the real-time data frames exchanged over RS232 between the HMI and the PLC of automation equipment. Users can thus collect data without modifying, or depending on modifications to, the PLC and HMI systems. The user extracts the necessary information from the sniffed data through a parsing operation. Each sniffed frame is also transmitted on to its destination, so the original communication interface is preserved. The MCU's UART interface circuit is physically designed according to the RS232 standard. Using the DMA controller inside the MCU improves efficiency further compared with a purely interrupt-based design. In addition, data frame I/O uses a circular queue to logically separate the work of the DMA interrupt service routine from that of the main thread. With this method, the user receives the sniffed data frames between the HMI and PLC in RS232 format, while frames between the PLC and HMI still arrive normally at their original destinations. The experiments confirmed that the sniffed data frames arrive normally at the user system without any modification of the PLC or HMI.
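The circular queue is what decouples the DMA interrupt service routine (producer) from the main thread (consumer). It is safe without locks because each index has exactly one writer. Below is a minimal Python sketch of that single-producer/single-consumer discipline and of the forward-and-copy step. On the MCU this would be C with `volatile` indices and attention to memory ordering. The buffer size, the 64-byte drain size, and the frame bytes are hypothetical.

```python
class RingBuffer:
    """Single-producer/single-consumer circular byte queue (one slot kept empty)."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0  # next write index; advanced only by the producer (DMA ISR)
        self.tail = 0  # next read index; advanced only by the consumer (main thread)

    def put(self, data: bytes) -> int:
        """Producer side: copy bytes in until full; returns the number written."""
        written = 0
        for b in data:
            nxt = (self.head + 1) % self.capacity
            if nxt == self.tail:
                break  # full; a real ISR would record an overrun here
            self.buf[self.head] = b
            self.head = nxt
            written += 1
        return written

    def get(self, max_bytes: int) -> bytes:
        """Consumer side: drain up to max_bytes."""
        out = bytearray()
        while self.tail != self.head and len(out) < max_bytes:
            out.append(self.buf[self.tail])
            self.tail = (self.tail + 1) % self.capacity
        return bytes(out)


def main_loop_step(rb: RingBuffer, forward, monitor) -> bytes:
    """Drain the queue, pass bytes on to the original destination, and copy them to the user."""
    data = rb.get(64)
    if data:
        forward(data)  # keep the HMI<->PLC link intact
        monitor(data)  # sniffed copy for parsing on the user side
    return data


rb = RingBuffer(16)
rb.put(b"\x02READ D100\x03")  # hypothetical frame delivered by a DMA-complete interrupt
to_plc, to_user = [], []
main_loop_step(rb, to_plc.append, to_user.append)
print(to_plc == to_user == [b"\x02READ D100\x03"])  # True
```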

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • As demand for text data analysis grows rapidly, research on and investment in text mining are being actively pursued, in academia and across many industries. Text mining is generally conducted in two steps. In the first step, the text of each collected document is tokenized and structured, converting the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are performed according to the purpose of the analysis. Until recently, text mining studies focused on applications in the second step. However, researchers then found that the text structuring process substantially influences the quality of the analysis results. Various embedding methods have since been actively studied that improve this quality by preserving the meaning of words and documents when text data are represented as vectors. Structured data can be fed directly to a variety of operations and traditional analysis techniques. Unstructured text, in contrast, must first undergo a structuring task that transforms the original document into a form the computer can understand. Mapping arbitrary objects into a space of a given dimension while preserving their algebraic properties, so as to structure text data, is called "embedding." Recently, researchers have attempted to embed not only words but also sentences, paragraphs, and entire documents. In particular, demand for document embedding has grown rapidly, and many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into a single vector, is the most widely used. However, traditional document embedding methods such as doc2Vec generate a document's vector from all the words the document contains.
As a result, the document vector is affected not only by core words but also by miscellaneous words. Traditional document embedding schemes also usually map each document to a single vector, so they struggle to represent a complex document covering multiple subjects accurately. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content from keywords. For a document without keywords, the method can be applied after keywords are extracted through various analysis methods. Keyword extraction is not the focus of the proposed method, however, so we describe its application to documents whose keywords are predefined in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The process is as follows. All text in a document is tokenized, and word embedding represents each token as an N-dimensional real-valued vector. Next, to keep miscellaneous words from influencing the result, the vectors of each document's keywords are extracted to form that document's set of keyword vectors. Each document's keyword set is then clustered to identify the multiple subjects it contains. Finally, a multi-vector is generated from the vectors of the keywords in each cluster. Experiments on 3,147 academic papers revealed that the traditional single-vector approach cannot properly map complex documents, because of interference among subjects within each vector.
The proposed multi-vector method eliminates this interference among subjects, and we confirmed that it vectorizes complex documents more accurately.
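Steps (3) to (5) can be illustrated with toy vectors. The following is a sketch under stated assumptions. The abstract does not name the clustering algorithm, how the number of clusters is chosen, or how each cluster becomes a vector. Here plain k-means with a fixed `k` and the cluster mean are swapped in, and the 2-D "word embeddings" are hypothetical.

```python
import random


def mean(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the member vectors of each cluster."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: dist2(v, centroids[c]))
            clusters[nearest].append(v)
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return clusters


def multi_vector_embedding(keywords, word_vectors, k):
    """Steps (3)-(5): keyword vectors -> keyword clusters -> one mean vector per cluster."""
    kv = [word_vectors[w] for w in keywords if w in word_vectors]
    k = min(k, len(kv))
    return [mean(c) for c in kmeans(kv, k) if c]


# Toy 2-D "word embeddings" (hypothetical) for a paper that mixes two subjects.
vectors = {"neural": [1.0, 0.0], "network": [0.9, 0.1], "soil": [0.0, 1.0], "crop": [0.1, 0.9]}
keywords = ["neural", "network", "soil", "crop"]
multi = multi_vector_embedding(keywords, vectors, k=2)
single = mean([vectors[w] for w in keywords])  # single-vector baseline
print(sorted([round(x, 2) for x in v] for v in multi))  # [[0.05, 0.95], [0.95, 0.05]]
print([round(x, 2) for x in single])                    # [0.5, 0.5]
```

The single averaged vector lands halfway between the two subjects and resembles neither. The two cluster vectors each stay close to one subject, which illustrates the interference the multi-vector approach is meant to avoid.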

Design and Implementation of Content-based Video Database using an Integrated Video Indexing Method (통합된 비디오 인덱싱 방법을 이용한 내용기반 비디오 데이타베이스의 설계 및 구현)

  • Lee, Tae-Dong; Kim, Min-Koo
    • Journal of KIISE:Computing Practices and Letters / v.7 no.6 / pp.661-683 / 2001
  • The use of digital video information has increased rapidly in recent years, so managing video databases efficiently has become more important. The development of high-speed data networks and digital techniques has given rise to new multimedia applications, such as Internet broadcasting and Video On Demand (VOD), which combine video data processing with computers. To support fast and efficient search, a video database must extract accurate feature information from video, which is more massive and more complex than traditional data. Video databases differ essentially from traditional databases. These differences raise new issues in video search and data modeling, and they call for new methods of database generation and efficient video retrieval. In this paper, we propose a method for constructing and generating a content-based video database that can store the meaningful structure of video together with prior production information. Using this method, we implemented a video database that can produce new content for Internet broadcasting, with the video database at the center of production. For this purpose, we propose a video indexing method that integrates annotation-based retrieval with content-based retrieval to extract and retrieve video feature information. During video parsing and representative key frame extraction, the method uses the relationship between the meaningful structure and the prior production information. The integrated indexing method improves video content retrieval performance because it uses two kinds of metadata at the same time. Content-based metadata represents the low level of video. Annotation-based metadata captures high-level information whose features are difficult to extract from the video itself.
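One way to picture the integrated index is to store high-level annotations and a low-level visual feature on each key frame and use both at query time. In this minimal sketch, annotations filter the candidates and a low-level feature ranks them. The annotation-first ordering, the color-histogram feature, and histogram intersection as the similarity measure are assumptions made for illustration, not the paper's indexing scheme. The shot IDs and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KeyFrame:
    shot_id: str
    annotations: set  # high-level, annotation-based metadata (e.g. production information)
    histogram: list   # low-level, content-based feature, e.g. a normalized color histogram


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def integrated_search(index, keywords, example_hist, top_k=3):
    """Filter key frames by annotation keywords, then rank survivors by visual similarity."""
    candidates = [kf for kf in index if keywords <= kf.annotations] if keywords else list(index)
    candidates.sort(key=lambda kf: histogram_intersection(kf.histogram, example_hist), reverse=True)
    return [kf.shot_id for kf in candidates[:top_k]]


index = [
    KeyFrame("s1", {"news", "anchor"}, [0.7, 0.2, 0.1]),
    KeyFrame("s2", {"news", "field"}, [0.1, 0.3, 0.6]),
    KeyFrame("s3", {"drama"}, [0.6, 0.3, 0.1]),
]
print(integrated_search(index, {"news"}, [0.6, 0.3, 0.1]))  # ['s1', 's2']
print(integrated_search(index, set(), [0.6, 0.3, 0.1]))     # ['s3', 's1', 's2']
```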

KorLexClas 1.5: A Lexical Semantic Network for Korean Numeral Classifiers (한국어 수분류사 어휘의미망 KorLexClas 1.5)

  • Hwang, Soon-Hee; Kwon, Hyuk-Chul; Yoon, Ae-Sun
    • Journal of KIISE:Software and Applications / v.37 no.1 / pp.60-73 / 2010
  • This paper describes KorLexClas 1.5, which provides a very large list of Korean numeral classifiers together with the co-occurring noun categories that select each classifier. The KorLex resources for other parts of speech (POS) take their structure largely from their reference model (Princeton WordNet). KorLexClas 1.0 and its extended version 1.5 instead adopt a direct building method, which demands considerable time and expert knowledge to establish the hierarchies of numeral classifiers and the relationships between lexical items. For efficient construction and reliability of KorLexClas 1.5, we use the following processes: (1) cross-checking various language resources to select classifier candidates; (2) extending the list of numeral classifiers using shallow parsing techniques; (3) setting up the hierarchies of numeral classifiers based on previous linguistic studies; and (4) determining, in KorLexNoun 1.5, the LUB (Least Upper Bound) for each numeral classifier. The last process makes the list of co-occurring nouns in KorLexClas 1.5 open and extensible. KorLexClas 1.5 is expected to be used in a variety of NLP applications, including machine translation (MT).
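Process (4) can be read as follows: for the nouns that co-occur with a classifier, find the most specific concept in the noun hierarchy that subsumes all of them. Any noun under that concept then becomes a candidate co-occurring noun, which is what makes the list open. An example is the animal counter 마리 (mari). Below is a minimal sketch over a toy, tree-shaped hierarchy with hypothetical English glosses. A WordNet-style hierarchy such as KorLexNoun can have multiple parents, which this sketch does not handle.

```python
def ancestors(node, parent):
    """Return the path from `node` up to the root, node first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def least_upper_bound(nodes, parent):
    """Most specific node that is an ancestor (or self) of every node in `nodes`."""
    paths = [ancestors(n, parent) for n in nodes]
    common = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    for n in paths[0]:  # the first shared node walking upward is the most specific one
        if n in common:
            return n
    return None


# Toy noun hierarchy (child -> parent) with hypothetical English glosses.
parent = {"dog": "animal", "cat": "animal", "animal": "organism",
          "pine": "plant", "plant": "organism", "organism": "entity"}
print(least_upper_bound(["dog", "cat"], parent))   # animal
print(least_upper_bound(["dog", "pine"], parent))  # organism
```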