• Title/Summary/Keyword: language processing


Rule Based Document Conversion and Information Extraction on the Word Document (워드문서 콘텐츠의 사용자 XML 콘텐츠로의 변환 및 저장 시스템 개발)

  • Joo, Won-Kyun; Yang, Myung-Seok; Kim, Tae-Hyun; Lee, Min-Ho; Choi, Ki-Seok
    • Proceedings of the Korea Contents Association Conference / 2006.11a / pp.555-559 / 2006
  • This paper aims to extract and store various forms of information of interest to users by means of user-defined structural rules and XML-based word-document conversion techniques. The system, named PPE, consists of three essential elements: a converting element, which converts word documents such as HWP and DOC into XML documents; an extracting element, which prepares structural rules and extracts the relevant information from the XML documents according to those rules; and a storing element, which produces the final XML document or stores it in a database system. For document conversion, we developed an OCX-based word-converting daemon. To help users extract information, we developed a script language with a native function/variable processing engine extended from XSLT. The system can be used to build databases of word-document contents or to provide information services based on raw word documents. We have applied it to a project management system and a project result management system.
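As a rough illustration of the extraction step, the following Python sketch applies an XSLT-style structural rule to an already-converted XML document using lxml. This is not the authors' PPE implementation; the rule, file name, and element names are hypothetical.

```python
# A minimal sketch of rule-based extraction from a converted XML document.
# Not the authors' PPE system; file names and element names are hypothetical.
from lxml import etree

# A structural rule, expressed as XSLT, that pulls out project titles.
RULE = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <projects>
      <xsl:for-each select="//section[@name='project']/title">
        <project><xsl:value-of select="."/></project>
      </xsl:for-each>
    </projects>
  </xsl:template>
</xsl:stylesheet>"""

def extract(xml_path: str) -> bytes:
    doc = etree.parse(xml_path)              # document already converted from HWP/DOC
    transform = etree.XSLT(etree.XML(RULE))  # compile the structural rule
    return etree.tostring(transform(doc))    # final XML ready for storage

if __name__ == "__main__":
    print(extract("converted_document.xml").decode())
```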

Railway Track Extraction from Mobile Laser Scanning Data (모바일 레이저 스캐닝 데이터로부터 철도 선로 추출에 관한 연구)

  • Yoonseok, Jwa; Gunho, Sohn; Jong Un, Won; Wonchoon, Lee; Nakhyeon, Song
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.33 no.2 / pp.111-122 / 2015
  • This study introduces a new automated solution for detecting railway tracks and reconstructing track models from mobile laser scanning data. The proposed solution proceeds as follows. First, a potential railway region, called the Region Of Interest (ROI), is detected, and the orientation of the railway track trajectory is approximated from the raw data. Next, knowledge-based detection of railway tracks is performed to localize track candidates in the first strip, where a strip, the local track search region, is generated orthogonal to the orientation of the track trajectory. Lastly, an initial track model, generated over candidate points detected by GMM-EM (Gaussian Mixture Model with Expectation-Maximization) clustering of the strip, grows strip-wise to capture all track points of interest and is then converted into a geometric track model within a tracking-by-detection framework. The proposed tracking process has two key features: it reduces the complexity of detecting track points by using a hypothetical track model, and it improves the efficiency of track modeling by capturing track points and modeling tracks simultaneously, minimizing data processing time and cost. The method was implemented in C++ and evaluated on LiDAR data acquired by a mobile mapping system (MMS) over an urban railway area with a complex railway scene.
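For intuition about the clustering step, here is a minimal Python sketch (the paper's implementation is in C++) that fits a Gaussian mixture with EM to synthetic strip points and flags components with elevated height as rail candidates; the data and thresholds are illustrative only.

```python
# Illustrative only: GMM-EM clustering of strip points, standing in for the
# paper's C++ implementation. Point data and parameters are made up.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic cross-track/height coordinates: two rails plus ground clutter.
rail_a = rng.normal([0.0, 0.18], 0.02, size=(60, 2))
rail_b = rng.normal([1.435, 0.18], 0.02, size=(60, 2))  # standard gauge offset
ground = rng.uniform([-1.0, 0.0], [2.5, 0.05], size=(200, 2))
points = np.vstack([rail_a, rail_b, ground])

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(points)

# Treat components with elevated mean height as rail candidates.
for k in range(gmm.n_components):
    mean_xy = gmm.means_[k]
    kind = "rail candidate" if mean_xy[1] > 0.1 else "ground"
    print(f"component {k}: mean={mean_xy.round(3)}, {kind}")
```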

Network Capacity Design in the local Communication and Computer Network for Consumer Portal System (전력수용가포털을 위한 구내 통신 및 컴퓨터 네트워크 용량 설계)

  • Hong, Jun-Hee; Choi, Jung-In; Kim, Jin-Ho; Kim, Chang-Sub; Son, Sung-Young; Son, Kwang-Myung; Jang, Gil-Soo; Lee, Jea-Bok
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.21 no.10 / pp.89-100 / 2007
  • A consumer portal is defined as "a combination of hardware and software that enables two-way communication between an energy service provider (ESP, such as KEPCO) and equipment within the consumer's premises". The portal provides both a physical link (between wires, radio waves, and other media) and a logical link (translating among language-like codes and etiquette-like protocols) between in-building and wide-area access networks. The consumer portal is thus an important, open, publicly shared infrastructure in the future vision of energy services. In this paper, we describe a new methodology for designing the capacity of the local communication and computer network of a consumer portal, and we present a capacity calculation method based on the limiting factors of the network system. This approach exposes the limitations of existing methods, and we propose an improved data-processing algorithm that expands the maximum number of networked end-use devices by a factor of 30 to 40. For validation, we applied the proposed method to a real system design. Our contribution will help in the design of electric power information networks.
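As a toy illustration of a capacity calculation driven by limiting factors (not the paper's actual formulas), the sketch below estimates how many end-use devices a shared link can serve under hypothetical bandwidth, message-size, and polling-interval assumptions.

```python
# Toy capacity estimate; the bandwidth, message size, and polling interval
# below are hypothetical, not figures from the paper.
def max_devices(link_bps: float, msg_bytes: int, poll_interval_s: float,
                protocol_overhead: float = 0.2) -> int:
    """Number of devices whose periodic messages fit in the usable capacity."""
    usable_bps = link_bps * (1.0 - protocol_overhead)
    per_device_bps = msg_bytes * 8 / poll_interval_s
    return int(usable_bps // per_device_bps)

# Example: 10 Mb/s local link, 256-byte readings every 15 s.
print(max_devices(10e6, 256, 15.0))
```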

Application of Advertisement Filtering Model and Method for its Performance Improvement (광고 글 필터링 모델 적용 및 성능 향상 방안)

  • Park, Raegeun; Yun, Hyeok-Jin; Shin, Ui-Cheol; Ahn, Young-Jin; Jeong, Seungdo
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.11 / pp.1-8 / 2020
  • In recent years, the exponential increase in internet data has driven progress in fields such as deep learning, but it has also produced side effects such as commercial advertisements disguised as content, for example viral marketing. This not only undermines the essence of the internet as a medium for sharing high-quality information, but also increases the time users spend searching for that information. In this study, we define an advertisement as "a text that obscures the essence of information transmission" and propose a model for filtering information according to that definition. The proposed model consists of an advertisement filtering stage and a stage for improving filtering performance, and it is designed to improve continuously. We collected data for advertisement filtering and trained a document classifier using KorBERT. Experiments were conducted to verify the model's performance: on data combining five topics, accuracy and precision were 89.2% and 84.3%, respectively. High performance was confirmed even when the atypical characteristics of advertisements were taken into account. Because the model identifies and filters advertisement paragraphs, it delivers high-quality information to users effectively and is expected to reduce the time and fatigue involved in searching for information.
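A minimal sketch of BERT-based document classification in the spirit of the paper is shown below. KorBERT itself is distributed by ETRI under license, so a public multilingual checkpoint stands in here; the texts and label set are hypothetical.

```python
# Sketch of fine-tuning a BERT-style classifier for ad/non-ad text.
# KorBERT requires a license from ETRI, so a public multilingual checkpoint
# stands in here; labels and texts are hypothetical.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "bert-base-multilingual-cased"          # stand-in for KorBERT
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

texts = ["지금 주문하면 50% 할인!", "이 논문은 질의 확장 기법을 제안한다."]
labels = torch.tensor([1, 0])                  # 1 = advertisement, 0 = information

batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=labels)            # cross-entropy loss + logits
out.loss.backward()                            # one illustrative training step
print(out.logits.softmax(dim=-1))
```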

Query Expansion Based on Word Graphs Using Pseudo Non-Relevant Documents and Term Proximity (잠정적 부적합 문서와 어휘 근접도를 반영한 어휘 그래프 기반 질의 확장)

  • Jo, Seung-Hyeon; Lee, Kyung-Soon
    • The KIPS Transactions: Part B / v.19B no.3 / pp.189-194 / 2012
  • In this paper, we propose a query expansion method based on word graphs over pseudo-relevant and pseudo non-relevant documents to improve retrieval performance. An initially retrieved document is classified into the core cluster when it includes core query terms, which are extracted using query term combinations and query term proximity; otherwise it is classified into the non-core cluster. Documents in the core cluster are treated as pseudo-relevant, and documents in the non-core cluster as pseudo non-relevant. Each cluster is represented as a graph whose nodes are terms and whose edges represent the proximity between a term and a query term. A term's weight is calculated by subtracting its weight in the non-core cluster graph from its weight in the core cluster graph, so a term with a high weight in the non-core graph is not selected as an expansion term. Expansion terms are then selected according to these weights. Experimental results on the TREC WT10g test collection show that the proposed method achieves a 9.4% improvement in mean average precision over the language model.
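The scoring rule lends itself to a compact sketch. Assuming toy per-cluster term weights (the paper derives them from proximity in each graph), expansion terms are ranked by their core-graph weight minus their non-core-graph weight:

```python
# Simplified sketch of the expansion-term scoring rule: weight in the core
# cluster graph minus weight in the non-core cluster graph. The toy weights
# below are illustrative, not the paper's exact proximity formula.
core_graph = {"retrieval": 2.7, "ranking": 1.9, "discount": 1.2}
noncore_graph = {"discount": 2.4, "ranking": 0.3, "coupon": 1.8}

def expansion_terms(core: dict, noncore: dict, k: int = 2) -> list[str]:
    terms = set(core) | set(noncore)
    scored = {t: core.get(t, 0.0) - noncore.get(t, 0.0) for t in terms}
    ranked = sorted(scored, key=scored.get, reverse=True)
    return [t for t in ranked[:k] if scored[t] > 0]  # drop ad-like terms

print(expansion_terms(core_graph, noncore_graph))    # ['retrieval', 'ranking']
```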

A Reference Architecture and Manifest Standard Suggestions for Interworking Open Web Store (OWS(Open Web Store) 연동을 위한 참조 모델 및 Manifest 표준 제안)

  • Ryu, Taejun; Kim, Changjun; Jeon, Jonghong; Lee, Seungyoon; Park, Sangwon
    • KIPS Transactions on Software and Data Engineering / v.2 no.11 / pp.779-788 / 2013
  • With the wide dissemination of smartphones, the number of native applications that anyone can develop and sell freely is growing. The application market, first activated by Apple's App Store, is spreading even more rapidly with Google's Google Play. However, because native applications are platform-dependent, developers must program separately for each platform, so development costs rise relative to earnings. To address this dependency problem, attention has turned to web applications developed with web-based languages. However, each browser's store requires a web application to follow its own manifest format, which creates a browser-dependency problem. Such problems can be worked around by installing a particular browser, but an application may then be unusable on other browsers and stores. Dependency problems not only narrow the range of applications available to users but also concentrate users on a few specific stores. OWS (Open Web Store) is a standard store that supports various web environments: it overcomes browser and platform dependency by interworking applications between stores, and it lets customers choose from a large number of applications. In this paper, we propose a manifest standard and a store reference architecture for OWS, together with an interworking scenario.
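To make the interworking idea concrete, here is a hypothetical minimal manifest, expressed and validated in Python. The field names are illustrative only, not the standard proposed in the paper.

```python
# Hypothetical minimal web-app manifest illustrating the kind of fields a
# store-neutral standard might require; not the actual OWS specification.
import json

manifest = {
    "name": "Example Notes",
    "version": "1.0.0",
    "launch_path": "/index.html",
    "icons": {"128": "/icons/app-128.png"},
    "developer": {"name": "Example Co.", "url": "https://example.com"},
    "locales": {"ko": {"name": "예제 노트"}},
}

REQUIRED = ("name", "version", "launch_path")

def validate(m: dict) -> None:
    missing = [f for f in REQUIRED if f not in m]
    if missing:
        raise ValueError(f"manifest missing required fields: {missing}")

validate(manifest)
print(json.dumps(manifest, ensure_ascii=False, indent=2))
```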

Partitioning and Merging an Index for Efficient XML Keyword Search (효율적 XML키워드 검색을 인덱스 분할 및 합병)

  • Kim, Sung-Jin; Lee, Hyung-Dong; Kim, Hyoung-Joo
    • Journal of KIISE: Databases / v.33 no.7 / pp.754-765 / 2006
  • In XML keyword search, a search result is defined as the set of smallest elements (i.e., least common ancestors) containing all query keywords, and the granularity of indexing is an XML element rather than a document. Under a conventional index structure, every least common ancestor produced by combining elements, each of which contains a query keyword, is considered part of the search result. In this paper, to avoid unnecessary computation of least common ancestors and to reduce query processing time, we describe how to construct a partitioned index composed of several partitions and how to produce a search result by merging those partitions when necessary. When the search result is restricted to least common ancestors whose depths exceed a given minimum depth, a search system using the proposed partitioned index can reduce query processing time by considering only combinations of elements that belong to the same partition. Even when the minimum depth is not given or is unknown, a search system can obtain the result from the partitioned index in the same query processing time as with a non-partitioned index. Our experiments, conducted on XML documents from the DBLP site and INEX2003, show that the partitioned index reduces query processing time substantially when a minimum depth is given.
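The core operation is computing least common ancestors from element labels. The Python sketch below uses Dewey-style labels, a common XML labeling convention assumed here rather than taken from the paper, to compute LCAs and keep those deeper than a minimum depth:

```python
# Sketch of LCA computation over Dewey-style element labels with a
# depth-based filter; the labeling scheme is a common convention assumed
# here, not necessarily the paper's exact index layout.
from itertools import product

def lca(a: tuple, b: tuple) -> tuple:
    """Longest common prefix of two Dewey labels."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]

# Element labels containing keyword 1 and keyword 2, respectively.
k1 = [(1, 2, 1), (1, 3, 4, 2)]
k2 = [(1, 2, 5), (1, 3, 4, 7)]

min_depth = 2
results = {lca(a, b) for a, b in product(k1, k2)}
deep = {r for r in results if len(r) >= min_depth}
print(sorted(deep))    # [(1, 2), (1, 3, 4)]
```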

An Efficient Query-based XML Access Control Enforcement Mechanism (효율적인 질의 기반 XML 접근제어 수행 메커니즘)

  • Byun, Chang-Woo; Park, Seog
    • Journal of KIISE: Databases / v.34 no.1 / pp.1-17 / 2007
  • As XML becomes a de facto standard for the distribution and sharing of information, the need for efficient yet secure access to XML data has become very important. To meet fine-grained access requirements, authorization models for regulating access to XML documents use XPath, a standard for addressing parts of an XML document and a suitable language for query processing. Access control environments for XML documents, along with techniques for handling authorization priorities and conflict resolution, have been proposed. Despite this, relatively little work has been done on enforcing access control for XML databases in the case of query access. Developing an efficient mechanism for controlling query-based access to XML databases is therefore the central theme of this paper, which proposes an efficient yet secure XML access control system. The basic idea is that a user query, combined with only the necessary access control rules, is rewritten into an alternative form that is guaranteed to cause no access violations, using tree-aware metadata of the XML schemas and the set operators supported by XPath 2.0. The scheme can be applied to any XML database management system and has several advantages over previously suggested schemes, including ease of implementation, small execution-time overhead, fine-grained control, and safe, correct query modification. The experimental results clearly demonstrate the efficiency of the approach.
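As an illustration of the rewriting idea, the sketch below composes a user's XPath query with access rules using the XPath 2.0 set operators intersect and except. The rule paths are hypothetical, and a real enforcement engine would also consult the tree-aware schema metadata described in the paper.

```python
# Sketch of query rewriting with XPath 2.0 set operators: the user's query is
# intersected with granted paths and denied paths are subtracted. Rule paths
# are hypothetical; a real system would also use tree-aware schema metadata.
def rewrite(query: str, granted: list[str], denied: list[str]) -> str:
    allowed = " | ".join(granted)                 # union of positive rules
    safe = f"(({query}) intersect ({allowed}))"
    for d in denied:
        safe = f"({safe} except ({d}))"           # strip forbidden nodes
    return safe

q = "//record/diagnosis"
print(rewrite(q,
              granted=["//record/descendant-or-self::*"],
              denied=["//record[@vip='true']//*"]))
```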

Parameter Optimization and Automation of the FLEXPART Lagrangian Particle Dispersion Model for Atmospheric Back-trajectory Analysis (공기괴 역궤적 분석을 위한 FLEXPART Lagrangian Particle Dispersion 모델의 최적화 및 자동화)

  • Kim, Jooil; Park, Sunyoung; Park, Mi-Kyung; Li, Shanlan; Kim, Jae-Yeon; Jo, Chun Ok; Kim, Ji-Yoon; Kim, Kyung-Ryul
    • Atmosphere / v.23 no.1 / pp.93-102 / 2013
  • The atmospheric transport pathway of an air mass is an important constraint on the chemical properties of the air mass observed at a given location. Such information can be used to understand observed temporal variability in the atmospheric concentrations of long-lived chemical compounds, whose sinks and/or sources are related to natural and/or anthropogenic processes at the surface, and also to perform inversions that constrain the fluxes of such compounds. The Lagrangian particle dispersion model FLEXPART is a useful tool for estimating detailed particle dispersion during atmospheric transport, a significant improvement over the traditional "single-line" trajectory models that have been widely used. However, those without a modeling background who simply want to create back-trajectory maps may find it challenging to optimize FLEXPART for their needs. In this study, we explain how to set up, operate, and optimize FLEXPART for back-trajectory analysis, and we provide automation programs based on the open-source R language. Topics include setting up the "AVAILABLE" file (a catalog of the input meteorological fields stored on the computer), creating C-shell scripts that initiate FLEXPART runs and store the output in directories designated by date, and processing the FLEXPART output into figures of the back-trajectory "footprint" (the potential emission sensitivity within the boundary layer). Step-by-step instructions are given for an example case, back trajectories calculated for Anmyeon-do, Korea for January 2011, and one application is demonstrated by interpreting observed variability in the atmospheric CO₂ concentration at Anmyeon-do during this period. The back-trajectory modeling information introduced here should facilitate the creation and automation of the most common back-trajectory calculations in atmospheric research.
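Although the paper's automation uses R and C-shell, the flavor of one step, generating the "AVAILABLE" catalog of meteorological input files, can be sketched in Python. The "ENYYMMDDHH" file-naming convention and the fixed-width layout below are assumptions to check against your FLEXPART version's documentation.

```python
# Sketch: build a FLEXPART 'AVAILABLE' file from a directory of met files.
# The 'ENYYMMDDHH' naming convention and the fixed-width layout are
# assumptions; check both against your FLEXPART version's documentation.
from datetime import datetime
from pathlib import Path

def write_available(met_dir: str, out_path: str) -> None:
    lines = ["DATE     TIME         FILENAME", "YYYYMMDD HHMMSS", ""]
    for f in sorted(Path(met_dir).glob("EN*")):
        stamp = datetime.strptime(f.name[2:10], "%y%m%d%H")   # ENYYMMDDHH
        lines.append(f"{stamp:%Y%m%d} {stamp:%H%M%S}      {f.name:<18}ON DISK")
    Path(out_path).write_text("\n".join(lines) + "\n")

write_available("./met", "./AVAILABLE")
```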

Development of an Informetric Analysis System KnowledgeMatrix (계량정보분석시스템 KnowledgeMatrix 개발)

  • Lee, Bangrae; Yeo, Woon Dong; Lee, June Young; Lee, Chang-Hoan; Kwon, Oh-Jin; Moon, Yeong-ho
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.167-171 / 2007
  • Application areas of Knowledge Discovery in Databases (KDD) have expanded into many R&D management processes, including technology trend analysis, forecasting, and evaluation. Established research fields such as informetrics (or scientometrics) have recently made full use of KDD techniques and methods. Various systems have been developed by researchers and institutions to support the analysis of large-scale R&D-related databases, such as patent or bibliographic databases. Existing systems, however, pose problems for Korean users: they are expensive, they cannot process the Korean language, and they do not reflect users' demands. To solve these problems, the Korea Institute of Science and Technology Information (KISTI) developed a stand-alone information analysis system named KnowledgeMatrix. KnowledgeMatrix offers various functions for analyzing data sets retrieved from databases; its main operation units comprise user-defined list and matrix generation, cluster analysis, visualization, and data pre-processing. KnowledgeMatrix performs better and offers more functions than existing systems.
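As a flavor of the matrix-generation step, this Python sketch builds a keyword co-occurrence matrix from a few made-up bibliographic records; it is not KISTI's implementation.

```python
# Sketch of co-occurrence matrix generation from bibliographic records, in
# the spirit of an informetric analysis tool; records are made up, and this
# is not KnowledgeMatrix itself.
from collections import Counter
from itertools import combinations

records = [
    {"keywords": ["XML", "retrieval", "indexing"]},
    {"keywords": ["XML", "access control"]},
    {"keywords": ["retrieval", "query expansion", "XML"]},
]

cooc = Counter()
for rec in records:
    for a, b in combinations(sorted(set(rec["keywords"])), 2):
        cooc[(a, b)] += 1          # count keyword pairs appearing together

for (a, b), n in cooc.most_common():
    print(f"{a} -- {b}: {n}")
```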
