Search | Korea Science

Feature Selection for a Hangul Text Document Classification System (한글 텍스트 문서 분류시스템을 위한 속성선택)

Lee, Jae-Sik;Cho, You-Jung
- Proceedings of the Korea Intelligent Information System Society Conference
- /
- 2003.05a
- /
- pp.435-442
- /
- 2003
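An information retrieval system is a tool that helps users find the information they need among vast amounts of information. It must deliver the information a user requests more accurately, more effectively, and more efficiently. To do so, it is essential to select, from the countless features in a document, only those that reflect the document's characteristics well and to use them appropriately. This study therefore focuses on improving the performance of an existing Hangul document classification system (CB_TFIDF)[1] in two respects: accuracy and speed. Among the various feature selection techniques that have been applied to English text document classification systems, we use three well-known ones, namely Information Gain, Odds Ratio, and Document Frequency Thresholding, to build a selective case base. We then apply the case base to the Hangul text document classification system, compare and evaluate the resulting performance, and propose the feature selection technique best suited to Hangul document classification along with guidelines for feature selection.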
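Two of the feature selection techniques named in the abstract above can be sketched briefly. This is a minimal illustration, not the authors' implementation: it assumes documents are already tokenized (Hangul text would first need its own tokenization step, which is not shown), the sample documents and the names `df_threshold` and `information_gain` are hypothetical, and Odds Ratio is omitted for brevity.

```python
import math
from collections import Counter

def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))
    return df

def df_threshold(docs, min_df):
    """Document Frequency Thresholding: keep terms seen in >= min_df documents."""
    return {t for t, n in document_frequency(docs).items() if n >= min_df}

def entropy(counts):
    """Shannon entropy (base 2) of a collection of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(docs, term):
    """IG(term) = H(class) - H(class | term present or absent)."""
    labels = Counter(label for _tokens, label in docs)
    with_term = Counter(label for tokens, label in docs if term in tokens)
    without_term = Counter(label for tokens, label in docs if term not in tokens)
    n = len(docs)
    conditional = sum(
        sum(part.values()) / n * entropy(part.values())
        for part in (with_term, without_term) if part
    )
    return entropy(labels.values()) - conditional

# Hypothetical toy corpus: (tokens, class label) pairs.
docs = [
    (["stock", "market", "price"], "economy"),
    (["market", "bank", "rate"], "economy"),
    (["match", "goal", "team"], "sports"),
    (["team", "market", "coach"], "sports"),
]

kept = df_threshold(docs, min_df=2)
ig_team = information_gain(docs, "team")
ig_market = information_gain(docs, "market")
```

In this toy corpus "team" separates the two classes perfectly and so scores higher than "market", which appears in both classes.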

Energy Efficient Graph Setup in ISA100.11a (ISA100.11a 에서 에너지 효율적인 그래프 구성방법)

Jung, In-Su;Jung, Hayeon
- Proceedings of the Korea Information Processing Society Conference
- /
- 2011.11a
- /
- pp.675-678
- /
- 2011
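Wireless sensor networks are a well-known research area and are widely applied to smart grids, automation systems, and similar settings. Standards for wireless sensor networks used at industrial sites include ZigBee, WirelessHART, and ISA100.11a. ISA100.11a proposes a simple and reliable routing method called graph routing. Graph routing is a fixed routing scheme that uses no routing tables. However, ISA100.11a does not describe a concrete method for configuring the routes. This paper therefore presents an energy-efficient and reliable method for configuring graph routing that uses a base line.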
https://doi.org/10.3745/PKIPS.y2011m11a.675

Password-Based Authentication Protocol for Remote Access using Public Key Cryptography (공개키 암호 기법을 이용한 패스워드 기반의 원거리 사용자 인증 프로토콜)

최은정;김찬오;송주석
- Journal of KIISE:Information Networking
- /
- v.30 no.1
- /
- pp.75-81
- /
- 2003
User authentication, together with confidentiality and integrity over untrusted networks, is an important part of security for systems that allow remote access. Using a human-memorable password for remote user authentication is not easy because of the low entropy of the password, which is constrained by the user's memory. This paper presents a new password authentication and key agreement protocol suitable for authenticating users and exchanging keys over an insecure channel. The new protocol resists dictionary attacks and offers perfect forward secrecy, which means that revealing the password to an attacker does not help the attacker obtain the session keys of past sessions, protecting them against future compromises. Additionally, user passwords are stored in a form that is not plaintext-equivalent to the password itself, so an attacker who captures the password database cannot use it directly to compromise security and gain immediate access to the server. The protocol does not have to resort to a PKI or a trusted third party such as a key server or arbitrator, so no keys or certificates are stored on the user's computer. A further desirable property is that setup time is minimized by keeping the number of flows and the computation time small. This is very useful in applications that require secure password authentication, such as web-based home banking, SSL, SET, IPSEC, telnet, ftp, and mobile user scenarios.
KSCI

Cross-Lingual Style-Based Title Generation Using Multiple Adapters (다중 어댑터를 이용한 교차 언어 및 스타일 기반의 제목 생성)

Yo-Han Park;Yong-Seok Choi;Kong Joo Lee
- KIPS Transactions on Software and Data Engineering
- /
- v.12 no.8
- /
- pp.341-354
- /
- 2023
The title of a document is a brief summary of the document. Readers can understand a document more easily if its title is provided in their preferred style and language. In this research, we propose a cross-lingual and style-based title generation model using multiple adapters. Training such a model requires a parallel corpus in several languages with different styles. This kind of parallel corpus is quite difficult to construct; however, a monolingual title generation corpus of the same style can be built easily. Therefore, we apply a zero-shot strategy to generate a title in a different language and with a different style for an input document. The baseline model is a Transformer consisting of an encoder and a decoder, pre-trained on several languages. The model is then equipped with multiple adapters for translation, languages, and styles. After the model learns a translation task from a parallel corpus, it learns a title generation task from a monolingual title generation corpus. When training the model on a task, we activate only the adapter that corresponds to that task. When generating a cross-lingual and style-based title, we activate only the adapters that correspond to the target language and the target style. Experimental results show that the proposed model performs on par with a pipeline model that first translates into the target language and then generates a title. Natural language generation has changed significantly with the emergence of large-scale language models. However, research on improving natural language generation with limited resources and limited data needs to continue, and this study explores the significance of such research.
https://doi.org/10.3745/KTSDE.2023.12.8.341

Multiple Pipelined Hash Joins using Synchronization of Page Execution Time (페이지 실행시간 동기화를 이용한 다중 파이프라인 해쉬 결합)

Lee, Kyu-Ock;Weon, Young-Sun;Hong, Man-Pyo
- Journal of KIISE:Computer Systems and Theory
- /
- v.27 no.7
- /
- pp.639-649
- /
- 2000
In relational database systems, the join operation is one of the most time-consuming query operations. Many parallel join algorithms have been developed to reduce its execution time, and the multiple hash join algorithm using an allocation tree is one of the most efficient. However, processing at each node of the allocation tree can be delayed in the tuple-probing phase, because the time to read one page of the outer relation differs from the time to process the page that has already been read. To solve the performance degradation caused by this delay, we develop a join algorithm for multiple hash joins based on the concept of 'synchronization of page execution time'. The algorithm reduces the processing time of each node in the allocation tree and improves overall system performance. In addition, we analyze the performance by building an analytical cost model and validate the model through performance comparisons with the previous method.
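For readers unfamiliar with the build and tuple-probing phases mentioned in the abstract above, the following is a minimal sketch of a basic single hash join. It is not the multiple pipelined algorithm of the paper and does not model allocation trees or page timing; the function name `hash_join` and the sample relations are hypothetical.

```python
from collections import defaultdict

def hash_join(inner, outer, inner_key, outer_key):
    """Equi-join two lists of dict rows on the given key columns."""
    # Build phase: hash the inner relation on its join key.
    table = defaultdict(list)
    for row in inner:
        table[row[inner_key]].append(row)
    # Probe phase: look up each tuple of the outer relation in the hash table.
    result = []
    for row in outer:
        for match in table.get(row[outer_key], []):
            result.append({**match, **row})
    return result

# Hypothetical sample relations.
departments = [{"dept_id": 1, "dept": "R&D"}, {"dept_id": 2, "dept": "Sales"}]
employees = [
    {"name": "Kim", "dept_id": 1},
    {"name": "Lee", "dept_id": 2},
    {"name": "Park", "dept_id": 3},
]

joined = hash_join(departments, employees, "dept_id", "dept_id")
```

The employee with `dept_id` 3 finds no match during probing and is dropped, as expected for an inner join.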

A Study on Improving the Performance of Document Classification Using the Context of Terms (용어의 문맥활용을 통한 문헌 자동 분류의 성능 향상에 관한 연구)

Song, Sung-Jeon;Chung, Young-Mee
- Journal of the Korean Society for Information Management
- /
- v.29 no.2
- /
- pp.205-224
- /
- 2012
One limitation of the bag-of-words (BOW) method is that each term is recognized only by its form, so the representation fails to capture the term's meaning or thematic background. To overcome this limitation, different profiles were defined for each term by thematic category, depending on contextual characteristics. In this study, a term was used as a classification feature based on its meaning or thematic background by comparing the context in those profiles with the term's occurrences in an actual document. The experiment was conducted in three phases: term weighting, ensemble classifier implementation, and feature selection. Classification performance improved in all three phases, with the ensemble classifier achieving the highest performance score. The results also showed that the proposed method was effective in reducing the performance bias caused by the total number of training documents.
https://doi.org/10.3743/KOSIM.2012.29.2.205 KSCI
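The BOW limitation described in the abstract above can be illustrated with a short sketch. The example sentences and the helper name `bag_of_words` are hypothetical, and whitespace tokenization is an assumption made only for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Represent a document as term counts, ignoring order and context."""
    return Counter(text.lower().split())

# The same surface form "bank" appears in two different thematic contexts.
finance = bag_of_words("the bank raised the interest rate")
river = bag_of_words("the boat reached the river bank")

same_feature = finance["bank"] == river["bank"]
```

Both documents receive an identical feature value for "bank", even though the term carries a different meaning in each, which is the gap that context-based term profiles aim to close.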

Utilizing Local Bilingual Embeddings on Korean-English Law Data (한국어-영어 법률 말뭉치의 로컬 이중 언어 임베딩)

Choi, Soon-Young;Matteson, Andrew Stuart;Lim, Heui-Seok
- Journal of the Korea Convergence Society
- /
- v.9 no.10
- /
- pp.45-53
- /
- 2018
Recently, studies on bilingual word embedding have been gaining much attention. However, bilingual word embedding involving Korean is not actively pursued because a sizable, high-quality corpus is difficult to obtain. Local embeddings that can be applied to specific domains are relatively rare. Additionally, multi-word vocabulary is problematic because translation pairs lack one-to-one word-level correspondence. In this paper, we crawl 868,163 paragraphs from a Korean-English law corpus and propose three mapping strategies for word embedding. These strategies address the aforementioned issues, including multi-word translation, and improve translation pair quality on paragraph-aligned data. We demonstrate a twofold increase in translation pair quality compared to the global bilingual word embedding baseline.
https://doi.org/10.15207/JKCS.2018.9.10.045 KSCI

A Process Decomposition Strategy for Qualitative Fault Diagnosis of Large-scale Processes (대형공정의 정성적 이상진단을 위한 공정분할전략)

Lee Gibaek
- Journal of the Korean Institute of Gas
- /
- v.4 no.4 s.12
- /
- pp.42-49
- /
- 2000
Because of the size and complexity of chemical processes, it is very difficult to build a diagnostic system for a whole process. A systematic approach is therefore required to decompose a large-scale process into sub-processes and then diagnose them. This paper proposes a method for minimizing the knowledge base and enabling flexible diagnosis in qualitative fault diagnosis based on the Fault-Effect Tree model. The system can be decomposed for flexible diagnosis, size reduction of the knowledge base, and consistent construction of a complex knowledge base. A new node type, the gate-variable, is introduced to connect the cause-effect relationships of the sub-processes. For on-line diagnosis, off-line analysis is performed to construct the Fault-Effect Trees of the gate-variables as well as their activation conditions. The on-line diagnosis strategy is modified to yield the same diagnosis result as diagnosis without system decomposition. The proposed method is illustrated with a fault diagnosis system for a large-scale boiler plant.

Anaphoricity Determination of Zero Pronouns for Intra-sentential Zero Anaphora Resolution (문장 내 영 조응어 해석을 위한 영대명사의 조응성 결정)

Kim, Kye-Sung;Park, Seong-Bae;Park, Se-Young;Lee, Sang-Jo
- Journal of KIISE:Software and Applications
- /
- v.37 no.12
- /
- pp.928-935
- /
- 2010
Identifying the referents of omitted elements in a text is an important task for many natural language processing applications, such as machine translation and information extraction. These omitted elements are often called zero anaphors or zero pronouns, and are regarded as one of the most common forms of reference. However, since not all zero elements refer to explicit objects that occur in the same text, recent work on zero anaphora resolution has attempted to identify the anaphoricity of zero pronouns. This paper focuses on intra-sentential anaphoricity determination of subject zero pronouns, which occur frequently in Korean. Unlike previous studies based on pair-wise comparisons, this study attempts to determine the intra-sentential anaphoricity of zero pronouns by directly learning the structure of clauses in which either non-anaphoric or inter-sentential subject zero pronouns occur. The proposed method outperforms baseline methods, and anaphoricity determination of zero pronouns will play an important role in resolving zero anaphora.
KSCI

Applying SeqGAN Algorithm to Software Bug Repair (소프트웨어 버그 정정에 SeqGAN 알고리즘을 적용)

Yang, Geunseok;Lee, Byungjeong
- Journal of Internet Computing and Services
- /
- v.21 no.5
- /
- pp.129-137
- /
- 2020
Recently, software size and program code complexity have increased as software has been applied to various fields. Accordingly, program bugs inevitably occur, and the cost of software maintenance is increasing. In open source projects, developers spend a lot of time debugging when resolving an assigned bug report. To address this problem, we apply the SeqGAN algorithm to software bug repair. Specifically, the SeqGAN model is trained on the source code, and similar open-source code is also used during training. To evaluate the suitability of a generated candidate patch, a fitness function is applied, and if all test cases pass, the bug repair is considered successful. To evaluate the efficiency of the proposed model, we compared it with a baseline, and the proposed model showed better repair performance.
https://doi.org/10.7472/jksii.2020.21.5.129 KSCI
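The acceptance criterion in the abstract above (a candidate patch succeeds only if all test cases pass) can be sketched as follows. The abstract does not define the fitness function, so scoring a candidate by the fraction of passing test cases is an assumption, and `fitness`, `is_repaired`, and the toy candidates are hypothetical names used only for illustration.

```python
def fitness(candidate, test_cases):
    """Return the fraction of test cases the candidate function passes."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate simply fails that test case
    return passed / len(test_cases)

def is_repaired(candidate, test_cases):
    """A candidate patch counts as a successful repair only if every test passes."""
    return fitness(candidate, test_cases) == 1.0

# Hypothetical test suite for an addition function: (arguments, expected result).
tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]

def buggy(a, b):
    return a - b  # original defect: subtracts instead of adding

def patched(a, b):
    return a + b  # candidate patch

buggy_score = fitness(buggy, tests)
patched_ok = is_repaired(patched, tests)
```

A graded score such as this lets partially correct candidates be ranked, while the all-tests-pass check decides whether a repair is accepted.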

