Search | Korea Science

Kim, EunKwang;Jeon, SangJun;Han, JaeHyeok;Lee, MinWook;Lee, Sangjin
- Journal of the Korea Institute of Information Security & Cryptology
- /
- v.25 no.6
- /
- pp.1485-1494
- /
- 2015
Traditionally, data hiding has been done mainly in such a way that insert the data into the large-capacity multimedia files. However, the document files of the previous versions of Microsoft Office 2003 have been used as cover files as their structure are so similar to a File System that it is easy to hide data in them. If you open a compound-document file which has a secret message hidden in it with MS Office application, it is hard for users who don't know whether a secret message is hidden in the compound-document file to detect the secret message. This paper presents an analysis of Compound-File Binary Format features exploited in order to hide data and algorithms to detect the data hidden with these exploits. Studying methods used to hide data in unused area, unallocated area, reserved area and inserted streams led us to develop an algorithm to aid in the detection and examination of hidden data.
https://doi.org/10.13089/JKIISC.2015.25.6.1485 인용 PDF KSCI HTML

Lee Jin-Kwan;Jung Kyu-cheol;Lee Tae-hun;Park Ki-hong
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.8 no.8
- /
- pp.1769-1776
- /
- 2004
Effective keyword extraction is important in the information search system and there are several ways to select proper keyword in many keywords. Among them, DER Structure for AC Algorithm to search single keyword, can search multiple keywords but it has time complexity problem. In this paper, we developed a algorithm, "EDER structure" by expanding standalone search table based on DER structure search method to improve time complexity. We tested the algorithm using 500 text files and found that EDER structure is more efficient than DER structure for AC for keyword posting result and time complexity that 0.2 second for EDER and 0.6 second for DER structure,structure,
PDF KSCI

Cho, Sung Hye;Lee, Sang Jin
- KIPS Transactions on Computer and Communication Systems
- /
- v.6 no.2
- /
- pp.87-94
- /
- 2017
Microsoft Office is an office suite of applications developed by Microsoft. Recently users with malicious intent customize Office files as a container of the Malware because MS Office is most commonly used word processing program. To attack target system, many of malicious office files using a variety of skills and techniques like macro function, hiding shell code inside unused area, etc. And, people usually use two techniques to detect these kinds of malware. These are Signature-based detection and Sandbox. However, there is some limits to what it can afford because of the increasing complexity of malwares. Therefore, this paper propose methods to detect malicious MS office files in Computer forensics' way. We checked Macros and potential problem area with structural analysis of the MS Office file for this purpose.
https://doi.org/10.3745/KTCCS.2017.6.2.87 인용 PDF KSCI

Choi, Hyeong Kyu;Kang, Ah Reum
- Journal of the Korea Society of Computer and Information
- /
- v.27 no.5
- /
- pp.149-156
- /
- 2022
Recently, there have been many reports of document-type malicious code injecting malicious code into Microsoft Office files. Document-type malicious code is often hidden by encoding the malicious code in the document. Therefore, document-type malware can easily bypass anti-virus programs. We found that malicious code was inserted into the Visual Basic for Applications (VBA) macro, a function supported by Microsoft Office. Malicious codes such as shellcodes that run external programs and URL-related codes that download files from external URLs were identified. We selected 354 keywords repeatedly appearing in malicious Microsoft Office files and defined the number of times each keyword appears in the body of the document as a feature. We performed machine learning with SVM, naïve Bayes, logistic regression, and random forest algorithms. As a result, each algorithm showed accuracies of 0.994, 0.659, 0.995, and 0.998, respectively.
https://doi.org/10.9708/jksci.2022.27.05.149 인용 PDF KSCI HTML

Ahn, Tae-Sung;Yim, Joong-Su;Kim, Myung-Hoon;Ahn, Woo-Ram;Lee, Kyung-Il
- Annual Conference on Human and Language Technology
- /
- 2003.10d
- /
- pp.113-118
- /
- 2003
기존 문서 검색 시스템의 경우 단순히 문서 내에서 텍스트를 추출한 후 그 텍스트를 색인, 검색하는 형태를 가지고 있었다. 본 논문에서는 MS Word, Excel, HWP 등 다양한 형태의 문서에서 텍스트, 표, 이미지, 차트, 동영상 등의 문서 개체를 분석, 색인하고 이를 검색하는 시스템의 개발 방법을 제외하였다. 제안된 시스템은 문서의 내부 자료 구조를 CDML(Composite Document Markup Language)로 변환하고, 이를 색인, 저장함으로 기존의 전문 검색 시스템의 한계를 효과적으로 극복했으며, 문서 내의 검색 대상 개체로 자동 이동하고 하일라이팅 시키는 기술을 구현함으로 사용자 편익성을 높였다. 개발된 시스템의 성능을 평가한 결과, 다양한 문서 형식에 대해 평균 97% 이상의 CDML변환 성공률과 개체 검색 성공률을 보였으며, 이진 파일에서 직접 개체를 추출함으로 매우 높은 분석 및 색인 속도가 달성되었음을 확인할 수 있었다. 본 논문에서 소개된 새로운 패러다임의 문서 검색 솔루션을 통해 다양한 기술적 상업적 파급 효과가 기대되고 있다.
PDF