• Title/Summary/Keyword: Computing Dictionary


A study on procedures of search and seize in digital data

  • Kim, Woon Go
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.2
    • /
    • pp.133-139
    • /
    • 2017
  • Today, the activities of individuals and corporations depend on digital technology, and society is moving toward what is called the fourth industrial revolution. Since the traces left by crimes committed in a digital society must likewise be found in digital form, criminal procedure inevitably relies more and more on digital evidence. On the other hand, as computing shifts to cloud environments in which many users share the virtual computing resources of service providers, a search for evidence in cloud resources risks sweeping in data unrelated to the crime and infringing on the basic rights of persons not involved in the proceedings, so the admissibility of digital data in criminal procedure is limited. Considering these two aspects of digital evidence, this point should be fully taken into account both in ex ante control, at the stage of issuing the seizure warrant, and in ex post control, when judging admissibility after the warrant is executed. Although some argue that ex ante control is useless, lenient ex ante control is needed so that ex post control can be realized through the judgment of admissibility. In other words, more effort than ever is needed, including legislation, to guarantee proper criminal procedure in line with the digital age.

Automatic Construction of Class Hierarchies and Named Entity Dictionaries using Korean Wikipedia (한국어 위키피디아를 이용한 분류체계 생성과 개체명 사전 자동 구축)

  • Bae, Sang-Joon;Ko, Young-Joong
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.4
    • /
    • pp.492-496
    • /
    • 2010
  • Wikipedia, an open encyclopedia, contains immense human knowledge written by thousands of volunteer editors, and its reliability is high. In this paper, we propose to automatically construct a Korean named entity dictionary using several features of Wikipedia. First, we generate class hierarchies using the class information from each Wikipedia article. Second, the title of each article is mapped into our class hierarchies, and we calculate the entropy value of the root node of each class hierarchy. Finally, we construct a high-quality named entity dictionary by removing the class hierarchies whose entropy value exceeds a threshold. Our experiments achieved an overall F1-measure of 81.12% (precision: 83.94%, recall: 78.48%).
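The filtering step this abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the hierarchy representation, the label names, and the default threshold are assumptions for the example.

```python
import math
from collections import Counter

def class_entropy(labels):
    """Shannon entropy (bits) of the class labels gathered under one root."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def build_dictionary(hierarchies, threshold=1.0):
    """Keep only hierarchies whose root-node entropy stays at or below the
    threshold; a high entropy means the hierarchy mixes entity classes and
    would pollute the named entity dictionary."""
    dictionary = {}
    for root, entries in hierarchies.items():
        # entries: list of (article_title, mapped_class) pairs
        labels = [cls for _, cls in entries]
        if class_entropy(labels) <= threshold:
            for title, cls in entries:
                dictionary[title] = cls
    return dictionary
```

A pure hierarchy (entropy 0) survives any threshold, while an evenly mixed two-class hierarchy (entropy 1 bit) is dropped once the threshold falls below 1.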

A Generation and Matching Method of Normal-Transient Dictionary for Realtime Topic Detection (실시간 이슈 탐지를 위한 일반-급상승 단어사전 생성 및 매칭 기법)

  • Choi, Bongjun;Lee, Hanjoo;Yong, Wooseok;Lee, Wonsuk
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.13 no.5
    • /
    • pp.7-18
    • /
    • 2017
  • Recently, the number of SNS users has rapidly increased with the growth of the smart device industry, and the amount of generated data is increasing exponentially. On Twitter, the text data generated by users is a key research subject because it covers events, accidents, product reputations, and brand images. Twitter has become a channel through which users receive and exchange information, and an important characteristic of Twitter is its real-time nature. Events such as earthquakes, floods, and suicides must be analyzed quickly so that responses can be applied immediately. Analyzing an event requires collecting the tweets related to it, but it is difficult to find all related tweets using ordinary keywords alone. To address this problem, this paper proposes a generation and matching method of normal-transient dictionaries for real-time topic detection. A normal dictionary consists of general keywords related to an event (for the event suicide: death, die, hang oneself, etc.), whereas a transient dictionary consists of short-lived keywords related to the event (for suicide: names and information of celebrities, information on social issues). Experimental results show that matching with the two dictionaries finds more tweets related to an event than a simple keyword search.
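The two-dictionary matching idea can be sketched in a few lines. This is only an illustration of the lookup step under assumed inputs (plain keyword lists and lowercase substring matching); the paper's actual generation of the transient dictionary is not shown.

```python
def match_tweets(tweets, normal_dict, transient_dict):
    """Return tweets containing a keyword from either dictionary.
    normal_dict holds stable event keywords; transient_dict holds
    short-lived keywords (e.g. currently trending names)."""
    keywords = {kw.lower() for kw in normal_dict} | {kw.lower() for kw in transient_dict}
    matched = []
    for tweet in tweets:
        text = tweet.lower()
        if any(kw in text for kw in keywords):
            matched.append(tweet)
    return matched
```

Because the transient dictionary is merged in at match time, refreshing it as an issue spikes widens recall without retraining anything else.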

An Electronic Dictionary Structure supporting Truncation Search (절단검색을 지원하는 전자사전 구조)

  • 김철수
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.9 no.1
    • /
    • pp.60-69
    • /
    • 2003
  • In an information retrieval system (IRS) based on an inverted file structure, related documents can be retrieved when the searcher knows the complete words of the search fields. However, there are many cases in which the searcher knows only a partial string of a word rather than the complete word. In such cases, if the searcher can search for index terms that include the known partial string, the related documents can still be retrieved. Furthermore, when few documents are retrieved, a method is needed to find all documents whose index terms include the known partial string. To satisfy these requirements, the searcher should be able to construct a query that uses the term truncation method, and the IRS should have an electronic dictionary that supports truncated search terms. This paper designs and implements an electronic dictionary (ED) structure that supports truncation search efficiently. The ED guarantees very fast, constant search time for a term entry and its inversely alphabetized entry, regardless of the number of inserted words. To support truncation search efficiently we use a trie structure, and to achieve fast search time we use arrays. When searching a truncated term, the search time is reduced by minimizing the length of the string to be expanded.
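A right-truncation search over a trie might look like the sketch below. It is a simplified stand-in for the paper's structure: dict-based child links replace the array-based nodes the paper uses, and the second trie over reversed entries (for left truncation via the inversely alphabetized entry) is only mentioned, not built.

```python
class TrieNode:
    __slots__ = ("children", "terminal")
    def __init__(self):
        self.children = {}   # the paper uses arrays here for O(1) child access
        self.terminal = False

class TruncationDictionary:
    """Forward trie over entries; a mirror trie over reversed entries would
    answer left-truncated ('*xyz') queries the same way (not shown)."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def truncation_search(self, prefix):
        """Answer a right-truncated query 'prefix*': walk to the prefix node,
        then expand only the subtree below it, which keeps the expanded
        string length minimal."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results, stack = [], [(node, prefix)]
        while stack:
            n, s = stack.pop()
            if n.terminal:
                results.append(s)
            for ch, child in n.children.items():
                stack.append((child, s + ch))
        return results
```

Walking the prefix costs time proportional to the prefix length, independent of dictionary size, which matches the constant-time lookup property the abstract claims for entry search.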

Mutational Data Loading Routines for Human Genome Databases: the BRCA1 Case

  • Van Der Kroon, Matthijs;Ramirez, Ignacio Lereu;Levin, Ana M.;Pastor, Oscar;Brinkkemper, Sjaak
    • Journal of Computing Science and Engineering
    • /
    • v.4 no.4
    • /
    • pp.291-312
    • /
    • 2010
  • Over the last decades a large amount of research has been done in the genomics domain, which has generated, and is still generating, terabytes, if not exabytes, of information stored globally in a very fragmented way. Different databases use different ways of storing the same data, resulting in undesired redundancy and restrained information transfer. In addition, keeping the existing databases consistent and maintaining data integrity is mainly left to human intervention, which is costly in both time and money, as well as error prone. Identifying a fixed conceptual dictionary in the form of a conceptual model thus seems crucial. This paper presents an effort to integrate the mutational data from the established genomic data source HGMD into the conceptual-model-driven database HGDB, thereby providing useful lessons to improve the already existing conceptual model of the human genome.

Multilingual Automatic Translation Based on UNL: A Case Study for the Vietnamese Language

  • Thuyen, Phan Thi Le;Hung, Vo Trung
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.5 no.2
    • /
    • pp.77-84
    • /
    • 2016
  • In the field of natural language processing, Universal Networking Language (UNL) has been used by various researchers as an interlingual approach to automatic machine translation. The UNL system consists of two main components: EnConverter, which converts text from a source language into UNL, and DeConverter, which converts UNL into a target language. Many projects are currently researching how to apply UNL to different languages. In this paper, we introduce the tools built as UNL applications and discuss how to reuse them to encode a Vietnamese sentence into UNL expressions and to decode UNL expressions into a Vietnamese sentence. Testing was done with about 1,000 Vietnamese sentences, using a dictionary of 4,573 entries and 3,161 rules. In addition, we compare the proportion of sentences translated by a direct method (Google Translate) with that translated by the UNL-based approach.

Human Activities Recognition Based on Skeleton Information via Sparse Representation

  • Liu, Suolan;Kong, Lizhi;Wang, Hongyuan
    • Journal of Computing Science and Engineering
    • /
    • v.12 no.1
    • /
    • pp.1-11
    • /
    • 2018
  • Human activity recognition is a challenging task due to the complexity of human movements and the variety with which different subjects perform the same action. This paper presents a recognition algorithm that uses skeleton information generated from depth maps. The feature vector is produced by concatenating motion features with a temporal constraint feature, and an improved fast classifier based on sparse representation is proposed by reducing the dictionary scale. The method is shown to be effective at recognizing different activities on the UTD-MHAD dataset, and comparison results indicate superior performance over several existing methods.
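The sparse-representation classification scheme underlying such methods can be sketched as below. This is a generic illustration, not the paper's classifier: a least-squares coding step stands in for a proper l1 sparse solver, and the toy dictionary and labels are invented for the example.

```python
import numpy as np

def src_classify(D, labels, y):
    """Sparse-representation-style classification: code the query y over the
    dictionary D (columns = training feature vectors, one label per column),
    then assign the class whose atoms alone give the smallest residual.
    Least squares here is a simplification of the usual l1 minimization."""
    x, *_ = np.linalg.lstsq(D, y, rcond=None)
    best_cls, best_res = None, np.inf
    for cls in set(labels):
        mask = np.array([l == cls for l in labels])
        x_cls = np.where(mask, x, 0.0)          # keep only this class's coefficients
        res = np.linalg.norm(y - D @ x_cls)     # class-wise reconstruction residual
        if res < best_res:
            best_cls, best_res = cls, res
    return best_cls
```

Reducing the dictionary scale, as the abstract describes, shrinks `D` and therefore the cost of the coding step, which is where the speedup of the improved classifier would come from.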

A Word Embedding used Word Sense and Feature Mirror Model (단어 의미와 자질 거울 모델을 이용한 단어 임베딩)

  • Lee, JuSang;Shin, JoonChoul;Ock, CheolYoung
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.4
    • /
    • pp.226-231
    • /
    • 2017
  • Word representation, an important area in natural language processing (NLP) using machine learning, is a method that represents a word not as raw text but as a distinguishable symbol. Existing word embedding methods employ large corpora to ensure that related words are positioned near each other. However, corpus-based word embedding requires several corpora because of word-occurrence frequency and the growing number of words. In this paper, word embedding is performed using dictionary definitions and semantic relationship information (hypernyms and antonyms). Words are trained using the feature mirror model (FMM), a modification of Skip-Gram (Word2Vec). Words with similar senses obtain similar vectors, and it was furthermore possible to distinguish the vectors of antonymous words.

A High-Speed Korean Morphological Analysis Method based on Pre-Analyzed Partial Words (부분 어절의 기분석에 기반한 고속 한국어 형태소 분석 방법)

  • Yang, Seung-Hyun;Kim, Young-Sum
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.3
    • /
    • pp.290-301
    • /
    • 2000
  • Most morphological analysis methods require repetitive procedures of input character code conversion, segmentation and lemmatization of constituent morphemes, and filtering of candidate results through lexicon look-ups, which causes run-time inefficiency. To alleviate this, many systems have introduced the notion of 'pre-analysis' of words. However, a method based on a pre-analysis dictionary of surface words also has a critical drawback in practical application, because the size of the dictionary grows indefinitely to cover all words. This paper hybridizes the two extreme approaches to overcome the problems of both, and presents a method of morphological analysis based on pre-analysis of partial words. Under this hybrid scheme, most computational overheads, such as segmentation and lemmatization of morphemes, are shifted to the build process of the pre-analysis dictionaries, and run-time dictionary look-ups are greatly reduced, enhancing the run-time performance of the system. Moreover, additional overheads such as input character code conversion can also be avoided, because the method relies on no graphemic processing.
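The run-time side of such a scheme reduces to dictionary look-ups, which can be sketched as a greedy longest-match walk over pre-analyzed partial words. This is an assumed simplification for illustration: the dictionary contents, the analysis-string notation, and the greedy strategy (rather than the paper's actual matching) are all hypothetical.

```python
def analyze(word, partial_dict):
    """Analyze a word by greedy longest-match over a pre-analysis dictionary
    of partial words; each hit returns its stored morpheme analysis, so no
    segmentation or lemmatization happens at run time."""
    analyses, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest partial word first
            piece = word[i:j]
            if piece in partial_dict:
                analyses.append(partial_dict[piece])
                i = j
                break
        else:
            return None  # unknown partial word: would fall back to full analysis
    return analyses
```

All the expensive work sits in building `partial_dict` offline; at run time each word costs only a handful of hash look-ups, which is the performance shift the abstract describes.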

Practical Password-Authenticated Three-Party Key Exchange

  • Kwon, Jeong-Ok;Jeong, Ik-Rae;Lee, Dong-Hoon
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.2 no.6
    • /
    • pp.312-332
    • /
    • 2008
  • Password-authenticated key exchange (PAKE) protocols in the literature typically assume a password that is shared between a client and a server. PAKE has been applied in various environments, especially in "client-server" applications of remotely accessed systems, such as e-banking. With the rapid development of modern communication environments, such as ad-hoc networks and ubiquitous computing, it has become customary to construct a secure peer-to-peer channel, which is quite a different paradigm from the existing ones. In such a peer-to-peer channel, it is much more common for users not to share a password with each other. In this paper, we consider password-authenticated key exchange in the three-party setting, where two users share passwords not with each other but only with one server. The users establish a session key using their different passwords with the help of the server. We propose an efficient password-authenticated key exchange protocol with different passwords that achieves forward secrecy in the standard model. The protocol requires parties to memorize only human-memorable passwords; all other information necessary to run the protocol is made public. The protocol is also lightweight, i.e., it requires only three rounds and four modular exponentiations per user. In fact, this amount of computation and this number of rounds are comparable to those of the most efficient password-authenticated key exchange protocols in the random-oracle model. Dispensing with random oracles in the protocol does not require the security of any expensive signature schemes or zero-knowledge proofs.