• Title/Summary/Keyword: Document Classification

A Method for Spam Message Filtering Based on Lifelong Machine Learning (Lifelong Machine Learning 기반 스팸 메시지 필터링 방법)

  • Ahn, Yeon-Sun;Jeong, Ok-Ran
    • Journal of IKEEE / v.23 no.4 / pp.1393-1399 / 2019
  • With the rapid growth of the Internet and the ease of sending and receiving data, millions of indiscriminate advertising SMS messages are sent every day. Although spam words are still often blocked manually, the emergence of machine learning has prompted active research into filtering spam in various ways. However, spam words and patterns change constantly to evade filters, so existing machine learning mechanisms cannot detect or adapt to new words and patterns. The concept of lifelong learning recently emerged to overcome these limitations: it uses existing knowledge to keep learning new knowledge continuously. In this paper, we propose a spam filtering method that uses an ensemble of naive Bayes, the technique most commonly used in document classification, and LLML (Lifelong Machine Learning). We validate the performance of lifelong learning by applying the ELLA model together with the naive Bayes classifier most commonly used in existing spam filters.
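The idea of a filter that keeps absorbing new spam vocabulary can be illustrated with a minimal sketch. This is not the paper's ELLA-based ensemble; it is only a plain multinomial naive Bayes whose counts are updated one message at a time, and the class name `IncrementalNaiveBayes`, the labels `"spam"`/`"ham"`, and the whitespace tokenizer are all assumptions made for illustration.

```python
import math
from collections import Counter


class IncrementalNaiveBayes:
    """Multinomial naive Bayes whose counts can be updated as messages arrive.

    Illustrative only: this is not the ELLA model named in the abstract.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def update(self, text, label):
        """Fold one labeled message into the running counts."""
        tokens = text.lower().split()
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def predict(self, text):
        """Return the label with the highest smoothed log-posterior."""
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        n_labels = len(self.word_counts)
        scores = {}
        for label, counts in self.word_counts.items():
            # Smoothed log prior, so a class with no documents is still defined.
            score = math.log(
                (self.doc_counts[label] + self.alpha)
                / (total_docs + n_labels * self.alpha)
            )
            # One extra vocabulary slot is reserved for unseen words.
            denom = sum(counts.values()) + self.alpha * (len(self.vocab) + 1)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


model = IncrementalNaiveBayes()
model.update("win free prize now", "spam")
model.update("free cash offer", "spam")
model.update("meeting at noon tomorrow", "ham")
model.update("lunch tomorrow with team", "ham")

# A newly observed spam pattern is absorbed without retraining from scratch.
model.update("fr33 crypto bonus", "spam")
```

Because the model stores only counts, adapting to a new pattern costs a single `update` call; a lifelong learning system such as the one the abstract describes would additionally transfer knowledge across tasks, which this sketch does not attempt.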

Texture Feature-Based Language Identification Using Gabor Feature and Wavelet-Domain BDIP and BVLC Features (Gabor 특징과 웨이브렛 영역의 BDIP와 BVLC 특징을 이용한 질감 특징 기반 언어 인식)

  • Jang, Ick-Hoon;Lee, Woo-Shin;Kim, Nam-Chul
    • Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.4 / pp.76-85 / 2011
  • In this paper, we propose a texture feature-based language identification method using Gabor features and wavelet-domain BDIP (block difference of inverse probabilities) and BVLC (block variance of local correlation coefficients) features. In the proposed method, Gabor and wavelet transforms are first applied to a test image. The wavelet subbands are then denoised by Donoho's soft-thresholding. Next, the magnitude operator is applied to the Gabor image, and the BDIP and BVLC operators are applied to the wavelet subbands. Moments of the Gabor magnitude image and of each BDIP and BVLC subband are computed and fused into a feature vector. In classification, the WPCA (whitened principal component analysis) classifier, which is usually adopted in face identification, searches for the training feature vector most similar to the test feature vector. Experimental results show that the proposed method yields excellent language identification with a rather low feature dimension for a document image DB.

A Study on Converting the Theological Thesaurus to the Ontology by Using SKOS (SKOS를 이용한 신학 시소러스의 온톨로지로의 변환에 관한 연구)

  • Yoo, Yeong-Jun
    • Journal of Korean Library and Information Science Society / v.43 no.3 / pp.143-163 / 2012
  • To convert a human-authored thesaurus to an ontology, the first step is to translate the thesaurus using SKOS, which is suitable for conversion to an ontology and was chosen as an international standard by the W3C. SKOS is suitable for converting thesauri, subject headings, or classification systems to ontologies, but a web language such as RDF/XML is needed to describe the ontology. RDF/XML is so difficult to read and write that RDFa embedded in an HTML document, or Turtle, which is easier to write and read, may be needed instead. Along with the description using SKOS, this research experimentally constructed the ontology using the ontology construction program Protégé 4.2. In addition to the basic thesaurus relationships (equivalence, hierarchical, and associative relationships), this research includes the transitive hierarchical relationships suggested by SKOS.

Grading System of Movie Review through the Use of An Appraisal Dictionary and Computation of Semantic Segments (감정어휘 평가사전과 의미마디 연산을 이용한 영화평 등급화 시스템)

  • Ko, Min-Su;Shin, Hyo-Pil
    • Korean Journal of Cognitive Science / v.21 no.4 / pp.669-696 / 2010
  • Assuming that the whole meaning of a document is a composition of the meanings of its parts, this paper studies the automatic grading of movie reviews that contain sentimental expressions. Grading is accomplished by calculating the values of semantic segments and performing data classification for each review. The ARSSA (Automatic Rating System for Sentiment analysis using an Appraisal dictionary) system is an effort to model decision-making processes in a manner similar to that of the human mind. It aims to resolve the discontinuity between the numerical ranking and the textual rationalization present in the binary structure of the current review rating system: {rate: review}. This model can be realized by performing analysis on the abstract means extracted from each review. The performance of the system was measured experimentally with a 10-fold cross-validation test on 1000 reviews obtained from the Naver Movie site. The system achieved an 85% F1 score when compared to predefined values using a predefined appraisal dictionary.
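For readers unfamiliar with the evaluation protocol mentioned above, the following is a minimal sketch of a 10-fold split and an F1 computation. It assumes a binary positive/negative labeling and a simple interleaved fold assignment; the abstract does not state how the paper averaged F1 across grades or how it assigned folds, so both helper functions (`f1_score`, `k_fold_indices`) are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_items, k=10):
    """Yield (train_idx, test_idx) pairs; fold i holds out items with index % k == i."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


# With 1000 reviews and k=10, each fold trains on 900 reviews and tests on 100.
folds = list(k_fold_indices(1000, k=10))
```

Each review is held out exactly once, so the per-fold scores can be averaged into a single estimate.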

An EXPRESS-to-XML Translator (EXPRESS 데이타를 XML 문서로 변환하는 번역기)

  • 이기호;김혜진
    • Journal of KIISE:Computing Practices and Letters / v.8 no.6 / pp.746-755 / 2002
  • EXPRESS is a product information description language that can be interpreted by both humans and software. Product data written in EXPRESS can be exchanged between heterogeneous systems. However, the amount of software that can use EXPRESS is limited, and that software is expensive to use. XML makes it possible to update and manage data on the Web. Because the Web is comparatively easier to use and access than other tools, data represented in XML need not depend on specific applications or systems and can be used for data exchange. Therefore, if EXPRESS-driven data is represented in XML, data can be exchanged more actively, widely, and easily. In this work, a method for translating EXPRESS documents into XML DTD and XML Schema is proposed. By classifying all EXPRESS syntax elements and considering the complex cases these elements give rise to, translation rules that produce XML DTD and XML Schema are suggested. A translator that implements these rules is also presented.

Analysis the job-ability demanded in the security and secretary (경호비서에게 요구되는 업무능력 분석)

  • Park, Ok-cheol
    • Journal of the Society of Disaster Information / v.4 no.2 / pp.40-50 / 2008
  • The purpose of this study is to analyze the job of security secretaries and, by deriving the abilities the job requires from the results of that analysis, to offer useful basic data for human resources management. To this end, the study examined the duties of security secretaries and the abilities needed to perform them. The research participants were 5 security secretaries, each with a career of more than 5 years after graduating from a college department of security and secretarial service, and in-depth interviews were conducted with them about their jobs. The interview data from the 5 participants were examined with the classification analysis used by Spradley (1980). To enhance the dependability and validity of the research, the study twice held an expert meeting composed of 2 persons with doctoral degrees in security service studies and 1 person with a master's degree in secretarial information studies. Through this process, the following conclusions were drawn. The job of a security secretary includes safety work, housing and residence work, health care work, interpersonal relationship work, assistant's work, document and office work, general affairs, and education work. The abilities required for the job are martial arts ability, risk management ability, strict confidentiality, decision-making ability, information processing ability, foreign language proficiency, understanding of other cultures, communication skills, and office work ability.

Concept Network-based Personalized Web Search Systems (개념 네트워크 기반 사용자 인지형 웹 검색 시스템)

  • Yune, Hong-June;Noh, Joon-Ho;Kim, Han-Joon;Lee, Byung-Jeong;Kang, Soo-Yong;Chang, Jae-Young
    • Journal of Internet Computing and Services / v.12 no.2 / pp.63-73 / 2011
  • In general, conventional search engines return the same results for the same query regardless of who submits it; such techniques do not consider users' characteristics. To overcome this problem, we need a new form of personalized search that returns customized results according to users' preferences. In this paper, we propose a concept network profile-based personalized web search system in which a concept network is built to accumulate users' characteristics. The concept network-based user profile is used to expand initial search queries to achieve personalized search. The concept network is a network structure of concepts in which a concept is generated whenever a query is submitted, and each concept can be defined as a set of keywords extracted from the selected documents. Furthermore, we have improved the concept network by augmenting the intent keywords of each concept with a set of classification tags, called a folksonomy, assigned to each document. As an additional personalized search technique, we propose a new re-ranking method that analyzes the degree of overlap among search results.
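Profile-based query expansion of the kind described above can be sketched as follows. The abstract does not specify how a concept is matched to a query or how many keywords are added, so this sketch makes assumptions: each concept is an ordered keyword list, the best concept is the one sharing the most terms with the query, and the hypothetical `expand_query` helper appends at most `max_extra` of its remaining keywords.

```python
def expand_query(query, concepts, max_extra=3):
    """Append keywords from the stored concept that best overlaps the query.

    `concepts` is a list of keyword lists (a stand-in for a user's concept
    network); the overlap-count matching rule is an illustrative assumption.
    """
    q_terms = query.lower().split()
    q_set = set(q_terms)
    # Pick the concept sharing the most terms with the query.
    best = max(concepts, key=lambda c: len(q_set & set(c)), default=None)
    if best is None or not (q_set & set(best)):
        return q_terms  # no matching concept: leave the query unchanged
    extra = [kw for kw in best if kw not in q_set][:max_extra]
    return q_terms + extra


profile = [
    ["python", "pandas", "dataframe"],
    ["python", "snake", "reptile", "zoo"],
]
expanded = expand_query("python zoo", profile)
```

Two users submitting the same query with different stored concepts would therefore send different expanded queries to the underlying search engine, which is the personalization effect the abstract describes.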

A Construction of an Ontology Server based Intelligent Retrieval using XMDR (XMDR을 이용한 지능형 검색 온톨로지 서버 구축)

  • Hwang Chi-Gon;Jung Gye-Dong
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.8B / pp.549-561 / 2005
  • As Internet and network technologies have developed, e-commerce has become more complex and more varied. For the exchange of metadata and data between heterogeneous database systems, this paper uses XML Schema as proposed by the W3C; XML Schema can represent the metadata and data of a relational database system as a structured XML document. Because it supports various primitive data types, it can adequately reflect the data types that a relational database system offers. However, current e-commerce systems run on heterogeneous platforms, which makes mutual interchange and management difficult. To solve these problems, we construct a standard ontology, which defines the relations among product classifications and a standard for expressing properties, and a location ontology, which provides e-commerce information about products. Applying this ontology information to a search system enables efficient search by offering the information that customers need. The combination of these ontologies with product classification category information is called XMDR; this paper proposes a method for constructing an ontology server that introduces XMDR into a product search system for efficient search.

The Scope of Practice for Registered Nurses in 64 South Korean Laws

  • Choi, Sungkyoung;Jang, Seung Gyeong;Lee, Won
    • Journal of Korean Academy of Nursing / v.49 no.6 / pp.760-770 / 2019
  • Purpose: The role of registered nurses is expanding in scope as the healthcare paradigm shifts from acute, hospital-based care to community and population-based care. Given this paradigm shift, this study explores the legal aspects of the role of a registered nurse. Methods: We used document analysis to extract laws and legal orders related to nursing from the entirety of Korean law. Using a textualist approach, we examined the contents with a framework developed in this study based on Clark's role classification of community nurses. Results: A total of 119 items related to nursing were derived from 64 laws. Of these, 71.4% can be performed by people in multiple types of occupations, including nurses. Analysis of required qualifications showed that 45.4% of the 119 items required additional qualifications besides a registered nurse license. Analysis of workplace and activity type demonstrated that 26.1% of the 119 items were related to medical institutions, with nurses performing mostly a "Client-oriented role." More than half (68.9%) were related to non-medical institutions, with nurses performing mostly a "Delivery-oriented role." Some, however, did not stipulate the nurse's roles clearly. Conclusion: To match the enhanced scope and responsibilities of registered nurses and to appropriately recognize, guide, and hold these nurses accountable, laws and policy must reflect these changes. In doing so, the updated laws and policies will ultimately serve as a basis for improving the quality and safety of nursing services.

A Leveling and Similarity Measure using Extended AHP of Fuzzy Term in Information System (정보시스템에서 퍼지용어의 확장된 AHP를 사용한 레벨화와 유사성 측정)

  • Ryu, Kyung-Hyun;Chung, Hwan-Mook
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.2 / pp.212-217 / 2009
  • Methods for learning hierarchical relations between domain terms include rule-based and statistics-based learning methods. In this paper, we propose a leveling and similarity measure for fuzzy terms in information systems using the extended AHP. In the proposed method, we extract fuzzy terms from documents, categorize them into an ontology structure, and level the priority of the fuzzy terms using the extended AHP according to their specificity. The extended AHP integrates the weighted values and relative importance of fuzzy terms assigned by multiple decision-makers. We then compute the semantic similarity of fuzzy terms using the min operation of fuzzy sets, Dice's coefficient, and a combined Min+Dice's coefficient method, determine the final alternative fuzzy term, and compare the three similarity measures. The results show that the proposed method gives more definite classification performance than conventional methods, and we expect to apply it in the natural language processing field.
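The similarity measures named above can be sketched in a few lines. Dice's coefficient and the fuzzy-set min operation are standard; how the paper combines them into its "Min+Dice" method is not stated in the abstract, so the `fuzzy_dice` function below is only one plausible reading (Dice's formula with the min-based fuzzy intersection in the numerator) and should be treated as an assumption.

```python
def dice_coefficient(a, b):
    """Dice's coefficient for two crisp sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def fuzzy_min_intersection(mu_a, mu_b):
    """Fuzzy intersection: membership of each shared element is the minimum."""
    return {x: min(mu_a[x], mu_b[x]) for x in mu_a.keys() & mu_b.keys()}


def fuzzy_dice(mu_a, mu_b):
    """Assumed combination: Dice's formula over fuzzy membership sums."""
    inter = sum(fuzzy_min_intersection(mu_a, mu_b).values())
    total = sum(mu_a.values()) + sum(mu_b.values())
    return 2 * inter / total if total else 1.0


# Hypothetical membership degrees for two fuzzy terms.
term_a = {"hot": 0.8, "warm": 0.4}
term_b = {"hot": 0.6, "cold": 0.2}
similarity = fuzzy_dice(term_a, term_b)
```

Here the only shared element is "hot", whose intersection membership is min(0.8, 0.6) = 0.6, so the fuzzy Dice value is 2 × 0.6 / (1.2 + 0.8) = 0.6.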