• Title/Summary/Keyword: 오타 교정 (typo correction)


Preprocessing technique for natural language processing considering the form of characters used in malicious comments (악성 댓글에 사용된 문자의 형태를 고려한 한국어 자연어처리를 위한 전처리 기법)

  • Kim, Hae-Soo;Kim, Mi-hui
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.543-545 / 2022
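  • Controversy over malicious comments has been unrelenting in recent years, and natural language processing is being used as one way to address the problem. Such comments appear especially often on social media and in online communities, where users do not write standard Hangul but mix in their own slang, and some sentences are even built by mixing in non-Hangul characters. These sentences differ in form from the data on which existing models were trained, and the more non-Hangul content a sentence contains, the less accurate the model's predictions become. To address this, this paper proposes a preprocessing technique that combines AI-based image classification with word-spacing and typo correction.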
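The abstract does not describe the pipeline in code, so the following is only a rough sketch of the kind of character-level normalization it motivates, under stated assumptions. It measures how much of a comment is non-Hangul, replaces look-alike characters with Hangul jamo, and composes an initial consonant plus vowel into one syllable using the standard Unicode formula. The `LOOKALIKE` table is a hypothetical stand-in for the paper's AI image classifier, and the word-spacing and typo-correction steps are omitted.

```python
# Hangul Compatibility Jamo in Unicode composition order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 vowels

# HYPOTHETICAL look-alike table standing in for the paper's image classifier:
# each non-Hangul character maps to the Hangul jamo it visually resembles.
LOOKALIKE = {"0": "ㅇ", "7": "ㄱ", "L": "ㄴ", "E": "ㅌ"}


def is_hangul(ch: str) -> bool:
    cp = ord(ch)
    return (0xAC00 <= cp <= 0xD7A3      # precomposed syllables
            or 0x1100 <= cp <= 0x11FF   # Hangul Jamo
            or 0x3130 <= cp <= 0x318F)  # Hangul Compatibility Jamo


def non_hangul_ratio(text: str) -> float:
    """Fraction of non-space characters that are not Hangul."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(not is_hangul(c) for c in chars) / len(chars)


def compose_pairs(text: str) -> str:
    """Merge an initial consonant followed by a vowel into one syllable.

    Final consonants are not handled in this sketch.
    """
    out = []
    i = 0
    while i < len(text):
        a = text[i]
        b = text[i + 1] if i + 1 < len(text) else ""
        if a in CHOSEONG and b and b in JUNGSEONG:
            l, v = CHOSEONG.index(a), JUNGSEONG.index(b)
            out.append(chr(0xAC00 + (l * 21 + v) * 28))
            i += 2
        else:
            out.append(a)
            i += 1
    return "".join(out)


def normalize(text: str, table=LOOKALIKE) -> str:
    """Replace look-alike characters with jamo, then compose syllables."""
    replaced = "".join(table.get(ch, ch) for ch in text)
    return compose_pairs(replaced)
```

For example, `normalize("0ㅏ")` rewrites the digit zero as ㅇ and composes it with ㅏ into the single syllable 아. A comment's `non_hangul_ratio` could be used to decide which inputs need this step at all.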

Study on Automatic Mapping Method for Reference of Scholarly Papers (학술논문의 참고문헌 자동매핑 방법에 관한 연구)

  • Han, Jeong-Min;Jang, Hyun-Chul;Kim, Jin-Hyun;Yea, Sang-Jun;Kim, Sang-Kyun;Kim, Chul;Song, Mi-Young
    • Journal of Information Management / v.41 no.3 / pp.155-173 / 2010
  • As scholarship advances and research topics diversify, researchers in every field keenly feel the need to find the information they require precisely and quickly, at any time. This study presents a method for building an automatic mapping system that compares and analyzes duplicated data and reports the results, based on an effective reference-extraction method, together with a method for correcting incorrectly used Chinese characters using a Traditional Korean Medicine dictionary. With this approach, duplicated reference data and Chinese-character errors can be corrected. This matters because large numbers of references continue to be extracted from newly published papers.
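A minimal sketch of the duplicate-mapping idea, assuming that references can be matched on a normalized title string; the abstract does not state the paper's actual extraction or matching rules. `HANJA_FIXES` is a hypothetical two-entry stand-in for the Traditional Korean Medicine dictionary, mapping Japanese simplified (shinjitai) forms to the traditional characters. The function names are illustrative.

```python
import re
import unicodedata
from collections import defaultdict

# HYPOTHETICAL variant-to-standard table standing in for the
# Traditional Korean Medicine dictionary described in the abstract.
HANJA_FIXES = {"気": "氣", "薬": "藥"}


def correct_hanja(text: str, fixes=HANJA_FIXES) -> str:
    """Replace variant Chinese characters with their standard forms."""
    return "".join(fixes.get(ch, ch) for ch in text)


def reference_key(title: str) -> str:
    """Normalize a reference title so trivially different copies compare equal."""
    # NFKC folds full-width and other compatibility forms; casefold ignores case.
    text = unicodedata.normalize("NFKC", correct_hanja(title)).casefold()
    # Drop punctuation, underscores, and whitespace; Hangul and Han stay as word characters.
    return re.sub(r"[\W_]+", "", text)


def map_references(refs):
    """Group raw reference strings by normalized key (one group per distinct work)."""
    groups = defaultdict(list)
    for ref in refs:
        groups[reference_key(ref)].append(ref)
    return dict(groups)
```

In this sketch, "Study on 気 Theory." and "study on 氣 theory" fall into the same group, so they would be mapped to one record. A real system would also need to compare authors, venues, and years rather than titles alone.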

Automatic Inter-Phoneme Similarity Calculation Method Using PAM Matrix Model (PAM 행렬 모델을 이용한 음소 간 유사도 자동 계산 기법)

  • Kim, Sung-Hwan;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association / v.12 no.3 / pp.34-43 / 2012
  • Measuring the similarity between two strings has applications in many areas, such as information retrieval, spell checking, and spam filtering. Computing the similarity between Korean strings with dynamic-programming methods first requires defining the similarity between phonemes. However, existing methods are limited in that they rely on manually set similarity scores. In this paper, we propose a method that automatically calculates inter-phoneme similarity from a given set of variant words using a PAM-like probabilistic model. The method first finds pairs of similar words in the given word set and derives variation rules from the text-alignment results of those pairs. Similarity scores are then calculated from the frequencies of variations between different phonemes. In our experiments, the method improves sensitivity by 10.1%~14.1% over a simple match-mismatch scoring scheme and by 8.1%~11.8% over a manually set inter-phoneme similarity scheme, at a specificity of 77.2%~80.4%.
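The abstract does not give the exact scoring formula, so the following is a sketch under assumptions rather than the authors' method. It uses a standard PAM-style log-odds score, s(a, b) = log(q_ab / (p_a p_b)), computed from phoneme-pair frequencies, and plugs those scores into a Needleman-Wunsch global alignment as the dynamic-programming step. It assumes that the similar word pairs are already aligned position by position and have equal length; the function names, the pseudocount, and the gap penalty are hypothetical.

```python
import math
from collections import Counter


def log_odds_scores(aligned_pairs, alphabet, pseudocount=1.0):
    """PAM-style log-odds scores s(a, b) = log(q_ab / (p_a * p_b)).

    aligned_pairs: (word1, word2) phoneme sequences of equal length,
    ASSUMED already aligned position by position.
    """
    counts = Counter()
    for w1, w2 in aligned_pairs:
        for a, b in zip(w1, w2):
            # Count both orders so the resulting score matrix is symmetric.
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    cells = [(a, b) for a in alphabet for b in alphabet]
    # Pseudocounts keep unseen pairs from producing log(0).
    total = sum(counts[c] for c in cells) + pseudocount * len(cells)
    q = {c: (counts[c] + pseudocount) / total for c in cells}  # pair frequencies
    p = {a: sum(q[(a, b)] for b in alphabet) for a in alphabet}  # background frequencies
    return {(a, b): math.log(q[(a, b)] / (p[a] * p[b])) for (a, b) in cells}


def global_alignment_score(s, t, scores, gap=-2.0):
    """Needleman-Wunsch: best global alignment score of phoneme strings s and t."""
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + scores[(s[i - 1], t[j - 1])],  # match/substitute
                dp[i - 1][j] + gap,                               # gap in t
                dp[i][j - 1] + gap,                               # gap in s
            )
    return dp[n][m]
```

Phoneme pairs that often vary into each other in the training pairs receive positive scores, while pairs that rarely co-occur receive negative ones. The alignment therefore treats frequent variants as near-matches instead of plain mismatches, which is the contrast the abstract draws with match-mismatch scoring.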