• Title/Summary/Keyword: Natural language processing tools (자연어 처리 도구)

47 search results

An English Essay Scoring System Based on Grammaticality and Lexical Cohesion (문법성과 어휘 응집성 기반의 영어 작문 평가 시스템)

  • Kim, Dong-Sung;Kim, Sang-Chul;Chae, Hee-Rahk
    • Korean Journal of Cognitive Science / v.19 no.3 / pp.223-255 / 2008
  • In this paper, we introduce an automatic system for scoring English essays. The system comprises three main components: a spelling checker, a grammar checker, and a lexical cohesion checker. We used resources such as WordNet, the Link Grammar parser, and Roget's Thesaurus for these components. The usefulness of an automatic scoring system depends on its reliability. To measure reliability, we compared the results of automatic scoring with those of manual scoring, on the basis of the Kappa statistic and the Multi-facet Rasch Model. The statistical data obtained from the comparison showed that the scoring system is as reliable as professional human graders. The system deals with textual units rather than sentential units and checks not only the formal properties of a text but also its contents.
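The reliability check described above compares automatic and human scores with, among other things, the Kappa statistic. A minimal sketch of that kind of agreement check with scikit-learn follows; the score vectors and the choice of quadratic weighting are illustrative assumptions, not data from the paper.

```python
# Minimal sketch: agreement between automatic and human essay scores.
# The score vectors are hypothetical placeholders.
from sklearn.metrics import cohen_kappa_score

human_scores  = [3, 4, 2, 5, 3, 4, 1, 3]   # grades from a human rater
system_scores = [3, 4, 3, 5, 3, 3, 1, 3]   # grades from the automatic system

# Quadratic weighting credits near-misses on an ordinal grading scale.
kappa = cohen_kappa_score(human_scores, system_scores, weights="quadratic")
print(f"Quadratic-weighted kappa: {kappa:.3f}")
```

Values near 1 indicate near-perfect agreement between the two sets of scores.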


A Semantic Web-enabled Wiki System for Ontology Construction and Sharing (온톨로지 생성과 공유를 위한 시맨틱 웹 기반 위키 시스템)

  • Kim Hyun-Joo;Choi Joong-Min
    • Journal of KIISE:Software and Applications / v.33 no.8 / pp.703-717 / 2006
  • The Semantic Web aims to develop a universal medium in which machine-processable semantic information can be represented and shared; it is therefore important to publish ontologies that represent such semantic information on the Web and make them available to multiple parties. However, current ontology authoring tools do not operate on the Web, which makes it difficult to publish ontologies directly to the Web and to create and edit them collaboratively with other people. This paper proposes a framework that facilitates ontology construction and sharing and makes it easy to publish ontologies on the Web. A Wiki is one framework for collaborative construction and sharing of knowledge on the Web, and Wiki content consists of natural language text plus a simple markup language for visualization. To support better collaboration in creating and sharing ontologies, this paper presents a Semantic Wiki that adds Semantic Web features to the existing Wiki system. The Semantic Wiki framework facilitates collaborative ontology authoring and sharing among people and, at the same time, allows agent software to easily manage the ontology information. The Semantic Wiki system ultimately supports various tasks, including semantic views, semantic navigation, and semantic queries.
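Since the abstract describes ontologies being authored, shared, and queried on the Web, a minimal sketch of that workflow with the rdflib library may help; rdflib is not named in the paper, and the namespace, class, and instance names below are illustrative assumptions.

```python
# Minimal sketch: authoring and querying a small ontology with rdflib.
# Namespace, class, and instance names are illustrative only.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/wiki/")
g = Graph()
g.bind("ex", EX)

# Declare a class and an instance, as a semantic wiki page annotation might.
g.add((EX.Professor, RDF.type, RDFS.Class))
g.add((EX.Alice, RDF.type, EX.Professor))
g.add((EX.Alice, RDFS.label, Literal("Alice")))

# A semantic query over the shared ontology.
for row in g.query("SELECT ?who WHERE { ?who a ex:Professor }", initNs={"ex": EX}):
    print(row.who)   # http://example.org/wiki/Alice
```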

Constructing Tagged Corpus and Cue Word Patterns for Detecting Korean Hedge Sentences (한국어 Hedge 문장 인식을 위한 태깅 말뭉치 및 단서어구 패턴 구축)

  • Jeong, Ju-Seok;Kim, Jun-Hyeouk;Kim, Hae-Il;Oh, Sung-Ho;Kang, Sin-Jae
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.6 / pp.761-766 / 2011
  • A hedge is a linguistic device for expressing uncertainty. Hedges are used in a sentence when the writer is uncertain about, or has doubts about, its contents. Because of this uncertainty, sentences with hedges are considered non-factual. Many applications need to determine whether a sentence is factual or not: detecting hedges benefits information retrieval, information extraction, and question-answering systems, which use non-hedge sentences as their targets to obtain more accurate results. In this paper, we constructed a Korean hedge corpus, extracted generalized hedge cue-word patterns from the corpus, and then used them to detect hedges. In our experiments, we achieved an F1-measure of 78.6%.
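A minimal sketch of cue-word-pattern matching for hedge detection follows; the patterns are English stand-ins for illustration, whereas the paper builds generalized Korean cue-word patterns from its tagged corpus.

```python
# Minimal sketch: flag sentences that match hedge cue-word patterns.
# The patterns are illustrative English stand-ins, not the paper's Korean patterns.
import re

HEDGE_PATTERNS = [
    r"\bmay\b", r"\bmight\b", r"\bpossibly\b",
    r"\bsuggest(s|ed)?\b", r"\bit is likely that\b",
]

def is_hedge_sentence(sentence: str) -> bool:
    """Return True if any hedge cue pattern matches the sentence."""
    return any(re.search(p, sentence, re.IGNORECASE) for p in HEDGE_PATTERNS)

print(is_hedge_sentence("These results suggest the drug may be effective."))  # True
print(is_hedge_sentence("The drug reduced symptoms in all patients."))        # False
```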

Verification on stock return predictability of text in analyst reports (애널리스트 보고서 텍스트의 주가예측력에 대한 검증)

  • Young-Sun Lee;Akihiko Yamada;Cheol-Won Yang;Hohsuk Noh
    • The Korean Journal of Applied Statistics / v.36 no.5 / pp.489-499 / 2023
  • As analyst reports have become widely shared, they have become a useful tool for reducing differences in financial information among market participants. The quantitative information in analyst reports has been used in many ways to predict stock returns. However, there are relatively few domestic studies on the power of the text in analyst reports to predict stock returns. We test the stock return predictability of analyst report text by creating variables that represent the TONE of the text. To overcome the limitations of approaches that assume a linear model, we use a random-forest-based F-test.
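A rough sketch of the general idea of deriving a TONE variable from report text and relating it to returns with a random forest; the word lists, data, and plain regression setup are assumptions for illustration and do not reproduce the paper's random-forest-based F-test.

```python
# Rough sketch: a dictionary-based TONE variable plus a random forest.
# Word lists, column names, and data are hypothetical placeholders.
import re

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

POSITIVE = {"improve", "growth", "beat", "upgrade"}
NEGATIVE = {"decline", "risk", "miss", "downgrade"}

def tone(text: str) -> float:
    """(positive - negative) word count, scaled by text length."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(len(words), 1)

reports = pd.DataFrame({
    "text": ["Earnings beat estimates and growth should improve.",
             "Margin decline and downgrade risk remain."],
    "future_return": [0.04, -0.02],   # hypothetical next-period returns
})
reports["tone"] = reports["text"].apply(tone)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(reports[["tone"]], reports["future_return"])
```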

An SAO-based Text Mining Approach for Technology Roadmapping Using Patent Information (기술로드맵핑을 위한 특허정보의 SAO기반 텍스트 마이닝 접근 방법)

  • Choi, Sung-Chul;Kim, Hong-Bin;Yoon, Jang-Hyeok
    • Journal of Technology Innovation / v.20 no.1 / pp.199-234 / 2012
  • Technology roadmaps (TRMs) are considered an essential tool for strategic technology planning and management. Recently, rapidly evolving technological trends and severe technological competition have made TRMs more important than ever, because a TRM acts as a "map" that aligns organizational objectives with the relevant technologies. However, constructing and managing TRMs is costly and time-consuming because it relies on the qualitative and intuitive knowledge of human experts. Enhancing the productivity of TRM development is therefore one of the major concerns in technology planning. In this regard, this paper proposes a technology roadmapping approach based on functions, a concept that covers the objectives, structure, and effects of a technology and that is represented as Subject-Action-Object (SAO) structures extracted from patent text by natural language processing. We expect the proposed method to broaden experts' technological horizons in the technology planning process and to help construct TRMs efficiently, with reduced time and cost.
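A minimal sketch of Subject-Action-Object (SAO) extraction with a dependency parser follows; it uses spaCy and assumes the small English model en_core_web_sm is installed, and it does not reproduce the paper's own extraction pipeline or patent corpus.

```python
# Minimal sketch: pull (subject, action, object) triples out of a sentence
# with a dependency parse. Assumes the en_core_web_sm model is installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_sao(sentence: str):
    """Yield (subject, action, object) triples found in the sentence."""
    doc = nlp(sentence)
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects  = [c for c in token.children if c.dep_ == "dobj"]
            for s in subjects:
                for o in objects:
                    yield (s.text, token.lemma_, o.text)

for triple in extract_sao("The sensor measures the battery temperature."):
    print(triple)   # ('sensor', 'measure', 'temperature')
```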


Comparing the 2015 with the 2022 Revised Primary Science Curriculum Based on Network Analysis (2015 및 2022 개정 초등학교 과학과 교육과정에 대한 비교 - 네트워크 분석을 중심으로 -)

  • Jho, Hunkoog
    • Journal of Korean Elementary Science Education / v.42 no.1 / pp.178-193 / 2023
  • The aim of this study was to investigate differences in the achievement standards from the 2015 to the 2022 revised national science curriculum and to present the implications for science teaching under the revised curriculum. Achievement standards relevant to primary science education were therefore extracted from the national curriculum documents; conceptual domains in the two curricula were analyzed for differences; various kinds of centrality were computed; and the Louvain algorithm was used to identify clusters. These methods revealed that, in the revised compared with the preceding curriculum, the total number of nodes and links had increased, while the number of achievement standards had decreased by 10 percent. In the revised curriculum, keywords relevant to procedural skills and behavior received more emphasis and were connected to collaborative learning and digital literacy. Observation, survey, and explanation remained important, but varied in application across the fields of science. Clustering revealed that the number of categories in each field of science remained mostly unchanged in the revised compared with the previous curriculum, but that each category highlighted different skills or behaviors. Based on those findings, some implications for science instruction in the classroom are discussed.
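A minimal sketch of the kind of keyword-network analysis the study describes, centrality scores plus Louvain clustering, using NetworkX; the co-occurrence edges below are hypothetical placeholders, not the study's data.

```python
# Minimal sketch: centrality and Louvain clustering on a keyword network.
# Requires NetworkX >= 3.0; the edges are hypothetical placeholders.
import networkx as nx

edges = [
    ("observation", "explanation"), ("observation", "survey"),
    ("digital literacy", "collaboration"), ("collaboration", "explanation"),
]
G = nx.Graph(edges)

degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)
clusters = nx.community.louvain_communities(G, seed=0)

print(sorted(degree, key=degree.get, reverse=True)[:3])  # most central keywords
print(clusters)                                          # keyword clusters
```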

Voice Synthesis Detection Using Language Model-Based Speech Feature Extraction (언어 모델 기반 음성 특징 추출을 활용한 생성 음성 탐지)

  • Seung-min Kim;So-hee Park;Dae-seon Choi
    • Journal of the Korea Institute of Information Security & Cryptology / v.34 no.3 / pp.439-449 / 2024
  • Recent rapid advances in voice generation technology have enabled the natural synthesis of voices from text alone. However, this progress has also increased malicious activity such as voice phishing (vishing), in which generated voices are exploited for criminal purposes. Numerous models have been developed to detect synthesized voices, typically by extracting features from the voice and using these features to estimate the likelihood that the voice was generated. This paper proposes a new model for extracting voice features to address misuse cases arising from generated voices. It uses a deep-learning-based audio codec model and the pre-trained natural language processing model BERT to extract novel voice features. To assess the suitability of the proposed feature extraction model for voice detection, four generated-voice detection models were built with the extracted features and evaluated. For comparison, three voice detection models based on the Deepfeature approach proposed in previous studies were evaluated against the other models in terms of accuracy and EER. The model proposed in this paper achieved an accuracy of 88.08% and a low EER of 11.79%, outperforming the existing models. These results confirm that the voice feature extraction method introduced in this paper can be an effective tool for distinguishing generated from real voices.
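Since the paper reports both accuracy and EER, a minimal sketch of how an equal error rate can be computed from detector scores with scikit-learn may be useful; the labels and scores below are hypothetical placeholders, not outputs of the paper's model.

```python
# Minimal sketch: equal error rate (EER) from detector scores.
# Labels and scores are hypothetical placeholders.
import numpy as np
from sklearn.metrics import roc_curve

labels = np.array([0, 0, 0, 1, 1, 1])                    # 1 = synthesized voice
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.55])  # detector confidence

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
# EER is the operating point where false-positive and false-negative rates cross.
eer = fpr[np.nanargmin(np.abs(fpr - fnr))]
print(f"EER: {eer:.3f}")
```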