• Title/Summary/Keyword: Hadoop Environment

Search Result 91, Processing Time 0.015 seconds

Optimization and Performance Analysis of Distributed Parallel Processing Platform for Terminology Recognition System (전문용어 인식 시스템을 위한 분산 병렬 처리 플랫폼 최적화 및 성능평가)

  • Choi, Yun-Soo;Lee, Won-Goo;Lee, Min-Ho;Choi, Dong-Hoon;Yoon, Hwa-Mook;Song, Sa-kwang;Jung, Han-Min
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.10
    • /
    • pp.1-10
    • /
    • 2012
  • Many statistical methods have been adapted for terminology recognition to improve its accuracy. However, since previous studies have been carried out in a single core or a single machine, they have difficulties in real-time analysing explosively increasing documents. In this study, the task where bottlenecks occur in the process of terminology recognition is classified into linguistic processing in the process of 'candidate terminology extraction' and collection of statistical information in the process of 'terminology weight assignment'. A terminology recognition system is implemented and experimented to address each task by means of the distributed parallel processing-based MapReduce. The experiments were performed in two ways; the first experiment result revealed that distributed parallel processing by means of 12 nodes improves processing speed by 11.27 times as compared to the case of using a single machine and the second experiment was carried out on 1) default environment, 2) multiple reducers, 3) combiner, and 4) the combination of 2)and 3), and the use of 3) showed the best performance. Our terminology recognition system contributes to speed up knowledge extraction of large scale science and technology documents.