• Title/Summary/Keyword: named data

Re-defining Named Entity Type for Personal Information De-identification and A Generation method of Training Data (개인정보 비식별화를 위한 개체명 유형 재정의와 학습데이터 생성 방법)

  • Choi, Jae-hoon;Cho, Sang-hyun;Kim, Min-ho;Kwon, Hyuk-chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.206-208 / 2022
  • As the big data industry has grown significantly in recent years, concern about privacy violations caused by personal information leakage has increased. There have been attempts to automate de-identification through named entity recognition in natural language processing. In this paper, named entity recognition data is constructed semi-automatically by identifying sentences in Korean Wikipedia that contain de-identification information. Compared with using general named entity recognition data, this can reduce the cost of learning information that is not subject to de-identification. In addition, it has the advantage of minimizing the additional rule-based and statistical systems needed to classify de-identification information in the output. The named entity recognition data proposed in this paper is classified into twelve categories, which include de-identification information such as medical records and family relationships. In an experiment using the generated dataset, KoELECTRA achieved a score of 0.87796 and RoBERTa a score of 0.88.

Evaluating and Mitigating Malicious Data Aggregates in Named Data Networking

  • Wang, Kai;Bao, Wei;Wang, Yingjie;Tong, Xiangrong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.9 / pp.4641-4657 / 2017
  • Named Data Networking (NDN) has emerged as one of the most promising architectures for the future Internet. However, like the traditional IP-based networking paradigm, NDN may not evade some typical network threats such as malicious data aggregates (MDA), which may lead to bandwidth exhaustion, traffic congestion, and router overload. This paper first analyzes the damage caused by MDA using realistic simulations on a large-scale network topology, showing that the threat is not merely theoretical, and then designs a fine-grained MDA mitigation mechanism (MDAM) based on cooperation between routers via alert messages. Simulation results show that MDAM can significantly reduce Pending Interest Table overload in the routers involved and restore a normal data-returning rate and data-retrieval delay.

A Study on FIFA Partner Adidas of 2022 Qatar World Cup Using Big Data Analysis

  • Kyung-Won, Byun
    • International Journal of Internet, Broadcasting and Communication / v.15 no.1 / pp.164-170 / 2023
  • The purpose of this study is to analyze big data on the Adidas brand, which participated in the 2022 Qatar World Cup as a FIFA partner, in order to extract useful information, semantic connections, and context from unstructured data. To this end, the study collected data about Adidas generated during the World Cup from major portal sites. According to the text mining analysis, 'Adidas' was the most frequent keyword, appearing 3,340 times, followed by 'World Cup', 'Qatar World Cup', 'Soccer', 'Lionel Messi', 'Qatar', 'FIFA', 'Korea', and 'Uniform'. In the TF-IDF ranking, the top terms were 'Qatar World Cup', 'Soccer', 'Lionel Messi', 'World Cup', 'Uniform', 'Qatar', 'FIFA', 'Ronaldo', 'Korea', and 'Nike'. Semantic network analysis and CONCOR analysis yielded four groups. First, Cluster A was named 'Qatar World Cup Sponsor' because it grouped words such as 'Adidas', 'Nike', 'Qatar World Cup', 'Sponsor', 'Sponsor Company', 'Marketing', 'Nation', 'Launch', 'Official', 'Commemoration', and 'National Team'. Second, Cluster B was named 'Group stage' because it grouped words such as 'Qatar', 'Uruguay', 'FIFA', and 'group stage'. Third, Cluster C was named 'Winning' because it grouped words such as 'World Cup Winning', 'Champion', 'France', 'Argentina', 'Lionel Messi', 'Advertising', and 'Photograph'. Fourth, Cluster D was named 'Official Ball' because it grouped words such as 'Official Ball', 'World Cup Official Ball', 'Soccer Ball', 'All Times', 'Al Rihla', 'Public', and 'Technology'.
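The abstract above reports both a raw keyword-frequency ranking and a TF-IDF ranking. The following is a minimal sketch of how the two can differ, assuming raw term counts and idf = log(N / df), which is only one common TF-IDF variant; the study does not state its exact weighting. The function name `tf_idf` and the three toy documents are hypothetical and are not data from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs is a list of token lists. Assumes raw term frequency and
    idf = log(N / df); other TF-IDF variants weight terms differently.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores


# Hypothetical toy corpus (not the study's data).
docs = [
    ["adidas", "world cup", "adidas", "messi"],
    ["adidas", "nike", "uniform"],
    ["adidas", "world cup", "qatar"],
]

# Raw keyword frequency over the whole corpus.
freq = Counter(t for d in docs for t in d)
scores = tf_idf(docs)

print(freq.most_common(3))
print(sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True))
```

In this toy corpus the term that occurs in every document is the most frequent keyword yet receives a TF-IDF score of zero, which illustrates how a frequency ranking and a TF-IDF ranking can order the same terms differently.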

Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

  • Hwang, Sangwon;Hong, Jang-Eui;Nam, Young-Kwang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.3 / pp.1639-1658 / 2019
  • Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems have identified named entities using statistical methods based on prior information or linguistic features; however, such methods are unable to recognize unregistered or unlearned objects. In this paper, a method is proposed to extract objects, such as technologies, theories, or person names, by analyzing the collocation relationship between certain words that simultaneously appear around specific words in the abstracts of academic journals. The method is executed as follows. First, the data is preprocessed using data cleaning and sentence detection to separate the text into single sentences. Then, part-of-speech (POS) tagging is applied to the individual sentences. After this, the appearance and collocation information of the other POS tags is analyzed, excluding the entity candidates, such as nouns. Finally, an entity recognition model is created based on analyzing and classifying the information in the sentences.

Named entity recognition using transfer learning and small human- and meta-pseudo-labeled datasets

  • Kyoungman Bae;Joon-Ho Lim
    • ETRI Journal / v.46 no.1 / pp.59-70 / 2024
  • We introduce a high-performance named entity recognition (NER) model for written and spoken language. To overcome challenges related to labeled data scarcity and domain shifts, we use transfer learning to leverage our previously developed KorBERT as the base model. We also adopt a meta-pseudo-label method using a teacher/student framework with labeled and unlabeled data. Our model presents two modifications. First, the student model is updated with an average loss from both human- and pseudo-labeled data. Second, the influence of noisy pseudo-labeled data is mitigated by considering feedback scores and updating the teacher model only when below a threshold (0.0005). We achieve the target NER performance in the spoken language domain and improve that in the written language domain by proposing a straightforward rollback method that reverts to the best model based on scarce human-labeled data. Further improvement is achieved by adjusting the label vector weights in the named entity dictionary.
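The two modifications and the rollback step described in the abstract above can be sketched as plain decision rules. This is a minimal sketch, not the authors' implementation: all function names are hypothetical, losses and scores are plain floats rather than model outputs, and the reading that the teacher is updated when the feedback score is below the threshold is an assumption, since the abstract does not say explicitly what is compared against 0.0005.

```python
FEEDBACK_THRESHOLD = 0.0005  # threshold value quoted in the abstract


def student_loss(human_loss, pseudo_loss):
    """Average the losses from human-labeled and pseudo-labeled batches."""
    return (human_loss + pseudo_loss) / 2.0


def should_update_teacher(feedback_score, threshold=FEEDBACK_THRESHOLD):
    """Assumption: update the teacher only when the feedback score
    is below the threshold."""
    return feedback_score < threshold


def best_checkpoint(dev_scores):
    """Return (step, score) of the best model so far.

    dev_scores holds hypothetical per-step scores on the scarce
    human-labeled data; rolling back means restoring this step.
    """
    best_step, best_score = 0, dev_scores[0]
    for step, score in enumerate(dev_scores):
        if score > best_score:
            best_step, best_score = step, score
    return best_step, best_score


print(student_loss(0.4, 0.2))
print(should_update_teacher(0.0001), should_update_teacher(0.001))
print(best_checkpoint([0.80, 0.85, 0.83]))
```

In a real training loop these rules would wrap optimizer steps on the student and teacher networks; the sketch only isolates the control logic.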

Named Entity Boundary Recognition Using Hidden Markov Model and Hierarchical Information (은닉 마르코프 모델과 계층 정보를 이용한 개체명 경계 인식)

  • Lim, Heui-Seok
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.2 / pp.182-187 / 2006
  • This paper proposes a method for named entity boundary recognition using a hidden Markov model (HMM) and ontology information about biological named entities. We use a smoothing method based on 31 word features and hierarchical information to alleviate the sparse data problem in the HMM. The GENIA corpus version 2.1 was used to train and evaluate the proposed boundary recognition system. The experimental results show that the proposed system outperforms the previous system, which did not use hierarchical ontology information or the smoothing technique. The system also shows improved execution time for boundary recognition.

Encoding Dictionary Feature for Deep Learning-based Named Entity Recognition

  • Ronran, Chirawan;Unankard, Sayan;Lee, Seungwoo
    • International Journal of Contents / v.17 no.4 / pp.1-15 / 2021
  • Named entity recognition (NER) is a crucial task for NLP, which aims to extract information from texts. To build NER systems, deep learning (DL) models are learned with dictionary features by mapping each word in the dataset to dictionary features and generating a unique index. However, this technique might generate noisy labels, which pose significant challenges for the NER task. In this paper, we proposed DL-dictionary features, and evaluated them on two datasets, including the OntoNotes 5.0 dataset and our new infectious disease outbreak dataset named GFID. We used (1) a Bidirectional Long Short-Term Memory (BiLSTM) character and (2) pre-trained embedding to concatenate with (3) our proposed features, named the Convolutional Neural Network (CNN), BiLSTM, and self-attention dictionaries, respectively. The combined features (1-3) were fed through BiLSTM - Conditional Random Field (CRF) to predict named entity classes as outputs. We compared these outputs with other predictions of the BiLSTM character, pre-trained embedding, and dictionary features from previous research, which used the exact matching and partial matching dictionary technique. The findings showed that the model employing our dictionary features outperformed other models that used existing dictionary features. We also computed the F1 score with the GFID dataset to apply this technique to extract medical or healthcare information.
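The abstract above contrasts its learned dictionary features with earlier exact-matching and partial-matching dictionary features. The sketch below shows one simple way such matching features could be computed and why they can be noisy. The definitions used here are assumptions: a token is EXACT if it lies inside a token span that equals a whole dictionary entry, and PARTIAL if it merely occurs in some entry. The function name, the labels, and the toy dictionary are hypothetical and do not come from the paper.

```python
def dictionary_features(tokens, dictionary, max_len=4):
    """Label each token as EXACT, PARTIAL, or NONE.

    Assumed definitions: EXACT means the token is part of a span that
    matches a full dictionary entry; PARTIAL means the token appears
    in some entry but no full entry matches around it.
    """
    entries = {tuple(e.lower().split()) for e in dictionary}
    vocab = {word for entry in entries for word in entry}
    lowered = [t.lower() for t in tokens]

    # Start from partial matches, then upgrade full-span matches.
    feats = ["PARTIAL" if t in vocab else "NONE" for t in lowered]
    for start in range(len(lowered)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(lowered):
                break
            if tuple(lowered[start:end]) in entries:
                for i in range(start, end):
                    feats[i] = "EXACT"
    return feats


# Hypothetical sentence and dictionary.
tokens = "Cases of avian influenza rose while influenza vaccines ran short".split()
dictionary = ["avian influenza", "Ebola virus disease"]
feats = dictionary_features(tokens, dictionary)

for token, feat in zip(tokens, feats):
    print(f"{token}\t{feat}")
```

The second occurrence of "influenza" receives a PARTIAL label even though no dictionary entry is mentioned there, which is the kind of noisy signal that matching-based dictionary features can introduce.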

Classification of whole body shape of the early 20s male

  • Cha, Su-Joung
    • Journal of the Korea Society of Computer and Information / v.24 no.3 / pp.113-122 / 2019
  • In this study, I analyzed body measurement data of men in their early 20s, who emphasize the importance of well-fitting clothes given the fashion for body-contact clothes. Through this, I tried to provide the basic data needed to make clothing for men in their early 20s. Using data from Size Korea's 7th Human Body Survey, 588 people aged 20-25 years were analyzed and classified into four types. Type 1 has a thick and short body, narrow ankles and calves, and thin legs; the hips do not sag, and the height is slightly short. I therefore named it 'short & thick body with bird legs'. Type 2 has broad shoulders, a slim and long body, and shoulders that do not sag. I therefore named it 'slim inverted triangular figure'. Type 3 is short in height, with a thin and short body and thick ankles and calves. I therefore named it 'short & thin body with thick legs'. Type 4 is tall, with narrow shoulders and sagging hips and shoulders. I therefore named it 'Long triangle'. To improve the fit of body-contact clothes, which reflect the recent trend in menswear, it is necessary to develop clothing prototypes by body type. Men in their 20s have the most ideal body shape after growth is complete, but they differ in the length and thickness of the trunk. If these differences are reflected in the apparel pattern system and used to make well-fitting ready-to-wear patterns, consumer satisfaction can be expected to increase.

A Named Data Networking Testbed with Global NDN Connection

  • Ni, Alexander;Lim, Huhnkuk
    • The Journal of Korean Institute of Communications and Information Sciences / v.40 no.12 / pp.2419-2426 / 2015
  • Named Data Networking (NDN) is one of the rapidly evolving future Internet architectures. This paper describes the installation, configuration, and several tests of our NDN testbed, built on the NDN platform, to show that it has been properly set up for interoperability with the global NDN testbed. The status of the global NDN testbed with our NDN node participating is also presented. To verify reachability over the NDN connection to the global NDN testbed, a latency result from an NDN ping test is presented.

Optimal Provider Mobility in Large-Scale Named-Data Networking

  • Do, Truong-Xuan;Kim, Younghan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.10 / pp.4054-4071 / 2015
  • Named-Data Networking (NDN) is one of the promising approaches for the Future Internet to cope with the explosion and current usage pattern of Internet traffic. Content provider mobility in NDN allows users to receive real-time traffic while the content providers are on the move. However, current solutions for managing these mobile content providers suffer from several issues, such as long handover latency, high cost, and non-optimal routing paths. In this paper, we survey the main approaches to provider mobility in NDN and propose an optimal scheme to support mobile content providers in a large-scale NDN domain. Our scheme predicts the movement of the provider and uses state information in the NDN forwarding plane to set up an optimal new routing path for mobile providers. Numerical analysis shows that our approach provides NDN users with better service access delay and lower total handover cost than current solutions.