Search | Korea Science

A Development Method of Framework for Collecting, Extracting, and Classifying Social Contents

Cho, Eun-Sook
- Journal of the Korea Society of Computer and Information / v.26 no.1 / pp.163-170 / 2021
As big data comes into use across industries, the big data market is expanding from hardware to infrastructure software and on to service software. In particular, it is growing into a large platform market that provides applications for interpreting the meaning of big data and for presenting analysis results through holistic, intuitive visualizations. Demand for extracting and analyzing big data from social media such as SNS is strong among both companies and individuals. However, despite this demand for collecting and analyzing social media data for user trend analysis and marketing, little research addresses the difficulty of dynamic interworking and the complexity of building and operating software platforms that arise from the heterogeneity of social media service interfaces. In this paper, we propose a method for developing a framework that runs the process from collection through extraction and classification of social media data. The proposed framework addresses the problem of heterogeneous social media data collection channels through the adapter pattern, and improves the accuracy of social topic extraction and classification through semantic association-based extraction and topic association-based classification techniques.
https://doi.org/10.9708/jksci.2021.26.01.163
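The adapter-pattern idea in this abstract can be illustrated with a minimal sketch. The abstract does not describe the framework's actual classes, so everything here is hypothetical: the `Post` record, the `SocialMediaAdapter` interface, and the two fake vendor clients are stand-ins chosen only to show how heterogeneous channel interfaces can be hidden behind one collection interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Post:
    """Common record produced by every adapter (hypothetical schema)."""
    source: str
    author: str
    text: str


class SocialMediaAdapter(ABC):
    """Target interface that the collector depends on (hypothetical)."""

    @abstractmethod
    def fetch_posts(self, keyword: str) -> List[Post]:
        ...


class FakeMicroblogClient:
    """Stand-in for one vendor SDK; returns dicts with its own field names."""

    def search(self, q: str) -> List[Dict[str, str]]:
        return [{"user": "alice", "body": f"{q} is trending"}]


class FakeForumClient:
    """Stand-in for another vendor SDK; returns tuples instead of dicts."""

    def find_threads(self, term: str) -> List[Tuple[str, str]]:
        return [("bob", f"Thoughts on {term}?")]


class MicroblogAdapter(SocialMediaAdapter):
    """Translates the microblog client's dicts into Post records."""

    def __init__(self, client: FakeMicroblogClient) -> None:
        self._client = client

    def fetch_posts(self, keyword: str) -> List[Post]:
        return [Post("microblog", r["user"], r["body"])
                for r in self._client.search(keyword)]


class ForumAdapter(SocialMediaAdapter):
    """Translates the forum client's tuples into Post records."""

    def __init__(self, client: FakeForumClient) -> None:
        self._client = client

    def fetch_posts(self, keyword: str) -> List[Post]:
        return [Post("forum", author, text)
                for author, text in self._client.find_threads(keyword)]


def collect(adapters: List[SocialMediaAdapter], keyword: str) -> List[Post]:
    """Collect from every channel through the shared interface."""
    posts: List[Post] = []
    for adapter in adapters:
        posts.extend(adapter.fetch_posts(keyword))
    return posts


posts = collect(
    [MicroblogAdapter(FakeMicroblogClient()), ForumAdapter(FakeForumClient())],
    "big data",
)
```

Under this design, supporting a new channel means writing one more adapter; `collect` and any downstream extraction or classification code stay unchanged.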

A Measurement of Lexical Relationship for Concept Network Based on Semantic Features (의미속성 기반의 개념망을 위한 어휘 연관도 측정)

Ock, Eun-Joo;Lee, Wang-Woo;Lee, Soo-Dong;Ock, Cheol-Young
- Annual Conference on Human and Language Technology / 2001.10d / pp.146-154 / 2001
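In this paper, we measure lexical relatedness for building a concept network, based on the distribution of semantic features that can be extracted from dictionary definitions. First, we extract semantic features from a corpus of about 112,000 dictionary definitions annotated with part-of-speech tags and sense tags. Extractable semantic features include nominals, adverbials, and predicates; as a first step, this paper targets predicates that stand in a modifying relation to nouns and carry an adnominal ending ('ㄴ/은/는'). After refining the roughly 45,000 extracted co-occurrence pairs, we measure the relatedness between nouns and predicates using mutual information (MI) from information theory. To mitigate data sparseness, the nouns and predicates in modifying relations were grouped into synonym sets centered on basic vocabulary. The lexical relatedness measured from this distribution of semantic features can be used to build a hierarchy among concepts by computing the degree to which they share semantic features.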
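As a rough illustration of the MI-based relatedness measure, the sketch below computes pointwise mutual information over noun and predicate co-occurrence pairs. It assumes the pointwise form, log2(P(x, y) / (P(x) P(y))), which the abstract does not confirm; the pairs are invented English placeholders, not data from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical (noun, modifying predicate) co-occurrence pairs.
pairs: List[Tuple[str, str]] = [
    ("flower", "red"), ("flower", "red"), ("flower", "beautiful"),
    ("sky", "blue"), ("sky", "blue"), ("sky", "beautiful"),
]


def pmi_scores(pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Return log2(P(noun, pred) / (P(noun) * P(pred))) for each observed pair."""
    pair_counts = Counter(pairs)
    noun_counts = Counter(noun for noun, _ in pairs)
    pred_counts = Counter(pred for _, pred in pairs)
    total = len(pairs)

    scores: Dict[Tuple[str, str], float] = {}
    for (noun, pred), count in pair_counts.items():
        p_xy = count / total
        p_x = noun_counts[noun] / total
        p_y = pred_counts[pred] / total
        scores[(noun, pred)] = math.log2(p_xy / (p_x * p_y))
    return scores


scores = pmi_scores(pairs)
```

In this toy data, "red" occurs only with "flower", so that pair scores 1 bit, while "beautiful" is spread evenly over both nouns and scores 0.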

Web Document-based Associate Knowledge Extraction Method : Applying to Bioinformatics (웹 도큐먼트 기반 연관 지식 추출 기법 : 생명정보분야에의 적용)

문현정;김교정
- Journal of Internet Computing and Services / v.2 no.5 / pp.9-19 / 2001
In this paper, we develop an associate knowledge extraction method for automatically finding and expanding user preference knowledge from a web document database. To reflect user interests or preferences, an agent explores the example documents and extracts information relevant to a central term that captures the user's intent. To do so, we apply association rule mining to the extraction of relevant objects in web documents. To weight the extracted relevant information, we present an associate tag block-based weighting method. We applied this associate knowledge extraction method to bioinformatics to find related keywords.
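The association rule step can be sketched with the standard support and confidence measures, treating each document as a set of terms. This is a generic pairwise illustration, not the paper's method: the tag block-based weighting is not modeled, and the documents and thresholds are made up.

```python
from itertools import combinations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (lhs, rhs, support, confidence)


def association_rules(transactions: List[Set[str]],
                      min_support: float,
                      min_confidence: float) -> List[Rule]:
    """Find single-item rules lhs -> rhs that meet both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset: Set[str]) -> float:
        # Fraction of transactions that contain every item in itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    rules: List[Rule] = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical term sets extracted from four web documents.
docs = [
    {"gene", "protein", "sequence"},
    {"gene", "protein"},
    {"gene", "expression"},
    {"protein", "structure"},
]
rules = association_rules(docs, min_support=0.5, min_confidence=0.6)
```

Only the pair "gene" and "protein" appears in at least half of the documents, so it yields the two rules returned, one in each direction.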

A Measure for Improvement in Quality of Association Rules in the Item Response Dataset (문항 응답 데이터에서 문항간 연관규칙의 질적 향상을 위한 도구 개발)

Kwak, Eun-Young;Kim, Hyeoncheol
- The Journal of Korean Association of Computer Education / v.10 no.3 / pp.1-8 / 2007
In this paper, we introduce a new measure called surprisal that estimates the informativeness of transactional instances and attributes in an item response dataset, and use it to improve the quality of association rules. To this end, we construct an artificial dataset, first eliminate noisy and uninformative data using surprisal, and then generate association rules between items. We compare the association rules obtained after surprisal-based pruning with those obtained after support-based pruning and from the original, unpruned dataset. Experimental results show that surprisal-based pruning significantly improves the quality of association rules in question item response datasets.
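The abstract does not define its surprisal measure, so the sketch below falls back on the information-theoretic definition, -log2 p, summed over the items of each response row. The smoothing, the 3.0-bit threshold, and the choice to drop high-surprisal rows are all assumptions made for illustration; the paper may prune differently.

```python
import math
from typing import List


def instance_surprisal(rows: List[List[int]]) -> List[float]:
    """Sum of -log2 P(observed response) over all items, for each row.

    rows[i][j] is 1 if respondent i answered item j correctly, else 0.
    """
    n = len(rows)
    n_items = len(rows[0])
    # Add-one smoothing keeps every probability strictly between 0 and 1.
    p_correct = [(sum(r[j] for r in rows) + 1) / (n + 2) for j in range(n_items)]

    scores: List[float] = []
    for row in rows:
        total = 0.0
        for j, value in enumerate(row):
            p = p_correct[j] if value == 1 else 1.0 - p_correct[j]
            total += -math.log2(p)
        scores.append(total)
    return scores


# Hypothetical item response data: three typical rows and one unusual row.
rows = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
scores = instance_surprisal(rows)

# Assumed pruning rule: drop rows whose surprisal exceeds an arbitrary 3.0 bits.
kept = [row for row, s in zip(rows, scores) if s <= 3.0]
```

Association rules would then be generated from `kept` rather than from `rows`, which is the comparison the abstract describes against support-based pruning.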

A Rule-Based Data Mining Method among the Unrelated DataBase Table (비연계 DB 테이블상에서의 데이터 추출을 위한 규칙 기반의 데이터 마이닝 기법)

김찬일;조대호
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2000.11a / pp.220-224 / 2000
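Data mining is the task of extracting implicit and potentially useful information from large volumes of real data. In this paper, we use a data mining technique to extract or process the required information across database tables whose relationships to one another are not defined. An association rule, as a mining technique, expresses the relationship that when one event occurs another event also occurs; the proposed rule-based data mining technique is a branch of association rules that classifies data according to rules. To implement this mining technique, we used a rule-based expert system from the field of artificial intelligence and applied it to a real system, GDS (Grating automatic Drawing System).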

Flickr Image Classification using SIFT Algorism (SIFT 알고리즘을 이용한 플리커 이미지 자동분류)

Jang, Hyun-Woong;Cho, Soo-Sun
- Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.1394-1396 / 2013
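As large-scale image storage and sharing sites such as Flickr gain popularity, the amount of image information keeps growing, and users demand accurate image retrieval. Various studies are under way to raise the accuracy of tag-based image retrieval, for example by using semantic relatedness among tags. In this paper, we propose a method for classifying Flickr images using the SIFT algorithm, which performs well at classifying images based on feature point extraction. Applying the SIFT algorithm to a database first classified by tag information using Wikipedia-based semantic relatedness yielded higher accuracy than previous work that used SURF. We therefore expect that this method can classify a variety of images more accurately.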
https://doi.org/10.3745/PKIPS.y2013m11a.1394

Image Classification Using Bag of Visual Words and Visual Saliency Model (이미지 단어집과 관심영역 자동추출을 사용한 이미지 분류)

Jang, Hyunwoong;Cho, Soosun
- KIPS Transactions on Software and Data Engineering / v.3 no.12 / pp.547-552 / 2014
As social multimedia sites such as Flickr and Facebook grow in popularity, the amount of image information is increasing rapidly, and many studies have therefore pursued accurate social image retrieval. Some of them classify web images using semantic relations among image tags and BoVW (Bag of Visual Words). In this paper, we propose a method that detects the salient region of an image using the GBVS (Graph Based Visual Saliency) model, which can eliminate less important regions such as the background. First, we construct a BoVW based on the SIFT algorithm from a database of images preliminarily retrieved with semantically related tags. Second, we detect the salient region in test images using the GBVS model. The image classification results showed higher accuracy than previous research. We therefore expect that our method can classify a variety of images more accurately.
https://doi.org/10.3745/KTSDE.2014.3.12.547
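A simplified sketch of combining a saliency mask with a BoVW histogram follows. It assumes the descriptors, the saliency map, and the codebook are already available: real SIFT descriptors are 128-dimensional, a codebook would normally come from clustering training descriptors, and GBVS itself is not implemented here. The 2-D vectors, the 2x2 saliency map, and the 0.5 threshold are toy values.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[Tuple[int, int], Sequence[float]]  # ((x, y), descriptor)


def nearest_codeword(descriptor: Sequence[float],
                     codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to descriptor in squared Euclidean distance."""
    def sq_dist(k: int) -> float:
        return sum((x - y) ** 2 for x, y in zip(descriptor, codebook[k]))
    return min(range(len(codebook)), key=sq_dist)


def salient_bovw_histogram(features: List[Feature],
                           saliency: Sequence[Sequence[float]],
                           codebook: Sequence[Sequence[float]],
                           threshold: float = 0.5) -> List[float]:
    """Normalized visual-word histogram over keypoints in salient pixels only."""
    hist = [0.0] * len(codebook)
    for (x, y), descriptor in features:
        if saliency[y][x] < threshold:
            continue  # skip keypoints in low-saliency (background) pixels
        hist[nearest_codeword(descriptor, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy inputs: saliency[y][x] in [0, 1], two visual words, four keypoints.
saliency = [[0.1, 0.9],
            [0.2, 0.8]]
codebook = [(0.0, 0.0), (10.0, 10.0)]
features = [
    ((1, 0), (9.0, 9.5)),
    ((1, 1), (0.5, 0.2)),
    ((0, 0), (0.1, 0.1)),   # falls on a background pixel and is ignored
    ((1, 1), (10.2, 9.9)),
]
hist = salient_bovw_histogram(features, saliency, codebook)
```

The resulting histogram is the feature vector a classifier would consume; masking by saliency keeps background keypoints from contributing visual words.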

Factor Analysis on Development Technology for Next generation Navigation Service (차세대 네비게이션 서비스를 위한 기술개발 요인분석)

Jin Hui-Chae;Jo Seong-Ik
- Proceedings of the Korean Association of Geographic Information Studies Conference / 2006.05a / pp.55-62 / 2006
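In this paper, we analyze and extract technology development factors for next-generation navigation services, based on an analysis of user requirements. For the user requirements analysis, we derive sub-items from the six major functional elements of navigation services and have actual navigation service users rate the importance of each functional element on a scale. Based on these importance ratings, we extract the dominant functional element factors and perform a correlation analysis on them. The correlation analysis reveals associations among the functional elements, from which the major technology development factors can be identified. The factors identified in this way are then passed through the statistical tests of factor analysis to confirm once more that meaningful factors are being extracted.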

Association-Based Knowledge Model for Supporting Diagnosis of a Capsule Endoscopy (캡슐내시경 검사의 진단 보조를 위한 연관성 기반 지식 모델)

Hwang, Gyubon;Park, Ye-Seul;Lee, Jung-Won
- KIPS Transactions on Software and Data Engineering / v.6 no.10 / pp.493-498 / 2017
Capsule endoscopy is specialized for observing the small intestine, which is difficult to access with general endoscopy. The diagnostic procedure using capsule endoscopy consists of three stages: examination of indicants, endoscopy, and diagnosis. The key information needed for diagnosis includes indicants, lesions, and suspected disease information. In this paper, this information is defined as semantic features, and the process of extracting it is defined as semantic-based analysis, which is performed throughout capsule endoscopy. First, the patient's symptoms are checked before capsule endoscopy to obtain information on the suspected disease. Next, capsule endoscopy is performed with the suspected diseases in mind. Finally, a diagnosis is reached using supporting information. At this stage, several associations are used to reach the diagnosis: for example, the disease association between a symptom and a disease, used to identify the expected disease, and the anatomical association between the location of a lesion and the supporting information. However, existing knowledge models such as MST and CEST only list simple terms related to endoscopy and cannot capture such semantic associations. Therefore, in this paper, we propose an association-based knowledge model for supporting the diagnosis of capsule endoscopy. The proposed model is divided into two parts: a disease model and an anatomical model of the small intestine, the area (organ) of interest in capsule endoscopy. It can effectively support diagnosis by providing key information for capsule endoscopy.
https://doi.org/10.3745/KTSDE.2017.10.493

An Improved Automatic Text Summarization Based on Lexical Chaining Using Semantical Word Relatedness (단어 간 의미적 연관성을 고려한 어휘 체인 기반의 개선된 자동 문서요약 방법)

Cha, Jun Seok;Kim, Jeong In;Kim, Jung Min
- Smart Media Journal / v.6 no.1 / pp.22-29 / 2017
With the recent rapid advancement and spread of smart devices, the volume of document data on the Internet is increasing sharply. This growth of information on the Web, including a massive number of documents, makes it increasingly difficult for users to understand the data. Various studies on automatic summarization are under way to summarize documents efficiently. This study uses the TextRank algorithm to summarize documents efficiently. TextRank represents sentences or keywords as a graph and uses the graph's vertices and edges to capture semantic relations between words and sentences and thereby estimate the importance of sentences. It extracts high-ranking keywords and, based on those keywords, extracts important sentences. To extract important sentences, the algorithm first groups vocabulary, using a specific weighting scale. The program selects sentences with higher scores on this weighting scale and, based on the selected sentences, extracts the important sentences that summarize the document. The experiments showed improved performance over the summarization methods of previous studies, indicating that the algorithm can summarize documents more efficiently.
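The graph-ranking core of TextRank can be sketched in plain Python as below. This covers only generic sentence ranking: the lexical chaining and vocabulary grouping that the abstract describes are not modeled. The word-overlap similarity over unique tokens, the damping factor of 0.85, and the example sentences are assumptions for illustration.

```python
import math
from typing import List


def similarity(a: List[str], b: List[str]) -> float:
    """Shared-word count normalized by the log sizes of the two word sets."""
    words_a, words_b = set(a), set(b)
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom == 0:
        return 0.0  # both sentences have a single unique word
    return len(words_a & words_b) / denom


def textrank(sentences: List[str], damping: float = 0.85,
             iterations: int = 50) -> List[float]:
    """Score sentences by weighted PageRank over a sentence similarity graph."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]

    scores = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbor j passes on its score in proportion to edge weight.
            rank = sum(weights[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            updated.append((1 - damping) + damping * rank)
        scores = updated
    return scores


sentences = [
    "big data platforms collect social media data",
    "social media data drives big data analysis",
    "the weather was pleasant yesterday",
]
scores = textrank(sentences)
```

The third sentence shares no words with the others, so it keeps only the baseline score of 1 - damping, while the two related sentences reinforce each other. A summary would be built from the top-scoring sentences.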
