• Title/Summary/Keyword: word extraction

Search Result 233, Processing Time 0.027 seconds

A Study on the Extraction of Psychological Distance Embedded in Company's SNS Messages Using Machine Learning (머신 러닝을 활용한 회사 SNS 메시지에 내포된 심리적 거리 추출 연구)

  • Seongwon Lee;Jin Hyuk Kim
    • Information Systems Review
    • /
    • v.21 no.1
    • /
    • pp.23-38
    • /
    • 2019
  • The social network service (SNS) is one of the important marketing channels, so many companies actively exploit SNSs by posting SNS messages with appropriate content and style for their customers. In this paper, we focused on the psychological distances embedded in the SNS messages and developed a method to measure the psychological distance in SNS message by mixing a traditional content analysis, natural language processing (NLP), and machine learning. Through a traditional content analysis by human coding, the psychological distance was extracted from the SNS message, and these coding results were used for input data for NLP and machine learning. With NLP, word embedding was executed and Bag of Word was created. The Support Vector Machine, one of machine learning techniques was performed to train and test the psychological distance in SNS message. As a result, sensitivity and precision of SVM prediction were significantly low because of the extreme skewness of dataset. We improved the performance of SVM by balancing the ratio of data by upsampling technique and using data coded with the same value in first content analysis. All performance index was more than 70%, which showed that psychological distance can be measured well.

Keyword Network Visualization for Text Summarization and Comparative Analysis (문서 요약 및 비교분석을 위한 주제어 네트워크 가시화)

  • Kim, Kyeong-rim;Lee, Da-yeong;Cho, Hwan-Gue
    • Journal of KIISE
    • /
    • v.44 no.2
    • /
    • pp.139-147
    • /
    • 2017
  • Most of the information prevailing in the Internet space consists of textual information. So one of the main topics regarding the huge document analyses that are required in the "big data" era is the development of an automated understanding system for textual data; accordingly, the automation of the keyword extraction for text summarization and abstraction is a typical research problem. But the simple listing of a few keywords is insufficient to reveal the complex semantic structures of the general texts. In this paper, a text-visualization method that constructs a graph by computing the related degrees from the selected keywords of the target text is developed; therefore, two construction models that provide the edge relation are proposed for the computing of the relation degree among keywords, as follows: influence-interval model and word- distance model. The finally visualized graph from the keyword-derived edge relation is more flexible and useful for the display of the meaning structure of the target text; furthermore, this abstract graph enables a fast and easy understanding of the target text. The authors' experiment showed that the proposed abstract-graph model is superior to the keyword list for the attainment of a semantic and comparitive understanding of text.

Topic Based Hierarchical Network Analysis for Entrepreneur Using Text Mining (텍스트 마이닝을 이용한 주제기반의 기업인 네트워크 계층 분석)

  • Lee, Donghun;Kim, Yonghwa;Kim, Kwanho
    • The Journal of Society for e-Business Studies
    • /
    • v.23 no.3
    • /
    • pp.33-49
    • /
    • 2018
  • The importance of convergence activities among business is increasing due to the necessity of designing and developing new products to satisfy various customers' needs. In particular, decision makers such as CEOs are required to participate in networks between entrepreneurs for being connected with valuable convergence partners. Moreover, it is important for entrepreneurs not only to make a large number of network connections, but also to understand the networking relationship with entrepreneurs with similar topic information. However, there is a difficult limit in collecting the topic information that can show the lack of current status of business and the technology and characteristics of entrepreneur in industry sector. In this paper, we solve these problems through the topic extraction method and analyze the business network in three aspects. Specifically, there are C, S, T-Layer models, and each model analyzes amount of entrepreneurs relationship, network centrality, and topic similarity. As a result of experiments using real data, entrepreneur need to activate network by connecting high centrality entrepreneur when the corporate relationship is low. In addition, we confirmed through experiments that there is a need to activate the topic-based network when topic similarity is low between entrepreneurs.

Trends in the Use of Artificial Intelligence in Medical Image Analysis (의료영상 분석에서 인공지능 이용 동향)

  • Lee, Gil-Jae;Lee, Tae-Soo
    • Journal of the Korean Society of Radiology
    • /
    • v.16 no.4
    • /
    • pp.453-462
    • /
    • 2022
  • In this paper, the artificial intelligence (AI) technology used in the medical image analysis field was analyzed through a literature review. Literature searches were conducted on PubMed, ResearchGate, Google and Cochrane Review using the key word. Through literature search, 114 abstracts were searched, and 98 abstracts were reviewed, excluding 16 duplicates. In the reviewed literature, AI is applied in classification, localization, disease detection, disease segmentation, and fit degree of registration images. In machine learning (ML), prior feature extraction and inputting the extracted feature values into the neural network have disappeared. Instead, it appears that the neural network is changing to a deep learning (DL) method with multiple hidden layers. The reason is thought to be that feature extraction is processed in the DL process due to the increase in the amount of memory of the computer, the improvement of the calculation speed, and the construction of big data. In order to apply the analysis of medical images using AI to medical care, the role of physicians is important. Physicians must be able to interpret and analyze the predictions of AI algorithms. Additional medical education and professional development for existing physicians is needed to understand AI. Also, it seems that a revised curriculum for learners in medical school is needed.

Design of Bit Manipulation Accelerator fo Communication DSP (통신용 DSP를 위한 비트 조작 연산 가속기의 설계)

  • Jeong Sug H.;Sunwoo Myung H.
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.42 no.8 s.338
    • /
    • pp.11-16
    • /
    • 2005
  • This paper proposes a bit manipulation accelerator (BMA) having application specific instructions, which efficiently supports scrambling, convolutional encoding, puncturing, and interleaving. Conventional DSPs cannot effectively perform bit manipulation functions since かey have multiply accumulate (MAC) oriented data paths and word-based functions. However, the proposed accelerator can efficiently process bit manipulation functions using parallel shift and Exclusive-OR (XOR) operations and bit jnsertion/extraction operations on multiple data. The proposed BMA has been modeled by VHDL and synthesized using the SEC $0.18\mu m$ standard cell library and the gate count of the BMA is only about 1,700 gates. Performance comparisons show that the number of clock cycles can be reduced about $40\%\sim80\%$ for scrambling, convolutional encoding and interleaving compared with existing DSPs.

The Design and Implementation of OWL Ontology Construction System through Information Extraction of Unstructured Documents (비정형 문서의 정보추출을 통한 OWL 온톨로지 구축 시스템의 설계 및 구현)

  • Jo, Dae Woong;Choi, Ji Woong;Kim, Myung Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.10
    • /
    • pp.23-33
    • /
    • 2014
  • The development of the information retrieval field is evolving to the research field searching accurately for the information from thing finding rapidly a large amount of information. Personalization and the semantic web technology is a key technology. The automatic indexing technology about the web document and throughput go beyond the research stage and show up as the practical service. However, there is a lack of research on the document information retrieval field about the attached document type of except the web document. In this paper, we illustrate about the method in which it analyzed the text content of the unstructured documents prepared in the text, word, hwp form and it how to construction OWL ontology. To build TBox of the document ontology and the resources which can be obtained from the document is selected, and we implement with the system in order to utilize as the instant of the constructed document ontology. It is effectually usable in the information retrieval and document management system using the semantic technology of the correspondence document as the ontology automatic construction of this kind of the unstructured documents.

An Experimental Study on Automatic Summarization of Multiple News Articles (복수의 신문기사 자동요약에 관한 실험적 연구)

  • Kim, Yong-Kwang;Chung, Young-Mee
    • Journal of the Korean Society for information Management
    • /
    • v.23 no.1 s.59
    • /
    • pp.83-98
    • /
    • 2006
  • This study proposes a template-based method of automatic summarization of multiple news articles using the semantic categories of sentences. First, the semantic categories for core information to be included in a summary are identified from training set of documents and their summaries. Then, cue words for each slot of the template are selected for later classification of news sentences into relevant slots. When a news article is input, its event/accident category is identified, and key sentences are extracted from the news article and filled in the relevant slots. The template filled with simple sentences rather than original long sentences is used to generate a summary for an event/accident. In the user evaluation of the generated summaries, the results showed the 54.l% recall ratio and the 58.l% precision ratio in essential information extraction and 11.6% redundancy ratio.

An Effective Increment리 Content Clustering Method for the Large Documents in U-learning Environment (U-learning 환경의 대용량 학습문서 판리를 위한 효율적인 점진적 문서)

  • Joo, Kil-Hong;Choi, Jin-Tak
    • Journal of the Korea Computer Industry Society
    • /
    • v.5 no.9
    • /
    • pp.859-872
    • /
    • 2004
  • With the rapid advance of computer and communication techonology, the recent trend of education environment is edveloping in the ubiquitous learning (u-learning) direction that learners select and organize the contents, time and order of learning by themselves. Since the amount of education information through the internet is increasing rapidly and it is managed in document in an effective way is necessary. The document clustering is integrated documents to subject by classifying a set of documents through their similarity among them. Accordingly, the document clustering can be used in exploring and searching a document and it can increased accuracy of search. This paper proposes an efficient incremental clustering method for a set of documents increase gradually. The incremental document clustering algorithm assigns a set of new documents to the legacy clusters which have been identified in advance. In addition, to improve the correctness of the clustering, removing the stop words can be proposed.

  • PDF

A Realtime Hardware Design for Face Detection (얼굴인식을 위한 실시간 하드웨어 설계)

  • Suh, Ki-Bum;Cha, Sun-Tae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.2
    • /
    • pp.397-404
    • /
    • 2013
  • This paper propose the hardware architecture of face detection hardware system using the AdaBoost algorithm. The proposed structure of face detection hardware system is possible to work in 30frame per second and in real time. And the AdaBoost algorithm is adopted to learn and generate the characteristics of the face data by Matlab, and finally detected the face using this data. This paper describes the face detection hardware structure composed of image scaler, integral image extraction, face comparing, memory interface, data grouper and detected result display. The proposed circuit is so designed to process one point in one cycle that the prosed design can process full HD($1920{\times}1080$) image at 70MHz, which is approximate $2316087{\times}30$ cycle. Furthermore, This paper use the reducing the word length by Overflow to reduce memory size. and the proposed structure for face detection has been designed using Verilog HDL and modified in Mentor Graphics Modelsim. The proposed structure has been work on 45MHz operating frequency and use 74,757 LUT in FPGA Xilinx Virtex-5 XC5LX330.

A Korean Text Summarization System Using Aggregate Similarity (도합유사도를 이용한 한국어 문서요약 시스템)

  • 김재훈;김준홍
    • Korean Journal of Cognitive Science
    • /
    • v.12 no.1_2
    • /
    • pp.35-42
    • /
    • 2001
  • In this paper. a document is represented as a weighted graph called a text relationship map. In the graph. a node represents a vector of nouns in a sentence, an edge completely connects other nodes. and a weight on the edge is a value of the similarity between two nodes. The similarity is based on the word overlap between the corresponding nodes. The importance of a node. called an aggregate similarity in this paper. is defined as the sum of weights on the links connecting it to other nodes on the map. In this paper. we present a Korean text summarization system using the aggregate similarity. To evaluate our system, we used two test collection, one collection (PAPER-InCon) consists of 100 papers in the field of computer science: the other collection (NEWS) is composed of 105 articles in the newspapers and had built by KOROlC. Under the compression rate of 20%. we achieved the recall of 46.6% (PAPER-InCon) and 30.5% (NEWS) and the precision of 76.9% (PAPER-InCon) and 42.3% (NEWS).

  • PDF