• Title/Abstract/Keywords: Semi-Structured Data

Search results: 424 items (processing time: 0.029 s)

Domain Adaptation for Opinion Classification: A Self-Training Approach

  • Yu, Ning
    • Journal of Information Science Theory and Practice / Vol. 1, No. 1 / pp.10-26 / 2013
  • Domain transfer is a widely recognized problem for machine learning algorithms because models built on one data domain generally do not perform well in another. This is especially challenging for tasks such as opinion classification, which often must cope with insufficient quantities of labeled data. This study investigates the feasibility of self-training for the domain transfer problem in opinion classification by leveraging labeled data in non-target data domain(s) and unlabeled data in the target domain. Specifically, self-training is evaluated for effectiveness in sparse-data situations and for feasibility in domain adaptation for opinion classification. Three types of Web content are tested: edited news articles, semi-structured movie reviews, and the informal, unstructured content of the blogosphere. The findings suggest that, when labeled data are limited, self-training is a promising approach for opinion classification, although its contribution varies across data domains. Significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based self-training strategy was implemented.

Generic Multidimensional Model of Complex Data: Design and Implementation

  • Khrouf, Kais;Turki, Hela
    • International Journal of Computer Science & Network Security / Vol. 21, No. 12spc / pp.643-647 / 2021
  • Analyzing large volumes of data to deduce knowledge and new information is a challenge. Data can be heterogeneous and complex: semi-structured data (for example, XML), data from social networks (for example, tweets), and factual data (for example, the spread of Covid-19). In this paper, we propose a generic multidimensional model for analyzing complex data along several dimensions.

과학고 학생들의 비구조화된 문제 해결 과정 특성 분석 (Science High School Students' Analysis of Characteristics on Ill-Structured Problem-Solving Process)

  • 서진수;한신;김형범;정진우
    • 대한지구과학교육학회지 / Vol. 5, No. 1 / pp.8-19 / 2012
  • The purpose of this study is to analyze the characteristics of the ill-structured problem-solving process and to examine the types of memories used in students' monitoring. Data were collected primarily through observation and secondarily through semi-structured in-depth interviews, based on analysis of the observation results, with two science high school students and a guidance teacher. The findings revealed that ill-structured problems possess multiple representations and that upper-level problems have several sub-problems. In addition, multiple steps exist simultaneously at a particular stage of the problem-solving process, which follows a complex flow rather than a single sequence and has a high frequency of discussion steps. The types of memories used in ill-structured problems include idiosyncratic memories related to personal histories such as school performance, problem-related memories, abstract rules, and intuition.

XML 데이타 색인을 위한 경로 분할 기법 (A Path Partitioning Technique for Indexing XML Data)

  • 김종익;김형주
    • 한국정보과학회논문지:데이타베이스 / Vol. 31, No. 3 / pp.320-330 / 2004
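  • Query languages for XML express queries using paths in the data graph. In particular, by using patterns (for example, regular expressions) in paths, they allow queries to be posed even when the structure of the data is not exactly known. Queries that use patterns, however, greatly widen the search space over the data graph. To reduce the search space of a query, existing XML indexing techniques group identical paths in the data graph together to produce a smaller index graph. In many cases, however, such indexes grow as large as the data graph itself, so they fail to reduce the search space and therefore cannot guarantee efficient query processing. In this paper, we propose an index graph that partitions all paths present in the data and can quickly locate the partition matching a query at query-processing time. The size of the proposed index graph can be controlled independently of the size of the data graph. Keeping the index graph small therefore greatly reduces the cost of traversing it. Through experiments, we show that the proposed indexing technique is more efficient than existing graph-based indexing techniques, and we examine how performance changes with index size.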

Comprehensive studies of Grassmann manifold optimization and sequential candidate set algorithm in a principal fitted component model

  • Chaeyoung, Lee;Jae Keun, Yoo
    • Communications for Statistical Applications and Methods / Vol. 29, No. 6 / pp.721-733 / 2022
  • In this paper we compare parameter estimation by Grassmann manifold optimization and by the sequential candidate set algorithm in a structured principal fitted component (PFC) model. The structured PFC model extends the form of the covariance matrix of the random error to relieve the limitations that arise from an overly simple form of the matrix. However, unlike other PFC models, the structured PFC model does not have a closed form for parameter estimation in dimension reduction, which signals the need for numerical computation. The numerical computation can be done through Grassmann manifold optimization or the sequential candidate set algorithm. We conducted numerical studies to compare the two methods by computing the results of sequential dimension testing and trace correlation values, which let us compare performance in determining the dimension and estimating the basis. We conclude that Grassmann manifold optimization outperforms the sequential candidate set algorithm in dimension determination, while the sequential candidate set algorithm is better in basis estimation when conducting dimension reduction. We also applied the methods to real data, which yielded the same result.

여성장애인을 위한 임신과 출산 돌봄에 대한 간호사의 경험 (Nurses' Experiences of Caring for Disabled Women during Pregnancy and Childbirth)

  • 이은주
    • 여성건강간호학회지 / Vol. 22, No. 4 / pp.308-321 / 2016
  • Purpose: This phenomenological study aimed to describe and understand nurses' experiences of caring for women with disabilities during pregnancy and childbirth. Methods: Participants were 13 nurses from 3 hospitals and 2 local clinics in J city, selected through the snowball method. Data were collected through two face-to-face, semi-structured interviews. The researcher used an MP3 player and a smartphone for recording and for the transcription process. Colaizzi's method was applied for the data analysis. Results: Nurses' experiences were structured as four theme clusters: 'Communicating between/among nurses', 'Recognizing pregnancy and childbirth of women with disabilities', 'Taking care of women with disabilities based on their differences' and 'Reflecting on nursing care for women with disabilities'. Conclusion: Nurses' recognition of women with disabilities and of their pregnancy and childbirth appeared to be related to the nursing care they provided for women with disabilities.

통신 가입자 데이터 관리를 위한 MSSQL Server와 NoSQL MongoDB의 성능 비교 (A Comparison of Performance Between MSSQL Server and MongoDB for Telco Subscriber Data Management)

  • ;구흥서
    • 전기학회논문지 / Vol. 65, No. 3 / pp.469-476 / 2016
  • Relational Database Management Systems have been the de facto database model among most developers and users since the inception of Data Science. IoT devices, sensors, social media, and other sources generate data in structured, semi-structured, and unstructured formats and in huge volumes, greatly increasing the difficulty of data management. Organizations that collect large amounts of data are increasingly turning to non-relational (NoSQL) databases. In this paper, through experiments with real field data, we demonstrate that MongoDB, a document-based NoSQL database, is a better alternative for building a Telco Subscriber Data Management System, which until now has mainly been built with Relational Database Management Systems. We compare the existing system with our proposed MongoDB-powered system at various phases of the data flow. We show how various workloads at some phases of the existing system were either completely removed or significantly simplified in the new system. Based on the experimental results, using MongoDB to manage telco subscriber data offered better performance than the existing system built with MSSQL Server.

통합 검색 환경에서 이용자 적합성 판단 기준에 관한 탐색적 연구 (Users' Relevance Criteria in Universal Search in Korea : An Exploratory Study)

  • 박정아
    • 정보관리학회지 / Vol. 29, No. 2 / pp.113-133 / 2012
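  • This is an exploratory study of users' relevance criteria in the Korean universal search environment. Data were collected through semi-structured interviews with 10 participants. The participants carried out a variety of searches on topics they were interested in or needed, in universal search environments such as Naver and Daum, and described whether documents were relevant and the criteria for their judgments. The study identified eight relevance criteria and non-relevance criteria. It also found that, although the criteria by which users judge relevance do not change greatly as the search environment changes, particularity and specificity are emerging as important relevance criteria because of data growth and increasingly sophisticated user needs.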

금강하구역 환경 변화와 주민 갈등 요인 (Environmental Change and Causes of Local Conflicts in the Geumgang Estuary)

  • 박금주;이창희;여형범;주용기;김억수;문슬기
    • 한국물환경학회지 / Vol. 33, No. 2 / pp.149-159 / 2017
  • Since the artificial barrage was constructed in the 1990s, the Geumgang estuary has experienced considerable changes in its natural environment as well as in the socioeconomic conditions and culture of the nearby villages. To understand how changes in the estuarine environment bring about conflicts among local communities, and to help resolve those conflicts, this research investigated the causes of conflict in the Geumgang estuary using in-depth, semi-structured interviews. A total of 100 local people who have lived in the vicinity of the Geumgang estuary for more than 30 years were selected for the interviews. The results show that local people's occupations determine their opinions about the estuary barrage and about how the estuary should be managed. Understanding environmental change and local conflicts helps to develop a sustainable and integrated estuary management system in the region.

기상 정보 전달자의 과학의 본성에 대한 인식 연구 (Weather-Forecasters' Perception about the Nature of Science)

  • 박계현;한신;정진우;박태윤
    • 대한지구과학교육학회지 / Vol. 8, No. 2 / pp.114-127 / 2015
  • The nature of science has received a great deal of attention in the field of science education. However, most papers have studied teachers and students, with the aim of improving their understanding of the nature of science. The current study describes and analyzes weather forecasters' understandings of the nature of science (NOS). Data were collected from three weather forecasters using a semi-structured interview. The results were as follows. First, the participants recognized that science explores phenomena of unknown facts or observations, and they held a cautious inductive perspective. Second, participants felt that science and society are associated with each other. In addition, all participants judged that science requires a verification process. Third, they indicated that science and technology interact closely with social relationships.