• Title/Summary/Keyword: Semi-Structured Data


Domain Adaptation for Opinion Classification: A Self-Training Approach

  • Yu, Ning
    • Journal of Information Science Theory and Practice / v.1 no.1 / pp.10-26 / 2013
  • Domain transfer is a widely recognized problem for machine learning algorithms because models built on one data domain generally do not perform well in another. This is especially challenging for tasks such as opinion classification, which often must cope with insufficient labeled data. This study investigates the feasibility of self-training for the domain transfer problem in opinion classification by leveraging labeled data in non-target data domain(s) and unlabeled data in the target domain. Specifically, self-training is evaluated for its effectiveness in sparse data situations and its feasibility for domain adaptation in opinion classification. Three types of Web content are tested: edited news articles, semi-structured movie reviews, and the informal, unstructured content of the blogosphere. The findings suggest that, when labeled data are limited, self-training is a promising approach for opinion classification, although its contributions vary across data domains. Significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based self-training strategy was implemented.
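
  A minimal sketch of domain-transfer self-training, assuming a multinomial naive Bayes base classifier, a fixed confidence threshold, and toy documents (all hypothetical; the study's actual classifier, features, and stopping rule may differ). The model starts from labeled source-domain text, then repeatedly moves confidently predicted target-domain documents into the training set as pseudo-labeled examples and retrains.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model with add-one (Laplace) smoothing."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d.split()}
    priors, counts, totals = {}, {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = math.log(len(class_docs) / len(docs))
        counts[c] = Counter(w for d in class_docs for w in d.split())
        totals[c] = sum(counts[c].values())
    return classes, vocab, priors, counts, totals


def predict_proba(model, doc):
    """Return (best label, its posterior probability) for one document."""
    classes, vocab, priors, counts, totals = model
    scores = {}
    for c in classes:
        s = priors[c]
        for w in doc.split():
            if w in vocab:  # ignore words never seen in training
                s += math.log((counts[c][w] + 1) / (totals[c] + len(vocab)))
        scores[c] = s
    best = max(scores, key=scores.get)
    z = sum(math.exp(v - scores[best]) for v in scores.values())
    return best, 1.0 / z


def self_train(src_docs, src_labels, tgt_docs, threshold=0.8, max_rounds=5):
    """Train on labeled source data, then add confident target-domain
    predictions as pseudo-labels and retrain (hypothetical parameters)."""
    docs, labels = list(src_docs), list(src_labels)
    pool = list(tgt_docs)
    model = train_nb(docs, labels)
    for _ in range(max_rounds):
        kept, added = [], 0
        for d in pool:
            label, p = predict_proba(model, d)
            if p >= threshold:
                docs.append(d)
                labels.append(label)
                added += 1
            else:
                kept.append(d)
        if added == 0:
            break  # nothing confident left to add
        pool = kept
        model = train_nb(docs, labels)
    return model


# Toy data: "movie review" source domain, "blog" target domain.
source = ["great film loved it", "wonderful acting great story",
          "terrible film hated it", "awful boring story"]
source_labels = ["pos", "pos", "neg", "neg"]
blog_posts = ["loved it great day", "hated it awful day",
              "great great wonderful", "boring awful terrible"]

model = self_train(source, source_labels, blog_posts)
print(predict_proba(model, "awful day"))
```

  After self-training, target-only vocabulary such as "day" becomes part of the model, which is the mechanism by which unlabeled target data can help; the threshold trades pseudo-label noise against coverage.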

Generic Multidimensional Model of Complex Data: Design and Implementation

  • Khrouf, Kais;Turki, Hela
    • International Journal of Computer Science & Network Security / v.21 no.12spc / pp.643-647 / 2021
  • Analyzing large volumes of data to deduce knowledge and new information is a challenge. Data can be heterogeneous and complex: semi-structured data (e.g., XML), data from social networks (e.g., tweets), and factual data (e.g., the spread of Covid-19). In this paper, we propose a generic multidimensional model for analyzing complex data along several dimensions.
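
  A minimal sketch of the general idea of multidimensional analysis over heterogeneous sources, assuming hypothetical adapter functions (`from_xml`, `from_tweet`, `from_record`) and hypothetical dimensions (source, country, month). It is not the authors' model; it only shows how differently shaped inputs can be mapped onto shared dimensions and then aggregated along any combination of them.

```python
import xml.etree.ElementTree as ET
from collections import Counter


# Hypothetical adapters: each maps one source format onto the same
# set of dimensions (source, country, month).
def from_xml(xml_text):
    root = ET.fromstring(xml_text)
    return {"source": "xml", "country": root.findtext("country"),
            "month": root.findtext("date")[:7]}


def from_tweet(tweet):
    return {"source": "tweet", "country": tweet["place"],
            "month": tweet["created"][:7]}


def from_record(row):
    # Factual CSV row "country,date,cases"; the measure is ignored here.
    country, date, _cases = row.split(",")
    return {"source": "factual", "country": country, "month": date[:7]}


def count_by(facts, dims):
    """Count facts grouped by any chosen combination of dimensions."""
    return Counter(tuple(f[d] for d in dims) for f in facts)


facts = [
    from_xml("<article><country>TN</country><date>2021-03-02</date></article>"),
    from_tweet({"place": "TN", "created": "2021-03-15T10:00"}),
    from_tweet({"place": "FR", "created": "2021-04-01T08:30"}),
    from_record("FR,2021-04-03,512"),
]
print(count_by(facts, ["source"]))
print(count_by(facts, ["country", "month"]))
```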

Science High School Students' Analysis of Characteristics on Ill-Structured Problem-Solving Process (과학고 학생들의 비구조화된 문제 해결 과정 특성 분석)

  • Seo, Jin-Su;Han, Shin;Kim, Hyung-Bum;Jeong, Jin-Woo
    • Journal of the Korean Society of Earth Science Education / v.5 no.1 / pp.8-19 / 2012
  • The purpose of this study is to analyze the characteristics of the ill-structured problem-solving process and to examine the types of memory used in students' monitoring. Data were collected primarily through observation and secondarily through semi-structured, in-depth interviews, informed by the analysis of the observation results, with two science high school students and a guidance teacher. The findings revealed that ill-structured problems have multiple representations and that upper-level problems contain several sub-problems. In addition, multiple steps can coexist within a particular stage of the problem-solving process, which follows a complex rather than a single sequential flow and shows a high frequency of discussion steps. The types of memory used in solving ill-structured problems include idiosyncratic memories related to personal history (such as school performance), problem-related memories, abstract rules, and intuition.

A Path Partitioning Technique for Indexing XML Data (XML 데이타 색인을 위한 경로 분할 기법)

  • 김종익;김형주
    • Journal of KIISE:Databases / v.31 no.3 / pp.320-330 / 2004
  • Query languages for XML represent queries as paths in a data graph; in fact, paths serve as the basic construct of an XML query. Users can write more expressive queries by using patterns (e.g., regular expressions) over paths. Because of the nature of semi-structured data, a data graph contains many identical paths. Existing XML indexing approaches exploit these identical paths, but such an index can grow larger than the source data graph and cannot guarantee an efficient access path. In this paper, we propose a technique that partitions all the paths in a data graph, and we develop an index graph that efficiently finds the appropriate partitions for a path query. Because the size of our index graph can be adjusted independently of the source data, the cost of index graph traversal is significantly reduced. In a performance study, we show that our index is much faster than other graph-based indexes.
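
  To illustrate why identical paths matter, here is a minimal sketch of a simple label-path index in the spirit of a strong DataGuide: every element is placed in the partition of its root-to-node label path, so identical paths collapse into one entry, and a regular-expression path query is answered by matching against the distinct paths instead of the data. This is an assumed, simplified scheme, not the paper's partitioning technique or index graph (it has no size adjustment and simply scans all distinct paths).

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(xml_text):
    """Map each root-to-node label path (e.g. '/lib/book/title') to the
    elements that share it, in document (pre-)order."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    stack = [(root, "/" + root.tag)]
    while stack:
        elem, path = stack.pop()
        index[path].append(elem)
        # Push children in reverse so they are visited in document order.
        for child in reversed(list(elem)):
            stack.append((child, path + "/" + child.tag))
    return dict(index)


def query(index, pattern):
    """Return elements whose full label path matches a regex pattern."""
    rx = re.compile(pattern)
    hits = []
    for path, elems in index.items():
        if rx.fullmatch(path):
            hits.extend(elems)
    return hits


doc = ("<lib><book><title>A</title><author>X</author></book>"
       "<book><title>B</title></book><journal><title>C</title></journal></lib>")
index = build_path_index(doc)
print(sorted(index))
print([e.text for e in query(index, r"/lib/.*/title")])
```

  The index holds one entry per distinct path (six here) regardless of how many elements repeat each path, which is the redundancy that path-based XML indexes exploit.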

Comprehensive studies of Grassmann manifold optimization and sequential candidate set algorithm in a principal fitted component model

  • Chaeyoung, Lee;Jae Keun, Yoo
    • Communications for Statistical Applications and Methods / v.29 no.6 / pp.721-733 / 2022
  • In this paper, we compare parameter estimation by Grassmann manifold optimization and by the sequential candidate set algorithm in a structured principal fitted component (PFC) model. The structured PFC model extends the form of the covariance matrix of the random error to relax the limitations caused by an overly simple form of that matrix. However, unlike other PFC models, the structured PFC model has no closed-form parameter estimate for dimension reduction, so numerical computation is required. This computation can be done through Grassmann manifold optimization or the sequential candidate set algorithm. We conducted numerical studies comparing the two methods using sequential dimension tests and trace correlation values, which measure performance in determining the dimension and in estimating the basis, respectively. We conclude that Grassmann manifold optimization outperforms the sequential candidate set algorithm in dimension determination, while the sequential candidate set algorithm is better at basis estimation when conducting dimension reduction. Applying both methods to real data yielded the same result.
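
  The basis-estimation comparison relies on a trace correlation between the true and estimated bases. A minimal sketch follows, assuming one common definition, r = sqrt(tr(P_B P_Bhat) / d), where P_B and P_Bhat are orthogonal projections onto the two d-dimensional spans (the paper's exact variant may differ, e.g. reporting r squared). It covers only this evaluation metric, not Grassmann optimization or the sequential candidate set algorithm; bases are given as lists of linearly independent column vectors.

```python
import math


def orthonormalize(cols):
    """Gram-Schmidt on linearly independent column vectors."""
    basis = []
    for v in cols:
        w = list(v)
        for q in basis:
            dot = sum(a * b for a, b in zip(w, q))
            w = [a - dot * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis


def trace_correlation(b_true, b_hat):
    """r = sqrt(tr(P1 P2) / d). For orthonormal bases Q1, Q2 the trace
    equals the squared Frobenius norm of Q1^T Q2."""
    q1, q2 = orthonormalize(b_true), orthonormalize(b_hat)
    d = len(q1)
    s = sum(sum(a * b for a, b in zip(u, v)) ** 2 for u in q1 for v in q2)
    return math.sqrt(s / d)


print(trace_correlation([[1, 0, 0]], [[1, 1, 0]]))  # 45-degree tilt
```

  Values near 1 mean the estimated span is close to the true one; the metric depends only on the spans, so any rescaling or rotation of the basis vectors within a span leaves it unchanged.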

Nurses' Experiences of Caring for Disabled Women during Pregnancy and Childbirth (여성장애인을 위한 임신과 출산 돌봄에 대한 간호사의 경험)

  • Lee, Eun-Joo
    • Women's Health Nursing / v.22 no.4 / pp.308-321 / 2016
  • Purpose: This phenomenological study aimed to describe and understand nurses' experiences of caring for women with disabilities during pregnancy and childbirth. Methods: Participants were 13 nurses from 3 hospitals and 2 local clinics in J city, selected through snowball sampling. Data were collected through two face-to-face, semi-structured interviews. The researcher used an MP3 player and a smartphone for recording and for the transcription process. Colaizzi's method was applied for data analysis. Results: Nurses' experiences were structured into four theme clusters: 'Communicating between/among nurses', 'Recognizing pregnancy and childbirth of woman with disability', 'Taking care of woman with disability based on their differences' and 'Reflecting on nursing care for woman with disability'. Conclusion: Nurses' recognition of women with disabilities, and of their pregnancy and childbirth, appeared to be related to the nursing care they provided for these women.

A Comparison of Performance Between MSSQL Server and MongoDB for Telco Subscriber Data Management (통신 가입자 데이터 관리를 위한 MSSQL Server와 NoSQL MongoDB의 성능 비교)

  • Nichie, Aaron;Koo, Heung-Seo
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.3 / pp.469-476 / 2016
  • Relational database management systems have become the de facto database model for most developers and users since the inception of data science. Data from IoT devices, sensors, social media, and other sources is generated in structured, semi-structured, and unstructured formats and in huge volumes, which greatly increases the difficulty of data management. Organizations that collect large amounts of data are increasingly turning to non-relational (NoSQL) databases. In this paper, through experiments with real field data, we demonstrate that MongoDB, a document-based NoSQL database, is a better alternative for building a telco subscriber data management system, which until now has mainly been built on relational database management systems. We compare the existing system with our proposed MongoDB-based system at various phases of the data flow, and we show how several workloads in some phases of the existing system were either removed entirely or significantly simplified in the new system. In our experiments, the MongoDB-based system outperformed the existing system built on MSSQL Server for managing telco subscriber data.
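
  One reason a document model can simplify some workloads is that a profile spread across several relational tables can be stored and read as a single nested document. The sketch below contrasts the two layouts using Python's built-in `sqlite3` for the relational side and a plain dict of JSON strings standing in for a document store. The table names, field names, and keys are hypothetical; this is neither MongoDB client code nor the paper's schema, and it says nothing about measured performance.

```python
import json
import sqlite3

# Relational layout (hypothetical schema): subscriber rows plus a separate
# services table linked by a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriber (msisdn TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE service (msisdn TEXT REFERENCES subscriber(msisdn), code TEXT);
    INSERT INTO subscriber VALUES ('0000000001', 'Subscriber One');
    INSERT INTO service VALUES ('0000000001', 'VOICE'), ('0000000001', 'DATA');
""")


def load_relational(msisdn):
    """Rebuild one subscriber profile with a join across two tables."""
    rows = conn.execute(
        "SELECT s.name, v.code FROM subscriber s "
        "JOIN service v ON v.msisdn = s.msisdn "
        "WHERE s.msisdn = ? ORDER BY v.code",
        (msisdn,),
    ).fetchall()
    return {"msisdn": msisdn, "name": rows[0][0],
            "services": [code for _, code in rows]}


# Document layout: the same profile embedded as one nested document,
# kept here in a plain dict keyed by MSISDN.
document = {"msisdn": "0000000001", "name": "Subscriber One",
            "services": ["DATA", "VOICE"]}
store = {document["msisdn"]: json.dumps(document)}


def load_document(msisdn):
    """Fetch the whole profile by key; no join is needed."""
    return json.loads(store[msisdn])


print(load_relational("0000000001"))
print(load_document("0000000001"))
```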

Users' Relevance Criteria in Universal Search in Korea : An Exploratory Study (통합 검색 환경에서 이용자 적합성 판단 기준에 관한 탐색적 연구)

  • Park, Jung-Ah
    • Journal of the Korean Society for information Management / v.29 no.2 / pp.113-133 / 2012
  • This exploratory study examines user relevance criteria in Korean search services that provide integrated search results. Data were collected from 10 participants using semi-structured interviews. The participants conducted web searches on self-selected topics using integrated search services such as Naver or Daum, judged the relevance of the retrieved documents, and reported their relevance criteria. The study identified eight user-defined relevance and non-relevance criteria. Specificity and richness were the two most important criteria, yet users' relevance criteria have not changed much despite the change in the search environment.

Environmental Change and Causes of Local Conflicts in the Geumgang Estuary (금강하구역 환경 변화와 주민 갈등 요인)

  • Park, Keumjoo;Lee, Chang-hee;YEO, Hyoung Beom;Ju, Yung-Ki;Kim, Eoksu;Mun, Seul-ki
    • Journal of Korean Society on Water Environment / v.33 no.2 / pp.149-159 / 2017
  • Since an artificial barrage was constructed in the 1990s, the Geumgang estuary has undergone considerable changes in its natural environment as well as in the socioeconomic and cultural life of nearby villages. To understand how changes in the estuarine environment bring about conflicts among local communities, and to help resolve these conflicts, the study investigated the causes of the conflicts in the Geumgang estuary using in-depth, semi-structured interviews. One hundred local residents who had lived near the Geumgang estuary for more than 30 years were selected for the interviews. The results show that local residents' occupations determine their opinions about the estuary barrage and about how the estuary should be managed. Understanding environmental change and local conflicts helps in developing a sustainable, integrated estuary management system for the region.

Weather-Forecasters' Perception about the Nature of Science (기상 정보 전달자의 과학의 본성에 대한 인식 연구)

  • Park, Gye-Hyun;Han, Shin;Jeong, Jin-Woo;Park, Tae-Yoon
    • Journal of the Korean Society of Earth Science Education / v.8 no.2 / pp.114-127 / 2015
  • The nature of science has received a great deal of attention in science education. However, most studies have focused on teachers and students, aiming to improve their understanding of the nature of science. The current study describes and analyzes weather forecasters' understanding of the nature of science (NOS). Data were collected from three weather forecasters through semi-structured interviews. The results were as follows. First, the participants recognized that science explores unknown phenomena and observations, and they held a cautious inductive perspective. Second, the participants felt that science and society are associated with each other, and all participants judged that science requires a verification process. Third, they indicated that science and technology interact closely with social relationships.