• Title/Summary/Keyword: 집계단위 (aggregation unit)

Search Result 84, Processing Time 0.027 seconds

SERADE: Section Representation Aggregation Retrieval for Long Document Ranking (SERADE : 섹션 표현 기반 문서 임베딩 모델을 활용한 긴 문서 검색 성능 개선)

  • Hye-In Jung;Hyun-Kyu Jeon;Ji-Yoon Kim;Chan-Hyeong Lee;Bong-Su Kim
    • Annual Conference on Human and Language Technology / 2022.10a / pp.135-140 / 2022
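  • Recently, most areas of natural language processing, including document retrieval, have reached the state of the art (SOTA) with pretrained models based on self-attention, such as BERT. However, because the computational complexity of the self-attention mechanism grows in proportion to the square of the input text length, these models are inherently limited in the length of text they can take as input. In document retrieval, one mainstream approach addresses this length limit by splitting a document into passages of a fixed token length, extracting a similarity score or representation vector for each passage, and then aggregating them. However, in documents with a section structure (abstract, conclusion, etc.), such as papers and patents, each section type has its own informational characteristics. Existing methods such as PARADE, which simply split a document into fixed-length passages for training, therefore cannot reflect the characteristics of each section. This paper proposes the SERADE model, which learns passage representations that include section-type information and then aggregates them with a transformer encoder, so that the model can learn the characteristics of sections and the mutual information between them. In experiments, SERADE recorded an average performance improvement of 3.8% over the PARADE-Transformer model.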
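
The split-and-aggregate scheme described in this abstract can be illustrated with a minimal sketch. It is not SERADE or PARADE: the term-overlap scorer below is a hypothetical stand-in for a BERT-style passage scorer, and the function names and the passage length are assumptions made only for illustration.

```python
def split_into_passages(tokens, passage_len):
    """Split a token list into consecutive passages of at most passage_len tokens."""
    return [tokens[i:i + passage_len] for i in range(0, len(tokens), passage_len)]


def overlap_score(query_tokens, passage_tokens):
    """Toy relevance score (assumption): fraction of query terms found in the passage."""
    query = set(query_tokens)
    if not query:
        return 0.0
    return len(query & set(passage_tokens)) / len(query)


def mean(scores):
    """Average of a non-empty list of scores."""
    return sum(scores) / len(scores)


def score_document(query_tokens, doc_tokens, passage_len=4, aggregate=max):
    """Score each passage separately, then aggregate the passage scores."""
    passages = split_into_passages(doc_tokens, passage_len)
    if not passages:
        return 0.0
    scores = [overlap_score(query_tokens, p) for p in passages]
    return aggregate(scores)


doc = "a b c d e f g h".split()
print(score_document(["e", "f"], doc))                  # max over passages
print(score_document(["e", "f"], doc, aggregate=mean))  # mean over passages
```

Because every passage is cut at a fixed length, the sketch has no notion of which section a passage came from, which is the limitation the abstract attributes to this family of methods.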


A Study on the Application and Requirements of Socioeconomic GIS Data (사회경제적 지리정보 활용 및 데이터 요구조건에 관한 연구)

  • Nam, Kwang-Woo;Kim, Ho-Yong;Lee, Sung-Ho;Lee, Sang-Hak;Ha, Su-Wook;Choi, Hyun
    • Journal of the Korean Association of Geographic Information Studies / v.8 no.3 / pp.44-54 / 2005
  • Most countries that are advanced in GIS have systematically built and managed georeferenced socioeconomic data and have benefited greatly from it in a range of social and economic areas. In Korea, however, socioeconomic geographic information is relatively poor compared with systems for geographical and topographical features. This is mainly due to the characteristics of the process that runs from constructing socioeconomic data to using it. At the construction stage, socioeconomic data change more frequently than data on geographical and topographical features and so require a way to handle those changes; and because it is difficult to mark the positions of individual entities, the information is built up by setting appropriate spatial units of aggregation. At the utilization stage, the data often need to be combined with other types of socioeconomic data because socioeconomic phenomena are complex. This study therefore examined the usability of GIS in socioeconomic fields and the spatial dimension of socioeconomic information through representative GIS cases in developed countries and, based on the results, derived the data requirements that arise in constructing and using socioeconomic GIS data and proposed solutions to them.


A Study on the Result of Application of Designation Criteria for Urban Regeneration Activation Zone by the Spatial Range (공간적 범위의 차이에 의한 도시재생 활성화지역 지정기준 적용 결과에 관한 연구)

  • Lee, Jong Hwi;Lee, Tae Hee
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.10 / pp.567-573 / 2020
  • This study was done to develop indicators for cities that can be used over the long term and in a sustainable manner. Activation indicators were developed to improve resilience in the downtown area of Seo-gu, Incheon. Preliminary indicators were derived from prior studies on similar resilience indicators for urban regeneration, and an expert opinion survey was conducted to analyze the suitability and importance of the indicators. Activation indicators for improving urban resilience were established in six areas: population stability, social inclusion, industrial diversity, local productivity, environmental sustainability, and social-based convenience. From 60 preliminary indicators, 42 were selected through the expert opinion survey for securing an economically active population, establishing living infrastructure, improving the settlement environment, and upgrading industry to reflect the characteristics of Seo-gu, including its industrial complexes. It was found that diversification is necessary. Further study is still needed to improve the objectivity of the indicators and to calculate a resilience index. The significance of this study is that it looks at quantitative indicators, complements other studies on diagnosing regional decline, and presents realistic alternatives suited to the domestic situation based on the concept of resilience.

Home-based OD Matrix Production and Analysis Using Mobile Phone Data (이동통신 자료를 활용한 가정기반 OD 구축 및 분석)

  • Kim, Kyoungtae;Oh, Dongkyu;Lee, Inmook;Min, Jae Hong
    • Journal of the Korean Society for Railway / v.19 no.5 / pp.656-662 / 2016
  • Based on time-dependent location data of mobile phone users, origin-destination (OD) trips were produced by tracing each user's travel route and inferring its origin and destination. The system took the average signaling frequency into account, meaning that the longer the trip, the more frequently signals occur. The resulting OD matrix is home-based and limited to the Seoul Metropolitan Area. The OD matrix from the mobile phone data, aggregated to the cell level and then transformed to 'Dong' areas, was compared with the KTDB OD. The two were found to be highly correlated: the correlation coefficient is 0.98 between the OD of this study and the KTDB OD at the Si/Gun/Gu level, and 0.85 between the OD of this study and the KTDB OD at the Dong level.
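
The aggregation and comparison steps in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes each cell maps to exactly one zone (a hypothetical `cell_to_zone` lookup), and it takes the reported coefficients to be Pearson correlations. All names and numbers are invented.

```python
from collections import defaultdict
from math import sqrt


def aggregate_od(cell_od, cell_to_zone):
    """Sum cell-to-cell trips into zone-to-zone trips using a cell -> zone lookup."""
    zone_od = defaultdict(float)
    for (origin_cell, dest_cell), trips in cell_od.items():
        zone_od[(cell_to_zone[origin_cell], cell_to_zone[dest_cell])] += trips
    return dict(zone_od)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def compare_od(od_a, od_b):
    """Correlate two OD matrices over the union of their OD pairs (missing pairs count as 0)."""
    pairs = sorted(set(od_a) | set(od_b))
    return pearson([od_a.get(p, 0.0) for p in pairs],
                   [od_b.get(p, 0.0) for p in pairs])


# Invented example data: three cells falling into two zones.
cell_od = {("c1", "c3"): 10, ("c2", "c3"): 5, ("c3", "c1"): 7}
cell_to_zone = {"c1": "A", "c2": "A", "c3": "B"}
zone_od = aggregate_od(cell_od, cell_to_zone)
reference_od = {("A", "B"): 30, ("B", "A"): 14}
print(zone_od, compare_od(zone_od, reference_od))
```

In practice a cell rarely falls inside a single administrative area, so a real conversion would need some rule for splitting a cell's trips across zones; the one-to-one lookup is the simplest possible choice.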

A Dual Processing Load Shedding to Improve The Accuracy of Aggregate Queries on Clustering Environment of GeoSensor Data Stream (클러스터 환경에서 GeoSensor 스트림 데이터의 집계질의의 정확도 향상을 위한 이중처리 부하제한 기법)

  • Ji, Min-Sub;Lee, Yeon;Kim, Gyeong-Bae;Bae, Hae-Young
    • Journal of the Korea Society of Computer and Information / v.17 no.1 / pp.31-40 / 2012
  • u-GIS DSMSs have been studied as a way to handle the various sensor data produced by GeoSensors in ubiquitous environments, and high availability has become increasingly important for them. GeoSensor data can grow explosively, which can lead to memory overflow and data loss. Various load shedding methods have been studied to solve this problem. Traditional methods drop overloaded tuples according to a particular criterion on a single server, so queries that are sensitive to tuple deletion, such as aggregation, struggle to maintain accuracy. This paper proposes a dual processing load shedding method that improves the accuracy of aggregation in a clustering environment. In this method, two nodes hold replicated stream data for high availability. Because they share the same stream data, the two nodes process the stream between them. The stream data are synchronized between the nodes in units of a window, and the processed results are then merged. As a result, query accuracy improves without data loss.
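
The contrast between single-server load shedding and the dual processing idea can be sketched roughly as below. The abstract does not say how a window is divided between the two nodes, so the split used here (node A takes the first `capacity` tuples, node B takes the overflow) is purely an assumption, as are all function names.

```python
def partial_aggregate(values):
    """Partial result for SUM/COUNT/AVG queries: (sum, count)."""
    return (sum(values), len(values))


def merge(a, b):
    """Merge two partial (sum, count) results into one."""
    return (a[0] + b[0], a[1] + b[1])


def single_node_window(window, capacity):
    """Traditional shedding on one server: tuples beyond capacity are dropped."""
    kept = window[:capacity]
    total, count = partial_aggregate(kept)
    return total, count, len(window) - len(kept)


def dual_process_window(window, capacity):
    """Two nodes holding the same replicated window each process a disjoint share."""
    node_a_share = window[:capacity]
    node_b_share = window[capacity:2 * capacity]
    dropped = len(window) - len(node_a_share) - len(node_b_share)
    total, count = merge(partial_aggregate(node_a_share),
                         partial_aggregate(node_b_share))
    return total, count, dropped


window = list(range(1, 11))            # ten tuples arrive in one window
print(single_node_window(window, 6))   # (21, 6, 4): four tuples lost
print(dual_process_window(window, 6))  # (55, 10, 0): nothing lost
```

Sum and count merge exactly because they are decomposable aggregates; an average can then be derived from the merged pair, whereas something like a median would need a different merge step.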

A Data Transformation Method for Visualizing the Statistical Information based on the Grid (격자 기반의 통계정보 표현을 위한 데이터 변환 방법)

  • Kim, Munsu;Lee, Jiyeong
    • Spatial Information Research / v.23 no.5 / pp.31-40 / 2015
  • The purpose of this paper is to propose a data transformation method for visualizing statistical information on a grid system with cells of regular shape and size. A grid is a better solution than administrative boundaries or census blocks for checking the distribution of statistical information, and it can be used flexibly as a spatial unit on a map. However, the current method, areal interpolation, requires an additional process to convert various kinds of statistical information to a grid. This paper therefore proposes three steps for converting statistical information to a grid: 1) geocoding the statistical information, 2) converting the spatial information by defining the spatial relationship, and 3) transforming the attributes with regard to the data's scale of measurement. The method is applied to convert the population density of Seoul to a grid. In particular, spatial autocorrelation analysis is performed to check whether the grid display is consistent when the reference data differ for the same statistic. The two grid distributions turn out to be similar when population density data represented by census block and by building are converted to the grid. The implementation demonstrates that the proposed method can perform consistent data conversion.
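
A minimal sketch of the second and third steps listed in this abstract, assuming the statistics have already been geocoded to (x, y) points. The point-in-cell rule, the function names, and the choice of summing counts but averaging rates are illustrative assumptions, not the paper's definitions.

```python
from collections import defaultdict
from math import floor


def cell_of(x, y, cell_size):
    """Grid cell index (column, row) that contains the point (x, y)."""
    return (floor(x / cell_size), floor(y / cell_size))


def to_grid(points, cell_size, how="sum"):
    """Assign (x, y, value) points to grid cells and combine the values per cell.

    how="sum" suits count-like attributes (e.g. population);
    how="mean" suits rate-like attributes (e.g. a density or an index).
    """
    buckets = defaultdict(list)
    for x, y, value in points:
        buckets[cell_of(x, y, cell_size)].append(value)
    if how == "sum":
        return {cell: sum(values) for cell, values in buckets.items()}
    if how == "mean":
        return {cell: sum(values) / len(values) for cell, values in buckets.items()}
    raise ValueError("how must be 'sum' or 'mean'")


# Invented points: coordinates in metres, one value per geocoded record.
points = [(10, 10, 3), (90, 40, 5), (120, 30, 2)]
print(to_grid(points, cell_size=100))              # {(0, 0): 8, (1, 0): 2}
print(to_grid(points, cell_size=100, how="mean"))  # {(0, 0): 4.0, (1, 0): 2.0}
```

The `how` switch is the part that corresponds to treating attributes differently by measurement scale: adding two densities is meaningless, while adding two head counts is not.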

A Comparative Analysis of Areal Interpolation Methods for Representing Spatial Distribution of Population Subgroups (하위인구집단의 분포 재현을 위한 에어리얼 인터폴레이션의 비교 분석)

  • Cho, Daeheon
    • Spatial Information Research / v.22 no.3 / pp.35-46 / 2014
  • Population data are usually provided at administrative spatial units in Korea, so areal interpolation is needed for fine-grained analysis. This study aims to compare various methods of areal interpolation for population subgroups rather than the total population. We estimated the number of elderly people and single-person households for small areal units from Dong data by the different interpolation methods using 2010 census data of Seoul, and compared the estimates to actual values. As a result, the performance of areal interpolation methods varied between the total population and subgroup populations as well as between different population subgroups. It turned out that the method using GWR (geographically weighted regression) and building type data outperformed other methods for the total population and households. However, the OLS regression method using building type data performed better for the elderly population, and the OLS regression method based on land use data was the most effective for single-person households. Based on these results, spatial distribution of the single elderly was represented at small areal units, and we believe that this approach can contribute to effective implementation of urban policies.

Evaluation of Metro Services based on Transit Smart Card Data (A Case Study of Incheon Line 1) (스마트카드 데이터를 활용한 도시철도 서비스 평가 (인천 1호선의 차내혼잡과 정시성을 중심으로))

  • Eom, Jin-Ki;Choi, Myoung-Hun;Kim, Dae-Sung;Lee, Jun;Song, Ji-Young
    • Journal of the Korean Society for Railway / v.15 no.1 / pp.80-87 / 2012
  • This study analyzed the quality of the commuter rail service on Incheon Line 1 with respect to two service measures, occupancy (crowdedness) and punctuality, based on transit smart card data collected in 2009. To analyze metro service by individual fleet, we aggregated the individual-level card data to the fleet operated in each planned schedule. The results show a low level of service for both crowdedness and punctuality during peak hours on the line segment from 'Gyeyang' to 'International business district'. Further, a close relationship between vehicle occupancy and punctuality is found, which indicates that high passenger demand causes successive metro delays.

A Study on Uncertainty in the Probable Precipitation According to Precipitation Recording Methods (강수량 기록방식에 따른 확률강수량 산정의 불확실성 고찰)

  • Heeseong Park;Hyoung Seop Kim
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.259-259 / 2023
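  • Precipitation data are one of the basic types of hydrological data, and their accuracy can vary with the recording method used when the data are collected. The most commonly used method is on-the-hour recording, but in an actual rain gauge, precipitation events are recorded first. On-the-hour recording reads the precipitation accumulated in the gauge at a predetermined observation time (on the hour) and records it as is, whereas precipitation event recording records the time at which precipitation reaching the minimum observation resolution occurs. Even for identical precipitation, the recording method can lead to different results in later analysis. In particular, it can increase the uncertainty in estimating probable precipitation. To analyze the uncertainty arising from the recording method, this study simulated a large amount of rainfall using a rainfall simulation technique, recorded it with the two methods above, estimated probable precipitation from the recorded data, and compared the result with probable precipitation estimated directly from the data that had not been converted into records, thereby comparing the uncertainty of each method. The data were also recorded with different minimum units of measurement and analyzed again to see how the minimum unit of measurement affects uncertainty under each recording method. If these results are reflected in future improvements to how precipitation records are managed, they should help make hydrological analysis more accurate.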
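
The two recording methods described in this abstract can be contrasted with a small sketch. It assumes a gauge that registers one event each time the accumulated depth reaches the resolution, and on-the-hour readings derived by counting events per interval; the input format, function names, and numbers are invented for illustration and are not taken from the study.

```python
def event_record(rain, resolution):
    """Times at which the gauge registers one more `resolution` of depth.

    `rain` is a list of (time, depth) increments in chronological order.
    """
    events, accumulated = [], 0.0
    for time, depth in rain:
        accumulated += depth
        while accumulated >= resolution:
            events.append(time)
            accumulated -= resolution
    return events


def on_the_hour_record(events, resolution, interval, end_time):
    """Depth read at each fixed time t: events registered in (t - interval, t]."""
    readings = {}
    for t in range(interval, end_time + 1, interval):
        tips = sum(1 for e in events if t - interval < e <= t)
        readings[t] = tips * resolution
    return readings


# Invented rainfall: (minute, depth in mm); 0.75 mm actually falls in each hour.
rain = [(10, 0.25), (50, 0.5), (70, 0.5), (110, 0.25)]
events = event_record(rain, resolution=0.5)
print(events)                                         # [50, 70, 110]
print(on_the_hour_record(events, 0.5, 60, 120))       # {60: 0.5, 120: 1.0}
```

Here the same rainfall, 0.75 mm in each hour, is recorded as 0.5 mm and 1.0 mm because depth below the resolution carries over into the next hour. This is the kind of discrepancy that can propagate into later frequency analysis.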


Prioritization of Flood Restoration Projects by Administrative Districts (행정구역별 치수사업의 우선순위 결정)

  • Kang, Seongkyu;Choi, Si Jung;Lee, Dong Ryul
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / pp.470-470 / 2018
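  • This study aims to survey the damage caused by the heavy rainfall that occurred in the Busan area in 2014 for administrative districts at the eup/myeon/dong level, and to find a method for prioritizing flood restoration projects across those districts. The damage survey covered casualties (deaths and number of displaced persons), damage to buildings and vessels, damage from flooded farmland, and damage to public facilities. The amount of damage was also totaled by damage factor and used as an evaluation criterion for the flood restoration projects. The economic feasibility of each project was reflected through the results of a B/C analysis. Priorities for the administrative districts were analyzed with the PROMETHEE and ELECTRE methods of multi-criteria analysis, using the T-score method for standardization and the entropy method for weights. Although this study attempted verification by applying the approach to restoration projects for a rainfall event that actually occurred, it is expected that the approach can be extended to selecting and prioritizing various alternatives for flood control projects and to fields such as actual urban development and maintenance projects.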
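
The standardization and weighting steps named in this abstract can be sketched with their textbook definitions: T = 50 + 10z for the T-score, and the usual entropy weight method. The sketch assumes the population standard deviation and strictly positive criterion values. It does not implement PROMETHEE or ELECTRE, and the data are invented.

```python
from math import log, sqrt


def t_scores(values):
    """Standardize values to T-scores: mean 50, standard deviation 10.

    Assumes the values are not all identical (non-zero standard deviation).
    """
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [50 + 10 * (v - mean) / sd for v in values]


def entropy_weights(matrix):
    """Entropy weights for a decision matrix (rows: districts, columns: criteria).

    A criterion whose values are spread unevenly across districts has lower
    entropy and therefore receives a larger weight.
    """
    m = len(matrix)
    divergences = []
    for column in zip(*matrix):
        total = sum(column)
        shares = [v / total for v in column]
        entropy = -sum(p * log(p) for p in shares if p > 0) / log(m)
        divergences.append(1 - entropy)
    divergence_sum = sum(divergences)
    return [d / divergence_sum for d in divergences]


# Invented data: three districts, two criteria (the first is identical everywhere).
damage = [[1, 10], [1, 20], [1, 30]]
print(t_scores([10, 20, 30]))
print(entropy_weights(damage))
```

The first criterion takes the same value in every district, so it cannot distinguish between them and its weight is essentially zero; nearly all of the weight goes to the second criterion.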
