A Study on the Promotion of Electronic Government and Plans for Archival Management (전자정부 추진과 기록관리방안)

  • Kim, Jae-hun
    • The Korean Journal of Archival Studies
    • /
    • no.5
    • /
    • pp.39-85
    • /
    • 2002
  • This paper proposes policies for managing archives in the course of promoting the Electronic Government system. Although there have been many studies of the electronic government project and plans for its establishment, this research examines the electronic government system and its problems from the standpoint of archival science. The findings are as follows. The development of information technology demands great changes at every level, from the nation to the individual. The use of computer programs for business purposes, the computerization of information materials, and the efficient search and use of electronic documents have become commonplace. Therefore, more and more countries have been seeking to promote 'Electronic Government', which applies the fruits of information technology to the administrative process. Recently, Korea has been moving rapidly into the 'Electronic Government' system, in contrast to the traditional way of administration. In the electronic government system, the 'Life Cycle' of public records will be computerized. It is therefore important that the archival management system change and develop along with the government's policies for the 'electronic government project'. This means that the archival management system, which has emphasized textual records, should be converted to an electronic records system. In other words, records management in the electronic government system requires not merely the transfer and preservation of records but a consistent management system covering the whole process of creating, appraising, arranging, preserving and using them. Thus the systematic management of electronic records plays an important role in realizing electronic government, and at the same time it is itself a task to be accomplished by electronic government. 
However, the government has long overlooked the importance of archival management, especially the importance of an electronic records management system. First, this research attempts to identify limits and problems through a theoretical review of existing studies on electronic government and to clarify the relationship between electronic government and archival management. On this basis, it reviews the present condition of archival management in the process of promoting electronic government and suggests policies for achieving a successful electronic government and building a scientific archival management system. Since the early 1990s, many countries have been making every effort to realize 'Electronic Government'. Examples from other nations make it clear that the realization of electronic government is closely connected with archival management policies. Korea has completed the legal and institutional arrangements, including the newly established "Electronic Government Law", needed to realize electronic government. Korea has also been promoting electronic government with the Ministry of Government Administration and Home Affairs and the Government Computer Center (GCC) as the leaders. Although records management, especially the management of electronic records, is essential to the electronic government system, this issue has not yet been discussed in Korea. This is shown by the fact that the Government Archives and Records Service (GARS) has played little role in promoting the electronic government project. There are two problems with this situation. First, the present system cannot support a consistent 'Life Cycle' ranging from the creation to the preservation of electronic records. Second, the 'Life Cycle' of electronic records is divided into two parts that are managed separately by GCC and GARS. The life of records does not end with the process ranging from creation to distribution. 
Rather, the value of records is recognized only through the whole procedure. Therefore, GARS should play a leading role in designing and establishing the archival management system. The answers to these problems are as follows. First, we have to complete the electronic records management system by introducing ERMS, not EDMS. This means that we should not move towards ERMS merely by supplementing the current electronic records management system. It is important and proper to establish an ERMS from the very beginning of the process of promoting electronic government. Second, I suggest the developmental integration of GARS and GCC. At present, the divided operations of GCC and GARS, the former serving as the management center for electronic business and the latter as the hub institution for managing the nation's records and archives, create many obstacles to establishing the electronic government system and to carrying out systematic archival management. Therefore, I conclude that expansion into a 'National Archives' through the integration of the related agencies will contribute greatly to the realization of electronic government and the establishment of an archival management system. In addition, it will be of much help to set up and operate a 'Task Force' on the management of electronic records with the two institutions at its center.

Improved Social Network Analysis Method in SNS (SNS에서의 개선된 소셜 네트워크 분석 방법)

  • Sohn, Jong-Soo;Cho, Soo-Whan;Kwon, Kyung-Lag;Chung, In-Jeong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.117-127
    • /
    • 2012
  • Due to the recent expansion of Web 2.0-based services and the widespread adoption of smartphones, online social network services are becoming popular among users. Online social network services are online community services that enable users to communicate with each other, share information and expand human relationships. In social network services, the relations between users are represented by a graph consisting of nodes and links. As the number of users of online social network services increases rapidly, SNS are actively used in enterprise marketing, the analysis of social phenomena and so on. Social Network Analysis (SNA) is a systematic way to analyze social relationships among the members of a social network using network theory. In general, a social network consists of nodes and arcs, and it is often depicted in a social network diagram. In a social network diagram, nodes represent individual actors within the network and arcs represent relationships between the nodes. With SNA, we can measure relationships among people such as degree of intimacy, intensity of connection and classification of groups. Since Social Networking Services (SNS) began drawing attention from millions of users, numerous studies have analyzed their user relationships and messages. The typical representative SNA methods are degree centrality, betweenness centrality and closeness centrality. In degree centrality analysis, the shortest path between nodes is not considered. However, it is a crucial factor in betweenness centrality, closeness centrality and other SNA methods. In previous SNA research, computation time was not a major concern because the social networks studied were small. Unfortunately, most SNA methods require significant time to process the relevant data, which makes it difficult to apply them to the ever-increasing volume of SNS data in social network studies. 
For instance, if the number of nodes in an online social network is n, the maximum number of links is n(n-1)/2. Analyzing such a network is therefore very expensive: if the number of nodes is 10,000, the number of links can reach 49,995,000. Therefore, we propose a heuristic-based method for finding the shortest path among users in the SNS user graph. Using this shortest-path method, we show the efficiency of the proposed approach by conducting betweenness centrality analysis and closeness centrality analysis, both of which are widely used in social network studies. Moreover, we devised an enhanced method that adds a best-first search and a preprocessing step to reduce computation time and to find shortest paths rapidly in a very large online social network. The best-first search method finds the shortest path heuristically, using a heuristic that generalizes human experience. Because a large share of the links in online social networks belongs to only a few nodes, most nodes have relatively few connections. As a result, a node with many connections functions as a hub node. When searching for a particular node, looking at users with numerous links first, instead of searching all users indiscriminately, has a better chance of finding the desired node quickly. In this paper, we employ the degree of user node vn as the heuristic evaluation function in a graph G = (N, E), where N is a set of vertices and E is a set of links between two different nodes. With this heuristic evaluation function, the worst case occurs when the target node is situated at the bottom of a skewed tree. In order to remove such target nodes, a preprocessing step is conducted. Next, we efficiently find the shortest path between two nodes in the social network and then analyze the network. To verify the proposed method, we crawled data on 160,000 people online and constructed a social network from them. 
We then compared the proposed method with previous methods, namely best-first search and breadth-first search, in terms of search and analysis time. The suggested method takes 240 seconds to search nodes, whereas the breadth-first-search-based method takes 1,781 seconds (7.4 times faster). Moreover, for social network analysis, the suggested method is 6.8 times faster in betweenness centrality analysis and 1.8 times faster in closeness centrality analysis. The proposed method shows that a large social network can be analyzed with better time performance. As a result, our method would improve the efficiency of social network analysis, making it particularly useful in studying social trends or phenomena.
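
The link-count bound and the degree-guided search described in this abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names and the toy graph are hypothetical, the preprocessing step is omitted, and a plain greedy best-first search such as this one does not guarantee that the path it returns is the shortest.

```python
import heapq


def max_links(n):
    """Maximum number of undirected links among n nodes: n(n-1)/2."""
    return n * (n - 1) // 2


def degree_best_first_path(graph, source, target):
    """Greedy best-first search that expands high-degree nodes first.

    graph maps each node to the set of its neighbours (undirected).
    Returns a path as a list of nodes, or None if target is unreachable.
    The returned path is not guaranteed to be the shortest one.
    """
    # heapq is a min-heap, so the negated degree makes hubs pop first.
    frontier = [(-len(graph[source]), source)]
    parent = {source: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                heapq.heappush(frontier, (-len(graph[neighbour]), neighbour))
    return None


# Hypothetical toy network: "hub" has the most links.
toy = {
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
    "hub": {"a", "b", "c"},
}
path = degree_best_first_path(toy, "a", "d")
print(max_links(10_000), path)
```

On the toy network the search goes from "a" through "hub" and "c" to "d", visiting the better-connected neighbour "c" before the leaf "b".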

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.23-45
    • /
    • 2020
  • Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales sites, and SNS, and the characteristics of the datasets are also diverse. To secure competitiveness, companies need to improve their decision-making capacity using classification algorithms. However, most companies do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm is appropriate for the characteristics of a dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features that reflect the characteristics of multi-class datasets. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration index, as a replacement for the IR (Imbalanced Ratio). We also developed a new index, the Reverse ReLU Silhouette Score, and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality(red), Contraceptive Method Choice) were selected. Each dataset was classified using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross validation. 
Oversampling of 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The meta-feature HHI proposed in this study has a significant effect on classification performance. (2) The number of variables has a significant positive effect on classification performance, in contrast to the number of classes. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at the 0.01 significance level. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by classification algorithm were consistent with these findings. In the regression analysis by classification algorithm, the number of variables does not have a significant effect for the Naïve Bayes algorithm, unlike the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI and the Reverse ReLU Silhouette Score) were shown to be significant, and (2) the effects of data characteristics on classification performance were investigated using meta-features. The practical contributions are as follows. (1) The results can be utilized in developing a classification algorithm recommendation system based on the characteristics of datasets. 
(2) Because the characteristics of data differ, many data scientists repeatedly test algorithms while adjusting their parameters to find the optimal algorithm for a given situation. This process wastes considerable resources in hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The paper consists of an introduction, related research, research model, experiment, and conclusion and discussion.
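
As an illustration of how a concentration index can describe class imbalance, the sketch below computes the HHI of a label vector as the sum of squared class shares, alongside a simple imbalance ratio. The abstract does not give the exact formulas the authors used, so these definitions (and the function names and sample labels) are assumptions based on the standard HHI.

```python
from collections import Counter


def class_hhi(labels):
    """Herfindahl-Hirschman Index of the class distribution.

    Sum of squared class shares: 1/k for k perfectly balanced classes,
    approaching 1.0 as a single class dominates.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((count / total) ** 2 for count in counts.values())


def imbalance_ratio(labels):
    """Largest class size divided by smallest class size (assumed IR definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical label vectors for illustration only.
balanced = ["a", "b", "c", "d"] * 25
skewed = ["a"] * 70 + ["b"] * 20 + ["c"] * 10

print(class_hhi(balanced), class_hhi(skewed), imbalance_ratio(skewed))
```

Unlike the ratio, which looks only at the largest and smallest classes, the HHI uses every class share, which may be why it suits multi-class data.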

Identifying Bridging Nodes and Their Essentiality in the Protein-Protein Interaction Networks (단백질 상호작용 네트워크에서 연결노드 추출과 그 중요도 측정)

  • Ahn, Myoung-Sang;Ko, Jeong-Hwan;Yoo, Jae-Soo;Cho, Wan-Sup
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.12 no.5
    • /
    • pp.1-13
    • /
    • 2007
  • In this research, we found that bridging nodes have a great effect on the robustness of protein-protein interaction networks. Until now, many researchers have treated a node's degree as the measure of its essentiality. Hub nodes in a scale-free network are essential to network robustness. Some researchers have tried to relate a node's essentiality to its betweenness centrality. These betweenness-based approaches are reasonable, but a node's degree and its betweenness centrality are positively related, so the two approaches do not differ in practice. We first define a bridging node as a node with low connectivity and a high betweenness value, and then verify that such bridging nodes are a primary factor in network robustness. Using a biological network database from the Internet, we demonstrate that removing bridging nodes severely fragments the entire network, which shows their importance for network robustness.
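
The definition of a bridging node (low degree, high betweenness) can be sketched as follows. Betweenness is computed here with Brandes' algorithm for unweighted, undirected graphs; the thresholds, function names and toy network are hypothetical and are not taken from the paper.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in graph}      # shortest-path predecessors
        sigma = {v: 0 for v in graph}       # number of shortest paths
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):           # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}


def bridging_nodes(graph, max_degree, min_betweenness):
    """Nodes with low degree and high betweenness (thresholds are assumptions)."""
    scores = betweenness(graph)
    return sorted(v for v in graph
                  if len(graph[v]) <= max_degree and scores[v] >= min_betweenness)


# Hypothetical toy network: two triangles joined only through node "x".
edges = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
         ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
         ("a1", "x"), ("x", "b1")]
toy = {}
for u, v in edges:
    toy.setdefault(u, set()).add(v)
    toy.setdefault(v, set()).add(u)

scores = betweenness(toy)
bridges = bridging_nodes(toy, max_degree=2, min_betweenness=5.0)
print(scores["x"], bridges)
```

In the toy network "x" has only two links but lies on every path between the two triangles, so removing it splits the network in two, whereas "a1" and "b1" score almost as high on betweenness but have higher degree.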

A Folksonomy Ranking Framework: A Semantic Graph-based Approach (폭소노미 사이트를 위한 랭킹 프레임워크 설계: 시맨틱 그래프기반 접근)

  • Park, Hyun-Jung;Rho, Sang-Kyu
    • Asia pacific journal of information systems
    • /
    • v.21 no.2
    • /
    • pp.89-116
    • /
    • 2011
  • In collaborative tagging systems such as Delicious.com and Flickr.com, users assign keywords or tags to their uploaded resources, such as bookmarks and pictures, for future use or for sharing. The collection of resources and tags generated by a user is called a personomy, and the collection of all personomies constitutes the folksonomy. The most significant need of folksonomy users is to efficiently find useful resources or experts on specific topics. An excellent ranking algorithm would assign higher rankings to more useful resources or experts. What resources are considered useful in a folksonomic system? Does a standard superior to frequency or freshness exist? A resource recommended by more users with more expertise should be worthy of attention. This ranking paradigm can be implemented through a graph-based ranking algorithm. Two well-known representatives of such a paradigm are PageRank by Google and HITS (Hypertext Induced Topic Selection) by Kleinberg. Both PageRank and HITS assign a higher evaluation score to pages linked to more higher-scored pages. HITS differs from PageRank in that it utilizes two kinds of scores: authority and hub scores. The ranking objects of these algorithms are limited to Web pages, whereas the ranking objects of a folksonomic system are somewhat heterogeneous (i.e., users, resources, and tags). Therefore, uniformly applying the link-based voting notion of PageRank and HITS to a folksonomy would be unreasonable. In a folksonomic system, each link corresponding to a property can have an opposite direction, depending on whether the property is expressed in the active or the passive voice. The current research stems from the idea that a graph-based ranking algorithm could be applied to a folksonomic system using the concept of mutual interactions between entities, rather than the voting notion of PageRank or HITS. 
The concept of mutual interactions, proposed for ranking Semantic Web resources, enables the calculation of importance scores of various resources unaffected by link directions. The weights of a property representing the mutual interaction between classes are assigned depending on the relative significance of the property to the resource importance of each class. This class-oriented approach is based on the fact that, in the Semantic Web, there are many heterogeneous classes; thus, applying a different appraisal standard for each class is more reasonable. This is similar to the way humans evaluate, where different items are assigned specific weights, which are then combined into a weighted average. We can check for missing properties more easily with this approach than with predicate-oriented approaches. A user of a tagging system usually assigns more than one tag to the same resource, and there can be more than one tag with the same subjectivity and objectivity. When many users assign similar tags to the same resource, it becomes necessary to grade the users differently depending on the assignment order. This idea comes from studies in psychology in which expertise involves the ability to select the most relevant information for achieving a goal. An expert should be someone who not only has a large collection of documents annotated with a particular tag, but also tends to add documents of high quality to his/her collections. Such documents are identified by the number, as well as the expertise, of users who have the same documents in their collections. In other words, there is a relationship of mutual reinforcement between the expertise of a user and the quality of a document. In addition, there is a need to rank entities related more closely to a certain entity. Because the popularity of a topic in social media is temporary, recent data should have more weight than old data. 
We propose a comprehensive folksonomy ranking framework in which all these considerations are dealt with and that can be easily customized to each folksonomy site for ranking purposes. To examine the validity of our ranking algorithm and show the mechanism of adjusting property, time, and expertise weights, we first use a dataset designed for analyzing the effect of each ranking factor independently. We then show the ranking results of a real folksonomy site, with the ranking factors combined. Because the ground truth of a given dataset is not known when it comes to ranking, we inject simulated data whose ranking results can be predicted into the real dataset and compare the ranking results of our algorithm with those of a previous HITS-based algorithm. Our semantic ranking algorithm based on the concept of mutual interaction seems to be preferable to the HITS-based algorithm as a flexible folksonomy ranking framework. Some concrete points of difference are as follows. First, with the time concept applied to the property weights, our algorithm shows superior performance in lowering the scores of older data and raising the scores of newer data. Second, applying the time concept to the expertise weights, as well as to the property weights, our algorithm controls the conflicting influence of expertise weights and enhances the overall consistency of time-valued ranking. The expertise weights of the previous study can act as an obstacle to time-valued ranking because the number of followers increases as time goes on. Third, many new properties and classes can be included in our framework. The previous HITS-based algorithm, based on the voting notion, loses ground when the domain consists of more than two classes, or when other important properties, such as "sent through Twitter" or "registered as a friend," are added to the domain. Fourth, there is a big difference in calculation time and memory use between the two kinds of algorithms. 
While the multiplication of two matrices has to be executed twice for the previous HITS-based algorithm, this is unnecessary with our algorithm. In our ranking framework, various folksonomy ranking policies can be expressed by combining the ranking factors, and our approach can work even if the folksonomy site is not implemented with Semantic Web languages. Above all, the time weight proposed in this paper will be applicable to various domains, including social media, where time value is considered important.
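
For readers unfamiliar with the HITS baseline that this abstract contrasts itself with, the sketch below shows the standard hub/authority iteration on a small directed graph. It illustrates plain HITS only, not the authors' mutual-interaction framework or their time and expertise weights; the node names and link data are hypothetical.

```python
def hits(links, iterations=50):
    """Plain HITS power iteration.

    links maps a source node to the list of nodes it points to.
    Returns (hub, authority) score dictionaries, each L2-normalized.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the nodes linking in.
        auth = {n: 0.0 for n in nodes}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        norm = sum(score * score for score in auth.values()) ** 0.5
        auth = {n: score / norm for n, score in auth.items()}
        # Hub update: sum of authority scores of the nodes linked to.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        norm = sum(score * score for score in hub.values()) ** 0.5
        hub = {n: score / norm for n, score in hub.items()}
    return hub, auth


# Hypothetical bookmarking data: users (u*) point to resources (r*).
links = {
    "u1": ["r1", "r2"],
    "u2": ["r1"],
    "u3": ["r1", "r2"],
}
hub, auth = hits(links)
print(max(auth, key=auth.get), hub["u1"] > hub["u2"])
```

Each iteration performs two passes over the links, one per score type, which corresponds to the repeated matrix multiplication the abstract mentions as a cost of the HITS-based approach.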

Multilateral Approach to forming Air Logistics Hub on North East Asia Region (동북아 항공물류허브을 구축하기 위한 다자적 접근방안)

  • Hong, Seock-Jin
    • The Korean Journal of Air & Space Law and Policy
    • /
    • v.19 no.2
    • /
    • pp.97-136
    • /
    • 2004
  • The Northeast Asian air cargo market has expanded tremendously as a result of the opening up of the Chinese market. The importance of the Asia-Pacific region in global air transport has also increased. The exchange of human and material resources, services, and information in Northeast Asia, which is expected to increase in the near future, requires that the airlines operating within this region adopt a more liberalized approach. This paper introduces alternatives that can be applied to the Northeast Asian airline industry so as to bring about the integration of regional air transport. First, individual Northeast Asian nations need to alter their policies towards the airline industry. Second, each country should further liberalize its domestic air transport. Third, freer air service agreements need to be signed between the nations of Northeast Asia. Fourth, the strategic alliances between the airlines operating in Northeast Asia should be further strengthened. Fifth, this liberalization process should be carried out in an incremental manner, beginning with more competitive airports and routes, or with less-in-demand routes. Sixth, a shuttle system needs to be put into place between the main airports in China, Korea, and Japan. Seventh, these three nations should jointly develop aviation safety and security systems that are in accordance with international standards. Eighth, the liberalization of the aviation industry should be undertaken in conjunction with other related fields. Ninth, bodies linking the civil aviation organizations of the Asia-Pacific area should be formed, as should bodies linking the governments. By doing so, these countries will be able to establish regular venues through which to exchange opinions on the integration and liberalization of the air cargo market so as to induce the gradual liberalization of the actual market. 
The liberalization of the air transport in Northeast Asia will prove to be a daunting task in the short term. However, if the Chinese airlines continue to exhibit continuous growth and Japanese airlines are able to complete their move towards a low-cost structure, this process could be completed earlier than expected. Over the last twenty five years the air transport has undergone tremendous changes. The most important factor behind these changes has been the increased liberalization of the market. As a result, rates have decreased while demand has increased. This has resulted in turning the air transport industry, which was long perceived as an industry in decline, into a high-growth industry. The only method of increasing regional exchanges in the air transport is to pursue further liberalization. The country which implements this liberalization process at the earliest date may very well emerge as a leading force within the air transport industry.

THE LUMINOSITY-LINEWIDTH RELATION AS A PROBE OF THE EVOLUTION OF FIELD GALAXIES

  • GUHATHAKURTA PURAGRA;ING KRISTINE;RIX HANS-WALTER;COLLESS MATTHEW;WILLIAMS TED
    • Journal of The Korean Astronomical Society
    • /
    • v.29 no.spc1
    • /
    • pp.63-64
    • /
    • 1996
  • The nature of distant faint blue field galaxies remains a mystery, despite the fact that much attention has been devoted to this subject in the last decade. Galaxy counts, particularly those in the optical and near ultraviolet bandpasses, have been demonstrated to be well in excess of those expected in the 'no-evolution' scenario. This has usually been taken to imply that galaxies were brighter in the past, presumably due to a higher rate of star formation. More recently, redshift surveys of galaxies as faint as B$\sim$24 have shown that the mean redshift of faint blue galaxies is lower than that predicted by standard evolutionary models (designed to fit the galaxy counts). The galaxy number count data and redshift data suggest that evolutionary effects are most prominent at the faint end of the galaxy luminosity function. While these data constrain the form of evolution of the overall luminosity function, they do not constrain evolution in individual galaxies. We are carrying out a series of observations as part of a long-term program aimed at a better understanding of the nature and amount of luminosity evolution in individual galaxies. Our study uses the luminosity-linewidth relation (Tully-Fisher relation) for disk galaxies as a tool to study luminosity evolution. Several studies of a related nature are being carried out by other groups. A specific experiment to test a 'no-evolution' hypothesis is presented here. We have used the AUTOFIB multifibre spectrograph on the 4-metre Anglo-Australian Telescope (AAT) and the Rutgers Fabry-Perot imager on the Cerro Tololo Inter-American Observatory (CTIO) 4-metre telescope to measure the internal kinematics of a representative sample of faint blue field galaxies in the redshift range z = 0.15-0.4. 
The emission line profiles of [OII] and [OIII] in a typical sample galaxy are significantly broader than the instrumental resolution (100-120 km $s^{-1}$), and it is possible to make a reliable determination of the linewidth. Detailed and realistic simulations based on the properties of nearby, low-luminosity spirals are used to convert the measured linewidth into an estimate of the characteristic rotation speed, making statistical corrections for the effects of inclination, non-uniform distribution of ionized gas, rotation curve shape, finite fibre aperture, etc. The (corrected) mean characteristic rotation speed for our distant galaxy sample is compared to the mean rotation speed of local galaxies of comparable blue luminosity and colour. The typical galaxy in our distant sample has a B-band luminosity of about 0.25 $L^\ast$ and a colour that corresponds to the Sb-Sd/Im range of Hubble types. Details of the AUTOFIB fibre spectroscopic study are described by Rix et al. (1996). Follow-up deep near infrared imaging with the 10-metre Keck telescope + NIRC combination and high angular resolution imaging with the Hubble Space Telescope's WFPC2 are being used to determine the structural and orientation parameters of galaxies on an individual basis. This information is being combined with the spatially resolved CTIO Fabry-Perot data to study the internal kinematics of distant galaxies (Ing et al. 1996). The two main questions addressed by these (preliminary) studies are: 1. Do galaxies of a given luminosity and colour have the same characteristic rotation speed in the distant and local Universe? The distant galaxies in our AUTOFIB sample have a mean characteristic rotation speed of $\sim$70 km $s^{-1}$ after correction for measurement bias (Fig. 1); this is inconsistent with the characteristic rotation speed of local galaxies of comparable photometric properties (105 km $s^{-1}$) at the $>99\%$ significance level (Fig. 2). 
A straightforward explanation for this discrepancy is that faint blue galaxies were about 1-1.5 mag brighter (in the B band) at z $\sim$ 0.25 than their present-day counterparts. 2. What is the nature of the internal kinematics of faint field galaxies? The linewidths of these faint galaxies appear to be dominated by the global disk rotation. The larger galaxies in our sample are about 2"-.5" in diameter so one can get direct insight into the nature of their internal velocity field from the $\sim$1" seeing CTIO Fabry-Perot data. A montage of Fabry-Perot data is shown in Fig. 3. The linewidths are too large (by $5\sigma$) to be caused by turbulence in giant HII regions.
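
The step from the rotation-speed discrepancy (about 70 versus 105 km/s) to a brightening of roughly 1-1.5 mag can be checked with a back-of-the-envelope calculation. The sketch assumes a power-law Tully-Fisher relation, L proportional to V to the power of a slope, so that the magnitude offset is 2.5 x slope x log10(V_local / V_distant). The slope values are assumptions for illustration; the abstract does not state the slope the authors adopted, and their analysis relies on detailed simulations rather than this shortcut.

```python
import math


def brightening_mag(v_local, v_distant, slope):
    """Magnitude offset implied by a power-law Tully-Fisher relation.

    Assumes L is proportional to V**slope, so a galaxy rotating at v_distant
    that is as luminous as a local galaxy rotating at v_local is brighter
    than expected by 2.5 * slope * log10(v_local / v_distant) magnitudes.
    """
    return 2.5 * slope * math.log10(v_local / v_distant)


# Rotation speeds quoted in the abstract (km/s); slopes are assumed values.
for slope in (2.5, 3.0, 3.5):
    print(slope, round(brightening_mag(105.0, 70.0, slope), 2))
```

For slopes between 2.5 and 3.5 the offset falls between about 1.1 and 1.5 mag, in line with the range quoted in the abstract.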

A Suggestion for Spatiotemporal Analysis Model of Complaints on Officially Assessed Land Price by Big Data Mining (빅데이터 마이닝에 의한 공시지가 민원의 시공간적 분석모델 제시)

  • Cho, Tae In;Choi, Byoung Gil;Na, Young Woo;Moon, Young Seob;Kim, Se Hun
    • Journal of Cadastre & Land InformatiX
    • /
    • v.48 no.2
    • /
    • pp.79-98
    • /
    • 2018
  • The purpose of this study is to suggest a model for analysing the spatio-temporal characteristics of civil complaints about the officially assessed land price, based on big data mining. Specifically, the underlying reasons for the civil complaints were sought from a spatio-temporal perspective rather than in institutional factors, and a model for monitoring trends in the occurrence of such complaints was suggested. The official documents of 6,481 civil complaints about the officially assessed land price in the district of Jung-gu, Incheon Metropolitan City, over the period from 2006 to 2015, along with their temporal and spatial properties, were collected and used for the analysis. Frequencies of major key words were examined using a text mining method. Correlations among major key words were studied through social network analysis. By calculating term frequency (TF) and term frequency-inverse document frequency (TF-IDF), which serve as weights for key words, we identified the major key words associated with the occurrence of civil complaints about the officially assessed land price. The spatio-temporal characteristics of the civil complaints were then examined through hot spot analysis based on the Getis-Ord $G_i^*$ statistic. It was found that the characteristics of the civil complaints were changing, forming clusters that are linked spatio-temporally. Using text mining and social network analysis, we found that the reasons for the occurrence of civil complaints about the officially assessed land price can be identified quantitatively from natural language. TF and TF-IDF, the weights of key words, can be used as main explanatory variables for analyzing the spatio-temporal characteristics of these civil complaints, since these statistics differ over time and across regions.
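
The TF and TF-IDF weights mentioned in this abstract can be illustrated with a minimal sketch. It uses one common variant (term frequency as a share of the document length, inverse document frequency as the natural log of N/df); the abstract does not say which variant the authors used, and the tokenized sample complaints are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Per-document TF-IDF weights for tokenized documents.

    tf  = term count / document length
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized complaints (not from the study's data).
docs = [
    ["land", "price", "too", "high"],
    ["land", "price", "lower", "than", "neighbor"],
    ["land", "tax", "burden"],
]
weights = tf_idf(docs)
print(weights[0]["land"], round(weights[2]["tax"], 4))
```

A term such as "land" that appears in every document gets zero weight, while a term concentrated in a few documents, such as "tax", is weighted up, which is what makes TF-IDF useful for singling out distinctive key words.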