• Title/Summary/Keyword: dataset

Recommender Systems using Structural Hole and Collaborative Filtering (구조적 공백과 협업필터링을 이용한 추천시스템)

  • Kim, Mingun;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems, v.20 no.4, pp.107-120, 2014
  • This study proposes a novel recommender system that uses structural hole analysis to reflect qualitative and emotional information in the recommendation process. Although collaborative filtering (CF) is known as the most popular recommendation algorithm, it has some limitations, including scalability and sparsity problems. The scalability problem arises when the volume of users and items becomes quite large: CF cannot scale up because the computation time for finding neighbors in the user-item matrix grows as the number of users and items increases in real-world e-commerce sites. Sparsity is a common problem of most recommender systems because users generally evaluate only a small portion of all items. In addition, the cold-start problem is a special case of the sparsity problem in which users or items are newly added to the system with no ratings at all. When the user's preference evaluation data is sparse, two users or items are unlikely to have common ratings, so CF ends up predicting ratings from a very limited number of similar users. Moreover, it may produce biased recommendations because similarity weights may be estimated using only a small portion of the rating data. In this study, we identify a novel limitation of conventional CF: it does not consider qualitative and emotional information about users in the recommendation process because it only utilizes the users' preference scores in the user-item matrix. To address this limitation, this study proposes a cluster-indexing CF model with structural hole analysis for recommendations. In general, a structural hole is a location that connects two separate actors without any redundant connections in the network. The actor who occupies the structural hole can easily access non-redundant, varied, and fresh information. 
Therefore, the actor who occupies the structural hole may be an important person in the focal network and may be the representative person of the focal subgroup in the network. Thus, his or her characteristics may represent the general characteristics of the users in the focal subgroup. In this sense, we can distinguish friends and strangers of the focal user by utilizing structural hole analysis. This study uses structural hole analysis to select the structural holes in subgroups as initial seeds for a cluster analysis. First, we gather data about users' preference ratings for items and their social network information. To gather the research data, we develop a data collection system. Then, we perform structural hole analysis and find the structural holes of the social network. Next, we use these structural holes as cluster centroids for the clustering algorithm. Finally, this study makes recommendations using CF within each user's cluster and compares the recommendation performance with that of comparative models. To implement the proposed model, we combine the results of two experiments. The first experiment is the structural hole analysis, for which this study employs UCINET version 6, a software package for the analysis of social network data. The second experiment performs the modified clustering and then CF using the result of the cluster analysis. For the second experiment, we develop an experimental system using VBA (Visual Basic for Applications) in Microsoft Excel 2007. For the modified clustering experiment, this study performs clustering based on a novel similarity measure, the Pearson correlation between user preference rating vectors. In addition, this study uses the 'all-but-one' approach for the CF experiment. To validate the effectiveness of the proposed model, we apply three comparative types of CF models to the same dataset. 
The experimental results show that the proposed model outperforms the other comparative models. In particular, statistical significance tests show that the proposed model performs significantly better than the two comparative models that use cluster analysis. However, the difference between the proposed model and the naive model is not statistically significant.
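
As a rough illustration of the similarity measure and evaluation protocol named in this abstract, the following Python sketch computes a Pearson correlation between user rating vectors and predicts one held-out rating from a user's cluster neighbours ('all-but-one'). It is not the authors' VBA system: the function names, the dictionary layout, and the mean-centred weighting formula are assumptions made for the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the items rated in both dicts (item -> rating)."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, item, neighbours):
    """Predict ratings[user][item] from neighbours in the same cluster.

    All-but-one: the held-out rating is hidden before similarities are computed.
    The mean-centred weighted average is an assumed, commonly used CF formula.
    """
    visible = {i: r for i, r in ratings[user].items() if i != item}
    base = sum(visible.values()) / len(visible)
    num = den = 0.0
    for other in neighbours:
        if other == user or item not in ratings[other]:
            continue
        weight = pearson(visible, ratings[other])
        mean_other = sum(ratings[other].values()) / len(ratings[other])
        num += weight * (ratings[other][item] - mean_other)
        den += abs(weight)
    return base + num / den if den else base


# Hypothetical toy data: two users in one cluster.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 3, "d": 5},
}
prediction = predict(ratings, "u1", "c", ["u2"])
```

Restricting `neighbours` to the members of the user's cluster is what ties the sketch to the cluster-indexing idea; with all users as neighbours it reduces to plain user-based CF.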

Airborne Hyperspectral Imagery availability to estimate inland water quality parameter (수질 매개변수 추정에 있어서 항공 초분광영상의 가용성 고찰)

  • Kim, Tae-Woo;Shin, Han-Sup;Suh, Yong-Cheol
    • Korean Journal of Remote Sensing, v.30 no.1, pp.61-73, 2014
  • This study reviewed applications of water quality estimation using Airborne Hyperspectral Imagery (A-HSI) and tested the estimation of water quality (specifically suspended solids) for a part of the Han River with available in-situ data. Water quality was estimated with two methods. One uses observation data of the downwelling radiance to the water surface and of scattering and reflectance in the water body. The other is a linear regression analysis between in-situ water quality measurements and upwelling data as at-sensor radiance (or reflectance). Both methods yield meaningful remote sensing (RS) estimates. However, the results are strongly affected by auxiliary datasets such as in-situ water quality measurements and water body scattering measurements. The test covered a part of the Han River located downstream of Paldang Dam. We applied linear regression analysis to AISA Eagle hyperspectral sensor data and in-situ water quality measurements. The linear regression for a meaningful band combination gives $-24.847+0.013L_{560}$, where L is the radiance at 560 nm, with an R-square of 0.985. For comparison with the Multispectral Imagery (MSI) case, we simulated Landsat TM data by spectral resampling. The regression using MSI gives -55.932 + 33.881 (TM1/TM3) in radiance, with an R-square of 0.968. The Suspended Solid (SS) concentration was about 3.75 mg/l in the in-situ data, while the SS concentration estimated at the same location was about 3.65 mg/l with A-HSI and about 5.85 mg/l with MSI. The MSI-based estimate thus tends toward overestimation. To raise the practical value of the method and to estimate more precisely, it is necessary to minimize the sun glint effect across the whole image, construct an elaborate flight plan that considers the solar altitude angle, and build a good pre-processing and calibration system. 
Through the literature review and a test that adopted general methods, we found limitations and restrictions concerning precise atmospheric correction, the number of water quality measurement samples, the retrieval of spectral bands from A-HSI, the selection of an adequate linear regression model, and quantitative calibration/validation methods.
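
The two regression equations quoted in this abstract can be written as small functions. The sketch below only restates the published coefficients; the function names are hypothetical, the abstract does not give the radiance units, and the input values are made up to show the call pattern rather than taken from the study.

```python
def ss_from_hyperspectral(l560):
    """Suspended solids (mg/l) from 560 nm radiance, using the A-HSI regression."""
    return -24.847 + 0.013 * l560


def ss_from_simulated_tm(tm1, tm3):
    """Suspended solids (mg/l) from the TM1/TM3 band ratio of simulated Landsat TM."""
    return -55.932 + 33.881 * (tm1 / tm3)


# Illustrative inputs only (not measurements from the paper).
ss_hsi = ss_from_hyperspectral(2200.0)
ss_msi = ss_from_simulated_tm(1.8, 1.0)
```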

Accuracy Evaluation of Daily-gridded ASCAT Satellite Data Around the Korean Peninsula (한반도 주변 해역에서의 ASCAT 해상풍 격자 자료의 정확성 평가)

  • Park, Jinku;Kim, Dae-Won;Jo, Young-Heon;Kim, Deoksu
    • Korean Journal of Remote Sensing, v.34 no.2_1, pp.213-225, 2018
  • To assess the accuracy of the gridded daily Advanced Scatterometer (hereafter DASCAT) ocean surface wind data around Korea, DASCAT was compared with wind data from buoys. In addition, reanalysis data for wind at 10 m provided by the European Centre for Medium-Range Weather Forecasts (ECMWF), the National Centers for Environmental Prediction and National Center for Atmospheric Research (NCEP/NCAR, hereafter NCEP), and the Modern Era Retrospective-analysis for Research and Applications-2 (MERRA-2, hereafter MERRA) were compared and analyzed. As a result, the RMSE of DASCAT against the actual wind speed is about 3 m/s. The zonal wind components of the buoys and DASCAT are strongly correlated (above 0.8), while the meridional components are less correlated than the zonal ones, with the lowest correlation in the Yellow Sea (r=0.7). When the actual wind speed is below 10 m/s, ECMWF has the highest accuracy, followed by DASCAT, MERRA, and NCEP. However, at wind speeds above 10 m/s, DASCAT shows the highest accuracy. Regarding the error by wind direction, when the zonal wind is strong, all datasets have an average error of more than $70^{\circ}$. On the other hand, the RMSE of wind direction was $50^{\circ}$ under strong meridional winds. ECMWF shows the highest accuracy in these results. The RMSE of the wind speed varied depending on the actual wind direction. In particular, MERRA has the highest RMSE under westerly and southerly wind conditions, while NCEP has the highest RMSE under easterly and northerly wind conditions.
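
The quantities compared in this abstract (zonal and meridional components, RMSE, and wind direction error) can be sketched in Python as follows. The abstract does not state how they were computed, so the meteorological "direction the wind blows from" convention, the wrap-around handling of direction differences, and all function names are assumptions of this example.

```python
import math


def uv_components(speed, direction_deg):
    """Zonal (u) and meridional (v) components.

    Assumes the meteorological convention: direction is where the wind comes FROM,
    measured clockwise from north.
    """
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)


def rmse(estimates, observations):
    """Root-mean-square error between paired estimates and observations."""
    squared = [(e - o) ** 2 for e, o in zip(estimates, observations)]
    return math.sqrt(sum(squared) / len(squared))


def direction_error(a_deg, b_deg):
    """Smallest absolute angular difference in degrees, in the range 0..180."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


# Made-up sample values: satellite estimates against buoy observations.
speed_rmse = rmse([5.0, 7.5, 12.0], [4.0, 8.0, 10.0])
u, v = uv_components(10.0, 270.0)  # a westerly wind blows toward the east
wrap = direction_error(350.0, 10.0)
```

Handling the wrap-around matters for direction statistics: a naive difference between 350° and 10° would give 340° instead of 20°.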

Estimation of Moisture Content in Cucumber and Watermelon Seedlings Using Hyperspectral Imagery (초분광영상 이용 오이 및 수박 묘의 수분함량 추정)

  • Kim, Seong-Heon;Kang, Jeong-Gyun;Ryu, Chan-Seok;Kang, Ye-Seong;Sarkar, Tapash Kumar;Kang, Dong Hyeon;Ku, Yang-Gyu;Kim, Dong-Eok
    • Journal of Bio-Environment Control, v.27 no.1, pp.34-39, 2018
  • This research was conducted to estimate the moisture content of Cucurbitaceae seedlings, such as cucumber and watermelon, using hyperspectral imagery. Using a hyperspectral image acquisition system, the reflectance of the leaf area of cucumber and watermelon seedlings was calculated after water stress was applied. Then, the moisture content of each seedling was measured using a dry oven. Finally, moisture content estimation models were developed from the reflectance and moisture content by PLSR analysis. The cucumber model showed an $R^2$ of 0.73, an RMSE of 1.45%, and an RE of 1.58%. The watermelon model showed an $R^2$ of 0.66, an RMSE of 1.06%, and an RE of 1.14%. The cucumber model performed slightly better after one cucumber seedling sample was removed as an outlier: the new model showed an $R^2$ of 0.79, an RMSE of 1.10%, and an RE of 1.20%. The model built from all samples combined showed an $R^2$ of 0.67, an RMSE of 1.26%, and an RE of 1.36%. The cucumber model performed better than the watermelon model. This is because the cucumber variables had a widely distributed variation, which affected the performance. Further, the accuracy and precision of the cucumber model increased when an insignificant sample was eliminated from the dataset. Finally, both models are considered useful for estimating moisture content, as the gradients of their trend lines are almost the same and the lines intersect. The accuracy and precision of the estimation models could possibly be improved if the models were constructed using variables with widely distributed variation. The improved models will be utilized as the basis for developing low-priced sensors.

Inferring the Transit Trip Destination Zone of Smart Card User Using Trip Chain Structure (통행사슬 구조를 이용한 교통카드 이용자의 대중교통 통행종점 추정)

  • SHIN, Kangwon
    • Journal of Korean Society of Transportation, v.34 no.5, pp.437-448, 2016
  • Previous studies have suggested a transit trip destination inference method that constructs trip chains from incomplete (missing destination) smart card datasets obtained from entry fare control systems. To explore the feasibility of this method, transit trip chains are constructed from the pre-paid smart card tagging data collected in Busan on weekdays in October 2014 by tracing the card IDs, tagging times (boarding, alighting, transfer), and the trip linking distances between two consecutive transit trips in a daily sequence. Assuming that most trips in the transit trip chains are linked successively, the destination zone of each transit trip is inferred as the origin zone of the next linked trip. Applying the model to complete trips with observed OD reveals that about 82% of the inferred trip destinations are the same as the observed trip destinations, and that the inference error, defined as the distance between the inferred and observed alighting stops, is minimized when the trip linking distance is less than or equal to 0.5 km. When the model is applied to incomplete trips with missing destinations, the overall destination missing rate decreases from 71.40% to 21.74%, and approximately 77% of the trips that still lack destinations are single transit trips for which the destination cannot be inferred. In addition, the model remarkably reduces the destination missing rate of multiple incomplete transit trips from 69.56% to 6.27%. Spearman's rank correlation and Chi-squared goodness-of-fit tests showed that the ranks of transit trips for each zone are not significantly affected by the inferred trips, but the transit trip distributions obtained from the small number of complete trips alone are significantly different from those obtained from complete and inferred trips. Therefore, it is concluded that the model is applicable to deriving realistic transit trip patterns in cities with incomplete smart card data.
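
The core inference rule described in this abstract, taking the next linked trip's origin zone as the missing destination, can be sketched as below. This is a minimal illustration, not the author's model: the record fields (`card_id`, `date`, `board_time`, `origin_zone`, `dest_zone`) are hypothetical, and the linking-distance check is left out for brevity.

```python
from collections import defaultdict


def infer_destinations(trips):
    """Fill missing destination zones from the next trip's origin zone.

    trips: list of dicts with hypothetical keys card_id, date, board_time,
    origin_zone, and dest_zone (None when the alighting tag is missing).
    The last trip of a chain, and any single-trip chain, stays unresolved.
    """
    chains = defaultdict(list)
    for trip in trips:
        chains[(trip["card_id"], trip["date"])].append(trip)

    result = []
    for chain in chains.values():
        chain.sort(key=lambda t: t["board_time"])  # daily sequence per card
        for current, following in zip(chain, chain[1:]):
            if current["dest_zone"] is None:
                current = {**current, "dest_zone": following["origin_zone"], "inferred": True}
            result.append(current)
        result.append(chain[-1])
    return result


# Hypothetical records: one two-trip chain and one single trip.
sample = [
    {"card_id": "A", "date": "2014-10-06", "board_time": "18:10", "origin_zone": "Z5", "dest_zone": None},
    {"card_id": "A", "date": "2014-10-06", "board_time": "08:00", "origin_zone": "Z1", "dest_zone": None},
    {"card_id": "B", "date": "2014-10-06", "board_time": "09:30", "origin_zone": "Z2", "dest_zone": None},
]
inferred = infer_destinations(sample)
```

The single trip of card `B` remains without a destination, which mirrors the abstract's observation that single transit trips account for most of the trips that cannot be resolved.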

Using Spatial Data and Crop Growth Modeling to Predict Performance of South Korean Rice Varieties Grown in Western Coastal Plains in North Korea (공간정보와 생육모의에 의한 남한 벼 품종의 북한 서부지대 적응성 예측)

  • 김영호;김희동;한상욱;최재연;구자민;정유란;김재영;윤진일
    • Korean Journal of Agricultural and Forest Meteorology, v.4 no.4, pp.224-236, 2002
  • A long-term growth simulation was performed at 496 land units in the western coastal plains (WCP) of North Korea to test the potential adaptability of each land unit for growing South Korean rice cultivars. The land units for rice cultivation (CZU), each of them represented by a geographically referenced 5 by 5 km grid cell, were identified by analyzing satellite remote sensing data. Surfaces of monthly climatic normals for daily maximum and minimum temperature, precipitation, number of rain days, and solar radiation were generated at a 1 by 1 km interval by spatial statistical methods using data observed at 51 synoptic weather stations in North and South Korea during 1981-2000. Grid cells falling within the same CZU and, at the same time, corresponding to the satellite data-identified rice growing pixels were extracted and aggregated to make spatially explicit climatic normals relevant to the rice growing area of the CZU. A daily weather dataset for 30 years was randomly generated from the monthly climatic normals of each CZU. Growth and development parameters of the CERES-rice model suitable for 11 major South Korean cultivars were derived from long-term field observations. Eight treatments, comprising 2 transplanting dates $\times$ 2 cropping systems $\times$ 2 irrigation methods, were assigned to each cultivar. Each treatment was simulated with the randomly generated 30 years of daily weather data (from planting to physiological maturity) for the 496 land units in WCP to simulate the growth and yield responses to interannual climate variation. The same model was run with input data from the 3 major crop experiment stations in South Korea to obtain a 30-year normal performance of each cultivar, which was used as a "reference" for comparison. Results were analyzed with respect to spatial and temporal variation in yield and maturity, and used to evaluate the suitability of each land unit for growing a specific South Korean cultivar. 
The results may be utilized as decision aids for agrotechnology transfer to North Korea, for example, germplasm evaluation, resource allocation and crop calendar preparation.

Development of a Detection Model for the Companies Designated as Administrative Issue in KOSDAQ Market (KOSDAQ 시장의 관리종목 지정 탐지 모형 개발)

  • Shin, Dong-In;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems, v.24 no.3, pp.157-176, 2018
  • The purpose of this research is to develop a detection model for companies designated as administrative issues in the KOSDAQ market using financial data. Administrative issue designation marks companies with a high potential for delisting and gives them time to resolve the reasons for delisting under certain restrictions of the Korean stock market. It acts as an alarm that informs investors and market participants of which companies are likely to be delisted and warns them to invest safely. Despite this importance, there are relatively few studies on administrative issue prediction models compared with the many studies on bankruptcy prediction models. Therefore, this study develops and verifies a detection model for companies designated as administrative issues using the financial data of KOSDAQ companies. In this study, logistic regression and decision trees are proposed as the data mining models for detecting administrative issues. According to the results of the analysis, the logistic regression model predicted the companies designated as administrative issues using three variables, ROE (Earnings before tax), Cash flows/Shareholder's equity, and Asset turnover ratio, and its overall accuracy was 86% for the validation dataset. The decision tree (Classification and Regression Trees, CART) model applied classification rules using Cash flows/Total assets and ROA (Net income), and its overall accuracy reached 87%. The implications of the financial indicators selected in our logistic regression and decision tree models are as follows. First, ROE (Earnings before tax) in the logistic detection model shows the profit and loss of the business segments that will continue, without including the revenue and expenses of discontinued businesses. Therefore, a weakening of this variable means that the competitiveness of the core business is weakened. 
If a large part of the profits is generated from one-off profits, it is very likely that the deterioration of business management will intensify further. When the ROE of a KOSDAQ company decreases significantly, the company is highly likely to be delisted. Second, the ratio of cash flows to shareholder's equity represents the firm's ability to generate cash flow when the financial condition of subsidiary companies is excluded. In other words, a weakening of the management capacity of the parent company, excluding the subsidiaries' competence, can be a main reason for an increased possibility of administrative issue designation. Third, a low asset turnover ratio means that current and non-current assets are used ineffectively by the corporation, or that its asset investment is excessive. If the asset turnover ratio of a KOSDAQ-listed company decreases, it is necessary to examine corporate activities in detail from various perspectives, such as weakening sales or changes in the company's inventories. Cash flows/Total assets, a variable selected by the decision tree detection model, is a key indicator of the company's cash condition and its ability to generate cash from operating activities. Cash flow indicates whether a firm can perform its main activities (maintaining its operating ability, repaying debts, paying dividends, and making new investments) without relying on external financial resources. Therefore, if the value of this variable is negative (-), it indicates the possibility that the company has serious problems in its business activities. If the cash flow from operating activities of a specific company is smaller than its net profit, the net profit has not been converted into cash, indicating a serious problem in managing the company's trade receivables and inventory assets. 
Therefore, it can be understood that as Cash flows/Total assets decreases, the probability of administrative issue designation and the probability of delisting increase. In summary, the logistic regression-based detection model in this study was found to be affected by the company's financial activities, including ROE (Earnings before tax). The decision tree-based detection model, however, predicts the designation based on the cash flows of the company.

Shallow subsurface structure of the Vulcano-Lipari volcanic complex, Italy, constrained by helicopter-borne aeromagnetic surveys (고해상도 항공자력탐사를 이용한 Italia Vulcano-Lipari 화산 복합체의 천부 지하 구조)

  • Okuma, Shigeo;Nakatsuka, Tadashi;Komazawa, Masao;Sugihara, Mitsuhiko;Nakano, Shun;Furukawa, Ryuta;Supper, Robert
    • Geophysics and Geophysical Exploration, v.9 no.1, pp.129-138, 2006
  • Helicopter-borne aeromagnetic surveys at two different times separated by three years were conducted to better understand the shallow subsurface structure of the Vulcano and Lipari volcanic complex, Aeolian Islands, southern Italy, and also to monitor the volcanic activity of the area. As there was no meaningful difference between the two magnetic datasets to imply an apparent change of the volcanic activity, the datasets were merged to produce an aeromagnetic map with wider coverage than was given by a single dataset. Apparent magnetisation intensity mapping was applied to terrain-corrected magnetic anomalies, and showed local magnetisation highs in and around Fossa Cone, suggesting heterogeneity of the cone. Magnetic modelling was conducted for three of those magnetisation highs. Each model implied the presence of concealed volcanic products overlain by pyroclastic rocks from the Fossa crater. The model for the Fossa crater area suggests a buried trachytic lava flow on the southern edge of the present crater. The magnetic model at Forgia Vecchia suggests that phreatic cones can be interpreted as resulting from a concealed eruptive centre, with thick latitic lavas that fill up Fossa Caldera. However, the distribution of lavas seems to be limited to a smaller area than was expected from drilling results. This can be explained partly by alteration of the lavas by intense hydrothermal activity, as seen at geothermal areas close to Porto Levante. The magnetic model at the north-eastern Fossa Cone implies that thick lavas accumulated as another eruption centre in the early stage of the activity of Fossa. Recent geoelectric surveys showed high-resistivity zones in the areas of the last two magnetic models.

Gender Roles, Accessibility, and Gendered Spatiality (성역할, 접근성, 그리고 젠더화된 공간성)

  • Kim, Hyun-Mi
    • Journal of the Korean Geographical Society, v.42 no.5, pp.808-834, 2007
  • This study attempts to elucidate the manifold dimensions of gendered accessibility experiences. How gender roles (household responsibilities) differentiate accessibility experiences between women and men is explored by comparing married dual-earner couples according to parental status, using the US Portland activity-travel diary dataset with GIS-based geocomputation results of (time-geography based) space-time accessibility. First, this study shows how the gender division of labor within the household still permeates current society, despite the widespread belief in social change toward a gender-egalitarian society. Then, the study pays special attention to the way gender roles structure the individual accessibility experiences of women and men differently and, in turn, the way such accessibility experiences take the form of gendered spatiality. Gendered spatiality is examined through the analysis of accessibility space as well as activity space in order to ascertain women's home-attached and spatially entrapped characteristics. More household responsibilities throughout the day and, even more, the time constraint of picking up children at daycare centers after work lead women's possible activity space to be more home-centered. The analysis of the spatio-temporal context of accessibility space makes gendered spatiality visible. However, the findings suggest that behavioral outcomes should be understood with an explicit awareness of the constraints individuals face, because the revealed activity spaces can be an outcome of choice as well as an outcome of constraint. Behavioral outcomes should not be treated as a straightforward expression of the level of constraints. It is problematic to expect that behavioral outcomes directly mirror the level of constraints. It is also problematic to suppose that the level of constraints can be straightforwardly elicited from revealed behavioral outcomes.

Value of Information Technology Outsourcing: An Empirical Analysis of Korean Industries (IT 아웃소싱의 가치에 관한 연구: 한국 산업에 대한 실증분석)

  • Han, Kun-Soo;Lee, Kang-Bae
    • Asia pacific journal of information systems, v.20 no.3, pp.115-137, 2010
  • Information technology (IT) outsourcing, the use of a third-party vendor to provide IT services, started in the late 1980s and early 1990s in Korea, and has increased rapidly since 2000. Recently, firms have increased their efforts to capture greater value from IT outsourcing. To date, there have been a large number of studies on IT outsourcing. Most prior studies have focused on outsourcing practices and decisions, and little attention has been paid to objectively measuring the value of IT outsourcing. In addition, studies that examined the performance of IT outsourcing have mainly relied on anecdotal evidence or practitioners' perceptions. Our study examines the contribution of IT outsourcing to economic growth in Korean industries over the 1990 to 2007 period, using a production function framework and a panel data set for 54 industries constructed from input-output tables, fixed-capital formation tables, and employment tables. Based on the framework and estimation procedures that Han, Kauffman and Nault (2010) used to examine the economic impact of IT outsourcing in U.S. industries, we evaluate the impact of IT outsourcing on output and productivity in Korean industries. Because IT outsourcing started to grow at a significantly more rapid pace in 2000, we compare the impact of IT outsourcing in the pre- and post-2000 periods. Our industry-level panel data cover a large proportion of the Korean economy: 54 out of 58 Korean industries. This allows us greater opportunity to assess the impacts of IT outsourcing on objective performance measures, such as output and productivity. Using IT outsourcing and IT capital as our primary independent variables, we employ an extended Cobb-Douglas production function in which both variables are treated as factor inputs. We also derive and estimate a labor productivity equation to assess the impact of our IT variables on labor productivity. 
We use data from seven years (1990, 1993, 2000, 2003, 2005, 2006, and 2007) for which both input-output tables and fixed-capital formation tables are available. Combining the input-output tables and fixed-capital formation tables resulted in 54 industries. IT outsourcing is measured as the value of computer-related services purchased by each industry in a given year. All the variables have been converted to 2000 Korean Won using GDP deflators. To calculate labor hours, we use the average work hours for each sector provided by the OECD. To effectively control for the heteroskedasticity and autocorrelation present in our dataset, we use feasible generalized least squares (FGLS) procedures. Because the AR1 process may be industry-specific (i.e., panel-specific), we consider both common AR1 and panel-specific AR1 (PSAR1) processes in our estimations. We also include year dummies to control for year-specific effects common across industries, and sector dummies (as defined in the GDP deflator) to control for time-invariant sector-specific effects. Based on the full sample of 378 observations, we find that a 1% increase in IT outsourcing is associated with a 0.012~0.014% increase in gross output and a 1% increase in IT capital is associated with a 0.024~0.027% increase in gross output. To compare the contribution of IT outsourcing relative to that of IT capital, we examined gross marginal product (GMP). The average GMP of IT outsourcing was 6.423, which is substantially greater than that of IT capital at 2.093. This indicates that, on average, if an industry invests KRW 1 million in IT outsourcing, it can increase its output by KRW 6.4 million. In terms of the contribution to labor productivity, we find that a 1% increase in IT outsourcing is associated with a 0.009~0.01% increase in labor productivity while a 1% increase in IT capital is associated with a 0.024~0.025% increase in labor productivity. 
Overall, our results indicate that IT outsourcing has made positive and economically meaningful contributions to output and productivity in Korean industries over the 1990 to 2007 period. The average GMP of IT outsourcing that we report for Korean industries is 1.44 times greater than that in U.S. industries reported in Han et al. (2010). Further, we find that the contribution of IT outsourcing has been significantly greater in the 2000~2007 period, during which the growth of IT outsourcing accelerated. Our study provides implications for policymakers and managers. First, our results suggest that Korean industries can capture further benefits by increasing investments in IT outsourcing. Second, our analyses and results provide a basis for managers to assess the impact of investments in IT outsourcing and IT capital in an objective and quantitative manner. Building on our study, future research should examine the impact of IT outsourcing at a more detailed industry level and at the firm level.
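
To make the reported percentages easier to read, the following LaTeX sketch writes out a generic log-linear Cobb-Douglas specification. The abstract does not list the full set of inputs, so the symbols here are assumptions for illustration: $Y$ is gross output, $O$ is IT outsourcing, $Z$ is IT capital, and $X_j$ stands for the remaining inputs.

```latex
% Log-linear Cobb-Douglas form for industry i in year t (symbols are assumed)
\ln Y_{it} = \alpha + \beta_{O}\ln O_{it} + \beta_{Z}\ln Z_{it}
           + \sum_{j}\gamma_{j}\ln X_{j,it} + \varepsilon_{it}

% Each coefficient is an output elasticity: a 1% rise in O goes with a beta_O % rise in Y
\beta_{O} = \frac{\partial \ln Y}{\partial \ln O}

% The marginal product of an input follows from its elasticity
\frac{\partial Y}{\partial O} = \beta_{O}\,\frac{Y}{O}
```

Under this reading, the reported 0.012~0.014% output response corresponds to $\beta_{O}$, and a marginal product such as the reported GMP scales that elasticity by the ratio of output to the input. Whether the paper computes GMP exactly this way is not stated in the abstract.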