Symbolic data analysis is based on special descriptions of data known as symbolic objects (SOs). Such descriptions preserve more detailed information about units and their clusters than the usual representations with mean values. A special type of SO is a representation with frequency or probability distributions (modal values). This representation enables us to consider variables of all measurement types simultaneously during the clustering process. In this paper, we present the theoretical basis for compatible leaders and agglomerative clustering methods with alternative dissimilarities for modal-valued SOs. The leaders method efficiently solves clustering problems with large numbers of units, while the agglomerative method can be applied either alone to a small data set or to the leaders obtained from the compatible leaders clustering method. We focus on (a) the inclusion of weights, which enables cluster representatives to retain the same structure as if only first-order units were clustered, and (b) the selection of relative dissimilarities that produce more interpretable, i.e., more meaningful, optimal cluster representatives. The usefulness of the proposed methods and their adaptations was assessed and substantiated in carefully constructed simulation settings and demonstrated on three real-world data sets, which gain interpretability from the use of weights (population pyramids and ESS data) or of relative dissimilarities (US patents data).
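The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of a leaders procedure (in the spirit of k-means) for weighted modal-valued units, not the authors' implementation. It assumes that each unit is a list of category distributions (one per variable), that the dissimilarity is a squared Euclidean distance summed over variables with hypothetical variable weights `alpha`, and that a leader is the weighted mean of its members' distributions. The relative dissimilarities mentioned above are not covered, and all function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a modal-valued unit is one category
# distribution (proportions summing to 1) per variable.
Unit = List[List[float]]


def dissimilarity(x: Unit, leader: Unit, alpha: Sequence[float]) -> float:
    """Squared Euclidean distance per variable, weighted by alpha (assumed choice)."""
    return sum(
        a * sum((p - q) ** 2 for p, q in zip(xv, lv))
        for a, xv, lv in zip(alpha, x, leader)
    )


def weighted_representative(units: List[Unit], weights: List[float]) -> Unit:
    """Weighted mean of the members' distributions, computed variable by variable."""
    total = sum(weights)
    return [
        [
            sum(w * u[j][c] for u, w in zip(units, weights)) / total
            for c in range(len(units[0][j]))
        ]
        for j in range(len(units[0]))
    ]


def leaders(
    units: List[Unit],
    weights: List[float],
    initial: List[Unit],
    alpha: Sequence[float],
    max_iter: int = 100,
) -> Tuple[List[int], List[Unit]]:
    """Alternate nearest-leader assignment and leader update until labels stop changing."""
    current = [[list(dist) for dist in leader] for leader in initial]  # copy
    labels: List[int] = [-1] * len(units)
    for _ in range(max_iter):
        new_labels = [
            min(range(len(current)), key=lambda i: dissimilarity(u, current[i], alpha))
            for u in units
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(len(current)):
            members = [k for k, lab in enumerate(labels) if lab == i]
            if members:  # an empty cluster keeps its previous leader
                current[i] = weighted_representative(
                    [units[k] for k in members], [weights[k] for k in members]
                )
    return labels, current


# Toy data: one variable with two categories; weights could be unit sizes.
units = [[[1.0, 0.0]], [[0.9, 0.1]], [[0.0, 1.0]], [[0.2, 0.8]]]
weights = [1.0, 3.0, 2.0, 2.0]
labels, reps = leaders(units, weights, initial=[units[0], units[2]], alpha=[1.0])
print(labels, reps)
```

In the toy run, the first leader is pulled toward the heavier unit (weight 3). If each unit aggregates `w` first-order units and its distribution is their mean, then the weighted mean of such units equals the plain mean over all underlying first-order units. This is one way weights can keep representatives consistent with clustering the first-order units directly.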
This conclusion presents some closing thoughts on the concepts covered in the preceding chapters of this book and makes additional suggestions regarding potential future work. The book presents a wide variety of approaches and methods related to network clustering. It suggests that the institutional structure of science has a very large impact on the generation of scientific knowledge and of scientific citation networks. The book advocates the optimization approach to the clustering problem. Using an appropriate criterion function, one can express clustering goals such as reducing complexity, understanding network structures, and modeling networks. Together, optimization and the pursued goals help define the nature of a “good” clustering. The book closes by stressing two very general ideas. The first is the importance of exchanging ideas between different approaches in order to strengthen them. The second is the coupling of network processes and network structures, which helps readers understand both.
The article is devoted to the concept of digital literacy and the history of its measurement.
Different research traditions have developed over time to study the quantitative aspects of information and knowledge production, such as scientometrics, bibliometrics, librametrics, informetrics, cybermetrics, webometrics, and altmetrics. These information metrics, or iMetrics, as they were labeled by Milojević and Leydesdorff in Scientometrics 95(1):141–157, 2013, are unified by their use of quantitative data analysis with various statistical methods and techniques, and they are often used to supplement and complement each other. Representing different research traditions, they jointly form a common research field, a “discipline with many names”. In this article, we examine the development of the iMetrics field and its evolution over time using bibliometric network analysis, and we identify its common basis, formed by the most important publications, journals, scholars, and topics. The dataset consists of articles from the Web of Science database (26,414 records with complete descriptions). Analyzing the citation network, we evaluate the field’s growth and identify the most cited works. Using the Search path count (SPC) approach, we extract the Main path, Key-route paths, and Link islands in the citation network. The results show that over the last forty years the number of published papers has grown, doubling every 8 years; the share of single-author papers dropped from 50% to 10%, and the number of papers written by three or more authors is increasing. We draw conclusions about the field’s development and its current state. We also present the main authors, journals, and keywords of the field, which form its common basis.
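As a rough illustration of the SPC weights that main path analysis starts from, the Python sketch below counts, for each arc (u, v) of an acyclic citation network, the number of source-to-sink paths passing through it as `n_minus[u] * n_plus[v]`. Here `n_minus[u]` counts paths from any source to u, and `n_plus[v]` counts paths from v to any sink. It assumes that arcs point from the cited to the citing work and that the network has no cycles. The function name and the toy network are hypothetical, and extracting the Main path, Key-route paths, or Link islands from these weights is not shown.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Tuple

Arc = Tuple[str, str]


def spc_weights(arcs: List[Arc]) -> Dict[Arc, int]:
    """Search path count: number of source-to-sink paths passing through each arc."""
    succ: Dict[str, List[str]] = defaultdict(list)
    pred: Dict[str, List[str]] = defaultdict(list)
    nodes = set()
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))
    # TopologicalSorter expects node -> predecessors; it raises CycleError on cycles.
    order = list(TopologicalSorter({n: pred[n] for n in nodes}).static_order())
    n_minus: Dict[str, int] = {}  # paths from any source to the node
    for n in order:
        n_minus[n] = sum(n_minus[p] for p in pred[n]) if pred[n] else 1
    n_plus: Dict[str, int] = {}  # paths from the node to any sink
    for n in reversed(order):
        n_plus[n] = sum(n_plus[s] for s in succ[n]) if succ[n] else 1
    return {(u, v): n_minus[u] * n_plus[v] for u, v in arcs}


# Toy citation network (arcs point from cited to citing work).
arcs = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E"),
        ("D", "F"), ("D", "G"), ("E", "G"), ("G", "H"), ("G", "I")]
weights = spc_weights(arcs)
print(max(weights, key=weights.get), weights)
```

The toy network has 8 source-to-sink paths in total. Arc (A, C) carries the largest weight, 5. The weights on the arcs leaving the single source (3 + 5) and on the arcs entering the sinks (2 + 3 + 3) both add up to that total.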
We present two ways (instantaneous and cumulative) to transform bibliographic networks, using the publication year of each work, into corresponding temporal networks based on temporal quantities. We also show how to use the addition of temporal quantities to define interesting temporal properties of nodes, links, and their groups, thus providing insight into the evolution of bibliographic networks. By multiplying temporal networks, we obtain various derived temporal networks that provide new views on the studied networks. The proposed approach is illustrated with examples from the collection of bibliographic networks on peer review.
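To make the two transformations and the operations on temporal quantities concrete, here is a minimal Python sketch, not the authors' code. It assumes that a temporal quantity is a list of non-overlapping (start, end, value) triples, where the value holds on [start, end) and is undefined elsewhere. An instantaneous link from a work published in year y is active only in [y, y + 1), while a cumulative link stays active from y onward; a large `INF` sentinel stands in for the open end. Addition treats an undefined value as zero, multiplication is defined only where both factors are defined, and a temporal network is a dict from (node, node) pairs to temporal quantities. All names and the co-authorship example are hypothetical illustrations.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a temporal quantity (TQ) is a list of non-overlapping
# (start, end, value) triples; the value holds on [start, end) and is undefined elsewhere.
TQ = List[Tuple[int, int, float]]
TNet = Dict[Tuple[str, str], TQ]
INF = 10**9  # sentinel for "until the end of the observed period" (assumption)


def instantaneous(year: int) -> TQ:
    return [(year, year + 1, 1)]  # active only in the publication year


def cumulative(year: int) -> TQ:
    return [(year, INF, 1)]  # active from the publication year onward


def _value_at(tq: TQ, s: int, f: int) -> Optional[float]:
    """Value of tq on the elementary interval [s, f), or None if undefined."""
    return next((v for s2, f2, v in tq if s2 <= s and f <= f2), None)


def _combine(a: TQ, b: TQ, op: Callable[[float, float], float], need_both: bool) -> TQ:
    points = sorted({t for s, f, _ in a + b for t in (s, f)})
    out: TQ = []
    for s, f in zip(points, points[1:]):
        va, vb = _value_at(a, s, f), _value_at(b, s, f)
        if (va is None and vb is None) or (need_both and (va is None or vb is None)):
            continue
        v = op(va if va is not None else 0, vb if vb is not None else 0)
        if out and out[-1][1] == s and out[-1][2] == v:
            out[-1] = (out[-1][0], f, v)  # merge adjacent triples with equal values
        else:
            out.append((s, f, v))
    return out


def tq_add(a: TQ, b: TQ) -> TQ:
    return _combine(a, b, lambda x, y: x + y, need_both=False)  # undefined acts as 0


def tq_mul(a: TQ, b: TQ) -> TQ:
    return _combine(a, b, lambda x, y: x * y, need_both=True)  # undefined absorbs


def tnet_mul(A: TNet, B: TNet) -> TNet:
    """C[i, k] = sum over j of A[i, j] * B[j, k], using TQ addition and multiplication."""
    C: TNet = {}
    for (i, j), a in A.items():
        for (j2, k), b in B.items():
            if j == j2:
                prod = tq_mul(a, b)
                if prod:
                    C[(i, k)] = tq_add(C.get((i, k), []), prod)
    return C


def transpose(N: TNet) -> TNet:
    return {(v, u): tq for (u, v), tq in N.items()}


# Toy data: work -> (publication year, authors).
works = {"w1": (2001, ["x", "y"]), "w2": (2003, ["x", "y"]), "w3": (2003, ["x", "z"])}


def authors_to_works(make: Callable[[int], TQ]) -> TNet:
    return {(a, w): make(year) for w, (year, authors) in works.items() for a in authors}


AW = authors_to_works(instantaneous)
co = tnet_mul(AW, transpose(AW))  # joint works per year
AWc = authors_to_works(cumulative)
co_cum = tnet_mul(AWc, transpose(AWc))  # joint works up to each year
print(co[("x", "y")], co_cum[("x", "y")])
```

Multiplying the authors-to-works network by its transpose gives a temporal co-authorship network. With instantaneous links, `co[("x", "y")]` counts the joint works of x and y in each year. With cumulative links, `co_cum` counts them up to each year.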
This paper presents the results of an analysis of the keywords used in Social Network Analysis (SNA) articles included in the WoS database and in the main SNA journals from 1970 to 2018. We obtained 32,409 keywords from 70,792 works with complete descriptions. We provide a list of the most frequently used keywords and show subgroups of interconnected keywords. To go deeper, we place the keywords in the context of selected groups of authors and journals. We use temporal analysis to gain insight into the usage of selected keywords over time. The distributions of the number of keyword types and tokens over time show fast growth starting in the 2010s, which results from the growth in the number of articles on SNA topics and on applications of SNA in various scientific fields. Even though the most frequently used keywords are trivial or general, the approaches used to normalize network link weights allow us to extract keywords representing substantive topics and methodological issues in SNA.
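The abstract does not name the normalization it applies. The Python sketch below therefore assumes one common option, fractional counting: each work's row of the works-to-keywords matrix N is divided by its number of keywords, and the keyword co-occurrence network is N^T N. Each work then contributes a total weight of 1, so works with long keyword lists no longer dominate the link weights. The function name and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple


def fractional_cooccurrence(works: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Entries of N^T N, where N is the works-to-keywords matrix with rows scaled to sum to 1."""
    co: Dict[Tuple[str, str], float] = defaultdict(float)
    for keywords in works.values():
        kws = sorted(set(keywords))  # a keyword counts once per work
        if not kws:
            continue
        share = 1.0 / len(kws)
        for a, b in product(kws, repeat=2):
            co[(a, b)] += share * share  # each work adds 1 in total over all pairs
    return dict(co)


works = {
    "w1": ["network", "centrality"],
    "w2": ["network", "centrality", "blockmodeling", "clustering"],
    "w3": ["network"],
}
co = fractional_cooccurrence(works)
print(co[("network", "centrality")], co[("network", "network")], sum(co.values()))
```

The total weight equals the number of works with keywords (3 here). A keyword listed alone in a work, such as `network` in `w3`, receives that work's full weight on the diagonal, whereas the four keywords of `w2` share one unit among 16 ordered pairs.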