A Model of Coauthorship Networks
Guochang Zhou a), Jianping Li b), and Zonglin Xie c)
College of Science, National University of Defense Technology, Changsha, 410000, China.
a) Corresponding author: [email protected]
b) [email protected]
c) [email protected]
Citation: AIP Conference Proceedings 1890, 040057 (2017); https://doi.org/10.1063/1.5005259
Published by the American Institute of Physics.

Abstract. A natural way of representing the coauthorship of authors is to use a generalization of graphs known as hypergraphs. A random geometric hypergraph model is proposed here to model coauthorship networks. It is generated by placing nodes randomly and uniformly on a region of Euclidean space and connecting nodes that satisfy particular geometric conditions. Two kinds of geometric conditions are designed to model the collaboration patterns of academic authorities and basic researchers respectively. The conditions give geometric expressions of two causes of coauthorship: the authority and the similarity of authors. By simulation and calculus, we show that the forepart of the degree distribution of the network generated by the model is mixture Poissonian and the tail is power-law, which is similar to the degree distributions of some coauthorship networks. Further, we show more similarities between the generated network and real coauthorship networks: the distribution of cardinalities of hyperedges, high clustering coefficient, assortativity, and the small-world property.

Key words: Coauthorship Networks; Hypergraphs; Simulation and Calculus; Power Law Distribution.

INTRODUCTION

Coauthorship networks are constructed from databases of papers; the nodes are authors, and two authors are connected if they have coauthored one or more papers.
The famous Erdős number, named for Paul Erdős, is the shortest distance between an author and Erdős in a coauthorship network [1]. Studying the coauthorship networks extracted from scientific papers provides a window on patterns of collaboration within science. Newman showed that some of these networks are scale-free and form small worlds, and studied a variety of their statistical properties, including the number of papers written by authors, the number of authors per paper, the number of collaborators that scientists have, the existence and size of a giant component, the clustering coefficient, and assortativity [2]. Barabási et al analysed the coauthorship network from the electronic database containing all relevant journals in mathematics and neuroscience, and found that the evolution of some coauthorship networks is governed by preferential attachment [3]. In addition, they proposed a model to capture the network's time evolution. The small-world property and the preferential attachment mechanism of coauthorship networks have also been found in several databases and analyzed in depth by many researchers [4-9].

A hypergraph is a generalization of a network or graph in which links, called hyperedges, can connect any number of nodes. Since the author list of a paper can naturally be considered as a hyperedge, a hypergraph is a good model to represent coauthorship networks [10]. Hu et al gave a hypergraph model for coauthorship. Their model originates from the preferential attachment model, and the generated network is scale-free. Liu et al proposed a so-called knowledge generation model using hyperdegree preferential attachment mechanisms, and used it to model coauthorship networks. The connection mechanisms of the existing hypergraph models for coauthorship networks are mainly induced by preferential attachment: new connections are made preferentially to academic authorities.
Some research results show that collaboration patterns are geographically localised, namely that they relate to geographic similarity [11]. Physical distance has a negative effect on collaborations. In addition, common research interest (the similarity of research interest) is the foundation of collaborations. Hence, similarity also plays an important role in coauthorship.

2nd International Conference on Materials Science, Resource and Environmental Engineering (MSREE 2017). AIP Conf. Proc. 1890, 040057-1–040057-6; https://doi.org/10.1063/1.5005259. Published by AIP Publishing. 978-0-7354-1568-3/$30.00

Papadopoulos et al proposed a network model where new links, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity [12]. The model has an interesting geometric interpretation: a random geometric graph (RGG) on hyperbolic space. An RGG is a graph drawn on a bounded region. It is generated by placing nodes on the region randomly and uniformly, and connecting two nodes if the distance between them is no more than a given threshold. The RGG model on some geometries can generate scale-free networks and has been applied to model citation networks.

We analyzed two coauthorship networks: one is from 72,269 papers published in mathematical journals in the years 1956-2013 and contained in the DBLP dataset (http://dblp.uni-trier.de/xml/dblp), and the other is from 3,007 interdisciplinary papers published in the Proceedings of the National Academy of Sciences (PNAS, http://pnas.org) in the years 1999-2013.

We propose a random geometric hypergraph model. Sprinkle a number of nodes as potential authors onto a bounded region uniformly and randomly, and select some nodes randomly as corresponding authors. Each node is attached to a disc, called its influence region. A subset of the nodes contained in an influence region is defined as a hyperedge.
We design two kinds of influence regions and corresponding connection mechanisms to model the different collaboration patterns of academic authorities and basic researchers respectively. The influence region expresses node authority by its area and node similarity by its location. By simulation, calculus, and analysis, we show that the two connection mechanisms make the forepart of the degree distribution of the generated network mixture Poissonian and its tail power-law, which is similar to the degree distributions of some coauthorship networks. Moreover, we show that the hyperedge cardinality distribution, high clustering coefficient, assortativity, and small-world property of the network generated by the model are all similar to those of the coauthorship networks.

THE MODEL

A hypergraph $H$ is a pair $H = (V, E)$, where $V$ is a set of nodes and $E$ is a set of non-empty subsets of $V$ called hyperedges. Consider the 2-dimensional Euclidean space $\mathbb{R}^2$ with polar coordinates $\{\theta, r\}$. Let $D$ be a disc in $\mathbb{R}^2$ with radius $R \in \mathbb{R}^+$ and center at the origin. Let $V$ be a set of nodes and $E$ be a list of hyperedges. The geometric hypergraph model is constructed on this disc as follows:

1. Sprinkle $N \in \mathbb{Z}^+$ nodes uniformly and randomly onto $D$.
2. For $t = 1$ to $T_1 \in \mathbb{Z}^+$ do
   2.1. Select a node $i$ randomly, and generate a disc $D_i$ centered on $i$ with area $\alpha(\theta_i) c$;
   2.2. Initialize a null set $C_i$, and add each $j \in D_i$ to $C_i$ with probability $p_1$;
   2.3. Append $C_i$ to $E$;
   2.4. Add the nodes in $C_i$ to $V$.
3. For $t = 1$ to $T_2 \in \mathbb{Z}^+$ do
   For $s = 1$ to $m \in \mathbb{Z}^+$ do
   3.1. Select a node $i$ randomly, and generate a disc $I_i$ centered on $i$ with area $\alpha(\theta_i) t^{-\beta}$;
   3.2. Initialize a null set $C_i$, and add each $j \in I_i$ to $C_i$ with probability $p_2$;
   3.3. Divide $C_i$ into some subsets, and add $i$ into each subset;
   3.4. Append those subsets to $E$;
   3.5. Add the nodes in $C_i$ to $V$.

In the model, $\theta_i$ is the angular coordinate of node $i$, $\beta, c \in \mathbb{R}^+$, and $p_1, p_2 \in [0, 1]$.
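The construction steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `alpha`, `sprinkle`, `in_disc`, and `generate` are hypothetical. The sketch assumes that the disc in step 3.1 has area $\alpha(\theta_i) t^{-\beta}$, that $\alpha$ takes the piecewise constant values of Eq. (1), that the split in step 3.3 draws a target cardinality (counting node $i$) from a user-supplied list of sizes and weights, and that the last group of a split may be smaller than its target. Neighbors are found by a brute-force distance scan, which is slow for large $N$.

```python
import math
import random

def alpha(theta):
    """Piecewise constant scale factor with the values given in Eq. (1)."""
    return 1.0 if theta < 0.5 * math.pi else 0.25

def sprinkle(n, radius, rng):
    """Step 1: n points uniform on a disc, returned as (x, y, theta) tuples."""
    pts = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())  # sqrt makes the density uniform in area
        th = rng.uniform(0.0, 2.0 * math.pi)
        pts.append((r * math.cos(th), r * math.sin(th), th))
    return pts

def in_disc(pts, i, area, p, rng):
    """Nodes inside the disc of the given area centred on node i, each kept with probability p."""
    r2 = area / math.pi  # squared radius of the disc
    xi, yi, _ = pts[i]
    return [j for j, (x, y, _) in enumerate(pts)
            if (x - xi) ** 2 + (y - yi) ** 2 <= r2 and rng.random() < p]

def generate(n, t1, t2, m, radius, c, beta, p1, p2, sizes, weights, seed=0):
    """Return the sprinkled points and the list of hyperedges (frozensets of node indices)."""
    rng = random.Random(seed)
    pts = sprinkle(n, radius, rng)
    edges = []
    # Step 2: small discs of area alpha(theta_i) * c, one hyperedge per selected node.
    for _ in range(t1):
        i = rng.randrange(n)
        members = in_disc(pts, i, alpha(pts[i][2]) * c, p1, rng)
        if members:  # hyperedges must be non-empty
            edges.append(frozenset(members))
    # Step 3: discs of assumed area alpha(theta_i) * t**(-beta), split into several hyperedges.
    for t in range(1, t2 + 1):
        for _ in range(m):
            i = rng.randrange(n)
            area = alpha(pts[i][2]) * t ** (-beta)
            others = [j for j in in_disc(pts, i, area, p2, rng) if j != i]
            rng.shuffle(others)
            while others:
                k = rng.choices(sizes, weights)[0]  # target cardinality, counting node i
                step = max(k - 1, 1)                # always consume at least one node
                chunk, others = others[:step], others[step:]
                edges.append(frozenset(chunk) | {i})
    return pts, edges

if __name__ == "__main__":
    # Reduced sizes for a quick run; these are illustrative values only.
    pts, edges = generate(n=2000, t1=200, t2=50, m=5, radius=10.0, c=0.01, beta=0.5,
                          p1=1.0, p2=1.0, sizes=[2, 3, 4], weights=[0.2, 0.5, 0.3])
    nodes = set().union(*edges)  # V is the union of all hyperedges
    print(len(edges), "hyperedges over", len(nodes), "nodes")
```

The node set $V$ is recovered as the union of the hyperedges, which matches steps 2.4 and 3.5. For larger $N$, a spatial index would replace the linear scan in `in_disc`.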
The function $\alpha(\theta_i)$ is a piecewise constant non-negative function, i.e.

$$\alpha(\theta) = \begin{cases} 1, & \theta \in [0, 0.5\pi); \\ 0.25, & \theta \in [0.5\pi, 2\pi). \end{cases} \qquad (1)$$

It expresses the inhomogeneous coauthor scale of different research fields. For example, the author list of a biological paper is often longer than that of a mathematical paper.

The distribution of the cardinalities of the subsets in step 3.3 follows a function $f$, which influences the hyperedge cardinality distribution. The cardinality of a hyperedge is the number of nodes contained in this hyperedge. The hyperedge cardinality distribution is the probability distribution of these cardinalities over the whole hypergraph. We found that the hyperedge cardinality distribution of the empirical data described in THE DATA section approximately follows the generalized Poisson distribution

$$f(s) = \frac{a (a + bs)^{s-1} e^{-a-bs}}{s!} \qquad (2)$$

where $a, b \in \mathbb{R}$ and $s \in \mathbb{Z}^+$. Hence, we choose the $f$ that makes the hyperedge cardinality distribution of the generated hypergraph also follow Eq. (2).

The assumption that $D$ is disc shaped is not necessary, but it is convenient for calculating the degree distributions. At the beginning, the nodes on $D$ can be regarded as potential authors. We call a node a corresponding author if it is selected in the for loops. The definitions of $D_i$ and $I_i$ give a geometric expression of node similarity: if node $i$ belongs to the influence region of node $j$, we say that $i$ is similar to $j$. The definition of $I_i$ also gives a geometric expression of node authority: the nodes with larger $I_i$ have more chances to attract connections.

The first connection mechanism, step 2, models the coauthor behavior of basic authors, who often join one research group and coauthor with a few fixed researchers (Fig. 3(a)). The small value of $c$ in step 2.1 makes the nodes selected in step 2 have a small number of neighbors and often belong to a small component.
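As a quick numerical check on the generalized Poisson form of Eq. (2), the sketch below evaluates $f(s)$ in log space and verifies two standard properties of this distribution when $a > 0$ and $0 \le b < 1$: the probabilities over $s = 0, 1, 2, \ldots$ sum to 1, and the mean equals $a / (1 - b)$. The function name `generalized_poisson_pmf` and the values of `a` and `b` are illustrative assumptions and are not fitted to the paper's data.

```python
import math

def generalized_poisson_pmf(s, a, b):
    """f(s) = a (a + b s)^(s-1) exp(-a - b s) / s!, computed via logarithms for stability."""
    log_f = (math.log(a) + (s - 1) * math.log(a + b * s)
             - (a + b * s) - math.lgamma(s + 1))  # lgamma(s + 1) = log(s!)
    return math.exp(log_f)

a, b = 2.0, 0.3  # illustrative values only
pmf = [generalized_poisson_pmf(s, a, b) for s in range(200)]  # tail beyond 200 is negligible here
total = sum(pmf)
mean = sum(s * f for s, f in enumerate(pmf))

if __name__ == "__main__":
    print("sum of probabilities:", total)
    print("mean:", mean, "vs a / (1 - b):", a / (1 - b))
```

Setting $b = 0$ reduces Eq. (2) to the ordinary Poisson distribution with mean $a$, so $b$ can be read as a measure of how far the cardinalities depart from a Poisson law.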
The second connection mechanism, step 3, models the coauthor behavior of academic authorities, who join several research groups, and whose neighbors in different groups generally do not coauthor (Fig. 3(b)).

The parameters of the network generated by the model (Table 1) are $N = 30{,}000$, $T_1 = 9{,}000$, $T_2 = 7{,}000$, $m = 10$, $R = 10$, $c = 0.01$, $p_1 = p_2 = 1$, and $\beta = 1/2$. $\alpha(\theta)$ is given by Eq. (1). In step 3.3, 20% of the subsets have cardinality 2, 50% have cardinality 3, and 30% have cardinality 4.

THE DATA

The DBLP dataset provides open bibliographic information on major computer science journals and proceedings. Based on the bibliographic information of 72,269 papers published in 54 mathematical journals, we constructed a coauthorship network: DBLP Math. The other corpus we considered is the set of 3,007 interdisciplinary papers in PNAS 1999-2013. In those networks, we identify authors by their names in their articles. This method is reliable in distinguishing authors from one another, but it mistakes one author for two if the author changes the form of his or her name between papers. Also, two authors may have the same name, in which case we cannot distinguish them.

The distributions of hyperedge cardinality, degree, and shortest path length of some networks in Table 1 are fitted by corresponding functions respectively. The parameters of the functions are estimated by cftool, a curve fitting toolbox in MATLAB. Four statistical measures are used to assess the goodness of fit: the sum of squares due to error (SSE), the root mean squared error (RMSE), the coefficient of determination ($R^2$), and the degree-of-freedom adjusted coefficient of determination (adjusted $R^2$). The tails of the degree distributions are fitted by Clauset et al's method [13-16]. The fitting function is

$$f(k) = \frac{k^{-\beta}}{\sum_{n=0}^{\infty} (n + x_{\min})^{-\beta}} \qquad (3)$$

where $\beta$ is the scaling exponent and $x_{\min}$ is the lower bound of the power-law behavior.
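The normalizing sum in Eq. (3) is the Hurwitz zeta function $\zeta(\beta, x_{\min})$. The following sketch, which is not Clauset et al's program, approximates that sum, evaluates $f(k)$, and estimates $\beta$ by maximizing the log-likelihood over a grid with $x_{\min}$ held fixed. The function names, the grid range, and the truncation length are assumptions made for illustration. Clauset et al's method additionally selects $x_{\min}$ by minimizing the distance between the empirical and fitted cumulative distributions, which is omitted here.

```python
import math

def hurwitz_zeta(beta, x_min, terms=10_000):
    """Approximate sum_{n>=0} (n + x_min)^(-beta) for beta > 1.

    Uses a partial sum plus the first two Euler-Maclaurin terms for the remainder.
    """
    head = sum((n + x_min) ** (-beta) for n in range(terms))
    q = terms + x_min
    tail = q ** (1.0 - beta) / (beta - 1.0) + 0.5 * q ** (-beta)
    return head + tail

def power_law_pmf(k, beta, x_min, zeta=None):
    """Eq. (3): f(k) = k^(-beta) / zeta(beta, x_min) for integer k >= x_min."""
    z = hurwitz_zeta(beta, x_min) if zeta is None else zeta
    return k ** (-beta) / z

def fit_beta(degrees, x_min, grid=None):
    """Grid-search maximum-likelihood estimate of beta for the tail k >= x_min."""
    tail = [k for k in degrees if k >= x_min]
    log_sum = sum(math.log(k) for k in tail)
    grid = grid or [1.5 + 0.01 * i for i in range(251)]  # beta in [1.5, 4.0], an assumed range

    def loglik(beta):
        # log L(beta) = -n * log zeta(beta, x_min) - beta * sum(log k)
        return -len(tail) * math.log(hurwitz_zeta(beta, x_min)) - beta * log_sum

    return max(grid, key=loglik)
```

A call such as `fit_beta(degrees, x_min=5)` returns the grid value of $\beta$ with the highest likelihood for the degrees at or above 5.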
Here, the parameters $\beta$ and $x_{\min}$ are calculated by Clauset et al's programs (http://tuvalu.santafe.edu/~aaronc/powerlaws). The p-value (p) and the maximum distance between the cumulative distribution functions of the data and the fitted function (gof) are also calculated by their program to assess the goodness of fit.

CONCLUSION

We propose a random geometric hypergraph model for coauthorship networks. Two kinds of connection mechanisms are designed to model the different coauthor patterns of academic authorities and basic researchers respectively. The connection mechanisms give a geometric realization of the idea that connections are made preferentially to more authoritative nodes and to more similar nodes. By simulation and calculus, we show that the degree distribution, the distribution of cardinalities of hyperedges, high clustering coefficient, assortativity, and small-world property of the network generated by the model are similar to those of some coauthorship networks.

Figures

FIGURE 1. Panels (a, b) show the influence region and subgraphs generated by the first and second mechanisms respectively. Panel (c) shows the combination of the two subgraphs.
FIGURE 2. Degree distributions of some networks in Fig. 5.
FIGURE 3. The average degree of neighbors of nodes with degree k of some networks in Fig. 5.
FIGURE 4. The average degree of neighbors of nodes with degree k of some networks in Fig. 5.
FIGURE 5. Size, clustering coefficient (CC), and assortativity coefficient (AC) for some networks.

REFERENCES

1. Hoffman P (1998) The man who loved only numbers. Hyperion, New York.
2. Newman MEJ (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci USA 98: 404-409.
3. Barabási AL, Jeong H, Néda Z, Ravasz E, Schubert A, et al. (2002) Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications 311: 590-614.
4. Moody J (2004) The structure of a social science collaboration network: Disciplinary cohesion from 1963 to 1999. Am Sociol Rev 69(2): 213-238.
5. Mali F, Kronegger L, Ferligoj A (2010) Co-authorship trends and collaboration patterns in the Slovenian sociological community. Corvinus J Sociol Soc Policy 1(2): 29-50.
6. Perc M (2010) Growth and structure of Slovenia's scientific collaboration network. J Informetr 4(4): 475-482.
7. Wagner CS, Leydesdorff L (2005) Network structure, self-organization, and the growth of international collaboration in science. Res Policy 34(10): 1608-1618.
8. Tomassini M, Luthi L (2007) Empirical analysis of the evolution of a scientific collaboration network. Physica A 285(2): 750-764.
9. Zhou T, Wang BH, Jin YD, He DR, Zhang PP, He Y, et al. (2007) Modeling collaboration networks based on nonlinear preferential attachment. Int J Mod Phys C 18: 297-314.
10. Estrada E, Rodríguez-Velázquez JA (2006) Subgraph centrality and clustering in complex hyper-networks. Physica A 364: 581-594.
11. Papadopoulos F, Kitsak M, Serrano MA, Boguñá M, Krioukov D (2012) Popularity versus similarity in growing networks. Nature 489: 537-540.
12. Xie Z, Ouyang ZZ, Zhang PY, Yi DY, Kong DX (2015) Modeling the citation network by network cosmology. PLoS ONE (accepted).
13. Clauset A, Shalizi CR, Newman MEJ (2009) Power-law distributions in empirical data. SIAM Rev 51: 661-703.
14. Xie Z, Ouyang ZZ, Li JP (2016) A geometric graph model for coauthorship networks. J Informetr 10: 299-311.
15. Xie Z, Ouyang ZZ, Dong EM, Yi DY, Li JP (2016) Modelling transition phenomena of scientific coauthorship networks. arXiv: 1604.08891.
16. Xie Z, Xie ZL, Li M, Li JP, Yi DY (2017) Modeling the coevolution between citations and coauthorship of scientific papers. Scientometrics 112: 483-507.
