A model of coauthorship networks
Guochang Zhou, Jianping Li, and Zonglin Xie
Citation: AIP Conference Proceedings 1890, 040057 (2017);
View online: https://doi.org/10.1063/1.5005259
View Table of Contents: http://aip.scitation.org/toc/apc/1890/1
Published by the American Institute of Physics
Guochang Zhou a), Jianping Li b) and Zonglin Xie c)
College of Science, National University of Defense Technology, Changsha, 410000, China.
a) Corresponding author: [email protected]
b) [email protected]
c) [email protected]
Abstract. A natural way of representing the coauthorship of authors is to use a generalization of graphs known as
hypergraphs. A random geometric hypergraph model is proposed here to model coauthorship networks. It is
generated by placing nodes randomly and uniformly on a region of Euclidean space and connecting nodes that
satisfy particular geometric conditions. Two kinds of geometric conditions are designed to model the collaboration
patterns of academic authorities and basic researchers respectively. The conditions give geometric expressions of two
causes of coauthorship: the authority and similarity of authors. By simulation and calculus, we show that the head of
the degree distribution of the network generated by the model is a mixture of Poisson distributions and the tail is a power law, both of which are
similar to those of some coauthorship networks. Further, we show more similarities between the generated network and
real coauthorship networks: the distribution of cardinalities of hyperedges, high clustering coefficient, assortativity, and
small-world property.
Key words: Coauthorship Networks; Hypergraphs; Simulation and Calculus; Power Law Distribution.
INTRODUCTION
Coauthorship networks are constructed from databases of papers, in which the nodes are authors, and two
authors are connected if they have coauthored one or more papers. The famous Erdős number, named for Paul Erdős,
is the shortest distance between an author and Erdős in a coauthorship network [1]. Research on the coauthorship
networks extracted from scientific papers provides a window on patterns of collaboration within science. Newman
showed that some such networks are scale-free and form small worlds, and studied a variety of statistical
properties of those networks, including numbers of papers written by authors, numbers of authors per paper,
numbers of collaborators that scientists have, existence and size of a giant component, clustering coefficient, and
assortativity [2]. Barabási et al. analysed the coauthorship network from the electronic database containing all
relevant journals in mathematics and neuroscience, and found that the evolution of some coauthorship networks is
governed by preferential attachment [3]. In addition, they proposed a model to capture the network's time evolution.
The small-world property and preferential attachment mechanism of coauthorship networks have also been found in
several databases and analyzed in depth by many researchers [4–9].
A hypergraph is a generalization of a network or graph in which links, called hyperedges, can connect any
number of nodes. Since the author list of a paper can naturally be regarded as a hyperedge, a hypergraph is a good
model for representing coauthorship networks [10]. Hu et al. gave a hypergraph model for coauthorship. Their model
is derived from the preferential attachment model, and the generated network is scale-free. Liu et al. proposed a
so-called knowledge generation model using hyperdegree preferential attachment mechanisms, and used it to model
coauthorship networks.
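As a minimal sketch of this representation, each author list can be stored as a hyperedge, from which the hyperedge cardinalities and the pairwise coauthorship graph follow directly. The author labels and papers below are invented toy data, not taken from the datasets studied here, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each hyperedge is the author list of one paper.
hyperedges = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "D"}),
    frozenset({"C", "D", "E", "F"}),
]

def cardinalities(edges):
    """Number of authors in each hyperedge (paper)."""
    return [len(e) for e in edges]

def coauthor_graph(edges):
    """Project the hypergraph onto a graph: two authors are adjacent
    if they share at least one hyperedge."""
    neighbors = defaultdict(set)
    for e in edges:
        for u, v in combinations(sorted(e), 2):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors

graph = coauthor_graph(hyperedges)
# Degree of an author = number of distinct coauthors.
degree = {author: len(nbrs) for author, nbrs in graph.items()}
```

The projection discards which papers produced each link, which is one reason to keep the hyperedge list itself as the primary data structure.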
The connection mechanisms of the existing hypergraph models for coauthorship networks are mainly induced by
preferential attachment: new connections are made preferentially to academic authorities. Some research results
show that collaboration patterns are geographically localised, that is, they depend on geographic proximity [11].
2nd International Conference on Materials Science, Resource and Environmental Engineering (MSREE 2017)
AIP Conf. Proc. 1890, 040057-1–040057-6; https://doi.org/10.1063/1.5005259
Published by AIP Publishing. 978-0-7354-1568-3/$30.00
Physical distance has a negative effect on collaborations. In addition, common research interest, that is, the similarity of
research interests, is the foundation of collaborations. Hence, similarity also plays an important role in
coauthorship. Papadopoulos et al. proposed a network model where new links, instead of preferring popular nodes,
optimize certain trade-offs between popularity and similarity [12]. The model has an interesting geometric
interpretation: a random geometric graph (RGG) on hyperbolic space. An RGG is a graph drawn on a bounded region.
It is generated by placing nodes on the region randomly and uniformly, and connecting two nodes if the distance
between them is no more than a given threshold. The RGG model on some geometries can generate scale-free
networks and has been applied to model citation networks.
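The RGG construction can be sketched in a few lines. This is only an illustration under simplifying assumptions: the region is taken to be the Euclidean unit square rather than a hyperbolic space, and the function name and parameter values are hypothetical.

```python
import math
import random

def random_geometric_graph(n, radius, seed=0):
    """Place n nodes uniformly at random in the unit square and connect
    every pair whose Euclidean distance is at most `radius`."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius:
                edges.add((i, j))
    return points, edges

points, edges = random_geometric_graph(n=200, radius=0.1)
```

The pairwise loop costs O(n^2) distance evaluations, which is fine for a sketch; a spatial grid or k-d tree would be the usual choice for large n.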
We analyzed two coauthorship networks: one is built from 72,269 papers published in mathematical journals in the
years 1956-2013 and contained in the DBLP dataset (http://dblp.uni-trier.de/xml/dblp), and the other is built from 3,007
interdisciplinary papers published in the Proceedings of the National Academy of Sciences (PNAS, http://pnas.org)
in the years 1999-2013. We propose a random geometric hypergraph model. A number of nodes are sprinkled as potential
authors onto a bounded region uniformly and randomly, and some nodes are selected randomly as corresponding authors.
Each node is attached to a disc, called its influence region. A subset of the nodes contained in an influence region is
defined as a hyperedge. We design two kinds of influence regions and corresponding connection mechanisms to
model the different collaboration patterns of academic authorities and basic researchers respectively. The influence
region expresses the node authority by its area and node similarity by its location. By simulation, calculus, and
analysis, we show that the two connection mechanisms make the head of the degree distribution of the generated
network a mixture of Poisson distributions and its tail a power law, which are similar to the distributions of some
coauthorship networks. Moreover, we show that the hyperedge cardinality distribution, high clustering coefficient,
assortativity, and small-world property of the network generated by the model are all similar to those of the
coauthorship networks.
THE MODEL
A hypergraph $H$ is a pair $H = (V, E)$, where $V$ is a set of nodes, and $E$ is a set of non-empty subsets of $V$ called hyperedges. Consider a 2-dimensional Euclidean space $\mathbb{R}^2$ with polar coordinates $\{\theta, r\}$. Let $D$ be a disc in $\mathbb{R}^2$.
Assume its radius is $R \in \mathbb{R}$, and its center is at the origin. Let $V$ be a set of nodes and $E$ be a list of hyperedges.
The geometric hypergraph model is constructed on this disc as follows:
1. Sprinkle $N \in \mathbb{Z}$ nodes uniformly and randomly onto $D$.
2. For $t = 1$ to $T_1 \in \mathbb{Z}$ do
   2.1. Select a node $i$ randomly, and generate a disc $D_i$ centered on $i$ with area $\alpha(\theta_i) c$;
   2.2. Initialize a null set $C_i$, and add $j \in D_i$ to $C_i$ with probability $p_1$;
   2.3. Append $C_i$ to $E$;
   2.4. Add the nodes in $C_i$ to $V$.
3. For $t = 1$ to $T_2 \in \mathbb{Z}$ do
   For $s = 1$ to $m \in \mathbb{Z}$ do
   3.1. Select a node $i$ randomly, and generate a disc $I_i$ centered on $i$ with area $\alpha(\theta_i) t^{-\beta}$;
   3.2. Initialize a null set $C_i$, and add $j \in I_i$ to $C_i$ with probability $p_2$;
   3.3. Divide $C_i$ into some subsets, and add $i$ into each subset;
   3.4. Append those subsets to $E$;
   3.5. Add the nodes in $C_i$ to $V$.
In the model, $\theta_i$ is the angular coordinate of node $i$, $\beta, c \in \mathbb{R}$ and $p_1, p_2 \in [0,1]$. The function $\alpha(\theta_i)$ is a
piecewise constant non-negative function, i.e.
$$\alpha(\theta) = \begin{cases} 1, & \theta \in [0, 0.5\pi); \\ 0.25, & \theta \in [0.5\pi, 2\pi). \end{cases} \qquad (1)$$
It expresses the inhomogeneous coauthor scale of different research fields. For example, the author list of a
biological paper is often longer than that of a mathematical paper.
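Steps 1 and 2 of the construction can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes the disc in step 2.1 has area α(θ_i)·c with α as in Eq. (1), it ignores the truncation of discs at the boundary of D, and the parameter values (2000 nodes, c = 0.5) are toy values chosen so that the example runs quickly. All function names are hypothetical.

```python
import math
import random

def alpha(theta):
    """Piecewise-constant scale factor of Eq. (1)."""
    return 1.0 if 0.0 <= theta < 0.5 * math.pi else 0.25

def sprinkle(n, R, rng):
    """Uniform points on a disc of radius R, as (theta, r) pairs.
    Taking r = R * sqrt(u) makes the density uniform in area."""
    return [(rng.uniform(0.0, 2.0 * math.pi), R * math.sqrt(rng.random()))
            for _ in range(n)]

def polar_distance(p, q):
    """Euclidean distance between two points given in polar coordinates."""
    (t1, r1), (t2, r2) = p, q
    d2 = r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(t1 - t2)
    return math.sqrt(max(d2, 0.0))

def first_mechanism(nodes, T1, c, p1, rng):
    """Step 2: each selected node i collects the nodes inside a disc of
    area alpha(theta_i) * c centred on i, each kept with probability p1.
    Node i lies in its own disc, so it is always kept when p1 = 1."""
    hyperedges = []
    for _ in range(T1):
        i = rng.randrange(len(nodes))
        radius = math.sqrt(alpha(nodes[i][0]) * c / math.pi)
        members = {j for j, q in enumerate(nodes)
                   if polar_distance(nodes[i], q) <= radius
                   and rng.random() < p1}
        hyperedges.append(members)
    return hyperedges

rng = random.Random(1)
nodes = sprinkle(2000, 10.0, rng)
edges = first_mechanism(nodes, T1=50, c=0.5, p1=1.0, rng=rng)
```

Because α is four times larger on the first quadrant, nodes selected there collect about four times as many neighbors on average, which is how the sketch reproduces field-dependent author-list lengths.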
The distribution of the cardinalities of the subsets in Step 3.3 follows a function $f$, which influences the hyperedge cardinality distribution. The cardinality of a hyperedge is the number of nodes contained in
this hyperedge. The hyperedge cardinality distribution is the probability distribution of these cardinalities over the
whole hypergraph. We found that the hyperedge cardinality distributions of the empirical data described in Section 3
approximately follow the generalized Poisson distribution
$$f(s) = \frac{a (a + bs)^{s-1}}{s!} e^{-a - bs}, \qquad (2)$$
where $a, b \in \mathbb{R}$ and $s \in \mathbb{Z}$. Hence, we choose the $f$ for which the hyperedge cardinality distribution of the
generated hypergraph also follows Eq. (2).
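Eq. (2) can be evaluated numerically as in the sketch below, which works in log space to avoid overflow in the factorial. The parameter values a = 2.0 and b = 0.3 are hypothetical and are not fitted values from the data; the support is assumed to be s = 0, 1, 2, ... with a > 0 and 0 <= b < 1.

```python
import math

def generalized_poisson_pmf(s, a, b):
    """f(s) = a (a + b s)^(s-1) exp(-(a + b s)) / s!  for s = 0, 1, 2, ..."""
    log_f = (math.log(a) + (s - 1) * math.log(a + b * s)
             - (a + b * s) - math.lgamma(s + 1))
    return math.exp(log_f)

# Hypothetical parameter values, chosen only for illustration.
a, b = 2.0, 0.3
probs = [generalized_poisson_pmf(s, a, b) for s in range(200)]
total = sum(probs)                               # should be close to 1
mean = sum(s * p for s, p in enumerate(probs))   # should be close to a / (1 - b)
```

Setting b = 0 recovers the ordinary Poisson distribution with mean a, so b can be read as a measure of how far the cardinalities depart from Poisson behavior.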
The assumption that $D$ is disc shaped is not necessary, but it is convenient for calculating the degree distributions. At the beginning, the nodes on $D$ can be regarded as potential authors. We call a node a corresponding author if it is selected in the for loops. The definitions of $D_i$ and $I_i$ give a geometric expression of node similarity: if
node $i$ belongs to the influence region of node $j$, we say that $i$ is similar to $j$. The definition of $I_i$ also gives a
geometric expression of node authority: the nodes with larger $I_i$ have more chances to attract connections.
The first connection mechanism, step 2, models the coauthor behavior of basic authors, who often join one research group and coauthor with a few fixed researchers (Fig.3 (a)). The small value of $c$ in step 2.1 makes the
nodes selected by step 2 have a small number of neighbors and often belong to a small component. The second
connection mechanism, step 3, models the coauthor behavior of academic authorities, who join several research
groups, and whose neighbors in different groups generally don't coauthor (Fig.3 (b)).
The parameters of the network generated by the model (Table 1) are $N = 30{,}000$, $T_1 = 9{,}000$, $T_2 = 7{,}000$, $m = 10$,
$R = 10$, $c = 0.01$, $p_1 = p_2 = 1$, and $\beta = \frac{1}{2}$. $\alpha(\theta)$ is given by Eq. (1). In step 3.3, we let 20% of the subsets have
cardinality 2, 50% have cardinality 3, and 30% have cardinality 4.
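Step 3 can be sketched in the same spirit. Several readings here are assumptions rather than statements from the text: the influence region is taken to have area α(θ_i)·t^(-β), the cardinalities 2, 3 and 4 are taken to count node i itself, the partition is made by shuffling and cutting (so the last chunk may be smaller than drawn), and boundary effects are ignored. The run uses small toy parameters rather than the values listed above, and all function names are hypothetical.

```python
import math
import random

def alpha(theta):
    """Piecewise-constant factor of Eq. (1); theta is taken in [0, 2*pi)."""
    return 1.0 if theta < 0.5 * math.pi else 0.25

def split_with_author(i, others, sizes, weights, rng):
    """Step 3.3 (assumed reading): shuffle the collected nodes, cut them
    into chunks, and add node i to every chunk. `sizes` are the final
    hyperedge cardinalities, so each chunk holds size - 1 other nodes."""
    pool = list(others)
    rng.shuffle(pool)
    subsets = []
    while pool:
        k = rng.choices(sizes, weights=weights)[0] - 1
        chunk, pool = pool[:k], pool[k:]
        subsets.append(frozenset(chunk) | {i})
    return subsets

def second_mechanism(points, T2, m, beta, p2, rng):
    """Step 3: at time t, each of m selected nodes i collects the nodes in
    a disc of area alpha(theta_i) * t**(-beta) and splits them."""
    hyperedges = []
    for t in range(1, T2 + 1):
        for _ in range(m):
            i = rng.randrange(len(points))
            xi, yi = points[i]
            theta = math.atan2(yi, xi) % (2.0 * math.pi)
            radius = math.sqrt(alpha(theta) * t ** (-beta) / math.pi)
            others = [j for j, (x, y) in enumerate(points)
                      if j != i and math.hypot(x - xi, y - yi) <= radius
                      and rng.random() < p2]
            hyperedges += split_with_author(i, others, [2, 3, 4],
                                            [0.2, 0.5, 0.3], rng)
    return hyperedges

# Toy run: 3000 nodes sprinkled uniformly on a disc of radius 10.
rng = random.Random(7)
R = 10.0
points = []
for _ in range(3000):
    r, th = R * math.sqrt(rng.random()), rng.uniform(0.0, 2.0 * math.pi)
    points.append((r * math.cos(th), r * math.sin(th)))
edges = second_mechanism(points, T2=20, m=5, beta=0.5, p2=1.0, rng=rng)
```

Under this reading, nodes selected at small t receive large influence regions and therefore many hyperedges, while their neighbors in different subsets are not linked to each other.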
THE DATA
The DBLP dataset provides open bibliographic information on major computer science journals and proceedings. Based on the bibliographic information of 72,269 papers published in 54 mathematical journals, we constructed a coauthorship network: DBLP Math. Another corpus we considered here is the set of 3,007 interdisciplinary papers published in PNAS in 1999-2013.
In those networks, we identify authors by their names in their articles. This method is reliable in distinguishing
authors from one another, but it mistakes one author for two if the author's name is written differently in different
papers. Also, two authors may have the same name, in which case we cannot distinguish them.
The distributions of hyperedge cardinality, degree, and length of shortest paths of some networks in Table 1 are
fitted by corresponding functions respectively. The parameters of the functions are estimated by cftool, a curve fitting toolbox in MATLAB. Four statistical measures are used for measuring the goodness of fit: the sum of squares due to error (SSE), the root mean squared error
(RMSE), the coefficient of determination ($R^2$), and the degree-of-freedom adjusted coefficient of determination
(adjusted $R^2$).
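The four measures can be computed from observed and fitted values as sketched below, using their usual textbook definitions. That these agree with the cftool output in every detail is an assumption; the function name and sample values are hypothetical.

```python
import math

def goodness_of_fit(y, y_hat, n_params):
    """SSE, RMSE, R^2 and adjusted R^2 for observations y and fitted
    values y_hat. RMSE and adjusted R^2 use the residual degrees of
    freedom n - n_params, where n_params is the number of fitted
    coefficients."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    dof = n - n_params
    rmse = math.sqrt(sse / dof)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / dof) / (sst / (n - 1))
    return {"SSE": sse, "RMSE": rmse, "R2": r2, "AdjR2": adj_r2}

# Invented sample values for a two-parameter fit.
stats = goodness_of_fit([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_params=2)
```

SSE and RMSE closer to 0, and R^2 and adjusted R^2 closer to 1, indicate a better fit; the adjusted version penalizes extra fitted coefficients.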
The tails of degree distributions are fitted by Clauset et al's method [13-16]. The fitting function is
$$f(k) = \frac{k^{-\beta}}{\sum_{n=0}^{\infty} (n + x_{\min})^{-\beta}}, \qquad (3)$$
where $\beta$ is the scaling exponent and $x_{\min}$ is the lower bound of the power-law behavior. Here, the parameters
$\beta$ and $x_{\min}$ are calculated by Clauset et al's programs (http://tuvalu.santafe.edu/~aaronc/powerlaws). The p-value (p)
and the maximum distance between the cumulative distribution functions of the data and the fitted function (gof) are
also calculated by their program to assess the goodness of fit.
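To make Eq. (3) and the gof distance concrete, the sketch below evaluates the discrete power-law probability for given values of the exponent and lower bound, and computes the maximum distance between the empirical and fitted cumulative distributions. It is not Clauset et al's program: it does not estimate the parameters or the p-value, the normalizing sum is approximated by a truncated sum with an integral tail correction, and all function names are hypothetical.

```python
from bisect import bisect_right

def hurwitz_zeta(beta, q, n_terms=10_000):
    """Approximate sum_{n>=0} (n + q)^(-beta) for beta > 1: an explicit
    head plus an integral estimate of the remaining tail."""
    head = sum((n + q) ** (-beta) for n in range(n_terms))
    x = n_terms + q
    return head + x ** (1.0 - beta) / (beta - 1.0) + 0.5 * x ** (-beta)

def power_law_pmf(beta, xmin):
    """Return f(k) of Eq. (3) as a function of k; zero below xmin."""
    z = hurwitz_zeta(beta, xmin)
    return lambda k: k ** (-beta) / z if k >= xmin else 0.0

def ks_distance(data, beta, xmin):
    """Maximum distance between the empirical CDF of the data >= xmin
    and the CDF of the fitted power law."""
    tail = sorted(k for k in data if k >= xmin)
    pmf = power_law_pmf(beta, xmin)
    cdf_fit, worst = 0.0, 0.0
    for k in range(xmin, tail[-1] + 1):
        cdf_fit += pmf(k)
        cdf_emp = bisect_right(tail, k) / len(tail)
        worst = max(worst, abs(cdf_emp - cdf_fit))
    return worst

# Invented degree sample, used only to exercise the functions.
degrees = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]
distance = ks_distance(degrees, beta=2.5, xmin=1)
```

A smaller distance means the fitted tail tracks the data more closely; in the full method the same distance is also what is minimized when choosing the lower bound.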
CONCLUSION
We propose a random geometric hypergraph model for coauthorship networks. Two kinds of connection
mechanisms of the model are designed to model the different coauthor patterns of academic authorities and basic
researchers respectively. The connection mechanisms give a geometric realization of the idea that connections are made
preferentially to nodes with more authority and to more similar nodes. By simulation and calculus, we show that the
degree distribution, the distribution of cardinalities of hyperedges, high clustering coefficient, assortativity, and small-world property of the network generated by the model are similar to those of some coauthorship networks.
Figures
FIGURE 1. Panels (a, b) show the influence regions and subgraphs generated by the first and second mechanisms respectively.
Panel (c) shows the combination of the two subgraphs.
FIGURE 2. Degree distributions of some networks in Fig.5
FIGURE 3. The average degree of neighbors of nodes with degree k of some networks in Fig.5
FIGURE 4. The average degree of neighbors of nodes with degree k of some networks in Fig.5
FIGURE 5. Size, clustering coefficient (CC), and assortativity coefficient (AC) for some networks
REFERENCES
1. Hoffman P (1998) The man who loved only numbers. Hyperion, New York.
2. Newman MEJ (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci USA 98: 404-409.
3. Barabási AL, Jeong H, Néda Z, Ravasz E, Schubert A, et al. (2002) Evolution of the social network of scientific
collaborations. Physica A: Statistical Mechanics and its Applications 311: 590-614.
4. Moody J (2004) The structure of a social science collaboration network: Disciplinary cohesion from 1963 to
1999. Am Sociol Rev 69(2): 213-238.
5. Mali F, Kronegger L, Ferligoj A (2010) Co-authorship trends and collaboration patterns in the Slovenian
sociological community. Corvinus J Sociol Soc Policy 1(2): 29-50.
6. Perc M (2010) Growth and structure of Slovenia's scientific collaboration network. J Informetr 4(4): 475-482.
7. Wagner CS, Leydesdorff L (2005) Network structure, self-organization, and the growth of international
collaboration in science. Res Policy 34(10): 1608-1618.
8. Tomassini M, Luthi L (2007) Empirical analysis of the evolution of a scientific collaboration network. Physica
A 385(2): 750-764.
9. Zhou T, Wang BH, Jin YD, He DR, Zhang PP, He Y, et al. (2007) Modeling collaboration networks based on
nonlinear preferential attachment. Int J Mod Phys C 18: 297-314.
10. Estrada E, Rodríguez-Velázquez JA (2006) Subgraph centrality and clustering in complex hyper-networks.
Physica A 364: 581-594.
11. Papadopoulos F, Kitsak M, Serrano MA, Boguñá M, Krioukov D (2012) Popularity versus similarity in
growing networks. Nature 489: 537-540.
12. Xie Z, Ouyang ZZ, Zhang PY, Yi DY, Kong DX (2015) Modeling the citation network by network cosmology.
PLoS ONE (accepted).
13. Clauset A, Shalizi CR, Newman MEJ (2009) Power-law distributions in empirical data. SIAM Rev 51: 661-703.
14. Xie Z, Ouyang ZZ, Li JP (2016) A geometric graph model for coauthorship networks. J Informetr 10: 299-311.
15. Xie Z, Ouyang ZZ, Dong EM, Yi DY, Li JP (2016) Modelling transition phenomena of scientific coauthorship
networks. arXiv: 1604.08891.
16. Xie Z, Xie ZL, Li M, Li JP, Yi DY (2017) Modeling the coevolution between citations and coauthorship of
scientific papers. Scientometrics 112: 483-507.