- Some background on the *Polymath Projects* and their place in the literature.
- The goal of this talk.
- Online research: methodology and general insights.
- Formal models of scientific communities.
- The role of higher-order information for collaboration.
- Extracting patterns of interaction from the data.
- Roles in the collaboration.
- Conclusion: Central figures as available active aggregators.

- Open collaborative mathematics project initiated by Timothy Gowers in February 2009.
- Polymath 1 was devoted to the task of finding an elementary proof of the *Density Hales-Jewett Theorem*.
- Terence Tao soon joined Gowers as one of the leading figures.
- Probable success was announced less than 6 weeks after the start of the project.
- Results were published under the name "D.H.J. Polymath".
- Still active: Polymath 10 and 11 started in the last 6 months.

- Philosophy of mathematical practices: Real mathematics in action.
- Social studies of the internet: e-research and citizen science.
- Social epistemology: structure of scientific communities.

**By insiders:**
- Gowers & Nielsen. 2009. "Massively collaborative mathematics". *Nature* 461.
- Michael Nielsen. *Reinventing Discovery: The New Era of Networked Science*.

**Popular science:**
- New Scientist: "Mathematics becomes more sociable", "How to build the global mathematics brain".

**Quantitative and social network analysis:**
- Quantitative and social network analysis of Polymath 1 by Cranshaw & Kittur (2011).
- Studied within the **"Social Machine of Mathematics"** project by Pease and Martin.
- Further publications by Varshney (2012), Stefaneas & Vandoulakis (2012).
- Van Bendegem, J.P. 2014. "Mathematics and the new technologies, part III: The cloud and the web of proofs". In *Logic, Methodology and Philosophy of Science: Proceedings of the 14th International Congress (Nancy) — Logic and Science Facing the New Technologies*. College Publications, London, pp. 427-439.
- Allo, P., Van Bendegem, J.P. & Van Kerkhove, B. 2013. "Mathematical Arguments and Distributed Knowledge". In Aberdein & Dove (eds), *The Argument of Mathematics*. Springer, Dordrecht, pp. 339-360.

- Go beyond the retelling of the story of Polymath.
- Evaluate whether this is really *a new kind of mathematics*.
- Focus on division of epistemic labour, information-flow, community-structures and diversity of roles.
- Use this as a case-study in the study of the structure of scientific communities.

- The role of ICT in mathematical research.
- Polymath as e-research or as citizen science.
- Quantitative insights.
- Substantial network-analysis (e.g. compute centrality-measures and triadic closure, look for connected components, ...).
- The messy details of cleaning the data, and some of the obstacles I faced.

- Draw on formal methods for studying scientific communities, interaction, and information-flow.
- Not as a priori methods, but as a source of concepts to guide an empirical study.

- Blog-post (L0-comment)
  - L1-comment: *timestamp: ..., author: Alice, content: ..., ...*
  - L1-comment: *timestamp: ..., author: Bob, content: ..., ...*
    - L2-comment: *timestamp: ..., author: Alice, content: ..., ...*
    - L2-comment: *timestamp: ..., author: Carol, content: ..., ...*

- L1-comment: *timestamp: ..., author: Bob, content: ..., ...*
  - L2-sub-comment: *timestamp: ..., author: Alice, content: ..., ...*
    - L3-comment: *timestamp: ..., author: Bob, content: ..., ...*

In [8]:

```
example_graph = nx.DiGraph()
example_graph.add_nodes_from([1, 2, 3, 4, 5, 6, 7])
example_graph.add_edges_from([(3, 2), (4, 2), (6, 5), (7, 6)])
matplotlib.style.use(SBSTYLE)
nx.draw_networkx(example_graph,
                 pos={1: (0, 0), 2: (0, 1), 3: (1, 1), 4: (1, 1.5),
                      5: (0, 2), 6: (1, 2), 7: (2, 2)},
                 nodelist=[1, 2, 3, 4, 5, 6, 7],
                 node_color=[1, 2, 1, 3, 2, 1, 2])
limits = plt.axis("off")
```

- Blog-post is not included.
- Edges from child-node to parent-node.
- Level of comment = distance to blog-post.
- A project is a list of threads.
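
The representation just described can be sketched with a plain parent map standing in for the child-to-parent edges (hypothetical comment ids; the blog post is kept here only as a sentinel for computing levels, even though it is not a node in the actual graphs):

```python
# Hypothetical thread: each comment points to its parent; "post" is the blog
# post, used only as a sentinel so levels can be computed as distances to it.
PARENT = {"c1": "post", "c2": "post", "c2.1": "c2", "c2.1.1": "c2.1"}

def level(comment):
    """Level of a comment = number of child->parent hops to the blog-post."""
    steps = 0
    while comment != "post":
        comment = PARENT[comment]
        steps += 1
    return steps

print({c: level(c) for c in PARENT})  # {'c1': 1, 'c2': 1, 'c2.1': 2, 'c2.1.1': 3}
```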

In [26]:

```
example_graph2 = nx.DiGraph()
example_graph2.add_nodes_from(['Alice', 'Bob', 'Carol'])
example_graph2.add_weighted_edges_from([('Alice', 'Bob', 2), ('Bob', 'Alice', 1),
                                        ('Carol', 'Bob', 1)])
matplotlib.style.use(SBSTYLE)
nx.draw_networkx(example_graph2, nodelist=['Alice', 'Bob', 'Carol'],
                 node_color=[1, 3, 2], node_size=1000)
limits = plt.axis("off")
```

- Total number of comments
- Number of comments by comment-level.
- Total word-count.
- Timestamps of all comments.
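
These per-project statistics can be read straight off the raw comment records; a minimal sketch on hypothetical records (the real data also carries timestamps and thread ids):

```python
from collections import Counter

# Hypothetical comment records; the actual records also carry timestamps.
comments = [
    {"author": "Alice", "level": 1, "content": "a first remark"},
    {"author": "Bob", "level": 2, "content": "a reply to that remark"},
    {"author": "Alice", "level": 2, "content": "another reply"},
]

total = len(comments)                                   # total number of comments
by_level = Counter(c["level"] for c in comments)        # comments by comment-level
word_count = sum(len(c["content"].split()) for c in comments)  # total word-count

print(total, dict(by_level), word_count)  # 3 {1: 1, 2: 2} 10
```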

- **Polymath 1:** New proofs and bounds for the density Hales-Jewett theorem. Initiated Feb 1, 2009; research results have now been published.
- **Polymath 2:** Must an "explicitly defined" Banach space contain $c_0$ or $l_p$? Initiated Feb 17, 2009; attempts to relaunch via wiki, June 9, 2010.
- **Polymath 3:** The polynomial Hirsch conjecture. Proposed July 17, 2009; launched September 30, 2010.
- **Polymath 4:** A deterministic way to find primes. Proposed July 27, 2009; launched Aug 9, 2009. Research results have now been published.
- **Polymath 5:** The Erdős discrepancy problem. Proposed Jan 10, 2010; launched Jan 19, 2010. Activity ceased by the end of 2012, but results from the project were used to solve the problem in 2015.
- **Polymath 6:** Improving the bounds for Roth's theorem. Proposed Feb 5, 2011. Partial result published by a non-participant.
- **Polymath 7:** Establishing the Hot Spots conjecture for acute-angled triangles. Proposed May 31, 2012; launched June 8, 2012.
- **Polymath 8:** Improving the bounds for small gaps between primes. Proposed June 4, 2013; launched June 4, 2013. Research results have now been published.
- **Polymath 9:** Exploring Borel determinacy-based methods for giving complexity bounds. Proposed Oct 24, 2013; launched Nov 3, 2013. "Success of a kind".
- **Polymath 10:** Improving the bounds for the Erdős-Rado sunflower lemma. Launched Nov 2, 2015. Ongoing.
- **Polymath 11:** Proving Frankl's union-closed conjecture. Proposed Jan 21, 2016; launched Jan 29, 2016. Ongoing.

In [10]:

```
from matplotlib.ticker import FuncFormatter

PROJECTS_TO_C = ["Polymath {}".format(i) for i in range(1, 11)]
PARTICIPANTS = Series([PM_FRAME.loc[project]['authors (accumulated)'].iloc[-1]
                       for project in PROJECTS_TO_C], index=PROJECTS_TO_C)
R_NETWORKS = Series([PM_FRAME.loc[project]['r_network'].dropna().iloc[-1]
                     for project in PROJECTS_TO_C], index=PROJECTS_TO_C)
WITH_D = [project for project in PROJECTS_TO_C if not PM_FRAME.loc[project]['research'].all()]
D_NETWORKS = Series([PM_FRAME.loc[project]['d_network'].dropna().iloc[-1]
                     for project in WITH_D], index=WITH_D)
R_PARTICIPANTS = R_NETWORKS.apply(lambda network: set(network.author_frame.index))
D_PARTICIPANTS = D_NETWORKS.apply(lambda network: set(network.author_frame.index))
COMMENTS = Series([PM_FRAME.loc[project]['number of comments (accumulated)'].iloc[-1]
                   for project in PROJECTS_TO_C], index=PROJECTS_TO_C)
df = DataFrame({'all threads': PARTICIPANTS,
                'research threads': R_PARTICIPANTS,
                'discussion threads': D_PARTICIPANTS},
               index=PROJECTS_TO_C)
df['authors only active in research threads'] = df['research threads'] - df['discussion threads']
df['authors only active in "discussion" threads'] = df['discussion threads'] - df['research threads']
df['authors active in both types of threads'] = (df['all threads']
                                                 - df['authors only active in research threads']
                                                 - df['authors only active in "discussion" threads'])
for project in PROJECTS_TO_C:
    if pd.isnull(df.loc[project]['authors only active in research threads']):
        df.loc[project]['authors only active in research threads'] = df.loc[project]['all threads']
data = df[['authors only active in research threads',
           'authors only active in "discussion" threads',
           'authors active in both types of threads']]
data = data.applymap(lambda cell: len(cell) if pd.notnull(cell) else 0)
matplotlib.style.use(SBSTYLE)
axes = data.plot(kind='bar', stacked=True,
                 color=['steelblue', 'lightsteelblue', 'lightgrey'],
                 title="Number of participants per thread-type in each Polymath project\n"
                       "Number of comments per project")
axes.set_ylabel("Number of participants")
axes.annotate('published', xy=(0, 115), xytext=(0, 130),
              arrowprops=dict(facecolor='steelblue', shrink=0.05))
axes.annotate('published', xy=(3, 60), xytext=(1.5, 80),
              arrowprops=dict(facecolor='steelblue', shrink=0.05))
axes.annotate('re-used', xy=(4, 130), xytext=(4.5, 140),
              arrowprops=dict(facecolor='lightsteelblue', shrink=0.05))
axes.annotate('published', xy=(7, 155), xytext=(7.5, 170),
              arrowprops=dict(facecolor='steelblue', shrink=0.05))
# plot comment counts on a square-root scale so large projects don't dwarf the rest
data2 = np.sqrt(COMMENTS)
axes2 = axes.twinx()
axes2.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: "{:0.0f}".format(np.square(x))))
axes2.set_ylabel("Number of comments")
axes2.plot(axes.get_xticks(), data2.values,
           linestyle='-', marker='.', linewidth=.5,
           color='darkgrey')
```

Out[10]:

[<matplotlib.lines.Line2D at 0x1c7773f28>]

In [13]:

```
plot_community_evolution("Polymaths")
```

<matplotlib.figure.Figure at 0x1c730ca20>

In [28]:

```
select_n = plot_participation_evolution("Polymath", n=2)
```

(threshold: participation in at least two projects)

In [17]:

```
from mpl_toolkits.axes_grid1 import make_axes_locatable

authors_n = sorted([author for author, selected in select_n.items() if selected])

def general_heatmap(authors=None, binary=False, thread_level=True,
                    binary_method='average', method='ward', log=True,
                    fontsize=8):
    if thread_level:
        authors_filtered = list(ALL_AUTHORS)
        data = PM_FRAME['comment_counter']
    else:
        authors_filtered = list(ALL_AUTHORS) if not authors else authors
        data = get_last(POLYMATHS)[0]['comment_counter (accumulated)']
    try:
        authors_filtered.remove("Anonymous")
    except ValueError:
        pass
    if binary:
        as_matrix = np.array([[author in data[thread] for author in authors_filtered]
                              for thread in data.index])
        Z_author = linkage(as_matrix.T, method=binary_method, metric='hamming')
        Z_thread = linkage(as_matrix, method=binary_method, metric='hamming')
        c, _ = cophenet(Z_author, pdist(as_matrix.T))
        print("Cophenetic Correlation Coefficient with {}: {}".format(binary_method, c))
    else:
        as_matrix = np.array([[data.loc[thread][author] for author in authors_filtered]
                              for thread in data.index])
        Z_author = linkage(as_matrix.T, method=method, metric='euclidean')
        Z_thread = linkage(as_matrix, method=method, metric='euclidean')
        c, _ = cophenet(Z_author, pdist(as_matrix.T))
        print("Cophenetic Correlation Coefficient with {}: {}".format(method, c))
    # start setting up plots
    matplotlib.style.use(SBSTYLE)
    fig, ax_heatmap = plt.subplots()
    # compute dendrograms (only used to order rows and columns)
    ddata_author = dendrogram(Z_author, color_threshold=.07, no_plot=True)
    ddata_thread = dendrogram(Z_thread, color_threshold=.07, no_plot=True)
    df = DataFrame(as_matrix, columns=authors_filtered)
    cols = [authors_filtered[i] for i in ddata_author['leaves']]
    df = df[cols]
    rows = [df.index[i] for i in ddata_thread['leaves']]
    df = df.reindex(rows)
    # plot heatmap
    heatmap = ax_heatmap.pcolor(df,
                                edgecolors='w',
                                cmap=mpl.cm.binary if binary else mpl.cm.GnBu,
                                norm=mpl.colors.LogNorm() if log else None)
    ax_heatmap.autoscale(tight=True)   # get rid of whitespace in margins of heatmap
    ax_heatmap.set_aspect('equal')     # ensure heatmap cells are square
    ax_heatmap.xaxis.set_ticks_position('bottom')  # put column labels at the bottom
    ax_heatmap.tick_params(bottom='off', top='off', left='off', right='off')  # turn off ticks
    ax_heatmap.set_title("Project-Engagement in Polymath")
    ax_heatmap.set_yticks(np.arange(0.5, len(df.index) + .5, 1))
    ax_heatmap.set_yticklabels(df.index + 1, fontsize=fontsize)
    ax_heatmap.set_xticks(np.arange(len(df.columns)) + 0.5)
    ax_heatmap.set_xticklabels(df.columns, rotation=90, fontsize=fontsize)
    if not binary:
        divider_h = make_axes_locatable(ax_heatmap)
        cax = divider_h.append_axes("right", "3%", pad="1%")
        plt.colorbar(heatmap, cax=cax)
    lines = (ax_heatmap.xaxis.get_ticklines() +
             ax_heatmap.yaxis.get_ticklines())
    plt.setp(lines, visible=False)
    plt.tight_layout()

general_heatmap(authors=authors_n, thread_level=False,
                binary=False, log=True)
```

Cophenetic Correlation Coefficient with ward: 0.9424851136308227

- **Leaders (2):** Tao & Gowers
- **Core-participants (11):** Peake -> Sauvaget
- **Periphery (49):** Kowalski -> mixedmath
- **Outsider (1):** Edgington (very active in PM5 and PM11, but not an early adopter)

- Most comments and most participants.
- Little commenting activity in the heatmap.
- Only 29 of the 180 participants in Polymath 8 overlap with other projects.
- Typical computational project: “a project to improve the bound $H=H_1$ on the least gap between consecutive primes that was attained infinitely often”

- Bala & Goyal model adapted by Zollman.

- Communities are represented as graphs.

- Success = converging to the truth.

- Results based on simulations (typically in NetLogo).

- Widely cited insight: maximal connectivity is *not* the optimal organisation (being shielded from potentially bad information is beneficial).
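
A minimal sketch of this kind of model can make the setup concrete. This is NOT Zollman's published code: the arm payoffs (0.5 vs 0.6), the success/trial-count beliefs, and the round counts are all illustrative assumptions.

```python
import random

def simulate(neighbours, rounds=100, seed=0):
    """Zollman-style bandit sketch: agents repeatedly choose an act, observe a
    payoff, share results with their graph neighbours, and update; returns each
    agent's final estimate of the objectively better act ("new")."""
    rng = random.Random(seed)
    p = {"old": 0.5, "new": 0.6}    # "new" is in fact the better act
    # mildly random initial expectations for the "new" act
    succ = {a: {"old": 1.0, "new": rng.uniform(0.0, 2.0)} for a in neighbours}
    tries = {a: {"old": 2.0, "new": 2.0} for a in neighbours}
    for _ in range(rounds):
        results = {}
        for a in neighbours:
            est = {arm: succ[a][arm] / tries[a][arm] for arm in p}
            arm = max(est, key=est.get)          # choose the act believed best
            results[a] = (arm, 1 if rng.random() < p[arm] else 0)
        for a in neighbours:
            # update on one's own result and on the neighbours' results
            for b in [a] + list(neighbours[a]):
                arm, outcome = results[b]
                succ[a][arm] += outcome
                tries[a][arm] += 1
    return {a: succ[a]["new"] / tries[a]["new"] for a in neighbours}

# A complete graph vs a cycle on four agents: the cycle's sparser information
# flow is what can shield agents from early misleading evidence.
complete = {a: [b for b in range(4) if b != a] for a in range(4)}
cycle = {a: [(a - 1) % 4, (a + 1) % 4] for a in range(4)}
```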

- Only *factual information* is exchanged.

- Agent-behaviour is reduced to choosing an option.

- Graph-based typology of communities

- Only one-to-one exchanges.

- Does not fit the open-ended character of Polymath.

- Interactions in Polymath cannot be reduced to observations.

- Public character of Polymath suggests maximal accessibility.

- Mutual intentions.
- Common intentions.
- Group commitments.

- Impossible to achieve through asynchronous communication.

- Access to information does not explain why teamwork succeeds.
- Reliable information-aggregation isn't sufficient for teamwork.
- Patterns of one-to-many communication as a finer typology.

**Note:** Zollman's model could be seen as private announcements/observations only.

**Criterion:** participation in the same thread. **Account:** who might have interacted.
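
On this criterion, a co-presence network is just a weighted count over author pairs sharing a thread. A minimal sketch on hypothetical thread membership (not the actual pipeline code):

```python
from itertools import combinations
from collections import Counter

# Hypothetical data: which authors commented in which thread.
THREADS = {
    "thread-1": {"Alice", "Bob"},
    "thread-2": {"Alice", "Bob", "Carol"},
    "thread-3": {"Bob", "Carol"},
}

# Two authors *might* have interacted whenever they participated in the same
# thread; edge weight = number of shared threads.
co_presence = Counter()
for authors in THREADS.values():
    for pair in combinations(sorted(authors), 2):
        co_presence[pair] += 1

print(co_presence)  # ('Alice', 'Bob') and ('Bob', 'Carol') share 2 threads each
```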

In [72]:

```
project_heatmap("Polymath 4", cluster_threads=True, method='average', log=True, fontsize=10)
```

In [100]:

```
project_heatmap("Polymath 1", cluster_threads=True, method='average', log=True, fontsize=9)
```

**Criterion:** direct replies. **Account:** who did interact.
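
On this criterion, the interaction network is a weighted directed graph of replies: an edge from the author of a reply to the author of the comment replied to. A minimal sketch on a hypothetical thread (not the actual pipeline code):

```python
from collections import Counter

# Hypothetical thread: comment id -> (author, parent comment id);
# "post" stands for the blog post itself.
COMMENTS = {
    "c1": ("Alice", "post"),
    "c2": ("Bob", "post"),
    "c2.1": ("Alice", "c2"),
    "c2.2": ("Carol", "c2"),
    "c2.1.1": ("Bob", "c2.1"),
}

# Who *did* interact: weighted directed edges replier -> replied-to.
replies = Counter()
for author, parent in COMMENTS.values():
    if parent in COMMENTS:  # direct replies to the blog post carry no edge
        replies[(author, COMMENTS[parent][0])] += 1

print(replies)
```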

In [83]:

```
draw_network("Polymath 4", graph_type="interaction", reset=True)
```

- Weighted directed graph.

**Criterion:** high probability of mutual awareness. **Account:** who probably interacted.

- But how can this information be extracted from actual interactions?

- Contributing puts one in the *centre of discussion*.

- Closeness to the *centre of discussion* is an indication of availability.

- Being close to the *centre of discussion* is sufficient to know who else is close to the centre of discussion.

- The above is common knowledge.

- Modulo vagueness about what it means to be close to the centre of discussion (and abstracting from the temporal ordering), the set *A* of agents that are close to the centre of discussion is common knowledge within *A*.

- This approximates the conditions for the members of *A* to make **public announcements** to the members of *A*.

- Common knowledge within *A* of what is announced is highly probable.

- Distance from the centre of discussion is proportional to time elapsed since last comment.
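
A minimal sketch of this last step, with hypothetical timestamps and an arbitrary one-day threshold standing in for the (vague) notion of closeness to the centre of discussion:

```python
from datetime import datetime, timedelta

# Hypothetical last-comment timestamps per author.
last_comment = {
    "Alice": datetime(2009, 2, 10, 12, 0),
    "Bob": datetime(2009, 2, 10, 9, 0),
    "Carol": datetime(2009, 2, 3, 18, 0),
}
now = datetime(2009, 2, 10, 13, 0)

# Distance from the centre of discussion ~ time elapsed since last comment;
# authors within the (assumed) threshold form the mutually aware set A.
threshold = timedelta(days=1)
A = {a for a, t in last_comment.items() if now - t <= threshold}
print(sorted(A))  # ['Alice', 'Bob']
```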

In [95]:

```
import io
import base64
from IPython.display import HTML
video = io.open('FIGS/out.m4v', 'r+b').read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
<source src="data:video/mp4;base64,{0}" type="video/mp4" />
</video>'''.format(encoded.decode('ascii')))
```

Out[95]: