Mechanisms to divide labour in social microorganisms 1

1


Introduction
Different species use different mechanisms to divide labour between reproductives and their helpers (Figure 1).In some cases, individuals interact with one another to determine which individual will perform which role, leading to a precise allocation of labour (coordinated specialisation).For example, workers feeding royal jelly to larvae to produce honey bee queens (Figure 1A), and how a lack of nitrogen fixation by neighbours causes cells of the bacteria Anabaena cylindrica to differentiate into sterile nitrogen-fixing heterocysts (Figure 1B) [1][2][3] .In other cases, there is no between-individual interaction, and each individual adopts its phenotype randomly (random specialisation).
For example, in Salmonella enterica co-infections, random biochemical fluctuations within each cell's cytoplasm determines which cells infect the gut tissue and trigger an inflammatory response that eliminates competitor species (Figure 1D) 4,5 .In yet further cases, phenotype is determined by the individual's genotype (genetic control).For instance, in some ant societies whether individuals develop into queens, major, or minor workers can be determined, in part, by their genes (Figure 1C) [6][7][8][9] .Across the tree of life some species employ one mechanism to divide labour whereas in other species mixed forms exist with a combination of coordinated specialisation, random specialisation, or genetic control 6,8,[10][11][12][13] .A. cylindrica filaments (cyanobacteria), some individuals develop into sterile nitrogen fixers (larger, round cells) if the amount of nitrogen fixed by their neighbours is insufficient (coordinated specialisation).This leads to a precise allocation of labour, with nitrogen-fixing cells distributed at fixed intervals along the filament 2 .(Picture taken by Robert Calentine.)C) In the army ant (Eciton Burchelli), whether individual ants become a major or minor worker has a genetic component (genetic control) 9 .(Photo by Alex Wild via the Wikimedia Commons, cropped.)D) In S. enterica infections (serovar Typhymurium), each cell amplifies intra-cellular noise to determine whether it will self-sacrifice and trigger an inflammatory response that eliminates competing strains (random specialisation). 5(Photo by Rocky Mountain Laboratories, NIAID, NIH via Wikimedia Commons.) We lack general evolutionary explanations for why different species use different mechanisms to divide labour between reproductives and their helpers 6,[14][15][16] .Previous work has focused on non-reproductive division of labour in the social insects, and the proximate mechanisms that lead to different worker castes 9,[17][18][19][20][21][22] .However, the focus in that literature is on a different question -how different proximate mechanisms can produce coordinated specialisation -rather than the broader question of whether coordinated specialisation should be favoured over random specialisation or genetic control in the first place.It is with reproductive division of labour that these three very different mechanisms have been observed in different species 16,23 .There is a lack of theory to explain this variation in the mechanisms used to produce reproductive division of labour 6,[14][15][16] .
Bacteria and other social microbes offer excellent opportunities for studying why different labour dividing mechanisms are favoured in different species.Across microbe species, two different mechanisms are used to produce reproductive division of labourcoordinated and random specialisation (Figure 2).Furthermore, while the form of cooperation and life histories of these microbes share many similarities, they also vary in factors that could influence the evolution of division of labour, such as social group size.We develop theory to examine whether the relative advantage of random versus coordinated specialisation can depend upon social or environmental conditions.Our aim here is to use microbes as a 'test system' to address the broader question of whether evolutionary theory can explain the diversity in the mechanisms that produce division of labour.
We compare the relative fitness advantage of dividing labour with either coordinated or random specialisation.We begin by assuming that coordinated specialisation leads to social groups containing the optimal allocation of helpers and reproductives.While this assumption is useful as a conceptual overview, such 'full coordination' may require a complex system of mechanisms and could be difficult to evolve from scratch.
Consequently, in a secondary analysis we relax the assumption of full coordination and allow the level of coordination to co-evolve along with the extent to which labour is divided.This scenario is more biologically realistic and allows us to examine both whether coordination can initially arise and the extent of coordination favoured.Cell-to-cell interactions parental control (epigenetics) or through relative condition dependence (signals or cues) 16,29 .
While this mechanism can be metabolically costly, it produces precise groups with the optimal proportion of helpers and reproductives.

Results and Discussion
Our first aim is to capture the problem in a deliberately simple model, which is easy to interpret, and can be applied across diverse microbes 30,31 .We assume a form of cooperation that is common in microbes, when some individuals produce a 'public good' that benefits the other cells.To examine when different mechanisms to divide labour could be favoured, we compare the relative fitness of individuals that use either fully coordinated or random specialization.We then test the robustness of our results, by developing a number of alternate models for different biological scenarios, and by examining how different levels of coordination could evolve.

Random specialisation vs fully coordinated specialisation
We assume that a single individual arrives on an empty patch and, through a fixed series of replications, spawns a clonal group of ݊ individuals consisting of ݇ sterile helpers and ݊ െ ݇ pure reproductives (݇ ‫א‬ ሼ0,1,2, … , ݊ሽ).The reproductive success (fitness) of a particular group (that we denote by ݃ , ), as measured by the per capita number of dispersing offspring at the end of the group life cycle is given by where ݊ െ ݇ is the number of reproductives in the group, and ݂ , is the fecundity of each reproductive, which we assume is increasing in ݇.
Expression (1) highlights the trade-off between the number of reproductives in the group (݊ െ ݇), which is higher when there are fewer helpers (lower ݇), and the amount of help that those reproductives obtain (݂ , ), which is higher when there are more helpers (higher ݇).The balance of this trade-off often results in an optimal number of helpers, ݇ ‫כ‬ , that is intermediate (i.e., 0 ൏ ݇ ‫כ‬ ൏ ݊).
In species that divide labour by coordination, we assume that group members share phenotypic information via signalling between cells, parental control, or some cue of relative condition (Figure 2B).Our model is deliberately agnostic to the details of how phenotypic information is shared so as to facilitate broad predictions across different systems.Coordination allows the outcome of individual specialisation to depend on the phenotypes of social group neighbours.In our first analysis, we make the simplifying assumption that individuals coordinate fully, so that groups always form with the optimal number of helpers, ݇ ‫כ‬ .The disadvantage of coordinated specialisation is that the mechanism of coordination can incur metabolic costs (e.g., from the production of extracellular signalling molecules).We posit that the fitness of a group of coordinated specialisers is given by: where ݃ ‫כ‬ , is the fecundity of a group with the optimal number of helpers (i.e., ݃ ‫כ‬ , ൌ max ݇ ݃ , ) and 0 ܿ 1 is a multiplicative cost of coordination that we assume is increasing in group size, ݊.A number of different models have examined how different proximate mechanisms could be favoured to coordinate division of labour in specific systems 17,[20][21][22] .
In species that divide labour by random specialisation, each individual in the group independently becomes a helper with a given probability and a reproductive otherwise (Figure 2A).Hence, the final number of helpers in the group is a random binomial variable.Assuming that the probability of adopting a helper role is equal to the optimal proportion of helpers ‫(‬ ‫כ‬ ൌ ݇ ‫כ‬ ݊ ⁄ ), the expected fitness of a group of random specialisers is given by: The potential advantage of random specialisation is that there are no upfront metabolic costs from, for example, between cell signalling.The downside of random specialisation is that groups sometimes form with fewer or more helpers than is optimal (demographic stochasticity).
In a notable exception to the lack of previous work in this area, Wahl examined when different mechanisms could be favoured to produce non-reproductive division of labour between different castes 15 .Analogous to the cost of random specialization that we have demonstrated, she showed a mechanistically different cost that arises when division of labour is determined genetically: groups may sometimes form that do not contain all of the genotypes required to produce all of the necessary phenotypes (castes) 15 .
Before proceeding, we must specify how reproductive fecundity depends on the amount of help in the group.We focus here on one of the most common forms of cooperation in microbes, where individuals secrete factors that provide a benefit to the local population of cells ("public goods") 32 .We assume that the amount of public good in the social group depends linearly on the number of helpers in the group and is "consumed" by all group members equally 33,34 .An example of such a public good is found in Bacilus subtilis populations, where helper cells produce and secrete proteases that degrade proteins into smaller peptides, which are then re-absorbed as a nutrient source by all cells 35 .
We consider the possibility that the importance of producing public goods may vary.
Accordingly, the fecundity of a reproductive is modelled as the sum of a baseline reproduction (1 െ ߳) and of benefits derived from the production of public goods by helpers (scaled by ߳).The parameter ߳, which we call the essentiality of cooperation, quantifies the degree to which reproductives are reliant on helpers in the group 36 .As ߳ varies from zero to one, the production of public goods goes from being completely unimportant to entirely essential.These assumptions lead to the following expression for the fecundity of a reproductive: Substituting Equation 4into Equations 1-3 and solving for the condition for fully coordinated specialisation to be favoured over random specialisation (i.e., ‫ݓ‬ ‫ݓ‬ ோ ), gives the simplified expression (see supplementary section C): where division of labour is favoured to evolve only when ߳ 1/2, which we will assume henceforth.Condition (5) gives three key (and easy to interpret) predictions for how the metabolic cost of coordination (ܿ ), the size of the group (݊), and the essentiality of cooperation (߳) affect which labour-dividing mechanism is optimal (Figure 3).If division of labour is favoured (߳ 1 2 ⁄ ) and there is no coordination cost for all ݊ then coordinated specialisation is trivially always the preferred strategy (i.e., ‫ݓ‬ ‫ݓ‬ ோ always holds).For positive coordination costs, condition (5) predicts: Prediction 1: Smaller costs of coordination favour coordinated specialisation.
The relative costs of coordination will depend on the specific mechanism by which individuals ensure that the optimal allocation of labour is achieved.For example, signalling between social partners (as occurs in the formation of Dictyostelium discoideum fruiting bodies) could have higher costs than parental control (as may occur in Volvocine algae) or than using the products of cooperation as a cue (as occurs in A. cylindrica) [37][38][39][40][41] .
If there is a cost to coordination, then the optimal mechanism to divide labour depends on how this cost balances against the hidden costs of random specialisation (i.e., the right hand side of inequality ( 5)).Theses hidden costs are determined by: (i) the likelihood that random groups deviate from the optimal proportion of helpers, and (ii) the degree to which those deviations from the optimal proportion of helpers leads to a reduced fecundity for the group (see supplementary section C).As captured by condition (5), the importance of these two factors depends upon the size of the group (n) and the essentiality of cooperation (߳).
Prediction 2. Smaller social groups favour coordinated specialisation.
Smaller groups favour coordinated specialisation for two reasons.First, the hidden costs of random specialisation are greater in smaller groups.In smaller groups, there are fewer possible outcomes for the proportion of helpers (݊ 1 possible allocations of labour for groups of size ݊).Consequently, random specialisation can more easily lead to the formation of groups with a realized proportion of helpers that deviates significantly from the optimum ‫(‬ ‫ا‬ ‫‬ ‫כ‬ or ‫‬ ‫ب‬ ‫‬ ‫כ‬ ).In contrast, in larger groups there are more possible outcomes and the resulting proportion of helpers will be more clustered about the optimal composition with highest fitness (with ‫‬ ൎ ‫‬ ‫כ‬ for very large group sizes).
This effect of group size on the cost of random specialisation is a consequence of the law of large numbers (e.g., one is much more likely have outcomes close to 50% heads when tossing 100 coins in a row compared to tossing 4 coins in a row, in which no heads or all heads may frequently occur).Our prediction here is also analogous to how, when mating occurs in small groups, small brood sizes select for more precise and less female biased sex ratios to decrease the probability of producing a group containing no males [42][43][44] .Similarly, in some cases random specialisation in a small group may strategically favour a lower (or higher) proportion of helpers than might otherwise be optimal, to decrease the chances of having zero or insufficient reproductives (or helpers; see supplementary section F and Figure S4).
The second reason why smaller groups favour coordinated specialisation is that in smaller groups there are fewer individuals with whom to coordinate and thus a smaller cost of coordination, given our assumption that ܿ is increasing in ݊.The importance of this influence of group size depends upon mechanistic details about how coordinated is achieved.

Prediction 3: More essential public goods favour coordinated specialisation
When cooperation is more essential (larger ߳), the hidden costs of random specialisation increase, and so coordinated specialisation is more likely to be favoured.
As the essentiality of cooperation increases, the fitness costs incurred from producing too few helpers also increases (i.e., there are larger costs to deviating from optimum).In addition, as cooperation becomes more essential (larger ߳), the fraction of helpers favoured increases towards 50% helpers ‫(‬ ‫כ‬ ൎ ଵ ଶ ).This increases the variance in the proportion of helpers produced by random specialisers, and so sub-optimal groups may arise more frequently (see supplementary section C).Both of these effects increase the hidden costs of random specialisation, and thus favour the evolution of fully coordinated specialisation (Figure 3).

Alternative forms of cooperation
The above analysis employs a deliberately simple public goods model, focusing on factors that are expected to be broadly relevant across many microbial systems.This facilitates the interpretation of our results, and generates broadly applicable predictions that are less reliant on the specifics of any particular biological species.
In order to test the robustness of our results (predictions 1-3) we also developed a number of alternate models for different biological scenarios (see supplementary sections D and E, and Figures S1-S3).In particular, we examined the possibility that the collective good provided by helpers: (i) is not consumed by its beneficiaries, as may occur when self-sacrificing S. enterica cells enter the gut to trigger an immune response that eliminates competitors (i.e., a non-rivalrous or non-congestible collective good); or (ii) is only consumed by the reproductives in the group, as may occur for the fixed nitrogen produced by heterocyst cells in A. cylindrica filaments (i.e., a kind of excludible good) 4,45,46 .We also examined the possibility that reproductive fecundity depends nonlinearly on the proportion of helpers in the group.In all of these alternative biological scenarios, we found qualitative agreement with the predictions of the linear public goods model.
That said, we made a number of assumptions that it would be useful to investigate theoretically.We assumed that social groups are always clonal, such that natural selection would act to maximise group fitness 47,48 .Relaxing this assumption of clonality would both decrease the optimal level of cooperation and increase the potential for conflict within groups, which could undermine the stability and honesty of inter-cellular signalling, making coordination less likely to be favoured in some cases 36,38,49 .
Analogously, increasing within-group conflict could favour a genetic predisposition (some genetic control) to adopting a reproductive role in systems that otherwise employ coordinated or random specialisation 6,8 .
The optimal level of coordination Finally, we relax our assumption that individuals coordinate fully, and allow for intermediate forms of coordination (Figure 4A).This allows us to examine whether costly coordination can evolve from an ancestral state where there is no coordination.
We assume that coordination is achieved by one-to-one interactions between specialising cells whereby phenotypic information is shared.We let ‫ݏ‬ be the level of coordination, which is equal to the probability that any two cells interact with one another.Once again, our model is agnostic to the specific mode of interaction (whether it be via an extracellular signal, or the intermediary of parental control, or otherwise) so as to generate predictions that are not anchored to the specifics of a particular system.
Cells randomly take turns specialising, and adopt a helper role if the proportion of helpers amongst the cells with which they interact falls below a critical threshold, ‫ݐ‬ (see supplementary section G.1 for more details on this model).We assume that higher levels of coordination (higher ‫)ݏ‬ entail a larger cost of coordination ܿ .Overall, this leads to a trade-off between more costly and precise forms of coordination on the one hand (higher ‫)ݏ‬ and cheaper and less precise forms of coordination on the other hand (lower ‫.)ݏ‬At the two extremes of ‫,ݏ‬ we have either random specialisation with no coordination ‫ݏ(‬ ൌ 0), or full coordination between all cells ‫ݏ(‬ ൌ 1ሻ (Figure 4A).
We used individual based simulations to determine the optimal level of coordination, ‫ݏ‬ ‫כ‬ , and target proportion of helpers, ‫ݐ‬ ‫כ‬ , that co-evolve in different scenarios (see supplementary section G.1, Figure 4B-D and Figure S5).The evolutionary outcomes, ‫ݏ‬ ‫כ‬ and ‫ݐ‬ ‫כ‬ , are estimated by the average trait values that evolved across the entire population, in the last 3,000 generations of 10 independent simulations.The results of our simulations are in broad qualitative agreement with our previous predictions: small groups (low ݊) and more essential cooperation (high ߳) favour the evolution of coordinated specialisation ‫ݏ(‬ ‫כ‬ 0; Figure 4D).Additionally, we found that both intermediate forms of coordination (0 ൏ ‫ݏ‬ ‫כ‬ ൏ 1ሻ and fully coordinated specialisation ‫ݏ(‬ ‫כ‬ ൎ 1ሻ can be favoured to evolve, the latter occurring when group size is particularly small and cooperation is particularly essential (lower ݊ and higher ߳; Figure 4D).We find the same qualitative results with a simple analytical model (see supplementary section G.3).As well as potentially explaining variation across systems, these results also suggest that individual cells could be selected to conditionally adjust their level of coordination in response to factors such as changing group size.Separate simulations have shown that these results are independent of starting conditions.(D) Smaller groups (lower ݊) and more essential cooperation (higher ߳) favour the evolution of fully coordinated specialisation ‫ݏ(‬ ‫כ‬ ൎ 1).Intermediate coordination (0 ൏ ‫ݏ‬ ‫כ‬ ൏ 1) may evolve in more moderate conditions.(See figure S5 for the optimal target proportion of helpers, ‫ݐ‬ ‫כ‬ ).We modelled the cost of coordination as the increasing and decelerating function ܿ ൌ ߙ൫1 െ ݁ ିఉ௦ሺିଵሻ ଶ ⁄ ሻ , where s݊ ሺ݊ െ 1ሻ 2 ⁄ is the expected number of cell-to-cell interactions in a coordinated group with level of coordination ‫,ݏ‬ and 0 ൏ ߙ ൏ 1 and ߚ 0 are parameters that affect the scale and the shape of the cost of coordination curve, respectively.Here, ߚ ൌ 0.01 and ߙ ൌ 0.1.
Overall, these predictions highlight that the level of coordination within a labour-dividing species is a trait that can be shaped by natural selection, and that fully coordinated specialisation is a mechanism that can be expected to evolve via intermediate forms in many cases.We also found that the degree to which intermediate levels of coordination may arise depends on the shape of the costs of coordination (see supplementary section G.2 and Figure S6).In particular, if there is a large up-front cost to coordination, then intermediate coordination (0 ൏ ‫ݏ‬ ‫כ‬ ൏ 1) is less likely to arise-random specialisation will either be favoured (no coordination; ‫ݏ‬ ‫כ‬ ൎ 0ሻ or systems will tend to evolve to fully coordinated specialisation ‫ݏ(‬ ‫כ‬ ൎ 1ሻ.Among microorganisms that employ coordination to divide labour, both cyanobacterial filaments and Volvox carteri colonies have very precise allocations of labour (full coordination) 50,51 .In contrast, the exact level of coordination in fruiting bodies such as D. discoideum and Myxococcus xanthus is less well known [52][53][54] .This highlights the need for further empirical work to determine the extent to which groups are "fully" coordinated in specific species.

Distribution of labour dividing mechanisms in the natural world
Most previous work on reproductive division of labour has tended to be either mechanistic, focusing on how different phenotypes are produced (caste determination), or evolutionary, focusing on why division of labour is favoured in the first place .We have examined how evolutionary theory can also be used to explain why different mechanisms are used in different species 4,6,[14][15][16][17] .In particular, we have shown that coordinated specialisation is more likely to be favoured over random specialisation in small groups, when coordination costs are low, and when there are larger fitness costs to deviating from optimal caste ratios.Our results identify social and environmental factors that could explain the distribution of mechanisms for division of labour that have been observed in bacteria and other microbes.Consistent with our predictions, coordination appears to be more prevalent in species where group sizes are smaller, cooperation is more essential, and/or the cost of coordination is lower (Table 1; see also supplementary section H).However, these connections are speculative, as more data is required for a formal comparative study across species.
Our results also suggest a hypothesis for why random caste determination has not been widely observed in animal societies: during the initial evolution of complex animal societies, group sizes were likely to be small and costs of coordination might have been minor compared to each individual's day-to-day organismal metabolic expenditure. .For illustrative purposes, we provide approximate categorical estimates of relative group size, essentiality of cooperation, and the cost of coordination.This suggests roughly that coordinated specialisation has evolved most frequently in species with smaller groups, low coordination costs, and more essential cooperation.An across-species phylogeny-based comparative study, with a larger species sampling, would be required to formally test the predictions of our models 62 .

Salmonella
This emphasizes the need for more data on the mechanisms underlying division of labour in these and other social microbe species (see supplementary section H).For example, persistence specialisation occurs randomly in large groups, and it has been suggested that this is cooperative 63 .*V. carteri has a very precise allocation of labour 13 , however there is observed variation in the number of germ and somatic cells in many Volvox species and so this may be a case where both coordinated specialisation and some random specialisation occur 11 .

Supplementary information:
A

A Overview
This manuscript is organized as follows.Section B defines the strategies of fully coordinated specialisation and random specialisation, and characterises their fitness.Section C derives the main analytical result of the main text (Equation 5), that is, the condition for fully coordinated specialisation to be favoured over random specialisation when the underlying division of labour is modelled as a linear public goods game.Section D re-examines key assumptions and approximations that simplified the analysis of the public goods model.Section E considers several models for alternative biological systems leading to different kinds of public goods.Both sections D and E demonstrate the robustness and generality of our results when relaxing our assumptions and approximations.Section F presents a numerical investigation on the impact of asymmetry in the fitness functions on the optimal probability of becoming a helper, and how this can mitigate some of the costs of random specialisation.Section G examines the optimal level of coordination using individual based simulations as well as an analytical model (as presented in Fig. 4 of the main text).Finally, section H presents details on how we categorised the relative group size, essentiality of cooperation, and cost of coordination for the different labour-dividing microbial species presented in Table 1 of the main text.

B Labour dividers and their fitness
We assume that a single individual arrives on an empty patch and, through a fixed series of replications, forms a clonal group of n individuals that is composed of k sterile helpers and n − k pure reproductives, where k ∈ N = {0, 1, 2, . . ., n}.The reproductive success (fitness) of the group, g k,n , as measured by the per capita number of dispersing offspring at the end of the group life cycle, is given by where f k,n is the fecundity of each reproductive in the group.We assume that f k,n depends only on the proportion of helpers in the group, p = k/n with p ∈ P = {0, 1/n, 2/n, . . ., (n − 1)/n, 1}, so that we can write f k,n = F (p), where F is a real function that is increasing on the interval [0, 1].This assumption allows us to rewrite (S1) as where we have defined

B.1 Fully coordinated specialisation
With fully coordinated specialisation (C), we assume that some mechanism, such as signalling between cells, ensures that groups always form with the optimal proportion of helpers, p * , where We posit that the fitness of a group of coordinated specialisers is given by: where c n ∈ [0, 1] is the cost of coordination, which we assume is a non-decreasing function of the group size, n.
A parsimonious choice, that we use to illustrate all of our results, is to set where n(n − 1)/2 is the number of cell-to-cell interactions in a fully coordinated group, α ∈ [0, 1] controls the scale of the cost, and β > 0 determines how diminishing the cost is (as presented in Fig. 3 of the main text).Note, however, that our main result (Equation 5 in the main text, to be derived below) is independent of the particular functional form of the cost of coordination.

B.2 Random specialisation
With random specialisation (R), each individual in the group independently becomes a helper with a given probability q, and a reproductive otherwise.Hence, the number of helpers in the group, that we denote by K, is a binomial random variable with parameters n and q (i.e., K ∼ Binomial(n, q)).In the following it will also be convenient to write Q = K/n for the random variable giving the proportion of helpers in the group.Within this framework, the expected fitness of a group of random specialisers is simply given by

C Linear public goods
Our main model assumes that the fecundity function F is linear in p and given by where ∈ [0, 1] is a parameter measuring the "essentiality of cooperation".For = 0, cooperation by helpers is totally unessential, as the fecundity of reproductives is constant and equal to 1.For = 1, cooperation by helpers is totally essential and (S8) reduces to F = p, so that the fecundity of reproductives equals the proportion of helpers in the group.For 0 < < 1, the fecundity of reproductives is the sum of a constant term (i.e., 1 − ) and a term proportional to the proportion of helpers in the group (i.e., p).
Replacing (S8) into (S3) and simplifying we obtain To find the fitness of coordinated specialisers, we first assume that p is a continuous variable, and calculate the derivative This derivative is decreasing in p (i.e., G(p) is concave), and has a single root, p, given by Such a root lies in the interval (0, 1) if and only if 1/2 < ≤ 1. Otherwise the maximiser of G(p) (and hence the optimal allocation of helpers) is given by p = 0 (i.e., it is optimal to have no helpers).To avoid this trivial scenario without division of labour, henceforth we assume that 1/2 < ≤ 1 holds.Further, to make progress we approximate the optimal allocation of helpers, p * , by p.The actual optimal value p * will be a value near p but constrained by the permissible group compositions, since p * ∈ P (cf.section D.1).Note that the approximate optimal proportion (S10) is independent of n, and that it is an increasing function of such that p = 1/2 when = 1.An approximation to the fitness of fully coordinated specialisers can then be obtained by letting p * ≈ p in equation (S5), that is, by letting To find the fitness of random specialisers, we replace (S9) into (S7) and simplify to obtain where we have made use of the first two moments of the binomial distribution, 2 , of expression (S9), and of the fact that Var(Q) = q(1 − q)/n.
In order to determine the condition under which coordinated specialisation is favoured over random specialisation, we assume in a first step that random specialisers play the strategy q = p * , so that their fitness is given by where P * = K * /n and K * ∼ Binomial(n, p * ) (and hence Var(P * ) = p * (1−p * )/n).This assumption simplifies our calculations and leads to results that are qualitatively similar to those that arise from the more parsimonious assumption that random specialisers play the strategy that maximises their fitness (cf.section D.2). Comparing expressions (S5) and (S14), it follows that the condition w C (p * ) > w R (p * ) can be written as where the critical coordination cost, c * , is given by Equation (S16) makes it explicit the fact that the critical cost of coordination may be decomposed into a measure of the deviation from the optimal allocation of labour, Var (P * ), and a quantity that captures the relative cost of deviating from the optimal proportion of helpers, /G(p * ).Furthermore, since G(p * ) is (approximately) independent of n, we can say that the effect of increasing the group size acts primarily on the deviation of groups from the optimal proportion of helpers, Var (P * ).In contrast, a larger essentiality of cooperation both (i) pushes p * closer to 1/2 (which in turns increases the variance Var(P * ) = (2 − 1)/(4n 2 )) and (ii) increases the cost of deviation (larger /G(p * ) = 4 2 ) and thus acts via both factors.
To obtain a simple expression of c * in terms of our parameters n and , we approximate p * by p as given in (S10) to obtain which is increasing in the essentiality of cooperation and decreasing in group size n.Expression (S15) together with the approximation yields expression 5 in the main text.

D Alternative modelling assumptions D.1 Discrete proportions of helpers
In section C, we approximated the optimal proportion of helpers in coordinated groups as a continuous variable In reality this quantity is discrete as it is given by p * = k * /n and there can only be an integer number of helpers in the group (k * ∈ N , so that p * ∈ P).
Here, we show that this assumption is relatively innocuous.First, it is clear that p * tends to p as n grows large (Fig. S1A).Second, even for relatively small group size n the predictions of the model are relatively independent of this assumption.To show this, we repeat our analysis without making the continuous approximation.That is, we evaluate the conditions c n < c * (coordinated specialisation favoured) and c n > c * (random specialisation favoured), where c * is given by (S16) and where p * is not approximated as p but is calculated numerically for each combination of parameters considered, so that p * ∈ P. The results of this analysis are shown in Fig. S1B.As expected, we find that the parameter space boundary is more jagged but that the broad results are similar to those of the continuous treatment.

D.2 Optimal random specialisers
In section C, we assumed that the probability of becoming a helper, q, was approximately equal to the optimal proportion of helpers (i.e., we let q = p ≈ p * ) (Fig. S1A).A more realistic assumption is that the probability of being a helper in random specialisers is the one optimising fitness, so that random specialisers play strategy q = q * , with This alternative modelling assumption makes the fitness of random specialisers be larger (as w R (q * ) ≥ w R (p * ) necessarily holds) and thus will make, all else being equal, random specialisation more likely to be favoured over coordinated specialisation.

A) B) C)
Figure S1: Alternative modelling assumptions.A) In the model presented in the main text, we assumed that the probability that random specialisers adopt a helper role was equal to the optimal proportion of helpers in the group (p * ) as opposed to the probability that optimises fitness for random specialisers (q * ).In turn, we approximated this optimal proportion of helpers by a continuous variable p.Here we show that all three of these quantities, p * (open circles), p (dashed line), and q * (solid line) converge to the same value as group size (n) increases.Here, = 3/4, leading to p = (1 − 2 )/(2 ).B) We evaluate the conditions c n < c * (coordinated specialisation favoured) or c n > c * (random specialisation favoured), without approximating p * by p (as in the main text).Instead, we evaluate the optimal value p * numerically for each parameter combination considered.C) We evaluate the condition w C (p * ) > w R (q * ) (coordinated specialisation favoured) or w C (p * ) > w R (q * ) (random specialisation favoured), while allowing for the probability that random specialisers adopt a helper role to evolve to the value that maximises the fitness of random specialisers (q * ).In both scenarios B) and C), we find that smaller group sizes (smaller n) and more essential cooperation (higher ) favour coordinated specialisation, in agreement with the results presented in the main text.The cost of coordination is given by (S6) with α = 0.1 and β = 0.01.The red area occurs where division of labour is not favoured, that is when the optimal proportion of helpers is zero.
To find q * for the linear public goods model, we take the derivative of (S12) with respect to q: This derivative is decreasing in q (hence w R (q) is concave) and has a single root q * given by Such root lies in the interval (0, 1) if w R (0) = (2n − n − )/n > 0 holds, or equivalently, if > n/(2n − 1) holds, which we assume in the following.Expression (S20) is increasing in both n and .We also note that and hence (i) p always overestimates q * , but that (ii) the difference between the two values is inversely proportional to n and goes to zero as n grows large.Thus, p approximates q * relatively well for relatively large n, which justifies our use of p as an approximation in section C (see Fig. S1A).
We further note that the derivative (S19) can be understood as a selection gradient on q.As such selection gradient is decreasing in q, the point q * is not only a fitness maximum, but also the value to which evolution by small-step mutations would eventually approach.In other words, q * is a convergence stable strategy [1, 2].
We present numerical results in Fig. S1C, solving for the optimal mechanism by determining the conditions under which either w C (p * ) > w R (q * ) (coordinated specialisation favoured) or w C (p * ) > w R (q * ) (random specialisation favoured) holds.Overall, we find similar qualitative results hold as in the simplified model.

D.3 Non-linear benefits to cooperation
In the main model, we assumed that reproductive fecundity depends linearly on the proportion of helpers in the group.Here, we consider the possibility that there exists a non-linear dependence on the proportion of helpers by modelling reproductive fecundity with the following sequence: where λ > 0 is a parameter controlling the shape of the return from an increasing proportion of helpers.When λ < 1, there is a large initial return from the addition of the first helper, followed by a decelerating rate of return as the proportion of helpers increases (i.e., f k,n is concave; see Fig. S2A).When λ > 1 there is a negligible initial return from the addition of the first helper followed by an accelerating rate of return as the proportion of helpers increases (i.e., f k,n is convex; see Fig. S2B).In all cases (S22) is monotonically increasing from zero to one.We considered the parameter discretisation, n ∈ {2, 4, . . ., 38, 40} and ∈ {0.1, 0.1 + 1 × 0.9/39, . . ., 0.1 + 38 × 0.9/39, 1} and for each combination of parameters, we solved numerically for the optimal proportion of helpers, p * , that maximises G(p).For each combination of parameters, we determined the optimal mechanism (i.e., whether w C (p * ) > w R (p * ) or w C (p * ) < w R (p * ) holds), to ascertain whether accelerating (λ > 1), or decelerating (λ < 1) returns affect the predictions of our analysis.We find that the same broad results hold as in the linear public goods game (see Fig. S2C and S2D).We further find that decelerating benefits can lead to fully coordinated specialisation being favoured even for relatively low essentiality of cooperation (Fig. S2C) but that accelerating benefits leads to fully coordinated specialisation being somewhat more advantageous than decelerating benefits at higher group sizes (Fig. S2D).

E Alternative forms of cooperation
In the main model, we assumed that the fecundity of reproductives depends on the proportion of helpers in the group, k/n.Here, we consider the conceptual underpinnings of this assumption (section E.1) and analyse alternative scenarios where reproductive fecundity depends on the number of helpers in the group, k (section E.2), or on the ratio of helpers to reproductives, k/(n − k) (section E.3).

E.1 Reproductives benefit from the proportion of helpers
There are two key scenarios where reproductive fecundity can be modelled as depending on the proportion of helpers in the group, k/n.In both scenarios, we find that small group sizes (lower n) and more essential cooperation (higher ) favour coordinated specialisation.The red area occurs where division of labour is not favoured, that is when the optimal proportion of helpers is zero.
First, the benefits provided by helpers may constitute a good that is consumed by its beneficiaries (sometimes termed a congestible or rivalrous good), wherein the benefit experienced by one individual proportionally decreases the amount of good that may benefit its neighbours [3, 4].For instance, the secretion of an extracellular product that must be absorbed and digested in order to provide benefits (as happens in populations of B. subtilis and A. cylindrica) can be conceptualised as such a good [5, 6, 7].In particular, if the good is non-excludible (i.e., all individuals use the benefits) then the benefits conferred to each reproductive is the total amount of help (in the linear case, approximated as the number of helpers, k) divided by the number of individuals in the group (n; reproductives and helpers alike) [3].That helpers also partake in the consumption of the public good may be considered a type of soaking [8].
Second, the amount of good that each helper provides may depend on the number of individuals in the group such that, all else being equal, larger groups entail less of a benefit to the group.For instance, helpers in V. carteri beat their flagella to keep the colony afloat at the optimal height in the water column for photosynthesis [9, 10].In this case, the contribution of each helper (the degree to which it helps keep the colony at the right height) depends inversely on the size of the group as larger colonies are more difficult to keep afloat.

E.2 Reproductives benefit from the number of helpers
An alternative assumption is that the fecundity of reproductives depends on the number of helpers in the group, k.This would imply that the contribution of each helper to the total good does not depend on the size of the group and that the benefit conferred to one individual does not decrease the benefits available for another.The benefits provided by helpers in S. enterica infections is an example of such a good (a non-congestible or non-rivalrous good) [3, 4].In this case, helpers trigger a host immune response that wipes out competing microbial strains so that reproductives may then proliferate without competition [11, 12].Thus, the competitive advantage afforded by the host immune response is not used up or depleted by any of its beneficiaries, unless there are so few niches vacated that reproductives then compete with one another.
We would like to know whether we recover similar results as the ones we obtained assuming that reproductives benefit from the proportion of helpers in an alternative biological model in which the fecundity of reproductives depends on the number of helpers in the group, k.To this end we posit the following specific sequence for reproductive fecundity: Replacing this expression into (S3) and by a slight abuse of notation (as G now depends explicitly on the group size n) we obtain Treating p as a continuous variable, we calculate the first derivative of G with respect to p as This derivative is decreasing in p (i.e., G(p) is concave), and has a single root p given by Such root lies in the interval (0, 1) if > 1/(n + 1), which we assume henceforth.
Using arguments similar to the ones we used in section C, it follows that we can approximate the fitness of a coordinated group as while the fitness of a randomly specialising group randomising with probability q can be written as In both cases, we find the same qualitative results as in the main analysis.That is, smaller group sizes (smaller n) and more essential cooperation (higher ) favour coordinated specialisation.The red area occurs where division of labour is not favoured, that is when the optimal proportion of helpers is zero.
Assuming that random specialisers play q = p * , division of labour by coordinated specialisation is favoured over random specialisation if w C (p * ) > w R (p * ) holds.This condition can be written as c n < c * , where the critical cost c * is given by where the approximation follows from assuming p * ≈ p with p given by (S24). 180 The critical cost c * is increasing in , and decreasing in n for relatively large n.This can be easily seen by taking the first forward difference of c * (n), and realizing that this expression is negative for large n.

183
We present some numerical results in Fig. S3A, evaluating when c n < c * (i.e., coordinated specialisation is favoured) or c n > c * (i.e., random specialisation is favoured), where c * is given by (S25).As in the main model where reproductives benefit from the proportion of helpers, we find that smaller group sizes (small n) and more essential cooperation (higher ) favour coordinated specialisation.

E.3 Reproductives benefit from the ratio of helpers to reproductives
In a second alternative scenario, only reproductive individuals benefit from the collective good.In this case, 189 reproductive fecundity is determined by the number of helpers (approximate amount of collective good) divided by the number of reproductives (the number of beneficiaries of the good).This gives the following expression for the fecundity of a reproductive in a group of size n with k helpers: where Replacing this expression into (S1) and simplifying, we obtain the following formula for the fitness of a group of size n with k helpers: This sequence is unimodal (first increasing, then decreasing) in k if > 1/2 holds, which we assume henceforth.
In this case, k * = n − 1 maximises g k,n for all n, so that the optimal proportion of helpers is given by p * = (n − 1)/n.It follows that the fitness for coordinated specialisers is given by Contrastingly, the expected fitness for random specialisers is given by: At this point we deviate from our previous approximations and compare the fitness of coordinated specialisers and the fitness of random specialisers by evaluating the condition w C (p * ) > w R (q * ), where q * is the optimal probability of becoming a helper for random specialisers.We do this because this model leads to highly asymmetric group fitness g k,n , which select for random specialisers to strongly under-weight the probability of becoming a helper (q * p * ; see section F) and therefore the approximation q * ≈ p * is less accurate.
The derivative of w R (q) (S29) with respect to q is given by w R (q) = 2 − 1 − n q n−1 .For > 1/2, this expression has a single root in the interval (0, 1), given by which maximises w R (q).Evaluating (S29) at q = q * and simplifying, we obtain that coordinated specialisation is favoured over random specialisation (i.e., w C (p * ) > w R (q * ) holds) if c n < c * , where We present some numerical results in Fig. S3B, where we graphically show whether c n < c * (i.e., coordinated specialisation is favoured) or c n > c * (i.e., random specialisation favoured) holds, and where c * is calculated according to (S31).We find that that smaller group sizes (small n) and more essential cooperation (higher ) favour coordinated specialisation.
We also find that, when benefits depend on the number of helpers (Fig S3A) and group size is sufficiently small (small n), then less essential forms of cooperation (small ) may still favour fully coordinated specialisation.
In contrast, when the benefits depend on the ratio of helpers to reproductives (Fig S3B) and the essentiality of cooperation is high (high ), then larger group sizes (large n) may still favour fully coordinated specialisation.Furthermore, when benefits depend on the number of helpers (Fig S3A) there seems to be little interaction between and n.That is, for any greater than a certain value (less than 1/2), the threshold value of n seems to be the same.This is not the case when benefits depend on the ratio of helpers to reproductives (Fig S3B ), where larger implies that a larger group size (higher n) is required to favour random specialisation.

F Risk and insurance for random specialisers
Here, we show that more asymmetric fitness functions favour random specialisers to differentially weight their probability of adopting a helper role by making q * considerably different from p * .This represents a form of insurance lessening the probability of forming particularly "risky" (low fitness) group compositions (Fig. S4A) [13, 14].For such asymmetric fitness functions, the approximation that the optimal probability of adopting a helper role is equal to the optimal proportion of helpers is less accurate, and results using the approximation might be biased.
We use numerical simulations to generate random fitness functions without having to make many assumptions about their particular form.We generate 100,000 such fitness functions, some of which, by chance, will be more or less asymmetric than others.For a group size of n, we generate a fecundity sequence f k,n = (f 0,n , f 1,n , . . ., f n,n ) as follows.First, we sample from a uniform distribution (i.e., ∼ U (0, 1)) and set the fecundity of a reproductive in a group with no helpers to f 0,n = 1 − .Then, we draw n numbers h 1 , . . ., h n from an uniform distribution (i.e., h i ∼ U (0, 1), for i = 1, . . ., n), and set the fecundity of a group with k > 0 helpers to This creates a random fecundity sequence that monotonically increases from 1 − to 1. Finally, we obtain the fitness sequence, g k,n , by substituting S32 into S1 (Fig. S4B).
For each fitness sequence g k,n , we numerically determine the optimal proportion of helpers, p * (S4), and the optimal probability of adopting a helper role for random specialisers, q * (S18).We reject any fitness function for which division of labour is not favoured (i.e., for which either p * = 0 or q * = 0 holds).
We measure the asymmetry of the fitness function as follows.We begin by setting which respectively quantify the expected fitnesses of a random specialising group if it undershoots (k < k * ) or overshoots (k > k * ) the optimal proportion of helpers, under the assumption that the probability of becoming a helper is equal to the optimal proportion of helpers (q = p * ).We then use the ratio as a measure of the asymmetry in the fitness function, with γ > 1/2 indicating fitness functions where more is to be gained by undershooting p * and with γ < 1/2 indicating the reverse.Accordingly, an asymmetry ratio of γ = 1/2 indicates that the fitness function is symmetric.The results of this analysis are shown in Figs.S4C and S4D.We find that on average the more asymmetric a fitness function is, the more that random specialisers are favoured to underweight or overweight their probability of becoming a helper, that is, the larger the distance between q * and p * , q * − p * (Fig. S4C).In turn, the larger the optimal deviation in the probability of becoming a helper (the larger |q * − p * |), the more significantly random specialisers stand to gain fitness advantages by evolving to this optimised allocation of labour (q = q * ) (Fig. S4D).This numerical exercise illustrates that in some cases, groups of random specialisers are strongly favoured to underweight or overweight the individual probability of becoming a helper, even though this means that they are less likely to produce groups with the optimal allocation of labour.This occurs so as to minimise the risk of producing groups with a particularly poor allocation of labour and so may be considered a type of insurance strategy akin to the fertility insurance observed in the biased sex ratio of some blood parasites [13, 14].The model in the main text can produce asymmetric fitness functions.However, we find in section D that this phenomenon does not change the qualitative predictions of the model (Fig. S1C).That is, smaller group sizes (small n) and more essential cooperation (higher ) still favour fully coordinated specialisation.Absolute deviation in optimal probability, |p*-q*| More symmetric More asymmetric Optimal proportion of helpers, p* Optimal weighting for random specialisers, q* Figure S4: Risk and insurance for random specialisers.A) With asymmetric fitness functions, a group of random specialisers faces a different expected fitness cost from undershooting than from overshooting the optimal proportion of helpers, p * .This can favour random specialisers to differentially weight the optimal probability of becoming a helper, q * , as a form of insurance.Here we plot an arbitrary asymmetric fitness function, , in which cooperation is essential.B) We randomly simulated 100,000 fitness sequences g k,n for a fixed group size of n = 6, some of which are more asymmetric than others.The 99% confidence interval over our simulations for each permissible proportion of helpers is shown in light grey.C) Across the 100,000 simulations, we found that the more asymmetric a fitness sequence is (i.e., the larger the γ ratio) the more the optimal probability of adopting a helper role, q * , is favoured to deviate from the optimal proportion of helpers, p * .We plotted the best fit line through these data to illustrate the positive dependence (slope = 1.0, intercept = −0.50).D) Across the 100,000 simulations, we found that more significant departures in the optimal probability of adopting the helper role (larger |q * − p * |) led to a larger fitness increase for random groups that employ the optimal strategy (q = q * ).We plotted the best fit line through this data to illustrate the positive dependence (slope = 0.13, intercept = −0.003).This demonstrates selection for such an insurance strategy as a means to avoid "risky" group compositions.

G The optimal level of coordination
In the first analysis of the main text, we assumed that all individuals in a coordinated group interact with one another and so have complete information about the phenotypes of their social partners when specialising.Consequently, coordinated groups are fully coordinated and always contain the optimal proportion of helpers to reproductives, p * .Here we relax this assumption and solve for the optimal level of coordination in two separate analyses, based on different assumptions and using different methodologies.

G.1 Individual based model
Here we model explicitly how increasing the number of cell-to-cell interactions within the group can lead incrementally to a more precise allocation of labour.We consider a costly trait s ∈ [0, 1] (the level of coordination) that is equal to the independent probability that any two cells in the group interact with each other.When a focal individual specialises, it adopts a helper role depending on how the proportion of helpers amongst cells that it interacts with compares to a critical threshold, t ∈ [0, 1].Variables s and t are co-evolving traits in our simulations; they influence both the degree to which groups are coordinated and the degree to which labour is divided (i.e., the realised proportion of helpers, p).As the expected connectivity of the group increases (i.e., higher s, more one-to-one interactions), there is a higher cost of coordination paid by the group.However, this higher cost of coordination may be offset by the increased amount of information afforded to each individual when specialising and thus an increased chance that the group ends up with the optimal proportion of helpers, p * .This trade-off means that in some cases the optimal strength of coordination may in fact be intermediate (0 < s < 1), rather than the full coordination considered in the main analysis (s = 1).
In the following, we outline our individual-based model.First, we show how the simulations are initialised.Second, given a group with a particular level of coordination (s), we show how we determine which individuals interact with one another.Third, given a group with an interaction network and target proportion of helpers (t), we show how the allocation of labour within the group is determined.Fourth, given the allocation of labour, we show how the fitness of each group is determined.Finally, we describe how the individuals in the population compete globally from one generation to the next, and how mutation may affect the trait values of s and t.Results for the evolved level of coordination and target proportion of helpers are depicted in Fig. 4 in the main text and in Fig. S6.

G.1.1 Simulation initialisation
We create a population of approximately N T cells, sorted into groups of n cells.Specifically, we set the number of groups in the population as N g = N T /n , that is, the least integer larger than or equal to N T /n.Thus, the population size, N g n, is relatively constant while the number of groups increases as the group size decreases.This reduces the effect of demographic stochasticity across simulations.All individuals in the population are characterised by their trait values s and t.We assume that all groups are founded by a single asexual individual, and thus that groups are clonal and that all individuals in the same group have the same trait values.At the beginning of the simulation, we assume that all individuals in the population have no propensity for either division of labour or coordination (i.e., s = 0, t = 0).

G.1.2 Coordination network
Given a group with a particular level of coordination, s, we determine the between-cell coordination network by constructing a random graph G(n, s) where the n nodes correspond to the n cells in the group and where each possible edge (representing the interaction between a given pair of cells) occurs independently with probability s [15].Thus, for a given s, the expected number of interacting pairs is sn(n−1)/2.If s = 0, there are no interacting pairs and individuals do not know the phenotypes of any other group members.If s = 1, all individuals interact with all other individuals in the group and know whether they are helpers or reproductives.In between these extremes, individuals may only interact with a subset of the cells in the group (or indeed none at all) and know only partial information about the proportion of helpers in the group.

G.1.3 Individual specialisation
Before we can show how individuals within the group specialise in this framework, we need to establish a few definitions.For a given interaction network, G, let V = {v 1 , . . ., v n } be the set of nodes (individuals or cells) and E be the set of edges, such that e ij ∈ E if individuals v i and v j interact.We define an allocation of labour as the map a : The realised proportion of helpers to reproductives in the group is then given by p = n i=1 a(v i )/n.We denote by N G (v i ) and N G [v i ] the open and closed neighbourhoods of individual v i , respectively.The degree of v i , d i , is the number of cells in its open neighborhood, that is, Using these definitions we can calculate the open proportion of helpers in the neighbourhood of individual v i as p o i = vj ∈N (vi) a(v j )/d i , that is, the fraction of its neighbours that are helpers.Similarly, the closed proportion of helpers in the neighbourhood of , that is, the fraction of its closed neighbours that are helpers.If an individual has no neighbours (d i = 0), then the open proportion of helpers is not defined and the allocation of labour proceeds differently.
For a particular group (given a set of cells V and an interaction network E), we determine the allocation of labour, a, using a threshold model.First, we assume that the group begins with an initial allocation of all reproductives (a(v) = 0, ∀v ∈ V ).This implicitly presupposes that reproduction is the default phenotype of cells.Then for τ ≥ n time steps, we randomly sample with replacement one cell v i ∈ V at a time from the group.We sample with replacement and set τ ≥ n so that a cell that has "committed" to a particular phenotype may still switch if too many of its neighbours have adopted the same choice.This emphasises that the allocation of labour process is continuous in time and that developmental trajectories are plastic/responsive to their environment.Each chosen cell, v i , considers the open proportion of cells in its neighbourhood p o i and chooses to adopt a helper or reproductive phenotype depending on its target proportion of helpers, t.If p o i < t, then the cell adopts a helper phenotype (a(v i ) = 1) as there are fewer helpers in its neighbourhood than its target proportion of helpers.If p o i > t, then the cell adopts a reproductive phenotype (a(v i ) = 0) as there are more helpers in its neighbourhood than its target proportion of helpers.If the open proportion of helpers is equal to the target proportion of helpers (p o i = t), then the cell adopts either phenotype with equal probability.If the individual has no neighbours (d i = 0), then the individual behaves as a random specialist with helper probability equal to the target proportion of helpers (q = t).

G.1.4 Fitness calculation
Once labour is allocated within the group, we calculate the fitness of the group using the fitness equation of the linear public goods model (S11), with the cost of coordination given by equation (S6), where the cost of coordination depends directly on the expected number of cell-to-cell interactions (sn(n − 1)/2) rather than the realised number of cell-to-cell interactions because the former captures the effort that each individual puts into coordination instead of how successful its efforts actually were, and so is under evolutionary control.We consider alternate formulations for the cost of coordination in the subsequent subsection (G.2).

G.1.5 Creating a new generation
Each generation, we create the new population by sampling group founders from individuals of the previous generation.That is, we randomly pick individuals with a weighting equal to the relative fitness of each group.The process repeats N g times with replacement.Each sampled individual forms a group of n individuals with inherited trait values s and t.When the founder individual is sampled, we assume that there is a chance µ of having a mutation in one of the traits.If a mutation happens, the trait value is perturbed by adding a normallydistributed random number, δ ∼ Normal(0, σ 2 ), where σ 2 is the variance in the size of the mutation, truncated between zero and one.Over time, the trait values of s and t across the population evolve towards an evolutionarily stable state.

G.1.6 Simulation results
We perform simulations for each combination of group size n ∈ {2, 4, . . ., 38, 40} and essentiality of cooperation ∈ {1/2, 1/2 + 1/38, . . ., 1/2 + 18/38, 1}.For each simulation we determine the number of groups as N g = N T /n , where N T = 10, 000.All individuals are initialised with no coordination or no propensity for division of labour (i.e., s = t = 0).Next, we proceed in each generation by determining the interaction network of each group (given its strength of coordination, s), and then each group's allocation of allocation of labour (given its target proportion of helpers, t).We then calculate the fitness of each group and sample individuals for the next generation with a probability proportional to fitness.We assume that mutations occur with probability µ = 0.001 and that the expected variance in the mutation size is σ 2 = 0.1.We set α = 0.1 and β = 0.01 (parameters of the cost of coordination).
The results of the simulations are shown in Fig. 4D in the main text and in Fig. S5, where we examined 400 combinations of group size and essentiality and we ran the simulation 10 times for each parameter combination, and where each simulation lasted 30,000 generations.For each simulation, we denoted ŝ and t as the average trait values across the population for a given generation.We denote s and t as the across-simulation average of ŝ and Figure S5: Optimal target proportion of helpers.We present the target proportion of helpers, t * , that evolves in the same simulations used to produce Fig. 4D in the main text.We find that higher essentiality of cooperation (higher ) leads to a higher proportion of helpers but that group size (n) has little effect on the target proportion of helpers.
t for each generation (See Fig. 4B-C in main text).Finally, the evolutionary outcomes, s * and t * , are estimated as the average s and t in the last 3,000 generations of the simulations.We show the evolutionary outcome s * as the main results (Fig. 4D in the main text) but also plot 10 individual time series of ŝ and t and the acrosssimulation average s and t (Fig. 4B-C in the main text).To account for simulation stochasticity, we categorised random specialisation as any strategy for which s * < 0.1, fully coordinated specialisation as any strategy for several times during the course of the labour allocation.It may seem at first that this sort of phenotypic switching is not something that is observed in the natural world during the development of a social group.However, we do not envision here that individuals are actually differentiated until the end of the allocation of labour stage.Instead, while the allocation of labour is ongoing, the individuals are only committing to a developmental pathway and "signalling" to their partners which phenotype they "intend" to adopt.As the individuals in their neighbourhood do the same, it may be that an overabundance of one phenotype or the other is sufficiently strong to push the focal 375 individual into an alternate pathway.If we were to choose individuals in a random order without replacement, such that each individual only had one opportunity to adopt a helper role, this would introduce a systematic bias towards more helpers in the allocation of labour.

G.2 Alternate costs of coordination
Throughout the presentation of our results, we have used the particular function for the cost of coordination given by (S6), with α = 0.1 and β = 0.01.As it has been pointed out, the qualitative predictions (1-3) in the main text do not depend on the specific form that c n takes (so long as it is increasing in n).However, the optimal level of coordination that evolves might be sensitive to the form of the coordination cost (Fig. 4 in the main text).Here, we test the robustness of our results in Fig. 4 of the main text and section G.1 by repeating the simulations of section G.1 with alternate forms of coordination cost, c n , to determine how the optimal level of coordination is affected.
We consider four functional forms for the costs of coordination c n that are each increasing in the expected number of cell-to-cell interactions y = sn(n − 1)/2.Function I is a Type I functional response characterised by a linear increase in cost up to a maximum group size beyond wich the cost is equal to one, c n (y) = min(1, χy).Second, we consider three functional forms characterised by an increasing but decelerating increase in cost, possibly up to maximum group size beyond which the cost is equal to one.Function IIa is the exponential form c n (y) = α(1 − e βy ), which we have employed in the analysis thus far (Fig. S6A).We set function IIb as c n (y) = min(1, ψ(ξy) ψ ) (Fig. S6A).Finally, function IIc is a particular kind of type II response function: c n (y) = min(1, θy/(1 + φy)) (Fig. S6A).
For each functional form, we ran simulations to determine the optimal level of coordination, s * .The results of these simulations are shown in Fig. S6B-E.Throughout we find that smaller groups (small n) and high essentiality of cooperation (high ) favour coordinated specialisation over random specialisation, with intermediate forms of specialisation (0 < s * < 1) favoured to evolve at the interface between random specialisation (s * = 0) and fully coordinated specialisation (s * = 1).This is all in qualitative agreement with the findings of the main text.
We also find that the effect of the shape of coordination costs on the optimal mechanism depends on the functional form of the costs.When costs are set by function IIc, we found little effect on whether coordinated specialisation is favoured and the level of coordination that evolves (Fig. S6E), relative to linear costs.In constrast, when costs were determined by function IIa or IIb, we found both (i) a smaller likelihood of coordinated specialisation (s * > 0) evolving and (ii) a relatively smaller chance that coordination will be intermediate (0 < s * < 1), relative to full coordination (s * = 1) (Fig. S6C and D).
The principal difference between function IIc and functions IIa and IIb is on the up-front costs of coordination.
For both functions IIa and IIb, there is a large up-front cost to coordination before any coordination can evolve (Fig. S6A).In contrast, function IIc only has an initially linear cost of coordination (Fig. S6A).One consequence of a large up-front cost of coordination is that it is harder to evolve any coordination at all (s * > 0).Another consequence is that intermediate coordination (0 < s * < 1) may be less stable.Indeed, if coordination costs saturate quickly after a large up-front cost (as is particularly the case for function IIb, Fig. S6A), then it can be more advantageous for coordinating individuals (s > 0) to adopt full coordination (s * = 1), having already paid a bulk of the costs associated with the mechanism.This can make the evolution of intermediate coordination comparatively rarer (Fig. S6C and D).The degree to which there are large up-front costs to coordination or how quickly costs saturate in the natural world will depend on the biology of specific species and the particular way that they might coordinate.

G.3 Heuristic analytical model
In the main analysis (section C), we found that the fitness of a group of random specialisers with probability q = p * of producing helpers may be calculated as w R (p * ) = G(p * ) − Var(P * ) (equation S13).That is, the fitness of a group of random specialisers is equal to the maximum possible fitness, G(p * ), reduced by the hidden cost of random specialisation, which is proportional to the variance in the proportion of helpers across randomly specialising groups, Var(P * ) = p * (1−p * )/n.Here, we sketch a simple, heuristic extension of this model with an evolving level of coordination z ∈ [0, 1] that we assume has two opposing effects: (i) on the one hand, an increase in z increases the total cost of coordination, and (ii) an increase in z leads to a linear reduction in the hidden cost of random specialisation.To this end, we posit the fitness function where all groups have the optimal proportion of helpers, p * .We deliberately do not specify how increasing We estimate relative group size by considering two key factors: how many individuals occupy a colony or aggregate of the species being considered, and how the benefits of cooperation are conferred to social partners.
Fruiting bodies of D. discoideum and M. xanthus, populations of B. subtilis, V. carteri colonies, S. enterica coinfections and A. cylindrica filaments are potentially composed of hundreds to thousands of cells [5, 6, 7, 9, 10,     11, 12, 16, 17, 18, 19].However, in some cases the benefits of cooperation are diffusive and so the effective radius of cooperation might be much smaller.For instance, protein degrading proteases diffuse locally in B. subtilis populations and fixed nitrogen must diffuse along the filament in A. cylindrica [5, 6].Due to the geometry of the populations, this may lead to a medium sized effective social group in the former case (diffusion in multiple axes) and a small one in the latter (diffusion along one axis).In the other cases, the benefits of cooperation might be conferred directly to beneficiary cells, such as in S. enterica, V. Carteri and in the fruiting bodies, and so the effective size of the social group could be very large [7, 9, 10, 11, 12, 16, 17, 18, 19].We approximate the relative essentiality of a trait by estimating whether reproductive cells would survive and proliferate in the absence of helpers.In S. enterica co-infections, the action of suicidal helpers only provides a competitive advantage and so might not be 'essential' [11, 12, 16].In B. subtilis and A. cylindrica, the amount of degraded protein or fixed nitrogen in the environment is growth rate-limiting and so helpers might only provide an advantage to the replication rate of reproductive cells (non-essential cooperation) [5, 6].Fruiting bodies of M.
xanthus, and D. discoideum form when there are insufficient nutrients locally and so cells that do not partake in the complex cannot disperse to more favourable environments (more essential cooperation) [17, 18, 19].Finally, V. carteri colonies that cannot stay afloat at the proper height in the water column may not receive enough light for photosynthesis, leading to cell death (essential cooperation) [9, 10].
We estimate the relative cost of coordination by considering two key factors: how close are cells to each other (attached or not attached), and how complex are the cells (how metabolically taxing would coordination be).For example, populations of S. enterica and B. subtilis are free-floating and so a signalling system would require a dispersed product [5, 11, 12].Furthermore, they are both bacteria and so the relative metabolic costs could be quite high.In contrast, M. xanthus and D. discoideum form a multicellular slug before fruiting body formation and so coordination between cells would occur over a shorten distance [17, 18, 19].Further, D. discoideum is a eukaryote and so the metabolic costs of signalling may be reasonably low relative to the higher costs of operating as a complex cell.Some special cases would lead to even lower costs of coordination.In V. carteri, offspring colonies form within the mother germ cell and so parental influence on cell fate means that no intercellular signalling system is necessary [9, 10].In A. cylindrica, cells are attached to each other and the amount of fixed nitrogen in the environment serves as a cue for whether a cell should differentiate as a helper cell.Thus, the cost of coordination might only be due to the cost of detecting the amount of fixed nitrogen in the environment [6].

Figure 1 .
Figure 1.Different mechanisms for division of labour in nature.A) In honey bee hives (Apis mellifera), larvae develop as sterile workers unless they are fed large amounts of royal jelly by workers (coordinated specialisation) 1 .(Photo by Wausberg via the Wikimedia Commons.)B) In

Figure 2 .
Figure 2. Mechanisms to divide labour in social microorganisms.We seek to determine the relative advantages and disadvantages of the two key mechanisms for reproductive division of labour in social microorganisms.(A) Random specialisation occurs when cells randomly specialise into helpers or reproductives independently of one another.This can occur when a genetic feedback circuit is used to amplify small molecular fluctuations in the cytoplasm (phenotypic noise) 4,24-28 .While potentially cheaper as a mechanism, random specialisation can lead to groups that deviate significantly from the optimal proportion of helpers and reproductives.(B) Coordinated specialisation occurs when cells interact with one another, and so share (or gain) phenotypic information while they are differentiating.This can arise through the secretion and detection of extracellular molecules (signals or cues), via the intermediary of

Figure 3 .
Figure3.Random specialisation versus fully coordinated specialisation.We find that small group sizes (lower ݊), more essential cooperation (higher ߳), and low costs to coordination (lower ܿ ) favour division of labour by fully coordinated specialisation (dark) over division of labour by random specialisation (white).We modelled the cost of coordination as the increasing and decelerating function ܿ ൌ ߙ൫1 െ ݁ ିఉሺିଵሻ ଶ ⁄ ൯, where ݊ ሺ݊ െ 1ሻ 2 ⁄ is the number of cell-to-cell interactions in a coordinated group and 0 ൏ ߙ ൏ 1 and ߚ 0 are parameters affecting the scale and the shape of the cost of coordination, respectively.Here, ߚ ൌ 0.01, and ߙ ൌ 0.1 (solid boundary) or ߙ ൌ 0.025 (dashed boundary).

Figure 4 .
Figure 4.The optimal level of coordination.(A) If the level of coordination may evolve, then a spectrum of possible mechanisms could arise.At one extreme ‫ݏ(‬ ൌ 0), individuals may not coordinate at all (random specialisation).At the other extreme ‫ݏ(‬ ൌ 1), all individuals coordinate with one another and the variance in the realised proportion of helpers is minimised (fully Figure S1.Alternative modelling assumptions

Figure S2 :
FigureS2: Non-linear public goods.We consider an extension of the model allowing the fecundity of reproductives to depend non-linearly on the proportion of helpers in the group.A) If λ < 1, then the presence of helpers provides an initially large increase in the fecundity of reproductives but then only provides benefits at a decreasing rate as the proportion of helpers goes up (decelerating cooperation; here λ = 2/3).B) If λ > 1, then the presence of helpers provides no initial increase in the fecundity of reproductives but then provides benefits at an increasing rate as the proportion of helpers goes up (accelerating cooperation; here λ = 3/2).In both A) and B) we have set = 1/2 (non-essential cooperation) and = 1 (essential cooperation).C) and D) We numerically evaluate the conditions in which w C (p * ) > w R (p * ) (coordinated specialisation favoured) or w C (p * ) < w R (p * ) (random specialisation favoured) holds while assuming either that λ = 2/3 (decelerating cooperation; panel C). or λ = 3/2 (accelerating cooperation; panel D).The cost of coordination is given by (S6) with α = 0.1 and β = 0.01.In both scenarios, we find that small group sizes (lower n) and more essential cooperation (higher ) favour coordinated specialisation.The red area occurs where division of labour is not favoured, that is when the optimal proportion of helpers is zero.

Figure S3 :
Figure S3: Alternative biological scenarios.A) We consider a scenario where the fecundity of reproductives depends on the number of helpers in the group, evaluating when c n < c * (coordinated specialisation favoured) or c n > c * (random specialisation favoured) holds, and where c * is calculated according to equation (S25).B)We consider a scenario where the fecundity of reproductives depends on the ratio of helpers to reproductives, evaluating when c n < c * (coordinated specialisation favoured) or c n > c * (random specialisation favoured) holds, and where c * is calculated with equation (S31).The cost of coordination is given by equation (S6) with α = 0.1 and β = 0.01.In both cases, we find the same qualitative results as in the main analysis.That is, smaller group sizes (smaller n) and more essential cooperation (higher ) favour coordinated specialisation.The red area occurs where division of labour is not favoured, that is when the optimal proportion of helpers is zero.
w(z) = (1 − c n z) (G(p * ) − (1 − z) Var(P * )) , (S33)where c n is the proportional cost of increased coordination.Notice that z = 0 recovers the fitness function of the random specialisation mechanism, and z = 1 recovers the fitness function of the fully coordinated mechanism,

Figure S6 :
Figure S6: The shape of the cost of coordination.We show how the optimal level of coordination, s * , depends on the form of the coordination costs, c n (y) as a function of y = sn(n − 1)/2, the expected level of coordination.A) The four functional forms for the cost of coordination we explore.Function I is given by c n = min(1, χy) (χ = 0.0002); function IIa is given by c n = α(1 − e βy ) (α = 0.1 and β = 0.01); function IIb is given by c n = min(1, ψ(ξy) ψ ) (ξ = 0.0002 and ψ = 0.1); function IIc is given by c n = min(1, θy/(1 + φy)) (θ = 0.0002 and φ = 0.001).B) The evolved level of coordination, s * , when the cost of coordination is given by function I. C) The evolved level of coordination, s * , when the cost of coordination follows function IIa.D) The evolved level of coordination, s * , when the cost of coordination follows function IIb.E) The evolved level of coordination, s * , when the cost of coordination follows function IIIc.Across all functional forms, we find that smaller groups (lower n) and more essential cooperation (higher ) favours coordinated specialisation.The degree to which intermediate forms of coordination (0 < s * < 1) can evolve depends on the shape of the coordination cost.