# Motifs in undirected networks

I have an undirected network and I am currently analyzing both 3-node and 4-node motifs in it. However, I can't find any articles that describe the biological significance of the 3- or 4-node motifs in my network. Comparing with randomly generated networks, I can deduce that the numbers of 3-node and 4-node motifs are significantly higher. However, I am unable to infer any biological significance from this finding.

I am not 100% sure that I understand the question, but I am going to try to answer, based on the following assumptions:

1. The "number of 3 and 4 node motifs" is not very clear. If I understand correctly, it should be a quantity determined in large part by the degree distribution. You could rewire your network to lose all information about the TRN other than the degree distribution, and this number would probably be very similar (or it is quite a weird network, which would make me suspicious of the data).
2. Therefore, the number of such motifs of size $k$ is not particularly interesting biologically, specifically due to technical issues such as incomplete ascertainment of edges. The true network may have many more edges, and this will change this number, so the degree distribution is not itself interesting; it is in fact the thing you want to control for.
3. Therefore, when you compare with randomly generated networks and find different numbers of motifs of $k$ nodes, I suspect the random networks were generated with a different degree distribution. (It is pretty easy to generate networks with the same degree distribution, using e.g. the `rewire()` function in igraph.)
4. Therefore, I interpret the question to be more specifically: "Among all 3 and 4 node motifs, a subset are overrepresented in my TRN compared to randomized graphs of the same degree distribution. What are some ways to interpret these motifs?" This is the kind of question that is traditionally asked with TRNs in my experience.
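To make points 3 and 4 concrete, here is a minimal Python sketch of a degree-preserving null model. It is not the igraph `rewire()` implementation; it is a plain-Python stand-in that uses double edge swaps, and the function names (`count_triangles`, `rewire_keep_degrees`, `triangle_z_score`) are made up for illustration. It assumes a simple undirected graph given as a list of node pairs.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def count_triangles(edges):
    """Count triangles in a simple undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def degrees(edges):
    """Degree of every node, used to check that rewiring kept degrees fixed."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def rewire_keep_degrees(edges, n_swaps, rng):
    """Double edge swaps (a-b, c-d) -> (a-d, c-b); node degrees never change."""
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # pick one of the two possible pairings
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list


def triangle_z_score(edges, n_random=200, seed=0):
    """Compare the observed triangle count with degree-preserving randomizations."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    null = [count_triangles(rewire_keep_degrees(edges, 10 * len(edges), rng))
            for _ in range(n_random)]
    sd = pstdev(null)
    z = (observed - mean(null)) / sd if sd > 0 else float("nan")
    return observed, mean(null), z
```

The same pattern applies to any specific 3- or 4-node motif: swap `count_triangles` for a counter of that motif and keep the null model unchanged.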

As an example of how other groups have analyzed TRN motifs, I suggest looking at Figures 5 and 6 of this paper. For instance, the "feed-forward" motif is overrepresented for links involving some TFs. I believe that it is standard to compare motif distributions to other biological networks, such as the C. elegans neural network. There are references in the Cell paper that could probably help you out further. The Wikipedia article on network motifs also looks to have some information, and there are other resources online if you search.

While there is not necessarily a strict biological interpretation of such motifs, they nonetheless can be informative in identifying specific patterns peculiar to different TFs or master regulators.

Caveat: It is possible that I have misunderstood the question or made bad assumptions; my graph theory is superficial. But if I am wrong and you are actually interested in the simple number of motifs of size $k$, that observation is of interest as an "I have a weird graph" theoretical problem, not as a "this is biologically interesting" problem. As a biologist I would care much more about which specific motifs of size $k$ you have, rather than the summed count of all motifs of size $k$.

On the other hand, forgetting about motifs for a moment, a "weird" network could be quite topologically interesting biologically. For instance, do your different clustered components associate with different pieces of biology, like sugar metabolism vs. morphogenesis? That would be expected, but it has very little to do with motifs- they may just be a side effect of that functional topology. In that case, it would be not only degree distribution but also that topology that you would have to control for to make interesting statements about motifs.

Good luck!

## Biological network motif detection and evaluation

Molecular-level biological data can be assembled into system-level data in the form of biological networks. Network motifs are defined as over-represented small connected subgraphs in networks, and they have been used in many biological applications. Since network motif discovery involves computationally challenging processes, previous algorithms have focused on computational efficiency. However, we believe that the biological quality of network motifs is also very important.

### Results

We define biological network motifs as biologically significant subgraphs; traditional network motifs are distinguished as structural network motifs in this paper. We develop five algorithms, namely EdgeGO-BNM, EdgeBetweenness-BNM, NMF-BNM, NMFGO-BNM and Voltage-BNM, for efficient detection of biological network motifs, and introduce several evaluation measures, including motifs included in complex, motifs included in functional module, and GO term clustering score. Experimental results show that EdgeGO-BNM and EdgeBetweenness-BNM perform better than existing algorithms, and all of our algorithms are applicable to finding structural network motifs as well.

### Conclusion

We provide new approaches to finding network motifs in biological networks. Our algorithms efficiently detect biological network motifs and further improve on existing algorithms by finding high-quality structural network motifs that existing algorithms could not find. The performances of the algorithms are compared based on our new evaluation measures in biological contexts. We believe that our work provides some guidelines for network motif research on biological networks.

## Introduction

Complex relational systems from different domains, such as biology, sociology or economics, can be systematically analyzed using their network representations. A network (also known as a graph) is composed of nodes and edges, where nodes represent the entities in the system and edges represent the relationships between these entities. Depending on the type of represented relations, the node pairs that form the edges can have a certain ordering, in which case the resulting network is called directed. For example, in networks of biological neurons and synapses (also known as neuronal connectomes), the nodes correspond to individual neurons, while directed edges between the nodes (typically) represent the existence of chemical synapses that enable communication between neurons. The wiring patterns of networks cast light on the functional mechanisms of the analyzed complex systems, and therefore network structure analysis is gaining increasing interest from different disciplines.

However, many network analysis problems are computationally intractable. Therefore, the only available solutions are based on approximations to the exact solutions of these problems. Network properties that describe different wiring characteristics of networks are used for this purpose. For example, given two networks without any labeling on the nodes, the problem of finding all the node pairs that have identical wiring patterns in the two networks is computationally intractable. However, this problem can be simplified by computing the degrees (i.e., the number of neighbors a node has) of all nodes and using the degree statistics to compare the nodes. Even if the resulting matches are not guaranteed to have identical wiring patterns, these matches would extensively reduce the size of the search space. The search space can be reduced even further by computing other network properties that capture different types of interaction patterns, e.g., using the similarities of clustering coefficients that measure the tendency of nodes to form triangular interactions.

Different subgraphs of a network can be obtained from different subsets of its nodes and edges. Many network properties depend on the subgraph properties of the networks; for example, the clustering coefficient is defined based on three-node subgraphs in which all nodes are connected with each other, forming a triangle. In a connected subgraph, all nodes are reachable from any of the other nodes in the subgraph. A subgraph is induced (also known as node induced) if it is required that all the edges between the chosen subset of nodes are included in the subgraph. Subgraphs that do not carry the induced property are called partial (also known as edge induced) subgraphs. For example, a 3-node clique contains 3 different two-path subgraphs (two-path subgraphs are those that contain 3 nodes and 2 edges) when partial subgraph properties are considered. However, such a graph does not contain any two-path subgraphs when induced subgraph properties are considered.
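The 3-node clique example above can be checked with a short Python sketch. It is an illustration only, assuming a simple undirected graph; the function name `two_path_counts` is hypothetical and brute-forces every node triple, so it is suitable only for small graphs.

```python
from itertools import combinations


def two_path_counts(nodes, edges):
    """Return (partial, induced) counts of two-path subgraphs (3 nodes, 2 edges)."""
    edge_set = {frozenset(e) for e in edges}
    partial = induced = 0
    for trio in combinations(nodes, 3):
        present = sum(frozenset(pair) in edge_set for pair in combinations(trio, 2))
        if present == 3:
            # A triangle contains three edge-induced two-paths (drop any one edge),
            # but no node-induced two-path, because all three edges must be kept.
            partial += 3
        elif present == 2:
            # An open two-path counts once under both definitions.
            partial += 1
            induced += 1
    return partial, induced


# A 3-node clique: three partial two-paths, zero induced two-paths.
print(two_path_counts([0, 1, 2], [(0, 1), (1, 2), (0, 2)]))
```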

Triangular patterns in networks are commonly used to analyze network topology. In undirected networks, the clustering coefficient of a node is calculated by dividing the number of triangles around the node by the number of distinct pairs of its neighbors. The average clustering coefficient summarizes the clustering (triangulation) within a network by averaging the clustering coefficients of all its nodes. Extending the clustering coefficient to directed networks is not trivial, since there are two different types of triangular directed subgraphs: one is a cyclic subgraph (m = 5 in Figs. 1 and 2) and the other is an acyclic subgraph (m = 9 in Figs. 1 and 2). Based on the counts of the four distinct node roles on these two subgraphs (listed in Figs. 1 and 2; the last is role 18), the definition of the clustering coefficient has been extended to the directed case. A different metric for quantifying network clustering, known as transitivity, is calculated by considering every possible combination of three nodes in a network and counting how many of these triads are mutually connected by three edges, normalized by the number of triads with at least two edges. It is similar to the clustering coefficient, but unlike that metric it is not an average of local node-specific clustering. Transitivity is typically used for undirected networks rather than directed ones, but an expression for directed transitivity has also been given.

Figure 1. A directed network is assumed. The numerical label for each motif (denoted m) is taken from prior work. Each distinct motif-role within each motif is denoted by a different colour and by the numerical label next to each node. The same motif-role labels are used in the text and in Fig. 2.

Figure 2. The first column depicts the 9 distinct roles on functional motifs. Each row shows each three-node motif in which the corresponding role appears, and the plurality with which the motif-role appears within the motif (see Methods). Black filled circles indicate the nodes in the motif that play the motif-role (see also Fig. 1). The equations shown for each role, r, are the entries of the functional motif-role fingerprint matrix; they are written in terms of the Hadamard product, a unit column matrix, the identity matrix, and the matrix of reciprocal edges.
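For the undirected case, the clustering coefficient and transitivity described above can be sketched in a few lines of Python. This is a minimal illustration, not the directed extension discussed in the text. The helper names are hypothetical, nodes with fewer than two neighbors are given a clustering coefficient of 0 by convention, and transitivity uses the common formulation 3 × triangles / connected triples, which counts each triangle once per centre node.

```python
from itertools import combinations


def build_adjacency(edges):
    """Adjacency sets of a simple undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Triangles around `node` divided by the number of neighbor pairs."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention for nodes without any neighbor pair
    closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def transitivity(adj):
    """Closed connected triples divided by all connected triples."""
    closed = sum(1 for n in adj
                 for a, b in combinations(adj[n], 2) if b in adj[a])
    triples = sum(len(adj[n]) * (len(adj[n]) - 1) // 2 for n in adj)
    return closed / triples if triples else 0.0


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
print(average_clustering(adj), transitivity(adj))
```

On this small graph the two metrics differ (7/12 versus 3/5), which illustrates that transitivity is not an average of node-level values.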

Recent work on network properties uses the statistics of all observable connected subgraph configurations as detailed descriptors of the wiring in networks. Network motifs were originally defined as the partial subgraph patterns of a network that appear more frequently than expected from a 'null-hypothesis' network model that preserves the input network's degree distribution or other statistical properties. Network motifs are defined for both directed and undirected networks, covering all observable subgraph patterns on sets of nodes ranging in size from 2 to n. Network motifs have been used to analyze the structures of a wide range of networks, such as the neuronal connectome of C. elegans. In practice, network motif analyses are performed with 3-node subgraph patterns because of the high computational cost of the null model generation step for larger subgraphs; all directed 3-node subgraph patterns are illustrated in Fig. 1.

Another group of network properties based on subgraph counts has been studied in the context of graphlets: small, connected, non-isomorphic, induced subgraphs of a large network. There are three major differences between network motifs and graphlets:

1. Network motifs account for partial subgraphs, while graphlets are based on induced subgraphs.
2. Network motifs depend on a given null network model, while graphlets are completely independent of any null hypothesis.
3. Graphlets are defined only for undirected graphs, while network motifs are also defined for directed graphs.

The number of times that each graphlet appears in a network describes the network's topology. Currently, the most advanced method for describing the topology of an undirected network is based on the dependencies between different graphlets.

Subgraph properties are not only useful for describing the topology of networks; they can also be used to describe the local wiring around nodes. For instance, degree describes the wiring around a node by counting the number of edges touching the node. Replacing edges with subgraphs of each kind in this definition, the local wiring around a node can be described by the number of subgraph patterns that the node participates in. While these subgraph statistics on nodes can be computed without imposing any orientations on the subgraphs, a node's role in the network can be characterized more accurately by introducing such orientation constraints based on the symmetries within the subgraphs. For example, as illustrated in Fig. 1, there are 30 unique motif-roles on the 3-node directed subgraph configurations. Przulj identifies the orbits (i.e., the nodes that have identical wiring patterns within graphlets) of all 2- to 5-node graphlets and uses these orbits to describe the wiring around a node by defining graphlet degree, which is the number of graphlets that touch a node at an orbit. Furthermore, the vector containing the graphlet degrees of all 73 orbits of 2- to 5-node graphlets is termed the graphlet degree vector and has been successfully applied to identify wiring similarities between the nodes of a network, and also between the nodes of different networks. It has been argued that analysis of neuronal connectome data will need to take into account node-referenced heterogeneity, such as that measured by graphlet degree. Another possible application is in the analysis of genetic networks.

The terminology on subgraph properties is not well-defined, with some studies using the terms “subgraphs”, “network motifs” and “graphlets” interchangeably. To avoid confusion, we use the term “functional motifs” for the partial subgraph properties (e.g., network motif properties) and “structural motifs” for the induced subgraph properties (e.g., graphlet properties), consistent with earlier usage. Structural motifs quantify anatomical building blocks, whereas functional motifs represent elementary processing modes of the networks. This distinction between structural and functional subgraph properties has different implications for neuronal networks: structural motifs describe all synapses amongst a specific subset of neurons. In contrast, functional motifs can describe, for example, potential patterns of actual synaptic activations occurring (near) simultaneously amongst a specific subset of neurons. Some correlation between structural and functional subgraph properties is to be expected. Even so, the wiring characteristics that can be captured by these two types of subgraphs differ. For example, a node's importance in the network as a ‘broker’ (see Fig. 2) can only be captured by structural motifs, since functional motifs also count the cases in which the node appears in other roles, such as role 19 (Fig. 2). In these cases, the reference node is not a broker because of the edge between the two other nodes.

For both structural and functional motifs, we consider four different types of subgraph frequency derived network properties, as follows:

Global Metrics: These metrics aim to describe the topology of an entire network.

Motif Counts: A network's topology can be described by the number of subgraphs that appear in the network. We use the term motif counts to represent these network statistics. Unlike the original definition of network motifs (but consistent with later usage), our motif statistics are independent of any comparison to a null-hypothesis network model. For a given network, the corresponding motif counts form an M-dimensional vector, each value representing the count for one of the M subgraphs.

Motif-Role Counts: A network's topology can also be described in terms of the roles within subgraphs. We use the term motif-role counts to represent the number of times that a given motif role appears in a network. Motif-role counts can be directly obtained by scaling the motif counts depending on the number of times the motif-role appears within the corresponding subgraph. For a given network, the corresponding motif-role counts form an L dimensional vector, each value representing the number of times one of the L node roles appears in the network.

Node-referenced Metrics: These metrics aim to describe the local topology around a node in the network.

Motif Fingerprints: The wiring around a node in a network can be described by the number of subgraph patterns that it participates in, independent of the position (i.e., the role) on these subgraphs. Such statistics have been termed motif fingerprints. For each of the N nodes in a given network, the corresponding motif fingerprints are M-dimensional vectors, each value corresponding to the count of one of the M subgraphs that the node participates in.

Motif-Role Fingerprints: The wiring around a node in the network can be described in finer detail by the number of subgraphs that touch the node at a specific orientation (i.e., on a node-role within the subgraph). We term such statistics motif-role fingerprints. For each of the N nodes in a given network, the corresponding motif-role fingerprints are L-dimensional vectors, each value corresponding to the number of subgraphs that touch a node at one of the L node-role positions.
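The four kinds of statistics listed above can be illustrated on a deliberately simplified case. The following Python sketch assumes an undirected graph and induced (structural) 3-node subgraphs only, so there are M = 2 motifs (two-path, triangle) and L = 3 roles (path end, path centre, triangle corner), rather than the directed motifs and roles of the text. All function and key names are hypothetical.

```python
from itertools import combinations


def motif_role_fingerprints(edges):
    """Per-node counts of the three roles on induced 3-node subgraphs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    prints = {}
    for v, nbrs in adj.items():
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Neighbor pairs that are not connected put v at the centre of a two-path.
        centre = len(nbrs) * (len(nbrs) - 1) // 2 - tri
        # Walks v-u-w that do not close into a triangle put v at the end of a two-path;
        # each triangle at v is reached by two such walks, hence the 2 * tri.
        end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * tri
        prints[v] = {"path_end": end, "path_centre": centre, "triangle": tri}
    return prints


def motif_fingerprints(prints):
    """Per-node motif counts: sum the roles that belong to the same motif."""
    return {v: {"two_path": r["path_end"] + r["path_centre"],
                "triangle": r["triangle"]} for v, r in prints.items()}


def motif_role_counts(prints):
    """Global role counts: sum each role over all nodes."""
    totals = {"path_end": 0, "path_centre": 0, "triangle": 0}
    for roles in prints.values():
        for name, value in roles.items():
            totals[name] += value
    return totals


def motif_counts(role_counts):
    """Global motif counts: divide by how often the role occurs in its motif."""
    return {"two_path": role_counts["path_centre"],      # one centre per two-path
            "triangle": role_counts["triangle"] // 3}    # three corners per triangle


prints = motif_role_fingerprints([(0, 1), (1, 2), (0, 2), (2, 3)])
print(motif_counts(motif_role_counts(prints)))
```

Every other statistic here is computed from the motif-role fingerprints alone, which mirrors the claim that they are the most informative of the four.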

In this study, we explore the relationships between all these different types of subgraph statistics (see Fig. 3). First, we present efficient ways of calculating the functional motif-role fingerprints of a given directed network. Second, we show that structural motif statistics can be derived from functional motif statistics and vice versa. This transformation enables efficient computation of structural motif fingerprints, which are computationally more expensive to obtain. Third, we show that the motif-role fingerprints are the most fundamental and informative of all the subgraph metrics. We identify the transformations that derive all other subgraph statistics (i.e., motif fingerprints, motif-role counts, motif counts) from the motif-role fingerprints. Fourth, we discuss the relationships between motif-role fingerprints and directed clustering coefficients and transitivities, and show how these can be derived from motif-role fingerprints. Finally, we illustrate applications of these transformations on the neuronal connectome of C. elegans.

Figure 3. Arrows indicate that metrics can be derived from other metrics, and numbers in brackets refer to equations in the text that mathematically describe these dependencies. The left side of the figure lists metrics that count subgraphs, while the right side shows metrics that are ratios of subgraph counts. The top half of the figure shows metrics that are node-referenced subgraph counts, while the bottom half shows metrics that are global subgraph counts.

### Visualize a multi-level network

The following network is an example network from an empirical analysis of wetlands management in Switzerland. It consists of two levels - one level specifies a network of relations between actors. A second level specifies a network of relations between different activities occurring in the wetland, based on causal interdependence among activities. Links between the levels specify which actors carry out which activities.

It is possible to specify layouts for every network level separately. Below, one level is plotted based on a circle layout, the second one based on Kamada-Kawai. motifr provides a reliable starting point for multi-level network visualization but is focused on motif analysis at its core. For advanced visualization of multi-level networks we recommend pairing ggraph and graphlayouts. This blog post provides an excellent introduction.

### Selecting motifs

See the vignette on the motif zoo (`vignette("motif_zoo")`) for details on nomenclature for motifs (motif identifier strings). We highly recommend the use of two helper functions implemented in motifr to ensure that the software interprets the motif identifier provided as intended by the analyst.

1. Use `explore_motifs()` to launch a shiny app where all motifs implemented for analysis with motifr can be displayed. You can pass your own network to `explore_motifs()` to see what motifs mean exactly for your data. For example, if your network is stored in an object named `my_net` with a level attribute `lvl`, you can explore motifs within it interactively using `explore_motifs(net = my_net, lvl_attr = "lvl")`. Be aware that if your network does not contain a specific motif, it cannot be displayed.
2. Check a specific motif of interest using `show_motif()`, which will either illustrate the motif in a dummy example network or, if you pass a network object to the function, in your network. `show_motif()` is specifically helpful to explore the impact of position matching (see `vignette("motif_zoo")` for more details).

### Count motifs

Motifs can be counted using the versatile function `count_motifs()`. It takes as parameters a statnet network or igraph graph object (use `ml_net` or `dummy_net` provided by this package as examples) and a list of motif identifiers (see below) specifying the motifs.

Let’s quickly check out two classic examples of three-node, two-level motifs (open and closed triangles) in the wetlands management network introduced above. Let’s count the number of these motifs in the entire network.

An exploratory approach can be taken by calling `motif_summary()`. This function counts the occurrences of a couple of basic motifs. Furthermore, it computes expectations and variances for the occurrence of these motifs in a modified Erdős-Rényi or so-called “Actor’s choice” model. See the package `vignette("random_baselines")` for details.

### Identify gaps and critical edges

motifr makes it possible to identify gaps and critical edges in multi-level networks. This is motivated by theories of functional fit and misfit in networks, which posit that certain motifs are especially valuable for network outcomes (depending on the context).

In relation to gaps, we can therefore try to identify potential edges that would create a large number of a given motif if they were to exist (“activated” or “flipped”). The number of such motifs created by an edge is its contribution. For example, we can get all edges that would create closed triangles (`"1,2[II.C]"`), including the information about how many such triangles they would create for the wetlands case study network:
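The idea of a gap and its contribution can be sketched in Python on a plain single-level graph. This is an assumption-laden simplification: it does not call motifr, it ignores network levels (the `"1,2[II.C]"` motif spans two levels), and the function name `identify_triangle_gaps` is hypothetical. In this simplified setting, the number of closed triangles that a missing edge would create equals the number of common neighbors of its endpoints.

```python
from itertools import combinations


def identify_triangle_gaps(nodes, edges, min_contribution=1):
    """List absent edges (u, v, contribution) that would close triangles if added."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    gaps = []
    for u, v in combinations(sorted(nodes), 2):
        if v in adj[u]:
            continue  # the edge already exists, so it is not a gap
        contribution = len(adj[u] & adj[v])  # one new triangle per common neighbor
        if contribution >= min_contribution:
            gaps.append((u, v, contribution))
    # Largest contribution first, ties broken by node labels.
    gaps.sort(key=lambda g: (-g[2], g[0], g[1]))
    return gaps


# A 4-cycle 0-1-3-2-0 with a pendant node 4 attached to node 3.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
for gap in identify_triangle_gaps(range(5), edges):
    print(gap)
```

Raising `min_contribution` plays the role of a weight threshold: only gaps that would create at least that many triangles are reported.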

We can also plot these gaps in various ways in our network, including the option to only look at gaps above a certain weight (contribution) and different levels of focus to only show nodes involved in such gaps. Here again for the wetlands management network, only showing gaps with a weight above 5 and subsetting the level where we analyze gaps to only contain nodes involved in gaps.

`identify_gaps` has a sibling in `critical_dyads`. `critical_dyads` works in reverse to identifying gaps: it analyses, for every existing edge, how many instances of a given motif would disappear if the edge were removed. Below is an example showing critical dyads in a plot of the full wetlands management example network.

### Comparing motif occurrence to a baseline model

motifr can be used to simulate a baseline of networks to compare against. Motif counts in an empirical network can then be compared to the distribution of motif counts in the networks simulated from the baseline model. Four different ways of specifying models for baseline distributions are implemented in motifr, from a basic Erdős–Rényi model to the possibility of supplying an exponential random graph model (ERGM) fit to draw simulations from. See the `vignette("random_baselines")` for details.

As an illustration, we simulate networks from an “Actor’s choice” baseline model here as a baseline to compare counts of open and closed triangles in the wetland management network against. This model keeps all ties fixed except ties on a specific level. On this level (here set by setting level to 1, which is the actor level in this network), ties are allowed to vary based on a fixed probability (Erdős-Rényi) model.

We find that open triangles occur much less frequently and closed triangles much more often than in the baseline model.

This is an unsurprising result; anything else would have been concerning. It indicates that actors tend to close triangles across levels to other actors working on the same wetland management tasks much more often than would be expected if they just chose random collaboration partners. We would expect such “fit to task” in a network of professional organizations working in wetland management. We highlight this interpretation because we want to stress that baseline models need to be judged very carefully for what they represent substantively. This is why motifr allows for a variety of baseline model configurations (including fitted ergm objects).
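The logic of a baseline comparison can be sketched in Python for a plain single-level graph. This is not the motifr “Actor’s choice” model, which keeps ties on other levels fixed; it is an ordinary Erdős-Rényi baseline with the tie probability set to the observed density, and all function names are hypothetical. For this simple model the expected counts are also known in closed form, which gives a check on the simulation.

```python
import random
from itertools import combinations
from math import comb


def count_open_closed(n, edges):
    """Count induced open two-paths and closed triangles among nodes 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    n_open = n_closed = 0
    for trio in combinations(range(n), 3):
        k = sum(frozenset(pair) in edge_set for pair in combinations(trio, 2))
        if k == 2:
            n_open += 1
        elif k == 3:
            n_closed += 1
    return n_open, n_closed


def expected_counts(n, p):
    """Analytic expectations under Erdős-Rényi with tie probability p."""
    triples = comb(n, 3)
    return 3 * triples * p ** 2 * (1 - p), triples * p ** 3


def simulate_er(n, p, rng):
    """Draw one Erdős-Rényi graph: each possible tie exists with probability p."""
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]


def baseline_comparison(n, edges, n_sim=500, seed=0):
    """Observed counts, simulated baseline counts, and analytic expectations."""
    rng = random.Random(seed)
    p = len(edges) / comb(n, 2)  # match the observed density
    sims = [count_open_closed(n, simulate_er(n, p, rng)) for _ in range(n_sim)]
    return {
        "observed": count_open_closed(n, edges),
        "simulated": sims,
        "expected": expected_counts(n, p),
    }
```

An observed count can then be located within the `"simulated"` distribution, for example by reporting the share of simulated networks with at least as many closed triangles.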

## Identification of important nodes in directed biological networks: a network motif approach

Identification of important nodes in complex networks has attracted increasing attention over the last decade. Various measures have been proposed to characterize the importance of nodes in complex networks, such as degree, betweenness and PageRank. Different measures consider different aspects of complex networks. Although numerous results have been reported on undirected complex networks, few have been reported on directed biological networks. Based on network motifs and principal component analysis (PCA), this paper introduces a new measure to characterize node importance in directed biological networks. Investigations on five real-world biological networks indicate that the proposed method can robustly identify important nodes in different networks, such as command interneurons, global regulators, and non-hub but evolutionarily conserved important nodes. Receiver Operating Characteristic (ROC) curves for the five networks indicate remarkable prediction accuracy of the proposed measure. The proposed index provides an alternative complex network metric. Potential implications of the related investigations include identifying network control and regulation targets, biological network modeling and analysis, as well as networked medicine.

### Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

### Figures

Figure 1. A real-world biological network and some network motifs. (a) A Drosophila developmental transcriptional…

Figure 2. An illustrative example. (a) A simple network with six nodes. (b) Subgraphs that…

Figure 3. Cluster analysis for the identified top-30 nodes in the five networks based on…

Figure 4. ROC curves based on the available information in the CEN and ECT.

Figure 5. Evaluation of I score via ROC curves with composite reference standards for the…

Figure 6. Topological neighborhoods of several nodes. (a) Topological neighborhood of a hub but not…

Figure 7. The curves of connectivity density against for different ranking measures in the ECT…

## Methods

We begin by describing a method that computes the count of F1 and F2 for a given motif in a static network. Next, we describe possible network operations that change the topology of the networks, and discuss how to dynamically update the count of F1 and F2 for each of these operations.

### Motif counting in static networks

Assume that we are given a motif topology denoted by $P$. Given a graph $G$, we want to compute the F1 and F2 counts of pattern $P$ in $G$. Let $S$ denote the set of all embeddings of $P$ in $G$. We denote the cardinality of the set $S$ (i.e., the F1 count of $P$) by $|S|$. Recall that the F2 count of pattern $P$ is the cardinality of the maximal set of embeddings in which no two embeddings share an edge. We denote such a set by $S'$. To compute the F2 count of $P$, we introduce the concept of an overlap graph, which is unique to $P$ and $G$. Let us denote the overlap graph by $G^o = (V^o, E^o)$. Here, each node in $V^o$ corresponds to an embedding of $P$ listed in $S$. Let the bijection $\phi: V^o \to S$ denote the relationship between the nodes in $V^o$ and the embeddings in $S$. Each edge $(u, v) \in E^o$ indicates that the two embeddings $\phi(u)$ and $\phi(v)$ share at least one edge.

We use the overlap graph to generate the maximal, non-overlapping embedding set $S'$ in an iterative fashion. First, we find the node $u \in V^o$ with the smallest degree. If there are multiple nodes with the same smallest degree, we randomly select one of them. We insert the corresponding embedding $\phi(u)$ into $S'$. Since $S'$ only contains non-overlapping embeddings, we remove node $u$ from $V^o$ along with all the nodes $v \in V^o$ such that $(u, v) \in E^o$. We repeat this process to populate $S'$ until $V^o$ becomes the empty set.
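The iterative procedure above can be sketched in Python. This is a minimal illustration under stated assumptions: embeddings are supplied as frozensets of edges (finding them is not shown), the degree of a node is taken among the nodes still remaining in the overlap graph, and ties are broken by list position instead of randomly so that the result is reproducible. The function name `f1_f2_counts` is hypothetical.

```python
def f1_f2_counts(embeddings):
    """Return (F1, F2, chosen) for a list of embeddings given as frozensets of edges."""
    n = len(embeddings)
    # Overlap graph: node i stands for embeddings[i]; two nodes are adjacent
    # when the corresponding embeddings share at least one edge.
    overlap = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if embeddings[i] & embeddings[j]:
                overlap[i].add(j)
                overlap[j].add(i)

    remaining = set(range(n))
    chosen = []
    while remaining:
        # Smallest current degree first; ties go to the lowest index
        # (an assumption made here for determinism; the text selects randomly).
        u = min(remaining, key=lambda x: (len(overlap[x] & remaining), x))
        chosen.append(embeddings[u])
        # Remove u and every embedding that overlaps with it.
        remaining -= overlap[u] | {u}
    return n, len(chosen), chosen


def triangle(a, b, c):
    """Helper: a triangle embedding as a frozenset of undirected edges."""
    return frozenset({frozenset((a, b)), frozenset((b, c)), frozenset((a, c))})


# Two triangles sharing edge 1-2, plus one triangle that shares no edge with them.
embs = [triangle(0, 1, 2), triangle(1, 2, 3), triangle(3, 4, 5)]
f1, f2, picked = f1_f2_counts(embs)
print(f1, f2)
```

Because this is a greedy heuristic, the returned set is edge-disjoint but is not guaranteed to be the largest possible one.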

### Motif counting in dynamic networks

Let us denote the given network by $G = (V, E)$. Also, let us denote the topology of the network after the $i$th edge insertion or deletion by $G_i = (V, E_i)$. Thus, we have $G_0 = G$ and, for all $i > 0$, $|E_i \,\triangle\, E_{i-1}| = 1$ (each update inserts or deletes exactly one edge). Given a motif topology denoted by pattern $M$, we compute the F1 and F2 counts of $M$ in the initial network $G_0$ by using the method described in the “Motif counting in static networks” subsection. As the network $G$ evolves (i.e., edges are added and/or deleted), the F1 and F2 counts of $M$ can change. Next, we will show an algorithm for efficiently updating the F1 and F2 counts as the network evolves from $G_i$ to $G_{i+1}$ ($i \geq 0$). By repeatedly applying our algorithm after each network edit operation, the motif count is updated for an arbitrarily long sequence of network updates.

#### Updating the F1 count

We now describe our method for updating the count of F1 of M as G i evolves into Gi+1. We assume that F1 for G i is known. Our algorithm for updating F1 relies on initially constructing and maintaining an auxiliary data structure that allows for the embeddings containing an edge to be efficiently queried. Thus, at the beginning of our algorithm, we find all embeddings of a given motif M in the initial network G0. After finding these embeddings, we create a list of embeddings for each edge eE, denoted as D e, which stores all embeddings that contain e. That is, for a motif M, let m be an embedding in a given network. Then mD e if em. This data structure, which we refer to as the edge compressed bitmap, is updated each time an edge is either added or deleted. The F1 is then updated based on the edge compressed bitmap.

Suppose that, as the network $G_i$ evolves to $G_{i+1}$, an edge $e \in E_i$ is deleted. This reduces the F1 count of motif M if the deleted edge is part of any embedding of M. From the edge compressed bitmap, we find the set of embeddings of M which contain $e$. We remove this set ($D_e$) from the edge compressed bitmap and reduce the F1 count of M by the cardinality of this set.
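The deletion step can be sketched as follows. This is a hedged illustration, assuming the per-edge index is a dictionary from edges to sets of embeddings and that F1 is simply the number of embeddings. The helper name `delete_edge` is hypothetical.

```python
def delete_edge(edge_index, f1_count, edge):
    """Drop every embedding containing `edge` and return the new F1 count."""
    dead = edge_index.pop(edge, set())          # this is D_e
    for emb in dead:
        for other in emb:
            if other != edge:
                # The embedding no longer exists, so forget it everywhere.
                edge_index[other].discard(emb)
    return f1_count - len(dead)


# Two triangles sharing the edge (1, 3); F1 starts at 2.
A = frozenset({(1, 2), (2, 3), (1, 3)})
B = frozenset({(1, 3), (3, 4), (1, 4)})
edge_index = {}
for emb in (A, B):
    for e in emb:
        edge_index.setdefault(e, set()).add(emb)

f1_after_private = delete_edge(edge_index, 2, (1, 2))   # only A is destroyed
f1_after_shared = delete_edge(edge_index, f1_after_private, (1, 3))  # B goes too
```

Deleting an edge that belongs to no embedding leaves the count unchanged, because `pop` falls back to an empty set.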

Next, assume that an edge $e \notin E_i$ is added to $G_i$. Unlike edge deletion, prior to this update we do not know whether $e$ is part of an embedding of M in $G_{i+1}$. We locate such embeddings of M in $G_{i+1}$ as follows. Let us denote the diameter of M with $k$. We search the $k$-neighborhood of $e$ in $G_{i+1}$. The set of embeddings of M which contain $e$ can be formed from $e$ and its neighboring edges. We add this set to the edge compressed bitmap and increase the F1 count of M by the cardinality of the set of new embeddings.
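As an illustration of the insertion step, the sketch below specialises the neighbourhood search to one motif, the triangle. Its diameter is 1, so every new embedding consists of the new edge plus a common neighbour of its endpoints. A general motif would need a proper subgraph search in the $k$-neighbourhood. The names `norm` and `insert_edge` are hypothetical.

```python
def norm(u, v):
    """Canonical form of an undirected edge."""
    return (u, v) if u <= v else (v, u)


def insert_edge(adj, edge_index, f1_count, u, v):
    """Add edge (u, v), register the new triangles, return (F1, new embeddings)."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
    new_embeddings = []
    # Every common neighbour w closes one triangle u-v-w through the new edge.
    for w in adj[u] & adj[v]:
        emb = frozenset({norm(u, v), norm(u, w), norm(v, w)})
        new_embeddings.append(emb)
        for e in emb:
            edge_index.setdefault(e, set()).add(emb)
    return f1_count + len(new_embeddings), new_embeddings


# A 4-cycle 1-2-4-3-1 has no triangle; adding the chord (2, 3) creates two.
adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
edge_index = {}
f1, created = insert_edge(adj, edge_index, 0, 2, 3)
```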

#### Updating the F2 count

After updating the F1 count, we proceed to update the F2 count. Updating the F2 count is more challenging than updating F1 because computing the F2 count is NP-complete and the methods used are heuristics. As a result, the F2 count we compute even for a single static network may deviate from the optimal result. We would like to minimize the additional errors introduced by dynamic updates.

First, we assume that we have already computed the F1 and F2 counts of the given motif M in $G_i$ and the F1 count of M in $G_{i+1}$. Next, we describe how we update the F2 count for $G_{i+1}$. There are two possible scenarios: (1) an edge has been deleted from $G_i$, and (2) an edge has been added to $G_i$. In the first scenario, the removal of an edge $e$ from $G_i$ will cause the F2 count to either remain the same or decrease by one. The former case occurs when none of the embeddings in the set $D_e$ contribute to the F2 count in $G_i$. The latter case occurs when one of the embeddings from the set $D_e$ contributes to the F2 count of $G_i$. Let us denote that embedding with $X$ ($X \in D_e$). After removing $e$, the embedding $X$ does not exist in $G_{i+1}$. This reduces the F2 count of M by one. However, it is possible that there is another embedding (say $Y$) which can be included in the F2 count for $G_{i+1}$ to replace $X$. For this to happen, $Y$ must satisfy two conditions: (i) $Y$ overlaps with $X$, and (ii) $Y$ does not overlap with any other embedding included in the F2 count of M in $G_i$. If such an embedding $Y$ exists, we include it in the F2 set, and the F2 count remains unaltered. Otherwise, the F2 count decreases by one.

In order to identify any embedding $Y$ that satisfies the two conditions above, we explore the neighbors of $X$ in the overlap graph. Recall that the neighbors of an embedding in the overlap graph are those embeddings of M which share at least one common edge with that embedding. If, say, $X$ consists of edges $e_1$, $e_2$ and $e_3$, then the neighbors of $X$ will be the union of the sets $D_{e_1}$, $D_{e_2}$ and $D_{e_3}$.

From the set of neighbors of $X$, we consider each embedding and check whether it can be included in the updated F2 count. If an embedding $Y$ in that set has all of its edges free, that is, not used by any embedding remaining in the F2 set, then we include it in the F2 set for $G_{i+1}$. Therefore, if such an embedding $Y$ exists, the F2 count remains unaltered, as the inclusion of $Y$ compensates for the deletion of $X$. Otherwise, we decrease the F2 count by one.
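The deletion case for F2 can be sketched as below. This is a minimal sketch under stated assumptions: embeddings are `frozenset`s of edges, the F2 set is a Python set of pairwise edge-disjoint embeddings, and "free" means "not used by any embedding still in the F2 set". The function name `update_f2_after_deletion` is hypothetical, and the candidate order (sorted for determinism) is an arbitrary choice.

```python
def update_f2_after_deletion(edge_index, f2_set, deleted_edge):
    """Remove embeddings destroyed by the deletion and try to replace the
    lost F2 member X by one of its overlap-graph neighbours."""
    dead = edge_index.pop(deleted_edge, set())
    for emb in dead:
        for e in emb:
            if e != deleted_edge:
                edge_index[e].discard(emb)
    lost = [emb for emb in dead if emb in f2_set]
    if not lost:
        return f2_set                      # F2 count is unchanged
    x = lost[0]                            # F2 members are edge-disjoint: at most one
    f2_set.discard(x)
    used = {e for emb in f2_set for e in emb}
    neighbours = set()
    for e in x:                            # union of the surviving D_e for e in X
        neighbours |= edge_index.get(e, set())
    for y in sorted(neighbours, key=sorted):
        if used.isdisjoint(y):             # all edges of Y are free
            f2_set.add(y)
            break
    return f2_set


def index_of(embeddings):
    idx = {}
    for emb in embeddings:
        for e in emb:
            idx.setdefault(e, set()).add(emb)
    return idx


A = frozenset({(1, 2), (2, 3), (1, 3)})
B = frozenset({(1, 3), (3, 4), (1, 4)})
C = frozenset({(3, 4), (4, 5), (3, 5)})

# Case 1: B can replace A, so the F2 count stays at 1.
replaced = update_f2_after_deletion(index_of([A, B]), {A}, (1, 2))
# Case 2: B is blocked by C (they share edge (3, 4)), so F2 drops from 2 to 1.
blocked = update_f2_after_deletion(index_of([A, B, C]), {A, C}, (1, 2))
```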

Assume that an edge $e$, where $E_{i+1} \setminus E_i = \{e\}$, is added to $G_i$. This addition will either increase the F2 count of M in $G_{i+1}$ by one or have no influence. The new edge can form new embeddings of M in $G_{i+1}$. We explain how we obtain such new embeddings in the “Updating the F1 count” section. We then check whether any of these new embeddings can be included in the updated F2 count. To do this, we consider each new embedding and check that none of its edges is already used by an embedding in the F2 set. If such an embedding exists, we include it in the F2 set and increase the F2 count by one.
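The insertion case is simpler, and a hedged Python sketch follows. It assumes the new embeddings have already been found by the F1 update and that the F2 set holds pairwise edge-disjoint embeddings stored as `frozenset`s of edges. The name `update_f2_after_insertion` is hypothetical.

```python
def update_f2_after_insertion(f2_set, new_embeddings):
    """Add at most one new embedding whose edges are all unused by the F2 set."""
    used = {e for emb in f2_set for e in emb}
    for y in new_embeddings:
        if used.isdisjoint(y):
            f2_set.add(y)
            break  # every new embedding contains the new edge, so one is the maximum
    return f2_set


# Existing F2 member uses edges (1, 2), (1, 3) and (2, 3).
A = frozenset({(1, 2), (2, 3), (1, 3)})
# Adding edge (4, 5) creates two triangles: one reuses (2, 3)... no, one touches
# A's edge (1, 3) and one is completely disjoint from A.
touching = frozenset({(4, 5), (1, 3), (3, 5)})
disjoint = frozenset({(4, 5), (4, 6), (5, 6)})

grown = update_f2_after_insertion({A}, [touching, disjoint])
unchanged = update_f2_after_insertion({A}, [touching])
```

The `touching` and `disjoint` sets are arbitrary edge sets chosen only to exercise the check, not embeddings derived from a real graph.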

We note here that only the observed number of motifs is cited in Wuchty & Stadler (2003), not their Z-scores. Moreover, in a network comprising 3183 proteins they find, e.g., 3.6 million copies of motif 1 in figure 2b. This can only happen if motifs are counted in a highly degenerate way, which raises the question of whether such a motif definition will give rise to biologically meaningful results.

Start from an empty graph at time t=0 which contains no nodes and no edges (we could also start from a network with a single node and a single edge which starts and ends at the same node).

## Conclusions

Genome wide expression analysis of transcription factor mutants has traditionally been used to predict novel transcription factor targets. However, as shown in this paper, these data sets contain only a small fraction (about 10 to 20%) of direct targets. In order to understand the indirect response mechanisms following the deletion or overexpression of a transcription factor, we introduced the concept of regulatory path motifs, short paths in an integrated network of transcriptional, protein-protein and phosphorylation interactions which occur significantly more often than expected by chance between transcription factors and their perturbed targets in large-scale deletion and overexpression libraries. Regulatory path motifs extend the well-known notion of static network motifs and are conceptually related to the recently introduced activity motifs. We found eight enriched paths, of which five were overrepresented in both deletion and overexpression data (TRI, TRI-TRI, PPI-TRI, PPI-TRI-TRI and PPI-PhI-TRI). The TRI-PPI path is overrepresented only in deletion data, while the TRI-PhI-TRI and TRI-PPI-TRI paths are overrepresented only in overexpression data. These eight motifs explain about 13% of all genes differentially expressed in the deletion data and 24% in overexpression data, a more than five- to ten-fold increase compared to direct transcriptional links. Like static network motifs, regulatory path motifs are organized in a modular structure where a module consists of perturbed genes reached from a transcription factor by the same type of path with the same intermediate nodes. These modules contain strongly coexpressed and functionally coherent genes and can be used for diverse purposes like predicting periodically expressed genes.

An important property of regulatory networks is their condition-dependent nature. Although currently only a limited number of transcription factor mutant expression experiments are available under different conditions, we have shown that the relative abundance of the eight path motifs in a DNA-damage and cell cycle specific network agrees well with previously observed qualitative differences between exogenous and endogenous processes. Thus regulatory path motifs can be used to characterize the condition-dependency of the response mechanisms across multiple integrated networks.

As the amount of interaction data covering cellular networks at multiple levels of regulation continues to increase, questions regarding the cross-talk between these networks and which parts of the networks are activated upon different kinds of perturbations will quickly gain importance. In this paper we have shown that searching for small, statistically overrepresented patterns integrating functional and interaction data is a simple yet effective way to address these problems. We have implemented our method as a Cytoscape plugin, Pathicular, which allows users to calculate regulatory path significance values, to visualize regulatory paths on the integrated interaction network, and to extract and visualize regulatory path modules.

Pathicular is applicable to a wide variety of cause-effect and physical interaction networks and is freely available for academic use.

## Motif metrics

With the motif scheme in mind, the very first thing we want to know is: given a motif (one of those in the figure below), how can we tell whether this motif is significant? One approach to this problem is to compare the motif count in the given network with the count in a random network of the same order (same number of nodes). I wonder which type of motif statistic or random network is best for the comparison.

### Z-score

The z-score measures the difference between the number of motifs of a given type found in the network we want to analyse and the mean number of that motif in random networks with the same number of nodes and edges, divided by the standard deviation of that count across the random networks. The tool I used for graph motif analysis is a Python package called graph-tool. Since motif analysis is a demanding task, computing the undirected size-4 motif z-score on Blogcatalog3 (about 300k edges) takes almost 2 weeks on my lab machine (single thread). However, there is a trick to force graph-tool to use multiple-core processing, mentioned here.
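For reference, the z-score itself is cheap to compute once the motif counts are known; the expensive part is the counting. The sketch below uses only the Python standard library and does not call graph-tool. The function name `motif_z_score` is made up for this example, and the use of the population standard deviation is an assumption, since some tools use the sample version.

```python
from statistics import mean, pstdev


def motif_z_score(observed, random_counts):
    """z = (observed - mean of random counts) / std of random counts."""
    mu = mean(random_counts)
    sigma = pstdev(random_counts)   # population standard deviation (assumption)
    if sigma == 0:
        return float("nan")         # undefined when all random counts agree
    return (observed - mu) / sigma


# Observed count 10, against counts 4 and 6 in two randomised networks.
z = motif_z_score(10, [4, 6])
```

With more shuffles the estimate of the mean and standard deviation becomes more stable, at the price of one full motif count per random network.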

The choice of random-rewiring algorithm is also an open question in complex network research. In my work, I settled on the configuration model for random graph generation. However, other random graph models, such as the block model, could be better for certain types of motif. Professor Barabasi also discusses this matter in his slides (Barabasi 2016). graph-tool also provides implementations of some popular random graph rewiring functions.
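To show what a degree-preserving null model does, here is a small standard-library sketch of randomisation by double edge swaps. It is a common stand-in for the configuration model when a simple graph (no self-loops, no multi-edges) is wanted, and it is not graph-tool's implementation. The function name `degree_preserving_rewire` and the retry limit are assumptions for illustration. The input is assumed to be a simple undirected graph with at least two edges.

```python
import random
from collections import Counter


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Swap endpoints of random edge pairs: a-b, c-d -> a-d, c-b."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    present = set(edges)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c                      # pick one of the two possible pairings
        if a == d or c == b:
            continue                         # would create a self-loop
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in present or new2 in present:
            continue                         # would create a multi-edge
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges


def degrees(edge_list):
    return Counter(node for e in edge_list for node in e)


original = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 4)]
shuffled = degree_preserving_rewire(original, n_swaps=20)
```

Every accepted swap leaves each node's degree unchanged, so motif counts in `shuffled` can be compared with those in `original` without the degree sequence acting as a confounder.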