SeQuery provides two navigation interfaces for exploring the GPCR network, as illustrated in Figure 1. We recommend users to use the Chrome browser for better viewing quality. At its homepage (http://cluster.phy.ntnu.edu.tw), users can select the GPCR2841 dataset and explore the GPCR network from the top down (I). Alternatively, users can also submit a query sequence and explore its role in the GPCR network from the bottom up (II).
(I) In the top-down navigation scheme, SeQuery first displays the minimum spanning tree (MST) graph of the GPCR superfamily at the third level of clustering, which contains seven GPCR classes as shown in Figure 2. Each node in the network graph represents a GPCR family. The same scheme of colors and shapes as shown in the legend is used to denote the functions of GPCR clusters for all levels of clustering. When users click on a node, as shown in Figure 3, SeQuery shows the MST graph of the superfamily at the 2nd level clustering, centering at constituent clusters of the selected GPCR family. At this second level of clustering, the GPCR superfamily is represented by a MST graph of receptor clusters predicted from the first-level MSC clustering and each node in the graph represents a first-level MSC cluster. Users can explore the 2nd level graph of the network by dragging and zooming the network or locate any GPCR family of interest by clicking on the node in the legend at the left-hand side (SeQuery will center the second level graph at the clusters of the selected family). By clicking on a node in the 2nd level graph, SeQuery shows constituent sequences of the selected cluster in a 1st level graph, as shown in Figure 4 for a Peptide Receptor cluster Pe001 (A) and a Olfactory Receptor cluster Ol001 (B). Each node in a 1st level graph is a GPCR sequence, whose detailed information can be viewed in an information box by clicking on the node. At the first level of clustering, receptor clusters are represented by a tree graph which may or may not contain a core of zero-distance sequences. If a sequence has a known protein structure, such as P51681 in Figure 4(A), its structure will be displayed on the node. When the user clicks on a sequence node, SeQuery displays its sequence and function information, as shown in Figure 5.
Fig. 4 The tree diagrams of first-level GPCR clusters, showing member sequences of cluster Pe001 which has a conservative core (A), and member sequences of cluster Ol001 which has no conservative core (B). In (A) the three-dimensional protein structure of P51681 is displayed in its node.
Alternatively, users can select the centrality tab to see the centrality network on the 3rd level graph. When users click a node on the 3rd level network, SeQuery displays various thresholded network graphs showing statistical properties of the corresponding receptor family, as shown in Figure 6 for the Cannabinoid Receptor family in the Rhodopsin-like class. Here nodes are colored according to the value of their closeness centrality, and edges are colored according to the value of their betweenness centrality. The threshold value of sequence distance is 10–20 (A), 10–50 (B), 10–80 (C), and 10–100 (D), and edges longer than the threshold are not shown in Figure 6. From Figure 6, it is clear that the edge connecting Q98894 and P47936 has the largest betweenness and thus the largest potential to disconnect graphs if removed. It is also seen that the node Q98894 has the largest weighted degree and betweenness, and thus its sequence can be considered as the representative of the Cannabinoid Receptor family. By clicking the MST graph button on the centrality page (only for Olfactory Receptor family for now), MST graphs are also available to show the centrality measures of each GPCR family, as shown in Figure 7 for the Olfactory Receptor family. In the centrality networks, nodes are colored according to the value of their centrality measures (closeness in thresholded graphs, and the weighted degree in MST graphs), and edges are colored according to the value of their betweenness centrality. The functional information and centrality data of a GPCR sequence will be displayed in a box after clicking its corresponding node in the network.
Fig. 6 Thresholded sequence similarity network graphs of the Cannabinoid Receptor family with the threshold distance of 10–20 (A), 10–50 (B), 10–80 (C), and 10–100 (D). Nodes and edges are respectively colored by the value of their closeness centrality and betweenness centrality according to the color bar.
(II) In the bottom-up navigation, the database containing the base dataset of 2841 GPCRs is selected, and the query sequence is uploaded or entered at the homepage in the FASTA format. SeQuery determines if it is a GPCR based on its distance to the sequences in the base dataset. If so, as shown in Figure 5, SeQuery will display the graph of network connections between the query sequence and its neighbors at the first level of clustering, as well as information related to the overall cluster and its closest neighbor. The role of this cluster in the GPCR network can be further investigated at the second or third level of clustering.
Centrality measures for the sequence similarity network of GPCRs
In a network, central nodes are those in the thick of things. To evaluate the centrality of important nodes, we consider four different centrality measures, namely the weighted degree, closeness, betweenness, and eigenvector centralities (as defined in Figure 8), for an all-to-all, undirected, and weighted graph G := (V, E) with |V| nodes and |E| edges. The weight matrix W of the graph has weights wuv for the edges connecting nodes u and v (∀u,v∈V). Equivalently, we define a distance matrix D with elements duv, where duv ≡ wuv–1–1.