SIF (file format)
Latest revision as of 19:01, 13 February 2017
- File format title: SIF
- Element type: Diagram
- Plugin: biouml.plugins.microarray (Microarray plug-in)
SIF format
The Simple Interaction Format (SIF) was originally created for use with Cytoscape [1,2], the open-source bioinformatics software platform for visualizing molecular interaction networks.
SIF is convenient for building a graph from a list of interactions. It also makes it easy to combine different interaction sets into a larger network, or to add new interactions to an existing data set. The main disadvantage is that the format does not include any layout information, so Cytoscape must compute a new layout each time the network is loaded.
Lines in the SIF file specify a source node, a relationship type (or edge type), and one or more target nodes:
nodeA <relationship type> nodeB
nodeC <relationship type> nodeA
nodeD <relationship type> nodeE nodeF nodeB
nodeG
...
nodeY <relationship type> nodeZ
A more specific example is:
node1 typeA node2
node2 typeB node3 node4 node5
node0
The first line identifies two nodes, called node1 and node2, and a single relationship between node1 and node2 of type typeA. The second line specifies three new nodes, node3, node4, and node5; here "node2" refers to the same node as in the first line. The second line also specifies three relationships, all of type typeB and with node2 as the source, with node3, node4, and node5 as the targets. This second form is simply shorthand for specifying multiple relationships of the same type with the same source node. The third line indicates how to specify a node that has no relationships with other nodes. This form is not needed for nodes that do have relationships, since the specification of the relationship implicitly identifies the nodes as well.
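The line forms described above can be read with a few lines of Python. The following is a minimal sketch, not the Cytoscape or BioUML parser: the function name parse_sif_lines is hypothetical, fields are assumed to be separated by whitespace, and a line with a source and a relationship type but no target is assumed to be an error.

```python
def parse_sif_lines(lines):
    """Collect nodes and edges from whitespace-delimited SIF lines."""
    nodes = []        # node names in order of first appearance
    known = set()     # names already recorded in `nodes`
    edges = []        # (source, edge_type, target) triples

    def add_node(name):
        if name not in known:
            known.add(name)
            nodes.append(name)

    for line in lines:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if len(fields) == 2:
            # Assumption: a source and type without a target is malformed.
            raise ValueError(f"missing target node: {line!r}")
        source = fields[0]
        add_node(source)
        if len(fields) == 1:
            continue                      # isolated node, no relationships
        edge_type = fields[1]
        # Shorthand form: one edge per listed target, all of the same type.
        for target in fields[2:]:
            add_node(target)
            edges.append((source, edge_type, target))
    return nodes, edges


nodes, edges = parse_sif_lines([
    "node1 typeA node2",
    "node2 typeB node3 node4 node5",
    "node0",
])
print(nodes)
print(edges)
```

For the three-line example, this yields six nodes and four edges: one of type typeA and three of type typeB that share node2 as their source.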
Duplicate entries are ignored. Multiple edges between the same nodes must have different edge types. For example, the following specifies two edges between the same pair of nodes, one of type xx and one of type yy:
node1 xx node2
node1 xx node2
node1 yy node2
Edges connecting a node to itself (self-edges) are also allowed:
node1 xx node1
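The duplicate and self-edge rules can be illustrated as follows. This is a sketch under the assumption that an edge is identified by its (source, edge type, target) triple; the helper name unique_edges is hypothetical.

```python
def unique_edges(edges):
    """Drop repeated (source, edge_type, target) triples, keeping first-seen order."""
    # dict keys are unique and preserve insertion order.
    return list(dict.fromkeys(edges))


edges = [
    ("node1", "xx", "node2"),
    ("node1", "xx", "node2"),   # duplicate entry, ignored
    ("node1", "yy", "node2"),   # same node pair, different edge type, kept
    ("node1", "xx", "node1"),   # self-edge, allowed
]
kept = unique_edges(edges)
print(kept)
```

Three edges remain: the xx and yy edges between node1 and node2, and the self-edge on node1.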
Every node and edge in Cytoscape has an identifying name, most commonly used with the node and edge data attribute structures. Node names must be unique, as identically named nodes will be treated as identical nodes. The name of each node will be the name in this file by default (unless another string is mapped to display on the node using the visual mapper). This is discussed in the section on visual styles. The name of each edge will be formed from the name of the source and target nodes plus the interaction type: for example, sourceName (edgeType) targetName.
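The edge naming pattern can be written as a one-line helper. The function name edge_name is hypothetical; only the "sourceName (edgeType) targetName" pattern comes from the description above.

```python
def edge_name(source, edge_type, target):
    """Build an edge name from the source, interaction type, and target."""
    return f"{source} ({edge_type}) {target}"


print(edge_name("node1", "typeA", "node2"))  # node1 (typeA) node2
```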
The relationship type can be any string. Whole words or concatenated words may be used to define types of relationships, e.g. geneFusion, cogInference, pullsDown, activates, degrades, inactivates, inhibits, phosphorylates, upRegulates, etc.
Some common interaction types used in the Systems Biology community are as follows:
pp .................. protein-protein interaction
pd .................. protein -> DNA (e.g. transcription factor binding upstream of a regulating gene)
Some less common interaction types used are:
pr .................. protein -> reaction
rc .................. reaction -> compound
cr .................. compound -> reaction
gl .................. genetic lethal relationship
pm .................. protein-metabolite interaction
mp .................. metabolite-protein interaction
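When annotating edges, the abbreviations listed above can be kept in a lookup table. This is only an illustrative sketch: the names INTERACTION_TYPES and describe_interaction are hypothetical, and unknown tags are passed through unchanged because any string is a valid relationship type.

```python
# Abbreviations and descriptions taken from the lists above.
INTERACTION_TYPES = {
    "pp": "protein-protein interaction",
    "pd": "protein -> DNA",
    "pr": "protein -> reaction",
    "rc": "reaction -> compound",
    "cr": "compound -> reaction",
    "gl": "genetic lethal relationship",
    "pm": "protein-metabolite interaction",
    "mp": "metabolite-protein interaction",
}


def describe_interaction(edge_type):
    """Return a description for a known abbreviation, else the tag itself."""
    return INTERACTION_TYPES.get(edge_type, edge_type)


print(describe_interaction("pd"))         # protein -> DNA
print(describe_interaction("activates"))  # activates
```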
Delimiters
Whitespace (space or tab) is used to delimit the names in the simple interaction file format. However, in some cases spaces are desired in a node name or edge type. The standard is that, if the file contains any tab characters, then tabs are used to delimit the fields and spaces are considered part of the name. If the file contains no tabs, then any spaces are delimiters that separate names (and names cannot contain spaces).
If your network unexpectedly contains no edges and node names that look like edge names, it probably means your file contains a stray tab that's fooling the parser. On the other hand, if your network has nodes whose names are half of a full name, then you probably meant to use tabs to separate node names with spaces.
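The delimiter rule can be sketched as a decision made once per file. The function name split_sif_fields is hypothetical, and dropping empty fields produced by consecutive tabs is an assumption, not documented behavior.

```python
def split_sif_fields(text):
    """Split each non-blank line of SIF text into fields."""
    # If the file contains any tab, tabs are the only delimiter.
    use_tabs = "\t" in text
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if use_tabs:
            # Spaces stay part of the names.
            # Assumption: empty fields from repeated tabs are dropped.
            fields = [f for f in line.split("\t") if f]
        else:
            # No tabs anywhere: every run of spaces separates names.
            fields = line.split()
        rows.append(fields)
    return rows


tabbed = split_sif_fields("my node\tpp\tother node\n")
spaced = split_sif_fields("my node pp other node\n")
stray = split_sif_fields("a pp b\nc\tpp d\n")
print(tabbed)
print(spaced)
print(stray)
```

The third call reproduces the first symptom described above. A single stray tab switches the whole file to tab mode, so "a pp b" is read as one node name with no edges, and "pp d" becomes a field that looks like an edge name. The second call shows the opposite problem: without tabs, "my node" is split into two separate names.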
Networks in Simple Interaction Format are often stored in files with a .sif extension, and Cytoscape recognizes this extension when browsing a directory for files of this type.