桑基图(Sankey diagram)

简介

桑基图(Sankey diagram)是用于表示能量或信息流动的一种可视化方式,应用于微生物组数据,可以清晰展示各个taxonomy level之间物种相对丰度的流动。从Domain到Species,相邻两级之间分支的总宽度保持不变(能量守恒),如下图:

数据准备

此处我们使用Metaphlan2 tutorial中的数据,来源于牙龈菌斑的宏基因组测序:

download.file('https://github.com/biobakery/biobakery/raw/master/demos/biobakery_demos/data/metaphlan2/output/SRS014476-Supragingival_plaque_profile.txt', destfile = 'SRS014476-Supragingival_plaque_profile.txt')
taxtab <- read.table('SRS014476-Supragingival_plaque_profile.txt', sep='\t', stringsAsFactors=FALSE)

加载networkdD3包

该包使用D3.js实现数据的交互式可视化,具体一些例子参见这里

library(networkD3)

数据预处理

函数sankeyNetwork主要需要两个data frame: Links和Nodes。Links主要有连接的起点(source)和终点(target),似乎此处的起点和终点只能是从0开始的数字。Nodes如果不指定ID,默认行是按照0开始的数字排列。

second_last <- function(x) ifelse(length(x)>1, x[length(x)-1], NA)

taxtab <- taxtab[-grep('unclassified', taxtab$V1), ]   ## remove unlassified taxa
 
tax_split <- strsplit(taxtab$V1, '\\|')                ## split into different taxonomy levels
target <- sapply(tax_split, tail, n=1)                 ## target node
mapping <- data.frame(id=0:(length(target)-1), row.names=target)
target_id <- mapping$id
source <- sapply(tax_split, second_last)               ## source node
source_id <- mapping[source,]
value <- taxtab$V2                                     ## width of flow

links <- data.frame(source=source_id, target=target_id, value) ## links
links <- links[!is.na(links$source), ]                 ## root of the tree has no parent (source)
nodes <- data.frame(do.call(rbind, strsplit(target,'__')))
colnames(nodes)=c('tax', 'name')

绘图

sankeyNetwork(Links = links, Nodes = nodes, 
              Source = "source", Target = "target", 
              Value = "value", NodeID = "name", NodeGroup = 'tax',
              sinksRight=FALSE, fontSize = 10, nodeWidth = 20, nodePadding=40, 
              height =200, width = 900)

Bacteria → Actinobacteria
90 
Actinobacteria → Actinobacteria
90 
Actinobacteria → Actinomycetales
90 
Actinomycetales → Corynebacteriaceae
58 
Corynebacteriaceae → Corynebacterium
58 
Corynebacterium → Corynebacterium_matruchotii
58 
Actinomycetales → Micrococcaceae
32 
Micrococcaceae → Rothia
32 
Rothia → Rothia_dentocariosa
32 
Bacteria → Bacteroidetes
10 
Bacteroidetes → Flavobacteriia
10 
Flavobacteriia → Flavobacteriales
10 
Flavobacteriales → Flavobacteriaceae
10 
Flavobacteriaceae → Capnocytophaga
10 
Bacteria
100
Bacteria
Actinobacteria
90
Actinobacteria
Bacteroidetes
10
Bacteroidetes
Actinobacteria
90
Actinobacteria
Flavobacteriia
10
Flavobacteriia
Actinomycetales
90
Actinomycetales
Flavobacteriales
10
Flavobacteriales
Corynebacteriaceae
58
Corynebacteriaceae
Micrococcaceae
32
Micrococcaceae
Flavobacteriaceae
10
Flavobacteriaceae
Corynebacterium
58
Corynebacterium
Rothia
32
Rothia
Capnocytophaga
10
Capnocytophaga
Corynebacterium_matruchotii
58
Corynebacterium_matruchotii
Rothia_dentocariosa
32
Rothia_dentocariosa
在此图中,不同颜色代表不同的taxonomy level,连接的宽度代表相对丰度。

桑基图工具的其他选择