A “pharmacometabolomics-aided pharmacogenomics” workflow includes sample acquisition and preparation, analysis (NMR or mass spectrometry technologies), data processing and data analysis (targeted and non-targeted) (Mussap et al., 2013). For those familiar with (pharmaco)metabolomics approaches, such an outline is not new. What we propose at this point is the use of information technologies for in-depth data mining, analysis, and argumentation. Tools such as the Human Metabolome Database (http://www.hmdb.ca) and/or MetaboAnalyst (www.metaboanalyst.ca) are fundamental for initial data processing and interpretation. An integrative genes/metabolites analysis of confirmed metabolite identifiers will be facilitated by applying MAGENTA (http://www.broadinstitute.org/mpg/magenta/) to all curated biological pathways, such as those in the KEGG (http://www.genome.jp/kegg/), GO (http://www.geneontology.org), Reactome (http://www.reactome.org), Panther (http://www.pantherdb.org), Biocarta (http://www.biocarta.com) and Ingenuity (http://www.ingenuity.com/) databases, allowing statistical filtering and gene set enrichment. If the kinome is of interest, ReKINect, which has recently been reported and validated, could be employed (Creixell et al., 2015). Data visualization and analysis will be further supported by applications and web services, such as BioGRID (Stark et al., 2006), BDNB (Birkland and Yona, 2006), BioMart (Guberman et al., 2011), Oncomine (Rhodes et al., 2004), GenePattern (http://www.broadinstitute.org/cancer/software/genepattern/), PyMOL (https://www.pymol.org/), MetaMapR (http://dgrapov.github.io/MetaMapR/) and the UCSC Genome Browser (Rosenbloom et al., 2013; Karolchik et al., 2009), or even by networks (caBIG, http://cabig.cancer.gov; BIRN, http://www.nbirn.net) and projects (Genotype-Tissue Expression Project) (Lonsdale et al., 2013) that enable sharing of data and resources.
As the next step, and on the basis of candidate pathways of interest, untyped SNP genotypes may be imputed with the software package MaCH 1.0 (Li et al., 2010; Biernacka et al., 2009; Nothnagel et al., 2009) to merge candidate pathway data with pharmacogenomics cost-effectively and possibly with broader gene coverage than that of routine tag SNP genotyping. Quality control measures (MaCH “Rsq”) will also be employed to estimate the squared correlation between imputed and true genotypes (Li et al., 2010). Data validation and replication will be carried out by routine genotyping (PCR, Sanger sequencing). In silico tools, such as those provided by RD-Connect (http://rd-connect.eu/), CRAVAT (Douville et al., 2013), SIFT (Sim et al., 2012) and PROVEAN (Choi and Chan, 2015), will further assist in data interpretation at the gene level. Various statistical approaches will be employed at several steps throughout the proposed workflow (the R project for statistical computing, https://www.r-project.org/), including those most commonly used in (pharmaco)metabolomics and pharmacogenomics studies: principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA). PCA, a statistical method for dimensionality reduction through an orthogonal transformation, is an unsupervised method that can be used to identify specific structures in a dataset (clusters, anomalies or trends that exist between the observations). PCA is therefore employed to distinguish patients who respond to treatment from those who do not. In contrast, PLS-DA is a supervised method. This supervised analysis will define the important variables, that is, the main metabolites responsible for the separation among the groups in question. Data mining, analysis, collaboration and decision-making in such diverse, data-intensive and cognitively complex settings will be performed via the Dicode approach, supporting both artificial and human intelligence.
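To make the unsupervised step described above more concrete, the following is a minimal sketch of PCA on a small table of metabolite intensities. It is not the workflow's actual implementation (the text names R for statistics). The data values, the function names and the use of power iteration to find the first principal component are all assumptions made for illustration. PLS-DA is not shown, since it additionally requires group labels and is normally run with a dedicated statistics package.

```python
import math

def center(rows):
    """Subtract each column (metabolite) mean so the data are mean-centered."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of mean-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration.

    Assumption: the start vector is not orthogonal to the leading
    eigenvector, which holds for the toy data below.
    """
    p = len(cov)
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pc1_scores(rows):
    """Project each sample onto the first principal component."""
    centered = center(rows)
    direction = first_component(covariance(centered))
    return [sum(x * d for x, d in zip(r, direction)) for r in centered]

# Hypothetical intensities: rows are patients, columns are three metabolites.
# The first three rows and the last three rows were made up to differ.
profiles = [
    [5.1, 1.0, 3.0],
    [4.9, 1.2, 3.1],
    [5.3, 0.9, 2.9],
    [1.0, 4.8, 3.0],
    [1.2, 5.1, 3.2],
    [0.9, 5.0, 2.8],
]

scores = pc1_scores(profiles)
for patient, score in enumerate(scores, start=1):
    print(f"patient {patient}: PC1 score = {score:.2f}")
```

In this toy dataset the two sets of patients fall on opposite sides of zero along the first component, which is the kind of cluster structure PCA is expected to reveal without using any response labels.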
The envisioned architecture combines batch (a series of non-interactive tasks is executed all at one time) and stream (continuous computation that occurs as data is flowing through the system) processing (Karacapilidis, 2014), ensuring rapid and efficient outcomes.
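The distinction between batch and stream processing can be illustrated with a small sketch that computes the same statistic both ways. This is only an illustration of the two processing styles, not the Dicode architecture itself. The function names and the input values are hypothetical.

```python
from typing import Iterable, Iterator, List

def batch_mean(values: List[float]) -> float:
    """Batch style: all data are available up front and processed in one pass."""
    return sum(values) / len(values)

def stream_mean(values: Iterable[float]) -> Iterator[float]:
    """Stream style: update the mean incrementally as each value arrives."""
    count = 0
    mean = 0.0
    for x in values:
        count += 1
        mean += (x - mean) / count  # incremental update, no stored history
        yield mean

# Hypothetical measurements arriving over time.
readings = [2.0, 4.0, 6.0, 8.0]

batch_result = batch_mean(readings)
stream_results = list(stream_mean(iter(readings)))

print("batch mean:", batch_result)
print("running means:", stream_results)
```

The batch function produces a single answer once the whole dataset is present, whereas the streaming generator yields an up-to-date estimate after every observation and ends at the same value.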