Channel: SEQanswers

Genome assembly workflows for NGS

Hello,
I'm trying to assemble a genome and I'd like to know whether the following workflows are correct:
1) fastqc - velveth - velvetg - mauve ordering with a reference genome - mauve metrics
2) fastqc - velveth - velvetg - reapr to assess quality and close gaps - mauve ordering with a reference genome - mauve metrics
3) fastqc - abyss - mummer - reapr

Can you give me any suggestions about the above workflows?
Do you know of any other efficient workflow to assemble and evaluate a bacterial genome?

Thanks in advance

Miseq Error: No usable signal found in the images; it is possible that clustering has

Hello all,

I wanted to reach out to see if anyone has experienced the error described below. The run appears to have failed shortly after it started and did not take any images of clustering. I was paranoid that the sample may not have denatured optimally before I added the HT1 buffer to neutralize the solution. Even if that was the case, I feel SOME clustering should have occurred.

Error:
"No usable signal found in the images; it is possible that clustering has failed"

correlation in per base sequence quality profiles of multiplexed samples.

I have a question related to the "per base sequence quality" profiles.
In an Illumina HiSeq 150 bp paired-end run of multiplexed samples, I found that the "per base mean sequence quality" profiles of all the samples correlate with each other. Please see the attached file for figures of the quality profiles of all samples, for forward (read 1) and reverse (read 2) reads.

Though the variation in quality is very small (within 1 quality score), I was expecting that the variation should be rather random for each sample. Is this normal and a common occurrence, and a characteristic of the machine, with something to do with the base calling at each cycle?

* There is a prominent dip in quality at around position 105 in both forward and reverse reads. The sequencing provider advised me that this is a common occurrence, associated with an increase in laser intensity around this position. Is this true for all HiSeq runs?

Thanks for your suggestions.

Attached Files
File Type: pdf PBSQ.pdf (265.7 KB)
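
One way to put a number on how closely two samples' profiles track each other is to compute the per-cycle mean quality for each sample and then correlate the two profiles. The following is only a rough sketch: it assumes Phred+33 quality strings, and the function names `per_cycle_mean_quality` and `pearson` are made up for illustration.

```python
import math


def per_cycle_mean_quality(quality_strings, offset=33):
    """Mean Phred score at each read position (cycle).

    Assumes Phred+33 encoded quality strings; reads may differ in length.
    """
    totals, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def pearson(x, y):
    """Pearson correlation of two profiles, truncated to the shorter one."""
    n = min(len(x), len(y))
    if n < 2:
        return float("nan")
    x, y = x[:n], y[:n]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a flat profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)
```

Feeding the quality lines (every fourth line of each sample's FASTQ) into `per_cycle_mean_quality` and passing two samples' profiles to `pearson` gives a value near 1 when the profiles rise and fall together cycle by cycle.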

combine cells with same length?

Hi all,

I am new to PacBio and have recently been working with Iso-Seq datasets. Taking MCF-7 as an example, I find there are 7 cells sequenced at 3-5 kb. So I ran the analysis on each of them, then combined the SAM files after mapping the high-quality cluster sequences to the reference.

When I checked the cDNA primer wiki page, it seemed that another tool had been developed for chaining GTF files.

So I wonder whether it is possible to provide all 7 cells to ConsensusTools and generate one big CCS file for them. Or maybe the better way is to feed all 28 cells to tofu_warp and let it handle the sizing automatically.

So I am confused about the strategy for analysing samples with multiple cells of different sizes. It seems that I have four options for constructing the FL cDNA:
1. combine all cells -> pb_warp
2. combine cells with same length -> ConsensusTools -> classify -> cluster -> collapse -> chain
3. do not combine -> ConsensusTools -> classify -> cluster -> collapse -> chain
4. do not combine -> ConsensusTools -> classify -> cluster -> merge sam -> collapse -> chain

Which one is better?

Thanks a lot

cummeRbund error

Hi guys,
For some reason cummeRbund in R cannot read my diffout folder.
This is the command that I used (after installing the library, of course):
cuff<-readCufflinks()
I got the following errors:
No records found in Eutrema_STAR_Cuffdiff/tss_groups.fpkm_tracking
TSS FPKM tracking file was empty.
Reading Eutrema_STAR_Cuffdiff/tss_group_exp.diff
No records found in Eutrema_STAR_Cuffdiff/tss_group_exp.diff
Reading Eutrema_STAR_Cuffdiff/splicing.diff
No records found in Eutrema_STAR_Cuffdiff/splicing.diff
Reading Eutrema_STAR_Cuffdiff/tss_groups.count_tracking
No records found in Eutrema_STAR_Cuffdiff/tss_groups.count_tracking
Reading read group info in Eutrema_STAR_Cuffdiff/tss_groups.read_group_tracking
No records found in Eutrema_STAR_Cuffdiff/tss_groups.read_group_tracking
Reading Eutrema_STAR_Cuffdiff/cds.fpkm_tracking

Then, when I ran the cuff command, I got the wrong number of samples and zero TSS, CDS, etc.

Please help!

Searching for a sequence in WGS / RNA-seq data

I am interested in searching for a specific sequence in both my RNA-seq and WGS data, and the sequence is quite a bit above the read lengths for either experiment. I have access to all BAM files, some VCF files for WGS, raw fastq files, and everything else you can imagine coming from the sequencing. I want to see if a sequence is present in the data, and if it is, if it's present in the aligned or unaligned BAM files.

The background to my question would be that the sequence in question is a sequence that I believe would not be successfully mapped to the reference, but might still exist in the data/reads. I am unsure of how to go about this, or if it's even something that can be done.

My initial idea was to create some kind of consensus sequence from the RNA-seq BAM files (both unaligned and aligned) and simply search the resulting sequence against my sequence of interest. This, however, has proven to be hard, as there seem to be numerous ways of doing it according to Google, with none clearly the best (the "best" of which involves vcftools, which for the life of me I cannot get to install on my Mac; there are no make files, although the documentation says there should be!).

In essence, I just want to find my sequence in my data. How do I do this?
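
One alignment-free way to approach this is to break the sequence of interest into k-mers and scan the raw reads for any read that shares at least one k-mer with it, on either strand. The following is a minimal sketch under stated assumptions, not a definitive method: it assumes plain 4-line FASTQ records, the names `query_kmers` and `reads_sharing_kmers` are hypothetical, and `k=25` is an arbitrary choice that must be shorter than the reads.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def query_kmers(query, k):
    """All k-mers of the query and of its reverse complement (upper case)."""
    query = query.upper()
    kmers = set()
    for strand in (query, reverse_complement(query)):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])
    return kmers


def reads_sharing_kmers(fastq_handle, query, k=25):
    """Yield (read_name, n_shared_kmers) for reads that hit the query.

    Assumes 4-line FASTQ records (sequence lines are not wrapped).
    """
    kmers = query_kmers(query, k)
    while True:
        header = fastq_handle.readline()
        if not header:
            break
        seq = fastq_handle.readline().strip().upper()
        fastq_handle.readline()  # '+' separator line
        fastq_handle.readline()  # quality line
        shared = sum(
            1 for i in range(len(seq) - k + 1) if seq[i:i + k] in kmers
        )
        if shared:
            yield header.strip()[1:].split(" ")[0], shared
```

Opening a FASTQ file and looping over `reads_sharing_kmers(handle, my_sequence)` lists candidate reads. Those read names can then be looked up in the aligned and unaligned BAM files to see where (or whether) they were placed. Exact k-mer matching will miss reads whose every k-mer contains a sequencing error or variant, so a smaller `k` trades specificity for sensitivity.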

bedtools sort by faidx

The sort order of my bam file is:

Code:

cmccabe@DTV-A5211QLM:~/Desktop/NGS/pool_I_090215$ samtools view -H IonXpress_008_150902_newheader.bam | grep SQ | cut -f 2 | awk '{ sub(/^SN:/, ""); print;}'
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr20
chr21
chr22
chrX
chrY
chrM

So I created a names.txt to do a sortBed in bedtools but it appears that the option I need is not there.

Code:

Tool:    bedtools sort (aka sortBed)
Version: v2.25.0
Summary: Sorts a feature file in various and useful ways.

Usage:  bedtools sort [OPTIONS] -i <bed/gff/vcf>

Options:
        -sizeA                        Sort by feature size in ascending order.
        -sizeD                        Sort by feature size in descending order.
        -chrThenSizeA                Sort by chrom (asc), then feature size (asc).
        -chrThenSizeD                Sort by chrom (asc), then feature size (desc).
        -chrThenScoreA                Sort by chrom (asc), then score (asc).
        -chrThenScoreD                Sort by chrom (asc), then score (desc).
        -faidx (names.txt)        Sort according to the chromosomes declared in "names.txt"
        -header        Print the header from the A file prior to results.

cmccabe@DTV-A5211QLM:~/Desktop/NGS$ sortBed faidx -i /home/cmccabe/Desktop/NGS/bed/bedtools/xgen_targets.bed > /home/cmccabe/Desktop/NGS/bed/bedtools/xgen_targets_sorted.bed

*****ERROR: Unrecognized parameter: faidx *****

Basically, since the sort order of my bam is in "human ordering" I wanted to sort my bed in the same way. Thank you :).
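
For reference, the usage text quoted above lists the option with a leading dash and a file argument, `-faidx (names.txt)`. Independently of bedtools, the ordering itself can be sketched in a few lines of Python: rank each chromosome by its position in the names list, then sort by rank, start and end. This is only an illustration; the function name `sort_bed_by_chrom_order` is made up, and it assumes a tab-separated BED file whose chromosomes all appear in the names list.

```python
def sort_bed_by_chrom_order(bed_lines, chrom_order):
    """Sort BED records by a custom chromosome order, then start, then end.

    bed_lines   -- iterable of tab-separated BED lines
    chrom_order -- chromosome names in the desired order (e.g. names.txt)
    Header lines (track/browser/#) are kept at the top; blank lines dropped.
    """
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    headers, records = [], []
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith(("#", "track", "browser")):
            headers.append(line)
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in rank:
            raise ValueError("chromosome not in order list: " + chrom)
        records.append((rank[chrom], start, end, line))
    records.sort(key=lambda rec: rec[:3])
    return headers + [rec[3] for rec in records]
```

With `chrom_order` read from names.txt (one chromosome per line, in the same order as the BAM header), the returned lines can be written straight back out as the sorted BED file.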

Complete bioinformatics newbie here...hello

So, as the title implies, I am a complete greenhorn when it comes to bioinformatics and have much to learn. I am also just learning how to use the terminal on a Mac. I am a brand new graduate student here at the University of Kentucky and am working on Sea Lamprey genomics and development.

I will probably be posting a lot of softball questions in these forums out of desperation so please keep that in mind when I make posts that seem "too easy". Other than that I am determined to master everything I need to know to do bioinformatics and to learn all that I can. Hello!

DESeq2 multivariate analysis - retrieving certain stats

Dear All,

I am running a DESeq2 within-subject treatment response analysis. I have RNA-seq data on 75 individuals; for most (but not all) of these I have data before and after treatment. I also have a variable indicating successful treatment response across all individuals. I want to model gene expression as dependent on treatment and treatment response (res01) within each subject. counts is an R object with counts across all samples and genes.

The design matrix:
Code:

> head(A_design)
    sampleID subject treatment res01
A10a    A10a    A10        0    0
A10b    A10b    A10        1    1
A11a    A11a    A11        0    0
A11b    A11b    A11        1    0
A12a    A12a    A12        0    0
A12b    A12b    A12        1    1

The analysis is run as follows:
Code:

dds <- DESeqDataSetFromMatrix(countData = counts[,A_design[,1]], colData = as.data.frame(A_design), design = ~ subject + treatment + res01)
Atreat.dds <- DESeq(dds)

Originally I thought that I could extract the effect of res01 on gene expression as:
Code:

> res <- results(Atreat.dds,name='res01',pAdjustMethod='BH')
Error in results(Atreat.dds, name = "res01", pAdjustMethod = "BH") :
  cannot find appropriate results in the DESeqDataSet.
possibly nbinomWaldTest or nbinomLRT has not yet been run.

At this point I realized that the results looked different from what I expected:
Code:

> resultsNames(Atreat.dds)
 [1] "Intercept"  "subjectA01" "subjectA02" "subjectA03" "subjectA04"
 [6] "subjectA05" "subjectA06" "subjectA07" "subjectA08" "subjectA09"
.....
[76] "subjectA92" "treatment0" "treatment1" "res010"    "res011"

I have two questions regarding this:
1. Why do I get two res01 results and two treatment results? This seems to be due to the within-subject design. I don't understand which model has been tested here explicitly.
2. Should I run the following code to extract the effect that res01 has on gene expression, given all other covariates?
Code:

Ares_res01=results(Atreat.dds, contrast=list("res010",'res011'))
Thanks in advance for any help,
Boel

My sessionInfo:
Code:

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Scientific Linux release 6.7 (Carbon)

locale:
 [1] LC_CTYPE=sv_SE.UTF-8      LC_NUMERIC=C             
 [3] LC_TIME=sv_SE.UTF-8        LC_COLLATE=sv_SE.UTF-8   
 [5] LC_MONETARY=sv_SE.UTF-8    LC_MESSAGES=sv_SE.UTF-8 
 [7] LC_PAPER=sv_SE.UTF-8      LC_NAME=C               
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C     

attached base packages:
[1] parallel  stats4    stats    graphics  grDevices utils    datasets
[8] methods  base   

other attached packages:
[1] DESeq2_1.8.1              RcppArmadillo_0.5.400.2.0
[3] Rcpp_0.12.0              GenomicRanges_1.20.5   
[5] GenomeInfoDb_1.4.2        IRanges_2.2.7           
[7] S4Vectors_0.6.3          BiocGenerics_0.14.0     

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2  futile.logger_1.4.1  plyr_1.8.3         
 [4] XVector_0.8.0        futile.options_1.0.0 tools_3.2.1       
 [7] rpart_4.1-10        digest_0.6.8        RSQLite_1.0.0     
[10] annotate_1.46.1      gtable_0.1.2        lattice_0.20-33   
[13] DBI_0.3.1            proto_0.3-10        gridExtra_2.0.0   
[16] genefilter_1.50.0    stringr_1.0.0        cluster_2.0.3     
[19] locfit_1.5-9.1      nnet_7.3-10          grid_3.2.1         
[22] Biobase_2.28.0      AnnotationDbi_1.30.1 XML_3.98-1.3       
[25] survival_2.38-3      BiocParallel_1.2.20  foreign_0.8-66     
[28] latticeExtra_0.6-26  Formula_1.2-1        geneplotter_1.46.0 
[31] ggplot2_1.0.1        reshape2_1.4.1      lambda.r_1.1.7     
[34] magrittr_1.5        scales_0.2.5        Hmisc_3.16-0       
[37] MASS_7.3-43          splines_3.2.1        xtable_1.7-4       
[40] colorspace_1.2-6    stringi_0.5-5        acepack_1.3-3.3   
[43] munsell_0.4.2

Error in normalizeDoubleBracketSubscript

Hello,

I am getting an error while trying to use a function that counts reads for every chromosome across the genome in bins of 50 bp (the windowAnalysis function from the groHMM package). The error occurs for only one chromosome; the function runs fine for the rest of the chromosomes. Upon reading, I found that it is related to the IRanges package. The error is as follows:

$chrM
[1] "Error in normalizeDoubleBracketSubscript(i, x, exact = exact, error.if.nomatch = FALSE) : \n subscript is out of bounds\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in normalizeDoubleBracketSubscript(i, x, exact = exact, error.if.nomatch = FALSE): subscript is out of bounds>


Example of how it works fine for other chromosomes:

$chr1
integer-Rle of length 4985012 with 64 runs
Lengths: 935331 1 1 478754 1 ... 228032 1 1 1650
Values : 0 47 4 0 1 ... 0 1 50 0



Any suggestions to fix this are greatly appreciated.


Thanks,
Anusha

Senior Financial Analyst


Responsibilities: Job will evolve over time depending on candidate’s capabilities and could include the following:
• Responsible for monthly P&L reporting
• Responsibility for weekly and monthly sales reporting to commercial team
• Support accounting team with analysis of monthly general ledger entries
• Responsible for monthly commission and bonus accrual and payout calculations
• Responsible for quarterly P&L flux analysis for outside auditors
• Support monthly product demand and supply-side forecasting process
• Administration of corporate bonus plans
• Support of annual budget process
• Support long range forecasting process and modeling
• Support annual insurance renewal process and audits
• Various ad-hoc financial modeling requirements

Requirements
• 4-6 yrs related experience in financial analysis role
• Understanding of general accounting process and general accounting principles
• Effective communication skills (written & verbal)
• General understanding ERP systems and data extraction and analysis
• Ability to work and succeed in a team environment
• Ability to adapt quickly and learn new tasks independently
• High level of curiosity and creativity
• Strong analytical skills
• Excellent organization skills
• Ability to manage competing priorities
• Finance/Accounting degree or undergraduate degree which requires high level of analytics

Nice to Have
• MBA preferred but not necessary

All qualified applicants will receive consideration for employment without regard to race, sex, color, religion, national origin, protected veteran status, or on the basis of disability, gender identity, and sexual orientation.

Application Instructions:

For immediate consideration, please follow this link to submit your resume: Senior Financial Analyst



Pacific Biosciences is an Equal Opportunity Employer

Hey There!!

A quick hello to the SEQanswers community!

I am currently working with the HiSeq, MiSeq, & soon the NextSeq in the cancer genomics world. I am looking forward to learning all that I can from this group of intelligent people!!

Upcoming: Epigenomics Hands-On Workshop



DNA Methylation Data Analysis
How to use bisulfite-treated sequencing to study DNA methylation

Link to workshop page

When?
15 - 17 December 2015

Where?
iad Pc-Pool, Rosa-Luxemburg-Straße 23, Leipzig, Germany


Scope and Topics
The purpose of this workshop is to get a deeper understanding of the use of bisulfite-treated DNA in order to analyze the epigenetic layer of DNA methylation. Advantages and disadvantages of the so-called 'bisulfite sequencing' and its implications on data analyses will be covered. The participants will be trained to understand bisulfite-treated NGS data, to detect potential problems/errors and finally to implement their own pipelines. After this course they will be able to analyze DNA methylation and create ready-to-publish graphics.

By the end of this workshop the participants will:
  • be familiar with the sequencing method of Illumina
  • understand how bisulfite sequencing works
  • be aware of the mapping problem of bisulfite-treated data
  • understand how bisulfite-treated reads are mapped to a reference genome
  • be familiar with common data formats and standards
  • know relevant tools for data processing
  • automate tasks with shell scripting to create reusable data pipelines
  • perform basic analyses (call methylated regions, perform basic downstream analyses)
  • plot and visualize results (ready-to-publish)
  • be able to reuse all analyses

Target Audience
  • biologists or data analysts with no or little experience in analyzing bisulfite sequencing data

Requirements
  • basic understanding of molecular biology (DNA, RNA, gene expression, PCR, ...)
  • the data analysis will partly take place on the Linux command line. It is therefore beneficial to be familiar with the command line and in particular the commands covered in the Learning the Shell Tutorial

Included in the Course
  • Course materials
  • Catering
  • Conference Dinner

Trainers
  • Helene Kretzmer (University of Leipzig) has been working on DNA methylation analyses using high-throughput sequencing since 2011. She is responsible for the bioinformatic analysis of the MMML-Seq study of the International Cancer Genome Consortium (ICGC).
  • Dr. Christian Otto (CCR-BioIT) is one of the developers of the bisulfite read mapping tool segemehl and is an expert on implementing efficient algorithms for HTS data analyses.
  • Dr. David Langenberger (ecSeq Bioinformatics) started working with small non-coding RNAs in 2006. Since 2009 he has used HTS technologies to investigate these short regulatory RNAs as well as other targets. He has been part of several large HTS projects, for example the International Cancer Genome Consortium (ICGC).
  • Dr. Mario Fasold (ecSeq Bioinformatics) has developed several bioinformatics tools such as the Bioconductor package AffyRNADegradation and the Larpack program package. Since 2011 he has specialized in the field of HTS data analysis and has helped analyse sequencing data for several large consortium projects.

Key Dates
Opening Date of Registration: 1 June 2015
Closing Date of Registration: 15 November 2015
Workshop: 15 - 17 December 2015 (8 am - 5 pm)

Attendance
Location: iad Pc-Pool, Rosa-Luxemburg-Straße 23, Leipzig, Germany
Language: English
Available seats: 24 (first-come, first-served)

Registration fees:

998 EUR (without VAT)

Travel expenses and accommodation are not covered by the registration fee.

Contact
ecSeq Bioinformatics
Brandvorwerkstr.43
04275 Leipzig
Germany
Email: events@ecSeq.com

Visit: http://www.ecseq.com/workshops/workshop_2015-02

NGS beginner

Hello all!

A couple of months ago we decided to try Illumina NGS for metagenomics purposes. Data analysis is completely new for me and also for the lab where I work. I registered on SEQanswers because I expect to have some questions along the way.

Kind regards,

Karel

read number in fastq does not match bwa-mem produced sam file

Hi all,

I used bwa mem to align my quality-trimmed reads to a reference. I have 5M reads to align, but only 3M of them appear in the SAM file (I grep the initial part of the read names and count).

Many of the reads in the SAM file carry the "4" flag, i.e. they are reported as unaligned, so I do not understand where the rest of the reads are. Does bwa mem have a preference for which reads to align and report? Any ideas on this?

bwa mem index reads.fastq > align.sam

Thank y'all!
Melis
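
When comparing counts like this, it can help to tally SAM records by flag rather than by grepping read names, because secondary (0x100) and supplementary (0x800) records add extra lines for a read while unmapped reads (0x4) still occupy one line each. The sketch below is only an illustration of that bookkeeping on an uncompressed SAM file; the function name `summarize_sam` and the returned keys are invented here.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_sam(sam_lines):
    """Tally SAM records by type and count distinct read names.

    Header lines (starting with '@') and malformed lines are skipped.
    Only primary records contribute to the set of read names.
    """
    counts = {
        "primary_mapped": 0,
        "primary_unmapped": 0,
        "secondary": 0,
        "supplementary": 0,
    }
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            names.add(fields[0])
            if flag & UNMAPPED:
                counts["primary_unmapped"] += 1
            else:
                counts["primary_mapped"] += 1
    counts["unique_read_names"] = len(names)
    return counts
```

For single-end data, `primary_mapped + primary_unmapped` should equal the number of FASTQ records that made it into the SAM file. For paired-end data both mates share a name, so `unique_read_names` counts pairs rather than reads. If the total still falls short of the FASTQ count, comparing the last read name in the SAM file with the FASTQ is a quick way to check whether the output was truncated.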

Best assembly

I've run QUAST to assess the quality of a genome assembled with 3 different tools (ABySS, Velvet, SOAPdenovo); see the attachment.
Which one is the best, in your opinion?
Why does the last assembly (SOAPdenovo) have 513 contigs while the others have 225/228?



Kind regards

Attached Images
File Type: jpg assembly.jpg (34.4 KB)

novoalign parameters_alignment scoring options

Dear All,

I am confused about the novoalign parameters -t, -g and -x. I read the manual, and it seems that these three parameters can be used to set the allowed mismatches when mapping reads to the reference. However, how should I set them? Is there any way to calculate them? How can I know which numbers to use for -t, -g and -x? I checked the forum, and I did not understand why, e.g., -t=60 corresponds to around 3 mismatches... Could anyone help me?

Thanks in advance!

Cheers,

Sadiexiaoyu

Bowtie alignments and --local function

I've just started sequencing using MiSeq and am stumbling a bit in the analysis of my output reads, specifically using bowtie2 to align the reads to the reference genome. I'm using bowtie to get a rough estimate of what my amplification looks like and it's generally intuitive and quick. However:

When aligning reads to a genome sequence, what is “% reads align exactly 1 time” vs “% reads align >1 times” in the output? From my understanding, bowtie only records the best possible match (by default).
Is this saying that, to use an example from one of my samples, 40.26% of my reads align equally well to several places in the genome, suggesting a repeat region, while only 17.35% of my reads align to a unique region?
Or should this be interpreted as 40% of the reads align somewhere in the genome that already has one or more reads assembled to it, and that 17% of my reads are “unique”?

Also, my --local alignments are substantially different from those where I do not specify --local (99% of reads align using --local, while <70% align without), even though I trim adapters and low-quality reads before aligning (using trimmomatic). From my understanding of --local, it should give a slightly more liberal alignment, but the two should be closer, especially with trimming beforehand.

Forgive my ignorance and thanks for any advice.

Alex

coverageBed -g option error

I am using:

Code:

cmccabe@DTV-A5211QLM:~/Desktop/NGS$ coverageBed -d -sorted -g /home/cmccabe/Desktop/NGS/bedtools2-25.0/genomes/human.hg19.genome -a /home/cmccabe/Desktop/NGS/bed/bedtools/xgen_targets_sorted.bed -b /home/cmccabe/Desktop/NGS/pool_I_090215/IonXpress_008_150902_newheader.bam > /home/cmccabe/Desktop/NGS/pool_I_090215/IonXpress_008_150902_output.txt
Error: Sorted input specified, but the file /home/cmccabe/Desktop/NGS/bed/bedtools/xgen_targets_sorted.bed has the following record with a different sort order than the genomeFile /home/cmccabe/Desktop/NGS/bedtools2-25.0/genomes/human.hg19.genome
chr20        126045        126343        +        DEFB126:exon.2;DEFB126:exon.3

The newheader.bam is sorted like so:

Code:

cmccabe@DTV-A5211QLM:~/Desktop/NGS/pool_I_090215$ samtools view -H IonXpress_008_150902_newheader.bam | grep SQ | cut -f 2 | awk '{ sub(/^SN:/, ""); print;}'
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr20
chr21
chr22
chrX
chrY
chrM

Since the bam file uses "human ordering", I sorted the bed file in the same way using the -faidx option in bedtools.

Code:

cmccabe@DTV-A5211QLM:~/Desktop/NGS/bed/bedtools$ awk '!_[$1]++' | cut -f1 xgen_targets_sorted.bed | uniq
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr20
chr21
chr22
chrX
chrY

The resulting output file stops after chr19. I guess my question is: if there was an error in the sorting, wouldn't all the records be a problem? And if I made my own genome file using the coordinates in the bedtools genome file but reordered them to match mine, would that work? Or is there another problem I am overlooking? Thank you :).
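
The reordering idea can be sketched as follows: rewrite the genome file so its chromosomes follow the BAM header order, and check where the BED file first departs from that order. This is a hedged illustration only. The function names `reorder_genome_file` and `first_out_of_order` are hypothetical, the genome file is assumed to hold one chromosome name and size per line, and whether a reordered genome file resolves this particular coverageBed error is not confirmed here.

```python
def reorder_genome_file(genome_lines, chrom_order):
    """Return genome-file lines (name<TAB>size) in the given chromosome order."""
    sizes = {}
    for line in genome_lines:
        parts = line.split()
        if len(parts) >= 2:
            sizes[parts[0]] = parts[1]
    missing = [c for c in chrom_order if c not in sizes]
    if missing:
        raise ValueError("no size for: " + ", ".join(missing))
    return [c + "\t" + sizes[c] for c in chrom_order]


def first_out_of_order(bed_lines, chrom_order):
    """Return (line_number, line) of the first BED record that breaks the
    chromosome order or the start order within a chromosome, else None."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    previous = None
    for number, line in enumerate(bed_lines, 1):
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 3 or fields[0] not in rank:
            return number, line  # malformed record or unknown chromosome
        key = (rank[fields[0]], int(fields[1]))
        if previous is not None and key < previous:
            return number, line
        previous = key
    return None
```

Running `first_out_of_order` on the sorted BED file with the chromosome list from the BAM header shows whether the BED file itself is consistent with that order. If it returns None, the mismatch is more likely between the BED file and the order of the genome file passed with -g.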

DESeq2 plotMA: data appears incorrect, patterned

Hi,
I'm trying to learn DESeq2 using a simplified data set with 3 controls ("ctr"), and 3 treatments ("koh"). I used the summary from an earlier thread (MDonlin;120724) to get started. I am not getting any error messages, but the output from plotMA does not appear as it does in the DESeq2 "airway" vignette. It looks patterned in a non-random way suggesting something is incorrect (plot attached).

If anyone has any suggestions, I'd much appreciate the help.

Best,
Byron

Code:
> library("DESeq2")
>
> #generate count table from text file
> GeneCountTable <- read.table("KM272.d3.ctr.koh.txt", header=TRUE, row.names=1)
>
> head(GeneCountTable)
                                          ctr1d3 ctr2d3 ctr3d3 koh1d3 koh2d3 koh3d3
gi|10000000001|loc|edl|EDL_NS211000002.1|      0      3      0      0      0      0
gi|10000000002|loc|edl|EDL_NS211000003.1|      0      0      0      0      0      0
gi|10000000003|loc|edl|EDL_NS211000004.1|      0      0      0      0      0      0
gi|10000000008|loc|edl|EDL_NS211000009.1|      0      0      0      0      0      0
gi|10000000011|loc|edl|EDL_NS211000012.1|      0      0      0      0      0      0
gi|10000000018|loc|edl|EDL_NS211000019.1|      0      0      0      0      0      0
>
> #define samples
> samples <- data.frame(row.names=c("ctr1d3","ctr2d3","ctr3d3","koh1d3","koh2d3","koh3d3"), condition=as.factor(c(rep("ctr",3),rep("koh",3))))
> samples
condition
ctr1d3 ctr
ctr2d3 ctr
ctr3d3 ctr
koh1d3 koh
koh2d3 koh
koh3d3 koh
>
> #generate DESeq dataset
> KM272dds <- DESeqDataSetFromMatrix(countData = GeneCountTable, colData=samples, design=~condition)
> KM272dds
class: DESeqDataSet
dim: 559312 6
exptData(0):
assays(1): counts
rownames(559312): gi|10000000001|loc|edl|EDL_NS211000002.1|
gi|10000000002|loc|edl|EDL_NS211000003.1| ... gi|9964628|ref|NP_064758.1|
gi|99878752|ref|YP_615055.1|
rowRanges metadata column names(0):
colnames(6): ctr1d3 ctr2d3 ... koh2d3 koh3d3
colData names(1): condition
>
> #run DESeq on dataset
> KM272dds_1 <- DESeq(KM272dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
>
> #generate results table
> KM272_res <- results(KM272dds_1)
>
> #reorder results table by lowest adjusted P-value
> KM272_resOrdered <- KM272_res[order(KM272_res$padj),]
> head(KM272_resOrdered)
log2 fold change (MAP): condition koh vs ctr
Wald test p-value: condition koh vs ctr
DataFrame with 6 rows and 6 columns
                                  baseMean log2FoldChange     lfcSE      stat       pvalue         padj
                                 <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
gi|426411412|ref|YP_007031511.1| 1257.5658       8.121059 0.8434747  9.628101 6.084362e-22 1.696016e-17
gi|537453526|ref|YP_008487251.1|  967.6043       7.914203 0.9057204  8.738021 2.372297e-18 3.306389e-14
gi|152995336|ref|YP_001340171.1|  294.0563      11.018376 1.3738295  8.020192 1.055802e-15 7.357622e-12
gi|224584543|ref|YP_002638341.1|  236.0105       9.143821 1.1382139  8.033482 9.474453e-16 7.357622e-12
gi|333907837|ref|YP_004481423.1|  291.8945      11.013159 1.3848704  7.952484 1.828092e-15 1.019161e-11
gi|285019583|ref|YP_003377294.1|  147.3368       6.864467 0.8663821  7.923140 2.315875e-15 1.075917e-11
>
> #write CSV file
> write.csv(KM272_resOrdered,file="KM272_RNA_results.csv")
>
> #summarize results
> summary(KM272_res)

out of 139894 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 1927, 1.4%
LFC < 0 (down) : 17, 0.012%
outliers [1] : 272, 0.19%
low counts [2] : 111747, 80%
(mean count < 0.5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> #VISUALIZE
> #In DESeq2, the function plotMA shows the log2 fold changes attributable to a given variable over the mean of normalized counts. Points will be colored red if the adjusted p value is less than 0.1. Points which fall out of the window are plotted as open triangles pointing either up or down.
> plotMA(KM272_res, main="DESeq2", ylim=c(-15,15))


> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] DESeq2_1.8.1 RcppArmadillo_0.5.600.2.0 Rcpp_0.12.1
[4] GenomicRanges_1.20.8 GenomeInfoDb_1.4.3 IRanges_2.2.7
[7] S4Vectors_0.6.6 BiocGenerics_0.14.0

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-2 futile.logger_1.4.1 plyr_1.8.3 XVector_0.8.0
[5] futile.options_1.0.0 tools_3.2.2 rpart_4.1-10 digest_0.6.8
[9] RSQLite_1.0.0 annotate_1.46.1 gtable_0.1.2 lattice_0.20-33
[13] DBI_0.3.1 proto_0.3-10 gridExtra_2.0.0 genefilter_1.50.0
[17] stringr_1.0.0 cluster_2.0.3 locfit_1.5-9.1 nnet_7.3-11
[21] grid_3.2.2 Biobase_2.28.0 AnnotationDbi_1.30.1 XML_3.98-1.3
[25] survival_2.38-3 BiocParallel_1.2.21 foreign_0.8-66 latticeExtra_0.6-26
[29] Formula_1.2-1 geneplotter_1.46.0 ggplot2_1.0.1 reshape2_1.4.1
[33] lambda.r_1.1.7 magrittr_1.5 scales_0.3.0 Hmisc_3.17-0
[37] MASS_7.3-44 splines_3.2.2 xtable_1.7-4 colorspace_1.2-6
[41] stringi_0.5-5 acepack_1.3-3.3 munsell_0.4.2
>

Attached Files
File Type: pdf KM272_RplotMA_1.pdf (2.26 MB)