Differential expression in Python with pyDESeq2

Рет қаралды 19,586

Sanbomics

Жүктеу

Пікірлер: 96

@sanbomics
Жыл бұрын
Fix: You should used the normalized counts layer for the PCA step.
@sarap5548
Жыл бұрын
Thanks! Could you put the code for it?
@cagankaplan5302
9 ай бұрын
Should we specify sc.tl.pca(dds.layers["normed_counts"] or just adding the log1p layer to dds before the pca is enough?
@user-rb2pt1zk1s
7 ай бұрын
@@sarap5548 did you know please how to fix it in the code ?
@vincenzomazza2880
6 ай бұрын
sc.pp.log1p(dds) sc.tl.pca(dds) sc.pl.pca(dds, color = 'Condition', size = 30)
@louis-philippechaumont8400
Жыл бұрын
I watched a bunch of your videos. They are really helpful and well made. I like that you present both python and R. They are helping me get started in bioinformatics!
@sanbomics
Жыл бұрын
I am glad you have found them helpful!
@vjsanchezarevalo
Жыл бұрын
Great video! I have been using pyDeseq2 and works great! Very cool the mapper function! Your videos are always very helpful, thanks a lot!
@sanbomics
Жыл бұрын
Thank you for the kind words :)
@PsychedCamo
2 күн бұрын
Thank you so much this tutorial very helpful!! A quick question: How can you save the volcano plot? for the heatmapL I used: ax.savefig('heat2.png') for the enrichment plot I used: gseaplot(rank_metric=pre_res.ranking, term=terms[0], ofname='enrichment.pdf', **pre_res.results[terms[0]]) for the pca plot I used: sc.pl.pca(dds, color = 'Condition', size = 200, save='PCAplot.png') I couldn't figure out how to save the volcano plot (last graph)
@lly6115
Жыл бұрын
I will click your video everyday to pump that viewed number.(lol) Thank you!
@sanbomics
Жыл бұрын
Doing the lord's work
@lly6115
Жыл бұрын
@@sanbomics thank you for your respectul endeavor and time!
@AbelDavid-qc5xy
2 ай бұрын
Thank you for this very helpful tutorial. is there a function for plotting the euclidean distance map for each sample?
@alexrutherford3105
10 ай бұрын
I'm making a dds and I got a weird error of: TypeError: DeseqDataSet.__init__() got an unexpected keyword argument 'clinical'. has this been updated?
@Rodrigo-yx3sn
10 ай бұрын
I think it changed.I used it like this : dds = DeseqDataSet(counts=counts, metadata=metadata, design_factors="Condition")
@sanbomics
10 ай бұрын
Yeah, sorry for the late reply. They changed clinical to metadata
@user-zm6ip8iy9q
Жыл бұрын
As a beginner in bioinformatics, I am watching the uploaded videos well. There are cases where TCGA data is used for cancer analysis at the single cell level, so if you have time, could you make a related video? Thank you always.
@sanbomics
Жыл бұрын
What analysis are you trying to do?
@parneetkaur7068
26 күн бұрын
For anyone watching in 2024, the DeseqDataSet function does not consider 'clinical' an argument. Instead, it takes 'metadata'.
@juanjara2221
5 ай бұрын
where can I download count table? in the GSE accesion we only have normalized counts or raw data
@heatherpeng7437
9 ай бұрын
Hello, thank you so much for this great, informative tutorial! I'd like to ask what does the PCA plot mean in your example? Thank you.
@lamjennygrace623
9 ай бұрын
Thank you for such a comprehensive tutorial. I'm totally new to Bioinformatics and I found your videos are very helpful. Could you please do a tutorial on how to get the count table and sample file from scratch in Python?? I would very appreaciated
@sanbomics
9 ай бұрын
Hi! You should be able to find guides on how to open csv with pandas. If you can load the counts into pandas you should be able to follow along with the rest
@mikezhao5294
Жыл бұрын
Your video is super helpful! I am wondering is any pipeline for differential expression analysis for ST data, or we just use SC pipeline? Thanks!!
@JAIRAJMATHUR
Жыл бұрын
I love your videos, and I am deeply grateful for all the content, especially python related that you put out! Hope to see more in the future!! What exactly does baseMean tell us? How would you set the threshold for it, as it could vary from dataset to dataset? Also when you use scanpy for PCA, you didnt normalize your data, or log transform it. Are those not necessary when using sc.tl.pca? Another approach that I have tried and cross verified with R, is to use vst transformation using the dds.vst function, and then use this vst transformed data to make PCA, do you think this would make your PCA plot look better? Sorry, one last question: For GSEA, What would be the difference using stat for ranking vs sign(LFC)*-log10(padj) vs LFC*-log10(padj) Also, pre_res.res2d gives a nice dataframe with everything :)
@sanbomics
Жыл бұрын
Thank you! Basemean is just the mean of all the samples for that given gene. Setting a threshold is somewhat arbitrary. You could do it based on the distribution but most people just pick some number between 10-100, unfortunately I haven't seen standardized guidelines. Thanks for pointing that out.. it is actually a mistake on my part. Should have specified the normed_counts layer. Stat vs other metrics for GSEA is another one of those things that hasn't really been standardized. I like stat more now because it doesn't require an extra line of code and it doesn't seem to be inflated as badly as p-value can be sometimes. I would expect the actual results to not differ much though. Wow, thanks for pointing out pre_res.res2d xD
@sanbomics
Жыл бұрын
I'm going to pin this message because it has several important points. Thanks again for commenting!
@burcakotlu7858
Жыл бұрын
@@sanbomics I also liked the content. First of all, thanks for it. Regarding PCA, when I apply vst and then PCA in R, I got more meaningful PCA plot than the scanpy PCA without any transformation beforehand for the same bulk RNA-seq data at hand.
@burcakotlu7858
Жыл бұрын
@@sanbomics By the way, I got error for GSEA. gp.prerank(rnk=ranking, gene_sets=['GO_Biological_Process_2023', my_gene_set], seed = 6, permutation_num=100) givesTypeError: ufunc 'isinf' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' on my side.
@marwanmohamed3844
5 ай бұрын
Hey! Many thanks for this tutorial! i got a questions regarding the design factors, am interested in DE to check effect of AGE, while i want to control for SEX and also want to test for there interaction , do you know how to implement the interaction term in the design factors? design_factors. = ["Age","Sex","Age x Sex"],? continuous_factors=["AGE"]
@nikelElegance
5 ай бұрын
AMAZING. Thank you
@Shin-ct1ts
4 ай бұрын
Thank you for your great introduction. I have the questions in GSEA part. Though I could get ranking and wanna do all gene analysis in 9:40 min and always get the error like below. ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements How can I solve this problem...?
@sanbomics
2 ай бұрын
Sometimes the server is down. Were you able to get it working?
@NewPlant_
3 ай бұрын
Hey man, I'm trying to use this video to help me with my own work but during the GSEA part where you are using the gp.prerank function, I keep getting a value error. ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements. I've done everything you have and my dataframe has two columns just like yours, I don't know why it's expecting 3.
@sanbomics
3 ай бұрын
It might be related to the service being down. It's an API call to their server and when it's down it returns not what is expected and you get a weird error. At least I know I have had to deal with this in the past
@AbelDavid-qc5xy
2 ай бұрын
Can you cover batch correction in python?
@analeighgui4693
Жыл бұрын
Thx a million for the video! I have a general question about Python vs. R. For people new to bioinformatics, R is usually the lang they start with. Many cool packages are written in Python (and more and more). Do you see benefits for bioinformatics newbies to adopt python? You have great videos on both langs, so I want to see your opinion.
@sanbomics
Жыл бұрын
You'll need to learn at least a little of both. But Python is the future in my opinion. I did a video on this a while back: kzitem.info/news/bejne/14R-mXybo3Oohno
@analeighgui4693
Жыл бұрын
@@sanbomics Thanks for sharing! I rely heavily on R but have come across more and more often Python packages, especially in VDJ analysis. Got to seriously pick up the Python now!
@ustigergirl
10 ай бұрын
at the 'mapper' stage if your species is mouse do you change it to 'mouse'? What would be the correct format?
@sanbomics
10 ай бұрын
just make sure to specify species = mouse when you initialize the mapper the first time. Then you can use it over and over again without specifying
@DrRam_Undi
3 ай бұрын
Thank you for the video tutorial. I am not able to install pydeseq2 modules using jupyter notebook. Could you provide me any details on how to solve this? I am getting an error like this : ModuleNotFoundError: No module named 'scipy.stats._boost.nct_ufunc'. Thank you
@sanbomics
2 ай бұрын
I would try making a new environment with python=3.9 and reinstall your packages. Try to install them all in one command so that pip can do a better job avoiding conflicts: pip install a b c d e f ...
@vidalmrc
5 ай бұрын
Hey! Many thanks for this tutorial! I don't know if something changed in the PyDeseq2 API but the line "stat_res.summary()" triggers this error: "IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed" I have spent about 1h now trying to troubleshoot it without any success. I use the "count_table_for_deseq_example.csv" from your github, and I tried going throught the official "Getting Started" to end up with the same error... Any idea what's going on ?
@sanbomics
5 ай бұрын
They have made a lot of changes since I made this video, to the point where I am thinking about doing a new one. I'm not sure I have seen that error at that specific point before, but I have had that same error somewhere else in the workflow. Off the top of my head I can't remember what I did to fix it. It may be that something is off in the dds object. Hard to say without taking a closer look. Let me know if you figure it out
@vidalmrc
5 ай бұрын
@@sanbomics Thanks for your reply, and your videos in general!
@user-fn6fl4jl4x
8 ай бұрын
Hi, can i compare 4 groups(you use 2 groups)? Or DESeq2 only for two? Thanks for answer.
@sanbomics
8 ай бұрын
You can only do pairwise comparisons. But if you want to group them you can. for example, 3 vs 1. Or 2 vs 2. Very dependent on what you are trying to accomplish.
@paulavazquez5135
9 ай бұрын
Thanks for the great tutorial. How can I get a data frame with the normalized counts for each gene in each sample?
@sanbomics
9 ай бұрын
Without checking.. I think it is in one of the layers of the dds object. Something like dds.layers['norm_counts'}
@NadaHashimOsman
7 ай бұрын
what would be the difference in the code it i have "normalized" paired data? whick steps should i neglect , whick steps should be edited to fit my case
@sanbomics
6 ай бұрын
If the data are already normalized you may not be able to use Deseq. I think limma works for that
@chunliang8290
7 ай бұрын
Can you install pyDESeq2 into your google drive. Then, later, you just need to re-mount your google drive so that the pyDESeq2 will be available next time you use it without re-installation?
@sanbomics
6 ай бұрын
No idea, I haven't tried this. Sorry!
@begomorras
10 ай бұрын
My dataset is just samples and doesn't have any control. I have made the metadata dataframe with all S in the conditions, as this is my design but then the prgram spits an error and says : ValueError: Factors should take at least two values, but Condition takes the single value '['S']'. I know that in R the program can be bypassed by writting " design =~1 " and I would like to get help on how to do so in python Thanks in advance
@sanbomics
10 ай бұрын
You don't need a control, but you do need more than one group to test. You need to specify the groups in the condition column
@chunliang8290
7 ай бұрын
I am using your sanbomics. Unfortunately, you only has human and mouse as the species. I work with chicken data. What is your suggestion for alternative solution here?
@sanbomics
6 ай бұрын
I would stick to ensembl/biomart for now
@lly6115
Жыл бұрын
A quick simple question, do you know if pydeseq2 is capable of processing multiple groups? since I haven't being able to find that in its official docs.
@sanbomics
Жыл бұрын
What exactly are you trying to do?
@lly6115
Жыл бұрын
@@sanbomics Thank you for your reply. Let's say I have RNA counts data for 9 samples, including control(3 sample), treatment1(3 samples), treatment2(3 samples). Is it possible to transform 9 columns of counts data to expression matix data using pydeseq2. Could be a blatant rookie question( so ignore this would be totally understandable). Thank you very much for your time.
@JAIRAJMATHUR
Жыл бұрын
@@lly6115 @sanbomics I have been able to convert it into a single condition column, by joining the strings. Then I can specify which contrast I want to run and get my results.
@jousty6696
7 ай бұрын
At 4:13 Name error: name ‘DeseqDataSet_init_’ is not definied
@imranluqman4622
8 ай бұрын
Hey! I got an error saying 'DeseqStats.__init__() got an unexpected keyword argument 'n_cpus''. Could it be that it's an old argument? Thank you!
@sanbomics
8 ай бұрын
Yeah possibly, it was a brand new package and they have been making a lot of changes. For example, they changed "clinical" to "metadata" since i've made the video too
@irvingfrancisco6822
6 ай бұрын
Hey, do you know what is the new argument for 'n_cups'? I have the same issue
@nicowerschler9596
Ай бұрын
stat_res = DeseqStats(dds, contrast = ('Condition','XX','YY'))
@mehwishshoaib1591
10 ай бұрын
When define condition colum you jave 8 samples for which yoù put C and RS manually what to do when we have more no of samples for exapmle first 50 are diseased and last 40 are normal for that how to define condition column
@sanbomics
9 ай бұрын
You can do ['disease'] * 50 + ['normal']*40 or you can import the metadata from csv
@sasimuo
5 ай бұрын
Thanks for your video . Sometimes I get a value error when I run dds.deseq2() , ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported. how do I fix it?
@sanbomics
5 ай бұрын
Hard to say without looking more closely. See if you filter more lowly expressed genes out if that gets rid of the error. For example, get rid of genes that aren't expressed in at least two samples
@sasimuo
5 ай бұрын
@@sanbomics Thank you. when I filter (counts = counts[counts.sum(axis = 1) > 5] ), it helps. Sometimes I play with different counts sum value. However, if I get too high of counts sum value, I get another error which says perfect partition error. is there any parameters I could use in dds.deseq2() function to force to run that overcomes any error? Thanks
@ustigergirl
10 ай бұрын
can your sanbomic script/package be used for cuffdiff results (cufflink tool)?
@sanbomics
9 ай бұрын
Which part? The volcano plot?
@ustigergirl
8 ай бұрын
@@sanbomics sorry for the late reply, for heatmap and volcano plot. Thanks!
@mocabeentrill
Жыл бұрын
Been a while man! Hope you've been well.
@sanbomics
Жыл бұрын
Been busy with werk and haven't been able to do one. But doing well, thanks!
@LeticiaRojasC
Жыл бұрын
Can pyDESeq2 be use for single-cell RNAseq instead of bulk?
@sanbomics
Жыл бұрын
Yup! I actually just did a pseudobulk video kzitem.info/news/bejne/pptmtoWLsIh5a4Y
@adekunleajiboye1244
Жыл бұрын
Thanks for this video. I was not getting the one in R. How do I get your contact?
@sanbomics
Жыл бұрын
You can send me a message on twitter!
@user-rb2pt1zk1s
5 ай бұрын
Hello , Can you please share the source of the dataset ? which article did you use pleaaaase
@sanbomics
5 ай бұрын
Ooops I forgot to put that in the description. Thanks! The samples include normal human cell control and replicative senescence cells from NCBI accession GSE171663
@alexrutherford3105
10 ай бұрын
do you have a link to your sample count_table.csv ? thanks!
@sanbomics
10 ай бұрын
There is a counts table on my github, but I don't remember if it is the same one I used in this video
@Dr.UgurComlekcioglu
10 ай бұрын
Thank you very much for this great video. How can I get gene symbols of "Cow"?
@sanbomics
10 ай бұрын
You are doing DE analysis from cow samples? Interesting. Check ensembl for the species and get the gtf. The gtf will have ensembl id and gene symbol, you just have to parse the file
@Dr.UgurComlekcioglu
9 ай бұрын
@@sanbomics Thank you very much. I did it, and it worked.
@MegaLiaoliao
9 ай бұрын
ValueError: The count matrix should only contain integers at 4:00 min
@sanbomics
9 ай бұрын
Try converting the whole matrix to int and see if you get an error
@TakaTatsumi
Жыл бұрын
thanks !
@sanbomics
Жыл бұрын
You're welcome!
@gerolduntergasser4000
Жыл бұрын
cool
@serychristianrenaud
Жыл бұрын
thanks
@sanbomics
Жыл бұрын
No problem!