Short, complete and crystal clear! You absolutely rock, Dr Roger Peng!
@pensivenincompoop2016
7 years ago
I am new to R and I am learning it for my phylogenetics and statistics and I can already tell that this package is very useful. Thanks for the tutorial!
@anthonychariton9952
6 years ago
Brilliant overview, thank you kindly for this
@PandiMengri
4 years ago
This is exactly what I was looking for! Thank you, Roger! :)
@bodobruckner9600
9 years ago
Good, flawless and fast, as we have come to appreciate in Roger Peng's and friends' Coursera courses :-)
@kvafsu225
3 years ago
Really nice video. Thanks.
@gmshadowtraders
8 years ago
Dude you rock! You look a lot like the other R expert Professor Andrew Ng :)
@ChristopherSkyi
9 years ago
To get chicago.rds, go here: github.com/DataScienceSpecialization/courses/blob/master/03_GettingData/dplyr/chicago.rds
@jyotijain5157
8 years ago
Thank You.
@lalaithan
7 years ago
Can someone explain why I get all "NA"s when I input chicago?
@michelemelchiori7628
9 years ago
Very nice! Please consider adding an explanation of joins, which are important too.
@WahranRai
5 years ago
14:27 Assigning working variables and writing one instruction per line is useful for debugging and makes the code more readable!
@calefalejandrorodriguezcue3754
7 years ago
Hi Roger. Thanks for this video. I have a data frame in R that has several variables (at least three). I would like to make a pivot table that shows subtotals for each of the variables. I've achieved this with only 2 variables but, unfortunately, when I add a third or a fourth variable, its subtotal is not added under its parent variable. Do you know how to do this in R? I've also tried pandas pivot_table, but I got the same result. Please help :'(
@kevinmaeir1612
6 years ago
Hey, I have a table with 4 columns. Two of them are lists of different dates and the others are numbers. I want to compare the date columns and get a new table with just the numbers for matching dates. Can you help me? Thanks
@yousfoss4367
4 years ago
Thanks, great prof
@c.deg.7982
5 years ago
For some reason I cannot get tally() or count() to work inside the summarize() function for a dataset grouped by a categorical variable...
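A minimal sketch of one way around the problem described above, assuming dplyr is installed. In dplyr, tally() and count() are themselves summarising verbs that take a data frame, so they are normally called on their own; inside summarise() the row count comes from n(). The data frame and its column names (species, mass) are hypothetical.

```r
library(dplyr)

# Hypothetical data: one categorical column and one numeric column
df <- data.frame(
  species = c("a", "a", "b", "c", "c", "c"),
  mass    = c(1.2, 0.9, 3.4, 2.2, 2.5, 2.1),
  stringsAsFactors = FALSE
)

# Inside summarise(), use n() to count the rows in each group
counts_summarise <- df %>%
  group_by(species) %>%
  summarise(n_obs = n(), mean_mass = mean(mass))

# count() does the grouping and counting in one step, outside summarise()
counts_count <- df %>% count(species)

print(counts_summarise)
print(counts_count)
```

Both results hold one row per category; the first can carry other summaries next to the count.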
@tuanlong9238
6 years ago
My god, it looks like he uses the original R version, super =)))
@MrAlivallo
5 years ago
So the hardest part of getting started with 'dplyr' is getting the data wrangled to match for manipulation. How do I do this inside {r}? If I do this in PowerBI it is all drag/drop/click. Why doesn't this exist for RStudio?
@AllenMartin-hp5yf
1 year ago
What/where is the website you downloaded "chicago" from?
@linussunil83
8 years ago
Can someone explain to me the step where he mutates the tempcat column in the data frame? I don't understand the arguments used for factor: factor(1*(tmpd
@rohanshingade7228
8 years ago
It is 1 multiplied by (tmpd < 80). If we simply type (tmpd < 80) we get a logical vector. But if we multiply it by 1 we get a numeric vector.
@linussunil83
8 years ago
Thanks buddy
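The explanation in this thread can be sketched in base R as follows. The temperature values and the labels "hot" and "cold" are assumptions for illustration, and the comparison uses tmpd < 80 as written in the reply.

```r
# Hypothetical temperatures, not taken from the chicago data
tmpd <- c(75, 82, 79, 91)

below_80  <- tmpd < 80        # logical vector: TRUE FALSE TRUE FALSE
as_number <- 1 * below_80     # multiplying by 1 coerces it to numeric: 1 0 1 0

# factor() turns the 0/1 codes into labelled categories
# (0 = not below 80, 1 = below 80; the label names are assumed)
tempcat <- factor(as_number, levels = c(0, 1), labels = c("hot", "cold"))

print(class(below_80))
print(class(as_number))
print(tempcat)
```

Setting levels explicitly keeps the mapping between codes and labels visible, instead of relying on the sorted order of the values.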
@kevintan6484
8 years ago
Hello everyone, I am a complete beginner in R. I could not even import the chicago.rds file correctly: I clicked Import Data on the right-hand side, selected the file, and it came out as garbled text. So I imported my own data set (named data1) from a txt file and tried to follow the steps in the video. Only a few of them succeeded, so please help me out. I have checked many times that I have downloaded the "dplyr" package, and I even tried reinstalling R and RStudio. My R version is 3.2.4.

data1 looks like this:
V1             V2   V3            V4
Product Names  Qty  Numeric No.1  Numeric No.2

1. head(select(data1, V1:V3))
returns: Error in head(select(data1, V1)) : could not find function "select"

2. data1.f = filter(data1, V4 > 50)
returns: Error in filter(data1, V4 > 50) : object 'V4' not found
Then I tried: data1.f = filter(data1, "V4" > 50)
It worked, but when I View data1.f, there are still numbers smaller than 50 in V4.
Then I tried: data1.f = filter(data1, data1$V4 > 50)
When I View it, everything in the frame is shown as "N/A".

3. Rename: data.1 = rename(data.1, V1 = Productnames, V2 = Qty)
returns: Error in rename(data.1, V1 = Productnames, V2 = Qty) : unused arguments (V1 = Productnames, V2 = Qty)

4. Group_by: goodbad = group_by(data1, tempcat)
returns: Error: could not find function "group

I would really appreciate your help in getting me out of the woods!!
@lobbielobbie1766
8 years ago
Hey Kelvin,
It is quite difficult to diagnose this from the error messages alone, without the dataset and a reproducible example. Here's a code sample which you can try. I am using RStudio, and you can find a good dplyr cheat sheet at www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf. If you are worried or confused by the %>% pipe in the code, it just means 'passing the result of one statement to the next' in layman's terms. In addition, downloading a package only makes it available to be used. To use any package in your code, you need to load it using library() as shown.

# import libraries
library(dplyr)

# create a data frame with named columns
set.seed(888)
MyDF <- data.frame(...)

# filter SalesAmount > 50
MyFilter <- MyDF %>% filter(SalesAmount > 50)
View(MyFilter)

# create a new sales commission variable using 1% of SalesAmount
MySales <- MyDF %>% mutate(MyCommission = 0.01 * SalesAmount)
View(MySales)

# sum totals by SalesID
MySummary <- MySales %>% group_by(SalesID) %>% summarise(NumbOfSales = n(), TotalSales = sum(SalesAmount), TotalCommission = sum(MyCommission))
View(MySummary)

# sum sales amount by LocationID
MyLocationSales <- MyDF %>% group_by(LocationID) %>% summarise(LocationSalesTotal = sum(SalesAmount))
View(MyLocationSales)

HTH,
Lobbie
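The data frame definition in the reply above did not survive on this page, so here is a runnable sketch of the same steps with a stand-in. Only the column names SalesID, LocationID and SalesAmount come from the reply; the row count, the value ranges and the location names are assumptions. View() is replaced by print() so the script also runs outside RStudio.

```r
library(dplyr)

# Assumed stand-in for the lost data frame definition
set.seed(888)
MyDF <- data.frame(
  SalesID     = sample(1:5, 20, replace = TRUE),
  LocationID  = sample(c("North", "South"), 20, replace = TRUE),
  SalesAmount = round(runif(20, min = 10, max = 100), 2),
  stringsAsFactors = FALSE
)

# Keep only rows with SalesAmount above 50
MyFilter <- MyDF %>% filter(SalesAmount > 50)

# Add a commission column equal to 1% of SalesAmount
MySales <- MyDF %>% mutate(MyCommission = 0.01 * SalesAmount)

# Totals per SalesID
MySummary <- MySales %>%
  group_by(SalesID) %>%
  summarise(NumbOfSales     = n(),
            TotalSales      = sum(SalesAmount),
            TotalCommission = sum(MyCommission))

# Sales total per LocationID
MyLocationSales <- MyDF %>%
  group_by(LocationID) %>%
  summarise(LocationSalesTotal = sum(SalesAmount))

print(MySummary)
print(MyLocationSales)
```

The column names are written bare inside filter(), mutate() and summarise(); a quoted name such as "SalesAmount" > 50 would compare a text string with a number instead of using the column.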
@carriballa
9 years ago
Thanks Roger, where can I get the data set from? I tried looking for it.
@claveralvaro6245
5 years ago
You can even do it from Excel; just make sure you have the right kind of variables to work with. Also look for the package you need to load the data: for the xlsx format (Excel file), the package is called "readxl". But if you are, like, too lazy or something, there are some ready-made data sets to work with, like "iris" or "crabs" (the latter comes with the MASS package). Just put one into a variable as a data frame, print it and KAPOO YAH!
@claudiuskerth9497
9 years ago
Where can chicago.rds be downloaded from? It isn't the same dataset as the one in the gamair package. Many thanks.
@michelemelchiori7628
9 years ago
github.com/DataScienceSpecialization/courses/blob/master/03_GettingData/dplyr/chicago.rds then click on the "Raw" button
@ghtyu99
6 years ago
I have tried several times to download this dataset from GitHub using the link above, and I receive an error message (see below) whether or not I use the "View Raw" button. I am running R for Mac OS: R 3.3.3 GUI 1.69 Mavericks build (7328). Does anyone have a workaround or correction? Thanks.
"Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘chicago.rds’ has magic number 'X'
Use of save versions prior to 2 is deprecated".
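One possible cause of the error quoted above, offered as a guess and not a diagnosis: that message is what load() reports when it is given an .rds file, because .rds files are written by saveRDS() and are meant to be read with readRDS(). The sketch below reproduces the situation with a small temporary file; the file contents and the object name chicago_demo are made up for illustration.

```r
# Write a tiny stand-in .rds file to a temporary location
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(city = "chic", tmpd = 31.5, stringsAsFactors = FALSE), path)

# load() expects a file written by save(), so it fails on an .rds file
load_result <- tryCatch({
  load(path)
  "loaded"
}, error = function(e) "load() failed")

# readRDS() is the matching reader and returns the stored object,
# which has to be assigned to a name
chicago_demo <- readRDS(path)

print(load_result)
print(chicago_demo)
```

For the real file, the equivalent call would be chicago <- readRDS("chicago.rds"), run from the folder that holds the download.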
@MultiHunter36
5 years ago
Why am I not able to use the select function? Error in select(chicago, city:dptp) : could not find function "select"
@rrmaximiliano
5 years ago
Maybe you didn't load the dplyr package. Use library(dplyr)
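A short sketch of the fix suggested above, assuming dplyr is already installed. The data frame is a made-up stand-in that borrows only the column names city, tmpd and dptp from the error message; the date column and all values are hypothetical.

```r
# install.packages("dplyr")   # one-time download, only if not yet installed
library(dplyr)                 # must be run in every new R session

# Hypothetical stand-in for the chicago data frame
chicago_demo <- data.frame(
  city = c("chic", "chic", "chic"),
  tmpd = c(31.5, 33.0, 33.0),
  dptp = c(31.5, 29.9, 27.4),
  date = as.Date(c("1987-01-01", "1987-01-02", "1987-01-03")),
  stringsAsFactors = FALSE
)

# With dplyr loaded, select() is found and city:dptp picks a column range
subset_cols <- select(chicago_demo, city:dptp)
print(names(subset_cols))
```

If another loaded package also defines select() (MASS does), writing dplyr::select(...) makes the call unambiguous.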
@mikebosko9077
9 years ago
I'm new to R, what is meant by 'making sure all the factors are annotated'? I understand factors, but annotated how? Thanks much! -Mike
@mdev1187
9 years ago
@3:14 it's the *levels* of any factors present (there aren't any in the chicago data.frame), so you can control if and when levels are kept or dropped. Usually I'd want to retain the levels of an *ordered* factor (like a Year), but not unordered ones (like City). If data is missing for a Year (derived from the date variable) in one City, I wouldn't want to lose that Year as a level, so I'd make Year an ordered factor before filtering. If City were a factor I probably wouldn't want to retain every level after filtering, so it's best left as a character variable so the issue doesn't arise.
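The point about levels being kept or dropped can be illustrated with a small base R sketch. The data frame, its column names (city, year) and its values are hypothetical and only serve to show the behaviour.

```r
# Hypothetical data with an unordered factor (city) and an ordered factor (year)
df <- data.frame(
  city = factor(c("chic", "ny", "la", "chic")),
  year = factor(c(2001, 2002, 2003, 2003), levels = 2001:2003, ordered = TRUE)
)

# Filtering rows keeps all original levels, even those with no rows left
kept <- df[df$city == "chic", ]
levels_after_filter <- levels(kept$city)

# droplevels() removes unused levels; here it is applied to city only,
# so the ordered year factor keeps its full set of levels
kept$city <- droplevels(kept$city)
levels_after_drop <- levels(kept$city)
year_levels <- levels(kept$year)

print(levels_after_filter)
print(levels_after_drop)
print(year_levels)
```

Applying droplevels() column by column is one way to make the keep-or-drop decision explicit for each factor.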
@jdlopez131
5 years ago
Isn't sqldf package a lot better than dplyr? I mean sql commands :) need I say more?
@kunalbali810
9 years ago
I have two data frames, for example:

latitude longitude values
20 11 3.5
20 12 1.5
20 13 4.5
20 14 4
21 11 1.2
21 12 1.4
21 13 1.4
21 14 1.8

and

latitude longitude values
20 11 3
20 12 1
20 13 4
20 14 4
21 11 1
21 12 1
21 13 1.4
21 14 1.2

Now I need to get a result like:

20 11 3.32
20 12 1.25
20 13 4.25
20 14 4
21 11 1.1
21 12 1.2
21 13 1.4
21 14 1.5

As you can see, I just took the mean of the third column row by row. How can I do that? I am dealing with atmospheric data, so I need to do this. Please tell me how.
@sushantchoudhary6393
9 years ago
you could just say dataframe3$values = dataframe1$values + dataframe2$values. How you got 3.32 there in the third table though is ... it's not the mean of 3 and 3.5, just so we're on the same page.
@sushantchoudhary6393
9 years ago
Sorry forgot to divide by 2. dataframe3$values = dataframe3$values/2
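The two replies above can be combined into one sketch in base R, using the numbers from the question. Adding the columns directly assumes both data frames list the same latitude/longitude pairs in the same order; the version below uses merge() on the coordinates instead, which is a swapped-in alternative and not what the replies wrote. The names df1, df2, merged and mean_values are made up.

```r
# The two data frames from the question
df1 <- data.frame(
  latitude  = rep(c(20, 21), each = 4),
  longitude = rep(11:14, times = 2),
  values    = c(3.5, 1.5, 4.5, 4, 1.2, 1.4, 1.4, 1.8)
)
df2 <- data.frame(
  latitude  = rep(c(20, 21), each = 4),
  longitude = rep(11:14, times = 2),
  values    = c(3, 1, 4, 4, 1, 1, 1.4, 1.2)
)

# Match rows on the coordinates, then average the two value columns
merged <- merge(df1, df2, by = c("latitude", "longitude"),
                suffixes = c("_1", "_2"))
merged$mean_values <- (merged$values_1 + merged$values_2) / 2

print(merged[, c("latitude", "longitude", "mean_values")])
```

For the first row this gives (3.5 + 3) / 2 = 3.25, which matches the remark above that 3.32 is not the mean.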
@kunalbali810
9 years ago
Sushant Choudhary Do you know how to plot standard errors or a standard bar plot in a time series graph?
@sushantchoudhary6393
9 years ago
Yes, I do. To say any more than that, I would need a more precise question, though.
What version of R is Dr. Peng using here? I have downloaded R version 3.2.1 (2015-06-18). But, unfortunately, I cannot use the "chicago.rds" package; the error message says it is not available (for R version 3.2.1). Are there any workarounds for this? Or would I need to uninstall my current version of R and find the older version in order to install/load this package? Thank you! I'm new to programming in R, so any help would be greatly appreciated!
Comments: 46