-
Notifications
You must be signed in to change notification settings - Fork 1k
feat(7614): added redirection of data.frame at dcast and melt to dcast.data.table and melt.data.table #7634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
No obvious timing issues in HEAD=fix/dcast_melt_for_data.frame Generated via commit 3cfe6a0 Download link for the artifact containing the test results: ↓ atime-results.zip
|
R/fcast.R
Outdated
| dcast.data.frame = function(data, ...) { | ||
| if (!is.data.frame(data)) stopf("'%s' must be a data.frame", "data") | ||
| data = as.data.table(data) | ||
| dcast.data.table(data, ...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just remove the error in dcast.data.table. Don't coerce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it
R/fmelt.R
Outdated
| ans | ||
| } | ||
|
|
||
| melt.data.frame = function(data, ...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this constitutes a breaking change for packages/scripts using both {data.table} and {reshape2} (e.g. {WebAnalytics} or {BTSPAS}), which might result in the wrong S3 method being invoked.
Please investigate the interaction of your update with {reshape2}, e.g. the load order.
I think the solution is not to have a data.frame method, but to fake one in the dcast() generic like:
dcast = function(x, ...) {
if (!is.data.table(x) && is.data.frame(x)) return(dcast.data.table(x, ...))
UseMethod("dcast")
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have redirected data.frame to the dcast.data.table
currently fixing a few errors
Running test id 2365.2 Test 2365.2 produced 1 errors but expected 0
Expected:
Observed: argument "formula" is missing, with no default
Test 2365.2 produced 1 messages but expected 0
Expected:
Observed: Using 'v' as value column. Use 'value.var' to override
Will update the PR once this is fixed
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7634 +/- ##
=======================================
Coverage 99.02% 99.02%
=======================================
Files 87 87
Lines 16890 16896 +6
=======================================
+ Hits 16726 16732 +6
Misses 164 164 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Hii @MichaelChirico without coercing it is failing the following test error: object of type 'symbol' is not subsettable Will need to test out the point of crash (using gdb) - will check it out tomorrow ^^ |
… and melt.data.table
|
Found the issue The way we discussed to use the following It was failing at the following lines This is using match call hence m contains data, formula etc meaning m[["subset"]] = subset The soln is to use match call at the dcast function and find the user args and forward it and redirect it to dcast.data.table Although this crash only occured in dcast and a return worked for melt but I have still changed both but I am not sure which will be more optimal perf wise Let me know if anything needs to be changed or anything can be optimized in this case Thanks |
inst/tests/tests.Rraw
Outdated
| test(1953.4, melt.data.table(DT, id.vars = 'id', measure.vars = 'a'), | ||
| error = "must be a data.table") | ||
| expected = data.table(id = rep(DT$id, 2), variable = factor(rep(c("a1", "a2"), each = 3)), value = c(DT$a1, DT$a2)) | ||
| test(1953.4, melt.data.table(DT, id.vars = "id", measure.vars = c("a1", "a2")), expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can avoid calling private functions, that's great. Does this test still work with just using melt(DT, ...)?
This test is labeled as being for "coverage", for which purpose we sometimes break the public API, but now that it's possible to send a data.frame through to melt.data.table(), ideally we can stick to the public API even in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it does, will change it to melt
inst/tests/tests.Rraw
Outdated
| message = 'Using.*value column.*override') | ||
| setDF(DT) | ||
| test(1962.085, dcast.data.table(DT), error = 'must be a data.table') | ||
| test(1962.085, guess(DT), 'V', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this changes the nature of the test a fair amount. Why just just test agains the output dcast(DT) for some trivial formula?
Or perhaps, we can just drop this test and mark it with a comment "removed test of error case for dcast() passed a data.frame for #7614"
inst/tests/tests.Rraw
Outdated
| # test for data.frame reshape for dcast | ||
| df_dcast = data.frame(a = c("x", "y"), b = 1:2, v = 3:4) | ||
| dt_dcast = data.table(a = c("x", "y"), b = 1:2, v = 3:4) | ||
| test(2365.2, dcast(df_dcast, a ~ b, value.var = "v"), dcast(dt_dcast, a ~ b, value.var = "v")) No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
restore trailing newline
R/fcast.R
Outdated
| eval(mc, parent.frame()) | ||
| } | ||
| else { | ||
| UseMethod("dcast", data) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| eval(mc, parent.frame()) | |
| } | |
| else { | |
| UseMethod("dcast", data) | |
| } | |
| return(eval(mc, parent.frame())) | |
| } | |
| UseMethod("dcast", data) |
R/fmelt.R
Outdated
| eval(mc, parent.frame()) | ||
| } | ||
| else { | ||
| UseMethod("melt", data) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| eval(mc, parent.frame()) | |
| } | |
| else { | |
| UseMethod("melt", data) | |
| } | |
| return(eval(mc, parent.frame())) | |
| } | |
| UseMethod("melt", data) |
| UseMethod("dcast", data) | ||
| if (!is.data.table(data) && is.data.frame(data)){ | ||
| mc <- match.call() | ||
| mc[[1L]] <- as.name("dcast.data.table") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm this works outside the test suite?
R CMD INSTALL data.table
Rscript -e "library(data.table); dcast(mtcars, cyl ~ gear, value.var='wt', fun.aggregate=mean)"I am not sure without running it myself whether the frame in which mc is evaluated will be able to see dcast.data.table given that it's not exported.
In the test suite, we define dcast.data.table = data.table:::dcast.data.table, so the test passing is not enough to resolve my apprehension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
> dcast(mtcars, cyl ~ gear, value.var='wt', fun.aggregate=mean)
Key: <cyl>
cyl 3 4 5
<num> <num> <num> <num>
1: 4 2.465000 2.378125 1.8265
2: 6 3.337500 3.093750 2.7700
3: 8 4.104083 NaN 3.3700
This is the output for dcast of mtcars
gives the same output on converting it to data.table
> a = as.data.table(mtcars)
> dcast(a, cyl ~ gear, value.var='wt', fun.aggregate=mean)
Key: <cyl>
cyl 3 4 5
<num> <num> <num> <num>
1: 4 2.465000 2.378125 1.8265
2: 6 3.337500 3.093750 2.7700
3: 8 4.104083 NaN 3.3700
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had also printed the m values while debugging (line 170 of fcast)
and the results for data.table and data.frame were same after these changes and were as expected
eg: m["data"] was giving dt_dcast which was the user arg passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
> dcast(mtcars, cyl ~ gear, value.var='wt', fun.aggregate=mean)
$data
mtcars
$formula
cyl ~ gear
$fun.aggregate
mean
$value.var
[1] "wt"
Key: <cyl>
cyl 3 4 5
<num> <num> <num> <num>
1: 4 2.465000 2.378125 1.8265
2: 6 3.337500 3.093750 2.7700
3: 8 4.104083 NaN 3.3700
>
when printing m values
|
Updated with the suggested changes ^^ |

Closes #7614
I added tests for data.frame to check if dcast and melt work directly but got the following error
Hence redirected data.frame to dcast.data.table directly similarly for melt