From d1b7af0c2e097345f9a42acefe9aa6c85d2dfb18 Mon Sep 17 00:00:00 2001 From: Tyson Barrett Date: Wed, 24 Dec 2025 12:51:53 -0700 Subject: [PATCH 01/25] 1.18.0 on CRAN. Bump to 1.18.99 --- DESCRIPTION | 2 +- NEWS.md | 2 +- po/R-data.table.pot | 298 ++++++++++++++-------------- po/data.table.pot | 468 +++++++++++++++++++------------------------- src/init.c | 2 +- 5 files changed, 350 insertions(+), 422 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index 966865afe3..ab359df087 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,5 +1,5 @@ Package: data.table -Version: 1.17.99 +Version: 1.18.0 Title: Extension of `data.frame` Depends: R (>= 3.4.0) Imports: methods diff --git a/NEWS.md b/NEWS.md index 1c8540a9cc..aed1752d87 100644 --- a/NEWS.md +++ b/NEWS.md @@ -2,7 +2,7 @@ **If you are viewing this file on CRAN, please check [latest news on GitHub](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) where the formatting is also better.** -## data.table [v1.17.99](https://github.com/Rdatatable/data.table/milestone/35) (in development) +## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025 ### BREAKING CHANGE diff --git a/po/R-data.table.pot b/po/R-data.table.pot index a27ccad521..960681dd7c 100644 --- a/po/R-data.table.pot +++ b/po/R-data.table.pot @@ -1,7 +1,7 @@ msgid "" msgstr "" "Project-Id-Version: data.table 1.17.99\n" -"POT-Creation-Date: 2025-12-13 17:01+0000\n" +"POT-Creation-Date: 2025-12-23 12:03-0700\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" @@ -65,30 +65,30 @@ msgstr "" msgid "Please provide either 'key' or 'sorted', but not both." msgstr "" -#: as.data.table.R:111 +#: as.data.table.R:112 #, c-format msgid "" "Argument 'value.name' should not overlap with column names in result: %s" msgstr "" -#: as.data.table.R:161 +#: as.data.table.R:162 #, c-format msgid "" "POSIXlt column type detected and converted to POSIXct. 
We do not recommend " "use of POSIXlt at all because it uses 40 bytes to store one date." msgstr "" -#: as.data.table.R:206 +#: as.data.table.R:207 #, c-format msgid "Item %d has %d rows but longest item has %d; recycled with remainder." msgstr "" -#: as.data.table.R:221 +#: as.data.table.R:222 #, c-format msgid "A column may not be called .SD. That has special meaning." msgstr "" -#: as.data.table.R:247 +#: as.data.table.R:248 #, c-format msgid "class must be length 1" msgstr "" @@ -833,13 +833,13 @@ msgid "" "'numeric')" msgstr "" -#: data.table.R:1209 +#: data.table.R:1210 msgid "" "No rows match i. No new columns to add so not evaluating RHS of :=\n" "Assigning to 0 row subset of %d rows" msgstr "" -#: data.table.R:1225 +#: data.table.R:1226 #, c-format msgid "" "A shallow copy of this data.table was taken so that := can add or remove %d " @@ -853,23 +853,7 @@ msgid "" "improved." msgstr "" -#: data.table.R:1285 -#, c-format -msgid "" -"Variable '%s' is not found in calling scope. Looking in calling scope " -"because this symbol was prefixed with .. in the j= parameter." -msgstr "" - -#: data.table.R:1358 -#, c-format -msgid "" -"j (the 2nd argument inside [...]) is a single symbol but column name '%1$s' " -"is not found. If you intended to select columns using a variable in calling " -"scope, please try DT[, ..%1$s]. The .. prefix conveys one-level-up similar " -"to a file system path." -msgstr "" - -#: data.table.R:1408 +#: data.table.R:1246 msgid "" "Growing vector of column pointers from truelength %d to %d. A shallow copy " "has been taken, see ?setalloccol. Only a potential issue if two variables " @@ -879,77 +863,93 @@ msgid "" "'datatable.alloccol' option." msgstr "" -#: data.table.R:1410 +#: data.table.R:1248 msgid "" "Note that the shallow copy will assign to the environment from which := was " "called. That means for example that if := was called within a function, the " "original table may be unaffected." 
msgstr "" -#: data.table.R:1495 +#: data.table.R:1326 +#, c-format +msgid "" +"Variable '%s' is not found in calling scope. Looking in calling scope " +"because this symbol was prefixed with .. in the j= parameter." +msgstr "" + +#: data.table.R:1399 +#, c-format +msgid "" +"j (the 2nd argument inside [...]) is a single symbol but column name '%1$s' " +"is not found. If you intended to select columns using a variable in calling " +"scope, please try DT[, ..%1$s]. The .. prefix conveys one-level-up similar " +"to a file system path." +msgstr "" + +#: data.table.R:1512 #, c-format msgid "" "The column '.N' can't be grouped because it conflicts with the special .N " "variable. Try setnames(DT,'.N','N') first." msgstr "" -#: data.table.R:1496 +#: data.table.R:1513 #, c-format msgid "" "The column '.I' can't be grouped because it conflicts with the special .I " "variable. Try setnames(DT,'.I','I') first." msgstr "" -#: data.table.R:1515 +#: data.table.R:1532 msgid "" "Note: forcing units=\"secs\" on implicit difftime by group; call difftime " "explicitly to choose custom units" msgstr "" -#: data.table.R:1524 +#: data.table.R:1541 #, c-format msgid "logical error. i is not a data.table, but mult='all' and 'by'=.EACHI" msgstr "" -#: data.table.R:1551 +#: data.table.R:1568 msgid "Finding groups using forderv ..." msgstr "" -#: data.table.R:1565 data.table.R:1597 +#: data.table.R:1582 data.table.R:1614 msgid "Finding group sizes from the positions (can be avoided to save RAM) ..." msgstr "" -#: data.table.R:1573 +#: data.table.R:1590 msgid "Getting back original order ..." msgstr "" -#: data.table.R:1585 +#: data.table.R:1602 msgid "Finding groups using uniqlist on key ..." msgstr "" -#: data.table.R:1589 +#: data.table.R:1606 msgid "Finding groups using uniqlist on index '%s' ..." 
msgstr "" -#: data.table.R:1808 +#: data.table.R:1825 msgid "lapply optimization changed j from '%s' to '%s'" msgstr "" -#: data.table.R:1810 +#: data.table.R:1827 msgid "lapply optimization is on, j unchanged as '%s'" msgstr "" -#: data.table.R:1819 data.table.R:1843 +#: data.table.R:1836 data.table.R:1860 msgid "GForce optimized j to '%s' (see ?GForce)" msgstr "" -#: data.table.R:1844 +#: data.table.R:1861 msgid "" "GForce is on, but not activated for this query; left j unchanged (see ?" "GForce)" msgstr "" -#: data.table.R:1863 +#: data.table.R:1880 #, c-format msgid "" "Unable to optimize call to mean() and could be very slow. You must name 'na." @@ -957,31 +957,31 @@ msgid "" "'trim' which is the 2nd argument of mean. 'trim' is not yet optimized." msgstr "" -#: data.table.R:1867 +#: data.table.R:1884 msgid "Old mean optimization changed j from '%s' to '%s'" msgstr "" -#: data.table.R:1869 +#: data.table.R:1886 msgid "Old mean optimization is on, left j unchanged." msgstr "" -#: data.table.R:1879 +#: data.table.R:1896 msgid "All optimizations are turned off" msgstr "" -#: data.table.R:1880 +#: data.table.R:1897 msgid "Optimization is on but left j unchanged (single plain symbol): '%s'" msgstr "" -#: data.table.R:1909 +#: data.table.R:1926 msgid "Making each group and running j (GForce %s) ..." msgstr "" -#: data.table.R:2005 +#: data.table.R:2022 msgid "setkey() after the := with keyby= ..." msgstr "" -#: data.table.R:2009 +#: data.table.R:2026 #, c-format msgid "" "The setkey() normally performed by keyby= has been skipped (as if by= was " @@ -990,81 +990,81 @@ msgid "" "existing column names to keyby=." msgstr "" -#: data.table.R:2035 +#: data.table.R:2052 msgid "setkey() afterwards for keyby=.EACHI ..." msgstr "" -#: data.table.R:2144 +#: data.table.R:2161 #, c-format msgid "rownames and rownames.value cannot both be used at the same time" msgstr "" -#: data.table.R:2147 +#: data.table.R:2164 #, c-format msgid "" "length(rownames)==%d but nrow(DT)==%d. 
The rownames argument specifies a " "single column name or number. Consider rownames.value= instead." msgstr "" -#: data.table.R:2151 +#: data.table.R:2168 #, c-format msgid "" "length(rownames)==0 but should be a single column name or number, or NULL" msgstr "" -#: data.table.R:2155 +#: data.table.R:2172 #, c-format msgid "" "rownames is TRUE but key has multiple columns %s; taking first column x[,1] " "as rownames" msgstr "" -#: data.table.R:2165 +#: data.table.R:2182 #, c-format msgid "'%s' is not a column of x" msgstr "" -#: data.table.R:2171 +#: data.table.R:2188 #, c-format msgid "" "as.integer(rownames)==%d which is outside the column number range [1,ncol=" "%d]." msgstr "" -#: data.table.R:2176 +#: data.table.R:2193 #, c-format msgid "length(rownames.value)==%d but should be nrow(x)==%d" msgstr "" -#: data.table.R:2272 +#: data.table.R:2289 #, c-format msgid "" "When i is a matrix in DT[i]<-value syntax, it doesn't make sense to provide j" msgstr "" -#: data.table.R:2282 +#: data.table.R:2299 #, c-format msgid "j must be an atomic vector, see ?is.atomic" msgstr "" -#: data.table.R:2283 +#: data.table.R:2300 #, c-format msgid "NA in j" msgstr "" -#: data.table.R:2289 +#: data.table.R:2306 #, c-format msgid "j must be vector of column name or positions" msgstr "" -#: data.table.R:2290 +#: data.table.R:2307 #, c-format msgid "" "Attempt to assign to column position greater than ncol(x). Create the column " "by name, instead. This logic intends to catch (most likely) user errors." msgstr "" -#: data.table.R:2357 +#: data.table.R:2374 #, c-format msgid "" "data.table inherits from data.frame (from v1.5), but this data.table does " @@ -1072,87 +1072,87 @@ msgid "" "'data.table') or saved to disk using a prior version of data.table?" 
msgstr "" -#: data.table.R:2366 +#: data.table.R:2383 #, c-format msgid "attempting to assign invalid object to dimnames of a data.table" msgstr "" -#: data.table.R:2367 +#: data.table.R:2384 #, c-format msgid "data.tables do not have rownames" msgstr "" -#: data.table.R:2368 data.table.R:2752 +#: data.table.R:2385 data.table.R:2769 #, c-format msgid "Can't assign %d names to a %d-column data.table" msgstr "" -#: data.table.R:2432 +#: data.table.R:2449 #, c-format msgid "'subset' must evaluate to logical" msgstr "" -#: data.table.R:2475 +#: data.table.R:2492 #, c-format msgid "Argument 'invert' must be logical TRUE/FALSE" msgstr "" -#: data.table.R:2521 +#: data.table.R:2538 #, c-format msgid "group length is 0 but data nrow > 0" msgstr "" -#: data.table.R:2523 +#: data.table.R:2540 #, c-format msgid "" "passing 'f' argument together with 'by' is not allowed, use 'by' when split " "by column in data.table and 'f' when split by external factor" msgstr "" -#: data.table.R:2531 +#: data.table.R:2548 #, c-format msgid "Either 'by' or 'f' argument must be supplied" msgstr "" -#: data.table.R:2533 +#: data.table.R:2550 #, c-format msgid "Column '.ll.tech.split' is reserved for split.data.table processing" msgstr "" -#: data.table.R:2534 +#: data.table.R:2551 #, c-format msgid "Column '.nm.tech.split' is reserved for split.data.table processing" msgstr "" -#: data.table.R:2535 +#: data.table.R:2552 #, c-format msgid "Argument 'by' must refer to column names in x" msgstr "" -#: data.table.R:2536 +#: data.table.R:2553 #, c-format msgid "" "Argument 'by' must refer only to atomic-type columns, but the following " "columns are non-atomic: %s" msgstr "" -#: data.table.R:2583 +#: data.table.R:2600 msgid "Processing split.data.table with: %s" msgstr "" -#: data.table.R:2683 +#: data.table.R:2700 #, c-format msgid "" "x is not a data.table|frame. 
Shallow copy is a copy of the vector of column " "pointers (only), so is only meaningful for data.table|frame" msgstr "" -#: data.table.R:2692 +#: data.table.R:2709 #, c-format msgid "setalloccol attempting to modify `*tmp*`" msgstr "" -#: data.table.R:2727 +#: data.table.R:2744 #, c-format msgid "" "Input is a length=1 logical that points to the same address as R's global " @@ -1160,52 +1160,52 @@ msgid "" "copy. You will need to assign the result back to a variable. See issue #1281." msgstr "" -#: data.table.R:2742 +#: data.table.R:2759 #, c-format msgid "x is not a data.table or data.frame" msgstr "" -#: data.table.R:2744 +#: data.table.R:2761 #, c-format msgid "x has %d columns but its names are length %d" msgstr "" -#: data.table.R:2751 +#: data.table.R:2768 #, c-format msgid "Passed a vector of type '%s'. Needs to be type 'character'." msgstr "" -#: data.table.R:2764 +#: data.table.R:2781 #, c-format msgid "'new' is not a character vector or a function" msgstr "" -#: data.table.R:2766 +#: data.table.R:2783 #, c-format msgid "NA in 'new' at positions %s" msgstr "" -#: data.table.R:2767 +#: data.table.R:2784 #, c-format msgid "Some duplicates exist in 'old': %s" msgstr "" -#: data.table.R:2769 +#: data.table.R:2786 #, c-format msgid "'old' is type %s but should be integer, double or character" msgstr "" -#: data.table.R:2770 +#: data.table.R:2787 #, c-format msgid "'old' is length %d but 'new' is length %d" msgstr "" -#: data.table.R:2771 +#: data.table.R:2788 #, c-format msgid "NA (or out of bounds) in 'old' at positions %s" msgstr "" -#: data.table.R:2774 +#: data.table.R:2791 #, c-format msgid "" "Item %d of 'old' is '%s' which appears several times in column names. Just " @@ -1213,40 +1213,40 @@ msgid "" "duplicated in column names." msgstr "" -#: data.table.R:2782 +#: data.table.R:2799 #, c-format msgid "" "Items of 'old' not found in column names: %s. Consider skip_absent=TRUE." 
msgstr "" -#: data.table.R:2823 +#: data.table.R:2840 #, c-format msgid "Provide either before= or after= but not both" msgstr "" -#: data.table.R:2825 +#: data.table.R:2842 #, c-format msgid "before=/after= accept a single column name or number, not more than one" msgstr "" -#: data.table.R:2882 +#: data.table.R:2910 #, c-format msgid "Input is %s but should be a plain list of items to be stacked" msgstr "" -#: data.table.R:2886 +#: data.table.R:2914 #, c-format msgid "" "idcol must be a logical or character vector of length 1. If logical TRUE the " "id column will named '.id'." msgstr "" -#: data.table.R:2891 +#: data.table.R:2919 #, c-format msgid "use.names=NA invalid" msgstr "" -#: data.table.R:2893 +#: data.table.R:2921 #, c-format msgid "" "use.names='check' cannot be used explicitly because the value 'check' is new " @@ -1254,7 +1254,7 @@ msgid "" "behavior. See ?rbindlist." msgstr "" -#: data.table.R:2908 +#: data.table.R:2936 #, c-format msgid "" "Check that is.data.table(DT) == TRUE. Otherwise, `:=` is defined for use in " @@ -1265,33 +1265,33 @@ msgid "" "=` as the only statement in `j`." msgstr "" -#: data.table.R:2925 +#: data.table.R:2953 #, c-format msgid "" "setDF only accepts data.table, data.frame or list of equal length as input" msgstr "" -#: data.table.R:2926 +#: data.table.R:2954 #, c-format msgid "rownames contains duplicates" msgstr "" -#: data.table.R:2933 data.table.R:2944 data.table.R:2967 +#: data.table.R:2961 data.table.R:2972 data.table.R:2995 #, c-format msgid "rownames incorrect length; expected %d names, got %d" msgstr "" -#: data.table.R:2952 +#: data.table.R:2980 #, c-format msgid "All elements in argument 'x' to 'setDF' must be of same length" msgstr "" -#: data.table.R:2981 +#: data.table.R:3009 #, c-format msgid "Cannot find symbol %s" msgstr "" -#: data.table.R:2988 +#: data.table.R:3016 #, c-format msgid "" "Cannot convert '%1$s' to data.table by reference because binding is locked. 
" @@ -1301,92 +1301,92 @@ msgid "" "setDT again." msgstr "" -#: data.table.R:3042 +#: data.table.R:3070 #, c-format msgid "" "Argument 'x' to 'setDT' should be a 'list', 'data.frame' or 'data.table'" msgstr "" -#: data.table.R:3075 data.table.R:3100 +#: data.table.R:3103 data.table.R:3128 #, c-format msgid "'prefix' must be NULL or a character vector of length 1." msgstr "" -#: data.table.R:3078 data.table.R:3103 +#: data.table.R:3106 data.table.R:3131 #, c-format msgid "x is a single vector, non-NULL 'cols' doesn't make sense." msgstr "" -#: data.table.R:3082 data.table.R:3107 +#: data.table.R:3110 data.table.R:3135 #, c-format msgid "x is a list, 'cols' cannot be 0-length." msgstr "" -#: data.table.R:3262 +#: data.table.R:3290 #, c-format msgid "" "It looks like you re-used `:=` in argument %d a functional assignment " "call -- use `=` instead: %s(col1=val1, col2=val2, ...)" msgstr "" -#: data.table.R:3328 +#: data.table.R:3356 #, c-format msgid "" "RHS of %s is length %d which is not 1 or nrow (%d). For robustness, no " "recycling is allowed (other than of length 1 RHS). Consider %%in%% instead." msgstr "" -#: data.table.R:3360 +#: data.table.R:3388 msgid "" "Subsetting optimization disabled because the cross-product of RHS values " "exceeds 1e4, causing memory problems." msgstr "" -#: data.table.R:3378 +#: data.table.R:3406 msgid "Optimized subsetting with key %s" msgstr "" -#: data.table.R:3397 data.table.R:3409 +#: data.table.R:3425 data.table.R:3437 msgid "Optimized subsetting with index '%s'" msgstr "" -#: data.table.R:3404 +#: data.table.R:3432 msgid "Creating new index '%s'" msgstr "" -#: data.table.R:3405 +#: data.table.R:3433 msgid "Creating index %s done in ..." msgstr "" -#: data.table.R:3438 +#: data.table.R:3466 #, c-format msgid "" "'on' argument should be a named atomic vector of column names indicating " "which columns in 'i' should be joined with which columns in 'x'." 
msgstr "" -#: data.table.R:3479 +#: data.table.R:3507 #, c-format msgid "" "Found more than one operator in one 'on' statement: %s. Please specify a " "single operator." msgstr "" -#: data.table.R:3502 +#: data.table.R:3530 #, c-format msgid "" "'on' contains no column name: %s. Each 'on' clause must contain one or two " "column names." msgstr "" -#: data.table.R:3504 +#: data.table.R:3532 #, c-format msgid "" "'on' contains more than 2 column names: %s. Each 'on' clause must contain " "one or two column names." msgstr "" -#: data.table.R:3509 +#: data.table.R:3537 #, c-format msgid "Invalid join operators %s. Only allowed operators are %s." msgstr "" @@ -3485,7 +3485,7 @@ msgid "" "too, please restart R with LANGUAGE=en" msgstr "" -#: test.data.table.R:148 +#: test.data.table.R:149 msgid "" "***\n" "*** memtest=%d. This should be the first call in a fresh R_GC_MEM_GROW=0 R " @@ -3493,129 +3493,133 @@ msgid "" "***" msgstr "" -#: test.data.table.R:149 +#: test.data.table.R:150 #, c-format msgid "" "memtest intended for Linux. Step through data.table:::rss() to see what went " "wrong." msgstr "" -#: test.data.table.R:203 +#: test.data.table.R:204 #, c-format msgid "Attempt to subset to %d tests matching '%s' failed, running full suite." msgstr "" -#: test.data.table.R:208 +#: test.data.table.R:209 msgid "Running %d of %d tests matching '%s'" msgstr "" -#: test.data.table.R:278 +#: test.data.table.R:279 #, c-format msgid "Failed in %s after test %s before the next test() call in %s" msgstr "" -#: test.data.table.R:306 +#: test.data.table.R:308 #, c-format msgid "" "Tests succeeded, but non-test code caused warnings. Search %s for tests " "shown above." 
msgstr "" -#: test.data.table.R:316 +#: test.data.table.R:318 #, c-format msgid "Timings count mismatch: %d vs %d" msgstr "" -#: test.data.table.R:318 +#: test.data.table.R:320 msgid "10 longest running tests took %ds (%d%% of %ds)" msgstr "" -#: test.data.table.R:324 +#: test.data.table.R:326 msgid "10 largest RAM increases (MiB); see plot for cumulative effect (if any)" msgstr "" -#: test.data.table.R:334 +#: test.data.table.R:336 +msgid "Skipped %d tests for translated messages." +msgstr "" + +#: test.data.table.R:337 msgid "All %d tests (last %.8g) in %s completed ok in %s" msgstr "" -#: test.data.table.R:430 +#: test.data.table.R:434 msgid "Running test id %s" msgstr "" -#: test.data.table.R:448 +#: test.data.table.R:452 #, c-format msgid "" "Test %s is invalid: when error= is provided it does not make sense to pass y " "as well" msgstr "" -#: test.data.table.R:493 +#: test.data.table.R:506 msgid "Test id %s is not in increasing order" msgstr "" -#: test.data.table.R:510 +#: test.data.table.R:523 msgid "" "Test %s produced %d %ss but expected %d\n" "%s\n" "%s" msgstr "" -#: test.data.table.R:518 +#: test.data.table.R:531 msgid "" "Test %s didn't produce the correct %s:\n" "Expected: %s\n" "Observed: %s" msgstr "" -#: test.data.table.R:527 +#: test.data.table.R:540 msgid "Output captured before unexpected warning/error/message:" msgstr "" -#: test.data.table.R:538 +#: test.data.table.R:551 msgid "Test %s did not produce correct output:" msgstr "" -#: test.data.table.R:539 +#: test.data.table.R:552 msgid "Expected: <<%s>>" msgstr "" -#: test.data.table.R:540 test.data.table.R:552 +#: test.data.table.R:553 test.data.table.R:565 msgid "Observed: <<%s>>" msgstr "" -#: test.data.table.R:542 +#: test.data.table.R:555 msgid "Expected (raw): <<%s>>" msgstr "" -#: test.data.table.R:543 test.data.table.R:555 +#: test.data.table.R:556 test.data.table.R:568 msgid "Observed (raw): <<%s>>" msgstr "" -#: test.data.table.R:550 +#: test.data.table.R:563 msgid "Test %s produced 
output but should not have:" msgstr "" -#: test.data.table.R:551 +#: test.data.table.R:564 msgid "Expected absent (case insensitive): <<%s>>" msgstr "" -#: test.data.table.R:554 +#: test.data.table.R:567 msgid "Expected absent (raw): <<%s>>" msgstr "" -#: test.data.table.R:570 +#: test.data.table.R:583 msgid "Test %s ran without errors but selfrefok(%s) is FALSE" msgstr "" -#: test.data.table.R:595 +#: test.data.table.R:608 msgid "Test %s ran without errors but failed check that x equals y:" msgstr "" -#: test.data.table.R:600 +#: test.data.table.R:613 msgid "First %d of %d (type '%s'):" msgstr "" -#: test.data.table.R:605 +#: test.data.table.R:618 msgid "Non-ASCII string detected, raw representation:" msgstr "" @@ -3843,13 +3847,13 @@ msgid_plural "unsupported column types found in x or y: %s" msgstr[0] "" msgstr[1] "" -#: test.data.table.R:288 +#: test.data.table.R:290 msgid "%d error out of %d. Search %s for test number %s. Duration: %s." msgid_plural "%d errors out of %d. Search %s for test numbers %s. Duration: %s." msgstr[0] "" msgstr[1] "" -#: test.data.table.R:298 +#: test.data.table.R:300 msgid "Caught %d warning outside the test() calls:\n" msgid_plural "Caught %d warnings outside the test() calls:\n" msgstr[0] "" diff --git a/po/data.table.pot b/po/data.table.pot index 9553941a66..d4fb8b49e6 100644 --- a/po/data.table.pot +++ b/po/data.table.pot @@ -1,7 +1,7 @@ msgid "" msgstr "" "Project-Id-Version: data.table 1.17.99\n" -"POT-Creation-Date: 2025-12-13 17:01+0000\n" +"POT-Creation-Date: 2025-12-23 12:03-0700\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" @@ -98,7 +98,7 @@ msgstr "" msgid "getOption('datatable.alloccol')==%d. It must be >=0 and not NA." 
msgstr "" -#: assign.c:225 between.c:22 between.c:28 frollR.c:97 frollR.c:112 fsort.c:117 gsumm.c:351 gsumm.c:587 gsumm.c:733 gsumm.c:871 gsumm.c:1026 gsumm.c:1118 nafill.c:108 openmp-utils.c:97 uniqlist.c:358 utils.c:118 utils.c:120 +#: assign.c:225 between.c:22 between.c:28 frollR.c:97 frollR.c:112 fsort.c:117 gsumm.c:351 gsumm.c:587 gsumm.c:733 gsumm.c:871 gsumm.c:1026 gsumm.c:1118 nafill.c:108 openmp-utils.c:97 uniqlist.c:358 utils.c:122 utils.c:124 #, c-format msgid "%s must be TRUE or FALSE" msgstr "" @@ -152,11 +152,6 @@ msgstr "" msgid "Assigning to %d row subset of %d rows\n" msgstr "" -#: assign.c:326 -#, c-format -msgid "Added %d new column initialized with all-NA\n" -msgstr "" - #: assign.c:332 msgid "length(LHS)==0; no columns to delete or assign RHS to." msgstr "" @@ -398,171 +393,152 @@ msgid "" "character, NA in any type, or level numbers." msgstr "" -#: assign.c:813 -msgid "Unable to allocate working memory of %zu bytes to combine factor levels" -msgstr "" - -#: assign.c:840 +#: assign.c:822 #, c-format msgid "Coercing 'character' RHS to '%s' to match the type of target vector." msgstr "" -#: assign.c:842 +#: assign.c:824 #, c-format msgid "" "Coercing 'character' RHS to '%s' to match the type of column %d named '%s'." msgstr "" -#: assign.c:850 +#: assign.c:832 msgid "" "Cannot coerce 'list' RHS to 'integer64' to match the type of target vector." msgstr "" -#: assign.c:852 +#: assign.c:834 #, c-format msgid "" "Cannot coerce 'list' RHS to 'integer64' to match the type of column %d named " "'%s'." msgstr "" -#: assign.c:858 +#: assign.c:840 #, c-format msgid "Coercing 'list' RHS to '%s' to match the type of target vector." msgstr "" -#: assign.c:860 +#: assign.c:842 #, c-format msgid "Coercing 'list' RHS to '%s' to match the type of column %d named '%s'." 
msgstr "" -#: assign.c:868 +#: assign.c:850 #, c-format msgid "Zero-copy coerce when assigning '%s' to '%s' target vector.\n" msgstr "" -#: assign.c:870 +#: assign.c:852 #, c-format msgid "" "Zero-copy coerce when assigning '%s' to column %d named '%s' which is '%s'.\n" msgstr "" -#: assign.c:886 +#: assign.c:868 #, c-format msgid "(target vector)" msgstr "" -#: assign.c:896 assign.c:897 +#: assign.c:878 assign.c:879 #, c-format msgid "" "%d (type '%s') at RHS position %d taken as TRUE when assigning to type '%s' " "%s" msgstr "" -#: assign.c:899 +#: assign.c:881 #, c-format msgid "" "% (type '%s') at RHS position %d taken as TRUE when assigning to " "type '%s' %s" msgstr "" -#: assign.c:900 +#: assign.c:882 #, c-format msgid "" "%f (type '%s') at RHS position %d taken as TRUE when assigning to type '%s' " "%s" msgstr "" -#: assign.c:904 +#: assign.c:886 #, c-format msgid "" "%d (type '%s') at RHS position %d taken as 0 when assigning to type '%s' %s" msgstr "" -#: assign.c:906 +#: assign.c:888 #, c-format msgid "" "% (type '%s') at RHS position %d taken as 0 when assigning to type " "'%s' %s" msgstr "" -#: assign.c:907 +#: assign.c:889 #, c-format msgid "" "%f (type '%s') at RHS position %d either truncated (precision lost) or taken " "as 0 when assigning to type '%s' %s" msgstr "" -#: assign.c:912 +#: assign.c:894 #, c-format msgid "" "% (type '%s') at RHS position %d out-of-range (NA) when assigning to " "type '%s' %s" msgstr "" -#: assign.c:913 assign.c:920 +#: assign.c:895 assign.c:902 #, c-format msgid "" "%f (type '%s') at RHS position %d out-of-range(NA) or truncated (precision " "lost) when assigning to type '%s' %s" msgstr "" -#: assign.c:915 assign.c:924 +#: assign.c:897 assign.c:906 #, c-format msgid "" "%f (type '%s') at RHS position %d either imaginary part discarded or real " "part truncated (precision lost) when assigning to type '%s' %s" msgstr "" -#: assign.c:925 +#: assign.c:907 #, c-format msgid "" "%f (type '%s') at RHS position %d imaginary 
part discarded when assigning to " "type '%s' %s" msgstr "" -#: assign.c:970 +#: assign.c:952 #, c-format msgid "type '%s' cannot be coerced to '%s'" msgstr "" -#: assign.c:1128 +#: assign.c:1110 #, c-format msgid "Unsupported column type in assign.c:memrecycle '%s'" msgstr "" -#: assign.c:1222 -#, c-format -msgid "Failed to allocate initial %d items in savetl_init" -msgstr "" - -#: assign.c:1238 -#, c-format -msgid "Failed to realloc saveds to %d items in savetl" -msgstr "" - -#: assign.c:1244 -#, c-format -msgid "Failed to realloc savedtl to %d items in savetl" -msgstr "" - -#: assign.c:1267 +#: assign.c:1193 msgid "x must be a character vector" msgstr "" -#: assign.c:1268 +#: assign.c:1194 msgid "'which' must be an integer vector" msgstr "" -#: assign.c:1269 +#: assign.c:1195 msgid "'new' must be a character vector" msgstr "" -#: assign.c:1270 +#: assign.c:1196 #, c-format msgid "'new' is length %d. Should be the same as length of 'which' (%d)" msgstr "" -#: assign.c:1273 +#: assign.c:1199 #, c-format msgid "" "Item %d of 'which' is %d which is outside range of the length %d character " @@ -677,13 +653,6 @@ msgstr "" msgid "x is type '%s' (must be 'character' or NULL)" msgstr "" -#: chmatch.c:106 -#, c-format -msgid "" -"Failed to allocate % bytes working memory in chmatchdup: " -"length(table)=%d length(unique(table))=%d" -msgstr "" - #: cj.c:95 #, c-format msgid "Type '%s' is not supported by CJ." @@ -748,33 +717,33 @@ msgstr "" msgid "env is not an environment" msgstr "" -#: dogroups.c:133 +#: dogroups.c:136 msgid "row.names attribute of .SD not found" msgstr "" -#: dogroups.c:135 +#: dogroups.c:138 #, c-format msgid "" "row.names of .SD isn't integer length 2 with NA as first item; i.e., ." "set_row_names(). 
[%s %d %d]" msgstr "" -#: dogroups.c:166 +#: dogroups.c:169 #, c-format msgid "length(iSD)[%d] != length(jiscols)[%d]" msgstr "" -#: dogroups.c:167 +#: dogroups.c:170 #, c-format msgid "length(xSD)[%d] != length(xjiscols)[%d]" msgstr "" -#: dogroups.c:279 +#: dogroups.c:282 #, c-format msgid "j evaluates to type '%s'. Must evaluate to atomic vector or list." msgstr "" -#: dogroups.c:288 +#: dogroups.c:291 #, c-format msgid "" "Entry %d for group %d in j=list(...) should be atomic vector or list. If you " @@ -782,7 +751,7 @@ msgid "" "instead (much quicker), or cbind or merge afterwards." msgstr "" -#: dogroups.c:295 +#: dogroups.c:298 #, c-format msgid "" "Entry %d for group %d in j=list(...) is an array with %d dimensions > 1, " @@ -790,13 +759,13 @@ msgid "" "that is intentional." msgstr "" -#: dogroups.c:305 +#: dogroups.c:308 msgid "" "RHS of := is NULL during grouped assignment, but it's not possible to delete " "parts of a column." msgstr "" -#: dogroups.c:309 +#: dogroups.c:312 #, c-format msgid "" "Supplied %d items to be assigned to group %d of size %d in column '%s'. The " @@ -805,16 +774,16 @@ msgid "" "make this intent clear to readers of your code." msgstr "" -#: dogroups.c:342 +#: dogroups.c:345 #, c-format msgid "Group %d column '%s': %s" msgstr "" -#: dogroups.c:349 +#: dogroups.c:352 msgid "j doesn't evaluate to the same number of columns for each group" msgstr "" -#: dogroups.c:383 +#: dogroups.c:386 #, c-format msgid "" "Column %d of j's result for the first group is NULL. We rely on the column " @@ -825,14 +794,14 @@ msgid "" "integer() or numeric()." msgstr "" -#: dogroups.c:386 +#: dogroups.c:389 msgid "" "j appears to be a named vector. The same names will likely be created over " "and over again for each group and slow things down. 
Try and pass a named " "list (which data.table optimizes) or an unnamed list() instead.\n" msgstr "" -#: dogroups.c:388 +#: dogroups.c:391 #, c-format msgid "" "Column %d of j is a named vector (each item down the rows is named, " @@ -840,7 +809,7 @@ msgid "" "over and over for each group). They are ignored anyway.\n" msgstr "" -#: dogroups.c:396 +#: dogroups.c:399 msgid "" "The result of j is a named list. It's very inefficient to create the same " "names over and over again for each group. When j=list(...), any names are " @@ -849,12 +818,12 @@ msgid "" "to :=). This message may be upgraded to warning in future.\n" msgstr "" -#: dogroups.c:408 +#: dogroups.c:411 #, c-format msgid "dogroups: growing from %d to %d rows\n" msgstr "" -#: dogroups.c:428 +#: dogroups.c:431 #, c-format msgid "" "Item %d of j's result for group %d is zero length. This will be filled with " @@ -863,14 +832,14 @@ msgid "" "buffer." msgstr "" -#: dogroups.c:435 +#: dogroups.c:438 #, c-format msgid "" "Column %d of result for group %d is type '%s' but expecting type '%s'. " "Column types must be consistent for each group." msgstr "" -#: dogroups.c:437 +#: dogroups.c:440 #, c-format msgid "" "Supplied %d items for column %d of group %d which has %d rows. The RHS " @@ -879,41 +848,41 @@ msgid "" "make this intent clear to readers of your code." msgstr "" -#: dogroups.c:456 fsort.c:264 fwrite.c:749 +#: dogroups.c:459 fsort.c:264 fwrite.c:749 msgid "\n" msgstr "" -#: dogroups.c:458 dogroups.c:475 +#: dogroups.c:461 dogroups.c:478 #, c-format msgid "" "Processed %d groups out of %d. %.0f%% done. Time elapsed: %ds. ETA: %ds." 
msgstr "" -#: dogroups.c:481 +#: dogroups.c:484 #, c-format msgid "Wrote less rows (%d) than allocated (%d).\n" msgstr "" -#: dogroups.c:505 +#: dogroups.c:493 #, c-format msgid "" "\n" " collecting discontiguous groups took %.3fs for %d groups\n" msgstr "" -#: dogroups.c:506 +#: dogroups.c:494 #, c-format msgid "" "\n" " memcpy contiguous groups took %.3fs for %d groups\n" msgstr "" -#: dogroups.c:508 +#: dogroups.c:496 #, c-format msgid " eval(j) took %.3fs for %d calls\n" msgstr "" -#: dogroups.c:522 +#: dogroups.c:510 msgid "growVector passed NULL" msgstr "" @@ -1194,7 +1163,7 @@ msgid "" "table" msgstr "" -#: fmelt.c:466 +#: fmelt.c:465 #, c-format msgid "" "'measure.vars' [%s] are not all of the same type. By order of hierarchy, the " @@ -1203,170 +1172,162 @@ msgid "" "coercion.\n" msgstr "" -#: fmelt.c:578 +#: fmelt.c:577 #, c-format msgid "Unknown column type '%s' for column '%s'." msgstr "" -#: fmelt.c:685 +#: fmelt.c:684 #, c-format msgid "variable_table does not support column type '%s' for column '%s'." msgstr "" -#: fmelt.c:779 +#: fmelt.c:778 #, c-format msgid "Unknown column type '%s' for column '%s' in 'data'" msgstr "" -#: fmelt.c:790 +#: fmelt.c:789 msgid "Input is not of type VECSXP, expected a data.table, data.frame or list" msgstr "" -#: fmelt.c:791 +#: fmelt.c:790 msgid "Argument 'value.factor' should be logical TRUE/FALSE" msgstr "" -#: fmelt.c:792 +#: fmelt.c:791 msgid "Argument 'variable.factor' should be logical TRUE/FALSE" msgstr "" -#: fmelt.c:793 +#: fmelt.c:792 msgid "Argument 'na.rm' should be logical TRUE/FALSE." msgstr "" -#: fmelt.c:794 +#: fmelt.c:793 msgid "Argument 'variable.name' must be a character vector" msgstr "" -#: fmelt.c:795 +#: fmelt.c:794 msgid "Argument 'value.name' must be a character vector" msgstr "" -#: fmelt.c:796 +#: fmelt.c:795 msgid "Argument 'verbose' should be logical TRUE/FALSE" msgstr "" -#: fmelt.c:800 +#: fmelt.c:799 msgid "ncol(data) is 0. Nothing to melt. Returning original data.table." 
msgstr "" -#: forder.c:112 utils.c:660 +#: forder.c:108 utils.c:652 msgid "Internal error in" msgstr "" -#: forder.c:112 utils.c:660 +#: forder.c:108 utils.c:652 msgid "Please report to the data.table issues tracker." msgstr "" -#: forder.c:123 +#: forder.c:119 #, c-format msgid "Failed to realloc thread private group size buffer to %d*4bytes" msgstr "" -#: forder.c:139 +#: forder.c:135 #, c-format msgid "Failed to realloc group size result to %d*4bytes" msgstr "" -#: forder.c:273 +#: forder.c:270 #, c-format msgid "" "Logical error. counts[0]=%d in cradix but should have been decremented to 0. " "radix=%d" msgstr "" -#: forder.c:291 +#: forder.c:290 msgid "Failed to alloc cradix_counts and/or cradix_tmp" msgstr "" -#: forder.c:324 +#: forder.c:322 #, c-format msgid "Unable to realloc %d * %d bytes in range_str" msgstr "" -#: forder.c:351 -msgid "Failed to alloc ustr3 when converting strings to UTF8" -msgstr "" - -#: forder.c:371 -msgid "Failed to alloc tl when converting strings to UTF8" -msgstr "" - -#: forder.c:401 +#: forder.c:404 msgid "Must an integer or numeric vector length 1" msgstr "" -#: forder.c:402 +#: forder.c:405 msgid "Must be 2, 1 or 0" msgstr "" -#: forder.c:437 +#: forder.c:440 msgid "Unknown non-finite value; not NA, NaN, -Inf or +Inf" msgstr "" -#: forder.c:476 +#: forder.c:479 msgid "" "Input is an atomic vector (not a list of columns) but order= is not a length " "1 integer" msgstr "" -#: forder.c:478 +#: forder.c:481 #, c-format msgid "forder.c received a vector type '%s' length %d\n" msgstr "" -#: forder.c:486 +#: forder.c:489 #, c-format msgid "forder.c received %d rows and %d columns\n" msgstr "" -#: forder.c:496 +#: forder.c:499 #, c-format msgid "'order' length (%d) is different to by='s length (%d)" msgstr "" -#: forder.c:509 +#: forder.c:512 #, c-format msgid "" "Column %d is length %d which differs from length of column 1 (%d), are you " "attempting to order by a list column?\n" msgstr "" -#: forder.c:513 forder.c:1683 +#: 
forder.c:516 forder.c:1688 msgid "retGrp must be TRUE or FALSE" msgstr "" -#: forder.c:516 forder.c:1686 +#: forder.c:519 forder.c:1691 msgid "retStats must be TRUE or FALSE" msgstr "" -#: forder.c:519 forder.c:1689 +#: forder.c:522 forder.c:1694 msgid "retStats must be TRUE whenever retGrp is TRUE" msgstr "" -#: forder.c:521 forder.c:1691 +#: forder.c:524 forder.c:1696 msgid "sort must be TRUE or FALSE" msgstr "" -#: forder.c:524 +#: forder.c:527 msgid "At least one of retGrp= or sort= must be TRUE" msgstr "" -#: forder.c:526 forder.c:1694 +#: forder.c:529 forder.c:1699 msgid "na.last must be logical TRUE, FALSE or NA of length 1" msgstr "" -#: forder.c:560 forder.c:675 +#: forder.c:562 forder.c:680 #, c-format msgid "Unable to allocate % bytes of working memory" msgstr "" -#: forder.c:578 +#: forder.c:580 #, c-format msgid "Item %d of order (ascending/descending) is %d. Must be +1 or -1." msgstr "" -#: forder.c:608 +#: forder.c:611 #, c-format msgid "" "\n" @@ -1375,146 +1336,146 @@ msgid "" "to save space and time.\n" msgstr "" -#: forder.c:620 +#: forder.c:625 #, c-format msgid "Column %d passed to [f]order is type '%s', not yet supported." msgstr "" -#: forder.c:799 +#: forder.c:804 #, c-format msgid "" "Failed to allocate TMP or UGRP or they weren't cache line aligned: nth=%d" msgstr "" -#: forder.c:808 +#: forder.c:813 msgid "Could not allocate (very tiny) group size thread buffers" msgstr "" -#: forder.c:876 +#: forder.c:881 #, c-format msgid "Timing block %2d%s = %8.3f %8d\n" msgstr "" -#: forder.c:927 forder.c:997 forder.c:1019 forder.c:1126 forder.c:1262 forder.c:1318 +#: forder.c:932 forder.c:1002 forder.c:1024 forder.c:1131 forder.c:1267 forder.c:1323 #, c-format msgid "Failed to allocate %d bytes for '%s'." msgstr "" -#: forder.c:1162 +#: forder.c:1167 #, c-format msgid "Failed to allocate parallel counts. 
my_n=%d, nBatch=%d" msgstr "" -#: forder.c:1174 +#: forder.c:1179 #, c-format msgid "Failed to allocate 'my_otmp' and/or 'my_ktmp' arrays (%d bytes)." msgstr "" -#: forder.c:1279 +#: forder.c:1284 #, c-format msgid "Unable to allocate TMP for my_n=%d items in parallel batch counting" msgstr "" -#: forder.c:1405 forder.c:1456 +#: forder.c:1410 forder.c:1461 #, c-format msgid "issorted 'by' [%d] out of range [1,%d]" msgstr "" -#: forder.c:1410 +#: forder.c:1415 msgid "is.sorted does not work on list columns" msgstr "" -#: forder.c:1443 forder.c:1473 forder.c:1507 +#: forder.c:1448 forder.c:1478 forder.c:1512 #, c-format msgid "type '%s' is not yet supported" msgstr "" -#: forder.c:1520 +#: forder.c:1525 msgid "x must be either NULL or an integer vector" msgstr "" -#: forder.c:1522 +#: forder.c:1527 msgid "nrow must be integer vector length 1" msgstr "" -#: forder.c:1524 +#: forder.c:1529 #, c-format msgid "nrow==%d but must be >=0" msgstr "" -#: forder.c:1541 +#: forder.c:1546 msgid "x must be type 'double'" msgstr "" -#: forder.c:1651 +#: forder.c:1656 msgid "'datatable.use.index' option must be TRUE or FALSE" msgstr "" -#: forder.c:1664 +#: forder.c:1669 msgid "'datatable.forder.auto.index' option must be TRUE or FALSE" msgstr "" -#: forder.c:1681 +#: forder.c:1686 msgid "DT is NULL" msgstr "" -#: forder.c:1697 +#: forder.c:1702 msgid "order must be integer" msgstr "" -#: forder.c:1699 +#: forder.c:1704 msgid "reuseSorting must be logical TRUE, FALSE or NA of length 1" msgstr "" -#: forder.c:1711 +#: forder.c:1716 #, c-format msgid "" "forderReuseSorting: opt not possible: is.data.table(DT)=%d, sortGroups=%d, " "all1(ascArg)=%d\n" msgstr "" -#: forder.c:1730 +#: forder.c:1735 #, c-format msgid "forderReuseSorting: using key: %s\n" msgstr "" -#: forder.c:1765 +#: forder.c:1770 #, c-format msgid "forderReuseSorting: index found but not for retGrp and retStats: %s\n" msgstr "" -#: forder.c:1768 +#: forder.c:1773 #, c-format msgid "forderReuseSorting: index found but 
not for retGrp: %s\n" msgstr "" -#: forder.c:1771 +#: forder.c:1776 #, c-format msgid "forderReuseSorting: index found but not for retStats: %s\n" msgstr "" -#: forder.c:1778 +#: forder.c:1783 #, c-format msgid "" "forderReuseSorting: index found but na.last=TRUE and no stats available: %s\n" msgstr "" -#: forder.c:1781 +#: forder.c:1786 #, c-format msgid "forderReuseSorting: index found but na.last=TRUE and NAs present: %s\n" msgstr "" -#: forder.c:1789 +#: forder.c:1794 #, c-format msgid "forderReuseSorting: using existing index: %s\n" msgstr "" -#: forder.c:1801 +#: forder.c:1806 #, c-format msgid "forderReuseSorting: setting index (retGrp=%d, retStats=%d) on DT: %s\n" msgstr "" -#: forder.c:1805 +#: forder.c:1810 #, c-format msgid "forderReuseSorting: opt=%d, took %.3fs\n" msgstr "" @@ -1823,11 +1784,6 @@ msgstr "" msgid " Skipped to line %d in the file" msgstr "" -#: fread.c:1766 -#, c-format -msgid "skip=% but the input only has %d line" -msgstr "" - #: fread.c:1777 msgid "" "Input is either empty, fully whitespace, or skip has been set after the last " @@ -2480,7 +2436,7 @@ msgstr "" msgid "%s: window width of size 0, returning all NaN vector\n" msgstr "" -#: froll.c:236 froll.c:424 froll.c:636 froll.c:840 froll.c:1072 froll.c:1766 +#: froll.c:236 froll.c:424 froll.c:636 froll.c:840 froll.c:1072 froll.c:1767 #, c-format msgid "" "%s: running in parallel for input length %, window %d, hasnf %d, " @@ -2545,7 +2501,7 @@ msgstr "" msgid "%s: calling sqrt(frollvarExact(...))\n" msgstr "" -#: froll.c:1508 froll.c:1769 +#: froll.c:1508 froll.c:1770 #, c-format msgid "%s: window width of size 0, returning all NA vector\n" msgstr "" @@ -2569,10 +2525,19 @@ msgstr "" #: froll.c:1628 #, c-format -msgid "%s: finding order and initializing links for %d blocks %stook %.3fs\n" +msgid "" +"%s: finding order and initializing links for %d blocks in parallel took %." 
+"3fs\n" msgstr "" -#: froll.c:1638 +#: froll.c:1629 +#, c-format +msgid "" +"%s: finding order and initializing links for %d blocks sequentially took %." +"3fs\n" +msgstr "" + +#: froll.c:1639 #, c-format msgid "" "%s: running implementation as described in the paper by Jukka Suomela, for " @@ -2580,17 +2545,17 @@ msgid "" "input data\n" msgstr "" -#: froll.c:1688 +#: froll.c:1689 #, c-format msgid "%s: skip rolling for %d padded elements\n" msgstr "" -#: froll.c:1721 +#: froll.c:1722 #, c-format msgid "%s: rolling took %.3f\n" msgstr "" -#: froll.c:1818 frolladaptive.c:992 +#: froll.c:1819 frolladaptive.c:992 #, c-format msgid "%s: no NAs detected, redirecting to itself using has.nf=FALSE\n" msgstr "" @@ -2894,47 +2859,30 @@ msgstr "" msgid "No data rows present (nrow==0)\n" msgstr "" -#: fwrite.c:1100 -#, c-format -msgid "" -"Written %.1f%% of % rows in %d secs using %d thread. maxBuffUsed=%d" -"%%. ETA %d secs. " -msgstr "" - -#: fwrite.c:1131 +#: fwrite.c:1132 msgid "Failed to write gzip trailer" msgstr "" -#: fwrite.c:1150 +#: fwrite.c:1151 #, c-format msgid "" "zlib: uncompressed length=%zu (%zu MiB), compressed length=%zu (%zu MiB), " "ratio=%.1f%%, crc=%x\n" msgstr "" -#: fwrite.c:1154 fwrite.c:1154 -#, c-format -msgid "Wrote % row in %.3f secs using %d thread. MaxBuffUsed=%d%%\n" -msgstr "" - -#: fwrite.c:1156 -#, c-format -msgid "Wrote % row in %.3f secs using %d threads. MaxBuffUsed=%d%%\n" -msgstr "" - -#: fwrite.c:1171 +#: fwrite.c:1172 #, c-format msgid "" "zlib %s (zlib.h %s) deflate() returned error %d Z_FINISH=%d Z_BLOCK=%d. %s" msgstr "" -#: fwrite.c:1173 +#: fwrite.c:1174 msgid "" "Please include the full output above and below this message in your data." "table bug report." msgstr "" -#: fwrite.c:1174 +#: fwrite.c:1175 msgid "" "Please retry fwrite() with verbose=TRUE and include the full output with " "your data.table bug report." @@ -3196,117 +3144,117 @@ msgstr "" msgid "Final step, fetching indices in overlaps ... 
done in %8.3f seconds\n" msgstr "" -#: init.c:182 +#: init.c:183 msgid "" "Pointers are %zu bytes, greater than 8. We have not tested on any " "architecture greater than 64bit yet." msgstr "" -#: init.c:196 +#: init.c:197 msgid "... failed. Please forward this message to maintainer('data.table')." msgstr "" -#: init.c:197 +#: init.c:198 #, c-format msgid "Checking NA_INTEGER [%d] == INT_MIN [%d] %s" msgstr "" -#: init.c:198 +#: init.c:199 #, c-format msgid "Checking NA_INTEGER [%d] == NA_LOGICAL [%d] %s" msgstr "" -#: init.c:199 init.c:200 init.c:202 init.c:205 init.c:206 init.c:207 init.c:208 init.c:209 init.c:210 init.c:211 +#: init.c:200 init.c:201 init.c:203 init.c:206 init.c:207 init.c:208 init.c:209 init.c:210 init.c:211 init.c:212 #, c-format msgid "Checking sizeof(%s) [%zu] is %d %s" msgstr "" -#: init.c:203 +#: init.c:204 #, c-format msgid "Checking sizeof(pointer) [%zu] is 4 or 8 %s" msgstr "" -#: init.c:204 +#: init.c:205 #, c-format msgid "Checking sizeof(SEXP) [%zu] == sizeof(pointer) [%zu] %s" msgstr "" -#: init.c:214 +#: init.c:215 #, c-format msgid "Checking LENGTH(allocVector(INTSXP,2)) [%d] is 2 %s" msgstr "" -#: init.c:221 +#: init.c:222 #, c-format msgid "Checking memset(&i,0,sizeof(int)); i == (int)0 %s" msgstr "" -#: init.c:224 +#: init.c:225 #, c-format msgid "Checking memset(&ui, 0, sizeof(unsigned int)); ui == (unsigned int)0 %s" msgstr "" -#: init.c:227 +#: init.c:228 #, c-format msgid "Checking memset(&d, 0, sizeof(double)); d == (double)0.0 %s" msgstr "" -#: init.c:230 +#: init.c:231 #, c-format msgid "Checking memset(&ld, 0, sizeof(long double)); ld == (long double)0.0 %s" msgstr "" -#: init.c:233 +#: init.c:234 msgid "" "Unlike the very common case, e.g. ASCII, the character '/' is not just " "before '0'." msgstr "" -#: init.c:234 +#: init.c:235 msgid "The C expression (uint_fast8_t)('/'-'0')<10 is true. Should be false." msgstr "" -#: init.c:235 +#: init.c:236 msgid "" "Unlike the very common case, e.g. 
ASCII, the character ':' is not just after " "'9'." msgstr "" -#: init.c:236 +#: init.c:237 msgid "The C expression (uint_fast8_t)('9'-':')<10 is true. Should be false." msgstr "" -#: init.c:241 +#: init.c:242 #, c-format msgid "Conversion of NA_INT64 via double failed %!=%" msgstr "" -#: init.c:245 +#: init.c:246 msgid "NA_INT64_D (negative -0.0) is not == 0.0." msgstr "" -#: init.c:246 +#: init.c:247 msgid "NA_INT64_D (negative -0.0) is not ==-0.0." msgstr "" -#: init.c:247 +#: init.c:248 msgid "ISNAN(NA_INT64_D) is TRUE but should not be" msgstr "" -#: init.c:248 +#: init.c:249 msgid "isnan(NA_INT64_D) is TRUE but should not be" msgstr "" -#: init.c:282 +#: init.c:283 #, c-format msgid "PRINTNAME(install(\"integer64\")) has returned %s not %s" msgstr "" -#: init.c:341 +#: init.c:342 msgid "verbose option must be length 1 non-NA logical or integer" msgstr "" -#: init.c:366 +#: init.c:367 msgid ".Last.updated in namespace is not a length 1 integer" msgstr "" @@ -3362,11 +3310,6 @@ msgstr "" msgid "fill must be a vector of length 1 or a list of length of x." msgstr "" -#: nafill.c:237 -#, c-format -msgid "%s: parallel processing of %d column took %.3fs\n" -msgstr "" - #: negate.c:6 msgid "not logical or integer vector" msgstr "" @@ -3459,13 +3402,6 @@ msgid "" "length %d. Only length-1 columns are recycled." msgstr "" -#: rbindlist.c:61 -#, c-format -msgid "" -"Column %d ['%s'] of item %d is length 0. This (and %d other like it) has " -"been filled with NA (NULL for list columns) to make each item uniform." 
-msgstr "" - #: rbindlist.c:66 #, c-format msgid "" @@ -3477,38 +3413,31 @@ msgstr "" msgid "use.names=TRUE but no item of input list has any names" msgstr "" -#: rbindlist.c:76 -#, c-format -msgid "" -"Failed to allocate upper bound of % unique column names " -"[sum(lapply(l,ncol))]" -msgstr "" - -#: rbindlist.c:105 +#: rbindlist.c:106 #, c-format msgid "Failed to allocate nuniq=%d items working memory in rbindlist.c" msgstr "" -#: rbindlist.c:140 +#: rbindlist.c:139 #, c-format msgid "Failed to allocate ncol=%d items working memory in rbindlist.c" msgstr "" -#: rbindlist.c:201 +#: rbindlist.c:199 msgid "" " use.names='check' (default from v1.12.2) emits this message and proceeds as " "if use.names=FALSE for backwards compatibility. See news item 5 in v1.12.2 " "for options to control this message." msgstr "" -#: rbindlist.c:215 +#: rbindlist.c:213 #, c-format msgid "" "Column %d ['%s'] of item %d is missing in item %d. Use fill=TRUE to fill " "with NA (NULL for list columns), or use.names=FALSE to ignore column names.%s" msgstr "" -#: rbindlist.c:224 +#: rbindlist.c:222 #, c-format msgid "" "Column %d ['%s'] of item %d appears in position %d in item %d. Set use." @@ -3516,40 +3445,40 @@ msgid "" "names.%s" msgstr "" -#: rbindlist.c:233 +#: rbindlist.c:231 msgid "" "options()$datatable.rbindlist.check is set but is not a single string. See " "news item 5 in v1.12.2." msgstr "" -#: rbindlist.c:240 +#: rbindlist.c:238 #, c-format msgid "" "options()$datatable.rbindlist.check=='%s' which is not 'message'|'warning'|" "'error'|'none'. See news item 5 in v1.12.2." msgstr "" -#: rbindlist.c:303 +#: rbindlist.c:301 #, c-format msgid "" "Column %d of item %d has type 'factor' but has no levels; i.e. malformed." msgstr "" -#: rbindlist.c:332 +#: rbindlist.c:330 #, c-format msgid "" "Class attribute on column %d of item %d does not match with column %d of " "item %d. 
You can deactivate this safety-check by using ignore.attr=TRUE" msgstr "" -#: rbindlist.c:383 +#: rbindlist.c:380 rbindlist.c:389 #, c-format msgid "" "Failed to allocate working memory for %d ordered factor levels of result " "column %d" msgstr "" -#: rbindlist.c:406 +#: rbindlist.c:408 #, c-format msgid "" "Column %d of item %d is an ordered factor but level %d ['%s'] is missing " @@ -3558,7 +3487,7 @@ msgid "" "factor will be created for this column." msgstr "" -#: rbindlist.c:411 +#: rbindlist.c:413 #, c-format msgid "" "Column %d of item %d is an ordered factor with '%s'<'%s' in its levels. But " @@ -3566,14 +3495,14 @@ msgid "" "will be created for this column due to this ambiguity." msgstr "" -#: rbindlist.c:456 +#: rbindlist.c:456 rbindlist.c:465 #, c-format msgid "" "Failed to allocate working memory for %d factor levels of result column %d " "when reading item %d of item %d" msgstr "" -#: rbindlist.c:548 rbindlist.c:551 +#: rbindlist.c:553 rbindlist.c:556 #, c-format msgid "Column %d of item %d: %s" msgstr "" @@ -3745,87 +3674,82 @@ msgstr "" msgid "x is not a logical vector" msgstr "" -#: utils.c:96 +#: utils.c:100 #, c-format msgid "Unsupported type '%s' passed to allNA()" msgstr "" -#: utils.c:116 +#: utils.c:120 msgid "'x' argument must be data.table compatible" msgstr "" -#: utils.c:140 +#: utils.c:144 msgid "" "argument specifying columns is type 'double' and one or more items in it are " "not whole integers" msgstr "" -#: utils.c:146 +#: utils.c:150 #, c-format msgid "" "argument specifying columns received non-existing column(s): cols[%d]=%d" msgstr "" -#: utils.c:153 +#: utils.c:157 msgid "'x' argument data.table has no names" msgstr "" -#: utils.c:159 +#: utils.c:163 #, c-format msgid "" "argument specifying columns received non-existing column(s): cols[%d]='%s'" msgstr "" -#: utils.c:163 +#: utils.c:167 msgid "argument specifying columns must be character or numeric" msgstr "" -#: utils.c:166 +#: utils.c:170 msgid "argument specifying columns 
received duplicate column(s)" msgstr "" -#: utils.c:307 -#, c-format -msgid "Found and copied %d column with a shared memory address\n" -msgstr "" - -#: utils.c:389 +#: utils.c:381 msgid "'x' is not atomic" msgstr "" -#: utils.c:391 +#: utils.c:383 msgid "'x' must not be matrix or array" msgstr "" -#: utils.c:393 +#: utils.c:385 msgid "input must not be matrix or array" msgstr "" -#: utils.c:397 +#: utils.c:389 #, c-format msgid "copy=false and input already of expected type and class %s[%s]\n" msgstr "" -#: utils.c:404 +#: utils.c:396 #, c-format msgid "Coercing %s[%s] into %s[%s]\n" msgstr "" -#: utils.c:420 +#: utils.c:412 #, c-format msgid "zlib header files were not found when data.table was compiled" msgstr "" -#: utils.c:546 +#: utils.c:538 msgid "'x' should not be data.frame or data.table." msgstr "" -#: utils.c:548 +#: utils.c:540 #, c-format msgid "%s must be TRUE or FALSE." msgstr "" -#: utils.c:629 +#: utils.c:621 #, c-format msgid "Type '%s' is not supported by frev" msgstr "" diff --git a/src/init.c b/src/init.c index e4e77e96c1..8a305192c3 100644 --- a/src/init.c +++ b/src/init.c @@ -371,5 +371,5 @@ SEXP initLastUpdated(SEXP var) { SEXP dllVersion(void) { // .onLoad calls this and checks the same as packageVersion() to ensure no R/C version mismatch, #3056 - return(ScalarString(mkChar("1.17.99"))); + return(ScalarString(mkChar("1.18.0"))); } From 86dc32b471ad4267e5577ebc7f350669a07bbb5a Mon Sep 17 00:00:00 2001 From: Tyson Barrett Date: Sun, 11 Jan 2026 23:53:46 -0700 Subject: [PATCH 02/25] bump versions --- DESCRIPTION | 2 +- NEWS.md | 2 ++ src/init.c | 2 +- 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index ab359df087..f07f9fd5d2 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,5 +1,5 @@ Package: data.table -Version: 1.18.0 +Version: 1.18.1 Title: Extension of `data.frame` Depends: R (>= 3.4.0) Imports: methods diff --git a/NEWS.md b/NEWS.md index aed1752d87..a97bd81371 100644 --- a/NEWS.md +++ b/NEWS.md 
@@ -2,6 +2,8 @@ **If you are viewing this file on CRAN, please check [latest news on GitHub](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) where the formatting is also better.** +## data.table [v1.18.2](https://github.com/Rdatatable/data.table/milestone/37?closed=1) In Development + ## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025 ### BREAKING CHANGE diff --git a/src/init.c b/src/init.c index 8a305192c3..2b103cf59b 100644 --- a/src/init.c +++ b/src/init.c @@ -371,5 +371,5 @@ SEXP initLastUpdated(SEXP var) { SEXP dllVersion(void) { // .onLoad calls this and checks the same as packageVersion() to ensure no R/C version mismatch, #3056 - return(ScalarString(mkChar("1.18.0"))); + return(ScalarString(mkChar("1.18.1"))); } From 78abd14d3a26f4194585c23d4de784282700b3dc Mon Sep 17 00:00:00 2001 From: Tyson Barrett Date: Sun, 11 Jan 2026 23:58:03 -0700 Subject: [PATCH 03/25] Fix milestone link for 1.18.2 --- NEWS.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index a97bd81371..b3b0b8ccf3 100644 --- a/NEWS.md +++ b/NEWS.md @@ -2,7 +2,9 @@ **If you are viewing this file on CRAN, please check [latest news on GitHub](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) where the formatting is also better.** -## data.table [v1.18.2](https://github.com/Rdatatable/data.table/milestone/37?closed=1) In Development +## data.table [v1.18.2](https://github.com/Rdatatable/data.table/milestone/44?closed=1) In Development + + ## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025 From 63dec52f5f799169b689912c37739bb0e45d76e8 Mon Sep 17 00:00:00 2001 From: aitap Date: Fri, 9 Jan 2026 06:55:13 +0000 Subject: [PATCH 04/25] Replace `ATTRIB`, `SET_ATTRIB` (#7487) * frev: drop SET_ATTRIB Instead, backport and use CLEAR_ATTRIB (R >= 4.5). 
* mergeIndexAttrib: drop SET_ATTRIB Use SHALLOW_DUPLICATE_ATTRIB (R >= 3.3) for the simple case. Also, Backport ANY_ATTRIB (R >= 4.5) instead of testing !isNull(ATTRIB(.)). * cbindlist: use ANY_ATTRIB * nafillR: use ANY_ATTRIB * Backport R_mapAttrib * anySpecialStatic: switch to R_mapAttrib * dogroups: construct rownames anew Instead of trying to walk ATTRIB in search of the compact 'rownames' attribute to modify, install it anew, take note of the returned reference to the value being installed (a different one!) and modify that. * mergeIndexAttrib: switch to R_mapAttrib * assign: factor out index fixup Instead of walking the attribute list directly, use R_mapAttrib(). Create a hash table of index names instead of relying on chin() and a temporary string vector. Move all temporary allocations onto the R heap. * assign: drop indexLength * assign: fix index unmarking * Comments, better field names * Update src/dogroups.c Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com> * mapAttrib: protect the attribute value Otherwise the callback could remove the attribute and end up with the value unprotected. Protect the attribute tag as well for uniformity. Co-Authored-By: HughParsonage * dogroups: look up rownames using mapAttrib This solution is closer to the working approach previously taken by the code. 
* Fix comment, function name * Protect the newly found rownames attribute * add NEWS entry --------- Co-authored-by: HughParsonage Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com> Co-authored-by: Michael Chirico Co-authored-by: Michael Chirico --- NEWS.md | 2 + src/assign.c | 201 ++++++++++++++++++++++++++--------------------- src/data.table.h | 10 +++ src/dogroups.c | 31 +++++--- src/mergelist.c | 19 +++-- src/nafill.c | 2 +- src/utils.c | 19 ++++- 7 files changed, 174 insertions(+), 110 deletions(-) diff --git a/NEWS.md b/NEWS.md index b3b0b8ccf3..4c6b489391 100644 --- a/NEWS.md +++ b/NEWS.md @@ -4,7 +4,9 @@ ## data.table [v1.18.2](https://github.com/Rdatatable/data.table/milestone/44?closed=1) In Development +### Notes +1. Removed use of non-API macros `ATTRIB`, `SET_ATTRIB`, [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks @aitap for the continued assiduous work here. ## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025 diff --git a/src/assign.c b/src/assign.c index 849cb08f2a..4cc8bccb13 100644 --- a/src/assign.c +++ b/src/assign.c @@ -256,6 +256,103 @@ SEXP selfrefokwrapper(SEXP x, SEXP verbose) { return ScalarInteger(_selfrefok(x,FALSE,LOGICAL(verbose)[0])); } +struct attrib_name_ctx { + hashtab *indexNames; // stores a 1 for every CHARSXP index name in use, 0 for removed + R_xlen_t indexNamesLen; // how much memory to allocate for the hash? + SEXP index; // attr(DT, "index") + SEXP assignedNames; // STRSXP vector of variable names just assigned + bool verbose; +}; + +// Mark each CHARSXP attribute name with a 1 inside the hash, or count them to find out the allocation size. 
+static SEXP getOneAttribName(SEXP key, SEXP val, void *ctx_) { + (void)val; + struct attrib_name_ctx *ctx = ctx_; + if (ctx->indexNames) + hash_set(ctx->indexNames, PRINTNAME(key), 1); + else + ctx->indexNamesLen++; + return NULL; +} + +// For a given index, find out if it sorts a column that has just been assigned. If so, shorten the index (if an equivalent one doesn't already exist) or remove it altogether. +static SEXP fixIndexAttrib(SEXP tag, SEXP value, void *ctx_) { + const struct attrib_name_ctx *ctx = ctx_; + + hashtab *indexNames = ctx->indexNames; + SEXP index = ctx->index, assignedNames = ctx->assignedNames; + R_xlen_t indexLength = xlength(value); + bool verbose = ctx->verbose; + + const char *tc1, *c1; + tc1 = c1 = CHAR(PRINTNAME(tag)); // the index name; e.g. "__col1__col2" + + if (*tc1!='_' || *(tc1+1)!='_') { + // fix for #1396 + if (verbose) { + Rprintf(_("Dropping index '%s' as it doesn't have '__' at the beginning of its name. It was very likely created by v1.9.4 of data.table.\n"), tc1); + } + setAttrib(index, tag, R_NilValue); + return NULL; + } + + tc1 += 2; // tc1 always marks the start of a key column + if (!*tc1) internal_error(__func__, "index name ends with trailing __"); // # nocov + + void *vmax = vmaxget(); + // check the position of the first appearance of an assigned column in the index. + // the new index will be truncated to this position. 
+ size_t newKeyLength = strlen(c1); + char *s4 = R_alloc(newKeyLength + 3, 1); + memcpy(s4, c1, newKeyLength); + memcpy(s4 + newKeyLength, "__", 3); + + for(int i = 0; i < xlength(assignedNames); i++){ + const char *tc2 = CHAR(STRING_ELT(assignedNames, i)); + void *vmax2 = vmaxget(); + size_t tc2_len = strlen(tc2); + char *s5 = R_alloc(tc2_len + 5, 1); //4 * '_' + \0 + memcpy(s5, "__", 2); + memcpy(s5 + 2, tc2, tc2_len); + memcpy(s5 + 2 + tc2_len, "__", 3); + tc2 = strstr(s4, s5); + if(tc2 && (tc2 - s4 < newKeyLength)){ // new column is part of key; match is before last match + newKeyLength = tc2 - s4; + } + vmaxset(vmax2); + } + + s4[newKeyLength] = '\0'; // truncate the new key to the new length + if(newKeyLength == 0){ // no valid key column remains. Drop the key + setAttrib(index, tag, R_NilValue); + hash_set(indexNames, PRINTNAME(tag), 0); + if (verbose) { + Rprintf(_("Dropping index '%s' due to an update on a key column\n"), c1+2); + } + } else if(newKeyLength < strlen(c1)) { + SEXP s4Str = PROTECT(mkChar(s4)); + if(indexLength == 0 && // shortened index can be kept since it is just information on the order (see #2372) + !hash_lookup(indexNames, s4Str, 0)) { // index with shortened name not present yet + setAttrib(index, installChar(s4Str), value); + hash_set(indexNames, PRINTNAME(tag), 0); + setAttrib(index, tag, R_NilValue); + hash_set(indexNames, s4Str, 1); + if (verbose) + Rprintf(_("Shortening index '%s' to '%s' due to an update on a key column\n"), c1+2, s4+2); + } else { // indexLength > 0 || shortened name present already + // indexLength > 0 indicates reordering. Drop it to avoid spurious reordering in non-indexed columns (#2372) + // shortened name already present indicates that index needs to be dropped to avoid duplicate indices. 
+ setAttrib(index, tag, R_NilValue); + hash_set(indexNames, PRINTNAME(tag), 0); + if (verbose) + Rprintf(_("Dropping index '%s' due to an update on a key column\n"), c1+2); + } + UNPROTECT(1); // s4Str + } //else: index is not affected by assign: nothing to be done + vmaxset(vmax); + return NULL; +} + int *_Last_updated = NULL; SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values) @@ -264,12 +361,12 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values) // newcolnames : add these columns (if any) // cols : column names or numbers corresponding to the values to set // rows : row numbers to assign - R_len_t numToDo, targetlen, vlen, oldncol, oldtncol, coln, protecti=0, newcolnum, indexLength; - SEXP targetcol, nullint, s, colnam, tmp, key, index, a, assignedNames, indexNames; + R_len_t numToDo, targetlen, vlen, oldncol, oldtncol, coln, protecti=0, newcolnum; + SEXP targetcol, nullint, s, colnam, tmp, key, index, a, assignedNames; bool verbose=GetVerbose(); int ndelete=0; // how many columns are being deleted const char *c1, *tc1, *tc2; - int *buf, indexNo; + int *buf; if (isNull(dt)) error(_("assign has been passed a NULL dt")); if (TYPEOF(dt) != VECSXP) error(_("dt passed to assign isn't type VECSXP")); if (islocked(dt)) @@ -549,93 +646,17 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values) } index = getAttrib(dt, install("index")); if (index != R_NilValue) { - s = ATTRIB(index); - indexNo = 0; - // get a vector with all index names - PROTECT(indexNames = allocVector(STRSXP, xlength(s))); protecti++; - while(s != R_NilValue){ - SET_STRING_ELT(indexNames, indexNo, PRINTNAME(TAG(s))); - indexNo++; - s = CDR(s); - } - s = ATTRIB(index); // reset to first element - indexNo = 0; - while(s != R_NilValue) { - a = TAG(s); - indexLength = xlength(CAR(s)); - tc1 = c1 = CHAR(PRINTNAME(a)); // the index name; e.g. 
"__col1__col2" - if (*tc1!='_' || *(tc1+1)!='_') { - // fix for #1396 - if (verbose) { - Rprintf(_("Dropping index '%s' as it doesn't have '__' at the beginning of its name. It was very likely created by v1.9.4 of data.table.\n"), tc1); - } - setAttrib(index, a, R_NilValue); - indexNo++; - s = CDR(s); - continue; // with next index - } - tc1 += 2; // tc1 always marks the start of a key column - if (!*tc1) internal_error(__func__, "index name ends with trailing __"); // # nocov - // check the position of the first appearance of an assigned column in the index. - // the new index will be truncated to this position. - char *s4 = malloc(strlen(c1) + 3); - if (!s4) { - internal_error(__func__, "Couldn't allocate memory for s4"); // # nocov - } - memcpy(s4, c1, strlen(c1)); - memset(s4 + strlen(c1), '\0', 1); - strcat(s4, "__"); // add trailing '__' to newKey so we can search for pattern '__colName__' also at the end of the index. - int newKeyLength = strlen(c1); - for(int i = 0; i < xlength(assignedNames); i++){ - tc2 = CHAR(STRING_ELT(assignedNames, i)); - char *s5 = malloc(strlen(tc2) + 5); //4 * '_' + \0 - if (!s5) { - free(s4); // # nocov - internal_error(__func__, "Couldn't allocate memory for s5"); // # nocov - } - memset(s5, '_', 2); - memset(s5 + 2, '\0', 1); - strcat(s5, tc2); - strcat(s5, "__"); - tc2 = strstr(s4, s5); - if(tc2 == NULL){ // column is not part of key - free(s5); - continue; - } - if(tc2 - s4 < newKeyLength){ // new column match is before last match - newKeyLength = tc2 - s4; - } - free(s5); - } - memset(s4 + newKeyLength, '\0', 1); // truncate the new key to the new length - if(newKeyLength == 0){ // no valid key column remains. 
Drop the key - setAttrib(index, a, R_NilValue); - SET_STRING_ELT(indexNames, indexNo, NA_STRING); - if (verbose) { - Rprintf(_("Dropping index '%s' due to an update on a key column\n"), c1+2); - } - } else if(newKeyLength < strlen(c1)) { - SEXP s4Str = PROTECT(mkString(s4)); - if(indexLength == 0 && // shortened index can be kept since it is just information on the order (see #2372) - LOGICAL(chin(s4Str, indexNames))[0] == 0) {// index with shortened name not present yet - SET_TAG(s, install(s4)); - SET_STRING_ELT(indexNames, indexNo, mkChar(s4)); - if (verbose) - Rprintf(_("Shortening index '%s' to '%s' due to an update on a key column\n"), c1+2, s4 + 2); - } else { // indexLength > 0 || shortened name present already - // indexLength > 0 indicates reordering. Drop it to avoid spurious reordering in non-indexed columns (#2372) - // shortened name already present indicates that index needs to be dropped to avoid duplicate indices. - setAttrib(index, a, R_NilValue); - SET_STRING_ELT(indexNames, indexNo, NA_STRING); - if (verbose) - Rprintf(_("Dropping index '%s' due to an update on a key column\n"), c1+2); - } - UNPROTECT(1); // s4Str - } //else: index is not affected by assign: nothing to be done - free(s4); - indexNo ++; - s = CDR(s); - } + struct attrib_name_ctx ctx = { 0, }; + R_mapAttrib(index, getOneAttribName, &ctx); // how many attributes? 
+ hashtab *h = hash_create(ctx.indexNamesLen); + PROTECT(h->prot); + ctx.indexNames = h; + R_mapAttrib(index, getOneAttribName, &ctx); // now remember the names + ctx.index = index; + ctx.assignedNames = assignedNames; + ctx.verbose = verbose; + R_mapAttrib(index, fixIndexAttrib, &ctx); // adjust indices as needed + UNPROTECT(1); // h } if (ndelete) { // delete any columns assigned NULL (there was a 'continue' earlier in loop above) diff --git a/src/data.table.h b/src/data.table.h index 434d0a340a..a7f7872581 100644 --- a/src/data.table.h +++ b/src/data.table.h @@ -15,6 +15,8 @@ #endif #if R_VERSION < R_Version(4, 5, 0) # define isDataFrame(x) isFrame(x) // #6180 +# define CLEAR_ATTRIB(x) SET_ATTRIB(x, R_NilValue) +# define ANY_ATTRIB(x) (!(isNull(ATTRIB(x)))) #endif #include #define SEXPPTR_RO(x) ((const SEXP *)DATAPTR_RO(x)) // to avoid overhead of looped STRING_ELT and VECTOR_ELT @@ -103,6 +105,11 @@ } # define R_resizeVector(x, newlen) R_resizeVector_(x, newlen) #endif +// TODO(R>=4.6.0): remove the SVN revision check +#if R_VERSION < R_Version(4, 6, 0) || R_SVN_REVISION < 89194 +# define BACKPORT_MAP_ATTRIB +# define R_mapAttrib(x, fun, ctx) R_mapAttrib_(x, fun, ctx) +#endif // init.c extern SEXP char_integer64; @@ -343,6 +350,9 @@ SEXP R_allocResizableVector_(SEXPTYPE type, R_xlen_t maxlen); SEXP R_duplicateAsResizable_(SEXP x); void R_resizeVector_(SEXP x, R_xlen_t newlen); #endif +#ifdef BACKPORT_MAP_ATTRIB +SEXP R_mapAttrib_(SEXP x, SEXP (*fun)(SEXP key, SEXP val, void *ctx), void *ctx); +#endif SEXP is_direct_child(SEXP pids); // types.c diff --git a/src/dogroups.c b/src/dogroups.c index 373242516f..3148256a6b 100644 --- a/src/dogroups.c +++ b/src/dogroups.c @@ -3,6 +3,8 @@ #include #include +static SEXP anySpecialAttribute(SEXP key, SEXP val, void *ctx); + static bool anySpecialStatic(SEXP x, hashtab * specials) { // Special refers to special symbols .BY, .I, .N, and .GRP; see special-symbols.Rd // Static because these are like C static arrays which are 
the same memory for each group; e.g., dogroups @@ -39,7 +41,7 @@ static bool anySpecialStatic(SEXP x, hashtab * specials) { // with PR#4164 started to copy input list columns too much. Hence PR#4655 in v1.13.2 moved that copy here just where it is needed. // Currently the marker is negative truelength. These specials are protected by us here and before we release them // we restore the true truelength for when R starts to use vector truelength. - SEXP attribs, list_el; + SEXP list_el; const int n = length(x); // use length() not LENGTH() because LENGTH() on NULL is segfault in R<3.5 where we still define USE_RINTERNALS // (see data.table.h), and isNewList() is true for NULL @@ -54,20 +56,29 @@ static bool anySpecialStatic(SEXP x, hashtab * specials) { list_el = VECTOR_ELT(x,i); if (anySpecialStatic(list_el, specials)) return true; - for(attribs = ATTRIB(list_el); attribs != R_NilValue; attribs = CDR(attribs)) { - if (anySpecialStatic(CAR(attribs), specials)) - return true; // #4936 - } + if (R_mapAttrib(list_el, anySpecialAttribute, specials)) + return true; // #4936 } } return false; } +static SEXP anySpecialAttribute(SEXP key, SEXP val, void *specials) { + (void)key; + return anySpecialStatic(val, specials) ? 
R_NilValue : NULL; +} + +static SEXP findRowNames(SEXP key, SEXP val, void *data) { + (void)data; + if (key == R_RowNamesSymbol) return val; + return NULL; +} + SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEXP xjiscols, SEXP grporder, SEXP order, SEXP starts, SEXP lens, SEXP jexp, SEXP env, SEXP lhs, SEXP newnames, SEXP on, SEXP verboseArg, SEXP showProgressArg) { R_len_t ngrp, nrowgroups, njval=0, ngrpcols, ansloc=0, maxn, estn=-1, thisansloc, grpn, thislen, igrp; int nprotect=0; - SEXP ans=NULL, jval, thiscol, BY, N, I, GRP, iSD, xSD, rownames, s, RHS, target, source; + SEXP ans=NULL, jval, thiscol, BY, N, I, GRP, iSD, xSD, s, RHS, target, source; Rboolean wasvector, firstalloc=FALSE, NullWarnDone=FALSE; const bool verbose = LOGICAL(verboseArg)[0]==1; double tstart=0, tblock[10]={0}; int nblock[10]={0}; // For verbose printing, tstart is updated each block @@ -130,11 +141,11 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX R_LockBinding(install(".I"), env); SEXP dtnames = PROTECT(getAttrib(dt, R_NamesSymbol)); nprotect++; // added here to fix #91 - `:=` did not issue recycling warning during "by" - // fetch rownames of .SD. rownames[1] is set to -thislen for each group, in case .SD is passed to + + // override rownames of .SD. 
rownames[1] is set to -thislen for each group, in case .SD is passed to // non data.table aware package that uses rownames - for (s = ATTRIB(SD); s != R_NilValue && TAG(s)!=R_RowNamesSymbol; s = CDR(s)); // getAttrib0 basically but that's hidden in attrib.c; #loop_counter_not_local_scope_ok - if (s==R_NilValue) error(_("row.names attribute of .SD not found")); - rownames = CAR(s); + SEXP rownames = PROTECT(R_mapAttrib(SD, findRowNames, NULL)); nprotect++; + if (rownames == NULL) error(_("row.names attribute of .SD not found")); if (!isInteger(rownames) || LENGTH(rownames)!=2 || INTEGER(rownames)[0]!=NA_INTEGER) error(_("row.names of .SD isn't integer length 2 with NA as first item; i.e., .set_row_names(). [%s %d %d]"),type2char(TYPEOF(rownames)),LENGTH(rownames),INTEGER(rownames)[0]); // fetch names of .SD and prepare symbols. In case they are copied-on-write by user assigning to those variables diff --git a/src/mergelist.c b/src/mergelist.c index 51f28d224a..90854ae824 100644 --- a/src/mergelist.c +++ b/src/mergelist.c @@ -17,18 +17,21 @@ SEXP copyCols(SEXP x, SEXP cols) { return R_NilValue; } +static SEXP setDuplicateOneAttrib(SEXP key, SEXP val, void *x) { + setAttrib(x, PROTECT(key), PROTECT(shallow_duplicate(val))); + UNPROTECT(2); + return NULL; // continue +} + void mergeIndexAttrib(SEXP to, SEXP from) { if (!isInteger(to) || LENGTH(to)!=0) internal_error(__func__, "'to' must be integer() already"); // # nocov if (isNull(from)) return; - SEXP t = ATTRIB(to), f = ATTRIB(from); - if (isNull(t)) // target has no attributes -> overwrite - SET_ATTRIB(to, shallow_duplicate(f)); - else { - for (t = ATTRIB(to); CDR(t) != R_NilValue; t = CDR(t)); // traverse to end of attributes list of to - SETCDR(t, shallow_duplicate(f)); - } + if (!ANY_ATTRIB(to)) // target has no attributes -> overwrite + SHALLOW_DUPLICATE_ATTRIB(to, from); + else + R_mapAttrib(from, setDuplicateOneAttrib, to); } SEXP cbindlist(SEXP x, SEXP copyArg) { @@ -84,7 +87,7 @@ SEXP cbindlist(SEXP x, SEXP 
copyArg) { key = getAttrib(thisx, sym_sorted); UNPROTECT(protecti); // thisnames, thisxcol } - if (isNull(ATTRIB(index))) + if (!ANY_ATTRIB(index)) setAttrib(ans, sym_index, R_NilValue); setAttrib(ans, R_NamesSymbol, names); setAttrib(ans, sym_sorted, key); diff --git a/src/nafill.c b/src/nafill.c index 5c9568efb5..ff2a7fc344 100644 --- a/src/nafill.c +++ b/src/nafill.c @@ -218,7 +218,7 @@ SEXP nafillR(SEXP obj, SEXP type, SEXP fill, SEXP nan_is_na_arg, SEXP inplace, S if (!binplace) { for (R_len_t i=0; i Date: Thu, 8 Jan 2026 23:38:42 -0800 Subject: [PATCH 05/25] use getVar over findVar (#7575) --- NEWS.md | 11 ++++++++++- src/data.table.h | 1 + src/dogroups.c | 12 ++++++------ 3 files changed, 17 insertions(+), 7 deletions(-) diff --git a/NEWS.md b/NEWS.md index 4c6b489391..3514f783c0 100644 --- a/NEWS.md +++ b/NEWS.md @@ -6,7 +6,16 @@ ### Notes -1. Removed use of non-API macros `ATTRIB`, `SET_ATTRIB`, [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks @aitap for the continued assiduous work here. +1. Removed use of non-API `ATTRIB`, `SET_ATTRIB`, and `findVar` [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks @aitap for the continued assiduous work here, and @MichaelChirico for the easy fix to replace `findVar` with `R_getVar`. + +### Notes + +1. {data.table} now depends on R 3.5.0 (2018). + +2. pydatatable compatibility layer in `fread()` and `fwrite()` has been removed, [#7069](https://github.com/Rdatatable/data.table/issues/7069). Thanks @badasahog for the report and the PR. + +3. Vignettes are now built using `litedown` instead of `knitr`, [#6394](https://github.com/Rdatatable/data.table/issues/6394). Thanks @jangorecki for the suggestion and @ben-schwen and @aitap for the implementation. 
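As a rough illustration of the callback convention that the `ATTRIB` removal in note 1 seems to rely on (the callback returns `NULL` to keep going and a non-`NULL` value to stop and hand that value back), here is a standalone sketch on a plain linked list. It is not R's API and not the backport in this patch: `attr_node`, `map_attrib`, `count_one` and `find_by_key` are hypothetical names made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an attribute list: key/value nodes. */
typedef struct attr_node {
    const char *key;
    const char *val;
    struct attr_node *next;
} attr_node;

/* Callback type: return NULL to continue, non-NULL to stop early. */
typedef const char *(*attr_fn)(const char *key, const char *val, void *ctx);

/* Visit each attribute in order. Returns the first non-NULL callback
   result, or NULL if every callback asked to continue. */
const char *map_attrib(const attr_node *head, attr_fn fun, void *ctx) {
    for (const attr_node *n = head; n != NULL; n = n->next) {
        const char *res = fun(n->key, n->val, ctx);
        if (res != NULL) return res;
    }
    return NULL;
}

/* Callback that counts attributes and never stops early. */
const char *count_one(const char *key, const char *val, void *ctx) {
    (void)key; (void)val;
    ++*(int *)ctx;
    return NULL;
}

/* Callback that stops at the attribute whose key equals the string in ctx. */
const char *find_by_key(const char *key, const char *val, void *ctx) {
    return strcmp(key, (const char *)ctx) == 0 ? val : NULL;
}
```

In this model, a lookup such as `map_attrib(head, find_by_key, "row.names")` plays the role of the `findRowNames` callback in `dogroups.c`, and a counting pass followed by a second pass corresponds to the two `getOneAttribName` calls in `assign.c`.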
+ ## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025 diff --git a/src/data.table.h b/src/data.table.h index a7f7872581..d6c67c7521 100644 --- a/src/data.table.h +++ b/src/data.table.h @@ -14,6 +14,7 @@ # define LOGICAL_RO LOGICAL #endif #if R_VERSION < R_Version(4, 5, 0) +# define R_getVar(x, env, inherits) findVar(x, env) # define isDataFrame(x) isFrame(x) // #6180 # define CLEAR_ATTRIB(x) SET_ATTRIB(x, R_NilValue) # define ANY_ATTRIB(x) (!(isNull(ATTRIB(x)))) diff --git a/src/dogroups.c b/src/dogroups.c index 3148256a6b..2d8ae5d3dd 100644 --- a/src/dogroups.c +++ b/src/dogroups.c @@ -95,8 +95,8 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX ngrpcols = length(grpcols); nrowgroups = length(VECTOR_ELT(groups,0)); // fix for longstanding FR/bug, #495. E.g., DT[, c(sum(v1), lapply(.SD, mean)), by=grp, .SDcols=v2:v3] resulted in error.. the idea is, 1) we create .SDall, which is normally == .SD. But if extra vars are detected in jexp other than .SD, then .SD becomes a shallow copy of .SDall with only .SDcols in .SD. Since internally, we don't make a copy, changing .SDall will reflect in .SD. Hopefully this'll workout :-). - SEXP SDall = PROTECT(findVar(install(".SDall"), env)); nprotect++; // PROTECT for rchk - SEXP SD = PROTECT(findVar(install(".SD"), env)); nprotect++; + SEXP SDall = PROTECT(R_getVar(install(".SDall"), env, false)); nprotect++; // PROTECT for rchk + SEXP SD = PROTECT(R_getVar(install(".SD"), env, false)); nprotect++; const bool showProgress = LOGICAL(showProgressArg)[0]==1 && ngrp > 1; // showProgress only if more than 1 group double startTime = (showProgress) ? 
wallclock() : 0; // For progress printing, startTime is set at the beginning @@ -125,12 +125,12 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX error("!length(bynames)[%d]==length(groups)[%d]==length(grpcols)[%d]", length(bynames), length(groups), length(grpcols)); // # notranslate // TO DO: check this check above. - N = PROTECT(findVar(install(".N"), env)); nprotect++; // PROTECT for rchk + N = PROTECT(R_getVar(install(".N"), env, false)); nprotect++; // PROTECT for rchk hash_set(specials, N, -1); // marker for anySpecialStatic(); see its comments - GRP = PROTECT(findVar(install(".GRP"), env)); nprotect++; + GRP = PROTECT(R_getVar(install(".GRP"), env, false)); nprotect++; hash_set(specials, GRP, -1); // marker for anySpecialStatic(); see its comments - iSD = PROTECT(findVar(install(".iSD"), env)); nprotect++; // 1-row and possibly no cols (if no i variables are used via JIS) - xSD = PROTECT(findVar(install(".xSD"), env)); nprotect++; + iSD = PROTECT(R_getVar(install(".iSD"), env, false)); nprotect++; // 1-row and possibly no cols (if no i variables are used via JIS) + xSD = PROTECT(R_getVar(install(".xSD"), env, false)); nprotect++; R_len_t maxGrpSize = 0; const int *ilens = INTEGER(lens), n=LENGTH(lens); for (R_len_t i=0; i Date: Fri, 9 Jan 2026 09:56:15 +0100 Subject: [PATCH 06/25] remove unused vars (#7578) --- src/assign.c | 3 +-- src/dogroups.c | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/src/assign.c b/src/assign.c index 4cc8bccb13..1dd712b41c 100644 --- a/src/assign.c +++ b/src/assign.c @@ -362,10 +362,9 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values) // cols : column names or numbers corresponding to the values to set // rows : row numbers to assign R_len_t numToDo, targetlen, vlen, oldncol, oldtncol, coln, protecti=0, newcolnum; - SEXP targetcol, nullint, s, colnam, tmp, key, index, a, assignedNames; + SEXP targetcol, nullint, colnam, tmp, key, index, assignedNames; 
bool verbose=GetVerbose(); int ndelete=0; // how many columns are being deleted - const char *c1, *tc1, *tc2; int *buf; if (isNull(dt)) error(_("assign has been passed a NULL dt")); if (TYPEOF(dt) != VECSXP) error(_("dt passed to assign isn't type VECSXP")); diff --git a/src/dogroups.c b/src/dogroups.c index 2d8ae5d3dd..3cace1fea7 100644 --- a/src/dogroups.c +++ b/src/dogroups.c @@ -78,7 +78,7 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX { R_len_t ngrp, nrowgroups, njval=0, ngrpcols, ansloc=0, maxn, estn=-1, thisansloc, grpn, thislen, igrp; int nprotect=0; - SEXP ans=NULL, jval, thiscol, BY, N, I, GRP, iSD, xSD, s, RHS, target, source; + SEXP ans=NULL, jval, thiscol, BY, N, I, GRP, iSD, xSD, RHS, target, source; Rboolean wasvector, firstalloc=FALSE, NullWarnDone=FALSE; const bool verbose = LOGICAL(verboseArg)[0]==1; double tstart=0, tblock[10]={0}; int nblock[10]={0}; // For verbose printing, tstart is updated each block From 155795048d26c6b3e87dcc3b7740b7c8d52a32db Mon Sep 17 00:00:00 2001 From: aitap Date: Fri, 26 Dec 2025 07:57:26 +0000 Subject: [PATCH 07/25] Fix code blocks in NEWS.md (#7518) Add a missing triple-backtick separator. Separate the indented code blocks from the preceding paragraph because otherwise Pandoc fails to realise that the whitespace followed by triple-backtick denotes a fenced code block. --- NEWS.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/NEWS.md b/NEWS.md index 3514f783c0..78a4e23008 100644 --- a/NEWS.md +++ b/NEWS.md @@ -26,6 +26,7 @@ 2. `melt()` returns an integer column for `variable` when `measure.vars` is a list of length=1, consistent with the documented behavior, [#5209](https://github.com/Rdatatable/data.table/issues/5209). Thanks to @tdhock for reporting. Any users who were relying on this behavior can change `measure.vars=list("col_name")` (output `variable` was column name, now is column index/integer) to `measure.vars="col_name"` (`variable` still is column name). 
This change has been planned since 1.16.0 (25 Aug 2024). 3. Rolling functions `frollmean` and `frollsum` distinguish `Inf`/`-Inf` from `NA` to match the same rules as base R when `algo="fast"` (previously they were considered the same). If your input into those functions has `Inf` or `-Inf` then you will be affected by this change. As a result, the argument that controls the handling of `NA`s has been renamed from `hasNA` to `has.nf` (_has non-finite_). `hasNA` continues to work with a warning, for now. + ```r ## before frollsum(c(1,2,3,Inf,5,6), 2) @@ -34,8 +35,10 @@ ## now frollsum(c(1,2,3,Inf,5,6), 2) #[1] NA 3 5 Inf Inf 11 + ``` 4. `frollapply` result is not coerced to numeric anymore. Users' code could possibly break if it depends on forced coercion of input/output to numeric type. + ```r ## before frollapply(c(F,T,F,F,F,T), 2, any) @@ -45,6 +48,7 @@ frollapply(c(F,T,F,F,F,T), 2, any) #[1] NA TRUE TRUE FALSE FALSE TRUE ``` + Additionally argument names in `frollapply` has been renamed from `x` to `X` and `n` to `N` to avoid conflicts with common argument names that may be passed to `...`, aligning to base R API of `lapply`. `x` and `n` continue to work with a warning, for now. 5. Negative and missing values of `n` argument of adaptive rolling functions trigger an error. @@ -233,6 +237,7 @@ See [#2611](https://github.com/Rdatatable/data.table/issues/2611) for details. T ``` 18. New helper `frolladapt` to facilitate applying rolling functions over windows of fixed calendar-time width in irregularly-spaced data sets, thereby bypassing the need to "augment" such data with placeholder rows, [#3241](https://github.com/Rdatatable/data.table/issues/3241). Thanks to @jangorecki for implementation. 
+ ```r idx = as.Date("2025-09-05") + c(0,4,7,8,9,10,12,13,17) dt = data.table(index=idx, value=seq_along(idx)) From a661e47603ea4d356ffffc1bb34508e8aa3df98b Mon Sep 17 00:00:00 2001 From: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com> Date: Fri, 26 Dec 2025 19:15:56 +0100 Subject: [PATCH 08/25] make rchk happy (#7520) Co-authored-by: Michael Chirico --- src/subset.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/subset.c b/src/subset.c index b51a33f597..b3372f057b 100644 --- a/src/subset.c +++ b/src/subset.c @@ -331,7 +331,7 @@ SEXP subsetDT(SEXP x, SEXP rows, SEXP cols) { // API change needs update NEWS.md SEXP tmp = PROTECT(R_allocResizableVector(STRSXP, LENGTH(cols)+overAlloc)); nprotect++; R_resizeVector(tmp, LENGTH(cols)); setAttrib(ans, R_NamesSymbol, tmp); - subsetVectorRaw(tmp, getAttrib(x, R_NamesSymbol), cols, /*anyNA=*/false); + subsetVectorRaw(tmp, PROTECT(getAttrib(x, R_NamesSymbol)), cols, /*anyNA=*/false); nprotect++; tmp = PROTECT(allocVector(INTSXP, 2)); nprotect++; INTEGER(tmp)[0] = NA_INTEGER; @@ -341,7 +341,7 @@ SEXP subsetDT(SEXP x, SEXP rows, SEXP cols) { // API change needs update NEWS.md // clear any index that was copied over by copyMostAttrib() above, e.g. 
#1760 and #1734 (test 1678) setAttrib(ans, sym_index, R_NilValue); // but maintain key if ordered subset - SEXP key = getAttrib(x, sym_sorted); + SEXP key = PROTECT(getAttrib(x, sym_sorted)); nprotect++; if (length(key)) { SEXP innames = PROTECT(getAttrib(ans,R_NamesSymbol)); nprotect++; SEXP in = PROTECT(chin(key, innames)); nprotect++; @@ -352,7 +352,7 @@ SEXP subsetDT(SEXP x, SEXP rows, SEXP cols) { // API change needs update NEWS.md setAttrib(ans, sym_sorted, R_NilValue); } else { // make a new key attribute; shorter if i Date: Fri, 2 Jan 2026 22:14:33 +0100 Subject: [PATCH 09/25] escape one frollsd tests for valgrind (#7548) * escape one tests for valgrind * increment skipped count * escape proper one * make test robust to valgrind numerical issues --------- Co-authored-by: Michael Chirico --- inst/tests/froll.Rraw | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/inst/tests/froll.Rraw b/inst/tests/froll.Rraw index 489afcdad0..298ee5c8d1 100644 --- a/inst/tests/froll.Rraw +++ b/inst/tests/froll.Rraw @@ -11,7 +11,7 @@ if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) { exact_NaN = isTRUE(capabilities()["long.double"]) && identical(as.integer(.Machine$longdouble.digits), 64L) if (!exact_NaN) { - cat("\n**** Skipping 7 NaN/NA algo='exact' tests because .Machine$longdouble.digits==", .Machine$longdouble.digits, " (!=64); e.g. under valgrind\n\n", sep="") + cat("\n**** Skipping 8 NaN/NA algo='exact' tests because .Machine$longdouble.digits==", .Machine$longdouble.digits, " (!=64); e.g. 
under valgrind\n\n", sep="") # for Matt when he runs valgrind it is 53, but 64 when running regular R # froll.c uses long double and appears to require full long double accuracy in the algo='exact' } @@ -1448,9 +1448,12 @@ test(6001.727, frollvar(adaptive=TRUE, c(1:2,NA), c(2,0,2), algo="exact"), c(NA_ test(6001.728, frollvar(adaptive=TRUE, c(1:2,NA), c(2,0,2), algo="exact", na.rm=TRUE), c(NA_real_,NA_real_,NA_real_)) test(6001.729, frollvar(adaptive=TRUE, c(1:2,NA), c(2,0,2), algo="exact", na.rm=TRUE, partial=TRUE), c(NA_real_,NA_real_,NA_real_)) test(6001.730, frollvar(adaptive=TRUE, c(1:2,NA), c(2,0,2), fill=99, algo="exact", na.rm=TRUE), c(99,NA,NA)) -y = c(1e8+2.980232e-8, 1e8, 1e8, 1e8) # CLAMP0 test -test(6001.731, frollvar(y, 3)[4L], 0) -test(6001.732, frollsd(y, 3)[4L], 0) +# numerical stability: we need to guarantee frollvar(x, n) >= 0 for all x, n +# the exact epsilon here is a bit implementation-dependent (as in #7546), but what's +# crucial is the output is never negative (or NaN after sqrt() for frollsd). 
+y = c(1e8+2.980232e-8, 1e8, 1e8, 1e8) +test(6001.731, between(frollvar(y, 3)[4L], 0, 1e-7)) +test(6001.732, between(frollsd(y, 3)[4L], 0, 1e-7)) test(6001.733, frollvar(y, c(3,3,3,3), adaptive=TRUE)[4L], 0) test(6001.734, frollsd(y, c(3,3,3,3), adaptive=TRUE)[4L], 0) test(6001.740, frollvar(c(1.5,2.5,2,NA), c(3,3)), list(c(NA,NA,0.25,NA), c(NA,NA,0.25,NA)), output="running sequentially, because outer parallelism has been used", options=c(datatable.verbose=TRUE)) # ensure no nested parallelism in rolling functions #7352 From 23b0a92ecef32e51f887b1ed451637f0d4ad01d1 Mon Sep 17 00:00:00 2001 From: Manmita Das <34617961+manmita@users.noreply.github.com> Date: Fri, 9 Jan 2026 00:53:37 +0530 Subject: [PATCH 10/25] fix(7571): bug fix for narm issue on gforce in int64 case (#7572) * fix(7571): bug fix for narm issue on gforce in int64 case * fix(7571): test sequencing * fix(7571): updated the NEWS.md * trailing newline * Use $V1 * fix(7571): added db optimize 2L * refine NEWS * fix(7571): add more tests and change to code similar to int for gsum * fix(7571): added more tests for mean * eliminate intermediate variable * NEWS again --------- Co-authored-by: Michael Chirico --- NEWS.md | 2 ++ inst/tests/tests.Rraw | 52 +++++++++++++++++++++++++++++++++++++++++++ src/gsumm.c | 12 +++++----- 3 files changed, 60 insertions(+), 6 deletions(-) diff --git a/NEWS.md b/NEWS.md index 78a4e23008..72a579d167 100644 --- a/NEWS.md +++ b/NEWS.md @@ -17,6 +17,8 @@ 3. Vignettes are now built using `litedown` instead of `knitr`, [#6394](https://github.com/Rdatatable/data.table/issues/6394). Thanks @jangorecki for the suggestion and @ben-schwen and @aitap for the implementation. +4. `sum()` by group is correct with missing entries and GForce activated ([#7571](https://github.com/Rdatatable/data.table/issues/7571)). Thanks to @rweberc for the report and @manmita for the fix. The issue was caused by a faulty early `break` that spilled between groups, and resulted in silently incorrect results! 
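To make the "early `break` that spilled between groups" concrete, here is a minimal standalone sketch. It is not the code in `src/gsumm.c` (which works in batches over `my_gx` and `my_low`); the function names, the flat `grp` array and the `NA_INT64` macro are assumptions for illustration only. It assumes the caller zero-initialises `ans` and ignores overflow.

```c
#include <stddef.h>
#include <stdint.h>

#define NA_INT64 INT64_MIN  /* sentinel used here for a missing int64 value */

/* Faulty variant: the first missing value marks its group NA and then
   breaks, so later rows (which may belong to other groups) are skipped. */
void grouped_sum_break(const int64_t *x, const int *grp, size_t n, int64_t *ans) {
    for (size_t i = 0; i < n; i++) {
        if (x[i] == NA_INT64) { ans[grp[i]] = NA_INT64; break; }
        ans[grp[i]] += x[i];
    }
}

/* Corrected variant: a missing value only affects its own group. */
void grouped_sum_skip(const int64_t *x, const int *grp, size_t n, int64_t *ans) {
    for (size_t i = 0; i < n; i++) {
        int64_t *a = &ans[grp[i]];
        if (*a == NA_INT64) continue;                     /* group is already NA */
        if (x[i] == NA_INT64) { *a = NA_INT64; continue; } /* poison this group only */
        *a += x[i];
    }
}
```

Fed the data from test 2355.1 (groups 1 to 8 with one row each, group 9 with two rows, and three leading missing values), the first variant stops at row 1 and leaves every later group at zero, while the second yields NA for the first three groups and the expected sums for the rest.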
+ ## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025 ### BREAKING CHANGE diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index b0a5c690b2..9db0e7e11f 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -21938,3 +21938,55 @@ dimnames(X) = copy(Xdn) DT = as.data.table(X) test(2354.2, dimnames(X), Xdn) rm(X, Xdn, DT) + +#7571 issue for na.rm on int64 +if (test_bit64) local({ + # integer64 + GForce grouped sum with na.rm = FALSE + # Example 1 from issue: ids 1:8, 9, 9; three leading NAs then 4:10 + dt_short = data.table( + id = c(1:8, 9, 9), + value = c(rep(NA_integer64_, 3L), as.integer64(4:10)) + ) + test(2355.1, options=c(datatable.optimize=2L), + dt_short[, sum(value, na.rm = FALSE), by = id]$V1, + as.integer64(c(NA, NA, NA, 4:8, 19)) + ) + + # Example 2 from issue: ids in pairs, same values; checks multi-row groups + dt_short2 = data.table( + id = rep(1:5, each = 2L), + value = c(rep(NA_integer64_, 3L), as.integer64(4:10)) + ) + test(2355.2, options=c(datatable.optimize=2L), + dt_short2[, sum(value, na.rm = FALSE), by = id]$V1, + as.integer64(c(NA, NA, 11, 15, 19)) + ) + + # Test mean for integer64 with NA + dt_mean = data.table( + id = c(1,1,2,2,3,3), + value = as.integer64(c(NA, NA, NA, 20000000, 5, 3)) + ) + test(2355.3, options=c(datatable.optimize=2L), + dt_mean[, mean(value, na.rm=FALSE), by = id]$V1, + c(NA, NA, 4) + ) + + # GForce sum vs base::sum for integer64 + DT = data.table(id = sample(letters, 1000, TRUE), value = as.integer64(sample(c(1:100, NA), 1000, TRUE))) + gforce = DT[, .(gforce_sum = sum(value)), by=id] + base = DT[, .(true_sum = base::sum(value)), by=id] + merged = merge(gforce, base, by="id", all=TRUE) + test(2355.4, options=c(datatable.optimize=2L), + merged$gforce_sum, merged$true_sum + ) + + # GForce mean vs base::mean for integer64 + DTm = data.table(id = sample(letters, 1000, TRUE), value = as.integer64(sample(c(1:100, NA), 1000, TRUE))) + gforce_m = DTm[, 
.(gforce_mean = mean(value)), by=id] + base_m = DTm[, .(true_mean = base::mean(value)), by=id] + merged_m = merge(gforce_m, base_m, by="id", all=TRUE) + test(2355.5, options=c(datatable.optimize=2L), + merged$gforce_mean, merged$true_mean + ) +}) diff --git a/src/gsumm.c b/src/gsumm.c index 5970f59194..ed0bc1b56c 100644 --- a/src/gsumm.c +++ b/src/gsumm.c @@ -502,13 +502,13 @@ SEXP gsum(SEXP x, SEXP narmArg) const int64_t *my_gx = gx + b*batchSize + pos; const uint16_t *my_low = low + b*batchSize + pos; for (int i=0; i Date: Thu, 8 Jan 2026 11:40:05 -0800 Subject: [PATCH 11/25] add @manmita (#7573) --- DESCRIPTION | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index f07f9fd5d2..1a17af80a2 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -105,5 +105,6 @@ Authors@R: c( person(given="@badasahog", role="ctb", comment="GitHub user"), person("Vinit", "Thakur", role="ctb"), person("Mukul", "Kumar", role="ctb"), - person("Ildikó", "Czeller", role="ctb") + person("Ildikó", "Czeller", role="ctb"), + person("Manmita", "Das", role="ctb") ) From 20e25d1e9701125d9bb889b9951b655dcd0c419a Mon Sep 17 00:00:00 2001 From: aitap Date: Mon, 12 Jan 2026 17:50:13 +0000 Subject: [PATCH 12/25] `utils.c`: include `<signal.h>` for `siginfo_t` (#7517) * utils.c: include <signal.h> for siginfo_t POSIX says: > The <signal.h> header shall define the siginfo_t type as a structure So <sys/wait.h> is not enough to see the definition (not just a forward declaration) of siginfo_t. * NEWS entry * Amend NEWS * more robustly define _POSIX_C_SOURCE (h/t Hugh) * tidy up NEWS * -D_POSIX_C_SOURCE=200809L in gitlab CI job for regression test * revert gitlab-ci change --------- Co-authored-by: Michael Chirico Co-authored-by: Michael Chirico --- NEWS.md | 2 ++ src/utils.c | 7 ++++++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index 72a579d167..29258ef2c3 100644 --- a/NEWS.md +++ b/NEWS.md @@ -8,6 +8,8 @@ 1.
Removed use of non-API `ATTRIB`, `SET_ATTRIB`, and `findVar` [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks @aitap for the continued assiduous work here, and @MichaelChirico for the easy fix to replace `findVar` with `R_getVar`. +7. Fixed compilation failure like "error: unknown type name 'siginfo_t'" in v1.18.0 in some strict environments, e.g., FreeBSD, where the header file declaring the POSIX function `waitid` does not transitively include the header file defining the `siginfo_t` type, [#7516](https://github.com/rdatatable/data.table/issues/7516). Thanks to @jszhao for the report and @aitap for the fix. + ### Notes 1. {data.table} now depends on R 3.5.0 (2018). diff --git a/src/utils.c b/src/utils.c index f229786b83..256694a761 100644 --- a/src/utils.c +++ b/src/utils.c @@ -1,5 +1,10 @@ #ifndef _WIN32 -# include <sys/wait.h> +# if !defined(_POSIX_C_SOURCE) || _POSIX_C_SOURCE < 200809L +# undef _POSIX_C_SOURCE +# define _POSIX_C_SOURCE 200809L // required for POSIX (not standard C) features in is_direct_child e.g. 'siginfo_t' +# endif +# include <signal.h> // siginfo_t +# include <sys/wait.h> // waitid #endif #include "data.table.h" From 04363800bc8ef0108d340a45607f38933e2a7b93 Mon Sep 17 00:00:00 2001 From: Ivan K Date: Mon, 12 Jan 2026 21:08:32 +0300 Subject: [PATCH 13/25] Remove unrelated release notes Amend 1c05f11dd03115319e9b39b1dbdb6becad693b78: remove the notes not relevant to the 1.18.2 release. --- NEWS.md | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) diff --git a/NEWS.md b/NEWS.md index 29258ef2c3..413647636f 100644 --- a/NEWS.md +++ b/NEWS.md @@ -8,18 +8,9 @@ 1. Removed use of non-API `ATTRIB`, `SET_ATTRIB`, and `findVar` [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks @aitap for the continued assiduous work here, and @MichaelChirico for the easy fix to replace `findVar` with `R_getVar`. -7.
Fixed compilation failure like "error: unknown type name 'siginfo_t'" in v1.18.0 in some strict environments, e.g., FreeBSD, where the header file declaring the POSIX function `waitid` does not transitively include the header file defining the `siginfo_t` type, [#7516](https://github.com/rdatatable/data.table/issues/7516). Thanks to @jszhao for the report and @aitap for the fix. +2. Fixed compilation failure like "error: unknown type name 'siginfo_t'" in v1.18.0 in some strict environments, e.g., FreeBSD, where the header file declaring the POSIX function `waitid` does not transitively include the header file defining the `siginfo_t` type, [#7516](https://github.com/rdatatable/data.table/issues/7516). Thanks to @jszhao for the report and @aitap for the fix. -### Notes - -1. {data.table} now depends on R 3.5.0 (2018). - -2. pydatatable compatibility layer in `fread()` and `fwrite()` has been removed, [#7069](https://github.com/Rdatatable/data.table/issues/7069). Thanks @badasahog for the report and the PR. - -3. Vignettes are now built using `litedown` instead of `knitr`, [#6394](https://github.com/Rdatatable/data.table/issues/6394). Thanks @jangorecki for the suggestion and @ben-schwen and @aitap for the implementation. - - -4. `sum()` by group is correct with missing entries and GForce activated ([#7571](https://github.com/Rdatatable/data.table/issues/7571)). Thanks to @rweberc for the report and @manmita for the fix. The issue was caused by a faulty early `break` that spilled between groups, and resulted in silently incorrect results! +3. `sum()` by group is correct with missing entries and GForce activated ([#7571](https://github.com/Rdatatable/data.table/issues/7571)). Thanks to @rweberc for the report and @manmita for the fix. The issue was caused by a faulty early `break` that spilled between groups, and resulted in silently incorrect results! 
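For the `siginfo_t` item above, the include pattern can be sketched in isolation as follows. This is a hedged, POSIX-only example rather than the body of `is_direct_child`: the helper `child_exit_status` is hypothetical, and the guard assumes it appears before any system header is pulled in.

```c
/* Request POSIX.1-2008 declarations before including system headers. */
#if !defined(_POSIX_C_SOURCE) || _POSIX_C_SOURCE < 200809L
#  undef _POSIX_C_SOURCE
#  define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>    /* siginfo_t, CLD_EXITED */
#include <sys/types.h> /* pid_t, id_t */
#include <sys/wait.h>  /* waitid, P_PID, WEXITED */
#include <unistd.h>    /* fork, _exit */

/* Hypothetical helper: fork a child that exits with `code`, reap it with
   waitid(), and return the exit status found in siginfo_t (-1 on error). */
int child_exit_status(int code) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(code);  /* child process */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED) != 0) return -1;
    return info.si_code == CLD_EXITED ? info.si_status : -1;
}
```

Including `<signal.h>` explicitly means the declaration of `info` does not depend on whether `<sys/wait.h>` happens to pull in the full `siginfo_t` definition on a given platform.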
## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025 From bcec07ba7c9739a6b3e256371250f9e2a1270c12 Mon Sep 17 00:00:00 2001 From: aitap Date: Mon, 12 Jan 2026 19:39:50 +0000 Subject: [PATCH 14/25] Fix http -> https link that is now a redirect (#7588) Found the following (possibly) invalid URLs: URL: http://stereopsis.com/radix.html (moved to https://stereopsis.com/radix.html) From: man/setkey.Rd man/setorder.Rd Status: 200 Message: OK --- man/setkey.Rd | 2 +- man/setorder.Rd | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/man/setkey.Rd b/man/setkey.Rd index 96e293fd28..f15373c94e 100644 --- a/man/setkey.Rd +++ b/man/setkey.Rd @@ -110,7 +110,7 @@ reference. \references{ \url{https://en.wikipedia.org/wiki/Radix_sort}\cr \url{https://en.wikipedia.org/wiki/Counting_sort}\cr - \url{http://stereopsis.com/radix.html}\cr + \url{https://stereopsis.com/radix.html}\cr \url{https://codercorner.com/RadixSortRevisited.htm}\cr \url{https://cran.r-project.org/package=bit64}\cr \url{https://github.com/Rdatatable/data.table/wiki/Presentations} diff --git a/man/setorder.Rd b/man/setorder.Rd index c810048d4e..b4a346cf18 100644 --- a/man/setorder.Rd +++ b/man/setorder.Rd @@ -113,7 +113,7 @@ If you require a copy, take a copy first (using \code{DT2 = copy(DT)}). See \references{ \url{https://en.wikipedia.org/wiki/Radix_sort}\cr \url{https://en.wikipedia.org/wiki/Counting_sort}\cr - \url{http://stereopsis.com/radix.html}\cr + \url{https://stereopsis.com/radix.html}\cr \url{https://codercorner.com/RadixSortRevisited.htm}\cr \url{https://medium.com/basecs/getting-to-the-root-of-sorting-with-radix-sort-f8e9240d4224} } From a748ab09fd33fb92f7459abdb707dea668dede07 Mon Sep 17 00:00:00 2001 From: Michael Chirico Date: Sun, 11 Jan 2026 13:52:19 -0800 Subject: [PATCH 15/25] Fix latest rchk issues (#7585) * attempt PROTECT for new rchk issues * different approach for longestLevels * use nprotect? 
* no longer using names SEXP * only UNPROTECT near exit * no, that cant be it... * move assignment into loop * REPROTECT approach * reduce diff * reduce diff --- src/assign.c | 6 +++--- src/mergelist.c | 7 ++++--- src/rbindlist.c | 6 ++++-- 3 files changed, 11 insertions(+), 8 deletions(-) diff --git a/src/assign.c b/src/assign.c index 1dd712b41c..156e73c9ee 100644 --- a/src/assign.c +++ b/src/assign.c @@ -52,7 +52,7 @@ void setselfref(SEXP x) { */ static int _selfrefok(SEXP x, Rboolean checkNames, Rboolean verbose) { - SEXP v, p, tag, prot, names; + SEXP v, p, tag, prot; v = getAttrib(x, SelfRefSymbol); if (v==R_NilValue || TYPEOF(v)!=EXTPTRSXP) { // .internal.selfref missing is expected and normal for i) a pre v1.7.8 data.table loaded @@ -70,11 +70,11 @@ static int _selfrefok(SEXP x, Rboolean checkNames, Rboolean verbose) { if (!isNull(p)) internal_error(__func__, ".internal.selfref ptr is neither NULL nor R_NilValue"); // # nocov tag = R_ExternalPtrTag(v); if (!(isNull(tag) || isString(tag))) internal_error(__func__, ".internal.selfref tag is neither NULL nor a character vector"); // # nocov - names = getAttrib(x, R_NamesSymbol); prot = R_ExternalPtrProtected(v); if (TYPEOF(prot) != EXTPTRSXP) // Very rare. Was error(_(".internal.selfref prot is not itself an extptr")). return 0; // # nocov ; see http://stackoverflow.com/questions/15342227/getting-a-random-internal-selfref-error-in-data-table-for-r - return checkNames ? 
names==tag : x==R_ExternalPtrAddr(prot); + if (!checkNames) return x == R_ExternalPtrAddr(prot); + return getAttrib(x, R_NamesSymbol) == tag; } static Rboolean selfrefok(SEXP x, Rboolean verbose) { // for readability diff --git a/src/mergelist.c b/src/mergelist.c index 90854ae824..2ed3950455 100644 --- a/src/mergelist.c +++ b/src/mergelist.c @@ -82,9 +82,10 @@ SEXP cbindlist(SEXP x, SEXP copyArg) { SET_VECTOR_ELT(ans, ians, thisxcol); SET_STRING_ELT(names, ians, STRING_ELT(thisnames, j)); } - mergeIndexAttrib(index, getAttrib(thisx, sym_index)); - if (isNull(key)) // first key is retained - key = getAttrib(thisx, sym_sorted); + mergeIndexAttrib(index, PROTECT(getAttrib(thisx, sym_index))); protecti++; + if (isNull(key)) { // first key is retained + key = PROTECT(getAttrib(thisx, sym_sorted)); protecti++; + } UNPROTECT(protecti); // thisnames, thisxcol } if (!ANY_ATTRIB(index)) diff --git a/src/rbindlist.c b/src/rbindlist.c index 764558c184..b10feb59df 100644 --- a/src/rbindlist.c +++ b/src/rbindlist.c @@ -277,7 +277,9 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg, SEXP ignor int maxType=LGLSXP; // initialize with LGLSXP for test 2002.3 which has col x NULL in both lists to be filled with NA for #1871 bool factor=false, orderedFactor=false; // ordered factor is class c("ordered","factor"). isFactor() is true when isOrdered() is true. 
int longestLen=-1, longestW=-1, longestI=-1; // just for ordered factor; longestLen must be initialized as -1 so that rbind zero-length ordered factor could work #4795 + PROTECT_INDEX ILongestLevels; SEXP longestLevels=R_NilValue; // just for ordered factor + PROTECT_WITH_INDEX(longestLevels, &ILongestLevels); nprotect++; bool int64=false, date=false, posixct=false, itime=false, asis=false; const char *foundName=NULL; bool anyNotStringOrFactor=false; @@ -303,7 +305,7 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg, SEXP ignor if (isOrdered(thisCol)) { orderedFactor = true; int thisLen = length(getAttrib(thisCol, R_LevelsSymbol)); - if (thisLen>longestLen) { longestLen=thisLen; longestLevels=getAttrib(thisCol, R_LevelsSymbol); /*for warnings later ...*/longestW=w; longestI=i; } + if (thisLen > longestLen) { longestLen=thisLen; REPROTECT(longestLevels=getAttrib(thisCol, R_LevelsSymbol), ILongestLevels); /*for warnings later ...*/longestW=w; longestI=i; } } } else if (!isString(thisCol)) anyNotStringOrFactor=true; // even for length 0 columns for consistency; test 2113.3 if (INHERITS(thisCol, char_integer64)) { @@ -562,6 +564,6 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg, SEXP ignor } } } - UNPROTECT(nprotect); // ans, ansNames, coercedForFactor? + UNPROTECT(nprotect); // ans, ansNames, longestLevels? coercedForFactor? 
return(ans); } From 85a4cf5ebdab470a8d9a2da67939ffddd2e02a30 Mon Sep 17 00:00:00 2001 From: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com> Date: Tue, 30 Dec 2025 18:11:24 +0100 Subject: [PATCH 16/25] set automatically allocates new column slots if needed (#7538) * set automatically allocates new column slots if needed * use GetOption1 instead of GetOption * fix test * change froll test * remove assign change * add output statements to test loop * add helper function --- NEWS.md | 2 ++ R/data.table.R | 12 +++++++++++- inst/tests/froll.Rraw | 3 ++- inst/tests/tests.Rraw | 13 +++++++++++-- 4 files changed, 26 insertions(+), 4 deletions(-) diff --git a/NEWS.md b/NEWS.md index 413647636f..a7c75f6862 100644 --- a/NEWS.md +++ b/NEWS.md @@ -12,6 +12,8 @@ 3. `sum()` by group is correct with missing entries and GForce activated ([#7571](https://github.com/Rdatatable/data.table/issues/7571)). Thanks to @rweberc for the report and @manmita for the fix. The issue was caused by a faulty early `break` that spilled between groups, and resulted in silently incorrect results! +2. `set()` now automatically pre-allocates new column slots if needed, similar to what `:=` already does, [#1831](https://github.com/Rdatatable/data.table/issues/1831) [#4100](https://github.com/Rdatatable/data.table/issues/4100). Thanks to @zachokeeffe and @tyner for the report and @ben-schwen for the fix. 
+ ## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025 ### BREAKING CHANGE diff --git a/R/data.table.R b/R/data.table.R index f05220a62b..27c985e44c 100644 --- a/R/data.table.R +++ b/R/data.table.R @@ -2854,10 +2854,20 @@ setcolorder = function(x, neworder=key(x), before=NULL, after=NULL, skip_absent= invisible(x) } +.set_needs_alloccol = function(x, value) { + # automatically allocate more space when tl <= ncol (either full or loaded from disk) + if (truelength(x) <= length(x)) return(TRUE) + if (selfrefok(x, verbose=FALSE) >= 1L) return(FALSE) + # value can be NULL or list with NULLs inside + if (is.null(value)) return(TRUE) + if (!is.list(value)) return(FALSE) + any(vapply_1b(value, is.null)) +} + set = function(x,i=NULL,j,value) # low overhead, loopable { # If removing columns from a table that's not selfrefok, need to call setalloccol first, #7488 - if ((is.null(value) || (is.list(value) && any(vapply_1b(value, is.null)))) && selfrefok(x, verbose=FALSE) < 1L) { + if (.set_needs_alloccol(x, value)) { name = substitute(x) setalloccol(x, verbose=FALSE) if (is.name(name)) { diff --git a/inst/tests/froll.Rraw b/inst/tests/froll.Rraw index 298ee5c8d1..6fd16f9806 100644 --- a/inst/tests/froll.Rraw +++ b/inst/tests/froll.Rraw @@ -2087,7 +2087,8 @@ if (use.fork) { test(6010.772, .selfref.ok(ans[[2L]])) ans = frollapply(1:2, 2, function(x) list(data.table(x)), fill=list(data.table(NA)), simplify=FALSE) test(6010.773, !.selfref.ok(ans[[2L]][[1L]])) - test(6010.7731, set(ans[[2L]][[1L]],, "newcol", 1L), error="data.table has either been loaded from disk") + # deactivated by #5443 + # test(6010.7731, set(ans[[2L]][[1L]],, "newcol", 1L), error="data.table has either been loaded from disk") ans = lapply(ans, lapply, setDT) test(6010.774, .selfref.ok(ans[[2L]][[1L]])) ## fix after ans = frollapply(1:2, 2, function(x) list(data.table(x)), fill=list(data.table(NA)), simplify=function(x) lapply(x, lapply, setDT)) diff --git 
a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 9db0e7e11f..6bf1cde5ad 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -14798,7 +14798,7 @@ test(2016.1, name, "DT") test(2016.2, DT, data.table(a=1:3)) test(2016.3, DT[2,a:=4L], data.table(a=INT(1,4,3))) # no error for := when existing column test(2016.4, set(DT,3L,1L,5L), data.table(a=INT(1,4,5))) # no error for set() when existing column -test(2016.5, set(DT,2L,"newCol",5L), error="either been loaded from disk.*or constructed manually.*Please run setDT.*setalloccol.*on it first") # just set() +test(2016.5, set(DT,2L,"newCol",5L), data.table(a=INT(1,4,5), newCol=INT(NA,5L,NA))) # works since set overallocates #4100 test(2016.6, DT[2,newCol:=6L], data.table(a=INT(1,4,5), newCol=INT(NA,6L,NA))) # := ok (it changes DT in caller) unlink(tt) @@ -19453,7 +19453,7 @@ test(2290.4, DT[, `:=`(a = 2, c := 3)], error="It looks like you re-used `:=` in df = data.frame(a=1:3) setDT(df) attr(df, "att") = 1 -test(2291.1, set(df, NULL, "new", "new"), error="either been loaded from disk.*or constructed manually.*Please run setDT.*setalloccol.*on it first") +test(2291.1, set(df, NULL, "new", "new"), setattr(data.table(a=1:3, new="new"), "att", 1)) # fixed when calling setalloccol before set #4100 # ns-qualified bysub error, #6493 DT = data.table(a = 1) @@ -21990,3 +21990,12 @@ if (test_bit64) local({ merged$gforce_mean, merged$true_mean ) }) + +# re-overallocate in set if quota is reached #496 #1831 #4100 +DT = data.table() +test(2356.1, options=c(datatable.alloccol=1L), {for (i in seq(10L)) set(DT, j = paste0("V",i), value = i); ncol(DT)}, 10L) +DT = structure(list(a = 1, b = 2), class = c("data.table", "data.frame")) +test(2356.2, options=c(datatable.alloccol=1L), set(DT, j="c", value=3), data.table(a=1, b=2, c=3)) +# ensure := and set are consistent if they need to overallocate +DT = data.table(); DT2 = data.table() +test(2356.3, options=c(datatable.alloccol=1L), {for (i in seq(10L)) set(DT, j = 
sprintf("V%d",i), value = i); DT}, {for (i in seq(10)) DT2[, sprintf("V%d",i) := i]; DT2}) From 21bd6b04d488467f7ac1927a35b3b821f9d6b459 Mon Sep 17 00:00:00 2001 From: aitap Date: Wed, 14 Jan 2026 17:42:54 +0000 Subject: [PATCH 17/25] frollmedianFast: avoid reading uninitialised array (#7589) The 'n' array is only initialised if 'even' is true, so skip the comparisons otherwise. Detected by checking with --use-valgrind or performing a frollmedian() with an odd window size under R -d valgrind. Fixes: #7546 --- src/froll.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/src/froll.c b/src/froll.c index 526134095d..f8315c3ec4 100644 --- a/src/froll.c +++ b/src/froll.c @@ -1707,11 +1707,13 @@ void frollmedianFast(const double *x, uint64_t nx, ans_t *ans, int k, double fil snprintf(end(ans->message[3]), 500, _("%s: 's[A] + s[B] == h' is not true\n"), "frollmedianFast"); return; }*/ - if (n[A]!=tail && m[A] == n[A]) { - n[A] = tail; - } - if (n[B]!=tail && m[B] == n[B]) { - n[B] = tail; + if (even) { + if (n[A]!=tail && m[A] == n[A]) { + n[A] = tail; + } + if (n[B]!=tail && m[B] == n[B]) { + n[B] = tail; + } } ansv[j*k+i] = even ? MED2(A, B) : MED(A, B); } From 0add2dc9e542e5f9efcc59aa15f714446add05e9 Mon Sep 17 00:00:00 2001 From: aitap Date: Wed, 14 Jan 2026 19:44:25 +0000 Subject: [PATCH 18/25] `setlevels()`: avoid crash on missing factor values (#7596) Check for missing or out of bounds values and set them to NA. --- NEWS.md | 4 ++++ inst/tests/tests.Rraw | 4 ++++ src/wrappers.c | 7 +++++-- 3 files changed, 13 insertions(+), 2 deletions(-) diff --git a/NEWS.md b/NEWS.md index a7c75f6862..2ebfedb56e 100644 --- a/NEWS.md +++ b/NEWS.md @@ -4,6 +4,10 @@ ## data.table [v1.18.2](https://github.com/Rdatatable/data.table/milestone/44?closed=1) In Development +### BUG FIXES + +1. 
When fixing duplicate factor levels, `setattr()` no longer crashes upon encountering missing factor values, [#7595](https://github.com/Rdatatable/data.table/issues/7595). Thanks to @sindribaldur for the report and @aitap for the fix. + ### Notes 1. Removed use of non-API `ATTRIB`, `SET_ATTRIB`, and `findVar` [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks @aitap for the continued assiduous work here, and @MichaelChirico for the easy fix to replace `findVar` with `R_getVar`. diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 6bf1cde5ad..62ade34eb8 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -21999,3 +21999,7 @@ test(2356.2, options=c(datatable.alloccol=1L), set(DT, j="c", value=3), data.tab # ensure := and set are consistent if they need to overallocate DT = data.table(); DT2 = data.table() test(2356.3, options=c(datatable.alloccol=1L), {for (i in seq(10L)) set(DT, j = sprintf("V%d",i), value = i); DT}, {for (i in seq(10)) DT2[, sprintf("V%d",i) := i]; DT2}) + +# setattr() must not crash for out-of-bounds factor indices when fixing duplicate levels, #7595 +test(2357.1, setattr(factor(c(1, NA), levels = 1), "levels", c("1", "1")), factor(c(1, NA))) +test(2357.2, setattr(structure(c(-999L, 999L), class = "factor", levels = "a"), "levels", c("b", "b")), factor(c(NA, NA), levels = "b")) diff --git a/src/wrappers.c b/src/wrappers.c index 2b26761bfd..fb6aa7f351 100644 --- a/src/wrappers.c +++ b/src/wrappers.c @@ -44,8 +44,11 @@ SEXP setlevels(SEXP x, SEXP levels, SEXP ulevels) { SEXP xchar, newx; xchar = PROTECT(allocVector(STRSXP, nx)); int *ix = INTEGER(x); - for (int i=0; i= 1 && ixi <= nlevels) ? 
STRING_ELT(levels, ix[i]-1) : NA_STRING); + } newx = PROTECT(chmatch(xchar, ulevels, NA_INTEGER)); int *inewx = INTEGER(newx); for (int i=0; i Date: Mon, 19 Jan 2026 12:44:24 +0100 Subject: [PATCH 19/25] Avoid out-of-bounds access in `overlaps` (#7598) * Add tests * overlaps: avoid accessing length-0 vectors in ux If 'ux' contains 0 rows, pretend that all comparisons against its non-existent elements fail. * overlaps: avoid 'lookup' list overflow This used to happen when from[i] was 0. (No match on non-range columns?) * NEWS entry * overlaps: uncomment one more underflow test Technically this one was harmless (and thus not caught by sanitizers) because the preceding VECSEXP header always contained a 0, preventing the branch where VECTOR_ELT() would be called with a negative index. * test formatting * Update src/ijoin.c Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com> * Update src/ijoin.c Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com> * Update src/ijoin.c Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com> * Update inst/tests/tests.Rraw * overlaps: uncomment the remaining underflow test The underflow is covered by already existing tests. --------- Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com> --- NEWS.md | 4 +++- inst/tests/tests.Rraw | 13 +++++++++++++ src/ijoin.c | 25 ++++++++++++------------- 3 files changed, 28 insertions(+), 14 deletions(-) diff --git a/NEWS.md b/NEWS.md index 2ebfedb56e..5ade7d08c8 100644 --- a/NEWS.md +++ b/NEWS.md @@ -8,6 +8,8 @@ 1. When fixing duplicate factor levels, `setattr()` no longer crashes upon encountering missing factor values, [#7595](https://github.com/Rdatatable/data.table/issues/7595). Thanks to @sindribaldur for the report and @aitap for the fix. +2. 
`foverlaps()` no longer crashes due to out-of-bounds access to list and integer vectors when `y` has no rows or the non-range part of the join fails, [#7597](https://github.com/Rdatatable/data.table/issues/7597). Thanks to @nextpagesoft for the report and @aitap for the fix. + ### Notes 1. Removed use of non-API `ATTRIB`, `SET_ATTRIB`, and `findVar` [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks @aitap for the continued assiduous work here, and @MichaelChirico for the easy fix to replace `findVar` with `R_getVar`. @@ -16,7 +18,7 @@ 3. `sum()` by group is correct with missing entries and GForce activated ([#7571](https://github.com/Rdatatable/data.table/issues/7571)). Thanks to @rweberc for the report and @manmita for the fix. The issue was caused by a faulty early `break` that spilled between groups, and resulted in silently incorrect results! -2. `set()` now automatically pre-allocates new column slots if needed, similar to what `:=` already does, [#1831](https://github.com/Rdatatable/data.table/issues/1831) [#4100](https://github.com/Rdatatable/data.table/issues/4100). Thanks to @zachokeeffe and @tyner for the report and @ben-schwen for the fix. +4. `set()` now automatically pre-allocates new column slots if needed, similar to what `:=` already does, [#1831](https://github.com/Rdatatable/data.table/issues/1831) [#4100](https://github.com/Rdatatable/data.table/issues/4100). Thanks to @zachokeeffe and @tyner for the report and @ben-schwen for the fix. 
## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025 diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 62ade34eb8..e89f16907b 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -22003,3 +22003,16 @@ test(2356.3, options=c(datatable.alloccol=1L), {for (i in seq(10L)) set(DT, j = # setattr() must not crash for out-of-bounds factor indices when fixing duplicate levels, #7595 test(2357.1, setattr(factor(c(1, NA), levels = 1), "levels", c("1", "1")), factor(c(1, NA))) test(2357.2, setattr(structure(c(-999L, 999L), class = "factor", levels = "a"), "levels", c("b", "b")), factor(c(NA, NA), levels = "b")) + +# foverlaps shouldn't segfault on 0-row 'y', #7597 +x = data.table(Id="A", StartX=1L, EndX=2L) +y = data.table(Id=character(), StartY=integer(), EndY=integer()) +by.x = c("Id", "StartX", "EndX") +by.y = c("Id", "StartY", "EndY") +setkeyv(y, by.y) +y2 = data.table(Id="none", StartY=integer(1), EndY=integer(1)) +setkeyv(y2, by.y) +test(2363.1, foverlaps(x, y, by.x, by.y), foverlaps(x, y2, by.x, by.y)) +test(2363.2, foverlaps(x, y2, by.x, by.y, type="any", mult="all"), foverlaps(x, y2, by.x, by.y, type="any", mult="first")) +test(2363.3, foverlaps(x, y, by.x, by.y, which=TRUE, mult="first", nomatch=NULL), foverlaps(x, y2, by.x, by.y, which=TRUE, mult="first", nomatch=NULL)) +rm(x, y, y2) diff --git a/src/ijoin.c b/src/ijoin.c index e81e8325fa..737ac61b63 100644 --- a/src/ijoin.c +++ b/src/ijoin.c @@ -223,7 +223,7 @@ SEXP lookup(SEXP ux, SEXP xlen, SEXP indices, SEXP gaps, SEXP overlaps, SEXP mul SEXP overlaps(SEXP ux, SEXP imatches, SEXP multArg, SEXP typeArg, SEXP nomatchArg, SEXP verbose) { - R_len_t uxcols=LENGTH(ux),rows=length(VECTOR_ELT(imatches,0)); + R_len_t uxcols=LENGTH(ux), rows=length(VECTOR_ELT(imatches,0)), xrows=length(VECTOR_ELT(ux,0)); int nomatch = INTEGER(nomatchArg)[0], totlen=0, thislen; int *from = INTEGER(VECTOR_ELT(imatches, 0)); int *to = INTEGER(VECTOR_ELT(imatches, 
1)); @@ -252,8 +252,7 @@ SEXP overlaps(SEXP ux, SEXP imatches, SEXP multArg, SEXP typeArg, SEXP nomatchAr // As a first pass get the final length, so that we can allocate up-front and not deal with R_Calloc + R_Realloc + size calculation hassle // Checked the time for this loop on realisitc data (81m reads) and took 0.27 seconds! No excuses ;). start = clock(); - if (mult == ALL) { - totlen=0; + if (xrows && mult == ALL) { switch (type) { case START: case END: for (int i=0; i 0) ? from[i] : 1; - const int k = from[i]; + const int k = (from[i] > 0) ? from[i] : 1; if (k<=to[i]) totlen += count[k-1]; for (int j=k+1; j<=to[i]; ++j) @@ -340,7 +338,7 @@ SEXP overlaps(SEXP ux, SEXP imatches, SEXP multArg, SEXP typeArg, SEXP nomatchAr // switching mult=ALL,FIRST,LAST separately to // - enhance performance for special cases, and // - easy to fix any bugs in the future - switch (mult) { + if (xrows) switch (mult) { case ALL: switch (type) { case START : case END : @@ -402,8 +400,7 @@ SEXP overlaps(SEXP ux, SEXP imatches, SEXP multArg, SEXP typeArg, SEXP nomatchAr case ANY : for (int i=0; i0) ? from[i] : 1; - const int k = from[i]; + const int k = (from[i]>0) ? from[i] : 1; if (k<=to[i]) { tmp1 = VECTOR_ELT(lookup, k-1); for (int m=0; m0) ? from[i] : 1; - const int k = from[i]; + const int k = (from[i]>0) ? from[i] : 1; for (int j=k; j<=to[i]; ++j) { if (type_count[j-1]) { tmp2 = VECTOR_ELT(type_lookup, j-1); @@ -559,7 +555,7 @@ SEXP overlaps(SEXP ux, SEXP imatches, SEXP multArg, SEXP typeArg, SEXP nomatchAr ++thislen; ++j; ++m; break; } else if ( INTEGER(tmp1)[j] > INTEGER(tmp2)[m] ) { - ++m;; + ++m; } else ++j; } } @@ -659,8 +655,7 @@ SEXP overlaps(SEXP ux, SEXP imatches, SEXP multArg, SEXP typeArg, SEXP nomatchAr for (int i=0; i0) ? from[i] : 1; - const int k = from[i]; + const int k = (from[i]>0) ? 
from[i] : 1; if (k <= to[i]) { if (k==to[i] && count[k-1]) { tmp1 = VECTOR_ELT(lookup, k-1); @@ -723,6 +718,10 @@ SEXP overlaps(SEXP ux, SEXP imatches, SEXP multArg, SEXP typeArg, SEXP nomatchAr } break; default: internal_error(__func__, "unknown mult: %d", mult); // # nocov + } else if (totlen) { + int *f1i = INTEGER(f1__), *f2i = INTEGER(f2__); + for (R_len_t i = 0; i < totlen; ++i) f1i[i] = i+1; + for (R_len_t i = 0; i < totlen; ++i) f2i[i] = nomatch; } end2 = clock() - start; if (LOGICAL(verbose)[0]) From 2a9feba770274e47160d658b6b5db0f43347cef3 Mon Sep 17 00:00:00 2001 From: aitap Date: Tue, 20 Jan 2026 06:47:40 +0000 Subject: [PATCH 20/25] Only export R_init_data_table (#7607) This will avoid name clashes between data.table functions (now hidden) and other functions in the global namespace visible to the shared library loader. Fixes: #7605 --- src/Makevars.in | 4 ++-- src/data.table-win.def | 3 +++ 2 files changed, 5 insertions(+), 2 deletions(-) create mode 100644 src/data.table-win.def diff --git a/src/Makevars.in b/src/Makevars.in index fcfaceba99..8cf8db729c 100644 --- a/src/Makevars.in +++ b/src/Makevars.in @@ -1,5 +1,5 @@ -PKG_CFLAGS = @PKG_CFLAGS@ @openmp_cflags@ @zlib_cflags@ -PKG_LIBS = @PKG_LIBS@ @openmp_cflags@ @zlib_libs@ +PKG_CFLAGS = $(C_VISIBILITY) @PKG_CFLAGS@ @openmp_cflags@ @zlib_cflags@ +PKG_LIBS = $(C_VISIBILITY) @PKG_LIBS@ @openmp_cflags@ @zlib_libs@ # See WRE $1.2.1.1. But retain user supplied PKG_* too, #4664. # WRE states ($1.6) that += isn't portable and that we aren't allowed to use it. # Otherwise we could use the much simpler PKG_LIBS += @openmp_cflags@ -lz. 
diff --git a/src/data.table-win.def b/src/data.table-win.def new file mode 100644 index 0000000000..db64187670 --- /dev/null +++ b/src/data.table-win.def @@ -0,0 +1,3 @@ +LIBRARY data.table.dll +EXPORTS + R_init_data_table From c54ae16f0b708483e84ff7126db56e74ab327b06 Mon Sep 17 00:00:00 2001 From: aitap Date: Tue, 20 Jan 2026 09:21:30 +0000 Subject: [PATCH 21/25] `set()`: only reallocate the table if resizing would fail otherwise (#7606) * Regression tests * set(): only reallocate if resizing would fail * Update R/data.table.R Co-authored-by: Michael Chirico * Rename test variables Co-Authored-By: Michael Chirico * Cache j %chin% names(x) Co-Authored-By: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com --------- Co-authored-by: Michael Chirico Co-authored-by: Michael Chirico --- R/data.table.R | 29 +++++++++++++++++++++-------- inst/tests/tests.Rraw | 12 ++++++++++++ 2 files changed, 33 insertions(+), 8 deletions(-) diff --git a/R/data.table.R b/R/data.table.R index 27c985e44c..fe492a0aef 100644 --- a/R/data.table.R +++ b/R/data.table.R @@ -2854,20 +2854,33 @@ setcolorder = function(x, neworder=key(x), before=NULL, after=NULL, skip_absent= invisible(x) } -.set_needs_alloccol = function(x, value) { +.set_needs_alloccol = function(x, j, value) { + # set() will try to resize x when adding or removing columns + # when removing a column, value can be NULL or list with NULLs inside + removing = is.null(value) || (is.list(value) && length(value) == length(j) && any(vapply_1b(value, is.null))) + # columns can be created by name + adding = if (is.character(j)) { + jexists = j %chin% names(x) + !all(jexists) + } else FALSE + + if (!(removing || adding)) return(FALSE) + # automatically allocate more space when tl <= ncol (either full or loaded from disk) - if (truelength(x) <= length(x)) return(TRUE) - if (selfrefok(x, verbose=FALSE) >= 1L) return(FALSE) - # value can be NULL or list with NULLs inside - if (is.null(value)) return(TRUE) - if 
(!is.list(value)) return(FALSE) - any(vapply_1b(value, is.null)) + # (or if a resize operation would otherwise fail) + if (selfrefok(x, verbose=FALSE) < 1L || truelength(x) <= length(x)) + return(TRUE) + + if (adding) + return(truelength(x) < length(x) + sum(!jexists)) + + FALSE } set = function(x,i=NULL,j,value) # low overhead, loopable { # If removing columns from a table that's not selfrefok, need to call setalloccol first, #7488 - if (.set_needs_alloccol(x, value)) { + if (.set_needs_alloccol(x, j, value)) { name = substitute(x) setalloccol(x, verbose=FALSE) if (is.name(name)) { diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index e89f16907b..40f778b304 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -22016,3 +22016,15 @@ test(2363.1, foverlaps(x, y, by.x, by.y), foverlaps(x, y2, by.x, by.y)) test(2363.2, foverlaps(x, y2, by.x, by.y, type="any", mult="all"), foverlaps(x, y2, by.x, by.y, type="any", mult="first")) test(2363.3, foverlaps(x, y, by.x, by.y, which=TRUE, mult="first", nomatch=NULL), foverlaps(x, y2, by.x, by.y, which=TRUE, mult="first", nomatch=NULL)) rm(x, y, y2) + +# internal use of set() causes non-resizable data.tables to be re-assigned in the wrong frame, #7604 +# bmerge -> coerce_col +x = structure(list(a = as.double(2:3), b = list("foo", "bar")), class = c("data.table", "data.frame")) +y = structure(list(a = 1:3), class = c("data.table", "data.frame")) +test(2364.1, x[y, on = "a"], data.table(a = 1:3, b = list(NULL, "foo", "bar"))) +x = structure(list(a = factor("a", levels = letters)), class = c("data.table", "data.frame")) +y = data.table(a = factor("a", levels = letters)) +setdroplevels(x) +setdroplevels(y) +test(2364.2, levels(x$a), levels(y$a)) +rm(x, y) From d234fb577cb7caff8f47c14b6e7a9032faca9e94 Mon Sep 17 00:00:00 2001 From: aitap Date: Tue, 20 Jan 2026 14:53:28 +0000 Subject: [PATCH 22/25] NEWS entry for #7607 (#7608) * NEWS entry for #7607 * More about the problem being solved Co-Authored-By: Benjamin 
Schwendinger <52290390+ben-schwen@users.noreply.github.com> --- NEWS.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/NEWS.md b/NEWS.md index 5ade7d08c8..f77204413f 100644 --- a/NEWS.md +++ b/NEWS.md @@ -10,6 +10,8 @@ 2. `foverlaps()` no longer crashes due to out-of-bounds access to list and integer vectors when `y` has no rows or the non-range part of the join fails, [#7597](https://github.com/Rdatatable/data.table/issues/7597). Thanks to @nextpagesoft for the report and @aitap for the fix. +3. The dynamic library now exports only `R_init_data_table`, preventing symbol name conflicts like `hash_create` with PostgreSQL, [#7605](https://github.com/Rdatatable/data.table/issues/7605). Thanks to @ced75 for the report and @aitap for the fix + ### Notes 1. Removed use of non-API `ATTRIB`, `SET_ATTRIB`, and `findVar` [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks @aitap for the continued assiduous work here, and @MichaelChirico for the easy fix to replace `findVar` with `R_getVar`. From b3b7499d540c4ed49ed2a7a3f01bc2999b57c28f Mon Sep 17 00:00:00 2001 From: aitap Date: Sat, 24 Jan 2026 17:29:46 +0000 Subject: [PATCH 23/25] `tests/froll.R`: disable `mcparallel` under Valgrind (#7621) --- tests/froll.R | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tests/froll.R b/tests/froll.R index faf69d28b8..d4ca87f9db 100644 --- a/tests/froll.R +++ b/tests/froll.R @@ -1,4 +1,10 @@ -Sys.setenv(OMP_THREAD_LIMIT = Sys.getenv("OMP_THREAD_LIMIT", "2")) +# Valgrind developers say that performing work after fork() without +# exec() is problematic for Valgrind. frollapply() uses +# parallel::mcparallel(), which causes Valgrind to run +# Rstd_ReadConsole() incorrectly. 
+Sys.setenv(OMP_THREAD_LIMIT = + if (grepl("valgrind", Sys.getenv("LD_PRELOAD"))) "1" else Sys.getenv("OMP_THREAD_LIMIT", "2") +) require(data.table) test.data.table(script="froll.Rraw") test.data.table(script="frollBatch.Rraw", optional=TRUE) From e4c5317e4febbd7f05cadb5becfaa4d68aa5778f Mon Sep 17 00:00:00 2001 From: Benjamin Schwendinger Date: Tue, 27 Jan 2026 15:15:54 +0100 Subject: [PATCH 24/25] update NEWS --- NEWS.md | 40 +++++++++++++++++++++++----------------- 1 file changed, 23 insertions(+), 17 deletions(-) diff --git a/NEWS.md b/NEWS.md index a5e73e24ef..769055c77f 100644 --- a/NEWS.md +++ b/NEWS.md @@ -30,39 +30,45 @@ 1. `fread()` with `skip=0` and `(header=TRUE|FALSE)` no longer skips the first row when it has fewer fields than subsequent rows, [#7463](https://github.com/Rdatatable/data.table/issues/7463). Thanks @emayerhofer for the report and @ben-schwen for the fix. -2. `set()` now automatically pre-allocates new column slots if needed, similar to what `:=` already does, [#1831](https://github.com/Rdatatable/data.table/issues/1831) [#4100](https://github.com/Rdatatable/data.table/issues/4100). Thanks to @zachokeeffe and @tyner for the report and @ben-schwen for the fix. +2. `fread("file://...")` works for file URIs with spaces, [#7550](https://github.com/Rdatatable/data.table/issues/7550). Thanks @aitap for the report and @MichaelChirico for the PR. -3. `fread("file://...")` works for file URIs with spaces, [#7550](https://github.com/Rdatatable/data.table/issues/7550). Thanks @aitap for the report and @MichaelChirico for the PR. +3. `fread(text=)` could segfault when reading text input ending with a `\x1a` (ASCII SUB) character after a long line, [#7407](https://github.com/Rdatatable/data.table/issues/7407) which is solved by adding check for eof. Thanks @aitap for the report and @manmita for the fix. -4. `sum()` by group is correct with missing entries and GForce activated ([#7571](https://github.com/Rdatatable/data.table/issues/7571)). 
Thanks to @rweberc for the report and @manmita for the fix. The issue was caused by a faulty early `break` that spilled between groups, and resulted in silently incorrect results! +4. `rowwiseDT()` now provides a helpful error message when a complex object that is not a list (e.g., a function) is provided as a cell value, instructing the user to wrap it in `list()`, [#7219](https://github.com/Rdatatable/data.table/issues/7219). Thanks @kylebutts for the report and @venom1204 for the fix. -5. `fread(text=)` could segfault when reading text input ending with a `\x1a` (ASCII SUB) character after a long line, [#7407](https://github.com/Rdatatable/data.table/issues/7407) which is solved by adding check for eof. Thanks @aitap for the report and @manmita for the fix. +### Notes -6. `rowwiseDT()` now provides a helpful error message when a complex object that is not a list (e.g., a function) is provided as a cell value, instructing the user to wrap it in `list()`, [#7219](https://github.com/Rdatatable/data.table/issues/7219). Thanks @kylebutts for the report and @venom1204 for the fix. +1. {data.table} now depends on R 3.5.0 (2018). -7. Fixed compilation failure like "error: unknown type name 'siginfo_t'" in v1.18.0 in some strict environments, e.g., FreeBSD, where the header file declaring the POSIX function `waitid` does not transitively include the header file defining the `siginfo_t` type, [#7516](https://github.com/rdatatable/data.table/issues/7516). Thanks to @jszhao for the report and @aitap for the fix. +2. pydatatable compatibility layer in `fread()` and `fwrite()` has been removed, [#7069](https://github.com/Rdatatable/data.table/issues/7069). Thanks @badasahog for the report and the PR. -8. When fixing duplicate factor levels, `setattr()` no longer crashes upon encountering missing factor values, [#7595](https://github.com/Rdatatable/data.table/issues/7595). Thanks to @sindribaldur for the report and @aitap for the fix. +3. 
Vignettes are now built using `litedown` instead of `knitr`, [#6394](https://github.com/Rdatatable/data.table/issues/6394). Thanks @jangorecki for the suggestion and @ben-schwen and @aitap for the implementation. -9. `foverlaps()` no longer crashes due to out-of-bounds access to list and integer vectors when `y` has no rows or the non-range part of the join fails, [#7597](https://github.com/Rdatatable/data.table/issues/7597). Thanks to @nextpagesoft for the report and @aitap for the fix. +4. The data.table test suite is a bit more robust to lacking UTF-8 support via a new `requires_utf8` argument to `test()` to skip tests when UTF-8 support is not available, [#7336](https://github.com/Rdatatable/data.table/issues/7336). Thanks @MichaelChirico for the suggestion and @ben-schwen for the implementation. -10. The dynamic library now exports only `R_init_data_table`, preventing symbol name conflicts like `hash_create` with PostgreSQL, [#7605](https://github.com/Rdatatable/data.table/issues/7605). Thanks to @ced75 for the report and @aitap for the fix. +5. `melt()` and `dcast()` no longer provide nudges when receiving incompatible inputs (e.g. data.frames). As of now, we only define methods for `data.table` inputs. -### Notes +## data.table [v1.18.2.1](https://github.com/Rdatatable/data.table/milestone/34?closed=1) (22 January 2026) -1. {data.table} now depends on R 3.5.0 (2018). +### BUG FIXES -2. pydatatable compatibility layer in `fread()` and `fwrite()` has been removed, [#7069](https://github.com/Rdatatable/data.table/issues/7069). Thanks @badasahog for the report and the PR. +1. When fixing duplicate factor levels, `setattr()` no longer crashes upon encountering missing factor values, [#7595](https://github.com/Rdatatable/data.table/issues/7595). Thanks to @sindribaldur for the report and @aitap for the fix. -3. Vignettes are now built using `litedown` instead of `knitr`, [#6394](https://github.com/Rdatatable/data.table/issues/6394). 
Thanks @jangorecki for the suggestion and @ben-schwen and @aitap for the implementation. +2. `foverlaps()` no longer crashes due to out-of-bounds access to list and integer vectors when `y` has no rows or the non-range part of the join fails, [#7597](https://github.com/Rdatatable/data.table/issues/7597). Thanks to @nextpagesoft for the report and @aitap for the fix. + +3. The dynamic library now exports only `R_init_data_table`, preventing symbol name conflicts like `hash_create` with PostgreSQL, [#7605](https://github.com/Rdatatable/data.table/issues/7605). Thanks to @ced75 for the report and @aitap for the fix + +### Notes + +1. Removed use of non-API `ATTRIB`, `SET_ATTRIB`, and `findVar` [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks @aitap for the continued assiduous work here, and @MichaelChirico for the easy fix to replace `findVar` with `R_getVar`. -4. Removed use of non-API `ATTRIB`, `SET_ATTRIB`, and `findVar` [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks @aitap for the continued assiduous work here, and @MichaelChirico for the easy fix to replace `findVar` with `R_getVar`. +2. Fixed compilation failure like "error: unknown type name 'siginfo_t'" in v1.18.0 in some strict environments, e.g., FreeBSD, where the header file declaring the POSIX function `waitid` does not transitively include the header file defining the `siginfo_t` type, [#7516](https://github.com/rdatatable/data.table/issues/7516). Thanks to @jszhao for the report and @aitap for the fix. -5. The data.table test suite is a bit more robust to lacking UTF-8 support via a new `requires_utf8` argument to `test()` to skip tests when UTF-8 support is not available, [#7336](https://github.com/Rdatatable/data.table/issues/7336). Thanks @MichaelChirico for the suggestion and @ben-schwen for the implementation. +3. `sum()` by group is correct with missing entries and GForce activated ([#7571](https://github.com/Rdatatable/data.table/issues/7571)). 
Thanks to @rweberc for the report and @manmita for the fix. The issue was caused by a faulty early `break` that spilled between groups, and resulted in silently incorrect results! -6. `melt()` and `dcast()` no longer provide nudges when receiving incompatible inputs (e.g. data.frames). As of now, we only define methods for `data.table` inputs. +4. `set()` now automatically pre-allocates new column slots if needed, similar to what `:=` already does, [#1831](https://github.com/Rdatatable/data.table/issues/1831) [#4100](https://github.com/Rdatatable/data.table/issues/4100). Thanks to @zachokeeffe and @tyner for the report and @ben-schwen for the fix. -## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025 +## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) (23 December 2025) ### BREAKING CHANGE From de3750e3da6d4369f6a9fe1909dc9d708e93a62a Mon Sep 17 00:00:00 2001 From: Benjamin Schwendinger Date: Tue, 27 Jan 2026 15:23:59 +0100 Subject: [PATCH 25/25] fix merge --- NEWS.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/NEWS.md b/NEWS.md index f7270c283b..24a0788360 100644 --- a/NEWS.md +++ b/NEWS.md @@ -68,11 +68,7 @@ 4. `set()` now automatically pre-allocates new column slots if needed, similar to what `:=` already does, [#1831](https://github.com/Rdatatable/data.table/issues/1831) [#4100](https://github.com/Rdatatable/data.table/issues/4100). Thanks to @zachokeeffe and @tyner for the report and @ben-schwen for the fix. -<<<<<<< HEAD ## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) (23 December 2025) -======= -## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025 ->>>>>>> patch-1.18.2 ### BREAKING CHANGE