GSoC Week 5–6 Progress: Named Vectors, Added Classed Error Conditions, and Clear Docs
Hi everyone!
Here’s my progress update covering Weeks 5 and 6 of GSoC. These two weeks have been focused on usability and developer experience improvements in data.table
. While they weren’t about deep C-level work, they helped make things easier and clearer for both users and developers.
Row Names from Named Vectors in data.table()
— Now Supported!
One of the things I worked on was adding row name extraction for named vectors when using:
data.table(x = named_vector, keep.rownames = TRUE)
Before this change, keep.rownames
only worked for matrix or data.frame
inputs, but not for plain named vectors. That wasn’t consistent with how data.frame()
behaves.
After some discussion in the issue page, I introduced logic in as.data.table.list()
to:
- Detect the first named atomic vector in the input.
- Extract its names as a new column called
"rn"
or a custom name. - Remove the names from the vector to avoid duplication.
Example
Here’s what it looks like now:
data.table(x = setNames((1:12)^3, month.abb), keep.rownames = TRUE)
Before:
x
<num>
1: 1
2: 8
...
12: 1728
After:
rn x
<char> <num>
1: Jan 1
...
12: Dec 1728
It’s a small thing, but it smooths out an edge case for people coming from base R. You can check out PR #7103 for full details.
Smarter, Programmatic Error Handling with Classed Conditions
Another thing I focused on was making data.table
errors easier to work with in R scripts.
Right now, data.table
users can catch errors using tryCatch(..., error = ...)
. But there wasn’t an easy way to distinguish between different types of errors without parsing error messages.
What I Added
In PR #7134, I added four new strategic condition classes:
dt_missing_column_error
dt_unsortable_type_error
dt_join_type_mismatch_error
dt_invalid_input_error
How to Use It
Now users can catch specific error types like this:
tryCatch({
some_data_table_code()
}, dt_missing_column_error = function(e) {
message("A specific column was missing!")
})
This builds on the infrastructure introduced in an earlier PR with raise_condition()
. And just like we did before, I added a dedicated help page:
?datatable-condition-classes
Documentation and Linter
A new linter under .ci/linters/rd/
ensures all condition classes are properly documented — same idea as what we did for data.table-options
earlier. This sets things up so that as new error classes get added in the future, the docs and code will stay in sync automatically.
Fixing Long-Standing Help Text Confusion with with=
Finally, I tackled a smaller but meaningful documentation improvement in PR #7155.
The help page for data.table()
had a confusing statement that said:
“x[, cols] is equivalent to x[, ..cols]…”
That’s not actually true.
x[, cols]
literally looks for a column called"cols"
indata.table
.- Only
x[, ..cols]
orx[, cols, with=FALSE]
correctly look up a variablecols
in the parent environment.
What I Changed
So I rewrote that section of the help text to:
- Remove the incorrect equivalence claim.
- Add clearer, up-to-date examples showing how to handle dynamic column selection in
data.table
.
Even though it’s not a code change, keeping documentation clear and accurate is just as important, especially for a widely used package like data.table
.
Closing Thoughts
Weeks 5–6 felt really productive. Instead of just fixing one-off issues, I got to work on things that improve the overall developer experience for data.table
— from smarter error handling to clearer documentation to smoother named vector behavior.
Next up, I’m looking forward to tackling some deeper technical tasks. Thanks for reading and following along!
— Mukul