5-6thWeek of GSoC

By Mukul Kumar

GSoC Week 5–6 Progress: Named Vectors, Added Classed Error Conditions, and Clear Docs

Hi everyone!

Here’s my progress update covering Weeks 5 and 6 of GSoC. These two weeks have been focused on usability and developer experience improvements in data.table. While they weren’t about deep C-level work, they helped make things easier and clearer for both users and developers.

Row Names from Named Vectors in data.table() — Now Supported!

One of the things I worked on was adding row name extraction for named vectors when using:

data.table(x = named_vector, keep.rownames = TRUE)

Before this change, keep.rownames only worked for matrix or data.frame inputs, but not for plain named vectors. That wasn’t consistent with how data.frame() behaves.

After some discussion in the issue page, I introduced logic in as.data.table.list() to:

  • Detect the first named atomic vector in the input.
  • Extract its names as a new column called "rn" or a custom name.
  • Remove the names from the vector to avoid duplication.

Example

Here’s what it looks like now:

data.table(x = setNames((1:12)^3, month.abb), keep.rownames = TRUE)

Before:

         x
     <num>
 1:     1
 2:     8
...
12:  1728

After:

        rn     x
    <char> <num>
 1:    Jan     1
...
12:    Dec  1728

It’s a small thing, but it smooths out an edge case for people coming from base R. You can check out PR #7103 for full details.

Smarter, Programmatic Error Handling with Classed Conditions

Another thing I focused on was making data.table errors easier to work with in R scripts.

Right now, data.table users can catch errors using tryCatch(..., error = ...). But there wasn’t an easy way to distinguish between different types of errors without parsing error messages.

What I Added

In PR #7134, I added four new strategic condition classes:

  • dt_missing_column_error
  • dt_unsortable_type_error
  • dt_join_type_mismatch_error
  • dt_invalid_input_error

How to Use It

Now users can catch specific error types like this:

tryCatch({
  some_data_table_code()
}, dt_missing_column_error = function(e) {
  message("A specific column was missing!")
})

This builds on the infrastructure introduced in an earlier PR with raise_condition(). And just like we did before, I added a dedicated help page:

?datatable-condition-classes

Documentation and Linter

A new linter under .ci/linters/rd/ ensures all condition classes are properly documented — same idea as what we did for data.table-options earlier. This sets things up so that as new error classes get added in the future, the docs and code will stay in sync automatically.

Fixing Long-Standing Help Text Confusion with with=

Finally, I tackled a smaller but meaningful documentation improvement in PR #7155.

The help page for data.table() had a confusing statement that said:

“x[, cols] is equivalent to x[, ..cols]…”

That’s not actually true.

  • x[, cols] literally looks for a column called "cols" in data.table.
  • Only x[, ..cols] or x[, cols, with=FALSE] correctly look up a variable cols in the parent environment.

What I Changed

So I rewrote that section of the help text to:

  • Remove the incorrect equivalence claim.
  • Add clearer, up-to-date examples showing how to handle dynamic column selection in data.table.

Even though it’s not a code change, keeping documentation clear and accurate is just as important, especially for a widely used package like data.table.

Closing Thoughts

Weeks 5–6 felt really productive. Instead of just fixing one-off issues, I got to work on things that improve the overall developer experience for data.table — from smarter error handling to clearer documentation to smoother named vector behavior.

Next up, I’m looking forward to tackling some deeper technical tasks. Thanks for reading and following along!

— Mukul

Share: LinkedIn