
phinterval is a package for representing and manipulating time spans
that may contain gaps. It implements the <phinterval>
(think “potentially-holey-interval”) vector class, designed as an
extension of the {lubridate}
<Interval>, to represent spans of time that are
contiguous, disjoint, empty, or missing.
Functionality for manipulating these spans includes:
Install the released version from CRAN with:
install.packages("phinterval")

You can install the development version of phinterval from GitHub with:
# install.packages("pak")
pak::pak("EthanSansom/phinterval")

Each element of a <phinterval> vector is a set of
non-overlapping and non-adjacent intervals. For scalar intervals (one
span per element), phinterval() works like
lubridate::interval():
library(phinterval)
library(lubridate, warn.conflicts = FALSE)
# Create scalar phintervals (equivalent to interval())
phinterval(
  start = ymd(c("2000-01-01", "2000-01-03", "2000-01-04")),
  end = ymd(c("2000-01-02", "2000-01-05", "2000-01-09"))
)
#> <phinterval<UTC>[3]>
#> [1] {2000-01-01--2000-01-02} {2000-01-03--2000-01-05} {2000-01-04--2000-01-09}

To create phintervals with multiple disjoint spans per element, use
the by argument to group intervals. Overlapping or adjacent
spans within each group are automatically merged:
# Create a phinterval with disjoint spans using the by argument
phint <- phinterval(
  start = ymd(c("2000-01-03", "2000-01-01", "2000-01-04")),
  end = ymd(c("2000-01-05", "2000-01-02", "2000-01-09")),
  by = c(1, 2, 2)
)
phint
#> <phinterval<UTC>[2]>
#> [1] {2000-01-03--2000-01-05}
#> [2] {2000-01-01--2000-01-02, 2000-01-04--2000-01-09}

Graphically, the elements of phint are represented as:
In most cases, a <phinterval> vector will appear
as the result of manipulating <Interval> vectors. For
example, phint_squash() flattens a vector of time spans
into a scalar <phinterval>.
jan_1_to_9 <- interval(ymd("2000-01-01"), ymd("2000-01-09"))
jan_1_to_2 <- interval(ymd("2000-01-01"), ymd("2000-01-02"))
jan_3_to_5 <- interval(ymd("2000-01-03"), ymd("2000-01-05"))
jan_4_to_9 <- interval(ymd("2000-01-04"), ymd("2000-01-09"))
ints <- c(jan_1_to_2, jan_3_to_5, jan_4_to_9)
phint_squash(ints)
#> <phinterval<UTC>[1]>
#> [1] {2000-01-01--2000-01-02, 2000-01-03--2000-01-09}

The squashed result contains every time span covered by at least one of the input intervals, with overlapping or adjacent spans merged so that no time is counted twice.

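
The merging rule can be illustrated in base R. This is only a sketch of the idea, not the package's C++ implementation: squash_spans() is a hypothetical helper, spans are plain numbers (here, days of January 2000), and missing values are not handled.

```r
# Hypothetical helper (not part of phinterval): merge overlapping or
# adjacent spans given as numeric start/end vectors.
squash_spans <- function(start, end) {
  stopifnot(length(start) == length(end), all(start <= end))
  if (length(start) == 0) {
    return(data.frame(start = numeric(0), end = numeric(0)))
  }

  # Sort spans by start so each span only needs comparing to the previous one
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]

  out_start <- start[1]
  out_end <- end[1]
  for (i in seq_along(start)[-1]) {
    last <- length(out_end)
    if (start[i] <= out_end[last]) {
      # Overlapping or adjacent: extend the current span
      out_end[last] <- max(out_end[last], end[i])
    } else {
      # A gap: begin a new span
      out_start <- c(out_start, start[i])
      out_end <- c(out_end, end[i])
    }
  }
  data.frame(start = out_start, end = out_end)
}

# Jan 1-2, Jan 3-5, and Jan 4-9, as day numbers
squash_spans(start = c(1, 3, 4), end = c(2, 5, 9))
```

The result has two rows, 1 to 2 and 3 to 9, mirroring the two spans in the phint_squash() output above.
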
The phinterval package is most useful when working with tabular data, such as a longitudinal employment panel.
library(dplyr, warn.conflicts = FALSE)
jobs <- tribble(
  ~name,  ~job_title,             ~start,       ~end,
  "Greg", "Mascot",               "2018-01-01", "2018-06-03",
  "Greg", "Executive Assistant",  "2018-06-10", "2020-04-01",
  "Shiv", "Political Consultant", "2017-01-01", "2019-04-01"
)
employment <- jobs |>
  # Squash overlapping/adjacent intervals into a single phinterval
  group_by(name) |>
  summarize(employed = datetime_squash(ymd(start), ymd(end))) |>
  # Invert the employment timeline to find gaps
  mutate(unemployed = phint_invert(employed))
employment
#> # A tibble: 2 × 3
#> name employed unemployed
#> <chr> <phint<UTC>> <phint<UTC>>
#> 1 Greg {2018-01-01-[2]-2020-04-01} {2018-06-03--2018-06-10}
#> 2 Shiv {2017-01-01--2019-04-01} <hole>

<phinterval> column formatting adapts to the
available console width. The "[2]" in Greg’s employment
interval "{2018-01-01-[2]-2020-04-01}" indicates that his
employment history is made up of two disjoint spans, with the first span
beginning on 2018-01-01 and the second ending on 2020-04-01. When more
space is available, every span is shown explicitly.
employment |> select(name, employed)
#> # A tibble: 2 × 2
#> name employed
#> <chr> <phint<UTC>>
#> 1 Greg {2018-01-01--2018-06-03, 2018-06-10--2020-04-01}
#> 2 Shiv {2017-01-01--2019-04-01}

Operations on <phinterval> vectors behave like
those on standard intervals. Here, we can see that there was a 7-day gap
in Greg’s employment history:
employment |>
  mutate(
    days_employed = employed / ddays(1),
    days_unemployed = unemployed / ddays(1)
  ) |>
  select(name, days_employed, days_unemployed)
#> # A tibble: 2 × 3
#> name days_employed days_unemployed
#> <chr> <dbl> <dbl>
#> 1 Greg 814 7
#> 2 Shiv 820 0

The <phinterval> class is a generalization of the
<Interval> class, meaning any
<Interval> can be converted into an equivalent
<phinterval> and all phinterval functions accept
either <Interval> or <phinterval>
inputs. The table below shows the lubridate functions that have drop-in
phinterval replacements.
| phinterval | lubridate | Returns |
|---|---|---|
| phinterval(start, end) | interval(start, end) | Spans bounded by start/end |
| phint_intersect(x, y) | intersect(x, y) | Times in x and y |
| phint_setdiff(x, y) | setdiff(x, y) | Times in x, but not in y |
| phint_union(x, y) | union(x, y) | Times in x or y |
| phint_start(x) | int_start(x) | The start time of x |
| phint_end(x) | int_end(x) | The end time of x |
| phint_length(x) | int_length(x) | The number of seconds in x |
| phint_overlaps(x, y) | int_overlaps(x, y) | Whether x and y intersect |
| phint_within(x, y) | x %within% y | Whether y contains x |
| x / duration(...) | x / duration(...) | How many durations fit in x |

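
The last rows of the table relate a span's length in seconds to division by a duration. As a small check using only the lubridate side of the table, the sketch below rebuilds the two spans of Greg's job history as ordinary intervals (the variable names are illustrative) and totals them both ways.

```r
library(lubridate, warn.conflicts = FALSE)

# The two spans of Greg's employment, as ordinary lubridate intervals
mascot <- interval(ymd("2018-01-01"), ymd("2018-06-03"))
assistant <- interval(ymd("2018-06-10"), ymd("2020-04-01"))

# int_length() reports seconds; dividing by ddays(1) reports days
seconds <- int_length(mascot) + int_length(assistant)
days <- mascot / ddays(1) + assistant / ddays(1)

seconds / 86400
days
```

Both totals come to 814 days (153 + 661), which agrees with the days_employed value reported for Greg earlier.
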
All phinterval set operations work as expected with arbitrary time
spans, enabling operations that are not supported by lubridate. For
example, the intersection of two non-overlapping intervals is an empty
time span, called a <hole>.
lubridate::intersect(jan_1_to_2, jan_4_to_9)
#> [1] NA--NA
phint_intersect(jan_1_to_2, jan_4_to_9)
#> <phinterval<UTC>[1]>
#> [1] <hole>

The set-difference of a time span and itself is also a
<hole>.
lubridate::setdiff(jan_1_to_2, jan_1_to_2)
#> [1] 2000-01-01 UTC--2000-01-02 UTC
phint_setdiff(jan_1_to_2, jan_1_to_2)
#> <phinterval<UTC>[1]>
#> [1] <hole>

Performing a set-difference may “punch a hole” in a time span, creating a discontinuous interval.
try(lubridate::setdiff(jan_1_to_9, jan_3_to_5))
#> Error in setdiff.Interval(jan_1_to_9, jan_3_to_5) :
#> Cases 1 result in discontinuous intervals.
phint_setdiff(jan_1_to_9, jan_3_to_5)
#> <phinterval<UTC>[1]>
#> [1] {2000-01-01--2000-01-03, 2000-01-05--2000-01-09}

The union of two disjoint intervals is a single
<phinterval> containing two spans.
lubridate::union(jan_1_to_2, jan_4_to_9)
#> [1] 2000-01-01 UTC--2000-01-09 UTC
phint_union(jan_1_to_2, jan_4_to_9)
#> <phinterval<UTC>[1]>
#> [1] {2000-01-01--2000-01-02, 2000-01-04--2000-01-09}

As with the lubridate equivalents, all phinterval set operations are vectorized.
phint_intersect(
  c(jan_1_to_2, jan_3_to_5, jan_1_to_2),
  c(jan_1_to_9, jan_4_to_9, jan_4_to_9)
)
#> <phinterval<UTC>[3]>
#> [1] {2000-01-01--2000-01-02} {2000-01-04--2000-01-05} <hole>

This package builds on {lubridate}’s
<Interval> class for representing contiguous time
spans. The prototype <phinterval> data structure (a
list of matrices) and the C++ implementation of
phint_squash() were inspired by the {intervals}
package by Richard Bourgon and Edzer Pebesma. The figures used in this
README were inspired by Davis Vaughan’s {ivs} package
documentation.