nomiShape

Data can be measured on different scales, which fundamentally affects how they can be analyzed and visualized (Table 1). Four commonly recognized measurement scales are nominal, ordinal, interval, and ratio. Variables measured on continuous scales can take any value within a range and are often modeled with continuous probability distributions, whereas variables restricted to a finite (or countable) set of values are modeled with discrete distributions.
Among discrete and qualitative variables, nominal variables are unique in that they classify observations into categories without any inherent order, ranking, or numerical meaning. Nominal categories indicate membership only: an observation either belongs to a category or it does not. No information about magnitude, distance, or direction is implied. Common examples of nominal variables include species identities in an ecological community, political attitudes or party affiliation in social surveys, behavioral categories in ethological or psychological studies (e.g. play, aggression, vigilance), word types in a linguistic corpus, or thematic codes in qualitative research.
Although nominal variables lack intrinsic numeric structure, the
frequency with which categories occur provides rich
information about the organization of the system under study. Count data
derived from nominal variables can reveal patterns of dominance, rarity,
symmetry, and tail structure—features that are rarely formalized but are
often visually apparent. The nomiShape package is designed
to make these distributional properties explicit by combining centered
visualizations with quantitative indices and model-based comparisons
tailored specifically to nominal data.
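In base R, a nominal variable is usually stored as a factor, and `table()` turns it into the category counts that all further shape analysis builds on. The sketch below uses a small, made-up vector of behavioral observations purely for illustration.

```r
# Toy nominal variable: behavioral categories recorded for one animal
# (made-up observations, for illustration only)
behaviour <- factor(c("play", "vigilance", "play", "aggression",
                      "vigilance", "play", "grooming", "play"))

# Counting is the meaningful summary for a nominal variable
counts <- table(behaviour)      # absolute frequency of each category
props  <- prop.table(counts)    # relative frequencies, summing to 1

print(counts)
print(round(props, 3))
```

Note that `table()` lists categories in the order of the factor levels (alphabetical by default), an ordering that carries no information about the data.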
Table 1. Summary of Nominal Data Characteristics and Visualization and Analysis Tools in the nomiShape package

| Concept | Description |
|---|---|
| Variable Type | Nominal (categorical, unordered) |
| Core Properties | Discrete categories with no intrinsic order or numeric meaning |
| Typical Examples | Species in a biological community; political attitudes (e.g. conservative, liberal, undecided); behavioral categories (e.g. play, aggression, grooming); word types in a text corpus; qualitative themes or codes |
| What Can Be Counted | Frequencies, proportions, dominance, rarity |
| What Cannot Be Computed | Means, medians, variances, distances, or ranks derived from numeric magnitude |
| Common Visualizations | Standard bar plots (unordered or frequency-ranked) |
| Often-Ignored Distributional Structure | Dominance, symmetry, central concentration, tail heaviness |
| Main Analytical Challenge | Distributional “shape” exists but is difficult to formalize for nominal data |
| Visual Tools in nomiShape | Centered Bar Plot, Centered Dot Plot, Ranked Bar Plot, Ranked Dot Plot, Pareto Chart |
| Analytical Tools in nomiShape | Pielou’s evenness, Dominance index, Central concentration, Tail index |
| Model-Based Shape Comparison | AIC-based comparison of uniform, triangular, normal-like, and exponential (Pareto-like) shapes |
| Design Philosophy | Reveal latent distributional structure visually (via centering and ranking), then formalize it analytically |
Handling nominal (categorical) data is an essential part of data analysis. Almost every data science project involves such variables, and students and practitioners alike need to know how to store, summarize, visualize, and manipulate them. Nominal variables are traditionally visualized with unordered bar plots or frequency-sorted (high-to-low) bar plots, which emphasize category counts but rarely reveal distributional structure. As a result, concepts such as symmetry, skewness, dominance, or tail behaviour, which are routinely discussed for numerical variables, are seldom considered for nominal data. Notable exceptions are Pareto charts and other ranked visualizations, which can highlight the “vital few” categories under the 80/20 rule or reveal long-tailed distributions; rank-abundance plots in ecology, for example, typically show a few common species and many relatively rare ones. Such displays make dominance and rarity patterns visible even for nominal variables.
The nomiShape package is designed to further explore the
shape of nominal distributions. It offers multiple plotting functions,
including classic visualizations such as Pareto charts and ranked bar
plots, as well as novel centered bar and dot plots. These functions help
users understand frequency structures, dominance patterns, and
distributional characteristics of nominal variables, facilitating more
nuanced analysis of categorical data.
This vignette demonstrates how to visualize and analyze the
distributions of nominal variables with the functions provided by the
nomiShape package. It covers ranked bar plots, ranked dot plots,
Pareto charts, centered bar plots, and centered dot plots, followed by
the shape indices and the model-based shape comparison.
Ranked bar plots order categories from the most frequent to the least frequent, providing a clear view of category dominance and distribution.
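The idea behind a ranked bar plot can be sketched in base R with `sort()` and `barplot()`. This is only an illustration of the concept, not nomiShape's own plotting function; the counts are the animal frequencies reported for `categories3` in the `pareto()` example of this vignette, deliberately listed in alphabetical (uninformative) order.

```r
# Animal counts for categories3, listed alphabetically on purpose
animal_counts <- c("Copepod" = 7, "Crab" = 12, "Lobster" = 2, "Octopus" = 20,
                   "Pufferfish" = 5, "Sea god" = 1, "Sea sponge" = 110,
                   "Snail" = 6, "Squirrel" = 9, "Starfish" = 75, "Whale" = 3)

# Ranking: most frequent category first, least frequent last
ranked <- sort(animal_counts, decreasing = TRUE)

op <- par(mar = c(8, 4, 2, 1))   # extra bottom margin for rotated labels
barplot(ranked, las = 2, ylab = "Frequency", main = "Ranked bar plot")
par(op)                          # restore the previous graphics settings
```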
Ranked dot plots display categories as points ordered from the most frequent to the least frequent, allowing for easy comparison of category frequencies.
```r
# Example usage of ranked_dotplot
ranked_dotplot(categories2, "animal", connect = TRUE, shade = TRUE)
ranked_dotplot(categories3, "animal", connect = FALSE, shade = TRUE)
```

Pareto charts combine a frequency-ranked bar plot with a line showing the cumulative percentage of observations, highlighting the most frequent categories of a nominal variable. They help identify the “vital few” categories that account for most of the observations.
```r
# Example usage of pareto
pareto(categories3, "animal")
#> Category Freq cumulative cumulative_percentage
#> 1 Sea sponge 110 110 44.0
#> 2 Starfish 75 185 74.0
#> 3 Octopus 20 205 82.0
#> 4 Crab 12 217 86.8
#> 5 Squirrel 9 226 90.4
#> 6 Copepod 7 233 93.2
#> 7 Snail 6 239 95.6
#> 8 Pufferfish 5 244 97.6
#> 9 Whale 3 247 98.8
#> 10 Lobster 2 249 99.6
#> 11 Sea god 1 250 100.0
```

Centered bar plots arrange categories symmetrically around the center, with the most frequent categories in the middle and less frequent ones toward the edges. This layout turns the frequency profile into a visible shape, making concentration, symmetry, and tail length easy to judge.
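A centered arrangement can be sketched by placing the most frequent category in the middle and adding the remaining categories alternately to the right and left in order of decreasing frequency. The helper `center_order()` below is hypothetical and only illustrates this idea; nomiShape's exact placement rule (for example, which side receives the second category) may differ. The counts are those reported for `categories3` by `pareto()`.

```r
# Hypothetical helper (not part of nomiShape): centered ordering of counts
center_order <- function(counts) {
  counts <- sort(counts, decreasing = TRUE)
  idx <- integer(0)
  for (i in seq_along(counts)) {
    # rank 1 seeds the middle; afterwards even ranks are appended on the
    # right and odd ranks are prepended on the left
    idx <- if (i %% 2 == 0) c(idx, i) else c(i, idx)
  }
  counts[idx]
}

animal_counts <- c("Sea sponge" = 110, "Starfish" = 75, "Octopus" = 20,
                   "Crab" = 12, "Squirrel" = 9, "Copepod" = 7, "Snail" = 6,
                   "Pufferfish" = 5, "Whale" = 3, "Lobster" = 2, "Sea god" = 1)

centered <- center_order(animal_counts)
op <- par(mar = c(8, 4, 2, 1))
barplot(centered, las = 2, ylab = "Frequency", main = "Centered bar plot")
par(op)
```

With 11 categories, the most frequent one ends up in position 6, the exact middle, and frequencies decrease toward both edges.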
Centered dot plots display categories as points arranged symmetrically around the center, with the most frequent categories in the middle. Optionally, the points can be connected with lines to emphasize the overall shape of the distribution.
```r
# Example usage of centered_dotplot
centered_dotplot(categories, "animal", connect = TRUE, shade = TRUE)
centered_dotplot(categories2, "animal", connect = TRUE, shade = TRUE)
centered_dotplot(categories3, "animal", connect = TRUE, shade = TRUE)
```

Pielou’s evenness quantifies how evenly observations are distributed across the categories of a nominal variable. It equals 1 when all categories are equally frequent and approaches 0 as a single category increasingly dominates.
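Pielou’s evenness is conventionally defined as J = H / ln(S), where H = -Σ p_i ln(p_i) is the Shannon entropy of the category proportions p_i and S is the number of observed categories. The helper `pielou_evenness()` below is a hypothetical base R sketch of that textbook formula, not nomiShape's own function; it assumes at least two non-empty categories.

```r
# Hypothetical helper: Pielou's evenness J = H / ln(S)
pielou_evenness <- function(counts) {
  counts <- counts[counts > 0]   # empty categories do not enter the formula
  p <- counts / sum(counts)      # relative frequencies
  H <- -sum(p * log(p))          # Shannon entropy (natural logarithm)
  H / log(length(p))             # divide by its maximum, ln(S)
}

# Animal counts reported for categories3 by pareto()
animal_counts <- c(110, 75, 20, 12, 9, 7, 6, 5, 3, 2, 1)

pielou_evenness(animal_counts)   # well below 1: a few animals dominate
pielou_evenness(rep(10, 5))      # perfectly even counts give 1
```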
The dominance index quantifies the degree to which a few categories dominate the distribution of a nominal variable.
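Two standard ways to express dominance are the Berger-Parker index (the proportion of observations in the single most frequent category) and Simpson’s index (the sum of squared proportions). Which formula nomiShape's dominance index uses is not stated here, so the hypothetical helpers below only illustrate these two common definitions.

```r
# Hypothetical helpers illustrating two common dominance measures
berger_parker <- function(counts) max(counts) / sum(counts)

simpson_dominance <- function(counts) {
  p <- counts / sum(counts)   # relative frequencies
  sum(p^2)                    # probability that two random draws share a category
}

# Animal counts reported for categories3 by pareto()
animal_counts <- c(110, 75, 20, 12, 9, 7, 6, 5, 3, 2, 1)

berger_parker(animal_counts)      # 110 / 250, the share of Sea sponge
simpson_dominance(animal_counts)
```

Both measures are minimal for perfectly even counts (1/S for S categories) and reach 1 when all observations fall into one category.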
The central concentration quantifies how concentrated the distribution of a nominal variable is around its most frequent categories.
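The formula behind the central concentration index is not given in this section. One simple way to operationalize the idea, assumed here purely for illustration, is the share of observations falling in the k most frequent categories; in a centered layout these k categories form a contiguous block around the middle. Both the helper `top_k_share()` and the default `k = 3` are hypothetical choices, not nomiShape's definition.

```r
# Hypothetical helper: share of observations in the k most frequent categories
top_k_share <- function(counts, k = 3) {
  counts <- sort(counts, decreasing = TRUE)
  sum(head(counts, k)) / sum(counts)
}

# Animal counts reported for categories3 by pareto()
animal_counts <- c(110, 75, 20, 12, 9, 7, 6, 5, 3, 2, 1)

top_k_share(animal_counts, k = 3)   # (110 + 75 + 20) / 250
```

The result matches the 82.0 cumulative percentage reported by `pareto()` for the third-ranked category.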
The tail index quantifies the proportion of categories that make up the lower (low-frequency) part of the distribution, which makes it useful for identifying long-tailed structures in nominal data. By default it uses a threshold of 0.8, following the Pareto principle, but the threshold can be adjusted as needed.
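One plausible reading of the tail index, assumed here and not confirmed as nomiShape's exact rule, is: rank the categories by frequency, find the smallest set of top categories whose cumulative share reaches the threshold, and report the fraction of categories left over. The hypothetical `tail_index_sketch()` below follows that reading with the counts reported for `categories3` by `pareto()`.

```r
# Hypothetical helper: fraction of categories outside the "vital few"
tail_index_sketch <- function(counts, threshold = 0.8) {
  counts <- sort(counts, decreasing = TRUE)
  cum_share <- cumsum(counts) / sum(counts)
  n_head <- which(cum_share >= threshold)[1]   # fewest top categories reaching the threshold
  (length(counts) - n_head) / length(counts)   # share of categories in the tail
}

animal_counts <- c(110, 75, 20, 12, 9, 7, 6, 5, 3, 2, 1)

tail_index_sketch(animal_counts)                   # 3 head categories, 8 of 11 in the tail
tail_index_sketch(animal_counts, threshold = 0.9)  # a stricter head leaves fewer tail categories
```

Under this reading, the first three animals cover 82% of observations, so 8 of the 11 categories form the tail, consistent with the cumulative percentages in the `pareto()` output.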
The shape_comp_plot function visualizes common theoretical
distribution shapes (uniform, triangular, normal-like, and
exponential/Pareto-like) alongside the observed distribution of a
nominal variable. This helps users judge which theoretical shape the
observed frequency profile most closely resembles.
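The shape_aic function computes the Akaike Information
Criterion (AIC) for several theoretical shape models fitted to the
distribution of a nominal variable, so that users can compare
quantitatively how well each model fits the observed data. Lower AIC
values indicate a better trade-off between goodness of fit and model
complexity, and DeltaAIC is the difference from the best-fitting
(lowest-AIC) model.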
```r
# Example usage of shape_aic
shape_aic(categories, "animal")
#> Shape AIC DeltaAIC
#> 1 Uniform 1198.948 0.00000
#> 2 Triangular 1267.201 68.25385
#> 3 Exponential 1994.144 795.19640
#> 4 Normal 2897.166 1698.21859
```
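To make the AIC comparison concrete, the sketch below fits four shape templates to frequency-ranked counts with a multinomial likelihood and computes AIC = 2k - 2 ln L. Everything here is an assumption for illustration, not nomiShape's implementation: the hypothetical `shape_aic_sketch()`, the use of ranked rather than centered categories, the template formulas (constant, linear decline, exponential decay, half-normal decay), and the parameter search intervals. Its AIC values are therefore not expected to reproduce the output above.

```r
# Hypothetical sketch (not nomiShape's code): AIC comparison of shape templates
shape_aic_sketch <- function(counts) {
  x <- sort(as.numeric(counts), decreasing = TRUE)  # frequency-ranked counts
  K <- length(x); n <- sum(x); r <- seq_len(K)

  # Multinomial log-likelihood for log-weights lw, normalised in log space
  loglik <- function(lw) {
    m <- max(lw)
    lp <- lw - (m + log(sum(exp(lw - m))))          # log probabilities
    lgamma(n + 1) - sum(lgamma(x + 1)) + sum(x * lp)
  }

  # Templates without free parameters
  ll_unif <- loglik(rep(0, K))                      # equal probabilities
  ll_tri  <- loglik(log(K + 1 - r))                 # linear decline with rank

  # One-parameter templates, fitted by maximum likelihood
  fit_exp  <- optimize(function(l) loglik(-l * (r - 1)),
                       interval = c(1e-6, 5), maximum = TRUE)
  fit_norm <- optimize(function(s) loglik(-(r - 1)^2 / (2 * s^2)),
                       interval = c(0.1, 10 * K), maximum = TRUE)

  aic <- c(Uniform     = -2 * ll_unif,
           Triangular  = -2 * ll_tri,
           Exponential = 2 - 2 * fit_exp$objective,
           Normal      = 2 - 2 * fit_norm$objective)
  out <- data.frame(Shape = names(aic), AIC = unname(aic))
  out <- out[order(out$AIC), ]
  out$DeltaAIC <- out$AIC - min(out$AIC)            # difference from best model
  rownames(out) <- NULL
  out
}

# Animal counts reported for categories3 by pareto()
animal_counts <- c(110, 75, 20, 12, 9, 7, 6, 5, 3, 2, 1)
shape_aic_sketch(animal_counts)
```

Because the one-parameter models pay a penalty of 2 AIC units for their extra parameter, perfectly even counts such as `rep(20, 10)` are best described by the uniform template, while steeply declining counts favour a decaying shape.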