Skip to contents

Plot a histogram chart

Usage

plot_histogram(
  data,
  x,
  color = "#FFFFFF",
  fill = NULL,
  pal_name = "qual_benvi",
  scale_name = "",
  zero = TRUE,
  bins = NULL,
  method = "fd",
  density = FALSE,
  facet = FALSE,
  ...
)

Arguments

data

A data.frame type object

x

<data-masked> Indicates the numeric variable to be mapped

color

Color of the column border. Defaults to "#FFFFFF" (white).

fill

Fill color for the columns. Either a color string (e.g., "blue", "#021841") for a single static color, or a bare column name (without quotes) to map a grouping variable to fill color.

pal_name

String indicating the name of which palette to use when fill is a variable mapping.

scale_name

String indicating fill legend title.

zero

Logical indicating if a horizontal (y = 0) line should be drawn on the plot.

bins

Number of bins. When specified, overrides method.

method

Character specifying the binning algorithm. Must be one of: "fd" (default), "FD", "Scott", "Sturges", "Rice", or "sqrt". See Details for algorithm descriptions. Ignored when bins is specified.

density

Logical indicating if density should be plotted on y-axis.

facet

<data-masked> Optional variable to facet the graphics.

...

Additional parameters to facet_wrap()

Value

A ggplot2 object

Details

Binning Methods

The method parameter controls which algorithm is used to compute the optimal bin width. Available methods:

"fd" or "FD"

Freedman-Diaconis rule (default). Robust to outliers, uses IQR. Formula: \(2 * IQR / n^{1/3}\). Best for most distributions.

"Scott"

Scott's rule. Uses standard deviation. Formula: \(3 * sd / n^{1/3}\). Works well for normal-like distributions.

"Sturges"

Sturges' formula. Simple logarithmic rule. Formula: \(k = \lceil log_2(n) \rceil\) bins. Good for roughly normal data.

"Rice"

Rice rule. Cube root based. Formula: \(k = \lceil 2n^{1/3} \rceil\) bins. General purpose rule.

"sqrt"

Square root rule. Formula: \(k = \lceil \sqrt{n} \rceil\) bins. Simple, tends to oversmooth.

When in doubt, use the default "fd" (Freedman-Diaconis), which is robust and works well across different distributions.

Examples

set.seed(5)
tbl <- data.frame(x = rnorm(n = 1000))

# Default parameters use Freedman-Diaconis
plot_histogram(data = tbl, x = x)

# Use bins to manually choose number of bins
plot_histogram(data = tbl, x = x, bins = 50)

# Example of alternative methods: square root and Rice
plot_histogram(data = tbl, x = x, method = "sqrt")

plot_histogram(data = tbl, x = x, method = "Rice")


# Facet by rooms category
spo <- subset(iqaiw, name_muni == "S\u00e3o Paulo" & rooms != "Total")
plot_histogram(data = spo, x = index, facet = rooms)