
realestatebr 0.6.1

CRAN Submission Fixes

  • Upgraded HTTP URLs to HTTPS in get_secovi.R
  • Added URL field to DESCRIPTION
  • Fixed incomplete @source tag for dim_city dataset documentation
  • Fixed typo in b3_real_estate documentation (“mian” -> “main”)
  • Removed local skip_on_cran() definition that shadowed testthat
  • Updated stale description for CBIC dataset in registry

realestatebr 0.6.0 (2025-11-09)

Cache Freshness Detection (Phase 2)

Version 0.6.0 introduces cache freshness detection with relaxed defaults, so users are not warned about data that is only slightly out of date.

New Features

Cache Age Tracking
Relaxed Warning Thresholds

Cache warnings only appear when data is significantly stale (exceeds 2x the update frequency):
  • Weekly datasets: warn after 14 days (not 7)
  • Monthly datasets: warn after 60 days (not 30)
  • Manual datasets: never warn
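The 2x rule can be sketched in a few lines of base R. The helper names staleness_threshold() and cache_is_stale() are hypothetical and are not the package's internal functions. The sketch assumes nominal intervals of 7 days (weekly) and 30 days (monthly), which reproduce the thresholds listed above. It also assumes that a registry warn_after_days value, when present, overrides the computed threshold.

```r
# Hypothetical sketch of the relaxed staleness rule (not the package's code).
staleness_threshold <- function(update_schedule, warn_after_days = NULL) {
  if (update_schedule == "manual") return(NA_real_)       # manual datasets never warn
  if (!is.null(warn_after_days)) return(warn_after_days)  # assumed registry override
  # Assumed nominal update intervals, in days
  interval <- switch(update_schedule,
    weekly  = 7,
    monthly = 30,
    stop("Unknown update_schedule: ", update_schedule)
  )
  2 * interval  # warn only past twice the update interval
}

cache_is_stale <- function(age_days, update_schedule, warn_after_days = NULL) {
  threshold <- staleness_threshold(update_schedule, warn_after_days)
  if (is.na(threshold)) return(FALSE)
  age_days > threshold
}
```

For example, a 10-day-old weekly cache is not flagged, but a 15-day-old one is.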

Advanced Cache Control
  • max_age parameter in get_dataset(): Force fresh download if cache exceeds specified age
  • Useful for users who need very recent data
  • Most users won’t need this - relaxed defaults handle typical use cases
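The max_age check reduces to comparing the cache's age against a user-supplied limit in days. The base R sketch below uses hypothetical names. It assumes the cache age comes from the cached file's modification time, and that without max_age any cached copy is used, with staleness warnings handled separately.

```r
# Hypothetical sketch of the max_age decision; names are illustrative only.
# Age of a cached file in days, from its modification time (NA if missing).
cache_age_days <- function(path, now = Sys.time()) {
  if (!file.exists(path)) return(NA_real_)
  as.numeric(difftime(now, file.mtime(path), units = "days"))
}

# Use the cache only if it exists and, when max_age is given, is younger than it.
can_use_cache <- function(age_days, max_age = NULL) {
  if (is.na(age_days)) return(FALSE)  # nothing cached: must download
  if (is.null(max_age)) return(TRUE)  # default: any cached copy is acceptable
  age_days < max_age                  # too old: force a fresh download
}
```

With this logic, max_age = 1 accepts only a cache less than one day old, matching the "< 1 day old" example above.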
Registry Enhancements

All datasets in inst/extdata/datasets.yaml now include:
  • update_schedule: “weekly”, “monthly”, or “manual”
  • warn_after_days: Custom threshold for staleness warnings (NULL for manual datasets)

Examples

# Check status of all cached datasets
check_cache_status()

# Get age of specific dataset
get_cache_age("bcb_series")

# Check if dataset is stale (uses relaxed defaults from registry)
is_cache_stale("bcb_series")

# Advanced: Force very fresh data (< 1 day old)
get_dataset("bcb_series", max_age = 1)

# Advanced: Only use cache if less than 3 days old
get_dataset("rppi", table = "sale", max_age = 3)

Tests

  • Added 8 comprehensive tests for cache freshness system
  • All tests passing (41 total cache tests)
  • Validated relaxed thresholds and max_age behavior

Code Simplification: Logic Consolidation (Phase 3)

Generic Helper Functions

Version 0.6.0 introduces 7 generic helper functions that consolidate 890 lines of repetitive code patterns across dataset functions.

What Changed
  • Created: R/helpers-dataset.R with 6 new helper functions (430 lines)
  • Refactored: 5 core files using new helpers (417 lines removed, 15.4% reduction)
  • Added: apply_table_filtering() in R/get-dataset.R (95 lines, eliminates 156 lines of duplication)
  • Added: 52 comprehensive tests for all helper functions
  • Improved: Consistent error messages and metadata across all datasets
File-by-File Results
File                      Before  After  Lines Saved  % Reduction
get_abecip_indicators.R      551    431          120        21.8%
get_abrainc_indicators.R     544    445           99        18.2%
get_secovi.R                 438    356           82        18.7%
get_bcb_series.R             334    278           56        16.8%
get-dataset.R                833    773           60         7.2%
TOTAL                      2,700  2,283          417        15.4%
New Helper Functions
  1. validate_dataset_params() (R/helpers-dataset.R)
    • Consolidated input validation for table, cached, quiet, max_retries parameters
    • Ensures consistent error messages across all datasets
    • Saves ~28 lines per file
  2. handle_dataset_cache() (R/helpers-dataset.R)
    • Unified cache loading with fallback strategies
    • Consistent error handling and user messages
    • Saves ~35-50 lines per file
  3. attach_dataset_metadata() (R/helpers-dataset.R)
    • Standardized metadata attachment (source, download_time, download_info)
    • Flexible extra_info parameter for dataset-specific metadata
    • Saves ~8-16 lines per file
  4. validate_dataset() (R/helpers-dataset.R)
    • Generic data validation (rows, columns, dates)
    • Configurable validation rules with detailed error messages
    • Saves ~44 lines per file
  5. validate_excel_file() (R/helpers-dataset.R)
    • Excel file validation (size, expected sheets)
    • Used by abrainc and abecip functions
    • Prevents silent failures
  6. download_with_retry() (R/rppi-helpers.R - REUSED)
    • Found existing implementation, avoided duplication
    • Saves ~46-86 lines per file that would have been duplicated
  7. apply_table_filtering() (R/get-dataset.R)
    • Centralizes all table/category filtering logic
    • Supports property_records, SECOVI, BCB Real Estate, BCB Series
    • Eliminates 156 lines of duplication between cache functions
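The following base R sketch illustrates the kind of check validate_dataset_params() consolidates. It validates the shared table, cached, quiet and max_retries arguments in one place. The name validate_params_sketch(), its signature and its messages are assumptions for illustration, not the package's internal code.

```r
# Hypothetical sketch of centralized argument validation; not the package's
# actual validate_dataset_params().
validate_params_sketch <- function(table, valid_tables, cached = FALSE,
                                   quiet = FALSE, max_retries = 3L) {
  if (!is.character(table) || length(table) != 1L || is.na(table)) {
    stop("`table` must be a single string.", call. = FALSE)
  }
  if (!table %in% valid_tables) {
    stop(sprintf("Invalid table '%s'. Valid tables: %s",
                 table, paste(valid_tables, collapse = ", ")), call. = FALSE)
  }
  # Logical flags share one check and one message format
  flags <- list(cached = cached, quiet = quiet)
  for (name in names(flags)) {
    value <- flags[[name]]
    if (!is.logical(value) || length(value) != 1L || is.na(value)) {
      stop(sprintf("`%s` must be TRUE or FALSE.", name), call. = FALSE)
    }
  }
  if (!is.numeric(max_retries) || length(max_retries) != 1L ||
      is.na(max_retries) || max_retries < 1) {
    stop("`max_retries` must be a number >= 1.", call. = FALSE)
  }
  invisible(TRUE)
}
```

Keeping these checks in one function is what lets a change to validation happen in a single place.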
Impact

Code Quality:
  • DRY principle applied - eliminated 890 lines of code duplication
  • Single source of truth for common operations
  • Changes to validation logic now require 1 edit instead of 7

Maintainability:
  • Helper functions well-documented with roxygen2
  • 52 comprehensive tests ensure quality
  • Clear separation of concerns

Consistency:
  • Uniform error messages across all datasets
  • Standardized parameter validation
  • Consistent metadata structure

Testing:
  • Helper function tests: 52 tests (100% passing)
  • Integration tests: 100 tests, 99 passing (1 pre-existing failure)
  • Full test suite: 253 tests, 248 passing (98.0%)
    • 3 failures: expected error message format changes
    • 2 failures: incomplete datasets under development

Files Changed
  • New: R/helpers-dataset.R (430 lines, 6 helpers, 52 tests)
  • Updated: R/get_abecip_indicators.R (21.8% reduction)
  • Updated: R/get_abrainc_indicators.R (18.2% reduction)
  • Updated: R/get_secovi.R (18.7% reduction)
  • Updated: R/get_bcb_series.R (16.8% reduction)
  • Updated: R/get-dataset.R (7.2% reduction, added apply_table_filtering())
Rationale
  • Simplification: Reduce codebase complexity and maintenance burden
  • Consistency: Ensure uniform behavior across all dataset functions
  • DRY: Follow “Don’t Repeat Yourself” principle
  • Testing: Well-tested helpers prevent regressions

See .claude/phase3_completion_summary.md for complete details.


BREAKING CHANGES: API Simplification (Phase 2)

Removed Deprecated Function Exports

Version 0.6.0 removes 8 deprecated functions from the public API. These functions are now internal-only. Since we are pre-1.0.0, this is an acceptable breaking change.

What Changed
Impact
  • Users must migrate: These functions can no longer be called directly
  • get_dataset() is the only supported API: All data access must go through get_dataset()
  • Cleaner public interface: Package exports only essential user-facing functions
Migration

These functions were deprecated in v0.4.0 (18+ months ago). Users must now use get_dataset():

# Old way (NO LONGER WORKS):
data <- get_secovi()
data <- get_bcb_series(table = "price")
data <- get_abecip_indicators(table = "sbpe")

# New way (REQUIRED):
data <- get_dataset("secovi")
data <- get_dataset("bcb_series", "price")
data <- get_dataset("abecip", "sbpe")
Rationale
  • Simpler API: One function (get_dataset()) instead of 15+
  • Reduced maintenance: Fewer exported functions to document and test
  • Pre-1.0.0 flexibility: Breaking changes acceptable before stable release
  • 18-month deprecation period: Functions were deprecated since v0.4.0

Code Clarity Improvements

Renamed Confusing “Legacy” Terminology
  • Renamed: get_from_legacy_function() → get_from_internal_function()
  • Rationale: These functions call internal worker functions, not “legacy” code
  • Impact: Internal only - no user-facing changes

Files changed: R/get-dataset.R

CBIC Code Simplification and Table Availability

Code Reduction (~223 lines, 11% smaller)
  • Removed: 100+ lines of commented-out old implementation code
  • Removed: 4 unused helper functions (124 lines total):
    • suppress_external_warnings() - Never called
    • explore_cbic_structure() - Only in examples
    • get_cbic_files() - Only in examples
    • get_cbic_materials() - Only in examples
  • Removed: Unnecessary metadata attributes from get_cbic_steel() and get_cbic_pim():
    • attr(result, "source")
    • attr(result, "download_time")
    • attr(result, "download_info")
    • Associated tryCatch blocks and cli_user messages (~69 lines)
Table Availability Fixed
  • Unblocked: steel_prices and pim tables now accessible
  • Blocked: Only steel_production remains blocked (has data quality issues)
  • Updated: Error messages now accurately reflect v0.6.0 status
  • Available tables:
    • Cement: cement_monthly_consumption, cement_annual_consumption, cement_production_exports, cement_monthly_production, cement_cub_prices
    • Steel: steel_prices
    • PIM: pim, pim_production_index
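A small lookup is enough to separate available tables from blocked ones and to produce an error that lists the alternatives. The sketch below uses the table names listed above. check_cbic_table() and its message wording are hypothetical and are not taken from R/get_cbic.R.

```r
# Hypothetical sketch of CBIC table gating, using the table names listed above.
cbic_available <- c(
  "cement_monthly_consumption", "cement_annual_consumption",
  "cement_production_exports", "cement_monthly_production", "cement_cub_prices",
  "steel_prices", "pim", "pim_production_index"
)
cbic_blocked <- c(steel_production = "known data quality issues")

check_cbic_table <- function(table) {
  if (table %in% names(cbic_blocked)) {
    # Blocked tables fail with a reason and the list of usable tables
    stop(sprintf("CBIC table '%s' is blocked (%s). Available tables: %s",
                 table, cbic_blocked[[table]],
                 paste(cbic_available, collapse = ", ")), call. = FALSE)
  }
  if (!table %in% cbic_available) {
    stop(sprintf("Unknown CBIC table '%s'.", table), call. = FALSE)
  }
  invisible(table)
}
```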
Rationale
  • Simplification: Removed dead code and unused functions for better maintainability
  • Accuracy: Updated table availability to reflect actual working status
  • User experience: Clear error messages guide users to available tables

Files changed: R/get_cbic.R


BREAKING CHANGES: Documentation Simplification (Phase 1)

Removed Examples from Deprecated Functions

Version 0.6.0 removes usage examples from deprecated legacy functions to simplify the codebase. Since we are pre-1.0.0, this is an acceptable breaking change.

What Changed
Impact
  • ~260-290 lines of documentation removed
  • Documentation now focuses on migration guidance rather than usage examples
  • All functions still exported and callable (no functionality changes)
  • Help pages now emphasize using get_dataset() instead
Migration

These functions were deprecated in v0.4.0. Users should migrate to the modern API:

# Old way (still works, but no longer documented with examples):
data <- get_secovi()

# New way (recommended):
data <- get_dataset("secovi")

Full migration examples are available in each function’s @section Deprecation block.

Rationale
  • Pre-1.0.0: Breaking changes are acceptable before stable release
  • Codebase simplification: Reduces maintenance burden and package size
  • Focus on modern API: Encourages users to adopt get_dataset() interface
  • Clear migration path: Enhanced deprecation warnings guide users to new API

realestatebr 0.5.1

Bug Fixes

SECOVI Default Table Fix

Fixed SECOVI dataset to return all categories by default instead of only “condo”

  • Problem: get_dataset("secovi") was only returning the “condo” category (1,939 rows) instead of all categories (9,398 rows). This caused test failures for launch/rent/sale tables.

  • Root Cause: When no table parameter was specified, the code defaulted to the first category alphabetically (“condo”), rather than fetching all categories.

  • Solution:

    • Added default_table configuration support in datasets.yaml
    • Updated validate_and_resolve_table() to check for default_table setting
    • Set SECOVI’s default_table: "all" in registry
    • Regenerated cache with all 4 categories
  • Impact:

    • Cache size: 12KB → 55KB (includes all categories)
    • Data completeness: 1,939 → 9,398 rows
    • Categories: condo (1,939), launch (780), rent (2,779), sale (3,900)
# Now returns all categories by default
get_dataset("secovi")  # → 9,398 rows, 4 categories ✅

# Specific tables still work correctly
get_dataset("secovi", "launch")  # → 780 rows
get_dataset("secovi", "rent")    # → 2,779 rows
get_dataset("secovi", "sale")    # → 3,900 rows
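The fix adds one step to table resolution: when no table is given, a registry-level default_table wins over the old "first category alphabetically" fallback. In the sketch below, resolve_table() and the list standing in for a datasets.yaml entry are hypothetical simplifications of validate_and_resolve_table().

```r
# Hypothetical sketch of table resolution with a registry-level default_table.
# `entry` stands in for one dataset's entry in inst/extdata/datasets.yaml.
resolve_table <- function(table, entry) {
  if (is.null(table)) {
    table <- if (!is.null(entry[["default_table"]])) {
      entry[["default_table"]]               # e.g. SECOVI: default_table = "all"
    } else {
      sort(entry[["categories"]])[[1]]       # old behavior: first category alphabetically
    }
  }
  valid <- c("all", entry[["categories"]])
  if (!table %in% valid) {
    stop(sprintf("Invalid table '%s'. Valid tables: %s",
                 table, paste(valid, collapse = ", ")), call. = FALSE)
  }
  table
}

secovi <- list(categories = c("condo", "launch", "rent", "sale"),
               default_table = "all")
resolve_table(NULL, secovi)      # "all"
resolve_table("launch", secovi)  # "launch"
```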

Test Infrastructure Improvements

  • Updated test suite to use devtools::load_all() instead of library() to ensure testing of development version
  • Added comprehensive pre-release test suite (tests/comprehensive_check_v0.5.qmd)
  • Added test result documentation (tests/TEST_RESULTS_SUMMARY.md, tests/QUICK_SUMMARY.md)

Pipeline Improvements

  • Updated _targets.R to always load development version for consistency
  • Ensures targets pipeline uses latest code during cache regeneration

realestatebr 0.5.0

BREAKING CHANGES: User-Level Caching Architecture

Major Architectural Change

Version 0.5.0 introduces user-level caching, removing bundled datasets from the package to comply with CRAN’s 5MB size limit. This is a BREAKING CHANGE that affects how datasets are accessed.

What Changed

  • Removed: All cached datasets from inst/cached_data/ (previously ~25MB)
  • Added: User-level cache directory at ~/.local/share/realestatebr/ (Linux/Mac) or %LOCALAPPDATA%/realestatebr/Cache/ (Windows)
  • Added: GitHub Releases integration for pre-processed datasets
  • Changed: source="cache" now refers to user cache, not package cache
  • Changed: source="github" now downloads from GitHub releases, not package files

New Cache Behavior

# First use: downloads from GitHub releases to user cache
data <- get_dataset("abecip")  # Downloads once

# Subsequent uses: loads from user cache (instant, offline)
data <- get_dataset("abecip")  # Loads from ~/.local/share/realestatebr/

# Force fresh download from original source
data <- get_dataset("abecip", source = "fresh")  # Downloads and caches

# Explicit source selection
data <- get_dataset("abecip", source = "cache")   # User cache only
data <- get_dataset("abecip", source = "github")  # GitHub releases only

Auto Fallback Strategy (source = “auto”, default)

  1. User Cache: Check ~/.local/share/realestatebr/ (instant, offline)
  2. GitHub Releases: Download pre-processed data (requires piggyback package)
  3. Fresh Download: Download from original source (saves to user cache)
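The auto strategy amounts to trying each source in order and stopping at the first success. The base R sketch below shows that control flow with stand-in loader functions. fetch_with_fallback() and the loaders are hypothetical. The real get_dataset() also saves fresh downloads to the user cache, and this sketch omits that step.

```r
# Hypothetical sketch of the "auto" fallback order. Each loader is a function
# that returns a data frame or signals an error; names are placeholders.
fetch_with_fallback <- function(loaders, quiet = TRUE) {
  errors <- character()
  for (source_name in names(loaders)) {
    result <- tryCatch(loaders[[source_name]](), error = function(e) e)
    if (!inherits(result, "error")) {
      if (!quiet) message("Loaded data from: ", source_name)
      attr(result, "source") <- source_name   # record where the data came from
      return(result)
    }
    errors[source_name] <- conditionMessage(result)
  }
  stop("All sources failed:\n",
       paste0("  ", names(errors), ": ", errors, collapse = "\n"),
       call. = FALSE)
}

loaders <- list(
  cache  = function() stop("no cached file"),   # 1. user cache miss
  github = function() data.frame(x = 1:3),      # 2. GitHub release succeeds
  fresh  = function() stop("not reached")       # 3. original source
)
df <- fetch_with_fallback(loaders)
attr(df, "source")  # "github"
```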

New Dependencies

  • Added: rappdirs (Imports) - Cross-platform user cache directory support
  • Added: piggyback (Suggests) - GitHub releases download support

New Functions

Migration Guide

For Users
# Install updated package
install.packages("realestatebr")  # or devtools::install_github()

# Install piggyback for GitHub downloads (recommended)
install.packages("piggyback")

# First use after update: will download datasets to user cache
data <- get_dataset("abecip")

# Check cache location
get_user_cache_dir()

# Manage cache
list_cached_files()           # See what's cached
clear_user_cache("abecip")    # Clear specific dataset
clear_user_cache()            # Clear all (with confirmation)
For Package Developers
  • Cached data files now excluded from package build via .Rbuildignore
  • Package size reduced from ~25MB to <5MB (CRAN compliant)
  • inst/cached_data/ kept for development/CI but excluded from distribution
  • GitHub Actions workflow publishes cache to releases via data-raw/publish-cache.R

Benefits

  • CRAN Compliant: Package size now <5MB (was 25MB)
  • Faster Installation: Package downloads are much smaller
  • Offline Usage: Once cached, datasets work offline
  • User Control: Users manage their own cache
  • Weekly Updates: GitHub releases updated automatically by CI
  • No Breaking APIs: get_dataset() interface unchanged

Deprecations

  • import_cached(): Still works but now loads from user cache (previously from inst/)
  • Old cached=TRUE parameter in legacy functions: Still supported but uses new cache

Files Changed

  • New: R/cache-user.R - User cache management
  • New: R/cache-github.R - GitHub releases integration
  • New: data-raw/publish-cache.R - Upload cache to releases
  • Updated: R/get-dataset.R - Refactored cache logic
  • Updated: R/cache.R - Marked as deprecated (kept for compatibility)
  • Updated: .Rbuildignore - Exclude inst/cached_data/ files
  • Updated: DESCRIPTION - Added rappdirs and piggyback dependencies

Targets Pipeline Fixes

Critical Pipeline Functionality

  • Fixed: Targets pipeline now fully functional for automated data updates
  • Fixed: FGV IBRE and NRE-IRE datasets now work correctly in targets pipeline
    • Changed from source="fresh" to source="github" for manually-updated datasets
    • These datasets have no API/download capability and require manual updates
  • Fixed: Removed broken internal data object fallback in get_fgv_ibre() and get_nre_ire()
    • Previously tried to access non-existent fgv_data and ire objects from R/sysdata.rda
    • Now provides clear error messages when fresh downloads are attempted with cached=FALSE

Enhanced Dataset Registry

  • Added: manual_update flag to datasets.yaml for FGV IBRE and NRE-IRE
  • Added: update_notes field documenting why fresh downloads aren’t available
  • Improved: Clear documentation in _targets.R explaining data source choices

Files Changed

  • _targets.R: Updated fetch_dataset() to support source parameter; FGV and NRE-IRE now use source="github"
  • R/get_fgv_ibre.R: Removed broken internal data fallback; added clear error for fresh downloads
  • R/get_nre_ire.R: Removed broken internal data fallback; added clear error for fresh downloads
  • inst/extdata/datasets.yaml: Added manual update flags and notes

Bug Fixes from Recent Commits

Property Records Simplification (Commit 9eab0ca)

  • Refactored: Major simplification of get_property_records.R (14% code reduction: 780→673 lines)
  • Removed: Deprecated functions get_ri_capitals() and get_ri_aggregates() with warning messages
  • Removed: Metadata attributes (source, download_time, download_info) that were never used
  • Simplified: Documentation for internal function (removed verbose examples and sections)
  • Improved: scrape_registro_imoveis_links() with better connection cleanup and reduced complexity

BCB Dataset Critical Fixes (Commit bb580c8)

BCB Real Estate
  • Fixed: CLI message serialization error in targets pipeline
  • Fixed: Compute nrow() before CLI interpolation to avoid closure issues
BCB Series - Graceful Degradation (CRITICAL)
  • Fixed: Replaced batch download with individual series downloads for better reliability
  • Fixed: Now returns successful series even if some fail (e.g., 14/15 instead of 0/15)
  • Added: Per-series retry logic with exponential backoff using purrr::possibly() pattern
  • Added: Clear warnings showing which series failed
  • Restored: Previously commented-out table filtering logic; now filters by bcb_category when a table is specified
  • Improved: Metadata-driven approach using bcb_metadata dynamically (now downloads all 140 series, not just 15)
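The graceful-degradation pattern above can be sketched in base R:

  • retry each series with exponential backoff
  • keep the series that succeed
  • warn about the ones that fail

The names with_retry() and download_all_series(), and the fetch_one() callback, are hypothetical. tryCatch() stands in for the purrr::possibly() pattern the package uses. The delay doubles after each failed attempt (base_delay, 2 * base_delay, ...).

```r
# Hypothetical sketch: retry one download with exponential backoff.
with_retry <- function(f, max_retries = 3L, base_delay = 1) {
  for (attempt in seq_len(max_retries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_retries) Sys.sleep(base_delay * 2^(attempt - 1))
  }
  result  # the last error condition
}

# Download each series independently so one failure does not sink the rest.
download_all_series <- function(codes, fetch_one, max_retries = 3L, base_delay = 1) {
  results <- lapply(codes, function(code) {
    with_retry(function() fetch_one(code), max_retries, base_delay)
  })
  names(results) <- codes
  failed <- vapply(results, inherits, logical(1), what = "error")
  if (any(failed)) {
    warning(sprintf("%d of %d series failed: %s", sum(failed), length(codes),
                    paste(codes[failed], collapse = ", ")), call. = FALSE)
  }
  do.call(rbind, results[!failed])  # stack only the successful series
}
```

Downloading series one by one costs more requests than a batch call. In exchange, a single bad series reduces the result to 14/15 instead of 0/15.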
Get Dataset Infrastructure
  • Fixed: BCB Real Estate table filtering by category in get-dataset.R
  • Fixed: BCB Series table filtering by bcb_category
  • Added: Support for table="all" in validate_and_resolve_table() function
  • Fixed: Proper mapping of user-facing table names to internal Portuguese categories
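Table filtering on a stacked dataset is a single subset on a category column, plus an optional mapping from user-facing table names to stored labels. The sketch below is a hypothetical simplification. filter_by_table() and its arguments are not the package's apply_table_filtering(), and the mapping and labels used in any example are placeholders, not the package's actual Portuguese categories.

```r
# Hypothetical sketch of shared table filtering for stacked datasets.
# table_map optionally translates user-facing table names to the labels stored
# in the category column.
filter_by_table <- function(data, table, category_col, table_map = NULL) {
  if (identical(table, "all")) return(data)     # "all": no filtering
  label <- if (!is.null(table_map) && table %in% names(table_map)) {
    table_map[[table]]
  } else {
    table
  }
  keep <- which(data[[category_col]] == label)  # which() drops NA matches
  if (length(keep) == 0L) {
    stop(sprintf("No rows found for table '%s'.", table), call. = FALSE)
  }
  data[keep, , drop = FALSE]
}
```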
Registry and Tests
  • Updated: bcb_series categories in datasets.yaml to match metadata
  • Added: Missing categories: production, interest-rate, exchange, government, real-estate
  • Added: Integration tests for BCB table filtering and graceful degradation
  • Result: All 97 integration tests now pass

Get Dataset Critical Fixes (Commit ce4768b)

CLI Message Scoping
  • Fixed: Added .envir = parent.frame() to cli::cli_inform() calls in cli_user() and cli_debug()
  • Fixed: “cannot coerce type ‘closure’ to vector of type ‘character’” error
  • Affected: Previously failed for rppi_bis, property_records, and all functions using these helpers
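One way the "cannot coerce type ‘closure’" error arises can be reproduced in base R without cli:

  • cli::cli_inform() evaluates {name} expressions in .envir, which defaults to parent.frame().
  • When a wrapper calls it without forwarding .envir, names are looked up in the wrapper's frame instead of the caller's.
  • A name such as table then resolves to the base function rather than the caller's variable.

interpolate(), inform() and the two wrappers below are hypothetical stand-ins, not the package's cli_user() or cli_debug().

```r
# Minimal stand-in for glue-style "{name}" interpolation (hypothetical helper).
interpolate <- function(msg, envir) {
  tokens <- regmatches(msg, gregexpr("\\{[^}]+\\}", msg))[[1]]
  for (token in tokens) {
    expr <- substr(token, 2L, nchar(token) - 1L)
    value <- eval(parse(text = expr), envir = envir)
    msg <- sub(token, as.character(value), msg, fixed = TRUE)
  }
  msg
}

# Plays the role of cli::cli_inform(): .envir defaults to the caller's frame.
inform <- function(msg, .envir = parent.frame()) interpolate(msg, .envir)

# Broken wrapper: .envir resolves to this wrapper's frame, so "{table}" misses
# the caller's variable and finds base::table() instead.
user_msg_broken <- function(msg) inform(msg)

# Fixed wrapper: forward the caller's frame, as .envir = parent.frame() does.
user_msg_fixed <- function(msg) inform(msg, .envir = parent.frame())
```

Called from a function that defines table <- "sale", user_msg_fixed() returns "Loaded table sale". user_msg_broken() fails because as.character() receives base::table().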
FipeZap Data Quality
  • Fixed: Added standardize_city_names() call after binding FipeZap data
  • Fixed: Now correctly shows “Brazil” instead of “Índice Fipezap” for national index
Property Records Table Extraction
  • Fixed: Added special handling for nested property_records structure in get-dataset.R
  • Fixed: Now returns single tibbles instead of nested lists
  • Fixed: All tables (capitals, cities, aggregates, transfers) now work correctly
Testing Infrastructure
  • Added: Comprehensive integration test suite with 37 tests covering critical get_dataset() functionality
  • Added: Tests with source="fresh" to catch real-world failures before production
  • Added: GitHub Actions CI workflow for weekly integration tests
  • Added: Manual testing script tests/basic_checks.R for development

Note on Vignettes

  • Vignettes temporarily set to eval=FALSE for faster development
  • TODO: Re-enable vignette evaluation before CRAN release

realestatebr 0.4.1

Bug Fixes

RPPI Individual Table Access

  • Fixed: get_dataset("rppi", "ivgr") and other individual RPPI tables now work correctly
  • Fixed: Vignette build errors caused by RPPI table routing issues
  • Improved: Internal get_rppi() function now supports all individual RPPI tables (fipezap, ivgr, igmi, iqa, iqaiw, ivar, secovi_sp) in addition to stacked tables (sale, rent, all)

CRAN Compliance

  • Fixed: Removed all non-ASCII characters from R source files (7 files affected)
  • Replaced Portuguese characters with Unicode escapes for CRAN compliance
  • Files updated: get_bcb_realestate.R, get_cbic.R, get_fgv_ibre.R, get_property_records.R, get_rppi.R, get_rppi_bis.R, get_secovi.R

Test Suite

  • Fixed: Updated deprecated category= parameter to table= in tests/sanity_check.R

realestatebr 0.4.0

Major Breaking Changes - API Consolidation

🎯 Unified Data Interface

This release implements a major breaking change that consolidates 15+ individual get_*() functions into a single, unified get_dataset() interface. This dramatically simplifies the package API while maintaining full functionality.

BREAKING CHANGE: All individual get_*() functions have been removed:
  • get_abecip_indicators(), get_abrainc_indicators(), get_rppi(), get_bcb_realestate(), etc.
  • Migration: Use get_dataset("dataset_name") instead

🔧 RPPI Code Simplification (Internal)

Major refactoring of RPPI functions for better maintainability:
  • 67% code reduction: 1579 lines → 519 lines (1060 lines removed)
  • Bug fix: FipeZap national index now correctly standardized to name_muni == "Brazil"
  • Shared helpers: Created rppi-helpers.R with common functions to eliminate duplication
  • Removed overhead: Eliminated unused stack parameter, cli_debug calls, and metadata attributes
  • Simplified documentation: Removed verbose sections (Progress Reporting, Error Handling, Examples) from internal functions
  • All functions now @keywords internal: Only get_dataset() is user-facing

Benefits:
  • Easier to maintain and debug
  • Faster execution (less overhead)
  • Consistent error handling across all indices
  • Bug fixes apply to all functions automatically

📊 CBIC Dataset - Partial Release (Cement Only)

Note: In v0.4.0, the CBIC dataset is limited to cement tables only (validated data). Steel and PIM tables will be added in v0.4.1.

Available in v0.4.0:
  • ✅ cement_monthly_consumption - Monthly cement consumption by state
  • ✅ cement_annual_consumption - Annual cement consumption by region
  • ✅ cement_production_exports - Production, consumption, and export data
  • ✅ cement_monthly_production - Monthly cement production by state
  • ✅ cement_cub_prices - CUB cement prices by state

Coming in v0.4.1:
  • ⏳ Steel prices and production data
  • ⏳ PIM industrial production indices

# Works in v0.4.0
get_dataset("cbic")  # Default: cement_monthly_consumption
get_dataset("cbic", "cement_cub_prices")

# Will error with informative message
get_dataset("cbic", "steel_prices")  # Deferred to v0.4.1

🏗️ New Internal Architecture

  • Internal fetch functions: Created 12 new fetch_*() functions with @keywords internal
  • Registry-driven: All datasets managed through centralized inst/extdata/datasets.yaml
  • Hierarchical RPPI: Consolidated rppi and rppi_indices into single hierarchical structure
  • Consistent parameters: All internal functions use standardized table, cached, quiet, max_retries

📋 Simplified Public API

New unified interface:

# Get data from any dataset
data <- get_dataset("abecip")               # Default table
data <- get_dataset("abecip", table = "sbpe")  # Specific table
data <- get_dataset("rppi", table = "fipezap")  # Hierarchical access

# Discover datasets
datasets <- list_datasets()
info <- get_dataset_info("rppi")

Removed functions (now internal):
  • get_abecip_indicators() → get_dataset("abecip")
  • get_abrainc_indicators() → get_dataset("abrainc")
  • get_rppi() → get_dataset("rppi")
  • get_bcb_realestate() → get_dataset("bcb_realestate")
  • get_bcb_series() → get_dataset("bcb_series")
  • Plus 10 more functions

🔧 Enhanced Data Access

  • Smart fallback: Auto fallback from GitHub cache → fresh download
  • Source control: Explicit source = "cache"/"github"/"fresh" options
  • Better error messages: Detailed troubleshooting information
  • Metadata preservation: All data includes source tracking and download info

🧪 Comprehensive Testing

  • New test suite: test-internal-functions-0.4.0.R with 100 tests
  • Registry validation: Ensures all datasets have proper internal function mappings
  • Parameter consistency: Validates all internal functions follow standard interface
  • Hierarchical testing: Comprehensive RPPI access pattern validation

Migration Guide

For Existing Code (Breaking Changes)

# OLD (0.3.x) - Will no longer work
data <- get_abecip_indicators(table = "sbpe")
data <- get_rppi(table = "fipezap")
data <- get_bcb_realestate(table = "all")

# NEW (0.4.0) - Required migration
data <- get_dataset("abecip", table = "sbpe")
data <- get_dataset("rppi", table = "fipezap")
data <- get_dataset("bcb_realestate", table = "all")

Dataset Name Mapping

Old Function               New get_dataset() Name
get_abecip_indicators()    "abecip"
get_abrainc_indicators()   "abrainc"
get_rppi()                 "rppi"
get_bcb_realestate()       "bcb_realestate"
get_bcb_series()           "bcb_series"
get_rppi_bis()             "rppi_bis"
get_secovi()               "secovi"
get_fgv_indicators()       "fgv_indicators"
get_b3_stocks()            "b3_stocks"
get_nre_ire()              "nre_ire"
get_cbic_*()               "cbic"
get_itbi()                 "itbi"
get_property_records()     "registro"

RPPI Consolidation

# OLD - Multiple functions
fipezap <- get_rppi_fipezap()
igmi <- get_rppi_igmi()
bis <- get_rppi_bis()

# NEW - Unified hierarchical access
fipezap <- get_dataset("rppi", table = "fipezap")
igmi <- get_dataset("rppi", table = "igmi")
bis <- get_dataset("rppi", table = "bis")

Technical Implementation

Internal Architecture

Backward Compatibility

  • Phase 1: Internal functions call legacy functions for gradual transition
  • Testing: Comprehensive test coverage ensures functionality preservation
  • Error handling: Graceful degradation with informative error messages

This release represents a major architectural shift toward a unified, maintainable API. While it introduces breaking changes, the new interface is significantly simpler and more powerful.

Full Changelog: https://github.com/viniciusoike/realestatebr/compare/v0.3.0...v0.4.0


realestatebr 0.3.0

Major Features and Improvements

🎯 Phase 2: Data Pipeline Implementation Complete

  • {targets} Pipeline Framework: Implemented comprehensive targets workflow for automated data processing and validation
  • Automated Data Workflows: Added daily and weekly GitHub Actions workflows using the targets pipeline
  • Data Validation Infrastructure: Added comprehensive validation rules and reporting for all datasets
  • Pipeline Performance Monitoring: Added automated report generation and validation status tracking

📊 Enhanced Data Processing

  • Targets Pipeline: _targets.R workflow with automated dependency management and parallel processing
  • Validation System: Comprehensive data validation rules with automated quality checks
  • Pipeline Helpers: Centralized helper functions for consistent data processing across all sources
  • Report Generation: Automated pipeline status reports and data quality summaries

🔧 Improved Function Reliability

  • Error Handling: Enhanced error handling in cache.R with better fallback mechanisms
  • Function Fixes: Fixed parameter bugs in get_abrainc_indicators() (category → table)
  • Data Access: Improved get_nre_ire() to use internal package data directly
  • Internal Data: Updated sysdata.rda with latest processed datasets

🚀 Infrastructure Improvements

  • Workflow Automation: Replaced single update workflow with focused daily/weekly pipelines
  • Cache Management: Improved cache validation and fallback strategies
  • Data Source Updates: Enhanced FGV data cleaning with improved formatting
  • Dependency Updates: Added targets and tarchetypes to package dependencies

📈 New Data Sources

  • B3 Stocks: Added enhanced B3 stock data processing with standardized formatting
  • FGV Indicators: Improved FGV consultation data processing and validation
  • Industrial Production: Enhanced CBIC PIM data integration
  • Construction Materials: Updated CBIC cement and steel data processing

Technical Implementation

Targets Pipeline Architecture

  • Automated Processing: All datasets now processed through unified targets pipeline
  • Quality Assurance: Built-in validation and quality checks for all data sources
  • Performance Monitoring: Real-time pipeline status and error reporting
  • Dependency Management: Automatic detection of data updates and re-processing

Enhanced Error Handling

  • Graceful Degradation: Improved fallback mechanisms for failed data retrievals
  • Better Diagnostics: Enhanced error messages and troubleshooting information
  • Retry Logic: Smart retry mechanisms with exponential backoff
  • Progress Reporting: Real-time progress updates during long-running operations

Data Quality Improvements

  • Validation Rules: Comprehensive validation for all datasets
  • Metadata Tracking: Enhanced metadata preservation and source tracking
  • Format Standardization: Consistent data formatting across all sources
  • Quality Metrics: Automated quality assessment and reporting

Migration Notes

For Existing Users

  • All existing functions continue to work unchanged
  • Enhanced reliability and performance with new pipeline backend
  • Improved error messages and troubleshooting information
  • Better cache management and fallback strategies

For Developers

  • New targets pipeline provides foundation for custom data workflows
  • Enhanced validation framework for quality assurance
  • Standardized helper functions for consistent data processing
  • Comprehensive pipeline documentation and examples

This release establishes the foundation for automated data processing and validation, setting the stage for Phase 3 implementation with large dataset support.

Full Changelog: https://github.com/viniciusoike/realestatebr/compare/v0.2.0...v0.3.0


realestatebr 0.2.0

Major Features and Improvements

🎯 Phase 1 Modernization Complete

  • Modernized 13 core get_* functions with consistent APIs, CLI-based error handling, and progress reporting
  • Standardized function signatures with table, cached, quiet, and max_retries parameters
  • Robust error handling with retry logic, exponential backoff, and informative error messages
  • Enhanced documentation with comprehensive examples and @section blocks

📊 New Unified Data Architecture

  • list_datasets() - Discover available datasets with filtering capabilities
  • get_dataset() - Unified data access function with intelligent fallback
  • Registry system in inst/extdata/datasets.yaml for centralized dataset management
  • Improved caching with standardized cache location and validation

🔧 API Standardization

  • Introduced table parameter replacing category across all functions
  • Backward compatibility maintained with deprecation warnings for category parameter
  • Consistent return types - single tibble by default, list when table = "all"
  • Metadata attributes on all returned data with source tracking and download info

📈 New Data Sources

  • CBIC construction materials data:
  • Enhanced RPPI suite with improved coordination and error handling
  • Updated B3 stock data with standardized column names

🚀 Performance and Reliability

  • Progress reporting with cli package integration for long-running operations
  • Exponential backoff for failed web scraping and API calls
  • Parallel processing support in web scraping functions
  • Comprehensive input validation with helpful error messages

🌐 Bilingual Support

  • Translation system for Portuguese/English column names and values
  • Standardized naming conventions across all datasets
  • Region and state name translations for geographic data

Breaking Changes

Parameter Changes

  • category parameter deprecated across all functions in favor of table
    • Backward compatibility maintained with deprecation warnings
    • Will be removed in a future version
    • Migration: Replace category = "value" with table = "value"

Cache Location

  • Cache moved from cached_data/ to inst/cached_data/ for package compliance
  • Existing cache files automatically migrated

Modernized Functions

Fully Modernized (13 functions)

Legacy Functions (Maintained)

  • get_rppi_bis() - Main function with modernized backend and single tibble returns
  • get_itbi() and get_itbi_bhe() - Planned for Phase 3 (DuckDB integration)

Infrastructure Improvements

New Architecture Components

  • Dataset registry system with YAML configuration
  • Unified cache management with validation and fallback
  • Translation framework for multilingual support
  • Helper function ecosystem for robust web operations

Developer Experience

  • Comprehensive test suite with 35+ tests covering all modernized functions
  • Consistent documentation patterns with @section blocks and examples
  • CLI-based development workflow with devtools integration
  • GitHub Actions for automated testing and deployment

Migration Guide

For Existing Code

# Old (deprecated but still works)
data <- get_abecip_indicators(category = "all")

# New (recommended)
data <- get_abecip_indicators(table = "all")

For New Code

# Discover available datasets
datasets <- list_datasets()

# Get data with unified interface
data <- get_dataset("abecip_indicators")

# Use modernized functions with progress
data <- get_abecip_indicators(table = "indicators", quiet = FALSE)

Technical Details

Dependencies

  • Added: cli for modern error handling and progress reporting
  • Enhanced: Better integration with dplyr, readr, httr, and rvest
  • Maintained: Full backward compatibility with existing dependencies

Performance

  • Improved web scraping with intelligent retry logic
  • Faster cache access with optimized file structure
  • Better memory usage with streaming and lazy loading where appropriate

This release represents the completion of Phase 1 modernization, establishing a solid foundation for Phase 2 (data pipeline automation) and Phase 3 (large dataset support with DuckDB).

Full Changelog: https://github.com/viniciusoike/realestatebr/compare/v0.1.5...v0.2.0