Skip to contents

realestatebr 0.6.0 (2025-10-15)

Code Simplification: Logic Consolidation (Phase 3)

Generic Helper Functions

Version 0.6.0 introduces 7 generic helper functions that consolidate 890 lines of repetitive code patterns across dataset functions.

What Changed
  • Created: R/helpers-dataset.R with 6 new helper functions (430 lines)
  • Refactored: 5 core files using new helpers (417 lines removed, 15.4% reduction)
  • Added: apply_table_filtering() in R/get-dataset.R (95 lines, eliminates 156 lines of duplication)
  • Added: 52 comprehensive tests for all helper functions
  • Improved: Consistent error messages and metadata across all datasets
File-by-File Results
File Before After Lines Saved % Reduction
get_abecip_indicators.R 551 431 120 21.8%
get_abrainc_indicators.R 544 445 99 18.2%
get_secovi.R 438 356 82 18.7%
get_bcb_series.R 334 278 56 16.8%
get-dataset.R 833 773 60 7.2%
TOTAL 2,700 2,283 417 15.4%
New Helper Functions
  1. validate_dataset_params() (R/helpers-dataset.R)
    • Consolidated input validation for table, cached, quiet, max_retries parameters
    • Ensures consistent error messages across all datasets
    • Saves ~28 lines per file
  2. handle_dataset_cache() (R/helpers-dataset.R)
    • Unified cache loading with fallback strategies
    • Consistent error handling and user messages
    • Saves ~35-50 lines per file
  3. attach_dataset_metadata() (R/helpers-dataset.R)
    • Standardized metadata attachment (source, download_time, download_info)
    • Flexible extra_info parameter for dataset-specific metadata
    • Saves ~8-16 lines per file
  4. validate_dataset() (R/helpers-dataset.R)
    • Generic data validation (rows, columns, dates)
    • Configurable validation rules with detailed error messages
    • Saves ~44 lines per file
  5. validate_excel_file() (R/helpers-dataset.R)
    • Excel file validation (size, expected sheets)
    • Used by abrainc and abecip functions
    • Prevents silent failures
  6. download_with_retry() (R/rppi-helpers.R - REUSED)
    • Found existing implementation, avoided duplication
    • Saves ~46-86 lines per file that would have been duplicated
  7. apply_table_filtering() (R/get-dataset.R)
    • Centralizes all table/category filtering logic
    • Supports property_records, SECOVI, BCB Real Estate, BCB Series
    • Eliminates 156 lines of duplication between cache functions
Impact

Code Quality: - DRY principle applied - eliminated 890 lines of code duplication - Single source of truth for common operations - Changes to validation logic now require 1 edit instead of 7

Maintainability: - Helper functions well-documented with roxygen2 - 52 comprehensive tests ensure quality - Clear separation of concerns

Consistency: - Uniform error messages across all datasets - Standardized parameter validation - Consistent metadata structure

Testing: - Helper function tests: 52 tests (100% passing) - Integration tests: 100 tests, 99 passing (1 pre-existing failure) - Full test suite: 253 tests, 248 passing (98.0%) - 3 failures: expected error message format changes - 2 failures: incomplete datasets under development

Files Changed
  • New: R/helpers-dataset.R (430 lines, 6 helpers, 52 tests)
  • Updated: R/get_abecip_indicators.R (21.8% reduction)
  • Updated: R/get_abrainc_indicators.R (18.2% reduction)
  • Updated: R/get_secovi.R (18.7% reduction)
  • Updated: R/get_bcb_series.R (16.8% reduction)
  • Updated: R/get-dataset.R (7.2% reduction, added apply_table_filtering())
Rationale
  • Simplification: Reduce codebase complexity and maintenance burden
  • Consistency: Ensure uniform behavior across all dataset functions
  • DRY: Follow “Don’t Repeat Yourself” principle
  • Testing: Well-tested helpers prevent regressions

See .claude/phase3_completion_summary.md for complete details.


BREAKING CHANGES: API Simplification (Phase 2)

Removed Deprecated Function Exports

Version 0.6.0 removes 8 deprecated functions from the public API. These functions are now internal-only. Since we are pre-1.0.0, this is an acceptable breaking change.

What Changed
Impact
  • Users must migrate: These functions can no longer be called directly
  • get_dataset() is the only supported API: All data access must go through get_dataset()
  • Cleaner public interface: Package exports only essential user-facing functions
Migration

These functions were deprecated in v0.4.0 (18+ months ago). Users must now use get_dataset():

# Old way (NO LONGER WORKS):
data <- get_secovi()
data <- get_bcb_series(table = "price")
data <- get_abecip_indicators(table = "sbpe")

# New way (REQUIRED):
data <- get_dataset("secovi")
data <- get_dataset("bcb_series", "price")
data <- get_dataset("abecip", "sbpe")
Rationale
  • Simpler API: One function (get_dataset()) instead of 15+
  • Reduced maintenance: Fewer exported functions to document and test
  • Pre-1.0.0 flexibility: Breaking changes acceptable before stable release
  • 18-month deprecation period: Functions were deprecated since v0.4.0

Code Clarity Improvements

Renamed Confusing “Legacy” Terminology
  • Renamed: get_from_legacy_function()get_from_internal_function()
  • Rationale: These functions call internal worker functions, not “legacy” code
  • Impact: Internal only - no user-facing changes

Files changed: R/get-dataset.R

CBIC Code Simplification and Table Availability

Code Reduction (~223 lines, 11% smaller)
  • Removed: 100+ lines of commented-out old implementation code
  • Removed: 4 unused helper functions (124 lines total):
    • suppress_external_warnings() - Never called
    • explore_cbic_structure() - Only in examples
    • get_cbic_files() - Only in examples
    • get_cbic_materials() - Only in examples
  • Removed: Unnecessary metadata attributes from get_cbic_steel() and get_cbic_pim():
    • attr(result, "source")
    • attr(result, "download_time")
    • attr(result, "download_info")
    • Associated tryCatch blocks and cli_user messages (~69 lines)
Table Availability Fixed
  • Unblocked: steel_prices and pim tables now accessible
  • Blocked: Only steel_production remains blocked (has data quality issues)
  • Updated: Error messages now accurately reflect v0.6.0 status
  • Available tables:
    • Cement: cement_monthly_consumption, cement_annual_consumption, cement_production_exports, cement_monthly_production, cement_cub_prices
    • Steel: steel_prices
    • PIM: pim, pim_production_index
Rationale
  • Simplification: Removed dead code and unused functions for better maintainability
  • Accuracy: Updated table availability to reflect actual working status
  • User experience: Clear error messages guide users to available tables

Files changed: R/get_cbic.R


BREAKING CHANGES: Documentation Simplification (Phase 1)

Removed Examples from Deprecated Functions

Version 0.6.0 removes usage examples from deprecated legacy functions to simplify the codebase. Since we are pre-1.0.0, this is an acceptable breaking change.

What Changed
Impact
  • ~260-290 lines of documentation removed
  • Documentation now focuses on migration guidance rather than usage examples
  • All functions still exported and callable (no functionality changes)
  • Help pages now emphasize using get_dataset() instead
Migration

These functions were deprecated in v0.4.0. Users should migrate to the modern API:

# Old way (still works, but no longer documented with examples):
data <- get_secovi()

# New way (recommended):
data <- get_dataset("secovi")

Full migration examples are available in each function’s @section Deprecation block.

Rationale
  • Pre-1.0.0: Breaking changes are acceptable before stable release
  • Codebase simplification: Reduces maintenance burden and package size
  • Focus on modern API: Encourages users to adopt get_dataset() interface
  • Clear migration path: Enhanced deprecation warnings guide users to new API

realestatebr 0.5.1

Bug Fixes

SECOVI Default Table Fix

Fixed SECOVI dataset to return all categories by default instead of only “condo”

  • Problem: get_dataset("secovi") was only returning the “condo” category (1,939 rows) instead of all categories (9,398 rows). This caused test failures for launch/rent/sale tables.

  • Root Cause: When no table parameter was specified, the code defaulted to the first category alphabetically (“condo”), rather than fetching all categories.

  • Solution:

    • Added default_table configuration support in datasets.yaml
    • Updated validate_and_resolve_table() to check for default_table setting
    • Set SECOVI’s default_table: "all" in registry
    • Regenerated cache with all 4 categories
  • Impact:

    • Cache size: 12KB → 55KB (includes all categories)
    • Data completeness: 1,939 → 9,398 rows
    • Categories: condo (1,939), launch (780), rent (2,779), sale (3,900)
# Now returns all categories by default
get_dataset("secovi")  # → 9,398 rows, 4 categories ✅

# Specific tables still work correctly
get_dataset("secovi", "launch")  # → 780 rows
get_dataset("secovi", "rent")    # → 2,779 rows
get_dataset("secovi", "sale")    # → 3,900 rows

Test Infrastructure Improvements

  • Updated test suite to use devtools::load_all() instead of library() to ensure testing of development version
  • Added comprehensive pre-release test suite (tests/comprehensive_check_v0.5.qmd)
  • Added test result documentation (tests/TEST_RESULTS_SUMMARY.md, tests/QUICK_SUMMARY.md)

Pipeline Improvements

  • Updated _targets.R to always load development version for consistency
  • Ensures targets pipeline uses latest code during cache regeneration

realestatebr 0.5.0

BREAKING CHANGES: User-Level Caching Architecture

Major Architectural Change

Version 0.5.0 introduces user-level caching, removing bundled datasets from the package to comply with CRAN’s 5MB size limit. This is a BREAKING CHANGE that affects how datasets are accessed.

What Changed

  • Removed: All cached datasets from inst/cached_data/ (previously ~25MB)
  • Added: User-level cache directory at ~/.local/share/realestatebr/ (Linux/Mac) or %LOCALAPPDATA%/realestatebr/Cache/ (Windows)
  • Added: GitHub Releases integration for pre-processed datasets
  • Changed: source="cache" now refers to user cache, not package cache
  • Changed: source="github" now downloads from GitHub releases, not package files

New Cache Behavior

# First use: downloads from GitHub releases to user cache
data <- get_dataset("abecip")  # Downloads once

# Subsequent uses: loads from user cache (instant, offline)
data <- get_dataset("abecip")  # Loads from ~/.local/share/realestatebr/

# Force fresh download from original source
data <- get_dataset("abecip", source = "fresh")  # Downloads and caches

# Explicit source selection
data <- get_dataset("abecip", source = "cache")   # User cache only
data <- get_dataset("abecip", source = "github")  # GitHub releases only

Auto Fallback Strategy (source = “auto”, default)

  1. User Cache: Check ~/.local/share/realestatebr/ (instant, offline)
  2. GitHub Releases: Download pre-processed data (requires piggyback package)
  3. Fresh Download: Download from original source (saves to user cache)

New Dependencies

  • Added: rappdirs (Imports) - Cross-platform user cache directory support
  • Added: piggyback (Suggests) - GitHub releases download support

New Functions

Migration Guide

For Users
# Install updated package
install.packages("realestatebr")  # or devtools::install_github()

# Install piggyback for GitHub downloads (recommended)
install.packages("piggyback")

# First use after update: will download datasets to user cache
data <- get_dataset("abecip")

# Check cache location
get_user_cache_dir()

# Manage cache
list_cached_files()           # See what's cached
clear_user_cache("abecip")    # Clear specific dataset
clear_user_cache()            # Clear all (with confirmation)
For Package Developers
  • Cached data files now excluded from package build via .Rbuildignore
  • Package size reduced from ~25MB to <5MB (CRAN compliant)
  • inst/cached_data/ kept for development/CI but excluded from distribution
  • GitHub Actions workflow publishes cache to releases via data-raw/publish-cache.R

Benefits

  • CRAN Compliant: Package size now <5MB (was 25MB)
  • Faster Installation: Package downloads are much smaller
  • Offline Usage: Once cached, datasets work offline
  • User Control: Users manage their own cache
  • Weekly Updates: GitHub releases updated automatically by CI
  • No Breaking APIs: get_dataset() interface unchanged

Deprecations

  • import_cached(): Still works but now loads from user cache (previously from inst/)
  • Old cached=TRUE parameter in legacy functions: Still supported but uses new cache

Files Changed

  • New: R/cache-user.R - User cache management
  • New: R/cache-github.R - GitHub releases integration
  • New: data-raw/publish-cache.R - Upload cache to releases
  • Updated: R/get-dataset.R - Refactored cache logic
  • Updated: R/cache.R - Marked as deprecated (kept for compatibility)
  • Updated: .Rbuildignore - Exclude inst/cached_data/ files
  • Updated: DESCRIPTION - Added rappdirs and piggyback dependencies

Targets Pipeline Fixes

Critical Pipeline Functionality

  • Fixed: Targets pipeline now fully functional for automated data updates
  • Fixed: FGV IBRE and NRE-IRE datasets now work correctly in targets pipeline
    • Changed from source="fresh" to source="github" for manually-updated datasets
    • These datasets have no API/download capability and require manual updates
  • Fixed: Removed broken internal data object fallback in get_fgv_ibre() and get_nre_ire()
    • Previously tried to access non-existent fgv_data and ire objects from R/sysdata.rda
    • Now provides clear error messages when fresh downloads are attempted with cached=FALSE

Enhanced Dataset Registry

  • Added: manual_update flag to datasets.yaml for FGV IBRE and NRE-IRE
  • Added: update_notes field documenting why fresh downloads aren’t available
  • Improved: Clear documentation in _targets.R explaining data source choices

Files Changed

  • _targets.R: Updated fetch_dataset() to support source parameter; FGV and NRE-IRE now use source="github"
  • R/get_fgv_ibre.R: Removed broken internal data fallback; added clear error for fresh downloads
  • R/get_nre_ire.R: Removed broken internal data fallback; added clear error for fresh downloads
  • inst/extdata/datasets.yaml: Added manual update flags and notes

Bug Fixes from Recent Commits

Property Records Simplification (Commit 9eab0ca)

  • Refactored: Major simplification of get_property_records.R (14% code reduction: 780→673 lines)
  • Removed: Deprecated functions get_ri_capitals() and get_ri_aggregates() with warning messages
  • Removed: Unused metadata attributes (source, download_time, download_info) that were never used
  • Simplified: Documentation for internal function (removed verbose examples and sections)
  • Improved: scrape_registro_imoveis_links() with better connection cleanup and reduced complexity

BCB Dataset Critical Fixes (Commit bb580c8)

BCB Real Estate
  • Fixed: CLI message serialization error in targets pipeline
  • Fixed: Compute nrow() before CLI interpolation to avoid closure issues
BCB Series - Graceful Degradation (CRITICAL)
  • Fixed: Replaced batch download with individual series downloads for better reliability
  • Fixed: Now returns successful series even if some fail (e.g., 14/15 instead of 0/15)
  • Added: Per-series retry logic with exponential backoff using purrr::possibly() pattern
  • Added: Clear warnings showing which series failed
  • Restored: Commented-out table filtering logic - now filters by bcb_category when table specified
  • Improved: Metadata-driven approach using bcb_metadata dynamically (now downloads all 140 series, not just 15)
Get Dataset Infrastructure
  • Fixed: BCB Real Estate table filtering by category in get-dataset.R
  • Fixed: BCB Series table filtering by bcb_category
  • Added: Support for table="all" in validate_and_resolve_table() function
  • Fixed: Proper mapping of user-facing table names to internal Portuguese categories
Registry and Tests
  • Updated: bcb_series categories in datasets.yaml to match metadata
  • Added: Missing categories: production, interest-rate, exchange, government, real-estate
  • Added: Integration tests for BCB table filtering and graceful degradation
  • Result: All 97 integration tests now pass

Get Dataset Critical Fixes (Commit ce4768b)

CLI Message Scoping
  • Fixed: Added .envir = parent.frame() to cli::cli_inform() calls in cli_user() and cli_debug()
  • Fixed: “cannot coerce type ‘closure’ to vector of type ‘character’” error
  • Affected: Previously failed for rppi_bis, property_records, and all functions using these helpers
FipeZap Data Quality
  • Fixed: Added standardize_city_names() call after binding FipeZap data
  • Fixed: Now correctly shows “Brazil” instead of “Índice Fipezap” for national index
Property Records Table Extraction
  • Fixed: Added special handling for nested property_records structure in get-dataset.R
  • Fixed: Now returns single tibbles instead of nested lists
  • Fixed: All tables (capitals, cities, aggregates, transfers) now work correctly
Testing Infrastructure
  • Added: Comprehensive integration test suite with 37 tests covering critical get_dataset() functionality
  • Added: Tests with source="fresh" to catch real-world failures before production
  • Added: GitHub Actions CI workflow for weekly integration tests
  • Added: Manual testing script tests/basic_checks.R for development

Note on Vignettes

  • Vignettes temporarily set to eval=FALSE for faster development
  • TODO: Re-enable vignette evaluation before CRAN release

realestatebr 0.4.1

Bug Fixes

RPPI Individual Table Access

  • Fixed: get_dataset("rppi", "ivgr") and other individual RPPI tables now work correctly
  • Fixed: Vignette build errors caused by RPPI table routing issues
  • Improved: Internal get_rppi() function now supports all individual RPPI tables (fipezap, ivgr, igmi, iqa, iqaiw, ivar, secovi_sp) in addition to stacked tables (sale, rent, all)

CRAN Compliance

  • Fixed: Removed all non-ASCII characters from R source files (7 files affected)
  • Replaced Portuguese characters with Unicode escapes for CRAN compliance
  • Files updated: get_bcb_realestate.R, get_cbic.R, get_fgv_ibre.R, get_property_records.R, get_rppi.R, get_rppi_bis.R, get_secovi.R

Test Suite

  • Fixed: Updated deprecated category= parameter to table= in tests/sanity_check.R

realestatebr 0.4.0

Major Breaking Changes - API Consolidation

🎯 Unified Data Interface

This release implements a major breaking change that consolidates 15+ individual get_*() functions into a single, unified get_dataset() interface. This dramatically simplifies the package API while maintaining full functionality.

BREAKING CHANGE: All individual get_*() functions have been removed: - get_abecip_indicators(), get_abrainc_indicators(), get_rppi(), get_bcb_realestate(), etc. - Migration: Use get_dataset("dataset_name") instead

🔧 RPPI Code Simplification (Internal)

Major refactoring of RPPI functions for better maintainability: - 67% code reduction: 1579 lines → 519 lines (1060 lines removed) - Bug fix: FipeZap national index now correctly standardized to name_muni == "Brazil" - Shared helpers: Created rppi-helpers.R with common functions to eliminate duplication - Removed overhead: Eliminated unused stack parameter, cli_debug calls, and metadata attributes - Simplified documentation: Removed verbose sections (Progress Reporting, Error Handling, Examples) from internal functions - All functions now @keywords internal: Only get_dataset() is user-facing

Benefits: - Easier to maintain and debug - Faster execution (less overhead) - Consistent error handling across all indices - Bug fixes apply to all functions automatically

📊 CBIC Dataset - Partial Release (Cement Only)

Note: In v0.4.0, the CBIC dataset is limited to cement tables only (validated data). Steel and PIM tables will be added in v0.4.1.

Available in v0.4.0: - ✅ cement_monthly_consumption - Monthly cement consumption by state - ✅ cement_annual_consumption - Annual cement consumption by region - ✅ cement_production_exports - Production, consumption, and export data - ✅ cement_monthly_production - Monthly cement production by state - ✅ cement_cub_prices - CUB cement prices by state

Coming in v0.4.1: - ⏳ Steel prices and production data - ⏳ PIM industrial production indices

# Works in v0.4.0
get_dataset("cbic")  # Default: cement_monthly_consumption
get_dataset("cbic", "cement_cub_prices")

# Will error with informative message
get_dataset("cbic", "steel_prices")  # Deferred to v0.4.1

🏗️ New Internal Architecture

  • Internal fetch functions: Created 12 new fetch_*() functions with @keywords internal
  • Registry-driven: All datasets managed through centralized inst/extdata/datasets.yaml
  • Hierarchical RPPI: Consolidated rppi and rppi_indices into single hierarchical structure
  • Consistent parameters: All internal functions use standardized table, cached, quiet, max_retries

📋 Simplified Public API

New unified interface:

# Get data from any dataset
data <- get_dataset("abecip")               # Default table
data <- get_dataset("abecip", table = "sbpe")  # Specific table
data <- get_dataset("rppi", table = "fipezap")  # Hierarchical access

# Discover datasets
datasets <- list_datasets()
info <- get_dataset_info("rppi")

Removed functions (now internal): - get_abecip_indicators()get_dataset("abecip") - get_abrainc_indicators()get_dataset("abrainc") - get_rppi()get_dataset("rppi") - get_bcb_realestate()get_dataset("bcb_realestate") - get_bcb_series()get_dataset("bcb_series") - Plus 10 more functions

🔧 Enhanced Data Access

  • Smart fallback: Auto fallback from GitHub cache → fresh download
  • Source control: Explicit source = "cache"/"github"/"fresh" options
  • Better error messages: Detailed troubleshooting information
  • Metadata preservation: All data includes source tracking and download info

🧪 Comprehensive Testing

  • New test suite: test-internal-functions-0.4.0.R with 100 tests
  • Registry validation: Ensures all datasets have proper internal function mappings
  • Parameter consistency: Validates all internal functions follow standard interface
  • Hierarchical testing: Comprehensive RPPI access pattern validation

Migration Guide

For Existing Code (Breaking Changes)

# OLD (0.3.x) - Will no longer work
data <- get_abecip_indicators(table = "sbpe")
data <- get_rppi(table = "fipezap")
data <- get_bcb_realestate(table = "all")

# NEW (0.4.0) - Required migration
data <- get_dataset("abecip", table = "sbpe")
data <- get_dataset("rppi", table = "fipezap")
data <- get_dataset("bcb_realestate", table = "all")

Dataset Name Mapping

Old Function New get_dataset() Name
get_abecip_indicators() "abecip"
get_abrainc_indicators() "abrainc"
get_rppi() "rppi"
get_bcb_realestate() "bcb_realestate"
get_bcb_series() "bcb_series"
get_rppi_bis() "rppi_bis"
get_secovi() "secovi"
get_fgv_indicators() "fgv_indicators"
get_b3_stocks() "b3_stocks"
get_nre_ire() "nre_ire"
get_cbic_*() "cbic"
get_itbi() "itbi"
get_property_records() "registro"

RPPI Consolidation

# OLD - Multiple functions
fipezap <- get_rppi_fipezap()
igmi <- get_rppi_igmi()
bis <- get_rppi_bis()

# NEW - Unified hierarchical access
fipezap <- get_dataset("rppi", table = "fipezap")
igmi <- get_dataset("rppi", table = "igmi")
bis <- get_dataset("rppi", table = "bis")

Technical Implementation

Internal Architecture

Backward Compatibility

  • Phase 1: Internal functions call legacy functions for gradual transition
  • Testing: Comprehensive test coverage ensures functionality preservation
  • Error handling: Graceful degradation with informative error messages

This release represents a major architectural shift toward a unified, maintainable API. While it introduces breaking changes, the new interface is significantly simpler and more powerful.

Full Changelog: https://github.com/viniciusoike/realestatebr/compare/v0.3.0…v0.4.0


realestatebr 0.3.0

Major Features and Improvements

🎯 Phase 2: Data Pipeline Implementation Complete

  • {targets} Pipeline Framework: Implemented comprehensive targets workflow for automated data processing and validation
  • Automated Data Workflows: Added daily and weekly GitHub Actions workflows using the targets pipeline
  • Data Validation Infrastructure: Added comprehensive validation rules and reporting for all datasets
  • Pipeline Performance Monitoring: Added automated report generation and validation status tracking

📊 Enhanced Data Processing

  • Targets Pipeline: _targets.R workflow with automated dependency management and parallel processing
  • Validation System: Comprehensive data validation rules with automated quality checks
  • Pipeline Helpers: Centralized helper functions for consistent data processing across all sources
  • Report Generation: Automated pipeline status reports and data quality summaries

🔧 Improved Function Reliability

  • Error Handling: Enhanced error handling in cache.R with better fallback mechanisms
  • Function Fixes: Fixed parameter bugs in get_abrainc_indicators() (category → table)
  • Data Access: Improved get_nre_ire() to use internal package data directly
  • Internal Data: Updated sysdata.rda with latest processed datasets

🚀 Infrastructure Improvements

  • Workflow Automation: Replaced single update workflow with focused daily/weekly pipelines
  • Cache Management: Improved cache validation and fallback strategies
  • Data Source Updates: Enhanced FGV data cleaning with improved formatting
  • Dependency Updates: Added targets and tarchetypes to package dependencies

📈 New Data Sources

  • B3 Stocks: Added enhanced B3 stock data processing with standardized formatting
  • FGV Indicators: Improved FGV consultation data processing and validation
  • Industrial Production: Enhanced CBIC PIM data integration
  • Construction Materials: Updated CBIC cement and steel data processing

Technical Implementation

Targets Pipeline Architecture

  • Automated Processing: All datasets now processed through unified targets pipeline
  • Quality Assurance: Built-in validation and quality checks for all data sources
  • Performance Monitoring: Real-time pipeline status and error reporting
  • Dependency Management: Automatic detection of data updates and re-processing

Enhanced Error Handling

  • Graceful Degradation: Improved fallback mechanisms for failed data retrievals
  • Better Diagnostics: Enhanced error messages and troubleshooting information
  • Retry Logic: Smart retry mechanisms with exponential backoff
  • Progress Reporting: Real-time progress updates during long-running operations

Data Quality Improvements

  • Validation Rules: Comprehensive validation for all datasets
  • Metadata Tracking: Enhanced metadata preservation and source tracking
  • Format Standardization: Consistent data formatting across all sources
  • Quality Metrics: Automated quality assessment and reporting

Migration Notes

For Existing Users

  • All existing functions continue to work unchanged
  • Enhanced reliability and performance with new pipeline backend
  • Improved error messages and troubleshooting information
  • Better cache management and fallback strategies

For Developers

  • New targets pipeline provides foundation for custom data workflows
  • Enhanced validation framework for quality assurance
  • Standardized helper functions for consistent data processing
  • Comprehensive pipeline documentation and examples

This release establishes the foundation for automated data processing and validation, setting the stage for Phase 3 implementation with large dataset support.

Full Changelog: https://github.com/viniciusoike/realestatebr/compare/v0.2.0…v0.3.0


realestatebr 0.2.0

Major Features and Improvements

🎯 Phase 1 Modernization Complete

  • Modernized 13 core get_* functions with consistent APIs, CLI-based error handling, and progress reporting
  • Standardized function signatures with table, cached, quiet, and max_retries parameters
  • Robust error handling with retry logic, exponential backoff, and informative error messages
  • Enhanced documentation with comprehensive examples and @section blocks

📊 New Unified Data Architecture

  • list_datasets() - Discover available datasets with filtering capabilities
  • get_dataset() - Unified data access function with intelligent fallback
  • Registry system in inst/extdata/datasets.yaml for centralized dataset management
  • Improved caching with standardized cache location and validation

🔧 API Standardization

  • Introduced table parameter replacing category across all functions
  • Backward compatibility maintained with deprecation warnings for category parameter
  • Consistent return types - single tibble by default, list when table = "all"
  • Metadata attributes on all returned data with source tracking and download info

📈 New Data Sources

  • CBIC construction materials data:
  • Enhanced RPPI suite with improved coordination and error handling
  • Updated B3 stock data with standardized column names

🚀 Performance and Reliability

  • Progress reporting with cli package integration for long-running operations
  • Exponential backoff for failed web scraping and API calls
  • Parallel processing support in web scraping functions
  • Comprehensive input validation with helpful error messages

🌐 Bilingual Support

  • Translation system for Portuguese/English column names and values
  • Standardized naming conventions across all datasets
  • Region and state name translations for geographic data

Breaking Changes

Parameter Changes

  • category parameter deprecated across all functions in favor of table
    • Backward compatibility maintained with deprecation warnings
    • Will be removed in a future version
    • Migration: Replace category = "value" with table = "value"

Cache Location

  • Cache moved from cached_data/ to inst/cached_data/ for package compliance
  • Existing cache files automatically migrated

Modernized Functions

Fully Modernized (13 functions)

Legacy Functions (Maintained)

  • get_rppi_bis() - Main function with modernized backend and single tibble returns
  • get_itbi() and get_itbi_bhe() - Planned for Phase 3 (DuckDB integration)

Infrastructure Improvements

New Architecture Components

  • Dataset registry system with YAML configuration
  • Unified cache management with validation and fallback
  • Translation framework for multilingual support
  • Helper function ecosystem for robust web operations

Developer Experience

  • Comprehensive test suite with 35+ tests covering all modernized functions
  • Consistent documentation patterns with @section blocks and examples
  • CLI-based development workflow with devtools integration
  • GitHub Actions for automated testing and deployment

Migration Guide

For Existing Code

# Old (deprecated but still works)
data <- get_abecip_indicators(category = "all")

# New (recommended)
data <- get_abecip_indicators(table = "all")

For New Code

# Discover available datasets
datasets <- list_datasets()

# Get data with unified interface
data <- get_dataset("abecip_indicators")

# Use modernized functions with progress
data <- get_abecip_indicators(table = "indicators", quiet = FALSE)

Technical Details

Dependencies

  • Added: cli for modern error handling and progress reporting
  • Enhanced: Better integration with dplyr, readr, httr, and rvest
  • Maintained: Full backward compatibility with existing dependencies

Performance

  • Improved web scraping with intelligent retry logic
  • Faster cache access with optimized file structure
  • Better memory usage with streaming and lazy loading where appropriate

This release represents the completion of Phase 1 modernization, establishing a solid foundation for Phase 2 (data pipeline automation) and Phase 3 (large dataset support with DuckDB).

Full Changelog: https://github.com/viniciusoike/realestatebr/compare/v0.1.5…v0.2.0