Changelog • realestatebr

realestatebr 0.6.0 (2025-10-15)

Code Simplification: Logic Consolidation (Phase 3)

Generic Helper Functions

Version 0.6.0 introduces 7 generic helper functions that consolidate 890 lines of repetitive code patterns across dataset functions.

What Changed

Created: R/helpers-dataset.R with 6 new helper functions (430 lines)
Refactored: 5 core files using new helpers (417 lines removed, 15.4% reduction)
Added: apply_table_filtering() in R/get-dataset.R (95 lines, eliminates 156 lines of duplication)
Added: 52 comprehensive tests for all helper functions
Improved: Consistent error messages and metadata across all datasets

File-by-File Results

File	Before	After	Lines Saved	% Reduction
get_abecip_indicators.R	551	431	120	21.8%
get_abrainc_indicators.R	544	445	99	18.2%
get_secovi.R	438	356	82	18.7%
get_bcb_series.R	334	278	56	16.8%
get-dataset.R	833	773	60	7.2%
TOTAL	2,700	2,283	417	15.4%

New Helper Functions

validate_dataset_params() (R/helpers-dataset.R)
- Consolidated input validation for table, cached, quiet, max_retries parameters
- Ensures consistent error messages across all datasets
- Saves ~28 lines per file
handle_dataset_cache() (R/helpers-dataset.R)
- Unified cache loading with fallback strategies
- Consistent error handling and user messages
- Saves ~35-50 lines per file
attach_dataset_metadata() (R/helpers-dataset.R)
- Standardized metadata attachment (source, download_time, download_info)
- Flexible extra_info parameter for dataset-specific metadata
- Saves ~8-16 lines per file
validate_dataset() (R/helpers-dataset.R)
- Generic data validation (rows, columns, dates)
- Configurable validation rules with detailed error messages
- Saves ~44 lines per file
validate_excel_file() (R/helpers-dataset.R)
- Excel file validation (size, expected sheets)
- Used by abrainc and abecip functions
- Prevents silent failures
download_with_retry() (R/rppi-helpers.R - REUSED)
- Found existing implementation, avoided duplication
- Saves ~46-86 lines per file that would have been duplicated
apply_table_filtering() (R/get-dataset.R)
- Centralizes all table/category filtering logic
- Supports property_records, SECOVI, BCB Real Estate, BCB Series
- Eliminates 156 lines of duplication between cache functions

Impact

Code Quality: - DRY principle applied - eliminated 890 lines of code duplication - Single source of truth for common operations - Changes to validation logic now require 1 edit instead of 7

Maintainability: - Helper functions well-documented with roxygen2 - 52 comprehensive tests ensure quality - Clear separation of concerns

Consistency: - Uniform error messages across all datasets - Standardized parameter validation - Consistent metadata structure

Testing: - Helper function tests: 52 tests (100% passing) - Integration tests: 100 tests, 99 passing (1 pre-existing failure) - Full test suite: 253 tests, 248 passing (98.0%) - 3 failures: expected error message format changes - 2 failures: incomplete datasets under development

Files Changed

New: R/helpers-dataset.R (430 lines, 6 helpers, 52 tests)
Updated: R/get_abecip_indicators.R (21.8% reduction)
Updated: R/get_abrainc_indicators.R (18.2% reduction)
Updated: R/get_secovi.R (18.7% reduction)
Updated: R/get_bcb_series.R (16.8% reduction)
Updated: R/get-dataset.R (7.2% reduction, added apply_table_filtering())

Rationale

Simplification: Reduce codebase complexity and maintenance burden
Consistency: Ensure uniform behavior across all dataset functions
DRY: Follow “Don’t Repeat Yourself” principle
Testing: Well-tested helpers prevent regressions

See .claude/phase3_completion_summary.md for complete details.

BREAKING CHANGES: API Simplification (Phase 2)

Removed Deprecated Function Exports

Version 0.6.0 removes 8 deprecated functions from the public API. These functions are now internal-only. Since we are pre-1.0.0, this is an acceptable breaking change.

What Changed

Removed from NAMESPACE: 8 deprecated functions no longer exported:
- get_abecip_indicators()
- get_abrainc_indicators()
- get_bcb_realestate()
- get_bcb_series()
- get_fgv_ibre()
- get_nre_ire()
- get_rppi_bis()
- get_secovi()
Still callable internally: Functions remain in the package for get_dataset() to use
NAMESPACE reduction: Exports reduced from 23 to 15 functions

Impact

Users must migrate: These functions can no longer be called directly
get_dataset() is the only supported API: All data access must go through get_dataset()
Cleaner public interface: Package exports only essential user-facing functions

Migration

These functions were deprecated in v0.4.0 (18+ months ago). Users must now use get_dataset():

# Old way (NO LONGER WORKS):
data <- get_secovi()
data <- get_bcb_series(table = "price")
data <- get_abecip_indicators(table = "sbpe")

# New way (REQUIRED):
data <- get_dataset("secovi")
data <- get_dataset("bcb_series", "price")
data <- get_dataset("abecip", "sbpe")

Rationale

Simpler API: One function (get_dataset()) instead of 15+
Reduced maintenance: Fewer exported functions to document and test
Pre-1.0.0 flexibility: Breaking changes acceptable before stable release
18-month deprecation period: Functions were deprecated since v0.4.0

Code Clarity Improvements

Renamed Confusing “Legacy” Terminology

Renamed: get_from_legacy_function() → get_from_internal_function()
Rationale: These functions call internal worker functions, not “legacy” code
Impact: Internal only - no user-facing changes

Files changed: R/get-dataset.R

CBIC Code Simplification and Table Availability

Code Reduction (~223 lines, 11% smaller)

Removed: 100+ lines of commented-out old implementation code
Removed: 4 unused helper functions (124 lines total):
- suppress_external_warnings() - Never called
- explore_cbic_structure() - Only in examples
- get_cbic_files() - Only in examples
- get_cbic_materials() - Only in examples
Removed: Unnecessary metadata attributes from get_cbic_steel() and get_cbic_pim():
- attr(result, "source")
- attr(result, "download_time")
- attr(result, "download_info")
- Associated tryCatch blocks and cli_user messages (~69 lines)

Table Availability Fixed

Unblocked: steel_prices and pim tables now accessible
Blocked: Only steel_production remains blocked (has data quality issues)
Updated: Error messages now accurately reflect v0.6.0 status
Available tables:
- Cement: cement_monthly_consumption, cement_annual_consumption, cement_production_exports, cement_monthly_production, cement_cub_prices
- Steel: steel_prices
- PIM: pim, pim_production_index

Rationale

Simplification: Removed dead code and unused functions for better maintainability
Accuracy: Updated table availability to reflect actual working status
User experience: Clear error messages guide users to available tables

Files changed: R/get_cbic.R

BREAKING CHANGES: Documentation Simplification (Phase 1)

Removed Examples from Deprecated Functions

Version 0.6.0 removes usage examples from deprecated legacy functions to simplify the codebase. Since we are pre-1.0.0, this is an acceptable breaking change.

What Changed

Removed: All @examples blocks from 8 deprecated functions:
- get_secovi()
- get_bcb_realestate()
- get_abrainc_indicators()
- get_abecip_indicators()
- get_rppi_bis()
- get_bcb_series()
- get_fgv_ibre()
- get_nre_ire()
Removed: Verbose @section blocks (Progress Reporting, Error Handling)
Simplified: @details sections to 1-3 essential lines
Enhanced: @section Deprecation blocks with code migration examples

Impact

~260-290 lines of documentation removed
Documentation now focuses on migration guidance rather than usage examples
All functions still exported and callable (no functionality changes)
Help pages now emphasize using get_dataset() instead

Migration

These functions were deprecated in v0.4.0. Users should migrate to the modern API:

# Old way (still works, but no longer documented with examples):
data <- get_secovi()

# New way (recommended):
data <- get_dataset("secovi")

Full migration examples are available in each function’s @section Deprecation block.

Rationale

Pre-1.0.0: Breaking changes are acceptable before stable release
Codebase simplification: Reduces maintenance burden and package size
Focus on modern API: Encourages users to adopt get_dataset() interface
Clear migration path: Enhanced deprecation warnings guide users to new API

realestatebr 0.5.1

Bug Fixes

SECOVI Default Table Fix

Fixed SECOVI dataset to return all categories by default instead of only “condo”

Problem: get_dataset("secovi") was only returning the “condo” category (1,939 rows) instead of all categories (9,398 rows). This caused test failures for launch/rent/sale tables.
Root Cause: When no table parameter was specified, the code defaulted to the first category alphabetically (“condo”), rather than fetching all categories.
Solution:
- Added default_table configuration support in datasets.yaml
- Updated validate_and_resolve_table() to check for default_table setting
- Set SECOVI’s default_table: "all" in registry
- Regenerated cache with all 4 categories
Impact:
- Cache size: 12KB → 55KB (includes all categories)
- Data completeness: 1,939 → 9,398 rows
- Categories: condo (1,939), launch (780), rent (2,779), sale (3,900)

# Now returns all categories by default
get_dataset("secovi")  # → 9,398 rows, 4 categories ✅

# Specific tables still work correctly
get_dataset("secovi", "launch")  # → 780 rows
get_dataset("secovi", "rent")    # → 2,779 rows
get_dataset("secovi", "sale")    # → 3,900 rows

Test Infrastructure Improvements

Updated test suite to use devtools::load_all() instead of library() to ensure testing of development version
Added comprehensive pre-release test suite (tests/comprehensive_check_v0.5.qmd)
Added test result documentation (tests/TEST_RESULTS_SUMMARY.md, tests/QUICK_SUMMARY.md)

Pipeline Improvements

Updated _targets.R to always load development version for consistency
Ensures targets pipeline uses latest code during cache regeneration

realestatebr 0.5.0

BREAKING CHANGES: User-Level Caching Architecture

Major Architectural Change

Version 0.5.0 introduces user-level caching, removing bundled datasets from the package to comply with CRAN’s 5MB size limit. This is a BREAKING CHANGE that affects how datasets are accessed.

What Changed

Removed: All cached datasets from inst/cached_data/ (previously ~25MB)
Added: User-level cache directory at ~/.local/share/realestatebr/ (Linux/Mac) or %LOCALAPPDATA%/realestatebr/Cache/ (Windows)
Added: GitHub Releases integration for pre-processed datasets
Changed: source="cache" now refers to user cache, not package cache
Changed: source="github" now downloads from GitHub releases, not package files

New Cache Behavior

# First use: downloads from GitHub releases to user cache
data <- get_dataset("abecip")  # Downloads once

# Subsequent uses: loads from user cache (instant, offline)
data <- get_dataset("abecip")  # Loads from ~/.local/share/realestatebr/

# Force fresh download from original source
data <- get_dataset("abecip", source = "fresh")  # Downloads and caches

# Explicit source selection
data <- get_dataset("abecip", source = "cache")   # User cache only
data <- get_dataset("abecip", source = "github")  # GitHub releases only

Auto Fallback Strategy (source = “auto”, default)

User Cache: Check ~/.local/share/realestatebr/ (instant, offline)
GitHub Releases: Download pre-processed data (requires piggyback package)
Fresh Download: Download from original source (saves to user cache)

New Dependencies

Added: rappdirs (Imports) - Cross-platform user cache directory support
Added: piggyback (Suggests) - GitHub releases download support

New Functions

get_user_cache_dir(): Get path to user cache directory
list_cached_files(): List all cached datasets
clear_user_cache(): Remove cached datasets
is_cached(): Check if dataset is in cache
list_github_assets(): List available datasets on GitHub releases
download_from_github_release(): Download specific dataset from releases
update_cache_from_github(): Update cached datasets from GitHub
is_cache_up_to_date(): Compare local vs GitHub cache timestamps

Migration Guide

For Users

# Install updated package
install.packages("realestatebr")  # or devtools::install_github()

# Install piggyback for GitHub downloads (recommended)
install.packages("piggyback")

# First use after update: will download datasets to user cache
data <- get_dataset("abecip")

# Check cache location
get_user_cache_dir()

# Manage cache
list_cached_files()           # See what's cached
clear_user_cache("abecip")    # Clear specific dataset
clear_user_cache()            # Clear all (with confirmation)

For Package Developers

Cached data files now excluded from package build via .Rbuildignore
Package size reduced from ~25MB to <5MB (CRAN compliant)
inst/cached_data/ kept for development/CI but excluded from distribution
GitHub Actions workflow publishes cache to releases via data-raw/publish-cache.R

Benefits

✅ CRAN Compliant: Package size now <5MB (was 25MB)
✅ Faster Installation: Package downloads are much smaller
✅ Offline Usage: Once cached, datasets work offline
✅ User Control: Users manage their own cache
✅ Weekly Updates: GitHub releases updated automatically by CI
✅ No Breaking APIs: get_dataset() interface unchanged

Deprecations

import_cached(): Still works but now loads from user cache (previously from inst/)
Old cached=TRUE parameter in legacy functions: Still supported but uses new cache

Files Changed

New: R/cache-user.R - User cache management
New: R/cache-github.R - GitHub releases integration
New: data-raw/publish-cache.R - Upload cache to releases
Updated: R/get-dataset.R - Refactored cache logic
Updated: R/cache.R - Marked as deprecated (kept for compatibility)
Updated: .Rbuildignore - Exclude inst/cached_data/ files
Updated: DESCRIPTION - Added rappdirs and piggyback dependencies

Targets Pipeline Fixes

Critical Pipeline Functionality

Fixed: Targets pipeline now fully functional for automated data updates
Fixed: FGV IBRE and NRE-IRE datasets now work correctly in targets pipeline
- Changed from source="fresh" to source="github" for manually-updated datasets
- These datasets have no API/download capability and require manual updates
Fixed: Removed broken internal data object fallback in get_fgv_ibre() and get_nre_ire()
- Previously tried to access non-existent fgv_data and ire objects from R/sysdata.rda
- Now provides clear error messages when fresh downloads are attempted with cached=FALSE

Enhanced Dataset Registry

Added: manual_update flag to datasets.yaml for FGV IBRE and NRE-IRE
Added: update_notes field documenting why fresh downloads aren’t available
Improved: Clear documentation in _targets.R explaining data source choices

Files Changed

_targets.R: Updated fetch_dataset() to support source parameter; FGV and NRE-IRE now use source="github"
R/get_fgv_ibre.R: Removed broken internal data fallback; added clear error for fresh downloads
R/get_nre_ire.R: Removed broken internal data fallback; added clear error for fresh downloads
inst/extdata/datasets.yaml: Added manual update flags and notes

Bug Fixes from Recent Commits

Property Records Simplification (Commit 9eab0ca)

Refactored: Major simplification of get_property_records.R (14% code reduction: 780→673 lines)
Removed: Deprecated functions get_ri_capitals() and get_ri_aggregates() with warning messages
Removed: Unused metadata attributes (source, download_time, download_info) that were never used
Simplified: Documentation for internal function (removed verbose examples and sections)
Improved: scrape_registro_imoveis_links() with better connection cleanup and reduced complexity

BCB Dataset Critical Fixes (Commit bb580c8)

BCB Real Estate

Fixed: CLI message serialization error in targets pipeline
Fixed: Compute nrow() before CLI interpolation to avoid closure issues

BCB Series - Graceful Degradation (CRITICAL)

Fixed: Replaced batch download with individual series downloads for better reliability
Fixed: Now returns successful series even if some fail (e.g., 14/15 instead of 0/15)
Added: Per-series retry logic with exponential backoff using purrr::possibly() pattern
Added: Clear warnings showing which series failed
Restored: Commented-out table filtering logic - now filters by bcb_category when table specified
Improved: Metadata-driven approach using bcb_metadata dynamically (now downloads all 140 series, not just 15)

Get Dataset Infrastructure

Fixed: BCB Real Estate table filtering by category in get-dataset.R
Fixed: BCB Series table filtering by bcb_category
Added: Support for table="all" in validate_and_resolve_table() function
Fixed: Proper mapping of user-facing table names to internal Portuguese categories

Registry and Tests

Updated: bcb_series categories in datasets.yaml to match metadata
Added: Missing categories: production, interest-rate, exchange, government, real-estate
Added: Integration tests for BCB table filtering and graceful degradation
Result: All 97 integration tests now pass

Get Dataset Critical Fixes (Commit ce4768b)

CLI Message Scoping

Fixed: Added .envir = parent.frame() to cli::cli_inform() calls in cli_user() and cli_debug()
Fixed: “cannot coerce type ‘closure’ to vector of type ‘character’” error
Affected: Previously failed for rppi_bis, property_records, and all functions using these helpers

FipeZap Data Quality

Fixed: Added standardize_city_names() call after binding FipeZap data
Fixed: Now correctly shows “Brazil” instead of “Índice Fipezap” for national index

Property Records Table Extraction

Fixed: Added special handling for nested property_records structure in get-dataset.R
Fixed: Now returns single tibbles instead of nested lists
Fixed: All tables (capitals, cities, aggregates, transfers) now work correctly

Testing Infrastructure

Added: Comprehensive integration test suite with 37 tests covering critical get_dataset() functionality
Added: Tests with source="fresh" to catch real-world failures before production
Added: GitHub Actions CI workflow for weekly integration tests
Added: Manual testing script tests/basic_checks.R for development

Note on Vignettes

Vignettes temporarily set to eval=FALSE for faster development
TODO: Re-enable vignette evaluation before CRAN release

realestatebr 0.4.1

Bug Fixes

RPPI Individual Table Access

Fixed: get_dataset("rppi", "ivgr") and other individual RPPI tables now work correctly
Fixed: Vignette build errors caused by RPPI table routing issues
Improved: Internal get_rppi() function now supports all individual RPPI tables (fipezap, ivgr, igmi, iqa, iqaiw, ivar, secovi_sp) in addition to stacked tables (sale, rent, all)

CRAN Compliance

Fixed: Removed all non-ASCII characters from R source files (7 files affected)
Replaced Portuguese characters with Unicode escapes for CRAN compliance
Files updated: get_bcb_realestate.R, get_cbic.R, get_fgv_ibre.R, get_property_records.R, get_rppi.R, get_rppi_bis.R, get_secovi.R

Test Suite

Fixed: Updated deprecated category= parameter to table= in tests/sanity_check.R

realestatebr 0.4.0

Major Breaking Changes - API Consolidation

🎯 Unified Data Interface

This release implements a major breaking change that consolidates 15+ individual get_*() functions into a single, unified get_dataset() interface. This dramatically simplifies the package API while maintaining full functionality.

BREAKING CHANGE: All individual get_*() functions have been removed: - get_abecip_indicators(), get_abrainc_indicators(), get_rppi(), get_bcb_realestate(), etc. - Migration: Use get_dataset("dataset_name") instead

🔧 RPPI Code Simplification (Internal)

Major refactoring of RPPI functions for better maintainability: - 67% code reduction: 1579 lines → 519 lines (1060 lines removed) - Bug fix: FipeZap national index now correctly standardized to name_muni == "Brazil" - Shared helpers: Created rppi-helpers.R with common functions to eliminate duplication - Removed overhead: Eliminated unused stack parameter, cli_debug calls, and metadata attributes - Simplified documentation: Removed verbose sections (Progress Reporting, Error Handling, Examples) from internal functions - All functions now @keywords internal: Only get_dataset() is user-facing

Benefits: - Easier to maintain and debug - Faster execution (less overhead) - Consistent error handling across all indices - Bug fixes apply to all functions automatically

📊 CBIC Dataset - Partial Release (Cement Only)

Note: In v0.4.0, the CBIC dataset is limited to cement tables only (validated data). Steel and PIM tables will be added in v0.4.1.

Available in v0.4.0: - ✅ cement_monthly_consumption - Monthly cement consumption by state - ✅ cement_annual_consumption - Annual cement consumption by region - ✅ cement_production_exports - Production, consumption, and export data - ✅ cement_monthly_production - Monthly cement production by state - ✅ cement_cub_prices - CUB cement prices by state

Coming in v0.4.1: - ⏳ Steel prices and production data - ⏳ PIM industrial production indices

# Works in v0.4.0
get_dataset("cbic")  # Default: cement_monthly_consumption
get_dataset("cbic", "cement_cub_prices")

# Will error with informative message
get_dataset("cbic", "steel_prices")  # Deferred to v0.4.1

🏗️ New Internal Architecture

Internal fetch functions: Created 12 new fetch_*() functions with @keywords internal
Registry-driven: All datasets managed through centralized inst/extdata/datasets.yaml
Hierarchical RPPI: Consolidated rppi and rppi_indices into single hierarchical structure
Consistent parameters: All internal functions use standardized table, cached, quiet, max_retries

📋 Simplified Public API

New unified interface:

# Get data from any dataset
data <- get_dataset("abecip")               # Default table
data <- get_dataset("abecip", table = "sbpe")  # Specific table
data <- get_dataset("rppi", table = "fipezap")  # Hierarchical access

# Discover datasets
datasets <- list_datasets()
info <- get_dataset_info("rppi")

Removed functions (now internal): - get_abecip_indicators() → get_dataset("abecip") - get_abrainc_indicators() → get_dataset("abrainc") - get_rppi() → get_dataset("rppi") - get_bcb_realestate() → get_dataset("bcb_realestate") - get_bcb_series() → get_dataset("bcb_series") - Plus 10 more functions

🔧 Enhanced Data Access

Smart fallback: Auto fallback from GitHub cache → fresh download
Source control: Explicit source = "cache"/"github"/"fresh" options
Better error messages: Detailed troubleshooting information
Metadata preservation: All data includes source tracking and download info

🧪 Comprehensive Testing

New test suite: test-internal-functions-0.4.0.R with 100 tests
Registry validation: Ensures all datasets have proper internal function mappings
Parameter consistency: Validates all internal functions follow standard interface
Hierarchical testing: Comprehensive RPPI access pattern validation

Migration Guide

For Existing Code (Breaking Changes)

# OLD (0.3.x) - Will no longer work
data <- get_abecip_indicators(table = "sbpe")
data <- get_rppi(table = "fipezap")
data <- get_bcb_realestate(table = "all")

# NEW (0.4.0) - Required migration
data <- get_dataset("abecip", table = "sbpe")
data <- get_dataset("rppi", table = "fipezap")
data <- get_dataset("bcb_realestate", table = "all")

Dataset Name Mapping

Old Function	New get_dataset() Name
`get_abecip_indicators()`	`"abecip"`
`get_abrainc_indicators()`	`"abrainc"`
`get_rppi()`	`"rppi"`
`get_bcb_realestate()`	`"bcb_realestate"`
`get_bcb_series()`	`"bcb_series"`
`get_rppi_bis()`	`"rppi_bis"`
`get_secovi()`	`"secovi"`
`get_fgv_indicators()`	`"fgv_indicators"`
`get_b3_stocks()`	`"b3_stocks"`
`get_nre_ire()`	`"nre_ire"`
`get_cbic_*()`	`"cbic"`
`get_itbi()`	`"itbi"`
`get_property_records()`	`"registro"`

RPPI Consolidation

# OLD - Multiple functions
fipezap <- get_rppi_fipezap()
igmi <- get_rppi_igmi()
bis <- get_rppi_bis()

# NEW - Unified hierarchical access
fipezap <- get_dataset("rppi", table = "fipezap")
igmi <- get_dataset("rppi", table = "igmi")
bis <- get_dataset("rppi", table = "bis")

Technical Implementation

Internal Architecture

12 internal fetch functions: fetch_rppi(), fetch_abecip(), etc.
Registry system: Complete mapping in datasets.yaml
Fallback mechanism: get_from_internal_function() → get_from_legacy_function()
NAMESPACE cleanup: Only exports get_dataset(), list_datasets(), utilities

Backward Compatibility

Phase 1: Internal functions call legacy functions for gradual transition
Testing: Comprehensive test coverage ensures functionality preservation
Error handling: Graceful degradation with informative error messages

This release represents a major architectural shift toward a unified, maintainable API. While it introduces breaking changes, the new interface is significantly simpler and more powerful.

Full Changelog: https://github.com/viniciusoike/realestatebr/compare/v0.3.0…v0.4.0

realestatebr 0.3.0

Major Features and Improvements

🎯 Phase 2: Data Pipeline Implementation Complete

{targets} Pipeline Framework: Implemented comprehensive targets workflow for automated data processing and validation
Automated Data Workflows: Added daily and weekly GitHub Actions workflows using the targets pipeline
Data Validation Infrastructure: Added comprehensive validation rules and reporting for all datasets
Pipeline Performance Monitoring: Added automated report generation and validation status tracking

📊 Enhanced Data Processing

Targets Pipeline: _targets.R workflow with automated dependency management and parallel processing
Validation System: Comprehensive data validation rules with automated quality checks
Pipeline Helpers: Centralized helper functions for consistent data processing across all sources
Report Generation: Automated pipeline status reports and data quality summaries

🔧 Improved Function Reliability

Error Handling: Enhanced error handling in cache.R with better fallback mechanisms
Function Fixes: Fixed parameter bugs in get_abrainc_indicators() (category → table)
Data Access: Improved get_nre_ire() to use internal package data directly
Internal Data: Updated sysdata.rda with latest processed datasets

🚀 Infrastructure Improvements

Workflow Automation: Replaced single update workflow with focused daily/weekly pipelines
Cache Management: Improved cache validation and fallback strategies
Data Source Updates: Enhanced FGV data cleaning with improved formatting
Dependency Updates: Added targets and tarchetypes to package dependencies

📈 New Data Sources

B3 Stocks: Added enhanced B3 stock data processing with standardized formatting
FGV Indicators: Improved FGV consultation data processing and validation
Industrial Production: Enhanced CBIC PIM data integration
Construction Materials: Updated CBIC cement and steel data processing

Technical Implementation

Targets Pipeline Architecture

Automated Processing: All datasets now processed through unified targets pipeline
Quality Assurance: Built-in validation and quality checks for all data sources
Performance Monitoring: Real-time pipeline status and error reporting
Dependency Management: Automatic detection of data updates and re-processing

Enhanced Error Handling

Graceful Degradation: Improved fallback mechanisms for failed data retrievals
Better Diagnostics: Enhanced error messages and troubleshooting information
Retry Logic: Smart retry mechanisms with exponential backoff
Progress Reporting: Real-time progress updates during long-running operations

Data Quality Improvements

Validation Rules: Comprehensive validation for all datasets
Metadata Tracking: Enhanced metadata preservation and source tracking
Format Standardization: Consistent data formatting across all sources
Quality Metrics: Automated quality assessment and reporting

Migration Notes

For Existing Users

All existing functions continue to work unchanged
Enhanced reliability and performance with new pipeline backend
Improved error messages and troubleshooting information
Better cache management and fallback strategies

For Developers

New targets pipeline provides foundation for custom data workflows
Enhanced validation framework for quality assurance
Standardized helper functions for consistent data processing
Comprehensive pipeline documentation and examples

This release establishes the foundation for automated data processing and validation, setting the stage for Phase 3 implementation with large dataset support.

Full Changelog: https://github.com/viniciusoike/realestatebr/compare/v0.2.0…v0.3.0

realestatebr 0.2.0

Major Features and Improvements

🎯 Phase 1 Modernization Complete

Modernized 13 core get_* functions with consistent APIs, CLI-based error handling, and progress reporting
Standardized function signatures with table, cached, quiet, and max_retries parameters
Robust error handling with retry logic, exponential backoff, and informative error messages
Enhanced documentation with comprehensive examples and @section blocks

📊 New Unified Data Architecture

list_datasets() - Discover available datasets with filtering capabilities
get_dataset() - Unified data access function with intelligent fallback
Registry system in inst/extdata/datasets.yaml for centralized dataset management
Improved caching with standardized cache location and validation

🔧 API Standardization

Introduced table parameter replacing category across all functions
Backward compatibility maintained with deprecation warnings for category parameter
Consistent return types - single tibble by default, list when table = "all"
Metadata attributes on all returned data with source tracking and download info

📈 New Data Sources

CBIC construction materials data:
- get_cbic_cement() - Cement consumption, production, and CUB prices
- get_cbic_steel() - Steel prices and production data
- get_cbic_pim() - Industrial production indices
Enhanced RPPI suite with improved coordination and error handling
Updated B3 stock data with standardized column names

🚀 Performance and Reliability

Progress reporting with cli package integration for long-running operations
Exponential backoff for failed web scraping and API calls
Parallel processing support in web scraping functions
Comprehensive input validation with helpful error messages

🌐 Bilingual Support

Translation system for Portuguese/English column names and values
Standardized naming conventions across all datasets
Region and state name translations for geographic data

Breaking Changes

Parameter Changes

category parameter deprecated across all functions in favor of table
- Backward compatibility maintained with deprecation warnings
- Will be removed in a future version
- Migration: Replace category = "value" with table = "value"

Cache Location

Cache moved from cached_data/ to inst/cached_data/ for package compliance
Existing cache files automatically migrated

Modernized Functions

Fully Modernized (13 functions)

get_abecip_indicators() - ABECIP real estate financing data
get_abrainc_indicators() - ABRAINC launches and sales data
get_b3_stocks() - B3 stock market data with improved column naming
get_bcb_realestate() - Central Bank real estate credit data
get_bcb_series() - BCB macroeconomic time series
get_rppi_bis() - Bank for International Settlements RPPI data
get_cbic_cement() - CBIC cement industry data (NEW)
get_cbic_steel() - CBIC steel industry data (NEW)
get_cbic_pim() - CBIC industrial production data (NEW)
get_fgv_indicators() - FGV construction confidence indicators
get_nre_ire() - Real Estate Index from NRE-Poli USP
get_property_records() - Property registration data with robust Excel processing
get_rppi() - Comprehensive RPPI coordinator with all sources
get_secovi() - SECOVI-SP real estate data with parallel processing

Legacy Functions (Maintained)

get_rppi_bis() - Main function with modernized backend and single tibble returns
get_itbi() and get_itbi_bhe() - Planned for Phase 3 (DuckDB integration)

Infrastructure Improvements

New Architecture Components

Dataset registry system with YAML configuration
Unified cache management with validation and fallback
Translation framework for multilingual support
Helper function ecosystem for robust web operations

Developer Experience

Comprehensive test suite with 35+ tests covering all modernized functions
Consistent documentation patterns with @section blocks and examples
CLI-based development workflow with devtools integration
GitHub Actions for automated testing and deployment

Migration Guide

For Existing Code

# Old (deprecated but still works)
data <- get_abecip_indicators(category = "all")

# New (recommended)
data <- get_abecip_indicators(table = "all")

For New Code

# Discover available datasets
datasets <- list_datasets()

# Get data with unified interface
data <- get_dataset("abecip_indicators")

# Use modernized functions with progress
data <- get_abecip_indicators(table = "indicators", quiet = FALSE)

Technical Details

Dependencies

Added: cli for modern error handling and progress reporting
Enhanced: Better integration with dplyr, readr, httr, and rvest
Maintained: Full backward compatibility with existing dependencies

Performance

Improved web scraping with intelligent retry logic
Faster cache access with optimized file structure
Better memory usage with streaming and lazy loading where appropriate

This release represents the completion of Phase 1 modernization, establishing a solid foundation for Phase 2 (data pipeline automation) and Phase 3 (large dataset support with DuckDB).

Full Changelog: https://github.com/viniciusoike/realestatebr/compare/v0.1.5…v0.2.0