A unified firmographic data platform with thousands of companies from open-source datasets
Company Atlas collects, cleans, and normalizes firmographic data from multiple sources, producing an analytics-ready dataset with thousands of companies worldwide.
Combines the open-source Kaggle Fortune 1000 dataset with web-crawler enrichment from multiple data sources
Apache Airflow orchestration with automated data quality checks using dbt tests and Great Expectations
Star schema design with dimension and fact tables, deduplication across sources, ready for analytics and visualization
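As a rough illustration of what deduplication across sources can look like, the sketch below merges records that share a normalized domain. It is not the project's actual implementation: the helper names `normalize_domain` and `deduplicate`, the choice of domain as the match key, and the sample records are all assumptions made for this example. Only the field names `company_name`, `domain`, and `industry` are taken from the API example further down.

```python
from urllib.parse import urlparse


def normalize_domain(raw: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without 'www.' (hypothetical helper)."""
    raw = raw.strip().lower()
    # urlparse only fills netloc when a scheme is present, so add one if it is missing.
    host = urlparse(raw if "://" in raw else f"http://{raw}").netloc
    host = host.split(":")[0]  # drop any port
    return host[4:] if host.startswith("www.") else host


def deduplicate(records: list[dict]) -> list[dict]:
    """Merge records sharing a normalized domain (assumed match key).

    The first record seen for a domain wins; later records only fill
    fields that are still None.
    """
    merged: dict[str, dict] = {}
    for record in records:
        key = normalize_domain(record["domain"])
        if key not in merged:
            merged[key] = {**record, "domain": key}
            continue
        for field, value in record.items():
            if merged[key].get(field) is None and value is not None:
                merged[key][field] = value
    return list(merged.values())


# Illustrative records only, not real pipeline data.
records = [
    {"company_name": "Apple", "domain": "https://www.apple.com", "industry": None},
    {"company_name": "Apple Inc.", "domain": "apple.com", "industry": "Technology"},
    {"company_name": "Example Corp", "domain": "example.com", "industry": "Retail"},
]
unique = deduplicate(records)
print(len(unique))
```

In a warehouse setup like the one described here, this logic would more likely live in a dbt model as SQL; the Python version is only meant to show the idea.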
Explore profiles of leading companies ranked by market capitalization
Search by company name or CEO name
A modern data pipeline architecture built with industry-leading tools for scalable, reliable, and maintainable data processing
AWS S3 for raw data storage with CSV and Parquet formats
Snowflake for staging and analytics-ready tables
dbt for modeling and data transformation with version control
dbt tests and Great Expectations for comprehensive validation
Apache Airflow for automated workflow scheduling
FastAPI REST endpoints with interactive web interface
Access the unified company dataset through a REST API built with FastAPI, with interactive documentation and filtering on every list endpoint.
/api/v1/companies
Search and retrieve companies with filtering and pagination
/api/v1/companies/{id}
Get a specific company by ID
/api/v1/statistics
Get dataset statistics and distributions
/api/v1/industries
Get a list of all industries
/api/v1/countries
Get a list of all countries
import requests

# Search for Apple by company name
response = requests.get(
    "http://localhost:8000/api/v1/companies",
    params={
        "company_name": "Apple",
        "page": 1,
        "page_size": 10,
    },
)
response.raise_for_status()  # stop on an HTTP error instead of parsing an error body
companies = response.json()

print(f"Found {companies['total']} companies")
for company in companies['companies']:
    print(f"- {company['company_name']} ({company['domain']})")
    print(f"  Industry: {company['industry']}")
    print(f"  Revenue: ${company['revenue']:,.0f}")
    print(f"  Employees: {company['employee_count']:,}")
# Get statistics
curl "http://localhost:8000/api/v1/statistics"
# Search for Apple
curl "http://localhost:8000/api/v1/companies?company_name=Apple"
# Get a specific company by ID (replace {company_id} with an actual ID)
curl "http://localhost:8000/api/v1/companies/{company_id}"
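Because `/api/v1/companies` is paginated, retrieving more than one page means looping over `page`. The sketch below is one possible way to do that, not an official client. It assumes pages are numbered from 1 and that `total` in the response is the overall number of matching companies, which the examples above suggest but do not confirm. The names `iter_companies`, `fetch_from_api`, and `fetch_fake` are hypothetical, and `fetch_fake` is only an offline stand-in so the loop can be run without a server.

```python
from typing import Callable, Iterator


def iter_companies(
    fetch_page: Callable[[int, int], dict], page_size: int = 10
) -> Iterator[dict]:
    """Yield companies page by page until `total` records have been seen."""
    page, seen = 1, 0  # assumption: the API numbers pages from 1
    while True:
        payload = fetch_page(page, page_size)
        batch = payload["companies"]
        if not batch:
            return  # an empty page also ends the loop
        yield from batch
        seen += len(batch)
        if seen >= payload["total"]:
            return
        page += 1


def fetch_from_api(page: int, page_size: int) -> dict:
    """Fetch one page from the local API used in the examples above."""
    import requests  # imported here so the offline demo below needs no dependency

    response = requests.get(
        "http://localhost:8000/api/v1/companies",
        params={"page": page, "page_size": page_size},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


# Offline stand-in with made-up records, shaped like the response above.
FAKE = [{"company_name": f"Company {i}"} for i in range(25)]


def fetch_fake(page: int, page_size: int) -> dict:
    start = (page - 1) * page_size
    return {"total": len(FAKE), "companies": FAKE[start:start + page_size]}


names = [c["company_name"] for c in iter_companies(fetch_fake)]
print(len(names))
```

Against a running server, `iter_companies(fetch_from_api)` would replace the stand-in. Passing the fetch function in as an argument keeps the paging loop separate from the HTTP call, which makes it easy to exercise without network access.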