Data Engineering

Data Lake

A centralized repository that stores structured, semi-structured, and unstructured data at any scale, typically in its raw, native format, so that structure can be applied later when the data is analyzed.

Data Pipeline

A series of processes and tools used to move, transform, and store data from source systems to destinations like data warehouses.

Data Warehouse

A centralized database optimized for analytical queries and reporting, typically storing historical data from multiple sources.

ETL (Extract, Transform, Load)

A process used in data integration where data is extracted from sources, transformed into a usable format, and loaded into a target system.
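The three ETL stages can be sketched in a few lines of Python. This is a minimal illustration, not a production pattern: the CSV content, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file or an API.
RAW_CSV = "id,name,amount\n1, alice ,10.50\n2,BOB,3\n"

def extract(text):
    """Extract: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names and convert types before loading."""
    return [
        (int(r["id"]), r["name"].strip().title(), float(r["amount"]))
        for r in rows
    ]

def load(conn, records):
    """Load: write the cleaned records to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real target system
load(conn, transform(extract(RAW_CSV)))
rows = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(rows)
```

Note that the data is cleaned before it reaches the target, so the target only ever holds transformed records.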

ELT (Extract, Load, Transform)

A variation of ETL in which raw data is first loaded into the target system and then transformed in place, typically using the target system's own compute and query engine.
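A rough sketch of the load-then-transform order, using an in-memory SQLite database as a stand-in for the target system. The table names `raw_sales` and `sales` and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a warehouse

# Load: land the raw values untouched in a staging table (hypothetical name).
conn.execute("CREATE TABLE raw_sales (id TEXT, name TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [("1", " alice ", "10.50"), ("2", "BOB", "3")],
)

# Transform: use the target system's own SQL engine to build a clean table.
conn.execute(
    """
    CREATE TABLE sales AS
    SELECT CAST(id AS INTEGER)  AS id,
           LOWER(TRIM(name))    AS name,
           CAST(amount AS REAL) AS amount
    FROM raw_sales
    """
)
elt_rows = conn.execute(
    "SELECT id, name, amount FROM sales ORDER BY id"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
print(elt_rows, raw_count)
```

The raw table remains available after the transformation, so the cleaning logic can be changed and rerun without extracting from the source again.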

Data Governance

The management of data availability, usability, integrity, and security in enterprise systems, based on internal standards and policies.

Schema

The structure of a database or dataset, including table definitions, columns, data types, and relationships.
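As a small illustration, the sketch below declares a two-table schema in SQLite and then reads the column definitions back. The `customers` and `orders` tables are hypothetical examples, not part of any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),  -- relationship
        total       REAL NOT NULL
    );
    """
)

# PRAGMA table_info returns (cid, name, type, notnull, default, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(orders)")]
print(columns)
```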

Batch Processing

A data processing method that collects and processes data in large blocks or batches at scheduled intervals.
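A minimal sketch of the idea: records are grouped into fixed-size chunks and each chunk is processed as a unit. The helper name `batches` and the batch size of 3 are arbitrary choices for illustration, and the scheduling side (for example a nightly job) is left out.

```python
from itertools import islice

def batches(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Process seven records in batches of three, summing each batch.
batch_totals = [sum(chunk) for chunk in batches(range(1, 8), 3)]
print(batch_totals)
```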

Stream Processing

The continuous processing of data as it is generated, record by record or in small windows, allowing for near-real-time analysis and action.
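The sketch below imitates this with Python generators: each value is handled as soon as it arrives and an updated result is emitted immediately. The `sensor_readings` source is hypothetical and finite here only so the example terminates; a real stream would be unbounded.

```python
def running_average(events):
    """Emit an updated average after each incoming value."""
    count, total = 0, 0.0
    for value in events:
        count += 1
        total += value
        yield total / count

def sensor_readings():
    # Hypothetical event source standing in for a live feed.
    yield from [10.0, 20.0, 30.0]

averages = list(running_average(sensor_readings()))
print(averages)
```

Only a running count and total are kept in memory, so the full history of events never needs to be stored.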

Big Data

Extremely large and complex data sets that require advanced tools and techniques to capture, store, manage, and analyze.

Data Modeling

The process of creating a conceptual representation of data structures and relationships to support business processes.

Data Quality

A measure of the condition of data based on factors such as accuracy, completeness, reliability, and timeliness.
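Two of these factors can be checked with very little code. The sketch below measures completeness (how often a field is filled in) and flags implausible values as a rough accuracy check. The sample records, the function names, and the 0 to 120 age range are assumptions made for illustration.

```python
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 3, "email": "c@example.com", "age": -5},
]

def completeness(rows, field):
    """Share of rows where `field` is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def invalid_ages(rows, low=0, high=120):
    """IDs of rows whose age falls outside the assumed plausible range."""
    return [r["id"] for r in rows if not low <= r["age"] <= high]

email_completeness = completeness(records, "email")
bad_age_ids = invalid_ages(records)
print(f"email completeness: {email_completeness:.0%}, bad ages: {bad_age_ids}")
```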

Apache Airflow

An open-source platform for programmatically authoring, scheduling, and monitoring data workflows, which are defined in Python as directed acyclic graphs (DAGs) of tasks.

Apache Kafka

A distributed event streaming platform used for building real-time data pipelines and streaming applications, capable of handling high-throughput data feeds.

Apache Spark

An open-source distributed computing engine for large-scale data processing, featuring in-memory computation for increased speed.

