Roadmap 2024 - Data engineering in Spanish
One more repository with basic concepts, technical challenges and resources on data engineering in Spanish?
Would you like to contribute to the repository? Visit the contribution guide
Note: this learning route reflects my personal judgment and is meant to help anyone interested in data engineering study with free and open material in Spanish that I found on the internet. It is not a definitive guide or a course; it is a list of resources that can be improved over time with contributions from the community.
Data engineering books in English
Design Patterns for DE in English
Programming
Basics
We start by understanding the fundamental concepts of programming and logic. This section can be studied in parallel with the programming language of your choice.
- Course: Platzi Basic Programming
- Videos: Introduction to Algorithms and Programming by TodoCode
- Videos: Pseudocode Exercises by TodoCode
- Videos: Command Line by Datademia
- Videos: Bash scripting by Fazt
- Reading: Introduction to the Linux Command Line and Shell from Microsoft Learn
Programming language
I recommend starting with Python because of its friendly learning curve and its prevalence in today's industry. However, it is important to note that data processing can also be done with R, Java, Scala, Julia, among others.
- Videos: Python from 0 by PildorasInformáticas
- Course: Scientific Computing with Python from FreeCodeCamp
- Course: College Algebra with Python from FreeCodeCamp
- Course: Harvard CS50's Introduction to Programming with Python subtitled by FreeCodeCamp
- Course: Intermediate Python, subtitled, from FreeCodeCamp
- Course: Pandas by Kaggle
- Videos: Regular Expressions by Ada Lovecode
- Video: Principles of Object-Oriented Programming by BettaTech
- Videos: Object Oriented Programming explained with Minecraft by Absolute
- Course: Julia for people in a hurry by Miguel Raz
Excel
Version control with Git
Version control is valuable not only when working in teams: it also lets us track, understand and manage the changes made to a project, which keeps development efficient and collaborative.
- Video: What is version control and why is it so important for programming? by Datademia
- Course: Git and Github by MoureDev
- Videos: Git and Github by TodoCode
- Reading: Use Git Correctly by Atlassian
- Game: Learn Git Branching
More tools
- Notebooks: Google Colab, Jupyter or Deepnote
- Text editors: VSCode, Spyder or Google IDX
Databases
Basics
At this stage it is time to learn about databases. The choice of database manager is up to you, although I personally recommend PostgreSQL for structured data and MongoDB for unstructured data. There are many other options, such as MySQL and SQLite.
- Videos: Introduction to Databases by TodoCode
- Reading: Differences between DDL, DML and DCL by TodoPostgreSQL
- Video: Stored Procedures #1 by Héctor de León
- Video: Stored Procedures #2 by Héctor de León
- Video: MongoDB by Fazt
- Videos: MongoDB by MitoCode
SQL
You will also learn SQL, a query language for managing and manipulating relational databases.
- Videos: SQL from Data Engineering LATAM
- Course: Intro to SQL by Kaggle
- Course: Advanced SQL by Kaggle
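As a small illustration of what querying a relational database with SQL looks like, here is a minimal sketch using Python's built-in `sqlite3` module. The `courses` table, its columns and the sample rows are hypothetical and chosen only for this example; the same statements would need minor adjustments for PostgreSQL or MySQL.

```python
import sqlite3


def total_hours_by_topic():
    """Create a tiny table, load a few rows and aggregate them with SQL."""
    # In-memory database, so nothing is written to disk.
    conn = sqlite3.connect(":memory:")

    # DDL: define the structure (hypothetical table and columns).
    conn.execute(
        "CREATE TABLE courses (id INTEGER PRIMARY KEY, topic TEXT, hours REAL)"
    )

    # DML: insert sample rows using parameter placeholders.
    conn.executemany(
        "INSERT INTO courses (topic, hours) VALUES (?, ?)",
        [("sql", 4.0), ("python", 10.0), ("sql", 6.0)],
    )

    # Query: total hours per topic, sorted alphabetically.
    rows = conn.execute(
        "SELECT topic, SUM(hours) FROM courses GROUP BY topic ORDER BY topic"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(total_hours_by_topic())
```

SQLite is used here only because it ships with Python and needs no server; the resources above cover the language in depth.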
Design
Now we continue with more advanced concepts that will help us design databases, data lakes, data warehouses, schemas, etc.
- Video: When to use SQL and when to use NoSQL? by Hector de León
- Video: How are NoSQL databases modeled? from HelloWorld
- Reading: Graph-Oriented Databases by Oracle
- Video: Graph Databases, Fundamentals and Practice by Datahack
Big Data
The next thing is to understand some concepts of Big Data. In addition, it is interesting to acquire basic knowledge about artificial intelligence, business intelligence and data analysis without the need to delve too deeply.
Basics
- Video: Big Data for Dummies by Datahack
- Reading: Big Data: What is it and how does it help my business? from Salesforce
- Certification: Design and program IoT solutions with the use of Big Data from Universidad del Rosario
- Certification: Big Data from University of California San Diego
- Video: Big Data and Privacy by Databits
- Videos: Data Governance by Smart Data
- Video: How to Get Started with Data Governance without Breaking the Budget by Software Guru
Analytics and data exploration
- Certification: Professional Fundamentals of Data Analytics, from Microsoft and LinkedIn
- Certification: Google Data Analytics Professional Certificate
- Certification: IBM Data Analyst Professional Certificate
- Course: Data Analysis with Python from FreeCodeCamp
- Video: Storytelling: How to turn your content into a story? by Coderhouse
Statistics
Artificial intelligence
- Course: Machine Learning with Python from FreeCodeCamp
- Channel: LearnIA with Ligdi Gonzalez
- Videos: Learn Artificial Intelligence from Dot CSV
- Video: How to use ChatGPT in data engineering by Datalytics
- Course: Artificial Intelligence subtitled from Columbia University
Business intelligence
- Videos: Google Business Intelligence Certificate subtitled from Google Career
- Videos: Business Intelligence for Everyone! by PEALCALA
DataViz
Data Processing
This section is the heart of data engineering: we will see what data pipelines are, what an ETL is, what orchestrators do, and more. I also leave a list of key concepts that I will update with their respective resources in the future. If you want to learn them in detail, you can look them up in the books uploaded to the repository.
- Channel: CodinEric
- Channel: Data Engineering LATAM
- Channel: Datademia
- Channel: Datalytics
- Blog: Start (English)
- DataWars Learning Platform
ETL and Data Pipelines
- Video: Data Engineering: Journey to the Heart of Data Projects by RockingData
- Video: How to become a real Data Engineer? by Databits
- Videos: Data Preprocessing in Python by Rocio Chavez
- Videos: Data Preprocessing in R by Rocio Chavez
- Video: A/B Testing: Data, Not Opinions from SantanDev
- Incremental loads
- Messaging queues
- Cron Expressions
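To make the idea of an incremental load concrete, here is a minimal sketch in plain Python. It assumes each source row carries an `updated_at` timestamp and an `id` key; those field names, the in-memory `target` dictionary and the `incremental_load` function are hypothetical and stand in for a real source system and destination table. In practice, a run like this is often triggered by a scheduler using a cron expression such as `0 2 * * *` (every day at 02:00).

```python
from datetime import datetime


def incremental_load(source_rows, target, last_loaded_at):
    """Copy only rows changed after the watermark and return the new watermark."""
    # Select rows that changed since the previous run.
    new_rows = [row for row in source_rows if row["updated_at"] > last_loaded_at]

    # Upsert by primary key so re-processed rows overwrite older versions.
    for row in new_rows:
        target[row["id"]] = row

    # Advance the watermark only if something was loaded.
    if new_rows:
        last_loaded_at = max(row["updated_at"] for row in new_rows)
    return last_loaded_at


# Hypothetical source data.
source = [
    {"id": 1, "value": "a", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "value": "b", "updated_at": datetime(2024, 1, 2)},
]
target = {}

# First run: everything is new, so both rows are loaded.
watermark = incremental_load(source, target, datetime.min)

# A new row arrives; the second run loads only that row.
source.append({"id": 3, "value": "c", "updated_at": datetime(2024, 1, 3)})
watermark = incremental_load(source, target, watermark)
```

The stored watermark is what distinguishes this from a full load: each run reads only what changed since the previous one, at the cost of having to persist that value between runs.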
❄️ Advanced databases
- Relational model
- Dimensional model
- Facts and dimensions
- Data lake, data mart, data warehouse and data cube
- Column-oriented and row-oriented storage
- Star and snowflake schemas
- Schema-on-read and schema-on-write
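As an illustration of facts, dimensions and a star schema, the following sketch builds one fact table surrounded by two dimension tables using Python's built-in `sqlite3` module. All table names, columns and values (`fact_sales`, `dim_product`, `dim_date`) are hypothetical; a real warehouse would use its own engine and a much richer model.

```python
import sqlite3


def build_star_schema():
    """Create a tiny star schema in memory: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimensions hold descriptive attributes.
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);

        -- The fact table holds measures plus keys pointing at the dimensions.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );

        INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
        INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
        INSERT INTO fact_sales VALUES (1, 10, 5.0), (1, 11, 7.0), (2, 11, 20.0);
        """
    )
    return conn


def sales_by_category_and_year(conn):
    """Join the fact table to its dimensions and aggregate the measure."""
    return conn.execute(
        """
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
        """
    ).fetchall()


if __name__ == "__main__":
    print(sales_by_category_and_year(build_star_schema()))
```

A snowflake schema would go one step further and normalize the dimensions themselves, for example by moving `category` into its own table referenced from `dim_product`.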
Orchestrators
- Videos: Airflow from Data Engineering LATAM
- Video: Automating ideas with Apache Airflow - Yesi Díaz from Software Guru
- Videos: Pentaho Spoon by LEARNING-BI
- Videos: Luigi subtitled by Seattle Data Guy
- Reading: Azure Data Factory by Microsoft
Architectures
- Batch data processing
- Real-time processing or streaming
- Lambda and kappa architectures
- Reading: Key differences between OLAP and OLTP by AWS
- Video: Build ETL in batch and streaming with Spark by Databits
- Reading: Comparison of Virtual Machines and Containers by Atlassian
- Videos: Docker by Pelado Nerd
- Videos: Kubernetes by Pelado Nerd
- Reading: What is a distributed system? by Atlassian
- Videos: Spark from Data Engineering LATAM
- Video: Infrastructure as code for data engineering by Spark Mexico
- Videos: Apache Spark by NullSafe Architect
- Videos: Apache Kafka by NullSafe Architect
Testing
- Video: Great Expectations: Validate Data Pipelines like a Pro by CodingEric at PyConAr 2020
- Video: ETL Testing and its Automation with Python by Patricio Miner at #QSConf 2023
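To give a feel for what testing data in a pipeline means, here is a minimal sketch of row-level validation in plain Python. It deliberately does not use the Great Expectations API; the `validate_rows` function, its parameters and the sample columns are hypothetical and only illustrate the general idea of declaring expectations and collecting violations.

```python
def validate_rows(rows, required, non_negative):
    """Return a list of human-readable problems found in the rows."""
    errors = []
    for index, row in enumerate(rows):
        # Expectation 1: required columns must be present and not null.
        for column in required:
            if row.get(column) is None:
                errors.append(f"row {index}: missing {column}")
        # Expectation 2: numeric columns must not be negative.
        for column in non_negative:
            value = row.get(column)
            if value is not None and value < 0:
                errors.append(f"row {index}: negative {column}")
    return errors


# Hypothetical batch produced by an extraction step.
batch = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 3.5},
    {"id": 3, "amount": -2.0},
]

problems = validate_rows(batch, required=["id", "amount"], non_negative=["amount"])
```

A pipeline could refuse to load the batch whenever `problems` is not empty, or route the offending rows to a quarantine table for review.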
Cloud
It is useful to have knowledge of cloud computing. At this point, I would recommend considering preparing for official certifications. Although these exams usually have a cost, the best-known providers in the industry offer free, official preparation resources.
☁️ Cloud Basics
- Video: Cloud Computing Fundamentals by Datahack
- Reading: Discover the advantages and disadvantages of the cloud by Platzi
- Reading: Architecture for Big Data in the Cloud by Platzi
Official certifications
- Google Cloud Data Engineering
- Videos: Google Cloud (GCP) from Learning Big Data
- Microsoft Azure Data Engineering
- Videos: Azure by Data Engineering LATAM
- Videos: Azure Certifications from Learning Big Data
- Data engineering with Microsoft Azure Fabric
- AWS Data Engineering
- Videos: AWS from Data Engineering LATAM
Job Search
Finally, I leave you some readings and videos that offer advice and experiences related to the job search in the systems field. Later, technical challenges and other resources related to the topic will be added.
Tips
- Video: How to get your first job in data engineering? from Spark Mexico
- Videos: Work Tips for the IT world by TodoCode
- Videos: Essentials for getting started in the world of systems by Maxi Programa
- Thread: Tips for completing your LinkedIn profile by @natayadev
- Thread: Tips for getting a remote job in IT from @natayadev
- Thread: How to create a neat and readable CV by @iamdoomling
- Thread: I leave you these tips to survive interviews with human resources from @iamdoomling
- Video: Programming in companies, startups or freelance. What is better? by @iamdoomling
- Video: I finished the programming bootcamp. Now what? by @iamdoomling
- Video: Work as a contractor from Argentina by @iamdoomling
- Podcast: DevRock by Jonatan Ariste
Technical challenges
- (2023) Repository: MoureDev Community Code Challenges
- (2024) Repository: MoureDev Community Programming Challenges Roadmap
In progress
If you found this repository useful, give me a star