Data engineers are responsible for building data systems that collect, manage, and convert raw data into usable information. They apply expert knowledge of data engineering and programming to improve data reliability and quality, combining raw information from different sources into consistent, machine-readable formats. Data engineers need strong technical skills, including a deep understanding of SQL database design and proficiency in a variety of programming languages such as Java and Python.
Data engineers work with various data professionals, such as software developers, database architects, data analysts, and data scientists, to ensure that an optimal, consistent data-delivery architecture is applied to all ongoing projects. They must have excellent problem-solving skills, good communication skills, and exceptional technical knowledge in a range of fields, including software engineering and programming languages.
Sample job description #1
As a Data Engineer, you will collaborate to build a robust, highly performant data platform using cutting-edge technologies. You will develop distributed services that process data in batch and in real time, with a focus on scalability, data quality, and business requirements.
Must-have skills
- Identify and implement improvements to our data ecosystem based on industry best practices
- Build, refactor and maintain data pipelines that ingest data from multiple sources
- Assemble large, complex sets of data that meet functional and non-functional business requirements
- Build ETL pipelines, and build and support the tools we use to monitor data hygiene and pipeline health (a minimal ETL sketch follows this list)
- Automate processes to reduce manual data entry
- Work with semi-structured and unstructured data
- Interact with data via APIs and know how to create API endpoints
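To make the ETL responsibilities above concrete, here is a minimal, illustrative sketch in Python of a pipeline that ingests data from two sources and normalizes it into one machine-readable format. The file path, API endpoint, field names, and SQLite target are all hypothetical placeholders, and the `requests` library is assumed to be installed; a production pipeline would add retries, logging, and schema validation.

```python
import csv
import sqlite3

import requests  # assumed third-party dependency


def extract_csv(path):
    """Read rows from a CSV export (one hypothetical source system)."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {"user_id": row["id"], "event": row["event_name"]}


def extract_api(url):
    """Pull records from a JSON API (another hypothetical source)."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    for item in resp.json():
        yield {"user_id": str(item["userId"]), "event": item["event"]}


def load(records, db_path="warehouse.db"):
    """Write normalized records into a single target table."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS events (user_id TEXT, event TEXT)")
    con.executemany("INSERT INTO events VALUES (:user_id, :event)", list(records))
    con.commit()
    con.close()


if __name__ == "__main__":
    load(extract_csv("export.csv"))                      # batch file source
    load(extract_api("https://api.example.com/events"))  # API source
```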
Requirements
- Bachelor’s degree in Computer Science, Software Engineering, or a related field, or an equivalent combination of industry-related professional experience and education
- Minimum of 3 years of experience with SQL and Python
- Experience with an Azure or Amazon (AWS) storage solution
- Experience building ETL pipelines using code or ETL platforms
- Experience with Jira and Confluence
- Working knowledge of relational database systems and concepts
Sample job description #2
We’re looking for a strong, technically sound Data Engineer who is interested in working within a startup-oriented environment while having the backing of a large company. If that’s you, please read on.
You will
- Work with cross-functional partners (data scientists, engineers, product managers) to understand and deliver on data needs.
- Champion code quality, reusability, scalability, and security, and help make strategic architecture decisions with the lead engineer.
- Design, build, and launch extremely efficient and reliable data pipelines to move data across a number of platforms, including data warehouses, online caches, and real-time systems.
- Build product-focused datasets and scalable, fault-tolerant pipelines.
- Build data quality checks and data anomaly detection, and optimize pipelines for compute and storage efficiency (see the sketch after this list).
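As an illustration of the data quality checks mentioned above, here is a minimal Python sketch that flags a daily row count as anomalous when it strays too far from recent history. The three-sigma threshold and the hard-coded counts are assumptions for the example; a real pipeline would pull these metrics from pipeline metadata and tune the threshold.

```python
from statistics import mean, stdev


def is_anomalous(history, today, sigmas=3.0):
    """Return True if today's row count falls outside the expected band."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sd = mean(history), stdev(history)
    if sd == 0:
        return today != mu  # flat history: any change is suspicious
    return abs(today - mu) > sigmas * sd


# Example: a week of daily row counts, then a suspicious drop.
counts = [10_120, 9_980, 10_340, 10_050, 10_210, 9_890, 10_400]
print(is_anomalous(counts, 4_200))  # True: likely a broken upstream feed
```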
Required experience and skills
- 3+ years of experience as a Data Engineer writing code to extract, ingest, process and store data within SQL, NoSQL and MPP databases like Snowflake
- Strong development experience with Python (or Scala/Java)
- Experience with complex SQL and with building batch and streaming pipelines on the Apache Spark framework (a PySpark sketch follows this list).
- Knowledge of schema design and dimensional modeling.
- Experience with data quality checks, data validation and data anomaly detection.
- Experience with workflow management engines like Airflow
- Experience with Git, CI/CD pipelines, Docker, Kubernetes
- Experience architecting solutions on AWS or similar public clouds
- Experience with offline and online feature engineering solutions for Machine Learning is a plus.
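To ground the Spark requirement, here is a hedged sketch of a small batch pipeline in PySpark. The input path, column names, and output location are hypothetical; the same shape applies whether the data lives in S3, Snowflake, or elsewhere, and a streaming variant would swap `read` for `readStream`.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_rollup").getOrCreate()

# Extract: read raw events (hypothetical Parquet path).
events = spark.read.parquet("s3://example-bucket/raw/events/")

# Transform: deduplicate, then aggregate into a product-focused daily dataset.
daily = (
    events.dropDuplicates(["event_id"])
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "product_id")
    .agg(
        F.countDistinct("user_id").alias("daily_active_users"),
        F.count("*").alias("event_count"),
    )
)

# Load: write back partitioned by date so downstream reads stay cheap.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/marts/daily_product_activity/"
)
```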
Sample job description #3
As a data engineer, you will extend and maintain the data pipelines that feed our ever-growing data lake. Join a small, autonomous team responsible for this data lake and its ingress and egress pipelines. Through this data lake and its pipelines, you will provide critically important data to internal business analysts, data scientists, leadership, and content partners in a multi-billion-dollar industry.
Who is the role reporting to? Engineering Manager
Requirements
- BS/MS in computer science or equivalent experience in data engineering.
- You love working with different types of data, e.g., content metadata and viewership metrics.
- You love to solve difficult and interesting problems using data from various systems.
- You have experience developing and maintaining software in Python.
- You have experience with data pipelines that process large data sets via streams and/or batches.
- You have experience in building services, capable of handling large amounts of data.
- You have experience building and maintaining tests (unit, integration, etc.) that provide the necessary quality checks; TDD experience is a plus (see the testing sketch after this list).
- You have experience with modern persistence stores, primarily SQL; NoSQL experience is a plus.
- You embrace best practices via pair programming, constructive code reviews, and thorough testing.
- You thrive in an environment with rapid iterations on platform features.
- You’re a team player and work well in a highly collaborative environment, which includes staff in remote locations.
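Here is a small, TDD-style example of the kind of pipeline test mentioned above, written with pytest. The `normalize_event` transform and its fields are hypothetical, invented purely to show the pattern of testing a pure transformation in isolation.

```python
import pytest


def normalize_event(raw):
    """Transform under test: coerce types and standardize field names."""
    return {
        "user_id": str(raw["userId"]),
        "event": raw["event"].strip().lower(),
    }


def test_normalize_event_coerces_and_cleans():
    raw = {"userId": 42, "event": "  Play  "}
    assert normalize_event(raw) == {"user_id": "42", "event": "play"}


def test_normalize_event_rejects_missing_fields():
    with pytest.raises(KeyError):
        normalize_event({"event": "play"})  # no userId present
```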
Details
As a member of our team, you will:
- Be responsible for designing, building and supporting components that compose the data lake and its pipelines.
- Help build and extend our data lake by designing and implementing: data pipeline libraries and systems, internal analytics tooling / dashboards, and monitoring and alerting dashboards.
- Provide support for the data pipelines including after-hours support on a rotational basis.
- Work in a collaborative environment with other data engineers, data scientists, and software engineers to achieve important goals for the company.
Average salary and compensation
The average salary for a data engineer is $112,300 per year in the United States. Salary ranges can vary depending on education, certifications, additional skills, and the number of years of experience.
Location | Salary Low | Salary High |
---|---|---|
Phoenix, Arizona | $118,250 | $144,500 |
Los Angeles, California | $133,400 | $163,050 |
Denver, Colorado | $111,150 | $135,850 |
Washington, DC | $135,400 | $165,500 |
Miami, Florida | $110,650 | $135,250 |
Orlando, Florida | $102,050 | $124,750 |
Tampa, Florida | $103,050 | $126,000 |
Atlanta, Georgia | $108,150 | $132,150 |
Chicago, Illinois | $124,300 | $151,900 |
Boston, Massachusetts | $134,400 | $164,250 |
Minneapolis-St. Paul, Minnesota | $107,100 | $130,900 |
New York City, New York | $141,450 | $172,900 |
Philadelphia, Pennsylvania | $115,200 | $140,800 |
Dallas, Texas | $112,150 | $137,100 |
Houston, Texas | $111,650 | $136,500 |
Seattle, Washington | $129,300 | $158,100 |
National Average | $101,050 | $123,500 |
Sample interview questions
- Which ETL tools are you familiar with?
- What skills are important for a data engineer?
- What data engineering platforms and software are you familiar with?
- Which computer languages do you have experience using?
- How do you create reliable data pipelines?
- What is the difference between structured and unstructured data?
- How would you deploy a big data solution?
- Have you engineered a distributed system? How did you engineer it?
- Have you used data modeling?
- Which frameworks and applications are essential for a data engineer?
- Are you more database-centric or pipeline-centric?
- How would you validate a data migration from one database to another?
- What are the pros and cons of cloud computing?
- How would you prepare to develop a new product?
- Which Python libraries would you use for efficient data processing?
- How would you deal with duplicate data points in an SQL query?
- How would you plan to add more capacity to the data processing architecture to accommodate an expected increase in data volume?
- What is the difference between relational and non-relational databases?
- Can you explain the components of a Hadoop application?