Data Engineer
At The National Archives, we are more than custodians of the past – we are pioneers in digital archiving, ensuring the UK's public records are robust, connected, and accessible for generations to come. As a Data Engineer in our small, dynamic Digital Archiving team, you'll help transform how archival data is managed, structured, and enriched, unlocking its value for public use and discovery.
This role is ideal for someone who enjoys designing and building efficient data workflows, developing tools to automate data processing, and collaborating with product teams and specialists to understand data needs and deliver practical solutions. You will bring strong programming skills and a good understanding of data modelling, transformation, and integration across a variety of formats and technologies.
You will work in a supportive, forward-looking team that values transparency, curiosity, and collaboration. You'll have opportunities to develop your skills in existing and emerging technologies that enhance public access to archives and promote the re-use of data in meaningful ways.
This role maps to the DDaT Data Engineer and Data Analyst roles with elements of the Data Scientist role.
Data processing and pipelines:
- Design and develop scalable, repeatable data pipelines (ETL/ELT) to process, transform and load archival data.
- Create scripts and tools for data cleansing, validation, standardisation, enrichment, and transformation.
- Automate routine data tasks to improve efficiency, accuracy, and consistency.
- Identify and manage connections between datasets, including those external to The National Archives (TNA).
- Research and implement tools for entity extraction and semantic tagging to improve the usability and findability of archival data.
Data modelling and integration:
- Develop and document schemas, ontologies, and other models that represent and structure archival data.
- Connect and integrate data from different internal and external sources, including structured and semi-structured formats (XML, CSV, JSON, RDF).
- Identify patterns and relationships within data to support better discovery and reuse.
Technical advice and collaboration:
- Develop excellent relationships with product teams, advocate for data-centric approaches to software development and help solve data problems.
- Maintain effective communication with external technical partners (e.g. suppliers), representing The National Archives to resolve complex data issues, clarify requirements, and ensure smooth delivery of digital archiving workflows.
- Work closely with product teams, archivists, and developers to support data-centric solutions.
- Provide advice and input on data engineering approaches within the team.
- Share knowledge on data standards, validation rules, and transformation practices.
Quality, openness and growth:
- Work in the open and share code and documentation internally (and externally where appropriate).
- Participate in code reviews, technical discussions, and knowledge sharing.
- Contribute to building a data culture focused on reuse, quality, and continuous improvement.
Working conditions:
- Normal office environment
- Display Screen Equipment user
Person specification:
Essential criteria:
- Data manipulation: Strong experience of accurate, reliable, large-scale analysis and manipulation of complex datasets using relevant programming languages and tools (e.g. Python, Shell scripting, SQL, XSLT).
- Data formats: Solid understanding of working with different data formats (e.g. XML, CSV, JSON, RDF).
- Database technologies: Practical knowledge of database technologies (e.g. RDBMS, NoSQL, graph databases, linked data stores).
- Analysis and problem solving: Excellent analytical skills, with a structured and proactive approach to solving technical problems. Able to apply a range of techniques to capture, document and communicate requirements, issues, designs and solutions.
- Communication and relationships: Strong relationship building and communication skills, with an excellent user focus. Able to work collaboratively in multidisciplinary teams and explain technical issues clearly to non-technical colleagues.
- Prioritisation and organisation: Able to work accurately and in an organised way, with strong attention to detail, both independently and as a member of a project team. Able to prioritise competing tasks and deliver high-quality work to agreed deadlines.
Desirable criteria:
- Awareness of the value and meaning of archival data and the ethical responsibilities that come with working with public records.
- Understanding of how to handle probabilistic, messy, or uncertain data.
- Experience in Agile development environments.
- Familiarity with archival principles or metadata standards (e.g. EAD, Dublin Core).
- Experience with semantic web technologies and entity extraction tools.