✨ About The Role
- The Web Scraping Engineer will be responsible for designing and implementing the architecture of a large-scale crawling system.
- The role involves maintaining the components of the data acquisition infrastructure, which includes building new crawlers and keeping existing ones working.
- The engineer will develop tools to facilitate scraping at scale and monitor the health of crawlers to ensure data quality.
- Collaboration with product and business teams is necessary to understand and anticipate requirements for data gathering systems.
- The position spans every stage of the scraping pipeline, from building and maintaining spiders to monitoring their health and performance.
⚡ Requirements
- The ideal candidate will have over 3 years of experience with Python, particularly in data wrangling and cleaning.
- A strong background in web crawling and scraping at scale, with experience managing over 100 spiders, is essential.
- Familiarity with scraping libraries such as BeautifulSoup and Selenium, as well as with monitoring tools, is highly recommended.
- The candidate should have experience bypassing bot-detection mechanisms and protecting web scrapers against common issues like site bans and IP leaks.
- Experience with cloud environments like GCP or AWS, as well as containerization tools like Docker, is important for this role.