Full Stack Engineer (Web Crawling and Automation)
Full-time
Remote
Time is our most precious resource. Recs is a recommendation layer built to help people spend theirs on what's actually worth it. We cut through the noise so people can find what fits their lives, make better choices, and avoid wasting time on things that don't deliver.
About the role
We’re looking for a Senior Full Stack Engineer focused on web crawling and automation to build API-driven, end-to-end data acquisition systems that interact with real-world websites using appropriate, target-specific interaction strategies.
Our crawling systems power other internal services via on-demand REST APIs. They handle a wide range of targets, including openly accessible websites, login-protected platforms, and paywalled content. Depending on the site, crawlers may use anything from high-throughput programmatic access to more stateful or behavior-aware automation when required.
This role goes beyond basic browser automation. You’ll own the full lifecycle of crawled data, from interaction and extraction to parsing, normalization, indexing, and API delivery, ensuring the data is reliable, structured, and usable by downstream systems.
What you’ll do
Before diving into the technical responsibilities, here are the traits we value most:
Candor: You communicate directly and honestly in service of better outcomes.
Conscientiousness: You take ownership, respect teammates, and build systems others can rely on.
First-principles thinking: You question assumptions and make decisions grounded in evidence.
In this role, you will:
Design and build end-to-end web crawling systems exposed as REST APIs
Implement browser-based automation using tools such as Playwright, Puppeteer, or similar
Build crawlers that adapt their interaction model based on the target, which may include:
Programmatic, high-throughput access where appropriate
Session-aware and stateful navigation flows
Realistic timing or behavior-aware interaction when required
Handle sites that require:
Authentication and login workflows
Persistent sessions and identity management
JavaScript-heavy or dynamically rendered content
Develop robust parsing and extraction pipelines to convert raw web data into structured formats
Design and maintain data normalization, enrichment, and validation workflows
Implement indexing strategies to make crawled data searchable, performant, and reliable
Build backend services and APIs that expose crawled and indexed data to internal consumers
Monitor, debug, and improve crawl correctness, stability, and cost efficiency
Collaborate with other engineers to integrate crawling pipelines into larger product workflows
Contribute to CI/CD pipelines, observability, and operational tooling
Who you are
You think of crawling as a system, not a script. You understand that different targets require different approaches and enjoy reasoning about trade-offs between speed, reliability, realism, and cost.
You’re comfortable debugging non-deterministic failures, working with imperfect or inconsistent data, and owning systems end-to-end, from first request to final API response.
You care about data quality and long-term maintainability. You think about schemas, indexing, and downstream consumers as part of the core problem, not an afterthought.
Required qualifications
Strong professional experience with JavaScript/TypeScript and/or Python
Proven experience building production-grade crawling or browser automation systems
Hands-on experience with Playwright, Puppeteer, Selenium, or similar
Experience designing API-driven crawling services
Strong understanding of:
Browser behavior and JavaScript execution
Sessions, cookies, headers, and authentication flows
Experience building parsing, normalization, and data processing pipelines
Backend experience building services and APIs (REST and/or GraphQL)
Experience working with relational and/or NoSQL databases
Proficiency with Git and collaborative development workflows
Nice-to-have skills
Experience designing adaptive interaction strategies for complex or sensitive websites
Experience crawling large or complex platforms (dynamic, authenticated, or paywalled)
Search and indexing systems (Elasticsearch, OpenSearch, or similar)
Distributed or queue-based processing systems
Experience with Rust for performance-critical components
Containerization and cloud infrastructure (Docker, AWS, GCP, or similar)
Observability tooling (logging, metrics, tracing)
CI/CD pipeline experience
Experience integrating AI/ML services into extraction, enrichment, or classification workflows
What we offer
A high-trust, remote-first engineering culture
End-to-end ownership of complex, business-critical systems
A team that values clear thinking, technical rigor, and direct communication
Room to influence architecture and technical direction
Competitive compensation based on experience and impact