We are looking for passionate and highly collaborative SREs to join our international Infrastructure team, where you will work on interesting and challenging projects and help us build and maintain reliable end-to-end infrastructure solutions.
The SRE team at OLX Group is responsible for reliability, scalability, availability, latency, performance, efficiency, monitoring, alerting, emergency response, and capacity planning of the OLX platform that caters to multiple countries. We are developing tools and optimizing strategies that make the end-user experience smooth and stable.
- Build and maintain end-to-end infrastructure solutions.
- Write and maintain Infrastructure as Code (IaC).
- Develop new tools/platforms for deployment flows, monitoring/dashboarding, alerting, incident management, observability, automation, security, horizontal requirements, and much more.
- Manage and improve the whole lifecycle of microservices - deployments, architecture, operations, security, performance tuning, etc.
- Diagnose, resolve, and prevent production issues. Perform RCAs/postmortems and automate the process wherever required.
- Introduce new technologies and tools that could help in faster and more efficient development, keeping reliability, scalability, resiliency, and availability in mind.
- Architect solutions and design robust pipelines (including CI/CD and Data pipelines).
- Provide on-call support on a rotation basis.
- Write proper documentation and publish system design / architectural blueprints. Promote best practices and standards. Do regular code reviews.
- Work on infrastructure cost optimization and sustainability.
- Work towards effective SLIs/SLOs.
- Be open to exploring and adapting to new technologies/tools/languages/trends.
- B.Tech/B.E in Computer Science or a related technical discipline with 4+ years of relevant experience.
- Expertise in building reliable, scalable, highly available, and resilient API-driven platforms, preferably in Java, Python, Node.js, or Golang.
- Strong grasp of Data Structures, Algorithms, and Design Patterns.
- Understanding of Unix/Linux operating systems, system administration, and the networking stack (TCP/IP, NAT, DNS, SSL, iptables, routing, network topologies and protocols).
- Experience in designing, analyzing, and troubleshooting large-scale distributed systems and cloud-based architectures.
- Extensive knowledge of cloud infrastructure, operations, networking, resources, and services is a must; experience with Amazon Web Services (AWS) or Google Cloud Platform (GCP) is required.
- Working knowledge of containers (Docker or rkt) and orchestration systems like Kubernetes is mandatory.
- Proven track record of working with microservice-based architecture in production environments, with a solid understanding of microservice design patterns such as Circuit Breaker and Aggregator.
- Good knowledge of monitoring, alerting, and dashboarding tools like Prometheus, Sensu, Grafana, New Relic, Datadog, AppDynamics, Instana, PagerDuty, VictorOps, OpsGenie, etc.
- Experience writing Infrastructure as Code (IaC) using Terraform.
- Strong hands-on experience building CI/CD automation pipelines using platforms like GitLab, GitHub, Jenkins, Spinnaker, etc.
- Demonstrated experience of working towards effective SLIs/SLOs.
- Ability to perform application performance tuning and reason about security and process interaction.
- Solid understanding of version control systems (Git, SVN, etc.).
- [Bonus] Experience with the ELK Stack (Elasticsearch, Logstash, Beats, and Kibana).
- [Bonus] Hands-on experience with distributed systems like Kafka, RabbitMQ, Redis, Aerospike, Airflow, ZooKeeper, Solr, Elasticsearch, etc.
- [Bonus] Decent experience with at least one of the Relational Databases like MySQL or PostgreSQL and at least one of the NoSQL Databases like MongoDB, Cassandra, DynamoDB, etc. Should know about database clustering, management, upgrade process, disaster recovery mechanisms, performance, scalability, high availability, and reliability.
- [Bonus] Working knowledge of Helm and Helmfile/Helmsman.
- [Bonus] Experience in developing Custom Controllers and Dynamic Admission Controllers in Kubernetes.
- [Bonus] Experience with OpenTelemetry and related observability standards (OpenTracing, OpenMetrics, and OpenCensus).
- Fluent in both written and spoken English.
- Competitive compensation and additional benefits/perks.
- Opportunity to contribute to the global OLX Group.
- Collaborative learning and an abundance of learning resources to help you get better every day.
- Company mobile phone.
- Laptop of your choice: MacBook Pro, Windows, or Linux.
OLX People is an end-to-end, tech-based Staffing & Recruitment Services Provider: a one-stop platform that connects employers, consultants, and job seekers. Our goal is to make hiring quick, easy, and convenient for everyone involved.
At OLX we improve people’s lives by bringing them together for win-win exchanges. Sellers win. Buyers win. Communities win. OLX wins. And most of all, you win, by being part of something special in an open culture that encourages initiative, growth, and entrepreneurial spirit.
We believe in creating a richer world by empowering people. Every day millions of people across the globe use OLX to buy and sell goods. We serve over 350 million unique users every month. We generate over 11 billion monthly page views in more than 40
We have a clear ambition – to shape the future of trade to unlock the hidden value in everything. This inspiring purpose has helped us build one of the world’s leading Internet companies. This combination of business and purpose attracts a certain type of person. If you’ve got an entrepreneurial drive and think you can use that drive to contribute to our purpose, then we’d love to empower you to be successful. Working at OLX means being part of an entrepreneurial culture that is open to new ideas, new ambition and new opportunities.
You will be joining a fast-growing organisation that empowers you to be successful and improve the lives of others.