As a Hadoop Site Reliability Engineer (SRE), you'll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure, and reducing work through automation. You'll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment, you'll take the lead on relevant projects, backed by an organization that provides the support and mentorship you need to learn and grow. As a Hadoop SRE, you'll be focused on running better production applications and systems.
Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents.
Identify application patterns and analytics in support of better service level objectives.
Design self-healing and resiliency patterns.
Act as the subject matter expert in the area of Hadoop Site Reliability Engineering.
Assess the current landscape and book of work, and partner with various teams to identify key areas for infrastructure automation, configuration management, monitoring, alerting, etc.
Continuously design and improve processes for detecting and responding to production service outages, and build preventive solutions.
Produce and improve on availability and performance metrics for services.
Build internal knowledge base to educate partners and support teams.
Participate in 24x7 support coverage as needed.
Bachelor's degree or equivalent experience.
5+ years of hands-on working experience with the Cloudera Hadoop infrastructure stack (e.g., HDFS, HBase, Impala, Spark, YARN).
Proven track record of designing highly available platforms and services supporting various types of workloads.
Experience in designing fail-over processes and solutions.
Strong programming and scripting skills - Java, shell scripts, Python, etc.
Analytical thinker able to assess various aspects to methodically arrive at a solution.
Hands-on experience in gathering performance metrics, troubleshooting, tuning, monitoring, etc.
Expertise with automation, including the latest technologies in the space.
Good understanding of security concepts and best practices.
Excellent written and verbal communication skills.
Good team player who shares knowledge, cross-trains other team members, and shows interest in learning new technologies and products.
Experience managing vendor interactions for troubleshooting sessions, enhancement requests
JPMorgan Chase & Co., one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as any mental health or physical disability needs.
Equal Opportunity Employer/Disability/Veterans