Engineer, Site Reliability, Monitoring, Observability, Logging
Royal Caribbean Group

Workplace: Metro Manila, Manila
Salary: Agreement
Work form: Full time
Posting Date: 16/11/2025
Deadline: 12/03/2022

This job has expired.


You will focus on ensuring that technology services scale to meet performance requirements and agreements and can sustain current and future demands effectively.

Position Summary:

The Enterprise Monitoring and Logging team provides the tools and strategy to drive holistic end-to-end enterprise technology monitoring across a combination of people, processes, and tools. The team defines and implements a comprehensive, standardized set of monitoring tools for use across the enterprise. You will help a large enterprise roll out and use application performance monitoring tools to enhance Royal Caribbean Group's (RCG) ability to identify production issues early and drive them through to completion. The team democratizes the monitoring data and federates dashboard configuration and events management. We are implementing a modern automation platform using artificial intelligence and machine learning to facilitate automated issue resolution, faster triage, and application and infrastructure management.

Responsibilities include:

  • Implementing Application Performance Monitoring (APM) with AppDynamics on multiple technology stacks, on-premises and cloud (e.g. OpenShift, AWS)
  • Analyzing and troubleshooting application performance and scalability issues, and recommending optimizations
  • Reviewing application architecture and design
  • Creating dashboards in AppDynamics for various technology-specific monitoring metrics (OpenShift, microservices, Kubernetes, JVM, Apache, CloudWatch metrics, etc.)
  • Researching and monitoring current end-user response time as well as application and system performance and availability
  • Planning performance improvement
  • Reporting on demand and performance of technology infrastructure, applications, shared platforms, and cloud, with the ability to assess and summarize application performance and volumetrics trends and make recommendations

Essential Duties and Responsibilities:

  • Work closely with application teams to onboard to AppDynamics and help to uplift their monitoring configuration.
  • Debug and triage challenges encountered by application teams integrating with AppDynamics.
  • Support DevOps and partner with application developers to find the best way to optimize application performance.
  • Work with various teams on learning how to best utilize Application Performance Monitoring to uplift both their development and production operations.
  • Identify, resolve, and call out application performance bottlenecks and challenges.
  • Perform product and OS upgrades for the platform during maintenance windows.
  • Help to document, streamline, or otherwise automate processes required for onboarding applications, databases, and other middleware to AppDynamics.
  • Work with Analytics teams to extract AppDynamics data for consumption and use in anomaly detection and other types of analytics.
  • Test, package, and implement AppDynamics agents.
  • Develop automation solutions for AppDynamics installation and configuration.
  • Develop automation solutions for reporting monitoring metrics.
  • Provide right-sizing and optimization for microservices, APIs, and pod (OpenShift) configurations.
  • Consult with business partners to resolve capacity and performance issues.
  • Instrument and maintain the APM tool (AppDynamics).
  • Support multiple business services and solutions to identify service levels and capacity requirements.
  • Anticipate and interpret business requirements to generate accurate demand trends and forecasts.
  • Ensure that any applications deployed will properly scale and meet expected SLOs and SLAs for performance, availability, scalability, and stability.
  • Perform APM tuning of applications in development and of critical production applications, and provide third-level performance support.
  • Provide evaluation of application performance and capacity on system resources used to sustain business application volume processing.
  • Work closely with a team of Performance Engineers to orchestrate, conduct, and participate in line-of-business performance testing analysis.
  • Utilize diagnostic and monitoring tools to measure, detect, isolate, and resolve performance issues found during application development performance testing, including measuring, monitoring, and capturing required infrastructure and application performance metrics, logs, and reports.
  • Identify application performance and scalability risks, and mitigations for those risks, in a timely manner.
  • Monitor and report on existing applications/systems and infrastructure.
  • Create an automated solution for data collection and reporting on the performance and availability of existing applications.
  • Create and maintain synthetic monitoring scripts using ThousandEyes to support availability monitoring and alerting.
  • Manage APM tools (AppDynamics): plan, implement, support, and maintain APM tools, including instrumentation, configuration, and creation of dashboards and reports, and provide deep-dive and root cause analysis on performance issues.
  • Document best practices and maintain an audit log.
  • Perform project risk analysis and assessment, documentation, and mitigation.
  • Perform performance troubleshooting for applications/systems: root cause analysis; heap dump, thread dump, and other log analysis; code profiling; event tracing; and resource analysis. Leverage tools such as Splunk and AppDynamics.
  • Take responsibility for APM tuning for cloud and hybrid environments.
  • Application performance tuning (Java, microservices, containers/OpenShift, APIs) and monitoring.
  • Good understanding of microservices architecture and monitoring/alerting for APIs, cloud, OpenShift, AWS services, and hybrid deployments.
  • Web application performance and end-user experience (client-side browser profiling), with a deep understanding of Chrome DevTools.
  • Strong working knowledge of a variety of technologies including but not limited to compute, Amazon Web Services (AWS), OpenShift, Kubernetes, WAS, Apache, JBoss, Tomcat, etc.

Qualifications, Knowledge, and Skills:

  • At least three (3) years of relevant work experience
  • A Bachelor's degree in Computer Science or any related field is preferred
  • Excellent verbal and written English communication, presentation, and interpersonal skills
  • Ability to apply technical expertise across business or disciplines
  • Ability to think big picture and step back to understand the context of problems before applying analytical skills to address the issues at a more detailed level
  • Highly motivated self-starter with excellent organizational and time management skills; ability to work with minimal direct supervision
  • Strong consulting, relationship building, and collaboration skills
  • Effectively drives results independently and in a cross-functional team environment
  • Ability to effectively adapt to shifting priorities, demands, and timelines
  • Experience with instrumenting agent-based and network-based APM tools for performance and availability monitoring, like AppDynamics, is a plus
  • Programming language knowledge and hands-on experience with Python, shell scripts, Java, or C++ is a plus
  • Good SQL and database skills with SQL Server
  • Good understanding of technology-specific performance metrics and alert thresholds
  • Good programming and scripting skills
  • Basic statistical analysis skills and graphical skills
  • Experience with application tuning for optimal performance, including JVM tuning
  • Good understanding of virtualization on VMware, RHEV, and OpenShift
  • Good understanding of microservices frameworks, platforms like OpenShift, Kubernetes, and AWS, and technology-specific monitoring metrics
  • Knowledge of cloud platforms (AWS, Pivotal, etc.)
  • Strong application and system architecture knowledge
  • AppDynamics, SolarWinds, Splunk, and other toolset experience
  • Mid-level working experience with different operating systems
  • Experience with web and application servers: WebSphere, NGINX, JBoss, Apache, IIS Windows, IHS
  • Experience with SOA and microservice architecture
  • Experience with shift-left techniques, automation, Java and web-based profiling tools, and network profiling tools
Monster

Other Info

Metro Manila
Permanent
Full-time

Royal Caribbean Group

About the company

