Datadog is a monitoring and analytics platform for cloud applications. It provides monitoring of servers, databases, tools, and services through a SaaS-based data analytics platform. Developers, IT operations teams, and business users use it to monitor and analyze the performance of their technology stack. Its main capabilities include:
Infrastructure Monitoring
Datadog allows you to monitor servers, virtual machines, containers, databases, tools, and more across dynamic environments including cloud, on-premise, and hybrid. Once the Datadog Agent or a cloud integration is in place, it discovers infrastructure components and begins monitoring them with little further manual setup. Infrastructure you can monitor includes:
- Server resources – CPU, memory, disk, I/O
- Virtual machine performance
- Container health and resource usage
- Cloud provider services – RDS, Lambda, ElastiCache, etc.
- Network traffic and bandwidth
- Log monitoring
Datadog ingests metrics, events, and logs from your infrastructure components. It lets you view granular real-time performance data as well as historical trends and patterns. Alerts can be configured to notify you of critical issues or anomalies.
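One common route for getting custom metrics into Datadog is DogStatsD, the StatsD-compatible listener built into the Datadog Agent. The sketch below only shows how a datagram is formatted (name:value|type, with an optional sample rate and tags). The metric name and tags are made up for illustration, and the `send` helper assumes an Agent listening on the default local UDP port 8125.

```python
import socket


def format_dogstatsd(name, value, metric_type="g", tags=None, sample_rate=None):
    """Build one DogStatsD datagram: name:value|type[|@rate][|#tag1,tag2]."""
    parts = [f"{name}:{value}", metric_type]
    if sample_rate is not None:
        parts.append(f"@{sample_rate}")
    if tags:
        parts.append("#" + ",".join(tags))
    return "|".join(parts)


def send(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; assumes a local Agent on the default port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Hypothetical gauge metric with two illustrative tags.
datagram = format_dogstatsd(
    "app.queue.depth", 42, "g", tags=["env:prod", "service:checkout"]
)
print(datagram)  # app.queue.depth:42|g|#env:prod,service:checkout
```

In practice most teams would use an official client library rather than hand-building datagrams; the point here is only to show how little is needed to emit a tagged metric.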
Application Performance Monitoring
Datadog enables you to instrument your applications to monitor performance and understand user behavior. This includes:
- Tracing requests across distributed architectures
- Monitoring response times and latency
- Tracking errors and exceptions
- Analyzing user sessions and behavior flows
- Gathering performance metrics from applications written in many common languages
Datadog’s APM tools provide insight into how your applications are performing in production. You can use traces to visualize request flows across microservices, pinpoint poorly performing requests, and troubleshoot errors. Metrics expose underlying resource issues. User analytics uncover usage patterns that help you optimize applications.
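To make the idea of pinpointing a slow request concrete, here is a minimal sketch of a trace modeled as a list of spans linked by parent IDs. It is not Datadog's tracing library or data model; the `Span` class, service names, and durations are all hypothetical. It finds the span with the most "self time", assuming child spans run one after another rather than concurrently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: int
    parent_id: Optional[int]  # None marks the root span of the trace
    service: str
    name: str
    duration_ms: float


def self_time(span, spans):
    """Time spent in the span itself, excluding its direct children.

    Assumes children execute sequentially inside the parent.
    """
    children = sum(s.duration_ms for s in spans if s.parent_id == span.span_id)
    return span.duration_ms - children


# One illustrative request flowing through four services.
spans = [
    Span(1, None, "web", "GET /checkout", 480.0),
    Span(2, 1, "cart", "load_cart", 60.0),
    Span(3, 1, "payments", "charge_card", 390.0),
    Span(4, 3, "postgres", "INSERT payment", 35.0),
]

slowest = max(spans, key=lambda s: self_time(s, spans))
print(slowest.service, slowest.name, self_time(slowest, spans))
# payments charge_card 355.0
```

The root span is the longest overall, but most of its time is spent waiting on children; looking at self time points to the payments service as the place to investigate.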
Visualization and Dashboards
Datadog allows you to visualize metrics, events, traces, logs, and more via custom dashboards. Dashboards enable you to:
- Create graphs, lists, heatmaps, and more from your monitoring data
- Build dashboard templates and widgets to reuse visualizations
- Customize layouts for different teams and purposes
- Share dashboards and collaborate with comments
Dashboards provide a unified view into the health and performance of infrastructure and applications. You can quickly pivot from a high-level overview to detailed views of specific issues. Datadog also includes specialized out-of-the-box dashboards for monitoring networks, databases, containers, and more.
Alerting
Datadog allows flexible alerting across metrics, events, logs, network monitors, and more. Alerts can notify you of issues via:
- PagerDuty
- Webhooks
- Slack
- And more…
Alerts are highly customizable, allowing you to define thresholds, notification channels, times of day to notify, and more granular criteria. This enables proactive monitoring and reduces mean time to detection and resolution when issues occur in your environment.
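As an illustration of thresholds and notification channels, the sketch below builds a metric monitor definition as a plain dictionary. The field names follow the general shape of Datadog's monitor API as I understand it, so check them against the current API reference before relying on them. The monitor name, metric, scope, and the `@slack-ops-alerts` handle are hypothetical.

```python
import json


def build_metric_monitor(name, metric, scope, critical, warning, notify):
    """Return a monitor definition that alerts when a 5-minute average is too high."""
    # The threshold in the query must match the critical threshold in options.
    query = f"avg(last_5m):avg:{metric}{{{scope}}} > {critical}"
    return {
        "name": name,
        "type": "metric alert",
        "query": query,
        # {{host.name}} is a template variable filled in at notification time.
        "message": "CPU is high on {{host.name}}. " + " ".join(notify),
        "tags": ["team:platform"],
        "options": {
            "thresholds": {"critical": critical, "warning": warning},
            "notify_no_data": False,
        },
    }


monitor = build_metric_monitor(
    name="High CPU on production hosts",
    metric="system.cpu.user",
    scope="env:prod",
    critical=90,
    warning=80,
    notify=["@slack-ops-alerts"],
)
print(json.dumps(monitor, indent=2))
```

Keeping monitor definitions in code like this makes them reviewable and repeatable across environments, at the cost of one more thing to keep in sync with what is actually deployed.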
Log Management
Datadog ingests logs from all your systems, indexes them, and allows powerful analysis including:
- Live tailing logs in real-time
- Filtering and searching through all logs with near-instant results
- Clickable links to pivot from logs to metrics and traces
- Aggregations and analytics across your log events
- Alerting on log events
This provides a unified view across your logging and monitoring, allowing you to pivot from metrics and traces to relevant logs that provide additional context. Datadog’s log management simplifies troubleshooting and security analysis.
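The pivot from a trace to its logs depends on a shared identifier carried in both. The sketch below shows the idea on a handful of in-memory log events; the `trace_id` field name, the services, and the messages are illustrative, not Datadog's log schema.

```python
from collections import Counter

# Hypothetical structured log events, each tagged with the trace that produced it.
logs = [
    {"trace_id": "abc123", "service": "web", "status": "info", "message": "checkout started"},
    {"trace_id": "abc123", "service": "payments", "status": "error", "message": "card declined"},
    {"trace_id": "def456", "service": "web", "status": "info", "message": "home page"},
    {"trace_id": "abc123", "service": "payments", "status": "error", "message": "retry failed"},
]


def logs_for_trace(events, trace_id):
    """Filter: keep only the events emitted while handling one trace."""
    return [e for e in events if e.get("trace_id") == trace_id]


def count_by(events, field):
    """Aggregate: count events per value of a field."""
    return Counter(e.get(field, "unknown") for e in events)


related = logs_for_trace(logs, "abc123")
errors_by_service = count_by(
    [e for e in related if e["status"] == "error"], "service"
)
print(len(related), dict(errors_by_service))  # 3 {'payments': 2}
```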
Network Monitoring
Datadog allows monitoring of network performance and security. This includes:
- Network mapping and visualization
- Traffic and bandwidth monitoring
- Network flow analysis
- Domain Name System (DNS) monitoring
- Dynamic network security monitoring
Datadog provides visibility into network traffic patterns, volumes, drops, connectivity between systems, and security profiles. Abnormal network activity can trigger alerts for real-time investigation.
Integrations
Datadog seamlessly integrates with technologies across the modern tech stack including:
- Infrastructure – AWS, Azure, GCP, Kubernetes, containers, OS
- Monitoring – Prometheus, StatsD, CollectD, Nagios, Graphite
- Logging – Logstash, Fluentd, Kafka, S3, CloudWatch
- Service apps – Slack, PagerDuty, email, Jira, webhooks
- Databases – MongoDB, MySQL, PostgreSQL, Aurora, Redis
- Frameworks – .NET, Java, Node.js, Python, Go, Ruby
- And much more…
With 400+ built-in integrations, Datadog can connect with your existing infrastructure and apps to provide end-to-end visibility.
APIs
Datadog provides APIs and SDKs to enable programmatic configuration, metrics ingestion, and automation. This allows you to:
- Leverage Datadog alerts and dashboards in code
- Build custom integrations to Datadog
- Automate error handling, notifications, and more
- Extend Datadog capabilities into your systems
Most actions you can take in the Datadog UI can also be executed and automated via the API. This enables robust automation for monitoring workflows.
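As a rough sketch of programmatic metrics ingestion, the code below prepares (but does not send) an HTTP request that submits one gauge point. It assumes the v1 series endpoint on the US site and the `DD-API-KEY` header; the endpoint and site vary by account, so confirm them against the current API reference. The metric name, tag, and placeholder key are hypothetical.

```python
import json
import time
import urllib.request

# Assumed endpoint for the US site; other Datadog sites use different hosts.
API_URL = "https://api.datadoghq.com/api/v1/series"


def build_series_request(api_key, metric, value, tags, timestamp=None):
    """Prepare a POST that submits a single gauge point for one metric."""
    ts = int(timestamp if timestamp is not None else time.time())
    body = {
        "series": [
            {"metric": metric, "type": "gauge", "points": [[ts, value]], "tags": tags}
        ]
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )


request = build_series_request(
    "<YOUR_API_KEY>", "checkout.orders.pending", 17, ["env:staging"],
    timestamp=1700000000,
)
print(request.get_method(), request.full_url)
# Sending is left out on purpose: urllib.request.urlopen(request) would submit it.
```

Separating "build the request" from "send the request" keeps the example safe to run without credentials and makes the payload easy to inspect in tests.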
Collaboration
Datadog provides collaboration features to streamline communication between teams including:
- Comments on dashboards, monitors, and graphs
- Team assignment for shared graphs and monitors
- Dashboard sharing and permissions
- @-mentions in comments for notifications
- Integration with Slack, PagerDuty, Jira, email, and more
Monitoring and observability become collaborative efforts between developers, ops, and business users. Datadog surfaces its capabilities through tools your teams already use, such as Slack, PagerDuty, and Jira.
Security Monitoring
Datadog provides visibility into security across your infrastructure and applications including:
- Detection of network security threats
- Monitoring of user behavior and potential insider threats
- Security automation and orchestration
- Audit logging of changes to systems
- Vulnerability analysis
Datadog can ingest security signals from specialized tools like firewalls and SIEMs. It correlates security events with performance and behavior data to provide unified security monitoring and analytics.
Business Analytics
Datadog allows correlations between monitoring data and key business KPIs, enabling:
- Tracking of business metrics based on monitoring signals – e.g. customer signups, revenue, funnel events
- Monitoring of workflows and processes critical to the business
- Reporting on how technology performance impacts business performance
This provides a cross-functional view of how systems are supporting core business outcomes. Business users gain visibility into the technology performance powering the business.
Machine Learning
Datadog applies machine learning algorithms to detect anomalies, correlations, and patterns including:
- Anomaly detection on metrics to automatically surface issues
- Automated correlation of related events across logs, metrics, traces
- Pattern recognition to fingerprint baseline system behavior
- Forecasting future resource utilization based on historical data
Machine learning provides intelligent alerting, advanced troubleshooting, and capacity planning. Datadog’s algorithms automatically adapt to your systems and data to reduce noise.
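Datadog's own algorithms are not described here, so the following is only a generic illustration of what anomaly detection on a metric can look like: a rolling z-score that flags points far from the recent baseline. The window size, threshold, and latency values are arbitrary choices for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            continue  # flat baseline: z-score is undefined, so skip the point
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Illustrative request latencies in milliseconds with one obvious spike.
latencies = [100, 102, 98, 101, 99, 100, 103, 250, 101, 100]
print(find_anomalies(latencies))  # [7]
```

Note that the spike inflates the baseline for the points that follow it, which is one reason production systems tend to use more robust statistics and seasonality-aware models than this sketch does.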
Synthetic Monitoring
Datadog lets you actively monitor your applications and sites from outside your network via synthetic tests. Tests include:
- API tests
- Web browser tests
- Multi-step business workflow tests
- Network connectivity tests
- External site performance tests
Synthetic tests run at regular intervals from points around the globe to validate availability, performance, and functionality for customers. Advanced tests can replicate user journeys through your applications.
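At its core, a synthetic API test is a request plus a set of assertions on the response. The sketch below shows only the assertion step against a pre-recorded response, so it runs without network access. The `Response` class, limits, and sample payloads are hypothetical and do not reflect Datadog's test configuration format.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status_code: int
    elapsed_ms: float
    body: str


def run_assertions(response, expected_status=200, max_latency_ms=500.0, must_contain=None):
    """Return a list of human-readable failures; an empty list means the test passed."""
    failures = []
    if response.status_code != expected_status:
        failures.append(f"status {response.status_code} != {expected_status}")
    if response.elapsed_ms > max_latency_ms:
        failures.append(f"latency {response.elapsed_ms}ms > {max_latency_ms}ms")
    if must_contain is not None and must_contain not in response.body:
        failures.append(f"body missing {must_contain!r}")
    return failures


healthy = Response(200, 120.0, '{"status": "ok"}')
slow = Response(200, 900.0, '{"status": "ok"}')

print(run_assertions(healthy, must_contain="ok"))  # []
print(run_assertions(slow, must_contain="ok"))     # ['latency 900.0ms > 500.0ms']
```

Checking latency alongside status matters because an endpoint that returns 200 after several seconds is available on paper but effectively broken for users.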
Incident Management
Datadog provides workflow automation for faster incident response including:
- Integrated on-call notification and escalation via PagerDuty, Slack, etc.
- Playbooks to standardize incident processes
- Dedicated alerts page to filter and manage open incidents
- Collaboration via @-mentions and comments on alerts
- Post-incident review
This enables formal incident management that brings together monitoring data, people, and processes to quickly mitigate downtime and service degradation when issues occur.
Conclusion
In summary, Datadog provides an end-to-end monitoring and analytics platform across infrastructure, applications, logs, networks, and business data. It connects directly with all your systems to gather metrics, events, and logs in one unified platform. Datadog delivers powerful visualization, alerting, collaboration, and automation capabilities leveraging machine learning and analytics. This enables observability across both engineering and business teams to understand system performance, troubleshoot issues, and optimize workflows. Datadog transforms monitoring data into objective insights and actions to improve operational performance.