Netflix, the popular video-streaming service that takes up a third of all internet traffic during peak hours, isn't just the single largest source of internet traffic. With more than a billion video delivery instances per month, it is also, without doubt, the largest pure cloud application in the world.
At the Linux Foundation's Linux Collaboration Summit
in San Francisco, California, Adrian Cockcroft, director of
architecture for Netflix's cloud systems team, after first thanking
everyone "for building the internet so we can fill it with movies", said
that Netflix's Linux, FreeBSD, and open-source-based services are
"cloud native".
By this, Cockcroft meant that even with more than a billion video instances delivered every month over the internet, "there is no datacenter behind Netflix". Instead, Netflix, which has been using Amazon Web Services since 2009 for some of its services, moved its entire technology infrastructure to AWS in November 2012.
Specifically, depending on customer demand, Netflix's front-end services run on 500 to 1,000 Linux-based Tomcat JavaServer and NGINX web servers. These are backed by hundreds of other Amazon Simple Storage Service (S3) and NoSQL Cassandra database servers, which use the Memcached high-performance, distributed memory object caching system. All of this, and more besides, is distributed across three Amazon Web Services
availability zones. Every time you visit Netflix either with a device
or a web browser, all these are brought together within a second to show
you your video selections.
According to Cockcroft, if something goes wrong, Netflix can continue
to run the entire service on two out of three zones. Netflix didn't
simply take Amazon's word for this. It tested total Amazon Elastic
Compute Cloud (EC2) failures with its open-source Chaos Gorilla
software. "We go around trying to break things to prove everything is
resistant to it," said Cockcroft. Netflix, in concert with Amazon, is
working on multi-region EC2 availability. Once in place, an entire EC2
zone failure won't stop Netflix videos from flowing to customers.
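The "two out of three zones" claim can be illustrated with a minimal sketch of the kind of check a zone-failure test performs: take one zone away and confirm the rest can still carry peak load. This is not Chaos Gorilla itself; the zone names, capacities, and peak load below are hypothetical numbers chosen for illustration.

```python
import random

# Hypothetical capacity (requests/second) per availability zone; illustrative only.
ZONE_CAPACITY = {"zone-a": 600, "zone-b": 600, "zone-c": 600}
PEAK_LOAD = 1000  # hypothetical peak demand in requests/second


def surviving_capacity(capacity, failed_zone):
    """Total capacity left after one zone is taken out entirely."""
    return sum(c for zone, c in capacity.items() if zone != failed_zone)


def survives_any_single_zone_failure(capacity, peak_load):
    """True if every possible single-zone outage still leaves enough capacity."""
    return all(surviving_capacity(capacity, z) >= peak_load for z in capacity)


def kill_random_zone(capacity, rng=random):
    """Pick one zone at random to 'fail', as a chaos-style test might."""
    return rng.choice(sorted(capacity))


if __name__ == "__main__":
    victim = kill_random_zone(ZONE_CAPACITY)
    left = surviving_capacity(ZONE_CAPACITY, victim)
    print(f"killed {victim}: {left} req/s left for a peak of {PEAK_LOAD}")
```

The point of the design is that each zone is over-provisioned: with three zones, any two must be able to absorb the full peak, so every zone runs well below its limit in normal operation.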
That won't be easy, though. The problem isn't so much replicating videos and services across the EC2 zones. Netflix already has its own content delivery network (CDN), Open Connect,
and servers placed at local ISP hubs for that. No, the real problem is
setting the Domain Name System (DNS) so that users are directed to the
right Amazon zone when one is down. That's because, Cockcroft said, DNS
providers have wildly different application programming interfaces
(APIs), and they're designed to be hand-managed by an engineer and thus
are not at all easy to automate.
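One common way to tame mismatched provider APIs is to hide each one behind a shared interface, so failover logic is written once. The sketch below assumes two made-up providers (`ProviderA`, `ProviderB`) with deliberately different call styles; none of these class or method names come from a real DNS vendor or from Netflix's tooling.

```python
from typing import Protocol


class DnsProvider(Protocol):
    """The single interface the failover logic depends on (hypothetical)."""

    def point_to(self, hostname: str, target: str) -> None: ...


class ProviderA:
    """Hypothetical provider whose API sets one record per call."""

    def __init__(self):
        self.records = {}

    def set_cname(self, name, value):
        self.records[name] = value


class ProviderB:
    """Hypothetical provider whose API accepts a batch of change dicts."""

    def __init__(self):
        self.zone = []

    def upload_changes(self, changes):
        self.zone.extend(changes)


class ProviderAAdapter:
    """Translates the shared interface into ProviderA's call style."""

    def __init__(self, client):
        self.client = client

    def point_to(self, hostname, target):
        self.client.set_cname(hostname, target)


class ProviderBAdapter:
    """Translates the shared interface into ProviderB's call style."""

    def __init__(self, client):
        self.client = client

    def point_to(self, hostname, target):
        change = {"name": hostname, "type": "CNAME", "data": target}
        self.client.upload_changes([change])


def fail_over(providers, hostname, healthy_target):
    """Redirect a hostname to a healthy target on every provider."""
    for provider in providers:
        provider.point_to(hostname, healthy_target)
```

With adapters in place, a call such as `fail_over([ProviderAAdapter(a), ProviderBAdapter(b)], "www.example.com", "zone-b.example.com")` updates both providers without the caller knowing how either API works.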
The difficulty isn't stopping Netflix from addressing the problem.
Indeed, Netflix plans on failure. As Cockcroft titled his talk, Netflix
is about dystopia as a service. The question isn't whether something
will fail on the cloud; it's how to keep working no matter how the
clouds or specific services fail. Netflix's services are designed to
degrade gradually, rather than fail completely, when something goes
wrong.
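Gradual degradation usually comes down to a fallback: when a dependency fails, serve something simpler instead of an error. Here is a minimal sketch, assuming a hypothetical personalized-recommendations call that is down and a generic list of popular titles to fall back on. The function names and titles are invented for illustration and are not Netflix's code.

```python
def personalized_rows(user_id):
    """Hypothetical dependency that is currently failing."""
    raise TimeoutError("recommendation service unavailable")


def popular_titles():
    """Hypothetical cheap fallback that does not depend on the failing service."""
    return ["Popular Title A", "Popular Title B"]


def home_page(user_id, fetch_personalized=personalized_rows,
              fetch_fallback=popular_titles):
    """Build the home page, degrading to generic rows if personalization fails."""
    try:
        return {"rows": fetch_personalized(user_id), "degraded": False}
    except Exception:
        # The page still renders; it is just less tailored to the user.
        return {"rows": fetch_fallback(), "degraded": True}


if __name__ == "__main__":
    print(home_page(42))
```

The customer sees a less tailored page rather than an outage, which is the trade the article describes.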
As he said, sure, perfection, utopia would be great, but if you're
always striving for perfection, you always end up compromising. So
instead of striving for perfection, Netflix is continuously updating its
systems in real time rather than perfecting them. How fast is that?
Netflix wants to "code features in days instead of months; we want to
deploy new hardware in minutes instead of weeks; and we want to see
instant responses in seconds instead of hours". By deploying on the
cloud, Netflix can do all of this.
Sure, sometimes this doesn't work. In December 2012, for example, a
failure in AWS's Elastic Load Balancing service in the US-East-1 region
brought Netflix down during the Christmas holiday.
On the other hand, the Netflix method of producing code sooner rather
than later, and running in such a way that the service keeps going even
though some components are — not may, but are — broken and
inefficient at any given time, has produced a service that is capable of
being the single largest consumer of internet bandwidth. Clearly, it's
not perfect, but Netflix's design decision to "create a highly agile and
highly available service from ephemeral and often broken components" on
the cloud works, and as far as Netflix is concerned, for day-to-day
cloud-based video delivery, that's much better than "perfection" could
ever be.