Facebook Thrift
Thrift is a software library
and a set of code generation tools developed at Facebook's office in
Palo Alto, California, to expedite the development and implementation of scalable
and efficient backend services.
The primary goal of Thrift is to enable efficient and reliable communication across programming languages by abstracting the portions of each language that tend to require the most customization into a common library that is implemented in each language. Users define their data types and service interfaces in a single language-neutral Interface Definition Language file (IDL file), and the Thrift compiler generates from it all the code needed to build clients and servers that communicate through Remote Procedure Calls (RPC). This report explains the design choices and implementation details, and walks through a sample Thrift struct.
The idea for Thrift
grew out of the need for a new way to meet the resource demands of many of
Facebook's on-site applications, which could not be addressed by staying
within the LAMP framework. LAMP is an acronym for Linux, Apache, MySQL and PHP.
Facebook was originally built from the ground up on this LAMP framework. By
2006 Facebook had become a widely used social networking site, and its
network traffic had grown accordingly. This created the need to scale the
infrastructure behind many of its on-site applications, such as search, ad
selection and delivery, and event logging.
Scaling these operations to
match the resource demands was not possible within the LAMP framework. In
implementing services such as search and event logging, Facebook's engineers
had chosen various programming languages to get the right combination of
performance, ease and speed of development, availability of existing
libraries, and so on. Facebook's engineering culture has also tended to
prefer choosing the best tools and implementations over standardizing on any
one programming language and begrudgingly accepting its inherent limitations.
Most of the existing cross-language solutions either suffered from subpar
performance or offered too little data type freedom. Given these technical
challenges and design choices, the engineers at Facebook faced the
considerable task of building a scalable, transparent and high-performance
bridge across many programming languages.
Thrift Design
Features
The primary idea behind Thrift
is a language-neutral software stack, implemented in each of the supported
programming languages, together with a code generation engine that
transforms a simple interface and data definition language into client and
server remote procedure call libraries. Thrift is designed to be as simple as
possible for developers, who can define all the data structures and
interfaces needed for a complex service in a single short file. This file is
called the Thrift Interface Definition Language file, or Thrift IDL file.
While evaluating the technical challenges of cross-language interaction in a
networked environment, the Thrift developers identified the following key
components.
Types:
A common type system should
exist across all programming languages, so that developers do not have to
write their own serialization code. Serialization is the process of
converting an in-memory object into a sequence of bytes that can be stored or
transmitted and later reconstructed. For example, a C++ programmer should be
able to transparently exchange a strongly typed STL map for a dynamic Python
dictionary, and neither programmer should be forced to write any code below
the application layer. A dictionary is a Python data type that stores a
collection of items indexed by keys. It is very similar to an associative
array.
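To make the idea of serialization concrete, here is a minimal sketch in Python of turning a dictionary into bytes and back. The byte layout (an entry count, then a length-prefixed key and a 32-bit value per entry) and the names encode_map and decode_map are assumptions made up for illustration; this is not Thrift's actual wire format. With Thrift, code of this kind is generated for you rather than written by hand.

```python
import struct

def encode_map(d):
    """Encode a dict of str -> int as: entry count, then (key length, key bytes, i32 value) per entry."""
    parts = [struct.pack("!i", len(d))]
    for key, value in d.items():
        raw = key.encode("utf-8")
        parts.append(struct.pack("!i", len(raw)) + raw + struct.pack("!i", value))
    return b"".join(parts)

def decode_map(data):
    """Rebuild the dict from the bytes produced by encode_map."""
    (count,) = struct.unpack_from("!i", data, 0)
    offset = 4
    result = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from("!i", data, offset)
        offset += 4
        key = data[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (value,) = struct.unpack_from("!i", data, offset)
        offset += 4
        result[key] = value
    return result
```

Because the bytes follow a fixed, language-independent layout, a C++ program could read the same data into an STL map.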
Transport:
Each language must have a
common interface to bidirectional raw data transport. Consider a scenario
with two servers, one implemented in Java and the other in Python. A service
written in Java should be able to send its raw data through a common
interface that the Python server understands, and vice versa. The transport
layer is responsible for moving the raw data between the two ends. The
specifics of how a given transport is implemented should not matter to the
service developer. The same application code should be able to run against
TCP stream sockets, raw data in memory, or files on disk.
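The point about running the same application code against different transports can be sketched in Python as follows. The functions write_message and read_message and the 4-byte length prefix are hypothetical choices for this example, not part of Thrift. They rely only on write() and read(), so any file-like object can act as the transport.

```python
import io
import tempfile

def write_message(transport, payload):
    """Application code: needs only write(), so any file-like object works."""
    transport.write(len(payload).to_bytes(4, "big"))  # 4-byte length prefix
    transport.write(payload)

def read_message(transport):
    """Read one length-prefixed message back from the transport."""
    size = int.from_bytes(transport.read(4), "big")
    return transport.read(size)

# Transport 1: raw data in memory
buf = io.BytesIO()
write_message(buf, b"hello")
buf.seek(0)
memory_result = read_message(buf)

# Transport 2: a file on disk
with tempfile.TemporaryFile() as f:
    write_message(f, b"hello")
    f.seek(0)
    file_result = read_message(f)
```

A TCP socket could be plugged in the same way, for instance through the file-like object returned by socket.makefile("rwb"), without changing the two functions.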
Protocol:
In order to transport the raw
data, it has to be encoded in a particular format such as binary or XML.
The transport layer therefore relies on a protocol to encode and decode
the data. Again, the application developer does not need to be concerned with
this. All that matters to the developer is that the data can be read and
written in some deterministic manner.
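As a rough illustration of the protocol idea, the Python sketch below encodes the same record in two interchangeable formats, one binary and one text based. The record, the function names and both layouts are assumptions chosen for this example and do not reproduce any actual Thrift protocol.

```python
import json
import struct

record = {"number": 10, "name": "NB"}

def encode_binary(rec):
    """Binary format: i32 number, i32 name length, then the UTF-8 name bytes."""
    name = rec["name"].encode("utf-8")
    return struct.pack("!ii", rec["number"], len(name)) + name

def decode_binary(data):
    number, length = struct.unpack_from("!ii", data, 0)
    return {"number": number, "name": data[8:8 + length].decode("utf-8")}

def encode_json(rec):
    """Text format: the same record as a JSON document."""
    return json.dumps(rec, sort_keys=True).encode("utf-8")

def decode_json(data):
    return json.loads(data.decode("utf-8"))
```

The application only ever sees the record. Which pair of functions sits between it and the transport is a detail that can be swapped.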
Versioning:
For services to be robust,
they must be able to evolve from their present version and incorporate new
features. To make this possible, the data types involved in a service should
provide a mechanism to add or remove fields of an object, or to alter the
argument list of a function, without any interruption in service. This is
called versioning.
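One way to picture versioning is an encoding in which every value is tagged with a numeric field identifier, so that a reader can ignore identifiers it does not know. The Python sketch below assumes that every field is a 32-bit integer and that identifier 0 marks the end of the struct. Both are simplifications made for this example; a real encoding must also carry each field's type so that a reader knows how many bytes to skip.

```python
import struct

def encode_fields(fields):
    """fields maps a non-zero field id to an i32 value; id 0 terminates the struct."""
    body = b"".join(struct.pack("!hi", fid, val) for fid, val in sorted(fields.items()))
    return body + struct.pack("!h", 0)

def decode_fields(data, known_ids):
    """Return only the fields this reader knows about; unknown ids are skipped."""
    result = {}
    offset = 0
    while True:
        (fid,) = struct.unpack_from("!h", data, offset)
        offset += 2
        if fid == 0:  # stop marker
            break
        (val,) = struct.unpack_from("!i", data, offset)
        offset += 4
        if fid in known_ids:
            result[fid] = val
    return result
```

An old reader that knows only fields 1 and 2 can still read a message from a newer writer that added field 3, and a new reader simply finds field 3 missing in a message from an old writer.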
Processors:
Processors are the components that
process the data streams and carry out the Remote Procedure Calls.
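The role of a processor can be sketched in Python as follows: read a request from an input stream, dispatch it to the handler that implements the service, and write the reply to an output stream. The class names, the CalculatorHandler service and the JSON-lines request format are all hypothetical and chosen only to keep the example short. Thrift generates its processors from the IDL file.

```python
import io
import json

class CalculatorHandler:
    """User-written service logic (hypothetical example service)."""
    def add(self, a, b):
        return a + b

class Processor:
    """Reads one request, invokes the matching handler method, writes one reply."""
    def __init__(self, handler):
        self._handler = handler

    def process(self, inp, out):
        request = json.loads(inp.readline())
        name = request["method"]
        # Never dispatch to private attributes of the handler.
        method = None if name.startswith("_") else getattr(self._handler, name, None)
        if callable(method):
            reply = {"result": method(*request["args"])}
        else:
            reply = {"error": "unknown method " + name}
        out.write((json.dumps(reply) + "\n").encode("utf-8"))

inp = io.BytesIO(b'{"method": "add", "args": [2, 3]}\n')
out = io.BytesIO()
Processor(CalculatorHandler()).process(inp, out)
```

The handler contains no I/O code at all, which is the separation the processor is meant to provide.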
Thrift allows programmers to
develop entirely with natively defined types rather than with wrapper
objects or special dynamic types. It also does not require the developer to
write any serialization code for transport. The developer is free to
annotate their data structures in the Thrift Interface Definition Language
file (IDL file) with the minimal amount of extra information needed to tell
the code generator how to safely transport the objects across languages.
Structs:
A Thrift struct defines a
common object to be used across languages. A struct is essentially
equivalent to a class in object-oriented programming languages. It has a set
of strongly typed fields, each with a unique name. The basic syntax of a
Thrift struct is very similar to that of a struct in C. Fields may be
annotated with integer field identifiers, which must be unique within the
scope of the struct, and with optional default values. Field identifiers may
be omitted; they were introduced strictly for versioning purposes.
This is what a Thrift struct
looks like:
struct Example
{
  1: i32 number = 10,
  2: i64 bignumber,
  3: double decimals,
  4: string name = "NB"
}
As you can see, each field
inside the Thrift struct is labeled with a unique field identifier.
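To give a feel for what such a definition corresponds to in a target language, here is a hand-written Python approximation of the Example struct. It is only a sketch: the dataclass form and the FIELD_IDS table are assumptions for illustration, and the code that the Thrift compiler actually generates looks different.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Example:
    """Hand-written stand-in for the Example struct; not actual generated code."""
    number: int = 10                  # 1: i32 number = 10
    bignumber: Optional[int] = None   # 2: i64 bignumber
    decimals: Optional[float] = None  # 3: double decimals
    name: str = "NB"                  # 4: string name = "NB"

# Field identifiers from the IDL, kept alongside the class for use by a serializer.
FIELD_IDS = {1: "number", 2: "bignumber", 3: "decimals", 4: "name"}

example = Example(bignumber=2 ** 40)
```

Fields with a default value in the IDL start out with that value, and the others stay unset until assigned.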
Facebook Thrift Services
Thrift has been employed in a
large number of applications at Facebook, including search, logging, mobile,
ads and the developer platform. Two specific usages are discussed below.
Search
Thrift is used as the
underlying protocol and transport layer for the Facebook Search service. The
multi-language code generation is well suited for search because it allows for
application development in an efficient server side language (C++) and allows
the Facebook PHP-based web application to make calls to the search service
using Thrift PHP libraries. There is also a large variety of search stats,
deployment and testing functionality that is built on top of generated Python
code. Additionally, the Thrift log file format is used as a redo log for
providing real-time search index updates. Thrift has allowed the search team to
leverage each language for its strengths and to develop code at a rapid pace.
Logging
The Thrift TFileTransport
functionality is used for structured logging. Each service function definition
along with its parameters can be considered to be a structured log entry
identified by the function name. This log can then be used for a variety of
purposes, including online and offline processing, stats aggregation and as a
redo log.
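The idea of treating each function call as a structured log entry, and of replaying the log later, can be sketched in Python like this. The JSON-lines format and the names log_call, replay and update_document are assumptions made for this example. They stand in for the real TFileTransport format, which is not reproduced here.

```python
import io
import json

def log_call(log, function_name, **params):
    """Append one structured entry: the function name plus its parameters."""
    log.write(json.dumps({"fn": function_name, "params": params}) + "\n")

def replay(log, handlers):
    """Use the log as a redo log: re-run every recorded call in order."""
    results = []
    for line in log:
        entry = json.loads(line)
        results.append(handlers[entry["fn"]](**entry["params"]))
    return results

index = {}

def update_document(doc_id, text):
    """Hypothetical service function that updates a search index."""
    index[doc_id] = text
    return doc_id

log = io.StringIO()
log_call(log, "update_document", doc_id=1, text="hello")
log_call(log, "update_document", doc_id=2, text="world")
log.seek(0)
replayed = replay(log, {"update_document": update_document})
```

The same log could just as well be scanned offline to aggregate statistics per function name instead of being replayed.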
Thrift has enabled Facebook to
build scalable backend services efficiently by enabling engineers to divide and
conquer. Application developers can focus on application code without worrying
about the sockets layer. Duplicated work is avoided by writing buffering and
I/O logic in one place, rather than interspersing it in each application.
Thrift has been employed in a wide variety of applications at Facebook,
including search, logging, mobile, ads, and the developer platform. According
to its developers, the marginal performance cost incurred by an extra layer of
software abstraction is far eclipsed by the gains in developer efficiency and
systems reliability. Finally, Thrift has been contributed to the Apache
Software Foundation as the Apache Thrift project, making it an open source
framework for cross-language services implementation.