
Big Data Dictionary

Apache Flume

Flume is an open source project, developed by Cloudera, that provides a distributed, reliable, and available service for efficiently moving large amounts of data as it is produced. It is ideally suited to gathering logs from multiple systems and inserting them into HDFS as they are generated. Flume is typically used to ingest log files from real-time systems such as Web servers, firewalls, and mail servers into HDFS. Each Flume node has a source and a sink. The source tells the node where to receive data from; the sink tells the node where to send data to. A sink can have one or more decorators that perform simple processing on the data as it passes through (e.g., compression or grep-like filtering). Flume scales horizontally: as load increases, more machines can be added to the configuration. Flume provides a central Master, through which users can monitor data flows and reconfigure them on the fly. The figure below illustrates a scenario of using Flume to gather data from multiple systems for storage in HDFS.
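As a concrete illustration of the source-to-sink flow described above, here is a minimal sketch of a Flume agent configuration in the properties format used by later Flume releases (Flume NG). The agent name `a1`, the tailed log path, and the HDFS path are all illustrative assumptions, not values from this text; Flume NG also adds a channel between source and sink, a concept not present in the architecture described above.

```
# Sketch only: names (a1, r1, k1, c1) and paths are hypothetical.
# One agent with one source, one channel, and one sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: tail a Web server access log as lines are appended
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory

# Sink: write the collected events into HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/weblogs/
a1.sinks.k1.channel = c1
```

To scale horizontally as the text describes, additional agents with the same shape of configuration can be run on more machines, all writing into the same HDFS target.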
