Developing an Akka edge special code

Chapter 7: Remoting

The power of Akka is hopefully apparent to you by now, but really, we've only just begun to scratch the surface. We've already highlighted the fact that Akka brings great flexibility and resiliency for problems involving concurrency, but so far we've only focused on running Akka at the single machine level. Bringing additional machines into the picture is where the capabilities of Akka really show their strongest side.

Akka's support for remoting is where the strength of the message passing approach is perhaps best highlighted. This, and the fundamental focus on making everything in Akka support remote actors as easily as local actors, makes a dramatic difference in your ability to take a single system application and transform it into a distributed system without extreme contortions. That alone is far from typical in usual enterprise frameworks that most of us have been subjected to over the years.

This ability to transparently interact between actors that may be local or remote without having to change the underlying code, is referred to as location transparency, as mentioned in the first chapter. Building your applications to take advantage of this can be driven almost entirely from a configuration-based approach. In this chapter we'll look at how to get started with Akka's remoting module and how to go about getting to the point where location transparency is just an assumed feature of your system.

Setting up remoting

We've used the most basic of the libraries that Akka provides. In order to get started with Akka remoting, you'll need to add the following to your build definition:

libraryDependencies += "com.typesafe.akka" %% "akka-remote" % "2.2.3"

If you're running in the sbt console, don't forget to run reload so that it reloads the updated build configuration. The next time you try to compile anything, sbt will see that there is a new dependency and attempt to retrieve it.

The other key addition is to your Akka configuration. These additions will enable remote access from other Akka instances. It's important to note that the akka.remote.netty.tcp.hostname setting needs to reference a hostname that any remote Akka instances will be able to resolve. In many of the Akka tutorials you'll see online, this field is set to localhost or 127.0.0.1, which causes needless frustration when some hapless developer is trying to learn how to use Akka remoting and can't figure out why two instances can't talk to each other. Make sure that if you use a hostname here (and there's no reason not to — we're using IP addresses for convenience) it is globally addressable by all Akka instances, typically this is done through an update to the UNIX host file /etc/hosts and/or updating the domain-name-service.

The port number specified can also vary, as needed. In particular, if you want to try running multiple Akka instances on a single machine, they will need to use unique port numbers to avoid errors. Here we use 2552, but if we were to run another instance of an Akka remoting-based application on the same machine, we would need to use a different port.

akka {
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "192.168.5.5"
      port = 2552
    }
  }
}

Looking up and creating actors remotely

Let's understand the creation and lookup of actors from a local context before considering the remote context. Creation of actors, as we've learnt, can be achieved through invoking actorOf in either an ActorSystem or an Actor and we can lookup actors using the actorFor through an ActorSystem or an Actor. In Akka 2.2 onwards, actorFor is deprecated in favour of actorSelection as a more unified way of conducting lookups and readers can consult the documentation for details.

Now that we have the configuration in place, actually using remote actors is incredibly simple. Sending a message to a remote actor requires simply knowing the actor's address. This code looks up an actor called remoteActor that lives on the host with IP 192.168.5.5 using port 2552, with an actor system named remoteActorSystem:

val remoteActor = context
  .actorFor("akka.tcp://remoteActorSystemName@192.168.5.5:2552/user/remoteActor")
remoteActor ! SomeMessage("hello!")

Similarly, you can create an actor running on a remote node with just a bit more work:

import akka.actor.{ Props, Deploy, AddressFromURIString }
import akka.remote.RemoteScope 

val remoteAddress =
  AddressFromURIString("akka.tcp://remoteActorSystem@192.168.5.5:2552/user/remoteActor")

val newRemoteActor = context.actorOf(Props[MyRemoteActor]
 .withDeploy(Deploy(scope = RemoteScope(remoteAddress))))

There is an additional requirement here, though, which is that the class MyRemoteActor that's being instantiated here needs to be available to the runtime in both the local and remote JVM instances. An observation from these two approaches is that the address string is embedded in the program code and that presents a problem if your application logic has more remote actor invocations as part of the overall system; however it is a quick solution if you are building out your Actor application over the weekend.

Akka provides us another way to achieve this and reduces this form of coupling by mapping the address in the actor deployment configuration. This would be the equivalent configuration for the previous example:

akka.actor.deployment {
  /remoteActor {
    remote = "akka.tcp://remoteActorSystem@192.168.5.5:2552"
  }
}

With this configuration in place, the previous code is simplified:

val newRemoteActor = context.actorOf(Props[MyRemoteActor],
  name = "remoteActor")

You should notice that this looks just like code we've used previously to start up actors. In other words, we're creating a remote actor without the code having to take the location into account. You should be aware that when we apply this particular technique of creating the remote actor (with help from the configuration) is that we do not actually instantiate the actor there and then, but rather we direct our request to the daemon actor listening to that port i.e. the value of akka.actor.deployment.<actor name>.remote. This is truely location transparency.

Serialization

In Akka, it is easy to forget about data that's being transported from one actor to another actor or even ActorRefs themselves i.e. we typically use Scala case objects or primitive types like Strings, Integers etc and even have domain objects that might possibly capture some state of your application at the time prior to being transported across the network. In remoting actors, this data problem is most glaring when your data is not serializable. Akka caters for this by providing the default serializers or if you are not happy with the performance they deliver, you can build your own too. The default configuration is a mapping of serializers (i.e. akka.actor.serializers) to their respective message objects (i.e. akka.actor.serialization-bindings) and shown below:

akka {
actor {
serializers {
 akka-containers = "akka.remote.serialization.MessageContainerSerializer"
 proto = "akka.remote.serialization.ProtobufSerializer"
 daemon-create = "akka.remote.serialization.DaemonMsgCreateSerializer"
 }
serialization-bindings {
 # Since com.google.protobuf.Message does not extend Serializable but
 # GeneratedMessage does, need to use the more specific one here in order
 # to avoid ambiguity
 "akka.actor.ActorSelectionMessage" = akka-containers
 "com.google.protobuf.GeneratedMessage" = proto
 "akka.remote.DaemonMsgCreate" = daemon-create
 }

In the event that your data is simple enough such that you don't need to write your own serializer/de-serializer, then you would apply the typical technique of marking your objects as serializable, as what you would do in Java. But in the event that you need a custom serializer by building one yourself, then Akka provides a framework to do this. In both situations, we won't discuss serialization further from here onwards and we invited readers to check the documentation here.

Secure Communication Data

We've discussed briefly about data serialization and its primarily about making sure your data stays consistent as it travels from the source to its destination and over here, we discuss a little about making this communication etc more secure. Akka allows us to accomplish in three ways by providing mechanisms to secure the transport layer, the kinds of messages you can send and who can actually receive those messages:

Untrusted mode
Secure Cookie Handshake
SSL

Akka typically runs in the trusted mode i.e. any actor can connect to any actor by knowing its address (local/remote/clustered) and that's a problem because by default any actor can send a shutdown message i.e. PoisonPill, monitor actors remotely or locally (via the Terminated message) and remote deployment. You can avoid this by turning on the property in your configuration

akka.remote.untrusted-mode = on

You should combine that setting by also marking the data messages your application sends with a trait PossiblyHarmful (super trait of Kill, PoisonPill, Terminated & ReceiveTimeout) which effectively muffled those messages.

A second option is to restrict who can actually send the message to the backend and by examining a cookie during the handshaking process, a server can possibly reject a client if this cookie is not present. You can generate this cookie in two ways by invoking a script (found in $AKKA_HOME/scripts/generate_config_with_secure_cookie.sh) or programmatically (via akka.util.Crypt.generateSecureCookie). The following configuration needs to be present on both client and server systems else it'll fail:

akka.remote{
 secure-cookie="090A030E0F0A05010900000A0C0E0C0B03050D05"
 require-cookie=on
}

The final option is to enable SSL in the transport layer and it's done through a transport (Akka defines transports as mechanisms that initializes the underlying transmission capabilities and includes establishing links with remote entities) and in particular the default is the NettyTransport. We know this because if you can recall that we had akka.remote.enabled-transports = ["akka.remote.netty.tcp"] set into our configuration when we began this chapter. But the NettyTransport is not the only one you can use, since Akka allows you to build your own transport class but we won't discuss that here (read details at the documentation). Using the NettyTransport, we can actually configure the key-store, trust-store, protocol etc and the following is the default configuration which you can alter using your project's security credentials:

netty.ssl = {
 # Enable SSL/TLS encryption.
 # This must be enabled on both the client and server to work.
 enable-ssl = true
 security {
  key-store = "keystore"
  key-store-password = "changeme"
  key-password = "changeme"
  trust-store = "truststore"
  trust-store-password = "changeme"
  protocol = "TLSv1"
  enabled-algorithms = ["TLS_RSA_WITH_AES_128_CBC_SHA"]
  random-number-generator=""
 }
}

Remoting with routers

Routers are also able to take advantage of remoting successfully, as you might imagine. This is done via some additional configuration directives that we didn't cover in the Routing chapter. A simple example showing how remoting and routers can be combined would be a broadcaster that allows for broadcasting messages to a set of remote actors.

On the target remote nodes, you might define the receiving actors as follows:

import akka.actor._ 

class ReceiverActor extends Actor with ActorLogging {
  def receive = {
    case msg => log.info(msg)
  }
}

Then, to send a message to each of these nodes concurrently, you would start up your router, to which you can then send a message, which will appear on each of the remotes:

val broadcaster = context.actorOf(Props[BroadcastReceivingActor]
  .withRouter(FromConfig()), name = "broadcaster")

broadcaster ! "hello, remote nodes!"

The configuration that tells this router to use a BroadcastRouter with remote nodes is very simple and what it does is that it would clone three instances of the actor broadcaster and distribute it to the nodes at 10.10.10.01, 10.10.10.02 & 10.10.10.03:

akka.actor.deployment {
  /broadcaster {
    router = broadcast
    nr-of-instances = 3
    target {
      nodes = [
        "akka.tcp://broadcastExampleSystem@10.10.10.01:2552",
        "akka.tcp://broadcastExampleSystem@10.10.10.02:2552",
        "akka.tcp://broadcastExampleSystem@10.10.10.03:2552"
      ]
    }
  }
}

Scatter-gather across a set of remote workers

Returning to the earlier example application, let's add some handy functionality to our system that grabs pages we bookmark. We're going to use a very simplified approach that's not really ideal, but that will at least get the idea across, and then show you how you can use remoting effectively here to help ensure you actually retrieve the page successfully. First, here's the new Crawler actor that I'll be using to grab the pages:

import akka.actor.Actor
import io.Source

object Crawler {
  case class RetrievePage(bookmark: Bookmark)
  case class Page(bookmark: Bookmark, content: String)
}

class Crawler extends Actor {
  import Crawler.{RetrievePage, Page}
  def receive = {
   case RetrievePage(bookmark) =>
     val content = Source.fromURL(bookmark.url).getLines().mkString("\n")
     sender ! Page(bookmark, content)
  }
}

There's not much to this. It just receives a request to retrieve a page using the URL in a given bookmark and then sends the response back to the original requestor. There are a few different ways we could hook this into our existing application, but keeping with the theme of simplicity, I'll just hook it in to the code that stores the bookmark.

class BookmarkStore(database: Database[Bookmark, UUID]) extends Actor {
 
  import BookmarkStore.{GetBookmark, AddBookmark}
  import Crawler.{RetrievePage, Page}

  val crawler = context.actorOf(Props[Crawler])

  def receive = {

    case AddBookmark(title, url) =>
      val bookmark = Bookmark(title, url)
      database.find(bookmark) match {
        case Some(found) => sender ! None
        case None =>
          database.create(UUID.randomUUID, bookmark)
          sender ! Some(bookmark)
          crawler ! RetrievePage(bookmark)
      }

    case GetBookmark(uuid) =>
      sender ! database.read(uuid)

    case Page(bookmark, pageContent) =>
      database.find(bookmark).map { 
        found =>
          database.update(found._1, bookmark.copy(content = Some(pageContent)))
      }
  }
}

The main items to note here are where we send the RetrievePage message to the crawler and then, later, when we get the Page response back, we go look the bookmark back up from the database and add the freshly retreived contents to it. This is all well and good, but imagine that you had a system that was not very reliable at pulling these page contents, and you wanted a higher likelihood of actually getting the page contents successfully. Our approach would be to use a ScatterGatherFirstCompletedRouter combined with a handful of remote nodes to farm out the request handling. A few little changes will show how this might work:

import akka.actor.{Props, Actor}
import java.util.UUID
import akka.routing.ScatterGatherFirstCompletedRouter
import scala.concurrent.duration._
import akka.pattern.{ask, pipe}
import akka.util.Timeout

class BookmarkStore(database: Database[Bookmark, UUID], crawlerNodes: Seq[String]) extends Actor {

  import BookmarkStore.{GetBookmark, AddBookmark}
  import Crawler.{RetrievePage, Page}

  val crawlerRouter =
    context.actorOf(Props.empty.withRouter(ScatterGatherFirstCompletedRouter(routees = crawlers,
      within = 30 seconds)))

  def receive = {

    case AddBookmark(title, url) =>
      val bookmark = Bookmark(title, url)
      database.find(bookmark) match {
        case Some(found) => sender ! None
        case None =>
          database.create(UUID.randomUUID, bookmark)
          sender ! Some(bookmark)
          import context.dispatcher
          implicit val timeout = Timeout(30 seconds)
          (crawlerRouter ? RetrievePage(bookmark)) pipeTo self
      }

    case GetBookmark(uuid) =>
      sender ! database.read(uuid)

    case Page(bookmark, pageContent) =>
      database.find(bookmark).map { 
        found =>
          database.update(found._1, bookmark.copy(content = Some(pageContent)))
      }
  }
}

There are a few new things here that we'll cover in more detail in the next chapter on futures, but the idea here is that the scatter-gather approach is sending the request to each of the actors we've defined, which is determined by the crawlerNodes parameter. We're assuming this will be passed in as a Seq of remote actor system addresses, for example: "akka.tcp://bookmarker@192.168.5.5:2553/user/crawler". The application daemon is Bookmarker (as before) and as an example of how our sample application would work, we would start the ActorSystem named bookmarker and with that start two remote actors using the configuration approach and have our main actor lookup these two remote actors whenever a valid bookmark is created (recall a valid bookmark would contain a URL that our crawler would crawl the given page and store it into our in-memory cache which is available upon the next request) and the following snippets from application.conf and Bookmarker.scalacaptures this:/p>

akka.actor.deployment {
 /crawler-0 {
 remote = "akka.tcp://bookmarker@127.0.0.1:2552"
 }
 /crawler-1 {
 remote = "akka.tcp://bookmarker@127.0.0.1:2552"
 }
}
object Bookmarker extends App {
  ...
  val crawlers = Seq(system.actorOf(Props[Crawler], "crawler-0"), system.actorOf(Props[Crawler], "crawler-1"))
 
  val bookmarkStoreGuardian =
    system.actorOf(Props(new BookmarkStoreGuardian(database,      
     collection.immutable.Seq(
"akka.tcp://bookmarker@127.0.0.1:2552/user/crawler-0",  
"akka.tcp://bookmarker@127.0.0.1:2552/user/crawler-1"))).withDispatcher("boundedBookmarkDispatcher"))
}//end of Bookmarker

The router will send the request to each node, waiting up to 30 seconds for a response. The first result it gets back is then sent to the BookmarkStore actor itself to process, just as it did earlier.

Gotchas and troubleshooting

The two most difficult things about working with remote actors are understanding when to use them and troubleshooting things when they aren't working. Understanding the characteristics of the problem you're trying to solve is a good place to start thinking about what sort of actor system you might want to design. You should ask yourself whether the system you're building either requires more resources than a single system can provide, if it necessarily requires being spread across multiple systems, or if there is some other design constraint that makes interaction between multiple actor systems a requirement. Perhaps the important thing to keep in mind is that you still need to focus on developing clear protocols for communicating between your actors. If you've designed a reasonably robust actor hierarchy, with appropriate supervisors and topologies for distributing work, nothing should need to change.

The point we want you to come away with is that working with remote actors should not generally be any different from working with local actors. You should be assuming your actors may fail, behave unexpectedly (did you really cover all the corner cases of that data parsing actor you wrote?), or whatever, no matter whether it is designed to run locally or remotely. Sure there will be differences, particularly if you are doing things that are implicitly designed around system-locality -- for example, accessing local files could fall into this bucket. But those are differences you are hopefully taking into account regardless of whether you're using actors or not. But with remoting and the ease of taking a local actor and transforming it into a remote one, Akka gives you one more reason to start designing with actors.

When things do go wrong while you're using remote actors, keep in mind that they are a network-based resource. So troubleshooting primarily falls into the same set of strategies you would use for troubleshooting any other network based resource. If your traffic appears to simply not be appearing on your remote system, start with the network layer and work up from there. Check any firewall rules; make sure any host names you use are actually resolving correctly and that TCP ports are not being used by other resources or services. And, of course, don't forget to check that things are actually plugged in!

There are also configuration options that Akka provides to help you debug the messages being sent between your remote systems. These go hand-in-hand with the same steps mentioned in chapter three. To be specific, you need to make sure your actors include logging functionality, by adding the akka.actor.ActorLogging trait to your actors and by using the akka.event.LoggingReceive wrapper around the receive block in your actor. The configuration directives are:

akka {
  remote {
    log-received-messages = on
    log-sent-messages = on
    log-remote-lifecycle-events=on
  }
}

You can also get a ton of debugging information by listening to Akka's EventBus, which is used internally by Akka to distribute internal messages. You can listen to either the remote server events, remote client events, or both, by listening for one of the event types RemoteServerLifeCycleEvent, RemoteClientLifeCycleEvent, or RemoteLifeCycleEvent. Each of these is actually part of a hierarchy of events you can listen for, but these top level events are often what you want when you're just debugging. To find information about the additional event types, see the documentation. Listening to this event bus is simply a matter of registering an actor and subscribing to the particular event class type you want to listen for (this is called a classifier in the documentation), as this example shows:

import akka.remote.RemotingLifecycleEvent
val eventStreamListener = system.actorOf(Props(new Actor {
 def receive = {
 case msg => println(msg)
 }
}))
system.eventStream.subscribe(eventStreamListener, classOf[RemotingLifecycleEvent])

Wrap-up

Truthfully, we've only scratched the surface, both with remoting and Akka in general. But now you've seen how to put these tools into use and you should be able to take them from these simple examples to start building real applications. It's important to remember to use the resources you have available to you, so be sure to continue experimenting, reading and exploring with Akka. The best resource at your disposal is real experience.