Enough jibber-jabber: writing an OpenSRF service

Imagine an application architecture in which 10 lines of Perl or Python, using the data types native to each language, are enough to implement a method that can then be deployed and invoked seamlessly across hundreds of servers. You have just imagined developing with OpenSRF – it is truly that simple. Under the covers, of course, the OpenSRF language bindings do an incredible amount of work on behalf of the developer. An OpenSRF application consists of one or more OpenSRF services that expose methods: for example, the opensrf.simple-text demonstration service exposes the opensrf.simple-text.split() and opensrf.simple-text.reverse() methods. Each method accepts zero or more arguments and returns zero or one results. The data types supported by OpenSRF arguments and results are typical core language data types: strings, numbers, booleans, arrays, and hashes.

To implement a new OpenSRF service, perform the following steps:

  1. Include the base OpenSRF support libraries
  2. Write the code for each of your OpenSRF methods as separate procedures
  3. Register each method
  4. Add the service definition to the OpenSRF configuration files

For example, the following code implements an OpenSRF service. The service includes one method named opensrf.simple-text.reverse() that accepts one string as input and returns the reversed version of that string:

#!/usr/bin/perl

package OpenSRF::Application::Demo::SimpleText;

use strict;

use OpenSRF::Application;
use parent qw/OpenSRF::Application/;

sub text_reverse {
    my ($self , $conn, $text) = @_;
    my $reversed_text = scalar reverse($text);
    return $reversed_text;
}

__PACKAGE__->register_method(
    method    => 'text_reverse',
    api_name  => 'opensrf.simple-text.reverse'
);

Ten lines of code, and we have a complete OpenSRF service that exposes a single method and could be deployed quickly on a cluster of servers to meet your application’s ravenous demand for reversed strings! If you’re unfamiliar with Perl, the use OpenSRF::Application; use parent qw/OpenSRF::Application/; lines tell this package to inherit methods and properties from the OpenSRF::Application module. For example, the call to __PACKAGE__->register_method() is defined in OpenSRF::Application but due to inheritance is available in this package (named by the special Perl symbol __PACKAGE__ that contains the current package name). The register_method() procedure is how we introduce a method to the rest of the OpenSRF world.

Registering a service with the OpenSRF configuration files

Two files control most of the configuration for OpenSRF:

  • opensrf.xml contains the configuration for the service itself as well as a list of which application servers in your OpenSRF cluster should start the service
  • opensrf_core.xml (often referred to as the "bootstrap configuration" file) contains the OpenSRF networking information, including the XMPP server connection credentials for the public and private routers; you only need to touch this for a new service if the new service needs to be accessible via the public router

Begin by defining the service itself in opensrf.xml. To register the opensrf.simple-text service, add the following section to the <apps> element (corresponding to the XPath /opensrf/default/apps/):

<apps>
  <opensrf.simple-text>                                                      <!-- 1 -->
    <keepalive>3</keepalive>                                                 <!-- 2 -->
    <stateless>1</stateless>                                                 <!-- 3 -->
    <language>perl</language>                                                <!-- 4 -->
    <implementation>OpenSRF::Application::Demo::SimpleText</implementation>  <!-- 5 -->
    <max_requests>100</max_requests>                                         <!-- 6 -->
    <unix_config>
      <max_requests>1000</max_requests>                                      <!-- 7 -->
      <unix_log>opensrf.simple-text_unix.log</unix_log>                      <!-- 8 -->
      <unix_sock>opensrf.simple-text_unix.sock</unix_sock>                   <!-- 9 -->
      <unix_pid>opensrf.simple-text_unix.pid</unix_pid>                     <!-- 10 -->
      <min_children>5</min_children>                                        <!-- 11 -->
      <max_children>15</max_children>                                       <!-- 12 -->
      <min_spare_children>2</min_spare_children>                            <!-- 13 -->
      <max_spare_children>5</max_spare_children>                            <!-- 14 -->
    </unix_config>
  </opensrf.simple-text>

  <!-- other OpenSRF services registered here... -->
</apps>

1

The element name is the name that the OpenSRF control scripts use to refer to the service.

2

Specifies the interval (in seconds) between checks to determine if the service is still running.

3

Specifies whether OpenSRF clients can call methods from this service without first having to create a connection to a specific service backend process for that service. If the value is 1, then the client can simply issue a request and the router will forward the request to an available service and the result will be returned directly to the client.

4

Specifies the programming language in which the service is implemented

5

Specifies the name of the library or module in which the service is implemented

6

(C implementations): Specifies the maximum number of requests a process serves before it is killed and replaced by a new process.

7

(Perl implementations): Specifies the maximum number of requests a process serves before it is killed and replaced by a new process.

8

The name of the log file for language-specific log messages such as syntax warnings.

9

The name of the UNIX socket used for inter-process communications.

10

The name of the PID file for the master process for the service.

11

The minimum number of child processes that should be running at any given time.

12

The maximum number of child processes that should be running at any given time.

13

The minimum number of child processes that should be available to handle incoming requests. If there are fewer than this number of spare child processes, new processes will be spawned.

14

The maximum number of child processes that should be available to handle incoming requests. If there are more than this number of spare child processes, the extra processes will be killed.

To make the service accessible via the public router, you must also edit the opensrf_core.xml configuration file to add the service to the list of publicly accessible services:

Making a service publicly accessible in opensrf_core.xml

<router>                                                                     <!-- 1 -->
    <!-- This is the public router. On this router, we only register applications
     which should be accessible to everyone on the opensrf network -->
    <name>router</name>
    <domain>public.localhost</domain>                                        <!-- 2 -->
    <services>
        <service>opensrf.math</service>
        <service>opensrf.simple-text</service>                               <!-- 3 -->
    </services>
</router>

1

This section of the opensrf_core.xml file is located at XPath /config/opensrf/routers/.

2

public.localhost is the canonical public router domain in the OpenSRF installation instructions.

3

Each <service> element contained in the <services> element offers their services via the public router as well as the private router.

Once you have defined the new service, you must restart the OpenSRF Router to retrieve the new configuration and start or restart the service itself.

Calling an OpenSRF method

OpenSRF clients in any supported language can invoke OpenSRF services in any supported language. So let’s see a few examples of how we can call our fancy new opensrf.simple-text.reverse() method:

Calling OpenSRF methods from the srfsh client

srfsh is a command-line tool installed with OpenSRF that you can use to call OpenSRF methods. To call an OpenSRF method, issue the request command and pass the OpenSRF service and method name as the first two arguments; then pass a list of JSON objects as the arguments to the method being invoked.

The following example calls the opensrf.simple-text.reverse method of the opensrf.simple-text OpenSRF service, passing the string "foobar" as the only method argument:

$ srfsh
srfsh # request opensrf.simple-text opensrf.simple-text.reverse "foobar"

Received Data: "raboof"

=------------------------------------
Request Completed Successfully
Request Time in seconds: 0.016718
=------------------------------------

Getting documentation for OpenSRF methods from the srfsh client

The srfsh client also gives you command-line access to retrieving metadata about OpenSRF services and methods. For a given OpenSRF method, for example, you can retrieve information such as the minimum number of required arguments, the data type and a description of each argument, the package or library in which the method is implemented, and a description of the method. To retrieve the documentation for an opensrf method from srfsh, issue the introspect command, followed by the name of the OpenSRF service and (optionally) the name of the OpenSRF method. If you do not pass a method name to the introspect command, srfsh lists all of the methods offered by the service. If you pass a partial method name, srfsh lists all of the methods that match that portion of the method name.

Note

The quality and availability of the descriptive information for each method depends on the developer to register the method with complete and accurate information. The quality varies across the set of OpenSRF and Evergreen APIs, although some effort is being put towards improving the state of the internal documentation.

srfsh# introspect opensrf.simple-text "opensrf.simple-text.reverse"
--> opensrf.simple-text

Received Data: {
  "__c":"opensrf.simple-text",
  "__p":{
    "api_level":1,
    "stream":0,                                                      \ # 1
    "object_hint":"OpenSRF_Application_Demo_SimpleText",
    "remote":0,
    "package":"OpenSRF::Application::Demo::SimpleText",              \ # 2
    "api_name":"opensrf.simple-text.reverse",                        \ # 3
    "server_class":"opensrf.simple-text",
    "signature":{                                                    \ # 4
      "params":[                                                     \ # 5
        {
          "desc":"The string to reverse",
          "name":"text",
          "type":"string"
        }
      ],
      "desc":"Returns the input string in reverse order\n",          \ # 6
      "return":{                                                     \ # 7
        "desc":"Returns the input string in reverse order",
        "type":"string"
      }
    },
    "method":"text_reverse",                                         \ # 8
    "argc":1                                                         \ # 9
  }
}

1

stream denotes whether the method supports streaming responses or not.

2

package identifies which package or library implements the method.

3

api_name identifies the name of the OpenSRF method.

4

signature is a hash that describes the parameters for the method.

5

params is an array of hashes describing each parameter in the method; each parameter has a description (desc), name (name), and type (type).

6

desc is a string that describes the method itself.

7

return is a hash that describes the return value for the method; it contains a description of the return value (desc) and the type of the returned value (type).

8

method identifies the name of the function or method in the source implementation.

9

argc is an integer describing the minimum number of arguments that must be passed to this method.

Calling OpenSRF methods from Perl applications

To call an OpenSRF method from Perl, you must connect to the OpenSRF service, issue the request to the method, and then retrieve the results.

#/usr/bin/perl
use strict;
use OpenSRF::AppSession;
use OpenSRF::System;

OpenSRF::System->bootstrap_client(config_file => '/openils/conf/opensrf_core.xml'); # 1

my $session = OpenSRF::AppSession->create("opensrf.simple-text");                   # 2

print "substring: Accepts a string and a number as input, returns a string\n";
my $result = $session->request("opensrf.simple-text.substring", "foobar", 3);       # 3
my $request = $result->gather();                                                    # 4
print "Substring: $request\n\n";

print "split: Accepts two strings as input, returns an array of strings\n";
$request = $session->request("opensrf.simple-text.split", "This is a test", " ");   # 5
my $output = "Split: [";
my $element;
while ($element = $request->recv()) {                                               # 6
    $output .= $element->content . ", ";                                            # 7
}
$output =~ s/, $/]/;
print $output . "\n\n";

print "statistics: Accepts an array of strings as input, returns a hash\n";
my @many_strings = [
    "First I think I'll have breakfast",
    "Then I think that lunch would be nice",
    "And then seventy desserts to finish off the day"
];

$result = $session->request("opensrf.simple-text.statistics", @many_strings);       # 8
$request = $result->gather();                                                       # 9
print "Length: " . $result->{'length'} . "\n";
print "Word count: " . $result->{'word_count'} . "\n";

$session->disconnect();                                                             # 10

1

The OpenSRF::System->bootstrap_client() method reads the OpenSRF configuration information from the indicated file and creates an XMPP client connection based on that information.

2

The OpenSRF::AppSession->create() method accepts one argument - the name of the OpenSRF service to which you want to want to make one or more requests - and returns an object prepared to use the client connection to make those requests.

3

The OpenSRF::AppSession->request() method accepts a minimum of one argument - the name of the OpenSRF method to which you want to make a request - followed by zero or more arguments to pass to the OpenSRF method as input values. This example passes a string and an integer to the opensrf.simple-text.substring method defined by the opensrf.simple-text OpenSRF service.

4

The gather() method, called on the result object returned by the request() method, iterates over all of the possible results from the result object and returns a single variable.

5

This request() call passes two strings to the opensrf.simple-text.split method defined by the opensrf.simple-text OpenSRF service and returns (via gather()) a reference to an array of results.

6

The opensrf.simple-text.split() method is a streaming method that returns an array of results with one element per recv() call on the result object. We could use the gather() method to retrieve all of the results in a single array reference, but instead we simply iterate over the result variable until there are no more results to retrieve.

7

While the gather() convenience method returns only the content of the complete set of results for a given request, the recv() method returns an OpenSRF result object with status, statusCode, and content fields as we saw in the HTTP results example.

8

This request() call passes an array to the opensrf.simple-text.statistics method defined by the opensrf.simple-text OpenSRF service.

9

The result object returns a hash reference via gather(). The hash contains the length and word_count keys we defined in the method.

10

The OpenSRF::AppSession->disconnect() method closes the XMPP client connection and cleans up resources associated with the session.

Accepting and returning more interesting data types

Of course, the example of accepting a single string and returning a single string is not very interesting. In real life, our applications tend to pass around multiple arguments, including arrays and hashes. Fortunately, OpenSRF makes that easy to deal with; in Perl, for example, returning a reference to the data type does the right thing. In the following example of a method that returns a list, we accept two arguments of type string: the string to be split, and the delimiter that should be used to split the string.

Text splitting method - streaming mode. 

sub text_split {
    my $self = shift;
    my $conn = shift;
    my $text = shift;
    my $delimiter = shift || ' ';

    my @split_text = split $delimiter, $text;
    return \@split_text;
}

__PACKAGE__->register_method(
    method    => 'text_split',
    api_name  => 'opensrf.simple-text.split'
);

We simply return a reference to the list, and OpenSRF does the rest of the work for us to convert the data into the language-independent format that is then returned to the caller. As a caller of a given method, you must rely on the documentation used to register to determine the data structures - if the developer has added the appropriate documentation.

Accepting and returning Evergreen objects

OpenSRF is agnostic about objects; its role is to pass JSON back and forth between OpenSRF clients and services, and it allows the specific clients and services to define their own semantics for the JSON structures. On top of that infrastructure, Evergreen offers the fieldmapper: an object-relational mapper that provides a complete definition of all objects, their properties, their relationships to other objects, the permissions required to create, read, update, or delete objects of that type, and the database table or view on which they are based.

The Evergreen fieldmapper offers a great deal of convenience for working with complex system objects beyond the basic mapping of classes to database schemas. Although the result is passed over the wire as a JSON object containing the indicated fields, fieldmapper-aware clients then turn those JSON objects into native objects with setter / getter methods for each field.

All of this metadata about Evergreen objects is defined in the fieldmapper configuration file (/openils/conf/fm_IDL.xml), and access to these classes is provided by the open-ils.cstore, open-ils.pcrud, and open-ils.reporter-store OpenSRF services which parse the fieldmapper configuration file and dynamically register OpenSRF methods for creating, reading, updating, and deleting all of the defined classes.

Example fieldmapper class definition for "Open User Summary". 

<class id="mous" controller="open-ils.cstore open-ils.pcrud"
 oils_obj:fieldmapper="money::open_user_summary"
 oils_persist:tablename="money.open_usr_summary"
 reporter:label="Open User Summary">                                       <!-- 1 -->
    <fields oils_persist:primary="usr" oils_persist:sequence="">           <!-- 2 -->
        <field name="balance_owed" reporter:datatype="money" />            <!-- 3 -->
        <field name="total_owed" reporter:datatype="money" />
        <field name="total_paid" reporter:datatype="money" />
        <field name="usr" reporter:datatype="link"/>
    </fields>
    <links>
        <link field="usr" reltype="has_a" key="id" map="" class="au"/>     <!-- 4 -->
    </links>
    <permacrud xmlns="http://open-ils.org/spec/opensrf/IDL/permacrud/v1">  <!-- 5 -->
        <actions>
            <retrieve permission="VIEW_USER">                              <!-- 6 -->
                <context link="usr" field="home_ou"/>                      <!-- 7 -->
            </retrieve>
        </actions>
    </permacrud>
</class>

1

The <class> element defines the class:

  • The id attribute defines the class hint that identifies the class both elsewhere in the fieldmapper configuration file, such as in the value of the field attribute of the <link> element, and in the JSON object itself when it is instantiated. For example, an "Open User Summary" JSON object would have the top level property of "__c":"mous".
  • The controller attribute identifies the services that have direct access to this class. If open-ils.pcrud is not listed, for example, then there is no means to directly access members of this class through a public service.
  • The oils_obj:fieldmapper attribute defines the name of the Perl fieldmapper class that will be dynamically generated to provide setter and getter methods for instances of the class.
  • The oils_persist:tablename attribute identifies the schema name and table name of the database table that stores the data that represents the instances of this class. In this case, the schema is money and the table is open_usr_summary.
  • The reporter:label attribute defines a human-readable name for the class used in the reporting interface to identify the class. These names are defined in English in the fieldmapper configuration file; however, they are extracted so that they can be translated and served in the user’s language of choice.

2

The <fields> element lists all of the fields that belong to the object.

  • The oils_persist:primary attribute identifies the field that acts as the primary key for the object; in this case, the field with the name usr.
  • The oils_persist:sequence attribute identifies the sequence object (if any) in this database provides values for new instances of this class. In this case, the primary key is defined by a field that is linked to a different table, so no sequence is used to populate these instances.

3

Each <field> element defines a single field with the following attributes:

  • The name attribute identifies the column name of the field in the underlying database table as well as providing a name for the setter / getter method that can be invoked in the JSON or native version of the object.
  • The reporter:datatype attribute defines how the reporter should treat the contents of the field for the purposes of querying and display.
  • The reporter:label attribute can be used to provide a human-readable name for each field; without it, the reporter falls back to the value of the name attribute.

4

The <links> element contains a set of zero or more <link> elements, each of which defines a relationship between the class being described and another class.

  • The field attribute identifies the field named in this class that links to the external class.
  • The reltype attribute identifies the kind of relationship between the classes; in the case of has_a, each value in the usr field is guaranteed to have a corresponding value in the external class.
  • The key attribute identifies the name of the field in the external class to which this field links.
  • The rarely-used map attribute identifies a second class to which the external class links; it enables this field to define a direct relationship to an external class with one degree of separation, to avoid having to retrieve all of the linked members of an intermediate class just to retrieve the instances from the actual desired target class.
  • The class attribute identifies the external class to which this field links.

5

The <permacrud> element defines the permissions that must have been granted to a user to operate on instances of this class.

6

The <retrieve> element is one of four possible children of the <actions> element that define the permissions required for each action: create, retrieve, update, and delete.

  • The permission attribute identifies the name of the permission that must have been granted to the user to perform the action.
  • The contextfield attribute, if it exists, defines the field in this class that identifies the library within the system for which the user must have privileges to work. If a user has been granted a given permission, but has not been granted privileges to work at a given library, they can not perform the action at that library.

7

The rarely-used <context> element identifies a linked field (link attribute) in this class which links to an external class that holds the field (field attribute) that identifies the library within the system for which the user must have privileges to work.

When you retrieve an instance of a class, you can ask for the result to flesh some or all of the linked fields of that class, so that the linked instances are returned embedded directly in your requested instance. In that same request you can ask for the fleshed instances to in turn have their linked fields fleshed. By bundling all of this into a single request and result sequence, you can avoid the network overhead of requiring the client to request the base object, then request each linked object in turn.

You can also iterate over a collection of instances and set the automatically generated isdeleted, isupdated, or isnew properties to indicate that the given instance has been deleted, updated, or created respectively. Evergreen can then act in batch mode over the collection to perform the requested actions on any of the instances that have been flagged for action.

Returning streaming results

In the previous implementation of the opensrf.simple-text.split method, we returned a reference to the complete array of results. For small values being delivered over the network, this is perfectly acceptable, but for large sets of values this can pose a number of problems for the requesting client. Consider a service that returns a set of bibliographic records in response to a query like "all records edited in the past month"; if the underlying database is relatively active, that could result in thousands of records being returned as a single network request. The client would be forced to block until all of the results are returned, likely resulting in a significant delay, and depending on the implementation, correspondingly large amounts of memory might be consumed as all of the results are read from the network in a single block.

OpenSRF offers a solution to this problem. If the method returns results that can be divided into separate meaningful units, you can register the OpenSRF method as a streaming method and enable the client to loop over the results one unit at a time until the method returns no further results. In addition to registering the method with the provided name, OpenSRF also registers an additional method with .atomic appended to the method name. The .atomic variant gathers all of the results into a single block to return to the client, giving the caller the ability to choose either streaming or atomic results from a single method definition.

In the following example, the text splitting method has been reimplemented to support streaming; very few changes are required:

Text splitting method - streaming mode. 

sub text_split {
    my $self = shift;
    my $conn = shift;
    my $text = shift;
    my $delimiter = shift || ' ';

    my @split_text = split $delimiter, $text;
    foreach my $string (@split_text) {           # 1
        $conn->respond($string);
    }
    return undef;
}

__PACKAGE__->register_method(
    method    => 'text_split',
    api_name  => 'opensrf.simple-text.split',
    stream    => 1                              # 2
);

1

Rather than returning a reference to the array, a streaming method loops over the contents of the array and invokes the respond() method of the connection object on each element of the array.

2

Registering the method as a streaming method instructs OpenSRF to also register an atomic variant (opensrf.simple-text.split.atomic).

Error! Warning! Info! Debug!

As hard as it may be to believe, it is true: applications sometimes do not behave in the expected manner, particularly when they are still under development. The server language bindings for OpenSRF include integrated support for logging messages at the levels of ERROR, WARNING, INFO, DEBUG, and the extremely verbose INTERNAL to either a local file or to a syslogger service. The destination of the log files, and the level of verbosity to be logged, is set in the opensrf_core.xml configuration file. To add logging to our Perl example, we just have to add the OpenSRF::Utils::Logger package to our list of used Perl modules, then invoke the logger at the desired logging level.

You can include many calls to the OpenSRF logger; only those that are higher than your configured logging level will actually hit the log. The following example exercises all of the available logging levels in OpenSRF:

use OpenSRF::Utils::Logger;
my $logger = OpenSRF::Utils::Logger;
# some code in some function
{
    $logger->error("Hmm, something bad DEFINITELY happened!");
    $logger->warn("Hmm, something bad might have happened.");
    $logger->info("Something happened.");
    $logger->debug("Something happened; here are some more details.");
    $logger->internal("Something happened; here are all the gory details.")
}

If you call the mythical OpenSRF method containing the preceding OpenSRF logger statements on a system running at the default logging level of INFO, you will only see the INFO, WARN, and ERR messages, as follows:

Results of logging calls at the default level of INFO. 

[2010-03-17 22:27:30] opensrf.simple-text [ERR :5681:SimpleText.pm:277:] Hmm, something bad DEFINITELY happened!
[2010-03-17 22:27:30] opensrf.simple-text [WARN:5681:SimpleText.pm:278:] Hmm, something bad might have happened.
[2010-03-17 22:27:30] opensrf.simple-text [INFO:5681:SimpleText.pm:279:] Something happened.

If you then increase the the logging level to INTERNAL (5), the logs will contain much more information, as follows:

Results of logging calls at the default level of INTERNAL. 

[2010-03-17 22:48:11] opensrf.simple-text [ERR :5934:SimpleText.pm:277:] Hmm, something bad DEFINITELY happened!
[2010-03-17 22:48:11] opensrf.simple-text [WARN:5934:SimpleText.pm:278:] Hmm, something bad might have happened.
[2010-03-17 22:48:11] opensrf.simple-text [INFO:5934:SimpleText.pm:279:] Something happened.
[2010-03-17 22:48:11] opensrf.simple-text [DEBG:5934:SimpleText.pm:280:] Something happened; here are some more details.
[2010-03-17 22:48:11] opensrf.simple-text [INTL:5934:SimpleText.pm:281:] Something happened; here are all the gory details.
[2010-03-17 22:48:11] opensrf.simple-text [ERR :5934:SimpleText.pm:283:] Resolver did not find a cache hit
[2010-03-17 22:48:21] opensrf.simple-text [INTL:5934:Cache.pm:125:] Stored opensrf.simple-text.test_cache.masaa => "here" in memcached server
[2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:Application.pm:579:] Coderef for [OpenSRF::Application::Demo::SimpleText::test_cache] has been run
[2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:Application.pm:586:] A top level Request object is responding de nada
[2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:Application.pm:190:] Method duration for [opensrf.simple-text.test_cache]:  10.005
[2010-03-17 22:48:21] opensrf.simple-text [INTL:5934:AppSession.pm:780:] Calling queue_wait(0)
[2010-03-17 22:48:21] opensrf.simple-text [INTL:5934:AppSession.pm:769:] Resending...0
[2010-03-17 22:48:21] opensrf.simple-text [INTL:5934:AppSession.pm:450:] In send
[2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:AppSession.pm:506:] AppSession sending RESULT to opensrf@private.localhost/_dan-karmic-liblap_1268880489.752154_5943 with threadTrace [1]
[2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:AppSession.pm:506:] AppSession sending STATUS to opensrf@private.localhost/_dan-karmic-liblap_1268880489.752154_5943 with threadTrace [1]
...

To see everything that is happening in OpenSRF, try leaving your logging level set to INTERNAL for a few minutes - just ensure that you have a lot of free disk space available if you have a moderately busy system!

Caching results: one secret of scalability

If you have ever used an application that depends on a remote Web service outside of your control-say, if you need to retrieve results from a microblogging service-you know the pain of latency and dependability (or the lack thereof). To improve response time in OpenSRF applications, you can take advantage of the support offered by the OpenSRF::Utils::Cache module for communicating with a local instance or cluster of memcache daemons to store and retrieve persistent values.

use OpenSRF::Utils::Cache;                                       # 1
sub test_cache {
    my $self = shift;
    my $conn = shift;
    my $test_key = shift;
    my $cache = OpenSRF::Utils::Cache->new('global');            # 2
    my $cache_key = "opensrf.simple-text.test_cache.$test_key";  # 3
    my $result = $cache->get_cache($cache_key) || undef;         # 4
    if ($result) {
        $logger->info("Resolver found a cache hit");
        return $result;
    }
    sleep 10;                                                    # 5
    my $cache_timeout = 300;                                     # 6
    $cache->put_cache($cache_key, "here", $cache_timeout);       # 7
    return "There was no cache hit.";
}

This example:

1

Imports the OpenSRF::Utils::Cache module

2

Creates a cache object

3

Creates a unique cache key based on the OpenSRF method name and request input value

4

Checks to see if the cache key already exists; if so, it immediately returns that value

5

If the cache key does not exist, the code sleeps for 10 seconds to simulate a call to a slow remote Web service, or an intensive process

6

Sets a value for the lifetime of the cache key in seconds

7

When the code has retrieved its value, then it can create the cache entry, with the cache key, value to be stored ("here"), and the timeout value in seconds to ensure that we do not return stale data on subsequent calls

Initializing the service and its children: child labour

When an OpenSRF service is started, it looks for a procedure called initialize() to set up any global variables shared by all of the children of the service. The initialize() procedure is typically used to retrieve configuration settings from the opensrf.xml file.

An OpenSRF service spawns one or more children to actually do the work requested by callers of the service. For every child process an OpenSRF service spawns, the child process clones the parent environment and then each child process runs the child_init() process (if any) defined in the OpenSRF service to initialize any child-specific settings.

When the OpenSRF service kills a child process, it invokes the child_exit() procedure (if any) to clean up any resources associated with the child process. Similarly, when the OpenSRF service is stopped, it calls the DESTROY() procedure to clean up any remaining resources.

Retrieving configuration settings

The settings for OpenSRF services are maintained in the opensrf.xml XML configuration file. The structure of the XML document consists of a root element <opensrf> containing two child elements:

  • <default> contains an <apps> element describing all OpenSRF services running on this system — see the section called “Registering a service with the OpenSRF configuration files” --, as well as any other arbitrary XML descriptions required for global configuration purposes. For example, Evergreen uses this section for email notification and inter-library patron privacy settings.
  • <hosts> contains one element per host that participates in this OpenSRF system. Each host element must include an <activeapps> element that lists all of the services to start on this host when the system starts up. Each host element can optionally override any of the default settings.

OpenSRF includes a service named opensrf.settings to provide distributed cached access to the configuration settings with a simple API:

  • opensrf.settings.default_config.get: accepts zero arguments and returns the complete set of default settings as a JSON document
  • opensrf.settings.host_config.get: accepts one argument (hostname) and returns the complete set of settings, as customized for that hostname, as a JSON document
  • opensrf.settings.xpath.get: accepts one argument (an XPath expression) and returns the portion of the configuration file that matches the expression as a JSON document

For example, to determine whether an Evergreen system uses the opt-in support for sharing patron information between libraries, you could either invoke the opensrf.settings.default_config.get method and parse the JSON document to determine the value, or invoke the opensrf.settings.xpath.get method with the XPath /opensrf/default/share/user/opt_in argument to retrieve the value directly.

In practice, OpenSRF includes convenience libraries in all of its client language bindings to simplify access to configuration values. C offers osrfConfig.c, Perl offers OpenSRF::Utils::SettingsClient, Java offers org.opensrf.util.SettingsClient, and Python offers osrf.set. These libraries locally cache the configuration file to avoid network roundtrips for every request and enable the developer to request specific values without having to manually construct XPath expressions.