This chapter describes programming changes you can make to an application's source code to allow it to run in a cluster. You must have access to the application's source files to make the required changes.
This chapter discusses the following topics:
Modifications that are required for remote procedure call (RPC) programs (Section 7.1)
Portable applications -- developing applications that run in a cluster or on a standalone system (Section 7.2)
Support for the Cluster Logical Storage Manager (CLSM) (Section 7.3)
Diagnostic utility support (Section 7.4)
Compact Disc-Read Only Memory File System (CDFS) file system restrictions (Section 7.5)
Scripts called from the /cluster/admin/run directory (Section 7.6)
Cluster member status during a rolling upgrade (Section 7.7)
File access resilience in a cluster (Section 7.8)
7.1 Modifications Required for RPC Programs
Make the following modifications to existing, nonclusterized remote procedure call (RPC) programs to allow them to run in a clustered environment:
Conditionalize the code to replace calls to bind() with calls to clusvc_getcommport() or clusvc_getresvcommport() when the code is executed in a cluster environment. Use these functions only if you want to run an RPC application on multiple cluster members, making them accessible via a cluster alias. In addition to ensuring that each instance of an RPC application uses the same common port, these functions also inform the portmapper that the application is a multi-instance, alias application. See clusvc_getcommport(3) and clusvc_getresvcommport(3) for more information. A sketch of this conditionalization follows this list.
Services that do not call svc_register() must call clua_registerservice() to allow the service to accept incoming cluster alias connections. See clua_registerservice(3) for more information. (Note that clusvc_getcommport() and clusvc_getresvcommport() automatically call clua_registerservice().)
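For example, the following minimal sketch conditionalizes the port binding. The program and version numbers are hypothetical, and the clusvc_getcommport() prototype shown is an assumption; verify the actual header and signature against clusvc_getcommport(3).

/*
 * Sketch only: bind a service's port on a standalone system or in a
 * cluster. The clusvc_getcommport() prototype is assumed; verify it
 * against clusvc_getcommport(3).
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <sys/clu.h>                  /* clu_is_member(); link with -lclu */

#define MYPROG  0x20000099UL          /* hypothetical RPC program number */
#define MYVERS  1UL

extern int clusvc_getcommport(int, u_long, u_long, int);   /* assumed */

int
bind_service_port(int sock, struct sockaddr_in *addr)
{
    if (clu_is_member()) {
        /* In a cluster, use the common, alias-registered port. */
        return clusvc_getcommport(sock, MYPROG, MYVERS, IPPROTO_TCP);
    }

    /* On a standalone system, bind as usual. */
    return bind(sock, (struct sockaddr *)addr, sizeof(*addr));
}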
7.2 Portable Applications: Standalone and Cluster
Tru64 UNIX Version 5.0 or later provides the following built-in features that make it easier to develop applications that run either in a cluster or on a standalone system:
Stub libraries in Tru64 UNIX Version 5.0 or later let you build applications that call functions in the libclu.so API library that ships with TruCluster Server.
The clu_is_member() function, which is provided in libc, determines whether the local system is a cluster member. If the local system is a cluster member, the function returns TRUE; otherwise, it returns FALSE.
The clu_is_ready() function, which is also provided in libc, determines whether the local system has been configured to run in a cluster (that is, TruCluster Server software is installed and the system is running a clusterized kernel). If the local system is configured to run in a cluster, the function returns TRUE; otherwise, it returns FALSE. The clu_is_ready() function is most useful in code that runs in the boot path before the connection manager establishes cluster membership.
The clu_info() function and the clu_get_info() command return information about the configuration or a value indicating that the system is not configured to be in a cluster.
For more information, see clu_info(3), clu_is_member(3), and clu_get_info(8).
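For example, the following minimal sketch branches on cluster membership at run time (compile with -lclu, as in the example in Section 7.7):

#include <stdio.h>
#include <sys/clu.h>

int
main(void)
{
    struct clu_gen_info *clu_gen_ptr = NULL;

    if (!clu_is_member()) {
        printf("running on a standalone system\n");
        return 0;
    }

    /* In a cluster; report which member this is. */
    if (clu_get_info(&clu_gen_ptr) == 0)
        printf("running as cluster member %d\n",
            clu_gen_ptr->my_memberid);
    return 0;
}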
7.3 CLSM Support
The Cluster Logical Storage Manager (CLSM) does not provide interlocking support for normal I/O on mirrored volumes between different nodes. CLSM assumes any application that simultaneously opens the same volume from different nodes already performs the necessary locking to prevent two nodes from writing to the same block at the same time. In other words, if a cluster-aware application is not well-behaved and issues simultaneous writes to the same block from different nodes on a CLSM mirrored volume, data integrity will be compromised.
This is not an issue with Oracle Parallel Server (OPS) because OPS uses the distributed lock manager (DLM) to prevent this situation. Also, it is not an issue with the Cluster File System (CFS), because only one node can have a file system mounted at a time.
While steady-state I/O to mirrored volumes is not interlocked by CLSM
between different nodes, CLSM provides interlocking between nodes to
accomplish mirror recovery and plex attach operations.
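An application that writes to the same CLSM mirrored volume from multiple members must therefore supply its own interlocking. The following sketch shows one illustrative approach only, not a CLSM or DLM interface: a POSIX fcntl() byte-range lock on a lock file, assumed here to reside in a cluster-wide (CFS) file system so that all members contend for the same locks. Applications such as OPS use the DLM for this purpose.

/*
 * Illustrative sketch only: serialize writers to a block range by
 * taking an exclusive fcntl() lock on a corresponding range of a
 * shared lock file before writing to the CLSM volume.
 */
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>

int
lock_block_range(int lockfd, off_t start, off_t len)
{
    struct flock fl;

    fl.l_type = F_WRLCK;            /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = start;             /* mirrors the range being written */
    fl.l_len = len;
    return fcntl(lockfd, F_SETLKW, &fl);    /* block until granted */
}

int
unlock_block_range(int lockfd, off_t start, off_t len)
{
    struct flock fl;

    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(lockfd, F_SETLK, &fl);
}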
7.4 Diagnostic Utility Support
If you have, or want to write, a diagnostic utility for an application or subsystem, the TruCluster Server clu_check_config command calls diagnostic utilities, provides an execution environment, and maintains log files. See clu_check_config(8) for a description of how to add a diagnostic utility to the cluster environment and have it called by the clu_check_config command.
7.5 CDFS File System Restrictions
In TruCluster Server Version 5.1A, there are restrictions on managing Compact Disc-Read Only Memory File System (CDFS) file systems in a cluster. Some commands and library functions behave differently on a cluster than on a standalone system.
Table 7-1 lists the CDFS library functions and their expected behavior in a TruCluster Server environment.
Table 7-1: CDFS Library Functions

Library Function                 | Expected Result on Server | Expected Result on Client
---------------------------------|---------------------------|--------------------------
cd_drec                          | Success                   | Not supported
cd_ptrec                         | Success                   | Not supported
cd_pvd                           | Success                   | Not supported
cd_suf                           | Success                   | Not supported
cd_type                          | Success                   | Not supported
cd_xar                           | Success                   | Not supported
cd_nmconv CD_GETNMCONV           | Success                   | Success
cd_nmconv CD_SETNMCONV           | Success                   | Success
cd_getdevmap                     | No map                    | Not supported
cd_setdevmap                     | Not supported             | Not supported
cd_idmap CD_GETUMAP, CD_GETGMAP  | Success                   | Not supported
cd_idmap CD_SETUMAP, CD_SETGMAP  | Success                   | Success
cd_defs CD_GETDEFS               | Success                   | Success
cd_defs CD_SETDEFS               | Success                   | Success
For information about managing CDFS file systems in a cluster, see the TruCluster Server Cluster Administration manual.
7.6 Scripts Called from the /cluster/admin/run Directory
An application that needs to have specific actions taken on its behalf when a cluster is created, or when members are added or deleted, can place a script in the /cluster/admin/run directory. These scripts are called during the first boot of the initial cluster member following the running of clu_create, on each cluster member (including the newest one) following clu_add_member, and on all remaining cluster members following clu_delete_member.
The scripts in /cluster/admin/run must use the following entry points:

-c
For actions to take when clu_create runs.

-a
For actions to take when clu_add_member runs.

-d memberid
For actions to take when clu_delete_member runs.
Place only files, or symbolic links to files, that are executable by root in the /cluster/admin/run directory. We recommend that you adhere to the following file-naming convention:

Begin the executable file name with the uppercase letter C.

Make the next two characters a sequential number, as used in /sbin/rc3.d for your area.

For the remaining characters, use a name that is associated with your script.

The following file name is an example of this naming convention:

/cluster/admin/run/C40sendmail
The clu_create, clu_add_member, and clu_delete_member commands create the required it(8) files and links to ensure that the scripts are run at the correct time.
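The following minimal sketch shows a run-directory executable that handles these entry points. Such files are typically shell scripts, but any file that is executable by root works; the actions shown are hypothetical placeholders.

/*
 * Sketch of a /cluster/admin/run executable (for example,
 * /cluster/admin/run/C40myapp) handling the -c, -a, and -d
 * entry points described above.
 */
#include <stdio.h>
#include <string.h>

int
main(int argc, char *argv[])
{
    if (argc >= 2 && strcmp(argv[1], "-c") == 0) {
        /* clu_create: one-time setup on the initial member. */
        printf("configuring application for the new cluster\n");
    } else if (argc >= 2 && strcmp(argv[1], "-a") == 0) {
        /* clu_add_member: per-member setup. */
        printf("configuring application on this member\n");
    } else if (argc >= 3 && strcmp(argv[1], "-d") == 0) {
        /* clu_delete_member: clean up after the departed member. */
        printf("removing state for deleted member %s\n", argv[2]);
    } else {
        fprintf(stderr, "usage: %s -c | -a | -d memberid\n", argv[0]);
        return 1;
    }
    return 0;
}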
7.7 Testing the Status of a Cluster Member During a Rolling Upgrade
The following example program shows one way to determine whether the cluster is in the middle of a rolling upgrade, and whether this cluster member has rolled:
#include <stdio.h>
#include <stdlib.h>
#include <sys/clu.h>    /* compile with -lclu */

#define DEBUG 1

int
main(void)
{
    struct clu_gen_info *clu_gen_ptr = NULL;
    char cmd[256];

    if (clu_is_member()) {
        if (clu_get_info(&clu_gen_ptr) == 0) {
            if (system("/usr/sbin/clu_upgrade -q status") == 0) {
                sprintf(cmd, "/usr/sbin/clu_upgrade -q check roll %d",
                    clu_gen_ptr->my_memberid);
                if (system(cmd) == 0) {
                    if (DEBUG)
                        printf("member has rolled\n");
                } else if (DEBUG)
                    printf("member has not rolled\n");
            } else if (DEBUG)
                printf("no rolling upgrade in progress\n");
        } else if (DEBUG)
            printf("nonzero return from clu_get_info()\n");
    } else if (DEBUG)
        printf("not a member of a cluster\n");
    return 0;
}
7.8 File Access Resilience in a Cluster
While a cluster application is transferring files, read and write operations can fail if the member running the application shuts down or fails. Typically, a client of the application sees the connection to the server as lost. Be aware of how your application handles lost connections. Client applications commonly handle lost connections in the following ways:
The client application simply fails (for example, an error is written and the application exits).
The client application sees a problem with the connection and automatically retries its read or write operation.
The client application sees a problem with the connection and displays a window that allows the user to abort, retry, or cancel the operation.
If your client application fails when it loses its connection to the server application (regardless of whether it is running on a single system or a cluster), consider implementing the following:
When updating files, first write the update to a new temporary file. If the write operation is successful, copy the temporary file over the original file. If the write operation encounters a problem, you have not destroyed the original file; you need only clean up your temporary files and start again. (A sketch of this pattern follows this list.)
When reading files, make sure that your application is set up to deal with read operation errors and recover from them (for example, retry the operation).
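The following sketch shows the temporary-file update pattern described in the first item. The file names are hypothetical; the point is that rename() replaces the original only after the new copy has been written and flushed, so a failure at any step leaves the original intact.

/*
 * Sketch: write an update to a temporary file, then atomically
 * replace the original with rename(). On any failure, remove the
 * temporary file; the original is untouched.
 */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int
update_file(const char *path, const char *data, size_t len)
{
    char tmppath[1024];
    int fd;

    snprintf(tmppath, sizeof(tmppath), "%s.tmp.%d", path, (int)getpid());
    fd = open(tmppath, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;

    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmppath);                /* original file is untouched */
        return -1;
    }
    close(fd);

    if (rename(tmppath, path) != 0) {   /* atomic replacement */
        unlink(tmppath);
        return -1;
    }
    return 0;
}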