This chapter describes programming changes you can make to an application's source code to allow it to run in a cluster. You must have access to the application's source files to make the required changes.
This chapter discusses the following topics:
Modifications that are required for remote procedure call (RPC) programs (Section 7.1)
Portable applications -- developing applications that run in a cluster or on a standalone system (Section 7.2)
Support for the Cluster Logical Storage Manager (CLSM) (Section 7.3)
Diagnostic utility support (Section 7.4)
Compact Disc-Read Only Memory File System (CDFS) file system restrictions (Section 7.5)
Scripts called from the /cluster/admin/run directory (Section 7.6)
Cluster member status during a rolling upgrade (Section 7.7)
File access resilience in a cluster (Section 7.8)
7.1 Modifications Required for RPC Programs
Make the following modifications to existing, nonclusterized remote procedure call (RPC) programs to allow them to run in a clustered environment:
Conditionalize the code to replace calls to bind() with calls to clusvc_getcommport() or clusvc_getresvcommport() when the code is executed in a cluster environment. Use these functions only if you want to run an RPC application on multiple cluster members, making them accessible via a cluster alias. In addition to ensuring that each instance of an RPC application uses the same common port, these functions also inform the portmapper that the application is a multi-instance, alias application. See clusvc_getcommport(3) and clusvc_getresvcommport(3) for more information. A sketch of this conditionalization follows this list.
Services that do not call svc_register() must call clua_registerservice() to allow the service to accept incoming cluster alias connections. See clua_registerservice(3) for more information. (Note that clusvc_getcommport() and clusvc_getresvcommport() automatically call clua_registerservice().)
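For example, the following minimal sketch conditionalizes the port binding. The program and version numbers are hypothetical, and the clusvc_getcommport() prototype shown is an assumption; verify the actual header and signature against clusvc_getcommport(3).

/*
 * Sketch only: bind a service's port on a standalone system or in a
 * cluster. The clusvc_getcommport() prototype is assumed; verify it
 * against clusvc_getcommport(3).
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <sys/clu.h>                  /* clu_is_member(); link with -lclu */

#define MYPROG  0x20000099UL          /* hypothetical RPC program number */
#define MYVERS  1UL

extern int clusvc_getcommport(int, u_long, u_long, int);   /* assumed */

int
bind_service_port(int sock, struct sockaddr_in *addr)
{
    if (clu_is_member()) {
        /* In a cluster, use the common, alias-registered port. */
        return clusvc_getcommport(sock, MYPROG, MYVERS, IPPROTO_TCP);
    }

    /* On a standalone system, bind as usual. */
    return bind(sock, (struct sockaddr *)addr, sizeof(*addr));
}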
7.2 Portable Applications: Standalone and Cluster
Tru64 UNIX Version 5.0 or later provides the following built-in features that make it easier to develop applications that run either in a cluster or on a standalone system:
Stub libraries in Tru64 UNIX Version 5.0 or later let you build applications that call functions in the libclu.so API library that ships with TruCluster Server.
The clu_is_member() function, which is provided in libc, determines whether the local system is a cluster member. If the local system is a cluster member, the function returns TRUE; otherwise, it returns FALSE.
The clu_is_ready() function, which is also provided in libc, determines whether the local system has been configured to run in a cluster (that is, TruCluster Server software is installed and the system is running a clusterized kernel). If the local system is configured to run in a cluster, the function returns TRUE; otherwise, it returns FALSE. The clu_is_ready() function is most useful in code that runs in the boot path before the connection manager establishes cluster membership.
The clu_info() function and the clu_get_info() command return information about the configuration or a value indicating that the system is not configured to be in a cluster.
For more information, see clu_info(3), clu_is_member(3), and clu_get_info(8).
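For example, the following minimal sketch branches on cluster membership at run time (compile with -lclu, as in the example in Section 7.7):

#include <stdio.h>
#include <sys/clu.h>

int
main(void)
{
    struct clu_gen_info *clu_gen_ptr = NULL;

    if (!clu_is_member()) {
        printf("running on a standalone system\n");
        return 0;
    }

    /* In a cluster; report which member this is. */
    if (clu_get_info(&clu_gen_ptr) == 0)
        printf("running as cluster member %d\n",
            clu_gen_ptr->my_memberid);
    return 0;
}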
7.3 CLSM Support
The Cluster Logical Storage Manager (CLSM) does not provide interlocking support for normal I/O on mirrored volumes between different nodes. CLSM assumes any application that simultaneously opens the same volume from different nodes already performs the necessary locking to prevent two nodes from writing to the same block at the same time. In other words, if a cluster-aware application is not well-behaved and issues simultaneous writes to the same block from different nodes on a CLSM mirrored volume, data integrity will be compromised.
This is not an issue with Oracle Parallel Server (OPS) because OPS uses the distributed lock manager (DLM) to prevent this situation. Also, it is not an issue with the Cluster File System (CFS), because only one node can have a file system mounted at a time.
While steady-state I/O to mirrored volumes is not interlocked by CLSM
between different nodes, CLSM provides interlocking between nodes to
accomplish mirror recovery and plex attach operations.
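An application that writes to the same CLSM mirrored volume from multiple members must therefore supply its own interlocking. The following sketch shows one illustrative approach only, not a CLSM or DLM interface: a POSIX fcntl() byte-range lock on a lock file, assumed here to reside in a cluster-wide (CFS) file system so that all members contend for the same locks. Applications such as OPS use the DLM for this purpose.

/*
 * Illustrative sketch only: serialize writers to a block range by
 * taking an exclusive fcntl() lock on a corresponding range of a
 * shared lock file before writing to the CLSM volume.
 */
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>

int
lock_block_range(int lockfd, off_t start, off_t len)
{
    struct flock fl;

    fl.l_type = F_WRLCK;            /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = start;             /* mirrors the range being written */
    fl.l_len = len;
    return fcntl(lockfd, F_SETLKW, &fl);    /* block until granted */
}

int
unlock_block_range(int lockfd, off_t start, off_t len)
{
    struct flock fl;

    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(lockfd, F_SETLK, &fl);
}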
7.4 Diagnostic Utility Support
If you have, or want to write, a diagnostic utility for an application or subsystem, the TruCluster Server clu_check_config command calls diagnostic utilities, provides an execution environment, and maintains log files. See clu_check_config(8) for a description of how to add a diagnostic utility to the cluster environment and have it called by the clu_check_config command.
7.5 CDFS File System Restrictions
In TruCluster Server Version 5.1A, there are restrictions on managing Compact Disc-Read Only Memory File System (CDFS) file systems in a cluster. Some commands and library functions behave differently on a cluster than on a standalone system.
Table 7-1 lists the CDFS library functions and their expected behavior in a TruCluster Server environment.
Table 7-1: CDFS Library Functions

Library Function                 | Expected Result on Server | Expected Result on Client
---------------------------------|---------------------------|--------------------------
cd_drec                          | Success                   | Not supported
cd_ptrec                         | Success                   | Not supported
cd_pvd                           | Success                   | Not supported
cd_suf                           | Success                   | Not supported
cd_type                          | Success                   | Not supported
cd_xar                           | Success                   | Not supported
cd_nmconv CD_GETNMCONV           | Success                   | Success
cd_nmconv CD_SETNMCONV           | Success                   | Success
cd_getdevmap                     | No map                    | Not supported
cd_setdevmap                     | Not supported             | Not supported
cd_idmap CD_GETUMAP, CD_GETGMAP  | Success                   | Not supported
cd_idmap CD_SETUMAP, CD_SETGMAP  | Success                   | Success
cd_defs CD_GETDEFS               | Success                   | Success
cd_defs CD_SETDEFS               | Success                   | Success
For information about managing CDFS file systems in a cluster, see the TruCluster Server Cluster Administration manual.
7.6 Scripts Called from the /cluster/admin/run Directory
An application that needs to have specific actions taken on its behalf when a cluster is created, or when members are added or deleted, can place a script in the /cluster/admin/run directory. These scripts are called during the first boot of the initial cluster member following the running of clu_create, on each cluster member (including the newest one) following clu_add_member, and on all remaining cluster members following clu_delete_member.
The scripts in /cluster/admin/run must use the following entry points:

-c
For actions to take when clu_create runs.

-a
For actions to take when clu_add_member runs.

-d memberid
For actions to take when clu_delete_member runs.
Place only files, or symbolic links to files, that are executable by root in the /cluster/admin/run directory. We recommend that you adhere to the following file-naming convention:

Begin the executable file name with the uppercase letter C.

Make the next two characters a sequential number, as used in /sbin/rc3.d for your area.

For the remaining characters, use a name that is associated with your script.

The following file name is an example of this naming convention:

/cluster/admin/run/C40sendmail
The clu_create, clu_add_member, and clu_delete_member commands create the required it(8) files and links to ensure that the scripts are run at the correct time.
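The following minimal sketch shows a run-directory executable that handles these entry points. Such files are typically shell scripts, but any file that is executable by root works; the actions shown are hypothetical placeholders.

/*
 * Sketch of a /cluster/admin/run executable (for example,
 * /cluster/admin/run/C40myapp) handling the -c, -a, and -d
 * entry points described above.
 */
#include <stdio.h>
#include <string.h>

int
main(int argc, char *argv[])
{
    if (argc >= 2 && strcmp(argv[1], "-c") == 0) {
        /* clu_create: one-time setup on the initial member. */
        printf("configuring application for the new cluster\n");
    } else if (argc >= 2 && strcmp(argv[1], "-a") == 0) {
        /* clu_add_member: per-member setup. */
        printf("configuring application on this member\n");
    } else if (argc >= 3 && strcmp(argv[1], "-d") == 0) {
        /* clu_delete_member: clean up after the departed member. */
        printf("removing state for deleted member %s\n", argv[2]);
    } else {
        fprintf(stderr, "usage: %s -c | -a | -d memberid\n", argv[0]);
        return 1;
    }
    return 0;
}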
7.7 Testing the Status of a Cluster Member During a Rolling Upgrade
The following example program shows one way to determine whether the cluster is in the middle of a rolling upgrade, and whether this cluster member has rolled:
#include <stdio.h>
#include <stdlib.h>
#include <sys/clu.h>    /* compile with -lclu */

#define DEBUG 1

int
main(void)
{
    struct clu_gen_info *clu_gen_ptr = NULL;
    char cmd[256];

    if (clu_is_member()) {
        if (clu_get_info(&clu_gen_ptr) == 0) {
            if (system("/usr/sbin/clu_upgrade -q status") == 0) {
                sprintf(cmd, "/usr/sbin/clu_upgrade -q check roll %d",
                    clu_gen_ptr->my_memberid);
                if (system(cmd) == 0) {
                    if (DEBUG)
                        printf("member has rolled\n");
                } else if (DEBUG)
                    printf("member has not rolled\n");
            } else if (DEBUG)
                printf("no rolling upgrade in progress\n");
        } else if (DEBUG)
            printf("nonzero return from clu_get_info()\n");
    } else if (DEBUG)
        printf("not a member of a cluster\n");
    return 0;
}
7.8 File Access Resilience in a Cluster
While a cluster application is transferring files, read and write operations can fail if the member running the application shuts down or fails. Typically, a client of the application sees the connection to the server as lost. Be aware of how your application handles lost connections. Client applications commonly handle lost connections in the following ways:
The client application simply fails (for example, an error is written and the application exits).
The client application sees a problem with the connection and automatically retries its read or write operation.
The client application sees a problem with the connection and displays a window that allows the user to abort, retry, or cancel the operation.
If your client application fails when it loses its connection to the server application (regardless of whether it is running on a single system or a cluster), consider implementing the following:
When updating files, first write the update to a new temporary file. If the write operation is successful, copy the temporary file over the original file. If the write operation encounters a problem, you have not destroyed the original file; you need only clean up your temporary files and start again. (A sketch of this pattern follows this list.)
When reading files, make sure that your application is set up to deal with read operation errors and recover from them (for example, retry the operation).
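The following sketch shows the temporary-file update pattern described in the first item. The file names are hypothetical; the point is that rename() replaces the original only after the new copy has been written and flushed, so a failure at any step leaves the original intact.

/*
 * Sketch: write an update to a temporary file, then atomically
 * replace the original with rename(). On any failure, remove the
 * temporary file; the original is untouched.
 */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int
update_file(const char *path, const char *data, size_t len)
{
    char tmppath[1024];
    int fd;

    snprintf(tmppath, sizeof(tmppath), "%s.tmp.%d", path, (int)getpid());
    fd = open(tmppath, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;

    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmppath);                /* original file is untouched */
        return -1;
    }
    close(fd);

    if (rename(tmppath, path) != 0) {   /* atomic replacement */
        unlink(tmppath);
        return -1;
    }
    return 0;
}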