volrec(4)
NAME
volrec - Structure defining a volume record
SYNOPSIS
#include <sys/types.h>
#include <sys/vol.h>
#define NAME_LEN 14
#define COMMENT_LEN 40
#define UTIL_NUM 3
#define UTIL_LEN 14
#define NAME_SZ (NAME_LEN + 1)
#define COMMENT_SZ (COMMENT_LEN + 1)
#define UTIL_SZ (UTIL_LEN + 1)
struct volseqno {
    ulong_t seqno_lo, seqno_hi;
};
typedef struct volseqno volseqno_t;
typedef struct volseqno volrid_t;
struct volrec {
    struct v_tmp v_tmp;     /* non-persistent fields */
    struct v_perm v_perm;   /* persistent fields */
};
DESCRIPTION
The volrec structure is used internally by LSM. This structure is used to
communicate volume record information between the volume configuration
daemon, vold, and programs using the Logical Storage Manager library to
query for configurations and to make configuration changes.
The two structures contained in the volrec structure separate the
persistent elements of the volume record from the non-persistent ones. The
division of fields between the v_tmp and v_perm structures is somewhat
historical. However, the v_perm structure contains information that is
stored persistently (for example, fields that are recovered unchanged after
a system reboot) or that is directly derivable from persistent volume record
information. The v_tmp structure, on the other hand, contains fields that
can be modified without the changes being stored persistently.
The v_perm structure includes the following fields:
char v_name[NAME_SZ]; /* record name */
char v_use_type[NAME_SZ]; /* volume usage type name */
char v_fstype[FSTYPE_SZ]; /* guess of volume's fstype */
char v_comment[COMMENT_SZ]; /* comment field */
char v_putil[UTIL_NUM][UTIL_SZ];
/* persistent util fields */
char v_state[STATE_SZ]; /* utility state of volume */
char v_pref_name[NAME_SZ]; /* plex name if V_PREFER */
char v_start_opts[V_STOPTS_SZ];
/* volume start options */
enum vol_r_pol v_read_pol; /* method of plex selection */
minor_t v_minor; /* minor number in disk group */
uid_t v_uid; /* owner of /dev/vol/name */
gid_t v_gid; /* group of /dev/vol/name */
mode_t v_mode; /* mode of /dev/vol/name */
ulong_t v_pflag; /* persistent volume flags */
long v_pl_num; /* associated plex count */
volseqno_t v_update_tid; /* trans id of last update */
voff_t v_len; /* byte length of volume */
voff_t v_log_len; /* length of log area */
volrid_t v_rid; /* unique identifier */
volrid_t v_pref_plex_rid; /* preferred plex record ID */
volseqno_t v_detach_tid; /* trans id of kernel detach */
The v_tmp structure includes the following fields:
char v_tutil[UTIL_NUM][UTIL_SZ]; /* non-persistent util fields */
long v_rec_lock; /* 1 if record is locked */
long v_data_lock; /* 1 if volume is data locked */
enum vol_kstate v_kstate; /* relation to file space */
enum vol_except v_r_all; /* if all plex reads fail */
enum vol_except v_r_some; /* if some plex reads fail */
enum vol_except v_w_all; /* if all plex writes fail */
enum vol_except v_w_some; /* if some plex writes fail */
long v_lasterr; /* last volume error or 0 */
ulong_t v_tflag; /* non-persistent volume flags */
long v_log_serial_lo; /* log serial number/low part */
long v_log_serial_hi; /* log serial number/hi part */
dev_t v_bdev; /* block dev for volume */
dev_t v_cdev; /* char dev for volume */
size_t v_iosize; /* minimum size for raw I/Os */
voff_t v_rwback_offset; /* read/write-back offset */
The uses of the various volume fields are defined as follows:
v_name
The volume name.
v_rid
A 64-bit record ID assigned to the volume record. The ID is unique
within the disk group for the lifetime of the disk group.
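The following is a minimal sketch of how the two halves of a record ID could be combined and compared. It uses a local stand-in for struct volseqno rather than <sys/vol.h>, and it assumes that each half is 32 bits wide, which this manual page does not state. The names seqno_sketch, seqno_to_u64, and seqno_equal are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for struct volseqno; the 32-bit width of each half is
 * an assumption made for this sketch. */
struct seqno_sketch {
    uint32_t seqno_lo;
    uint32_t seqno_hi;
};

/* Combine the low and high halves into a single 64-bit value. */
static uint64_t seqno_to_u64(const struct seqno_sketch *s)
{
    assert(s != NULL);
    return ((uint64_t)s->seqno_hi << 32) | s->seqno_lo;
}

/* Return nonzero if two record IDs are identical. */
static int seqno_equal(const struct seqno_sketch *a,
                       const struct seqno_sketch *b)
{
    assert(a != NULL && b != NULL);
    return a->seqno_hi == b->seqno_hi && a->seqno_lo == b->seqno_lo;
}
```

Comparing the halves directly avoids any dependence on the width of the combined value.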
v_use_type
The usage type associated with the volume. This is used to select a
utility set that maintains state and plex consistency in a manner
appropriate to the usage of the volume.
v_fstype
The file system type of any file system residing on the volume. A
usage type may choose to use or ignore this field.
v_comment
A null-terminated comment string associated with the record. The
contents are arbitrary except that they cannot contain a newline
character.
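As an illustration of the comment constraints, the sketch below checks a candidate string before copying it into a buffer of COMMENT_SZ bytes. The helper set_comment is hypothetical and is not part of the Logical Storage Manager library. Only the COMMENT_LEN value is taken from the SYNOPSIS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COMMENT_LEN 40                  /* value from the SYNOPSIS */
#define COMMENT_SZ  (COMMENT_LEN + 1)

/* Hypothetical helper: copy src into a v_comment-sized buffer.
 * Returns 0 on success, or -1 if src is longer than COMMENT_LEN or
 * contains a newline. dst is left unchanged on failure. */
static int set_comment(char dst[COMMENT_SZ], const char *src)
{
    assert(dst != NULL && src != NULL);
    if (strlen(src) > COMMENT_LEN || strchr(src, '\n') != NULL)
        return -1;
    strcpy(dst, src);                   /* safe: length checked above */
    return 0;
}
```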
v_putil
An array of three null-terminated strings that can be used as scratch
pads by utilities. These fields are preserved across reboots. By
convention, the first field is reserved for usage types; the second
field for higher-level applications, such as the Visual Administrator;
and the third field for local site administrators.
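The slot convention can be sketched with named indexes, as shown below. The enumerator names and the set_util helper are hypothetical. Only UTIL_NUM and UTIL_LEN come from the SYNOPSIS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UTIL_NUM 3                      /* values from the SYNOPSIS */
#define UTIL_LEN 14
#define UTIL_SZ  (UTIL_LEN + 1)

/* Hypothetical names for the three slots, following the convention. */
enum util_slot {
    UTIL_USAGE_TYPE  = 0,               /* reserved for usage types */
    UTIL_APPLICATION = 1,               /* higher-level applications */
    UTIL_SITE_ADMIN  = 2                /* local site administrators */
};

/* Store value in one slot. Returns 0 on success, or -1 if the slot is
 * out of range or value does not fit in UTIL_LEN characters. */
static int set_util(char util[UTIL_NUM][UTIL_SZ], enum util_slot slot,
                    const char *value)
{
    assert(util != NULL && value != NULL);
    if ((int)slot < 0 || (int)slot >= UTIL_NUM || strlen(value) > UTIL_LEN)
        return -1;
    strcpy(util[slot], value);          /* safe: length checked above */
    return 0;
}
```

The same helper would apply to an array laid out like v_tutil, because both arrays have the same dimensions.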
v_state
A null-terminated state field that is reserved specifically for use by
usage types.
v_pref_name
The name of the preferred plex for use when the v_read_pol field is set
to V_PREFER. This field is derived from the v_pref_plex_rid field.
v_start_opts
This is an arbitrary string that is reserved for usage-type utilities.
The intention is that this field be used to store options that apply to
the volume, such as for the volume start operation. This is normally a
comma-separated list of flag names and option=value pairs. See the
gen and fsgen versions of volume(8) for information on how this field
is used by the gen and fsgen utilities.
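A string in the comma-separated form described above could be searched as in the following sketch. The function find_start_opt is hypothetical, and the option names used with it are invented for illustration. The options actually recognized are defined by the usage-type utilities.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up name in a comma-separated option string such as "flag,opt=val".
 * Returns 1 if found and 0 otherwise. For an option=value pair the value
 * is copied into val (truncated to fit); for a bare flag val is set to "". */
static int find_start_opt(const char *opts, const char *name,
                          char *val, size_t valsz)
{
    size_t nlen;

    assert(opts != NULL && name != NULL && val != NULL && valsz > 0);
    nlen = strlen(name);
    while (*opts != '\0') {
        size_t toklen = strcspn(opts, ",");     /* length of this token */

        if (toklen >= nlen && strncmp(opts, name, nlen) == 0 &&
            (toklen == nlen || opts[nlen] == '=')) {
            int has_value = (toklen != nlen);
            size_t vlen = has_value ? toklen - nlen - 1 : 0;

            if (vlen >= valsz)
                vlen = valsz - 1;
            memcpy(val, opts + nlen + (has_value ? 1 : 0), vlen);
            val[vlen] = '\0';
            return 1;
        }
        opts += toklen;
        if (*opts == ',')
            opts++;                             /* skip the separator */
    }
    return 0;
}
```

A match requires the whole token name to agree, so a lookup of "force" does not match a token such as "noforce".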
v_read_pol
The policy for selecting plexes to satisfy volume read operations.
This can have one of the following values:
V_ROUND Candidate plexes are selected in sequence for each sequential
volume read operation. This is known as a round-robin approach.
V_PREFER
The plex named by the v_pref_name field is used if it can
satisfy the read request. If the preferred plex cannot satisfy
the read request, then this policy becomes equivalent to the
round-robin policy.
V_R_POL_SELECT
A default policy is selected based on the current configuration
of the volume. If the volume has two or more active plexes, and
exactly one of those plexes is striped, then the striped plex
is preferred; otherwise, the round-robin read policy is used.
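The rule for V_R_POL_SELECT can be sketched as follows. The types and names are local stand-ins with placeholder values, not the definitions from <sys/vol.h>. The sketch reads "exactly one of those plexes" as referring to the active plexes only, which is an interpretation.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for enum vol_r_pol; the numeric values are placeholders. */
enum r_pol_sketch { SK_ROUND, SK_PREFER, SK_R_POL_SELECT };

/* Minimal, hypothetical view of a plex for this sketch. */
struct plex_sketch {
    int active;     /* nonzero if the plex is active */
    int striped;    /* nonzero if the plex is striped */
};

/* Resolve the select policy to an effective policy. On return,
 * *preferred holds the index of the striped plex when SK_PREFER is
 * chosen, and -1 otherwise. */
static enum r_pol_sketch resolve_select(const struct plex_sketch *p,
                                        size_t n, int *preferred)
{
    size_t i, active = 0, striped = 0;
    int which = -1;

    assert(preferred != NULL && (p != NULL || n == 0));
    for (i = 0; i < n; i++) {
        if (!p[i].active)
            continue;
        active++;
        if (p[i].striped) {
            striped++;
            which = (int)i;
        }
    }
    if (active >= 2 && striped == 1) {
        *preferred = which;
        return SK_PREFER;
    }
    *preferred = -1;
    return SK_ROUND;
}
```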
v_minor
The minor number of the block and character volume devices associated
with the volume record. The volume minor number is assigned when the
volume is created. This is a read-only field. Conditions may force the
actual volume device minor number to differ from the v_minor field.
This can happen in disk groups other than rootdg, if a conflict occurs.
This can also happen in the rootdg disk group if the V_PFLAG_FORCEMINOR
flag is used to force a particular value for v_minor, even if the
indicated number is unavailable.
v_uid, v_gid, and v_mode
The user ID, group ID, and permission modes for the volume's block and
character device nodes, and for the device nodes for the associated
plexes.
v_pflag
Flags associated with the volume that are preserved across reboots. The
set of persistent flags that can be set is:
V_PFLAG_WRITEBACK
The write-back-on-read-failure flag. If set, then an attempt is
made to fix a read error from a participating plex (i.e., one
without the noerror flag). The method used to fix the read
error is to read from another plex associated with the volume
and write back to the plex with the read error. The read
operation is then retried to verify that the error is
fixed. This requires at least two associated, enabled,
participating, read-mode plexes.
This is an effective way of handling device drivers that can
revector blocks on write failures, and can be used to handle
the majority of media failures on many disk drives. For this
operation to be effective, the underlying device driver must
not revector blocks on read errors.
V_PFLAG_WRITECOPY
If set (volmake and volassist set this by default), then some
writes to mirrored volumes that use block change logging will
be copied into an allocated kernel buffer before being written
to disk. The reason for doing such a copy is that write
requests given to the volume device driver can point to pages
of memory that are still undergoing change. Without doing a
copy, the blocks written to each plex might be different. If
you are sure that your application does not modify pages while
they are written, or if you are certain that mirrors with
differing contents do not represent a problem, then you can
turn off this flag.
V_PFLAG_ACTIVE
This flag is set on a reboot if the volume was open at the time
of a system crash, and the volume had been written at least
once. This implies that the volume, if it is mirrored, requires
recovery to ensure consistency between plexes.
V_PFLAG_FORCEMINOR
If this flag is set, the v_minor value specified when the volume
record is created is forced. If this flag is not set, v_minor
might be remapped to an unused value. This flag is required to
set minor numbers less than 5. Setting the flag does not guarantee
that the actual volume device node will have the indicated minor
number. However, if the volume is in rootdg, the volume is given
that minor number after a reboot, provided that no other volume in
the disk group has that minor number.
V_PFLAG_LOGTYPE
This is a bit-mask that specifies bits in the v_pflag field
that indicate the logging type for the volume. The bits masked
out by this macro can have one of the following values:
V_PFLAG_LOGUNDEF
The logging type is undefined. Volumes that were
created in Release 1.0 of the Logical Storage Manager
have this type. This value is effectively identical to
V_PFLAG_LOGNONE except that utilities are able to use the
V_PFLAG_LOGUNDEF flag as a license to default the
logging type to something else.
V_PFLAG_LOGNONE
No logging is performed for the volume. Even if a
logging subdisk is defined for a plex, the logging
subdisk is not used.
V_PFLAG_LOGBLKNO
A block change log is written periodically to each log
subdisk associated with an associated, enabled, write-
only plex. This log lists all blocks which have been
received but which have not yet been written to disk.
These logs are exactly one sector in length. All writes
to the volume are first written into the log, and are
not removed from the log until the write to disk has
been confirmed, or has failed. If the log fills up,
then some writes are delayed until entries in the log
are freed.
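Because V_PFLAG_LOGTYPE is a mask rather than a single flag, the logging type is read by masking v_pflag and comparing the result. The sketch below illustrates the technique. The SK_ macros and their bit values are placeholders chosen for this example; the real values are defined in <sys/vol.h>.

```c
#include <assert.h>

/* Placeholder bit values for illustration only. */
#define SK_PFLAG_WRITEBACK 0x01UL
#define SK_PFLAG_LOGTYPE   0x30UL   /* mask covering the logging-type bits */
#define SK_PFLAG_LOGUNDEF  0x00UL
#define SK_PFLAG_LOGNONE   0x10UL
#define SK_PFLAG_LOGBLKNO  0x20UL

/* Extract the logging type from a v_pflag-style word. */
static unsigned long log_type(unsigned long pflag)
{
    return pflag & SK_PFLAG_LOGTYPE;
}

/* Return pflag with its logging type replaced; other flags are kept. */
static unsigned long set_log_type(unsigned long pflag, unsigned long type)
{
    assert((type & ~SK_PFLAG_LOGTYPE) == 0);    /* type must fit the mask */
    return (pflag & ~SK_PFLAG_LOGTYPE) | type;
}
```

Testing a logging-type value with a plain bitwise AND against v_pflag would be unreliable, because a type such as the undefined one may be represented by no bits at all.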
v_pl_num
The number of plexes associated with the volume.
v_update_tid
The transaction ID of the last update to this record. This field is
assigned when changes to a disk group are committed.
v_len
The length of the volume. This can be set arbitrarily, even if it is
longer or shorter than some or all of the associated plexes. This value
is in sectors.
v_log_len
The length for a volume log. For the block-change-logging log type,
this value must always be 1. However, future logging types may support
larger log lengths. The length for all subdisk logs associated with the
volume must be at least this long. This value is in sectors.
v_pref_plex_rid
The record ID of the preferred plex for the volume. This field
is used only if v_read_pol is set to V_PREFER.
v_tutil
An array of three null-terminated strings that can be used as scratch
pads by utilities. These fields are cleared on reboot. By convention,
the first field is reserved for usage types; the second field for
higher-level applications, such as the Visual Administrator; and the
third field for local site administrators.
v_rec_lock
A boolean value that is 1 if the volume record is locked in the caller's
current transaction, and 0 otherwise. This is a read-only field.
v_data_lock
A boolean value that is 1 if the volume is data-locked in the caller's
current transaction, and 0 otherwise. This is a read-only field.
v_kstate
The accessibility of the volume. This field can have one of the
following values:
V_ENABLED
The volume block device can be used, and reads and writes to
the block or character volume device are accepted.
V_DETACHED
The volume block device cannot be used, and reads or writes to
the character device are rejected. Volume ioctls are still
usable, and the plex devices for associated plexes can be used,
within the bounds of the plex pl_kstate fields.
V_DISABLED
The volume cannot be used for any operations, and neither can
the plex devices for any of the associated plexes.
This field is set to V_DISABLED after a reboot.
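The three kernel states can be summarized as a small set of predicates, sketched below. The enumeration is a local stand-in with placeholder values, and the function names are hypothetical. The sketch assumes that volume ioctls are usable in the enabled state as well as the detached state.

```c
#include <assert.h>

/* Stand-ins for enum vol_kstate; the numeric values are placeholders. */
enum kstate_sketch { SK_ENABLED, SK_DETACHED, SK_DISABLED };

/* Nonzero if reads and writes to the volume devices are accepted. */
static int volume_io_allowed(enum kstate_sketch k)
{
    assert(k == SK_ENABLED || k == SK_DETACHED || k == SK_DISABLED);
    return k == SK_ENABLED;
}

/* Nonzero if volume ioctls are usable. */
static int volume_ioctl_allowed(enum kstate_sketch k)
{
    assert(k == SK_ENABLED || k == SK_DETACHED || k == SK_DISABLED);
    return k != SK_DISABLED;
}

/* Nonzero if plex devices may be used, subject to each plex's own
 * pl_kstate field. */
static int plex_devices_allowed(enum kstate_sketch k)
{
    assert(k == SK_ENABLED || k == SK_DETACHED || k == SK_DISABLED);
    return k != SK_DISABLED;
}
```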
v_r_all, v_r_some, v_w_all, and v_w_some
Exception policies for the volume. Each field sets the policy for one
of the following exception conditions:
v_r_all Read failure on all plexes
v_r_some
Read failure on some plexes
v_w_all Write failure on all plexes
v_w_some
Write failure on some plexes
If one of these exception conditions is encountered, then the
corresponding action is taken. The possible actions are:
V_NO_OP Takes no action. However, if the operation fails for all
candidate plexes, then the operation still fails.
V_FAIL_OP
Fails the operation, but takes no further action.
V_DET_PL
Detaches the plex with the failure. The operation fails only if
the operation fails for all candidate plexes.
V_FAIL_DET_PL
Detaches the plex with the failure and returns a failure for
the operation, even if the operation can be satisfied by
another plex.
V_DET_VOL
Detaches the volume but does not fail the operation.
V_FAIL_DET_VOL
Detaches the volume and fails the operation.
V_GEN_DET
A higher-level error policy which detaches failing plexes.
However, if detaching a complete plex would result in no
complete plexes remaining, then V_GEN_DET detaches the volume
rather than detaching the failing plexes. A complete plex is
one that has the PL_TFLAG_COMPLETE flag set in the plex
pl_tflag field.
V_GEN_DET_SPARSE
A higher-level error policy which detaches failing plexes.
However, if detaching a plex results in no complete plexes
remaining, then V_GEN_DET_SPARSE leaves exactly one complete
plex enabled, and detaches all incomplete plexes that have
volume blocks mapped to subdisks in the region of the failure.
This policy allows the volume to continue operating on a
failing plex, and does not disable mirrored regions that are
unaffected by the failing operation.
In the case of a logging volume, the volume is detached if a
write failure occurs to all enabled log subdisks associated
with the volume.
V_GEN_FAIL
Detaches the failing plexes, and the volume, and returns a
failure for the operation. This policy can be used by
applications that wish to make decisions about changing the
Logical Storage Manager configuration based on failures. The
detached state of a plex can be used as an indication of which
plexes failed, and making the volume detached prevents future
I/Os from succeeding until the problem is resolved.
V_GEN_DET2
This operates exactly like the V_GEN_DET error policy, except
that it detaches the volume if the number of complete plexes
would drop below two. This ensures that a volume is either
mirrored to at least two plexes, or is non-operational until
the situation is repaired.
Not all plexes are taken into account in the exception policy selection
or actions. A plex is ignored under any of the following conditions:
· The plex is not enabled.
· The plex does not have a read or write mode appropriate for the
operation.
· The plex has the PL_PFLAG_NOERROR flag set.
· The plex does not have mapped subdisk blocks that are appropriate for
the range of the requested operation.
The exception policies are normally set implicitly by the operational
utilities. The utilities provided with the Logical Storage Manager set
all the exception policies to V_GEN_DET_SPARSE and do not provide a
means for changing the policies to something else.
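The following sketch models the decision made by the V_GEN_DET and V_GEN_DET2 policies, together with the rules for ignoring a plex. It is an interpretation of the descriptions above, not the kernel implementation. All type, field, and function names are hypothetical. In particular, the sketch assumes that a failing plex that is not complete is simply detached, and that only enabled plexes count as remaining complete plexes.

```c
#include <assert.h>
#include <stddef.h>

enum action_sketch { SK_IGNORE, SK_DETACH_PLEX, SK_DETACH_VOLUME };

/* Hypothetical summary of the plex state that the policies consult. */
struct xplex_sketch {
    int enabled;      /* plex is enabled */
    int mode_ok;      /* read or write mode suits the operation */
    int noerror;      /* PL_PFLAG_NOERROR is set */
    int maps_range;   /* subdisk blocks are mapped for the I/O range */
    int complete;     /* PL_TFLAG_COMPLETE is set */
};

/* A plex takes part in policy decisions only if none of the four
 * "ignored" conditions applies. */
static int plex_participates(const struct xplex_sketch *p)
{
    assert(p != NULL);
    return p->enabled && p->mode_ok && !p->noerror && p->maps_range;
}

/* Decide the action when plex `failed` reports an error. Pass
 * min_complete = 1 to model V_GEN_DET and 2 to model V_GEN_DET2. */
static enum action_sketch gen_det(const struct xplex_sketch *p, size_t n,
                                  size_t failed, size_t min_complete)
{
    size_t i, remaining = 0;

    assert(p != NULL && failed < n);
    if (!plex_participates(&p[failed]))
        return SK_IGNORE;               /* plex is outside the policy */
    if (!p[failed].complete)
        return SK_DETACH_PLEX;          /* complete count is unaffected */
    for (i = 0; i < n; i++)
        if (i != failed && p[i].enabled && p[i].complete)
            remaining++;                /* complete plexes left afterward */
    return remaining >= min_complete ? SK_DETACH_PLEX : SK_DETACH_VOLUME;
}
```

With two complete plexes, a single failure detaches the plex under the V_GEN_DET model but detaches the volume under the V_GEN_DET2 model, because only one complete plex would remain.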
v_lasterr
A sequence number for the last I/O error to be encountered on the
volume. This is a read-only field.
v_tflag
A bitmask of flags that is cleared after a reboot. Flags defined in
this field are:
V_TFLAG_RWBACK
A flag that can be turned on to request read/writeback mode.
In read/writeback mode, a read request for a mirrored volume
will write back to all other plexes the resulting data from the
read. The operation is affected by the v_rwback_offset field.
This mode is intended for volume recovery operations.
V_TFLAG_KRWBACK
This is a status flag which indicates that the read/writeback
mode operation is still in effect. This flag is set when
V_TFLAG_RWBACK is set. If the read/writeback offset (see
v_rwback_offset) reaches the end of the volume, then the kernel
will turn off this flag.
VK_OPEN A status flag that indicates that the volume device that
corresponds to the volume record is open or mounted as a file
system.
V_TFLAG_LOGGING
A status flag which indicates the volume has a logging type of
V_PFLAG_LOGBLKNO, is enabled, and has at least one enabled,
associated plex with an enabled log subdisk. This flag is not
cleared when exception policies are invoked that detach a
volume or its plexes.
V_TFLAG_INVALID
An error has rendered the volume unusable. The volume cannot be
started.
v_log_serial_lo and v_log_serial_hi
These values, taken together, yield a unique monotonically increasing
value that is changed for every log write that occurs to a volume with
logging enabled. These two numbers are cleared by a reboot, but are
normally set explicitly by a volume start operation. The value in
v_log_serial_lo is incremented by one for every log write.
Unlike other fields, the values of the log serial number fields
cannot always be trusted within a transaction. The reason for this is
that data-locks are not obtained by vold until after a utility has
completely described a transaction for vold to transmit to the kernel.
Other fields that can be changed by the kernel are checked at the time
of a vol_commit to ensure that the fields haven't changed, and if any
kernel-modifiable fields have changed since the corresponding vol_trans
call, then the utility is asked to retry the transaction.
However, a volume with significant I/O activity is likely to change the
value of the serial number fields often enough that transactions on such
a volume might have to be retried an unacceptable number of times, so
these fields are not checked.
Utilities must be prepared to ensure that volume logs are in a
quiescent state (normally by setting the volume to V_DETACHED or by
disabling logging) before using the value of a log within a
transaction. The existing utility set uses the log serial number
fields only to set the serial number for a volume.
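A two-part serial number of this kind can be modeled as shown below. The sketch assumes unsigned 32-bit halves and a carry from the low part into the high part when the low part wraps. Neither detail is stated in this manual page, where both fields are declared as long. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of v_log_serial_lo and v_log_serial_hi. */
struct log_serial_sketch {
    uint32_t lo;
    uint32_t hi;
};

/* Advance the serial number by one, as for a single log write. */
static void log_serial_bump(struct log_serial_sketch *s)
{
    assert(s != NULL);
    s->lo++;
    if (s->lo == 0)         /* low part wrapped: carry (an assumption) */
        s->hi++;
}

/* Combine both parts into one monotonically increasing value. */
static uint64_t log_serial_value(const struct log_serial_sketch *s)
{
    assert(s != NULL);
    return ((uint64_t)s->hi << 32) | s->lo;
}
```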
v_bdev, v_cdev
The device numbers for the volume block and character device nodes.
Normally, these are computed from the v_minor number. However, in cases
of collision, they may have different minor numbers.
v_iosize
The largest sector size of any disk associated (through a subdisk) with
the volume. At the present time, only one sector size (normally 512
bytes) is supported, so this field will always match the single system
sector size.
v_rwback_offset
When read/writeback mode is turned on, this field is loaded into the
kernel as the current read/writeback offset pointer. Reads that occur
before this offset into the volume do not invoke read/writeback
recovery. If a read occurs on the boundary, then the kernel
increases the pointer to the end of that read, after a successful result
from the operation. This automatically increasing pointer causes the
performance degradation from read/writeback mode to decrease as volume
recovery progresses.
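The behavior of the offset pointer can be modeled as in the following sketch. It treats a read as being on the boundary when it starts at or before the current offset and ends after it, which is an interpretation. The type off_sketch_t stands in for voff_t, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long off_sketch_t;     /* stand-in for voff_t */

/* Model one successful read of len units starting at start.
 * Returns nonzero if read/writeback recovery applies to the read, and
 * advances *rwback_offset when the read touches the boundary. */
static int rwback_read(off_sketch_t *rwback_offset,
                       off_sketch_t start, off_sketch_t len)
{
    off_sketch_t end;

    assert(rwback_offset != NULL && len > 0);
    end = start + len;
    if (end <= *rwback_offset)
        return 0;                       /* entirely before the offset */
    if (start <= *rwback_offset)
        *rwback_offset = end;           /* on the boundary: advance */
    return 1;
}
```

In this model, a sequential pass over the volume moves the offset forward with each read, so a second read of the same region no longer triggers recovery.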
SEE ALSO
volintro(8), vold(8), voliod(8), volmake(4), plexrec(4), sdrec(4)