pk.org: CS 417/Lecture Notes

Coordination Services and Network-Attached Storage

Terms you should know

Paul Krzyzanowski – 2025-01-31

Coordination Services

Advisory lock
A lock that the system does not enforce; well-behaved processes check for lock ownership before proceeding, but nothing prevents code from ignoring it.
Cache invalidation
A message sent by a server to clients that hold cached copies of a resource, telling them their copy is no longer valid.
Chubby cell
A deployment unit of Google’s Chubby service, consisting of five replicas with one elected master; clients within a data center contact their local cell.
Coarse-grained lock
A lock held for a long period (minutes to hours) over a large resource such as a master role or an entire table, as opposed to a fine-grained lock held for milliseconds over a small resource.
Coordination service
A small, strongly consistent, highly available replicated store used by distributed systems for control-plane decisions such as leader election, locking, and configuration management.
Ephemeral node (znode)
A ZooKeeper node that is automatically deleted when the client session that created it ends, used as the primary mechanism for failure detection.
etcd
A strongly consistent distributed key-value store using Raft consensus, serving as the authoritative state store for Kubernetes.
Fencing token
A monotonically increasing number issued with each lock grant; the protected resource rejects requests carrying a token lower than the highest it has seen, preventing a stale lock holder from corrupting shared state.
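The check described above can be sketched in a few lines. This is a minimal in-memory illustration, not any real service's API; `LockService` and `ProtectedResource` are hypothetical names.

```python
class LockService:
    """Hypothetical lock service that issues a fresh token with every grant."""

    def __init__(self):
        self._next_token = 0

    def acquire(self):
        self._next_token += 1
        return self._next_token


class ProtectedResource:
    """Hypothetical resource that checks the token before accepting a write."""

    def __init__(self):
        self.highest_seen = 0
        self.value = None

    def write(self, token, value):
        # Reject any request whose token is lower than the highest seen so far.
        if token < self.highest_seen:
            return False
        self.highest_seen = token
        self.value = value
        return True


locks = LockService()
store = ProtectedResource()

old_token = locks.acquire()  # client A is granted the lock, then stalls
new_token = locks.acquire()  # A's grant lapses and client B is granted the lock

accepted_b = store.write(new_token, "written by B")
accepted_a = store.write(old_token, "written by A")  # A resumes with a stale token
```

The stale write from A is refused because the resource has already seen B's higher token.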
Fine-grained lock
A lock held briefly (milliseconds) to protect a small resource such as a single record; not suitable for consensus-backed coordination services, because locks of this kind are acquired so frequently that the replicated service would become a bottleneck.
Grace period
In Chubby, the window of time after a master failover during which clients may reconnect and re-establish their sessions before the new master releases their locks.
Lease
A time-bounded grant from a server to a client; the client may rely on the grant (cached data, lock ownership) until the lease expires, and must renew it periodically to maintain it.
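A lease can be modeled as an expiry time plus a renewal rule. The sketch below is illustrative only; the `Lease` class is hypothetical, and the clock is passed in explicitly so the behavior is deterministic.

```python
class Lease:
    """A time-bounded grant; 'now' values are supplied by the caller."""

    def __init__(self, duration, now):
        self.duration = duration
        self.expires_at = now + duration

    def is_valid(self, now):
        return now < self.expires_at

    def renew(self, now):
        # An expired lease cannot be renewed; the holder must request a new grant.
        if not self.is_valid(now):
            return False
        self.expires_at = now + self.duration
        return True


lease = Lease(duration=10, now=0)
valid_at_5 = lease.is_valid(now=5)
renewed_at_8 = lease.renew(now=8)    # expiry moves from t=10 to t=18
valid_at_15 = lease.is_valid(now=15)
valid_at_18 = lease.is_valid(now=18)
renewed_late = lease.renew(now=20)   # too late: the lease has already expired
```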
Leader election
The coordination pattern by which one replica among several atomically claims a well-known name in a coordination service and holds it as long as its session is alive.
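The pattern reduces to an atomic create-if-absent on a well-known name, with the name released when the owner's session ends. The sketch below assumes a single in-memory registry; `NameRegistry` and its methods are hypothetical, not a real coordination service API.

```python
class NameRegistry:
    """In-memory stand-in for a coordination service's namespace."""

    def __init__(self):
        self.owner = {}  # name -> session that currently holds it

    def try_claim(self, name, session):
        # Create-if-absent: succeeds only when nobody holds the name.
        if name in self.owner:
            return False
        self.owner[name] = session
        return True

    def session_ended(self, session):
        # Names held by a dead session become available again.
        for name in [n for n, s in self.owner.items() if s == session]:
            del self.owner[name]


registry = NameRegistry()
r1_leads = registry.try_claim("/service/leader", "replica-1")
r2_leads = registry.try_claim("/service/leader", "replica-2")

registry.session_ended("replica-1")  # the leader fails
r2_leads_after_failure = registry.try_claim("/service/leader", "replica-2")
```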
Linearizable read
A read that reflects the most recently committed write, as if the entire system had a single consistent view at that instant; the strongest read consistency guarantee.
Persistent node (znode)
A ZooKeeper node that survives client disconnection and remains until explicitly deleted.
Sequential node (znode)
A ZooKeeper node created with a monotonically increasing integer appended to its name, used to implement ordered lock queues and prevent the thundering herd problem.
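One way to see how sequential nodes avoid a thundering herd: each waiter watches only the node immediately ahead of it, so a release wakes a single client. The sketch below is an in-memory model with hypothetical names (`LockQueue`), not the ZooKeeper client API.

```python
class LockQueue:
    """In-memory stand-in for a lock directory of sequential nodes."""

    def __init__(self):
        self._counter = 0
        self.nodes = {}  # sequence number -> client name

    def enqueue(self, client):
        # The service assigns the next integer in sequence.
        seq = self._counter
        self._counter += 1
        self.nodes[seq] = client
        return seq

    def holder(self):
        # The client with the lowest sequence number holds the lock.
        return self.nodes[min(self.nodes)] if self.nodes else None

    def predecessor(self, seq):
        # Each waiter watches only the node immediately ahead of it.
        lower = [s for s in self.nodes if s < seq]
        return max(lower) if lower else None

    def release(self, seq):
        # Only clients watching the deleted node are notified.
        woken = [c for s, c in self.nodes.items() if self.predecessor(s) == seq]
        del self.nodes[seq]
        return woken


queue = LockQueue()
a = queue.enqueue("A")
b = queue.enqueue("B")
c = queue.enqueue("C")

first_holder = queue.holder()
woken = queue.release(a)  # A releases the lock
next_holder = queue.holder()
```

Client C is not woken when A releases, because C is watching B's node rather than A's.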
Service discovery
The coordination pattern by which running service instances register their addresses under ephemeral keys so that clients can find currently available instances.
Session timeout
In ZooKeeper, the negotiated period within which a client must send heartbeats; if the timeout elapses without a heartbeat, the session ends and all ephemeral nodes created by that session are deleted.
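The link between heartbeats, session expiry, and ephemeral nodes can be sketched as follows. This is a simplified in-memory model with an explicit clock; `CoordinationStore` and its methods are hypothetical and do not mirror the real ZooKeeper API.

```python
class CoordinationStore:
    """In-memory model of sessions and the ephemeral nodes they own."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}  # session id -> time of last heartbeat
        self.ephemeral = {}       # path -> owning session id

    def heartbeat(self, session, now):
        self.last_heartbeat[session] = now

    def create_ephemeral(self, path, session):
        self.ephemeral[path] = session

    def expire_sessions(self, now):
        dead = {s for s, t in self.last_heartbeat.items()
                if now - t > self.session_timeout}
        for s in dead:
            del self.last_heartbeat[s]
        # Every ephemeral node owned by an expired session is deleted.
        self.ephemeral = {p: s for p, s in self.ephemeral.items()
                          if s not in dead}
        return dead


zk = CoordinationStore(session_timeout=10)
zk.heartbeat("s1", now=0)
zk.heartbeat("s2", now=0)
zk.create_ephemeral("/workers/w1", "s1")
zk.create_ephemeral("/workers/w2", "s2")

zk.heartbeat("s2", now=8)            # s2 stays alive; s1 goes silent
expired = zk.expire_sessions(now=12)
```

After expiry, only `/workers/w2` remains, which is how other clients learn that the first worker has failed.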
Thundering herd problem
A condition where a single state change simultaneously wakes many waiting clients, causing a burst of requests that overwhelms the system even though only one client can make progress.
Watch
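In ZooKeeper, a one-shot notification that fires when a specified znode changes; the client must re-register the watch after each firing. etcd also provides watches on keys, but an etcd watch is a long-lived stream that continues to deliver change events without re-registration.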
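The one-shot behavior of a ZooKeeper-style watch can be modeled by removing the watch before it fires. The sketch below is an in-memory illustration; `WatchedStore` is a hypothetical name, not a client library class.

```python
class WatchedStore:
    """Key-value store whose watches fire at most once."""

    def __init__(self):
        self.data = {}
        self.watches = {}  # key -> list of callbacks awaiting the next change

    def watch(self, key, callback):
        self.watches.setdefault(key, []).append(callback)

    def set(self, key, value):
        self.data[key] = value
        # Remove the watches before firing them: each one is one-shot.
        for callback in self.watches.pop(key, []):
            callback(key, value)


events = []
kv = WatchedStore()

kv.watch("/config", lambda k, v: events.append((k, v)))
kv.set("/config", "v1")  # fires the watch
kv.set("/config", "v2")  # missed: the watch was not re-registered

kv.watch("/config", lambda k, v: events.append((k, v)))
kv.set("/config", "v3")  # fires the new watch
```

The change to `v2` goes unnoticed, which is why a client typically re-registers the watch and re-reads the value after every notification.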
ZooKeeper
An open-source coordination service, originally developed at Yahoo and now an Apache project, that provides a hierarchical namespace of znodes and primitives for building locks, leader election, and other coordination patterns.

Network-Attached Storage: General Concepts

Access transparency
The property that remote files are accessible via the same system calls as local files, with no changes required in applications.
Close-to-open consistency
A caching model in which a client flushes its modifications to the server when it closes a file and checks the server for updates when it opens one; stale reads are possible between opens; used by NFS.
Mount point
A location in a directory tree where a different file system (local or remote) is attached, making its contents accessible under that path.
Read-ahead
A client-side optimization that speculatively fetches blocks beyond what the application has requested, anticipating sequential access and hiding network latency.
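The effect of read-ahead on round trips can be sketched with a toy client. This simplification fetches the extra blocks in the same request rather than asynchronously, and `ReadAheadClient` and its `window` parameter are hypothetical.

```python
class ReadAheadClient:
    """Toy client that fetches extra blocks on every cache miss."""

    def __init__(self, server_blocks, window=2):
        self.server = server_blocks  # list standing in for the remote file
        self.window = window         # number of extra blocks to prefetch
        self.cache = {}
        self.fetches = 0             # round trips to the server

    def read(self, n):
        if n not in self.cache:
            # One round trip brings back block n plus the next `window` blocks.
            self.fetches += 1
            end = min(n + self.window + 1, len(self.server))
            for i in range(n, end):
                self.cache[i] = self.server[i]
        return self.cache[n]


blocks = ["b0", "b1", "b2", "b3", "b4", "b5"]
client = ReadAheadClient(blocks, window=2)
result = [client.read(i) for i in range(6)]  # sequential scan of the file
```

A sequential scan of six blocks costs two round trips here instead of six.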
Referral
A response from a running server directing a client to contact a different server for a resource; used for planned migrations and namespace federation, not failure recovery.
Session semantics
A caching model in which a client’s changes to a file are not visible to other clients until the file is closed and uploaded; last writer wins on conflict; used by AFS and Coda.
Stateless server
A server that holds no information about client activity between requests; each request carries all necessary context, making crash recovery trivial.
Virtual File System (VFS)
A kernel abstraction layer that defines a standard interface for file system operations; allows local and remote file systems to coexist and be mounted into a single directory tree.
Write-behind
A caching policy in which modifications are accumulated locally and sent to the server in batches; more efficient than write-through but leaves the server with stale data in the interim.
Write-through
A caching policy in which every write is sent immediately to the server, keeping the server’s copy always current.
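The trade-off between the two write policies can be seen by counting messages to the server. The classes below are hypothetical and purely illustrative.

```python
class Server:
    def __init__(self):
        self.data = {}
        self.messages = 0

    def store(self, updates):
        self.messages += 1
        self.data.update(updates)


class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.local = {}

    def write(self, key, value):
        self.local[key] = value
        self.server.store({key: value})  # every write goes out immediately


class WriteBehindCache:
    def __init__(self, server):
        self.server = server
        self.local = {}
        self.dirty = {}

    def write(self, key, value):
        self.local[key] = value
        self.dirty[key] = value          # accumulate modifications locally

    def flush(self):
        if self.dirty:
            self.server.store(self.dirty)  # one batched message
            self.dirty = {}


wt_server, wb_server = Server(), Server()
wt, wb = WriteThroughCache(wt_server), WriteBehindCache(wb_server)

for i in range(3):
    wt.write(f"block{i}", i)
    wb.write(f"block{i}", i)

stale_before_flush = dict(wb_server.data)  # the server has seen nothing yet
wb.flush()
```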

NFS

File handle (NFS)
An opaque server-generated identifier for a file or directory, used by the client in all subsequent requests; persists across server restarts to support stateless operation.
Network Lock Manager (NLM)
A separate service added alongside NFS to provide advisory file locking, a stateful function that could not be part of the stateless NFS protocol itself.
NFS (Network File System)
An RPC-based distributed file system, stateless in its original design (through NFSv3), built for simplicity and cross-platform interoperability; uses file handles and timestamp-based cache validation.
Silly rename
An NFS client workaround for the lack of open-file reference counting: a file to be deleted while locally open is renamed to a hidden name and deleted only when the last local file descriptor closes.
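The workaround can be sketched with a dictionary standing in for the server's directory. All names here are hypothetical, and the hidden-name format is only illustrative.

```python
class OpenFile:
    def __init__(self, name):
        self.name = name
        self.refs = 0
        self.delete_on_close = False


class SillyRenameClient:
    def __init__(self, server_dir):
        self.server_dir = server_dir  # name -> data, stands in for the server
        self.open_files = {}          # current name -> OpenFile
        self.counter = 0

    def open(self, name):
        f = self.open_files.setdefault(name, OpenFile(name))
        f.refs += 1
        return f

    def read(self, f):
        return self.server_dir[f.name]

    def unlink(self, name):
        f = self.open_files.get(name)
        if f is None:
            del self.server_dir[name]  # not open locally: delete right away
            return
        # Open locally: rename to a hidden name instead of deleting.
        self.counter += 1
        hidden = f".nfs{self.counter:04d}"
        self.server_dir[hidden] = self.server_dir.pop(name)
        self.open_files[hidden] = self.open_files.pop(name)
        f.name = hidden
        f.delete_on_close = True

    def close(self, f):
        f.refs -= 1
        if f.refs == 0:
            del self.open_files[f.name]
            if f.delete_on_close:
                del self.server_dir[f.name]  # last close: really delete


server_dir = {"report.txt": b"data"}
client = SillyRenameClient(server_dir)

f = client.open("report.txt")
client.unlink("report.txt")
names_while_open = list(server_dir)
still_readable = client.read(f)
client.close(f)
```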
Timestamp validation
The NFS mechanism for cache consistency; the client compares the server’s file modification time against its cached copy and discards the cache if the server’s version is newer.
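A minimal sketch of the comparison, assuming the client revalidates on every open. The class and method names are hypothetical and do not correspond to actual NFS procedures.

```python
class FileServer:
    def __init__(self):
        self.files = {}  # name -> (modification time, data)

    def write(self, name, data, mtime):
        self.files[name] = (mtime, data)

    def get_mtime(self, name):
        return self.files[name][0]

    def read(self, name):
        return self.files[name]


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # name -> (modification time, data)

    def open_and_read(self, name):
        cached = self.cache.get(name)
        # Discard the cached copy if the server's version is newer.
        if cached is None or self.server.get_mtime(name) > cached[0]:
            self.cache[name] = self.server.read(name)
        return self.cache[name][1]


server = FileServer()
client = CachingClient(server)

server.write("f", b"v1", mtime=100)
first = client.open_and_read("f")

server.write("f", b"v2", mtime=200)  # another client modifies the file
second = client.open_and_read("f")
```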

AFS

AFS callback promise
A server-to-client guarantee that the server will send a revocation if a cached file is modified, allowing the client to use its local copy indefinitely without polling.
AFS callback revocation
A notification sent by an AFS server to all clients that have cached a file, telling them to invalidate their copy because the file has been modified.
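The interplay of callback promises and revocations can be sketched as below. This is an in-memory illustration with hypothetical class names, not the AFS protocol itself.

```python
class CallbackServer:
    def __init__(self):
        self.files = {}
        self.callbacks = {}  # name -> set of clients holding a promise

    def fetch(self, name, client):
        # Handing out the file includes a promise to notify on change.
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, name, data, writer):
        self.files[name] = data
        # Revoke the promise held by every other caching client.
        for client in self.callbacks.pop(name, set()):
            if client is not writer:
                client.revoke(name)
        self.callbacks.setdefault(name, set()).add(writer)


class CallbackClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.fetches = 0

    def open(self, name):
        if name not in self.cache:  # no polling while the promise holds
            self.fetches += 1
            self.cache[name] = self.server.fetch(name, self)
        return self.cache[name]

    def close_after_write(self, name, data):
        self.cache[name] = data
        self.server.store(name, data, writer=self)

    def revoke(self, name):
        self.cache.pop(name, None)


server = CallbackServer()
server.files["notes"] = "v1"
a, b = CallbackClient(server), CallbackClient(server)

a.open("notes")
a.open("notes")                      # served from a's cache, no server contact
fetches_before_write = a.fetches

b.open("notes")
b.close_after_write("notes", "v2")   # server revokes a's callback
a_sees = a.open("notes")
```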
Andrew File System (AFS)
A distributed file system using whole-file caching and server callbacks to achieve high scalability; enforces a uniform global namespace under /afs.
Uniform global namespace
An AFS property ensuring that the path to any file is identical on every client machine, in contrast to NFS where administrators mount remote directories at arbitrary local paths.
Volume (AFS/Coda)
An administrative unit of the AFS and Coda file systems, typically corresponding to a user’s home directory or a software repository; can be moved between servers transparently via referrals.
Whole-file caching
The AFS model in which an entire file is downloaded to the client’s local disk on open; reads and writes operate on the local copy until the file is closed.

Coda

Accessible Volume Storage Group (AVSG)
The subset of a VSG’s servers that a Coda client can currently reach; reads are served from any AVSG member and writes go to all of them simultaneously.
Client modification log (CML)
In Coda, a log of file system operations (store, create, remove, rename) recorded during disconnected operation; actual file data remains in the local disk cache and is uploaded when the CML is replayed on reconnection.
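Replay of a modification log can be sketched as follows. The single version counter per file is a simplifying assumption made for this example, and `DisconnectedClient` is a hypothetical name; only `store` operations are modeled.

```python
class DisconnectedClient:
    def __init__(self, cache):
        self.cache = cache  # path -> (version last seen, data)
        self.cml = []       # logged operations: (operation, path)

    def store(self, path, data):
        version = self.cache.get(path, (0, None))[0]
        self.cache[path] = (version, data)  # data stays in the local cache
        self.cml.append(("store", path))    # only the operation is logged

    def reintegrate(self, server):
        conflicts = []
        for op, path in self.cml:
            base_version, data = self.cache[path]
            server_version = server.get(path, (0, None))[0]
            if server_version != base_version:
                conflicts.append(path)      # changed on the server meanwhile
            else:
                server[path] = (base_version + 1, data)
                self.cache[path] = (base_version + 1, data)
        self.cml = []
        return conflicts


server = {"a": (1, "old a"), "b": (1, "old b")}
client = DisconnectedClient(cache=dict(server))

# While disconnected:
client.store("a", "new a")
client.store("b", "new b")
server["b"] = (2, "someone else")  # another client updates b on the server

conflicts = client.reintegrate(server)
```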
Coda
A distributed file system extending AFS to support disconnected operation via a client modification log, conflict detection on reconnection, and cache hoarding.
Disconnected operation
A Coda operating mode in which the client works entirely from its local disk cache when no server is reachable, recording operations in the CML for later replay.
Hoarding
In Coda, the pre-population of the local cache with user-specified files before going offline to ensure they are available during disconnected operation.
Volume Storage Group (VSG)
The set of servers that host a replicated Coda volume; the client checks that all accessible VSG members have consistent versions of a file before opening it.

NFSv4 and SMB

Compound RPC
An NFSv4 and SMB 2 feature that packs multiple operations into a single network message, reducing round-trip latency for common operation sequences.
Delegation (NFSv4)
A server-granted right allowing a client to cache reads or writes for a file without contacting the server; the server recalls the delegation if another client creates a conflict.
Distributed File System (DFS)
A Microsoft namespace service, separate from SMB itself, that maps logical paths to physical server locations and issues referrals so clients are transparently redirected when content moves; predates SMB 2 and works across SMB versions.
Durable handle
An SMB 2 feature that preserves open file state across brief client disconnections, allowing the client to resume without reopening files or re-establishing locks.
Oplock break
A server-to-client message in SMB that revokes a previously granted opportunistic lock, requiring the client to flush any cached writes to the server before the conflicting access can proceed.
Opportunistic lock (oplock)
An SMB server-granted caching right; the server monitors access and sends an oplock break to the caching client when a conflict arises, requiring it to flush cached writes before the competing open proceeds.
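The oplock break sequence can be sketched with two clients opening the same file. The classes are hypothetical and the model is heavily simplified: there is a single oplock type, and the client does not check that it holds the oplock before caching a write.

```python
class OplockServer:
    def __init__(self):
        self.data = {}
        self.oplock_holder = {}  # name -> client granted the oplock

    def open(self, name, client):
        holder = self.oplock_holder.get(name)
        if holder is not None and holder is not client:
            # Conflict: break the oplock before the new open proceeds.
            holder.oplock_break(name)
            del self.oplock_holder[name]
        elif holder is None:
            self.oplock_holder[name] = client  # sole opener: grant caching
        return self.data.get(name)


class OplockClient:
    def __init__(self, server):
        self.server = server
        self.cached_writes = {}

    def open(self, name):
        return self.server.open(name, self)

    def write(self, name, data):
        self.cached_writes[name] = data  # cached locally under the oplock

    def oplock_break(self, name):
        # Flush cached writes so the competing open sees current data.
        if name in self.cached_writes:
            self.server.data[name] = self.cached_writes.pop(name)


server = OplockServer()
a, b = OplockClient(server), OplockClient(server)

a.open("doc")
a.write("doc", "draft 2")
on_server_before = server.data.get("doc")  # nothing flushed yet
seen_by_b = b.open("doc")                  # triggers the oplock break
```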
SMB (Server Message Block)
Microsoft’s stateful, connection-oriented file sharing protocol; tracks open files, locks, and sessions at the server and enforces Windows file-sharing semantics.
SMB Multichannel
An SMB 3 feature that allows a single session to use multiple network interfaces simultaneously, increasing throughput and providing path redundancy.
Transparent Failover
An SMB 3 feature that allows a client connected to a clustered file server to survive the failure of one cluster node without losing open files or locks, because session state is shared across cluster nodes.