From: Eric Werme USG (werme@zk3.dec.com)
Date: 09/14/99-03:54:32 PM Z
Subject: Re: fsync() fails under NFS, right?
Message-Id: <199909142054.QAA0000745099@anw.zk3.dec.com>

> It occurred to me this is the meat of the argument. You say it's all client business. OK, my client wants to do fsync(). Now your server receives WRITE requests. HOW does my client know whether your server WROTE THROUGH to disk or CACHED my block?

If the server is properly written and the client sends V2 writes or V3 synchronous writes, the server WILL NOT REPLY UNTIL THE DATA IS ON THE DISK. Period.

If you are using a stock Linux server, you are using NFS V2, and I believe the Linux server does not wait for writes to reach the disk before replying. That's why many people see much better performance from Linux servers than from the servers of vendors you pay money to. Fast, cheap, or reliable: pick two.

If you're staking your business on Linux, you need a QA department patterned after the major vendors' departments. This mailing list is definitely not a Linux support hotline!

> How can I be sure that if your server crashes, the thing written before an fsync() doesn't die asleep in that remote cache?

In the above examples, the client will retransmit the write until our server reboots, finally gets the data to disk, and then replies.

Um, our servers don't crash. :-)

If you're using V3 async writes, then the client has no way of telling whether the data is in the cache or on the disk. However, fsync() will block until all the data is on the disk. The client will send a COMMIT at some point (e.g. on fsync()). A "write verifier" is used to determine whether the server rebooted between replying to the write and replying to the commit. If the server rebooted, the client retransmits the writes that the commit may have missed. Either synchronous writes will be done, or asynchronous writes followed by another commit, the latter checking the write verifier again.
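To make the V3 exchange concrete, here is a toy Python simulation of the client side of fsync() over async writes. It is a sketch under stated assumptions, not real NFS code: ToyServer, client_fsync and the crash_after_write hook are hypothetical names, COMMIT covers the whole file instead of an offset/count range, and the retry pass resends with async writes plus another commit (the second of the two options above). In the actual NFS Version 3 specification (RFC 1813), WRITE carries a stable_how argument (UNSTABLE, DATA_SYNC, FILE_SYNC) and both the WRITE and COMMIT replies carry a writeverf3 verifier.

```python
from dataclasses import dataclass, field

@dataclass
class ToyServer:
    """Hypothetical stand-in for an NFSv3 server, not real NFS code."""
    verifier: int = 1                           # changes on every reboot
    cache: dict = field(default_factory=dict)   # volatile: lost in a crash
    disk: dict = field(default_factory=dict)    # stable storage

    def write_unstable(self, offset, data):
        # Async (UNSTABLE) write: reply as soon as the block is cached.
        self.cache[offset] = data
        return self.verifier

    def commit(self):
        # COMMIT: flush the cache to disk, then reply with the verifier.
        self.disk.update(self.cache)
        self.cache.clear()
        return self.verifier

    def crash_and_reboot(self):
        self.cache.clear()   # cached, uncommitted writes are gone
        self.verifier += 1   # a new boot instance gets a new verifier


def client_fsync(server, blocks, crash_after_write=None):
    """Return only once every block is known to be on the server's disk.

    crash_after_write is a hypothetical fault-injection hook: the index of
    the write after which the server crashes (first pass only).
    """
    pending = dict(blocks)  # offset -> data not yet known to be stable
    while pending:
        replies = {}
        for i, (offset, data) in enumerate(pending.items()):
            replies[offset] = server.write_unstable(offset, data)
            if crash_after_write == i:
                server.crash_and_reboot()
                crash_after_write = None
        commit_verf = server.commit()
        # A write is stable only if its reply carried the same verifier as
        # the COMMIT reply; anything else was cached by an earlier boot
        # instance and may be lost, so it goes round again.
        pending = {off: pending[off]
                   for off, verf in replies.items() if verf != commit_verf}
```

For example, `s = ToyServer(); client_fsync(s, {0: b"a", 8192: b"b"}, crash_after_write=0)` loses the first write in the crash, notices the changed verifier on COMMIT, resends that block, and returns with both blocks in `s.disk`.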
We've been through this many times now. Please learn this; I'm not typing it again, but I might mail you the V2 and V3 specs....

> It seems to us our NFS servers have huge caches and silently swallow our small 100 MB file. :-) (All our machines, Linux or Sun, start at 0.5 GB RAM.)
>
> IT'S TIME FOR SOME PROOF.

Take a Sun-Sun system and watch it write a small file with snoop. It will work, and you will see substantial delays between the commit request and the response. A Linux client with a Sun server will probably work if fsync() isn't a no-op. A Linux server will probably reply too quickly in all cases.

If you want to prove a failure, start writing a big file and turn off the server. Turn it back on and let it reboot. Let the client finish, then verify the file on the server. Try again, but push reset during the write. Repeat until you have a corrupted file, then send it and the snoop/tcpdump trace to your vendor's support department.

-Ric Werme
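The write-then-verify half of that crash test can be sketched in Python. This is an illustration under assumptions, not a tool from the thread: the 8 KB block size, the per-block pattern, and the names write_test_file and find_bad_blocks are all hypothetical. The writer fills each block with a pattern derived from its index and calls fsync(); after the power-off or reset cycle, the checker rereads the file on the server and reports every block that is missing or wrong. Cutting power and capturing the snoop/tcpdump trace are still done by hand.

```python
from __future__ import annotations

import os
import struct

BLOCK = 8192  # assumed block size; 8 KB is the NFS V2 maximum transfer size

def pattern(i: int) -> bytes:
    """Hypothetical per-block contents: a tag plus the block index, repeated.

    Never all zeros, so a zero-filled hole left by a lost write is caught.
    """
    return struct.pack(">4sI", b"NFS!", i) * (BLOCK // 8)

def write_test_file(path: str, nblocks: int) -> None:
    with open(path, "wb") as f:
        for i in range(nblocks):
            f.write(pattern(i))
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # should not return until the data is on the server's disk

def find_bad_blocks(path: str, nblocks: int) -> list[int]:
    """Return indices of blocks that are missing or do not match pattern()."""
    bad = []
    with open(path, "rb") as f:
        for i in range(nblocks):
            if f.read(BLOCK) != pattern(i):  # a short read past EOF counts as bad
                bad.append(i)
    return bad
```

Run write_test_file on the client against a path on the NFS mount, cut the server's power or push reset while it runs, and once the client finishes, run find_bad_blocks on the server's copy with the same block count. An empty list means that attempt found no corruption; any indices it returns point at the blocks to look for in the trace.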