Re: CAM Target Layer and Linux (continued)

On Oct 4, 2012, at 2:52 AM, Chuck Tuffli <ctuffli@gmail.com> wrote:

> On Tue, Oct 2, 2012 at 3:03 AM, Nikolay Denev <ndenev@gmail.com> wrote:
>>
>> On Sep 27, 2012, at 6:33 PM, Nikolay Denev <ndenev@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> With the help of Chuck Tuffli, I'm now able to use CTL to export a zvol over FC to a Linux host:
>>>
>>> LUN Backend       Size (Blocks)   BS Serial Number    Device ID
>>>  0 block            4185915392  512 FBSDZFS001       ORA_ASM_01
>>>      lun_type=0
>>>      num_threads=14
>>>      file=/dev/zvol/tank/oracle_asm_01
>>>  1 block            4185915392  512 FBSDZFS002       ORA_ASM_02
>>>      lun_type=0
>>>      num_threads=14
>>>      file=/dev/zvol/tank/oracle_asm_02
>>>  2 block            4185915392  512 FBSDZFS003       ORA_ASM_03
>>>      lun_type=0
>>>      num_threads=14
>>>      file=/dev/zvol/tank/oracle_asm_03
>>>  3 block            4185915392  512 FBSDZFS004       ORA_ASM_04
>>>      lun_type=0
>>>      num_threads=14
>>>      file=/dev/zvol/tank/oracle_asm_04
>>>
>>> Then we ran some tests using Oracle's ORION benchmark tool from the Linux host.
>>> We ran one test which passed successfully,
>>> then I disabled ZFS prefetch ("vfs.zfs.prefetch_disable=1")
>>> and reran the test, which failed with this error.
>>>
>>> On the FreeBSD side:
>>>
>>> (0:3:0:1): READ(10). CDB: 28 0 84 f9 58 0 0 4 0 0
>>> (0:3:0:1): Tag: 0x116220, Type: 1
>>> (0:3:0:1): CTL Status: SCSI Error
>>> (0:3:0:1): SCSI Status: Check Condition
>>> (0:3:0:1): SCSI sense: NOT READY asc:4b,0 (Data phase error)
> ...
>> After a whole day of ORION tests without problems, we started an Oracle ASM instance from the Linux host and
>> again got an error; this time it was a WRITE error:
>>
>> (0:3:0:3): WRITE(10). CDB: 2a 0 1 5b 10 0 0 4 0 0
>> (0:3:0:3): Tag: 0x110940, Type: 1
>> (0:3:0:3): CTL Status: SCSI Error
>> (0:3:0:3): SCSI Status: Check Condition
>> (0:3:0:3): SCSI sense: NOT READY asc:4b,0 (Data phase error)
>>
>> I've tried to track down this "Data phase error" in the CTL code and it looks like it is something related to the isp(4) driver:
>
> This would have been my first guess if there had been something in the
> logs from isp, but since there wasn't, it's hard to tell. I've been
> running orion for ~3hrs now with a different FC driver + an analyzer
> but haven't seen this problem.
>
> Would it be possible to stick some prints in the default clause of
> ctlfedone() to confirm whether this is a front-end or back-end problem?
> Especially interesting would be the value of done_ccb->ccb_h.status.
>
> ---chuck

I have added a printf like this:

--- sys/cam/ctl/scsi_ctl.c.orig	2012-10-04 10:52:57.413144029 +0200
+++ sys/cam/ctl/scsi_ctl.c	2012-10-04 11:23:35.501143149 +0200
@@ -1415,6 +1415,7 @@
 				 */
 				io->io_hdr.port_status = 0xbad1;
 				ctl_set_data_phase_error(&io->scsiio);
+				printf("XXX: done_ccb->ccb_h.status = %lu\n", (long unsigned int)done_ccb->ccb_h.status);
 				/*
 				 * XXX KDM figure out residual.
 				 */
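
A note on reading the number this printf produces: to my knowledge ccb_h.status carries flag bits on top of the status code, so the raw value is not directly a cam_status enum value. Below is a minimal userland sketch of splitting it. The mask and flag values are assumed to match sys/cam/cam.h and should be checked against the tree being built; the ex_ names are made up for illustration and are not part of CAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed values, intended to mirror CAM_STATUS_MASK, CAM_DEV_QFRZN and
 * CAM_AUTOSNS_VALID in sys/cam/cam.h; verify against the actual header.
 */
#define EX_CAM_STATUS_MASK	0x3fu	/* low bits: the cam_status code */
#define EX_CAM_DEV_QFRZN	0x40u	/* flag: device queue is frozen */
#define EX_CAM_AUTOSNS_VALID	0x80u	/* flag: autosense data is valid */

/* Hypothetical holder for the split-out fields; not a CAM type. */
struct ex_cam_status {
	uint32_t code;			/* status code with flags masked off */
	int	 qfrzn;			/* 1 if the queue-frozen flag is set */
	int	 autosense_valid;	/* 1 if the autosense flag is set */
};

/* Split a raw ccb_h.status value into its status code and two flags. */
static void
ex_split_cam_status(uint32_t raw, struct ex_cam_status *out)
{
	assert(out != NULL);
	out->code = raw & EX_CAM_STATUS_MASK;
	out->qfrzn = (raw & EX_CAM_DEV_QFRZN) != 0;
	out->autosense_valid = (raw & EX_CAM_AUTOSNS_VALID) != 0;
}
```

With these assumed values, a printed status of 68 (0x44) would split into code 0x04 with the queue-frozen flag set. The code then has to be looked up in the cam_status enum in cam.h.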

But I've postponed the tests because the pool got nearly full and the zvols probably became very fragmented:
they were extremely slow to access and generated SCSI timeout and abort command errors on the Linux host.
Even deleting them took roughly 40 minutes.

There was also a bad interaction between accessing the zvols over CAM and using an NFS share from this host at the same time,
which brought all disk I/O on the pool almost to a stop.

I will create a new zvol tomorrow and retest with the printf enabled while the machine is idle (no NFS activity).
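
To match any new failures against the zvol layout, the CDB bytes in the CTL log lines can be decoded by hand. Here is a small sketch, assuming the standard 10-byte READ(10)/WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8). The struct and function names are hypothetical and do not come from CTL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the decoded fields; not a CTL or CAM type. */
struct ex_rw10 {
	uint8_t	 opcode;	/* 0x28 = READ(10), 0x2a = WRITE(10) */
	uint32_t lba;		/* CDB bytes 2..5, big-endian */
	uint16_t nblocks;	/* CDB bytes 7..8, big-endian */
};

/*
 * Decode a 10-byte READ(10)/WRITE(10) CDB.
 * Returns 0 on success, -1 if the opcode is neither of the two.
 */
static int
ex_decode_rw10(const uint8_t cdb[10], struct ex_rw10 *out)
{
	assert(cdb != NULL && out != NULL);
	if (cdb[0] != 0x28 && cdb[0] != 0x2a)
		return (-1);
	out->opcode = cdb[0];
	out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
	    ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
	out->nblocks = (uint16_t)(((uint16_t)cdb[7] << 8) | cdb[8]);
	return (0);
}
```

Applied to the failed READ(10) from the earlier log (28 0 84 f9 58 0 0 4 0 0), this gives LBA 0x84f95800 and a transfer length of 0x400 blocks, which is 512 KiB at the 512-byte block size shown in the LUN list. The failed WRITE(10) (2a 0 1 5b 10 0 0 4 0 0) decodes to LBA 0x015b1000 with the same length.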


_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"