Discussion:
[OSPL-Dev] Segfault when creating participant
S. Poehlsen
16 years ago
Hello, it's me again.

I forgot to dispose and unregister my messages in my programs.
After sending too many messages I was not able to start my program
anymore because DDS_DomainParticipantFactory_create_participant() failed
with a NULL pointer.

I think it would be better to return a proper error message, since the
failing program does not really have anything to do with the misbehavior
of the other programs.

A valgrind output looks like this:

==24043== Invalid write of size 4
==24043== at 0x46F1EC7: c_new (in .../HDE/x86.linux2.6/lib/libddsdatabase.so)
==24043== by 0x4955927: v_qosCreate (in .../HDE/x86.linux2.6/lib/libddskernel.so)
==24043== by 0x49535E4: v_participantQosNew (in .../HDE/x86.linux2.6/lib/libddskernel.so)
==24043== by 0x49532A2: v_participantNew (in .../HDE/x86.linux2.6/lib/libddskernel.so)
==24043== by 0x49ADFE4: u_participantNew (in .../HDE/x86.linux2.6/lib/libddsuser.so)
==24043== by 0x49E67F5: _DomainParticipantNew (in .../HDE/x86.linux2.6/lib/libdcpsgapi.so)
==24043== by 0x49E7CC0: gapi_domainParticipantFactory_create_participant (in .../HDE/x86.linux2.6/lib/libdcpsgapi.so)
==24043== by 0x472F22F: DDS_DomainParticipantFactory_create_participant (in .../HDE/x86.linux2.6/lib/libdcpssac.so)
==24043== by 0x804C41D: main (rpctest.c:177)
==24043== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==24043==
==24043== Process terminating with default action of signal 11 (SIGSEGV)


In ospl-error.log I found the following error message.

========================================================================================
Report : ERROR
Date : Fri May 15 15:05:39 2009
Description : Memory claim denied: required size (64) exceeds available resources (104)!
Node : acc
Process : 24679
Thread : main thread 4a59bc0
Internals : V4.1.090513/c_mmbase/c_mmbase.c/252/0/280393987


Steph
S. Poehlsen
16 years ago
Hello.

Sometimes the OpenSplice daemon seems to break and bring down programs
that join later. In this case the attached example code dies with a
segmentation fault in line 16.

How can I check in an external program whether OpenSplice works before
trying to get the participant? Shouldn't it be possible to get an
appropriate return code from
DDS_DomainParticipantFactory_create_participant()?

Steph


==16373== Invalid write of size 4
==16373== at 0x46F1EC7: c_new (in .../HDE/x86.linux2.6/lib/libddsdatabase.so)
==16373== by 0x494C60E: v_objectNew (in .../HDE/x86.linux2.6/lib/libddskernel.so)
==16373== by 0x49532BB: v_participantNew (in .../HDE/x86.linux2.6/lib/libddskernel.so)
==16373== by 0x49ADFE4: u_participantNew (in .../HDE/x86.linux2.6/lib/libddsuser.so)
==16373== by 0x49E67F5: _DomainParticipantNew (in .../HDE/x86.linux2.6/lib/libdcpsgapi.so)
==16373== by 0x49E7CC0: gapi_domainParticipantFactory_create_participant (in .../HDE/x86.linux2.6/lib/libdcpsgapi.so)
==16373== by 0x472F22F: DDS_DomainParticipantFactory_create_participant (in .../HDE/x86.linux2.6/lib/libdcpssac.so)
==16373== by 0x804B24E: main (bug.c:16)
==16373== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==16373==
==16373== Process terminating with default action of signal 11 (SIGSEGV)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug.c
Type: text/x-csrc
Size: 798 bytes
Desc: not available
URL: <http://dev.opensplice.org/pipermail/developer/attachments/20090518/d597f554/attachment.c>
S. Poehlsen
16 years ago
Hello,

it is me again. I just discovered a few log messages in /var/log/kern.log.

May 18 11:48:55 acc kernel: [11292.382943] durability[12790]: segfault at 0 ip 00000000f7d30ec7 sp 00000000f73cb230 error 6 in libddsdatabase.so[f7d28000+32000]
May 18 11:49:14 acc kernel: [11310.850486] networking[12771]: segfault at 4 ip 00000000f7cb2cbd sp 00000000f74331f0 error 6 in libddsdatabase.so[f7caa000+32000]
May 18 12:58:15 acc kernel: [15452.607921] rpctest[15308]: segfault at 0 ip 00000000f7ef7ec7 sp 00000000ff840bc0 error 6 in libddsdatabase.so[f7eef000+32000]
May 18 12:58:16 acc kernel: [15453.085148] durability[15303]: segfault at 0 ip 00000000f7cf0ec7 sp 00000000f7288230 error 6 in libddsdatabase.so[f7ce8000+32000]
May 18 13:27:40 acc kernel: [17217.236501] bug[16368]: segfault at 0 ip 00000000f7fa1ec7 sp 00000000ffcedc70 error 6 in libddsdatabase.so[f7f99000+32000]
May 18 13:52:49 acc kernel: [18725.772874] rpctest[17172]: segfault at 0 ip 00000000f7eb4ec7 sp 00000000ff9fe7b0 error 6 in libddsdatabase.so[f7eac000+32000]
May 18 13:52:53 acc kernel: [18729.945846] networking[15284]: segfault at 4 ip 00000000f7ca5cbd sp 00000000f74261f0 error 6 in libddsdatabase.so[f7c9d000+32000]
May 18 14:19:36 acc kernel: [20333.336974] rpctest[18150]: segfault at 0 ip 00000000f7eb2ec7 sp 00000000ff8fc6f0 error 6 in libddsdatabase.so[f7eaa000+32000]
May 18 14:19:36 acc kernel: [20333.337261] networking[17206]: segfault at 0 ip 00000000f7ca1ec7 sp 00000000f7454f60 error 6 in libddsdatabase.so[f7c99000+32000]
May 18 14:19:41 acc kernel: [20338.445495] durability[17228]: segfault at 0 ip 00000000f7cd3ec7 sp 00000000f7266230 error 6 in libddsdatabase.so[f7ccb000+32000]


Another curious thing: I have three nodes running ospl, and data was
being transferred between two nodes while the third node raised some
segfaults.

Steph
Hans van't Hag
16 years ago
Hi Steph,
From the log-files and your description it is clear that your SegFaults
are related to exhaustion of the shared memory segment that is utilized
by OpenSplice DDS to efficiently share information within a single
node. As this memory resource is limited, you have to be careful with
'unrestricted usage'; the related DDS QoS-policies are the available
means in DDS to prevent monopolization of platform resources. We're
looking into potential extensions of the standard and/or our product to
define watermarks/thresholds (with related triggers) to allow for
better control from an application perspective as well as more graceful
middleware behavior upon 'exhaustion'. However, in many cases it is
flaws in the information-model and/or application information
processing that cause memory exhaustion. Let's summarize a few typical
cases:



Information model: keys & instances
From an information-model perspective, using 'keys' in DDS is a
powerful database-feature, yet it induces a 'storage-spectrum' that
depends on the number of keys and their values. Each unique key-value
identifies and implies an 'instance' that relates to both the data
itself (the samples belonging to that instance) and the meta-data that
describes the state of each instance (see chapter 7.1.2.5.1 of the DDS
specification w.r.t. instance_state and view_state). As instances (and
the related storage) are created explicitly, using register_instance(),
or implicitly (by writing a sample with implicit registration), they
should also be deleted to free up the related resources (again either
explicitly, by calling unregister_instance(), or implicitly, by
deletion of the related dataWriter). So using increasing/unbounded
key-values without managing the related instances can easily lead to
memory exhaustion. Please note that 'taking' samples with a dataReader
doesn't remove/unregister the related instance-administration, as it
only 'removes' the samples from the dataReader cache (as explained
below).
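
As an illustration, a typical writer-side instance lifecycle could look
like the sketch below; this is a minimal sketch, assuming a
hypothetical 'Msg' type (with fields 'id' and 'payload', 'id' being the
key) and the type-specific functions that the IDL preprocessor would
generate for it:

    Msg msg;
    DDS_InstanceHandle_t handle;

    msg.id = DDS_string_dup("some-unique-key-value");  /* hypothetical key field */
    msg.payload = DDS_string_dup("...");               /* hypothetical data field */

    /* explicit registration creates the instance administration */
    handle = MsgDataWriter_register_instance(writer, &msg);

    MsgDataWriter_write(writer, &msg, handle);

    /* dispose and unregister free the instance-related resources again;
       without these, every new key-value leaves instance
       administration behind */
    MsgDataWriter_dispose(writer, &msg, handle);
    MsgDataWriter_unregister_instance(writer, &msg, handle);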



Application: history & resource-limits
From an application-processing perspective, there are some QoS policies
for 'writers' and 'readers' that dramatically impact resource (memory)
usage and/or help in controlling/bounding it. First of all, when using
the concept of HISTORY, published samples will be maintained as
historical data for (later) application processing, which of course
utilizes resources. Especially the KEEP_ALL policy-value (indicating
that all samples must be maintained) can cause unlimited memory usage
in case no RESOURCE_LIMITS are specified for the related dataWriter or
dataReader. As explained above, using the take() method of the
dataReader will remove the topic-sample(s) from the history and so will
free up memory, whereas the read() method provides the data (samples
and related sampleInfo meta-data) without removing it from the
dataReader history (so that it can be queried/read again).
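
For example, bounding a dataReader could look like the following
sketch ('subscriber' and 'topic' are assumed to exist already, and the
chosen numbers are arbitrary):

    DDS_DataReader reader;
    DDS_DataReaderQos *drQos = DDS_DataReaderQos__alloc();

    DDS_Subscriber_get_default_datareader_qos(subscriber, drQos);

    /* KEEP_LAST with a finite depth bounds the history per instance */
    drQos->history.kind = DDS_KEEP_LAST_HISTORY_QOS;
    drQos->history.depth = 10;

    /* resource-limits put a hard upper bound on memory usage */
    drQos->resource_limits.max_samples = 1000;
    drQos->resource_limits.max_instances = 100;
    drQos->resource_limits.max_samples_per_instance = 10;

    reader = DDS_Subscriber_create_datareader(subscriber, topic, drQos,
        NULL /* no listener */, DDS_STATUS_MASK_NONE);
    DDS_free(drQos);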



So now back to your use-case: it could be that there's nothing wrong
with how you've designed your information-model (topics/keys) or with
how you're processing the information (using history/resource-limits),
but that you have simply configured the middleware with a too-small
shared-memory segment (as defined in the domain-URI/configuration XML
file under "Opensplice>Domain>Database>Size"), in which case you just
have to increase the shared-memory Database size. On the other hand, it
could also mean that you're 'dynamically' running out of resources as
described above.
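
For reference, a minimal sketch of the relevant fragment of such a
configuration file (element names as quoted above; the surrounding
structure and the chosen value are just an example):

    <OpenSplice>
       <Domain>
          ...
          <Database>
             <Size>41943040</Size>  <!-- shared-memory size in bytes -->
          </Database>
          ...
       </Domain>
    </OpenSplice>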



There's also a useful tool included in the latest community-edition
release (called 'mmstat') that can be used to track the utilization of
the shared-memory segment at runtime. Mmstat provides detailed insight
into the shared-memory utilization of your running OpenSplice DDS
system. It has both basic and expert modes to keep track of actual and
maximum shared-memory usage, and as such can help both in sizing the
shared-memory segment and in pinpointing issues w.r.t. data-design
(such as continuously increasing key-values) and/or application
behavior (such as applications that don't bound their resource usage).



Hope this helps,



Cheers,

Hans

Hans van 't Hag

OpenSplice DDS Product Manager

PrismTech Netherlands

Email: hans.vanthag at prismtech.com

Tel: +31742472572

Fax: +31742472571

Gsm: +31624654078



S. Poehlsen
16 years ago
Hi,
*Information model: keys & instances*
I am using a unique 36-character string (a UUID) as my key for each
message. After each write I immediately dispose and
unregister_instance, so the memory should, from the writer's point of
view, be freed.
*Application: history & resource-limits*
On the reader side I always use take() to remove the samples and
return_loan() to release the memory allocated by the reader.

I have not changed any QoS settings, so it should just work fine.

There should always be just one request and one response message in the
system, since my program starts the next call only after receiving the
result from the previous call.
There's also a useful tool included in the latest community-edition
release now (called 'mmstat') that can be used to track the utilization
of the shared-memory-segment at runtime. Mmstat provides detailed
I cannot find mmstat on the opensplice website. From the blogpost
http://opensplice.blogspot.com/2009/05/opensplice-dds-v41-open-source-whats.html
I assume that it will be released next week?


I just discovered a new thing. I read Timothy's email about waiting 10
msec. When I insert a usleep(1000) (just 1 ms), everything works fine;
I am able to start my rpc client as often as I would like to.
So I just added a new command-line argument to set the number of usecs
to sleep each time I call my client. Now I am able to call the client
without a sleeping delay, as long as every now and then I call it with
a sleeping delay.

A side-effect is that the median roundtrip time increases by about
0.4 ms with a sleep between two calls.

Is there a garbage collector in the opensplice daemon that may produce
that behavior?

Steph
Hans van't Hag
16 years ago
Hi Steph,



A few remarks: when a writer unregisters an instance, it affects
storage both at the writing side (see 7.1.2.4.2.8 of the DDS
specification) and at the reading side (see 7.1.2.5.1.8, where the
specification states: ".. once an instance has been detected as not
having any 'live' writers and all the samples associated with the
instance are 'taken' from the DataReader, the middleware can reclaim
all local resources regarding the instance").



So doing a take() on the reader will remove the sample, but ONLY when
the instance is (implicitly or explicitly) unregistered by the writer
will the instance(-state) information be removed at the reader.



And to complete the whole story, reminding you of the 'dummy-samples'
story of last Friday ([OSPL-Dev] Too many messages for PingPong example
and RPC over DDS on pong/server side): the write-side
unregister_instance() action (and the related auto-dispose if the
autodispose_unregistered_instances QoS is still at its default 'TRUE'
value) might result in a dummy-sample (with the valid_data flag set to
FALSE) in case the samples were already taken from the dataReader cache
(which - as you probably remember - is to allow triggering of the
NOT_ALIVE_NO_WRITERS or NOT_ALIVE_DISPOSED instance states). These
dummy-samples then also have to be 'taken' in order for the resources
to be (finally) freed.
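
In code, taking everything (including such dummy-samples) and only
processing the valid ones could look like the following sketch (again
assuming a hypothetical generated 'Msg' type and a 'process()' helper
of your own):

    DDS_sequence_Msg *msgSeq = DDS_sequence_Msg__alloc();
    DDS_SampleInfoSeq *infoSeq = DDS_SampleInfoSeq__alloc();
    DDS_unsigned_long i;

    MsgDataReader_take(reader, msgSeq, infoSeq, DDS_LENGTH_UNLIMITED,
        DDS_ANY_SAMPLE_STATE, DDS_ANY_VIEW_STATE, DDS_ANY_INSTANCE_STATE);

    for (i = 0; i < msgSeq->_length; i++) {
        if (infoSeq->_buffer[i].valid_data) {
            process(&msgSeq->_buffer[i]);  /* a real sample */
        }
        /* dummy samples (valid_data == FALSE) only carry the
           instance-state change; taking them is what finally frees
           the related resources */
    }

    MsgDataReader_return_loan(reader, msgSeq, infoSeq);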



There's no asynchronous garbage collection involved here; it's in-line
unregister_instance() processing at the writer side, as well as
receiver-side network processing upon reception of the related
unregister_instance data.



... I think this deserves an entry in the FAQ, as it's pretty complex
stuff ... :-)



W.r.t. the 'mmstat' tool, yes this will be part of our next
community-update.



Thanks,

Hans





Hans van 't Hag

OpenSplice DDS Product Manager

PrismTech Netherlands

Email: hans.vanthag at prismtech.com

Tel: +31742472572

Fax: +31742472571

Gsm: +31624654078



S. Poehlsen
16 years ago
Hello Hans,

thanks for your help. I just experimented a bit with the valid_data
flag but I did not get it right, so I will try to summarize everything.


On the writer side I think I cannot really change anything, since I
always use dispose() and unregister_instance() immediately after the
Writer_write().
That means that the message is not alive anymore (dispose) and that no
further updates are coming from this writer (unregister_instance).
I think it is also not possible to avoid the dummy samples, since they
are possibly required by other readers.


But on the reader side I have two choices:

1. Use enable_invalid_samples = FALSE, as you mentioned in the email
last Friday:
drQos->reader_data_lifecycle.enable_invalid_samples = FALSE

2. Check for each sample whether it is valid, using the valid_data
flag in the DDS_SampleInfoSeq variable.

With solution 1 I am not able to take the dummy samples from the
reader, because take() then only returns valid data samples. So I do
not know how to free the space for the dummy samples. I expect that if
the reader is not interested in dummy samples it will drop them
immediately, or am I wrong?

With solution 2 (the valid_data flag) I have the problem that my
programs suddenly hang because they did not receive the expected
message from the other side. This happens only sometimes, and on
different sides of my rpc call: sometimes the request is lost,
sometimes the response.


Solution 1 works quite well, except that it runs into memory problems
if I do too many RPC calls without a sleep (100 us, for example)
between the calls. Currently I am running client and server on the same
machine with one ospl daemon.

Steph
Patrick Wildenborg
16 years ago
Hi Steph,

Normally DDS_DomainParticipantFactory_create_participant() should
return an appropriate return code if OpenSplice is not working.
We're not (yet) able to determine why this didn't happen in this case,
or what caused the segfault. Would it be possible for you to send the
ospl-error.log and ospl-info.log files from such a crash?
These might contain more detailed information about what went wrong
during, or prior to, this situation.

Best Regards,
Patrick
S. Poehlsen
16 years ago
Hello Patrick,

thanks for your help.

What I did:
1. I started ospl with "ospl start".
2. I started my rpc server device.
3. I started my rpc client, which makes 11000 calls. Everything is fine.
4. The second run of my rpc client with 11000 calls is also fine.
5. The third run of my rpc client hangs and the rpc device segfaults:

May 19 10:51:34 acc kernel: [ 5848.933935] rpctest[10422]: segfault at 0
ip 00000000f7eff642 sp 00000000ff836a40 error 6 in
libddsdatabase.so[f7ee3000+32000]

That happened at the time the first "memory claim denied" error
appeared in ospl-error.log.

6. Start my "bug" program I posted already which immediately segfaults
with the second memory claim denied:

May 19 10:58:03 acc kernel: [ 6238.345237] bug[10744]: segfault at 0 ip
00000000f7fa0f40 sp 00000000fffec4c0 error 6 in
libddsdatabase.so[f7f98000+32000]

The durability service also runs into problems and is responsible for
the third memory error in ospl-error.log:

May 19 10:58:05 acc kernel: [ 6240.138727] durability[10414]: segfault
at 0 ip 00000000f7cabec7 sp 00000000f7346230 error 6 in
libddsdatabase.so[f7ca3000+32000]

7. Aborting the hanging rpc client results in a segfault:

May 19 11:04:41 acc kernel: [ 6635.908857] rpctest[10444]: segfault at 0
ip 00000000f7f12f40 sp 00000000f7c0e010 error 6 in
libddsdatabase.so[f7f0a000+32000]


Just for your information: rpctest is my server and client in one
application; a command-line argument decides whether it acts as the
client or the server side. That's why the segfaults for both client and
device show the same binary name.

Steph
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ospl-error.log
Type: text/x-log
Size: 1492 bytes
Desc: not available
URL: <http://dev.opensplice.org/pipermail/developer/attachments/20090519/bca1a8b7/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ospl-info.log
Type: text/x-log
Size: 10271 bytes
Desc: not available
URL: <http://dev.opensplice.org/pipermail/developer/attachments/20090519/bca1a8b7/attachment-0001.bin>
S. Poehlsen
16 years ago
Hi Patrick,

would it help you if I sent you my source code by private mail?
In the meantime I have modified the source code a bit, but the symptoms
are still the same.

Steph
Hans van't Hag
16 years ago
Hi Steph,



Sorry for the delay in getting back to you, but last week
Thursday/Friday were bank holidays in Holland :-)

I think we've figured out what goes on in your application.



Something that slipped my mind (as I reported earlier that
garbage-collection is 'in-line') is that there is a sort of 'retention
period' before the memory related to unregistered instances is freed,
to accommodate order-reversal issues that can occur in some networks
(like a WAN). Even though our native networking implementation performs
order-preservation 'by design', we didn't want to put that up as a
requirement for a networking implementation, and that's why we only
'really delete' the instance administration after a certain amount of
time. This allows proper handling of out-of-order information that
would otherwise unintentionally 'resurrect' an instance when such data
arrives after the unregistration message and the related purging of the
instance(-state) administration.



This 'safety-period' is currently a fixed value of 5 seconds (see
src/kernel/code/v_group.c, line 381):

static void
updatePurgeList(
    v_group group,
    c_time now)
{
    c_time delay = {5,0};   /* <== the fixed 5-second safety-period */
    c_time timestamp;
    ...



So if you manage to register/unregister enough instances within a
5-second interval, you can exhaust the memory (which also explains why
adding a small delay 'resolves' the issue in your application).



Now towards a resolution there are multiple options, ranging from a
work-around (the delay you currently have, or increasing the
shared-memory size), to a fix (reducing these 5 seconds to a much lower
value; I'd say for LAN-based systems 0.5 sec. would be enough), to
reconsidering your design w.r.t. the actual need to uniquely identify
such a large number of instances (i.e. reconsidering the number and
dynamics of the used key-fields).


Personally I'd start with reconsidering the usage of keys in your
information model, since creating/destroying instances is much more
heavyweight than communicating the samples themselves. So - remembering
my small document on DDS-and-RPC - maybe you don't need that many
instances to implement your RPC-emulation, i.e. a unique key for each
request (I would have expected every server to have a unique
identification, i.e. a key-field in a 'request_topic', rather than a
unique ID for every request).



Anyhow, I'll open a ticket to propose making this purge-interval a
deployment-configuration option.



Cheers,

Hans





Hans van 't Hag

OpenSplice DDS Product Manager

PrismTech Netherlands

Email: hans.vanthag at prismtech.com

Tel: +31742472572

Fax: +31742472571

Gsm: +31624654078



S. Poehlsen
16 years ago
Hi Hans,

thanks for that email. Regarding DDS-and-RPC:

If I just have a ServerID and a ReplyToID (client ID), I am not able to
do asynchronous RPC, am I? For that scenario I need something like
MessageIDs.

But I do not know if I have to set the keylist to the MessageID or if it
is sufficient to set it to the ServerID.

Is there a document describing what exactly the keylist should be used
for? In a database I use primary keys to identify an entry uniquely,
but in DDS it seems to me that the keylist is more of a data-stream
identification.

Kind regards,
Steph
Hans van't Hag
16 years ago
Hi Steph,



Not sure I understand your email below ...

I'd say that for (any) RPC you have to address a server; in
DDS-terminology, a server should be able to subscribe to the requests
that are meant for it, which can be realized in multiple ways:

1. From a generic request-type, utilize server-specific
server_x_request_topics that 'channel' requests to specific servers

2. From a generic request-topic, publish in a (server-)specific
partition to which a specific server subscribes (this one wasn't in the
document I produced earlier)

3. From a generic request_topic, model the server_ID as a key
attribute and utilize server-side content-awareness
(content-filtered-topic)



So from these options, only option-3 uses key-attributes to uniquely
identify a specific server in a generic request.
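
A server-side sketch of option-3 could look like this ('participant'
and 'request_topic' are assumed to exist, and the request-type is
assumed to have a key-field 'server_ID'):

    DDS_ContentFilteredTopic myRequests;
    DDS_StringSeq *params = DDS_StringSeq__alloc();

    params->_maximum = params->_length = 1;
    params->_buffer = DDS_StringSeq_allocbuf(1);
    params->_buffer[0] = DDS_string_dup("server_42");  /* this server's ID */

    /* only requests whose server_ID matches this server pass the filter */
    myRequests = DDS_DomainParticipant_create_contentfilteredtopic(
        participant, "MyRequests", request_topic, "server_ID = %0", params);

    /* the server then creates its dataReader on 'myRequests' instead
       of on the raw request_topic */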



The 'way back' from a server to a client is easy, since the request
contains the ReplyToID, which can be used again in the same 3 ways to
get the reply back to the client:

1. From a generic reply-type, create/use a reply-topic that gets its
name from the ReplyToID parameter in the request, so that replies are
'channeled' to the proper client

2. From a generic reply-topic, publish in a client-specific partition
identified by the ReplyToID to which the client subscribes

3. From a generic reply_topic, model the ReplyToID as a key attribute
and utilize client-side content-awareness (content-filtered-topic)



In none of the above does a request itself need to be uniquely
identified, provided that the readers have configured enough history to
cope with a backlog in processing requests and/or replies.



Finally, to your question w.r.t. keylists in DDS versus databases:
they are actually pretty much the same, as keys uniquely identify
entries in a database ('instances' in the case of DDS). The interesting
difference is that databases typically don't have a 'history' QoS to
create a FIFO-queue per entry (or 'instance' in DDS) as DDS supports.
For databases one could emulate it by using an extra key-field in which
you put the insertion-time.



So maybe the solution to your issue is to get rid of the MessageIDs,
use only serverID (for requests) and clientID (for replies) as keys in
your request/reply topics, and use a reasonable history-depth to allow
for queuing of requests (and of asynchronous replies, if you choose
asynchronous RPC). I'd go for a KEEP_LAST policy at the reader rather
than KEEP_ALL, since the latter implies peer-to-peer flow-control,
which is typically not what you want in a decoupled/asynchronous
system.
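
In IDL terms such a design could be as simple as the sketch below (the
field names are made up; the keylist pragma is how OpenSplice declares
DDS keys):

    struct Request {
        string serverID;   // key: identifies the target server ('instance')
        string replyToID;  // tells the server where to send the reply
        string payload;
    };
    #pragma keylist Request serverID

    struct Reply {
        string replyToID;  // key: identifies the requesting client
        string payload;
    };
    #pragma keylist Reply replyToID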



Hope this helps a little again ..



Regards,

Hans





Hans van 't Hag

OpenSplice DDS Product Manager

PrismTech Netherlands

Email: hans.vanthag at prismtech.com

Tel: +31742472572

Fax: +31742472571

Gsm: +31624654078



S. Poehlsen
16 years ago
Hello Hans,
Post by Hans van't Hag
Hope this helps a little again ..
thanks, with the serverID in the keylist it works quite well.

Kind regards,
Steph
