Container - EntityBean Contract

Why should I not allow business methods of Entity Beans to execute without a transaction?

According to the EJB specification, every ejbStore is performed in the before-completion phase of the commit protocol. If you run with transaction mode "Supports" and the client does not do transaction demarcation, then you are running without a transaction. (Exactly the same happens with NotSupported and Never.) Thus there is no commit, meaning there is no before-completion, meaning there is no ejbStore. The whole protocol for data persistence breaks down.
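The chain of reasoning above can be illustrated with a minimal sketch. It is a toy model only: AccountBean, ToyTransaction and invoke are hypothetical names made up for this illustration and are not part of the container or of the EJB API. The only point is that ejbStore hangs off the before-completion step, so a call made without a transaction never reaches it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are not the container's API.
class StoreOnCommitDemo {
    /** Stand-in for an entity bean with one persistent field. */
    static class AccountBean {
        int balance;          // in-memory state of the instance
        int storedBalance;    // what the "database" currently holds
        void deposit(int amount) { balance += amount; }   // business method
        void ejbStore() { storedBalance = balance; }      // writes state out
    }

    /** Stand-in for a transaction that runs before-completion callbacks on commit. */
    static class ToyTransaction {
        private final List<Runnable> beforeCompletion = new ArrayList<>();
        void enlist(Runnable callback) { beforeCompletion.add(callback); }
        void commit() { beforeCompletion.forEach(Runnable::run); }
    }

    /** Calls deposit(100) with or without a transaction and returns the stored balance. */
    static int invoke(boolean transactional) {
        AccountBean bean = new AccountBean();
        if (transactional) {
            ToyTransaction tx = new ToyTransaction();
            tx.enlist(bean::ejbStore);   // ejbStore is tied to before-completion
            bean.deposit(100);
            tx.commit();
        } else {
            bean.deposit(100);           // no commit, so nothing ever triggers ejbStore
        }
        return bean.storedBalance;
    }

    public static void main(String[] args) {
        System.out.println("with transaction:    stored = " + invoke(true));
        System.out.println("without transaction: stored = " + invoke(false));
    }
}
```

In this sketch the transactional call leaves 100 in the stored balance, while the non-transactional call leaves it at 0 even though the in-memory balance changed.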

Transaction mode "Supports" is of very questionable usefulness for Entity beans. Transaction modes "Never" and "NotSupported" are even more problematic with respect to Entity beans.

In conclusion, it is not a bug in the container if in such situations your data modifications never show up in the database. It might be argued to be a bug in your deployment, or else a bug in the EJB specification. (The spec does have a note on this in the Entity Contract chapter.) The solution is to fix your transaction attributes.

On the other hand it is reasonable to call finders outside of transactions.

How does activation/passivation of entities differ from that of sessions?

Activation/Passivation in entity beans and in session beans is completely different. Session beans are passivated by the Container after a certain timeout period. The motivation is to put a cap on resource usage. At passivation, they are written to secondary storage. When they are next used, they are read back in. If they are not used for longer than the timeout, they disappear altogether. Thus, for session beans, ejbPassivate means "the container is about to write the object to secondary storage, do whatever you need to do to support this". Similarly, ejbActivate means "the container just read the object in from secondary storage, do whatever you need to support this". Typically, there is nothing to do in either call.

For entity beans, activation/passivation means that the bean instance is being associated/disassociated with a particular primary key (e.g., EJBObject reference). When in the pooled state, entity instances are not associated with any particular primary key. So, ejbActivate means "the container just associated the instance with a particular primary key, do whatever you need to support this". Likewise, ejbPassivate means "the container is disassociating the instance from the currently associated primary key, do whatever you need to support this". Again, typically, there is nothing to do in either call.
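A minimal sketch of the entity case, assuming a toy container written only for this illustration (EntityPoolDemo, EntityInstance and their methods are hypothetical, not the container's API). It shows a pooled instance being bound to one primary key, released, and then bound to a different key.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model only: class and method names are hypothetical, not the container's API.
class EntityPoolDemo {
    /** Stand-in for an entity bean instance. */
    static class EntityInstance {
        Object primaryKey;      // null while the instance is pooled
        int activations;
        int passivations;
        void ejbActivate() { activations++; }     // typically nothing else to do
        void ejbPassivate() { passivations++; }   // typically nothing else to do
    }

    private final Deque<EntityInstance> pooled = new ArrayDeque<>();
    private final Map<Object, EntityInstance> ready = new HashMap<>();

    /** Associates an instance with the primary key, reusing a pooled instance if possible. */
    EntityInstance activate(Object primaryKey) {
        EntityInstance instance = ready.get(primaryKey);
        if (instance == null) {
            instance = pooled.isEmpty() ? new EntityInstance() : pooled.pop();
            instance.primaryKey = primaryKey;
            instance.ejbActivate();
            ready.put(primaryKey, instance);
        }
        return instance;
    }

    /** Disassociates the instance from its primary key and returns it to the pool. */
    void passivate(Object primaryKey) {
        EntityInstance instance = ready.remove(primaryKey);
        if (instance != null) {
            instance.ejbPassivate();
            instance.primaryKey = null;
            pooled.push(instance);
        }
    }

    public static void main(String[] args) {
        EntityPoolDemo container = new EntityPoolDemo();
        EntityInstance first = container.activate("pk-1");
        container.passivate("pk-1");
        EntityInstance second = container.activate("pk-2");
        System.out.println("same instance reused: " + (first == second));
        System.out.println("now bound to: " + second.primaryKey);
    }
}
```

Nothing is written to secondary storage here, which is the contrast with session bean passivation: the instance simply changes identity.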

Note that in the Inprise Container, the association between an instance and a primary key can last for the duration of a single transaction or last longer, depending on the selected transaction mode.

Can I use non JDBC persistent stores for Entity Beans?

It is tricky to target persistent stores other than standard RDBMSes and OODBMSes. Entity beans need to be stored in a transactional system. Furthermore, this transactional system needs to be coordinated by JTS. In short, your database needs to be JTS aware.

We do provide an SPI (service provider interface) to the container managed persistence engine in the container. One can implement the interface for custom persistent stores and be able to have Entity Beans use that backend.

Can the Container automatically generate a primary key for each new entity bean created?

No, we do not have such a facility.

What is the strategy used for concurrent access from multiple transactions?

The section “Concurrent access from multiple transactions” of the EJB specification discusses two types of entity access. We implement the model for multiplexed access. That is, multiple clients can access the entity simultaneously, without locking at the container level. Note that locking may occur at the database, but will not occur in the container.

Transactions are used to control access to entity beans. If two different transactions (e.g., clients) access the same entity, they will in fact see different instances, each with a separate copy of the state. If the state is modified in both transactions, then only one of the transactions will be able to commit, and the other will abort. The concurrency control is done by the database (more on this in other FAQs). All locking is provided by the database, via isolation levels specified to the JDBC driver.

Note that the Container supports parallel transactional access to the same entity. Other "popular" EJB containers serialize access to entities, by locking. This will obviously have a severe impact on performance.

Let's take a particular example. We have two clients, A and B. Each client is attempting to access the same entity object E. Client A will be running in one transaction, and client B will be running in another. Let's say that client A starts first. The container will load a copy of entity E, and run client A's requests against this first copy of E. Then client B starts. The container will again load a copy of entity E, and run client B's requests against this second copy of E. Note that there will be two instances of the EJB implementation class (i.e., the Bean class), each initially containing the same state. However, over time, client A might modify the state of E, as might client B. At some point, both of the clients' transactions are completed (either committed or rolled back). Let's say that client B commits first. This will cause the container to update the database (or whatever) using the state stored in the second copy of E. Later, client A commits, which will cause the container to (attempt to) update the database using the state stored in the first copy of E. This update may fail, depending on the particulars of the updates. However, the important point is that these transactions will run in parallel in the container. If locking is required, it will be done by the database. If the transactions are incompatible, it is up to the database to detect this and (presumably) cause the second update to be rolled back.

Again, the point is that the container is not doing any locking. All locking is done in the database, which as pointed out before, is a good thing.
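The A/B scenario can be sketched as follows. This is a toy model under stated assumptions: ToyDatabase and EntityCopy are hypothetical classes, and the version number used to reject the second commit is only a stand-in for whatever the real database does through its isolation levels. What the sketch does show is that neither client blocks the other and that the conflict is decided at the database, not in the container.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical classes, not the container's or any database's API.
class ParallelCopiesDemo {
    /** Toy "database": each row is {value, version}; the version detects conflicts. */
    static class ToyDatabase {
        private final Map<String, int[]> rows = new HashMap<>();
        void insert(String key, int value) { rows.put(key, new int[] {value, 0}); }
        int[] read(String key) { return rows.get(key).clone(); }
        int currentValue(String key) { return rows.get(key)[0]; }
        /** Succeeds only if the row has not changed since it was read. */
        boolean update(String key, int newValue, int versionRead) {
            int[] row = rows.get(key);
            if (row[1] != versionRead) {
                return false;              // conflicting commit: reject (roll back)
            }
            row[0] = newValue;
            row[1]++;
            return true;
        }
    }

    /** One per-transaction copy of entity E. */
    static class EntityCopy {
        final String key;
        final int versionRead;
        int value;
        EntityCopy(String key, int[] row) {
            this.key = key;
            this.value = row[0];
            this.versionRead = row[1];
        }
    }

    /** Returns {B committed, A committed, final value in the database}. */
    static int[] run() {
        ToyDatabase db = new ToyDatabase();
        db.insert("E", 10);
        EntityCopy copyA = new EntityCopy("E", db.read("E"));   // client A's transaction
        EntityCopy copyB = new EntityCopy("E", db.read("E"));   // client B's transaction
        copyA.value += 1;                                       // both run in parallel,
        copyB.value += 5;                                       // neither is blocked
        boolean bOk = db.update(copyB.key, copyB.value, copyB.versionRead);  // B commits first
        boolean aOk = db.update(copyA.key, copyA.value, copyA.versionRead);  // A commits later
        return new int[] {bOk ? 1 : 0, aOk ? 1 : 0, db.currentValue("E")};
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("B committed: " + (result[0] == 1));
        System.out.println("A committed: " + (result[1] == 1));
        System.out.println("value in database: " + result[2]);
    }
}
```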

So, how does this differ from other products? Other products do the locking in the container. Thus, in the above scenario, client B would be blocked until client A had completed its work. If client A is running a relatively long transaction, this quickly becomes unacceptable in terms of overall transaction throughput!

Sidebar:

Implementing this parallel transaction feature is non-trivial for most EJB container vendors, because it requires a relatively sophisticated server-side object dispatch model. What is required is the ability to dispatch on a server-side object using both the primary key (which is usually embedded in the object reference) and the current transaction. Doing this multi-key dispatch is very hard to implement on top of an RMI or BOA style server-side model, which is why no other EJB container provides this functionality. We leverage the POA, a much more sophisticated server-side framework that is far more conducive to implementing this functionality.
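To make "multi-key dispatch" concrete, here is a minimal sketch that looks up a bean instance by the pair (primary key, transaction). It is only an illustration of the idea: TxKeyedDispatchDemo, BeanInstance and dispatch are hypothetical names, and this says nothing about how the POA-based implementation actually works.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, unrelated to the real POA-based dispatch code.
class TxKeyedDispatchDemo {
    /** Stand-in for a bean instance; the id just makes instances distinguishable. */
    static class BeanInstance {
        final int id;
        BeanInstance(int id) { this.id = id; }
    }

    // Instances are keyed by the pair (primary key, transaction id).
    private final Map<List<Object>, BeanInstance> instances = new HashMap<>();
    private int nextId = 1;

    /** Returns the instance serving this primary key within this transaction. */
    BeanInstance dispatch(Object primaryKey, Object transactionId) {
        List<Object> key = Arrays.asList(primaryKey, transactionId);
        return instances.computeIfAbsent(key, k -> new BeanInstance(nextId++));
    }

    public static void main(String[] args) {
        TxKeyedDispatchDemo container = new TxKeyedDispatchDemo();
        BeanInstance a1 = container.dispatch("E", "tx-A");
        BeanInstance a2 = container.dispatch("E", "tx-A");
        BeanInstance b1 = container.dispatch("E", "tx-B");
        System.out.println("same transaction, same instance: " + (a1 == a2));
        System.out.println("other transaction, other instance: " + (a1 != b1));
    }
}
```

Dispatching on the primary key alone would force both transactions onto one instance, which is where container-level locking comes from.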

What is the level of support for caching of entity state across transactions?

In the simplest scheme, the state of an entity is cached only for the duration of a transaction. Beans stay in the ready state for the duration of the current transaction, or until they are removed.

In that scheme, the only optimizations possible are in the CMP engine (tuned writes).

However we go beyond the basics and you can specify any of the three modes outlined in the EJB specification.
* Option A: exclusive database access, retains state across transactions
* Option B: avoid passivation/activation across transactions
* Option C: passivate/activate across transactions (the behaviour above)
Note that Option A, in conjunction with field-level modification detection in the CMP engine, will allow repeated read-only transactions to run completely in the container, i.e., without going to the database. This should be very fast.

Note that this option makes sense only when the entity “owns” the data in the database. You will have problems if the database is modified externally (e.g., by a database administrator or another application) while the EJB server is running.

What exactly is the difference between the three commit modes?

The easiest to understand is Option C, which, as you can see in the diagrams in the spec, causes an entity to be passivated and activated between transactions. A scenario is as follows:
1) Your entity is marked as transaction “Required”. If the client is not running a transaction, the Container starts one around every client call into the bean. So, when the client calls "create", a transaction is begun. When the creation is complete (e.g., after ejbPostCreate) the transaction is committed.
2) With Option C, after the transaction is committed, the entity is passivated.
3) The entity is activated for the next transaction, and passivated when it completes.
If you want to play with this, try creating a user transaction, that spans both the create and a business method:
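        // userTx is the client's javax.transaction.UserTransaction
        userTx.begin();
        Entity entity = entityHome.create();   // ejbCreate / ejbPostCreate
        entity.doWork();                       // business method, same transaction
        userTx.commit();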
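Without the user transaction, each call runs in its own container-started transaction: you will notice the entity transition from pooled to ready state via an ejbCreate and then get passivated, and on the doWork() call it will get activated and again passivated. With the user transaction spanning both calls, the passivation should happen only once, after the commit.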

In Option B, the container will not call passivate after a transaction, and subsequently will not call activate before the next transaction using the same entity. The ejbStore still happens on a commit.

In Option A, the container assumes it has exclusive access to the database, and does not need to load the entity during the next transaction using the same entity. When the database is un-shared, it is possible to avoid reading the state of an entity at the beginning of a transaction, by using the state of the entity from the last (committed) transaction.
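The three descriptions above can be compared side by side with a small sketch. It is a toy model that follows this answer's wording, not the container's code: CommitModeDemo and callbacks are hypothetical names, and the sketch assumes an existing entity that starts in the pooled state and is used in two back-to-back transactions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: it mirrors the prose above, not the container's implementation.
class CommitModeDemo {
    enum Mode { A, B, C }

    /** Lists the callbacks issued for two consecutive transactions on one entity. */
    static List<String> callbacks(Mode mode) {
        List<String> log = new ArrayList<>();
        boolean ready = false;        // is an instance associated with the primary key?
        boolean stateValid = false;   // may the cached state be trusted?
        for (int tx = 1; tx <= 2; tx++) {
            if (!ready) {
                log.add("ejbActivate");
                ready = true;
            }
            if (!stateValid) {
                log.add("ejbLoad");   // re-read state from the database
            }
            log.add("business method");
            log.add("ejbStore");      // every mode stores on commit
            stateValid = (mode == Mode.A);   // only Option A trusts state across transactions
            if (mode == Mode.C) {
                log.add("ejbPassivate");     // Option C returns the instance to the pool
                ready = false;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println("Option " + mode + ": " + callbacks(mode));
        }
    }
}
```

In this model the second transaction costs an activate and a load under Option C, only a load under Option B, and neither under Option A.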

What is the default transaction commit-mode ?

Option B.

Option B is a much better default than Option C. Option C requires that every entity bean go back and forth between the pooled and the ready state between every transaction. In the (very common) case that you are using a working set of entity beans, this is just busy work, which has a measurable performance overhead.

Basically, we are trying to make the defaults have the best performance for the common use cases, and allow the user to change the defaults when their usage patterns differ. Caching is an option, specifiable in the deployment descriptor (and by default Option A style caching is disabled, since it cannot be assumed that by default the database is unshared).

How do I change the default commit mode?

You specify the transaction commit mode in the XML deployment descriptor, specifically in the vendor-specific XML (file ejb-inprise.xml). It is specified as a property on an entity, like this:
  <property>
    <prop-name>ejb.transactionCommitMode</prop-name>
    <prop-type>Enumerated</prop-type>
    <prop-value>A</prop-value>
  </property>

Should I cache entities or replicate them?

It is assumed that caching is only of interest to small deployments. Although caching is bad for scalability, it works very well for a small number of users/objects, as there is less need for passivation/activation. In a large deployment, you will certainly get improved performance by disabling caching in each container and simply running more containers, i.e., get the database to do the caching; that's what you are paying them the big bucks for anyway. Replication is a much more scalable option compared to caching entities in the Container.

I want to cache entities using Option A but do not have “exclusive” access to the database for short, well defined periods. Can I work around this in any way?

Expanded Question:

I want to use Option A but I need to give up “exclusive” database control for occasional DBA tasks. Is there any way of:

(a) dynamically turning off caching? Is there an IAS API for accessing the container and telling it to unload the cache for some beans at a specified time? This would be useful in situations where it makes sense to enable container caching but where there are (infrequent) external modifications of the database by administrators (eg, cleaning up a screwed-up transaction, running batch process to delete expired accounts, whatever). It makes container caching more usable/useful in situations where it's a selling point.

(b) being able to send a message forcing state to be resynchronized?

(c) The ideal might be some means by which an administrator could turn it off while an external task is running. A typical user scenario is as follows:

We are at present using the ejb.transactionCommitMode=exclusive flag for all of our beans. But for some beans we have an issue: each night we perform a database synchronization with our switches' internal routing tables. At present this update is done directly to the database, outside of the entity beans. The update takes 15 minutes and is done in the early hours, so we do not expect any user issues, but because the container could be caching the relevant beans, users logging on in the morning would not get the correct view. Now we could get rid of this flag, but then we sacrifice our performance gains for the sake of a 15-minute update. What options do I have?

Answer:

The Container has the ability to dynamically add and remove EJBs from a running container. You can do the following:
1) Run the beans as normal in exclusive mode.
2) Do the update.
3) Remove the old EJBs, and add the same EJBs back to the running container.

This will effectively clear the cache and update the entities.

If you are unhappy with having the beans be inconsistent for the duration of the update, then you can do the following:
1) Run the beans as normal in exclusive mode.
2) When you want to do the batch update, substitute versions of the beans running in shared mode for the beans running in exclusive mode.
3) Do the update.
4) Substitute the exclusive mode beans for the shared mode beans.

We provide a command-line interface which can be used to load and unload beans from a running container, so it should just be a matter of adding these commands to your existing batch run.

Is it really useful to cache entities in any but trivial applications?

The question arises since it is believed that caching assumes your server "owns" the underlying data, which in practice, will rarely (never) be true. One stored procedure in your database and it becomes shared. Also, when you want to cluster you must share underlying data.

But caching does not necessarily need to assume exclusive ownership. There are several applications which can work on an "optimistic concurrency" basis. This sort of design works for many applications where concurrency collisions are rare. It can fail miserably, particularly from the user's viewpoint, when concurrency collisions are probable, but such applications are likely to require a more sophisticated design under any architecture.

Can I use Option A along with replicated beans?

If multiple containers are hosting the same CMP Bean, Option A will break if used in a round-robin load balancing technique. Currently it is pretty much an either-or choice between:

1) A single container can have exclusive access to the database (or more precisely, exclusive access with respect to the entity's table in the database), and you can use Option A.

2) Multiple replicated containers can have non-exclusive access to the database (e.g., tables), and you can use Options B and C only. In this scenario round robin load balancing works great.

There is no way to support both exclusive access and fail-over simultaneously, given our current design. This really boils down to an OSAgent issue, which doesn't distinguish between round-robin for performance, and fail-over for availability.
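Why Option A breaks under round-robin can be seen in a minimal sketch with two toy containers sharing one table. Everything here is hypothetical (ToyContainer, read, write and the plain map standing in for the database); the caches simply behave the way this answer describes Option A, trusting their own state across requests.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical classes standing in for two replicated containers.
class RoundRobinOptionADemo {
    static class ToyContainer {
        private final Map<String, Integer> database;                  // shared table
        private final Map<String, Integer> cache = new HashMap<>();   // Option A style cache

        ToyContainer(Map<String, Integer> database) { this.database = database; }

        /** Reads from the cache if present, otherwise loads from the database. */
        int read(String primaryKey) {
            Integer value = cache.get(primaryKey);
            if (value == null) {
                value = database.get(primaryKey);
                cache.put(primaryKey, value);
            }
            return value;
        }

        /** Updates this container's cache and the shared database. */
        void write(String primaryKey, int value) {
            cache.put(primaryKey, value);
            database.put(primaryKey, value);
        }
    }

    /** Returns {first read via container 1, second read via container 1, database value}. */
    static int[] run() {
        Map<String, Integer> database = new HashMap<>();
        database.put("E", 1);
        ToyContainer container1 = new ToyContainer(database);
        ToyContainer container2 = new ToyContainer(database);
        int firstRead = container1.read("E");    // request 1 goes to container 1
        container2.write("E", 2);                // request 2 goes to container 2
        int secondRead = container1.read("E");   // request 3 is back on container 1
        return new int[] {firstRead, secondRead, database.get("E")};
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("container 1 sees " + result[1] + ", database holds " + result[2]);
    }
}
```

The third request is served from container 1's cache and returns the old value, even though the database already holds the new one.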

What is the policy for pooling of entity bean instances? Is it configurable?

Typically, the pool will hold a small number of beans. The only way the pool would grow large is if either:
1) A large number of find operations were being performed simultaneously.
2) A large number of transactions were running simultaneously.
3) A large number of beans were being accessed in a single transaction.

In all of these cases, one would very much suspect that the size of the pool is not a limiting factor. In (1) and (2) it is much more likely that the number of database connections would be the bottleneck, since each concurrent find and/or transaction must use its own connection. It is possible to configure the pool for a particular entity by setting some properties in the XML DD. They are:

ejb.maxBeansInPool
This option specifies the maximum number of beans in the ready pool. If the ready pool exceeds this limit, entities will be removed from the container by calling unsetEntityContext. The default setting is 1000.
ejb.maxBeansInCache
This option specifies the maximum number of beans in the "Option A" cache. If the cache exceeds this limit, entities will be moved to the ready pool by calling ejbPassivate. The default setting is 1000.
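The interaction of the two limits can be sketched with a toy model. The names PoolLimitsDemo and cacheEntity are hypothetical, and evicting the oldest cached entity first is an assumption made for the illustration; the text above does not say which entity the container picks.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: the eviction order (oldest first) is an assumption, not documented behaviour.
class PoolLimitsDemo {
    private final int maxBeansInPool;
    private final int maxBeansInCache;
    final Deque<String> pool = new ArrayDeque<>();               // pooled instances
    final Map<Object, String> cache = new LinkedHashMap<>();     // "Option A" cache, insertion order
    int passivated;   // number of ejbPassivate calls
    int discarded;    // number of unsetEntityContext calls

    PoolLimitsDemo(int maxBeansInPool, int maxBeansInCache) {
        this.maxBeansInPool = maxBeansInPool;
        this.maxBeansInCache = maxBeansInCache;
    }

    /** Adds an entity to the cache, then enforces both limits. */
    void cacheEntity(Object primaryKey) {
        cache.put(primaryKey, "instance-for-" + primaryKey);
        while (cache.size() > maxBeansInCache) {
            Object oldest = cache.keySet().iterator().next();
            String instance = cache.remove(oldest);
            passivated++;                 // ejbPassivate: cache -> pool
            pool.push(instance);
        }
        while (pool.size() > maxBeansInPool) {
            pool.removeLast();
            discarded++;                  // unsetEntityContext: instance leaves the container
        }
    }

    public static void main(String[] args) {
        PoolLimitsDemo container = new PoolLimitsDemo(1, 2);
        for (int pk = 1; pk <= 5; pk++) {
            container.cacheEntity(pk);
        }
        System.out.println("cached=" + container.cache.size() + " pooled=" + container.pool.size()
                + " passivated=" + container.passivated + " discarded=" + container.discarded);
    }
}
```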

Some appserver vendors claim that if data of entity beans changes in the database, they will refresh entity beans cached in the container. Does Inprise support anything like this?

Possibility 1:
Using Option A you'll be caching the state of EBs in the container after a transaction completes, and when that same EB becomes involved in another transaction, you'll reuse the state without going to the db. Now, let's say there's a legacy application making changes to the data in the db, behind the container's back. Potentially common case.

What some Containers could do is while the cached EBs were NOT involved in a transaction, a background thread in the container would walk through the list of cached EJBs (periodically) and check if their data was changed in the database, and if so, refresh them.

Unfortunately, this approach is broken. Say that at time t1 the container verifies that the DB and cache are in sync. At t2, the container starts a new transaction and uses the cached state, which it believes to be up-to-date. At t3, a legacy application changes the data in the DB. The problem occurs if t3 comes between t1 and t2: the container thinks the cache is up-to-date when in fact it is not, and the legacy app's changes are potentially wiped out.

The only way to solve this problem is to check the database before starting the transaction. This is exactly what we were trying to avoid in the first place!
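The t1/t3/t2 ordering can be replayed in a few lines. This is a toy, single-threaded sketch with hypothetical names; plain variables stand in for the database row and the container's cache, and the steps are simply executed in the problematic order.

```java
// Toy model only: plain variables stand in for the database row and the container cache.
class BackgroundRefreshRaceDemo {
    /** Replays the problematic ordering and returns the final value in the "database". */
    static int run() {
        int dbValue = 100;       // row in the database
        int cached = dbValue;    // state cached in the container

        // t1: the background check finds cache and database in sync.
        boolean believedInSync = (cached == dbValue);

        // t3: a legacy application changes the row behind the container's back.
        dbValue = 250;

        // t2: a transaction starts, trusts the t1 check and works on the cached state.
        if (believedInSync) {
            cached = cached + 1;   // business method runs against stale state
            dbValue = cached;      // the store at commit overwrites the legacy change
        }
        return dbValue;
    }

    public static void main(String[] args) {
        System.out.println("value after commit: " + run());
    }
}
```

The final value is derived from the stale 100, so the legacy application's 250 is gone without anyone having been told.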

Unfortunately, ACID properties are nasty. You just can't cheat! So again, any vendor that claims they solve this problem is either:
1) lying
2) doing something incorrect

Persistence's appserver claims to use optimistic concurrency control attributes (OCAs) to detect the case where t3 fell in between t1 and t2, in which case the DB would have a higher OCA value than the cached EJB state, and the EJB transaction would either fail with a special concurrency-control exception, or wipe out the legacy app's changes (configurable).

Possibility 2:
The only way to do this correctly is to run triggers in the database within the current transaction (technically in the before-completion phase of the transaction). This seems extremely painful (and proprietary). How does the trigger get associated with the current JTS/OTS managed transaction? Just updating the caches after the database has changed introduces (albeit transient) inconsistencies in the data, which obviously violates any kind of correctness.

Can I assume that an ejbLoad has preceded ejbRemove?

Is it allowed for a container to call ejbRemove on an entity bean with CMP without ever having called ejbLoad on the instance? Specifically, is it a valid state transition for an entity bean with BMP to go from pooled to ready with a call to ejbActivate, and then immediately back to pooled with a call to ejbRemove, without ejbLoad being called by the container?

ejbLoad would be called by the Container in the above scenario.

The intent of the EJB spec is that the instance must be "loaded" before ejbRemove is called. Requiring the ejbLoad method is also necessary to make the BMP model consistent with CMP. Allowing the container not to call the ejbLoad would lead to an error-prone programming model for the developer of the ejbRemove method. This is because sometimes the instance would be "loaded", and sometimes it would not. Then an entity with BMP will need to be coded to only depend on the primary key in the entity context, specifically not assuming that its other instance variables (cached state) have been set.

However there is no guarantee that an ejbLoad callback is always performed before ejbRemove. For example, an instance may not receive an ejbLoad if the instance got to the "ready" state via the ejbCreate/ejbPostCreate transition. Here the ejbCreate will essentially load the bean, since it is required to set the bean's instance variables before returning.

The bottom line is that the instance must be "loaded" before ejbRemove is called. In both cases, the container ensures that the instance variables are up-to-date before dispatching the ejbRemove method. This means that unless the container is certain that the instance variables are up-to-date, the container must load the CMP fields from the DB and call ejbLoad before it calls ejbRemove.

Which fields are eligible for CMP?

The only restrictions the EJB spec gives regarding CMP fields are that they must be public and not transient. In practice they should also be non-static and non-final. If we allowed static finals, then serialVersionUID would be a candidate for CMP. Also, in general, static fields are NOT allowed in ANY EJB implementations. The reason is that the semantics of static cannot be enforced in a distributed environment. For example, if I have instances of the same EJB implementation class in two different VMs (a very useful thing, for performance reasons) then the static field is not shared among the instances.
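The rule of thumb above (public, non-transient, non-static, non-final) can be checked with standard Java reflection. The sketch below is only an illustration: CmpFieldCheckDemo, eligibleFields and the sample MessageBean are hypothetical, and this is not how the CMP engine itself selects fields.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: a hypothetical helper, not the CMP engine's own field selection.
class CmpFieldCheckDemo {
    /** Sample bean class used only to exercise the check. */
    static class MessageBean {
        public static final long serialVersionUID = 1L;   // static final: excluded
        public int _mid;                                   // eligible
        public String _data;                               // eligible
        public transient Object _context;                  // transient: excluded
        private int scratch;                               // not public: excluded
    }

    /** Returns the names of fields that are public, non-transient, non-static and non-final. */
    static List<String> eligibleFields(Class<?> beanClass) {
        List<String> names = new ArrayList<>();
        for (Field field : beanClass.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            if (field.isSynthetic()) {
                continue;   // skip compiler-generated fields
            }
            if (Modifier.isPublic(modifiers)
                    && !Modifier.isTransient(modifiers)
                    && !Modifier.isStatic(modifiers)
                    && !Modifier.isFinal(modifiers)) {
                names.add(field.getName());
            }
        }
        Collections.sort(names);   // getDeclaredFields() guarantees no particular order
        return names;
    }

    public static void main(String[] args) {
        System.out.println(eligibleFields(MessageBean.class));
    }
}
```

For the sample class this reports only _data and _mid; serialVersionUID is filtered out by the static/final checks.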

How do I make bi-directional association between entities at their creation?

Problem Scenario from an actual support case

I've been trying to setup a relatively simple example to try out a 1:1 relationship between 2 entity beans, call them A and B. The only interesting action is that I want the create method of A to create the related B object. Object A has a CMP field that is a reference to B and B has a CMP field that is a reference to A.

In A::ejbCreate, I can't create the B since the current instance of A has no real object (getEJBObject returns null). So, I put the code to create B into A::ejbPostCreate. Here are a few code fragments.
From A, the ejbCreate and ejbPostCreate methods:

public void ejbCreate( String data ) 
throws CreateException, RemoteException {
    _mid = new MessagePK().allocMID();
    _data = new String( data );
    System.out.println( "in ejbCreate: mid = " + _mid );
}

public void ejbPostCreate( String data ) 
throws CreateException, RemoteException {
    System.out.println( "in ejbPostCreate( String )" );

    // Create our associated meta data object.
    // Have to do this here since we need a reference to ourself and in
    // ejbCreate getEJBObject returns null.

    _metaData = _mmdHome.create( _mid, (Message) _context.getEJBObject() );
    System.out.println( "Finish: mmd = " + _metaData );
}
Even though the print "Finish:..." shows a valid reference, the database is not updated to reflect the value. This insert takes place at the end of ejbCreate:
*cm* execute create "INSERT INTO Message (_mid, _data, _metaData) VALUES
(?, ?, ?)" args: [31, Message number 0, null]
As expected, the _metaData field is null since it hasn't been created. Yet, at the exit from ejbPostCreate, no database update takes place. So the DB value remains null.

Am I doing something illegal? What is the "right" way to make this mutual cross-reference?

Resolution

If you look at the sequence diagrams for creating entity beans, the calls to the database occur between ejbCreate and ejbPostCreate. Thus, any changes that you make to the state in ejbPostCreate will be ignored, from a persistence perspective.
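The ordering described above can be mimicked with a toy model. CreateSequenceDemo and its MessageBean are hypothetical stand-ins (a map plays the role of the inserted row); the sketch only shows that a field assigned in ejbPostCreate is set too late to appear in the row written between the two callbacks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a map stands in for the row inserted between ejbCreate and ejbPostCreate.
class CreateSequenceDemo {
    /** Stand-in for bean A from the support case. */
    static class MessageBean {
        Integer mid;
        String data;
        String metaData;   // stand-in for the reference to B

        void ejbCreate(int id, String text) {
            mid = id;
            data = text;
        }

        void ejbPostCreate() {
            metaData = "mmd-for-" + mid;   // assigned after the INSERT has already happened
        }
    }

    /** Runs the create sequence and returns the row as it was inserted. */
    static Map<String, Object> create(int id, String text) {
        MessageBean bean = new MessageBean();
        bean.ejbCreate(id, text);

        // The "INSERT" takes a snapshot of the fields right after ejbCreate.
        Map<String, Object> insertedRow = new LinkedHashMap<>();
        insertedRow.put("_mid", bean.mid);
        insertedRow.put("_data", bean.data);
        insertedRow.put("_metaData", bean.metaData);   // still null at this point

        bean.ejbPostCreate();   // changes here do not reach the snapshot
        return insertedRow;
    }

    public static void main(String[] args) {
        System.out.println(create(31, "Message number 0"));
    }
}
```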

There is no simple way to do what you are trying to do. It could be construed as a bug in the EJB spec or wrong usage.

The only way to do what you want, is as follows:
1) In the ejbCreate for A, create B (not pointing to A, since you don't have A's self-reference). This will cause A to have a reference to B.
2) In the ejbPostCreate, set B's reference to A. This will cause B to have a reference to A.

If you wrap A's ejbCreate/ejbPostCreate within a single transaction (using either client-managed or container-managed transactions), then you will never see the temporarily incomplete B, except within the scope of that single transaction.

Here are the relevant code fragments:

public void ejbCreate( String data ) throws CreateException, RemoteException {
    _mid = new MessagePK().allocMID();
    _data = new String( data );
    System.out.println( "in ejbCreate: mid = " + _mid );

    // Create our meta data object, MsgMetaData.  We'll give it an initial null reference
    // to us (since we really don't exist yet).  Later, in ejbPostCreate,
    // we'll set the reference properly.

    _metaData = _mmdHome.create( _mid, null );
}

public void ejbPostCreate( String data ) throws CreateException, RemoteException {
    System.out.println( "in ejbPostCreate( String )" );
    // Update our associated meta data object's reference to us.
    _metaData.setMessage( (Message) _context.getEJBObject() );
}
The MsgMetaData.setMessage method is a single line of code setting the member reference.