Wednesday, February 27, 2013

On PostgreSQL Inverse mode and database audit table triggers

While working on a different issue, I noticed something strange about a history table. A history table keeps all changes done to the records in the main table - whenever something was inserted, updated or deleted, a record is added to the history table. However, each time something was inserted, there were two records added to the history table. First one had a null value in the main table foreign key field, and the second one had the correct value.

Duplicates in history table

Turns out that the reason is the way NHibernate maps objects to data. In my case, the parent entity knows about the child entity. The child entity, however, has no knowledge of the parent entity. This can be most easily explained by looking at the following snippet:

public class Order
{
 public virtual int Id { get; set; }
 public virtual ICollection<Detail> Details { get; set; }
}

public class Detail
{
 public virtual int Id { get; set; }
 public virtual string Name { get; set; }
}

// ...
var order = new Order();
var detail = new Detail() {Name = "Widget"};

session.Persist(detail);
order.Details.Add(detail);

While the Order knows about the Details, each Detail has no knowledge about the parent Order. PostgreSQL will perform the following sequence of actions:

  • Insert the Order
  • Insert the Detail with OrderId = null
  • Update the Detail with actual OrderId
which in my case looks like this:

Insert, then Update

How can that be fixed so that only one INSERT statement is executed? Well, here is where Inverse comes into play. Check the references at the end of the post for a proper explanation, but the definition is that setting Inverse to true places responsibility of the saving the relationship on the "other side", in this case - on the child. In the mapping, that will look as a change from

mapping.HasMany(x => x.Children).KeyColumn("inventoryitemid")
 .AsSet();

To

mapping.HasMany(x => x.Children).KeyColumn("inventoryitemid")
 .AsSet()
 .Inverse(); 

And the code snippet should now look this way

public class Order
{
 public virtual int Id { get; set; }
 public virtual ICollection<Detail> Details { get; set; }
}

public class Detail
{
 public virtual Order Order { get; set; }
 public virtual int Id { get; set; }
 public virtual string Name { get; set; }
}

// ...
var order = new Order();
var detail = new Detail() {Name = "Widget", Order = order};

session.Persist(detail);
order.Details.Add(detail);

Note how the Order is added to Detail and also is passed when the Detail object is created. In my case, the profiler now shows the following:

Insert only

So, problem solved? Well, yes and no. While profiling the modified solution, I found that now a lot more actions were logged in the same scenario - most of them were second level cache access. The impact on performance is not immediately obvious, but hardly beneficial.

Second level cache access

It is likely that by adding a reference to the parent I forced NHibernate to retrieve the whole parent entity. Eventually, not having enough time to evaluate full impact of the change, a somewhat "compromise" solution was chosen: modify a trigger that inserts into a "history" table to check if the value of a foreign key is not null.

So, a trigger that looked like this

CREATE OR REPLACE FUNCTION process_emp_audit() RETURNS TRIGGER AS $emp_audit$
    BEGIN
        IF (TG_OP = 'INSERT') THEN
            INSERT INTO emp_audit SELECT 'I', now(), user, NEW.*;
            RETURN NEW;
  ...
        RETURN NULL; -- result is ignored since this is an AFTER trigger
    END;
$emp_audit$ LANGUAGE plpgsql;

would now be modified as follows

CREATE OR REPLACE FUNCTION process_emp_audit() RETURNS TRIGGER AS $emp_audit$
    BEGIN
        IF (TG_OP = 'INSERT' AND NEW.foreignkeyfield IS NOT NULL) THEN
            INSERT INTO emp_audit SELECT 'I', now(), user, NEW.*;
            RETURN NEW;
  ...
        RETURN NULL; -- result is ignored since this is an AFTER trigger
    END;
$emp_audit$ LANGUAGE plpgsql;

Looks like a dirty hack, which falls into the "best we can do in the least time" category.

References:

NHibernate Inverse attribute
Fluent NHibernate and Collections Mapping
NHibernate performance issues #1: evil List (non-inverse relationhip)
NHibernate's inverse - what does it really mean?
39.9. Trigger Procedures
by . Also posted on my website

Monday, February 11, 2013

Photobox – CSS3 JQuery Image Gallery

I came across a nice image gallery script which is lightweight, hardware accelerated and generally looks good. Image can be zoomed in and out using mouse wheel and navigated using mouse move. Image 'alt' is shown at the bottom, and the row of thumbnail images is also displayed at the bottom. The autoplay is supported and time is configurable. The script can be downloaded from Photobox github. It only supports IE 8 and higher, and does not look as good as in other browsers though.

The usage is very easy: jQuery, script and css have to be referenced as usual, i.e.

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.0/jquery.min.js" type="text/javascript"></script> 
<link href="@Url.Content("~/Scripts/photobox/photobox.css")" rel="stylesheet" type="text/css"/>
<link href="@Url.Content("~/Scripts/photobox/photobox.ie.css")" rel="stylesheet" type="text/css"/>
<script src="@Url.Content("~/Scripts/photobox/photobox.js")" type="text/javascript"></script>

A gallery with all default values (again, check Photobox github for parameters) is included as follows

<div id='gallery'>
  <a href="../../../Content/photobox/P1.jpg">
   <img src="../../../Content/photobox/P1_small.jpg" alt="photo1 title"/>
  </a>

  ...
  //More images
</div>

<script type="text/javascript">
 $(document).ready(function () {
     $('#gallery').photobox('a');
 });
</script>

A more involved setup with parameters may look as follows

<script type="text/javascript">
 $(document).ready(function () {
     $('#gallery').photobox('a:first', { thumbs:false, time:0 }, imageLoaded);
  function imageLoaded(){
   console.log('image has been loaded...');
  }
 });
</script>

I added a sample gallery (photos courtesy of my wife) to my website: Photobox Example

The border around the images is totally optional

<style type="text/css">
img {
   padding:1px;
   border:1px solid #021a40;
   background-color:#ff0;
}
</style>

References

Photobox – CSS3 JQuery Image Gallery
Photobox github
jquery access nested div
Using CSS for Image Borders
by . Also posted on my website

Monday, February 4, 2013

Use of PostgreSQL Indexes

I'm busy with investigating how indexing works in PostgreSQL and what are the ways to improve it. One particularly useful query I came across is this:

SELECT idstat.schemaname AS schema_name, idstat.relname AS table_name,
indexrelname AS index_name,
idstat.idx_scan AS times_used,
pg_size_pretty(pg_relation_size(idstat.relid)) AS table_size, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
n_tup_upd + n_tup_ins + n_tup_del as num_writes,
indexdef AS definition
FROM pg_stat_user_indexes AS idstat JOIN pg_indexes ON (indexrelname = indexname AND idstat.schemaname = pg_indexes.schemaname)
JOIN pg_stat_user_tables AS tabstat ON idstat.relid = tabstat.relid
WHERE idstat.idx_scan  < 200
 AND indexdef !~* 'unique'
ORDER BY idstat.relname, indexrelname;

It returns the following information for each index in the database:

  • schema name
  • table name
  • index name
  • disk space used by the index, and the table
  • how many rows were inserted, deleted or updated
  • how many times the index was used

If the database was used for some time, the information may help to find out which indexes are not used at all, or used rarely but occupy a lot of space on the disk. Or it may suggest that something is not working as designed - a rarely used index that was expected to be used a lot is probably a warning sign.

The unfortunate complication that I came across was that the query returned absolutely no data after the database restore. My current understanding is that PostgreSQL does not backup the pg_catalog, and also the indexes are rebuilt when the database is restored. Therefore, if I do not have direct access to the database, I have to either ask someone to run the script (and give them the PostgreSQL/pgAdmin password), or somehow obtain a file system level copy of the database. In the future, I'll need to create a utility that extracts this information and saves it to a file.

References

Handling Growth with Postgres: 5 Tips From Instagram
Efficient Use of PostgreSQL Indexes
Finding Useless Indexes
9.20. System Administration Functions
27.2. The Statistics Collector
PostgreSQL index usage, backing up pg_catalog data
by . Also posted on my website