Thursday, December 26, 2013

Minimal change problem

A popular problem used to introduce dynamic programming is the minimal change problem. Suppose a cashier needs to give me a certain amount of change and wants to do it with the minimal number of coins possible. The input is a set of denominations and the amount; the output is the set of coins.

For example, I may need to give 45 cents in change, and the available denominations are 1, 5 and 20. Intuitively, a solution is 20 + 20 + 5, but what is the best way to compute it?

A recursive solution may be the first to try. The minimal collection of coins must be one of the following:

  • minimal collection of coins totalling 44 cents, plus 1 cent coin
  • minimal collection of coins totalling 40 cents, plus 5 cent coin
  • minimal collection of coins totalling 25 cents, plus 20 cent coin

So on the next step we will apply the same logic to our new change amounts: 44, 40 and 25. This looks like a classic recursion problem, which can be illustrated by the following image

Solving Minimum Change with Recursion

and described by the following algorithm / pseudocode (where |coins| is the number of denominations available)

RECURSIVECHANGE(money, coins)
	if money = 0
		return 0
	MinNumCoins ← ∞
	for i ← 1 to |coins|
		if money ≥ coin_i
			NumCoins ← RECURSIVECHANGE(money − coin_i, coins)
			if NumCoins + 1 < MinNumCoins
				MinNumCoins ← NumCoins + 1
	return MinNumCoins

This should work, but is there something wrong with this approach? Well, one can see that the recursive algorithm will calculate the full solution for 19 cents six times by the third step alone, and it only gets worse on the following steps. If the input value is large enough, the memory and time required to compute the solution become huge. So, this is a classic example of the benefits of dynamic programming. I had come across dynamic programming a few times before, but just couldn't properly figure it out.
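For reference, here is a direct C# translation of RECURSIVECHANGE (a sketch; the method and variable names are mine). Calling it with larger amounts quickly becomes painfully slow, for exactly the reason described above:

static int RecursiveChange(int money, int[] coins)
{
	if (money == 0)
	{
		return 0;
	}
	int minNumCoins = int.MaxValue; // sentinel: "no solution found yet"
	foreach (int coin in coins)
	{
		if (money >= coin)
		{
			int numCoins = RecursiveChange(money - coin, coins);
			// guard against overflowing the int.MaxValue sentinel
			if (numCoins != int.MaxValue && numCoins + 1 < minNumCoins)
			{
				minNumCoins = numCoins + 1;
			}
		}
	}
	return minNumCoins;
}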

Finally I found a good explanation. It came as a part of a free course in Bioinformatics Algorithms. Here's how it goes:

The key to dynamic programming is to take a step that may seem counterintuitive. Instead of computing MinNumCoins(m) for every value of m from 45 downward toward m = 1 via recursive calls, we will invert our thinking and compute MinNumCoins(m) from m = 1 upward toward 45, storing all these values in an array so that we only need to compute MinNumCoins(m) once for each value of m. MinNumCoins(m) is still computed via the same recurrence relation.

MinNumCoins(m) = min{MinNumCoins(m − 20) + 1, MinNumCoins(m − 5) + 1, MinNumCoins(m − 1) + 1}

For example, assuming that we have already computed MinNumCoins(m) for m < 6, MinNumCoins(6) is equal to one more than the minimum of MinNumCoins(6 − 5) = MinNumCoins(1) = 1 and MinNumCoins(6 − 1) = MinNumCoins(5) = 1. Thus, MinNumCoins(6) is equal to 1 + 1 = 2.

This translates into the following algorithm / pseudocode

DPCHANGE(money, coins)
	MinNumCoins(0) ← 0
	for m ← 1 to money
		MinNumCoins(m) ← ∞
		for i ← 1 to |coins|
			if m ≥ coin_i
				if MinNumCoins(m − coin_i) + 1 < MinNumCoins(m)
					MinNumCoins(m) ← MinNumCoins(m − coin_i) + 1
	output MinNumCoins(money)

And here is a C# implementation, which takes a comma-separated string of the available denominations and the target amount, and prints the minimal number of coins.

public static void DPCHANGE(int val, string denoms)
{
	int[] idenoms = Array.ConvertAll(denoms.Split(','), int.Parse);
	Array.Sort(idenoms);
	int[] minNumCoins = new int[val + 1];

	minNumCoins[0] = 0;
	for (int m = 1; m <= val; m++)
	{
		// sentinel "infinity" that will not overflow when 1 is added to it
		minNumCoins[m] = Int32.MaxValue - 1;
		for (int i = 0; i < idenoms.Length; i++)
		{
			if (m >= idenoms[i])
			{
				if (minNumCoins[m - idenoms[i]] + 1 < minNumCoins[m])
				{
					minNumCoins[m] = minNumCoins[m - idenoms[i]] + 1;
				}
			}
		}
	}
	Console.WriteLine(minNumCoins[val]);
}
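A quick check with the example from the start of the post (the call below is mine) prints 3, matching 20 + 20 + 5:

DPCHANGE(45, "1,5,20"); // prints 3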

References

Bioinformatics Algorithms
An Introduction to Dynamic Programming: The Change Problem

Monday, August 12, 2013

Using Active Setup to Update Anything in HKEY_CURRENT_USER

Following my last post, I next had to make sure that every user's entry in the registry was updated, and that change had to be scripted. This turned out to be a non-trivial task and took some research. First of all, the entry is located in the HKEY_CURRENT_USER registry hive. Therefore, being logged in as Admin, I cannot directly set an entry for Bob, because Bob is not the current user at the moment. Then what can I do? HKEY_CURRENT_USER is essentially a shortcut into HKEY_USERS. Under HKEY_USERS I can see the following structure

HKEY_USERS\.DEFAULT
HKEY_USERS\S-1-5-18
HKEY_USERS\S-1-5-19
HKEY_USERS\S-1-5-20
HKEY_USERS\S-1-5-21-0123456789-012345678-0123456789-1004
HKEY_USERS\S-1-5-21-0123456789-012345678-0123456789-1004_Classes

The first 4 entries correspond to built-in system accounts, and the rest are real user accounts on a PC. So, one way to make the change I need is to loop through all users and make the changes as requested. Someone even wrote a VB script which does exactly that. My case is a bit different, though. I only have a small handful of users, but the change I'm making in the registry key depends on the user. So, maybe I can map a username to the registry key.

If I run wmic useraccount get name,sid from the command line, I will see a table similar to the following

Name            SID
Administrator   S-1-5-21-1180699209-877415012-3182924384-500
Guest           S-1-5-21-1180699209-877415012-3182924384-501
Tim             S-1-5-21-1180699209-877415012-3182924384-1004
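As an aside, the same name-to-SID mapping can be done in code rather than via wmic; a small C# sketch using System.Security.Principal (the account name is just an example):

using System;
using System.Security.Principal;

class SidLookup
{
	static void Main()
	{
		// translate a Windows account name into its security identifier (SID)
		var account = new NTAccount("Tim");
		var sid = (SecurityIdentifier)account.Translate(typeof(SecurityIdentifier));
		Console.WriteLine(sid.Value); // e.g. S-1-5-21-...-1004
	}
}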

Great. Now I can script my change and run it. However - it does not work. It appears that user hives are usually only loaded for currently logged in users. That complicates things.

Fortunately, I came across an alternative solution - Active Setup. Its original purpose is, most likely, to check whether a specific version of software is installed, to help installers install, uninstall and repair software. It can, however, be used to write pretty much anything into the HKCU hive of the user who logs on. Here's how it works:

When the user logs on, the following registry key is checked: HKCU\Software\Microsoft\Active Setup\Installed Components\

If the HKCU key is not found in the registry, then the command in the StubPath string value is executed. This is essentially all that's important, so here is my example.

reg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components\MountDrive" /v "Version" /d "1" /t REG_SZ /f

reg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components\MountDrive" /v "StubPath" /d "reg add HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run /v "MountDrive" /d "C:\map.cmd" /t REG_SZ /f" /f

Or, translating the reg add commands into a PowerShell script

$mapcmd = "C:\map.cmd"
$regKey = "HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\MountDrive"
New-Item -path $regKey | Out-Null
$regKey = "HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\MountDrive"
$regName = "Version"
$value = "1"
New-ItemProperty -path $regKey -name $regName -value $value | Out-Null
$regName = "StubPath"
$value ="reg add HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run /v MountDrive /d $mapcmd /f"
New-ItemProperty -path $regKey -name $regName -value $value | Out-Null

Here's what happens when the user logs on:

  • HKCU\Software\Microsoft\Active Setup\Installed Components\MountDrive is checked. There is nothing there.
  • The string value in StubPath is executed. The value is a "reg add" command, and it creates a MountDrive string value under the Run key with the data "C:\map.cmd". Therefore, this cmd script will run on user logon.
  • Also, a Version entry is created in HKCU\Software\Microsoft\Active Setup\Installed Components\MountDrive with a value of 1.
  • The next time the user logs on, step 1 finds the Version entry, therefore no actions are performed.

Seems a little complicated, but after running once and observing the changes as they are made in the registry, it becomes clear.

References:

Update a registry key for ALL users on a system
HKEY_USERS
Security Identifier
How To Find a User's Security Identifier (SID) in Windows
Adding Registry Settings

Saturday, July 13, 2013

A Little Strangeness in the Way A Script is Run by Registry

An observation from today.

Currently net.exe is executed when the user logs in and maps a network drive (may also use credentials based on the user, etc.). This is achieved as follows:

Scenario 1. Normal conditions

Under the registry key HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run, create a string value named MountDrive with the data net.exe use b: /persistent:no \\localhost\mydrive

Behaviour: Command runs, drive is mapped.

Now suppose the command fails occasionally. To understand why, I would like to redirect the output of net.exe into a text file. That should be easy to do.

Scenario 2. Modified Value in Registry

I modified the registry entry to

net.exe use b: /persistent:no \\localhost\mydrive >>C:\netlog.txt

Behaviour: Command runs, drive is mapped, but there is no output file. If I run the same command from the command line, the drive is mapped and the output file is created (as expected).

Next, I came up with a workaround

Scenario 3. Run a cmd file from Registry.

  • create map.cmd with the contents net.exe use b: /persistent:no \\localhost\mydrive >>C:\netlog.txt
  • place it on C: drive
  • modify the registry entry to C:\map.cmd

Behaviour: Command runs, drive is mapped, and the output file is created with the contents "The command completed successfully". Why does it behave like that? My best guess is that entries under the Run key are launched directly rather than through cmd.exe, and the >> redirection operator is a feature of the command interpreter - so in Scenario 2 the redirection is simply ignored, while the .cmd wrapper brings cmd.exe back into the picture.


Tuesday, July 9, 2013

In SQL Query, Only Return Every n-th Record

A little SQL trick that helps in some cases. In my case, I wanted to select some data that is logged into the table every several seconds, and then quickly plot it in Excel, but over a date range of a few months. So, I run a query

SELECT "timestamp", valueiwant
FROM mytable
order by timestamp

And that potentially leaves me with thousands or hundreds of thousands of rows. However, to visualise a trend over time, I don't need to plot each and every value on the graph. I'll be happy with 1/10 or even 1/100 of records. Here is how I can use ROW_NUMBER to achieve that.

SELECT * FROM
(
 SELECT "timestamp", valueiwant, ROW_NUMBER() OVER (ORDER BY "timestamp") AS rownum
 FROM mytable
) AS t
WHERE t.rownum % 25 = 0
ORDER BY t."timestamp"

ROW_NUMBER returns the sequential number of a row in the result set. The WHERE clause then keeps only the rows whose number is a multiple of 25, so only rows 25, 50, 75, 100 and so on are returned by the outer query.
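If the data is already being pulled into application code, the same thinning can be done there instead; a hedged C# sketch using LINQ (the rows sequence is a stand-in for the ordered query results):

using System;
using System.Linq;

class Thinning
{
	static void Main()
	{
		var rows = Enumerable.Range(1, 1000); // stand-in for the ordered query results

		// keep every 25th row, mirroring WHERE rownum % 25 = 0
		var thinned = rows.Where((row, index) => (index + 1) % 25 == 0).ToList();

		Console.WriteLine(thinned.Count); // 40
	}
}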

References

ROW_NUMBER (Transact-SQL)
Return row of every n'th record

Saturday, June 29, 2013

Final Hurdles While Installing Windows Updates

There are three different PC models here, all configured in the same way, and all have to have Windows Updates installed on them automatically. However, while two of the models were happily updated, the third decided it did not like some of the updates and presented me with a screen similar to the following:

It turns out these two error codes are well-known problems.

The first one is solved, among other options, by downloading and installing a specific update - the System Update Readiness Tool (KB947821, listed in the references below) - which is over 300MB in size. So in my case I had to check the PC model by reading it via PowerShell

$strModel = (Get-WmiObject Win32_ComputerSystem).Model.Trim()

and if the model was the "weird" one, install this update.

The second one is solved by repairing the .NET Framework 4 installation. Fortunately, this can be done either silently or with an unattended option. All in all, the fix for both problems was applied as the following addition to the script. No additional restarts were required, and after running this bit I could proceed to install updates as per my previous post.

if($strModel -Like "*WeirdModel*")
{
 Write-Host "Verifying .NET Framework 4 ..."
 Start-Process "C:\Windows\Microsoft.NET\Framework64\v4.0.30319\SetupCache\Client\setup.exe" "/repair /x86 /x64 /ia64 /parameterfolder Client /passive /norestart" -Wait
 Write-Host "Done."
 Write-Host "Installing System Update Readiness Tool ..."
 $readinessTool = Join-Path $win7Folder "Windows6.1-KB947821-v27-x64.msu"
 $toolCommand = $readinessTool + " /quiet /norestart"
 Write-Host $toolCommand
 Start-Process "wusa.exe" $toolCommand -Wait
 Write-Host "Done."
}

References:

Error Code 0x80073712 occurs in Windows Update or Microsoft Update
System Update Readiness Tool for Windows 7 (KB947821) [May 2013]
Error codes “0x80070643” or “0x643” occur when you install the .NET Framework updates
Silent install, repair and uninstall command lines for each version of the .NET Framework

Tuesday, June 25, 2013

Optimising Windows Updates Installation

It's a good thing that I wrote about Installing Windows Updates via Shell Script some time ago because today I needed to reuse that bit of a script and could not find it anywhere on my PC or our corporate network.

This time I'm reusing most of the functionality, but additionally do the following:

  • Make sure that the Windows Update service is started
  • Run a PowerShell script that passes a folder where the Windows update files are stored to the VbScript file
  • Execute VbScript to install all updates in a folder
  • Repeat (I want to keep my "required" and "optional" updates separate)

I was stuck for a while trying to use the PowerShell Set-Service and Start-Service cmdlets and getting permission errors. I did not quite solve that, but found a simple workaround by utilising the command line:

@ECHO OFF

sc config wuauserv start= auto
net start wuauserv

Next, the PowerShell script is used to pass parameters to VbScript:

cscript .\Common\InstallUpdates.vbs $updatesFolder

Finally, the VbScript is almost the same as in the previous version, but note how the argument passed by PowerShell is parsed. The argument is the name of the folder where I placed the updates downloaded from the Microsoft Update Catalog.

Set args = WScript.Arguments
sfolder = args.Item(0)

Dim objfso, objShell
Dim iSuccess, iFail
Dim files, folderidx, Iretval, return
Dim fullFileName

Set objfso = CreateObject("Scripting.FileSystemObject")
Set folder = objfso.GetFolder(sfolder)
Set objShell = CreateObject("Wscript.Shell")

With (objfso)
 If .FileExists("C:\log.txt") Then
  Set logFile = objfso.OpenTextFile("C:\log.txt", 8, TRUE)
 Else
  Set logFile = objfso.CreateTextFile("C:\log.txt", TRUE)
 End If
End With

Set files = folder.Files
iSuccess = 0
iFail = 0

For each folderIdx In files

fullFileName = sfolder & "\" & folderidx.name

 If Ucase(Right(folderIdx.name,3)) = "MSU" then
  logFile.WriteLine("Installing " & folderidx.name & "...")
  iretval=objShell.Run ("wusa.exe " & fullFileName & " /quiet /norestart", 1, True)
  If (iRetVal = 0) or (iRetVal = 3010) then
   logFile.WriteLine("Success.")
   iSuccess = iSuccess + 1
  Else
   logFile.WriteLine("Failed.")
   iFail = iFail + 1
  End If
 ElseIf Ucase(Right(folderIdx.name,3)) = "EXE" Then
  logFile.WriteLine("Installing " & folderidx.name & "...")
  iretval = objShell.Run(fullFileName & " /q /norestart", 1, True)
  If (iRetVal = 0) or (iRetVal = 3010) then
   logFile.WriteLine("Success.")
   iSuccess = iSuccess + 1
  Else
   logFile.WriteLine("Failed.")
   iFail = iFail + 1
  End If
 End If
Next
 
wscript.echo iSuccess & " update(s) installed successfully and " & iFail & " update(s) failed. See C:\log.txt for details."

Disable the Windows Update service again if necessary

net stop wuauserv
sc config wuauserv start= disabled

References:

Managing Windows Services from the command line
Working with Command-Line Arguments
How do you pass a variable to VBS script in a powershell command?
Enable/Disable a Service via PowerShell
PowerShell queryService – Wait for a Dependency Starting Service

Tuesday, May 14, 2013

Customising Windows Installation

In some cases - for example, where the only purpose of the PC is to run a specific software package - certain Windows features are customised with the provider's brand. Such features may include logon screen background, individual desktop backgrounds for each user and user account logon pictures (tile images). Therefore it may be useful to know where those are stored and how to update them. The following description is for Windows 7 and Windows Server 2008.

1. Windows logon screen background.

This is the easiest one. The image should be copied to the following location: C:\Windows\system32\oobe\info\backgrounds\backgroundDefault.jpg. In addition, a DWORD value OEMBackground set to 00000001 should exist under the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Authentication\LogonUI\Background.
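For completeness, here is a hedged C# sketch of both steps (the source image path is my example; running this requires admin rights):

using System.IO;
using Microsoft.Win32;

class LogonBackground
{
	static void Main()
	{
		// 1. copy the custom background into place (the info\backgrounds folder may not exist yet)
		Directory.CreateDirectory(@"C:\Windows\system32\oobe\info\backgrounds");
		File.Copy(@"C:\brand\logon.jpg",
			@"C:\Windows\system32\oobe\info\backgrounds\backgroundDefault.jpg", true);

		// 2. enable OEM logon backgrounds
		Registry.SetValue(
			@"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Authentication\LogonUI\Background",
			"OEMBackground", 1, RegistryValueKind.DWord);
	}
}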

2. Desktop backgrounds (wallpaper) for each user

Check the following registry value: HKEY_CURRENT_USER\Control Panel\Desktop\Wallpaper. What I found there was the following path: C:\Users\Account_Name\AppData\Roaming\Microsoft\Windows\Themes\TranscodedWallpaper.jpg

Therefore, for each Account_Name the image has to be replaced with the desired one.

3. User Account logon pictures

This turned out to be trickier. There is a description on MSDN of how to do it manually [1]. Automating the task is not so obvious. It turns out there is a function in shell32.dll (exported by ordinal 262) that sets a user's logon picture. It can be called easily from C# code via P/Invoke. In fact, the following is the full code of a console application that will update the user logon picture.

using System;
using System.Runtime.InteropServices;

namespace UserPictureUpdater
{
    class Program
    {
        [DllImport("shell32.dll", EntryPoint = "#262", CharSet = CharSet.Unicode, PreserveSig = false)]
        public static extern void SetUserTile(string username, int notneeded, string picturefilename);
        [STAThread]
        static void Main(string[] args)
        {
            if (args.Length == 2)
            {
                SetUserTile(args[0], 0, args[1]);
            }
        }
    }
}

The application can run from command line

UserPictureUpdater Administrator adminnew.png

Or, in my case, I'm running it from PowerShell script the following way

Start-Process $command $params

Where $command is the full path, i.e. "C:\Folder\UserPictureUpdater.exe", and $params is the command line parameters, i.e. "Administrator adminnew.png".

Also, it turns out someone figured out a way to do the whole thing in PowerShell [2], but I was done with my approach by the time I found out.

References

About User Profiles
Setting the user tile image in Windows 7 and Server 2008 R2

Saturday, April 6, 2013

Google as my automated testing tool

It is probably a well-known fact, but Google webmaster tools record errors encountered when crawling the website. Of course, first of all the website has to be submitted to Google. After that, the crawl errors can be accessed by selecting the website of interest and clicking Health; Crawl Errors will be available on the left side.

I did not even know that this was available until I received an email from Google which informed me that there was an increase in server errors and provided me a link to review those errors.

Google Webmaster Tools

Looks like the website had been generating around 70 errors when crawled, and I did not know. When the number of errors increased to around 90, I got an email. The errors are listed and can be marked as fixed as I deal with the root causes.

Website Crawl Errors

Most errors were caused by my refactoring where I replaced direct access to the database with using the repository pattern, and did it carelessly.

Another reason was that I used a tool I found on the web to generate my website map and did not review it before uploading it to the website. The map contained some links that were not supposed to be accessed directly. It is probably a good idea to review those links to verify whether they are really needed or can be replaced, and to remove from the site map any that should not be there. I'll be tracking the errors from now on.

References

Webmaster tools
Crawl errors
Sitemap Generator

Friday, April 5, 2013

Project ROSALIND: Rabbits and Recurrence Relations

I came across the project ROSALIND which is described as learning bioinformatics through problem solving. It is intriguing and well-designed, so I started with solving some introductory ones.

The first interesting problem was a modified Fibonacci sequence. Actually, I did not know that the origin of the Fibonacci sequence was a model of rabbit reproduction. It assumes that rabbits reach reproductive age after one month, and that every mature pair of rabbits produces a pair of newborn rabbits each month. The modified problem, however, has every mature pair produce k pairs of newborn rabbits each month. The task is to calculate the total number of rabbit pairs after n months, assuming we have one pair of newborn rabbits at the start.

While the problem could be solved by recursion, the cost of calculation would be high. Every successive month the program would re-calculate the full solution for each previous month. A better approach is dynamic programming (which, in essence, is just remembering and reusing the already calculated values). Here is the modified solution in C#.

/// <summary>
/// Modified Fibonacci problem: each rabbit pair matures in 1 month, and each mature pair produces "pairs" newborn rabbit pairs each month
/// </summary>
/// <param name="pairs">Number of newborn rabbit pairs produced by a mature pair each month</param>
/// <param name="to">Number of months</param>
/// <returns>Total number of rabbit pairs after "to" months</returns>
static Int64 Fibonacci(int pairs, int to)
{
 if (to == 0)
 {
  return 0;
 }

 Int64 mature = 0;
 Int64 young = 1;

 Int64 next_mature;
 Int64 next_young;
 Int64 result = 0;
 for (int i = 0; i < to; i++)
 {
  result = mature + young;

  next_mature = mature + young;
  next_young = mature * pairs;

  mature = next_mature;
  young = next_young;
 }
 return result;
}

Note: the result grows fast! When trying to use the default Int32 (32 bit, or up to ~2 billion) and calculate the result for 4 pairs and 32 months, the value overflowed at around month 23.
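If even larger inputs are ever needed, Int64 will overflow too; switching the accumulators to System.Numerics.BigInteger removes the limit entirely. A sketch of the same loop under that assumption:

using System.Numerics;

static BigInteger FibonacciBig(int pairs, int to)
{
	if (to == 0)
	{
		return 0;
	}

	BigInteger mature = 0, young = 1, result = 0;
	for (int i = 0; i < to; i++)
	{
		result = mature + young;
		BigInteger nextYoung = mature * pairs; // offspring produced by all mature pairs
		mature = mature + young;               // last month's young mature, mature pairs survive
		young = nextYoung;
	}
	return result;
}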

The next problem was another variation on the rabbit simulation. In this case, the rabbits are mortal and die after k months. My solution was to keep a counter for rabbits of each age at each step. I keep the counters in a dictionary, where the key is the age of a rabbit pair and the value is the number of rabbit pairs of that age at that step.

/// <summary>
/// Mortal Rabbits Fibonacci sequence variation
/// </summary>
/// <param name="months">How many months does the simulation run for</param>
/// <param name="lifespan">Rabbit lifespan</param>
/// <returns>A count of rabbit pairs alive at the end</returns>
static UInt64 MortalRabbits(int months, int lifespan)
{
 Dictionary<int, UInt64> dRabbits = GetEmptyDictionary(lifespan);
 dRabbits[0]++;

 for (int i = 0; i < months - 1; i++)
 {
  Dictionary<int, UInt64> newRabbits = GetEmptyDictionary(lifespan);
  foreach (KeyValuePair<int, UInt64> pair in dRabbits)
  {
   int age = pair.Key;

   if (age == 0)
   {
    newRabbits[1] = newRabbits[1] + dRabbits[age];
   }
   else if (age > 0 && age < lifespan - 1)
   {
    newRabbits[age + 1] = newRabbits[age + 1] + dRabbits[age];
    newRabbits[0] = newRabbits[0] + dRabbits[age];
   }
   else if (age == lifespan - 1)
   {
    newRabbits[0] = newRabbits[0] + dRabbits[age];
   }
  }
  dRabbits = newRabbits;
 }

 UInt64 count = 0;
 foreach (KeyValuePair<int, UInt64> pair in dRabbits)
 {
  count = count + pair.Value;
 }

 return count;
}

/// <summary>
/// Creates a dictionary where keys are integers from 0 to lifespan - 1, and all values are zeros
/// </summary>
/// <param name="lifespan"></param>
/// <returns>A zero-filled dictionary</returns>
static Dictionary<int, UInt64> GetEmptyDictionary(int lifespan)
{
 Dictionary<int, UInt64> dRabbits = new Dictionary<int, UInt64>();

 for (int i = 0; i < lifespan; i++)
 {
  dRabbits.Add(i, 0);
 }
 return dRabbits;
}
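As a quick sanity check (the calls below are mine), both functions reproduce the sample cases from the Rosalind problem statements - 5 months with k = 3 gives 19 pairs, and 6 months with a lifespan of 3 gives 4 pairs:

Console.WriteLine(Fibonacci(3, 5));     // 19
Console.WriteLine(MortalRabbits(6, 3)); // 4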

References

Project ROSALIND
Modified Fibonacci Problem
Mortal Fibonacci Rabbits
Fibonacci Series

Wednesday, March 27, 2013

Improving a PostgreSQL report performance: Part 2 - Temporary Table

The report I was working on still did not live up to expectations. There was something else going on. I had to dig a little deeper.

The report was generated by XtraReports, and I had no authority to edit it. The ReportDataSource contains functions for retrieving the datasets for the main report and the subreport.

public class ReportDataSource : BaseDataSource
{
 public DataSet GetPrimaryReportData(DateTime fromDate, DateTime toDate)
 {
  string commandText = "select * from myreportfunction;";
  var reportDataSet = ExecuteQuery("ReportDataSet", commandText, new[] { "MainDataSet" });
  return reportDataSet;
 }

 public DataSet GetSubReportData(DateTime fromDate, DateTime toDate)
 {
  string commandText = String.Format("select * from myreportsubfunction");
  return ExecuteQuery("SubReportDataSet", commandText, new[] { "SubDataSet" });
 }
}

And here's what the PostgreSQL functions looked like.

The myreportsubfunction is the one I had already worked on, so now it looked like the following

CREATE OR REPLACE FUNCTION myreportsubfunction(IN fromdate timestamp without time zone, IN todate timestamp without time zone)
  RETURNS TABLE(Name1 character varying, Name2 character varying) AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'select Column1 as Name1, Column2 as Name2
from sometable tbl
inner join ...
where ...
and ...
and $1 <= somedate
and $2 >= somedate
group by ...
order by ...;' USING $1, $2;
END
$BODY$
  LANGUAGE plpgsql VOLATILE

And here is the myreportfunction

CREATE FUNCTION myreportfunction(fromdate timestamp without time zone, todate timestamp without time zone)
  RETURNS TABLE(name character varying, somevalue1 integer, somevalue2 real) AS
$body$
 SELECT
    something,
    sum(somevalue1)::int as somevalue1,
    sum(somevalue2)::real as somevalue2
 FROM myreportsubfunction($1, $2)
 group by something;
$body$
  LANGUAGE sql;

What's going on here? Well, it looks like first the myreportfunction is called; it calls the myreportsubfunction and returns aggregated results. But then the myreportsubfunction is called separately and essentially executes the same huge query again! No wonder the performance was nowhere near acceptable. To satisfy the report requirements, I need the aggregated data first. That means I need to save the results of the huge complex query, aggregate them and return them for the main report, and then return the saved results as the subreport, or "detailed", data. My approach is to use a temporary table.

Here is what the functions will do:

myreportfunction

  • if a temptable exists, drop it
  • create a temptable
  • run a complex query and save results in the temptable
  • run a query that returns the aggregated data for the report

myreportsubfunction

  • if the temptable exists, return everything from the table, then drop the table

And the resulting PostgreSQL code

myreportfunction

CREATE OR REPLACE FUNCTION myreportfunction(IN fromdate timestamp without time zone, IN todate timestamp without time zone)
  RETURNS TABLE("name" character varying, somevalue1 integer, somevalue2 real) AS
$BODY$
BEGIN
DROP TABLE IF EXISTS temptable;
CREATE TEMPORARY TABLE temptable("name" character varying, somevalue1 integer, somevalue2 real);

EXECUTE '
insert into temptable(name, somevalue1, somevalue2)  
select Column1 as Name1, Column2 as Name2
from sometable tbl
inner join ...
where ...
and ...
and $1 <= somedate
and $2 >= somedate
group by ...
order by ...;' USING $1, $2;

RETURN QUERY EXECUTE 'SELECT name, sum(somevalue1)::int, sum(somevalue2)::real FROM temptable GROUP BY name;';
END
$BODY$
  LANGUAGE plpgsql VOLATILE

myreportsubfunction

CREATE OR REPLACE FUNCTION myreportsubfunction(IN timestamp without time zone, IN timestamp without time zone)
  RETURNS TABLE(name character varying, somevalue1 integer, somevalue2 real) AS
$BODY$
BEGIN
IF EXISTS (SELECT 1 FROM temptable) THEN 
 RETURN QUERY EXECUTE 'select * from temptable';
 DROP TABLE IF EXISTS temptable;
END IF;
END
$BODY$
  LANGUAGE plpgsql VOLATILE

Now I'm hoping for a performance improvement of at least 50% ...

References

CREATE TABLE
Improving a PostgreSQL report performance: Part 1 - RETURN QUERY EXECUTE

Sunday, March 24, 2013

Implementing a simple logging engine with MVC 4

I was investigating a simple logging mechanism for an MVC application. First, I came up with some requirements for the logging engine:

High-level requirements:

  • Create logging engine that can log exceptions and other messages and store them in the database
  • Display a filterable grid of all messages

Logging engine

  • Should allow logging of exceptions, including handled and unhandled
  • Should allow logging of custom messages

Filterable grid

  • Should allow paged display of all exceptions and log messages in the database
  • Should allow the option to filter messages based on the date logged and severity

I started with these simple classes that will allow handling of messages and exceptions:

//actual entry, each will correspond to a line in the grid
public class LogEntry
{
 public int Id { get; set; }
 public DateTime TimeStamp { get; set; }
 public string Path { get; set; }
 public string RawUrl { get; set; }
 public string Source { get; set; }
 public string Message { get; set; }
 public string StackTrace { get; set; }
 public string TargetSite { get; set; }

 public int TypeId { get; set; }
 public virtual LogType LogType { get; set; }
}

//a "helper" class for types like Warning, Information etc.
public class LogType
{
 public int LogTypeId { get; set; }
 public string Type { get; set; }
}

//finally, an enum of all supported log message types
public enum LogTypeNames
{
 All = 0,
 Info = 1,
 Warn = 2,
 Debug = 3,
 Error = 4,
 Fatal = 5,
 Exception = 6
}

These will be reflected in two database tables - the main table for saving all log messages, and a helper table to keep names of message severity levels.

public DbSet<LogType> LogTypes { get; set; }
public DbSet<LogEntry> LogEntries { get; set; }

Next, it is time to implement exception logging, which divides into two cases: unhandled and handled exceptions.

Develop mechanism for logging exceptions:

1. Log unhandled exceptions

Unhandled exceptions are, well, exceptions that are not handled in the source code. First, the site web.config must be modified to turn the customErrors mode on in the <system.web> section:
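<customErrors mode="On" />

(The exact attributes may vary; mode="On" is the minimal form that enables custom errors.)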

Here's how it works: a method is called in the Global.asax file:

public static void RegisterGlobalFilters(GlobalFilterCollection filters)
{
    filters.Add(new HandleErrorAttribute());
}

It registers the HandleErrorAttribute as a global action filter. The HandleErrorAttribute checks the customErrors mode; if it is off, it shows the "yellow screen of death". If it is on, the Error view is rendered, and a Model containing the exception's stack trace is passed to it. Therefore, an Error.cshtml should be added under Views/Shared, which in its simplest form may look as follows

@using Recipes.Logging
@using Recipes.Models
@{
    Layout = null;
    ViewBag.Title = "Error";
    Logger.WriteEntry(Model.Exception);
}

<!DOCTYPE html>
<html>
<head>
    <title>Error</title>
</head>
<body>
    <h2>
        Sorry, an error occurred while processing your request. The details of the error were logged.
    </h2>
</body>
</html>

For simplicity, all log messages - exceptions, handled and unhandled, and all other custom messages - will be saved in a single database table.

2. Log handled exceptions

The handled exceptions are caught by code and handled directly. The following is the Logger class which handles exceptions and custom messages and saves them to the database:

public static class Logger
{
 public static void WriteEntry(Exception ex)
 {
  LogEntry entry = BuildExceptionLogEntry(ex);
  SaveLogEntry(entry);        
 }

 public static void WriteEntry(string message, string source, int logType)
 {
  LogEntry entry = BuildLogEntry(message, source, logType);
  SaveLogEntry(entry);
 }

 private static void SaveLogEntry(LogEntry entry)
 {
  using (RecipesEntities context = new RecipesEntities())
  {
   context.LogEntries.Add(entry);
   context.SaveChanges();
  }
 }

 private static LogEntry BuildLogEntry(string message, string source, int logType)
 {
  LogEntry entry = BuildLogEntryTemplate();

  entry.Message = message;
  entry.Source = source;
  entry.LogType = GetLogEntryType((LogTypeNames)logType);
  entry.TypeId = logType;

  return entry;
 }

 private static LogEntry BuildExceptionLogEntry(Exception x)
 {
  Exception logException = GetInnerExceptionIfExists(x);
  LogEntry entry = BuildLogEntryTemplate();

  entry.Message = logException.Message;
  entry.Source = logException.Source ?? string.Empty;
  entry.StackTrace = logException.StackTrace ?? string.Empty;
  entry.TargetSite = logException.TargetSite == null ? string.Empty : logException.TargetSite.ToString();
  entry.LogType = GetLogEntryType(LogTypeNames.Exception);
  entry.TypeId = (int) LogTypeNames.Exception;

  return entry;
 }

 private static LogEntry BuildLogEntryTemplate()
 {
  return new LogEntry
       {
        Path = HttpContext.Current.Request.Path,
        RawUrl = HttpContext.Current.Request.RawUrl,
        TimeStamp = DateTime.Now,
       };
 }

 public static string BuildExceptionMessage(Exception x)
 {
  Exception logException = GetInnerExceptionIfExists(x);

  string strErrorMsg = Environment.NewLine + "Error in Path :" + HttpContext.Current.Request.Path;
  // Get the QueryString along with the Virtual Path
  strErrorMsg += Environment.NewLine + "Raw Url :" + HttpContext.Current.Request.RawUrl;
  // Get the error message
  strErrorMsg += Environment.NewLine + "Message :" + logException.Message;
  // Source of the message
  strErrorMsg += Environment.NewLine + "Source :" + logException.Source;
  // Stack Trace of the error
  strErrorMsg += Environment.NewLine + "Stack Trace :" + logException.StackTrace;
  // Method where the error occurred
  strErrorMsg += Environment.NewLine + "TargetSite :" + logException.TargetSite;
  return strErrorMsg;
 }

 private static LogType GetLogEntryType(LogTypeNames name)
 {
  return new LogType{LogTypeId = (int)name, Type = name.ToString()};
 }

 private static Exception GetInnerExceptionIfExists(Exception x)
 {
  if (x.InnerException != null)
   return x.InnerException;
  return x;
 }
}

With this basic structure in place, I can start adding a user interface for displaying the log. I decided to have only two views: an Index for the main grid containing all log messages, and a Details view for detailed information about a single message. The Details view will be linked from the grid row corresponding to each log message.

Index view.

The view will have several main parts, wrapped in a form.

@using (Html.BeginForm("Index", "Logging", new { CurrentPageIndex = 1 }, FormMethod.Get, new { id = "myform" }))
{

}

First is the div that shows the number of records found and gives an option to choose how many records per page will be displayed.

<div class="grid-header">
    <div class="grid-results">
        <div class="inner">
            <span style="float: left">
                @string.Format("{0} records found. Page {1} of {2}", Model.LogEvents.TotalItemCount, Model.LogEvents.PageNumber, Model.LogEvents.PageCount)
            </span>

            <span style="float: right">
                Show @Html.DropDownListFor(model => model.PageSize, new SelectList(FormsHelper.PagingPageSizes, "Value", "Text", Model.PageSize), new { onchange = "document.getElementById('myform').submit()" }) results per page
            </span>
            
            <div style="clear: both"></div>
        </div> <!-- inner -->
    </div>  <!-- grid-results -->
 </div>  <!-- grid-header -->

The second allows filtering records by date logged and severity

 <div class="grid-filter">        
    <div class="inner">
        Level : @Html.DropDownList("LogLevel", new SelectList(FormsHelper.LogLevels, "Value", "Text"))

        For : @Html.DropDownList("Period", new SelectList(FormsHelper.CommonTimePeriods, "Value", "Text"))
        
        <input id="btnGo" name="btnGo" type="submit" value="Apply Filter" />                      
    </div>
 </div>   

Next is the "pager" div, which allows navigation if multiple pages are reqiured

  <div class="paging">
    <div class="pager">
        @Html.Pager(ViewData.Model.LogEvents.PageSize, ViewData.Model.LogEvents.PageNumber, ViewData.Model.LogEvents.TotalItemCount, new { LogType = ViewData["LogType"], Period = ViewData["Period"], PageSize = ViewData["PageSize"] })
    </div>
 </div>

Finally, the main part is the actual grid which displays the messages.

 @if (Model.LogEvents.Count() == 0)
 {
 <p>No results found</p>
 }
 else
 {
 <div class="grid-container">
 <table class="grid">
    <tr>
        <th></th>
        <th>#</th>
        <th>Source</th>
        <th>Date</th>
        <th style='white-space: nowrap;'>Time ago</th>
        <th>Message</th>
        <th>Type</th>
    </tr>

 @{int i = 0;}
     @foreach (var item in Model.LogEvents)
     {
     <tr class="@(i++ % 2 == 1 ? "alt" : "")">
     <td>
        @Html.ActionLink("Details", "Details", new { id = item.Id.ToString(), loggerProviderName = "Go To Details" /*item.LoggerProviderName*/ })
     </td>
     <td>
        @i.ToString()
     </td>
     <td>
        @item.Source
     </td>
     <td style='white-space: nowrap;'>
        @String.Format("{0:g}", item.TimeStamp.ToLocalTime())
     </td>
     <td style='white-space: nowrap;'>
        @item.TimeStamp.ToLocalTime().TimeAgoString()
     </td>
     <td>
        <pre>@item.Message.WordWrap(80)</pre>
     </td>
     <td>
        @item.LogType.Type
     </td>
     </tr>
     }

 </table>
 </div> <!-- grid-container -->
}

A few points of interest:

The Index method in the controller returns a ViewModel. By default, all configurable parameters (page size, time period, page number and log level) are not set, and all log messages are displayed with the default page size of 20 entries. When a value is set in the UI and the form is submitted, a corresponding parameter is passed to the controller.

public ActionResult Index(string Period = null, int? PageSize = null, int? page = null, string LogLevel = null)
{
 string defaultPeriod = Session["Period"] == null ? "All" : Session["Period"].ToString();
 string defaultLogLevel = Session["LogLevel"] == null ? "All" : Session["LogLevel"].ToString();

 LoggingIndexModel model = new LoggingIndexModel();

 model.Period = Period ?? defaultPeriod;
 model.LogLevel = LogLevel ?? defaultLogLevel;
 model.CurrentPageIndex = page.HasValue ? page.Value - 1 : 0;
 model.PageSize = PageSize.HasValue ? PageSize.Value : 20;

 TimePeriod timePeriod = TimePeriodHelper.GetUtcTimePeriod(model.Period);

 model.LogEvents = repository.GetByDateRangeAndType(model.CurrentPageIndex, model.PageSize, timePeriod.Start, timePeriod.End, model.LogLevel);

 ViewData["Period"] = model.Period;
 ViewData["LogLevel"] = model.LogLevel;
 ViewData["PageSize"] = model.PageSize;

 Session["Period"] = model.Period;
 Session["LogLevel"] = model.LogLevel;

 return View(model);
}

GetByDateRangeAndType does the work for selecting appropriate set of log messages from the database.

public IPagedList<LogEntry> GetByDateRangeAndType(int pageIndex, int pageSize, DateTime start, DateTime end, string logLevel)
{
 IQueryable<LogEntry> list;
 IPagedList<LogEntry> pagedList;

 list = db.LogEntries.Where(e =>
   (e.TimeStamp >= start && e.TimeStamp <= end));

 if (logLevel != LogTypeNames.All.ToString())
 {
  list = list.Where(e => e.LogType.Type.ToLower() == logLevel.ToLower());
 }

 list = list.OrderByDescending(e => e.TimeStamp);
 pagedList = new PagedList<LogEntry>(list, pageIndex, pageSize);
 return pagedList;
}

The data is returned in the form of a PagedList which is implemented as follows:

public interface IPagedList<T> : IList<T>
{
 int PageCount { get; }
 int TotalItemCount { get; }
 int PageIndex { get; }
 int PageNumber { get; }
 int PageSize { get; }

 bool HasPreviousPage { get; }
 bool HasNextPage { get; }
 bool IsFirstPage { get; }
 bool IsLastPage { get; }
}

The main part of the PagedList class:

public class PagedList<T> : List<T>, IPagedList<T>
{
 public PagedList(IEnumerable<T> source, int index, int pageSize)
  : this(source, index, pageSize, null)
 {
 }

 #region IPagedList Members

 public int PageCount { get; private set; }
 public int TotalItemCount { get; private set; }
 public int PageIndex { get; private set; }
 public int PageNumber { get { return PageIndex + 1; } }
 public int PageSize { get; private set; }
 public bool HasPreviousPage { get; private set; }
 public bool HasNextPage { get; private set; }
 public bool IsFirstPage { get; private set; }
 public bool IsLastPage { get; private set; }

 #endregion

 protected void Initialize(IQueryable<T> source, int index, int pageSize, int? totalCount)
 {
  //### argument checking
  if (index < 0)
  {
   throw new ArgumentOutOfRangeException("PageIndex cannot be below 0.");
  }
  if (pageSize < 1)
  {
   throw new ArgumentOutOfRangeException("PageSize cannot be less than 1.");
  }

  //### set source to blank list if source is null to prevent exceptions
  if (source == null)
  {
   source = new List<T>().AsQueryable();
  }

  //### set properties
  if (!totalCount.HasValue)
  {
   TotalItemCount = source.Count();
  }
  PageSize = pageSize;
  PageIndex = index;
  if (TotalItemCount > 0)
  {
   PageCount = (int)Math.Ceiling(TotalItemCount / (double)PageSize);
  }
  else
  {
   PageCount = 0;
  }
  HasPreviousPage = (PageIndex > 0);
  HasNextPage = (PageIndex < (PageCount - 1));
  IsFirstPage = (PageIndex <= 0);
  IsLastPage = (PageIndex >= (PageCount - 1));

  //### add items to internal list
  if (TotalItemCount > 0)
  {
   AddRange(source.Skip((index) * pageSize).Take(pageSize).ToList());
  }
 }
}

PagedList uses some helper methods from the Pager class to render HTML and generate links to other pages of the log.

public class Pager
{
 .....

 /// <summary>
 /// Renders HTML to display a "pager" control (used at the top and bottom of the list of logged messages)
 /// </summary>
 /// <returns>String of HTML</returns>
 public string RenderHtml()
 {
  int pageCount = (int)Math.Ceiling(totalItemCount / (double)pageSize);
  const int nrOfPagesToDisplay = 10;

  var sb = new StringBuilder();

  // Previous
  if (currentPage > 1)
  {
   sb.Append(GeneratePageLink("<", this.currentPage - 1));
  }
  else
  {
   sb.Append("<span class=\"disabled\"><</span>");
  }

  int start = 1;
  int end = pageCount;

  if (pageCount > nrOfPagesToDisplay)
  {
   int middle = (int)Math.Ceiling(nrOfPagesToDisplay / 2d) - 1;
   int below = (currentPage - middle);
   int above = (currentPage + middle);

   if (below < 4)
   {
    above = nrOfPagesToDisplay;
    below = 1;
   }
   else if (above > (pageCount - 4))
   {
    above = pageCount;
    below = (pageCount - nrOfPagesToDisplay);
   }

   start = below;
   end = above;
  }

  if (start > 3)
  {
   sb.Append(GeneratePageLink("1", 1));
   sb.Append(GeneratePageLink("2", 2));
   sb.Append("...");
  }
  for (int i = start; i <= end; i++)
  {
   if (i == currentPage)
   {
    sb.AppendFormat("<span class=\"current\">{0}</span>", i);
   }
   else
   {
    sb.Append(GeneratePageLink(i.ToString(), i));
   }
  }
  if (end < (pageCount - 3))
  {
   sb.Append("...");
   sb.Append(GeneratePageLink((pageCount - 1).ToString(), pageCount - 1));
   sb.Append(GeneratePageLink(pageCount.ToString(), pageCount));
  }

  // Next
  if (currentPage < pageCount)
  {
   sb.Append(GeneratePageLink(">", (currentPage + 1)));
  }
  else
  {
   sb.Append("<span class=\"disabled\">></span>");
  }
  return sb.ToString();
 }

 /// <summary>
 /// Generates a link to a page
 /// </summary>
 /// <param name="linkText">Text displayed on the page</param>
 /// <param name="pageNumber">Number of the page the link leads to</param>
 /// <returns></returns>
 private string GeneratePageLink(string linkText, int pageNumber)
 {
  var pageLinkValueDictionary = new RouteValueDictionary(linkWithoutPageValuesDictionary) {{"page", pageNumber}};
  var virtualPathData = RouteTable.Routes.GetVirtualPath(this.viewContext.RequestContext, pageLinkValueDictionary);

  if (virtualPathData != null)
  {
   const string linkFormat = "<a href=\"{0}\">{1}</a>";
   return String.Format(linkFormat, virtualPathData.VirtualPath, linkText);
  }
  return null;
 }
}

Details view.

There is nothing special about the details view - it's a typical MVC view that displays entity data.

@model Recipes.Models.LogEntry
@{
    ViewBag.Title = "Details";
}

<link href="@Url.Content("~/Content/logging.css")" rel="stylesheet" type="text/css" />

<h2>Details</h2>

<p>        
    @Html.ActionLink("Back to List", "Index")
</p>

<fieldset>
    <legend>Fields</legend>
        
    <div class="display-label">Id</div>
    <div class="display-field">@Model.Id</div>
        
    <div class="display-label">LogDate</div>
    <div class="display-field">@String.Format("{0:g}", Model.TimeStamp)</div>
        
    <div class="display-label">Source</div>
    <div class="display-field">@Model.Source</div>
        
    <div class="display-label">Type</div>
    <div class="display-field">@Model.LogType.Type</div>
        
    <div class="display-label">Message</div>
    <div class="display-field">
        <pre>@Model.Message.WordWrap(80)</pre>
    </div>
        
    <div class="display-label">StackTrace</div>
    <div class="display-field">@Model.StackTrace</div>                      
        
</fieldset>

<p>        
    @Html.ActionLink("Back to List", "Index")
</p>

Details Controller

public ActionResult Details(string loggerProviderName, string id)
{
 LogEntry logEvent = repository.GetById(id);

 return View(logEvent);
}

The final result is represented in the image below:

MVC logging engine

References

Processing Unhandled Exceptions (C#)
Logging in MVC Part 1- Elmah (and other posts of the series)
MVC Basic Site: Step 2 - Exceptions Management
How is Error.cshtml called in ASP.NET MVC?

Wednesday, March 20, 2013

Improving a PostgreSQL report performance: Part 1 - RETURN QUERY EXECUTE

I was working on optimising a report which had very poor performance. The "heart" of the report was a fairly complex query, which I will sketch briefly as follows

select Column1 as Name1, Column2 as Name2
from sometable tbl
inner join ...
where ...
and ...
and $1 <= somedate
and $2 >= somedate
group by ...
order by ...;

In fact, the query joined seven tables with several WHERE conditions, grouped on four fields and finally sorted the results. I went through the usual stuff: analyzing the query plan and verifying that all required indexes were in place (a couple of joins on particularly large tables, unfortunately, were on 'varchar' fields, but modifying the database at this stage was out of the question, so I had to do what I could). Eventually the limited bag of tricks at my disposal was depleted, and the performance had only slightly improved. However, when I measured the performance of the report called from the application and compared it to running the query directly against the database, there was a significant overhead in the report case. When the report was run from code, it sent the following query to the database:

select * from myreportsubfunction ('2013-03-13', '2013-03-20');

And the myreportsubfunction was declared similar to the following:

CREATE OR REPLACE FUNCTION myreportsubfunction(IN fromdate timestamp without time zone, IN todate timestamp without time zone)
  RETURNS TABLE(Name1 character varying, Name2 character varying) AS
$BODY$
select Column1 as Name1, Column2 as Name2
from sometable tbl
inner join ...
where ...
and ...
and $1 <= somedate
and $2 >= somedate
group by ...
order by ...;
$BODY$
  LANGUAGE sql VOLATILE

So - what's the trick here? The function seems to return the result of the query, but takes much longer to execute than the raw query. And here is the reason: when the database prepares a query plan for the function, it does not know anything about the parameter values. The result is likely to be a bad query plan, especially if the query is complex. The solution is to change the language from sql to plpgsql and make use of the RETURN QUERY EXECUTE command, which plans the query afresh with the actual parameter values each time. Now the myreportsubfunction looks like the following:

CREATE OR REPLACE FUNCTION myreportsubfunction(IN fromdate timestamp without time zone, IN todate timestamp without time zone)
  RETURNS TABLE(Name1 character varying, Name2 character varying) AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'select Column1 as Name1, Column2 as Name2
from sometable tbl
inner join ...
where ...
and ...
and $1 <= somedate
and $2 >= somedate
group by ...
order by ...;' USING $1, $2;
END
$BODY$
  LANGUAGE plpgsql VOLATILE

The function now takes as much time to run as the "raw" query, significantly improving the performance.

References

39.5. Basic Statements
Postgresql Slow on custom function, php but fast if directly input on psql using text search with gin index

Sunday, March 10, 2013

Fixing PostgreSQL index bloating with scheduled REINDEX via pgAgent

There is a problem generally called 'index bloating' that occurs in PostgreSQL under certain circumstances when there are continuous inserts and deletes happening in a table. It is described as follows "B-tree index pages that have become completely empty are reclaimed for re-use. However, there is still a possibility of inefficient use of space: if all but a few index keys on a page have been deleted, the page remains allocated. Therefore, a usage pattern in which most, but not all, keys in each range are eventually deleted will see poor use of space. For such usage patterns, periodic reindexing is recommended".

This looks exactly like the problem I came across, when a table with ~2K rows had an index of over 120MB and another table with ~80K rows had 4 indexes on it, with total size approaching 3GB. The AUTOVACUUM was running as configured by default but apparently not enough to prevent index bloating.

Eventually, I decided to configure a REINDEX to run monthly on the tables in question. pgAgent is the job scheduler to use with PostgreSQL. With PostgreSQL 9.1 I could install it following the documentation - the tables were created and the service was running - but I could not find any UI for it. So here's an example script that I used to create a scheduled job.

SET search_path = pgagent;

INSERT INTO pga_jobclass VALUES (6, 'Scheduled Tasks');

INSERT INTO pga_job VALUES (5, 6, 'TableReindex', 'Reindex tables', '', true, 
 '2013-03-27 10:00:00.000+11', --date created
 '2013-03-07 10:00:00.000+11', --date changed
 NULL, NULL, NULL);

INSERT INTO pga_schedule VALUES (3, 5, 'TableReindexSchedule', 'Reindex tables', 
 true, --enabled
 '2013-03-27 10:00:00.000+11', --start date
 NULL, --end (never)
 '{t,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f}', --minutes: 't' for run on the first minute of an hour
 '{t,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f}', --hours: 't' to run at 3 AM
 '{t,t,t,t,t,t,t}', -- weekdays: don't care, all true
 '{t,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f}', -- monthdays: 't' to run on the first day
 '{t,t,t,t,t,t,t,t,t,t,t,t}'); -- months: all true to run on the first day of each month

INSERT INTO pga_jobstep VALUES (5, 5, 'TableReindexInfo', '', true, 's', 'REINDEX TABLE mytable1;REINDEX TABLE mytable2;', '', '@@DATABASE_NAME@@', 'f', NULL);

To verify, I checked the pga_job table and found '2013-04-01 03:00:00+11' in the jobnextrun column - that's when I want it to run: at 3 AM on the first of the next month.

I still have a question though. I tried using the following monthdays pattern

'{f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t}', -- monthdays: 't' in the last position

because the last value is supposedly used for setting the "last day" option - run on the last day of the month.

This, however, returned quite a bunch of errors, which suggest that pgAgent has trouble calculating the next time the job will run.

ERROR:  value "32768" is out of range for type smallint
CONTEXT:  PL/pgSQL function "pga_next_schedule" line 1254 at assignment
SQL statement "SELECT                                                    MIN(pgagent.pga_next_schedule(jscid, jscstart, jscend, jscminutes, jschours, jscweekdays, jscmonthdays, jscmonths))
               FROM pgagent.pga_schedule
              WHERE jscenabled AND jscjobid=OLD.jobid"
PL/pgSQL function "pga_job_trigger" line 24 at SQL statement
SQL statement "UPDATE pgagent.pga_job
           SET jobnextrun = NULL
         WHERE jobenabled AND jobid=NEW.jscjobid"
PL/pgSQL function "pga_schedule_trigger" line 60 at SQL statement

********** Error **********

ERROR: value "32768" is out of range for type smallint
SQL state: 22003
Context: PL/pgSQL function "pga_next_schedule" line 1254 at assignment
SQL statement "SELECT                                                    MIN(pgagent.pga_next_schedule(jscid, jscstart, jscend, jscminutes, jschours, jscweekdays, jscmonthdays, jscmonths))
               FROM pgagent.pga_schedule
              WHERE jscenabled AND jscjobid=OLD.jobid"
PL/pgSQL function "pga_job_trigger" line 24 at SQL statement
SQL statement "UPDATE pgagent.pga_job
           SET jobnextrun = NULL
         WHERE jobenabled AND jobid=NEW.jscjobid"
PL/pgSQL function "pga_schedule_trigger" line 60 at SQL statement

The simplest way to reproduce it is to run an UPDATE similar to the following

UPDATE pgagent.pga_schedule
SET
jscmonthdays = '{f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t}'
WHERE jscid = 3

Since I don't really care about running it on the first or the last day, I won't dig deeper into it. Could be a bug in pgAgent for all I know.

References:

23.2. Routine Reindexing
How To Install PgAgent On Windows (PostgreSQL Job Scheduler)
Setting up PgAgent and Doing Scheduled Backups
Automated Backup on Windows

Wednesday, February 27, 2013

On NHibernate Inverse mapping and database audit table triggers

While working on a different issue, I noticed something strange about a history table. A history table keeps all changes made to the records in the main table - whenever something is inserted, updated or deleted, a record is added to the history table. However, each time something was inserted, two records were added to the history table. The first one had a null value in the foreign key field referencing the main table, and the second one had the correct value.

Duplicates in history table

Turns out that the reason is the way NHibernate maps objects to data. In my case, the parent entity knows about the child entity. The child entity, however, has no knowledge of the parent entity. This can be most easily explained by looking at the following snippet:

public class Order
{
 public virtual int Id { get; set; }
 public virtual ICollection<Detail> Details { get; set; }
}

public class Detail
{
 public virtual int Id { get; set; }
 public virtual string Name { get; set; }
}

// ...
var order = new Order();
var detail = new Detail() {Name = "Widget"};

session.Persist(detail);
order.Details.Add(detail);

While the Order knows about the Details, each Detail has no knowledge of the parent Order. NHibernate will therefore perform the following sequence of actions:

  • Insert the Order
  • Insert the Detail with OrderId = null
  • Update the Detail with actual OrderId
which in my case looks like this:

Insert, then Update

How can that be fixed so that only one INSERT statement is executed? Well, here is where Inverse comes into play. Check the references at the end of the post for a proper explanation, but in short, setting Inverse to true places the responsibility for saving the relationship on the "other side" - in this case, on the child. In the mapping, that is a change from

mapping.HasMany(x => x.Children).KeyColumn("inventoryitemid")
 .AsSet();

To

mapping.HasMany(x => x.Children).KeyColumn("inventoryitemid")
 .AsSet()
 .Inverse(); 

And the code snippet should now look this way

public class Order
{
 public virtual int Id { get; set; }
 public virtual ICollection<Detail> Details { get; set; }
}

public class Detail
{
 public virtual Order Order { get; set; }
 public virtual int Id { get; set; }
 public virtual string Name { get; set; }
}

// ...
var order = new Order();
var detail = new Detail() {Name = "Widget", Order = order};

session.Persist(detail);
order.Details.Add(detail);

Note how the Order is now referenced by the Detail and is passed in when the Detail object is created. In my case, the profiler now shows the following:

Insert only

So, problem solved? Well, yes and no. While profiling the modified solution, I found that now a lot more actions were logged in the same scenario - most of them were second level cache access. The impact on performance is not immediately obvious, but hardly beneficial.

Second level cache access

It is likely that by adding a reference to the parent I forced NHibernate to retrieve the whole parent entity. Eventually, not having enough time to evaluate the full impact of the change, I chose a somewhat "compromise" solution: modify the trigger that inserts into the "history" table so that it checks whether the foreign key value is not null.

So, a trigger that looked like this

CREATE OR REPLACE FUNCTION process_emp_audit() RETURNS TRIGGER AS $emp_audit$
    BEGIN
        IF (TG_OP = 'INSERT') THEN
            INSERT INTO emp_audit SELECT 'I', now(), user, NEW.*;
            RETURN NEW;
  ...
        RETURN NULL; -- result is ignored since this is an AFTER trigger
    END;
$emp_audit$ LANGUAGE plpgsql;

would now be modified as follows

CREATE OR REPLACE FUNCTION process_emp_audit() RETURNS TRIGGER AS $emp_audit$
    BEGIN
        IF (TG_OP = 'INSERT' AND NEW.foreignkeyfield IS NOT NULL) THEN
            INSERT INTO emp_audit SELECT 'I', now(), user, NEW.*;
            RETURN NEW;
  ...
        RETURN NULL; -- result is ignored since this is an AFTER trigger
    END;
$emp_audit$ LANGUAGE plpgsql;

Looks like a dirty hack, which falls into the "best we can do in the least time" category.

References:

NHibernate Inverse attribute
Fluent NHibernate and Collections Mapping
NHibernate performance issues #1: evil List (non-inverse relationhip)
NHibernate's inverse - what does it really mean?
39.9. Trigger Procedures

Monday, February 11, 2013

Photobox – CSS3 JQuery Image Gallery

I came across a nice image gallery script which is lightweight, hardware accelerated and generally looks good. Images can be zoomed in and out using the mouse wheel and navigated with mouse movement. The image 'alt' text is shown at the bottom, and a row of thumbnail images is also displayed there. Autoplay is supported, and its timing is configurable. The script can be downloaded from Photobox github. As for Internet Explorer, it only supports IE 8 and higher, and it does not look as good there as in other browsers.

The usage is very easy: jQuery, script and css have to be referenced as usual, i.e.

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.0/jquery.min.js" type="text/javascript"></script> 
<link href="@Url.Content("~/Scripts/photobox/photobox.css")" rel="stylesheet" type="text/css"/>
<link href="@Url.Content("~/Scripts/photobox/photobox.ie.css")" rel="stylesheet" type="text/css"/>
<script src="@Url.Content("~/Scripts/photobox/photobox.js")" type="text/javascript"></script>

A gallery with all default values (again, check Photobox github for parameters) is included as follows

<div id='gallery'>
  <a href="../../../Content/photobox/P1.jpg">
   <img src="../../../Content/photobox/P1_small.jpg" alt="photo1 title"/>
  </a>

  ...
  <!-- more images -->
</div>

<script type="text/javascript">
 $(document).ready(function () {
     $('#gallery').photobox('a');
 });
</script>

A more involved setup with parameters may look as follows

<script type="text/javascript">
 $(document).ready(function () {
     $('#gallery').photobox('a:first', { thumbs:false, time:0 }, imageLoaded);
  function imageLoaded(){
   console.log('image has been loaded...');
  }
 });
</script>

I added a sample gallery (photos courtesy of my wife) to my website: Photobox Example

The border around the images is totally optional

<style type="text/css">
img {
   padding:1px;
   border:1px solid #021a40;
   background-color:#ff0;
}
</style>

References

Photobox – CSS3 JQuery Image Gallery
Photobox github
jquery access nested div
Using CSS for Image Borders

Monday, February 4, 2013

Use of PostgreSQL Indexes

I'm busy investigating how indexing works in PostgreSQL and how it can be improved. One particularly useful query I came across is this:

SELECT idstat.schemaname AS schema_name, idstat.relname AS table_name,
indexrelname AS index_name,
idstat.idx_scan AS times_used,
pg_size_pretty(pg_relation_size(idstat.relid)) AS table_size, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
n_tup_upd + n_tup_ins + n_tup_del as num_writes,
indexdef AS definition
FROM pg_stat_user_indexes AS idstat JOIN pg_indexes ON (indexrelname = indexname AND idstat.schemaname = pg_indexes.schemaname)
JOIN pg_stat_user_tables AS tabstat ON idstat.relid = tabstat.relid
WHERE idstat.idx_scan  < 200
 AND indexdef !~* 'unique'
ORDER BY idstat.relname, indexrelname;

It returns the following information for each index in the database:

  • schema name
  • table name
  • index name
  • disk space used by the index, and the table
  • how many rows were inserted, deleted or updated
  • how many times the index was used

If the database has been used for some time, this information may help to find out which indexes are not used at all, or are used rarely but occupy a lot of disk space. Or it may suggest that something is not working as designed - a rarely used index that was expected to be used a lot is probably a warning sign.

The unfortunate complication I came across was that the query returned absolutely no data after a database restore. My current understanding is that PostgreSQL does not back up the pg_catalog, and the indexes are rebuilt when the database is restored. Therefore, if I do not have direct access to the database, I have to either ask someone to run the script (and give them the PostgreSQL/pgAdmin password), or somehow obtain a file-system-level copy of the database. In the future, I'll need to create a utility that extracts this information and saves it to a file.
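A minimal sketch of what such a utility could look like in C#, assuming the Npgsql driver (the connection string and output path are my examples); it runs the index usage query from above and dumps the rows to a CSV file:

using System;
using System.IO;
using Npgsql;

class IndexUsageDump
{
	static void Main()
	{
		using (var conn = new NpgsqlConnection("Host=localhost;Database=mydb;Username=postgres;Password=secret"))
		using (var writer = new StreamWriter(@"C:\index_usage.csv"))
		{
			conn.Open();

			// indexUsageQuery holds the SELECT from the beginning of this post
			string indexUsageQuery = "SELECT ...";
			using (var cmd = new NpgsqlCommand(indexUsageQuery, conn))
			using (var reader = cmd.ExecuteReader())
			{
				while (reader.Read())
				{
					// write each row out as a comma-separated line
					var fields = new string[reader.FieldCount];
					for (int i = 0; i < reader.FieldCount; i++)
						fields[i] = reader[i].ToString();
					writer.WriteLine(string.Join(",", fields));
				}
			}
		}
	}
}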

References

Handling Growth with Postgres: 5 Tips From Instagram
Efficient Use of PostgreSQL Indexes
Finding Useless Indexes
9.20. System Administration Functions
27.2. The Statistics Collector
PostgreSQL index usage, backing up pg_catalog data