December 2003 - Posts

Should an O/R Mapper also be a code generator?

With all the hype surrounding O/R Mappers of late Mark Bonafe decided it was time to look at LLBLGen Pro which he “thought was among the best O/R tools available.” 

When I originally looked at LLBLGen Pro I had some of the same feelings as Mark.  In my opinion an O/R Mapper should not be a code generator.  An O/R Mapper should perform all its magic through its core components, not by generating lots of code.  I don't view LLBLGen Pro as a “pure” O/R Mapper.  I tried the demo version of the product several times and each time I was left with the feeling that I was evaluating an n-tier code generator with a little bit of O/R Mapping thrown in. 

In what I call a “pure O/R Mapper“, the developer doesn't see any generated code.  The mapper *may* generate some of the data access code at runtime, but, it's not something that the developer using the mapper has to worry about.  The code doesn't clutter up their project, and introduce another thing for them to manage. 

Tablet PC recommentations?

I've been thinking about getting myself a Tablet PC for a Christmas present to myself.  Anyone have any recommendations?  I've done some research myself but wanted to see if any developers out there had any thoughts about what I should look for in a developer friendly Tablet (if there is such a thing).

Should all of us think, act, and behave the same?

I've recently come across a number of posts, and replies which has left me with a lousy feeling.  Should we all think, act, and behave the same?  If Joe Developer thinks one way about something does that mean I need to think the same way?  If George Programmer thinks that he needs to write a tool to help the community, should I have a problem with that?

I have recently seen a lot of people labeling people as one thing or another because of what they decide to do with their time.  If I decide to write a O/R Mapper does that mean I'm an egoistical developer that thinks I'm way better then everyone else on this planet? 

We are all different.  We have different views, opinions, and experiences.  This is what makes the community great.  If everyone thought the same, found the same things important, and had the same knowledge this wouldn't be very interesting.  The thing that is great about blogs is that you can read posts from all different kinds of people.  You can learn new things from people more experienced then you, you can help those with not as much experience see the .NET light, and you can post your ideas and viewpoints on a subject and get constructive criticism from your peers. 

I have a request.  Let's try our best to keep the community a positive place.  Let's not pass judgment on people for what they spend their time doing.  Let's give constructive feedback, let's tell people when we totally disagree with what they're saying or doing, but, lets keep it positive.

I realize by posting this message I'm more or less doing exactly what I'm saying not to do.  I apologize for that. 

.NET O/R Mappers

Now that I've attempted to describe what is an O/R Mappers, described how an O/R Mapper is different then a code generator, and attempted to provide advantages dynamic sql provides an O/R Mapper I thought it might be useful to provide a list of some of the .NET O/R Mappers I've come across.

Feel free to leave your favorite .NET O/R Mapper in the comments of this post!

NVelocity templating engine

The last couple of days I've been digging through the source of NEO.  Today I came across the code generation logic which uses NVelocity.  It appears to be a pretty nice templating engine which is a port of the Velocity Jakarta project.  Anyone used it?

What advantages does dynamic sql provide an O/R Mapper that procs don't?

Ok, so now that I've given my O/R Mapper overview, as well as shared how an O/R Mapper is different then a code generator let me try and attack the question that prompted me to write these posts. 

What advantage does dynamic sql provide an O/R Mapper that stored procedures don't?

To answer this question lets first look at the type of SQL the O/R Mapper has to generate.

  • Save - INSERT and UPDATE statements to update all the properties of the object when the .Save() method is called on the object.  Possibly generate SQL to only update certain columns, for example set the Active flag to true.
  • Delete - DELETE statement to remove an object from the data store by its primary key, by a column, or group of columns.
  • Select - SELECT statement to retrieve an object by its primary key, or any combination of column values.

One can easily create dynamic sql OR stored procedures to provide the necessary functionality.  Clearly neither dynamic SQL or stored procedures offer an advantage if we look at it purely from a functionality point of view.  I haven't come across anything that I could do in dynamic SQL that I couldn't in stored procedures.  However, there are certain things which dynamic sql makes easier.  Lets look at an simple example.

Imagine that you want to be able to query the system for all employees who live in Pennsylvania, have a salary greater then $45,000, and were hired after 1/1/2002.  If your O/R Mapper is using dynamic SQL the mapper will create some SQL that looks like this:

SELECT * FROM t_Employee WHERE State = 'PA' AND Salary > 45000 AND HireDate > '1/1/2002'

Lets assume you want to query the system slightly differently.  This time you want to find all employees who live in Virginia, have a salary less then $150,000, and where hired before 1/1/2003.  Easy enough, this time the OR Mapper creates the following dynamic sql:

SELECT * FROM t_Employee WHERE State = 'VA' AND Salary < 125000 AND HireDate < '1/1/2003'

Now lets take a look at how the stored procedure would be written.  In the examples above we're querying the t_Employee table by three criteria.  However, in reality we want to be able to query the t_Employee table by any of the columns within the table.  How do you write a stored procedure to handle that?  As I've noted in Optional Parameters in SQL Server Search Queries there is a pretty clean way to handle a bunch of optional parameters.  The question becomes what columns do I want to allow the user to query against.  At the start you probably don't make every possible column an optional parameter, since it may introduce some overhead into the procedure.  The other thing which now has to be considered is the fact that your may want to query the columns using different operators (<, >, <>, <=, >=) as in our example above.  How do you handle that in your stored procedure?  I know you can do it, it just takes a lot more work, and a lot more code.

Let me disregard the argument above for the time being.  I'll assume that somebody out there will respond with a really snazzy way to handle a lot of optional parameters as well as the ability to query the table's columns using whatever operator you like (Please leave any solutions in the comments!!).  The real reason I'm starting not to like working with stored procedures is because of the overhead in maintenance they require.  When I change a business object I have to perform several tasks to get my stored procedures and tables in sync.  I have to update each Save, and Select proc that references the table to properly handle the change.  This may mean adding additional columns to the select, changing the WHERE clause to support an additional optional parameter, or changing the save to persist an additional property.  Now consider the number of changes of this type that occur during the development of a project.  It happens often, and it can add a lot of overhead. With dynamic SQL this isn't a problem.  I update my business object, and my table and away I go. 

In summary the two main reasons I like dynamic sql for a O/R Mapper are:

  • Using dynamic sql the O/R Mapper is better able to handle ad-hoc queries.  With dynamic sql we can allow the developer to query the system for objects by any combination of properties.  They can choose to use whatever operator they so desire and don't have to worry about hacking a stored procedure together to accomplish what they want.
  • The maintenance overhead required in an O/R Mapper that uses stored procedures is higher.  When an object is changed you have to go through all your stored procedures and perform all the necessary updates.  Considering the amount of object changes that are made during development of a typical application this can add a lot of overhead to the project.

With that said let me conclude with a little anti-dynamic sql thought.

A good O/R Mapper that supports stored procedures should make the maintenance problem almost non-existent.  If the O/R Mapper is responsible for creating the stored procedures it should be able to update them when a change is made to an object.  The O/R Mapper should also be able to make schema changes to the database.  Maybe the tool could even allow the developer to select two assemblies containing the business objects for the application, and do a diff on the assemblies to create a SQL Script to update the schema, as well as the stored procedures.  If someone creates that (me?) I think I may have less reason to move to dynamic sql.  We shall see....

 

What's the difference between an O/R Mapper and a code generator?

In my previous post I provided an O/R Mapper overview.  One of the common responses I get from people after I describe an OR Mapper is...so its a fancy code generator?

Although my initial reaction is always “Its not a code generator!?!?“, I catch myself each time I get the response and realize it “sorta-kinda” is a super duper fancy code generator.  So what is the difference between an OR Mapper and a code generator?

A code generator generates code.  You run the generator using a set of templates that you setup, and it spits out a whole bunch of code.  When something changes you open up the code generator and re-generate all the code.  A code generator makes it so that when things change you need to re-generate the code, and recompile.  ( I realize this is a huge generalization but for the most part I think it holds true )

An OR Mapper is a framework of components.  The framework may use runtime code generation to aid in the mapping of objects to relational databases, but it doesn't just generate a DAL.  Let me give you a quick example.

In my OR Mapper (not sure if it can really be called an OR Mapper yet but oh well) I use code generation at runtime heavily.  The code that is generated, however, is never seen or compiled by the developer using the mapper.  Below is a step by step of how/when code generation is used in my mapper.

The first step in the process involves a traditional code generator as discussed above.  The generator creates the object with it's properties, along with the correct custom attributes necessary for the mappings.  After the object is created the object can be compiled into the assembly.  At this point none of the code/SQL for saving, deleting, and retrieving my objects from the data store exists.  When the application is run, and the user of the applications goes to save information within one of my objects the code and SQL is generated by the framework and compiled into a dynamic assembly.  The dynamic assembly is then cached and on subsequent save requests used by the framework for saving the object.  This offers a couple of advantages. 

  • The developer never sees, and thus never worries about this code. 
  • The developer doesn't have to go to a separate tool and say to regenerate the code.
  • The generated assembly is always up to date.

In summary, a code generator just generates code.  An O/R mapper may use runtime code generation within its framework of objects to aid in mapping objects to databases, but, it isn't just a code generator.  You don't see the code that it generates (usually). It removes the burden of writing and managing DAL code from the developer.  Rather then worrying about writing (or using a code generator to generate) a bunch of Save, Delete, and Retrieve routines for objects, the developer worries about the business rules and requirements for the application.  The O/R Mapper handles the rest!

What is an O/R Mapper anyway?

There has been a lot of talk in the community recently regarding O/R Mappers....so....what is an O/R Mapper?

Let's start with O/R.  The O stands for object and the R stands for relational.  Within almost any application we have objects and we have a relational database.  Within our business and UI layers we work with our objects.  When we're done making changes we save the information in the objects to the relational database. 

When you're developing an application (without an O/R Mapper) you probably write a lot of data access code.  Code to save, delete, and retrieve your object from the database.  You write methods in your DAL to retrieve different sets of objects, change status', and perform various other tasks.  Writing this data access code *can be* very repetitive.

If you open up your latest application and look at the DAL you will likely see a common pattern among your methods.  Lets take the DAL methods for saving your objects to the database as an example.  You pass in an object, add a SqlParameter object to a SqlCommand object for each of the properties of your object that should be saved, set the .CommandText property of the SqlCommand to the stored procedure for saving your object, and execute the SqlCommand.  You write the code to do this for every object.

There is a better way!  Enter an O/R Mapper.  Essentially an O/R Mapper “generates” your DAL for you.  Rather then writing the DAL code, you use an O/R Mapper.  You use the O/R Mapper to save, delete, and retrieve your objects.  The O/R Mapper figures out the SQL, you just worry about your objects.

Ok, but how?  Ah, I never did describe the Mapper part did I?  We know we have objects, and we know we have a relational database.  In between the two we have our O/R Mapper.  Often times the properties on our objects do not match 100% to the columns in our database.  In order to have the FirstName property of our customer object save to the first_name column of our t_Customer table we need to do some “mapping.”  We need to let the O/R Mapper know that the FirstName property on our customer needs to be saved to the first_name column of the t_Customer table.  We setup these mappings in our O/R Mapper.  Besides just mapping properties of our objects onto columns in our database we also need to define keys, and relationships within our objects. 

The good O/R Mappers have a designer that allows us to setup all of the necessary mappings.  Some of the more immature (FREE) one's define the mappings between the objects and database with an XML file.  Others use custom attributes to do the mappings.  The way the mappings are done isn't really all that important, what is important is that the O/R Mapper allows us to setup these mappings.

After all the mappings are defined, the O/R Mapper can do A LOT of work for us.  Using the mappings the O/R Mapper can generate all the SQL for saving, deleting, and retrieving our objects.  We no longer have to write hundreds of lines of DAL code.

There is a lot more to O/R Mappers then I've described here.  In future posts I'll go into some of the other features that O/R Mappers provide. 

I'm sick of my stored procedure generator

Tonight as I was working on some enhancements to an application I developed a couple months back I realized how sick I am of my stored procedure generator.  I wrote the application a couple of years ago because I was so tired of coding the same old stored procedures over and over for my applications.  When I wrote the app it was an amazing productivity boost.  Rather then writing my procs by hand I could just open up my generator and with a few clicks have all my procs.  Well the honeymoon is over.  I'm officially over my stored proc generator.  Frans I hope your happy.  This is in large part due to the dynamic SQL vs. stored procedure debate that went on a couple weeks back.  The debate was what pushed me over the edge.  I had for a while been thinking that dynamic SQL would be a better option for some of the stuff I was doing in my entity framework, however, I had stored procedures so ingrained into my mind that it just didn't feel write to move away from them.  Well I am now ready.  Unfortunately I'm kinda in the middle of a project and can't stop mid-stream and write my dynamic sql entity provider.  I guess I'll have to hate my friggin stored proc generator for at least a couple more weeks

myEntity.ID.GetType() == typeof(int)

What should a ID property on a base entity object be declared as?  I've always coded my ID properties to be int's, however, that has been largely due to the fact that I almost always use SQL Server as my backend data store.  Since my entity framework has evolved from a work project which was using SQL Server, it still has some SQL Server specific logic which needs to be removed. 

In order to support the largest number of data stores I'm thinking of switching things around.  I've identified a couple of different options:

  1. Convert the ID property of the base entity object to be a string or object.  This will allow me to store pretty much any type of ID in the property.  The Entity Provider for each data store would then need to deal with any conversions it requires.
  2. Create a ObjectID/EntityKey base object.  Have the framework support different types of EntityKey objects during the persistence of objects.
  3. Stay with what I have, since it is working where I'm currently using it.

 

Consulting Tips

Today as I was catching up on some blogs I came across a link to Consulting Tips from the Million Dollar Consultant.  The articles provided on the site offer a lot of advice and guidance for anyone doing consulting work. 

A plea for some organization and structure on weblogs.asp.net!

Over the past couple of days I've seen a lot of blog posts over on blogs.gotdotnet.com that look something like this....

“We're being forced to move, we're too popular :-), guess we'll have to move in with all them other fools over at weblogs.asp.net”

Of course I'm kidding about the wording, but, the bottom line is it looks like we're about to have a serious influx of MS bloggers over on weblogs.asp.net.  As of right now I'm not very happy about what's coming.  The number of bloggers over on weblogs.asp.net is already a little more then I'd like.  I like having a main feed for a bunch of bloggers, however, when the feed is aggregating a friggin ton of blogs it starts to get out of control.  How about a separate url for all the MS peeps, ms.weblogs.asp.net or something similar. 

At least make available a public OPML for each set of bloggers so I can use Feedgregator to create my own MS only feed.  Please do not just add all these MS bloggers and make the main feed for the site useless.  I like the main feed, I like that I know its mostly non-ms people, I like the non ms community they have.  I've seen a couple of posts recently about how people are thinking of leaving and creating separate blogs now that Microsoft is moving in.  I don't want that.  If some of them leave, I'm sure I'll follow them to where they go, however, I will not follow them all, there are just too many blogs out there to follow.

Ok, so I've said I don't want everybody moving in, but what do I want?  They are coming and that isn't going to change so how about this:

1. Add groupings to the administration for .Text.  When a new blogger is added to the site assign them to the appropriate grouping.  To start with I'd be happy with two groupings: Microsoft and Non-Microsoft.  Ideally we'd have a finer level of grouping.  Below is a couple example sub-groupings:

  • Indigo
  • Longhorn
  • C#
  • VB.NET
  • ASP.NET

2. Create a separate main feed for each of the groupings. 

3. Allow people to create their own customized feed by picking the groupings they're interested in.  So if I want a feed with MS bloggers who are on the Indigo and WinFS teams I should be able to have it.

Whatever you do  (whoever “you” are)  don't just throw everyone in there and make people feel like they are getting lost, put some organization around things, give people a home, make them feel like they can still belong.

Please!

 

Persisting properties of child objects

I often get frustrated by the amount of code I need to write to handle the persistence of entity objects with children objects.  Lets take an Order object as an example.  An order has a customer associated with it.  When designing your order class you may add a Customer property that returns a fully loaded Customer object.  This allows you to get all the details pertaining to the customer that made the order.  To support this relationship you will likely have a database schema that looks something like this:

OrderID int
CustomerID int
Total money
OrderDate datetime
etc...

When you save your order you first need to save the details of the Customer associated with the order.  Then you need to write some code to set the CustomerID property of the order to the ID of the customer that was just saved.  Then you need to do the same thing for all the other child objects, such as Credit Card, Address, Billing Address, and etc.  An entity object with a large number of children objects often gets filled with code for saving all the child objects, syncing up the associated ID properties, and then saving the parent object.

To help reduce the amount of code I needed to write in theses scenarios I created a new custom attribute.  The [PersistProperty] attribute tells my entity framework what property of the child object needs to be persisted with the parent object.  The below example shows that the ID property of the Customer should be persisted with the Order object, and that it should be stored in the CustomerID column of the order table.  When the SQL is sent to the database for saving the order object the Order.Customer.ID property will be sent along and stored in the CustomerID column of the Order table.

[PersistProperty(PropertyName="ID", PersistAs="CustomerID")]

public Customer Customer {
  get {
    
if(_customer == null)
        _customer =
new Customer();
     
return _customer;
  }
  set {
     _customer =
value;
  }
}

The support for this new attribute within my framework allows me to remove the need to write a bunch of code for saving the customer and hooking the saved customer ID up to the order.  I haven't completely finished the implementation, but the information in the [PersistProperty] attribute can also be used during the load of the Order to set the Order.Customer.ID property to the value in the CustomerID column of the Order table.

See try and tell me this isn't fun and interesting.  I couldn't let Thomas and Frans have all the fun!

Why create your own O/R Mapper (Entity Framework)?

A question I get asked every so often is “why write your own O/R Mapper when there are companies out there with an entire team of developers working on them for you”?  There are many reasons I've created my own, and will continue to do so.

  • I really enjoy thinking about the challenges it presents
  • I like the ability to completely change the way it works to make it do what I need it to.
  • It makes me think more about API usability, and design then some of the other things I could be spending my time on.
  • Its fun to post things about my O/R Mapper and hear Thomas tell me its a stupid design :-)
  • Its fun to pretend mine is better then everyone else's even though its just a “little” side project.

All in all I work on my O/R Mapper (Entity Framework) because its fun.  The reason I code is because I enjoy it.  I'm too often “forced” to do a lot of other things at work that more or less suck, so sitting down and coding up something that presents an interesting challenge is a nice change of pace!

 

Configuration Section Handlers, the easy way

A couple months ago I came across a couple of posts from Craig Andera about a really snazzy configuration section handler.  I recently stumbled upon it again and it's so slick I had to get a post up here for anyone who might have missed it.

The Last Configuration Section Handler I'll Ever Need

and a follow up...

XmlSerializer as XPath

Configuration Section Handlers, the easy way

A couple months ago I came across a couple of posts from Craig Andera about a really snazzy configuration section handler.  I recently stumbled upon it again and it's so slick I had to get a post up here for anyone who might have missed it.

The Last Configuration Section Handler I'll Ever Need

and a follow up...

XmlSerializer as XPath

My new coding spot

I just recently had my attic transformed into an office.  I have my second little one on the way and my wife stole my old office so we could keep our guest bedroom...the fear of not having a nice coding spot “forced” me into transforming my attic.  I'm not the most handy fellow in the world so I had somebody come in to do the construction, and then did the painting, staining, and touch up myself over the thanksgiving holiday.  I'm extremely pleased with how it turned out.  Up on the third floor I can barely hear my wife yell up to me to stop working and come to bed ;-)

Check out my new coding spot in the Photo Gallery

Finding the type of a class from a static method

One of the common requests I get from people using my base entity framework is in regards to how they call some methods on the entity objects.  Rather then calling the methods on an instance of a class they want to call static methods.

// current implementation
CustomerCollection customers = new Customer().GetAll();

// requested implementation
CustomerCollection customers = Customer.GetAll();

In order to for the entity object to work its magic in the above example it needs to know what type of class its is retrieving.  Using the type of the class it figures out what table to select the data from, as well as what type of object to create an instance of (Activator.CreateInstance).  In order for the static method to work I would need to know how to get the name of the class that the static method is within.  GetType() gives us this information when we have an instance of the class, however, it is not available within static methods.  The search goes on....

Using global variables in DTS Packages

Today to move a bunch of data out of a database I used SQL Server's Data Transformation Services.  I created several Transform Data Tasks to move data from the production database to an “archive” database.  When all the data was removed I removed the data that had been archived.  The data we archived was selected by its modified date.  Rather then go into each DTS Task and update the WHERE clause for the SELECT or DELETE we used Global Variables.  This was actually the first time I've used global variables, so I figured I'd share my experiences.

To setup a global variable for use in your DTS package open up the properties for the package (Right click on the DTS Designer and select “Package properties“).  Within the global properties dialog you can enter the name for the global variable, its data type, as well as a value.  Then within the SQL for each of your transform data tasks you can enter a ? where the global variable should be inserted.  Click on the “Parameters” button in the edit window for the transform data task (its below the SQL query textbox) to wire up the parameters in your query to your global variable.

In my particular scenario I had about 5 transform data tasks that needed to either select or delete data based on one date.  By using a global variable I can now update the date to use in each query without touching each individual task.  Ah, the joys of DTS!

Archiving data in one big ass database

So we have a problem.  One of our clients has a database that is growing out of control.  The database is currently about 85gb, and they only have a 100gb drive in the server.  With all the other “stuff” on the machine they have a total of approximately 4gb remaining, well probably about 3.5gb by now.

The largest tables in the database are for tracking visitors of the site as well as for tracking where visitors go on the site.  We're planning on archiving the data in these tables by creating another database on a backup server, copying the “out of date“ data to the backup database, and then removing the data from the live database server.  The only question is....can we do it before we run out of space.....

Test Infected Developers Anonymous

If I don’t have tests, I will get lost, assuming the code I’m writing is even vaguely challenging. Maybe I can just bash out the code, but how long will testing it manually take? Probably at least as long as writing my unit tests. I could just hand the “completed” code to QA when I think I’m done, without performing any testing, but then they’re going to find glaring bugs in it and hand it back
# http://mikemason.ca/2003/12/03/#029FastestWayToDevelop.

The more I do TDD the more I feel just as Mike does.  I've gone back to a couple of pre-TDD projects recently and I really don't like the feeling I have while working with them.  As soon as I start on a project without any tests I create a new test project so that I can validate that my changes do what I expect.  Unfortunately that doesn't give me any assurance that what I'm changing isn't breaking a bunch of other classes that don't have unit tests.  Coding in a non-TDD world is scary, I need the warm glow of those green bars!