Introduction


This article describes the implementation of C1DBSpellChecker, a utility that performs spell-checking on OleDb databases including SqlServer and Access.


After reading this article, you will be able to specify a connection string, and then select a table and one or more string fields on the table to spell-check. Typing errors will then be displayed in a grid where they can be corrected. When all corrections have been made, the changes will be saved back to the database.


C1DBSpellChecker


Background


Spell-checking is an important tool used by most people when creating documents of all kinds, from formal reports to simple e-mail messages. Many popular applications provide built-in spell-checking that makes this task easy and almost automatic.


Unfortunately, not all applications have built-in spell-checking. Visual Studio is a good example. Developers use Microsoft Visual Studio to create applications, web pages, and documentation. Unless they have a package such as ComponentOne IntelliSpell, it is likely that spelling mistakes will creep into their work and eventually surface on a website or commercial application.


Even developers that use ComponentOne IntelliSpell often rely on content that is stored in databases and hasn't been properly spell-checked. For example, the popular AdventureWorks database contains typos such as:



  • alluminum, alumunim (Aluminum)

  • comparible (Comparable)

  • securly (Securely)

  • funtionality (functionality)

  • manuverability (maneuverability)

  • responsivness (responsiveness)

  • you'l find (you'll find)

Spelling errors like this aren't very professional on a catalog or website. Surprisingly, there are few or no tools for spell-checking databases. The C1DBSpellChecker application was designed to fill this need.


Application Structure


The C1DBSpellChecker application performs the following tasks:



  1. Allow the user to select a connection string.

  2. Get the schema for the selected connection (list of tables and fields).

  3. Allow the user to pick a table and one or more string fields.

  4. Load the data and keep only rows that contain spelling errors.

  5. Show the spelling errors on a grid and allow the user to fix them.

  6. Save the changes back to the database.

These tasks are described in the following sections.


Select a connection string


The first step required to spell-check a database is selecting the database to use. This is done by specifying a connection string. We use the ADODB and MSDASC libraries to accomplish this. These libraries are provided by Microsoft and can be freely distributed. They provide a user interface for creating and editing OleDb connection strings.


We decided to use the OleDb data provider because it provides great flexibility, allowing connections to Sql Server, Access, and many others.


The code used to get and edit the connection strings is simple (the version below omits error checking code for clarity; please refer to the source for a more complete version):

// prompt user for a connection string
string PromptConnectionString(string connString)
{
// create objects we'll need
var dlinks = new MSDASC.DataLinksClass();
var conn = new ADODB.ConnectionClass();

// show connection picker dialog
object obj = conn;
dlinks.hWnd = (int)Handle;
if (dlinks.PromptEdit(ref obj))
{
connString = conn.ConnectionString;
}

// done
return connString;
}

Connection strings created by the user are added to a ComboBox and saved as part of the application settings, so they can be reused across sessions. The code that performs this task is listed below:

// form loaded: load recently used connection strings
protected override void OnLoad(EventArgs e)
{
var mru = Properties.Settings.Default.RecentConnections;
if (mru != null)
{
foreach (string connString in mru)
{
_cmbConnString.Items.Add(connString);
}
}
base.OnLoad(e);
}

// form closing: save recently used connection strings
protected override void OnFormClosing(FormClosingEventArgs e)
{
var mru = new System.Collections.Specialized.StringCollection();
foreach (string item in _cmbConnString.Items)
{
mru.Add(item);
}
Properties.Settings.Default.RecentConnections = mru;
Properties.Settings.Default.Save();
base.OnFormClosing(e);
}

Because the connection strings are fairly long, the application uses the owner-draw feature of the ComboBox control to trim the connection strings when they are displayed in the drop down. This makes it a lot easier for users to find the connections they are looking for.


The owner-draw code for the ComboBox is as follows:

public Form1())
{
InitializeComponent();

// make combo owner-drawn
var cmb = _cmbConnString.ComboBox;
cmb.DrawMode = DrawMode.OwnerDrawFixed;
cmb.DrawItem = cmb_DrawItem;
}

// trim items in combo using ellipsis (they're very long)
void cmb_DrawItem(object sender, DrawItemEventArgs e)
{
var fmt = new StringFormat();
fmt.LineAlignment = StringAlignment.Center;
fmt.Trimming = StringTrimming.EllipsisPath;

var text = (string)_cmbConnString.Items[e.Index];
text = TrimConnectionString(text);

var brush = (e.State & DrawItemState.Selected) != 0
? SystemBrushes.HighlightText
: SystemBrushes.WindowText;

e.DrawBackground();
e.Graphics.DrawString(text, _cmbConnString.Font, brush, e.Bounds, fmt);
e.DrawFocusRectangle();
}

// trim connection string for display
string[] _keys = new string[] { "Provider", "Initial Catalog", "Data Source" };
string TrimConnectionString(string text)
{
var sb = new StringBuilder();
foreach (var item in text.Split(';'))
{
foreach (var key in _keys)
{
if (item.IndexOf(key, StringComparison.InvariantCultureIgnoreCase) > -1)
{
if (sb.Length > 0)
{
sb.Append("...");
}
sb.Append(item.Split('=')[1].Trim());
}
}
}
return sb.ToString();
}

The code works by splitting the connection string into key/value pairs and then keeping only the parts that help identify the connection. Configuration options are removed for clarity. For example, here is a typical OleDb connection string in all its glory:

Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=AdventureWorks;Data Source=spock\sqlexpress

And here is the trimmed version:

SQLOLEDB.1...AdventureWorks...spock\sqlexpress

The trimmed version shows only the information that is relevant for selecting an entry from a long list.


Get the database schema


Once the user picks a connection string, the next step is to use the connection to obtain a database schema. The schema is a list of the tables, fields, and relations in the database. It describes the structure of the database.


We retrieve the database schema using an auxiliary class called OleSDbSchema. This class extends the system DataSet class with a ConnectionString property. Setting this property populates the OleSDbSchema with tables that have the same structure as the tables in the database (but no data).


The code below shows how the OleSDbSchema class obtains the database schema (this version is simplified for clarity; please refer to the source code for a complete version, including code that retrieves constraints, relations, and stored procedures):

// Gets or sets the connection string used to fills the schema.
public string ConnectionString
{
get { return _connString; }
set
{
if (value != _connString)
{
_connString = value;
GetSchema();
}
}
}
void GetSchema()
{
// initialize this DataSet
this.Reset();

// go get the schema
EnforceConstraints = false;
using (var conn = new OleDbConnection(connString))
{
conn.Open();
GetTables(conn);
conn.Close();
}
}
void GetTables(OleDbConnection conn)
{
// add tables
var dt = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
foreach (DataRow dr in dt.Rows)
{
// get type (table/view)
var type = (string)dr[TABLE_TYPE];
if (type != TABLE && type != VIEW && type != LINK)
{
continue;
}

// create table
var name = (string)dr[TABLE_NAME];
var table = new DataTable(name);
table.ExtendedProperties[TABLE_TYPE] = type;

// save definition in extended properties
foreach (DataColumn col in dt.Columns)
{
table.ExtendedProperties[col.ColumnName] = dr[col];
}

// get table schema and add to collection
try
{
var select = GetSelectStatement(table);
var da = new OleDbDataAdapter(select, conn);
da.FillSchema(table, SchemaType.Mapped);
Tables.Add(table);
}
catch { }
}
}

The code shows how tables and columns are created. Notice that the ExtendedProperties property is used to store additional information about each element. This allows us, for example, to distinguish between regular tables, views, and stored procedures since all these elements are represented by DataTable objects within the OleDbSchema.


The OleDbSchema class is described in detail in a separate article. We only use its basic features in this project.


Select table and fields to spell-check


Once the schema has been obtained, it is used to populate a TreeView control. The TreeView has nodes that represent the tables in the database, and each node has child nodes that represent the fields. Only string fields are included since they are the only ones eligible for spell-checking.


This is the code that populates the TreeView control:

// update table tree to reflect new connection string
void UpdateTableTree()
{
// initialize table tree
TreeNodeCollection nodes = _treeTables.Nodes;
nodes.Clear();

// populate using current schema
_treeTables.BeginUpdate();
foreach (DataTable dt in _schema.Tables)
{
if (_schema.GetTableType(dt) == TableType.Table)
{
// create new node, save table in tag property
var node = new TreeNode(dt.TableName);
node.Tag = dt;

// add string fields to node
foreach (DataColumn col in dt.Columns)
{
if (col.DataType == typeof(string))
{
var ndCol = node.Nodes.Add(col.ColumnName);
ndCol.Tag = col;
}
}

// add new node to the tree
if (node.Nodes.Count > 0)
{
nodes.Add(node);
}
}
}

// done
_treeTables.Sort();
_treeTables.EndUpdate();
}

The TreeView has check boxes next to each table and field. The check boxes require a fair amount of code to work in an intuitive, automatic manner. The code must ensure that only one table is selected, and is must handle the check boxes next to tables and fields. Specifically, checking a table automatically checks all its fields and un-checks all other tables. Un-checking a table will un-check all its fields. And checking a field automatically checks the parent table and un-checks all other tables.


Here is the code that manages the check boxes:

// handle check boxes
void _treeTables_AfterCheck(object sender, TreeViewEventArgs e)
{
if (_updatingTree)
{
return;
}

// start updating...
_updatingTree = true;

// get node that was clicked
var n = e.Node;

// clicked on table node
if (n.Tag is DataTable)
{
// apply check state to all child nodes (fields)
SetCheck(n, n.Checked);

// if this table is checked, uncheck all other tables
if (n.Checked)
{
foreach (TreeNode c in n.TreeView.Nodes)
{
if (c != n)
{
SetCheck(c, false);
}
}
}
}

// clicked on column node
if (n.Tag is DataColumn)
{
// update parent node state
bool check = false;
foreach (TreeNode c in n.Parent.Nodes)
{
check |= c.Checked;
}
n.Parent.Checked = check;

// if this node is checked, uncheck all other tables
if (n.Checked)
{
foreach (TreeNode c in n.TreeView.Nodes)
{
if (c != n.Parent)
{
SetCheck(c, false);
}
}
}
}

// done updating...
_updatingTree = false;
}
void SetCheck(TreeNode n, bool check)
{
n.Checked = check;
foreach (TreeNode c in n.Nodes)
{
c.Checked = check;
}
}

Load and spell-check the selected data


Once the user has selected one or more fields to spell-check, we need to read the data, spell-check each field, and keep only the records that have spelling errors in them. We use an OleDbReader to read the records and a C1SpellChecker to spell-check each one. Records that contain spelling errors are stored in a DataTable to be edited by the user.


Here is the code that reads the data and performs the spell-checking:

// get data table with the rows that contain spelling errors
DataTable GetSpellingErrors(OleDbDataAdapter da, List columns)
{
// create table and adapter
var dt = new DataTable();

// get table schema
da.FillSchema(dt, SchemaType.Mapped);

// make columns not being spell-checked read-only
foreach (DataColumn col in dt.Columns)
{
if (!columns.Contains(col.ColumnName))
{
col.ReadOnly = true;
}
}

// read rows with DataReader
var cmd = da.SelectCommand;
cmd.Connection.Open();
using (var reader = cmd.ExecuteReader())
{
while (reader.Read())
{
// read a row
var dr = dt.NewRow();
foreach (DataColumn col in dt.Columns)
{
var index = col.Ordinal;
dr[index] = reader.GetValue(index);
}

// check for errors
bool hasErrors = false;
foreach (string col in columns)
{
var text = dr[col] as string;
if (!string.IsNullOrEmpty(text))
{
var errors = _spell.CheckText(text);
if (errors.Count > 0)
{
hasErrors = true;
}
}
}

// keep rows that have errors
if (hasErrors)
{
dt.Rows.Add(dr);
}
}
}

// table has no changes
dt.AcceptChanges();

// done
return dt;
}

A simpler but less efficient alternative would be to read all the data into the DataTable first, then spell-check it and remove the rows that don't contain any spelling mistakes. The version listed above is better because rows without errors are discarded immediately and never added to the DataTable.


Show errors and allow editing


Now that we have a DataTable with all the errors, the next step is to show the table to the user so he can review and fix them. We use a C1FlexGrid control to do that. We use the grid's owner-draw feature to mark the typos with the familiar red wavy underlines, and attach a C1SpellChecker to the grid editor so the user gets a nice context menu with spelling suggestions while editing the cells.


Here is the code that binds the grid to the table containing the errors and implements the owner-draw logic:

// bind the grid and enable owner-draw to show the errors
_flex.DataSource = _dtErrors;
_flex.DrawMode = C1.Win.C1FlexGrid.DrawModeEnum.OwnerDraw;
_flex.OwnerDrawCell = _flex_OwnerDrawCell;

// draw wavy read lines under spelling errors
void _flex_OwnerDrawCell(object sender, C1.Win.C1FlexGrid.OwnerDrawCellEventArgs e)
{
// spell-check editable cells (unless we're just measuring)
if (!e.Measuring &&
_flex.Cols[e.Col].AllowEditing &&
e.Row >= _flex.Rows.Fixed)
{
var text = _flex.GetDataDisplay(e.Row, e.Col);
var errors = _spell.CheckText(text);

if (errors.Count > 0)
{
// draw cell as usual
e.Style = _errorStyle;
e.DrawCell();

// build list with error ranges
var ranges = new CharacterRange[errors.Count];
for (int i = 0; i < errors.Count; i )
{
ranges[i] = new CharacterRange(
errors[i].Start, errors[i].Length);
}

// create StringFormat to locate the error ranges
var sf = new StringFormat(e.Style.StringFormat);
try
{
sf.SetMeasurableCharacterRanges(ranges);
}
catch { }

// locate the error ranges
var rc = e.Style.GetTextRectangle(e.Bounds, null);
var rgns = e.Graphics.MeasureCharacterRanges(
text, e.Style.Font, rc, sf);

// draw wavy red underline for each range
foreach (var rgn in rgns)
{
rc = Rectangle.Truncate(rgn.GetBounds(e.Graphics));
for (Point pt = new Point(rc.X, rc.Bottom);
pt.X 2 < rc.Right;
pt.X = 4)
{
e.Graphics.DrawLines(Pens.Red, new Point[]
{
new Point(pt.X, pt.Y),
new Point(pt.X 2, pt.Y - 2),
new Point(pt.X 4, pt.Y)
});
}
}
}
}
}

And here is the code that connects a C1SpellChecker with the grid editor so the user can see the underlines while editing the cell and get a nice context menu with spelling suggestions and commands:

// connect event handler to customize the grid editor
_flex.SetupEditor = _flex_SetupEditor;

// enable spell checker in cell editor
void _flex_SetupEditor(object sender, RowColEventArgs e)
{
var tb = _flex.Editor as TextBox;
if (tb != null)
{
_spell.SetActiveSpellChecking(tb, true);
}
}

The form also contains a button that performs a modal (dialog-based) spell-check over the entire grid. That can be more convenient and comfortable than searching for the errors by inspecting each grid cell visually.


Implementing the modal check option is easy:

// perform modal check on the grid
private void _btnSpell_Click(object sender, EventArgs e)
{
var cols = new List();
foreach (Column col in _flex.Cols)
{
if (col.AllowEditing)
{
cols.Add(col.Name);
}
}
var speller = new FlexGridSpellChecker(_flex, cols.ToArray());
_spell.CheckControl(speller);
}

The code starts by building a list with the columns that are to be spell-checked. Then is uses the list to build a FlexGridSpellChecker object that implements the spell-checking interface required by the C1SpellChecker control on behalf of the grid. The FlexGridSpellChecker implementation is not especially interesting, so we won't reproduce it here. Please refer to the source code if you are interested in the details.


Save changes back to the database


Once the user has reviewed and corrected the errors, the only task remaining is saving the changes back into the database. To do this, we need an OleDbDataAdapter object with select and update commands. The select command is used to retrieve the data, and the update command is used to write changes back into the database.


Here is the code that creates the OleDbDataAdapter and writes the changes back into the database:

// save changes in the data table
void SaveChanges()
{
try
{
// build DataAdapter with select and update commands
var sql = _schema.GetSelectStatement(table);
var da = new OleDbDataAdapter(sql, ConnectionString);
var cmdBuilder = new OleDbCommandBuilder(da);
da.UpdateCommand = cmdBuilder.GetUpdateCommand();

// save the changes
da.Update(_dtErrors);

// mark table as clean
_dtErrors.AcceptChanges();

// disable save changes button until there are more changes
_btnSaveChanges.Enabled = false;
}
catch (Exception x)
{
// something went wrong? inform the user
var dr = MessageBox.Show(
this,
string.Format(Properties.Resources.FailedToSave, x.Message),
Application.ProductName,
MessageBoxButtons.OK,
MessageBoxIcon.Error);
}
}

This concludes the last part of the application. We walked over the steps required to select the database, spell-check its content, display the errors, allow users to fix them, and save the corrections back to the database. It was a long and fun journey!


Conclusion


We developed the C1DBSpellChecker application with two objectives in mind:



  1. To create a useful tool that fills a real need. Many sites and applications are driven by databases that contain spelling errors, and fixing them without a proper tool is difficult.

  2. To show some useful techniques required to implement real applications in .NET. The techniques covered include database management, owner-draw controls, application settings and resources, and more.

We hope both objectives have been achieved and you find them useful. If you have requests or suggestions for improving this document or the application, please post on our site. Thanks in advance.


Download the full project with source code: C1DBSpellChecker.zip