DataConnector | ComponentOne
ADO.NET provider for CSV / Schema Discovery / RowsToScan
In This Topic
    RowsToScan
    In This Topic

    RowsToScan is a connection string property. It is used to specify the number of rows representing the virtual database from local files that should be scanned. This scanning process helps to accurately assign data types based on the section of the table being read.

     

    When creating the connection string, you can define this property as a string. For example:

    C#
    Copy Code
    connectionString = @'....detectionschemetype=RowScan;rowstoscan=2000;...';

    or using the connection string builder of a specific Provider:

    C#
    Copy Code
    C1[Provider]ConnectionStringBuilder builder = new C1[Provider]ConnectionStringBuilder();
    builder.DetectionScheme = DetectionSchemeType.RowScan;
    builder.RowsToScan = 200;

     

    When 'DetectionScheme' is not specified, 'RowsToScan' will be ignored, and the data type for each column will default to 'string'. The default value for 'RowsToScan' is set at 500 rows.

    The allowed values for 'RowsToScan' are as follows:

     

    Every value lower than -1 will throw an error of type C1DataConnectorProviderException with the appropriate message: “Value of RowsToScan cannot be less than -1.”

     

    How do we determine what data type suites the best for a particular column

    Data Types we support

    1. bool : true, false 
    1. TimeSpan
      1. Represents a duration or time interval.
      2. Used to measure the elapsed time between two DateTime instances.
      3. Doesn't include information about a specific point in time.
      4. ex: 01:30:00
    2.  DateTime:
      1. Represents a point in time, typically expressed as a combination of date and time.
      2. It includes both a date and a time component.
      3. Doesn't account for time zone information.
      4. ex: 11/27/2023 11:56:34 AM
    3. DateTime Offset
      1. Represents a point in time with an offset from UTC (Coordinated Universal Time).
      2. Includes both a date and time component, similar to DateTime, but also has information about the time zone offset.
      3. Useful when dealing with scenarios where time zone information is important.
      4. It contains “+“ or “T“
    1.  AssignedNumbers
      1. SByte: -128 to 127
      2. Int16: -32,768 to 32,767
      3. Int32: -2,147,483,648 to 2,147,483,647
      4. Int64: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
    2. UnassignedNumbers
      1. Byte: 0 to 255
      2. UInt16: 0 to 65,535
      3. UInt32: 0 to 4,294,967,295
      4. UInt64: 0 to 18,446,744,073,709,551,615
    3. FloatingNumbers
      1. Single(float): ±1.5 x 10-45 to ±3.4 x 1038
      2. Double: ±5.0 × 10-324 to ±1.7 × 10308
      3. Decimal: ±1.0 x 10-28 to ±7.9228 x 1028

     Choosing the appropriate type

    1. Normally if a column contains only one data type that is the data type of the whole column.
    2. If a column contains a String, then the whole column is treated as a column that holds strings.
    3. If a column contains two different data types, then the whole column will be treated as a string.
    4. Exception when column holds numeric values but different types of numbers:

      Case 1: Floating numbers and AssignedNumbers or UnassignedNumbers  then the whole column will be treated as it contains only floating numbers.

      Case 2: AssignedNumbers and UnassignedNumbers  : the best types that will result in the lowest memory occupied.