hash_table_list

Overview of Search with C Source Code

1. Introduction

A common task in programming is to search for some record in memory. This task is called data search. In the classical formulation of the problem, there are entries in computer memory for example “apple”, “pear”, “plum”, etc., which correspond to some keys, i.e. numbers. The task is to find the desired record with a specific key. I will simplify this task and assume that there is an array of numbers in which we need to find a certain number. To help myself, I will take in my opinion the best monograph on programming [1]. It contains many different search methods. I’ll look at a few of them here. The source codes of C programs will be attached to each of the search algorithms I am considering, and you will be able to try them out on your computer yourself and choose the “best” algorithm for yourself.
What is a good or bad search program? Unfortunately, there is no definite answer here. It depends on the size of the data array, the features of the machine system command (assembler) and the speed of the computer.
So let’s move on to considering the most interesting search algorithms from my point of view.

2. Sequential Search Method

This is the simplest method based on sequential search of data and search for the desired record. Let there be an array of N+1 numbers: A[0], A[1], A[2],…, A[j], A[j+1], …, A[N-1], Pattern[N]. The last element of the array is a number (pattern), which we will look for among the first N numbers of the array. This formulation of the problem makes it possible to make a faster sequential algorithm with fewer checks per cycle. The program returns the array index: 0 … (N-1) of the first found number equal to Pattern[N]. If the desired element is missing in the array, then the program returns an invalid index INVALID_INDEX = 0xFFFFu
C Code:

/* Defines */
#define INVALID_INDEX 0xFFFFu /* Invalid Index */

/*
Sequential Search
Input
sint16* ptrArray – pointer on the working array
and searched element is ptrArray[arraySize-1]
uint16 arraySize – array length
Return
uint16 – index in array
INVALID_INDEX – invalid index (the element is not found
*/
uint16 SequentialSearchMethod(sint16* ptrArray, uint16 arraySize)
{

sint16 search_value = ptrArray[arraySize-1];
sint16* ptrEndArray = ptrArray + arraySize — 1;
sint16 currentIndex = -((sint16)(arraySize — 1));
uint16 retIndex;

while(*(ptrEndArray+currentIndex) != search_value)
{

currentIndex++;

}

if(currentIndex < 0)
{

retIndex = (uint16)(currentIndex + arraySize — 1);

}
else
{

retIndex = INVALID_INDEX;

}

return retIndex;

}

Notes:
  • The complexity of the algorithm is O(1) at best case, and O(N) at worst case
  • This algorithm is used only for relatively small arrays
  • For large arrays this method takes a lot of time and is inefficient

3. Binary Search Algorithm

Let’s have a sorted array: A[0] <A[1] <A[2] <…, A[j] < A[j+1] <… < A[N-1]. See the article “Overview of Sorts with C Source Code” for sorting algorithms. The search algorithm is based on the fact that the middle element of the array is compared with the desired value (searchValue), while determining in which half of the array the desired value may be located. Next the middle element is taken in the selected half again and apply this method further. The complexity of the algorithm is determined by the logarithm with base 2:

S_formula1

The program returns the array index: 0 … (N-1) of the first found number equal to searchValue. If the desired element is missing in the array, then the program returns an invalid index INVALID_INDEX = 0xFFFFu
Let’s take for example an array of 20 odd numbers:
table_sorted_array_20_odd_numbers
Let’s look for the number searchValue = 23 in this array, then:
Step 1
beginNmb = 1, endNmb = 20
middleNmb = (beginNmb + endNmb)/2 =(1+20)/2 = 10 => C_middleIndex = middleNmb -1 = 10 – 1 = 9 => A[9]=19
(serachValue=23) > (A[9]=19) => beginNmb = middleNmb + 1 = 10 + 1 = 11
Step 2
beginNmb = 11, endNmb = 20
middleNmb = (beginNmb + endNmb)/2 =(11+20)/2 = 15 => C_middleIndex = middleNmb -1 = 15 -1 = 14 => A[14]=29
(serachValue=23) < (A[14]=29) => endNmb = middleNmb – 1 = 15 – 1 = 14
Step 3
beginNmb = 11, endNmb = 14
middleNmb = (beginNmb + endNmb)/2 =(11+14)/2 = 12 => C_middleIndex = middleNmb -1 = 12 -1 = 11 => A[11]=23
(serachValue=23) = (A[11]=23) => searchValue is found
C Code:

/* Defines */
#define INVALID_INDEX 0xFFFFu /* Invalid Index */

/*
Binary Search
Input
sint16* ptrArray – pointer on the working array
uint16 arraySize – array length
sint16 searchValue – search element
Return
uint16 – index in array
INVALID_INDEX – invalid index (the element is not found)
*/
uint16 BinarySearchMethod(sint16* ptrArray, uint16 arraySize, sint16 searchValue)
{

uint16 beginNmb = 1;
uint16 endNmb = arraySize;
uint16 middleNmb;
uint8 exitFlag = FALSE;
uint16 retIndex = INVALID_INDEX;

if(arraySize != 0)
{

do
{

middleNmb = (beginNmb + endNmb)/2; /* (beginNmb + endNmb) >> 1u */
if(beginNmb > middleNmb)
{

/* The element is not found */
retIndex = INVALID_INDEX;
exitFlag = TRUE;

}
else
{

if(searchValue >= ptrArray[middleNmb-1])
{

if(searchValue == ptrArray[middleNmb-1])
{

/* The element is found */
retIndex = middleNmb – 1;
exitFlag = TRUE;

}
else
{

beginNmb = middleNmb + 1;

}

}
else
{

endNmb = middleNmb – 1;

}

}

} while(FALSE == exitFlag);

}
else
{ /* arraySize = 0 */

retIndex = INVALID_INDEX;

}
return retIndex;

}

Note
In addition to binary search the Fibonaccian search is also used. See [1] for more information

4. Binary Search Tree

Let’s imagine the binary search from the previous section (Table 3-1) as a binary tree. See the Figure 4-1:

binary_search_tree

Figure 4-1: Binary Search Tree for an Array of size N=20

The search nodes with the corresponding position number are shown in circles. The arrows to the right of the node correspond to searchValue < A[k-1] and the transition to the left subtree corresponds to searchValue > A[k-1].

circle_node_with_text

The square nodes correspond to an unsuccessful search and the searchValue is missing from the array.

square_node_with_text

If you define a structure for a tree node:

typedef struct tree_node_data
{

/* Left/Right Tree node pointers */
struct tree_node_data * left_link;
struct tree_node_data * right_link;
/* Value */
sint16 value;

} tree_node_data_s;

Then the necessary tree can be built based on an array of such structures. For it the C code is used, which searches for the desired entry in the tree and if it does not find it, then builds a new node for the undiscovered entry. This way a dynamic tree is got that can be expanded. In this example it is not possible to delete an entry. See [1] if you need it.
C Code:

/*
Tree Search
Input
node_data_s* ptrRoot – pointer on the root of the tree
uint16* ptr_freeIndex – pointer on the node array
sint16 searchValue – search element
Return
tree_search_results_e
node_is_found, node_is_new, node_is_over – search result
*/
tree_search_results_e TreeSearchMethod(node_data_s* ptrRoot, uint16* ptr_freeIndex, sint16 searchValue)
{

tree_search_results_e retResult = node_is_over;
node_data_s* ptrCurrentNode = ptrRoot;
uint8 exitFlag = FALSE;

do
{

if(searchValue > ptrCurrentNode->value)
{

if(NULL != ptrCurrentNode->right_link)
{

ptrCurrentNode = ptrCurrentNode->right_link;

}
else
{ /* NULL == ptrCurrentNode->right_link */

if(TREE_MAX_SIZE > *ptr_freeIndex)
{ /* Exit with New Node */

ptrCurrentNode->right_link = &ptrRoot[*ptr_freeIndex];
ptrCurrentNode = ptrCurrentNode->right_link;
ptrCurrentNode->left_link = NULL;
ptrCurrentNode->right_link = NULL;
ptrCurrentNode->value = searchValue;
*ptr_freeIndex += 1;
retResult = node_is_new;
exitFlag = TRUE;

}
else
{ /* Exit with Error */

retResult = node_is_over;
exitFlag = TRUE;

}

}

}
else
{ /* searchValue <= ptrCurrentNode->value */

if(searchValue == ptrCurrentNode->value)
{ /* Exit with found state */

retResult = node_is_found;
exitFlag = TRUE;

}
else
{ /* searchValue < ptrCurrentNode->value */

if(NULL != ptrCurrentNode->left_link)
{

ptrCurrentNode = ptrCurrentNode->left_link;

}
else
{ /* NULL == ptrCurrentNode->left_link */

if(TREE_MAX_SIZE > *ptr_freeIndex)
{ /* Exit with New Node */

ptrCurrentNode->left_link = &ptrRoot[*ptr_freeIndex];
ptrCurrentNode = ptrCurrentNode->left_link;
ptrCurrentNode->left_link = NULL;
ptrCurrentNode->right_link = NULL;
ptrCurrentNode->value = searchValue;
*ptr_freeIndex += 1;
retResult = node_is_new;
exitFlag = TRUE;

}
else
{ /* Exit with Error */

retResult = node_is_over;
exitFlag = TRUE;

}

}

}

}

} while(FALSE == exitFlag);

return retResult;

}

Notes:
  • The complexity of the algorithm is determined by the logarithm with base 2:
    S_formula1
  • To build a binary tree, the MakeBinaryTree() function is used. See the Search.c/h
  • Sometimes the resulting trees turn out to be degenerate (degeneracy unbalanced tree). To avoid it the AVL algorithm is used. The AVL method (from the two persons: Adelson-Velsky and Landis) prepares a self balancing binary search tree. See [1] for more information

5. Hash Table with the Separate Chaining

This method consists in determining the set of lists that the data is being searched for. The desired list for a given array value is determined using the hash function. Often the hash function is used to calculate the remainder of the division by some prime number P:
hashFunction(Value) = Value % P
The result lies in the range: 0 … (P-1). Thus some array can be defined as P lists. Each list will include values with the same remainder from division by a prime number P.
Suppose there is an array of data: 15, 5, 3, 4, 16, 21, 10
Let’s choose for example: P = 11. Then hashFunction(Value) = Value % 11. See the Table 6-1:

table_hash_function

Then for our array we have the following hash table:

hash_table_list

Figure 5-1: Example: Hash Table for the array: 15, 5, 3, 4, 16, 21, 10 with the hashFunction = Value % 11

Arrays can be dynamic. The new array element will be added to the corresponding hash function list. If the values of the array are random, then the lists will be approximately the same in length and the search will be carried out P times faster compared to one common list.
If you define a structure for a list node:

typedef struct list_node_data
{

/* List node pointer */
struct list_node_data * link;
/* Value */
sint16 value;

} list_node_data_s;

Then the necessary list can be built based on an array of such structures. For it the C code is used, which searches for the desired value in the list and if it does not find it then builds a new list item for the undiscovered value. This way a dynamic hash table is got that can be expanded. In this example it is not possible to remove nodes from the lists. See [1] if you need it.
C Code:

/*
Hash Table Initialization
Input
list_node_data_s** ptrHashTable – pointer on the Hash Table
Return
void
*/
void HashTableInit(list_node_data_s** ptrHashTable)
{

uint16 loopIndex;

for(loopIndex = 0; loopIndex < HASH_FUNCTION_MODULO; loopIndex++)
{

ptrHashTable[loopIndex] = NULL;

}

}

/*
Hash Table Search Method
Input
list_node_data_s** ptrHashTable – pointer on the Hash Table
list_node_data_s* ptrListArray – pointer on the list node array
uint16* ptr_freeIndex – pointer on the node array
sint16 searchValue – search element
Return
list_search_results_e
list_node_is_found, list_node_is_new, list_node_is_over – search result
*/
list_search_results_e HashTableSearchMethod(list_node_data_s** ptrHashTable, list_node_data_s* ptrListArray, uint16* ptr_freeIndex, sint16 searchValue)
{

list_search_results_e retResult = list_node_is_over;
uint8 exitFlag = FALSE;
list_node_data_s* ptrCurrentAddress;
sint16 hash_value;

/* Hash Function: searchValue % HASH_FUNCTION_MODULO and Initializate the Current Address */
hash_value = searchValue%HASH_FUNCTION_MODULO;
if(hash_value < 0)
{

hash_value += HASH_FUNCTION_MODULO;

}
ptrCurrentAddress = ptrHashTable[hash_value];
if(NULL == ptrCurrentAddress)
{

if(LIST_MAX_SIZE > *ptr_freeIndex)
{

ptrListArray[*ptr_freeIndex].link = NULL;
ptrListArray[*ptr_freeIndex].value = searchValue;
ptrHashTable[hash_value] = &ptrListArray[*ptr_freeIndex];
*ptr_freeIndex += 1;
retResult = list_node_is_new;

}
else
{

retResult = list_node_is_over;

}

}
else
{

do
{

if(searchValue == ptrCurrentAddress->value)
{

retResult = list_node_is_found;
exitFlag = TRUE;

}
else
{

if(NULL == ptrCurrentAddress->link)
{

if(LIST_MAX_SIZE > *ptr_freeIndex)
{

ptrListArray[*ptr_freeIndex].link = NULL;
ptrListArray[*ptr_freeIndex].value = searchValue;
ptrCurrentAddress->link = &ptrListArray[*ptr_freeIndex];
*ptr_freeIndex += 1;
retResult = list_node_is_new;

}
else
{

retResult = list_node_is_over;

}
exitFlag = TRUE;

}
else
{

ptrCurrentAddress = ptrCurrentAddress->link;

}

}

} while(FALSE == exitFlag);

}

return retResult;

}

Notes:
  • The minimum complexity of the algorithm is O(1), the worst case is O(N), where N is the size of the array
  • For additional search acceleration the binary trees can be used described above instead of lists

6. Resume

Take the programs of various search methods from the site, try it and determine for yourself what is best for your application. Measure the time it takes to search for arrays of different sizes and make your choice.
Search.c/h
Functions:

uint16 SequentialSearchMethod(sint16*, uint16); – Sequential Search

uint16 BinarySearchMethod(sint16*, uint16, sint16); – Binary Search

tree_search_results_e TreeSearchMethod(tree_node_data_s*, uint16*, sint16); – Tree Search

void MakeBinaryTree(tree_node_data_s*, uint16*, sint16*, uint16); – Make the Binary Tree

void HashTableInit(list_node_data_s**); – Hash Table Initialization

list_search_results_e HashTableSearchMethod(list_node_data_s**, list_node_data_s*, uint16*, sint16); – Hash Table Method

7. Download the Search.c/h

You can download the files:
Search.c/h
with the button:

8. Literature / References

[1] Donald E. Knuth “The Art of Computer Programming“, Volume 3, ISBN 0-201-03801-3, Addison-Wesley, 1968