Maurice Calhoun - Datasets Everywhere

Created: June 30th 2015 @ 8:37 AM

What are Datasets, per Wikipedia


A data set (or datasets) is a collection of data.


Most commonly a datasets corresponds to the contents of a single database table, or a single statistical data matrix, where every column of the table represents a particular variable, and each row corresponds to a given member of the data set in question. The data set lists values for each of the variables, such as height and weight of an object, for each member of the data set. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows.

To me a datasets is an array of items... an array of items. MYSQL, MSSQL, Oracle, PostgreSQL, SQLite, Mongo, etc... all contains datasets or tables. To access these datasets you must write database query in a language call SQL(Structured Query Language). After running or execute the query(SQL), our results is sometime a set of data. Awesome... so what's the problem? Remember a datasets is an array of items not just database objects.

What if our datasets does not come from a database, and when we receive it, it comes in the form of an array, serialize data, csv, excel, xml, etc. How do we query these items? I know most of the time the before stated format are the results, but what if they were the origins. How do we query it, how does that query language looks? My programming language of choice is PHP, and in that language we do query arrays, we parse them. Parsing arrays can be messy and problematic. Here an example


	$company             = array(
	    array(
	        'id'         => 1,
	        'first_name' => 'John',
	        'last_name'  => 'Doe',
	        'salary'     => 50000,
	        'sex'        => 'Male',
	        'role_id'    => 1

	    ),
	    array(
	        'id'         => 2,
	        'first_name' => 'Sally',
	        'last_name'  => 'Smith',
	        'salary'     => 65000,
	        'sex'        => 'Female',
	        'role_id'    => 1
	    ),
	    array(
	        'id'         => 3,
	        'first_name' => 'Jane',
	        'last_name'  => 'Jones',
	        'salary'     => 90000,
	        'sex'        => 'Female',
	        'role_id'    => 2

	    ),
	    array(
	        'id'         => 4,
	        'first name' => 'Peter',
	        'last_name'  => 'Doe',
	        'salary'     => 100000,
	        'sex'        => 'Male',
	        'role_id'    => 3
	    )
	);
    

If we want to get all the name, we can do this


	$results = array_column($company, 'name');
    

or this


	$results = array_map(function($person){
	    return $person['name'];
	}, $company);
    

That is not bad, but what if we need all the records where the last name is Doe and their salary is greater 45,000. It may look like this...


	$results = array_filter($company, function($person){
	        return ($person['last_name'] == 'Doe' &person['salary'] > 45000);
	});
    

That works, but it is inconstinct. There are so many what if's. The above are parsing function, not querying statement. But $company is a dataset, and if it was in a database, the previous requests would have been consistent and concise. That is because we would have written a query statement (SQL).


	Select * From company Where last_name = 'Doe' and salary > 50000;
    

Database are not special, and its datasets are not more important then any other kind. Datasets are datasets no matter where they come from. The only important datasets are the ones you are currently using. But I can not query them because the language I use does not see my datasets as database tables. This is what I would like to do


	$db = new DataSetDB();
	$db->load_database($company, 'company');
	$executives = $db->query("Select * From person Where last_name = 'Doe' and salary > 50000;");
    

This would be awesome, because it would convert my arrays, and make them ask as if it was an actual database. So I can now query it. But remember a datasets is a dataset, that means I can also do this


	$db->load_database($executives, 'executives');
	$results = $db->query("Select * From executives Where sex = 'Female' ");
    

That may seem silly, but it expresses the idea the even the results of a datasets can itself be a dataset.

What about, who is the high paid employee


	$db = new DataSetDB();
	$db->load_database($company, 'company');
	$highPaid = $db->query("Select *, MAX(salary) as salary From company;");
    

Thats great, but that is if we only had one dataset. What is we had many datasets, we add another datasets and do this


	$roles = array(
	    array(
	        'id'   => 1,
	        'name' => 'Employer'
	    ),
	    array(
	        'id'   => 2,
	        'name' => 'Manager'
	    ),
	    array(
	        'id'   => 3,
	        'name' => 'Director'
	    )                    
	);
    

Now I need to get all the Managers


	$db = new DataSetDB();
	$db->load_database($company, 'company');
	$db->load_database($roles, 'roles');
	$managers = $db->query("Select * From company Join roles On company.role_id = roles.id Where roles.name = 'Manager';");

I have tried some methods to get this to work, like writing regular expressions to parse a query statement. That's crazy, have you seen Mysql SELECT Syntax , that would be a massive undertaking. Why recreate the wheel, this syntax already exists.

After exhausting that possibility, I started to look back at the idea of a database. But here are my reservations about that,

  • I would have to create a database
  • I would have to also create the tables and columns
  • I would also have to manager that database

That is a lot to do, for just to query a dataset. Or is it? The part that I have the biggest concern about was the management of the database. Why? We already have the dataset, we don't need to store, we just need to query it.

I have realized that SQLite allows you to use a database file or your can create it in memory. In memory.... That's what I'm talking about. It all starts with


	$pdo = new PDO('sqlite:memory:');
    

Now I can create my tables and columns on the fly, and I can also import my data. I can run query on my data in a clean and concise way. The best part about this solution, is that there is no need for database management. The data has already been management somewhere else, I just needed to query it.

The datasets type does not make a difference. All we need to do is convert whatever we have to an array, and import it. The bonus is that the database would only be around for as long as we need it.

Ruby boast that everything is an object.

In Ruby, everything is an object. Every bit of information and code can be given their own properties and actions.

I boast that you can now in PHP, you can query any object that you can convert into an array. I call it datasets everywhere.

<?php


class DataSetDB{

    protected $databases = array();
    protected $datatypes = array(
        'integer' => 'INTEGER',
        'numeric' => 'INTEGER',
        'string'  => 'TEXT',
        'boolean' => 'INTEGER',
        'NULL'    => 'TEXT'
    );

    public function __construct($filenames = array()){

        if(is_array($filenames) && !empty($filenames))
        {
            $this->load_databases($filenames);
        }

    }
	
    private function get_type($var){ 
		
        if (is_array($var)) return "array"; 
        if (is_bool($var)) return "boolean"; 
        if (is_callable($var)) return "function reference"; 
        if (is_float($var)) return "float"; 
        if (is_int($var)) return "integer"; 
        if (is_null($var)) return "NULL"; 
        if (is_numeric($var)) return "numeric"; 
        if (is_object($var)) return "object"; 
        if (is_resource($var)) return "resource"; 
        if (is_string($var)) return "string"; 
        return "unknown type"; 
		
    }

    public function load_database($filename = NULL , $name = ''){

        if(is_null($filename))
        {
            throw new PDOException('Where is the database file');
        }    

        if(empty(trim($name)))
        {
            throw new PDOException('What is the table name?');
        }    

        $this->databases[$name] = $this->convert_to_array($filename);    
    }

    public function load_databases($filenames = array()){

        if(!is_array($filenames))
        {
            throw new PDOException('The Data must be an array!!!');
        }    

        foreach($filenames as $filename){
            $ext = pathinfo($filename, PATHINFO_EXTENSION);
            $name = basename( $filename , "." . $ext);

            $this->load_database($filename, $name);
        }
    }

    private function convert_to_array($filename){

        if(is_array($filename)){
            return $filename;
        }

        if(!file_exists($filename)){
            throw new PDOException('That file ('. $filename .') does not exist');
        }

        // Create your converter parse
        return array();

    }    

    public function loaded_databases(){
		
        return array_keys($this->databases);
		
    }

    public function is_database_loaded($name){
		
        $results = FALSE;

        if(array_key_exists($name, $this->databases))
        {
            $results = TRUE;
        }

        return $results;
    }    


    public function query($query = ''){
		
            try{

                if(empty(trim($query)))
                {
                    throw new PDOException('This is crazy it is called array_sql_search ... hello');
                }    

                if(!count($this->databases)){
                    throw new PDOException('There are no databases loaded');
                }                        

                $db = new PDO('sqlite::memory:');


                foreach($this->databases as $table => $table_data)
                {    
					
                    $fields = array_map(function ($field){
                        return strtolower(preg_replace("/[^A-Z0-9_]/i", '_', $field));
                    }, array_keys(current($table_data)));

                    $fields_type = array_map(function ($field){
                        return $this->get_type($field);
                    }, array_values(current($table_data)));

                    $fields = array_combine($fields, $fields_type);                    

                    $create_fields_str = join(', ', array_map(function ($field, $value){
                            $value = $this->datatypes[$value];
                            return "$field $value NULL";
                    }, array_keys($fields), array_values($fields)));


                    $db->beginTransaction();

                    $create_table_sql = "CREATE TABLE IF NOT EXISTS $table ($create_fields_str)";
                    $db->exec($create_table_sql);

                    $insert_fields_str = join(', ', array_keys($fields));
                    $insert_values_str = join(', ', array_fill(0, count($fields),  '?'));
                    $insert_sql = "INSERT INTO $table ($insert_fields_str) VALUES ($insert_values_str)";
                    $insert_sth = $db->prepare($insert_sql);

                    $inserted_rows = 0;

                    foreach($table_data as $value){
                        $insert_sth->execute(array_values($value));
                        $inserted_rows++;        
                    }

                    $db->commit();                    
                }    

                $resource = $db->query($query);    

                return $resource->fetchAll(PDO::FETCH_ASSOC);

            }
            catch(PDOException $exception)
            {
                return $exception->getMessage();
            }
    }

}

?>

Copyright © 2015 Maurice Calhoun. All rights reserved.