Natural Language Interface for Relational Databases - What Is It and How Can It Be Built?


Database management systems are systems used for accessing and manipulating information. The data is manipulated using a set of keywords that follow syntax rules. To perform operations on the database, a user is required to learn the Structured Query Language (SQL). Hence, a user who does not know SQL cannot directly access information in the database.

Natural Language Interface to Database (NLIDB) - What is it?

An NLIDB is a proposed solution to the problem of accessing information in a database using a natural language such as English, without requiring any technical knowledge of database languages like SQL. It is a tool that understands a user's query in natural language and converts it into an appropriate SQL query, so that the user can retrieve the required information from the database.
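
To make the flow concrete, here is a minimal Python sketch of the overall pipeline (question, then SQL, then rows), assuming an in-memory SQLite database with a hypothetical `countries` table. The names `nlidb_answer`, `lookup_translator` and `KNOWN_QUESTIONS` are made up for illustration; the toy translator only matches whole, known questions, and a translator built with any of the approaches below could be plugged in its place.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# A translator turns a natural language question into an SQL string,
# or returns None when it does not understand the question.
Translator = Callable[[str], Optional[str]]


def nlidb_answer(question: str, translate: Translator,
                 conn: sqlite3.Connection) -> Optional[List[Tuple]]:
    """Understand the question, convert it to SQL, and fetch the rows."""
    sql = translate(question)
    if sql is None:
        return None  # the interface could not understand the question
    return conn.execute(sql).fetchall()


# Hypothetical toy translator: a fixed table of whole questions and their SQL.
KNOWN_QUESTIONS = {
    "how many countries are there?": "SELECT COUNT(*) FROM countries",
    "list all capitals": "SELECT capital FROM countries ORDER BY capital",
}


def lookup_translator(question: str) -> Optional[str]:
    return KNOWN_QUESTIONS.get(question.strip().lower())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE countries (country TEXT, capital TEXT)")
    conn.executemany("INSERT INTO countries VALUES (?, ?)",
                     [("France", "Paris"), ("Japan", "Tokyo")])
    print(nlidb_answer("How many countries are there?", lookup_translator, conn))
    # prints [(2,)]
```

Keeping the translator as a separate callable means the database side stays the same whichever translation approach is used.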

Various approaches to building an NLIDB.

Symbolic or rule-based approach.

In this approach, the translation from natural language to SQL is done using a set of rules crafted and curated by humans. If you do not have good or sufficient training data, statistical techniques cannot be applied, and one has to use a rule-based approach.
A rule in such a system can look like this:
Pattern... "capital"... <country>
Action: Report CAPITAL of row where COUNTRY=<country>
Meaning: if a user's request contains the word "capital" followed by a country name, the system should return the capital of <country>.
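
The capital rule above can be sketched in Python as a regular expression (the pattern) plus a parameterized SQL template (the action). This is a minimal illustration, assuming a hypothetical SQLite table `countries(country, capital)`; the names `apply_capital_rule`, `answer` and `CAPITAL_SQL` are made up for this example. Country names are read from the table itself, so the rule only fires for countries the database knows.

```python
import re
import sqlite3
from typing import List, Optional, Tuple

# Action: report CAPITAL of the row where COUNTRY = <country>.
CAPITAL_SQL = "SELECT capital FROM countries WHERE country = ?"


def apply_capital_rule(question: str,
                       countries: List[str]) -> Optional[Tuple[str, Tuple[str]]]:
    """Pattern: the word "capital" followed (anywhere later) by a known country."""
    match = re.search(r"\bcapital\b(.*)", question, re.IGNORECASE)
    if match is None:
        return None
    rest = match.group(1)  # text that comes after "capital"
    for country in countries:
        if re.search(r"\b" + re.escape(country) + r"\b", rest, re.IGNORECASE):
            # Parameterized SQL keeps user text out of the query string.
            return CAPITAL_SQL, (country,)
    return None


def answer(question: str, conn: sqlite3.Connection) -> Optional[str]:
    countries = [row[0] for row in conn.execute("SELECT country FROM countries")]
    rule = apply_capital_rule(question, countries)
    if rule is None:
        return None  # no rule matched; a real system would try its other rules
    sql, params = rule
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE countries (country TEXT, capital TEXT)")
    conn.executemany("INSERT INTO countries VALUES (?, ?)",
                     [("France", "Paris"), ("Japan", "Tokyo")])
    print(answer("What is the capital of Japan?", conn))  # prints Tokyo
```

A real rule-based NLIDB would hold many such pattern/action pairs, and every new kind of question needs another hand-written rule like this one.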

The rule-based approach has several disadvantages compared to the other approaches; chiefly, it requires time, money and domain experts to write the set of rules and embed them in the system.

Statistical or corpus-based approach.

This is a machine learning paradigm that uses statistical inference to learn rules, like those in the approach above, automatically, by analyzing a large corpus (a structured set of texts) of real-world examples of natural language to SQL translation. Algorithms extract "features" from the input and use them to find relations and generalize from the given corpus. Compared to the rule-based approach, this analysis does not require as much time or a domain expert, but it does require data and considerable research into the supervised machine learning technique being applied.
Various statistical techniques, such as N-grams, Hidden Markov Models, probabilistic context-free grammars and decision trees, are employed as major Natural Language Processing methods.
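
As a toy illustration of the corpus-based idea, the Python sketch below extracts unigram and bigram (N-gram) features and uses a multinomial Naive Bayes classifier, a simple statistical technique chosen here for brevity, to learn from a tiny corpus which SQL template a question maps to. The corpus, the two templates and the `countries` table are hypothetical; filling in the actual country value (slot filling) is left out.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

CAPITAL_SQL = "SELECT capital FROM countries WHERE country = ?"
POPULATION_SQL = "SELECT population FROM countries WHERE country = ?"


def ngram_features(question: str, n_max: int = 2) -> List[str]:
    """Unigrams plus n-grams up to n_max, e.g. 'capital of' as a bigram."""
    tokens = question.lower().replace("?", "").split()
    features = list(tokens)
    for n in range(2, n_max + 1):
        features += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return features


class NaiveBayesTemplateSelector:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, corpus: List[Tuple[str, str]]):
        self.class_counts: Counter = Counter()
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for question, template in corpus:
            features = ngram_features(question)
            self.class_counts[template] += 1
            self.feature_counts[template].update(features)
            self.vocab.update(features)
        self.total = sum(self.class_counts.values())

    def select(self, question: str) -> str:
        features = ngram_features(question)
        best, best_score = None, -math.inf
        for template, count in self.class_counts.items():
            # log P(template) + sum of log P(feature | template)
            score = math.log(count / self.total)
            denom = sum(self.feature_counts[template].values()) + len(self.vocab)
            for f in features:
                score += math.log((self.feature_counts[template][f] + 1) / denom)
            if score > best_score:
                best, best_score = template, score
        return best


# Hypothetical training corpus of (question, SQL template) pairs.
CORPUS = [
    ("what is the capital of france", CAPITAL_SQL),
    ("capital city of japan", CAPITAL_SQL),
    ("which city is the capital of peru", CAPITAL_SQL),
    ("what is the population of france", POPULATION_SQL),
    ("how many people live in japan", POPULATION_SQL),
    ("population of peru", POPULATION_SQL),
]

if __name__ == "__main__":
    selector = NaiveBayesTemplateSelector(CORPUS)
    print(selector.select("Tell me the capital of Italy?"))
```

Nobody wrote a rule here: the association between words like "capital" or "people" and each template comes only from counts over the corpus, so adding examples, rather than rules, is how the system is extended.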

Connectionist approach or using Neural Networks.

Neural Networks are computational models based on a large collection of simple neural units (artificial neurons), loosely analogous to the observed behavior of a biological brain.
A special class of Neural Networks called Recurrent Neural Networks is widely used in Natural Language tasks. The feedback loops in recurrent networks make them well suited to temporal (sequential) data tasks such as natural language understanding. This approach is based on distributed representations corresponding to statistical regularities in language.
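
The feedback loop can be illustrated with a plain-Python sketch of an Elman-style recurrent layer, h_t = tanh(W_xh x_t + W_hh h_(t-1) + b_h). The weights are random and untrained, and the one-hot word encoding and three-word vocabulary are assumptions for illustration only. Because the previous hidden state is fed back at every step, the same words in a different order lead to a different final state, something a plain bag-of-words model cannot distinguish.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def one_hot(word: str, vocab: List[str]) -> Vector:
    return [1.0 if w == word else 0.0 for w in vocab]


def rnn_encode(inputs: List[Vector], w_xh: Matrix, w_hh: Matrix,
               b_h: Vector) -> List[Vector]:
    """Return the hidden state after each input vector."""
    h = [0.0] * len(b_h)  # h_0: empty memory before the first word
    states = []
    for x in inputs:
        # The W_hh h term is the feedback loop: the previous state shapes the new one.
        pre = [a + b + c for a, b, c in zip(matvec(w_xh, x), matvec(w_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


if __name__ == "__main__":
    vocab = ["capital", "of", "france"]
    rng = random.Random(0)
    hidden = 4
    w_xh = random_matrix(hidden, len(vocab), rng)
    w_hh = random_matrix(hidden, hidden, rng)
    b_h = [0.0] * hidden
    forward = rnn_encode([one_hot(w, vocab) for w in ["capital", "of", "france"]], w_xh, w_hh, b_h)
    backward = rnn_encode([one_hot(w, vocab) for w in ["france", "of", "capital"]], w_xh, w_hh, b_h)
    print(forward[-1])
    print(backward[-1])  # differs from forward[-1]: word order matters
```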
Once the Neural Network has been trained using the standard backpropagation algorithm, the translator is able to recognize the valid queries presented to it and convert them into appropriate SQL commands.
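
Training with backpropagation can be sketched on a toy problem. The Python example below is a minimal, hypothetical setup, not a reproduction of any published system: a bag-of-words input, one tanh hidden layer and a sigmoid output that chooses between two made-up SQL templates, trained by stochastic gradient descent on cross-entropy loss. The vocabulary, templates, `countries` table and training questions are all assumptions for illustration.

```python
import math
import random
from typing import List, Tuple

CAPITAL_SQL = "SELECT capital FROM countries WHERE country = ?"
POPULATION_SQL = "SELECT population FROM countries WHERE country = ?"


def tokenize(question: str) -> List[str]:
    return question.lower().replace("?", "").split()


class TinyTranslator:
    """Bag-of-words -> tanh hidden layer -> sigmoid output.
    Output >= 0.5 selects CAPITAL_SQL, otherwise POPULATION_SQL."""

    def __init__(self, vocab: List[str], hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in vocab] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def encode(self, question: str) -> List[float]:
        words = set(tokenize(question))
        return [1.0 if w in words else 0.0 for w in self.vocab]

    def forward(self, x: List[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def train(self, data: List[Tuple[str, float]], epochs: int = 500, lr: float = 0.5) -> None:
        for _ in range(epochs):
            for question, target in data:
                x = self.encode(question)
                h, y = self.forward(x)
                # Output error: for sigmoid + cross-entropy, dLoss/dz = y - target.
                dz = y - target
                # Propagate the error back to each hidden unit (tanh' = 1 - h^2), using the old w2.
                dh = [dz * w * (1.0 - hi * hi) for w, hi in zip(self.w2, h)]
                self.w2 = [w - lr * dz * hi for w, hi in zip(self.w2, h)]
                self.b2 -= lr * dz
                for j, d in enumerate(dh):
                    self.w1[j] = [w - lr * d * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * d

    def translate(self, question: str) -> str:
        _, y = self.forward(self.encode(question))
        return CAPITAL_SQL if y >= 0.5 else POPULATION_SQL


# Hypothetical training data: 1.0 = capital question, 0.0 = population question.
TRAINING_DATA = [
    ("What is the capital of France?", 1.0),
    ("capital city of Japan", 1.0),
    ("which city is the capital of Peru", 1.0),
    ("What is the population of France?", 0.0),
    ("how many people live in Japan", 0.0),
    ("population of Peru", 0.0),
]

if __name__ == "__main__":
    vocab = sorted({w for q, _ in TRAINING_DATA for w in tokenize(q)})
    model = TinyTranslator(vocab)
    model.train(TRAINING_DATA)
    print(model.translate("what is the capital of france"))
```

This network only fits six training questions; how well such a model handles unseen questions depends on the size and coverage of the real training set, which this sketch does not address.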
The main advantage of this technique is that the system can be built for any natural language, given an appropriate dataset for it. The recognition and translation time for a query is also reduced in such a system, due to the inherent parallelism of Neural Networks.
