<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	 xmlns:media="http://search.yahoo.com/mrss/" >

<channel>
	<title>Machine Learning &#8211; Anupinder Singh</title>
	<atom:link href="https://anupinder.com/tag/machine-learning/feed/" rel="self" type="application/rss+xml" />
	<link>https://anupinder.com</link>
	<description>&#34;everyday brings new choices&#34;</description>
	<lastBuildDate>Wed, 26 Jan 2022 15:21:59 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.4.5</generator>
	<item>
		<title>Deep Learning- A brief Introduction</title>
		<link>https://anupinder.com/deep-learning-a-brief-introduction/</link>
					<comments>https://anupinder.com/deep-learning-a-brief-introduction/#respond</comments>
		
		<dc:creator><![CDATA[Anupinder Singh]]></dc:creator>
		<pubDate>Wed, 26 Jan 2022 14:52:00 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[Supervised Models]]></category>
		<category><![CDATA[Unsupervised Models]]></category>
		<guid isPermaLink="false">https://anupinder.com/?p=103</guid>

					<description><![CDATA[<p>Ever wondered how YouTube is able to give us relevant recommendations based on our taste in videos or how self-driving cars operate? All of this is possible because of Deep Learning. Deep learning is a machine learning approach that teaches the machines to learn by examples and experience. It is a technique where machines acquire ... <a title="Deep Learning- A brief Introduction" class="read-more" href="https://anupinder.com/deep-learning-a-brief-introduction/" aria-label="More on Deep Learning- A brief Introduction">Read more</a></p>
<p>The post <a rel="nofollow" href="https://anupinder.com/deep-learning-a-brief-introduction/">Deep Learning- A brief Introduction</a> appeared first on <a rel="nofollow" href="https://anupinder.com">Anupinder Singh</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Ever wondered how YouTube is able to give us relevant recommendations based on our taste in videos or how self-driving cars operate? All of this is possible because of Deep Learning. </p>



<p>Deep learning is a machine learning approach that teaches the machines to learn by examples and experience. </p>



<p>It is a technique where machines acquire skills without human intervention.</p>



<h2 class="wp-block-heading" id="what-is-deep-learning"><strong>What is Deep Learning?</strong></h2>



<p>Deep learning can be described as a family of machine learning models that enable a computer to perform tasks such as classification directly on images, text, sound, etc. </p>



<p>These models are trained on large amounts of data using neural network architectures that contain multiple layers. </p>



<p>As a result, deep learning models are able to achieve state-of-the-art accuracy, which may exceed human-level performance in some scenarios.&nbsp;</p>



<p>Machine Learning vs Artificial Intelligence vs Deep Learning: Are they all the same?</p>



<p>Artificial Intelligence is a generic term that refers to techniques that enable computers to imitate human intelligence. <a href="https://anupinder.com/machine-learning-let-us-get-started/" target="_blank" rel="noreferrer noopener nofollow">Machine Learning</a> can be described as a set of algorithms that are trained on data to improve their performance. </p>



<p>Deep Learning, in turn, is a machine learning technique inspired by the structure of the human brain. It uses a multi-layered model framework called a neural network.</p>



<p>Deep Learning is a subset of Machine Learning, which is a subset of Artificial Intelligence. This can be understood by the Venn diagram given below:</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img fetchpriority="high" decoding="async" src="https://lh5.googleusercontent.com/9d5svMkiSl4V8NS7YTgVljOOYlTrJ_cbZ3K7bVZGBtDmd9t4YT421jyonJvAtIWChc8nW5_YXA3r5PvDbSxYYloh478bdCJJMYqbB4vMUQyEw9FZBSY7p4OaCEsLgKwE-LQRQcM" alt="Domain of Deep Learning" width="370" height="332"/></figure></div>



<h2 class="wp-block-heading" id="why-is-deep-learning-so-popular"><strong>Why is deep learning so popular?</strong></h2>



<p>Deep Learning models provide accurate results which enable consumer electronics to meet user expectations. Also, accuracy is important in the case of safety-critical products like self-driving cars. </p>



<p>The following factors make deep learning models more accurate than ever:</p>



<ul><li>Deep learning models are trained using massive amounts of labelled data. Self-driving car models, for example, are trained on very large numbers of images and videos.</li><li>Deep learning demands high computing power, such as high-quality GPUs, powerful clusters and cloud computing. These high-performance machines help reduce the training time of deep learning models.</li></ul>



<p>Another reason why these models have gained so much popularity is that they do not require the feature extraction step. </p>



<p>The traditional machine learning models like Logistic regression, Decision trees, SVM, etc. cannot be used on raw data directly. They require a separate preprocessing step called feature extraction. </p>



<p>On the other hand, the artificial neural networks used in Deep Learning do not require the feature extraction step.</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh6.googleusercontent.com/vt-WH4oi9dBUzcepEa066lx25Cil9ODKjws29qpDwcWCUIZERtP81eIKMZKchPxke_3LStbrwp9O-iwWUdK2B17AFw8ERmNq0EYva7nbkDSbIFh1nhHxCI-s3aZq4Dg4qzWHMI0" alt="Basic Process of Deep Learning implementation approach"/></figure>



<blockquote class="wp-block-quote has-text-align-center has-large-font-size"><p><em>In other words, the feature extraction step is part of the process which takes place in the artificial neural network.</em></p></blockquote>



<h2 class="wp-block-heading" id="how-does-deep-learning-work"><strong>How does deep learning work?</strong></h2>



<p>Deep Learning models implement Artificial Neural Networks which imitate the way the human brain computes information. </p>



<p>During training, the network uses the unknown structure in the input distribution to extract features and discover useful patterns. </p>



<p>This learning takes place at multiple levels (layers), which results in more accurate computations.</p>



<p>No model is perfect; we need to choose the algorithm depending on the nature of the task to be performed. </p>



<p>A proper understanding of the elementary algorithms is required to choose the relevant one.</p>



<h2 class="wp-block-heading" id="deep-learning-algorithms"><strong>Deep learning algorithms</strong></h2>



<p>Deep learning is a fast-growing technology, and implementing it requires learning about its various models. </p>



<p><strong>There are two types of models in deep learning: <a href="https://www.ibm.com/cloud/blog/supervised-vs-unsupervised-learning" target="_blank" rel="noreferrer noopener nofollow">supervised and unsupervised</a>.</strong></p>



<p><strong>Supervised models</strong> are trained using examples from a labelled dataset, i.e. the algorithm can use an answer key to evaluate its accuracy on the training data. </p>



<p>In <strong>unsupervised models</strong>, by contrast, unlabelled data is used and the algorithms try to gather information by extracting features and patterns on their own.</p>



<h3 class="wp-block-heading" id="supervised-models"><strong>Supervised Models&nbsp;</strong></h3>



<h5 class="wp-block-heading" id="convolutional-neural-networks-cnns"><strong><span style="text-decoration: underline"><em>Convolutional Neural Networks (CNNs)</em></span></strong></h5>



<p>A Convolutional Neural Network (CNN) is built to handle much of the complexity involved in pre-processing and computing on data such as images. </p>



<p>It is an advanced and more powerful variation of the classic artificial neural network. CNNs were developed for image detection and image classification problems.</p>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://lh5.googleusercontent.com/b7FBXpkIjnRlF2qKayIG2PdSCL0sCn4lypurAuRI4EjKpEiyojgsEhgo4cn6zwaYdqmVmZNQio5A9cNcBK0CMiTJSvlHpLU-BEUJjQgqJX7NApBG5eumD_fVC5ATyIHJTokwSBc" alt="CNN Deep Learning model demo"/></figure></div>



<p><strong>When to use the CNNs:</strong></p>



<ul><li>While using image datasets</li><li>OCR document analysis</li><li>When the model requires high complexity in computing the output</li><li>When the input data is 2-D but can be transformed to 1-D internally for rapid processing.</li></ul>
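
<p>As a rough illustration of what a convolutional layer computes, here is a minimal sketch in plain Python. The function name <code>convolve2d</code>, the toy image and the hand-picked kernel are all assumptions made for this example; a real CNN learns its kernel weights during training and would normally be built with a deep learning library.</p>

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image: dark left half (0) and bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)
```

<p>The feature map is non-zero only in the column where the dark region meets the bright one, which is the sense in which a convolutional layer "detects" a local pattern.</p>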



<h5 class="wp-block-heading" id="classic-neural-networks-multilayer-perceptrons"><strong><span style="text-decoration: underline"><em>Classic Neural Networks (Multilayer Perceptrons)</em></span></strong></h5>



<p>Classic neural networks, also known as multilayer perceptrons, pass a series of inputs through fully connected layers of simple units. This lets them adapt to elementary patterns in the data in a way that loosely resembles the learning patterns of a human brain.</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img decoding="async" src="https://lh5.googleusercontent.com/xm8Qpgvgzn6InxWJzRbxc-up3OfHYnc2OlVvk3M9vRinmiZd6Lh3M7rCMOSLXSJZ8lb2xP1RcntD5MlaizNKdKviNrK98-NboVyZ-aeUbZKIEc8t4TjraT-ZXTjKezVY-q5o5zg" alt="Multilayer perceptron based Deep Learning model" width="508" height="280"/></figure></div>



<p><strong>When to use the Classic Neural Networks:</strong></p>



<ul><li>Classification problems where the set of real values is given as input.</li><li>Tabular dataset, in the form of rows and columns i.e. the CSV files.</li></ul>
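
<p>To make the idea of a multilayer perceptron concrete, the sketch below runs a forward pass through a tiny network with two inputs, two hidden neurons and one output. The weights are hand-picked for illustration (they make the network behave like an XOR gate) and the helper names <code>dense</code> and <code>predict</code> are assumptions of this example; in practice the weights are learned from data.</p>

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hypothetical, hand-picked weights (not learned).
hidden_weights = [[20.0, 20.0], [-20.0, -20.0]]  # roughly OR and NAND
hidden_biases = [-10.0, 30.0]
output_weights = [[20.0, 20.0]]                  # roughly AND of the two hidden units
output_biases = [-30.0]


def predict(row):
    """Forward pass: input row -> hidden layer -> single output in (0, 1)."""
    hidden = dense(row, hidden_weights, hidden_biases)
    return dense(hidden, output_weights, output_biases)[0]


for row in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(row, round(predict(row)))
```

<p>Each row of real values, such as one line of a CSV file, goes in at the input layer and comes out as a class score, which is why this architecture suits tabular classification problems.</p>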



<h5 class="wp-block-heading" id="recurrent-neural-networks-rnns"><strong><span style="text-decoration: underline"><em>Recurrent Neural Networks (RNNs)</em></span></strong></h5>



<p>Recurrent Neural Networks (RNNs) were developed for predicting sequences. LSTM (long short-term memory) is a well-known RNN architecture with various possible use cases.</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img decoding="async" src="https://lh3.googleusercontent.com/FGOoXwKKMSu7kfLFcZXyrN1yWPCtf8lCEcuDmlZaF9BIovlQLHOCpvGCFvz1NmO9LdDozbC54UindY6gYx2dct6FlQPoA8EXyXeHJAOWRBj0Z_sZPTtRAE2yt0EOXbZpF7DCZpE" alt="Recurrent Neural Network" width="470" height="347"/></figure></div>



<p><strong>When to use the RNNs:</strong></p>



<ul><li>One to one mapping: a single input mapped to a single output, example: image classification.</li><li>One to many mapping: a single input mapped to a sequence of outputs, example: image captioning, i.e. multiple words from a single image.</li><li>Many to one mapping: a sequence of inputs produces a single output, example: sentiment analysis, i.e. binary output from multiple words.</li><li>Many to many mapping: a sequence of inputs produces a sequence of outputs, example: video classification, i.e. splitting the video into multiple frames and labelling each frame separately.</li></ul>
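
<p>The many to one case can be sketched with a single recurrent unit. The function <code>rnn_many_to_one</code> and its weight values are hypothetical and chosen only for illustration; the point is that the hidden state is carried from one time step to the next, so the order of the inputs affects the result.</p>

```python
import math


def rnn_many_to_one(sequence, w_input, w_hidden, w_output):
    """Run a one-unit RNN over `sequence` and return a single output value."""
    hidden = 0.0  # hidden state carried between time steps
    for x in sequence:
        # The new state depends on the current input AND the previous state.
        hidden = math.tanh(w_input * x + w_hidden * hidden)
    return w_output * hidden


# Hypothetical weights; a trained RNN would learn these.
score = rnn_many_to_one([1.0, 1.0, -1.0], w_input=0.5, w_hidden=0.8, w_output=1.0)
reordered = rnn_many_to_one([-1.0, 1.0, 1.0], w_input=0.5, w_hidden=0.8, w_output=1.0)
print(score, reordered)
```

<p>The two calls use the same values in a different order and produce different scores, which a plain feed-forward network that sums its inputs could not do.</p>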



<h3 class="wp-block-heading" id="unsupervised-models"><strong>Unsupervised Models</strong></h3>



<h5 class="wp-block-heading" id="boltzmann-machines"><strong><span style="text-decoration: underline"><em>Boltzmann Machines</em></span></strong></h5>



<p>Boltzmann machines, unlike the models above, do not follow a fixed direction. Direction here means input layer → hidden layer → output. </p>



<p>Instead, the nodes of these machines are all connected to one another, as shown in the image below.</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img loading="lazy" decoding="async" src="https://lh5.googleusercontent.com/S5bVMGgeGLVKANKjTJWTj3qLJGLJ9hdyOmeM6boLPSWSTrlxfuivc_Lh6DttvRZj4E4AANeVnsSNmi-iqK-Vfq9bPtdOKS4P4Zy22Dttd8yKYaBXbqjmsjVjFc0cYwQHE9ryDEs" alt="model based on Boltzmann Machines" width="488" height="305"/></figure></div>



<p><strong>When to use the Boltzmann Machines:</strong></p>



<ul><li>While working with a very specific set of data</li><li>To build a binary recommendation system</li><li>Monitoring a system</li></ul>



<h5 class="wp-block-heading" id="self-organising-maps-soms"><strong><span style="text-decoration: underline"><em>Self-Organising Maps (SOMs)</em></span></strong></h5>



<p>Self-Organising Maps (SOMs) work with unlabelled data and help reduce the number of random variables in the model (dimensionality reduction). </p>



<p>The output of a self-organising map is typically two-dimensional.</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img loading="lazy" decoding="async" src="https://lh3.googleusercontent.com/oL6RCUiKGe_8oIiv7QNCg8Y5y2EUN-0V35UUlivVO5B_MkjduuVQKIZ01lqluVWzFgyMoTChf7pKOmy54JUhWifvFr3ikmNnzyRu4Db4eyaKY1cfNRJ5bi-kzFKc6sVJ1X641v0" alt="model of Self-organizing map" width="566" height="343"/></figure></div>



<p><strong>When to use the SOMs:</strong></p>



<ul><li>When the data does not have an output or Y column</li><li>Creative projects like music, text, videos, etc. produced by artificial intelligence</li><li>Dimensionality reduction for feature detection</li><li>Exploring a dataset to understand the structure behind it</li></ul>
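
<p>A single training step of a self-organising map can be sketched as follows. This is a simplified version written for this post: the names <code>best_matching_unit</code> and <code>update</code>, the 2x2 grid and the learning rate are assumptions, and the neighbourhood update that a full SOM applies to nearby nodes is left out to keep the example short.</p>

```python
def best_matching_unit(grid, sample):
    """Return the (row, col) of the node whose weights are closest to `sample`."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(grid):
        for c, weights in enumerate(row):
            dist = sum((w - s) ** 2 for w, s in zip(weights, sample))
            if best_dist > dist:
                best, best_dist = (r, c), dist
    return best


def update(grid, sample, learning_rate=0.5):
    """Pull the winning node toward the sample (neighbourhood update omitted)."""
    r, c = best_matching_unit(grid, sample)
    grid[r][c] = [w + learning_rate * (s - w) for w, s in zip(grid[r][c], sample)]
    return r, c


# A 2x2 map; each node holds a 3-dimensional weight vector.
grid = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]],
]

winner = update(grid, [0.8, 1.0, 1.0])
print(winner, grid[winner[0]][winner[1]])
```

<p>The 3-dimensional sample is mapped to a position on the 2-dimensional grid, which is how the map reduces dimensionality without needing a Y column.</p>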



<h5 class="wp-block-heading" id="autoencoders"><strong><span style="text-decoration: underline"><em>AutoEncoders</em></span></strong></h5>



<p>Autoencoders work by automatically encoding the input values into a compact representation, applying an activation function, and finally decoding that representation to produce the output.</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img loading="lazy" decoding="async" src="https://lh5.googleusercontent.com/W8LGwH2zP2fJPA6aBhjSsjJZHwAgrM7wXzsQKsCoNGIrJaLFLyOwVWawNl-eSPs7nJHrL3cdT0bf_cPWGPLYPdwE7U71XATCsJy1EqmEeN_W_Lf5xPBlhqkPgw6G854EnEv1kq0" alt="Auto encoder" width="517" height="294"/></figure></div>



<p><strong>When to use the AutoEncoders:</strong></p>



<ul><li>Feature detection or dimensionality reduction</li><li>Building powerful recommendation systems</li><li>Performing encoding on massive datasets.</li></ul>
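
<p>The encode, activate and decode steps can be sketched with fixed weights. Everything in this example (the <code>encoder</code> and <code>decoder</code> matrices, the ReLU activation and the function <code>autoencode</code>) is an assumption chosen so the arithmetic is easy to follow; a real autoencoder learns its weights by minimising the reconstruction error.</p>

```python
def relu(values):
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Hypothetical fixed weights: 4 inputs -> 2-value code -> 4 outputs.
encoder = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
decoder = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
]


def autoencode(x):
    code = relu(matvec(encoder, x))         # compress 4 values into 2
    reconstruction = matvec(decoder, code)  # expand the code back to 4 values
    return code, reconstruction


code, reconstruction = autoencode([3.0, 3.0, 7.0, 7.0])
print(code, reconstruction)
```

<p>Because this toy input is redundant, the 2-value code is enough to rebuild it exactly; on real data the reconstruction is only approximate, and the code serves as the reduced set of features.</p>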



<h2 class="wp-block-heading" id="deep-learning-applications"><strong>Deep learning Applications</strong></h2>



<p>Deep Learning, a subset of machine learning, has become a buzzword in the field of artificial intelligence. </p>



<p>It enables the computers to learn from past experiences and examples, helping them to solve complicated problems without human involvement.</p>



<p>What exactly are the problems being tackled by deep learning?</p>



<p>Following are a <a href="https://www.mathworks.com/solutions/deep-learning.html" target="_blank" rel="noreferrer noopener nofollow">few major examples</a> where deep learning plays an important role:</p>



<p><strong><span style="text-decoration: underline">Healthcare:</span></strong> Deep learning is a fast-growing trend in the healthcare industry. The sensors and devices that provide real-time data about patients like overall health condition, heartbeat count, blood sugar level, etc. use deep learning. Apart from this, the pharmaceutical companies also implement these algorithms for disease detection, image segmentation, etc.&nbsp;</p>



<p><strong><span style="text-decoration: underline">Virtual Assistant:</span></strong> Virtual assistants have various applications nowadays. They act as chatbots, online training instructors, etc. Their main areas of application are speech recognition (speech to text) and speech synthesis (text to speech), built on natural language processing. All this is possible due to deep learning. Siri, Alexa, Cortana, Google Assistant, etc. are some of the most popular virtual assistants.</p>



<p><strong><span style="text-decoration: underline">Social Media:</span></strong> Deep learning helps Twitter to enhance its performance. These models access and analyse a lot of data in order to learn about user preferences. Not only this, Facebook uses it to improve its user experience by recommending relevant pages, posts, friends etc. In addition to this, Instagram uses its models to prevent cyberbullying and eliminate controversial comments.</p>



<p><strong><span style="text-decoration: underline">Chatbots:</span></strong> Chatbots help in solving customer problems in just a few seconds using Artificial Intelligence to chat via text or text to speech. Chatbots help in consumer interaction, marketing on social media platforms, and instant response to clients. They use machine learning and deep learning models to generate various types of reactions.</p>



<p><strong><span style="text-decoration: underline">Self-driving cars:</span></strong> Self-driving cars operate using machine learning and deep learning algorithms.</p>



<blockquote class="wp-block-quote has-large-font-size"><p><em><strong>“</strong>Self-driving cars are the natural extension of active safety and obviously something we should do”. -Elon Musk.</em></p></blockquote>



<p>They are able to detect objects near the car, understand traffic signals, measure the distance between the car and other vehicles, etc. Tesla is one of the best-known makers of cars with self-driving features.</p>



<h2 class="wp-block-heading" id="limitations-and-challenges"><strong>Limitations and Challenges</strong></h2>



<p>Although deep learning is an <a href="https://www.google.com/aclk?sa=l&amp;ai=DChcSEwjjgtyp48L1AhVHDisKHSIZCzUYABAAGgJzZg&amp;sig=AOD64_2iD_FtoNNIBa6V-37ap6xelVQDag&amp;q&amp;nis=1&amp;adurl&amp;ved=2ahUKEwiGttWp48L1AhU1TGwGHT5fBpgQ0Qx6BAgDEAE" target="_blank" rel="noreferrer noopener nofollow">expanding technology</a> in various domains, it comes with a number of limitations and challenges:</p>



<ul><li>A large amount of data is required to train the models to achieve accurate results.</li><li>Training deep learning models is expensive, since high-quality GPUs and sometimes hundreds of powerful machines are required.</li><li>There is no predefined framework to help in selecting the relevant <a href="https://www.mathworks.com/products/deep-learning.html" target="_blank" rel="noreferrer noopener nofollow">deep learning tools</a>. As a result, acquiring deep learning skills becomes difficult.</li><li>The data needs to be cleaned before any algorithm is applied to it. Irrespective of how efficient the model is, without <a href="https://anupinder.com/data-cleaning-in-a-nutshell/" target="_blank" rel="noreferrer noopener">data cleansing</a>, it will deliver inaccurate results.</li></ul>



<h2 class="wp-block-heading" id="conclusion"><strong>Conclusion</strong></h2>



<p>With the increase in the deployment of big data, deep neural network architecture and computational power, the conventional predictive models have improved in terms of accuracy and efficiency. </p>



<p>The number of organizations adopting big data and advanced technologies like artificial intelligence, machine learning, the Internet of things, etc. has grown and will continue to grow in the near future.&nbsp;</p>
]]></content:encoded>
					
					<wfw:commentRss>https://anupinder.com/deep-learning-a-brief-introduction/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Data Cleaning in a Nutshell</title>
		<link>https://anupinder.com/data-cleaning-in-a-nutshell/</link>
					<comments>https://anupinder.com/data-cleaning-in-a-nutshell/#respond</comments>
		
		<dc:creator><![CDATA[Anupinder Singh]]></dc:creator>
		<pubDate>Thu, 20 Jan 2022 16:27:50 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Data Cleaning]]></category>
		<category><![CDATA[Data Cleansing]]></category>
		<category><![CDATA[Data Science]]></category>
		<guid isPermaLink="false">https://anupinder.com/?p=86</guid>

					<description><![CDATA[<p>“Better data beats fancier algorithms.” Garbage in, garbage out is the motto that needs to be followed to build an accurate machine learning model. If the data under analysis is not accurate, then it is not useful. Irrespective of how accurate your model is, without data cleaning, it will deliver biased and inaccurate results. Thus, ... <a title="Data Cleaning in a Nutshell" class="read-more" href="https://anupinder.com/data-cleaning-in-a-nutshell/" aria-label="More on Data Cleaning in a Nutshell">Read more</a></p>
<p>The post <a rel="nofollow" href="https://anupinder.com/data-cleaning-in-a-nutshell/">Data Cleaning in a Nutshell</a> appeared first on <a rel="nofollow" href="https://anupinder.com">Anupinder Singh</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h4 class="has-text-align-center wp-block-heading"><em><strong>“Better data beats fancier algorithms.”</strong></em></h4>



<p>"Garbage in, garbage out" is the principle to keep in mind when building an accurate machine learning model.</p>



<p>If the data under analysis is not accurate, then it is not useful. Irrespective of how accurate your model is, without data cleaning, it will deliver biased and inaccurate results. </p>



<p>Thus, data cleaning, also called data cleansing or data scrubbing, is one of the most crucial parts of machine learning.</p>



<h2 class="wp-block-heading"><strong>What is data cleaning?</strong></h2>



<p>Data cleansing can be understood as a process of making the data ready for analysis. </p>



<p>Eliminating null records and unnecessary columns, fixing the outliers (junk values), restructuring the data to enhance its readability, etc. are some of the components of data cleaning.</p>



<p>Data cleaning also focuses on increasing the accuracy of the dataset by rectifying the existing information, instead of just removing chunks of useless data.</p>



<h2 class="wp-block-heading"><strong>Steps involved in data cleaning</strong></h2>



<p>There is no single procedure for data cleaning; it varies from one dataset to another. However, having a roadmap is essential to keep you on the right track.</p>



<p>Given below are the basic steps which can be followed to create a template for your data cleaning process.</p>



<h4 class="wp-block-heading"><span style="text-decoration: underline">Eliminating duplicates and irrelevant observations</span></h4>



<ul><li>Duplicate or redundant values can affect the model to a large extent. Repeated data over-weights whichever side it falls on, correct or incorrect, thereby giving biased results. </li></ul>



<ul><li>Irrelevant data does not add any value to the dataset and should therefore be dropped to save resources like memory and processing time.</li></ul>
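
<p>Both clean-up steps can be sketched in a few lines of plain Python. The records, the column names and the helper functions <code>drop_duplicates</code> and <code>drop_columns</code> are made up for this example; on a real project you would more likely reach for a data-frame library, but the logic is the same.</p>

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_columns(rows, columns):
    """Return copies of the rows without the given (irrelevant) columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


# Hypothetical records: the third row repeats the first one exactly.
records = [
    {"id": 1, "city": "Delhi", "note": "n/a"},
    {"id": 2, "city": "Pune", "note": "n/a"},
    {"id": 1, "city": "Delhi", "note": "n/a"},
]

cleaned = drop_columns(drop_duplicates(records), {"note"})
print(cleaned)
```

<p>Deciding which columns are irrelevant is a judgment call that depends on the question being asked; the code only carries out the decision.</p>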



<h4 class="wp-block-heading"><span style="text-decoration: underline">Rectifying structural errors</span></h4>



<ul><li>Structural errors include inconsistencies in naming conventions, typos, and wrong capitalization. These typographical errors result in mislabeled classes or categories.</li><li>For instance, the model might treat “NA” and “Not Applicable” as two different categories, though they represent the same value. These structural variations make the algorithms inefficient and lead to unreliable results.</li></ul>
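
<p>One simple way to deal with such label variants is to map them onto a single canonical value. The mapping <code>CANONICAL</code> and the function <code>normalise_label</code> below are hypothetical; the list of variants would have to be built by inspecting your own data.</p>

```python
# Hypothetical mapping from known variants (lower-cased) to one canonical label.
CANONICAL = {
    "na": "Not Applicable",
    "n/a": "Not Applicable",
    "not applicable": "Not Applicable",
}


def normalise_label(value):
    """Collapse whitespace and case differences, then apply the mapping."""
    tidy = " ".join(value.split())  # trims and collapses repeated spaces
    return CANONICAL.get(tidy.lower(), tidy)


labels = [" NA ", "Not  applicable", "n/a", "Yes"]
print([normalise_label(label) for label in labels])
```

<p>Labels that are not in the mapping are only tidied, not changed, so unknown categories remain visible for manual review.</p>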



<h4 class="wp-block-heading"><span style="text-decoration: underline">Filtering out irrelevant outliers</span></h4>



<ul><li>Outliers are values that do not fit the rest of the dataset under observation. These values can often be understood as noise in the dataset.</li><li>Outliers often arise from manual errors or data entry mistakes. However, outliers are not always incorrect, so they should not be dropped without a valid reason.</li></ul>
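
<p>A common rule of thumb for spotting candidate outliers is the interquartile range (IQR) rule, sketched below with Python's <code>statistics</code> module. The rule, the multiplier of 1.5 and the sample readings are choices made for this illustration, not a method prescribed above. The function only flags values so that a person can decide whether there is a valid reason to drop them.</p>

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the middle 50% of the data."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v > high or low > v]


# Hypothetical readings; 95 looks like a data entry mistake.
readings = [10, 12, 11, 13, 12, 11, 95]
print(flag_outliers(readings))
```

<p>Keeping the flagged values separate, rather than deleting them straight away, leaves room to check whether they are errors or genuine extreme observations.</p>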



<h4 class="wp-block-heading"><span style="text-decoration: underline">Handling missing data</span></h4>



<p>Handling missing values is the trickiest step in the data cleaning process. Missing values can’t simply be ignored or eliminated, since they can represent something crucial. </p>



<p>Following are a couple of the most common methods to deal with missing data:</p>



<ul><li>Removing the observations that have missing values, which might result in losing some useful information.</li><li>Imputing the missing values based on other observations. Since imputed values are based on assumptions rather than actual observations, they add no new information to the dataset and may compromise data integrity.</li></ul>
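
<p>The two options can be compared side by side in a short sketch. The column name <code>age</code>, the use of the median as the fill value and the extra <code>age_was_missing</code> flag are assumptions of this example; the flag is one way to keep track of which values were assumed rather than observed.</p>

```python
import statistics


def drop_missing(rows, column):
    """Option 1: remove every observation where `column` is missing."""
    return [row for row in rows if row[column] is not None]


def impute_median(rows, column):
    """Option 2: fill missing values with the median and record that we did so."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = statistics.median(observed)
    return [
        {
            **row,
            column: fill if row[column] is None else row[column],
            column + "_was_missing": row[column] is None,
        }
        for row in rows
    ]


# Hypothetical data with one missing age.
people = [{"age": 30}, {"age": None}, {"age": 40}, {"age": 50}]

dropped = drop_missing(people, "age")
imputed = impute_median(people, "age")
print(len(dropped), imputed[1])
```

<p>Dropping loses a row, while imputing keeps it at the cost of an invented value, which is the trade-off described in the list above.</p>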



<h2 class="wp-block-heading"><strong>Some data cleansing tools</strong></h2>



<p><a href="https://youtu.be/NWqL3ZccBBM" target="_blank" rel="noreferrer noopener nofollow">Data cleaning</a> is one of the most important steps in machine learning for achieving accuracy and efficiency. </p>



<p>Performing data cleansing manually on very large datasets is tedious and may result in errors. </p>



<p>This is why data cleaning tools have become prominent: they help keep a large amount of data clean and consistent.</p>



<p>OpenRefine, TIBCO Clarity, Trifacta Wrangler, IBM InfoSphere, Cloudingo, QualityStage, etc. are some of the most popular data cleaning tools.</p>



<h2 class="wp-block-heading"><strong>Conclusion</strong></h2>



<p>Working with clean data comes with a lot of advantages like improved efficiency, reduced error margin, accuracy, consistency, better decision making, and many more. </p>



<p>Thus, the data should be cleansed before fitting any model with it.</p>



<p>If you want to invest time in data cleaning, you can learn by implementing it in <a href="https://www.analyticsvidhya.com/blog/2021/06/how-to-clean-data-in-python-for-machine-learning/" target="_blank" rel="noreferrer noopener nofollow">Python</a> or <a href="https://towardsdatascience.com/data-cleaning-in-r-made-simple-1b77303b0b17" target="_blank" rel="noreferrer noopener nofollow">R.</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://anupinder.com/data-cleaning-in-a-nutshell/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Machine Learning- let us get started!</title>
		<link>https://anupinder.com/machine-learning-let-us-get-started/</link>
		
		<dc:creator><![CDATA[Anupinder Singh]]></dc:creator>
		<pubDate>Mon, 08 Feb 2021 18:33:02 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Webinar]]></category>
		<category><![CDATA[webinar]]></category>
		<guid isPermaLink="false">https://anupinder.com/?p=50</guid>

					<description><![CDATA[<p>Machine learning is one of the most popular domains that modern application developers and companies are capitalising on. It is a field of computer science that builds on the applied practice of mathematics as well as statistics. Why did it create the buzz? Because it reduced the intensive logic implementations for processing the massive quantity of ... <a title="Machine Learning- let us get started!" class="read-more" href="https://anupinder.com/machine-learning-let-us-get-started/" aria-label="More on Machine Learning- let us get started!">Read more</a></p>
<p>The post <a rel="nofollow" href="https://anupinder.com/machine-learning-let-us-get-started/">Machine Learning- let us get started!</a> appeared first on <a rel="nofollow" href="https://anupinder.com">Anupinder Singh</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Machine learning is one of the most popular domains that modern application developers and companies are capitalising on. It is a field of computer science that builds on the applied practice of mathematics as well as statistics.</p>



<h2 class="wp-block-heading">Why did it create the buzz?</h2>



<p>Because it reduced the intensive logic implementations needed to process massive quantities of data (generally known as big data), and the results are promising in terms of finding patterns in the data, resulting in better business-oriented decisions.</p>



<p>Now, as a beginner, you may find the concept of machine learning overwhelming, as there is plenty of scattered information available across the web, including various theoretical courses and proprietary documentation.</p>



<p>So, here I will try to give you a simple flow of how, as a beginner, you can familiarize yourself with the machine learning domain and where you can start looking in the first place.</p>



<p>The formal definition could be:</p>



<blockquote class="wp-block-quote has-text-align-center"><p><em>Machine learning (ML) is a field of computer science concerned with programs that learn, and with the question of how to construct computer programs that automatically improve with experience.</em></p></blockquote>



<p>Now you might also be thinking about how artificial intelligence is different from machine learning, so here is the big picture for you. </p>



<p>Machine learning is a subset or, in fact, a more specialized form of artificial intelligence. It in turn supports the deep learning domain for more intense &amp; intelligent applications.</p>



<figure class="wp-block-image is-style-default"><img decoding="async" src="https://media-exp1.licdn.com/dms/image/C4D12AQEuJ1KSIgB7DA/article-inline_image-shrink_1000_1488/0/1595315404736?e=1623283200&amp;v=beta&amp;t=10HTKTJHY3HaDCwRR-9im6i6mWbwGPerBlvr-jv42-A" alt="No alt text provided for this image"/></figure>



<p>Now the next point to understand is why we want computer programs to improve with experience. It&#8217;s because:</p>



<blockquote class="wp-block-quote"><p>we have huge data and we want to make decisions or predictions from it</p></blockquote>



<blockquote class="wp-block-quote"><p><em><u>AND</u></em></p></blockquote>



<blockquote class="wp-block-quote"><p>we want computers to learn to identify patterns without being explicitly programmed to</p></blockquote>



<p>And as the saying goes, <a href="https://en.wikipedia.org/wiki/Data" target="_blank" rel="noreferrer noopener nofollow">DATA</a> is the new currency of this digital world and is priceless. Therefore, it&#8217;s essential to utilize it to unlock the unique potential of your business.</p>



<p>Great, you know why it is essential for computers to improve.</p>



<p>Now, as a programmer, what should you know so that this automation can be achieved?</p>



<h2 class="wp-block-heading">Types of machine learning</h2>



<p>Broadly, there are three types:</p>



<h5 class="wp-block-heading"><strong>Supervised Learning</strong></h5>



<p>This is simplest to implement, where primarily the problems related to regression and classification are solved. And the most important is that the Data available for analysis is available in a structured way with minimum anomalies, and even if anomalies are present, they can be rectified by using statistical measures. </p>



<p>General use cases implemented under this: image classification, fraud detection, weather/market forecasting, etc. So you can simply infer that wherever straightforward predictions from known examples are needed, that is supervised learning.</p>
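


<p>To make the idea concrete, here is a minimal sketch of supervised learning in plain Python. It fits a straight line to a handful of labeled examples and then predicts a value for an unseen input. The data (hours studied vs. exam score) and the helper names <code>fit_line</code> and <code>predict</code> are made up for illustration; in practice you would typically reach for a library such as scikit-learn rather than writing this by hand.</p>



<pre class="wp-block-code"><code># Labeled training data: hours studied (input) and exam score (known answer).
# These numbers are made up purely for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# "Training" means learning slope and intercept from the labeled examples.
model = fit_line(hours, scores)

# Prediction for an input the model has not seen before.
print(predict(model, 6.0))</code></pre>



<p>The key point is that every training input comes with its correct answer, and the model learns the mapping between the two.</p>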



<h5 class="wp-block-heading"><strong>Unsupervised Learning</strong></h5>



<p>This again works toward the same objective of prediction, but the complexity is increased because the data available for analysis is unlabeled and either minimally structured or totally unstructured. Therefore, an added step of clustering or dimensionality reduction must be performed before predictions can be put in place.</p>



<p>So this requires more insight into the working concepts of statistical procedures and is the next stage of learning in ML. General use cases include customer segmentation, recommender systems, feature discovery, etc.</p>
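


<p>As a rough illustration of clustering, the sketch below groups unlabeled numbers into two segments using a bare-bones, one-dimensional version of k-means written in plain Python. The spending figures and the function name <code>kmeans_1d</code> are assumptions made for this example; a real project would more likely use a ready-made implementation from a library such as scikit-learn.</p>



<pre class="wp-block-code"><code># Unlabeled data: monthly spend of eight customers (made-up numbers).
spend = [12.0, 15.0, 14.0, 11.0, 95.0, 102.0, 98.0, 105.0]


def kmeans_1d(values, centers, rounds=10):
    """Assign each value to its nearest center, then move each center to its group mean."""
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Start with the smallest and largest value as the two initial centers.
centers, groups = kmeans_1d(spend, centers=[min(spend), max(spend)])
print(centers)
print(groups)</code></pre>



<p>Notice that nobody told the program which customer belongs to which segment; the groups emerge from the data itself.</p>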



<h5 class="wp-block-heading"><strong>Reinforcement Learning</strong></h5>



<p>This basically builds on ideas from both the supervised and unsupervised procedures and adds iterative learning from feedback: the system tries an action, receives a reward or a penalty when it errs (for example, on a misprediction), and adjusts its behaviour accordingly.</p>



<p>The procedures (algorithms) implemented in such a system are designed so that they can tune their attributes/parameters (variables), testing them against a variety of values to find the best combination. For example, neural networks have a variety of parameters like the number of layers, the number of neurons in each layer, connection density between neurons, weights, etc.</p>



<p>The general use cases for such types of implementations are Robot navigation, learning tasks, game AI, self-driving cars, etc.</p>
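


<p>The trial-and-error loop behind this kind of learning can be sketched with a tiny "epsilon-greedy" agent in plain Python. Everything here is hypothetical: the three actions, their hidden success rates, and the names <code>pull</code> and <code>run_agent</code> are invented for illustration, and real systems such as game AI or robot navigation use far richer environments and algorithms.</p>



<pre class="wp-block-code"><code>import random

# Hypothetical environment: three actions with hidden success rates.
# The agent never reads these numbers directly; it only observes rewards.
HIDDEN_SUCCESS_RATES = [0.2, 0.5, 0.8]


def pull(action, rng):
    """Return reward 1 with the action's hidden probability, otherwise 0."""
    p = HIDDEN_SUCCESS_RATES[action]
    return rng.choices([1, 0], weights=[p, 1 - p])[0]


def run_agent(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(HIDDEN_SUCCESS_RATES)
    estimates = [0.0] * len(HIDDEN_SUCCESS_RATES)
    for _ in range(steps):
        # With probability epsilon explore, otherwise exploit the best guess.
        explore = rng.choices([True, False], weights=[epsilon, 1 - epsilon])[0]
        if explore:
            action = rng.randrange(len(estimates))
        else:
            action = estimates.index(max(estimates))
        reward = pull(action, rng)
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_agent()
print(estimates)
print(counts)</code></pre>



<p>The agent is never told which action is best. It improves only by acting, observing the reward, and updating its estimates, which is the essence of reinforcement learning.</p>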



<p>The interesting point is that, corresponding to each type of learning, plenty of algorithms have been published as APIs under various open-source ML libraries such as <a href="https://scikit-learn.org/" target="_blank" rel="noreferrer noopener nofollow">scikit-learn</a>, <a href="https://keras.io/" target="_blank" rel="noreferrer noopener nofollow">Keras</a>, <a href="https://www.tensorflow.org/" target="_blank" rel="noreferrer noopener nofollow">TensorFlow</a>, etc., and for data management in working memory (RAM) the primary libraries used are <a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener nofollow">pandas</a> and <a href="https://numpy.org/" target="_blank" rel="noreferrer noopener nofollow">NumPy</a>.</p>



<p>Here is a webinar discussion on the machine learning types and related topics:</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Get Started with Machine Learning" width="900" height="506" src="https://www.youtube.com/embed/HE9Vv4_xQe0?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div></figure>



<p>So, as a programmer, it has become very easy for you to implement your use cases, provided you know what problem you are trying to solve and what data you will be using along with which algorithm you are going to use and which library supports it.</p>



<h2 class="wp-block-heading">Machine Learning implementation steps</h2>



<ol><li>Define your problem statement.</li><li>Get data from various sources and pre-process it for feeding to the selected algorithm(s).</li><li>Build the model by selecting the right ML algorithm and testing it with data.</li><li>Optimize and improve (this requires repeating step 2 and step 3 until satisfactory results are produced).</li><li>Summarize the results/tell a story by using various data visualizations.</li></ol>
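


<p>The steps above can be sketched end to end in a few lines of plain Python. This is only a toy walk-through under stated assumptions: the fruit measurements are invented, the helper names (<code>fit_scaler</code>, <code>apply_scaler</code>, <code>predict</code>, <code>accuracy</code>) are hypothetical, and the k-nearest-neighbours classifier is just one convenient choice of algorithm. A real project would normally lean on pandas and scikit-learn for the same stages.</p>



<pre class="wp-block-code"><code>from collections import Counter

# Step 1: problem statement. Predict whether a fruit is an "apple" or a
# "grape" from its weight (grams) and diameter (cm). All values are made up.
raw = [
    ((150.0, 7.0), "apple"), ((170.0, 7.5), "apple"), ((160.0, 7.2), "apple"),
    ((5.0, 1.5), "grape"), ((6.0, 1.8), "grape"), ((4.0, 1.4), "grape"),
    ((155.0, 7.1), "apple"), ((5.5, 1.6), "grape"),
]
train_raw, test_raw = raw[:6], raw[6:]


# Step 2: pre-processing. Rescale each feature using the training range.
def fit_scaler(rows):
    columns = list(zip(*[features for features, _ in rows]))
    return [(min(c), max(c) - min(c)) for c in columns]


def apply_scaler(scaler, rows):
    return [
        (tuple((v - lo) / span for v, (lo, span) in zip(features, scaler)), label)
        for features, label in rows
    ]


scaler = fit_scaler(train_raw)
train = apply_scaler(scaler, train_raw)
test = apply_scaler(scaler, test_raw)


# Step 3: model building. A k-nearest-neighbours classifier.
def predict(train_rows, point, k):
    by_distance = sorted(
        train_rows,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_rows, test_rows, k):
    hits = sum(predict(train_rows, p, k) == label for p, label in test_rows)
    return hits / len(test_rows)


# Step 4: optimize. Try several values of k and keep the best one.
results = {k: accuracy(train, test, k) for k in (1, 3, 5)}
best_k = max(results, key=results.get)

# Step 5: summarize the results.
for k, acc in results.items():
    print(f"k={k}: accuracy={acc:.2f}")
print("best k:", best_k)</code></pre>



<p>With real data, step 5 would usually be a chart rather than a few printed lines, but the overall flow stays the same.</p>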



<p>That&#8217;s it. If you follow these steps, you are through with your ML implementation work.</p>



<p>Now the next point is how to know which library to look into and which language to learn so that the implementation can be hassle-free.</p>



<h2 class="wp-block-heading">Possible Machine learning track </h2>



<ul><li><strong>Choose a programming language</strong>: <a href="https://www.python.org/" target="_blank" rel="noreferrer noopener nofollow">Python</a> or <a href="https://www.r-project.org/about.html" target="_blank" rel="noreferrer noopener nofollow">R programming</a>. I would recommend Python for a beginner, as it&#8217;s easy to follow and many libraries supported by the ML community are written in Python. Apart from this, you should have CRUD skills in <a href="https://www.khanacademy.org/computing/computer-programming/sql" target="_blank" rel="noreferrer noopener nofollow">SQL</a>. Also, you are not required to be an expert programmer to begin with; you will become one as you practice.</li><li>Practice your data processing/wrangling using pandas &amp; NumPy. Also, you should practice with the <a href="https://matplotlib.org/" target="_blank" rel="noreferrer noopener nofollow">Matplotlib</a> library to familiarise yourself with data visualization using various charts.</li><li>Now that you are through with the first two stages, it is time to spread your wings and get your hands dirty with algorithms from the scikit-learn/Keras libraries, or any other of your interest, as per your problem statement. Take your time to work on various small implementations: start with regression-based algorithms, then classification, clustering, and so on. 
Spend some good time practising these, as this will lay the foundation for your enterprise career.</li><li>Finally, it&#8217;s time to move on to the enterprise solutions used by the industry for processing real-time data, like <a href="https://prestodb.io/" target="_blank" rel="noreferrer noopener nofollow">Presto</a>, <a href="https://hive.apache.org/" target="_blank" rel="noreferrer noopener nofollow">Hive</a>, <a href="https://hadoop.apache.org/" target="_blank" rel="noreferrer noopener nofollow">Hadoop</a>, <a href="https://aws.amazon.com/machine-learning/" target="_blank" rel="noreferrer noopener nofollow">AWS ML toolkits</a>, <a href="https://spark.apache.org/" target="_blank" rel="noreferrer noopener nofollow">Spark</a>, etc.</li></ul>



<p>Moreover, apart from everything mentioned above, each cloud service provider has its own service stack to support machine learning within its platform. Depending on which provider you are inclined toward, you will additionally need to learn its platform-dependent tools over and above what we have discussed.</p>



<p>If you have a different view or have something to discuss, feel free to start a discussion thread below. I would love to join in.</p>



<h4 class="wp-block-heading">Who am I to teach you about machine learning?</h4>



<p>Well, I have been working intensively in ML to solve my Ph.D. research problem and have been through various ML projects to test out multiple hypotheses. </p>



<p>Apart from this, I have been mentoring budding researchers working on finding solutions to complex problems in the <a href="https://www.cloudsimtutorials.online/" rel="nofollow noopener" target="_blank">cloud computing domain</a>. </p>



<p>You may read my brief career progress on the <a href="https://anupinder.com/about/" target="_blank" rel="noreferrer noopener">About page</a> or check my <a href="https://www.linkedin.com/in/anupinders/" target="_blank" rel="noreferrer noopener nofollow">LinkedIn</a>. </p>



<p>I look forward to having you in the webinar and to a great discussion.</p>



<p>Cheers!<br>Anupinder</p>
<p>The post <a rel="nofollow" href="https://anupinder.com/machine-learning-let-us-get-started/">Machine Learning- let us get started!</a> appeared first on <a rel="nofollow" href="https://anupinder.com">Anupinder Singh</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
