My Machine Learning Articles on CodeProject
About a week ago I wrote and submitted 2 articles for the Machine Learning and AI Challenge on CodeProject. While I will admit that I put these articles together rather hastily, they do demonstrate many of the core tasks in a machine learning project, such as performing a preliminary review of the data, pre-processing the data, building a machine learning pipeline, and comparing the results of several machine learning algorithms to determine the best model to use.
As usual, I used Anaconda with VS Code, along with scikit-learn, pandas, NumPy, and the other standard tools for an ML project.
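To make the pipeline and comparison steps concrete, here is a minimal sketch of building a pipeline and comparing several algorithms with cross-validation. It assumes scikit-learn's bundled Iris dataset as a stand-in for the articles' data, and the three candidate models are an illustrative choice, not necessarily the ones used in the articles.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset bundled with scikit-learn (assumption: not the articles' data).
X, y = load_iris(return_X_y=True)

# Candidate algorithms to compare; this particular selection is illustrative.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in candidates.items():
    # Scale the features, then fit the classifier; the scaler is refit
    # inside every cross-validation fold, so no validation data leaks in.
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()

# Report the models from best to worst mean accuracy.
for name, mean_accuracy in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {mean_accuracy:.3f}")

best_model = max(results, key=results.get)
print("Best model:", best_model)
```

Wrapping the preprocessing and the model in one pipeline keeps the comparison fair: every candidate sees exactly the same preprocessing, fit only on each fold's training portion.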
Links to Articles
- Create Your First Machine Learning Model to Filter Spam – this shows how I dealt with the “unusual” format of the labeled input data. You will also see an example of using the TfidfVectorizer, which helped produce really good results.
- Use Machine Learning to Determine the Programming Language of Text – I had to knock the dust off my regular expression skills for this one, as the data was in XML (see the snippet below; not pretty, but it got the job done). Additionally, the LabelEncoder was used to encode the categorical labels as integers.
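For the spam article, the core idea can be sketched as a two-step pipeline: TfidfVectorizer turns each message into a vector of TF-IDF term weights, and a classifier learns from those vectors. This is a minimal sketch under assumptions: the six messages are made up, and MultinomialNB is just one common choice of text classifier, not necessarily the one used in the article.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny, made-up training set (assumption: the article used its own labeled data).
messages = [
    "WIN a FREE prize, click now",
    "Free entry to win cash, click here",
    "Claim your free prize today",
    "Are we still meeting for lunch tomorrow?",
    "Please review the project notes before the meeting",
    "Can you send me the project report?",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

spam_filter = Pipeline([
    # Lowercases and tokenizes the text, then weights each term by TF-IDF.
    ("tfidf", TfidfVectorizer()),
    # Naive Bayes classifier (assumed choice); works with non-negative TF-IDF weights.
    ("classifier", MultinomialNB()),
])
spam_filter.fit(messages, labels)

new_messages = ["win a free prize now", "project meeting tomorrow"]
predictions = spam_filter.predict(new_messages)
for text, label in zip(new_messages, predictions):
    print(f"{label}: {text}")
```

Because the vectorizer lives inside the pipeline, the same vocabulary and IDF weights learned during `fit` are reused automatically when new messages are classified.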
Regular Expressions for Parsing Data

(Image: Using Regular Expressions to Parse Data)
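As a rough illustration of the approach, here is a minimal sketch that pulls code snippets and their language out of XML-like text with a regular expression, then encodes the language labels with LabelEncoder. The `<Snippet language="...">` format is hypothetical, not the actual format of the article's dataset. For anything beyond quick-and-dirty parsing, a real XML parser such as Python's `xml.etree.ElementTree` is the more robust option.

```python
import re

from sklearn.preprocessing import LabelEncoder

# Hypothetical input: the element and attribute names are assumptions.
raw_xml = """
<Snippets>
  <Snippet language="Python">print("hello")</Snippet>
  <Snippet language="C#">Console.WriteLine("hello");</Snippet>
  <Snippet language="JavaScript">console.log("hello");</Snippet>
  <Snippet language="Python">import os</Snippet>
</Snippets>
"""

# Group 1 captures the language attribute, group 2 the element body.
# DOTALL lets a body span several lines; the non-greedy .*? stops at the
# first closing tag instead of swallowing the following snippets.
snippet_pattern = re.compile(r'<Snippet language="([^"]+)">(.*?)</Snippet>', re.DOTALL)
pairs = snippet_pattern.findall(raw_xml)

languages = [language for language, _ in pairs]
code_samples = [code.strip() for _, code in pairs]

# LabelEncoder maps each distinct label to an integer (classes are sorted).
encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(languages)

print(list(encoder.classes_))
print(list(encoded_labels))
```

Note that LabelEncoder produces one integer column of class codes; if a one-hot representation is needed, scikit-learn's OneHotEncoder (for features) or LabelBinarizer (for labels) is the tool for that.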
Let me know what you think. Thanks.