Documentation Displayer

What do you get when you mix file hosting APIs, GitHub APIs, PDF converters, and directory comparisons?

A documentation displayer bot, of course!

The Parts Required

In this release I created another bot for the keras repository, this time for documentation. I started from the skeleton of Gabriel’s existing bot and reworked the functionality that the documentation checks did not need. This release had four main components:

Anonymous File Hosting

PyGithub

PDFKit

Filecmp

All of these were needed to build a bot that detects changes to the documentation in a pull request and posts a link to a file hosting service with a PDF of the built documentation.

Building The Bot

PyGithub was the first place to start, as it gave me access to usernames, pull requests, and repository information. Using PyGithub, I was able to clone the user’s pull request into a local directory, clone the master branch of keras, and compare the two docs folders.

Filecmp was needed to compare the directories in both structure and file contents, to make sure the documentation had actually changed. Using Filecmp I was able to get both the unchanged files and the changed files to use in the GitHub comment. Once I knew the documentation had changed, the next step was to build it using mkdocs, which generates an .html file that can then be converted into a PDF.
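A minimal sketch of the comparison step using only the standard-library filecmp module, assuming the master docs and the PR docs have already been cloned into two local folders. The function name compare_docs and its four-list return value are my own choices for illustration, not the bot’s actual code. Because dircmp may compare files shallowly (by their os.stat signature), the sketch re-checks files present on both sides by content with cmpfiles(..., shallow=False).

```python
import filecmp
import os


def compare_docs(old_dir, new_dir, prefix=""):
    """Recursively compare two docs folders.

    Returns (changed, unchanged, added, removed) as lists of paths
    relative to the folders being compared.
    """
    result = filecmp.dircmp(old_dir, new_dir)
    # Re-check files present on both sides by content, not just by os.stat().
    same, different, errors = filecmp.cmpfiles(
        old_dir, new_dir, result.common_files, shallow=False
    )
    changed = [os.path.join(prefix, name) for name in different + errors]
    unchanged = [os.path.join(prefix, name) for name in same]
    added = [os.path.join(prefix, name) for name in result.right_only]
    removed = [os.path.join(prefix, name) for name in result.left_only]
    # Recurse into sub-folders that exist on both sides.
    for sub in result.common_dirs:
        c, u, a, r = compare_docs(
            os.path.join(old_dir, sub),
            os.path.join(new_dir, sub),
            os.path.join(prefix, sub),
        )
        changed += c
        unchanged += u
        added += a
        removed += r
    return changed, unchanged, added, removed
```

With hypothetical folder names, a call such as compare_docs("keras-master/docs", "pr-branch/docs") would tell the bot whether it needs to build and post a PDF at all: only when changed, added or removed is non-empty.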

After using mkdocs to build the documentation, I used PDFKit to convert the HTML from a local file into a PDF. This solution works well, though not perfectly, as some of the original HTML goes missing due to libraries loaded in such as Bootstrap. Nonetheless, it still converts the file appropriately.

Lastly, it was time to host the file temporarily so that the document could be linked within the GitHub comment. Anonymous Files was a perfect solution for this, and allowed me to POST and GET using Python’s requests library.

End Results

Files have officially made their way into the docDisplayer bot’s comments, thanks to the components mentioned above. The bot only runs when it sees a change in the documentation. This was an interesting issue to work on; it took several turns and had me carefully select the tools needed to accomplish this pull request.

Keras botting has been a wild ride.

Open Source Development: A Journey Through Github

In the past 12 months, I’ve had my world revolutionized by open source development and contributions. What started off as a professional option quickly became a strong interest and a passion. Open source offers freedom like no other course, letting you choose your own direction and get involved in opportunities you would not otherwise find.

Where Did It Start?

My open source development began in September of 2018, when I took the first iteration of the open source development course with Dave. I got the opportunity to earn a t-shirt by participating in my first ever programming event, Hacktoberfest, and worked on many cool projects. Most notably, I made a contribution to keras, a machine learning library for Python. Little did I know this minor contribution would be a major stepping stone for my future and would inspire me to pursue further ideas. Another project I worked on was a Discord bot, easily one of the most fun and enjoyable projects I have worked on. These two projects formed the pillars of my interest in open source and led me to where I am now, but it has been a rocky road.

The Journey

Flash forward to January of 2019 and a new semester, coming back for the second open source professional option. A smaller class, more formal presentations on progress, and more freedom in choosing projects had me excited to continue. Throughout this semester I got wrapped up in keras, more specifically in one keras issue that led me on a rollercoaster with several ups and downs, in several different directions. The original issue asked for the files in two keras directories to be checked for relative and absolute imports, producing a warning when the wrong kind was used. I was excited to tackle an interesting issue, but did not know what it had in store for me.

The Importer…

Throughout my releases I have tackled the same issue in a variety of ways, adapting it to use many different tools. The importer has gone through, and interacted with, the following tools:

Pylint and Abstract Syntax Trees (AST)

Travis CI

Pylint custom checkers

Pip package manager (PyPI)

PyGithub

GitHub API

Needless to say, there have been a lot of changes.

My Takeaway

Earlier I mentioned two main projects from the first open source course: the keras documentation and the Discord bot. Both of these formed my foundation by leading me to keras and expanding my interest in bots. One question I often receive as a computer science student is “What do you want to do for work?”, to which I have never had a concrete answer; however, I believe this open source iteration may have changed that. I got to work with bots in a few of my releases, performing different actions and functionality. The use of bots has always intrigued me, and it may be something I continue to pursue. It seems I’ve come full circle: I started with documentation and bots, and I am now finishing up with a GitHub bot made for dealing with documentation.

Bad News First: Keras Bot Development

Update On Last Week

I am a firm believer in the philosophy of bad news first, so with that said, there is no large pull request as promised. I made an error on my side: I had the directories flipped with their respective checks, and only realized this when creating the first pull request. By the time I had sorted this out and pulled the changes to match the keras master, the few existing problems had already been fixed.

With that said, I moved on to fixing my previous PR for the keras bot and implemented some cool features.

Bot Link

In this update, the comment uses summary and details tags in Markdown to display the error messages in drop-down menus, one per file name. Additionally, I updated some of the functionality and reduced the code.

[Screenshot (2019-03-29): the updated bot comment, with error messages in drop-down menus by file name]
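A minimal sketch of how such a comment body could be assembled, assuming the errors are already collected as a dictionary of file names to messages. The function name render_comment and the message text are hypothetical, not the bot’s actual code. Each file becomes one collapsible details element whose summary line is the file name.

```python
def render_comment(errors_by_file):
    """Turn {filename: [message, ...]} into collapsible sections for a GitHub comment."""
    sections = []
    for filename, messages in errors_by_file.items():
        items = "\n".join(f"- {message}" for message in messages)
        # The blank line after </summary> ends the raw HTML block,
        # so the bullet list below it is rendered as Markdown.
        sections.append(
            f"<details>\n<summary>{filename} ({len(messages)})</summary>\n\n"
            f"{items}\n\n</details>"
        )
    return "\n\n".join(sections)


print(render_comment({"keras/activations.py": ["line 3: relative import"]}))
```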

One of the requested changes was to use f-strings in Python, which I was unfamiliar with but quickly learned to appreciate. F-strings substitute values directly inside a string, rather than relying on concatenation and type conversion.

Regular String

[Screenshot (2019-03-29): a message built with string concatenation]

F String

[Screenshot (2019-03-29): the same message built with an f-string]
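Since the comparison is easier to see as text, here is a small sketch of the two styles side by side. The file name and line number are made-up values for illustration.

```python
filename = "keras/activations.py"  # hypothetical values for illustration
line_number = 12

# Regular string: concatenation, with str() needed for the integer.
regular = "Relative import in " + filename + " on line " + str(line_number)

# f-string: the expressions in braces are evaluated and formatted in place.
formatted = f"Relative import in {filename} on line {line_number}"

print(regular == formatted)  # True
```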

What Next?

Moving forward, I looked through the issues in the keras repository for something to work on; however, many issues are created by users and are not “real issues” so to speak. To filter these out, I searched by author, starting with Gabriel, and quickly found one that caught my eye. This issue involves using the bot again, but for a different purpose: documentation changes, which are difficult to see purely from the code in a pull request. With this in mind, the bot’s purpose will be to build the documentation if it was changed, render the HTML as a PDF, and display it in the comments so the changes are easier to see.

I feel confident taking on this issue, as I am now familiar with PyGithub and with handling pull requests and comments. I think it will be a good challenge, as it will involve learning and setting up Docker, which is needed to use athenapdf, the tool that converts HTML to PDF.

Closing Thoughts

While not everything went as planned and I did make an error, I kept my head held high and continued to look for new things to work on and previous fixes to make. I am currently awaiting feedback on the pull request changes, but am hopeful!

Until next time…

from Joshua import Patience

Over the last little while, I have worked on a tool for parsing imports in Python. This tool has gone through several versions and specifications, but this week it finally found its use. This week I decided to get into the belly of the beast and begin hashing away at the improper imports in the keras and tests directories.

Identifying The Problem

The purpose of the tool is to check for absolute and relative imports in a specified directory or file. Thanks to some intuition, my tool does exactly this! The result is a dictionary with file names as keys and the error(s) as values, each telling me the problem and the line it is on.

[Screenshots (2019-03-22): the tool’s output, a dictionary of file names mapped to their import errors]
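A minimal sketch of how a result in that shape can be built, assuming a simple line-based rule. The names check_directory and RELATIVE_IMPORT are hypothetical, and the rule only flags lines that begin with from followed by a dot. Like any line-based scan, it would also flag imports written inside multiline strings.

```python
import os
import re

# Hypothetical rule for illustration: a line starting with "from ." is a relative import.
RELATIVE_IMPORT = re.compile(r"^\s*from\s+\.")


def check_directory(directory):
    """Return {path: [error, ...]} for .py files containing relative imports."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as handle:
                for number, line in enumerate(handle, start=1):
                    if RELATIVE_IMPORT.match(line):
                        # Files without errors never get a key.
                        results.setdefault(path, []).append(
                            f"line {number}: relative import: {line.strip()}"
                        )
    return results
```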

Absolute vs Relative

Anywhere a relative import can be used, an absolute import can be used as well, just as with absolute and relative paths. This idea can be seen as follows:

Absolute

from PackageA import A1
from PackageA.SubA import A2

Relative (from a module inside PackageA)

from . import A1
from .SubA import A2
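To check that an absolute from-import and its relative counterpart really are interchangeable, here is a small self-contained experiment. It writes a throwaway PackageA to a temporary directory (the file contents and the uses_absolute and uses_relative module names are made up for the demo) and imports one module written in each style.

```python
import importlib
import os
import sys
import tempfile

# Throwaway package for the demo (contents are made up).
FILES = {
    "PackageA/__init__.py": "",
    "PackageA/A1.py": "VALUE = 1\n",
    "PackageA/SubA/__init__.py": "",
    "PackageA/SubA/A2.py": "VALUE = 2\n",
    "PackageA/uses_absolute.py": "from PackageA import A1\nfrom PackageA.SubA import A2\n",
    "PackageA/uses_relative.py": "from . import A1\nfrom .SubA import A2\n",
}

with tempfile.TemporaryDirectory() as root:
    for relative_path, text in FILES.items():
        path = os.path.join(root, *relative_path.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(text)

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    absolute = importlib.import_module("PackageA.uses_absolute")
    relative = importlib.import_module("PackageA.uses_relative")
    sys.path.remove(root)

# Both styles bind the very same module objects.
print(absolute.A1 is relative.A1, absolute.A2 is relative.A2)  # True True
```

The relative form only works from inside the package; the absolute form works from anywhere the package is importable, which is one reason a project may prefer one style per directory.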

The Solution

With everything set up and my tool working, I began to pick away at the dictionary full of errors, starting with the absolute imports. There were over 58 files with errors in them, each with at least one error. I began going through the errors, solving them one by one. I have now fixed all import errors in the keras directory, changing relative imports to absolute, with only the tests directory to go.

Closing Thoughts

Given how long the keras directory took, the tests directory will take even longer: it has fewer files, but more errors to fix. This gives me an opportunity to get my hands dirty and touch a number of files in the repository.

Until next time…

Keras Bot Development

This week I got more involved in a new project closely related to the previous one. Gabriel’s keras bot monitors pull requests on the GitHub repo, and I looked into extending this bot to parse and warn about improper imports. I learned to work with PyGithub, a powerful tool for interacting with GitHub repos through Python, as well as some extended git commands.

Where To Begin?

To see results from this, I knew I needed to start by creating a repo solely for testing. I created the repo, followed by the first pull request, making sure to use an actual keras PR so the structure would be identical to the real thing.

I originally tried to implement my tool in its current state, but faced a slight problem. PyGithub is powerful but has limitations, one being the inability to access the PR files. This is where I first got stuck, so I reached out to Gabriel. Thankfully, Gabriel provided a workaround: cloning the repo using the username and repo name. This worked like a charm and got around the inability to access PR files, but created another problem. In any open source project, pull requests come from different branches; this is fundamentally straightforward, but difficult to handle in PyGithub.

Example of problem

The user will have the directory in their repos, but the branch defaults to master. PyGithub has two methods related to branches: get_branches() and get_branch(). The latter requires the branch name to get that branch, which means already knowing it. The former provides a list of all branches on the repo, which again is not very helpful.

Silver lining

Everything else works! I said earlier that I learned more about git commands this week, more specifically the clone command. I learned about the --single-branch flag and the need to use it to get the correct branch of the PR. For the time being I have the branch hardcoded for testing, and it works! Results are collected properly, and I am waiting to hear how I should add them to the message sent to the user who submitted the PR.
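One possible way around the hardcoded branch, sketched under the assumption that PyGithub’s PullRequest object exposes the source branch as pr.head.ref and the contributor’s fork as pr.head.repo (with its clone_url). The helper names are hypothetical, and the example uses a stand-in object rather than a real API call, so nothing is cloned when it runs.

```python
import subprocess
from types import SimpleNamespace


def clone_command(pr, destination):
    """Build a git command that clones only the branch the PR was opened from.

    `pr` is expected to look like a PyGithub PullRequest: pr.head.ref is the
    branch name and pr.head.repo.clone_url is the URL of the fork.
    """
    return [
        "git", "clone",
        "--single-branch",
        "--branch", pr.head.ref,
        pr.head.repo.clone_url,
        destination,
    ]


def clone_pull_request(pr, destination):
    # check=True raises CalledProcessError if git fails.
    subprocess.run(clone_command(pr, destination), check=True)


# Stand-in for: Github(token).get_repo("keras-team/keras").get_pull(number)
fake_pr = SimpleNamespace(
    head=SimpleNamespace(
        ref="fix-imports",
        repo=SimpleNamespace(clone_url="https://github.com/someone/keras.git"),
    )
)
print(" ".join(clone_command(fake_pr, "pr_clone")))
```

If the fork behind a PR has been deleted, pr.head.repo may be missing, so a real bot would want to check for that before cloning.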

Wrapping up

To wrap up this release, I created my pull request, the inaugural PR to the keras-bot repository. In it I described the changes: the additional script importTester.py and the updates I made to the existing file. The major changes are testing the imports and adding the error messages to the user message.

I didn’t want to bombard the user with walls of error text, so I limited the message to 10 relative and 10 absolute errors, along with the total number of files with errors.

[Screenshot (2019-03-15): the bot’s message with the limited list of import errors]
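A minimal sketch of that limit, assuming each error is a (filename, message) pair. The function name summarize and the exact wording of the lines are hypothetical; the point is that the file count is computed from every error, while only the first ten of each kind are shown.

```python
def summarize(relative_errors, absolute_errors, limit=10):
    """Build the comment text, showing at most `limit` errors per category.

    Each error is a (filename, message) pair.
    """
    files_with_errors = {filename for filename, _ in relative_errors + absolute_errors}
    lines = [f"Files with import errors: {len(files_with_errors)}"]
    for title, errors in (("Relative", relative_errors), ("Absolute", absolute_errors)):
        shown = errors[:limit]
        lines.append(f"{title} imports: showing {len(shown)} of {len(errors)}")
        lines.extend(f"- {filename}: {message}" for filename, message in shown)
    return "\n".join(lines)
```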

Closing thoughts

PyGithub is very cool and fun to work with, but it does not cover all the functionality this tool needs, so for the time being I am still looking for a workaround. This week gave me new insight into thinking about alternative flows, as a few times I got stuck but was able to change paths. I would love to see this tool through and incorporate it into keras and the keras bot; I just need to figure out the pesky branch!

Until next time…

Riding Solo

This week I’ve spread my wings and moved my import-parsing tool into its own repository. The previously unnamed parser has now found its home on GitHub as ImPyParser. The first step was creating the repo and filling in a simple README describing the project.

Getting Started

With the repo up and the README written, it was time to get my code into the repo, but I wanted to carefully plan how to do this. With my previous feedback about wanting a more robust tool in mind, I began to brainstorm. Given the power of Python and the tools built around it, I decided to wrap my tool in a pip package.

Pip is the package installer for Python. It installs packages from the Python Package Index and is very easy to use, the basis being pip install [package]. Having used pip many times, I wanted to create a package of my own, so I began researching and developing one.

Preparing Pip

Step 1 was to register with the Python Package Index (PyPI) and create an account, so I would be able to upload and manage my custom tool, ImPyParser.

After registering the account, I downloaded the required tools and created the executable that runs when my package is called. This is a Python script that receives parameters and uses them to call pylint, all from one command and one package. This process included creating and modifying a setup.py file, which defines the project’s name, version number, author, description, scripts, etc.
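A rough sketch of what such a wrapper could look like, assuming the checks are the absolute and relative pylint plugins used with --load-plugins elsewhere in this blog, and that those plugins are importable (for example, on PYTHONPATH). The program name and function names are hypothetical, not ImPyParser’s actual code.

```python
import argparse
import subprocess
import sys


def build_command(argv=None):
    """Turn the package's command-line arguments into a pylint invocation."""
    parser = argparse.ArgumentParser(
        prog="impyparser", description="Check a project's imports with pylint."
    )
    parser.add_argument("paths", nargs="+", help="files or directories to check")
    args = parser.parse_args(argv)
    # "absolute" and "relative" are the plugin module names passed to pylint;
    # they must be importable for pylint to load them.
    return [sys.executable, "-m", "pylint", *args.paths,
            "--load-plugins=absolute,relative"]


def main(argv=None):
    # Return pylint's exit code so a console-script wrapper can pass it on.
    return subprocess.call(build_command(argv))
```

In setup.py, setuptools can expose main as a command through entry_points={"console_scripts": ["impyparser=impyparser.cli:main"]}, where the module path impyparser.cli is a hypothetical layout.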

Next I added a license, and then uploaded the package to PyPI.

Seeing Results

I was happy to see my project on PyPI with its version history, descriptions and all; it was a cool experience. Another interesting moment was being able to pip install my own package. Something I had done numerous times with other projects and packages was now my own, and seeing it download and install successfully was rewarding.

[Screenshot (2019-03-06): installing ImPyParser with pip]

Circling Back

Picking up where I left off, I went back to Keras, back to the same issue, and back to Gabriel to talk about what I had created and get some ideas.

[Screenshot (2019-03-06): my message to Gabriel about the new package]

I let him know about the main change, the tool being more robust, as well as where he could find the project on GitHub and PyPI, and eagerly awaited a response.

Gabriel was quick to respond and gave me an informative suggestion, with room to progress and continue integrating this into the keras project.

[Screenshot (2019-03-06): Gabriel’s reply with his suggestion]

I plan on doing two main things right off the bat: making sure my code is reliable and polished, and beginning to look into Gabriel’s bot and PyGithub.

Closing Thoughts

I believe it was good for me to branch out and move this project forward independently, as well as learn about PyPI and registering custom packages. Moving forward, I have enough on my plate to make progress and try to integrate the tool into the bot to warn about invalid imports in GitHub pull requests. I am looking forward to what next week has in store for me.

Until next time…

A Series of Unfortunate Events (Kind of)

Goose Chase

This week I went on a goose chase, a few of them if we’re being honest. Picking up from last week, I continued to look into pylint for automating the testing process, while staying hopeful for a response from Gabriel with further suggestions. My goose chase began with another look into AST and the possibility of getting the sweet, sweet . in the imports I so desired. This goose chase once again left me empty-handed, as I was still unable to find a solution.

Moving On

With this roadblock ahead, I shifted my attention to pylint and began experimenting with integrating it into my script, only to be led on another goose chase. One of the larger complications of using pylint (or Python, for that matter) on Windows is environment variables. Pylint requires custom plugins in order to run additional checkers, and those plugins are made available by adding them to the PYTHONPATH environment variable. When a module cannot be found on the PYTHONPATH, this message is displayed:

[Screenshot: pylint’s error for a plugin module it cannot find]

Needless to say, I saw a lot of this.

After sorting this out and getting my custom plugin onto my PYTHONPATH, I was ready to go. The link Gabriel gave me was an excellent guide to creating a custom checker for pylint, but it too had a large flaw. The example uses AST, which I am unable to use; this itself was not the problem, but the way the imports are checked was. Essentially, the checker function uses an AST node instance to check the imports, which is the only way to access the current file in pylint (as far as I am aware).

Pylint Problems

With AST even further out of the question, I put all my faith in pylint, which was quick to disappoint. Pylint is run from the command line and takes the file to test as one of its parameters, which makes it impossible to get the current file within a checker (located in a separate file). Without a way to grab or read the current file, I cannot even begin to test my script using pylint.

GitHub Guidance

Still awaiting a response from Gabriel, I reached out again to “nudge” him and ask whether he had seen my comment.

[Screenshot: Gabriel’s reply]

This was an answer I dreaded, but got nonetheless.

Where To Go From Here?

I was committed to bringing something to the table and finding some hope in this dying light, and after some digging I found it. It turns out I had been looking in the wrong place: at pylint AST checkers rather than raw checkers. Raw checkers gave me exactly what I needed, a way to access the current line and parse imports. By loading my plugins into pylint and running it on the keras directory, I was able to test it with the following command:

pylint keras/activations.py --load-plugins=absolute,relative

Here absolute and relative are my loaded plugins, running on a single keras file. For the time being I have not figured out how to disable all warnings except mine, and the keras directory has a lot of pylint warnings and errors, but these are the results for activations.py:

[Screenshot (2019-02-22): pylint output for keras/activations.py]

The top 3 warnings, marked with a W, are from my testers.
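Pylint’s --disable=all option, followed by --enable with a list of message names or IDs, is one way to silence everything except the custom checks. A minimal sketch that builds such a command; the function name is hypothetical and the message names are placeholders, since the real ones are whatever the absolute and relative checkers declare.

```python
def pylint_command(target, plugins, enabled_messages):
    """Build a pylint command that reports only the given messages."""
    return [
        "pylint",
        target,
        "--load-plugins=" + ",".join(plugins),
        # Turn every check off, then re-enable only the custom ones.
        "--disable=all",
        "--enable=" + ",".join(enabled_messages),
    ]


# The message names are placeholders for whatever the custom checkers declare.
print(" ".join(pylint_command("keras/activations.py", ["absolute", "relative"],
                              ["absolute-import-found", "relative-import-found"])))
```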

Closing thoughts

This was a long and winding road, but I believe I continued to make progress. From here I will shift my focus to another issue until there is more feedback on this one and a stronger direction to go in. Pylint really kicked my butt this week, but I am glad I was able to learn it, lots of it, even parts I didn’t really need. My next issue(s) have yet to be chosen, but will be from the keras project, as I have enjoyed working in it so far.

Until next time…

Keras Communication

The Discussion

When I last wrote, I spoke about awaiting a response and looking into the ideas suggested by Gabriel de Marmiesse, and I ran into the same issue I had seen before with AST parsing imports. I explained this to Gabriel by tagging him in a comment on the issue, but I knew that the last time I tried that, it wasn’t seen. I can imagine it’s difficult to see every comment on every issue and moderate it all. With that said, I went to Gabriel’s GitHub profile to see if I could find another way to reach him.

[Screenshot (2019-02-15): Gabriel’s GitHub profile]

Gabriel has an email attached to his profile, so in case he did not see the comment, I sent him an email explaining the problem and mentioning that I had previously reached out on GitHub as well.

Gabriel was quick to respond, and as the day progressed we commented back and forth discussing the problem. Ultimately I provided a short code snippet in the comment to demonstrate the issue, and I am currently awaiting a response.

On The Side

While waiting for answers and suggestions from Gabriel, I looked into the other half of his recommendation: pylint custom tests. I found the process surprisingly easy, as it works similarly to other Python testing programs, and I wanted to integrate my existing tool (albeit not being used) into pylint as a test and as a way to play around with pylint.

Pylint works through the command line and runs several tests on the files and/or directories provided. These tests can be toggled on and off, and custom tests can be added as discussed above.

[Screenshot: pylint console output]

In this example we can see pylint looking for a config file, which determines which tests to run on the given file path, and then reporting all of the exceptions along with their names and line numbers.

Closing Thoughts

I am continuing to hash this out with Gabriel and the keras team to reach a solution that satisfies their requirements and solves the issue. I am actively searching for alternative solutions and will not be shy about sharing them with Gabriel if I feel they are adequate.

Until next time…

Keras Continued

Release 0.2

As release 0.2 comes to an end, I have created my second pull request for keras, continuing my work on the import tool. Last week I discussed my problem of reading only keras-relevant imports and implemented the ignore-terms array strategy. This strategy brought me one step closer to an optimal result, but the tool still needed some improvements.

Advancements Needed

One oversight on my end was not ignoring comments in the files being read. In Python, comments follow this syntax:

[Screenshot (2019-02-08): a single-line # comment and a multiline triple-quoted comment]

The first case is easy to ignore: check whether a line starts with # and ignore everything following it on that line. The latter, however, is a bit more complex. In a few of the files in keras, there are multiline comments containing example scripts, and some of these contain import statements, as seen below.

[Screenshot (2019-02-08): a multiline comment in keras containing example import statements]
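A minimal, deliberately naive sketch of skipping both kinds of comments before looking for imports. The function name code_lines is hypothetical, and it only recognizes triple quotes at the start of a line, so strings like x = """...""" or prefixed strings are not handled.

```python
def code_lines(source):
    """Yield (line_number, line) for lines outside comments and triple-quoted blocks.

    Naive by design: it only notices triple quotes at the start of a line
    and does not handle string prefixes such as r or b.
    """
    closing = None  # the delimiter that ends the block we are inside, if any
    for number, raw in enumerate(source.splitlines(), start=1):
        line = raw.strip()
        if closing:
            if closing in line:
                closing = None  # the block ends on this line
            continue
        if line.startswith(('"""', "'''")):
            delimiter = line[:3]
            # A block that opens and closes on the same line needs no state.
            if line.count(delimiter) < 2:
                closing = delimiter
            continue
        if not line or line.startswith("#"):
            continue
        yield number, line


sample = 'import os\n# from . import hidden\n"""Example:\nfrom keras.layers import Dense\n"""\nfrom . import backend\n'
print(list(code_lines(sample)))  # [(1, 'import os'), (6, 'from . import backend')]
```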

I had to ignore these multiline comments in order to get clean and correct results. Following this, I took to GitHub to create my pull request in an effort to:

  1. Show the work I have currently done for this tool, and the ideas I have brought forward
  2. Draw attention to the (relatively dead) issue I worked on, in order to get some ideas on how to tackle the problems I am facing

Additionally, the issue requested that this tool be integrated with Travis CI; however, I did not want to invest my time and effort into Travis before I knew I was on the right track, since I might end up scrapping everything anyway. The script works effectively and can be integrated with Travis if it is acceptable.

The results

It seems the efforts I made in my pull request worked, as I received a comment on the post from gabrieldemarmiesse, the main moderator of the keras project, giving me advice on what to work on. It looks like I am going back to stage one: looking into AST (abstract syntax trees) again, as well as pylint, this time creating custom tests for pylint. This seems like the right approach, and it was where I began originally, but I did not know about pylint custom tests and faced problems with AST.

Next steps

The comment is still fresh, so I am beginning to look into the suggested tools and seeing how I can complete this, or at least begin working on it. I am proud of the progress I made over this release, and even if my work is not merged, the idea behind it and the know-how to achieve it are just as viable and effective. Open source is a big world, but even the little steps count.

Until next time…

Continuing in Keras

Following last week, I am continuing to bash through problems in Keras and get more involved. I spoke at length about the importance of making an effort to engage with an open source project’s community, and this week that is exactly what I did. I reached out on the Keras Slack to ask about contribution opportunities and where I could begin, but unfortunately the channel seems relatively dead in that respect. Nevertheless, I found an interesting issue to work on for this release.

The Issue

Issue 12110, to be exact. This issue deals with import statements in Python, the difference between relative and absolute imports, and where to use each. The issue explains that all files in the keras/keras directory should use only relative imports for keras functions and classes, while another directory, keras/tests, should use only absolute imports for keras functions and classes. The problem is relatively (pun intended) straightforward; however, I was surprised that no tool for checking this currently exists, though there are libraries that can assist the process.

The kicker, and the most challenging part for me, will be creating a tool that can check for this and running that tool in Travis CI, as the issue requests. I have no experience with Travis CI, but that’s what programming and open source are about! I look forward to learning more about Travis CI and integrating my tests into it.

Strategic Planning

Going into this, I knew the right approach would not be to read every line of every file in the respective directories looking for imports, as there must be an easier way and/or a library that can do this. This sparked my search, which led me to tools such as Pep8 (now pycodestyle), Flake8, and Pylint. These tools can all scan files for different things, but none of them really did what I needed when it came to checking imports.

Then I found AST (Abstract Syntax Trees), a Python library that would let me parse import statements with ease and analyze the parsed imports.

After working with AST, I found it was so close to being the all-in-one library I needed, but it suffered one fatal flaw: it would not return names beginning with ., which is the main thing I needed.
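For reference, the standard-library ast module does keep this information, just not in the name itself: for a from-import, ImportFrom.module holds the name without the leading dots (None for a bare from . import x), and ImportFrom.level counts the dots. Because ast works on the parsed syntax tree, imports written in comments or inside strings are not reported. A minimal sketch with a hypothetical helper name:

```python
import ast


def relative_imports(source):
    """Return (line_number, dotted_name) for every relative from-import."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # level is 0 for absolute imports and counts the leading dots otherwise.
        if isinstance(node, ast.ImportFrom) and node.level > 0:
            found.append((node.lineno, "." * node.level + (node.module or "")))
    return found


sample = (
    "import os\n"
    '"""Example:\nfrom . import hidden\n"""\n'
    "# from . import commented\n"
    "from . import backend\n"
    "from ..utils import generic_utils\n"
    "from keras import layers\n"
)
print(relative_imports(sample))  # [(6, '.'), (7, '..utils')]
```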

The Backup Plan

When all else failed, I decided to do the work on my own and began hashing through parsing the import statements, determining which are relative and which are absolute.

The problem I ran into after parsing imports was that the issue specifically asked to check that keras classes and functions use relative or absolute imports in the respective directories, but this does not mean ALL imports must follow that rule. I have reached out on GitHub to ask for any solutions or recommendations, but for the time being I am using an ignore list of library names.

[Screenshot (2019-02-01): the ignore list of library names]
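A minimal sketch of the ignore-list idea applied to a single line. The library names in IGNORED and the function name classify are hypothetical examples, not the list from the screenshot; anything whose top-level name is on the list is skipped, and relative imports are always reported.

```python
import re

# Hypothetical ignore list: libraries whose import style the issue does not cover.
IGNORED = {"numpy", "scipy", "six", "os", "sys"}

FROM_IMPORT = re.compile(r"^\s*from\s+([\w.]+)\s+import\b")
PLAIN_IMPORT = re.compile(r"^\s*import\s+([\w.]+)")


def classify(line):
    """Return "relative", "absolute", or None (not an import, or ignored)."""
    match = FROM_IMPORT.match(line) or PLAIN_IMPORT.match(line)
    if match is None:
        return None
    module = match.group(1)
    if module.startswith("."):
        return "relative"
    if module.split(".")[0] in IGNORED:
        return None
    return "absolute"
```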

Action Shots

temp/test.py

[Screenshot (2019-02-01): contents of temp/test.py]

checkImports(dir) yields

[Screenshot (2019-02-01): output of checkImports(dir)]

Next steps

The next steps are figuring out the problem mentioned above and integrating this tool into Travis CI. I am looking forward to discussing a workaround with others and continuing to brainstorm further on the issue.

Until next time…