So last week I overheard one of our developers’ discussion with the account manager who happens to be a tech person as well. They were talking about errors encountered by a client’s website written in Python Django. The developer reported that the site just began crashing whenever this or that script starts to write things to the website’s log file and they were expecting that the user under which the site runs has not enough permission to perform logging. It was reasonable since the whole website was moved from one hosting provider to another and it might have been encountering configuration issues. I ignored it. I was preparing for a prototype presentation that afternoon.
That same developer works for my team and even joined me in the presentation. When we got back to the office, she said that she would have to stay a little longer to work on the issue she was discussing with the account manager earlier. That’s when I offered my help. It was Friday afternoon. Although I still got a lot in my task list, I knew that I better start on them the following week.
I asked her to describe to me how the site runs. She said that the site has this main script that runs all the other scripts. The other scripts perform the logging. I asked her to show me a sample of the logs and she showed me what was on her screen. I was expecting a log meant for audit trails but the logs appeared to be those default logs for debugging the website. I could have argued that it should no longer be there in production but I let it go. She was doing maintenance work anyway and debug logs can be helpful. I then asked her if everything was working before. She said yes. I told her it should have nothing to do with permissions then unless the hosting provider messed up with user permissions in the server. She seemed a bit puzzled of what I was trying to point out.
Log files are developers’ best friends. When I was still a student and was developing using Windows’ classic notepad and was not yet acquainted to debugging tools, I referred to logs for errors and “watched” variables. I tended to log everything when frustrated. It was helping me; what could go wrong? I left snippets of code meant to log things when I submitted my works just so I can have a ready reference once something goes wrong. Only when I started working did I realize the problem it could introduce if not done properly.
Programming languages and frameworks are designed to perform things fast. Show a tech person a website that takes a long time to load in the browser for whatever reason be it internet speed, architecture, etc. and somebody will raise one of their eyebrows. This same capability of frameworks to perform processes fast makes a lot of developers forget to properly handle logging.
Concurrency. This can be the number one problem if logging is mishandled. And the annoying thing about it is that you know as a developer that you should be wary about concurrency issues but you missed this one about concurrent log file access because you were treating logging as a small thing. And to make it worse, your website will behave as expected as long as one script finishes logging to the file before another starts writing its own logs; something that is very possible until the website gets large traffic.
I told our developer that Python is a very fast language and that’s probably why the website did okay for years with all the logging that was happening. But now that it’s handling a lot of requests, it’s now encountering concurrent file access issues. For a quick test and patch I suggested that she use different log files and see if the website will still encounter the issue. She reported that it worked. I asked her if it was really necessary to put everything in a single log file but she said she would have to confirm with the client first. I then just told her that if it’s really necessary, she should create a handler that stores all logs in memory and then flushes it in the file after about one to three seconds. There will be a delay but no logs should be lost and there should no more of the IO exceptions that the site is currently encountering. I will find out later. -aB
Featured Image by: ~no-reaction