Tech TIL

TIL: How to live-rotate PostgreSQL credentials

OK, I didn’t actually learn this today, but it wasn’t that long ago.

Postgres creds rotation is straightforward, with the exception that the PG maintainers decided in recent years that words don’t mean anything when designing their identity model. “Users” and “Groups” used to exist in PG, but were replaced in version 8.1 with the “Role” construct.

Here’s a map to translate PG identities to a model that will make sense for anyone who is familiar with literally any other identity system.

Postgres                      Literally anything else
--------                      -----------------------
Role (with LOGIN)             User
Role (without LOGIN)          Group
Membership via IN ROLE        Group membership

Now that we’ve established this nonsense, here’s a way of handling live creds rotation.

CREATE ROLE user_group; -- Create a role, give it appropriate grants.

CREATE ROLE user_blue WITH ENCRYPTED PASSWORD 'REPLACE ME' IN ROLE user_group LOGIN; -- The currently active credentials.

CREATE ROLE user_green WITH ENCRYPTED PASSWORD 'REPLACE ME AS WELL' IN ROLE user_group NOLOGIN; -- This one isn't being used yet, so disable the login.

That gets you prepped. When you’re ready to flip things:

ALTER USER user_green WITH PASSWORD 'new_password' LOGIN;

Update the creds wherever else they need updating, restart processes, confirm everything is using the new credentials, etc. Then

ALTER USER user_blue WITH PASSWORD 'new_password_2' NOLOGIN;

Easy, peasy.
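If you script the flip, it helps to keep the two stages explicit. A minimal Python sketch, assuming the user_blue/user_green pair above (rotation_steps is a hypothetical helper of mine, not anything Postgres ships):

```python
def rotation_steps(active_role, new_password):
    """Return the SQL that flips login from the active role to its standby.

    active_role is whichever of user_blue / user_green currently holds LOGIN.
    """
    standby = "user_green" if active_role == "user_blue" else "user_blue"
    return [
        # Stage 1: give the standby a fresh password and enable login.
        f"ALTER USER {standby} WITH PASSWORD '{new_password}' LOGIN;",
        # Stage 2 (run only after apps are updated, restarted, and verified):
        f"ALTER USER {active_role} NOLOGIN;",
    ]

for stmt in rotation_steps("user_blue", "new_password"):
    print(stmt)
```

The point of splitting the stages is that stage 2 only runs once you’ve confirmed nothing still connects as the old role.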


TIL: How to exclude specific Boto errors

Some background:

I previously wrote a Python Lambda function that copies AWS RDS snapshots to a different region. It had been working for months but recently started throwing this error:

An error occurred (DBSnapshotAlreadyExists) when calling the CopyDBSnapshot operation: Cannot create the snapshot because a snapshot with the identifier copy-snap-of-TIMESTAMP-DB_NAME already exists.

Thinking this might be due to some timestamp shenanigans, I looked at the CloudWatch Events trigger for the Lambda and saw that there were two triggers instead of the original one that I set up. Both were scheduled for the same time. I deleted the new one and waited until the next day to see if the error recurred, which it did.

Looking through the CloudWatch errors, even though the second trigger was gone, the Lambda was still trying to execute twice. I’ve filed a support ticket with AWS, but in the meantime I needed to silence the false positives to keep people from getting paged.

The error handling had been done as:

except botocore.exceptions.ClientError as e:
    raise Exception("Could not issue copy command: %s" % e)

Initially, I tried:

except botocore.exceptions.ClientError as e:
    if 'DBSnapshotAlreadyExists' not in e:
        raise Exception("Could not issue copy command: %s" % e)

That blows up with a TypeError, because the exception object itself doesn’t support membership tests.

Instead, I had to do:

except botocore.exceptions.ClientError as e:
    # Swallow the duplicate-snapshot error; re-raise anything else.
    if e.response['Error']['Code'] != 'DBSnapshotAlreadyExists':
        raise Exception("Could not issue copy command: %s" % e)

Which works.
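Since e.response is just a nested dict, the check is easy to exercise without touching AWS. A stdlib-only sketch (should_ignore and the sample dict are my own illustrative names, mirroring the structure botocore attaches to a ClientError):

```python
def should_ignore(error_response):
    """True when the response dict is the benign duplicate-snapshot error."""
    return error_response.get('Error', {}).get('Code') == 'DBSnapshotAlreadyExists'

# Fabricated sample with the same shape as e.response from botocore.
sample = {'Error': {'Code': 'DBSnapshotAlreadyExists',
                    'Message': 'snapshot already exists'}}

print(should_ignore(sample))                              # True
print(should_ignore({'Error': {'Code': 'Throttling'}}))   # False
```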


TIL: How to use NumPy

I’ve been trying to flesh out my Python knowledge and learn more about machine learning (ML) in the process. Most of my day-to-day Python use is focused on text manipulation, API calls, and JSON parsing, so leveling up on ML is more math (specifically stat-related) than I’m used to.

Today I played around with the NumPy Python package a bit and figured out some simple things.

For example, if I wanted to multiply the numbers in two lists with vanilla Python, like this:

a = [1, 2, 3, 4] 
b = [4, 3, 2, 1]

print(a * b)

I’d get TypeError: can’t multiply sequence by non-int of type ‘list’ . I’d have to write something to iterate through each list.
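The iterate-it-yourself version looks something like this, pairing elements with zip:

```python
a = [1, 2, 3, 4]
b = [4, 3, 2, 1]

# Multiply corresponding elements by pairing the lists up manually.
products = []
for x, y in zip(a, b):
    products.append(x * y)

print(products)  # [4, 6, 6, 4]
```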

NumPy, on the other hand, can handle this like a champ. And this is probably the simplest thing you could use it for.

import numpy

a = [1, 2, 3, 4]
b = [4, 3, 2, 1]

new_a = numpy.array(a)
new_b = numpy.array(b)

print(new_a * new_b)

> [4 6 6 4]

NumPy really shines when you start dealing with multidimensional lists and stat work.


numpy.array([a, b])

> array([[1, 2, 3, 4],
         [4, 3, 2, 1]])

And then it’s just turtles all the way down. You can slice rows and columns, calculate standard deviations, and so on. It’s a handy Python package that I knew literally nothing about prior to today and a nice tool to add to the toolbox.
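For example, with the 2-D array above you can slice columns and compute stats directly (a quick sketch):

```python
import numpy

c = numpy.array([[1, 2, 3, 4],
                 [4, 3, 2, 1]])

print(c[:, 0])         # first column of each row: [1 4]
print(c.mean(axis=0))  # per-column means: [2.5 2.5 2.5 2.5]
print(c.std())         # standard deviation across all eight elements
```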


TIL: How to use list comprehensions in Python

In the past, if I wanted to make a new list by pulling values out of an existing list based on a condition I would have done something like:

def listItems():
    a = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
    new = []
    for num in a:
        if num % 2 != 0:
            new.append(num)
    print(new)

But I figured out that list comprehensions can dramatically compress these functions while still maintaining readability. Here’s an example of a list comprehension that would render the same output as the expanded function above:

def listComp():
    a = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
    new = [num for num in a if num % 2 != 0]
    print(new)


The syntax is a little weird at first (“num for num in a…”) because the leading num is the expression being collected, not part of the loop. It makes more sense when the expression is something richer, like a tuple: “(num, num * 2) for num in a…”.
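To make the tuple case concrete, here’s a quick sketch that pairs each number with whether it’s odd:

```python
a = [1, 4, 9, 16, 25]

# The leading expression can be anything, including a tuple.
pairs = [(num, num % 2 != 0) for num in a]
print(pairs)  # [(1, True), (4, False), (9, True), (16, False), (25, True)]
```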



TIL: How to pass MySQL queries as variables in Bash

Note: I also learned how to handle query blocks with Bash here documents. Most of what I’ve done in the past with MySQL and Bash has been limited to single-line selects, so Yay!, learning.



# -N skips the column-name header so only the value is captured.
user_id=$(mysql -N -u"$db_user" -p"$db_pass" -h"$db_server" <<GET_USER_ID
USE main;
SELECT user_id FROM users WHERE username="dave";
GET_USER_ID
)

echo "Dave's User ID is $user_id"
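Here documents aren’t mysql-specific; any command that reads stdin works the same way. A tiny sketch with cat (purely illustrative) shows the shape without needing a database:

```shell
# Capture multi-line input via a here document, same pattern as above
# but with cat so it runs anywhere.
lines=$(cat <<GET_LINES
first line
second line
GET_LINES
)
echo "$lines"
```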

It’s the small things in life…


TIL: How to disable core dumps on AWS Linux

I ran across a situation where a previous sysadmin had enabled core dumps on a server, and they were filling up the root volume. The dumps weren’t needed, so I decided to disable them. The problem is, I’ve never really dealt with core dumps because I’ve never had to, so I had to do some Googling.

Here’s the result:
# Only tested with AWS Linux but should work on RHEL, CentOS, Fedora, etc.

echo '*     hard    core    0' >> /etc/security/limits.conf
echo 'fs.suid_dumpable = 0' >> /etc/sysctl.conf
sysctl -p

The limits.conf entry disables core dump creation for all users (a hard limit of 0 on core file size), the sysctl setting stops setuid binaries from generating core files, and sysctl -p applies the changes to the running kernel.


TIL: How to get RedShift to access S3 buckets in a different region

While trying to get a Spark EMR job running, I encountered this error on a Spark step that copied data from RedShift to S3.

error: S3ServiceException:The bucket you are attempting to access must be addressed using the specified endpoint.

I’ve seen issues in the past with S3 buckets outside of us-east-1 needing to be targeted with region-specific endpoint URLs for REST access, but had not seen anything similar for s3:// targeted buckets.

This got me looking at Hadoop file system references, none of which were helpful because EMR rolls its own proprietary file system for Hadoop S3 access. So Hadoop’s recommended s3a:// (which is fast and resilient, and supports self-discovering cross-region S3 access!) does not work on EMR. Your only option is s3://, which appears to be region-dumb.

The fix turns out to be simple: you just have to pass the bucket’s region to the Spark step as a separate argument (e.g. us-west-2).

… simple, but annoying, because the steps worked in a pre-prod environment (in a different region), so it wasn’t immediately apparent what was causing the failure, which was buried in the logs.


TIL: How to get JBoss AS to gracefully handle database server failovers

Today I learned how to get JBoss AS (Wildfly) to not lose its mind during a database server failover event (as occurs in the cloud quite often).

The config is simple, but finding comprehensive documentation is a bit challenging since most current JBoss docs require a RedHat subscription. So you’re stuck piecing things together from multiple sites that contain tidbits of what’s needed.

Note: This is for a DBaaS scenario (think AWS RDS or Azure SQL Database), where the DB load balancing is done downstream of your app server. If you’re specifying multiple servers per connection, you’ll have to do some Googling of your own.

Otherwise, you’ve probably got a datasource (DS) connection defined in standalone.xml (or elsewhere depending on your deployment) that looks sort of like this:

<datasource jndi-name="java:jboss/datasources/defaultDS" enabled="true" use-java-context="true" pool-name="defaultDS" use-ccm="true">
    <connection-url>jdbc:mysql://db.example.com:3306/mydb</connection-url>
    <driver>mysql</driver>
</datasource>

Adding these options makes JBoss handle failovers a bit better (the connection details shown are placeholders):

<datasource jndi-name="java:jboss/datasources/defaultDS" enabled="true" use-java-context="true" pool-name="defaultDS" use-ccm="true">
    <connection-url>jdbc:mysql://db.example.com:3306/mydb?autoReconnect=true</connection-url>
    <driver>mysql</driver>
    <security>
        <user-name>dbuser</user-name>
        <password>dbpass</password>
    </security>
    <validation>
        <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
        <background-validation>true</background-validation>
    </validation>
    <pool>
        <flush-strategy>IdleConnections</flush-strategy>
    </pool>
</datasource>

Check out lines 2 and 8-14.

On line 2 we’ve added the autoReconnect=true option to the connection-url. This does exactly what it says. If a database connection attempt fails, JBoss will attempt to re-establish the connection instead of sulking in a corner like it does by default. But it needs a way to know that it should reconnect…

On lines 8-11, we’ve added a connection validation. I believe some DS drivers handle this on their own, but the MySQL JDBC drivers I’ve tested appear not to. This seems to be the standard workaround from what I could find but does have the downside of issuing wasteful queries on the DB. The “background-validation” setting helps a little by issuing the validation checks in a separate thread.

This section should force JBoss to drop dead DB connections instead of letting them gum up the pipes.

Lines 12-14 help with the same problem. By default, JBoss is supposed to flush stale connections (that’s what the docs say at least), but this doesn’t seem to always happen in practice. Using IdleConnections should take care of any failed connections that aren’t getting flushed or EntirePool can be used if you want to be really aggressive.