Stratum-1 NTP Server & ntppool.org

It's about time

What is ntppool.org? From the website:
"The pool.ntp.org project is a big virtual cluster of timeservers providing reliable easy to use NTP service for millions of clients.
The pool is being used by hundreds of millions of systems around the world. It's the default "time server" for most of the major Linux distributions and many networked appliances (see information for vendors).
Because of the large number of users we are in need of more servers. If you have a server with a static IP address always available on the internet, please consider adding it to the system."
While not good OpSec*, I wanted to support the ntppool by adding a stratum-1 server to it.
* The GPS coordinates of my server give my location away, but this is public information already. Also note that the rounding actually results in Google Maps not pinpointing my home exactly :). My IP address is easily found too, but I'll have to live with that.

To avoid exposing any internal stratum-1 servers to the internet, the firewall itself was "upgraded" with a GPS receiver configured to provide NMEA time info and PPS over serial - more on that later. This is something you should never do on a corporate firewall, but I've accepted the associated risk - most importantly, NTP is not running as root.

After configuring and tuning NTP on the firewall [4], it was time to add it to the pool [1]. According to the firewall logs, the monitoring station in Newark, NJ, USA connected shortly thereafter, and the system's score climbed over the course of March 6th 2020, as seen in figure 1.

Figure 1 - Initial Monitoring 

Note: The monitoring shows that the server was added too hastily - it was still being tuned.

Update: After the first day the score has been consistently at 20 as shown in figure 2.

Figure 2 - March 9th and 10th Monitoring

Once the score passed 10, clients started connecting (as expected). Connected peers can be shown using $ sudo ntpq -n -cmru. However, I'm also logging all requests on the WAN interface to a separate log file. To ensure that it doesn't outgrow the disk, logrotate is configured to rotate that logfile daily and keep only 1 backup. The logs are ingested into elasticsearch, so there's no reason to keep them on the firewall itself for longer.
The logging performed for NTP may seem excessive and puts an extra load on the system, but it is justified by the need for visibility into this traffic.
The NTP logging is used to monitor NTP activity, including visualizations in Kibana. A few examples are shown in the figures below. The numbers are relatively low because I had a power outage just before capturing these images, so the 24 hour view lacks a few hours of activity.
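The rotation described above could be set up roughly as follows. This is a sketch, not the configuration running on the firewall: the log path /var/log/ntp-requests.log, the drop-in name ntp-requests and the function name are placeholders, and the target directory is a parameter so the snippet can be tried without touching /etc/logrotate.d.

```shell
# Write a logrotate drop-in that rotates the NTP request log daily and keeps
# a single old copy. The log path and file names are assumptions.
write_ntp_logrotate() {
    dest_dir=${1:-/etc/logrotate.d}
    cat > "$dest_dir/ntp-requests" <<'EOF'
/var/log/ntp-requests.log {
    daily
    rotate 1
    missingok
    notifempty
}
EOF
}
```

Calling write_ntp_logrotate without arguments (as root) writes to /etc/logrotate.d. If a syslog daemon keeps the file open, a postrotate hook telling it to reopen the log would probably be needed as well.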

Figure 3 - Count of requests and peers

Figure 4 - World Map of NTP peers

Figure 5 - Autonomous System and Country
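For a quick look at the request log without Kibana, something like the sketch below can count requests per source address. It assumes the log lines look like the kernel's netfilter LOG output, with SRC= and DPT= fields; the actual log format on the firewall may differ, and the function name is made up for this example.

```shell
# Read netfilter LOG-style lines on stdin and print "<count> <source address>"
# for packets sent to UDP port 123, busiest source first.
# Assumes SRC=<addr> and DPT=123 appear as space-separated fields.
ntp_requests_per_source() {
    awk '
        / DPT=123( |$)/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^SRC=/) { count[substr($i, 5)]++; break }
        }
        END { for (addr in count) print count[addr], addr }
    ' | sort -rn
}
```

It would be fed from the request log, for example with ntp_requests_per_source < /var/log/ntp-requests.log (a hypothetical path).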

To have a single dashboard for all NTP monitoring, local dashboards for NTPPOOL monitoring were also created. The data from ntppool.org is pulled daily at noon (with RandomizedDelaySec=7200, so the actual pull happens at a random time up to two hours later) and ingested into elasticsearch. The specific visualizations are shown below.

Figure 6 - Offset Monitoring based on NTPPOOL monitoring data

Figure 7 - Score Monitoring based on NTPPOOL monitoring data
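The daily pull at noon with RandomizedDelaySec=7200 maps onto a systemd timer. The sketch below writes such a timer unit; the unit name ntppool-fetch.timer and the function name are hypothetical, and the matching ntppool-fetch.service that does the actual download and ingest is not shown.

```shell
# Write a systemd timer that fires daily at 12:00 plus a random delay of
# up to 7200 seconds. The unit name is an assumption; by default the timer
# starts the service unit of the same name.
write_pool_timer() {
    dest_dir=${1:-/etc/systemd/system}
    cat > "$dest_dir/ntppool-fetch.timer" <<'EOF'
[Unit]
Description=Daily pull of ntppool.org monitoring data

[Timer]
OnCalendar=*-*-* 12:00:00
RandomizedDelaySec=7200

[Install]
WantedBy=timers.target
EOF
}
```

After writing the unit to /etc/systemd/system, systemctl daemon-reload followed by systemctl enable --now ntppool-fetch.timer would activate it. The random delay spreads the requests so that not every such job hits ntppool.org at exactly 12:00.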

When the NTP server was added to the pool, the NTPPOOL monitoring gave it a low score, which slowly but steadily increased over the following hours. The score then saw a slight decrease, correlating with the (planned) change to an outdoor GPS antenna. Late on the 7th the score was at 20, and it stayed there until a power outage on the 12th (a local incident in the household), which caused it to drop below 10 (5.7 to be precise). The development in score from Day 0 until March 15th is shown in Figure 8, below.

Figure 8 - Score Monitoring from Day 0 - March 15th
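The score behaviour described above (a slow climb, a ceiling at 20, and a recovery after the outage) fits a simple model in which every monitoring probe multiplies the score by 0.95 and adds +1 for a good response, which tops out at 1 / (1 - 0.95) = 20. The sketch below assumes that model; it is an assumption about how the pool scores servers, not something derived from the monitoring data here, and the function name is made up. It counts how many consecutive good probes are needed to reach a target score.

```shell
# probes_to_reach TARGET [START]
# Count consecutive good probes needed to lift the score from START
# (default 0) to at least TARGET, assuming score = score * 0.95 + 1 per probe.
# The loop is capped at 1000 because 20 is only approached, never reached.
probes_to_reach() {
    awk -v target="$1" -v score="${2:-0}" 'BEGIN {
        n = 0
        while (score < target && n < 1000) {
            score = score * 0.95 + 1
            n++
        }
        print n
    }'
}
```

Under these assumptions, probes_to_reach 10 5.7 prints 7 (recovering from the 5.7 seen after the power outage) and probes_to_reach 10 prints 14 (starting from zero).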

An observation about ISPs' and Cloud Providers' use of servers in the pool
NTP clients from several major ISPs as well as Cloud Providers connect to timeservers in the pool, including mine. While that is fine, it seems wrong - at least to me - that these providers (often) do not themselves support the pool with NTP servers and/or help their customers use the providers' own servers. The cost of doing that would be negligible and they (supposedly) have the required infrastructure.
So if you're working for an ISP or Cloud Provider, please push this agenda to the right people. You're all running on open source software anyway.

The actual configuration

The original firewall, discussed in a previous blog post, was (re)configured as discussed briefly below.

The GPS receiver used is based on the u-blox NEO-7M. Chinese factories are cranking these boards out in high numbers and they can be found on eBay for around $7. I've bought quite a few of these exact boards, and have 5 stratum-1 servers deployed so far using them.

Please note that these boards have no holdover, so they stop disciplining NTP when there's no GPS fix. When that happens, the quality of the other peers configured in ntp.conf and of the internal real time clock (RTC) decides how badly the server drifts until the GPS fix returns. Buying a device with an oven controlled crystal oscillator (OCXO) would help mitigate this, but the price tag is much higher (I'm still considering it, as used ones sometimes come up at decent prices). Be aware that devices such as BG7TBL's GPSDO (GNSSDO) do not deliver PPS without a GPS fix either, so don't buy them for that purpose; however, there are some refurbished Symmetricom and Samsung devices available that would do the job.

Picture 1 - u-blox NEO-7M board

Opening the APU and connecting the GPS

For the APU4C4 case I drilled a 6.5 mm hole in the front of the lid and mounted the GPS board internally. I had to mount it there, as there's no room left at the back with 4 NICs and wireless. The board is connected to J18 on the APU board itself as discussed below.

Picture 2 - APU 4C4 with mounted GPS board

Connect the GPS to J18 as described in Table 1, using serial #3 of the LPC UART. Schematics for the APU4C4 board can be found here: https://www.pcengines.ch/schema/apu4c.pdf; the LPC UART is shown on page 12 of 18.

Connecting to serial #3 means the serial device becomes /dev/ttyS2. It was chosen for the following main reasons:
  1. Serial #1 (COM1 on the schematic) is used for console access and should be reserved for that purpose.
  2. The GPS board runs at 3 V, so it doesn't support the RS232 levels on Serial #1 or Serial #2 (COM2 on the schematic).
  3. Serial #2 also lacks the DCD (or RI) line required for PPS.
  4. Serial #3 is thus the first feasible port for communicating with the GPS.

GPS to LPC UART connections
GPS    J18      J18 Pin   Comment
GND    Ground   1         Ground
VCC    V3       2         3 Volt
TXD    RXD3#    7         TX (GPS) -> RX (J18)
RXD    TXD3#    8         RX (GPS) -> TX (J18)
PPS    DCD3#    9         Kernel PPS uses DCD
Table 1 - GPS to APU4C4 connection
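Once the board is wired as in Table 1, a quick sanity check is to read a few NMEA sentences from /dev/ttyS2 and verify their checksums. In NMEA 0183 the checksum is the XOR of all characters between "$" and "*", written as two hex digits. The helper names below are made up for this sketch, and it assumes the serial port has already been set to the baud rate the GPS uses.

```shell
# XOR all characters of the payload (the text between "$" and "*") and print
# the result as two uppercase hex digits.
nmea_checksum() {
    payload=$1
    sum=0
    while [ -n "$payload" ]; do
        rest=${payload#?}              # everything after the first character
        char=${payload%"$rest"}        # the first character itself
        sum=$(( sum ^ $(printf '%d' "'$char") ))
        payload=$rest
    done
    printf '%02X\n' "$sum"
}

# Succeed only if a full sentence of the form "$...*hh" carries the right checksum.
nmea_valid() {
    sentence=$(printf '%s' "$1" | tr -d '\r')
    case $sentence in
        '$'*'*'??) ;;
        *) return 1 ;;
    esac
    body=${sentence#?}                 # drop the leading "$"
    given=${body##*\*}                 # hex digits after the last "*"
    body=${body%\**}                   # payload before the last "*"
    [ "$(nmea_checksum "$body")" = "$(printf '%s' "$given" | tr 'a-f' 'A-F')" ]
}
```

With the functions loaded, a loop such as head -n 5 /dev/ttyS2 | while IFS= read -r line; do nmea_valid "$line" && echo "ok  $line" || echo "bad $line"; done gives a first indication of whether the wiring and baud rate are right. It says nothing about PPS, which has to be checked separately.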


CPU Throttling

Dynamic changes in CPU clock frequency have a very noticeable impact on PPS timing, and thus on the accuracy of NTP. The system is therefore configured with the scaling governor set to performance using sysfsutils. The performance governor ensures that all CPU cores run at 1 GHz all the time. This was added to the main script, as it doesn't hurt netfilter's performance either.
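With sysfsutils, the governor is typically set through one line per core in /etc/sysfs.conf, for example devices/system/cpu/cpu0/cpufreq/scaling_governor = performance. To check that the setting actually took effect, a small helper like the sketch below can list any core that is not running the performance governor. The function name is made up, and the optional argument only exists so the check can be pointed at a different directory.

```shell
# Print "<cpu> <governor>" for every core whose cpufreq governor is not
# "performance". Prints nothing when all cores are set as intended.
non_performance_cpus() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue        # skip cores without cpufreq support
        gov=$(cat "$f")
        if [ "$gov" != "performance" ]; then
            cpu=${f#"$root"/}          # e.g. cpu1/cpufreq/scaling_governor
            echo "${cpu%%/*} $gov"
        fi
    done
}
```

Running non_performance_cpus after a reboot should print nothing if the sysfsutils entries were applied to every core.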

[1] How do I join pool.ntp.org? https://www.ntppool.org/en/join.html
[2] My NTP server: https://www.ntppool.org/a/aika
[3] My NTP server: http://support.ntp.org/bin/view/Servers/PublicTimeServer001660
[4] Debian Firewall: https://github.com/martinboller/DebFirewall
[5] PC Engines APU4C4 Schematics: https://www.pcengines.ch/schema/apu4c.pdf

Update: The server has been added to Beta monitoring and the graphs changed accordingly.

Figure 9 - Beta Offset Monitoring

Figure 10 - Beta Score Monitoring
