3 posts - 2 participants
Potentially an XY problem question, but hopefully I can get some insights.
I have two hobby Telegram bots deployed to Fly.io (if that matters: Airnope, which kills cryptocurrency spam, and a simple list app, Bot na Lista), both written in Rust with Teloxide.
I had to set them both to:
auto_stop_machines = false
min_machines_running = 1
So they don’t suspend/stop the machines. I wasn’t expecting to need that, because Telegram bots work via webhook, meaning every time there is a message for the bot, Telegram’s servers send a POST request to Fly.io (to my bot).
I expected that to restart the suspended or stopped machines, but it does not. Is that expected?
Any ideas to debug the whole integration?
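For what it’s worth, auto-start only kicks in for traffic that reaches the machine through Fly’s proxy, so the webhook has to hit a service declared in fly.toml. A minimal sketch (the port and values here are assumptions, not taken from the actual apps):

```toml
[http_service]
  internal_port = 8080          # port the bot's webhook server listens on (assumption)
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true    # the proxy starts a stopped machine when a request arrives
  min_machines_running = 0
```

If the webhook URL registered with Telegram is stale, or the POSTs arrive in a way that bypasses the proxy-handled service, the wake-up never happens.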
Many many thanks,
4 posts - 2 participants
Nothing works on my end and I am wondering what the issue is. Everything works locally.
3 posts - 2 participants
My apps are hanging and not working on the servers despite working locally.
I do not know how to proceed here, as nothing works despite the status saying ‘operational’.
8 posts - 8 participants
I spun up a working VM yesterday, and everything worked until this morning. The machine the volume was connected to is gone, but its ID is still attached to the volume.
When I did a deploy today, it spun up a new machine with a new volume.
Now my issue is that the old volume is still attached to the ghost machine and I cannot move it to the other one.
Marked in red is the ghost machine:
I tried forking it, but it needs a machine:
fly volumes fork vol_v3yqonej59x0npl4 -a qwikwinsservice
Error: failed to get VM 148e460b452918: machine not found (Request ID: 01JDKXZPGG6P54Q0WTS5DNWSZ8-ams)
Question:
How can I attach the old volume to the new machine?
Thanks
1 post - 1 participant
I am moving from Digital Ocean to Fly.io, but I can’t quite get my head around the right configuration to allow my Fly.io app to safely access my Digital Ocean managed postgres instances.
As you all likely know, Digital Ocean’s Managed DBs limit requests based on IP, but Fly.io instances don’t have a fixed outgoing IP address, so that’s an issue.
I’m assuming what we’d do is:
I’m checking all of this is correct because this thread seems to imply that something about Fly.io makes using Wireguard easier than OpenVPN (which we already have running in the Digital Ocean account for other reasons), but that’s not actually clear to me in this configuration.
It also says “it also requires more Docker shenanigans to get connected” and I’m far too serious of a person to engage in shenanigans.
@kurt also keeps linking this page in response to these types of threads, but I’m confused by that because this page is all about routing between fly resources or connecting from the outside internet… into Fly… I think?
Thank you friends I hope this makes sense <3
3 posts - 2 participants
72 posts - 34 participants
This is requested with some (current) frustration, but it has also been a repeated experience: there’s currently an issue on the status page called “Degraded API Performance.” But what it actually seems to mean is: we cannot deploy. Numerous calls to this API are required to effect a deploy, and apparently some are 5xxing.
Random examples from retries in our deploy action:
Run superfly/flyctl-actions/setup-flyctl@master
Error: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
Run flyctl deploy --image-label git-3ce0ed3 --build-arg CI_RELEASE=prod-v168-git-3ce0ed3 --config fly.prod.toml --strategy rolling --auto-confirm --remote-only --verbose
==> Verifying app config
--> Verified app config
Validating fly.prod.toml
✓ Configuration is valid
Error: server returned a non-200 status code: 504
Error: failed retrieving current user: server returned a non-200 status code: 504
Error: Process completed with exit code 1.
I have been on call throughout my career, and I’m sympathetic to the balance on-call responders must strike between just stating the facts and editorializing about what the impact could be. I don’t think Fly is intentionally downplaying the impact of the issue.
But our experience is that many dryly named and narrated outages at Fly, like “degraded API performance”, often have very blunt and immediate impact which we only discover when we try to ship. There’s something in this update about “Corrosion”; while it’s cool to see technical details, in the moment I don’t really care what the name of the global state store is. I mostly just care whether (a) my servers are responsive, and (b) deploys are working.
So the concrete suggestion is: Run deploys continuously, from a variety of regions, and show whether they’re succeeding or not. I think it might tell a more complete story about what’s going on, and complement the work of oncall responders when there is an outage. Thanks!
2 posts - 2 participants
Litestack modifies config/database.yml with adapter: litedb
and database: <%= Litesupport.root("production").join("data.sqlite3") %>
It also adds development and production folders to the db folder, with data.sqlite3 being the SQLite file in each.
fly volumes create sqlite_volume --region syd -n 1 --size 1 --app my-app
fly volumes list
ID STATE NAME SIZE REGION ZONE ENCRYPTED ATTACHED VM
vol_vg32ogzyxjdxyyyy created sqlite_volume 1GB syd 80c5 true
[[mounts]]
source = "sqlite_volume"
destination = "/data"
LITESTACK_DATA_PATH="/data"
fly deploy
✓ Configuration is valid
.
.
Error: Process group 'app' needs volumes with name 'sqlite_volume' to fulfill mounts defined in fly.toml; Run `fly volume create sqlite_volume -r REGION -n COUNT` for the following regions and counts: syd=2
I don’t understand the error message: I already created the volume and added it as the source in fly.toml. What else do I need to do?
I thought I was well prepared for this, but I must have missed something. Can anyone please help?
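One detail that may explain the message: a first `fly deploy` creates two machines for an app by default, and every machine in the process group needs its own volume, hence `syd=2` while only one volume exists. Two possible ways out, sketched here (flags assumed from current flyctl; double-check `fly deploy --help`):

```shell
# Option 1: create the second volume the error asks for
fly volumes create sqlite_volume --region syd -n 1 --size 1 --app my-app

# Option 2: deploy a single machine so the one existing volume suffices
fly deploy --ha=false
```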
10 posts - 3 participants
I’m having some issues with my Django Channels WebSocket application’s scaling configuration. I’ve set up connection-based scaling with a soft limit of 20 and a hard limit of 25 connections. However, the limits don’t seem to be working as expected, and I’m noticing some strange behavior.
The main issues I’m experiencing are:
Look at some metrics from Grafana:
Here is my fly.toml file:
primary_region = 'gru'
console_command = 'python manage.py shell'
[build]
dockerfile = "./infra/Dockerfile"
[deploy]
strategy = "bluegreen"
[[services]]
internal_port = 8000
protocol = "tcp"
auto_stop_machines = "stop"
auto_start_machines = true
min_machines_running = 0
[services.concurrency]
type = "connections"
hard_limit = 25
soft_limit = 20
[[services.ports]]
handlers = ["http"]
port = "80"
[[services.ports]]
handlers = ["tls", "http"]
port = "443"
[[services.tcp_checks]]
interval = 10000
timeout = 2000
[[vm]]
memory = '2gb'
cpu_kind = 'shared'
cpus = 2
[[statics]]
guest_path = '/src/output/staticfiles'
url_prefix = '/static/'
I have another monitoring interface, and it showed that at this time there were 60 connections to the websocket.
Look at this other example:
For instance, there are 12-15 people using the websocket over the last few minutes, and suddenly the “Concurrency” metric drops to 0. We know the machine is still running by looking at the other metrics, and we also know that WebSocket users are connected because we are watching via our own monitoring system, so what is going on?
This looks like an issue on Fly’s end. I don’t know whether it’s on the metrics side or in the auto-balancing part, but it’s definitely in the infrastructure management part.
1 post - 1 participant
2 posts - 2 participants
What does this warning from flyctl status mean?
WARN WARNING the config file at ‘D:\personal\tnext\fly.toml’ is not valid: json: cannot unmarshal string into Go struct field HTTPService.http_service.auto_stop_machines of type bool
fly.toml:
app = 'trilium-rmi9xw'
primary_region = 'sea'
[build]
dockerfile = 'Dockerfile.alpine'
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = 'stop'
auto_start_machines = true
min_machines_running = 0
processes = ['app']
[[vm]]
memory = '1gb'
cpu_kind = 'shared'
cpus = 1
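The warning itself points at the cause: newer flyctl releases accept string values like 'stop' for auto_stop_machines, while an older binary still parses the field as a boolean, which is what the unmarshal error suggests. Updating flyctl should clear the warning; alternatively, the config can use the boolean form, sketched here:

```toml
[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true   # boolean form, accepted by older flyctl
  auto_start_machines = true
  min_machines_running = 0
  processes = ['app']
```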
2 posts - 2 participants
I am getting the following error:
litestream restore -o mydb.sqlite3 s3://my-bucket.fly.storage.tigris.dev/my-db.sqlite3
2024/11/25 13:42:44 ERROR failed to run error="cannot fetch generations: cannot lookup bucket region: InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.\n\tstatus code: 403, request id: blah, host id: blah"
My litestream.yml looks like this:
dbs:
  - path: /litefs/my-db.sqlite3
    meta-path: /data/litefs/my-db.sqlite3-litestream
    replicas:
      - type: s3
        bucket: ${BUCKET_NAME}
        path: my-db.sqlite3
        endpoint: ${AWS_ENDPOINT_URL_S3}
        region: ${AWS_REGION}
        access-key-id: ${AWS_ACCESS_KEY_ID}
        secret-access-key: ${AWS_SECRET_ACCESS_KEY}
        force-path-style: true
I have also taken the liberty of adding
LITESTREAM_ACCESS_KEY_ID and LITESTREAM_SECRET_ACCESS_KEY to secrets, with the same values as the AWS keys, in case Litestream is looking for them.
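If I remember Litestream’s behavior correctly, when `litestream restore` is given a raw replica URL it ignores litestream.yml entirely and reads credentials only from the environment (the LITESTREAM_* or AWS_* variables). To make the restore use the credentials from the config file, point it at the database path instead; a sketch, assuming the config lives at /etc/litestream.yml:

```shell
litestream restore -config /etc/litestream.yml -o mydb.sqlite3 /litefs/my-db.sqlite3
```

If the raw-URL form is preferred, it’s worth re-checking that the exported key matches the Tigris credentials exactly: the 403 says the access key ID itself is unknown, not that the secret is wrong.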
1 post - 1 participant
fly logs -a Myapp
If possible, streaming the logs output would be amazing, to show users the status of their app and what’s happening.
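In the meantime this can be scripted: flyctl can emit logs as JSON lines (the `-j`/`--json` flag), which are easy to relay to users. A sketch, where the `region` and `message` field names are assumptions about the JSON output:

```python
import json
import subprocess

def format_line(raw: str) -> str:
    """Turn one JSON log record into a short display string.
    Field names ('region', 'message') are assumptions about fly's JSON output."""
    rec = json.loads(raw)
    return f"[{rec.get('region', '?')}] {rec.get('message', '').rstrip()}"

def stream_logs(app: str):
    """Spawn `fly logs -a <app> -j` and yield formatted lines as they arrive."""
    proc = subprocess.Popen(
        ["fly", "logs", "-a", app, "-j"],
        stdout=subprocess.PIPE, text=True,
    )
    for line in proc.stdout:
        if line.strip():
            yield format_line(line)
```

Each yielded line can then be pushed to users over whatever channel the app already has (SSE, websocket, etc.).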
5 posts - 2 participants
1 post - 1 participant
I am trying to connect an Elixir/Phoenix app to Crunchy Bridge. On deploy, I run a create and migrate command via the deploy shell script. When running a fly deploy GitHub Action, I’m getting the following output in the fly.io machine logs:
2024-11-25T04:13:46.471 app[x] iad [info] 2024/11/25 04:13:46 INFO SSH listening listen_address=[y]:22 dns_server=[z]:53
2024-11-25T04:13:48.292 app[x] iad [info] IP [error] Postgrex.Protocol (#PID<0.151.0>) failed to connect: ** (DBConnection.ConnectionError) ssl connect: Options (or their values) can not be combined: [{verify,verify_peer},
2024-11-25T04:13:48.292 app[x] iad [info] {cacerts,undefined}] - {:options, :incompatible, [verify: :verify_peer, cacerts: :undefined]}
My runtime.exs
file includes the following:
config :my_app, MyApp.Repo,
ssl: true,
url: database_url,
pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10"),
socket_options: maybe_ipv6,
# username: "application",
# password: database_pw,
database: database_name
The Crunchy Bridge helper provides the following instructions for Phoenix:
DATABASE_URL env variable: Phoenix uses Postgres by default when you generate a new application. Using the generator below you can access the connection URL. Since Phoenix expects a URL, the format has been preset to URL.
config/prod.secret.exs file: When connecting to your Crunchy database we enforce SSL. You will need to un-comment the following line to enable SSL in your Repo connection.
postgres://application:…{long_string}
I set the database_url environment variable in the fly.io secrets, and it appears to be loading correctly into the application.
Both the create and migrate functions lead to the same error shown above. Both work locally on my machine.
def migrate do
load_app()
for repo <- repos() do
{:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true))
end
end
def create do
load_app()
Enum.each(repos(), fn repo ->
case repo.__adapter__().storage_up(repo.config()) do
:ok ->
IO.puts("The database for #{inspect(repo)} has been created")
{:error, :already_up} ->
IO.puts("The database for #{inspect(repo)} has already been created")
{:error, term} when is_binary(term) ->
raise("The database for #{inspect(repo)} couldn't be created: #{term}")
{:error, term} ->
raise("The database for #{inspect(repo)} couldn't be created: #{inspect(term)}")
end
end)
end
defp repos do
Application.fetch_env!(@app, :ecto_repos)
end
defp load_app do
:ssl.start()
Application.load(@app)
end
Any suggestions for where to troubleshoot?
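The error itself names the conflict: `verify: :verify_peer` with no CA certificates (`cacerts: :undefined`). Recent Postgrex versions treat `ssl: true` as peer verification, which then needs CAs. On OTP 25+, one way to supply them is the OS trust store via `:public_key.cacerts_get/0`; a sketch of the Repo config (hedged, since cert setup varies by provider):

```elixir
config :my_app, MyApp.Repo,
  url: database_url,
  # OTP 25+: pull CA certificates from the OS trust store
  ssl: [verify: :verify_peer, cacerts: :public_key.cacerts_get()],
  pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10"),
  socket_options: maybe_ipv6,
  database: database_name
```

The release tasks already call `:ssl.start()` in `load_app/0`, so the same `ssl:` options should apply to create and migrate as well.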
2 posts - 1 participant
In the .env file it’s defined as:
NEXT_PUBLIC_BASE_URL=https:www.rleaguez.com/
I have it defined in the fly.io webpage admin in Secrets, with name and value.
In my local terminal, if I use flyctl secrets list, I can see the secret listed.
I have seen this help page Fly.io build secrets which states we need to
In the Dockerfile there is this section where it’s mounted:
# Build application
RUN --mount=type=secret,id=NEXT_PUBLIC_BASE_URL \
NEXT_PUBLIC_BASE_URL="$(cat /run/secrets/NEXT_PUBLIC_BASE_URL)" \
yarn run build
then when I deploy in terminal I use:
fly deploy \
--build-secret NEXT_PUBLIC_BASE_URL="https:www.rleaguez.com/"
(this deploy seems to be much faster than a normal deploy without the secret)
In the page I placed a simple console log to view the value in the browser such as
console.log("process.env.NEXT_PUBLIC_BASE_URL === ", process.env.NEXT_PUBLIC_BASE_URL);
But what occurs is that the value of the secret is stuck on “http://localhost:3000” not the production value which should be “https:www.rleaguez.com/”
If you want to see the console log the page is here: [Page with console log localhost](https://www.rleaguez.com/organizationz/create)
What needs to be done differently to get the correct secret value in this Next.js app on fly.io?
Thanks in advance for any & all feedback~
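One thing that may matter here: NEXT_PUBLIC_* values are inlined into the client bundle at build time, so the variable must be set during `yarn run build`, and a cached build layer can keep serving an old inlined value. Since the URL is not actually sensitive (it ships in the bundle anyway), a plain build argument is a simpler path; a sketch with a placeholder URL:

```dockerfile
# Build application
ARG NEXT_PUBLIC_BASE_URL
ENV NEXT_PUBLIC_BASE_URL=$NEXT_PUBLIC_BASE_URL
RUN yarn run build
```

deployed with `fly deploy --build-arg NEXT_PUBLIC_BASE_URL="https://example.com/"` (placeholder value). It may also be worth one `fly deploy --no-cache` run, in case a cached layer is pinning the localhost value.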
3 posts - 2 participants
I’m seeing the following logs on an instance:
2024-11-24T14:44:28Z runner[] cdg [info]machine has reached its max restart count of 10
2024-11-24T14:44:33Z proxy[] cdg [info]waiting for machine to be reachable on 0.0.0.0:8080 (waited 12.53250087s so far)
2024-11-24T14:44:35Z proxy[] cdg [error][PM05] failed to connect to machine: gave up after 15 attempts (in 14.533430503s)
2024-11-24T14:44:37Z proxy[] cdg [info]Starting machine
etc.
So basically, the machine hits its max restart count, and then… immediately restarts.
Is there a way to make it stop trying to restart?
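Not certain this is the right knob, but machines carry a restart policy, and setting it to `no` should keep the supervisor from relaunching a crashed machine. The proxy may still auto-start it when traffic arrives, which looks like what the "Starting machine" line shows. A sketch with placeholder IDs; check `fly machine update --help` for the exact flag:

```shell
fly machine update <machine-id> --restart no -a <app-name>
```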
7 posts - 3 participants
I don’t understand what is happening. The log says that the app is not listening on the correct host and port, but I think it is. In fact, the Puma server logs show that the host is 0.0.0.0 and the port is 3000.
Could somebody help me?
2024-11-24T10:14:55Z runner[48e290ea749928] mad [info]Machine started in 866ms
2024-11-24T10:14:55Z proxy[48e290ea749928] mad [info]machine started in 880.191048ms
2024-11-24T10:14:55Z proxy[48e290ea749928] mad [info]machine became reachable in 6.735179ms
2024-11-24T10:14:55Z proxy[48e290ea749928] mad [error][PC01] instance refused connection. is your app listening on 0.0.0.0:3000? make sure it is not only listening on 127.0.0.1 (hint: look at your startup logs, servers often print the address they are listening on)
2024-11-24T10:14:56Z app[48e290ea749928] mad [info]2024/11/24 10:14:56 INFO SSH listening listen_address=[fdaa:0:bc23:a7b:48:7e3c:932f:2]:22 dns_server=[fdaa::3]:53
2024-11-24T10:14:57Z app[48e290ea749928] mad [info]=> Booting Puma
2024-11-24T10:14:57Z app[48e290ea749928] mad [info]=> Rails 7.2.1 application starting in production
2024-11-24T10:14:57Z app[48e290ea749928] mad [info]=> Run `bin/rails server --help` for more startup options
2024-11-24T10:14:59Z app[48e290ea749928] mad [info]Puma starting in single mode...
2024-11-24T10:14:59Z app[48e290ea749928] mad [info]* Puma version: 6.4.3 (ruby 3.3.5-p100) ("The Eagle of Durango")
2024-11-24T10:14:59Z app[48e290ea749928] mad [info]* Min threads: 5
2024-11-24T10:14:59Z app[48e290ea749928] mad [info]* Max threads: 5
2024-11-24T10:14:59Z app[48e290ea749928] mad [info]* Environment: production
2024-11-24T10:14:59Z app[48e290ea749928] mad [info]* PID: 336
2024-11-24T10:14:59Z app[48e290ea749928] mad [info]* Listening on http://0.0.0.0:3000
2024-11-24T10:14:59Z app[48e290ea749928] mad [info]Use Ctrl-C to stop
2024-11-24T10:15:11Z proxy[48e290ea749928] mad [error][PR04] could not find a good candidate within 21 attempts at load balancing
fly.toml
# fly.toml app configuration file generated for cartly on 2024-11-23T19:30:23+01:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#
app = '###########'
primary_region = 'mad'
console_command = '/rails/bin/rails console'
[build]
[deploy]
release_command = './bin/rails db:prepare'
[http_service]
internal_port = 3000
force_https = true
auto_stop_machines = 'stop'
auto_start_machines = true
min_machines_running = 0
processes = ['app']
[[mounts]]
source = "########"
destination = "#########"
[[vm]]
memory = '1gb'
cpu_kind = 'shared'
cpus = 1
[[http_service.checks]]
grace_period = "30s"
[[statics]]
guest_path = '/rails/public'
url_prefix = '/'
Dockerfile
# syntax = docker/dockerfile:1
# Make sure RUBY_VERSION matches the Ruby version in .ruby-version and Gemfile
ARG RUBY_VERSION=3.3.5
FROM ruby:$RUBY_VERSION-slim AS base
LABEL fly_launch_runtime="rails"
# Rails app lives here
WORKDIR /rails
# Set production environment
ENV BUNDLE_DEPLOYMENT="1" \
BUNDLE_PATH="/usr/local/bundle" \
BUNDLE_WITHOUT="development:test" \
RAILS_ENV="production"
# Update gems and bundler
RUN gem update --system --no-document && \
gem install -N bundler
# Throw-away build stage to reduce size of final image
FROM base AS build
# Install packages needed to build gems
RUN apt-get update -qq && \
apt-get install --no-install-recommends -y \
build-essential \
git \
pkg-config \
libpq-dev \
libz-dev \
dh-autoreconf \
libexpat1-dev \
libglib2.0-dev \
libvips-dev \
libvips \
libvips-tools \
curl \
postgresql-client \
neovim \
xz-utils && \
rm -rf /var/lib/apt/lists /var/cache/apt/archives
# Install the latest stable LTS version of Node.js
RUN curl -fsSL https://deb.nodesource.com/setup_lts.x | bash - && \
apt-get install -y nodejs && \
npm install -g npm@latest
# Install jemalloc
RUN git clone https://github.com/jemalloc/jemalloc.git && \
cd jemalloc && \
git checkout 5.2.1 && \
autoconf && \
./configure && \
make && \
make install
# Install application gems
COPY Gemfile Gemfile.lock ./
RUN bundle install && \
bundle exec bootsnap precompile --gemfile && \
rm -rf ~/.bundle/ "${BUNDLE_PATH}"/ruby/*/cache "${BUNDLE_PATH}"/ruby/*/bundler/gems/*/.git
RUN mkdir /data
COPY ./config/locale /data/locale
# Copy application code
COPY . .
# Precompile bootsnap code for faster boot times
RUN bundle exec bootsnap precompile app/ lib/
# Precompiling assets for production without requiring secret RAILS_MASTER_KEY
RUN SECRET_KEY_BASE_DUMMY=1 ./bin/rails assets:precompile
# Final stage for app image
FROM base
# Install packages needed for deployment
RUN apt-get update -qq && \
apt-get install --no-install-recommends -y curl imagemagick libvips postgresql-client && \
rm -rf /var/lib/apt/lists /var/cache/apt/archives
# Copy built artifacts: gems, application
COPY --from=build "${BUNDLE_PATH}" "${BUNDLE_PATH}"
COPY --from=build /rails /rails
# Run and own only the runtime files as a non-root user for security
RUN groupadd --system --gid 1000 rails && \
useradd rails --uid 1000 --gid 1000 --create-home --shell /bin/bash && \
chown -R 1000:1000 db log storage tmp
USER 1000:1000
# Deployment options
ENV RUBY_YJIT_ENABLE="1"
# Entrypoint sets up the container.
ENTRYPOINT ["/rails/bin/docker-entrypoint"]
# Start the server by default, this can be overwritten at runtime
EXPOSE 3000
CMD bin/rails server -b 0.0.0.0 -p 3000
1 post - 1 participant
Here are snippets of the config files, as I need some assistance.
litestream.yml
dbs:
  - path: /data/litefs/app.sqlite3
    replicas:
      url: "sftp://${RCLONE_SFTP_USER}:${RCLONE_SFTP_PASS}@localhost:${RCLONE_SFTP_PORT}/db-backup/app.sqlite3"
litefs.yml
.
.
.
exec:
  # Set the journal mode for the database to WAL. This reduces concurrency deadlock issues
  - cmd: sqlite3 ${DATABASE_PATH} "PRAGMA journal_mode = WAL;"
    if-candidate: true
  # Start rclone to serve the SFTP server for Dropbox backups
  - cmd: rclone serve sftp the-dropbox:/db-backup --user ${RCLONE_SFTP_USER} --pass ${RCLONE_SFTP_PASS} --addr :${RCLONE_SFTP_PORT}
    if-primary: true
  # Start litestream for backup to Dropbox
  - cmd: /usr/local/bin/litestream replicate -config /etc/litestream.yml
    if-primary: true
  - cmd: dotnet APP.dll
In litestream.yml, should path point to the LiteFS mount or to the real location of the db file on the Fly volume (e.g. /data/litefs)? When I look into that directory, I see LiteFS has actually placed the db file on the Fly volume in /data/litefs/dbs.
What is the correct path to put in litestream.yml?
Also, rclone seems to be a long-running app, and Litestream may be too, so with the way I am running them in litefs.yml it never gets past starting rclone.
What is the better way to start rclone and Litestream?
Thanks in advance.
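On the second question: if LiteFS runs the `exec` entries sequentially and waits for each to exit, any long-running command blocks the rest, which matches the behavior described. Two hedged options under that assumption: background the helpers with a shell, or let Litestream supervise the app via its `-exec` flag.

```yaml
exec:
  - cmd: sqlite3 ${DATABASE_PATH} "PRAGMA journal_mode = WAL;"
    if-candidate: true
  # Background the long-running helpers so the exec list can continue
  - cmd: sh -c 'rclone serve sftp the-dropbox:/db-backup --user ${RCLONE_SFTP_USER} --pass ${RCLONE_SFTP_PASS} --addr :${RCLONE_SFTP_PORT} &'
    if-primary: true
  # Alternatively, Litestream can wrap the main process itself:
  #   litestream replicate -config /etc/litestream.yml -exec "dotnet APP.dll"
  - cmd: sh -c '/usr/local/bin/litestream replicate -config /etc/litestream.yml &'
    if-primary: true
  - cmd: dotnet APP.dll
```

The `-exec` route is generally more robust than backgrounding, since Litestream then owns the app’s lifetime and shuts down cleanly with it.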
1 post - 1 participant
// Create an Express app
const app = express();
// Enable CORS
app.use(cors());
// Parse JSON requests
app.use(express.json());
// Use task routes
app.use('/tasks', taskRoutes);
// Use user routes
app.use('/users', userRoutes);
// Authenticate routes
app.use(auth.authenticate);
// Port number
const PORT = process.env.PORT || 3000;
// Start the server
app.listen(PORT, () => {
  console.log(`Server started on port ${PORT}`);
});
// Error handling
process.on('uncaughtException', (err) => {
  console.error('Uncaught exception:', err);
  process.exit(1);
});
// Close database connection on exit
process.on('exit', () => {
  mongoose.connection.close();
});
1 post - 1 participant
$ ssh git.adoublef.dev -v
OpenSSH_9.8p1, LibreSSL 3.3.6
debug1: Reading configuration data /Users/me/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files
debug1: /etc/ssh/ssh_config line 54: Applying options for *
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug1: Connecting to git.adoublef.dev port 22.
ssh: connect to host git.adoublef.dev port 22: Operation timed out
I am using a Cloudflare domain and am wondering what could be the issue.
13 posts - 3 participants
My last deployment was 3 months ago. What happened in the meantime?
Here is my database URL: postgres://user:password@app-name.flycast:5432/db-name?sslmode=disable
I restarted my PostgreSQL cluster; nothing works.
I can connect via
fly proxy 5432 -a app-name
1 post - 1 participant
The error is:
1.718 ERROR: Error [Errno 2] No such file or directory: ‘git’ while executing command git version
1.719 ERROR: Cannot find command ‘git’ - do you have ‘git’ installed and in your PATH?
How can I resolve this issue?
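If this is happening during the image build (e.g. pip installing a dependency from a Git URL), the base image simply lacks git. A sketch of the usual fix for a Debian-based Dockerfile (adjust to the actual base image):

```dockerfile
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y git && \
    rm -rf /var/lib/apt/lists/*
```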
2 posts - 1 participant
So I followed the instructions at Custom domains · Fly Docs. The domain got verified, all green, and then I started receiving an [ERR_SSL_VERSION_OR_CIPHER_MISMATCH].
I checked the docs, and apparently it doesn’t work if I have two TLS certificates on the domain, or something like that.
So I went to my Namecheap panel, changed the DNS hosting from the Cloudflare servers to the basic Namecheap DNS, and repeated the process.
I copied the A, AAAA, and CNAME records, all green, but still when I visit the domain I get “This site can’t be reached”.
Has anyone faced a similar situation? If so, what did you do to solve this issue?
2 posts - 1 participant
I have a project here that seems to have one of its machines in some kind of zombie mode. I am unable to start or delete the machine. This has been happening for over 24 hours now. I am also not able to deploy anything, whether from local or via GitHub Actions. How can I fix this?
11 posts - 4 participants
I have been chasing an issue with the performance of my database for a few days now, and I think I finally have a small clue about it. For reasons still unknown to me, my CPU remains within the lower limits for most of the day; we are talking about less than 30% usage. The issue is that the load spikes to more than 100% throughout the day, and it does this in a very noticeable pattern: I have been able to identify that it happens at 90-100 minute intervals. What I have not been able to identify is what causes the spikes. I have looked into scheduled jobs, autovacuum/autoanalyze operations, and replication. I have toyed with settings for the workers, shared_buffers, connections, you name it. My volume is fairly constant, so if it were application-load related I should be seeing constant load issues and perhaps higher CPU usage; the constant spikes on very marked, specific cycles make me think there is something else going on.
The latest clue I have is that it has to do with checkpoints. Searching through the logs, I have come across these messages, and while the checkpoints happen more often than the load spikes, I have seen one at the start of every spike. The actual checkpoint process seems to last about 5 minutes, but each spike period lasts a fairly constant 30-40 minutes.
2024-11-22T06:33:05.816 app[1781959f4de2d8] iad [info] postgres | 2024-11-22 06:33:05.815 UTC [477] LOG: checkpoint complete: wrote 32949 buffers (6.3%); 0 WAL file(s) added, 0 removed, 6 recycled; write=269.959 s, sync=0.743 s, total=270.705 s; sync files=44, longest=0.434 s, average=0.017 s; distance=94509 kB, estimate=94509 kB; lsn=DE/4547FE20, redo lsn=DE/1712E1B8
2024-11-22T06:33:27.206 proxy[1781959f4de2d8] iad [error] [PC05] timed out while connecting to your instance. this indicates a problem with your app (hint: look at your logs and metrics)
2024-11-22T06:33:27.223 proxy[1781959f4de2d8] iad [error] [PC05] timed out while connecting to your instance. this indicates a problem with your app (hint: look at your logs and metrics)
Now, the timeouts after the checkpoints are kind of telling; my theory so far is this:
For more context, I even upgraded to a 16x machine for a couple of days to see if that would help, but even then the spikes continued, and a 16x machine is, I think, overkill for what I am doing. Even the 8x one is over-provisioned, but I can work with that for a few more days.
Has anyone experienced similar issues? I am not a Postgres expert by any means, and I am not very familiar with things like replication, WAL settings, and the like. But even then, I am running this on a single machine. (I know I don’t have redundancy, but my use case is very specific and I can live without it.)
Here are some charts that show you the spikes in a very constant pattern.
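If checkpoints are indeed the trigger, the standard knobs are to spread checkpoint writes over more of the interval and to allow more WAL between checkpoints so they fire less often. A sketch of settings worth experimenting with (the values are assumptions, not a prescription):

```sql
-- Spread checkpoint I/O over 90% of the checkpoint interval
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
-- Allow more WAL between checkpoints so they happen less frequently
ALTER SYSTEM SET max_wal_size = '4GB';
SELECT pg_reload_conf();
```

The logged checkpoint already shows write=269.959 s, so writes are being spread; if the spikes are I/O saturation, a larger max_wal_size mainly trades spike frequency for spike size.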
4 posts - 2 participants
Hi, I’m hoping to launch my RAG AI chatbot app on Fly soon, and I’m looking at possible costs. I’ve heard from other devs, Reddit, etc. that Fly is very reasonably priced, with low monthly bills, which is great especially at the beginning.
However, in my case I’m looking at GPUs and, I guess, Volumes for storing chat histories and such. Can Volumes store vector embeddings?
I might have been messing up the calculator, but my guesstimates using the L40S put me at nearly $1k/month. The A10 did a bit worse, really.
And it’s very possible my site will not see a lot of traffic at all and will still help me aggregate job matches and I’ll get what I want. However, it isn’t an AI tool people have seen much of before, but there are other AI applications being built in the same hiring/HR sector.
I very much do want to be able to (auto?)scale to high traffic/traffic elasticity if it somehow comes to that/gets viral/hype etc. I want to be very prepared. (though mindful of Murphy’s Law…)
The app’s entire purpose is to get me a new job, so if it goes viral I’ll be choosing amongst the highest bidders… so it’s okay if I get a high Fly bill at that point. I’m willing to bet on myself.
But, I read the GPUs are priced in usage per second within the quota, so my calculator fears may be overblown.
Any input is appreciated! I’m brand new to Fly, though I’ve heard y’all on Changelog when I listen to them! <3
3 posts - 2 participants
Error updating Streamlit code: 422 Client Error: Unprocessable Entity for url: https://api.machines.dev/v1/apps/streamlit-hello47/machines/e825494f746078
Response Status: 422
Response Body: {"error":"Your organization has reached its machine limit. Please contact billing@fly.io"}
Failed to update code
But when I create 50 more new machines, they get created, so the “reached machine limit” error doesn’t make sense.
This is my code, which creates a new app if the app doesn’t exist:
def deploy_or_update_app(app_name, streamlit_code, dependencies=None, region='iad'):
    """Comprehensive app deployment or update"""
    try:
        # Check if app exists
        app_info = check_app_exists(app_name)
        if not app_info:
            # Create app if it doesn't exist
            create_app(app_name)
            # Allocate IPs
            print("Allocating IPs...")
            with concurrent.futures.ThreadPoolExecutor() as executor:
                ipv4_future = executor.submit(allocate_ip, app_name, "v4")
                ipv6_future = executor.submit(allocate_ip, app_name, "v6")
                # Wait for both to complete
                ipv4_response = ipv4_future.result()
                ipv6_response = ipv6_future.result()
            print("IPs allocated successfully")
        # Prepare installation command
        install_cmd = install_dependencies(dependencies or [])
        # Encode the Streamlit code
        encoded_code = base64.b64encode(streamlit_code.encode('utf-8')).decode('utf-8')
        # Prepare machine configuration
        machine_config = {
            "name": app_name,
            "region": region,
            "config": {
                "image": "registry.fly.io/myimaage",
                "env": {
                    "PORT": "8505"
                },
                "files": [
                    {
                        "guest_path": "/app/streamlit_app.py",
                        "raw_value": encoded_code
                    }
                ],
                "services": [
                    {
                        "ports": [
                            {
                                "port": 443,
                                "handlers": ["tls", "http"]
                            },
                            {
                                "port": 80,
                                "handlers": ["http"]
                            }
                        ],
                        "protocol": "tcp",
                        "internal_port": 8505,
                        "autostop": "suspend",
                        "autostart": True
                    }
                ],
                "processes": [
                    {
                        "cmd": [
                            "sh",
                            "-c",
                            f"{install_cmd} && streamlit run streamlit_app.py --server.port=8505 --server.address=0.0.0.0"
                        ]
                    }
                ],
                "guest": {
                    "cpu_kind": "shared",
                    "cpus": 2,
                    "memory_mb": 512
                }
            }
        }
        # Deploy or update machine
        url = f"{FLY_API_HOSTNAME}/v1/apps/{app_name}/machines"
        # Check if machine exists
        existing_machines = requests.get(url, headers=headers).json()
        if existing_machines:
            # Update existing machine
            machine_id = existing_machines[0]['id']
            # Suspend the machine
            suspend_url = f"{FLY_API_HOSTNAME}/v1/apps/{app_name}/machines/{machine_id}/suspend"
            suspend_response = requests.post(suspend_url, headers=headers)
            suspend_response.raise_for_status()
            # Wait for suspension
            max_suspend_attempts = 90
            for _ in range(max_suspend_attempts):
                status = check_machine_status(app_name, machine_id)
                if status and status.get('state') == 'suspended':
                    break
                time.sleep(0.5)
            # Update machine
            update_url = f"{FLY_API_HOSTNAME}/v1/apps/{app_name}/machines/{machine_id}"
            response = requests.post(update_url, headers=headers, json={"config": machine_config["config"]})
        else:
            # Create new machine
            response = requests.post(url, headers=headers, json=machine_config)
            response.raise_for_status()
            machine_id = response.json()['id']
        # Monitor deployment
        start_time = time.time()
        timeout = 160  # timeout in seconds
        while time.time() - start_time < timeout:
            status = check_machine_status(app_name, machine_id)
            if status and status.get('state') == 'started':
                return {
                    "status": "success",
                    "message": "App deployed/updated successfully",
                    "app_url": f"https://{app_name}.fly.dev",
                    "machine_id": machine_id
                }
            time.sleep(0.5)
        return {
            "status": "error",
            "message": "Deployment timeout",
            "machine_id": machine_id
        }
    except Exception as e:
        print(f"Deployment error: {str(e)}")
        traceback.print_exc()
        return {
            "status": "error",
            "message": str(e)
        }
3 posts - 2 participants
Does Fly still only support Intel Mac machines for building images? Or is there maybe something I’m missing, like with containerization it’ll still ship even if it doesn’t build on my own machine? (Macs are my only machines so far, but not for long…) Not sure that follows logically, since I wouldn’t be building locally to deploy, but I want to bring up the arm64 thing since that’s the only architecture I have.
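On the arm64 point: the local CPU architecture stops mattering if the image is built on Fly’s remote builders, which is what the `--remote-only` flag does. A sketch:

```shell
# Build on a Fly remote builder instead of the local Docker daemon
fly deploy --remote-only
```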
My other wonder is, what happens when I add GPUs for my chatbot app? (I do have several specific docs bookmarked so I can look at the GPUs section again too)
Thanks for any insight!
3 posts - 2 participants